--- title: "The EMbC R-package: quick reference" author: |- J.Garriga; F.Bartumeus\ ICREA Movement Ecology Laboratory (CEAB-CSIC)\ jgarriga@ceab.csic.es date: "Version 2.0.4, `r format(Sys.Date(), '%b %Y')`" output: pdf_document: number_sections: yes toc: no bibliography: biblio.bib vignette: > %\VignetteIndexEntry{The EMbC R-package: quick reference} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r set-options,echo=FALSE} options(width=80) ``` \graphicspath{{imgs/}} \bigskip \begin{abstract} In this document we give a brief overview of the EMbC R-package with special emphasis on its use for behavioral annotation of animal's movement trajectories. For details about the EMbC algorithm please refer to (Garriga et. al 2016) and for further details about the package please refer to the package reference manual. \end{abstract} \bigskip # The EMbC Algorithm The Expectation-maximization binary clustering (EMbC) is a general purpose, unsupervised, multi-variate, clustering algorithm [@embc2015], driven by two main motivations: (i) it looks for a good compromise between statistical soundness and ease and generality of use - by minimizing prior assumptions and favouring the semantic interpretation of the final clustering- and, (ii) it allows taking into account the uncertainty in the data. These two features make it specially suitable for the behavioral annotation of animal's movement trajectories. The method is a variant of the well sounded Expectation-Maximization Clustering (EMC) algorithm - i.e. under the assumption of an underlying Gaussian Mixture Model (GMM) describing the distribution of the data-set - but constrained to generate a binary partition of the input space. This is achieved by means of the *delimiters*, a set of parameters that discretize the input features into high and low values and define the binary regions of the input space. As a result, each final cluster includes a unique combination of either low or high values of the input variables. Splitting the input features into low and high values is what favours the semantic interpretation of the final clustering. The initial assumptions implemented in the EMbC algorithm aim at minimizing biases and sensitivity to initial conditions: (i) each data point is assigned a uniform probability of belonging to each cluster, (ii) the prior mixture distribution is uniform (each cluster starts with the same number of data points), (iii) the starting partition, (*i.e.* initial delimiters position), is selected based on a global maximum variance criterion, thus conveying the minimum information possible. The number of output clusters is $2^m$ determined by the number of input features $m$. This number is only an upper bound as some of the clusters can vanish along the likelihood optimization process. The EMbC algorithm is intended to be used with not more than 5 or 6 input features, yielding a maximum of 32 or 64 clusters. This limitation in the number of clusters is consistent with the main motivation of the algorithm of favouring the semantic interpretation of the results. The algorithm deals very intuitively with data reliability: the larger the uncertainty associated with a data point, the smaller the leverage of that data point in the clustering. With respect to close related methods like EMC and Hidden Markov Models (HMM), the EMbC is specially useful when: (i) we can expect bi-modality, to some extent, in the conditional distribution of the input features or, at least, we can assume that a binary partition of the input space can provide useful information, and (ii) a first order temporal dependence assumption, a necessary condition in HMM, can not be guaranteed. # The EMbC R-package The EMbC algorithm is of general purpose and can deal with any type of data sets or time series. However, the EMbC R-package is mainly intended for the behavioral annotation of animals' movement trajectories where an easy interpretation of the final clustering and the reliability of the data constitute two key issues, and the conditions of bi-modality and unfair temporal dependence usually hold. In particular, the temporal dependence condition is easily violated in animal's movement trajectories because of the heterogeneity in empirical time series due to large gaps, or prefixed sampling scheduling. Input movement trajectories are given either as a *data.frame* or a *Move* object from the **move** R-package. The package deals also with stacks of trajectories for population level analysis. Segmentation is based on local estimates of velocity and turning angle, eventually including a solar position covariate as a daytime indicator. The core clustering method is complemented with a set of functions to easily visualize and analyse the output: * clustering statistics, * clustering scatter-plot (2D and 3D), * temporal labeling profile (ethogram), * plotting of intermediate variables, * confusion matrix (numerical validation with respect to an expert's labeling), * visual validation versus external information (e.g. environmental data), * generation of kml or web-map documents for detailed inspection of the output. Also, some functions are provided to further refine the output, either by pre-processing (smoothing) the input data or by post-processing (smoothing, relabeling, merging) the output labeling. The results obtained for different empirical datasets suggest that the EMbC algorithm behaves reasonably well for a wide range of tracking technologies, species, and ecological contexts (e.g. migration, foraging). ## Working Environment The EMbC package has dependencies with the following packages: - **methods**, *formal methods and classes* [@methods]; - **sp**, *classes and methods for spatial data* [@sp1; @sp2]; - **suntools**, *calculate sun position, sunrise, sunset, solar noon and twilight* [@suntools]; - **mnormt**, *the multivariate normal and t distributions* [@mnormt]; - **RColorBrewer**, *ColorBrewer palettes* [@RColorBrewer]; We also suggest the package **rgl**, *3D visualization device system (OpenGL)* [@rgl], to allow for dynamic 3D scatter-plots in multivariate analyses. For researchers who are familiar with the MoveBank framework, we include a special link with the package **move**, *Visualizing and analizing animal track data* [@move], to allow users to make use of *Move* objects as input trajectories. ## Basic structure Basically, the package consists of a hierarchy of classes: - *binClst*, the main class, representing the binary clustering of a multivariate data set; - *binClstPath*, a child class of the former, representing the binary clustering of a trajectory; - *binClstStck*, basically a list of *binClstPath* objects resulting from the global clustering of a stack of trajectories. Instances of these classes are build by means of two main constructors: - *embc()*, the main core of the package, implementing the EMbC algorithm itself; this constructor takes as input a matrix of data-points and returns an object of class *binClst* with the **multivariate** binary clustering of the input data; - *stbc()*, a specific constructor for the behavioral annotation of movement trajectories; the input to this constructor is a trajectory (given either as a data.frame, a *Move* object or a list of them) and returns an object of class *binClstPath* (or any of its child classes) with the **bivariate** (velocity/turn) clustering of the trajectory; eventually it can compute a **trivariate** clustering by including a parameter indicating a solar covariate (either *height* or *azimuth*) to be used as a daytime indicator. The behavior of the constructors can be modified by means of different parameters (*e.g.* maximum number of iterations, information shown at each step, pre-smoothing of the data). The output objects have several slots containing all information related to the binary clustering (input data, intermediate computations and output data). All slots are accessible and can be used with any R function external to the package or even modified. However, **we recommend not to manually change the values in the slots in order to keep the internal consistency**. Let's load the package; ```{r, message=FALSE, warning=FALSE} library(EMbC) ``` # Class: binClst This is the core class that implements the multivariate binary clustering algorithm. The input data-set is given as a matrix with data points given as rows and input features as columns. No more than 5 (6 at most) variables should be used in order to get a meaningful set of clusters. Let's use the object *x2d* included in the package. This object contains a set of data points generated from a bivariate GMM with four components (slot x2d@D), and a labeling indicating which component generated each data point (slot x2d@L); ```{r, fig.width=5, fig.height=4.0, fig.align='center'} par(mgp=c(1.5, 0.4, 0), cex.lab=0.8, cex.axis=0.8) plot(x2d@D, col=x2d@L, xlab='X1', ylab='X2') # x2d@D is a matrix with the input data # x2d@L is a numeric vector with the reference labeling ``` ## Binary clustering We can cluster a general dataset by calling the *embc()* constructor and passing in the input data, in matrix form, and storing the result in an output variable (*e.g.* *mybc*); ```{r, } mybc <- embc(x2d@D) ``` At each iteration, the algorithm shows the iteration number, the current likelihood value, the number of effective clusters and the number of labels that have changed with respect to the previous iteration. ## Slots *mybc* is an instance of class *binClst*. Any slot of a *binClst* object is accessible and can be used by (passed to) any R function. The most basic slots of a *binClst* object are: ```{r, } slotNames(mybc) ``` - mybc@X, a matrix with the input data points; - mybc@U, a matrix of the same dimension as the input data matrix, with a reliability value (ranging from 0 to 1) for each input value, (by default is a matrix of ones); - mybc@R, a matrix with the values of the delimiters for each binary region; - mybc@P, a list where each element is a named list with the Gaussian parameters of each output cluster; - mybc@W, a matrix with the likelihood weights of each data-point with respect to each output cluster; - mybc@A, a numeric vector with the output labeling of each location, (the number of the cluster with the highest likelihood weight, coded as 1:LL, 2:LH, 3:HL, and 4:HH). ## Likelihood Plot The likelihood plot allows a visual assessment of the convergence of the algorithm; ```{r, fig.width=3.8, fig.height=3.0, fig.show='hold'} # the lkhp() function allows an offset parameter; lkhp(mybc) # left panel lkhp(mybc, 10) # right panel ``` The last iterations may show some decrease in likelihood. This is due to a slight discrepancy between binary and optimal likelihood clusterings that can appear at the last steps of the algorithm, normally involving just a few data-points at the boundaries of the binary regions (note the low number of data-points changing their labels beyond iteration number 12 where the likelihood starts decreasing). ## Clustering parameters The function *stts()* shows the statistics of the clustering. The columns $X_{i}.mn$ and $X_{i}.sd$ show the mean and standard deviation of the input features. The last two columns show the marginal distribution of the clustering in absolute (number of data-points) and relative (percentage) values; ```{r, } stts(mybc) ``` The complete set of parameters of the Gaussian mixture is accessible through the slot mybc@P. This slot is a list of inner named-lists (M for mean and S for the covariance matrix) for each cluster. For instance, the parameters for cluster 1 (LL) are; ```{r, } mybc@P[[1]] ``` The *delimiters* are accessible through the slot mybc@R where we have the *min()* and *max()* values that delimit each binary region; ```{r, } mybc@R ``` ## Clustering scatter-plot The function *sctr()* makes a scatter-plot of the data-points, showing the clusters in different colours, and depicting the binary delimiters (light grey lines) to show the binary regions; ```{r, fig.width=5, fig.height=2.5, fig.align='center'} sctr(mybc) ``` The *NC* in the legend stands for not classified points. Not classified points may appear only when performing the behavioral annotation of movement trajectories (explained later in this document) and correspond to outliers due to errors or gaps in the trajectory or, typically, to the last track of the trajectory. ## Clustering validation In a supervised case, that is in case that an expert's labeling is available, we can use this labeling to validate the results of the clustering. The expert labeling must be numerically coded and translated to the range of the number of clusters, and must be given as a numeric vector with one numeric label for each location. We can make a visual validation of the clustering versus the expert labeling by means of the *sctr()* function passing in the expert labeling vector as a second parameter; ```{r, fig.width=5, fig.height=4.8, fig.align='center'} sctr(mybc, x2d@L) # the top plot shows the clustering result; # the bottom plot shows the reference labeling; ``` We can perform a numeric assessment of the clustering in terms of a confusion matrix by means of the function *cnfm()*; ```{r, } cnfm(mybc, x2d@L) ``` The confusion matrix shows values of row *precision* and row *F-measure*, and values of column *recall* and column *F-measure*. The 3x2 subset of cells at the bottom-right show respectively: the *overall accuracy*, the *average recall*, the *average precision*, NaN, NaN and the *Macro-F-measure*. # Class: binClstPath The *binClstPath* is a *binClst* subclass intended to automatically perform the bivariate clustering of a movement trajectory, based on estimated local values of velocity and turn. It can also perform a trivariate clustering by incorporating a daytime covariate (i.e. solar height or solar azimuth). The input data-set is a trajectory given as a data.frame with timestamps, longitudes and latitudes in columns 1:3 respectively (column headers are user free). Timestamps must be given as.POSIXct() with the specific format "%Y-%m-%d %H:%M:%S". As an example, the package includes an object named *expth*. This is a synthetically generated trajectory stored as a data.frame; ```{r, } head(expth) ``` *expth* is a synthetically generated trajectory with expert labeling (note the column expth\$lbl). Further columns of data can be included in the input *data.frame* as long as the first three columns respect the required format. **Tip: by including an expert labeling with a column labeled *lbl* all validation functions will make use of it by default.** ## Bivariate velocity-turn clustering To perform the bivariate velocity/turn clustering of this trajectory we simply call the *stbc()* constructor passing in the data.frame with the time/space coordinates of the trajectory and storing the output *binClstPath* object in a variable (*e.g.* *mybcp*); ```{r, } mybcp <- stbc(expth, info=-1) # info=-1 supresses any step wise output information ``` The output object *mybcp* is a *binClstPath* instance with the following slots; ```{r, } slotNames(mybcp) ``` As a child class, a *binClstPath* object inherits and extends the set of slots of the *binClst* class. The basic slot differences with respect to the *binClst* class are: - mybcp@pth, a data.frame with three first columns named as *dTm*, *lon*, *lat* plus all additional columns of data included in the input data.frame; - mybcp@spn, a numeric vector with the computed time span between locations; - mybcp@dst, a numeric vector with the estimated distances between locations, computed as loxodromic lines; - mybcp@hdg, a numeric vector with local heading directions, given in clockwise radians from North (a value of $2\,\pi$ is used to distinguish no movement from movement heading North); - mybcp@X, is the matrix of input data that in this case is automatically generated with the estimated local values of velocity and turn; - mybcp@U, is the matrix of uncertainties that is also automatically generated based on the time-spans between locations. Slots *tracks*, *midpoints* and *bursted* are related to the bursted visualization of the trajectory (covered later in this document) and should not be manipulated. ## Basic functionality Because of class inheritance, all functionality described for the *binClst* class (*e.g.* likelihood plot, clustering parameters, scatter-plot, validation) holds for a *binClstPath* instance. ```{r, } stts(mybcp) ``` ```{r, fig.width=5, fig.height=3.0, fig.align='center'} sctr(mybcp) ``` ```{r, } cnfm(mybcp) # the expert labeling given in expth$lbl is used by default ``` Nonetheless, the *binClstPath* class has some particular functionalities of special interest for the case of behavioral annotation of movement trajectories. These functionalities are described in the following. ### Labeling profile The function *lblp()* plots the temporal series of data and the temporal profile of the behavioral labeling; ```{r, fig.width=5, fig.height=3.5, fig.align='center'} # lims=c(a, b) limits the plot to a chunk of the trajectory lblp(mybcp, lims=c(100, 500)) ``` ### Fast visualization of the annotated trajectory The function *view()* shows the annotated trajectory and a top panel with the temporal sequence of behaviors; ```{r, fig.width=5, fig.height=3.5, fig.align='center'} # this function allows a parameter lims=c(a,b) as well view(mybcp, lims=c(100, 500)) ``` ### Detailed inspection of the annotated trajectory We can generate *kml* or *html* documents for a detailed inspection of the output by means of *google-earth* or the user's system browser. The package allows two types of visualization of the annotated trajectory: a point-wise visualization (functions *pkml()* or *pmap()*) or a burst-wise visualization (functions *bkml()* or *bmap()*) [@embc2015]; ```{r, eval=FALSE} # point-wise kml doc generation; # display=TRUE launches google-earth from within R; pkml(bc, display=TRUE) ``` By default, the *kml* or *html* documents are named with a *Sys.time()* based name and saved in a folder *embcdocs* automatically generated in the user's home directory. This can be modified by means of the corresponding parameters. The burst-wise visualization requires the computation of burst segments and midpoints. This is computed only the first time that a burst-visualization of a trajectory is requested. In case of long trajectories, this process can take some time. ### Plot intermediate variables Intermediate data computed by the *stbc()* constructor and stored in the *binClstPath* object can be easily plotted with automatic formatting and labeling of axes; ```{r, fig.width=5, fig.height=4.8, fig.align='center'} # plotting time-spans, distances and heading directions; # this is the default behavior when we just pass the binClstPath instance; varp(mybcp) ``` ```{r, fig.width=5, fig.height=3.5, fig.align='center'} # plotting input data (estimated local values of velocity and turn); varp(mybcp@X) ``` ```{r, fig.width=5, fig.height=3.5, fig.align='center'} # plotting certainties associated to each data-point (and input feature) varp(mybcp@U) ``` Indeed, the function *varp()* is a wrapper for the R *plot()* function. The purpose of this function is simply to ease the visualization of intermediate variables by formatting and labeling the axes accordingly to each one. ## Using *Move* objects from the R-package *move* **Note: the dependency with respect to the *move* R-package has been dropped, and the use of the old *binClstMove* objects is now deprecated. Nonetheless *Move* objects can still be passed directly to the *stbc()* function.** This is intended for users having trajectories in *Movebank* () and familiarized with the *move* R-package. Let's use the *leroy* data in the *Move R-package*. ```{r, message=FALSE, warning=FALSE} library(move) data(leroy) ``` *leroy* is a GPS trajectory of an urban Fisher (*Martes pennati*) with `r length(leroy@timestamps)` tracks, spanning from `r leroy@timestamps[1]` to `r leroy@timestamps[length(leroy@timestamps)]`, with a mean time interval between tracks of `r round(mean(diff(leroy@timestamps)),1)` minutes. *Move* objects can be passed directly to the *stbc()* constructor; ```{r, warning=FALSE} # leroy is passed directly to the constructor leroybc <- stbc(leroy, info=-1) ``` ## Trivariate clustering: including a daytime covariate Daytime covariates refer to the solar position. This can be given as solar height in degrees above the horizon (night/day distinction), or by solar azimuth in degrees from north (sunrise/sunset distinction). Including daytime covariates is the natural way of incorporating time information in the clustering of an animal's movement trajectory, with the potential advantage of increasing the maximum number of output clusters to $2^3=8$, *i.e.* the number of movement behaviors that can potentially be distinguish. A trivariate clustering including a daytime covariate is done by means of the parameter *scv* with possible values 'height' or 'azimuth'; ```{r, } leroybc3 <- stbc(leroy, scv='height', info=-1) ``` The output of the *stbc()* constructor is still a *binClstPath* (the *binClstMove* object of previous versions is deprecated). As we included a covariate, *leroybc3* corresponds now to a trivariate binary clustering and therefore its functionality presents some particularities. Let's see the statistics of the clustering; ```{r, } stts(leroybc3) ``` Features are ordered as X1:daytime, X2:velocity and X3:turn. Note that highs and lows for daytime (the solar height in degrees above the horizon) do not necessarily correspond to daytime or night-time clusters (note the negative mean for X1 in HXX clusters). This is so because almost all of the activity of this animal happens during the night and it is more likely to discern different behaviors along night-time. ## Trivariate clustering scatter plot By default, the *sctr()* function of a trivariate clustering depicts a double scatter-plot corresponding to low and high values of the covariate respectively. This can be changed by means of the parameter *showVars=c()*. ```{r, fig.width=6, fig.height=3.5, fig.align='center'} sctr(leroybc3, showVars=c(1, 2, 3)) # showVars=c(1,2,3) is the default option and it is only shown for illustrative purposes # by default the background colour is set to light-grey to enhance visibility # the "bg"" parameter allows changing this default behavior ``` If the R-package *rgl* is installed one can use the function *sct3()* to get a dynamic 3D (i.e. can be zoomed and rotated) plot, more useful for a visual understanding of the clusters. ```{r, eval=FALSE} sct3(leroybc3, showClst=c(5, 6, 7, 8)) # with showClst=c() we can restrict the plot to a particular subset of clusters ``` The *sct3()* function is defined for and inherited from the *binClst* class, and therefore intended for a general multivariate clustering. If the number of input features is greater than 3 and *showVars=c()* is not specified, the first three variables are used by default. ## Smoothing When clustering a time series the EMbC disregards the temporal information. As a result, the output labeling may reveal small (possibly irrelevant) changes in behavior framed in a broader temporal context (*e.g.* a long-term predominant behavioral mode). The package includes two possibilities to account for the temporal information in the time series and smooth out the fine grain locality of the output labeling. The *smth()* function applies a post-smoothing procedure [@embc2015] to the output labeling and returns a smoothed copy of the input instance; ```{r, } # dlta is the maximum likelihood difference to accept a relabeling # dlta=1 (accept all changes) is the default behavior postbc3 <- smth(leroybc3, dlta=0.9) ``` Alternatively, a pre-smoothing of the input data is also possible by means of the parameter *smth* of the *stbc()* constructor. ```{r, } # smth sets the smoothing time window length in hours prebc3 <- stbc(leroy, smth=1, scv='height', info=-1) ``` The *lblp()* function allows comparing two output labelings, adding a bottom line indicating the differences; ```{r, fig.width=6, fig.height=3.5, fig.align='center'} lblp(postbc3, smth(prebc3), lims=c(200, 600)) # of note: # although performing a pre-smoothing, we can still aply a post-smoothing; # there is no real need to instantiate the smoothed copy of prebc3; # this is useful for saving memory in case of long trajectories; ``` ## Relabeling Note that by pre-smoothing the input data, cluster 5 (HLL) has been merged into cluster 6 (HLH) and we get a final clustering with only 7 different behaviors. When merging occurs, the semantics of the final labeling is somewhat misleading because the final labeling is only a result of how the algorithm evolved until reaching the merging point. In any case, the label should be read as HLX, that is, by taking into account that the last feature (in this case the turn) is meaningless given the values of the rest (*i.e* turn can be either H or L given H values of daytime and L values of velocity). Using the *pkml()* function we can visualize which locations correspond to cluster HLH; ```{r, eval=FALSE} pkml(smth(prebc3), showClst=6, display=TRUE) ``` \begin{figure}\centering \includegraphics[width=12cm,height=5.5cm]{pkml0.png} \caption{Fisher (\textit{Martes pennati}) foraging trajectory. A kml point-wise view of the annotated trajectory showing only the HLH locations. The blob shows the spatial clustering of these locations most likely indicating the nest.} \label{pkml0} \end{figure} By combining the spatially clustered distribution of locations HLX (Figure \ref{pkml0}) with the semantics of the cluster (high daytime, low velocity), we could tell that these locations are most probably indicating the nests. Obviously, the package does not deal with labels like HLX. However, one can change labels as desired (even to manually force the merging of two clusters). In this case, we would probably feel more comfortable by relabeling the cluster HLH (cluster number 6) as HLL (cluster number 5) to suggest a more clear semantics of *resting* behavior; ```{r, } rlbl(prebc3, 6, 5) ``` Note that the function *rlbl()* does not return a relabeled copy of the input instance, instead it relabels the self instance. Nonetheless, the parameters of the clustering remain unchanged. The relabeling is effective only for visualization purposes and can be easily reversed by means of the parameter "reset". ## Validation versus external information The *chkp()* function is similar to *lblp()* but plots the labeling profile versus a control variable (*e.g.* environmental information). The control variable must be given as a numeric vector that is depicted as coloured background bars (with specific parameters to control the colouring and legend labels); ```{r, fig.width=6, fig.height=3, fig.align='center'} chkp(smth(prebc3), lims=c(200, 600)) # the solar height is the control variable used by default; # note the relabeling we did before; ``` # Class: binClstStck The *binClstStck* is an extension (not a child class) of the *binClstPath* class particularly designed to work with multiple trajectories. This is intended for population level analysis from trajectories of several individuals, or period level analysis by splitting long trajectories of several years. To illustrate this let's figure out two trajectories from our example path, simulating two different individuals; ```{r, fig.width=3.2, fig.height=3.3, fig.show='hold'} tmp <- runif(nrow(expth)) # simulated trajectory of individual 1 expth1 <- expth[which(tmp<=0.5), ] # simulated trajectory of individual 2 expth2 <- expth[which(tmp>=0.5), ] ``` To perform the clustering of a stack of trajectories we pass the individual trajectories to the *stbc()* constructor as a list (either of data.frame trajectories, *Move* objects, or a mixture of them); ```{r, } # we can combine data.fame trajectories and move objects # only for illustrative purposes !!! mystck <- stbc(list(expth1, expth2, leroy), info=-1) ``` In this case, the *stbc()* constructor returns an instance of the *binClstStck* class. In general, all the functionality described for a *binClst* class will work for a *binClstStck* instance; ```{r, } stts(mystck) ``` ```{r, fig.width=5, fig.height=3.5, fig.align='center'} sctr(mystck) ``` The exception is the *cnfm()* function. This function will work only if expert's labeling is supplied for all trajectories in the stack (in our example, *leroy* does not have expert's labeling); ```{r, } cnfm(mystck) # this will only work when expert labeling is given for all trajectories in the stack ``` ## binClstStck slots It is worth noting that a *binClstStck* instance is not a binary clustering object itself. Instead, it is an object with two slots: ```{r, } slotNames(mystck) ``` - slot mystck@bC is a *binClst* object with the population level clustering, thus it has no path associated, and functions like *view()*, *pkml()* or *bkml()* will not work; ```{r, } class(mystck@bC) ``` - slot mystck@bCS is a list of *binClstPath* objects, which are the results of the population level clustering upon each individual; ```{r, } class(mystck@bCS) ``` Each element in mystck@bCS is a *binClstPath* instance corresponding to each individual path given in the input data list; ```{r, } lapply(mystck@bCS, class) ``` It is important to keep this in mind when applying the above functions to either the population (mystck@bC, a *binClst* instance) or the individual (mystck@bCS[[i]], a *binClstPath* instance) levels. ## Select an individual out of the stack For ease of use, the function *slct()* allows selecting an individual's clustering out of the population level; ```{r, } bcInd1 <- slct(mystck,1) ``` As usual, it is not necessary to instantiate each individual; ```{r, fig.width=3.7, fig.height=3.2, fig.show='hold'} sctr(slct(mystck, 1)) # left panel sctr(slct(mystck, 3)) # right panel ``` ```{r, } # sctr(slct(mystck,1)) yields the same output as sctr(bcInd1) or sctr(mystck@bCS[[1]]); ``` ## Comparing individual's behavior with population's average behavior We can use all the functionality of a *binClstPath* object that allows comparisons (*e.g.* *sctr()*, *lblp()*, *cnfm()*) to make numeric assessments or visualizations of diferences among individuals or among individuals and population: - we can compare individuals with their correspondent out of the population clustering; ```{r, } cnfm(stbc(expth1, info=-1), slct(mystck, 1)) # stbc(expth1, info=-1) is the individual level clustering corresponding to individual 1; # slct(mystck, 1) is the population level clustering corresponding to individual 1; ``` - or we can compare individuals within the population; ```{r, fig.width=6, fig.height=3.5, fig.align='center'} lblp(slct(mystck, 1), slct(mystck, 2)) ``` # References