---
title: "Overview of MixtComp Object"
author: "Quentin Grimonprez"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Overview of MixtComp Object}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

`mixtCompLearn` returns an object of class *MixtCompLearn* and *MixtComp*, whereas `mixtCompPredict` returns an object of class *MixtComp*.

## MixtComp Object

Overview of the output object with variables named *categorical*, *gaussian*, *rank*, *functional*, *poisson*, *nBinom* and *weibull*, with models *Multinomial*, *Gaussian*, *Rank_ISR*, *Func_CS* (or *Func_SharedAlpha_CS*), *Poisson*, *NegativeBinomial* and *Weibull* respectively.

In case of a successful run, the output object is a list of lists organized as follows:

```text
output
|_______ algo __ nbBurnInIter
|             |_ nbIter
|             |_ nbGibbsBurnInIter
|             |_ nbGibbsIter
|             |_ nInitPerClass
|             |_ nSemTry
|             |_ mode
|             |_ nInd
|             |_ confidenceLevel
|             |_ nClass
|             |_ ratioStableCriterion
|             |_ nStableCriterion
|             |_ basicMode
|             |_ hierarchicalMode
|
|_______ mixture __ BIC
|                |_ ICL
|                |_ lnCompletedLikelihood
|                |_ lnObservedLikelihood
|                |_ IDClass
|                |_ IDClassBar
|                |_ delta
|                |_ runTime
|                |_ nbFreeParameters
|                |_ completedProbabilityLogBurnIn
|                |_ completedProbabilityLogRun
|                |_ lnProbaGivenClass
|
|_______ variable __ type __ z_class
                  |       |_ categorical
                  |       |_ gaussian
                  |       |_ ...
                  |
                  |_ data __ z_class __ completed
                  |       |          |_ stat
                  |       |_ categorical __ completed
                  |       |              |_ stat
                  |       |_ ...
                  |       |_ functional __ data
                  |                     |_ time
                  |
                  |_ param __ z_class __ stat
                           |          |_ log
                           |          |_ paramStr
                           |_ functional __ alpha __ stat
                           |             |        |_ log
                           |             |_ beta __ stat
                           |             |       |_ log
                           |             |_ sd __ stat
                           |             |     |_ log
                           |             |_ paramStr
                           |_ rank __ mu __ stat
                           |       |     |_ log
                           |       |_ pi __ stat
                           |       |     |_ log
                           |       |_ paramStr
                           |
                           |_ gaussian __ stat
                           |           |_ log
                           |           |_ paramStr
                           |_ poisson __ stat
                           |          |_ log
                           |          |_ paramStr
                           |_ ...
```

### warnLog

In case of an unsuccessful run, the output object is a list containing an element **warnLog** with all the warnings returned by MixtComp.

### algo

A copy of the *algo* parameter.

- **nbBurnInIter** Number of iterations of the burn-in part of the SEM algorithm.
- **nbIter** Number of iterations of the SEM algorithm.
- **nbGibbsBurnInIter** Number of iterations of the burn-in part of the Gibbs algorithm.
- **nbGibbsIter** Number of iterations of the Gibbs algorithm.
- **nInitPerClass** Number of individuals used to initialize each cluster.
- **nSemTry** Number of attempts of the algorithm to avoid an error.
- **confidenceLevel** Confidence level of the confidence bounds for parameter estimation.
- **ratioStableCriterion** Partition stability ratio required to stop the SEM early.
- **nStableCriterion** Number of iterations of partition stability required to stop the SEM early.
- **nInd** Number of samples in the dataset.
- **nClass** Number of classes of the mixture.
- **mode** "predict" for `mixtCompPredict` or "learn" for `mixtCompLearn`.
- **basicMode** If TRUE, `mixtCompLearn` has run in basic mode (mode using classic R formatting for missing data and automatic detection of the model).
- **hierarchicalMode** If TRUE, `mixtCompLearn` has run in hierarchical mode (learn a model with two classes, then split each class in two, and so on).

### mixture

- **BIC** value of BIC
- **ICL** value of ICL
- **nbFreeParameters** number of free parameters of the mixture model
- **lnObservedLikelihood** observed log-likelihood
- **lnCompletedLikelihood** completed log-likelihood
- **IDClass** entropy used to compute the discriminative power (see the `computeDiscrimPowerVar` function)
- **IDClassBar** entropy used to compute the discriminative power (see the `computeDiscrimPowerVar` function)
- **delta** entropy used to compute the similarities between variables (see the `heatmapVar` function)
- **completedProbabilityLogBurnIn** evolution of the completed log-probability during the burn-in period (can be used
to check the convergence and determine the ideal number of iterations)
- **completedProbabilityLogRun** evolution of the completed log-probability after the burn-in period (can be used to check the convergence and determine the ideal number of iterations)
- **runTime** a list containing the execution time, in seconds, of the different parts of the algorithm
- **lnProbaGivenClass** log of the probability of each sample for each class times the proportion of the class: $\log(\pi_k)+\log(P(X_i|z_i=k))$

### variable

#### type

Named list (according to variable names) containing the model used for each variable (e.g. "Gaussian").

#### data

Except for functional models and LatentClass, *data* contains, for each variable, two elements: *completed* and *stat*. *completed* contains the completed data and *stat* contains statistics about the completed data. The format is detailed below according to the model.

- **LatentClass** Two elements: *completed* and *stat*. *completed* contains the completed data. *stat* is a matrix with as many columns as there are classes. For each sample, it contains the $t_{ik}$ (probability that $x_i$ belongs to class $k$) estimated from the values imputed during the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm.
- **Gaussian/Poisson/NegativeBinomial/Weibull** *stat* is a matrix where each row corresponds to a missing value and contains 4 elements: the index of the missing value, then the median, 2.5% quantile and 97.5% quantile (if the confidenceLevel parameter is set to 0.95) of the values imputed during the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm.
- **Multinomial** *stat* is a named list where each element corresponds to a missing value. The name of the element is the index of the missing value. Each element is a matrix containing the values imputed during the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm, and their frequencies.
- **Rank_ISR** *stat* is a named list where each element corresponds to a missing value. The name of the element is the index of the missing value. Each element is a matrix containing the values imputed during the Gibbs sampler at the end of each iteration after the burn-in phase of the algorithm, and their frequencies.
- **Func_CS** and **Func_SharedAlpha_CS** Two elements: *data* and *time*. *time* (resp. *data*) is a list containing the time (resp. value) vector of the functional for each sample.
- **Other Models** One element: *completed*, a matrix/vector containing the completed version of the dataset.

#### param

For each variable, it contains a list with the estimated parameters (*stat*), the log recorded during the SEM (*log*) and the hyperparameters, if any (*paramStr*). The output format depends on the model but, in most cases, *stat* is a matrix with 3 columns containing the median values of the estimated parameters and the quantiles at the desired confidence level, *log* is a matrix containing the parameters estimated during the M step of each iteration of the algorithm after the burn-in phase, and *paramStr* is a string. For the meaning of the parameters, refer to the [data format](dataFormat.html) documentation.

- **LatentClass** A list of 3 elements: *stat*, *log*, *paramStr*. *log* is a matrix containing the proportions estimated during the M step of each iteration of the algorithm after the burn-in phase. *stat* is a matrix containing the median (and the quantiles corresponding to the confidenceLevel parameter) of the estimated proportions. The median proportions are the returned proportions. *paramStr* contains `""`.
- **Gaussian** The *stat* matrix has 2\*nClass rows. For a class $k$, the parameters are the mean ($\mu_k$) and the standard deviation ($\sigma_k$).
- **Poisson** The *stat* matrix has nClass rows. For a class $k$, the parameter is lambda ($\lambda_k$).
- **NegativeBinomial** The *stat* matrix has 2\*nClass rows. For a class $k$, the parameters are n ($n_k$) and p ($p_k$).
- **Weibull** The *stat* matrix has 2\*nClass rows. For a class $j$, the parameters are k (shape, $k_j$) and lambda (scale, $\lambda_j$).
- **Multinomial** *paramStr* contains `"nModality: J"` where $J$ is the number of modalities. The *stat* matrix has J\*nClass rows. For a class $k$, the parameters are the probabilities of belonging to each modality $j = 1, \dots, J$.
- **Rank_ISR** *paramStr* contains `"nModality: J"` where $J$ is the length of the rank (number of sorted objects). Two lists (named *mu* and *pi*) of 2 elements: *stat*, *log*. For *pi*, *stat* is a matrix with nClass rows. For a class $k$, the parameter is pi ($\pi_k$). For *mu*, *stat* is a list with nClass elements. For a class $k$, a list is returned with the mode of the parameter ($\mu_k$) and the frequency of the mode during the SEM algorithm after the burn-in phase.
- **Func_CS** and **Func_SharedAlpha_CS** *paramStr* contains `"nSub: S, nCoeff: C"` where $S$ is the number of subregressions and $C$ the number of coefficients of each regression. Three lists (named *alpha*, *beta* and *sd*) of 2 elements: *stat*, *log*. For *alpha*, *stat* is a matrix with 2\*S\*nClass rows. For a class $k$ and a subregression $s$, the parameters are the estimated coefficients of a logistic regression controlling the transition between subregressions. For *beta*, *stat* is a matrix with S\*C\*nClass rows. For a class $k$ and a subregression $s$, the parameters are the estimated coefficients of the regression. For *sd*, *stat* is a matrix with S\*nClass rows. For a class $k$ and a subregression $s$, the parameter is the standard deviation of the residuals of the regression.

## MixtCompLearn Object

A *MixtCompLearn* object is the output of the `mixtCompLearn` function. It contains one or several *MixtComp* objects.

- **nClass** A vector containing the numbers of classes tested.
- **crit** ICL and BIC values for each value of *nClass*.
- **criterion** "BIC" or "ICL", the criterion used to choose the number of classes.
- **algo**, **mixture**, **variable**, **warnLog** Elements of the *MixtComp* object associated with the best number of classes.
- **res** A list containing one *MixtComp* object per number of classes. The first element (`res[[1]]`) corresponds to the *MixtComp* object for a number of classes of `nClass[1]`.
- **nRun** Number of runs for each number of classes.
- **totalTime** Total running time.
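
The elements described above can be navigated with ordinary list indexing. The chunk below is a minimal sketch on a hand-built mock, not on a real MixtComp result: `makeMock`, `learnObj` and all numeric values are hypothetical, the layout of *crit* as a matrix with one row per criterion is an assumption, and so is the convention that the larger criterion value is preferred. It picks the entry of *res* matching the best criterion value, then turns *lnProbaGivenClass* into normalized class probabilities by Bayes' rule.

```r
# Hypothetical mock that mimics the documented layout; it is NOT produced by
# mixtCompLearn and all values are made up for illustration.
makeMock <- function(nClass, bic, icl, logProba) {
  list(
    algo = list(nClass = nClass, nInd = nrow(logProba), mode = "learn"),
    mixture = list(BIC = bic, ICL = icl, lnProbaGivenClass = logProba)
  )
}

# log(pi_k) + log(P(X_i | z_i = k)) for 3 samples, shifted by per-sample
# constants so that the rows are not normalized.
shift <- c(-10, -12, -8)
logProba2 <- log(matrix(c(0.8, 0.3, 0.4,
                          0.2, 0.7, 0.6), nrow = 3)) + shift
logProba3 <- log(matrix(c(0.6, 0.1, 0.2,
                          0.3, 0.7, 0.2,
                          0.1, 0.2, 0.6), nrow = 3)) + shift

learnObj <- list(
  nClass = c(2, 3),
  criterion = "BIC",
  # Assumed layout: one row per criterion, one column per value of nClass.
  crit = matrix(c(-520.4, -498.7,
                  -531.0, -515.2),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("BIC", "ICL"), NULL)),
  res = list(
    makeMock(2, -520.4, -531.0, logProba2),
    makeMock(3, -498.7, -515.2, logProba3)
  )
)

# Select the run with the best criterion (assumption: larger is better).
best <- which.max(learnObj$crit[learnObj$criterion, ])
bestNClass <- learnObj$nClass[best]
bestModel <- learnObj$res[[best]]

# Normalize each row of lnProbaGivenClass with the log-sum-exp trick:
# t_ik = exp(l_ik) / sum_k exp(l_ik).
logProba <- bestModel$mixture$lnProbaGivenClass
rowMax <- apply(logProba, 1, max)
tik <- exp(logProba - rowMax)  # rowMax is recycled down the rows
tik <- tik / rowSums(tik)

# Most probable class for each sample.
partition <- max.col(tik, ties.method = "first")
```

Subtracting the row maximum before exponentiating avoids numerical underflow when the log-probabilities are strongly negative, which is common with many variables.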