Matrix Variate Mixture Models with the t distribution

Geoffrey Thompson

2021-11-15

library(MixMatrix)

Matrix Variate Mixture Modeling with the \(t\) Distribution

The matrix variate t distribution was introduced in a previous vignette along with an EM algorithm for maximum likelihood fitting of the parameters. This can be extended rather easily to the case of mixture models for model-based clustering.

As in the case of mixture modeling in general (Fraley and Raftery 2002; McLachlan, Lee, and Rathnayake 2019), the difference in the EM algorithm is that each step now also estimates the mixing proportions \(\pi_{j}\) for \(j\) in \(1,2, \ldots, g\) (the probabilities of membership in each of the \(g\) groups) and the weights \(\tau_{ij}\) for each observation \(i\) and group \(j\), where \[\pi_j = \frac{1}{n}\sum_{i = 1}^n \tau_{ij}\] and \[ \tau_{ij} = \frac{\pi_j f(x_i, \Theta_j)}{\sum_{l=1}^g \pi_l f(x_i, \Theta_l)}\]
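The E-step updates above can be sketched in a few lines of base R (a toy illustration, not the package's internal code; `dens` here stands in for a hypothetical \(n \times g\) matrix of values \(\pi_j f(x_i, \Theta_j)\)):

```r
# Toy n = 3, g = 2 matrix of pi_j * f(x_i, Theta_j) values (made-up numbers)
dens <- matrix(c(0.2, 0.1, 0.05, 0.3, 0.4, 0.15), nrow = 3, ncol = 2)
tau <- dens / rowSums(dens)  # tau_ij: each row normalized to sum to 1
pi_hat <- colMeans(tau)      # pi_j: mean membership weight for each group
```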

The case of the matrix variate normal distribution can be seen in Viroli (2011), while the case of the multivariate \(t\) can be seen in Andrews, McNicholas, and Subedi (2011).

The updates of the parameters \(\Theta\) are weighted by \(\tau_{ij}\) in an Expectation/Conditional Maximization (ECM) algorithm.
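As a sketch of such a weighted update (an illustration only, not the package's implementation), the center of group \(j\) can be computed as the \(\tau\)-weighted average of the observed matrices:

```r
# Sketch: tau-weighted update of one group's center for 3 x 4 observations
set.seed(1)
x <- array(rnorm(3 * 4 * 10), dim = c(3, 4, 10))  # 10 matrix observations
tau_j <- runif(10)                                # weights for group j
# multiply each slice x[,,i] by tau_j[i], sum over observations, normalize
center_j <- apply(sweep(x, 3, tau_j, "*"), c(1, 2), sum) / sum(tau_j)
```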

Usage

The matrixmixture() function currently fits unrestricted covariance matrices; a teigen-type covariance restriction capability for use with the \(t\) distribution is planned for a future release. Means can be constrained to be constant along rows, columns, or both using the row.mean = TRUE and col.mean = TRUE settings.

Currently, the function fits models with unrestricted covariance matrices and a fixed degrees-of-freedom (nu) parameter, or models based on the matrix normal distribution. It does not address the label-switching identifiability problem: permutations of the group labels yield identical solutions.
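The label-switching problem is easy to see in a toy univariate mixture (a base R illustration, not matrix variate code): permuting the component labels, together with their weights, leaves the mixture density unchanged.

```r
# Two-component normal mixture density (toy example)
dmix <- function(x, pi, mu) pi[1] * dnorm(x, mu[1]) + pi[2] * dnorm(x, mu[2])
x <- seq(-3, 3, by = 0.5)
d1 <- dmix(x, pi = c(0.3, 0.7), mu = c(0, 1))  # labels (1, 2)
d2 <- dmix(x, pi = c(0.7, 0.3), mu = c(1, 0))  # labels swapped
all.equal(d1, d2)  # TRUE: the two labelings are indistinguishable
```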

matrixmixture function

The function takes a data array x; either an argument K specifying the number of groups or an initial vector of probabilities prior; an optional initialization object init containing centers and covariance matrices (if the covariances are omitted, they are initialized to identity matrices); and optional arguments controlling the other parameters of the function, such as the number of iterations and the choice of normal versus \(t\) distribution. If model = "t" is chosen, the degrees of freedom nu must be provided, though future versions may estimate it from the data.

library(MixMatrix)
 set.seed(20180221)
 A <- rmatrixt(30,mean=matrix(0,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 0
 B <- rmatrixt(30,mean=matrix(1,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 1
 C <- array(c(A,B), dim=c(3,4,60)) # combine into one array
 prior <- c(.5,.5) # equal probability prior
 # create an initialization object, starting at the true parameters
 init = list(centers = array(c(rep(0,12),rep(1,12)), dim = c(3,4,2)),
              U = array(c(diag(3), diag(3)), dim = c(3,3,2)),
              V = array(c(diag(4), diag(4)), dim = c(4,4,2))
             )
 # fit model
 res <- matrixmixture(C, init = init, prior = prior, nu = 10,
                      model = "t", tolerance = 1e-2)
 print(res$centers) # the final centers
#> , , 1
#> 
#>              [,1]        [,2]        [,3]        [,4]
#> [1,] -0.005608544  0.01263007  0.03606995  0.02008999
#> [2,] -0.036807465 -0.00922546 -0.02606685 -0.03205214
#> [3,] -0.068637713 -0.06653189 -0.00752340 -0.04191753
#> 
#> , , 2
#> 
#>           [,1]      [,2]      [,3]      [,4]
#> [1,] 0.9785204 0.9459763 0.9561746 0.9857805
#> [2,] 1.0928959 0.9807877 0.8435277 1.0341407
#> [3,] 1.1422649 1.0527232 0.9998141 1.0383794
 print(res$pi) # the final mixing proportion
#> [1] 0.4999994 0.5000006
 logLik(res)
#> 'log Lik.' -442.517 (df=54)
 AIC(logLik(res))
#> [1] 993.0339
 plot(res) # the log likelihood by iteration

The default method for determining convergence is based on Aitken acceleration of the log-likelihood. However, it can be set to stop based on changes in the log-likelihood instead.
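The Aitken-based criterion can be sketched as follows (an illustration of the standard rule, not the package's internal code): given the three most recent log-likelihood values, estimate the asymptotic log-likelihood and stop once it is within tolerance of the current value.

```r
# Aitken acceleration stopping rule (sketch)
aitken_converged <- function(ll, tol = 1e-2) {
  k <- length(ll)  # needs at least three stored log-likelihood values
  a <- (ll[k] - ll[k - 1]) / (ll[k - 1] - ll[k - 2])   # acceleration factor
  ll_inf <- ll[k - 1] + (ll[k] - ll[k - 1]) / (1 - a)  # asymptotic estimate
  abs(ll_inf - ll[k]) < tol
}
aitken_converged(c(-500, -450, -445, -444.5), tol = 1)     # TRUE
aitken_converged(c(-500, -450, -445, -444.5), tol = 1e-3)  # FALSE
```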

Initialization function

The package also provides a helper function, init_matrixmixture(), to construct the init object for you. At present, it can either run kmeans() on the vectorization of the input data to choose starting centers or select random observations as centers. The ... arguments are passed to kmeans() (so nstart or other similar arguments can be set). If a partially formed init object is supplied, the initializer will complete it; however, it will not validate that, for instance, the supplied covariance matrices are valid. Partial specification of initial centers is also supported: if fewer centers than groups are provided, the remainder are chosen by whichever method is selected.

init_matrixmixture(C, prior = c(.5,.5), centermethod = 'kmeans')
#> $centers
#> , , 1
#> 
#>             [,1]       [,2]        [,3]         [,4]
#> [1,]  0.11412574 -0.1191619 -0.04078157  0.127204662
#> [2,] -0.09798116  0.1763393 -0.02998913  0.001676302
#> [3,]  0.16767860 -0.1738115 -0.20268828 -0.079679047
#> 
#> , , 2
#> 
#>           [,1]      [,2]      [,3]      [,4]
#> [1,] 1.0480976 1.0029779 1.0264273 1.0225414
#> [2,] 0.9755583 1.0876427 1.0131553 1.2830619
#> [3,] 0.9418708 0.9333342 0.8656007 0.9025196
#> 
#> 
#> $U
#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    1    0
#> [3,]    0    0    1
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    1    0
#> [3,]    0    0    1
#> 
#> 
#> $V
#> NULL
init_matrixmixture(C, K = 2, centermethod = 'random')
#> $centers
#> , , 1
#> 
#>          [,1]      [,2]      [,3]      [,4]
#> [1,] 1.678144 1.1288934 1.0701532 0.8945011
#> [2,] 1.452156 1.0876125 0.6176221 1.0579284
#> [3,] 1.271707 0.8391508 1.0755241 1.3799744
#> 
#> , , 2
#> 
#>             [,1]        [,2]       [,3]        [,4]
#> [1,] -0.13670235 -0.22542286 -0.1013062  0.08921306
#> [2,]  0.02554019  0.07765350 -0.1971374 -0.09639281
#> [3,] -0.47626926 -0.03061907  0.1719943 -0.17784713
#> 
#> 
#> $U
#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    1    0
#> [3,]    0    0    1
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    1    0
#> [3,]    0    0    1
#> 
#> 
#> $V
#> NULL

Session Information

sessionInfo()
#> R version 4.0.5 (2021-03-31)
#> Platform: x86_64-redhat-linux-gnu (64-bit)
#> Running under: Fedora 34 (MATE-Compiz)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] magrittr_2.0.1  dplyr_1.0.7     ggplot2_3.3.5   MixMatrix_0.2.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7        highr_0.9         bslib_0.3.0       compiler_4.0.5   
#>  [5] pillar_1.6.4      jquerylib_0.1.4   tools_4.0.5       digest_0.6.28    
#>  [9] jsonlite_1.7.2    evaluate_0.14     lifecycle_1.0.1   tibble_3.1.5     
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.12      DBI_1.1.1        
#> [17] yaml_2.2.1        xfun_0.26         fastmap_1.1.0     withr_2.4.2      
#> [21] stringr_1.4.0     knitr_1.34        generics_0.1.0    sass_0.4.0       
#> [25] vctrs_0.3.8       tidyselect_1.1.1  grid_4.0.5        glue_1.4.2       
#> [29] R6_2.5.1          fansi_0.5.0       rmarkdown_2.11    farver_2.1.0     
#> [33] purrr_0.3.4       scales_1.1.1      htmltools_0.5.2   ellipsis_0.3.2   
#> [37] assertthat_0.2.1  CholWishart_1.1.0 colorspace_2.0-2  labeling_0.4.2   
#> [41] utf8_1.2.2        stringi_1.7.4     munsell_0.5.0     crayon_1.4.1

All the code for easy copying

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(MixMatrix)
 set.seed(20180221)
 A <- rmatrixt(30,mean=matrix(0,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 0
 B <- rmatrixt(30,mean=matrix(1,nrow=3,ncol=4), df = 10) # 3x4 matrices with mean 1
 C <- array(c(A,B), dim=c(3,4,60)) # combine into one array
 prior <- c(.5,.5) # equal probability prior
 # create an initialization object, starting at the true parameters
 init = list(centers = array(c(rep(0,12),rep(1,12)), dim = c(3,4,2)),
              U = array(c(diag(3), diag(3)), dim = c(3,3,2)),
              V = array(c(diag(4), diag(4)), dim = c(4,4,2))
             )
 # fit model
 res <- matrixmixture(C, init = init, prior = prior, nu = 10,
                      model = "t", tolerance = 1e-2)
 print(res$centers) # the final centers
 print(res$pi) # the final mixing proportion
 logLik(res)
 AIC(logLik(res))
 plot(res) # the log likelihood by iteration


init_matrixmixture(C, prior = c(.5,.5), centermethod = 'kmeans')

init_matrixmixture(C, K = 2, centermethod = 'random')

sessionInfo()

References

Andrews, Jeffrey L., Paul D. McNicholas, and Sanjeena Subedi. 2011. “Model-Based Classification via Mixtures of Multivariate T-Distributions.” Computational Statistics & Data Analysis 55 (1): 520–29. https://doi.org/10.1016/j.csda.2010.05.019.

Fraley, Chris, and Adrian E Raftery. 2002. “Model-Based Clustering, Discriminant Analysis, and Density Estimation.” Journal of the American Statistical Association 97 (458): 611–31. https://doi.org/10.1198/016214502760047131.

McLachlan, Geoffrey J, Sharon X Lee, and Suren I Rathnayake. 2019. “Finite Mixture Models.” Annual Review of Statistics and Its Application 6: 355–78. https://doi.org/10.1146/annurev-statistics-031017-100325.

Viroli, Cinzia. 2011. “Finite Mixtures of Matrix Normal Distributions for Classifying Three-Way Data.” Statistics and Computing 21 (4): 511–22. https://doi.org/10.1007/s11222-010-9188-x.