--- title: "C. Unsupervised multiblock analysis" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{C. Unsupervised multiblock analysis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=6, fig.height=4 ) # Legge denne i YAML på toppen for å skrive ut til tex #output: # pdf_document: # keep_tex: true # Original: # rmarkdown::html_vignette: # toc: true ``` ```{r setup} # Start the multiblock R package library(multiblock) ``` # Unsupervised methods The following unsupervised methods are available in the _multiblock_ package (function names in parentheses): * SCA - Simultaneous Component Analysis (_sca_) * GCA - Generalised Canonical Analysis (_gca_) * GPA - Generalised Procrustes Analysis (_gpa_) * MFA - Multiple Factor Analysis (_mfa_) * PCA-GCA (_pcagca_) * DISCO - Distinct and Common Components with SCA (_disco_) * HPCA - Hierarchical Principal component analysis (_hpca_) * MCOA - Multiple Co-Inertia Analysis (_mcoa_) * JIVE - Joint and Individual Variation Explained (_jive_) * STATIS - Structuration des Tableaux à Trois Indices de la Statistique (_statis_) * HOGSVD - Higher Order Generalized SVD (_hogsvd_) The following sections will describe how to format your data for analysis and invoke all methods from the list above. # Formatting data for multiblock data analysis Data blocks are best stored as named lists for use with unsupervised methods in this package. If also column names and row names are used for all blocks, these can be used for easy labelling in plots supplied by the _pls_ package. See examples below for illustrations of this. ```{r} # Load potato data data(potato) class(potato) # data.frames can contain matrices as variables, # thus becoming object linked lists of blocks. str(potato[1:3]) # Explicit conversion to a list potList <- as.list(potato[1:3]) str(potList) ``` # Method interfaces All unsupervised methods supplied by this package share a common interface which expects a list of blocks as the first input. Methods that are imported from other packages are wrapped in a function that gives the mentioned interface. Results from the imported method are stored in a separate slot in the output in case specialised _plot_ or _summary_ functions are available or direct inspection is needed. If default parameters are used, a single list of blocks with suitably linked matrices (shared objects or variables) will result in a basic analysis (see first code block below). In addition, all methods have parameters that control their behaviour, e.g., number of components, convergence criteria etc. ## Shared sample mode The following block of code loads a multiblock data set, extracts three blocks and runs through all included unsupervised methods having shared sample mode using the same interface. ```{r} # Object linked data data(potato) potList <- as.list(potato[c(1,2,9)]) suppressWarnings( # FactoMineR <=2.3 uses recycling of length 1 array. invisible({capture.output({ # DISCOsca in package RegularizedSCA is highly verbose. pot.sca <- sca(potList) pot.gca <- gca(potList) pot.gpa <- gpa(potList) pot.mfa <- mfa(potList) pot.pcagca <- pcagca(potList) pot.disco <- disco(potList) pot.hpca <- hpca(potList) pot.mcoa <- mcoa(potList) })})) ``` ## Shared variable mode The following block of code loads a sensory data set, extracts blocks and runs through all included unsupervised methods having shared variable mode using the same interface. ```{r} # Shared variable mode data data(candies) candyList <- lapply(1:nlevels(candies$candy), function(x)candies$assessment[candies$candy==x,]) invisible({capture.output({ # jive in package r.jive is highly verbose. can.sca <- sca(candyList, samplelinked = FALSE) can.jive <- jive(candyList) can.statis <- statis(candyList) can.hogsvd <- hogsvd(candyList) })}) ``` ## Common output elements across methods Output from all methods include slots called _loadings_, _scores_, _blockLoadings_ and _blockScores_, or a suitable subset of these. An _info_ slot describes which types of (block) loadings/scores are in the output. There may be various extra elements in addition to the common elements, e.g. coefficients, weights etc. ```{r} # SCA used with shared variable mode data returns block loadings and common scores: names(pot.sca) summary(pot.sca) # MFA stores individual PCA scores and loadings as block elements: names(pot.mfa) summary(pot.mfa) ``` ## Scores and loadings Functions for accessing scores and loadings are based on functions from the _pls_ package, but extended with a _block_ parameter to allow extraction of common/global scores/loadings and their block counterparts. The default block is 0, corresponding to the common/global block. Block scores/loadings can be accessed by number or name. ```{r} # Global scores plotted with object labels scoreplot(pot.sca, labels = "names") ``` ```{r} # Block loadings for Sensory block with variable labels in scatter format loadingplot(pot.sca, block = "Sensory", labels = "names") ``` ```{r} # Non-existing elements are swapped with existing ones with a warning. sc <- scores(pot.sca, block = 1) ``` ## Plot from imported package Some methods in the package are wrappers for imported methods from other packages. Using the stored object from an imported method (see \$statis code below), one can exploit methods from the original package to expand on the methods available in this package. An example is the summary plot for the _statis_ method in the package _ade4_. ```{r} # Apply a plot function from ade4 (no extra import required). plot(can.statis$statis) ```