The **cvsem** package provides cross-validation (CV) of
structural equation models (SEM) across a user-defined number of folds.
CV is based on computing the discrepancy between the held-out test sample
covariance and the model-implied covariance from the training samples.
This approach to cross-validating SEMs is described in Cudeck and
Browne (1983) and Browne and Cudeck (1992). The individual models are fitted
via the **lavaan** package (Rosseel 2012) to obtain the model-implied
covariance matrix. The discrepancy between the implied matrix and the test
sample covariance matrix is obtained via a pre-specified metric
(defaults to the Kullback-Leibler divergence, also known as the maximum
likelihood discrepancy). The `cvsem` function returns the average
discrepancy together with a corresponding standard error for each tested
model.
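To make the procedure concrete, the following is a minimal sketch of k-fold cross-validation for a single model, written directly against **lavaan**. It is not the package's implementation: the helper names `ml_discrepancy` and `cv_sketch` are hypothetical, the discrepancy is the textbook maximum likelihood form, and the standard error is assumed to be the standard deviation of the fold discrepancies divided by the square root of `k`.

```r
library(lavaan)

# Textbook maximum likelihood discrepancy between a test covariance S and a
# model-implied covariance Sigma (assumed form, not taken from cvsem).
ml_discrepancy <- function(S, Sigma) {
  p <- nrow(S)
  as.numeric(determinant(Sigma)$modulus - determinant(S)$modulus) +
    sum(diag(S %*% solve(Sigma))) - p
}

# Hypothetical helper: k-fold CV of one lavaan model string.
cv_sketch <- function(model, data, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  disc <- vapply(seq_len(k), function(i) {
    train <- data[fold != i, , drop = FALSE]
    test  <- data[fold == i, , drop = FALSE]
    fit   <- lavaan::sem(model, data = train)
    # Model-implied covariance of the observed variables (training data)
    Sigma <- lavaan::lavInspect(fit, "cov.ov")
    # Sample covariance of the same variables in the held-out fold
    S     <- stats::cov(test[, colnames(Sigma), drop = FALSE])
    ml_discrepancy(S, Sigma)
  }, numeric(1))
  # Assumed summary: mean discrepancy and sd / sqrt(k) as standard error
  c(mean = mean(disc), se = stats::sd(disc) / sqrt(k))
}

res <- cv_sketch('x9 ~ x7 + x8', lavaan::HolzingerSwineford1939, k = 5)
res
```

The sketch uses the original `x7`, `x8`, `x9` column names of the **lavaan** example data so that it runs without any renaming step.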

Currently, the provided model code needs to follow one of
**lavaan**’s allowed specifications.

You can install the development version of **cvsem**
from GitHub with:

```
# install.packages("devtools")
devtools::install_github("AnnaWysocki/cvsem")
```

Cross-validating the HolzingerSwineford1939 dataset

Load package and read in data from the **lavaan**
package:

```
library(cvsem)
example_data <- lavaan::HolzingerSwineford1939
```

Assign descriptive column names:

```
colnames(example_data) <- c("id", "sex", "ageyr", "agemo", "school", "grade",
"visualPerception", "cubes", "lozenges", "comprehension",
"sentenceCompletion", "wordMeaning", "speededAddition",
"speededCounting", "speededDiscrimination")
```

Define some models to be compared with `cvsem` using `lavaan` notation:

```
model1 <- 'comprehension ~ sentenceCompletion + wordMeaning'

model2 <- 'comprehension ~ meaning
## Add some latent variables:
meaning =~ wordMeaning + sentenceCompletion
speed =~ speededAddition + speededDiscrimination + speededCounting
speed ~~ meaning'

model3 <- 'comprehension ~ wordMeaning + speededAddition'
```

Gather the models into a named list object with `cvgather`.
These could also be fitted `lavaan` objects based on the same
data.

`models <- cvgather(model1, model2, model3)`
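As an illustration of the second option, the sketch below gathers already fitted `lavaan` objects instead of model strings. The object names `fit_a` and `fit_b` and the two small regression models are made up for this example, and it assumes `cvgather` treats fitted objects the same way as strings, as the text above suggests; how the resulting list entries are named is not confirmed here.

```r
library(cvsem)
library(lavaan)

# Fit two illustrative models on the same data (original column names)
hs_data <- lavaan::HolzingerSwineford1939
fit_a <- lavaan::sem('x9 ~ x7 + x8', data = hs_data)
fit_b <- lavaan::sem('x9 ~ x7', data = hs_data)

# Gather the fitted objects rather than model strings
fitted_models <- cvgather(fit_a, fit_b)
```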

Define the number of folds `k` and call the `cvsem` function. Here we use
`k = 10` folds. CV is based on the discrepancy between the test sample
covariance matrix and the model-implied matrix from the training data.
The discrepancy between the sample and implied matrix is chosen via
`discrepancyMetric`. Currently three discrepancy metrics are available:
`KL-Divergence`, Generalized Least Squares `GLS`, and Frobenius Distance
`FD`. Here we use `KL-Divergence`.
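For orientation, the three metrics can be sketched in base R using their common textbook definitions. These are assumptions for illustration only: the function names `ml_disc`, `gls_disc`, and `fd` are hypothetical, and the package may use different scalings or forms internally. `S` stands for the test sample covariance matrix and `Sigma` for the model-implied matrix.

```r
# Maximum likelihood (KL-type) discrepancy:
# log|Sigma| - log|S| + tr(S Sigma^-1) - p
ml_disc <- function(S, Sigma) {
  p <- nrow(S)
  as.numeric(determinant(Sigma)$modulus - determinant(S)$modulus) +
    sum(diag(S %*% solve(Sigma))) - p
}

# Generalized least squares with S as weight matrix:
# 0.5 * tr[((S - Sigma) S^-1)^2]
gls_disc <- function(S, Sigma) {
  R <- (S - Sigma) %*% solve(S)
  0.5 * sum(diag(R %*% R))
}

# Frobenius distance: square root of the summed squared differences
fd <- function(S, Sigma) norm(S - Sigma, type = "F")

# Small made-up matrices to try the functions
S     <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
Sigma <- matrix(c(2, 0.3, 0.3, 1), 2, 2)
c(ML = ml_disc(S, Sigma), GLS = gls_disc(S, Sigma), FD = fd(S, Sigma))
```

All three functions return zero when `S` and `Sigma` are identical and grow as the implied matrix moves away from the test sample matrix.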

```
fit <- cvsem(data = example_data, Models = models, k = 10, discrepancyMetric = "KL-Divergence")
#> [1] "Cross-Validating model: model1"
#> [1] "Cross-Validating model: model2"
#> [1] "Cross-Validating model: model3"
```

Print the fitted `cvsem` object. Note that the model with the
smallest (best) discrepancy is listed first. The metric reflects the
average of the discrepancy metric across all folds (also known as the
expected cross-validation index, ECVI) together with the associated
standard error.

```
fit
#> Cross-Validation Results of 3 models
#> based on k = 10 folds.
#>
#>    Model E(KL-D)   SE
#> 1 model1    1.29 0.44
#> 3 model3    2.28 0.50
#> 2 model2    3.48 0.64
```

Browne, Michael W., and Robert Cudeck. 1992. “Alternative Ways of
Assessing Model Fit.” *Sociological Methods & Research* 21:
230–58.

Cudeck, Robert, and Michael W. Browne. 1983. “Cross-Validation of
Covariance Structures.” *Multivariate Behavioral Research* 18:
147–67. https://doi.org/10.1207/s15327906mbr1802_2.

Rosseel, Yves. 2012. “lavaan: An R Package for Structural Equation
Modeling.” *Journal of Statistical Software* 48 (2):
1–36. https://doi.org/10.18637/jss.v048.i02.