This vignette shows how you can further investigate your models using Monte Carlo simulations. In addition to reporting power, simulations allow you to investigate parameter bias, model misspecification, Type I errors and convergence issues. For instance, even if the analytical power calculations work well, it can be useful to check the small-sample properties of the model, e.g. the estimates of cluster-level random effects when there are only a few clusters.
The simulate function accepts the same study setup object that we’ve used in the two- and three-level power vignettes, and it supports all models that are available in powerlmm. As an example, we’ll investigate the partially nested three-level model.
When doing simulations it can be a good idea to specify the model using raw values, to ensure all parameters are reasonable. Here I use estimates from a real clinical trial.
p <- study_parameters(n1 = 11,
                      n2 = 10,
                      n3 = 6,
                      fixed_intercept = 37,
                      fixed_slope = -0.64,
                      sigma_subject_intercept = 2.8,
                      sigma_subject_slope = 0.4,
                      sigma_cluster_intercept = 0,
                      cor_subject = -0.2,
                      icc_slope = 0.05,
                      sigma_error = 2.6,
                      dropout = dropout_weibull(proportion = 0.3,
                                                rate = 1/2),
                      partially_nested = TRUE,
                      cohend = -0.8)
p
The get_monte_carlo_se function accepts a power object and returns the 95 % interval for a given number of simulations. This is useful to gauge the precision of the empirical power estimates.
x <- get_power(p)
get_monte_carlo_se(x, nsim = c(1000, 2000, 5000))
We would need about 5000 simulations if we want our empirical power to be within +/- 1 % of the analytical power.
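Conceptually, this precision is just the binomial standard error of a proportion, sqrt(p * (1 - p) / nsim). A back-of-the-envelope sketch in base R, using a hypothetical power of 0.8 for illustration:

# Monte Carlo precision of an empirical power estimate
pow <- 0.8                                # hypothetical analytical power
nsim <- c(1000, 2000, 5000)
half_width <- 1.96 * sqrt(pow * (1 - pow) / nsim)
data.frame(nsim = nsim, half_width_95 = round(half_width, 3))

At nsim = 5000 the 95 % half-width is about 0.011, i.e. roughly +/- 1 percentage point.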
Now, let’s run the simulation. First we need to write the lme4 formula, since the simulated data sets are analyzed using lme4::lmer. Then we pass the study_parameters object to the simulate function, and when the simulation is finished we use summary to view the results. You can run the simulation in parallel by setting cores > 1. The parallel computations are done using the parallel package: on a Windows machine, or when running interactively inside a GUI, PSOCK clusters are used; on Unix/macOS, when the simulation is run non-interactively in a terminal, forking is used.
f <- "y ~ treatment * time + (1 + time | subject) + (0 + treatment:time | cluster)"
cores <- parallel::detectCores() # use all cores
res <- simulate(object = p,
                nsim = 5000,
                formula = f,
                satterthwaite = TRUE,
                cores = cores,
                save = FALSE)
summary(res)
It is also possible to create the formula automatically, by leaving the formula argument blank. The lmer formula is then created by including all parameters that are not specified as NA or NULL in the model; see ?create_lmer_formula. N.B., parameters specified as 0 are included in the model.
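For instance, you can inspect the formula that would be generated for our setup before running the simulation. A minimal sketch, assuming create_lmer_formula can be called directly on the study_parameters object (check its help page):

# Inspect the lmer formula that simulate() would create automatically
create_lmer_formula(p)

# Leaving 'formula' blank has the same effect
res <- simulate(object = p,
                nsim = 5000,
                satterthwaite = TRUE,
                cores = cores,
                save = FALSE)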
Let’s compare the correct partially nested model to a misspecified two-level linear mixed-effects model. The formula argument accepts a named list with a correct and a wrong formula. By setting cohend to 0, the simulation results tell us the empirical Type I errors for the correct and the misspecified model.
p <- update(p, cohend = 0)
f <- list("correct" = "y ~ treatment * time + (1 + time | subject) + (0 + treatment:time | cluster)",
"wrong" = "y ~ treatment * time + (1 + time | subject)")
res <- simulate(object = p,
                nsim = 5000,
                formula = f,
                satterthwaite = TRUE,
                cores = cores,
                save = FALSE)
summary(res)
All of the parameters are summarized and presented for both models. Since we specified satterthwaite = TRUE, empirical power is based on both the Wald Z test and the Wald t test using Satterthwaite’s degrees of freedom approximation (from the lmerTest package).
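If you want to inspect this test on a single data set, you can fit one simulated data set with lmerTest directly. A sketch, assuming simulate_data returns a single simulated data set for a study_parameters object (an assumption on my part; check ?simulate_data):

# Fit one simulated data set with lmerTest to get Satterthwaite t tests
library(lmerTest)      # its lmer() adds Satterthwaite df to summary()
d <- simulate_data(p)  # one simulated data set (assumed helper)
fit <- lmer(y ~ treatment * time + (1 + time | subject) +
              (0 + treatment:time | cluster), data = d)
summary(fit)           # t tests use Satterthwaite's approximation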
We can also simulate multiple designs and compare them. Let’s compare 6 versus 12 clusters in the treatment arm, an icc_slope of 0.05 and 0.1, and a Cohen’s d of 0.5 and 0.8.
p <- study_parameters(n1 = 11,
                      n2 = 10,
                      n3 = c(6, 12),
                      fixed_intercept = 37,
                      fixed_slope = -0.64,
                      sigma_subject_intercept = 2.8,
                      sigma_subject_slope = 0.4,
                      sigma_cluster_intercept = 0,
                      cor_subject = -0.2,
                      icc_slope = c(0.05, 0.1),
                      sigma_error = 2.6,
                      dropout = dropout_weibull(proportion = 0.3,
                                                rate = 1/2),
                      partially_nested = TRUE,
                      cohend = -c(0.5, 0.8))
f <- "y ~ treatment * time + (1 + time | subject) + (0 + treatment:time | cluster)"
res <- simulate(object = p,
                nsim = 5000,
                formula = f,
                satterthwaite = TRUE,
                cores = cores,
                save = FALSE)
The setup and simulation are the same as before, except that we define vectors of values for some of the parameters. In this scenario we have a total of 2 * 2 * 2 = 8 different combinations of parameter values, and for each of the 8 setups nsim simulations will be performed. With multiple designs, summary works a bit differently: instead of summarizing all parameters, we now have to choose which parameter to summarize.
# Summarize the 'time:treatment' results
summary(res, para = "time:treatment", type = "fixed", model = "correct")
# Summarize the cluster-level random slope
summary(res, para = "cluster_slope", type = "random", model = "correct")