This example comes directly from Maslen et al. 2023, for more details behind the methodology go to this paper. If you use this package for sample size analysis in an associated paper or otherwise, please cite Maslen et al. 2023.

Researchers within the Operation Crayweed Restoration Project in
Sydney are restoring the locally extinct macro-algae *Phyllospora
comosa* (“crayweed”: see Coleman et al.,
2008; Campbell
et al., 2014; Vergés et
al., 2020 and are interested in the effect of this restoration on
associated ecological communities (Marzinelli
et al., 2014, 2016;
Wood et al.,
2019). Pilot data have already been collected, where the abundance
of fish species in nine open ocean sites have been recorded in 2015. We
are interested in observing if there is a change in mean fish abundance
between control sites (those in Sydney without crayweed) and restored
sites (similar sites where crayweed has recently been transplanted).
There are plans to collect more data in the future, however there is an
upper bound of approximately 24 possible spatially independent restored
or control coastal bays/sites within the Sydney region and
surroundings.

In this example we attempt to answer the following experimental design questions:

- How many sites are required to likely detect \(20\%\) differences between treatments?
- Under the maximum independent sampling design of 24 sites, what are the size of effects that are likely to be detected?

Ecopower can be installed from CRAN using
`install.packages("ecopower")`

, however for the latest
version users can install from GitHub with
`devtools::install_github("BenMaslen/ecopower")`

.

The first 34 columns of the `fish`

data set make up the
multivariate abundance matrix of fish counts across our 9 sites.

Below we first fit a `manyglm`

model, and then a copula
model using the `cord`

function from the
`ecoCopula`

package to our pilot data as our data generating
model.

To specify an effect of interest, we first specify a list of responses (or taxa) we believe are ‘increasing’ or ‘decreasing’ in response to our treatment. Responses/taxa we do not specify as either ‘increasers’ or ‘decreasers’ are assumed to have no effect.

Next we specify the multiplicative effect size and term of interest
in the effect_alt function. Here `effect_size=1.2`

is a \(20\%\) change in mean abundance.

```
#specify 'increaser' and 'decreaser' species:
increasers <- c("Aplodactylus.lophodon", "Atypichthys.strigatus", "Cheilodactylus.fuscus",
"Olisthops.cyanomelas", "Pictilabrius.laticlavius")
decreasers <- c("Abudefduf.sp", "Acanthurus.nigrofuscus", "Chromis.hypsilepis",
"Naso.unicornis", "Parma.microlepis", "Parupeneus.signatus",
"Pempheris.compressa", "Scorpis.lineolatus", "Trachinops.taeniatus")
#specify effect size of interest
coeff.alt <- effect_alt(fit, effect_size = 1.2, increasers, decreasers, term = "Site.Type")
```

Notice in the original pilot data we have three treatments but are
only interested in comparing control vs. restored sites. In order to
simulate under only control and restored sites, we can create a new
dataset where reference sites have been relabeled as restored (below -
`fish_data_sim`

) and simulate under this design using the
`newdata`

argument in `powersim()`

.

Note that we could have just started out fitting a model to only restored and control sites, however we would lose samples that could be used to help estimate nuisance parameters like overdispersion and covariance. Thus we are effectively ‘borrowing strength’ from the reference sites to help in the estimation of our nuisance parameters.

Let’s start by estimating power for a single sample size specification.

Below we calculate power for a sample size of \(N=100\) using our pre-specified effect size
(`coeff.alt`

) of \(20\%\)
differences in mean abundance.

Note that `nsim`

and `ncrit`

have been set to
50 and 49 respectively in order to run timely, however in practice you
would want to set these to larger values (we recommend setting
`nsim=1000`

and `ncrit=4999`

) to get a more
accurate estimate of power.

```
powersim(fit.cord, N=100, coeff.alt, term="Site.Type", nsim=50, ncrit=49, newdata = fish_data_sim, ncores=2)
#> Time elapsed: 0 hr 0 min 20 sec
#> Power: 0.48
```

Based on this initial power simulation we would need more samples in order to obtain the conventional power target of \(80\%\).

In order to answer our first experimental design question, we can estimate power across a range of sample sizes for an effect size of \(20\%\) differences in mean abundance. By plotting the results, we can observe the sample size required to reliably detect (with a conventional power target of \(80\%\)) the effect of interest.

Users could also plot multiple power curves to get an idea of the variability in the power estimates (as in Maslen et al. 2023).

Note: The below code takes ~10minutes-1hour to run depending on the
speed of your computer and the number of logical processors you have. As
such, this code has not been evaluated by default. Instead, the
resultant figure has been added for reference. If you want to run this
code and reduce computation time, you can reduce `nsim`

or
`ncrit`

, however this will also make power estimates more
variable.

First we estimate power at each sample size of interest:

```
#specify sample sizes to simulate under
sample_sizes <- c(10, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300)
power_estimates <- rep(NA, length(sample_sizes))
#loop through sample sizes and estimate power at each step
for (i in 1:length(sample_sizes)){
power_estimates[i] <- powersim(fit.cord, N = sample_sizes[i], coeff.alt, term = "Site.Type",
nsim = 1000, ncrit = 4999, newdata = fish_data_sim)
}
```

Next, let’s save the results as a data frame and produce a plot against the conventional power target of \(80\%\):

```
#store the results in a data.frame
powercurve_dat <- data.frame(sample_sizes = sample_sizes, Power = unlist(power_estimates))
#plot power curves
ggplot(powercurve_dat, aes(x = sample_sizes, y = Power)) +
geom_line(size = 1.06) + theme_bw() + geom_hline(yintercept = 0.8, linetype = "dashed", color = "red", size=1) +
xlab("N") + ylab("Power") + scale_x_continuous(breaks = sample_sizes) +
scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8, 1.0))
```

This suggests that we require a sample size of at-least \(N \approx 150\) in order to likely detect \(20\%\) changes in mean abundances, using a conventional power target of \(80\%\).

Thus we have answered our first experimental design question!

In order to answer the second experimental design question, we can estimate and then plot multiple effect size curves for different effect size specifications (let’s do this for effect sizes of \(10\%, 20\%, \ldots, 100\%\) changes in mean abundance by specifying effect_size\(=1.1, 1.2, \ldots, 2\). We will also do this across sample sizes that we can sample using a balanced sampling design (from \(N=4,6,\ldots,24\)).

As we are simulating from copula models with small sample sizes, we
will also increase `n.samp`

(number of simulations for
importance sampling in copula modelling) and decrease `nlv`

to 1 (number of latent factors in the copula) as recommended in the
`cord`

function description.

Note: The below code takes ~30minutes-2hours to run depending on the
speed of your computer and the number of logical processors you have. As
such, this code has not been evaluated by default. Instead, the
resultant figure has been added for reference. If you want to run this
code and reduce computation time, you can reduce `nsim`

or
`ncrit`

, however this will also make power estimates more
variable.

First we estimate power at each sample size and effect size of interest:

```
#specify effect sizes and sample sizes to loop through
sample_sizes <- c(3:12)*2
effect_sizes <- c(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2)
mult_power_estimates <- rep(NA,length(sample_sizes)*length(effect_sizes))
#loop through sample sizes and effect sizes
for (j in 1:length(effect_sizes)){
coeff.alt <- effect_alt(fit, effect_size = effect_sizes[j], increasers, decreasers, term = "Site.Type")
for (i in 1:length(sample_sizes)){
try(mult_power_estimates[10*(j-1)+i] <- powersim(fit.cord, N = sample_sizes[i], coeff.alt, term = "Site.Type", nsim = 1000,
ncrit = 4999, newdata = fish_data_sim, nlv = 1, n.samp = 500))
}
}
```

Next, let’s save the results as a data frame and produce a plot of the resulting power estimates:

```
#store the results in a data.frame
mult_powercurve_dat <- data.frame(sample_sizes = rep(sample_sizes, times = length(effect_sizes)),
power_estimates = unlist(mult_power_estimates),
Perc_Change = factor(rep(c("10%", "20%", "30%", "40%", "50%", "60%",
"70%", "80%", "90%", "100%"), each = 10),
levels = c("100%", "90%", "80%", "70%", "60%", "50%",
"40%", "30%", "20%", "10%")),
effect_sizes = rep(effect_sizes, each = length(sample_sizes)))
#plot multiple power curve
ggplot(mult_powercurve_dat, aes(x = sample_sizes, y = power_estimates, colour = Perc_Change)) +
geom_line(linewidth = 1.05) + theme_bw() + geom_hline(yintercept = 0.8, linetype = "dashed", color = "red", size = 1) +
xlab("N") + ylab("Power") + scale_x_continuous(breaks = sample_sizes) +
scale_y_continuous(breaks = c(0.2, 0.4, 0.6, 0.8, 1.0)) + scale_color_brewer(palette = "Paired", direction = -1, name = "% change")
```