---
title: "Tutorial: Mini-Meta Delta"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tutorial: Mini-Meta Delta}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, warning = FALSE, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

When scientists conduct replicates of the same experiment, the effect size of each replicate often varies, which can make interpreting the results more challenging. This vignette documents how `dabestr` is able to compute the meta-analyzed weighted effect size given multiple replicates of the same experiment. This feature can help resolve differences between replicates and simplify interpretation.

This function uses the generic _inverse-variance_ method to calculate the effect size, as follows:

$$\theta_{weighted} = \frac{\sum{\hat{\theta_i}}w_i}{\sum{w_i}}$$
where:

$$\hat{\theta_i}=\text{Mean difference for replicate } i$$
$$w_i=\text{Weight for replicate } i = \frac{1}{s_i^2}$$
$$s_i^2=\text{Pooled variance for replicate } i = \frac{(n_{test}-1)s_{test}^2 + (n_{control}-1)s_{control}^2} {n_{test}+n_{control}-2}$$
$$n = \text{sample size and } s^2 = \text{variance for control/test}$$
Note that `dabestr` uses the _fixed-effects_ model of meta-analysis, as opposed to the random-effects model. This means that all variation between the results of each replicate is assumed to be due solely to sampling error. We recommend using this function only for replications of the same experiment, where it can be safely assumed that each replicate estimates the same population mean, $\mu$.

The `dabestr` package can only compute weighted effect size for _mean difference only_, and not standardized measures such as *Cohen’s d*.

For more information on meta-analysis, please refer to [Chapter 10](https://training.cochrane.org/handbook/current/chapter-10) of the *Cochrane* handbook: 

```{r setup, warning = FALSE, message = FALSE}
library(dabestr)
```

## Create dataset for demo
```{r, warning = FALSE}
set.seed(12345) # Fix the seed so the results are reproducible.
# pop_size = 10000 # Size of each population.
N <- 20 # The number of samples taken from each population

# Create samples
c1 <- rnorm(N, mean = 3, sd = 0.4)
c2 <- rnorm(N, mean = 3.5, sd = 0.75)
c3 <- rnorm(N, mean = 3.25, sd = 0.4)

t1 <- rnorm(N, mean = 3.5, sd = 0.5)
t2 <- rnorm(N, mean = 2.5, sd = 0.6)
t3 <- rnorm(N, mean = 3, sd = 0.75)

# Add a `gender` column for coloring the data.
gender <- c(rep("Male", N / 2), rep("Female", N / 2))

# Add an `id` column for paired data plotting.
id <- 1:N

# Combine samples and gender into a DataFrame.
df <- tibble::tibble(
  `Control 1` = c1, `Control 2` = c2, `Control 3` = c3,
  `Test 1` = t1, `Test 2` = t2, `Test 3` = t3,
  Gender = gender, ID = id
)

df <- df %>%
  tidyr::gather(key = Group, value = Measurement, -ID, -Gender)
```

We now have 3 Control groups and 3 Test groups, simulating 3 replicates of the same experiment. Our dataset also includes a non-numerical column indicating gender and another column indicating the identity of each observation.

This is known as a ‘long’ dataset. See this [writeup](https://simonejdemyr.com/r-tutorials/basics/wide-and-long/) for more details.

```{r}
knitr::kable(head(df))
```

## Loading Data
Next, we load data as we would typically do using `load()`. This time, however, we also specify the argument `minimeta = TRUE`. As we are loading three experiments’ worth of data, `idx` is passed as a list of vectors, as shown below:

```{r, warning = FALSE}
unpaired <- load(df,
  x = Group, y = Measurement,
  idx = list(
    c("Control 1", "Test 1"),
    c("Control 2", "Test 2"),
    c("Control 3", "Test 3")
  ),
  minimeta = TRUE
)
```

When this `dabest` object is printed, it should show that effect sizes will be calculated for each group, as well as the weighted delta. Note once again that the weighted delta will only be calculated for the mean difference.

```{r, warning = FALSE}
print(unpaired)
```

After applying the `mean_diff()` function to the `dabest` object, you can view the mean differences for each group as well as the weighted delta by printing the `dabest_effectsize_obj`.

```{r, warning = FALSE}
unpaired.mean_diff <- mean_diff(unpaired)

print(unpaired.mean_diff)
```

You can view the details of each experiment by accessing `dabest_effectsize_obj$boot_results`, as shown below. This also contains details of the weighted delta.

```{r, warning = FALSE}
unpaired.mean_diff$boot_result
```

## Unpaired Data
Simply calling the `dabest_plot()` function it will generate a **Cumming estimation plot** showing the data for each experimental replicate as well as the calculated weighted delta.

```{r, warning = FALSE}
dabest_plot(unpaired.mean_diff)
```

You can also hide the weighted delta by passing the argument `show_mini_meta = FALSE`. In this case, the resulting graph would be identical to a multiple two-groups plot:

```{r, warning = FALSE}
dabest_plot(unpaired.mean_diff, show_mini_meta = FALSE)
```

## Paired Data
The tutorial up to this point has dealt with unpaired data. If your data is paired data, the process for loading, plotting and accessing the data is the same as for unpaired data,  except that you need to pass the argument `paired = "sequential"` or `paired = "baseline"` and an appropriate `id_col` during the `load()` step, as follows:

```{r, warning = FALSE}
paired.mean_diff <- load(df,
  x = Group, y = Measurement,
  idx = list(
    c("Control 1", "Test 1"),
    c("Control 2", "Test 2"),
    c("Control 3", "Test 3")
  ),
  paired = "baseline", id_col = ID,
  minimeta = TRUE
) %>%
  mean_diff()

dabest_plot(paired.mean_diff, raw_marker_size = 0.5, raw_marker_alpha = 0.3)
```