---
title: "`CoRpower`'s Algorithms for Simulating Placebo Group and Baseline Immunogenicity Predictor Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Algorithms for Simulating Placebo Group and Baseline Immunogenicity Predictor Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

\DeclareMathOperator{\corr}{corr}
\DeclareMathOperator{\var}{var}

## Introduction
The `CoRpower` package assumes that $P(Y^{\tau}(1)=Y^{\tau}(0))=1$ for the biomarker sampling timepoint $\tau$. Under this assumption, the correlate of risk (CoR) parameter $P(Y=1 \mid S=s_1, Z=1, Y^{\tau}=0)$ equals $P(Y=1 \mid S=s_1, Z=1, Y^{\tau}(1)=Y^{\tau}(0)=0)$, which links the CoR parameter to the biomarker-specific treatment efficacy (TE) parameter. Estimation of the latter requires outcome data in placebo recipients, and some estimation methods additionally require a baseline immunogenicity predictor (BIP) of $S(1)$, the biomarker response at $\tau$ under assignment to treatment. To link power calculations for detecting a CoR and a correlate of TE (coTE), `CoRpower` allows the user to export the simulated data sets used in its calculations, extended to include placebo-group and BIP data for harmonized use by methods assessing biomarker-specific TE. This vignette describes `CoRpower`'s algorithms, and their underlying assumptions, for simulating placebo-group and BIP data. The exported data sets include full rectangular data so that the user can consider various biomarker sub-sampling designs, e.g., different biomarker case:control sampling ratios, or case-control vs. case-cohort designs.

***
## Algorithms for Simulating Placebo Group Data
### Trichotomous \(\, X\) and \(\, S(1)\) Using Approach 1
<ol>
<li> Specify $P^{lat}_0$, $P^{lat}_2$, $P_0$, $P_2$, $risk_0$, $n_{cases, 0}$, $n_{controls, 0}$, $K$
  <ul>
  <li> $N_{complete, 0} = n_{cases, 0} + n_{controls, 0}$
  </ul>
<li> Specify $Sens$, $Spec$, $FP^0$, and $FN^2$
<li> Number of observations in each latent subgroup: $N_x = N_{complete, 0} P^{lat}_x$
<li> Simulate $X$ under the assumption of homogeneous risk in the placebo group: 
  <ul>
  <li> Cases: $\left(n_{cases, 0}(0),n_{cases,0}(1),n_{cases,0}(2)\right) \sim \mathsf{Mult}(n_{cases,0},(p_0,p_1,p_2))$, where
  \begin{align*}
  p_x=P(X=x|Y=1,Y^{\tau}=0,Z=0) &= P(X=x|Y(0)=1)\\ 
  &= \frac{P(Y(0)=1|X=x)P(X=x)}{P(Y(0)=1)}\\
  &= \frac{risk^{lat}_0(x)P^{lat}_{x}}{risk_0}\\
  &= P^{lat}_{x} \quad \text{because } risk^{lat}_0(x)=risk_0
  \end{align*}
  <li> Controls: $\left(n_{controls,0}(0),n_{controls,0}(1),n_{controls,0}(2)\right) \sim \mathsf{Mult}(n_{controls,0},(p_0,p_1,p_2))$, where
  \begin{align*}
  p_x=P(X=x|Y=0,Y^{\tau}=0,Z=0) &= P(X=x|Y(0)=0)\\ 
  &= \frac{P(Y(0)=0|X=x)P(X=x)}{P(Y(0)=0)}\\
  &= \frac{(1-risk^{lat}_0(x))P^{lat}_{x}}{(1-risk_0)}\\
  &= P^{lat}_{x} \quad \text{because } risk^{lat}_0(x)=risk_0
  \end{align*}
  <li> $n_{controls,0}(x) = N_x - n_{cases,0}(x)$
  </ul>
<li> Simulate $Y$: Vector with $n_{cases,0}(0)$ 1's, followed by $n_{controls,0}(0)$ 0's, followed by $n_{cases,0}(1)$ 1's, etc.
<li> Simulate $S(1)$: For each of the $N_x$ subjects in latent subgroup $X=x$, generate $S(1)$ by a draw from $\mathsf{Mult}(1,(p_0,p_1,p_2))$, where $p_k=P(S(1)=k|X=x)$ is determined by $Sens$, $Spec$, $FP^0$, $FN^2$, etc.
</ol>
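
The steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not `CoRpower`'s internal code: all numeric inputs and object names (`Plat`, `classProb`, etc.) are hypothetical, $FN^1$ and $FP^1$ are assumed to follow from the marginal constraints on $P_0$ and $P_2$ (the two equations solved under Approach 2), and the control counts are drawn from the multinomial distribution given in the Controls bullet.

```r
set.seed(2024)

# Hypothetical design inputs (illustration only, not CoRpower defaults)
Plat <- c(0.2, 0.2, 0.6)            # P^lat_0, P^lat_1, P^lat_2
P0 <- 0.2; P2 <- 0.6
nCases0 <- 50; nControls0 <- 450
Sens <- 0.9; Spec <- 0.8; FP0 <- 0.05; FN2 <- 0.05

# Assumption: FN^1 and FP^1 are implied by the marginal constraints on P_0, P_2
FN1 <- (P0 - Spec * Plat[1] - FN2 * Plat[3]) / Plat[2]
FP1 <- (P2 - Sens * Plat[3] - FP0 * Plat[1]) / Plat[2]

# Row x + 1 holds P(S(1) = 0, 1, 2 | X = x)
classProb <- rbind(c(Spec, 1 - Spec - FP0, FP0),
                   c(FN1,  1 - FN1 - FP1,  FP1),
                   c(FN2,  1 - FN2 - Sens, Sens))
stopifnot(all(classProb >= 0), all(abs(rowSums(classProb) - 1) < 1e-12))

# Step 4: under homogeneous placebo risk, p_x = P^lat_x for cases and controls
nCasesX    <- as.vector(rmultinom(1, size = nCases0, prob = Plat))
nControlsX <- as.vector(rmultinom(1, size = nControls0, prob = Plat))

# Step 5: within each latent subgroup, 1's (cases) followed by 0's (controls)
X <- rep(0:2, times = nCasesX + nControlsX)
Y <- unlist(lapply(1:3, function(i) {
  rep(c(1L, 0L), times = c(nCasesX[i], nControlsX[i]))
}))

# Step 6: one multinomial draw of S(1) per subject, given the subject's X
S1 <- vapply(X, function(x) sample(0:2, size = 1, prob = classProb[x + 1, ]),
             integer(1))

table(X, S1)
```

The cross-tabulation in the last line gives a quick visual check that most subjects are classified into the category matching their latent subgroup.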

### Trichotomous \(\, X\) and \(\, S(1)\) Using Approach 2
<ol>
<li> Specify $P^{lat}_0$, $P^{lat}_2$, $P_0$, $P_2$, $risk_0$, $N_{complete,0}$, $n_{cases,0}$, $n^S_{cases}$, $K$
<li> Specify $\rho$ and $\sigma^2_{obs}$ 
<li> Calculation of $(Sens, Spec, FP^0, FP^1, FN^1, FN^2)$:
  <ol type="i">
  <li> Assuming the classical measurement error model, where $X^{\ast} \sim \mathsf{N}(0,\sigma^2_{tr})$ with $\sigma^2_{tr} = \rho\sigma^2_{obs}$, solve
  $$P^{lat}_0 = P(X^{\ast} \leq \theta_0) \quad \textrm{and} \quad P^{lat}_2 = P(X^{\ast} > \theta_2)$$
  for $\theta_0$ and $\theta_2$
  <li> Generate $B$ realizations of $X^{\ast}$ and $S^{\ast} = X^{\ast} + \epsilon$, where $\epsilon \sim \mathsf{N}(0,\sigma^2_{e})$ with $\sigma^2_e = (1-\rho)\sigma^2_{obs}$, and
  $X^{\ast}$ is independent of $\epsilon$
    <ul>
    <li> $B = 20{,}000$ by default
    </ul>
  <li> Using $\theta_0$ and $\theta_2$ from Step i., define
  \begin{align*}
  Spec(\phi_0) &= P(S^{\ast} \leq \phi_0 \mid X^{\ast} \leq \theta_0)\\
  FN^1(\phi_0) &= P(S^{\ast} \leq \phi_0 \mid X^{\ast} \in (\theta_0,\theta_2])\\
  FN^2(\phi_0) &= P(S^{\ast} \leq \phi_0 \mid X^{\ast} > \theta_2)\\
  Sens(\phi_2) &= P(S^{\ast} > \phi_2 \mid X^{\ast} > \theta_2)\\
  FP^1(\phi_2) &= P(S^{\ast} > \phi_2 \mid X^{\ast} \in (\theta_0,\theta_2])\\
  FP^0(\phi_2) &= P(S^{\ast} > \phi_2 \mid X^{\ast} \leq \theta_0)
  \end{align*}
        
  Estimate $Spec(\phi_0)$ by
  $$\widehat{Spec}(\phi_0) = \frac{\#\{S^{\ast}_b \leq \phi_0, X^{\ast}_b \leq \theta_0\}}{\#\{X^{\ast}_b \leq \theta_0\}},$$
  and estimate the remaining five quantities analogously by the corresponding empirical proportions.
  <li> Find $\phi_0 = \phi^{\ast}_0$ and $\phi_2 = \phi^{\ast}_2$ that numerically solve
  \begin{align*}
  P_0 &= \widehat{Spec}(\phi_0)P^{lat}_0 + \widehat{FN}^1(\phi_0)P^{lat}_1 + \widehat{FN}^2(\phi_0)P^{lat}_2\\
  P_2 &= \widehat{Sens}(\phi_2)P^{lat}_2 + \widehat{FP}^1(\phi_2)P^{lat}_1 + \widehat{FP}^0(\phi_2)P^{lat}_0
  \end{align*}
  and compute
  \[
  Spec = \widehat{Spec}(\phi^{\ast}_0),\; Sens = \widehat{Sens}(\phi^{\ast}_2),\; \textrm{etc.}
  \]
  </ol>      

<li> Follow Steps 3--6 under Approach 1
</ol>
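
Step 3 of this approach can be sketched in R as follows. This is a hedged illustration rather than `CoRpower`'s implementation: the inputs and object names are hypothetical, $\sigma^2_{tr} = \rho\sigma^2_{obs}$ is assumed from the variance decomposition $\sigma^2_{obs} = \sigma^2_{tr} + \sigma^2_e$, and `uniroot` is this sketch's choice of numerical solver.

```r
set.seed(7)

# Hypothetical inputs (illustration only)
Plat0 <- 0.2; Plat2 <- 0.6; Plat1 <- 1 - Plat0 - Plat2
P0 <- 0.2; P2 <- 0.6
rho <- 0.9; sigma2obs <- 1
B <- 20000

# Assumed variance decomposition: sigma2obs = sigma2tr + sigma2e
sigma2tr <- rho * sigma2obs
sigma2e  <- (1 - rho) * sigma2obs

# Step i: thresholds of X* that reproduce P^lat_0 and P^lat_2
theta0 <- qnorm(Plat0, mean = 0, sd = sqrt(sigma2tr))
theta2 <- qnorm(1 - Plat2, mean = 0, sd = sqrt(sigma2tr))

# Step ii: B realizations of X* and S* = X* + measurement error
Xstar <- rnorm(B, mean = 0, sd = sqrt(sigma2tr))
Sstar <- Xstar + rnorm(B, mean = 0, sd = sqrt(sigma2e))
low  <- Xstar <= theta0
high <- Xstar > theta2
med  <- !low & !high

# Steps iii-iv: empirical marginal constraints as functions of phi_0 and phi_2
eqP0 <- function(phi0) {
  mean(Sstar[low] <= phi0) * Plat0 + mean(Sstar[med] <= phi0) * Plat1 +
    mean(Sstar[high] <= phi0) * Plat2 - P0
}
eqP2 <- function(phi2) {
  mean(Sstar[high] > phi2) * Plat2 + mean(Sstar[med] > phi2) * Plat1 +
    mean(Sstar[low] > phi2) * Plat0 - P2
}
searchRange <- range(Sstar) + c(-1, 1)   # brackets the sign change of both functions
phi0 <- uniroot(eqP0, interval = searchRange)$root
phi2 <- uniroot(eqP2, interval = searchRange)$root

# Plug the solutions back into the empirical proportions
Spec <- mean(Sstar[low] <= phi0)
FN1  <- mean(Sstar[med] <= phi0)
FN2  <- mean(Sstar[high] <= phi0)
Sens <- mean(Sstar[high] > phi2)
FP1  <- mean(Sstar[med] > phi2)
FP0  <- mean(Sstar[low] > phi2)

round(c(Spec = Spec, Sens = Sens, FP0 = FP0, FP1 = FP1, FN1 = FN1, FN2 = FN2), 3)
```

Because the empirical proportions are step functions of the threshold, the solver returns a point where the constraint changes sign rather than an exact root; the discrepancy shrinks as $B$ grows.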

### Continuous \(\, X^*\) and \(\, S^*(1)\)
<ol>
<li> Specify $P^{lat}_{lowestVE}$, $\rho$, $\sigma^2_{obs}$, $VE_{lowest}$, $risk_0$, $n_{cases,0}$, $n_{controls, 0}$, $n^S_{cases}$, $K$
  <ul> 
  <li> $N_{complete, 0} = n_{cases, 0} + n_{controls, 0}$
  </ul>
<li> Simulate $Y$ by creating a vector with $n_{cases,0}$ 1's followed by $n_{controls,0}$ 0's.
<li> Simulate $X^*$ under the assumption of homogeneous risk in the placebo group:
  <ul>
  <li> Cases: from a grid of values ranging from $-3$ to $3$, sample $n_{cases,0}$ values with replacement, with sampling probabilities proportional to the density
  \begin{align*}
  f_{X^{\ast}}(x^{\ast}|Y=1,Y^{\tau}=0,Z=0) &= f_{X^{\ast}}(x^{\ast}|Y(0)=1)\\
  &= \frac{P(Y(0)=1|X^*=x^*)f_{X^{\ast}}(x^{\ast})}{P(Y(0)=1)}\\
  &= \frac{risk^{lat}_0(x^*)f_{X^{\ast}}(x^{\ast})}{risk_0}\\
  &= f_{X^{\ast}}(x^{\ast}) \quad \text{because } risk^{lat}_0(x^*)=risk_0
  \end{align*}
  <li> Controls: from a grid of values ranging from $-3$ to $3$, sample $n_{controls,0}$ values with replacement, with sampling probabilities proportional to the density
  \begin{align*}
  f_{X^{\ast}}(x^{\ast}|Y=0,Y^{\tau}=0,Z=0) &= f_{X^{\ast}}(x^{\ast}|Y(0)=0)\\
  &= \frac{P(Y(0)=0|X^*=x^*)f_{X^{\ast}}(x^{\ast})}{P(Y(0)=0)}\\
  &= \frac{(1-risk^{lat}_0(x^*))f_{X^{\ast}}(x^{\ast})}{1-risk_0}\\
  &= f_{X^{\ast}}(x^{\ast}) \quad \text{because } risk^{lat}_0(x^*)=risk_0
  \end{align*}
  <li> $f_{X^{\ast}}(x^{\ast})$ is fully specified because $X^* \sim N(0, \sigma^2_{tr})$ with $\sigma^2_{tr} = \rho\sigma^2_{obs}$
  </ul>
<li> Simulate $S^*(1)$ as $S^*(1)=X^*+\epsilon,$ where $\epsilon \sim N(0, \sigma^2_e)$ and $\sigma_e^2=(1-\rho)\sigma^2_{obs}$. The error $\epsilon$ is independent of $X^*$ and is simulated by `rnorm(Ncomplete, mean=0, sd=sqrt(sigma2e))`
</ol>
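
A minimal R sketch of Steps 2 through 4 is given below. It is an illustration under stated assumptions, not `CoRpower`'s internal code: the numeric inputs, the grid spacing of 0.01, and the object names are hypothetical, and $\sigma^2_{tr} = \rho\sigma^2_{obs}$ is assumed from the decomposition $\sigma^2_{obs} = \sigma^2_{tr} + \sigma^2_e$.

```r
set.seed(11)

# Hypothetical inputs (illustration only)
rho <- 0.9; sigma2obs <- 1
nCases0 <- 50; nControls0 <- 450
Ncomplete <- nCases0 + nControls0

# Assumed variance decomposition
sigma2tr <- rho * sigma2obs
sigma2e  <- (1 - rho) * sigma2obs

# Step 2: cases (1's) followed by controls (0's)
Y <- rep(c(1, 0), times = c(nCases0, nControls0))

# Step 3: sample X* from a grid on [-3, 3] with weights given by the N(0, sigma2tr)
# density; under homogeneous risk the same density applies to cases and controls
grid <- seq(-3, 3, by = 0.01)            # grid spacing is this sketch's choice
w <- dnorm(grid, mean = 0, sd = sqrt(sigma2tr))
XstarCases    <- sample(grid, size = nCases0, replace = TRUE, prob = w)
XstarControls <- sample(grid, size = nControls0, replace = TRUE, prob = w)
Xstar <- c(XstarCases, XstarControls)

# Step 4: add independent measurement error
Sstar <- Xstar + rnorm(Ncomplete, mean = 0, sd = sqrt(sigma2e))

summary(Sstar)
```

`sample` rescales the weights internally, so the density values do not need to be normalized to sum to one.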

***
## Algorithms for Simulating a Baseline Immunogenicity Predictor (BIP)
### Trichotomous \(\, X, S(1),\) and \(\, BIP\) Using Approach 1
<ol>
<li> The user specifies a classification rule defined by $P(BIP=i \mid S(1)=j)$, $i,j=0,1,2$.
<li> For a subject with biomarker measurement $S_k(1)$, generate $BIP_k$ by a draw from $\mathsf{Mult}(1, (q_0, q_1, q_2))$, where $q_i=P(BIP_k=i \mid S(1)=S_k(1))$, $i=0,1,2$.
</ol>
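
The two steps above amount to one categorical draw per subject, which can be sketched in R as follows. The classification rule `bipProb`, the marginal probabilities used to generate example values of $S(1)$, and all object names are hypothetical and serve only as an illustration.

```r
set.seed(3)

# Hypothetical user-specified rule: row j + 1 holds P(BIP = 0, 1, 2 | S(1) = j)
bipProb <- rbind(c(0.80, 0.15, 0.05),
                 c(0.10, 0.80, 0.10),
                 c(0.05, 0.15, 0.80))
stopifnot(all(abs(rowSums(bipProb) - 1) < 1e-12))

# Example S(1) values; in practice these come from the simulated trial data
S1 <- sample(0:2, size = 200, replace = TRUE, prob = c(0.2, 0.2, 0.6))

# One multinomial draw of BIP per subject, using the row that matches S_k(1)
BIP <- vapply(S1, function(s) sample(0:2, size = 1, prob = bipProb[s + 1, ]),
              integer(1))

table(S1, BIP)
```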

### Trichotomous \(\, X, S(1),\) and \(\, BIP\) Using Approach 2
*Note: All variables marked with \* are continuous.*

<ol>
<li> The user specifies $\corr(BIP^*, S^*(1))$.
<li> Assuming that $BIP^*$ follows an additive measurement error model, i.e., $BIP^* := S^*(1) + \delta$, where $\delta \sim N(0, \sigma^2_{\delta})$ with an unknown $\sigma^2_{\delta}$, and $\delta, \epsilon$, and $X^*$ are independent, solve the following equation for $\var \delta = \sigma^2_{\delta}$:
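$$
\corr(BIP^*, S^*(1)) = \sqrt{\frac{\var X^* + \var\epsilon}{\var X^* + \var\epsilon + \var \delta}},
$$
which gives $\sigma^2_{\delta} = (\var X^* + \var\epsilon)\left(\corr(BIP^*, S^*(1))^{-2} - 1\right)$.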
<li> For the fixed $\phi^{\ast}_0$ and $\phi^{\ast}_2$ derived above, define
\begin{align*}
Spec_{BIP}(\xi_0) &= P(BIP^{\ast} \leq \xi_0 \mid S^{\ast} \leq \phi^{\ast}_0)\\
FN^1_{BIP}(\xi_0) &= P(BIP^{\ast} \leq \xi_0 \mid S^{\ast} \in (\phi^{\ast}_0,\phi^{\ast}_2])\\
FN^2_{BIP}(\xi_0) &= P(BIP^{\ast} \leq \xi_0 \mid S^{\ast} > \phi^{\ast}_2)\\
Sens_{BIP}(\xi_2) &= P(BIP^{\ast} > \xi_2 \mid S^{\ast} > \phi^{\ast}_2)\\
FP^1_{BIP}(\xi_2) &= P(BIP^{\ast} > \xi_2 \mid S^{\ast} \in (\phi^{\ast}_0,\phi^{\ast}_2])\\
FP^0_{BIP}(\xi_2) &= P(BIP^{\ast} > \xi_2 \mid S^{\ast} \leq \phi^{\ast}_0)
\end{align*}
<li> Using the same technique as in the derivation of $\phi^{\ast}_0$ and $\phi^{\ast}_2$ above, find $\xi_0=\xi^{\ast}_0$ and $\xi_2=\xi^{\ast}_2$ that numerically solve
\begin{align*}
P_0 &= \widehat{Spec}_{BIP}(\xi_0)P_0 + \widehat{FN}_{BIP}^1(\xi_0)P_1 + \widehat{FN}_{BIP}^2(\xi_0)P_2\\
P_2 &= \widehat{Sens}_{BIP}(\xi_2)P_2 + \widehat{FP}_{BIP}^1(\xi_2)P_1 + \widehat{FP}_{BIP}^0(\xi_2)P_0
\end{align*}
and compute
$$
Spec_{BIP} = \widehat{Spec}_{BIP}(\xi^{\ast}_0),\; Sens_{BIP} = \widehat{Sens}_{BIP}(\xi^{\ast}_2),\; \textrm{etc.}
$$
<li> For a subject with biomarker measurement $S_k(1)$, generate $BIP_k$ by a draw from $\mathsf{Mult}(1, (q_0, q_1, q_2))$, where $q_i$, $i=0,1,2$, are determined by $Sens_{BIP}$, $Spec_{BIP}$, etc. obtained in Step 4.
</ol>
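
The five steps above can be sketched in R as follows. This is a hedged illustration, not `CoRpower`'s implementation: all inputs and object names are hypothetical, $\var X^* + \var\epsilon$ is taken to equal $\sigma^2_{obs}$, the thresholds $\phi^{\ast}_0$ and $\phi^{\ast}_2$ are replaced by normal quantiles of the marginal distribution of $S^*$ as a stand-in for the Monte Carlo solutions, and `uniroot` is this sketch's choice of solver.

```r
set.seed(5)

# Hypothetical inputs (illustration only)
P0 <- 0.2; P2 <- 0.6; P1 <- 1 - P0 - P2
rho <- 0.9; sigma2obs <- 1
corBIP <- 0.7                              # user-specified corr(BIP*, S*(1))
B <- 20000

# Step 2: rearranging the correlation equation gives var(delta) in closed form
varS <- sigma2obs                          # assumed: var X* + var epsilon
sigma2delta <- varS * (1 / corBIP^2 - 1)

# Monte Carlo realizations of X*, S*, and BIP* = S* + delta
Xstar   <- rnorm(B, sd = sqrt(rho * sigma2obs))
Sstar   <- Xstar + rnorm(B, sd = sqrt((1 - rho) * sigma2obs))
BIPstar <- Sstar + rnorm(B, sd = sqrt(sigma2delta))

# Stand-in for phi*_0 and phi*_2 (assumption: marginally S* ~ N(0, sigma2obs))
phi0 <- qnorm(P0, sd = sqrt(sigma2obs))
phi2 <- qnorm(1 - P2, sd = sqrt(sigma2obs))
low <- Sstar <= phi0; high <- Sstar > phi2; med <- !low & !high

# Steps 3-4: solve the empirical marginal constraints for xi_0 and xi_2
eq0 <- function(xi0) {
  mean(BIPstar[low] <= xi0) * P0 + mean(BIPstar[med] <= xi0) * P1 +
    mean(BIPstar[high] <= xi0) * P2 - P0
}
eq2 <- function(xi2) {
  mean(BIPstar[high] > xi2) * P2 + mean(BIPstar[med] > xi2) * P1 +
    mean(BIPstar[low] > xi2) * P0 - P2
}
searchRange <- range(BIPstar) + c(-1, 1)
xi0 <- uniroot(eq0, interval = searchRange)$root
xi2 <- uniroot(eq2, interval = searchRange)$root

# Row j + 1 holds P(BIP = 0, 1, 2 | S(1) = j), from Spec_BIP, Sens_BIP, etc.
condRow <- function(idx) {
  pLow <- mean(BIPstar[idx] <= xi0); pHigh <- mean(BIPstar[idx] > xi2)
  c(pLow, 1 - pLow - pHigh, pHigh)
}
bipProb <- rbind(condRow(low), condRow(med), condRow(high))

# Step 5: one categorical BIP draw per subject, given the subject's S(1)
S1  <- ifelse(low, 0L, ifelse(high, 2L, 1L))[1:500]
BIP <- vapply(S1, function(s) sample(0:2, size = 1, prob = bipProb[s + 1, ]),
              integer(1))
```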

### Continuous \(\, X^*, S^*(1),\) and \(\, BIP^*\)
<ol>
<li> The user specifies $\corr(BIP^*, S^*(1))$.
<li> Assuming that $BIP^*$ follows an additive measurement error model, i.e., $BIP^* := S^*(1) + \delta$, where $\delta \sim N(0, \sigma^2_{\delta})$ with an unknown $\sigma^2_{\delta}$, and $\delta, \epsilon$, and $X^*$ are independent, solve the following equation for $\var \delta = \sigma^2_{\delta}$:
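$$
\corr(BIP^*, S^*(1)) = \sqrt{\frac{\var X^* + \var\epsilon}{\var X^* + \var\epsilon + \var \delta}},
$$
which gives $\sigma^2_{\delta} = (\var X^* + \var\epsilon)\left(\corr(BIP^*, S^*(1))^{-2} - 1\right)$.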
<li> For a subject with biomarker measurement $S^*_k(1)$, generate $BIP^*_k$ as $BIP^*_k = S^*_k(1) + \delta_k$, where $\delta_k$ is an independent draw from $N(0, \sigma^2_{\delta})$ with $\sigma^2_{\delta}$ obtained in Step 2.
</ol>
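
A short R sketch of these steps follows. It is illustrative only: the inputs and object names are hypothetical, and it assumes $\var X^* = \rho\sigma^2_{obs}$ and $\var\epsilon = (1-\rho)\sigma^2_{obs}$. Squaring the Step 2 equation and rearranging yields $\sigma^2_{\delta} = (\var X^* + \var\epsilon)(\corr(BIP^*, S^*(1))^{-2} - 1)$, which the code uses directly.

```r
set.seed(13)

# Hypothetical inputs (illustration only)
rho <- 0.9; sigma2obs <- 1
corBIP <- 0.7                              # user-specified corr(BIP*, S*(1))
n <- 5000

# Example S*(1) values; in practice these come from the simulated trial data
Xstar <- rnorm(n, mean = 0, sd = sqrt(rho * sigma2obs))
Sstar <- Xstar + rnorm(n, mean = 0, sd = sqrt((1 - rho) * sigma2obs))

# Step 2: closed-form solution for var(delta)
varS <- rho * sigma2obs + (1 - rho) * sigma2obs   # var X* + var epsilon
sigma2delta <- varS * (1 / corBIP^2 - 1)

# Step 3: BIP*_k = S*_k(1) + delta_k with independent delta_k ~ N(0, sigma2delta)
BIPstar <- Sstar + rnorm(n, mean = 0, sd = sqrt(sigma2delta))

# The sample correlation should be close to the specified value
cor(BIPstar, Sstar)
```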