---
title: "DYNATE"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{DYNATE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The `DYNATE` package accompanies the paper "Localizing Rare-Variant Association Regions via
Multiple Testing Embedded in an Aggregation Tree". The package is designed to pinpoint disease-associated rare-variant sets
with a controlled false discovery rate (FDR) in case-control studies.
```{r setup}
library(DYNATE)
```
Here, we show how to apply `DYNATE`.

The input must be a data frame in long format, where each row is one rare variant (SNP) for one sample. The data frame should contain six variables named `Sample.Name`, `Sample.Type`, `snpID`, `domainID`, `X1`, and `X2`. The variables `Sample.Name`, `snpID`, and `domainID` give the sample ID, SNP ID, and domain ID, respectively; `Sample.Type` gives the case/control status of each sample; `X1` and `X2` are covariates that can be included in the analysis. The `snp_dat` object below is a simulated toy data set with 6 variables and 210,454 rows. It contains 2,000 samples (1,000 cases and 1,000 controls) and a total of 16,281 SNPs residing in 2,000 domains.
```{r}
str(snp_dat)
```
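Before running the analysis on your own data, it can help to confirm that the data frame follows the layout described above. The chunk below is a minimal sketch of such a check. The helper `check_snp_input` and the toy data frame `toy_dat` are hypothetical and not part of `DYNATE`; the values in `toy_dat`, including the coding of `Sample.Type`, are made up for illustration and may differ from the coding used in `snp_dat`.

```{r}
# Column names required by the input format described above.
required_cols <- c("Sample.Name", "Sample.Type", "snpID", "domainID", "X1", "X2")

# Hypothetical helper (not part of DYNATE): stops with an informative
# message if any required column is missing.
check_snp_input <- function(dat) {
  stopifnot(is.data.frame(dat))
  missing_cols <- setdiff(required_cols, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up long-format example: one row per (sample, SNP) pair.
# The Sample.Type coding here is an assumption for illustration only.
toy_dat <- data.frame(
  Sample.Name = c("s1", "s1", "s2", "s3"),
  Sample.Type = c("case", "case", "control", "control"),
  snpID       = c(1, 2, 1, 3),
  domainID    = c(1, 1, 1, 2),
  X1          = c(0.3, 0.3, -1.2, 0.8),
  X2          = c(1, 1, 0, 0)
)

check_snp_input(toy_dat)

# Basic counts of the kind quoted above for snp_dat.
c(samples = length(unique(toy_dat$Sample.Name)),
  snps    = length(unique(toy_dat$snpID)),
  domains = length(unique(toy_dat$domainID)))
```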
First, we set the tuning parameters as follows. Please refer to the paper for the detailed tuning-parameter selection procedure.
```{r}
M <- 5 # leaf size
L <- 3 # layer number
alpha <- 0.05 # desired FDR
```
Second, we use the `Test_Leaf` function to construct leaves and generate leaf-level p-values for the case-control study.
```{r}
# Model that accounts for covariate effects.
# Sys.time() is called before and after to report the elapsed run time.
t1 <- Sys.time()
p_leaf <- Test_Leaf(snp_dat = snp_dat, thresh_val = M)
t2 <- Sys.time()
t2 - t1
```
In the output data frame `p_leaf`, each row corresponds to a rare variant (SNP), and the number of rows equals the number of SNPs considered (SNPs that belong to a leaf with p-value = 1 are excluded to maintain the stability of the algorithm). The data frame includes 5 variables. Among them, `L1` is the leaf ID; `pvals` is the leaf-level p-value; and `Test` is the name of the statistical test used to generate the leaf-level p-value (FET or score).
```{r}
str(p_leaf)
```
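Because `p_leaf` has one row per SNP while the p-values are defined per leaf, it is sometimes convenient to collapse the table to one row per leaf. The chunk below is a rough sketch of that idea in base R. It uses a made-up stand-in, `toy_leaf`, that contains only the three columns documented above (`L1`, `pvals`, `Test`); the values are invented and the real output contains further columns.

```{r}
# Made-up stand-in for a Test_Leaf result: SNP-level rows that share a
# leaf ID also share the leaf-level p-value and test name.
toy_leaf <- data.frame(
  L1    = c(1, 1, 1, 2, 2, 3),
  pvals = c(0.01, 0.01, 0.01, 0.40, 0.40, 0.002),
  Test  = c("FET", "FET", "FET", "score", "score", "FET")
)

# Collapse to one row per leaf.
leaf_level <- unique(toy_leaf[, c("L1", "pvals", "Test")])

# Attach the number of SNP-level rows that fall in each leaf.
snp_counts <- table(toy_leaf$L1)
leaf_level$n_snps <- as.vector(snp_counts[as.character(leaf_level$L1)])

leaf_level

# Number of leaves tested with each test.
table(leaf_level$Test)
```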
Finally, we use the function `DYNATE` to conduct dynamic and hierarchical testing based on the leaf-level p-values.
```{r}
out <- DYNATE(struct_map=p_leaf,L=L,alpha=alpha)
```
In the output data frame `out`, each row corresponds to a unique SNP detected by DYNATE. The variables `snpID`, `L1`, and `domainID` give the detected SNP ID, leaf ID, and domain ID, respectively; `Test` gives the name of the statistical test applied (FET or score); `pvals1` gives the leaf-level p-value; and `Layer` indicates the layer in which the SNP is detected.
```{r}
str(out)
```
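A common follow-up is to summarise the detections by layer and by domain. The chunk below sketches one way to do this in base R. The data frame `toy_out` is a made-up stand-in that uses the column names described above; its values are invented and do not come from `snp_dat`.

```{r}
# Made-up stand-in for a DYNATE result, using the documented column names.
toy_out <- data.frame(
  snpID    = c(11, 12, 13, 21, 22),
  L1       = c(1, 1, 1, 4, 4),
  domainID = c(1, 1, 1, 2, 2),
  Test     = c("FET", "FET", "FET", "score", "score"),
  pvals1   = c(0.001, 0.001, 0.001, 0.03, 0.03),
  Layer    = c(1, 1, 1, 2, 2)
)

# Number of detected SNPs in each layer.
snps_per_layer <- table(toy_out$Layer)
snps_per_layer

# Number of detected SNPs in each domain.
snps_per_domain <- tapply(toy_out$snpID, toy_out$domainID, length)
snps_per_domain

# Domains that contain at least one detected SNP.
detected_domains <- sort(unique(toy_out$domainID))
detected_domains
```

The same calls can be applied to `out` directly, since they rely only on the `snpID`, `domainID`, and `Layer` columns.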