--- title: "DYNATE" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{DYNATE} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The `DYNATE` package accompanies the paper "Localizing Rare-Variant Association Regions via Multiple Testing Embedded in an Aggregation Tree". The package is designed to pinpoint the disease-associated rare variant sets with a controlled false discovery rate in case-control study. ```{r setup} library(DYNATE) ``` Here, we show how to apply `DYNATE`. \par We require the input data is a data frame with a long format: each row is a rare variant (SNP) per sample. Specifically, the input data should contains 6 variables with name `Sample.Name`, `Sample.Type`, `snpID`, `domainID`, `X1`, `X2`. Where variables `Sample.Name`, `snpID` and `domainID` indicate the Sample ID, SNP ID, and domain ID, respectively; Variable `Sample.Type` indicates the case/control status of each sample; Variables `X1` and `X2` are covariates that could be considered in the analysis. The `snp_dat` below is a toy simulated data with 6 variables and 210,454 rows. The data contains 2,000 samples (1,000 cases and 1,000 controls). In total 16,281 SNPs reside in 2,000 domains are considered in `snp_dat`. ```{r} str(snp_dat) ``` First, we set the tuning parameters as follows. Please refer to the paper for detailed tuning parameters selection procedure. ```{r} M <- 5 # leaf size L <- 3 # layer number alpha <- 0.05 # desired FDR ``` Second, we use `Test_Leaf` function to construct leaves and generate leaf P-values for the case-control study. ```{r} # Model consider covariates effect: t1 <- Sys.time() p_leaf <- Test_Leaf(snp_dat=snp_dat,thresh_val=M) t2 <- Sys.time() t2-t1 ``` \par In the output data frames `p_leaf`, each row links to a rare variant (SNPs), and the number of rows equals the number of rare variants (SNPs) we considered (SNPs that link to a leaf with p-value=1 are excluded for maintaining the algorithm stability). The data frame includes 5 variables. In the data frame, variable `L1` is leaf ID; variable `pvals` is the leaf level p values; variable `Test` indicates the name of the statistical test to generate the leaf level p values (FET or score). ```{r} str(p_leaf) ``` \par Finally, we use the function `DYNATE` to conduct dynamic and hierarchical testing based on the leaf level p values. ```{r} out <- DYNATE(struct_map=p_leaf,L=L,alpha=alpha) ``` In the output data frames `out`, each row links to a unique SNP that is detected by DYNATE. The variables `snpID`, `L1`, and `domainID` link to the detected SNP ID, leaf ID, and domain ID, respectively; Variable `Test` links to the name of the statistical test we applied (FET or score); Variable `pvals1` links to the leaf level p-values; Variable `Layer` indicates in which layer the SNP is detected. ```{r} str(out) ```