\documentclass[12pt]{article} \usepackage[paper=letterpaper,nohead,margin=1in]{geometry} \usepackage[utf8]{inputenc} \usepackage[T1]{fontenc} %\usepackage{ragged2e} %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{survSNP} %\VignetteDepends{lattice,foreach} \usepackage{url} \setlength{\parindent}{0pt} \setlength{\parskip}{2ex} \title{{\tt survSNP}: Power and Sample Size Calculations for SNP Association Studies with Censored Time to Event Outcomes} \author{Kouros Owzar \and Zhiguo Li \and Nancy Cox \and Sin-Ho Jung \and Chanhee Yi} \begin{document} \maketitle \section{Introduction} This vignette serves as a tutorial for using the {\tt survSNP} extension package for conducting power and sample size calculations for SNP association studies with censored time to event outcomes. We begin by loading the package. <>= library(survSNP) @ \section{Example 1} In this example, we conduct power calculations for a SNP for which the relative allelic frequency for the risk allele {\it B} is $q=0.1$. The validation data set is assumed to consist of $n=500$ patients. The event rate (e.g., death rate) is assumed to be $\rho=0.75$ (i.e., \Sexpr{500*0.75} events among 500 patients). We hypothesize an effect size (genotype hazard ratio) $\Delta=1.25$. Finally, we assume that the median in the population is 1 unit of time (i.e., $P(\tilde{T}>t)=0.5$). To find the asymptotic power, at the two-sided $\alpha=0.05$ level, the {\tt sim.snp.expsurv.power} function is used. <>= res1<-sim.snp.expsurv.power(1.25, n=500, raf=0.1, erate=0.75, pilm=0.5, lm=1, B=0, model="additive",test="additive",alpha=0.05) @ Below, we show some of the relevant columns from the output. <>= res1[,c("n","GHR","erate","raf","B","alpha","pow0","pow","powB")] @ The asymptotic power, at the two-sided $\alpha=0.05$ level, is \Sexpr{signif(res1$pow0,2)}. Note that we are assuming that the median for the survival function in the population is 1 unit of time (that is why we have set {\tt pilm=0.5} and {\tt lm=1}). If we had desired to set power the study based on a population whose 0.7 quantile is say 2 units of time, we would have set {\tt pilm=0.7} and {\tt lm=2}. The current version of the package supports power calculations for additive models. \par By default, the asymptotic power based on the approximate asymptotic variance formula is computed. The corresponding column name is {\tt pow0} in the output shown above. The asymptotic power based on the exact variance formula (column {\tt pow}) or the empirical power (column {\tt powB}) can be computed by setting the arguments {\tt exactvar=TRUE} or {\tt B=b} where {\tt b} is a positive integer. Be sure to set a random number generation seed when calculating the empirical or the asymptotic power based on the exact variance formula to get reproducible results. We illustrate these features next. This example is based on $B=10000$ simulation replicates. <>= set.seed(123) res1b<-sim.snp.expsurv.power(1.25, n=500, raf=0.25, erate=0.75, pilm=0.5, lm=1, exactvar=TRUE,B=10000, model="additive",test="additive",alpha=0.05) @ The results are shown below. <>= res1b[,c("n","GHR","erate","raf","B","alpha","pow0","pow","powB")] @ \section{Example 2} For power calculations for SNP association studies with censored outcomes, it is generally desired to vary the effect size, sample size, event rate and the relative minor allele frequency. These are denoted by the variable names {\tt GHRs}, {\tt ns}, {\tt rafs} and {\tt erates} in this example. <>= GHRs<-seq(1.05,1.5,by=0.05) ns<-c(100,500,700) rafs<-c(0.1,0.3,0.5) erates<-c(0.5,0.7,0.9) @ The function {\tt survSNP.power.table} can be used to generate power calculations for this combination of parameters. This is a wrapper function for {\tt sim.snp.expsurv.power}. <>= res2<-survSNP.power.table(GHRs,ns,rafs,erates, pilm=0.5,lm=1, model="additive",test="additive", alpha=0.05) @ \par We print selected columns from the first three rows of the previous object next. <>= res2[1:3,c("n","GHR","erate","raf","pow0","pow","powB")] @ Next, we will consider illustrating the previous set of results using the {\tt lattice} package. The power is illustrated in Figure \ref{fig:ex2}. A revised version of this illustration limiting the presentation to $n=$\Sexpr{ns[1]} and displaying the type I error rate $\alpha$ is shown in Figure \ref{fig:ex2b}. \begin{figure}[b] \centering <>= KEY=paste("q=",levels(factor(res2$raf)),sep="") KEY<-list(lines=list(col=1:length(KEY),lty=1:length(KEY)), text=list(labels=paste("q=",levels(factor(res2$raf)),sep="")), column=3) print(xyplot(pow0~GHR|factor(erate)*factor(n),group=factor(raf), data=res2,type="l",lty=KEY$lines$lty,col=KEY$lines$col, key=KEY, xlab="Genotype Hazard Ratio",ylab="Power")) @ \caption{Power Illustration for Example 1.} \label{fig:ex2} \end{figure} \begin{figure}[b] \centering <>= print(xyplot(pow0~GHR|factor(erate),group=factor(raf), data=subset(res2,n==ns[1]), type="l",lty=KEY$lines$lty,col=KEY$lines$col, key=KEY, xlab="Genotype Hazard Ratio",ylab="Power", sub=paste("n=",ns[1],", alpha=",round(unique(res2$alpha),2)))) @ \caption{Power Illustration for Example 1 (restricted to $n=$\Sexpr{ns[1]}).} \label{fig:ex2b} \end{figure} \section{Example 3} We can also use the {\tt xtable} package to summarize the results in a table. For this illustration, we consider a subset of the rows from Example 2. <>= cols<-c("n","GHR","erate","raf","pow0") res3<-subset(res2,GHR==1.25&raf==0.3&n==500,select=cols) res3 @ The corresponding table generated by {\tt xtable} is shown in Table~\ref{tab:ex3}. \begin{table}[h] \centering <>= print(xtable(res3,digits=c(0,0,1,1,1,3)), include.rownames=FALSE,floating=FALSE) @ \caption{Tabular summary of the results from Example 2} \label{tab:ex3} \end{table} \section{Miscellaneous} The function {\tt survSNP.power.table} is a straightforward implementation of nested {\tt foreach} loops. In oder to maintain reproducibility of the randomly generated simulations, the loops are executed sequentially. A vignette in the {\tt doRNG} package provides simple examples of how the code could be modified to run in parallel, while maintaining reproducible random number generation \url{https://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf}. [The {\tt doRNG} package itself is not required to implement this functionality]. Note that some consideration should be given as to which parameter(s) to parallelize across, in order to maximize use of available processing cores and minimize I/O overhead. \par The {\tt tikzDevice} and {\tt latticeExtra} packages can be used to considerably refine the illustrations. The software development repository provides some examples \url{bitbucket.org/kowzar/survsnp/}. \par The session information for this vignette is provided in Table~\ref{tab:session} \begin{table}[b] <>= print(toLatex(sessionInfo())) @ \caption{Session Information} \label{tab:session} \end{table} \end{document}