%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Chapter 02, Horton et al. using mosaic}
%\VignettePackage{Sleuth3}
\documentclass[11pt]{article}

\usepackage[margin=1in,bottom=.5in,includehead,includefoot]{geometry}
\usepackage{hyperref}
\usepackage{language}
\usepackage{alltt}
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}

%% Now begin customising things. See the fancyhdr docs for more info.

\chead{}
\lhead[\sf \thepage]{\sf \leftmark}
\rhead[\sf \leftmark]{\sf \thepage}
\lfoot{}
\cfoot{Statistical Sleuth in R: Chapter 2}
\rfoot{}

\newcounter{myenumi}
\newcommand{\saveenumi}{\setcounter{myenumi}{\value{enumi}}}
\newcommand{\reuseenumi}{\setcounter{enumi}{\value{myenumi}}}

\pagestyle{fancy}

\def\R{{\sf R}}
\def\Rstudio{{\sf RStudio}}
\def\RStudio{{\sf RStudio}}
\def\term#1{\textbf{#1}}
\def\tab#1{{\sf #1}}

\usepackage{relsize}

\newlength{\tempfmlength}
\newsavebox{\fmbox}
\newenvironment{fmpage}[1]
  {
  \medskip
  \setlength{\tempfmlength}{#1}
  \begin{lrbox}{\fmbox}
    \begin{minipage}{#1}
      \vspace*{.02\tempfmlength}
      \hfill
      \begin{minipage}{.95 \tempfmlength}}
  {\end{minipage}\hfill
      \vspace*{.015\tempfmlength}
    \end{minipage}\end{lrbox}\fbox{\usebox{\fmbox}}
  \medskip
  }

\newenvironment{boxedText}[1][.98\textwidth]%
{%
\begin{center}
\begin{fmpage}{#1}
}%
{%
\end{fmpage}
\end{center}
}

\newenvironment{boxedTable}[2][tbp]%
{%
\begin{table}[#1]
  \refstepcounter{table}
  \begin{center}
\begin{fmpage}{.98\textwidth}
  \begin{center}
    \sf \large Box~\expandafter\thetable. #2
\end{center}
\medskip
}%
{%
\end{fmpage}
\end{center}
\end{table} % need to do something about exercises that follow boxedTable
}

\newcommand{\cran}{\href{http://www.R-project.org/}{CRAN}}

\title{The Statistical Sleuth in R: \\
Chapter 2}

\author{
Linda Loi \and Ruobing Zhang \and Kate Aloisio \and Nicholas J.
Horton\thanks{Department of Mathematics and Statistics, Smith College, nhorton@smith.edu}
}

\date{\today}

\begin{document}

\maketitle
\tableofcontents

%\parindent=0pt

<<>>=
require(knitr)
opts_chunk$set(
  dev="pdf",
  fig.path="figures/",
  fig.height=3,
  fig.width=4,
  out.width=".47\\textwidth",
  fig.keep="high",
  fig.show="hold",
  fig.align="center",
  prompt=TRUE,  # show the prompts; but perhaps we should not do this
  comment=NA    # turn off commenting of output (but perhaps we should not do this either)
  )
@

<<>>=
print.pval = function(pval) {
  threshold = 0.0001
  return(ifelse(pval < threshold, paste("p<", sprintf("%.4f", threshold), sep=""),
                ifelse(pval > 0.1, paste("p=",round(pval, 2), sep=""),
                       paste("p=", round(pval, 3), sep=""))))
}
@

<<>>=
require(Sleuth3)
require(mosaic)
trellis.par.set(theme=col.mosaic())  # get a better color scheme
set.seed(123)
# this allows for code formatting inline.  Use \Sexpr{'function(x,y)'}, for example.
knit_hooks$set(inline = function(x) {
  if (is.numeric(x)) return(knitr:::format_sci(x, 'latex'))
  x = as.character(x)
  h = knitr:::hilight_source(x, 'latex', list(prompt=FALSE, size='normalsize'))
  h = gsub("([_#$%&])", "\\\\\\1", h)
  h = gsub('(["\'])', '\\1{}', h)
  gsub('^\\\\begin\\{alltt\\}\\s*|\\\\end\\{alltt\\}\\s*$', '', h)
})
showOriginal=FALSE
showNew=TRUE
@

\section{Introduction}

This document describes how to undertake the analyses introduced as examples in the Third Edition of the \emph{Statistical Sleuth} (2013) by Fred Ramsey and Dan Schafer.
More information about the book can be found at \url{http://www.proaxis.com/~panorama/home.htm}.
This file, as well as the associated \pkg{knitr} reproducible analysis source file, can be found at \url{http://www.math.smith.edu/~nhorton/sleuth3}.

This work leverages initiatives undertaken by Project MOSAIC (\url{http://www.mosaic-web.org}), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum.
In particular, we utilize the \pkg{mosaic} package, which was written to simplify the use of R for introductory statistics courses.
A short summary of the R needed to teach introductory statistics can be found in the \pkg{mosaic} package vignette (\url{http://cran.r-project.org/web/packages/mosaic/vignettes/MinimalR.pdf}).

To use a package within R, it must be installed (one time) and loaded (each session).
The package can be installed using the following command:

<<>>=
install.packages('mosaic')  # note the quotation marks
@

Once the package is installed, it can be loaded by running the command:

<<>>=
require(mosaic)
@

This needs to be done once per session.

In addition, the data files for the \emph{Sleuth} case studies can be accessed by installing the \pkg{Sleuth3} package.

<<>>=
install.packages('Sleuth3')  # note the quotation marks
@

<<>>=
require(Sleuth3)
@

We also set some options to improve the legibility of graphs and output.

<<>>=
trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
options(digits=3, show.signif.stars=FALSE)
@

The specific goal of this document is to demonstrate how to use R to calculate the quantities described in Chapter 2: Inference Using \emph{t}-Distributions.

\section{Evidence Supporting Darwin's Theory of Natural Selection}

Do birds evolve to adapt to their environments?
That's the question being addressed by Case Study 2.1 in the \emph{Sleuth}.

\subsection{Statistical summary and graphical display}

We begin by reading the data and summarizing the variables.

<<>>=
summary(case0201)
fav = favstats(Depth ~ Year, data=case0201); fav
@

A total of \Sexpr{nrow(case0201)} subjects are included in the data: \Sexpr{nrow(subset(case0201, Year=="1976"))} are finches that were caught in 1976 and \Sexpr{nrow(subset(case0201, Year=="1978"))} are finches that were caught in 1978.

The following figure replicates Display 2.1 on page 30.
<<>>=
bwplot(Year ~ Depth, data=case0201)
@

<<>>=
densityplot(~ Depth, groups=Year, auto.key=TRUE, data=case0201)
@

The distributions are approximately normal, with some evidence of a long left tail.

\subsection{Inferential procedures (two-sample t-test)}

First, we calculate the pooled SD and the standard error of the difference between the two sample averages (page 41, Display 2.8).

<<>>=
# Calculate pooled SD
n1 = fav["1976", "n"]; n1
n2 = fav["1978", "n"]; n2
s1 = fav["1976", "sd"]; s1
s2 = fav["1978", "sd"]; s2
Sp = sqrt(((n1-1)*(s1)^2+(n2-1)*(s2)^2)/(n1+n2-2)); Sp
# Calculate standard error of the difference in means
SE = Sp*sqrt(1/n1+1/n2); SE
@

So the pooled SD is \Sexpr{round(Sp, 2)} and the standard error is \Sexpr{round(SE, 2)}.

Based on this information, we can construct a 95\% confidence interval (page 43, Display 2.9).

<<>>=
Y1 = fav["1976", "mean"]; Y1
Y2 = fav["1978", "mean"]; Y2
Yd = Y2-Y1; Yd
df = n1+n2-2; df
tstar = qt(0.975, df); tstar   # t multiplier for a 95% interval
hw = tstar*SE; hw              # half-width of the interval
lower = Yd-hw; lower
upper = Yd+hw; upper
@

So the 95\% confidence interval for the difference between means is (\Sexpr{round(lower, 1)}, \Sexpr{round(upper, 1)}).

Now we want to calculate the $t$-statistic and $p$-value (as shown on page 46, Display 2.10).

<<>>=
tstats = (Yd-0)/SE; tstats   # The hypothesis difference=0
onepval = 1-pt(tstats, df); onepval
twopval = 2*onepval; twopval
@

The one-sided $p$-value is approximately \Sexpr{round(onepval, 2)} and the two-sided $p$-value is approximately \Sexpr{round(twopval, 2)}.

We can get the results of the ``Summary of Statistical Findings'' (page 29) by using the following code:

<<>>=
t.test(Depth ~ Year, var.equal=TRUE, data=case0201)
confint(lm(Depth ~ Year, data=case0201))
@

\section{Anatomical Abnormalities Associated with Schizophrenia}

Is the size of a region of the brain related to the development of schizophrenia?
That's the question being addressed by Case Study 2.2 in the \emph{Sleuth}.
\subsection{Statistical summary and graphical display}

We begin by reading the data and summarizing the variables.

<<>>=
summary(case0202)
@

The data include \Sexpr{nrow(case0202)} pairs of twins (\Sexpr{2*nrow(case0202)} subjects in total); in each pair, one twin has schizophrenia and the other does not.
So there are \Sexpr{nrow(case0202["Affected"])} affected subjects and \Sexpr{nrow(case0202["Unaffected"])} unaffected subjects.

The difference in the volume of the left hippocampus within each pair of twins (unaffected minus affected) is summarized as follows:

<<>>=
case0202 = transform(case0202, DIFF = Unaffected - Affected)
favstats(~ DIFF, data=case0202)
@

This matches the results on page 31, Display 2.2.

<<>>=
densityplot(~ DIFF, data=case0202)
@

\subsection{Inferential procedures (paired t-test)}

We want to calculate the paired $t$-statistic, its two-sided $p$-value, and a 95\% confidence interval for the mean difference.

<<>>=
# Calculate the t-statistic
difmean = mean(~ DIFF, data=case0202); difmean
difsd = sd(~ DIFF, data=case0202); difsd
difn = nrow(case0202); difn
difSE = difsd/sqrt(difn); difSE
tscore = (difmean-0)/difSE; tscore   # hypothesis difference=0
twopvalue = 2*(1-pt(tscore, difn-1)); twopvalue
# Construct confidence interval
tstar = qt(0.975, difn-1); tstar
schizolower = difmean - tstar*difSE; schizolower
schizoupper = difmean + tstar*difSE; schizoupper
@

So the two-sided $p$-value is approximately \Sexpr{round(twopvalue, 3)} and the 95\% confidence interval is (\Sexpr{round(schizolower, 2)}, \Sexpr{round(schizoupper, 2)}).

We can also get the results displayed on page 32 by conducting a paired $t$-test:

<<>>=
with(case0202, t.test(Unaffected, Affected, paired=TRUE))
@

\end{document}