\documentclass[nojss]{jss}

%% need no \usepackage{Sweave}
\usepackage{rotating}

\newcommand{\given}{\, | \,}

\title{Residual-Based Shadings in \pkg{vcd}}
\Plaintitle{Residual-Based Shadings in vcd}
\author{Achim Zeileis, David Meyer, \textnormal{and} Kurt Hornik\\Wirtschaftsuniversit\"at Wien, Austria}
\Plainauthor{Achim Zeileis, David Meyer, Kurt Hornik}

\Abstract{
  This vignette is a companion paper to \cite{vcd:Zeileis+Meyer+Hornik:2007},
  which introduces several extensions to residual-based shadings for enhancing
  mosaic and association plots. The paper (a)~introduces perceptually uniform
  Hue-Chroma-Luminance (HCL) palettes and (b)~incorporates the result of an
  associated significance test into the shading. Here, we show how the examples
  can be easily reproduced using the \pkg{vcd} package.
}
\Keywords{association plots, conditional inference, contingency tables, HCL colors, HSV colors, mosaic plots}

\Address{
  Achim Zeileis\\
  E-mail: \email{Achim.Zeileis@R-project.org}\\
  David Meyer\\
  E-mail: \email{David.Meyer@R-project.org}\\
  Kurt Hornik\\
  E-mail: \email{Kurt.Hornik@R-project.org}\\
}

\begin{document}

%\VignetteIndexEntry{Residual-Based Shadings in vcd}
%\VignetteDepends{vcd,colorspace,MASS,grid,HSAUR3}
%\VignetteKeywords{association plots, conditional inference, contingency tables, HCL colors, HSV colors, mosaic plots}
%\VignettePackage{vcd}

\SweaveOpts{engine=R,eps=FALSE}

\section{Introduction} \label{sec:intro}

In this vignette, we show how all empirical examples from
\cite{vcd:Zeileis+Meyer+Hornik:2007} can be reproduced in \proglang{R}
\citep[\mbox{\url{http://www.R-project.org/}}]{vcd:R:2006}, in particular
using the package \pkg{vcd} \citep{vcd:Meyer+Zeileis+Hornik:2006}.
Additionally, the packages \pkg{MASS} \citep[see][]{vcd:Venables+Ripley:2002},
\pkg{grid} \citep[see][]{vcd:Murrell:2002} and \pkg{colorspace}
\citep{vcd:Ihaka:2004} are employed.
All are automatically loaded together with \pkg{vcd}:

<>=
library("grid")
library("vcd")
rseed <- 1071
@

Furthermore, we define \code{rseed}, which is used as the random seed
to make the results of the permutation tests (conditional inference)
below exactly reproducible. In the following, we focus on the \proglang{R}
code and output---for background information on the methods and the data sets,
please consult \cite{vcd:Zeileis+Meyer+Hornik:2007}.

\section{Arthritis data} \label{sec:arthritis}

First, we take a look at the association of treatment type and improvement
in the \code{Arthritis} data. The data set can be loaded and brought into
tabular form via:

<>=
data("Arthritis", package = "vcd")
(art <- xtabs(~ Treatment + Improved, data = Arthritis, subset = Sex == "Female"))
@

Two basic exploratory views of such a 2-way table are mosaic plots and
association plots. They can be generated via \code{mosaic()} and \code{assoc()}
from \pkg{vcd}, respectively. For technical documentation of these functions,
please see \cite{vcd:Meyer+Zeileis+Hornik:2006b}. When no further arguments
are supplied, as in

<>=
mosaic(art)
assoc(art)
@

this yields the plain plots without any color shading; see
Figure~\ref{fig:classic}. Both indicate that there are more patients in the
treatment group with marked improvement and fewer without improvement than
would be expected under independence---and vice versa in the placebo group.

\setkeys{Gin}{width=\textwidth}
\begin{figure}[b!]
\begin{center}
<>=
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
pushViewport(viewport(layout.pos.col=1, layout.pos.row=1))
mosaic(art, newpage = FALSE, margins = c(2.5, 4, 2.5, 3))
popViewport()
pushViewport(viewport(layout.pos.col=2, layout.pos.row=1))
assoc(art, newpage = FALSE, margins = c(5, 2, 5, 4))
popViewport(2)
@
\caption{Classic mosaic and association plot for the arthritis data.}
\label{fig:classic}
\end{center}
\end{figure}

For 2-way tables, \cite{vcd:Zeileis+Meyer+Hornik:2007} suggest extending
the shading of \cite{vcd:Friendly:1994} to also visualize the outcome of
an independence test---either using the sum of squares of the Pearson
residuals as the test statistic or their absolute maximum. Both statistics
and their corresponding (approximate) permutation distribution can easily
be computed using the function \code{coindep_test()}. Its arguments are
a contingency table, a specification of margins used for conditioning
(only for conditional independence models), a functional for aggregating
the Pearson residuals (or alternatively the raw counts) and the number of
permutations that should be drawn. The conditional table needs to be a
2-way table and the default is to compute the maximum statistic (absolute
maximum of Pearson residuals). For the Arthritis data, both the maximum test

<>=
set.seed(rseed)
(art_max <- coindep_test(art, n = 5000))
@

and the sum-of-squares test indicate a significant departure from
independence:

<>=
ss <- function(x) sum(x^2)
set.seed(rseed)
coindep_test(art, n = 5000, indepfun = ss)
@

Thus, it can be concluded that the treatment is effective and leads to
significantly more improvement than the placebo.

The classic views from Figure~\ref{fig:classic} and the inference above
can also be combined, e.g., using the maximum shading that highlights the
cells in an association or mosaic plot when the associated residuals exceed
critical values of the maximum test (by default at levels 90\% and 99\%).
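In formulas, and using notation introduced here purely for illustration
(it does not appear in the code above), let $n_{ij}$ denote the observed
counts of the 2-way table and $\hat n_{ij} = n_{i+} n_{+j} / n_{++}$ the
expected counts under independence. The Pearson residuals and the two test
statistics discussed above can then be written as
\[
  r_{ij} = \frac{n_{ij} - \hat n_{ij}}{\sqrt{\hat n_{ij}}}, \qquad
  X^2 = \sum_{i, j} r_{ij}^2, \qquad
  M = \max_{i, j} |r_{ij}| .
\]
The maximum shading colors a cell when $|r_{ij}|$ exceeds a critical value
taken from the permutation distribution of $M$, so that the cells driving a
significant result can be read off the plot directly.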
To compare this shading (using either HSV or HCL colors) with the Friendly
shading (using HSV colors), we generate all three versions of the mosaic plot:

<>=
mosaic(art, gp = shading_Friendly(lty = 1, eps = NULL))
mosaic(art, gp = shading_hsv, gp_args = list(
  interpolate = art_max$qdist(c(0.9, 0.99)), p.value = art_max$p.value))
set.seed(rseed)
mosaic(art, gp = shading_max, gp_args = list(n = 5000))
@

The results are shown in the upper row of Figure~\ref{fig:shadings}.
The last plot could also have been generated analogously to the second plot
using \code{shading_hcl()} instead of \code{shading_hsv()}---\code{shading_max()}
is simply a wrapper function which performs the inference and then visualizes
it based on HCL colors.

\section{Piston rings data} \label{sec:pistonrings}

Instead of bringing out the result of the maximum test in the shading, we
could also use a sum-of-squares shading that visualizes the result of the
sum-of-squares test. As an illustration, we use the \code{pistonrings} data
from the \pkg{HSAUR3} package \citep{vcd:Everitt+Hothorn:2006}, giving the
number of piston ring failures in different legs of different compressors
at an industrial plant:

<>=
data("pistonrings", package = "HSAUR3")
pistonrings
@

\begin{sidewaysfigure}[p]
\begin{center}
<>=
mymar <- c(1.5, 0.5, 0.5, 2.5)
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 3)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
mosaic(art, margins = mymar, newpage = FALSE,
  gp = shading_Friendly(lty = 1, eps = NULL))
popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
mosaic(art, gp = shading_hsv, margins = mymar, newpage = FALSE,
  gp_args = list(interpolate = art_max$qdist(c(0.9, 0.99)), p.value = art_max$p.value))
popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 3))
set.seed(rseed)
mosaic(art, gp = shading_max, margins = mymar, newpage = FALSE,
  gp_args = list(n = 5000))
popViewport()
pushViewport(viewport(layout.pos.row = 2,
layout.pos.col = 1))
mosaic(pistonrings, margins = mymar, newpage = FALSE,
  gp = shading_Friendly(lty = 1, eps = NULL, interpolate = c(1, 1.5)))
popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 2))
mosaic(pistonrings, gp = shading_hsv, margins = mymar, newpage = FALSE,
  gp_args = list(p.value = 0.069, interpolate = c(1, 1.5)))
popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 3))
mosaic(pistonrings, gp = shading_hcl, margins = mymar, newpage = FALSE,
  gp_args = list(p.value = 0.069, interpolate = c(1, 1.5)))
popViewport(2)
@
\includegraphics[width=.9\textwidth,keepaspectratio]{residual-shadings-shadings}
\caption{Upper row: Mosaic plots for the arthritis data with Friendly shading
(left), HSV maximum shading (middle), HCL maximum shading (right).
Lower row: Mosaic plots for the piston rings data with fixed user-defined
cut-offs 1 and 1.5 and Friendly shading (left), HSV sum-of-squares shading
(middle), HCL sum-of-squares shading (right).}
\label{fig:shadings}
\end{center}
\end{sidewaysfigure}

Although there seems to be some slight association between the leg
(especially center and South) and the compressor (especially numbers 1 and 4),
there is no significant deviation from independence:

<>=
set.seed(rseed)
coindep_test(pistonrings, n = 5000)
set.seed(rseed)
(pring_ss <- coindep_test(pistonrings, n = 5000, indepfun = ss))
@

This can also be brought out graphically in a shaded mosaic plot by enhancing
the Friendly shading (based on the user-defined cut-offs 1 and 1.5, here)
to use a less colorful palette, either based on HSV or HCL colors:

<>=
mosaic(pistonrings, gp = shading_Friendly(lty = 1, eps = NULL, interpolate = c(1, 1.5)))
mosaic(pistonrings, gp = shading_hsv, gp_args = list(p.value = pring_ss$p.value, interpolate = c(1, 1.5)))
mosaic(pistonrings, gp = shading_hcl, gp_args = list(p.value = pring_ss$p.value, interpolate = c(1, 1.5)))
@

The resulting plots can be found in the lower row of Figure~\ref{fig:shadings}.
The default in \code{shading_hcl()} and \code{shading_hsv()} is to use the
asymptotic $p$~value, hence we set it explicitly to the permutation-based
$p$~value computed above.

\section{Alzheimer and smoking} \label{sec:alzheimer}

For illustrating that the same ideas can be employed for visualizing
(conditional) independence in multi-way tables,
\cite{vcd:Zeileis+Meyer+Hornik:2007} use a 3-way and a 4-way table.
The former is taken from a case-control study of smoking and Alzheimer's
disease (stratified by gender). The data set is available in \proglang{R}
in the package \pkg{coin} \citep{vcd:Hothorn+Hornik+VanDeWiel:2006}.

<>=
data("alzheimer", package = "coin")
alz <- xtabs(~ smoking + disease + gender, data = alzheimer)
alz
@

\begin{figure}[b!]
\begin{center}
<>=
set.seed(rseed)
cotabplot(~ smoking + disease | gender, data = alz, panel = cotab_coindep,
  panel_args = list(n = 5000))
@
\caption{Conditional mosaic plot with double maximum shading for conditional
independence of smoking and disease given gender.}
\label{fig:alz}
\end{center}
\end{figure}

To assess whether smoking behavior and disease status are conditionally
independent given gender, \cite{vcd:Zeileis+Meyer+Hornik:2007} use three
different types of test statistics: double maximum (maximum of maximum
statistics in the two strata), maximum sum of squares (maximum of
sum-of-squares statistics), and sum of squares (sum of sum-of-squares
statistics). All three can be computed and assessed via permutation methods
using the function \code{coindep_test()}:

<>=
set.seed(rseed)
coindep_test(alz, 3, n = 5000)
set.seed(rseed)
coindep_test(alz, 3, n = 5000, indepfun = ss)
set.seed(rseed)
coindep_test(alz, 3, n = 5000, indepfun = ss, aggfun = sum)
@

The conditional mosaic plot in Figure~\ref{fig:alz} shows clearly that the
association of smoking and disease is present only in the group of male
patients.
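The three statistics computed above differ only in how the Pearson residuals
are aggregated within and across strata. As a sketch, in notation introduced
here for illustration only, let $r_{ij \given k}$ denote the Pearson residual
of cell $(i, j)$ in stratum $k$ under conditional independence. The double
maximum, maximum sum-of-squares, and sum-of-squares statistics are then
\[
  \max_{k} \max_{i, j} |r_{ij \given k}|, \qquad
  \max_{k} \sum_{i, j} r_{ij \given k}^2, \qquad
  \sum_{k} \sum_{i, j} r_{ij \given k}^2 ,
\]
corresponding, in this order, to the three calls to \code{coindep_test()}
above: the default, \code{indepfun = ss}, and \code{indepfun = ss} combined
with \code{aggfun = sum}. The first of these, the double maximum, underlies
the shading in Figure~\ref{fig:alz}.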
The double maximum shading employed allows for identification of the male
heavy smokers as the cells `responsible' for the dependence: other dementias
are more frequent and Alzheimer's disease less frequent in this group than
expected under independence. Interestingly, there seems to be another large
residual for the light smoker group ($<$10 cigarettes) and Alzheimer's
disease---however, this is significant only at the 10\% level and not at the
1\% level like the other two cells.

<>=
<>
@

\section{Corporal punishment of children}

As a 4-way example, data from a study of the Gallup Institute in Denmark in
1979 about the attitude of a random sample of 1,456 persons towards corporal
punishment of children are used. The contingency table comprises four margins:
memory of punishments as a child (yes/no), attitude as a binary variable
(approval of ``moderate'' punishment or ``no'' approval), highest level of
education (elementary/secondary/high), and age group (15--24, 25--39,
$\ge$40 years).

<>=
data("Punishment", package = "vcd")
pun <- xtabs(Freq ~ memory + attitude + age + education, data = Punishment)
ftable(pun, row.vars = c("age", "education", "memory"))
@

It is of interest whether there is an association between memories of corporal
punishments as a child and attitude towards punishment of children as an adult,
controlling for age and education. All three test statistics already used above
confirm that memories and attitude are conditionally associated:

\setkeys{Gin}{width=\textwidth}
\begin{figure}[t!]
\begin{center}
<>=
set.seed(rseed)
cotabplot(~ memory + attitude | age + education, data = pun, panel = cotab_coindep,
  n = 5000, type = "assoc", test = "maxchisq", interpolate = 1:2)
@
\caption{Conditional association plot with maximum sum-of-squares shading for
conditional independence of memory and attitude given age and education.}
\label{fig:pun}
\end{center}
\end{figure}

\setkeys{Gin}{width=\textwidth}
\begin{figure}[t!]
\begin{center}
<>=
set.seed(rseed)
cotabplot(~ memory + attitude | age + education, data = pun, panel = cotab_coindep,
  n = 5000, type = "mosaic", test = "maxchisq", interpolate = 1:2)
@
\caption{Conditional mosaic plot with maximum sum-of-squares shading for
conditional independence of memory and attitude given age and education.}
\label{fig:pun2}
\end{center}
\end{figure}

<>=
set.seed(rseed)
coindep_test(pun, 3:4, n = 5000)
set.seed(rseed)
coindep_test(pun, 3:4, n = 5000, indepfun = ss)
set.seed(rseed)
coindep_test(pun, 3:4, n = 5000, indepfun = ss, aggfun = sum)
@

Graphically, this dependence can be brought out using conditional association
or mosaic plots as shown in Figures~\ref{fig:pun} and~\ref{fig:pun2},
respectively. Both reveal an association between memories and attitude for the
lowest education group (first column) and highest age group (last row):
experienced violence seems to engender violence again as there are fewer adults
who disapprove of punishment in the group with memories of punishments than
expected under independence. For the remaining four age-education groups, there
seems to be no association: all residuals of the conditional independence model
are very close to zero in these cells. The figures employ the maximum
sum-of-squares shading with user-defined cut-offs 1 and 2, chosen to be within
the range of the residuals. The full-color palette is used only for those
strata associated with a sum-of-squares statistic significant at the (overall)
5\% level; the reduced-color palette is used otherwise. This highlights that
the dependence pattern is significant only for the middle and high age groups
in the low education column. The other panels in the first column and last row
also show a similar dependence pattern; however, it is not significant at the
5\% level and is hence graphically down-weighted by using reduced color.

<>=
<>
@

<>=
<>
@

\bibliography{vcd}

\end{document}