%\VignetteIndexEntry{Computing Cumulative Incidence Functions with the etmCIF Function}

\documentclass{article}

\usepackage{amsmath, amssymb}
\usepackage{graphicx}
\usepackage{url}
\usepackage[pdftex]{color}
\usepackage[round]{natbib}

\SweaveOpts{keep.source=TRUE,eps=FALSE}

\title{Computing Cumulative Incidence Functions with the {\tt etmCIF}
  Function, with a view Towards Pregnancy Applications}

\author{Arthur Allignol}

\date{}

\begin{document}

\maketitle

\section{Introduction}

This paper documents the use of the {\tt etmCIF} function to compute
the cumulative incidence function (CIF) in pregnancy data.

\section{Data Example}

The data set {\tt abortion}, included in the {\bf etm} package will be
used to illustrate the computation of the CIFs. We first load the {\bf
  etm} package and the data set.
<<>>=
library(etm)
library(survival)
data(abortion)
@

Briefly, the data set contains information on \Sexpr{nrow(abortion)}
pregnant women collected prospectively by the Teratology Information
Service of Berlin, Germany \citep{meister}. Among these pregnant women,
\Sexpr{with(abortion, table(group)[2])} were exposed therapeutically
to coumarin derivatives, a class of orally active anticoagulant, and
\Sexpr{with(abortion, table(group)[1])} women served as
controls. Coumarin derivatives are suspected to increase the number of
spontaneous abortions. Competing events are elective abortion (ETOP) and
life birth.

Below is an excerpt of the data set
<<>>=
head(abortion)
@

{\tt id} is the individual number, {\tt entry} is the gestational age
at which the women entered the study, {\tt exit} is the gestational
age at the end of pregnancy, {\tt group} is the group membership (0
for controls and 1 for the women exposed to coumarin derivatives) and
{\tt cause} is the cause of end of pregnancy (1 for induced abortion, 2 for
life birth and 3 for spontaneous abortion.)

\section{Computing and plotting the CIFs}

\subsection{The {\tt etmCIF} function}

The CIFs are computed using the {\tt etmCIF} function. It is a
wrapper around the {\tt etm} function, meant
to facilitate the computation of the CIFs. {\tt etmCIF} takes as arguments
\begin{itemize}
\item {\tt formula}: A formula consisting of a {\tt Surv} object on
  the left of a {\tt ~} operator, and the group covariate on the
  right. A {\tt Surv} object is for example created this way: {\tt
    Surv(entry, exit, cause != 0)}. We need to specify the entry
  time ({\tt entry}), the gestational age at end of pregnancy ({\tt
    exit}), and an event indicator ({\tt cause != 0}). The latter
  means that any value different from 0 in {\tt cause} will be
  considered as an event -- which is the case in our example, as we
  don't have censoring.

\item {\tt data}: A data set in which to interpret the terms of the
  formula. In our case, it will be {\tt abortion}.

\item {\tt etype}: Competing risks event indicator. When the status
  indicator is 1 (or TRUE) in the formula, {\tt etype} describes the
  type of event, otherwise, for censored observation, the value of
  {\tt etype} is ignored.

\item {\tt failcode}: Indicates the failure type of interest. Default
  is one. This option is only interesting for some features of the
  plot function.
\end{itemize}

\subsection{Estimation and display of the CIFs}

We know compute the CIFs
<<>>=
cif.abortion <- etmCIF(Surv(entry, exit, cause != 0) ~ group,
                   abortion, etype = cause, failcode = 3)
cif.abortion
@

Above is the display provided by the {\tt print} function. It gives,
at the last event time, the probabilities ({\tt P}) standard errors
({\tt se(P)}), and the total number of events ({\tt n.event}) for the
three possible pregnancy outcomes and for both groups.

More information is provided by the {\tt summary} function.
<<>>=
s.cif.ab <- summary(cif.abortion)
@

The function returns a list of data.frames that contain probabilities,
variances, pointwise confidence intervals, number at risk and number
of events for each event times. the {\tt print} function displays this
information for some selected event times.
<<>>=
s.cif.ab
@

\subsection{Plotting the CIFs}

Interest lies in the CIFs of spontaneous abortion. We display them
using the {\tt plot} function, which by default, plots only the the
CIFs for the event of interest, i.e., the one specified in {\tt
  failcode}.
\setkeys{Gin}{width=0.9\textwidth}
\begin{figure}[!htb]
\begin{center}
<<fig = TRUE, width = 10, height = 10>>=
plot(cif.abortion)
@
\caption{CIFs of spontaneous abortion for the controls (solid line)
  and the exposed (dashed line), using the default settings of the
  {\tt plot} function.}
\end{center}
\end{figure}

\clearpage

We now add confidence intervals taken at week 27, plus a
bit of customisation.
\setkeys{Gin}{width=0.9\textwidth}
\begin{figure}[!htb]
\begin{center}
<<fig = TRUE, width = 10, height = 10>>=
plot(cif.abortion, curvlab = c("Control", "Exposed"), ylim = c(0, 0.6),
     ci.type = "bars", pos.ci = 27, col = c(1, 2), ci.lwd = 6,
     lwd = 2, lty = 1, cex = 1.3)
@
\caption{CIFs of spontaneous abortion for the controls (black) and the
  exposed (red), along with pointwise confidence intervals taken at
  week 27.}
\end{center}
\end{figure}

\clearpage

When the figure is to be in black and white, or when the confidence
intervals are not as separated as in this example, it might be a good
idea to shift slightly one of the bar representing the confidence
interval, so that the two bars don't overlap. This might be done
manipulating the {\tt pos.ci} argument:

\setkeys{Gin}{width=0.9\textwidth}
\begin{figure}[!htb]
\begin{center}
<<fig = TRUE, width = 10, height = 10>>=
plot(cif.abortion, curvlab = c("Control", "Exposed"), ylim = c(0, 0.6),
     ci.type = "bars", pos.ci = c(27, 28), col = c(1, 1), ci.lwd = 6,
     lwd = 2, lty = c(2, 1), cex = 1.3)
@
\caption{CIFs of spontaneous abortion for the controls (dashed line) and the
  exposed (solid line), along with pointwise confidence intervals.}\label{decalage}
\end{center}
\end{figure}

\clearpage

Pointwise confidence intervals can also be plotted for the whole
follow-up period.
\begin{figure}[!htb]
\begin{center}
<<fig = TRUE, width = 10, height = 10>>=
plot(cif.abortion, curvlab = c("Control", "Exposed"), ylim = c(0, 0.5),
     ci.type = "pointwise", col = c(1, 2), lwd = 2, lty = 1, cex = 1.3)
@
\caption{Same as the last pictures, except for the confidence
  intervals, that are displayed for the whole follow-up period.}
\end{center}
\end{figure}

\clearpage

CIFs for other pregnancy outcomes can also be plotted using the {\tt
  which.cif} arguments. For instance, for plotting the CIFs of ETOP
and life birth on the same graph, we specify {\tt which.cif = c(1, 2)}
in the call to {\tt plot}.
\begin{figure}[!htb]
\begin{center}
<<fig = TRUE, width = 10, height = 10>>=
plot(cif.abortion, which.cif = c(1, 2), ylim = c(0, 0.8), lwd = 2,
     col = c(1, 1, 2, 2), lty = c(1, 2, 1, 2), legend = FALSE)
legend(0, 0.8, c("Control", "Exposed"), col = c(1, 2), lty = 1,
       bty = "n", lwd = 2)
legend(0, 0.7, c("ETOP", "Life Birth"), col = 1, lty = c(1, 2),
       bty = "n", lwd = 2)
@
\end{center}
\caption{CIFs of ETOP (solid lines) and life birth (dashed lines) for
  the exposed, in red, and the controls, in black.}
\end{figure}

\clearpage

\subsection{Some More Features}

\paragraph{Competing event names}

For those who don't like using plain numbers for naming the competing
events or the group allocation, it is of course possible to give more
informative names, either as factors or character vectors. For
instance, we define a new group variable that takes value {\tt 'control'}
or {\tt 'exposed'}, and we give more informative names for the pregnancy
outcomes.

<<>>=
abortion$status <- with(abortion, ifelse(cause == 2, "life birth",
                        ifelse(cause == 1, "ETOP", "spontaneous abortion")))
abortion$status <- factor(abortion$status)

abortion$treat <- with(abortion, ifelse(group == 0, "control", "exposed"))
abortion$treat <- factor(abortion$treat)
@

We can compute the CIFs as before, taking care of changing the {\tt failcode} argument.

<<>>=
new.cif <- etmCIF(Surv(entry, exit, status != 0) ~ treat, abortion,
                  etype = status, failcode = "spontaneous abortion")
new.cif
@

The {\tt summary} and {\tt plot} functions will work as before, except
for a more informative outcome from scratch.

\paragraph{Taking advantage of the miscellaneous functions defined for
  {\tt etm} objects}

The {\tt etmCIF} function uses the more general {\tt etm} machinery
for computing the CIFs. Thus the returned {\tt etmCIF} object is for
part a list of {\tt etm} objects (one for each covariate level). It is
therefore relatively easy to use the methods defined for {\tt etm} on
{\tt etmCIF} objects.

An example would be to use the {\tt trprob} function to extract the
CIF of spontaneous abortion for the controls. This function takes as
arguments an {\tt etm} object, the transition we are interested in, in
the form ``from to'' (the state a patient comes from is automatically
defined as being 0 in {\tt etmCIF}), and possibly some time points.
Using {\tt new.cif} from the example above:
<<>>=
trprob(new.cif[[1]], "0 spontaneous abortion", c(1, 10, 27))
@
We applied the {\tt trprob} function to the {\tt etm} object for the
controls (which is in the first item of the output, for the exposed in
the second). The transition of interest is from {\tt 0} to {\tt
  spontaneous abortion}, and we want the CIF at weeks 1, 10 and 27
(just put nothing if you want the CIF for all time points).

Another example would be to use the {\tt lines} function to add a CIF
to an existing plot. The following code snippet adds the CIF of ETOP
for the exposed to Figure \ref{decalage}. That's the {\tt tr.choice}
arguments that defines which CIF to pick. It works in the same way as
in the {\tt trprob} function.

<<eval = FALSE>>=
lines(cif.abortion[[2]], tr.choice = "0 1", col = 2, lwd = 2)
@
\setkeys{Gin}{width=0.9\textwidth}
\begin{figure}[!htb]
\begin{center}
<<echo = FALSE, fig = TRUE, width = 10, height = 10>>=
plot(cif.abortion, curvlab = c("Control", "Exposed"), ylim = c(0, 0.6),
     ci.type = "bars", pos.ci = c(27, 28), col = c(1, 1), ci.lwd = 6,
     lwd = 2, lty = c(2, 1), cex = 1.3)
lines(cif.abortion[[2]], tr.choice = "0 1", col = 2, lwd = 2)
@
\caption{Figure \ref{decalage} along with the CIF of ETOP for the exposed in red.}
\end{center}
\end{figure}

\clearpage

\begin{thebibliography}{1}
\bibitem[Meister and Schaefer, 2008]{meister}
  Meister, R. and Schaefer, C. (2008).
  \newblock Statistical methods for estimating the probability of spontaneous
  abortion in observational studies--analyzing pregnancies exposed to coumarin
  derivatives.
  \newblock {\em Reproductive Toxicology}, 26(1):31--35.
\end{thebibliography}

\end{document}