| Type: | Package |
| Title: | Martingale Dependence Tools and Testing for Mixture Cure Models |
| Version: | 0.1.0 |
| Description: | Computes martingale difference correlation (MDC), martingale difference divergence, and their partial extensions to assess conditional mean dependence. The methods are based on Shao and Zhang (2014) <doi:10.1080/01621459.2014.887012>. Additionally, introduces a novel hypothesis test for evaluating covariate effects on the cure rate in mixture cure models, using MDC-based statistics. The methodology is described in Monroy-Castillo et al. (2025, manuscript submitted). |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr |
| Suggests: | knitr, rmarkdown, pinp |
| LinkingTo: | Rcpp, RcppArmadillo, RcppParallel |
| Imports: | Rcpp, RcppParallel, ggplot2, ggtext, gridExtra, future, future.apply, smcure, npcure, survival |
| NeedsCompilation: | yes |
| SystemRequirements: | GNU make, TBB |
| URL: | https://github.com/CastleMon/MDCcure |
| BugReports: | https://github.com/CastleMon/MDCcure/issues |
| Packaged: | 2025-07-22 12:05:39 UTC; estel |
| Author: | Blanca Monroy-Castillo [aut, cre], Amalia Jácome [aut], Ricardo Cao [aut], Ingrid Van Keilegom [aut], Ursula Müller [aut] |
| Maintainer: | Blanca Monroy-Castillo <blancamonroy.96@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-07-23 18:50:02 UTC |
Goodness-of-fit tests for the cure rate in a mixture cure model
Description
Tests whether the cure rate p, as a function of the covariates, follows a specified parametric model.
Usage
goft(
x,
time,
delta,
model = c("logit", "probit", "cloglog"),
theta0 = NULL,
nsimb = 499,
h = NULL
)
Arguments
x |
A numeric vector representing the covariate of interest. |
time |
A numeric vector of observed survival times. |
delta |
A numeric vector indicating censoring status (1 = event occurred, 0 = censored). |
model |
A character string specifying the parametric model for the incidence part. Can be "logit", "probit" or "cloglog". |
theta0 |
Optional numeric vector with initial values for the model parameters. Default is NULL. |
nsimb |
An integer indicating the number of bootstrap replicates. Default is 499. |
h |
Optional bandwidth value used for nonparametric estimation of the cure rate. Default is NULL. |
Details
We want to test whether the cure rate p, as a function of the covariates, satisfies a certain parametric model, such as a logistic, probit or cloglog model.
The hypotheses are:
\mathcal{H}_0 : p = p_{\theta} \quad \text{for some} \quad \theta \in \Theta
\quad \text{vs} \quad
\mathcal{H}_1 : p \neq p_{\theta} \quad \text{for all} \quad \theta \in \Theta,
where \Theta is a finite-dimensional parameter space and p_{\theta} is a known function up to the parameter vector \theta.
The test statistic is based on a weighted L_2 distance between a nonparametric estimator \hat{p}(x) and a parametric estimator p_{\hat{\theta}}(x) under \mathcal{H}_0,
as proposed by Müller and Van Keilegom (2019):
\mathcal{T}_n = n h^{1/2} \int \left(\hat{p}(x) - p_{\hat{\theta}}(x)\right)^2 \pi(x) dx,
where \pi(x) is a known weighting function, often chosen as the covariate density f(x).
A practical empirical version of the statistic is given by:
\tilde{\mathcal{T}}_n = n h^{1/2} \frac{1}{n} \sum_{i = 1}^n \left(\hat{p}(x_i) - p_{\hat{\theta}}(x_i)\right)^2,
where the integral is replaced by a sample average.
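As a rough illustration of the empirical statistic above, the following base R sketch evaluates a parametric curve under each of the three links and then computes n h^{1/2} times the mean squared difference between two sets of fitted values. The helpers p_theta() and gof_statistic() are hypothetical names introduced here and are not part of the package; the two-parameter linear predictor and the stand-in nonparametric estimate are assumptions made only for the example.

```r
# Sketch only: p_theta() and gof_statistic() are hypothetical helpers,
# not functions exported by the package.

# Parametric curve p_theta(x), assuming a linear predictor theta[1] + theta[2] * x.
p_theta <- function(x, theta, model = c("logit", "probit", "cloglog")) {
  model <- match.arg(model)
  eta <- theta[1] + theta[2] * x
  switch(model,
    logit   = plogis(eta),
    probit  = pnorm(eta),
    cloglog = 1 - exp(-exp(eta))
  )
}

# Empirical statistic: n * sqrt(h) * mean((p_np - p_par)^2).
gof_statistic <- function(p_np, p_par, h) {
  stopifnot(length(p_np) == length(p_par), h > 0)
  n <- length(p_np)
  n * sqrt(h) * mean((p_np - p_par)^2)
}

x     <- c(-1, 0, 1)
p_par <- p_theta(x, theta = c(0, 1), model = "logit")
p_np  <- p_par + 0.1   # stand-in for a nonparametric estimate (illustration only)
gof_statistic(p_np, p_par, h = 0.25)
```

In goft the p-value comes from a bootstrap, so a value such as the one returned here would only be meaningful relative to its bootstrap distribution.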
Value
A list with the following components:
- statistic
Numeric value of the test statistic.
- p.value
Numeric value of the bootstrap p-value for testing the null hypothesis.
- bandwidth
The bandwidth used.
References
Müller, U. U., & Van Keilegom, I. (2019). Goodness-of-fit tests for the cure rate in a mixture cure model. Biometrika, 106, 211-227. doi:10.1093/biomet/asy058
Examples
## Some artificial data
set.seed(123)
n <- 50
x <- runif(n, -2, 2) ## Covariate values
y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes
c <- rexp(n) ## Censoring values
p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible
u <- runif(n)
t <- ifelse(u < p, pmin(y, c), c) ## Observed times
d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator
data <- data.frame(x = x, t = t, d = d)
goft(x, t, d, model = "logit")
Martingale Difference Correlation (MDC)
Description
mdc computes the squared martingale difference correlation between a response variable Y
and explanatory variable(s) X, measuring conditional mean dependence.
X can be either univariate or multivariate.
Usage
mdc(X, Y, center = "U")
Arguments
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
center |
Character string indicating the centering method to use. One of: "U" (U-centering) or "D" (double-centering). Default is "U". |
Value
Returns the squared martingale difference correlation of Y given X.
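For intuition, the double-centred version of the squared MDC can be sketched in a few lines of base R for a univariate response. The helper names double_center() and mdc_sq_sketch() are hypothetical, and the normalisation used (squared MDD divided by the product of the sample distance variance of X and the sample variance of Y) is an assumption based on the definition in Shao and Zhang (2014). The value returned by mdc(X, Y, center = "D") may differ in detail.

```r
# Sketch only: hypothetical helpers, not the package's mdc().

# Double-centre a square matrix: subtract row and column means, add the grand mean.
double_center <- function(m) {
  sweep(sweep(m, 1, rowMeans(m)), 2, colMeans(m)) + mean(m)
}

# Squared MDC of a numeric vector y given X (vector or matrix), V-statistic form.
mdc_sq_sketch <- function(X, y) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(length(y) == n)
  A <- double_center(as.matrix(dist(X)))                         # distances in X
  B <- double_center(outer(y, y, function(u, v) (u - v)^2 / 2))  # half squared differences in y
  mdd_sq <- sum(A * B) / n^2          # squared martingale difference divergence
  dvar_x <- sqrt(sum(A * A) / n^2)    # sample distance variance of X
  var_y  <- sqrt(sum(B * B) / n^2)    # equals the biased sample variance of y
  if (dvar_x * var_y == 0) return(0)  # degenerate case: constant X or y
  mdd_sq / (dvar_x * var_y)
}

set.seed(123)
x <- rnorm(50)
mdc_sq_sketch(x, x^2)        # y is a function of x: clear conditional mean dependence
mdc_sq_sketch(x, rnorm(50))  # y generated independently of x
```

By the Cauchy-Schwarz inequality, this ratio always lies between 0 and 1.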
References
Shao, X., and Zhang, J. (2014). Martingale difference correlation and its use in high-dimensional variable screening. Journal of the American Statistical Association, 109(507), 1302-1318. doi:10.1080/01621459.2014.887012.
Examples
# Generate example data
set.seed(123)
n <- 50
x <- matrix(rnorm(n * 5), nrow = n) # multivariate data with 5 variables
y <- rbinom(n, 1, 0.5) # binary response
# Compute MDC with U-centering
mdc(x, y, center = "U")
# Compute MDC with double-centering
mdc(x, y, center = "D")
MDC-Based Dependence Tests Between Multivariate Data and a Covariate
Description
Computes dependence between a multivariate dataset x and a univariate covariate y
using different variants of the MDC (martingale difference correlation) test.
Usage
mdc_test(x, y, method, permutations = 999, parallel = TRUE, ncores = -1)
Arguments
x |
Vector or matrix where rows represent samples, and columns represent variables. |
y |
Covariate vector. |
method |
Character string indicating the test to perform. One of:
|
permutations |
Number of permutations. Defaults to 999. |
parallel |
Logical. Whether to use parallel computing. Defaults to TRUE. |
ncores |
Number of threads for parallel computing (used only if parallel = TRUE). Defaults to -1. |
Value
A list containing the test results and p-values.
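The permutation idea behind the permutations argument can be sketched generically: recompute the statistic after shuffling y, and report the proportion of shuffled statistics at least as large as the observed one. The helpers mdd_stat() and perm_pvalue() below are hypothetical and are written in plain, non-parallel base R. The exact permutation scheme used inside mdc_test is not described here, so this is only an illustration of the principle.

```r
# Sketch only: hypothetical helpers, not the package's mdc_test().

# Double-centred squared MDD of a numeric vector y given x (closed form).
mdd_stat <- function(x, y) {
  a  <- as.matrix(dist(x))   # pairwise Euclidean distances between rows of x
  yc <- y - mean(y)          # centred response
  -sum(a * outer(yc, yc)) / length(y)^2
}

# Permutation p-value: shuffling y breaks any dependence of y on x.
perm_pvalue <- function(x, y, stat_fun, permutations = 999) {
  observed <- stat_fun(x, y)
  permuted <- replicate(permutations, stat_fun(x, sample(y)))
  (1 + sum(permuted >= observed)) / (permutations + 1)
}

set.seed(123)
x <- matrix(rnorm(50 * 5), nrow = 50)
y <- rbinom(50, 1, 0.5)
perm_pvalue(x, y, mdd_stat, permutations = 199)
```

Adding 1 to both the numerator and the denominator keeps the p-value strictly positive, so the smallest attainable value is 1 / (permutations + 1).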
References
Shao, X., and Zhang, J. (2014). Martingale difference correlation...
Examples
set.seed(123)
x <- matrix(rnorm(50 * 5), nrow = 50)
y <- rbinom(50, 1, 0.5)
mdc_test(x, y, method = "FMDCU")
Martingale Difference Divergence (MDD)
Description
mdd computes the squared martingale difference divergence (MDD) between response variable(s) Y
and explanatory variable(s) X, measuring conditional mean dependence.
Usage
mdd(X, Y, center = "U")
Arguments
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
center |
Character string indicating the centering method to use. One of: "U" (U-centering) or "D" (double-centering). Default is "U". |
Value
Returns the squared martingale difference divergence of Y given X.
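To illustrate what U-centering involves, the sketch below builds U-centred matrices and the corresponding unbiased inner product for a univariate response, following the U-centering construction used by Park, Shao and Yao (2015). The helpers u_center() and mdd_sq_u() are hypothetical names, the sketch assumes n > 3 and a numeric vector y, and the result need not coincide exactly with mdd(X, Y, center = "U").

```r
# Sketch only: hypothetical helpers, not the package's mdd().

# U-centre a symmetric distance-type matrix with zero diagonal.
u_center <- function(m) {
  n <- nrow(m)
  stopifnot(n > 3)
  out <- m - outer(rowSums(m), rep(1, n)) / (n - 2) -
    outer(rep(1, n), colSums(m)) / (n - 2) +
    sum(m) / ((n - 1) * (n - 2))
  diag(out) <- 0   # the diagonal is set to zero by definition
  out
}

# U-centred (bias-corrected) squared MDD of a numeric vector y given X.
mdd_sq_u <- function(X, y) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(length(y) == n)
  A <- u_center(as.matrix(dist(X)))
  B <- u_center(outer(y, y, function(u, v) (u - v)^2 / 2))
  sum(A * B) / (n * (n - 3))
}

set.seed(123)
x <- matrix(rnorm(50 * 5), nrow = 50)
y <- rbinom(50, 1, 0.5)
mdd_sq_u(x, y)
```

Unlike the double-centred version, the U-centred estimate can be slightly negative when there is little or no dependence.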
References
Shao, X., and Zhang, J. (2014). Martingale difference correlation and its use in high-dimensional variable screening. Journal of the American Statistical Association, 109(507), 1302-1318. doi:10.1080/01621459.2014.887012.
Examples
# Generate example data
set.seed(123)
n <- 50
x <- matrix(rnorm(n * 5), nrow = n) # multivariate explanatory variables
y_vec <- rbinom(n, 1, 0.5) # univariate response
y_mat <- matrix(rnorm(n * 2), nrow = n) # multivariate response
# Compute MDD with vector Y and U-centering
mdd(x, y_vec, center = "U")
# Compute MDD with matrix Y and double-centering
mdd(x, y_mat, center = "D")
Plot Cure Probability: A Comparison of Nonparametric and Parametric Estimation
Description
This function generates a plot comparing nonparametric and parametric estimations of cure probability in a univariate setting. The nonparametric estimate is displayed with 95% confidence bands, while the parametric estimate is based on a logit, probit or complementary log-log link. An optional covariate density curve can be added as a secondary axis.
Usage
plotCure(
x,
time,
delta,
main.title = NULL,
title.x = NULL,
model = "logit",
theta = NULL,
legend.pos = "bottom",
density = TRUE,
hsmooth = 10,
npoints = 100
)
Arguments
x |
A numeric vector containing the covariate values. |
time |
A numeric vector representing the observed survival times. |
delta |
A binary vector indicating the event status (1 = event, 0 = censored). |
main.title |
Character string for the main title of the plot. If |
title.x |
Character string for the x-axis label. If |
model |
A character string indicating the assumed model. Options include |
theta |
A numeric vector of length 2, specifying the coefficients for the logistic model to generate the parametric estimate. |
legend.pos |
A character string indicating the position of the legend. Options include |
density |
Logical; if |
hsmooth |
Numeric. Smoothing bandwidth parameter (h) for the cure probability estimator. |
npoints |
Integer. Number of points at which the estimator is evaluated over the covariate range. |
Details
The function estimates the cure probability nonparametrically using the probcure function
and overlays it with a parametric estimate obtained from a logistic regression model.
Confidence intervals (95%) are included for the nonparametric estimate. Optionally,
the density of the covariate can be shown as a shaded area with a secondary y-axis.
Value
A ggplot object representing the cure probability plot.
Partial Martingale Difference Correlation (pMDC)
Description
pmdc measures conditional mean dependence of Y given X, adjusting for the dependence on Z.
Usage
pmdc(X, Y, Z)
Arguments
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
Z |
A vector or matrix where rows represent samples and columns represent variables. |
Value
Returns the squared partial martingale difference correlation of Y given X, adjusting for the dependence on Z.
References
Park, T., Shao, X., and Yao, S. (2015). Partial martingale difference correlation. Electronic Journal of Statistics, 9(1), 1492-1517. doi:10.1214/15-EJS1047.
Examples
# Generate example data
set.seed(123)
n <- 50
x <- matrix(rnorm(n * 5), nrow = n) # explanatory variables
y <- matrix(rnorm(n), nrow = n) # response variable
z <- matrix(rnorm(n * 2), nrow = n) # conditioning variables
# Compute partial MDC
pmdc(x, y, z)
Partial Martingale Difference Divergence (pMDD)
Description
pmdd measures conditional mean dependence of Y given X, adjusting for the dependence on Z.
Usage
pmdd(X, Y, Z)
Arguments
X |
A vector or matrix where rows represent samples and columns represent variables. |
Y |
A vector or matrix where rows represent samples and columns represent variables. |
Z |
A vector or matrix where rows represent samples and columns represent variables. |
Value
Returns the squared partial martingale difference divergence of Y given X, adjusting for the dependence on Z.
References
Park, T., Shao, X., and Yao, S. (2015). Partial martingale difference correlation. Electronic Journal of Statistics, 9(1), 1492-1517. doi:10.1214/15-EJS1047.
Examples
# Generate example data
set.seed(123)
n <- 50
x <- matrix(rnorm(n * 5), nrow = n) # explanatory variables
y <- matrix(rnorm(n), nrow = n) # response variable
z <- matrix(rnorm(n * 2), nrow = n) # conditioning variables
# Compute partial MDD
pmdd(x, y, z)
Covariate Hypothesis Test of the Cure Probability based on Martingale Difference Correlation
Description
Performs nonparametric hypothesis tests to evaluate the association between a covariate and the cure probability in mixture cure models. Several test statistics are supported, including martingale difference correlation (MDC)-based tests and an alternative GOFT test.
Usage
testcov(
x,
time,
delta,
h = NULL,
method = "FMDCU",
P = 999,
parallel = TRUE,
ncores = -1
)
Arguments
x |
A numeric vector representing the covariate of interest. |
time |
A numeric vector of observed survival times. |
delta |
A binary vector indicating censoring status: 1 = event occurred, 0 = censored. |
h |
Bandwidth parameter for kernel smoothing. Either a positive numeric value, |
method |
Character string specifying the test to perform; the MDCU, MDCV, FMDCU and GOFT tests are described in the Details section.
Default is "FMDCU". |
P |
Integer. Number of permutations or bootstrap replications used to compute the null distribution of the test statistic.
For methods |
parallel |
Logical. If |
ncores |
Integer. Number of cores to use for parallel computing. If |
Details
The function computes a statistic, based on the methodology proposed by Monroy-Castillo et al.,
to test whether a covariate \boldsymbol{X} has an effect on the cure probability.
\mathcal{H}_0 : \mathbb{E}(\nu | \boldsymbol{X}) \equiv 1 - p \quad \text{a.s.}
\quad \text{vs} \quad
\mathcal{H}_1 : \mathbb{E}(\nu | \boldsymbol{X}) \not\equiv 1 - p \quad \text{a.s.}
The main problem is that the response variable (cure indicator \nu) is partially observed due to censoring.
This is addressed by estimating the cure indicator using the methodology of Amico et al. (2021).
We define \tau = \sup_x \tau(x), with \tau(x) = \inf\{t: S_0(t|x) = 0\}.
We assume \tau < \infty and that follow-up is long enough so that \tau < \tau_{G(x)} for all x.
Therefore, individuals with censored observed times greater than \tau are considered cured (\nu = 1).
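The classification rule just described can be sketched as follows. The helper observed_cure_status() is a hypothetical name, and using the largest uncensored time as a stand-in for \tau is an assumption made here for illustration, not necessarily the estimator used by testcov.

```r
# Sketch only: hypothetical helper; tau_hat defaults to the largest
# uncensored time, which is an assumption made for this example.
observed_cure_status <- function(time, delta, tau_hat = max(time[delta == 1])) {
  nu <- rep(NA_real_, length(time))
  nu[delta == 1] <- 0                    # event observed: not cured
  nu[delta == 0 & time > tau_hat] <- 1   # censored beyond tau: treated as cured
  nu                                     # NA marks an unknown cure status
}

time  <- c(0.4, 1.2, 2.5, 0.9, 3.1)
delta <- c(1,   1,   0,   0,   0)
observed_cure_status(time, delta)
# 0 0 1 NA 1
```

The NA entries correspond to censored observations at or before \tau, whose cure status is unobserved and has to be estimated.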
Four tests are proposed: three are based on the martingale difference correlation (MDC). For the MDCU and MDCV tests, the null distribution is approximated via a permutation procedure. To provide a faster alternative, a chi-squared approximation is implemented for the MDCU test statistic (FMDCU). Additionally, a modified version of the goodness-of-fit test proposed by Müller and Van Keilegom (2019) is included (GOFT). The test statistic is given by:
\widehat{\mathcal{T}}_n = nh^{1/2}\frac{1}{n}\sum_{i = 1}^{n}\left\{\hat{p}_h(X_i) - \hat{p}\right\}^2,
where \hat{p}_h(X_i) denotes the nonparametric estimator of the cure probability under the alternative hypothesis,
and \hat{p} denotes the nonparametric estimator of the cure probability under the null hypothesis.
The approximation of the critical value for the test is done using the bootstrap procedure given in Section 3 of Müller and Van Keilegom (2019).
Value
A list containing:
- test_results
A list with the results (e.g., test statistics and p-values) of the selected test(s).
- nu_hat
A numeric vector of estimated cure probabilities.
References
Amico, M., Van Keilegom, I., & Han, B. (2021). Assessing cure status prediction from survival data using receiver operating characteristic curves. Biometrika, 108(3), 727–740. doi:10.1093/biomet/asaa080
López-Cheda, A., Cao, R., Jácome, M. A., & Van Keilegom, I. (2016). Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Computational Statistics & Data Analysis, 100, 490–502. doi:10.1016/j.csda.2016.04.006
Müller, U. U., & Van Keilegom, I. (2019). Goodness-of-fit tests for the cure rate in a mixture cure model. Biometrika, 106, 211-227. doi:10.1093/biomet/asy058
Shao, X., & Zhang, J. (2014). Martingale difference correlation and its use in high-dimensional variable screening. Journal of the American Statistical Association, 109(507), 1302-1318. doi:10.1080/01621459.2014.887012
Examples
## Some artificial data
set.seed(123)
n <- 50
x <- runif(n, -2, 2) ## Covariate values
y <- rweibull(n, shape = .5*(x + 4)) ## True lifetimes
c <- rexp(n) ## Censoring values
p <- exp(2*x)/(1 + exp(2*x)) ## Probability of being susceptible
u <- runif(n)
t <- ifelse(u < p, pmin(y, c), c) ## Observed times
d <- ifelse(u < p, ifelse(y < c, 1, 0), 0) ## Uncensoring indicator
data <- data.frame(x = x, t = t, d = d)
testcov(x, t, d)
Hypothesis test for association between covariate and cure indicator adjusted by a second covariate
Description
Performs a permutation-based test assessing the association between a primary covariate (x) and the cure indicator, while adjusting for a secondary covariate (z).
The test calculates the p-value via permutation using the partial martingale difference correlation.
Usage
testcov2(x, time, z, delta, P = 999, H = NULL)
Arguments
x |
Numeric vector. The primary covariate whose association with the latent cure indicator is tested. |
time |
Numeric vector. Observed survival or censoring times. |
z |
Numeric vector. Secondary covariate for adjustment. |
delta |
Numeric vector. Censoring indicator (1 indicates event occurred, 0 indicates censored). |
P |
Integer. Number of permutations used to compute the permutation p-value. Default is 999. |
H |
Optional numeric. Bandwidth parameter (currently unused, reserved for future extensions). |
Details
The aim is to test whether the cure rate depends on the covariate \boldsymbol{X}, given that it depends on the covariate \boldsymbol{Z}. The hypotheses are
\mathcal{H}_0 : \mathbb{E}(\nu | \boldsymbol{X}) \equiv 1 - p(\boldsymbol{X}) \quad \text{a.s.}
\quad \text{vs} \quad
\mathcal{H}_1 : \mathbb{E}(\nu | \boldsymbol{X}) \not\equiv 1 - p(\boldsymbol{X}) \quad \text{a.s.}
The proxy of the cure rate under the null hypothesis \mathcal{H}_0 is obtained by:
\mathbb{I}(T > \tau) + (1-\delta)\mathbb{I}(T \leq \tau) \, \frac{1 - p(\boldsymbol{Z})}{1 - p(\boldsymbol{Z}) + p(\boldsymbol{Z})S_0(T|\boldsymbol{X,Z})}.
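The proxy above is plain arithmetic once \tau, p(\boldsymbol{Z}) and S_0(T|\boldsymbol{X,Z}) are available. The sketch below assumes these are supplied as already estimated numeric inputs; the helper cure_proxy() and its argument names are hypothetical and not part of the package.

```r
# Sketch only: hypothetical helper. p holds the values p(Z_i) and S0 the
# values S_0(T_i | X_i, Z_i); both are assumed to be estimated beforehand.
cure_proxy <- function(time, delta, tau, p, S0) {
  beyond_tau <- as.numeric(time > tau)    # I(T > tau)
  weight <- (1 - p) / (1 - p + p * S0)    # conditional probability of being cured
  beyond_tau + (1 - delta) * (1 - beyond_tau) * weight
}

time  <- c(1, 2, 5)
delta <- c(1, 0, 0)
cure_proxy(time, delta, tau = 3, p = rep(0.5, 3), S0 = c(0.2, 0.5, 0))
```

In this toy input the first subject has an observed event (proxy 0), the second is censored before \tau (proxy 0.5 / 0.75 = 2/3), and the third is censored after \tau (proxy 1).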
The statistic for testing the covariate hypothesis is based on partial martingale difference correlation and it is given by:
\text{pMDC}_n(\hat{\nu}_{\boldsymbol{H}}|\boldsymbol{X,Z})^2.
The null distribution is approximated using a permutation test.
Value
List with components:
- statistic
Numeric. The test statistic value.
- p.value
Numeric. The permutation p-value assessing the null hypothesis of no association between
x and the latent cure indicator, adjusting for z.
References
Park, T., Shao, X., & Yao, S. (2015). Partial martingale difference correlation. Electronic Journal of Statistics, 9, 1492–1517. doi:10.1214/15-EJS1047
See Also
pmdc for the partial martingale difference correlation, pmdd for the partial martingale difference divergence,
testcov for the test for one covariate.