Introduction

Cancer development is ubiquitously associated with aberrant DNA methylation. Noninvasive biomarkers incorporating DNA methylation signatures are of rising interest for early cancer detection. Statistical tests for discovering differential methylation markers have largely focused on differences in mean methylation levels, e.g., between cancer and normal samples. Cancer, however, is a heterogeneous disease. Increased stochastic variability of DNA methylation has been observed across cancers (Hansen and others, 2011; Phipson and Oshlack, 2014), which may reflect adaptation to local tumor environments during carcinogenesis. To date, differentially variable CpGs (DVC) and excessive outliers have been examined in tumor-adjacent normal tissue samples and in cancer precursors (Teschendorff and Widschwendter, 2012; Teschendorff and others, 2016), with the potential to identify early detection markers for the risk of progression to cancer.

In Dai and others (2021), we propose a joint constrained hypothesis test for hypermethylated and hypervariable CpG (DMVC+) sites in a high-throughput profiling experiment. The DMtest R package implements this constrained hypothesis test, along with the standard tests for differentially methylated CpGs (DMC) and DVC. It also implements a second constrained test that places no constraint on the mean difference and constrains only the variability to be increased. As shown in Dai and others (2021), the proposed joint tests substantially improved detection power in simulation studies and in the TCGA data example, yielding more cancer CpG markers than the standard DMC and DVC tests.
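
To make the distinction between a mean difference (DMC) and a variance difference (DVC) concrete, the following base-R sketch tests both on simulated beta values for a single hypothetical CpG. It is not the constrained test of the DMtest package: it uses the standard Welch t-test and F test purely for illustration, and the sample sizes and Beta distribution parameters are assumptions chosen for the example. The F test also assumes approximate normality, which methylation beta values only roughly satisfy.

```r
set.seed(1)

# Simulated methylation beta values for one hypothetical CpG (not TCGA data).
# Normal samples: mean 0.2 with a small spread.
normal <- rbeta(40, shape1 = 20, shape2 = 80)
# Tumor samples: mean 0.3 with a much larger spread.
tumor <- rbeta(300, shape1 = 3, shape2 = 7)

# DMC-style question: is the mean higher in tumor? (one-sided Welch t-test)
p_mean <- t.test(tumor, normal, alternative = "greater")$p.value

# DVC-style question: is the variance higher in tumor? (one-sided F test)
p_var <- var.test(tumor, normal, alternative = "greater")$p.value

print(c(mean_test = p_mean, variance_test = p_var))
```

A joint test combines the evidence from both questions rather than reporting the two p-values separately.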

Example

The following example uses DNA methylation data from 334 samples in the TCGA colorectal cancer dataset (TCGA-COAD). For illustration, we use 500 representative CpG probes to save time. For genome-wide data with potentially more than 500,000 CpGs, users can enable parallel computing by setting an appropriate number of cores.

library(DMtest)
# load example data
data(beta)
dim(beta)
#> [1] 500 334
data("covariate")
dim(covariate)
#> [1] 334   3
# compute p-values
out <- dmvc(beta = beta, covariate = covariate)
#> Analyze 500 CpGs across 334 samples at 2021-07-23 09:50:54
#> Start to compute DMCP pvalues at 2021-07-23 09:50:54...
#> Start to compute DVCP pvalues at 2021-07-23 09:50:54...
#> Start to compute joint-test pvalues at 2021-07-23 09:50:54...
#> The result is ready at 2021-07-23 09:50:56
head(out)
#>            Mean_normal Mean_tumor  Mean_all  SD_normal   SD_tumor     SD_all
#> cg00000029   0.2526612  0.1922551 0.1991277 0.07436559 0.11758616 0.11503127
#> cg00000165   0.1630546  0.4299927 0.3996225 0.04157534 0.25052707 0.25099766
#> cg00000236   0.8854114  0.8972493 0.8959024 0.02952325 0.02627329 0.02687998
#> cg00000289   0.7232754  0.7017441 0.7041938 0.06068134 0.07518156 0.07391403
#> cg00000292   0.6779636  0.7176742 0.7131562 0.04792571 0.11052698 0.10600403
#> cg00000321   0.3183090  0.4204032 0.4087877 0.05038070 0.17534940 0.16904093
#>                    DMCP         DVCP      Joint1P      Joint2P       LRT1
#> cg00000029 1.088704e-04 8.331051e-04 1.616176e-05 1.134035e-08  20.103846
#> cg00000165 3.277585e-10 4.263205e-14 4.049246e-24 7.591439e-24 105.232219
#> cg00000236 5.317564e-03 4.755834e-01 1.971397e-03 3.766747e-03  10.191530
#> cg00000289 8.255208e-02 7.901804e-02 1.833443e-01 6.202328e-02   1.690565
#> cg00000292 9.654407e-03 2.778392e-06 5.888740e-09 1.166352e-08  35.392820
#> cg00000321 8.272976e-04 4.598904e-11 2.032467e-14 3.839392e-14  60.587701
#>                  LRT2          pho
#> cg00000029  35.448844  0.247914116
#> cg00000165 105.232219 -0.008043614
#> cg00000236  10.191530 -0.268308357
#> cg00000289   4.722906 -0.299345743
#> cg00000292  35.392820 -0.186707643
#> cg00000321  60.587701 -0.056864336
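
A common next step is to adjust the p-values for multiple testing and select candidate CpGs. The sketch below does this for the Joint1P values of the six CpGs printed above, copied by hand so that it runs without the package. The Benjamini-Hochberg method and the 5% threshold are assumptions made for illustration, not choices prescribed by DMtest.

```r
# Joint1P values copied from the head(out) rows printed above
joint1p <- c(
  cg00000029 = 1.616176e-05,
  cg00000165 = 4.049246e-24,
  cg00000236 = 1.971397e-03,
  cg00000289 = 1.833443e-01,
  cg00000292 = 5.888740e-09,
  cg00000321 = 2.032467e-14
)

# Benjamini-Hochberg adjustment (assumed choice of method)
fdr <- p.adjust(joint1p, method = "BH")

# CpGs passing an assumed 5% false discovery rate threshold
selected <- names(fdr)[fdr < 0.05]
print(selected)
```

In a real analysis the adjustment would be applied to the full column, e.g. `p.adjust(out[, "Joint1P"], method = "BH")`, rather than to six rows.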

References

Dai, J, Wang, X, Chen, H and others. (2021). Incorporating increased variability in discovering cancer methylation markers, Biostatistics, submitted.

Hansen, K D, Timp, W, Bravo, H C and others. (2011). Increased methylation variation in epigenetic domains across cancer types, Nature Genetics 43, 768–775.

Phipson, B and Oshlack, A. (2014). DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging. Genome Biology 15, 465.

Teschendorff, A E and Widschwendter, M. (2012). Differential variability improves the identification of cancer risk markers in DNA methylation studies profiling precursor cancer lesions. Bioinformatics 28, 1487–1494.

Teschendorff, A E, Jones, A and Widschwendter, M. (2016). Stochastic epigenetic outliers can define field defects in cancer. BMC Bioinformatics 17, 178.