---
title: "Simulation study from empirical data with penetrance"
author: "BayesMendel Lab"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Simulation study from empirical data with penetrance}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(penetrance)
library(ggplot2)
library(scales)
```

## Goal

Here we apply the penetrance package to simulated families for which the data-generating penetrance function is known and based on existing penetrance estimates.

## Simulated Data

The data-generating distribution of the age-specific penetrances is based on existing penetrance estimates for colorectal cancer in carriers of any pathogenic variant in MLH1 from the PanelPRO Database. The families were simulated using the PedUtils R package.

```{r}
dat <- test_fam2
```

## Simple simulation

We then run the estimation using the default settings.

```{r, eval=FALSE}
# Set the random seed
set.seed(2024)

# Set the prior
prior_params <- list(
  asymptote = list(g1 = 1, g2 = 1),
  threshold = list(min = 5, max = 30),
  median = list(m1 = 2, m2 = 2),
  first_quartile = list(q1 = 6, q2 = 3)
)

# Set the allele frequency for MLH1 based on PanelPRO Database
prevMLH1 <- 0.0004453125

# We use the default baseline (non-carrier) penetrance
print(baseline_data_default)

# We run the estimation procedure with one chain and 20k iterations
out_sim <- penetrance(
  pedigree = dat,
  twins = NULL,
  n_chains = 1,
  n_iter_per_chain = 20000,
  ncores = 2,
  baseline_data = baseline_data_default,
  prev = prevMLH1,
  prior_params = prior_params,
  burn_in = 0.1,
  median_max = TRUE,
  ageImputation = FALSE,
  removeProband = FALSE
)
```

## References

Lee G, Liang JW, Zhang Q, Huang T, Choirat C, Parmigiani G, Braun D. Multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer with the novel R package PanelPRO. eLife. 2021;10:e68699. doi:10.7554/eLife.6869