--- title: "SCRIP simulation for scRNA-seq data" author: "Fei Qin" packages: "SCRIP" date: "Last updated: 11/16/2021" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{SCRIPsimu} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- # 1. Introduction to SCRIP method SCRIP proposed two frameworks based on Gamma-Poisson and Beta-Gamma-Poisson distribution for simulating scRNA-seq data. Both Gamma-Poisson and Beta-Gamma-Poisson distribution model the over dispersion of scRNA-seq data. Specifically, Beta-Gamma-Poisson model was used to model bursting effect. The dispersion was accurately simulated by fitting the mean-BCV dependency using generalized additive model (GAM). Other key characteristics of scRNA-seq data including library size, zero inflation and outliers were also modeled by SCRIP. With its flexible modeling, SCRIP enables various application for different experimental designs and goals including DE analysis, clustering analysis, trajectory-based analysis and bursting analysis # 2. Installation ```{r install-bioc, message=FALSE, warning = FALSE, eval=FALSE} BiocManager::install("splatter") library(devtools) install_github("thecailab/SCRIP") ``` # 3. Quick start Assuming you already have a count matrix for scRNA-seq data, and you want to simulation data based on it. Only a few steps are needed to creat a simulation data using SCRIP. A dataset from Xin data is used for example. ```{r quickstart, message=FALSE, warning = FALSE} library(splatter) library(SCRIP) data(acinar.data) params <- splatEstimate(acinar.data) sim_trend <- SCRIPsimu(data=acinar.data, params=params, mode="GP-trendedBCV") sim_trend ``` \newpage # 4 Single cell type simulation ## 4.1 Parameter estimation SCRIP utlized the estimation strategy from splatter, but also provided more parameters (Fold change, dropout rates, library size, BCV degree of freedom) to serve different experimental designs (i.e. Simulation for differential expression analysis, clustering analysis and trajectory analysis). Detailed description about other parameters will be shown in other sections of this document. ## 4.2 Simulation The default mode in SCRIP for simulation is "GP-trendedBCV". You can also choose other modes ("GP-commonBCV", "BGP-commonBCV","BP", "BGP-trendedBCV") in the SCRIPsimu() function. For single cell type simulation, you have to set the "method" as "single", which was default in SCRIPsimu() function. ### 4.2.1 GP-commonBCV GP-commonBCV is the model used by splatter. GP-commonBCV applied the Gamma-Poisson mixture model with mean-BCV dependency fitted by a common BCV across genes. ```{r, message=FALSE, warning = FALSE} ########################### GP-commonBCV model/Splatter ########################## ################################################################################## sim_GPcommon <- SCRIPsimu(data=acinar.data, params=params, mode="GP-commonBCV") sim_GPcommon ``` ### 4.2.2 GP-trendedBCV GP-trendedBCV is the major model of SCIRP. which used the Gamma-Poisson mixture model with mean-BCV dependency fitted by GAM. ```{r, message=FALSE, warning = FALSE} ############################### GP-trendedBCV model ############################## ################################################################################## sim_GPtrend <- SCRIPsimu(data=acinar.data, params=params, mode="GP-trendedBCV") ``` ### 4.2.3 BP BP is the model used for simulating bursting effect using Beta-Poisson mixture distributionwithout considering BCV effect. ```{r, message=FALSE, warning = FALSE} ############################### BP-commonBCV model ############################## ################################################################################## sim_BP <- SCRIPsimu(data=acinar.data, params=params, mode="BP") ``` ### 4.2.4 BGP-commonBCV BP-commonBCV is the model used for simulating bursting effect with Beta-Gamma-Poisson mixture distribution. The mean-BCV dependency was fitted by a common BCV across genes. ```{r, message=FALSE, warning = FALSE} ############################### BP-commonBCV model ############################## ################################################################################## sim_BGPcommon <- SCRIPsimu(data=acinar.data, params=params, mode="BGP-commonBCV") ``` ### 4.2.5 BGP-trendedBCV BP-trendedBCV is the model used for simulating bursting effect with Beta-Gamma-Poisson mixture distribution. The mean-BCV dependency was fitted by a GAM. ```{r, message=FALSE, warning = FALSE} ############################### BP-trendedBCV model ############################## ################################################################################## sim_BGPtrend <- SCRIPsimu(data=acinar.data, params=params, mode="BGP-trendedBCV") ``` \newpage # 5 Group simulation Group simulation is useful for studying different experimental conditions, especially for differential expression (DE) analysis. To serve different applications in scRNA-seq analysis, SCRIP provides flexible simulation. It can simulate scRNA-seq data with different parameters from multiple cell groups (i.e. cell types), which is useful for evaluating the detection of global characteristics such as clustering. It also allows simulation of group difference in a single cell group, which is useful for evaluating typical DE analysis methods. ## 5.1 Basic group simulation DEGs were simulated using multiplicative differential expression factors from a log-normal distribution with parameters including number of genes (nGenes), the path-specific proportion of DE genes (de.prob), the proportion of down-regulated DE genes (de.downProb), DE location factor (de.facLoc) and DE scale factor (de.facScale). ```{r, message=FALSE, warning = FALSE} sim.SCRIP2 <- SCRIPsimu(data=acinar.data, params=params, method="groups", batchCells=300, group.prob = c(0.25, 0.25, 0.25, 0.25), de.prob = c(0.2, 0.2, 0.2, 0.2), de.downProb = c(0.5, 0.5, 0.5, 0.5), de.facLoc = c(0.2, 0.3, 0.4, 0.5), de.facScale=c(0.2, 0.2, 0.2, 0.2)) ``` ## 5.2 Group simulation with batch effect Batch effect factors are also generated from a log-normal distribution with parameters including batchCells, batch.facLoc and batch.facScale. batchCells: number of cells for each batch \ batch.facLoc: Batch location factor in log-normal distribution for batch factor \ batch.facScale: Batch scale factor in log-normal distribution for batch factor \ ```{r, message=FALSE, warning = FALSE} sim.SCRIP3 <- SCRIPsimu(data=acinar.data, params=params, method="groups", batchCells=c(150, 150), batch.facLoc = c(0.1, 0.1), batch.facScale = c(0.1, 0.1), group.prob = c(0.25, 0.25, 0.25, 0.25), de.prob = c(0.2, 0.2, 0.2, 0.2), de.downProb = c(0.5, 0.5, 0.5, 0.5), de.facLoc = c(0.2, 0.3, 0.4, 0.5), de.facScale=c(0.2, 0.2, 0.2, 0.2)) ```