--- title: 'Benchmarking SignacFast with single cell flow-sorted data' author: 'Mathew Chamberlain' date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Benchmarking SignacFast with flow-sorted data} %\VignetteEngine{knitr::rmarkdown_notangle} %\VignetteEncoding{UTF-8} --- This vignette shows how to use SignacFast to annotate flow-sorted synovial cells by integrating SignacX with Seurat. We start with raw counts from [this publication](https://www.nature.com/articles/s41590-019-0378-1). ```{r setup0, include=FALSE} all_times <- list() # store the time for each chunk knitr::knit_hooks$set(time_it = local({ now <- NULL function(before, options) { if (before) { now <<- Sys.time() } else { res <- difftime(Sys.time(), now, units = "secs") all_times[[options$label]] <<- res } } })) knitr::opts_chunk$set( tidy = TRUE, tidy.opts = list(width.cutoff = 95), message = FALSE, warning = FALSE, time_it = TRUE ) #celltypes_fast = readRDS("./fls/celltypes_fast_citeseq.rds") #celltypes = readRDS("./fls/celltypes_citeseq.rds") # pbmc = readRDS("fls/pbmcs_signac_citeseq.rds") celltypes = readRDS(file = "fls/celltypes_amp_synovium.rds") celltypes_fast = readRDS(file = "fls/celltypes_fast_synovium_celltypes.rds") ``` ## Load data Read the CEL-seq2 data. ```{r read CELSeq2, message = F, eval = F} ReadCelseq <- function (counts.file, meta.file) { E = suppressWarnings(readr::read_tsv(counts.file)); gns <- E$gene; E = E[,-1] E = Matrix::Matrix(as.matrix(E), sparse = TRUE) rownames(E) <- gns E } counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz" meta.file = "./fls/celseq_meta.immport.723957.tsv" E = ReadCelseq(counts.file = counts.file, meta.file = meta.file) M = suppressWarnings(readr::read_tsv(meta.file)) # filter data based on depth and number of genes detected kmu = Matrix::colSums(E != 0) kmu2 = Matrix::colSums(E) E = E[,kmu > 200 & kmu2 > 500] # filter by mitochondrial percentage logik = grepl("^MT-", rownames(E)) MitoFrac = Matrix::colSums(E[logik,]) / Matrix::colSums(E) * 100 E = E[,MitoFrac < 20] ``` ## Seurat Start with the standard pre-processing steps for a Seurat object. ```{r setupSeurat, message = F, eval = F} library(Seurat) ``` Create a Seurat object, and then perform SCTransform normalization. Note: * You can use the legacy functions here (i.e., NormalizeData, ScaleData, etc.), use SCTransform or any other normalization method (including no normalization). We did not notice a significant difference in cell type annotations with different normalization methods. * We think that it is best practice to use SCTransform, but it is not a necessary step. Signac will work fine without it. ```{r Seurat, message = T, eval = F} # load data synovium <- CreateSeuratObject(counts = E, project = "FACs") # run sctransform synovium <- SCTransform(synovium) ``` Perform dimensionality reduction by PCA and UMAP embedding. Note: * Signac actually needs these functions since it uses the nearest neighbor graph generated by Seurat. ```{r Seurat 2, message = T, eval = F} # These are now standard steps in the Seurat workflow for visualization and clustering synovium <- RunPCA(synovium, verbose = FALSE) synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE) synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE) ``` ## SignacX ```{r setup2, message = F, eval = F} library(SignacX) ``` Generate Signac labels for the Seurat object. Note: * Optionally, you can do parallel computing by setting num.cores > 1 in the Signac function. ```{r Signac, message = T, eval = F} labels <- Signac(synovium, num.cores = 4) celltypes = GenerateLabels(labels, E = synovium) ``` Sometimes, training the neural networks takes a lot of time. The above classification took 27 minutes. To make a faster method, we implemented SignacFast which uses pre-trained models. Note: * SignacFast uses an ensemble of 1,800 pre-calculated neural networks using the GenerateModels function together with the training_HPCA reference data set. * Features that are absent from the single cell data and present in the neural network are set to zero. ```{r SignacFast, message = T, eval = F} # Run SignacFast labels_fast <- SignacFast(synovium, num.cores = 4) celltypes_fast = GenerateLabels(labels_fast, E = synovium) ``` Compare results: Celltypes: ```{r Compare 1, echo = F, eval = T} knitr::kable(table(Signac = celltypes$CellTypes, SignacFast = celltypes_fast$CellTypes), format = "html") ``` Cellstates: ```{r Compare 2, echo = F, eval = T} knitr::kable(table(Signac = celltypes$CellStates, SignacFast = celltypes_fast$CellStates), format = "html") ``` Save results ```{r save results, message = F, eval = F} saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds") saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds") saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds") ``` ```{r save.times, include = FALSE, eval = F} write.csv(x = t(as.data.frame(all_times)), file = "fls/tutorial_times_SignacFast_AMP.csv") ```
**Session Info** ```{r, echo=FALSE} sessionInfo() ```