--- title: 'Signac and SPRING: Learning CD56 NK cells from multi-modal analysis of CITE-seq PBMCs from 10X Genomics' author: 'Mathew Chamberlain' date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Mapping cells from CITE-seq PBMCs from 10X Genomics to another data set} %\VignetteEngine{knitr::rmarkdown_notangle} %\VignetteEncoding{UTF-8} --- This vignette shows how to use SignacX with Seurat and SPRING to learn a new cell type category from single cell data. ## Load data We start with CITE-seq data that were already classified with SignacX using the SPRING pipeline. ```{r setupSeurat, message = F, eval = F} library(Seurat) library(SignacX) ``` Load CITE-seq data from 10X Genomics processed with SPRING and classified with SignacX already. ```{r setup, message = F, eval = F} # load CITE-seq data data.dir = './CITESEQ_EXPLORATORY_CITESEQ_5K_PBMCS/FullDataset_v1_protein' E = CID.LoadData(data.dir = data.dir) # Load labels json_data = rjson::fromJSON(file=paste0(data.dir,'/categorical_coloring_data.json')) ``` Create a Seurat object for the protein expression data; we will use this as a reference. ```{r Seurat, eval = F} # separate protein and gene expression data logik = grepl("Total", rownames(E)) P = E[logik,] E = E[!logik,] # CLR normalization in Seurat colnames(P) <- 1:ncol(P) colnames(E) <- 1:ncol(E) reference <- CreateSeuratObject(E) reference[["ADT"]] <- CreateAssayObject(counts = P) reference <- NormalizeData(reference, assay = "ADT", normalization.method = "CLR") ``` Identify CD56 bright NK cells based on protein expression data. ```{r Seurat 2, eval = F} # generate labels lbls = json_data$CellStates$label_list lbls[lbls != "NK"] = "Unclassified" CD16 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD16-TotalSeqB-CD16",] CD56 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD56-TotalSeqB-CD56",] logik = log2(CD56) > 10 & log2(CD16) < 7.5 & lbls == "NK"; sum(logik) lbls[logik] = "NK.CD56bright" ``` ## SignacX Generate a training data set from the reference data and save it for later use. Note: * SignacBoot performs feature selection, bootstrapping, imputation and normalization to derive a training data set from single cell data. ```{r Signac, message = T, eval = F} # generate bootstrapped single cell data R_learned = SignacBoot(E = E, spring.dir = data.dir, L = c("NK", "NK.CD56bright"), labels = lbls, logfc.threshold = 1) # save the training data save(R_learned, file = "training_NKBright_v207.rda") ``` ## Classify a new data set with the model Load expression data for a different data set (this was also previously processed through SPRING and SignacX) ```{r Seurat Visualization 0, message = F, eval = F} # Classify another data set with new model # load new data new.data.dir = "./PBMCs_5k_10X/FullDataset_v1" E = CID.LoadData(data.dir = new.data.dir) # load cell types identified with Signac json_data = rjson::fromJSON(file=paste0(new.data.dir,'/categorical_coloring_data.json')) ``` Generate new labels. Note: * Signac trains an ensemble of 100 neural network classifiers using the new training data set built above (R_learned), and then classifies unseen data (E). ```{r Seurat Visualization 1, message = F, eval = F} # generate new labels cr_learned = Signac(E = E, R = R_learned, spring.dir = new.data.dir) ``` Now we amend the existing labels (classified previously with SignacX); we add the new labels and generate a new SPRING layout.Note: * We usually copy the existing SPRING files from "FullDataset_v1" to "FullDataset_v1_Learned" to generate a new layout while preserving the existing layout. ```{r Seurat Visualization 2, message = F, eval = F} # modify the existing labels cr = lapply(json_data, function(x) x$label_list) logik = cr$CellStates == 'NK' cr$CellStates[logik] = cr_learned[logik] logik = cr$CellStates_novel == 'NK' cr$CellStates_novel[logik] = cr_learned[logik] new.data.dir = paste0(new.data.dir, "_Learned") ``` Save results ```{r Seurat Visualization 3, message = F, eval = F} # save dat = CID.writeJSON(cr, spring.dir = new.data.dir, new_colors = c('red'), new_populations = c( 'NK.CD56bright')) ```
**Session Info** ```{r, echo=FALSE} sessionInfo() ```