FAST: two DLPFC sections

Wei Liu

2024-03-18

This vignette introduces the FAST workflow for the analysis of two humn dorsolateral prefrontal cortex (DLPFC) spatial transcriptomics dataset. FAST workflow is based on the PRECASTObj object created in the PRECAST R package and the workflow of FAST is similar to that of PRECAST; see the vignette (https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html) for the workflow of PRECAST. In this vignette, the workflow of FAST consists of three steps

We demonstrate the use of FAST to two DLPFC Visium data that are here, which can be downloaded to the current working path by the following command:

githubURL <- "https://github.com/feiyoung/FAST/blob/main/vignettes_data/seulist2_ID9_10.RDS?raw=true"
download.file(githubURL,"seulist2_ID9_10.RDS",mode='wb')

Then load to R

dlpfc2 <- readRDS("./seulist2_ID9_10.RDS")

The package can be loaded with the command:

library(ProFAST) # load the package of FAST method
library(PRECAST)
library(Seurat)

View the DLPFC data

First, we view the the two spatial transcriptomics data with Visium platform. There are ~15000 genes for each data batch and ~7000 spots in total.

dlpfc2 ## a list including three Seurat object with default assay: RNA

Check the content in dlpfc2.

head(dlpfc2[[1]])

We observed the genes are Ensembl IDs. In the following, we will transfer the Ensembl IDs to gene symbols for matching the housekeeping genes in the downstream analysis for removing the unwanted variations.

row.names(dlpfc2[[1]])[1:10]

Create a PRECASTObject object

We show how to create a PRECASTObject object step by step. First, we create a Seurat list object using the count matrix and meta data of each data batch.

## Get the gene-by-spot read count matrices
countList <- lapply(dlpfc2, function(x) x[["RNA"]]@counts)
M <- length(countList)
### transfer the Ensembl ID to symbol
for(r in 1:M){
  row.names(countList[[r]]) <- transferGeneNames(row.names(countList[[r]]), now_name = "ensembl",
                                                 to_name="symbol",
                                                 species="Human", Method='eg.db')
}

## Check the spatial coordinates: Yes, they are named as "row" and "col"!
print(head(dlpfc2[[1]]@meta.data))

## Get the meta data of each spot for each data batch
metadataList <- lapply(dlpfc2, function(x) x@meta.data)


## ensure the row.names of metadata in metaList are the same as that of colnames count matrix in countList
for(r in 1:M){
  row.names(metadataList[[r]]) <- colnames(countList[[r]])
}


## Create the Seurat list  object

seuList <- list()
for(r in 1:M){
  seuList[[r]] <- CreateSeuratObject(counts = countList[[r]], meta.data=metadataList[[r]], project = "FASTdlpfc2")
}
print(seuList)
row.names(seuList[[1]])[1:10]

Prepare the PRECASTObject with preprocessing step.

Next, we use CreatePRECASTObject() to create a PRECASTObject object based on the Seurat list object seuList. Here, we select the top 2000 highly variable genes for followed analysis. Users are able to see https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html for what is done in this function.

Fit FAST for this data

Add the model setting

Add adjacency matrix list and parameter setting of FAST. More model setting parameters can be found in model_set_FAST().

Fit FAST

For function FAST, users can specify the number of factors q and the fitted model fit.model. The q sets the number of spatial factors to be extracted, and a lareger one means more information to be extracted but higher computaional cost. The fit.model specifies the version of FAST to be fitted. The Gaussian version (gaussian) models the log-count matrices while the Poisson verion (poisson) models the count matrices; default as poisson. For a fast implementation, we run Gaussian version of FAST here. (Note: The computational time required to run the analysis on personal PCs is approximately ~1 minute.)

Run possion version Users can also use the possion version by the following command:

Evaluate the adjusted McFadden’s pseudo R-square

Next, we investigate the performance of dimension reduction by calculating the adjusted McFadden’s pseudo R-square for each data batch. The manual annotations are regarded as the groud truth in the meta.data of each component of PRECASTObj@seulist.

Embedding alignment and spatial clustering using iSC-MEB

Based on the embeddings from FAST, we use iSC-MEB to jointly align the embeddings and perform clustering. In this downstream analysis, other methods for embedding alignment and clustering can be also used. In the vignette of the simulated ST data, we show another method (Harmony+Louvain) to perform embedding alignment and clustering.

First, we use the function RunHarmonyLouvain() in ProFAST R package to select the number of clusters

PRECASTObj <- RunHarmonyLouvain(PRECASTObj, resolution = 0.3)
ARI_vec_louvain <- sapply(1:M, function(r) mclust::adjustedRandIndex(PRECASTObj@resList$Louvain$cluster[[r]], yList[[r]]))
print(ARI_vec_louvain)

Then, we use the function RuniSCMEB() in ProFAST R package to jointly perform embedding alignment and spatial clustering. The results are saved in the slot PRECASTObj@resList$iSCMEB.

PRECASTObj <- RuniSCMEB(PRECASTObj, seed=1)
str(PRECASTObj@resList$iSCMEB)
sapply(1:M, function(r) mclust::adjustedRandIndex(PRECASTObj@resList$iSCMEB$cluster[[r]], yList[[r]]))

Remove unwanted variations in the log-normalized expression matrices

In the following, we remove the unwanted variations in the log-normalized expression matrices to obtain a combined log-normalized expression matrix in a Seurat object. For this real data, we use housekeeping genes as negative control to remove the unwanted variations. Specifically, we leverage the batch effect embeddings estimated through housekeeping gene expression matrix to capture unwanted variations. Additionally, we utilize the cluster labels obtained via iSC-MEB to retain the desired biological effects. The posterior probability of cluster labels (RList) are in the slot PRECASTObj@resList$iSCMEB.

str(PRECASTObj@resList$iSCMEB)

To capture the unwanted variation, we first select the select the housekeeping genes using the function SelectHKgenes().

## select the HK genes
HKgenes <- SelectHKgenes(seuList, species= "Human", HK.number=200)
seulist_HK <- lapply(seuList, function(x) x[HKgenes, ])
seulist_HK

Then, we integrate the two sections by removing the unwanted variations by using seulist_HK=seulist_HK and Method = "iSC-MEB" in the function IntegrateSRTData(). After obtaining seuInt, we will see there are three embeddings: FAST, iscmeb and position, in the slot seuInt@reductions. FAST are the embeddings obtained by FAST model fitting and are uncorrected embeddings that may includes the unwanted effects (i.e., batch effects); iscmebare the aligned embeddings obtained by iSC-MEB fitting; and position are the spatial coordinates.

seuInt <- IntegrateSRTData(PRECASTObj, seulist_HK=seulist_HK, Method = "iSC-MEB",
                           seuList_raw=NULL, covariates_use=NULL, verbose=TRUE)
seuInt

Visualization

First, user can choose a beautiful color schema using chooseColors() in the R package PRECAST.

cols_cluster <- chooseColors(palettes_name = "Nature 10", n_colors = 8, plot_colors = TRUE)

Then, we plot the spatial scatter plot for clusters using the function SpaPlot() in the R package PRECAST.

p12 <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1, cols = cols_cluster, combine = FALSE)
library(ggplot2)
p12 <- lapply(p12, function(x) x+ coord_flip() + scale_x_reverse())
drawFigs(p12, layout.dim = c(1,2), common.legend = T)

We use the function AddUMAP() in the R package PRECAST to obtain the three-dimensional components of UMAP using the aligned embeddings in the reduction harmony.

seuInt <- AddUMAP(seuInt, n_comp=3, reduction = 'iscmeb', assay = 'RNA')
seuInt

We plot the spatial tNSE RGB plot to illustrate the performance in extracting features.

p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 1, combine = FALSE, text_size = 15)
p13 <- lapply(p13, function(x) x+ coord_flip() + scale_x_reverse())
drawFigs(p13, layout.dim = c(1, 2), common.legend = TRUE, legend.position = "right", align = "hv")

We use the function AddUTSNE() in the R package PRECAST to obtain the two-dimensional components of UMAP using the aligned embeddings in the reduction iscmeb, and plot the tSNE plot based on the extracted features to check the performance of integration.

seuInt <- AddTSNE(seuInt, n_comp = 2, reduction = 'iscmeb', assay = 'RNA')
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster,
    border_col = "gray10",  legend_pos = "right")  # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif", legend_pos = "right")

drawFigs(list(p1, p2), layout.dim = c(1, 2), legend.position = "right", align = "hv")

Combined differential epxression analysis

Finally, we condut the combined differential expression analysis using the integrated log-normalized expression matrix saved in the seuInt object. The function FindAllMarkers() in the Seurat R package is ued to achieve this analysis.

dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 5
dat_deg %>%
    group_by(cluster) %>%
    top_n(n = n, wt = avg_log2FC) -> top10

Plot dot plot of normalized expressions of the layer-specified marker genes for each spatial domain identified by using the FAST embeddings.

col_here <- c("#F2E6AB", "#9C0141") 
library(ggplot2)
marker_Seurat <- readRDS("151507_DEGmarkerTop_5.rds")
marker_Seurat <- lapply(marker_Seurat, function(x) transferGeneNames(x, Method='eg.db'))
maker_genes <- unlist(lapply(marker_Seurat, toupper))
features <- maker_genes[!duplicated(maker_genes)]
p_dot <- DotPlot(seuInt, features=unname(features), cols=col_here, #  idents = ident_here,
              col.min = -1, col.max = 1) + coord_flip()+ theme(axis.text.y = element_text(face = "italic"))+
  ylab("Domain") + xlab(NULL) + theme(axis.text.x = element_text(size=12, angle = 25, hjust = 1, family='serif'),
                                      axis.text.y = element_text(size=12, face= "italic", family='serif'))
p_dot
# table(unlist(yList), Idents(seuInt))

Based on the marker genes, we annotate the Domains 1-9 as Layer5/6, Layer3/4, Layer5, Layer2/3, Layer2/3, Layer6, WM, Layer5, Layer1. Then, we rename the Domains using the function RenameIdents().

seuInt2 <- RenameIdents(seuInt, '1' = 'Layer6', '2' = 'Layer2/3', '3'="Layer5", '4'="WM",
                           '5'='Layer3/4', '6'= 'Layer1')
levels(seuInt2)  <- c("Layer1", "Layer2/3", "Layer3/4", "Layer5", "Layer6", "WM")
p_dot2 <- DotPlot(seuInt2, features=unname(features), cols=col_here, #  idents = ident_here,
              col.min = -1, col.max = 1) + coord_flip()+ theme(axis.text.y = element_text(face = "italic"))+
  ylab("Domain") + xlab(NULL) + theme(axis.text.x = element_text(size=12, angle = 25, hjust = 1, family='serif'),
                                      axis.text.y = element_text(size=12, face= "italic", family='serif'))
p_dot2

Session Info