---
title: "Seurat_processing"
author: "Viswanadham Sridhara"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Seurat_processing}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
date: "2025-02-24"
---

```{r setup, include=FALSE}
library(scPipeline)
library(Seurat)
library(magrittr)
library(ReactomeGSA)
```

## Introduction

Most RNA-seq experiments use bulk RNA-seq methods. However, single-cell experiments can shed light on a variety of underlying biological processes. Here, I use a publicly available PBMC dataset to show how this package can run Seurat analyses with a minimal list of parameters. The package arranges Seurat functionality into modules for easy preprocessing and cell clustering.

In addition, the modules can be used for batch correction, cell-type annotation, and transferring annotations from a reference dataset to the current dataset (using SingleR and celldex annotations).

10X PBMC data can be found here: https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz

```{r Read10X data}
counts_data <- Read10X(data.dir = "../inst/extdata", gene.column = 1)
```

## Entire Seurat analysis from counts data to cell clustering using 2 functions

With `SeuratPreprocess` and `SeuratLowDim`, the user can go directly from counts data to identified clusters and low-dimensional embeddings (t-SNE, UMAP).

```{r Seurat analysis, echo=FALSE}
so <- SeuratPreprocess(counts_data)
so <- SeuratLowDim(so)
```
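
For orientation, the two calls above can be compared with a standard Seurat workflow. The sketch below is an assumption about what such wrappers typically cover, not the actual implementation of `SeuratPreprocess` or `SeuratLowDim`. The function names `sketch_preprocess` and `sketch_lowdim`, and all default values, are hypothetical. The block is shown for reading only and is not evaluated when the vignette is built.

```r
# Hypothetical stand-in for SeuratPreprocess(); the real function may differ.
sketch_preprocess <- function(counts, min_cells = 3, min_features = 200) {
  so <- Seurat::CreateSeuratObject(counts = counts, min.cells = min_cells,
                                   min.features = min_features)
  # Percentage of counts from mitochondrial genes, a common QC metric
  so[["percent.mt"]] <- Seurat::PercentageFeatureSet(so, pattern = "^MT-")
  so <- Seurat::NormalizeData(so)
  so <- Seurat::FindVariableFeatures(so, nfeatures = 2000)
  Seurat::ScaleData(so)
}

# Hypothetical stand-in for SeuratLowDim(); the real function may differ.
sketch_lowdim <- function(so, dims = 1:10, resolution = 0.5) {
  so <- Seurat::RunPCA(so)
  # Graph-based clustering on the selected principal components
  so <- Seurat::FindNeighbors(so, dims = dims)
  so <- Seurat::FindClusters(so, resolution = resolution)
  # Two-dimensional embeddings for visualization
  so <- Seurat::RunUMAP(so, dims = dims)
  Seurat::RunTSNE(so, dims = dims)
}

# Usage (requires Seurat and the counts matrix read above):
# so_sketch <- sketch_lowdim(sketch_preprocess(counts_data))
```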

## Plots

```{r Violin plot of known genes, echo=FALSE}
VlnPlot(so, features = c("MS4A1", "CD79A"))
```

```{r Feature plot of known genes, echo=FALSE}
FeaturePlot(so, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",
    "CD8A"))
```


## Identify differentially expressed genes (markers list using Seurat)

`seurat_markers` is a list that holds both the full markers list and a high-confidence subset of it.

```{r Marker analysis, echo=FALSE}
# Computationally intensive step; uncomment to run
#seurat_markers <- SeuratMarkers(so)
```
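
A minimal sketch of how a two-item marker list of this shape could be built with Seurat's `FindAllMarkers`. This is an assumption, not the actual `SeuratMarkers` code: the names `sketch_markers` and `filter_confident` are hypothetical, and the adjusted p-value cutoff of 0.05 is an assumed definition of "high confidence". The block is display-only and is not evaluated.

```r
# Hypothetical helper: keep rows below an adjusted p-value cutoff (assumed criterion).
filter_confident <- function(markers, max_padj = 0.05) {
  markers[markers$p_val_adj < max_padj, , drop = FALSE]
}

# Hypothetical stand-in for SeuratMarkers(); the real function may differ.
sketch_markers <- function(so, max_padj = 0.05) {
  # FindAllMarkers() tests each cluster against all remaining cells
  markers <- Seurat::FindAllMarkers(so, only.pos = TRUE)
  list(markers = markers, confident = filter_confident(markers, max_padj))
}

# Toy marker table, only to show the filtering step without running Seurat
toy_markers <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  cluster = c("0", "0", "1"),
  p_val_adj = c(0.001, 0.20, 0.03)
)
confident <- filter_confident(toy_markers)
```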

The heatmap plots markers by cluster, keeping genes with a log2FC greater than 1 and using the top 10 genes for each cluster.

```{r Heatmap of markers across clusters, echo=FALSE}
# Uncomment after running the marker analysis chunk above
# pbmc.markers <- seurat_markers[[1]]
# top10 <- pbmc.markers %>%
#     dplyr::group_by(cluster) %>%
#     dplyr::filter(avg_log2FC > 1) %>%
#     dplyr::slice_head(n = 10) %>%
#     dplyr::ungroup()
# DoHeatmap(so, features = top10$gene) + NoLegend()
```
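
The selection used for the heatmap (filter on log2FC, then keep the first rows of each cluster) can be tried on a toy table without computing markers. The sketch below uses base R only; `top_n_per_cluster` and the toy data are hypothetical and exist only for illustration. Note that, like `slice_head`, it keeps the first `n` rows per cluster in whatever order the table arrives and does not re-sort by fold change.

```r
# Hypothetical helper mirroring the filter-then-slice logic with base R.
top_n_per_cluster <- function(markers, n = 10, min_log2fc = 1) {
  kept <- markers[markers$avg_log2FC > min_log2fc, , drop = FALSE]
  # split() groups rows by cluster; head() keeps the first n rows of each group
  pieces <- lapply(split(kept, kept$cluster), head, n = n)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy marker table (made-up values)
toy <- data.frame(
  gene = paste0("gene", 1:6),
  cluster = c("0", "0", "0", "1", "1", "1"),
  avg_log2FC = c(2.5, 0.4, 1.8, 3.1, 1.2, 0.9)
)
top1 <- top_n_per_cluster(toy, n = 1)
```

With `n = 1`, one gene per cluster survives: the first row in each cluster whose `avg_log2FC` exceeds the threshold.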

## Next, identify Reactome pathways using the DE gene list (using the ReactomeGSA package)
```{r Pathway analysis, echo=FALSE}
# Needs internet connection to access Reactome database
#seurat_reactome <- ReactomeData(so)
```

`seurat_reactome` is a list with three items: `gsva_result`, `pathway_expression`, and `max_difference`.
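
A minimal sketch of how a list with these three items could be assembled using ReactomeGSA. It assumes the usual ReactomeGSA route for single-cell data (`analyse_sc_clusters` followed by `pathways`) and is not the actual `ReactomeData` code. The names `sketch_reactome` and `pathway_range`, and the toy values, are hypothetical. The block is display-only and is not evaluated.

```r
# Hypothetical helper: per-pathway minimum, maximum, and their difference.
# Assumes the first column holds pathway names and the rest are numeric.
pathway_range <- function(pathway_expression) {
  values <- as.matrix(pathway_expression[, -1, drop = FALSE])
  out <- data.frame(
    name = pathway_expression[[1]],
    min = apply(values, 1, min),
    max = apply(values, 1, max),
    row.names = rownames(pathway_expression)
  )
  out$diff <- out$max - out$min
  # Largest spread first
  out[order(out$diff, decreasing = TRUE), ]
}

# Hypothetical stand-in for ReactomeData(); the real function may differ.
sketch_reactome <- function(so) {
  # Sends cluster-level expression to the Reactome service (needs internet)
  gsva_result <- ReactomeGSA::analyse_sc_clusters(so, verbose = TRUE)
  pathway_expression <- ReactomeGSA::pathways(gsva_result)
  list(gsva_result = gsva_result,
       pathway_expression = pathway_expression,
       max_difference = pathway_range(pathway_expression))
}

# Toy pathway table (made-up IDs and values) to show the range computation
toy_pathways <- data.frame(
  Name = c("Pathway A", "Pathway B"),
  cluster_0 = c(0.10, -0.30),
  cluster_1 = c(0.15, 0.40),
  row.names = c("toy-1", "toy-2")
)
ranges <- pathway_range(toy_pathways)
```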

Expression of different pathways, for example:
```{r Pathways expression}
# Uncomment after running the pathway analysis chunk above
#head(seurat_reactome[[2]], n = 3)
```
  
Minimum, maximum, and difference in expression of different pathways, for example:
```{r Pathways min max}
# Uncomment after running the pathway analysis chunk above
#head(seurat_reactome[[3]], n = 3)
```