---
title: "DrImpute : imputing dropout events in single-cell RNA-sequencing data"
date: "`r Sys.Date()`"
author: "Il-Youp Kwak"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{DrImpute : imputing dropout events in single-cell RNA-sequencing data}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r knitr_options, echo=FALSE, results=FALSE}
library(knitr)
opts_chunk$set(fig.width = 12)
```

This vignette illustrates how to use DrImpute to impute dropout events in single-cell RNA-sequencing (scRNA-seq) data.

## Data preparation

The example data are taken from Usoskin et al. (2015) (GEO accession GSE59739). We randomly selected 150 of the original 799 cells.

```{r loading, include=FALSE}
library(DrImpute)
```

First, genes expressed in fewer than 2 cells are removed with `preprocessSC`.
```{r loading2}
# Load the example count matrix (genes x cells).
data(exdata)
# Remove genes detected in fewer than 2 cells.
exdata <- preprocessSC(exdata)
```
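The gene filter can be illustrated in base R on a small count matrix. The matrix `toy` below is hypothetical and made up for this example, and the sketch reproduces only the rule stated above; `preprocessSC` may apply further filtering, so this is not the package's implementation.

```{r filter_sketch}
# Hypothetical toy count matrix: 4 genes (rows) x 3 cells (columns).
toy <- matrix(c(0, 4, 5,
                1, 0, 0,
                2, 3, 0,
                0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("g", 1:4), c("A", "B", "C")))

# Number of cells in which each gene has a nonzero count.
n_expressed <- rowSums(toy > 0)

# Keep only genes detected in at least 2 cells (here g1 and g3).
toy_filtered <- toy[n_expressed >= 2, , drop = FALSE]
rownames(toy_filtered)
```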

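For simplicity, each cell is normalized by a size factor proportional to its total read count (here, the mean count per gene in that cell), and the normalized values are then log-transformed with a pseudocount of 1.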
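```{r loading3}
# Size factor per cell: mean count over genes (proportional to total read count).
sf <- colMeans(exdata)
# Divide each column (cell) by its size factor.
npX <- t(t(exdata) / sf)
# Log-transform with a pseudocount of 1.
lnpX <- log(npX + 1)
```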
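To make the choice of size factor concrete, here is a base-R sketch on a hypothetical toy matrix (not part of `exdata`) in which cell B is sequenced at twice the depth of cell A. Dividing by the per-cell mean gives the same result as dividing by the total read count up to a constant factor (the number of genes), and both put the two cells on a common scale.

```{r norm_sketch}
# Hypothetical toy counts: 3 genes x 2 cells; cell B has twice the depth of A.
counts <- matrix(c(1, 2,
                   3, 6,
                   4, 8),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("g", 1:3), c("A", "B")))

sf_mean  <- colMeans(counts)  # size factor as in this vignette
sf_total <- colSums(counts)   # total read count per cell

norm_mean  <- sweep(counts, 2, sf_mean, "/")
norm_total <- sweep(counts, 2, sf_total, "/")

# The two normalizations differ only by the constant number of genes.
all.equal(norm_mean, norm_total * nrow(counts))

# After scaling, the two cells have identical profiles, so their log values agree.
log_norm <- log(norm_mean + 1)
all.equal(log_norm[, "A"], log_norm[, "B"])
```

Because the constant factor is applied before the `+1` pseudocount, mean-based and total-based scaling are not identical after the log transform; mean-based scaling gives every cell a mean normalized value of 1.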


## Data analysis

Dropout imputation is performed with a single call to the `DrImpute` function on the log-transformed expression matrix.

```{r loading4}
lnpX_imp <- DrImpute(lnpX)
```
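Conceptually, DrImpute clusters cells, replaces each zero with an average over cells in the same cluster, and repeats this over several clusterings before averaging the results. The chunk below is a deliberately simplified, hypothetical sketch of the single-clustering step in base R, not the package's code: the cluster labels are given rather than estimated, only one clustering is used, the within-cluster mean includes zero entries, and `impute_by_cluster` is an illustrative helper, not a DrImpute function.

```{r impute_sketch}
# Hypothetical log-expression matrix: 3 genes x 4 cells.
X <- matrix(c(2, 0, 5, 4,
              0, 3, 1, 1,
              1, 1, 0, 0),
            nrow = 3, byrow = TRUE)
# Assumed cluster labels (given here, not estimated): cells 1-2 and 3-4.
clusters <- c(1, 1, 2, 2)

impute_by_cluster <- function(X, clusters) {
  X_imp <- X
  for (k in unique(clusters)) {
    idx <- which(clusters == k)
    # Mean expression of each gene within this cluster.
    m <- rowMeans(X[, idx, drop = FALSE])
    for (j in idx) {
      zero <- X[, j] == 0
      # Replace only zero entries; nonzero entries are left untouched.
      X_imp[zero, j] <- m[zero]
    }
  }
  X_imp
}

X_imp <- impute_by_cluster(X, clusters)
X_imp
```

In this sketch, gene 3 stays at zero in the second cluster because every cell in that cluster is zero for it; in a cluster-averaging scheme, such entries cannot be filled, so some zeros remain after imputation.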

```{r, include=FALSE}
# Proportion of zero entries before and after imputation.
zero_p <- mean(lnpX == 0)
zero_p_imp <- mean(lnpX_imp == 0)
```

The proportion of zero entries in `lnpX` is `r round(zero_p, 2)`, and DrImpute imputes `r round(100 * (zero_p - zero_p_imp) / zero_p)` percent of these zeros.
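Two different percentages can be derived from the zero proportions before and after imputation: the share of all matrix entries that were imputed, and the share of the original zeros that were imputed. A toy example with hypothetical `before` and `after` matrices (base R only) shows the difference.

```{r zero_sketch}
# Hypothetical matrices before and after imputation (2 genes x 3 cells).
before <- matrix(c(0, 2, 0,
                   1, 0, 3), nrow = 2, byrow = TRUE)
after  <- matrix(c(0.5, 2, 0,
                   1, 0.7, 3), nrow = 2, byrow = TRUE)

p0     <- mean(before == 0)  # 3 of 6 entries are zero
p0_imp <- mean(after == 0)   # 1 of 6 entries is still zero

imputed_of_all   <- p0 - p0_imp            # 1/3 of all entries were imputed
imputed_of_zeros <- imputed_of_all / p0    # 2/3 of the zeros were imputed
c(imputed_of_all, imputed_of_zeros)
```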

We visualize the data with principal component analysis (PCA), before and after imputation with DrImpute. Cells are colored by cell type, taken from the column names of `exdata`.
```{r viz1, echo=FALSE, fig.width=7, fig.height=3.5}
library(irlba)

# Cell-type labels are stored in the column names of the expression matrix.
cell_type <- factor(colnames(exdata))
cols <- c("red", "blue", "black", "purple")

# Center each gene across cells (rows of t(lnpX) are cells).
lXc <- scale(t(lnpX), center = TRUE, scale = FALSE)
lXc_imp <- scale(t(lnpX_imp), center = TRUE, scale = FALSE)

# Truncated SVD; the first two PC scores are U %*% diag(d).
svd.lXc <- irlba(lXc, nv = 2)
svd.lXc.imp <- irlba(lXc_imp, nv = 2)
PC <- svd.lXc$u %*% diag(svd.lXc$d)
PC.imp <- svd.lXc.imp$u %*% diag(svd.lXc.imp$d)

par(mfrow = c(1, 2))
plot(PC, bg = cols[cell_type], pch = 21, col = "black",
     main = "Without imputation", xlab = "PC1", ylab = "PC2")
plot(PC.imp, bg = cols[cell_type], pch = 21, col = "black",
     main = "With DrImpute", xlab = "PC1", ylab = "PC2")

# Draw one legend spanning the whole figure region.
add_legend <- function(...) {
  opar <- par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0),
              mar = c(0, 0, 0, 0), new = TRUE)
  on.exit(par(opar))
  plot(0, 0, type = "n", bty = "n", xaxt = "n", yaxt = "n")
  legend(...)
}

add_legend("bottomright", legend = levels(cell_type), pch = 19,
           col = cols, cex = 1, horiz = TRUE)
```

Without imputation, the NP, TH, and PEP groups overlap in the first two principal components and are hard to tell apart visually. After imputation with DrImpute, these three groups are more clearly separated.
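The PC coordinates in the figure are the first two left singular vectors of the gene-centered matrix scaled by their singular values, computed with `irlba`, which returns a truncated SVD. A base-R sketch on random toy data (not `exdata`), using the full `svd` in place of `irlba`, shows that these scores agree with `prcomp` up to the sign of each component.

```{r pca_sketch}
set.seed(1)
# Hypothetical toy data: 10 cells (rows) x 5 genes (columns).
Y <- matrix(rnorm(50), nrow = 10)

# Center each gene, as scale(..., center = TRUE, scale = FALSE) does above.
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Full SVD; the first two PC scores are the first two columns of U times d.
s <- svd(Yc)
scores_svd <- s$u[, 1:2] %*% diag(s$d[1:2])

# prcomp computes the same scores (it centers by default).
scores_prcomp <- prcomp(Y)$x[, 1:2]

# Singular vectors are only defined up to sign, so compare absolute values.
all.equal(abs(unname(scores_svd)), abs(unname(scores_prcomp)))
```

Truncated SVD via `irlba` is used in the vignette because it computes only the leading components, which is cheaper than a full decomposition when the matrix is large.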