--- title: "VIM" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{VIM} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- VIM introduces tools for visualization of missing and imputed values. Forthermore, methods to impute missing values are featured. This vignette will give a brief look at a common imputation scenario and showcase how VIM can be used to both impute the data and also interpret the results visually. ## Visualize missing values ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, fig.align = "center" ) ``` ```{r setup, message=FALSE, fig.height = 3} library(VIM) data(sleep) a <- aggr(sleep, plot = FALSE) plot(a, numbers = TRUE, prop = FALSE) ``` The left plot shows the amount of missings for each column in the dataset `sleep` and the right plot shows how often each combination of missings occur. For example, there are 9 rows wich contain a missing in both `NonD` and `Dream`. For simplicity, we will only look at the variables `Dream` and `Sleep` for the remainer of this vignette. Bivariate datasets can be passed to special functions that visualize the structure of missings such as `marginplot()`. ```{r} x <- sleep[, c("Dream", "Sleep")] marginplot(x) ``` The __red__ boxplot on the left shows the distrubution of all values of `Sleep` where `Dream` contains a missing value. The __blue__ boxplot on the left shows the distribution of the values of `Sleep` where `Dream` is observed. ## Impute missing values In order to impute missing values, `VIM` offers a spectrum of imputation methods like `kNN()` (k nearest neighbour), `hotdeck()` and so forth. Those functions can be applied to a `data.frame` and return another `data.frame` where missings are replaced by imputed values. ```{r} x_imputed <- kNN(x) ``` To learn more about all implemented imputation methods, three vignettes are available - `vignette("donorImp")` explains the donor-based imputation methods `hotdeck()` and `kNN()` - `vignette("modelImp")` gives insight into the model-based imputation methods `regressionImp()` and `matchImpute()` - `vignette("irmi")` showcases the `irmi()` method. ## Visualize imputed values The same functions that visualize missing values can also visualize the imputed dataset. ```{r} marginplot(x_imputed, delimiter = "_imp") ``` In this plot three differnt colors are used in the top-right. These colors represent the structure of missings. * __brown__ points represent values where `Dream` was missing initially * __beige__ points represent values where `Sleep` was missing initially * __black__ points represent values where both `Dream` and `Sleep` were missing initially The `kNN()` method seemingly preserves the correlation between `Dream` and `Sleep`.