--- title: "Quick start guide" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quick start guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, setup, include = FALSE} source(system.file("extdata", "vignettes", "helpers.R", package = "ricu")) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(ricu) ``` ```{r, assign-src, echo = FALSE} src <- "mimic_demo" ``` ```{r, assign-demo, echo = FALSE} demo <- c(src, "eicu_demo") ``` ```{r, demo-miss, echo = FALSE, eval = !srcs_avail(demo), results = "asis"} demo_missing_msg(demo, "start.html") knitr::opts_chunk$set(eval = FALSE) ``` In order to set up `ricu`, download of datasets from several platforms is required. Two data sources, `mimic_demo` and `eicu_demo` are available directly as R packages, hosted on Github. The respective full-featured versions `mimic` and `eicu`, as well as the `hirid` dataset are available from [PhysioNet](https://physionet.org), while access to the remaining standard dataset `aumc` is available from yet another [website](https://amsterdammedicaldatascience.nl/). The following steps guide through package installation, data source set up and conclude with some example data queries. ## Package installation Stable package releases are available from [CRAN](https://cran.r-project.org/package=ricu) as ```{r, eval = FALSE} install.packages("ricu") ``` and the latest development version is available from [GitHub](https://github.com/eth-mds/ricu) as ```{r, eval = FALSE} remotes::install_github("eth-mds/ricu") ``` ## Demo datasets The demo datasets `r paste1(demo)` are listed as `Suggests` dependencies and therefore their availability is determined by the value passed as `dependencies` to the above package installation function. The following call explicitly installs the demo data set packages ```{r, echo = FALSE, eval = TRUE, results = "asis"} cat( "```r\n", "install.packages(\n", " c(", paste0("\"", sub("_", ".", demo), "\"", collapse = ", "), "),\n", " repos = \"https://eth-mds.github.io/physionet-demo\"\n", ")\n", "```\n", sep = "" ) ``` ## Full datasets Included with `ricu` are functions for download and setup of the following datasets: `mimic` (MIMIC-III), `eicu`, `hirid`, `aumc` and `miiv` (MIMIC-IV), which can be invoked in several different ways. - To begin with, a directory is needed where the data can permanently be stored. The default location is platform dependent and can be overridden using the environment variable `RICU_DATA_PATH`. The current value can be retrieved by calling `data_dir()`. - Access to both the PhysioNet datasets (MIMIC-III, eICU and HiRID), as well as to AUMCdb is free but credentialed. In addition to setting up an [account with PhysioNet](https://physionet.org/register/), [credentialing](https://physionet.org/settings/credentialing/) is required, which, in the case of HiRID, must also be followed by submitting an [access request](https://physionet.org/request-access/hirid/1.1.1/) to the data-owners. Details on the procedure for requesting access to AUMCdb is available from [here](https://amsterdammedicaldatascience.nl/amsterdamumcdb/) and consists of filling out a [form](https://amsterdammedicaldatascience.nl/content/uploads/sites/2/2022/12/arfeula_v1.6.pdf) and completing a training course such as [Data or Specimens Only Research (DSOR)](https://physionet.org/about/citi-course/) which, together with proof of training course completion, can be submitted by [email](mailto:access@amsterdammedicaldatascience.nl). - If raw data in `.csv` form has already been downloaded, this can be decompressed and copied to an appropriate sub-folder (`mimic`, `eicu`, `hirid` or `aumc`) to the directory identified by `data_dir()`. - In order to have `ricu` download the required data, login credentials can be supplied as environment variables `RICU_PHYSIONET_USER`/`RICU_PHYSIONET_PASS` and `RICU_AUMC_TOKEN` (the string the follows `token=` in the download URL received from the AUMCdb data owners) or entered into the terminal manually in interactive sessions. - Enabling efficient random row/column access, `ricu` converts `.csv` files into a binary format using the [fst](https://cran.r-project.org/package=fst) package. - Data conversion to `.fst` format (and potentially data download) is automatically triggered upon first access of a table. In interactive sessions, the user is asked for permission to setup the given data source and in non-interactive sessions, access to missing data throws an error. - Instead of relying on first data access to trigger setup, up-front data conversion, possible preceded by data download, can be invoked by calling `setup_src_data()`. ## Concept loading Many commonly used clinical data concepts are available for all data sources, where the required data exists. An overview of available concepts is available by calling `explain_dictionary()` and concepts can be loaded using `load_concepts()`: ```{r, load-ts} <> <> head(explain_dictionary(src = demo)) load_concepts("alb", src, verbose = FALSE) ``` Concepts representing time-dependent measurements are loaded as `ts_tbl` objects, whereas static information is retrieved as `id_tbl` object. Both classes inherit from `data.table` (and therefore also from `data.frame`) and can be coerced to any of the base classes using `as.data.table()` and `as.data.frame()`, respectively. Using `data.table` 'by-reference' operations, this is available as zero-copy operation by passing `by_ref = TRUE`^[While `data.table` by-reference operations can be very useful due to their inherent efficiency benefits, much care is required if enabled, as they break with the usual base R by-value (copy-on-modify) semantics.]. ```{r, load-id} (dat <- load_concepts("height", src, verbose = FALSE)) head(tmp <- as.data.frame(dat, by_ref = TRUE)) identical(dat, tmp) ``` Many functions exported by `ricu` use `id_tbl` and `ts_tbl` objects in order to enable more concise semantics. Merging an `id_tbl` with a `ts_tbl`, for example, will automatically use the columns identified by `id_vars()` of both tables, as `by.x`/`by.y` arguments, while for two `ts_tbl` object, respective columns reported by `id_vars()` and `index_var()` will be used to merge on. When loading form multiple data sources simultaneously, `load_concepts()` will add a `source` column (which will be among the `id_vars()` of the resulting object), thereby allowing to identify stay IDs corresponding to the individual data sources. ```{r, load-mult} load_concepts("weight", demo, verbose = FALSE) ``` ## Extending the concept dictionary In addition to the ~100 concepts that are available by default, adding user-defined concepts is possible either as R objects or more robustly, as JSON configuration files. - Data concepts consist of zero, one, or several data items per data source, encoding how to retrieve the corresponding data. The constructors `concept()` and `item()` can be used to instantiate concepts as R objects. ```{r, create-concept, eval = srcs_avail("mimic_demo")} ldh <- concept("ldh", item("mimic_demo", "labevents", "itemid", 50954), description = "Lactate dehydrogenase", unit = "IU/L" ) load_concepts(ldh, verbose = FALSE) ``` - Configuration files are looked for in both the package installation directory and in user-specified locations, either using the environment variable `RICU_CONFIG_PATH` or by passing paths as function arguments (`load_dictionary()` for example accepts a `cfg_dirs` argument). - Mechanisms for both extending and replacing existing concept dictionaries are supported by `ricu`. The file name of the default concept dictionary is called `concept-dict.json` and any file with the same name in user-specified locations will be used as extensions. In order to forgo the internal dictionary, a different file name can be chosen, which then has to be passed as function argument (`load_dictionary()` for example has a `name` argument which defaults to `concept-dict`) - A JSON-based concept akin to the one above can be specified as ``` { "ldh": { "unit": "IU/L", "description": "Lactate dehydrogenase", "sources": { "mimic_demo": [ { "ids": 50954, "table": "labevents", "sub_var": "itemid" } ] } } } ``` and this can (given that it is saved as `concept-dict.json` in a directory pointed to by `RICU_CONFIG_PATH`) then be loaded using `load_concepts()` as ```{r, eval = FALSE} load_concepts("ldh", "mimic_demo") ``` For further details on constructing concepts, refer to documentation at `?concept` and `?item`.