---
title: "Data Imputation"
author: "Bill Denney"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Imputation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Imputation may be required for noncompartmental analysis (NCA) calculations.
Typical imputations include setting the concentration before the first dose to
zero or shifting actual-time predose concentrations to the beginning of the
dosing interval.

PKNCA supports imputation either for the full analysis dataset or per
calculation interval.

The imputation methods currently built into PKNCA are documented in
`?PKNCA_impute_method`, and they can be listed with:

```{r results='markup'}
library(PKNCA)
cat(paste(
  "*", ls("package:PKNCA", pattern = "^PKNCA_impute_method")
), sep = "\n")
```

## How does imputation occur?

(You can skip this section if you do not need the details of how imputation is
performed.)

Imputation occurs just before calculations are performed within PKNCA, and it
is applied to one interval definition at a time.  As a result, the same group
(usually meaning the same subject with the same analyte) over the same time
range can have different imputations for different parameter calculations.

This design ensures that there are no unintentional modifications to the data.
As an example, if an AUC~0-24~ were calculated on Day 1 and Day 2 of a study
using actual times, the nominal 24-hour sample may have been collected at 23.5
hours.  For the Day 1 calculation, it may be preferable to keep that sample at
23.5 hours, while for the Day 2 calculation, it may be preferable to shift the
same sample to 24 hours (time 0 of Day 2).
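The per-interval idea can be sketched in base R.  The helper
`sketch_interval_data()` and its `shift_predose` and `max_shift` arguments are
hypothetical illustrations (not PKNCA functions), and the concentrations are
made-up values.  The sketch shows the same 23.5-hour sample staying at 23.5
hours for the Day 1 interval while being moved to 24 hours for the Day 2
interval, with the source data left unchanged.

```{r impute-sketch-per-interval}
# Made-up actual-time data for one subject dosed at 0 and 24 hours; the
# nominal 24-hour (Day 2 predose) sample was collected at 23.5 hours.
d_toy <- data.frame(
  time = c(0, 1, 4, 8, 23.5, 25, 28, 32, 47.5),
  conc = c(0, 8, 6, 3, 1.2, 9, 7, 3.5, 1.4)
)

# Hypothetical helper (not part of PKNCA): select the samples in [start, end]
# and, optionally, move the latest sample taken within `max_shift` hours
# before `start` to `start` when no sample exists exactly at `start`.
sketch_interval_data <- function(d, start, end, shift_predose = FALSE,
                                 max_shift = 1) {
  out <- d[d$time >= start & d$time <= end, ]
  if (shift_predose && !any(out$time == start)) {
    before <- d[d$time < start & d$time >= start - max_shift, ]
    if (nrow(before) > 0) {
      predose <- before[which.max(before$time), ]
      predose$time <- start
      out <- rbind(predose, out)
    }
  }
  out[order(out$time), ]
}

# Day 1: the 23.5-hour sample stays at 23.5 hours
day1 <- sketch_interval_data(d_toy, start = 0, end = 24)
day1
# Day 2: the same sample is shifted to 24 hours (time 0 of Day 2)
day2 <- sketch_interval_data(d_toy, start = 24, end = 48, shift_predose = TRUE)
day2
# The source data are unchanged
d_toy$time
```

Because each call works on its own copy of the selected rows, the shift made
for Day 2 never leaks into the Day 1 result or into `d_toy`.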

## How to select imputation methods to use

Imputation methods are selected with a character string in which the method
names are separated by commas, spaces, or both.  No imputation is performed if
the imputation method is given as `NA` or `""`.

* To select no imputation (the default), indicate the imputation by `NA` or
  `""`.
* To set imputation on the full dataset, use the `impute` argument to
  `PKNCAdata()` to specify the methods to use.
* To set imputation by interval, use the `impute` argument to `PKNCAdata()` to
  specify the column in the intervals dataset to use for imputation.
* You cannot specify imputation for both the full dataset and by interval at the
  same time.  If the value of the `impute` argument to `PKNCAdata()` matches a
  column name in the intervals dataset, that column is used (imputation by
  interval).

Imputation method functions are named `PKNCA_impute_method_[method name]`.  For
example, the method to impute a concentration of 0 at time 0 is named
`PKNCA_impute_method_start_conc0`.  When specifying the imputation method to
use, give only the `[method name]` part of the function name; for this example,
use `"start_conc0"`.

To specify more than one method, list them in the order they should be applied,
separated by commas or spaces.  For example, to first move a predose
concentration to the time of dosing and then set the concentration at time 0 to
0, use `"start_predose,start_conc0"`, and the two methods will be applied in
that order.
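The mapping from a selection string to the functions that are applied can be
sketched as follows.  `sketch_impute_function_names()` is a hypothetical helper
written for illustration, not PKNCA's internal parser.  It assumes a single
string and follows the rules above: commas and/or spaces separate methods, `NA`
or `""` means no imputation, and each method name is prefixed with
`PKNCA_impute_method_` while keeping the given order.

```{r impute-sketch-method-names}
# Hypothetical helper (not part of PKNCA): turn an imputation string into the
# names of the functions that would be applied, in order.
sketch_impute_function_names <- function(impute) {
  stopifnot(length(impute) == 1)
  # NA or "" means no imputation
  if (is.na(impute) || !nzchar(trimws(impute))) {
    return(character(0))
  }
  # Split on any run of commas and/or spaces and drop empty pieces
  methods <- strsplit(trimws(impute), "[, ]+")[[1]]
  methods <- methods[nzchar(methods)]
  paste0("PKNCA_impute_method_", methods)
}

sketch_impute_function_names("start_predose,start_conc0")
sketch_impute_function_names("start_predose, start_conc0")
sketch_impute_function_names(NA)
```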

## Imputation for the full dataset

If an imputation applies to the full dataset, it can be provided in the `impute`
argument to `PKNCAdata()`:

```{r impute-full-data}
library(PKNCA)
# Remove time 0 to illustrate that imputation works
d_conc <- as.data.frame(datasets::Theoph)[datasets::Theoph$Time != 0, ]
conc_obj <- PKNCAconc(d_conc, conc~Time|Subject)
d_dose <- unique(datasets::Theoph[datasets::Theoph$Time == 0,
                                  c("Dose", "Time", "Subject")])
dose_obj <- PKNCAdose(d_dose, Dose~Time|Subject)
data_obj <- PKNCAdata(conc_obj, dose_obj, impute = "start_predose,start_conc0")
nca_obj <- pk.nca(data_obj)
summary(nca_obj)
```

## Imputation by calculation interval

If an imputation applies only to specific intervals, set the `impute` argument
to `PKNCAdata()` to the name of the column in the intervals data.frame that
holds the imputation methods:

```{r impute-by-interval}
library(PKNCA)
# Remove time 0 to illustrate that imputation works
d_conc <- as.data.frame(datasets::Theoph)[datasets::Theoph$Time != 0, ]
conc_obj <- PKNCAconc(d_conc, conc~Time|Subject)
d_dose <- unique(datasets::Theoph[datasets::Theoph$Time == 0,
                                  c("Dose", "Time", "Subject")])
dose_obj <- PKNCAdose(d_dose, Dose~Time|Subject)

d_intervals <-
  data.frame(
    start=0, end=c(24, 24.1),
    auclast=TRUE,
    impute=c(NA, "start_conc0")
  )

data_obj <- PKNCAdata(conc_obj, dose_obj, intervals = d_intervals, impute = "impute")
nca_obj <- pk.nca(data_obj)
# PKNCA does not impute time 0 by default, so AUClast is not calculated for the
# 0-24 interval (no imputation), while the 0-24.1 interval uses "start_conc0"
summary(nca_obj)
```

## Advanced: Writing your own imputation functions

Writing your own imputation function is intended to be simple.  Creating one
requires the following steps:

1. Write a function whose name starts with `PKNCA_impute_method_` and whose
   remainder is a brief description of the method (for example,
   `PKNCA_impute_method_start_conc0`).  That remainder is the method name used
   when selecting the imputation.
2. The function should accept at least 4 arguments:  `conc`, `time`, `...`, and
   `options`.
3. The function should return a single data.frame with two columns named `conc`
   and `time`.  The rows in the data.frame must be sorted by `time`.

The function may also accept the following named arguments:

* `start` and `end`, the start and end times of the interval, and
* `conc.group` and `time.group`, the concentrations and times for the group
  before they have been filtered to the interval.
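As a minimal sketch of the steps above, the hypothetical method below (not a
built-in PKNCA method) mimics the idea of `start_conc0` by adding a
concentration of 0 at `start` when no sample exists there.  It uses the optional
`start` argument and returns a data.frame with `conc` and `time` columns sorted
by `time`.  Following the naming rule in step 1, its method name would be
`"start_zero_example"`.

```{r impute-sketch-custom-method}
# Hypothetical custom imputation method (illustration only, not part of
# PKNCA).  Per the naming rule, its method name is "start_zero_example".
PKNCA_impute_method_start_zero_example <- function(conc, time, start = 0, ...,
                                                   options = list()) {
  # Add a concentration of 0 at the interval start if no sample is there
  if (!any(time == start)) {
    conc <- c(0, conc)
    time <- c(start, time)
  }
  # Return the required two-column data.frame, sorted by time
  ret <- data.frame(conc = conc, time = time)
  ret <- ret[order(ret$time), ]
  rownames(ret) <- NULL
  ret
}

# Calling the function directly on made-up data
PKNCA_impute_method_start_zero_example(conc = c(5, 3, 1), time = c(1, 4, 8))
```

Calling the function directly like this is a quick way to check that it honors
the return contract before using it in an analysis.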