---
title: "Introduction to `lvimp`"
output: 
  rmarkdown::html_vignette:
    keep_md: true
vignette: >
  %\VignetteIndexEntry{Introduction to `lvimp`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
csl: chicago-author-date.csl
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

`lvimp` is a package that computes nonparametric estimates of summaries of a nonparametric variable importance trajectory over time, and provides inference on the true summaries of the variable importance trajectory. The package depends heavily on the [`vimp` package](https://github.com/bdwilliamson/vimp) for estimating and doing inference on the cross-sectional variable importance at each timepoint in the trajectory.

## Installation

A development version of the package may be downloaded and installed from GitHub using the `remotes` package:
```{r remotes-install, eval = FALSE}
pak::pkg_install("bdwilliamson/lvimp")
```

## Quick Start

This section should serve as a quick guide to using the `lvimp` package. We will cover the three main functions for estimating summaries of the longitudinal variable importance trajectory using simulated data.

First, load the `lvimp` package:
```{r load-lvimp, message = FALSE}
library("lvimp")
```

Next, create some longitudinal data:
```{r gen-data}
set.seed(4747)
p <- 2
n <- 5e4
T <- 3
timepoints <- seq_len(T) - 1
indices <- timepoints + 1
beta_01 <- rep(1, T)
beta_02 <- 1 + timepoints / 4
beta_0 <- lapply(as.list(seq_len(T)), function(t) {
  matrix(c(beta_01[t], beta_02[t]))
})
# generate 2 covariates
x <- lapply(as.list(1:T), function(t) as.data.frame(replicate(p, stats::rnorm(n, 0, 1))))
# apply the function to the x's
y <- lapply(as.list(1:T), function(t) as.matrix(x[[t]]) %*% beta_0[[t]] + rnorm(n, 0, 1))
```

In this scenario, there are three timepoints at which data are collected. The above code block creates a list `x` containing 3 matrices, each with 2 columns and `n` rows; and a list `y` containing three vectors of length `n`. Here, `x` contains the covariates of interest and `y` contains the outcomes of interest.

Next, we use the `vimp` package to estimate the importance of variable 1 relative to variable 2 for predicting $Y$ at each timepoint:
```{r cross-sectional-vim}
library("vimp")
library("SuperLearner")
set.seed(1234)
# in this case, glm is correctly specified (so only use one learner to speed things up)
vim_list_1 <- lapply(as.list(1:T), function(t) {
  vimp::cv_vim(Y = y[[t]], X = x[[t]], indx = 1, V = 10, type = "r_squared",
               SL.library = c("SL.glm"))
})
```

Finally, there are three available summaries in `lvimp`:
* The average variable importance over a contiguous subset of the time series (`lvim_average`)
* The linear trend in variable importance over a contiguous subset of the time series (`lvim_trend`)
* The area under the variable importance trajectory curve over a contiguous subset of the time series (`lvim_autc`)

We now estimate and do inference on these three summary measures:
```{r est-lvim}
# set up an lvim object
lvim_obj <- lvim(vim_list_1, timepoints = 1:3)
# obtain the average
est_lvim <- lvim_average(lvim_obj, indices = 1:3)
# add on the linear trend
est_lvim <- lvim_trend(est_lvim, indices = 1:3)
# add on the AUTC based on a piecewise linear trajectory
est_lvim <- lvim_autc(est_lvim, indices = 1:3)
# inspect the estimates
est_lvim
```