---
title: "Approximate Case Influence Using Scores and Casewise Likelihood"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Approximate Case Influence Using Scores and Casewise Likelihood}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette explains the computational shortcut implemented in [semfindr](https://sfcheung.github.io/semfindr/) to approximate casewise influence. The approximation is most beneficial when the sample size *N* is large: this is when the approximation is most accurate and when the cost of refitting the model *N* times is highest.

```{r dat}
library(semfindr)
dat <- pa_dat
```

```{r fit}
library(lavaan)
mod <-
"
m1 ~ iv1 + iv2
dv ~ m1
"
fit <- sem(mod, dat)
```

```{r fit_rerun}
# Cache the leave-one-out reruns so repeated builds are fast
if (file.exists("semfindr_fit_rerun.RDS")) {
  fit_rerun <- readRDS("semfindr_fit_rerun.RDS")
} else {
  fit_rerun <- lavaan_rerun(fit)
  saveRDS(fit_rerun, "semfindr_fit_rerun.RDS")
}
```

## Using Scores to Approximate Case Influence

`lavaan` provides the handy `lavScores()` function to evaluate
$$s_i(\theta_m) = \frac{\partial \ell_i(\theta)}{\partial \theta_m}$$
for observation $i$, where $\ell_i(\theta)$ denotes the casewise loglikelihood function and $\theta_m$ is the $m$th model parameter.

For example,

```{r lav-scores-head}
head(lavScores(fit)[ , 1, drop = FALSE])
```

indicates the partial derivative of the casewise loglikelihood with respect to the parameter `m1~iv1` for each of the first six observations. Because the partial derivatives sum to zero across all observations at the full-sample maximum likelihood estimate $\hat \theta_m$ (i.e., the gradient of the full-data loglikelihood is 0 at the estimates), $-s_i(\hat \theta_m)$ can be used as an estimate of the partial derivative of the loglikelihood at $\hat \theta_m$ **for the sample without observation $i$**. This information can be used to approximate the maximum likelihood estimate of $\theta_m$ when case $i$ is dropped, denoted by $\hat \theta_{m(-i)}$.
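
A quick numerical check: the casewise scores indeed sum to (numerically) zero across observations at the full-sample estimates:

```{r score-check}
# The total gradient is zero at the maximum likelihood estimates,
# so the casewise scores sum to (numerically) zero column by column
colSums(lavScores(fit))
```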

The second-order Taylor series expansion can be used to approximate the parameter vector estimate with an observation deleted, $\hat \theta_{(-i)}$, as in the iterative [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization). Specifically,

$$\hat \theta_{(-i)} \approx \hat \theta - \frac{N}{N - 1}V(\hat \theta) \nabla \ell_i(\hat \theta),$$

or equivalently

$$\hat \theta - \hat \theta_{(-i)} \approx \frac{N}{N - 1}V(\hat \theta) \nabla \ell_i(\hat \theta),$$

where $\nabla \ell_i(\hat \theta)$ is the gradient vector of the casewise loglikelihood with respect to the parameters (i.e., the score) and $V(\hat \theta)$ is the inverse of the information matrix, the asymptotic covariance matrix of the estimates. The $N / (N - 1)$ term adjusts for the decrease in sample size (this adjustment is trivial in large samples). This procedure should be the same as equation (4) of [Tanaka et al. (1991)](https://doi.org/10.1080/03610929108830742) (p. 3807) and is related to the one-step approximation described by [Cook and Weisberg (1982)](https://conservancy.umn.edu/handle/11299/37076) (p. 182).
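
To make the formula concrete, here is a minimal sketch of the one-step update for observation 1, assuming that `vcov()` returns $V(\hat \theta)$ for this fit. This is essentially what `est_change_raw_approx()`, introduced below, computes for every observation (its implementation may differ in details such as the sign convention used for the change):

```{r one-step-manual}
n <- nobs(fit)
scores <- lavScores(fit)   # casewise gradients at the full-sample estimates
v_hat <- vcov(fit)         # assumed to equal V(theta_hat)
# One-step approximation of theta_hat minus theta_hat without observation 1
drop((n / (n - 1)) * v_hat %*% scores[1, ])
```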

### Comparison

The approximation is implemented in the `est_change_raw_approx()` function:

```{r fit_est_change_approx}
fit_est_change_approx <- est_change_raw_approx(fit)
fit_est_change_approx
```

Here is a comparison between the approximation from `semfindr::est_change_raw_approx()` and the exact leave-one-out results from `semfindr::est_change_raw()`:

```{r compare-est-change}
# From semfindr
fit_est_change_raw <- est_change_raw(fit_rerun)
# Plot the differences
library(ggplot2)
tmp1 <- as.vector(t(as.matrix(fit_est_change_raw)))
tmp2 <- as.vector(t(as.matrix(fit_est_change_approx)))
est_change_df <- data.frame(param = rep(colnames(fit_est_change_raw),
                                        nrow(fit_est_change_raw)),
                            est_change = tmp1,
                            est_change_approx = tmp2)
ggplot(est_change_df, aes(x = est_change, y = est_change_approx)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(size = 0.8, alpha = 0.5) +
  facet_wrap(~ param) +
  coord_fixed()
```

The results are pretty similar.
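
As a quick numerical summary of the agreement (assuming both functions return the cases and parameters in the same order, as the plot suggests):

```{r max-diff}
# Largest absolute difference between exact and approximate changes
max(abs(as.matrix(fit_est_change_raw) - as.matrix(fit_est_change_approx)))
```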

### Generalized Cook's distance (*gCD*)

We can use the approximate parameter changes to approximate the *gCD* (see also Tanaka et al., 1991, equation 13, p. 3811).
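
In matrix form, with $\Delta_i = \hat \theta - \hat \theta_{(-i)}$ denoting the approximate change for case $i$, $I(\hat \theta)$ the information matrix, and $N$ the sample size, the quantity computed below is

$$gCD_i \approx (N - 1)\, \Delta_i^{\top} I(\hat \theta)\, \Delta_i.$$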

```{r approx-gcd}
# Information matrix (the negative of the expected Hessian)
information_fit <- lavInspect(fit, what = "information")
# Shortcut for computing the casewise quadratic forms without a loop
# (https://stackoverflow.com/questions/27157127/efficient-way-of-calculating-quadratic-forms-avoid-for-loops)
gcd_approx <- (nobs(fit) - 1) * rowSums(
  (fit_est_change_approx %*% information_fit) * fit_est_change_approx
)
```

This is implemented in the `est_change_approx()` function:

```{r est_change_approx-gcd}
fit_est_change_approx <- est_change_approx(fit)
fit_est_change_approx
```
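
The values computed manually above should match the `gcd_approx` column (small numerical differences are possible, depending on the implementation):

```{r gcd-manual-check}
head(cbind(manual = gcd_approx,
           from_est_change_approx = fit_est_change_approx[, "gcd_approx"]))
```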

```{r compare-approx-gcd}
# Compare to exact computation
fit_est_change <- est_change(fit_rerun)
# Plot
gcd_df <- data.frame(
  gcd_exact = fit_est_change[ , "gcd"],
  gcd_approx = fit_est_change_approx[ , "gcd_approx"]
)
ggplot(gcd_df, aes(x = gcd_exact, y = gcd_approx)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point() +
  coord_fixed()
```

The approximation tends to underestimate the exact *gCD*, but the rank ordering is almost identical, as the Spearman correlation below shows. This is also discussed in Tanaka et al. (1991), who proposed a correction that applies the one-step approximation a second time (currently not implemented because it would require recomputing the scores at the updated parameter values).

```{r cor-gcd}
cor(gcd_df, method = "spearman")
```

## Approximate Change in Fit

The casewise loglikelihood---the contribution of an observation to the loglikelihood of the sample---can be computed in `lavaan`. Its negative approximates the change in the loglikelihood when that observation is deleted:

```{r lli}
lli <- lavInspect(fit, what = "loglik.casewise")
head(lli)
```

Here, deleting observation 1 changes $\ell(\hat \theta)$ by `r abs(round(lli[1], 2))`, because its casewise contribution is removed from the total. The result should approximate $\ell_{(-i)}(\hat \theta_{(-i)})$, the maximized loglikelihood without observation $i$, as long as $\hat \theta_{(-i)}$ is not too different from $\hat \theta$. Here's a comparison:

```{r compare-lli}
# Predicted ll without observation 1
fit@loglik$loglik - lli[1]
# Actual ll without observation 1
fit_no1 <- sem(mod, dat[-1, ])
fit_no1@loglik$loglik
```

They are pretty close. To approximate the change in $\chi^2$, as well as other $\chi^2$-based fit indices, we can use the `fit_measures_change_approx()` function:

```{r chisq_i_approx}
chisq_i_approx <- fit_measures_change_approx(fit)
# Compare to the actual chisq when dropping observation 1
c(predict = chisq_i_approx[1, "chisq"] + fitmeasures(fit, "chisq"),
  actual = fitmeasures(fit_no1, "chisq"))
```

### Comparing exact and approximate changes in fit indices

Change in $\chi^2$

```{r plot-change-chisq}
# Exact measure from semfindr
out <- fit_measures_change(fit_rerun)
# Plot
chisq_change_df <- data.frame(
  chisq_change = out[ , "chisq"],
  chisq_change_approx = chisq_i_approx[ , "chisq"]
)
ggplot(chisq_change_df, aes(x = chisq_change, y = chisq_change_approx)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point() +
  coord_fixed()
```

Change in RMSEA

```{r plot-change-rmsea}
# Plot
rmsea_change_df <- data.frame(
  rmsea_change = out[ , "rmsea"],
  rmsea_change_approx = chisq_i_approx[ , "rmsea"]
)
ggplot(rmsea_change_df, aes(x = rmsea_change, y = rmsea_change_approx)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point() +
  coord_fixed()
```

The values align reasonably well along the 45-degree line.
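
As with *gCD*, the agreement can be quantified:

```{r cor-fit-change}
cor(chisq_change_df)
cor(rmsea_change_df)
```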

# Limitations

- The approximate approach is tested only for models
  fitted by maximum likelihood (ML) with normal theory
  standard errors (the default).

- The approximate approach does not yet support multilevel
  models.

The `lavaan` object will be checked by `approx_check()` to see if it is
supported. If not, an error will be raised.

# References

Cook, R. D., & Weisberg, S. (1982). *Residuals and influence in regression.* New York: Chapman and Hall. https://conservancy.umn.edu/handle/11299/37076

Tanaka, Y., Watadani, S., & Ho Moon, S. (1991). Influence in covariance structure analysis: With an application to confirmatory factor analysis. *Communications in Statistics - Theory and Methods, 20*(12), 3805–3821. https://doi.org/10.1080/03610929108830742