---
title: "Copula Modeling"
output: rmarkdown::html_vignette
bibliography: copula.bib
vignette: >
  %\VignetteIndexEntry{Copula Modeling}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  bibliography: copula.bib
---

This vignette shows how to model multivariate distributions with 
[copulas](https://en.wikipedia.org/wiki/Copula_(probability_theory))
using `univariateML` and the `copula` package. 

A **copula** is a function describing the dependency among a set of 
one-dimensional distributions. If both the marginal distributions and the copula
is known, the entire multivariate distribution is known too.

Suppose we look at a multivariate distribution as the pair of a copula and its marginals.
Then a natural model selection method is to 

1. Select the marginal distributions using the [AIC](https://en.wikipedia.org/wiki/Akaike_information_criterion).
2. Select a copula using the marginal data transformed to the unit interval, again using the AIC.

This two-step procedure is commonly used due to its simplicity. The procedure must 
be carried out this order since the marginal data cannot be transformed to the
unit interval unless we know the marginal distributions. 

The `univariateML` can be used for task 1, while the `copula` package can be used to do task 2.

## Abalone data

The `abalone` data set is included in this package. It consists of $9$ physical 
measurements of $4177$ sea snails.
```{r abalone}
library("univariateML")
head(abalone)
```

Following @ko2019focused we will take a look at four measurements of the abalones, 
namely `diameter`, `height`, `shell_weight` and `age`. The variable `age` is not 
present in the `abalone` data, but is defined as `age = rings + 1.5`. Moreover, 
there are two outliers in the `height` data at at $1.13$ and $0.52$. We will 
remove these outliers and all columns we don't need in the following.

```{r, make_data, fig.width = 6, fig.height = 5} 
data <- dplyr::filter(abalone, height < 0.5)
data$age <- data$rings + 1.5
data <- data[c("diameter", "height", "shell_weight", "age")]
hist(data$height, main = "Abalone height", xlab = "Height in mm")
```

Let's continue doing step 1. First we must decide on a set of models to try out.
```{r, models}
models <- c(
  "gumbel", "laplace", "logis", "norm", "exp", "gamma",
  "invgamma", "invgauss", "invweibull", "llogis", "lnorm",
  "rayleigh", "weibull", "lgamma", "pareto", "beta", "kumar",
  "logitnorm"
)
length(models)
```

Optionally, we can use all implemented models with
```{r, all_models}
univariateML_models
```

The next step is to fit all models, compute the AIC, and select the best model.
This is exactly what `model_select()` does.

```{r, margin_select}
margin_fits <- lapply(data, model_select, models = models, criterion = "aic")
```

Now we use the `fitCopula` from the package `copula` on the transformed margins of `abalone`.

We will examine two elliptical copulas and three Archimedean copulas. The elliptical 
copulas are the Gaussian copula and the *t*-copula, while the Archimedean copulas
are the Joe copula, the Clayton copula, and the Gumbel copula.


```{r, AIC_copula, warning = FALSE, cache = TRUE}
# Transform the marginals to the unit interval.
y <- sapply(seq_along(data), function(j) pml(data[[j]], margin_fits[[j]]))

# The copulas described above.
copulas <- list(
  normal = copula::normalCopula(dim = 4, dispstr = "un"),
  t = copula::tCopula(dim = 4, dispstr = "un"),
  joe = copula::joeCopula(dim = 4),
  clayton = copula::claytonCopula(dim = 4),
  gumbel = copula::gumbelCopula(dim = 4)
)

fits <- sapply(
  copulas,
  function(x) copula::fitCopula(x, data = y, method = "mpl")
)

sapply(fits, AIC)
```

The *t*-copula is the clear winner of the AIC competition. The Archimedean 
copulas perform particularly poorly.

Hence our final model is the *t*-copula with Kumaraswamy (`mlkumar`) marginal distribution for `diameter`, normal marginal distribution for `height`, Weibull marginal distribution for `shell_weight`, and log normal marginal distribution for `age`.

## References