---
title: "micsr objects"
date: today
date-format: long
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{micsr objects}
  %\VignetteEngine{quarto::pdf}
  %\VignetteEncoding{UTF-8}
---

**micsr** provides a special `micsr` class for fitted models, that is
particularly convenient for models fitted by maximum likelihood.

## Estimation

With **micsr**, models are estimated using the `micsr::maximize`
function which is an interface for three optimization routines, selected
using the `opt` argument:

- `"bfgs"` uses the `stats::optim` function with the `method` argument
  set to `"BFGS"`,
- `"nlm"` uses the `stats::nlm` function which uses the
  Newton-Ralphson algorithm,
- `"newton"` uses the `micsr::newton` function which is a fast
  implementation of the Newton-Ralphson algorithm.
  
For simple log-likelihood function, `"newton"` is the fastest way to
estimate the model, otherwise, `"bfgs"` is the safer. 

The first argument is the function that returns the log-likelihood of
the model and further argument of this function can be passed using
the `...` argument of `maximize`:

- `gradient`, `hessian` and `information` are boolean which enable to
  compute the gradient, the hessian and the information matrix that
  are returned as attributes of the result,
- `sum` is a boolean: if `FALSE` the result is a vector containing the
  individual contributions to the log-likelihood and the `gradient`
  attribute is a matrix containing the individual contributions to
  elements of the gradients,
- `opposite` is a boolean: if `TRUE`, the opposite of the function,
  the gradient and the hessian are returned; this is usefull if the
  optimization function performs a minimization,
- `y`, `X` and `weights` are the vector of response, the matrix of
  covariates and a vector of weights.

Several functions are provided to estimate models already available in
core **R** (the **stats** package) and recommanded package.

- `binomreg` can be used to estimate the binomial model as an
alternative to `stats::glm` function. An `"identity"` is provided to
estimate the linear probability model and two-part formulas can be
provided to estimate the probit model with instrumental variables. In
this case, several method of estimation are provided selected by the
`method` argument: either `"ml"` for maximum-likelihood, `"twosteps"`
for the two-step estimator and `"minchisq"` for the minimum $\chi^2$
estimator,
- `poisreg` estimates different flavors of count models. The basic use
  of this function estimate the Poisson model as an alternative to the
  `stats::glm` function with `family = poisson(link = 'log')`. Mixing
  distribution can be introduced with the `mixing` argument which is
  `"none"` by default but can be set to `"lognorm"` to get the
  log-normal Poisson model and to `"gamma"` to get the NegBin model
  (the `vlink` argument is either set to `"nb1"` or to `"nb2"` to get
  the two flavors of the Negbin model, which can also be estimated
  using the `MASS::glm.nb` function),
- `weibreg` estimates the Weibull model that is normally estimated
  using the `survival::survreg` function with the default `"weibull"`
  `dist` argument. Using the `model` argument, one can use the
  "accelerature failure time" (`"aft"`) or the "proportional hazard"
  parametrization (`"ph"`). Moreover, the most used mixing survival
  model is obtained by setting the `mixing` argument to `TRUE` (in
  this case, the gamma mixing distribution is used).
- `ordreg` estimates the ordered model, also called, for the logit
  link the "proportional odds logistic regression). A link argument
  enables to estimate the model with a probit, logit or cloglog
  link. This model can be estimated using the `MASS::polr` function
  but `ordreg` deals with censored responses.
- `tobit1` estimates models for truncated responses, using either
  censored or truncated samples (using the `sample` argument); in the
  first case we get the so-called tobit or tobit1 model, in the second
  case the truncated model. The `left` and `right` arguments can be
  set to indicate the truncation points (the convenient default is 0
  for the former and $+\infty$ for the latter). Several methods of
  estimations are provided and selected using the `method` arguments:
  `"ml"` for maximum likelihood, `"lm"` for the (biased) linear
  estimator, `"twostep"` for the two-steps consistent estimator and
  `"trimmed"` for the symetric trimmed estimator. If a two-part
  formula is provided, by default, an instrumental variable is
  performed, either using maximum likelihood or the minimum $\chi ^ 2$
  estimator. If the `scedas` argument is not null, but set either to
  `"exp"` or `"pnorm"`, the heterosckedastic tobit model is
  estimated. 

All these functions have in common with `stats::lm` their first
arguments: `formula`, `data`, `subset`, `weights`, `na.action`,
`offset` and `contrasts`. Supplementary arguments include `opt`,
`maxit` and `trace` to select the optimization method with a given
maximum number of iterations and level of printing, `start` to
indicate a vector of starting values and `check_gradient` to check the
correspondance between the analytical and the numerical gradients.

The returned objects, of class `micsr` includes:

- `coefficients`: the vector of coefficients,
- `model`: the model frame,
- `terms`: model terms,
- `value`: a vector of individual contribution to the log-likelihood,
- `gradient`: the matrix of individual contributions to the gradient,
  also called the estimating function,
- `hessian`: the hessian
- `info`: the estimation of the information matrix
- `fitted.values`: a vector of fitted values,
- `linear.predictors`: a vector of linear predictors, i.e.,
  $X\hat{\beta}$
- `logLik`: a vector of length three containing the log-likelihood for
  the proposed, the saturated and the null models,
- `tests`: a vector of length three containing the three tests (the
  Wald, the score and the likelihood ratio tests) that "all the
  coefficients except the intercept are zero" (the equivalent of the
  $F$ test for the linear model); these statistics can be used to
  compute different flavors of the coefficient of determination,
- `df.residual`: the number of fitted coefficients,
- `npar`: a named numeric vector, the names being the name of the
  group of coefficients and the value the number of parameter in the
  group,
- `est_method`: a character indicating the method of estimation, i.e.,
  `ml` or `twostep`,
- `call`, `na.action`, `weights`, `offset`, `contrasts` and `xlevels`
(as in `lm` objects),
- `family` a family object (as in `glm`),
- `check_gradient`: if the `check_gradient` argument is `TRUE`, the
  result is a list containing, for the gradient and the hessian, the
  maximal absolute value of the difference between the analytical and
  the numerical gradient and hessian; attribute `"gradient"` is a
  matrix containing the numerical and the analytical gradient and
  attributes `"anal_hess"` and `"num_hess"` contain the analytical and
  the numerical hessians.


## Subsets of coefficients

Often, especially for advanced models, the whole set of fitted
parameters can be splited in several groups. For example, while fitted
by maximum likelihood a limited dependent model (probit or tobit):

```{r }
#| warning: false
library(micsr)
bank_msq <- ivldv(federiv ~ eqrat + optval +  mktbk +
                      perfor + dealdum | . - eqrat - optval +
                      no_emp + no_subs + no_off,
                  data = federiv, method = "ml")
```

fitted coefficients belongs to 4 different groups:

- the coefficients associated with covariates (the coefficients of
  main interest,
- the coefficients of the residuals of the endogenous variables,
- the coefficients of the instruments,
- the parameters of the cholesky decomposition of the covariance
  matrix for the endogenous variables.

This information is stored in the `rpar` element of the `micsr`
object:

```{r }
bank_msq$npar
```
It's a named list of integers containing the number of coefficients for
each groups. It may have a `"default"` attribute which indicates which
subset should be selected by default. The selection of a subset of coefficients is
performed by the `select_coef` function:

```{r }
#| collapse: true
select_coef(bank_msq)
select_coef(bank_msq, subset = c("resid", "chol"))
```

`select_coef` is called inside the `coef` and the `vcov` methods for
`micsr` objects:

```{r }
#| collapse: true
coef(bank_msq)
coef(bank_msq, subset = c("resid", "chol"))
vcov(bank_msq, subset = c("resid"))
```
Another way to select a subset of coefficients is to provide a regular
expression for the `grep` argument. For example, to get all the
coefficients that contains `"Intercept"`:

```{r }
#| collapse: true
vcov(bank_msq, subset = c("all"), grep = "Intercept")
```

The `npar` function extracts the number of the whole set of
coefficients or of specific subsets:

```{r }
#| collapse: true
npar(bank_msq)
npar(bank_msq, subset = c("resid", "chol"))
```

## Estimating function, hessian and individual contribution to the log-likelihood

The estimating function is a $N\times K$ matrix containing the
derivatives of the $N$ elements of the likelihood function with
respect to the K-lenght parameter vector. Its row-sum is the gradient
that should be close to 0 at the optimum. The `sandwich::estfun`
function is a generic that extract this information and several
methods are provided for different objects. Some of these method are
quite tiedous as the matrix is not stored in the object containing the
fitted models. For models of class `micsr`, this matrix is stored in
the `gradient` element and the method is just: `x$gradient`. This
matrix is particularly usefull to compute the variance-covariance
matrix estimator based on the outer-product of the gradient or on
sandwich formula.

The hessian is the $K\times K$ matrix of second-derivatives of the
log-likelihood function with respect with the parameters vector. Its
stored in the `hessian` object of a `micsr` object. In sandwich
estimators of the covariance matrix, the "meat" is $N (- H) ^ {-1}$
and it is how the `meat` method for `micsr` objects is defined.

The information matrix is either the variance of the gradient or the
opposite of the expectation of the hessian. When the analytical
computation of the information matrix possible, it is stored in the
`info` element of the `micsr` object.

The log-likelihood is the sum of $N$ contributions: the individual
contributions is a N-length vector, which is stored as the `value`
element of the `micsr` object. This vector is in particular usefull to
compute Vuong's test.

## Covariance matrix estimation

Three elements of the object can be used to compute different flavors
of the covariance matrix of the coefficients:

- `gradient` to compute the outer-product of the gradient estimator,
- `hessian` to compute the hessian-based estimator,
- `info` to compute the information-based estimator.

These matrix are computed using the `vcov` method, which has a `vcov`
argument that can be set to `"info"`, `"hessian"` or `"opg"`.

```{r }
vcov(bank_msq, subset = "resid", vcov = "opg")
```

The vector of standard errors are often used, in particular in the
table of coefficients. It can be obtained by taking the square root of
the diagonal elements of the covariance matrix, or more simply by
using `micsr::stder`

```{r }
#| collapse: true
sqrt(diag(vcov(bank_msq, subset = "resid")))
stder(bank_msq, subset = "resid")
```

## Goodness of fit measures

For models fitted by maximum likelihood, most of the GOF statistics
are based on the value of the objective function at the optimum. Three
values of the log-likelihood are stored in the `logLik` element of a
`micsr` object:

- the value at the optimum, for the fitted `model`,
- the value for the `saturated` model, ie the model with no degrees of
  freedom,
- the value for the `null` model, ie the model without any covariates.

```{r }
pbt <- binomreg(mode ~ cost + ivtime + ovtime,
                data = mode_choice, link = 'probit')
pbt$logLik
```

Any of these values can be extracted using the `logLik` method, which
has a `type` argument:

```{r }
#| collapse: true
logLik(pbt)
logLik(pbt, type = "model")
logLik(pbt, type = "saturated")
logLik(pbt, type = "null")
```

Comparing fitted models using the value of the log-likelihood is not
relevant because introducing more covariates (even if they are
irrelevant) will necesseraly increases the value of the
log-likelihood. **AIC** and **BIC** are two information measures that
take into account the number of fitted parameters. The formulas are
respectively:

- $- 2 \ln L + k K$
- $- 2 \ln L + K \ln N$

```{r }
#| collapse: true
AIC(pbt)
BIC(pbt)
AIC(pbt, type = "null")
AIC(pbt, k = 5)
```

The deviance is minus twice the difference between a model and the
hypothetical saturated model. For a linear gaussian model, this is the
sum of square residuals. The deviance method for `micsr` object has a
`type` argument equal either to `"model"`(the default) or `"null"`. In the latter
case, we obtain the "null deviance":


```{r }
#| collapse: true
deviance(pbt)
deviance(pbt, type = "null")
```

Finally, the relevance of a proposed model can be addressed using a
test that "all the coefficients except the intercept are zero", which
is the equivalent of the **F** test for the linear regression
model. Three tests can be conducted, either the Wald test, the score
test and the likelihood ratio test. The values of the statistics are
stored in the `test` element of the result:


```{r }
#| collapse: true
pbt$test
```


## Summary

The `summary`  method use the preeciding infrastructure to compute the
usual table of coefficients that contains the estimators, the standard
errors, the z-statistics and the probability values. A subset of
coefficients can be obtained using the `subset` argument and the
covariance matrix is chosen using the `vcov` argument:

```{r }
summary(bank_msq, subset = c("chol", "resid"), vcov = "opg")
```