--- title: "Introduction to `broom.mixed`" author: "Ben Bolker" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{introduction to broom.mixed} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE, message=FALSE} library(knitr) ``` # Introduction `broom.mixed` is a spinoff of the [broom package](https://CRAN.R-project.org/package=broom). The goal of `broom` is to bring the modeling process into a "tidy"(TM) workflow, in particular by providing standardized verbs that provide information on - `tidy`: estimates, standard errors, confidence intervals, etc. - `augment`: residuals, fitted values, influence measures, etc. - `glance`: whole-model summaries: AIC, R-squared, etc. `broom.mixed` aims to provide these methods for as many packages and model types in the R ecosystem as possible. These methods have been separated from those in the main `broom` package because there are issues that need to be dealt with for these models (e.g. different types of parameters: fixed, random-effects parameters, conditional modes/BLUPs of random effects, etc.) that are not especially relevant to the broader universe of models that `broom` deals with. # Mixed-model-specific issues ## Terminology - the upper-level parameters that describe the distribution of random variables (variance, covariance, precision, standard deviation, or correlation) are called *random-effect parameters* (`ran_pars` in the `effects` argument when tidying) - the values that describe the deviation of the observations in a group level from the population-level effect (which could be posterior means or medians, conditional modes, or BLUPs depending on the model) are called *random-effect values* (`ran_vals` in the `effects` argument when tidying) - the parameters that describe the population-level effects of (categorical and continuous) predictor variables are called *fixed effects* (`fixed` in the `effects` argument when tidying) - the categorical variable (factor) that identifies which group or cluster an observation belongs to is called a *grouping variable* (`group` column in `tidy()` output) - the particular level of a factor that specifies which level of the grouping variable an observation belongs to is called a *group level* (`level` column in `tidy()` output) - the categorical or continuous predictor variables that control the expected value (i.e., enter into the linear predictor for some part of the model) are called *terms* (`term` column in `tidy()` output); note that unlike in base `broom`, **the term column may have duplicated values**, because the same term may enter multiple model components (e.g. zero-inflated and conditional models; models for more than one response; fixed effects and random effects) ## Time-consuming computations Some kinds of computations needed for mixed model summaries are computationally expensive, e.g. likelihood profiles or parametric bootstrapping. In this case `broom.mixed` may offer an option for passing a pre-computed object to `tidy()`, eg. the `profile` argument in the `tidy.merMod` (`lmer`/`glmer`) method. # Related packages There are many, many things one might want to do with a fitted model, and `broom.mixed` can only do a few of them. - `emmeans` - `multcomp` - `car` - `afex` - `sjStats`/`sjPlots` - `rockchalk` ## huxtable + broom.mixed ## dotwhisker + broom.mixed `dotwhisker` is a convenient platform for creating dot-whisker plots - either directly from models or lists of models (`tidy()` methods are automatically called to convert the models to a tidy format), or from the (possibly post-processed) output of a `tidy()` call. There are a couple of caveats and issues to be aware of when using `dotwhisker` in conjunction with `broom.mixed`, however. 1. For fixed effects, the `group` value is set to `NA`: in the current CRAN version (0.5.0), an unfortunate `na.omit()` within the `dwplot` code will eliminate all of the fixed effects unless you drop this column before passing the results to `dwplot` (this [has been fixed](https://github.com/fsolt/dotwhisker/commit/e014e8dba95181dfcf5a68964cd7fdeb844e97cd) in the current GitHub version, which you can install with `devtools::install_github("fsolt/dotwhisker")`). 2. In `broom.mixed` output, it is fairly common for a single tidied model to have duplicated entries in the `term` column (e.g. effects that appear in both the conditional and the zero-inflated model, or intercept standard deviations for several different grouping variables). `dotwhisker::dwplot` takes this as evidence that it has been handed a tidied object containing the results from several different models, and asks for a `model` column that will distinguish the non-unique terms. There are (at least) two strategies you can take: - `dwplot(list(fitted_model))` will plot all of the non-unique values together - `tidy(fitted_model) %>% tidyr::unite(term, group, term)` will create a new `term` column that's the combination of the `group` and `term` columns (which will disambiguate random-effect terms from different grouping variables); `unite(term, component, term)` will disambiguate conditional and zero-inflation parameters. The code below shows a slightly more complicated (but prettier) approach. (Some sort of `disambiguate_terms()` function could be added in a future version of the package ...) ```{r dwplot1,message=FALSE,warning=FALSE} library(dplyr) library(tidyr) require(rstan) ## workaround for r-devel problem library(broom.mixed) if (require("brms") && require("dotwhisker") && require("ggplot2")) { L <- load(system.file("extdata", "brms_example.rda", package="broom.mixed")) gg0 <- (tidy(brms_crossedRE) ## disambiguate %>% mutate(term=ifelse(grepl("sd__(Int",term,fixed=TRUE), paste(group,term,sep="."), term)) %>% dwplot ) gg0 + geom_vline(xintercept=0,lty=2) } ``` # Capabilities Automatically retrieve table of available methods: ```{r get_methods} get_methods() ``` Manually compiled list of capabilities (possibly out of date): ```{r capabilities, results="asis",echo=FALSE, message=FALSE} cc <- read.csv(system.file("capabilities.csv", package="broom.mixed")) if (require("pander")) { pander::pander(cc,split.tables=Inf) } ```