--- title: "Post-run diagnostics and analyses" date: "2025-03-07" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteIndexEntry{Post-run diagnostics and analyses} output: rmarkdown::html_vignette --- In this tutorial, we assume that you successfully ran an MCMC on a network model, and that it is now time to have a critical look at the output of the run. This tutorial will present how to check that the MCMC run went fine, but it will explain only very briefly how to check that the model is compatible with the observed data. More details about this important step is presented in the next vignette about [posterior predictive checks](tutorial-100-posterior-predictive-checks.html). This vignette is using the MCMC run from the [Quick start tutorial](tutorial-010-quick-start.html). Please go back and run the code of that vignette to generate the MCMC data if you haven't already! ``` r # Load the tidyverse package for easier data manipulation library(tidyverse) ``` ### Reminder: what is the result format returned by `run_mcmc()`?
In this case, the chains have converged towards the same region, and the mixing
looks good for all of them. There is no obvious problem visible in those traces.
### Gelman and Rubin's convergence diagnostic
It can be useful to have a more formal test of the convergence of the
chains. The Gelman and Rubin's convergence diagnostic is implemented in the
`coda` package. We can run it with the `coda::gelman.diag()` function. See the
coda documentation `?gelman.diag` for more details about this diagnostic.
Let's have a look at the diagnostic for our chains:
``` r
library(coda)
run %>% gelman.diag()
```
```
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## eta 1.00 1.00
## lambda_algae 1.01 1.02
## lambda_daphnia 1.01 1.02
## lambda_NH4 1.01 1.02
## upsilon_algae_to_daphnia 1.02 1.03
## upsilon_daphnia_to_NH4 1.00 1.00
## upsilon_NH4_to_algae 1.01 1.02
## zeta 1.00 1.01
##
## Multivariate psrf
##
## 1.02
```
The diagnostic values should be very, very close to 1: it looks good in this
case!
If some values were e.g. $>1.05$, this would already be enough to cause us to
wonder about the sampling quality.
### Predicted trajectories
In order to check the quality of the model fit, the consistency between the parameter posteriors and the observed data can be checked by plotting the credible envelopes for the estimated trajectories along with the observed data points. This is called a posterior predictive check and is very important to check that the model can actually predict the observed data reasonably well. If the observed data cannot be satisfactorily predicted from the fitted model, then our model is not a good model of the data!
To do a posterior predictive check, the first step is to generate predictions for the model based on the MCMC posteriors with `predict()`:
``` r
# From the Quick Start tutorial:
# 'm' is the network model we used when calling 'run <- run_mcmc(m, iter = 1000)'
predictions <- predict(m, run, probs = 0.95)
```
We can then visualize the predictions along with the observations with the `plot()` method:
``` r
plot(predictions)
```
This plot enables to compare both the **size** and the **proportion** observations with the predictions.
## Post-run analyses
### Parameter estimates
The quickest way to get parameter estimates is to use the `summary()` function
on the posterior:
``` r
run %>% summary()
```
```
##
## Iterations = 501:1000
## Thinning interval = 1
## Number of chains = 4
## Sample size per chain = 500
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## eta 0.12751 0.049964 0.0011172 0.0019548
## lambda_algae 0.10723 0.067599 0.0015116 0.0021817
## lambda_daphnia 0.03647 0.042346 0.0009469 0.0015642
## lambda_NH4 0.09269 0.068981 0.0015425 0.0027648
## upsilon_algae_to_daphnia 0.07785 0.023643 0.0005287 0.0009433
## upsilon_daphnia_to_NH4 0.04894 0.007169 0.0001603 0.0002337
## upsilon_NH4_to_algae 0.34347 0.045724 0.0010224 0.0017310
## zeta 0.43612 0.237678 0.0053146 0.0099599
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## eta 0.0654264 0.094176 0.11691 0.14817 0.25969
## lambda_algae 0.0085454 0.057541 0.09705 0.14604 0.26499
## lambda_daphnia 0.0007375 0.009573 0.02209 0.04620 0.15846
## lambda_NH4 0.0034180 0.041621 0.07744 0.13014 0.26348
## upsilon_algae_to_daphnia 0.0451278 0.062348 0.07370 0.08860 0.13730
## upsilon_daphnia_to_NH4 0.0361709 0.044109 0.04849 0.05325 0.06439
## upsilon_NH4_to_algae 0.2609275 0.314829 0.34115 0.37062 0.43671
## zeta 0.1844365 0.286386 0.36440 0.51207 1.09344
```
If you need to store those values, for example to plot them, you can assign the
output of `summary()` to an object:
``` r
estimates <- run %>% summary()
names(estimates)
```
```
## [1] "statistics" "quantiles" "start" "end" "thin" "nchain"
```
The means and standard deviations are accessible in `$statistics`:
``` r
estimates$statistics
```
```
## Mean SD Naive SE Time-series SE
## eta 0.12751257 0.049963905 0.0011172269 0.0019547723
## lambda_algae 0.10723050 0.067599025 0.0015115601 0.0021817165
## lambda_daphnia 0.03647363 0.042345690 0.0009468784 0.0015642272
## lambda_NH4 0.09269130 0.068980569 0.0015424524 0.0027648434
## upsilon_algae_to_daphnia 0.07784906 0.023642981 0.0005286731 0.0009432531
## upsilon_daphnia_to_NH4 0.04893872 0.007169036 0.0001603045 0.0002336529
## upsilon_NH4_to_algae 0.34347224 0.045723698 0.0010224130 0.0017310105
## zeta 0.43611523 0.237677637 0.0053146335 0.0099599011
```
and the quantiles are in `$quantiles`:
``` r
estimates$quantiles
```
```
## 2.5% 25% 50% 75% 97.5%
## eta 0.0654264365 0.094176064 0.11691344 0.14816691 0.25969422
## lambda_algae 0.0085454389 0.057540842 0.09704578 0.14603524 0.26498951
## lambda_daphnia 0.0007375003 0.009573074 0.02208908 0.04620389 0.15846417
## lambda_NH4 0.0034180067 0.041620929 0.07743919 0.13013928 0.26347716
## upsilon_algae_to_daphnia 0.0451277873 0.062348037 0.07370339 0.08859513 0.13729757
## upsilon_daphnia_to_NH4 0.0361709218 0.044109217 0.04849449 0.05324976 0.06439118
## upsilon_NH4_to_algae 0.2609274862 0.314829360 0.34115256 0.37062017 0.43670569
## zeta 0.1844365414 0.286385629 0.36440171 0.51207060 1.09343843
```
### Parameter correlations
The dependencies between your model parameters might be of interest to you. If
you would like to analyse the correlations between parameters during the MCMC
run, you can use a few ready-made functions to get a quick overview of the
correlation structure.
The `isotracer` package comes with the minimalist function `mcmc_heatmap()` to
draw the strength of parameter correlations:
``` r
mcmc_heatmap(run)
```
But of course you could use other functions provided by other packages, such as:
- `ggmcmc` package
+ `ggs_crosscorrelation()`
+ `ggs_pairs()`
- `bayesplot` package
+ `mcmc_pairs()`
### Extracting parameters, trajectories and flows
If you are interested in getting detailed tables containing all the samples of the parameter posteriors, you can use the `tidy_mcmc()` function:
``` r
tidy_mcmc(run)
```
```
## # A tibble: 2,000 × 3
## mcmc.chain mcmc.iteration mcmc.parameters
##
Finally, if what you are interested in are not the trajectories per se but the actual flows of nutrient during the experiment, you can use the `tidy_flows()` function to extract flows in a similar way:
``` r
#' Again, note that we also provide the original network model `m`
tf <- tidy_flows(m, run, n = 200)
tf
```
```
## # A tibble: 200 × 4
## mcmc.chain mcmc.iteration mcmc.parameters flows
## *