A quick-start guide to using the runjags package

A quick-start guide to using the runjags package

This document is intended as a short introduction to fitting MCMC models using JAGS and runjags, focussing on learning by example. The intention is to keep this document as brief as possible, but a slightly longer version with more examples is available from here. Additional information and more details are available in the user guide. For detailed documentation of individual functions, see the runjags manual for that function.

Preparation

The runjags packages requires a working installation of JAGS. To test the installation, it is recommended to use the testjags() function:

testjags()
## You are using R version 4.3.2 (2023-10-31; macOS arm64) on a unix
## (macOS arm64) machine, with the X11 GUI
## JAGS version 4.3.1 (universal binary) found successfully using the
## command '/usr/local/bin/jags'
## The rjags package is installed

This should verify that your JAGS installation has been found. The rjags and modeest packages are not strictly required, but are reccomended packages and should be installed where possible.

Running models using run.jags

A recommended introductory text for individuals not familiar with the BUGS language is:

Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D (2012). The BUGS book: A practical introduction to Bayesian analysis. CRC press.

To illustrate how to use run.jags we will look at the Salmonella example from chapter 6.5.2 of this book, with thanks to Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012, for kind permission to reproduce this code. The model below was taken from the WinBUGS example, modified to include a mandatory pair of curly brackets to demarcate the Data and Inits blocks, and adding an optional second Inits block with different starting values for 2 chains:

filestring <- "
Poisson model...

model {
for (i in 1:6) {
for (j in 1:3) {
y[i,j] ~ dpois(mu[i])
}
log(mu[i]) <- alpha + beta*log(x[i] + 10) + gamma*x[i]
}
for (i in 1:6) {
y.pred[i] ~ dpois(mu[i])
}
alpha ~ dnorm(0, 0.0001)
beta ~ dnorm(0, 0.0001)
gamma ~ dnorm(0, 0.0001)
}

Data{
list(y = structure(.Data = c(15,21,29,16,18,21,16,26,33,
27,41,60,33,38,41,20,27,42),
.Dim = c(6, 3)),
x = c(0, 10, 33, 100, 333, 1000))
}

Inits{
list(alpha = 0, beta = 1, gamma = -0.1)
}

Inits{
list(alpha = 10, beta = 0, gamma = 0.1)
}
"

The following call to run.jags reads, compiles, runs and returns the model information along with MCMC samples and summary statistics:

results <- run.jags(filestring, monitor=c('alpha','beta','gamma'))
## Warning: Convergence cannot be assessed with only 1 chain

Note that this model is run from an embedded character vector here for convinience, but run.jags will also accept a file path as the main argument in which case the information will be read from the specified file. There is a print method associated with runjags class objects:

results
## 
## JAGS model summary statistics from 10000 samples (adapt+burnin = 5000):
##                                                                      
##          Lower95     Median     Upper95     Mean         SD      Mode
## alpha     1.7047     2.1616       2.574   2.1562      0.231    2.1752
## beta     0.21774    0.32253     0.44196  0.32358   0.060186   0.31985
## gamma -0.0015233 -0.0010255 -0.00055039 -0.00103 0.00025414 -0.001033
##                                             
##             MCerr MC%ofSD SSeff   AC.10 psrf
## alpha    0.024592    10.6    88 0.82222   --
## beta    0.0066382      11    82 0.84697   --
## gamma 0.000023845     9.4   114 0.62424   --
## 
## Total time taken: 0.6 seconds

This method displays relevant summary statistics - the effective sample size (SSeff) and convergence diagnostic (psrf) is shown for each stochastic variable. It is also advisable to plot the results using the default plot method to assess convergence graphically:

plot(results, layout=c(3,4))

plot of chunk unnamed-chunk-8plot of chunk unnamed-chunk-8

In this case the chains have converged, but there is a large amount of auto-correlation and therefore a small effective sample size, which can be reduced by loading the GLM module in JAGS:

resultswithglm <- run.jags(filestring, monitor=c('alpha','beta','gamma'), modules='glm')
## module glm loaded
resultswithglm
## 
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##                                                                        
##          Lower95     Median    Upper95       Mean         SD       Mode
## alpha     1.7265      2.154     2.5784     2.1517    0.21857     2.1587
## beta     0.21301    0.32538    0.43455    0.32522   0.056711     0.3259
## gamma -0.0015222 -0.0010376 -0.0005682 -0.0010386 0.00024419 -0.0010581
##                                               
##            MCerr MC%ofSD SSeff    AC.10   psrf
## alpha  0.0029457     1.3  5506 0.010253 1.0003
## beta  0.00071886     1.3  6224 0.014285 1.0003
## gamma 2.7986e-06     1.1  7613 0.012903 1.0002
## 
## Total time taken: 0.8 seconds

This second model has a smaller AC.10 (autocorrelation with lag 10) and therefore larger SSeff (effective sample size), so there will be much less Monte Carlo error associated with these estimates comapred to the first model.

Using data and initial values from R

To facilitate running models using data that is already in R, runjags will look for a number of comment tags in the model file to indicate the variables that should be extracted from the R working environment. For example the following model uses N, Y and X as data, and coef, int and precision are given initial values, all of which are taken from R. In addition, the model indicates to monitor coef, int and precision so that no monitor vector is required in the call to run.jags:

# Simulate the data:
set.seed(1)
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)
N <- length(X)

model <- "
	model {
        for(i in 1 : N){ #data# N
        Y[i] ~ dnorm(true.y[i], precision) #data# Y
        true.y[i] <- (coef * X[i]) + int #data# X
	}
  	coef ~ dunif(-1000,1000)
  	int ~ dunif(-1000,1000)
  	precision ~ dexp(1)
  	#inits# coef, int, precision
  	#monitor# coef, int, precision
}"
# A function to return initial values for each chain:
coef <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
int <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
precision <- function(chain) return(switch(chain, "1"= 0.01, "2"= 100))

# Run the simulation:
results <- run.jags(model, n.chains = 2)

The output of this function can be extended using:

results <- extend.jags(results, sample=5000)

This function call takes an additional 5000 samples and combines them with the original simulation. It is also possible to automatically extend a model until it has converged (as assessed by Gelman and Rubin’s convergence diagnostic) using the autorun.jags and autoextend.jags functions, although manual inspection of trace plots is always recommended to ensure chains have really converged. Note also that each of the functions run.jags, extend.jags, autorun.jags and autoextend.jags allows a method argument to be specified, which allows separate chains to be run on parallel processors where available. Full details are provided are in the user guide, but the most used options are ‘rjags’, ‘interrputible’, ‘parallel’ and ‘background’.

Generating a GLMM template

One of the most difficult things about getting started with JAGS or BUGS is the requirement to write the model yourself. Looking at examples is a good way to learn, but it is even easier when the example is tailored towards your specific model. The template.jags function creates a model template file based on an lme4-style formula interface. We can look at an example from the help file for lm:

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
D9 <- data.frame(weight, group)

# The JAGS equivalent:
model <- template.jags(weight ~ group, D9, n.chains=2, family='gaussian')

The model is written to disk by template.jags, and includes all of the data and initial values necessary to run the model. Examining this model file is a good way to learn how the BUGS syntax works, as well as some of the options allowed by the runjags package. To run the models:

JAGS.D9 <- run.jags(model)
lm.D9 <- lm(weight ~ group, data=D9)

And to compare results:

JAGS.D9
## 
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##                                                                       
##                      Lower95   Median Upper95     Mean      SD    Mode
## regression_precision 0.82221   1.9869  3.4249   2.0632 0.68863  1.8932
## intercept             4.5915   5.0302  5.5099   5.0314 0.23137  5.0287
## group_effect[1]            0        0       0        0       0       0
## group_effect[2]      -1.0416 -0.36876 0.26128 -0.36818 0.32934 -0.3778
## deviance              40.197   42.691  48.629   43.393  2.6538  41.601
## resid.sum.sq          8.7293   9.4149  12.161   9.8054   1.223  9.0246
##                                                                 
##                          MCerr MC%ofSD SSeff       AC.10    psrf
## regression_precision 0.0053868     0.8 16342  -0.0035518 0.99995
## intercept            0.0016361     0.7 20000  -0.0062014       1
## group_effect[1]             --      --    --          --      --
## group_effect[2]      0.0023288     0.7 20000  -0.0081939  1.0002
## deviance              0.021566     0.8 15142 -0.00035785       1
## resid.sum.sq         0.0094816     0.8 16638   0.0085924       1
## 
## Model fit assessment:
## DIC = 46.72683  (range between chains: 46.7156 - 46.73806)
## [PED not available from the stored object]
## Estimated effective number of parameters:  pD = 3.33419
## 
## Total time taken: 0.4 seconds
summary(lm.D9)
## 
## Call:
## lm(formula = weight ~ group, data = D9)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4938  0.0685  0.2462  1.3690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
## groupTrt     -0.3710     0.3114  -1.191    0.249    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6964 on 18 degrees of freedom
## Multiple R-squared:  0.07308,	Adjusted R-squared:  0.02158 
## F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

Note that lm reports sigma and JAGS the precision - to make them more comparable we could use a mutate function:

JAGS.D9 <- run.jags(model, mutate=list(prec2sd, 'precision'))

And focus on the results from the summary that we are interested in:

summary(JAGS.D9, vars=c('regression_precision.sd', 'intercept', 'group_effect'))
##                            Lower95     Median   Upper95       Mean        SD
## regression_precision.sd  0.5010718  0.7092006 0.9864847  0.7269795 0.1295457
## intercept                4.5812904  5.0306832 5.4960329  5.0323018 0.2325907
## group_effect[1]          0.0000000  0.0000000 0.0000000  0.0000000 0.0000000
## group_effect[2]         -1.0284244 -0.3747351 0.2510816 -0.3734043 0.3268774
##                               Mode       MCerr MC%ofSD SSeff        AC.10
## regression_precision.sd  0.6878627 0.001030108     0.8 15815 -0.001675477
## intercept                5.0345316 0.001689502     0.7 18953  0.007782636
## group_effect[1]          0.0000000          NA      NA    NA           NA
## group_effect[2]         -0.3628262 0.002226166     0.7 21560  0.004601452
##                             psrf
## regression_precision.sd 1.000038
## intercept               1.000192
## group_effect[1]               NA
## group_effect[2]         1.000334

We can also compare the estimated residuals using for example:

summary(residuals(lm.D9) - residuals(JAGS.D9, output='mean'))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -0.0007255 -0.0007255 -0.0001631 -0.0001631  0.0003994  0.0003994

Model evaluation

The Deviance Information Criterion (DIC) is typically used to compare models fit using JAGS and BUGS, and is displayed as part of the print method shwon above. This can also be extracted from a runjags model using the extract method, along with other information such as the samplers used:

extract(JAGS.D9, what='samplers')
##   Index.No              Sampler                 Node
## 1        1 bugs::ConjugateGamma regression_precision
## 2        2         glm::Generic      group_effect[2]
## 3        2         glm::Generic            intercept

Another possible method to assess the fit of Bayesian models is using leave-one-out cross-validation, or a drop-k study. A utility function is included within runjags to facilitate this - for more details see the help file for the drop.k function or the extended version of this vignette.

Options

There are a large number of global options that can be controlled using the runjags.options function. For an explanation of these see the help file for the relevant function:

?runjags.options

Further information

There is a sourceforge repository associated with the runjags package, which contains additional versions of the package and internal JAGS module. This page also contains the extended version of this vignette which contains additional material and examples not found in the version hosted on CRAN.

This package is in constant development, so if you find any issues or the code does not behave as expected please either email the pacckage developer or visit the bug-tracker page to report the problem. General comments and suggestions on how you find using this package are also very welcome.

Citation

Devleopment of this R package has consumed a great deal of time and energy, and is a somewhat peripheral activity in my current academic position. In order to justify continued devleopment of the software I need to be able to demonstrate that it is being used within the general research community. So, if you use this software as part of a publication, please remember to cite the software as follows:

citation('runjags')
## To cite runjags in publications use:
## 
##   Matthew J. Denwood (2016). runjags: An R Package Providing Interface
##   Utilities, Model Templates, Parallel Computing Methods and Additional
##   Distributions for MCMC Models in JAGS. Journal of Statistical
##   Software, 71(9), 1-25. doi:10.18637/jss.v071.i09
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {{runjags}: An {R} Package Providing Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for {MCMC} Models in {JAGS}},
##     author = {Matthew J. Denwood},
##     journal = {Journal of Statistical Software},
##     year = {2016},
##     volume = {71},
##     number = {9},
##     pages = {1--25},
##     doi = {10.18637/jss.v071.i09},
##   }

It is also important to cite JAGS and R!

This vignette was built with:

sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Copenhagen
## tzcode source: internal
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rjags_4-14      coda_0.19-4     runjags_2.2.2-4
## 
## loaded via a namespace (and not attached):
##  [1] stable_1.1.6        fBasics_4022.94     xfun_0.40          
##  [4] clue_0.3-64         lattice_0.21-9      spatial_7.3-17     
##  [7] rpart_4.1.21        knitr_1.43          modeest_2.4.0      
## [10] rmutil_1.1.10       statip_0.2.3        stabledist_0.7-1   
## [13] grid_4.3.2          timeDate_4022.108   compiler_4.3.2     
## [16] highr_0.10          tools_4.3.2         cluster_2.1.4      
## [19] evaluate_0.21       timeSeries_4030.106