This vignette introduces the FBMS package and shows how to perform Flexible Bayesian Model Selection (BMS) and Bayesian Model Averaging (BMA) for linear, generalized linear, nonlinear, fractional-polynomial, mixed-effects, and logic-regression models. More details are provided in the paper "FBMS: An R Package for Flexible Bayesian Model Selection and Model Averaging", available on arXiv; more explicit examples accompanying the paper can be found on GitHub: https://github.com/jonlachmann/GMJMCMC/tree/data-inputs/tests_current
We first load the technical markdown settings and a custom helper function, printn(), which prints only what is needed.
\[ P(\Delta|D) = \sum_{m\in\Omega} P(m|D)\, \int_{\Theta_m} P(\Delta|m,\theta,D)\, P(\theta|m,D)\, d\theta. \]
Reference: [@hubin2020flexible]
\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y \mid \mu_i,\phi), \quad i = 1,\dots,n,\\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{q} \gamma_j \beta_j F_j(\mathbf{x}_i, \boldsymbol{\alpha}_j) + \sum_{k=1}^{r} \delta_{ik}. \end{aligned} \]
Depending on the set of allowed nonlinear functions \(\mathbb{G}\), the model class covers neural networks (sigmoid), decision trees (thresholds), MARS (hinges), fractional polynomials, logic regression, and more.
The following nonlinear transformations are implemented in FBMS:

| Name | Function | Name | Function |
|---|---|---|---|
| sigmoid | \(1 / (1 + \exp(-x))\) | sqroot | \(|x|^{1/2}\) |
| relu | \(\max(x, 0)\) | troot | \(|x|^{1/3}\) |
| nrelu | \(\max(-x, 0)\) | sin_deg | \(\sin(x/180 \cdot \pi)\) |
| hs | \(x > 0\) | cos_deg | \(\cos(x/180 \cdot \pi)\) |
| nhs | \(x < 0\) | exp_dbl | \(\exp(-|x|)\) |
| gelu | \(x \Phi(x)\) | gauss | \(\exp(-x^2)\) |
| ngelu | \(-x \Phi(-x)\) | erf | \(2 \Phi(\sqrt{2}x) - 1\) |
| pm2 | \(x^{-2}\) | p0pm2 | \(p0(x) \cdot x^{-2}\) |
| pm1 | \(\text{sign}(x) |x|^{-1}\) | p0pm05 | \(p0(x) \cdot |x|^{-0.5}\) |
| pm05 | \(|x|^{-0.5}\) | p0p0 | \(p0(x)^2\) |
| p05 | \(|x|^{0.5}\) | p0p05 | \(p0(x) \cdot |x|^{0.5}\) |
| p2 | \(x^2\) | p0p1 | \(p0(x) \cdot x\) |
| p3 | \(x^3\) | p0p2 | \(p0(x) \cdot x^2\) |
| | | p0p3 | \(p0(x) \cdot x^3\) |
Custom functions can be added to \(\mathbb{G}\).
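A custom transformation is simply a named R function referenced by its name in transforms; the same pattern is used for to3() in the spam example below. A minimal sketch (the function to25() is hypothetical):
# hypothetical user-defined transform; any vectorized R function works
to25 <- function(x) abs(x)^2.5
transforms <- c("sigmoid", "p0", "to25")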
Let \(M = (\gamma_1, \dots, \gamma_q)\) (for linear models \(q=p\)).
Penalizing complexity priors
\[ P(M) \propto \mathbb{I}(|\boldsymbol\gamma_{1:q}| \le Q) \prod_{j=1}^q r^{\gamma_j c(F_j(\mathbf{x}, \boldsymbol{\alpha}_j))}, \quad 0 < r < 1, \]
Here \(c(F_j(\cdot))\) is a complexity measure: for linear models \(c(x_j)=1\), while for BGNLM it counts the algebraic operators in the feature.
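For example, a model containing one linear feature (\(c = 1\)) and one nonlinear feature of complexity \(c = 3\) receives unnormalized prior weight \(r^{1} \cdot r^{3} = r^{4}\), so smaller values of \(r\) penalize complex features more strongly.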
The following parameter controls the prior penalization:
# model prior complexity penalty
model_prior = list(r = 0.005) # default is 1/n
Mixtures of g-priors
\[ \begin{aligned} P(\beta_0, \phi \mid M) &\propto \phi^{-1}, \\ \boldsymbol{\beta} \mid g, \phi, M &\sim \mathbb{N}_{|M|}\!\left(\mathbf{0},\, g \cdot \phi\, J_n(\boldsymbol{\beta})^{-1}\right), \\ \frac{1}{1+g} &\sim \text{tCCH}\!\left(\frac{a}{2}, \frac{b}{2}, \rho, \frac{s}{2}, v, \kappa\right). \end{aligned} \]
Jeffreys prior
\[ P(\phi \mid M) = \phi^{-1}, \quad P(\beta_0, \boldsymbol{\beta} \mid M) = |J_n(\beta_0, \boldsymbol{\beta})|^{1/2}. \]
All priors in the table below are supported (the default is the g-prior):
| Prior (Alias) | \(a\) | \(b\) | \(\rho\) | \(s\) | \(v\) | \(k\) | Families |
|---|---|---|---|---|---|---|---|
| Default: | | | | | | | |
| g-prior | \(g\) (default: \(\max(n, p^2)\)) | | | | | | GLM |
| tCCH-Related Priors: | | | | | | | |
| CH | \(a\) | \(b\) | 0 | \(s\) | 1 | 1 | GLM |
| uniform | 2 | 2 | 0 | 0 | 1 | 1 | GLM |
| Jeffreys | 0 | 2 | 0 | 0 | 1 | 1 | GLM |
| beta.prime | \(1/2\) | \(n - p_M - 1.5\) | 0 | 0 | 1 | 1 | GLM |
| benchmark | 0.02 | \(0.02 \max(n, p^2)\) | 0 | 0 | 1 | 1 | GLM |
| TG | \(2a\) | 2 | 0 | \(2s\) | 1 | 1 | GLM |
| ZS-adapted | 1 | 2 | 0 | \(n + 3\) | 1 | 1 | GLM |
| robust | 1 | 2 | 1.5 | 0 | \((n+1)/(p_M+1)\) | 1 | GLM |
| hyper-g-n | 1 | 2 | 1.5 | 0 | 1 | \(1/n\) | GLM |
| intrinsic | 1 | 1 | 1 | 0 | \((n + p_M + 1)/(p_M + 1)\) | \((n + p_M + 1)/n\) | GLM |
| tCCH | \(a\) | \(b\) | \(\rho\) | \(s\) | \(v\) | \(k\) | GLM |
| Other Priors: | | | | | | | |
| EB-local | \(a\) | | | | | | GLM |
| EB-global | \(a\) | | | | | | G |
| JZS | \(a\) | | | | | | G |
| ZS-null | \(a\) | | | | | | G |
| ZS-full | \(a\) | | | | | | G |
| hyper-g | \(a\) | | | | | | GLM |
| hyper-g-laplace | \(a\) | | | | | | G |
| AIC | None | | | | | | GLM |
| BIC | None | | | | | | GLM |
| Jeffreys-BIC | Var | | | | | | GLM |
Here \(p_M\) is the number of predictors excluding the intercept. “G” denotes Gaussian-only; “GLM” additionally includes binomial, Poisson, and gamma.
How to switch priors in code (be explicit):
# g-prior with g = 1000
beta_prior = list(type = "g-prior", alpha = 1000)
# Robust prior (tune by Table parameters)
beta_prior = list(type = "robust")
# Jeffreys-BIC
beta_prior = list(type = "Jeffreys-BIC")
# Generic tCCH (provide all hyperparameters explicitly)
beta_prior = list(type = "tCCH", a = 2, b = 2, rho = 0, s = 0, v = 1, k = 1)Marginal likelihood \[ P(D|M) = \int_{\Theta_M} P(D|\theta_M, M) \, P(\theta_M|M) \, d\theta_M. \]
Posterior \[ P(M|D) = \frac{P(D|M) P(M)}{\sum_{M' \in \Omega} P(D|M') P(M')}. \]
Approximation over discovered models \(\Omega^*\) \[ P(M|D) \approx \frac{P(D|M) P(M)}{\sum_{M' \in \Omega^*} P(D|M') P(M')}, \quad M \in \Omega^*. \]
Marginal inclusion probability \[ P(\gamma_j=1|D) \approx \sum_{M \in \Omega^*: \gamma_j=1} P(M|D). \]
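Features with \(P(\gamma_j=1|D) > 0.5\) constitute the median probability model (MPM), which can be extracted with get.mpm.model() as illustrated in the examples below.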
Run multiple chains and aggregate unique models \(\Omega^*\):
\[ \widehat{P}(\Delta|D) = \sum_{M \in \Omega^*} P(\Delta|M,D) \, \widehat{P}(M|D). \]
# Parameters for parallel runs are set to a single thread and a single core to comply with CRAN requirements (please tune for your machine if you have more capacity)
runs <- 1 # set to 1 for simplicity; use 16 or more in practice
cores <- 1 # set to 1 for simplicity; use 8 or more in practice
Data: \(n=939\) exoplanets; variables include semimajoraxis, period, hoststar_mass, etc.
Target relationship: \({a \approx K_2 \left(P^2 M_h\right)^{1/3}}\).
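Before fitting, a quick sanity check of this relation with ordinary least squares (a minimal sketch outside the FBMS workflow; it assumes the column names listed above):
# rough check of a ~ K2 * (P^2 * M_h)^(1/3) via a simple linear fit
data <- FBMS::exoplanet
kepler <- (data$period^2 * data$hoststar_mass)^(1/3)
printn(summary(lm(data$semimajoraxis ~ kepler))$r.squared)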
We first run a single-chain GMJMCMC
# Load example
data <- FBMS::exoplanet
# Choose a small but expressive transform set for a quick demo
transforms <- c("sigmoid", "sin_deg", "exp_dbl", "p0", "troot", "p3")
# ---- fbms() call (simple GMJMCMC) ----
# Key parameters (explicit):
# - formula : semimajoraxis ~ 1 + . # response and all predictors
# - data : data # dataset
# - beta_prior : list(type = "g-prior") # parameter prior
# - model_prior : list(r = 1/dim(data)[1]) # model prior
# - method : "gmjmcmc" # exploration strategy
# - transforms : transforms # nonlinear feature dictionary
# - P : population size per generation (search breadth)
result_single <- fbms(
formula = semimajoraxis ~ 1 + .,
data = data,
beta_prior = list(type = "g-prior", alpha = dim(data)[1]),
model_prior = list(r = 1/dim(data)[1]),
method = "gmjmcmc",
transforms = transforms,
P = 10
)
# Summarize
printn(summary(result_single))
##
## Best population: 7 log marginal posterior: 3001.64
##
## feats.strings marg.probs
## 1 troot((period*sigmoid(mass))) 1.00000000
## 2 ((period*hoststar_mass)*period) 1.00000000
## 3 p3((period*hoststar_mass)) 1.00000000
## 4 (troot(period)*troot(period)) 1.00000000
## 5 (period*hoststar_mass) 1.00000000
## 6 (sigmoid(mass)*(period*hoststar_mass)) 1.00000000
## 7 hoststar_mass 0.91282456
## 8 hoststar_temperature 0.01243301
and then a parallel GMJMCMC
# ---- fbms() call (parallel GMJMCMC) ----
# Key parameters (explicit):
# - formula : semimajoraxis ~ 1 + . # response and all predictors
# - data : data # dataset
# - beta_prior : list(type = "g-prior") # parameter prior
# - model_prior : list(r = 1/dim(data)[1]) # model prior
# - method : "gmjmcmc" # exploration strategy
# - transforms : transforms # nonlinear feature dictionary
# - runs, cores : parallelization controls
# - P : population size per generation (search breadth)
result_parallel <- fbms(
formula = semimajoraxis ~ 1 + .,
data = data,
beta_prior = list(type = "g-prior", alpha = dim(data)[1]),
model_prior = list(r = 1/dim(data)[1]),
method = "gmjmcmc.parallel",
transforms = transforms,
runs = runs, # by default the rmd has runs = 1; increase for convergence
cores = cores, # by default the rmd has cores = 1; increase for convergence
P = 10
)
# Summarize
printn(summary(result_parallel))
##
## Best population: 10 thread: 1 log marginal posterior: 2144.064
##
## feats.strings marg.probs
## 1 period 1.0000000
## 2 (period*mass) 1.0000000
## 3 ((period*mass)*(period*mass)) 1.0000000
## 4 sigmoid(period) 1.0000000
## 5 p0(period) 1.0000000
## 6 ((period*mass)*(radius*(period*period))) 0.9999902
## 7 (radius*(period*period)) 0.9999844
## 8 (hoststar_temperature*p0(((period*mass)*(period*mass)))) 0.9945983
## 9 eccentricity 0.9554446
Plot output
plot(result_parallel)
Convergence plots
diagn_plot(result_parallel)
Reference: [@hubin2018mode]
\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y\mid \mu_i, \phi), \quad i=1,\dots,n, \\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{p} \gamma_j \beta_j x_{ij}. \end{aligned} \]
We simulate data with a known sparse truth and run MJMCMC with an explicit g-prior.
library(mvtnorm)
n <- 100 # sample size
p <- 20 # number of covariates
k <- 5 # size of true submodel
correct.model <- 1:k
beta.k <- (1:5)/5
beta <- rep(0, p)
beta[correct.model] <- beta.k
set.seed(123)
x <- rmvnorm(n, rep(0, p))
y <- x %*% beta + rnorm(n)
# Standardize
y <- scale(y)
X <- scale(x) / sqrt(n)
df <- as.data.frame(cbind(y, X))
colnames(df) <- c("Y", paste0("X", seq_len(ncol(df) - 1)))
printn(correct.model)
## [1] 1 2 3 4 5
printn(beta.k)
## [1] 0.2 0.4 0.6 0.8 1.0
Run MJMCMC with a g-prior (g = 100)
# ---- fbms() call (MJMCMC) ----
# Explicit prior choice:
# beta_prior = list(type = "g-prior", alpha = 100)
# To switch to another prior, e.g. robust:
# beta_prior = list(type = "robust")
result.lin <- fbms(
formula = Y ~ 1 + .,
data = df,
method = "mjmcmc",
N = 5000, # number of iterations
beta_prior = list(type = "g-prior", alpha = 100)
)
Plot results
plot(result.lin)
Summarize with posterior effects
# 'effects' specifies quantiles for posterior modes of effects across models
printn(summary(result.lin, effects = c(0.5, 0.025, 0.975)))
##
## Best log marginal posterior: 56.2259
##
## $PIP
## feats.strings marg.probs
## 1 X4 1.00000000
## 2 X3 1.00000000
## 3 X5 1.00000000
## 4 X2 0.95262397
## 5 X1 0.16880319
## 6 X18 0.10464730
## 7 X20 0.10105529
## 8 X15 0.08767336
## 9 X10 0.08257391
## 10 X16 0.08200567
## 11 X13 0.07741378
## 12 X12 0.07474969
## 13 X11 0.07090813
## 14 X19 0.06903813
## 15 X7 0.06899870
## 16 X14 0.06586870
## 17 X9 0.05690408
## 18 X8 0.05592665
## 19 X6 0.05154572
## 20 X17 0.04739590
##
## $EFF
## Covariate quant_0.5 quant_0.025 quant_0.975
## 1 intercept 0 0 0
## 2 X1 0 0 0.6606
## 3 X2 1.7024 0 1.7383
## 4 X3 4.8118 4.7487 4.8867
## 5 X4 4.9803 4.9086 5.0868
## 6 X5 4.7744 4.3659 4.8573
## 7 X6 0 -0.0694 0
## 8 X7 0 -0.1382 0
## 9 X8 0 0 0.072
## 10 X9 0 -0.1875 0
## 11 X10 0 0 0.5609
## 12 X11 0 0 0.444
## 13 X12 0 -0.2502 0
## 14 X13 0 -0.3655 0
## 15 X14 0 0 0.2716
## 16 X15 0 0 0.5654
## 17 X16 0 0 0.5074
## 18 X17 0 0 0.2266
## 19 X18 0 0 0.4858
## 20 X19 0 -0.2733 0
## 21 X20 0 -0.5012 0
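Before the parallel run, a minimal prediction sketch (assuming get.best.model() and predict() apply to MJMCMC results in the same way as in the GMJMCMC examples later in this vignette; identity link for the Gaussian model):
mbest.lin <- get.best.model(result.lin)
pred.lin <- predict(mbest.lin, x = df[, -1], link = function(x) x)
printn(sqrt(mean((pred.lin - df$Y)^2))) # in-sample RMSE of the best model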
Run parallel MJMCMC
# ---- fbms() call (parallel MJMCMC) ----
# Explicit prior choice:
# beta_prior = list(type = "g-prior", alpha = 100)
# To switch to another prior, e.g. robust:
# beta_prior = list(type = "robust")
# method = mjmcmc.parallel
# runs, cores : parallelization controls
result.lin.par <- fbms(
formula = Y ~ 1 + .,
data = df,
method = "mjmcmc.parallel",
N = 5000, # number of iterations
beta_prior = list(type = "g-prior", alpha = 100),
runs = runs,
cores = cores
)
printn(summary(result.lin.par, effects = c(0.5, 0.025, 0.975)))
##
## Best log marginal posterior: 56.2259
##
## $PIP
## feats.strings marg.probs
## 1 X4 1.00000000
## 2 X3 1.00000000
## 3 X5 1.00000000
## 4 X2 0.95518016
## 5 X1 0.17518018
## 6 X10 0.10751407
## 7 X20 0.10044924
## 8 X18 0.09363710
## 9 X16 0.09089584
## 10 X15 0.08729800
## 11 X13 0.08418363
## 12 X19 0.07302402
## 13 X14 0.07262783
## 14 X11 0.06881624
## 15 X17 0.06104115
## 16 X9 0.05570830
## 17 X12 0.05215211
## 18 X6 0.04938128
## 19 X7 0.04711947
## 20 X8 0.04447923
##
## $EFF
## Covariate quant_0.5 quant_0.025 quant_0.975
## 1 intercept 0 0 0
## 2 X1 0 0 0.6606
## 3 X2 1.7027 0 1.7394
## 4 X3 4.8118 4.7487 4.8885
## 5 X4 4.9803 4.9043 5.0888
## 6 X5 4.7744 4.3659 4.8576
## 7 X6 0 -0.0694 0
## 8 X7 0 -0.1346 0
## 9 X8 0 0 0.072
## 10 X9 0 -0.1875 0
## 11 X10 0 0 0.5625
## 12 X11 0 0 0.4382
## 13 X12 0 -0.2501 0
## 14 X13 0 -0.3742 0
## 15 X14 0 0 0.2739
## 16 X15 0 0 0.5567
## 17 X16 0 0 0.5074
## 18 X17 0 0 0.2266
## 19 X18 0 0 0.4718
## 20 X19 0 -0.2733 0
## 21 X20 0 -0.511 0
Reference: [@hubin2023fractional]
We augment the linear example to follow an FP truth and fit with GMJMCMC.
\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y|\mu_i,\phi),\\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{p} \sum_{k \in K} \gamma_{jk}\, \beta_{jk}\, \rho_k(x_{ij}), \quad \text{with } K = \mathbf{F}_0 \cup \mathbf{F}_1 \cup \mathbf{F}_2, \end{aligned} \]
# Create FP-style response with known structure, covariates are from previous example
df$Y <- p05(df$X1) + df$X1 + pm05(df$X3) + p0pm05(df$X3) + df$X4 +
pm1(df$X5) + p0(df$X6) + df$X8 + df$X10 + rnorm(nrow(df))
# Allow common FP transforms
transforms <- c(
"p0", "p2", "p3", "p05", "pm05", "pm1", "pm2", "p0p0",
"p0p05", "p0p1", "p0p2", "p0p3", "p0p05", "p0pm05", "p0pm1", "p0pm2"
)
# Generation probabilities — here only modifications and mutations
probs <- gen.probs.gmjmcmc(transforms)
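# The four entries of probs$gen set the probabilities of the feature-generation
# operations; judging by the comments in this vignette, the order is
# (interactions, modifications, projections, mutations).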
probs$gen <- c(0, 1, 0, 1)
# Feature-generation parameters
params <- gen.params.gmjmcmc(ncol(df) - 1)
params$feat$D <- 1 # features of maximal depth 1
Run GMJMCMC (single-thread)
result <- fbms(
formula = Y ~ 1 + .,
data = df,
method = "gmjmcmc",
transforms = transforms,
beta_prior = list(type = "Jeffreys-BIC"),
probs = probs,
params = params,
P = 10
)
printn(summary(result))
##
## Best population: 10 log marginal posterior: -206.4266
##
## feats.strings marg.probs
## 1 pm05(X3) 1.0000000000
## 2 p0p0(X3) 1.0000000000
## 3 pm1(X5) 1.0000000000
## 4 p0pm05(X6) 0.9999959549
## 5 p0pm1(X3) 0.9359835862
## 6 X17 0.8981659238
## 7 X15 0.8795347468
## 8 X10 0.8584612830
## 9 X9 0.8507841966
## 10 p0pm2(X13) 0.7855348631
## 11 X20 0.1825368449
## 12 X4 0.0362927730
## 13 X7 0.0166240434
## 14 X13 0.0079441065
## 15 X2 0.0002276410
## 16 p0p3(X20) 0.0001784105
Parallel GMJMCMC
result_parallel <- fbms(
formula = Y ~ 1 + .,
data = df,
method = "gmjmcmc.parallel",
transforms = transforms,
beta_prior = list(type = "Jeffreys-BIC"),
probs = probs,
params = params,
P = 10,
runs = runs,
cores = cores
)
printn(summary(result_parallel))
##
## Best population: 10 thread: 1 log marginal posterior: -506.0695
##
## feats.strings marg.probs
## 1 p0pm05(X3) 1.000000000
## 2 X5 1.000000000
## 3 pm2(X5) 1.000000000
## 4 X3 0.398001125
## 5 p0p0(X3) 0.001258547
Dataset: \(n=659\) children; response \(z\): standardized height; covariates: c.bf, c.age, m.ht, m.bmi, reg. Random intercept for district (dr).
We specify a custom estimator that uses a mixed model (via lme4) and plug it into fbms() with family = "custom". Extra parameters of the estimator are passed through mlpost_params = list(dr = dr, r = r).
# Custom approximate log marginal likelihood for mixed model using Laplace approximation
mixed.model.loglik.lme4 <- function (y, x, model, complex, mlpost_params) {
if (sum(model) > 1) {
x.model <- x[, model]
data <- data.frame(y, x = x.model[, -1], dr = mlpost_params$dr)
mm <- lmer(as.formula(paste0("y ~ 1 +",
paste0(names(data)[2:(ncol(data)-1)], collapse = "+"),
" + (1 | dr)")), data = data, REML = FALSE)
} else {
data <- data.frame(y, dr = mlpost_params$dr)
mm <- lmer(y ~ 1 + (1 | dr), data = data, REML = FALSE)
}
# log marginal likelihood (Laplace approx) + log model prior
mloglik <- as.numeric(logLik(mm)) - 0.5 * log(length(y)) * (ncol(data) - 2)
if (length(mlpost_params$r) == 0) mlpost_params$r <- 1 / nrow(x)
lp <- log_prior(mlpost_params, complex)
list(crit = mloglik + lp, coefs = fixef(mm))
}
library(lme4)
## Loading required package: Matrix
data(Zambia, package = "cAIC4")
df <- as.data.frame(sapply(Zambia[1:5], scale))
transforms <- c("p0","p2","p3","p05","pm05","pm1","pm2",
"p0p0","p0p05","p0p1","p0p2","p0p3",
"p0p05","p0pm05","p0pm1","p0pm2")
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 0, 1) # include modifications and interactions
params <- gen.params.gmjmcmc(ncol(df) - 1)
params$feat$D <- 1
params$feat$pop.max <- 10
result2a <- fbms(
formula = z ~ 1 + .,
data = df,
method = "gmjmcmc.parallel",
transforms = transforms,
probs = probs,
params = params,
P = 10,
N = 100,
runs = runs,
cores = cores,
family = "custom",
loglik.pi = mixed.model.loglik.lme4,
model_prior = list(r = 1 / nrow(df)), # model_prior is passed to mlpost_params
extra_params = list(dr = droplevels(Zambia$dr)) # extra_params are passed to mlpost_params
)
printn(summary(result2a, tol = 0.05, labels = names(df)[-1]))
##
## Best population: 6 thread: 1 log marginal posterior: -876.5476
##
## feats.strings marg.probs
## 1 c.bf 1.0000000
## 2 c.age 1.0000000
## 3 m.bmi 1.0000000
## 4 m.ht 1.0000000
## 5 (c.age*c.age) 0.9999999
Reference: [@hubin2020logic]
\[ \mathsf{h}(\mu_i) = \beta_0 + \sum_{j=1}^{q} \gamma_j \beta_j L_{ji}, \quad L_{ji} \text{ are logic trees (e.g., } (x_{i1}\wedge x_{i2}) \vee x_{i3}^c ). \]
We generate Boolean covariates and a known logic signal, define a custom estimator with a logic prior, and fit via GMJMCMC.
n <- 2000
p <- 50
set.seed(1)
X2 <- as.data.frame(matrix(rbinom(n * p, size = 1, prob = runif(n * p, 0, 1)), n, p))
y2.Mean <- 1 + 7*(X2$V4*X2$V17*X2$V30*X2$V10) + 9*(X2$V7*X2$V20*X2$V12) +
3.5*(X2$V9*X2$V2) + 1.5*(X2$V37)
Y2 <- rnorm(n, mean = y2.Mean, sd = 1)
df <- data.frame(Y2, X2)
# Train/test split
df.training <- df[1:(n/2), ]
df.test <- df[(n/2 + 1):n, ]
df.test$Mean <- y2.Mean[(n/2 + 1):n]
Custom estimator with logic regression priors
library(fastglm) # provides fastglm(), used inside the custom estimator
estimate.logic.lm <- function(y, x, model, complex, mlpost_params) {
suppressWarnings({
mod <- fastglm(as.matrix(x[, model]), y, family = gaussian())
})
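# BIC-style criterion: BIC = AIC + (log(n) - 2) * rank, hence mloglik is roughly -BIC/2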
mloglik <- -(mod$aic + (log(length(y)) - 2) * (mod$rank)) / 2
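# log prior for logic trees based on their widths wj; wider (more complex) trees get less prior mass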
wj <- complex$width
lp <- sum(log(factorial(wj))) - sum(wj * log(4 * mlpost_params$p) - log(4))
logpost <- mloglik + lp
if (logpost == -Inf) logpost <- -10000
list(crit = logpost, coefs = mod$coefficients)
}
Run GMJMCMC
set.seed(5001)
# Only "not" operator; "or" is implied by De Morgan via "and" + "not"
transforms <- c("not")
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 0, 1) # no projections
params <- gen.params.gmjmcmc(p)
params$feat$pop.max <- 50
params$feat$L <- 15
result <- fbms(
formula = Y2 ~ 1 + .,
data = df.training,
method = "gmjmcmc",
transforms = transforms,
N = 500,
P = 10,
family = "custom",
loglik.pi = estimate.logic.lm,
probs = probs,
params = params,
model_prior = list(p = p)
)
printn(summary(result))
##
## Best population: 9 log marginal posterior: -1912.786
##
## feats.strings marg.probs
## 1 ((V4*V10)*V17) 1.0000000000
## 2 (V9*V2) 1.0000000000
## 3 V37 1.0000000000
## 4 ((V20*V7)*V12) 1.0000000000
## 5 (V33*(V30*(V4*V17))) 0.9999999179
## 6 V30 0.9999969055
## 7 V17 0.0137256351
## 8 V32 0.0025183771
## 9 V35 0.0020386609
## 10 V12 0.0013621295
## 11 V45 0.0012730261
## 12 V18 0.0012697929
## 13 V16 0.0010825818
## 14 V41 0.0010734783
## 15 V44 0.0009967761
## 16 V31 0.0009461623
## 17 V8 0.0008938150
## 18 V13 0.0007713390
## 19 V3 0.0006926010
## 20 V24 0.0006547320
## 21 V21 0.0006439233
## 22 V49 0.0006393027
## 23 V9 0.0006283463
## 24 V36 0.0006209956
## 25 V15 0.0006131099
# Extract models
mpm <- get.mpm.model(result, y = df.training$Y2, x = df.training[,-1],
family = "custom", loglik.pi = estimate.logic.lm, params = list(p = 50))
printn(mpm$coefs)
## (Intercept) ((V4*V10)*V17) (V9*V2)
## 0.6578016 2.7049544 3.5767605
## V30 V37 (V33*(V30*(V4*V17)))
## 0.6051133 1.3582049 2.1279042
## ((V20*V7)*V12)
## 9.1213744
mpm2 <- get.mpm.model(result, y = df.training$Y2, x = df.training[,-1])
printn(mpm2$coefs)
## (Intercept) ((V4*V10)*V17) (V9*V2)
## 0.6578016 2.7049544 3.5767605
## V30 V37 (V33*(V30*(V4*V17)))
## 0.6051133 1.3582049 2.1279042
## ((V20*V7)*V12)
## 9.1213744
mbest <- get.best.model(result)
printn(mbest$coefs)
## Intercept ((V4*V10)*V17) (V9*V2)
## 0.6578016 2.7049544 3.5767605
## V30 V37 (V33*(V30*(V4*V17)))
## 0.6051133 1.3582049 2.1279042
## ((V20*V7)*V12)
## 9.1213744
Prediction
# Correct link is identity for Gaussian
pred <- predict(result, x = df.test[,-1], link = function(x) x)
pred_mpm <- predict(mpm, x = df.test[,-1], link = function(x) x)
pred_best <- predict(mbest, x = df.test[,-1], link = function(x) x)
# RMSEs
printn(sqrt(mean((pred$aggr$mean - df.test$Y2)^2)))
## [1] 1.528321
printn(sqrt(mean((pred_mpm - df.test$Y2)^2)))
## [1] 1.528355
printn(sqrt(mean((pred_best - df.test$Y2)^2)))
## [1] 1.528355
printn(sqrt(mean((df.test$Mean - df.test$Y2)^2)))
## [1] 1.028222
# Errors to the true mean (oracle)
printn(sqrt(mean((pred$aggr$mean - df.test$Mean)^2)))
## [1] 1.076062
printn(sqrt(mean((pred_best - df.test$Mean)^2)))
## [1] 1.076113
printn(sqrt(mean((pred_mpm - df.test$Mean)^2)))
## [1] 1.076113
# Quick diagnostic plot
plot(pred$aggr$mean, df.test$Y2,
xlab = "Predicted (BMA)", ylab = "Observed")
points(pred$aggr$mean, df.test$Mean, col = 2)
points(pred_best, df.test$Mean, col = 3)
points(pred_mpm, df.test$Mean, col = 4)
spam data
We fit a binomial BGNLM and compare BMA, best-model, and MPM predictions.
Important: specify the correct link in predict() (here logistic).
library(kernlab)
data("spam")
df <- spam[, c(58, 1:57)]
n <- nrow(df)
p <- ncol(df) - 1
colnames(df) <- c("y", paste0("x", 1:p))
df$y <- as.numeric(df$y == "spam")
to3 <- function(x) x^3
transforms <- c("sigmoid","sin_deg","exp_dbl","p0","troot","to3")
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 1, 1)
params <- gen.params.gmjmcmc(p)
params$feat$check.col <- FALSE
set.seed(6001)
result <- fbms(
formula = y ~ 1 + .,
data = df,
method = "gmjmcmc",
family = "binomial",
beta_prior = list(type = "Jeffreys-BIC"),
transforms = transforms,
probs = probs,
params = params
)
printn(summary(result))
##
## Best population: 2 log marginal posterior: -1048.305
##
## feats.strings marg.probs
## 1 x5 1.000000000
## 2 x8 1.000000000
## 3 x55 1.000000000
## 4 sigmoid(x21) 1.000000000
## 5 x16 1.000000000
## 6 x20 1.000000000
## 7 x21 1.000000000
## 8 x23 1.000000000
## 9 x25 1.000000000
## 10 x27 1.000000000
## 11 sigmoid(x7) 1.000000000
## 12 p0(x17) 1.000000000
## 13 to3(x25) 1.000000000
## 14 x41 1.000000000
## 15 x42 1.000000000
## 16 x44 1.000000000
## 17 x12 1.000000000
## 18 x6 1.000000000
## 19 sigmoid(x52) 1.000000000
## 20 x53 1.000000000
## 21 x52 1.000000000
## 22 x45 1.000000000
## 23 (x50*x25) 1.000000000
## 24 p0(x19) 1.000000000
## 25 p0(x22) 0.999999632
## 26 sigmoid(1+1*x16+1*x55) 0.999999632
## 27 (x24*x3) 0.998106918
## 28 x2 0.998091170
## 29 x28 0.998090532
## 30 troot(x27) 0.998090188
## 31 x39 0.998090175
## 32 x32 0.998090164
## 33 x46 0.994872577
## 34 x19 0.994843934
## 35 troot(x8) 0.807086602
## 36 x24 0.807017530
## 37 x49 0.806845676
## 38 x29 0.036497149
## 39 x40 0.017892855
## 40 x10 0.014132910
## 41 sigmoid(x46) 0.003217954
Prediction accuracy
# Logistic link
invlogit <- function(x) 1/(1 + exp(-x))
# Model averaging
pred <- predict(result, x = df[,-1], link = invlogit)
printn(mean(round(pred$aggr$mean) == df$y))
## [1] 0.9397957
# Best model
bm <- get.best.model(result)
preds <- predict(bm, df[,-1], link = invlogit)
printn(mean(round(preds) == df$y))
## [1] 0.9402304
# Median Probability Model
mpm <- get.mpm.model(result, family = "binomial", y = df$y, x = df[,-1])
preds_mpm <- predict(mpm, df[,-1], link = invlogit)
printn(mean(round(preds_mpm) == df$y))
## [1] 0.9402304