ebnm: Solve the empirical Bayes normal means problem


The ebnm package provides functions to solve the (heteroskedastic) “empirical Bayes normal means” (EBNM) problem for various choices of prior family. The model is

\[ x_j \ | \ θ_j,\ s_j \sim N(θ_j,\ s_j^2) \]

\[ θ_j \ | \ s_j \sim g \in \mathcal{G} \]

where the distribution \(g\) is to be estimated. The distribution \(g\) is referred to as the “prior distribution” for \(θ\) and \(\mathcal{G}\) is a specified family of prior distributions. Several options for \(\mathcal{G}\) are implemented, some parametric and others non-parametric; see below for examples.

Solving the EBNM problem involves two steps. First, estimate \(g \in \mathcal{G}\) via maximum marginal likelihood, yielding an estimate

\[ \hat{g} := \arg\max_{g \in \mathcal{G}}\ L(g) \]

where

\[ L(g) := \prod_j\ \int\ p(x_j \ | \ θ_j,\ s_j)\ g(dθ_j) \]

Second, compute the posterior distributions \(p(θ_j \ | \ x_j,\ s_j,\ \hat{g})\) and/or summaries such as posterior means and posterior second moments.
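
The two steps can be sketched in base R for the simplest case, where \(\mathcal{G}\) is the family of zero-mean normal distributions \(N(0,\ σ^2)\). This is only an illustration under that assumption, not the package's implementation: the simulated data and the variable names (`sigma2_hat`, `shrink`, and so on) are hypothetical. Marginally \(x_j \sim N(0,\ σ^2 + s_j^2)\), so step one reduces to a one-dimensional optimization and step two has a closed form.

```r
# Sketch: EBNM with G = {N(0, sigma^2)}, using base R only.
set.seed(1)
n <- 1000
s <- runif(n, 0.5, 1.5)          # known standard errors (heteroskedastic)
theta <- rnorm(n, sd = 2)        # simulated true means
x <- theta + rnorm(n, sd = s)    # observations

# Step 1: maximize the marginal log likelihood over sigma^2.
# Marginally, x_j ~ N(0, sigma^2 + s_j^2).
loglik <- function(sigma2) {
  sum(dnorm(x, mean = 0, sd = sqrt(sigma2 + s^2), log = TRUE))
}
fit <- optimize(loglik, interval = c(0, 10 * var(x)), maximum = TRUE)
sigma2_hat <- fit$maximum

# Step 2: posterior summaries under the estimated prior.
shrink <- sigma2_hat / (sigma2_hat + s^2)   # shrinkage factor per observation
post_mean <- shrink * x
post_var <- shrink * s^2
post_second_moment <- post_var + post_mean^2
```

Observations with larger \(s_j\) receive a smaller shrinkage factor and are pulled more strongly toward zero.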

The prior families that have been implemented include:

“point_normal”: The family of mixtures where one component is a point mass at \(μ\) and the other is a normal distribution centered at \(μ\).

“point_laplace”: The family of mixtures where one component is a point mass at zero and the other is a double-exponential distribution.

“point_exponential”: The family of mixtures where one component is a point mass at zero and the other is a (nonnegative) exponential distribution.

“normal”: The family of normal distributions.

“horseshoe”: The family of horseshoe distributions.

“normal_scale_mixture”: The family of scale mixtures of normals.

“unimodal”: The family of all unimodal distributions.

“unimodal_symmetric”: The family of symmetric unimodal distributions.

“unimodal_nonnegative”: The family of unimodal distributions with support constrained to be greater than the mode.

“unimodal_nonpositive”: The family of unimodal distributions with support constrained to be less than the mode.

“generalized_binary”: The family of mixtures where one component is a point mass at zero and the other is a truncated normal distribution with lower bound zero and nonzero mode.

“npmle”: The family of all distributions.

“deconvolver”: A non-parametric exponential family with a natural spline basis. Like npmle, it makes no unimodal assumption, but whereas npmle produces spiky estimates for \(g\), deconvolver estimates are much more regular. See Narasimhan and Efron (2020) for details.

“flat”: A “non-informative” improper uniform prior.

“point_mass”: The family of all point masses.
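
As a rough illustration of how a family from the list above might be selected, the sketch below passes a few family names to the `prior_family` argument of `ebnm()` and compares the maximized log likelihoods. It assumes that the ebnm package is installed and that the fitted object stores a `log_likelihood` field. The simulated data are hypothetical, and the block is skipped when the package is unavailable.

```r
# Assumes the ebnm package is installed; skipped otherwise.
if (requireNamespace("ebnm", quietly = TRUE)) {
  set.seed(1)
  theta <- c(rep(0, 500), rnorm(500))   # simulated true means
  x <- theta + rnorm(1000)              # observations with standard error 1

  families <- c("normal", "point_normal", "point_laplace")
  fits <- lapply(families, function(fam) {
    ebnm::ebnm(x, s = 1, prior_family = fam)
  })
  names(fits) <- families

  # Maximized marginal log likelihood, one value per family.
  print(sapply(fits, function(fit) as.numeric(fit$log_likelihood)))
}
```

Because the normal family is contained in the point-normal family (a point mass with zero weight), the point-normal fit should reach a log likelihood at least as high as the normal fit, up to optimization error.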

License

The ebnm source code repository is free software: you can redistribute it under the terms of the GNU General Public License. All the files in this project are part of ebnm. This project is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

Quick Start

Install the ebnm package:

install.packages("ebnm")

Load ebnm into your R environment, and get help:

library(ebnm)
?ebnm

Try an example:

set.seed(1)
theta = c(rep(0, 500), rnorm(500)) # true means
x = theta + rnorm(1000) # observations with standard error 1
ebnm_res = ebnm_point_normal(x, 1)
plot(ebnm_res)
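
To check what the fit buys you, one option is to compare the posterior means against the raw observations. The sketch below repeats the example so that it is self-contained. It assumes the fitted object exposes the estimated prior as `fitted_g` and the posterior summaries as a data frame `posterior` with a `mean` column, which may differ across package versions. The names `rmse_raw` and `rmse_ebnm` are illustrative.

```r
# Assumes the ebnm package is installed; skipped otherwise.
if (requireNamespace("ebnm", quietly = TRUE)) {
  set.seed(1)
  theta <- c(rep(0, 500), rnorm(500))   # true means
  x <- theta + rnorm(1000)              # observations with standard error 1
  ebnm_res <- ebnm::ebnm_point_normal(x, 1)

  print(ebnm_res$fitted_g)              # estimated prior g-hat

  # Root mean squared error of the raw observations and the posterior means.
  rmse_raw <- sqrt(mean((x - theta)^2))
  rmse_ebnm <- sqrt(mean((ebnm_res$posterior$mean - theta)^2))
  print(c(raw = rmse_raw, ebnm = rmse_ebnm))
}
```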