Getting started with the ebnm package

The empirical Bayes normal means problem

Given \(n\) observations \(x_i\) with known standard deviations \(s_i\), \(i = 1, \dots, n\), the normal means model (Robbins 1951; Efron and Morris 1972; Stephens 2017; Bhadra et al. 2019; Johnstone 2019; Sun 2020) is \[\begin{equation} x_i \overset{\text{ind.}}{\sim} \mathcal{N}(\theta_i, s_i^2), \end{equation}\] with the unknown (true) means \(\theta_i\) to be estimated. Here and throughout, we use \(\mathcal{N}(\mu, \sigma^2)\) to denote the normal distribution with mean \(\mu\) and variance \(\sigma^2\). The maximum-likelihood estimate of \(\theta_i\) is, of course, \(x_i\). The empirical Bayes (EB) approach to inferring \(\theta_i\) attempts to improve upon the maximum-likelihood estimate by “borrowing information” across observations, exploiting the fact that each observation contains information not only about its respective mean, but also about how the means are collectively distributed (Robbins 1956; Morris 1983; Efron 2010; Stephens 2017). Specifically, the EB approach assumes that \[\begin{equation} \theta_i \overset{\text{ind.}}{\sim} g \in \mathcal{G}, \end{equation}\] where \(g\) is a distribution to be estimated from the data, typically chosen from among some family of distributions \(\mathcal{G}\) that is specified in advance. (Note that although \(\mathcal{G}\) must be specified in advance, it can be arbitrarily flexible.)

The empirical Bayes normal means model is therefore fit by first using all of the observations to estimate \(g \in \mathcal{G}\), then using the estimated distribution \(\hat{g}\) to compute posteriors for each mean \(\theta_i\). Commonly, \(g\) is estimated via maximum-likelihood, in which case the EB approach consists of the following:

  1. Find \(\hat{g} := \rm{argmax}_{g \,\in\, \mathcal{G}} L(g)\), where \(L(g)\) denotes the marginal likelihood, \[\begin{equation} L(g) := p({\mathbf x} \mid g, {\mathbf s}) = \prod_{i=1}^n \textstyle \int p(x_i \mid \theta_i, s_i) \, g(\theta_i) \, d\theta_i, \end{equation}\] and we define \({\mathbf x} := (x_1, \ldots, x_n)\), \({\mathbf s} := (s_1, \ldots, s_n)\).

  2. Compute posterior distributions \[\begin{equation} p(\theta_i \mid x_i, s_i, \hat{g}) \propto \hat{g}(\theta_i) \, p(x_i \mid \theta_i, s_i), \end{equation}\] and/or summaries (posterior means, variances, etc.).

We refer to this two-step process as “solving the EBNM problem.” The ebnm package provides a unified interface for efficiently solving the EBNM problem using a wide variety of prior families \(\mathcal{G}\). For some prior families, it leverages code from existing packages; for others, it implements new model fitting algorithms with a mind toward speed and robustness. In this vignette, we demonstrate usage of the ebnm package in an analysis of weighted on-base averages for Major League Baseball players.

Application: Weighted on-base averages

A longstanding tradition in empirical Bayes research is to include an analysis of batting averages using data from Major League Baseball (see, for example, Brown 2008; Jiang and Zhang 2010; and Gu and Koenker 2017). Until recently, batting averages were the most important measurement of a hitter’s performance, with the prestigious yearly “batting title” going to the hitter with the highest average. However, with the rise of baseball analytics, metrics that better correlate to teams’ overall run production have become increasingly preferred. One such metric is wOBA (“weighted on-base average”), which is both an excellent measure of a hitter’s offensive production and, unlike competing metrics such as MLB’s xwOBA (Sharpe 2019) or Baseball Prospectus’s DRC+ (Judge 2019), can be calculated using publicly available data and methods.

Initially proposed by Tango, Lichtman, and Dolphin (2006), wOBA assigns values (“weights”) to hitting outcomes according to how much the outcome contributes on average to run production. For example, while batting average treats singles identically to home runs, wOBA gives a hitter more than twice as much credit for a home run.

Given a vector of wOBA weights \(\mathbf{w}\), hitter \(i\)’s wOBA is the weighted average \[\begin{equation} x_i := \mathbf{w}^T \mathbf{z}^{(i)} / n_i, \end{equation}\] where \(\mathbf{z}^{(i)} = (z_1^{(i)}, \ldots, z_7^{(i)})\) tallies outcomes (singles, doubles, triples, home runs, walks, hit-by-pitches, and outs) over the hitter’s \(n_i\) plate appearances (PAs). Modeling hitting outcomes as i.i.d. \[\begin{equation} \mathbf{z}^{(i)} \sim \text{Multinomial}(n_i, {\boldsymbol\pi}^{(i)}), \end{equation}\] where \({\boldsymbol\pi}^{(i)} = (\pi_1^{(i)}, \ldots, \pi_7^{(i)})\) is the vector of “true” outcome probabilities for hitter \(i\), we can regard \(x_i\) as a point estimate for the hitter’s “true wOBA skill” \[\begin{equation} \theta_i := \mathbf{w}^T {\boldsymbol\pi}^{(i)}. \end{equation}\] Standard errors for the \(x_i\)’s can be estimated as \[\begin{equation} s_i^2 = \mathbf{w}^T \hat{\mathbf\Sigma}^{(i)} \mathbf{w}/n_i, \end{equation}\] where \(\hat{\mathbf\Sigma}^{(i)}\) is the estimate of the multinomial covariance matrix obtained by setting \({\boldsymbol\pi} = \hat{\boldsymbol\pi}\), where \[\begin{equation} \hat{\boldsymbol\pi}^{(i)} = \mathbf{z}^{(i)}/n_i. \end{equation}\] (To deal with small sample sizes, we conservatively lower bound each standard error by the standard error that would be obtained by plugging in league-average event probabilities \(\hat{\boldsymbol\pi}_{\mathrm{lg}} = \sum_{i=1}^N \mathbf{z}^{(i)}/ \sum_{i=1}^N n_i\), where \(N\) is the number of hitters in the dataset.)

The relative complexity of wOBA makes it well suited for analysis via ebnm. With batting average, a common approach is to obtain empirical Bayes estimates using a beta-binomial model (see, for example, Robinson (2017)). With wOBA, one can estimate hitting outcome probabilities by way of a Dirichlet-multinomial model; alternatively, one can approximate the likelihood as normal and fit an EBNM model directly to the observed wOBAs. In the following, we take the latter approach.

The wOBA data set

We begin by loading and inspecting the wOBA data set, which consists of wOBAs and standard errors for the 2022 MLB regular season:

library(ebnm)
data(wOBA)
nrow(wOBA)
#> [1] 688
head(wOBA)
#>   FanGraphsID           Name Team  PA     x     s
#> 1       19952     Khalil Lee  NYM   2 1.036 0.733
#> 2       16953 Chadwick Tromp  ATL   4 0.852 0.258
#> 3       19608     Otto Lopez  TOR  10 0.599 0.162
#> 4       24770   James Outman  LAD  16 0.584 0.151
#> 5        8090 Matt Carpenter  NYY 154 0.472 0.054
#> 6       15640    Aaron Judge  NYY 696 0.458 0.024

Column “x” contains each player’s wOBA for the 2022 season, which we regard as an estimate of the player’s “true” wOBA skill. Column “s” provides standard errors.

Next, we visualize the overall distribution of wOBAs:

library(ggplot2)
ggplot(wOBA, aes(x = x)) +
  geom_histogram(bins = 64, color = "white",fill = "black") +
  theme_classic()

As the histogram shows, most players finished the season with a wOBA between .200 and .400. A few had very high wOBAs (\(>\).500), while others had wOBAs at or near zero. A casual inspection of the data suggests that players with these very high (or very low) wOBAs were simply lucky (or unlucky). For example, the 4 players with the highest wOBAs each had 16 PAs or fewer. It is very unlikely that they would have sustained this high level of production over a full season’s worth of PAs.

In contrast, Aaron Judge’s production — which included a record-breaking number of home runs — appears to be “real,” since it was sustained over nearly 700 PAs. Other cases are more ambiguous: how, for example, are we to assess Matt Carpenter, who had several exceptional seasons between 2013 and 2018 but whose output steeply declined in 2019–2021 before his surprising “comeback” in 2022? An empirical Bayes analysis can help to answer this and other questions.

Main ebnm methods and the normal prior family

Function ebnm() is the main interface for fitting the empirical Bayes normal means model; it is a “Swiss army knife” that allows for various choices of prior family \(\mathcal{G}\) as well as providing multiple options for fitting and tuning models. For example, we can fit a normal means model with the prior family \(\mathcal{G}\) taken to be the family of normal distributions:

x <- wOBA$x
s <- wOBA$s
names(x) <- wOBA$Name
names(s) <- wOBA$Name
fit_normal <- ebnm(x, s, prior_family = "normal", mode = "estimate")

(The default behavior is to fix the prior mode at zero. Since we certainly do not expect the distribution of true wOBA skill to have a mode at zero, we set mode = "estimate".)

We note in passing that the ebnm package has a second model-fitting interface, in which each prior family gets its own function:

fit_normal <- ebnm_normal(x, s, mode = "estimate")

Textual and graphical overviews of results can be obtained using, respectively, methods summary() and plot(). The summary method appears as follows:

summary(fit_normal)
#> 
#> Call:
#> ebnm_normal(x = x, s = s, mode = "estimate")
#> 
#> EBNM model was fitted to 688 observations with _heteroskedastic_ standard errors.
#> 
#> The fitted prior belongs to the _normal_ prior family.
#> 
#> 2 degrees of freedom were used to estimate the model.
#> The log likelihood is 989.64.
#> 
#> Available posterior summaries: _mean_, _sd_.
#> Use method fitted() to access available summaries.
#> 
#> A posterior sampler is _not_ available.
#> One can be added via function ebnm_add_sampler().

The plot() method visualizes results, comparing the “observed” values \(x_i\) (the initial wOBA estimates) against the empirical Bayes posterior mean estimates \(\hat{\theta}_i\):

plot(fit_normal)

The dashed line shows the diagonal \(x = y\), which makes shrinkage effects clearly visible. In particular, the most extreme wOBAs on either end of the spectrum are strongly shrunk towards the league average (around .300).

Since plot() returns a “ggplot” object (Wickham 2016), the plot can conveniently be customized using ggplot2 syntax. For example, one can vary the color of the points by the number of plate appearances:

plot(fit_normal) +
  geom_point(aes(color = sqrt(wOBA$PA))) +
  labs(x = "wOBA", y = "EB estimate of true wOBA skill", 
       color = expression(sqrt(PA))) +
  scale_color_gradient(low = "blue", high = "red")

By varying the color of points, we see that the wOBA estimates with higher standard errors or fewer plate appearances (blue points) tend to be shrunk toward the league average much more strongly than wOBAs from hitters with many plate appearances (red points).

Above, we used head() to view data for the first 6 hitters in the dataset. Let’s now see what the EBNM analysis suggests might be their “true” wOBA skill. To examine the results more closely, we use the fitted() method, which returns a posterior summary for each hitter:

print(head(fitted(fit_normal)), digits = 3)
#>                 mean     sd
#> Khalil Lee     0.303 0.0287
#> Chadwick Tromp 0.308 0.0286
#> Otto Lopez     0.310 0.0283
#> James Outman   0.311 0.0282
#> Matt Carpenter 0.339 0.0254
#> Aaron Judge    0.394 0.0184

The wOBA estimates of the first four ballplayers are shrunk strongly toward the league average, reflecting the fact that these players had very few plate appearances (and indeed, we were not swayed by their very high initial wOBA estimates).

Carpenter had many more plate appearances (154) than these other four players, but according to this model we should remain skeptical about his strong performance; after factoring in the prior, we judge his “true” performance to be much closer to the league average, downgrading an initial estimate of .472 to the final posterior mean estimate of .339.

Comparing normal and unimodal prior families

Judge’s “true” wOBA is also estimated to be much lower (.394) than the initial estimate (.458) despite sustaining a high level of production over a full season (696 PAs). For this reason, one might ask whether a prior that is more flexible than the normal prior—that is, a prior that can better adapt to “outliers” like Judge—might produce a different result. The ebnm package is very well suited to answering this question. For example, to obtain results using the family of all unimodal distributions rather than the family of normal distributions, we only need to change prior_family from "normal" to "unimodal":

fit_unimodal <- ebnm(x, s, prior_family = "unimodal", mode = "estimate")

It is straightforward to produce a side-by-side visualization of the fitted models simply by including both models as arguments to the plot() method (we also use the subset argument to focus on the results for Judge and other players with the most plate appearances):

top50 <- order(wOBA$PA, decreasing = TRUE)
top50 <- top50[1:50]
plot(fit_normal, fit_unimodal, subset = top50)

This plot illustrates the ability of the unimodal prior to better adapt to the data: wOBA estimates for players with a lot of plate appearances are not adjusted quite so strongly toward the league average. To compare in more detail, we see for example that Judge’s wOBA estimate from the model with the unimodal prior (the "mean2" column) remains much closer to the original wOBA estimate:

dat <- cbind(wOBA[, c("PA","x")],
             fitted(fit_normal),
             fitted(fit_unimodal))
names(dat) <- c("PA", "x", "mean1", "sd1", "mean2", "sd2")
print(head(dat), digits = 3)
#>                 PA     x mean1    sd1 mean2    sd2
#> Khalil Lee       2 1.036 0.303 0.0287 0.302 0.0277
#> Chadwick Tromp   4 0.852 0.308 0.0286 0.307 0.0306
#> Otto Lopez      10 0.599 0.310 0.0283 0.310 0.0315
#> James Outman    16 0.584 0.311 0.0282 0.311 0.0318
#> Matt Carpenter 154 0.472 0.339 0.0254 0.355 0.0430
#> Aaron Judge    696 0.458 0.394 0.0184 0.439 0.0155

Carpenter’s wOBA estimate is also higher under the more flexible unimodal prior, but is still adjusted much more than Judge’s in light of Carpenter’s smaller sample size. It is also interesting that the unimodal prior assigns greater uncertainty (the "sd2" column) to this estimate compared to the normal prior.

Recall that the two normal means models differ only in the priors used, so we can understand the differences in the shrinkage behavior of these models by inspecting the priors. Calling plot() with incl_cdf = TRUE shows the cumulative distribution functions (CDFs) of the fitted priors \(\hat{g}\). Since we are particularly interested in understanding the differences in shrinkage behaviour for the largest wOBAs such as Judge’s, we create a second plot that zooms in on wOBAs over .350:

library(cowplot)
p1 <- plot(fit_normal, fit_unimodal, incl_cdf = TRUE, incl_pm = FALSE) +
  xlim(c(.250, .350)) +
  guides(color = "none")
p2 <- plot(fit_normal, fit_unimodal, incl_cdf = TRUE, incl_pm = FALSE) +
  lims(x = c(.350, .450), y = c(0.95, 1))
plot_grid(p1, p2, nrow = 1, ncol = 2, rel_widths = c(3,5))

The plot on the right shows that the fitted normal prior has almost no mass on wOBAs above .400, explaining why Judge’s wOBA estimate is shrunk so strongly toward the league average, whereas the unimodal prior is flexible enough to permit larger posterior estimates above .400.

The posterior means and standard errors returned from the ebnm() call cannot be used to obtain credible intervals (except for the special case of the normal prior). Therefore, we provide additional methods confint() and quantile() which return, respectively, credible intervals (or more precisely, highest posterior density intervals: E. P. Box and Tiao (1965), Chen and Shao (1999)) and posterior quantiles for each observation. These are implemented using Monte Carlo techniques, which can be slow for large data sets, so credible intervals are not computed by default. The following code computes 80% highest posterior density (HPD) intervals for the EBNM model with unimodal prior. (We add a Monte Carlo sampler using function ebnm_add_sampler(); alternatively, we could have added a sampler in our initial calls to ebnm() by specifying output = output_all().) We set a seed for reproducibility:

fit_unimodal <- ebnm_add_sampler(fit_unimodal)
set.seed(1)
print(head(confint(fit_unimodal, level = 0.8)), digits = 3)
#>                CI.lower CI.upper
#> Khalil Lee        0.277    0.328
#> Chadwick Tromp    0.277    0.334
#> Otto Lopez        0.277    0.336
#> James Outman      0.277    0.335
#> Matt Carpenter    0.277    0.389
#> Aaron Judge       0.428    0.458

Interestingly, the 80% credible interval for Carpenter is very wide, and shares the same lower bound as the first four ballplayers with very few plate appearances.

Nonparametric prior families and advanced usage

Above, we demonstrated how the ebnm package makes it is easy to perform EBNM analyses with different types of priors, then compared results across two different choices of prior family. Each of these families makes different assumptions about the data which, a priori, may be more or less plausible. An alternative to prior families that make specific assumptions about the data is to use the prior family that contains all distributions \(\mathcal{G}_{\mathrm{npmle}}\), which is in a sense “assumption free.” Here we re-analyze the wOBA data set to illustrate the use of this prior family. Note that although nonparametric priors require specialized computational techniques, switching to a nonparametric prior is seamless in ebnm, as these implementation details are hidden. Similar to above, we need only make a single change to the prior_family argument:

fit_npmle <- ebnm(x, s, prior_family = "npmle")

(Note that because the family \(\mathcal{G}_{\mathrm{npmle}}\) is not unimodal, the mode = "estimate" option is not relevant here.)

Although the implementation details are hidden by default, it can sometimes be helpful to see what is going on “behind the scenes,” particularly for flagging or diagnosing issues. By default, ebnm uses the mixsqp package (Kim et al. 2020) to fit the NPMLE \(\hat{g} \in \mathcal{G}_{\mathrm{npmle}}\). We can monitor convergence of the mix-SQP optimization algorithm by setting the verbose control argument to TRUE:

fit_npmle <- ebnm(x, s, prior_family = "npmle", 
                  control = list(verbose = TRUE))
#> Running mix-SQP algorithm 0.3-48 on 688 x 95 matrix
#> convergence tol. (SQP):     1.0e-08
#> conv. tol. (active-set):    1.0e-10
#> zero threshold (solution):  1.0e-08
#> zero thresh. (search dir.): 1.0e-14
#> l.s. sufficient decrease:   1.0e-02
#> step size reduction factor: 7.5e-01
#> minimum step size:          1.0e-08
#> max. iter (SQP):            1000
#> max. iter (active-set):     20
#> number of EM iterations:    10
#> Computing SVD of 688 x 95 matrix.
#> Matrix is not low-rank; falling back to full matrix.
#> iter        objective max(rdual) nnz stepsize max.diff nqp nls
#>    1 +9.583407733e-01  -- EM --   95 1.00e+00 6.08e-02  --  --
#>    2 +8.298700300e-01  -- EM --   95 1.00e+00 2.87e-02  --  --
#>    3 +7.955308369e-01  -- EM --   95 1.00e+00 1.60e-02  --  --
#>    4 +7.819858634e-01  -- EM --   68 1.00e+00 1.05e-02  --  --
#>    5 +7.753787534e-01  -- EM --   53 1.00e+00 7.57e-03  --  --
#>    6 +7.717040208e-01  -- EM --   49 1.00e+00 5.73e-03  --  --
#>    7 +7.694760705e-01  -- EM --   47 1.00e+00 4.48e-03  --  --
#>    8 +7.680398878e-01  -- EM --   47 1.00e+00 3.58e-03  --  --
#>    9 +7.670690681e-01  -- EM --   44 1.00e+00 2.91e-03  --  --
#>   10 +7.663865515e-01  -- EM --   42 1.00e+00 2.40e-03  --  --
#>    1 +7.658902386e-01 +6.493e-02  39  ------   ------   --  --
#>    2 +7.655151741e-01 +5.328e-02  19 1.00e+00 1.29e-01  20   1
#>    3 +7.627792563e-01 +6.432e-03   7 1.00e+00 1.74e-01  20   1
#>    4 +7.626270812e-01 +5.897e-05   7 1.00e+00 3.23e-01   7   1
#>    5 +7.626270755e-01 -1.335e-08   7 1.00e+00 4.67e-04   2   1
#> Optimization took 0.03 seconds.
#> Convergence criteria met---optimal solution found.

This output shows no issues with convergence of the optimization algorithm; the mix-SQP algorithm converged to the solution (up to numerical rounding error) in only six iterations. In some cases, convergence issues can arise when fitting nonparametric models to large or complex data sets, and revealing the details of the optimization can help to pinpoint these issues.

Next, we visually compare the three fits obtained so far:

plot(fit_normal, fit_unimodal, fit_npmle, incl_cdf = TRUE, subset = top50)

As before, estimates largely agree, differing primarily at the tails. Both the unimodal prior family and the NPMLE are sufficiently flexible to avoid the strong shrinkage behavior of the normal prior family.

Fits can be compared quantitatively using the logLik() method, which, in addition to the log likelihood for each model, usefully reports the number of free parameters (i.e., degrees of freedom):

logLik(fit_unimodal)
#> 'log Lik.' 992.6578 (df=40)
logLik(fit_npmle)
#> 'log Lik.' 994.193 (df=94)

A nonparametric prior \(\mathcal{G}\) is approximated by \(K\) mixture components on a fixed grid, with the mixture proportions to be estimated. We can infer from the above output that \(\mathcal{G}_\text{npmle}\) has been approximated as a family of mixtures over a grid of \(K = 95\) point masses spanning the range of the data. (The number of degrees of freedom is one fewer than \(K\) because the mixture proportions must always sum to 1, which removes one degree of freedom from the estimation of \({\boldsymbol\pi}\).)

The default behaviour for nonparametric prior families is to choose \(K\) such that the likelihood obtained using estimate \(\hat{g}\) should be (on average) within one log-likelihood unit of the optimal estimate from among the entire nonparametric family \(\mathcal{G}\) (see Willwerscheid 2021). Thus, a finer approximating grid should not yield a large improvement in the log-likelihood. We can check this by using ebnm_scale_npmle() to create a finer grid:

scale_npmle <- ebnm_scale_npmle(x, s, KLdiv_target = 0.001/length(x), 
                                max_K = 1000)
fit_npmle_finer <- ebnm_npmle(x, s, scale = scale_npmle)
logLik(fit_npmle)
#> 'log Lik.' 994.193 (df=94)
logLik(fit_npmle_finer)
#> 'log Lik.' 994.2703 (df=528)

As the theory predicts, a much finer grid, with \(K = 529\), results in only a modest improvement in the log-likelihood. ebnm provides similar functions to customize grids for unimodal and normal scale mixture prior families.

One potential issue with the NPMLE is that, since it is discrete (as the above CDF plot makes apparent), observations are variously shrunk towards one of the support points, which can result in poor interval estimates. For illustration, we calculate 10% and 90% quantiles:

fit_npmle <- ebnm_add_sampler(fit_npmle)
print(head(quantile(fit_npmle, probs = c(0.1, 0.9))), digits = 3)
#>                  10%   90%
#> Khalil Lee     0.276 0.342
#> Chadwick Tromp 0.276 0.342
#> Otto Lopez     0.276 0.342
#> James Outman   0.276 0.342
#> Matt Carpenter 0.309 0.430
#> Aaron Judge    0.419 0.430

Each credible interval bound is constrained to lie at one of the support points of the NPMLE \(\hat{g}\). The interval estimate for Judge strikes us as far too narrow. Indeed, the NPMLE can sometimes yield degenerate interval estimates:

confint(fit_npmle, level = 0.8, parm = "Aaron Judge")
#>              CI.lower  CI.upper
#> Aaron Judge 0.4298298 0.4298298

To address this issue, the deconvolveR package (Narasimhan and Efron 2020) uses a penalized likelihood that encourages “smooth” priors \(\hat{g}\); that is, priors \(\hat{g}\) for which few of the mixture proportions are zero:

fit_deconv <- ebnm_deconvolver(x / s, output = ebnm_output_all()) 
plot(fit_deconv, incl_cdf = TRUE, incl_pm = FALSE)

Note, however, that the “true” means \(\theta\) being estimated are \(z\)-scores rather than raw wOBA skill, as package deconvolveR fits a model to \(z\)-scores rather than observations and associated standard errors. While this is reasonable in many settings, it does not seem appropriate for the present analysis:

print(head(quantile(fit_deconv, probs = c(0.1, 0.9)) * s), digits = 3)
#>                  10%   90%
#> Khalil Lee     0.400 2.000
#> Chadwick Tromp 0.563 1.127
#> Otto Lopez     0.354 0.796
#> James Outman   0.412 0.742
#> Matt Carpenter 0.413 0.531
#> Aaron Judge    0.406 0.459

These interval estimates do not match our basic intuitions; for example, a wOBA over .600 has never been sustained over a full season.

References

Bhadra, Anindya, Jyotishka Datta, Nicholas G. Polson, and Brandon Willard. 2019. “Lasso Meets Horseshoe: A Survey.” Statistical Science 34 (3): 405–27.
Brown, Lawrence D. 2008. “In-Season Prediction of Batting Averages: A Field Test of Empirical Bayes and Bayes Methodologies.” Annals of Applied Statistics 2 (1): 113–52.
Chen, Ming-Hui, and Qi-Man Shao. 1999. Monte Carlo Estimation of Bayesian Credible and HPD Intervals.” Journal of Computational and Graphical Statistics 8 (1): 69–92.
E. P. Box., and George C. Tiao. 1965. “Multiparameter Problems from a Bayesian Point of View.” Annals of Mathematical Statistics 36 (5): 1468–82.
Efron, Bradley. 2010. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Vol. 1. Institute of Mathematical Statistics Monographs. Cambridge, UK: Cambridge University Press.
Efron, Bradley, and Carl Morris. 1972. “Limiting the Risk of Bayes and Empirical Bayes Estimators—Part II: The Empirical Bayes Case.” Journal of the American Statistical Association 67 (337): 130–39.
Gu, Jiaying, and Roger Koenker. 2017. “Empirical Bayesball Remixed: Empirical Bayes Methods for Longitudinal Data.” Journal of Applied Econometrics 32 (3): 575–99.
Jiang, Wenhua, and Cun-Hui Zhang. 2010. “Empirical Bayes In-Season Prediction of Baseball Batting Averages.” In Borrowing Strength: Theory Powering Applications—a Festschrift for Lawrence D. Brown, 6:263–73. Institute of Mathematical Statistics Collections. Beachwood, OH: Institute of Mathematical Statistics.
Johnstone, Iain. 2019. “Gaussian Estimation: Sequence and Wavelet Models.” https://imjohnstone.su.domains/.
Judge, Jonathan. 2019. “Entirely Beyond WOWY: A Breakdown of DRC+.” Baseball Prospectus. https://www.baseballprospectus.com/news/article/48293/entirely-beyond-wowy-a-breakdown-of-drc/.
Kim, Youngseok, Peter Carbonetto, Matthew Stephens, and Mihai Anitescu. 2020. “A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions Using Sequential Quadratic Programming.” Journal of Computational and Graphical Statistics 29 (2): 261–73.
Morris, Carl N. 1983. “Parametric Empirical Bayes Inference: Theory and Applications.” Journal of the American Statistical Association 78 (381): 47–55.
Narasimhan, Balasubramanian, and Bradley Efron. 2020. “deconvolveR: A g-Modeling Program for Deconvolution and Empirical Bayes Estimation.” Journal of Statistical Software 94 (11): 1–20.
Robbins, Herbert. 1951. “Asymptotically Subminimax Solutions of Compound Statistical Decision Problems.” In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1951, Vol. II, 131–49. University of California Press, Berkeley; Los Angeles, CA.
———. 1956. “An Empirical Bayes Approach to Statistics.” In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1956, Vol. I, 157–63. University of California Press, Berkeley; Los Angeles, CA.
Robinson, David. 2017. “Introduction to Empirical Bayes: Examples from Baseball Statistics.” https://github.com/dgrtwo/empirical-bayes-book.
Sharpe, Sam. 2019. “An Introduction to Expected Weighted On-Base Average (xwOBA).” MLB Technology Blog. https://technology.mlblogs.com/an-introduction-to-expected-weighted-on-base-average-xwoba-29d6070ba52b.
Stephens, Matthew. 2017. “False Discovery Rates: A New Deal.” Biostatistics 18 (2): 275–94.
Sun, Lei. 2020. “Topics on Empirical Bayes Normal Means.” PhD thesis, Chicago, IL: University of Chicago.
Tango, T. M., M. G. Lichtman, and A. E. Dolphin. 2006. The Book: Playing the Percentages in Baseball. TMA Press.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer-Verlag.
Willwerscheid, Jason. 2021. “Empirical Bayes Matrix Factorization: Methods and Applications.” PhD thesis, Chicago, IL: University of Chicago.

Session information

The following R version and packages were used to generate this vignette:

sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] cowplot_1.1.1 ggplot2_3.3.6 ebnm_1.1-2   
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.1   xfun_0.36          bslib_0.3.1        ashr_2.2-57       
#>  [5] purrr_0.3.4        splines_3.6.2      lattice_0.20-38    colorspace_1.4-1  
#>  [9] vctrs_0.3.8        generics_0.0.2     htmltools_0.5.4    yaml_2.2.0        
#> [13] utf8_1.1.4         rlang_1.0.6        mixsqp_0.3-48      jquerylib_0.1.4   
#> [17] pillar_1.6.2       withr_2.5.0        glue_1.4.2         DBI_1.1.0         
#> [21] RColorBrewer_1.1-2 trust_0.1-8        lifecycle_1.0.3    stringr_1.4.0     
#> [25] munsell_0.5.0      gtable_0.3.0       evaluate_0.14      labeling_0.3      
#> [29] knitr_1.37         fastmap_1.1.0      invgamma_1.1       irlba_2.3.3       
#> [33] fansi_0.4.0        highr_0.8          Rcpp_1.0.8         scales_1.1.0      
#> [37] horseshoe_0.2.0    jsonlite_1.7.2     truncnorm_1.0-8    farver_2.0.1      
#> [41] deconvolveR_1.2-1  digest_0.6.23      stringi_1.4.3      dplyr_1.0.7       
#> [45] grid_3.6.2         cli_3.5.0          tools_3.6.2        magrittr_2.0.1    
#> [49] sass_0.4.0         tibble_3.1.3       crayon_1.4.1       pkgconfig_2.0.3   
#> [53] ellipsis_0.3.2     Matrix_1.3-4       SQUAREM_2017.10-1  assertthat_0.2.1  
#> [57] rmarkdown_2.21     R6_2.4.1           compiler_3.6.2