dfba_beta_descriptive

library(DFBA)

1 Overview

The beta distribution is an important probability model in both theoretical and applied statistics. It is especially important in Bayesian models of categorical data, which underlie a number of the nonparametric procedures in the DFBA package. The beta is a univariate continuous probability distribution on the \([0,\,1]\) interval. Its probability density \(f(x)\) is a function of two shape parameters, which we will denote as \(a\) and \(b\). The shape parameters can be integers or non-integer real values, provided that they are finite and greater than zero. The probability density function for a beta distribution is

\[\begin{equation} f(x) = \begin{cases} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}, & 0 \le x \le 1,\ a>0,\ b>0 \\ 0, & \text{elsewhere} \end{cases} \tag{1.1} \end{equation}\]
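As a quick check on Equation 1.1, the density can be evaluated term by term and compared with dbeta() from the stats package. This is only a sketch: beta_pdf() is a hypothetical helper written for this illustration and is not part of DFBA.

```r
# Evaluate Equation 1.1 directly and compare it with stats::dbeta().
# beta_pdf() is a hypothetical helper for this sketch, not a DFBA function.
beta_pdf <- function(x, a, b) {
  stopifnot(a > 0, b > 0, is.finite(a), is.finite(b))
  # Normalization constant Gamma(a + b) / (Gamma(a) * Gamma(b))
  norm_const <- gamma(a + b) / (gamma(a) * gamma(b))
  inside <- x >= 0 & x <= 1
  dens <- numeric(length(x))  # density is zero outside [0, 1]
  dens[inside] <- norm_const * x[inside]^(a - 1) * (1 - x[inside])^(b - 1)
  dens
}

x_grid <- c(-0.2, 0.25, 0.5, 0.85, 1.3)
by_hand <- beta_pdf(x_grid, a = 17, b = 3)
built_in <- dbeta(x_grid, shape1 = 17, shape2 = 3)

# TRUE when the two evaluations agree to numerical tolerance
all.equal(by_hand, built_in)
```

For large shape parameters gamma() overflows, so a more careful version would work on the log scale with lgamma(); the direct form is kept here because it mirrors the equation.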

For a given beta distribution, the \(a\) and \(b\) parameters are fixed values, so the term \(\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\) is a normalization constant that assures that the cumulative probability \(F(x)=\int_{0}^{x}f(t)\,dt\) reaches \(1\) at \(x=1\); that is, the total probability over all values of \(x\) is \(1\).¹ The mean of the beta distribution is equal to \(\frac{a}{a+b}\). The mode of the distribution is \(\frac{a-1}{a+b-2}\), so long as \(a>1\) and \(b>1\) (Johnson, Kotz, & Balakrishnan, 1995). When either (1) \(a = b = 1\), (2) \(a < 1\), or (3) \(b < 1\), the mode is undefined. The variance of the distribution is \(\frac{ab}{(a+b)^2(a+b+1)}\).
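The closed-form expressions above can be checked numerically with base R. The sketch below uses the shape parameters \(a=17\) and \(b=3\) from Example 1; the variable names are illustrative only.

```r
# Closed-form mean, mode, and variance of a beta distribution,
# checked against numerical integration of the density.
a <- 17
b <- 3

beta_mean <- a / (a + b)
beta_mode <- (a - 1) / (a + b - 2)  # valid here because a > 1 and b > 1
beta_var  <- (a * b) / ((a + b)^2 * (a + b + 1))

# The normalization constant makes the density integrate to 1
total_prob <- integrate(dbeta, lower = 0, upper = 1,
                        shape1 = a, shape2 = b)$value

# Numerical mean and variance from the density itself
num_mean <- integrate(function(x) x * dbeta(x, a, b), 0, 1)$value
num_var  <- integrate(function(x) (x - num_mean)^2 * dbeta(x, a, b), 0, 1)$value

round(c(mean = beta_mean, mode = beta_mode, variance = beta_var), 7)
c(total_prob = total_prob, num_mean = num_mean, num_var = num_var)
```

The closed-form values agree with the mean, mode, and variance that dfba_beta_descriptive() reports for these parameters in Example 1.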

The purpose of the dfba_beta_descriptive() function is to provide centrality and interval estimates as well as to provide an easy way to see displays of both the probability density function and the cumulative probability function. The function provides information on properties of the beta distribution that are important for doing Bayesian inference, and supplements the dbeta(), pbeta(), qbeta(), and rbeta() functions included in the stats package. The dfba_beta_descriptive() function is also called by several of the other functions in the DFBA package.

The dfba_beta_descriptive() function provides the mean, median, mode, and variance for a beta variate in terms of the two shape parameters of the beta distribution. The mean and median are always provided, but, as noted above, there are conditions under which the mode is not defined. For example, when \(a=b=1\), the beta distribution is a flat density function on the \([0,~1]\) interval, so there is no mode. There is also no proper mode when \(0<a<1\), when \(0<b<1\), or when both shape parameters are less than \(1\); in these cases the density function diverges at one or both end points. The dfba_beta_descriptive() function reports the modal value as NA whenever the mode is not properly defined.
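The cases without a proper mode can be seen directly from the density. The following sketch uses only dbeta() from the stats package; the parameter values are arbitrary illustrations.

```r
# Case 1: a = b = 1 gives a flat density, so no point is more likely than another
flat <- dbeta(c(0, 0.25, 0.5, 0.75, 1), shape1 = 1, shape2 = 1)
flat

# Case 2: both shape parameters below 1; the density diverges at both end points
both_small <- dbeta(c(0, 1), shape1 = 0.5, shape2 = 0.5)
both_small

# Case 3: only a < 1; the density diverges at x = 0 and is zero at x = 1
a_small <- dbeta(c(0, 1), shape1 = 0.5, shape2 = 2)
a_small
```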

In addition to centrality and variance estimates, the dfba_beta_descriptive() function provides two interval estimates for the beta variate. Each interval contains a stipulated probability between its lower and upper limits; for both intervals, the default probability is \(95\%\). One interval estimate has equal-tail probabilities (i.e., the probability below the lower limit is equal to the probability above the upper limit). The other interval estimate is the narrowest interval that contains the stipulated probability; this interval estimate is called the highest-density interval.
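To make the two definitions concrete, both intervals can be approximated with base R. This is a minimal sketch, not the DFBA implementation: it assumes a unimodal beta distribution and finds the highest-density interval by minimizing the interval width over the lower-tail probability with optimize(). The names interval_width and best are illustrative.

```r
a <- 17
b <- 3
prob_interval <- 0.95
tail_prob <- 1 - prob_interval

# Equal-tail interval: the same probability lies outside each limit
equal_tail <- qbeta(c(tail_prob / 2, 1 - tail_prob / 2), a, b)

# Highest-density interval: among all intervals holding prob_interval,
# pick the narrowest. Each candidate is indexed by its lower-tail probability.
interval_width <- function(lower_tail) {
  qbeta(lower_tail + prob_interval, a, b) - qbeta(lower_tail, a, b)
}
best <- optimize(interval_width, interval = c(0, tail_prob), tol = 1e-10)
hdi <- qbeta(c(best$minimum, best$minimum + prob_interval), a, b)

equal_tail
hdi
c(equal_tail_width = diff(equal_tail), hdi_width = diff(hdi))
```

For these parameters the limits should be close to the values reported in Example 1, with the highest-density interval the narrower of the two.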

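The dfba_beta_descriptive() function has three arguments:

a: the first shape parameter of the beta distribution
b: the second shape parameter of the beta distribution
prob_interval: the probability contained within the interval estimates (default is \(.95\))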

2 Examples

2.1 Example 1

The first example employs the default value of \(.95\) for the prob_interval argument, and it examines the case where the first and second beta shape parameters are, respectively, \(17\) and \(3\). The code for this example is

dfba_beta_descriptive(a = 17, 
                      b = 3)
#> Centrality Estimates 
#> ========================
#>   Mean            Median          Mode 
#>   0.85            0.861729        0.8888889 
#>  
#> Spread Estimate 
#> ========================
#>   Variance   
#>   0.00607143 
#> 
#>   Interval Estimates 
#> ========================
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.668623        0.9661738 
#>   95% Highest-density interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.697388        0.9801174 
#> 

Note that because \(a>b\), the distribution has central point estimates greater than \(.5\). Also note that the two \(95\%\) interval estimates are different. The highest-density interval is narrower because it is not constrained to have equal probabilities of \(.025\) outside each limit.

The plot() method generates plots of the probability density function and the cumulative probability function:

plot(dfba_beta_descriptive(a = 17,
                           b = 3))

The list returned by dfba_beta_descriptive() also contains a data frame of \(x\), \(f(x)\), and \(F(x)\) values, should the user wish to create alternative displays:

x <- dfba_beta_descriptive(a = 17,
                           b = 3)$outputdf

head(x)
#>       x      density cumulative_prob
#> 1 0.000 0.000000e+00    0.000000e+00
#> 2 0.005 4.391484e-34    1.292334e-37
#> 3 0.010 2.849151e-29    1.677853e-32
#> 4 0.015 1.852583e-26    1.637400e-29
#> 5 0.020 1.829688e-24    2.157461e-27
#> 6 0.025 6.434198e-23    9.489049e-26

2.2 Example 2

Consider the case of a user who is interested in finding the \(90\%\) highest-density interval for a beta distribution where the shape parameters are \(31\) and \(20\):

x <- dfba_beta_descriptive(a = 31,
                           b = 20,
                           prob_interval = .90)
hdi <- c(x$hdi_lower,
         x$hdi_upper)

hdi
#> [1] 0.4969356 0.7196442
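The reported limits can be sanity-checked with base R. The sketch below copies the two limits printed above (rather than calling DFBA) and checks the properties expected of a highest-density interval: it should hold the stipulated probability, the density should be about the same at both limits, and it should be no wider than the equal-tail interval.

```r
a <- 31
b <- 20
prob_interval <- 0.90

# Limits copied from the output above
hdi <- c(0.4969356, 0.7196442)

# Probability contained between the limits
coverage <- diff(pbeta(hdi, a, b))

# Density at each limit; these should be nearly equal for a unimodal density
end_density <- dbeta(hdi, a, b)

# Equal-tail interval for comparison
tail_prob <- 1 - prob_interval
equal_tail <- qbeta(c(tail_prob / 2, 1 - tail_prob / 2), a, b)

coverage
end_density
c(hdi_width = diff(hdi), equal_tail_width = diff(equal_tail))
```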

3 References

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 1, New York: Wiley.


  1. The gamma function \(\Gamma(x)\) is the generalization of the factorial to positive real values. If \(x\) is a positive integer, then \(\Gamma(x)=(x-1)!\).