Almost every function in **ggdensity** accepts a
`method`

argument—this is true for `geom_hdr()`

and other layer functions (`geom_hdr_lines()`

,
`geom_hdr_points()`

, …), as well as `get_hdr()`

and `get_hdr_1d()`

. This vignette summarizes the many ways in
which the `method`

argument can be specified; first looking
at it from a more basic perspective, then from the perspective of a
developer wanting to implement additional estimators.

`method_*()`

functionsFirst, let’s load the necessary packages and generate some sample data.

```
library("ggdensity"); theme_set(theme_minimal(8))
#> Loading required package: ggplot2
theme_update(legend.position = "none") # Suppressing legends for readability
```

```
set.seed(1)
<- data.frame(x = rnorm(500), y = rnorm(500))
df <- ggplot(df, aes(x, y))
p + geom_point() p
```

The easiest way to plot HDRs with `geom_hdr()`

(or any
other layer function from **ggdensity**) with a specified
density estimator is to provide a character object to the
`method`

argument:

```
+ geom_hdr(method = "kde")
p
+ geom_hdr(method = "mvnorm")
p
+ geom_hdr(method = "histogram")
p
+ geom_hdr(method = "freqpoly") p
```

However, as of **ggdensity** v1.0.0 there is an
alternative approach—providing a `method_*()`

function
call:

```
+ geom_hdr(method = method_kde())
p
+ geom_hdr(method = method_mvnorm())
p
+ geom_hdr(method = method_histogram())
p
+ geom_hdr(method = method_freqpoly()) p
```

The default behaviors of these two approaches are the same and always
will be—in this way, they are completely interchangeable. However, the
`method_*()`

function call is required to estimate HDRs with
non-default estimator parameters. For example, we can set the
`adjust`

parameter to apply a multiplicative adjustment to
the heuristically determined bandwidth in `method_kde()`

(which itself uses the one computed by
`MASS::bandwidth.nrd()`

):

`+ geom_hdr(method = method_kde(adjust = 1/2)) p `

The relevant parameters for each method are documented in their
respective `?method_*`

help pages. Note that these parameters
can not be provided to `geom_hdr()`

or
`stat_hdr()`

and thus are not accessible if a character value
is provided to `method`

.

The `method`

argument of `get_hdr()`

functions
in the same way:

```
<- get_hdr(df, method = method_kde(adjust = 1/2))
res
str(res)
#> List of 3
#> $ df_est:'data.frame': 10000 obs. of 5 variables:
#> ..$ x : num [1:10000] -3.01 -2.94 -2.87 -2.8 -2.73 ...
#> ..$ y : num [1:10000] -3 -3 -3 -3 -3 ...
#> ..$ fhat : num [1:10000] 4.72e-17 1.30e-15 2.88e-14 5.16e-13 7.44e-12 ...
#> ..$ fhat_discretized: num [1:10000] 2.00e-19 5.50e-18 1.22e-16 2.18e-15 3.15e-14 ...
#> ..$ hdr : num [1:10000] 1 1 1 1 1 1 1 1 1 1 ...
#> $ breaks: Named num [1:5] 0.00422 0.01273 0.03024 0.07544 Inf
#> ..- attr(*, "names")= chr [1:5] "99%" "95%" "80%" "50%" ...
#> $ data :'data.frame': 500 obs. of 3 variables:
#> ..$ x : num [1:500] -0.626 0.184 -0.836 1.595 0.33 ...
#> ..$ y : num [1:500] 0.0773 -0.2969 -1.1832 0.0113 0.9916 ...
#> ..$ hdr_membership: num [1:500] 0.5 0.5 0.8 0.8 0.5 0.95 0.8 0.5 0.5 0.5 ...
```

For details on the output of `get_hdr()`

, see
`?get_hdr`

.

`method_*_1d()`

functionsIn **ggdensity**, it is possible to estimate and plot
1-dimensional HDRs with `geom_hdr_rug()`

and
`get_hdr_1d()`

. These functions also accept a
`method`

argument, but they do not accept the previously
discussed `method_*()`

functions. Instead they accept the
1-dimensional analogues: `method_*_1d()`

.

```
+
p geom_point() +
geom_hdr_rug(method = method_kde_1d())
+
p geom_point() +
geom_hdr_rug(method = method_norm_1d())
+
p geom_point() +
geom_hdr_rug(method = method_histogram_1d())
+
p geom_point() +
geom_hdr_rug(method = method_freqpoly_1d())
```

Just like we saw with `geom_hdr()`

,
`geom_hdr_rug()`

also accepts character values for
`method`

:

```
+
p geom_point() +
geom_hdr_rug(method = "kde")
+
p geom_point() +
geom_hdr_rug(method = "norm")
+
p geom_point() +
geom_hdr_rug(method = "histogram")
+
p geom_point() +
geom_hdr_rug(method = "freqpoly")
```

Because the return values of the `method_*()`

functions
are incompatible with the 1-dimensional HDR estimation procedure, if a
2-dimensional method is specified the following error message is
issued:

```
+
p geom_point() +
geom_hdr_rug(method = method_kde())
#> Warning: Computation failed in `stat_hdr_rug()`
#> Caused by error in `get_hdr_1d()`:
#> ! Invalid `method` argument -- did you forget the `_1d()`?
```

Lastly, we see that the `method`

argument of
`get_hdr_1d()`

behaves similarly.

```
<- get_hdr_1d(df$x, method = method_kde_1d())
res
str(res)
#> List of 3
#> $ df_est:'data.frame': 512 obs. of 4 variables:
#> ..$ x : num [1:512] -3.01 -2.99 -2.98 -2.97 -2.95 ...
#> ..$ fhat : num [1:512] 0.00728 0.00748 0.00769 0.00789 0.00809 ...
#> ..$ fhat_discretized: num [1:512] 9.73e-05 1.00e-04 1.03e-04 1.05e-04 1.08e-04 ...
#> ..$ hdr : num [1:512] 1 1 1 1 1 1 1 1 1 1 ...
#> $ breaks: Named num [1:5] 0.0188 0.0562 0.1601 0.3146 Inf
#> ..- attr(*, "names")= chr [1:5] "99%" "95%" "80%" "50%" ...
#> $ data :'data.frame': 500 obs. of 2 variables:
#> ..$ x : num [1:500] -0.626 0.184 -0.836 1.595 0.33 ...
#> ..$ hdr_membership: num [1:500] 0.5 0.5 0.8 0.95 0.5 0.8 0.5 0.8 0.5 0.5 ...
```

Again, for details on the above output of `get_hdr_1d()`

,
see `?get_hdr_1d`

.

`method_*()`

functionsNow that we understand the ways in which `method`

can be
specified let’s look at the internals of the `method_*()`

functions. Note: the implementations discussed in this section depend
heavily on topics in functional programming, especially closures
and function
factories. While not necessary, a good understanding of these ideas
is helpful—the linked chapters from Hadley Wickham’s *Advanced R*
are a great place to start.

Looking at the definition of `method_kde()`

, we see that
it is a function of `h`

and `adjust`

, returning a
closure with arguments `data`

, `n`

,
`rangex`

, and `rangey`

. The closure passes the
`x`

and `y`

columns of `data`

to
`MASS::kde2d()`

, returning the estimated density evaluated on
a grid with columns `x`

, `y`

, and
`fhat`

. This closure is what `geom_hdr()`

expects
as its `method`

argument, and is how the HDRs are estimated
(via `get_hdr()`

).

```
method_kdefunction (h = NULL, adjust = c(1, 1))
{function(data, n, rangex, rangey) {
if (is.null(h)) {
<- c(MASS::bandwidth.nrd(data$x), MASS::bandwidth.nrd(data$y))
h
}<- h * adjust
h <- MASS::kde2d(x = data$x, y = data$y, n = n,
kdeout h = h, lims = c(rangex, rangey))
<- with(kdeout, expand.grid(x = x, y = y))
df $fhat <- as.vector(kdeout$z)
df
df
}
}<bytecode: 0x5611c4b228b0>
<environment: namespace:ggdensity>
```

Both `method_histogram()`

and
`method_freqpoly()`

behave similarly, accepting parameters
governing the density estimation procedure and returning a closure with
arguments `data`

, `n`

, `rangex`

, and
`rangey`

. However, these functions are significantly more
complicated as the density estimation procedures are implemented
entirely in **ggdensity**.

`method_mvnorm()`

is different in a few ways. The closure
it returns is a function of just one argument: `data`

. This
is because it does not return the estimated density evaluated on a grid.
Instead, it returns yet another closure with (vectorized) arguments
`x`

and `y`

. As in `method_kde()`

, the
return value of the closure is a representation of the estimated pdf.
The difference is the manner in which the pdf is represented. Whereas
before we had a pdf defined by a discrete approximation on a grid, we
now have an explicit definition of the pdf in terms of `x`

and `y`

.

```
method_mvnormfunction ()
{function(data) {
<- with(data, cbind(x, y))
data_matrix <- colMeans(data_matrix)
mu_hat <- chol(cov(data_matrix))
R function(x, y) {
<- cbind(x, y)
X <- backsolve(R, t(X) - mu_hat, transpose = TRUE)
tmp <- -sum(log(diag(R))) - log(2 * pi) - 0.5 *
logretval colSums(tmp^2)
exp(logretval)
}
}
}<bytecode: 0x5611c4a98538>
<environment: namespace:ggdensity>
```

To summarize each of the above cases: in the first example, the
`method_*()`

function returned a closure with arguments
`data`

, `n`

, `rangex`

, and
`rangey`

which itself returned the estimated density
evaluated on a grid; in the second, the `method_*()`

function
returned a closure with a single argument, `data`

, which
itself returned a closure with arguments `x`

and
`y`

, representing the estimated density explicitly. In both
cases, the `method_*()`

function can have any number of
parameters governing the density estimation procedure.

These are the two ways the `method`

argument may be
specified. The first is necessary for cases in which an explicit
definition of the estimated density is not computationally feasible (for
example, KDEs). The second is an easier option for the cases in which a
closed form of the estimated density is available (for example,
parametric estimators).

Let’s look at how we might define our own `method_*()`

functions in each case, beginning with a simple parametric
estimator.

In **ggdensity**, `method_mvnorm()`

estimates
HDRs based on the parametric multivariate normal model. If we wanted to
fit a simpler model in which the data is further assumed to be
independent, we could implement `method_mvnorm_ind()`

.

```
<- function() {
method_mvnorm_ind
function(data) {
<- mean(data$x)
xbar <- mean(data$y)
ybar
<- sd(data$x)
sx <- sd(data$y)
sy
# joint pdf is simply the product of the marginals
function(x, y) dnorm(x, xbar, sx) * dnorm(y, ybar, sy)
}
}
```

To use our `method_mvnorm_ind()`

, we just need to supply
it to `geom_hdr()`

’s `method`

argument.

```
ggplot(df, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind())
```

If we transform our data to have non-zero covariance we still see the major and minor axes of the contours coincide with the plot axes—exactly what we would expect with this (incorrectly) constrained model.

```
<- matrix(c(
A 2*cos(pi/6), -2*sin(pi/6),
1*sin(pi/6), 1*cos(pi/6)
byrow = TRUE, ncol = 2)
),
<- as.data.frame(as.matrix(df) %*% A)
df_rot colnames(df_rot) <- c("x", "y")
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind()) +
geom_point(size = .4) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
```

Notice, `method_mvnorm_ind()`

accepts no arguments. The
density estimation procedure is so simple that there are no parameters
to govern it. To allow for circular models in which the fitted variances
are required to be equal, we can implement a `circular`

argument.

```
<- function(circular = FALSE) {
method_mvnorm_ind
function(data) {
<- mean(data$x)
xbar <- mean(data$y)
ybar
if (circular) {
<- sd(c(data$x - xbar, data$y - ybar))
sx <- sx
sy else {
} <- sd(data$x)
sx <- sd(data$y)
sy
}
function(x, y) dnorm(x, xbar, sx) * dnorm(y, ybar, sy)
}
}
```

Now, the contours are perfectly circular.

```
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind(circular = TRUE)) +
geom_point(size = .4) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
```

In the above plot, the upper and lower portions of the HDRs are cut
off. This is because the default behavior of **ggdensity**
is to not draw HDRs outside of the “bounding box” around observed data.
This is *not* because we are using a custom
`method_*()`

function. To fix this, we need to either set a
better `ylim`

value for `geom_hdr()`

or specify a
larger range in `scale_y_continuous()`

.

```
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind(circular = TRUE), ylim = c(-6, 6)) +
geom_point(size = .4) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
ggplot(df_rot, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind(circular = TRUE)) +
geom_point(size = .4) +
scale_y_continuous(limits = c(-6, 6)) +
coord_fixed(xlim = c(-6, 6), ylim = c(-6, 6))
```

Notice, neither of these approaches involve arguments to
`method_mvnorm_ind()`

. Internally, the closure returned by
`method_mvnorm_ind()`

is used by `get_hdr()`

,
along with information from the `scales`

associated with the
`ggplot`

object. It is the `scales`

that need
adjusting, not anything related to the `method`

argument.

To illustrate the other case, in which the object returned by the
closure is the estimated density evaluated on a grid, we implement
`method_mvnorm_ind_grid()`

. This estimates the same
independent normal density as `method_mvnorm_ind()`

, the only
difference is the behavior of the returned closure.

```
<- function() {
method_mvnorm_ind_grid
function(data, n, rangex, rangey) {
# First, we estimate the density -----------------------------
<- mean(data$x)
xbar <- mean(data$y)
ybar
<- sd(data$x)
sx <- sd(data$y)
sy
<- function(x, y) dnorm(x, xbar, sx) * dnorm(y, ybar, sy)
f_est
# Return the density evaluated on a grid ---------------------
# df_grid defined by rangex, rangey, and n
<- expand.grid(
df_grid x = seq(rangex[1], rangex[2], length.out = n),
y = seq(rangey[1], rangey[2], length.out = n)
)
$fhat <- f_est(df_grid$x, df_grid$y)
df_grid
df_grid
}
}
```

See that returned closure has additional arguments `n`

,
`rangex`

, and `rangey`

which define the grid.
Also, the grid is represented a `data.frame`

with columns
`x`

, `y`

, and `fhat`

, where
`fhat`

is the (potentially unnormalized) density
estimate.

Again, to use our `method_mvnorm_ind_grid()`

we provide it
to `geom_hdr()`

’s method argument.

```
ggplot(df, aes(x, y)) +
geom_hdr(method = method_mvnorm_ind_grid())
```

Like we saw in the previous example, we could prevent the HDRs from
being “cut off” by specifying either the `x/ylim`

arguments
in `geom_hdr()`

or by setting a larger range in
`scale_x/y_continuous()`

.

`method_*_1d()`

functionsWe saw before that **ggdensity** uses
`method_*_1d()`

functions for the estimation of 1-dimensional
densities. The internals of these functions are very similar to the
`method_*()`

functions, the only differences are slight
changes to the arguments and return values of the returned closures.

Looking at the definition of `method_kde_1d()`

, we see the
returned closure has arguments `x`

, `n`

, and
`range`

. This is very similar to `method_kde()`

,
the only difference is we are now dealing with univariate data: the
vector argument `x`

is used instead of `data`

, and
we have a single `range`

parameter instead of
`rangex`

and `rangey`

. Similarly, the closure now
returns the estimated density evaluated on a univariate grid, with
columns `x`

and `fhat`

instead of the bivariate
grid with columns `x`

, `y`

, and `fhat`

.
Finally, see that `method_kde_1d()`

accepts several arguments
governing the density estimation procedure just like
`method_kde()`

.

```
method_kde_1dfunction (bw = "nrd0", adjust = 1, kernel = "gaussian", weights = NULL,
window = kernel)
{function(x, n, range) {
<- length(x)
nx if (is.null(weights)) {
<- rep(1/nx, nx)
weights
}else {
<- normalize(weights)
weights
}<- stats::density(x, bw = bw, adjust = adjust, kernel = kernel,
dens weights = weights, window = window, n = n, from = range[1],
to = range[2])
data.frame(x = dens$x, fhat = dens$y)
}
}<bytecode: 0x5611c6f7a6d0>
<environment: namespace:ggdensity>
```

Estimated univariate densities can also be represented explicitly, as
illustrated by `method_norm_1d()`

. Comparing this to the
previously discussed `method_mvnorm()`

we see that little has
changed: the closure is now a function of a vector `x`

instead of `data`

and returns a function of one variable
(`x`

) instead of two (`x`

and `y`

).

```
method_norm_1dfunction ()
{function(x) {
<- mean(x)
mu_hat <- sd(x)
sigma_hat function(x) dnorm(x, mu_hat, sigma_hat)
}
}<bytecode: 0x5611c88c3b08>
<environment: namespace:ggdensity>
```

Additional `method_*_1d()`

functions can be implemented in
the same way as the 2-dimensional `method_*()`

functions, so
long as the returned closure is structured in one of the two ways we
have seen here.