Different domain types

The package offers several types of domains on which the multivariate distribution can be defined, namely "R", "R+", "uniform", "simplex", and "polynomial". Since the domain is one of the two building blocks that define a distribution, we first present a guide to creating a domain object, which is passed as the domain argument to the two main functions gen() and estimate().

Note that since the probability densities considered in the package are defined with respect to the Lebesgue measure, the package is indifferent to whether the boundary points are included in the domain or not (i.e. $\sum_ix_i^2<1$ and $\sum_ix_i^2\leq 1$ are treated equally), except for the simplex domains.

Throughout the demonstration in this section, we assume the number of covariates is p=5.

p <- 5

The entire real space

The most straightforward domain type is "R", which is the entire real space $\mathbb{R}^p$.

domain <- make_domain(type="R", p=p)
domain
#> $type
#> [1] "R"
#> 
#> $p
#> [1] 5
#> 
#> $p_deemed
#> [1] 5
#> 
#> $checked
#> [1] TRUE

The non-negative orthant of the real space

The second most commonly used domain type is probably the non-negative orthant of $\mathbb{R}^p$, namely $\mathbb{R}_+^p$. Constructing this domain is equally straightforward.

domain <- make_domain(type="R+", p=p)
domain
#> $type
#> [1] "R+"
#> 
#> $p
#> [1] 5
#> 
#> $p_deemed
#> [1] 5
#> 
#> $checked
#> [1] TRUE

The simplex domains

Another useful domain type offered by the package is the simplex. Formally, we define the $(p-1)$-dimensional simplex as $\{\boldsymbol{x}\in\mathbb{R}_+^p:\sum_{i=1}^p x_i=1, \boldsymbol{x}\succ\boldsymbol{0}\}$.

Defining this domain is also straightforward and requires no additional arguments:

domain <- make_domain(type="simplex", p=p)
domain
#> $type
#> [1] "simplex"
#> 
#> $p
#> [1] 5
#> 
#> $p_deemed
#> [1] 4
#> 
#> $checked
#> [1] TRUE
#> 
#> $simplex_tol
#> [1] 1e-10

The simplex_tol member is specific to this domain type; it is used internally to check whether each row of the data matrix indeed sums to 1, up to this tolerance. The simplex is also the only domain type whose p_deemed is one less than p; for all other domain types the two are equal. This is because the simplex is currently the only implemented domain that is a Lebesgue-null subset of $\mathbb{R}^p$: the constraint $\sum_{i=1}^p x_i=1$ determines $x_p$ from the other $p-1$ coordinates, so the domain is effectively $(p-1)$-dimensional.
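As an illustration of the kind of check simplex_tol is used for, the base-R sketch below tests whether each row of a matrix has positive entries summing to 1 up to a tolerance. The function rows_on_simplex() is hypothetical and only mimics the idea; it is not the package's internal code.

```r
# Hypothetical helper (not part of the package): TRUE for each row of x that has
# strictly positive entries summing to 1 up to tol (the role played by simplex_tol)
rows_on_simplex <- function(x, tol = 1e-10) {
  x <- as.matrix(x)
  all_positive <- apply(x > 0, 1, all)
  sums_to_one <- abs(rowSums(x) - 1) <= tol
  all_positive & sums_to_one
}

set.seed(1)
p <- 5
z <- matrix(rexp(3 * p), nrow = 3)
x <- z / rowSums(z)                # normalize each row so that it sums to 1
x_bad <- x
x_bad[2, 1] <- x_bad[2, 1] + 1e-6  # push row 2 off the simplex by more than tol
rows_on_simplex(x)                 # TRUE TRUE TRUE
rows_on_simplex(x_bad)             # TRUE FALSE TRUE
```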

Uniform-type domains

This domain type assumes that every component (covariate) has the same domain, which is a finite union of intervals. The lefts argument specifies the left endpoints of the intervals, and rights specifies the corresponding right endpoints. Formally, the domain is defined as $\left(\cup_{i}[\mathrm{lefts}_i,\mathrm{rights}_i]\right)^p$.

For example, if each covariate is assumed to be at least 1, the domain can be specified as follows.

domain <- make_domain(type="uniform", p=p, lefts=1, rights=Inf)
domain
#> $type
#> [1] "uniform"
#> 
#> $p
#> [1] 5
#> 
#> $p_deemed
#> [1] 5
#> 
#> $lefts
#> [1] 1
#> 
#> $rights
#> [1] Inf
#> 
#> $left_inf
#> [1] FALSE
#> 
#> $right_inf
#> [1] TRUE
#> 
#> $checked
#> [1] TRUE

Note again that we do not distinguish between open, closed, and half-open intervals, since the probability that the random vector lies on the boundary is assumed to be 0.

If rights is Inf and lefts is -Inf (respectively 0), the domain coincides with the "R" (respectively "R+") domain type, and make_domain() changes the domain type accordingly with a warning.

domain <- make_domain(type="uniform", p=p, lefts=-Inf, rights=Inf) # Changed to R
#> Warning in make_domain(type = "uniform", p = p, lefts = -Inf, rights = Inf):
#> Domain type automatically changed to R.
domain <- make_domain(type="uniform", p=p, lefts=0, rights=Inf) # Changed to R+
#> Warning in make_domain(type = "uniform", p = p, lefts = 0, rights = Inf): Domain
#> type automatically changed to R+.

Of course, the domain can also be bounded, e.g. $[-1,1]^p$.

domain <- make_domain(type="uniform", p=p, lefts=-1, rights=1) 
domain
#> $type
#> [1] "uniform"
#> 
#> $p
#> [1] 5
#> 
#> $p_deemed
#> [1] 5
#> 
#> $lefts
#> [1] -1
#> 
#> $rights
#> [1] 1
#> 
#> $left_inf
#> [1] FALSE
#> 
#> $right_inf
#> [1] FALSE
#> 
#> $checked
#> [1] TRUE

A more interesting case is when the common domain of each component is a union of several intervals, e.g. $((-\infty,-2]\cup[-1,1]\cup[2,+\infty))^p$.

domain <- make_domain(type="uniform", p=p, lefts=c(-Inf, -1, 2), rights=c(-2, 1, Inf)) 
domain
#> $type
#> [1] "uniform"
#> 
#> $p
#> [1] 5
#> 
#> $p_deemed
#> [1] 5
#> 
#> $lefts
#> [1] -Inf   -1    2
#> 
#> $rights
#> [1]  -2   1 Inf
#> 
#> $left_inf
#> [1] TRUE
#> 
#> $right_inf
#> [1] TRUE
#> 
#> $checked
#> [1] TRUE
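To make the definition $\left(\cup_{i}[\mathrm{lefts}_i,\mathrm{rights}_i]\right)^p$ concrete, the following base-R sketch checks which rows of a data matrix lie in such a domain, using the lefts and rights from the example above. The helper in_uniform_domain() is hypothetical and not part of the package; it counts boundary points as inside, which makes no difference for densities.

```r
# Hypothetical helper (not part of the package): TRUE for each row of x whose
# every coordinate lies in at least one interval [lefts[k], rights[k]]
in_uniform_domain <- function(x, lefts, rights) {
  x <- as.matrix(x)
  in_union <- matrix(FALSE, nrow(x), ncol(x))
  for (k in seq_along(lefts)) {
    in_union <- in_union | (x >= lefts[k] & x <= rights[k])
  }
  apply(in_union, 1, all)  # a row is in the domain only if all coordinates are
}

lefts <- c(-Inf, -1, 2)
rights <- c(-2, 1, Inf)
x <- rbind(c(-3, 0, 0.5, 2.5, 10),   # every coordinate is in the union
           c(-3, 0, 1.5, 2.5, 10))   # 1.5 falls in the gap (1, 2)
in_uniform_domain(x, lefts, rights)  # TRUE FALSE
```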

Solution for infinite unions of intervals or non-uniform domains for each component

Domains that are a union of infinitely many intervals are currently not supported, but in some cases they can be approximated by a finite union.

For example, if the goal is to generate samples using gen(), the infinite union $\cup_{i=0}^{\infty}[2i,2i+1]$ may be approximated by the finite union $\cup_{i=0}^{10}[2i,2i+1]$, as in the code below, assuming the joint density is negligible whenever some $x_j>21$.

# For random sample generation, truncate the infinite union at a bound large enough that the density beyond it is negligible
domain <- make_domain(type="uniform", p=p, lefts=seq(0, 20, by=2), rights=seq(1, 21, by=2))
# Generate 1000 samples from a Gaussian with eta = 0 and K = I, truncated to the domain above
x <- gen(1000, setting="gaussian", abs=FALSE, eta=rep(0,p), K=diag(p), domain=domain, finite_infinity=100, seed=2, burn_in=1000, thinning=1000, verbose=FALSE, remove_outofbound=TRUE)
hist(x, breaks=20) # The generated data stay far below the upper bound 21 set above

If the goal is estimation from a data matrix using estimate(), one may simply intersect the infinite union with $\left[-\max_{i,j}\left|x_{ij}\right|, \max_{i,j}\left|x_{ij}\right|\right]$, keeping only the finitely many intervals needed to cover the observed data, as below.

# For estimation, simply truncate the union at the maximum absolute value in x
max_i <- ceiling((max(abs(x)) - 1) / 2)
domain <- make_domain(type="uniform", p=p, lefts=seq(0, 2*max_i, by=2), rights=seq(1, 2*max_i+1, by=2))
# Estimate the inverse covariance matrix K with no penalty and no diagonal multiplier since n >> p, assuming mu = eta = 0
est <- estimate(x=x, setting="gaussian", domain=domain, centered = TRUE,
                mode="min_pow", param1=1, param2=3, lambda1s=0,
                diagonal_multiplier=1, verbose=FALSE, return_raw=TRUE)
est$raw_estimates[[1]] # Should be close to diag(p) we used to generate x
#>              [,1]        [,2]         [,3]         [,4]        [,5]
#> [1,]  1.037142623 -0.19835515 -0.003285077  0.258111089  0.06697515
#> [2,] -0.198355147  1.16963917 -0.096986651 -0.123145792  0.16352970
#> [3,] -0.003285077 -0.09698665  1.156527553 -0.009055624 -0.03244605
#> [4,]  0.258111089 -0.12314579 -0.009055624  1.267582275 -0.32215930
#> [5,]  0.066975147  0.16352970 -0.032446050 -0.322159301  0.84250419
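The formula max_i <- ceiling((max(abs(x)) - 1) / 2) above picks the index of the last interval $[2i,2i+1]$ needed to cover the largest observation. The short sketch below checks this arithmetic on a few made-up values lying in the union; they are illustrative stand-ins, not output of gen().

```r
# Made-up values lying in the union of [2i, 2i + 1] (stand-ins for the entries of x)
vals <- c(0.2, 2.7, 6.1, 12.9)
# Same formula as above: index of the last interval needed
max_i <- ceiling((max(abs(vals)) - 1) / 2)
lefts <- seq(0, 2 * max_i, by = 2)
rights <- seq(1, 2 * max_i + 1, by = 2)
max_i  # 6, so the last retained interval is [12, 13]
# Check that every value falls in one of the retained intervals
covered <- sapply(vals, function(v) any(v >= lefts & v <= rights))
all(covered)  # TRUE
```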

If each covariate has its own domain as a different union of intervals, refer to the polynomial-type domains below.

Polynomial-type domains

The most complicated and flexible domain type is "polynomial". Although we have tried to keep its definition simple, the exact rules and requirements may still seem confusing; the examples should make them easier to follow.

Each polynomial-type domain is defined by a set of inequalities. Each inequality compares a polynomial on the left-hand side to a constant on the right-hand side; the polynomial must not contain any interaction term and can have at most one term per covariate. (That is, inequalities such as $x_1x_2>1$ or $x_1^2+x_1^3>1$ are not yet supported in the current version.) If there is more than one inequality, the user must specify a logical rule using "&" and "|" that tells the program how to combine the domains defined by the individual inequalities.

Each term can be $\log(x)$, $\exp(nx)$ with $n$ a nonzero integer, or a rational power $x^{a/b}$ with $a$ and $b$ coprime integers. For $x\geq 0$ the power takes its usual value $|x|^{a/b}$; for $x<0$ it is defined as $(-1)^a|x|^{a/b}$ if $b$ is odd, and as NA if $b$ is even.
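The sign convention for rational powers may be easier to see in code. The sketch below is a hypothetical base-R helper, rational_power(), written only to illustrate the rule stated above; it is not the package's implementation. Since the rule assumes $a$ and $b$ coprime, the sketch first moves any sign of $b$ to $a$ and reduces the fraction, so that e.g. a power of -6/9 behaves like -2/3.

```r
# Greatest common divisor of two integers (Euclid's algorithm)
gcd <- function(a, b) {
  a <- abs(a); b <- abs(b)
  while (b != 0) { t <- b; b <- a %% b; a <- t }
  a
}

# Hypothetical helper (not part of the package): x^(a/b) under the convention above
rational_power <- function(x, a, b) {
  if (b == 0) stop("denominator b must be nonzero")
  if (b < 0) { a <- -a; b <- -b }  # move the sign to the numerator
  g <- gcd(a, b)
  a <- a / g; b <- b / g           # reduce to coprime a and b
  out <- abs(x)^(a / b)            # usual value for x >= 0
  neg <- which(x < 0)
  if (b %% 2 == 0) {
    out[neg] <- NA                 # even root of a negative number
  } else {
    out[neg] <- (-1)^a * out[neg]  # odd root: real-valued, sign (-1)^a
  }
  out
}

rational_power(4, 1, 2)     # 2
rational_power(-8, 1, 3)    # -2
rational_power(-8, 2, 3)    # 4
rational_power(-4, 1, 2)    # NA
rational_power(-8, -2, -6)  # -2, since -2/-6 reduces to 1/3
```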

For example, one inequality may be $1.3x_1^2-2.7 x_2^3+0.37\exp(2x_3)-1.4\log(x_4)>1.3$, and another may be $0.5x_1^{-2/3}+1.91x_2^{-5/4}-0.73\exp(-3x_3)-1.7\log(x_4)<-1.3$. To define the domain as the intersection of the two sets defined by these inequalities, we write

domain <- make_domain(type="polynomial", 
                      p=p,
                      rule="1 && 2",
                      ineqs=list(
                        list(expression="1.3x1^2-2.7* x2^3+0.37exp(2x3)-1.4log(x4)>1.3", 
                             nonnegative=FALSE, abs=FALSE),
                        list(expression="5e-1x1^(-2/3)+1.91*x2^(-5/4)-0.73exp(-3*x3)-1.7*log(x4)<-1.3",
                             nonnegative=FALSE, abs=FALSE)
                      )
)
domain
#> $type
#> [1] "polynomial"
#> 
#> $p
#> [1] 5
#> 
#> $p_deemed
#> [1] 5
#> 
#> $checked
#> [1] TRUE
#> 
#> $rule
#> [1] "1 && 2"
#> 
#> $ineqs
#> $ineqs[[1]]
#> $ineqs[[1]]$uniform
#> [1] FALSE
#> 
#> $ineqs[[1]]$larger
#> [1] TRUE
#> 
#> $ineqs[[1]]$power_numers
#> [1] 2 3 2 0 1
#> 
#> $ineqs[[1]]$power_denoms
#> [1] 1 1 0 0 1
#> 
#> $ineqs[[1]]$coeffs
#> [1]  1.30 -2.70  0.37 -1.40  0.00
#> 
#> $ineqs[[1]]$const
#> [1] 1.3
#> 
#> $ineqs[[1]]$abs
#> [1] FALSE
#> 
#> $ineqs[[1]]$nonnegative
#> [1] FALSE
#> 
#> 
#> $ineqs[[2]]
#> $ineqs[[2]]$uniform
#> [1] FALSE
#> 
#> $ineqs[[2]]$larger
#> [1] FALSE
#> 
#> $ineqs[[2]]$power_numers
#> [1] -2 -5 -3  0  1
#> 
#> $ineqs[[2]]$power_denoms
#> [1] 3 4 0 0 1
#> 
#> $ineqs[[2]]$coeffs
#> [1]  0.50  1.91 -0.73 -1.70  0.00
#> 
#> $ineqs[[2]]$const
#> [1] -1.3
#> 
#> $ineqs[[2]]$abs
#> [1] FALSE
#> 
#> $ineqs[[2]]$nonnegative
#> [1] FALSE
#> 
#> 
#> 
#> $postfix_rule
#> [1] "1 2 &"
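To make the rule "1 && 2" concrete, the following base-R sketch hand-codes the left-hand sides of the two inequalities above and checks two points. The names lhs1(), lhs2() and in_domain() are hypothetical; make_domain() parses the expression strings itself, and this is not how the package evaluates domains. The points have positive coordinates so that log(x4) and the negative rational powers are defined.

```r
# Hand-coded left-hand sides of the two example inequalities (illustration only)
lhs1 <- function(x) 1.3 * x[1]^2 - 2.7 * x[2]^3 + 0.37 * exp(2 * x[3]) - 1.4 * log(x[4])
lhs2 <- function(x) 0.5 * x[1]^(-2/3) + 1.91 * x[2]^(-5/4) - 0.73 * exp(-3 * x[3]) - 1.7 * log(x[4])

# Rule "1 && 2": a point must satisfy both inequalities
in_domain <- function(x) (lhs1(x) > 1.3) && (lhs2(x) < -1.3)

x_in  <- c(1, 2, 5, exp(10), 0)  # x5 does not appear in either inequality
x_out <- c(1, 0.5, 1, 0.5, 0)
in_domain(x_in)   # TRUE
in_domain(x_out)  # FALSE: the first inequality holds but the second fails
```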

Inequalities

In this subsection we discuss the ineqs argument of make_domain() when defining a polynomial-type domain.

The argument must be a list, each element of which is itself a list representing one inequality. The recommended way to represent an inequality is a list with three members: (1) nonnegative, a logical indicating whether the domain of this inequality should be restricted to $\mathbb{R}_+^p$; (2) abs, a logical indicating whether the absolute values $|\boldsymbol{x}|$ should be used in place of $\boldsymbol{x}$ when evaluating the inequality; and (3) expression, a string expression of the inequality, which we explain next. There is another, highly discouraged, way of representing an inequality that mirrors how it is stored internally; it is not covered in this guide.
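To illustrate what the nonnegative and abs flags do, the following sketch evaluates a single inequality at one point, using the $\ell_1$-ball $\sum_j|x_j|<1$ as an example. check_ineq() and its larger argument (named after the larger field in the make_domain() output above) are hypothetical; the left-hand side is supplied as an R function rather than a parsed expression string.

```r
# Hypothetical helper (not part of the package): checks lhs(x) > const if larger,
# otherwise lhs(x) < const, after applying the nonnegative and abs flags
check_ineq <- function(x, lhs, const, larger = TRUE, abs = FALSE, nonnegative = FALSE) {
  if (nonnegative && any(x < 0)) return(FALSE)  # restrict to R_+^p
  if (abs) x <- base::abs(x)                    # evaluate on |x| instead of x
  val <- lhs(x)
  if (larger) val > const else val < const
}

pt <- c(0.5, -0.6, 0, 0, 0)
sum_lhs <- function(x) sum(x)
check_ineq(pt, sum_lhs, 1, larger = FALSE)                     # TRUE:  sum(x) = -0.1
check_ineq(pt, sum_lhs, 1, larger = FALSE, abs = TRUE)         # FALSE: sum(|x|) = 1.1
check_ineq(pt, sum_lhs, 1, larger = FALSE, nonnegative = TRUE) # FALSE: x2 < 0
```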

We call a term in an expression “uniform” if it is written as a function of “x”, and “non-uniform” if it is written as a function of “x” followed by an index, e.g. “x1” or “x2”. A uniform term can be (1) "log(x)", (2) "exp(x)", "exp(nx)" or "exp(n*x)" where n is a nonzero integer, or (3) a power in one of the following forms: "x^n", "x^(-n)", "x^(n/m)", "x^(-n/m)", "x^(n/-m)" (with n and m replaced by nonzero integers). A non-uniform term takes the same forms with x replaced by an indexed variable such as x1 or x2, and can start with a coefficient, e.g. "1.2*log(x1)", "-2.3x2^2".

An expression must have the variable part on the left-hand side, followed by one of “<”, “>”, “<=”, “>=”, and finally a number to compare to. The variable part can be (1) a single uniform term (e.g. x^(-2/3), exp(x), log(x)), (2) a single uniform term wrapped in "sum()" (e.g. sum(x^(-2/3)), sum(exp(x)), sum(log(x))), or (3) a sum of non-uniform terms separated by + or - (e.g. 1.3x1^2-0.7*x2^(2/3)+2e3log(x3)+1.3e-2*exp(-x4)).

For (1), the same inequality will be applied to each covariate independently; (2) on the other hand is a shorthand for (3) with the same form for all components and coefficients all equal to 1 (e.g. "sum(x^2)" is just "x1^2+x2^2+...+xp^2").
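The shorthand in (2) can be mimicked by plain string manipulation. The hypothetical helper expand_sum() below (not part of the package) rewrites the uniform term inside sum() into the equivalent sum of non-uniform terms with unit coefficients. It assumes the variable is a lowercase x that is not preceded by another letter, so the x inside exp is left untouched.

```r
# Hypothetical helper (not part of the package): expands the term inside sum()
# into term(x1)+term(x2)+...+term(xp)
expand_sum <- function(term, p) {
  terms <- vapply(seq_len(p), function(j)
    # replace the standalone variable x (not the x in "exp") by xj
    gsub("(?<![a-z])x(?![a-z0-9])", paste0("x", j), term, perl = TRUE),
    character(1))
  paste(terms, collapse = "+")
}

expand_sum("x^2", 3)        # "x1^2+x2^2+x3^2"
expand_sum("exp(-23x)", 2)  # "exp(-23x1)+exp(-23x2)"
expand_sum("log(x)", 2)     # "log(x1)+log(x2)"
```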

To summarize, the following are some valid examples of expression:

"x<=-3.2e2" # (1)
#> [1] "x<=-3.2e2"
"x^(-2/3)>3.1" # (1)
#> [1] "x^(-2/3)>3.1"
"exp(x)>1.3" # (1)
#> [1] "exp(x)>1.3"
"exp(-23x)<=3e3" # (1)
#> [1] "exp(-23x)<=3e3"
"log(x) < 1.3" # (1)
#> [1] "log(x) < 1.3"
"sum(x)<=3e3" # (2)
#> [1] "sum(x)<=3e3"
"sum(x^2)>10" # (2)
#> [1] "sum(x^2)>10"
"sum(x^(1/3))>10" # (2)
#> [1] "sum(x^(1/3))>10"
"sum(x^(-2/3))<=3e3" # (2)
#> [1] "sum(x^(-2/3))<=3e3"
"sum(exp(-23x))<=3e3" # (2)
#> [1] "sum(exp(-23x))<=3e3"
"sum(log(x)) < 2" # (2)
#> [1] "sum(log(x)) < 2"
"x1>1" # (3)
#> [1] "x1>1"
"x2<=1" # (3)
#> [1] "x2<=1"
"x1^(2/3)-1.3x2^(-3)< 1" # (3)
#> [1] "x1^(2/3)-1.3x2^(-3)< 1"
"exp(x1)+2.3*x2^2 > 2" # (3)
#> [1] "exp(x1)+2.3*x2^2 > 2"
"1x1+2x2+3x3+4x4+5x5 <1" # (3)
#> [1] "1x1+2x2+3x3+4x4+5x5 <1"
"0.5*x1^(-2/3)-0.3x4^(4/-6)+2e3x3^(-6/9) < 3.5e5" # (3)
#> [1] "0.5*x1^(-2/3)-0.3x4^(4/-6)+2e3x3^(-6/9) < 3.5e5"
"0.5*x1^(-2/3)-x3^3 + 2log(x2)- 1.3e4exp(-25*x6)+x8-.3x5^(-3/-4) >= 2" # (3)
#> [1] "0.5*x1^(-2/3)-x3^3 + 2log(x2)- 1.3e4exp(-25*x6)+x8-.3x5^(-3/-4) >= 2"

Rule

If more than one inequality is provided, the user must specify a rule for combining the domains defined by the individual inequalities. The rule may only contain inequality numbers (indexed from 1 to length(domain$ineqs)), logical operators (& and |, or && and ||; single and doubled operators are treated identically), parentheses, and spaces. The only other requirement is that, since & and | are given the same precedence, only operators of the same kind can be chained without parentheses. For example, 1 & 2 | 3 is not allowed; one must write (1 & 2) | 3 or 1 | (2 & 3) to avoid ambiguity. The following are some allowed rules.

"1"
#> [1] "1"
"1 & 2" # Assuming there are at least 2 inequalities
#> [1] "1 & 2"
"1 || 2" # Assuming >= 2 inequalities
#> [1] "1 || 2"
"1 && 2 & 3" # Assuming >= 3 inequalities
#> [1] "1 && 2 & 3"
"1 | 2 || 3" # Assuming >= 3 inequalities
#> [1] "1 | 2 || 3"
"(((1 & 2) | (3 & 4 && 5) || 6 || 7) & 8 & 9 && 10) || 11 " # Assuming >= 11 inequalities
#> [1] "(((1 & 2) | (3 & 4 && 5) || 6 || 7) & 8 & 9 && 10) || 11 "
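The make_domain() output earlier stores the rule "1 && 2" as the postfix_rule "1 2 &". One standard way to obtain such a postfix form is the shunting-yard algorithm; the sketch below is a hypothetical base-R version (not the package's code) that also enforces the restriction above by refusing to mix & and | at the same parenthesis depth. Apart from "1 && 2", its output for longer rules is not guaranteed to match what the package stores.

```r
# Hypothetical sketch (not the package's code): infix rule -> postfix string
rule_to_postfix <- function(rule) {
  # Treat doubled operators exactly like single ones
  rule <- gsub("||", "|", gsub("&&", "&", rule, fixed = TRUE), fixed = TRUE)
  # Tokens: inequality numbers, operators and parentheses (spaces are dropped)
  tokens <- regmatches(rule, gregexpr("[0-9]+|[&|()]", rule))[[1]]
  output <- character(0)
  stack <- character(0)  # pending operators and "("
  level_op <- ""         # operator used at each parenthesis depth
  for (tok in tokens) {
    if (grepl("^[0-9]+$", tok)) {
      output <- c(output, tok)
    } else if (tok == "(") {
      stack <- c(stack, tok)
      level_op <- c(level_op, "")
    } else if (tok == ")") {
      while (length(stack) > 0 && stack[length(stack)] != "(") {
        output <- c(output, stack[length(stack)])
        stack <- stack[-length(stack)]
      }
      if (length(stack) == 0) stop("unbalanced parentheses")
      stack <- stack[-length(stack)]           # discard the matching "("
      level_op <- level_op[-length(level_op)]
    } else {                                   # "&" or "|"
      d <- length(level_op)
      if (level_op[d] == "") {
        level_op[d] <- tok
      } else if (level_op[d] != tok) {
        stop("mixing & and | at the same level requires parentheses")
      }
      # Equal precedence, left-associative: flush pending operators at this depth
      while (length(stack) > 0 && stack[length(stack)] != "(") {
        output <- c(output, stack[length(stack)])
        stack <- stack[-length(stack)]
      }
      stack <- c(stack, tok)
    }
  }
  if (any(stack == "(")) stop("unbalanced parentheses")
  paste(c(output, rev(stack)), collapse = " ")
}

rule_to_postfix("1 && 2")       # "1 2 &"
rule_to_postfix("(1 & 2) | 3")  # "1 2 & 3 |"
```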

Examples of polynomial domains

# x such that sum(x^2) > 10 && sum(x^(1/3)) > 10 with x allowed to be negative
domain <- make_domain("polynomial", p=p, rule="1 && 2",
                      ineqs=list(list(expression="sum(x^2)>10", abs=FALSE, nonnegative=FALSE),
                                 list(expression="sum(x^(1/3))>10", abs=FALSE, nonnegative=FALSE)))

# x such that {x1 > 1 && log(1.3) < x2 < 1 && x3 > log(1.3) && ... && xp > log(1.3)}
domain <- make_domain("polynomial", p=p, rule="1 && 2 && 3",
                      ineqs=list(list(expression="x1>1", abs=FALSE, nonnegative=TRUE),
                                 list(expression="x2<1", abs=FALSE, nonnegative=TRUE),
                                 list(expression="exp(x)>1.3", abs=FALSE, nonnegative=FALSE)))

# x in R_+^p such that {sum(log(x))<2 || (x1^(2/3)-1.3x2^(-3)<1 && exp(x1)+2.3*x2^2>2)}
domain <- make_domain("polynomial", p=p, rule="1 || (2 && 3)",
                      ineqs=list(list(expression="sum(log(x))<2", abs=FALSE, nonnegative=TRUE),
                                 list(expression="x1^(2/3)-1.3x2^(-3)<1", abs=FALSE, nonnegative=TRUE),
                                 list(expression="exp(x1)+2.3*x2^2>2", abs=FALSE, nonnegative=TRUE)))

# x in R_+^p such that sum_j j * xj < 1
domain <- make_domain("polynomial", p=p,
                      ineqs=list(
                        # Builds the expression string "1x1+2x2+3x3+4x4+5x5 <1" for p = 5
                        list(expression=paste(paste0(1:p, "x", 1:p, collapse="+"), "<1"),
                             abs=FALSE, nonnegative=TRUE)))

# The l-1 ball {sum(|x|) < 1}
domain <- make_domain("polynomial", p=p, 
                      ineqs=list(list(expression="sum(x)<1", abs=TRUE, nonnegative=FALSE)))

Generalized Score Matching on Generalized Domain Types

Shiqing Yu

2020-04-24

Different domain types

The entire real space

The non-negative orthant of the real space

The simplex domains

Uniform-type domains

Solution for infinite unions of intervals or non-uniform domains for each component

Polynomial-type domains

Inequalities

Rule

Examples of polynomial domains

Distribution Models Supported

Examples of Multivariate Graphical Models

Truncated Gaussian Graphical Models on the Non-negative Orthant

Data setup

Estimation using `estimate()` with `x` directly

Estimate using `elts`

Results for one lambda

Aggregating multiple ROC curves

Exponential Square-Root Graphical Models on the Non-negative Orthant

Data setup

Estimation

Gamma Graphical Models on the Non-negative Orthant

General a-b Graphical Models on the Non-negative Orthant

(Untruncated) Gaussian Graphical Models on the Entire Real Space

Aitchison \(A^d\) Models on the Simplex

Univariate Truncated Normal Distributions on x > 0

Introduction

Variance Estimation and Confidence Intervals

Plots from Yu et al (2019)
