Type: | Package |
Title: | Linear p-Wasserstein Projections |
Version: | 0.2.3 |
Date: | 2025-02-03 |
Description: | Performs Wasserstein projections from the predictive distributions of any model into the space of predictive distributions of linear models. We utilize L1 penalties to also reduce the complexity of the model space. This package employs the methods as described in Dunipace, Eric and Lorenzo Trippa (2020) <doi:10.48550/arXiv.2012.09999>. |
License: | GPL (== 3.0) |
Depends: | R (≥ 4.0) |
Imports: | approxOT (≥ 1.2), glmnet, oem, Rcpp, rlang, ROI, ROI.plugin.ecos, ROI.plugin.lpsolve, Matrix, rqPen, quantreg, doParallel, foreach, doRNG, dplyr, stats, magrittr, methods, slam, lifecycle |
LinkingTo: | approxOT (≥ 1.2), BH, Rcpp (≥ 1.0.0), RcppCGAL, RcppEigen, RcppProgress, RSpectra |
Suggests: | ggplot2, ggsci, ggridges, testthat (≥ 2.1.0), transport, Rmosek, spelling, ECOSolveR |
RoxygenNote: | 7.3.2 |
URL: | https://github.com/ericdunipace/WpProj |
BugReports: | https://github.com/ericdunipace/WpProj/issues |
SystemRequirements: | C++17 |
Encoding: | UTF-8 |
Language: | en-US |
NeedsCompilation: | yes |
Packaged: | 2025-02-03 23:43:13 UTC; eifer |
Author: | Eric Dunipace |
Maintainer: | Eric Dunipace <edunipace@mail.harvard.edu> |
Repository: | CRAN |
Date/Publication: | 2025-02-05 18:20:02 UTC |
WpProj: Linear p-Wasserstein Projections
Description
Performs Wasserstein projections from the predictive distributions of any model into the space of predictive distributions of linear models. We utilize L1 penalties to also reduce the complexity of the model space. This package employs the methods as described in Dunipace, Eric and Lorenzo Trippa (2020) doi:10.48550/arXiv.2012.09999.
Author(s)
Maintainer: Eric Dunipace edunipace@mail.harvard.edu (ORCID)
Other contributors:
Clemens Schmid clemens@nevrome.de (ORCID) (ETA progres bar is adapted from their code) [contributor]
Espen Bernton ('Hilbert Sort' adapted from their code) [contributor]
Mathieu Gerber ('Hilbert Sort' adapted from their code) [contributor]
Pierre Jacob ('Hilbert Sort' adapted from their code) [contributor]
Bin Dai bdai@uwalumni.com (W2 projections adapted from their 'OEM' code) [contributor]
Jared Huling jaredhuling@gmail.com (ORCID) (W2 projections adapted from their 'OEM' code) [contributor]
Yixuan Qiu (W2 projections adapted from their 'OEM' code) [contributor]
Dominic Schuhmacher dominic.schuhmacher@mathematik.uni-goettingen.de ('Shortsimplex 'optimal transport method adapted from their code) [contributor]
Nicolas Bonneel ('Network Simplex' algorithm adapted from their code) [contributor]
See Also
Useful links:
Run the Hahn-Carvalho Method
Description
Runs the Hahn-Carvalho method but adapted to return full distributions.
Usage
HC(
X,
Y = NULL,
theta,
family = "gaussian",
penalty = c("elastic.net", "selection.lasso", "lasso", "ols", "mcp", "scad", "mcp.net",
"scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net",
"grp.scad.net", "sparse.grp.lasso"),
method = c("selection.variable", "projection"),
lambda = numeric(0),
nlambda = 100L,
lambda.min.ratio = NULL,
alpha = 1,
gamma = 1,
tau = 0.5,
groups = numeric(0),
penalty.factor = NULL,
group.weights = NULL,
maxit = 500L,
tol = 1e-07,
irls.maxit = 100L,
irls.tol = 0.001
)
Arguments
X |
Covariates |
Y |
Predictions |
theta |
Parameters |
family |
Family for method. See oem. |
penalty |
Penalty function. See oem. |
method |
Should we run a selection variable methodology or projection? |
lambda |
lambda for lasso. See oem for this and all options below |
nlambda |
Number of lambda values. |
lambda.min.ratio |
Minimum lambda ratio for self selected lambda |
alpha |
elastic net mixing. |
gamma |
tuning parameters for SCAD and MCP |
tau |
mixing parameter for sparse group lasso |
groups |
A vector of grouping values |
penalty.factor |
Penalty factor for OEM. |
group.weights |
Weights for groupped lasso |
maxit |
Max iteration for OEM |
tol |
Tolerance for OEM |
irls.maxit |
IRLS max iterations for OEM |
irls.tol |
IRLS tolerance for OEM |
Value
a WpProj
object with selected covariates and their values
References
Hahn, P. Richard and Carlos M. Carvalho. (2014) "Decoupling Shrinkage and Selection in Bayesian Linear Models: A Posterior Summary Perspective." https://arxiv.org/pdf/1408.0464
Examples
n <- 32
p <- 10
s <- 99
x <- matrix( 1, nrow = n, ncol = p )
beta <- (1:10)/10
y <- x %*% beta
post_beta <- matrix(beta, nrow=p, ncol=s)
post_mu <- x %*% post_beta
fit <- HC(X=x, Y=post_mu, theta = post_beta,
penalty = "lasso",
method = "projection"
)
Options For Use With the L0 Method
Description
Options For Use With the L0 Method
Usage
L0_method_options(
method = c("binary program", "projection"),
transport.method = transport_options(),
epsilon = 0.05,
OTmaxit = 0,
parallel = NULL,
...
)
Arguments
method |
Should covariates be selected as an approximate "binary program" or should a projection method be used. Default is the approximate binary program. |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
epsilon |
A value > 0 for the penalty parameter if using the Sinkhorn method for optimal transport |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
parallel |
A cluster backend to be used by |
... |
Not used |
Value
a named list corresponding to the above arguments
Examples
L0_method_options()
Options For Use With the L1 Method
Description
Options For Use With the L1 Method
Usage
L1_method_options(
penalty = L1_penalty_options(),
lambda = numeric(0),
nlambda = 500L,
lambda.min.ratio = 1e-04,
gamma = 1,
alpha = 1,
maxit = 500L,
model.size = NULL,
tol = 1e-07,
display.progress = FALSE,
solver.options = NULL
)
Arguments
penalty |
The penalty to use. See |
lambda |
The penalty parameter to use if method is "L1". |
nlambda |
The number of lambdas to explore for the "L1" method if |
lambda.min.ratio |
The minimum ratio of max to min lambda for "L1" method. Default 1e-4. |
gamma |
Tuning parameter for SCAD and MCP penalties if method = "L1". |
alpha |
Tuning parameter for elastic net penalties |
maxit |
The maximum iterations for optimization. Default is 500. |
model.size |
What is the maximum number of coefficients to have in the final model. Default is NULL. If NULL, will find models from the minimum size, 0, to the number of columns in |
tol |
The tolerance for convergence |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is FALSE. |
solver.options |
Options to be passed on to the solver. Only used for "ecos" and "mosek" solvers. |
Value
A list with names corresponding to each argument above.
See Also
Examples
L1_method_options()
Recognized L1 Penalties
Description
Recognized L1 Penalties
Usage
L1_penalty_options()
Value
A character vector with the possible penalties for L1 methods
Examples
L1_penalty_options()
# [1] "lasso" "ols" "mcp" "elastic.net" "scad"
# [6] "mcp.net" "scad.net" "grp.lasso" "grp.lasso.net" "grp.mcp"
# [11] "grp.scad" "grp.mcp.net" "grp.scad.net" "sparse.grp.lasso"
1-Wasserstein projection
Description
1-Wasserstein projection
Usage
W1L1(
X,
Y,
theta = NULL,
penalty = c("none", "lasso", "scad", "mcp"),
model.size = NULL,
lambda = numeric(0),
lambda.min.ratio = 1e-04,
nlambda = 10,
gamma = 1,
display.progress = FALSE,
solver = c("cone", "rqPen", "gurobi", "mosek"),
...
)
Arguments
X |
Covariates |
Y |
Predictions from arbitrary model |
theta |
Parameters of original linear model. Optional. |
penalty |
penalty term to use. One of "none", "lasso","scad","mcp" |
model.size |
Maximum number of coefficients in interpretable model |
lambda |
Lambdas to use |
lambda.min.ratio |
Minimum lambda to select if choosing lambdas using default methods |
nlambda |
number of lambdas to look through |
gamma |
parameter for SCAD and MCP methods |
display.progress |
Print intermediate output? |
solver |
Solver to use. Must be one of "rqPen", "gurobi", "mosek", though "mosek" is preferred. |
... |
options to pass to solvers |
Value
WpProj
object
2-Wasserstein distance selection by Integer Programming
Description
2-Wasserstein distance selection by Integer Programming
Usage
W2IP(
X,
Y = NULL,
theta,
transport.method = transport_options(),
model.size = NULL,
nvars = NULL,
maxit = 100L,
infimum.maxit = 100L,
tol = 1e-07,
solver = c("cone", "lp", "mosek", "cplex", "gurobi"),
display.progress = FALSE,
parallel = NULL,
...
)
Arguments
X |
Covariates |
Y |
Predictions from arbitrary model |
theta |
Parameters of original linear model. Required |
transport.method |
Method for Wasserstein distance calculation. Should be one of the outputs of |
model.size |
Maximum number of coefficients in interpretable model |
nvars |
The number of variables to explore. Should be an integer vector of model sizes. Default is NULL which will explore all models from 1 to |
maxit |
Maximum number of solver iterations |
infimum.maxit |
Maximum iterations to alternate binary program and Wasserstein distance calculation |
tol |
Tolerance for convergence of coefficients |
solver |
The solver to use. Must be one of "cone","lp", "cplex", "gurobi","mosek". |
display.progress |
Should progress be printed? |
parallel |
foreach back end. See |
... |
Extra args to Wasserstein distance methods |
Details
For argument solution.method
, options "cone" and "lp" use the free solvers "ECOS" and "lpSolver", respectively. "cplex", "gurobi" and "mosek" require installing the corresponding commercial solvers.
2-Wasserstein distance linear projections with an L_1
penalty
Description
2-Wasserstein distance linear projections with an L_1
penalty
Usage
W2L1(
X,
Y = NULL,
theta = NULL,
penalty = c("lasso", "ols", "mcp", "elastic.net", "selection.lasso", "scad", "mcp.net",
"scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net",
"grp.scad.net", "sparse.grp.lasso"),
method = c("projection", "selection.variable", "location.scale", "scale"),
transport.method = transport_options(),
epsilon = 0.05,
OTmaxit = 0,
model.size = NULL,
lambda = numeric(0),
nlambda = 100L,
lambda.min.ratio = NULL,
alpha = 1,
gamma = 1,
tau = 0.5,
groups = numeric(0),
scale.factor = numeric(0),
penalty.factor = NULL,
group.weights = NULL,
maxit = 500L,
tol = 1e-07,
irls.maxit = 100L,
irls.tol = 0.001,
infimum.maxit = NULL,
display.progress = FALSE
)
Arguments
X |
An n x p matrix of covariates |
Y |
An n x s matrix of predictions |
theta |
optional parameter matrix for selection methods. Should be p x s. |
penalty |
Form of penalty. One of "lasso", "ols", "mcp", "elastic.net","selection.lasso", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp","grp.scad", "grp.mcp.net", "grp.scad.net", "sparse.grp.lasso" |
method |
"selection.variable" or "projection |
transport.method |
Method for calculating the Wasserstein distance. One of "exact", "sinkhorn", "greenkhorn","hilbert" |
epsilon |
Penalty parameter for Sinkhorn and Greenkhorn and optimal transport |
OTmaxit |
Maximum iterations for the optimal transport iterations |
model.size |
The maximum number of desired covariates. Defaults to the number of covariates. |
lambda |
Penalty parameter for lasso regression. See oem. |
nlambda |
Number of lambda values. See oem. |
lambda.min.ratio |
Minimum lambda ratio for self selected lambda. See oem. |
alpha |
elastic net mixing. See oem. |
gamma |
tuning parameters for SCAD and MCP. See oem. |
tau |
mixing parameter for sparse group lasso. See oem. |
groups |
A vector of grouping values. See oem. |
scale.factor |
Value to standardize the covariates by. Typically, is the standard deviation. Should have length 1 or length same as the number of covariates |
penalty.factor |
Penalty factor for OEM. See oem. |
group.weights |
Weights for group lasso. See oem. |
maxit |
Max iteration for OEM. See oem. |
tol |
Tolerance for OEM. See oem. |
irls.maxit |
IRLS max iterations for OEM. See oem. |
irls.tol |
IRLS tolerance for OEM. See oem. |
infimum.maxit |
Maximum number of iterations alternating optimization and Wasserstein distance calculation. Irrelevant for projection method. |
display.progress |
Display intermediate progress? |
Value
Object of class WpProj
Infinity-Wasserstein Linear Projections With an L1 Penalty
Description
Infinity-Wasserstein Linear Projections With an L1 Penalty
Usage
WInfL1(
X,
Y,
theta = NULL,
penalty = c("none", "lasso", "mcp", "scad"),
lambda = numeric(0),
lambda.min.ratio = 1e-04,
gamma = 1.5,
nlambda = 10,
solver = c("cone", "mosek", "gurobi"),
options = list(solver_opts = NULL, init = NULL, tol = 1e-07, iter = 100),
model.size = NULL,
display.progress = FALSE,
...
)
Arguments
X |
An n x p matrix of covariates |
Y |
An n x s matrix of predictions |
theta |
optional parameter matrix for selection methods. Should be p x s. |
penalty |
Form of penalty. One of "none", "lasso", "mcp","scad" |
lambda |
Penalty parameter for lasso regression. |
lambda.min.ratio |
Minimum lambda ratio for self selected lambda. |
gamma |
tuning parameters for SCAD and MCP. |
nlambda |
Number of lambda values. |
solver |
Which solver to use. One of "cone","mosek", or "gurobi". Note "mosek" and "gurobi" are commercial installers. |
options |
A list containing slots |
model.size |
The maximum number of paramters to consider. Should be an integer greater than 1 and less than or equal to the number of covariates |
display.progress |
Whether to display progress. TRUE or FALSE |
... |
Additional arguments passed to the solver as needed |
Value
A WpProj
object
p-Wasserstein projections with an L0 penalty
Description
p-Wasserstein projections with an L0 penalty
Usage
WPL0(
X,
Y = NULL,
theta,
power = 2,
method = c("selection.variable", "projection"),
transport.method = transport_options(),
epsilon = 0.05,
OTmaxit = 0,
parallel = NULL,
...
)
Arguments
X |
matrix of covariates |
Y |
matrix of predictions |
theta |
optional matrix of coefficients from original model, if relevant |
power |
power of the Wasserstein distance |
method |
One of "selection.variable" or "projection". Methods decide whether covariate matrix in |
transport.method |
Method for Wasserstein distance calculation. Should be one of the outputs of |
epsilon |
hyperparameter for sinkhorn iterations |
OTmaxit |
max iteration for sinkhorn iterations |
parallel |
foreach backend |
Value
WpProj
object
p-Wasserstein Linear Projections With an L_1
Penalty
Description
p-Wasserstein Linear Projections With an L_1
Penalty
Usage
WPL1(
X,
Y = NULL,
theta = NULL,
power = 2,
penalty = c("lasso", "ols", "mcp", "elastic.net", "selection.lasso", "scad", "mcp.net",
"scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net",
"grp.scad.net", "sparse.grp.lasso"),
model.size = NULL,
lambda = numeric(0),
nlambda = 100L,
lambda.min.ratio = 1e-04,
gamma = 1,
alpha = 1,
maxit = 500L,
tol = 1e-07,
...
)
Arguments
X |
matrix of covariates |
Y |
matrix of predictions |
theta |
optional parameter matrix for selection methods. |
power |
power of the Wasserstein distance |
penalty |
Form of penalty. One of "lasso", "ols", "mcp", "elastic.net","selection.lasso", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp","grp.scad", "grp.mcp.net", "grp.scad.net", "sparse.grp.lasso" |
model.size |
How many coefficients should final model have |
lambda |
penalty parameter |
nlambda |
number of lambdas to explore |
lambda.min.ratio |
minimum ratio of max to min lambda |
gamma |
Tuning parameter for SCAD and MCP methods |
maxit |
maximum iterations for optimization |
tol |
tolerance for convergence |
... |
arguments passed to other methods such as Wasserstein distance |
Value
object of class WpProj
See Also
W_p R^2
Function to Evaluate Performance
Description
This function will calculate p-Wasserstein distances between the predictions of interest and the projected model.
Usage
WPR2(
predictions = NULL,
projected_model,
p = 2,
method = "exact",
base = NULL,
...
)
## S4 method for signature 'ANY,matrix'
WPR2(
predictions = NULL,
projected_model,
p = 2,
method = "exact",
base = NULL,
...
)
## S4 method for signature 'ANY,distcompare'
WPR2(
predictions = NULL,
projected_model,
p = 2,
method = "exact",
base = NULL,
...
)
## S4 method for signature 'ANY,list'
WPR2(
predictions = NULL,
projected_model,
p = 2,
method = "exact",
base = NULL,
...
)
## S4 method for signature 'ANY,WpProj'
WPR2(
predictions = NULL,
projected_model,
p = 2,
method = "exact",
base = NULL,
...
)
Arguments
predictions |
Predictions of interest, likely from the original model |
projected_model |
A matrix of competing predictions, possibly from a WpProj fit, a WpProj fit itself, or a list of WpProj objects |
p |
Power of the Wasserstein distance to use in distance calculations |
method |
Method for calculating Wasserstein distance |
base |
The baseline result to compare to. If not provided, defaults to the model with no covariates and only an intercept. |
... |
Arguments passed to Wasserstein distance calculation. See |
Value
W_p R ^2
values
Examples
if (rlang::is_installed("stats")) {
# this example is not a true posterior estimation, but is used for illustration
n <- 32
p <- 10
s <- 21
x <- matrix( stats::rnorm(n*p), nrow = n, ncol = p )
beta <- (1:10)/10
y <- x %*% beta + stats::rnorm(n)
post_beta <- matrix(beta, nrow=p, ncol=s) +
matrix(rnorm(p*s), p, s) # not a true posterior
post_mu <- x %*% post_beta
fit <- WpProj(X=x, eta=post_mu, power = 2.0)
out <- WPR2(predictions = post_mu, projected_model = fit,
base = rowMeans(post_mu) # same as intercept only projection
)
}
p-Wasserstein distance projections using simulated annealing
Description
p-Wasserstein distance projections using simulated annealing
Usage
WPSA(
X,
Y = NULL,
theta,
power = 2,
force = NULL,
model.size = NULL,
nvars = NULL,
maxit = 1,
temps = 1000,
max.time = 3600,
const = NULL,
proposal = proposal.fun,
options = list(method = c("selection.variable", "scale", "projection"),
transport.method = transport_options(), energy.distribution = "boltzman",
cooling.schedule = "Geman-Geman", proposal.method = "covariance", epsilon = 0.05,
OTmaxit = 0),
display.progress = FALSE,
parallel = NULL,
calc.theta = TRUE,
xtx = NULL,
xty = NULL,
...
)
Arguments
X |
Covariate vector |
Y |
Predictions |
theta |
Optional matrix of parameters for generating predictions |
power |
Power of the Wasserstein distance |
force |
Any covariates to force into the model? |
model.size |
Maximum number of coefficients |
nvars |
The number of variables to explore. Should be an integer vector of model sizes. Default is NULL which will explore all models from 1 to |
maxit |
Maximum number of iterations |
temps |
Number of temperatures |
max.time |
Maximum time in seconds to run |
const |
Maximum value for simulated annealing distance |
proposal |
Proposal function. There is a default method but can provide your own with parameters |
options |
Options for simulated annealing |
display.progress |
Whether to display solver progress. TRUE or FALSE. Default is FALSE. |
parallel |
A |
calc.theta |
Should the model save the linear coefficients? TRUE or FALSE. Default is TRUE |
xtx |
precomputed crossproduct |
xty |
precomputed |
Value
An object of class WpProj
p-Wasserstein Distance Linear Projections Using a Stepwise Method
Description
p-Wasserstein Distance Linear Projections Using a Stepwise Method
Usage
WPSW(
X,
Y,
theta,
power = 2,
force = NULL,
direction = c("backward", "forward"),
method = c("selection.variable", "scale", "projection"),
transport.method = transport_options(),
OTmaxit = 0,
epsilon = 0.05,
calc.theta = TRUE,
model.size = NULL,
parallel = NULL,
display.progress = FALSE,
...
)
Arguments
X |
matrix of covariates |
Y |
matrix of predictions |
theta |
optional parameter matrix for selection methods. |
power |
Power of the Wasserstein distance |
force |
Any covariates to force into the model? |
direction |
forward or backward selection |
method |
"selection.variable" or "projection |
transport.method |
Method for calculating the Wasserstein distance. Should be one of the outputs of |
OTmaxit |
maximum number of iterations for the opt?imal transport methods |
epsilon |
hyperparameter if using sinkhorn iterations to approximate OT |
calc.theta |
should we get the linear coefficients |
model.size |
Maximum model size |
parallel |
foreach backend |
display.progress |
Display intermediate progress |
Value
An object of class WpProj
p-Wasserstein Variable Importance
Description
This function will measure how much removing each covariate harms prediction accuracy.
Usage
WPVI(
X,
eta,
theta,
pred.fun = NULL,
p = 2,
ground_p = 2,
transport.method = transport_options(),
epsilon = 0.05,
OTmaxit = 0,
display.progress = FALSE,
parallel = NULL
)
Arguments
X |
Covariates |
eta |
Predictions from the estimated model |
theta |
Parameters from the estimated model. |
pred.fun |
A prediction function. must take variables x, theta as arguments: |
p |
Power of Wasserstein distance |
ground_p |
Power of distance metric |
transport.method |
Transport methods. See |
epsilon |
Hyperparameter for Sinkhorn iterations |
OTmaxit |
Maximum number of iterations for the Wasserstein method |
display.progress |
Display intermediate progress |
parallel |
a foreach backend if already created |
Value
Returns an integer vector ranking covariate importance from most to least important.
Examples
n <- 128
p <- 10
s <- 99
x <- matrix(1, nrow = n, ncol = p )
beta <- (1:10)/10
y <- x %*% beta
post_beta <- matrix(beta, nrow=p, ncol=s)
post_mu <- x %*% post_beta
fit <- WpProj(X=x, eta=post_mu, power = 2.0)
WPVI(X = x, eta = post_mu, theta = post_beta, transport.method = "hilbert")
p-Wasserstein Linear Projections
Description
This function will calculate linear projections from a set of predictions into the space of the covariates in terms of the p-Wasserstein distance.
Usage
WpProj(
X,
eta = NULL,
theta = NULL,
power = 2,
method = c("L1", "binary program", "stepwise", "simulated annealing", "L0"),
solver = c("lasso", "ecos", "lpsolve", "mosek"),
options = NULL
)
Arguments
X |
An |
eta |
An |
theta |
An optional An |
power |
The power of the Wasserstein distance to use. Must be |
method |
The algorithm to calculate the Wasserstein projections. One of "L1", "binary program", "IP", "stepwise","simulated annealing", or "L0". Will default to "L1" if not provided. See details for more information. |
solver |
Which solver to use? One of "lasso", "ecos", "lpsolve", or "mosek". See details for more information |
options |
Options passed to the particular method and desired solver. See details for more information. |
Details
Methods
The WpProj
function is a wrapper for the various Wasserstein projection methods. It is designed to be a one-stop shop for all Wasserstein projection methods. It will automatically choose the correct method and solver based on the arguments provided. It will also return a standardized output for all methods. Each method has its own set of options that can be passed to it. See the documentation for each method for more information.
For the L1 methods, see L1_method_options()
for more information. For the binary program methods, see binary_program_method_options()
for more information. For the stepwise methods, see stepwise_method_options()
for more information. For the simulated annealing methods, see simulated_annealing_method_options()
for more information.
In most cases, we recommend using the L1 methods or binary program methods. The L1 methods are the fastest and applicable to Wasserstein powers of any value greater than 1 and function as direct linear projections into the space of the covariates. The binary program methods instead preserve the coefficients of the original model if this is of interest, such as when the original model was already a linear model. The binary program will instead function as a way of turning on and off certain coefficients in a way that minimizes the Wasserstein distance between reduced and original models. Of note, we also have available an approximate binary program method using a lasso solver. This method is faster than the exact binary program method but is not guaranteed to find the optimal solution. It is recommended to use the exact binary program method if possible. See binary_program_method_options()
for more information on how to set up the approximate method as some arguments for the lasso solver should be specified. For more information on how this works, please also see the referenced paper.
The stepwise, simulated annealing, and L0 methods also select covariates like the binary program methods but they can be slower. They are presented merely for comparison purposes given they were used in the original paper.
Wasserstein distances and powers
The Wasserstein distance is a measure of distance between two probability distributions. It is defined as:
W_p(\mu,\nu) = \left(\inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x-y\|^p d\pi(x,y)\right)^{1/p},
where \Pi(\mu,\nu)
is the set of all joint distributions with marginals \mu
and \nu
. The Wasserstein distance is a generalization of the Euclidean distance, which is the case when p=2
. In our function we have argument power
that corresponds to the p
of the equation above. The default power
is 2.0
but any value greater than or equal to 1.0
is allowed. For more information, see the references.
The particular implementation of the Wasserstein distance is as follows. If \mu
is the original prediction from the original model, then we seek to find a new prediction \nu
that minimizes the Wasserstein distance between the two: \text{argmin}_\nu W_p(\mu,\nu)
.
Value
object of class WpProj
, which is a list with the following slots:
call
: The call to the functiontheta
: A list of the final parameter matrices for each returned modelfitted.values
: A list of the fitted values for each returned modelpower
: The power of the Wasserstein distance usedmethod
: The method used to calculate the Wasserstein projectionssolver
: The solver used to calculate the Wasserstein projectionsniter
: The number of iterations used to calculate the Wasserstein projections. Not all methods return a number of iterations so this may beNULL
nzero
: The number of non zero coefficients in the final models
References
Dunipace, Eric and Lorenzo Trippa (2020) https://arxiv.org/abs/2012.09999.
Examples
if(rlang::is_installed("stats")) {
# note we don't generate believable data with real posteriors
# these examples are just to show how to use the function
n <- 32
p <- 10
s <- 21
# covariates and coefficients
x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p )
beta <- (1:10)/10
#outcome
y <- x %*% beta + stats::rnorm(n)
# fake posterior
post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1)
post_mu <- x %*% post_beta #posterior predictive distributions
# fit models
## L1 model
fit.p2 <- WpProj(X=x, eta=post_mu, power = 2.0,
method = "L1", #default
solver = "lasso" #default
)
## approximate binary program
fit.p2.bp <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0,
method = "binary program",
solver = "lasso" #default because approximate algorithm is faster
)
## compare performance by measuring distance from full model
dc <- distCompare(models = list("L1" = fit.p2, "BP" = fit.p2.bp))
if(rlang::is_installed(c("ggplot2","ggsci"))) {
plot(dc)
}
## compare performance by measuring the relative distance between a null model
## and the predictions of interest as a pseudo R^2
r2.expect <- WPR2(predictions = post_mu, projected_model = dc) # can have negative values
r2.null <- WPR2(projected_model = dc) # should be between 0 and 1
if(rlang::is_installed(c("ggplot2","ggsci"))) {
plot(r2.null)
}
## we can also examine how predictions change in the models for individual observations
if(rlang::is_installed(c("ggplot2","ggsci","ggridges"))) {
ridgePlot(fit.p2, index = 21, minCoef = 0, maxCoef = 10)
}
}
Options For Use With the Binary Program Method
Description
Options For Use With the Binary Program Method
Usage
binary_program_method_options(
maxit = 500L,
infimum.maxit = 100L,
transport.method = transport_options(),
epsilon = 0.05,
OTmaxit = 100L,
model.size = NULL,
nvars = NULL,
tol = 1e-07,
display.progress = FALSE,
parallel = NULL,
solver.options = NULL
)
Arguments
maxit |
The maximum iterations for the optimizer. Default is 500. |
infimum.maxit |
Maximum iterations to alternate binary program and Wasserstein distance calculations |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
epsilon |
A value > 0 for the penalty parameter of if using the Sinkhorn method |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
model.size |
What is the maximum number of coefficients to have in the final model. Default is NULL. If NULL, will find models from the minimum size, 0, to the number of columns in |
nvars |
The number of variables to explore. Should be an integer vector of model sizes. Default is NULL which will explore all models from 1 to |
tol |
The tolerance for convergence |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is FALSE. |
parallel |
A cluster backend to be used by |
solver.options |
Options to be passed on to the solver. See details |
Details
This function will setup the default arguments used by the binary program method. Of note, for the argument solver.options
, If using the "lasso" solver, you should provide arguments such as "penalty", "nlambda", "lambda.min.ratio", "gamma", and "lambda" in a list. A simple way to do this is to feed the output of the L1_method_options()
function to the argument solver.options.
This will tell the approximate solver, which uses a lasso method that then will project the parameters back to the \{0,1\}
space. For the other solvers, you can see the options in the ECOS solver package, ECOSolveR::ecos.control()
, and the options for the mosek solver, Rmosek::mosek()
.
Value
A list with names corresponding to each argument above.
See Also
Examples
binary_program_method_options()
# is using the lasso solver for the binary program method to give an approximate solution
binary_program_method_options(solver.options = L1_method_options(nlambda = 50L))
A Function to Combine W_p R ^2
Objects
Description
Will combine
W_p R ^2
objects into a single object.
Usage
combine.WPR2(...)
Arguments
... |
List of |
Value
A vector of W_p R^2
objects
See Also
Examples
if (rlang::is_installed("stats")) {
n <- 128
p <- 10
s <- 99
x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p )
beta <- (1:10)/10
y <- x %*% beta + stats::rnorm(n)
post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1)
post_mu <- x %*% post_beta
fit1 <- WpProj(X=x, eta=post_mu, theta = post_beta,
power = 2.0, method = "binary program")
fit2 <- WpProj(X=x, eta=post_mu, power = 2.0,
options = list(penalty = "lasso")
)
out1 <- WPR2(predictions = post_mu, projected_model = fit1)
out2 <- WPR2(predictions = post_mu, projected_model = fit2)
combine <- combine.WPR2(out1, out2)
}
Combine distance calculations from the distCompare function
Description
Combine distance calculations from the distCompare function
Usage
combine.distcompare(...)
Arguments
... |
|
Value
an object of class combine.distcompare
, the combined distcompare
class objects as returned by distCompare()
function
Compares Optimal Transport Distances Between WpProj and Original Models
Description
Will compare the Wasserstein distance between the original model and the
WpProj
model.
Usage
distCompare(
models,
target = list(parameters = NULL, predictions = NULL),
power = 2,
method = "exact",
quantity = c("parameters", "predictions"),
parallel = NULL,
transform = function(x) {
return(x)
},
...
)
Arguments
models |
A list of models from WpProj methods |
target |
The target to compare the methods to. Should be a list with slots "parameters" to compare the parameters and "predictions" to compare predictions |
power |
The power parameter of the Wasserstein distance. |
method |
Which approximation to the Wasserstein distance to use. Should be one of the outputs of |
quantity |
Should the function target the "parameters" or the "predictions". Can choose both. |
parallel |
Parallel backend to use for the |
transform |
Transformation function for the predictions. |
... |
other options passed to the |
Details
For the data frames, dist
is the Wasserstein distance, nactive
is the number of active variables in the model, groups
is the name distinguishing the model, and method
is the method used to calculate the distance (i.e., exact, sinkhorn, etc.). If the list in models
is named, these will be used as the group names otherwise the group names will be created based on the call from the WpProj
method.
Value
an object of class distcompare
with slots parameters
, predictions
, and p
. The slots parameters
and predictions
are data frames. See the details for more info. The slot p
is the power parameter of the Wasserstein distance used in the distance calculation.
Examples
if(rlang::is_installed("stats")) {
n <- 32
p <- 10
s <- 21
x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p )
beta <- (1:10)/10
y <- x %*% beta + stats::rnorm(n)
post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1)
post_mu <- x %*% post_beta
fit1 <- WpProj(X=x, eta=post_mu, power = 2.0,
options = list(penalty = "lasso")
)
fit2 <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0,
method = "binary program", solver = "lasso",
options = list(solver.options = list(penalty = "mcp"))
)
dc <- distCompare(models = list("L1" = fit1, "BP" = fit2),
target = list(parameters = post_beta, predictions = post_mu))
if(rlang::is_installed(c("ggplot2","ggsci"))) {
plot(dc)
}
}
Plot Function for W_p R^2
Objects
Description
Plot Function for W_p R^2
Objects
Usage
## S4 method for signature 'WPR2'
plot(
x,
xlim = NULL,
ylim = NULL,
linesize = 0.5,
pointsize = 1.5,
facet.group = NULL,
...
)
Arguments
x |
A |
xlim |
x-axis limits |
ylim |
y-axis limits |
linesize |
Linesize for geom_line |
pointsize |
Point size for geom_point |
facet.group |
Group to do facet_grid by |
... |
Currently not used |
Value
a ggplot2::ggplot()
object
Examples
n <- 128
p <- 10
s <- 99
x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p )
beta <- (1:10)/10
y <- x %*% beta + stats::rnorm(n)
post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1)
post_mu <- x %*% post_beta
fit <- WpProj(X=x, eta=post_mu, power = 2.0,
options = list(penalty = "lasso")
)
obj <- WPR2(predictions = post_mu, projected_model = fit)
if (rlang::is_installed("ggplot2")) {
p <- plot(obj)
}
Plot 'combine.distcompare' Objects
Description
Plot 'combine.distcompare' Objects
Usage
## S4 method for signature 'combine_distcompare'
plot(x, ylim = NULL, ylabs = c(NULL, NULL), facet.group = NULL, ...)
Arguments
x |
|
ylim |
y-axis limits |
ylabs |
y-axis labels |
facet.group |
groups to facet by |
... |
additional plotting parameters like alpha |
Value
An object of class plotcombine
that is a list of various diagnostic plots of the WpProj
objects
Plot distcompare
Objects
Description
Plot distcompare
Objects
Usage
## S4 method for signature 'distcompare'
plot(
x = NULL,
models = NULL,
ylim = NULL,
ylabs = c(NULL, NULL),
xlab = NULL,
xlim = NULL,
linesize = 0.5,
pointsize = 1.5,
facet.group = NULL,
...
)
Arguments
x |
object of class |
models |
Can give list of |
ylim |
Limits on y-axis |
ylabs |
Y-axis labels |
xlab |
X-axis labels |
xlim |
Limits of the x-axis |
linesize |
How big to make the lines? |
pointsize |
How big to make the points? |
facet.group |
Should the plots be turned into a facet_grid? |
... |
Additional options for the wasserstein distance if just inputing raw |
Value
A ggplot2
object
Plot the Rankings of the 'combine.distcompare' Objects
Description
Plot the Rankings of the 'combine.distcompare' Objects
Usage
plot_ranks(distances, ylim = NULL, ylabs = c(NULL, NULL), ...)
Arguments
distances |
A |
ylim |
y-axis limits |
ylabs |
y-axis labels |
... |
additional plot arguments like alpha parameter in ggplot2 |
Value
object of class plotrank
which is a list with slots "parameters" and "predictions"
Ranks distcompare
Objects
Description
Ranks distcompare
Objects
Usage
rank_distcompare(distances)
Arguments
distances |
A |
Value
The ranks of a distcompare
object as a list containing slots "predictions" and "parameters".
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- approxOT
Ridge Plots for a Range of Coefficients
Description
This function will plot the distribution of predictions for a range of active coefficients
Usage
ridgePlot(
fit,
index = 1,
minCoef = 1,
maxCoef = 10,
scale = 1,
alpha = 0.5,
full = NULL,
transform = function(x) {
x
},
xlab = "Predictions",
bandwidth = NULL
)
Arguments
fit |
A |
index |
The observation number to select. Can be a vector |
minCoef |
The minimum number of coefficients to use |
maxCoef |
The maximum number of coefficients to use |
scale |
How the densities should be scale |
alpha |
Alpha term from ggplot2 object |
full |
"True" prediction to compare to |
transform |
transform for predictions |
xlab |
x-axis label |
bandwidth |
Bandwidth for kernel |
Value
a ggplot2::ggplot()
plot
Examples
if(rlang::is_installed("stats")) {
n <- 128
p <- 10
s <- 99
x <- matrix(stats::rnorm(n*p), nrow = n, ncol = p )
beta <- (1:10)/10
y <- x %*% beta + stats::rnorm(n)
post_beta <- matrix(beta, nrow=p, ncol=s) + matrix(stats::rnorm(p*s, 0, 0.1), p, s)
post_mu <- x %*% post_beta
fit <- WpProj(X=x, eta=post_mu,
power = 2
)
if(rlang::is_installed(c("ggplot2","ggsci","ggridges"))) {
ridgePlot(fit)
}
}
Options For Use With the Simulated Annealing Selection Method
Description
Options For Use With the Simulated Annealing Selection Method
Usage
simulated_annealing_method_options(
force = NULL,
method = c("binary program", "projection"),
transport.method = transport_options(),
OTmaxit = 0L,
epsilon = 0.05,
maxit = 1L,
temps = 1000L,
max.time = 3600,
proposal.method = c("covariance", "uniform"),
energy.distribution = c("boltzman", "bose-einstein"),
cooling.schedule = c("Geman-Geman", "exponential"),
model.size = NULL,
nvars = NULL,
display.progress = FALSE,
parallel = NULL,
calc.theta = TRUE,
...
)
Arguments
force |
Any covariates to force into the model? Should be by column number or NULL if no variables to force into the model. |
method |
Should covariates be selected as an approximate "binary program" or should a projection method be used. Default is the approximate binary program. |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
epsilon |
A value > 0 for the penalty parameter of if using the Sinkhorn method for optimal transport |
maxit |
Maximum number of iterations per temperature |
temps |
Number of temperatures to try |
max.time |
Maximum time in seconds to run the algorithm |
proposal.method |
The method to propose the next covariate to add. One of "covariance" or "random". "covariance" will randomly select from covariates with probability proportional to the absolute value of the covariance. "uniform" will select covariates uniformly at random. |
energy.distribution |
The energy distribution to use for evaluating proposals. One of "boltzman" or "bose-einstein". Default is "boltzman". |
cooling.schedule |
The schedule to use for cooling temperatures. One of "Geman-Geman" or "exponential". Default is "Geman-Geman". |
model.size |
How many coefficients should the maximum final model have? Ignored if |
nvars |
What model sizes should one check? Should be a numeric vector with maximum less than number of variables or |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is |
parallel |
A cluster backend to be used by |
calc.theta |
Return the linear coefficients? Default is TRUE. |
... |
Not used. |
Value
A named list with the above arguments
Examples
simulated_annealing_method_options()
Options For Use With the Stepwise Selection Method
Description
Options For Use With the Stepwise Selection Method
Usage
stepwise_method_options(
force = NULL,
direction = c("backward", "forward"),
method = c("binary program", "projection"),
transport.method = transport_options(),
OTmaxit = 0,
epsilon = 0.05,
model.size = NULL,
display.progress = FALSE,
parallel = NULL,
calc.theta = TRUE,
...
)
Arguments
force |
Any covariates to force into the model? Should be by column number or NULL if no variables to force into the model. |
direction |
"forward" or "backward" selection? Default is "backward" |
method |
Should covariates be selected as an approximate "binary program" or should a projection method be used. Default is the approximate binary program. |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
epsilon |
A value > 0 for the penalty parameter of if using the Sinkhorn method for optimal transport |
model.size |
How many coefficients should the maximum final model have? |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is FALSE. |
parallel |
A cluster backend to be used by |
calc.theta |
Return the linear coefficients? Default is TRUE. |
... |
Not used |
Value
A named list with the above arguments
Examples
stepwise_method_options()