lessSEM can be used for regularized SEM and for general purpose optimization. That is, you can use all optimizers and penalty functions implemented in lessSEM for your own models. To this end, you must define a fitting function; i.e., a function which takes in the parameters and returns a single value: the unregularized fit. lessSEM adds the penalty terms to this fitting function and optimizes the combined function. Currently, there are four ways to use the optimizers in lessSEM:
1. Use the R interface (see, e.g., ?lessSEM::gpLasso).
2. Implement your model in C++ with Rcpp and pass function pointers to the optimizers (see, e.g., ?lessSEM::gpLassoCpp).
3. Use the C++ optimizer interface of lessSEM in your own package.
4. Use the underlying lesstimate C++ library directly in your own package.
In general, the approaches get faster as you transition from 1 to 4. You will see the largest performance gains, however, when implementing a gradient function and not just a fitting function. As a rule of thumb:
Use approach 1 if you intend to run your model only a few times, don’t want to create a new package, and your model runs fairly fast.
Use approach 2 if you want to increase the speed a bit while keeping the necessary changes to your files manageable.
Use approach 3 or 4 if you create a new package, have some experience with RcppArmadillo, and want to get the best performance.
In the following, we will demonstrate these approaches using a linear regression model as an example.
Let’s start by setting up our linear regression model. To this end, we will simulate a data set:
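The simulation code itself is not reproduced here. The following is a minimal sketch of what it might look like; the seed, sample size, and number of predictors are inferred from the printed output further down, and the true coefficients (four effects of 1, six of 0) are an assumption based on the parameter estimates.

```r
# Assumed settings: inferred from the printed output, not taken from the text
set.seed(123)
N <- 100  # sample size
p <- 10   # number of predictors

# assumed true coefficients: four non-zero effects, six zeros
b <- c(rep(1, 4), rep(0, 6))

# design matrix with standard-normal predictors
X <- matrix(rnorm(N * p), nrow = N, ncol = p)

# dependent variable: linear predictor plus standard-normal noise
y <- X %*% matrix(b, ncol = 1) + rnorm(N)
```

The objects N, p, X, and y are the ones used in the remainder of this example.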
We will now implement a lasso-regularized linear regression using the gpLasso interface. This interface is very similar to optim. To use it, we must define our fitting function in R:
# defining the sum-squared-errors:
sseFun <- function(par, y, X, N){
  # par is the parameter vector
  # y is the observed dependent variable
  # X is the design matrix
  # N is the sample size
  pred <- X %*% matrix(par, ncol = 1) # be explicit here:
  # we need par to be a column vector
  sse <- sum((y - pred)^2)
  # we scale with .5/N to get the same results as glmnet
  return((.5/N)*sse)
}
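To make concrete what "adding the penalty terms" means, the combined objective for the lasso can be sketched in a few lines of R. The helper penalizedFit is hypothetical and not part of lessSEM; it also assumes that the penalty enters unscaled as lambda times the sum of absolute values of the regularized parameters. The optimizers in lessSEM treat the non-smooth penalty in a specialized way rather than evaluating a function like this one.

```r
# unregularized fit (same definition as above, repeated to keep the sketch self-contained)
sseFun <- function(par, y, X, N){
  pred <- X %*% matrix(par, ncol = 1)
  (.5/N) * sum((y - pred)^2)
}

# hypothetical helper: unregularized fit plus lasso penalty
penalizedFit <- function(par, regularized, lambda, ...){
  sseFun(par, ...) + lambda * sum(abs(par[regularized]))
}

# small toy data set, only used to evaluate the two functions
set.seed(1)
Xtoy <- cbind(1, matrix(rnorm(20), ncol = 2))
ytoy <- rnorm(10)
parToy <- c(b0 = 0.5, b1 = -1, b2 = 2)

unpenalized <- sseFun(parToy, y = ytoy, X = Xtoy, N = 10)
penalized <- penalizedFit(parToy, regularized = c("b1", "b2"), lambda = 0.1,
                          y = ytoy, X = Xtoy, N = 10)

# the difference is the penalty: lambda * (|b1| + |b2|)
penalized - unpenalized
```

Note that the intercept b0 does not contribute to the penalty because it is not listed in regularized.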
Additionally, we need a labeled vector with starting values:
par <- rep(0, p+1)
names(par) <- paste0("b", 0:p)
print(par)
#> b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10
#> 0 0 0 0 0 0 0 0 0 0 0
Note that we defined one more parameter than there are variables in X. This is because we also want to estimate the intercept. To this end, we extend X:
Xext <- cbind(1,X)
head(Xext)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#> [1,] 1 -0.56047565 -0.71040656 2.1988103 -0.7152422 -0.07355602 -0.60189285 1.07401226 -0.7282191 0.3562833 -1.0141142
#> [2,] 1 -0.23017749 0.25688371 1.3124130 -0.7526890 -1.16865142 -0.99369859 -0.02734697 -1.5404424 -0.6580102 -0.7913139
#> [3,] 1 1.55870831 -0.24669188 -0.2651451 -0.9385387 -0.63474826 1.02678506 -0.03333034 -0.6930946 0.8552022 0.2995937
#> [4,] 1 0.07050839 -0.34754260 0.5431941 -1.0525133 -0.02884155 0.75106130 -1.51606762 0.1188494 1.1529362 1.6390519
#> [5,] 1 0.12928774 -0.95161857 -0.4143399 -0.4371595 0.67069597 -1.50916654 0.79038534 -1.3647095 0.2762746 1.0846170
#> [6,] 1 1.71506499 -0.04502772 -0.4762469 0.3311792 -1.65054654 -0.09514745 -0.21073418 0.5899827 0.1441047 -0.6245675
Finally, we need to decide which parameters should be regularized and the values for lambda. We want to regularize everything except for the intercept:
(regularized <- paste0("b", 1:p))
#> [1] "b1" "b2" "b3" "b4" "b5" "b6" "b7" "b8" "b9" "b10"
lambdas <- seq(0,.1,length.out = 20)
Now, we are ready to estimate the model:
library(lessSEM)
l1 <- gpLasso(par = par,
regularized = regularized,
fn = sseFun,
lambdas = lambdas,
X = Xext,
y = y,
N = length(y)
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8
#> 1 0.000000000 1 0 0.02738472 1.0129194 0.9991454 0.9705725 1.027626 0.014036009 -0.007460964 0.0185899238 0.021930771
#> 2 0.005263158 1 0 0.02935302 1.0043737 0.9908934 0.9626258 1.025138 0.003365832 0.000000000 0.0143411319 0.015434822
#> 3 0.010526316 1 0 0.02995132 0.9967095 0.9846674 0.9552789 1.021891 0.000000000 0.000000000 0.0096220740 0.010707768
#> 4 0.015789474 1 0 0.03010607 0.9897339 0.9789423 0.9481496 1.018672 0.000000000 0.000000000 0.0049333012 0.006364327
#> 5 0.021052632 1 0 0.03029739 0.9827288 0.9732058 0.9409861 1.015363 0.000000000 0.000000000 0.0001782005 0.002036124
#> 6 0.026315789 1 0 0.03112551 0.9753354 0.9670622 0.9338614 1.011552 0.000000000 0.000000000 0.0000000000 0.000000000
#> b9 b10
#> 1 -0.009900077 0.027401044
#> 2 -0.007939443 0.022297575
#> 3 -0.005256845 0.017465446
#> 4 -0.002392921 0.012713123
#> 5 0.000000000 0.007969754
#> 6 0.000000000 0.003304080
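Because the interface mirrors optim, the unregularized solution (the row with lambda = 0) can be cross-checked with base R alone. The sketch below assumes the simulation settings inferred from the printed output (seed 123, N = 100, p = 10) and compares an optim fit of sseFun with ordinary least squares from lm.

```r
# assumed simulation settings; see the hedging note above
set.seed(123)
N <- 100
p <- 10
X <- matrix(rnorm(N * p), nrow = N, ncol = p)
y <- as.numeric(X %*% c(rep(1, 4), rep(0, 6)) + rnorm(N))
Xext <- cbind(1, X)

sseFun <- function(par, y, X, N){
  pred <- X %*% matrix(par, ncol = 1)
  (.5/N) * sum((y - pred)^2)
}

par <- rep(0, p + 1)
names(par) <- paste0("b", 0:p)

# minimize the unregularized fit with optim; the additional arguments
# y, X, and N are passed on to sseFun just as in gpLasso
fitOptim <- optim(par = par, fn = sseFun, method = "BFGS",
                  control = list(reltol = 1e-12),
                  y = y, X = Xext, N = N)

# ordinary least squares as reference
fitLm <- unname(coef(lm(y ~ X)))

# largest absolute difference between the two sets of estimates
maxDiff <- max(abs(unname(fitOptim$par) - fitLm))
print(maxDiff)
```

If the two solutions agree, the scaling with .5/N only changes the value of the fit, not the location of its minimum.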
Note that we did not specify the gradients of our function. In this case, lessSEM will use numDeriv to compute the gradients. However, if you know how to specify the gradients, this can result in faster estimation:
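sseGrad <- function(par, y, X, N){
  # gradients of the sum of squared errors with respect to par
  gradients <- -2.0*t(X) %*% y + 2.0*t(X) %*% X %*% matrix(par, ncol = 1)
  # apply the same .5/N scaling as in sseFun
  gradients <- (.5/N)*gradients
  # return the gradients as a row vector
  return(t(gradients))
}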
l1 <- gpLasso(par = par,
regularized = regularized,
fn = sseFun,
gr = sseGrad,
lambdas = lambdas,
X = Xext,
y = y,
N = length(y)
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8
#> 1 0.000000000 1 0 0.02738485 1.0129200 0.9991452 0.9705725 1.027626 0.014034994 -0.007460252 0.0185901898 0.021930702
#> 2 0.005263158 1 0 0.02935325 1.0043732 0.9908928 0.9626258 1.025139 0.003364951 0.000000000 0.0143418480 0.015434447
#> 3 0.010526316 1 0 0.02995023 0.9967094 0.9846669 0.9552792 1.021892 0.000000000 0.000000000 0.0096217574 0.010707383
#> 4 0.015789474 1 0 0.03010649 0.9897330 0.9789426 0.9481493 1.018672 0.000000000 0.000000000 0.0049332838 0.006363868
#> 5 0.021052632 1 0 0.03029729 0.9827286 0.9732062 0.9409869 1.015362 0.000000000 0.000000000 0.0001772169 0.002036300
#> 6 0.026315789 1 0 0.03112481 0.9753368 0.9670620 0.9338616 1.011553 0.000000000 0.000000000 0.0000000000 0.000000000
#> b9 b10
#> 1 -0.009900699 0.027400748
#> 2 -0.007939640 0.022297136
#> 3 -0.005257048 0.017465134
#> 4 -0.002393522 0.012713116
#> 5 0.000000000 0.007969996
#> 6 0.000000000 0.003303777
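Before relying on a hand-written gradient, it can be worth checking it against a finite-difference approximation. The following sketch does this in base R; numGrad is a hypothetical helper written for this illustration (lessSEM itself relies on numDeriv when no gradient is supplied), and the toy data are made up.

```r
sseFun <- function(par, y, X, N){
  pred <- X %*% matrix(par, ncol = 1)
  (.5/N) * sum((y - pred)^2)
}

sseGrad <- function(par, y, X, N){
  gradients <- -2.0*t(X) %*% y + 2.0*t(X) %*% X %*% matrix(par, ncol = 1)
  t((.5/N) * gradients)
}

# hypothetical helper: central finite differences, one parameter at a time
numGrad <- function(fn, par, ..., eps = 1e-6){
  vapply(seq_along(par), function(i){
    step <- rep(0, length(par))
    step[i] <- eps
    (fn(par + step, ...) - fn(par - step, ...)) / (2 * eps)
  }, numeric(1))
}

# toy data for the check
set.seed(42)
Xtoy <- cbind(1, matrix(rnorm(50 * 3), ncol = 3))
ytoy <- rnorm(50)
parToy <- c(0.1, -0.5, 1, 2)

analytic <- as.numeric(sseGrad(parToy, y = ytoy, X = Xtoy, N = 50))
approximate <- numGrad(sseFun, parToy, y = ytoy, X = Xtoy, N = 50)

# should be close to zero if the analytic gradient is correct
maxGradDiff <- max(abs(analytic - approximate))
print(maxGradDiff)
```

A large difference usually points to a sign error or a missing scaling factor in the analytic gradient.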
Here is a short comparison of running both models 5 times each:
Runtime in seconds without gradients:
#> [1] 0.2155101 0.2029781 0.1960189 0.2026141 0.2020009
Runtime in seconds with gradients:
#> [1] 0.02016687 0.02007699 0.02003384 0.01954794 0.02730393
That’s quite a speedup!
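The benchmark code is not shown here. One way such run times could be collected is sketched below; timeRuns is a hypothetical helper based on system.time, demonstrated on a toy computation so that the block runs without lessSEM.

```r
# hypothetical helper: run f() several times and record the elapsed seconds
timeRuns <- function(f, times = 5){
  vapply(seq_len(times),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

# toy workload, only used to demonstrate the helper
runtimes <- timeRuns(function() sum(sqrt(seq_len(1e5))))
print(runtimes)

# With lessSEM loaded and the objects defined above, the comparison could be run as:
# timeRuns(function() gpLasso(par = par, regularized = regularized, fn = sseFun,
#                             lambdas = lambdas, X = Xext, y = y, N = length(y)))
# timeRuns(function() gpLasso(par = par, regularized = regularized, fn = sseFun,
#                             gr = sseGrad, lambdas = lambdas, X = Xext, y = y,
#                             N = length(y)))
```

Elapsed times for very short computations are noisy, so repeating each run several times, as done here, gives a better impression than a single measurement.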
Note that you can also pass a C++ function to gpLasso similar to the approach above:
library(RcppArmadillo)
library(Rcpp)
linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
double fitfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){
// compute the sum of squared errors:
arma::mat sse = arma::trans(y-X*parameters)*(y-X*parameters);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
sse *= 1.0/(2.0 * N);
// note: We must return a double, but the sse is a matrix
// To get a double, just return the single value that is in
// this matrix:
return(sse(0,0));
}
// [[Rcpp::export]]
arma::rowvec gradientfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){
// note: we want to return our gradients as row-vector; therefore,
// we have to transpose the resulting column-vector:
arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*parameters);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
gradients *= (.5/N);
return(gradients);
}'
Rcpp::sourceCpp(code = linreg)
Run the model as before:
l1 <- gpLasso(par = par,
regularized = regularized,
fn = fitfunction,
gr = gradientfunction,
lambdas = lambdas,
X = Xext,
y = y,
N = length(y)
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8
#> 1 0.000000000 1 0 0.02738527 1.0129194 0.9991448 0.9705722 1.027624 0.014035744 -0.007459583 0.0185892822 0.021930374
#> 2 0.005263158 1 0 0.02935316 1.0043741 0.9908936 0.9626259 1.025139 0.003366124 0.000000000 0.0143421453 0.015435255
#> 3 0.010526316 1 0 0.02995019 0.9967089 0.9846670 0.9552796 1.021892 0.000000000 0.000000000 0.0096214361 0.010707812
#> 4 0.015789474 1 0 0.03010669 0.9897326 0.9789426 0.9481493 1.018672 0.000000000 0.000000000 0.0049330908 0.006364232
#> 5 0.021052632 1 0 0.03029700 0.9827285 0.9732063 0.9409868 1.015362 0.000000000 0.000000000 0.0001764845 0.002037008
#> 6 0.026315789 1 0 0.03112464 0.9753365 0.9670623 0.9338615 1.011553 0.000000000 0.000000000 0.0000000000 0.000000000
#> b9 b10
#> 1 -0.009900549 0.027401230
#> 2 -0.007938658 0.022297962
#> 3 -0.005256606 0.017465214
#> 4 -0.002393757 0.012713307
#> 5 0.000000000 0.007971003
#> 6 0.000000000 0.003304894
The runtime in seconds with C++ is:
#> [1] 0.01715302 0.01618695 0.01546001 0.01603198 0.01513100
Which is even lower than what we had before!
While using the Rcpp functions defined above was quite fast for our linear regression, it can still be fairly slow for more involved models (e.g., SEM). This is because the optimizer has to go back and forth between R and C++. To reduce this overhead, we can use the second approach. Here, instead of passing an Rcpp function which is then executed in R, we pass a pointer to the underlying C++ function. This approach is more constrained than the one presented above: the fitting function must return a double, the gradient function must return an arma::rowvec, and both may take only two arguments, a const Rcpp::NumericVector& (the parameters) and an Rcpp::List& (everything else). While this seems restrictive, note that we can pass virtually anything we want in a list.
This may be a bit overwhelming at first, so we will go through it step by step.
We already defined a fitting function and a gradient function for our linear regression model in the example above. However, we often do not know the gradients in closed form. If you don’t have a gradient function, you can try a numerical approximation. More details can be found here.
Note that our fitting function and our gradient function do not comply with the constraints mentioned above. That is, they take more than two arguments (const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N), and these arguments are not a const Rcpp::NumericVector& and an Rcpp::List&. How can we make this work? The const Rcpp::NumericVector& will hold all elements of the arma::colvec parameters of our old function. The Rcpp::List& must contain all of the other elements (X, y, N). Let’s start by creating this list, which we will call data:
data <- list("y" = y, "X" = Xext, "N" = length(y))
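As a plain-R illustration of this refactoring, the list and a two-argument wrapper might look as follows. This is only a sketch: fitfunctionR is a hypothetical R analogue of the C++ signature, the list element names follow the names accessed in the C++ code (y, X, N), and the data are simulated under the settings inferred from the printed output.

```r
# unregularized fit with four arguments, as used in the R interface
sseFun <- function(par, y, X, N){
  pred <- X %*% matrix(par, ncol = 1)
  (.5/N) * sum((y - pred)^2)
}

# hypothetical R analogue of the two-argument C++ signature:
# the parameters come first, everything else is packed into a list
fitfunctionR <- function(parameters, data){
  sseFun(par = parameters, y = data$y, X = data$X, N = data$N)
}

# assumed simulation settings
set.seed(123)
N <- 100
p <- 10
X <- matrix(rnorm(N * p), nrow = N, ncol = p)
y <- X %*% matrix(c(rep(1, 4), rep(0, 6)), ncol = 1) + rnorm(N)
Xext <- cbind(1, X)

# the list holding everything except the parameters
data <- list("y" = y, "X" = Xext, "N" = length(y))

par <- rep(0, p + 1)

# both versions return the same fit value
sameFit <- all.equal(fitfunctionR(par, data), sseFun(par, y, Xext, N))
print(sameFit)
```

The C++ functions below follow the same idea: they first unpack the list and then proceed exactly as before.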
Next, we have to change our functions to make things work:
linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// compute the sum of squared errors:
arma::mat sse = arma::trans(y-X*b)*(y-X*b);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
sse *= 1.0/(2.0 * N);
// note: We must return a double, but the sse is a matrix
// To get a double, just return the single value that is in
// this matrix:
return(sse(0,0));
}
// [[Rcpp::export]]
arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// note: we want to return our gradients as row-vector; therefore,
// we have to transpose the resulting column-vector:
arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
gradients *= (.5/N);
return(gradients);
}
'
That’s it, our functions have been transformed!
This is where it gets really tricky! We can’t just pass our functions to C++. However, we can create pointers to them. These pointers have to be generated in C++, which can be difficult to get right. To simplify the process, we have created a function which helps set things up:
cat(lessSEM::makePtrs(fitFunName = "fitfunction", # name of the function in C++
gradFunName = "gradientfunction" # name of the function in C++
)
)
#>
#> // INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS
#>
#> // IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES
#> // // [[Rcpp::depends(RcppArmadillo)]]
#> // #include <RcppArmadillo.h>
#>
#> // Dirk Eddelbuettel at
#> // https://gallery.rcpp.org/articles/passing-cpp-function-pointers/
#>
#> typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters
#> Rcpp::List& //additional elements
#> );
#> typedef Rcpp::XPtr<fitFunPtr> fitFunPtr_t;
#>
#> typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters
#> Rcpp::List& //additional elements
#> );
#> typedef Rcpp::XPtr<gradientFunPtr> gradientFunPtr_t;
#>
#> // [[Rcpp::export]]
#> fitFunPtr_t fitfunctionPtr() {
#> return(fitFunPtr_t(new fitFunPtr(&fitfunction)));
#> }
#>
#> // [[Rcpp::export]]
#> gradientFunPtr_t gradientfunctionPtr() {
#> return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction)));
#> }
Let’s follow the instructions and add the lines to our C++ functions:
linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// compute the sum of squared errors:
arma::mat sse = arma::trans(y-X*b)*(y-X*b);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
sse *= 1.0/(2.0 * N);
// note: We must return a double, but the sse is a matrix
// To get a double, just return the single value that is in
// this matrix:
return(sse(0,0));
}
// [[Rcpp::export]]
arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// note: we want to return our gradients as row-vector; therefore,
// we have to transpose the resulting column-vector:
arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
gradients *= (.5/N);
return(gradients);
}
/// THE FOLLOWING PART IS NEW:
// INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS
// IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES
// // [[Rcpp::depends(RcppArmadillo)]]
// #include <RcppArmadillo.h>
// Dirk Eddelbuettel at
// https://gallery.rcpp.org/articles/passing-cpp-function-pointers/
typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters
Rcpp::List& //additional elements
);
typedef Rcpp::XPtr<fitFunPtr> fitFunPtr_t;
typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters
Rcpp::List& //additional elements
);
typedef Rcpp::XPtr<gradientFunPtr> gradientFunPtr_t;
// [[Rcpp::export]]
fitFunPtr_t fitfunctionPtr() {
return(fitFunPtr_t(new fitFunPtr(&fitfunction)));
}
// [[Rcpp::export]]
gradientFunPtr_t gradientfunctionPtr() {
return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction)));
}
'
Compile the functions using Rcpp:
Rcpp::sourceCpp(code = linreg)
Great! Now that this is out of the way, we can create the pointers to our functions:
ffp <- fitfunctionPtr() # create the pointer to the fitting function
# Note that the name of this function will depend on the name of your fitting function.
# For instance, if your fitting function is called sse, then the pointer will be created
# with ffp <- ssePtr()
gfp <- gradientfunctionPtr() # create the pointer to the gradient function
# Note that the name of this function will depend on the name of your gradient function.
# For instance, if your gradient function is called sseGradient, then the pointer will be created
# with gfp <- sseGradientPtr()
The last step is to call the general purpose optimization. To this end, use the gpLassoCpp function:
l1 <- gpLassoCpp(par = par,
regularized = regularized,
# important: pass the pointers!
fn = ffp,
gr = gfp,
lambdas = lambdas,
# finally, pass the list which the fitting function and the
# gradient function need:
additionalArguments = data
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4 b5 b6 b7 b8
#> 1 0.000000000 1 0 0.02738542 1.0129198 0.9991455 0.9705733 1.027625 0.014037188 -0.007460885 0.0185907073 0.021930984
#> 2 0.005263158 1 0 0.02935271 1.0043736 0.9908928 0.9626259 1.025139 0.003365862 0.000000000 0.0143413234 0.015434697
#> 3 0.010526316 1 0 0.02995027 0.9967094 0.9846668 0.9552792 1.021892 0.000000000 0.000000000 0.0096220212 0.010707439
#> 4 0.015789474 1 0 0.03010668 0.9897329 0.9789425 0.9481493 1.018672 0.000000000 0.000000000 0.0049333261 0.006364019
#> 5 0.021052632 1 0 0.03029739 0.9827288 0.9732059 0.9409868 1.015363 0.000000000 0.000000000 0.0001773121 0.002036026
#> 6 0.026315789 1 0 0.03112461 0.9753368 0.9670620 0.9338617 1.011553 0.000000000 0.000000000 0.0000000000 0.000000000
#> b9 b10
#> 1 -0.009900023 0.027401255
#> 2 -0.007939417 0.022297387
#> 3 -0.005256688 0.017464767
#> 4 -0.002393495 0.012713141
#> 5 0.000000000 0.007969742
#> 6 0.000000000 0.003303734
Benchmarking this approach results in:
#> [1] 0.01474500 0.01400304 0.01196003 0.01131010 0.01107502
So, we have reduced our runtime even more!
The remaining approaches, which call the optimizers directly from C++, require a more elaborate setup, which is why we have created a whole package to demonstrate them. You will find more information in the vignette The-optimizer-interface and in the lessLM package. If you just want the optimizers and don’t want to depend on the lessSEM package, we recommend that you copy the lesstimate C++ library into your package’s inst/include folder.
It will come to the same parameter estimates:
#> b0 b1 b2 b3 b4 b5 b6 b7 b8 b9
#> [1,] 0.02734701 1.0129361 0.9991629 0.9705501 1.027728 0.013993181 -0.007491533 0.0186210155 0.021974963 -0.009975776
#> [2,] 0.02939675 1.0043635 0.9908681 0.9626493 1.025035 0.003400965 0.000000000 0.0143089363 0.015388336 -0.007860992
#> [3,] 0.02998680 0.9967117 0.9846504 0.9552960 1.021803 0.000000000 0.000000000 0.0095927088 0.010679371 -0.005183552
#> [4,] 0.03006777 0.9897315 0.9789595 0.9481301 1.018774 0.000000000 0.000000000 0.0049636831 0.006397664 -0.002474334
#> [5,] 0.03032345 0.9827556 0.9731799 0.9409662 1.015441 0.000000000 0.000000000 0.0001374354 0.002100692 0.000000000
#> [6,] 0.03111085 0.9753325 0.9670615 0.9338506 1.011607 0.000000000 0.000000000 0.0000000000 0.000000000 0.000000000
#> b10
#> [1,] 0.027466466
#> [2,] 0.022231196
#> [3,] 0.017406513
#> [4,] 0.012779758
#> [5,] 0.008033475
#> [6,] 0.003352431
And the run times are even lower:
#> [1] 0.002497911 0.001735926 0.001685858 0.001821041 0.001929998