General-Purpose-Optimization

lessSEM can be used for regularized SEM and for general purpose optimization. That is, you can use all optimizers and penalty functions implemented in lessSEM for your own models. To this end, you must define a fitting function; i.e., a function that takes the parameters as input and returns a single value, the unregularized fit. lessSEM takes this fitting function and adds the penalty terms. The combined function is then optimized. Currently, there are four ways to use the optimizers in lessSEM.

In general, the approaches get faster as you move from 1 to 4. However, the largest performance gains come from implementing a gradient function in addition to the fitting function. As a rule of thumb: Use approach 1 if you intend to run your model only a few times, don’t want to create a new package, and your model runs fairly fast. Use approach 2 if you want to increase the speed a bit while keeping the necessary changes to your files manageable. Use approach 3 or 4 if you are creating a new package, have some experience with RcppArmadillo, and want to get the best performance.

In the following, we will demonstrate these approaches using a linear regression model as an example.

The example

Let’s start by setting up our linear regression model. To this end, we will simulate a data set:

set.seed(123)

# first, we simulate data for our
# linear regression.
N <- 100 # number of persons
p <- 10 # number of predictors
X <- matrix(rnorm(N*p), nrow = N, ncol = p) # design matrix
b <- c(rep(1,4),
       rep(0,6)) # true regression weights
y <- X%*%matrix(b,ncol = 1) + rnorm(N,0,.2)

The first approach: Interfacing from R

We will now try to implement a lasso regularized linear regression using the gpLasso interface. This interface is very similar to optim. To use it, we must define our fitting function in R:

# defining the sum-squared-errors:
sseFun <- function(par, y, X, N){
  # par is the parameter vector
  # y is the observed dependent variable
  # X is the design matrix
  # N is the sample size
  pred <- X %*% matrix(par, ncol = 1) #be explicit here:
  # we need par to be a column vector
  sse <- sum((y - pred)^2)
  # we scale with .5/N to get the same results as glmnet
  return((.5/N)*sse)
}

Additionally, we need a labeled vector with starting values:

par <- rep(0, p+1)
names(par) <- paste0("b", 0:p)
print(par)
#>  b0  b1  b2  b3  b4  b5  b6  b7  b8  b9 b10 
#>   0   0   0   0   0   0   0   0   0   0   0

Note that we defined one more parameter than there are variables in X. This is because we also want to estimate the intercept. To this end, we extend X:

Xext <- cbind(1,X)
head(Xext)
#>      [,1]        [,2]        [,3]       [,4]       [,5]        [,6]        [,7]        [,8]       [,9]      [,10]      [,11]
#> [1,]    1 -0.56047565 -0.71040656  2.1988103 -0.7152422 -0.07355602 -0.60189285  1.07401226 -0.7282191  0.3562833 -1.0141142
#> [2,]    1 -0.23017749  0.25688371  1.3124130 -0.7526890 -1.16865142 -0.99369859 -0.02734697 -1.5404424 -0.6580102 -0.7913139
#> [3,]    1  1.55870831 -0.24669188 -0.2651451 -0.9385387 -0.63474826  1.02678506 -0.03333034 -0.6930946  0.8552022  0.2995937
#> [4,]    1  0.07050839 -0.34754260  0.5431941 -1.0525133 -0.02884155  0.75106130 -1.51606762  0.1188494  1.1529362  1.6390519
#> [5,]    1  0.12928774 -0.95161857 -0.4143399 -0.4371595  0.67069597 -1.50916654  0.79038534 -1.3647095  0.2762746  1.0846170
#> [6,]    1  1.71506499 -0.04502772 -0.4762469  0.3311792 -1.65054654 -0.09514745 -0.21073418  0.5899827  0.1441047 -0.6245675
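
Before handing the fitting function to lessSEM, it can help to confirm that it returns a single number and that its unpenalized minimum is the ordinary least squares solution. The following is a minimal sketch in base R. It repeats the simulated data and sseFun from above so that it runs on its own; the check with optim and lm.fit is our illustration and not part of lessSEM:

```r
set.seed(123)
N <- 100 # number of persons
p <- 10  # number of predictors
X <- matrix(rnorm(N*p), nrow = N, ncol = p)
b <- c(rep(1,4), rep(0,6))
y <- X %*% matrix(b, ncol = 1) + rnorm(N, 0, .2)
Xext <- cbind(1, X) # design matrix with intercept column

sseFun <- function(par, y, X, N){
  pred <- X %*% matrix(par, ncol = 1)
  return((.5/N) * sum((y - pred)^2))
}

par <- rep(0, p+1)
names(par) <- paste0("b", 0:p)

# the fitting function must return a single numeric value:
fitAtStart <- sseFun(par = par, y = y, X = Xext, N = N)
length(fitAtStart) == 1

# without a penalty, minimizing sseFun should recover the
# least squares estimates:
opt <- optim(par = par, fn = sseFun, y = y, X = Xext, N = N, method = "BFGS")
ols <- lm.fit(x = Xext, y = as.vector(y))$coefficients
max(abs(opt$par - ols)) # should be close to zero
```

If the two sets of estimates differ noticeably, the fitting function (or the way the additional arguments are passed) is the first place to look.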

Finally, we need to decide which parameters should be regularized and the values for lambda. We want to regularize everything except for the intercept:

(regularized <- paste0("b", 1:p))
#>  [1] "b1"  "b2"  "b3"  "b4"  "b5"  "b6"  "b7"  "b8"  "b9"  "b10"
lambdas  <- seq(0,.1,length.out = 20)
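
Conceptually, for each value of lambda the optimizer works on the fitting function plus the penalty. As a rough sketch, assuming the lasso penalty enters as lambda times the sum of the absolute values of the regularized parameters (the glmnet-style objective; the exact internal scaling used by lessSEM is not shown in this text), the combined function could be written as follows. The names penalizedFit and toyFit are hypothetical and only serve the illustration:

```r
# hypothetical helper: unregularized fit + lasso penalty
penalizedFit <- function(par, lambda, regularized, fn, ...){
  # only the parameters named in regularized are penalized
  fn(par, ...) + lambda * sum(abs(par[regularized]))
}

# toy fitting function: squared distance to a target vector
toyFit <- function(par, target){
  sum((par - target)^2)
}

toyPar <- c(b0 = 0.5, b1 = -1, b2 = 2)
toyRegularized <- c("b1", "b2") # b0 (the intercept) is not penalized

# unregularized fit: 0.25 + 1 + 4 = 5.25
# penalty: 0.1 * (|-1| + |2|) = 0.3
penalizedValue <- penalizedFit(par = toyPar,
                               lambda = 0.1,
                               regularized = toyRegularized,
                               fn = toyFit,
                               target = c(0, 0, 0))
penalizedValue
```

With lambda = 0, the penalized value equals the unregularized fit, which is why the first row of the results corresponds to the ordinary least squares solution.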

Now, we are ready to estimate the model:

library(lessSEM)
l1 <- gpLasso(par = par, 
              regularized = regularized, 
              fn = sseFun, 
              lambdas = lambdas, 
              X = Xext,
              y = y,
              N = length(y)
)
head(l1@parameters)

#>        lambda alpha theta         b0        b1        b2        b3       b4          b5           b6           b7          b8
#> 1 0.000000000     1     0 0.02738472 1.0129194 0.9991454 0.9705725 1.027626 0.014036009 -0.007460964 0.0185899238 0.021930771
#> 2 0.005263158     1     0 0.02935302 1.0043737 0.9908934 0.9626258 1.025138 0.003365832  0.000000000 0.0143411319 0.015434822
#> 3 0.010526316     1     0 0.02995132 0.9967095 0.9846674 0.9552789 1.021891 0.000000000  0.000000000 0.0096220740 0.010707768
#> 4 0.015789474     1     0 0.03010607 0.9897339 0.9789423 0.9481496 1.018672 0.000000000  0.000000000 0.0049333012 0.006364327
#> 5 0.021052632     1     0 0.03029739 0.9827288 0.9732058 0.9409861 1.015363 0.000000000  0.000000000 0.0001782005 0.002036124
#> 6 0.026315789     1     0 0.03112551 0.9753354 0.9670622 0.9338614 1.011552 0.000000000  0.000000000 0.0000000000 0.000000000
#>             b9         b10
#> 1 -0.009900077 0.027401044
#> 2 -0.007939443 0.022297575
#> 3 -0.005256845 0.017465446
#> 4 -0.002392921 0.012713123
#> 5  0.000000000 0.007969754
#> 6  0.000000000 0.003304080
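
To see how the lasso removes parameters as lambda increases, one can count the non-zero estimates per row. The sketch below assumes that l1@parameters behaves like a regular data frame (the printed output suggests this, but the text does not state it). To keep the block self-contained, it uses three of the columns copied from the output above; with the fitted model, pars would simply be l1@parameters:

```r
# first rows of the parameter table for b5, b6, and b7,
# copied from the output above:
pars <- data.frame(
  lambda = c(0, 0.005263158, 0.010526316, 0.015789474, 0.021052632, 0.026315789),
  b5 = c(0.014036009, 0.003365832, 0, 0, 0, 0),
  b6 = c(-0.007460964, 0, 0, 0, 0, 0),
  b7 = c(0.0185899238, 0.0143411319, 0.0096220740, 0.0049333012, 0.0001782005, 0)
)

# number of these parameters that are not set to zero for each lambda:
nonZero <- rowSums(pars[, c("b5", "b6", "b7")] != 0)
data.frame(lambda = pars$lambda, nonZero = nonZero)
```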

Note that we did not specify the gradients of our function. In this case, lessSEM will use numDeriv to compute the gradients. However, if you know how to specify the gradients, this can result in faster estimation:

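sseGrad <- function(par, y, X, N){
  # gradient of the sum of squared errors with respect to par:
  gradients <- -2.0*t(X) %*% y + 2.0*t(X) %*% X %*% matrix(par, ncol = 1)
  # apply the same .5/N scaling as in sseFun:
  gradients <- (.5/N)*gradients
  # return the gradients as a row vector:
  return(t(gradients))
}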

l1 <- gpLasso(par = par, 
              regularized = regularized, 
              fn = sseFun, 
              gr = sseGrad,
              lambdas = lambdas, 
              X = Xext,
              y = y,
              N = length(y)
)
head(l1@parameters)

#>        lambda alpha theta         b0        b1        b2        b3       b4          b5           b6           b7          b8
#> 1 0.000000000     1     0 0.02738485 1.0129200 0.9991452 0.9705725 1.027626 0.014034994 -0.007460252 0.0185901898 0.021930702
#> 2 0.005263158     1     0 0.02935325 1.0043732 0.9908928 0.9626258 1.025139 0.003364951  0.000000000 0.0143418480 0.015434447
#> 3 0.010526316     1     0 0.02995023 0.9967094 0.9846669 0.9552792 1.021892 0.000000000  0.000000000 0.0096217574 0.010707383
#> 4 0.015789474     1     0 0.03010649 0.9897330 0.9789426 0.9481493 1.018672 0.000000000  0.000000000 0.0049332838 0.006363868
#> 5 0.021052632     1     0 0.03029729 0.9827286 0.9732062 0.9409869 1.015362 0.000000000  0.000000000 0.0001772169 0.002036300
#> 6 0.026315789     1     0 0.03112481 0.9753368 0.9670620 0.9338616 1.011553 0.000000000  0.000000000 0.0000000000 0.000000000
#>             b9         b10
#> 1 -0.009900699 0.027400748
#> 2 -0.007939640 0.022297136
#> 3 -0.005257048 0.017465134
#> 4 -0.002393522 0.012713116
#> 5  0.000000000 0.007969996
#> 6  0.000000000 0.003303777
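
A hand-written gradient function is easy to get wrong, so it is worth comparing it against a numerical approximation before relying on it. The following sketch uses central finite differences in base R; numGrad is a hypothetical helper written for this illustration (numDeriv, which lessSEM falls back on, could be used for the same purpose). The data and functions are repeated so that the block runs on its own:

```r
set.seed(123)
N <- 100
p <- 10
X <- matrix(rnorm(N*p), nrow = N, ncol = p)
y <- X %*% matrix(c(rep(1,4), rep(0,6)), ncol = 1) + rnorm(N, 0, .2)
Xext <- cbind(1, X)

sseFun <- function(par, y, X, N){
  (.5/N) * sum((y - X %*% matrix(par, ncol = 1))^2)
}

sseGrad <- function(par, y, X, N){
  gradients <- -2.0*t(X) %*% y + 2.0*t(X) %*% X %*% matrix(par, ncol = 1)
  t((.5/N) * gradients)
}

# hypothetical helper: central finite differences
numGrad <- function(fn, par, ..., h = 1e-6){
  g <- numeric(length(par))
  for(i in seq_along(par)){
    step <- numeric(length(par))
    step[i] <- h
    g[i] <- (fn(par + step, ...) - fn(par - step, ...)) / (2*h)
  }
  g
}

# compare both gradients at an arbitrary point:
testPar <- rnorm(p + 1)
analytic <- as.vector(sseGrad(testPar, y = y, X = Xext, N = N))
approximated <- numGrad(sseFun, testPar, y = y, X = Xext, N = N)
max(abs(analytic - approximated)) # should be very small
```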

Here is a short comparison of running both models 5 times each:

Runtime in seconds without gradients:

#> [1] 0.2155101 0.2029781 0.1960189 0.2026141 0.2020009

Runtime in seconds with gradients:

#> [1] 0.02016687 0.02007699 0.02003384 0.01954794 0.02730393

That’s quite a speedup!
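
The text does not show how these runtimes were measured. One simple way to obtain comparable numbers, sketched here under the assumption that the elapsed time reported by base R's system.time is precise enough, is to wrap the call in a function and repeat it. The helper timeRuns is hypothetical, and the toy workload only stands in for the gpLasso calls above:

```r
# hypothetical helper: run a function several times and
# return the elapsed time of each run in seconds
timeRuns <- function(run, times = 5){
  vapply(seq_len(times), function(i){
    system.time(run())[["elapsed"]]
  }, numeric(1))
}

# toy workload; replace the body with, e.g., one of the
# gpLasso calls from above
toyRun <- function(){
  x <- matrix(rnorm(200*50), nrow = 200, ncol = 50)
  invisible(crossprod(x))
}

runtimes <- timeRuns(toyRun, times = 5)
runtimes
```

For very short runtimes, single measurements are noisy, so comparing several repetitions (as done above) is more informative than a single run.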

Note that you can also pass C++ functions compiled with Rcpp to gpLasso, just like the R functions above:

library(RcppArmadillo)
library(Rcpp)
linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
double fitfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){
  
  // compute the sum of squared errors:
    arma::mat sse = arma::trans(y-X*parameters)*(y-X*parameters);
    
    // other packages, such as glmnet, scale the sse with 
    // 1/(2*N), where N is the sample size. We will do that here as well
    
    sse *= 1.0/(2.0 * N);
    
    // note: We must return a double, but the sse is a matrix
    // To get a double, just return the single value that is in 
    // this matrix:
      return(sse(0,0));
}

// [[Rcpp::export]]
arma::rowvec gradientfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){
  
  // note: we want to return our gradients as row-vector; therefore,
  // we have to transpose the resulting column-vector:
    arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*parameters);
    
    // other packages, such as glmnet, scale the sse with 
    // 1/(2*N), where N is the sample size. We will do that here as well
    
    gradients *= (.5/N);
    
    return(gradients);
}'

Rcpp::sourceCpp(code = linreg)

Run the model as before:

l1 <- gpLasso(par = par, 
              regularized = regularized, 
              fn = fitfunction, 
              gr = gradientfunction,
              lambdas = lambdas, 
              X = Xext,
              y = y,
              N = length(y)
)
head(l1@parameters)

#>        lambda alpha theta         b0        b1        b2        b3       b4          b5           b6           b7          b8
#> 1 0.000000000     1     0 0.02738527 1.0129194 0.9991448 0.9705722 1.027624 0.014035744 -0.007459583 0.0185892822 0.021930374
#> 2 0.005263158     1     0 0.02935316 1.0043741 0.9908936 0.9626259 1.025139 0.003366124  0.000000000 0.0143421453 0.015435255
#> 3 0.010526316     1     0 0.02995019 0.9967089 0.9846670 0.9552796 1.021892 0.000000000  0.000000000 0.0096214361 0.010707812
#> 4 0.015789474     1     0 0.03010669 0.9897326 0.9789426 0.9481493 1.018672 0.000000000  0.000000000 0.0049330908 0.006364232
#> 5 0.021052632     1     0 0.03029700 0.9827285 0.9732063 0.9409868 1.015362 0.000000000  0.000000000 0.0001764845 0.002037008
#> 6 0.026315789     1     0 0.03112464 0.9753365 0.9670623 0.9338615 1.011553 0.000000000  0.000000000 0.0000000000 0.000000000
#>             b9         b10
#> 1 -0.009900549 0.027401230
#> 2 -0.007938658 0.022297962
#> 3 -0.005256606 0.017465214
#> 4 -0.002393757 0.012713307
#> 5  0.000000000 0.007971003
#> 6  0.000000000 0.003304894

The runtime in seconds with C++ is:

#> [1] 0.01715302 0.01618695 0.01546001 0.01603198 0.01513100

This is even lower than what we had before!

The second approach: Using C++ function pointers

While using the Rcpp functions defined above was quite fast for our linear regression, it can still be fairly slow for more involved models (e.g., SEM). This is due to our optimizer having to go back and forth between R and C++. To reduce this overhead, we can use the second approach. Here, instead of passing an Rcpp function which is then executed in R, we pass a pointer to the underlying C++ functions. This approach is more constrained than the one presented above:

We must define both a fitting function and a gradient function in Rcpp. We cannot rely on numDeriv anymore!
The fitting function and the gradient function are only allowed two arguments each: a const Rcpp::NumericVector& (the parameters) and an Rcpp::List& (everything else). While this seems restrictive, note that we can pass virtually anything we want in a list.
We must create pointers to the fit and gradient functions. This is difficult; however, we will provide some guidance below.

This may be a bit overwhelming at first, so we will go through it step by step.

1. Creating a fitting function and a gradient function

We already defined a fitting function and a gradient function for our linear regression model in the example above. However, we often do not know the gradients in closed form. If you don’t have a gradient function, you can try a numerical approximation. More details can be found here.

2. Adapting the functions to the constraints

Note that our fitting function and our gradient function do not comply with the constraints mentioned above. That is, they take more than two arguments (const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N), and these arguments are not a const Rcpp::NumericVector& and an Rcpp::List&. How can we make this work? The parameter vector const Rcpp::NumericVector& will hold all elements of the arma::colvec parameters of our old function. The Rcpp::List& must contain all of the other elements (X, y, N). Let’s start by creating this list, which we will call data:

data <- list("X" = Xext,
             "y" = y,
             "N" = length(y))

Next, we have to change our functions to make things work:

linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
  // our function now only takes the two specified arguments: a
  // const Rcpp::NumericVector& and an Rcpp::List&.
  // We have to extract all elements from the list:
  arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
  arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
  int N = Rcpp::as<int>(data["N"]); // the sample size
  
  // Next, we want to get the parameters as a column-vector:
    arma::colvec b = Rcpp::as<arma::colvec>(parameters);
    
  // compute the sum of squared errors:
    arma::mat sse = arma::trans(y-X*b)*(y-X*b);
    
    // other packages, such as glmnet, scale the sse with 
    // 1/(2*N), where N is the sample size. We will do that here as well
    
    sse *= 1.0/(2.0 * N);
    
    // note: We must return a double, but the sse is a matrix
    // To get a double, just return the single value that is in 
    // this matrix:
      return(sse(0,0));
}

// [[Rcpp::export]]
arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
    // our function now only takes the two specified arguments: a
  // const Rcpp::NumericVector& and an Rcpp::List&.
  // We have to extract all elements from the list:
  arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
  arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
  int N = Rcpp::as<int>(data["N"]); // the sample size
  
  // Next, we want to get the parameters as a column-vector:
    arma::colvec b = Rcpp::as<arma::colvec>(parameters);
  
  // note: we want to return our gradients as row-vector; therefore,
  // we have to transpose the resulting column-vector:
    arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b);
    
    // other packages, such as glmnet, scale the sse with 
    // 1/(2*N), where N is the sample size. We will do that here as well
    
    gradients *= (.5/N);
    
    return(gradients);
}
'

That’s it, our functions have been transformed!
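
The same transformation can be tried out in plain R without compiling anything. The sketch below mirrors the two-argument pattern (parameters plus a list holding everything else) and checks that it returns the same value as the original four-argument function. The name fitTwoArgs is hypothetical; gpLassoCpp itself still requires the C++ versions shown above:

```r
set.seed(123)
N <- 100
p <- 10
X <- matrix(rnorm(N*p), nrow = N, ncol = p)
y <- X %*% matrix(c(rep(1,4), rep(0,6)), ncol = 1) + rnorm(N, 0, .2)
Xext <- cbind(1, X)

# original version with four arguments:
sseFun <- function(par, y, X, N){
  (.5/N) * sum((y - X %*% matrix(par, ncol = 1))^2)
}

# hypothetical two-argument version: parameters + list
fitTwoArgs <- function(parameters, data){
  # extract everything the function needs from the list:
  y <- data$y # the dependent variable
  X <- data$X # the design matrix
  N <- data$N # the sample size
  b <- matrix(parameters, ncol = 1)
  (.5/N) * sum((y - X %*% b)^2)
}

data <- list("X" = Xext,
             "y" = y,
             "N" = length(y))

# both versions should return the same fit:
testPar <- rnorm(p + 1)
fitOld <- sseFun(testPar, y = y, X = Xext, N = length(y))
fitNew <- fitTwoArgs(testPar, data)
all.equal(fitOld, fitNew)
```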

3. Creating pointers to our functions

This is where it gets really tricky! We can’t just pass our functions to C++. However, we can create pointers to them. These pointers have to be generated in C++, which can be hard to get right. To simplify the process, we have created a function that helps set things up:

cat(lessSEM::makePtrs(fitFunName = "fitfunction", # name of the function in C++
                      gradFunName = "gradientfunction" # name of the function in C++
)
)
#> 
#> // INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS
#> 
#> // IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES
#> // // [[Rcpp::depends(RcppArmadillo)]]
#> // #include <RcppArmadillo.h>
#> 
#> // Dirk Eddelbuettel at
#> // https://gallery.rcpp.org/articles/passing-cpp-function-pointers/
#> 
#> typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters
#>                 Rcpp::List& //additional elements
#> );
#> typedef Rcpp::XPtr<fitFunPtr> fitFunPtr_t;
#> 
#> typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters
#>                       Rcpp::List& //additional elements
#> );
#> typedef Rcpp::XPtr<gradientFunPtr> gradientFunPtr_t;
#> 
#> // [[Rcpp::export]]
#> fitFunPtr_t fitfunctionPtr() {
#>         return(fitFunPtr_t(new fitFunPtr(&fitfunction)));
#> }
#> 
#> // [[Rcpp::export]]
#> gradientFunPtr_t gradientfunctionPtr() {
#>         return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction)));
#> }

Let’s follow the instructions and add the lines to our C++ functions:

linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
  // our function now only takes the two specified arguments: a
  // const Rcpp::NumericVector& and an Rcpp::List&.
  // We have to extract all elements from the list:
  arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
  arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
  int N = Rcpp::as<int>(data["N"]); // the sample size
  
  // Next, we want to get the parameters as a column-vector:
    arma::colvec b = Rcpp::as<arma::colvec>(parameters);
    
  // compute the sum of squared errors:
    arma::mat sse = arma::trans(y-X*b)*(y-X*b);
    
    // other packages, such as glmnet, scale the sse with 
    // 1/(2*N), where N is the sample size. We will do that here as well
    
    sse *= 1.0/(2.0 * N);
    
    // note: We must return a double, but the sse is a matrix
    // To get a double, just return the single value that is in 
    // this matrix:
      return(sse(0,0));
}

// [[Rcpp::export]]
arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
    // our function now only takes the two specified arguments: a
  // const Rcpp::NumericVector& and an Rcpp::List&.
  // We have to extract all elements from the list:
  arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
  arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
  int N = Rcpp::as<int>(data["N"]); // the sample size
  
  // Next, we want to get the parameters as a column-vector:
    arma::colvec b = Rcpp::as<arma::colvec>(parameters);
  
  // note: we want to return our gradients as row-vector; therefore,
  // we have to transpose the resulting column-vector:
    arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b);
    
    // other packages, such as glmnet, scale the sse with 
    // 1/(2*N), where N is the sample size. We will do that here as well
    
    gradients *= (.5/N);
    
    return(gradients);
}

/// THE FOLLOWING PART IS NEW:

// INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS

// IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES
// // [[Rcpp::depends(RcppArmadillo)]]
// #include <RcppArmadillo.h>

// Dirk Eddelbuettel at
// https://gallery.rcpp.org/articles/passing-cpp-function-pointers/
typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters
                Rcpp::List& //additional elements
);
typedef Rcpp::XPtr<fitFunPtr> fitFunPtr_t;

typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters
                      Rcpp::List& //additional elements
);
typedef Rcpp::XPtr<gradientFunPtr> gradientFunPtr_t;

// [[Rcpp::export]]
fitFunPtr_t fitfunctionPtr() {
        return(fitFunPtr_t(new fitFunPtr(&fitfunction)));
}

// [[Rcpp::export]]
gradientFunPtr_t gradientfunctionPtr() {
        return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction)));
}
'

Compile the functions using Rcpp:

Rcpp::sourceCpp(code = linreg)

Great! Now that this is out of the way, we can create the pointers to our functions:

ffp <- fitfunctionPtr() # create the pointer to the fitting function
# Note that the name of this function will depend on the name of your fitting function.
# For instance, if your fitting function is called sse, then the pointer will be created 
# with ffp <- ssePtr()
gfp <- gradientfunctionPtr() # create the pointer to the gradient function
# Note that the name of this function will depend on the name of your gradient function.
# For instance, if your gradient function is called sseGradient, then the pointer will be created 
# with gfp <- sseGradientPtr()

Optimizing the model

The last step is to call the general purpose optimization. To this end, use the gpLassoCpp function:

l1 <- gpLassoCpp(par = par, 
                 regularized = regularized, 
                 # important: pass the pointers!
                 fn = ffp, 
                 gr = gfp, 
                 lambdas = lambdas, 
                 # finally, pass the list which the fitting function and the 
                 # gradient function need:
                 additionalArguments = data
)
head(l1@parameters)

#>        lambda alpha theta         b0        b1        b2        b3       b4          b5           b6           b7          b8
#> 1 0.000000000     1     0 0.02738542 1.0129198 0.9991455 0.9705733 1.027625 0.014037188 -0.007460885 0.0185907073 0.021930984
#> 2 0.005263158     1     0 0.02935271 1.0043736 0.9908928 0.9626259 1.025139 0.003365862  0.000000000 0.0143413234 0.015434697
#> 3 0.010526316     1     0 0.02995027 0.9967094 0.9846668 0.9552792 1.021892 0.000000000  0.000000000 0.0096220212 0.010707439
#> 4 0.015789474     1     0 0.03010668 0.9897329 0.9789425 0.9481493 1.018672 0.000000000  0.000000000 0.0049333261 0.006364019
#> 5 0.021052632     1     0 0.03029739 0.9827288 0.9732059 0.9409868 1.015363 0.000000000  0.000000000 0.0001773121 0.002036026
#> 6 0.026315789     1     0 0.03112461 0.9753368 0.9670620 0.9338617 1.011553 0.000000000  0.000000000 0.0000000000 0.000000000
#>             b9         b10
#> 1 -0.009900023 0.027401255
#> 2 -0.007939417 0.022297387
#> 3 -0.005256688 0.017464767
#> 4 -0.002393495 0.012713141
#> 5  0.000000000 0.007969742
#> 6  0.000000000 0.003303734

Benchmarking this approach results in:

#> [1] 0.01474500 0.01400304 0.01196003 0.01131010 0.01107502

So, we have reduced our runtime even more!