Introduction to ‘armadillo’

Motivations

The development of armadillo emerges from the desire to follow a simplified approach towards R and C++ integration by building on top of cpp11, a ground up rewrite of C++ bindings to R with different design trade-offs and features. armadillo aims at providing an additional layer to put the end-user focus on the computation instead of configuration (Vaughan, Hester, and François 2023).

Armadillo is a linear algebra library for the C++ language, aiming towards a good balance between speed and ease of use. It is justified in the fact that C++, in its current form, is very valuable to address bottlenecks that we find with interpreted languages such as R and Python but it does not provide data structures nor functions for linear algebra (Sanderson and Curtin 2016).

RcppArmadillo was first published to CRAN in 2010, and it allows to use Armadillo via Rcpp, a widely extended R package to call C++ functions from R (Eddelbuettel and Sanderson 2014).

Design choices

The design choices in armadillo are:

These ideas reflect a comprehensive effort to provide an efficient interface for integrating C++ and R that aligns with the Tidy philosophy (Wickham et al. 2019), addressing both the technical and community-driven aspects that influence software evolution.

These choices have advantages and disadvantages. A disadvantage is that armadillo will not convert data types automatically, the user must be explicit about data types, especially when passing data from R to C++ and then exporting the final computation back to R. An advantage is that armadillo codes, including its internal templates, can be adapted to work with Python via pybind11 (Jakob, Rhinelander, and Moldovan 2016).

armadillo uses Hansen (2022) notation, meaning that matrices are column-major and vectors are expressed as column vectors (i.e., \(N\times1\) matrices).

Examples

Convention: input R matrices are denoted by x, y, z, and output or intermediate C++ matrices are denoted by X, Y, Z. The example functions can be called from R scripts and should have proper headers as in the following code:

#include <armadillo.hpp>
#include <cpp11.hpp>
#include <cpp11armadillo.hpp>

using namespace arma;
using namespace cpp11;

[[cpp11::register]] // allows using the function in R
doubles_matrix<> solve_mat(doubles_matrix<> x) {
  Mat<double> Y = as_Mat(x); // convert from R to C++
  Mat<double> Yinv = inv(Y); // Y^(-1)
  return as_doubles_matrix(Yinv); // convert from C++ to R
}

This example includes the Armadillo, cpp11 and armadillo libraries, and allows interfacing C++ with R (i.e., the #include <cpp11.hpp>). It also loads the corresponding namespaces (i.e., the using namespace cpp11) in order to simplify the notation (i.e., using Mat instead of arma::Mat).

The as_Mat() function is provided by armadillo to pass a matrix object from R to C++ and that Armadillo can read.

The as_doubles_matrix() function is also provided by armadillo to pass a Mat<double> or Mat<int> object from C++ to R.

Ordinary Least Squares

Given a design matrix \(X\) and and outcome vector \(y\), one function to obtain the OLS estimator \(\hat{\beta} = (X^tX)^{-1}(X^tY)\) as a matrix (i.e., column vector) is:

Mat<double> ols_(const doubles_matrix<>& y, const doubles_matrix<>& x) {
  Mat<double> Y = as_Mat(y);  // Col<double> Y = as_Col(y); also works
  Mat<double> X = as_Mat(x);

  Mat<double> XtX = X.t() * X;             // X'X
  Mat<double> XtX_inv = inv(XtX);          // (X'X)^(-1)
  Mat<double> beta = XtX_inv * X.t() * Y;  // (X'X)^(-1)(X'Y)

  return beta;
}

[[cpp11::register]] doubles_matrix<> ols_mat(const doubles_matrix<>& y,
                                             const doubles_matrix<>& x) {
  Mat<double> beta = ols_(y, x);
  return as_doubles_matrix(beta);
}

[[cpp11::register]] doubles ols_dbl(const doubles_matrix<>& y,
                                    const doubles_matrix<>& x) {
  Mat<double> beta = ols_(y, x);
  return as_doubles(beta);
}

The ols_mat() function receives inputs from R and calls ols_() to do the computation on C++ side. The use of const and & are specific to the C++ language and allow to pass data from R to C++ while avoiding copying the data, therefore saving time and memory.

The ols_dbl() function does the same but returns a vector instead of a matrix.

Benchmarks

A proper benchmark is to compute eigenvalues for large matrices. Both armadillo and RcppArmadillo use Armadillo as a backend, and the marginal observed differences are because of how cpp11 and Rcpp pass data from R to C++ and viceversa. The computation times are identical.

Input Median time armadillo Median time RcppArmadillo
500x500 35.07ms 36.4ms
1000x1000 260.28ms 263.21ms
1500x1500 874.62ms 857.31ms
2000x2000 2.21s 2.21s
Input Memory allocation armadillo Memory allocation RcppArmadillo
500x500 17.1KB 4.62MB
1000x1000 21KB 4.62MB
1500x1500 24.9KB 4.63MB
2000x2000 28.8KB 4.63MB

The armadillo computation was obtained with the eigen_sym_mat() function already shown.

The RcppArmadillo computation was obtained with the following function:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;

// [[Rcpp::export]]
arma::mat eigen_sym_mat(const arma::mat& x) {
  arma::mat y = eig_sym(x);
  return y;
}

In order to get the RcppArmadillo function to work, we had to dedicate time to search online about the error function 'enterRNGScope' not provided by package 'Rcpp', which required to include // [[Rcpp::depends(RcppArmadillo)]] for the function to work.

Additional Examples

The package repository includes the directory armadillotest, which contains an R package that uses Armadillo, and that provides additional examples for eigenvalues, QR decomposition, and others.

Conclusion

RcppArmadillo has been and will continue to be widely successful. armadillo is a alternative templated implementation with different design trade-offs and features. Both packages can co-exist and continue to enrich the R community.

References

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics & Data Analysis 71 (March): 1054–63. https://doi.org/10.1016/j.csda.2013.02.005.

Hansen, Bruce. 2022. Econometrics. Princeton University Press.

Jakob, Wenzel, Jason Rhinelander, and Dean Moldovan. 2016. Pybind11 — Seamless Operability Between C++11 and Python. https://github.com/pybind/pybind11.

Sanderson, Conrad, and Ryan Curtin. 2016. “Armadillo: A Template-Based C++ Library for Linear Algebra.” Journal of Open Source Software 1 (2): 26. https://doi.org/10.21105/joss.00026.

Vaughan, Davis, Jim Hester, and Romain François. 2023. Cpp11: A C++11 Interface for R’s c Interface. https://CRAN.R-project.org/package=cpp11.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.