Getting started with cppally

Let’s briefly show some of the capabilities of cppally, from its custom C++ scalars and vectors to its use of templates and concepts.

Setup

Let’s start by loading cppally

library(cppally)

Registering R functions

To make a C++ function available to R we use the [[cppally::register]] tag.

#include <cppally.hpp>
using namespace cppally;

[[cppally::register]]
void hello_world(){
  print("Hello World!");
}

After tagging our functions we want to make them available to R. To do that we have a few routes.

Registering C++ functions outside of a package context

After writing our hello world program in foo.cpp we can use cpp_source() to compile and register the function to R.

cpp_source(file = "src/foo.cpp")

Now the function is available in R

hello_world()
#> Hello World!

Similarly, we can use the helper cpp_eval() to run simple expressions and return the result without needing to include cppally.hpp or register a function.

cpp_eval('print("Hello World Again!")')
#> Hello World Again!

Note - For the rest of the examples it is assumed that the following code is always included beforehand.

#include <cppally.hpp>
using namespace cppally;

Registering C++ functions inside a cppally-linked package

Since cppally is header-only, we can include the headers directly into our own package.

General steps to using cppally in a package

  1. Create a package (if you haven’t already done so) using usethis::create_tidy_package()
  2. Run cppally::use_cppally()
  3. Run cppally::document()

This will automatically add the necessary package content needed to start working with cppally. For continuous development, use cppally::load_all() to compile and register cppally tagged functions, including our hello world function.

Note: We aim to integrate cppally registration into the devtools framework for ease-of-use.

C++ types

cppally offers a rich set of R types in C++ that are NA-aware. This means that common arithmetic and logical operations will account for NA in a similar fashion to R.

logical scalar - r_lgl

cppally’s scalar version of logical, r_lgl, can represent true, false or NA.

r_true
r_false
r_na
#> [1] TRUE
#> [1] FALSE
#> [1] NA

Logical operators work just like in R


[[cppally::register]]
r_vec<r_lgl> lgl_ops(){
  return make_vec<r_lgl>(
    r_true || r_false, // true
    r_true && r_false, // false
    r_na || r_true,    // true
    r_na && r_true,    // NA
    r_na && r_false,   // false
    r_na || r_na,      // NA
    r_na && r_na      // NA
  );
}
lgl_ops()
#> [1]  TRUE FALSE  TRUE    NA FALSE    NA    NA
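The results above follow three-valued (Kleene) logic: a definite value decides the outcome where it can, and NA propagates otherwise. The following standalone sketch illustrates that rule in plain C++. It is not cppally's implementation; lgl3, kTrue, kFalse, kNA, same and kleene_example are hypothetical names, and storing NA as INT_MIN is an assumption borrowed from R's NA_LOGICAL.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for a three-valued logical; not cppally's r_lgl.
struct lgl3 {
    int value;  // 1 = true, 0 = false, INT_MIN = NA (assumed, as in R's NA_LOGICAL)
};

constexpr lgl3 kTrue{1};
constexpr lgl3 kFalse{0};
constexpr lgl3 kNA{INT_MIN};

constexpr bool is_na(lgl3 x) { return x.value == INT_MIN; }
constexpr bool same(lgl3 a, lgl3 b) { return a.value == b.value; }

// AND: a definite false on either side wins, otherwise NA propagates.
constexpr lgl3 operator&&(lgl3 a, lgl3 b) {
    if (a.value == 0 || b.value == 0) return kFalse;
    if (is_na(a) || is_na(b)) return kNA;
    return kTrue;
}

// OR: a definite true on either side wins, otherwise NA propagates.
constexpr lgl3 operator||(lgl3 a, lgl3 b) {
    if (a.value == 1 || b.value == 1) return kTrue;
    if (is_na(a) || is_na(b)) return kNA;
    return kFalse;
}

// Mirrors the seven cases returned by lgl_ops().
inline void kleene_example() {
    assert(same(kTrue || kFalse, kTrue));
    assert(same(kTrue && kFalse, kFalse));
    assert(same(kNA || kTrue, kTrue));
    assert(is_na(kNA && kTrue));
    assert(same(kNA && kFalse, kFalse));
    assert(is_na(kNA || kNA));
    assert(is_na(kNA && kNA));
}
```

Note that overloaded && and || always evaluate both operands, so the sketch gives up short-circuit evaluation in exchange for NA handling.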

Using r_lgl in if-statements

For type safety, r_lgl cannot be implicitly converted to bool. The one exception is the condition of an if-statement, where the conversion is allowed but throws an error if the value is NA.

DON’T do this:


[[cppally::register]]
void bad_lgl_print(r_lgl condition){
  if (condition){
    print("true");
  } else {
    print("false");
  }
}
bad_lgl_print(TRUE)
#> true
bad_lgl_print(FALSE)
#> false
bad_lgl_print(NA) # Can't implicitly convert NA to bool
#> Error:
#> ! Cannot implicitly convert r_lgl NA to bool, please check

DO this:


[[cppally::register]]
void good_lgl_print(r_lgl condition){
  if (is_na(condition)){
    print("NA");
  } else if (condition){
    print("true");
  } else {
    print("false");
  }
}
good_lgl_print(TRUE)
#> true
good_lgl_print(FALSE)
#> false
good_lgl_print(NA) # NA is handled explicitly so no issues
#> NA

We can also use r_lgl members is_true() and is_false() which return bool and are equivalent to R’s isTRUE() and isFALSE()


[[cppally::register]]
void also_good_lgl_print(r_lgl condition){
  if (condition.is_true()){
    print("true");
  } else {
    print("not true");
  }
}
also_good_lgl_print(TRUE)
#> true
also_good_lgl_print(FALSE)
#> not true
also_good_lgl_print(NA) # Falls into 'not true' branch here as expected
#> not true
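One way to get this kind of behaviour in standard C++ is an explicit conversion operator: it blocks ordinary implicit conversions but is still applied in contextual conversions such as an if-condition. The sketch below shows that general technique only. Whether cppally uses it is an assumption, and checked_lgl, describe and checked_lgl_example are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical NA-aware logical; not cppally's r_lgl.
struct checked_lgl {
    int value;  // 1 = true, 0 = false, INT_MIN = NA (assumed encoding)

    // `explicit` rejects `bool b = x;`, but `if (x)` still compiles
    // because an if-condition is a contextual conversion to bool.
    explicit operator bool() const {
        if (value == INT_MIN) {
            throw std::runtime_error("cannot convert NA to bool");
        }
        return value != 0;
    }
};

// There is no implicit conversion from checked_lgl to bool.
static_assert(!std::is_convertible_v<checked_lgl, bool>);

inline std::string describe(checked_lgl x) {
    if (x.value == INT_MIN) return "NA";  // handle NA before branching
    if (x) return "true";                 // contextual conversion, safe here
    return "false";
}

inline void checked_lgl_example() {
    assert(describe(checked_lgl{1}) == "true");
    assert(describe(checked_lgl{0}) == "false");
    assert(describe(checked_lgl{INT_MIN}) == "NA");
}
```

In plain C++ the same contextual conversion also applies to while-conditions, the ! operator and the built-in && and || operators, so a library that wants to restrict it further needs additional measures.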

All cppally scalar types are implemented as structs that contain the underlying C/C++ types as well as other member functions.

cppally type   Description                   Implicitly converts to
r_lgl          Scalar logical                bool (only in if-statements)
r_int          Scalar integer                int
r_int64        Scalar 64-bit integer         int64_t
r_dbl          Scalar double                 double
r_str          Scalar string                 SEXP
r_cplx         Scalar double complex         std::complex<double>
r_raw          Scalar raw                    unsigned char
r_sym          Symbol                        SEXP
r_date [1]     Scalar date                   double
r_psxct        Scalar date-time              double
r_sexp         Generic R object (SEXP) [2]   SEXP

NA values can be accessed via the template function na<T>

C++ NA values and their R C API equivalents

Type         Value                R C API value    constexpr? [3]
r_lgl        na<r_lgl>()/r_na     NA_LOGICAL       Yes
r_int        na<r_int>()          NA_INTEGER       Yes
r_int64      na<r_int64>()        Not applicable   Yes
r_dbl        na<r_dbl>()          NA_REAL          Yes
r_str        na<r_str>()          NA_STRING        No
r_cplx       na<r_cplx>()         Not applicable   Yes
r_sym        Not applicable       Not applicable   No
r_sexp [4]   na<r_sexp>/r_null    R_NilValue       No
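To illustrate how a templated NA accessor can be a compile-time constant, here is a standalone sketch using plain int and double. It is not cppally's na<T>; na_value, is_na_value and na_example are hypothetical names. The int sentinel matches R's NA_INTEGER (INT_MIN), while using an arbitrary quiet NaN for double is a simplifying assumption, since R's NA_REAL is a NaN with a specific payload.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <limits>

// Primary template is deleted: only the specialised types have an NA value.
template <typename T>
constexpr T na_value() = delete;

template <>
constexpr int na_value<int>() {
    return INT_MIN;  // same value as R's NA_INTEGER
}

template <>
constexpr double na_value<double>() {
    // Assumption: any NaN stands in for NA in this sketch.
    return std::numeric_limits<double>::quiet_NaN();
}

constexpr bool is_na_value(int x) { return x == INT_MIN; }
inline bool is_na_value(double x) { return std::isnan(x); }

// The integer sentinel is usable in constant expressions.
static_assert(na_value<int>() == INT_MIN);

inline void na_example() {
    assert(is_na_value(na_value<int>()));
    assert(is_na_value(na_value<double>()));
    assert(!is_na_value(42));
    assert(!is_na_value(1.5));
}
```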

Vectors

cppally vectors are templated and can be thought of as containers of scalar elements like r_int, r_dbl, etc.

We can create vectors like so


// Integer vector of size n
[[cppally::register]]
r_vec<r_int> new_integer_vector(int n){
  r_vec<r_int> int_vctr(n, /*fill = */ r_int(0));
  return int_vctr;
}
new_integer_vector(3)
#> [1] 0 0 0

Inline vectors

To create inline vectors, use make_vec<>

make_vec<r_dbl>(1, 1.5, 2, na<r_dbl>())
#> [1] 1.0 1.5 2.0  NA

We can add names on the fly with arg()


make_vec<r_dbl>(
    arg("first") = 1,
    arg("second") = 1.5,
    arg("third") = 2,
    arg("last") = na<r_dbl>()
  )
#>  first second  third   last 
#>    1.0    1.5    2.0     NA

In R a list is a generic vector, so cppally defines lists as r_vec<r_sexp>, a vector of the generic type r_sexp.

make_vec<r_sexp>(1, 2, 3)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3

A list of all cppally vectors of length 0


[[cppally::register]]
r_vec<r_sexp> all_vectors(){
  return make_vec<r_sexp>(
    arg("logical") = r_vec<r_lgl>(),
    arg("integer") = r_vec<r_int>(),
    arg("integer64") = r_vec<r_int64>(), // Requires bit64
    arg("double") = r_vec<r_dbl>(),
    arg("character") = r_vec<r_str>(),
    arg("character") = r_vec<r_str_view>(),
    arg("raw") = r_vec<r_raw>(),
    arg("date") = r_vec<r_date>(),
    arg("date-time") = r_vec<r_psxct>(),
    arg("list") = r_vec<r_sexp>()
  );
}
all_vectors()
#> $logical
#> logical(0)
#> 
#> $integer
#> integer(0)
#> 
#> $integer64
#> integer64(0)
#> 
#> $double
#> numeric(0)
#> 
#> $character
#> character(0)
#> 
#> $character
#> character(0)
#> 
#> $raw
#> raw(0)
#> 
#> $date
#> Date of length 0
#> 
#> $`date-time`
#> POSIXct of length 0
#> 
#> $list
#> list()

Concepts and Templates

One of the most powerful features of C++20 is concepts. They allow users to write human-readable templates and constraints.

When writing your own templates, it is highly encouraged to place them in headers for cppally registration to work correctly.

Let’s practice by creating an absolute function in C++ using templates and the RMathType concept.


template <RMathType T>
[[cppally::register]]
T cpp_abs(T x){
  if (is_na(x)) return na<T>();

  if (x < 0){
    return -x;
  } else {
    return x;
  }
}

Works correctly for doubles

cpp_abs(-5)
#> [1] 5
cpp_abs(0)
#> [1] 0
cpp_abs(100)
#> [1] 100
cpp_abs(NA_real_)
#> [1] NA

It also works for integers

cpp_abs(-3L)
#> [1] 3
cpp_abs(NA_integer_)
#> [1] NA

The top line, template <RMathType T>, declares a template whose parameter T must satisfy RMathType, a concept that covers r_lgl, r_int, r_int64 and r_dbl.

If x is NA then we immediately also return NA via na<T>() which is a templated function that returns NA of the input type T.

Without templates, writing C++ functions that accept flexible inputs is quite difficult because C++ is a statically-typed language. Usually one would write one absolute function for doubles and another for integers whereas here we don’t have to.

Notes on templates

To correctly register templates, the ‘[[cppally::register]]’ tag must always go directly above the function signature, after the template declaration.

template <typename T>
[[cppally::register]] // <--- Here
T foo(T x){
  return x;
}

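Explicit instantiation (from R) is unfortunately not possible, so template types must be deduced from the supplied arguments. For example, suppose we try to register a template whose parameter appears only in the return type: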

template <typename T>
[[cppally::register]]
T foo(){
    return T();
}

You may get a cryptic compiler error like this

error: no matching function for call to 'foo()'
[]<typename T>() -> decltype(cpp_to_sexp(::foo())) {

along with an equally cryptic note

note:   couldn't deduce template parameter 'T'
[]<typename T>() -> decltype(cpp_to_sexp(::foo())) {

This is because the parameter T cannot be automatically deduced from any of the function inputs. Even though these kinds of templates can be written with cppally, they cannot be exported to R.

An obvious and somewhat ugly workaround is to include a prototype argument from which the template parameter can be deduced.


// Return the default constructor result of RScalar types

template <RScalar T>
[[cppally::register]]
T scalar_default(T ptype){
    return T();
}
scalar_default(integer(1)) # Default is 0L
#> [1] 0
scalar_default(numeric(1)) # Default is 0.0
#> [1] 0
scalar_default(character(1)) # Default is ""
#> [1] ""
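The deduction rule behind this workaround is ordinary C++ and can be seen without cppally. In the sketch below, make_default, make_default_like and deduction_example are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// T appears only in the return type, so it can never be deduced:
// callers must spell it out, e.g. make_default<int>().
template <typename T>
T make_default() {
    return T();
}

// The otherwise unused prototype parameter lets the compiler deduce T
// from the argument at the call site.
template <typename T>
T make_default_like(T /*ptype*/) {
    return T();
}

inline void deduction_example() {
    assert(make_default<double>() == 0.0);              // explicit template argument
    assert(make_default_like(42) == 0);                 // T deduced as int
    assert(make_default_like(std::string("x")).empty()); // T deduced as std::string
}
```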

Exporting variadic templates is also not supported. The best alternative is to use lists (r_vec<r_sexp>).

In the above example we used the RScalar concept, which includes all cppally scalar types (excluding r_sexp). For a list of all cppally concepts, please see the Annex.

Coercion

To coerce from one scalar to another we can use as<T>


[[cppally::register]]
r_int double_to_int(r_dbl x){
  return as<r_int>(x);
}
double_to_int(pi)
#> [1] 3
double_to_int(NA_real_)
#> [1] NA

We can also coerce from one vector type to another


[[cppally::register]]
r_vec<r_int> to_int_vec(r_vec<r_dbl> x){
  return as<r_vec<r_int>>(x);
}
to_int_vec(c(0, 1.5, NA))
#> [1]  0  1 NA
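The outputs above show two properties of this coercion: fractional parts are truncated and NA stays NA. A standalone sketch of such a conversion on plain C++ types follows. It is not cppally's as<T>; dbl_to_int, dbl_to_int_vec, kNaInt and coercion_example are hypothetical names, NaN stands in for a double NA, and mapping out-of-range values to NA is an assumption modelled on R's as.integer().

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cmath>
#include <vector>

constexpr int kNaInt = INT_MIN;  // integer NA sentinel, as in R's NA_INTEGER

inline int dbl_to_int(double x) {
    // NaN (standing in for NA) and values that do not fit in an int become NA.
    if (std::isnan(x) ||
        x <= static_cast<double>(INT_MIN) ||
        x >= static_cast<double>(INT_MAX) + 1.0) {
        return kNaInt;
    }
    return static_cast<int>(x);  // truncates toward zero
}

inline std::vector<int> dbl_to_int_vec(const std::vector<double>& x) {
    std::vector<int> out(x.size());
    std::transform(x.begin(), x.end(), out.begin(),
                   [](double v) { return dbl_to_int(v); });
    return out;
}

inline void coercion_example() {
    assert(dbl_to_int(3.14159) == 3);
    assert(dbl_to_int(-1.5) == -1);
    assert(dbl_to_int(std::nan("")) == kNaInt);
    assert(dbl_to_int(1e10) == kNaInt);
}
```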

Since as<T> is extremely flexible, we can also coerce from a scalar to a vector or vice versa


[[cppally::register]]
r_vec<r_sexp> coercions(){
    r_dbl a(4.2);
    r_vec<r_dbl> b = make_vec<r_dbl>(2.5);
    return make_vec<r_sexp>(
        as<r_vec<r_int>>(a),
        as<r_int>(a),
        as<r_int>(b),
        as<r_dbl>(b)
    );
}
coercions()
#> [[1]]
#> [1] 4
#> 
#> [[2]]
#> [1] 4
#> 
#> [[3]]
#> [1] 2
#> 
#> [[4]]
#> [1] 2.5

Strings

cppally provides the useful string type r_str

We can create R strings easily

r_str("hello")
#> [1] "hello"

To get a C or C++ string, use the members c_str() and cpp_str() respectively

C string via c_str()

r_str("hello").c_str()
#> [1] "hello"

C++ string_view via cpp_str()

This can be converted into a std::string via its constructor


[[cppally::register]]
r_str str_concatenate(r_str x, r_str y, r_str sep){
  std::string left = std::string(x.cpp_str());
  std::string right = std::string(y.cpp_str());
  std::string middle = std::string(sep.cpp_str());
  std::string combined = left + middle + right;
  return r_str(combined.c_str());
}
str_concatenate("hello", "how are you?", sep = ", ")
#> [1] "hello, how are you?"
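The conversions to std::string in str_concatenate() are needed because, up to C++23, std::string_view has no operator+. As a possible alternative, the views can be appended into a single buffer. The sketch below uses only the standard library; join_with and join_example are hypothetical names, and it assumes the views come from something like cpp_str().

```cpp
#include <cassert>
#include <string>
#include <string_view>

inline std::string join_with(std::string_view x, std::string_view y,
                             std::string_view sep) {
    std::string out;
    out.reserve(x.size() + sep.size() + y.size());  // one allocation up front
    out.append(x).append(sep).append(y);            // append accepts string_view
    return out;
}

inline void join_example() {
    assert(join_with("hello", "how are you?", ", ") == "hello, how are you?");
    assert(join_with("a", "b", "") == "ab");
}
```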

Symbols

Symbols have class r_sym and can be created directly from a string literal

r_sym("new_symbol")
#> new_symbol

Or from a cppally string

r_sym(r_str("symbol_from_string"))
#> symbol_from_string

Cached strings & symbols

cppally provides an efficient caching strategy for constructing cppally strings/symbols from string literals

cached_str<>

cached_str<"cached_string">()
#> [1] "cached_string"

This initialises the string once, caches it (to R’s CHARSXP pool), and efficiently re-uses the cached string for each subsequent call.

We can cache symbols in a similar way

cached_sym<"cached_symbol">()
#> cached_symbol

Lists

r_sexp is generally interpreted as an “element of a list” since lists are defined as r_vec<r_sexp>, a vector that holds generic r_sexp elements.


using list = r_vec<r_sexp>;

[[cppally::register]]
list new_list(int n){
  return list(n);
}
new_list(0)
#> list()
new_list(3)
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL

The problem with a class like r_sexp is that it is by design generic and therefore difficult to work with in C++. To disambiguate the actual type we can use visit_vector() or visit_sexp() via a C++ lambda.

Example: using visit_vector() to resize every vector to length n in-place


[[cppally::register]]
r_vec<r_sexp> resize_all(r_vec<r_sexp> x, r_size_t n){
    r_size_t list_length = x.length();
    for (r_size_t i = 0; i < list_length; ++i){
        visit_vector(x.view(i), [&](auto vec) {
            x.set(i, vec.resize(n));
        });
    }
    return x;
}
# Resize to size 1
resize_all(list(1:5, letters), n = 1)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "a"

When we pass a non-vector to visit_vector, it aborts and explains that the input must be a vector

resize_all(list(mean_fn = mean), 1)
#> Error:
#> ! `x` must be a vector to be instantiated from an `r_sexp`

visit_sexp

This allows us to visit more types than just vectors, including factors, symbols and (soon to be implemented) data frames. When an object’s type can’t be resolved to a distinct type, r_sexp is returned.

Example: Same example as above but with visit_sexp()


[[cppally::register]]
r_vec<r_sexp> resize_all2(r_vec<r_sexp> x, r_size_t n){
    r_size_t list_length = x.length();
    for (r_size_t i = 0; i < list_length; ++i){
        visit_sexp(x.view(i), [&](auto vec) {
          using vec_t = decltype(vec); // type of object `vec`
          if constexpr (RVector<vec_t>){
            x.set(i, vec.resize(n));
          } else {
            abort("Cannot resize a non-vector");
          }
        });
    }
    return x;
}
# Resize to size 1
resize_all2(list(1:5, letters), n = 1)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "a"
resize_all2(list(mean_fn = mean), n = 1)
#> Error:
#> ! Cannot resize a non-vector
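The combination of a generic lambda, decltype and if constexpr used above is the same pattern as standard C++ visitation with std::variant and std::visit. For comparison, here is a standalone sketch of that pattern. It does not involve r_sexp, and element, not_a_vector, is_std_vector and resize_each are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

struct not_a_vector {};  // placeholder for anything that cannot be resized

using element =
    std::variant<std::vector<int>, std::vector<std::string>, not_a_vector>;

// Compile-time check: is T a std::vector of something?
template <typename T>
inline constexpr bool is_std_vector = false;
template <typename T>
inline constexpr bool is_std_vector<std::vector<T>> = true;

inline void resize_each(std::vector<element>& xs, std::size_t n) {
    for (element& x : xs) {
        std::visit([n](auto& held) {
            using held_t = std::decay_t<decltype(held)>;  // concrete type held
            if constexpr (is_std_vector<held_t>) {
                held.resize(n);
            } else {
                throw std::invalid_argument("Cannot resize a non-vector");
            }
        }, x);
    }
}
```

The lambda body is instantiated once per alternative, so the branch that does not apply to a given type is discarded at compile time instead of being checked at run time.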

Factors

We can create a factor via r_factors()


[[cppally::register]]
r_factors new_factor(r_vec<r_str> x){
    return r_factors(x);
}
new_factor(letters)
#>  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
#> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

As in R, factors in cppally are not vectors and therefore do not satisfy the RVector concept. To access the underlying integer codes vector, use the public codes() member function.


static_assert(!RVector<r_factors>);

[[cppally::register]]
r_vec<r_int> factor_codes(r_factors x){
    return x.codes();
}
letter_fct <- new_factor(letters)

letter_fct |>
    factor_codes()
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26
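Conceptually, a factor pairs a vector of levels with 1-based integer codes that index into it. The sketch below models that layout with standard containers, using sorted unique values as levels in the way base R's factor() does by default. It is not how r_factors is implemented; simple_factor, make_factor and factor_example are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct simple_factor {
    std::vector<std::string> levels;  // sorted unique values
    std::vector<int> codes;           // 1-based positions into `levels`
};

inline simple_factor make_factor(const std::vector<std::string>& x) {
    simple_factor out;
    out.levels = x;
    std::sort(out.levels.begin(), out.levels.end());
    out.levels.erase(std::unique(out.levels.begin(), out.levels.end()),
                     out.levels.end());
    out.codes.reserve(x.size());
    for (const std::string& s : x) {
        // Binary search for the level, then convert to a 1-based code.
        auto it = std::lower_bound(out.levels.begin(), out.levels.end(), s);
        out.codes.push_back(static_cast<int>(it - out.levels.begin()) + 1);
    }
    return out;
}

inline void factor_example() {
    simple_factor f = make_factor({"b", "a", "b"});
    assert((f.levels == std::vector<std::string>{"a", "b"}));
    assert((f.codes == std::vector<int>{2, 1, 2}));
}
```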

Attributes

Attributes can be manipulated via functions defined in the attr namespace.

Example: Converting a list of samples to a data frame


[[cppally::register]]
r_vec<r_sexp> list_as_df(r_vec<r_sexp> x){

  r_size_t n = x.length();

  if (n_unique(x.lengths()) > 1){
    abort("List must have vectors of equal length to be converted to a data frame");
  }

  r_vec<r_str> names(attr::get_attr(x, cached_sym<"names">()));
  if (names.is_null()){
     abort("list must have names to be converted to a data frame");
  }

  r_vec<r_sexp> out = shallow_copy(x);

  int nrow = 0;
  r_vec<r_int> row_names;
  if (n > 0){
    nrow = out.view(0).length();
    row_names = make_vec<r_int>(na<r_int>(), -nrow);
  }

  attr::set_attr(out, cached_sym<"row.names">(), row_names);
  attr::set_attr(out, cached_sym<"class">(), make_vec<r_str>("data.frame"));
  return out;
}
set.seed(42)
norm_samples <- lapply(1:5, \(x) rnorm(10, mean = x))
names(norm_samples) <- paste0("sample_", 1:5)
list_as_df(norm_samples)
#>     sample_1   sample_2 sample_3 sample_4 sample_5
#> 1  2.3709584  3.3048697 2.693361 4.455450 5.205999
#> 2  0.4353018  4.2866454 1.218692 4.704837 4.638943
#> 3  1.3631284  0.6111393 2.828083 5.035104 5.758163
#> 4  1.6328626  1.7212112 4.214675 3.391074 4.273295
#> 5  1.4042683  1.8666787 4.895193 4.504955 3.631719
#> 6  0.8938755  2.6359504 2.569531 2.282991 5.432818
#> 7  2.5115220  1.7157471 2.742731 3.215541 4.188607
#> 8  0.9053410 -0.6564554 1.236837 3.149092 6.444101
#> 9  3.0184237 -0.4404669 3.460097 1.585792 4.568554
#> 10 0.9372859  3.3201133 2.360005 4.036123 5.655648

More useful attribute helpers

Sugar functions

cppally also offers many useful and high-performance common functions in cppally/sugar

Example: n_unique() - fast calculation of number of unique values.


template <RVector T>
[[cppally::register]]
r_int cpp_n_unique(T x){
  return as<r_int>(n_unique(x));
}
library(bench)
x <- sample(1:100, 10^5, replace = TRUE)
mark(
  base_n_unique = length(unique(x)),
  cppally_n_unique = cpp_n_unique(x)
)
#> # A tibble: 2 × 6
#>   expression            min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 base_n_unique       553µs    734µs     1247.    1.38MB     34.7
#> 2 cppally_n_unique    171µs    214µs     4149.        0B      0
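For readers curious about what counting unique values involves, a hash set gives a simple baseline in standard C++. This is only an illustrative sketch and makes no claim about the algorithm cppally's n_unique() uses; count_unique and unique_example are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Insert every element into a hash set; duplicates collapse,
// so the set size is the number of distinct values.
template <typename T>
std::size_t count_unique(const std::vector<T>& x) {
    std::unordered_set<T> seen(x.begin(), x.end());
    return seen.size();
}

inline void unique_example() {
    std::vector<int> values{1, 2, 2, 3, 3, 3};
    assert(count_unique(values) == 3);
    assert(count_unique(std::vector<int>{}) == 0);
}
```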

More useful sugar functions

Scalar math functions

There is a rich suite of math functions. Some examples include min(), max(), round(), log(), floor(), ceiling() and more.

Stats sugar functions

Some statistical summary functions that are all very highly optimised for speed

Annex

Symbols in R-registered templates

r_sym is unsupported in templates when it’s part of a template argument but is supported when the argument is explicitly an r_sym.

[[cppally::register]]
r_str symbol_to_string(r_sym x){
    return as<r_str>(x);
}
hello_world_symbol <- as.symbol("hello world!")
hello_world_symbol
#> `hello world!`
symbol_to_string(hello_world_symbol)
#> [1] "hello world!"

All core cppally concepts

Other useful type traits

Accessing the underlying types and values

While it is generally recommended not to access the underlying objects, you can do so with unwrap() which returns the underlying C/C++ value. For example, unwrap(r_int(5)) will return an int of value 5.

To access the underlying type, use unwrap_t<> which always aligns with unwrap()

The main reason for wanting to access underlying values would likely be optimisation, and unwrap() and unwrap_t<> allow this to be done consistently.
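The relationship between an unwrap function and a matching type alias can be sketched on a toy wrapper in plain C++. This only illustrates the idea of keeping the two aligned and is not cppally's definition; boxed_dbl, boxed_int, unwrap_value, unwrapped_t and sum_unwrapped are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Toy scalar wrappers holding a plain C++ value.
struct boxed_dbl { double value; };
struct boxed_int { int value; };

// Return the plain value held by a wrapper.
template <typename T>
constexpr auto unwrap_value(const T& x) {
    return x.value;
}

// The type unwrap_value() returns for a wrapper T; defined in terms of
// the function so the two can never disagree.
template <typename T>
using unwrapped_t = decltype(unwrap_value(std::declval<T>()));

static_assert(std::is_same_v<unwrapped_t<boxed_dbl>, double>);
static_assert(std::is_same_v<unwrapped_t<boxed_int>, int>);

// Accumulate in the underlying type rather than in the wrapper type.
inline double sum_unwrapped(const std::vector<boxed_dbl>& x) {
    unwrapped_t<boxed_dbl> total = 0.0;
    for (const boxed_dbl& v : x) total += unwrap_value(v);
    return total;
}
```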

Example: Summing a double vector using r_vec<T>::data() member


[[cppally::register]]
double primitive_sum(const r_vec<r_dbl>& x){

  // r_vec<T>::data_type is always the element type T
  using data_t = typename std::remove_cvref_t<decltype(x)>::data_type;

  using primitive_t = unwrap_t<data_t>;
  primitive_t *p_x = x.data();

  r_size_t n = x.length();
  double sum = 0;

  OMP_SIMD_REDUCTION1(+:sum)
  for (r_size_t i = 0; i < n; ++i){
    sum += p_x[i];
  }
  return sum;
}
x <- rnorm(10^5)
primitive_sum(x)
#> [1] -467.8787

  1. Unlike r_str which is composite and holds an r_sexp member, r_date and r_psxct instead inherit directly from r_dbl. This means that they can implicitly convert to r_dbl↩︎

  2. r_sexp represents a generic R object which can include cppally vectors. We will explain how to disambiguate r_sexp later which is most useful when working with lists and data frames↩︎

  3. In C++, constexpr is a keyword declaring that a value can be evaluated at compile time, meaning it is known before any code is run by the user. Since r_na is stored internally as the smallest possible int (INT_MIN, the same value as R’s NA_LOGICAL), which never changes and is known a priori, it is a compile-time constant.↩︎

  4. Having an NA sentinel for r_sexp is very useful when writing templates involving vectors, so r_null fills that role. This does not make is_na(r_null) true, and that is intentional: r_null is not a scalar and therefore cannot be NA. Because r_null represents the absence of a tangible R object, it can be thought of as a zero-length object, whereas every NA value in R is a length-1 vector.↩︎