RcppSimdJson: Rcpp Bindings for the simdjson Header Library

Motivation

simdjson by Daniel Lemire (with contributions by Geoff Langdale, John Keiser and many others) is an engineering marvel. Through very clever use of SIMD instructions, it manages to parse JSON files faster than disk access. Wut? Yes, you read that right: parallel processing with so little overhead that the net throughput is limited only by disk speed.

Moreover, it is implemented in neat modern C++ and can be accessed as a header-only library. (Well, one library in two files, really.) Which makes R packaging easy and convenient and compelling. So here we are.

For further introduction, see the arXiv paper by Langdale and Lemire (out/to appear in VLDB Journal 28(6) as well) and/or the video of the recent talk by Daniel Lemire at QCon (voted best talk).

Example

jsonfile <- system.file("jsonexamples", "twitter.json", package="RcppSimdJson")
library(RcppSimdJson)
validateJSON(jsonfile)                  # validate a JSON file
res <- fload(jsonfile)                  # parse a JSON file

Comparison

A simple parsing benchmark against four other R-accessible JSON parsers:

R> res
Unit: milliseconds
     expr      min       lq     mean   median       uq       max neval  cld
 simdjson  1.87118  2.03252  2.24351  2.17228  2.27756   6.57145   100 a
  jsonify  8.91694  9.20124  9.58652  9.46077  9.73692  13.41707   100  b
  RJSONIO 10.49187 11.09410 11.69109 11.42555 11.95780  17.93653   100  b
   ndjson 27.04830 28.62251 31.44330 29.51343 32.05847 146.88221   100   c
 jsonlite 34.93334 36.54784 38.67843 37.74890 40.19555  46.32444   100    d
R>
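
The column layout above is what the microbenchmark package prints, so a comparison of this kind might be set up roughly as follows. This is a minimal sketch under stated assumptions, not the script that produced the table: the input file (the bundled twitter.json), the restriction to two of the five parsers, and the `times` value are all illustrative choices, and timings will differ by machine.

```r
library(microbenchmark)

# Assumption: benchmark against the example file shipped with the package
jsonfile <- system.file("jsonexamples", "twitter.json", package = "RcppSimdJson")

# Time each parser reading the same file; only two parsers shown for brevity
bench <- microbenchmark(
    simdjson = RcppSimdJson::fload(jsonfile),
    jsonlite = jsonlite::fromJSON(jsonfile),
    times = 100L
)
print(bench)
```

The other parsers can be added as further named expressions in the same call, each reading the same file.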

Or in chart form: [chart of the benchmark timings above]

Status

All three major OSs are supported, and JSON can be parsed from file and string under a variety of settings. A C++17 compiler is required for ease of setup (though the upstream library can fall back to older compilers; one can edit src/Makevars accordingly if need be).
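
As a minimal sketch of the file and string entry points: this assumes an installed version of the package that exports `fparse()` for in-memory strings alongside `fload()` for files, and that `fparse()` accepts a `max_simplify_lvl` argument. The sample JSON literal is made up for illustration.

```r
library(RcppSimdJson)

# Parse from a string: a small, made-up JSON document
json <- '{"name": "simdjson", "tags": ["fast", "json"], "stars": 3}'
parsed <- fparse(json)
str(parsed)

# One of the available settings (assumed argument): keep arrays as lists
# rather than simplifying them to atomic vectors
unsimplified <- fparse(json, max_simplify_lvl = "list")

# Parse from a file: the twitter.json example shipped with the package
jsonfile <- system.file("jsonexamples", "twitter.json", package = "RcppSimdJson")
tweets <- fload(jsonfile)
names(tweets)
```

See `?fparse` and `?fload` in the installed version for the arguments actually available.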

Contributing

Any problems, bug reports, or feature requests for the package can be submitted and handled most conveniently as GitHub issues in the repository.

Before submitting pull requests, it is frequently preferable to first discuss need and scope in such an issue ticket. See the file Contributing.md (in the Rcpp repo) for a brief discussion.

See Also

For standard JSON work on R, as well as for other nicely done C++ libraries, consider these:

Author

For the R package, Dirk Eddelbuettel and Brendan Knapp.

For everything pertaining to simdjson, Daniel Lemire (and many contributors).