RcppSimdJson: Rcpp Bindings for the simdjson Header Library
Motivation
simdjson by Daniel Lemire (with contributions by Geoff Langdale, John Keiser and many others) is an engineering marvel. Through very clever use of SIMD instructions, it manages to parse JSON files faster than disk access. Wut? Yes, you read that right: parallel processing with so little overhead that the net throughput is limited only by disk speed.
Moreover, it is implemented in neat modern C++ and can be accessed as a header-only library. (Well, one library in two files, really.) Which makes R packaging easy and convenient and compelling. So here we are.
For further introduction, see the arXiv paper by Langdale and Lemire (out/to appear in VLDB Journal 28(6) as well) and/or the video of the recent talk by Daniel Lemire at QCon (voted best talk).
Example
jsonfile <- system.file("jsonexamples", "twitter.json", package="RcppSimdJson")
library(RcppSimdJson)
validateJSON(jsonfile) # validate a JSON file
res <- fload(jsonfile) # parse a JSON file
Comparison
A simple parsing benchmark against four other R-accessible JSON parsers:
R> res
Unit: milliseconds
     expr       min        lq      mean    median        uq       max neval cld
 simdjson   1.87118   2.03252   2.24351   2.17228   2.27756   6.57145   100   a
  jsonify   8.91694   9.20124   9.58652   9.46077   9.73692  13.41707   100   b
  RJSONIO  10.49187  11.09410  11.69109  11.42555  11.95780  17.93653   100   b
   ndjson  27.04830  28.62251  31.44330  29.51343  32.05847 146.88221   100   c
 jsonlite  34.93334  36.54784  38.67843  37.74890  40.19555  46.32444   100   d
R>
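The exact benchmark script is not shown here. As a rough sketch of how such a comparison could be set up, the snippet below times only two of the parsers on the bundled twitter.json file. It assumes the microbenchmark and jsonlite packages are installed; the choice of two parsers and the use of file-based parsing are assumptions for illustration, not the setup that produced the table above.

```r
library(microbenchmark)

# Example file shipped with the package (same file as in the Example section)
jsonfile <- system.file("jsonexamples", "twitter.json", package = "RcppSimdJson")

# Assumption: both parsers read directly from the file path;
# the original benchmark may have parsed from a string instead
res <- microbenchmark(
  simdjson = RcppSimdJson::fload(jsonfile),
  jsonlite = jsonlite::fromJSON(jsonfile),
  times = 100L
)

print(res)
```

Timings depend on hardware, compiler flags and package versions, so the numbers from a run like this will not match the table exactly.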
The timings from the table above are also shown in chart form:
[Figure: benchmark chart]
Status
All three major operating systems are supported, and JSON can be parsed from a file or a string under a variety of settings. A C++17 compiler is required for ease of setup (though upstream can fall back to an older compiler; one can edit src/Makevars accordingly if need be).
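As a minimal sketch of the two entry points, the snippet below parses the same JSON once from a string with fparse() and once from a file with fload(). The JSON content and the temporary file are made up for illustration, and max_simplify_lvl is shown as one example of the available settings.

```r
library(RcppSimdJson)

# Illustrative JSON string (hypothetical data, not from the package)
json <- '{"name": "simdjson", "tags": ["fast", "simd"], "stars": 3}'

# Parse from a string using the default simplification settings
res <- fparse(json)
str(res)

# Same input, but keep nested structures as plain R lists
res_list <- fparse(json, max_simplify_lvl = "list")
str(res_list)

# Parse from a file: write the string to a temporary file, then load it
tmp <- tempfile(fileext = ".json")
writeLines(json, tmp)
res_file <- fload(tmp)
str(res_file)
```

See the help pages ?fparse and ?fload for the full set of arguments.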
Contributing
Any problems, bug reports, or feature requests for the package can be submitted and handled most conveniently as GitHub issues in the repository.
Before submitting pull requests, it is frequently preferable to first discuss need and scope in such an issue ticket. See the file Contributing.md (in the Rcpp repo) for a brief discussion.
See Also
For standard JSON work in R, as well as for other nicely done C++ libraries, consider these:
- jsonlite by Jeroen Ooms is excellent, very versatile, and probably the most widely used;
- rapidjsonr and jsonify by David Cooley bring RapidJSON to R;
- ndjson by Bob Rudis builds on the JSON for Modern C++ library by Niels Lohmann;
- RJSONIO by Duncan Temple Lang started all this but could use a little love.
Author
For the R package, Dirk Eddelbuettel and Brendan Knapp.
For everything pertaining to simdjson, Daniel Lemire (and many contributors).