Passing expressions and data from R to C++ before compile-time in Rmarkdown
Introduction
In this post we give a simple illustrative example of how data generated by R code can be used by compiled languages such as C++ at compile time, instead of run-time, inside Rmarkdown.
This is an example of inter-language code generation. Metaprogramming/code generation is an extremely powerful technique but it’s also one that is very easy to overdo. This is just a fun example to learn from. Thorough testing is very important for any production code.
Using other languages in Rmarkdown
Out of the box, Rmarkdown can work with the following languages, assuming a proper back-end is available:
names(knitr::knit_engines$get())
## [1] "awk" "bash" "coffee" "gawk" "groovy"
## [6] "haskell" "lein" "mysql" "node" "octave"
## [11] "perl" "psql" "Rscript" "ruby" "sas"
## [16] "scala" "sed" "sh" "stata" "zsh"
## [21] "highlight" "Rcpp" "tikz" "dot" "c"
## [26] "fortran" "fortran95" "asy" "cat" "asis"
## [31] "stan" "block" "block2" "js" "css"
## [36] "sql" "go" "python" "julia" "sass"
## [41] "scss" "theorem" "lemma" "corollary" "proposition"
## [46] "conjecture" "definition" "example" "exercise" "proof"
## [51] "remark" "solution"
Although we can use R’s native foreign function interface to call compiled code, for C++ a higher-level alternative is to use Rcpp. In Rmarkdown we can compile C++ code chunks using Rcpp and export the compiled functions so that they are available in R.
As a common example, we can compile the following code:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x)
{
    return x * 2;
}
and use the exported function in R:
timesTwo(1:10)
## [1] 2 4 6 8 10 12 14 16 18 20
Registering a user-defined language engine in knitr
We can create user-defined engines to control exactly how a code chunk is sourced, or even modify existing engines. To get an idea of how an engine works, we can look at the default Rcpp engine used by knitr:
knitr::knit_engines$get()$Rcpp
## function (options)
## {
##     sourceCpp = getFromNamespace("sourceCpp", "Rcpp")
##     code = one_string(options$code)
##     opts = options$engine.opts
##     cache = options$cache && ("cacheDir" %in% names(formals(sourceCpp)))
##     if (cache) {
##         opts$cacheDir = paste(valid_path(options$cache.path,
##             options$label), "sourceCpp", sep = "_")
##         opts$cleanupCacheDir = TRUE
##     }
##     if (!is.environment(opts$env))
##         opts$env = knit_global()
##     if (options$eval) {
##         message("Building shared library for Rcpp code chunk...")
##         do.call(sourceCpp, c(list(code = code), opts))
##     }
##     options$engine = "cpp"
##     engine_output(options, code, "")
## }
## <environment: namespace:knitr>
Using the default engine above as a template, we can define a new knitr engine for compiling C++, one that can read and make use of more dynamic R data in C++ before compilation (or even dynamically create Makevars files to control compilation flags). First let’s load the knitr package:
library(knitr)
Next let’s take a crack at defining a new engine to compile C++ code. In this example we will modify the current Rcpp engine to take in an extra field (but otherwise behave the same).
knit_engines$set(RcppFoo = function(options) {
  extra = options$extra
  sourceCpp = getFromNamespace("sourceCpp", "Rcpp")
  ## options$code holds the chunk source, one element per line.
  ## Here we prepend extra code that may be defined in R to the
  ## code written in the chunk
  code = c(extra, options$code)
  code = paste(code, collapse = '\n')
  opts = options$engine.opts
  if (!is.environment(opts$env))
    opts$env = knit_global()
  if (options$eval) {
    message("Building shared library for Rcpp code chunk...")
    do.call(sourceCpp, c(list(code = code), opts))
  }
  options$engine = "cpp"
  ## Show the chunk as written, and report the prepended lines as output
  engine_output(options,
                options$code,
                paste("Added the lines:\n",
                      paste(extra, collapse = '\n'),
                      sep = '\n'))
})
Next we test the engine by creating some data in R and using it as a compile-time constant in C++. Here we pass the values of pi and e as static const doubles to C++ (a much cleaner API is possible, of course).
constants = list(
  paste('static const double Pi =', pi, ';'),
  paste('static const double Euler =', exp(1), ';')
)
This already highlights a danger: we have not considered exactly how R converts these double-precision floating-point numbers to strings. Regardless, we proceed. To use the new engine, we give the chunk the header {RcppFoo test_chunk, extra = constants}:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector timesFoo(NumericVector x)
{
    return x * Pi + Euler;
}
## Added the lines:
##
## static const double Pi = 3.14159265358979 ;
## static const double Euler = 2.71828182845905 ;
x = timesFoo(1:10)
print(x)
## [1] 5.859874 9.001467 12.143060 15.284652 18.426245 21.567838 24.709430
## [8] 27.851023 30.992616 34.134208
We get almost the same result as in R:
y = pi*(1:10)+exp(1)
print(y)
## [1] 5.859874 9.001467 12.143060 15.284652 18.426245 21.567838 24.709430
## [8] 27.851023 30.992616 34.134208
But metaprogramming can be dangerous when mixed with floating-point arithmetic. In this case the doubles lost precision when converted to strings: the constants were written with only 15 significant digits, whereas a double can need up to 17 to be reproduced exactly:
x - y
## [1] 1.776357e-15 0.000000e+00 -3.552714e-15 -7.105427e-15 -1.065814e-14
## [6] -1.421085e-14 -1.776357e-14 -1.776357e-14 -2.131628e-14 -2.842171e-14
as.double(as.character(pi))*(1:10) + as.double(as.character(exp(1))) - x
## [1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [6] 3.552714e-15 3.552714e-15 0.000000e+00 0.000000e+00 0.000000e+00
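The same loss can be reproduced on the C++ side alone. The sketch below formats a double with a chosen number of significant digits, parses the text back, and checks whether the original value survives. The helper names are hypothetical, and std::strtod is used here as a stand-in for how a compiler reads a literal.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a double with the given number of significant digits, mimicking
// how a numeric value ends up as text in generated source code.
std::string toSourceLiteral(double value, int significantDigits)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g", significantDigits, value);
    return std::string(buffer);
}

// Parse the text back into a double (stand-in for the compiler's parsing).
double fromSourceLiteral(const std::string& literal)
{
    return std::strtod(literal.c_str(), nullptr);
}

// True when the text form reproduces exactly the same double.
bool roundTrips(double value, int significantDigits)
{
    return fromSourceLiteral(toSourceLiteral(value, significantDigits)) == value;
}
```

With these helpers, roundTrips(3.141592653589793, 15) is false while roundTrips(3.141592653589793, 17) is true. On the R side, one way to get 17 digits would be to build the constants with sprintf("%.17g", pi) rather than relying on the default conversion in paste().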
Anyway, this was just a small example. Metaprogramming can be taken in many directions, such as creating new preprocessing directives for unrolling loops or defining constexprs.