Interfacing C++ code from R Sren Hjsgaard Department of - - PowerPoint PPT Presentation

interfacing c code from r
SMART_READER_LITE
LIVE PREVIEW

Interfacing C++ code from R Sren Hjsgaard Department of - - PowerPoint PPT Presentation

Interfacing C++ code from R Sren Hjsgaard Department of Mathematical Sciences Aalborg University, Denmark November 20, 2012 Printed: November 20, 2012 File: interfaceCpp-slides.tex 2 Contents 1 Calling C++ from R 3 2 Some important


slide-1
SLIDE 1

Interfacing C++ code from R

Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark November 20, 2012

Printed: November 20, 2012 File: interfaceCpp-slides.tex

slide-2
SLIDE 2

2

Contents

1 Calling C++ from R 3 2 Some important libraries and packages 4 3 Example: The exponential function 6 3.1 Using Rcpp together with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4 Example: Calculating Fibonacci recursively 12 5 Example: Matrix multiplication 15 5.1 Using Rcpp together with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.2 Using RcppArmadillo together with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 5.3 Benchmarking – III . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6 Compiling without using inline 21 6.1 Example: Matrix multiplication using Rcpp . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 6.2 Example: Matrix multiplication using RcppArmadillo . . . . . . . . . . . . . . . . . . . . . . 27 6.3 Example: Inverting a symmetric positive definite matrix using RcppArmadillo . . . . . . . . . . 28 7 Further examples on using RcppArmadillo with inline 30 7.1 Example: Extracting submatrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 8 Calling R from C++ 33 9 Building packages using Rcpp libraries 36 10 EXERCISES 37

slide-3
SLIDE 3

3

1 Calling C++ from R

C++ code can be called from R.

  • Easily done using the Rcpp package. (There are other ways not to be

discussed here).

  • The Rcpp package combines nicely with the inline package.
  • There are several existing libraries available with a C++ interface that we may

use – instead of“reinventing the wheel”(as we did when creating our own matrix multiplication functions).

slide-4
SLIDE 4

4

2 Some important libraries and packages

  • Armadillo is an open-source C++ linear algebra library (matrix maths) aiming

towards a good balance between speed and ease of use. http://arma.sourceforge.net/ R interface via RcppArmadillo

  • Eigen is a C++ template library for linear algebra: matrices, vectors, numerical

solvers, and related algorithms. http://eigen.tuxfamily.org/ R interface via RcppEigen

  • The GNU Scientific Library (GSL) is a numerical library for C and C++

programmers. http://www.gnu.org/software/gsl/ R interface via RcppGSL

slide-5
SLIDE 5

5

Notice:

  • To use these packages we must program in C++ rather than in C.
  • Dirk Eddelbuettel provides many many examples on his website:

http://dirk.eddelbuettel.com/code/rcpp.html

slide-6
SLIDE 6

6

3 Example: The exponential function

Recall our implementation of the exponential function in R:

> expfunR <- function(x, eps=.Machine$double.eps){ + if (x<0) { + 1/expfunR(-x) + } else { + n <- 0; term <- 1; ans <- 1 + while(abs(term)> eps) { + n = n + 1 + term = term * x / n + ans <- ans + term + } + ans + } + }

slide-7
SLIDE 7

7

A pure C implementation (which ignores the numerical difficulties when x < 0) is:

1 #include <math.h> 2 double C_expfun2 (double x){ 3 double ans =1.0 , term =1.0 , eps =1e -16; 4 int n=0; 5 while (fabs(term)>eps ){ 6 n++; 7 term = term * x / n; 8 ans = ans + term; 9 } 10 return(ans ); 11 }

slide-8
SLIDE 8

8

3.1 Using Rcpp together with inline

> src <- ' + double x = as<double>(x_); + double ans=1.0, term=1.0, eps=1e-16; + int n=0; + while (fabs(term)>eps){ + n++; + term = term * x / n; + ans = ans + term; + } + return(wrap(ans));'

Compiling, linking and loading is done with one function call

> library(inline) > expfunC <- cxxfunction(signature(x_="numeric"), plugin="Rcpp", body=src)

  • R data object (SEXPs) converted to C++ objects with as()
  • C++ data object converted R data objects (SEXPs) with wrap()
slide-9
SLIDE 9

9

An alternative implementation is to use the original function (accounting for nummerical difficulties when x < 0 has been resolved):

> incltxt <- 'double C_expfun2 (double x){ + double ans=1.0, term=1.0, eps=1e-16; + int n=0; + while (fabs(term)>eps){ + n++; + term = term * x / n; + ans = ans + term; + } + return(ans); + }' > expfunC2 <- cxxfunction(signature(x_="numeric"), includes=incltxt, + plugin="Rcpp", body=' + double x = as<double>(x_); + if (x>0){ + return wrap(C_expfun2(x)); + } else { + return wrap(1/C_expfun2(-x)); + } + ')

slide-10
SLIDE 10

10 > xx <- 1 > c(expfunR(xx), expfunC(xx), expfunC2(xx)) [1] 2.718282 2.718282 2.718282 > xx <- 20 > c(expfunR(xx), expfunC(xx), expfunC2(xx)) [1] 485165195 485165195 485165195 > xx <- -20 > c(expfunR(xx), expfunC(xx), expfunC2(xx)) [1] 2.061154e-09 5.621884e-09 2.061154e-09 > (expfunC(-20)-exp(-20))/exp(-20) [1] 1.727543 > (expfunC2(-20)-exp(-20))/exp(-20) [1] 2.006596e-16 >

slide-11
SLIDE 11

11 > library(rbenchmark) > cols <- c("test", "replications", "elapsed", "relative") > N <- 20000 > benchmark(expfunR(10), expfunC(10), expfunC2(10), + columns=cols, order="relative", replications=N) test replications elapsed relative 2 expfunC(10) 20000 0.08 1.0 3 expfunC2(10) 20000 0.08 1.0 1 expfunR(10) 20000 2.44 30.5

slide-12
SLIDE 12

12

4 Example: Calculating Fibonacci recursively

The standard definition of the Fibonacci sequence is fn = fn−1 + fn−2 where f0 = 0, f1 = 1. A simple recursive implementation in R is:

> fibR <- function(n){ + if (n==0) + return(0) + else + if (n==1) + return(1) + else + return(fibR(n-1)+fibR(n-2)) + }

slide-13
SLIDE 13

13

Easy to write fast C++ version:

> incltxt <-' + int fibonacci(int n){ + if (n==0) + return(0); + else + if (n==1) + return(1); + else + return(fibonacci(n-1)+fibonacci(n-2)); + }' > fibC <- cxxfunction(signature(x_="int"), + plugin="Rcpp", + include=incltxt, + body=' + int x = as<int>(x_); + return wrap( fibonacci(x) ); + ')

slide-14
SLIDE 14

14 > library(rbenchmark) > cols <- c("test", "replications", "elapsed", "relative") > M <- 30 > benchmark(fibR(M), fibC(M), + columns=cols, order="relative", replications=1) test replications elapsed relative 1 fibR(M) 1 7.86 NA 2 fibC(M) 1 0.00 NA

slide-15
SLIDE 15

15

5 Example: Matrix multiplication

Recall our first version of the matrix multiplication function:

1 /* File: matprod1.c: Calculate the product

  • f

matrices X and Y */ 2 void matprod1(double *X, int *dimX , double *Y, int *dimY , double *ans ){ 3 double sum; 4 int ii , jj , kk; 5 int nrX=dimX [0], ncX=dimX [1], nrY=dimY [0], ncY=dimY [1]; 6 7 for (ii =0; ii <nrX; ii ++){ 8 for (jj =0; jj <ncY; jj ++){ 9 sum = 0; 10 for (kk =0; kk <ncX; kk ++){ 11 sum = sum + X[ii+nrX*kk]*Y[kk+nrY*jj]; 12 } 13 ans[ii+nrX*jj] = sum; 14 } 15 } 16 }

slide-16
SLIDE 16

16

An interface using SEXPs is:

1 /* File: matprod2.c: Calculates the product

  • f

matrices X and Y */ 2 #include <Rdefines.h> 3 #include "matprod1.h" 4 5 SEXP matprod2(SEXP X, SEXP Y) { 6 int nprot =0; 7 PROTECT(X = AS_NUMERIC (X)); nprot ++; /* Digest SEXPs from R */ 8 PROTECT(Y = AS_NUMERIC (Y)); nprot ++; 9 double *xptr; xptr = REAL(X); 10 double *yptr; yptr = REAL(Y); 11 int *dimX; dimX = INTEGER(GET_DIM(X)); 12 int *dimY; dimY = INTEGER(GET_DIM(Y)); 13 SEXP ans; /* Create SEXP to hold result */ 14 PROTECT(ans = allocMatrix (REALSXP , dimX [0], dimY [1])); nprot ++; 15 double *ansptr; ansptr = REAL(ans ); 16 matprod1(xptr , dimX , yptr , dimY , ansptr ); /* Calculate product */ 17 UNPROTECT(nprot ); /* Wrap up; */ 18 return(ans ); /* Return the result to R */ 19 }

where

1 void matprod1(double *X, int *dimX , double *Y, int *dimY , double *ans );

slide-17
SLIDE 17

17

5.1 Using Rcpp together with inline

With Rcpp matrices can be indexed the usual way:

> src <- ' + NumericMatrix X(X_); + NumericMatrix Y(Y_); + NumericMatrix ans (X.nrow(), Y.ncol()); + int ii, jj, kk; + for (ii=0; ii<X.nrow(); ii++){ + for (jj=0; jj<Y.ncol(); jj++){ + double sum = 0; + for (kk=0; kk<X.ncol(); kk++){ + sum += X(ii, kk) * Y(kk, jj); + } + ans(ii,jj) = sum; + } + } + return(ans);' > library(inline) > mprod5_inline_Rcpp <- cxxfunction(signature(X_="numeric", Y_="numeric"), + body = src, plugin="Rcpp")

slide-18
SLIDE 18

18

5.2 Using RcppArmadillo together with inline

The Armadillo library is an excellent C++ package for linear algebra and RcppArmadillo makes this easy

> src <- ' + arma::mat X = Rcpp::as<arma::mat>(X_); + arma::mat Y = Rcpp::as<arma::mat>(Y_); + arma::mat ans = X * Y; + return(wrap(ans)); + ' > mprod6_inline_RcppArma <- cxxfunction(signature(X_="numeric", Y_="numeric"), + body = src, plugin="RcppArmadillo")

slide-19
SLIDE 19

19

5.3 Benchmarking – III

> library(rbenchmark) > cols <- c("test", "replications", "elapsed", "relative") > N <- 10 > A <- matrix(rnorm(1000000), nr=1000); B<-A[,1:5] > benchmark(A %*% B, mprod5_inline_Rcpp(A,B), mprod6_inline_RcppArma(A,B), + columns=cols, order="relative", replications=N) test replications elapsed relative 1 A %*% B 10 0.05 1.0 3 mprod6_inline_RcppArma(A, B) 10 0.06 1.2 2 mprod5_inline_Rcpp(A, B) 10 1.26 25.2

slide-20
SLIDE 20

20

Tentative conclusions on the benchmarking:

  • Speedwise, .Call() is better than .C().
  • Speedwise, C is better than C++.
  • Execution time of a program must be traded off with the programming time to

make the program work.

  • For larger matrices our“own homegrown”C code seems to loose to the other

competitors.

slide-21
SLIDE 21

21

6 Compiling without using inline

Rcpp based code can be compiled using R CMD SHLIB. To do so, one must tell the compiler where to find the headers and tell the linker which libraries to link against and where to find them. One way of doing so is by creating a Makevars file. Below are the Makevars files that get things to work on window and linux: Using Rcpp: Using Rcpp alone, a Makevars file with these lines work on both linux and windows:

PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"` PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"`

slide-22
SLIDE 22

22

Using RcppArmadillo: For compilation on windows, the file Makevars.win contains these lines:

PKG_LIBS = $(BLAS_LIBS) $(FLIBS) $(LAPACK_LIBS) \ $(shell "Rscript.exe" -e "Rcpp:::LdFlags()") PKG_CPPFLAGS = -I${R_HOME}/include -I${R_HOME}/library/Rcpp/include \

  • I${R_HOME}/library/RcppArmadillo/include
  • I. -DNDEBUG

For compilation on linux, the Makevars file contains the lines:

PKG_LIBS = $(BLAS_LIBS) $(FLIBS) $(LAPACK_LIBS) \ $(shell "Rscript" -e "Rcpp:::LdFlags()") ## If Rcpp etc. are installed in /usr/local/lib/R/site-library R_SITE=/usr/local/lib/R/site-library PKG_CPPFLAGS = -I${R_HOME}/include -I${R_SITE}/Rcpp/include \

  • I${R_SITE}/RcppArmadillo/include
  • I. -DNDEBUG

## If Rcpp etc. are installed in /usr/lib/R/ use instead: ### PKG_CPPFLAGS = -I${R_HOME}/include -I${R_HOME}/library/Rcpp/include \ ###

  • I${R_HOME}/library/RcppArmadillo/include
  • I. -DNDEBUG
slide-23
SLIDE 23

23

Using RcppEigen: For compilation on windows, the file Makevars.win contains these lines:

PKG_LIBS = $(BLAS_LIBS) $(FLIBS) \ $(shell "Rscript.exe" -e "Rcpp:::LdFlags()") PKG_CPPFLAGS = -I${R_HOME}/library/RcppEigen/include \

  • I${R_HOME}/library/Rcpp/include -I. -DNDEBUG

For compilation on linux, the Makevars file contains the lines

PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` ## If Rcpp etc. are installed in /usr/local/lib/R/site-library R_SITE=/usr/local/lib/R/site-library PKG_CPPFLAGS = -I${R_HOME}/include -I${R_SITE}/Rcpp/include \

  • I${R_SITE}/RcppEigen/include
  • I. -DNDEBUG

## If Rcpp etc. are installed in /usr/lib/R/ use instead: ## PKG_CPPFLAGS = -I${R_HOME}/library/Rcpp/include \ ##

  • I${R_HOME}/library/RcppEigen/include
  • I. -DNDEBUG
slide-24
SLIDE 24

24

NOTICE: For easier reading, a long line is broken up by putting ” \enter”as the last characters of this line and the rest of this (logical) line on the next (physical)

  • line. (Hence there must be no white space following the backslash).
slide-25
SLIDE 25

25

6.1 Example: Matrix multiplication using Rcpp

With Rcpp matrices can be indexed the usual way: Consider the file:

1 // File: matprod7.cpp 2 #include <Rcpp.h> 3 4 RcppExport SEXP matprod7( SEXP X_ , SEXP Y_){ 5 Rcpp :: NumericMatrix X(X_); 6 Rcpp :: NumericMatrix Y(Y_); 7 Rcpp :: NumericMatrix ans (X.nrow(), Y.ncol ()); 8 int ii , jj , kk; 9 for (ii =0; ii <X.nrow (); ii ++){ 10 for (jj =0; jj <Y.ncol (); jj ++){ 11 double sum = 0; 12 for (kk =0; kk <X.ncol (); kk ++){ 13 sum += X(ii , kk) * Y(kk , jj); 14 } 15 ans(ii ,jj) = sum; 16 } 17 } 18 return(ans ); 19 }

slide-26
SLIDE 26

26

To compile this file, first create a file named Makevars with the content (notice the back quotes): PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"` PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"` Next, compile the file as usual:

R CMD SHLIB src/matprod7.cpp

which creates matprod7.dll / matprod7.so.

> mprod7_Rcpp <- function(X,Y){ .Call("matprod7", X, Y) } > A <- B <- matrix(1:9,nrow=3) > dyn.load("src/matprod7.dll") > mprod7_Rcpp(A, B) [,1] [,2] [,3] [1,] 30 66 102 [2,] 36 81 126 [3,] 42 96 150 > dyn.unload("src/matprod7.dll")

slide-27
SLIDE 27

27

6.2 Example: Matrix multiplication using RcppArmadillo

1 #include <Rcpp.h> 2 #include <RcppArmadillo .h> 3 4 RcppExport SEXP matprod8( SEXP X_ , SEXP Y_ ){ 5 arma :: mat X = Rcpp ::as <arma ::mat >(X_); 6 arma :: mat Y = Rcpp ::as <arma ::mat >(Y_); 7 arma :: mat ans = X * Y; 8 return Rcpp :: wrap(ans ); 9 } R CMD SHLIB src_arma/matprod8.cpp > dyn.load("src_arma/matprod8.dll") > .Call("matprod8", A, B) [,1] [,2] [,3] [1,] 30 66 102 [2,] 36 81 126 [3,] 42 96 150 > dyn.unload("src_arma/matprod8.dll")

slide-28
SLIDE 28

28

6.3 Example: Inverting a symmetric positive definite matrix using RcppArmadillo

Consider inverting a symmetric positive defininite matrix. An R approach to doing so is via a Cholesky decomposition

> spdinv_R <- function(X_) {chol2inv(chol(X_))}

An RcppArmadillo based implementation (where we do exploit that the matrix is symmetric and positive definite):

1 /* File: spdinv -arma.cpp */ 2 #include <Rcpp.h> 3 #include <RcppArmadillo .h> 4 5 RcppExport SEXP C_spdinv_arma ( SEXP X_ ){ 6 arma :: mat X = Rcpp ::as <arma ::mat >(X_); 7 arma :: mat Xinv = arma :: inv( arma :: sympd(X) ); 8 return(Rcpp :: wrap(Xinv )); 9 } R CMD SHLIB src_arma/spdinv-arma.cpp

slide-29
SLIDE 29

29

Benchmarks

> library(MASS) > library(Matrix) > PP <- 10 > X <- cov(mvrnorm(1000, mu=rep(0,PP), Sigma=diag(1,PP))) > Xm <- Matrix(X) > library(rbenchmark) > cols <- c("test", "replications", "elapsed", "relative") > dyn.load("src_arma/spdinv-arma.dll") > benchmark(spdinv_R(X), solve(X), solve(Xm), .Call("C_spdinv_arma", X), + columns=cols, replications=10000) test replications elapsed relative 4 .Call("C_spdinv_arma", X) 10000 0.06 1.000 2 solve(X) 10000 0.92 15.333 3 solve(Xm) 10000 0.26 4.333 1 spdinv_R(X) 10000 0.88 14.667 > dyn.unload("src_arma/spdinv-arma.dll")

slide-30
SLIDE 30

30

7 Further examples on using RcppArmadillo with inline

slide-31
SLIDE 31

31

7.1 Example: Extracting submatrices

> library(inline) > library(Rcpp) > submat <- cxxfunction(signature(r_SS='float', r_rows='int', r_cols='int'), body = ' + using namespace arma; + using namespace Rcpp; + mat SS = as<mat>(r_SS); + uvec rowsi = as<uvec>(r_rows); + uvec colsi = as<uvec>(r_cols); + int has_col = colsi[0], has_row = rowsi[0]; + if (has_col==0) { + if (has_row==0) { + mat yy = SS; return wrap(yy); + } else { + mat yy = SS.rows(rowsi-1); return wrap(yy); + } + } else { + if (has_row==0) { + mat yy = SS.cols(colsi-1); return wrap(yy); + } else { + mat yy = SS.submat(rowsi-1, colsi-1); return wrap(yy); + } + } + ', plugin='RcppArmadillo')

slide-32
SLIDE 32

32 > M <- matrix(1:12,nr=3) > submat(M, 0, 0) [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > submat(M, 1:2, 0) [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 > submat(M, 0, 2:3) [,1] [,2] [1,] 4 7 [2,] 5 8 [3,] 6 9 > submat(M, 1:2, 2:3) [,1] [,2] [1,] 4 7 [2,] 5 8

slide-33
SLIDE 33

33

8 Calling R from C++

> toString_ <- cxxfunction(signature(x_ = "", sep_=""), plugin = "Rcpp", + body = ' + using namespace Rcpp; + using namespace std; + CharacterVector x=as<CharacterVector>(x_); + CharacterVector sep=as<CharacterVector>(sep_); + std::string res; + for (int ii=0; ii<(x.length())-1; ii++){ + res += x[ii]; + res += as<string>(sep); + } + res += x[x.length()-1]; + return(wrap(res));') > toString_(c("foo","bar","bob"),";") [1] "foo;bar;bob" > toString_(c(1,2,3),";") [1] "1;2;3"

slide-34
SLIDE 34

34 > get_index_ <- cxxfunction(signature(x_ = "SEXP", V_="SEXP"), plugin = "Rcpp", + body = ' + using namespace Rcpp; + using namespace std; + Function R_match("match"); + CharacterVector V(V_); + List x(x_); + List res=List::create(); + for (int ii=0; ii<x.length(); ii++){ + CharacterVector ee = as<CharacterVector>(x[ii]); + IntegerVector out = R_match(wrap(ee), V_); + Rf_PrintValue(out); + res.push_back(out); + } + return(wrap(res)); + ') > a<-get_index_(list(c("a","b"),c("b","c"), c("a","c","e"), c("a","c","q")), letters[1:10]) [1] 1 2 [1] 2 3 [1] 1 3 5 [1] 1 3 NA > str(a) List of 4 $ : int [1:2] 1 2 $ : int [1:2] 2 3 $ : int [1:3] 1 3 5 $ : int [1:3] 1 3 NA

slide-35
SLIDE 35

35

slide-36
SLIDE 36

36

9 Building packages using Rcpp libraries

Your friends are:

> library(Rcpp) > Rcpp.package.skeleton() > library(RcppArmadillo) > RcppArmadillo.package.skeleton

slide-37
SLIDE 37

37

10 EXERCISES

  • 1. Using inline and RcppArmadillo, implement a function that calculates the

conditional variance in a multivariate normal distribution, i.e. V ar(Ya|Yb) = Σaa − ΣabΣ−1

bb Σba

  • 2. Compare the performance in computing time with a pure R implementation:

> condVarR <- function(SS, aa, bb){ + SS[aa,aa] - SS[aa,bb,drop=FALSE] %*% solve(SS[bb,bb,drop=FALSE],SS[bb,aa,drop=FALSE]) + }