Rcpp at 1000 Reverse Depends: Some Observations



SLIDE 1

Rcpp at 1000 Reverse Depends: Some Observations

Dirk Eddelbuettel DSC 2017 July 3, 2017

Ketchum Trading; Debian and R Projects

SLIDE 2

Outline

Some Notes …

· about Rcpp, ever so briefly
· about testing
· about APIs

More a stream of consciousness

SLIDE 3

Why now?

A few points

· 1000 depends is a nice milestone to summarize
· Rcpp is a fairly widely used package (over 1k direct depends)
· Rcpp affects a number of packages (over 7k recursive depends)
· We try to take testing somewhat seriously

SLIDE 4

Rcpp

SLIDE 5

Rcpp

Team Effort

· Dominic had the early vision
· Romain turned the dial to 11, and again, and again
· Doug and John provided early adult oversight
· JJ gave us Rcpp Attributes and much wisdom
· Kevin, KK, and Nathan are keeping the wheels on

SLIDE 6

Usage

Growth of Rcpp usage on CRAN

[Figure: number of CRAN packages using Rcpp (left axis, ticks 200 to 1000) and percentage of CRAN packages using Rcpp (right axis, ticks 2 to 10), plotted over 2010 to 2016. Data current as of July 1, 2017.]

SLIDE 7

Pagerank

library(pagerank)   # github.com/andrie/pagerank
cran <- "http://cloud.r-project.org"
pr <- compute_pagerank(cran)
##
## Attaching package: 'utils'
## The following objects are masked from 'package:Rcpp':
##
##     .DollarNames, prompt
round(100*pr[1:5], 3)
##    Rcpp    MASS ggplot2  Matrix mvtnorm
##   2.688   1.569   1.199   0.870   0.684
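For readers unfamiliar with the measure, the following is a minimal sketch of page rank computed by power iteration on a toy dependency graph. It is not the implementation used by the pagerank package above; the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from nodes without outgoing links are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Power-iteration page rank on a small directed graph (illustrative sketch).
// links[i] lists the nodes that node i points to, e.g. the packages that
// package i depends on. Damping and iteration count are assumed defaults.
std::vector<double> pagerank(const std::vector<std::vector<std::size_t>>& links,
                             double damping = 0.85, int iterations = 100) {
    const std::size_t n = links.size();
    std::vector<double> rank(n, n ? 1.0 / n : 0.0);   // start from a uniform rank
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> next(n, 0.0);
        double dangling = 0.0;                        // rank held by nodes with no links
        for (std::size_t i = 0; i < n; ++i) {
            if (links[i].empty()) { dangling += rank[i]; continue; }
            const double share = rank[i] / links[i].size();
            for (std::size_t j : links[i]) {
                assert(j < n);                        // every link must name a valid node
                next[j] += share;
            }
        }
        // Mix the propagated rank with the uniform "teleport" term; dangling
        // rank is spread evenly so the ranks keep summing to one.
        for (std::size_t j = 0; j < n; ++j)
            next[j] = (1.0 - damping) / n + damping * (next[j] + dangling / n);
        rank = next;
    }
    return rank;
}
```

Calling `pagerank({{}, {0}, {0}, {0, 2}})` describes four hypothetical packages where packages 1, 2 and 3 depend on package 0 and package 3 also depends on package 2; package 0 then receives the largest rank, much as Rcpp does in the CRAN graph.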

SLIDE 8

Pagerank

[Figure: bar chart of page rank (axis ticks 0.005 to 0.025) for, in descending order: Rcpp, MASS, ggplot2, Matrix, mvtnorm, survival, plyr, dplyr, lattice, stringr, httr, RcppArmadillo, sp, jsonlite, igraph, data.table, magrittr, foreach, reshape2, XML, shiny, coda, RColorBrewer, RCurl, nlme, zoo, doParallel, raster, rgl, boot.]

Top 30 of Page Rank as of July 2017

SLIDE 9

Pagerank

SLIDE 10

CRAN Proportion

db <- tools::CRAN_package_db()   # R 3.4.0 or later
dim(db)
## [1] 10958    65

## all Rcpp reverse depends
(c(n_rcpp <- length(tools::dependsOnPkgs("Rcpp", recursive=FALSE, installed=db)),
   n_compiled <- table(db[, "NeedsCompilation"])[["yes"]]))
## [1] 1074 2928

## Rcpp percentage of packages with compiled code
n_rcpp / n_compiled
## [1] 0.3668033

SLIDE 11

One Example

SLIDE 12

Example: Convolution

#include <R.h>
#include <Rinternals.h>

SEXP convolve2(SEXP a, SEXP b) {
    int na, nb, nab;
    double *xa, *xb, *xab;
    SEXP ab;

    a = PROTECT(coerceVector(a, REALSXP));
    b = PROTECT(coerceVector(b, REALSXP));
    na = length(a);
    nb = length(b);
    nab = na + nb - 1;
    ab = PROTECT(allocVector(REALSXP, nab));
    xa = REAL(a);
    xb = REAL(b);
    xab = REAL(ab);
    for (int i = 0; i < nab; i++)
        xab[i] = 0.0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            xab[i + j] += xa[i] * xb[j];
    UNPROTECT(3);
    return ab;
}

SLIDE 13

Example: Convolution

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector convolve2cpp(Rcpp::NumericVector a,
                                 Rcpp::NumericVector b) {
    int na = a.length(), nb = b.length();
    Rcpp::NumericVector ab(na + nb - 1);
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            ab[i + j] += a[i] * b[j];
    return(ab);
}

SLIDE 14

Example: C++ from the R prompt

library(Rcpp)
cppFunction("Rcpp::NumericVector convolve2cpp(Rcpp::NumericVector a,
                                              Rcpp::NumericVector b) {
    int na = a.length(), nb = b.length();
    Rcpp::NumericVector ab(na + nb - 1);
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            ab[i + j] += a[i] * b[j];
    return(ab);
}")
convolve2cpp(1:4, 4:1)
## [1]  4 11 20 30 20 11  4
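To check the algorithm without an R session, the same double loop can be written against the standard library. This is only a sketch: the function name `convolve` and the use of `std::vector<double>` in place of `Rcpp::NumericVector` are choices made here for illustration, not part of Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discrete convolution of two non-empty sequences, using the same double
// loop as the slides but on std::vector instead of Rcpp::NumericVector.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());   // output length na + nb - 1 needs both inputs
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            ab[i + j] += a[i] * b[j];   // each product lands at offset i + j
    return ab;
}
```

With the inputs from the R prompt, `convolve({1, 2, 3, 4}, {4, 3, 2, 1})` yields 4, 11, 20, 30, 20, 11, 4, matching the output shown above.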

SLIDE 15

Testing

SLIDE 16

Cost of testing

No Free Lunch

· Single run on a decent machine now takes more than a workday
· Should be easy-ish to parallelize (given resources)
· But that has not yet happened.
· Is testing support a community thing? R Hub?

SLIDE 17

Change in testing?

No Free Lunch

· Do we need to rethink testing?

  · only packages which themselves are impactful? (maybe)
  · only packages which were updated recently? (maybe not)
  · only packages which may have failed in the past? (possibly)
  · other ways to subsample?

· This is both an engineering and a statistics question, so …
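Two of the subsampling ideas above (packages that failed before, and packages that are themselves impactful) could be combined into a simple filter. The sketch below is purely hypothetical: the `RevDep` record, its fields, the function `select_for_testing`, and the threshold parameter are invented for illustration and do not describe any existing tooling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one reverse dependency; the fields are
// illustrative and not taken from any existing tool.
struct RevDep {
    std::string name;
    int own_reverse_depends;   // how many packages depend on this one
    bool failed_before;        // did it fail in an earlier check run?
};

// Keep packages that failed in the past, or that are "impactful" in the
// sense of having at least min_reverse_depends dependents of their own.
std::vector<std::string> select_for_testing(const std::vector<RevDep>& pkgs,
                                            int min_reverse_depends) {
    assert(min_reverse_depends >= 0);   // a negative threshold would select everything
    std::vector<std::string> chosen;
    for (const RevDep& p : pkgs) {
        if (p.failed_before || p.own_reverse_depends >= min_reverse_depends)
            chosen.push_back(p.name);
    }
    return chosen;
}
```

Whether such a rule catches enough regressions is the statistics half of the question; a random sample of the remaining packages could be added to the selection to guard against blind spots.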

SLIDE 18

Tests are not the be-all and end-all

Still No Free Lunch

· Tests really only run the code they cover
· Rcpp has, e.g., code generators; we generally do not regenerate in client packages
· The one-minute cap via CRAN Policy means we suppress tests

SLIDE 19

API

SLIDE 20

Rcpp as an R Extension

That worked well

· Package system and design work as planned
· The C API of R is now easier to access
· Good division of labour

SLIDE 21

Should Rcpp be promoted into Base R?

A question I get asked sometimes

· Probably not
· If “you” take it, “you” get to work on it
· A smaller base is a good design principle

SLIDE 22

API Re-Use?

· RApiSerialize
· RApiDatetime
· There could potentially be much more
· How can “we” (R users) get better (programmatic) access to what is already in R?
· Does the (relatively) wide use of Rcpp mean the core API is too hard to use?

SLIDE 23

Summary

Next Steps?

· Possible room for improvement on testing
· Possible need for better testing support
· Possible to open the API a little more
