Renjin: The new R interpreter built on the JVM

SLIDE 1

Renjin: The new R interpreter built on the JVM

Alexander Bertram BeDataDriven

SLIDE 2

What?

Renjin is a new interpreter for the R language.

  • Core & Builtins: written in Java
  • Base, Stats, Graphics: GNU R language packages

SLIDE 3

Why?

Memory

Easier Integration

Speed

Performance

Java Virtual Machine: GC, 500k libs, JIT, tools, parallelism

SLIDE 4

Sure, but why Renjin?

Packages: biglm, bigvis, scaleR

+ High performance for specific applications

  • Require rewriting existing code
  • Limited applicability

Forks: pqR

+ Marginal improvements for all code

  • Unable to address underlying limitations of the GNU R interpreter

SLIDE 5

What do I get, like, today?

SLIDE 6

Flexible

> renjin -f myscript.R

  • Command-Line Interpreter
  • Embeddable Java Library
  • Web-based REPL

SLIDE 7

Multiple In-process sessions, Shared Data

[Diagram: a single Java Virtual Machine hosts Renjin Session 1, Renjin Session 2 and Renjin Session 3, one per web request; all sessions share the same Vector. Label: Immutable Data Structures]
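
The sharing shown on this slide is safe because the shared data cannot change after it is built. The following is a minimal sketch of that idea in plain Java, not Renjin code: `ImmutableVector`, `sum`, and `serveRequests` are hypothetical names, and each "session" is simply a task on a thread pool that reads the same vector without copying or locking it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedDataSketch {

    /** Hypothetical immutable vector: the array is copied once and never modified. */
    public static final class ImmutableVector {
        private final double[] values;

        public ImmutableVector(double[] values) {
            this.values = values.clone();
        }

        public int length() { return values.length; }

        public double getElement(int index) { return values[index]; }
    }

    /** Read-only computation that any number of threads can run concurrently. */
    public static double sum(ImmutableVector v) {
        double total = 0;
        for (int i = 0; i < v.length(); i++) {
            total += v.getElement(i);
        }
        return total;
    }

    /** Simulates several sessions, each answering one request from the same shared vector. */
    public static List<Double> serveRequests(ImmutableVector shared, int sessions) {
        ExecutorService pool = Executors.newFixedThreadPool(sessions);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int s = 0; s < sessions; s++) {
                final int sessionId = s;
                // Each task only reads `shared`; no copy and no lock is needed.
                tasks.add(() -> sum(shared) + sessionId);
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ImmutableVector shared = new ImmutableVector(new double[] {1.0, 2.0, 3.0});
        System.out.println(serveRequests(shared, 3));
    }
}
```
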

SLIDE 8

Memory Efficiency

                              # GNU R      Renjin
x <- runif(1e8)               # +721 MB    +721 MB
y <- x + 1                    # +761 MB
comment(y) <- "important!"    # +763 MB

Vector Interface

  • length()
  • getElement(int index)
  • getAttributes()
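
One way to read this slide: if a vector is an interface rather than a fixed array, `x + 1` and an attribute change can be represented as thin wrappers around the original data instead of new copies. The sketch below illustrates that reading in plain Java. It is modeled only on the three method names listed above; `ArrayVector`, `PlusView`, and `AttributeView` are hypothetical names, not Renjin's actual classes, and attribute handling is simplified to a string map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class VectorViewSketch {

    /** Simplified stand-in for the Vector interface on this slide (hypothetical). */
    public interface Vector {
        int length();
        double getElement(int index);
        Map<String, String> getAttributes();
    }

    /** Backed by a real array: the only class here that stores element data. */
    public static final class ArrayVector implements Vector {
        private final double[] values;

        public ArrayVector(double[] values) { this.values = values.clone(); }

        public int length() { return values.length; }
        public double getElement(int index) { return values[index]; }
        public Map<String, String> getAttributes() { return Collections.emptyMap(); }
    }

    /** View for `x + c`: computes each element on demand and stores no elements. */
    public static final class PlusView implements Vector {
        private final Vector operand;
        private final double addend;

        public PlusView(Vector operand, double addend) {
            this.operand = operand;
            this.addend = addend;
        }

        public int length() { return operand.length(); }
        public double getElement(int index) { return operand.getElement(index) + addend; }
        // Simplification: pass the operand's attributes through unchanged.
        public Map<String, String> getAttributes() { return operand.getAttributes(); }
    }

    /** View that attaches new attributes while sharing the operand's elements. */
    public static final class AttributeView implements Vector {
        private final Vector operand;
        private final Map<String, String> attributes;

        public AttributeView(Vector operand, Map<String, String> attributes) {
            this.operand = operand;
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        }

        public int length() { return operand.length(); }
        public double getElement(int index) { return operand.getElement(index); }
        public Map<String, String> getAttributes() { return attributes; }
    }

    public static void main(String[] args) {
        Vector x = new ArrayVector(new double[] {0.25, 0.5, 0.75});
        Vector y = new PlusView(x, 1.0);                              // y <- x + 1
        Vector commented = new AttributeView(
                y, Collections.singletonMap("comment", "important!")); // comment(y) <- "important!"
        System.out.println(commented.getElement(1));
        System.out.println(commented.getAttributes());
    }
}
```

Only `ArrayVector` holds element storage here; the two views cost a few object fields each, regardless of the vector's length.
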
SLIDE 9

packages.renjin.org

  • Proper Dependency Management
  • Pre-built Package Repository
  • Automated Testing of Renjin
  • Translation of C/Fortran to JVM Bytecode

SLIDE 10

Seamless Access to Java/Scala Classes

import(com.acme.Customer)

bob <- Customer$new(name='Bob', age=36)
carol <- Customer$new(name='Carole', age=41)

bob$name <- "Bob II"
cat(c("Name: ", bob$name, "; Age: ", bob$age))

SLIDE 11

Simple to embed in larger systems

import java.io.FileReader;
import java.io.InputStreamReader;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// create a script engine manager
ScriptEngineManager factory = new ScriptEngineManager();

// create an R engine
ScriptEngine engine = factory.getEngineByName("Renjin");

// load package from classpath
engine.eval("library(survey)");

// evaluate R code from String
engine.eval("print('Hello, World')");

// evaluate R script on disk
engine.eval(new FileReader("myscript.R"));

// evaluate R script from classpath
engine.eval(new InputStreamReader(
    getClass().getResourceAsStream("myScript.R")));

SLIDE 12

Package Development in Java

@DataParallel
@Deferrable
public static String chartr(
    String oldChars,
    String newChars,
    @Recycle String x) {

  StringBuilder translation = new StringBuilder(x.length());
  for(int i=0;i!=x.length();++i) {
    int codePoint = x.codePointAt(i);
    int charIndex = oldChars.indexOf(codePoint);
    if(charIndex == -1) {
      translation.appendCodePoint(codePoint);
    } else {
      translation.appendCodePoint(
          newChars.codePointAt(charIndex));
    }
  }
  return translation.toString();
}

SLIDE 13

Under the hood

SLIDE 14

Specialized Execution Modes

“Slow” AST Interpreter

  • Supports full dynamism of R
  • Compute on the language

Vector Pipeliner

  • Acts like a query planner
  • Batches, auto-parallelizes vector workflows

Scalar Compiler

  • Partially evaluates & compiles loop bodies, apply functions to JVM byte code

SLIDE 15

Queuing up work for the Vector Pipeliner

x <- runif(1e6)
y <- sqrt(x + 1)
z <- mean(y) - mean(x)
attr(z, 'comments') <- 'still not computed'
print(length(z))  # prints "1" but doesn't evaluate the mean
print(z)          # triggers computation
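
The behavior in this example (the length is available immediately, the value only when printed) can be illustrated with a memoized thunk. This is a rough sketch in plain Java under simplifying assumptions: `DeferredScalar` is a hypothetical class, only the final subtraction of means is deferred, and nothing here batches or parallelizes work the way the slide says the Vector Pipeliner does.

```java
import java.util.function.DoubleSupplier;

public class DeferredSketch {

    /** A length-1 numeric result whose value is computed on first use (hypothetical class). */
    public static final class DeferredScalar {
        private DoubleSupplier computation;
        private double value;

        public DeferredScalar(DoubleSupplier computation) {
            this.computation = computation;
        }

        /** The length is known up front, so asking for it triggers no work. */
        public int length() {
            return 1;
        }

        public boolean isEvaluated() {
            return computation == null;
        }

        /** Runs the queued computation at most once, then caches the result. */
        public double get() {
            if (computation != null) {
                value = computation.getAsDouble();
                computation = null; // release the queued work once it has run
            }
            return value;
        }
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double v : xs) {
            sum += v;
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 3.0, 8.0};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = Math.sqrt(x[i] + 1);
        }

        // z <- mean(y) - mean(x): queued, not yet computed
        DeferredScalar z = new DeferredScalar(() -> mean(y) - mean(x));

        System.out.println(z.length());      // answered without evaluating the means
        System.out.println(z.isEvaluated());
        System.out.println(z.get());         // first use triggers the computation
        System.out.println(z.isEvaluated());
    }
}
```
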

SLIDE 16

x <- runif(1e6)
y <- sqrt(x + 1)
z <- mean(y) - mean(x)

SLIDE 17

Real-world case study:

Distance Correlation in the Energy Package

SLIDE 18

Distance correlation: robust measure of association. Zero if and only if variables are independent.
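
For reference, the sample statistic can be written out as follows. This is the standard sample definition with exponent 1, using the same names (A, B, n) as the `dcor` code on the next slide; the symbols a, b and the bar notation for row, column and grand means are introduced here for illustration. The "zero if and only if independent" property is a statement about the population quantity, which the sample value estimates.

```latex
% Pairwise distances for observations k, l = 1, ..., n
a_{kl} = |x_k - x_l|, \qquad b_{kl} = |y_k - y_l|

% Double centering: subtract row and column means, add back the grand mean
A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot}, \qquad
B_{kl} = b_{kl} - \bar{b}_{k\cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot\cdot}

% Sample distance covariance and distance variances
\mathrm{dCov}_n(x, y) = \sqrt{\frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl}}, \qquad
\mathrm{dVar}_n(x) = \sqrt{\frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl}^2}, \qquad
\mathrm{dVar}_n(y) = \sqrt{\frac{1}{n^2} \sum_{k,l=1}^{n} B_{kl}^2}

% Sample distance correlation
\mathrm{dCor}_n(x, y) =
\begin{cases}
\dfrac{\mathrm{dCov}_n(x, y)}{\sqrt{\mathrm{dVar}_n(x)\,\mathrm{dVar}_n(y)}} & \text{if } \mathrm{dVar}_n(x)\,\mathrm{dVar}_n(y) > 0 \\[1ex]
0 & \text{otherwise}
\end{cases}
```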

SLIDE 19

dcor <- function (x, y, index = 1) {
  x <- as.matrix(dist(x))
  y <- as.matrix(dist(y))
  n <- nrow(x)
  m <- nrow(y)
  dims <- c(n, ncol(x), ncol(y))
  Akl <- function(x) {
    d <- as.matrix(x)^index
    m <- rowMeans(d)
    M <- mean(d)
    a <- sweep(d, 1, m)
    b <- sweep(a, 2, m)
    return(b + M)
  }
  A <- Akl(x)
  B <- Akl(y)
  dCov <- sqrt(mean(A * B))
  dVarX <- sqrt(mean(A * A))
  dVarY <- sqrt(mean(B * B))
  V <- sqrt(dVarX * dVarY)
  if (V > 0)
    dCor <- dCov/V
  else dCor <- 0
  return(list(dCov = dCov, dCor = dCor, dVarX = dVarX, dVarY = dVarY))
}

Callouts on the code:

  • dist(x): Evaluates as a view
  • Defer rowMeans(x) until later
  • Need to evaluate

SLIDE 20

SLIDE 21

[Chart: Run time of distance correlation of 10 pairs of variables. x-axis: Number of Observations (1000, 2000, 5000, 10000). Series: GNU R, C, Renjin.]

SLIDE 22

Where do we go from here?

SLIDE 23

Inspired by…

SLIDE 24

Join us!

Download & Test

Contribute!

Contract us for Commercial Support

Sponsor Development!

> Renjin.org