
Renjin: The new R interpreter built on the JVM

Alexander Bertram, BeDataDriven


  1. Renjin: The new R interpreter built on the JVM. Alexander Bertram, BeDataDriven

  2. What? Renjin is a new interpreter for the R language. Core & Builtins: written in Java. Base, Stats, Graphics: GNU R language packages.

  3. Why? Performance (speed, memory, parallelism) and easier integration. Java Virtual Machine: JIT, GC, tools, 500k libs.

  4. Sure, but why Renjin? Packages (bigvis, biglm, scaleR): (+) high performance for specific applications; (-) require rewriting existing code; (-) limited applicability. Forks of the GNU R interpreter (scaleR, pqR): (+) marginal improvements for all code; (-) unable to address underlying limitations.

  5. What do I get, like, today?

  6. Flexible: command-line interpreter (> renjin -f myscript.R), embeddable Java library, web-based REPL.

  7. Multiple in-process sessions, shared data. Diagram: three web requests, each handled by its own Renjin session (Renjin Session 1, 2, 3), over immutable vector data structures inside one Java Virtual Machine.

  8. Memory Efficiency

         #                              GNU R      Renjin
         x <- runif(1e8)              # +721 MB    +721 MB
         y <- x + 1                   # +761 MB
         comment(y) <- "important!"   # +763 MB

     Vector interface: getAttributes(), length(), getElement(int index)
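
One way to read the Vector interface on this slide is that `y <- x + 1` does not have to allocate a second array: it can be a view that computes each element when asked. The sketch below illustrates that idea only. The names `ArrayVector`, `MappedView`, and `sum` are hypothetical and are not taken from Renjin; only `length()` and `getElement(int index)` come from the slide.

```java
import java.util.function.DoubleUnaryOperator;

class VectorViewSketch {
    /** Minimal interface modeled on two of the methods named on the slide. */
    interface Vector {
        int length();
        double getElement(int index);
    }

    /** A vector backed by a real array: this is where the memory is spent. */
    static final class ArrayVector implements Vector {
        private final double[] values;
        ArrayVector(double[] values) { this.values = values; }
        public int length() { return values.length; }
        public double getElement(int index) { return values[index]; }
    }

    /** A view (hypothetical name) that applies a function per element on demand. */
    static final class MappedView implements Vector {
        private final Vector source;
        private final DoubleUnaryOperator op;
        MappedView(Vector source, DoubleUnaryOperator op) {
            this.source = source;
            this.op = op;
        }
        public int length() { return source.length(); }
        public double getElement(int index) {
            return op.applyAsDouble(source.getElement(index));
        }
    }

    /** Consumers only see the interface, so views and arrays are interchangeable. */
    static double sum(Vector v) {
        double total = 0;
        for (int i = 0; i < v.length(); i++) {
            total += v.getElement(i);
        }
        return total;
    }

    public static void main(String[] args) {
        Vector x = new ArrayVector(new double[] {0.25, 0.5, 0.75});
        Vector y = new MappedView(x, v -> v + 1); // like y <- x + 1, no second array
        System.out.println(y.length());      // 3
        System.out.println(y.getElement(1)); // 1.5
        System.out.println(sum(y));          // 4.5
    }
}
```

The trade-off in a design like this is that each read of the view recomputes the element, so it saves memory at the cost of repeated arithmetic when a vector is read many times.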

  9. packages.renjin.org: proper dependency management; pre-built package repository; translation of C/Fortran to JVM bytecode; automated testing of Renjin.

  10. Seamless Access to Java/Scala Classes

          import(com.acme.Customer)
          bob <- Customer$new(name='Bob', age=36)
          carol <- Customer$new(name='Carole', age=41)
          bob$name <- "Bob II"
          cat(c("Name: ", bob$name, "; Age: ", bob$age))

  11. Simple to embed in larger systems

          import java.io.FileReader;
          import java.io.InputStreamReader;
          import javax.script.ScriptEngine;
          import javax.script.ScriptEngineManager;

          // create a script engine manager
          ScriptEngineManager factory = new ScriptEngineManager();
          // create an R engine
          ScriptEngine engine = factory.getEngineByName("Renjin");
          // load package from classpath
          engine.eval("library(survey)");
          // evaluate R code from String
          engine.eval("print('Hello, World')");
          // evaluate R script on disk
          engine.eval(new FileReader("myscript.R"));
          // evaluate R script from classpath
          engine.eval(new InputStreamReader(
              getClass().getResourceAsStream("myScript.R")));
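
A practical detail the slide leaves out: the standard `ScriptEngineManager.getEngineByName` returns `null` when no engine with that name is on the classpath, so an embedding usually wants a guard before calling `eval`. The sketch below uses only the standard `javax.script` API. The class and method names (`EmbedSketch`, `evalOrExplain`) are hypothetical, and what a Renjin engine returns from `eval` is not shown on the slide, so the result is only converted to a string here.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class EmbedSketch {
    /**
     * Looks up a script engine by name and evaluates one piece of code.
     * Returns a description of the outcome instead of throwing.
     */
    static String evalOrExplain(String engineName, String code) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            // No engine registered under this name (e.g. the jar is missing).
            return "Engine not found: " + engineName;
        }
        try {
            Object result = engine.eval(code);
            return "Result: " + result;
        } catch (ScriptException e) {
            return "Script error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Without the Renjin jar on the classpath this reports a missing engine.
        System.out.println(evalOrExplain("Renjin", "1 + 1"));
    }
}
```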

  12. Package Development in Java

          @DataParallel
          @Deferrable
          public static String chartr(String oldChars, String newChars, @Recycle String x) {
            StringBuilder translation = new StringBuilder(x.length());
            for (int i = 0; i != x.length(); ++i) {
              int codePoint = x.codePointAt(i);
              int charIndex = oldChars.indexOf(codePoint);
              if (charIndex == -1) {
                translation.appendCodePoint(codePoint);
              } else {
                translation.appendCodePoint(newChars.codePointAt(charIndex));
              }
            }
            return translation.toString();
          }
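
The appeal of this slide is that the author writes only the scalar case. Assuming `@Recycle` marks an argument that the runtime loops over element by element (a reading of the slide, not something it states), the sketch below shows the hand-written loop such an annotation would spare the author. `RecycleSketch` and `chartrAll` are hypothetical names, and the annotations are omitted so the code compiles with the JDK alone.

```java
class RecycleSketch {
    /** Scalar translation, following the slide's logic (adequate for BMP characters). */
    static String chartr(String oldChars, String newChars, String x) {
        StringBuilder translation = new StringBuilder(x.length());
        for (int i = 0; i != x.length(); ++i) {
            int codePoint = x.codePointAt(i);
            int charIndex = oldChars.indexOf(codePoint);
            if (charIndex == -1) {
                translation.appendCodePoint(codePoint);
            } else {
                translation.appendCodePoint(newChars.codePointAt(charIndex));
            }
        }
        return translation.toString();
    }

    /** Explicit element-wise loop over a vector of strings. */
    static String[] chartrAll(String oldChars, String newChars, String[] xs) {
        String[] out = new String[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = chartr(oldChars, newChars, xs[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] result = chartrAll("abc", "xyz", new String[] {"aabbcc", "cab", "hello"});
        System.out.println(String.join(",", result)); // xxyyzz,zxy,hello
    }
}
```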

  13. Under the hood

  14. Specialized Execution Modes. "Slow" AST Interpreter: supports full dynamism of R; compute on the language. Vector Pipeliner: acts like a query planner; batches, auto-parallelizes vector workflows. Scalar Compiler: partially evaluates & compiles loop bodies, apply functions to JVM byte code.

  15. Queuing up work for the Vector Pipeliner

          x <- runif(1e6)
          y <- sqrt(x + 1)
          z <- mean(y) - mean(x)
          attr(z, 'comments') <- 'still not computed'
          print(length(z))  # prints "1" but doesn't evaluate the mean
          print(z)          # triggers computation

  16. x <- runif(1e6); y <- sqrt(x + 1); z <- mean(y) - mean(x)
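
The behavior on slides 15 and 16, where `length(z)` is answered immediately while the means are computed only when `z` is printed, can be mimicked with a memoized deferred value. This is a minimal sketch of that idea, not Renjin's Vector Pipeliner: `DeferredScalar` and `isEvaluated` are hypothetical names, and a small fixed array stands in for `runif(1e6)` so the result is reproducible.

```java
import java.util.function.DoubleSupplier;

class DeferredMeanSketch {
    /** Deferred scalar: length is known up front, the value is computed on first use. */
    static final class DeferredScalar {
        private final DoubleSupplier computation;
        private boolean evaluated = false;
        private double value;

        DeferredScalar(DoubleSupplier computation) { this.computation = computation; }

        int length() { return 1; } // answered without running the computation

        boolean isEvaluated() { return evaluated; }

        double get() {
            if (!evaluated) {
                value = computation.getAsDouble(); // the work happens here, once
                evaluated = true;
            }
            return value;
        }
    }

    static double mean(double[] v) {
        double total = 0;
        for (double d : v) {
            total += d;
        }
        return total / v.length;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 3.0, 8.0, 15.0};
        // Only a description of the work is stored at this point.
        DeferredScalar z = new DeferredScalar(() -> {
            double[] y = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                y[i] = Math.sqrt(x[i] + 1);
            }
            return mean(y) - mean(x);
        });
        System.out.println(z.length());      // 1
        System.out.println(z.isEvaluated()); // false
        System.out.println(z.get());         // -4.0
        System.out.println(z.isEvaluated()); // true
    }
}
```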

  17. Real-world case study: Distance Correlation in the Energy Package

  18. Distance correlation: a robust measure of association. Zero if and only if the variables are independent.

  19. Slide annotations are shown as comments.

          dcor <- function(x, y, index = 1) {
            x <- as.matrix(dist(x))     # dist(x) evaluates as a view
            y <- as.matrix(dist(y))
            n <- nrow(x)
            m <- nrow(y)
            dims <- c(n, ncol(x), ncol(y))
            Akl <- function(x) {
              d <- as.matrix(x)^index
              m <- rowMeans(d)          # defer rowMeans(x) until later
              M <- mean(d)
              a <- sweep(d, 1, m)
              b <- sweep(a, 2, m)
              return(b + M)
            }
            A <- Akl(x)
            B <- Akl(y)
            dCov <- sqrt(mean(A * B))
            dVarX <- sqrt(mean(A * A))
            dVarY <- sqrt(mean(B * B))
            V <- sqrt(dVarX * dVarY)
            if (V > 0)                  # need to evaluate
              dCor <- dCov/V
            else dCor <- 0
            return(list(dCov = dCov, dCor = dCor, dVarX = dVarX, dVarY = dVarY))
          }
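
Read line by line, the function on slide 19 computes the quantities below. The symbols are introduced here for illustration and do not appear on the slide: $n$ is the number of observations, $p$ stands for the `index` argument, and $B_{kl}$ is built from $y$ the same way $A_{kl}$ is built from $x$.

```latex
\begin{aligned}
a_{kl} &= \lVert x_k - x_l \rVert^{p}, \qquad
A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot} \\
\mathrm{dCov}^2 &= \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl}, \qquad
\mathrm{dVarX}^2 = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl}^2, \qquad
\mathrm{dVarY}^2 = \frac{1}{n^2} \sum_{k,l=1}^{n} B_{kl}^2 \\
\mathrm{dCor} &=
\begin{cases}
\dfrac{\mathrm{dCov}}{\sqrt{\mathrm{dVarX}\,\mathrm{dVarY}}} & \text{if } \mathrm{dVarX}\,\mathrm{dVarY} > 0 \\[1ex]
0 & \text{otherwise}
\end{cases}
\end{aligned}
```

Here $\bar{a}_{k\cdot}$ and $\bar{a}_{\cdot l}$ correspond to the two `sweep` calls (the distance matrix is symmetric, so row and column means coincide) and $\bar{a}_{\cdot\cdot}$ corresponds to `M`.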

  20. [Chart: Run time of distance correlation of 10 pairs of variables. Series: GNU R, C, Renjin. X-axis: Number of Observations (1000, 2000, 5000, 10000).]

  21. Where do we go from here?

  22. Inspired by…

  23. Join us! Download & Test. Contribute! Contract us for Commercial Support. Sponsor Development! > Renjin.org
