
Renjin's JIT: Thinking about R as a Query Language (Directions in Statistical Computing 2014)


  1. Directions in Statistical Computing 2014. Renjin's JIT: Thinking about R as a Query Language. Alexander Bertram, BeDataDriven

  2. Quick Intro: Renjin
     ● R-language interpreter written in Java; uses GNU R core packages (base, stats, etc.) as-is
     ● Goals: completeness first, performance next
     ● C/Fortran: supported with a translator and emulation layer
     ● Can run roughly 50% of CRAN packages (see packages.renjin.org)
     ● Active, diverse user group

  3. R as a “Query Language”
     How can R be as fast as Fortran or C++? How can R be more like SQL?
     – Analyst describes the what
     – Query planner determines the how
     ● Implicit parallelism
     ● Target diverse architectures (in-memory, single node, clusters)

  4. Is R dynamic? Argument: Not where/when performance matters

  5. “But R is too dynamic!”
         airlines <- read.bigtable("airlines")
         print(nrow(airlines))  # ~240m
         fit.exp <- function(x, max.iter = 10) {
           rate <- 1 / mean(x)
           repeat {
             loglik <- sum(-dexp(r = rate, x = lambda, log = T))
             if (goodEnough(loglik)) break
             rate <- next
           }
         }
     Callouts on the slide:
     – Complicated argument matching (the call to dexp)
     – sum() is a group generic; it dispatches based on its argument
     – Is the break() function redefined?

  6.     airlines <- read.bigtable("airlines")
         delay <- airlines$delay[airlines$delay > 30]

         dexp <- function(x, rate = 1, log = FALSE) {
           mean <- 1 / rate
           d <- exp(-x / mean) / mean
           if (log) return(log(d))
           d
         }

         fit.exp <- function(x, max.iter = 10) {
           rate <- 1 / mean(x)
           repeat {
             loglik <- sum(-dexp(r = rate, x, log = T))
             if (loglik > epsilon) break
             rate <- update(rate)
           }
         }

         rate <- fit.exp

  7. Real world example: Distance Correlation [see energy package]


  9. Optimizations: Views
         x <- dist(x)
         y <- dist(y)
         x <- as.matrix(x)
         y <- as.matrix(y)
         # GNU R: x^2 + y^2 memory allocated
         # Renjin: ~0

  10. DistanceMatrix
          public class DistanceMatrix extends DoubleVector {
            private Vector vector;

            // Computes one cell of the distance matrix on demand (column-major index).
            public double getElementAsDouble(int index) {
              int size = vector.length();
              int row = index % size;
              int col = index / size;
              if (row == col) {
                return 0;
              } else {
                double x = vector.getElementAsDouble(row);
                double y = vector.getElementAsDouble(col);
                return Math.abs(x - y);
              }
            }

            // The view reports n * n elements without storing them.
            public int length() {
              return vector.length() * vector.length();
            }
          }
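
The slide's class depends on Renjin's own `DoubleVector` and `Vector` types. As a rough, self-contained illustration of the same view idea, the sketch below swaps in a hypothetical `NumericView` interface (an assumption made for this example, not part of Renjin) and a plain `double[]` as input.

```java
/** Hypothetical stand-in for a read-only numeric vector; not Renjin's API. */
interface NumericView {
    int length();
    double get(int index);
}

/** Pairwise distances |x[i] - x[j]| exposed as an n * n column-major vector. */
final class DistanceView implements NumericView {
    private final double[] values;

    DistanceView(double[] values) {
        this.values = values;
    }

    @Override
    public int length() {
        return values.length * values.length;
    }

    @Override
    public double get(int index) {
        int n = values.length;
        int row = index % n;
        int col = index / n;
        // Each cell is computed on demand; nothing of size n * n is stored.
        return row == col ? 0.0 : Math.abs(values[row] - values[col]);
    }
}

class DistanceViewDemo {
    public static void main(String[] args) {
        NumericView d = new DistanceView(new double[] {1.0, 4.0, 6.0});
        System.out.println(d.length()); // number of matrix cells
        System.out.println(d.get(1));   // cell at row 1, column 0
    }
}
```

The only memory held by the view is the original input array, which is the point the "~0" on the previous slide is making.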

  11. Deferred Evaluation
      ● Defer computation of pure functions when inputs exceed some threshold:
          x <- (1:100) + 4   # x is computed
          y <- (1:1e6) + 4   # no work done
                             # y is a view
          z <- y - mean(y)
          z <- dnorm(z)
          print(z)           # triggers evaluation
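
A minimal sketch of how such a threshold rule could look, assuming a hypothetical `Vec` interface and an invented cut-off of 1,000 elements (the slides do not give the real threshold, and none of these names come from Renjin). Small inputs are computed immediately; large inputs come back as a view that only remembers the pending operation.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical read-only vector type for this sketch; not Renjin's API. */
interface Vec {
    int length();
    double get(int i);
}

/** A vector whose elements are stored in memory. */
final class ArrayVec implements Vec {
    private final double[] data;
    ArrayVec(double[] data) { this.data = data; }
    @Override public int length() { return data.length; }
    @Override public double get(int i) { return data[i]; }
}

/** The sequence 1, 2, ..., n generated on demand (like R's 1:n). */
final class SeqVec implements Vec {
    private final int n;
    SeqVec(int n) { this.n = n; }
    @Override public int length() { return n; }
    @Override public double get(int i) { return i + 1; }
}

/** A pending element-wise operation over another vector. */
final class MappedVec implements Vec {
    private final Vec source;
    private final DoubleUnaryOperator op;
    MappedVec(Vec source, DoubleUnaryOperator op) {
        this.source = source;
        this.op = op;
    }
    @Override public int length() { return source.length(); }
    @Override public double get(int i) { return op.applyAsDouble(source.get(i)); }
}

final class Deferred {
    /** Assumed cut-off for this sketch only. */
    static final int THRESHOLD = 1_000;

    /** Applies op eagerly to small inputs and lazily to large ones. */
    static Vec map(Vec x, DoubleUnaryOperator op) {
        Vec view = new MappedVec(x, op);
        return x.length() > THRESHOLD ? view : materialize(view);
    }

    /** Forces evaluation, as print(z) does on the slide. */
    static Vec materialize(Vec v) {
        double[] out = new double[v.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = v.get(i);
        }
        return new ArrayVec(out);
    }
}
```

With these definitions, `Deferred.map(new SeqVec(100), v -> v + 4)` returns a stored `ArrayVec`, while the same call on `new SeqVec(1_000_000)` returns a `MappedVec` and does no arithmetic until an element is read or `materialize` is called.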


  13. Query Planner
      ● Once evaluation is triggered, we have a broader view of the calculation to be completed
      ● The computation graph is essentially a pure function
      ● We can reorder operations and easily see which branches can be evaluated independently, in parallel
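
To make the last bullet concrete, here is a small sketch using the standard `CompletableFuture` API. The expression `mean(x) - mean(y)` is an example chosen for this illustration, not one taken from the slides: its two branches share no inputs and have no side effects, so a planner would be free to run them concurrently and combine the results afterwards.

```java
import java.util.concurrent.CompletableFuture;

final class ParallelBranches {
    static double mean(double[] v) {
        double sum = 0;
        for (double e : v) {
            sum += e;
        }
        return sum / v.length;
    }

    /**
     * Evaluates mean(x) - mean(y). The two means are independent pure
     * computations, so each is submitted as its own asynchronous task.
     */
    static double meanDifference(double[] x, double[] y) {
        CompletableFuture<Double> left = CompletableFuture.supplyAsync(() -> mean(x));
        CompletableFuture<Double> right = CompletableFuture.supplyAsync(() -> mean(y));
        // Combine once both branches have finished, then wait for the result.
        return left.thenCombine(right, (a, b) -> a - b).join();
    }
}
```

Because both branches are pure, the result is the same whether they run in parallel, in sequence, or in the opposite order.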


  15. Loop Fusion
          mean(op1(op2(op3(x))))
      transformed to...
          double sum = 0;
          for (int i = 0; i < 1000; i++) {
            sum += op1(op2(op3(x[i])));
          }
          double mean = sum / 1000;
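
The sketch below contrasts the two evaluation strategies side by side. The concrete bodies of `op1`, `op2` and `op3` are placeholders picked for this example; the slide leaves them abstract.

```java
final class LoopFusion {
    // Placeholder element-wise operations; the slide does not define them.
    static double op1(double v) { return Math.sqrt(v); }
    static double op2(double v) { return v * v; }
    static double op3(double v) { return v + 4.0; }

    /** Unfused: every operation makes a full pass and a full-length temporary. */
    static double meanUnfused(double[] x) {
        double[] t3 = new double[x.length];
        for (int i = 0; i < x.length; i++) t3[i] = op3(x[i]);
        double[] t2 = new double[x.length];
        for (int i = 0; i < x.length; i++) t2[i] = op2(t3[i]);
        double[] t1 = new double[x.length];
        for (int i = 0; i < x.length; i++) t1[i] = op1(t2[i]);
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += t1[i];
        return sum / x.length;
    }

    /** Fused: one pass over x, no temporaries. */
    static double meanFused(double[] x) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += op1(op2(op3(x[i])));
        }
        return sum / x.length;
    }
}
```

Both methods apply the same operations to each element in the same order, so they return the same value; the fused version avoids three array allocations and three extra passes over memory.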

  16. Beyond Bytecode
      – JVM Byte Code → Native Machine Code
      – SQL Query
      – OpenCL

  17. Results


  19. Loops!
          m <- 4
          for (i in 1:m) {
            x = exp(tanh(a^2 * (b^2 + i/m)))
            r[i%%10+1] = r[i%%10+1] + sum(x)
          }
      Kaboom! (thanks Radford!)

  20. Loops!
      ● R gives you the flexibility to mix imperative and functional approaches
      ● In many dynamic languages (JS, Ruby), sophisticated runtime analysis is required to identify and compile hotspots in the code
      ● In R, they're pretty easy to spot:
          x <- 1:1e6
          for (i in seq_along(x)) { ... }

  21.     for (i in 1:m) {
            x = exp(tanh(a^2 * (b^2 + i/m)))
            r[i%%10+1] = r[i%%10+1] + sum(x)
          }

      BB1:
          τ₃ ← (: 1.0d m₀)
          Λ0₁ ← 0
          τ₂ ← length(τ₃)
      BB2: [L0]
          r₁ ← Φ(r₀, r₂)
          Λ0₂ ← Φ(Λ0₁, Λ0₃)
          i₁ ← Φ(i₀, i₂)
          x₁ ← Φ(x₀, x₂)
          if Λ0₂ >= τ₂ => TRUE:L3, FALSE:L1, NA:ERROR
      BB3: [L1]
          i₂ ← τ₃[Λ0₂]
          τ₄ ← (^ a₀ 2.0d)
          τ₅ ← (^ b₀ 2.0d)
          τ₆ ← (/ i₂ m₀)
          τ₇ ← (+ τ₅ τ₆)
          τ₈ ← (* τ₄ τ₇)
          τ₉ ← (tanh τ₈)
          x₂ ← (exp τ₉)
          τ₁₀ ← (%% i₂ 10.0d)
          τ₁₁ ← (+ τ₁₀ 1.0d)
          τ₁₂ ← ([ r₁ τ₁₁)
          τ₁₃ ← (sum x₂)
          τ₁₄ ← (%% i₂ 10.0d)
          τ₁₅ ← (+ τ₁₄ 1.0d)
          τ₁₆ ← (%% i₂ 10.0d)
          τ₁₇ ← (+ τ₁₆ 1.0d)
          r₂ ← ([<- r₁ τ₁₇)
      BB4: [L2]
          Λ0₃ ← increment counter Λ0₂
          goto L0
      BB5: [L3]
          return NULL
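
For intuition only, the following is a hand-written guess at the kind of primitive loop a compiler could produce from this R loop once the types are pinned down. It is not Renjin's actual output, and it assumes that `a` and `b` are double vectors of equal length and that `r` is a double vector with at least 10 elements. The temporary vector `x` is never materialised; its elements are folded straight into the running sum.

```java
final class SpecializedLoop {
    /**
     * Hand-specialized version of the R loop, under the assumptions that
     * a and b are equal-length double vectors and r has at least 10 elements.
     */
    static double[] run(double[] a, double[] b, int m, double[] r) {
        for (int i = 1; i <= m; i++) {
            double frac = (double) i / m;
            double sum = 0;
            for (int k = 0; k < a.length; k++) {
                // exp(tanh(a^2 * (b^2 + i/m))), one element at a time
                sum += Math.exp(Math.tanh(a[k] * a[k] * (b[k] * b[k] + frac)));
            }
            // R's r[i %% 10 + 1] is 1-based; the Java array is 0-based.
            r[i % 10] += sum;
        }
        return r;
    }
}
```

The generic version has to re-check on every iteration what `^`, `tanh`, `sum` and `[<-` mean for their arguments; the specialized version has those decisions baked in.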

  22. Compared to other dynamic languages?
      ● Argument: speculative specialization works very well for long-running code, but is unnecessary for most statistical code with many loops:
        – Simulations
        – Iterative algorithms
        – ?
      ● Needs to be tested...

  23. packages.renjin.org

  24. Developing CI + benchmarking system for testing optimizations

  25. More Information
      ● http://www.renjin.org
      ● http://packages.renjin.org
      ● http://docs.renjin.org/en/latest/
