lazyeval: A uniform approach to NSE, July 2016, Hadley Wickham



slide-1
SLIDE 1

Hadley Wickham 
 @hadleywickham
 Chief Scientist, RStudio

lazyeval

A uniform approach to NSE

July 2016

slide-2
SLIDE 2

Motivation

slide-3
SLIDE 3

Take this simple variant of subset()

subset <- function(df, condition) {
  cond <- substitute(condition)
  rows <- eval(cond, df, parent.frame())
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

slide-4
SLIDE 4

Pro: it reduces typing

subset(
  my_data_frame_with_a_very_long_name,
  x > 10 & y > 10
)

# vs.

my_data_frame_with_a_very_long_name[
  my_data_frame_with_a_very_long_name$x > 10 &
  my_data_frame_with_a_very_long_name$y > 10,
]

# and hence makes the code clearer

slide-5
SLIDE 5

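Pro: it alleviates two common frustrations

df <- data.frame(x = c(1:5, NA))

# The NA row is dropped and the result stays a data frame
subset(df, x > 3)
#>   x
#> 4 4
#> 5 5

# vs. the NA row is kept and the result collapses to a vector
df[df$x > 3, ]
#> [1]  4  5 NA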

slide-6
SLIDE 6

Con: you can’t define then use the arguments

rows <- cyl == 6          # fails: object 'cyl' not found
my_subset(mtcars, rows)

slide-7
SLIDE 7

Con: it fails with the simplest wrapper

my_subset <- function(df, cond) {
  subset(df, cond)
}

my_subset(mtcars, cyl == 6)
#> Error in eval(expr, envir, enclos) :
#>   object 'cyl' not found
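The failure can be reproduced with base R alone. In this sketch the slide's function is renamed `subset2()` (a name chosen here so that `base::subset()` is not masked), and `tryCatch()` captures the error message instead of stopping the script.

```r
# The slide's subset() variant, renamed so it does not mask base::subset().
subset2 <- function(df, condition) {
  cond <- substitute(condition)
  rows <- eval(cond, df, parent.frame())
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

my_subset <- function(df, cond) {
  subset2(df, cond)
}

# Inside subset2(), substitute() sees the symbol `cond`, not `cyl == 6`.
# Evaluating `cond` forces the promise in the caller's environment, where
# there is no `cyl`, so the call fails.
msg <- tryCatch(
  {
    my_subset(mtcars, cyl == 6)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# The direct call works as expected.
direct <- subset2(mtcars, cyl == 6)
nrow(direct)
```

The wrapper breaks because `substitute()` only looks one level up: it returns whatever expression the immediate caller wrote, which here is just `cond`.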

slide-8
SLIDE 8

Con: it’s hard to safely compose

threshold_x <- function(df, threshold) {
  subset(df, x > threshold)
}

# Silently gives incorrect result if:
# (a) no x col in df, but x var in parent
# (b) df has threshold column
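Both failure modes can be triggered with small data frames. This is a base R sketch; `subset2()`, `df_a`, and `df_b` are names introduced here for illustration, and `subset2()` is the simple variant from the earlier slide.

```r
subset2 <- function(df, condition) {
  cond <- substitute(condition)
  rows <- eval(cond, df, parent.frame())
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

threshold_x <- function(df, threshold) {
  subset2(df, x > threshold)
}

# (b) A `threshold` column shadows the function argument:
# x is compared with the column (10), not with the argument (2).
df_b <- data.frame(x = 1:5, threshold = 10)
res_b <- threshold_x(df_b, 2)
nrow(res_b)

# (a) No `x` column, so `x` is looked up further along the scope chain:
# the condition becomes 100 > 2 and every row is kept.
x <- 100
df_a <- data.frame(y = 1:3)
res_a <- threshold_x(df_a, 2)
nrow(res_a)
```

Neither call raises an error or a warning, which is what makes the problem hard to spot.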

slide-9
SLIDE 9

Con: it’s hard to safely parameterise

# I think this is the best you can do
threshold <- function(df, var, threshold) {
  stopifnot(is.name(var))
  eval(substitute(subset(df, var > threshold)))
}

slide-10
SLIDE 10

Can we do better?

slide-11
SLIDE 11

Can we do better?

subset <- function(df, condition) {
  cond <- substitute(condition)
  rows <- eval(cond, df, parent.frame())
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

slide-12
SLIDE 12

Here is one approach

sieve <- function(df, condition) {
  rows <- lazyeval::f_eval(condition, df)
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

slide-13
SLIDE 13

Con: requires 1-2 more characters

subset(mtcars, mpg > 30)
# vs.
sieve(mtcars, ~ mpg > 30)

slide-14
SLIDE 14

Pro: it’s referentially transparent

# This works:
x <- ~ mpg > 30
sieve(mtcars, x)

# As does this:
my_sieve <- function(df, condition) {
  sieve(df, condition)
}

# And this:
n <- 10
my_sieve(mtcars, ~ mpg > n)

slide-15
SLIDE 15

Why does this work?

library(lazyeval)

# Because a formula captures both the
# expression and the environment
f <- ~ mpg > 30
f_rhs(f)
#> mpg > 30
f_env(f)
#> <environment: R_GlobalEnv>
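The same two pieces can be reached without lazyeval, which makes the mechanism easier to see. The sketch below uses only base R; `make_filter()` is a hypothetical helper introduced here to show that the captured environment need not be the global one.

```r
make_filter <- function() {
  n <- 20          # local variable, not visible from the global environment
  ~ mpg > n        # the formula remembers where it was created
}

f <- make_filter()
rhs <- f[[2]]              # a one-sided formula stores its expression here
env <- environment(f)      # the environment of the make_filter() call

# `mpg` comes from the data, `n` from the captured environment.
rows <- eval(rhs, mtcars, env)
sum(rows)
```

Because the formula carries its environment with it, `n` is still found even though `make_filter()` has already returned.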

slide-16
SLIDE 16

Most important new function is f_eval()

sieve <- function(df, condition) {
  rows <- f_eval(condition, df)
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

slide-17
SLIDE 17

f_eval() is mostly simple:

# f_eval() is 90% this:
f_eval <- function(f, data) {
  eval(f_rhs(f), data, f_env(f))
}

# But it provides two useful features:
# (a) pronouns to disambiguate
# (b) full quasiquotation engine
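The simplified version is enough to show why wrappers stop being a problem. The following is a base R sketch, not lazyeval's implementation: `f_eval_sketch()` and `sieve2()` are names invented here, and the sketch has neither pronouns nor unquoting.

```r
# Simplified stand-in for lazyeval::f_eval(): no pronouns, no unquoting.
f_eval_sketch <- function(f, data) {
  stopifnot(inherits(f, "formula"), length(f) == 2)  # one-sided formulas only
  eval(f[[2]], data, environment(f))
}

sieve2 <- function(df, condition) {
  rows <- f_eval_sketch(condition, df)
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

# A wrapper now works, because the formula is an ordinary value
# that can be passed along unchanged.
my_sieve <- function(df, condition) {
  sieve2(df, condition)
}

cutoff <- 30
direct  <- sieve2(mtcars, ~ mpg > cutoff)
wrapped <- my_sieve(mtcars, ~ mpg > cutoff)
identical(direct, wrapped)
```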

slide-18
SLIDE 18

Can use pronouns to disambiguate:

threshold_x <- function(df, threshold) {
  sieve(df, ~ .data$x > .env$threshold)
}

# This will never fail silently
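One way to picture how such pronouns could work is to add two strict lookup objects to the evaluation data. The sketch below is a base R illustration under that assumption, not lazyeval's actual mechanism; `strict()`, the `strict_pronoun` class, `f_eval_pronouns()`, and `sieve2()` are all hypothetical names.

```r
# Hypothetical strict wrapper: `$` on it errors instead of returning NULL.
strict <- function(src, label) {
  structure(list(src = src, label = label), class = "strict_pronoun")
}

`$.strict_pronoun` <- function(x, name) {
  fields <- unclass(x)
  src <- fields$src
  found <- if (is.environment(src)) {
    exists(name, envir = src, inherits = TRUE)
  } else {
    name %in% names(src)
  }
  if (!found) {
    stop("`", name, "` not found in ", fields$label, call. = FALSE)
  }
  if (is.environment(src)) get(name, envir = src, inherits = TRUE) else src[[name]]
}

# Evaluate the formula with the columns plus the two pronouns in scope.
f_eval_pronouns <- function(f, data) {
  env <- environment(f)
  mask <- c(
    as.list(data),
    list(.data = strict(data, "the data"),
         .env = strict(env, "the environment"))
  )
  eval(f[[2]], mask, env)
}

sieve2 <- function(df, condition) {
  rows <- f_eval_pronouns(condition, df)
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

threshold_x <- function(df, threshold) {
  sieve2(df, ~ .data$x > .env$threshold)
}

# A `threshold` column no longer shadows the argument.
res <- threshold_x(data.frame(x = 1:5, threshold = 10), 2)
nrow(res)

# A missing `x` column is now an error rather than a silent fallback.
msg <- tryCatch(threshold_x(data.frame(y = 1:3), 2), error = conditionMessage)
msg
```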

slide-19
SLIDE 19

Can use quasiquotation to parameterise:

threshold <- function(df, var, threshold) {
  sieve(df, ~ uq(var) > .env$threshold)
}

threshold(mtcars, ~mpg, 30)

# Similar to bquote() but also provides
# unquote-splice: uqs()
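For comparison, here is roughly what the same parameterisation looks like with base R's `bquote()`, which the slide names as the closest analogue. This is a sketch only: the variable is passed as a quoted name rather than a formula, and evaluating with `baseenv()` as the enclosure is a choice made here so that a missing column cannot fall back to a global variable.

```r
threshold <- function(df, var, threshold) {
  stopifnot(is.name(var))
  # .(var) splices in the symbol; .(threshold) splices in the value.
  cond <- bquote(.(var) > .(threshold))
  rows <- eval(cond, df, baseenv())
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

high_mpg <- threshold(mtcars, quote(mpg), 30)
nrow(high_mpg)
```

`bquote()` has no environment attached to its result, which is one reason the formula-based approach composes more easily.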

slide-20
SLIDE 20

What if you want to eliminate the ~?

sieve <- function(df, condition) {
  sieve_(df, f_capture(condition))  # turns promise into formula
}

sieve_ <- function(df, condition) {
  rows <- f_eval(condition, df)
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

Convention: always provide SE version with _ suffix
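The NSE/SE pairing can be sketched in base R as well. Here `substitute()` plus `parent.frame()` stand in for `f_capture()`; unlike lazyeval, this version only sees the immediate caller's expression and does not follow promise chains. The names `sieve2()` and `sieve2_()` are invented for the sketch.

```r
# NSE front end: captures the caller's expression and environment
# and packages them as a formula.
sieve2 <- function(df, condition) {
  expr <- substitute(condition)
  f <- eval(call("~", expr), parent.frame())
  sieve2_(df, f)
}

# SE back end: takes an explicit formula and does the real work.
sieve2_ <- function(df, condition) {
  stopifnot(inherits(condition, "formula"))
  rows <- eval(condition[[2]], df, environment(condition))
  rows[is.na(rows)] <- FALSE
  df[rows, , drop = FALSE]
}

nse <- sieve2(mtcars, mpg > 30)      # interactive use, no ~
se  <- sieve2_(mtcars, ~ mpg > 30)   # programmatic use, explicit formula
identical(nse, se)
```

Keeping the logic in the `_` version means wrappers can always call it with a formula and avoid NSE entirely.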

slide-21
SLIDE 21

Another motivation

slide-22
SLIDE 22

NSE commonly used for labelling

grid <- seq(0, pi, , 30)
sinx <- sin(grid)
plot(grid, sinx)

# Inside plot:
xlabel <- deparse(substitute(x))

[Plot: sinx against grid, with axes labelled grid and sinx]

slide-23
SLIDE 23

Con: deparse() returns a vector!

deparse(quote({
  a + b
  c + d
}))

# Not a problem for plot, but I've been
# bitten by this many times in error messages
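A short base R check makes the problem concrete, along with one common workaround: collapsing the pieces into a single string before using them in a label or message.

```r
expr <- quote({
  a + b
  c + d
})

pieces <- deparse(expr)
length(pieces) > 1      # one element per line of deparsed code

# Collapse to a single string so it is safe to use as a label.
label <- paste(pieces, collapse = "\n")
length(label)
```

Code that assumes `deparse()` returns a single string, for example inside `stop()` or `paste0()`, will otherwise produce one message per line.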

slide-24
SLIDE 24

Con: substitute() doesn’t follow chain of promises

myplot <- function(x, y) {
  plot(x, y, pch = 20, cex = 2)
}

myplot(1:10, runif(10))

[Plot: scatterplot with axes labelled x and y]
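The same effect shows up without any plotting. In this base R sketch, `label_of()` and `wrapper()` are hypothetical helpers that mimic what `plot()` and `myplot()` do with their arguments.

```r
# Labels its argument the way plot() does: deparse(substitute(...)).
label_of <- function(x) deparse(substitute(x))

# A thin wrapper, like myplot() on the slide.
wrapper <- function(y) label_of(y)

direct  <- label_of(1:10)   # sees the expression the user typed
wrapped <- wrapper(1:10)    # sees only the wrapper's argument name

direct
wrapped
```

`substitute()` stops at the first promise it finds, so through the wrapper the label becomes the name `y` instead of the original expression.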

slide-25
SLIDE 25

lazyeval also provides some tools

# Like substitute, but finds "root" promise
expr_find(x)
expr_env(x, default_env)

# Couple of helpers to convert to strings
expr_text(x)
expr_label(x)

slide-26
SLIDE 26

Implementation is relatively straightforward

SEXP base_promise(SEXP promise, SEXP env) {
  while(TYPEOF(promise) == PROMSXP) {
    env = PRENV(promise);
    promise = PREXPR(promise);

    if (env == R_NilValue)
      break;

    if (TYPEOF(promise) == SYMSXP) {
      SEXP obj = Rf_findVar(promise, env);

      if (TYPEOF(obj) != PROMSXP)
        break;

      if (is_lazy_load(obj))
        break;

      promise = obj;
    }
  }

  return promise;
}

slide-27
SLIDE 27

Conclusion

slide-28
SLIDE 28

1. Where possible, use formulas instead of NSE.
2. Provide pronouns to disambiguate.
3. Use quasiquotation to parameterise.

slide-29
SLIDE 29

lazyeval

http://rpubs.com/hadley/lazyeval

https://github.com/hadley/lazyeval/