Robust code: R Functions (PowerPoint presentation)

SLIDE 1

R Functions

Robust code

SLIDE 2

What do these calls do?

> df[, vars]
> subset(df, x == y)
> data.frame(x = "a")
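Here is a minimal sketch of the surprises in the first two calls. The data and the names df, vars, df2 and y are made-up assumptions and do not come from the slides. The third call relates to hidden arguments: before R 4.0.0, the default for stringsAsFactors came from a global option. That option decided whether data.frame(x = "a") stored a character column or a factor.

```r
# Made-up data (assumption): two integer columns
df <- data.frame(x = 1:3, y = c(1L, 5L, 3L))

# df[, vars]: the result type depends on how many columns are selected
vars <- c("x", "y")
str(df[, vars])     # a data frame with 2 columns
vars <- "x"
str(df[, vars])     # an integer vector: the data frame is dropped

# subset(df, x == y): x and y are looked up in df first
subset(df, x == y)  # rows 1 and 3, where column x equals column y

# With no column y in the data, subset() quietly uses y from the environment
df2 <- data.frame(x = 1:3)
y <- 2
subset(df2, x == y) # the row where x == 2
```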

SLIDE 3

Interactive analysis: Helpful
Programming: Strict

SLIDE 4

Three main problems

  • Type-unstable functions
  • Non-standard evaluation
  • Hidden arguments

SLIDE 5

Throwing errors

> x <- 1:10
> stopifnot(is.character(x))
Error: is.character(x) is not TRUE
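Here is a sketch of failing early inside a function, using a hypothetical mean_nchar(). Without the check, nchar() would quietly convert the numbers to strings and return a meaningless answer. With the check, the function stops with the stopifnot() message.

```r
# Hypothetical function: average number of characters per string
mean_nchar <- function(x) {
  # Fail early: check the input before doing any work
  stopifnot(is.character(x))
  mean(nchar(x))
}

mean_nchar(c("a", "abc"))  # 2

# Catch the error to inspect its message
msg <- tryCatch(mean_nchar(1:10), error = function(e) conditionMessage(e))
msg  # "is.character(x) is not TRUE"
```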

SLIDE 6

Throwing errors

if (condition) {
  stop("Error", call. = FALSE)
}

SLIDE 7

Throwing errors

> x <- 1:10
> stopifnot(is.character(x))
Error: is.character(x) is not TRUE
> if (!is.character(x)) {
+   stop("`x` should be a character vector", call. = FALSE)
+ }
Error: `x` should be a character vector
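Here is a sketch comparing the two forms of stop(), using two hypothetical functions. Setting call. = FALSE leaves the call out of the condition. The message then prints as "Error: ..." rather than "Error in f(x) : ...".

```r
# Hypothetical function using the if/stop() pattern from the slide
check_character <- function(x) {
  if (!is.character(x)) {
    stop("`x` should be a character vector", call. = FALSE)
  }
  invisible(x)
}

# The same check, keeping the default call. = TRUE
check_character_with_call <- function(x) {
  if (!is.character(x)) {
    stop("`x` should be a character vector")
  }
  invisible(x)
}

err1 <- tryCatch(check_character(1:10), error = function(e) e)
err2 <- tryCatch(check_character_with_call(1:10), error = function(e) e)

conditionMessage(err1)        # "`x` should be a character vector"
is.null(conditionCall(err1))  # TRUE: no call is attached
conditionCall(err2)           # check_character_with_call(1:10)
```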

SLIDE 8

Let’s practice!

SLIDE 9

Unstable types

SLIDE 10

Surprises due to unstable types

  • Type-inconsistent: the type of the return object depends on the input
  • Surprises occur when you’ve used a type-inconsistent function inside your own function
  • They can lead to hard-to-decipher error messages
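Here is a quick sketch of type inconsistency using sapply(). How sapply() simplifies its result depends on the lengths of the individual results. As a result, the same function can return a vector, a matrix or a list.

```r
# The shape of sapply()'s result depends on what each call returns
one_each <- sapply(1:3, function(i) i)          # each result has length 1
two_each <- sapply(1:3, function(i) c(i, i))    # each result has length 2
ragged   <- sapply(1:3, function(i) seq_len(i)) # results of different lengths
empty    <- sapply(list(), length)              # no input at all

is.integer(one_each)  # TRUE: a plain integer vector
is.matrix(two_each)   # TRUE: a 2 x 3 matrix
is.list(ragged)       # TRUE: a list
is.list(empty)        # TRUE: an empty list, not an empty integer vector
```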

SLIDE 11

What will df[1, ] return?

> df <- data.frame(z = 1:3, y = 2:4)
> str(df[1, ])
'data.frame': 1 obs. of 2 variables:
 $ z: int 1
 $ y: int 2
> df <- data.frame(z = 1:3)
> str(df[1, ])
 int 1

SLIDE 12

[ is a common source of surprises

> last_row <- function(df) {
+   df[nrow(df), ]
+ }
> df <- data.frame(x = 1:3)
> # Not a row, just a vector
> str(last_row(df))
 int 3

SLIDE 13

Two common solutions for [

  • Use drop = FALSE: df[x, , drop = FALSE]
  • Subset the data frame like a list: df[x]

> last_row <- function(df) {
+   df[nrow(df), , drop = FALSE]
+ }
> df <- data.frame(x = 1:3)
> str(last_row(df))
'data.frame': 1 obs. of 1 variable:
 $ x: int 3
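The same two fixes apply when selecting columns, as in the df[, vars] call from the start of the chapter. Here is a sketch with three hypothetical helpers and made-up data.

```r
# Hypothetical helpers that keep only the columns named in vars
select_cols_unstable <- function(df, vars) {
  df[, vars]                 # drops to a vector when length(vars) == 1
}
select_cols_stable <- function(df, vars) {
  df[, vars, drop = FALSE]   # always a data frame
}
select_cols_list <- function(df, vars) {
  df[vars]                   # list-style subsetting: always a data frame
}

df <- data.frame(x = 1:3, y = 4:6)
str(select_cols_unstable(df, "x"))  # an integer vector
str(select_cols_stable(df, "x"))    # a data frame with one column
str(select_cols_list(df, "x"))      # a data frame with one column
```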

SLIDE 14

What to do?

  • Write your own functions to be type-stable
  • Learn the common type-inconsistent functions in R: [, sapply
  • Avoid using type-inconsistent functions inside your own functions
  • Build a vocabulary of type-consistent functions
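Here is a sketch of one type-consistent replacement for sapply(). vapply() takes the expected type and length of each result (FUN.VALUE). It therefore returns the same type for empty input, and it raises an error instead of changing shape. purrr's map_int(), map_dbl(), map_chr() and map_lgl() play a similar role.

```r
# vapply() is told the type and length of every result (FUN.VALUE)
empty_sapply <- sapply(list(), length)               # list()
empty_vapply <- vapply(list(), length, integer(1))   # integer(0)

# A result with the wrong shape is an error, not a different output type
res <- tryCatch(
  vapply(1:3, function(i) seq_len(i), integer(1)),
  error = function(e) "error"
)
res  # "error"
```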

SLIDE 15

Let’s practice!

SLIDE 16

Non-standard evaluation

SLIDE 17

What is non-standard evaluation?

> subset(mtcars, disp > 400)
                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
> disp > 400
Error: object 'disp' not found
> disp
Error: object 'disp' not found

  • disp > 400 is evaluated inside mtcars, so disp is found as a column
  • Non-standard evaluation functions don’t use the usual lookup rules
  • They are great for data analysis, because they save typing
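Here is one surprising case, sketched with a hypothetical wrapper big_engines() and made-up data. subset() looks for names in the data before it looks at the function's own arguments. A column that happens to share an argument's name therefore takes over silently.

```r
# Hypothetical wrapper around subset(): keep rows with large engines
big_engines <- function(df, threshold) {
  subset(df, disp > threshold)
}

cars_a <- data.frame(disp = c(100, 500))
big_engines(cars_a, 400)   # one row: disp = 500, as intended

# If the data contain a column called threshold, that column is used
# and the threshold argument is silently ignored
cars_b <- data.frame(disp = c(100, 500), threshold = c(0, 1000))
big_engines(cars_b, 400)   # one row: disp = 100, not what was asked for
```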

SLIDE 18

Other NSE functions

> library(ggplot2)
> ggplot(mpg, aes(displ, cty)) + geom_point()
> library(dplyr)
> filter(mtcars, disp > 400)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
> disp_threshold <- 400
> filter(mtcars, disp > disp_threshold)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4

SLIDE 19

Other NSE functions

In filter(mtcars, disp > disp_threshold), disp is found as a column of mtcars. disp_threshold is not a column, so filter() uses the disp_threshold value in the global environment.

SLIDE 20

What to do?

  • Using non-standard evaluation functions inside your own functions can cause surprises
  • Avoid using non-standard evaluation functions inside your functions
  • Or, learn the surprising cases and protect against them
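One way to protect against such cases is sketched here as a hypothetical big_engines_se(). It accesses columns with [[ and filters rows with [, so every name follows the usual lookup rules. NA comparisons are dropped, which mirrors what subset() does.

```r
# Hypothetical standard-evaluation version of a subset() wrapper
big_engines_se <- function(df, threshold) {
  stopifnot(is.data.frame(df), "disp" %in% names(df))
  keep <- df[["disp"]] > threshold          # threshold is always the argument
  df[!is.na(keep) & keep, , drop = FALSE]   # drop NA comparisons, as subset() does
}

# Made-up data with a column that shares the argument's name
cars_b <- data.frame(disp = c(100, 500), threshold = c(0, 1000))
big_engines_se(cars_b, 400)   # one row: disp = 500
```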

SLIDE 21

Let’s practice!

SLIDE 22

Hidden arguments

SLIDE 23

Pure functions

  1. Their output depends only on their inputs
  2. They don’t affect the outside world except through their return value

  • Hidden arguments are function inputs that may be different for different users or sessions
  • Common example: argument defaults that depend on global options
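Here is a small sketch of a hidden argument, using hypothetical functions pi_label() and pi_label_pure(). When digits isn't given, format() falls back to getOption("digits"). The same call can therefore return different strings in different sessions.

```r
# Hypothetical function with a hidden argument: format() uses
# getOption("digits") because no digits value is passed
pi_label <- function() {
  paste("pi is about", format(pi))
}

# The same function with the setting made an explicit argument
pi_label_pure <- function(digits = 7) {
  paste("pi is about", format(pi, digits = digits))
}

old <- options(digits = 7)
a <- pi_label()        # "pi is about 3.141593"
options(digits = 3)
b <- pi_label()        # "pi is about 3.14": same call, different result
options(old)           # restore the previous setting

pi_label_pure(3)       # "pi is about 3.14" in any session
```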

SLIDE 24

Viewing global options

> options()
$add.smooth
[1] TRUE

$browserNLdisabled
[1] FALSE

$CBoundsCheck
[1] FALSE

$check.bounds
[1] FALSE
...

SLIDE 25

Getting and setting options

> getOption("digits")
[1] 7
> options(digits = 5)
> getOption("digits")
[1] 5
> # To read about some of the common options
> ?options
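Here is a common pattern, sketched with a hypothetical print_with_digits(). options() returns the previous values, so a function can change an option temporarily and restore it with on.exit().

```r
# Hypothetical helper: print x with a chosen number of digits
print_with_digits <- function(x, digits) {
  old <- options(digits = digits)    # returns the previous value(s)
  on.exit(options(old), add = TRUE)  # restore them, even if print() fails
  print(x)
  invisible(x)
}

before <- getOption("digits")
print_with_digits(pi, 3)                # prints [1] 3.14
identical(getOption("digits"), before)  # TRUE: the global setting is unchanged
```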

SLIDE 26

Relying on options in your code

  • The return value of a function should never depend on a global option
  • Side effects may be controlled by global options
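Here is a sketch of that split, using a hypothetical report_mean(). The computation ignores options. Only the printed message, which is a side effect, follows getOption("digits").

```r
# Hypothetical function: the value it returns never depends on options
report_mean <- function(x) {
  m <- mean(x)                   # computation: no options involved
  message("Mean: ", format(m))   # side effect: format() follows getOption("digits")
  invisible(m)                   # return the full-precision value
}

old <- options(digits = 3)
m <- report_mean(c(1, 2, 4))     # message: "Mean: 2.33"
options(old)                     # restore the previous setting
isTRUE(all.equal(m, 7 / 3))      # TRUE: the returned value was not rounded
```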

SLIDE 27

Let’s practice!

SLIDE 28

Wrapping up

SLIDE 29

Writing functions

  • If you have copy-and-pasted two times, it’s time to write a function
  • Solve a simple problem before writing the function
  • A good function is both correct and understandable

SLIDE 30

Functional Programming

  • Abstract away the pattern, so you can focus on the data and actions
  • Solve iteration problems more easily
  • Have more understandable code
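Here is a minimal sketch of abstracting away the pattern, with a hypothetical col_summary() that takes the action as a function argument. Base vapply() is used so the example runs without packages. The purrr map_*() functions serve the same purpose.

```r
# Hypothetical helper: the "loop over columns" pattern is written once,
# and the action to take on each column is passed in as fun
col_summary <- function(df, fun) {
  vapply(df, fun, numeric(1))
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))
col_summary(df, mean)    # a = 2, b = 30
col_summary(df, median)  # a = 2, b = 20
```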

SLIDE 31

Remove duplication and improve readability

> df$a <- (df$a - min(df$a, na.rm = TRUE)) /
+   (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
> df$b <- (df$b - min(df$b, na.rm = TRUE)) /
+   (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
> df$c <- (df$c - min(df$c, na.rm = TRUE)) /
+   (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
> df$d <- (df$d - min(df$d, na.rm = TRUE)) /
+   (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

> library(purrr)
> df[] <- map(df, rescale01)
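The map() call relies on a rescale01() function that isn't defined in this part of the slides. Here is a sketch of one definition that computes the same thing as the repeated expression. The course's own version may differ in detail. It is applied with base lapply() so that it runs without purrr.

```r
# One possible rescale01(), equivalent to the repeated expression:
# subtract the minimum and divide by the range, ignoring NAs
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)   # c(min, max)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up data (assumption) with the four columns a to d
df <- data.frame(a = c(1, 2, 3), b = c(10, NA, 30),
                 c = c(0, 5, 10), d = c(-1, 0, 1))

# df[] <- ... replaces the columns but keeps df a data frame
df[] <- lapply(df, rescale01)   # base R counterpart of df[] <- map(df, rescale01)
df
```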

SLIDE 32

Unusual inputs and outputs

  • Deal with failure using safely()
  • Iterate over two or more arguments
  • Iterate functions for their side effects
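purrr provides safely(), map2() and walk() for these cases. Here is a base-R sketch of the same three ideas. The hypothetical safely_base() only imitates the list(result, error) shape that safely() returns. It is not purrr's implementation.

```r
# Hypothetical base-R analogue of purrr::safely(): wrap f so that it
# returns list(result, error) instead of throwing an error
safely_base <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

safe_log <- safely_base(log)
out <- lapply(list(10, "a"), safe_log)
out[[1]]$result           # 2.302585
is.null(out[[2]]$result)  # TRUE: out[[2]]$error holds the error instead

# Iterating over two arguments at once (the role of purrr's map2())
sums <- Map(function(x, y) x + y, 1:3, 4:6)   # list(5, 7, 9)

# Iterating only for side effects (the role of purrr's walk())
for (s in sums) cat("sum:", s, "\n")
```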

SLIDE 33

Write functions that don’t surprise

  • Use stop() and stopifnot() to fail early
  • Avoid using type-inconsistent functions in your own functions
  • Avoid non-standard evaluation functions in your own functions
  • Never rely on global options for computational details

SLIDE 34

Wrapping up

  • Solve the problem that you’re working on
  • Never feel bad about using a for loop!
  • Get a function that works right for the easiest 80% of the problem
  • In time, you’ll learn how to get to 99% with minimal extra effort
  • Concise and elegant code is something to strive towards!

SLIDE 35

Thanks!