Hadley Wickham @hadleywickham Chief Scientist, RStudio
The tidyverse
September 2016
Import, Tidy, Transform, Visualise, Model, Communicate, Program
Tidy: Consistent way of storing data
Transform: Create new variables & new summaries
Visualise: Surprises, but doesn't scale
Model: Scales, but doesn't (fundamentally) surprise
No matter how complex and polished the individual operations are, it is often the quality of the glue that most directly determines the power of the system. — Hal Abelson
Tidy Import Visualise Transform Model Communicate Program
The tidy tools manifesto
http://r4ds.had.co.nz
tidyverse
1. Share data structures.
2. Compose simple pieces.
3. Embrace FP.
4. Write for humans.
Share data structures
data frame.
column.
Tidy data
Messy data has a varied “shape”
What are the variables in this dataset? (Hint: f = female, u = unknown, 1524 = 15-24)
tidyr helps you tidy your messy data
Tidy data has a uniform “shape”
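As a rough illustration of what tidying looks like in code, here is a minimal sketch using tidyr. The `messy` data frame and its values are made up for this example (they are not the dataset from the slide); only the column-naming scheme (sex letter followed by an age band) follows the hint above. It assumes tidyr 1.0 or later for `pivot_longer()`.

```r
library(tidyr)

# Hypothetical messy data: each count column mixes sex (m/f) and age band.
messy <- data.frame(
  country = c("A", "B"),
  m1524 = c(10, 20),
  f1524 = c(30, 40),
  stringsAsFactors = FALSE
)

# Step 1: stack the count columns into one key column and one value column.
long <- pivot_longer(
  messy,
  cols = c(m1524, f1524),
  names_to = "key",
  values_to = "cases"
)

# Step 2: split the key after its first character into sex and age band.
tidy <- separate(long, key, into = c("sex", "age"), sep = 1)

print(tidy)
```

After these two steps every variable (country, sex, age, cases) has its own column and every row is one observation, which is the uniform shape the slide refers to.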
Sometimes you don’t have variables & cases
strings dates matrices vectors xml HTTP requests HTTP response
http://simplystatistics.org/2016/02/17/non-tidy-data/
factors
What if you have a mix of object types?
Cross-validation
Training data: data frame
Test data: data frame
Model: lm
Predictions: vector
RMSE: scalar
Use a tibble with list-columns!
# A tibble: 100 x 5
            train           test   .id      mod      rmse
           <list>         <list> <chr>   <list>     <dbl>
1  <S3: resample> <S3: resample>   001 <S3: lm> 0.5661605
2  <S3: resample> <S3: resample>   002 <S3: lm> 0.2399357
3  <S3: resample> <S3: resample>   003 <S3: lm> 3.5482986
4  <S3: resample> <S3: resample>   004 <S3: lm> 0.2396810
5  <S3: resample> <S3: resample>   005 <S3: lm> 0.1591336
6  <S3: resample> <S3: resample>   006 <S3: lm> 0.1934869
7  <S3: resample> <S3: resample>   007 <S3: lm> 0.2697834
8  <S3: resample> <S3: resample>   008 <S3: lm> 0.4910886
9  <S3: resample> <S3: resample>   009 <S3: lm> 1.7002645
10 <S3: resample> <S3: resample>   010 <S3: lm> 0.2047787
# ... with 90 more rows
Your turn!
Two surprises: partial name matching & stringsAsFactors
Two important tensions for understanding base R
Interactive exploration vs. programming
Conservative vs. utopian
Tibbles are data frames that are lazy & surly
And work better with list-columns
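The behaviours named on these slides can be sketched in a few lines. This is an illustrative example, not code from the talk: the column names and the two `lm()` fits on the built-in `mtcars` data are assumptions chosen to keep it self-contained.

```r
library(tibble)

df <- data.frame(abc = 1:3)
tb <- tibble(abc = 1:3)

# Base data frames partially match column names with `$`:
# `df$a` silently returns the `abc` column.
from_df <- df$a

# Tibbles complain instead: an unknown column gives NULL plus a warning.
from_tb <- suppressWarnings(tb$a)

# Tibbles never convert strings to factors.
chr <- tibble(x = c("a", "b"))
class(chr$x)

# List-columns: each cell can hold an arbitrary object, such as a fitted model.
models <- tibble(
  id = c("001", "002"),
  mod = list(
    lm(mpg ~ wt, data = mtcars),
    lm(mpg ~ hp, data = mtcars)
  )
)
print(models)
```

Keeping the models next to their identifiers in one tibble means related objects stay in the same row when the table is filtered or sorted.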
Compose simple pieces
Goal: Solve complex problems by combining uniform pieces.
magrittr::
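A small sketch of composing simple pieces with the magrittr pipe. The input vector is made up for illustration; `%>%` passes the value on its left as the first argument of the function on its right.

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested calls read inside-out.
nested <- sum(sqrt(x))

# The pipe reads left to right: take x, then take square roots, then sum.
piped <- x %>% sqrt() %>% sum()

print(piped)
```

Both forms compute the same value; the piped form lists the steps in the order they happen.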
Consistency across packages is important
😨
And ggplot2 is not even internally consistent
😲
ggplot1 had a tidier API than ggplot2!
So you can use the pipe with ggplot1
One small example from Bob Rudis
https://rud.is/b/2016/07/26
Embrace FP
Why are for loops “bad”? Answered with cupcakes
Vanilla cupcakes (The Hummingbird Bakery Cookbook)
Chocolate cupcakes (The Hummingbird Bakery Cookbook)
Vanilla cupcakes
Cupcakes
What do these for loops do?
For loops emphasise the objects
Not the actions
Functional programming emphasises the actions
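One way to see the contrast is to compute column summaries both ways. This sketch uses the built-in `mtcars` data and purrr's `map_dbl()`; the choice of data and summaries is an assumption for illustration, not taken from the slides.

```r
library(purrr)

# For loops: most of the code manages the output object and the index,
# and the two loops differ only in one function name.
means <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  means[[i]] <- mean(mtcars[[i]])
}

medians <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  medians[[i]] <- median(mtcars[[i]])
}

# Functional version: the action (mean, median) is the visible part.
means_fp <- map_dbl(mtcars, mean)
medians_fp <- map_dbl(mtcars, median)
```

The results are the same, except that `map_dbl()` also keeps the column names on its output.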
Teaser: simulation
Teaser: saving parameterised reports
Write for humans
Programs must be written for people to read, and only incidentally for machines to execute. — Hal Abelson
tibble lubridate forcats
filter mutate summarise arrange select
magrittr
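The five verbs listed above can be chained with the pipe so that the code reads as a sequence of steps. This is a minimal sketch on the built-in `mtcars` data; the derived column `hp_per_wt` and the grouping by `cyl` are arbitrary choices made for illustration.

```r
library(dplyr)

result <- mtcars %>%
  select(cyl, hp, wt) %>%              # keep only the columns we need
  filter(cyl != 6) %>%                 # drop the six-cylinder cars
  mutate(hp_per_wt = hp / wt) %>%      # create a new variable
  group_by(cyl) %>%
  summarise(mean_ratio = mean(hp_per_wt)) %>%  # one row per group
  arrange(desc(mean_ratio))            # largest ratio first

print(result)
```

Each verb name says what the step does, which is the sense in which the code is written for humans first.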
Conclusion
1. Share data structures.
2. Compose simple pieces.
3. Embrace FP.
4. Write for humans.
My goal is to make a pit of success
http://bl
Gotta install them all
http://r4ds.had.co.nz
tidyverse