Tidyverse wrapup Steve Bagley somgen223.stanford.edu 1

Making numbers into factors using numeric ranges

• We use factors for grouping, but raw numbers do not make very good groups. Would you want a group containing only the subjects whose weight is exactly 12.5?
• Instead, we set up non-overlapping intervals, and use those as the factor values.
• Example: 0–10, 10–20, 20–30 (each boundary value belongs to exactly one interval).
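As a minimal sketch of the idea, base R's cut() can build such intervals from explicit break points. The weights below are made up for illustration, and cut() is a base-R stand-in here, not one of the functions covered on the following slides. The label (10,20] means "greater than 10, up to and including 20".

```r
# Hypothetical weights; the break points 0, 10, 20, 30 are the example ranges above.
weight <- c(5, 12.5, 20, 27)

# By default each interval is open on the left and closed on the right,
# so the boundary value 20 lands in (10,20], not in (20,30].
groups <- cut(weight, breaks = c(0, 10, 20, 30))

groups
levels(groups)
```

Every value falls into exactly one level, which is what makes the result usable as a grouping factor.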

Example
    library(tidyverse)  # cut_number() comes from ggplot2
    y <- c(1, 2, 3, 4, 5)
    cut_number(y, n = 2)
    [1] [1,3] [1,3] [1,3] (3,5] (3,5]
    Levels: [1,3] (3,5]
• cut_number tries to create n bins with approximately the same number of values in each bin.
• It returns a factor vector using a special symbolic code for the ranges.
• The interval (a,b] spans from a to b, open on the left end and closed on the right: it does not include a, but does include b. The interval [a,b] includes both ends.
• Note the levels of the factor.

Example
    tibble(y = y, y_cut = cut_number(y, n = 2))
    # A tibble: 5 x 2
          y y_cut
      <dbl> <fct>
    1     1 [1,3]
    2     2 [1,3]
    3     3 [1,3]
    4     4 (3,5]
    5     5 (3,5]

cut_interval
    z <- c(1, 1, 1, 2, 4, 5)
    cut_number(z, n = 2)
    [1] [1,1.5] [1,1.5] [1,1.5] (1.5,5] (1.5,5] (1.5,5]
    Levels: [1,1.5] (1.5,5]
    cut_interval(z, n = 2)
    [1] [1,3] [1,3] [1,3] [1,3] (3,5] (3,5]
    Levels: [1,3] (3,5]
• cut_interval makes n intervals with the same range (width).

cut_width
    cut_width(z, width = 1)
    [1] [0.5,1.5] [0.5,1.5] [0.5,1.5] (1.5,2.5] (3.5,4.5] (4.5,5.5]
    Levels: [0.5,1.5] (1.5,2.5] (2.5,3.5] (3.5,4.5] (4.5,5.5]
• cut_width makes intervals of the specified width.
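One way to compare the three cutting functions is to count how many values land in each bin. The sketch below reuses the vector z from the slides; using table() for the counts is an addition for illustration and not part of the original slides.

```r
library(ggplot2)  # provides cut_number(), cut_interval(), cut_width()

z <- c(1, 1, 1, 2, 4, 5)

# table() counts the values in each factor level, including empty levels.
n_number   <- table(cut_number(z, n = 2))     # equal counts per bin
n_interval <- table(cut_interval(z, n = 2))   # equal width per bin
n_width    <- table(cut_width(z, width = 1))  # fixed width, as many bins as needed

n_number
n_interval
n_width
```

The counts make the trade-off visible: cut_number balances the group sizes, cut_interval balances the interval widths, and cut_width can produce a level with no values in it, here (2.5,3.5].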

Graphics example
    iris %>%
      mutate(petal_length = cut_number(Petal.Length, n = 4)) %>%
      ggplot(aes(petal_length, Petal.Width)) +
      geom_boxplot()
[Figure: box plots of Petal.Width (y axis, 0.0 to 2.5) for each petal_length bin (x axis): [1,1.6], (1.6,4.35], (4.35,5.1], (5.1,6.9]]

Formatting numbers

round
    x <- c(1.4234, 1.5, 1.6234, 2.4, 2.5, 10.6)
    round(x)
    [1]  1  2  2  2  2 11
    round(x, digits = 1)
    [1]  1.4  1.5  1.6  2.4  2.5 10.6
    round(x, digits = -1)
    [1]  0  0  0  0  0 10
• round creates a new, rounded number.
• At exactly 0.5 it rounds to the even digit (1.5 and 2.5 both become 2).
• You can specify the number of digits. Negative values of digits round to the left of the decimal point: digits = -1 rounds to the nearest multiple of 10.
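The round-to-even rule is easiest to see on values that sit exactly halfway between two integers. A small sketch, using made-up inputs that are not from the slides:

```r
# Ties (exactly .5) go to the nearest even integer, not always upward.
halves <- round(c(0.5, 1.5, 2.5, 3.5))
halves
# [1] 0 2 2 4

# Negative digits count positions to the left of the decimal point:
# digits = -2 rounds to the nearest multiple of 100.
hundreds <- round(1234.567, digits = -2)
hundreds
# [1] 1200
```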

signif
    x
    [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
    signif(x)
    [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
    signif(x, digits = 5)
    [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
    signif(x, digits = 1)
    [1]  1  2  2  2  2 10
• signif creates a new number, rounded to the specified number of significant digits.

data frame example
    library(scales)  # for the number function
    (d <- tibble(x = c(123400, 12340, 1234, 123.4, 12.34, 1.234, 0.1234, 0.01234)) %>%
       mutate(rounded = round(x, digits = 2),
              signifed = signif(x, digits = 2),
              number1 = number(x, accuracy = 1),
              number2 = number(x, accuracy = 0.1)))
    # A tibble: 8 x 5
                x   rounded   signifed number1 number2
            <dbl>     <dbl>      <dbl> <chr>   <chr>
    1 123400      123400    120000     123 400 123 400.0
    2  12340       12340     12000     12 340  12 340.0
    3   1234        1234      1200     1 234   1 234.0
    4    123.        123.      120     123     123.4
    5     12.3        12.3      12     12      12.3
    6      1.23        1.23      1.2   1       1.2
    7      0.123       0.12      0.12  0       0.1
    8      0.0123      0.01      0.012 0       0.0
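In the output above, number() returns character strings and separates thousands with a space. As a hedged sketch, the big.mark argument of scales::number() can be used to choose a different separator; the input values here are made up for illustration.

```r
library(scales)

# accuracy sets the rounding step; big.mark sets the thousands separator.
with_comma <- number(c(123400, 1234.56), accuracy = 1, big.mark = ",")
with_comma

# An empty big.mark drops the separator altogether.
no_mark <- number(123400, accuracy = 1, big.mark = "")
no_mark
```

Because the result is of type character, it is suited to display (tables, labels) rather than further arithmetic.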

set option
    options(pillar.sigfig = 1)
    d
    # A tibble: 8 x 5
              x   rounded  signifed number1 number2
          <dbl>     <dbl>     <dbl> <chr>   <chr>
    1 123400    123400    120000    123 400 123 400.0
    2  12340     12340     12000    12 340  12 340.0
    3   1234      1234      1200    1 234   1 234.0
    4    123.      123.      120    123     123.4
    5     12.       12.       12    12      12.3
    6      1.        1.        1.   1       1.2
    7      0.1       0.1       0.1  0       0.1
    8      0.01      0.01      0.01 0       0.0
• This sets a print option for tibbles.
• The default value is 3.
• A value you set stays in place until you change it (or quit R).
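Because the setting persists for the rest of the session, it can be useful to change it only temporarily. A minimal sketch, relying on the fact that base R's options() returns the previous values when it sets new ones:

```r
# options() returns the old values (invisibly), so they can be restored later.
old <- options(pillar.sigfig = 5)
during <- getOption("pillar.sigfig")  # 5 while the setting is active

# ... print tibbles here ...

options(old)                          # put back whatever was set before
after <- getOption("pillar.sigfig")
```

If the option was never set before, restoring it returns it to the unset state, and tibble printing falls back to the default.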

set option
    options(pillar.sigfig = 5)
    d
    # A tibble: 8 x 5
                 x   rounded   signifed number1 number2
             <dbl>     <dbl>      <dbl> <chr>   <chr>
    1 123400       123400    120000     123 400 123 400.0
    2  12340        12340     12000     12 340  12 340.0
    3   1234         1234      1200     1 234   1 234.0
    4    123.4        123.4     120     123     123.4
    5     12.34        12.34     12     12      12.3
    6      1.234        1.23      1.2   1       1.2
    7      0.1234       0.12      0.12  0       0.1
    8      0.01234      0.01      0.012 0       0.0

sprintf
    sprintf("The value of x is approximately: %.2f", 1.23456)
    [1] "The value of x is approximately: 1.23"
• sprintf inserts values into a format string, which contains both literal text and format codes, each starting with %.
• The result is of type character. You can print this (or save it).
• For more about the many format codes, see the help page (?sprintf).
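A few more format codes, as a short sketch. The variable names and values are made up for illustration; see ?sprintf for the full list of codes.

```r
n_rows <- 8L
share  <- 12.345

# %s inserts a string, %d an integer.
msg <- sprintf("%s has %d rows", "d", n_rows)

# %.1f prints one decimal place; %% produces a literal percent sign.
pct <- sprintf("%.1f%%", share)

# sprintf is vectorized: one output string per input value.
labels <- sprintf("x = %.2f", c(1.23456, 12.3456))

msg
pct
labels
```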

controlling how data frames print
    print(d, n = 2, width = 20)
    # A tibble: 8 x 5
           x rounded
       <dbl>   <dbl>
    1 123400  123400
    2  12340   12340
    # ... with 6 more
    #   rows, and 3
    #   more variables:
    #   signifed <dbl>,
    #   number1 <chr>,
    #   number2 <chr>
• This prints the first 2 rows and limits the output to 20 characters in width. Columns that do not fit are listed below the table.

printing the entire data frame
    print(d, n = Inf)
• This will print all rows.
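The same idea applies to columns: passing width = Inf asks the tibble print method to show every column instead of listing the overflow below the table. A small sketch with a cut-down, hypothetical version of d:

```r
library(tibble)

# A shortened stand-in for the data frame d used above.
d_small <- tibble(x = c(123400, 12340, 1234),
                  rounded = round(x, digits = 2),
                  signifed = signif(x, digits = 2))

# n = Inf prints all rows; width = Inf prints all columns.
printed <- capture.output(print(d_small, n = Inf, width = Inf))
cat(printed, sep = "\n")
```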

Row vs column operations

Exercise: Sum along all the columns
    (d1 <- tibble(x = 1:3, y = 11:13, z = 100:102))
    # A tibble: 3 x 3
          x     y     z
      <int> <int> <int>
    1     1    11   100
    2     2    12   101
    3     3    13   102
• How would you create a new row that contains the column sums?

Answer: Sum all the columns
    d1 %>% summarize_all(sum)
    # A tibble: 1 x 3
          x     y     z
      <int> <int> <int>
    1     6    36   303
• This applies the sum function to every one of the columns.
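In dplyr 1.0 and later, summarize_all is marked as superseded in favor of across(). The following sketch, which assumes a recent dplyr, is an equivalent spelling and should give the same one-row result:

```r
library(dplyr)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

# across(everything(), sum) applies sum to every column.
col_sums <- d1 %>% summarize(across(everything(), sum))
col_sums
```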

Include the sum as the last row
    bind_rows(d1, summarize_all(d1, sum))
    # A tibble: 4 x 3
          x     y     z
      <int> <int> <int>
    1     1    11   100
    2     2    12   101
    3     3    13   102
    4     6    36   303
• This includes the row with the summed values as the bottom row.

Exercise: Sum across all the rows
    d1
    # A tibble: 3 x 3
          x     y     z
      <int> <int> <int>
    1     1    11   100
    2     2    12   101
    3     3    13   102
• How would you create a new column with the sum of all the previous columns?
• This is a bit more complicated: each column is a vector, but each row is not.

Answer: Sum across all the rows
    d1 %>%
      mutate(row_sum = rowSums(.))
    # A tibble: 3 x 4
          x     y     z row_sum
      <int> <int> <int>   <dbl>
    1     1    11   100     112
    2     2    12   101     115
    3     3    13   102     118
• rowSums is built-in.
• There is also a rowMeans function.
• But what if we want a different calculation?

Answer: Sum across all the rows
    d1 %>%
      mutate(row_sum = reduce(., `+`))
    # A tibble: 3 x 4
          x     y     z row_sum
      <int> <int> <int>   <int>
    1     1    11   100     112
    2     2    12   101     115
    3     3    13   102     118
• `+` is the binary addition operator. reduce applies it to the columns in turn, computing (x + y) + z element by element.
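To see what the reduction does step by step, the sketch below spells it out on the three columns as plain vectors. It uses base R's Reduce() as a stand-in for purrr's reduce(), which is a substitution made here to keep the example free of package dependencies.

```r
# The three columns of d1 as a plain named list of integer vectors.
cols <- list(x = 1:3, y = 11:13, z = 100:102)

# Step by step: add the first two columns, then add the third to that result.
step1 <- cols$x + cols$y   # 12 14 16
step2 <- step1 + cols$z    # 112 115 118

# Reduce() performs the same chain of binary additions.
row_sum <- Reduce(`+`, cols)
row_sum
```

Since `+` works element by element on whole vectors, the result has one entry per row and stays of type integer.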

Answer: Sum across all the rows
    d1 %>%
      mutate(row_sum = flatten_dbl(pmap(., sum)))
    # A tibble: 3 x 4
          x     y     z row_sum
      <int> <int> <int>   <dbl>
    1     1    11   100     112
    2     2    12   101     115
    3     3    13   102     118
• This is a more complex approach using functions from the purrr package.
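The pmap approach generalizes to calculations that have no built-in row function. As a sketch, the code below computes a row-wise median with pmap_dbl(), which returns a numeric vector directly and so avoids the separate flatten_dbl() step (the flatten functions are deprecated in recent purrr releases). The choice of median is only an illustration.

```r
library(dplyr)
library(purrr)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

# pmap_dbl() calls the function once per row, matching columns to the
# arguments x, y, z by name, and collects the results as doubles.
d1_median <- d1 %>%
  mutate(row_median = pmap_dbl(d1, function(x, y, z) median(c(x, y, z))))
d1_median
```

Here the middle value of each row is always the y column, so row_median comes out as 11, 12, 13.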

How to combine multiple plots together

package patchwork
    ## install.packages("patchwork")
    library(patchwork)
• This package allows you to easily combine multiple ggplot plots into a single graphic.

Create two graphs
    g1 <- tibble(x = 1:3, y = 1:3) %>%
      ggplot(aes(x, y)) +
      geom_point(size = 5)
    g2 <- tibble(x = 1:3, y = 3:1) %>%
      ggplot(aes(x, y)) +
      geom_point(size = 5)

Combine using patchwork
    g1 | g2
[Figure: g1 and g2 side by side, each plotting y against x over 1.0 to 3.0]
• Use “|” to place plots side by side.

Combine using patchwork
    g1 / g2
[Figure: g1 stacked above g2, each plotting y against x over 1.0 to 3.0]
• Use “/” to stack one plot on top of the other.

Combine using patchwork
    (g1 | g2) / (g2 | g1)
[Figure: 2 x 2 grid with g1 and g2 in the top row and g2 and g1 in the bottom row]
• Use “( )” for grouping.

Combine using patchwork
    g1 | g2 | g1 | g2
[Figure: four plots in a single row: g1, g2, g1, g2]
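Beyond the “|” and “/” operators, patchwork also lets you add plots together with “+” and set the grid explicitly. The sketch below uses plot_layout() and plot_annotation() from patchwork; these two functions are not covered on the slides, and the title text is made up.

```r
library(ggplot2)
library(patchwork)

g1 <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) + geom_point(size = 5)
g2 <- ggplot(data.frame(x = 1:3, y = 3:1), aes(x, y)) + geom_point(size = 5)

# "+" collects the plots; plot_layout() arranges them in a grid with 2 columns,
# and plot_annotation() adds a title to the combined graphic.
combined <- g1 + g2 + g1 + g2 +
  plot_layout(ncol = 2) +
  plot_annotation(title = "Four panels in a 2 x 2 grid")

combined
```

This can be easier to read than nested parentheses once more than a handful of plots are involved.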
