tidyverse wrapup


  1. Tidyverse wrapup
     Steve Bagley
     somgen223.stanford.edu

  2. Making numbers into factors using numeric ranges

  3. Making numbers into factors using numeric ranges

     • We use factors for grouping, but numbers themselves do not make very good groups. Would you want to group together all subjects with a weight of 12.5?
     • Instead, we set up non-overlapping intervals, and use those as the factor values.
     • Example: 0–10, 10–20, 20–30
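
As a minimal sketch of the interval idea, base R's cut() can bin numbers with explicit break points. The weight values below are hypothetical, and cut() is used here only for illustration; the slides that follow use the ggplot2 helpers instead. Each interval is written (a,b], meaning a is excluded and b is included.

```r
# Hypothetical weights; the break points mirror the 0-10, 10-20, 20-30 example.
weight <- c(5, 10, 12.5, 25)

# cut() assigns each value to an interval. Intervals are (a,b] by default,
# so a value of exactly 10 lands in (0,10], not in (10,20].
weight_group <- cut(weight, breaks = c(0, 10, 20, 30))

levels(weight_group)  # "(0,10]"  "(10,20]" "(20,30]"
table(weight_group)   # counts per interval: 2, 1, 1
```

The result is an ordinary factor, so it can be used for grouping in the same way as any other factor column.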

  4. Example

     y <- c(1, 2, 3, 4, 5)
     cut_number(y, n = 2)
     [1] [1,3] [1,3] [1,3] (3,5] (3,5]
     Levels: [1,3] (3,5]

     • cut_number (from the ggplot2 package) tries to create n bins with approximately the same number of values in each bin.
     • It returns a factor vector using a special symbolic code for the ranges.
     • The interval (a,b] spans from a to b, open on the left end and closed on the right. It does not include a, but does include b.
     • The first interval, [1,3], is closed on both ends so that the smallest value is included.
     • Note the levels of the factor.

  5. Example

     tibble(y = y, y_cut = cut_number(y, n = 2))
     # A tibble: 5 x 2
           y y_cut
       <dbl> <fct>
     1     1 [1,3]
     2     2 [1,3]
     3     3 [1,3]
     4     4 (3,5]
     5     5 (3,5]

  6. cut_interval

     z <- c(1, 1, 1, 2, 4, 5)
     cut_number(z, n = 2)
     [1] [1,1.5] [1,1.5] [1,1.5] (1.5,5] (1.5,5] (1.5,5]
     Levels: [1,1.5] (1.5,5]

     cut_interval(z, n = 2)
     [1] [1,3] [1,3] [1,3] [1,3] (3,5] (3,5]
     Levels: [1,3] (3,5]

     • cut_interval makes n intervals with the same range (width).

  7. cut_width

     cut_width(z, width = 1)
     [1] [0.5,1.5] [0.5,1.5] [0.5,1.5] (1.5,2.5] (3.5,4.5] (4.5,5.5]
     Levels: [0.5,1.5] (1.5,2.5] (2.5,3.5] (3.5,4.5] (4.5,5.5]

     • cut_width makes intervals of the specified width.
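
One way to compare the three functions is to count how many values fall into each bin. This is only a sketch that reuses the z vector from the slides; wrapping the result in table() is an addition for illustration, not part of the original deck.

```r
library(ggplot2)  # provides cut_number(), cut_interval(), cut_width()

z <- c(1, 1, 1, 2, 4, 5)

# Count the values in each bin under the three binning strategies.
n_number   <- table(cut_number(z, n = 2))     # equal counts per bin
n_interval <- table(cut_interval(z, n = 2))   # equal-width bins
n_width    <- table(cut_width(z, width = 1))  # fixed width; empty bins stay as levels

n_number
n_interval
n_width
```

With cut_width, the interval (2.5,3.5] contains no values but still appears as a factor level, which matters when the factor is later used for grouping or plotting.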

  8. Graphics example

     iris %>%
       mutate(petal_length = cut_number(Petal.Length, n = 4)) %>%
       ggplot(aes(petal_length, Petal.Width)) +
       geom_boxplot()

     [Figure: boxplots of Petal.Width (y axis, 0.0 to 2.5) for each petal_length bin on the x axis: [1,1.6], (1.6,4.35], (4.35,5.1], (5.1,6.9].]

  9. Formatting numbers

  10. round

      x <- c(1.4234, 1.5, 1.6234, 2.4, 2.5, 10.6)
      round(x)
      [1]  1  2  2  2  2 11
      round(x, digits = 1)
      [1]  1.4  1.5  1.6  2.4  2.5 10.6
      round(x, digits = -1)
      [1]  0  0  0  0  0 10

      • round creates a new, rounded, number.
      • At 0.5 it rounds to the even digit, so round(2.5) is 2.
      • You can specify the number of digits. Negative values of digits round to powers of 10: digits = -1 rounds to the nearest 10.
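
The rounding rules above can be checked on a few extra values. This is a small sketch with made-up inputs. Note that the round-to-even rule applies cleanly only to values that are exactly representable in binary, such as 0.5 or 2.5; decimal fractions like 0.15 are stored approximately, so their rounding can look surprising.

```r
# Halfway cases round to the even digit.
halves <- c(0.5, 1.5, 2.5, 3.5)
rounded_halves <- round(halves)
rounded_halves                      # 0 2 2 4

# Negative digits round to the left of the decimal point.
to_hundreds <- round(1234.567, digits = -2)
to_hundreds                         # 1200

# Positive digits round to the right of the decimal point.
to_cents <- round(1234.567, digits = 2)
to_cents                            # 1234.57
```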

  11. signif

      x
      [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
      signif(x)
      [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
      signif(x, digits = 5)
      [1]  1.4234  1.5000  1.6234  2.4000  2.5000 10.6000
      signif(x, digits = 1)
      [1]  1  2  2  2  2 10

      • signif creates a new number, rounded to the specified number of significant digits.
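
The difference between round and signif shows up most clearly on numbers of very different magnitude. A brief sketch, using an illustrative vector v that is not from the slides:

```r
v <- c(123456, 12.3456, 0.00123456)

# round() keeps a fixed number of decimal places.
v_round <- round(v, digits = 2)    # 123456, 12.35, 0

# signif() keeps a fixed number of significant digits, whatever the magnitude.
v_signif <- signif(v, digits = 2)  # 120000, 12, 0.0012

v_round
v_signif
```

round discards the small value entirely, while signif discards detail from the large one.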

  12. data frame example

      library(scales) # for the number function
      (d <- tibble(x = c(123400, 12340, 1234, 123.4, 12.34, 1.234, 0.1234, 0.01234)) %>%
         mutate(rounded = round(x, digits = 2),
                signifed = signif(x, digits = 2),
                number1 = number(x, accuracy = 1),
                number2 = number(x, accuracy = 0.1)))
      # A tibble: 8 x 5
                  x    rounded   signifed number1 number2
              <dbl>      <dbl>      <dbl> <chr>   <chr>
      1 123400      123400     120000     123 400 123 400.0
      2  12340       12340      12000     12 340  12 340.0
      3   1234        1234       1200     1 234   1 234.0
      4    123.        123.       120     123     123.4
      5     12.3        12.3       12     12      12.3
      6      1.23        1.23       1.2   1       1.2
      7      0.123       0.12       0.12  0       0.1
      8      0.0123      0.01       0.012 0       0.0

  13. set option

      options(pillar.sigfig = 1)
      d
      # A tibble: 8 x 5
                x   rounded  signifed number1 number2
            <dbl>     <dbl>     <dbl> <chr>   <chr>
      1 123400    123400    120000    123 400 123 400.0
      2  12340     12340     12000    12 340  12 340.0
      3   1234      1234      1200    1 234   1 234.0
      4    123.      123.      120    123     123.4
      5     12.       12.       12    12      12.3
      6      1.        1.        1.   1       1.2
      7      0.1       0.1       0.1  0       0.1
      8      0.01      0.01      0.01 0       0.0

      • This sets a print option for tibbles.
      • The default value is 3.
      • A value you set stays in place until you change it (or quit R).

  14. set option

      options(pillar.sigfig = 5)
      d
      # A tibble: 8 x 5
                    x     rounded    signifed number1 number2
                <dbl>       <dbl>       <dbl> <chr>   <chr>
      1 123400        123400      120000      123 400 123 400.0
      2  12340         12340       12000      12 340  12 340.0
      3   1234          1234        1200      1 234   1 234.0
      4    123.4         123.4       120      123     123.4
      5     12.34         12.34       12      12      12.3
      6      1.234         1.23        1.2    1       1.2
      7      0.1234        0.12        0.12   0       0.1
      8      0.01234       0.01        0.012  0       0.0

  15. sprintf

      sprintf("The value of x is approximately: %.2f", 1.23456)
      [1] "The value of x is approximately: 1.23"

      • sprintf inserts values into a format string, which contains both literal text and format codes, starting with %.
      • The result is of type character. You can print this (or save it).
      • For more about the many format codes, see the help page.
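
A few more format codes, as a short sketch with made-up values. The codes shown (%s, %d, %f with width and precision, and %% for a literal percent sign) are standard; ?sprintf remains the authoritative list.

```r
# Several values in one string: %s (string), %d (integer), %.1f (one decimal).
msg <- sprintf("%s: %d of %d (%.1f%%)", "passed", 3L, 8L, 37.5)
msg      # "passed: 3 of 8 (37.5%)"

# sprintf is vectorized: one result per input value.
fixed <- sprintf("%.2f", c(1.23456, 2.5))
fixed    # "1.23" "2.50"

# A width pads the result; a leading 0 pads with zeros.
padded <- sprintf("%03d", 7L)
padded   # "007"
```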

  16. controlling how data frames print

      print(d, n = 2, width = 20)
      # A tibble: 8 x 5
             x rounded
         <dbl>   <dbl>
      1 123400  123400
      2  12340   12340
      # ... with 6 more
      #   rows, and 3
      #   more variables:
      #   signifed <dbl>,
      #   number1 <chr>,
      #   number2 <chr>

      • This prints the first 2 rows and limits the output to 20 characters in width. Columns that do not fit are listed in the footer.

  17. printing the entire data frame

      print(d, n = +Inf)

      • This will print all rows.

  18. Row vs column operations

  19. Exercise: Sum along all the columns

      (d1 <- tibble(x = 1:3, y = 11:13, z = 100:102))
      # A tibble: 3 x 3
            x     y     z
        <int> <int> <int>
      1     1    11   100
      2     2    12   101
      3     3    13   102

      • How would you create a new row that contains the column sums?

  20. Answer: Sum all the columns

      d1 %>% summarize_all(sum)
      # A tibble: 1 x 3
            x     y     z
        <int> <int> <int>
      1     6    36   303

      • This applies the sum function to every one of the columns.
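
In dplyr 1.0.0 and later, summarize_all is marked as superseded in favor of across. The following sketch shows the equivalent call, assuming a dplyr version that provides across(); the variable name col_sums is introduced here for illustration.

```r
library(dplyr)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

# Same column sums as summarize_all(sum), written with across().
col_sums <- d1 %>% summarize(across(everything(), sum))
col_sums
```

everything() selects all columns; a narrower selection such as across(c(x, y), sum) would sum only some of them.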

  21. Include the sum as the last row

      bind_rows(d1, summarize_all(d1, sum))
      # A tibble: 4 x 3
            x     y     z
        <int> <int> <int>
      1     1    11   100
      2     2    12   101
      3     3    13   102
      4     6    36   303

      • This includes the row with the summed values as the bottom row.

  22. Exercise: Sum across all the rows

      d1
      # A tibble: 3 x 3
            x     y     z
        <int> <int> <int>
      1     1    11   100
      2     2    12   101
      3     3    13   102

      • How would you create a new column with the sum of all the previous columns?
      • This is a bit more complicated: each column is a vector, but each row is not.

  23. Answer: Sum across all the rows

      d1 %>%
        mutate(row_sum = rowSums(.))
      # A tibble: 3 x 4
            x     y     z row_sum
        <int> <int> <int>   <dbl>
      1     1    11   100     112
      2     2    12   101     115
      3     3    13   102     118

      • rowSums is built-in.
      • There is also a rowMeans function.
      • But what if we want a different calculation?

  24. Answer: Sum across all the rows

      d1 %>%
        mutate(row_sum = reduce(., `+`))
      # A tibble: 3 x 4
            x     y     z row_sum
        <int> <int> <int>   <int>
      1     1    11   100     112
      2     2    12   101     115
      3     3    13   102     118

      • + is a binary operator to compute the sum. reduce applies it to the columns in turn, so the result is (x + y) + z, computed element by element.

  25. Answer: Sum across all the rows

      d1 %>%
        mutate(row_sum = flatten_dbl(pmap(., sum)))
      # A tibble: 3 x 4
            x     y     z row_sum
        <int> <int> <int>   <dbl>
      1     1    11   100     112
      2     2    12   101     115
      3     3    13   102     118

      • This is a more complex approach using functions from the purrr package.
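
Two other ways to write a row-wise calculation are sketched below. They are alternatives to the slide's code, not the author's method: pmap_dbl from purrr returns a double vector directly, and rowwise with c_across (assuming dplyr 1.0.0 or later) lets any summary function run across a row. The names with_pmap, row_stats, and row_max are introduced here for illustration.

```r
library(dplyr)
library(purrr)

d1 <- tibble(x = 1:3, y = 11:13, z = 100:102)

# pmap_dbl() calls sum() once per row and collects the results as doubles.
with_pmap <- d1 %>% mutate(row_sum = pmap_dbl(d1, sum))

# rowwise() makes each row its own group; c_across() gathers the selected
# columns of that row into a vector, so any summary function can be used.
row_stats <- d1 %>%
  rowwise() %>%
  mutate(row_sum = sum(c_across(x:z)),
         row_max = max(c_across(x:z))) %>%
  ungroup()

with_pmap
row_stats
```

The rowwise form answers the earlier question about a different calculation: replacing sum with max, median, or another function requires no other change.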

  26. How to combine multiple plots together

  27. package patchwork

      ## install.packages("patchwork")
      library(patchwork)

      • This package allows you to easily combine multiple ggplot plots into a single graphic.

  28. Create two graphs

      g1 <- tibble(x = 1:3, y = 1:3) %>%
        ggplot(aes(x, y)) +
        geom_point(size = 5)
      g2 <- tibble(x = 1:3, y = 3:1) %>%
        ggplot(aes(x, y)) +
        geom_point(size = 5)

  29. Combine using patchwork

      g1 | g2

      [Figure: g1 and g2 side by side; each panel plots y against x, with both axes running from 1.0 to 3.0.]

      • Use “|” to place plots side by side.

  30. Combine using patchwork

      g1 / g2

      [Figure: g1 stacked above g2; each panel plots y against x, with both axes running from 1.0 to 3.0.]

      • Use “/” to place one plot on top of another.

  31. Combine using patchwork

      (g1 | g2) / (g2 | g1)

      [Figure: a 2 x 2 grid with g1 and g2 in the top row and g2 and g1 in the bottom row; each panel plots y against x.]

      • Use “( )” for grouping.

  32. Combine using patchwork

      g1 | g2 | g1 | g2

      [Figure: four panels in a single row, alternating g1 and g2; each panel plots y against x.]
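
Beyond the | and / operators, patchwork also provides plot_layout and plot_annotation for controlling the grid and adding an overall title. The sketch below rebuilds g1 and g2 from plain data frames so that it runs on its own; the title text and the name combined are illustrative choices, not from the slides.

```r
library(ggplot2)
library(patchwork)

g1 <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) + geom_point(size = 5)
g2 <- ggplot(data.frame(x = 1:3, y = 3:1), aes(x, y)) + geom_point(size = 5)

# Add four plots together, arrange them in two columns,
# then give the whole figure a title and tag the panels A, B, C, D.
combined <- (g1 + g2 + g1 + g2) +
  plot_layout(ncol = 2) +
  plot_annotation(title = "Four panels", tag_levels = "A")

combined
```

Using + with plot_layout keeps the arrangement in one place, which can be easier to adjust than nesting | and / when the number of plots grows.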
