
Lists (Steve Bagley, somgen223.stanford.edu)


  1. Lists (Steve Bagley, somgen223.stanford.edu)

  2. Vectors vs lists

  3. Vector
     c(1, "b", FALSE)
     [1] "1"     "b"     "FALSE"
     • A vector can only contain one type of data.
     • In this example R coerces every element to character, because character is the only type that can represent numbers, strings, and logicals alike.
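To see the coercion rule in action, the following base R sketch checks the type of a few mixed vectors. The variable names are illustrative and do not come from the slides.

```r
# Mixing types in c() promotes every element to the most general type present:
# logical -> integer -> double -> character.
lgl_num <- c(TRUE, 2)         # logical promoted to double
int_dbl <- c(1L, 2.5)         # integer promoted to double
mixed   <- c(1, "b", FALSE)   # everything promoted to character

typeof(lgl_num)
typeof(int_dbl)
typeof(mixed)
mixed
```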

  4. List
     list(1, "b", FALSE)
     [[1]]
     [1] 1
     [[2]]
     [1] "b"
     [[3]]
     [1] FALSE
     • A list can contain data of any type, and the types can differ from element to element.
     • Lists print out in an unusual way.
     • Each element of the list is preceded by [[i]] for the i-th element.

  5. How to access elements of a list
     > a <- list(1, "b", FALSE)
     > a[2]
     [[1]]
     [1] "b"
     > a[[2]]
     [1] "b"
     • a[2] returns a one-element list containing the character vector "b". This is not very helpful.
     • a[[2]] returns the character vector "b". It's the second element of the list.
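The difference between the two bracket forms is easier to see by inspecting what each one returns. This is a small base R sketch; the names sub_list and element are made up for illustration.

```r
a <- list(1, "b", FALSE)

sub_list <- a[2]     # single brackets: still a list, of length 1
element  <- a[[2]]   # double brackets: the element itself

class(sub_list)
class(element)

# Functions that expect a character vector work directly on a[[2]].
toupper(element)

# Single brackets are the right tool when selecting several elements at once.
length(a[c(1, 3)])
```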

  6. Named lists
     > (b <- list(x = 1, y = "b", c = FALSE))
     $x
     [1] 1
     $y
     [1] "b"
     $c
     [1] FALSE
     > b[["x"]]
     [1] 1
     • Lists can use names for the elements of the list. This is often easier to understand and more useful than using numeric indices.
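A few more ways of working with names, sketched in base R. The variable key and the added element z are illustrative only.

```r
b <- list(x = 1, y = "b", c = FALSE)

names(b)   # the element names, in order
b$y        # shorthand for b[["y"]]

# [[ ]] accepts a name stored in a variable; $ does not.
key <- "c"
b[[key]]

# Assigning to a new name appends an element.
b$z <- c(2, 4, 6)
length(b)
```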

  7. Using lists in the tidyverse

  8. Splitting a data frame into a list of groups
     species_groups <- as_tibble(iris) %>%
       group_by(Species) %>%
       group_split()
     • This creates a three-element list.
     • Each element is a data frame containing the data for that species.
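For readers without the tidyverse loaded, base R's split() produces a comparable list and makes it easy to inspect the structure. This is an analog of the slide's code, not the same call: split() keeps the species labels as element names, whereas group_split() returns an unnamed list of tibbles. The name species_list is illustrative.

```r
# Base R analog of group_by() %>% group_split(): one data frame per species.
species_list <- split(iris, iris$Species)

length(species_list)              # number of groups
names(species_list)               # split() keeps the group labels as names
class(species_list[["setosa"]])   # each element is a data frame
```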

  9. map
     map(species_groups, nrow)
     [[1]]
     [1] 50
     [[2]]
     [1] 50
     [[3]]
     [1] 50
     • map takes a list and a function and applies that function to each element of the list.
     • It returns a list of those function call results.
     • nrow returns the number of rows in a data frame.
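What map does can be sketched with a plain loop. The helper my_map below is hypothetical and much simpler than purrr's real map (no names, no formula shorthand, no extra arguments); it only illustrates the idea of applying a function to each element and collecting the results in a list. The groups list is built with base R's split() as a stand-in for species_groups.

```r
# A hand-rolled, simplified version of map(): apply f to each element of x.
my_map <- function(x, f) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    # out[i] <- list(...) keeps the slot even if f() returns NULL
    out[i] <- list(f(x[[i]]))
  }
  out
}

groups <- split(iris, iris$Species)   # stand-in for species_groups
counts <- my_map(groups, nrow)
counts

# Base R's lapply() does the same job (and keeps the names).
lapply(groups, nrow)
```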

  10. map
      map(species_groups, ~head(., n = 1))
      [[1]]
      # A tibble: 1 x 5
        Sepal.Length Sepal.Width Petal.Length Petal.Width Species
               <dbl>       <dbl>        <dbl>       <dbl> <fct>
      1          5.1         3.5          1.4         0.2 setosa
      [[2]]
      # A tibble: 1 x 5
        Sepal.Length Sepal.Width Petal.Length Petal.Width Species
               <dbl>       <dbl>        <dbl>       <dbl> <fct>
      1            7         3.2          4.7         1.4 versicolor
      [[3]]
      # A tibble: 1 x 5
        Sepal.Length Sepal.Width Petal.Length Petal.Width Species
               <dbl>       <dbl>        <dbl>       <dbl> <fct>
      1          6.3         3.3            6         2.5 virginica
      • Return a list with the first row in each data frame.
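The ~head(., n = 1) argument is purrr's formula shorthand for an anonymous function, with . standing for the current list element. The sketch below, which assumes the purrr package is installed, shows three spellings that should give the same result. The groups list is built with split() as a stand-in for species_groups.

```r
library(purrr)

groups <- split(iris, iris$Species)   # stand-in for species_groups

# Formula shorthand: "." is the current element.
first_formula  <- map(groups, ~ head(., n = 1))

# The same thing written as an ordinary anonymous function.
first_function <- map(groups, function(df) head(df, n = 1))

# Extra arguments after the function are passed along to it.
first_dots     <- map(groups, head, n = 1)

identical(first_formula, first_function)
identical(first_formula, first_dots)
```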

  11. Glue them back together
      map(species_groups, ~head(., n = 1)) %>%
        bind_rows()
      # A tibble: 3 x 5
        Sepal.Length Sepal.Width Petal.Length Petal.Width Species
               <dbl>       <dbl>        <dbl>       <dbl> <fct>
      1          5.1         3.5          1.4         0.2 setosa
      2            7         3.2          4.7         1.4 versicolor
      3          6.3         3.3            6         2.5 virginica
      • bind_rows takes a list of data frames and glues them into a single data frame.
      • The result is a 3-row data frame, each row being the first row of data for one species.

  12. Glue them back together
      map_df(species_groups, ~head(., n = 1))
      # A tibble: 3 x 5
        Sepal.Length Sepal.Width Petal.Length Petal.Width Species
               <dbl>       <dbl>        <dbl>       <dbl> <fct>
      1          5.1         3.5          1.4         0.2 setosa
      2            7         3.2          4.7         1.4 versicolor
      3          6.3         3.3            6         2.5 virginica
      • Remember map_df from an earlier lecture: it applies the function to each data frame in the list, and glues the results into a single data frame.
      • This is the same result as the previous slide.
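The "apply, then glue" pattern also exists in base R, which may help clarify what bind_rows and map_df are doing. This sketch swaps in lapply() and do.call(rbind, ...) for the tidyverse functions; the variable names are illustrative, and the result is a plain data frame rather than a tibble.

```r
groups <- split(iris, iris$Species)   # stand-in for species_groups

# Step 1: apply head() to each group, giving a list of 1-row data frames.
first_rows <- lapply(groups, head, n = 1)

# Step 2: glue the list of data frames into a single data frame.
combined <- do.call(rbind, first_rows)

nrow(combined)
combined$Sepal.Length
```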

  13. Using lists with regression objects

  14. Create a regression object
      > r <- lm(Petal.Length ~ Petal.Width, data = iris)
      > (s <- summary(r))
      Call:
      lm(formula = Petal.Length ~ Petal.Width, data = iris)
      Residuals:
           Min       1Q   Median       3Q      Max
      -1.33542 -0.30347 -0.02955  0.25776  1.39453
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  1.08356    0.07297   14.85   <2e-16 ***
      Petal.Width  2.22994    0.05140   43.39   <2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      Residual standard error: 0.4782 on 148 degrees of freedom
      Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266
      F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16
      • r is the regression object.
      • s is the summary of the regression object, which has some additional information computed from the result.
      • We can extract useful information from the summary.
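Both r and s are lists underneath, which is why the list tools from the earlier slides apply to them. A quick base R check:

```r
r <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(r)

class(r)     # the S3 class attached to the list
class(s)
is.list(r)   # both objects are lists underneath
is.list(s)

names(s)     # the named elements available for extraction
```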

  15. The regression coefficients
      coefficients(s)
                  Estimate Std. Error  t value     Pr(>|t|)
      (Intercept) 1.083558 0.07296696 14.84998 4.043318e-31
      Petal.Width 2.229940 0.05139623 43.38724 4.675004e-86
      • The coefficients function extracts the regression coefficients.
      • The first row is the intercept.
      • The second row is the slope, the coefficient for the variable Petal.Width.
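The value returned by coefficients(s) is a numeric matrix with row and column names, so individual entries can be pulled out by name. A short sketch (variable names are illustrative):

```r
r <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(r)

coefs <- coefficients(s)
is.matrix(coefs)

# Index by row name and column name.
slope    <- coefs["Petal.Width", "Estimate"]
slope_se <- coefs["Petal.Width", "Std. Error"]
slope

# The t value column is the estimate divided by its standard error.
slope / slope_se
```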

  16. The coefficient of determination, r^2
      > s$r.squared
      [1] 0.9271098
      • There is no function to extract the r^2 value, so we have to do it ourselves.
      • Type ?summary.lm (or call names(s)) to see the names of what is inside the summary object; ?lm documents the regression object itself.
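As a check on what s$r.squared contains, r^2 can be recomputed from its definition, 1 - RSS/TSS, where RSS is the residual sum of squares and TSS is the total sum of squares of the response. For a regression with a single predictor it also equals the squared correlation. The variable names below are illustrative.

```r
r <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(r)

rss <- sum(residuals(r)^2)                                     # residual sum of squares
tss <- sum((iris$Petal.Length - mean(iris$Petal.Length))^2)    # total sum of squares
r2_manual <- 1 - rss / tss

# With one predictor, r^2 is the squared correlation of x and y.
r2_cor <- cor(iris$Petal.Length, iris$Petal.Width)^2

all.equal(r2_manual, s$r.squared)
all.equal(r2_cor, s$r.squared)
```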

  17. Computing over groups
      • Can we compute one regression for each species?

  18. Compute r squared for each group 1
      species_groups %>%
        map(~summary(lm(Petal.Length ~ Petal.Width, data = .))$r.squared)
      [[1]]
      [1] 0.1099785
      [[2]]
      [1] 0.6188467
      [[3]]
      [1] 0.1037537
      • We build the model for each element of the list of groups, call summary on the model, and extract the r.squared element from the summary.
      • The result is a list.

  19. Compute r squared for each group 2
      species_groups %>%
        map_dbl(~summary(lm(Petal.Length ~ Petal.Width, data = .))$r.squared)
      [1] 0.1099785 0.6188467 0.1037537
      • This is the same calculation as on the previous slide.
      • We use map_dbl instead of map so that the result is a vector of double-precision floating-point numbers rather than a list.
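The list-versus-vector distinction between map and map_dbl has a direct base R counterpart in lapply() versus vapply(). The sketch below uses those base functions as stand-ins, with split() in place of group_split(); r2_for is a hypothetical helper name.

```r
groups <- split(iris, iris$Species)   # stand-in for species_groups

# Fit the model on one group and return its r squared.
r2_for <- function(df) {
  summary(lm(Petal.Length ~ Petal.Width, data = df))$r.squared
}

as_list   <- lapply(groups, r2_for)               # like map(): returns a list
as_vector <- vapply(groups, r2_for, numeric(1))   # like map_dbl(): returns a double vector

is.list(as_list)
is.double(as_vector)
as_vector
```

Like map_dbl, vapply() fails with an error if any call returns something other than a single number, which catches mistakes early.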

  20. Compute r squared for each group 2
      species_groups %>%
        map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
        map(summary) %>%
        map_dbl("r.squared")
      [1] 0.1099785 0.6188467 0.1037537
      • Advanced: This is the same calculation, but using the pipe operator for each step.
      • map_dbl normally takes a function to apply, but you can also give the name of an element of the list to extract.
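Extraction by name is easiest to see on a small hand-made list. The fits list below is toy data invented for illustration (it is not regression output), and the sketch assumes the purrr package is installed.

```r
library(purrr)

# Toy data: a list of lists, each with an element named r.squared.
fits <- list(
  list(label = "a", r.squared = 0.25),
  list(label = "b", r.squared = 0.75)
)

# Passing a string extracts the element with that name from each item...
by_name <- map_dbl(fits, "r.squared")

# ...which is shorthand for an explicit extractor function.
by_function <- map_dbl(fits, function(fit) fit[["r.squared"]])

by_name
identical(by_name, by_function)
```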

  21. Collect coefficients for each group 1
      species_groups %>%
        map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
        map(coefficients)
      [[1]]
      (Intercept) Petal.Width
        1.3275634   0.5464903
      [[2]]
      (Intercept) Petal.Width
         1.781275    1.869325
      [[3]]
      (Intercept) Petal.Width
        4.2406526   0.6472593
      • This is close, but the result is a list of named vectors.
      • A named vector is like a named list: each position can be referred to by a name (a character string) instead of by its numeric index.

  22. Converting a named vector into a one-row data frame
      ## Make a named vector
      c(x = 10, y = 20)
       x  y
      10 20
      ## Convert to one-row data frame
      bind_rows(c(x = 10, y = 20))
      # A tibble: 1 x 2
            x     y
        <dbl> <dbl>
      1    10    20

  23. Collect coefficients for each group 2
      species_groups %>%
        map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
        map(coefficients) %>%
        map(bind_rows) %>% ## convert named vector into 1-row data frame using bind_rows
        bind_rows() ## now glue the data frames together
      # A tibble: 3 x 2
        `(Intercept)` Petal.Width
                <dbl>       <dbl>
      1          1.33       0.546
      2          1.78       1.87
      3          4.24       0.647
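The same table of coefficients can be assembled in base R, which shows the structure of the computation without bind_rows. This is an alternative sketch, not the slide's method: rbind() on named vectors yields a matrix, and because split() names the groups, the rows come out labelled by species. Variable names are illustrative.

```r
groups <- split(iris, iris$Species)   # stand-in for species_groups

# One model per group, then one named coefficient vector per model.
models    <- lapply(groups, function(df) lm(Petal.Length ~ Petal.Width, data = df))
coef_list <- lapply(models, coefficients)

# Stack the named vectors into a matrix: one row per species.
coef_matrix <- do.call(rbind, coef_list)
coef_matrix

# Convert to a data frame if a data frame is needed downstream.
coef_table <- as.data.frame(coef_matrix)
```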

  24. broom
      • If you do a lot of analyses like this, you should learn about the broom package.
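As a brief taste of what broom offers, and assuming the broom package is installed, its tidy() and glance() functions return the pieces extracted by hand on the earlier slides as ordinary data frames. This sketch goes beyond what the slide itself shows.

```r
library(broom)

r <- lm(Petal.Length ~ Petal.Width, data = iris)

# One row per coefficient: term, estimate, std.error, statistic, p.value.
coef_table <- tidy(r)
coef_table

# One row per model, including r.squared.
model_stats <- glance(r)
model_stats$r.squared
```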
