Lists Steve Bagley somgen223.stanford.edu 1
Vectors vs lists

Vector
    c(1, "b", FALSE)
    [1] "1"     "b"     "FALSE"
• A vector can only contain one type of data.
• In this example R forces the vector to be all characters, because character is the only type that can represent numbers, characters, and logicals alike.
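
The coercion rule can be checked directly. The sketch below is an illustration that is not taken from the slides; it uses only base R and shows that mixing types promotes the whole vector to the most general type present.

```r
# Mixing types in c() coerces every element to the most general type present.
v <- c(1, "b", FALSE)
class(v)   # "character"

# Without a character element, the logical is promoted to a number instead.
w <- c(1, FALSE)
class(w)   # "numeric"
w          # 1 0
```

Checking class() after building a vector is a quick way to catch an accidental coercion.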

List
    list(1, "b", FALSE)
    [[1]]
    [1] 1

    [[2]]
    [1] "b"

    [[3]]
    [1] FALSE
• A list can contain any types of data.
• Lists print out in an unusual way.
• Each element of the list is preceded by [[i]] for the i-th element.

How to access elements of a list
    > a <- list(1, "b", FALSE)
    > a[2]
    [[1]]
    [1] "b"

    > a[[2]]
    [1] "b"
• a[2] returns a one-element list containing the character vector "b". This is rarely what you want.
• a[[2]] returns the character vector "b" itself. It is the second element of the list.
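
The difference between the two kinds of brackets shows up in the class of the result. This is a small base R sketch added for illustration; the variable names single and double are hypothetical.

```r
a <- list(1, "b", FALSE)

single <- a[2]     # single brackets: a sub-list of length 1
double <- a[[2]]   # double brackets: the element itself

class(single)      # "list"
class(double)      # "character"

# Single brackets are the right tool when you want several elements at once.
length(a[c(1, 3)]) # 2
```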

Named lists
    > (b <- list(x = 1, y = "b", c = FALSE))
    $x
    [1] 1

    $y
    [1] "b"

    $c
    [1] FALSE

    > b[["x"]]
    [1] 1
• Lists can use names for the elements of the list. This is often easier to understand and more useful than using numeric indices.
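
Named elements can be reached in a few equivalent ways. The following base R sketch is an added illustration (the variable key is hypothetical) showing names(), the $ shortcut, and lookup by a name stored in a variable.

```r
b <- list(x = 1, y = "b", c = FALSE)

names(b)     # "x" "y" "c"

b$y          # shortcut for b[["y"]]

# Double brackets also accept a name held in a variable; $ does not.
key <- "c"
b[[key]]     # FALSE
```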

Using lists in the tidyverse

Splitting a data frame into a list of groups
    species_groups <- as_tibble(iris) %>%
      group_by(Species) %>%
      group_split()
• This creates a three-element list.
• Each element is a data frame containing the data for that species.
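
For comparison, base R can produce a similar list without the tidyverse. This sketch swaps in split(), which is not what the slides use: it returns a named list of plain data frames rather than an unnamed list of tibbles.

```r
# Base R alternative to group_by() %>% group_split() (an assumption for
# illustration, not the approach used on the slides).
groups <- split(iris, iris$Species)

length(groups)        # 3
names(groups)         # "setosa" "versicolor" "virginica"
sapply(groups, nrow)  # 50 rows in each group
```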

map
    map(species_groups, nrow)
    [[1]]
    [1] 50

    [[2]]
    [1] 50

    [[3]]
    [1] 50
• map takes a list and a function and applies that function to each element of the list.
• It returns a list of those function call results.
• nrow returns the number of rows in a data frame.
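
When every result is a single integer, a typed variant of map gives a vector instead of a list. The sketch below assumes the purrr package is installed and uses base split() as a stand-in for the species_groups list built on the previous slide, so the results carry species names.

```r
library(purrr)

# Stand-in for species_groups (assumption: split() instead of group_split()).
species_groups <- split(iris, iris$Species)

counts_list <- map(species_groups, nrow)      # a list of three integers
counts_int  <- map_int(species_groups, nrow)  # an integer vector of length 3
```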

map
    map(species_groups, ~head(., n = 1))
    [[1]]
    # A tibble: 1 x 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl> <fct>
    1          5.1         3.5          1.4         0.2 setosa

    [[2]]
    # A tibble: 1 x 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl> <fct>
    1            7         3.2          4.7         1.4 versicolor

    [[3]]
    # A tibble: 1 x 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl> <fct>
    1          6.3         3.3            6         2.5 virginica
• Return a list with the first row in each data frame.
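
The ~head(., n = 1) notation is purrr's shorthand for an anonymous function whose argument is written as a dot. The sketch below, which assumes purrr is installed and uses split() as a stand-in for species_groups, writes the same call both ways.

```r
library(purrr)

# Stand-in for species_groups (assumption: split() instead of group_split()).
species_groups <- split(iris, iris$Species)

# Formula shorthand: the dot stands for the current list element.
first_rows_formula <- map(species_groups, ~ head(., n = 1))

# The same thing written as an ordinary anonymous function.
first_rows_function <- map(species_groups, function(d) head(d, n = 1))
```

The explicit function form is longer but can be easier to read when the body grows beyond one call.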

Glue them back together
    map(species_groups, ~head(., n = 1)) %>%
      bind_rows()
    # A tibble: 3 x 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl> <fct>
    1          5.1         3.5          1.4         0.2 setosa
    2            7         3.2          4.7         1.4 versicolor
    3          6.3         3.3            6         2.5 virginica
• bind_rows takes a list of data frames and glues them into a single data frame.
• The result is a 3-row data frame, each row being the first row of data for one species.

Glue them back together
    map_df(species_groups, ~head(., n = 1))
    # A tibble: 3 x 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl> <fct>
    1          5.1         3.5          1.4         0.2 setosa
    2            7         3.2          4.7         1.4 versicolor
    3          6.3         3.3            6         2.5 virginica
• Remember map_df from an earlier lecture: it applies the function to each data frame in the list and glues the results into a single data frame.
• This is the same result as the previous slide.

Using lists with regression objects

Create a regression object
    > r <- lm(Petal.Length ~ Petal.Width, data = iris)
    > (s <- summary(r))

    Call:
    lm(formula = Petal.Length ~ Petal.Width, data = iris)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.33542 -0.30347 -0.02955  0.25776  1.39453

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  1.08356    0.07297   14.85   <2e-16 ***
    Petal.Width  2.22994    0.05140   43.39   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.4782 on 148 degrees of freedom
    Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266
    F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16
• r is the regression object.
• s is the summary of the regression object, which has some additional information computed from the result.
• We can extract useful information from the summary.

The regression coefficients
    coefficients(s)
                Estimate Std. Error  t value     Pr(>|t|)
    (Intercept) 1.083558 0.07296696 14.84998 4.043318e-31
    Petal.Width 2.229940 0.05139623 43.38724 4.675004e-86
• The coefficients function extracts the regression coefficients. Applied to the summary, it returns the full coefficient table: estimates, standard errors, t values, and p-values.
• The first row is the intercept.
• The second row is the slope, the coefficient for the variable Petal.Width.

The coefficient of determination, r^2
    > s$r.squared
    [1] 0.9271098
• There is no function to extract the r^2 value, so we have to pull it out of the summary ourselves.
• Type ?summary.lm (or run names(s)) to see the names of what is inside the summary object; ?lm documents the regression object itself.
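
Because the summary is a list, its pieces can be listed and extracted like those of any named list. This base R sketch is an added illustration; the last line relies on the standard fact that, for a regression with a single predictor, r^2 equals the squared Pearson correlation.

```r
r <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(r)

names(s)          # the components available in the summary
s$r.squared       # coefficient of determination
s$adj.r.squared   # adjusted r^2
s$sigma           # residual standard error

# Cross-check: with one predictor, r^2 is the squared correlation.
cor(iris$Petal.Length, iris$Petal.Width)^2
```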

Computing over groups
• Can we compute one regression for each species?

Compute r squared for each group 1
    species_groups %>%
      map(~summary(lm(Petal.Length ~ Petal.Width, data = .))$r.squared)
    [[1]]
    [1] 0.1099785

    [[2]]
    [1] 0.6188467

    [[3]]
    [1] 0.1037537
• We build the model for each element of the list (one per group), call summary on the model, and extract the r.squared element from the summary.
• The result is a list.

Compute r squared for each group 2
    species_groups %>%
      map_dbl(~summary(lm(Petal.Length ~ Petal.Width, data = .))$r.squared)
    [1] 0.1099785 0.6188467 0.1037537
• This is the same calculation as on the previous slide.
• We use map_dbl instead of map so that the result is a vector of double-precision floating-point numbers rather than a list.
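
The one-line formula can also be pulled out into a named helper, which some readers find easier to follow. This sketch assumes purrr is installed; the helper name r_squared is hypothetical, and split() is a stand-in for species_groups.

```r
library(purrr)

# Stand-in for species_groups (assumption: split() instead of group_split()).
species_groups <- split(iris, iris$Species)

# Hypothetical helper: fit the model on one group and return its r^2.
r_squared <- function(d) {
  fit <- lm(Petal.Length ~ Petal.Width, data = d)
  summary(fit)$r.squared
}

r2 <- map_dbl(species_groups, r_squared)
```

Because split() names the list elements, r2 is a named numeric vector with one entry per species.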

Compute r squared for each group 3
    species_groups %>%
      map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
      map(summary) %>%
      map_dbl("r.squared")
    [1] 0.1099785 0.6188467 0.1037537
• Advanced: This is the same calculation, but using the pipe operator for each step.
• map_dbl normally takes a function to apply, but you can also give the name of an element of the list to extract.
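
The name-as-extractor shorthand is easiest to see on a tiny list. The sketch below assumes purrr is installed; the records list and its field names are made up for illustration.

```r
library(purrr)

# Hypothetical toy data: a list of small named lists.
records <- list(
  list(id = 1, score = 0.5),
  list(id = 2, score = 0.75)
)

# Passing a string extracts the element with that name from each item.
by_name <- map_dbl(records, "score")

# Equivalent explicit function.
by_function <- map_dbl(records, function(rec) rec[["score"]])
```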

Collect coefficients for each group 1
    species_groups %>%
      map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
      map(coefficients)
    [[1]]
    (Intercept) Petal.Width
      1.3275634   0.5464903

    [[2]]
    (Intercept) Petal.Width
       1.781275    1.869325

    [[3]]
    (Intercept) Petal.Width
      4.2406526   0.6472593
• This is close, but the result is a list of named vectors.
• A named vector is like a named list: each position can be referred to by a name (a character string) instead of the numeric index.

Converting a named vector into a one-row data frame
    ## Make a named vector
    c(x = 10, y = 20)
     x  y
    10 20
    ## Convert to one-row data frame
    bind_rows(c(x = 10, y = 20))
    # A tibble: 1 x 2
          x     y
      <dbl> <dbl>
    1    10    20

Collect coefficients for each group 2
    species_groups %>%
      map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
      map(coefficients) %>%
      map(bind_rows) %>%  ## convert named vector into 1-row data frame using bind_rows
      bind_rows()         ## now glue the data frames together
    # A tibble: 3 x 2
      `(Intercept)` Petal.Width
              <dbl>       <dbl>
    1          1.33       0.546
    2          1.78       1.87
    3          4.24       0.647
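
The same table of coefficients can be assembled in base R. This sketch is an alternative that is not from the slides: it swaps in split(), lapply(), and do.call(rbind, ...), and yields a matrix with one row per species instead of a tibble.

```r
# Base R alternative (assumption for illustration, not the slides' method).
species_groups <- split(iris, iris$Species)

# One named coefficient vector per species.
coef_list <- lapply(species_groups, function(d) {
  coef(lm(Petal.Length ~ Petal.Width, data = d))
})

# Stack the named vectors as rows of a matrix; row names are the species.
coef_matrix <- do.call(rbind, coef_list)
coef_matrix
```

Unlike the bind_rows version, this keeps the species labels as row names, which makes the rows easier to identify.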

broom
• If you do a lot of analyses like this, you should learn about the broom package.
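
As a brief taste of what broom offers, the sketch below assumes the broom package is installed and applies its tidy() and glance() functions to the regression from the earlier slides. It is an illustration only, not a full introduction to the package.

```r
library(broom)

r <- lm(Petal.Length ~ Petal.Width, data = iris)

# tidy() returns the coefficient table as a data frame, one row per term.
coef_table <- tidy(r)

# glance() returns a one-row data frame of model-level statistics.
fit_stats <- glance(r)
fit_stats$r.squared
```

Because both results are ordinary data frames, they combine naturally with map and bind_rows when fitting one model per group.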