SLIDE 1

Lists

Steve Bagley

somgen223.stanford.edu 1

SLIDE 2

Vectors vs lists

SLIDE 3

Vector

c(1, "b", FALSE)
[1] "1"     "b"     "FALSE"

  • A vector can contain only one type of data.
  • In this example, R coerces every element to character, because character is the only one of these types that can represent numbers, characters, and logicals.
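
To see this coercion directly, check the class of the result. The base R sketch below also shows that when there is no character value, R picks numeric instead, which turns TRUE and FALSE into 1 and 0. The variable names v and w are just for illustration.

```r
# c() coerces all elements to one common type.
v <- c(1, "b", FALSE)
class(v)   # "character"
v          # "1" "b" "FALSE"

# Without a character value, the common type is numeric:
# TRUE becomes 1 and FALSE becomes 0.
w <- c(1, TRUE, FALSE)
class(w)   # "numeric"
w          # 1 1 0
```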

SLIDE 4

List

list(1, "b", FALSE)
[[1]]
[1] 1

[[2]]
[1] "b"

[[3]]
[1] FALSE

  • A list can contain elements of any type, and each element keeps its own type.
  • Lists print in an unusual way.
  • Each element of the list is preceded by [[i]], where i is the element's position.
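
A quick way to confirm that each element keeps its own type is to ask for the class of every element. This base R sketch uses sapply() for that. It also shows that list elements can have different lengths and can even be lists themselves. The names l and nested are illustrative.

```r
# Each list element keeps its own type.
l <- list(1, "b", FALSE)
sapply(l, class)      # "numeric" "character" "logical"

# Elements may also have different lengths, and may even be lists themselves.
nested <- list(1:3, "b", list(TRUE, "c"))
length(nested)        # 3
length(nested[[1]])   # 3
class(nested[[3]])    # "list"
```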

SLIDE 5

How to access elements of a list

> a <- list(1, "b", FALSE)
> a[2]
[[1]]
[1] "b"

> a[[2]]
[1] "b"

  • a[2] returns a one-element list that contains the character vector "b". This is usually not what you want.
  • a[[2]] returns the character vector "b" itself, which is the second element of the list.
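
The difference between single and double brackets is easy to check with class(). In this base R sketch, single brackets with several positions return a shorter list, and double brackets return the element itself.

```r
a <- list(1, "b", FALSE)

# Single brackets always return a list (a sub-list of a).
class(a[2])       # "list"
length(a[2])      # 1
a[c(1, 3)]        # a two-element list holding 1 and FALSE

# Double brackets return the element itself.
class(a[[2]])     # "character"
a[[2]] == "b"     # TRUE
```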

SLIDE 6

Named lists

> (b <- list(x = 1, y = "b", c = FALSE))
$x
[1] 1

$y
[1] "b"

$c
[1] FALSE

> b[["x"]]
[1] 1

  • Lists can use names for their elements. Accessing an element by name is often easier to understand and more useful than using a numeric index.
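
The sketch below shows a few common ways to work with a named list in base R. It uses names() to list the element names, $ and [["..."]] to extract one element, and single brackets with names to take a sub-list.

```r
b <- list(x = 1, y = "b", c = FALSE)

names(b)          # "x" "y" "c"
b$y               # "b"  ($ extracts an element by name, like [["y"]])
b[["c"]]          # FALSE

# Names also work with single brackets, which again return a sub-list.
b[c("x", "c")]
```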

SLIDE 7

Using lists in the tidyverse

SLIDE 8

Splitting a data frame into a list of groups

library(tidyverse)
species_groups <- as_tibble(iris) %>% group_by(Species) %>% group_split()

  • This creates a three-element list.
  • Each element is a data frame (a tibble) containing the rows for one species.
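
To try the same idea without the tidyverse, you can use base R's split() in place of group_split(). This is a substitution, not what the slide uses. split() returns ordinary data frames rather than tibbles, and it names each list element after its species. The name groups is illustrative.

```r
# Split the built-in iris data frame into one data frame per species.
groups <- split(iris, iris$Species)

length(groups)             # 3
names(groups)              # "setosa" "versicolor" "virginica"
class(groups[["setosa"]])  # "data.frame"
```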

SLIDE 9

map

map(species_groups, nrow)
[[1]]
[1] 50

[[2]]
[1] 50

[[3]]
[1] 50

  • map takes a list and a function, and applies that function to each element of the list.
  • It returns a list of the results of those function calls.
  • nrow returns the number of rows in a data frame.
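
To make the description of map concrete, here is a bare-bones version written with a for loop. my_map is a hypothetical helper for illustration only. It is not purrr's implementation and does not support the ~ formula shorthand. The data comes from base R's split() rather than group_split(). Base R's lapply() does the same basic job.

```r
# Hypothetical helper: a bare-bones map() written with a for loop.
# It applies .f to each element of .x and collects the results in a list.
my_map <- function(.x, .f) {
  out <- vector("list", length(.x))
  for (i in seq_along(.x)) {
    out[[i]] <- .f(.x[[i]])
  }
  names(out) <- names(.x)   # keep element names, if any
  out
}

groups <- split(iris, iris$Species)
my_map(groups, nrow)        # a list of three 50s

# Base R's lapply() does the same job.
lapply(groups, nrow)
```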

SLIDE 10

map

map(species_groups, ~head(., n = 1))
[[1]]
# A tibble: 1 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>
1          5.1         3.5          1.4         0.2 setosa

[[2]]
# A tibble: 1 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>
1            7         3.2          4.7         1.4 versicolor

[[3]]
# A tibble: 1 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>
1          6.3         3.3            6         2.5 virginica

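  • This returns a list containing the first row of each data frame. The formula ~head(., n = 1) is purrr's shorthand for a one-argument function, where . stands for the current element.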
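
In base R, the formula shorthand is replaced by an explicitly written anonymous function. This sketch uses split() and lapply() as base R stand-ins for group_split() and map(). The name first_rows is illustrative.

```r
groups <- split(iris, iris$Species)

# The purrr formula ~head(., n = 1) is shorthand for an anonymous function;
# in base R we write that function out explicitly.
first_rows <- lapply(groups, function(df) head(df, n = 1))

first_rows[["versicolor"]]   # the first versicolor row: 7.0 3.2 4.7 1.4
```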

SLIDE 11

Glue them back together

map(species_groups, ~head(., n = 1)) %>% bind_rows()
# A tibble: 3 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>
1          5.1         3.5          1.4         0.2 setosa
2          7           3.2          4.7         1.4 versicolor
3          6.3         3.3          6           2.5 virginica

  • bind_rows takes a list of data frames and glues them into a single data frame.
  • The result is a 3-row data frame in which each row is the first row of data for one species.
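
For comparison, the usual base R way to glue a list of data frames together is do.call(rbind, ...). This is a substitute for bind_rows, not the function the slide uses. The sketch builds the same three first rows with split() and lapply().

```r
groups <- split(iris, iris$Species)
first_rows <- lapply(groups, function(df) head(df, n = 1))

# do.call() calls rbind() with every list element as a separate argument,
# stacking the one-row data frames into one data frame.
combined <- do.call(rbind, first_rows)
combined
```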

SLIDE 12

Glue them back together

map_df(species_groups, ~head(., n = 1))
# A tibble: 3 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>
1          5.1         3.5          1.4         0.2 setosa
2          7           3.2          4.7         1.4 versicolor
3          6.3         3.3          6           2.5 virginica

  • Remember map_df from an earlier lecture: it applies the function to each data frame in the list and glues the results into a single data frame.
  • This gives the same result as the previous slide.

SLIDE 13

Using lists with regression objects

SLIDE 14

Create a regression object

> r <- lm(Petal.Length ~ Petal.Width, data = iris)
> (s <- summary(r))

Call:
lm(formula = Petal.Length ~ Petal.Width, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max
-1.33542 -0.30347 -0.02955  0.25776  1.39453

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.08356    0.07297   14.85   <2e-16 ***
Petal.Width  2.22994    0.05140   43.39   <2e-16 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4782 on 148 degrees of freedom
Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266
F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16

  • r is the regression object.
  • s is the summary of the regression object. It contains additional information computed from the fit.
  • We can extract useful information from the summary.
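
Both r and s are ordinary R lists that carry a class attribute. That is why the list tools from the first part of this lecture work on them. This base R sketch checks the classes and uses names() to show which components can be pulled out with $ or [[.

```r
r <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(r)

# Both objects are lists with a class attribute.
is.list(r); class(r)   # TRUE "lm"
is.list(s); class(s)   # TRUE "summary.lm"

# names() lists the components you can pull out with $ or [[.
names(s)
```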

SLIDE 15

The regression coefficients

coefficients(s)
            Estimate Std. Error  t value     Pr(>|t|)
(Intercept) 1.083558 0.07296696 14.84998 4.043318e-31
Petal.Width 2.229940 0.05139623 43.38724 4.675004e-86

  • The coefficients function extracts the table of regression coefficients: the estimate, standard error, t value, and p-value for each term.
  • The first row is the intercept.
  • The second row is the slope, the coefficient for the variable Petal.Width.
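
The coefficient table is a matrix with row and column names, so you can pick out a single value by name instead of by position. This base R sketch also shows that coef() on the model itself (or coefficients(r)) returns only the estimates, as a named vector.

```r
r <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(r)

# coefficients(s) is a matrix with row and column names,
# so single values can be picked out by name.
coefficients(s)["Petal.Width", "Estimate"]    # slope, about 2.23
coefficients(s)["(Intercept)", "Estimate"]    # intercept, about 1.08

# Called on the model itself, coef() returns just the estimates
# as a named vector.
coef(r)
```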

SLIDE 16

The coefficient of determination, r^2

> s$r.squared
[1] 0.9271098

  • Base R has no dedicated function to extract the r^2 value, so we extract the r.squared element of the summary object ourselves with $.
  • To see the names of the components inside the summary object, type ?summary.lm or run names(s).
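
As a sanity check on what r.squared holds: for a simple linear regression with one predictor and an intercept, r^2 equals the square of the Pearson correlation between the two variables. This base R sketch compares the two values.

```r
s <- summary(lm(Petal.Length ~ Petal.Width, data = iris))

s$r.squared                                   # about 0.9271
cor(iris$Petal.Length, iris$Petal.Width)^2    # the same value
```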

SLIDE 17

Computing over groups

  • Can we compute one regression for each species?

SLIDE 18

Compute r squared for each group 1

species_groups %>%
  map(~summary(lm(Petal.Length ~ Petal.Width, data = .))$r.squared)
[[1]]
[1] 0.1099785

[[2]]
[1] 0.6188467

[[3]]
[1] 0.1037537

  • We fit a model to each element of the list (one data frame per species), call summary on the model, and extract the r.squared element from the summary.
  • The result is a list.

SLIDE 19

Compute r squared for each group 2

species_groups %>%
  map_dbl(~summary(lm(Petal.Length ~ Petal.Width, data = .))$r.squared)
[1] 0.1099785 0.6188467 0.1037537

  • This is the same calculation as on the previous slide.
  • We use map_dbl instead of map so that the result is a vector of double-precision floating-point numbers rather than a list.
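
The closest base R counterpart to map_dbl is vapply(), which also checks that every result is a single double. This sketch uses it as a substitute for map_dbl, together with split() in place of group_split(). The helper name rsq is hypothetical. Because split() names its elements, the result is a named numeric vector.

```r
groups <- split(iris, iris$Species)

# Hypothetical helper: fit one model to a data frame and return its r^2.
rsq <- function(df) summary(lm(Petal.Length ~ Petal.Width, data = df))$r.squared

# vapply() insists that every result is one double (numeric(1)),
# much as map_dbl() does.
res <- vapply(groups, rsq, numeric(1))
res
```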

SLIDE 20

Compute r squared for each group 3

species_groups %>%
  map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
[1] 0.1099785 0.6188467 0.1037537

  • Advanced: This is the same calculation, but each step is a separate stage of the pipe.
  • map_dbl normally takes a function to apply, but you can also give the name of an element to extract from each item in the list.
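
Base R can extract an element by name in the same way, because the extraction operator [[ is itself a function. In this sketch, vapply() stands in for map_dbl and split() stands in for group_split(). The name to extract, "r.squared", is passed along as an extra argument to [[.

```r
groups    <- split(iris, iris$Species)
models    <- lapply(groups, function(df) lm(Petal.Length ~ Petal.Width, data = df))
summaries <- lapply(models, summary)

# `[[` is an ordinary function, so vapply() can call it on each summary
# with "r.squared" as its second argument, much like map_dbl("r.squared").
rsq_by_name <- vapply(summaries, `[[`, numeric(1), "r.squared")
rsq_by_name
```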

SLIDE 21

Collect coefficients for each group 1

species_groups %>%
  map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
  map(coefficients)
[[1]]
(Intercept) Petal.Width
  1.3275634   0.5464903

[[2]]
(Intercept) Petal.Width
   1.781275    1.869325

[[3]]
(Intercept) Petal.Width
  4.2406526   0.6472593

  • This is close, but the result is a list of named vectors, not a data frame.
  • A named vector is like a named list: each position can be referred to by a name (a character string) instead of by a numeric index.
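
Named vectors are indexed much like named lists. This base R sketch shows that single brackets keep the name, double brackets drop it, and a vector of names can be used to select and reorder elements. The name v is illustrative.

```r
v <- c(x = 10, y = 20)

names(v)         # "x" "y"
v["y"]           # single brackets keep the name: y 20
v[["y"]]         # double brackets drop it: 20
v[c("y", "x")]   # select and reorder by name
```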

SLIDE 22

Converting a named vector into a one-row data frame

## Make a named vector
c(x = 10, y = 20)
 x  y
10 20

## Convert to one-row data frame
bind_rows(c(x = 10, y = 20))
# A tibble: 1 x 2
      x     y
  <dbl> <dbl>
1    10    20
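
If you are working without dplyr, one way to do the same conversion in base R is to turn the vector into a list and then into a data frame. This is a substitute for bind_rows, not the slide's method. Each named element becomes one column.

```r
v <- c(x = 10, y = 20)

# as.list() turns each named element into a list element, and
# as.data.frame() then treats each list element as a column.
one_row <- as.data.frame(as.list(v))
one_row
```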

SLIDE 23

Collect coefficients for each group 2

species_groups %>%
  map(~ lm(Petal.Length ~ Petal.Width, data = .)) %>%
  map(coefficients) %>%
  ## convert each named vector into a 1-row data frame using bind_rows
  map(bind_rows) %>%
  ## now glue the data frames together
  bind_rows()
# A tibble: 3 x 2
  `(Intercept)` Petal.Width
          <dbl>       <dbl>
1          1.33       0.546
2          1.78       1.87
3          4.24       0.647
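
A base R route to the same table is to stack the named coefficient vectors with rbind(). The rows are named after the species and the columns take the vector names. This sketch substitutes split(), lapply(), and do.call(rbind, ...) for the slide's tidyverse functions. The intermediate is a matrix, which as.data.frame() then converts.

```r
groups <- split(iris, iris$Species)
models <- lapply(groups, function(df) lm(Petal.Length ~ Petal.Width, data = df))

# Each coef() result is a named vector; rbind() stacks them as rows of a
# matrix whose column names come from the vector names.
coef_matrix <- do.call(rbind, lapply(models, coef))
coef_df <- as.data.frame(coef_matrix)
coef_df
```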

SLIDE 24

broom

  • If you do a lot of analyses like this, you should learn about the broom package. Its tidy() and glance() functions turn model objects such as lm results into data frames.
