Joining Data in R with dplyr: Binds

SLIDE 1

JOINING DATA IN R WITH DPLYR

Binds

SLIDE 2

Joining Data in R with dplyr

SLIDE 3

  • rbind()
  • cbind()
  • bind_rows()
  • bind_cols()

SLIDE 4

bind_rows()

Tables to combine:

> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr

> band2
     name  surname
1    Mick   Jagger
2   Keith Richards
3 Charlie    Watts
4  Ronnie     Wood

> bind_rows(band1, band2)
     name   surname
1    John    Lennon
2    Paul McCartney
3  George  Harrison
4   Ringo     Starr
5    Mick    Jagger
6   Keith  Richards
7 Charlie     Watts
8  Ronnie      Wood
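
A minimal sketch of how the two tables on this slide could be built and stacked. The slide only shows the printed tables, so the data.frame() construction below is an assumption; the values are copied from the printed output.

```r
library(dplyr)

# Rebuild the example tables (construction assumed, values taken from the slide).
band1 <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  surname = c("Lennon", "McCartney", "Harrison", "Starr"),
  stringsAsFactors = FALSE
)
band2 <- data.frame(
  name = c("Mick", "Keith", "Charlie", "Ronnie"),
  surname = c("Jagger", "Richards", "Watts", "Wood"),
  stringsAsFactors = FALSE
)

# Stack band2 underneath band1; columns are matched by name.
bands <- bind_rows(band1, band2)
print(bands)
```

Because columns are matched by name, the two inputs do not need to list their columns in the same order.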

SLIDE 5

bind_cols()

> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr

> plays1
  instrument born
1     Guitar 1940
2       Bass 1942
3     Guitar 1943
4      Drums 1940

> bind_cols(band1, plays1)
    name   surname instrument born
1   John    Lennon     Guitar 1940
2   Paul McCartney       Bass 1942
3 George  Harrison     Guitar 1943
4  Ringo     Starr      Drums 1940
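
A short sketch of the same column bind, assuming the tables are ordinary data frames (the slide does not show how they were created). bind_cols() pairs rows purely by position, so both inputs must already be in the same row order.

```r
library(dplyr)

# Example tables rebuilt from the printed output on the slide (construction assumed).
band1 <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  surname = c("Lennon", "McCartney", "Harrison", "Starr"),
  stringsAsFactors = FALSE
)
plays1 <- data.frame(
  instrument = c("Guitar", "Bass", "Guitar", "Drums"),
  born = c(1940, 1942, 1943, 1940),
  stringsAsFactors = FALSE
)

# Place the columns of plays1 to the right of band1, row 1 with row 1, and so on.
members <- bind_cols(band1, plays1)
print(members)
```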

SLIDE 6

Benefits of bind_rows() and bind_cols()

  • Faster
  • Return a tibble
  • Can handle lists of data frames
  • .id argument (bind_rows() only) records which input table each row came from

SLIDE 7

bind_rows()

> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr

> band2
     name  surname
1    Mick   Jagger
2   Keith Richards
3 Charlie    Watts
4  Ronnie     Wood

> bind_rows(Beatles = band1, Stones = band2, .id = "band")
     band    name   surname
1 Beatles    John    Lennon
2 Beatles    Paul McCartney
3 Beatles  George  Harrison
4 Beatles   Ringo     Starr
5  Stones    Mick    Jagger
6  Stones   Keith  Richards
7  Stones Charlie     Watts
8  Stones  Ronnie      Wood

The argument names (Beatles, Stones) supply the labels in the new column; .id supplies the name of the new column.
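
The same labelling also works when the tables arrive as a named list, which ties in with the earlier point that bind_rows() can handle lists of data frames. A sketch, with the table construction assumed as before:

```r
library(dplyr)

band1 <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  surname = c("Lennon", "McCartney", "Harrison", "Starr"),
  stringsAsFactors = FALSE
)
band2 <- data.frame(
  name = c("Mick", "Keith", "Charlie", "Ronnie"),
  surname = c("Jagger", "Richards", "Watts", "Wood"),
  stringsAsFactors = FALSE
)

# The list names become the labels; .id names the column that holds them.
band_list <- list(Beatles = band1, Stones = band2)
labelled <- bind_rows(band_list, .id = "band")
print(labelled)
```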

SLIDE 8

Let’s practice!

SLIDE 9

Build a better data frame

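SLIDE 10

  • data.frame()
  • as.data.frame()
  • data_frame()
  • as_data_frame()

Note: data_frame() and as_data_frame() are the older names for tibble() and as_tibble(); current releases of the tibble package deprecate the older names.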

SLIDE 11

data.frame() defaults

  • Changes strings to factors (the default before R 4.0.0; since then stringsAsFactors defaults to FALSE)
  • Adds row names
  • Changes unusual column names
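
A small base R sketch of these three behaviours. The arguments stringsAsFactors and check.names are set explicitly so the result does not depend on the R version; the column name "first name" is a made-up example.

```r
# Ask explicitly for the behaviours listed above.
df <- data.frame(
  `first name` = c("John", "Paul"),
  check.names = TRUE,       # repair names that are not syntactically valid
  stringsAsFactors = TRUE   # convert character vectors to factors
)

names(df)              # "first.name": the space was replaced
class(df$first.name)   # "factor": the strings were converted
rownames(df)           # "1" "2": automatic row names

# Switching both options off keeps the input as given.
df_raw <- data.frame(
  `first name` = c("John", "Paul"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
names(df_raw)                  # "first name"
class(df_raw[["first name"]])  # "character"
```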

SLIDE 12

data_frame()

> data_frame(
+   Beatles = c("John", "Paul", "George", "Ringo"),
+   Stones = c("Mick", "Keith", "Charlie", "Ronnie"),
+   Zeppelins = c("Robert", "Jimmy", "John Paul", "John")
+ )
# A tibble: 4 × 3
  Beatles  Stones Zeppelins
    <chr>   <chr>     <chr>
1    John    Mick    Robert
2    Paul   Keith     Jimmy
3  George Charlie John Paul
4   Ringo  Ronnie      John

SLIDE 13

data_frame()

data_frame() will not…

  • Change the data type of vectors (e.g. strings to factors)
  • Add row names
  • Change column names
  • Recycle vectors greater than length one
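
A sketch of these guarantees, written with tibble(), the current name for data_frame() in the tibble package. The column names and values are made up for illustration.

```r
library(tibble)

# Strings stay character, the unusual column name is kept as written,
# and the length-one value "Beatles" is recycled to every row.
tb <- tibble(
  `first name` = c("John", "Paul"),
  band = "Beatles"
)
names(tb)                 # "first name" "band"
class(tb[["first name"]]) # "character"

# A vector longer than one element is never recycled: lengths 4 and 2 are an error.
attempt <- tryCatch(
  tibble(a = 1:4, b = 1:2),
  error = function(e) e
)
inherits(attempt, "error")  # TRUE
```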

SLIDE 14

data_frame()

> data_frame(
+   numbers = 1:5,
+   squares = numbers ^ 2
+ )
# A tibble: 5 × 2
  numbers squares
    <int>   <dbl>
1       1       1
2       2       4
3       3       9
4       4      16
5       5      25

  • Returns a tibble
  • Evaluates arguments lazily, in order
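
To see why in-order evaluation matters, the sketch below contrasts tibble() (the current name for data_frame()) with base data.frame(). The name n_vals is hypothetical and chosen so that no object of that name exists beforehand.

```r
library(tibble)

# Arguments are evaluated in order, so `squares` can refer to `numbers`.
tb <- tibble(
  numbers = 1:5,
  squares = numbers ^ 2
)
print(tb)

# data.frame() evaluates every argument before the frame exists, so a column
# cannot refer to an earlier one. The error message is captured for inspection.
df_attempt <- tryCatch(
  data.frame(n_vals = 1:5, squares = n_vals ^ 2),
  error = function(e) conditionMessage(e)
)
print(df_attempt)
```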

SLIDE 15

as_data_frame()

SLIDE 16

Let’s practice!

SLIDE 17

Working with data types

SLIDE 18

> 1 + 1
[1] 2
> "one" + "one"
Error in "one" + "one" : non-numeric argument to binary operator

SLIDE 19

[Diagram: a Character column combined with a Number column. Is the result Character or Number?]

SLIDE 20

Atomic data types

  • Logical
  • Character (i.e. string)
  • Double (i.e. numeric with decimal)
  • Integer (i.e. numeric without decimal)
  • Complex
  • Raw

> typeof(TRUE)
[1] "logical"
> typeof("hello")
[1] "character"
> typeof(3.14)
[1] "double"
> typeof(1L)
[1] "integer"
> typeof(1 + 2i)
[1] "complex"
> typeof(raw(1))
[1] "raw"

SLIDE 21

Classes

> x <- c(1L, 2L, 3L, 2L)
> x
[1] 1 2 3 2
> typeof(x)
[1] "integer"
> class(x)
[1] "integer"
> attributes(x) <- list(class = "factor", levels = c("A", "B", "C", "D"))
> x
[1] A B C B
Levels: A B C D
> typeof(x)
[1] "integer"
> class(x)
[1] "factor"

Level mapping: 1L = A, 2L = B, 3L = C, 4L = D

SLIDE 22

Let’s practice!

SLIDE 23

dplyr's coercion rules

SLIDE 24

[Diagram: a Character column combined with a Number column. Is the result Character or Number?]

SLIDE 25

Combining                                        Result      Function         Notes
Character (string) + Integer, Double, Logical    Character   as.character()
Double + Integer, Logical                        Double      as.numeric()     TRUE -> 1, FALSE -> 0
Integer + Logical                                Integer     as.integer()     TRUE -> 1, FALSE -> 0
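
These rules can be checked in base R with c(), which coerces its arguments to the most general type present. This is a sketch of R's own coercion hierarchy, not of any dplyr function.

```r
# Logical combined with integer gives integer (TRUE -> 1, FALSE -> 0).
int_mix <- c(TRUE, FALSE, 2L)
typeof(int_mix)   # "integer"
int_mix           # 1 0 2

# Integer or logical combined with double gives double.
dbl_mix <- c(1.5, 2L, TRUE)
typeof(dbl_mix)   # "double"

# Anything combined with character gives character.
chr_mix <- c("a", 1.5, TRUE)
typeof(chr_mix)   # "character"

# The explicit conversion functions from the slide.
as.integer(c(TRUE, FALSE))  # 1 0
as.numeric(3L)              # 3
as.character(3.14)          # "3.14"
```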

SLIDE 26

factors

# x is a factor
> x
[1] A B C B
Levels: A B C D

# How is x stored?
> unclass(x)
[1] 1 2 3 2
attr(,"levels")
[1] "A" "B" "C" "D"

> as.character(x)
[1] "A" "B" "C" "B"
> as.numeric(x)
[1] 1 2 3 2

SLIDE 27

factors

# y is a factor
> y <- factor(c(5, 6, 7, 6))
> y
[1] 5 6 7 6
Levels: 5 6 7
> unclass(y)
[1] 1 2 3 2
attr(,"levels")
[1] "5" "6" "7"
> as.character(y)
[1] "5" "6" "7" "6"
> as.numeric(y)
[1] 1 2 3 2
> as.numeric(as.character(y))
[1] 5 6 7 6
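
A compact restatement of the pitfall, plus one alternative that is not on the slide: converting the levels once and then indexing by the factor. Both routes recover the original numbers.

```r
y <- factor(c(5, 6, 7, 6))

# as.numeric() on a factor returns the integer codes, not the labels.
codes <- as.numeric(y)                        # 1 2 3 2

# Route from the slide: go through character first.
via_character <- as.numeric(as.character(y))  # 5 6 7 6

# Alternative (not from the slide): convert the levels, then index by the codes.
via_levels <- as.numeric(levels(y))[y]        # 5 6 7 6
```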

SLIDE 28

dplyr's coercion behavior

  • dplyr functions will not automatically coerce data types
      • Returns an error
      • Expects you to manually coerce data
  • Exception: factors
      • dplyr converts non-aligning factors to strings
      • Gives warning message
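
A sketch of the error case and the manual fix, using two made-up tables whose id columns have different types. The factor exception is left out here because its exact behaviour may differ between dplyr versions.

```r
library(dplyr)

# Hypothetical tables: id is numeric in one and character in the other.
ids_num <- tibble(id = c(1, 2))
ids_chr <- tibble(id = c("3", "4"))

# Binding them directly fails; the error is captured rather than thrown.
attempt <- tryCatch(
  bind_rows(ids_num, ids_chr),
  error = function(e) e
)
inherits(attempt, "error")  # TRUE

# Coerce manually so both columns share a type, then bind.
fixed <- bind_rows(
  mutate(ids_num, id = as.character(id)),
  ids_chr
)
print(fixed)
```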

SLIDE 29

Let’s practice!