JOINING DATA IN R WITH DPLYR
Binds
- rbind()
- cbind()
- bind_rows()
- bind_cols()
bind_rows()
Tables to combine:
> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr
> band2
     name  surname
1    Mick   Jagger
2   Keith Richards
3 Charlie    Watts
4  Ronnie     Wood
Result:
> bind_rows(band1, band2)
     name   surname
1    John    Lennon
2    Paul McCartney
3  George  Harrison
4   Ringo     Starr
5    Mick    Jagger
6   Keith  Richards
7 Charlie     Watts
8  Ronnie      Wood
bind_cols()
> band1
    name   surname
1   John    Lennon
2   Paul McCartney
3 George  Harrison
4  Ringo     Starr
> plays1
  instrument born
1     Guitar 1940
2       Bass 1942
3     Guitar 1943
4      Drums 1940
> bind_cols(band1, plays1)
    name   surname instrument born
1   John    Lennon     Guitar 1940
2   Paul McCartney       Bass 1942
3 George  Harrison     Guitar 1943
4  Ringo     Starr      Drums 1940
Benefits of bind_rows() and bind_cols()
- Faster
- Return a tibble
- Can handle lists of data frames
- .id
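The "lists of data frames" point can be sketched as follows. This is a minimal illustration, not taken from the slides: the two-row band1 and band2 tables are shortened stand-ins, and the list name bands is an assumption introduced here.

```r
library(dplyr)

# Shortened stand-ins for the band tables (assumed for illustration).
band1 <- data.frame(name = c("John", "Paul"), surname = c("Lennon", "McCartney"))
band2 <- data.frame(name = c("Mick", "Keith"), surname = c("Jagger", "Richards"))

# bind_rows() accepts a single list of data frames.
# With .id, the list names become the values of the new "band" column.
bands <- list(Beatles = band1, Stones = band2)
combined <- bind_rows(bands, .id = "band")

combined
```

Passing a list is convenient when the tables come from a loop or from lapply(), since rbind() would need do.call() to do the same.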
bind_rows()
> bind_rows(Beatles = band1, Stones = band2, .id = "band")
     band    name   surname
1 Beatles    John    Lennon
2 Beatles    Paul McCartney
3 Beatles  George  Harrison
4 Beatles   Ringo     Starr
5  Stones    Mick    Jagger
6  Stones   Keith  Richards
7  Stones Charlie     Watts
8  Stones  Ronnie      Wood
- Argument names (Beatles, Stones): label names for new column
- .id = "band": column name for new column
Let’s practice!
JOINING DATA IN R WITH DPLYR
Build a beer data frame
- data.frame()
- as.data.frame()
- data_frame()
- as_data_frame()
data.frame() defaults
- Changes strings to factors (the default before R 4.0.0)
- Adds row names
- Changes unusual column names
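A small base R sketch of these three defaults. The column names and values are made up for illustration, and stringsAsFactors is set explicitly because its default changed to FALSE in R 4.0.0.

```r
# Hypothetical two-row table; `first name` is deliberately not a syntactic name.
df <- data.frame(
  `first name` = c("John", "Paul"),
  plays = c("Guitar", "Bass"),
  stringsAsFactors = TRUE   # the pre-4.0.0 default, requested explicitly here
)

names(df)        # the space is replaced: "first.name" "plays"
class(df$plays)  # strings were converted: "factor"
rownames(df)     # row names were added: "1" "2"
```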
data_frame()
> data_frame(
+   Beatles = c("John", "Paul", "George", "Ringo"),
+   Stones = c("Mick", "Keith", "Charlie", "Ronnie"),
+   Zeppelins = c("Robert", "Jimmy", "John Paul", "John")
+ )
# A tibble: 4 × 3
  Beatles Stones  Zeppelins
  <chr>   <chr>   <chr>
1 John    Mick    Robert
2 Paul    Keith   Jimmy
3 George  Charlie John Paul
4 Ringo   Ronnie  John
data_frame() will not…
- Change the data type of vectors (e.g. strings to factors)
- Add row names
- Change column names
- Recycle vectors greater than length one
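The points above can be checked with a short sketch. It uses tibble(), the current name for data_frame() in the tibble package (data_frame() is deprecated there); the column names and values are assumptions made up for this illustration.

```r
library(tibble)

tb <- tibble(
  `first name` = c("John", "Paul"),
  plays = c("Guitar", "Bass"),
  band = "Beatles"   # a length-one vector is still recycled
)

names(tb)        # names are kept as typed, including the space
class(tb$plays)  # strings stay "character"

# A length-two vector is not recycled to four rows; tibble() stops with an error.
# Base data.frame(x = 1:4, y = 1:2) would silently recycle y instead.
res <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)
res
```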
data_frame()
> data_frame(
+   numbers = 1:5,
+   squares = numbers ^ 2
+ )
# A tibble: 5 × 2
  numbers squares
    <int>   <dbl>
1       1       1
2       2       4
3       3       9
4       4      16
5       5      25
- Returns a tibble
- Evaluates arguments lazily, in order
as_data_frame()
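A minimal sketch of the conversion, assuming the input is a named list of equal-length vectors. It uses as_tibble(), the current name for as_data_frame() in the tibble package; the list bands is a hypothetical example, not from the slides.

```r
library(tibble)

# Hypothetical named list; each element becomes one column.
bands <- list(
  Beatles = c("John", "Paul"),
  Stones = c("Mick", "Keith")
)

tb <- as_tibble(bands)
tb
```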
Let’s practice!
JOINING DATA IN R WITH DPLYR
Working with data types
> 1 + 1
[1] 2
> "one" + "one"
Error in "one" + "one" : non-numeric argument to binary operator
Character combined with Number: Character? Number?
Atomic data types
- Logical
- Character (i.e. string)
- Double (i.e. numeric w/ decimal)
- Integer (i.e. numeric w/o decimal)
- Complex
- Raw
> typeof(TRUE)
[1] "logical"
> typeof("hello")
[1] "character"
> typeof(3.14)
[1] "double"
> typeof(1L)
[1] "integer"
> typeof(1 + 2i)
[1] "complex"
> typeof(raw(1))
[1] "raw"
Classes
> x <- c(1L, 2L, 3L, 2L)
> x
[1] 1 2 3 2
> typeof(x)
[1] "integer"
> class(x)
[1] "integer"
> attributes(x) <- list(class = "factor", levels = c("A", "B", "C", "D"))
> x
[1] A B C B
Levels: A B C D
> typeof(x)
[1] "integer"
> class(x)
[1] "factor"
Level mapping: 1L = A, 2L = B, 3L = C, 4L = D
Let’s practice!
JOINING DATA IN R WITH DPLYR
dplyr's coercion rules
Character combined with Number: Character? Number?
- Character (string) combined with Integer, Double, or Logical becomes Character: as.character()
- Double combined with Integer or Logical becomes Double (TRUE -> 1, FALSE -> 0): as.numeric()
- Integer combined with Logical becomes Integer (TRUE -> 1, FALSE -> 0): as.integer()
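The same hierarchy can be observed in base R, where c() coerces mixed inputs to the most general type. This sketch uses c() only as an illustration of the ordering; it is not a dplyr call, and the variable names are introduced here.

```r
# Mixed inputs are coerced to the most general type present.
with_chr <- c("a", 1L, 2.5, TRUE)   # everything becomes character
with_dbl <- c(2.5, 1L, TRUE)        # integer and logical become double
with_int <- c(1L, TRUE, FALSE)      # logical becomes integer: TRUE -> 1, FALSE -> 0

typeof(with_chr)
typeof(with_dbl)
typeof(with_int)
with_int
```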
factors
# x is a factor
> x
[1] A B C B
Levels: A B C D
# How is x stored?
> unclass(x)
[1] 1 2 3 2
attr(,"levels")
[1] "A" "B" "C" "D"
> as.character(x)
[1] "A" "B" "C" "B"
> as.numeric(x)
[1] 1 2 3 2
factors
# y is a factor
> y <- factor(c(5, 6, 7, 6))
> y
[1] 5 6 7 6
Levels: 5 6 7
> unclass(y)
[1] 1 2 3 2
attr(,"levels")
[1] "5" "6" "7"
> as.character(y)
[1] "5" "6" "7" "6"
> as.numeric(y)
[1] 1 2 3 2
> as.numeric(as.character(y))
[1] 5 6 7 6
dplyr's coercion behavior
- dplyr functions will not automatically coerce data types
  - Returns an error
  - Expects you to manually coerce data
- Exception: factors
  - dplyr converts non-aligning factors to strings
  - Gives warning message
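A hedged sketch of the error case and the manual fix. The tables ints and chars are hypothetical, and the exact error wording depends on the installed dplyr version, so the code only records that an error occurred. Factor handling has also varied across dplyr versions, so it is not exercised here.

```r
library(dplyr)

# Hypothetical tables whose id columns have different types.
ints  <- tibble(id = 1:2)
chars <- tibble(id = c("3", "4"))

# Combining an integer column with a character column is refused.
res <- tryCatch(
  bind_rows(ints, chars),
  error = function(e) "error"
)
res

# Coerce by hand first, then bind.
fixed <- bind_rows(mutate(ints, id = as.character(id)), chars)
fixed
```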