day-1-slides · Presentation · July 2019 · DOI: 10.13140/RG.2.2.21639.04001
Author: Ruan van Mazijk
https://www.researchgate.net/publication/334207202

slide-1
SLIDE 1
Author: Ruan van Mazijk, University of Cape Town
Related projects: Genome size, water-use ecophysiology, habitat & phenology in Cape Schoenoid sedges (Cyperaceae: Schoeneae); Plant species richness, turnover & environmental heterogeneity in the Cape and SW Australia
Uploaded by Ruan van Mazijk on 03 July 2019.
slide-2
SLIDE 2

data_wrangling() && ("manipulation" %in% R)

postgraduate_workshop( dept = "Biological Sciences", presenter = c( "Ruan van Mazijk", "MSc candidate" ) )

๐Ÿฉ๐Ÿ ๐Ÿ€ %>% %>% %>% ๐Ÿค”๐Ÿ“‹๐Ÿฅฑ

> logos()
> face()
slide-3
SLIDE 3

> introduce( )

slide-4
SLIDE 4

> introduce( )

  • BSc + Hons here at UCT
slide-5
SLIDE 5

> introduce( )

  • BSc + Hons here at UCT
  • Ecology & evolution
  • (Mostly plant) comparative biology
  • Biogeography
slide-6
SLIDE 6

> introduce( )

  • BSc + Hons here at UCT
  • Ecology & evolution
  • (Mostly plant) comparative biology
  • Biogeography
  • Been working with R for 4½ years
  • Every major project I’ve done…
slide-7
SLIDE 7

> introduce( )

Schoenus compar, Silvermine, Table Mountain NP (R. van Mazijk 2018)
Tetraria ustulata, Marloth NR (R. van Mazijk 2018)
Tetraria thermalis, Silvermine, Table Mountain NP (R. van Mazijk 2018)
slide-8
SLIDE 8

> workshop$goals

slide-9
SLIDE 9

> workshop$goals

  • More reproducible science
slide-10
SLIDE 10

> workshop$goals

  • More reproducible science
  • Save time by:
  • Automating repetitive tasks
  • Eliminating human error
slide-11
SLIDE 11

> workshop$goals

  • More reproducible science
  • Save time by:
  • Automating repetitive tasks
  • Eliminating human error
  • Boost your skills
  • Think about your data programmatically
slide-12
SLIDE 12

Notes & slides will go up here:

tinyurl.com/r-with-ruan

(But I encourage you to make your own notes!)
slide-13
SLIDE 13

> workshop$outline

slide-14
SLIDE 14

> workshop$outline[1:3]

slide-15
SLIDE 15

> workshop$outline[1:3]

DAY 1

Tidy data principles & tidyr
slide-16
SLIDE 16

> workshop$outline[1:3]

DAY 1

Tidy data principles & tidyr

DAY 2

Manipulating data & an intro to dplyr

DAY 3

Extending your data with mutate(), summarise() & friends
slide-17
SLIDE 17

> workshop$outline[-(1:3)]

slide-18
SLIDE 18

> workshop$outline[-(1:3)]

2 dialects of R:

slide-19
SLIDE 19

> workshop$outline[-(1:3)]

2 dialects of R:

  • base: $ [] [[]] apply() which() subset()

slide-20
SLIDE 20

> workshop$outline[-(1:3)]

2 dialects of R:

  • base: $ [] [[]] apply() which() subset()
  • tidyverse

slide-21
SLIDE 21

data <- read.csv("my-data.csv")

slide-22
SLIDE 22

data <- read.csv("my-data.csv")
data1 <- f(data, arg1 = "something")

😆

slide-23
SLIDE 23

data <- read.csv("my-data.csv")
data1 <- f(data, arg1 = "something")
data2 <- g(data1, another.thing = "blah")

😆 😦

slide-24
SLIDE 24

data <- read.csv("my-data.csv")
data1 <- f(data, arg1 = "something")
data2 <- g(data1, another.thing = "blah")
data3 <- h(data2, a.setting = TRUE)

😆 😦 😱

slide-25
SLIDE 25

data <- read.csv("my-data.csv")
data1 <- f(data, arg1 = "something")
data2 <- g(data1, another.thing = "blah")
data3 <- h(data2, a.setting = TRUE)
data4 <- data3[data3$a.column == "cough", ]

😆 😦 😱 🤖

slide-27
SLIDE 27

data <- read.csv("my-data.csv")

slide-28
SLIDE 28

data <- read.csv("my-data.csv")
data <- data

slide-29
SLIDE 29

data <- read.csv("my-data.csv")
data <- f(data, arg1 = "something")

slide-30
SLIDE 30

data <- read.csv("my-data.csv")
data <- g(
  f(data, arg1 = "something"),
  another.thing = "blah"
)

slide-31
SLIDE 31

data <- read.csv("my-data.csv")
data <- h(
  g(
    f(data, arg1 = "something"),
    another.thing = "blah"
  ),
  a.setting = TRUE
)

slide-32
SLIDE 32

data <- read.csv("my-data.csv")
data <- h(
  g(
    f(data, arg1 = "something"),
    another.thing = "blah"
  ),
  a.setting = TRUE
)

😑

slide-36
SLIDE 36

data <- read.csv("my-data.csv")
data <- h(
  g(
    f(data, arg1 = "something"),
    another.thing = "blah"
  ),
  a.setting = TRUE
)

😑

data <- data[data$a.column == "cough", ]

🤭

slide-37
SLIDE 37

%>%

Solution: the pipe!

slide-38
SLIDE 38

%>%

Solution: the pipe! { } [ ] [[ ]] <- = ( ) , " " ' '

slide-39
SLIDE 39

%>%

Solution: the pipe! { } [ ] [[ ]] <- = ( ) , " " ' '

Read: “then”

slide-40
SLIDE 40

data <- read.csv("my-data.csv")
data1 <- f(data, arg1 = "something")
data2 <- g(data1, another.thing = "blah")
data3 <- h(data2, a.setting = TRUE)
data4 <- data3[data3$a.column == "cough", ]

😆 😦 😱 🤖

slide-41
SLIDE 41

data

slide-42
SLIDE 42

data
↓ f()
↓ g()
↓ h()

slide-43
SLIDE 43

data
↓ f()
↓ g()
↓ h()
↓ Some subsetting

slide-44
SLIDE 44

data
↓ f()
↓ g()
↓ h()
↓ Some subsetting
new data

slide-45
SLIDE 45

f(x)

slide-46
SLIDE 46

f(x)          sort(1:10)

slide-47
SLIDE 47

f(x)          sort(1:10)
x %>% f()

slide-48
SLIDE 48

f(x)          sort(1:10)
x %>% f()     1:10 %>% sort()

slide-49
SLIDE 49

f(x, y)       t.test(data$x, data$y)

slide-50
SLIDE 50

f(x, y)       t.test(data$x, data$y)
x %>% f(y)    data$x %>% t.test(data$y)
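A minimal sketch of what the pipe does, assuming the magrittr package (the source of %>%) is installed. It checks that `x %>% f()` gives the same result as `f(x)`, and that in `x %>% f(y)` the piped value fills the first argument. The toy data frame `dat` is hypothetical.

```r
library(magrittr)  # provides the %>% pipe

# f(x) vs x %>% f(): the same call, written two ways
a <- sort(c(3, 1, 2))
b <- c(3, 1, 2) %>% sort()

# f(x, y) vs x %>% f(y): the piped value becomes the FIRST argument,
# and y is passed after it (hypothetical toy data)
dat <- data.frame(
  x = c(5.1, 4.8, 5.6, 5.0),
  y = c(6.2, 5.9, 6.8, 6.4)
)
res1 <- t.test(dat$x, dat$y)
res2 <- dat$x %>% t.test(dat$y)

identical(a, b)                        # TRUE
all.equal(res1$p.value, res2$p.value)  # TRUE
```

Only the p-values are compared: the printed "data:" label differs between the two calls, because the piped version sees its first argument as `.`.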

slide-51
SLIDE 51

data <- read.csv("my-data.csv")
data1 <- f(data, arg1 = "something")
data2 <- g(data1, another.thing = "blah")
data3 <- h(data2, a.setting = TRUE)
data4 <- data3[data3$a.column == "cough", ]

😆 😦 😱 🤖

slide-52
SLIDE 52

data <- read.csv("my-data.csv")
data <- h(
  g(
    f(data, arg1 = "something"),
    another.thing = "blah"
  ),
  a.setting = TRUE
)

😑

data <- data[data$a.column == "cough", ]

🤭

slide-53
SLIDE 53

h(g(f(x)))

slide-54
SLIDE 54

h(g(f(x)))

x %>%

slide-55
SLIDE 55

h(g(f(x)))

x %>%
  f() %>%

slide-56
SLIDE 56

h(g(f(x)))

x %>%
  f() %>%
  g() %>%

slide-57
SLIDE 57

h(g(f(x)))

x %>%
  f() %>%
  g() %>%
  h()

data
↓ f()
↓ g()
↓ h()
↓ Some subsetting
new data
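A small worked comparison, assuming magrittr is installed, with the real functions abs(), sqrt() and sum() standing in for the placeholders f(), g() and h(). Both forms compute the same value; the piped one reads in the order the steps actually happen.

```r
library(magrittr)  # provides the %>% pipe

x <- c(-4, 9, -16)

# Nested: read inside-out (abs() runs first, sum() last)
nested <- sum(sqrt(abs(x)))

# Piped: read top to bottom, "x, then abs(), then sqrt(), then sum()"
piped <- x %>%
  abs() %>%
  sqrt() %>%
  sum()

nested  # 9
piped   # 9
```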
slide-58
SLIDE 58

data <- read.csv("my-data.csv")
data <- h(
  g(
    f(data, arg1 = "something"),
    another.thing = "blah"
  ),
  a.setting = TRUE
)

slide-59
SLIDE 59

data <- read.csv("my-data.csv")
data <- data %>%
  f(arg1 = "something") %>%
  g(another.thing = "blah") %>%
  h(a.setting = TRUE)

slide-60
SLIDE 60

data <- read.csv("my-data.csv")
data <- data %>%
  f(arg1 = "something") %>%
  g(another.thing = "blah") %>%
  h(a.setting = TRUE)

data
↓ f()
↓ g()
↓ h()
↓ Some subsetting
new data
slide-61
SLIDE 61

data <- read.csv("my-data.csv")
data <- data %>%
  f(arg1 = "something") %>%
  g(another.thing = "blah") %>%
  h(a.setting = TRUE)

data <- data[data$a.column == "cough", ]

? ? ? ? ? ? ?

🤮

slide-62
SLIDE 62

> workshop$outline[1:3]

DAY 1

Tidy data principles & tidyr

DAY 2

Manipulating data & an intro to dplyr

DAY 3

Extending your data with mutate(), summarise() & friends
slide-63
SLIDE 63

> workshop$outline[[1]]

DAY 1

Tidy data principles & tidyr

slide-65
SLIDE 65

A motivating example…

slide-66
SLIDE 66

An example data-collection scenario in biology

Kogelberg NR (R. van Mazijk 2019)
Observation Pk (R. van Mazijk 2018)
Near Pearly Beach, Agulhas Plains (R. van Mazijk 2018)
slide-71
SLIDE 71

(A good way to collect your data!)

slide-72
SLIDE 72
slide-73
SLIDE 73

Site 1 | Site 2 | Site 3
Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3

slide-74
SLIDE 74

One way to lay out your collected data… 🤣

Site 1 | Site 2 | Site 3
Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3

slide-76
SLIDE 76

Site 1 | Site 2 | Site 3
Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3

???

slide-77
SLIDE 77

Site 1 | Site 2 | Site 3
Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3

???

🤣

slide-78
SLIDE 78

Site 1 | Site 2 | Site 3
Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3 | Sp 1 | Sp 2 | Sp 3

???

🤣 😥

slide-79
SLIDE 79

Another way… 😭

Sp | Site 1 | Site 2 | Site 3

slide-80
SLIDE 80

The “best” way. (Will make your life easiest in the long term.)

😎😙 👅😴

Sp | Site

slide-81
SLIDE 81

The “best” way. (Will make your life easiest in the long term.)

😎😙 👅😴

Sp | Site

TIDY DATA

slide-82
SLIDE 82

TIDY DATA

CC BY-NC-ND 3.0 Grolemund & Wickham 2017. R for Data Science
slide-84
SLIDE 84

TIDY DATA

CC BY-NC-ND 3.0 Grolemund & Wickham 2017. R for Data Science
  1. Each variable must have its own column
  2. Each observation must have its own row
  3. Each value, therefore, must have its own cell
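To connect the three rules back to the site-by-species example, here is a sketch using tidyr's gather(), one of the tidyr verbs introduced in this workshop. The counts are invented for illustration, and the `count` column is an assumption: the slide's tidy layout only names Sp and Site.

```r
library(tidyr)  # also re-exports the %>% pipe

# Hypothetical counts laid out "another way":
# one row per species, one column per site
wide <- data.frame(
  sp     = c("Sp 1", "Sp 2", "Sp 3"),
  site_1 = c(4, 0, 7),
  site_2 = c(1, 3, 0),
  site_3 = c(0, 2, 5),
  stringsAsFactors = FALSE
)

# Tidy version: each variable (sp, site, count) gets its own column,
# and each observation (one species at one site) gets its own row
tidy <- wide %>%
  gather(key = "site", value = "count", site_1, site_2, site_3)

tidy  # 9 rows: 3 species x 3 sites
```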
slide-85
SLIDE 85

tidyr

An R package all about getting to this:

CC BY-NC-ND 3.0 Grolemund & Wickham 2017. R for Data Science

slide-86
SLIDE 86

# Verbs to tidy your data

slide-87
SLIDE 87

# Verbs to tidy your data

# Untidy observations?
gather()  # if > 1 observation per row
spread()  # if observations live in > 1 row

slide-88
SLIDE 88

# Verbs to tidy your data

# Untidy observations?
gather()    # if > 1 observation per row
spread()    # if observations live in > 1 row

# Untidy variables?
separate()  # if > 1 variable per column
unite()     # if variables live in > 1 column

slide-89
SLIDE 89

Note the following when choosing tidyr verbs:

slide-90
SLIDE 90

Note the following when choosing tidyr verbs:

  • Be clear on what your observations are:
    • That is, which unit of your study “counts” as an observation
    • E.g. leaf traits: plant leaf vs plant individual
    • E.g. reproductive success: egg size vs clutch size
slide-91
SLIDE 91

Note the following when choosing tidyr verbs:

  • Be clear on what your observations are:
    • That is, which unit of your study “counts” as an observation
    • E.g. leaf traits: plant leaf vs plant individual
    • E.g. reproductive success: egg size vs clutch size
    • This will depend on your study &/or data!
slide-92
SLIDE 92

Note the following when choosing tidyr verbs:

  • Be clear on what your observations are:
    • That is, which unit of your study “counts” as an observation
    • E.g. leaf traits: plant leaf vs plant individual
    • E.g. reproductive success: egg size vs clutch size
    • This will depend on your study &/or data!
  • Variables are discrete, separate ideas!
slide-93
SLIDE 93

Note the following when choosing tidyr verbs:

  • Be clear on what your observations are:
    • That is, which unit of your study “counts” as an observation
    • E.g. leaf traits: plant leaf vs plant individual
    • E.g. reproductive success: egg size vs clutch size
    • This will depend on your study &/or data!
  • Variables are discrete, separate ideas!
    • But again, this will depend on your study &/or data!
slide-94
SLIDE 94

# Verbs to tidy your data

# Untidy observations?
gather()    # if > 1 observation per row
spread()    # if observations live in > 1 row

# Untidy variables?
separate()  # if > 1 variable per column
unite()     # if variables live in > 1 column

slide-95
SLIDE 95

# Untidy observations?

slide-96
SLIDE 96

# Untidy observations?
gather()  # if > 1 observation per row

slide-97
SLIDE 97

# Untidy observations?
gather()  # if > 1 observation per row
data %>% gather(key, value, ...)

slide-98
SLIDE 98

# Untidy observations?
gather()  # if > 1 observation per row
data %>% gather(key, value, ...)

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
slide-100
SLIDE 100

# Untidy observations?
gather()  # if > 1 observation per row
data %>% gather(year, cases, `1999`, `2000`)

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
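The cheatsheet example can be run directly, since tidyr ships the `table4a` dataset (TB cases, with one column per year). One detail the one-line version hides: column names that start with a digit need backticks, otherwise gather() treats a bare 1999 as a column position rather than a column name.

```r
library(tidyr)  # also re-exports the %>% pipe

# table4a: country, `1999`, `2000` (two observations per row)
long <- table4a %>%
  gather(year, cases, `1999`, `2000`)

long  # 6 rows; columns: country, year, cases
```

In tidyr 1.0.0 and later (released after these slides), gather() and spread() are superseded, though still available, by pivot_longer() and pivot_wider(); the equivalent here would be `pivot_longer(table4a, c(`1999`, `2000`), names_to = "year", values_to = "cases")`.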
slide-101
SLIDE 101

# Untidy observations?
spread()  # if observations live in > 1 row

slide-102
SLIDE 102

# Untidy observations?
spread()  # if observations live in > 1 row
data %>% spread(key, value)

slide-104
SLIDE 104

# Untidy observations?
spread()  # if observations live in > 1 row
data %>% spread(key, value)

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
slide-105
SLIDE 105

# Untidy observations?
spread()  # if observations live in > 1 row
data %>% spread(type, count)

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
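A runnable version of the spread() example, using tidyr's built-in `table2`, where each country-year observation is split over two rows (one with type "cases", one with type "population").

```r
library(tidyr)  # also re-exports the %>% pipe

# table2: country, year, type ("cases" or "population"), count
wide <- table2 %>%
  spread(type, count)

wide  # 6 rows; columns: country, year, cases, population
```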
slide-106
SLIDE 106

# Untidy variables?

slide-107
SLIDE 107

# Untidy variables?
separate()  # if > 1 variable per column

slide-108
SLIDE 108

# Untidy variables?
separate()  # if > 1 variable per column
data %>% separate(col, into, sep)

slide-109
SLIDE 109

# Untidy variables?
separate()  # if > 1 variable per column
data %>% separate(col, into)

slide-110
SLIDE 110

# Untidy variables?
separate()  # if > 1 variable per column
data %>% separate(col, into)

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
slide-111
SLIDE 111

# Untidy variables?
separate()  # if > 1 variable per column
data %>% separate(rate, c("cases", "pop"))

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
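A runnable version with tidyr's `table3`, where `rate` stores "cases/population" as text. Two optional arguments are added here: `sep = "/"` states the split point explicitly (the default splits on any run of non-alphanumeric characters, which gives the same result for this data), and `convert = TRUE` turns the pieces back into numbers instead of leaving them as character strings.

```r
library(tidyr)  # also re-exports the %>% pipe

# table3: country, year, rate (e.g. "745/19987")
rates <- table3 %>%
  separate(rate, into = c("cases", "pop"), sep = "/", convert = TRUE)

rates  # columns: country, year, cases, pop (cases and pop now numeric)
```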
slide-112
SLIDE 112

# Untidy variables?
unite()  # if variables live in > 1 column

slide-113
SLIDE 113

# Untidy variables?
unite()  # if variables live in > 1 column
data %>% unite(col, ..., sep)

slide-114
SLIDE 114

# Untidy variables?
unite()  # if variables live in > 1 column
data %>% unite(col, ...)

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
slide-115
SLIDE 115

# Untidy variables?
unite()  # if variables live in > 1 column
data %>% unite(year, century, year, sep = "")

CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
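A runnable version with tidyr's `table5`, where the year is split into `century` ("19") and `year` ("99"). The `sep = ""` matters: unite()'s default separator is "_", which would give "19_99" instead of "1999".

```r
library(tidyr)  # also re-exports the %>% pipe

# table5: country, century, year, rate
joined <- table5 %>%
  unite(year, century, year, sep = "")

joined  # columns: country, year ("1999", "2000"), rate
```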
slide-116
SLIDE 116

> demo()

slide-117
SLIDE 117

> demo()

DATASETS:

tinyurl.com/unicorns-day-1
tinyurl.com/prepost-day-1
tinyurl.com/lang-day-1
