Exploring numerical data: Exploratory Data Analysis in R



slide-1
SLIDE 1

Exploring numerical data

EXPLORATORY DATA ANALYSIS IN R

Andrew Bray

Assistant Professor, Reed College

slide-2
SLIDE 2

EXPLORATORY DATA ANALYSIS IN R

Cars dataset

str(cars)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 428 obs. of 19 variables:
 $ name       : chr  "Chevrolet Aveo 4dr" "Chevrolet Aveo LS 4dr hatch" ...
 $ sports_car : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ suv        : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ wagon      : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ minivan    : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ pickup     : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ all_wheel  : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ rear_wheel : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ msrp       : int  11690 12585 14610 14810 16385 13670 15040 13270 ...
 $ dealer_cost: int  10965 11802 13697 13884 15357 12849 14086 12482 ...
 $ eng_size   : num  1.6 1.6 2.2 2.2 2.2 2 2 2 2 2 ...
 $ ncyl       : int  4 4 4 4 4 4 4 4 4 4 ...
 $ horsepwr   : int  103 103 140 140 140 132 132 130 110 130 ...
 $ city_mpg   : int  28 28 26 26 26 29 29 26 27 26 ...
 $ hwy_mpg    : int  34 34 37 37 37 36 36 33 36 33 ...
 $ weight     : int  2370 2348 2617 2676 2617 2581 2626 2612 2606 ...
 $ wheel_base : int  98 98 104 104 104 105 105 103 103 103 ...
 $ length     : int  167 153 183 183 183 174 174 168 168 168 ...
 $ width      : int  66 66 69 68 69 67 67 67 67 67 ...

slide-3
SLIDE 3

EXPLORATORY DATA ANALYSIS IN R

Dotplot

ggplot(data, aes(x = weight)) + geom_dotplot(dotsize = 0.4)

slide-4
SLIDE 4

EXPLORATORY DATA ANALYSIS IN R

Histogram

ggplot(data, aes(x = weight)) + geom_histogram()

slide-5
SLIDE 5

EXPLORATORY DATA ANALYSIS IN R

Density plot

ggplot(data, aes(x = weight)) + geom_density()

slide-8
SLIDE 8

EXPLORATORY DATA ANALYSIS IN R

Boxplot

ggplot(data, aes(x = 1, y = weight)) + geom_boxplot() + coord_flip()
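As a rough sketch of what the box plot summarises, the quartiles and the 1.5 * IQR fences can be computed by hand. The nine weights below are only the values visible in the str(cars) output on slide 2, not the full dataset, and the names lower_fence, upper_fence and outliers are introduced here for illustration.

```r
# Nine weights copied from the str(cars) output (a small sample, not all 428 cars).
weight <- c(2370, 2348, 2617, 2676, 2617, 2581, 2626, 2612, 2606)

# The box spans the first to third quartile, with a line at the median.
q <- quantile(weight, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(weight)

# By default geom_boxplot() draws whiskers no further than 1.5 * IQR from the box;
# points beyond these fences are plotted individually.
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr
outliers <- weight[weight < lower_fence | weight > upper_fence]

print(q)
print(outliers)
```

With so few values the flagged points say little about the real data; the aim is only to show how the box, whiskers and individual points relate to the quartiles.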

slide-12
SLIDE 12

EXPLORATORY DATA ANALYSIS IN R

Faceted histogram

ggplot(cars, aes(x = hwy_mpg)) +
  geom_histogram() +
  facet_wrap(~pickup)

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
Removed 14 rows containing non-finite values (stat_bin).
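Both messages in the output above can usually be addressed directly: the first by choosing a binwidth, the second by dropping rows with missing values before plotting. The sketch below assumes the non-finite values are NAs in hwy_mpg. The data frame toy is a made-up stand-in: its hwy_mpg values are the first ten from the str(cars) output, while the extra NA and the pickup flags are invented for illustration.

```r
library(ggplot2)
library(dplyr)

# Made-up stand-in for `cars`: the NA and the pickup flags are illustrative only.
toy <- data.frame(
  hwy_mpg = c(34, 34, 37, 37, 37, 36, 36, 33, 36, 33, NA),
  pickup  = c(rep(FALSE, 8), rep(TRUE, 3))
)

# Dropping the missing values up front avoids the "non-finite values" warning.
toy_complete <- toy %>%
  filter(!is.na(hwy_mpg))

# An explicit binwidth replaces the default of 30 bins and silences the message.
p <- ggplot(toy_complete, aes(x = hwy_mpg)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~pickup)

print(p)
```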

slide-15
SLIDE 15

Let's practice!

EXPLORATORY DATA ANALYSIS IN R

slide-16
SLIDE 16

Distribution of one variable

EXPLORATORY DATA ANALYSIS IN R

Andrew Bray

Assistant Professor, Reed College

slide-17
SLIDE 17

EXPLORATORY DATA ANALYSIS IN R

Marginal vs. conditional

ggplot(cars, aes(x = hwy_mpg)) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
Removed 14 rows containing non-finite values (stat_bin).

slide-18
SLIDE 18

EXPLORATORY DATA ANALYSIS IN R

Marginal vs. conditional

ggplot(cars, aes(x = hwy_mpg)) +
  geom_histogram() +
  facet_wrap(~pickup)

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
Removed 14 rows containing non-finite values (stat_bin).

slide-19
SLIDE 19

EXPLORATORY DATA ANALYSIS IN R

Building a data pipeline

cars2 <- cars %>%
  filter(eng_size < 2.0)

ggplot(cars2, aes(x = hwy_mpg)) +
  geom_histogram()

slide-20
SLIDE 20

EXPLORATORY DATA ANALYSIS IN R

Building a data pipeline

cars %>%
  filter(eng_size < 2.0) %>%
  ggplot(aes(x = hwy_mpg)) +
  geom_histogram()
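The two versions of the pipeline should produce the same plot: piping the filtered data into ggplot() simply supplies it as the first argument. A minimal sketch of that equivalence follows, using only the first ten eng_size and hwy_mpg values from the str(cars) output. The names cars_head, small_engines, p_two_step, p_piped and same_bins are hypothetical, and binwidth = 1 is set only to avoid the default-bins message.

```r
library(dplyr)
library(ggplot2)

# First ten rows' worth of values from the str(cars) output (not the full data).
cars_head <- data.frame(
  eng_size = c(1.6, 1.6, 2.2, 2.2, 2.2, 2, 2, 2, 2, 2),
  hwy_mpg  = c(34, 34, 37, 37, 37, 36, 36, 33, 36, 33)
)

# Two-step version: store the filtered data, then plot it.
small_engines <- cars_head %>%
  filter(eng_size < 2.0)
p_two_step <- ggplot(small_engines, aes(x = hwy_mpg)) +
  geom_histogram(binwidth = 1)

# Piped version: data flows through %>%, while plot layers are still added with +.
p_piped <- cars_head %>%
  filter(eng_size < 2.0) %>%
  ggplot(aes(x = hwy_mpg)) +
  geom_histogram(binwidth = 1)

# Compare the binned data that each plot would draw.
same_bins <- isTRUE(all.equal(layer_data(p_two_step), layer_data(p_piped)))
print(same_bins)
```

Note the switch of operator partway through: %>% passes data between functions, whereas + adds layers to a plot. Mixing them up is a common source of errors.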

slide-21
SLIDE 21

EXPLORATORY DATA ANALYSIS IN R

Filtered and faceted histogram

cars %>%
  filter(eng_size < 2.0) %>%
  ggplot(aes(x = hwy_mpg)) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

slide-22
SLIDE 22

EXPLORATORY DATA ANALYSIS IN R

Wide bin width

cars %>%
  filter(eng_size < 2.0) %>%
  ggplot(aes(x = hwy_mpg)) +
  geom_histogram(binwidth = 5)
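The effect of a wider binwidth can also be inspected numerically rather than by eye. The sketch below bins the first ten hwy_mpg values from the str(cars) output twice and compares the results. It skips the eng_size filter for brevity, and mpg_head, narrow_bins and wide_bins are names introduced here.

```r
library(ggplot2)

# First ten hwy_mpg values from the str(cars) output (not the full data).
mpg_head <- data.frame(hwy_mpg = c(34, 34, 37, 37, 37, 36, 36, 33, 36, 33))

narrow <- ggplot(mpg_head, aes(x = hwy_mpg)) + geom_histogram(binwidth = 1)
wide   <- ggplot(mpg_head, aes(x = hwy_mpg)) + geom_histogram(binwidth = 5)

# layer_data() returns the binned counts that ggplot2 would draw.
narrow_bins <- layer_data(narrow)
wide_bins   <- layer_data(wide)

print(narrow_bins[, c("xmin", "xmax", "count")])
print(wide_bins[, c("xmin", "xmax", "count")])
```

Every observation is still counted in both versions; the wider bins just pool neighbouring values, which smooths the shape at the cost of detail.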

slide-23
SLIDE 23

EXPLORATORY DATA ANALYSIS IN R

Density plot

cars %>%
  filter(eng_size < 2.0) %>%
  ggplot(aes(x = hwy_mpg)) +
  geom_density()

slide-24
SLIDE 24

EXPLORATORY DATA ANALYSIS IN R

Wide bandwidth

cars %>%
  filter(eng_size < 2.0) %>%
  ggplot(aes(x = hwy_mpg)) +
  geom_density(bw = 5)
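Base R's density() accepts a similar bw argument, which makes it a convenient way to see what a wide bandwidth does to the estimate. This is a sketch on the first ten hwy_mpg values from the str(cars) output, not a reproduction of the plot on the slide; d_default and d_wide are names introduced here.

```r
# First ten hwy_mpg values from the str(cars) output (not the full data).
hwy_mpg <- c(34, 34, 37, 37, 37, 36, 36, 33, 36, 33)

# Default bandwidth, chosen automatically from the data.
d_default <- density(hwy_mpg)

# Explicit wide bandwidth: bw is the standard deviation of the smoothing kernel.
d_wide <- density(hwy_mpg, bw = 5)

# A wider kernel spreads each observation out, so the curve is flatter.
print(c(default_bw = d_default$bw, wide_bw = d_wide$bw))
print(c(default_peak = max(d_default$y), wide_peak = max(d_wide$y)))
```

A bandwidth that is too wide can hide real features such as a second mode, while one that is too narrow produces a bumpy curve that follows noise.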

slide-25
SLIDE 25

Let's practice!

EXPLORATORY DATA ANALYSIS IN R

slide-26
SLIDE 26

Box plots

EXPLORATORY DATA ANALYSIS IN R

Andrew Bray

Assistant Professor, Reed College

slide-39
SLIDE 39

EXPLORATORY DATA ANALYSIS IN R

Side-by-side box plots

ggplot(common_cyl, aes(x = as.factor(ncyl), y = city_mpg)) +
  geom_boxplot()

Warning message:
Removed 11 rows containing non-finite values (stat_boxplot).
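The slides do not show how common_cyl is built. One plausible construction, stated here as an assumption, keeps only the most common cylinder counts (taken to be 4, 6 and 8). The data frame toy_cars and all of its values are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up data: cylinder counts and city mileage, including one missing value.
toy_cars <- data.frame(
  ncyl     = c(4, 4, 6, 6, 8, 8, 8, 3, 12),
  city_mpg = c(28, 26, 20, 19, 15, NA, 14, 35, 12)
)

# Assumed definition: keep only cars with 4, 6 or 8 cylinders.
common_cyl <- toy_cars %>%
  filter(ncyl %in% c(4, 6, 8))

# as.factor() makes ggplot2 treat ncyl as categories, giving one box per value.
# na.rm = TRUE drops missing mileage silently instead of raising a warning.
p <- ggplot(common_cyl, aes(x = as.factor(ncyl), y = city_mpg)) +
  geom_boxplot(na.rm = TRUE)

print(p)
```

Without as.factor(), ncyl would be treated as a continuous variable and the boxes would not be split by cylinder count as intended.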

slide-44
SLIDE 44

Let's practice!

EXPLORATORY DATA ANALYSIS IN R

slide-45
SLIDE 45

Visualization in higher dimensions

EXPLORATORY DATA ANALYSIS IN R

Andrew Bray

Assistant Professor, Reed College

slide-46
SLIDE 46

EXPLORATORY DATA ANALYSIS IN R

Plots for 3 variables

ggplot(cars, aes(x = msrp)) + geom_density() + facet_grid(pickup ~ rear_wheel)

slide-47
SLIDE 47

EXPLORATORY DATA ANALYSIS IN R

Plots for 3 variables

ggplot(cars, aes(x = msrp)) + geom_density() + facet_grid(pickup ~ rear_wheel, labeller = label_both)

slide-48
SLIDE 48

EXPLORATORY DATA ANALYSIS IN R

Plots for 3 variables

ggplot(cars, aes(x = msrp)) +
  geom_density() +
  facet_grid(pickup ~ rear_wheel, labeller = label_both)

table(cars$rear_wheel, cars$pickup)

        FALSE TRUE
  FALSE   306   12
  TRUE     98   12
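The table matters because each facet's density curve is drawn from only the cars in that cell, however few there are. A small sketch that rebuilds the four counts reported above and converts them to shares makes the imbalance explicit. The data frame flags is reconstructed from those counts alone and carries no other information from cars.

```r
# Rebuild the two logical columns so that their cross-tabulation matches the slide.
cell_sizes <- c(306, 12, 98, 12)
flags <- data.frame(
  rear_wheel = rep(c(FALSE, FALSE, TRUE, TRUE), times = cell_sizes),
  pickup     = rep(c(FALSE, TRUE, FALSE, TRUE), times = cell_sizes)
)

# Rows are rear_wheel, columns are pickup, as in table(cars$rear_wheel, cars$pickup).
counts <- table(rear_wheel = flags$rear_wheel, pickup = flags$pickup)

# Share of all cars falling in each facet.
shares <- round(prop.table(counts), 3)

print(counts)
print(shares)
```

Two of the four panels rest on 12 cars each, so the shape of their density curves should be read with more caution than the panel based on 306.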

slide-49
SLIDE 49

EXPLORATORY DATA ANALYSIS IN R

Higher dimensional plots

Shape
Size
Color
Pattern
Movement
x-coordinate
y-coordinate
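As one possible illustration of encoding extra variables this way, the sketch below maps four columns onto x-coordinate, y-coordinate, color and size in a single plot. The choice of a scatterplot and of which variable goes to which aesthetic is an assumption made here, not something shown on the slides. The data are the first eight values of each column from the str(cars) output.

```r
library(ggplot2)

# First eight values of each column from the str(cars) output (not the full data).
cars_head <- data.frame(
  msrp     = c(11690, 12585, 14610, 14810, 16385, 13670, 15040, 13270),
  horsepwr = c(103, 103, 140, 140, 140, 132, 132, 130),
  eng_size = c(1.6, 1.6, 2.2, 2.2, 2.2, 2, 2, 2),
  city_mpg = c(28, 28, 26, 26, 26, 29, 29, 26)
)

# Four variables in one plot: position (x, y), color and size.
p <- ggplot(cars_head,
            aes(x = horsepwr, y = msrp, color = eng_size, size = city_mpg)) +
  geom_point()

# layer_data() shows the computed color and size assigned to each point.
built <- layer_data(p)
print(built[, c("x", "y", "colour", "size")])
```

Each added aesthetic makes the plot harder to read, so it is usually worth checking whether faceting would show the same relationship more clearly.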

slide-50
SLIDE 50

Let's practice!

EXPLORATORY DATA ANALYSIS IN R