Case Study I Bag Plot Data Visualization with ggplot2 ggplot2 2.0 - - PowerPoint PPT Presentation


SLIDE 1

DATA VISUALIZATION WITH GGPLOT2

Case Study I Bag Plot

SLIDE 2

Data Visualization with ggplot2

ggplot2 2.0

  • Write your own extensions
  • Extremely flexible
  • Create bag plot
  • John Tukey (box plots)
  • 2D box plot
SLIDE 3

data set

> dim(df)
[1] 202   2
> head(df)
  type     Value
1    1  99.43952
2    1  99.76982
3    1 101.55871
4    1 100.07051
5    1 100.12929
6    1 101.71506
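The slides do not show how `df` was built. As a minimal sketch, a long-format data set of the same shape could be simulated as below. The seed, the normal distribution, the standard deviation of 1, and the group means of 100 and 150 are all assumptions made for illustration.

```r
# Hypothetical reconstruction of a long-format data set shaped like `df`.
# Seed, distribution, and standard deviation are assumptions.
set.seed(123)
n <- 101  # observations per type, so 202 rows in total

df <- data.frame(
  type  = rep(1:2, each = n),
  Value = c(rnorm(n, mean = 100), rnorm(n, mean = 150))
)

dim(df)
head(df)
```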

SLIDE 4

2 box plots

> ggplot(df, aes(x = type, Value)) +
    geom_boxplot() +
    facet_wrap(~type, ncol = 2, scales = "free")

[Plot: box plots of Value, one free-scale panel per type (1 and 2)]

SLIDE 5

slope plot

> df$ID <- seq_len(nrow(df) / 2)
> ggplot(df, aes(x = type, Value, group = ID)) +
    geom_line(alpha = 0.3)

[Plot: one line per ID joining its Value at type 1 and type 2]

SLIDE 6

Distribution of slope

[Plot: distribution of slope, spanning roughly 40 to 50]

Box plot?
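The slides do not say how the slope values were computed. One plausible reading, sketched here on simulated data, is that each slope is the paired difference between a line's two values, because the two types sit one unit apart on the x axis. The simulated `df` and the pairing by `ID` are assumptions.

```r
# Simulated stand-in for `df` (an assumption, not the course data)
set.seed(123)
n <- 101
df <- data.frame(
  type  = rep(1:2, each = n),
  Value = c(rnorm(n, mean = 100), rnorm(n, mean = 150))
)
df$ID <- seq_len(nrow(df) / 2)  # same pairing as in the slope plot

# Put each pair on one row: columns ID, Value.1, Value.2
wide <- reshape(df, idvar = "ID", timevar = "type", direction = "wide")

# With x positions 1 and 2, rise over run is simply the difference
slope <- wide$Value.2 - wide$Value.1

summary(slope)
boxplot(slope, horizontal = TRUE, xlab = "slope")
```

A single box plot of `slope` summarises the change, but it hides where each pair started, which is what motivates looking at both variables together.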

SLIDE 7

2 distinct variables

> head(dat)
     group1   group2
1  99.43952 149.2896
2  99.76982 150.2569
3 101.55871 149.7533
4 100.07051 149.6525
5 100.12929 149.0484
6 101.71506 149.9550
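The step from the long `df` to the wide `dat` is not shown. A minimal sketch using base R's `unstack()` follows; the simulated input and the assumption that both types have the same number of rows in matching order are mine.

```r
# Simulated stand-in for `df` (an assumption)
set.seed(123)
n <- 101
df <- data.frame(
  type  = rep(1:2, each = n),
  Value = c(rnorm(n, mean = 100), rnorm(n, mean = 150))
)

# One column per type; rows are paired by their order within each type
dat <- setNames(unstack(df, Value ~ type), c("group1", "group2"))

head(dat)
```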

SLIDE 8

Scatter plot

> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()

[Plot: scatter plot of group2 against group1]

SLIDE 9

2D density plot

> library(viridis)
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_density_2d(geom = "tile", aes(fill = ..density..), contour = FALSE) +
    scale_fill_viridis()

[Plot: 2D density of group2 against group1, with fill showing density]
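The `..density..` notation on this slide dates from ggplot2 2.0 and is deprecated from ggplot2 3.4.0 onward in favour of `after_stat()`. The sketch below shows an equivalent call in the newer spelling, assuming ggplot2 3.3.0 or later. It uses the `scale_fill_viridis_c()` scale that ships with ggplot2 instead of the viridis package, and a simulated `dat`; both substitutions are mine.

```r
library(ggplot2)

# Simulated stand-in for `dat` (an assumption)
set.seed(1)
dat <- data.frame(group1 = rnorm(101, 100), group2 = rnorm(101, 150))

p <- ggplot(dat, aes(x = group1, y = group2)) +
  stat_density_2d(
    geom = "raster",
    aes(fill = after_stat(density)),  # newer spelling of ..density..
    contour = FALSE
  ) +
  scale_fill_viridis_c()              # viridis palette bundled with ggplot2
p
```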

SLIDE 10

Bag plot

> library(aplpack)
> bagplot(dat[1:2])

[Plot: bag plot of group2 against group1, with the hull, loop, and bag labelled]

SLIDE 11

aplpack

> library(aplpack)
> plot_data <- compute.bagplot(x = dat$group1, y = dat$group2)
> names(plot_data)
 [1] "center"      "hull.center" "hull.bag"    "hull.loop"
 [5] "pxy.bag"     "pxy.outer"   "pxy.outlier" "hdepths"
 [9] "is.one.dim"  "prdata"      "xy"          "xydata"
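The components returned by `compute.bagplot()` can be passed to ggplot2 as ordinary layers. The sketch below assumes that `hull.loop` and `hull.bag` are two-column matrices of x and y coordinates, which is how aplpack's own plotting code treats them. The helper `hull_df()` and the simulated `dat` are hypothetical.

```r
library(aplpack)
library(ggplot2)

# Simulated stand-in for `dat` (an assumption)
set.seed(1)
dat <- data.frame(group1 = rnorm(101, 100), group2 = rnorm(101, 150))

plot_data <- compute.bagplot(x = dat$group1, y = dat$group2)

# Hypothetical helper: turn a two-column hull matrix into a data frame
# whose column names match the plot's aesthetics
hull_df <- function(m) setNames(as.data.frame(m), c("group1", "group2"))

p <- ggplot(dat, aes(x = group1, y = group2)) +
  geom_polygon(data = hull_df(plot_data$hull.loop), alpha = 0.2) +  # loop
  geom_polygon(data = hull_df(plot_data$hull.bag), alpha = 0.4) +   # bag
  geom_point()
p
```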

SLIDE 12

ggplot2

> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()

[Plot: scatter plot of group2 against group1]

SLIDE 13

ggplot2

> ggplot(dat, aes(x = group1, y = group2)) +
    stat_bag(alpha = 0.2)

[Plot: bag plot of group2 against group1 drawn with stat_bag()]
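`stat_bag()` is not part of ggplot2 itself; it is a custom extension whose source is not shown on the slides. To illustrate the extension mechanism, here is a simplified stand-in that draws only the convex hull of the points, not Tukey's depth-based bag and loop. The names `StatChull` and `stat_chull()` follow the example in ggplot2's extension vignette, and `dat` is simulated.

```r
library(ggplot2)

# Simplified stand-in for a bag stat: keep only the convex-hull vertices
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Layer constructor, following the usual stat_*() pattern
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Simulated stand-in for `dat` (an assumption)
set.seed(1)
dat <- data.frame(group1 = rnorm(101, 100), group2 = rnorm(101, 150))

p <- ggplot(dat, aes(x = group1, y = group2)) +
  stat_chull(alpha = 0.2) +
  geom_point()
p
```

A real bag stat would replace the `chull()` call with a depth computation, for instance by wrapping `aplpack::compute.bagplot()` inside `compute_group`.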

SLIDE 14

Remarks

  • Useful but not popular
  • Poorly understood
  • Learn to use ggplot2 extensions
SLIDE 15

Let’s practice!

SLIDE 16

Case Study II Weather (Part 1)

SLIDE 17

Weather

Source: http://www.edwardtufte.com/

SLIDE 18

present

> dim(present)
[1] 153   5
> head(present, n = 4)
  month day year temp new_day
1     1   1 2016   41       1
2     1   2 2016   37       2
3     1   3 2016   40       3
4     1   4 2016   33       4
> tail(present, n = 4)
    month day year temp new_day
148     5  28 2016   79     148
149     5  29 2016   80     149
150     5  30 2016   73     150
151     5  31 2016   76     151
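The slides do not explain how `new_day` is derived. The printed tail places May 31, 2016 at `new_day` 151, which is what a running day index gives once February 29 is left out. The sketch below builds such an index under that assumption; the data frame name `calendar` is hypothetical and no temperatures are included.

```r
# Daily dates for January to May 2016 (the span printed on the slide)
dates <- seq(as.Date("2016-01-01"), as.Date("2016-05-31"), by = "day")

# Assumption: the leap day is dropped so that every year shares one day index
dates <- dates[format(dates, "%m-%d") != "02-29"]

calendar <- data.frame(
  month   = as.integer(format(dates, "%m")),
  day     = as.integer(format(dates, "%d")),
  year    = as.integer(format(dates, "%Y")),
  new_day = seq_along(dates)  # running day-of-year index
)

tail(calendar, n = 4)
```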

SLIDE 19

Time series

> ggplot(present, aes(x = new_day, y = temp)) +
    geom_line()

[Plot: line of temp against new_day for the current year]

SLIDE 20

past

> str(past)
'data.frame':	7645 obs. of  11 variables:
 $ month    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ day      : num  1 2 3 4 5 6 7 8 9 10 ...
 $ year     : num  1995 1995 1995 1995 1995 ...
 $ temp     : num  44 41 28 31 21 27 42 35 34 29 ...
 $ new_day  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ upper    : num  51 48 57 55 56 62 52 57 54 47 ...
 $ lower    : num  17 15 16 15 21 14 14 12 21 8.5 ...
 $ avg      : num  35.6 35.4 34.9 35.1 35.9 ...
 $ se       : num  2.19 1.83 2.46 2.53 1.92 ...
 $ avg_upper: num  40.2 39.2 40 40.5 39.9 ...
 $ avg_lower: num  31 31.5 29.7 29.8 31.9 ...
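The summary columns in `past` (`upper`, `lower`, `avg`, `se`, `avg_upper`, `avg_lower`) look like per-day statistics across years, and the printed numbers are consistent with `avg` plus or minus about 2.1 times `se`. The sketch below computes columns of that kind on simulated temperatures. The simulated data, the object names `raw` and `by_day`, and the t-based 95% interval are assumptions, not the course's code.

```r
# Simulated daily temperatures for 21 past years (an assumption)
set.seed(42)
raw <- expand.grid(new_day = 1:365, year = 1995:2015)
raw$temp <- 50 + 25 * sin(2 * pi * (raw$new_day - 110) / 365) +
  rnorm(nrow(raw), sd = 8)

# One summary row per day of the year
by_day <- do.call(rbind, lapply(split(raw, raw$new_day), function(d) {
  n  <- nrow(d)
  se <- sd(d$temp) / sqrt(n)
  ci <- qt(0.975, df = n - 1) * se  # half-width of a 95% interval
  data.frame(
    new_day   = d$new_day[1],
    upper     = max(d$temp),        # record high for this day
    lower     = min(d$temp),        # record low for this day
    avg       = mean(d$temp),
    se        = se,
    avg_upper = mean(d$temp) + ci,
    avg_lower = mean(d$temp) - ci
  )
}))

# Attach the per-day summaries to every daily observation
past <- merge(raw, by_day, by = "new_day")
str(past)
```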

SLIDE 21

Each year separately

> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.2)

[Plot: one semi-transparent line of temp against new_day per past year]

SLIDE 22

present + past

> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.4) +
    geom_line(data = present, aes(group = 1), col = "red")

[Plot: past years as semi-transparent lines, with the current year in red]

SLIDE 24

Linerange

[Plot: temp against new_day, with the past data drawn as a linerange per day]

SLIDE 25

Records

[Plot: temp against new_day, with record points marked]
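No code accompanies the linerange and records slides. One way such layers could be built is sketched below: compute the past record low and high per day, draw them with `geom_linerange()`, and mark the days where the current year falls outside that range. The simulated `past` and `present`, and the helper objects `extremes`, `check`, and `records`, are all hypothetical.

```r
library(ggplot2)

# Small simulated stand-ins for `past` and `present` (assumptions)
set.seed(7)
past <- expand.grid(new_day = 1:100, year = 1995:2015)
past$temp <- 40 + 0.3 * past$new_day + rnorm(nrow(past), sd = 8)
present <- data.frame(new_day = 1:100, year = 2016)
present$temp <- 40 + 0.3 * present$new_day + rnorm(100, sd = 8)

# Past record low and high for each day
extremes <- data.frame(
  new_day = sort(unique(past$new_day)),
  lower   = as.vector(tapply(past$temp, past$new_day, min)),
  upper   = as.vector(tapply(past$temp, past$new_day, max))
)

# Flag days where the current year beats a past record
check <- merge(present, extremes, by = "new_day")
check$record <- ifelse(check$temp > check$upper, "high",
                ifelse(check$temp < check$lower, "low", "none"))
records <- check[check$record != "none", ]

p <- ggplot(extremes, aes(x = new_day)) +
  geom_linerange(aes(ymin = lower, ymax = upper), colour = "grey70") +
  geom_line(data = present, aes(y = temp)) +
  geom_point(data = records, aes(y = temp, colour = record))
p
```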

SLIDE 26

Custom legend

[Plot: temp against new_day with a custom legend]

Legend entries:

  • 95% CI range
  • Current year
  • past record high
  • past record low
  • New record high
  • New record low

SLIDE 27

Let’s practice!

SLIDE 28

Case Study II Weather (Part 2)

SLIDE 29

Up to now

[Plot: temp against new_day with the custom legend: 95% CI range, Current year, past record high, past record low, New record high, New record low]

SLIDE 30

Situation

  • Many data frames
  • Plot summary data frame as a layer
  • stat_summary()
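To illustrate the last bullet, `stat_summary()` can compute a per-day summary inside the plot, so no separate summary data frame is needed. The sketch below assumes ggplot2 3.3.0 or later (where the arguments are `fun`, `fun.min`, and `fun.max`) and uses simulated data.

```r
library(ggplot2)

# Simulated stand-in for the raw past temperatures (an assumption)
set.seed(7)
past <- expand.grid(new_day = 1:100, year = 1995:2015)
past$temp <- 40 + 0.3 * past$new_day + rnorm(nrow(past), sd = 8)

# One linerange per day, spanning the minimum to the maximum across years
p <- ggplot(past, aes(x = new_day, y = temp)) +
  stat_summary(geom = "linerange",
               fun = mean, fun.min = min, fun.max = max)
p
```

This handles one summary at a time; layering several different summaries is what motivates writing dedicated stats.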
SLIDE 31

stat_historical()

> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical()

[Plot: historical range of temp against new_day]
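`stat_historical()` is a custom stat whose implementation is not shown on the slide. The sketch below is one hypothetical way to write it: treat the `fill` aesthetic as the carrier of the year, drop the most recent year, and return the record low and high per day for a linerange. The object names, the simulated `my_data`, and the use of `dropped_aes` (recognised by recent ggplot2 versions) are assumptions.

```r
library(ggplot2)

# Hypothetical sketch: summarise every year except the most recent one
StatHistorical <- ggproto("StatHistorical", Stat,
  required_aes = c("x", "y", "fill"),
  dropped_aes  = c("y", "fill"),  # these columns do not survive the summary
  compute_group = function(data, scales) {
    old  <- data[data$fill != max(data$fill), ]  # `fill` carries the year
    days <- sort(unique(old$x))
    data.frame(
      x    = days,
      ymin = as.vector(tapply(old$y, old$x, min)),
      ymax = as.vector(tapply(old$y, old$x, max))
    )
  }
)

stat_historical <- function(mapping = NULL, data = NULL, geom = "linerange",
                            position = "identity", na.rm = FALSE,
                            show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatHistorical, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Simulated stand-in for `my_data`: past years plus the current year
set.seed(7)
my_data <- expand.grid(new_day = 1:100, year = 1995:2016)
my_data$temp <- 40 + 0.3 * my_data$new_day + rnorm(nrow(my_data), sd = 8)

p <- ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
  stat_historical()
head(layer_data(p))  # one row per day with ymin and ymax
```

Because the year filter lives inside the stat, the same `my_data` can feed other layers that look only at the current year.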

SLIDE 32

stat_present()

> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical() +
    stat_present()

[Plot: historical range of temp against new_day, with the current year overlaid]

SLIDE 33

stat_extremes()

> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    stat_present() +
    stat_extremes(aes(colour = ..record..))

[Plot: historical range, current year, and record points coloured by record type]

SLIDE 34

Specific layers

> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    # stat_present() +
    stat_extremes(aes(colour = ..record..))

[Plot: historical range and record points, without the current-year line]

SLIDE 35

Faceting

[Plot: temp against new_day in four panels: PARIS, REYKJAVIK, NEW YORK, LONDON]

SLIDE 36

Let’s practice!

SLIDE 37

Wrap-up

SLIDE 38

  • Design
  • Graphical Data Analysis
  • Communication & Perception
  • Statistics

SLIDE 39

Explain

Inform and Persuade

Explore

Confirm and Analyse

SLIDE 40

Element      Description
Data         The dataset being plotted.
Aesthetics   The scales onto which we map our data.
Geometries   The visual elements used for our data.

SLIDE 41

Element      Description
Data         The dataset being plotted.
Aesthetics   The scales onto which we map our data.
Geometries   The visual elements used for our data.
Facets       Plotting small multiples.
Statistics   Representations of our data to aid understanding.
Coordinates  The space on which the data will be plotted.
Themes       All non-data ink.
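As an illustration of how the seven elements map onto code, the sketch below builds one plot from the built-in `iris` data and labels each line with the element it supplies. The particular geoms, limits, and theme are arbitrary choices made for the example.

```r
library(ggplot2)

p <- ggplot(iris,                                            # Data
            aes(x = Sepal.Length, y = Sepal.Width,
                colour = Species)) +                         # Aesthetics
  geom_point() +                                             # Geometries
  facet_wrap(~ Species) +                                    # Facets
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # Statistics
  coord_cartesian(xlim = c(4, 8)) +                          # Coordinates
  theme_minimal()                                            # Themes
p
```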

SLIDE 42

[Plot: Yield (bushels/acre) against Year (1931, 1932), by Site: Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids]

[Plot: Total sleep time (h) against Eating habits: Carnivore, Herbivore, Insectivore, Omnivore]

SLIDE 43

[Plot: weight categories (Healthy-weight, Obese, Over-weight, Under-weight) for ages 18 to 84, with a residual scale from -5.0 to 5.0]

SLIDE 44

[Plot: Silt, Sand, and Clay composition]

[Plot: eruptions against waiting, with a density scale]

[Plot: Unemployment (%)]

SLIDE 45

[Plot: Iris Sepals, Width against Length, by Species: setosa, versicolor, virginica (Anderson, 1936)]

SLIDE 46

[Plot: bag plot of group2 against group1]

[Plot: temp against new_day with the custom legend: 95% CI range, Current year, past record high, past record low, New record high, New record low]

SLIDE 47

Thank you!