Data Visualization with ggplot2: Introduction (PowerPoint presentation)



slide-1
SLIDE 1

DATA VISUALIZATION WITH GGPLOT2

Introduction

slide-2
SLIDE 2

Data Visualization with ggplot2

Chapter 1

[Figures: price against cut (Fair, Good, Very Good, Premium, Ideal); a bimodal density; sleep_total against vore (Carnivore, Herbivore, Insectivore, Omnivore)]

slide-3
SLIDE 3

Data Visualization with ggplot2

Chapter 2

[Figures: ternary plot with Silt, Sand and Clay axes; price against carat; blood groups (A−, A+, AB−, AB+, B−, B+, O−, O+); Residuals vs Fitted plot for lm(Volume ~ Girth)]

slide-4
SLIDE 4

Data Visualization with ggplot2

Chapter 3

[Figure: map labelled with Brandenburger Tor, Potsdamer Platz, Victory Column, Checkpoint Charlie, Reichstag and Alexander Platz]

slide-5
SLIDE 5

Data Visualization with ggplot2

Chapter 3

[Figure: map labelled with Brandenburger Tor, Potsdamer Platz, Victory Column, Checkpoint Charlie, Reichstag and Alexander Platz]

slide-6
SLIDE 6

Data Visualization with ggplot2

Chapter 4

  • Introduction to grid
  • Manipulating graphical objects
  • ggplot_build()
  • gridExtra
slide-7
SLIDE 7

Data Visualization with ggplot2

Chapter 5

[Figures: 95% CI ranges for group1 and group2; temp against new_day, showing the current year against past record highs and lows, with new record highs and new record lows marked; the same plot in separate panels for Paris, Reykjavik, New York and London]

slide-8
SLIDE 8

DATA VISUALIZATION WITH GGPLOT2

Let’s practice!

slide-9
SLIDE 9

DATA VISUALIZATION WITH GGPLOT2

Box Plots

slide-10
SLIDE 10

Data Visualization with ggplot2

Statistical plots

  • Academic audience
  • 2 common types
  • Box plots
  • Density plots
  • Case study: 2D box plots
slide-11
SLIDE 11

Data Visualization with ggplot2

Box plot

  • John Tukey - Exploratory Data Analysis
  • Visualizing the 5 number summary
slide-12
SLIDE 12

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 2]

slide-13
SLIDE 13

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 2, annotated with the mean and standard deviation]

Mean and standard deviation: not robust!
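The "not robust" remark can be checked numerically. The sketch below uses a small made-up vector (not data from the slides) and appends one extreme value; the mean and standard deviation should shift far more than the median and IQR do.

```r
# Hypothetical sample, chosen only for illustration
v <- c(-1.2, -0.5, -0.1, 0.0, 0.3, 0.8, 1.1)
v_out <- c(v, 25)   # the same sample plus one extreme value

summary_stats <- function(z) {
  c(mean = mean(z), sd = sd(z), median = median(z), IQR = IQR(z))
}

before <- summary_stats(v)
after  <- summary_stats(v_out)

# Absolute change in each statistic caused by the single extreme value
shift <- abs(after - before)
print(round(rbind(before, after, shift), 2))
```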

slide-14
SLIDE 14

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 2, with the minimum marked]

slide-15
SLIDE 15

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 2, with the minimum and Q1 marked]

slide-16
SLIDE 16

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 2, with the minimum, Q1 and Q2 marked]

slide-17
SLIDE 17

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 2, with the minimum, Q1, Q2 and Q3 marked]

slide-18
SLIDE 18

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 2, with the minimum, Q1, Q2, Q3 and maximum marked]

Q2 = median

IQR = interquartile range (Q3 − Q1)

slide-19
SLIDE 19

Data Visualization with ggplot2

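[Figure: values plotted along an axis from −2 to 2, with the five summary values numbered 1 to 5]

5-number summary: each of the four intervals between consecutive summary values holds 25% of the data.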
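As a rough illustration of the five-number summary, base R's `fivenum()` returns the minimum, lower hinge, median, upper hinge and maximum. The vector below is hypothetical, and Tukey's hinges can differ slightly from the quartiles that `quantile()` reports. The 1.5 × IQR fences are the default rule `geom_boxplot()` uses to decide which points are drawn individually.

```r
# Hypothetical sample, chosen only for illustration
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(values)
names(five) <- c("min", "Q1", "Q2", "Q3", "max")

# Interquartile range: the width of the box
iqr <- five[["Q3"]] - five[["Q1"]]

# Points beyond these fences are the ones a box plot shows as individual dots
lower_fence <- five[["Q1"]] - 1.5 * iqr
upper_fence <- five[["Q3"]] + 1.5 * iqr

print(five)
```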

slide-20
SLIDE 20

Data Visualization with ggplot2

[Figure: values plotted along an axis from −2 to 6. Slides 21 to 26 show the same axis.]

slide-27
SLIDE 27

DATA VISUALIZATION WITH GGPLOT2

Let’s practice!

slide-28
SLIDE 28

DATA VISUALIZATION WITH GGPLOT2

Density Plots

slide-29
SLIDE 29

Data Visualization with ggplot2

  • Distribution of univariate data
  • Statistics
  • Probability Density Function
  • Theoretical: based on formula
  • Empirical: based on data

Density plot

[Figure: four theoretical density curves, f(x) against x: Standard Normal Curve, t (8), chi−sq (1), F (2,18)]
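The "theoretical: based on formula" case can be sketched with base R's density functions. The grids and object names below are assumptions made for illustration; only the four distributions and their parameters come from the slide.

```r
# Grids of x values (illustrative choices, not taken from the slides)
x_sym <- seq(-3, 3, length.out = 601)    # for the symmetric curves
x_pos <- seq(0.01, 4, length.out = 400)  # for the positive-only curves

curves <- list(
  normal = dnorm(x_sym),                  # standard normal
  t8     = dt(x_sym, df = 8),             # t with 8 degrees of freedom
  chisq1 = dchisq(x_pos, df = 1),         # chi-squared with 1 degree of freedom
  f2_18  = df(x_pos, df1 = 2, df2 = 18)   # F with 2 and 18 degrees of freedom
)

# A theoretical density integrates to 1 over its support
area_normal <- integrate(dnorm, -Inf, Inf)$value
area_t8     <- integrate(dt, -Inf, Inf, df = 8)$value
```

Because these curves come straight from a formula, no data is involved; the empirical case has to estimate the curve from observations instead.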

slide-30
SLIDE 30

Data Visualization with ggplot2

Kernel Density Estimate (KDE)

A sum of 'bumps' placed at the observations. The kernel function determines the shape of the bumps, while the window width, h, determines their width.

Source: Brian S. Everitt and Torsten Hothorn, A Handbook of Statistical Analyses Using R

slide-31
SLIDE 31

Data Visualization with ggplot2

Example

> x <- c(0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5)
> x
[1] 0.0 1.0 1.1 1.5 1.9 2.8 2.9 3.5

[Figure: the eight observations marked on an x axis running from −2.0 to 5.5]

slide-32
SLIDE 32

Data Visualization with ggplot2

Bumps

[Figure: one bump per observation, plotted against values from −2.0 to 5.5]

slide-33
SLIDE 33

Data Visualization with ggplot2

Sum of bumps

[Figure: the individual bumps and their sum, plotted against values from −2.0 to 5.5]

Many overlapping lines -> higher value -> higher density

Empirical Probability Density Function

mode = value at which the probability density function has its maximum value
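The "sum of bumps" idea can be reproduced by hand. This is a minimal sketch assuming a Gaussian kernel and a bandwidth of 0.4; the grid, step size and variable names are illustrative choices. The result is cross-checked against base R's `density()`.

```r
x <- c(0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5)  # observations from the example slide
h <- 0.4                                        # bandwidth: the sd of each Gaussian bump
step <- 0.01
grid <- seq(-2, 5.5, by = step)                 # same x range as the slide's axis

# One column per observation: a Gaussian bump centred on that observation, scaled by 1/n
bumps <- sapply(x, function(xi) dnorm(grid, mean = xi, sd = h) / length(x))

kde <- rowSums(bumps)            # the estimate is the pointwise sum of the bumps
mode_x <- grid[which.max(kde)]   # mode: where the estimated density peaks
area <- sum(kde) * step          # Riemann-sum approximation of the total area

# Cross-check against density() evaluated on the same grid
ref <- density(x, bw = h, kernel = "gaussian", from = -2, to = 5.5, n = length(grid))
max_diff <- max(abs(ref$y - kde))
```

Each bump has area 1/n, so the summed curve has area close to 1, which is what makes it a density.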

slide-34
SLIDE 34

Data Visualization with ggplot2

Bandwidth - h

[Figure: density estimates of the example values with bw = 0.4 and bw = 0.69, plotted against values from −2.0 to 5.5; annotated values 0.279 and 0.355]

Remember: Density plots are representations of the underlying distribution!
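The effect of the bandwidth can be explored with base R's `density()`; `geom_density()` accepts a `bw` argument in the same spirit. The sketch assumes the eight example values from the earlier slide. For these values the default rule of thumb, `bw.nrd0()`, should come out close to 0.69, which may be where the second bandwidth on the slide comes from.

```r
x <- c(0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5)  # observations from the example slide

default_bw <- bw.nrd0(x)   # rule-of-thumb bandwidth that density() uses by default

narrow <- density(x, bw = 0.4)
wide   <- density(x, bw = 0.69)

# A wider bandwidth spreads each bump out, so the summed curve is flatter
peak_narrow <- max(narrow$y)
peak_wide   <- max(wide$y)

print(c(default_bw = default_bw, peak_narrow = peak_narrow, peak_wide = peak_wide))
```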

slide-35
SLIDE 35

Data Visualization with ggplot2

[Figure: density estimate with bw = 0.4, plotted against values from −2.0 to 5.5]

Intermediate steps: the plot extends beyond the limits of the data.

[Figure: density with bw = 0.4, restricted to the range of the data]

geom_density(): area ≠ 1. This happens for every bandwidth!
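A quick way to see why the restricted curve no longer has area 1 is to compare the full estimate with one cut off at the data range. The sketch uses base R's `density()` rather than `geom_density()` itself, and the helper `area_under()` is a hypothetical name introduced here.

```r
x <- c(0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5)  # observations from the example slide

full       <- density(x, bw = 0.4)                              # extends past the data on both sides
restricted <- density(x, bw = 0.4, from = min(x), to = max(x))  # cut off at the data range

# Trapezoidal approximation of the area under a density() result
area_under <- function(d) {
  sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
}

area_full       <- area_under(full)
area_restricted <- area_under(restricted)

print(c(full = area_full, restricted = area_restricted))
```

The bumps centred on the smallest and largest observations each lose about half their mass when the curve is cut at the data range, which is why the shortfall appears whatever bandwidth is chosen.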

slide-36
SLIDE 36

DATA VISUALIZATION WITH GGPLOT2

Let’s practice!

slide-37
SLIDE 37

DATA VISUALIZATION WITH GGPLOT2

Multiple Groups/Variables

slide-38
SLIDE 38

Data Visualization with ggplot2

Groups

Levels within a factor variable

> head(mammals)
       vore sleep_total
1 Carnivore        12.1
2  Omnivore        17.0
3 Herbivore        14.4
4  Omnivore        14.9
5 Herbivore         4.0
6 Herbivore        14.4
> levels(mammals$vore)
[1] "Carnivore"   "Herbivore"   "Insectivore" "Omnivore"

slide-39
SLIDE 39

Data Visualization with ggplot2

Jittered points

[Figure: jittered points of sleep_total against vore (Carnivore, Herbivore, Insectivore, Omnivore)]

ggplot(mammals, aes(x = vore, y = sleep_total)) +
  geom_point(position = position_jitter(0.2))

slide-40
SLIDE 40

Data Visualization with ggplot2

Box plot

[Figure: box plots of sleep_total against vore (Carnivore, Herbivore, Insectivore, Omnivore)]

ggplot(mammals, aes(x = vore, y = sleep_total)) +
  geom_boxplot()

A box drawn from only 5 observations is meaningless!

slide-41
SLIDE 41

Data Visualization with ggplot2

Box plot (2)

[Figure: box plots of sleep_total against vore, with box widths varying by group]

ggplot(mammals, aes(x = vore, y = sleep_total)) +
  geom_boxplot(varwidth = TRUE)

slide-42
SLIDE 42

Data Visualization with ggplot2

Density plots

ggplot(mammals, aes(x = sleep_total, fill = vore)) +
  geom_density(col = NA, alpha = 0.35)

[Figure: overlapping density curves of sleep_total, filled by vore (Carnivore, Herbivore, Insectivore, Omnivore)]

A group can look abundant in this plot while having only 5 observations!

> # Add weights
> library(dplyr)
> mammals <- mammals %>%
+   group_by(vore) %>%
+   mutate(n = n() / nrow(mammals))
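To see what the weight column contains, the same calculation can be written in base R on a toy data frame. The data below is hypothetical and only stands in for `mammals`; every row ends up carrying its group's share of all observations.

```r
# Hypothetical toy data standing in for `mammals`
toy <- data.frame(
  vore = factor(c("Carnivore", "Carnivore", "Carnivore", "Herbivore",
                  "Herbivore", "Herbivore", "Herbivore", "Insectivore")),
  sleep_total = c(12.1, 10.4, 13.5, 14.4, 4.0, 3.1, 5.3, 19.9)
)

# Base R counterpart of group_by(vore) %>% mutate(n = n() / nrow(toy)):
# each row receives the size of its group divided by the total row count
group_size <- ave(seq_len(nrow(toy)), toy$vore, FUN = length)
toy$n <- group_size / nrow(toy)

# One share per group
shares <- tapply(toy$n, toy$vore, unique)
print(shares)
```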

slide-43
SLIDE 43

Data Visualization with ggplot2

Weighted

[Figure: weighted density curves of sleep_total, filled by vore (Carnivore, Herbivore, Insectivore, Omnivore)]

ggplot(mammals, aes(x = sleep_total, fill = vore)) +
  geom_density(aes(weight = n), col = NA, alpha = 0.35)

slide-44
SLIDE 44

Data Visualization with ggplot2

Violin plot

[Figure: violin plots of sleep_total against vore (Carnivore, Herbivore, Insectivore, Omnivore)]

ggplot(mammals, aes(x = vore, y = sleep_total)) +
  geom_violin()

slide-45
SLIDE 45

Data Visualization with ggplot2

[Figure: weighted violin plots of sleep_total against vore, filled by vore (Carnivore, Herbivore, Insectivore, Omnivore)]

ggplot(mammals, aes(x = vore, y = sleep_total, fill = vore)) +
  geom_violin(aes(weight = n), col = NA)

Weighted

slide-46
SLIDE 46

Data Visualization with ggplot2

Compare separate variables

> dim(faithful)
[1] 272   2
> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

slide-47
SLIDE 47

Data Visualization with ggplot2

First look

ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point()

[Figure: scatter plot of eruptions against waiting]

slide-48
SLIDE 48

Data Visualization with ggplot2

2D density plot

ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_density_2d()

[Figure: 2D density contours of eruptions against waiting]

slide-49
SLIDE 49

Data Visualization with ggplot2

2D density plot

[Figure: tile plot of eruptions against waiting, with fill mapped to density (legend from 0.005 to 0.025)]

ggplot(faithful, aes(x = waiting, y = eruptions)) +
  stat_density_2d(geom = "tile", aes(fill = ..density..), contour = FALSE)

slide-50
SLIDE 50

Data Visualization with ggplot2

Viridis

[Figure: tile plot of eruptions against waiting, with fill mapped to density on the viridis colour scale (legend from 0.005 to 0.025)]

library(viridis)
ggplot(faithful, aes(x = waiting, y = eruptions)) +
  stat_density_2d(geom = "tile", aes(fill = ..density..), contour = FALSE) +
  scale_fill_viridis()

slide-51
SLIDE 51

Data Visualization with ggplot2

Grid of circles

[Figure: grid of circles over eruptions against waiting, with circle size mapped to density (legend from 0.005 to 0.020)]

ggplot(faithful, aes(x = waiting, y = eruptions)) +
  stat_density_2d(geom = "point", aes(size = ..density..), n = 20, contour = FALSE) +
  scale_size(range = c(0, 9))
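The numbers behind a grid like this can be inspected directly. As far as I know, `stat_density_2d()` relies on `MASS::kde2d()` for the estimate, so the sketch below calls that function on the same 20 × 20 grid. It assumes the MASS package is installed; the object names are illustrative.

```r
library(MASS)  # assumed to be available; it ships with standard R installations

# 20 x 20 grid of density estimates over the faithful data, matching n = 20
dens <- kde2d(faithful$waiting, faithful$eruptions, n = 20)

# Long format: one row per grid cell, i.e. one row per circle in the plot
grid_df <- expand.grid(waiting = dens$x, eruptions = dens$y)
grid_df$density <- as.vector(dens$z)

# The grid cell where the estimated density is highest
peak <- grid_df[which.max(grid_df$density), ]
print(peak)
```

Mapping `density` to size in this data frame gives the same kind of picture as the `geom = "point"` call: large circles where observations cluster, vanishing ones elsewhere.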

slide-52
SLIDE 52

DATA VISUALIZATION WITH GGPLOT2

Let’s practice!