Scatter plots: INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2



SLIDE 1

Scatter plots

INTRODUCTION TO DATA VISUALIZATION WITH GGPLOT2

Rick Scavetta

Founder, Scavetta Academy

SLIDE 2

48 geometries

geom_*

abline    contour     dotplot    jitter     pointrange  ribbon    spoke
area      count       errorbar   label      polygon     rug       step
bar       crossbar    errorbarh  line       qq          segment   text
bin2d     curve       freqpoly   linerange  qq_line     sf        tile
blank     density     hex        map        quantile    sf_label  violin
boxplot   density2d   histogram  path       raster      sf_text   vline
col       density_2d  hline      point      rect        smooth

SLIDE 3

Common plot types

Plot type        Possible geoms
Scatter plots    points, jitter, abline, smooth, count

SLIDE 4

Scatter plots

Each geom can accept specific aesthetic mappings, e.g. geom_point():

Essential
x, y

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

SLIDE 5

Scatter plots

Each geom can accept specific aesthetic mappings, e.g. geom_point():

Essential    Optional
x, y         alpha, color, fill, shape, size, stroke

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()
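The optional aesthetics can either be mapped to a data column inside aes() or set to a fixed value as an attribute of the geom. The following is a minimal sketch of that difference, not part of the original slides; the names p_mapped and pts and the particular size and alpha values are arbitrary choices for illustration.

```r
library(ggplot2)

# Mapped aesthetic: color varies with a data column, so it goes inside aes().
# Fixed attributes: size and alpha are constants, so they go outside aes().
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point(size = 3, alpha = 0.6)

# layer_data() returns the values ggplot2 computed for the first layer:
# a single constant size, and one color per species.
pts <- layer_data(p_mapped, 1)
unique(pts$size)
length(unique(pts$colour))
```

Printing p_mapped draws the plot; layer_data() only inspects what would be drawn.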

SLIDE 6

Geom-specific aesthetic mappings

Control aesthetic mappings of each layer independently:

# These result in the same plot!
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(col = Species))
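One way to check that a global mapping and a layer-level mapping give the same result is to compare the computed layer data. This is a hedged sketch rather than anything shown in the slides; p_global, p_layer and same_colours are hypothetical names.

```r
library(ggplot2)

# Mapping declared globally in ggplot(): every layer inherits it.
p_global <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()

# Mapping declared in the layer: only this geom_point() uses it.
p_layer <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(col = Species))

# Compare the point colors computed for the first layer of each plot.
same_colours <- identical(layer_data(p_global, 1)$colour,
                          layer_data(p_layer, 1)$colour)
same_colours
```

The two forms only differ once more layers are added: a global mapping is inherited by every layer, while a layer-level mapping stays local to that geom.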

SLIDE 7

head(iris, 3) # Raw data

  Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1  setosa          5.1         3.5          1.4         0.2
2  setosa          4.9         3.0          1.4         0.2
3  setosa          4.7         3.2          1.3         0.2

library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise_all(mean) -> iris.summary

iris.summary # Summary statistics

# A tibble: 3 x 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa             5.01        3.43         1.46       0.246
2 versicolor         5.94        2.77         4.26       1.33
3 virginica          6.59        2.97         5.55       2.03

SLIDE 8

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  # Inherits both data and aes from ggplot()
  geom_point() +
  # Different data, but inherited aes
  geom_point(data = iris.summary, shape = 15, size = 5)

SLIDE 9

Shape attribute values
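The chart that belongs on this slide did not survive extraction. As a rough stand-in, the sketch below draws the 26 point shapes (codes 0 to 25) with their numbers. The grid layout, the red fill and the names shapes and p_shapes are assumptions made for illustration. Shapes 21 to 25 are the ones that take both a color (outline) and a fill (interior).

```r
library(ggplot2)

# One row per point shape (0 to 25), laid out on a 7-column grid.
shapes <- data.frame(
  shape = 0:25,
  x = (0:25) %% 7,
  y = -((0:25) %/% 7)
)

p_shapes <- ggplot(shapes, aes(x = x, y = y)) +
  # scale_shape_identity() uses the numbers in `shape` directly as shape codes.
  # The fill only shows up for shapes 21 to 25.
  geom_point(aes(shape = shape), size = 5, fill = "red") +
  geom_text(aes(label = shape), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +
  theme_void()
```

Printing p_shapes renders the reference chart.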

SLIDE 10

Example

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point() +
  geom_point(data = iris.summary, shape = 21, size = 5,
             fill = "black", stroke = 2)

SLIDE 11

On-the-fly stats by ggplot2

See the second course for the stats layer.

Note: Avoid plotting only the mean without a measure of spread, e.g. the standard deviation.

SLIDE 12

position = "jitter"

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point(position = "jitter")

SLIDE 13

geom_jitter()

A short-cut to geom_point(position = "jitter")

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_jitter()
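Jitter is random, so the plot changes slightly on every run. If that matters, one option is position_jitter(), which exposes the amount of noise and a seed. The sketch below is an illustration only; the width, height and seed values are arbitrary assumptions, not values from the slides.

```r
library(ggplot2)

# position_jitter() sets the noise range (width and height, in data units)
# and a seed, so the same jittered positions are produced on every run.
p_jitter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1, seed = 136))

# Computing the layer twice gives identical x positions because of the seed.
x_first <- layer_data(p_jitter, 1)$x
x_second <- layer_data(p_jitter, 1)$x
identical(x_first, x_second)
```

The same position_jitter() object can also be passed to geom_jitter() through its position argument.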

SLIDE 14

Don't forget to adjust alpha

Combine jittering with alpha-blending if necessary

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_jitter(alpha = 0.6)

SLIDE 15

Hollow circles also help

shape = 1 is a hollow circle.

With hollow circles it is not necessary to also use alpha-blending.

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_jitter(shape = 1)

SLIDE 16

Let's practice!

SLIDE 17

Histograms

SLIDE 18

Common plot types

Plot type        Possible geoms
Scatter plots    points, jitter, abline, smooth, count
Bar plots        histogram, bar, col, errorbar
Line plots       line, path

SLIDE 19

Histograms

ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram()

A plot of binned values, i.e. a statistical function

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

SLIDE 20

Default of 30 even bins

ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram()

A plot of binned values, i.e. a statistical function

# Default bin width:
diff(range(iris$Sepal.Width))/30
[1] 0.08

SLIDE 21

Intuitive and meaningful bin widths

ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(binwidth = 0.1)

Always set a meaningful bin width for your data.

No spaces between bars.

SLIDE 22

Re-position tick marks

ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(binwidth = 0.1, center = 0.05)

Always set a meaningful bin width for your data.

No spaces between bars.

X axis labels are between bars.
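The labels land between bars because binwidth = 0.1 with center = 0.05 places the bin edges on multiples of 0.1. The sketch below checks this on the computed bins; p_hist, bins and edges_on_tenths are hypothetical names, and the tolerance only allows for floating-point error.

```r
library(ggplot2)

p_hist <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(binwidth = 0.1, center = 0.05)

# layer_data() returns the computed bins, including their edges and counts.
bins <- layer_data(p_hist, 1)
head(bins[, c("xmin", "xmax", "count")])

# A bin centered on 0.05 with width 0.1 spans 0 to 0.1,
# so every lower edge should be a multiple of 0.1.
edges_on_tenths <- all(abs(bins$xmin * 10 - round(bins$xmin * 10)) < 1e-6)
edges_on_tenths
```

Because the edges coincide with round values, the default axis breaks fall on the boundaries between bars rather than in the middle of them.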

SLIDE 23

Different Species

ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = .1, center = 0.05)

SLIDE 24

Default position is "stack"

ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = .1, center = 0.05, position = "stack")

SLIDE 25

position = "dodge"

ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = .1, center = 0.05, position = "dodge")

SLIDE 26

position = "fill"

ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(binwidth = .1, center = 0.05, position = "fill")
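The three position adjustments can be compared by building the same histogram once per position. This is a sketch under the assumption that a shared base plot is acceptable; p_base, positions, plots and fill_tops are hypothetical names not used in the slides.

```r
library(ggplot2)

p_base <- ggplot(iris, aes(x = Sepal.Width, fill = Species))

# One histogram per position adjustment, with the same bins in each.
positions <- c("stack", "dodge", "fill")
plots <- lapply(positions, function(pos) {
  p_base + geom_histogram(binwidth = 0.1, center = 0.05, position = pos)
})
names(plots) <- positions

# With position = "fill" the bars are rescaled to proportions,
# so the tallest stacked bar reaches 1. Empty bins give missing values.
fill_tops <- layer_data(plots[["fill"]], 1)$ymax
max(fill_tops, na.rm = TRUE)
```

Printing plots[["stack"]], plots[["dodge"]] or plots[["fill"]] draws the corresponding version. With "fill" the y axis shows proportions per bin, so the information about absolute counts is lost.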

SLIDE 27

Final Slide

SLIDE 28

Bar plots

SLIDE 29

Bar Plots, with a categorical X-axis

Use geom_bar() or geom_col()

Geom          Stat          Action
geom_bar()    "count"       Counts the number of cases at each x position
geom_col()    "identity"    Plot actual values

All positions from before are available.

Two types: absolute counts and distributions.
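The relationship between the two geoms can be sketched by drawing the same bars both ways. The example below uses the mpg dataset that ships with ggplot2, which is an assumption made so the code is self-contained; it is not a dataset from the slides, and p_bar, p_col and class_counts are hypothetical names.

```r
library(ggplot2)

# geom_bar() counts the rows in each class itself (stat = "count").
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# geom_col() plots the values as given (stat = "identity"),
# so the counts are computed beforehand.
class_counts <- as.data.frame(table(class = mpg$class))
p_col <- ggplot(class_counts, aes(x = class, y = Freq)) +
  geom_col()

# Both layers end up with the same bar heights.
heights_bar <- layer_data(p_bar, 1)$y
heights_col <- layer_data(p_col, 1)$y
all(heights_bar == heights_col)
```

In practice geom_bar() suits raw, one-row-per-case data, while geom_col() suits data that has already been summarised.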


SLIDE 32

Habits of mammals

str(sleep)

'data.frame': 76 obs. of 3 variables:
 $ vore : Factor w/ 4 levels "carni","herbi",..: 1 4 2 4 2 2 1 1 2 2 ...
 $ total: num 12.1 17 14.4 14.9 4 14.4 8.7 10.1 3 5.3 ...
 $ rem  : num NA 1.8 2.4 2.3 0.7 2.2 1.4 2.9 NA 0.6 ...

SLIDE 33

Bar plot

ggplot(sleep, aes(vore)) +
  geom_bar()

SLIDE 34

Plotting distributions instead of absolute counts

library(dplyr)
library(tidyr)

# Calculate descriptive statistics:
iris %>%
  select(Species, Sepal.Width) %>%
  gather(key, value, -Species) %>%
  group_by(Species) %>%
  summarise(avg = mean(value),
            stdev = sd(value)) -> iris_summ_long

iris_summ_long

Species       avg   stdev
setosa        3.43  0.38
versicolor    2.77  0.31
virginica     2.97  0.32

SLIDE 35

Plotting distributions

ggplot(iris_summ_long, aes(x = Species, y = avg)) +
  geom_col() +
  geom_errorbar(aes(ymin = avg - stdev, ymax = avg + stdev), width = 0.1)

SLIDE 36

Let's practice!

SLIDE 37

Line plots

SLIDE 38

Common plot types

Plot type        Possible geoms
Scatter plots    points, jitter, abline, smooth, count
Bar plots        histogram, bar, col, errorbar
Line plots       line, path

SLIDE 39

Beaver

str(beaver)

'data.frame': 101 obs. of 3 variables:
 $ time  : POSIXct, format: "2000-01-01 09:30:00" "2000-01-01 09:40:00" "2000-01-01 09:50:00" ...
 $ temp  : num 36.6 36.7 36.9 37.1 37.2 ...
 $ active: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

SLIDE 40

Beaver

ggplot(beaver, aes(x = time, y = temp)) +
  geom_line()

SLIDE 41

Beaver

ggplot(beaver, aes(x = time, y = temp, color = factor(active))) +
  geom_line()

SLIDE 42

The fish catch dataset

str(fish)

'data.frame': 427 obs. of 3 variables:
 $ Species: Factor w/ 7 levels "Pink","Chum",..: 1 2 3 4 5 6 7 1 2 3 ...
 $ Year   : int 1950 1950 1950 1950 1950 1950 1950 1951 1951 1951 ...
 $ Capture: int 100600 139300 64100 30500 0 23200 10800 259000 155900 51200 ...

SLIDE 43

Linetype aesthetic

ggplot(fish, aes(x = Year, y = Capture, linetype = Species)) +
  geom_line()

SLIDE 44

Size aesthetic

ggplot(fish, aes(x = Year, y = Capture, size = Species)) +
  geom_line()

SLIDE 45

Color aesthetic

ggplot(fish, aes(x = Year, y = Capture, color = Species)) +
  geom_line()

SLIDE 46

Aesthetics for categorical variables

SLIDE 47

Fill aesthetic with geom_area()

ggplot(fish, aes(x = Year, y = Capture, fill = Species)) +
  geom_area()

SLIDE 48

Using position = "fill"

ggplot(fish, aes(x = Year, y = Capture, fill = Species)) +
  geom_area(position = "fill")
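The fish dataset is not bundled with ggplot2, so the sketch below uses a small made-up data frame to show what position = "fill" does to an area plot. The catch data frame, its values and the names p_area_fill and area_tops are hypothetical and serve only as an illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for the fish catch dataset.
catch <- data.frame(
  Year = rep(2000:2003, times = 2),
  Species = rep(c("A", "B"), each = 4),
  Capture = c(10, 20, 30, 40, 30, 20, 10, 40)
)

p_area_fill <- ggplot(catch, aes(x = Year, y = Capture, fill = Species)) +
  geom_area(position = "fill")

# After position = "fill" the stacked areas are rescaled to proportions,
# so the top of the stack sits at 1 (100 %).
area_tops <- layer_data(p_area_fill, 1)$ymax
max(area_tops, na.rm = TRUE)
```

As with histograms, the proportional view shows each species' share per year but hides the change in total catch.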

SLIDE 49

geom_ribbon()

ggplot(fish, aes(x = Year, y = Capture, fill = Species)) +
  geom_ribbon(aes(ymax = Capture, ymin = 0), alpha = 0.3)

SLIDE 50

Let's practice!