Stats with geoms: Intermediate Data Visualization with ggplot2



SLIDE 1

Stats with geoms

INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2

Rick Scavetta

Founder, Scavetta Academy

SLIDE 2

ggplot2, course 2

  • Statistics
  • Coordinates
  • Facets
  • Data Visualization Best Practices

SLIDE 3

Statistics layer

Two categories of functions:

  • Called from within a geom
  • Called independently

stat_

SLIDE 4

geom_ <-> stat_

p <- ggplot(iris, aes(x = Sepal.Width))
p + geom_histogram()

SLIDE 5

geom_ <-> stat_

p <- ggplot(iris, aes(x = Sepal.Width))
p + geom_histogram()
p + geom_bar()

SLIDE 6

geom_ <-> stat_

p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))
p + geom_bar()
p + stat_count()

SLIDE 7

The geom_/stat_ connection

stat_           geom_
stat_bin()      geom_histogram(), geom_freqpoly()
stat_count()    geom_bar()
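
One way to check the pairing in this table is to compare the data that each layer computes. The sketch below uses ggplot2's layer_data() helper on the mtcars plot from the previous slide; the variable names bars_from_geom and bars_from_stat are only illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))

# geom_bar() uses stat_count() by default, and stat_count() draws with
# bars by default, so both layers should compute the same counts.
bars_from_geom <- layer_data(p + geom_bar())
bars_from_stat <- layer_data(p + stat_count())

# Compare the computed count column of the two layers
identical(bars_from_geom$count, bars_from_stat$count)
```

The same comparison should work for the stat_bin() row, for example with geom_histogram() on iris.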

SLIDE 8

stat_smooth()

ggplot(iris, aes(x = Sepal.Length,
                 y = Sepal.Width,
                 color = Species)) +
  geom_point() +
  geom_smooth()

geom_smooth() using method = 'loess' and formula 'y ~ x'

SLIDE 9

stat_smooth(se = FALSE)

ggplot(iris, aes(x = Sepal.Length,
                 y = Sepal.Width,
                 color = Species)) +
  geom_point() +
  geom_smooth(se = FALSE)

geom_smooth() using method = 'loess' and formula 'y ~ x'

SLIDE 10

geom_smooth(span = 0.4)

ggplot(iris, aes(x = Sepal.Length,
                 y = Sepal.Width,
                 color = Species)) +
  geom_point() +
  geom_smooth(se = FALSE, span = 0.4)

geom_smooth() using method = 'loess' and formula 'y ~ x'

SLIDE 11

geom_smooth(method = "lm")

ggplot(iris, aes(x = Sepal.Length,
                 y = Sepal.Width,
                 color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", …)

SLIDE 12

geom_smooth(fullrange = TRUE)

ggplot(iris, aes(x = Sepal.Length,
                 y = Sepal.Width,
                 color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", fullrange = TRUE)
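
To see what fullrange changes, the x range of each fitted line can be inspected with layer_data(). This is a minimal sketch, not taken from the slides: se = FALSE is added here only to keep the computed data small, and narrow and wide are illustrative names.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length,
                      y = Sepal.Width,
                      color = Species)) +
  geom_point()

# Default: each group's line spans only that group's own x values
narrow <- layer_data(p + geom_smooth(method = "lm", se = FALSE), 2)

# fullrange = TRUE: each line is extended across the whole x scale
wide <- layer_data(
  p + geom_smooth(method = "lm", se = FALSE, fullrange = TRUE), 2
)

# x range of the fitted line per group (groups follow the Species levels)
tapply(narrow$x, narrow$group, range)
tapply(wide$x, wide$group, range)
```

With fullrange = TRUE the lines are extrapolated beyond each species' observed data, so they should be read with care.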

SLIDE 13

The geom_/stat_ connection

stat_            geom_
stat_bin()       geom_histogram(), geom_freqpoly()
stat_count()     geom_bar()
stat_smooth()    geom_smooth()

SLIDE 14

Other stat_ functions

stat_             geom_
stat_boxplot()    geom_boxplot()

SLIDE 15

Other stat_ functions

stat_             geom_
stat_boxplot()    geom_boxplot()
stat_bindot()     geom_dotplot()
stat_bin2d()      geom_bin2d()
stat_binhex()     geom_hex()

SLIDE 16

Other stat_ functions

stat_             geom_
stat_boxplot()    geom_boxplot()
stat_bindot()     geom_dotplot()
stat_bin2d()      geom_bin2d()
stat_binhex()     geom_hex()
stat_contour()    geom_contour()
stat_quantile()   geom_quantile()
stat_sum()        geom_count()

SLIDE 17

Let's practice!

INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2

SLIDE 18

Stats: sum and quantile

INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2

Rick Scavetta

Founder, Scavetta Academy

SLIDE 19

Recall from course 1

Cause of over-plotting               Solutions
1. Large datasets                    Alpha-blending, hollow circles, point size
2. Aligned values on a single axis   As above, plus change position
3. Low-precision data                Position: jitter
4. Integer data                      Position: jitter

SLIDE 20

Plot counts to overcome over-plotting

Cause of over-plotting               Solutions                                    Here...
1. Large datasets                    Alpha-blending, hollow circles, point size
2. Aligned values on a single axis   As above, plus change position
3. Low-precision data                Position: jitter                             geom_count()
4. Integer data                      Position: jitter                             geom_count()

SLIDE 21

Low precision (& integer) data

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))
p + geom_point()

SLIDE 22

Jittering may give the wrong impression

p + geom_jitter(alpha = 0.5, width = 0.1, height = 0.1)

SLIDE 23

geom_count()

p + geom_count()

SLIDE 24

The geom/stat connection

geom_           stat_
geom_count()    stat_sum()

SLIDE 25

stat_sum()

p + stat_sum()
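
To see what the stat contributes here, the computed data can be inspected with layer_data(). A minimal sketch, assuming the p object from the low-precision data slide maps Sepal.Length and Sepal.Width; the name counts is illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))

# stat_sum() collapses identical (x, y) pairs into one row each and
# adds n (number of observations) and prop (share within the group).
counts <- layer_data(p + geom_count())

head(counts[, c("x", "y", "n", "prop")])

# Every observation is counted exactly once
sum(counts$n) == nrow(iris)
```

Point size is mapped to n by default, which is why overlapping observations show up as larger points.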

SLIDE 26

Over-plotting can still be a problem!

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_count(alpha = 0.4)

SLIDE 27

geom_quantile()

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_count(alpha = 0.4)

SLIDE 28

Dealing with heteroscedasticity

library(AER)
data(Journals)
p <- ggplot(Journals, aes(log(price/ci…),
                          log(subs))) +
  geom_point(alpha = 0.5) +
  labs(...)
p

SLIDE 29

Using geom_quantile()

p + geom_quantile(quantiles = c(0.05, 0.50 …))
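
The code on these two slides is cut off in the source, so the following is only a sketch under stated assumptions: that the truncated ratio is price per citation (the Journals data in AER has a citations column), and that the quantiles are the 5th, 50th and 95th percentiles. The labs() call is left out because its arguments are not shown. geom_quantile() needs the quantreg package to be installed.

```r
library(ggplot2)
library(AER)       # provides the Journals data set
data("Journals")

# Assumption: the x variable on the slide is log(price / citations)
p <- ggplot(Journals, aes(log(price / citations), log(subs))) +
  geom_point(alpha = 0.5)

# Assumption: three quantile regression lines (5%, 50%, 95%)
q <- p + geom_quantile(quantiles = c(0.05, 0.50, 0.95))
q
```

If the spread of the points changes along the x axis, the outer quantile lines will not run parallel to the median line, which is what makes heteroscedasticity visible.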

SLIDE 30

The geom/stat connection

geom_              stat_
geom_count()       stat_sum()
geom_quantile()    stat_quantile()

SLIDE 31

Ready for exercises!

INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2

SLIDE 32

Stats outside geoms

INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2

Rick Scavetta

Founder, Scavetta Academy

SLIDE 33

Basic plot

ggplot(iris, aes(x = Species,
                 y = Sepal.Length)) +
  geom_jitter(width = 0.2)

SLIDE 34

Calculating statistics

set.seed(123)
xx <- rnorm(100)
mean(xx)
[1] 0.09040591
mean(xx) + (sd(xx) * c(-1, 1))
[1] -0.822410 1.003222

SLIDE 35

Calculating statistics

set.seed(123)
xx <- rnorm(100)

# Hmisc
library(Hmisc)
smean.sdl(xx, mult = 1)
      Mean       Lower       Upper
0.09040591 -0.82240997  1.00322179

# ggplot2
mean_sdl(xx, mult = 1)
           y     ymin     ymax
1 0.09040591 -0.82241 1.003222

SLIDE 36

stat_summary()

ggplot(iris, aes(x = Species,
                 y = Sepal.Length)) +
  stat_summary(fun.data = mea…,
               fun.args = l…)

Uses geom_pointrange() by default

SLIDE 37

stat_summary()

ggplot(iris, aes(x = Species,
                 y = Sepal.Length)) +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.data = me…,
               fun.args = li…,
               geom = "errorbar",
               width = 0.1)
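
The stat_summary() arguments on these slides are cut off in the source. A plausible reading, given the preceding slide, is mean_sdl with mult = 1; the sketch below assumes exactly that and is not a reconstruction of the original code. It also uses fun instead of fun.y, since fun.y was deprecated in ggplot2 3.3.0. mean_sdl() requires the Hmisc package to be installed.

```r
library(ggplot2)  # mean_sdl() additionally needs Hmisc installed

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  # Layer 1: one point per species at the mean
  stat_summary(fun = mean, geom = "point") +
  # Layer 2: error bars at mean +/- 1 standard deviation
  # (assumption: mean_sdl with mult = 1, as on the statistics slide)
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               geom = "errorbar",
               width = 0.1)

# Inspect the summary that the second layer computed
bars <- layer_data(p, 2)
bars[, c("x", "y", "ymin", "ymax")]
```

Leaving out geom = "errorbar" in a single stat_summary(fun.data = ...) call would draw the same summary with geom_pointrange().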

SLIDE 38

Not recommended!

SLIDE 39

95% confidence interval

ERR <- qt(0.975, length(xx) - 1) * (sd(xx) / sqrt(length(xx)))
mean(xx)
0.09040591
mean(xx) + (ERR * c(-1, 1)) # 95% CI
-0.09071657 0.27152838
mean_cl_normal(xx)
         y        ymin      ymax
0.09040591 -0.09071657 0.2715284
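
The interval on this slide is the mean plus or minus the 97.5% t quantile (n - 1 degrees of freedom) times the standard error. As a minimal sketch, the same calculation can be wrapped in a helper that returns the y/ymin/ymax columns stat_summary() expects from fun.data. The name mean_ci_t is hypothetical and not part of ggplot2 or Hmisc.

```r
# Hypothetical helper: t-based confidence interval for the mean,
# returned in the y / ymin / ymax layout used by fun.data functions.
mean_ci_t <- function(x, conf.int = 0.95) {
  n <- length(x)
  err <- qt(1 - (1 - conf.int) / 2, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = mean(x), ymin = mean(x) - err, ymax = mean(x) + err)
}

set.seed(123)
xx <- rnorm(100)
ci <- mean_ci_t(xx)
ci
```

A helper like this could be passed as stat_summary(fun.data = mean_ci_t) when Hmisc is not available.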

SLIDE 40

Other stat_ functions

stat_              Description
stat_summary()     Summarize y values at distinct x values.
stat_function()    Compute y values from a function of x values.
stat_qq()          Perform calculations for a quantile-quantile plot.

SLIDE 41

MASS::mammals

SLIDE 42

Normal distribution

mam.new <- data.frame(body = log10(mam…))
ggplot(mam.new, aes(x = body)) +
  geom_histogram(aes(y = ..density..)) +
  geom_rug() +
  stat_function(fun = dnorm, color = "…",
                args = list(mean = mea…,
                            sd = sd(ma…)))
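
Several arguments on this slide are cut off in the source. The sketch below fills them with assumptions that are labeled in the comments: the body column of MASS::mammals, a red curve, and the sample mean and standard deviation as the dnorm parameters. It uses after_stat(density), which replaces the older ..density.. notation in ggplot2 3.3.0 and later.

```r
library(ggplot2)

# Assumption: the truncated log10(...) call refers to MASS::mammals$body
mam.new <- data.frame(body = log10(MASS::mammals$body))

p <- ggplot(mam.new, aes(x = body)) +
  # Histogram on the density scale so it is comparable to dnorm()
  geom_histogram(aes(y = after_stat(density))) +
  geom_rug() +
  # Assumption: red curve, parameterised by the sample mean and sd
  stat_function(fun = dnorm, color = "red",
                args = list(mean = mean(mam.new$body),
                            sd = sd(mam.new$body)))
p

# The curve layer holds x values and the matching dnorm() densities
curve_data <- layer_data(p, 3)
head(curve_data[, c("x", "y")])
```

stat_function() evaluates fun on a grid of x values across the plot range, so the curve does not depend on the individual observations beyond the two parameters passed in args.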

SLIDE 43

QQ plot

ggplot(mam.new, aes(sample = body)) +
  stat_qq() +
  geom_qq_line(col = "red")
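
The calculation behind stat_qq() can be compared with base R's qqnorm(). This is a minimal sketch, assuming the sample aesthetic is the body column (the only column of mam.new) and that mam.new is built from MASS::mammals$body; qq and base_qq are illustrative names.

```r
library(ggplot2)

# Assumption: mam.new holds log10 body weights from MASS::mammals
mam.new <- data.frame(body = log10(MASS::mammals$body))

p <- ggplot(mam.new, aes(sample = body)) +
  stat_qq() +
  geom_qq_line(col = "red")

# stat_qq() returns sorted sample values and their theoretical quantiles
qq <- layer_data(p, 1)

# Base R computes the same pairs, in the original data order
base_qq <- qqnorm(mam.new$body, plot.it = FALSE)

all.equal(qq$theoretical, sort(base_qq$x))
all.equal(qq$sample, sort(base_qq$y))
```

Points that follow the red line closely suggest that the log-transformed values are approximately normal.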

SLIDE 44

Your turn!

INTERMEDIATE DATA VISUALIZATION WITH GGPLOT2