R package ggplot2 STAT 133 Gaston Sanchez Department of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

R package ggplot2

STAT 133 Gaston Sanchez

Department of Statistics, UC–Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133

slide-2
SLIDE 2

ggplot2


slide-3
SLIDE 3

Scatterplot with "ggplot2"

Terminology

◮ aesthetic mappings
◮ geometric objects
◮ statistical transformations
◮ scales
◮ non-data elements (themes & elements)
◮ facets

slide-4
SLIDE 4

Considerations

Specifying graphical elements from 3 sources:

◮ The data values (represented by the geometric objects)
◮ The scales and coordinate system (axes, legends)
◮ Plot annotations (background, title, grid lines)
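As a rough illustration of the three sources, the sketch below builds a single plot and marks which call supplies which kind of element. The axis name, title, and theme are illustrative choices, not taken from the slides.

```r
library(ggplot2)

# 1. Data values, drawn by a geometric object
p <- ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  # 2. Scales and coordinate system (axes, legends)
  scale_x_continuous(name = "miles per gallon") +
  coord_cartesian() +
  # 3. Plot annotations (background, title, grid lines)
  ggtitle("hp vs mpg") +
  theme_bw()

print(p)
```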

slide-5
SLIDE 5

Scatterplot with geom_point()

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point()

[Figure: scatterplot of hp against mpg]

slide-6
SLIDE 6

Another geom

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_line()

[Figure: line plot of hp against mpg]

slide-7
SLIDE 7

Mapping Attributes -vs- Setting Attributes

slide-8
SLIDE 8

Increase size of points

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3)

[Figure: scatterplot of hp against mpg with larger points]

slide-9
SLIDE 9

How does it work?

To increase the size of the points, we set the aesthetic size to a constant value of 3 (inside the geom function, outside aes()):

+ geom_point(size = 3)


slide-10
SLIDE 10

Adding color

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3, color = "tomato")

[Figure: scatterplot of hp against mpg with points colored "tomato"]

slide-11
SLIDE 11

Adding color

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3, color = "#259ff8")

[Figure: scatterplot of hp against mpg with points colored "#259ff8"]

slide-12
SLIDE 12

Test your knowledge

Identify the valid hex-color

A) "345677"
B) "#1234567"
C) "#AAAAAA"
D) "#GG0033"
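One way to check the candidates is a small pattern test. The sketch below assumes the question is about the six-digit "#RRGGBB" form: a "#" followed by exactly six hexadecimal digits (0-9, A-F). The helper name is_hex6 is hypothetical, and it deliberately ignores the eight-digit "#RRGGBBAA" form that R also accepts.

```r
# Hypothetical helper: TRUE when x is "#" followed by exactly six hex digits
is_hex6 <- function(x) {
  grepl("^#[0-9A-Fa-f]{6}$", x)
}

candidates <- c(A = "345677", B = "#1234567", C = "#AAAAAA", D = "#GG0033")

# Letters of the candidates that pass the check
names(candidates)[is_hex6(candidates)]
```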

slide-13
SLIDE 13

Changing points shape

# 'shape' accepts 'pch' values
ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3, color = "tomato", shape = 15)

[Figure: scatterplot of hp against mpg drawn with filled squares]
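To see which symbol each 'pch' code produces, a small reference chart can help. This is only a sketch: the grid layout, the column names gx and gy, and the fill color are arbitrary choices.

```r
library(ggplot2)

# One row per 'pch' code from 0 to 25, placed on a 7-column grid
shapes <- data.frame(
  pch = 0:25,
  gx  = 0:25 %% 7,
  gy  = -(0:25 %/% 7)
)

shape_chart <- ggplot(shapes, aes(x = gx, y = gy)) +
  geom_point(aes(shape = pch), size = 4, fill = "tomato") +
  geom_text(aes(label = pch), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +  # use the pch numbers as they are
  theme_void()

print(shape_chart)
```

The fill color only shows on codes 21 to 25, which are the shapes that have a separate interior.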

slide-14
SLIDE 14

Setting and Mapping

Aesthetic attributes can be either mapped to a variable (via aes()) or set to a constant value (outside aes()).

# mapping aesthetic color
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = cyl))

# setting aesthetic color
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(color = "blue")
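A common slip is to put a constant inside aes(). The sketch below contrasts the two cases; the object names are illustrative. With aes(color = "blue"), ggplot2 treats "blue" as a data value with a single level, so the points get the first color of the default discrete palette (not blue) and a legend appears.

```r
library(ggplot2)

p_base <- ggplot(mtcars, aes(x = mpg, y = hp))

# Setting: a constant outside aes(); points are drawn in blue, no legend
p_set <- p_base + geom_point(color = "blue")

# Mapping a constant by mistake: "blue" is treated as a one-level variable
p_mapped <- p_base + geom_point(aes(color = "blue"))

# The layers record the difference: only p_mapped carries a colour mapping
names(p_set$layers[[1]]$mapping)
names(p_mapped$layers[[1]]$mapping)
```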

slide-15
SLIDE 15

Geom text, and mapping labels

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_text(aes(label = gear))

[Figure: hp against mpg, with each car drawn as its gear value (3, 4 or 5) instead of a point]

slide-16
SLIDE 16

Changing axis labels and title

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3, color = "tomato") +
  xlab("miles per gallon") +
  ylab("horse power") +
  ggtitle("Scatter plot with ggplot2")

[Figure: "Scatter plot with ggplot2"; x axis: miles per gallon, y axis: horse power]

slide-17
SLIDE 17

Changing background theme

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3, color = "tomato") +
  xlab("miles per gallon") +
  ylab("horse power") +
  ggtitle("Scatter plot with ggplot2") +
  theme_bw()

[Figure: "Scatter plot with ggplot2" on a white background; x axis: miles per gallon, y axis: horse power]

slide-18
SLIDE 18

Your turn: Replicate this figure

[Figure: scatterplot; x axis: miles per gallon, y axis: horse power; point size legend for disp (100 to 400)]

slide-19
SLIDE 19

Your turn: Replicate this figure

◮ Specify a color in hex notation
◮ Change the shape of the point symbol
◮ Map disp to attribute size of points
◮ Add axis labels

slide-20
SLIDE 20

Your turn

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(size = disp), color = "#ff6666", shape = 17) +
  xlab("miles per gallon") +
  ylab("horse power")

slide-21
SLIDE 21

More geoms

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm")

[Figure: scatterplot of hp against mpg with a fitted linear regression line]

slide-22
SLIDE 22

More geoms

We can map a variable to the color aesthetic. Here we map color to cyl (cylinders).

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = cyl))

[Figure: scatterplot of hp against mpg; points colored on a continuous scale by cyl (legend from 4 to 8)]

slide-23
SLIDE 23

More geoms

If the variable that maps to color is a factor, the color scale changes from a continuous gradient to a discrete palette with one color per level.

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = as.factor(cyl)))

[Figure: scatterplot of hp against mpg; points colored by as.factor(cyl) with discrete legend entries 4, 6, 8]

slide-24
SLIDE 24

Your turn: Replicate this figure

[Figure: "Scatter plot with ggplot2"; x axis: miles per gallon, y axis: displacement; color legend for factor(am), size legend for hp (100 to 300)]

slide-25
SLIDE 25

Your turn: example 2

◮ Map hp to attribute size of points
◮ Map am (as factor) to attribute color of points
◮ Add an alpha transparency of 0.7
◮ Change the shape of the point symbol
◮ Add axis labels
◮ Add a title

slide-26
SLIDE 26

Your turn: example 2

ggplot(data = mtcars, aes(x = mpg, y = disp)) +
  geom_point(aes(size = hp, color = factor(am)), alpha = 0.7) +
  xlab("miles per gallon") +
  ylab("displacement") +
  ggtitle("Scatter plot with ggplot2")

slide-27
SLIDE 27

Histogram

ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

[Figure: histogram of mpg; x axis: mpg, y axis: count]

slide-28
SLIDE 28

Boxplots

ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

[Figure: boxplots of mpg for each level of factor(cyl): 4, 6, 8]

slide-29
SLIDE 29

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +
  geom_density()

[Figure: density curve of mpg; x axis: mpg, y axis: density]

slide-30
SLIDE 30

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +
  geom_density(fill = "#c6b7f5")

[Figure: density curve of mpg with the area under the curve filled; x axis: mpg, y axis: density]

slide-31
SLIDE 31

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +
  geom_density(fill = "#c6b7f5", alpha = 0.4)

[Figure: density curve of mpg with a semi-transparent fill; x axis: mpg, y axis: density]

slide-32
SLIDE 32

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +
  geom_line(stat = 'density', col = "#a868c0", size = 2)

[Figure: density of mpg drawn as a thick line with no fill; x axis: mpg, y axis: density]

slide-33
SLIDE 33

Density Curves

ggplot(data = mtcars, aes(x = mpg)) +
  geom_density(fill = '#a868c0') +
  geom_line(stat = 'density', col = "#a868c0", size = 2)

[Figure: filled density curve of mpg with a thick outline in the same color; x axis: mpg, y axis: density]

slide-34
SLIDE 34

ggplot objects


slide-35
SLIDE 35

Plot objects

You can assign a plot to a new object (this won’t plot anything):

mpg_hp <- ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3, color = "tomato")

To show the actual plot associated with the object mpg_hp, use the function print():

print(mpg_hp)

slide-36
SLIDE 36

"ggplot2" objects

Working with ggplot objects, we can:

◮ define a basic plot, to which we can add or change layers without typing everything again
◮ render it on screen with print()
◮ describe its structure with summary()
◮ render it to disk with ggsave()
◮ save a cached copy to disk with save()
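The operations listed above can be sketched in one short session. The file paths are temporary placeholders and the object names are illustrative; the PDF format and the figure dimensions are arbitrary choices.

```r
library(ggplot2)

mpg_hp <- ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3, color = "tomato")

# Add a layer without retyping the base plot
mpg_hp_lm <- mpg_hp + geom_smooth(method = "lm")

print(mpg_hp_lm)    # render on the active graphics device
summary(mpg_hp_lm)  # describe data, mappings, and layers

# Render to disk (placeholder path)
pdf_path <- tempfile(fileext = ".pdf")
ggsave(pdf_path, plot = mpg_hp_lm, width = 6, height = 4)

# Cache the R object itself, then restore it into a fresh environment
rda_path <- tempfile(fileext = ".RData")
save(mpg_hp_lm, file = rda_path)
restored <- new.env()
load(rda_path, envir = restored)
```

Note the difference between the last two steps: ggsave() writes the rendered image, while save() stores the plot object so it can be reloaded and modified later.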

slide-37
SLIDE 37

Adding a title and axis labels to a ggplot2 object:

mpg_hp +
  ggtitle("Scatter plot with ggplot2") +
  xlab("miles per gallon") +
  ylab("horse power")

[Figure: "Scatter plot with ggplot2"; x axis: miles per gallon, y axis: horse power]

slide-38
SLIDE 38

Your turn: example 3

Create the following ggplot object:

# ggplot object
obj <- ggplot(data = mtcars,
              aes(x = mpg, y = hp, label = rownames(mtcars)))

Add more layers to the object "obj" in order to replicate the figure on the following slide.

slide-39
SLIDE 39

Your turn: example 3

[Figure: "Scatter plot"; car names drawn as text labels at their (mpg, hp) positions, colored by factor(am); x axis: miles per gallon, y axis: horse power]

slide-40
SLIDE 40

Your turn: example 3

obj +
  geom_text(aes(color = factor(am))) +
  ggtitle("Scatter plot") +
  xlab("miles per gallon") +
  ylab("horse power")

slide-41
SLIDE 41

Scales


slide-42
SLIDE 42

Scales

◮ The scales component encompasses the ideas of both axes and legends on plots, e.g.:
  ◮ Axes can be continuous or discrete
  ◮ Legends involve colors, symbol shapes, size, etc.
    – scale_x_continuous
    – scale_y_continuous
    – scale_color_manual
◮ ggplot2 will often automatically generate appropriate scales for plots
◮ Explicitly adding a scale component overrides the default scale

slide-43
SLIDE 43

Continuous axis scales

Use scale_x_continuous() to modify the default values in the x axis

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(am))) +
  scale_x_continuous(name = "miles per gallon",
                     limits = c(10, 40),
                     breaks = c(10, 20, 30, 40))

slide-44
SLIDE 44

Continuous axis scales

[Figure: scatterplot of hp against "miles per gallon" with x axis breaks at 10, 20, 30, 40; points colored by factor(am)]

slide-45
SLIDE 45

Continuous axis scales

Use scale_y_continuous() to modify the default values in the y axis

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(am))) +
  scale_x_continuous(name = "miles per gallon",
                     limits = c(10, 40),
                     breaks = c(10, 20, 30, 40)) +
  scale_y_continuous(name = "horsepower",
                     limits = c(50, 350),
                     breaks = seq(50, 350, by = 50))

slide-46
SLIDE 46

Continuous axis scales

[Figure: scatterplot with x axis "miles per gallon" (breaks 10 to 40) and y axis "horsepower" (breaks 50 to 350 in steps of 50); points colored by factor(am)]

slide-47
SLIDE 47

Example: color scale

Use scale_color_manual() to modify the colors associated with the levels of a factor

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(am))) +
  scale_color_manual(values = c("orange", "purple"))

slide-48
SLIDE 48

Example: color scale

[Figure: scatterplot of hp against mpg; points colored orange or purple by factor(am)]

slide-49
SLIDE 49

Example: modifying legend

How a legend is modified depends on the type of scale behind it (e.g. color, shape, size)

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(am))) +
  scale_color_manual(values = c("orange", "purple"),
                     name = "transmission",
                     labels = c('no', 'yes'))

slide-50
SLIDE 50

Example: modifying legend

[Figure: scatterplot of hp against mpg; color legend titled "transmission" with labels "no" and "yes"]

slide-51
SLIDE 51

Faceting


slide-52
SLIDE 52

Faceting with facet_wrap()

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(color = "#3088f0") +
  facet_wrap(~ cyl)

[Figure: scatterplots of hp against mpg in three panels, one per cyl value (4, 6, 8)]

slide-53
SLIDE 53

Faceting with facet_grid()

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(color = "#3088f0") +
  facet_grid(cyl ~ .)

[Figure: scatterplots of hp against mpg in three stacked rows, one per cyl value (4, 6, 8)]

slide-54
SLIDE 54

Faceting with facet_grid()

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point(color = "#3088f0") +
  facet_grid(. ~ cyl)

[Figure: scatterplots of hp against mpg in three side-by-side columns, one per cyl value (4, 6, 8)]

slide-55
SLIDE 55

Layered Grammar

About "ggplot2"

◮ Key concept: layer (layered grammar of graphics)
◮ Designed to work in a layered fashion
  ◮ Starting with a layer showing the data
  ◮ Then adding layers of annotations and statistical transformations
◮ Core idea: independent components combined together
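The layered idea can be sketched by building a plot one step at a time and counting its layers after each step. The object names and the annotation text and position are arbitrary choices for illustration.

```r
library(ggplot2)

# Start with the data and aesthetic mappings; no layers yet
p0 <- ggplot(data = mtcars, aes(x = mpg, y = hp))

# Layer 1: show the data
p1 <- p0 + geom_point()

# Layer 2: add a statistical transformation (linear fit)
p2 <- p1 + geom_smooth(method = "lm")

# Layer 3: add an annotation
p3 <- p2 + annotate("text", x = 30, y = 300, label = "linear fit")

# Number of layers stored in each object
sapply(list(p0, p1, p2, p3), function(p) length(p$layers))
```

Each intermediate object remains usable on its own, which is what makes the components independent.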

slide-56
SLIDE 56

Some Concepts

◮ the data to be visualized
◮ a set of aesthetic mappings describing how variables are mapped to aesthetic attributes
◮ geometric objects, geoms, representing what you see on the plot (points, lines, etc.)
◮ statistical transformations, stats, summarizing data in various ways
◮ scales that map values in the data space to values in an aesthetic space
◮ a coordinate system, coord, describing how data coordinates are mapped to the plane of the graphic
◮ a faceting specification describing how to break up the data into subsets and how to display those subsets
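To tie the concepts together, the sketch below spells out each component in a single plot, including defaults that are normally left implicit. Which defaults to write out is an illustrative choice; the shorter forms used elsewhere in these slides produce the same kind of plot.

```r
library(ggplot2)

p <- ggplot(
  data = mtcars,                    # the data
  mapping = aes(x = mpg, y = hp)    # aesthetic mappings
) +
  geom_point(stat = "identity") +                 # geom (default stat written out)
  stat_smooth(method = "lm", geom = "smooth") +   # stat (default geom written out)
  scale_x_continuous(name = "mpg") +              # scale
  coord_cartesian() +                             # coordinate system
  facet_wrap(~ cyl)                               # faceting specification

print(p)
```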