Graphics in R, STAT 133, Gaston Sanchez, Department of Statistics



slide-1
SLIDE 1

Graphics in R

STAT 133, Gaston Sanchez

Department of Statistics, UC Berkeley
gastonsanchez.com
github.com/gastonstat/stat133
Course web: gastonsanchez.com/stat133

slide-2
SLIDE 2

R Graphics


slide-3
SLIDE 3

Understanding Graphics in R

2 main graphics systems: "graphics" & "grid"

slide-4
SLIDE 4

Basics of Graphics in R

Graphics Systems

◮ "graphics" and "grid" are the two main graphics systems in R
◮ "graphics" is the traditional system, also referred to as base graphics
◮ "grid" provides low-level functions for programming plotting functions

slide-5
SLIDE 5

Basics of Graphics in R

Graphics Engine

◮ Underneath "graphics" and "grid" there is the package "grDevices"
◮ "grDevices" is the graphics engine in R
◮ It provides the graphics devices and support for colors and fonts
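As a small illustration of what "grDevices" provides, the sketch below opens a PDF graphics device, draws on it, and closes it, then uses one of the package's color helpers. The output file is a throwaway temporary file and the object names (out_file, navy_rgb) are chosen here for illustration only.

```r
# "grDevices" is attached by default; loading it explicitly is harmless
library(grDevices)

# open a PDF graphics device (the path is a temporary file, an assumption
# made for this example)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)

# any plotting call now draws on the open device
plot(1:10, col = rgb(0.2, 0.4, 0.8), pch = 19)

# close the device so the file is written to disk
invisible(dev.off())

# color support: convert a color name to its red, green, blue components
navy_rgb <- col2rgb("navy")
```

Other devices such as png() follow the same open, draw, close pattern.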

slide-6
SLIDE 6

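[Diagram: "grDevices" (the graphics engine) sits underneath the two systems "graphics" and "grid". Packages such as maps, diagram, and plotrix are built on "graphics"; lattice and ggplot2 are built on "grid". tikzDevice, JavaGD, and Cairo are add-on graphics devices.]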

slide-7
SLIDE 7

Basics of Graphics in R

Package "graphics"

The package "graphics" is the traditional system; it provides functions for complete plots, as well as low-level facilities. Many other graphics packages are built on top of "graphics", such as "maps", "diagram", "pixmap", and many more.

slide-8
SLIDE 8

Understanding Graphics in R

Package "grid"

The "grid" package does not provide functions for drawing complete plots. "grid" is not used directly to produce statistical plots. Instead, it is used to build other graphics packages like "lattice" or "ggplot2".


slide-9
SLIDE 9

In this course

◮ In this course we’ll focus on the packages "graphics" and "ggplot2"
◮ "graphics" is the traditional plotting system in R, and many functions and packages are built on top of it.
◮ "ggplot2" excels at providing graphics for visualizing multivariate data sets (in data.frame format), while taking care of many issues for superior visual displays.

slide-10
SLIDE 10

R Graphics by Paul Murrell


slide-11
SLIDE 11

Some Resources

◮ R Graphics by Paul Murrell (book and webpage)
◮ R Graphics Cookbook by Winston Chang (http://www.cookbook-r.com/Graphs/)
◮ ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
◮ R Graphs Cookbook by Hrishi Mittal
◮ Graphics for Statistics and Data Analysis with R by Kevin Keen

slide-12
SLIDE 12

Traditional (Base) Graphics


slide-13
SLIDE 13

Base Graphics in R

Types of graphics functions

Graphics functions can be divided into two main types:

◮ high-level functions produce complete plots, e.g. barplot(), boxplot(), dotchart()
◮ low-level functions add further output to an existing plot, e.g. text(), points(), legend()
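The distinction between the two types of functions can be sketched as follows: one high-level call creates the plot, and several low-level calls then add to it. The data and labels are made up for illustration.

```r
# high-level function: produces a complete plot
x <- 1:10
y <- x^2
plot(x, y, main = "High-level plot with low-level additions")

# low-level functions: add further output to the existing plot
points(x, y / 2, pch = 19, col = "tomato")
text(x = 5, y = 80, labels = "y = x^2")
legend("topleft", legend = c("y", "y / 2"),
       pch = c(1, 19), col = c("black", "tomato"))
```

Calling a low-level function such as points() before any high-level plot exists results in an error, because there is no plot to add to.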

slide-14
SLIDE 14

The plot() function

◮ plot() is the most important high-level function in traditional graphics
◮ The first argument to plot() provides the data to plot
◮ The provided data can take different forms: e.g. vectors, factors, matrices, data frames.
◮ To be more precise, plot() is a generic function
◮ You can create your own plot() method function
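Because plot() is generic, writing a function named plot.<class> is one way to give objects of that class their own plot. The sketch below uses a hypothetical S3 class called "coin_tosses"; the class, its constructor, and the choice of what to draw are all assumptions made for illustration.

```r
# constructor for a hypothetical S3 class "coin_tosses" (illustrative name)
coin_tosses <- function(n) {
  tosses <- sample(c("heads", "tails"), size = n, replace = TRUE)
  structure(list(tosses = tosses), class = "coin_tosses")
}

# plot() method: running proportion of heads against the toss number
plot.coin_tosses <- function(x, ...) {
  prop_heads <- cumsum(x$tosses == "heads") / seq_along(x$tosses)
  plot(seq_along(prop_heads), prop_heads, type = "l",
       xlab = "toss", ylab = "proportion of heads", ...)
  invisible(x)
}

set.seed(133)
my_tosses <- coin_tosses(100)

# plot() dispatches on the class and calls plot.coin_tosses()
plot(my_tosses)
```

The same mechanism explains why plot() behaves differently for factors, tables, and data frames: each class has its own method.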

slide-15
SLIDE 15

Basic Plots with plot()

In its basic form, we can use plot() to make graphics of:

◮ one single variable
◮ two variables
◮ multiple variables

slide-16
SLIDE 16

Plots of One Variable


slide-17
SLIDE 17

High-level graphics of a single variable

Function   Data        Description
plot()     numeric     scatterplot
plot()     factor      barplot
plot()     1-D table   barplot

numeric can be either a vector or a 1-D array (e.g. row or column from a matrix)

slide-18
SLIDE 18

One variable objects

◮ Vector / Factor
◮ row (data.frame)
◮ row (matrix)
◮ column (data.frame)
◮ column (matrix)
◮ 1-D table

slide-19
SLIDE 19

plot() of one variable

# plot numeric vector
num_vec <- (c(1:10))^2
plot(num_vec)

# plot factor
set.seed(4)
abc <- factor(sample(c('A', 'B', 'C'), 20, replace = TRUE))
plot(abc)

# plot 1D-table
abc_table <- table(abc)
plot(abc_table)

slide-20
SLIDE 20

plot() of one variable

[Figure: three panels. Plot of num_vec against Index; barplot of the factor abc (levels A, B, C); plot of abc_table (levels A, B, C).]

slide-21
SLIDE 21

More high-level graphics of a single variable

Function       Data      Description
barplot()      numeric   barplot
pie()          numeric   pie chart
dotchart()     numeric   dotplot
boxplot()      numeric   boxplot
hist()         numeric   histogram
stripchart()   numeric   1-D scatterplot
stem()         numeric   stem-and-leaf plot

slide-22
SLIDE 22

Plots of one variable

# barplot numeric vector
barplot(num_vec)

# pie chart
pie(1:3)

# dot plot
dotchart(num_vec)

slide-23
SLIDE 23

Plots of one variable

[Figure: three panels. Barplot of num_vec; pie chart of 1:3 (slices 1, 2, 3); dot chart of num_vec.]

slide-24
SLIDE 24

Plots of one variable

# boxplot
boxplot(num_vec)

# histogram
hist(num_vec)

# strip chart (1-D scatterplot)
stripchart(num_vec)

# stem-and-leaf
stem(num_vec)

slide-25
SLIDE 25

boxplot()

# boxplot
boxplot(iris$Sepal.Length)

[Figure: boxplot of iris$Sepal.Length; axis from 4.5 to 8.0.]

slide-26
SLIDE 26

hist()

# histogram
hist(iris$Sepal.Length)

[Figure: "Histogram of iris$Sepal.Length"; x-axis iris$Sepal.Length, y-axis Frequency.]

slide-27
SLIDE 27

Test your knowledge

Which option does not apply to histograms?

A) adjacent bars (no gaps)
B) area of bars indicates proportions
C) bins of equal length
D) bars can be reordered

slide-28
SLIDE 28

stripchart()

# strip-chart (1-D scatter plot)
# (for small sample sizes)
stripchart(num_vec)

[Figure: strip chart of num_vec; axis from 20 to 100.]

slide-29
SLIDE 29

stem()

# stem-and-leaf plot
# (for small sample sizes)
stem(num_vec)
##
##   The decimal point is 1 digit(s) to the right of the |
##
##    0 | 1496
##    2 | 56
##    4 | 9
##    6 | 4
##    8 | 1
##   10 | 0

slide-30
SLIDE 30

Kernel Density Curve

◮ Surprisingly, R does not have a specific function to plot density curves
◮ R does have the density() function which computes a kernel density estimate
◮ We can pass a "density" object to plot() in order to get a density curve.

slide-31
SLIDE 31

Kernel Density Curve

# kernel density curve
dens <- density(num_vec)
plot(dens)

[Figure: "density.default(x = num_vec)"; x-axis labeled "N = 10 Bandwidth = 19.41", y-axis Density.]
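A "density" object can also be handed to the low-level function lines(), which makes it possible to overlay a density curve on a histogram. The sketch below assumes the iris data set that ships with R; the variable and the styling are chosen only for illustration.

```r
# kernel density estimate of sepal length
sepal_dens <- density(iris$Sepal.Length)

# histogram on the density scale (freq = FALSE) so both share the y-axis
hist(iris$Sepal.Length, freq = FALSE,
     main = "Histogram with kernel density curve",
     xlab = "Sepal length")

# overlay the density curve on the existing histogram
lines(sepal_dens, lwd = 2, col = "steelblue")
```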

slide-32
SLIDE 32

Test your knowledge

What type of plot is based on the five-number summary?

A) bar chart
B) box plot
C) histogram
D) scatterplot

slide-33
SLIDE 33

Plots of Two Variables


slide-34
SLIDE 34

High-level graphics of two variables

Function   Data                          Description
plot()     numeric, numeric              scatterplot
plot()     numeric, factor               stripcharts
plot()     factor, numeric               boxplots
plot()     factor, factor                spineplot
plot()     2-column numeric matrix       scatterplot
plot()     2-column numeric data.frame   scatterplot
plot()     2-D table                     mosaicplot

slide-35
SLIDE 35

Two variable objects

◮ 2 numeric vectors
◮ num vector, factor
◮ factor, num vector
◮ 2 factors
◮ 2-column (numeric matrix)
◮ 2-column (numeric data.frame)
◮ 2-D table (frequency or crosstable)

slide-36
SLIDE 36

Plots of two variables

# plot numeric, numeric
plot(iris$Petal.Length, iris$Sepal.Length)

# plot numeric, factor
plot(iris$Petal.Length, iris$Species)

# plot factor, numeric
plot(iris$Species, iris$Petal.Length)

# plot factor, factor
plot(iris$Species, iris$Species)

slide-37
SLIDE 37

Plots of two variables

# plot numeric, numeric
plot(iris$Petal.Length, iris$Sepal.Length)

[Figure: scatterplot; x-axis iris$Petal.Length, y-axis iris$Sepal.Length.]

slide-38
SLIDE 38

Plots of two variables

# plot numeric, factor
plot(iris$Petal.Length, iris$Species)

[Figure: x-axis iris$Petal.Length, y-axis iris$Species (ticks from 1.0 to 3.0).]

slide-39
SLIDE 39

Plots of two variables

# plot factor, numeric
plot(iris$Species, iris$Petal.Length)

[Figure: boxplots of iris$Petal.Length for setosa, versicolor, and virginica.]

slide-40
SLIDE 40

Plots of two variables

# plot factor, factor
plot(iris$Species, iris$Species)

[Figure: spineplot; x and y axes both show setosa, versicolor, virginica; scale from 0.0 to 1.0.]

slide-41
SLIDE 41

Plots of two variables

# some fake data
set.seed(1)

# hair color
hair <- factor(
  sample(c('blond', 'black', 'brown'), 100, replace = TRUE))

# eye color
eye <- factor(
  sample(c('blue', 'brown', 'green'), 100, replace = TRUE))

slide-42
SLIDE 42

Plots of two variables

# plot factor, factor
plot(hair, eye)

[Figure: spineplot; x-axis black, blond, brown; y-axis blue, brown, green; scale from 0.0 to 1.0.]

slide-43
SLIDE 43

More high-level graphics of two variables

Function          Data               Description
sunflowerplot()   numeric, numeric   sunflower scatterplot
smoothScatter()   numeric, numeric   smooth scatterplot
boxplot()         list of numeric    boxplots
barplot()         matrix             stacked / side-by-side barplot
dotchart()        matrix             dotplot
stripchart()      list of numeric    stripcharts
spineplot()       numeric, factor    spinogram
cdplot()          numeric, factor    conditional density plot
fourfoldplot()    2x2 table          fourfold display
assocplot()       2-D table          association plot
mosaicplot()      2-D table          mosaic plot
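For the functions above that take a "list of numeric", one convenient way to build such a list is split(), which breaks a numeric vector into groups defined by a factor. The sketch below uses the iris data; the object name petal_by_species is chosen here for illustration.

```r
# split a numeric vector into a list of numeric vectors, one per species
petal_by_species <- split(iris$Petal.Length, iris$Species)

# boxplot() draws one box per list element
boxplot(petal_by_species, ylab = "Petal length")

# stripchart() draws one strip per list element
stripchart(petal_by_species, method = "jitter", xlab = "Petal length")
```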

slide-44
SLIDE 44

Plots of two variables

# sunflower plot (numeric, numeric)
sunflowerplot(iris$Petal.Length, iris$Sepal.Length)

[Figure: sunflower plot; x-axis iris$Petal.Length, y-axis iris$Sepal.Length.]

slide-45
SLIDE 45

Plots of two variables

# smooth scatter plot (numeric, numeric)
smoothScatter(iris$Petal.Length, iris$Sepal.Length)

[Figure: smooth scatterplot; x-axis iris$Petal.Length, y-axis iris$Sepal.Length.]

slide-46
SLIDE 46

Plots of two variables

# boxplots (numeric, numeric)
boxplot(iris$Petal.Length, iris$Sepal.Length)

[Figure: two side-by-side boxplots labeled 1 and 2; axis from 1 to 8.]

slide-47
SLIDE 47

Plots of two variables

m <- matrix(1:8, 4, 2)

# barplot (numeric matrix)
barplot(m)

[Figure: stacked barplot with one bar per column of m.]

slide-48
SLIDE 48

Plots of two variables

m <- matrix(1:8, 4, 2)

# barplot (numeric matrix)
barplot(m, beside = TRUE)

[Figure: side-by-side barplot with one group of bars per column of m.]

slide-49
SLIDE 49

Plots of two variables

# conditional density plot (numeric, factor)
cdplot(iris$Petal.Length, iris$Species)

[Figure: conditional density plot; x-axis iris$Petal.Length, y-axis iris$Species (setosa to virginica); scale from 0.0 to 1.0.]

slide-50
SLIDE 50

Two categorical variables: frequency table

# 2-D table (HairEyeColor data)
x <- margin.table(HairEyeColor, c(1, 2))
x
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    68   20    15     5
##   Brown   119   84    54    29
##   Red      26   17    14    14
##   Blond     7   94    10    16

slide-51
SLIDE 51

Plots of two categorical variables

# mosaic plot (2-D table)
mosaicplot(x, main = "Relation between hair and eye color")

[Figure: mosaic plot titled "Relation between hair and eye color"; Hair (Black, Brown, Red, Blond) by Eye (Brown, Blue, Hazel, Green).]

slide-52
SLIDE 52

Plots of two categorical variables

# association plot (2-D table)
assocplot(x, main = "Relation between hair and eye color")

[Figure: association plot titled "Relation between hair and eye color"; x-axis Hair (Black, Brown, Red, Blond), y-axis Eye.]

slide-53
SLIDE 53

Plots of Multiple Variables


slide-54
SLIDE 54

High-level graphics of multiple variables

Function           Data                        Description
plot()             data frame                  scatterplot matrix
pairs()            matrix                      scatterplot matrix
matplot()          matrix                      scatterplot
stars()            matrix                      star plots
image()            numeric, numeric, numeric   image plot
contour()          numeric, numeric, numeric   contour plot
filled.contour()   numeric, numeric, numeric   filled contour plot
persp()            numeric, numeric, numeric   3-D surface
symbols()          numeric, numeric, numeric   symbols scatterplot
mosaicplot()       N-D table                   mosaic plot
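Two of the functions in the table, persp() and filled.contour(), can be tried on the volcano matrix that ships with R. The sketch below is only one possible call for each; the viewing angles, colors, and the factor of 10 used for the grid spacing are choices made for illustration.

```r
# volcano is a numeric matrix of elevations that ships with R
x <- 10 * (1:nrow(volcano))
y <- 10 * (1:ncol(volcano))

# 3-D surface; theta and phi set the viewing angles
persp(x, y, volcano, theta = 135, phi = 30,
      col = "lightgreen", border = NA, shade = 0.5,
      zlab = "height")

# filled contour plot of the same matrix
filled.contour(x, y, volcano, color.palette = terrain.colors)
```

In both calls the three numeric inputs are the x grid, the y grid, and a matrix of z values with one row per x value and one column per y value.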

slide-55
SLIDE 55

Plots of multiple variables

# scatter plot matrix (data frame)
plot(iris[ , 1:4])

[Figure: scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width.]

slide-56
SLIDE 56

Plots of multiple variables

# scatter plot matrix (data frame)
pairs(iris[ , 1:4])

[Figure: scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width.]

slide-57
SLIDE 57

Plots of multiple variables

# plot the columns of a matrix or data frame against the row index
matplot(iris[ , 1:4])

[Figure: the four columns of iris[, 1:4] plotted against the row index (1 to 150), with points drawn as the digits 1 to 4; y-axis iris[, 1:4].]

slide-58
SLIDE 58

Plots of multiple variables

# star plot (data frame)
stars(iris[ , 1:4])

slide-59
SLIDE 59

Plots of multiple variables

# color image (matrix)
image(t(volcano)[ncol(volcano):1, ])

[Figure: image plot of the volcano matrix; both axes from 0.0 to 1.0.]

slide-60
SLIDE 60

Plots of multiple variables

# display of Maunga Whau volcano
x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5),
        add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)

slide-61
SLIDE 61

Plots of multiple variables

[Figure: "Maunga Whau Volcano"; image plot with contour lines labeled from 95 to 190; x-axis from 100 to 800, y-axis from 100 to 600.]

slide-62
SLIDE 62

Plots of multiple variables

# mosaic plot of N-D tables
mosaicplot(HairEyeColor)

[Figure: mosaic plot titled "HairEyeColor"; Hair (Black, Brown, Red, Blond) by Eye (Brown, Blue, Hazel, Green), each cell split into Male and Female.]

slide-63
SLIDE 63

Plots of multiple variables

# symbols scatter plots
symbols(iris[, 1], iris[, 2], circles = iris[, 3]/100, inches = FALSE)

[Figure: scatterplot drawn with circles; x-axis iris[, 1], y-axis iris[, 2].]

slide-64
SLIDE 64

Graphics Parameters

Graphics Functions and Arguments

◮ Plot functions usually come with various arguments
◮ Typically, the first argument(s) is the data object(s) to be plotted
◮ Most of the other arguments have default options
◮ Graphic arguments have a consistent naming convention, but there will always be some exceptions

slide-65
SLIDE 65

Graphical Parameters

Graphical Arguments

◮ Some arguments are specific to a function (e.g. horiz or beside in barplot())
◮ Other arguments are more general (e.g. col, xlab, ylab)
◮ General graphical parameters are listed in the documentation of the function par()
◮ See ?par for more information
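The difference between function-specific and general arguments, and the role of par(), can be sketched as follows. The matrix m and the particular parameter values are made up for illustration.

```r
m <- matrix(1:8, 4, 2)

# beside and horiz are specific to barplot();
# col, xlab and main are general graphical arguments
barplot(m, beside = TRUE, horiz = TRUE,
        col = "gray70", xlab = "value", main = "barplot() arguments")

# par() queries and sets graphical parameters for the current device;
# setting parameters returns the previous values so they can be restored
old_par <- par(mar = c(4, 4, 2, 1), cex.axis = 0.8)
plot(1:10, xlab = "index", ylab = "value")
par(old_par)
```

Saving the value returned by par() and passing it back afterwards is a common way to avoid leaking changed settings into later plots.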

slide-66
SLIDE 66

Graphics in R

How to choose a graphics approach?

◮ Look first for an existing function that does what you want, or something similar to what you want (don’t reinvent the wheel!)
◮ Existing plotting functions can be combined and customized by using optional arguments or graphical parameters
◮ For exploratory data analysis (quick and dirty) the plotting functions in "graphics" are a good option
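As one possible illustration of combining existing functions for a quick exploratory look, the sketch below places a histogram and a boxplot of the same variable side by side using the graphical parameter mfrow. The variable and titles are chosen only for illustration.

```r
# quick exploratory look at one variable: two plots side by side
old_par <- par(mfrow = c(1, 2))

hist(iris$Sepal.Length, main = "Histogram", xlab = "Sepal length")
boxplot(iris$Sepal.Length, main = "Boxplot", ylab = "Sepal length")

# restore the previous layout
par(old_par)
```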