Graphics in R
STAT 133 Gaston Sanchez
Department of Statistics, UC–Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133
◮ "graphics" and "grid" are the two main graphics systems in R
◮ "graphics" is the traditional system, also referred to as base graphics
◮ "grid" provides low-level functions for programming plotting functions
◮ Underneath "graphics" and "grid" there is the package "grDevices"
◮ "grDevices" is the graphics engine in R
◮ It provides the graphics devices and support for colors and fonts
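As a rough illustration of what "grDevices" contributes, the sketch below opens a PDF device, draws on it, and closes it. The file name comes from tempfile() and the color values are arbitrary choices made for this example.

```r
# pdf(), dev.off() and rgb() all come from "grDevices"
out_file <- tempfile(fileext = ".pdf")   # temporary output file (arbitrary)

pdf(out_file, width = 5, height = 5)     # open a graphics device
plot(1:10, pch = 19, col = rgb(0.2, 0.4, 0.8),
     main = "Drawn on a pdf() device")   # the plot is sent to that device
dev.off()                                # close the device; the file is written

file.exists(out_file)
```

The plotting call itself belongs to "graphics"; the device it draws on and the color specification are handled by "grDevices".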
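[Figure: organization of R graphics packages. "grDevices" is the engine underneath "graphics" and "grid"; packages such as "maps", "diagram", and "plotrix" build on "graphics"; "lattice" and "ggplot2" build on "grid"; "tikzDevice", "JavaGD", and "Cairo" are graphics-device packages.]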
The package "graphics" is the traditional system; it provides functions for complete plots, as well as low-level facilities. Many other graphics packages, such as "maps", "diagram", and "pixmap", are built on top of "graphics".
The "grid" package does not provide functions for drawing complete plots. "grid" is not used directly to produce statistical plots. Instead, it is used to build other graphics packages like "lattice" or "ggplot2".
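To give a feel for how low-level "grid" is, here is a minimal sketch that draws a few shapes inside a viewport. The sizes, colors, and label are arbitrary; nothing here is a statistical plot, which is the point.

```r
library(grid)

grid.newpage()                                      # start a fresh page
pushViewport(viewport(width = 0.8, height = 0.8))   # a region covering 80% of the page
grid.rect(gp = gpar(col = "gray50"))                # border of the viewport
grid.circle(x = 0.5, y = 0.5, r = 0.2,
            gp = gpar(fill = "lightblue"))          # a circle in the middle
grid.text("drawn with grid", y = 0.9)               # a text label near the top
popViewport()                                       # leave the viewport
```

Packages such as "lattice" and "ggplot2" assemble complete plots out of building blocks like these.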
◮ In this course we’ll focus on the packages "graphics" and "ggplot2"
◮ "graphics" is the traditional plotting system in R, and many functions and packages are built on top of it.
◮ "ggplot2" excels at providing graphics for visualizing multivariate data sets (in data.frame format), while taking care of many details that make for better visual displays.
◮ R Graphics by Paul Murrell (book and webpage)
◮ R Graphics Cookbook by Winston Chang, http://www.cookbook-r.com/Graphs/
◮ ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
◮ R Graphs Cookbook by Hrishi Mittal
◮ Graphics for Statistics and Data Analysis with R by Kevin Keen
Graphics functions can be divided into two main types:
◮ high-level functions produce complete plots, e.g.
barplot(), boxplot(), dotchart()
◮ low-level functions add further output to an existing plot,
e.g. text(), points(), legend()
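The distinction can be sketched with a short example: one high-level call creates the plot, and low-level calls then add to it. The data and the label position are made up for illustration.

```r
x <- 1:10
y <- x^2

# high-level function: starts a new, complete plot
plot(x, y, main = "High-level call, then low-level additions")

# low-level functions: add output to the existing plot
points(x, y / 2, pch = 19, col = "red")
text(5, 60, "added with text()")
legend("topleft", legend = c("y", "y / 2"),
       pch = c(1, 19), col = c("black", "red"))
```

Calling a low-level function before any plot exists results in an error, because there is nothing to add to.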
◮ plot() is the most important high-level function in traditional graphics
◮ The first argument to plot() provides the data to plot
◮ The provided data can take different forms: e.g. vectors, factors, matrices, data frames.
◮ To be more precise, plot() is a generic function
◮ You can create your own plot() method function
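A minimal sketch of a user-defined plot() method follows. The class name "die" and its fields are hypothetical, invented only for this example; the method simply delegates to barplot().

```r
# a hypothetical S3 object of class "die"
die <- structure(
  list(sides = 1:6, prob = rep(1/6, 6)),
  class = "die"
)

# plot() method for objects of class "die"
plot.die <- function(x, ...) {
  barplot(x$prob, names.arg = x$sides,
          xlab = "sides", ylab = "probability", ...)
  invisible(x)
}

# plot() dispatches to plot.die() because of the object's class
plot(die, main = "Fair die")
```

Because plot() is generic, the call plot(die) looks the same as for any other object; only the class of the first argument decides which method runs.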
In its basic form, we can use plot() to make graphics of:
◮ one single variable
◮ two variables
◮ multiple variables
Function   Data        Description
plot()     numeric     scatterplot
plot()     factor      barplot
plot()     1-D table   barplot

numeric can be either a vector or a 1-D array (e.g. row or column from a matrix)

[Figure: objects holding one variable: vector or factor, 1-D table, row or column of a data.frame, row or column of a matrix]
# plot numeric vector
num_vec <- (c(1:10))^2
plot(num_vec)

# plot factor
set.seed(4)
abc <- factor(sample(c('A', 'B', 'C'), 20, replace = TRUE))
plot(abc)

# plot 1D-table
abc_table <- table(abc)
plot(abc_table)

[Figure: three panels showing num_vec plotted against Index, the plot of abc, and the plot of abc_table, the last two with categories A, B, C]
Function       Data      Description
barplot()      numeric   barplot
pie()          numeric   pie chart
dotchart()     numeric   dotplot
boxplot()      numeric   boxplot
hist()         numeric   histogram
stripchart()   numeric   1-D scatterplot
stem()         numeric   stem-and-leaf plot

# barplot numeric vector
barplot(num_vec)

# pie chart
pie(1:3)

# dot plot
dotchart(num_vec)

[Figure: barplot of num_vec, pie chart of 1:3, and dot chart of num_vec]
# boxplot
boxplot(num_vec)

# histogram
hist(num_vec)

# strip chart
stripchart(num_vec)

# stem-and-leaf
stem(num_vec)
# boxplot
boxplot(iris$Sepal.Length)

[Figure: boxplot of iris$Sepal.Length]

# histogram
hist(iris$Sepal.Length)

[Figure: "Histogram of iris$Sepal.Length"; x-axis iris$Sepal.Length, y-axis Frequency]
A) adjacent bars (no gaps)
B) area of bars indicates proportions
C) bins of equal length
D) bars can be reordered
# strip-chart (1-D scatter plot)
# (for small sample sizes)
stripchart(num_vec)

[Figure: strip chart of num_vec]

# stem-and-leaf plot
# (for small sample sizes)
stem(num_vec)
##
##   The decimal point is 1 digit(s) to the right of the |
##
##    0 | 1496
##    2 | 56
##    4 | 9
##    6 | 4
##    8 | 1
##   10 | 0
◮ Surprisingly, R does not have a specific function to plot
density curves
◮ R does have the density() function which computes a
kernel density estimate
◮ We can pass a "density" object to plot() in order to
get a density curve.
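One possible use of density(), sketched here, is to overlay the estimate on a histogram with the low-level function lines(). The choice of iris$Sepal.Length and the line width are only for illustration.

```r
x <- iris$Sepal.Length

# histogram on the density scale, so both layers share the same y-axis units
hist(x, freq = FALSE, xlab = "Sepal length",
     main = "Histogram with kernel density estimate")

# density() computes the estimate; lines() adds it to the existing plot
dens <- density(x)
lines(dens, lwd = 2)
```

The object returned by density() stores the curve in its x and y components, which is why it can be handed to lines() as well as to plot().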
# kernel density curve
dens <- density(num_vec)
plot(dens)

[Figure: kernel density curve titled "density.default(x = num_vec)"; x-axis label "N = 10 Bandwidth = 19.41", y-axis Density]
A) bar chart
B) box plot
C) histogram
D) scatterplot
Function   Data                          Description
plot()     numeric, numeric              scatterplot
plot()     numeric, factor               stripcharts
plot()     factor, numeric               boxplots
plot()     factor, factor                spineplot
plot()     2-column numeric matrix       scatterplot
plot()     2-column numeric data.frame   scatterplot
plot()     2-D table                     mosaicplot

[Figure: objects holding two variables: 2-D table (frequency or crosstable), 2-column numeric data.frame, 2-column numeric matrix, two numeric vectors, numeric vector and factor, factor and numeric vector, two factors]
# plot numeric, numeric
plot(iris$Petal.Length, iris$Sepal.Length)

# plot numeric, factor
plot(iris$Petal.Length, iris$Species)

# plot factor, numeric
plot(iris$Species, iris$Petal.Length)

# plot factor, factor
plot(iris$Species, iris$Species)
# plot numeric, numeric
plot(iris$Petal.Length, iris$Sepal.Length)

[Figure: scatterplot; x-axis iris$Petal.Length, y-axis iris$Sepal.Length]

# plot numeric, factor
plot(iris$Petal.Length, iris$Species)

[Figure: x-axis iris$Petal.Length, y-axis iris$Species shown as the numeric codes 1 to 3]

# plot factor, numeric
plot(iris$Species, iris$Petal.Length)

[Figure: boxplots of petal length for setosa, versicolor, and virginica]

# plot factor, factor
plot(iris$Species, iris$Species)

[Figure: spineplot; x and y both with levels setosa, versicolor, virginica; proportion axis from 0.0 to 1.0]
# some fake data
set.seed(1)

# hair color
hair <- factor(
  sample(c('blond', 'black', 'brown'), 100, replace = TRUE))

# eye color
eye <- factor(
  sample(c('blue', 'brown', 'green'), 100, replace = TRUE))

# plot factor, factor
plot(hair, eye)

[Figure: spineplot; x with levels black, blond, brown; y with levels blue, brown, green; proportion axis from 0.0 to 1.0]
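The same fake data can also be cross-tabulated first. The sketch below builds a 2-D table with table() and passes it to plot(); the object name hair_eye and the title are arbitrary choices for this example.

```r
# same fake data as above
set.seed(1)
hair <- factor(sample(c('blond', 'black', 'brown'), 100, replace = TRUE))
eye <- factor(sample(c('blue', 'brown', 'green'), 100, replace = TRUE))

# cross-tabulate the two factors into a 2-D table of counts
hair_eye <- table(hair, eye)
hair_eye

# plot() applied to a 2-D table draws a mosaic plot
plot(hair_eye, main = "Hair vs eye color (fake data)")
```

Comparing this with plot(hair, eye) shows how the class of the first argument changes the kind of plot that plot() produces.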
Function          Data               Description
sunflowerplot()   numeric, numeric   sunflower scatterplot
smoothScatter()   numeric, numeric   smooth scatterplot
boxplot()         list of numeric    boxplots
barplot()         matrix             stacked / side-by-side barplot
dotchart()        matrix             dotplot
stripchart()      list of numeric    stripcharts
spineplot()       numeric, factor    spinogram
cdplot()          numeric, factor    conditional density plot
fourfoldplot()    2x2 table          fourfold display
assocplot()       2-D table          association plot
mosaicplot()      2-D table          mosaic plot
# sunflower plot (numeric, numeric)
sunflowerplot(iris$Petal.Length, iris$Sepal.Length)

[Figure: sunflower scatterplot; x-axis iris$Petal.Length, y-axis iris$Sepal.Length]

# smooth scatter plot (numeric, numeric)
smoothScatter(iris$Petal.Length, iris$Sepal.Length)

[Figure: smoothed scatterplot; x-axis iris$Petal.Length, y-axis iris$Sepal.Length]

# boxplots (numeric, numeric)
boxplot(iris$Petal.Length, iris$Sepal.Length)

[Figure: two side-by-side boxplots labeled 1 and 2]
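The table above lists "list of numeric" as input for boxplot() and stripchart(). A possible sketch, assuming we split petal length by species with split(), is shown here; the jitter and color settings are arbitrary.

```r
# a list with one numeric vector per species
petal_by_species <- split(iris$Petal.Length, iris$Species)

# one box per list element, labeled with the element names
boxplot(petal_by_species, ylab = "Petal length")

# overlay the individual observations as vertical strip charts
stripchart(petal_by_species, method = "jitter", vertical = TRUE,
           add = TRUE, pch = 19, col = "gray40")
```

Unlike the two-vector call, the list version labels each box with a name, here the species.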
m <- matrix(1:8, 4, 2)

# barplot (numeric matrix)
barplot(m)

[Figure: stacked barplot with one bar per column of m]

m <- matrix(1:8, 4, 2)

# barplot (numeric matrix)
barplot(m, beside = TRUE)

[Figure: side-by-side barplot with one group of bars per column of m]

# conditional density plot (numeric, factor)
cdplot(iris$Petal.Length, iris$Species)

[Figure: conditional density plot; x-axis iris$Petal.Length, y-axis iris$Species with a proportion scale from 0.0 to 1.0]
# 2-D table (HairEyeColor data)
x <- margin.table(HairEyeColor, c(1, 2))
x
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    68   20    15     5
##   Brown   119   84    54    29
##   Red      26   17    14    14
##   Blond     7   94    10    16

# mosaic plot (2-D table)
mosaicplot(x, main = "Relation between hair and eye color")

[Figure: mosaic plot "Relation between hair and eye color"; Hair (Black, Brown, Red, Blond) against Eye (Brown, Blue, Hazel, Green)]

# association plot (2-D table)
assocplot(x, main = "Relation between hair and eye color")

[Figure: association plot "Relation between hair and eye color"; Hair against Eye]
Function           Data                        Description
plot()             data frame                  scatterplot matrix
pairs()            matrix                      scatterplot matrix
matplot()          matrix                      scatterplot
stars()            matrix                      star plots
image()            numeric, numeric, numeric   image plot
contour()          numeric, numeric, numeric   contour plot
filled.contour()   numeric, numeric, numeric   filled contour plot
persp()            numeric, numeric, numeric   3-D surface
symbols()          numeric, numeric, numeric   symbols scatterplot
mosaicplot()       N-D table                   mosaic plot
# scatter plot matrix (data frame)
plot(iris[ , 1:4])

[Figure: scatterplot matrix of iris[ , 1:4]]

# scatter plot matrix (data frame)
pairs(iris[ , 1:4])

[Figure: scatterplot matrix of iris[ , 1:4]]
# plot the columns of a data frame against the row index
matplot(iris[ , 1:4])

[Figure: the four columns of iris[, 1:4] plotted against the row index (1 to 150), each point drawn as its column number, 1 to 4]
# star plot (data frame)
stars(iris[ , 1:4])

[Figure: star plots, one per row of iris[ , 1:4]]

# color image (matrix)
image(t(volcano)[ncol(volcano):1, ])

[Figure: color image of the volcano matrix; both axes run from 0.0 to 1.0]
# display of Maunga Whau volcano
x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5),
        add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)

[Figure: "Maunga Whau Volcano"; image plot with overlaid contour lines labeled from 95 to 190; axes x and y]
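The same volcano matrix can also be shown as a 3-D surface with persp(), which is listed in the table of functions above. This is only a sketch: the viewing angles and the color are arbitrary choices.

```r
# same coordinates as in the image/contour display
x <- 10 * (1:nrow(volcano))
y <- 10 * (1:ncol(volcano))

# 3-D surface; theta and phi set the viewing direction
persp(x, y, volcano, theta = 135, phi = 30,
      col = "green3", shade = 0.75, border = NA, box = FALSE,
      main = "Maunga Whau Volcano")
```

Note that image(), contour(), and persp() all take the data in the same form: two coordinate vectors plus a matrix of heights.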
# mosaic plot of N-D tables
mosaicplot(HairEyeColor)

[Figure: mosaic plot titled "HairEyeColor"; Hair (Black, Brown, Red, Blond) against Eye (Brown, Blue, Hazel, Green), split by Male and Female]

# symbols scatter plots
symbols(iris[, 1], iris[, 2], circles = iris[, 3]/100,
        inches = FALSE)

[Figure: scatterplot drawn with circles; x-axis iris[, 1], y-axis iris[, 2]]
◮ Plot functions usually come with various arguments
◮ Typically, the first argument(s) is the data object(s) to be plotted
◮ Most of the other arguments have default options
◮ Graphical arguments follow a consistent naming convention, but there will always be some exceptions
◮ Some arguments are specific to a function (e.g. horiz or beside in barplot())
◮ Other arguments are more general (e.g. col, xlab, ylab)
◮ General graphical parameters are listed in the documentation of the function par()
◮ See ?par for more information
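The following sketch contrasts the two kinds of arguments: horiz is specific to barplot(), while col, xlab, ylab, and main are accepted by most plotting functions. The par() settings (two panels, smaller margins) are arbitrary and are restored at the end.

```r
# change global graphical parameters, keeping the previous values
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

# function-specific argument: horiz (barplot only)
barplot(1:5, horiz = TRUE, col = "steelblue",
        xlab = "value", main = "horiz = TRUE")

# general arguments: col, pch, xlab, ylab, main
plot(1:10, col = "tomato", pch = 19,
     xlab = "index", ylab = "value", main = "general arguments")

# restore the previous settings
par(old_par)
```

Saving the return value of par() and passing it back afterwards is a common way to avoid leaving changed settings behind.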
◮ Look first for an existing function that does what you want, or something similar to what you want (don’t reinvent the wheel!)
◮ Existing plotting functions can be combined and customized by using optional arguments or graphical parameters
◮ For exploratory data analysis (quick and dirty) the plotting functions in "graphics" are a good option
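As one possible illustration of combining existing functions, the sketch below uses barplot() for the bars and text() for the counts on top of them. The data set, colors, and label offset are arbitrary choices.

```r
# number of observations per species
counts <- table(iris$Species)
n <- as.vector(counts)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(counts, ylim = c(0, 60), col = "gray80",
                main = "Observations per species")

# reuse those positions to write each count just above its bar
text(mids, n + 3, labels = n)
```

No new plotting function is needed here; an optional argument (ylim) and one low-level call are enough to customize the default barplot.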