SLIDE 1

Introduction Basic use Overview Case studies

Statistical Graphics using lattice

Deepayan Sarkar

Fred Hutchinson Cancer Research Center

29 July 2008

Deepayan Sarkar Statistical Graphics using lattice

SLIDE 2

R graphics

• R has two largely independent graphics subsystems
  ◦ Traditional graphics
    ▪ available in R from the beginning
    ▪ rich collection of tools
    ▪ not very flexible
  ◦ Grid graphics
    ▪ relatively recent (2000)
    ▪ low-level tool, highly flexible
• Grid forms the basis of two high-level graphics systems:
  ◦ lattice: based on Trellis graphics (Cleveland)
  ◦ ggplot2: inspired by "Grammar of Graphics" (Wilkinson)

SLIDE 3

The lattice package

• Trellis graphics for R (originally developed in S)
• Powerful high-level data visualization system
• Provides common statistical graphics with conditioning
  ◦ emphasis on multivariate data
  ◦ sufficient for typical graphics needs
  ◦ flexible enough to handle most nonstandard requirements
• Traditional user interface:
  ◦ collection of high-level functions: xyplot(), dotplot(), etc.
  ◦ interface based on formula and data source

SLIDE 4

Outline

• Introduction, simple examples
• Overview of features
• Sample session to work through, available at http://dsarkar.fhcrc.org/lattice-lab/
• A few case studies if time permits

SLIDE 5

High-level functions in lattice

Function        Default Display
histogram()     Histogram
densityplot()   Kernel Density Plot
qqmath()        Theoretical Quantile Plot
qq()            Two-sample Quantile Plot
stripplot()     Stripchart (Comparative 1-D Scatter Plots)
bwplot()        Comparative Box-and-Whisker Plots
barchart()      Bar Plot
dotplot()       Cleveland Dot Plot
xyplot()        Scatter Plot
splom()         Scatter-Plot Matrix
contourplot()   Contour Plot of Surfaces
levelplot()     False Color Level Plot of Surfaces
wireframe()     Three-dimensional Perspective Plot of Surfaces
cloud()         Three-dimensional Scatter Plot
parallel()      Parallel Coordinates Plot (renamed parallelplot() in later lattice releases)

SLIDE 6

The Chem97 dataset

• 1997 A-level Chemistry examination in Britain

> data(Chem97, package = "mlmRev")
> head(Chem97[c("score", "gender", "gcsescore")])
  score gender gcsescore
1     4      F     6.625
2    10      F     7.625
3    10      F     7.250
4    10      F     7.500
5     8      F     6.444
6    10      F     7.750

SLIDE 7

> histogram(~ gcsescore, data = Chem97)

[Figure: histogram of gcsescore. x-axis: gcsescore; y-axis: Percent of Total.]

SLIDE 8

> histogram(~ gcsescore | factor(score), data = Chem97)

[Figure: histograms of gcsescore, one panel per level of factor(score). x-axis: gcsescore; y-axis: Percent of Total.]

SLIDE 9

> densityplot(~ gcsescore | factor(score), Chem97,
              plot.points = FALSE,
              groups = gender, auto.key = TRUE)

[Figure: kernel density estimates of gcsescore, one panel per level of factor(score), with curves for M and F superposed and a legend. x-axis: gcsescore; y-axis: Density.]

SLIDE 10

Trellis Philosophy: Part I

• Display specified in terms of
  ◦ Type of display (histogram, densityplot, etc.)
  ◦ Variables with specific roles
• Typical roles for variables
  ◦ Primary variables: used for the main graphical display
  ◦ Conditioning variables: used to divide into subgroups and juxtapose (multipanel conditioning)
  ◦ Grouping variable: divide into subgroups and superpose
• Primary interface: high-level functions
  ◦ Each function corresponds to a display type
  ◦ Specification of roles depends on display type
  ◦ Usually specified through the formula and the groups argument
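The three roles can be sketched in a single call. This is only an illustration: the built-in mtcars data is an assumption made here and is not used elsewhere in these slides.

```r
library(lattice)

# Assumption: mtcars is an illustrative dataset, not one from the talk.
cars <- transform(mtcars,
                  cyl  = factor(cyl),
                  gear = factor(gear))

# Primary variables:     mpg (y) and wt (x), given in the formula
# Conditioning variable: cyl, after "|" (one panel per level, juxtaposed)
# Grouping variable:     gear, via groups (superposed within each panel)
p <- xyplot(mpg ~ wt | cyl, data = cars, groups = gear,
            auto.key = list(columns = 3))
print(p)
```

Moving a variable from the groups argument to the right of "|" (or the other way round) switches it between superposition and juxtaposition without changing anything else in the call.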

SLIDE 11

> qqmath(~ gcsescore | factor(score), Chem97, groups = gender,
         f.value = ppoints(100), auto.key = TRUE,
         type = c("p", "g"), aspect = "xy")

[Figure: normal quantile plots of gcsescore, one panel per level of factor(score), with points for M and F superposed and a legend. x-axis: qnorm; y-axis: gcsescore.]

SLIDE 12

> qq(gender ~ gcsescore | factor(score), Chem97,
     f.value = ppoints(100), type = c("p", "g"), aspect = 1)

[Figure: two-sample quantile plots of gcsescore comparing M and F, one panel per level of factor(score). Axes: M and F.]

SLIDE 13

> bwplot(factor(score) ~ gcsescore | gender, Chem97)

[Figure: horizontal box-and-whisker plots of gcsescore for each level of factor(score), in two panels (M and F). x-axis: gcsescore.]

SLIDE 14

> bwplot(gcsescore ~ gender | factor(score), Chem97,
         layout = c(6, 1))

[Figure: vertical box-and-whisker plots of gcsescore for M and F, one panel per level of factor(score), arranged in a single row. y-axis: gcsescore.]

SLIDE 15

> stripplot(depth ~ factor(mag), data = quakes,
            jitter.data = TRUE, alpha = 0.6)

[Figure: jittered strip plot of depth for each magnitude level from 4 to 6.4. y-axis: depth.]

SLIDE 16

The VADeaths dataset

• Death rates (per 1000) in Virginia, 1940, among different population subgroups

> VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0

SLIDE 17

> barchart(VADeaths, groups = FALSE, layout = c(4, 1))

[Figure: bar charts of death rate by age group (50-54 to 70-74), in four panels: Rural Male, Rural Female, Urban Male, Urban Female. x-axis: Freq.]

SLIDE 18

> dotplot(VADeaths, groups = FALSE, layout = c(4, 1))

[Figure: dot plots of death rate by age group (50-54 to 70-74), in four panels: Rural Male, Rural Female, Urban Male, Urban Female. x-axis: Freq.]

SLIDE 19

> dotplot(VADeaths, type = "o",
          auto.key = list(points = TRUE, lines = TRUE,
                          space = "right"))

[Figure: single-panel dot plot of death rate by age group, with the four subgroups (Rural Male, Rural Female, Urban Male, Urban Female) superposed as connected points and a legend on the right. x-axis: Freq.]

SLIDE 20

> data(Earthquake, package = "nlme")
> xyplot(accel ~ distance, data = Earthquake)

[Figure: scatter plot of accel against distance. x-axis: distance; y-axis: accel.]

SLIDE 21

> xyplot(accel ~ distance, data = Earthquake,
         scales = list(log = TRUE),
         type = c("p", "g", "smooth"))

[Figure: scatter plot of accel against distance on log scales (tick labels in powers of 10), with a reference grid and a smooth curve. x-axis: distance; y-axis: accel.]

SLIDE 22

> Depth <- equal.count(quakes$depth, number = 8, overlap = .1)
> summary(Depth)

Intervals:
    min   max count
1  39.5  63.5   138
2  60.5 102.5   138
3  97.5 175.5   138
4 161.5 249.5   142
5 242.5 460.5   138
6 421.5 543.5   137
7 537.5 590.5   140
8 586.5 680.5   137

Overlap between adjacent intervals:
[1] 16 14 19 15 14 15 15
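A short sketch of working with shingles directly may help here. The explicit cut points in the second part are hypothetical values chosen only to cover the range of quakes$depth.

```r
library(lattice)

# A shingle is a numeric variable together with a set of
# (possibly overlapping) intervals that play the role of levels.
Depth <- equal.count(quakes$depth, number = 8, overlap = 0.1)

is.shingle(Depth)        # a shingle, not a factor
length(levels(Depth))    # one level per interval

# Intervals can also be supplied explicitly as a two-column matrix.
# Assumption: these cut points are illustrative only.
cuts <- cbind(lower = c(0, 200, 400), upper = c(250, 450, 700))
Depth3 <- shingle(quakes$depth, intervals = cuts)
summary(Depth3)
```

Either object can be used as a conditioning variable in a formula, exactly like a factor.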

SLIDE 23

> xyplot(lat ~ long | Depth, data = quakes)

[Figure: scatter plots of lat against long, one panel per interval of the shingle Depth (eight panels). x-axis: long; y-axis: lat.]

SLIDE 24

> cloud(depth ~ lat * long, data = quakes,
        zlim = rev(range(quakes$depth)),
        screen = list(z = 105, x = -70), panel.aspect = 0.75)

[Figure: three-dimensional scatter plot of the quakes data. Axes: lat, long, depth.]

SLIDE 25

> cloud(depth ~ lat * long, data = quakes,
        zlim = rev(range(quakes$depth)),
        screen = list(z = 80, x = -70), panel.aspect = 0.75)

[Figure: the same three-dimensional scatter plot viewed from a different angle. Axes: lat, long, depth.]

SLIDE 26

More high-level functions

• More high-level functions in lattice
  ◦ Won't discuss, but examples in manual page
• Other Trellis high-level functions can be defined in other packages, e.g.,
  ◦ ecdfplot(), mapplot() in the latticeExtra package
  ◦ hexbinplot() in the hexbin package

SLIDE 27

The "trellis" object model

• One important feature of lattice:
  ◦ High-level functions do not actually plot anything
  ◦ They return an object of class "trellis"
  ◦ Display created when such objects are print()-ed or plot()-ed
• Usually not noticed because of automatic printing rule
• Can be used to arrange multiple plots
• Other uses as well
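The object model can be seen in a few lines. This is a minimal sketch using the built-in faithful data, which is an assumption made for illustration and not a dataset from the talk.

```r
library(lattice)

# Nothing is drawn here: the call only builds a "trellis" object.
p <- histogram(~ eruptions, data = faithful)
class(p)

# The object can be modified before it is displayed ...
p2 <- update(p, xlab = "Eruption time (minutes)",
             main = "Old Faithful eruptions")

# ... and is drawn only when printed. An explicit print() is needed
# inside functions, loops, and source()-d scripts, where the
# automatic printing rule does not apply.
print(p2)
```

Forgetting the explicit print() inside a loop or function is a common reason for a lattice call that appears to produce no output.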

SLIDE 28

> dp.uspe <-
      dotplot(t(USPersonalExpenditure), groups = FALSE,
              layout = c(1, 5),
              xlab = "Expenditure (billion dollars)")
> dp.uspe.log <-
      dotplot(t(USPersonalExpenditure), groups = FALSE,
              layout = c(1, 5),
              scales = list(x = list(log = 2)),
              xlab = "Expenditure (billion dollars)")
> plot(dp.uspe, split = c(1, 1, 2, 1))
> plot(dp.uspe.log, split = c(2, 1, 2, 1), newpage = FALSE)

SLIDE 29

[Figure: two dot plots side by side, each with five stacked panels (Food and Tobacco, Household Operation, Medical and Health, Personal Care, Private Education) showing expenditure for the years 1940, 1945, 1950, 1955, 1960. Left: linear x-axis. Right: x-axis on a log scale (tick labels in powers of 2). x-axis: Expenditure (billion dollars).]

SLIDE 30

Trellis Philosophy: Part I

• Display specified in terms of
  ◦ Type of display (histogram, densityplot, etc.)
  ◦ Variables with specific roles
• Typical roles for variables
  ◦ Primary variables: used for the main graphical display
  ◦ Conditioning variables: used to divide into subgroups and juxtapose (multipanel conditioning)
  ◦ Grouping variable: divide into subgroups and superpose
• Primary interface: high-level functions
  ◦ Each function corresponds to a display type
  ◦ Specification of roles depends on display type
  ◦ Usually specified through the formula and the groups argument

SLIDE 31

Trellis Philosophy: Part II

• Design goals:
  ◦ Enable effective graphics by encouraging good graphical practice (e.g., Cleveland, 1985)
  ◦ Remove the burden from the user as much as possible by building good defaults into the software
• Some obvious examples:
  ◦ Use as much of the available space as possible
  ◦ Encourage direct comparison by superposition (grouping)
  ◦ Enable comparison when juxtaposing (conditioning):
    ▪ use common axes
    ▪ add common reference objects (such as grids)
• Inevitable departure from traditional R graphics paradigms

SLIDE 32

Trellis Philosophy: Part III

• Any serious graphics system must also be flexible
• lattice tries to balance flexibility and ease of use using the following model:
  ◦ A display is made up of various elements
  ◦ Coordinated defaults provide meaningful results, but
  ◦ Each element can be controlled independently
  ◦ The main elements are:
    ▪ the primary (panel) display
    ▪ axis annotation
    ▪ strip annotation (describing the conditioning process)
    ▪ legends (typically describing the grouping process)

SLIDE 33

• The full system would take too long to describe
• Online documentation has details; start with ?Lattice
• We discuss a few advanced ideas using some case studies

SLIDE 34

Case studies

• Adding regression lines to scatter plots
• Reordering levels of a factor

SLIDE 35

Example 1: Growth curves

• Heights of boys from Oxford over time
• 26 boys, height measured on 9 occasions

> data(Oxboys, package = "nlme")
> head(Oxboys)
  Subject     age height Occasion
1       1 -1.0000  140.5        1
2       1 -0.7479  143.4        2
3       1 -0.4630  144.8        3
4       1 -0.1643  147.1        4
5       1 -0.0027  147.7        5
6       1  0.2466  150.2        6

SLIDE 36

> xyplot(height ~ age | Subject, data = Oxboys,
         strip = FALSE, aspect = "xy", pch = 16,
         xlab = "Standardized age", ylab = "Height (cm)")

[Figure: scatter plots of height against age, one panel per Subject, without strips. x-axis: Standardized age; y-axis: Height (cm).]

SLIDE 37

Example 2: Exam scores

• GCSE exam scores on a science subject. Two components:
  ◦ course work
  ◦ written paper
• 1905 students

> data(Gcsemv, package = "mlmRev")
> head(Gcsemv)
  school student gender written course
1  20920      16      M      23     NA
2  20920      25      F      NA   71.2
3  20920      27      F      39   76.8
4  20920      31      F      36   87.9
5  20920      42      M      16   44.4
6  20920      62      F      36     NA

SLIDE 38

> xyplot(written ~ course | gender, data = Gcsemv,
         xlab = "Coursework score",
         ylab = "Written exam score")

[Figure: scatter plots of written against course, in two panels (F and M). x-axis: Coursework score; y-axis: Written exam score.]

SLIDE 39

Adding to a Lattice display

• Traditional R graphics encourages incremental additions
• The Lattice analogue is to write panel functions

SLIDE 40

A simple panel function

• Things to know:
  ◦ Panel functions are functions (!)
  ◦ They are responsible for graphical content inside panels
  ◦ They get executed once for every panel
  ◦ Every high-level function has a default panel function, e.g., xyplot() has default panel function panel.xyplot()

SLIDE 41

A simple panel function

• So, equivalent call:

> xyplot(written ~ course | gender, data = Gcsemv,
         xlab = "Coursework score",
         ylab = "Written exam score",
         panel = panel.xyplot)

SLIDE 42

A simple panel function

• So, equivalent call:

> xyplot(written ~ course | gender, data = Gcsemv,
         xlab = "Coursework score",
         ylab = "Written exam score",
         panel = function(...) {
             panel.xyplot(...)
         })

SLIDE 43

A simple panel function

• So, equivalent call:

> xyplot(written ~ course | gender, data = Gcsemv,
         xlab = "Coursework score",
         ylab = "Written exam score",
         panel = function(x, y, ...) {
             panel.xyplot(x, y, ...)
         })

SLIDE 44

A simple panel function

• Now, we can add a couple of elements:

> xyplot(written ~ course | gender, data = Gcsemv,
         xlab = "Coursework score",
         ylab = "Written exam score",
         panel = function(x, y, ...) {
             panel.grid(h = -1, v = -1)
             panel.xyplot(x, y, ...)
             panel.loess(x, y, ..., col = "black")
             panel.rug(x = x[is.na(y)],
                       y = y[is.na(x)])
         })

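SLIDE 45

[Figure: scatter plots of written against course in two panels (F and M), with a reference grid, a loess smooth, and rugs marking observations for which the other score is missing. x-axis: Coursework score; y-axis: Written exam score.]
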
SLIDE 46

Panel functions

Another useful feature: argument passing

> xyplot(written ~ course | gender, data = Gcsemv,
         panel = function(x, y, ...) {
             panel.xyplot(x, y, ...,
                          type = c("g", "p", "smooth"),
                          col.line = "black")
         })

is equivalent to

> xyplot(written ~ course | gender, data = Gcsemv,
         type = c("g", "p", "smooth"), col.line = "black")

SLIDE 47

[Figure: scatter plots of written against course in two panels (F and M), with a reference grid and a black smooth curve. x-axis: course; y-axis: written.]

SLIDE 48

Passing arguments to panel functions

• Requires knowledge of arguments supported by panel function
• Each high-level function has a corresponding default panel function, named "panel." followed by the function name. For example,
  ◦ histogram() has panel function panel.histogram
  ◦ dotplot() has panel function panel.dotplot
• Most have useful arguments that support common variants
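One way to find out what can be passed is to inspect the default panel function. The sketch below uses bwplot() and the built-in mtcars data, both chosen here only for illustration.

```r
library(lattice)

# List the arguments accepted by the default panel function of bwplot().
args(panel.bwplot)

# varwidth is an argument of panel.bwplot(), not of bwplot() itself;
# it is passed through unchanged to the panel function.
# Assumption: mtcars is an illustrative dataset, not one from the talk.
p <- bwplot(factor(cyl) ~ mpg, data = mtcars, varwidth = TRUE,
            xlab = "Miles per gallon", ylab = "Cylinders")
print(p)
```

The help page of each panel function (for example ?panel.bwplot) documents what these arguments do.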

SLIDE 49

Back to regression lines

• Oxboys: model height on age

$$y_{ij} = \mu + b_i + x_{ij} + x_{ij}^2 + \varepsilon_{ij}$$

• Mixed effect model that can be fit with lme4

> library(lme4)
> fm.poly <- lmer(height ~ poly(age, 2) + (1 | Subject),
                  data = Oxboys)

• Goal: plot of data with fitted curve superposed

SLIDE 50

[Figure: scatter plots of height against age, one panel per Subject, with the fitted curve superposed. x-axis: Standardized age; y-axis: Height (cm).]

SLIDE 51

> xyplot(height ~ age | Subject,
         data = Oxboys, strip = FALSE, aspect = "xy",
         type = "p", pch = 16,
         xlab = "Standardized age", ylab = "Height (cm)")

[Figure: observed heights only, as points, one panel per Subject. x-axis: Standardized age; y-axis: Height (cm).]

SLIDE 52

> xyplot(fitted(fm.poly) ~ age | Subject,
         data = Oxboys, strip = FALSE, aspect = "xy",
         type = "l", lwd = 2,
         xlab = "Standardized age", ylab = "Height (cm)")

[Figure: fitted curves only, as lines, one panel per Subject. x-axis: Standardized age; y-axis: Height (cm).]

SLIDE 53

> xyplot(height + fitted(fm.poly) ~ age | Subject,
         data = Oxboys, strip = FALSE, aspect = "xy",
         pch = 16, lwd = 2,
         type = c("p", "l"), distribute.type = TRUE,
         xlab = "Standardized age", ylab = "Height (cm)")

[Figure: observed heights as points with fitted curves as lines, one panel per Subject. x-axis: Standardized age; y-axis: Height (cm).]

SLIDE 54

GCSE exam scores

• Gcsemv: model written score by coursework and gender
• A similar approach does not work as well
  ◦ x values are not ordered
  ◦ missing values are omitted from fitted model

SLIDE 55

> fm <- lm(written ~ course + I(course^2) + gender, Gcsemv)
> xyplot(written + fitted(fm) ~ course | gender,
         data = subset(Gcsemv,
                       !(is.na(written) | is.na(course))),
         type = c("p", "l"), distribute.type = TRUE)

[Figure: scatter plots of written against course in two panels (F and M), with the fitted values joined by lines in data order. x-axis: course; y-axis: written + fitted(fm).]

SLIDE 56

• Built-in solution: Simple Linear Regression in each panel

> xyplot(written ~ course | gender, Gcsemv,
         type = c("p", "r"), col.line = "black")

[Figure: scatter plots of written against course in two panels (F and M), each with a black regression line. x-axis: course; y-axis: written.]

SLIDE 57

GCSE exam scores

• More complex models need a little more work
• Consider three models:

> fm0 <- lm(written ~ course, Gcsemv)
> fm1 <- lm(written ~ course + gender, Gcsemv)
> fm2 <- lm(written ~ course * gender, Gcsemv)

• Goal: compare fm2 and fm1 with fm0

SLIDE 58

[Figure: scatter plots of written against course in two panels (F and M), with fitted lines superposed. x-axis: course; y-axis: written.]

SLIDE 59

• Solution: evaluate fits separately and combine

> course.rng <- range(Gcsemv$course, finite = TRUE)
> grid <- expand.grid(course = do.breaks(course.rng, 30),
                      gender = unique(Gcsemv$gender))
> fm0.pred <- cbind(grid,
                    written = predict(fm0, newdata = grid))
> fm1.pred <- cbind(grid,
                    written = predict(fm1, newdata = grid))
> fm2.pred <- cbind(grid,
                    written = predict(fm2, newdata = grid))
> orig <- Gcsemv[c("course", "gender", "written")]
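The prediction grid is built from two helper calls that are easy to check on a toy range. The numbers and the names pts and toy.grid below are made up for illustration.

```r
library(lattice)

# do.breaks() returns nint + 1 equally spaced points spanning a range.
pts <- do.breaks(c(0, 10), 5)
pts                      # 0 2 4 6 8 10

# expand.grid() crosses these points with every level of a second
# variable (hypothetical levels, for illustration).
toy.grid <- expand.grid(x = pts, g = c("F", "M"))
nrow(toy.grid)           # 12 = 6 points x 2 levels
```

With 30 intervals and two genders, the same construction yields 31 x 2 = 62 rows in each prediction data frame.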

SLIDE 60

> str(orig)
'data.frame': 1905 obs. of 3 variables:
 $ course : num NA 71.2 76.8 87.9 44.4 NA 89.8 17.5 32.4 84.2 ..
 $ gender : Factor w/ 2 levels "F","M": 2 1 1 1 2 1 1 2 2 1 ...
 $ written: num 23 NA 39 36 16 36 49 25 NA 48 ...
> str(fm0.pred)
'data.frame': 62 obs. of 3 variables:
 $ course : num 9.25 12.28 15.30 18.32 21.35 ...
 $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
 $ written: num 21.6 22.7 23.9 25.1 26.3 ...

SLIDE 61

> combined <- make.groups(original = orig,
                          fm0 = fm0.pred,
                          fm2 = fm2.pred)
> str(combined)
'data.frame': 2029 obs. of 4 variables:
 $ course : num NA 71.2 76.8 87.9 44.4 NA 89.8 17.5 32.4 84.2 ..
 $ gender : Factor w/ 2 levels "F","M": 2 1 1 1 2 1 1 2 2 1 ...
 $ written: num 23 NA 39 36 16 36 49 25 NA 48 ...
 $ which  : Factor w/ 3 levels "original","fm0",..: 1 1 1 1 1 1 1
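What make.groups() does is easiest to see on two tiny data frames. The frames a and b below are hypothetical and exist only for this sketch.

```r
library(lattice)

# Two small data frames with the same columns (made-up values).
a <- data.frame(x = 1:2, y = c(10, 20))
b <- data.frame(x = 3:5, y = c(30, 40, 50))

# make.groups() stacks them and records the origin of each row
# in a factor called "which", with levels taken from the argument names.
toy <- make.groups(first = a, second = b)
str(toy)
table(toy$which)
```

The which factor is what gets used as the groups argument in the next call, so each source can be given its own display type.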

SLIDE 62

> xyplot(written ~ course | gender,
         data = combined, groups = which,
         type = c("p", "l", "l"), distribute.type = TRUE)

[Figure: scatter plots of written against course in two panels (F and M), with the original data as points and the fm0 and fm2 fits as lines. x-axis: course; y-axis: written.]

SLIDE 63

Reordering factor levels

• Levels of categorical variables often have no intrinsic order
• The default in factor() is to use sort(unique(x))
  ◦ Implies alphabetical order for factors converted from character
• Usually irrelevant in analyses
• Can strongly affect impact in a graphical display
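The default ordering is easy to verify on a small character vector. The vector size below is a made-up example.

```r
# Hypothetical character data.
size <- c("small", "large", "medium", "small")

# Default: levels are sort(unique(x)), i.e. alphabetical here.
levels(factor(size))

# An explicit levels argument gives a meaningful order instead.
levels(factor(size, levels = c("small", "medium", "large")))
```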

SLIDE 64

Example

• Population density in US states in 1975

> state <- data.frame(name = state.name,
                      region = state.region,
                      state.x77)
> state$Density <- with(state, Population / Area)
> dotplot(name ~ Density, state)
> dotplot(name ~ Density, state,
          scales = list(x = list(log = TRUE)))

SLIDE 65

[Figure: two dot plots of Density by state, with states in alphabetical order (Alabama to Wyoming). Left: linear x-axis. Right: log-scale x-axis (tick labels in powers of 10). x-axis: Density.]

SLIDE 66

[Figure: the same two dot plots with states reordered by Density (Alaska, Wyoming, Montana, ... through Massachusetts, Rhode Island, New Jersey). Left: linear x-axis. Right: log-scale x-axis (tick labels in powers of 10). x-axis: Density.]

SLIDE 67

The reorder() function

> dotplot(reorder(name, Density) ~ Density, state)
> dotplot(reorder(name, Density) ~ Density, state,
          scales = list(x = list(log = TRUE)))

• Reorders levels of a factor by another variable
• optional summary function, default mean()
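A toy example shows the effect of the summary function. The factor f and values x are invented for this sketch.

```r
# Hypothetical data: three levels, two observations each.
f <- factor(c("a", "b", "c", "a", "b", "c"))
x <- c(1, 4, 3, 9, 4, 2)

# Default summary is the mean of x within each level:
# a = 5, b = 4, c = 2.5, so the new level order is c, b, a.
levels(reorder(f, x))

# With min as the summary: a = 1, b = 4, c = 2, giving a, c, b.
levels(reorder(f, x, min))
```

Only the order of the levels changes; the data values themselves are untouched.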

SLIDE 68

Reordering by multiple variables

• Not directly supported, but...
• Order is preserved within ties

> state$region <-
      with(state, reorder(region, Frost, median))
> state$name <-
      with(state,
           reorder(reorder(name, Frost),
                   as.numeric(region)))
> p <- dotplot(name ~ Frost | region, state,
               strip = FALSE, strip.left = TRUE,
               layout = c(1, 4),
               scales = list(y = list(relation = "free",
                                      rot = 0)))
> plot(p,
       panel.height = list(x = table(state$region),
                           units = "null"))
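The nested reorder() trick relies on ties keeping their previous order. A small sketch with made-up values (the names nm, grp, and value are hypothetical):

```r
# Hypothetical data: four items in two groups.
nm    <- factor(c("p", "q", "r", "s"))
grp   <- c(2, 1, 2, 1)      # group code for each item
value <- c(10, 40, 5, 20)

# Step 1: order levels by value alone -> r, p, s, q.
inner <- reorder(nm, value)
levels(inner)

# Step 2: reorder by group code. Items in the same group are tied,
# and ties keep the order from step 1 -> s, q (group 1), then r, p.
outer <- reorder(inner, grp)
levels(outer)
```

The result is sorted by group first and by value within group, which is the ordering used for the state names above.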

SLIDE 69

[Figure: dot plot of Frost by state in four stacked panels with strips on the left (South, West, Northeast, North Central). States are ordered by Frost within each region, and panel heights are proportional to the number of states in the region. x-axis: Frost.]

SLIDE 70

Ordering panels using index.cond

• Order panels by some summary of panel data
• Example: death rates due to cancer in US counties, 2001-2003

> data(USCancerRates, package = "latticeExtra")
> xyplot(rate.male ~ rate.female | state, USCancerRates,
         index.cond = function(x, y, ...) {
             median(y - x, na.rm = TRUE)
         },
         aspect = "iso",
         panel = function(...) {
             panel.grid(h = -1, v = -1)
             panel.abline(0, 1)
             panel.xyplot(...)
         },
         pch = ".")

SLIDE 71

[Figure: scatter plots of rate.male against rate.female, one panel per state, each with a reference grid and the line y = x. Panels are ordered by the index.cond summary, running from Colorado, Arizona, California, ... to Alabama, South Carolina, Mississippi. x-axis: rate.female; y-axis: rate.male.]

SLIDE 72

Take home message

• Panel functions provide finest level of control
• Built-in panel functions are also powerful
  ◦ Easily taken advantage of using argument passing
  ◦ Requires knowledge of arguments (read documentation!)
  ◦ Special function panel.superpose() useful for grouping
• Several useful functions make life a little simpler
  ◦ reorder(), make.groups(), etc.
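Since panel.superpose() is mentioned but not demonstrated, here is a minimal sketch of one common way to use it: a custom panel function that draws points and a separate regression line for each group. The mtcars data is an assumption made for illustration only.

```r
library(lattice)

# Assumption: mtcars is an illustrative dataset, not one from the talk.
p <- xyplot(mpg ~ wt, data = mtcars, groups = factor(cyl),
            panel = function(x, y, ...) {
                panel.grid(h = -1, v = -1)
                # panel.superpose() splits the panel data by group and
                # calls panel.groups once per group, supplying the
                # group-specific graphical parameters.
                panel.superpose(x, y, ...,
                                panel.groups = function(x, y, ...) {
                                    panel.xyplot(x, y, ...)
                                    panel.lmline(x, y, ...)
                                })
            },
            auto.key = TRUE)
print(p)
```

The groups and subscripts arguments reach panel.superpose() through the ... argument, which is why the panel function must pass it along.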