Visual comparisons Comparing distributions: Part 1 R.W. Oldford - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Visual comparisons

Comparing distributions: Part 1 R.W. Oldford

slide-2
SLIDE 2

The Titanic

The data set ‘Titanic’ provides “information on the fate of passengers on the fatal maiden voyage of the ocean liner ‘Titanic’, summarized according to economic status (class), sex, age and survival.”

The Titanic data records the number of passengers in various categories for four different categorical variates:

No.  Variate   Values
1    Class     1st, 2nd, 3rd, Crew
2    Sex       Male, Female
3    Age       Child, Adult
4    Survived  No, Yes

slide-3
SLIDE 3

The Titanic

Might be interested in comparing classes by survival

    library(knitr)
    # Subtable of survival/not by class
    classTable <- apply(Titanic, MARGIN = c(4,1), FUN = sum)
    kable(classTable)

       1st  2nd  3rd  Crew
No     122  167  528   673
Yes    203  118  178   212

    # Number in each class is
    classTotals <- apply(classTable, MARGIN = 2, FUN = sum)
    classSurvival <- t(classTable["Yes", ] / classTotals)
    rownames(classSurvival) <- c("Survived")
    kable(classSurvival)

          1st        2nd        3rd        Crew
Survived  0.6246154  0.4140351  0.2521246  0.239548
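The division by class totals can also be done in one step. The following is a minimal sketch, assuming only the built-in ‘Titanic’ table; it swaps in base R's ‘prop.table()’ for the explicit division above, and the names ‘classProps’ and ‘survivalRate’ are illustrative rather than taken from the slides.

```r
# Collapse the built-in 4-way Titanic table to Survived x Class counts
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)

# Column proportions: each class column is divided by its own class total
classProps <- prop.table(classTable, margin = 2)

# The "Yes" row holds the survival rate for each class
survivalRate <- classProps["Yes", ]
print(round(100 * survivalRate))
```

Because every column of ‘classProps’ sums to one, the "No" row gives the complementary death rates without any further computation.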

slide-4
SLIDE 4

The Titanic

Following the rules for tables, a better way to present these numbers is as

    # Rescale and round to two decimals
    newTable <- 100 * round(classSurvival, 2)
    # swap rows and columns
    newTable <- t(newTable)
    # Values are already in the right order, but in general
    # order the values in descending order
    descendingOrder <- order(newTable, decreasing = TRUE)
    newTable <- newTable[descendingOrder, , drop = FALSE]  # Note drop argument
    colnames(newTable) <- c("% survived")
    kable(newTable, caption = "Survival rates on the Titanic by class")

Table 4: Survival rates on the Titanic by class

      % survived
1st   62
2nd   41
3rd   25
Crew  24

How else might we visually compare these sets of numbers?

slide-5
SLIDE 5

The Titanic

As lengths of bars, colour coded (and labelled) by class:

    nvals <- nrow(newTable)
    cols <- rainbow(nvals, alpha = 0.5)
    barplot(newTable, col = cols, horiz = TRUE,
            names.arg = c(""), axes = FALSE,
            xlab = colnames(newTable))
    xlocs <- cumsum(newTable)
    centres <- c(xlocs[1]/2, xlocs[1:(nvals - 1)] + diff(xlocs)/2)
    text(centres, 0.75, labels = rownames(newTable))

[Figure: a single horizontal bar split into segments labelled 1st, 2nd, 3rd, Crew; x-axis: % survived]

which compares lengths along a common NON-aligned scale.

slide-6
SLIDE 6

The Titanic

    barplot(newTable, col = cols, horiz = TRUE, beside = TRUE,
            names.arg = c(""),
            xlab = colnames(newTable),
            legend.text = rownames(newTable))

[Figure: four horizontal bars (legend: 1st, 2nd, 3rd, Crew) starting from a common baseline; x-axis: % survived, 0 to 60]

which compares lengths along a common ALIGNED scale.

slide-7
SLIDE 7

The Titanic

Survival and not surviving

    survivalProportions <- classTable
    survivalProportions["Yes",] <- survivalProportions["Yes", ] / classTotals
    survivalProportions["No",] <- survivalProportions["No", ] / classTotals
    survivalCols <- adjustcolor(c("black", "grey"), 0.5)
    barplot(survivalProportions, col = survivalCols,
            horiz = TRUE, beside = TRUE,
            xlab = "Proportion of class", xlim = c(0,1))
    legend("bottomright", title = "Survival",
           fill = survivalCols,
           legend = rownames(survivalProportions))

[Figure: paired horizontal bars (Survival: No, Yes) for 1st, 2nd, 3rd, Crew; x-axis: Proportion of class, 0.0 to 1.0]

slide-8
SLIDE 8

The Titanic

Survival and not surviving; frame

    barplot(survivalProportions, col = survivalCols,
            horiz = TRUE, beside = FALSE,
            xlab = "Proportion of class", space = 0)

[Figure: stacked horizontal bars for 1st, 2nd, 3rd, Crew, each filling the frame from 0.0 to 1.0; x-axis: Proportion of class]

Both are again along a common but non-aligned scale, but now bars to be compared are closer and we have the positive effect of framing.

slide-9
SLIDE 9

Warning – problems with stacked bars

Bars placed side by side are pretty natural in some contexts, for example when the horizontal axis (and bar width) represents time. Consider the following “sleep telemetry chart”:

[Figure: sleep telemetry chart. Source: www.trixietracker.com/tour/sleep/]

Yellow corresponds to when the baby is awake, blue to when they are asleep.

But take care when these bars are stacked on top of each other (as above; or placed side by side if arranged vertically). Look what happens with many stacked bars (and many bars in each).

slide-10
SLIDE 10

Warning – problems with stacked bars

Take care when placing bars of stacked colours side by side. For example, horizontal lines can look crooked.

slide-11
SLIDE 11

Warning – problems with stacked bars

slide-12
SLIDE 12

Warning – problems with stacked bars

Even when the rectangles are the same size, unintended visual effects can be introduced. All lines are perfectly horizontal! This is called the “cafe wall illusion” after a cafe in Bristol, England.

slide-13
SLIDE 13

Aside – The cafe wall illusion

Take care when placing bars of stacked colours side by side or you might induce unintended visual variation. Cafe on St. Michael’s Hill in Bristol, England

slide-14
SLIDE 14

The Titanic - Number of passengers by class

    barplot(apply(classTable, MARGIN = 2, FUN = sum),
            col = adjustcolor("steelblue", 0.5),
            xlab = "Class", ylab = "Number of passengers")

[Figure: bar chart of total counts for 1st, 2nd, 3rd, Crew; x-axis: Class; y-axis: Number of passengers, 0 to 800]

slide-15
SLIDE 15

The Titanic - Number who died in each class

    barplot(classTable["No",], col = survivalCols[1],
            xlab = "Class", ylab = "Number of passengers")

[Figure: bar chart of deaths for 1st, 2nd, 3rd, Crew; x-axis: Class; y-axis: Number of passengers, 0 to 600]

slide-16
SLIDE 16

The Titanic - Number who survived in each class

    barplot(classTable["Yes",], col = survivalCols[2],
            xlab = "Class", ylab = "Number of passengers")

[Figure: bar chart of survivors for 1st, 2nd, 3rd, Crew; x-axis: Class; y-axis: Number of passengers, 0 to 200]

slide-17
SLIDE 17

The Titanic - The proportion of deaths in each class

    barplot(classTable, col = survivalCols,
            xlab = "Class", ylab = "Number of passengers")

[Figure: stacked bar chart (died, survived) for 1st, 2nd, 3rd, Crew; x-axis: Class; y-axis: Number of passengers, 0 to 800]

slide-18
SLIDE 18

The Titanic

    savePar <- par(mfrow = c(1,3))
    barplot(apply(classTable, MARGIN = 2, FUN = sum),
            col = adjustcolor("steelblue", 0.5),
            ylim = c(0,1000), # ensure common scale
            xlab = "Class", ylab = "Number of passengers")
    barplot(classTable["No",], col = survivalCols[1],
            ylim = c(0,1000), # ensure common scale
            main = "Died",
            xlab = "Class", ylab = "Number of passengers")
    barplot(classTable["Yes",], col = survivalCols[2],
            ylim = c(0,1000), # ensure common scale
            main = "Survived",
            xlab = "Class", ylab = "Number of passengers")
    par(savePar)

slide-19
SLIDE 19

The Titanic

Comparing counts

[Figure: three bar charts side by side (all passengers, Died, Survived) for 1st, 2nd, 3rd, Crew; x-axis: Class; y-axis: Number of passengers, common scale 0 to 1000]

Can easily compare number of each class. Common aligned scales. Position, length, areas redundantly encode the values. Easier to compare the “shapes” of the distributions as well. Again, “Died” shape looks fairly similar to the total, except perhaps for 1st and 2nd classes. (Differences easier to tell in framed versions.)

slide-20
SLIDE 20

The Titanic

Comparing shapes - no common scale

    savePar <- par(mfrow = c(1,3))
    barplot(apply(classTable, MARGIN = 2, FUN = sum),
            col = adjustcolor("steelblue", 0.5),
            # NO COMMON SCALE
            main = "Total",
            xlab = "Class", ylab = "Number of passengers")
    barplot(classTable["No",], col = survivalCols[1],
            # NO COMMON SCALE
            main = "Died",
            xlab = "Class", ylab = "Number of passengers")
    barplot(classTable["Yes",], col = survivalCols[2],
            # NO COMMON SCALE
            main = "Survived",
            xlab = "Class", ylab = "Number of passengers")
    par(savePar)

slide-21
SLIDE 21

The Titanic

Comparing shapes - no common scale

[Figure: three bar charts side by side (Total, Died, Survived) for 1st, 2nd, 3rd, Crew; x-axis: Class; y-axis: Number of passengers, each panel on its own scale (up to 800, 500 and 200 respectively)]

Different scaling makes it easier to compare the “shapes” of the distributions but harder to compare the actual values.
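One way to compare shapes while still sharing a single aligned scale is to convert each set of counts to proportions of its own total. This is a minimal sketch of that idea, assuming only the built-in ‘Titanic’ table; the names ‘counts’ and ‘shapes’ are illustrative and the approach is an alternative to, not a reproduction of, the plots on the slides.

```r
# Survived x Class counts from the built-in Titanic table
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)

# One row of class counts per group to be compared
counts <- rbind(Total    = colSums(classTable),
                Died     = classTable["No", ],
                Survived = classTable["Yes", ])

# Divide each row by its own total so that every row sums to 1
shapes <- prop.table(counts, margin = 1)
print(round(shapes, 2))

# Grouped bars: one group per class, all on one common, aligned scale
barplot(shapes, beside = TRUE, legend.text = rownames(shapes),
        xlab = "Class", ylab = "Proportion of group")
```

The trade-off is the reverse of the one above: the shapes become directly comparable, but the actual counts are no longer visible.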

slide-22
SLIDE 22

South African heart disease

Here we will look at a dataset ‘SAheart’ from the package ‘ElemStatLearn’. It is a sample from a retrospective study of heart disease in males from a high-risk region of the Western Cape, South Africa. There are 462 cases and 10 variates (see ‘help(SAheart, package="ElemStatLearn")’ for details).

For example, ‘sbp’ is the measured systolic blood pressure, which is the blood pressure when the heart pumps; ‘chd’ is 1 if the patient has coronary heart disease; and ‘famhist’ indicates whether or not the patient has a family history of heart disease.

    library(ElemStatLearn)
    kable(head(SAheart))

sbp  tobacco  ldl   adiposity  famhist  typea  obesity  alcohol  age  chd
160  12.00    5.73  23.11      Present  49     25.30    97.20    52   1
144   0.01    4.41  28.61      Absent   55     28.87     2.06    63   1
118   0.08    3.48  32.28      Present  52     29.14     3.81    46   0
170   7.50    6.41  38.03      Present  51     31.99    24.26    58   1
134  13.60    3.50  27.78      Present  60     25.99    57.34    49   1
132   6.20    6.47  36.21      Present  62     30.77    14.14    45   0

slide-23
SLIDE 23

South African heart disease

Some have a family history of heart disease, others do not.

    noFamilyHistory <- SAheart[, "famhist"] == "Absent"
    # Number with no family history of heart disease
    sum(noFamilyHistory)
    ## [1] 270

    # Number with family history of heart disease
    FamilyHistory <- SAheart[, "famhist"] == "Present"
    sum(FamilyHistory)
    ## [1] 192

Can we compare the distributions of the values of sbp, systolic blood pressure, for those patients who have a family history with those who do not?

slide-24
SLIDE 24

South African heart disease

Comparing systolic blood pressure for those with and without family history via boxplots

    savePar <- par(mfrow = c(2,1))
    famHistoryCol <- adjustcolor("steelblue", 0.5)
    noHistoryCol <- adjustcolor("firebrick", 0.5)
    boxplot(SAheart[noFamilyHistory,"sbp"], col = noHistoryCol,
            main = "No family history", horizontal = TRUE)
    boxplot(SAheart[FamilyHistory,"sbp"], col = famHistoryCol,
            main = "Have family history", horizontal = TRUE)
    par(savePar)

slide-25
SLIDE 25

South African heart disease

What can be compared via boxplots?

[Figure: two horizontal boxplots stacked vertically, titled "No family history" and "Have family history"; each x-axis runs from about 100 to 220]

Note the scales are not identical.

slide-26
SLIDE 26

South African heart disease

Place them on common aligned scales

    boxplot(sbp ~ famhist, data = SAheart,
            col = c(noHistoryCol, famHistoryCol),
            main = "Systolic blood pressure",
            horizontal = TRUE)

[Figure: "Systolic blood pressure"; horizontal boxplots for Absent and Present on one shared axis, 100 to 220]
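What a boxplot lets us compare is essentially a five-number summary per group plus any flagged outliers. The sketch below makes those numbers explicit. It is self-contained and therefore uses a synthetic stand-in for ‘SAheart’ (an assumption: only the group sizes 270 and 192 come from the slides; the means and standard deviation are arbitrary), so the printed values are not those of the real data.

```r
# Synthetic stand-in for SAheart (assumption: NOT the real data).
# Group sizes match the slides; means and sd are arbitrary choices.
set.seed(1)
famhist <- factor(rep(c("Absent", "Present"), times = c(270, 192)))
sbp <- rnorm(462, mean = ifelse(famhist == "Absent", 136, 141), sd = 20)

# Five-number summary per group: min, lower hinge, median, upper hinge, max
fiveNumbers <- sapply(split(sbp, famhist), fivenum)
rownames(fiveNumbers) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(round(fiveNumbers, 1))

# With plot = FALSE, boxplot() returns the statistics it would draw
bp <- boxplot(sbp ~ famhist, plot = FALSE)
print(bp$n)            # group sizes
print(length(bp$out))  # number of points flagged beyond the whiskers
```

With the real data, replacing the first three statements by ‘sbp <- SAheart$sbp’ and ‘famhist <- SAheart$famhist’ should give the summaries behind the boxplots shown above.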

slide-27
SLIDE 27

South African heart disease

    library(ggplot2)
    ggplot(data = SAheart,
           mapping = aes(x = famhist, y = sbp)) +
      geom_boxplot(colour = c("firebrick", "steelblue"),
                   fill = c("firebrick", "steelblue"),
                   alpha = 0.5) +
      coord_flip()

[Figure: ggplot2 horizontal boxplots of sbp (100 to 200) for famhist = Absent and Present]

slide-28
SLIDE 28

South African heart disease

What can be compared via histograms?

    savePar <- par(mfrow = c(1,2))
    hist(SAheart[noFamilyHistory,"sbp"], col = noHistoryCol,
         main = "No family history")
    hist(SAheart[FamilyHistory,"sbp"], col = famHistoryCol,
         main = "Have family history")
    par(savePar)

Note the scales are not necessarily identical.

slide-29
SLIDE 29

South African heart disease

What can be compared via histograms?

[Figure: two histograms side by side, "No family history" (Frequency up to 60) and "Have family history" (Frequency up to 50); x-axes: sbp, 100 to 220]

Note the scales are not necessarily identical.

slide-30
SLIDE 30

South African heart disease

Place histograms on a common (in both x and y), aligned (x only) scale

    savePar <- par(mfrow = c(2,1))
    hist(SAheart[noFamilyHistory,"sbp"], col = noHistoryCol,
         main = "No family history",
         xlab = "systolic blood pressure",
         xlim = extendrange(SAheart[,"sbp"]), ylim = c(0, 60))
    hist(SAheart[FamilyHistory,"sbp"], col = famHistoryCol,
         main = "Have family history",
         xlab = "systolic blood pressure",
         xlim = extendrange(SAheart[,"sbp"]), ylim = c(0, 60))
    par(savePar)

slide-31
SLIDE 31

South African heart disease

Place histograms on a common (in both x and y), aligned (x only) scale

[Figure: two histograms stacked vertically, "No family history" and "Have family history"; x-axis: systolic blood pressure, 100 to 220; y-axis: Frequency, 0 to 60 in both panels]
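The limit ‘ylim = c(0, 60)’ above is hard-coded, and the two histograms are free to choose different bins. A minimal sketch of deriving both from the data follows. It runs on a synthetic stand-in for ‘SAheart’ (an assumption: group sizes are from the slides, the distribution parameters are arbitrary), and the choice of 12 break points simply mirrors the later slides.

```r
# Synthetic stand-in for SAheart (assumption: NOT the real data)
set.seed(1)
famhist <- factor(rep(c("Absent", "Present"), times = c(270, 192)))
sbp <- rnorm(462, mean = ifelse(famhist == "Absent", 136, 141), sd = 20)
groups <- split(sbp, famhist)

# Shared breaks spanning all of the data give identical bins in both panels
xrange <- extendrange(sbp)
breaks <- seq(xrange[1], xrange[2], length.out = 12)
h <- lapply(groups, hist, breaks = breaks, plot = FALSE)

# Shared y limit taken from the tallest bar in either group
ymax <- max(sapply(h, function(hi) max(hi$counts)))

# Stack the panels so that the x scales are aligned as well as common
savePar <- par(mfrow = c(2, 1))
for (g in names(h)) {
  plot(h[[g]], main = g, xlab = "systolic blood pressure (synthetic)",
       xlim = xrange, ylim = c(0, ymax))
}
par(savePar)
```

Computing the histograms first with ‘plot = FALSE’ is what makes the common y limit available before anything is drawn.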

slide-32
SLIDE 32

South African heart disease

Place them on a common aligned scale

    hist(SAheart[noFamilyHistory,"sbp"], col = noHistoryCol,
         main = "Overlaid: pink without history, blue with",
         xlim = extendrange(SAheart[,"sbp"]))
    hist(SAheart[FamilyHistory,"sbp"], col = famHistoryCol,
         xlim = extendrange(SAheart[,"sbp"]),
         add = TRUE)

slide-33
SLIDE 33

South African heart disease

Place them on a common aligned scale

[Figure: "Overlaid: pink without history, blue with"; two semi-transparent histograms on the same axes; x-axis: sbp, 100 to 220; y-axis: Frequency, 0 to 60]

slide-34
SLIDE 34

South African heart disease

Reflected

    xrange <- extendrange(SAheart[,"sbp"])
    breaks <- seq(xrange[1], xrange[2], length.out = 12)
    h1 <- hist(SAheart[noFamilyHistory,"sbp"], breaks = breaks, plot = FALSE)
    h2 <- hist(SAheart[FamilyHistory,"sbp"], breaks = breaks, plot = FALSE)
    hmax <- max(c(h1$counts, h2$counts))
    h2$counts <- - h2$counts
    hmin <- -hmax
    X <- c(h1$breaks, h2$breaks)
    xmax <- max(X)
    xmin <- min(X)
    plot(h1, xlab = "Systolic blood pressure",
         main = "Comparing patients with (blue) and without (pink)",
         ylim = c(hmin, hmax), xlim = c(xmin, xmax),
         col = noHistoryCol)
    lines(h2, col = famHistoryCol)

slide-35
SLIDE 35

South African heart disease

Reflected

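[Figure: "Comparing patients with (blue) and without (pink)"; histogram without family history drawn above the axis and with family history reflected below it; x-axis: Systolic blood pressure, 100 to 220; y-axis: Frequency, −60 to 60]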

slide-36
SLIDE 36

South African heart disease

Back to back

    yrange <- extendrange(SAheart[,"sbp"])
    breaks <- seq(yrange[1], yrange[2], length.out = 12)
    h1 <- hist(SAheart[noFamilyHistory,"sbp"], breaks = breaks, plot = FALSE)
    h2 <- hist(SAheart[FamilyHistory,"sbp"], breaks = breaks, plot = FALSE)
    hmax <- max(c(h1$counts, h2$counts))
    h2$counts <- - h2$counts
    hmin <- -hmax
    Y <- rep(h1$breaks, each = 2)
    X <- c(0, rep(h1$counts, each = 2), 0)
    # Create a plot with nothing inside except the zero line
    plot(rep(0,2), range(Y), type = "l", col = "black",
         xlim = c(hmin, hmax), ylim = extendrange(Y),
         bty = "n",
         xlab = "Frequency", ylab = "Systolic blood pressure",
         main = "Comparing patients with (blue) and without (pink)")
    # Right-hand side: no family history
    polygon(X, Y, col = noHistoryCol)
    # Draw the line separating each bin from the one above it
    for (i in seq_along(h1$counts)) {
      lines(c(0, h1$counts[i]), rep(h1$breaks[i+1], 2))
    }
    # Left-hand side: family history (negated counts)
    Y <- rep(h2$breaks, each = 2)
    X <- c(0, rep(h2$counts, each = 2), 0)
    polygon(X, Y, col = famHistoryCol)
    for (i in seq_along(h2$counts)) {
      lines(c(0, h2$counts[i]), rep(h2$breaks[i+1], 2))
    }

slide-37
SLIDE 37

South African heart disease

Back to back

Often see demographic data plotted this way

[Figure: "Comparing patients with (blue) and without (pink)"; back-to-back histograms about a vertical zero line; x-axis: Frequency, −60 to 60; y-axis: Systolic blood pressure, 100 to 220]
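A back-to-back display can also be sketched with two calls to ‘barplot()’ rather than hand-built polygons. This swaps in a different technique from the one on the slides, and it runs on a synthetic stand-in for ‘SAheart’ (an assumption: group sizes are from the slides, everything else is arbitrary); ‘binLabels’ and the use of ‘pretty()’ for the breaks are illustrative choices.

```r
# Synthetic stand-in for SAheart (assumption: NOT the real data)
set.seed(1)
famhist <- factor(rep(c("Absent", "Present"), times = c(270, 192)))
sbp <- rnorm(462, mean = ifelse(famhist == "Absent", 136, 141), sd = 20)

# Common bins for both groups; pretty() returns round values covering the range
breaks <- pretty(range(sbp), n = 12)
hA <- hist(sbp[famhist == "Absent"],  breaks = breaks, plot = FALSE)
hP <- hist(sbp[famhist == "Present"], breaks = breaks, plot = FALSE)
m <- max(hA$counts, hP$counts)
binLabels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Right side: Absent. space = 0 makes the bars touch, as in a histogram.
barplot(hA$counts, horiz = TRUE, space = 0, xlim = c(-m, m),
        names.arg = binLabels, las = 1,
        col = adjustcolor("firebrick", 0.5),
        xlab = "Frequency",
        main = "Present (left) and Absent (right), synthetic data")
# Left side: Present, drawn with negated counts on the same axes
barplot(-hP$counts, horiz = TRUE, space = 0, add = TRUE, axes = FALSE,
        col = adjustcolor("steelblue", 0.5))
```

As in the polygon version, the left-hand frequencies appear with negative axis labels; only their magnitudes are meaningful.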

slide-38
SLIDE 38

Population pyramids

Often see demographic data plotted this way - so-called “population pyramids”

slide-46
SLIDE 46

Population pyramids

You can even group common patterns

slide-47
SLIDE 47

Comparing distributions

Features in common and differences are interesting:

slide-48
SLIDE 48

Comparing distributions

What’s different here? What’s the same?

slide-49
SLIDE 49

Comparing distributions

What’s different here? What’s the same?

slide-50
SLIDE 50

Comparing distributions

A closer look at Bahrain:

Source: Government of Bahrain Census Summary Result 2010 (Population, Housing, . . . Census)

slide-51
SLIDE 51

Comparing distributions

A closer look at Bahrain:

non-Bahrainis

slide-52
SLIDE 52

Comparing distributions

A closer look at Bahrain:

non-Bahrainis Bahrainis Source: Government of Bahrain Census Summary Result 2010 (Population, Housing, . . . Census)

slide-53
SLIDE 53

Population pyramids

Often see demographic data plotted this way - so-called “population pyramids” Aside: Note shimmering effect.

slide-54
SLIDE 54

South African heart disease

Using ‘facets’ with ggplot2

    library(ggplot2)
    ggplot(data = SAheart,
           mapping = aes(x = sbp)) +
      geom_histogram(bins = 12, colour = "grey50", fill = "white") +
      facet_grid(famhist ~ .)

slide-55
SLIDE 55

South African heart disease

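[Figure: ggplot2 histograms of sbp in two vertically stacked facets, Absent and Present; x-axis: sbp, about 100 to 220; y-axis: count, 0 to 60 in both facets]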

slide-56
SLIDE 56

South African heart disease

What can be compared via density estimates?

    savePar <- par(mfrow = c(1,2))
    densAbsent <- density(SAheart[noFamilyHistory,"sbp"], bw = "SJ")
    densPresent <- density(SAheart[FamilyHistory,"sbp"], bw = "SJ")
    plot(densAbsent, col = "firebrick", main = "No family history")
    polygon(densAbsent, col = noHistoryCol)
    plot(densPresent, col = "steelblue", main = "Family history")
    polygon(densPresent, col = famHistoryCol)
    par(savePar)

Note the scales are not necessarily identical.

slide-57
SLIDE 57

South African heart disease

What can be compared via density estimates?

[Figure: two density estimates side by side. "No family history": N = 270, Bandwidth = 5.423. "Family history": N = 192, Bandwidth = 4.607. y-axis: Density, 0.000 to 0.025]

Note the scales are not necessarily identical.

slide-58
SLIDE 58

South African heart disease

Common (both x,y) aligned (x only) scales

    savePar <- par(mfrow = c(2,1))
    xlim <- extendrange(SAheart[,"sbp"])
    densAbsent <- density(SAheart[noFamilyHistory,"sbp"], bw = "SJ")
    densPresent <- density(SAheart[FamilyHistory,"sbp"], bw = "SJ")
    ylim <- extendrange(c(densAbsent$y, densPresent$y))
    plot(densAbsent, col = "firebrick", main = "No family history",
         xlim = xlim, ylim = ylim)
    polygon(densAbsent, col = noHistoryCol)
    plot(densPresent, col = "steelblue", main = "Family history",
         xlim = xlim, ylim = ylim)
    polygon(densPresent, col = famHistoryCol)
    par(savePar)

slide-59
SLIDE 59

South African heart disease

Common (both x,y) aligned (x only) scales

[Figure: two density estimates stacked vertically on identical axes. "No family history": N = 270, Bandwidth = 5.423. "Family history": N = 192, Bandwidth = 4.607. x-axis: 100 to 220; y-axis: Density]
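Setting ‘xlim’ and ‘ylim’ makes the axes common, but each call to ‘density()’ still evaluates its estimate on its own grid of x values. The sketch below evaluates both estimates on one shared grid via the ‘from’, ‘to’ and ‘n’ arguments, which makes the curves comparable point by point. It uses a synthetic stand-in for ‘SAheart’ (an assumption), and the padding of three bandwidths is an arbitrary illustrative choice.

```r
# Synthetic stand-in for SAheart (assumption: NOT the real data)
set.seed(1)
famhist <- factor(rep(c("Absent", "Present"), times = c(270, 192)))
sbp <- rnorm(462, mean = ifelse(famhist == "Absent", 136, 141), sd = 20)
groups <- split(sbp, famhist)

# Each group keeps its own Sheather-Jones bandwidth, as on the slides
bwA <- bw.SJ(groups$Absent)
bwP <- bw.SJ(groups$Present)

# One evaluation grid for both estimates, padded beyond the data
pad  <- 3 * max(bwA, bwP)
from <- min(sbp) - pad
to   <- max(sbp) + pad
densAbsent  <- density(groups$Absent,  bw = bwA, from = from, to = to, n = 512)
densPresent <- density(groups$Present, bw = bwP, from = from, to = to, n = 512)

# Common y limit, then stacked panels so the x scales are aligned
ylim <- c(0, max(densAbsent$y, densPresent$y))
savePar <- par(mfrow = c(2, 1))
plot(densAbsent,  main = "Absent (synthetic)",  xlim = c(from, to), ylim = ylim)
plot(densPresent, main = "Present (synthetic)", xlim = c(from, to), ylim = ylim)
par(savePar)
```

Note that the two bandwidths still differ, as they do in the figure above, so some difference in smoothness between the curves reflects the bandwidth choice rather than the data.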

slide-60
SLIDE 60

South African heart disease

Common aligned scales (overlaid densities with transparency)

    xlim <- extendrange(SAheart[,"sbp"])
    densAbsent <- density(SAheart[noFamilyHistory,"sbp"], bw = "SJ")
    densPresent <- density(SAheart[FamilyHistory,"sbp"], bw = "SJ")
    ylim <- extendrange(c(densAbsent$y, densPresent$y))
    plot(densAbsent, col = "firebrick",
         xlab = "Systolic blood pressure",
         main = "Comparing a family history with no family history",
         xlim = xlim, ylim = ylim)
    polygon(densAbsent, col = noHistoryCol)
    lines(densPresent, col = "steelblue")
    polygon(densPresent, col = famHistoryCol)

slide-61
SLIDE 61

South African heart disease

Common aligned scales (overlaid densities with transparency)

[Figure: "Comparing a family history with no family history"; two overlaid semi-transparent density estimates; x-axis: Systolic blood pressure, 100 to 220; y-axis: Density, 0.000 to 0.025]

slide-62
SLIDE 62

South African heart disease

Common aligned scales - reflected

    xlim <- extendrange(SAheart[,"sbp"])
    densAbsent <- density(SAheart[noFamilyHistory,"sbp"], bw = "SJ")
    densPresent <- density(SAheart[FamilyHistory,"sbp"], bw = "SJ")
    densPresent$y <- - densPresent$y
    ylim <- extendrange(c(densAbsent$y, densPresent$y))
    plot(densAbsent, col = "firebrick",
         xlab = "Systolic blood pressure",
         main = "Comparing a family history with no family history",
         xlim = xlim, ylim = ylim)
    polygon(densAbsent, col = noHistoryCol)
    lines(densPresent, col = "steelblue")
    polygon(densPresent, col = famHistoryCol)

slide-63
SLIDE 63

South African heart disease

Common aligned scales - reflected

[Figure: "Comparing a family history with no family history"; density without family history above the axis, density with family history reflected below; x-axis: Systolic blood pressure, 100 to 220; y-axis: Density, −0.03 to 0.02]

slide-64
SLIDE 64

South African heart disease

Back to back

    ylim <- extendrange(SAheart[,"sbp"])
    densAbsent <- density(SAheart[noFamilyHistory,"sbp"], bw = "SJ")
    densPresent <- density(SAheart[FamilyHistory,"sbp"], bw = "SJ")
    densPresent$y <- - densPresent$y
    xlim <- extendrange(c(densAbsent$y, densPresent$y))
    # Swap the x and y components of a density (or any list with x and y)
    xyswitch <- function(xy_thing) {
      yx_thing <- xy_thing
      yx_thing$x <- xy_thing$y
      yx_thing$y <- xy_thing$x
      yx_thing
    }
    plot(xyswitch(densAbsent), col = "firebrick",
         xlab = "Density", ylab = "Systolic blood pressure",
         main = "Comparing a family history with no family history",
         xlim = xlim, ylim = ylim)
    polygon(xyswitch(densAbsent), col = noHistoryCol)
    lines(xyswitch(densPresent), col = "steelblue")
    polygon(xyswitch(densPresent), col = famHistoryCol)

slide-65
SLIDE 65

South African heart disease

Back to back

[Figure: "Comparing a family history with no family history"; back-to-back density estimates about a vertical zero line; x-axis: Density, −0.03 to 0.02; y-axis: Systolic blood pressure, 100 to 220]

slide-66
SLIDE 66

South African heart disease

Using ‘facets’ with ggplot2

    library(ggplot2)
    ggplot(data = SAheart,
           mapping = aes(x = sbp, col = famhist)) +
      geom_density(colour = "grey50", fill = "black", alpha = 0.4, bw = "SJ") +
      facet_grid(famhist ~ .)

slide-67
SLIDE 67

South African heart disease

Using ‘facets’ with ggplot2

[Figure: ggplot2 density estimates of sbp in two vertically stacked facets, Absent and Present; x-axis: sbp, 100 to 200; y-axis: density, 0.00 to 0.02 in both facets]

slide-68
SLIDE 68

South African heart disease

Simple points

    xlim <- extendrange(SAheart[,"sbp"])
    n <- nrow(SAheart)
    col <- rep(adjustcolor("firebrick", 0.2), n)
    col[FamilyHistory] <- adjustcolor("steelblue", 0.2)
    Y <- rep(1, n)
    Y[FamilyHistory] <- -1
    plot(SAheart[,"sbp"], y = Y, col = col, pch = 19, cex = 3,
         xlab = "Systolic blood pressure", ylab = "",
         main = "Comparing a family history with no family history",
         xlim = xlim, ylim = c(-2,2), bty = "n", yaxt = "n")

slide-69
SLIDE 69

South African heart disease

Simple points

[Figure: "Comparing a family history with no family history"; two horizontal rows of semi-transparent points, one per group; x-axis: Systolic blood pressure, 100 to 220]

slide-70
SLIDE 70

South African heart disease

Simple points with jittering

    xlim <- extendrange(SAheart[,"sbp"])
    n <- nrow(SAheart)
    col <- rep(adjustcolor("firebrick", 0.2), n)
    col[FamilyHistory] <- adjustcolor("steelblue", 0.2)
    Y <- rep(1, n)
    Y[FamilyHistory] <- -1
    U <- runif(n, -0.3, 0.3)
    plot(SAheart[,"sbp"], y = Y + U, col = col, pch = 19, cex = 3,
         xlab = "Systolic blood pressure", ylab = "",
         main = "Comparing a family history with no family history",
         xlim = xlim, ylim = c(-2,2), bty = "n", yaxt = "n")

slide-71
SLIDE 71

South African heart disease

Simple points with jittering

[Figure: "Comparing a family history with no family history"; two horizontal bands of vertically jittered semi-transparent points, one per group; x-axis: Systolic blood pressure, 100 to 220]
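The jittering above is written for exactly two groups. A minimal sketch of a version that works for any number of groups follows, together with base R's ‘stripchart()’, which can apply the jitter itself. It runs on a synthetic stand-in for ‘SAheart’ (an assumption: group sizes are from the slides, the rest is arbitrary); the jitter half-width of 0.3 is copied from the slide.

```r
# Synthetic stand-in for SAheart (assumption: NOT the real data)
set.seed(1)
famhist <- factor(rep(c("Absent", "Present"), times = c(270, 192)))
sbp <- rnorm(462, mean = ifelse(famhist == "Absent", 136, 141), sd = 20)
n <- length(sbp)

# Manual jitter: the group index (1, 2, ...) plus uniform vertical noise
Y <- as.numeric(famhist) + runif(n, -0.3, 0.3)
cols <- adjustcolor(c("firebrick", "steelblue"), 0.2)
plot(sbp, Y, col = cols[as.numeric(famhist)], pch = 19, cex = 2,
     xlab = "Systolic blood pressure (synthetic)", ylab = "",
     yaxt = "n", bty = "n")
axis(2, at = seq_along(levels(famhist)), labels = levels(famhist), las = 1)

# Built-in alternative: stripchart() adds the jitter and the group labels
stripchart(sbp ~ famhist, method = "jitter", jitter = 0.3,
           pch = 19, cex = 2, col = cols,
           xlab = "Systolic blood pressure (synthetic)")
```

The jitter carries no information; it only spreads overlapping points so that the transparency can show where values are dense.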

slide-72
SLIDE 72

South African heart disease

Quantile plots

    savePar <- par(mfrow = c(1,2))
    nAbsent <- sum(noFamilyHistory)
    nPresent <- sum(FamilyHistory)
    pAbsent <- ppoints(nAbsent)
    pPresent <- ppoints(nPresent)
    plot(pAbsent, sort(SAheart[noFamilyHistory,"sbp"]),
         type = "b", col = noHistoryCol, pch = 19,
         xlab = "Cumulative proportion",
         ylab = "Systolic blood pressure",
         main = "no family history")
    plot(pPresent, sort(SAheart[FamilyHistory,"sbp"]),
         type = "b", col = famHistoryCol, pch = 19,
         xlab = "Cumulative proportion",
         ylab = "Systolic blood pressure",
         main = "family history")
    par(savePar)

slide-73
SLIDE 73

South African heart disease

Quantile plots

[Figure: two quantile plots side by side, "no family history" and "family history"; x-axis: Cumulative proportion, 0.0 to 1.0; y-axis: Systolic blood pressure, 100 to 220]

slide-74
SLIDE 74

South African heart disease

Quantile plots (common aligned scales via overlaying)

    ylim <- extendrange(SAheart[,"sbp"])
    nAbsent <- sum(noFamilyHistory)
    nPresent <- sum(FamilyHistory)
    pAbsent <- ppoints(nAbsent)
    pPresent <- ppoints(nPresent)
    plot(pAbsent, sort(SAheart[noFamilyHistory,"sbp"]),
         type = "b", col = noHistoryCol, pch = 19,
         ylim = ylim,
         xlab = "Cumulative proportion",
         ylab = "Systolic blood pressure",
         main = "Comparing with (blue) to no family history (pink)")
    points(pPresent, sort(SAheart[FamilyHistory,"sbp"]),
           type = "b", col = famHistoryCol, pch = 19)

slide-75
SLIDE 75

South African heart disease

Quantile plots (common aligned scales via overlaying)

[Figure: "Comparing with (blue) to no family history (pink)"; two overlaid quantile plots; x-axis: Cumulative proportion, 0.0 to 1.0; y-axis: Systolic blood pressure, 100 to 220]

How do these two distributions compare? In location? Scale? Modality? Tails?
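Some of these questions can be checked numerically alongside the plots. The sketch below tabulates matching quantiles for the two groups, summarises location by the difference in medians and scale by the ratio of interquartile ranges, and draws a two-sample quantile-quantile plot with base R's ‘qqplot()’. It runs on a synthetic stand-in for ‘SAheart’ (an assumption), so the numbers are illustrative only, and the choice of probabilities is arbitrary.

```r
# Synthetic stand-in for SAheart (assumption: NOT the real data)
set.seed(1)
famhist <- factor(rep(c("Absent", "Present"), times = c(270, 192)))
sbp <- rnorm(462, mean = ifelse(famhist == "Absent", 136, 141), sd = 20)
groups <- split(sbp, famhist)

# Matching quantiles for the two groups, one column per group
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
q <- sapply(groups, quantile, probs = probs)
print(round(q, 1))

# Location: difference in medians. Scale: ratio of interquartile ranges.
medianShift <- unname(q["50%", "Present"] - q["50%", "Absent"])
iqrRatio <- IQR(groups$Present) / IQR(groups$Absent)
print(c(medianShift = medianShift, iqrRatio = iqrRatio))

# Two-sample QQ plot: points on the dashed line mean equal quantiles
qqplot(groups$Absent, groups$Present,
       xlab = "sbp quantiles, Absent (synthetic)",
       ylab = "sbp quantiles, Present (synthetic)")
abline(0, 1, lty = 2)
```

In the QQ plot, a roughly parallel offset from the dashed line suggests a difference in location, a different slope a difference in scale, and curvature at the ends a difference in the tails. Modality is better judged from the histograms and density estimates.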