Visual comparisons - Comparing distributions: Part 1 - R.W. Oldford
The Titanic
The data set 'Titanic' provides "information on the fate of passengers on the fatal maiden voyage of the ocean liner 'Titanic', summarized according to economic status (class), sex, age and survival."

The Titanic data records the number of passengers in various categories for four different categorical variates:

No.   Variate    Values
1     Class      1st, 2nd, 3rd, Crew
2     Sex        Male, Female
3     Age        Child, Adult
4     Survived   No, Yes
The Titanic
Might be interested in comparing classes by survival
library(knitr)
# Subtable of survival/not by class
classTable <- apply(Titanic, MARGIN = c(4,1), FUN = sum)
kable(classTable)

        1st   2nd   3rd   Crew
No      122   167   528   673
Yes     203   118   178   212

# Number in each class is
classTotals <- apply(classTable, MARGIN = 2, FUN = sum)
classSurvival <- t(classTable["Yes", ] / classTotals)
rownames(classSurvival) <- c("Survived")
kable(classSurvival)

            1st         2nd         3rd         Crew
Survived    0.6246154   0.4140351   0.2521246   0.239548
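The same per-class proportions can also be obtained in one step with base R's prop.table(), which divides each column of a table by its column total. This is a minimal sketch rather than the slides' own code; the names classProps and survivalRate are introduced here for illustration only.

```r
# Counts of survival (rows) by class (columns), as in the slides
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)

# classProps (hypothetical name): column-wise proportions,
# so the "No" and "Yes" entries of each class sum to 1
classProps <- prop.table(classTable, margin = 2)

# survivalRate (hypothetical name): proportion surviving in each class
survivalRate <- classProps["Yes", ]
survivalRate
```

Keeping the whole table of proportions, rather than only the "Yes" row, is convenient later when both outcomes are plotted together.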
The Titanic
Following the rules for tables, a better way to present these numbers is as
# Rescale and round to two decimals
newTable <- 100 * round(classSurvival, 2)
# swap rows and columns
newTable <- t(newTable)
# Values are already in the right order, but in general
# order the values in descending order
descendingOrder <- order(newTable, decreasing = TRUE)
newTable <- newTable[descendingOrder, , drop = FALSE]  # Note drop argument
colnames(newTable) <- c("% survived")
kable(newTable, caption = "Survival rates on the Titanic by class")

Table 4: Survival rates on the Titanic by class

        % survived
1st     62
2nd     41
3rd     25
Crew    24

How else might we visually compare these sets of numbers?
The Titanic
As lengths of bars, colour coded (and labelled) by class:
nvals <- nrow(newTable)
cols <- rainbow(nvals, alpha = 0.5)
barplot(newTable, col = cols, horiz = TRUE,
        names.arg = c(""), axes = FALSE,
        xlab = colnames(newTable))
xlocs <- cumsum(newTable)
centres <- c(xlocs[1]/2, xlocs[1:(nvals - 1)] + diff(xlocs)/2)
text(centres, 0.75, labels = rownames(newTable))
[Figure: one horizontal stacked bar with segments labelled 1st, 2nd, 3rd, Crew; x-axis: % survived]
which compares lengths along a common NON-aligned scale.
The Titanic
barplot(newTable, col = cols, horiz = TRUE, beside = TRUE,
        names.arg = c(""), xlab = colnames(newTable),
        legend.text = rownames(newTable))
[Figure: four horizontal bars drawn side by side from a common baseline; legend: Crew, 3rd, 2nd, 1st; x-axis: % survived]
which compares lengths along a common ALIGNED scale.
The Titanic
Survival and not surviving
survivalProportions <- classTable
survivalProportions["Yes", ] <- survivalProportions["Yes", ] / classTotals
survivalProportions["No", ] <- survivalProportions["No", ] / classTotals
survivalCols <- adjustcolor(c("black", "grey"), 0.5)
barplot(survivalProportions, col = survivalCols,
        horiz = TRUE, beside = TRUE,
        xlab = "Proportion of class", xlim = c(0, 1))
legend("bottomright", title = "Survival",
       fill = survivalCols,
       legend = rownames(survivalProportions))
[Figure: paired horizontal bars (No, Yes) for each of 1st, 2nd, 3rd, Crew; x-axis: Proportion of class, 0.0 to 1.0; legend: Survival (No, Yes)]
The Titanic
Survival and not surviving; frame
barplot(survivalProportions, col = survivalCols,
        horiz = TRUE, beside = FALSE,
        xlab = "Proportion of class", space = 0)
[Figure: horizontal stacked bars (No, then Yes) for 1st, 2nd, 3rd, Crew with no gaps between bars; x-axis: Proportion of class, 0.0 to 1.0]
Both are again along a common but non-aligned scale, but now bars to be compared are closer and we have the positive effect of framing.
Warning – problems with stacked bars
Bars placed side by side are pretty natural in some contexts, for example when the horizontal axis (and bar width) represents time. For example, consider the following “sleep telemetry chart”: Yellow corresponds to when the baby is awake, blue when they are asleep. But take care when these bars are stacked on top of each other (as above; or placed side by side if arranged vertically). Look what happens for many many stacked bars (and many bars in each).
www.trixietracker.com/tour/sleep/
Warning – problems with stacked bars
Take care when placing bars of stacked colours side by side. For example, Horizontal lines look crooked.
Warning – problems with stacked bars
Even when the rectangles are the same size, unintended visual effects can be introduced. All lines are perfectly horizontal! This is called the “cafe wall illusion” after a cafe in Bristol, England.
Aside – The cafe wall illusion
Take care when placing bars of stacked colours side by side or you might induce unintended visual variation. Cafe on St. Michael’s Hill in Bristol, England
The Titanic - Number of passengers by class
barplot(apply(classTable, MARGIN = 2, FUN = sum),
        col = adjustcolor("steelblue", 0.5),
        xlab = "Class", ylab = "Number of passengers")
[Figure: bar chart; x-axis: Class (1st, 2nd, 3rd, Crew); y-axis: Number of passengers]
The Titanic - Number who died in each class
barplot(classTable["No", ],
        col = survivalCols[1],
        xlab = "Class", ylab = "Number of passengers")
[Figure: bar chart of those who died; x-axis: Class (1st, 2nd, 3rd, Crew); y-axis: Number of passengers]
The Titanic - Number who survived in each class
barplot(classTable["Yes", ],
        col = survivalCols[2],
        xlab = "Class", ylab = "Number of passengers")
[Figure: bar chart of those who survived; x-axis: Class (1st, 2nd, 3rd, Crew); y-axis: Number of passengers]
The Titanic - The proportion of deaths in each class
barplot(classTable,
        col = survivalCols,
        xlab = "Class", ylab = "Number of passengers")
[Figure: stacked bar chart (No below, Yes above) for each class; x-axis: Class (1st, 2nd, 3rd, Crew); y-axis: Number of passengers]
The Titanic
savePar <- par(mfrow = c(1, 3))
barplot(apply(classTable, MARGIN = 2, FUN = sum),
        col = adjustcolor("steelblue", 0.5),
        ylim = c(0, 1000),  # ensure common scale
        xlab = "Class", ylab = "Number of passengers")
barplot(classTable["No", ],
        col = survivalCols[1],
        ylim = c(0, 1000),  # ensure common scale
        main = "Died",
        xlab = "Class", ylab = "Number of passengers")
barplot(classTable["Yes", ],
        col = survivalCols[2],
        ylim = c(0, 1000),  # ensure common scale
        main = "Survived",
        xlab = "Class", ylab = "Number of passengers")
par(savePar)
The Titanic
Comparing counts
[Figure: three bar charts side by side (all passengers, "Died", "Survived"); x-axis: Class (1st, 2nd, 3rd, Crew); y-axis: Number of passengers, on a common scale up to 1000]
Can easily compare number of each class. Common aligned scales. Position, length, areas redundantly encode the values. Easier to compare the “shapes” of the distributions as well. Again, “Died” shape looks fairly similar to the total, except perhaps for 1st and 2nd classes. (Differences easier to tell in framed versions.)
The Titanic
Comparing shapes - no common scale
savePar <- par(mfrow = c(1, 3))
barplot(apply(classTable, MARGIN = 2, FUN = sum),
        col = adjustcolor("steelblue", 0.5),
        # NO COMMON SCALE
        main = "Total",
        xlab = "Class", ylab = "Number of passengers")
barplot(classTable["No", ],
        col = survivalCols[1],
        # NO COMMON SCALE
        main = "Died",
        xlab = "Class", ylab = "Number of passengers")
barplot(classTable["Yes", ],
        col = survivalCols[2],
        # NO COMMON SCALE
        main = "Survived",
        xlab = "Class", ylab = "Number of passengers")
par(savePar)
[Figure: three bar charts side by side titled "Total", "Died" and "Survived"; x-axis: Class (1st, 2nd, 3rd, Crew); y-axis: Number of passengers, each panel on its own scale (up to about 800, 500 and 200 respectively)]
Different scaling makes it easier to compare the “shapes” of the distributions but harder to compare the actual values.
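One possible middle ground, sketched here as an assumption rather than something the slides do, is to rescale each group to proportions of its own total. The three panels can then share a common aligned scale while still showing only the "shape". The names counts and shapeProps are hypothetical.

```r
# Counts of survival (rows) by class (columns), as in the slides
classTable <- apply(Titanic, MARGIN = c(4, 1), FUN = sum)

# One row per group: everyone, those who died, those who survived
counts <- rbind(Total    = colSums(classTable),
                Died     = classTable["No", ],
                Survived = classTable["Yes", ])

# shapeProps (hypothetical name): each row rescaled to sum to 1,
# so only the distribution over classes remains
shapeProps <- prop.table(counts, margin = 1)

savePar <- par(mfrow = c(1, 3))
for (group in rownames(shapeProps)) {
  barplot(shapeProps[group, ],
          ylim = c(0, 0.5),  # common scale for all three panels
          main = group,
          xlab = "Class", ylab = "Proportion of group")
}
par(savePar)
```

The cost is the same as before: the actual counts are no longer readable from the plot, only each class's share within its group.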
South African heart disease
Here we will look at a dataset 'SAheart' from the package 'ElemStatLearn'. It is a sample from a retrospective study of heart disease in males from a high-risk region of the Western Cape, South Africa. There are 462 cases and 10 variates (see 'help(SAheart, package="ElemStatLearn")' for details).

For example, 'sbp' is the measured systolic blood pressure, which is the blood pressure when the heart pumps, 'chd' is 1 if the patient has coronary heart disease, and 'famhist' indicates whether or not the patient has a family history of heart disease.

library(ElemStatLearn)
kable(head(SAheart))

sbp   tobacco   ldl    adiposity   famhist   typea   obesity   alcohol   age   chd
160   12.00     5.73   23.11       Present   49      25.30     97.20     52    1
144    0.01     4.41   28.61       Absent    55      28.87      2.06     63    1
118    0.08     3.48   32.28       Present   52      29.14      3.81     46    0
170    7.50     6.41   38.03       Present   51      31.99     24.26     58    1
134   13.60     3.50   27.78       Present   60      25.99     57.34     49    1
132    6.20     6.47   36.21       Present   62      30.77     14.14     45    0
South African heart disease
Some have a family history of heart disease, others do not.

noFamilyHistory <- SAheart[, "famhist"] == "Absent"
# Number with no family history of heart disease
sum(noFamilyHistory)
## [1] 270
# Number with family history of heart disease
FamilyHistory <- SAheart[, "famhist"] == "Present"
sum(FamilyHistory)
## [1] 192

Can we compare the distributions of the values of sbp, systolic blood pressure, for those patients who have a family history with those who do not?
South African heart disease
Comparing systolic blood pressure for those with and without family history via boxplots
savePar = par(mfrow = c(2, 1))
famHistoryCol <- adjustcolor("steelblue", 0.5)
noHistoryCol <- adjustcolor("firebrick", 0.5)
# Each group gets its own colour, matching the later slides:
# firebrick for no family history, steelblue for family history
boxplot(SAheart[noFamilyHistory, "sbp"],
        col = noHistoryCol,
        main = "No family history", horizontal = TRUE)
boxplot(SAheart[FamilyHistory, "sbp"],
        col = famHistoryCol,
        main = "Have family history", horizontal = TRUE)
par(savePar)
South African heart disease
What can be compared via boxplots?
[Figure: two horizontal boxplots, one above the other, titled "No family history" and "Have family history"; each x-axis shows systolic blood pressure with ticks from 100 to 220]
Note the scales are not identical.
South African heart disease
Place them on common aligned scales
boxplot(sbp ~ famhist, data = SAheart,
        col = c(noHistoryCol, famHistoryCol),
        main = "Systolic blood pressure", horizontal = TRUE)
[Figure: horizontal boxplots for Absent and Present on one shared axis, titled "Systolic blood pressure"; ticks from 100 to 220]
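What a boxplot lets the eye compare is a handful of summary values per group: the median, the hinges and the whisker ends. As a sketch, those numbers can be read off directly by calling boxplot() with plot = FALSE. The data frame fakeHeart below is a simulated stand-in with hypothetical means and standard deviations (only the group sizes 270 and 192 come from the slides); it is not the SAheart data.

```r
# Simulated stand-in for SAheart: hypothetical values, NOT the real data
set.seed(1)
fakeHeart <- data.frame(
  sbp     = c(rnorm(270, mean = 135, sd = 20),
              rnorm(192, mean = 142, sd = 21)),
  famhist = factor(rep(c("Absent", "Present"), times = c(270, 192)))
)

# plot = FALSE returns the values the boxplot would draw
bp <- boxplot(sbp ~ famhist, data = fakeHeart, plot = FALSE)

# Rows of bp$stats: lower whisker, lower hinge, median,
# upper hinge, upper whisker; one column per group
boxStats <- bp$stats
colnames(boxStats) <- bp$names
boxStats
```

With the real data, replacing fakeHeart by SAheart should give the values displayed in the boxplots on the common aligned scale.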
South African heart disease
library(ggplot2)
ggplot(data = SAheart, mapping = aes(x = famhist, y = sbp)) +
  geom_boxplot(colour = c("firebrick", "steelblue"),
               fill = c("firebrick", "steelblue"),
               alpha = 0.5) +
  coord_flip()
[Figure: ggplot2 horizontal boxplots of sbp for famhist = Absent and Present on one shared axis; ticks from 100 to 200]
South African heart disease
What can be compared via histograms?
savePar = par(mfrow = c(1, 2))
hist(SAheart[noFamilyHistory, "sbp"], col = noHistoryCol,
     main = "No family history")
hist(SAheart[FamilyHistory, "sbp"], col = famHistoryCol,
     main = "Have family history")
par(savePar)
Note the scales are not necessarily identical.
[Figure: two histograms side by side titled "No family history" and "Have family history"; x-axis: sbp, ticks from 100 to 220; y-axis: Frequency, up to 60 in the first panel and up to 50 in the second]
South African heart disease
Place histograms on a common (in both x and y), aligned (x only) scale
savePar = par(mfrow = c(2, 1))
hist(SAheart[noFamilyHistory, "sbp"],
     col = noHistoryCol,  # comma was missing here in the original
     main = "No family history",
     xlab = "systolic blood pressure",
     xlim = extendrange(SAheart[, "sbp"]),
     ylim = c(0, 60))
hist(SAheart[FamilyHistory, "sbp"],
     col = famHistoryCol,
     main = "Have family history",
     xlab = "systolic blood pressure",
     xlim = extendrange(SAheart[, "sbp"]),
     ylim = c(0, 60))
par(savePar)
[Figure: two histograms, one above the other, titled "No family history" and "Have family history"; x-axis: systolic blood pressure, ticks from 100 to 220; y-axis: Frequency, both panels on the same scale up to 60]
South African heart disease
Place them on a common aligned scale
hist(SAheart[noFamilyHistory, "sbp"], col = noHistoryCol,
     main = "Overlaid: pink without history, blue with",
     xlim = extendrange(SAheart[, "sbp"]))
hist(SAheart[FamilyHistory, "sbp"], col = famHistoryCol,
     xlim = extendrange(SAheart[, "sbp"]),
     add = TRUE)
[Figure: the two histograms overlaid with transparency, titled "Overlaid: pink without history, blue with"; x-axis: sbp, ticks from 100 to 220; y-axis: Frequency, up to 60]
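Because the two groups differ in size (270 and 192 patients), overlaid frequency histograms mix group size with shape. One possible variation, sketched here as an assumption and not taken from the slides, is to give both histograms the same breaks and draw them on the density scale with freq = FALSE, so that each has total area one. The data are simulated with hypothetical means and standard deviations; they are not the SAheart values.

```r
# Simulated stand-ins for the two groups: hypothetical values, NOT the real data
set.seed(1)
sbpAbsent  <- rnorm(270, mean = 135, sd = 20)
sbpPresent <- rnorm(192, mean = 142, sd = 21)

# Shared breaks covering both groups
xrange <- extendrange(c(sbpAbsent, sbpPresent))
breaks <- seq(xrange[1], xrange[2], length.out = 12)

hAbsent  <- hist(sbpAbsent,  breaks = breaks, plot = FALSE)
hPresent <- hist(sbpPresent, breaks = breaks, plot = FALSE)

# Overlay on the density scale so bar areas sum to 1 in each group
ymax <- max(hAbsent$density, hPresent$density)
plot(hAbsent, freq = FALSE, ylim = c(0, ymax),
     col = adjustcolor("firebrick", 0.5),
     xlab = "Systolic blood pressure (simulated)",
     main = "Overlaid on the density scale")
plot(hPresent, freq = FALSE, add = TRUE,
     col = adjustcolor("steelblue", 0.5))
```

Whether counts or densities are the better choice depends on the question: counts show how many patients fall in each bin, densities show how each group is distributed regardless of its size.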
South African heart disease
Reflected
xrange <- extendrange(SAheart[, "sbp"])
breaks <- seq(xrange[1], xrange[2], length.out = 12)
h1 = hist(SAheart[noFamilyHistory, "sbp"], breaks = breaks, plot = FALSE)
h2 = hist(SAheart[FamilyHistory, "sbp"], breaks = breaks, plot = FALSE)
hmax = max(c(h1$counts, h2$counts))
h2$counts = - h2$counts
hmin = -hmax
X = c(h1$breaks, h2$breaks)
xmax = max(X)
xmin = min(X)
plot(h1, xlab = "Systolic blood pressure",
     main = "Comparing patients with (blue) and without (pink)",
     ylim = c(hmin, hmax), xlim = c(xmin, xmax), col = noHistoryCol)
lines(h2, col = famHistoryCol)
[Figure: reflected histograms titled "Comparing patients with (blue) and without (pink)", one group above the axis and the other below; x-axis: Systolic blood pressure, ticks from 100 to 220; y-axis: Frequency, from -60 to 60]
South African heart disease
Back to back
yrange <- extendrange(SAheart[, "sbp"])
breaks <- seq(yrange[1], yrange[2], length.out = 12)
h1 = hist(SAheart[noFamilyHistory, "sbp"], breaks = breaks, plot = FALSE)
h2 = hist(SAheart[FamilyHistory, "sbp"], breaks = breaks, plot = FALSE)
nbreaks <- length(breaks)
hmax = max(c(h1$counts, h2$counts))
h2$counts = - h2$counts
hmin = -hmax
Y <- rep(h1$breaks, each = 2)
X <- c(0, rep(h1$counts, each = 2), 0)
# Create a plot with only the vertical line at zero inside
plot(rep(0, 2), range(Y), type = "l", col = "black",
     xlim = c(hmin, hmax), ylim = extendrange(Y), bty = "n",
     xlab = "Frequency", ylab = "Systolic blood pressure",
     main = "Comparing patients with (blue) and without (pink)")
polygon(X, Y, col = noHistoryCol)
# There are nbreaks - 1 bins, so loop over bins rather than breaks
for (i in 1:(nbreaks - 1)) {
  lines(c(0, h1$counts[i]), rep(h1$breaks[i + 1], 2))
}
Y <- rep(h2$breaks, each = 2)
X <- c(0, rep(h2$counts, each = 2), 0)
polygon(X, Y, col = famHistoryCol)
for (i in 1:(nbreaks - 1)) {
  lines(c(0, h2$counts[i]), rep(h2$breaks[i + 1], 2))
}
[Figure: back-to-back histograms titled "Comparing patients with (blue) and without (pink)"; x-axis: Frequency, from -60 to 60; y-axis: Systolic blood pressure, ticks from 100 to 220]
Often see demographic data plotted this way.
Population pyramids
Often see demographic data plotted this way - so-called “population pyramids”
Population pyramids
You can even group common patterns
Comparing distributions
Features in common and differences are interesting:
Comparing distributions
What’s different here? What’s the same?
Comparing distributions
A closer look at Bahrain:
[Figure: population pyramids for Bahrain, with separate panels for non-Bahrainis and Bahrainis]
Source: Government of Bahrain Census Summary Result 2010 (Population, Housing, . . . Census)
Population pyramids
Often see demographic data plotted this way - so-called “population pyramids” Aside: Note shimmering effect.
South African heart disease
Using 'facets' with ggplot2
library(ggplot2)
ggplot(data = SAheart, mapping = aes(x = sbp)) +
  geom_histogram(bins = 12, colour = "grey50", fill = "white") +
  facet_grid(famhist ~ .)
[Figure: ggplot2 histograms of sbp in two facets, Absent above Present, sharing the x-axis (ticks 120, 160, 200); y-axis: count, up to 60 in each facet]
South African heart disease
What can be compared via density estimates?
savePar = par(mfrow = c(1, 2))
densAbsent <- density(SAheart[noFamilyHistory, "sbp"], bw = "SJ")
densPresent <- density(SAheart[FamilyHistory, "sbp"], bw = "SJ")
plot(densAbsent, col = "firebrick", main = "No family history")
polygon(densAbsent, col = noHistoryCol)
plot(densPresent, col = "steelblue", main = "Family history")
polygon(densPresent, col = famHistoryCol)
par(savePar)
Note the scales are not necessarily identical.
[Figure: two density estimates side by side. "No family history": N = 270, Bandwidth = 5.423. "Family history": N = 192, Bandwidth = 4.607. y-axis: Density, 0.000 to 0.025; the x-axis ticks differ between the panels]
South African heart disease
Common (both x,y) aligned (x only) scales
savePar = par(mfrow = c(2, 1))
xlim <- extendrange(SAheart[, "sbp"])
densAbsent <- density(SAheart[noFamilyHistory, "sbp"], bw = "SJ")
densPresent <- density(SAheart[FamilyHistory, "sbp"], bw = "SJ")
ylim <- extendrange(c(densAbsent$y, densPresent$y))
plot(densAbsent, col = "firebrick", main = "No family history",
     xlim = xlim, ylim = ylim)
polygon(densAbsent, col = noHistoryCol)
plot(densPresent, col = "steelblue", main = "Family history",
     xlim = xlim, ylim = ylim)
polygon(densPresent, col = famHistoryCol)
par(savePar)
[Figure: two density estimates, one above the other, on identical x and y scales. "No family history": N = 270, Bandwidth = 5.423. "Family history": N = 192, Bandwidth = 4.607. x-axis ticks from 100 to 220; y-axis: Density]
South African heart disease
Common aligned scales (overlaid densities with transparency)
xlim <- extendrange(SAheart[, "sbp"])
densAbsent <- density(SAheart[noFamilyHistory, "sbp"], bw = "SJ")
densPresent <- density(SAheart[FamilyHistory, "sbp"], bw = "SJ")
ylim <- extendrange(c(densAbsent$y, densPresent$y))
plot(densAbsent, col = "firebrick",
     xlab = "Systolic blood pressure",
     main = "Comparing a family history with no family history",
     xlim = xlim, ylim = ylim)
polygon(densAbsent, col = noHistoryCol)
lines(densPresent, col = "steelblue")
polygon(densPresent, col = famHistoryCol)
[Figure: the two density estimates overlaid with transparency, titled "Comparing a family history with no family history"; x-axis: Systolic blood pressure, ticks from 100 to 220; y-axis: Density, 0.000 to 0.025]
South African heart disease
Common aligned scales - reflected
xlim <- extendrange(SAheart[, "sbp"])
densAbsent <- density(SAheart[noFamilyHistory, "sbp"], bw = "SJ")
densPresent <- density(SAheart[FamilyHistory, "sbp"], bw = "SJ")
densPresent$y <- - densPresent$y
ylim <- extendrange(c(densAbsent$y, densPresent$y))
plot(densAbsent, col = "firebrick",
     xlab = "Systolic blood pressure",
     main = "Comparing a family history with no family history",
     xlim = xlim, ylim = ylim)
polygon(densAbsent, col = noHistoryCol)
lines(densPresent, col = "steelblue")
polygon(densPresent, col = famHistoryCol)
[Figure: reflected density estimates titled "Comparing a family history with no family history", one group above zero and the other below; x-axis: Systolic blood pressure, ticks from 100 to 220; y-axis: Density, from -0.03 to 0.02]
South African heart disease
Back to back
ylim <- extendrange(SAheart[, "sbp"])
densAbsent <- density(SAheart[noFamilyHistory, "sbp"], bw = "SJ")
densPresent <- density(SAheart[FamilyHistory, "sbp"], bw = "SJ")
densPresent$y <- - densPresent$y
xlim <- extendrange(c(densAbsent$y, densPresent$y))
# Swap the x and y components of a density (or any x-y list)
xyswitch <- function(xy_thing) {
  yx_thing <- xy_thing
  yx_thing$x <- xy_thing$y
  yx_thing$y <- xy_thing$x
  yx_thing
}
plot(xyswitch(densAbsent), col = "firebrick",
     xlab = "Density", ylab = "Systolic blood pressure",
     main = "Comparing a family history with no family history",
     xlim = xlim, ylim = ylim)
polygon(xyswitch(densAbsent), col = noHistoryCol)
lines(xyswitch(densPresent), col = "steelblue")
polygon(xyswitch(densPresent), col = famHistoryCol)
[Figure: back-to-back density estimates titled "Comparing a family history with no family history"; x-axis: Density, from -0.03 to 0.02; y-axis: Systolic blood pressure, ticks from 100 to 220]
South African heart disease
Using 'facets' with ggplot2
library(ggplot2)
ggplot(data = SAheart, mapping = aes(x = sbp, col = famhist)) +
  geom_density(colour = "grey50", fill = "black", alpha = 0.4, bw = "SJ") +
  facet_grid(famhist ~ .)
[Figure: ggplot2 density estimates of sbp in two facets, Absent above Present, sharing the x-axis (ticks from 100 to 200); y-axis: density, 0.00 to 0.02 in each facet]
South African heart disease
Simple points
xlim <- extendrange(SAheart[, "sbp"])
n <- nrow(SAheart)
col <- rep(adjustcolor("firebrick", 0.2), n)
col[FamilyHistory] <- adjustcolor("steelblue", 0.2)
Y <- rep(1, n)
Y[FamilyHistory] <- -1
plot(SAheart[, "sbp"], y = Y, col = col, pch = 19, cex = 3,
     xlab = "Systolic blood pressure", ylab = "",
     main = "Comparing a family history with no family history",
     xlim = xlim, ylim = c(-2, 2), bty = "n", yaxt = "n")
[Figure: two horizontal rows of semi-transparent points, one per group, titled "Comparing a family history with no family history"; x-axis: Systolic blood pressure, ticks from 100 to 220]
South African heart disease
Simple points with jittering
xlim <- extendrange(SAheart[, "sbp"])
n <- nrow(SAheart)
col <- rep(adjustcolor("firebrick", 0.2), n)
col[FamilyHistory] <- adjustcolor("steelblue", 0.2)
Y <- rep(1, n)
Y[FamilyHistory] <- -1
U <- runif(n, -0.3, 0.3)
plot(SAheart[, "sbp"], y = Y + U, col = col, pch = 19, cex = 3,
     xlab = "Systolic blood pressure", ylab = "",
     main = "Comparing a family history with no family history",
     xlim = xlim, ylim = c(-2, 2), bty = "n", yaxt = "n")
[Figure: two horizontal bands of vertically jittered semi-transparent points, one per group, titled "Comparing a family history with no family history"; x-axis: Systolic blood pressure, ticks from 100 to 220]
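Base R's stripchart() offers a similar jittered display in a single call. The sketch below is an alternative and not the slides' method; it uses a simulated data frame, fakeHeart, whose means and standard deviations are hypothetical (only the group sizes 270 and 192 come from the slides).

```r
# Simulated stand-in for SAheart: hypothetical values, NOT the real data
set.seed(1)
fakeHeart <- data.frame(
  sbp     = c(rnorm(270, mean = 135, sd = 20),
              rnorm(192, mean = 142, sd = 21)),
  famhist = factor(rep(c("Absent", "Present"), times = c(270, 192)))
)

# method = "jitter" spreads the points at random perpendicular to the
# measurement axis; jitter = 0.3 sets the amount of spreading
stripchart(sbp ~ famhist, data = fakeHeart,
           method = "jitter", jitter = 0.3,
           pch = 19, cex = 2,
           col = adjustcolor(c("firebrick", "steelblue"), 0.2),
           xlab = "Systolic blood pressure (simulated)")
```

The jitter carries no information; it only separates overlapping points so that local density along the blood pressure axis becomes visible.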
South African heart disease
Quantile plots
savePar <- par(mfrow = c(1, 2))
nAbsent <- sum(noFamilyHistory)
nPresent <- sum(FamilyHistory)
pAbsent <- ppoints(nAbsent)
pPresent <- ppoints(nPresent)
plot(pAbsent, sort(SAheart[noFamilyHistory, "sbp"]),
     type = "b", col = noHistoryCol, pch = 19,
     xlab = "Cumulative proportion", ylab = "Systolic blood pressure",
     main = "no family history")
plot(pPresent, sort(SAheart[FamilyHistory, "sbp"]),
     type = "b", col = famHistoryCol, pch = 19,
     xlab = "Cumulative proportion", ylab = "Systolic blood pressure",
     main = "family history")
par(savePar)
[Figure: two quantile plots side by side titled "no family history" and "family history"; x-axis: Cumulative proportion, 0.0 to 1.0; y-axis: Systolic blood pressure, ticks from 100 to 220]
South African heart disease
Quantile plots (common aligned scales via overlaying)
ylim <- extendrange(SAheart[, "sbp"])
nAbsent <- sum(noFamilyHistory)
nPresent <- sum(FamilyHistory)
pAbsent <- ppoints(nAbsent)
pPresent <- ppoints(nPresent)
plot(pAbsent, sort(SAheart[noFamilyHistory, "sbp"]),
     type = "b", col = noHistoryCol, pch = 19, ylim = ylim,
     xlab = "Cumulative proportion", ylab = "Systolic blood pressure",
     main = "Comparing with (blue) to no family history (pink)")
points(pPresent, sort(SAheart[FamilyHistory, "sbp"]),
       type = "b", col = famHistoryCol, pch = 19)
[Figure: the two quantile plots overlaid, titled "Comparing with (blue) to no family history (pink)"; x-axis: Cumulative proportion, 0.0 to 1.0; y-axis: Systolic blood pressure, ticks from 100 to 220]
How do these two distributions compare? In location? Scale? Modality? Tails?
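One way to start answering these questions numerically, offered as a sketch and not as the slides' answer, is to tabulate matching quantiles of the two groups and to plot one group's quantiles against the other's with qqplot(). The data are simulated with hypothetical means and standard deviations (only the group sizes come from the slides), so the printed values say nothing about the real patients.

```r
# Simulated stand-ins for the two groups: hypothetical values, NOT the real data
set.seed(1)
sbpAbsent  <- rnorm(270, mean = 135, sd = 20)
sbpPresent <- rnorm(192, mean = 142, sd = 21)

# Matching quantiles: the middle column speaks to location,
# the spread between columns to scale, the outer columns to tails
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
quantileTable <- rbind(Absent  = quantile(sbpAbsent,  probs),
                       Present = quantile(sbpPresent, probs))
round(quantileTable, 1)

# Scale summarised by the interquartile range of each group
c(Absent = IQR(sbpAbsent), Present = IQR(sbpPresent))

# Quantile-quantile plot: points on the dashed line y = x would mean
# the two distributions have the same quantiles
qqplot(sbpAbsent, sbpPresent,
       xlab = "No family history (simulated)",
       ylab = "Family history (simulated)")
abline(0, 1, lty = 2)
```

Modality is not visible in a table of a few quantiles; for that, the histograms and density estimates above remain the more direct tools.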