SLIDE 1 Introduction to data visualization with R
Andrew Heiss, PhD
Brigham Young University SLC RUG • July 11, 2018 @andrewheiss
SLIDE 2
Plan for today
Why visualize data?
Types of visualizations
Aesthetics and design
Take a sad plot and make it CRAPier
SLIDE 3
talks.andrewheiss.com/utah-rug-dataviz/
SLIDE 4
SLIDE 5
SLIDE 6
Why visualize data?
SLIDE 7
Data alone cannot tell stories or prove theories
SLIDE 8
SLIDE 9
SLIDE 10
Never trust summary statistics alone
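One way to see this point in R is with Anscombe's quartet, which ships with base R as the `anscombe` data frame. This is a minimal sketch, not necessarily the example shown on the original slides; the object name `quartet_summary` is made up for illustration. The four sets have nearly identical means and correlations but look very different once plotted.

```r
# Anscombe's quartet is built into base R as `anscombe`
# (columns x1..x4 and y1..y4).
x_cols <- paste0("x", 1:4)
y_cols <- paste0("y", 1:4)

# Summary statistics for each of the four x/y pairs
quartet_summary <- data.frame(
  set    = 1:4,
  mean_x = sapply(anscombe[x_cols], mean),
  mean_y = sapply(anscombe[y_cols], mean),
  cor_xy = mapply(function(x, y) cor(anscombe[[x]], anscombe[[y]]),
                  x_cols, y_cols),
  row.names = NULL
)
print(round(quartet_summary, 2))

# Plot the four sets in a 2 x 2 grid to compare their shapes
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[x_cols[i]]], anscombe[[y_cols[i]]],
       xlab = x_cols[i], ylab = y_cols[i], main = paste("Set", i))
}
par(old_par)  # restore the previous plotting layout
```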
SLIDE 11 Humans are visual creatures
@FacesPics
SLIDE 12
SLIDE 13
What makes a good visualization?
SLIDE 14
SLIDE 15 Characteristics of graphical excellence
- 1. “... the well-designed presentation of interesting
data—a matter of substance, statistics, and design.”
- 2. Complex ideas communicated with
clarity, precision, and efficiency.
- 3. That which gives the viewer the greatest
number of ideas in the shortest time with the least ink in the smallest space.
- 4. Nearly always multivariate.
- 5. Requires telling the truth about the data.
SLIDE 16 What makes Minard’s graph so great?
SLIDE 17
We forget this!
SLIDE 18
SLIDE 19
SLIDE 20
Types of visualizations
SLIDE 21 Exploratory visualizations
Academic-ish: quick scatterplots, histograms, and other charts to help understand your data
Explanatory visualizations
Publishable: consumable by the general public (Vox, NYT, Washington Post, FiveThirtyEight, etc.)
SLIDE 22
Exploratory data analysis
Find analytical insight in data (even causal inference!)
SLIDE 23
Explanatory data analysis
Annotate and tell a story
SLIDE 24
Exploratory Explanatory
SLIDE 25
Which chart type do I use?
SLIDE 26
Aesthetics and design
SLIDE 27
SLIDE 28
SLIDE 29 Is he maybe a little right?
plot(mtcars$wt, mtcars$mpg)
SLIDE 30 R can’t do everything
There’s still a place for Illustrator, InDesign, et al.
SLIDE 31
But R can do a lot
And it can automate most of the hard work
SLIDE 32
What constitutes good design?
SLIDE 33 Four core design principles
Contrast Repetition Alignment Proximity
SLIDE 34 Contrast
“If two items are not exactly the same, make them different. Really different.”
Don’t be a wimp!
SLIDE 35 Serif Sans Serif Slab Serif Script Decorative
Lorem ipsum dolor sit amet
Light Lorem ipsum dolor sit amet
Black Lorem ipsum dolor sit amet
SLIDE 36
https://color.adobe.com/
http://colorbrewer2.org/
viridis
Scientific Colour-Maps https://github.com/thomasp85/scico
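As a rough illustration of using two of these palettes from ggplot2, the sketch below maps a continuous variable to viridis and a discrete one to a ColorBrewer palette. It assumes ggplot2 3.0.0 or later is installed. The choice of `mtcars`, the `"Set2"` palette, and the object names `p_continuous` and `p_discrete` are illustrative, not taken from the slides.

```r
library(ggplot2)

# Continuous variable (horsepower) mapped to the viridis palette
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point(size = 3) +
  scale_color_viridis_c()

# Discrete variable (cylinders) mapped to a ColorBrewer qualitative palette
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  scale_color_brewer(palette = "Set2")

# Print either object to draw it, e.g. print(p_continuous)
```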
SLIDE 37
SLIDE 38 Repetition
“Repeat some aspect of the design throughout the entire piece.”
SLIDE 39
SLIDE 40 Alignment
“Every item should have a visual connection with something else on the page.”
SLIDE 41
SLIDE 42
SLIDE 43 Proximity
“Group related items together.”
SLIDE 44
SLIDE 45 Contrast Repetition Alignment Proximity
SLIDE 46
Take a sad plot and make it CRAPier
SLIDE 47 By default, R graphics violate CRAP
plot(mtcars$wt, mtcars$mpg, main = "Here's a title")
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Here's a title")
SLIDE 48 With ggplot’s theme() and other functions, we can make beautiful CRAPy figures automatically with R
You can also do this in base R, but I find ggplot’s paradigm more intuitive
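A minimal sketch of what that can look like, assuming ggplot2 is installed. The specific theme settings, labels, and the object name `crap_plot` are illustrative choices, not the figure from the talk; the comments note which CRAP principle each setting is meant to serve.

```r
library(ggplot2)

crap_plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(
    title = "Heavier cars get fewer miles per gallon",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    # Contrast: make the title clearly bigger and bolder than other text
    plot.title = element_text(face = "bold", size = rel(1.5)),
    # Repetition: reuse the same bold treatment for both axis titles
    axis.title = element_text(face = "bold"),
    # Remove minor gridlines to cut visual clutter
    panel.grid.minor = element_blank(),
    # Proximity: keep the legend close to the data it describes
    legend.position = "bottom"
  )

# Print the object to draw it: print(crap_plot)
```

Because the styling lives in `theme()`, the same settings can be reused across every figure in a project.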
SLIDE 49
Moral of the story
Graphics are essential for telling stories and gaining insight
Design principles (CRAP) make graphics more understandable
R + ggplot can follow CRAP and make beautiful, insightful graphics