SLIDE 1 What is data visualization & how can you use it in your daily work?
Anamaria (Ana) Crisan PhD Student University of British Columbia
Evening Rounds, November 15th, 2016
SLIDE 2 WHO AM I?
Undergrad: CompSci
Masters: Bioinformatics
PhD
EXPERIENCE (work experience): GenomeDX (Clinical), BCCDC (Public Health)
Computer Science Skills
+ Data Visualization Skills!
@amcrisan http://cs.ubc.ca/~acrisan
SLIDE 3 WHAT DO I RESEARCH?
Genomic, Contact Network, Patient Data, Outcomes, Geography / Location, Time, Treatment
Person, Place, Time
TB Nurses, TB Clinicians, Medical Health Officers, Researchers, Community Leaders
SLIDE 4
WHAT DO I RESEARCH?
SLIDE 5
WHAT YOU CAN TAKE AWAY FROM THIS TALK
- Data visualization is not an art or graphic design project
- Deciding upon the most appropriate data visualization can be a research problem
- Think about the “why, what, and how” framework for design & evaluation
- Think broadly, then progressively find the right data visualization
- Examples: Communicating Patient Risk; Visualizing an Election
SLIDE 6
IF VISUALIZATIONS WERE CARS: A BEAUTIFULLY IMPRACTICAL OPTION
Ferrari visualizations look super cool and take a lot of time, effort, and resources to produce, but they’re not necessarily practical for most applications. They are sometimes worth creating, but think it through.
SLIDE 7
IF VISUALIZATIONS WERE CARS: A LESS BEAUTIFUL, PRACTICAL OPTION
Toyota visualizations are well engineered and fit a variety of needs, making them a more practical choice. They are also less expensive (time, effort, money) than a Ferrari. They lack the “wow” factor of a Ferrari, but can hold their own.
SLIDE 8
IF VISUALIZATIONS WERE CARS: POSSIBLY DANGEROUS
Pinto visualizations are tempting because they are inexpensive (really low time, energy, money), but they are questionably engineered. Aspects of the visualization are not properly tested with stakeholders, and they can explode if tapped the wrong way.
SLIDE 9
DATA VISUALIZATION IS NOT AN ART PROJECT
SLIDE 10
EXERCISE: VISUALIZING TWO QUANTITIES
- Before we talk “big data”, let’s talk “artisanal small-batch data”
- With the paper in front of you, sketch out as many examples as you can to visualize the following two numbers:
75 37
example:
SLIDE 11 http://www.scribblelive.com/blog/2012/07/27/45-ways-to-communicate-two-quantities/
EXERCISE: VISUALIZING TWO QUANTITIES
Why do this exercise?
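One way to see why the exercise matters: even in code, the same two numbers can be shown through a visual length or through derived numbers that answer different questions. The sketch below uses hypothetical helper names (bar, ratio, difference) and an assumed bar unit of 5; it is an illustration, not one of the 45 designs from the linked post.

```python
# Sketch: three simple text encodings of the exercise's two quantities.
# The helper names (bar, ratio, difference) are hypothetical illustrations.

def bar(label: str, value: int, unit: int = 5) -> str:
    """Horizontal text bar with one '#' per `unit` (remainder dropped)."""
    return f"{label} | {'#' * (value // unit)} {value}"


def ratio(a: int, b: int) -> str:
    """Relative comparison: a as a multiple of b."""
    return f"{a} is {a / b:.2f}x {b}"


def difference(a: int, b: int) -> str:
    """Absolute comparison: how much larger a is than b."""
    return f"{a} - {b} = {a - b}"


a, b = 75, 37
print(bar("A", a))
print(bar("B", b))
print(ratio(a, b))       # 75 is 2.03x 37
print(difference(a, b))  # 75 - 37 = 38
```

The bars invite a "which is bigger?" reading by length, while the ratio and the difference each commit to one specific comparison.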
SLIDE 12 Borkin (2011). “Evaluation of Artery Visualizations for Heart Disease Diagnosis”
EXAMPLE : CHANGING ARTERY VISUALIZATIONS
MOTIVATION: Improve the accuracy of identifying blockages in heart arteries
SLIDE 13 Borkin (2011). “Evaluation of Artery Visualizations for Heart Disease Diagnosis”
EXAMPLE : CHANGING ARTERY VISUALIZATIONS
MOTIVATION: Improve the accuracy of identifying blockages in heart arteries
SLIDE 14 Borkin (2011). “Evaluation of Artery Visualizations for Heart Disease Diagnosis”
RESULTS: Revised visualizations had higher accuracy
Existing standard: 39% accuracy
Revised visualization: 91% accuracy
EXAMPLE: CHANGING ARTERY VISUALIZATIONS
SLIDE 15
DATA VISUALIZATION IN THREE QUESTIONS
Why? Why do you need to visualize data? (Motivation)
What? What kind of data is being visualized? (Data)
How? How is the data being visualized? (Visual and Interaction Design)
SLIDE 16 A QUICK NOTE ON “WHAT”
Munzner (2014) “Visualization Analysis and Design”
Don’t just visualize the raw data!
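Raw values are often less useful than quantities derived from them; the election slides later in this talk show margins and changes in margin rather than raw vote totals. A minimal sketch of that kind of derivation, assuming a simple two-party vote share per election. The class, function names, and all numbers are hypothetical placeholders, not real results.

```python
from dataclasses import dataclass


@dataclass
class Result:
    party_a: float  # vote share (%) for party A (hypothetical)
    party_b: float  # vote share (%) for party B (hypothetical)


def margin(r: Result) -> float:
    """Derived attribute: lead of party A over party B, in percentage points."""
    return r.party_a - r.party_b


def swing(earlier: Result, later: Result) -> float:
    """Derived attribute: change in margin between two elections, in points."""
    return margin(later) - margin(earlier)


# HYPOTHETICAL shares for one region in two elections.
earlier = Result(party_a=52.0, party_b=45.0)
later = Result(party_a=47.5, party_b=49.0)

print(f"margin (earlier): {margin(earlier):+.1f} pts")  # +7.0 pts
print(f"margin (later):   {margin(later):+.1f} pts")    # -1.5 pts
print(f"swing:            {swing(earlier, later):+.1f} pts")  # -8.5 pts
```

Plotting the swing directly answers "where did support change?", which the raw shares only answer if the reader does the subtraction in their head.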
SLIDE 17 DATA VISUALIZATION IN THREE QUESTIONS
Design & Evaluation
- Why? Does the visualization solve a relevant problem?
- What? Are you using the right data, or deriving the right data?
- How? Are the visual and interactive design choices appropriate?
SLIDE 18 HOW TO DESIGN & EVALUATE DATA VIZ
Munzner (2014) “Visualization Analysis and Design”
A visualization can be decomposed into four layers: Motivation, Data, Visual + Interaction Design Choices, Technique
Why? What? How?
SLIDE 19 HOW TO DESIGN & EVALUATE DATA VIZ
Munzner (2014) “Visualization Analysis and Design”
A visualization can be decomposed into four layers: Motivation, Data, Visual + Interaction Design Choices, Technique
Why? What? How?
DESIGN
Design Process: start with the “why” (domain problem) and work your way in to the “how”
SLIDE 20 HOW TO DESIGN & EVALUATE DATA VIZ
Munzner (2014) “Visualization Analysis and Design”
A visualization can be decomposed into four layers: Motivation, Data, Visual + Interaction Design Choices, Technique
Why? What? How?
EVALUATION
Evaluation Process: start with the “how” and assess whether it’s the right choice for the “why” and “what”
SLIDE 21 HOW TO DESIGN & EVALUATE DATA VIZ
Munzner (2014) “Visualization Analysis and Design”
A visualization can be decomposed into four layers: Motivation, Data, Visual + Interaction Design Choices, Technique
Why? What? How?
We’ll talk a little bit about the “how” today
SLIDE 22 BREAKING DOWN “HOW”
Munzner (2014) “Visualization Analysis and Design”
Building up a visualization from geometric points
SLIDE 23 Cleveland and McGill 1984; Heer and Bostock 2010
BREAKING DOWN “HOW”
Some visualizations are more effective than others
[Figure: visual encodings ranked from most errors to least errors]
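The ranking on this slide can be used as a simple lookup when choosing how to show a quantitative value. A sketch, assuming the commonly cited tier summary of Cleveland and McGill (1984) for quantitative data; the channel strings and the best_channel helper are illustrative labels, not a plotting library's API, and order within a tier is arbitrary.

```python
from __future__ import annotations

# Approximate effectiveness tiers for encoding a quantitative value, following
# the commonly cited summary of Cleveland and McGill (1984). Most effective
# first; order within a tier is arbitrary.
TIERS: list[list[str]] = [
    ["position along a common scale"],
    ["position along nonaligned scales"],
    ["length", "direction", "angle"],
    ["area"],
    ["volume", "curvature"],
    ["shading", "colour saturation"],
]


def best_channel(available: set[str]) -> str | None:
    """Return the highest-ranked channel that is still available, or None."""
    for tier in TIERS:
        for channel in tier:
            if channel in available:
                return channel
    return None


print(best_channel({"area", "angle", "colour saturation"}))  # angle
```

Once position is taken by other dimensions, this is the reasoning that pushes designers toward length or angle before area or colour.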
SLIDE 24 Munzner (2014) “Visualization Analysis and Design”
EXAMPLE: BREAKING DOWN A VISUALIZATION
[Figure: a visualization annotated with its visual channels: vertical position, horizontal position, colour, size]
SLIDE 25 EXAMPLE: BREAKING DOWN A VISUALIZATION
Five dimensions are plotted in 2D (4 continuous dimensions & 1 categorical dimension):
- Position: HE
- Position: LE
- Colour = Continent
- Size = Population
- Transparency = Similarity
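The breakdown above amounts to a mapping from data dimensions to visual channels. A minimal sketch of writing that mapping down and checking it, keeping the slide's labels (HE, LE) as-is. Which axis carries HE and which carries LE is an assumption, since the slide only says "Position"; the Encoding class and summarize helper are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    dimension: str  # data dimension, labelled as on the slide
    kind: str       # "continuous" or "categorical"
    channel: str    # visual channel carrying the dimension


# ASSUMPTION: horizontal vs vertical for HE/LE is a guess; the slide says "Position".
SPEC = [
    Encoding("HE", "continuous", "horizontal position"),
    Encoding("LE", "continuous", "vertical position"),
    Encoding("Continent", "categorical", "colour"),
    Encoding("Population", "continuous", "size"),
    Encoding("Similarity", "continuous", "transparency"),
]


def summarize(spec: list[Encoding]) -> tuple[int, int, int]:
    """Count dimensions by kind, after checking that no channel is reused."""
    channels = [e.channel for e in spec]
    if len(channels) != len(set(channels)):
        raise ValueError("a channel is encoding more than one dimension")
    kinds = Counter(e.kind for e in spec)
    return len(spec), kinds["continuous"], kinds["categorical"]


print(summarize(SPEC))  # (5, 4, 1)
```

Writing the encoding out this way makes it easy to ask, channel by channel, whether each choice suits its data type: here the single categorical dimension gets colour, and the continuous ones get position, size, and transparency.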
SLIDE 26 EXAMPLE: BREAKING DOWN A VISUALIZATION
*Note* not the same data
SLIDE 27
Find more terrible visualizations here!
SLIDE 28 Matthew Brehmer’s totally subjective ranking of vis design tools
SOFTWARE TOOLS FOR DATA VISUALIZATION
SLIDE 29
BUT ALSO PEN & PAPER!
Dear Data Project (Lupi & Posavec)
SLIDE 30
VISUALIZING AN ELECTION
SLIDE 31 US ELECTIONS DATA VISUALIZATIONS
- The point of this example is not to discuss:
§ Correctness / relevancy of polling or forecasting
§ Politics of results
- Very interesting data visualizations emerged from the US election cycle (before & after)
- Forecasting relied on reporting probabilities, which are also commonly reported in medicine
SLIDE 32 PROBABILITY INCONSISTENTLY INTERPRETED
http://bit.ly/1FxtT2z
SLIDE 33 US ELECTIONS DATA VISUALIZATIONS
WHY: Show forecasted voter intentions
WHAT: % chance of “winning” state; geography
HOW: Choropleth map
SLIDE 34 US ELECTIONS DATA VISUALIZATIONS
WHY: Show forecasted voter intentions
WHAT: % chance of “winning” state; geography; # of EC votes
HOW: Cartogram
SLIDE 35 US ELECTIONS DATA VISUALIZATIONS
WHY: Show forecasted voter intentions
WHAT: % chance of “winning” state; # of EC votes
HOW: Sankey diagram
SLIDE 36 US ELECTIONS DATA VISUALIZATIONS
WHY: Support for each party by region
WHAT: Margin of win; total # votes cast; geography
HOW: Choropleth map
SLIDE 37 US ELECTIONS DATA VISUALIZATIONS
WHY: Changes in votes cast between 2016 & 2012
WHAT: Changing support; margin (points) of lead by party
HOW: Choropleth map
SLIDE 38 US ELECTIONS DATA VISUALIZATIONS
WHY: Changes in votes cast between 2016 & 2012
WHAT: Changing support; margin (points) of lead by party
HOW: Choropleth map
SLIDE 39 US ELECTIONS DATA VISUALIZATIONS
WHY: Changes in votes cast between 2016 & 2012
WHAT: Changing support; margin (percentage) of lead by party; # EC votes
HOW: Stacked bar chart
SLIDE 40 US ELECTIONS DATA VISUALIZATIONS
- All of these visualizations are trying to solve a very similar problem
§ Show how people may vote & how this affects elections
- Very different types of data are shown in each visualization
§ A visualization is only as good as its underlying data
§ It is VERY important to understand data sources
- Different visual metaphors are used, some simple, some complex
SLIDE 41
COMMUNICATING PATIENT RISK
SLIDE 42 HOW DO WE COMMUNICATE RISK?
XKCD Comic #881
SLIDE 43 EVIDENCE FROM RISK COMMUNICATION
Ease of understanding: Probability (60%) < Frequency (6 in 10) < Visualization
(from difficult to understand to easier to understand)
Whiting (2015) “How well do health professionals interpret diagnostic information? A systematic review”
- Numeracy: the ability to reason with numbers
§ Individuals with low numeracy have difficulty interpreting numbers and probabilities
- Visualizations can help people with low numeracy make sense of data
- But there is limited guidance on visualization design
§ Different visualizations are not equally effective
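The step from "60%" to "6 in 10" is mechanical, which makes it easy to apply consistently when preparing patient-facing material. A minimal sketch, assuming a fixed reference group size (10 by default, or 1000 as in the mammography problem); the as_frequency helper is hypothetical.

```python
def as_frequency(p: float, denominator: int = 10) -> str:
    """Phrase probability p as a natural frequency 'k in denominator'.

    k is p * denominator rounded to the nearest whole number; note that
    Python's round() sends exact halves to the nearest even integer.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    k = round(p * denominator)
    return f"{k} in {denominator}"


print(as_frequency(0.60))        # 6 in 10
print(as_frequency(0.01, 1000))  # 10 in 1000
```

The choice of denominator matters: small probabilities need a large reference group (here 1000) so that k does not round to zero.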
SLIDE 44 EXAMPLE: SHARED DECISION MAKING
Garcia-Retamero et al. (2013) “Visual representation of statistical information improves diagnostic inferences in doctors and their patients”
STUDY DESIGN
- Quasi-randomized trial with four conditions
§ Patients + doctors randomized to a probability or frequency format, then to a visual aid or no visual aid
- Outcome: correctly calculating the risk (essentially a math test)
RESULTS
- Visualization improved comprehension of both doctors and patients
- Visualization improved concordance between doctors and patients
SLIDE 45 EXAMPLE: BREAST CANCER TX CHOICE
[Figure panels: Baseline Visualization, Alternative 1, Alternative 2]
SLIDE 46
EXAMPLE: WWW.VIZHEALTH.ORG
SLIDE 47
MAMMOGRAPHY SCREENING PROBLEM
What is the probability that a woman who participates in routine screening and receives a positive result has breast cancer? “The probability of breast cancer is 1% for a woman who participates in routine screening. If a woman who participates in routine screening has breast cancer, the probability is 80% that she will have a positive test result. If a woman who participates in routine screening does not have breast cancer, the probability is 9.6% that she will have a positive test result”
SLIDE 48
MAMMOGRAPHY SCREENING PROBLEM
“The probability of breast cancer is 1% for a woman who participates in routine screening. If a woman who participates in routine screening has breast cancer, the probability is 80% that she will have a positive test result. If a woman who participates in routine screening does not have breast cancer, the probability is 9.6% that she will have a positive test result”
                   Has Breast Cancer   No Breast Cancer   Total
Positive Result    8                   95                 103
Negative Result    2                   895                897
Total              10 (1%)             990 (99%)          1000
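The table above follows directly from the three probabilities in the problem statement applied to 1000 women. A sketch of that construction, where prevalence, sensitivity, and false_positive_rate are my labels for the quoted 1%, 80%, and 9.6%, and screening_table is a hypothetical helper. Counts are rounded to whole women, which is why 9.6% of 990 (95.04) becomes 95.

```python
from __future__ import annotations


def screening_table(n: int, prevalence: float, sensitivity: float,
                    false_positive_rate: float) -> dict[str, tuple[int, int, int]]:
    """Natural-frequency 2x2 table: (has cancer, no cancer, total) per row."""
    with_cancer = round(n * prevalence)              # 1% of 1000 -> 10
    without_cancer = n - with_cancer                 # 990
    true_pos = round(with_cancer * sensitivity)      # 80% of 10 -> 8
    false_neg = with_cancer - true_pos               # 2
    false_pos = round(without_cancer * false_positive_rate)  # 95.04 -> 95
    true_neg = without_cancer - false_pos            # 895
    return {
        "Positive Result": (true_pos, false_pos, true_pos + false_pos),
        "Negative Result": (false_neg, true_neg, false_neg + true_neg),
        "Total": (with_cancer, without_cancer, n),
    }


table = screening_table(n=1000, prevalence=0.01, sensitivity=0.80,
                        false_positive_rate=0.096)
for row, (cancer, no_cancer, total) in table.items():
    print(f"{row:<16}{cancer:>6}{no_cancer:>8}{total:>8}")
```

Working in whole women rather than percentages is exactly the probability-to-frequency shift described in the risk communication slides.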
SLIDE 49
MAMMOGRAPHY SCREENING PROBLEM
“The probability of breast cancer is 1% for a woman who participates in routine screening. If a woman who participates in routine screening has breast cancer, the probability is 80% that she will have a positive test result. If a woman who participates in routine screening does not have breast cancer, the probability is 9.6% that she will have a positive test result”
                   Has Breast Cancer   No Breast Cancer   Total
Positive Result    8                   95                 103
Negative Result    2                   895                897
Total              10 (1%)             990 (99%)          1000
Probability of breast cancer given a positive mammogram: 7.8% (8 ÷ 103)
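For comparison, the same answer in probability format, via Bayes' theorem and using only the three probabilities quoted in the problem statement:

```latex
P(\text{cancer} \mid +)
  = \frac{P(+ \mid \text{cancer})\,P(\text{cancer})}
         {P(+ \mid \text{cancer})\,P(\text{cancer}) + P(+ \mid \text{no cancer})\,P(\text{no cancer})}
  = \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.096 \times 0.99}
  = \frac{0.008}{0.10304}
  \approx 0.0776
```

This rounds to the same 7.8% as 8 ÷ 103 ≈ 0.0777; the tiny gap comes only from rounding 9.6% of 990 (95.04) to 95 women in the table. The frequency version needs one division; the probability version needs the full formula, which is the point of the exercise.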
SLIDE 50 MAMMOGRAPHY SCREENING PROBLEM
1000 women
- 10 have breast cancer
§ 8 positive result
§ 2 negative result
- 990 do not have breast cancer
§ 95 positive result
§ 895 negative result
SLIDE 51
MAMMOGRAPHY SCREENING PROBLEM
SLIDE 52
MAMMOGRAPHY SCREENING PROBLEM
SLIDE 53
WHAT DO YOU THINK?
Why? What? How?