Visualizing Public Health Data Anamaria Crisan, MSc PhD student - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Visualizing Public Health Data

Anamaria Crisan, MSc

PhD student with Drs. Jennifer Gardy & Tamara Munzner UBC School of Population and Public Health
slide-2
SLIDE 2

Why am I giving this talk?

slide-3
SLIDE 3 3

Master of Science (Bioinformatics)

PhD (“Computational Public Health”)

GenomeDx Biosciences

British Columbia Centre for Disease Control

Timeline: 2008, 2010, 2013, 2015

@amcrisan http://cs.ubc.ca/~acrisan

slide-4
SLIDE 4 4

I’m not an artist. I’m a data analyst.

Computer Science Skills + Data Visualization Skills!

http://blog.framed.io/
slide-5
SLIDE 5 5

Disclaimer

I’ll be talking about a project I worked on while employed at GenomeDx Biosciences. Everything I am presenting is publicly available, but this doesn’t mean that I endorse their products or the products of their competitors. Furthermore, I am relaying high-level details of my own thought process during and after this project, not the thoughts of others at the organization.

slide-6
SLIDE 6 6

Eventually I Had to Explain my Work to Experts with Different Backgrounds

I often used data visualization to explain the results of data mining and statistical techniques. But one day I got tasked with a rather challenging problem…

slide-7
SLIDE 7 7

The task: We had developed a genomic biomarker panel to assess a man’s risk of metastatic prostate cancer following prostatectomy

The Question: How do we communicate “risk”?

XKCD Comic #881
slide-8
SLIDE 8 8

I wanted to take more ownership of the question “how do we communicate risk?”

slide-9
SLIDE 9 9

I wanted to take more ownership of the question “how do we communicate risk?” There wasn’t a simple answer

slide-10
SLIDE 10 10

http://bit.ly/1Knrj19

Just show a Number …

slide-11
SLIDE 11

http://bit.ly/1FxtT2z

Is a Data Visualization really Necessary?

slide-12
SLIDE 12 12

Probability (60%) < Frequency (6 in 10) < Visualization

(difficult to understand) to (easier to understand)

Evidence from Risk Communication Literature

Whiting et al. (2015) “How well do health professionals interpret diagnostic information? A systematic review”

Numeracy: the ability to reason with numbers

  • Individuals with low numeracy have difficulty interpreting numbers and probabilities
  • Visualizations can help people with low numeracy make sense of data
  • But there is some evidence that low numeracy affects reasoning with graphs as well
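
The three formats on this slide (probability, frequency, visualization) can be sketched in a few lines of Python. This is only an illustration: the function names `to_frequency` and `icon_array`, the fixed denominator of 10, and the text icons are assumptions made for the example, not something taken from the talk or the cited review.

```python
def to_frequency(probability: float, denominator: int = 10) -> str:
    """Express a probability as a natural frequency such as '6 in 10'."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    count = round(probability * denominator)
    return f"{count} in {denominator}"


def icon_array(probability: float, denominator: int = 10,
               filled: str = "●", empty: str = "○") -> str:
    """Draw a one-row text icon array: filled icons mark affected individuals."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    count = round(probability * denominator)
    return filled * count + empty * (denominator - count)


if __name__ == "__main__":
    risk = 0.60
    print(f"{risk:.0%}")       # probability format
    print(to_frequency(risk))  # frequency format
    print(icon_array(risk))    # very simple visualization
```

A real report would likely use a proper graphics library and a tested layout; the point here is only that the same number can be re-expressed in progressively easier formats.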

slide-13
SLIDE 13 13

Example: Data Visualization in Shared Decision Making

Garcia-Retamero et al. (2013) “Visual representation of statistical information improves diagnostic inferences in doctors and their patients”

STUDY DESIGN

Quasi-randomized trial with four conditions: patients and doctors were randomized to a probability or a frequency format, and within each format to a visual aid or no visual aid

Outcome: correctly calculating the risk (essentially a math test)

RESULTS

  • Visualization improved comprehension of both doctors and patients
  • Visualization improved concordance between doctors and patients
slide-14
SLIDE 14

Yes! Data visualization was more than a “nice to have”!

slide-15
SLIDE 15 15

Example Report: OncotypeDx DCIS report

Show a Number and a Picture

slide-16
SLIDE 16 16

Example Report: Myriad Prolaris Prostate Cancer Test Report

Show a Number and a Picture

slide-17
SLIDE 17 17

Example Report: Decipher Prostate Cancer Test Report

Primary population: Men, who are susceptible to red-green colour blindness

Show a Number and a Picture

slide-18
SLIDE 18 18

Example : Deciding upon an Intervention

Helping breast cancer patients decide between multiple treatment options.

Baseline Visualization, Alternative 1, Alternative 2

Zikmund-Fisher (2013). A demonstration of “less can be more” in risk graphics.

Zikmund-Fisher (2008). Improving understanding of adjuvant therapy options by using simpler risk graphics.
slide-19
SLIDE 19

SO… what is data visualization?

19
slide-20
SLIDE 20 20

Data visualization is not art

Beyond Building Pretty & Cool Visualizations

slide-21
SLIDE 21

Defining Data Visualization

Beyond Building Pretty & Cool Visualizations

Design, Data Visualization, Art

(I argue data visualization is much more about design)

Ideas taken from @rachelbinx’s 2016 Open Vis talk and http://featureguru.com/art-vs-design.html

slide-22
SLIDE 22 22

BUT WAIT!

There’s more to data visualization than simply communicating numerical data

slide-23
SLIDE 23 23

Example : Hypothesis Generation

John Snow’s Visualization of the 1854 Cholera Outbreak

Allowed John Snow to form the hypothesis of what may be leading to the cholera outbreak
slide-24
SLIDE 24 24

Example : Hypothesis Generation

John Snow’s Visualization of the 1854 Cholera Outbreak

Allowed John Snow to form the hypothesis of what may be leading to the cholera outbreak
slide-25
SLIDE 25 25

Example : Checking Assumptions of Statistical Models

Anscombe’s quartet: four datasets that have near-identical descriptive statistics but that look very different when visualized.

Anscombe, F. (1973) “Graphs in Statistical Analysis”

Data visualization has long complemented applied statistical practices. Consider Tukey’s classic “Exploratory Data Analysis”, which is rife with suggestions for how to visualize data.
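
To make the Anscombe point concrete, here is a small Python sketch that computes the summary statistics for the four datasets. The data values are the ones published in Anscombe (1973); the helper names `pearson_r` and `summarize` are illustrative choices for this example. Each dataset has a mean of x of 9, a mean of y of about 7.5, and a correlation of about 0.82, yet the four scatterplots look nothing alike.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out to avoid dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Descriptive statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "r": round(pearson_r(xs, ys), 2),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(name, summarize(xs, ys))
```

Plotting each pair as a scatterplot is the step that actually reveals the differences; the numbers alone do not.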

slide-26
SLIDE 26

So what should we think about when designing data visualizations?
slide-27
SLIDE 27

A Data Visualization in 3 Questions:

Why? (Motivation)

Why do you need to visualize data?

What? (Data)

What kind of data is being visualized?

How? (Visual and Interaction Design)

How is data being visualized?

slide-28
SLIDE 28

A Data Visualization in 3 Questions:

Design Evaluation

  • Why? Does the visualization solve a relevant problem?
  • What? Are you using the right data, or deriving the right data?
  • How? Are the visual and interactive design choices appropriate?

slide-29
SLIDE 29

Steps to Design and Evaluate a Data Visualization

Why, What, How, How

DESIGN, EVALUATION

Munzner (2014) “Visualization Analysis and Design”
slide-30
SLIDE 30

Steps to Design and Evaluate a Data Visualization

Methodology

  • Why: Qualitative Methods, Domain Knowledge
  • What: Qualitative & Quantitative Methods
  • How: Design & Cognitive Science
  • How: Computer Science
slide-31
SLIDE 31 31

The “Design Space” metaphor

Sedlmair 2012 https://www.cs.ubc.ca/nest/imager/tr/2012/dsm/dsm-talk.pdf
slide-32
SLIDE 32 32

The “Design Space” metaphor

Sedlmair 2012 https://www.cs.ubc.ca/nest/imager/tr/2012/dsm/dsm-talk.pdf

OPTIMIZATION!

How Data Visualization is like Statistical Modelling

slide-33
SLIDE 33 33

The “Design Space” metaphor

Progressively Identify the Right Visualization

Use the “why, what, and how” framework to guide the selection of the optimal design choice

Sedlmair 2012 https://www.cs.ubc.ca/nest/imager/tr/2012/dsm/dsm-talk.pdf
slide-34
SLIDE 34 34

The Importance of Thinking Broadly

Munzner (2014) “Visualization Analysis and Design”

Use the “why, what, and how” framework to guide the selection of the optimal design choice
slide-35
SLIDE 35

Designs for Visualizing Health Data (http://www.vizhealth.org/)
slide-36
SLIDE 36

A preview of some things I am working on

36
slide-37
SLIDE 37

How do we design good visualizations for public health?

37

BUT…..

slide-38
SLIDE 38 38

Primary Research Question

To what extent and in what ways does the visualization of genomic, administrative, and contact network data support decision making for communicable disease prevention and control?

slide-39
SLIDE 39 39

Primary Research Question

To what extent and in what ways does the visualization of genomic, administrative, and contact network data support decision making for communicable disease prevention and control?

  • aka. “How is visualization of communicable disease (public health) data useful? Can I quantify how useful it is?”

slide-40
SLIDE 40 40

Some Example Sub-Questions

  • What is the best way to visually represent data in an outbreak context to promote a rapid response?
  • How can stakeholders explore their data more effectively to identify areas of need and develop effective outreach programs?
  • What is the most effective way to show genomic data over space and time?

slide-41
SLIDE 41 41

Example 1

Visualizing Tuberculosis data at the British Columbia Centre for Disease Control

slide-42
SLIDE 42

WHY

42
slide-43
SLIDE 43

Combining Data will Prepare us for the Pandemics of the Future

Clinical, Social, Lab
slide-44
SLIDE 44 44

But, that’s a lot of data….

slide-45
SLIDE 45

Can Visualizing TB Data Help Decision Support?

We wanted to create an interactive and visual tool that allows our public health stakeholders to analyze the different data types

We want to understand how this tool can be used by different public health stakeholders:

  • TB Nurses
  • TB Clinicians
  • Medical Health Officers
  • Researchers
  • Epis / Biostats
slide-46
SLIDE 46

WHAT

46
slide-47
SLIDE 47

Treatment, Genomic, Contact Network, Patient Data, Outcomes, Geography / Location
slide-48
SLIDE 48

Treatment, Genomic, Contact Network, Patient Data, Outcomes, Geography / Location

Genomic: TB whole genome, Genotyping

slide-49
SLIDE 49

Treatment, Genomic, Contact Network, Patient Data, Outcomes, Geography / Location, Time
slide-50
SLIDE 50

HOW

50
slide-51
SLIDE 51

An Iterative Approach to Development

An iterative approach to development allows us to get feedback before committing to ineffective design choices

slide-52
SLIDE 52

Introducing EpiCOGs

DEMO

EpiCOGs is a data viewer and currently a sandbox environment for developing data visualizations

slide-53
SLIDE 53

Factors Influencing the Current Design

Technology Changes

Support for data visualization tools in R has improved greatly, allowing for the creation of better data visualizations

Data-Driven Interface and Analysis

Created a data-driven interface that is responsive to the user’s data.

Policies and Procedures

Existing policies and procedures at the BCCDC inform the utility of such a tool and how it can integrate into existing workflows

Needs of individuals

Gathered through meetings, dialogue with individuals, and various iterations of EpiCOGs

slide-54
SLIDE 54

Initial Work & Next Directions

Much initial work was to understand the tool’s feasibility

  • Could it meet the needs of stakeholders?
  • How could it integrate (security & workflow)?
  • How could it be supported long term? (Choice of R)
  • Could we build a useful tool in R?

Next phases will explore genotypes, genomics, and contact networks

Right now, users can filter based on assigned genotype clusters (which will show patients on a map), but we’re working towards better visual and interactive design for these data

slide-55
SLIDE 55

This is an Open Source Project

TRY THE DEMO: https://amcrisan.shinyapps.io/EpiCOGSDEMO/

GET THE CODE (& contribute to the project!): https://github.com/amcrisan/EpiCOGS/

slide-56
SLIDE 56

Call for Guinea Pigs!

To make relevant tools I need feedback! If you want to be involved and get project updates let me know!

E-mail: anamaria.crisan@bccdc.ca
Twitter: @amcrisan
Web: cs.ubc.ca/~acrisan

slide-57
SLIDE 57 57

Example 2

Visualizing the Ebola Outbreak – An example of a design process

slide-58
SLIDE 58 58

This was what we started with

A very familiar layout, all the information is there, but you have to do some work to put it together

slide-59
SLIDE 59 59

This was what we started with

Bedford Lab – Next Strain

slide-60
SLIDE 60

Can we improve the Design of the Visualization?

  • How do different public health stakeholders use the data?
  • Epidemiologists want to know:
      – Where is it spreading?
      – How is it spreading?
      – How many people are impacted?
  • Researchers want to know:
      – What’s spreading?
      – How similar are the outbreak clusters?
      – How is it changing over time?
slide-61
SLIDE 61 61

Step 1: Small multiples by time

Aggregate case distribution over entire sampling period

Aggregate case distribution by month
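
The move from one aggregate view to small multiples by time comes down to regrouping the same case records. Below is a minimal Python sketch of that regrouping. The case records, district names, and function names are all hypothetical and made up for illustration; they are not the Ebola data or the code used in the project.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical case records: (date reported, district). Illustrative only.
CASES = [
    (date(2014, 6, 3), "District A"),
    (date(2014, 6, 17), "District A"),
    (date(2014, 6, 20), "District B"),
    (date(2014, 7, 1), "District B"),
    (date(2014, 7, 9), "District B"),
    (date(2014, 8, 2), "District C"),
]


def aggregate_overall(cases):
    """One panel: case counts per district over the whole sampling period."""
    return Counter(district for _, district in cases)


def aggregate_by_month(cases):
    """Small multiples: one panel of per-district counts for each month."""
    panels = defaultdict(Counter)
    for reported, district in cases:
        panels[(reported.year, reported.month)][district] += 1
    # Sort panels chronologically so they can be laid out left to right.
    return dict(sorted(panels.items()))


if __name__ == "__main__":
    print("Overall:", dict(aggregate_overall(CASES)))
    for (year, month), counts in aggregate_by_month(CASES).items():
        print(f"{year}-{month:02d}:", dict(counts))
```

Each entry returned by `aggregate_by_month` would feed one panel of the small-multiples display; adding genome cluster as a second grouping key would extend the same idea to Step 2.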
slide-62
SLIDE 62 62

Step 2: Small multiples by time and genome cluster

slide-63
SLIDE 63 63

Step 3: Small multiples by time and genome cluster and with sequence similarity

White: dominant nucleotide

Grey: less dominant nucleotide
slide-64
SLIDE 64 64

Highly populous capital is very difficult to see

By abstracting the geography, we can represent more data more easily

slide-65
SLIDE 65 65

Capital city gets a more prominent view

By abstracting the geography, we can represent more data more easily

slide-66
SLIDE 66 66

X-axis ordering

  • Alphabetically within high-level administrative regions
  • Geographic distance
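
The two x-axis orderings can be compared with a short Python sketch. Everything here is hypothetical: the district names, regions, coordinates, and reference point are invented, and "geographic distance" is interpreted as great-circle distance from a single reference location (for example, the capital), which is an assumption about what the slide intended.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical districts: (name, region, latitude, longitude).
DISTRICTS = [
    ("Kappa", "North", 9.0, -12.0),
    ("Alpha", "South", 8.4, -13.0),
    ("Delta", "North", 8.6, -12.4),
    ("Beta", "South", 8.0, -11.0),
]
REFERENCE = (8.5, -13.2)  # hypothetical reference point, e.g. the capital


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def order_alphabetically_within_region(districts):
    """Group by region, then sort district names within each region."""
    return [d[0] for d in sorted(districts, key=lambda d: (d[1], d[0]))]


def order_by_distance(districts, reference):
    """Sort districts by distance from the reference point, nearest first."""
    ref_lat, ref_lon = reference
    return [d[0] for d in sorted(
        districts,
        key=lambda d: haversine_km(ref_lat, ref_lon, d[2], d[3]))]


if __name__ == "__main__":
    print(order_alphabetically_within_region(DISTRICTS))
    print(order_by_distance(DISTRICTS, REFERENCE))
```

The alphabetical ordering makes a district easy to look up, while the distance ordering keeps neighbouring places next to each other on the axis; which one is appropriate depends on the question the viewer is asking.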
slide-67
SLIDE 67

Part 3:

Take home messages

67
slide-68
SLIDE 68

Beyond Building Pretty & Cool Visualizations

Data visualization is not art. It is a research process.

slide-69
SLIDE 69 69

Take Home Messages

Data Visualization is not an art or graphic design project

  • Relevance (utility) and usability trump aesthetics

slide-70
SLIDE 70 70

Take Home Messages

Data Visualization is not an art or graphic design project

  • Relevance (utility) and usability trump aesthetics

Deciding upon the most appropriate data visualization can be a research problem

  • Think about the “why, what, and how” framework
  • Parallels to finding the right statistical model
  • Design & Evaluation

slide-71
SLIDE 71 71

Take Home Messages

Data Visualization is not an art or graphic design project

  • Relevance (utility) and usability trump aesthetics

Deciding upon the most appropriate data visualization can be a research problem

  • Think about the “why, what, and how” framework
  • Parallels to finding the right statistical model
  • Design & Evaluation

Think broadly, progressively find the right data visualization

  • The Design Space Concept
  • Iterative development

slide-72
SLIDE 72

This work would not be possible without these fine people

PHSA Reference Laboratory

  • Dr. Patrick Tang
  • Hope Lapointe
  • Clare Kong

BCCDC CDPACS

  • Ciaran Aiken
  • Laura MacDougall
  • Mike Coss
  • Sunny Mak
  • Mike Otterstatter
  • Robert Balshaw

BCCDC Clinical TB Team

Clinicians

  • Dr. Maureen Mayhew
  • Dr. James Johnston
  • Dr. Jason Wong (CPS)
  • Dr. Victoria Cook

Nurses

  • Nash Dhalla
  • Michelle Mesaros

Epidemiologists

  • Dr. David Roth

The Gardy Lab

  • Dr. Jennifer Gardy
  • Jennifer Guthrie

UBC Computer Science

  • Dr. Tamara Munzner
  • The InfoVis group

The large team of individuals from BC’s HAs and HSDAs without whom there would be no data.