Effective Visualizations for Credible, Data-Driven Decision Making
Marc Vandemeulebroecke, Mark Baillie, Charlotta Früchtenicht and Diego Saldana On behalf of the visR collaboration team http://openpharma.github.com/visR
https://www.nytimes.com/2020/03/19/health/coronavirus-distancing-transmission.html
https://informationisbeautiful.net/
https://www.economist.com/briefing/2020/02/29/covid-19-is-now-in-50-countries-and-things-will-get-worse
https://twitter.com/CT_Bergstrom/status/1235865328074153986
○ Know the purpose of creating the graph
○ Identify the quantitative evidence to support the purpose
○ Identify the audience and focus the design to support their needs
○ Avoid misrepresentation (use appropriate scales)
○ Choose the appropriate graph type to display your data
○ Maximize data to ink ratio (reduce distraction, less is more)
○ Use proximity and alignment to aid in comparisons
○ Minimize mental arithmetic (e.g. plot the difference)
○ Use colors and annotations to highlight important details
https://ascpt.onlinelibrary.wiley.com/doi/full/10.1002/psp4.12455
Why: Clearly identify the purpose of the graph, e.g. to deliver a message or for exploration?
What: Identify the quantitative evidence to support the purpose
Who: Identify the intended audience (specialists, non-specialists, both) and focus the design to support their needs
Where: Adapt the design to space or formatting constraints (e.g. clinical report, slide deck or publication)
https://graphicsprinciples.github.io/
https://simplystatistics.org/2019/04/17/tukey-design-thinking-and-better-questions/
“More Questions, Better Questions
Most of the time in data analysis, we are trying to answer a question with data. I don’t think it’s controversial to say that, but maybe that’s the wrong approach? Or maybe, that’s what we’re not trying to do at first. Maybe what we spend most of our time doing is figuring out a better question.”
“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.” - John Tukey
Credit Andrew Wright, Novartis
https://twitter.com/YouGov/status/838750115796041728
Deviation: baseline, Waterfall
Correlation: Scatter plot, Heat map
Ranking: Horizontal bar chart, Dotplot
Distribution: Boxplot, Histogram
Evolution: Kaplan Meier, Line plot
Part-to-whole: Stacked bar chart, Tree map
Magnitude: Vertical bar chart, Forest plot
https://graphicsprinciples.github.io/
[Figure legend: Genetic marker positive, Genetic marker negative]
https://graphicsprinciples.github.io/
https://socviz.co/ https://serialmentor.com/dataviz/ https://www.principiae.be/book/
○ enable clear and impactful communication,
○ elevate our influence with our stakeholders,
○ facilitate informed decision making.
https://twitter.com/EricTopol/status/1236001710507585536
Problem: Styling and annotating plots is time-consuming, so most exploratory analyses do not adhere to these principles, which creates additional work downstream.
Figures and tables in reports should always have:
treatment group)
Essential metadata needs to be part of the rendered object so that it does not get lost. Additional context can be provided as a separate numbered caption in the report.
Example: Table Shell for Baseline Demographics
Requirements
○ Integrate graphical principles in your analytics projects
○ Seamless integration into analytics & reporting workflows
○ Combination of ease of use with flexibility for complex analyses
○ Adaptable to target audiences without repeating core analysis
○ Export outputs (plots & tables) to a variety of formats
○ Explore different visualisations of analytic data set
○ Suitable for analytics use cases in clinical/medical development
language in clinical development
parts of the problem that we can build upon
questions and stages in the workflow
package and contribute to future development
Input data, visR::plot_x(), visR::style_x()
Wrapper for typical analyses: visR::plot_attrition etc.
Use existing functions/packages where possible
Convenience functions to aggregate data and estimate models*
*Functions for survival models, p-values, confidence intervals, ...: make these available separately and allow them to be called on patient-level data
Styles should be adaptable to corporate designs
tidyverse
○ Re-use established tools where possible
○ Interact with dplyr and modeling packages
○ Plotting should build on ggplot2
modification
formats (html, pdf, word, …)
○ Estimator function: computes estimates, as well as upper and lower bounds, p-values, etc.
○ Visualization function: visualizes data as a plot or a table (or something else).
○ Style function: applies common theme and color palettes to all outputs
Input data model (a data.frame / tibble): trt, time, status
Estimator Function
Interim (estimate) data model (e.g. broom): time, trt, estimate, lower, upper
Visualization Function + Style Function
Visualization
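The split between estimator, visualization and style described above can be illustrated with a minimal sketch. It is written in plain Python purely to show the pattern and is not the visR API: the names estimate_km, render_table and DEFAULT_STYLE are hypothetical, and the normal-approximation interval based on Greenwood's formula is an assumption chosen for brevity. The estimator turns patient-level rows (trt, time, status) into a tidy interim table (trt, time, estimate, lower, upper), and the visualization step only ever sees that interim table.

```python
import math
from collections import defaultdict


def estimate_km(rows, z=1.96):
    """Estimator: Kaplan-Meier estimate per treatment arm.

    rows: iterable of dicts with keys trt, time, status (1 = event, 0 = censored).
    Returns a tidy list of dicts with keys trt, time, estimate, lower, upper,
    one entry per arm and distinct event time.
    """
    by_trt = defaultdict(list)
    for r in rows:
        by_trt[r["trt"]].append((r["time"], r["status"]))

    tidy = []
    for trt in sorted(by_trt):
        obs = by_trt[trt]
        at_risk = len(obs)
        surv, greenwood = 1.0, 0.0
        for t in sorted({time for time, _ in obs}):
            events = sum(1 for time, s in obs if time == t and s == 1)
            leaving = sum(1 for time, _ in obs if time == t)
            if events > 0:
                surv *= 1 - events / at_risk
                # Greenwood variance term; skipped when everyone at risk has
                # the event, because the estimate is then 0 and so is its SE.
                if at_risk > events:
                    greenwood += events / (at_risk * (at_risk - events))
                se = surv * math.sqrt(greenwood)
                tidy.append({
                    "trt": trt,
                    "time": t,
                    "estimate": surv,
                    "lower": max(0.0, surv - z * se),
                    "upper": min(1.0, surv + z * se),
                })
            at_risk -= leaving  # events and censored both leave the risk set
    return tidy


# Style: presentation settings kept apart from estimation and rendering logic.
DEFAULT_STYLE = {"digits": 2, "header": ("trt", "time", "estimate", "lower", "upper")}


def render_table(tidy, style=DEFAULT_STYLE):
    """Visualization: render the tidy estimates as a plain-text table."""
    d = style["digits"]
    lines = ["  ".join(style["header"])]
    for r in tidy:
        lines.append(
            f'{r["trt"]}  {r["time"]}  {r["estimate"]:.{d}f}  '
            f'{r["lower"]:.{d}f}  {r["upper"]:.{d}f}'
        )
    return "\n".join(lines)


# Made-up patient-level data, for illustration only.
example_rows = [
    {"trt": "A", "time": 2, "status": 1},
    {"trt": "A", "time": 3, "status": 0},
    {"trt": "A", "time": 5, "status": 1},
    {"trt": "B", "time": 4, "status": 0},
    {"trt": "B", "time": 6, "status": 1},
]
print(render_table(estimate_km(example_rows)))
```

Because the interim table is the only contract between the steps, the same estimates could be sent to a plot, a table or an export without re-running the analysis, and a different style dictionary changes the look without touching the numbers.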
1 Build Analysis Cohort
How many patients are kept after applying inclusion / exclusion criteria?
2 Baseline Characteristics
What are the general characteristics of the population we are analyzing?
3 Estimate Survival Function
What is the probability of a patient having survived at time t given his/her stratum?
Attrition
Summary
Kaplan-Meier
created by vr_attrition_table.R
Can help to comply with reporting guidelines like CONSORT and STROBE.
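As a rough illustration of what an attrition table records, the sketch below applies inclusion / exclusion criteria one after another and counts how many patients remain after each step. It is a plain Python sketch, not the implementation in vr_attrition_table.R: the function name attrition_table, the patient fields and the criteria are all hypothetical.

```python
def attrition_table(patients, criteria):
    """Apply criteria in order and report how many patients remain after each.

    patients: list of dicts, one per patient.
    criteria: list of (label, keep) pairs, where keep(patient) returns True
              if the patient stays in the cohort.
    """
    remaining = list(patients)
    table = [{"criterion": "All patients", "remaining": len(remaining), "excluded": 0}]
    for label, keep in criteria:
        kept = [p for p in remaining if keep(p)]
        table.append({
            "criterion": label,
            "remaining": len(kept),
            "excluded": len(remaining) - len(kept),
        })
        remaining = kept  # the next criterion only sees patients kept so far
    return table


# Made-up cohort and criteria, for illustration only.
example_patients = [
    {"id": 1, "age": 34, "consent": True},
    {"id": 2, "age": 17, "consent": True},
    {"id": 3, "age": 52, "consent": False},
    {"id": 4, "age": 61, "consent": True},
]
example_criteria = [
    ("Age >= 18", lambda p: p["age"] >= 18),
    ("Informed consent given", lambda p: p["consent"]),
]
for row in attrition_table(example_patients, example_criteria):
    print(row["criterion"], row["remaining"], row["excluded"])
```

Keeping the label next to each count means the same table can feed either a tabular summary or a flow diagram of the kind that reporting guidelines ask for.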
and displayed in a table
easily adapted using in-built or custom summary functions
DT html tables with and without download feature Rendering as gt table Rendering as dt table
from a survey and paper by Morris et
about what the perfect Kaplan-Meier plot should look like
such as number of patients, axis labels (with units where needed), data source.
censored, by stratum at regular timepoints.
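A risk table of this kind can be sketched as follows, assuming it reports the number at risk together with cumulative events and censored patients for each stratum at each timepoint. This is a plain Python illustration, not visR code: the function name risk_table and the counting convention (a patient is at risk at t if their observed time is at least t, and events or censorings are counted strictly before t) are assumptions, and other conventions exist.

```python
from collections import defaultdict


def risk_table(rows, timepoints):
    """Count at-risk, events and censored per stratum at the given timepoints.

    rows: iterable of dicts with keys trt, time, status (1 = event, 0 = censored).
    Convention assumed here: at risk at t means time >= t; events and censored
    are cumulative counts of patients whose observed time is < t.
    """
    by_trt = defaultdict(list)
    for r in rows:
        by_trt[r["trt"]].append((r["time"], r["status"]))

    table = []
    for trt in sorted(by_trt):
        obs = by_trt[trt]
        for t in timepoints:
            table.append({
                "trt": trt,
                "time": t,
                "at_risk": sum(1 for time, _ in obs if time >= t),
                "events": sum(1 for time, s in obs if time < t and s == 1),
                "censored": sum(1 for time, s in obs if time < t and s == 0),
            })
    return table


# Made-up data, for illustration only.
example_rows = [
    {"trt": "A", "time": 2, "status": 1},
    {"trt": "A", "time": 3, "status": 0},
    {"trt": "A", "time": 5, "status": 1},
    {"trt": "A", "time": 8, "status": 1},
]
for row in risk_table(example_rows, timepoints=[0, 4, 8]):
    print(row)
```

Under this convention the three counts always add up to the stratum size at every timepoint, which gives a quick consistency check before the table is placed under a Kaplan-Meier curve.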
Kaplan-Meier Curve: estimation, plotting
Risk Table: estimation, plotting
Kaplan-Meier Curve + Risk Table: wrapper
estimation functions.
strata.
December 2019 Develop Prototype
Time to Event analysis
April 2020 Launch & Find Collaborators
stakeholders
collaborators to drive development
Q2/Q3 2020 Scale Development
choices
and table types following graphical principles
Project Kick Off
and requirements
architecture
visR is still in its experimental phase and we are looking for partners to further develop the package!
github issues
pick an issue & work on it
What contributions are we looking for?
visR: http://openpharma.github.com/visR
Graphics Principles: https://graphicsprinciples.github.io/
Acknowledgements
Survival, DT Datatables, kable & kableExtra, ggpubr, and many more
Acknowledgments
Thanos Siadimas, Pawel Kawski, James Black, Janine Hoffart, Baldur Magnusson, Alison Margolskee
How to reach out?
Email: mark.baillie@novartis.com & james.black.jb2@roche.com