Data visualization strategies and tools for microbial genomic epidemiology Anamaria Crisan Vanier Canada Scholar & UBC Public Scholar PhD Candidate, Computer Science University of British Columbia @amcrisan acrisan@cs.ubc.ca


  1. Data - Don’t Just Visualize the Raw Data! Example when this advice is ignored Example Original (Raw) Data Derived Data T. Munzner (2014) – Visualization Design and Analysis XKCD
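
The raw-versus-derived distinction on this slide can be sketched in a few lines. This is a minimal illustration, not taken from the talk: the regions, case counts, and populations are made-up numbers, and `cases_per_100k` is a hypothetical helper.

```python
# Hypothetical raw data: case counts and population per region (made-up numbers).
raw = [
    {"region": "A", "cases": 120, "population": 400_000},
    {"region": "B", "cases": 45, "population": 50_000},
    {"region": "C", "cases": 300, "population": 2_000_000},
]


def cases_per_100k(row):
    """Derived attribute: cases per 100,000 people."""
    return row["cases"] / row["population"] * 100_000


derived = [
    {"region": r["region"], "rate_per_100k": round(cases_per_100k(r), 1)}
    for r in raw
]

# Ranking by the raw count and by the derived rate tells two different stories.
by_count = [r["region"] for r in sorted(raw, key=lambda r: r["cases"], reverse=True)]
by_rate = [r["region"] for r in sorted(derived, key=lambda r: r["rate_per_100k"], reverse=True)]

print(by_count)  # ['C', 'A', 'B']
print(by_rate)   # ['B', 'A', 'C']
```

Plotting the raw counts would mostly show where the most people live; the derived rate is usually closer to the question being asked.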

  2. Tasks - How People Use the Data
     Geographic Overview of Prostate Cancer § Useful for epidemiologists and policy makers § Supports surveillance tasks (Source : Atlanta CDC)
     Individual Prostate Cancer Risk § Good for patients and doctors § Supports treatment decision making tasks (Source : http://riskcalc.org/PCPTRC/ (UT San Antonio))

  3. Tasks - How People Use the Data • Tasks can also change how the same data should be visualized • Example: representing US electoral college results Standard Map Cartogram

  4. Tasks - How People Use the Data • Tasks can also change how the same data should be visualized • Example: representing US electoral college results Standard Map Sankey Diagram

  5. Tasks - How People Use the Data • Tasks can also change how the same data should be visualized • Example: representing US electoral college results

  6. How can we identify tasks and data? Examples from my own research

  7. My research : making a clinical report for tuberculosis • Mixed methods approach to gathering data and tasks
     Stages: Discovery (Information Gathering), Design (Design & Evaluation), Implement (Finalize Design)
     Activities: Expert Consults, TB Workflow Map, Task & Data Questionnaire, Design Sprint, Design Choice Questionnaire
     Data Gathered: Qualitative, Quantitative
     Study Design: Exploratory Sequential Model, Embedded Model
     [Figure: example Mycobacterium tuberculosis genome sequencing report, page 1 of 2]

  8. My research : making a clinical report for tuberculosis
     Consensus among participants. Score categories (% agree): 3 (>75%), 2 (50%-75%), 1 (25%-50%), 0 (<25%)
     Task groups: DIAGNOSIS TASKS, TREATMENT TASKS, SURVEILLANCE TASKS
     Task columns, in order: (1) Diagnose Latent TB, (2) Diagnose Active TB, (3) Characterize Reactive vs New Infection, (4) Assess Transmission Risk, (5) Choose Meds, (6) Choose Tx Duration, (7) Guide Response to Tx, (8) Guide Contact Tracing, (9) Report to Public Health, (10) Define a Cluster, (11) Connect Case to Existing Cluster, (12) Guide Public Health Response
     Data | WGS equivalent | Scores for tasks 1-12 | TOTAL SCORE
     Patient Identifier | Same | 3 3 3 3 3 3 3 2 1 1 1 1 | 26
     Sample Collection Date | Same | 3 3 2 3 3 3 3 1 1 1 1 1 | 24
     Patient Prior TB Results | Same | 3 2 3 3 3 3 3 1 1 1 0 1 | 23
     Speciation | Speciation | 1 3 2 3 3 3 3 2 1 1 1 1 | 23
     Sample Type (sputum, fine needle aspirate etc.) | Same | 2 3 2 3 3 3 3 1 1 1 0 1 | 22
     Culture results | NA | 1 3 2 3 3 3 3 2 1 1 0 1 | 22
     Sample Collection Site (lymph node, lung etc.) | Same | 2 3 2 3 3 3 3 1 1 0 0 1 | 21
     Acid Fast Bacilli Smear | Speciation | 2 3 2 3 2 3 3 1 1 1 0 1 | 21
     Resistotype | Predicted DST | 0 2 3 1 3 3 2 2 1 1 1 1 | 19
     Phenotypic DST | Predicted DST | 0 2 3 2 3 3 2 1 1 1 0 1 | 18
     Chest x-ray | NA | 3 3 2 3 0 2 3 1 0 0 0 0 | 17
     Report Release Date | Same | 2 2 1 2 2 2 2 1 0 1 0 1 | 15
     Requester IDs | Same | 2 2 2 2 2 2 2 1 0 0 0 0 | 15
     Interpretation or comments from reviewer | Same | 2 2 1 2 2 2 3 1 0 0 0 0 | 15
     Predicted DST | Predicted DST | 0 2 2 1 3 3 2 1 0 1 0 0 | 15
     MIRU-VNTR | SNPs | 0 2 3 1 1 1 1 1 1 1 1 1 | 13
     Cluster Assignment | Same | 0 2 2 1 1 1 0 1 1 1 1 1 | 11
     SNP/variant distance | SNPs | 0 1 2 1 1 1 0 1 1 1 1 1 | 10
     Phylogenetic Tree | Same | 0 2 1 1 1 1 0 1 0 1 1 1 | 9
     Reviewer ID | Same | 1 1 1 1 1 1 1 1 0 0 0 0 | 8
     TST results | Speciation* | 3 1 1 1 0 0 0 1 0 0 0 0 | 7
     IGRA results | Speciation* | 3 1 1 1 0 0 0 1 0 0 0 0 | 7
     Lab QC | WGS Specific | 0 1 2 1 1 1 0 1 0 0 0 0 | 7
     Spoligotype | SNPs | 0 1 1 1 0 0 0 0 0 0 0 0 | 3
     RFLP | SNPs | 0 1 1 1 0 0 0 0 0 0 0 0 | 3
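
The agreement categories in the legend can be written as a small binning function. This is a sketch under assumptions: it reads the legend as 3 (>75%), 2 (50%-75%), 1 (25%-50%), 0 (<25%), and the handling of values that fall exactly on 25, 50, or 75 is a guess, since the slide does not say. The function name `agreement_category` is hypothetical.

```python
def agreement_category(percent_agree):
    """Map percent agreement (0-100) to a 0-3 consensus category.

    Which bin receives exactly 25, 50, or 75 is an assumption.
    """
    if not 0 <= percent_agree <= 100:
        raise ValueError("percent_agree must be between 0 and 100")
    if percent_agree > 75:
        return 3
    if percent_agree >= 50:
        return 2
    if percent_agree >= 25:
        return 1
    return 0


# Hypothetical example: 7 of 9 participants say a data item supports a task.
print(agreement_category(7 / 9 * 100))  # 3
```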

  9. My research : making a clinical report for tuberculosis
     MYCOBACTERIUM TUBERCULOSIS GENOME SEQUENCING REPORT (NOT FOR DIAGNOSTIC USE)
     Patient Name: JOHN DOE | Barcode | Birth Date: 2000-01-01 | Patient ID: 12345678910 | Location: SOMEPLACE
     Sample Type: SPUTUM | Sample Source: PULMONARY | Sample Date: 2016-12-25 | Sample ID: A12345678 | Sequenced From: MGIT CULTURED ISOLATE
     Reporting Lab: LAB NAME | Report Date/Time: 2017-01-01, 15:36 | Requested By: REQUESTER NAME | Requester Contact: REQUESTER@EMAIL.COM
     Summary: The specimen was positive for Mycobacterium tuberculosis. It is resistant to isoniazid and rifampin. It belongs to a cluster, suggesting recent transmission.
     Organism: The specimen was positive for Mycobacterium tuberculosis, lineage 2.2.1 (East-Asian Beijing).
     Drug Susceptibility (checkbox options): No drug resistance predicted; Mono-resistance predicted; Multi-drug resistance predicted; Extensive drug resistance predicted
     Resistance is reported when a high-confidence resistance-conferring mutation is detected. “No mutation detected” does not exclude the possibility of resistance.
     Drug class | Interpretation | Drug | Resistance Gene (Amino Acid Mutation)
     First Line | Susceptible | Ethambutol | No mutation detected
     First Line | Susceptible | Pyrazinamide | No mutation detected
     First Line | Resistant | Isoniazid | katG (S315T)
     First Line | Resistant | Rifampin | rpoB (S531L)
     Second Line | Susceptible | Streptomycin | No mutation detected
     Second Line | Susceptible | Ciprofloxacin | No mutation detected
     Second Line | Susceptible | Ofloxacin | No mutation detected
     Second Line | Susceptible | Moxifloxacin | No mutation detected
     Second Line | Susceptible | Amikacin | No mutation detected
     Second Line | Susceptible | Kanamycin | No mutation detected
     Second Line | Susceptible | Capreomycin | No mutation detected
     Page 1 of 2 | Patient ID: 12345678910 | Date: 2017-01-01 | Location: Someplace

  10. Thinking Systematically about Data Visualization Levels: Domain Problem; Data + Task; Visual + Interaction Design Choices; Algorithm 4. Explore if other visualizations have addressed this problem and set of tasks & data 5. Implement your own solution (remember this includes interaction!) T. Munzner (2014) – Visualization Design and Analysis

  11. Marks & Channels : Basic Building Blocks Mark: Basic Graphical Element (basic building block) Channel: Controls the appearance of marks T. Munzner (2014) – Visualization Design and Analysis

  12. Marks Vary in their Effectiveness Example: Pie Chart (Angle & Area) vs Bar Chart (Position on a Common Scale) J. Heer (2010) – Crowdsourcing Graphical Perception: Using Mechanical Turk ……

  13. Perception and Cognition Matter Too! Original Visualization Visualization as seen by a colour-blind person with deuteranopia (colour blindness affects men more often) Colour Blind Simulator: http://www.color-blindness.com/coblis-color-blindness-simulator/

  14. Perception and Cognition Here too! Colour scales also impact interpretation! Perceptual research from Liu et al. (2018) - Somewhere Over the Rainbow: An Empirical Assessment of Quantitative Colormaps
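
One simple way to see why colour scales affect interpretation is to check whether a scale's luminance changes in a single direction. The sketch below is not the method used by Liu et al.; it is a rough check using the standard sRGB relative-luminance formula, and both colour ramps are constructed here for illustration. Relative luminance is only an approximation of perceived lightness.

```python
import colorsys


def relative_luminance(rgb):
    """Relative luminance of an sRGB colour with components in 0..1."""
    def linearize(c):
        # Undo the sRGB transfer curve before weighting the channels.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_monotonic(values):
    """True if the sequence never changes direction."""
    increasing = all(a <= b for a, b in zip(values, values[1:]))
    decreasing = all(a >= b for a, b in zip(values, values[1:]))
    return increasing or decreasing


steps = 9
# A simple rainbow: sweep hue from red towards violet at full saturation.
rainbow = [colorsys.hsv_to_rgb(i / (steps - 1) * 0.75, 1.0, 1.0) for i in range(steps)]
# A single-hue ramp: fixed blue hue, lightness rising from dark to light.
single_hue = [colorsys.hls_to_rgb(0.6, 0.15 + 0.7 * i / (steps - 1), 0.8) for i in range(steps)]

print(is_monotonic([relative_luminance(c) for c in rainbow]))     # False
print(is_monotonic([relative_luminance(c) for c in single_hue]))  # True
```

The rainbow gets brighter towards yellow and darker again towards blue, so equal steps in the data do not look like equal steps in lightness.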

  15. Marks & Channels : ggplot2 example
      ggplot(data = mpg, aes(x = displ, y = cty, colour = class)) + geom_point()
      Channel: Position (x, y) Channel: Colour Mark: Point
      Note: Generally in ggplot2 aesthetics refer to channels and geoms refer to marks, but there are complex geoms that aren’t simple marks but chart types (e.g. geom_density) and there are aesthetics that have little to do with the visual channels directly (e.g. group) https://rpubs.com/hadley/ggplot-intro
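
To make the mark and channel vocabulary concrete outside ggplot2, here is a minimal sketch that draws the same kind of scatterplot as a raw SVG string: each row becomes a point mark, position encodes two quantitative fields, and colour encodes a category. The rows, the palette, and the helper names `scale` and `scatter_svg` are all hypothetical; the column names follow ggplot2's `mpg` dataset.

```python
# Hypothetical rows using mpg-style column names (values are made up).
rows = [
    {"displ": 1.8, "cty": 18, "class": "compact"},
    {"displ": 2.8, "cty": 16, "class": "compact"},
    {"displ": 5.3, "cty": 14, "class": "suv"},
    {"displ": 4.0, "cty": 15, "class": "suv"},
]

# Colour channel: one colour per category.
PALETTE = {"compact": "#1b9e77", "suv": "#d95f02"}


def scale(value, lo, hi, out_lo, out_hi):
    """Linear scale from the data domain [lo, hi] to a pixel range."""
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)


def scatter_svg(rows, width=300, height=200, pad=20):
    """Return an SVG string with one circle (point mark) per row."""
    xs = [r["displ"] for r in rows]
    ys = [r["cty"] for r in rows]
    circles = []
    for r in rows:
        # Position channel: x and y pixel coordinates.
        cx = scale(r["displ"], min(xs), max(xs), pad, width - pad)
        # SVG y grows downward, so the output range is flipped.
        cy = scale(r["cty"], min(ys), max(ys), height - pad, pad)
        circles.append(
            f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="4" fill="{PALETTE[r["class"]]}"/>'
        )
    body = "\n  ".join(circles)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"  {body}\n</svg>"
    )


svg = scatter_svg(rows)
print(svg)
```

Swapping `<circle>` for `<rect>` changes the mark; changing what feeds `fill`, `cx`, or `cy` changes the channel mapping.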

  16. Marks & Channels : Tableau example Marks Channels

  17. Linking Data to Mark and Channels to Make Visualizations Data, Marks & Channels, Visualization

  18. Linking Data to Mark and Channels to Make Visualizations Data to viz Chart Chooser https://www.data-to-viz.com/ https://bit.ly/2P9zLEW

  19. How do people visualize data? Examples from my own research

  20. My research: surveying visualizations in genomic epidemiology http://gevit.net Crisan et al. (2018) “A systematic method for surveying data visualizations and a resulting genomic epidemiology visualization typology: GEViT”, Bioinformatics (Oxford)

  21. How can we help people visualize data? Examples from my own research

  22. My research: simplifying the creation of data visualizations
      # specify individual charts
      phyloTree_chart <- specify_base(chart_type = "phylogenetic tree", data = "tree_dat")
      epicurve <- specify_base(chart_type = "histogram", data = "tab_dat", x = "month")
      map_chart <- specify_base("geographic map", data = "tab_dat", lat = "latitude", long = "longitude")
      # specify a combination
      color_combo <- specify_combination(combo_type = "color_linked", base_charts = c("phyloTree_chart", "map_chart", "epicurve"), link_by = "country")
      # plot the result
      plot(color_combo)

  23. My research: automatic data visualization (Preliminary Result)
      # Analyze different data types automatically
      harmon_obj <- data_harmonization(tab_dat, tree_dat, genomic_dat, all_spatial)
      # Create specifications that compile to minCombinR
      component_specs <- get_spec_list(harmon_obj)
      # plot the result one view at a time
      plot_view(component_specs, view_num = 1)
      [Figure: preliminary result with panels A and B; labels include country (GIN, LBR, SLE), combo_axis_var, longitude, count, case_count, and minID]

  24. Thinking Systematically about Data Visualization Levels: Domain Problem; Data + Task; Visual + Interaction Design Choices; Algorithm 4. Explore if other visualizations have addressed this problem and set of tasks 5. Implement your own solution (part or all of that solution could be a new algorithm)

  25. Thinking Systematically about Data Visualization Levels: Domain Problem*; Data + Task; Visual + Interaction Design Choices; Algorithm 6. Test multiple alternatives (including new ones you develop) with stakeholders 7. Gather qualitative & quantitative evaluation data

  26. Thinking Systematically about Data Visualization
      Phases: Design, Evaluation
      1. Identify a relevant problem that affects you or a group of stakeholders
      2. Ask what data stakeholders use (is it available?)
      3. Ask what stakeholders do with the data [tasks]
      4. Explore if other visualizations have addressed this problem and set of tasks & data
      5. Implement your own solution (vis and/or algorithm)
      6. Test multiple alternatives (including new ones you develop) with stakeholders
      7. Gather qualitative & quantitative evaluation data

  27. What datavis tools are available?

  28. Data Visualization Tools to Get You Started

  29. Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: http://bit.ly/2gRGx1J I am presenting her figures here

  30. Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: http://bit.ly/2gRGx1J Analysis vs Presentation

  31. Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: http://bit.ly/2gRGx1J Extent of Flexibility How easy/hard it is to make data visualizations (including custom/novel visualizations)

  32. Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: http://bit.ly/2gRGx1J Static vs Interactive

  33. Tools & Libraries for data visualization Lisa Charlotte Rost has an excellent blog post about this: http://bit.ly/2gRGx1J “There are no perfect tools, just good tools for people with certain goals” See a detailed table here: http://bit.ly/2DeWPwV

  34. Tools & Libraries for data visualization Another take with commonly used tools : https://bit.ly/2SgrOzS

  35. Don’t forget that pen and paper is an option too! Dear Data Project (Lupi & Posavec)

  36. Datavis tools for (Microbial) Genomics

  37. IGV Browser for all your genomic needs https://software.broadinstitute.org/software/igv/

  38. The classic UCSC genome browser https://genome.ucsc.edu

  39. GenVisR: Human Genomes in R https://academic.oup.com/bioinformatics/article/32/19/3012/2196360

  40. Variant Viewer: Human Genomes http://www.cs.ubc.ca/labs/imager/tr/2013/VariantView/

  41. Island Viewer: Microbial Genomics https://www.pathogenomics.sfu.ca/islandviewer/accession/NZ_CP012358.1/

  42. Microreact: Microbial Genomics https://microreact.org/

  43. GenGIS: Microbial Genomics (Made in Canada!) http://kiwi.cs.dal.ca/GenGIS/Main_Page

  44. Nextstrain: Microbial Genomics https://nextstrain.org/ebola

  45. Wrapping up

  46. DATA VISUALIZATION IS NOT JUST AN ART PROJECT

  47. Key take-aways from this talk § Visualizations of data are useful § Helpful in instances of low numeracy § Can be used in communication and exploration § But... visualization design also matters § Many different alternatives, important to test § It’s possible to think systematically about visualizations § Many disciplines cross cut information visualization research § At the minimum think “Why”, “What”, “How” § Encode data well so that others can decode it later § Data visualization is a research process with open and interesting problems

  48. Additional Resources
      Books to consider: § Interpretable Machine Learning: https://christophm.github.io/interpretable-ml-book/ § Making Data Visual: A Practical Guide to Using Visualization for Insight by Danyel Fisher and Miriah Meyer § Visualization Design and Analysis by Tamara Munzner (more technical)
      Online resources: § Distill Publication : https://distill.pub/ § UBC Infovis Resource Page : http://www.cs.ubc.ca/group/infovis/resources.shtml § UW Interactive Data Lab : https://medium.com/@uwdata § Data stories podcast : http://datastori.es/
      Inspiration: § Information is Beautiful : https://informationisbeautiful.net/ § Visualization WTF (examples of what not to do) : http://viz.wtf/

  49. Data visualization strategies and tools for microbial genomic epidemiology Anamaria Crisan Vanier Canada Scholar & UBC Public Scholar PhD Candidate, Computer Science University of British Columbia @amcrisan acrisan@cs.ubc.ca http://cs.ubc.ca/~acrisan

  50. Part I: Data Visualization Strategies & Tools Part II: A brief (5 min) activity Part III: Data Visualization Research in Practice

  51. How many ways can we visualize these numbers? • In your head, on paper, or on a computer, sketch out as many examples as you can to visualize the following two numbers: 75 37

  52. How many ways can we visualize these numbers? • In your head, on paper, or on a computer, sketch out as many examples as you can to visualize the following two numbers: 75 37 example:
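
As one possible answer to the activity, the two numbers can be encoded as bar length, as a ratio, or as shares of a total. This is only an illustrative sketch; the labels `first` and `second` and the helper `text_bar` are hypothetical, and the slide's own example image is not reproduced here.

```python
values = {"first": 75, "second": 37}


def text_bar(value, max_value, width=40):
    """Encode a number as bar length using characters."""
    filled = round(value / max_value * width)
    return "#" * filled


largest = max(values.values())
for name, v in values.items():
    print(f"{name:>6} | {text_bar(v, largest)} {v}")

# Alternative encodings of the same pair: a ratio and shares of the total.
ratio = values["first"] / values["second"]
total = sum(values.values())
shares = {name: v / total for name, v in values.items()}

print(f"ratio: {ratio:.2f}")
print({name: f"{share:.0%}" for name, share in shares.items()})
```

Each encoding supports a different task: bars make the difference easy to compare, the ratio answers "how many times larger", and shares answer "what fraction of the whole".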
