SLIDE 1

Re-inserting human interaction into cancer genome interpretation

CYDNEY NIELSEN

UNIVERSITY OF BRITISH COLUMBIA
BRITISH COLUMBIA CANCER AGENCY

SLIDE 2

Outline

1 Visualization and its role in scientific discovery
2 Interactive cancer genomics visualization: why now?
3 Building a cancer genomics visualization platform

Flexible integration of views
Dynamic linking between views
Scalable to large data sets

4 Summary

SLIDE 3

1

Visualization and its role in scientific discovery

SLIDE 4

Discovery loop

[Diagram: a cycle linking QUESTIONS, DATA, and INSIGHTS, with steps labeled hypothesis generation, experiments, and interpretation]

SLIDE 5

Discovery loop

[Diagram: the same cycle of QUESTIONS, DATA, and INSIGHTS, with steps labeled experiments and interpretation, plus communication leading to PUBLICATIONS]

SLIDE 6

Discovery loop

[Diagram: the discovery loop (QUESTIONS, DATA, INSIGHTS; experiments, interpretation, communication, PUBLICATIONS), now annotated "computer automation + human expert"]

SLIDE 7

Intelligence Amplifying System > Artificial Intelligence System

That is, a machine and a mind can beat a mind-imitating machine working by itself.

Frederick Brooks

SLIDE 8

Why visualization?

Visualization

Leverages our ability to visually recognize patterns and enhances our ability to reason about data
Can reveal a level of detail that may be missed in summary statistics alone

Anscombe’s quartet

I    x: 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5
     y: 8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68
II   x: 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5
     y: 9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74
III  x: 10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5
     y: 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73
IV   x: 8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8
     y: 6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89

[Figure: scatter plots of data sets I to IV]
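
The claim that summary statistics can hide detail can be checked numerically on the quartet. The following is a minimal sketch, not part of the original slides, that uses only the values listed on this slide and the Python standard library. The names QUARTET, pearson_r, and summarize are illustrative choices.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, copied from the values listed on the slide.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the summary statistics that the four data sets nearly share."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),   # sample variance
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows print nearly the same numbers, even though the scatter plots of the four data sets look very different. That gap is what the plot reveals and the table of statistics does not.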

SLIDE 9

Why visualization?

[Diagram: DATA leading to INSIGHTS through interpretation]

Visualization

Is well suited to questions where the solution is too ill-defined to be automatically computed

SLIDE 10

Why visualization?

www.apple.com

Example: Visual Information-Seeking Mantra

Overview first, zoom and filter, then details-on-demand.

Shneiderman 1996

Visualization

Can be further enhanced with interactivity, which is key to dynamic data exploration

SLIDE 11

Why visualization?

Visualization

Reduces the computational barrier posed by many data analysis workflows

SLIDE 12

2 Interactive cancer genomics visualization: why now?

SLIDE 13

Analogy: Human genome assembly

Computer automation

To reconstruct the human genome sequence from raw sequencing data

Human expert

To finish the genome: close gaps, correct mis-assemblies, improve error probabilities of the consensus bases

Consed | David Gordon and Phil Green

SLIDE 14

Analogy: Human genome assembly

Computer automation Human expert

Consed | David Gordon and Phil Green

SLIDE 15

Analogy: Human genome assembly

Consed | David Gordon and Phil Green

Some manual tasks become automated once they are better characterized (e.g. AutoFinish)

Computational analyses can be interactively focused by the user (e.g. local re-assembly)

Computer automation Human expert

SLIDE 16

Cancer genomics data interpretation

Computer automation

To predict diverse features that differ between tumor and matched-normal sample pairs

[Figure panels: Mutations (A > G), Copy number (deletion), Gene expression, Rearrangements (translocation)]

Human expert

To integrate and interpret these features together with relevant patient metadata

SLIDE 17

Cancer genomics data interpretation

[Figure panels: Mutations (A > G), Copy number (deletion), Gene expression, Rearrangements (translocation)]

Need interactive visualization tools to facilitate the human component and complement the computational one

Computer automation Human expert

SLIDE 18

Genomics visualization

REVIEW: Visualizing multidimensional cancer genomics data
Michael P Schroeder, Abel Gonzalez-Perez and Nuria Lopez-Bigas
Schroeder et al. Genome Medicine 2013, 5:9 http://genomemedicine.com/content/5/1/9

[Figure panels: Matrix heatmaps, Genomic coordinates, Networks; labels include genes, samples, chromosomal coordinates, interactions, omics data, and clinical data]

SLIDE 19

Genomics visualization

SLIDE 20

3

Building a cancer genomics visualization platform

SLIDE 21

Flexible integration of views

SLIDE 22

Integrate multiple data types into one view

Mutations | MutationSeq
Ding et al., Bioinformatics, 2012

Copy Number | Titan
Ha et al., Genome Research, 2014

Example analysis: Examine a mutation in its copy number context

[Figure labels: deletion, mutation]

SLIDE 23

Integrate multiple data types into one view

Mutations | MutationSeq
Ding et al., Bioinformatics, 2012

Copy Number | Titan
Ha et al., Genome Research, 2014

Example analysis: Examine a mutation in its copy number context

[Figure labels: mutations, copy number]

SLIDE 24

Compare data filters on a single data set

MutationSeq predictions

Example analysis: Examine the impact of the MutationSeq probability threshold on the coverage versus allele ratio distribution

SLIDE 25

Explore views of different data types

MutationSeq predictions
Titan copy number predictions

Example analysis: Examine both the mutations and copy number alterations for a given sample

SLIDE 26

Components

Data (d)

sample(s) + data type

View (v)

visual representation

Region Filter

on genomic range

Data Filter

on data parameters
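
The four components can be sketched as a small data model. This is only an illustration under assumptions: the slides do not describe the platform's implementation, so every class name, field name, record layout, and view kind below is hypothetical. Filters are attached to the view here so that two views can apply different filters to the same data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class RegionFilter:
    """Keeps records inside one genomic range (hypothetical record layout)."""
    chrom: str
    start: int
    end: int

    def accepts(self, record: dict) -> bool:
        return record["chrom"] == self.chrom and self.start <= record["pos"] <= self.end


@dataclass(frozen=True)
class DataFilter:
    """Keeps records whose parameter reaches a minimum value."""
    parameter: str
    minimum: float

    def accepts(self, record: dict) -> bool:
        # Records that do not carry the parameter are left untouched.
        return self.parameter not in record or record[self.parameter] >= self.minimum


@dataclass
class Data:
    """Data component: sample(s) + data type."""
    samples: List[str]
    data_type: str
    records: List[dict] = field(default_factory=list)


@dataclass
class View:
    """View component: a visual representation of one or more Data components."""
    kind: str
    data: List[Data]
    region_filters: List[RegionFilter] = field(default_factory=list)
    data_filters: List[DataFilter] = field(default_factory=list)

    def visible(self, d: Data) -> List[dict]:
        filters = [*self.region_filters, *self.data_filters]
        return [r for r in d.records if all(f.accepts(r) for f in filters)]

    def summary(self) -> Dict[str, int]:
        return {d.data_type: len(self.visible(d)) for d in self.data}


# Made-up records, for illustration only.
mutations = Data(["sample_1"], "mutations", [
    {"chrom": "17", "pos": 100, "probability": 0.95},
    {"chrom": "17", "pos": 5000, "probability": 0.60},
    {"chrom": "3", "pos": 200, "probability": 0.99},
])
copy_number = Data(["sample_1"], "copy number", [
    {"chrom": "17", "pos": 150, "state": 1},
])

# One view, two data components: a mutation in its copy number context.
integrated = View("genome track", [mutations, copy_number],
                  region_filters=[RegionFilter("17", 0, 1000)])

# Two views, one data component: compare two data filter settings.
loose = View("scatter", [mutations], data_filters=[DataFilter("probability", 0.5)])
strict = View("scatter", [mutations], data_filters=[DataFilter("probability", 0.9)])

if __name__ == "__main__":
    print(integrated.summary(), loose.summary(), strict.summary())
```

Under this sketch, the three example analyses differ only in how many View and Data objects are created and how they are wired together.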

SLIDE 27

Integrate multiple data types into one view

[Diagram: one view (v) drawing on two data components (d, d): mutations, copy number]

SLIDE 28

Compare data filters on a single data set

[Diagram: two views (v, v) sharing one data component (d): MutationSeq predictions]

SLIDE 29

Explore views of different data types

[Diagram: two views (v, v), each with its own data component (d, d): MutationSeq predictions, Titan copy number predictions]

SLIDE 30

Interface

SLIDE 31

Create

Select a predefined structure

SLIDE 32

Create

Add to an existing structure

SLIDE 33

Define Data

Sample(s)
Query by project name / tumour type / sample id

Single data type
e.g. mutations, copy number, etc.

SLIDE 34

Filter Data

Data filters depend on the previously selected data type

SLIDE 35

Filter Regions

Limit the view to genes or regions of interest

SLIDE 36

Select a View

View types depend on the previously selected data type

SLIDE 37

Adjust View

SLIDE 38

Inspect/Modify

SLIDE 39

Dynamic linking between views

SLIDE 40

Dynamically link views of different data types

[Diagram: two views (v, v), each with its own data component (d, d): MutationSeq predictions, Titan copy number predictions]

SLIDE 41

Dynamically link views of different data types

[Diagram: two views (v, v), each with its own data component (d, d)]

SLIDE 42

Dynamically link views of different data types

[Diagram: two views (v, v), each with its own data component (d, d)]

SLIDE 43

Scalability

SLIDE 44

Research on big data visualization must address two major challenges: perceptual and interactive scalability

Zhicheng Liu, Biye Jiang, Jeffrey Heer. imMens, EuroVis 2013

SLIDE 45

Interactive scalability

How to enable dynamic querying and rendering of millions of data points in real time?

SLIDE 46

Search

Optimized for text search across documents
All fields are indexed for fast retrieval (bag-of-terms approach)
Query performance is a function of the number of query matches, not the total data set size
Scales well as the data set size grows
Appropriate for load-once-read-many workflows
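
The bag-of-terms idea can be illustrated with a toy inverted index. This is a minimal sketch of the general technique, not the internals of any search engine. The class name, field names, and documents are made up for the example.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy index: every field of every document is indexed term by term."""

    def __init__(self):
        self._postings = defaultdict(set)  # (field, term) -> ids of matching documents
        self._docs = {}

    def add(self, doc_id, doc):
        """Index once at load time (load-once-read-many)."""
        self._docs[doc_id] = doc
        for field_name, value in doc.items():
            for term in str(value).lower().split():
                self._postings[(field_name, term)].add(doc_id)

    def search(self, **criteria):
        """Return documents matching every field=term pair (single terms only)."""
        postings = sorted(
            (self._postings.get((f, str(t).lower()), set()) for f, t in criteria.items()),
            key=len,
        )
        if not postings:
            return []
        # The work is a few dictionary lookups plus set intersections over the
        # matching ids, so it scales with the matches, not with the whole index.
        matches = set.intersection(*postings)
        return [self._docs[i] for i in sorted(matches)]


# Made-up documents, for illustration only.
index = InvertedIndex()
index.add(1, {"sample": "S1", "gene": "TP53", "data_type": "mutation"})
index.add(2, {"sample": "S2", "gene": "TP53", "data_type": "copy_number"})
index.add(3, {"sample": "S1", "gene": "KRAS", "data_type": "mutation"})

if __name__ == "__main__":
    print(index.search(gene="TP53", data_type="mutation"))
```

All indexing cost is paid once when a document is added, which is why this layout suits data that is loaded once and queried many times.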

SLIDE 47

Elasticsearch

Chosen for ease of use (built on top of Apache Lucene)
Benefits include:
Built-in support for distributed data (manages shards across nodes)
Extensive caching
Sophisticated query language (DSL)
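
As an illustration of the query DSL, the sketch below builds a request body that combines exact-match and range filters in a bool query. The bool, filter, term, and range constructs are standard Elasticsearch DSL. The field names (sample_id, chrom, pos, probability) and the function name are assumptions, since the slides do not show the platform's index mapping.

```python
import json


def mutation_query(sample_id, chrom, start, end, min_probability):
    """Build an Elasticsearch request body for mutations in one genomic range.

    All field names are hypothetical; a real index would define its own mapping.
    """
    return {
        "query": {
            "bool": {
                # Filter clauses restrict the hits without scoring them.
                "filter": [
                    {"term": {"sample_id": sample_id}},
                    {"term": {"chrom": chrom}},
                    {"range": {"pos": {"gte": start, "lte": end}}},
                    {"range": {"probability": {"gte": min_probability}}},
                ]
            }
        }
    }


EXAMPLE_BODY = mutation_query("sample_1", "17", 7_500_000, 7_700_000, 0.9)

if __name__ == "__main__":
    # The JSON text is what would be sent as the body of a search request.
    print(json.dumps(EXAMPLE_BODY, indent=2))
```

A region filter and a data filter from the interface both map naturally onto clauses of this kind, which is one reason a filter-oriented query language fits an interactive view.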

SLIDE 48

Storing data

community use

SLIDE 51

Perceptual scalability

What to do when we have more data points than pixels?

SLIDE 52

Scalable views

Design views to present meaningful summaries
Sample or smooth data while preserving potentially relevant outliers
Exploit optimized Elasticsearch aggregation functions (e.g. counts in heatmaps computed during search)
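
Computing heatmap counts during search can be sketched with nested histogram aggregations, shown here next to a local equivalent of the same binning. The size, aggs, and histogram keys are standard Elasticsearch request syntax. The field names, intervals, and function names are assumptions made for the example.

```python
from collections import Counter


def heatmap_aggregation(x_field, x_interval, y_field, y_interval):
    """Request body asking the server for 2D bin counts instead of raw hits."""
    return {
        "size": 0,  # return no documents, only the aggregation buckets
        "aggs": {
            "x_bins": {
                "histogram": {"field": x_field, "interval": x_interval},
                "aggs": {
                    "y_bins": {"histogram": {"field": y_field, "interval": y_interval}}
                },
            }
        },
    }


def bin_counts(points, x_interval, y_interval):
    """Local equivalent: count points per (x bin index, y bin index) cell."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // x_interval), int(y // y_interval))] += 1
    return counts


# Hypothetical fields: read coverage on x, allele ratio on y.
BODY = heatmap_aggregation("coverage", 10, "allele_ratio", 0.25)
CELLS = bin_counts([(5, 0.1), (7, 0.2), (15, 0.6), (95, 0.9)], 10, 0.25)

if __name__ == "__main__":
    print(BODY)
    print(dict(CELLS))
```

The view then draws one cell per bucket, so the amount of data sent to the browser depends on the number of bins rather than the number of points. A point far from the rest still occupies its own cell, so it remains visible in the summary.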

SLIDE 53

4 Summary

SLIDE 54

Summary

We need a flexible platform in order to tackle the diverse visualization demands of cancer genomics

Goal of this platform is to facilitate scientific discovery
Uses visualization as a means of supporting data exploration and insight generation
The insights are not necessarily the final product, but rather the beginning of a rigorous scientific process to further test the idea

Key challenges include:
Interactive and perceptual scalability
Interoperability

SLIDE 55

Acknowledgements

Sohrab Shah

Samuel Aparicio
David Huntsman
Marco Marra
Janessa Laskin

Michael Smith Genome Sciences Centre

British Columbia Cancer Agency
Vancouver, Canada

Tom Jin
Kevin Wagner
Daniel Machev
Kelsey Hamer
Ali Bashashati

Shah Lab Development Team