Visualization In Biology, Alexander Lex, CS 171 Guest Lecture (PowerPoint Presentation)

SLIDE 1

Visualization In Biology

Alexander Lex

CS 171 Guest Lecture, 18.04.2013

SLIDE 2

WHAT DO I MEAN: VISUALIZATION IN BIOLOGY?

SLIDE 3

Visualizing the Flight of Bats?

[Bergou 2011]

SLIDE 4

Visualizing Bird Populations?

[Ferreira 2011]

SLIDE 5

Visualizing Fish Swarms?

[Boosherian 2012]

SLIDE 6

Visualizing CT/MRI Data?

[Bruckner 2007]

SLIDE 7

NO! IN THIS LECTURE: MOLECULAR BIOLOGY (MB)

SLIDE 8

Why is MB important?

[Bar chart: Causes of Death in the USA 2011; number of deaths (axis up to 700,000) for Suicide, Influenza and Pneumonia, Kidney-Related, Diabetes, Alzheimer's disease, Accidents, Stroke, Chronic lower…, Cancer, and Heart disease]

[Data from CDC Death and Mortality Report 2011]

SLIDE 9

Why is MB important?

SLIDE 10

Why is MB important?

  • Understanding Fundamentals in Biology
  • Disease Prevention
  • Targeted Diagnosis (Biomarkers)
  • Personalized Medicine
  • Drug Development
  • Targeted Modification of Organisms

SLIDE 11

Why is Vis for MB important?

Biology is experiencing a revolution!

  • Transformation from a wet-lab/experimental science to a computational science
  • The challenge in MB is shifting from data acquisition to data processing & analysis

SLIDE 12

Why is Vis for MB important?

SLIDE 13

What does this mean?

We can now do very large experiments

SLIDE 14

Why is the Analysis Hard?

  • 20,000 protein-coding genes (1.5% of the genome)
  • 3 billion base pairs
  • Gene -> Protein -> Function: each of these steps is influenced by many processes!
  • Very complex interplay of functional aspects
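A back-of-the-envelope check using only the slide's round numbers shows how small the protein-coding share of the genome is (this restates the figures above and is not an independent estimate):

```latex
0.015 \times 3 \times 10^{9}\,\text{bp} = 4.5 \times 10^{7}\,\text{bp},
\qquad
\frac{4.5 \times 10^{7}\,\text{bp}}{2 \times 10^{4}\ \text{genes}} = 2{,}250\,\text{bp per gene}
```

So roughly 45 million of the 3 billion base pairs code for proteins.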

SLIDE 15

Major Areas for Vis in MB

  • Genome Structure
  • Genome Activity – Omics Data
  • Biological Networks
  • Macromolecular Structures
  • Phylogenetics

SLIDE 16

Genome Structure

What is the sequence of bases in a genome?

Common “defects”:
  • Chromosomal alterations
  • Copy-number variation
  • Mutations
  • SNPs

How do these influence the phenotype?

Scale

SLIDE 17

Genome Structure Vis

“Track-based” Visualization
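Track-based genome browsers stack one horizontal track per data type, and all tracks share one genomic coordinate axis. The sketch below shows only that shared axis, assuming a linear mapping from a visible region (start and end in base pairs) to a pixel width; the function names and the example window are hypothetical.

```python
def make_track_scale(region_start: int, region_end: int, width_px: int):
    """Return a function mapping a genomic position (bp) to an x pixel.

    Every track uses the same scale, so features at the same genomic
    position line up vertically across tracks.
    """
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")
    bp_per_px = (region_end - region_start) / width_px

    def to_x(position: int) -> float:
        return (position - region_start) / bp_per_px

    return to_x


def feature_extent(to_x, start: int, end: int) -> tuple[float, float]:
    """Pixel extent (x, width) of a feature such as a gene or a copy-number segment."""
    x0, x1 = to_x(start), to_x(end)
    return x0, x1 - x0


# Hypothetical 1 Mb window drawn 1000 pixels wide: 1 pixel = 1,000 bp.
scale = make_track_scale(1_000_000, 2_000_000, 1000)
print(scale(1_500_000))                             # 500.0
print(feature_extent(scale, 1_200_000, 1_250_000))  # (200.0, 50.0)
```

Zooming or panning only changes `region_start`/`region_end`; each track then redraws with the new scale.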

SLIDE 18

Circular Layouts

[Krzywinski 2009] [Meyer 2009]
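Circular layouts place the chromosomes end to end around a circle, so genomic positions become angles. A minimal sketch of that mapping, assuming chromosomes are drawn clockwise in a given order with a fixed gap between them; the chromosome names and lengths are made up, and this is not the actual code of the cited tools.

```python
import math


def circular_layout(chrom_lengths: dict[str, int], gap_deg: float = 2.0):
    """Return a function mapping (chromosome, position) to an angle in degrees.

    Chromosomes are placed clockwise in dict order; each gets an arc
    proportional to its length, separated by gap_deg degrees.
    """
    total = sum(chrom_lengths.values())
    usable = 360.0 - gap_deg * len(chrom_lengths)
    offsets, angle = {}, 0.0
    for name, length in chrom_lengths.items():
        offsets[name] = angle
        angle += usable * length / total + gap_deg

    def to_angle(chrom: str, position: int) -> float:
        return offsets[chrom] + usable * position / total

    return to_angle


def to_xy(angle_deg: float, radius: float) -> tuple[float, float]:
    """Angle (0 deg = 12 o'clock, clockwise) to screen x/y (y grows downward)."""
    rad = math.radians(angle_deg)
    return radius * math.sin(rad), -radius * math.cos(rad)


# Two hypothetical chromosomes of 100 and 50 units, no gaps.
angle = circular_layout({"chrA": 100, "chrB": 50}, gap_deg=0.0)
print(angle("chrA", 50))  # 120.0
print(angle("chrB", 0))   # 240.0
```

Links between two genomic positions (e.g. rearrangements) can then be drawn as curves between the corresponding `to_xy` points.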

SLIDE 19

Genome Activity

Which genes are active? How active are they?
  • Protein Expression
  • Gene Expression
  • Epigenetics: miRNA expression, methylation

What is the function of a gene?

SLIDE 20

Heat Maps

[Eisen 1999]
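A heat map draws a genes × samples matrix of expression values as colored cells. A minimal sketch of the color mapping only, assuming log2 expression ratios clipped to a symmetric range and the red (up) / black / green (down) convention used by Eisen et al.; the clipping limit and the example matrix are arbitrary, and the row reordering by hierarchical clustering is not shown.

```python
def ratio_to_rgb(log_ratio: float, limit: float = 2.0) -> tuple[int, int, int]:
    """Map a log2 expression ratio to an RGB color.

    Positive values shade from black to red, negative values from black
    to green; values beyond +/-limit are clipped to full intensity.
    """
    t = max(-1.0, min(1.0, log_ratio / limit))  # normalize to [-1, 1]
    intensity = round(255 * abs(t))
    return (intensity, 0, 0) if t >= 0 else (0, intensity, 0)


def heat_map_colors(matrix: list[list[float]], limit: float = 2.0):
    """Color every cell of a genes x samples matrix (rows = genes)."""
    return [[ratio_to_rgb(v, limit) for v in row] for row in matrix]


# Hypothetical 2 genes x 3 samples matrix of log2 ratios.
colors = heat_map_colors([[0.0, 1.0, 4.0], [-2.0, -0.5, 0.0]])
print(colors[0])  # [(0, 0, 0), (128, 0, 0), (255, 0, 0)]
print(colors[1])  # [(0, 255, 0), (0, 64, 0), (0, 0, 0)]
```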

SLIDE 21

New Approaches!

[Meyer 2010]

SLIDE 22

SLIDE 23

Biological Networks

How do proteins and other (bio)chemical products interact?
  • Protein-protein interaction
  • Pathways

What are the processes in a cell?

SLIDE 24

Protein Interaction Networks

[Cytoscape]

SLIDE 25

Pathways

[KEGG]

SLIDE 26

Pathways

[KEGG]

SLIDE 27

Pathways – Free Layouts

[Barsky 2008]

SLIDE 28

CASE STUDIES

http://caleydo.org

SLIDE 29

What is Caleydo?

Software for visualizing biomolecular data:
  • tabular data: numerical & categorical, e.g., mRNA, microRNA, copy number variation, methylation, mutation status, etc.; clinical data
  • pathways: KEGG, WikiPathways

SLIDE 30

Caleydo Core Features

Multi-Dataset Analysis. Want to see…
  • …relationships between multiple datasets?
  • …relationships between tabular and graph data?

SLIDE 31

What is Caleydo?

Software for doing research in visualization

  • developed in an academic setting
  • a platform for trying out radically new visualization ideas

A quest for a compromise between academic prototyping and ready-to-use software

Marc Streit & Alexander Lex

SLIDE 32

Who is Caleydo?

Marc Streit

Johannes Kepler University Linz, AT

Alexander Lex

Harvard University, Cambridge, USA

Christian Partl

Graz University of Technology, AT

Samuel Gratzl

Johannes Kepler University Linz, AT

Nils Gehlenborg

Harvard Medical School, Boston, USA

Dieter Schmalstieg

Graz University of Technology, AT

Hanspeter Pfister

Harvard University, Cambridge, USA

SLIDE 33

CANCER SUBTYPE VISUALIZATION

Case Study

SLIDE 34

Motivation

Cancer types are not homogeneous: they are divided into subtypes with
  • different histology
  • different molecular alterations

Subtypes have serious implications:
  • different treatments for different subtypes
  • prognosis varies between subtypes

SLIDE 35

Cancer Subtype Analysis

Done using many different types of data, for large numbers of patients.

SLIDE 36

Large-scale project to catalogue the genetic mutations responsible for cancer
  • 20 tumor types, 500 patient samples each
  • Extensive molecular profiling for each patient

SLIDE 37

TCGA Data

  • methylation levels
  • mRNA expression
  • copy number status
  • mutation status
  • microRNA expression
  • clinical parameters
  • pathways

SLIDE 38

Subtype Identification

Patients

SLIDE 39

Our goal is to support tumor subtype characterization through integrative visual analysis of cancer genomics data sets.

SLIDE 40

Challenge 1

Manage complex setup of multiple datasets, multiple stratifications & multiple views

Challenge 2

Visualize complex interdependencies between multiple, heterogeneous, large datasets

StratomeX Data-View Integrator

SLIDE 41

Subtype Identification Process

Step 1: Determine candidate subtypes
Step 2: Find supporting evidence

SLIDE 42

Stratification

[Figure: tabular data (patients × genes, proteins, etc.) with the patients stratified into candidate subtypes]

SLIDE 43

Stratification of a Single Dataset

Cluster A1 Cluster A2 Cluster A3
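Clusters such as A1 to A3 can come from clustering patients by their expression profiles. The lecture does not say which algorithm was used; the sketch below uses plain k-means (Lloyd's algorithm) with fixed starting centroids so the result is reproducible, and the six patient profiles are invented.

```python
import math


def kmeans(points: list[list[float]], centroids: list[list[float]], iters: int = 100):
    """Lloyd's k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until assignments stop changing.

    Returns one cluster index per point.
    """
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical expression profiles (2 genes) for six patients.
patients = [[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9], [9.8, 0.1], [10.1, 0.0]]
labels = kmeans(patients, centroids=[patients[0], patients[2], patients[4]])
print(labels)  # [0, 0, 1, 1, 2, 2] -> clusters A1, A1, A2, A2, A3, A3
```

With real data, the number of clusters and the starting centroids are choices that change the resulting stratification.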

SLIDE 44

Stratification

Subtypes are identified by stratifying datasets, e.g., based on
  • an expression pattern
  • a mutation status
  • a copy number alteration
  • a combination of these
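Stratifying by a categorical attribute, or by a combination of attributes, amounts to grouping patients by their attribute values. A minimal sketch with invented patient IDs and hypothetical attribute names (`mutation`, `cn`):

```python
from collections import defaultdict


def stratify(patients: dict[str, dict[str, str]], *attributes: str) -> dict[tuple, list[str]]:
    """Group patient IDs by the values of one or more attributes.

    One attribute yields one stratum per value (e.g. mutated vs. wild type);
    several attributes yield one stratum per combination of values.
    """
    strata = defaultdict(list)
    for patient_id, record in patients.items():
        key = tuple(record[a] for a in attributes)
        strata[key].append(patient_id)
    return dict(strata)


# Hypothetical records: mutation status and copy-number call per patient.
records = {
    "P1": {"mutation": "mutated", "cn": "amplified"},
    "P2": {"mutation": "wild type", "cn": "normal"},
    "P3": {"mutation": "mutated", "cn": "normal"},
    "P4": {"mutation": "mutated", "cn": "amplified"},
}
print(stratify(records, "mutation"))
# {('mutated',): ['P1', 'P3', 'P4'], ('wild type',): ['P2']}
print(stratify(records, "mutation", "cn")[("mutated", "amplified")])  # ['P1', 'P4']
```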

SLIDE 45

Subtype Identification Process

Step 1: Determine candidate subtypes
Step 2: Find supporting evidence

SLIDE 46

Tasks

T1 Evaluate whether stratifications support each other
T2 Review the effect of stratifications
  • on clinical outcomes
  • on pathways
T3 Show expression patterns in subtypes

SLIDE 47

Stratification of Multiple Datasets

Cluster A1 Cluster A2 Cluster A3 B1 B2

T1 Evaluate whether stratifications support each other

Tabular, e.g., mRNA; categorical, e.g., mutation status

SLIDE 48

Example: Titanic Dataset

Multi-dimensional dataset with the attributes
  • Age
  • Name
  • Gender
  • Survival status
  • Class: 1st class, 2nd class, 3rd class, and crew

How many male crew members survived?

http://lib.stat.cmu.edu/S/Harrell/data/descriptions/titanic.html
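Answering "How many male crew members survived?" amounts to filtering on three attributes and counting; mosaic plots and Parallel Sets show all such counts at once. A minimal sketch, assuming records with `class`, `gender`, and `survived` fields; the four records are invented and are not the real Titanic data.

```python
from collections import Counter


def count_where(records: list[dict], conditions: dict) -> int:
    """Count records whose fields match every (field, value) pair in conditions."""
    return sum(all(r[k] == v for k, v in conditions.items()) for r in records)


def cross_tab(records: list[dict], *fields: str) -> Counter:
    """Count each combination of values of the given fields.

    Counts like these determine the tile areas of a mosaic plot and the
    ribbon widths between two axes in Parallel Sets.
    """
    return Counter(tuple(r[f] for f in fields) for r in records)


# Invented records for illustration (not the actual Titanic data).
passengers = [
    {"class": "crew", "gender": "male", "survived": True},
    {"class": "crew", "gender": "male", "survived": False},
    {"class": "1st", "gender": "female", "survived": True},
    {"class": "crew", "gender": "female", "survived": True},
]
print(count_where(passengers, {"class": "crew", "gender": "male", "survived": True}))  # 1
print(cross_tab(passengers, "class", "gender")[("crew", "male")])  # 2
```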

SLIDE 49

Mosaic Plot Matrix

[Friendly 1999]

How many male crew members survived?

SLIDE 50

Parallel Sets

[Kosara 2006]

How many male crew members survived?

SLIDE 51

Stratification of Multiple Datasets

Clusters A1, A2, A3 (tabular, e.g., mRNA) and B1, B2 (categorical, e.g., mutation status)

T1 Evaluate whether stratifications support each other
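To judge whether two stratifications support each other, one can count how many patients each pair of blocks shares; these counts correspond to the bands drawn between neighboring columns. A minimal sketch, with the Jaccard index added as one possible agreement score; the stratifications and patient IDs are hypothetical, and the score is an illustrative choice, not necessarily what the tool computes.

```python
def band_sizes(strat_a: dict[str, set[str]], strat_b: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Number of patients shared by every pair of blocks from two stratifications."""
    return {
        (a, b): len(pa & pb)
        for a, pa in strat_a.items()
        for b, pb in strat_b.items()
        if pa & pb  # omit empty bands
    }


def jaccard(x: set[str], y: set[str]) -> float:
    """Overlap relative to union: 1.0 means identical patient sets."""
    return len(x & y) / len(x | y) if x | y else 0.0


# Hypothetical clusters from mRNA data (A) and groups from mutation status (B).
clusters = {"A1": {"P1", "P2", "P3"}, "A2": {"P4", "P5"}, "A3": {"P6"}}
mutation = {"B1": {"P1", "P2", "P3", "P6"}, "B2": {"P4", "P5"}}

print(band_sizes(clusters, mutation))
# {('A1', 'B1'): 3, ('A2', 'B2'): 2, ('A3', 'B1'): 1}
print(jaccard(clusters["A2"], mutation["B2"]))  # 1.0
```

Here A2 and B2 contain exactly the same patients, which is the kind of mutual support T1 looks for.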

SLIDE 52

Stratification of Multiple Datasets

Clusters A1, A2, A3 (tabular, e.g., mRNA) and B1, B2 (categorical, e.g., mutation status), plus dependent data, e.g., clinical data (Dep. C1, Dep. C2)

T2 Review effect of stratification
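For T2, a common way to review the effect of a stratification on clinical outcome is to compare survival curves between strata. A minimal sketch of the Kaplan-Meier estimator for one stratum, assuming each patient has a follow-up time and a flag saying whether death was observed (otherwise the patient is censored); the times are invented, and the lecture does not state which estimator its survival views use.

```python
def kaplan_meier(times: list[float], events: list[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times[i] is the follow-up time of patient i; events[i] is True if death
    was observed at that time and False if the patient was censored.
    Returns (time, survival probability) at each time with an observed death.
    """
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up (months) for one stratum; False = censored.
curve = kaplan_meier([2, 4, 4, 6, 8], [True, True, False, False, True])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (4, 0.6), (8, 0.0)]
```

Computing one curve per stratum (e.g. per cluster A1 to A3) and plotting them together shows whether the strata differ in outcome.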

SLIDE 53

Band = subset of patients; rows = patients; columns = genes

SLIDE 54

Patients stratified by copy number; patients stratified by clustering

SLIDE 55

Tabular, Categorical, Dependent

T3 Show expression patterns in subtypes

SLIDE 56

[Figure: columns for Survival, EGFR Copy Number Status, mRNA Levels, Glioma Pathway, and Survival]

SLIDE 57

Live-Demo!

http://stratomex.caleydo.org

SLIDE 58

PATHWAY & EXPERIMENTAL DATA

Case Study

SLIDE 59

Experimental Data and Pathways

Pathways represent consensus knowledge for a healthy organism or a specific disease. They cannot account for the variation found in real-world data: branches can be (in)activated due to mutation, changed gene expression, modulation due to drug treatment, etc.

SLIDE 60

Why use Visualization?

Efficient communication of information

[Figure: node values A: -3.4, B: 2.8, C: 3.1, D: -3, E: 0.5, F: 0.3, compared with the same values shown on a pathway graph of nodes A to F]

SLIDE 61

Experimental Data and Pathways

[Lindroos 2002] [KEGG]

SLIDE 62

REQUIREMENTS ANALYSIS

SLIDE 63

What to Consider when Visualizing Experimental Data and Pathways

Five Requirements

An ideal visualization technique addresses all five requirements; this lecture covers three.

SLIDE 64

R I: Data Scale

  • Large number of experiments: large datasets have more than 500 experiments
  • Multiple groups/conditions
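With hundreds of experiments, a single pathway node cannot show every value, so one option is to summarize per group or condition before mapping onto the node. A minimal sketch, assuming one gene's values are given per experiment together with a group label per experiment; the group names and values are hypothetical, and mean plus standard deviation is just one possible summary.

```python
from statistics import mean, pstdev


def summarize_by_group(values: list[float], groups: list[str]) -> dict[str, tuple[float, float]]:
    """Collapse one gene's values across many experiments into (mean, std) per group."""
    by_group: dict[str, list[float]] = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}


# Hypothetical expression of one gene in six experiments from two conditions.
summary = summarize_by_group(
    [1.0, 3.0, 2.0, 7.0, 9.0, 8.0],
    ["control", "control", "control", "treated", "treated", "treated"],
)
print(summary["control"][0], summary["treated"][0])  # 2.0 8.0
print(round(summary["treated"][1], 3))               # 0.816
```

Summarizing trades detail for space: the within-group spread survives only as a single number.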

SLIDE 65

R II: Data Heterogeneity

Different types of data, e.g.,
  • mRNA expression: numerical
  • mutation status: categorical
  • copy number variation: ordered categorical
  • metabolite concentration: numerical

These require different visualization techniques.
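Because each data type calls for its own encoding, a tool has to choose a visual mapping per type. A minimal sketch of such a dispatch, assuming numerical values map to a white-to-blue ramp, ordered categorical values to ordered gray levels, and unordered categorical values to distinct hues; the palettes and the copy-number level names are illustrative choices, not Caleydo's.

```python
from enum import Enum


class DataType(Enum):
    NUMERICAL = "numerical"
    CATEGORICAL = "categorical"
    ORDERED_CATEGORICAL = "ordered categorical"


DISTINCT_HUES = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]  # no implied order


def encode(value, dtype: DataType, *, lo=0.0, hi=1.0, categories=()):
    """Return a hex color for one value, chosen according to its data type."""
    if dtype is DataType.NUMERICAL:
        # Continuous ramp: normalize to [0, 1]; larger values are bluer.
        t = max(0.0, min(1.0, (value - lo) / (hi - lo)))
        level = round(255 * (1 - t))
        return f"#{level:02x}{level:02x}ff"
    index = list(categories).index(value)
    if dtype is DataType.ORDERED_CATEGORICAL:
        # Gray levels from light to dark preserve the order of the categories.
        level = round(255 * (1 - index / max(1, len(categories) - 1)))
        return f"#{level:02x}{level:02x}{level:02x}"
    return DISTINCT_HUES[index % len(DISTINCT_HUES)]


cn_levels = ("deletion", "normal", "amplification")  # hypothetical ordering
print(encode(0.5, DataType.NUMERICAL))                                       # #8080ff
print(encode("normal", DataType.ORDERED_CATEGORICAL, categories=cn_levels))  # #808080
print(encode("mutated", DataType.CATEGORICAL, categories=("wild type", "mutated")))  # #d95f02
```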

SLIDE 66

R V: Supporting Multiple Tasks

Two central tasks:

  • Explore the topology of the pathway
  • Explore the attributes of the nodes (experimental data)

Need to support both!

SLIDE 67

VISUALIZATION TECHNIQUES

Alexander Lex | Harvard University

SLIDE 68

Visualization Approaches

On-Node Mapping [Lindroos 2002], Separate Linked Views, Small Multiples, Layout Adaption [Junker 2006], Linearization [Meyer 2010], Path-Extraction (focus: On-Node Mapping)

SLIDE 69

On-Node Mapping

[Lindroos 2002]

SLIDE 70

On-Node Mapping

[Westenberg 2008] [Gehlenborg 2010]

SLIDE 71

On-Node & Tooltip

[Streit 2008]

SLIDE 72

Visualization Approaches

On-Node Mapping [Lindroos 2002], Separate Linked Views, Small Multiples, Layout Adaption [Junker 2006], Linearization [Meyer 2010], Path-Extraction (focus: Separate Linked Views)

SLIDE 73

Separate Linked Views

[Shannon 2008]

SLIDE 74

Separate Linked Views

SLIDE 75

Separate Linked Views

SLIDE 76

Visualization Approaches

On-Node Mapping [Lindroos 2002], Separate Linked Views, Small Multiples, Layout Adaption [Junker 2006], Linearization [Meyer 2010], Path-Extraction (focus: Small Multiples)

SLIDE 77

Small Multiples

SLIDE 78

Small Multiples

[Barsky 2008] Video!

SLIDE 79

Visualization Approaches

On-Node Mapping [Lindroos 2002], Separate Linked Views, Small Multiples, Layout Adaption [Junker 2006], Linearization [Meyer 2010], Path-Extraction (focus: Layout Adaption)

SLIDE 80

Layout Adaption

“Moderate” layout adaption: make space for on-node encoding

[Gehlenborg 2010] [Junker 2006]

SLIDE 81

Layout Adaption

“Extreme” layout adaption: encode information through position

[Bezerianos 2010]

Video: http://www.youtube.com/watch?v=NLiHw5B0Mco

SLIDE 82

Visualization Approaches

On-Node Mapping [Lindroos 2002], Separate Linked Views, Small Multiples, Layout Adaption [Junker 2006], Linearization [Meyer 2010], Path-Extraction (focus: Linearization)

SLIDE 83

Linearization – Pathline

[Meyer 2010]

Combination of layout adaption and separate linked views

SLIDE 84

Linearization

[Meyer 2010]

SLIDE 85

Visualization Approaches

On-Node Mapping [Lindroos 2002], Separate Linked Views, Small Multiples, Layout Adaption [Junker 2006], Linearization [Meyer 2010], Path-Extraction (focus: Path-Extraction)

SLIDE 86

CALEYDO ENROUTE

SLIDE 87

Concept

[Figure: two pathway views (nodes A to F) alongside the enRoute view, which lists the path nodes B, C, F, A, D, E with columns for Group 1 Dataset 1, Group 2 Dataset 1, and Group 1 Dataset 2]

SLIDE 88

Pathway View

  • On-node mapping
  • Path highlighting with Bubble Sets [Collins 2009]
  • Selection: start and end node; iterative adding of nodes

[Figure: example pathway with node IGF-1; color legend from low to high]
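Before a path can be shown linearized, the nodes between a selected start and end node have to be determined. A minimal sketch, assuming the pathway is a directed graph stored as an adjacency list and using breadth-first search to find a path with the fewest edges; the tool also lets the user add nodes iteratively, which this sketch does not model, and the edges among nodes A to F are invented.

```python
from collections import deque
from typing import Optional


def extract_path(graph: dict[str, list[str]], start: str, end: str) -> Optional[list[str]]:
    """Shortest path from start to end (fewest edges) via breadth-first search.

    graph maps each node to the nodes its outgoing edges point to.
    Returns the node list, or None if end is unreachable.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:  # walk back along the predecessor links
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Invented directed pathway with nodes A to F.
pathway = {"B": ["C"], "C": ["F", "D"], "F": ["A"], "A": ["D", "E"], "D": ["E"], "E": []}
print(extract_path(pathway, "B", "E"))  # ['B', 'C', 'D', 'E']
print(extract_path(pathway, "E", "B"))  # None
```

The returned node order is what a linearized view would use as its row order.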

SLIDE 89

enRoute View

Path Representation

[Figure: path nodes B, C, F, A, D, E with columns for Group 1 Dataset 1, Group 2 Dataset 1, and Group 1 Dataset 2]

SLIDE 90

enRoute View

Experimental Data Representation

SLIDE 91

Experimental Data Representation

  • Gene Expression Data (numerical)
  • Copy Number Data (ordered categorical)
  • Mutation Data

SLIDE 92

enRoute View – Putting It All Together

SLIDE 93

Live-Demo!

http://enroute.caleydo.org

SLIDE 94

CONCLUSION

SLIDE 95

Impact?

  • Long Tail Science
  • Exciting and Hard Problems
  • Smart and Engaged Collaborators
  • Potential for High Impact!

SLIDE 96

“However, despite understandable celebration of these achievements, sober reflection reveals many challenges ahead.”

Mark I. McCarthy et al., on our understanding of the genetic basis of common phenotypes

SLIDE 97

Visualization in Biology

Alexander Lex, Harvard University alex@seas.harvard.edu http://alexander-lex.com

?

Christian Partl Marc Streit Nils Gehlenborg Samuel Gratzl Dieter Schmalstieg Hanspeter Pfister
