SLIDE 1

Visual Analytics for Genomics

Cydney Nielsen

BC Cancer Agency, Vancouver, BC, Canada

SLIDE 2

Outline

Part 1 Introduction to Genomics
Part 2 Visual Design for Genomics
Part 3 Hands-On Design Exercise

SLIDE 3

Part 1

Introduction to Genomics

SLIDE 4

genome: the complete genetic material of a cell

Genomics Workflow

Part 1. Intro to Genomics

SLIDE 5

Sequencing Experiment

SLIDE 6

Sequencing Experiment

SLIDE 7

Sequencing Experiment

Base pairing: G pairs with C, T pairs with A.
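The pairing rule on this slide is what lets one DNA strand determine the other. A minimal Python sketch, assuming plain uppercase A/C/G/T strings with no ambiguity codes such as N; the function names are illustrative and not taken from any particular library:

```python
# Watson-Crick pairing from the slide: G pairs with C, T pairs with A.
PAIR = {"G": "C", "C": "G", "T": "A", "A": "T"}


def complement(seq: str) -> str:
    """Base-by-base complement, keeping the same left-to-right order."""
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """The partner strand, read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(complement("GATTACA"))          # CTAATGT
    print(reverse_complement("GATTACA"))  # TGTAATC
```

Reversing after complementing matters because the two strands run in opposite directions; the reverse complement is the sequence a sequencer would report if it read the other strand.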

SLIDE 8

Genomics Workflow

sample → data → insight

SLIDE 9

Genomics Workflow

sample → experiment (sequencing technology) → data → insight

SLIDE 10

Genomics Workflow

sample → experiment (sequencing technology) → data → analysis (visualization + computation) → insight

SLIDE 11

Genomics Workflow

sample → experiment (sequencing technology) → data → analysis (visualization + computation) → insight

SLIDE 12

Genomics Workflow

Molecular biology: sample → experiment (sequencing technology) → data

SLIDE 13

Genomics Workflow

Computational biology / bioinformatics, visual analytics: data → analysis (visualization + computation) → insight

SLIDE 14

Genomics Workflow

Molecular biology: sample → experiment (sequencing technology) → data

SLIDE 15

Sequencing Experiment

AAAAAAAAAAAAAAGATGT
CACCAGTACACCGATA
ACCAGATGGATTAGATGTA
TACACCGATACACCAGA
AAAGATGTATACCACCAG

Sequencing machine: millions of short sequences ("reads"), e.g. 75 nt each, compared to >3 billion nt in the human genome.
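Why "millions" of reads? A back-of-the-envelope sketch using only the two numbers on the slide: covering >3 billion nt even once with 75-nt reads takes at least 40 million reads. The 30x depth in the second call is an illustrative assumption, not a figure from the talk, and the formula ignores uneven coverage, duplicate reads and unmappable regions:

```python
import math

GENOME_NT = 3_000_000_000  # lower bound quoted on the slide (">3 billion nt")
READ_NT = 75               # example read length quoted on the slide


def reads_for_coverage(genome_nt: int, read_nt: int, depth: float = 1.0) -> int:
    """Reads needed so that total sequenced bases = depth x genome size.

    Simple bases-in / bases-out estimate; real coverage is never uniform.
    """
    return math.ceil(depth * genome_nt / read_nt)


if __name__ == "__main__":
    print(reads_for_coverage(GENOME_NT, READ_NT))        # 40000000
    print(reads_for_coverage(GENOME_NT, READ_NT, 30.0))  # 1200000000 (hypothetical 30x)
```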

SLIDE 16

Sequencing Experiment

~$5,000 in 2001
~10¢ in 2011
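The two price points imply a steep, roughly exponential decline. A quick sketch, treating the rounded slide figures as exact and assuming a constant rate of decline between 2001 and 2011 (neither assumption comes from the talk):

```python
import math

COST_2001 = 5_000.00  # dollars, from the slide (~$5,000 in 2001)
COST_2011 = 0.10      # dollars, from the slide (~10 cents in 2011)
YEARS = 2011 - 2001

fold_drop = COST_2001 / COST_2011              # how many times cheaper
halvings = math.log2(fold_drop)                # number of price halvings in 10 years
months_per_halving = 12 * YEARS / halvings     # assumes a steady exponential decline

if __name__ == "__main__":
    print(f"{fold_drop:,.0f}-fold cheaper")                           # 50,000-fold cheaper
    print(f"cost halved about every {months_per_halving:.1f} months")  # about every 7.7 months
```

Under these assumptions the price halved roughly every eight months, which is why data volume, not cost, becomes the bottleneck the rest of the talk addresses.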

SLIDE 17

Genomics Workflow

Computational biology / bioinformatics, visual analytics: data → analysis (visualization + computation) → insight

SLIDE 18

Sequencing Experiments

Reads: AGCTTCAGATGGACAGATAA GGCATACAGACTTAGACATA CCAGACAAGACAGACACAGTA TACAAGACATAAGCAATACAGA CCAGACAAGACAGACACAGTA

De novo assembly: reads → Genome Assembly

SLIDE 19

Sequencing Experiments

Reads (the same set in each panel): AGCTTCAGATGGACAGATAA GGCATACAGACTTAGACATA CCAGACAAGACAGACACAGTA TACAAGACATAAGCAATACAGA CCAGACAAGACAGACACAGTA

De novo assembly: reads → Genome Assembly
Re-sequencing: reads → Reference Genome
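In re-sequencing, the reads are placed onto a known reference instead of being assembled from scratch. A deliberately simple sketch of that idea using exact string matching; the toy reference is a hypothetical sequence built from two of the slide's reads. Real read mappers (e.g. BWA or Bowtie) use an index and tolerate mismatches and gaps, which this does not:

```python
def map_reads(reference: str, reads: list[str]) -> dict[str, list[int]]:
    """Exact-match read mapping: every 0-based position where each read occurs."""
    hits = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:  # keep searching past each hit to catch repeats
            positions.append(start)
            start = reference.find(read, start + 1)
        hits[read] = positions
    return hits


# Hypothetical toy reference built from two of the slide's reads.
REFERENCE = "AGCTTCAGATGGACAGATAA" + "GGCATACAGACTTAGACATA"

if __name__ == "__main__":
    reads = ["AGCTTCAGATGGACAGATAA", "GGCATACAGACTTAGACATA", "CCAGACAAGACAGACACAGTA"]
    for read, positions in map_reads(REFERENCE, reads).items():
        print(read, positions)  # [0], [20], then [] for the read absent from the toy reference
```

A read with an empty hit list is exactly what a variant or a missing reference region would look like in this simplified view.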

SLIDE 20

Sequencing Experiments

Reads (the same set in each panel): AGCTTCAGATGGACAGATAA GGCATACAGACTTAGACATA CCAGACAAGACAGACACAGTA TACAAGACATAAGCAATACAGA CCAGACAAGACAGACACAGTA

De novo assembly: reads → Genome Assembly
Re-sequencing: reads → Reference Genome
Enrichment: reads → Reference Genome
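In an enrichment experiment, reads are expected to pile up in selected regions of the reference. One simple check, sketched here under the assumption that the reads are already mapped and the target regions are known in advance (as in targeted capture; the slide does not name a protocol), is the fraction of reads that land on a target. All numbers in the example are hypothetical:

```python
def on_target_fraction(read_starts: list[int], read_len: int,
                       targets: list[tuple[int, int]]) -> float:
    """Fraction of mapped reads overlapping at least one target interval.

    Coordinates are 0-based; each read spans [start, start + read_len) and
    targets are half-open (start, end) intervals on the same reference.
    """
    if not read_starts:
        return 0.0
    on_target = 0
    for start in read_starts:
        end = start + read_len
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < t_end and t_start < end for t_start, t_end in targets):
            on_target += 1
    return on_target / len(read_starts)


if __name__ == "__main__":
    # Hypothetical: two target regions and four mapped 20-nt reads.
    targets = [(0, 50), (200, 300)]
    print(on_target_fraction([5, 100, 210, 950], 20, targets))  # 0.5
```

The linear scan over targets is fine for a handful of regions; with thousands of targets an interval tree or sorted sweep would be the usual choice.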

SLIDE 21

Sequencing Experiments

  • What sequence variations appear in cancer patients, but not in unaffected individuals?
  • Are these variations predictive of survival outcome?
  • Are these variations causal for the disease (driver mutations) or not?
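At its simplest, the first question is a set difference between variant calls. A minimal sketch, assuming variants have already been called and normalized to (chromosome, position, ref, alt) tuples; the function name and the example calls are made up. The other two questions need survival data and functional evidence, which this does not address:

```python
from collections import Counter

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def cancer_only_variants(patients: list[set[Variant]],
                         controls: list[set[Variant]],
                         min_patients: int = 1) -> dict[Variant, int]:
    """Variants seen in at least `min_patients` patients and in no control.

    Returns each such variant with the number of patients carrying it.
    """
    in_controls = set().union(*controls)
    counts = Counter(v for calls in patients for v in calls if v not in in_controls)
    return {v: n for v, n in counts.items() if n >= min_patients}


if __name__ == "__main__":
    # Made-up calls for illustration only.
    patients = [
        {("chr1", 1000, "C", "T"), ("chr2", 500, "A", "G")},
        {("chr1", 1000, "C", "T")},
    ]
    controls = [{("chr2", 500, "A", "G")}]
    print(cancer_only_variants(patients, controls))  # {('chr1', 1000, 'C', 'T'): 2}
```

Keeping the per-variant patient count, rather than a bare set, gives a first recurrence signal, which is one (weak) hint toward the driver question.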

SLIDE 22

Part 1 - Summary

  • 1. Large and ever-increasing volume of sequencing data
  • 2. Improved analysis techniques are essential for biologists and clinicians to make the most of these data
  • 3. Great potential for visual analytics to facilitate insight and understanding

SLIDE 23

Part 2

Visual Design for Genomics

SLIDE 24

Challenge 1: Large number of samples for comparison

Part 2. Visual Design for Genomics

SLIDE 25

Challenge 1: Large number of samples for comparison

“To systematically characterize the genomic changes in hundreds of tumors… and thousands of samples over the next five years”
The Cancer Genome Atlas, www.cancergenome.nih.gov

SLIDE 26

Genome Browsers

Stacked data tracks along a common genome x-axis
[Figure axes: genome coordinate (x); data samples (stacked tracks)]
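Every track in a genome browser shares the same x-mapping, which is what makes samples comparable by eye. A minimal sketch of that shared axis plus a simple vertical stacking rule; the class name, track height and gap are illustrative assumptions, not taken from any particular browser:

```python
from dataclasses import dataclass


@dataclass
class GenomeAxis:
    """Linear mapping from a genomic window [start, end) to pixels [0, width)."""
    start: int  # first base shown (0-based)
    end: int    # one past the last base shown
    width: int  # plot width in pixels

    def to_x(self, pos: int) -> float:
        return (pos - self.start) * self.width / (self.end - self.start)


def track_y(index: int, track_height: int = 20, gap: int = 4) -> int:
    """Top edge (in pixels) of the index-th stacked sample track."""
    return index * (track_height + gap)


if __name__ == "__main__":
    axis = GenomeAxis(start=1_000_000, end=2_000_000, width=1000)
    print(axis.to_x(1_500_000))  # 500.0: same x for this base in every track
    print(track_y(3))            # 72: fourth sample's track
```

Because only `track_y` depends on the sample, adding samples grows the figure vertically, which is exactly where Challenge 1 starts to bite.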

SLIDE 27

Genome Browsers

[Screenshot: UCSC Cancer Genomics Heatmaps. Glioblastoma Copy Number Abnormality, Agilent 244A array (n=200), with tumor vs. normal and gender annotation tracks; genome coordinate on one axis, data samples on the other.]

Zhu et al., Nature Methods, 2009

SLIDE 28

Challenge 1: Large number of samples for comparison

Critically consider what you need to display.

e.g. replace primary data with a biologically meaningful summary, such as significant changes between samples
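For copy-number data such as the glioblastoma heatmap on slide 27, one way to act on this advice is to collapse hundreds of sample rows into a single per-bin frequency profile. A minimal sketch; the function name is hypothetical, and the ±0.3 log2 thresholds are placeholders (in practice "significant" would come from a segmentation and calling step, not a fixed cutoff):

```python
def alteration_frequency(log2_ratios: list[list[float]], gain: float = 0.3,
                         loss: float = -0.3) -> list[tuple[float, float]]:
    """Collapse a samples x bins matrix into per-bin (fraction gained, fraction lost)."""
    n_samples = len(log2_ratios)
    n_bins = len(log2_ratios[0])
    summary = []
    for b in range(n_bins):
        column = [sample[b] for sample in log2_ratios]
        gained = sum(v >= gain for v in column) / n_samples
        lost = sum(v <= loss for v in column) / n_samples
        summary.append((gained, lost))
    return summary


if __name__ == "__main__":
    # Made-up log2 copy-number ratios: 3 samples x 4 genomic bins.
    samples = [
        [0.0, 0.5, -0.6, 0.1],
        [0.1, 0.4, -0.5, 0.0],
        [0.0, -0.1, -0.4, 0.6],
    ]
    for b, (gained, lost) in enumerate(alteration_frequency(samples)):
        print(f"bin {b}: gained in {gained:.0%}, lost in {lost:.0%}")
```

Two short tracks (gain frequency and loss frequency) can then replace the full heatmap, at the cost of hiding which samples carry each change.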

SLIDE 29

Challenge 2: Genomic features are small and sparse

SLIDE 30

Genome Browsers

LOCAL VIEW

SLIDE 31

Genome Browsers

LOCAL VIEW

Human chr1: 1 pt corresponds to 480 kb, which is larger than 98% of all human genes.
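The slide's scale follows from simple division. Assuming the GRCh37/hg19 length of chr1 (249,250,621 bp, not given on the slide), a 480 kb-per-point scale implies a view roughly 519 points wide, and a hypothetical 10 kb gene gets about 1/48 of a point, so it cannot be drawn at all without zooming:

```python
# ASSUMPTION: chr1 length from the GRCh37/hg19 assembly; the slide gives only the scale.
CHR1_BP = 249_250_621
BP_PER_POINT = 480_000  # scale quoted on the slide


def width_in_points(feature_bp: int, bp_per_point: float = BP_PER_POINT) -> float:
    """How many points (pixels) a feature of the given length occupies."""
    return feature_bp / bp_per_point


if __name__ == "__main__":
    print(round(width_in_points(CHR1_BP)))  # 519: the whole chromosome
    print(width_in_points(10_000))          # hypothetical 10 kb gene: well under one point
```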

SLIDE 32

Hilbert Curve

GLOBAL VIEW

[Figure, panels a and b: Hilbert-curve layouts of genomic data, including chromatin states along chromosome 3L with labelled regions (pericentromeric heterochromatin, cluster of small expressed genes, PcG domains, heterochromatin-like domain, open chromatin domain) and a legend numbered 1 to 9.]

Kharchenko et al., Nature, 2011; Anders, Bioinformatics, 2009
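The Hilbert curve folds the one-dimensional genome into a square while keeping nearby bases in nearby cells, so an entire chromosome fits in one global view. A minimal Python sketch of the standard iterative distance-to-(x, y) conversion (the well-known textbook algorithm, not code from the cited papers), plus a hypothetical helper that bins base positions onto the curve:

```python
def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Cell (x, y) visited at step d of a Hilbert curve on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip this sub-square so the pieces join end to end
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def genome_to_cell(pos: int, genome_len: int, order: int) -> tuple[int, int]:
    """Hypothetical helper: bin a 0-based base position (pos < genome_len) onto the curve."""
    n_cells = 1 << (2 * order)
    return hilbert_d2xy(order, pos * n_cells // genome_len)


if __name__ == "__main__":
    print([hilbert_d2xy(1, d) for d in range(4)])  # [(0, 0), (0, 1), (1, 1), (1, 0)]
    # A hypothetical 10 Mb chromosome drawn on a 4 x 4 grid.
    print(genome_to_cell(2_500_000, 10_000_000, order=2))
```

Consecutive steps always land in adjacent cells, which is the locality property that makes the layout readable; the trade-off is that some genomically distant bins also end up side by side.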

SLIDE 33

Challenge 2: Genomic features are small and sparse

Connect overview and detail.

SLIDE 34

Challenge 3: Genomic features involve non-adjacent positions

SLIDE 35

Challenge 3: Structural rearrangements

[Figure, panels a to e: reference and variant arrangements of segments J, K, Jʹ and Kʹ]

SLIDE 36

Challenge 3: Structural rearrangements

[Figure, panels a to e: reference and variant arrangements of segments J, K, Jʹ and Kʹ]

SLIDE 37

Challenge 3: Structural rearrangements

Circos, Martin Krzywinski

SLIDE 38

Challenge 3: Structural rearrangements

[Figure, panels a to e: reference and variant arrangements of segments J, K, Jʹ and Kʹ]

SLIDE 39

Challenge 3: Structural rearrangements

VISTA-Dot

SLIDE 40

Challenge 3

All these representations use a genomic coordinate system, which emphasizes base-pair distance between points.

Is this the best use of positional information?

SLIDE 41

Challenge 3

Figure: M. Krzywinski, adapted from Mackinlay J (1986) ACM Trans Graph 5: 110-141.

SLIDE 42

Challenge 3: Structural rearrangements

[Figure, panels a to e: reference and variant arrangements of segments J, K, Jʹ and Kʹ]

SLIDE 43

Challenge 3: Genomic features involve non-adjacent positions

Encode important information in position.
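One way to act on this principle, sketched under the assumption that the order of breakpoints matters more than the exact distance between them: place breakpoints by rank instead of by base-pair coordinate, so events a few hundred bases apart no longer collapse onto the same pixel. The function names and example coordinates are hypothetical:

```python
def linear_layout(breakpoints: list[int], start: int, end: int,
                  width: float) -> dict[int, float]:
    """Genome-coordinate layout: x is proportional to base-pair position."""
    return {bp: (bp - start) * width / (end - start) for bp in breakpoints}


def rank_layout(breakpoints: list[int], width: float) -> dict[int, float]:
    """Order-based layout: breakpoints are evenly spaced by rank, not by distance."""
    ordered = sorted(set(breakpoints))
    if len(ordered) == 1:
        return {ordered[0]: width / 2}
    step = width / (len(ordered) - 1)
    return {bp: i * step for i, bp in enumerate(ordered)}


if __name__ == "__main__":
    # Hypothetical breakpoints on a 50 Mb chromosome drawn 500 px wide.
    bps = [1_000_000, 1_000_500, 1_001_000, 45_000_000]
    print(linear_layout(bps, 0, 50_000_000, 500))  # first three fall within 0.01 px
    print(rank_layout(bps, 500))                   # evenly spaced: 0, ~167, ~333, 500
```

The rank layout gives up distance information entirely, so a design that uses it would normally report the base-pair coordinates some other way (labels, tooltips or a linked genome-coordinate view).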

SLIDE 44

Challenge 4: Large number of data types

SLIDE 45

Genomic rearrangement in cancer

[Figure A: SNU-C1 (colorectal), chromosome 15. Copy number and allelic ratio plotted against genomic location (Mb, roughly 15 to 95), with rearrangements grouped by orientation. Non-inverted orientation: deletion-type, tandem dup-type. Inverted orientation: head-to-head inverted, tail-to-tail inverted.]

Stephens et al., Cell, 2011
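The figure groups rearrangements into four orientation classes. A minimal sketch of how such a label could be derived from one discordant read pair, assuming a standard forward-reverse paired-end library and one common strand convention. The slide does not say how Stephens et al. assigned classes, and the head-to-head vs. tail-to-tail naming in particular varies between tools, so treat the mapping as an assumption:

```python
from enum import Enum


class Junction(Enum):
    DELETION_TYPE = "deletion-type"
    TANDEM_DUP_TYPE = "tandem dup-type"
    HEAD_TO_HEAD_INVERTED = "head-to-head inverted"
    TAIL_TO_TAIL_INVERTED = "tail-to-tail inverted"


# ASSUMPTION: forward-reverse paired-end library; key = (strand of the read at the
# lower coordinate, strand of the read at the higher coordinate). The naming of the
# two inverted classes differs between tools, so check the convention before reuse.
ORIENTATION = {
    ("+", "-"): Junction.DELETION_TYPE,
    ("-", "+"): Junction.TANDEM_DUP_TYPE,
    ("+", "+"): Junction.HEAD_TO_HEAD_INVERTED,
    ("-", "-"): Junction.TAIL_TO_TAIL_INVERTED,
}


def classify(pos_a: int, strand_a: str, pos_b: int, strand_b: str) -> Junction:
    """Classify an intrachromosomal read pair that spans a rearrangement junction."""
    if pos_a > pos_b:  # order the two ends by coordinate first
        pos_a, strand_a, pos_b, strand_b = pos_b, strand_b, pos_a, strand_a
    return ORIENTATION[(strand_a, strand_b)]


def is_inverted(kind: Junction) -> bool:
    return kind in (Junction.HEAD_TO_HEAD_INVERTED, Junction.TAIL_TO_TAIL_INVERTED)


if __name__ == "__main__":
    kind = classify(52_300_000, "-", 48_100_000, "+")  # hypothetical read pair
    print(kind.value, is_inverted(kind))  # deletion-type False
```

Colouring each arc in a rearrangement plot by this label is one concrete way to exploit domain-specific detail, which is where the next slides go.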

SLIDE 46

17 mouse genomes

[Figure, panels a and b: variation across chromosomes 1 to 19 and X for 17 mouse strains (129P2/OlaHsd, 129S1/SvImJ, 129S5/SvEvBrd, A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6NJ, CAST/EiJ, CBA/J, DBA/2J, LP/J, NOD/ShiLtJ, NZO/HlLtJ, PWK/PhJ, SPRET/EiJ, WSB/EiJ); legend: SNPs, SVs, TEs, uncallable.]

Keane et al., Nature, 2011

SLIDE 47

Challenge 4: Large number of data types

Exploit domain-specific details in your design.

SLIDE 48

Challenge 5: No longer one genome but many

SLIDE 49

Challenge 5: No longer one genome but many

SLIDE 50

Single nucleotide variation

Ossowski et al., Genome Research, 2008

SLIDE 51

Single nucleotide variation

Integrative Genomics Viewer (IGV)

Robinson et al., Nature Biotechnology, 2011
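A viewer such as IGV lets a person eyeball the stack of reads at one position; the underlying idea can be sketched in a few lines. This is a deliberately naive caller, assuming the aligned bases at one reference position are already collected into a string. The function name, depth threshold and allele-fraction threshold are illustrative; real variant callers also model base quality, mapping quality and strand bias:

```python
from collections import Counter
from typing import Optional


def call_snv(ref_base: str, pileup: str, min_depth: int = 10,
             min_alt_fraction: float = 0.2) -> Optional[tuple[str, float]]:
    """Naive SNV call at one reference position.

    `pileup` holds one base per read covering the position. Returns
    (alternate base, alternate-allele fraction), or None if no call is made.
    """
    ref = ref_base.upper()
    bases = pileup.upper()
    depth = len(bases)
    if depth < min_depth:
        return None  # too few reads to say anything
    alt_counts = Counter(b for b in bases if b != ref)
    if not alt_counts:
        return None
    alt, n_alt = alt_counts.most_common(1)[0]
    fraction = n_alt / depth
    return (alt, fraction) if fraction >= min_alt_fraction else None


if __name__ == "__main__":
    print(call_snv("C", "C" * 10 + "T" * 10))  # ('T', 0.5): half the reads show T
    print(call_snv("C", "C" * 19 + "T"))       # None: one T in 20 reads
    print(call_snv("C", "CT"))                 # None: depth below threshold
```

With many genomes (Challenge 5), the same per-position summary has to be shown side by side across samples, which is what multi-sample viewers do.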

SLIDE 52

Challenge 5: No longer one genome but many

Be open to change (genomics is evolving quickly).

SLIDE 53

Part 2 - Summary

  • 1. Critically consider what you need to display
  • 2. Connect overview and detail
  • 3. Encode important information in position
  • 4. Exploit domain-specific details in your design
  • 5. Be open to change (genomics is evolving quickly)

SLIDE 54

Part 3

Hands-On Design Exercise

SLIDE 55

Genome Assembly

Part 3. Hands-On Design Exercise

Input:
AAAAAAAAAAAAAAGATGT
CACCAGTACACCGATA
ACCAGATGGATTAGATGTA
TACACCGATACACCAGA
AAAGATGTATACCACCAG

SLIDE 56

Genome Assembly

Input: the five reads from slide 55.

Aligned:
AAAAAAAAAAAAAAGATGT
           AAAGATGTATACCACCAG
                       CACCAGTACACCGATA
                             TACACCGATACACCAGA
                                        ACCAGATGGATTAGATGTA

SLIDE 57

Genome Assembly

Input and Aligned: as on slide 56.

Consensus:
AAAAAAAAAAAAAAGATGTATACCACCAGTACACCGATACACCAGATGGATTAGATGTA
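The input, aligned and consensus steps can be reproduced with a greedy overlap merge: repeatedly join the two sequences with the longest suffix-prefix overlap. This is a classroom sketch, not how production assemblers work (ABySS, for example, is built on a de Bruijn graph), and the minimum overlap of 3 is an arbitrary assumption. Run on the five reads from slide 55, it recovers the consensus shown on slide 57:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left overlaps: report separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


READS = [  # the five input reads from slide 55
    "AAAAAAAAAAAAAAGATGT",
    "CACCAGTACACCGATA",
    "ACCAGATGGATTAGATGTA",
    "TACACCGATACACCAGA",
    "AAAGATGTATACCACCAG",
]

if __name__ == "__main__":
    print(greedy_assemble(READS))  # one contig: the slide 57 consensus
```

Greedy merging works here because every read is unique; the poly-A and poly-G reads in the colour-set exercise that follows are exactly the repeats that make greedy choices ambiguous.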

SLIDE 58

Sequence Alignment Rules

SLIDE 59

Sequence Alignment Rules

1. Maximize sequence overlap.

This overlap is BETTER…

AAAAAAAAAAAAAAGATGTATACCACCAGTACACCGATACACCAGATG
              GATGTATACCACCAGTACACCGATACACCAGATGGATTAGATGTAGGGG

…than this overlap:

AAAAAAAAAAAAAAGTATGTATACCACCAGTACACCGATACACCAGATG
                                             GATGTATACCACCAGTACAC

SLIDE 60

Sequence Alignment Rules

1. Maximize sequence overlap.

2. Align letters right-side-up, reading left to right (just like written English).

NOT a valid overlap:
CACCAGTACATTTTTAAAGGG
GTACACCGGGAAATTTTTA

Valid overlap:
CACCAGTACATTTTTAAAGGG
         ATTTTTAAAGGGCCACATG
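Both rules fit in a single comparison function. A minimal sketch (the function name and the minimum overlap of 3 are assumptions): rule 1 is enforced by trying the longest candidate overlap first, and rule 2 by never reversing the second sequence before comparing. The invalid pair on this slide only lines up if the second read is read backwards:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Longest suffix of `a` that equals a prefix of `b`, read left to right.

    Rule 1 (maximize overlap): candidates are tried from longest to shortest.
    Rule 2 (left to right only): `b` is never reversed before comparing.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


if __name__ == "__main__":
    top = "CACCAGTACATTTTTAAAGGG"
    print(overlap(top, "ATTTTTAAAGGGCCACATG"))  # 12: the valid overlap
    print(overlap(top, "GTACACCGGGAAATTTTTA"))  # 0: matches only if read backwards
    print("GTACACCGGGAAATTTTTA"[::-1])          # ATTTTTAAAGGGCCACATG
```

Applied to the first pair on slide 59, the same function finds the 34-letter overlap, while the second pair there shares only the final GATG.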

SLIDE 61

Sequence Alignments

Yellow set:
AGCAGATC…AAAAAAAA AAAAAAAA…TACTTACA …TACTTACA…GGGGGGGG GGGGGGGG…GGGGGGGG GGGGGGGG…GACAGATA AAAAAAAA…AAAAAAAA

SLIDE 62

Sequence Alignments

Blue set:
GATAGA…AAAAAA AAAAAA…CAGATG …CAGATG…GGGGGG GGGGGG…GGGGGG GGGGGG…ATAGAC …ATAGAC…AAAAAA AAAAAA…GGACAT AAAAAA…AAAAAA

SLIDE 63

Sequence Alignments

Both sets together (pretend you don’t know colour):
GATAGA…AAAAAA AAAAAA…CAGATG …CAGATG…GGGGGG GGGGGG…ATAGAC …ATAGAC…AAAAAA AAAAAA…GGACAT AGCAGA…AAAAAA AAAAAA…CTTACA …CTTACA…GGGGGG GGGGGG…CAGATA

Ambiguous (could belong to multiple sequences): AAAAAA…AAAAAA GGGGGG…GGGGGG

SLIDE 64

Sequence Alignments

GATAGA…AAAAAA AAAAAA…CAGATG …CAGATG…GGGGGG GGGGGG…ATAGAC …ATAGAC…AAAAAA AAAAAA…GGACAT AGCAGA…AAAAAA AAAAAA…CTTACA …CTTACA…GGGGGG GGGGGG…CAGATA

SLIDE 65

SLIDE 66

Choosing a representation

SLIDE 67

Choosing a representation

SLIDE 68

Choosing a representation

SLIDE 69

ABySS-Explorer

SLIDE 70

ABySS-Explorer

[Figure, panels (a) and (b): an inversion event in a human lymphoma genome, and the reference human genome]

Nielsen et al. 2009. ABySS-Explorer: visualizing genome sequence assemblies. IEEE Trans Vis Comput Graph (VisWeek Proceedings). Best paper award.

SLIDE 71

Resources

The Cartoon Guide to Genetics, Larry Gonick and Mark Wheelis (1991)
The Processes of Life: An Introduction to Molecular Biology, Lawrence E. Hunter (2009)
Nature Methods special issue on Visualizing Biological Data (2010), http://www.nature.com/nmeth/journal/v7/n3s
Bang Wong’s monthly Points of View column, http://bang.clearscience.info