Visual Analytics for Genomics
Cydney Nielsen
BC Cancer Agency, Vancouver, BC, Canada
Outline
Part 1 Introduction to Genomics
Part 2 Visual Design for Genomics
Part 3 Hands-On Design Exercise
genome: the complete genetic material of a cell
Genomics Workflow
Part 1. Intro to Genomics
Sequencing Experiment
G - C, T - A
Genomics Workflow
sample → data → insight
experiment (sample → data): sequencing technology (molecular biology)
analysis (data → insight): visualization + computation (computational biology / bioinformatics, visual analytics)
Sequencing Experiment
Sequencing machine: millions of short sequences (“reads”), e.g. 75 nt each, compared to >3 billion nt in the human genome
AAAAAAAAAAAAAAGATGT
CACCAGTACACCGATA
ACCAGATGGATTAGATGTA
TACACCGATACACCAGA
AAAGATGTATACCACCAG
~$5,000 in 2001
~10¢ in 2011
Sequencing Experiments
De novo assembly: reads → Genome Assembly
Re-sequencing: reads → Reference Genome
Enrichment: reads → Reference Genome
Example reads:
AGCTTCAGATGGACAGATAA
GGCATACAGACTTAGACATA
CCAGACAAGACAGACACAGTA
TACAAGACATAAGCAATACAGA
CCAGACAAGACAGACACAGTA
… but not in unaffected individuals?
… mutations) or not?
Part 1 - Summary
… data
… biologists and clinicians to make the most of these data
… insight and understanding
Part 2. Visual Design for Genomics
Challenge 1: Large number of samples for comparison
“To systematically characterize the genomic changes in hundreds of tumors… and thousands of samples over the next five years”
The Cancer Genome Atlas, www.cancergenome.nih.gov
Genome Browsers
Stacked data tracks along a common genome x-axis
Axes: genome coordinate, data samples
[Figure: UCSC Cancer Genomics Heatmaps. Glioblastoma Copy Number Abnormality, Agilent 244A array (n=200); annotation columns: tumor vs normal, gender]
Zhu et al., Nature Methods, 2009
Challenge 1: Large number of samples for comparison
Critically consider what you need to display,
e.g. replace primary data with a biologically meaningful summary, such as significant changes between samples
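One way to read the "biologically meaningful summary" suggestion, as a minimal sketch only: instead of drawing one track per sample, collapse two sample groups into a single per-bin difference and keep the bins where the groups clearly differ. The function name, the toy copy-number values, the threshold, and the use of a plain mean difference (rather than a proper statistical test) are all assumptions made for illustration.

```python
from statistics import mean

def summarize_changes(tumor, normal, threshold=0.5):
    """Collapse many per-sample tracks into a short list of changed bins.

    tumor / normal: lists of tracks, each track a list of values per genomic bin.
    Returns (bin_index, mean_difference) for bins whose group means differ
    by at least `threshold` (a hypothetical cut-off, not a significance test).
    """
    n_bins = len(tumor[0])
    summary = []
    for b in range(n_bins):
        diff = mean(t[b] for t in tumor) - mean(n[b] for n in normal)
        if abs(diff) >= threshold:
            summary.append((b, round(diff, 2)))
    return summary

# Toy data: two tumor and two normal samples over four genomic bins.
tumor = [[2.0, 2.1, 3.4, 2.0], [2.1, 1.9, 3.6, 0.9]]
normal = [[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]]
changed_bins = summarize_changes(tumor, normal)
print(changed_bins)
```

The display then needs one summary row with two flagged bins rather than four full tracks, which is the kind of reduction that keeps a view readable as the number of samples grows.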
Challenge 2: Genomic features are small and sparse
Genome Browsers: LOCAL VIEW
Human chr1: 1 pt corresponds to 480 kb, which is larger than 98% of all human genes
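The scale problem behind the 480 kb figure can be checked with simple arithmetic. This sketch assumes the hg19/GRCh37 length of human chr1 (249,250,621 bp) and a hypothetical 30 kb gene; only the 480 kb per point ratio comes from the slide.

```python
CHR1_LENGTH_BP = 249_250_621  # assumption: chr1 length in the hg19/GRCh37 assembly
BP_PER_POINT = 480_000        # from the slide: 1 pt corresponds to 480 kb

def width_in_points(length_bp, bp_per_point=BP_PER_POINT):
    """Horizontal extent, in points, of a feature drawn at the given scale."""
    return length_bp / bp_per_point

chr1_points = width_in_points(CHR1_LENGTH_BP)
gene_points = width_in_points(30_000)  # hypothetical 30 kb gene

print(f"chr1 spans about {chr1_points:.0f} pt")
print(f"a 30 kb gene spans {gene_points:.4f} pt")
```

At this scale the whole chromosome fits in roughly five hundred points, while the hypothetical gene occupies a small fraction of a single point, so it cannot be resolved in a whole-chromosome linear view.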
Hilbert Curve: GLOBAL VIEW
[Figure, panels a and b: chromatin states (1-9) along chromosome 3L; annotated regions: pericentromeric heterochromatin, cluster of small expressed genes, PcG domains, heterochromatin-like domain, open chromatin domain]
Kharchenko et al., Nature, 2011; Anders, Bioinformatics, 2009
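A Hilbert curve folds a one-dimensional coordinate into a square so that positions close together on the genome stay close together on screen. The sketch below uses the standard iterative index-to-coordinate conversion; it is not taken from the cited papers, and `genome_to_pixel` with its binning scheme is a hypothetical helper added for illustration.

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def genome_to_pixel(position, chrom_length, order):
    """Bin a 0-based genome position into one grid cell (hypothetical helper)."""
    cells = (1 << order) ** 2
    d = position * cells // chrom_length
    return hilbert_d2xy(order, d)

# A 16 x 16 grid (order 4) has 256 cells, visited one after another.
path = [hilbert_d2xy(4, d) for d in range(256)]
print(path[:4])
```

Each step along the curve moves to a neighbouring cell, so a small, sparse feature becomes a compact patch instead of a sub-pixel sliver on a long axis.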
Challenge 2: Genomic features are small and sparse
Connect overview and detail
Challenge 3: Genomic features involve non-adjacent positions
Structural rearrangements
[Figure, panels a-e: reference and variant arrangements of segments J, Jʹ, K, Kʹ]
Circos, Martin Krzywinski
VISTA-Dot
All these representations use a genomic coordinate system, which emphasizes base-pair distance between points.
Is this the best use of positional information?
Encode important information in position
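To make the "non-adjacent positions" idea concrete, here is a minimal sketch of the data behind a dot-plot style view of a rearrangement: each block of the variant genome maps to a reference interval, traversed forwards or backwards. The segment names J and K are borrowed loosely from the figure labels; the coordinates, the `Segment` type, and `dotplot_lines` are hypothetical.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    name: str
    start: int  # reference start (0-based, inclusive)
    end: int    # reference end (exclusive)

def dotplot_lines(reference, variant):
    """Return one line per variant block: (v_start, v_end, r_from, r_to).

    variant is an ordered list of (segment_name, is_forward) pairs.
    An inverted block runs from the segment's end back to its start.
    """
    by_name = {seg.name: seg for seg in reference}
    lines = []
    v_pos = 0
    for name, is_forward in variant:
        seg = by_name[name]
        length = seg.end - seg.start
        r_from, r_to = (seg.start, seg.end) if is_forward else (seg.end, seg.start)
        lines.append((v_pos, v_pos + length, r_from, r_to))
        v_pos += length
    return lines

reference = [Segment("J", 0, 100), Segment("K", 100, 250)]
inversion = [("J", True), ("K", False)]  # K is present but flipped
lines = dotplot_lines(reference, inversion)
print(lines)
```

In a dot plot the forward block draws as a rising diagonal and the inverted block as a falling one, so the rearrangement is carried by line direction and position rather than by base-pair distance alone.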
Challenge 4: Large number of data types
Genomic rearrangement in cancer
[Figure: SNU-C1 (colorectal), Chr 15. Tracks along genomic location (Mb): copy number, allelic ratio. Rearrangement types: deletion-type, tandem dup-type, head-to-head inverted, tail-to-tail inverted]
Stephens et al., Cell, 2011
17 mouse genomes
[Figure, panels a and b: chromosomes 1-19 and X for strains CAST/EiJ, WSB/EiJ, PWK/PhJ, SPRET/EiJ, 129P2/OlaHsd, 129S1/SvImJ, 129S5/SvEvBrd, A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6NJ, CBA/J, DBA/2J, LP/J, NOD/ShiLtJ, NZO/HlLtJ; tracks: SNPs, SVs, TEs, uncallable]
Keane et al., Nature, 2011
Challenge 4: Large number of data types
Exploit domain-specific details in your design
Challenge 5: No longer one genome but many
Single nucleotide variation
Ossowski et al., Genome Research, 2008
Integrative Genomics Viewer (IGV): Robinson et al., Nature Biotechnology, 2011
Be open to change (genomics is evolving quickly)
Part 2 - Summary
Part 3. Hands-On Design Exercise
Genome Assembly
Input:
AAAAAAAAAAAAAAGATGT
CACCAGTACACCGATA
ACCAGATGGATTAGATGTA
TACACCGATACACCAGA
AAAGATGTATACCACCAG
Aligned (reads staggered so that overlapping bases line up):
AAAAAAAAAAAAAAGATGT
AAAGATGTATACCACCAG
CACCAGTACACCGATA
TACACCGATACACCAGA
ACCAGATGGATTAGATGTA
Consensus:
AAAAAAAAAAAAAAGATGTATACCACCAGTACACCGATACACCAGATGGATTAGATGTA
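The Input, Aligned, Consensus progression can be replayed in a few lines. This is a sketch rather than an assembler: each read's offset is recovered by locating it in the consensus string from the slide, the staggered layout is printed, and the consensus is then rebuilt by a majority vote per column. The function name and the voting scheme are assumptions for illustration.

```python
from collections import Counter

READS = [
    "AAAAAAAAAAAAAAGATGT",
    "AAAGATGTATACCACCAG",
    "CACCAGTACACCGATA",
    "TACACCGATACACCAGA",
    "ACCAGATGGATTAGATGTA",
]
CONSENSUS = "AAAAAAAAAAAAAAGATGTATACCACCAGTACACCGATACACCAGATGGATTAGATGTA"

def column_consensus(aligned):
    """aligned: list of (offset, read). Return the majority base per column."""
    width = max(offset + len(read) for offset, read in aligned)
    columns = [Counter() for _ in range(width)]
    for offset, read in aligned:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    # "N" marks a column that no read covers.
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)

# Offsets come from the known consensus, so this checks the layout
# rather than discovering it.
aligned = [(CONSENSUS.find(read), read) for read in READS]
for offset, read in aligned:
    print(" " * offset + read)

rebuilt = column_consensus(aligned)
print(rebuilt)
```

Printing each read behind its offset reproduces the staggered "Aligned" picture, and the column vote shows why agreeing overlaps yield a single unambiguous consensus.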
Sequence Alignment Rules
1. Maximize sequence overlap:
This overlap is BETTER…
AAAAAAAAAAAAAAGATGTATACCACCAGTACACCGATACACCAGATG
GATGTATACCACCAGTACACCGATACACCAGATGGATTAGATGTAGGGG
[second read shifted right so that it starts under the first read's GATGT]
…than this overlap:
AAAAAAAAAAAAAAGTATGTATACCACCAGTACACCGATACACCAGATG
GATGTATACCACCAGTACAC
[second read shifted far right so that only its leading GATG overlaps]
2. Align letters right-side-up, reading left to right (just like written English):
NOT a valid overlap / Valid overlap:
CACCAGTACATTTTTAAAGGG CACCAGTACATTTTTAAAGGG
GTACACCGGGAAATTTTTA ATTTTTAAAGGGCCACATG
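Rule 1 can be expressed as a score: the length of the longest suffix of one read that is also a prefix of the next. The sketch below applies that score to the two example pairs from the slide. The function name and the exact-match requirement are assumptions; real assemblers tolerate mismatches. In line with rule 2, both reads are compared exactly as written, never reversed.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Reads are compared as written, left to right; neither is reversed.
    """
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0

# The "better" pair from rule 1.
better = overlap(
    "AAAAAAAAAAAAAAGATGTATACCACCAGTACACCGATACACCAGATG",
    "GATGTATACCACCAGTACACCGATACACCAGATGGATTAGATGTAGGGG",
)
# The "worse" pair from rule 1.
worse = overlap(
    "AAAAAAAAAAAAAAGTATGTATACCACCAGTACACCGATACACCAGATG",
    "GATGTATACCACCAGTACAC",
)
print(better, worse)
```

The first pair shares a long run of bases while the second shares only a few, so an approach that always prefers the larger score would pick the first placement.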
Sequence Alignments
Yellow set:
AGCAGATC…AAAAAAAA
AAAAAAAA…TACTTACA
…TACTTACA…GGGGGGGG
GGGGGGGG…GGGGGGGG
GGGGGGGG…GACAGATA
AAAAAAAA…AAAAAAAA
Blue set:
GATAGA…AAAAAA
AAAAAA…CAGATG
…CAGATG…GGGGGG
GGGGGG…GGGGGG
GGGGGG…ATAGAC
…ATAGAC…AAAAAA
AAAAAA…GGACAT
AAAAAA…AAAAAA
Both sets together (pretend you don’t know colour):
GATAGA…AAAAAA
AAAAAA…CAGATG
…CAGATG…GGGGGG
GGGGGG…ATAGAC
…ATAGAC…AAAAAA
AAAAAA…GGACAT
AGCAGA…AAAAAA
AAAAAA…CTTACA
…CTTACA…GGGGGG
GGGGGG…CAGATA
Ambiguous, could belong to multiple sequences:
AAAAAA…AAAAAA
GGGGGG…GGGGGG
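The ambiguity in the combined set can be made explicit by treating each read as a node and linking it to every read that could follow it. In this sketch each read is reduced to its first and last block as drawn on the slide; the elided middle ("…") is deliberately not modelled, and the `successors` helper is hypothetical.

```python
from collections import defaultdict

# (first block, last block) for each read in the combined set.
READS = [
    ("GATAGA", "AAAAAA"), ("AAAAAA", "CAGATG"), ("CAGATG", "GGGGGG"),
    ("GGGGGG", "ATAGAC"), ("ATAGAC", "AAAAAA"), ("AAAAAA", "GGACAT"),
    ("AGCAGA", "AAAAAA"), ("AAAAAA", "CTTACA"), ("CTTACA", "GGGGGG"),
    ("GGGGGG", "CAGATA"),
]

def successors(reads):
    """Map each read to the reads whose first block matches its last block."""
    by_start = defaultdict(list)
    for read in reads:
        by_start[read[0]].append(read)
    return {read: by_start[read[1]] for read in reads}

graph = successors(READS)
ambiguous = {read: nxt for read, nxt in graph.items() if len(nxt) > 1}

for read, nxt in ambiguous.items():
    print(f"{read[0]}…{read[1]} has {len(nxt)} possible continuations")
```

Every read that ends in a repeat block has more than one possible continuation, which is why a single linear layout cannot represent the combined set and a branching, graph-like representation is a natural candidate.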
Choosing a representation
ABySS-Explorer
[Figure, panels (a) and (b): inversion event in a human lymphoma genome; reference human genome]
Nielsen et al. 2009. ABySS-Explorer: visualizing genome sequence assemblies. IEEE Trans Vis Comput Graph, VisWeek Proceedings (Best paper award)
Resources
The Cartoon Guide to Genetics, Larry Gonick and Mark Wheelis (1991)
The Processes of Life: An Introduction to Molecular Biology, Lawrence E. Hunter (2009)
Nature Methods special issue on Visualizing Biological Data (2010), http://www.nature.com/nmeth/journal/v7/n3s
Bang Wong’s monthly Points of View column, http://bang.clearscience.info