Examining Tumor Phylogeny Inference in Noisy Sequencing Data
Kiran Tomlinson and Layla Oesper
Department of Computer Science, Carleton College
Dec. 4, 2018
Tomlinson and Oesper (Carleton College) Tumor Phylogeny Inference
[Figure: a tumor population over time. Cells acquire mutations, yielding a heterogeneous tumor whose history is summarized by a clonal tree.]
How can we reconstruct a tumor’s clonal tree from its genome?
Why is this important?
1. Personalized medicine (Greaves 2015; McGranahan and Swanton 2017)
2. Improved understanding of cancer development
1. Background: previous work, bulk sequencing data, ISA, AncesTree
2. Methods
3. Results
Single nucleotide variants (SNVs), e.g. …GAT… vs. …GTT…:
PhyloSub (Jiao et al. 2014), Rec-BTP (Hajirasouliha et al. 2014), AncesTree (El-Kebir et al. 2015), CITUP (Malikic et al. 2015), LICHeE (Popic et al. 2015), BitPhylogeny (Yuan et al. 2015)
Single-cell sequencing data:
OncoNEM (Ross et al. 2016), SCITE (Jahn et al. 2016), SiFit (Zafar et al. 2017)
SNVs and CNAs/structural variants:
SubcloneSeeker (Qiao et al. 2014), PhyloWGS (Deshwar et al. 2015), SPRUCE (El-Kebir et al. 2016), Canopy (Jiang et al. 2016), PASTRI (Satas and Raphael 2017)
Single-cell and bulk data:
ddClone (Salehi et al. 2017), B-SCITE (Malikic et al. 2018)
and many more...
[Figure: aligned reads from Sample 1 (S1) and Sample 2 (S2), with example VAF entries 0.17, 0.33, 0.17, 0.5, 0.25, 0.25, 0.25.]
VAF matrix F: each entry is (# variant reads) / (# total reads) for one mutation in one sample.
Human genome: ~3 billion base pairs
Infinite Sites Assumption (ISA; Kimura 1969)
No position in the genome mutates more than once.
[Figure: VAF matrix F (example entries 0.5, 0.17, 0.33, 0.17, 0.5, 0.25, 0.25, 0.25), its ancestry graph (AG), and the corresponding clonal trees.]
Observation
Possible clonal trees ≡ AG spanning trees satisfying the sum condition:
$F_{ij} \geq \sum_{k \in C(j)} F_{ik} \quad \forall i \in \{1, \dots, s\}$,
where C(j) is the set of children of mutation j in the tree and s is the number of samples.
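The sum condition can be checked directly for a candidate tree. The following is a small sketch, assuming F is stored as a list of rows (one per sample) and the tree as a dictionary mapping each mutation to its children; the function name and the example values are hypothetical.

```python
def satisfies_sum_condition(F, children):
    """Check F[i][j] >= sum of F[i][k] over children k of j, for every sample i."""
    for row in F:  # one row per sample i
        for j, kids in children.items():
            if row[j] < sum(row[k] for k in kids):
                return False
    return True


# Hypothetical example: 1 sample, 3 mutations.
F = [[0.5, 0.3, 0.3]]
branching = {0: [1, 2], 1: [], 2: []}  # mutation 0 is the parent of 1 and 2
chain = {0: [1], 1: [2], 2: []}        # 0 -> 1 -> 2

branching_ok = satisfies_sum_condition(F, branching)  # needs 0.5 >= 0.3 + 0.3
chain_ok = satisfies_sum_condition(F, chain)          # needs 0.5 >= 0.3 and 0.3 >= 0.3
print(branching_ok, chain_ok)
```

Here the branching tree is rejected because the two children together exceed the parent's frequency, while the chain is accepted.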
Variant Allele Frequency Factorization Problem (VAFFP)
Given: VAF matrix F.
Find: Usage matrix U and clonal matrix B such that $F = \frac{1}{2} U B$.
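A small worked instance of the factorization may help. This sketch assumes the usual reading of the two matrices, which the slide does not spell out: row c of B marks the mutations carried by clone c (its own mutation plus all ancestors), and row i of U gives the fraction of each clone in sample i. The helper names and numbers are hypothetical.

```python
def clonal_matrix(parent):
    """Build B from a tree given as parent[j] (None for the root).

    B[c][j] = 1 if clone c carries mutation j, i.e. j is c or an ancestor of c.
    """
    n = len(parent)
    B = [[0] * n for _ in range(n)]
    for c in range(n):
        j = c
        while j is not None:
            B[c][j] = 1
            j = parent[j]
    return B


def vaf_from_factors(U, B):
    """Compute F = (1/2) U B with plain nested loops."""
    n = len(B)
    return [[0.5 * sum(u_row[c] * B[c][j] for c in range(n)) for j in range(n)]
            for u_row in U]


# Hypothetical chain tree 0 -> 1 -> 2 and one sample.
parent = [None, 0, 1]
B = clonal_matrix(parent)   # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
U = [[0.5, 0.25, 0.25]]     # clone fractions in the sample
F = vaf_from_factors(U, B)
print(F)
```

Every cell carries mutation 0, so its VAF is half of the total clone fraction; mutation 2 appears only in the last clone, so its VAF is half of that clone's fraction.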
1. Background
2. Methods: enumeration VAFFP, noise in sequencing data, handling noise, shrinking the search space
3. Results
Enumeration VAFFP (strict)
Given: VAF matrix F.
Find: The set T(G_F) of all ancestry graph spanning trees that satisfy the sum condition.
How: Modified version of the spanning tree enumeration algorithm of Gabow and Myers (1978).
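To make the problem statement concrete, here is a brute-force sketch of the enumeration. It is exhaustive search over parent assignments, not the Gabow and Myers algorithm, and it is only practical for a handful of mutations. Two things are assumptions not stated on the slide: the ancestry graph has an edge j -> k when mutation j has a VAF at least as large as k in every sample, and mutation 0 is taken as the root.

```python
from itertools import product


def ancestry_graph(F):
    """Assumed edge rule: j -> k if F[i][j] >= F[i][k] in every sample i."""
    n = len(F[0])
    return {j: [k for k in range(n)
                if k != j and all(row[j] >= row[k] for row in F)]
            for j in range(n)}


def reaches_root(parent, root):
    """True if following parent pointers from every vertex ends at the root."""
    for v in parent:
        seen = set()
        while v != root:
            if v in seen:  # cycle: this parent assignment is not a tree
                return False
            seen.add(v)
            v = parent[v]
    return True


def enumerate_trees(F, root=0):
    """Return all spanning trees (as child -> parent dicts) meeting the sum condition."""
    n = len(F[0])
    graph = ancestry_graph(F)
    others = [v for v in range(n) if v != root]
    # Candidate parents of v are the vertices with an edge into v.
    candidates = [[u for u in range(n) if v in graph[u]] for v in others]
    trees = []
    for choice in product(*candidates):
        parent = dict(zip(others, choice))
        if not reaches_root(parent, root):
            continue
        children = {v: [] for v in range(n)}
        for v, u in parent.items():
            children[u].append(v)
        if all(row[j] >= sum(row[k] for k in children[j])
               for row in F for j in range(n)):
            trees.append(parent)
    return trees


# Hypothetical VAF matrix: 2 samples x 3 mutations.
F = [[0.5, 0.3, 0.25],
     [0.5, 0.3, 0.1]]
trees = enumerate_trees(F)
print(trees)
```

In this example the ancestry graph admits two spanning trees, but the branching tree violates the sum condition in the first sample (0.3 + 0.25 > 0.5), so only the chain 0 -> 1 -> 2 remains.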
[Figure: sequencing pipeline. Input: mixed cell sample → DNA fragments → sequencing → short reads (e.g. …GATTACA…) → alignment → aligned reads → variant calling → mutations → Output: VAFs (e.g. 0.75, 0.25, 0.50, 0.25, 0.25).]
Strict sum condition:
$F_{ij} \geq \sum_{k \in C(j)} F_{ik} \quad \forall i \in \{1, \dots, s\}$
Relaxed sum condition, with slack ε:
$F_{ij} + \varepsilon \geq \sum_{k \in C(j)} F_{ik} \quad \forall i \in \{1, \dots, s\}$
Here C(j) is the set of children of mutation j in the tree.
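The effect of the slack term can be seen on a single noisy row. This is a sketch only: the function name, the example frequencies, and the value ε = 0.05 are all hypothetical choices for illustration.

```python
def satisfies_relaxed_sum_condition(F, children, eps=0.0):
    """Check F[i][j] + eps >= sum of F[i][k] over children k of j, for every sample i.

    With eps = 0 this is the strict sum condition.
    """
    return all(row[j] + eps >= sum(row[k] for k in children[j])
               for row in F for j in children)


# Hypothetical noisy measurement of true frequencies (0.5, 0.25, 0.25).
F_noisy = [[0.48, 0.25, 0.25]]
branching = {0: [1, 2], 1: [], 2: []}

strict_ok = satisfies_relaxed_sum_condition(F_noisy, branching)             # 0.48 >= 0.50 fails
relaxed_ok = satisfies_relaxed_sum_condition(F_noisy, branching, eps=0.05)  # 0.53 >= 0.50 holds
print(strict_ok, relaxed_ok)
```

A small measurement error in the parent's VAF is enough to rule out the true tree under the strict condition, while a modest slack recovers it.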
[Figure: weighted digraph with example edge weights 0.99, 0.87, 0.99, 0.34, 0.73, 0.01.]
1. Complete weighted digraph
2. Posterior probability of ancestry: beta-binomial model (El-Kebir et al. 2015)
3. Enumerate spanning trees in weight order (Camerini et al. 1980)
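One way to picture an edge weight is as the probability that one mutation's true frequency is at least another's, given the observed read counts. The sketch below is a simplified stand-in, not the model of El-Kebir et al.: it assumes a single sample, binomial read counts, and independent uniform Beta(1, 1) priors, and it estimates the probability by Monte Carlo sampling. The function name and the counts are hypothetical.

```python
import random


def prob_ancestral(var_j, tot_j, var_k, tot_k, draws=20000, seed=0):
    """Estimate P(f_j >= f_k) under independent Beta posteriors.

    Assumes a Beta(1, 1) prior, so the posterior for a mutation with v variant
    reads out of t total reads is Beta(1 + v, 1 + t - v).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(draws):
        f_j = rng.betavariate(1 + var_j, 1 + tot_j - var_j)
        f_k = rng.betavariate(1 + var_k, 1 + tot_k - var_k)
        hits += f_j >= f_k
    return hits / draws


# Hypothetical counts at 60x coverage.
w_forward = prob_ancestral(30, 60, 10, 60)   # VAF 0.50 vs 0.17: close to 1
w_backward = prob_ancestral(10, 60, 30, 60)  # reversed direction: close to 0
w_tie = prob_ancestral(20, 60, 20, 60)       # equal counts: about 0.5
print(w_forward, w_backward, w_tie)
```

Unlike the strict ancestry graph, every ordered pair of mutations gets a weight here, which is what makes the digraph complete and lets spanning trees be ranked rather than simply accepted or rejected.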
Goal: simplify ancestry graph
[Figure: examples of 2-transitive and 3-transitive edges, and a 3-PTR of the graph.]
k-PTR: subgraph resulting from removing all ≥ k-transitive edges.
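The k-PTR definition can be sketched in a few lines, under one assumption about terminology that the slide leaves implicit: an edge (u, v) is taken to be k-transitive when there is a path of k edges from u to v, so it is ≥ k-transitive when the longest path from u to v has at least k edges. The sketch also assumes the ancestry graph is acyclic; `k_ptr` is a hypothetical name.

```python
from functools import lru_cache


def k_ptr(graph, k):
    """Remove every edge (u, v) whose longest u-to-v path has at least k edges.

    graph: dict mapping each vertex to a list of successors; must be a DAG.
    """
    @lru_cache(maxsize=None)
    def longest(u, v):
        # Longest path length in edges from u to v, or -1 if v is unreachable.
        if u == v:
            return 0
        best = -1
        for w in graph[u]:
            d = longest(w, v)
            if d >= 0:
                best = max(best, d + 1)
        return best

    return {u: [v for v in succ if longest(u, v) < k]
            for u, succ in graph.items()}


# Hypothetical ancestry graph: a complete DAG on 4 mutations.
graph = {0: [1, 2, 3], 1: [2, 3], 2: [3], 3: []}

ptr2 = k_ptr(graph, 2)  # drops every edge that a longer path can replace
ptr3 = k_ptr(graph, 3)  # drops only 0 -> 3, which the path 0 -> 1 -> 2 -> 3 spans
print(ptr2)
print(ptr3)
```

Under this reading, k = 2 gives the ordinary transitive reduction, and larger thresholds keep progressively more of the original edges.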
1. Background
2. Methods
3. Results: simulated data, real data, conclusions
[Figure: four panels showing proportion of trials (0.0 to 1.0) versus number of mutations (2 to 12), number of samples (2 to 16), coverage (60 to 180), and overdispersion parameter (0.02 to 0.08).]
Defaults: 10 mutation clusters, 5 samples, 60× coverage, no overdispersion.
[Figure: four panels showing A-D distance improvement (0.0 to 1.0) versus number of mutations (2 to 12), number of samples (2 to 16), coverage (60 to 180), and overdispersion parameter (0.02 to 0.08).]
Defaults: 10 mutation clusters, 5 samples, 60× coverage, no overdispersion.
A-D distance: ancestor-descendant distance (Govek et al.).
[Figure: mean A-D improvement (0.0 to 1.0) versus number of mutations (2 to 12), comparing the strict and approximate ancestry graphs.]
[Figure: two panels versus transitive removal threshold (2, 3, 4, 5, 6, None): number of AG spanning trees (mean and max, log scale from 10^1 to 10^6) and A-D distance improvement (0.0 to 1.0).]
Chronic lymphocytic leukemia (Schuh et al. 2012)
  3 patients (CLL003, CLL006, CLL077)
  5 samples each, spaced over time
  WGS (40× coverage) and deep sequencing (100,000× coverage)
Clear cell renal carcinoma (Gerlinger et al. 2014)
  8 patients (EV003, EV005, EV006, EV007, RK26, RMH002, RMH004, RMH008)
  5-11 samples from different regions
  Amplicon sequencing (> 400× coverage)
Patient         Samples  Mutations¹  # Clusters  |T(G_F)|
CLL003 (deep)   5        15/20       4
CLL003 (WGS)    5        13/30       4
CLL006 (deep)   5        5/10        5           2
CLL006 (WGS)    5        6/16        5
CLL077 (deep)   5        12/16       4           1
CLL077 (WGS)    5        16/20       4
EV003           8        12/16       4, 5, 6
EV005           7        61/64       5, 6
EV006           9        52/57       5
EV007           8        54/56       4, 5
RK26            11       62/62       4, 5, 6
RMH002          5        48/48       5, 6
RMH004          6        126/126     5, 6
RMH008          8        69/71       5, 6
¹ After/before filtering out mutations with VAF above 0.5.
[Figure: variant frequency (0.0 to 0.5) across samples 1 to 5 at 100000× coverage and at 40× coverage, each shown with an inferred tree.]
Tree with edge weights 0.999, 0.999, 0.999. Mutation clusters: {LRRC16A}; {DAZAP1, EXOC6B, GHDC, OCA2, PLA2G16}; {COL24A1, HMCN1, KLHDC2, MAP2K1, NOD1}; {SLC12A1}.
Tree with edge weights 0.637, 0.435, 0.647. Mutation clusters: {DAZAP1, EXOC6B, GHDC, PLA2G16}; {LRRC16A}; {GPR158, OCA2, SLC12A1}; {COL24A1, DDX1, HMCN1, KLHDC2, MAP2K1, NOD1, ZFHX4, ZNF566}.
Annotated differences between the two trees: filtered for high VAF in deep data; clustered differently by k-means; not present in deep sequencing.
1. Strict ISA-based trees are rare in simulated and real data
2. Overdispersion makes solutions rarer, but not worse
3. Approximate AG and relaxed sum condition increase robustness
4. PTR simplifies AG with minor quality impact (skews topology)
5. Approximate AG outperforms strict for few mutations and vice versa
This project is supported by NSF CRII award IIS-1657380 and by Elledge, Eugster, and Class of ’49 Fellowships from Carleton College (to LO). Thanks to Zach DiNardo, Thais Del Rosario Hernandez, and Rosa Zhou for helpful conversations. Special thanks to Layla Oesper for her mentorship, support, and feedback.