Examining Tumor Phylogeny Inference in Noisy Sequencing Data




SLIDE 1

Examining Tumor Phylogeny Inference in Noisy Sequencing Data

Kiran Tomlinson and Layla Oesper

Department of Computer Science, Carleton College

Dec. 4, 2018

Tomlinson and Oesper (Carleton College) Tumor Phylogeny Inference Dec. 4, 2018 1 / 24

SLIDE 4

Clonal theory (Nowell 1976)

[Figure: a tumor population evolving over time; labels: Mutation, Time, Tumor population, Cell, Heterogeneous tumor, Mutations, Clonal tree]

SLIDE 8

Inferring tumor phylogeny

How can we reconstruct a tumor’s clonal tree from its genome?

Why is this important?

  1. Personalized medicine (Greaves 2015; McGranahan and Swanton 2017)
  2. Improved understanding of cancer development

SLIDE 9

Outline

  1. Background
       • Previous work
       • Bulk sequencing data
       • ISA
       • AncesTree
  2. Methods
  3. Results

SLIDE 13

Previous work

Single nucleotide variants (SNVs) only (illustrated as …GAT… vs. …GTT…):

  • PhyloSub (Jiao et al. 2014)
  • Rec-BTP (Hajirasouliha et al. 2014)
  • AncesTree (El-Kebir et al. 2015)
  • CITUP (Malikic et al. 2015)
  • LICHeE (Popic et al. 2015)
  • BitPhylogeny (Yuan et al. 2015)

Single-cell sequencing data:

  • OncoNEM (Ross et al. 2016)
  • SCITE (Jahn et al. 2016)
  • SiFit (Zafar et al. 2017)

SNVs and CNAs/structural variants:

  • SubcloneSeeker (Qiao et al. 2014)
  • PhyloWGS (Deshwar et al. 2015)
  • SPRUCE (El-Kebir et al. 2016)
  • Canopy (Jiang et al. 2016)
  • PASTRI (Satas and Raphael 2017)

Single-cell and bulk data:

  • ddClone (Salehi et al. 2017)
  • B-SCITE (Malikic et al. 2018)

and many more.

SLIDE 15

Bulk sequencing data

[Figure: aligned reads from Sample 1 (S1) and Sample 2 (S2)]

VAF matrix F: each entry is a variant allele frequency (# variant reads / # total reads).

Example entries shown for S1 and S2: 0.5, 0.17, 0.33, 0.17, 0.5, 0.25, 0.25, 0.25
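
The VAF definition above (# variant reads / # total reads) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `vaf_matrix`, the read counts, and the choice to report 0.0 at zero coverage are all assumptions made for the example.

```python
def vaf_matrix(variant_reads, total_reads):
    """Return F with F[i][j] = variant reads / total reads for sample i, mutation j."""
    F = []
    for var_row, tot_row in zip(variant_reads, total_reads):
        if len(var_row) != len(tot_row):
            raise ValueError("each sample needs one total count per mutation")
        # Assumption: a locus with zero coverage has no defined VAF; report 0.0.
        F.append([v / t if t > 0 else 0.0 for v, t in zip(var_row, tot_row)])
    return F


# Hypothetical read counts: 2 samples (rows) x 3 mutations (columns).
variant = [[30, 10, 20], [30, 15, 0]]
total = [[60, 60, 60], [60, 60, 60]]
F = vaf_matrix(variant, total)
```

Rows are samples and columns are mutations, matching the indexing F_ij (sample i, mutation j) used in the sum condition.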

SLIDE 16

ISA

Infinite Sites Assumption (Kimura 1969)

No position in the genome mutates more than once.

Motivation shown on the slide: the genome has ~3 billion base pairs, so two mutations are unlikely to hit the same position.

SLIDE 18

AncesTree (El-Kebir et al. 2015)

[Figure: VAF matrix F (samples S1, S2) → ancestry graph (AG) → clonal trees]

Observation

Possible clonal trees ≡ AG spanning trees satisfying the sum condition:

$$F_{ij} \ge \sum_{k \text{ child of } j} F_{ik} \quad \forall i \in \{1, \dots, s\}.$$
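
A small sketch of the sum condition check follows. The slide does not spell out how the ancestry graph is built; the code assumes the usual construction, with an edge j → k whenever F_ij ≥ F_ik in every sample i. Function names and the example matrix are hypothetical.

```python
def ancestry_graph(F):
    """Edges (j, k): mutation j may be an ancestor of k (F[i][j] >= F[i][k] for all i).

    Assumption: this edge rule is not stated on the slide.
    """
    n = len(F[0])
    return {(j, k) for j in range(n) for k in range(n)
            if j != k and all(row[j] >= row[k] for row in F)}


def satisfies_sum_condition(F, children):
    """children maps each mutation j to the list of its children in a candidate tree."""
    return all(row[j] >= sum(row[k] for k in kids)
               for row in F for j, kids in children.items())


# Hypothetical VAF matrix: 2 samples x 3 mutations.
F = [[0.5, 0.375, 0.25],
     [0.5, 0.25, 0.125]]
edges = ancestry_graph(F)

branching = {0: [1, 2], 1: [], 2: []}   # 0 is the parent of both 1 and 2
chain = {0: [1], 1: [2], 2: []}         # 0 -> 1 -> 2
branching_ok = satisfies_sum_condition(F, branching)
chain_ok = satisfies_sum_condition(F, chain)
```

In this example both candidates are spanning trees of the ancestry graph, but only the chain passes: in the first sample the two children of mutation 0 in the branching tree sum to 0.625, which exceeds 0.5.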

SLIDE 19

AncesTree (El-Kebir et al. 2015)

[Figure: VAF matrix F (samples S1, S2) → ancestry graph (AG) → clonal trees]

Variant Allele Frequency Factorization Problem (VAFFP)

Given: VAF matrix F.
Find: Usage matrix U and clonal matrix B such that $F = \frac{1}{2} U B$.
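
The factorization F = ½UB can be checked by hand on a tiny tree. The sketch below assumes the conventions of El-Kebir et al. (2015) as commonly described: B[c][m] = 1 when clone c carries mutation m, U[i][c] is the fraction of sample i made up of clone c, and the factor ½ reflects heterozygous mutations in diploid cells. The slide only names U and B, so these definitions, the helper names, and the numbers are assumptions.

```python
def clonal_matrix(parent):
    """B[c][m] = 1 if clone c carries mutation m (m is c itself or an ancestor of c)."""
    n = len(parent)
    B = [[0] * n for _ in range(n)]
    for c in range(n):
        node = c
        while node is not None:
            B[c][node] = 1
            node = parent[node]
    return B


def vaf_from_usage(U, B):
    """Compute F = (1/2) U B with plain nested loops."""
    n_clones, n_mut = len(B), len(B[0])
    return [[0.5 * sum(u_row[c] * B[c][m] for c in range(n_clones))
             for m in range(n_mut)]
            for u_row in U]


parent = [None, 0, 1]        # hypothetical chain tree: 0 -> 1 -> 2
U = [[0.25, 0.25, 0.5]]      # one sample; clone proportions sum to 1
B = clonal_matrix(parent)
F = vaf_from_usage(U, B)
```

Here every cell carries mutation 0, so its VAF is 0.5; mutation 2 is carried only by the last clone (half of the sample), giving 0.25.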

SLIDE 20

Outline

  1. Background
  2. Methods
       • Enumeration VAFFP
       • Noise in sequencing data
       • Handling noise
       • Shrinking the search space
  3. Results

SLIDE 22

E-VAFFP

[Figure: VAF matrix F (samples S1, S2) mapped to a set of trees $T(G_F)$]

Enumeration VAFFP (strict)

Given: VAF matrix F.
Find: The set $T(G_F)$ of all ancestry graph spanning trees that satisfy the sum condition.
How: Modified version of the spanning tree enumeration algorithm of Gabow and Myers (1978).
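
To make the enumeration problem concrete, here is a brute-force sketch for very small inputs. It is not the modified Gabow and Myers algorithm named on the slide: it simply tries every parent assignment allowed by the ancestry graph and keeps those that form a tree and satisfy the sum condition. The ancestry graph edge rule (j → k when F_ij ≥ F_ik in all samples), the single-root restriction, and all names are assumptions.

```python
from itertools import product


def is_rooted_tree(parent, root):
    """True if following parent pointers from every node reaches root without a cycle."""
    for v in parent:
        seen = set()
        while v != root:
            if v in seen:
                return False
            seen.add(v)
            v = parent[v]
    return True


def sum_condition_holds(F, parent, n):
    """Check F[i][j] >= sum of F[i][k] over children k of j, for every sample i."""
    for row in F:
        child_sum = [0.0] * n
        for child, par in parent.items():
            child_sum[par] += row[child]
        if any(row[j] < child_sum[j] for j in range(n)):
            return False
    return True


def enumerate_strict_trees(F):
    """Return all parent maps {child: parent} that are valid strict solutions."""
    n = len(F[0])
    # Candidate parents of k: mutations j with F[i][j] >= F[i][k] in every sample.
    preds = {k: [j for j in range(n) if j != k and all(r[j] >= r[k] for r in F)]
             for k in range(n)}
    roots = [k for k in range(n) if not preds[k]]
    if len(roots) != 1:
        return []  # this sketch only handles a unique founder mutation
    root = roots[0]
    others = [k for k in range(n) if k != root]
    trees = []
    for choice in product(*(preds[k] for k in others)):
        parent = dict(zip(others, choice))
        if is_rooted_tree(parent, root) and sum_condition_holds(F, parent, n):
            trees.append(parent)
    return trees


one_sample = enumerate_strict_trees([[0.5, 0.25, 0.125]])
two_samples = enumerate_strict_trees([[0.5, 0.375, 0.25],
                                      [0.5, 0.25, 0.125]])
```

With one sample both the branching tree and the chain are returned; adding the second sample rules out the branching tree. The number of parent assignments grows exponentially with the number of mutations, which is why a dedicated enumeration algorithm is needed in practice.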

SLIDE 23

Sources of noise

Pipeline from input to output:

  • Input: mixed cell sample
  • Sequencing: DNA fragments → short reads (e.g. …GATTACA…)
  • Alignment: short reads → aligned reads
  • Variant calling: aligned reads → mutations
  • Output: VAFs (e.g. 0.75, 0.25, 0.50, 0.25, 0.25)

SLIDE 25

Relaxed sum condition

Allow the sum condition to be violated by at most ε:

$$F_{ij} + \varepsilon \ge \sum_{k \text{ child of } j} F_{ik} \quad \forall i \in \{1, \dots, s\}$$
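
The relaxed condition differs from the strict one only by the tolerance ε, as this short sketch shows. The function name, the example matrix, and the value of ε are hypothetical.

```python
def satisfies_relaxed_sum_condition(F, children, eps):
    """Check F[i][j] + eps >= sum of F[i][k] over children k of j, for every sample i."""
    return all(row[j] + eps >= sum(row[k] for k in kids)
               for row in F for j, kids in children.items())


# Hypothetical noisy VAFs: the children of mutation 0 sum to 0.5625 > 0.5.
F = [[0.5, 0.3125, 0.25]]
branching = {0: [1, 2], 1: [], 2: []}

strict_ok = satisfies_relaxed_sum_condition(F, branching, eps=0.0)
relaxed_ok = satisfies_relaxed_sum_condition(F, branching, eps=0.0625)
```

Setting `eps=0.0` recovers the strict sum condition, so the same function covers both cases.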

SLIDE 26

Approximate ancestry graph

[Figure: sample ⇒ Illumina ⇒ approximate AG with edge weights 0.99, 0.87, 0.99, 0.34, 0.73, 0.01; caption fragment: "Goal: Find likely clonal tr…"]

  1. Complete weighted digraph
  2. Posterior probability of ancestry: beta-binomial model (El-Kebir et al. 2015)
  3. Enumerate spanning trees in weight order (Camerini et al. 1980)
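
Steps 1 and 3 can be illustrated with a brute-force ranking on a tiny complete digraph. This sketch does not implement the algorithm of Camerini et al. (1980) or the beta-binomial model: the edge weights are supplied as hypothetical inputs, the root is fixed by the caller, and scoring a tree by the product of its edge weights is an assumption.

```python
from itertools import product
from math import prod


def _reaches(parent, v, root):
    """True if following parent pointers from v reaches root without a cycle."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = parent[v]
    return True


def ranked_spanning_trees(W, root=0):
    """Return (weight, parent_map) pairs for all spanning trees, heaviest first.

    W[u][v] is the weight of edge u -> v in a complete digraph.
    Assumption: a tree's weight is the product of its edge weights.
    """
    n = len(W)
    others = [v for v in range(n) if v != root]
    candidates = [[u for u in range(n) if u != v] for v in others]
    ranked = []
    for choice in product(*candidates):
        parent = dict(zip(others, choice))
        if all(_reaches(parent, v, root) for v in others):
            ranked.append((prod(W[parent[v]][v] for v in others), parent))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Hypothetical ancestry probabilities for 3 mutations.
W = [[0.0, 0.9, 0.8],
     [0.1, 0.0, 0.6],
     [0.2, 0.4, 0.0]]
ranking = ranked_spanning_trees(W, root=0)
```

Trees come out in decreasing weight, so a caller can stop after the first few and treat them as the most likely candidates.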

SLIDE 30

Partial transitive reduction

Goal: simplify ancestry graph

[Figure: example graph with 2-transitive and 3-transitive edges marked, and its 3-PTR]

k-PTR

Subgraph resulting from removing every edge that is k'-transitive for some k' ≥ k.
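
The k-PTR definition can be sketched as follows. The slide does not define "k-transitive"; the code assumes an edge (u, v) is k-transitive when the graph also contains a path from u to v made of k edges, so an edge is removed when the longest u-to-v path has at least k edges. It also assumes the ancestry graph is acyclic. Names and the example graph are hypothetical.

```python
from functools import lru_cache


def k_ptr(edges, k):
    """Return the edges kept by the k-partial transitive reduction of a DAG."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())

    @lru_cache(maxsize=None)
    def longest(u, v):
        """Number of edges on the longest path u -> v, or -1 if v is unreachable."""
        if u == v:
            return 0
        best = -1
        for w in succ[u]:
            rest = longest(w, v)
            if rest >= 0:
                best = max(best, rest + 1)
        return best

    # Keep an edge only if every u -> v path is shorter than k edges.
    return {(u, v) for u, v in edges if longest(u, v) < k}


# Hypothetical ancestry graph on 4 mutations: the chain 0 -> 1 -> 2 -> 3
# plus every shortcut edge.
edges = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)}
ptr3 = k_ptr(edges, 3)
ptr2 = k_ptr(edges, 2)
```

Under this reading, the 3-PTR drops only the edge (0, 3), while the 2-PTR is the full transitive reduction and keeps just the chain.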

SLIDE 32

Outline

  1. Background
  2. Methods
  3. Results
       • Simulated data
       • Real data
       • Conclusions

SLIDE 33

Simulated data: solution existence

[Figure: four panels plotting Proportion of Trials (0.0 to 1.0) against Number of Mutations (2 to 12), Number of Samples (2 to 16), Coverage (60 to 180), and Overdispersion Parameter (0.02 to 0.08)]

Defaults:

  • 10 mutation clusters
  • 5 samples
  • 60× coverage
  • No overdispersion

SLIDE 34

Simulated data: solution quality

[Figure: four panels plotting A-D Distance Improvement (0.0 to 1.0) against Number of Mutations (2 to 12), Number of Samples (2 to 16), Coverage (60 to 180), and Overdispersion Parameter (0.02 to 0.08)]

Defaults:

  • 10 mutation clusters
  • 5 samples
  • 60× coverage
  • No overdispersion

Quality measure: ancestor-descendant distance (Govek et al. 2018)
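
The slide cites the ancestor-descendant (A-D) distance without defining it. The sketch below assumes the measure counts ordered mutation pairs that are in an ancestor-descendant relationship in one tree but not in the other, and it simplifies to one mutation per tree node. It does not compute the "A-D Distance Improvement" plotted above, whose definition is not given here. Names and trees are hypothetical.

```python
def ancestor_descendant_pairs(parent):
    """Return all (ancestor, descendant) pairs of a tree given as {node: parent or None}."""
    pairs = set()
    for node in parent:
        anc = parent[node]
        while anc is not None:
            pairs.add((anc, node))
            anc = parent[anc]
    return pairs


def ad_distance(parent_a, parent_b):
    """Size of the symmetric difference between the two trees' ancestor-descendant pairs."""
    return len(ancestor_descendant_pairs(parent_a) ^ ancestor_descendant_pairs(parent_b))


chain = {0: None, 1: 0, 2: 1}       # 0 -> 1 -> 2
branching = {0: None, 1: 0, 2: 0}   # 0 is the parent of both 1 and 2
distance = ad_distance(chain, branching)
```

The two trees differ only in whether mutation 1 is an ancestor of mutation 2, so they are one pair apart.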

SLIDE 35

Simulated data: approximate vs strict

[Figure: Mean A-D Improvement (0.0 to 1.0) against Number of Mutations (2 to 12), with one curve each for strict and approximate]

SLIDE 36

Simulated data: PTR

[Figure: two panels against Transitive Removal Threshold (2, 3, 4, 5, 6, None): number of AG Spanning Trees (log scale, 10^1 to 10^6; mean and max), and A-D Distance Improvement (0.0 to 1.0)]

SLIDE 37

Real data

Chronic lymphocytic leukemia (Schuh et al. 2012)

  • 3 patients (CLL003, CLL006, CLL077)
  • 5 samples each, spaced over time
  • WGS (40× coverage) and deep sequencing (100000× coverage)

Clear cell renal carcinoma (Gerlinger et al. 2014)

  • 8 patients (EV003, EV005, EV006, EV007, RK26, RMH002, RMH004, RMH008)
  • 5-11 samples from different regions
  • Amplicon sequencing (> 400× coverage)

SLIDE 39

Real data: strict solution rarity

Patient        Samples  Mutations [1]  # Clusters  |T(G_F)|
CLL003 (deep)  5        15/20          4
CLL003 (WGS)   5        13/30          4
CLL006 (deep)  5        5/10           5           2
CLL006 (WGS)   5        6/16           5
CLL077 (deep)  5        12/16          4           1
CLL077 (WGS)   5        16/20          4
EV003          8        12/16          4, 5, 6
EV005          7        61/64          5, 6
EV006          9        52/57          5
EV007          8        54/56          4, 5
RK26           11       62/62          4, 5, 6
RMH002         5        48/48          5, 6
RMH004         6        126/126        5, 6
RMH008         8        69/71          5, 6

[1] After/before filtering out mutations with VAF above 0.5.

SLIDE 41

CLL077

[Figure: Variant Frequency (0.0 to 0.5) per Sample (1 to 5), one panel for 100000× coverage and one for 40× coverage]

[Figure: two clonal trees, with mutation clusters and edge values listed in extraction order]

  • Tree 1 clusters: LRRC16A | DAZAP1, EXOC6B, GHDC, OCA2, PLA2G16 | COL24A1, HMCN1, KLHDC2, MAP2K1, NOD1 | SLC12A1; edge values 0.999, 0.999, 0.999
  • Tree 2 clusters: DAZAP1, EXOC6B, GHDC, PLA2G16 | LRRC16A | GPR158, OCA2, SLC12A1 | COL24A1, DDX1, HMCN1, KLHDC2, MAP2K1, NOD1, ZFHX4, ZNF566; edge values 0.637, 0.435, 0.647

Annotations explaining the differences between the trees:

  • Filtered for high VAF in deep data
  • Clustered differently by k-means
  • Not present in deep sequencing

SLIDE 46

Conclusions

  1. Strict ISA-based trees are rare in simulated and real data
  2. Overdispersion makes solutions rarer, but not worse
  3. Approximate AG and relaxed sum condition increase robustness
  4. PTR simplifies AG with minor quality impact (skews topology)
  5. Approximate AG outperforms strict for few mutations and vice versa

SLIDE 47

Acknowledgment

This project is supported by NSF CRII award IIS-1657380 and by Elledge, Eugster, and Class of ’49 Fellowships from Carleton College (to LO). Thanks to Zach DiNardo, Thais Del Rosario Hernandez, and Rosa Zhou for helpful conversations. Special thanks to Layla Oesper for her mentorship, support, and feedback.