CIMICE: Markov Chain Inference Method to Identify Cancer Evolution



slide-1
SLIDE 1

Motivation Methods Results

CIMICE: Markov Chain Inference Method to Identify Cancer Evolution

Nicolò Rossi, Nicola Gigante, Carla Piazza, Nicola Vitacolonna

Dip. di Scienze Matematiche, Informatiche e Fisiche - UniUD

slide-2
SLIDE 2

Premises

The Context

◮ Investigation of the mutational history of a cancer cell
◮ Relying on single-cell data at a single time instant
◮ Using a probabilistic model for the reconstruction

The Aim

◮ Identification of a minimal set of assumptions on models/data
◮ Detection of the sources of uncertainty in the reconstruction
◮ Provision of suggestions for further experiments

slide-4
SLIDE 4

Our Results

Model Reconstruction

We find a minimal set of assumptions such that:

◮ without convergent evolutionary paths
  • there is one underlying probabilistic model
  • CIMICE infers it efficiently w.r.t. the data size

◮ with convergent evolutionary paths
  • there is an infinite set of possible models
  • CIMICE heuristically assigns weights to convergences to pick a preferred one

slide-7
SLIDE 7

Our Results

Generative Models

Whenever our biological assumptions are reasonable, CIMICE can produce synthetic data to:

◮ generate more data from an inferred model
◮ test different model inference methods

[Figure: example Markov chain with labeled edge probabilities]

slide-8
SLIDE 8

Plan of the Talk

  1. Single Cell Data
  2. From Biological Assumptions to Models
  3. Two Reconstruction Problems
  4. Our Inference Algorithm
  5. CIMICE Tool
  6. Synthetic Models and Tests
  7. Real Data
  8. Conclusion

slide-9
SLIDE 9

Single Cell Data

slide-11
SLIDE 11

From Biological Assumptions to Models

Cancer Progression Markov Chains (CPMC)

Biological Assumptions

  • ∪ : Along cancer progression, cells accumulate gene mutations
  • ∅ : Initially there are no mutated genes
  • MC : New mutations depend probabilistically only on the current genotype
  • ▽/ : Each time, a minimal number of mutations is acquired

Models: CPMC

  • The states are the genotypes
  • The healthy genotype is the source
  • It is a Discrete Time Markov Chain
  • It is acyclic and anti-transitive

[Figure: example CPMC with labeled edge probabilities]
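The CPMC properties listed on this slide can be checked mechanically. The Python sketch below is illustrative only: the dictionary representation, the name `is_cpmc`, and the reading of "anti-transitive" as "no edge S → U whenever S → T and T → U are edges" are assumptions, not taken from the CIMICE code.

```python
from math import isclose

# Hypothetical representation: a genotype is a frozenset of mutated genes,
# and edges[S][T] is the probability of moving from genotype S to genotype T.


def is_cpmc(edges, root=frozenset()):
    """Check the structural CPMC properties (as interpreted in this sketch)."""
    states = set(edges) | {t for succ in edges.values() for t in succ}
    # The source must be the healthy (empty) genotype and must be present.
    if root not in states or root:
        return False
    for s, succ in edges.items():
        # Outgoing probabilities of a non-absorbing state must sum to 1.
        if succ and not isclose(sum(succ.values()), 1.0):
            return False
        for t in succ:
            # Mutations only accumulate: T strictly extends S,
            # which also rules out cycles.
            if not s < t:
                return False
            # Anti-transitivity: no shortcut S -> U when S -> T -> U exists.
            if any(u in succ for u in edges.get(t, {})):
                return False
    return True


EMPTY = frozenset()
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")

valid_chain = {EMPTY: {A: 0.2, B: 0.8}, A: {AB: 1.0}, B: {AB: 1.0}}
shortcut_chain = {EMPTY: {A: 0.5, AB: 0.5}, A: {AB: 1.0}}
```

Here `valid_chain` passes the check, while `shortcut_chain` fails it because the edge from the healthy genotype to `AB` skips over `A`.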

slide-17
SLIDE 17

What’s new?

Main differences w.r.t. literature

We propose a method that:

◮ refers to single cell DNA-seq data
◮ assumes clean and rich data
◮ points out where more knowledge is needed
◮ exploits a deterministic approach, which can be extended to include statistical methods and expert knowledge
◮ is patient-driven and suitable to study treatment effects

slide-18
SLIDE 18

Two Reconstruction Problems

Single Time Input:

D_i, a single-cell data matrix

Many Time Input:

D_0, D_1, ..., D_t, a time sequence of matrices

⇓

Output:

A CPMC whose simulation would generate the data.
In such a CPMC, time ticks at each mutational event (no self-loops).

Remark

◮ It is not a standard Markov Chain reconstruction problem
◮ There can be infinitely many solutions
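As a concrete picture of the single-time input, the following sketch turns a binary cell-by-gene matrix into observed genotype frequencies. The matrix layout (rows are cells, columns are genes), the gene names, and the function name are illustrative assumptions; the slides do not specify the input format.

```python
from collections import Counter


def genotype_frequencies(matrix, genes):
    """Map each observed genotype (set of mutated genes) to its relative frequency."""
    counts = Counter(
        frozenset(gene for gene, bit in zip(genes, row) if bit)
        for row in matrix
    )
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Toy input: four cells, two genes (1 = mutated, 0 = not mutated).
genes = ["A", "B"]
D = [
    [0, 0],
    [1, 0],
    [1, 0],
    [1, 1],
]
freqs = genotype_frequencies(D, genes)
```

Each key of `freqs` is a genotype, so the result can serve as the node set of a candidate chain.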

slide-20
SLIDE 20

Our Inference Algorithm

Fundamentals

◮ the topology is directly induced by ∪ and ▽/
◮ without convergences, the probabilities can be directly computed thanks to the topology, MC, and Bayes’ theorem
◮ with convergent paths, we exploit heuristics to estimate backward probabilities

Convergency Ambiguities

[Figure: a mutation matrix with samples s1 to s4 (s1 and s2 carry one mutation each, s3 and s4 carry two) ⇒ a graph whose edge probabilities are unknown (??)]
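The convergence-free case can be sketched in a few lines. This is one plausible reading of the slides, not the authors' implementation: the topology links each genotype to its minimal observed supersets, and the weight of an edge (S, T) is taken proportional to the observed frequency of T plus that of every genotype below T. All names are hypothetical.

```python
def build_topology(genotypes):
    """Edges S -> T where T strictly extends S with no observed genotype in between."""
    nodes = set(genotypes) | {frozenset()}  # the healthy genotype is the source
    return {
        s: [t for t in nodes if s < t and not any(s < u < t for u in nodes)]
        for s in nodes
    }


def edge_probabilities(freqs):
    """Edge weights for a convergence-free (tree-shaped) topology."""
    children = build_topology(freqs)
    parents = {}
    for s, succ in children.items():
        for t in succ:
            parents.setdefault(t, []).append(s)
    if any(len(p) > 1 for p in parents.values()):
        raise ValueError("convergent paths: a heuristic would be needed here")

    def mass(t):
        # Cells observed in t plus cells observed in any genotype below it.
        return freqs.get(t, 0.0) + sum(mass(c) for c in children[t])

    probs = {}
    for s, succ in children.items():
        total = sum(mass(t) for t in succ)
        for t in succ:
            probs[(s, t)] = mass(t) / total
    return probs


EMPTY = frozenset()
A, B, AC = frozenset("a"), frozenset("b"), frozenset("ac")
tree_freqs = {EMPTY: 0.1, A: 0.2, B: 0.3, AC: 0.4}
tree_probs = edge_probabilities(tree_freqs)
```

With convergent paths (for example genotypes {a}, {b}, and {a, b}), the function raises an error instead of guessing, since the split of probability between the incoming edges is exactly the ambiguity pictured on the slide.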

slide-22
SLIDE 22

CIMICE Tool

[Figure: CIMICE pipeline. Input: a mutation matrix with samples s1 to s9. Elaboration: probabilities computation and confluences resolution. Output: a graph rooted at ∅ with labeled edge probabilities.]

https://github.com/redsnic/tumorEvolutionWithMarkovChains

slide-23
SLIDE 23

Synthetic Models and Tests

Random walks on a given CPMC can be used to either:

◮ generate more data on a specific case, or
◮ generate the genotypes of an “artificial” dataset and then test our methodology
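A random walk on a CPMC is straightforward to sketch. The code below is a minimal illustration under stated assumptions: the chain is a dictionary `edges[S][T]` of probabilities without self-loops, each synthetic cell takes a uniformly random number of mutational steps, and all function names are hypothetical rather than part of the CIMICE tool.

```python
import random


def random_walk(edges, steps, rng, root=frozenset()):
    """Walk from the healthy genotype for at most `steps` mutational events."""
    state = root
    for _ in range(steps):
        succ = edges.get(state)
        if not succ:  # absorbing genotype: the model allows no further mutation
            break
        targets = list(succ)
        state = rng.choices(targets, weights=[succ[t] for t in targets])[0]
    return state


def synthetic_dataset(edges, n_cells, max_steps, seed=0):
    """Sample one final genotype per cell; the step count per cell is an assumption."""
    rng = random.Random(seed)
    return [
        random_walk(edges, rng.randint(0, max_steps), rng)
        for _ in range(n_cells)
    ]


EMPTY = frozenset()
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
chain = {EMPTY: {A: 0.2, B: 0.8}, A: {AB: 1.0}, B: {AB: 1.0}}
cells = synthetic_dataset(chain, n_cells=50, max_steps=3)
```

Feeding `cells` back into an inference method and comparing the result with `chain` is the kind of test the slide describes.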

slide-24
SLIDE 24

Real Data

Present situation

◮ small single cell datasets are available
◮ in Ogundijo and Wang, BMC Bioinf. 20:6 (2019), a method for inferring genotype proportions is described
◮ their output is a suitable input for us

Chronic Lymphocytic Leukemia - Patient CLL077

[Figure: three inferred chains rooted at ∅, one per time point, with edge probabilities
  • Before treatments: 1, 0.973, 0.027, 1
  • Before ofatumumab: 1, 0.831, 0.169, 1
  • Nine months after treatment: 1, 0.429, 0.571, 1]

Drugs: chlorambucil, fludarabine and cyclophosphamide, ofatumumab

slide-25
SLIDE 25

Conclusions

◮ We proposed a method for tumor phylogeny reconstruction using DTMCs
◮ The reconstruction algorithm is time efficient, can easily be enhanced with statistical methods, and is suitable for very large datasets
◮ We are planning to include treatment information directly in the model
◮ It is possible to use the artificial data produced by CIMICE to test similar tools

slide-26
SLIDE 26

Related Works

◮ The evolution of tumour phylogenetics: principles and practice
  • R. Schwartz and A. A. Schäffer, Nature Reviews Genetics 18:213-229, 2017
◮ Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data
  • D. Ramazzotti, A. Graudenzi, L. De Sano, M. Antoniotti, and G. Caravagna, BMC Bioinformatics 20:210, 2019
◮ SeqClone: sequential Monte Carlo based inference of tumor subclones
  • O. E. Ogundijo and X. Wang, BMC Bioinformatics 20:6, 2019

slide-27
SLIDE 27

Appendix

Infinitely Many Solutions

[Figure: Markov chains with differing edge probabilities (topology not recoverable from the extraction)]

Data at successive time points (zero entries omitted):

  • D1: 0.2 0.4 0.4
  • D2: 0.04 0.08 0.08 0.28 0.12 0.4
  • D3: 0.008 0.016 0.016 0.336 0.144 0.48
  • D4: 0.0016 0.0032 0.0032 0.3472 0.1488 0.496
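The rows D1 to D4 are consistent with pushing a probability distribution through a chain with self-loops, one step per row. The sketch below builds two hypothetical six-state chains that both reproduce those rows, which illustrates why the data alone cannot single out one model. The state names and both topologies are assumptions chosen to match the numbers; they are not claimed to be the chains drawn on the slide.

```python
def step(dist, chain):
    """One time step: push the probability mass of each state along its edges."""
    out = {}
    for s, p in dist.items():
        for t, q in chain[s].items():
            out[t] = out.get(t, 0.0) + p * q
    return out


def distributions(chain, k, start="0"):
    """Return the state distributions after 1, 2, ..., k steps from `start`."""
    dist, history = {start: 1.0}, []
    for _ in range(k):
        dist = step(dist, chain)
        history.append(dist)
    return history


# "0" stands for the healthy genotype; C, D, E are absorbing (self-loop 1).
absorbing = {s: {s: 1.0} for s in "CDE"}
root_row = {"0": 0.2, "A": 0.4, "B": 0.4}

# Hypothetical chain without convergent paths.
chain_1 = {"0": root_row, "A": {"C": 0.7, "D": 0.3}, "B": {"E": 1.0}, **absorbing}

# Hypothetical chain with convergent paths into D and E.
chain_2 = {
    "0": root_row,
    "A": {"C": 0.7, "D": 0.2, "E": 0.1},
    "B": {"D": 0.1, "E": 0.9},
    **absorbing,
}

history_1 = distributions(chain_1, 4)
history_2 = distributions(chain_2, 4)
```

Both histories agree at every step (up to floating-point rounding), even though the two chains assign different probabilities to their edges.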

slide-28
SLIDE 28

Appendix

Technical Details

◮ the data are P[X(k) = S] for each genotype S
◮ k is unknown but large enough
◮ the probability of the edge (S, T) on the chain with self-loops,

\[ P(S, T) = P[X(k) = T \mid X(k-1) = S], \]

does not depend on k

◮ we are interested in the probability of the edge (S, T) on the chain without self-loops,

\[ JP(S, T) = \frac{P(S, T)}{\mathrm{norm}(S)} \]
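Passing from the chain with self-loops to the chain without them can be sketched as follows. The slide does not define norm(S); this sketch assumes the usual jump-chain normalization norm(S) = 1 − P(S, S), and the function name is hypothetical.

```python
def remove_self_loops(chain):
    """Jump chain: condition each step on actually leaving the current state."""
    jump = {}
    for s, succ in chain.items():
        norm = 1.0 - succ.get(s, 0.0)  # assumed meaning of norm(S)
        if norm == 0.0:
            # Absorbing state: no outgoing edge survives.
            jump[s] = {}
            continue
        jump[s] = {t: p / norm for t, p in succ.items() if t != s}
    return jump


# Toy chain with a self-loop on the root and one absorbing state.
with_loops = {
    "0": {"0": 0.2, "A": 0.4, "B": 0.4},
    "A": {"A": 1.0},
}
jump_chain = remove_self_loops(with_loops)
```

In `jump_chain`, the two edges leaving the root are rescaled so that they sum to 1, and the absorbing state keeps no outgoing edges.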

slide-29
SLIDE 29

Appendix

Technical Details

◮ by acyclicity and anti-transitivity of CPMCs

\[ P(S, T) = \frac{\sum_{i=1}^{k} P[f_i(T)] \cdot P[X(i-1) = S \mid f_i(T)]}{\sum_{i=0}^{k-1} P[X(i) = S]} \]

◮ which without self-loops becomes

\[ JP(S, T) = \frac{\sum_{i=1}^{k} P[f_i(T)] \cdot P[X(i-1) = S \mid f_i(T)]}{\mathrm{norm}(S)} \]

◮ in the case without convergences, P[X(i − 1) = S | f_i(T)] = 1 and

\[ \sum_{i=1}^{k} P[f_i(T)] = P[X(\le k) = T] = P[X(k) = T] + \sum_{Y \in \mathrm{Adj}^{-}[T]} P[X(\le k) = Y] \]

slide-30
SLIDE 30

Appendix

Technical Details

◮ in the case with convergences, the heuristics estimate P[X(i − 1) = S | f_i(T)] and P[X(≤ k) = T]