Development of statistical methods for DNA copy number analysis in cancerology


Development of statistical methods for DNA copy number analysis in cancerology. Morgane Pierre-Jean. Supervisors: Catherine Matias and Pierre Neuvial. Laboratoire de Mathématique et de Modélisation d’Evry, LaMME. December 2nd, 2016.


SLIDE 1

Development of statistical methods for DNA copy number analysis in cancerology

Morgane Pierre-Jean

Supervisors : Catherine Matias and Pierre Neuvial

Laboratoire de Mathématique et de Modélisation d’Evry, LaMME

December 2nd, 2016

SLIDE 2

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion

Outline

1. Introduction
2. Segmentation
3. Heterogeneity Model
4. Simulations
5. Application to real data sets
6. Conclusion

Morgane Pierre-Jean Development of statistical methods for DNA copy number data 2/ 55

SLIDE 3

Outline

1. Introduction
  • Alterations in tumor cells
  • Notion of Heterogeneity
2. Segmentation
3. Heterogeneity Model
4. Simulations
5. Application to real data sets
6. Conclusion

SLIDE 4

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Alterations in tumor cells

Objectives

Alterations in tumor cells can be observed at several levels:

  • Gene expression
  • DNA structure
  • Mutations
  • DNA copy number

Why study genetic alterations in cancers?

  • Help with diagnosis
  • Identify biomarkers linked to drug resistance
  • Personalized treatments

SLIDE 6

Illustration of alterations at the level of DNA copy number

[Figure: parental chromosomes and SNP genotypes for a matched normal (diploid), a tumor with gain, a tumor with deletion, and a tumor with copy-neutral LOH]

SLIDE 7

Human Karyotype

[Figure: (a) Normal cell (b) Tumor cell]

SLIDE 8

How to measure DNA copy number more precisely?

  • CGH arrays (measuring total DNA copy number)
  • SNP arrays (measuring the quantity of each allele at predefined SNPs)
  • Sequencing technologies (WGS or WES)

SLIDE 9

What kind of signals from SNP arrays?

Total copy number: $c_j = N^A_j + N^B_j$

B allele fraction: $b_j = N^B_j / c_j$
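The two signals defined on this slide can be illustrated with a minimal Python sketch. It assumes that N^A_j and N^B_j are the numbers of copies of alleles A and B at SNP j; the function name and the example genotypes are hypothetical and chosen only to mirror the normal, gain, deletion and copy-neutral LOH situations shown earlier.

```python
def tcn_and_baf(n_a, n_b):
    """Return (c_j, b_j): total copy number and B allele fraction for one SNP."""
    c = n_a + n_b
    # The B allele fraction is undefined when no copy is left.
    b = n_b / c if c > 0 else float("nan")
    return c, b


# Hypothetical (N_A, N_B) pairs for four typical situations.
examples = {
    "normal AB": (1, 1),
    "gain ABB": (1, 2),
    "deletion (B only)": (0, 1),
    "copy-neutral LOH AA": (2, 0),
}

for label, (n_a, n_b) in examples.items():
    c, b = tcn_and_baf(n_a, n_b)
    print(f"{label}: c = {c}, b = {b:.2f}")
```

Note that a gain and a copy-neutral LOH are only distinguishable from a normal region when both signals are read together: the LOH leaves c unchanged while b moves away from 1/2.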

SLIDE 11

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Notion of Heterogeneity

Notion of heterogeneity in cancers

  • Differences between tumors of the same disease in different patients (inter-tumor heterogeneity)
  • Differences between cancer cells within a single tumor of one patient (intra-tumor heterogeneity)

SLIDE 12

Heterogeneity illustration

(a) Tumor sample (b) Copy-number profile

[Figure: observed profile = 0.6 × (latent profile 1) + 0.2 × (latent profile 2) + 0.2 × (latent profile 3)]

SLIDE 13

Heterogeneity illustration

(a) Tumor sample (b) Copy-number profile

[Figure: observed profile = 0.6 × (latent profile 1) + 0 × (latent profile 2) + 0.4 × (latent profile 3)]

SLIDE 14

Mathematical modeling

Let $y_{1\bullet} \in \mathbb{R}^J$ and $y_{2\bullet} \in \mathbb{R}^J$ be the observed DNA copy number profiles:

$y_{1\bullet} = w_{11} z_{1\bullet} + w_{12} z_{2\bullet} + w_{13} z_{3\bullet}$
$y_{2\bullet} = w_{21} z_{1\bullet} + w_{22} z_{2\bullet} + w_{23} z_{3\bullet}$

In the illustration, $(w_{11}, w_{12}, w_{13}) = (0.6, 0.2, 0.2)$ and $(w_{21}, w_{22}, w_{23}) = (0.6, 0, 0.4)$.

Goal: find $w$ and $z$ for the two profiles.

SLIDE 16

General mathematical modeling

Let $y_{i\bullet} \in \mathbb{R}^J$, $i = 1, \dots, n$, be the observed DNA copy number profiles:

$$y_{i\bullet} = \sum_{k=1}^{p} w_{ik} z_{k\bullet} + \epsilon$$

The latent profiles are assumed to be shared between the observed profiles.

Minimize

$$\sum_{i=1}^{n} \Big\| y_{i\bullet} - \sum_{k=1}^{p} w_{ik} z_{k\bullet} \Big\|^2$$

under some constraints.
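As a small numerical illustration of the mixture model, the following Python sketch builds two noise-free observed profiles from three latent profiles. The latent profiles and the function name are hypothetical; only the weights (0.6, 0.2, 0.2) and (0.6, 0, 0.4) come from the heterogeneity illustration.

```python
def mix_profiles(weights, latent):
    """Return y_i = sum_k w_ik * z_k for one sample (noise term omitted)."""
    n_loci = len(latent[0])
    return [sum(w * z[j] for w, z in zip(weights, latent)) for j in range(n_loci)]


# Hypothetical latent copy-number profiles over 6 loci.
latent = [
    [2, 2, 2, 2, 2, 2],  # z_1: normal (diploid) profile
    [2, 2, 3, 3, 2, 2],  # z_2: subclone with a gain
    [2, 1, 1, 2, 2, 2],  # z_3: subclone with a loss
]

y1 = mix_profiles([0.6, 0.2, 0.2], latent)
y2 = mix_profiles([0.6, 0.0, 0.4], latent)
print(y1)
print(y2)
```

The inference problem goes in the opposite direction: only y1 and y2 are observed, and both the weights and the latent profiles have to be recovered.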

SLIDE 17

Related works

Matrix factorization problem: $\min_{W, Z} \| Y - WZ \|_F^2$

Penalized latent models to infer heterogeneity:

  • Fused Lasso latent model: FLlat (Nowak et al., 2011)
  • CGH analysis with Dictionary Learning: e-FLlat (Masecchia et al., 2013)
  • Evolutionary history by next-generation sequencing: Canopy (Jiang et al., 2016)

SLIDE 18

InCaSCN: Inferring Cancer Subclone using Copy Number

Features of the method

  • Joint segmentation of all n profiles ⇒ S − 1 breakpoints (Pierre-Jean et al., Briefings in Bioinformatics, 2015)
  • Integration of B allele fraction information by using transformations
  • Biological interpretation of the constraints on the latent profiles of TCN and BAF and on the weight matrix W

SLIDE 19

Outline

1. Introduction
2. Segmentation
  • Models
  • Recursive Binary Segmentation for multiple samples
3. Heterogeneity Model
4. Simulations
5. Application to real data sets
6. Conclusion

SLIDE 20

What is segmentation?

Total copy number: $c_j = N^A_j + N^B_j$

B allele fraction: $b_j = N^B_j / c_j$

Decrease of Heterozygosity: $d_j = 2 \times |b_j - 1/2|$

SLIDE 24

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Models

Segmentation methods

  • Multiple change-point
  • Recursive (joint segmentation)
  • Total variation
  • Hidden Markov Models
  • Kernel methods (change-point detection in whole distribution)

SLIDE 27

A change-point model

Biological assumption: the DNA copy number signal is piecewise constant in the mean.

Statistical model for $S - 1$ change points at $(t_1, \dots, t_{S-1})$:

$$\forall j = 1, \dots, J \quad c_j = \gamma_j + \epsilon_j$$

where $\forall s \in \{1, \dots, S\}$, $\forall j \in [t_{s-1}, t_s[$, $\gamma_j = \Gamma_s$.

Complexity

Challenges: $S$ and $(t_1, \dots, t_{S-1})$ are unknown.

For a fixed $S$, the number of possible partitions is $C_{J-1}^{S-1} = O(J^{S-1})$.
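To give a feeling for why an exhaustive search over partitions is not practical, here is a short Python sketch that evaluates the binomial coefficient for a fixed number of segments S. The function name and the values of J and S are hypothetical and only illustrative.

```python
from math import comb


def n_segmentations(J, S):
    """Number of ways to cut J ordered points into S contiguous segments."""
    # Choosing S - 1 change points among the J - 1 gaps between points.
    return comb(J - 1, S - 1)


for J, S in [(10, 3), (1000, 5), (100000, 10)]:
    print(f"J = {J}, S = {S}: {n_segmentations(J, S)} partitions")
```

The count grows polynomially in J with an exponent that increases with S, which motivates fast approximate strategies such as recursive binary segmentation.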

SLIDE 29

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Recursive Binary Segmentation for multiple samples

Two-step approaches for joint segmentation

Gey and Lebarbier (2008) and Vert and Bleakley (2010)

  • First step: run a fast but approximate segmentation method (RBS)
  • Second step: prune the resulting set of breakpoints using dynamic programming, which is slower but exact

Versatility of RBS

Possibility to have different scales:

  • TCN-DoH segmentation
  • Several TCN signals
  • Several TCN-DoH signals

SLIDE 31

Binary Segmentation

Take the simple case where the dimension is equal to 1 ($d = 1$):

$H_0$: no breakpoint vs $H_1$: exactly one breakpoint

The likelihood ratio statistic is given by $\max_{1 \le j \le J} |Z_j|$, where

$$Z_j = \frac{\frac{S_j}{j} - \frac{S_J - S_j}{J - j}}{\sqrt{\frac{1}{j} + \frac{1}{J - j}}}, \quad (1)$$

and $S_j = \sum_{1 \le t \le j} c_t$.

If $d > 1$, the likelihood ratio statistic becomes $\max_{1 \le j \le J} \| Z_j \|_2^2$.
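The statistic Z_j can be computed for all candidate positions in a single pass using cumulative sums. The Python sketch below is a minimal illustration for d = 1, not the jointseg implementation; the function names and the toy signal are hypothetical. It evaluates j from 1 to J − 1 only, because the denominator J − j vanishes at j = J.

```python
from math import sqrt


def cusum_statistics(c):
    """Return [Z_1, ..., Z_{J-1}] for a one-dimensional signal c."""
    J = len(c)
    total = sum(c)  # S_J
    z, s_j = [], 0.0
    for j in range(1, J):
        s_j += c[j - 1]  # S_j, the sum of the first j observations
        mean_diff = s_j / j - (total - s_j) / (J - j)
        z.append(mean_diff / sqrt(1 / j + 1 / (J - j)))
    return z


def best_breakpoint(c):
    """Return (j, Z_j) for the position j maximizing |Z_j|."""
    z = cusum_statistics(c)
    j_best = max(range(len(z)), key=lambda i: abs(z[i])) + 1
    return j_best, z[j_best - 1]


# Toy TCN signal with a jump from about 2 to about 3 after the 4th point.
signal = [2.0, 2.1, 1.9, 2.0, 3.0, 3.1, 2.9, 3.0]
print(best_breakpoint(signal))
```

Here j is the number of points in the left segment, so the returned position counts how many observations precede the change.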

SLIDE 32

First step: Recursive Binary Segmentation (RBS)

Complexity: $O(dJ \log(S))$

First breakpoint: for each $j$, we compute $Z_j$, then $t_1 = \arg\max_{1 \le j \le J} \| Z_j \|_2^2$

[Figure: TCN signal along position, and the statistic Z along position]

SLIDE 34

First step: Recursive Binary Segmentation (RBS)

Second breakpoint: compute $\max_{1 \le j \le t_1} \| Z_j \|_2^2$ and $\max_{t_1 < j \le J} \| Z_j \|_2^2$

  • Compute the RSE for each segment.
  • Keep the candidate breakpoint whose RSE yields the maximum gain.
  • Add the breakpoint to the active set.

[Figure: the statistic Z along position, within each of the two segments]

SLIDE 36

First step: Recursive Binary Segmentation (RBS)

Third breakpoint: compute $\max_{1 \le j \le t_1} \| Z_j \|_2^2$, $\max_{t_1 < j \le t_2} \| Z_j \|_2^2$ and $\max_{t_2 < j \le J} \| Z_j \|_2^2$

  • Compute the RSE for each segment.
  • Keep the candidate breakpoint whose RSE yields the maximum gain.
  • Add the breakpoint to the active set.

[Figure: the statistic Z along position, within each of the three segments]
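The greedy procedure described on these slides can be sketched in Python as follows. This is a simplified illustration, not the jointseg code: it assumes that the gain of a split is the sum over signals of the squared statistic Z_j computed within the segment, and it does not include the dynamic programming pruning step. All function and variable names are hypothetical.

```python
def best_split(signals, start, end):
    """Best split of the segment [start, end): returns (gain, position) or None."""
    n = end - start
    if n < 2:
        return None
    totals = [sum(s[start:end]) for s in signals]
    partial = [0.0] * len(signals)
    best = None
    for j in range(1, n):
        gain = 0.0
        for k, s in enumerate(signals):
            partial[k] += s[start + j - 1]
            diff = partial[k] / j - (totals[k] - partial[k]) / (n - j)
            gain += diff * diff / (1 / j + 1 / (n - j))  # squared Z_j for signal k
        if best is None or gain > best[0]:
            best = (gain, start + j)
    return best


def recursive_binary_segmentation(signals, n_breakpoints):
    """Greedily add the breakpoint with the largest gain until n_breakpoints are found."""
    J = len(signals[0])
    breakpoints = []
    candidates = {(0, J): best_split(signals, 0, J)}
    while len(breakpoints) < n_breakpoints:
        scored = [(v[0], seg, v[1]) for seg, v in candidates.items() if v is not None]
        if not scored:
            break
        _, (start, end), pos = max(scored)
        breakpoints.append(pos)  # add the breakpoint to the active set
        del candidates[(start, end)]
        # Only the two new segments need a fresh candidate.
        candidates[(start, pos)] = best_split(signals, start, pos)
        candidates[(pos, end)] = best_split(signals, pos, end)
    return sorted(breakpoints)


tcn = [2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0]
print(recursive_binary_segmentation([tcn], 2))
```

A breakpoint position is reported as the number of points to its left. Passing several signals, for example a TCN and a DoH track of the same length, gives a joint segmentation in which a change in any one signal can trigger a breakpoint.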

SLIDE 40

Summary

Contributions to segmentation methods

  • Implementation of a fast joint segmentation followed by a pruning step (jointseg package)
  • Kernel methods (preprint submitted to CSDA)
  • Evaluation of performance (Pierre-Jean et al., Briefings in Bioinformatics, 2015)

SLIDE 41

Outline

1. Introduction
2. Segmentation
3. Heterogeneity Model
  • BAF integration
  • Model
  • Algorithm
  • Model selection
4. Simulations
5. Application to real data sets
6. Conclusion

SLIDE 42

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion BAF integration

Integrating BAF through parental copy numbers

What is parental copy number?

$d_j = 2 |b_j - 1/2|$ for AB SNPs

Minor copy number: $c^1_j = c_j (1 - d_j) / 2$

Major copy number: $c^2_j = c_j (1 + d_j) / 2$
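The transformation from (c_j, b_j) to minor and major copy numbers is a direct application of the three formulas above. A minimal Python sketch, with a hypothetical function name and illustrative input values:

```python
def parental_copy_numbers(c, b):
    """Return (d, minor, major) from total copy number c and B allele fraction b."""
    d = 2 * abs(b - 0.5)     # decrease of heterozygosity
    minor = c * (1 - d) / 2  # c^1_j
    major = c * (1 + d) / 2  # c^2_j
    return d, minor, major


# Illustrative values for SNPs that are heterozygous (AB) in the germline.
for label, c, b in [("normal", 2, 0.5), ("gain ABB", 3, 2 / 3), ("copy-neutral LOH", 2, 0.0)]:
    d, minor, major = parental_copy_numbers(c, b)
    print(f"{label}: d = {d:.2f}, minor = {minor:.2f}, major = {major:.2f}")
```

By construction the minor and major copy numbers always sum to the total copy number c.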

SLIDE 43

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Model

Model on parental copy number

$$\min_{W, Z^1, Z^2} \; \| Y^1 - W Z^1 \|_F^2 + \lambda_1 \sum_{k=1}^{p} \sum_{s=1}^{S-1} | z^1_{k,s+1} - z^1_{k,s} | + \| Y^2 - W Z^2 \|_F^2 + \lambda_2 \sum_{k=1}^{p} \sum_{s=1}^{S-1} | z^2_{k,s+1} - z^2_{k,s} | \quad (2)$$

subject to $w_{i\bullet} \in \Delta_p$, where

$$\Delta_p = \Big\{ w \in \mathbb{R}^p \ \text{s.t.}\ w \ge 0 \ \text{and}\ \sum_{k=1}^{p} w_k = 1 \Big\}$$

SLIDE 44

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Algorithm

Final algorithm

Algorithm 1 Find weights and latent profiles

1: Parameters: $\lambda_1$, $\lambda_2$ and $p$
2: INIT: matrices $Y \in \mathbb{R}^{n \times S}$, $Y^1 \in \mathbb{R}^{n \times S}$ and $Y^2 \in \mathbb{R}^{n \times S}$, and matrices $Z^1_0$ and $Z^2_0 \in \mathbb{R}^{p \times S}$
3: for $l = 0, 1, 2, \dots$ do
4:   Minimize in $W$ with $Z^1_l$ and $Z^2_l$ fixed
5:   Minimize in $Z^1$ with $W_l$ fixed
6:   Minimize in $Z^2$ with $W_l$ fixed
7:   $W_l$, $Z^1_l$ and $Z^2_l$ are updated
8:   Check if $\| W_{l-1} - W_l \|_2^2 < \epsilon$ or maxit is reached
9: end for

SLIDE 46

Solving 4: Inference of W

  • The weights of each patient can be treated independently.
  • Solve n least-squares problems with an equality constraint plus inequality constraints for the non-negativity of the coefficients.
  • This is a linear inverse problem that can be solved in R with the package limSolve.
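To make the constrained least-squares step concrete, here is a Python sketch for a single sample. It deliberately swaps in a different technique from the one named on the slide: projected gradient descent with a Euclidean projection onto the simplex, rather than the solver from limSolve. The function names, the step-size rule, the iteration count and the toy latent profiles are all assumptions made for the illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0 and sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold of the last index that satisfies the condition
    return [max(x - theta, 0.0) for x in v]


def fit_weights(y, latent, n_iter=5000):
    """Minimize ||y - w Z||^2 over the simplex for one sample (projected gradient)."""
    p, S = len(latent), len(y)
    # trace(Z Z^T) bounds the largest eigenvalue, so this step size is conservative.
    step = 1.0 / (2.0 * sum(z_s * z_s for z in latent for z_s in z))
    w = [1.0 / p] * p
    for _ in range(n_iter):
        residual = [sum(w[k] * latent[k][s] for k in range(p)) - y[s] for s in range(S)]
        grad = [2.0 * sum(residual[s] * latent[k][s] for s in range(S)) for k in range(p)]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(p)])
    return w


# Hypothetical latent profiles and a sample mixed with weights (0.6, 0, 0.4).
latent = [
    [2, 2, 2, 2, 2, 2],
    [2, 2, 3, 3, 2, 2],
    [2, 1, 1, 2, 2, 2],
]
y = [0.6 * a + 0.4 * c for a, c in zip(latent[0], latent[2])]
print([round(w_k, 3) for w_k in fit_weights(y, latent)])
```

Because each sample has its own weight vector, the same function can simply be called once per patient, which mirrors the independence noted in the first bullet.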

SLIDE 47

Solving 5 and 6: Inference of latent profiles

  • For a fixed W, the problem splits into two independent LASSO problems in $(Z^1, Z^2)$.
  • Use matrix algebra and properties of the vectorization operator.
  • Obtain a LASSO problem that can be solved in R with the package glmnet.

SLIDE 48

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Model selection

Choice of $\lambda_1$ and $\lambda_2$ values when $p$ is fixed

Use a BIC criterion. We seek to minimize:

$$(nS) \times \log\left( \frac{\| Y - \hat{W} \hat{Z} \|_F^2}{nS} \right) + k(Z) \log(nS)$$

where $k(Z)$ is the number of breakpoints. This criterion helps to strike a balance between over-fit and under-fit models.
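The criterion can be evaluated directly from a fitted pair of matrices. The Python sketch below is a hedged illustration: it assumes that k(Z) counts the non-zero jumps between consecutive segments of the latent profiles, and the function names and toy matrices are hypothetical.

```python
from math import log


def count_breakpoints(Z):
    """Number of non-zero jumps between consecutive segments, over all latent profiles."""
    return sum(1 for z in Z for a, b in zip(z, z[1:]) if a != b)


def bic(Y, W, Z):
    """(nS) * log(||Y - W Z||_F^2 / (nS)) + k(Z) * log(nS)."""
    n, S, p = len(Y), len(Y[0]), len(Z)
    rss = sum(
        (Y[i][s] - sum(W[i][k] * Z[k][s] for k in range(p))) ** 2
        for i in range(n)
        for s in range(S)
    )
    # log(0) is undefined, so an exact fit (rss == 0) raises a ValueError here.
    return n * S * log(rss / (n * S)) + count_breakpoints(Z) * log(n * S)


Y = [[2.1, 2.0, 2.9, 3.1], [1.9, 2.0, 2.4, 2.6]]
W = [[1.0, 0.0], [0.5, 0.5]]
Z = [[2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 2.0, 2.0]]
print(bic(Y, W, Z))
```

In a model-selection loop, this value would be computed for each candidate pair of penalty values and the pair with the smallest criterion retained.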

SLIDE 49

Choice of p

Use the percentage of variation explained (PVE) for each $p$, where the PVE is defined as:

$$\mathrm{PVE}_p = 1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{S} \big( y_{ij} - \sum_{k=1}^{p} \hat{w}_{ik} \hat{z}_{kj} \big)^2}{\sum_{i=1}^{n} \sum_{j=1}^{S} ( y_{ij} - \bar{y}_i )^2}$$

where $\bar{y}_i = \frac{\sum_{j=1}^{S} y_{ij}}{S}$.
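The PVE formula translates into a few lines of Python. This is a minimal sketch with hypothetical names and toy inputs; it assumes that at least one observed profile is not constant, so that the denominator is non-zero.

```python
def pve(Y, W, Z):
    """Percentage of variation explained by the fitted model W Z."""
    n, S, p = len(Y), len(Y[0]), len(Z)
    rss, tss = 0.0, 0.0
    for i in range(n):
        row_mean = sum(Y[i]) / S  # mean of the i-th observed profile
        for j in range(S):
            fitted = sum(W[i][k] * Z[k][j] for k in range(p))
            rss += (Y[i][j] - fitted) ** 2
            tss += (Y[i][j] - row_mean) ** 2
    return 1.0 - rss / tss


Y = [[2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 2.5, 2.5]]
W = [[1.0, 0.0], [0.5, 0.5]]
Z = [[2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 2.0, 2.0]]
print(pve(Y, W, Z))
```

Computing this value for increasing p and looking for the point where it stops improving noticeably is one common way to read such a curve.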

SLIDE 50

Outline

1. Introduction
2. Segmentation
3. Heterogeneity Model
4. Simulations
  • Generating data with known truth
  • Framework
5. Application to real data sets
6. Conclusion

SLIDE 51

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Generating data with known truth

Proposed approach

Step 1 - Annotate a real data set

  • Loss of one copy (Chr18)
  • Normal region (Chr21)

SLIDE 53

Proposed approach

Step 2 - Synthetic data generation by resampling

  • 100% tumor cells
  • 79% tumor cells
  • 50% tumor cells

SLIDE 56

Summary

Advantages

1. More realistic noise (Hocking et al., 2013)
2. SNR is controlled with the proportion of tumor cells (Staaf et al., 2008; Rasmussen et al., 2011)
3. Variety of simulated profiles (Willenbrock and Fridlyand, 2005)
4. True and false positive evaluation (Hocking et al., 2013)

Application

1. Performance of segmentation methods
2. Evaluation of heterogeneity model

SLIDE 57

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Framework

Characteristics

  • 100 simulated data sets
  • 30 tumor samples and 5 latent profiles, based on the realistic simulation framework
  • The matrix W is different for each of the 100 data sets

SLIDE 58

Simulated latent profiles

SLIDE 59

Performance evaluation

We compared the performance of three methods:

  • InCaSCN on parental copy number profiles
  • InCaSCN on total copy number profiles
  • FLLAT on total copy number profiles (Nowak et al., 2011)

SLIDE 60

Better estimation and interpretation of weights by using InCaSCN

[Figure: Loss and Rand Index by method (FLLAT−TCN, InCaSCN−TCN, InCaSCN−C1C2)]

SLIDE 61

Inferred latent profiles from InCaSCN recover the true alterations

Evaluation

  • Characterize each region as normal or altered for latent profiles
  • AUC close to 1: altered regions have been recovered with few mistakes

[Figure: AUC by method (FLLAT−TCN, InCaSCN−TCN, InCaSCN−C1C2)]

SLIDE 62

Conclusion

InCaSCN makes it possible to recover both:

  • simulated latent profiles
  • weights

with a small error.

Results on simulations are very promising for the application to real data sets.

SLIDE 63

Outline

1. Introduction
2. Segmentation
3. Heterogeneity Model
4. Simulations
5. Application to real data sets
  • Inter-tumoral heterogeneity application
  • Intra-tumoral heterogeneity application
6. Conclusion

SLIDE 64

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Inter-tumoral heterogeneity application

Collaboration with Institut Curie

Fabien Reyal’s team (RT2: Residual Tumor and Response to Treatment)

Triple-negative breast cancer (TNBC)

  • 16 patients
  • Micro-biopsy of the Primary Tumor at diagnosis
  • Neo-adjuvant chemotherapy before surgery
  • Primary Tumor size reduced but incomplete –> Residual
  • 10 patients with Primary Tumor and Residual samples
  • 6 patients with an additional metastasis Lymph Node sample
  • Whole exome sequencing data
  • RNAseq data

SLIDE 68

Results

[Figure: heat map of the percentage of each clone in each sample (scale 0.2 to 1). Columns: Subclone a to Subclone m. Rows (time point and patient): PT_patient56, RES_patient56, LN_patient56, PT_patient36, RES_patient36, LN_patient36, RES_patient43, LN_patient43, PT_patient35, RES_patient35, PT_patient29, RES_patient29, PT_patient1, RES_patient1, PT_patient50, LN_patient40, PT_patient40, PT_patient45, RES_patient45, PT_patient27, RES_patient27, RES_patient40, PT_patient34, RES_patient50, PT_patient32, RES_patient32, RES_patient34, LN_patient34]

slide-69
SLIDE 69

Introduction Segmentation Heterogeneity Model Simulations Application Conclusion Inter-tumoral heterogeneity application

Conclusion on the application

Only one latent profile (subclone B) common across the patients Patients are mainly grouped together For two patients (40 and 50), it seems that the resistant clone is already present in PT and becomes largely predominant in RES Same results from RNAseq analysis (B. Sadacca)

slide-70
SLIDE 70

Intra-tumoral heterogeneity application

Collaboration with UCSF

Henrik Bengtsson and Joe Costello
Glioblastoma

96 patients
  • Primary Tumor samples
  • Recurrence 1 with several samples
  • Sometimes Recurrence 2 with several samples

  • Whole exome sequencing data
  • Preprocessing with sequenza

slide-71
SLIDE 71

Intra-tumoral heterogeneity application

Results

Figure: heatmap of the percentage of each subclone (a to e) in samples A08295, A24709, Z00005, Z00006, Z00233 and Z00234, annotated by time point (Primary, Recurrence1, Recurrence2); color scale: percentage of clone in sample.

slide-72
SLIDE 72

Intra-tumoral heterogeneity application

Conclusions

  • One resistant subclone already present in PT
  • New cancer in Recurrence 2

Conclusions on the model

  • Fast and efficient algorithm
  • Application to other data sets
  • Results similar to those of the model that uses mutations


slide-73
SLIDE 73

Outline

1 Introduction
2 Segmentation
3 Heterogeneity Model
4 Simulations
5 Application to real data sets
6 Conclusion

slide-74
SLIDE 74


Contributions

  • Segmentation Methods
  • Realistic simulation framework
  • Performance of segmentation methods
  • Heterogeneity
  • Bioinformatic pipelines available as several R packages: jointseg, acnr, InCaSCN

slide-76
SLIDE 76


Perspectives

  • Exploring DNA copy number latent profiles
  • Link to clinical outcomes
  • Discover biomarkers
  • Collaboration with UCSF


slide-77
SLIDE 77


Thank you for your attention



slide-79
SLIDE 79

Selection of number of latent profiles

Figure: PVE as a function of the number of latent profiles, in two panels. HSGOC: 2 to 6 latent profiles, PVE axis from 0.900 to 1.000. TNBC: 5 to 19 latent profiles, PVE axis from 0.6 to 1.0.
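A minimal sketch of how a PVE curve of this kind could be computed, assuming PVE denotes the proportion of variation explained, 1 - RSS/TSS, for a model Y ≈ WZ with weights W and latent profiles Z. The truncated SVD used below is only a stand-in for the actual estimator of the talk, and the toy data and all function names are hypothetical.

```python
import numpy as np


def pve(Y, Y_hat):
    """Proportion of variation explained: 1 - RSS / TSS, where TSS is
    computed around each sample's (row's) own mean."""
    rss = np.sum((Y - Y_hat) ** 2)
    tss = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2)
    return 1.0 - rss / tss


def rank_p_fit(Y, p):
    """Stand-in fit: best rank-p approximation by truncated SVD
    (NOT the constrained estimator used in the talk)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :p] * s[:p]) @ Vt[:p]


rng = np.random.default_rng(0)
# Toy data: 8 samples mixing 3 piecewise-constant latent profiles over 200 loci.
Z = np.repeat(rng.normal(size=(3, 4)), 50, axis=1)  # 3 profiles x 200 loci
W = rng.dirichlet(np.ones(3), size=8)               # weights sum to 1 per sample
Y = W @ Z + 0.05 * rng.normal(size=(8, 200))

# PVE for an increasing number of latent profiles p.
curve = {p: pve(Y, rank_p_fit(Y, p)) for p in range(1, 7)}
for p, value in curve.items():
    print(p, round(float(value), 3))
```

One common heuristic is to keep the smallest number of latent profiles beyond which the PVE gain becomes negligible; the slides do not state the exact rule that was applied.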


slide-80
SLIDE 80

Ovarian cancer application

Intra-tumoral heterogeneity

  • Public data set
  • High-grade serous ovarian cancer (HSGOC)
  • Quantify heterogeneity
  • Reconstruct tumor evolution


slide-81
SLIDE 81

Ovarian cancer application

Results

We focused on one patient with 11 samples

  • Ovary (Biopsy)
  • Omentum
  • Ascites (relapse)

We selected a model with 4 latent profiles


slide-82
SLIDE 82

Ovarian cancer application

Results : Weight matrix

Figure: heatmap of the weight matrix for latent profiles 3, 1, 2 and 4 across samples Om_S7, Om_S8, Om_S6, Om_S2, Om_S3, Om_S4, Ov_B1, Ascites_R1, Om_S1, Om_S5 and Om_B1. Samples are annotated by tissue (omentum, surface of ovary, ascites) and by time point (interval debulking surgery, laparoscopic biopsy, relapse); color key from 0 to 1.


slide-83
SLIDE 83

Ovarian cancer application

Conclusions and Perspectives

  • One clone (latent profile 3) does not seem to be resistant to the drug
  • There may be a single drug-resistant clone that led to the relapse (latent profile 4)
  • Perspective: explore whether known genes could be responsible for the resistance


slide-84
SLIDE 84

Kidney cancer application

Spatial Intra-tumoral heterogeneity

  • Public data set
  • Kidney cancer
  • Several patients with several samples at various locations


slide-85
SLIDE 85

Kidney cancer application

Figure: heatmap of the percentage of each subclone (a to g) in samples R1 to R13; color scale: percentage of clone in sample.


slide-86
SLIDE 86

Kidney cancer application

Sequencing information

  • Illumina Hi-Seq 2500, paired-end reads aligned on hg19
  • Depth: WEG: 100x
  • bwa for alignment (soft clipping: the head and tail of a read are removed and the middle is mapped)
  • Read size: 100 bases


slide-87
SLIDE 87

Kidney cancer application

Random Features

For a signal of length J:

Method            Computation    Storage
Kernel            O(SJ^2)        O(SJ)
Approximation     O(p^2 J)       O(SJ)
Random Feature    O(SMJ)         O(MJ)
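To illustrate where an O(MJ) storage cost can come from, here is a minimal sketch of random Fourier features for a Gaussian kernel. It assumes that M in the table denotes the number of random features and that the kernel is Gaussian; the slide specifies neither, and the function name and parameters are hypothetical. Only the feature map is shown, not the segmentation itself.

```python
import numpy as np


def random_fourier_features(x, M, sigma=1.0, seed=0):
    """Map a 1-D signal x (length J) to a J x M feature matrix whose inner
    products approximate the Gaussian kernel
    exp(-(x_i - x_j)^2 / (2 * sigma^2))."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=1.0 / sigma, size=M)   # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)   # random phases
    return np.sqrt(2.0 / M) * np.cos(np.outer(x, w) + b)


rng = np.random.default_rng(1)
x = rng.normal(size=300)                        # toy signal, J = 300
Phi = random_fourier_features(x, M=2000)        # J x M numbers are stored

# Exact Gram matrix (J x J), built here only to check the approximation.
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
K_approx = Phi @ Phi.T
print(Phi.shape, float(np.mean(np.abs(K_exact - K_approx))))
```

With the feature matrix in hand, kernel quantities can be computed from J x M numbers instead of a J x J Gram matrix, at the price of an approximation error that shrinks as M grows.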
