Increasing stability and interpretability of gene expression signatures



SLIDE 1

Motivation Stabilizing the signature Results Conclusion and Perspectives

Increasing stability and interpretability of gene expression signatures

Prediction of breast cancer outcome

Anne-Claire Haury, Laurent Jacob, Jean-Philippe Vert

Center For Computational Biology ∈ Mines Paristech/Institut Curie/INSERM U900

SMPGD Marseille - January 14, 2010

A.C. Haury, L. Jacob, J.P. Vert Increasing stability and interpretability of signatures

SLIDE 2

Outline

1. Motivation
   Gene expression signatures
   Mathematical tools for model selection
2. Stabilizing the signature
   Main procedure
   Scoring
3. Results
4. Conclusion and Perspectives


SLIDE 4

SIGNATURES AS A PROGNOSTIC TOOL

Signature: list of genes sufficient to predict response (e.g. metastasis vs no metastasis)
Should involve few genes
Should be robust to perturbations of the data and, more importantly, stable across datasets

SLIDE 5

INSTABILITY OF SIGNATURES FOR BREAST CANCER OUTCOME

Many proposals in the literature, e.g. Van’t Veer et al., 2002; Van de Vijver et al., 2002; Wang et al., 2005
However: very little overlap between them, if any
Moreover: lists of genes may be hard to interpret

SLIDE 6

PROPOSAL: GRAPHICAL PRIOR

Consider a graph with PPI + coregulation information (Chuang et al., 2007)
Assumption: genes that are close on the graph form perturbed components
Consider groups of genes from this graph (e.g. edges, connected components, etc.)
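A minimal Python sketch of how groups could be derived from such a graph, taking either every edge or every connected component as one group. The toy edge list and gene names are hypothetical and are not taken from the Chuang et al. network; the function names are illustrative only.

```python
from collections import defaultdict, deque

def edge_groups(edges):
    """Each edge of the graph becomes one group of two genes."""
    return [frozenset(e) for e in edges]

def connected_component_groups(edges):
    """Each connected component of the graph becomes one group."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        groups.append(frozenset(component))
    return groups

# Hypothetical toy network with two components.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
edge_grps = edge_groups(toy_edges)
comp_grps = connected_component_groups(toy_edges)
```

Note that edge groups overlap (here g2 belongs to two groups), whereas connected components form disjoint groups.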


SLIDE 8

MODEL SELECTION FRAMEWORK

INPUTS:

n examples (e.g. microarrays)
p variables (e.g. genes)
X: n × p design matrix (e.g. gene expression dataset)
Y: n × 1 binary response vector (e.g. phenotype to predict)

OUTPUTS (that we hope for):

Relevant features for discriminating between the two possible phenotype statuses, i.e. good accuracy
A signature that is stable both under perturbations within a dataset and across datasets
Genes connected on the graph

SLIDE 9

L1-PENALIZED CLASSIFIERS

Lasso: selects genes (Tibshirani, 1996)

$$\beta_{\text{Lasso}} = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} L(x_i \beta, y_i) + \lambda \|\beta\|_1$$

Group Lasso (Yuan & Lin, 2006): induces group sparsity for groups of covariates that form a partition of {1, ..., p}
Overlapping group Lasso (Jacob et al., 2009): selects a union of potentially overlapping groups of covariates (e.g. gene pathways)
Graph Lasso: uses groups induced by the graph (e.g. edges, connected components)
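To make the difference between gene-level and group-level sparsity concrete, here is a small sketch of the two shrinkage (proximal) operators usually associated with these penalties: soft-thresholding for λ||β||1, and block soft-thresholding for a group penalty of the form λ Σ_g ||β_g||2 over a partition. The slides do not spell out the group penalty or any solver, so the unweighted Euclidean group norm, the function names and the toy numbers are all assumptions.

```python
import math

def soft_threshold(beta, lam):
    """Proximal operator of lam * ||beta||_1: shrinks every coefficient
    towards zero and sets the small ones exactly to zero."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta]

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2 for non-overlapping groups:
    each group is shrunk as a block and vanishes entirely if its norm is <= lam."""
    out = list(beta)
    for g in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in g))
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * beta[j]
    return out

beta = [3.0, -0.5, 0.3, 0.4]
groups = [[0, 1], [2, 3]]  # a partition of the 4 covariates
lasso_like = soft_threshold(beta, 1.0)
group_like = group_soft_threshold(beta, groups, 1.0)
```

In this toy case the L1 operator zeroes the second coefficient on its own, while the group operator keeps it (shrunk) because its group as a whole is large, and removes the second group entirely.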


SLIDE 13

PROPERTIES OF LASSO-LIKE ALGORITHMS

Advantages:

Perform well when the number of features greatly exceeds the sample size, i.e. p >> n
Relatively easy to implement
Quite fast to run

Drawbacks:

Depend on a parameter λ that has to be chosen: trade-off between fitting the data and avoiding overfitting
Behave badly when features are highly correlated: false positives and false negatives, which also implies great instability


SLIDE 15

EXAMPLE

Groups 1 and 2 are highly correlated
The Group Lasso algorithm might choose one or the other at random
Scenario 1: both are relevant, but only one will be selected
Scenario 2: group 1 is relevant, group 2 is noise; roughly 50% probability that only group 2 is selected
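The second scenario can be mimicked with a toy simulation. This is only a sketch under simplifying assumptions: single features stand in for groups, and the feature with the largest absolute correlation with the response stands in for the first variable entering a Lasso path (which is the case for squared loss with standardized features). The noise levels, sample size and seed are arbitrary choices, not values from the talk.

```python
import random

def first_selected(n, rng, noise_x=0.05, noise_y=2.0):
    """Draw one dataset and return which feature (1 or 2) is most correlated with y.
    Feature 1 is relevant; feature 2 is a noisy copy of it with no extra signal."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + rng.gauss(0, noise_x) for a in x1]
    y = [a + rng.gauss(0, noise_y) for a in x1]

    def abs_corr(x):
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return abs(cov) / (vx * vy) ** 0.5

    return 1 if abs_corr(x1) >= abs_corr(x2) else 2

rng = random.Random(0)
picks = [first_selected(50, rng) for _ in range(200)]
share_feature_2 = picks.count(2) / len(picks)
```

With two features this strongly correlated, one would expect `share_feature_2` to land not far from one half, i.e. the irrelevant copy is picked first in a sizeable fraction of the simulated datasets.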


SLIDE 17

TAKE ADVANTAGE OF RANDOMIZATION

Basis: Stability Selection (Meinshausen & Buehlmann, 2009)
Simulate different datasets by perturbing the data, i.e. repeat the following 100 times:

1. Randomly choose n/2 examples from the data (without replacement)
2. Run the whole path of the graph lasso
3. Store the selected groups

When done: for each λ, compute each group’s selection frequency, i.e. get something like:

Groups   λ1 (the largest)   ...   λL (the smallest)
1        0.25               ...   0.6
...      ...                ...   ...
p        0.65               ...   0.96
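The subsampling loop described on this slide can be sketched as follows. The selector here is a deliberately simple stand-in (a threshold on the average of x·y), not the graph lasso used by the authors, and single features play the role of groups; `toy_selector`, `selection_frequencies`, the toy data and the λ grid are all hypothetical.

```python
import random

def toy_selector(X, y, lam):
    """Stand-in for one point of the regularization path (an assumption, not the
    authors' solver): keep feature j when |average of x_ij * y_i| exceeds lam."""
    n, p = len(X), len(X[0])
    return {j for j in range(p)
            if abs(sum(X[i][j] * y[i] for i in range(n)) / n) > lam}

def selection_frequencies(X, y, lambdas, n_rounds=100, seed=0, selector=toy_selector):
    """Return freq[j][l]: fraction of subsamples in which feature j is selected at lambdas[l]."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [[0] * len(lambdas) for _ in range(p)]
    for _ in range(n_rounds):
        # Step 1: draw n/2 examples without replacement.
        idx = rng.sample(range(n), n // 2)
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        # Steps 2 and 3: run the selector over the whole grid and record the selections.
        for l, lam in enumerate(lambdas):
            for j in selector(Xs, ys, lam):
                counts[j][l] += 1
    return [[c / n_rounds for c in row] for row in counts]

# Hypothetical toy data: feature 0 drives the +1/-1 label, feature 1 is pure noise.
data_rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[yi + data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for yi in y]
lambdas = [1.0, 0.5, 0.1]  # from the largest to the smallest
freq = selection_frequencies(X, y, lambdas)
```

Each row of `freq` corresponds to one row of the frequency table above; swapping in a real path solver only requires passing a different `selector`.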


SLIDE 19

GRAPHICAL ILLUSTRATION

[Figure: selection frequency Π (0.1 to 1) of the groups plotted against λ (log scale, 10^-3 to 10^-1)]



SLIDE 22

SCORING THE GROUPS

Initial scoring proposed in Meinshausen & Buehlmann, 2009: threshold.
However: hard to choose a grid for λ.

[Figure: selection frequency Π (0.1 to 1) plotted against λ (log scale, 10^-3 to 10^-1), with the threshold π_thres marked]
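A short sketch of the threshold rule: a group is kept when its selection frequency reaches the threshold for at least one value of λ on the grid. The frequency table and the threshold value 0.6 are hypothetical, and `stable_set` is an illustrative name.

```python
def stable_set(freq, pi_thres):
    """Indices of the groups whose selection frequency reaches pi_thres
    for at least one lambda of the grid."""
    return [j for j, row in enumerate(freq) if max(row) >= pi_thres]

# Hypothetical frequency table: one row per group,
# columns ordered from the largest to the smallest lambda.
freq = [
    [0.25, 0.40, 0.60],
    [0.05, 0.10, 0.30],
    [0.65, 0.90, 0.96],
]
selected = stable_set(freq, pi_thres=0.6)

# Same rule on a grid that stops before the smallest lambda.
selected_short_grid = stable_set([row[:2] for row in freq], pi_thres=0.6)
```

Here the first group is kept on the full grid but dropped on the shorter one, which illustrates why the choice of the grid for λ matters with this rule.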


SLIDE 24

SCORING THE GROUPS

For each λ we compute the frequency ratio of each group; we then keep the maximum value over the grid for each group, i.e. the score vector is defined as

$$\forall j \in \text{Groups}, \quad S_j = \max_{\lambda} \frac{p(j \in \text{Solution} \mid \lambda)}{\sum_{j'} p(j' \in \text{Solution} \mid \lambda)}$$

[Figure: selection frequency Π (0.1 to 1) plotted against λ (log scale, 10^-3 to 10^-1)]

SLIDE 25

SCORING THE GROUPS

[Figure: frequency ratio S (0.02 to 0.1) plotted against λ (log scale, 10^-3 to 10^-2)]
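The score defined above can be computed directly from a table of selection frequencies. The following is a sketch with a hypothetical table; skipping values of λ at which nothing is selected (zero denominator) is an assumption, since the slides do not say how that case is handled.

```python
def ratio_scores(freq):
    """S_j = max over lambda of freq[j][l] / sum_k freq[k][l].
    Grid points where no group is ever selected are skipped."""
    n_groups, n_lambdas = len(freq), len(freq[0])
    totals = [sum(freq[j][l] for j in range(n_groups)) for l in range(n_lambdas)]
    return [
        max((freq[j][l] / totals[l] for l in range(n_lambdas) if totals[l] > 0),
            default=0.0)
        for j in range(n_groups)
    ]

# Hypothetical frequency table: one row per group,
# columns ordered from the largest to the smallest lambda.
freq = [
    [0.2, 0.4, 0.8],
    [0.0, 0.1, 0.4],
    [0.2, 0.5, 0.8],
]
scores = ratio_scores(freq)
# Groups ordered by decreasing score; a signature keeps the top of this ranking.
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
```

Because each frequency is divided by the column total, a group that is selected early (at a large λ, when few groups compete) can score as high as one that is selected almost always at small λ.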

SLIDE 26

DATA AND OBJECTIVES

Data:

Van’t Veer dataset: 295 tumors, 78 metastatic, 8141 genes
Wang dataset: 286 tumors, 106 metastatic, 8141 genes
Graph (Chuang et al., 2007): 8141 nodes, 57235 edges

Algorithms to compare:

Lasso
Graph Lasso (edges)
Lasso + stability selection
Graph Lasso + stability selection

SLIDE 27

ACCURACY

For a signature of 60 genes: balanced accuracy on Van’t Veer data, five-fold CV

               No stability selection   Stability selection
Lasso          0.61 ± 0.03              0.57 ± 0.02
Graph Lasso    0.62 ± 0.02              0.58 ± 0.03

Accuracy when tested on Wang data:

[Figure: balanced accuracy (0.2 to 0.8) for Lasso, Lasso + stab. sel., Graph Lasso and Graph Lasso + stab. sel.; legend: signature learnt on Van’t Veer dataset, signature learnt on Wang dataset]
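The slides do not define balanced accuracy; the sketch below assumes the usual definition, the average of the recall on each of the two classes, which is less sensitive to class imbalance than plain accuracy. The labels and predictions are made up for illustration.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Average of the recall on the positive class and the recall on the negative class.
    Both classes must be present in y_true."""
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    sensitivity = sum(p == positive for p in pos) / len(pos)
    specificity = sum(p != positive for p in neg) / len(neg)
    return (sensitivity + specificity) / 2

# Hypothetical imbalanced example: 2 metastatic (1) and 6 non-metastatic (0) tumors.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this example the plain accuracy is 6/8, while the balanced accuracy averages 1/2 and 5/6 and is therefore lower, because half of the rare class is misclassified.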

SLIDE 28

STABILITY

Inner stability:

[Figure: inner stability for Lasso, Lasso with stability selection, Graph Lasso and Graph Lasso with stability selection]

Stability across datasets:

[Figure: number of genes in the overlap (1 to 7) plotted against the number of genes in the signatures (20 to 120), for Lasso, Lasso with stability selection, Graph Lasso and Graph Lasso with stability selection]
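The quantity plotted for stability across datasets, the number of genes shared by two signatures of the same size, can be sketched as below. It assumes each method yields a ranking of genes from which the top k form the signature; the rankings and gene names are hypothetical.

```python
def overlap_curve(ranking_a, ranking_b, sizes):
    """For each signature size k, count the genes shared by the top-k genes of both rankings."""
    return [len(set(ranking_a[:k]) & set(ranking_b[:k])) for k in sizes]

# Hypothetical gene rankings obtained on two datasets (most important gene first).
rank_vantveer = ["g3", "g1", "g7", "g2", "g9", "g4"]
rank_wang = ["g1", "g8", "g3", "g5", "g2", "g6"]
overlaps = overlap_curve(rank_vantveer, rank_wang, sizes=[2, 4, 6])
```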

SLIDE 29

CONNECTIVITY

$$\mathrm{CA} = \frac{\text{Size of the largest connected component}}{\text{Number of genes selected}}$$

[Figure: CA (0.2 to 1) plotted against the number of genes in the signature (20 to 100), for Lasso, Lasso with stability selection, Graph Lasso and Graph Lasso with stability selection]
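A sketch of how CA could be computed for a given signature, assuming the connected components are taken in the subgraph induced by the selected genes (the slide does not state this explicitly). The toy network and gene names are hypothetical.

```python
from collections import deque

def connectivity(selected, edges):
    """CA = size of the largest connected component of the subgraph induced by
    the selected genes, divided by the number of selected genes (non-empty)."""
    selected = set(selected)
    adj = {g: set() for g in selected}
    for u, v in edges:
        # Keep only the edges whose two endpoints are both in the signature.
        if u in selected and v in selected:
            adj[u].add(v)
            adj[v].add(u)
    seen, largest = set(), 0
    for start in selected:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        largest = max(largest, size)
    return largest / len(selected)

toy_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5"), ("g5", "g6")]
ca = connectivity(["g1", "g2", "g3", "g5"], toy_edges)
```

Here three of the four selected genes are linked to each other and g5 is isolated within the signature, so CA is 3/4.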

SLIDE 30

SIGNATURE OBTAINED FROM LASSO

SLIDE 31

SIGNATURE OBTAINED FROM GRAPH LASSO WITH STABILITY SELECTION


SLIDE 33

CONCLUSION

Selecting groups from a graph instead of genes:

Adds relevant biological information to the model
Increases connectivity and hence interpretability of the signature
Drawback: may become computationally more demanding (larger groups)

Using stability selection:

Improves stability of the signature within a given dataset
Drawback: hard to know how many genes should be in the signature

Neither of these methods, nor their combination, changes the accuracy

SLIDE 34

PERSPECTIVES

Take subtypes of tumors into account: need more data
Related project (with F. Reyal): build a larger breast cancer dataset
Try different groups / larger groups
Compare the biological processes involved instead of genes