Including prior knowledge in machine learning for genomic data


  1. Including prior knowledge in machine learning for genomic data. Jean-Philippe Vert, Mines ParisTech / Curie Institute / Inserm. StatLearn workshop, Grenoble, March 17, 2011. (J.-P. Vert, ParisTech. Prior knowledge in ML. StatLearn. Slide 1 / 68.)

  2. Outline: 1. Motivations; 2. Finding multiple change-points in a single profile; 3. Finding multiple change-points shared by many signals; 4. Supervised classification of genomic profiles; 5. Learning molecular classifiers with network information; 6. Conclusion.

  3. Outline (repeated; current section: Motivations).

  4. Chromosomal aberrations in cancer.

  5. Comparative Genomic Hybridization (CGH).

  6. Can we identify breakpoints and "smooth" each profile? [Figure: a noisy copy-number profile along 1,000 probes.]

  7. Can we detect frequent breakpoints? [Figure: a collection of bladder tumour copy number profiles, four signals of length 2,000.]

  8. Can we detect discriminative patterns? [Figure: copy number profiles of aggressive (left) vs non-aggressive (right) melanoma.]

  9. DNA → RNA → protein. CGH probes the (static) DNA; cancer cells also have abnormal (dynamic) gene expression (transcription).

  10. Tissue profiling with DNA chips. Data: gene expression measured for more than 10,000 genes, typically on fewer than 100 samples from two (or more) classes (e.g., different tumors).

  11. Can we identify the cancer subtype? (diagnosis)

  12. Can we predict the future evolution? (prognosis)

  13. Summary. [Figure: the melanoma profiles of slide 8.] Many problems share the same traits: the data are high-dimensional but "structured", and classification accuracy is not all, since interpretation (pattern discovery) is also necessary. A general strategy: min_β R(β) + λ Ω(β).

  14. Outline (repeated; current section: Finding multiple change-points in a single profile).

  15. The problem. [Figure: a noisy signal of length 1,000.] Let Y ∈ R^p be the signal. We want to find a piecewise constant approximation Û ∈ R^p with at most k change-points.

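Once the change-point positions are fixed, the least-squares piecewise-constant approximation is simply the per-segment mean. A minimal illustrative sketch (function name and interface are mine, not from the talk):

```python
def piecewise_constant(y, breakpoints):
    """Given change-point positions (indices where a new segment starts),
    return the least-squares piecewise-constant approximation of y,
    i.e. each segment replaced by its mean."""
    bounds = [0] + sorted(breakpoints) + [len(y)]
    out = []
    for a, b in zip(bounds, bounds[1:]):
        seg = y[a:b]
        mean = sum(seg) / len(seg)
        out.extend([mean] * len(seg))
    return out
```

The hard part of the problem, of course, is choosing the breakpoints themselves, which the next slides address.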

  17. An optimal solution? [Figure: the noisy signal of slide 15.] We can define an "optimal" piecewise constant approximation Û ∈ R^p as the solution of min_{U ∈ R^p} ‖Y − U‖² such that Σ_{i=1}^{p−1} 1(U_{i+1} ≠ U_i) ≤ k. This is a combinatorial optimization problem over the (p choose k) candidate partitions. Dynamic programming finds the solution in O(p²k) time and O(p²) memory, but it does not scale to p = 10⁶ – 10⁹.

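The dynamic-programming approach of slide 17 can be written out directly. The sketch below (naming and interface are mine, not the authors' code) fills a table D[m][j] with the best cost of fitting y[0..j] by m constant segments, which gives the stated O(p²k) time complexity:

```python
def optimal_segmentation(y, k):
    """Best piecewise-constant fit of y with at most k change-points,
    by dynamic programming: O(p^2 k) time, O(p k) table memory.
    Returns the sorted list of change-point positions."""
    p = len(y)
    # Prefix sums of y and y^2 give O(1) segment costs.
    s = [0.0] * (p + 1)
    s2 = [0.0] * (p + 1)
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):
        """Squared error of fitting y[i..j] (inclusive) by its mean."""
        n = j - i + 1
        t = s[j + 1] - s[i]
        return (s2[j + 1] - s2[i]) - t * t / n

    INF = float("inf")
    # D[m][j]: best cost of y[0..j] with m segments; back[m][j]: last split.
    D = [[INF] * p for _ in range(k + 2)]
    back = [[0] * p for _ in range(k + 2)]
    for j in range(p):
        D[1][j] = sse(0, j)
    for m in range(2, k + 2):            # m segments = m - 1 change-points
        for j in range(m - 1, p):
            for t in range(m - 2, j):    # last segment is y[t+1..j]
                c = D[m - 1][t] + sse(t + 1, j)
                if c < D[m][j]:
                    D[m][j], back[m][j] = c, t
    # Pick the best number of segments up to k + 1, then backtrack.
    m = min(range(1, k + 2), key=lambda q: D[q][p - 1])
    bps, j = [], p - 1
    while m > 1:
        j = back[m][j]
        bps.append(j + 1)                # change-point between j and j + 1
        m -= 1
    return sorted(bps)
```

The quadratic inner loop over candidate split points t is exactly what prevents this exact method from scaling to p = 10⁶ and beyond, motivating the convex relaxations that follow.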

  21. Promoting sparsity with the ℓ1 penalty (Tibshirani, 1996; Chen et al., 1998). If R(β) is convex and "smooth", the solution of min_{β ∈ R^p} R(β) + λ Σ_{i=1}^p |β_i| is usually sparse.
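For the simplest choice R(β) = ½‖y − β‖², the ℓ1-penalized problem separates across coordinates and has a closed-form solution, soft-thresholding, which makes the sparsity mechanism explicit: every coordinate smaller than λ in magnitude is set exactly to zero. A small sketch (helper name is mine):

```python
def soft_threshold(y, lam):
    """Coordinate-wise minimizer of (1/2)*(y_i - b)^2 + lam*|b|.
    Shrinks each coordinate toward zero by lam, and sets coordinates
    with |y_i| <= lam exactly to zero (the source of sparsity)."""
    return [max(abs(v) - lam, 0.0) * (1 if v > 0 else -1) for v in y]
```

For example, `soft_threshold([3.0, -0.5, 0.2], 1.0)` keeps only the large first coordinate (shrunk to 2.0) and zeroes out the other two.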

  22. Promoting piecewise constant profiles: the total variation / variable fusion penalty. If R(β) is convex and "smooth", the solution of min_{β ∈ R^p} R(β) + λ Σ_{i=1}^{p−1} |β_{i+1} − β_i| is usually piecewise constant (Rudin et al., 1992; Land and Friedman, 1996). Proof sketch: with the change of variable u_i = β_{i+1} − β_i, u_0 = β_1, we obtain a Lasso problem in u ∈ R^{p−1}; u sparse means β piecewise constant.
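The change of variable in the proof sketch is easy to verify numerically: a piecewise-constant β maps to a difference vector u that is sparse, and the ℓ1 norm of the differences equals the TV penalty. A small sketch (helper names are mine):

```python
def tv_penalty(beta):
    """Total-variation penalty: sum_i |beta[i+1] - beta[i]|."""
    return sum(abs(b - a) for a, b in zip(beta, beta[1:]))

def to_differences(beta):
    """Change of variable from the slide: u_0 = beta_1 (the offset),
    u_i = beta_{i+1} - beta_i for i >= 1."""
    return [beta[0]] + [b - a for a, b in zip(beta, beta[1:])]

# A piecewise-constant beta gives a sparse difference vector u:
beta = [1.0, 1.0, 1.0, 4.0, 4.0, 2.0]
u = to_differences(beta)
# Only the two jumps survive in u[1:], and their l1 norm is TV(beta).
assert sum(abs(x) for x in u[1:]) == tv_penalty(beta)
```

So an ℓ1 penalty on u (a Lasso) is the same problem as a TV penalty on β, which is why the solution path inherits the Lasso's sparsity, here meaning few jumps.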

  23. TV signal approximator: min_{β ∈ R^p} ‖Y − β‖² such that Σ_{i=1}^{p−1} |β_{i+1} − β_i| ≤ µ. Adding further constraints does not change the change-points: Σ_{i=1}^p |β_i| ≤ ν (Tibshirani et al., 2005; Tibshirani and Wang, 2008) or Σ_{i=1}^p β_i² ≤ ν (Mairal et al., 2010).

  24. Solving the TV signal approximator, min_{β ∈ R^p} ‖Y − β‖² such that Σ_{i=1}^{p−1} |β_{i+1} − β_i| ≤ µ: QP with sparse linear constraints in O(p²), 135 min for p = 10⁵ (Tibshirani and Wang, 2008); coordinate descent-like method, O(p)?, 3 s for p = 10⁵ (Friedman et al., 2007); for all µ with the LARS in O(pK) (Harchaoui and Lévy-Leduc, 2008); for all µ in O(p ln p) (Hoefling, 2009); for the first K change-points in O(p ln K) (Bleakley and Vert, 2010).

  25. Speed trial: 2 s for K = 100, p = 10⁷. [Figure: running time in seconds vs signal length up to 10⁶, for K = 1, 10, 10², 10³, 10⁴, 10⁵.]

  26. Summary. [Figure: the noisy signal of slide 15.] A fast method for multiple change-point detection; an embedded method that boils down to a dichotomic wrapper method (very different from dynamic programming).
