Fast sparse methods for genomics data
Jean-Philippe Vert, PowerPoint PPT Presentation


  1. Fast sparse methods for genomics data. Jean-Philippe Vert. Optimization and Statistical Learning workshop, Les Houches, January 6-11, 2013.

  2. Normal vs cancer cells. What goes wrong? How to treat?

  3. Biology is now quantitative, "high-throughput". (Image: DOE Joint Genome Institute.)

  4. Big data in biology. "The $1,000 genome, the $1 million interpretation" (B. Kopf). High-dimensional, heterogeneous, structured data: "large p". http://aws.amazon.com/1000genomes/

  5. In this talk: 1. Mapping DNA breakpoints in cancer genomes (with K. Bleakley); 2. Isoform detection from RNA-seq data (with E. Bernard, J. Mairal, L. Jacob).

  6. Outline: 1. Mapping DNA breakpoints in cancer genomes (with K. Bleakley); 2. Isoform detection from RNA-seq data (with E. Bernard, J. Mairal, L. Jacob).

  7. Chromosomic aberrations in cancer.

  8. Comparative genomic hybridization (CGH). Motivation: CGH data measure the DNA copy number along the genome. This is very useful, in particular in cancer research, to systematically observe variants in DNA content. (Figure: log-ratio copy-number profile along chromosomes 1-23 and X.)

  9. Can we identify breakpoints and "smooth" each profile? (Figure: noisy signal of length 1000 with its piecewise-constant fit.) A classical multiple change-point detection problem. Should scale to lengths of order 10⁶ ∼ 10⁹.


  10. An optimal solution. (Figure: noisy signal of length 1000 with its optimal piecewise-constant approximation.) For a signal Y ∈ ℝ^p, define an optimal approximation β ∈ ℝ^p with k breakpoints as the solution of

      min_{β ∈ ℝ^p} ‖Y − β‖²   such that   Σ_{i=1}^{p−1} 1(β_{i+1} ≠ β_i) ≤ k

  This is an optimization problem over the (p choose k) partitions... Dynamic programming finds the solution in O(p²k) time and O(p²) memory (a sketch follows below). But: it does not scale to p = 10⁶ ∼ 10⁹...

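  A minimal sketch of that dynamic program, as an illustration only (my code, not the authors'). Prefix sums make each segment cost O(1), which gives the O(p²k) running time quoted on the slide; the O(p²) memory of the classical version comes from precomputing all segment costs, which the prefix sums avoid here.

    import numpy as np

    def optimal_segmentation(y, k):
        """Optimal approximation of y with at most k breakpoints
        (k + 1 constant segments) by dynamic programming, O(p^2 k) time."""
        y = np.asarray(y, dtype=float)
        p = len(y)
        # Prefix sums of y and y^2: the squared error of fitting y[i:j]
        # by its mean then costs O(1) per segment.
        s1 = np.concatenate(([0.0], np.cumsum(y)))
        s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

        def seg_cost(i, j):  # squared error of a constant fit on y[i:j]
            n = j - i
            return (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / n

        # best[m, j]: lowest cost of fitting y[0:j] with m constant segments.
        best = np.full((k + 2, p + 1), np.inf)
        split = np.zeros((k + 2, p + 1), dtype=int)
        best[1, 1:] = [seg_cost(0, j) for j in range(1, p + 1)]
        for m in range(2, k + 2):
            for j in range(m, p + 1):
                cands = [best[m - 1, i] + seg_cost(i, j) for i in range(m - 1, j)]
                i_min = int(np.argmin(cands))
                best[m, j] = cands[i_min]
                split[m, j] = i_min + m - 1
        # Backtrack the k breakpoint positions.
        bps, j = [], p
        for m in range(k + 1, 1, -1):
            j = split[m, j]
            bps.append(j)
        return sorted(bps), best[k + 1, p]

    # Toy usage on a signal with 2 true breakpoints:
    y = np.r_[np.zeros(50), 2 * np.ones(50), np.ones(50)]
    print(optimal_segmentation(y + 0.1 * np.random.randn(150), 2)[0])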

  11. Promoting sparsity with the ℓ1 penalty. The ℓ1 penalty (Tibshirani, 1996; Chen et al., 1998): if R(β) is convex and "smooth", the solution of

      min_{β ∈ ℝ^p} R(β) + λ Σ_{i=1}^{p} |β_i|

  is usually sparse.
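  A concrete special case (my example, not on the slide): with R(β) = ½‖Y − β‖², the minimizer has a closed form, coordinate-wise soft thresholding, which is exactly why the ℓ1 penalty zeroes out small coordinates.

    import numpy as np

    def soft_threshold(y, lam):
        """argmin_b 0.5 * (y - b)**2 + lam * |b|, applied coordinate-wise."""
        return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

    y = np.array([3.0, -0.4, 1.2, -2.5, 0.1])
    print(soft_threshold(y, 1.0))  # small coordinates are set exactly to 0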

  12. Promoting piecewise-constant profiles: the total variation / variable fusion penalty. If R(β) is convex and "smooth", the solution of

      min_{β ∈ ℝ^p} R(β) + λ Σ_{i=1}^{p−1} |β_{i+1} − β_i|

  is usually piecewise constant (Rudin et al., 1992; Land and Friedman, 1996). Proof: the change of variable u_0 = β_1, u_i = β_{i+1} − β_i gives a Lasso problem in u ∈ ℝ^{p−1}, and u sparse means β piecewise constant (see the numerical illustration below).
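  The change of variable is easy to see numerically. A small illustration (mine): with L the lower-triangular matrix of ones, β = Lu, so the total variation of β is an ℓ1 norm on the differences u.

    import numpy as np

    p = 6
    L = np.tril(np.ones((p, p)))                   # beta = L @ u (cumulative sums)
    u = np.array([1.0, 0.0, 2.0, 0.0, 0.0, -1.5])  # sparse differences
    beta = L @ u
    print(beta)           # [1.  1.  3.  3.  3.  1.5]: piecewise constant
    print(np.diff(beta))  # recovers u[1:], so TV(beta) = ||u[1:]||_1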

  13. TV signal approximator:

      min_{β ∈ ℝ^p} ‖Y − β‖²   such that   Σ_{i=1}^{p−1} |β_{i+1} − β_i| ≤ µ

  Adding additional constraints does not change the change-points:
      Σ_{i=1}^{p} |β_i| ≤ ν (Tibshirani et al., 2005; Tibshirani and Wang, 2008)
      Σ_{i=1}^{p} β_i² ≤ ν (Mairal et al., 2010)
  (A numerical check follows below.)
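  This invariance can be checked numerically. The following sketch (my illustration, using the off-the-shelf cvxpy modeler rather than a dedicated solver) solves the constrained problem with and without an extra ℓ1 ball and compares the detected change-points; the value of ν is a hypothetical choice.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    y = np.r_[np.zeros(30), 2 * np.ones(40), np.ones(30)]
    y += 0.1 * rng.standard_normal(y.size)

    def tv_approx(y, mu, nu=None):
        beta = cp.Variable(y.size)
        cons = [cp.tv(beta) <= mu]             # sum_i |beta_{i+1} - beta_i| <= mu
        if nu is not None:
            cons.append(cp.norm1(beta) <= nu)  # additional l1 ball constraint
        cp.Problem(cp.Minimize(cp.sum_squares(y - beta)), cons).solve()
        return beta.value

    change_points = lambda b: np.where(np.abs(np.diff(b)) > 1e-3)[0]
    print(change_points(tv_approx(y, mu=3.0)))
    print(change_points(tv_approx(y, mu=3.0, nu=100.0)))  # same positions expected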

  14. Solving TV signal approximator, min_{β ∈ ℝ^p} ‖Y − β‖² such that Σ_{i=1}^{p−1} |β_{i+1} − β_i| ≤ µ:
      QP with sparse linear constraints in O(p²): 135 min for p = 10⁵ (Tibshirani and Wang, 2008)
      Coordinate descent-like method in O(p)?: 3 s for p = 10⁵ (Friedman et al., 2007)
      With the LARS in O(pk) (Harchaoui and Levy-Leduc, 2008)
      For all µ in O(p ln p) (Hoefling, 2009)
      For the first k change-points in O(p ln k) (Bleakley and V., 2010)

  15. Solving TV signal approximator in O(p ln k). Theorem (V. and Bleakley, 2010; see also Hoefling, 2009): TV signal approximator performs "greedy" dichotomic segmentation.

  Algorithm 1: Greedy dichotomic segmentation
  Require: k, the number of intervals; γ(I), the gain function to split an interval I into I_L(I), I_R(I)
  1: I_0 ← the interval [1, n]
  2: P ← {I_0}
  3: for i = 1 to k do
  4:   I* ← argmax_{I ∈ P} γ(I)
  5:   P ← P \ {I*}
  6:   P ← P ∪ {I_L(I*), I_R(I*)}
  7: end for
  8: return P

  The apparently greedy algorithm finds the global optimum!
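  A direct transcription of Algorithm 1 in Python (my sketch, assuming γ is the squared-error reduction of the best binary split). This naive version recomputes each split in O(length), so it does not achieve the O(p ln k) bound of the paper, which maintains the candidate splits more cleverly.

    import heapq
    import numpy as np

    def greedy_dichotomic_segmentation(y, k):
        """Perform k greedy splits (k breakpoints) of y, always splitting
        the interval whose best binary split most reduces squared error."""
        y = np.asarray(y, dtype=float)
        s1 = np.concatenate(([0.0], np.cumsum(y)))
        s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

        def cost(i, j):  # squared error of fitting y[i:j] by its mean
            return (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)

        def best_split(i, j):  # best gain and split point for y[i:j]
            if j - i < 2:
                return 0.0, None
            gains = [cost(i, j) - cost(i, m) - cost(m, j) for m in range(i + 1, j)]
            m = int(np.argmax(gains)) + i + 1
            return gains[m - i - 1], m

        # Max-heap of candidate splits, keyed by -gain.
        g, m = best_split(0, len(y))
        heap = [(-g, 0, len(y), m)]
        boundaries = []
        for _ in range(k):
            neg_g, i, j, m = heapq.heappop(heap)
            if m is None:          # no interval can be split any further
                break
            boundaries.append(m)
            for a, b in ((i, m), (m, j)):
                g, mm = best_split(a, b)
                heapq.heappush(heap, (-g, a, b, mm))
        return sorted(boundaries)

    y = np.r_[np.zeros(50), 2 * np.ones(50), np.ones(50)]
    print(greedy_dichotomic_segmentation(y + 0.1 * np.random.randn(150), 2))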

  16. Speed trial: 2 s for k = 100, p = 10⁷. (Figure: running time in seconds vs. signal length up to 10⁶, for k = 1, 10, 10², 10³, 10⁴, 10⁵.)

  17. Extension 1: linear discrimination / regression. (Figure: copy-number profiles of aggressive (left) vs non-aggressive (right) melanoma.)

  18. Fused lasso for supervised classification. (Figure: aggressive vs non-aggressive melanoma profiles.) Idea: find a linear predictor f(Y) = β⊤Y that best discriminates the aggressive vs non-aggressive samples, subject to the constraint that it should be sparse and piecewise constant. Mathematically:

      min_{β ∈ ℝ^p} R(β) + λ₁‖β‖₁ + λ₂‖β‖_TV

  Computationally: proximal methods.
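  As a toy prototype of this formulation (my sketch, with synthetic data and hypothetical λ values; the talk relies on dedicated proximal solvers for scalability, cvxpy is only for illustration), the fused-lasso logistic regression can be written directly:

    import cvxpy as cp
    import numpy as np

    # X: n samples x p probes; y in {-1, +1}: aggressive vs non-aggressive.
    rng = np.random.default_rng(0)
    n, p = 40, 200
    X = rng.standard_normal((n, p))
    w_true = np.zeros(p)
    w_true[50:80] = 1.0                              # piecewise-constant truth
    y = np.sign(X @ w_true + 0.5 * rng.standard_normal(n))

    beta = cp.Variable(p)
    logistic_loss = cp.sum(cp.logistic(cp.multiply(-y, X @ beta))) / n
    lam1, lam2 = 0.01, 0.05                          # hypothetical values
    objective = logistic_loss + lam1 * cp.norm1(beta) + lam2 * cp.tv(beta)
    cp.Problem(cp.Minimize(objective)).solve()
    print(np.round(beta.value, 2))                   # sparse and piecewise constant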
