Learning penalties for change-point detection using max-margin interval regression
Presented as a paper at ICML'13 by Toby Dylan Hocking (toby.hocking@mail.mcgill.ca), with co-authors Guillem Rigaill, Francis Bach, and Jean-Philippe Vert. 24 May 2017.


  1. Learning penalties for change-point detection using max-margin interval regression. Presented as a paper at ICML'13. Toby Dylan Hocking (toby.hocking@mail.mcgill.ca). Co-authors: Guillem Rigaill, Francis Bach, Jean-Philippe Vert. 24 May 2017.

  2. Outline: ◮ Introduction: how to detect changes in copy number? ◮ From labels to interval regression ◮ A convex relaxation of the label error ◮ Results on the neuroblastoma data ◮ Conclusions and future work

  3. Motivation: tumor genome copy number analysis ◮ Comparative genomic hybridization microarrays (aCGH) allow genome-wide copy number analysis, since logratio is proportional to DNA copy number (Pinkel et al., 1998). ◮ Tumors often contain breakpoints, amplifications, and deletions at specific chromosomal locations that we would like to detect. ◮ Which genomic alterations are linked with good or bad patient outcome? ◮ To answer clinical questions like this one, we first need to accurately detect these genomic alterations.

  4. aCGH neuroblastoma copy number data

  5. Copy number profiles are predictive of progression in neuroblastoma. Gudrun Schleiermacher et al., Accumulation of Segmental Alterations Determines Progression in Neuroblastoma, J Clinical Oncology 2010. 2 types of profiles: ◮ Numerical: entire chromosome amplification. Good outcome. ◮ Segmental: deletion 1p 3p 11q, gain 1q 2p 17q. Bad outcome.

  6. But which model is the best? ◮ GLAD: adaptive weights smoothing (Hupé et al., 2004) ◮ DNAcopy: circular binary segmentation (Venkatraman and Olshen, 2007) ◮ cghFLasso: fused lasso signal approximator with heuristics (Tibshirani and Wang, 2007) ◮ HaarSeg: wavelet smoothing (Ben-Yaacov and Eldar, 2008) ◮ GADA: sparse Bayesian learning (Pique-Regi et al., 2008) ◮ flsa: fused lasso signal approximator path algorithm (Hoefling, 2009) ◮ cghseg: pruned dynamic programming (Rigaill, 2010) ◮ PELT: pruned exact linear time (Killick et al., 2011) Comparison study: Learning smoothing models of copy number profiles using breakpoint annotations (Hocking et al., 2012).


  8. The cghseg.k/pelt.n least squares model. For a signal $y \in \mathbb{R}^d$, we define $\hat y^k$, the maximum likelihood model with $k \in \{1, \dots, d\}$ segments, as
$$\hat y^k = \arg\min_{\mu \in \mathbb{R}^d} \|y - \mu\|_2^2 \quad \text{subject to} \quad \sum_{j=1}^{d-1} 1_{\mu_j \neq \mu_{j+1}} = k - 1,$$
select the number of segments using the penalty
$$k^*(\lambda) = \arg\min_k \; \lambda k d + \|y - \hat y^k\|_2^2,$$
and use the constant $\lambda$ which maximizes agreement with a database of labels. But why this particular penalty? Should we normalize using the variance of the signal $y$?
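As a concrete illustration of this slide, here is a minimal Python sketch (not the cghseg or PELT implementations cited in the talk) that computes the best k-segment least-squares fit with a straightforward dynamic program and then selects k with the penalty λkd; the function names and the toy signal are illustrative.

```python
import numpy as np

def segment_costs(y):
    """C[s, t] = squared error of the best constant fit to y[s:t]."""
    d = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    csum2 = np.concatenate(([0.0], np.cumsum(y ** 2)))
    C = np.full((d + 1, d + 1), np.inf)
    for s in range(d):
        for t in range(s + 1, d + 1):
            n = t - s
            mean = (csum[t] - csum[s]) / n
            C[s, t] = (csum2[t] - csum2[s]) - n * mean ** 2
    return C

def best_loss_per_k(y, k_max):
    """Minimal squared error of a k-segment model, for k = 1, ..., k_max."""
    d = len(y)
    C = segment_costs(y)
    # best[k, t] = optimal cost of fitting the prefix y[:t] with k segments
    best = np.full((k_max + 1, d + 1), np.inf)
    best[0, 0] = 0.0
    for k in range(1, k_max + 1):
        for t in range(k, d + 1):
            best[k, t] = min(best[k - 1, s] + C[s, t] for s in range(k - 1, t))
    return best[1:, d]

def select_k(losses, lam, d):
    """k*(lambda) = argmin_k  lambda * k * d + ||y - yhat^k||^2."""
    ks = np.arange(1, len(losses) + 1)
    return int(ks[np.argmin(lam * ks * d + losses)])

# toy signal with one obvious change in mean
y = np.concatenate([np.random.normal(0, 0.1, 50), np.random.normal(1, 0.1, 50)])
losses = best_loss_per_k(y, k_max=5)
print(select_k(losses, lam=0.01, d=len(y)))  # typically selects k = 2
```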


  10. Outline: ◮ Introduction: how to detect changes in copy number? ◮ From labels to interval regression ◮ A convex relaxation of the label error ◮ Results on the neuroblastoma data ◮ Conclusions and future work

  11. Creating breakpoint labels (demo)

  12. Labels for 2 signals. [Figure: logratio versus position on chromosome (mega base pairs); two panels: signal i = 4.17 with a 1-breakpoint label and signal i = 6.15 with a 0-breakpoints label.]

  13. Estimated model with 1 segment. [Figure: 1-segment model fit to the same two signals; label statuses shown: correct and false negative.]

  14. Estimated model with 2 segments. [Figure: 2-segment model fit to the same two signals; label statuses shown: correct and false positive.]

  15. Estimated model with 3 segments. [Figure: 3-segment model fit to the same two signals; label status shown: false positive.]

  16. Estimated model with 4 segments. [Figure: 4-segment model fit to the same two signals; label status shown: false positive.]

  17. Label error curves for 2 signals. [Figure: annotation error e_i(k) versus number of segments k (1 to 20), one panel per signal (i = 4.17 and i = 6.15).]

  18. Definition of label error. For signal $i$ we have a set of labels $S_i$, where each label $(r, a) \in S_i$ has ◮ a specific labeled region, e.g. $r = [10, 30]$, and ◮ a labeled set of allowed numbers of change-points, e.g. $a = \{0\}$. The label error $e_i : \{1, \dots, k_{\max}\} \to \mathbb{R}_+$ is the total number of incorrectly predicted labels for the model with $k$ segments:
$$e_i(k) = \sum_{(r, a) \in S_i} 1_{|\hat P_i^k \cap r| \notin a},$$
where $\hat P_i^k$ is the set of predicted change-point positions, e.g. $\{25, 35\}$.
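A minimal sketch of the label error e_i(k) defined above, assuming each label is stored as a (region, allowed-counts) pair and that the predicted change-point positions are already available; this representation is an assumption for illustration, not the paper's data format.

```python
def label_error(labels, predicted_changes):
    """e_i(k) for one model: count labels (r, a) whose region r contains a
    number of predicted change-points not in the allowed set a."""
    errors = 0
    for (lo, hi), allowed in labels:
        n_changes = sum(lo <= p <= hi for p in predicted_changes)
        if n_changes not in allowed:
            errors += 1
    return errors

# example from the slide: region [10, 30] labeled with 0 allowed change-points,
# predicted change-points {25, 35} -> this label is incorrectly predicted
labels = [((10, 30), {0})]
print(label_error(labels, predicted_changes={25, 35}))  # prints 1
```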

  19. Penalized model label error. For a log penalty $L = \log \lambda \in \mathbb{R}$ we define the optimal number of segments
$$z_i^*(L) = \arg\min_{k \in \{1, \dots, k_{\max}\}} \exp(L)\, k + \|y_i - \hat y_i^k\|_2^2$$
and the penalized model annotation error $E_i : \mathbb{R} \to \mathbb{R}_+$,
$$E_i(L) = e_i[z_i^*(L)].$$
Both are piecewise constant functions that can be calculated exactly.
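A sketch of z*_i(L) and E_i(L) for one signal, assuming the per-k squared-error losses and label errors have already been computed (for example with the earlier sketches). Evaluation on a grid of L is shown only for simplicity; as the slide notes, both functions are piecewise constant and can be calculated exactly.

```python
import numpy as np

def z_star(L, losses):
    """Optimal number of segments for log penalty L:
    argmin_k  exp(L) * k + ||y - yhat^k||^2."""
    ks = np.arange(1, len(losses) + 1)
    return int(ks[np.argmin(np.exp(L) * ks + losses)])

def penalized_label_error(L, losses, errors_per_k):
    """E_i(L) = e_i[z*_i(L)], with errors_per_k[k-1] = e_i(k)."""
    return errors_per_k[z_star(L, losses) - 1]

# hypothetical per-k losses and label errors for one signal
losses = np.array([10.0, 2.0, 1.5, 1.4])
errors_per_k = np.array([1, 0, 1, 1])
for L in np.linspace(-4, 2, 7):
    print(L, z_star(L, losses), penalized_label_error(L, losses, errors_per_k))
```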

  20. Error and model selection curves for 2 signals. [Figure: selected number of segments z*_i(L) and error E_i(L) versus log penalty L (from -4 to 2), one column per signal (i = 4.17 and i = 6.15).]


  22. Label error curves for 2 signals. [Figure: annotation error E_i(L) versus log penalty L for signals i = 4.17 and i = 6.15, with the lower and upper limits L̲_i and L̄_i marked.]

  23. Target interval [L̲_i, L̄_i] for 2 signals. [Figure: log penalty L versus variance estimate log ŝ_i, showing the lower limit L̲_i and upper limit L̄_i of each signal's target interval.]
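The target intervals plotted here can be read off the error curves E_i(L). The sketch below assumes, as an illustrative definition rather than the paper's exact construction, that [L̲_i, L̄_i] is the largest contiguous range of log penalties achieving the minimal error, recovered from a fine grid of L values.

```python
import numpy as np

def target_interval(L_grid, E_values):
    """Largest contiguous run of log penalties achieving minimal error.
    Returns (lower, upper); an end becomes +/-inf when the run touches the
    edge of the grid. Illustrative definition, assuming a fine grid."""
    E_values = np.asarray(E_values)
    is_min = E_values == E_values.min()
    runs, start = [], None
    for j, flag in enumerate(is_min):
        if flag and start is None:
            start = j
        if (not flag or j == len(is_min) - 1) and start is not None:
            runs.append((start, j if flag else j - 1))
            start = None
    lo, hi = max(runs, key=lambda r: r[1] - r[0])
    lower = -np.inf if lo == 0 else L_grid[lo]
    upper = np.inf if hi == len(L_grid) - 1 else L_grid[hi]
    return lower, upper

# toy error curve: error 1 outside [-1, 1], error 0 inside
L_grid = np.linspace(-4, 2, 601)
E_values = np.where((L_grid < -1) | (L_grid > 1), 1, 0)
print(target_interval(L_grid, E_values))  # approximately (-1.0, 1.0)
```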

  24. Target interval [L̲_i, L̄_i] for all signals. [Figure: the same plot of log penalty L versus variance estimate log ŝ_i, now with the target-interval limits for every signal.]

  25. Limit point representation. [Figure: the finite limits L̲_i and L̄_i plotted as points in the (log ŝ_i, L) plane.]

  26. Max margin regression line. [Figure: the max-margin affine function of log ŝ_i, passing above the lower limits L̲_i and below the upper limits L̄_i with maximal margin.]

  27. Max margin interval regression is a linear program. ◮ Let $x_i \in \mathbb{R}^m$ be the signal features, e.g. number of points sampled $\log d_i$, variance estimate $\log \hat s_i$, ... ◮ Learn an affine function $f(x_i) = w' x_i + \beta$ from features to model complexity using a linear program:
$$\max_{\beta \in \mathbb{R},\ w \in \mathbb{R}^m,\ \mu \in \mathbb{R}_+} \mu \quad \text{subject to} \quad \begin{cases} w' x_i + \beta - \underline{L}_i \ge \mu & \forall i \text{ with } \underline{L}_i > -\infty, \\ \overline{L}_i - w' x_i - \beta \ge \mu & \forall i \text{ with } \overline{L}_i < \infty. \end{cases}$$
◮ Linear objective, linear constraints ⇒ linear program. ◮ Many solvers exist, e.g. simplex, interior point, dual methods. ◮ Implemented in R: lpSolve, quadprog, ...
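A sketch of the linear program above using scipy.optimize.linprog rather than the R solvers named on the slide; the variable layout (w, β, µ) and the toy intervals are illustrative, and the data are assumed separable so that the hard-margin constraints are feasible.

```python
import numpy as np
from scipy.optimize import linprog

def max_margin_interval_lp(X, lower, upper):
    """Max-margin interval regression as a linear program.
    Decision variables are (w_1, ..., w_m, beta, mu); maximize mu subject to
      w'x_i + beta - lower_i >= mu   whenever lower_i > -inf,
      upper_i - w'x_i - beta >= mu   whenever upper_i <  inf."""
    n, m = X.shape
    A_ub, b_ub = [], []
    for i in range(n):
        if np.isfinite(lower[i]):   # -w'x_i - beta + mu <= -lower_i
            A_ub.append(np.concatenate([-X[i], [-1.0, 1.0]]))
            b_ub.append(-lower[i])
        if np.isfinite(upper[i]):   #  w'x_i + beta + mu <=  upper_i
            A_ub.append(np.concatenate([X[i], [1.0, 1.0]]))
            b_ub.append(upper[i])
    c = np.zeros(m + 2)
    c[-1] = -1.0                    # maximize mu == minimize -mu
    bounds = [(None, None)] * (m + 1) + [(0.0, None)]  # w, beta free; mu >= 0
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return res.x[:m], res.x[m], res.x[m + 1]

# toy example: one feature (log variance estimate), separable target intervals
X = np.array([[-2.7], [-2.4], [-2.1]])
lower = np.array([-3.0, -2.0, -np.inf])   # some lower limits are -inf
upper = np.array([-1.0, np.inf, 1.0])     # some upper limits are +inf
w, beta, mu = max_margin_interval_lp(X, lower, upper)
print(w, beta, mu)
```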

  28. Learning the penalty function. ◮ For every signal $i$, calculate a variance estimate feature $x_i = \log \hat s_i$. ◮ Use that to predict model complexity using an affine function $f(x_i) = w \log \hat s_i + \beta$. ◮ Equivalent to learning a penalty function:
$$z_i^*[f(x_i)] = \arg\min_k \|y_i - \hat y_i^k\|_2^2 + \exp[f(x_i)]\, k = \arg\min_k \|y_i - \hat y_i^k\|_2^2 + \hat s_i^{\,w} e^{\beta} k.$$
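To connect the learned coefficients back to a per-signal penalty: given (w, β), the predicted penalty is exp(f(x_i)) = ŝ_i^w e^β, which then selects the number of segments for that signal. A small sketch, reusing the hypothetical per-k losses from the earlier sketches.

```python
import numpy as np

def predicted_penalty(w, beta, s_hat):
    """exp(f(x_i)) with f(x_i) = w * log(s_hat) + beta, i.e. s_hat**w * exp(beta)."""
    return s_hat ** w * np.exp(beta)

def select_k_for_signal(losses, w, beta, s_hat):
    """z*_i[f(x_i)] = argmin_k ||y_i - yhat_i^k||^2 + exp[f(x_i)] * k."""
    ks = np.arange(1, len(losses) + 1)
    return int(ks[np.argmin(losses + predicted_penalty(w, beta, s_hat) * ks)])

# usage with hypothetical per-k losses for one signal
losses = np.array([10.0, 2.0, 1.5, 1.4])
print(select_k_for_signal(losses, w=1.0, beta=-2.0, s_hat=0.05))
```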


  30. Outline: ◮ Introduction: how to detect changes in copy number? ◮ From labels to interval regression ◮ A convex relaxation of the label error ◮ Results on the neuroblastoma data ◮ Conclusions and future work

  31. Learning problem. ◮ Signal features $x_i \in \mathbb{R}^m$, e.g. variance, number of points, etc. ◮ We would like to find a penalty function $f : \mathbb{R}^m \to \mathbb{R}$ with minimal label error,
$$\arg\min_f \sum_{i=1}^n E_i[f(x_i)].$$
◮ But $E_i$ is a non-convex, piecewise constant function. ◮ Grid search? Many features $m$ ⇒ intractable, and no guarantee of finding the global min. ◮ Instead, we propose to minimize a margin-based convex relaxation,
$$\arg\min_f \sum_{i=1}^n \ell_i[f(x_i)].$$
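The surrogate losses ℓ_i are not spelled out on this slide; purely as an illustration of a margin-based convex relaxation, the sketch below uses a hinge penalty at each finite limit of the target interval (an assumption for this sketch, not necessarily the exact loss of the paper).

```python
import numpy as np

def hinge(x):
    """Hinge function (1 - x)_+, a convex upper bound on the 0-1 step."""
    return np.maximum(0.0, 1.0 - x)

def surrogate_loss(w, beta, X, lower, upper):
    """sum_i l_i[f(x_i)] with f(x_i) = w'x_i + beta, using hinge penalties for
    predictions too close to (or beyond) each finite interval limit.
    Illustrative surrogate only; convex in (w, beta)."""
    f = X @ w + beta
    lo_mask, up_mask = np.isfinite(lower), np.isfinite(upper)
    total = np.sum(hinge(f[lo_mask] - lower[lo_mask]))
    total += np.sum(hinge(upper[up_mask] - f[up_mask]))
    return total

# illustrative data (same shapes as the LP sketch above)
X = np.array([[-2.7], [-2.4], [-2.1]])
lower = np.array([-3.0, -2.0, -np.inf])
upper = np.array([-1.0, np.inf, 1.0])
print(surrogate_loss(np.array([0.0]), -1.5, X, lower, upper))
```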
