SLIDE 1

Learning algorithms and statistical software, with applications to bioinformatics

PhD defense of Toby Dylan Hocking
toby.hocking@inria.fr
http://cbio.ensmp.fr/~thocking/
20 November 2012

SLIDE 2

Summary of contributions

◮ Ch. 2: clusterpath for finding groups in data, ICML 2011.
◮ Ch. 3: breakpoint annotations for smoothing model training and evaluation, HAL-00663790.
◮ Ch. 4-5: penalties for breakpoint detection in simulated and real signals, under review.
◮ Statistical software contributions in R:
  ◮ Ch. 7: direct labels for readable statistical graphics, Best Student Poster at useR 2011.
  ◮ Ch. 8: documentation generation to convert comments into a package for distribution, accepted in JSS.
  ◮ Ch. 9: named capture regular expressions for extracting data from text files, talk at useR 2011, accepted into R-2.14.

SLIDE 3

Cancer cells show chromosomal copy number alterations

Spectral karyotypes show the number of copies of the sex chromosomes (X,Y) and autosomes (1-22). Source: Alberts et al. 2002. Normal cell with 2 copies of each autosome. Cancer cell with many copy number alterations.

SLIDE 4

Copy number profiles of neuroblastoma tumors

SLIDE 5
◮ Ch. 2: clusterpath finds groups in data
◮ Ch. 3: breakpoint annotations for smoothing model selection
◮ Ch. 4-5: penalties for breakpoint detection

SLIDE 6

The clusterpath relaxes a hard fusion penalty

The hard fusion problem is combinatorial:

$$\min_{\alpha \in \mathbb{R}^{n \times p}} ||\alpha - X||_F^2 \quad \text{subject to} \quad \sum_{i<j} 1_{\alpha_i \ne \alpha_j} \le t.$$

Relaxation:

$$\sum_{i<j} ||\alpha_i - \alpha_j||_q \, w_{ij} \le t.$$

The clusterpath is the path of optimal $\alpha$ obtained by varying $t$.

Related work: "fused lasso" Tibshirani and Saunders (2005), "convex clustering shrinkage" Pelckmans et al. (2005), "grouping pursuit" Shen and Huang (2010), "sum of norms" Lindsten et al. (2011).

[Figures: geometric sketch of points X1, X2, X3 whose optimal values fuse, $\alpha_C = \alpha_2 = \alpha_3$; and a clusterpath on data with axes labeled survival and number of breakpoints.]
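For concreteness, here is a minimal R sketch of the relaxed fusion penalty and the Gaussian weights defined above. This is illustration only: the clusterpath package implements the actual path algorithms, and the function name below is invented for this sketch.

```r
## Omega(alpha) = sum_{i<j} ||alpha_i - alpha_j||_q * w_ij, with Gaussian
## weights w_ij = exp(-gamma * ||X_i - X_j||_2^2). X is an n x p matrix.
fusion.penalty <- function(alpha, X, gamma = 1, q = 2) {
  n <- nrow(X)
  total <- 0
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    w.ij <- exp(-gamma * sum((X[i, ] - X[j, ])^2))
    total <- total + sum(abs(alpha[i, ] - alpha[j, ])^q)^(1 / q) * w.ij
  }
  total
}

## Example: penalty of the un-shrunk solution alpha = X.
X <- matrix(rnorm(20), 10, 2)
fusion.penalty(X, X, gamma = 0, q = 2)
```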

SLIDE 7

Choice of norm and weights alters the clusterpath

Take $X \in \mathbb{R}^{10 \times 2}$ and solve

$$\min_\alpha ||\alpha - X||_F^2 \quad \text{subject to} \quad \Omega(\alpha)/\Omega(X) \le t, \quad \text{here with } t = 0.$$

Penalty with $\ell_q$ norm: $\Omega(Y) = \sum_{i<j} ||Y_i - Y_j||_q w_{ij}$. Weights: $w_{ij} = \exp(-\gamma ||X_i - X_j||_2^2)$.

[Figure: clusterpath solutions in panels for norm $q \in \{1, 2, \infty\}$ and weights $\gamma \in \{0, 1\}$.]

(Slides 8-17 repeat this figure for constraint bounds $\Omega(\alpha)/\Omega(X) \le 0.1, 0.2, \dots, 1$, animating how the clusterpath splits the points apart as the constraint is relaxed.)

slide-18
SLIDE 18

Clusterpath learns a tree, even for odd cluster shapes

Comparison with other methods for finding 2 clusters. Caveat: it does not recover overlapping clusters, e.g. the iris data or a Gaussian mixture.

SLIDE 19

Contributions in chapter 2, future work

Hocking et al. Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties. ICML 2011.

◮ Theorem: no splits in the ℓ1 clusterpath with identity weights w_ij = 1. What about other situations?
◮ Convex and hierarchical clustering algorithms:
  ◮ ℓ1 homotopy method, O(pn log n).
  ◮ ℓ2 active-set method, O(pn²).
  ◮ ℓ∞ Frank-Wolfe algorithm.
◮ Implementation in R package clusterpath on R-Forge.

SLIDE 20
◮ Ch. 2: clusterpath finds groups in data
◮ Ch. 3: breakpoint annotations for smoothing model selection
◮ Ch. 4-5: penalties for breakpoint detection

SLIDE 21

How to detect breakpoints in 23 × 575 = 13,225 signals?

SLIDE 22

Which model should we use?

◮ GLAD: adaptive weights smoothing (Hupé et al., 2004)
◮ DNAcopy: circular binary segmentation (Venkatraman and Olshen, 2007)
◮ cghFLasso: fused lasso signal approximator with heuristics (Tibshirani and Wang, 2007)
◮ HaarSeg: wavelet smoothing (Ben-Yaacov and Eldar, 2008)
◮ GADA: sparse Bayesian learning (Pique-Regi et al., 2008)
◮ flsa: fused lasso signal approximator path algorithm (Hoefling, 2009)
◮ cghseg: pruned dynamic programming (Rigaill, 2010)
◮ PELT: pruned exact linear time (Killick et al., 2011)

... and how to select the smoothing parameter in each model?
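As an illustration of fitting one of these models, a minimal sketch using Bioconductor's DNAcopy package, assuming vectors logratio, chromosome, and position for one profile:

```r
library(DNAcopy)  # Bioconductor: circular binary segmentation

## Build a copy number array object from one profile and segment it.
cna <- CNA(logratio, chromosome, position, data.type = "logratio")
fit <- segment(smooth.CNA(cna), verbose = 0)
fit$output  # one row per segment: chromosome, start, end, segment mean
```

Each package exposes a different smoothing parameter (e.g. the undo.SD argument of DNAcopy::segment, or the λ of the fused lasso), which is the quantity the annotations are used to select.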

SLIDE 23

575 copy number profiles, each annotated in 6 regions

SLIDE 24

Not enough breakpoints

SLIDE 25

Too many breakpoints

SLIDE 26

Good agreement with annotated regions

SLIDE 27

Select the best model using the breakpoint annotations

Breakpoint detection training errors for 3 models of data(neuroblastoma, package="neuroblastoma").

[Figure: percent incorrectly predicted annotations in the training set versus log10(smoothing parameter λ), in panels for cghseg.k/pelt.n, flsa.norm, and dnacopy.sd (minimum errors roughly 2.2, 4.8, and 11.5 percent respectively), with false.positive, false.negative, and total error curves; more breakpoints to the left, fewer breakpoints to the right.]

Idea: for several smoothing parameters λ, calculate the annotation error function E(λ) (black line), then select the model with least error (black dot):

$$\hat\lambda = \arg\min_\lambda E(\lambda).$$
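A minimal R sketch of this selection rule, where annotation.error is a hypothetical helper returning the percent of incorrectly predicted annotations for a given λ:

```r
## Evaluate the annotation error over a grid of lambda values and keep
## the minimizer: lambda.hat = argmin_lambda E(lambda).
lambda.grid <- 10^seq(-5, 5, by = 0.1)
E <- sapply(lambda.grid, annotation.error)
lambda.hat <- lambda.grid[which.min(E)]
```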

SLIDE 28

PELT/cghseg show the best breakpoint detection

ROC curves for breakpoint detection training errors of each model, obtained by varying the smoothness parameter λ.

[Figure: true positive rate = P(predict breakpoint | breakpoint) versus false positive rate = P(predict breakpoint | normal), in panels for optimization-based models (cghseg.k, flsa.norm, flsa, pelt.n, pelt.default, cghseg.mBIC), approximate optimization (gada, dnacopy.alpha, dnacopy.prune, dnacopy.sd, dnacopy.default), and glad variants (glad.haarseg, glad.lambdabreak, glad.MinBkpWeight, glad.default).]

Open circle shows smoothness λ selected using annotations.

SLIDE 29

Few annotations required for a good breakpoint detector

[Figure: percent of correctly predicted annotations on test-set profiles (80 to 100) as a function of the number of annotated profiles in the global model training set (1 to 30), for glad.lambdabreak, dnacopy.sd, flsa.norm, and cghseg.k/pelt.n.]

SLIDE 30

Interactive web site for annotation and model building

Takita J et al. Aberrations of NEGR1 on 1p31 and MYEOV on 11q13 in neuroblastoma. Cancer Sci. 2011 Sep;102(9):1645-50.

SLIDE 31

Contributions in chapter 3: data, algorithms, comparison, code

Hocking et al. Learning smoothing models using breakpoint annotations. HAL-00663790, Jan 2012.

◮ data(neuroblastoma, package="neuroblastoma") on CRAN: 575 annotated profiles.
◮ Model training and evaluation protocols based on breakpoint annotations.
◮ Quantitative comparison of 17 breakpoint detection models on the neuroblastoma data set.
◮ Free/open-source GUIs for creating annotation databases.

More accurate breakpoint detection?
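A minimal sketch of loading the published data set, assuming the CRAN package stores it as a list of two data frames (profiles and annotations):

```r
## Annotated neuroblastoma copy number profiles, as distributed on CRAN.
data(neuroblastoma, package = "neuroblastoma")
str(neuroblastoma$profiles)     # assumed: profile.id, chromosome, position, logratio
str(neuroblastoma$annotations)  # assumed: profile.id, chromosome, min, max, annotation
```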

SLIDE 32

The cghseg.k/pelt.n least squares model

For a signal $y \in \mathbb{R}^d$, we define $\hat y^k$, the maximum likelihood model with $k \in \{1, \dots, d\}$ segments, as

$$\hat y^k = \arg\min_{\mu \in \mathbb{R}^d} ||y - \mu||_2^2 \quad \text{subject to} \quad k - 1 = \sum_{j=1}^{d-1} 1_{\mu_j \ne \mu_{j+1}}.$$

Lavielle (2005): select the number of segments using the penalty

$$k^*(\lambda) = \arg\min_k \lambda k d + ||y - \hat y^k||_2^2.$$

We use the tradeoff $\lambda$ which maximizes agreement with a database of breakpoint annotations.

But why this particular penalty? Should we normalize using the variance of the signal $y$?
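A sketch of Lavielle's selection rule in R, assuming rss[k] = ||y − ŷ^k||² has already been computed for k = 1, ..., kmax (e.g. by cghseg's pruned dynamic programming):

```r
## k*(lambda) = argmin_k lambda*k*d + RSS_k, for a signal of length d.
select.segments <- function(rss, d, lambda) {
  k <- seq_along(rss)
  which.min(lambda * k * d + rss)
}
```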

SLIDE 34
◮ Ch. 2: clusterpath finds groups in data
◮ Ch. 3: breakpoint annotations for smoothing model selection
◮ Ch. 4-5: penalties for breakpoint detection

SLIDE 35

Simulated signals with varying sampling density

Can we learn a parameter λ on the first signal, and use it for accurate breakpoint detection on the second?

SLIDE 36

Exact breakpoint error curves for 2 signals

[Figure: error relative to latent breakpoints versus the number of segments of the estimated cghseg model, for two simulated signals with bases/probe = 374 and bases/probe = 7.]

For each signal $i$, calculate the breakpoint error function $\text{BErr}_i : \{1, \dots, k_{\max}\} \to \mathbb{R}_+$.

SLIDE 37

Model selection error curves suggest α = 1/2

For each signal $i$, select the number of segments using

$$k_i^\alpha(\lambda) = \arg\min_k \lambda k d_i^\alpha + ||y_i - \hat y_i^k||_2^2.$$

Question: which penalty exponent $\alpha$ is best? Plot the error curves

$$E_i^\alpha(\lambda) = \text{BErr}_i\left[k_i^\alpha(\lambda)\right].$$

[Figure: error curves versus log10(λ) in panels for α ∈ {0, 0.5, 1}, for bases/probe ∈ {7, 374}.]

Then pick the parameter with minimal breakpoint error (dots):

$$\hat\lambda_i^\alpha = \arg\min_{\lambda \in \mathbb{R}_+} E_i^\alpha(\lambda).$$
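A sketch of these error curves in R, assuming rss[k] and the breakpoint error berr[k] are precomputed for one signal of length d:

```r
## Selection rule with penalty lambda * k * d^alpha.
k.alpha <- function(rss, d, lambda, alpha) {
  k <- seq_along(rss)
  which.min(lambda * k * d^alpha + rss)
}

## Error curve E_i^alpha(lambda) = BErr_i[k_i^alpha(lambda)] on a lambda grid.
E.alpha <- function(rss, berr, d, lambdas, alpha)
  sapply(lambdas, function(l) berr[k.alpha(rss, d, l, alpha)])
```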

SLIDE 38

Train and test error curves suggest α = 1/2

$n = 8$ signals $i$ considered, with $d_1 = 70, \dots, d_8 = 70{,}000$ points sampled, and a penalty with term $d_i^\alpha$.

[Figure: total error relative to latent breaks versus penalty exponent α, for train and test curves; both are minimized near α = 1/2.]

$$\text{train}(\alpha) = \min_\lambda \sum_{i=1}^n E_i^\alpha(\lambda), \qquad \text{test}(\alpha) = \sum_{i \ne j} E_i^\alpha(\hat\lambda_j^\alpha).$$

SLIDE 39

Suggested penalties do not work on real data

$$k_i(\lambda) = \arg\min_k \lambda k \sqrt{d_i} + ||y_i - \hat y_i^k||_2^2 \qquad (1)$$

For each signal $i$, normalize for
◮ the number of points sampled $d_i$,
◮ the signal length in base pairs $l_i$,
◮ and the estimated noise $\hat s_i$:

$$k_i(\lambda) = \arg\min_k \lambda k \hat s_i^2 \sqrt{d_i / l_i} + ||y_i - \hat y_i^k||_2^2 \qquad (2)$$

           exponents                      annotation error
model      points  length  variance      train  test.mean  test.sd
cghseg.k   1                             2.19   2.20       1.01
(1)        1/2                           3.51   3.87       1.58
(2)        1/2     -1/2    2             4.18   6.38       7.61

SLIDE 40

In real data, we only have weak annotations

[Figure: logratio versus position on chromosome (mega base pairs) for signals 1.7 and 10.7, with weak annotations: one region labeled 1 breakpoint and one labeled 0 breakpoints.]

SLIDE 41

Model selection and error curves for 2 signals

Optimal number of segments:

$$z_i^*(L) = \arg\min_k \exp(L)\, k + ||y_i - \hat y_i^k||_2^2.$$

[Figure: for signals 1.7 and 10.7, the selected segments $z_i^*(L)$ and the annotation error $E_i(L)$ as functions of the penalty exponent $L$.]

SLIDE 42

Target interval $[\underline{L}_i, \overline{L}_i]$ for 2 signals

[Figure: target interval limits $\underline{L}_i$ and $\overline{L}_i$ in the plane of variance estimate $\log \hat s_i$ versus penalty exponent $L$, for the two signals.]

SLIDE 43

Target interval $[\underline{L}_i, \overline{L}_i]$ for all signals

[Figure: same plane as the previous slide, showing the target intervals of all signals.]

SLIDE 44

Limit point representation, find a separator

[Figure: each interval reduced to its limit points $\underline{L}_i$, $\overline{L}_i$ in the ($\log \hat s_i$, $L$) plane; the goal is to find a separator between the two sets of limits.]

SLIDE 45

Max margin regression line

[Figure: the max margin regression line separating the lower limits $\underline{L}_i$ from the upper limits $\overline{L}_i$ in the ($\log \hat s_i$, $L$) plane.]

SLIDE 46

Learning the penalty function

◮ For every signal $i$, calculate a variance estimate feature $x_i = \log \hat s_i$.
◮ Predict model complexity using an affine function $f(x_i) = w \log \hat s_i + \beta$.
◮ Equivalent to learning a penalty function:

$$z_i^*[f(x_i)] = \arg\min_k ||y_i - \hat y_i^k||_2^2 + \exp[f(x_i)]\, k = \arg\min_k ||y_i - \hat y_i^k||_2^2 + \hat s_i^w e^\beta k.$$
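A sketch of applying the learned penalty to a new signal in R, given coefficients w and beta already fit to the annotation database:

```r
## Choose k with the learned penalty s.hat^w * exp(beta) * k, assuming
## rss[k] = ||y - yhat^k||^2 and a variance estimate s.hat for the signal.
select.k.learned <- function(rss, s.hat, w, beta) {
  k <- seq_along(rss)
  which.min(rss + s.hat^w * exp(beta) * k)
}
```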

SLIDE 47

Original data: annotate the same regions

SLIDE 48

Detailed data: draw rectangles around regions

SLIDE 49

Optimal model complexity depends on variance

$$\arg\min_{\beta \in \mathbb{R},\, w \in \mathbb{R}^m} \; \frac{1}{n} \sum_{i=1}^n l_i(w' x_i + \beta) + \gamma ||w||_1$$
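A minimal sketch of the un-regularized case ($\gamma = 0$, one feature), under the assumption that the surrogate $l_i$ is a squared hinge loss which is zero (with unit margin) when the predicted log penalty falls inside the target interval $[\underline{L}_i, \overline{L}_i]$:

```r
## Squared hinge: positive when z < 1, zero otherwise.
squared.hinge <- function(z) ifelse(z < 1, (1 - z)^2, 0)

## Mean interval loss of prediction w*x + beta against intervals [lo, hi].
interval.loss <- function(theta, x, lo, hi) {
  pred <- theta[1] * x + theta[2]  # w * x_i + beta
  mean(squared.hinge(pred - lo) + squared.hinge(hi - pred))
}

## fit <- optim(c(0, 0), interval.loss, x = x, lo = lo, hi = hi)
```

The loss is convex in (w, β), so any stationary point found by the optimizer is a global minimum.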

SLIDE 51

Error estimated using 10-fold cross-validation

$$f(x_i) = w_1 \log \hat s_i + w_2 \log d_i + \beta$$

◮ cghseg.k: $w_1 = 0$, $w_2 = 1$; learn $\beta$ using grid search to minimize the annotation error $E_i$.
◮ log.s.log.d: learn $\beta, w_1, w_2$ by minimizing the un-regularized ($\gamma = 0$) surrogate loss $l_i$.
◮ L1-reg: variance estimate, signal size, model error, and chromosome indicator features $x_i \in \mathbb{R}^{117}$; cross-validation to choose the degree of $\ell_1$ regularization $\gamma$.

[Figure: percent test annotation error (mean + 1 standard deviation) of L1-reg, log.s.log.d, and cghseg.k on the original and detailed.low.density annotation sets.]
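A sketch of the cross-validation at the profile level, where test.error is a hypothetical helper that trains a penalty model on one set of profiles and returns the percent annotation error on another:

```r
## 10-fold CV over profile ids from the neuroblastoma annotations.
ids <- unique(neuroblastoma$annotations$profile.id)
fold <- sample(rep(1:10, length.out = length(ids)))
cv <- sapply(1:10, function(f) test.error(ids[fold != f], ids[fold == f]))
mean(cv)
```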

SLIDE 52

Conclusions and future work

◮ Penalties suggested by theory and simulations are suboptimal in real data.
◮ Annotations + convex optimization = learned penalties.
◮ Learned penalties show good change-point detection.
◮ Learning more general penalty functions?
◮ Active learning strategy for picking the next signal to annotate?
◮ Do annotation-guided models help to predict clinical patient outcome?

SLIDE 53

Acknowledgements

Labs:
◮ INRIA-Sierra
◮ INSERM U830
◮ Institut Curie bioinformatics (U900)

Coauthors:
◮ Armand Joulin
◮ Isabelle Janoueix-Lerosey
◮ Gudrun Schleiermacher
◮ Olivier Delattre
◮ Guillem Rigaill

Thank you!
