
Kernel Methods for Predictive Sequence Analysis

Cheng Soon Ong1,2 and Gunnar Rätsch1

1 Friedrich Miescher Laboratory, Tübingen 2 Max Planck Institute for Biological Cybernetics, Tübingen

Tutorial at the German Conference on Bioinformatics, September 19, 2006. http://www.fml.mpg.de/raetsch/projects/gcbtutorial


Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 2

Machine learning & support vector machines
Kernels
  Basics
  Substring kernels (Spectrum, WD, ...)
  Efficient data structures
  Other kernels (Fisher Kernel, ...)
Some theoretical aspects
Loss functions & Regularization
  Regression & Multi-Class problems
  Representer Theorem
Extensions
Applications

Classification of Sequences

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 3

Example: Recognition of splice sites
  Every 'AG' is a possible acceptor splice site
  Computer has to learn what splice sites look like, given some known genes/splice sites ...
  Prediction on unknown DNA

From Sequences to Features

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 4

Many algorithms depend on numerical representations. Each example is a vector of values (features). Use background knowledge to design good features.

Sequence windows around the intron/exon boundary are converted into feature vectors, one column per example:

Feature     x1   x2   x3   x4   x5   x6   x7   x8   ...
GC before   0.6  0.2  0.4  0.3  0.2  0.4  0.5  0.5  ...
GC after    0.7  0.7  0.3  0.6  0.3  0.4  0.7  0.6  ...
AGAGAAG     1    1    1    ...  (binary motif indicator)
TTTAG       1    1    1    1    ...  (binary motif indicator)
Label       +1   +1   +1   −1   −1   +1   −1   −1   ...
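As a rough illustration of this feature construction (a sketch, not code from the tutorial), the snippet below computes GC-content and binary motif-indicator features for a candidate acceptor-site window; the window size and motif list are illustrative assumptions.

```python
# Minimal sketch: turn a candidate 'AG' site window into a numeric feature vector.
# The motifs and window size are illustrative placeholders, not the tutorial's choices.

def gc_content(seq):
    """Fraction of G/C characters in a DNA string."""
    return (seq.count("G") + seq.count("C")) / max(len(seq), 1)

def features(candidate, motifs=("AGAGAAG", "TTTAG"), window=60):
    """candidate: sequence window centred on a potential acceptor site 'AG'."""
    mid = len(candidate) // 2
    before, after = candidate[:mid], candidate[mid:]
    feats = {
        "GC_before": gc_content(before[-window:]),
        "GC_after": gc_content(after[:window]),
    }
    for m in motifs:                      # binary motif indicators
        feats[m] = int(m in candidate)
    return feats

print(features("TTTTTAGAGAAGTTTAGAGGCGGCGATCGGCGC"))
```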

Numerical Representation

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 5


Recognition of Splice Sites

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 6

Given: Potential acceptor splice sites

intron exon

Goal: Rule that distinguishes true from false ones, e.g. exploit that exons have a higher GC content or that certain motifs are located nearby.

Recognition of Splice Sites

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 7

Given: Potential acceptor splice sites

intron exon

Goal: Rule that distinguishes true from false ones Linear classifiers with large margin

Empirical Inference

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 8

The machine utilizes information from training data to predict the outputs associated with a particular test example. Use training data to “train” the machine. Use the trained machine to perform prediction on test data.

Machine Learning: Main Tasks

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 9

Supervised Learning: We have both examples and labels for each example. The aim is to learn about the pattern between examples and labels.
Unsupervised Learning: We do not have labels for the examples, and wish to discover the underlying structure of the data.
Reinforcement Learning: How an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals.


How to measure performance?

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 10

Important not just to memorize the training examples! Use some of the labeled examples for validation. We assume that the future examples are similar to our labeled examples.

Measuring performance

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 11

What to do in practice: We split the data into training and validation sets, and use the error on the validation set to estimate the expected error.

  • A. Cross validation

Split data into c disjoint parts, and use each subset as the validation set, while using the rest as the training set.

  • B. Random splits

Randomly split the data set into two parts, for example 80% of the data for training and 20% for validation. This is usually repeated many times. Report the mean and standard deviation of the performance on the validation sets.
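A small sketch (assuming scikit-learn and a generic labeled data set, not the tutorial's data) of both evaluation schemes:

```python
# Sketch of c-fold cross-validation and repeated random splits with scikit-learn.
# The data set and classifier are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, ShuffleSplit
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = SVC(kernel="linear", C=1.0)

# A. c-fold cross-validation (here c = 5 disjoint parts)
cv_scores = cross_val_score(clf, X, y, cv=5)
print("5-fold CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

# B. repeated random 80%/20% splits
splits = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
rs_scores = cross_val_score(clf, X, y, cv=splits)
print("random-split accuracy: %.3f +/- %.3f" % (rs_scores.mean(), rs_scores.std()))
```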

Classifier: depends on training data

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 12

Consider linear classifiers with parameters $w, b$:
$f(x) = \sum_{j=1}^{d} w_j x_j + b = \langle w, x\rangle + b$

Classifier: SVM

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 13

Minimize   $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i$
Subject to $y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1,\dots,N$.
Called the soft margin SVM or the C-SVM [Cortes and Vapnik, 1995]. The examples on the margin are called support vectors [Vapnik, 1995].
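As a concrete, hedged illustration (toy data and C are arbitrary choices, not from the tutorial), the sketch below fits scikit-learn's soft-margin C-SVM and reports its support vectors:

```python
# Sketch: fit a soft-margin (C-)SVM on toy 2-D data and inspect its support vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([+1] * 20 + [-1] * 20)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("w =", svm.coef_[0], "b =", svm.intercept_[0])
print("number of support vectors:", len(svm.support_))  # examples with alpha_i > 0
```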


SVM is dependent on training data

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 14

Minimize   $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i$
Subject to $y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1,\dots,N$.

Substituting the Representer Theorem $w = \sum_{i=1}^{N}\alpha_i x_i$ gives the equivalent problem:

Minimize   $\frac{1}{2}\sum_{i,j}^{N}\alpha_i\alpha_j\langle x_i, x_j\rangle + C\sum_{i=1}^{N}\xi_i$
Subject to $y_i\bigl(\sum_{j=1}^{N}\alpha_j\langle x_j, x_i\rangle + b\bigr) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1,\dots,N$.

⇒ The SVM solution only depends on scalar products between examples (kernel trick).

Summary: Empirical Inference

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 15

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 16

Machine learning & support vector machines Kernels Basics Substring kernels (Spectrum, WD, . . . ) Efficient data structures Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications

Recognition of Splice Sites

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 17

Given: Potential acceptor splice sites

intron exon

Goal: Rule that distinguishes true from false ones Linear Classifiers with large margin


Recognition of Splice Sites

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 18

Given: Potential acceptor splice sites

intron exon

Goal: Rule that distinguishes true from false ones More realistic problem!?

Not linearly separable! Need nonlinear separation!? Need more features!?

Nonlinear Algorithms in Feature Space

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 19

Linear separation might not be sufficient! ⇒ Map into a higher-dimensional feature space.
Example: all second-order monomials
$\Phi : \mathbb{R}^2 \to \mathbb{R}^3, \quad (x_1, x_2) \mapsto (z_1, z_2, z_3) := (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$

[Figure: data that is not linearly separable in the input space $(x_1, x_2)$ becomes linearly separable in the feature space $(z_1, z_2, z_3)$.]

Kernel “Trick”

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 20

Example: $x \in \mathbb{R}^2$ and $\Phi(x) := (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$  [Boser et al., 1992]

$\langle\Phi(x), \Phi(y)\rangle = \langle(x_1^2, \sqrt{2}\,x_1 x_2, x_2^2), (y_1^2, \sqrt{2}\,y_1 y_2, y_2^2)\rangle = \langle(x_1, x_2), (y_1, y_2)\rangle^2 = \langle x, y\rangle^2 =: k(x, y)$

The scalar product in feature space (here $\mathbb{R}^3$) can be computed in input space (here $\mathbb{R}^2$)! Also works for higher orders and dimensions ⇒ relatively low-dimensional input spaces ⇒ very high-dimensional feature spaces. Works only for Mercer kernels $k(x, y)$.
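This identity is easy to verify numerically; the following short check (an illustration, not tutorial code) compares the feature-space scalar product with the kernel value for arbitrary points:

```python
# Numerical check of <Phi(x), Phi(y)> = <x, y>^2 for the second-order
# monomial map Phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
import numpy as np

def phi(v):
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.5, -0.5])
y = np.array([0.2, 2.0])

lhs = phi(x) @ phi(y)   # scalar product in feature space R^3
rhs = (x @ y) ** 2      # kernel evaluated in input space R^2
print(lhs, rhs, np.isclose(lhs, rhs))
```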

Kernology I

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 21

If $k$ is a continuous kernel of a positive integral operator on $L_2(D)$ (where $D$ is some compact space), i.e.
$\int_D\int_D f(x)\,k(x, y)\,f(y)\,dx\,dy \ge 0$ for all $f \in L_2(D)$,
then it can be expanded as
$k(x, y) = \sum_{i=1}^{N_F}\lambda_i\psi_i(x)\psi_i(y)$ with $\lambda_i > 0$, and $N_F \in \mathbb{N}$ or $N_F = \infty$.
In that case $\Phi(x) := \bigl(\sqrt{\lambda_1}\,\psi_1(x), \sqrt{\lambda_2}\,\psi_2(x), \dots\bigr)$ satisfies $\langle\Phi(x), \Phi(y)\rangle = k(x, y)$ [Mercer, 1909].


Kernology II

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 22

Common kernels [Vapnik, 1995, Müller et al., 2001, Schölkopf and Smola, 2002]:
  Polynomial           $k(x, y) = (\langle x, y\rangle + c)^d$
  Sigmoid              $k(x, y) = \tanh(\kappa\langle x, y\rangle + \theta)$
  RBF                  $k(x, y) = \exp\bigl(-\|x - y\|^2/(2\sigma^2)\bigr)$
  Convex combinations  $k(x, y) = \beta_1 k_1(x, y) + \beta_2 k_2(x, y)$
  Normalization        $k(x, y) = k'(x, y) / \sqrt{k'(x, x)\,k'(y, y)}$

Notes: A kernel implies a mapping $\Phi$ to a feature space. In this potentially infinite-dimensional space one finds a linear separating hyperplane. Every kernel corresponds to a regularization operator, implying different smoothness properties in input space.
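As a hedged illustration (my own sketch, with arbitrary parameter defaults), these kernels can be written directly as small functions:

```python
# The kernels listed above as plain NumPy functions on vectors.
import numpy as np

def k_poly(x, y, c=1.0, d=2):        # polynomial kernel
    return (x @ y + c) ** d

def k_rbf(x, y, sigma=1.0):          # RBF (Gaussian) kernel
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

def k_normalized(k, x, y):           # normalization of an arbitrary kernel k
    return k(x, y) / np.sqrt(k(x, x) * k(y, y))

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(k_poly(x, y), k_rbf(x, y), k_normalized(k_poly, x, y))
```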

SVMs with kernels (Primal)

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 23

Minimize   $\frac{1}{2}\sum_{i,j}^{N}\alpha_i\alpha_j\langle\Phi(x_i), \Phi(x_j)\rangle + C\sum_{i=1}^{N}\xi_i$
Subject to $y_i\bigl(\sum_{j=1}^{N}\alpha_j\langle\Phi(x_j), \Phi(x_i)\rangle + b\bigr) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1,\dots,N$.

Replacing $\langle\Phi(x_i), \Phi(x_j)\rangle$ by the kernel $k(x_i, x_j)$:

Minimize   $\frac{1}{2}\sum_{i,j}^{N}\alpha_i\alpha_j k(x_i, x_j) + C\sum_{i=1}^{N}\xi_i$
Subject to $y_i\bigl(\sum_{j=1}^{N}\alpha_j k(x_j, x_i) + b\bigr) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1,\dots,N$.

Hyperplane $y = \operatorname{sign}(\langle w, \Phi(x)\rangle + b)$ in $\mathcal{F}$

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 24

minimize   $\|w\|^2 + C\sum_{i=1}^{N}\xi_i$
w.r.t. $w \in \mathcal{F}$, $b \in \mathbb{R}$, $\xi_i \ge 0$ ($i = 1,\dots,N$)
subject to $y_i(\langle w, \Phi(x_i)\rangle + b) \ge 1 - \xi_i$ ($i = 1,\dots,N$)

Lagrangian with multipliers $\alpha_i \ge 0$ ($i = 1,\dots,N$):
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N}\alpha_i\bigl(y_i(\langle w, \Phi(x_i)\rangle + b) - 1\bigr)$.

Obtain the unique $\alpha_i$ by a QP: the dual problem.
$\frac{\partial}{\partial b}L(w, b, \alpha) = 0$ and $\frac{\partial}{\partial w}L(w, b, \alpha) = 0$ give
$\sum_{i=1}^{N}\alpha_i y_i = 0$ and $w = \sum_{i=1}^{N}\alpha_i y_i\Phi(x_i)$.
Substitute both into $L$ to get the dual problem.

Dual problem

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 25

maximize   $W(\alpha) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j y_i y_j\,\underbrace{\langle\Phi(x_i), \Phi(x_j)\rangle}_{=\,k(x_i, x_j)}$
subject to $0 \le \alpha_i \le C$ ($i = 1,\dots,N$) and $\sum_{i=1}^{N}\alpha_i y_i = 0$.

Note: the solution is determined by the training examples (SVs) on the edge of or in the margin area:
$y_i[\langle w, \Phi(x_i)\rangle + b] > 1 \Rightarrow \alpha_i = 0 \Rightarrow x_i$ irrelevant
$y_i[\langle w, \Phi(x_i)\rangle + b] \le 1$ (in margin area) $\Rightarrow x_i$ support vector

See e.g. Vapnik [1995], Müller et al. [2001], Schölkopf and Smola [2002] for more details.

SVMs with kernels (Primal & Dual)

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 26

Primal:
Minimize   $\frac{1}{2}\sum_{i,j}^{N}\alpha_i\alpha_j k(x_i, x_j) + C\sum_{i=1}^{N}\xi_i$
Subject to $y_i\bigl(\sum_{j=1}^{N}\alpha_j k(x_j, x_i) + b\bigr) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1,\dots,N$.

Dual:
Maximize   $\sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j}^{N}\alpha_i\alpha_j\langle\Phi(x_i), \Phi(x_j)\rangle$
Subject to $\sum_{i=1}^{N}\alpha_i y_i = 0$ and $0 \le y_i\alpha_i \le C$ for all $i = 1,\dots,N$.

Summary “Kernel Trick”

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 27

Representer Theorem: $w = \sum_{i=1}^{N}\alpha_i\Phi(x_i)$.
Hyperplane in $\mathcal{F}$: $y = \operatorname{sgn}(\langle w, \Phi(x)\rangle + b)$.
Putting things together:
$f(x) = \operatorname{sgn}(\langle w, \Phi(x)\rangle + b) = \operatorname{sgn}\bigl(\sum_{i=1}^{N}\alpha_i\langle\Phi(x_i), \Phi(x)\rangle + b\bigr) = \operatorname{sgn}\bigl(\sum_{i:\alpha_i \ne 0}\alpha_i k(x_i, x) + b\bigr)$   (sparse!)
Trick: $k(x, y) = \langle\Phi(x), \Phi(y)\rangle$, i.e. do not use $\Phi$, but $k$!

See e.g. Vapnik [1995], Müller et al. [2001], Schölkopf and Smola [2002] for details.
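To make the "only kernel values are needed" point concrete, here is a small sketch (not from the tutorial) that trains scikit-learn's SVC on a hand-built RBF kernel matrix and predicts from test-versus-train kernel values only; the toy data and sigma are assumptions.

```python
# Sketch: an SVM never needs Phi explicitly -- a precomputed kernel matrix
# (here an RBF kernel built by hand) suffices for training and prediction.
import numpy as np
from sklearn.svm import SVC

def rbf_matrix(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.RandomState(1)
X_train = rng.randn(40, 2)
y_train = np.sign(X_train[:, 0] * X_train[:, 1])   # a nonlinear toy labelling
X_test = rng.randn(5, 2)

svm = SVC(kernel="precomputed", C=10.0)
svm.fit(rbf_matrix(X_train, X_train), y_train)
# prediction uses only k(x_i, x) between test points and training points
print(svm.predict(rbf_matrix(X_test, X_train)))
```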

Toy Examples

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 28

Linear kernel: $k(x, y) = \langle x, y\rangle$.   RBF kernel: $k(x, y) = \exp(-\|x - y\|^2/(2\sigma))$.

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 29

Machine learning & support vector machines Kernels Basics Substring kernels (Spectrum, WD, . . . ) Efficient data structures Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications


Recognition of Splice Sites

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 30

Given: Potential acceptor splice sites

intron exon

Goal: Rule that distinguishes true from false ones More realistic problem!?

Not linearly separable! Need nonlinear separation!? Need more features!?

More Features?

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 31

Some ideas: statistics for all four letters (or even dimer/codon usage), appearance of certain motifs, information content, secondary structure, ...
Approaches:
  Manually generate a few strong features (requires background knowledge; nonlinear decisions often beneficial)
  Include many potentially useful weak features (requires more training examples)
Best in practice: a combination of both.

Spectrum Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 32

General idea [Leslie et al., 2002]: For each ℓ-mer $s \in \Sigma^\ell$, the coordinate indexed by $s$ is the number of times $s$ occurs in sequence $x$. The ℓ-spectrum feature map is
$\Phi^{\mathrm{Spectrum}}(x) = (\phi_s(x))_{s \in \Sigma^\ell}$,
where $\phi_s(x)$ is the number of occurrences of $s$ in $x$. The spectrum kernel is the inner product in the feature space defined by this map:
$k^{\mathrm{Spectrum}}(x, x') = \langle\Phi^{\mathrm{Spectrum}}(x), \Phi^{\mathrm{Spectrum}}(x')\rangle$
Dimensionality: exponential in ℓ: $|\Sigma|^\ell$.

Spectrum Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 33

Principle: The spectrum kernel counts exactly matching common ℓ-mers. $\Phi(x)$ has only very few non-zero dimensions ⇒ efficient kernel computation possible ($O(|x| + |x'|)$).
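A minimal sketch of such a spectrum kernel via sparse ℓ-mer counts (an illustrative implementation, not the tutorial's):

```python
# Spectrum kernel sketch: count l-mers with a dictionary (sparse Phi) and
# take the inner product of the two sparse count vectors.
from collections import Counter

def spectrum_map(x, l):
    """Sparse l-spectrum feature map: l-mer -> number of occurrences in x."""
    return Counter(x[i:i + l] for i in range(len(x) - l + 1))

def spectrum_kernel(x, y, l=3):
    px, py = spectrum_map(x, l), spectrum_map(y, l)
    if len(px) > len(py):              # iterate over the smaller map
        px, py = py, px
    return sum(c * py.get(s, 0) for s, c in px.items())

print(spectrum_kernel("ACGTACGTGGT", "ACGTTTACGT", l=3))
```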


Substring Kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 34

General idea: Count common substrings in two strings. Sequences are deemed the more similar, the more common substrings they contain.
Variations:
  Allow for gaps (include wildcards)
  Allow for mismatches (include substitutions)
  Motif kernels (assign weights to substrings)

Gappy Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 35

General idea [Lodhi et al., 2002, Leslie and Kuang, 2004]: Allow for gaps in common substrings → “subsequences”. A g-mer then contributes to all its ℓ-mer subsequences:
$\phi^{\mathrm{Gap}}_{(g,\ell)}(s) = (\phi_\beta(s))_{\beta \in \Sigma^\ell}$
For a sequence $x$ of any length, the map is extended as
$\phi^{\mathrm{Gap}}_{(g,\ell)}(x) = \sum_{g\text{-mers } s \text{ in } x}\phi^{\mathrm{Gap}}_{(g,\ell)}(s)$
The gappy kernel is the inner product in the feature space defined by this map:
$k^{\mathrm{Gap}}_{(g,\ell)}(x, x') = \langle\Phi^{\mathrm{Gap}}_{(g,\ell)}(x), \Phi^{\mathrm{Gap}}_{(g,\ell)}(x')\rangle$

Gappy Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 36

Principle: The gappy kernel counts common ℓ-subsequences of g-mers.

Wildcard Kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 37

General idea [Leslie and Kuang, 2004]: Augment the alphabet Σ by a wildcard character ∗: $\Sigma \cup \{*\}$. Given $s$ from $\Sigma^\ell$ and $\beta$ from $(\Sigma \cup \{*\})^\ell$ with at most $m$ occurrences of ∗, the ℓ-mer $s$ contributes to the ℓ-mer pattern $\beta$ if their non-wildcard characters match. For a sequence $x$ of any length, the map is given by
$\phi^{\mathrm{Wildcard}}_{(\ell,m,\lambda)}(x) = \sum_{\ell\text{-mers } s \text{ in } x}(\phi_\beta(s))_{\beta \in W}$,
where $\phi_\beta(s) = \lambda^j$ if $s$ matches pattern $\beta$ containing $j$ wildcards, $\phi_\beta(s) = 0$ if $s$ does not match $\beta$, and $0 \le \lambda \le 1$.

Wildcard Kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 38

Principle Wildcard kernel: Count ℓ-mers that match except for wildcards

Mismatch Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 39

General idea [Leslie et al., 2003]: Do not enforce strictly exact matches. Define the mismatch neighborhood of an ℓ-mer $s$ with up to $m$ mismatches:
$\phi^{\mathrm{Mismatch}}_{(\ell,m)}(s) = (\phi_\beta(s))_{\beta \in \Sigma^\ell}$
For a sequence $x$ of any length, the map is extended as
$\phi^{\mathrm{Mismatch}}_{(\ell,m)}(x) = \sum_{\ell\text{-mers } s \text{ in } x}\phi^{\mathrm{Mismatch}}_{(\ell,m)}(s)$
The mismatch kernel is the inner product in the feature space defined by this map:
$k^{\mathrm{Mismatch}}_{(\ell,m)}(x, x') = \langle\Phi^{\mathrm{Mismatch}}_{(\ell,m)}(x), \Phi^{\mathrm{Mismatch}}_{(\ell,m)}(x')\rangle$

Mismatch Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 40

Principle Mismatch kernel: Count common ℓ-mers with max. m mismatches

Substitution Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 41

General idea [Leslie and Kuang, 2004]: mismatch neighborhood → substitution neighborhood. An ℓ-mer $s = a_1 a_2 \dots a_\ell$ then contributes to all ℓ-mers in its substitution neighborhood
$M_{(\ell,\sigma)}(s) = \{\beta = b_1 b_2 \dots b_\ell \in \Sigma^\ell : -\sum_i\log P(a_i \mid b_i) < \sigma\}$
For a sequence $x$ of any length, the map is extended as
$\phi^{\mathrm{Sub}}_{(\ell,\sigma)}(x) = \sum_{\ell\text{-mers } s \text{ in } x}\phi^{\mathrm{Sub}}_{(\ell,\sigma)}(s)$
The substitution kernel is then
$k^{\mathrm{Sub}}_{(\ell,\sigma)}(x, x') = \langle\Phi^{\mathrm{Sub}}_{(\ell,\sigma)}(x), \Phi^{\mathrm{Sub}}_{(\ell,\sigma)}(x')\rangle$

Substitution Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 42

Principle Substitution kernel: Count common ℓ-subsequences in substitution neighborhood

Motif kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 43

General idea: Conserved motifs in sequences indicate structural and functional characteristics. Model a sequence as a feature vector representing motifs: the i-th vector component is 1 ⇔ x contains the i-th motif.
Motif databases:
  Protein: Pfam, PROSITE, ...
  DNA: Transfac, Jaspar, ...
  RNA: Rfam, structures, regulatory sequences, ...
Generated by manual construction/prior knowledge or multiple sequence alignment (do not use the test set!).

Simulation Example

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 44

Linear kernel on GC-content features vs. spectrum kernel $k^{\mathrm{Spectrum}}(x, x')$.

Position Dependence

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 45

Given: Potential acceptor splice sites

intron exon

Goal: Rule that distinguishes true from false ones.
The position of a motif is important ('T'-rich just before 'AG'); the spectrum kernel is blind w.r.t. positions.
New kernels for sequences with constant length: a substring kernel per position (sum over positions) can detect motifs at specific positions, but is weak if positions vary.
Extension: allow “shifting”.


Weighted Degree Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 46

The weighted degree kernel compares two sequences by identifying the largest matching blocks, which contribute depending on their length [Rätsch and Sonnenburg, 2004]. It is equivalent to a mixture of spectrum kernels (up to order ℓ) at every position for appropriately chosen weights w (depending on ℓ). The weighted degree kernel with shifts allows matching subsequences to be offset from each other [Rätsch et al., 2005].
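The following is a rough sketch (not the authors' implementation) of such a position-wise weighted degree kernel for equal-length strings; the uniform weights used here are a placeholder rather than the weighting from Rätsch and Sonnenburg [2004].

```python
# Weighted degree kernel sketch: sum of position-wise substring matches up to
# order l_max, with an illustrative uniform weighting.
def wd_kernel(x, y, l_max=3, weights=None):
    assert len(x) == len(y), "WD kernel assumes sequences of constant length"
    L = len(x)
    if weights is None:
        weights = [1.0 / l_max] * l_max          # placeholder weighting
    value = 0.0
    for d, w in enumerate(weights, start=1):     # substring order d = 1..l_max
        for i in range(L - d + 1):
            if x[i:i + d] == y[i:i + d]:         # match at this position
                value += w
    return value

print(wd_kernel("AAACGTAG", "AATCGTAG"))
```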

Substring Kernel Comparison

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 47

Compared kernels: linear kernel on GC-content features, spectrum kernel, Weighted Degree kernel, Weighted Degree kernel with shifts.
Remark: Higher-order substring kernels typically exploit that correlations appear locally and not between arbitrary parts of the sequence (unlike e.g. the polynomial kernel).

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 48

Machine learning & support vector machines Kernels Basics Substring kernels (Spectrum, WD, . . . ) Efficient data structures Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications

Fast string kernels?

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 49

The direct approach is slow: the number of ℓ-mers grows exponentially with ℓ, hence the runtime of trivial implementations degenerates.
Solution: Use index structures to speed up computation of
  single kernel values $k(x, x') = \langle\Phi(x), \Phi(x')\rangle$
  kernel (sub-)matrices $k(x_i, x_j)$, $i \in I$, $j \in J$
  linear combinations of kernel elements $f(x) = \sum_{i=1}^{N}\alpha_i k(x_i, x) = \sum_{i=1}^{N}\alpha_i\langle\Phi(x_i), \Phi(x)\rangle$
Idea: Exploit that $\Phi(x)$ and also $\sum_{i=1}^{N}\alpha_i\Phi(x_i)$ are sparse: explicit maps, (suffix) trees/tries/arrays.


Efficient data structures

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 50

$v = \Phi(x)$ is very sparse. Computing with $v$ requires efficient operations on single dimensions, e.g. lookup of $v_s$ or update $v_s \leftarrow v_s + \alpha$.
Use trees or arrays to store only the non-zero elements ⇒ the substring $s$ is the index into the tree or array.
Leads to more efficient optimization algorithms:
  Precompute $v = \sum_{i=1}^{N}\alpha_i\Phi(x_i)$
  Compute $\sum_{i=1}^{N}\alpha_i k(x_i, x)$ by $\sum_{s \text{ substring in } x} v_s$
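A hedged sketch of this precompute-and-lookup idea for the spectrum kernel, with a plain Python dict standing in for the tree/trie/array index described above; the sequences and alpha values are made up.

```python
# Precompute v = sum_i alpha_i Phi(x_i) for the spectrum kernel, then evaluate
# sum_i alpha_i k(x_i, x) by looking up v_s for the substrings of x.
from collections import Counter, defaultdict

def spectrum_map(x, l=3):
    return Counter(x[i:i + l] for i in range(len(x) - l + 1))

def precompute_v(train_seqs, alphas, l=3):
    v = defaultdict(float)                       # sparse weighting over l-mers
    for x_i, a_i in zip(train_seqs, alphas):
        for s, count in spectrum_map(x_i, l).items():
            v[s] += a_i * count                  # v_s <- v_s + alpha_i * phi_s(x_i)
    return v

def decision_value(v, x, l=3):
    # sum_i alpha_i k(x_i, x) via lookups of v_s for the substrings of x
    return sum(v.get(s, 0.0) * c for s, c in spectrum_map(x, l).items())

v = precompute_v(["ACGTACGT", "TTTAGGAG"], alphas=[0.7, -0.3])
print(decision_value(v, "ACGTTTAG"))
```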

Example: Trees & Tries

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 51

A tree (trie) data structure stores sparse weightings on sequences (and their subsequences). Illustration: three sequences AAA, AGA, GAA were added to a trie (the α's are the weights of the sequences).
Useful for [Sonnenburg et al., 2006a]: spectrum kernel (tree), mixed-order spectrum kernel (trie), weighted degree kernel (L tries).

Results with WD Kernel (human acceptors)

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 52

Computing time (s) and area under the ROC curve for the WD kernel without and with tries:

N                  WD        WD w/ tries    ROC (%)
500                17        83             75.61
1,000              17        83             79.70
5,000              28        105            90.38
10,000             47        134            92.79
30,000             195       266            94.73
50,000             441       389            95.48
100,000            1,794     740            96.13
500,000            31,320    7,757          96.93
1,000,000          102,384   26,190         97.20
2,000,000                    (115,944)      97.36
5,000,000                    (764,144)      97.52
10,000,000                   (2,825,816)    97.64
10,000,000 (PWMs)                           96.03

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 53

Machine learning & support vector machines Kernels Basics Substring kernels (Spectrum, WD, . . . ) Efficient data structures Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications


Fisher Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 54

General idea [Jaakkola et al., 2000, Tsuda et al., 2002a]: Combine probabilistic models and SVMs (best-paper award at ISMB 1999).
Sequence representation:
  Arbitrary-length sequences s
  Probabilistic model $p(s\mid\theta)$ (e.g. HMMs, PWMs)
  Maximum likelihood estimate $\theta^* \in \mathbb{R}^d$
  Transformation into Fisher score features $\Phi(s) \in \mathbb{R}^d$: $\Phi(s) = \frac{\partial p(s\mid\theta)}{\partial\theta}$
  Describes the contribution of every parameter to $p(s\mid\theta)$
  $k(s, s') = \langle\Phi(s), \Phi(s')\rangle$

Example: Fisher Kernel on PWMs

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 55

Fixed-length sequences $s \in \Sigma^N$. PWMs: $p(s\mid\theta) = \prod_{i=1}^{N}\theta_{i,s_i}$
Fisher score features: $(\Phi(s))_{i,\sigma} = \frac{d\,p(s\mid\theta)}{d\,\theta_{i,\sigma}} = \mathrm{Id}(s_i = \sigma)$
Kernel: $k(s, s') = \langle\Phi(s), \Phi(s')\rangle = \sum_{i=1}^{N}\mathrm{Id}(s_i = s'_i)$
Identical to the WD kernel of order 1.
Note: Marginalized count kernels [Tsuda et al., 2002b] can be understood as a generalization of Fisher kernels.
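A small sketch of this order-1 example (my own illustration, not the tutorial's code): the per-position indicator features are just a one-hot encoding, so the kernel counts matching positions.

```python
# PWM example above: (Phi(s))_{i,sigma} = Id(s_i == sigma), so k counts matches.
import numpy as np

ALPHABET = "ACGT"

def phi(s):
    """Per-position one-hot encoding, flattened to a vector."""
    out = np.zeros((len(s), len(ALPHABET)))
    for i, ch in enumerate(s):
        out[i, ALPHABET.index(ch)] = 1.0
    return out.ravel()

def k(s, t):
    return float(phi(s) @ phi(t))   # = number of positions with s_i == t_i

print(k("ACGTA", "ACCTA"))          # 4 matching positions
```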

Pairwise comparison kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 56

General idea [Liao and Noble, 2002]: Employ an empirical kernel map on Smith-Waterman/BLAST scores.
Advantage: Utilizes decades of practical experience with BLAST.
Disadvantage: High computational cost ($O(N^3)$).
Alleviation: Employ BLAST instead of Smith-Waterman; use a smaller subset for the empirical map.

Local Alignment Kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 57

To compute the score of an alignment, one needs a substitution matrix $S \in \mathbb{R}^{\Sigma\times\Sigma}$ and a gap penalty $g : \mathbb{N} \to \mathbb{R}$. An alignment $\pi$ is then scored as follows:

CGGSLIAMM----WFGV
|...|||||....||||
C---LIVMMNRLMWFGV

$s_{S,g}(\pi) = S(C, C) + S(L, L) + S(I, I) + S(A, V) + 2S(M, M) + S(W, W) + S(F, F) + S(G, G) + S(V, V) - g(3) - g(4)$
Smith-Waterman score (not positive definite): $SW_{S,g}(x, y) := \max_{\pi\in\Pi(x,y)} s_{S,g}(\pi)$
Local Alignment Kernel [Vert et al., 2004]: $K_\beta(x, y) = \sum_{\pi\in\Pi(x,y)}\exp(\beta\,s_{S,g}(\pi))$

Haussler’s R-convolution kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 58

Composite objects: objects that consist of substructures, e.g. a graph consists of nodes and edges; a string consists of substrings.
Haussler’s idea: Build kernels for composite objects from kernels on the substructures.
Mathematical prerequisites: An object $x \in \mathcal{X}$ is composed of parts $x_d \in \mathcal{X}_d$, where $d = 1,\dots,D$. $R$ is a relation such that $R(x_1,\dots,x_D, x) = 1$ iff $x_1,\dots,x_D$ constitute the composite object $x$; $R$ is zero otherwise.
Haussler’s R-convolution kernel

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 59

R-convolution: $k_d$ is a kernel defined on $\mathcal{X}_d$. Then the R-convolution of $k_1,\dots,k_D$ is
$(k_1 \star \dots \star k_D)(x, x') := \sum_{R}\prod_{d=1}^{D}k_d(x_d, x'_d)$
For $R$ finite, this is a valid kernel.
Meaning: $x$ and $x'$ are compared by comparing all their decompositions into parts; the decompositions are compared via kernels on the parts.

Application: Remote Homology

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 60

Homologs have common ancestors. Structures and functions are more conserved than sequences. Remote homologs cannot easily be detected by direct sequence comparison.

(Thanks to J.-P. Vert for providing the slides on remote homology detection.)

SCOP Database & Experiment

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 61

Goal: Recognize the superfamily.
Training: for a sequence, positive examples come from the same superfamily but a different family; negative examples come from other superfamilies.
Test: Predict the superfamily.


Difference in Performance

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 62

Performance on SCOP superfamily benchmark [Vert et al., 2004] ROC50 is the area under the ROC curve up to the first 50 FPs

Kernel Summary

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 63

Kernels extend SVMs to nonlinear decision boundaries while keeping the simplicity of linear classification.
Good kernel design is important for every single data analysis task.
String kernels perform computations in very high-dimensional feature spaces.
Kernels on strings can be: substring kernels (e.g. Spectrum & WD kernel), based on probabilistic methods (e.g. Fisher kernel), or derived from similarity measures (e.g. alignment kernels).
Not mentioned: kernels on graphs, images, structures.
Applications go far beyond computational biology.

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 64

Machine learning & support vector machines Kernels Substring kernels (Spectrum, WD, . . . ) Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Margins & Complexity Control Model Selection Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications

Simple vs Complex Functions

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 65

For a given set of training data, there are many possible functions which can explain it. However, some functions are “simple” and others are “complex”. We want to estimate a functional dependence from a set of examples. Which function is preferable?


Structural Risk Minimization

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 66

The complexity or capacity is a property of the function class, and not any individual function f.

VC Dimension

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 67

A model class shatters a set of data points if it can correctly classify every possible labelling. Lines shatter any 3 points (in general position) in R², but not 4 points.
VC dimension [Vapnik, 1995]: The VC dimension of a model class is the maximum h such that some set of h data points can be shattered by the model (e.g. the VC dimension of linear classifiers in R² is 3). Complex model classes have large VC dimension.

Larger Margin ⇒ Less Complex

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 68

Large margin ⇒ small VC dimension: Hyperplane classifiers with large margin have small VC dimension [Vapnik, 1995].
Maximum margin ⇒ minimum complexity: Minimize complexity by maximizing the margin (irrespective of the dimension of the space).

Margin Maximization

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 69

Margin maximization is equivalent to minimizing $\|w\|$.


SVM: Geometric View

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 70

minimize$_{w,b}$   $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i$   (1)
subject to $y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$   (2)
for all $i = 1,\dots,N$.

Objective function (1): Maximize the margin.
Constraints (2): Correctly classify the training data.
The slack variables $\xi_i$ allow points to be in the margin, but penalize them in the objective.

Why maximize the margin?

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Theory, Page 71

Intuitively, it feels the safest. For a small error in the separating hyperplane, we do not suffer too many mistakes. Empirically, it works well. VC theory indicates that it is the right thing to do. There is one global maximum, i.e. the problem is convex.

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 72

Machine learning & support vector machines Kernels Substring kernels (Spectrum, WD, . . . ) Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Margins & Complexity Control Model Selection Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications

Review: Generalization Error

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 73

The machine utilizes information from training data to predict the outputs associated with a particular test example.
Risk $R(f)$: the expected loss over all data, including unseen data.
Empirical risk $R_{\mathrm{emp}}(f)$: the average loss on the training data.


Measuring performance

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 74

What to do in practice: We split the data into training and validation sets, and use the error on the validation set to estimate the expected error.

  • A. Cross validation

Split data into c disjoint parts, and use each subset as the validation set, while using the rest as the training set.

  • B. Random splits

Randomly split the data set into two parts, for example 80% of the data for training and 20% for validation. This is usually repeated many times. See e.g. Duda et al. [2001] for more details.

Model Selection

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 75

Do not train on the test set! Use a subset of the data for training; from that subset, split further to select the model.
Model selection = find the best parameters:
  SVM parameter C
  Kernel parameters: e.g. subsequence length, degree of kernel, amount of shift.

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 76

Machine learning & support vector machines Kernels Substring kernels (Spectrum, WD, . . . ) Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Margins & Complexity Control Model Selection Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications

Estimators

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 77

Basic Notion We want to estimate the relationship between the exam- ples xi and the associated label yi. Formally We want to choose an estimator f : X → Y. Intuition We would like a function f which correctly predicts the label y for a given example x. Question How do we measure how well we are doing?


Loss Function

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 78

Basic notion: We characterize the quality of an estimator by a loss function.
Formally: We define a loss function $\ell(f(x_i), y_i) : \mathcal{Y}\times\mathcal{Y} \to \mathbb{R}^+$.
Intuition: For a given label $y_i$ and a given prediction $f(x_i)$, we want a positive value telling us how much of an error we have made.
Example (error rate): For binary classification,
$\ell(f(x_i), y_i) = 0$ if $f(x_i) = y_i$, and $1$ if $f(x_i) \ne y_i$.

Soft Margin SVM

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 79

minimize$_{w,b}$   $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i$
subject to $y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i = 1,\dots,N$.

Objective function: By minimizing the squared norm of the weight vector, we maximize the margin.
Constraints: We can express the constraints in terms of a loss function.

SVM: Loss View

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 80

minimize$_{w,b}$   $\frac{1}{2}\|w\|^2 + \sum_{i=1}^{N}\ell(f_{w,b}(x_i), y_i)$,
where $\ell(f_{w,b}(x_i), y_i) := C\max\{0,\, 1 - y_i f_{w,b}(x_i)\}$ and $f_{w,b}(x) := w^\top x + b$.
The above loss function is known as the hinge loss.
Regularizer = $\frac{1}{2}\|w\|^2$.   Empirical risk = $\sum_{i=1}^{N}\ell(w^\top x_i + b, y_i)$.

How much does a mistake cost us?

Loss Functions

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 81

0-1 loss:      $\ell(f(x_i), y_i) := 0$ if $y_i = f(x_i)$, $1$ if $y_i \ne f(x_i)$
hinge loss:    $\ell(f(x_i), y_i) := \max\{0,\, 1 - y_i f(x_i)\}$
logistic loss: $\ell(f(x_i), y_i) := \log(1 + \exp(-y_i f(x_i)))$
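For concreteness, a minimal sketch (my own, not from the slides) of these three losses evaluated for a positive example at a few prediction values:

```python
# The three classification losses above as functions of the prediction f(x).
import numpy as np

def zero_one_loss(y, f):
    return float(np.sign(f) != y)

def hinge_loss(y, f):
    return max(0.0, 1.0 - y * f)

def logistic_loss(y, f):
    return float(np.log1p(np.exp(-y * f)))

for f in (-2.0, -0.5, 0.5, 2.0):     # predictions for a positive example y = +1
    print(f, zero_one_loss(+1, f), hinge_loss(+1, f), logistic_loss(+1, f))
```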


Regression

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 82

examples x ∈ X labels y ∈ R

Regression

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 83

ε-insensitive loss: Extend the “margin” to regression; define a “tube” around the line where we can make mistakes.
$\ell(f(x_i), y_i) = 0$ if $|f(x_i) - y_i| < \varepsilon$, and $|f(x_i) - y_i| - \varepsilon$ otherwise.
Squared loss: $\ell(f(x_i), y_i) := (y_i - f(x_i))^2$
Huber's loss: $\ell(f(x_i), y_i) := \frac{1}{2}(y_i - f(x_i))^2$ if $|y_i - f(x_i)| < \gamma$, and $\gamma|y_i - f(x_i)| - \frac{1}{2}\gamma^2$ if $|y_i - f(x_i)| \ge \gamma$.

See e.g. Smola and Schölkopf [2001] for other loss functions and more details.

Multiclass

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 84

Real problems often have more than 2 classes. Generalize the SVM to multiclass, for $c > 2$. Three approaches [Schölkopf and Smola, 2002]:
  one-vs-rest: For each class, label all other classes as “negative” ($c$ binary problems).
  one-vs-one: Compare all classes pairwise ($\frac{1}{2}c(c-1)$ binary problems).
  multiclass loss: Define a new empirical risk term.

Multiclass Loss for SVM

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 85

Two-class SVM:  minimize$_{w,b}$   $\frac{1}{2}\|w\|^2 + \sum_{i=1}^{N}\ell(f_{w,b}(x_i), y_i)$
Multiclass SVM: minimize$_{w,b}$   $\frac{1}{2}\|w\|^2 + \sum_{i=1}^{N}\max_{u \ne y_i}\ell\bigl(f_{w,b}(x_i, y_i) - f_{w,b}(x_i, u),\, y_i\bigr)$


Convex Optimization

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 86

SVMs are a special case of Quadratic Programs (QPs). QPs can be efficiently solved via constrained optimization. For $f_i : \mathbb{R}^N \to \mathbb{R}$ and $g_j : \mathbb{R}^N \to \mathbb{R}$:
$\min_{x\in\mathbb{R}^N} f_0(x)$ subject to $f_i(x) \le 0$ for $i = 1,\dots,m$ and $g_j(x) = 0$ for $j = 1,\dots,p$.
There exist many open-source and commercial packages for solving convex optimization problems.

QPs for SVMs

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 87

General-purpose QP solvers (e.g. CPLEX [CPL, 1994]): do not exploit the problem structure.
Chunking methods [Osuna et al., 1997]: select subsets, solve QPs, join the sets, ...
SVM-Light [Joachims, 1999]: select n variables, solve a QP, ...
SMO algorithm [Platt, 1999]: select two variables, solve the QP analytically, ...
Shogun toolbox [Sonnenburg et al., 2006a]: SVM-Light-type QP optimization, many string kernel implementations.

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 88

Machine learning & support vector machines Kernels Substring kernels (Spectrum, WD, . . . ) Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Margins & Complexity Control Model Selection Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications

Risk and Regularization

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 89

Basic notion: In general, we can think of an SVM as optimizing a particular cost function, $\Omega(w) + R_{\mathrm{emp}}(w)$, where $R_{\mathrm{emp}}(w)$ is the empirical risk measured on the training data and $\Omega(w)$ is the regularizer.
Regularization: The regularizer is a function which measures the complexity of the function.
General principle: There is a trade-off between fitting the training set well (low empirical risk) and having a “simple” function (small regularization term).


Soft Margin SVM

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 90

General principle: There is a trade-off between fitting the training set well (low empirical risk) and having a “simple” function (small regularization term).
General equation: $\Omega(w) + R_{\mathrm{emp}}(w)$
Soft margin SVM: $\frac{1}{2}\|w\|^2 + \sum_{i=1}^{N}\ell(f_{w,b}(x_i), y_i)$

Representer Theorem

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Loss functions, Page 91

Let $\Omega : [0, \infty) \to \mathbb{R}$ be a strictly monotonically increasing function and $\ell : \mathcal{Y}\times\mathcal{Y} \to \mathbb{R}$ a loss function. Then each minimizer $(w, b)$ of the regularized risk
$\sum_{i=1}^{N}\ell\bigl(\langle w, \Phi(x_i)\rangle + b,\, y_i\bigr) + \Omega(\|w\|)$   (3)
admits a representation of the form
$w = \sum_{i=1}^{N}\alpha_i\Phi(x_i) \;\Rightarrow\; f_{w,b}(x) = \sum_{i=1}^{N}\alpha_i k(x_i, x) + b$,   (4)
where $k$ is the reproducing kernel of $\mathcal{H}$ and $\alpha_i \in \mathbb{R}$ for all $i = 1,\dots,N$.
The $\|w\|^2$ term in the SVM allows us to use kernels.

See e.g. Kimeldorf and Wahba [1971], Vapnik [1995], Schölkopf and Smola [2002].

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 92

Machine learning & support vector machines Kernels Substring kernels (Spectrum, WD, . . . ) Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Margins & Complexity Control Model Selection Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications

Generalizing kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 93

Learning structured output spaces Finding the optimal combination of kernels


Structured Output Spaces

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 94

Learning task: For a set of labeled data, we predict the label.
Difference from multiclass: The set of possible labels Y may be very large or hierarchical.
Interdependent outputs: For example a hierarchy of classes like the EC classes, or part-of-speech tagging.
Label sequence learning: An example of a very large set Y is all possible labellings of the secondary structure elements of an amino acid sequence. Protein secondary structure prediction (α/β/coils); gene structure prediction (intergenic/exon/intron).

Joint Feature Map

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 95

Recall the kernel trick: For each kernel there exists a corresponding feature mapping $\Phi(x)$ on the inputs such that $k(x, x') = \langle\Phi(x), \Phi(x')\rangle$.
Joint kernel on X and Y: We define a joint feature map on $\mathcal{X}\times\mathcal{Y}$, denoted by $\Phi(x, y)$. The corresponding kernel function is $k((x, y), (x', y')) := \langle\Phi(x, y), \Phi(x', y')\rangle$.
For multiclass: For normal multiclass classification the joint feature map decomposes and the kernel on $\mathcal{Y}$ is the identity, that is, $k((x, y), (x', y')) := [[y = y']]\,k(x, x')$.

Learning with two kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 96

Kernel methods: For a particular kernel $k(x, x')$, we can find the optimal separating hyperplane using an SVM.
What if we have two kernels? For example, we may have a kernel measuring amino acid sequence similarity and another kernel measuring secondary structure similarity.
Possible solution: We can add the two kernels, that is, $k(x, x') := k_{\mathrm{sequence}}(x, x') + k_{\mathrm{structure}}(x, x')$.

Multiple Kernel Learning (MKL)

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 97

Better solution: We can mix the two kernels,
$k(x, x') := (1 - t)\,k_{\mathrm{sequence}}(x, x') + t\,k_{\mathrm{structure}}(x, x')$,
where $t$ should be estimated from the training data. In general: use the data to find the best convex combination
$k(x, x') = \sum_{p=1}^{K}\beta_p k_p(x, x')$.
Applications: heterogeneous data; improving interpretability.
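A minimal sketch of the simplest reading of this idea: choose the mixing weight t of a convex combination of two precomputed kernel matrices by cross-validated accuracy. Proper MKL (e.g. Sonnenburg et al. [2006a]) optimizes the weights jointly with the SVM; the toy data and feature blocks here are assumptions.

```python
# Pick the mixing weight t of two base kernels by validation accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X1 = rng.randn(80, 5)                      # stand-in for "sequence" features
X2 = rng.randn(80, 3)                      # stand-in for "structure" features
y = np.sign(X1[:, 0] + X2[:, 0])

K1, K2 = X1 @ X1.T, X2 @ X2.T              # two (linear) base kernels
best = (-np.inf, None)
for t in np.linspace(0.0, 1.0, 11):
    K = (1.0 - t) * K1 + t * K2            # convex combination, still a kernel
    score = cross_val_score(SVC(kernel="precomputed", C=1.0), K, y, cv=5).mean()
    if score > best[0]:
        best = (score, t)
print("best t = %.1f with CV accuracy %.3f" % (best[1], best[0]))
```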


Method for Interpreting SVMs

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 98

Weighted Degree kernel: a linear combination of $LD$ kernels
$k(x, x') = \sum_{d=1}^{D}\sum_{l=1}^{L-d+1}\gamma_{l,d}\,I(u_{l,d}(x) = u_{l,d}(x'))$
Example: Classifying splice sites. See Rätsch et al. [2006] for more details.

Summary of Kernel Methods

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 99

The capacity or complexity of a function class. Principle of structural risk minimization Two views of SVM: Maximum margin algorithm. Minimization of a loss function. Estimating expected risk from empirical risk (validation). Convex optimization Further generalizations for bioinformatics.

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 100

Machine learning & support vector machines Kernels Substring kernels (Spectrum, WD, . . . ) Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications Transcription start site prediction Prediction of alternative splicing

Applications

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 101

Gene finding: transcription starts [Sonnenburg et al., 2006b]; splice form predictions; alternative splicing [Rätsch et al., 2005]; remote homology detection [Vert et al., 2004]
Gene characterization: protein-protein interaction [Ben-Hur and Noble, 2005]; subcellular localization [Hoglund et al., 2006]; inference of networks of proteins [Kato et al., 2005]
Inverse alignment algorithms [Rätsch et al., 2006, Joachims et al., 2005]
Secondary structure prediction [Do et al., 2006]


Transcription Start Sites - Properties

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 102

POL II binds to a rather vague region of ≈ [−20, +20] bp.
Upstream of the TSS: promoter containing transcription factor binding sites.
Downstream of the TSS: 5' UTR, and further downstream coding regions and introns (different statistics).
The 3D structure of the promoter must allow the transcription factors to bind.

SVMs with 5 sub-kernels

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 103

1. TSS signal (incl. parts of the core promoter with TATA box): use the Weighted Degree Shift kernel
2. CpG islands, distant enhancers, TFBS upstream of the TSS: use a Spectrum kernel (large window upstream of the TSS)
3. Model coding sequence and TFBS downstream of the TSS: use another Spectrum kernel (small window downstream of the TSS)
4. Stacking energy of the DNA: use the stacking energies of dinucleotides with a linear kernel
5. Twistedness of the DNA: use the twist angles of dinucleotides with a linear kernel

Training – Data Generation

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 104

True TSS: from dbTSSv4 (based on hg16), extract putative TSS windows of size [−1000, +1000].
Decoy TSS: annotate dbTSSv4 with transcription stops (via BLAT alignment of mRNAs); sample negatives for training from the interior of the gene (+100 bp to gene end), 10 per positive, again with windows [−1000, +1000].
Processing: 8,508 positive and 85,042 negative examples, split into disjoint training and validation sets (50% : 50%).

Training & Model Selection

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 105

16 kernel parameters + SVM regularization to be tuned! A full grid search is infeasible; use local axis-parallel searches instead. SVM training/evaluation on > 10,000 examples is computationally too demanding.
Speedup trick:
$f(x) = \sum_{i=1}^{N_s}\alpha_i k(x_i, x) + b = \bigl\langle\underbrace{\textstyle\sum_{i=1}^{N_s}\alpha_i\Phi(x_i)}_{w},\, \Phi(x)\bigr\rangle + b = \langle w, \Phi(x)\rangle + b$
Before: $O(N_s\,\ell L S)$; now: $O(\ell L)$ ⇒ speedup factor up to $N_s \cdot S$ ⇒ large-scale training and evaluation possible.


Experimental Comparison

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 106

Current state-of-the-art methods:
  FirstEF [Davuluri et al., 2001]: DA; uses distance from CpG islands to the first donor site
  McPromotor [Ohler et al., 2002]: 3-state HMM (upstream, TATA, downstream)
  Eponine [Down and Hubbard, 2002]: RVM; upstream CpG islands, window upstream of TATA, TATA, downstream
⇒ Do a genome-wide evaluation! ⇒ How to do a fair comparison?

Results

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 107

Receiver operating characteristic curve and precision-recall curve.

⇒ 35% true positives at a false positive rate of 1/1000 (the best other method finds about half as many, 18%). See Sonnenburg et al. [2006b] for more details.

Which kernel is most important?

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 108

⇒ Weighted Degree Shift kernel modeling TSS signal

Tutorial Outline

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 109

Machine learning & support vector machines Kernels Substring kernels (Spectrum, WD, . . . ) Other kernels (Fisher Kernel, . . . ) Some theoretical aspects Loss functions & Regularization Regression & Multi-Class problems Representer Theorem Extensions Applications Transcription start site prediction Prediction of alternative splicing


Splicing

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 110

Splice sites are exon/intron boundaries, recognized by five snRNAs assembled in snRNPs and flanked by regulatory elements.
Spliceosomal proteins interact with snRNPs/mRNA and regulate the recognition of splice sites, which can lead to alternative transcripts.

Alternative splicing

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 111

One gene may correspond to several transcripts/proteins.
Use machine learning to analyze sequences near splice sites:
  understand differences between alternative and constitutive splicing
  exploit and identify regulative splicing elements
  predict yet unknown alternative splicing events

Exon Skipping: Two tasks

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 112

Exon is known, can it be skipped? Intron is known, does it contain an exon? [Rätsch et al., 2005]

Empirical Inference Challenges

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Extensions, Page 113

Simple classes vs. reality: Predicting the simple cases is not enough ⇒ need to predict the gene structure.
Difficult learning setting: Input: DNA sequence. Output: splice graph (vertices & edges unknown).


References

A. Ben-Hur and W.S. Noble. Kernel methods for predicting protein-protein interactions. Bioinformatics, 21(Suppl 1):i38–i46, 2005.

B.E. Boser, I.M. Guyon, and V.N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.

C. Cortes and V.N. Vapnik. Support vector networks. Machine Learning, 20:273–297, 1995.

Using the CPLEX Callable Library. CPLEX Optimization Incorporated, Incline Village, Nevada, 1994.

R.V. Davuluri, I. Grosse, and M.Q. Zhang. Computational identification of promoters and first exons in the human genome. Nat Genet, 29(4):412–417, December 2001.

C.B. Do, D.A. Woods, and S. Batzoglou. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22(14):e90–e98, 2006.

T.A. Down and T.J.P. Hubbard. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res, 12:458–461, 2002.

R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. John Wiley & Sons, second edition, 2001.

A. Hoglund, P. Donnes, T. Blum, H.W. Adolph, and O. Kohlbacher. MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics, 22(10):1158–65, 2006.

T.S. Jaakkola, M. Diekhans, and D. Haussler. A discriminative framework for detecting remote protein homologies. J. Comp. Biol., 7:95–114, 2000.

T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184, Cambridge, MA, 1999. MIT Press.

T. Joachims, T. Galor, and R. Elber. Learning to align sequences: A maximum-margin approach. In B. Leimkuhler, C. Chipot, R. Elber, A. Laaksonen, and A. Mark, editors, New Algorithms for Macromolecular Simulation, number 49 in LNCS, pages 57–71. Springer, 2005.

T. Kato, K. Tsuda, and K. Asai. Selective integration of multiple biological data for supervised network inference. Bioinformatics, 21(10):2488–95, 2005.

G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82–95, 1971.

C. Leslie and R. Kuang. Fast string kernels using inexact matching for protein sequences. Journal of Machine Learning Research, 5:1435–1455, 2004.

C. Leslie, E. Eskin, and W.S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing, pages 564–575, 2002.

C. Leslie, E. Eskin, J. Weston, and W.S. Noble. Mismatch string kernels for discriminative protein classification. Bioinformatics, 20(4), 2003.

L. Liao and W.S. Noble. Combining pairwise sequence similarity and support vector machines. In Proc. 6th Int. Conf. Computational Molecular Biology, pages 225–232, 2002.

H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2:419–444, 2002.

J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London, A 209:415–446, 1909.

K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2):181–201, 2001.

U. Ohler, G.C. Liao, H. Niemann, and G.M. Rubin. Computational analysis of core promoters in the Drosophila genome. Genome Biol, 3(12):RESEARCH0087, 2002.

E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In J. Principe, L. Gile, N. Morgan, and E. Wilson, editors, Neural Networks for Signal Processing VII — Proceedings of the 1997 IEEE Workshop, pages 276–285, New York, 1997. IEEE.

J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 185–208, Cambridge, MA, 1999. MIT Press.

G. Rätsch and S. Sonnenburg. Accurate splice site detection for Caenorhabditis elegans. In B. Schölkopf, K. Tsuda, and J.-P. Vert, editors, Kernel Methods in Computational Biology. MIT Press, 2004.

G. Rätsch, S. Sonnenburg, and B. Schölkopf. RASE: recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21(Suppl. 1):i369–i377, June 2005.

G. Rätsch, S. Sonnenburg, and C. Schäfer. Learning interpretable SVMs for biological sequence classification. BMC Bioinformatics, 7(Suppl 1):S9, February 2006.

G. Rätsch, B. Hepp, U. Schulze, and C.S. Ong. PALMA: Perfect alignments using large margin algorithms. In German Conference on Bioinformatics, 2006.

B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

A.J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 2001.

S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531–1565, July 2006a.

S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: Accurate Recognition of Transcription Starts in Human. Bioinformatics, 22(14):e472–e480, 2006b.

K. Tsuda, M. Kawanabe, G. Rätsch, S. Sonnenburg, and K.-R. Müller. A new discriminative kernel from probabilistic models. Neural Computation, 14:2397–2414, 2002a.

K. Tsuda, T. Kin, and K. Asai. Marginalized kernels for biological sequences. Bioinformatics, 18:268S–275S, 2002b.

V.N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.

J.-P. Vert, H. Saigo, and T. Akutsu. Local alignment kernels for biological sequences. In B. Schölkopf, K. Tsuda, and J.-P. Vert, editors, Kernel Methods in Computational Biology. MIT Press, 2004.