 
Kernel Methods for Predictive Sequence Analysis

Cheng Soon Ong (1, 2) and Gunnar Rätsch (1)
(1) Friedrich Miescher Laboratory, Tübingen
(2) Max Planck Institute for Biological Cybernetics, Tübingen

Tutorial at the German Conference on Bioinformatics, September 19, 2006
http://www.fml.mpg.de/raetsch/projects/gcbtutorial
Tutorial Outline

  Machine learning & support vector machines
  Kernels
    Basics
    Substring kernels (Spectrum, WD, ...)
    Efficient data structures
    Other kernels (Fisher Kernel, ...)
  Some theoretical aspects
    Loss functions & Regularization
    Regression & Multi-Class problems
    Representer Theorem
  Extensions
  Applications

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Introduction, Page 2

Classification of Sequences

  Example: Recognition of splice sites
  Every 'AG' is a possible acceptor splice site.
  The computer has to learn what splice sites look like, given some known genes/splice sites ...
  Prediction on unknown DNA

From Sequences to Features

  [Figure: intron | exon]

Numerical Representation

  Many algorithms depend on numerical representations.
  Each example is a vector of values (features).
  Use background knowledge to design good features.

               x1    x2    x3    x4    x5    x6    x7    x8    ...
  GC before    0.6   0.2   0.4   0.3   0.2   0.4   0.5   0.5   ...
  GC after     0.7   0.7   0.3   0.6   0.3   0.4   0.7   0.6   ...
  AGAGAAG      0     0     0     1     1     0     0     1     ...
  TTTAG        1     1     1     0     0     1     0     0     ...
  ...          ...   ...   ...   ...   ...   ...   ...   ...   ...
  Label        +1    +1    +1    -1    -1    +1    -1    -1    ...
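
The rows of the feature table above (GC content before and after the candidate site, presence of a motif) can be computed from a raw sequence window. The following is a minimal sketch, not the tutorial's own code: the function names `gc_content` and `splice_features`, the example window, and the choice to look for each motif anywhere in the window are all assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C characters in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def splice_features(window, site, motifs=("AGAGAAG", "TTTAG")):
    """Feature vector for one candidate acceptor site.

    window: DNA string around the candidate site (hypothetical input format).
    site:   index of the 'A' of the candidate 'AG' inside window.
    Returns [GC before, GC after, motif_1 present, motif_2 present, ...].
    """
    upstream = window[:site]        # sequence before the AG
    downstream = window[site + 2:]  # sequence after the AG
    features = [gc_content(upstream), gc_content(downstream)]
    # Binary indicator per motif: 1 if it occurs anywhere in the window.
    features += [1 if motif in window else 0 for motif in motifs]
    return features


# Hypothetical example window; the candidate AG starts at index 9.
window = "TTTAGTTTCAGGCGC"
features = splice_features(window, site=9)
print(features)
```

Each candidate site then becomes one column x_i of the table, and the label (+1 or -1) is supplied separately from known annotations.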
Recognition of Splice Sites

  Given: Potential acceptor splice sites
  [Figure: intron | exon]
  Goal: Rule that distinguishes true from false ones,
    e.g. exploit that exons have higher GC content
    or that certain motifs are located nearby
  Linear classifiers with large margin

Empirical Inference

  The machine utilizes information from training data to predict the outputs associated with a particular test example.
    Use training data to "train" the machine.
    Use trained machine to perform prediction on test data.

Machine Learning: Main Tasks

  Supervised Learning
    We have both examples and labels for each example. The aim is to learn about the pattern between examples and labels.
  Unsupervised Learning
    We do not have labels for the examples, and wish to discover the underlying structure of the data.
  Reinforcement Learning
    How an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals.
How to measure performance?

  Important not just to memorize the training examples!
  We assume that the future examples are similar to our labeled examples.

Measuring performance

  What to do in practice:
  Use some of the labeled examples for validation. We split the data into training and validation sets, and use the error on the validation set to estimate the expected error.

  A. Cross validation
     Split data into c disjoint parts, and use each subset as the validation set, while using the rest as the training set.
  B. Random splits
     Randomly split the data set into two parts, for example 80% of the data for training and 20% for validation. This is usually repeated many times.

  Report mean and standard deviation of performance on the validation sets.

Classifier: depends on training data

  Consider linear classifiers with parameters $w$, $b$:

  \[ f(x) = \sum_{j=1}^{d} w_j x_j + b = \langle w, x \rangle + b \]

Classifier: SVM

  Minimize

  \[ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \]

  subject to

  \[ y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \qquad \text{for all } i = 1, \dots, N. \]

  Called the soft margin SVM or the C-SVM [Cortes and Vapnik, 1995].
  The examples on the margin are called support vectors [Vapnik, 1995].
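
To make the soft margin objective concrete, the sketch below evaluates it for a fixed pair (w, b). It relies on one fact that follows directly from the constraints: for fixed w and b, the smallest feasible slack is ξ_i = max(0, 1 − y_i f(x_i)). The function names and the toy data are assumptions for illustration; this only evaluates the objective and does not solve the optimization problem.

```python
def decision(w, b, x):
    """Linear classifier f(x) = <w, x> + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def soft_margin_objective(w, b, X, y, C):
    """Return (objective value, slacks) of the C-SVM for fixed w and b.

    Each slack is the smallest xi_i satisfying
    y_i * f(x_i) >= 1 - xi_i and xi_i >= 0.
    """
    slacks = [max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y)]
    regularizer = 0.5 * sum(wj * wj for wj in w)
    return regularizer + C * sum(slacks), slacks


# Toy data (hypothetical): four 2-d examples with labels +1 / -1.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.5, 1.0]]
y = [1, 1, -1, -1]
value, slacks = soft_margin_objective(w=[1.0, 0.0], b=0.0, X=X, y=y, C=1.0)
print(value, slacks)
```

In this toy setting the first and third examples satisfy the margin constraint with zero slack, the second lies inside the margin, and the fourth is misclassified, so only the last two contribute to the penalty term weighted by C.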
Summary: Empirical Inference

SVM is dependent on training data

  Primal formulation. Minimize

  \[ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \]

  subject to

  \[ y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \qquad \text{for all } i = 1, \dots, N. \]

  Representer Theorem:

  \[ w = \sum_{i=1}^{N} \alpha_i x_i \]

  Substituting this expansion of $w$, minimize

  \[ \frac{1}{2} \sum_{i,j}^{N} \alpha_i \alpha_j \langle x_i, x_j \rangle + C \sum_{i=1}^{N} \xi_i \]

  subject to

  \[ y_i \Big( \sum_{j=1}^{N} \alpha_j \langle x_j, x_i \rangle + b \Big) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \qquad \text{for all } i = 1, \dots, N. \]

  SVM solution only depends on scalar products between examples ($\Rightarrow$ kernel trick)

Cheng Soon Ong and Gunnar Rätsch: Kernel Methods for Predictive Sequence Analysis: Kernels, Page 16
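
The consequence of the representer theorem stated above, that predictions need only scalar products between examples, can be checked numerically. The sketch below uses arbitrary coefficients α (not the result of training) and hypothetical function names; it compares f(x) computed from w with f(x) computed from the expansion Σ_i α_i ⟨x_i, x⟩ + b.

```python
def dot(u, v):
    """Scalar product <u, v> of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def weight_vector(alpha, X):
    """w = sum_i alpha_i * x_i (the expansion from the representer theorem)."""
    dim = len(X[0])
    return [sum(a * x[j] for a, x in zip(alpha, X)) for j in range(dim)]


def decision_primal(w, b, x):
    """f(x) = <w, x> + b, using the explicit weight vector."""
    return dot(w, x) + b


def decision_dual(alpha, X, b, x):
    """f(x) = sum_i alpha_i <x_i, x> + b, using only scalar products."""
    return sum(a * dot(xi, x) for a, xi in zip(alpha, X)) + b


# Arbitrary illustrative values, not a trained SVM.
X = [[1.0, 2.0], [3.0, -1.0], [0.0, 1.0]]
alpha = [0.5, -1.0, 2.0]
b = 0.25
x_test = [2.0, 1.0]

w = weight_vector(alpha, X)
f_primal = decision_primal(w, b, x_test)
f_dual = decision_dual(alpha, X, b, x_test)
print(w, f_primal, f_dual)
```

Because the second form touches the data only through `dot`, replacing `dot` by another valid kernel function changes the classifier without ever forming w explicitly.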
Recognition of Splice Sites

  Given: Potential acceptor splice sites
  [Figure: intron | exon]
  Goal: Rule that distinguishes true from false ones

  More realistic problem!?
    Not linearly separable!
    Need nonlinear separation!?
    Need more features!?

Nonlinear Algorithms in Feature Space

  Linear separation might not be sufficient!
  $\Rightarrow$ Map into a higher dimensional feature space

  Example: all second order monomials

  \[ \Phi : \mathbb{R}^2 \to \mathbb{R}^3, \qquad (x_1, x_2) \mapsto (z_1, z_2, z_3) := (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \]

  [Figure: two classes (✕ and ❍) in input space with axes x_1, x_2, and the same data in feature space with axes z_1, z_2, z_3]

Kernel "Trick"

  Example: $x \in \mathbb{R}^2$ and $\Phi(x) := (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$ [Boser et al., 1992]

  \[ \langle \Phi(x), \Phi(y) \rangle = \big\langle (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2), (y_1^2, \sqrt{2}\, y_1 y_2, y_2^2) \big\rangle = \langle (x_1, x_2), (y_1, y_2) \rangle^2 = \langle x, y \rangle^2 =: k(x, y) \]

  Scalar product in feature space (here $\mathbb{R}^3$) can be computed in input space (here $\mathbb{R}^2$)!
  Also works for higher orders and dimensions
    $\Rightarrow$ relatively low dimensional input spaces
    $\Rightarrow$ very high dimensional feature spaces
  Works only for Mercer kernels $k(x, y)$

Kernology I

  If $k$ is a continuous kernel of a positive integral operator on $L_2(D)$ (where $D$ is some compact space),

  \[ \int f(x)\, k(x, y)\, f(y)\, dx\, dy \ge 0, \qquad \text{for } f \ne 0, \]

  it can be expanded as

  \[ k(x, y) = \sum_{i=1}^{N_F} \lambda_i \psi_i(x) \psi_i(y) \]

  with $\lambda_i > 0$, and $N_F \in \mathbb{N}$ or $N_F = \infty$. In that case

  \[ \Phi(x) := \begin{pmatrix} \sqrt{\lambda_1}\, \psi_1(x) \\ \sqrt{\lambda_2}\, \psi_2(x) \\ \vdots \end{pmatrix} \]

  satisfies $\langle \Phi(x), \Phi(y) \rangle = k(x, y)$ [Mercer, 1909].
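
The identity ⟨Φ(x), Φ(y)⟩ = ⟨x, y⟩² from the kernel "trick" slide can be verified numerically for the second order monomial map. This is a small sketch with hypothetical function names (`phi`, `poly2_kernel`) and arbitrary test points; it checks the identity for one pair of inputs rather than proving it.

```python
import math


def dot(u, v):
    """Scalar product <u, v> of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Explicit feature map R^2 -> R^3: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """k(x, y) = <x, y>^2, computed in input space only."""
    return dot(x, y) ** 2


# Arbitrary test points in R^2.
x = (1.0, 2.0)
y = (3.0, -1.0)

in_feature_space = dot(phi(x), phi(y))  # uses the 3-d representation
in_input_space = poly2_kernel(x, y)     # never leaves R^2
print(in_feature_space, in_input_space)
```

For these points ⟨x, y⟩ = 1, so both quantities equal 1 up to floating point rounding. The input-space computation costs one scalar product in R² regardless of how large the feature space would be for higher orders.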