SLIDE 1

Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings

Jean-Philippe Vert, Bioinformatics Center, Kyoto University, Japan

SLIDE 2

Outline

  • 1. SVM and kernel methods
  • 2. New kernels for bioinformatics
  • 3. Example: signal peptide cleavage site prediction
SLIDE 3

Part 1

SVM and kernel methods

SLIDE 4

Support vector machines

  • Objects to be classified x are mapped to a feature space by φ
  • Largest-margin separating hyperplane in the feature space
SLIDE 5

The kernel trick

  • Implicit definition of x → Φ(x) through the kernel (illustrated in the sketch below):

    K(x, y) := ⟨Φ(x), Φ(y)⟩

  • Simple kernels can represent complex Φ
  • For a given kernel, not only SVM but also clustering, PCA, ICA... are possible in the feature space (= kernel methods)
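
Not part of the original slides: a minimal Python/NumPy sketch of the kernel trick. For the degree-2 homogeneous polynomial kernel, K(x, y) = ⟨x, y⟩² equals ⟨Φ(x), Φ(y)⟩ for the explicit feature map Φ(x) = (x_i x_j)_{i,j}, so the kernel evaluates the feature-space inner product without ever constructing the feature space.

    import numpy as np

    def poly_kernel(x, y):
        # Degree-2 homogeneous polynomial kernel: K(x, y) = <x, y>^2
        return np.dot(x, y) ** 2

    def phi(x):
        # Explicit feature map: all pairwise products x_i * x_j (dimension d^2)
        return np.outer(x, x).ravel()

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([0.5, -1.0, 2.0])

    # The two values agree: the kernel computes <phi(x), phi(y)> implicitly
    print(poly_kernel(x, y))          # 20.25
    print(np.dot(phi(x), phi(y)))     # 20.25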

SLIDE 6

Kernel examples

  • “Classical” kernels: polynomial, Gaussian, sigmoid... but the objects x must be vectors
  • “Exotic” kernels for strings:
    ⋆ Fisher kernel (Jaakkola and Haussler 98)
    ⋆ Convolution kernels (Haussler 99, Watkins 99)
    ⋆ Kernel for translation initiation site (Zien et al. 00)
    ⋆ String kernel (Lodhi et al. 00)
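
Not in the original slides: a sketch of how any kernel, classical or exotic, is plugged into an SVM in practice, assuming scikit-learn and a Gaussian kernel on toy vector data (the data, kernel width and helper names here are illustrative only).

    import numpy as np
    from sklearn.svm import SVC

    def gaussian_kernel(X, Y, sigma=1.0):
        # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated for all pairs
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq_dists / (2 * sigma ** 2))

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(40, 5))
    y_train = (X_train[:, 0] > 0).astype(int)      # toy labels
    X_test = rng.normal(size=(10, 5))

    # Any kernel can be supplied to the SVM as a precomputed Gram matrix
    svm = SVC(kernel="precomputed")
    svm.fit(gaussian_kernel(X_train, X_train), y_train)
    predictions = svm.predict(gaussian_kernel(X_test, X_train))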

SLIDE 7

Kernel engineering

Use prior knowledge to build the geometry of the feature space through K(., .)

SLIDE 8

Part 2

New kernels for bioinformatics

SLIDE 9

The problem

  • X a set of objects
  • p(x) a probability distribution on X
  • How to build K(x, y) from p(x)?
SLIDE 10

Product kernel

    K_prod(x, y) = p(x) p(y)

[Figure: objects x and y are represented by their probabilities p(x) and p(y)]

SVM = Bayesian classifier
SLIDE 11

Diagonal kernel

    K_diag(x, y) = p(x) δ(x, y)

[Figure: distinct objects x, y, z are mutually orthogonal in the feature space, with squared norms p(x), p(y), p(z)]

No learning
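
A minimal Python sketch (not from the slides) of the two building blocks K_prod and K_diag on a toy finite set of objects with distribution p:

    def k_prod(x, y, p):
        # Product kernel: K_prod(x, y) = p(x) * p(y)  (the feature map is the scalar p(x))
        return p[x] * p[y]

    def k_diag(x, y, p):
        # Diagonal kernel: K_diag(x, y) = p(x) * delta(x, y)  (distinct objects are orthogonal)
        return p[x] if x == y else 0.0

    p = {"A": 0.5, "B": 0.3, "C": 0.2}   # toy distribution on X = {A, B, C}
    print(k_prod("A", "B", p))   # 0.15
    print(k_diag("A", "A", p))   # 0.5
    print(k_diag("A", "B", p))   # 0.0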

SLIDE 12

Interpolated kernel

If objects are composite, x = (x1, x2):

    K(x, y) = K_diag(x1, y1) K_prod(x2, y2)
            = p(x1) δ(x1, y1) × p(x2|x1) p(y2|y1)

[Figure: the four two-letter strings AA, AB, BA, BB in the feature space, grouped into A* and B* by their first letter]

SLIDE 13

General interpolated kernel

  • Composite objects x = (x1, . . . , xn)
  • A list of index subsets: V = {I1, . . . , Iv} where Ii ⊂ {1, . . . , n}
  • Interpolated kernel:

    K_V(x, y) = (1/|V|) Σ_{I ∈ V} K_diag(x_I, y_I) K_prod(x_{I^c}, y_{I^c})
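
A direct Python sketch of K_V (not from the slides). For simplicity it assumes p factorizes over positions, so that the probability of a subpart is the product of position-wise probabilities; the distribution, strings and list V below are toy choices.

    import numpy as np

    def k_V(x, y, p, V):
        # K_V(x, y) = (1/|V|) * sum over I in V of K_diag(x_I, y_I) * K_prod(x_Ic, y_Ic)
        n = len(x)
        total = 0.0
        for I in V:
            Ic = [i for i in range(n) if i not in I]
            # K_diag(x_I, y_I) = p(x_I) * delta(x_I, y_I)
            k_diag = np.prod([p[i][x[i]] for i in I]) if all(x[i] == y[i] for i in I) else 0.0
            # K_prod(x_Ic, y_Ic) = p(x_Ic) * p(y_Ic)
            k_prod = np.prod([p[i][x[i]] * p[i][y[i]] for i in Ic])
            total += k_diag * k_prod
        return total / len(V)

    p = [{"A": 0.6, "B": 0.4}] * 3            # toy position-wise distributions
    V = [(0,), (1,), (2,), (0, 1)]            # a toy list of index subsets
    print(k_V("ABA", "ABB", p, V))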

SLIDE 14

Rare common subparts

For a given p(x) and p(y), we have:

    K_V(x, y) = K_prod(x, y) × (1/|V|) Σ_{I ∈ V} δ(x_I, y_I) / p(x_I)

x and y get closer in the feature space when they share rare common subparts.

SLIDE 15

Implementation

  • Factorization for particular choices of p(.) and V
  • Example:
    ⋆ V = P({1, . . . , n}), the set of all subsets: |V| = 2^n
    ⋆ product distribution p(x) = Π_{j=1}^{n} p_j(x_j)
    ⋆ implementation in O(n) because Σ_{I ∈ V} (. . .) = Π_{i=1}^{n} (. . .)
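
A Python sketch of this factorization (assuming V = all subsets of positions and a position-wise product distribution p; the toy alphabet and sequences are illustrative). The brute-force average over all 2^n subsets and the O(n) product over positions return the same value, because the sum over subsets of products factorizes into a product over positions of (diagonal term + product term).

    from itertools import combinations
    import numpy as np

    def k_v_bruteforce(x, y, p):
        # Direct evaluation: K_V(x, y) = (1/2^n) * sum over all subsets I of
        # K_diag(x_I, y_I) * K_prod(x_Ic, y_Ic), with p a position-wise product distribution
        n = len(x)
        total = 0.0
        for size in range(n + 1):
            for I in combinations(range(n), size):
                Ic = [i for i in range(n) if i not in I]
                k_diag = np.prod([p[i][x[i]] for i in I]) if all(x[i] == y[i] for i in I) else 0.0
                k_prod = np.prod([p[i][x[i]] * p[i][y[i]] for i in Ic])
                total += k_diag * k_prod
        return total / 2 ** n

    def k_v_factorized(x, y, p):
        # Same kernel in O(n): each position contributes (diagonal term + product term),
        # and the sum over all 2^n subsets equals the product of these per-position sums
        factors = []
        for i in range(len(x)):
            diag_i = p[i][x[i]] if x[i] == y[i] else 0.0
            prod_i = p[i][x[i]] * p[i][y[i]]
            factors.append(diag_i + prod_i)
        return np.prod(factors) / 2 ** len(x)

    p = [{"A": 0.6, "B": 0.4}] * 4            # toy position-wise distributions
    x, y = "ABBA", "ABAA"
    print(k_v_bruteforce(x, y, p), k_v_factorized(x, y, p))   # identical values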

SLIDE 16

Part 3

Application: SVM prediction of signal peptide cleavage site

SLIDE 17

Secretory pathway

[Figure: the secretory pathway. mRNA is translated into a nascent protein carrying a signal peptide, which passes through the ER and Golgi toward the cell surface (secreted), the lysosome, or the plasma membrane; other destinations shown: nucleus, chloroplast, mitochondrion, peroxisome, cytosol]

SLIDE 18

Signal peptides

Example precursor proteins, with the cleavage site (between positions -1 and +1) marked by the space:

    (1) MKANAKTIIAGMIALAISHTAMA EE...
    (2) MKQSTIALALLPLLFTPVTKA RT...
    (3) MKATKLVLGAVILGSTLLAG CS...

(1): Leucine-binding protein, (2): Pre-alkaline phosphatase, (3): Pre-lipoprotein

  • 6-12 hydrophobic residues (highlighted in yellow in the original slide)
  • (-3, -1): small uncharged residues
SLIDE 19

Experiment

  • Challenge: classification of amino-acid windows [x−8, x−7, . . . , x−1, x1, x2], positive if cleavage occurs between -1 and +1 (see the window-extraction sketch below)
  • 1,418 positive examples, 65,216 negative examples
  • Computation of a weight matrix: SVM + K_prod (naive Bayes) vs SVM + K_interpolated
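
Not from the original slides: a hypothetical Python sketch of how such windows could be extracted and labelled from a sequence with a known cleavage site. The helper name extract_windows and the indexing convention are illustrative only; the toy sequence is example (2) from the signal peptide slide, whose mature protein starts at position 21 (0-based).

    def extract_windows(sequence, cleavage_pos):
        # Slide a 10-residue window [x-8, ..., x-1, x1, x2] along the sequence.
        # A window is positive when the annotated cleavage site falls between
        # its -1 and +1 positions, i.e. when the candidate +1 index equals cleavage_pos.
        examples = []
        for site in range(8, len(sequence) - 1):        # 'site' = candidate +1 position
            window = sequence[site - 8: site + 2]       # 8 residues before the bond, 2 after
            label = 1 if site == cleavage_pos else 0
            examples.append((window, label))
        return examples

    seq = "MKQSTIALALLPLLFTPVTKART"     # pre-alkaline phosphatase, cleavage before 'R'
    for window, label in extract_windows(seq, 21):
        if label == 1:
            print(window, label)        # LFTPVTKART 1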

SLIDE 20

Result: ROC curves

[Figure: ROC curves plotting false negative rate (%) against false positive rate (%) for the product kernel (Bayes) and the interpolated kernel]
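
Not from the slides: a small Python sketch of how such a curve can be traced from classifier scores, sweeping a decision threshold and recording false positive and false negative rates (the scores, labels and thresholds below are toy values).

    import numpy as np

    def fp_fn_curve(scores, labels, thresholds):
        # For each threshold, report (false positive %, false negative %)
        scores, labels = np.asarray(scores), np.asarray(labels)
        curve = []
        for t in thresholds:
            predicted_positive = scores >= t
            fp = np.mean(predicted_positive[labels == 0]) * 100    # negatives accepted
            fn = np.mean(~predicted_positive[labels == 1]) * 100   # positives rejected
            curve.append((fp, fn))
        return curve

    scores = [2.1, 1.4, 0.3, -0.2, -1.0, -1.7]   # toy SVM decision values
    labels = [1, 1, 1, 0, 0, 0]
    print(fp_fn_curve(scores, labels, thresholds=[-2.0, 0.0, 2.0]))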

SLIDE 21

Conclusion

SLIDE 22

Conclusion

  • Another way to derive a kernel from a probability distribution
  • Useful when objects can be compared by comparing subparts
  • Encouraging results on a real-world application: “how to improve a weight-matrix-based classifier”
  • Future work: more application-specific kernels
SLIDE 23

Acknowledgement

  • Minoru Kanehisa
  • Applied Biosystems for the travel grant