  1. Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings. Jean-Philippe Vert, Bioinformatics Center, Kyoto University, Japan

  2. Outline: 1. SVM and kernel methods; 2. New kernels for bioinformatics; 3. Example: signal peptide cleavage site prediction

  3. Part 1: SVM and kernel methods

  4. Support vector machines: objects x to be classified are mapped to a feature space by Φ; the SVM finds the largest-margin separating hyperplane in the feature space.

  5. The kernel trick:
     • Implicit definition of x → Φ(x) through the kernel: K(x, y) := ⟨Φ(x), Φ(y)⟩
     • Simple kernels can represent complex Φ
     • For a given kernel, not only SVM but also clustering, PCA, ICA... are possible in the feature space (= kernel methods)
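As a concrete illustration of the kernel trick (hypothetical code, not from the talk): the degree-2 homogeneous polynomial kernel K(x, y) = ⟨x, y⟩² equals an explicit inner product in the space of all degree-2 monomials, so the feature map Φ never needs to be computed.

```python
import numpy as np

def poly_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: K(x, y) = <x, y>^2."""
    return np.dot(x, y) ** 2

def phi(x):
    """Explicit feature map: all degree-2 monomials x_i * x_j."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

# The kernel computes the feature-space inner product implicitly.
assert np.isclose(poly_kernel(x, y), np.dot(phi(x), phi(y)))
```

Here the feature space has dimension n², yet the kernel is computed in O(n); the same principle lets string kernels work in very high-dimensional (even infinite-dimensional) feature spaces.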

  6. Kernel examples:
     • “Classical” kernels: polynomial, Gaussian, sigmoid... but the objects x must be vectors
     • “Exotic” kernels for strings:
       ⋆ Fisher kernel (Jaakkola and Haussler 98)
       ⋆ Convolution kernels (Haussler 99, Watkins 99)
       ⋆ Kernel for translation initiation site (Zien et al. 00)
       ⋆ String kernel (Lodhi et al. 00)

  7. Kernel engineering: use prior knowledge to build the geometry of the feature space through K(·, ·)

  8. Part 2: New kernels for bioinformatics

  9. The problem:
     • X, a set of objects
     • p(x), a probability distribution on X
     • How to build K(x, y) from p(x)?

  10. Product kernel: K_prod(x, y) = p(x) p(y). [Figure: x and y mapped to the one-dimensional values p(x) and p(y).] With this kernel, SVM = Bayesian classifier.
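The product kernel corresponds to the one-dimensional feature map Φ(x) = p(x): its Gram matrix is rank one, so any SVM decision function reduces to thresholding p(x), which is why the SVM behaves like a Bayesian classifier. A minimal numerical check (illustrative code with made-up probabilities, not from the talk):

```python
import numpy as np

# A toy distribution p over five discrete objects (assumed values).
p = np.array([0.1, 0.3, 0.05, 0.35, 0.2])

# Product kernel Gram matrix: K[i, j] = p[i] * p[j].
K = np.outer(p, p)

# Rank one: the feature space is effectively one-dimensional,
# so every SVM decision function is monotone in p(x).
print(np.linalg.matrix_rank(K))  # 1
```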

  11. Diagonal kernel: K_diag(x, y) = p(x) δ(x, y). [Figure: each object x, y, z sits alone in its own direction, with length p(·).] No learning.

  12. Interpolated kernel: if objects are composite, x = (x_1, x_2), then K(x, y) = K_diag(x_1, y_1) K_prod(x_2, y_2) = p(x_1) δ(x_1, y_1) × p(x_2 | x_1) p(y_2 | y_1). [Figure: tree A* → {AA, AB}, B* → {BA, BB}.]
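A toy numerical illustration of this two-part kernel (hypothetical alphabet and probabilities, not from the talk): the diagonal part requires the first components to match exactly, while the second components are compared through the product kernel.

```python
# Toy interpolated kernel for composite objects x = (x1, x2) over {A, B}.
# Assumed distributions: p1 is the marginal p(x1), p2[(x1, x2)] = p(x2 | x1).
p1 = {"A": 0.6, "B": 0.4}
p2 = {("A", "A"): 0.7, ("A", "B"): 0.3,
      ("B", "A"): 0.2, ("B", "B"): 0.8}

def k_interp(x, y):
    """K(x, y) = p(x1) * delta(x1, y1) * p(x2 | x1) * p(y2 | y1)."""
    x1, x2 = x
    y1, y2 = y
    if x1 != y1:          # the diagonal part vanishes off the diagonal
        return 0.0
    return p1[x1] * p2[(x1, x2)] * p2[(y1, y2)]

print(k_interp(("A", "A"), ("A", "B")))  # 0.6 * 0.7 * 0.3 ≈ 0.126
print(k_interp(("A", "A"), ("B", "A")))  # 0.0: first components differ
```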

  13. General interpolated kernel:
     • Composite objects x = (x_1, ..., x_n)
     • A list of index subsets V = {I_1, ..., I_v}, where each I_i ⊂ {1, ..., n}
     • Interpolated kernel: K_V(x, y) = (1/|V|) Σ_{I ∈ V} K_diag(x_I, y_I) K_prod(x_{I^c}, y_{I^c})

  14. Rare common subparts. For given p(x) and p(y), we have: K_V(x, y) = K_prod(x, y) × (1/|V|) Σ_{I ∈ V} δ(x_I, y_I) / p(x_I). Thus x and y get closer in the feature space when they share rare common subparts.

  15. Implementation:
     • Factorization for particular choices of p(·) and V
     • Example:
       ⋆ V = P({1, ..., n}), the set of all subsets: |V| = 2^n
       ⋆ product distribution p(x) = Π_{j=1}^n p_j(x_j)
       ⋆ implementation in O(n), because the sum over I ∈ V factorizes into a product over j = 1, ..., n
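The O(n) claim rests on the algebraic identity Σ_{I ⊆ {1..n}} Π_{j∈I} a_j = Π_{j=1}^n (1 + a_j), applied with a_j = δ(x_j, y_j) / p_j(x_j). A numerical sanity check on toy data (illustrative code with made-up per-position marginals, not the authors' implementation):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

n = 6
x = rng.integers(0, 3, size=n)          # toy strings over a 3-letter alphabet
y = rng.integers(0, 3, size=n)
p = rng.dirichlet(np.ones(3), size=n)   # assumed marginals: p[j, a] = p_j(a)

px = np.prod([p[j, x[j]] for j in range(n)])
py = np.prod([p[j, y[j]] for j in range(n)])
k_prod = px * py

# Brute force: sum over all 2^n index subsets I.
total = 0.0
for r in range(n + 1):
    for I in itertools.combinations(range(n), r):
        term = 1.0
        for j in I:
            term *= (x[j] == y[j]) / p[j, x[j]]
        total += term
k_bruteforce = k_prod * total / 2 ** n

# Factorized O(n) form: the subset sum becomes a product over positions.
factor = np.prod([1 + (x[j] == y[j]) / p[j, x[j]] for j in range(n)])
k_fast = k_prod * factor / 2 ** n

assert np.isclose(k_bruteforce, k_fast)
```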

  16. Part 3: Application: SVM prediction of signal peptide cleavage site

  17. Secretory pathway. [Diagram of the cell: mRNA, nascent protein, signal peptide, ER, Golgi, and compartments: nucleus, chloroplast, mitochondrion, cell surface (secreted), peroxisome, lysosome, cytosol, plasma membrane.]

  18. Signal peptides. Cleavage occurs between positions -1 and +1:
     (1) MKANAKTIIAGMIALAISHTAMA EE...
     (2) MKQSTIALALLPLLFTPVTKA RT...
     (3) MKATKLVLGAVILGSTLLAG CS...
     (1): Leucine-binding protein, (2): Pre-alkaline phosphatase, (3): Pre-lipoprotein
     • 6-12 hydrophobic residues (highlighted in yellow on the slide)
     • Positions (-3, -1): small uncharged residues

  19. Experiment:
     • Challenge: classification of amino-acid windows, positive if cleavage occurs between positions -1 and +1: [x_-8, x_-7, ..., x_-1, x_1, x_2]
     • 1,418 positive examples, 65,216 negative examples
     • Computation of a weight matrix: SVM + K_prod (naive Bayes) vs. SVM + K_interpolated
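A minimal sketch of how such windows could be extracted (a hypothetical helper written for this summary, not the authors' code): given a sequence and a candidate cleavage point, take the 8 residues before it and the 2 after; the window at the annotated cleavage site is a positive example, windows at all other positions are negatives.

```python
def window(seq, i, left=8, right=2):
    """Return the [x_-8, ..., x_-1, x_1, x_2] window for a candidate
    cleavage point between positions i-1 and i (0-based), or None if
    the window would run past either end of the sequence."""
    if i < left or i + right > len(seq):
        return None
    return seq[i - left:i + right]

# Example (1) from the slides: the 23-residue signal peptide
# MKANAKTIIAGMIALAISHTAMA is cleaved before the mature "EE...".
seq = "MKANAKTIIAGMIALAISHTAMAEE"
print(window(seq, 23))  # "AISHTAMAEE": 8 residues before + 2 after cleavage
# Negative examples: the same extraction at every other position i.
```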

  20. Results. [Figure: ROC curves, false negatives (%) vs. false positives (%), comparing the interpolated kernel with the product kernel (Bayes).]

  22. Conclusion:
     • Another way to derive a kernel from a probability distribution
     • Useful when objects can be compared by comparing subparts
     • Encouraging result on a real-world application: “how to improve a weight-matrix-based classifier”
     • Future work: more application-specific kernels

  23. Acknowledgements: Minoru Kanehisa; Applied Biosystems for the travel grant
