SLIDE 1

Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings

Jean-Philippe Vert, Bioinformatics Center, Kyoto University, Japan

SLIDE 2

Outline

  • 1. SVM and kernel methods
  • 2. New kernels for bioinformatics
  • 3. Example: signal peptide cleavage site prediction
SLIDE 3

Part 1

SVM and kernel methods

SLIDE 4

Support vector machines

  • Objects to be classified x are mapped to a feature space by φ
  • Largest-margin separating hyperplane in the feature space
SLIDE 5

The kernel trick

  • Implicit definition of x → Φ(x) through the kernel (illustrated in the sketch below):

    K(x, y) := ⟨Φ(x), Φ(y)⟩

  • Simple kernels can represent complex Φ
  • For a given kernel, not only SVM but also clustering, PCA, ICA... are possible in the feature space (= kernel methods)
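
Not part of the original slides: a minimal Python/NumPy sketch of the kernel trick. For the degree-2 homogeneous polynomial kernel, K(x, y) = ⟨x, y⟩² equals ⟨Φ(x), Φ(y)⟩ for the explicit feature map Φ(x) = (x_i x_j)_{i,j}, so the kernel evaluates the feature-space inner product without ever constructing the feature space.

    import numpy as np

    def poly_kernel(x, y):
        # Degree-2 homogeneous polynomial kernel: K(x, y) = <x, y>^2
        return np.dot(x, y) ** 2

    def phi(x):
        # Explicit feature map: all pairwise products x_i * x_j (dimension d^2)
        return np.outer(x, x).ravel()

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([0.5, -1.0, 2.0])

    # The two values agree: the kernel computes <phi(x), phi(y)> implicitly
    print(poly_kernel(x, y))          # 20.25
    print(np.dot(phi(x), phi(y)))     # 20.25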

SLIDE 6

Kernel examples

  • “Classical” kernels: polynomial, Gaussian, sigmoid... but the objects x must be vectors
  • “Exotic” kernels for strings:
    ⋆ Fisher kernel (Jaakkola and Haussler 98)
    ⋆ Convolution kernels (Haussler 99, Watkins 99)
    ⋆ Kernel for translation initiation site (Zien et al. 00)
    ⋆ String kernel (Lodhi et al. 00)
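
Not in the original slides: a sketch of how any kernel, classical or exotic, is plugged into an SVM in practice, assuming scikit-learn and a Gaussian kernel on toy vector data (the data, kernel width and helper names here are illustrative only).

    import numpy as np
    from sklearn.svm import SVC

    def gaussian_kernel(X, Y, sigma=1.0):
        # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated for all pairs
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq_dists / (2 * sigma ** 2))

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(40, 5))
    y_train = (X_train[:, 0] > 0).astype(int)      # toy labels
    X_test = rng.normal(size=(10, 5))

    # Any kernel can be supplied to the SVM as a precomputed Gram matrix
    svm = SVC(kernel="precomputed")
    svm.fit(gaussian_kernel(X_train, X_train), y_train)
    predictions = svm.predict(gaussian_kernel(X_test, X_train))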

SLIDE 7

Kernel engineering

Use prior knowledge to build the geometry of the feature space through K(., .)

SLIDE 8

Part 2

New kernels for bioinformatics

SLIDE 9

The problem

  • X a set of objects
  • p(x) a probability distribution on X
  • How to build K(x, y) from p(x)?
SLIDE 10

Product kernel

    K_prod(x, y) = p(x) p(y)

[Figure: objects x and y are represented by their probabilities p(x) and p(y)]

SVM = Bayesian classifier
SLIDE 11

Diagonal kernel

    K_diag(x, y) = p(x) δ(x, y)

[Figure: distinct objects x, y, z are mutually orthogonal in the feature space, with squared norms p(x), p(y), p(z)]

No learning
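
A minimal Python sketch (not from the slides) of the two building blocks K_prod and K_diag on a toy finite set of objects with distribution p:

    def k_prod(x, y, p):
        # Product kernel: K_prod(x, y) = p(x) * p(y)  (the feature map is the scalar p(x))
        return p[x] * p[y]

    def k_diag(x, y, p):
        # Diagonal kernel: K_diag(x, y) = p(x) * delta(x, y)  (distinct objects are orthogonal)
        return p[x] if x == y else 0.0

    p = {"A": 0.5, "B": 0.3, "C": 0.2}   # toy distribution on X = {A, B, C}
    print(k_prod("A", "B", p))   # 0.15
    print(k_diag("A", "A", p))   # 0.5
    print(k_diag("A", "B", p))   # 0.0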

SLIDE 12

Interpolated kernel

If objects are composite, x = (x1, x2):

    K(x, y) = K_diag(x1, y1) K_prod(x2, y2)
            = p(x1) δ(x1, y1) × p(x2|x1) p(y2|y1)

[Figure: the four two-letter strings AA, AB, BA, BB in the feature space, grouped into A* and B* by their first letter]

SLIDE 13

General interpolated kernel

  • Composite objects x = (x1, . . . , xn)
  • A list of index subsets: V = {I1, . . . , Iv} where Ii ⊂ {1, . . . , n}
  • Interpolated kernel:

    K_V(x, y) = (1/|V|) Σ_{I ∈ V} K_diag(x_I, y_I) K_prod(x_{I^c}, y_{I^c})
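
A direct Python sketch of K_V (not from the slides). For simplicity it assumes p factorizes over positions, so that the probability of a subpart is the product of position-wise probabilities; the distribution, strings and list V below are toy choices.

    import numpy as np

    def k_V(x, y, p, V):
        # K_V(x, y) = (1/|V|) * sum over I in V of K_diag(x_I, y_I) * K_prod(x_Ic, y_Ic)
        n = len(x)
        total = 0.0
        for I in V:
            Ic = [i for i in range(n) if i not in I]
            # K_diag(x_I, y_I) = p(x_I) * delta(x_I, y_I)
            k_diag = np.prod([p[i][x[i]] for i in I]) if all(x[i] == y[i] for i in I) else 0.0
            # K_prod(x_Ic, y_Ic) = p(x_Ic) * p(y_Ic)
            k_prod = np.prod([p[i][x[i]] * p[i][y[i]] for i in Ic])
            total += k_diag * k_prod
        return total / len(V)

    p = [{"A": 0.6, "B": 0.4}] * 3            # toy position-wise distributions
    V = [(0,), (1,), (2,), (0, 1)]            # a toy list of index subsets
    print(k_V("ABA", "ABB", p, V))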

SLIDE 14

Rare common subparts

For a given p(x) and p(y), we have:

    K_V(x, y) = K_prod(x, y) × (1/|V|) Σ_{I ∈ V} δ(x_I, y_I) / p(x_I)

x and y get closer in the feature space when they share rare common subparts.

SLIDE 15

Implementation

  • Factorization for particular choices of p(.) and V
  • Example:
    ⋆ V = P({1, . . . , n}), the set of all subsets: |V| = 2^n
    ⋆ product distribution p(x) = Π_{j=1}^{n} p_j(x_j)
    ⋆ implementation in O(n) because Σ_{I ∈ V} (. . .) = Π_{i=1}^{n} (. . .)
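
A Python sketch of this factorization (assuming V = all subsets of positions and a position-wise product distribution p; the toy alphabet and sequences are illustrative). The brute-force average over all 2^n subsets and the O(n) product over positions return the same value, because the sum over subsets of products factorizes into a product over positions of (diagonal term + product term).

    from itertools import combinations
    import numpy as np

    def k_v_bruteforce(x, y, p):
        # Direct evaluation: K_V(x, y) = (1/2^n) * sum over all subsets I of
        # K_diag(x_I, y_I) * K_prod(x_Ic, y_Ic), with p a position-wise product distribution
        n = len(x)
        total = 0.0
        for size in range(n + 1):
            for I in combinations(range(n), size):
                Ic = [i for i in range(n) if i not in I]
                k_diag = np.prod([p[i][x[i]] for i in I]) if all(x[i] == y[i] for i in I) else 0.0
                k_prod = np.prod([p[i][x[i]] * p[i][y[i]] for i in Ic])
                total += k_diag * k_prod
        return total / 2 ** n

    def k_v_factorized(x, y, p):
        # Same kernel in O(n): each position contributes (diagonal term + product term),
        # and the sum over all 2^n subsets equals the product of these per-position sums
        factors = []
        for i in range(len(x)):
            diag_i = p[i][x[i]] if x[i] == y[i] else 0.0
            prod_i = p[i][x[i]] * p[i][y[i]]
            factors.append(diag_i + prod_i)
        return np.prod(factors) / 2 ** len(x)

    p = [{"A": 0.6, "B": 0.4}] * 4            # toy position-wise distributions
    x, y = "ABBA", "ABAA"
    print(k_v_bruteforce(x, y, p), k_v_factorized(x, y, p))   # identical values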

SLIDE 16

Part 3

Application: SVM prediction of signal peptide cleavage site

SLIDE 17

Secretory pathway

[Figure: the secretory pathway. mRNA is translated into a nascent protein carrying a signal peptide, which passes through the ER and Golgi toward the cell surface (secreted), the lysosome, or the plasma membrane; other destinations shown: nucleus, chloroplast, mitochondrion, peroxisome, cytosol]

SLIDE 18

Signal peptides

Example precursor proteins, with the cleavage site (between positions -1 and +1) marked by the space:

    (1) MKANAKTIIAGMIALAISHTAMA EE...
    (2) MKQSTIALALLPLLFTPVTKA RT...
    (3) MKATKLVLGAVILGSTLLAG CS...

(1): Leucine-binding protein, (2): Pre-alkaline phosphatase, (3): Pre-lipoprotein

  • 6-12 hydrophobic residues (highlighted in yellow in the original slide)
  • (-3, -1): small uncharged residues
SLIDE 19

Experiment

  • Challenge: classification of amino-acid windows [x−8, x−7, . . . , x−1, x1, x2], positive if cleavage occurs between -1 and +1 (see the window-extraction sketch below)
  • 1,418 positive examples, 65,216 negative examples
  • Computation of a weight matrix: SVM + K_prod (naive Bayes) vs SVM + K_interpolated
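
Not from the original slides: a hypothetical Python sketch of how such windows could be extracted and labelled from a sequence with a known cleavage site. The helper name extract_windows and the indexing convention are illustrative only; the toy sequence is example (2) from the signal peptide slide, whose mature protein starts at position 21 (0-based).

    def extract_windows(sequence, cleavage_pos):
        # Slide a 10-residue window [x-8, ..., x-1, x1, x2] along the sequence.
        # A window is positive when the annotated cleavage site falls between
        # its -1 and +1 positions, i.e. when the candidate +1 index equals cleavage_pos.
        examples = []
        for site in range(8, len(sequence) - 1):        # 'site' = candidate +1 position
            window = sequence[site - 8: site + 2]       # 8 residues before the bond, 2 after
            label = 1 if site == cleavage_pos else 0
            examples.append((window, label))
        return examples

    seq = "MKQSTIALALLPLLFTPVTKART"     # pre-alkaline phosphatase, cleavage before 'R'
    for window, label in extract_windows(seq, 21):
        if label == 1:
            print(window, label)        # LFTPVTKART 1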

SLIDE 20

Result: ROC curves

[Figure: ROC curves plotting false negative rate (%) against false positive rate (%) for the product kernel (Bayes) and the interpolated kernel]
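
Not from the slides: a small Python sketch of how such a curve can be traced from classifier scores, sweeping a decision threshold and recording false positive and false negative rates (the scores, labels and thresholds below are toy values).

    import numpy as np

    def fp_fn_curve(scores, labels, thresholds):
        # For each threshold, report (false positive %, false negative %)
        scores, labels = np.asarray(scores), np.asarray(labels)
        curve = []
        for t in thresholds:
            predicted_positive = scores >= t
            fp = np.mean(predicted_positive[labels == 0]) * 100    # negatives accepted
            fn = np.mean(~predicted_positive[labels == 1]) * 100   # positives rejected
            curve.append((fp, fn))
        return curve

    scores = [2.1, 1.4, 0.3, -0.2, -1.0, -1.7]   # toy SVM decision values
    labels = [1, 1, 1, 0, 0, 0]
    print(fp_fn_curve(scores, labels, thresholds=[-2.0, 0.0, 2.0]))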

SLIDE 21

Conclusion

SLIDE 22

Conclusion

  • Another way to derive a kernel from a probability distribution
  • Useful when objects can be compared by comparing subparts
  • Encouraging results on a real-world application: “how to improve a weight-matrix-based classifier”
  • Future work: more application-specific kernels
SLIDE 23

Acknowledgement

  • Minoru Kanehisa
  • Applied Biosystems for the travel grant