10. Support Vector Machines


Foundations of Machine Learning, CentraleSupélec, Fall 2017. 10. Support Vector Machines. Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech, chloe-agathe.azencott@mines-paristech.fr. Learning objectives: Define a…


  1. Positive definite matrices
     ● Have a unique Cholesky decomposition A = LL*, with L lower triangular and with positive elements on the diagonal.
     ● The sesquilinear form ⟨x, y⟩ = y*Ax is an inner product:
       – conjugate symmetry
       – linearity in the first argument
       – positive definiteness
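
A minimal NumPy sketch (mine, not from the slides) of the Cholesky property: the factorization of a positive definite Gram matrix succeeds with a positive diagonal, and fails on a matrix with a negative eigenvalue.

```python
import numpy as np

# Gram matrix of data vectors; a small ridge makes it strictly positive definite.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
K = X @ X.T + 1e-6 * np.eye(3)

# Unique Cholesky factorization K = L L^T, L lower triangular,
# positive elements on the diagonal.
L = np.linalg.cholesky(K)
assert np.allclose(L @ L.T, K)
assert np.all(np.diag(L) > 0)

# A symmetric matrix with a negative eigenvalue is not positive definite
# and has no Cholesky factorization.
M = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1
try:
    np.linalg.cholesky(M)
except np.linalg.LinAlgError as err:
    print("not positive definite:", err)
```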

  2. Polynomial kernels
     ? Compute ⟨x, x'⟩² for x, x' ∈ ℝ².

  3. Polynomial kernels
     More generally, for any degree d, K(x, x') = (⟨x, x'⟩ + 1)^d is an inner product in a feature space of all monomials of degree up to d.
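
A small sketch of the d = 2 case in ℝ² (my own, not in the deck): (⟨x, x'⟩ + 1)² equals the dot product of explicit monomial features, so the kernel computes an inner product in ℝ⁶ at the cost of one dot product in ℝ².

```python
import numpy as np

def phi(x):
    """Explicit feature map for (<x, x'> + 1)^2 in 2D: all monomials of
    degree up to 2, with the sqrt(2) weights from the binomial expansion."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.3, -1.2])
xp = np.array([2.0, 0.5])

implicit = (x @ xp + 1) ** 2   # one dot product in R^2
explicit = phi(x) @ phi(xp)    # one dot product in R^6

assert np.isclose(implicit, explicit)
```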

  4. Gaussian kernel
     K(x, x') = exp(−‖x − x'‖² / (2σ²))
     ? What is the dimension of the feature space?

  5. Gaussian kernel
     The feature space has infinite dimension.
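
One way to see it (a sketch of mine, not the slides'): expanding exp(⟨x, x'⟩/σ²) as a power series writes the Gaussian kernel as a weighted sum of polynomial kernels of every degree, so the feature space stacks monomials of all degrees.

```python
import numpy as np
from math import factorial

def gaussian_kernel(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

def truncated_series(x, xp, sigma=1.0, terms=10):
    """exp(<x, x'>/sigma^2) = sum_n (<x, x'>/sigma^2)^n / n!, each term a
    polynomial kernel of degree n: infinitely many monomial features."""
    s = np.dot(x, xp) / sigma ** 2
    series = sum(s ** n / factorial(n) for n in range(terms))
    norm = np.exp(-(np.dot(x, x) + np.dot(xp, xp)) / (2 * sigma ** 2))
    return norm * series

x, xp = np.array([0.5, 1.0]), np.array([1.5, -0.5])
print(gaussian_kernel(x, xp))             # exact value
print(truncated_series(x, xp, terms=10))  # approaches it as terms grows
```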


  7. Toy example

  8. Toy example: linear SVM

  9. Toy example: polynomial SVM (d=2)
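
The toy slides are images in the transcript; a scikit-learn sketch in the same spirit (make_circles is my stand-in for the toy data) shows the gap between the two models on a set that is not linearly separable.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no separating hyperplane exists in R^2.
X, y = make_circles(n_samples=200, factor=0.4, noise=0.08, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
poly2 = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0).fit(X, y)

print("linear SVM accuracy:", linear.score(X, y))   # poor, near chance
print("poly d=2 SVM accuracy:", poly2.score(X, y))  # near perfect
```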

  10. Kernels for strings

  11. Protein sequence classification
     Goal: predict which proteins are secreted or not, based on their sequence.

  12. Substring-based representations
     ● Represent strings based on the presence/absence of substrings of fixed length.
     ? How many strings of length k are there?

  13. Substring-based representations
     ● Represent strings based on the presence/absence of substrings of fixed length.
       – Number of occurrences of u in x: spectrum kernel [Leslie et al., 2002].

  14. Substring-based representations
     ● Represent strings based on the presence/absence of substrings of fixed length.
       – Number of occurrences of u in x: spectrum kernel [Leslie et al., 2002].
       – Number of occurrences of u in x, up to m mismatches: mismatch kernel [Leslie et al., 2004].

  15. Substring-based representations
     ● Represent strings based on the presence/absence of substrings of fixed length.
       – Number of occurrences of u in x: spectrum kernel [Leslie et al., 2002].
       – Number of occurrences of u in x, up to m mismatches: mismatch kernel [Leslie et al., 2004].
       – Number of occurrences of u in x, allowing gaps, with a weight decaying exponentially with the number of gaps: substring kernel [Lodhi et al., 2002].

  16. Spectrum kernel
     ● Implementation:
       – Formally, K(x, x') = Σ_{u ∈ A^k} Φ_u(x) Φ_u(x'), a sum over |A|^k terms.
       – ? How many non-zero terms in Φ(x)?

  17. Spectrum kernel
     ● Implementation:
       – Formally, K(x, x') = Σ_{u ∈ A^k} Φ_u(x) Φ_u(x'), a sum over |A|^k terms.
       – At most |x| − k + 1 non-zero terms in Φ(x).
       – Hence: computation in O(|x| + |x'|).
     ● Prediction for a new sequence x:
       ? Write f(x) as a function of only |x| − k + 1 weights.

  18. Spectrum kernel
     ● Implementation:
       – Formally, K(x, x') = Σ_{u ∈ A^k} Φ_u(x) Φ_u(x'), a sum over |A|^k terms.
       – At most |x| − k + 1 non-zero terms in Φ(x).
       – Hence: computation in O(|x| + |x'|).
     ● Fast prediction for a new sequence x:
       f(x) = Σ_i α_i y_i K(x_i, x) = Σ_{u ∈ A^k} w_u Φ_u(x), with w_u = Σ_i α_i y_i Φ_u(x_i); only the |x| − k + 1 k-mers of x contribute.
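
A plain-Python sketch of slides 16-18 (my own code; the α, y and sequences are made-up toy values): k-mer counting gives the sparse Φ, the kernel only visits non-zero entries, and prediction folds the support vectors into one weight per k-mer so f(x) touches only the k-mers of x.

```python
from collections import Counter

def spectrum(x, k):
    """k-mer counts of x: at most |x| - k + 1 non-zero entries."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))

def spectrum_kernel(x, xp, k):
    """K(x, x') = sum_u Phi_u(x) Phi_u(x'); iterating over the sparse
    counts costs O(|x| + |x'|) for fixed k, not |A|^k."""
    cx, cxp = spectrum(x, k), spectrum(xp, k)
    if len(cx) > len(cxp):
        cx, cxp = cxp, cx
    return sum(v * cxp[u] for u, v in cx.items())

def svm_weights(alphas, ys, seqs, k):
    """Fold the support vectors into w_u = sum_i alpha_i y_i Phi_u(x_i)."""
    w = Counter()
    for a, y, s in zip(alphas, ys, seqs):
        for u, c in spectrum(s, k).items():
            w[u] += a * y * c
    return w

def predict(w, x, k, b=0.0):
    """f(x) = sum_u w_u Phi_u(x) + b: only the k-mers of x contribute."""
    return sum(w[u] * c for u, c in spectrum(x, k).items()) + b

w = svm_weights([0.5, 0.8], [+1, -1], ["MKVLAA", "MKKLSA"], k=3)
print(spectrum_kernel("MKVLAA", "MKVLSA", k=3))
print(predict(w, "MKVLSA", k=3))
```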

  19. The choice of kernel matters
     Performance of several kernels on the SCOP superfamily recognition task [Saigo et al., 2004].

  20. Kernels for graphs

  21. Graph data
     ● Molecules
     ● Images [Harchaoui & Bach, 2007]

  22. Subgraph-based representations
     [Figure: a molecule encoded as the bit vector (0 1 1 0 0 1 0 0 0 1 0 1 0 0 1); an entry is 0 if the corresponding subgraph feature has no occurrence and 1 if it has 1+ occurrences, as shown for the 1st and 10th features.]

  23. Tanimoto & MinMax
     ● The Tanimoto and MinMax similarities are kernels:
       – Tanimoto: K(x, x') = ⟨x, x'⟩ / (⟨x, x⟩ + ⟨x', x'⟩ − ⟨x, x'⟩)
       – MinMax: K(x, x') = Σ_u min(x_u, x'_u) / Σ_u max(x_u, x'_u)
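
A small sketch of both similarities on fingerprint vectors (mine, not from the deck); on binary fingerprints Tanimoto is intersection over union and the two coincide.

```python
import numpy as np

def tanimoto(x, xp):
    """Tanimoto: <x, x'> / (<x, x> + <x', x'> - <x, x'>)."""
    dot = np.dot(x, xp)
    return dot / (np.dot(x, x) + np.dot(xp, xp) - dot)

def minmax(x, xp):
    """MinMax: sum_u min(x_u, x'_u) / sum_u max(x_u, x'_u),
    which also handles occurrence counts, not just bits."""
    return np.minimum(x, xp).sum() / np.maximum(x, xp).sum()

x = np.array([0, 1, 1, 0, 1, 0, 1])
xp = np.array([1, 1, 0, 0, 1, 0, 1])
print(tanimoto(x, xp), minmax(x, xp))  # both 0.6 on these binary vectors
```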

  24. Which subgraphs to use?
     ● Indexing by all subgraphs...
       – Computing all subgraph occurrences is NP-hard.
       – Actually, finding whether a given subgraph occurs in a graph is NP-hard in general.
     http://jeremykun.com/2015/11/12/a-quasipolynomial-time-algorithm-for-graph-isomorphism-the-details/

  25. Which subgraphs to use?
     ● Specific subgraphs that lead to computationally efficient indexing:
       – Subgraphs selected based on domain knowledge, e.g. chemical fingerprints
       – All frequent subgraphs [Helma et al., 2004]
       – All paths up to length k [Nicholls 2005] (see the sketch after this list)
       – All walks up to length k [Mahé et al., 2005]
       – All trees up to depth k [Rogers, 2004]
       – All shortest paths [Borgwardt & Kriegel, 2005]
       – All subgraphs up to k vertices (graphlets) [Shervashidze et al., 2009]
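
As an illustration of the path-based option, a sketch of mine (the labeled graph is made up, loosely a C-C-C-O chain): enumerate labeled paths of up to k edges by depth-first extension, then compare two such Counters with, e.g., the MinMax kernel above.

```python
from collections import Counter

def path_features(adj, labels, k):
    """Count labeled paths (no repeated vertex) with at most k edges,
    extending depth-first from every start vertex; an undirected path
    is therefore counted once per orientation."""
    feats = Counter()

    def extend(path):
        feats["-".join(labels[v] for v in path)] += 1
        if len(path) <= k:  # a path of n vertices has n - 1 edges
            for w in adj[path[-1]]:
                if w not in path:
                    extend(path + [w])

    for v in adj:
        extend([v])
    return feats

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "C", 1: "C", 2: "C", 3: "O"}
print(path_features(adj, labels, k=2))
```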

  26. Which subgraphs to use?
     [Figure: examples of a path of length 5, a walk of length 5, and a tree of depth 2.]

  27. Which subgraphs to use?
     [Figure: paths, trees, and walks illustrated on a graph. [Harchaoui & Bach, 2007]]

  28. The choice of kernel matters
     Predicting inhibitors for 60 cancer cell lines [Mahé & Vert, 2009].

  29. The choice of kernel matters
     ● COREL14: 1400 natural images, 14 classes
     ● Kernels: histogram (H), walk kernel (W), subtree kernel (TW), weighted subtree kernel (wTW), combination (M). [Harchaoui & Bach, 2007]

  30. Summary
     ● Linearly separable case: hard-margin SVM
     ● Non-separable, but still linear: soft-margin SVM
     ● Non-linear: kernel SVM
     ● Kernels for:
       – real-valued data
       – strings
       – graphs
