Support Vector Machines


  1. Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Support Vector Machines. Prof. Mike Hughes. Many ideas/slides attributable to: Dan Sheldon (U.Mass.); James, Witten, Hastie, Tibshirani (ISL/ESL books).

  2. SVM Unit Objectives. Big ideas for the Support Vector Machine: • Why maximize the margin? • What is a support vector? • What is hinge loss? • Advantages over logistic regression: • Less sensitive to outliers • Advantages from sparsity when using kernels • Disadvantages: • Not probabilistic • Less elegant to do multi-class

  3. What will we learn? [Course overview diagram: supervised learning maps data-label pairs $\{x_n, y_n\}_{n=1}^N$ through training, evaluation with a performance measure, and a prediction task; unsupervised learning and reinforcement learning shown alongside.]

  4. Recall: Kernelized Regression

  5. Linear Regression

  6. Kernel Function: $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. Interpret: a similarity function for $x_i$ and $x_j$. Properties: input is any two vectors; output is a scalar real number; larger values mean more similar; symmetric.
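A minimal Python sketch of this definition (the quadratic feature map `phi` below is an illustrative choice, not one from the slides): the kernel value is the dot product of the mapped features, and it is symmetric in its two arguments.

```python
import numpy as np

def phi(x):
    # Illustrative feature map: original features plus their squares.
    return np.concatenate([x, x ** 2])

def kernel(x_i, x_j):
    # k(x_i, x_j) = phi(x_i)^T phi(x_j): a scalar similarity score.
    return np.dot(phi(x_i), phi(x_j))

x_i = np.array([1.0, 2.0])
x_j = np.array([0.5, -1.0])
assert np.isclose(kernel(x_i, x_j), kernel(x_j, x_i))  # symmetric
```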

  7. Common Kernels
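The slide's content (presumably a list of kernel formulas) is not preserved in this transcript. For reference, three commonly used kernels are the linear, polynomial, and radial basis (squared exponential) kernels; the degree $d$ and bandwidth $\sigma$ are hyperparameters.

```latex
\begin{align*}
k_{\text{lin}}(x_i, x_j)  &= x_i^T x_j \\
k_{\text{poly}}(x_i, x_j) &= (1 + x_i^T x_j)^d \\
k_{\text{rbf}}(x_i, x_j)  &= \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)
\end{align*}
```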

  8. Kernel Matrices. $K_{\text{train}}$: $N \times N$ symmetric matrix
$$K = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \dots & k(x_1, x_N) \\ k(x_2, x_1) & k(x_2, x_2) & \dots & k(x_2, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_N, x_1) & k(x_N, x_2) & \dots & k(x_N, x_N) \end{bmatrix}$$
$K_{\text{test}}$: $T \times N$ matrix for the test feature vectors.
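A short sketch of how these two matrices could be assembled, assuming `kernel(a, b)` is any function with the properties above (the function and array names here are illustrative):

```python
import numpy as np

def kernel_matrix(X_a, X_b, kernel):
    # Entry (i, j) holds kernel(X_a[i], X_b[j]).
    return np.array([[kernel(a, b) for b in X_b] for a in X_a])

# X_train: (N, D) array of training features, X_test: (T, D) array of test features
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2.0)
X_train = np.random.randn(5, 2)
X_test = np.random.randn(3, 2)

K_train = kernel_matrix(X_train, X_train, rbf)  # N x N, symmetric
K_test = kernel_matrix(X_test, X_train, rbf)    # T x N
```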

  9. Linear Kernel Regression

  10. Radial basis kernel (aka Gaussian, aka squared exponential)

  11. Compare: Linear Regression. Prediction: linear transform of $G$-dim features, $\hat{y}(x_i, \theta) = \theta^T \phi(x_i) = \sum_{g=1}^{G} \theta_g \, \phi(x_i)_g$. Training: solve a $G$-dim optimization problem, $\min_\theta \sum_{n=1}^{N} (y_n - \hat{y}(x_n, \theta))^2$, plus an optional L2 penalty.

  12. Kernelized Linear Regression. Prediction: $\hat{y}(x_i, \alpha, \{x_n\}_{n=1}^N) = \sum_{n=1}^{N} \alpha_n \, k(x_n, x_i)$. Training: solve an $N$-dim optimization problem, $\min_\alpha \sum_{n=1}^{N} (y_n - \hat{y}(x_n, \alpha, X))^2$. Can do all needed operations with only access to the kernel (no feature vectors are created in memory).
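A minimal sketch of this prediction rule in Python, together with one common way to obtain the weights $\alpha$ (a kernel ridge solution with a small L2 penalty `lam`; that particular solver choice is an assumption, not something stated on the slide):

```python
import numpy as np

def fit_alpha(K_train, y_train, lam=0.1):
    # Assumed kernel-ridge solver: alpha = (K_train + lam * I)^{-1} y_train.
    N = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(N), y_train)

def predict(K_test, alpha):
    # y_hat(x_i) = sum_n alpha_n * k(x_n, x_i), computed for every test row at once.
    return K_test @ alpha
```

Note that both steps touch the data only through the kernel matrices, which is the point the slide is making.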

  13. Math on board • Goal: Kernelized linear prediction reweights each training example

  14. Can kernelize any linear model. Regression prediction: $\hat{y}(x_i, \alpha, \{x_n\}_{n=1}^N) = \sum_{n=1}^{N} \alpha_n \, k(x_n, x_i)$. Logistic regression prediction: $p(Y_i = 1 \mid x_i) = \sigma(\hat{y}(x_i, \alpha, X))$.

  15. Training: Regression vs. Logistic Regression. Regression: $\min_\alpha \sum_{n=1}^{N} (y_n - \hat{y}(x_n, \alpha, X))^2$. Logistic regression: $\min_\alpha \sum_{n=1}^{N} \text{log loss}(y_n, \sigma(\hat{y}(x_n, \alpha, X)))$.
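A small sketch contrasting the two per-example losses being summed above (squared error for regression, log loss for logistic regression); the clipping constant `eps` is only there for numerical safety and is not part of the slide.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def squared_error(y, y_hat):
    # Regression loss on one example.
    return (y - y_hat) ** 2

def log_loss(y, s, eps=1e-12):
    # Logistic regression loss on one example: y in {0, 1}, s is the score y_hat(x, alpha, X).
    p = np.clip(sigmoid(s), eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```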

  16. Downsides of LR: log loss means any misclassified example pays a steep price, so it is fairly sensitive to outliers.

  17. Stepping back. Which do we prefer? Why?

  18. Idea: Define Regions Separated by Wide Margin

  19. w is perpendicular to boundary

  20. Examples that define the margin are called support vectors: the nearest positive example and the nearest negative example.

  21. Observation: non-support training examples do not influence the margin at all. We could perturb these examples without impacting the boundary.

  22. How wide is the margin?

  23. Small margin. Decision rule: $\hat{y} = +1$ if $w^T x + b \ge 0$, $-1$ if $w^T x + b < 0$. Positive examples lie where $w^T x + b > 0$, negative examples where $w^T x + b < 0$. Margin: distance to the boundary.

  24. Large margin. Same decision rule: $\hat{y} = +1$ if $w^T x + b \ge 0$, $-1$ if $w^T x + b < 0$. Positive examples lie where $w^T x + b > 0$, negative examples where $w^T x + b < 0$. Margin: distance to the boundary.

  25. How wide is the margin? Distance from nearest positive example to nearest negative example along vector w:

  26. How wide is the margin? Distance from nearest positive example to nearest negative example along vector w: By construction, we assume
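The assumed constraint appears only in the slide image. A standard convention (reconstructed here, not quoted from the slides) is to scale $w$ and $b$ so that the nearest example on each side scores exactly $\pm 1$; the gap between them measured along $w$ is then $2/\|w\|$, which is what slide 28 refers to as the margin width.

```latex
w^T x_{+} + b = +1, \qquad w^T x_{-} + b = -1
\;\;\Longrightarrow\;\;
\text{margin width} = \frac{w^T (x_{+} - x_{-})}{\|w\|} = \frac{2}{\|w\|}
```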

  27. SVM Training Problem, Version 1: Hard margin. For each n = 1, 2, ..., N, requires all training examples to be correctly classified. This is a constrained quadratic optimization problem. There are standard optimization methods as well as methods specially designed for the SVM.

  28. SVM Training Problem, Version 1: Hard margin. Minimizing this objective is equivalent to maximizing the margin width in feature space. For each n = 1, 2, ..., N, requires all training examples to be correctly classified. This is a constrained quadratic optimization problem. There are standard optimization methods as well as methods specially designed for the SVM.
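The objective and constraints themselves are only in the slide images; the standard hard-margin formulation they correspond to is (a reconstruction, not a verbatim quote from the slides):

```latex
\min_{w, b} \;\; \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_n \,(w^T x_n + b) \ge 1, \qquad n = 1, \dots, N
```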

  29. Soft margin: allow some misclassifications. Slack at example i: the distance on the wrong side of the margin.

  30. Hard vs. soft constraints. HARD: all positive examples must satisfy the margin constraint exactly. SOFT: want each positive example to satisfy the constraint up to some slack, with the slack as small as possible (minimize its absolute value).

  31. Hinge loss for a positive example. Assumes the current example has positive label $y = +1$. [Plot: hinge loss as a function of the score, with axis marks at 0 and +1.]
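A minimal sketch of the hinge loss for a single example, matching the positive-label case plotted on the slide: the loss is zero once the score clears the margin at $+1$ and grows linearly otherwise.

```python
def hinge_loss(y, score):
    # y in {-1, +1}; score is w^T x + b.
    # Zero once y * score >= 1 (correct side of the margin), linear otherwise.
    return max(0.0, 1.0 - y * score)

# For a positive example (y = +1):
assert hinge_loss(+1, 2.0) == 0.0   # well past the margin
assert hinge_loss(+1, 0.5) == 0.5   # correct side, but inside the margin
assert hinge_loss(+1, -1.0) == 2.0  # misclassified
```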

  32. SVM Training Problem, Version 2: Soft margin. Tradeoff parameter C controls model complexity. Smaller C: simpler model, encourages a large margin even if we make lots of mistakes. Bigger C: avoid mistakes.
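Written out, the standard soft-margin problem the slide describes (again a reconstruction, using the slack variables $\xi_n$ from slide 29) trades margin width against total slack through C:

```latex
\min_{w, b, \xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} \xi_n
\quad \text{subject to} \quad
y_n \,(w^T x_n + b) \ge 1 - \xi_n, \quad \xi_n \ge 0, \qquad n = 1, \dots, N
```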

  33. SVM vs. LR: Compare training. Both require tuning a complexity hyperparameter C to avoid overfitting.

  34. Loss functions: SVM vs Logistic Regr.

  35. SVMs: Prediction. Make binary prediction via hard threshold.

  36. SVMs and Kernels: Prediction. Make binary prediction via hard threshold. Efficient training algorithms using modern quadratic programming solve the dual of the SVM soft-margin optimization problem.
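A sketch of the kernelized prediction rule, reusing the per-example weights $\alpha$ from the kernel view on slide 12 (the bias term `b` is an assumption; only the support vectors have non-zero $\alpha_n$, so the sum is typically sparse):

```python
def svm_predict(x_new, X_train, alpha, b, kernel):
    # Hard-threshold the kernelized score: sign(sum_n alpha_n * k(x_n, x_new) + b).
    score = sum(a_n * kernel(x_n, x_new) for a_n, x_n in zip(alpha, X_train)) + b
    return +1 if score >= 0 else -1
```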

  37. Support vectors are often a small fraction of all examples. In the figure: the nearest positive example and the nearest negative example.

  38. Support vectors are defined by non-zero alpha in the kernel view.

  39. SVM + Squared Exponential Kernel

  40. SVM vs. Logistic Regression:
      Loss: SVM uses hinge loss; logistic regression uses cross entropy (log loss).
      Sensitive to outliers: SVM less; logistic regression more.
      Probabilistic? SVM no; logistic regression yes.
      Kernelizable? SVM yes, with speed benefits from sparsity; logistic regression yes.
      Multi-class? SVM only via a separate one-vs-all classifier for each class; logistic regression easy, use softmax.

  41. Activity • KernelRegressionDemo.ipynb • Scroll down to SVM vs logistic regression • Can you visualize behavior with different C? • Try different kernels? • Examine alpha vector?
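The notebook itself is not included here; a minimal scikit-learn sketch of the kind of experiment the activity asks for (the toy dataset and the particular C values and kernels are illustrative choices):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D binary classification data.
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

for C in [0.01, 1.0, 100.0]:
    for kern in ["linear", "rbf"]:
        clf = SVC(C=C, kernel=kern).fit(X, y)
        # clf.support_ lists the support vector indices;
        # clf.dual_coef_ holds their non-zero alpha values (the "alpha vector" above).
        print(f"C={C}, kernel={kern}: {clf.support_.shape[0]} support vectors")
```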

