
  1. IAML: Support Vector Machines I. Nigel Goddard, School of Informatics, Semester 1.

  2. Outline
  ◮ Separating hyperplane with maximum margin
  ◮ Non-separable training data
  ◮ Expanding the input into a high-dimensional space
  ◮ Support vector regression
  ◮ Reading: W & F sec 6.3 (maximum margin hyperplane, nonlinear class boundaries), SVM handout. SV regression not examinable.

  3. Overview
  ◮ Support vector machines are one of the most effective and widely used classification algorithms.
  ◮ SVMs are the combination of two ideas:
    ◮ Maximum margin classification
    ◮ The “kernel trick”
  ◮ SVMs are linear classifiers, like logistic regression and the perceptron.

  4. Stuff You Need to Remember
  ◮ w⊤x is the length of the projection of x onto w (if w is a unit vector), i.e., b = w⊤x.
  [Figure: the vector x projected onto w; the projection has length b.]
  ◮ (If you do not remember this, see the supplementary maths notes on the course web site.)
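
A minimal numerical sketch of this fact (the vectors below are made up for illustration): when w is a unit vector, the length of the projection of x onto w is just the dot product w⊤x.

```python
import numpy as np

# Illustrative unit vector w and an arbitrary point x.
w = np.array([0.6, 0.8])   # ||w|| = 1
x = np.array([2.0, 1.0])

# Length of the projection of x onto w: b = w^T x (valid because w is a unit vector).
b = w @ x
print(b)                   # 0.6*2.0 + 0.8*1.0 = 2.0
```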

  5. Separating Hyperplane
  For any linear classifier:
  ◮ Training instances (x_i, y_i), i = 1, ..., n, with y_i ∈ {−1, +1}
  ◮ Hyperplane w⊤x + w_0 = 0
  ◮ Notice that for this lecture we use −1 rather than 0 for the negative class. This will be convenient for the maths.
  [Figure: two classes of points (o and x) in the (x_1, x_2) plane, separated by a hyperplane with normal vector w.]
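
As a sketch of this setup (weights and points are hypothetical, not from the lecture), a linear classifier with labels in {−1, +1} predicts the sign of w⊤x + w_0:

```python
import numpy as np

def predict(w, w0, X):
    """Linear classifier with labels in {-1, +1}: predict sign(w^T x + w0)."""
    scores = X @ w + w0
    return np.where(scores >= 0, 1, -1)

# Hypothetical hyperplane and two test points.
w, w0 = np.array([1.0, -1.0]), 0.5
X = np.array([[2.0, 0.0], [0.0, 2.0]])
print(predict(w, w0, X))   # [ 1 -1]
```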

  6. A Crap Decision Boundary
  [Figure: two candidate decision boundaries for the same data in the (x_1, x_2) plane; the left panel is labelled “Seems okay”, the right panel “This is crap”.]

  7. Idea: Maximize the Margin
  The margin is the distance between the decision boundary (the hyperplane) and the closest training point.
  [Figure: a separating hyperplane with normal vector w; the margin is the gap between the hyperplane and the nearest o and x training points.]

  8. Computing the Margin
  ◮ The tricky part will be to get an equation for the margin
  ◮ We’ll start by getting the distance from the origin to the hyperplane
  ◮ i.e., we want to compute the scalar b below
  [Figure: the hyperplane w⊤x + w_0 = 0, its normal vector w, and the distance b from the origin to the hyperplane.]

  9. Computing the Distance to Origin
  ◮ Define z as the point on the hyperplane closest to the origin.
  ◮ z must be proportional to w, because w is normal to the hyperplane
  ◮ By the definition of b, the norm of z is ||z|| = b, so z = b w/||w||
  [Figure: the hyperplane w⊤x + w_0 = 0, its normal vector w, and the closest point z at distance b from the origin.]

  10. Computing the Distance to Origin
  ◮ We know that (a) z is on the hyperplane and (b) z = b w/||w||.
  ◮ First, (a) means w⊤z + w_0 = 0
  ◮ Substituting (b), we get
    w⊤(b w/||w||) + w_0 = 0
    b (w⊤w)/||w|| + w_0 = 0
    b = −w_0/||w||
  ◮ Remember ||w|| = √(w⊤w).
  ◮ Now we have the distance from the origin to the hyperplane!
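
A quick numerical check of this derivation (the hyperplane below is hypothetical): the closest point to the origin is z = b w/||w|| with b = −w_0/||w||, it lies on the hyperplane, and its norm equals b.

```python
import numpy as np

# A hypothetical hyperplane w^T x + w0 = 0.
w = np.array([3.0, 4.0])
w0 = -10.0

norm_w = np.linalg.norm(w)   # ||w|| = sqrt(w^T w) = 5
b = -w0 / norm_w             # distance from the origin to the hyperplane: 2.0
z = b * w / norm_w           # closest point on the hyperplane to the origin

print(w @ z + w0)            # ~0.0: z is on the hyperplane
print(np.linalg.norm(z), b)  # 2.0 and 2.0: ||z|| equals b
```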

  11. Computing the Distance to Hyperplane
  [Figure: a point x, the length a of its projection onto w, the distance b from the origin to the hyperplane, and the distance c from x to the hyperplane.]
  ◮ Now we want c, the distance from x to the hyperplane.
  ◮ It’s clear that c = |b − a|, where a is the length of the projection of x onto w.
  ◮ Quiz: What is a?

  12. Computing the Distance to Hyperplane
  [Figure: as on the previous slide, a point x, the length a of its projection onto w, the distance b from the origin to the hyperplane, and the distance c from x to the hyperplane.]
  ◮ Now we want c, the distance from x to the hyperplane.
  ◮ It’s clear that c = |b − a|, where a is the length of the projection of x onto w.
  ◮ Quiz: What is a? Answer: a = w⊤x/||w||
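
A numerical sanity check (the point and hyperplane are made up) that the geometric recipe c = |b − a| agrees with the closed-form distance |w⊤x + w_0|/||w|| given on the next slide:

```python
import numpy as np

w = np.array([3.0, 4.0])     # hypothetical hyperplane w^T x + w0 = 0
w0 = -10.0
x = np.array([4.0, 7.0])     # hypothetical point

norm_w = np.linalg.norm(w)
b = -w0 / norm_w             # distance from the origin to the hyperplane
a = (w @ x) / norm_w         # length of the projection of x onto w
c = abs(b - a)               # distance from x to the hyperplane

print(c)                              # 6.0
print(abs(w @ x + w0) / norm_w)       # 6.0, via the formula on the next slide
```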

  13. Equation for the Margin
  ◮ The perpendicular distance from a point x to the hyperplane w⊤x + w_0 = 0 is
    |w⊤x + w_0| / ||w||
  ◮ The margin is the distance from the closest training point to the hyperplane:
    min_i |w⊤x_i + w_0| / ||w||
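
A small sketch of the margin computation on a toy training set (all values are hypothetical):

```python
import numpy as np

def margin(w, w0, X):
    """Smallest perpendicular distance from any training point to the hyperplane."""
    distances = np.abs(X @ w + w0) / np.linalg.norm(w)
    return distances.min()

# Hypothetical hyperplane and training points.
w, w0 = np.array([1.0, 1.0]), -1.0
X = np.array([[2.0, 2.0], [0.0, 0.0], [3.0, 1.0]])
print(margin(w, w0, X))   # 1/sqrt(2) ~ 0.707, attained by the point (0, 0)
```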

  14. The Scaling
  ◮ Note that (w, w_0) and (cw, cw_0) define the same hyperplane. The scale is arbitrary.
  ◮ This is because we predict class y = 1 if w⊤x + w_0 ≥ 0. For c > 0, that’s the same thing as saying cw⊤x + cw_0 ≥ 0.
  ◮ To remove this freedom, we will put a constraint on (w, w_0):
    min_i |w⊤x_i + w_0| = 1
  ◮ With this constraint, the margin is always 1/||w||.
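
A sketch of fixing the scale (continuing the hypothetical numbers above): dividing (w, w_0) by min_i |w⊤x_i + w_0| leaves the hyperplane unchanged, satisfies the constraint, and makes the margin exactly 1/||w||.

```python
import numpy as np

# Hypothetical training points and a deliberately "badly scaled" (w, w0).
w, w0 = np.array([2.0, 2.0]), -2.0
X = np.array([[2.0, 2.0], [0.0, 0.0], [3.0, 1.0]])

s = np.abs(X @ w + w0).min()   # current value of min_i |w^T x_i + w0| (here 2.0)
w_c, w0_c = w / s, w0 / s      # same hyperplane, rescaled so that the min is 1

print(np.abs(X @ w_c + w0_c).min())                   # 1.0: the constraint holds
print(1 / np.linalg.norm(w_c))                        # margin = 1/||w_c|| ~ 0.707
print(np.abs(X @ w + w0).min() / np.linalg.norm(w))   # same margin, before rescaling
```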

  15. First version of Max Margin Optimization Problem
  ◮ Here is a first version of an optimization problem to maximize the margin (we will simplify it):
    max_w 1/||w||
    subject to w⊤x_i + w_0 ≥ 0 for all i with y_i = 1
               w⊤x_i + w_0 ≤ 0 for all i with y_i = −1
               min_i |w⊤x_i + w_0| = 1
  ◮ The first two constraints are too loose. It’s the same thing to say
    max_w 1/||w||
    subject to w⊤x_i + w_0 ≥ 1 for all i with y_i = 1
               w⊤x_i + w_0 ≤ −1 for all i with y_i = −1
               min_i |w⊤x_i + w_0| = 1
  ◮ Now the third constraint is redundant

  16. First version of Max Margin Optimization Problem
  ◮ That means we can simplify to
    max_w 1/||w||
    subject to w⊤x_i + w_0 ≥ 1 for all i with y_i = 1
               w⊤x_i + w_0 ≤ −1 for all i with y_i = −1
  ◮ Here’s a compact way to write those two constraints:
    max_w 1/||w||
    subject to y_i (w⊤x_i + w_0) ≥ 1 for all i
  ◮ Finally, note that maximizing 1/||w|| is the same thing as minimizing ||w||²

  17. The SVM optimization problem
  ◮ So the SVM weights are determined by solving the optimization problem:
    min_w ||w||²
    s.t. y_i (w⊤x_i + w_0) ≥ +1 for all i
  ◮ Solving this will require maths that we don’t have in this course. But I’ll show the form of the solution next time.
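
The slides defer the maths of solving this problem. As a hedged illustration only, an off-the-shelf solver such as scikit-learn’s SVC with a linear kernel and a very large C approximates the hard-margin problem on separable data (the toy data below are made up):

```python
import numpy as np
from sklearn.svm import SVC

# A tiny, linearly separable toy set with labels in {-1, +1} (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin SVM: min ||w||^2 s.t. y_i (w^T x_i + w0) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, w0 = clf.coef_[0], clf.intercept_[0]

print(y * (X @ w + w0))        # all >= 1 (up to numerical tolerance): constraints satisfied
print(1 / np.linalg.norm(w))   # the resulting margin, 1/||w||
```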

  18. Fin (Part I)
