Support Vector Machines - Greg Mori, CMPT 419/726 (Bishop PRML Ch. 7)

  1. Support Vector Machines
     Greg Mori - CMPT 419/726
     Bishop PRML Ch. 7

  2. Outline
     • Maximum Margin Criterion
     • Math
     • Maximizing the Margin
     • Non-Separable Data

  3. Linear Classification
     • Consider a two-class classification problem
     • Use a linear model $y(\mathbf{x}) = \mathbf{w}^T \phi(\mathbf{x}) + b$ followed by a threshold function
     • For now, let's assume the training data are linearly separable
     • Recall that the perceptron would converge to a perfect classifier for such data
     • But there are many such perfect classifiers
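A minimal sketch of this linear model in NumPy (illustrative only: the weights, bias, and identity feature map below are assumptions, not values from the slides):

```python
import numpy as np

def predict(X, w, b):
    """Linear model y(x) = w^T phi(x) + b followed by a sign threshold.
    Here phi is taken to be the identity feature map."""
    y = X @ w + b        # signed score for each row of X
    return np.sign(y)    # threshold into +1 / -1 class labels

# Made-up 2-D inputs and parameters
X = np.array([[2.0, 1.0], [-1.0, -3.0]])
w = np.array([1.0, 1.0])
b = -0.5
print(predict(X, w, b))  # [ 1. -1.]
```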

  4. Max Margin
     [Figure: linear decision boundary y = 0 with margin boundaries y = 1 and y = -1]
     • We can define the margin of a classifier as the minimum distance to any example
     • In support vector machines the decision boundary which maximizes the margin is chosen

  5. Marginal Geometry
     [Figure (cf. PRML Ch. 4): regions R1 (y > 0) and R2 (y < 0) separated by the hyperplane y = 0, with x decomposed into x⊥ plus its projection onto w]
     • Recall from Ch. 4: the projection of $\mathbf{x}$ in the direction of $\mathbf{w}$ is $\mathbf{w}^T \mathbf{x} / \|\mathbf{w}\|$
     • $y(\mathbf{x}) = 0$ when $\mathbf{w}^T \mathbf{x} = -b$, i.e. $\mathbf{w}^T \mathbf{x} / \|\mathbf{w}\| = -b / \|\mathbf{w}\|$
     • So the signed distance from $\mathbf{x}$ to the decision boundary is $\mathbf{w}^T \mathbf{x} / \|\mathbf{w}\| - (-b / \|\mathbf{w}\|) = y(\mathbf{x}) / \|\mathbf{w}\|$
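The signed-distance formula is easy to check numerically; a small sketch with an assumed w and b (identity feature map):

```python
import numpy as np

def signed_distance(x, w, b):
    """Signed distance of x from the hyperplane w^T x + b = 0, i.e. y(x)/||w||."""
    return (w @ x + b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])         # ||w|| = 5
b = -5.0
x = np.array([3.0, 4.0])         # y(x) = 25 - 5 = 20
print(signed_distance(x, w, b))  # 4.0: four units on the positive side
```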

  6. Support Vectors
     [Figure: maximum-margin boundary y = 0 with margin boundaries y = -1 and y = 1; the points lying on the margin are the support vectors]
     • Assuming the data are separated by the hyperplane, the distance of $(\mathbf{x}_n, t_n)$ to the decision boundary is $t_n y(\mathbf{x}_n) / \|\mathbf{w}\|$
     • The maximum margin criterion chooses $\mathbf{w}, b$ by:
       $\arg\max_{\mathbf{w},b} \left\{ \frac{1}{\|\mathbf{w}\|} \min_n \left[ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \right] \right\}$
     • Points with this minimum value are known as support vectors
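As a concrete illustration (assuming scikit-learn is available), a linear SVC on made-up separable data exposes the support vectors it finds; a very large C approximates the hard-margin criterion described here:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (invented for illustration)
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.0],
              [-1.0, -1.0], [-2.0, -2.5], [-3.0, -1.0]])
t = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6)  # very large C ~ hard margin
clf.fit(X, t)

print(clf.support_vectors_)        # the points achieving the minimum margin
print(clf.coef_, clf.intercept_)   # w and b of the max-margin boundary
```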

  7. Canonical Representation
     • This optimization problem is complex:
       $\arg\max_{\mathbf{w},b} \left\{ \frac{1}{\|\mathbf{w}\|} \min_n \left[ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \right] \right\}$
     • Note that rescaling $\mathbf{w} \to \kappa \mathbf{w}$ and $b \to \kappa b$ does not change the distance $t_n y(\mathbf{x}_n) / \|\mathbf{w}\|$ (many equivalent answers)
     • So for $\mathbf{x}_*$, the point closest to the surface, we can set $t_* (\mathbf{w}^T \phi(\mathbf{x}_*) + b) = 1$
     • All other points are at least this far away: $\forall n,\ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \ge 1$
     • Under these constraints, the optimization becomes:
       $\arg\max_{\mathbf{w},b} \frac{1}{\|\mathbf{w}\|} = \arg\min_{\mathbf{w},b} \frac{1}{2} \|\mathbf{w}\|^2$
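The rescaling invariance in the second bullet is easy to verify numerically; a quick sketch with arbitrary assumed values:

```python
import numpy as np

def margin(w, b, x, t):
    """t * y(x) / ||w||: the signed distance of a labelled point from the boundary."""
    return t * (w @ x + b) / np.linalg.norm(w)

w, b = np.array([2.0, -1.0]), 0.5
x, t = np.array([1.0, 1.0]), 1

for kappa in (0.1, 1.0, 10.0):
    print(margin(kappa * w, kappa * b, x, t))  # identical for every kappa
```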


  8. Canonical Representation
     • So the optimization problem is now a constrained optimization problem:
       $\arg\min_{\mathbf{w},b} \frac{1}{2} \|\mathbf{w}\|^2 \quad \text{s.t.} \quad \forall n,\ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \ge 1$
     • To solve this, we need to take a detour into Lagrange multipliers (though a generic solver can handle small instances directly, as sketched below)
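This quadratic program is small enough to hand to a generic solver; a minimal sketch using scipy.optimize.minimize with SLSQP (the toy data and identity feature map are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data; phi = identity
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -2.0], [-1.0, -3.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

def objective(p):
    w = p[:2]              # p = (w1, w2, b); b is unconstrained
    return 0.5 * w @ w     # (1/2) ||w||^2

# SLSQP inequality constraints are expressed as fun(p) >= 0
constraints = [{'type': 'ineq',
                'fun': lambda p, n=n: t[n] * (X[n] @ p[:2] + p[2]) - 1.0}
               for n in range(len(X))]

res = minimize(objective, x0=np.zeros(3), method='SLSQP',
               constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)  # the max-margin w and b for this toy set
```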

  9. Outline
     • Maximum Margin Criterion
     • Math
     • Maximizing the Margin
     • Non-Separable Data

  10. Lagrange Multipliers
     • Consider the problem:
       $\max_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g(\mathbf{x}) = 0$
     [Figure: constraint surface g(x) = 0 with ∇g(x) normal to it and ∇f(x) at the constrained stationary point x_A]
     • Points on $g(\mathbf{x}) = 0$ must have $\nabla g(\mathbf{x})$ normal to the surface
     • A stationary point must have no change in $f$ in the direction of the surface, so $\nabla f(\mathbf{x})$ must also be in this same direction
     • So there must be some $\lambda$ such that $\nabla f(\mathbf{x}) + \lambda \nabla g(\mathbf{x}) = 0$
     • Define the Lagrangian: $L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda g(\mathbf{x})$
     • Stationary points of $L(\mathbf{x}, \lambda)$ have $\nabla_{\mathbf{x}} L = \nabla f(\mathbf{x}) + \lambda \nabla g(\mathbf{x}) = 0$ and $\partial L / \partial \lambda = g(\mathbf{x}) = 0$
     • So they are stationary points of the constrained problem!


  11. Lagrange Multipliers - Example
     • Consider the problem:
       $\max_{\mathbf{x}} f(x_1, x_2) = 1 - x_1^2 - x_2^2 \quad \text{s.t.} \quad g(x_1, x_2) = x_1 + x_2 - 1 = 0$
     [Figure: circular level sets of f with the line g(x_1, x_2) = 0 tangent at the solution $(x_1^*, x_2^*)$]
     • Lagrangian: $L(\mathbf{x}, \lambda) = 1 - x_1^2 - x_2^2 + \lambda (x_1 + x_2 - 1)$
     • Stationary points require:
       $\partial L / \partial x_1 = -2 x_1 + \lambda = 0$
       $\partial L / \partial x_2 = -2 x_2 + \lambda = 0$
       $\partial L / \partial \lambda = x_1 + x_2 - 1 = 0$
     • So the stationary point is $(x_1^*, x_2^*) = (\frac{1}{2}, \frac{1}{2})$, $\lambda = 1$
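The same stationary point can be recovered symbolically, assuming SymPy is available:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')
L = 1 - x1**2 - x2**2 + lam * (x1 + x2 - 1)  # the Lagrangian from this slide

# Stationary points: all partial derivatives of L vanish
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam], dict=True)
print(sol)  # [{lam: 1, x1: 1/2, x2: 1/2}]
```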

  12. Lagrange Multipliers - Inequality Constraints
     • Consider the problem:
       $\max_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g(\mathbf{x}) \ge 0$
     [Figure: region g(x) > 0 bounded by g(x) = 0, with one candidate solution in the interior and one on the boundary]
     • This is optimization over a region: solutions are either at stationary points (zero gradient) in the interior or on the boundary
     • $L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda g(\mathbf{x})$
     • Solutions have either:
       • $\nabla f(\mathbf{x}) = 0$ and $\lambda = 0$ (in the region), or
       • $\nabla f(\mathbf{x}) = -\lambda \nabla g(\mathbf{x})$ and $\lambda > 0$ (on the boundary; $>$ because we are maximizing $f$)
     • In both cases, $\lambda g(\mathbf{x}) = 0$
     • So solutions satisfy $g(\mathbf{x}) \ge 0$, $\lambda \ge 0$, $\lambda g(\mathbf{x}) = 0$ (the Karush-Kuhn-Tucker conditions)
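These three conditions can be checked mechanically at a candidate solution. A small sketch with a hypothetical helper, reusing the previous example with its equality constraint relaxed to g(x) ≥ 0:

```python
import numpy as np

def kkt_ok(grad_f, grad_g, g_val, lam, tol=1e-8):
    """Check the conditions for max f(x) s.t. g(x) >= 0 with L = f + lam*g:
    stationarity, primal feasibility, dual feasibility, complementary slackness."""
    stationary = np.allclose(grad_f + lam * grad_g, 0.0, atol=tol)
    return (stationary and g_val >= -tol
            and lam >= -tol and abs(lam * g_val) <= tol)

# Candidate from the example: f = 1 - x1^2 - x2^2, g = x1 + x2 - 1
x = np.array([0.5, 0.5])
grad_f = -2.0 * x              # gradient of f at x
grad_g = np.array([1.0, 1.0])  # gradient of g (constant)
print(kkt_ok(grad_f, grad_g, g_val=x.sum() - 1.0, lam=1.0))  # True
```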

