Support Vector Machines
Greg Mori - CMPT 419/726
Bishop PRML Ch. 7
Outline
- Maximum Margin Criterion
- Math
- Maximizing the Margin
- Non-Separable Data
Linear Classification
- Consider a two-class classification problem
- Use a linear model $y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b$ followed by a threshold function (a minimal sketch appears below)
- For now, let's assume the training data are linearly separable
- Recall that the perceptron would converge to a perfect classifier for such data
- But there are many such perfect classifiers
Max Margin
[Figure: decision boundary y = 0, margin boundaries y = ±1, and the margin]
- We can define the margin of a classifier as the minimum
distance to any example
- In support vector machines the decision boundary which
maximizes the margin is chosen
Marginal Geometry
[Figure (from Ch. 4): geometry of the linear decision boundary y = 0 in 2D, with normal vector w, regions R1 (y > 0) and R2 (y < 0), the projection x⊥ of a point x onto the boundary, the signed distance y(x)/‖w‖, and the offset −w₀/‖w‖ from the origin]
- Recall from Ch. 4
- Projection of $\mathbf{x}$ in the $\mathbf{w}$ direction is $\frac{\mathbf{w}^T\mathbf{x}}{\|\mathbf{w}\|}$
- $y(\mathbf{x}) = 0$ when $\mathbf{w}^T\mathbf{x} = -b$, i.e. $\frac{\mathbf{w}^T\mathbf{x}}{\|\mathbf{w}\|} = \frac{-b}{\|\mathbf{w}\|}$
- So $\frac{\mathbf{w}^T\mathbf{x}}{\|\mathbf{w}\|} - \frac{-b}{\|\mathbf{w}\|} = \frac{y(\mathbf{x})}{\|\mathbf{w}\|}$ is the signed distance to the decision boundary
Support Vectors
[Figure: maximum-margin decision boundary y = 0 with margin boundaries y = ±1]
- Assuming the data are separated by the hyperplane, the distance to the decision boundary is $\frac{t_n y(\mathbf{x}_n)}{\|\mathbf{w}\|}$
- The maximum margin criterion chooses $\mathbf{w}, b$ by:
  $$\arg\max_{\mathbf{w},b}\left\{\frac{1}{\|\mathbf{w}\|}\min_n\left[t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\right]\right\}$$
- Points with this minimum value are known as support vectors (see the sketch below)
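To make the criterion concrete, here is a minimal numpy sketch (not from the original slides; the data and parameters are hypothetical, and $\phi$ is the identity) that computes the signed distances $t_n y(\mathbf{x}_n)/\|\mathbf{w}\|$ and the resulting margin for a candidate $(\mathbf{w}, b)$:

```python
import numpy as np

def margin(X, t, w, b):
    """Margin of the classifier (w, b): min_n t_n * y(x_n) / ||w||, with phi(x) = x."""
    y = X @ w + b                                   # y(x_n) = w^T x_n + b
    signed_dist = t * y / np.linalg.norm(w)
    return signed_dist.min(), np.argmin(signed_dist)  # margin, index of a closest point

# Hypothetical separable data and a candidate classifier
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
m, n_star = margin(X, t, w=np.array([1.0, 1.0]), b=0.0)
print(m, n_star)   # the point attaining the minimum is a support-vector candidate
```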
Canonical Representation
- This optimization problem is complex:
  $$\arg\max_{\mathbf{w},b}\left\{\frac{1}{\|\mathbf{w}\|}\min_n\left[t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\right]\right\}$$
- Note that rescaling $\mathbf{w}\to\kappa\mathbf{w}$ and $b\to\kappa b$ does not change the distance $\frac{t_n y(\mathbf{x}_n)}{\|\mathbf{w}\|}$ (many equivalent answers)
- So for the point $\mathbf{x}_*$ closest to the surface, we can set:
  $$t_*(\mathbf{w}^T\phi(\mathbf{x}_*)+b)=1$$
- All other points are at least this far away:
  $$\forall n,\; t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\ge 1$$
- Under these constraints, the optimization becomes:
  $$\arg\max_{\mathbf{w},b}\frac{1}{\|\mathbf{w}\|}=\arg\min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2$$
Canonical Representation
- So the optimization problem is now a constrained optimization problem:
  $$\arg\min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2 \quad\text{s.t.}\quad \forall n,\; t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\ge 1$$
  (a numerical sketch of this problem follows below)
- To solve this, we need to take a detour into Lagrange multipliers
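Before the Lagrange-multiplier detour, here is a minimal sketch of this hard-margin primal problem solved directly with a generic convex solver; the cvxpy package and the toy data are assumptions, not part of the original slides, and $\phi$ is taken to be the identity:

```python
import numpy as np
import cvxpy as cp

# Hypothetical linearly separable data with targets t_n in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# arg min (1/2)||w||^2  s.t.  t_n (w^T phi(x_n) + b) >= 1 for all n
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(t, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
```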
Lagrange Multipliers
[Figure: constraint surface g(x) = 0, with gradients ∇f(x) and ∇g(x) aligned at a constrained stationary point xA]
Consider the problem:
$$\max_{\mathbf{x}} f(\mathbf{x}) \quad\text{s.t.}\quad g(\mathbf{x})=0$$
- Points on $g(\mathbf{x})=0$ must have $\nabla g(\mathbf{x})$ normal to the surface
- A stationary point must have no change in $f$ in the direction of the surface, so $\nabla f(\mathbf{x})$ must also be in this same direction
- So there must be some $\lambda$ such that $\nabla f(\mathbf{x})+\lambda\nabla g(\mathbf{x})=0$
- Define the Lagrangian:
  $$L(\mathbf{x},\lambda)=f(\mathbf{x})+\lambda g(\mathbf{x})$$
- Stationary points of $L(\mathbf{x},\lambda)$ have
  $$\nabla_{\mathbf{x}}L(\mathbf{x},\lambda)=\nabla f(\mathbf{x})+\lambda\nabla g(\mathbf{x})=0 \quad\text{and}\quad \nabla_\lambda L(\mathbf{x},\lambda)=g(\mathbf{x})=0$$
- So they are stationary points of the constrained problem!
Lagrange Multipliers Example
[Figure: constraint line g(x1, x2) = 0 in the (x1, x2) plane, with the constrained maximum at (x1*, x2*)]
- Consider the problem (a symbolic check follows below)
  $$\max_{\mathbf{x}} f(x_1,x_2)=1-x_1^2-x_2^2 \quad\text{s.t.}\quad g(x_1,x_2)=x_1+x_2-1=0$$
- Lagrangian:
  $$L(\mathbf{x},\lambda)=1-x_1^2-x_2^2+\lambda(x_1+x_2-1)$$
- Stationary points require:
  $$\partial L/\partial x_1=-2x_1+\lambda=0, \quad \partial L/\partial x_2=-2x_2+\lambda=0, \quad \partial L/\partial\lambda=x_1+x_2-1=0$$
- So the stationary point is $(x_1^*,x_2^*)=(\tfrac{1}{2},\tfrac{1}{2})$, $\lambda=1$
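A quick symbolic check of this worked example; a minimal sketch assuming the sympy package, not part of the original slides:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda')
f = 1 - x1**2 - x2**2
g = x1 + x2 - 1
L = f + lam * g   # Lagrangian L(x, lambda) = f(x) + lambda * g(x)

# Stationary points: set all partial derivatives of L to zero
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
print(sol)   # -> x1 = 1/2, x2 = 1/2, lambda = 1
```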
Lagrange Multipliers - Inequality Constraints
[Figure: region g(x) > 0 bounded by the surface g(x) = 0, with an interior stationary point xA and a boundary solution xB]
Consider the problem:
$$\max_{\mathbf{x}} f(\mathbf{x}) \quad\text{s.t.}\quad g(\mathbf{x})\ge 0$$
- Optimization over a region – solutions are either at stationary points (zero gradient) inside the region or on the boundary; Lagrangian $L(\mathbf{x},\lambda)=f(\mathbf{x})+\lambda g(\mathbf{x})$
- Solutions have either:
  - $\nabla f(\mathbf{x})=0$ and $\lambda=0$ (in the region), or
  - $\nabla f(\mathbf{x})=-\lambda\nabla g(\mathbf{x})$ and $\lambda>0$ (on the boundary, $>$ for maximizing $f$)
- For both, $\lambda g(\mathbf{x})=0$
- Solutions have $g(\mathbf{x})\ge 0$, $\lambda\ge 0$, $\lambda g(\mathbf{x})=0$
Lagrange Multipliers - Inequality Constraints
- Exactly how does the Lagrangian $L(\mathbf{x},\lambda)=f(\mathbf{x})+\lambda g(\mathbf{x})$ relate to the problem $\max_{\mathbf{x}} f(\mathbf{x})$ s.t. $g(\mathbf{x})\ge 0$ in this case?
- It turns out that the solution to the optimization problem is:
  $$\max_{\mathbf{x}}\min_{\lambda\ge 0}L(\mathbf{x},\lambda)$$
Max-min
- Lagrangian:
  $$L(\mathbf{x},\lambda)=f(\mathbf{x})+\lambda g(\mathbf{x})$$
- Consider the following:
  $$\min_{\lambda\ge 0}L(\mathbf{x},\lambda)$$
- If the constraint $g(\mathbf{x})\ge 0$ is not satisfied, then $g(\mathbf{x})<0$
- Hence $\lambda$ can be made arbitrarily large, and $\min_{\lambda\ge 0}L(\mathbf{x},\lambda)=-\infty$
- Otherwise, $\min_{\lambda\ge 0}L(\mathbf{x},\lambda)=f(\mathbf{x})$ (with $\lambda=0$)
- Hence,
  $$\min_{\lambda\ge 0}L(\mathbf{x},\lambda)=\begin{cases}-\infty & \text{constraint not satisfied}\\ f(\mathbf{x}) & \text{otherwise}\end{cases}$$
Min-max (Dual form)
- So the solution to the optimization problem is:
  $$L_P=\max_{\mathbf{x}}\min_{\lambda\ge 0}L(\mathbf{x},\lambda)$$
  which is called the primal problem
- The dual problem is obtained by switching the order of the max and min:
  $$L_D=\min_{\lambda\ge 0}\max_{\mathbf{x}}L(\mathbf{x},\lambda)$$
- These are not the same, but the dual is always a bound for the primal (in the SVM case with minimization, $L_D\le L_P$)
- Slater's theorem gives conditions for the two problems to be equivalent, with $L_D=L_P$
- Slater's theorem applies to the SVM optimization problem, and solving the dual leads to kernelization and can be easier than solving the primal
Now Where Were We
- So the optimization problem is now a constrained optimization problem:
  $$\arg\min_{\mathbf{w},b}\frac{1}{2}\|\mathbf{w}\|^2 \quad\text{s.t.}\quad \forall n,\; t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\ge 1$$
- For this problem, the Lagrangian (with $N$ multipliers $a_n$) is:
  $$L(\mathbf{w},b,\mathbf{a})=\frac{1}{2}\|\mathbf{w}\|^2-\sum_{n=1}^N a_n\left\{t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)-1\right\}$$
- We can find the derivatives of $L$ with respect to $\mathbf{w}$, $b$ and set them to 0:
  $$\mathbf{w}=\sum_{n=1}^N a_n t_n\phi(\mathbf{x}_n), \qquad 0=\sum_{n=1}^N a_n t_n$$
Dual Formulation
- Plugging those equations into $L$ to remove $\mathbf{w}$ and $b$ results in a version of $L$ where $\nabla_{\mathbf{w},b}L=0$:
  $$\tilde{L}(\mathbf{a})=\sum_{n=1}^N a_n-\frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_n a_m t_n t_m\phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)$$
  This new $\tilde{L}$ is the dual representation of the problem (maximize with constraints); a small sketch follows below
- Note that it is kernelized
- It is quadratic, convex in $\mathbf{a}$
- Bounded above since $K$ is positive semi-definite
- The optimal $\mathbf{a}$ can be found
- With large datasets, descent strategies are employed
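As a small illustration of the dual objective above, here is a minimal numpy sketch (names and toy data are hypothetical, not from the original slides) that evaluates $\tilde{L}(\mathbf{a})$ from a kernel (Gram) matrix:

```python
import numpy as np

def dual_objective(a, t, K):
    """L~(a) = sum_n a_n - 1/2 * sum_{n,m} a_n a_m t_n t_m K[n, m]."""
    at = a * t
    return a.sum() - 0.5 * at @ K @ at

# Toy data with a linear kernel K[n, m] = phi(x_n)^T phi(x_m), phi = identity
X = np.array([[2.0, 2.0], [-1.0, -1.0]])
t = np.array([1.0, -1.0])
K = X @ X.T
a = np.array([0.1, 0.1])   # feasible: a_n >= 0 and sum_n a_n t_n = 0
print(dual_objective(a, t, K))
```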
From a to a Classifier
- We found $\mathbf{a}$ by optimizing something else
- It is related to the classifier by
  $$\mathbf{w}=\sum_{n=1}^N a_n t_n\phi(\mathbf{x}_n) \quad\Rightarrow\quad y(\mathbf{x})=\mathbf{w}^T\phi(\mathbf{x})+b=\sum_{n=1}^N a_n t_n k(\mathbf{x},\mathbf{x}_n)+b$$
- Recall the $a_n\{t_n y(\mathbf{x}_n)-1\}=0$ condition from the Lagrange analysis
- Either $a_n=0$ or $\mathbf{x}_n$ is a support vector
- $\mathbf{a}$ will be sparse – many zeros
- Don't need to store the $\mathbf{x}_n$ for which $a_n=0$ (see the prediction sketch below)
- There is another formula for finding $b$
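A minimal prediction sketch using only the support vectors; the Gaussian kernel choice, function names, and arguments are illustrative assumptions rather than the slides' own code:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svm_predict(x, sv_X, sv_a, sv_t, b, kernel=rbf_kernel):
    """y(x) = sum_n a_n t_n k(x, x_n) + b, summing only over support vectors (a_n > 0)."""
    y = sum(a_n * t_n * kernel(x, x_n)
            for a_n, t_n, x_n in zip(sv_a, sv_t, sv_X)) + b
    return np.sign(y)   # threshold the signed score to get the class label
```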
Examples
- SVM trained using Gaussian kernel
- Support vectors circled
- Note non-linear decision boundary in x space
Examples
- From Burges, A Tutorial on Support Vector Machines for Pattern Recognition (1998)
- SVM trained using a cubic polynomial kernel (a small kernel sketch follows below)
  $$k(\mathbf{x}_1,\mathbf{x}_2)=(\mathbf{x}_1^T\mathbf{x}_2+1)^3$$
- Left is linearly separable
- Note the decision boundary is almost linear, even using the cubic polynomial kernel
- Right is not linearly separable
- But it is separable using the polynomial kernel
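A one-function sketch of this kernel and its Gram matrix, with hypothetical data (not from the slides):

```python
import numpy as np

def cubic_poly_kernel(x1, x2):
    """Cubic polynomial kernel k(x1, x2) = (x1^T x2 + 1)^3."""
    return (np.dot(x1, x2) + 1.0) ** 3

# Gram matrix K[n, m] = k(x_n, x_m) for a small hypothetical dataset
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
K = np.array([[cubic_poly_kernel(xn, xm) for xm in X] for xn in X])
print(K)
```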
Non-Separable Data
[Figure: margin with slack variables – points with ξ = 0 lie on or beyond the margin, points with 0 < ξ < 1 lie inside the margin, and points with ξ > 1 are misclassified]
- For most problems, data will not be linearly separable (even in the feature space φ)
- Can relax the constraints from $t_n y(\mathbf{x}_n)\ge 1$ to $t_n y(\mathbf{x}_n)\ge 1-\xi_n$
- The $\xi_n\ge 0$ are called slack variables
- $\xi_n=0$: satisfies the original constraint, so $\mathbf{x}_n$ is on the margin or the correct side of the margin
- $0<\xi_n<1$: inside the margin, but still correctly classified
- $\xi_n>1$: misclassified
Loss Function For Non-separable Data
- Non-zero slack variables are bad; penalize them while maximizing the margin:
  $$\min\; C\sum_{n=1}^N\xi_n+\frac{1}{2}\|\mathbf{w}\|^2$$
- The constant $C>0$ controls the trade-off between a large margin and incorrectly placed points (non-zero slack)
- Set $C$ using cross-validation (a usage sketch follows below)
- The optimization is the same quadratic, with different constraints; still convex
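A usage sketch for this soft-margin trade-off, assuming scikit-learn (its C parameter plays exactly this penalty role); the synthetic data and parameter grid are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Hypothetical, non-separable two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])

# Soft-margin SVM with a Gaussian kernel; choose C by cross-validation
search = GridSearchCV(SVC(kernel='rbf'), {'C': [0.1, 1.0, 10.0, 100.0]}, cv=5)
search.fit(X, t)
print(search.best_params_)
```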
SVM Loss Function
- The SVM for the separable case solved the problem:
  $$\arg\min_{\mathbf{w}}\frac{1}{2}\|\mathbf{w}\|^2 \quad\text{s.t.}\quad \forall n,\; t_n y_n\ge 1$$
- Can write this as:
  $$\arg\min_{\mathbf{w}}\sum_{n=1}^N E_\infty(t_n y_n-1)+\lambda\|\mathbf{w}\|^2$$
  where $E_\infty(z)=0$ if $z\ge 0$, $\infty$ otherwise
- The non-separable case relaxes this to (see the sketch below):
  $$\arg\min_{\mathbf{w}}\sum_{n=1}^N E_{SV}(t_n y_n-1)+\lambda\|\mathbf{w}\|^2$$
  where $E_{SV}(t_n y_n-1)=[1-t_n y_n]_+$ is the hinge loss
- $[u]_+=u$ if $u\ge 0$, 0 otherwise
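A minimal numpy sketch of the hinge loss and the regularized objective above (function names are hypothetical):

```python
import numpy as np

def hinge_loss(y, t):
    """E_SV(t*y - 1) = [1 - t*y]_+ for scores y and targets t in {-1, +1}."""
    return np.maximum(0.0, 1.0 - t * y)

def svm_objective(y, t, w, lam):
    """Regularized hinge-loss objective: sum_n [1 - t_n y_n]_+ + lambda * ||w||^2."""
    return hinge_loss(y, t).sum() + lam * np.dot(w, w)
```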
Loss Functions
[Figure: loss functions E(z) plotted against z = t·y]
- Linear classifiers: compare the loss functions used for learning
- Black is the misclassification error
- Simple linear classifier, squared error: $(y_n-t_n)^2$
- Logistic regression, cross-entropy error: $-t_n\ln y_n$
- SVM, hinge loss: $\xi_n=[1-t_n y_n]_+$
Conclusion
- Readings: Ch. 7 up to and including Ch. 7.1.2
- Maximum margin criterion for deciding on decision
boundary
- Linearly separable data
- Relax with slack variables for non-separable case
- Global optimization is possible in both cases
- Convex problem (no local optima)
- Descent methods converge to global optimum
- Kernelized