Lecture 10: Support Vector Machines (Oct 20, 2008)
Linear Separators
- Which of the linear separators is optimal?
[Figure: several candidate linear separators drawn through positive (+) and negative (−) training points]
Concept of Margin
- Recall that in the Perceptron lecture, we learned that the convergence rate of the Perceptron algorithm depends on a concept called the margin
Intuition of Margin
- Consider points A, B, and C
- We are quite confident in our prediction for A because it is far from the decision boundary
- In contrast, we are not so confident in our prediction for C because a slight change in the decision boundary may flip the decision
- Given a training set, we would like to make all of our predictions correct and confident! This can be captured by the concept of margin

[Figure: points A, B, and C plotted against the decision boundary w · x + b = 0, with w · x + b > 0 on the positive side and w · x + b < 0 on the negative side]
Functional Margin
- One possible way to define margin: the quantity

    y_i (w · x_i + b)

- We define this as the functional margin of the linear classifier w.r.t. training example (x_i, y_i)
- The larger the value, the better – really?
- What if we rescale (w, b) by a factor α, i.e., consider the linear classifier specified by (αw, αb)?
– The decision boundary remains the same
– Yet, the functional margin gets multiplied by α
– We can change the functional margin of a linear classifier without changing anything meaningful
– We need something more meaningful (a quick numeric check follows)
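A minimal numeric check of the rescaling argument, assuming nothing beyond the definitions above (the example data and the factor α = 10 are made up for illustration):

    import numpy as np

    w, b = np.array([1.0, -2.0]), 0.5          # an arbitrary linear classifier
    x, y = np.array([3.0, 1.0]), 1             # one labeled training example
    alpha = 10.0                               # an arbitrary rescaling factor

    # Functional margin: y * (w . x + b)
    print(y * (w @ x + b))                     # 1.5
    print(y * ((alpha * w) @ x + alpha * b))   # 15.0 -- multiplied by alpha

    # Yet both classifiers make the same prediction on every input, since
    # sign(alpha * (w . x + b)) == sign(w . x + b) for any alpha > 0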
What we really want
[Figure: points A, B, and C at different distances from the decision boundary w · x + b = 0]

- We want the distances between the examples and the decision boundary to be large – this quantity is what we call the geometric margin
- But how do we compute the geometric margin of a data point w.r.t. a particular line (parameterized by w and b)?
Some basic facts about lines
- Consider the line w · x + b = 0. What is the distance from a point x_1 to this line?
- The distance is

    (w · x_1 + b) / ||w||
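A quick check of the distance formula (a minimal sketch; the line and the point are made up for illustration):

    import numpy as np

    w, b = np.array([3.0, 4.0]), -5.0          # the line 3*x1 + 4*x2 - 5 = 0
    x1 = np.array([2.0, 3.0])                  # an arbitrary point

    dist = (w @ x1 + b) / np.linalg.norm(w)    # (6 + 12 - 5) / 5 = 2.6
    print(dist)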
Geometric Margin
- The geometric margin γ_i of (w, b) w.r.t. x_i is the distance from x_i to the decision surface
- This distance can be computed as

    γ_i = y_i (w · x_i + b) / ||w||

[Figure: the geometric margin γ_A of point A, measured perpendicular to the decision boundary; points B and C are also marked]
- Given a training set S = {(x_i, y_i): i = 1, …, N}, the geometric margin of the classifier w.r.t. S is

    γ = min_{i=1,…,N} γ_i
Note that the points closest to the boundary are called the support vectors – in fact, these are the only points that really matter; the other examples are ignorable. (A small sketch of the margin computation follows.)
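Putting the last two formulas together (a minimal sketch; the toy data set and the classifier are made up for illustration):

    import numpy as np

    def geometric_margin(w, b, X, y):
        """Geometric margin of classifier (w, b) w.r.t. the set {(x_i, y_i)}."""
        gammas = y * (X @ w + b) / np.linalg.norm(w)   # per-example gamma_i
        return gammas.min()                            # margin w.r.t. S

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    print(geometric_margin(np.array([1.0, 1.0]), 0.0, X, y))   # ~1.414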
What we have done so far
- We have established that we want to find a linear decision boundary whose margin is the largest
- We know how to measure the margin of a linear
decision boundary
- Now what?
- We have a new learning objective
– Given a linearly separable (will be relaxed later) training set S = {(x_i, y_i): i = 1, …, N}, we would like to find a linear classifier (w, b) with maximum margin
Maximum Margin Classifier
- This can be represented as a constrained optimization problem:

    max_{γ, w, b} γ
    subject to: y_i (w · x_i + b) / ||w|| ≥ γ, i = 1, …, N

- This optimization problem is in a nasty form, so we need to do some rewriting
- Let γ' = γ · ||w||; we can rewrite this as

    max_{γ', w, b} γ' / ||w||
    subject to: y_i (w · x_i + b) ≥ γ', i = 1, …, N
Maximum Margin Classifier
- Note that we can arbitrarily rescale w and b to make the functional margin γ' large or small
- So we can rescale them such that γ' = 1
    max_{w, b} 1 / ||w||   (or equivalently min_{w, b} ||w||²)
    subject to: y_i (w · x_i + b) ≥ 1, i = 1, …, N
Maximizing the geometric margin is equivalent to minimizing the magnitude of w subject to maintaining a functional margin of at least 1
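As a sanity check of this equivalence: once the smallest functional margin is fixed at 1, the geometric margin is exactly 1/||w||, so maximizing it means minimizing ||w||. A minimal sketch (the classifier below was scaled by hand so that its smallest functional margin on the toy set is exactly 1):

    import numpy as np

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w, b = np.array([1/3, 1/3]), -1/3

    func_margins = y * (X @ w + b)
    print(func_margins.min())                      # 1.0
    print(func_margins.min() / np.linalg.norm(w))  # geometric margin ...
    print(1 / np.linalg.norm(w))                   # ... equals 1/||w|| ~ 2.12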
Solving the Optimization Problem
    min_{w, b} (1/2) ||w||²
    subject to: y_i (w · x_i + b) ≥ 1, i = 1, …, N
- This results in a quadratic optimization problem with linear inequality constraints
- This is a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist
– In practice, we can just regard the QP solver as a "black box" without bothering how it works
- You will be spared the excruciating details and jump straight to the solution (one way to hand the problem to an off-the-shelf solver is sketched below)
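A minimal sketch of treating the solver as a black box. This uses scipy's general-purpose SLSQP routine rather than a dedicated QP package; the helper name hard_margin_svm and the toy data are made up for illustration, and it assumes the data are linearly separable:

    import numpy as np
    from scipy.optimize import minimize

    def hard_margin_svm(X, y):
        """Solve min 1/2 ||w||^2 s.t. y_i (w . x_i + b) >= 1 over z = [w, b]."""
        n, d = X.shape
        objective = lambda z: 0.5 * z[:d] @ z[:d]      # 1/2 ||w||^2; b is free
        constraints = [{"type": "ineq",                # "ineq" means fun(z) >= 0
                        "fun": lambda z, i=i: y[i] * (X[i] @ z[:d] + z[d]) - 1.0}
                       for i in range(n)]
        res = minimize(objective, np.zeros(d + 1), method="SLSQP",
                       constraints=constraints)
        return res.x[:d], res.x[d]

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w, b = hard_margin_svm(X, y)
    print("w =", w, ", b =", b)                        # ~ (1/3, 1/3), -1/3
    print("functional margins:", y * (X @ w + b))      # every entry >= 1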
The solution
- We cannot give you a closed-form solution into which you can directly plug the numbers for an arbitrary data set
- But the solution can always be written in the following form:
    w = Σ_{i=1}^{N} α_i y_i x_i,   subject to Σ_{i=1}^{N} α_i y_i = 0

- This is the form of w; b can be calculated accordingly using some additional steps
- The weight vector is a linear combination of all the
training examples
- Importantly, many of the α_i's are zero
- The points that have non-zero α_i's are the support vectors (a sketch of recovering w and b from the α_i's follows)
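A minimal sketch of going from the α_i's back to (w, b). The α values below were worked out by hand for this toy set; b is recovered from any support vector, using the fact that support vectors satisfy y_i (w · x_i + b) = 1:

    import numpy as np

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    alpha = np.array([1/9, 0.0, 1/9, 0.0])    # most alpha_i are zero

    w = (alpha * y) @ X                       # w = sum_i alpha_i y_i x_i
    sv = np.argmax(alpha > 0)                 # index of one support vector
    b = y[sv] - X[sv] @ w                     # from y_sv (w . x_sv + b) = 1, y_sv = +-1
    print("w =", w, ", b =", b)               # (1/3, 1/3), -1/3
    print(y * (X @ w + b))                    # support vectors sit exactly at 1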
A Geometrical Interpretation
[Figure: Class 1 and Class 2 points labeled with their α values – most are zero (α_2 = α_3 = α_4 = α_5 = α_7 = α_9 = α_10 = 0), and only the support vectors are non-zero (α_1 = 0.8, α_6 = 1.4, α_8 = 0.6)]
A few important notes regarding the geometric interpretation
- w · x + b = 0 gives the decision boundary
- Positive support vectors lie on the line w · x + b = 1
- Negative support vectors lie on the line w · x + b = −1
- We can now think of a decision boundary as a tube of a certain width; no points can be inside the tube
– Learning involves adjusting the location and orientation of the tube to find the largest tube that fits the training data