Lecture 10: Support Vector Machines (Oct 20, 2008)


  1. Lecture 10: Support Vector Machines (Oct 20, 2008)

  2. Linear Separators
  • Which of the linear separators is optimal?
  [Figure: positive (+) and negative (−) points with several candidate separating lines]

  3. Concept of Margin
  • Recall that in the Perceptron lecture we learned that the convergence rate of the Perceptron algorithm depends on a quantity called the margin.

  4. Intuition of Margin
  • Consider points A, B, and C relative to the decision boundary w · x + b = 0 (with w · x + b > 0 on one side and w · x + b < 0 on the other).
  • We are quite confident in our prediction for A because it is far from the decision boundary.
  • In contrast, we are not so confident in our prediction for C, because a slight change in the decision boundary may flip the decision.
  • Given a training set, we would like to make all of our predictions correct and confident! This can be captured by the concept of margin.
  [Figure: points A, B, C at different distances from the line w · x + b = 0]
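
To make this intuition concrete, here is a minimal numeric sketch; the separator (w, b) and the coordinates of A, B, and C are made-up values, not taken from the slide's figure.

```python
import numpy as np

# Hypothetical separator and points; A is far from the line, C is close to it.
w, b = np.array([1.0, 2.0]), -1.0
points = {"A": np.array([4.0, 5.0]),
          "B": np.array([1.0, 1.5]),
          "C": np.array([0.6, 0.25])}

for name, x in points.items():
    score = w @ x + b                         # the sign gives the predicted class
    print(name, np.sign(score), abs(score))   # larger |score| = farther from the boundary
```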

  5. Functional Margin
  • One possible way to define margin: the functional margin of the linear classifier (w, b) w.r.t. training example (x_i, y_i) is y_i (w · x_i + b).
  • The larger the value, the better – really?
  • What if we rescale (w, b) by a factor α and consider the linear classifier specified by (α w, α b)?
    – The decision boundary remains the same
    – Yet the functional margin gets multiplied by α
    – So we can change the functional margin of a linear classifier without changing anything meaningful
    – We need something more meaningful
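
A small sketch of the rescaling argument (the particular w, b, and training example are invented for illustration): scaling (w, b) by α leaves the prediction unchanged but multiplies the functional margin by α.

```python
import numpy as np

# Illustrative values only (not from the slides).
w, b = np.array([2.0, -1.0]), 0.5
x_i, y_i = np.array([1.0, -1.0]), +1

def functional_margin(w, b, x, y):
    return y * (w @ x + b)

for alpha in [1.0, 10.0, 0.1]:
    fm = functional_margin(alpha * w, alpha * b, x_i, y_i)
    pred = np.sign((alpha * w) @ x_i + alpha * b)   # the decision is unchanged
    print(alpha, fm, pred)                          # the functional margin scales linearly with alpha
```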

  6. What we really want
  • We want the distances between the examples and the decision boundary to be large – this quantity is what we call the geometric margin.
  • But how do we compute the geometric margin of a data point w.r.t. a particular line (parameterized by w and b)?
  [Figure: points A, B, C and the line w · x + b = 0]

  7. Some basic facts about lines
  • The distance from a point x_1 to the line w · x + b = 0 is |w · x_1 + b| / ||w||.
  [Figure: a point x_1 and its perpendicular distance to the line w · x + b = 0]
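
A one-function sketch of this distance formula (the line and the point are made-up values):

```python
import numpy as np

def distance_to_line(w, b, x):
    """Distance from point x to the hyperplane w . x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

# Example: the line x1 + x2 - 1 = 0 and the point (1, 1).
w, b = np.array([1.0, 1.0]), -1.0
print(distance_to_line(w, b, np.array([1.0, 1.0])))   # 1/sqrt(2) ≈ 0.707
```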

  8. Geometric Margin
  • The geometric margin γ_i of (w, b) w.r.t. x_i is the distance from x_i to the decision surface.
  • This distance can be computed as γ_i = y_i (w · x_i + b) / ||w||.
  • Given a training set S = {(x_i, y_i): i = 1, …, N}, the geometric margin of the classifier w.r.t. S is γ = min_{i=1,…,N} γ_i.
  • The points closest to the boundary are called the support vectors – in fact, these are the only points that really matter; the other examples can be ignored.
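
A short sketch of computing the geometric margin of a classifier w.r.t. a training set, using the formula above; the toy data and the choice of (w, b) are invented for illustration.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """min_i y_i (w . x_i + b) / ||w|| over the training set (X, y)."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Toy, linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([+1, +1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0
print(geometric_margin(w, b, X, y))   # distance of the closest point to the line
```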

  9. What we have done so far
  • We have established that we want to find a linear decision boundary whose margin is the largest.
  • We know how to measure the margin of a linear decision boundary.
  • Now what?
  • We have a new learning objective: given a linearly separable (this will be relaxed later) training set S = {(x_i, y_i): i = 1, …, N}, we would like to find a linear classifier (w, b) with maximum margin.

  10. Maximum Margin Classifier
  • This can be represented as a constrained optimization problem:
    max_{w,b} γ   subject to: y_i (w · x_i + b) / ||w|| ≥ γ, i = 1, …, N
  • This optimization problem is in a nasty form, so we need to do some rewriting.
  • Letting γ' = γ · ||w||, we can rewrite this as
    max_{w,b} γ' / ||w||   subject to: y_i (w · x_i + b) ≥ γ', i = 1, …, N

  11. Maximum Margin Classifier
  • Note that we can arbitrarily rescale w and b to make the functional margin γ' large or small.
  • So we can rescale them such that γ' = 1:
    max_{w,b} 1 / ||w|| (or equivalently min_{w,b} (1/2)||w||²)   subject to: y_i (w · x_i + b) ≥ 1, i = 1, …, N
  • Maximizing the geometric margin is equivalent to minimizing the magnitude of w subject to maintaining a functional margin of at least 1.
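
For reference, the chain of reformulations from the last two slides can be written out in one place (same notation as above):

```latex
\begin{align*}
&\max_{w,b}\; \gamma
  && \text{s.t. } \frac{y_i(w \cdot x_i + b)}{\|w\|} \ge \gamma,\; i = 1,\dots,N \\
\Longleftrightarrow\;
&\max_{w,b}\; \frac{\gamma'}{\|w\|}
  && \text{s.t. } y_i(w \cdot x_i + b) \ge \gamma',\; i = 1,\dots,N
  \qquad (\gamma' = \gamma\,\|w\|) \\
\Longleftrightarrow\;
&\max_{w,b}\; \frac{1}{\|w\|}
  && \text{s.t. } y_i(w \cdot x_i + b) \ge 1,\; i = 1,\dots,N
  \qquad (\text{rescale so that } \gamma' = 1) \\
\Longleftrightarrow\;
&\min_{w,b}\; \tfrac{1}{2}\|w\|^2
  && \text{s.t. } y_i(w \cdot x_i + b) \ge 1,\; i = 1,\dots,N
\end{align*}
```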

  12. Solving the Optimization Problem
    min_{w,b} (1/2)||w||²   subject to: y_i (w · x_i + b) ≥ 1, i = 1, …, N
  • This results in a quadratic optimization problem with linear inequality constraints.
  • This is a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist.
    – In practice, we can just regard the QP solver as a "black box" without bothering with how it works.
  • You will be spared the excruciating details and jump straight to the solution.
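
As an illustration of treating the solver as a black box, here is a minimal sketch that poses exactly this QP to a generic convex solver (cvxpy). The toy data set is made up, and cvxpy is just one of many possible solvers, not necessarily the one intended in the lecture.

```python
import cvxpy as cp
import numpy as np

# Toy, linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

# min 1/2 ||w||^2  subject to  y_i (w . x_i + b) >= 1 for all i
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)   # maximum-margin separator for this toy set
```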

  13. The solution
  • We cannot give you a closed-form solution that you can directly plug the numbers into and compute for an arbitrary data set.
  • But the solution can always be written in the following form:
    w = Σ_{i=1}^{N} α_i y_i x_i,   subject to Σ_{i=1}^{N} α_i y_i = 0
  • This is the form of w; b can be calculated accordingly using some additional steps.
  • The weight vector is a linear combination of all the training examples.
  • Importantly, many of the α_i's are zeros. The points that have non-zero α_i's are the support vectors.
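
As a sanity check of this form (not part of the lecture), scikit-learn's SVC stores the non-zero α_i y_i values in dual_coef_ and the corresponding x_i in support_vectors_, so w can be reconstructed as their product. The toy data below is invented, and the large C only approximates a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ≈ hard margin

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# all other training points have alpha_i = 0 and do not appear.
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.coef_)     # the two should agree
print(clf.support_)     # indices of the support vectors
```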

  14. A Geometrical Interpretation
  [Figure: Class 1 and Class 2 points; only a few points carry non-zero weights: α_1 = 0.8, α_6 = 1.4, α_8 = 0.6; the rest (α_2, α_3, α_4, α_5, α_7, α_9, α_10) are 0]

  15. A few important notes regarding the geometric interpretation
  • w · x + b = 0 gives the decision boundary
  • positive support vectors lie on the line w · x + b = 1
  • negative support vectors lie on the line w · x + b = −1
  • We can now think of a decision boundary as a tube of a certain width, with no points allowed inside the tube.
    – Learning involves adjusting the location and orientation of the tube to find the largest-fitting tube for the given training set.
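
A brief numerical check of these facts (a sketch with made-up toy data, again approximating a hard margin with a large C): the support vectors land on w · x + b = ±1, and the tube between the two lines has width 2/||w||, i.e. twice the geometric margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_.ravel(), clf.intercept_[0]

# Support vectors sit on the edges of the "tube": w . x + b = +1 or -1.
print(clf.support_vectors_ @ w + b)   # ≈ +1 for positive SVs, -1 for negative SVs
print(2 / np.linalg.norm(w))          # full width of the tube (twice the geometric margin)
```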
