  1. Support Vector Machine
     Debapriyo Majumdar
     Data Mining – Fall 2014
     Indian Statistical Institute Kolkata
     November 3, 2014

  2. Recall: A Linear Classifier
     § A line (generally a hyperplane) that separates the two classes of points
     § Choose a "good" line: optimize some objective function
     § LDA: an objective function depending on mean and scatter, so it depends on all the points
     § There can be many such lines, and many parameters to optimize
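
A linear classifier simply thresholds an affine function of the input. A minimal sketch (the weights below are made up for illustration, not learned from data):

```python
# Decision rule of a linear classifier: the sign of w . x + b
# (illustrative weights, not learned from data).
import numpy as np

w = np.array([1.0, -2.0])   # normal vector of the separating hyperplane
b = 0.5                     # offset

def classify(x):
    return 1 if w @ x + b >= 0 else -1

print(classify(np.array([2.0, 0.0])))   # 1
print(classify(np.array([0.0, 2.0])))   # -1
```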

  3. Recall: A Linear Classifier
     § What do we really want? Primarily, the least number of misclassifications
     § Consider a separating line: when will we worry about misclassification?
     § Answer: when the test point is near the margin
     § So why consider scatter, mean, etc. (which depend on all the points)? Instead, concentrate on the points near the "border"

  4. Support Vector Machine: Intuition
     § Recall: a projection line w for the points lets us define a separation line L
     § How? [not by mean and scatter]
     § Identify the support vectors, the training data points that act as "support"
     § The separation line L lies between the support vectors
     § Maximize the margin: the distance between the lines L1 and L2 (hyperplanes) defined by the support vectors

  5. Basics
     § A line L: w • x = a, i.e., w1 x1 + w2 x2 = a
     § Distance of L from the origin: d(0, L) = |a| / ||w|| = |a| / sqrt(w1^2 + w2^2)
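
A quick numerical check of the distance formula (the values of w and a below are illustrative choices of mine):

```python
# Distance of the line w . x = a from the origin is |a| / ||w||.
import numpy as np

w = np.array([3.0, 4.0])                 # line: 3*x1 + 4*x2 = 10
a = 10.0
dist = abs(a) / np.linalg.norm(w)        # |a| / sqrt(w1^2 + w2^2)
print(dist)                              # 10 / 5 = 2.0
```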

  6. Support Vector Machine: Formulation
     § Separating line L: w • x + b = 0, with boundary lines L1: w • x + b = -1 and L2: w • x + b = 1
     § Scale w and b so that the boundary lines are defined by exactly these equations
     § Then d(0, L1) = |-1 - b| / ||w|| and d(0, L2) = |1 - b| / ||w||
     § The margin (separation of the two classes): d(L1, L2) = 2 / ||w||
     § Treat the class as another dimension: labels y_i = -1, +1
     § Constraints: w^T x + b ≤ -1 for all x in class 1, and w^T x + b ≥ 1 for all x in class 2; equivalently, y_i (w^T x_i + b) ≥ 1 for all i
     § Objective: minimize ||w||, i.e., maximize the margin, subject to these constraints
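
Putting the pieces of this slide together, the primal problem reads (in the usual convention; minimizing (1/2)||w||^2 is equivalent to minimizing ||w||):

```latex
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1, \qquad i = 1, \dots, n
```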

  7. Lagrangian for Optimization
     § An optimization problem: minimize f(x) subject to g(x) = 0
     § The Lagrangian: L(x, λ) = f(x) - λ g(x); solve ∇L(x, λ) = 0
     § In general (many constraints, indexed by i): L(x, λ) = f(x) + Σ_i λ_i g_i(x)
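
As a small worked example (my own illustration, not from the slides): minimize f(x1, x2) = x1^2 + x2^2 subject to x1 + x2 - 1 = 0. Setting ∇L = 0 gives x1 = x2 = 1/2, which can be checked symbolically:

```python
# Lagrangian method on a toy problem (illustrative example):
# minimize f = x1^2 + x2^2 subject to g = x1 + x2 - 1 = 0.
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lambda")
f = x1**2 + x2**2
g = x1 + x2 - 1

L = f - lam * g                                         # L(x, lambda) = f(x) - lambda*g(x)
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam))
print(sol)                                              # {x1: 1/2, x2: 1/2, lambda: 1}
```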

  8. The SVM Quadratic Optimization
     § The Lagrangian of the SVM optimization:
       L_P = (1/2) ||w||^2 - Σ_i α_i y_i (w • x_i + b) + Σ_i α_i,   with α_i ≥ 0 for all i
     § The dual problem:
       max L_D = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i • x_j)
       where w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0
     § The input vectors appear only in the form of dot products
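
To see how the dual fits together numerically, here is a small sketch (my own illustration, not the lecture's code) that solves the dual with a general-purpose optimizer on a made-up four-point dataset and then recovers w and b. In practice a dedicated QP or SMO solver would be used:

```python
# Solve the SVM dual: max sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
# subject to alpha_i >= 0 and sum_i alpha_i y_i = 0 (toy data, illustrative only).
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])   # toy points
y = np.array([1.0, 1.0, -1.0, -1.0])                             # labels y_i
K = X @ X.T                                                      # dot products x_i . x_j

def neg_dual(alpha):
    u = alpha * y                         # alpha_i y_i
    return 0.5 * u @ K @ u - alpha.sum()  # negative of L_D (we minimize it)

res = minimize(neg_dual, x0=np.zeros(len(y)),
               bounds=[(0.0, None)] * len(y),                       # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y})  # sum_i alpha_i y_i = 0

alpha = res.x
w = (alpha * y) @ X                      # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                        # support vectors: alpha_i > 0
b = np.mean(y[sv] - X[sv] @ w)           # from y_i (w . x_i + b) = 1 at support vectors
print(np.round(alpha, 3), np.round(w, 3), round(float(b), 3))
```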

  9. Case: Not Linearly Separable
     § Data may not be linearly separable
     § Map the data into a higher-dimensional space, e.g., x → (x^2, x)
     § The data can become separable (by a hyperplane) in the higher-dimensional space
     § Kernel trick: possible only for certain mappings φ, when we have a kernel function K such that K(x_i, x_j) = φ(x_i) • φ(x_j), so the dot products can be computed without explicitly mapping the points
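
A concrete instance of the kernel trick (my own example, using a different map than the slide's): for 2-D inputs, the polynomial kernel K(x, z) = (x • z)^2 equals the dot product of the explicit feature map φ(x) = (x1^2, √2 x1 x2, x2^2), so the 3-D dot product is computed without ever forming φ(x):

```python
# The kernel K(x, z) = (x . z)^2 equals phi(x) . phi(z) for
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)  (illustrative example).
import numpy as np

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(x, z):
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(K(x, z), phi(x) @ phi(z))   # both equal 16 (up to floating-point rounding)
```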

  10. Non-linear SVM Kernels
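
Typical non-linear kernels include the polynomial kernel (x_i • x_j + c)^d and the Gaussian (RBF) kernel exp(-γ ||x_i - x_j||^2). An illustrative usage sketch (assuming scikit-learn, which the slides do not name), on an XOR-style dataset that no linear classifier can separate:

```python
# Non-linear SVMs with polynomial and RBF kernels on a toy XOR dataset
# (illustrative sketch, assumes scikit-learn is installed).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])                # XOR labels: not linearly separable

for kernel in ("poly", "rbf"):
    clf = SVC(kernel=kernel, C=10.0, degree=2, coef0=1.0, gamma="scale")
    clf.fit(X, y)
    print(kernel, clf.predict(X))
```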
