

SLIDE 1

Support Vector Machine

Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata

November 3, 2014

SLIDE 2

Recall: A Linear Classifier


A line (generally, a hyperplane) that separates the two classes of points. Choose a "good" line:

§ Optimize some objective function
§ LDA: an objective function depending on mean and scatter
§ Depends on all the points

There can be many such lines, and many parameters to optimize.

SLIDE 3

Recall: A Linear Classifier


§ What do we really want?
§ Primarily: the least number of misclassifications
§ Consider a separation line
§ When will we worry about misclassification?
§ Answer: when the test point is near the margin
§ So why consider scatter, mean, etc. (which depend on all the points)? Instead, concentrate on the "border"

SLIDE 4

Support Vector Machine: Intuition


§ Recall: a projection line w for the points lets us define a separation line L
§ How? [not by mean and scatter]
§ Identify the support vectors: the training data points that act as "support"
§ The separation line L lies between the support vectors
§ Maximize the margin: the distance between the lines L1 and L2 (hyperplanes) defined by the support vectors

[Figure: separating line L with projection direction w; the support vectors lie on the margin lines L1 and L2]
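As an illustration (not from the slides), here is a minimal sketch using scikit-learn's SVC to fit a linear SVM on invented toy 2-D data and read off the support vectors and margin:

```python
# Minimal sketch (assumes scikit-learn): fit a linear SVM on toy
# 2-D data and inspect the support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],    # class -1
              [4, 4], [5, 4], [4, 5]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)        # very large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)              # the points lying on L1 and L2
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("margin =", 2 / np.linalg.norm(clf.coef_[0]))
```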

SLIDE 5

Basics

Distance of L from origin:

L : w•x = a, i.e., w1x1 + w2x2 = a

d(0, L) = a / √(w1² + w2²) = a / ‖w‖
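A quick numeric check of this formula (toy numbers, NumPy assumed):

```python
# d(0, L) = a / ||w|| for the line L: 3*x1 + 4*x2 = 10.
import numpy as np

w = np.array([3.0, 4.0])       # ||w|| = 5
a = 10.0
print(a / np.linalg.norm(w))   # 10 / 5 = 2.0
```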

SLIDE 6

Support Vector Machine: Formulation


§ Scale w and b such that the lines are defined by these equations:

L : w•x + b = 0
L2 : w•x + b = 1
L1 : w•x + b = −1

§ Then we have:

d(0, L1) = (−1 − b) / ‖w‖,  d(0, L2) = (1 − b) / ‖w‖

§ The margin (the separation of the two classes):

d(L1, L2) = 2 / ‖w‖

§ As an optimization problem:

min ‖w‖  subject to  wTx + b ≤ −1 ∀x ∈ class 1,  wTx + b ≥ 1 ∀x ∈ class 2

§ Consider the class labels as another dimension, yi = −1, +1:

min ‖w‖  subject to  yi(wTxi + b) ≥ 1 ∀i
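This constrained problem can be written down almost verbatim with a generic convex solver. A sketch assuming cvxpy is available (toy data invented for illustration; minimizing ‖w‖² is equivalent to minimizing ‖w‖):

```python
# Hard-margin primal, assuming cvxpy: min ||w||^2  s.t.  y_i (w.x_i + b) >= 1.
import cvxpy as cp
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = cp.Variable(2)
b = cp.Variable()
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)),
                  [cp.multiply(y, X @ w + b) >= 1])
prob.solve()

print("w =", w.value, "b =", b.value)
print("margin =", 2 / np.linalg.norm(w.value))
```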

SLIDE 7

Lagrangian for Optimization

§ An optimization problem:

minimize f(x) subject to g(x) = 0

§ The Lagrangian:

L(x, λ) = f(x) − λg(x), where ∇L(x, λ) = 0 at the optimum

§ In general (many constraints, with indices i):

L(x, λ) = f(x) + Σi λi gi(x)
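A worked one-constraint example (not from the slides; sympy assumed) showing the recipe: form L, set all partial derivatives to zero, solve:

```python
# Minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0,
# via L = f - lambda*g and grad L = 0 (assumes sympy).
import sympy as sp

x, y, lam = sp.symbols("x y lam")
f = x**2 + y**2
g = x + y - 1
L = f - lam * g

sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam), dict=True)[0]
print(sol)           # {x: 1/2, y: 1/2, lam: 1}
print(f.subs(sol))   # minimum value: 1/2
```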

SLIDE 8

The SVM Quadratic Optimization

§ The Lagrangian of the SVM optimization (the primal problem):

L_P = ½‖w‖² − Σi αi yi (w•xi + b) + Σi αi,  with αi ≥ 0 ∀i

§ The dual problem:

max L_D = Σi αi − ½ Σi Σj αi αj yi yj (xi•xj)

where w = Σi αi yi xi and Σi αi yi = 0

§ The input vectors appear only in the form of dot products
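As a concrete check (scikit-learn assumed; its SVC exposes the products αi·yi for the support vectors as dual_coef_), the relations w = Σi αi yi xi and Σi αi yi = 0 can be verified on a fitted model:

```python
# Verify w = sum_i alpha_i y_i x_i and sum_i alpha_i y_i = 0
# on a fitted linear SVM (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_  # dual_coef_ = alpha_i * y_i
print(np.allclose(w_from_dual, clf.coef_[0]))           # True
print(np.isclose(clf.dual_coef_.sum(), 0.0))            # True: sum_i alpha_i y_i = 0
```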

SLIDE 9

Case: Not Linearly Separable


§ Data may not be linearly separable
§ Map the data into a higher dimensional space, e.g. x → (x², x)
§ The data can become separable (by a hyperplane) in the higher dimensional space
§ Kernel trick: possible efficiently for certain mappings φ, when we have a kernel function K such that

K(xi, xj) = φ(xi)•φ(xj)
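Using the slide's own 1-D map x → (x², x), a tiny sketch (NumPy assumed) showing that K(x, z) = x²z² + xz reproduces φ(x)•φ(z) without ever forming φ explicitly:

```python
# Kernel trick check for phi(x) = (x^2, x): K(x, z) = x^2 z^2 + x z.
import numpy as np

def phi(x):
    return np.array([x**2, x])

def K(x, z):
    return x**2 * z**2 + x * z

for x, z in [(-2.0, 1.0), (3.0, 0.5)]:
    print(K(x, z), phi(x) @ phi(z))   # equal by construction
```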

SLIDE 10

Non-linear SVM Kernels