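SLIDE 1 Review of Lecture 13

Validation

$\mathcal{D}$ ($N$) → $\mathcal{D}_{\text{train}}$ ($N - K$) → $g^-$

$\mathcal{D}$ ($N$) → $\mathcal{D}_{\text{val}}$ ($K$) → $E_{\text{val}}(g^-)$

$\mathcal{D}$ ($N$) → $g$

$E_{\text{val}}(g^-)$ estimates $E_{\text{out}}(g)$

Data contamination

[Figure: expected error versus validation set size $K$, with curves $E_{\text{val}}(g^-_{m^*})$ and $E_{\text{out}}(g^-_{m^*})$]

$\mathcal{D}_{\text{val}}$ slightly contaminated

Cross validation

[Figure: $\mathcal{D}$ divided into $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_{10}$, with one part marked "validate" and the others marked "train"]

10-fold cross validation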
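The 10-fold split can be sketched in a few lines. This is a minimal illustration, assuming data points are referred to by index; the helper name `k_fold_indices` and the interleaved assignment of points to folds are choices made here, not something the slide specifies.

```python
def k_fold_indices(n_points, n_folds=10):
    """Yield (train_idx, val_idx) pairs, one pair per fold."""
    indices = list(range(n_points))
    # Fold f takes every n_folds-th index starting at f,
    # so fold sizes differ by at most one point.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        val_idx = folds[f]
        # Train on all the other folds.
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_points=25, n_folds=10))
for train_idx, val_idx in splits[:2]:
    print(len(train_idx), "training points, validate on", val_idx)
```

Each point is used for validation exactly once and for training in the remaining nine runs.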

SLIDE 2 Learning From Data

Yaser S. Abu-Mostafa

California Institute of Technology

Lecture 14: Support Vector Machines

Sponsored by Caltech's Provost Office, E&AS Division, and IST

Thursday, May 17, 2012

SLIDE 3 Outline

Maximizing the margin

The solution

Nonlinear transforms

AML Creator: Yaser Abu-Mostafa - LFD Lecture 14 2/20

SLIDE 4 Better linear separation

Linearly separable data

[Figure: the same data set shown with different separating lines]

Different separating lines

Which is best?

Two questions:

1. Why is bigger margin better?

2. Which $\mathbf{w}$ maximizes the margin?


SLIDE 5 Remember the growth function?

All dichotomies with any line:


SLIDE 6 Dichotomies with fat margin

Fat margins imply fewer dichotomies

[Figure: panels labeled 0.397, 0.5, 0.866, infinity]


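SLIDE 7 Finding $\mathbf{w}$ with large margin

Let $\mathbf{x}_n$ be the nearest data point to the plane $\mathbf{w}^{\mathsf{T}}\mathbf{x} = 0$. How far is it?

2 preliminary technicalities:

1. Normalize $\mathbf{w}$: $|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n| = 1$

2. Pull out $w_0$: $\mathbf{w} = (w_1, \cdots, w_d)$ apart from $b$

The plane is now $\mathbf{w}^{\mathsf{T}}\mathbf{x} + b = 0$ (no $x_0$)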
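The normalization step can be illustrated numerically. The sketch below assumes a plane $(\mathbf{w}, b)$ that does not pass through any data point; the helper names `dot` and `normalize_plane` and the toy points are made up for illustration. Rescaling $\mathbf{w}$ and $b$ by the same positive factor leaves the plane unchanged, so we are free to pick the scale at which the nearest point gives $|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b| = 1$.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def normalize_plane(w, b, points):
    """Rescale (w, b) so that the nearest point has |w.x + b| = 1."""
    scale = min(abs(dot(w, x) + b) for x in points)
    if scale == 0:
        raise ValueError("a point lies on the plane; cannot normalize")
    return [wi / scale for wi in w], b / scale


# Toy data (hypothetical): the plane 3*x1 = 0 before normalization.
points = [(2.0, 0.0), (-2.0, 0.0), (4.0, 1.0)]
w, b = normalize_plane([3.0, 0.0], 0.0, points)
nearest = min(abs(dot(w, x) + b) for x in points)
print(w, b, nearest)
```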


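SLIDE 8 Computing the distance

The distance between $\mathbf{x}_n$ and the plane $\mathbf{w}^{\mathsf{T}}\mathbf{x} + b = 0$ where $|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b| = 1$

The vector $\mathbf{w}$ is $\perp$ to the plane in the $\mathcal{X}$ space:

[Figure: the plane with two points $\mathbf{x}'$ and $\mathbf{x}''$ on it, the vector $\mathbf{w}$, and the data point $\mathbf{x}_n$]

Take $\mathbf{x}'$ and $\mathbf{x}''$ on the plane

$\mathbf{w}^{\mathsf{T}}\mathbf{x}' + b = 0$ and $\mathbf{w}^{\mathsf{T}}\mathbf{x}'' + b = 0$

$\Longrightarrow \mathbf{w}^{\mathsf{T}}(\mathbf{x}' - \mathbf{x}'') = 0$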


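SLIDE 9 and the distance is . . .

Distance between $\mathbf{x}_n$ and the plane:

[Figure: the plane with a point $\mathbf{x}$ on it, the vector $\mathbf{w}$, and the data point $\mathbf{x}_n$]

Take any point $\mathbf{x}$ on the plane

Projection of $\mathbf{x}_n - \mathbf{x}$ on $\mathbf{w}$

$$\hat{\mathbf{w}} = \frac{\mathbf{w}}{\|\mathbf{w}\|} \Longrightarrow \text{distance} = \left|\hat{\mathbf{w}}^{\mathsf{T}}(\mathbf{x}_n - \mathbf{x})\right|$$

$$\text{distance} = \frac{1}{\|\mathbf{w}\|}\left|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n - \mathbf{w}^{\mathsf{T}}\mathbf{x}\right| = \frac{1}{\|\mathbf{w}\|}\left|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b - \mathbf{w}^{\mathsf{T}}\mathbf{x} - b\right| = \frac{1}{\|\mathbf{w}\|}$$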
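As a quick numerical check of this result, the sketch below computes the point-to-plane distance $|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b| / \|\mathbf{w}\|$ directly and compares it with $1/\|\mathbf{w}\|$. The plane and the point are toy values chosen here so that the normalization $|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b| = 1$ already holds; the helper names are illustrative.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def distance_to_plane(w, b, x):
    """Euclidean distance from x to the plane w.x + b = 0."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


# Hypothetical normalized plane: |w.x_n + b| = 1 for the nearest point x_n.
w, b = [0.5, 0.0], 0.0
x_n = (2.0, 0.0)

d = distance_to_plane(w, b, x_n)
margin = 1.0 / math.sqrt(dot(w, w))
print(d, margin)  # the two values agree
```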


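SLIDE 10 The optimization problem

Maximize $\dfrac{1}{\|\mathbf{w}\|}$ subject to $\min\limits_{n=1,2,\ldots,N} |\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b| = 1$

Notice: $|\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b| = y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b)$

Minimize $\dfrac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w}$ subject to $y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) \ge 1$ for $n = 1, 2, \ldots, N$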
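To make the final form concrete, the sketch below checks the constraints $y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) \ge 1$ and evaluates the objective $\frac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w}$ for a few candidate planes. The data set and the helper names (`is_feasible`, `objective`) are hypothetical; this only evaluates candidates and does not solve the problem.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def is_feasible(w, b, points, labels):
    """True if every point satisfies y_n (w.x_n + b) >= 1."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in zip(points, labels))


def objective(w):
    """The quantity being minimized: (1/2) w.w."""
    return 0.5 * dot(w, w)


# Toy linearly separable data (hypothetical).
points = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]
labels = [1, -1, 1]

for w in ([1.0, 0.0], [2.0, 0.0], [0.5, 0.0]):
    print(w, is_feasible(w, 0.0, points, labels), objective(w))
```

Among the feasible candidates, the one with the smaller $\frac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w}$ has the larger margin $1/\|\mathbf{w}\|$; shrinking $\mathbf{w}$ further violates a constraint.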


SLIDE 11 Outline

Maximizing the margin

The solution

Nonlinear transforms


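SLIDE 12 Constrained optimization

Minimize $\dfrac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w}$ subject to $y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) \ge 1$ for $n = 1, 2, \ldots, N$

$\mathbf{w} \in \mathbb{R}^d$, $b \in \mathbb{R}$

Lagrange? inequality constraints $\Longrightarrow$ KKT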


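SLIDE 13 We saw this before

[Figure: contours $E_{\text{in}} = \text{const.}$ around $\mathbf{w}_{\text{lin}}$, the constraint surface $\mathbf{w}^{\mathsf{T}}\mathbf{w} = C$, and at the point $\mathbf{w}$ the gradient $\nabla E_{\text{in}}$ and the normal]

Remember regularization?

Minimize $E_{\text{in}}(\mathbf{w}) = \dfrac{1}{N}(\mathbf{Z}\mathbf{w} - \mathbf{y})^{\mathsf{T}}(\mathbf{Z}\mathbf{w} - \mathbf{y})$ subject to: $\mathbf{w}^{\mathsf{T}}\mathbf{w} \le C$

$\nabla E_{\text{in}}$ normal to constraint

                 optimize                              constrain
Regularization:  $E_{\text{in}}$                       $\mathbf{w}^{\mathsf{T}}\mathbf{w}$
SVM:             $\mathbf{w}^{\mathsf{T}}\mathbf{w}$   $E_{\text{in}}$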


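SLIDE 14 Lagrange formulation

Minimize

$$\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w} - \sum_{n=1}^{N} \alpha_n \left( y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) - 1 \right)$$

w.r.t. $\mathbf{w}$ and $b$ and maximize w.r.t. each $\alpha_n \ge 0$

$$\nabla_{\mathbf{w}} \mathcal{L} = \mathbf{w} - \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n = \mathbf{0}$$

$$\frac{\partial \mathcal{L}}{\partial b} = -\sum_{n=1}^{N} \alpha_n y_n = 0$$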


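SLIDE 15 Substituting . . .

$$\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n \quad \text{and} \quad \sum_{n=1}^{N} \alpha_n y_n = 0$$

in the Lagrangian

$$\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w} - \sum_{n=1}^{N} \alpha_n \left( y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) - 1 \right)$$

we get

$$\mathcal{L}(\boldsymbol{\alpha}) = \sum_{n=1}^{N} \alpha_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m \, \alpha_n \alpha_m \, \mathbf{x}_n^{\mathsf{T}}\mathbf{x}_m$$

Maximize w.r.t. $\boldsymbol{\alpha}$ subject to $\alpha_n \ge 0$ for $n = 1, \cdots, N$ and $\sum_{n=1}^{N} \alpha_n y_n = 0$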
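The substitution can be spelled out term by term. The following is a sketch of the intermediate algebra, using only the two conditions $\mathbf{w} = \sum_n \alpha_n y_n \mathbf{x}_n$ and $\sum_n \alpha_n y_n = 0$:

```latex
\begin{align*}
\tfrac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w}
  &= \tfrac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} y_n y_m\,\alpha_n\alpha_m\,\mathbf{x}_n^{\mathsf{T}}\mathbf{x}_m \\
\sum_{n=1}^{N}\alpha_n y_n\,\mathbf{w}^{\mathsf{T}}\mathbf{x}_n
  &= \sum_{n=1}^{N}\sum_{m=1}^{N} y_n y_m\,\alpha_n\alpha_m\,\mathbf{x}_n^{\mathsf{T}}\mathbf{x}_m \\
b\sum_{n=1}^{N}\alpha_n y_n &= 0 \\
\mathcal{L}(\boldsymbol{\alpha})
  &= \tfrac{1}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w}
     - \sum_{n=1}^{N}\alpha_n y_n\,\mathbf{w}^{\mathsf{T}}\mathbf{x}_n
     - b\sum_{n=1}^{N}\alpha_n y_n
     + \sum_{n=1}^{N}\alpha_n \\
  &= \sum_{n=1}^{N}\alpha_n
     - \tfrac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} y_n y_m\,\alpha_n\alpha_m\,\mathbf{x}_n^{\mathsf{T}}\mathbf{x}_m
\end{align*}
```

The double sum appears once with coefficient $\frac{1}{2}$ and once with coefficient $-1$, which leaves $-\frac{1}{2}$; both $\mathbf{w}$ and $b$ drop out.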


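SLIDE 16 The solution - quadratic programming

$$\min_{\boldsymbol{\alpha}} \; \frac{1}{2}\,\boldsymbol{\alpha}^{\mathsf{T}}
\begin{bmatrix}
y_1 y_1\,\mathbf{x}_1^{\mathsf{T}}\mathbf{x}_1 & y_1 y_2\,\mathbf{x}_1^{\mathsf{T}}\mathbf{x}_2 & \cdots & y_1 y_N\,\mathbf{x}_1^{\mathsf{T}}\mathbf{x}_N \\
y_2 y_1\,\mathbf{x}_2^{\mathsf{T}}\mathbf{x}_1 & y_2 y_2\,\mathbf{x}_2^{\mathsf{T}}\mathbf{x}_2 & \cdots & y_2 y_N\,\mathbf{x}_2^{\mathsf{T}}\mathbf{x}_N \\
\vdots & \vdots & \ddots & \vdots \\
y_N y_1\,\mathbf{x}_N^{\mathsf{T}}\mathbf{x}_1 & y_N y_2\,\mathbf{x}_N^{\mathsf{T}}\mathbf{x}_2 & \cdots & y_N y_N\,\mathbf{x}_N^{\mathsf{T}}\mathbf{x}_N
\end{bmatrix}
\boldsymbol{\alpha} + (-\mathbf{1}^{\mathsf{T}})\,\boldsymbol{\alpha}$$

The matrix holds the quadratic coefficients; $-\mathbf{1}^{\mathsf{T}}$ holds the linear coefficients.

subject to

$\mathbf{y}^{\mathsf{T}}\boldsymbol{\alpha} = 0$ (linear constraint)

$\mathbf{0} \le \boldsymbol{\alpha} \le \boldsymbol{\infty}$ (lower bounds, upper bounds)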
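The ingredients handed to a QP package can be assembled directly from the data. The sketch below builds the matrix of quadratic coefficients and evaluates the objective for one feasible $\boldsymbol{\alpha}$; it does not call any solver, and the data set, the helper names (`qp_matrix`, `dual_objective`) and the chosen $\boldsymbol{\alpha}$ are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def qp_matrix(points, labels):
    """Quadratic coefficients: entry (n, m) is y_n y_m x_n.x_m."""
    return [[yn * ym * dot(xn, xm) for xm, ym in zip(points, labels)]
            for xn, yn in zip(points, labels)]


def dual_objective(alpha, Q):
    """(1/2) alpha^T Q alpha - 1^T alpha, the quantity the QP minimizes."""
    n = len(alpha)
    quad = sum(alpha[i] * Q[i][j] * alpha[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


# Toy data (hypothetical).
points = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]
labels = [1, -1, 1]

Q = qp_matrix(points, labels)
alpha = [0.5, 0.5, 0.0]                      # satisfies alpha >= 0
linear_constraint = dot(labels, alpha)       # y^T alpha, should be 0
value = dual_objective(alpha, Q)
print(Q, linear_constraint, value)
```

Note that the matrix is $N \times N$, so the size of the QP is set by the number of data points rather than by the dimension of $\mathbf{x}$.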


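SLIDE 17 QP hands us $\boldsymbol{\alpha}$

Solution: $\boldsymbol{\alpha} = \alpha_1, \cdots, \alpha_N$

$$\Longrightarrow \mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n$$

KKT condition: For $n = 1, \cdots, N$

$$\alpha_n \left( y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) - 1 \right) = 0$$

We saw this before!

[Figure: the regularization picture again, with contours $E_{\text{in}} = \text{const.}$ around $\mathbf{w}_{\text{lin}}$, the constraint surface $\mathbf{w}^{\mathsf{T}}\mathbf{w} = C$, the gradient $\nabla E_{\text{in}}$ and the normal]

$\alpha_n > 0 \Longrightarrow \mathbf{x}_n$ is a support vector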


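SLIDE 18 Support vectors

[Figure: the separating plane and the data points that lie on the margin]

Closest $\mathbf{x}_n$'s to the plane: achieve the margin

$$\Longrightarrow y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) = 1$$

$$\mathbf{w} = \sum_{\mathbf{x}_n \text{ is SV}} \alpha_n y_n \mathbf{x}_n$$

Solve for $b$ using any SV:

$$y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) = 1$$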
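Recovering $\mathbf{w}$ and $b$ from $\boldsymbol{\alpha}$ can be sketched as follows. The $\boldsymbol{\alpha}$ used here is assumed to be what a QP solver would return for this toy data set; the helper name `recover_w_b` and the tolerance `tol` (used to decide which $\alpha_n$ count as positive in floating point) are illustrative additions. Since $y_n = \pm 1$, the condition $y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) = 1$ gives $b = y_n - \mathbf{w}^{\mathsf{T}}\mathbf{x}_n$.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def recover_w_b(alpha, points, labels, tol=1e-8):
    """Return (w, b, support_vector_indices) from the dual solution alpha."""
    # Support vectors are the points with alpha_n > 0 (up to tolerance).
    sv = [n for n, a in enumerate(alpha) if a > tol]
    dim = len(points[0])
    # w is a weighted sum over support vectors only.
    w = [sum(alpha[n] * labels[n] * points[n][i] for n in sv) for i in range(dim)]
    # Any support vector satisfies y_n (w.x_n + b) = 1, so b = y_n - w.x_n.
    n0 = sv[0]
    b = labels[n0] - dot(w, points[n0])
    return w, b, sv


# Toy data and an assumed dual solution (hypothetical).
points = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]
labels = [1, -1, 1]
alpha = [0.5, 0.5, 0.0]

w, b, sv = recover_w_b(alpha, points, labels)
margins = [y * (dot(w, x) + b) for x, y in zip(points, labels)]
print(w, b, sv, margins)
```

The two support vectors sit exactly at $y_n(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) = 1$, while the third point, with $\alpha_n = 0$, lies strictly beyond the margin.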


SLIDE 19 Outline

Maximizing the margin

The solution

Nonlinear transforms


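SLIDE 20 $\mathbf{z}$ instead of $\mathbf{x}$

$$\mathcal{L}(\boldsymbol{\alpha}) = \sum_{n=1}^{N} \alpha_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m \, \alpha_n \alpha_m \, \mathbf{z}_n^{\mathsf{T}}\mathbf{z}_m$$

[Figure: the data in $\mathcal{X}$ space (axis ticks −1, 1) and in $\mathcal{Z}$ space (axis ticks 0.5, 1)]

$\mathcal{X} \longrightarrow \mathcal{Z}$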
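Only the inner products change when working in $\mathcal{Z}$ space. The sketch below assumes, purely for illustration, the transform $\mathbf{z} = (x_1^2, x_2^2)$; the slide does not state which transform is used, and the data and helper names are made up here.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def transform(x):
    """Assumed nonlinear transform for illustration: (x1, x2) -> (x1^2, x2^2)."""
    return (x[0] ** 2, x[1] ** 2)


def qp_matrix(points, labels):
    """Quadratic coefficients: entry (n, m) is y_n y_m (point_n . point_m)."""
    return [[yn * ym * dot(pn, pm) for pm, ym in zip(points, labels)]
            for pn, yn in zip(points, labels)]


# Toy data in X space (hypothetical).
X = [(1.0, 0.0), (0.0, -1.0), (0.5, 0.5)]
y = [1, 1, -1]

Z = [transform(x) for x in X]
Q_z = qp_matrix(Z, y)   # same N x N shape as it would have in X space
print(Z)
print(Q_z)
```

The QP keeps the same number of variables, one $\alpha_n$ per data point, whatever the dimension of $\mathcal{Z}$.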


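SLIDE 21 Support vectors in $\mathcal{X}$ space

[Figure: the data in $\mathcal{X}$ space with the pre-images of the support vectors]

Support vectors live in $\mathcal{Z}$ space

In $\mathcal{X}$ space, pre-images of support vectors

The margin is maintained in $\mathcal{Z}$ space

Generalization result

$$\mathbb{E}[E_{\text{out}}] \le \frac{\mathbb{E}[\#\text{ of SV's}]}{N - 1}$$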
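The right-hand side is simple arithmetic once a support-vector count is available. The sketch below plugs in a single count as a stand-in for the expectation, which is an assumption: the result is stated for expected values, so a number from one run is only a rough indication. The helper name `sv_bound` and the example numbers are hypothetical.

```python
def sv_bound(num_support_vectors, n_points):
    """Evaluate (# of SV's) / (N - 1) for one observed support-vector count."""
    if n_points < 2:
        raise ValueError("need at least two data points")
    return num_support_vectors / (n_points - 1)


# Hypothetical example: 10 support vectors out of N = 1001 points.
bound = sv_bound(num_support_vectors=10, n_points=1001)
print(bound)
```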
