SUPPORT VECTOR MACHINES, Matthieu R Bloch

SLIDE 1

Matthieu R Bloch, Tuesday, February 25, 2020

SUPPORT VECTOR MACHINES

SLIDE 2

LOGISTICS

TAs and office hours:
- Tuesday: Dr. Bloch (College of Architecture Cafe), 11:00am-11:55am
- Tuesday: TJ (VL C449 Cubicle D), 1:30pm-2:45pm
- Thursday: Hossein (VL C449 Cubicle B), 10:45am-12:00pm
- Friday: Brighton (TSRB 523a), 12:00pm-1:15pm

Projects:
- Thanks for forming teams
- Start working on your proposals!
- Discussion: proposal deadline extension

Midterm on March 5th:
- Sample midterm posted (do not share)
- Open notes

SLIDE 3

RECAP: KARUSH-KUHN-TUCKER CONDITIONS

Assume $f$, $\{g_i\}$, $\{h_j\}$ are all differentiable. Consider $(x, \lambda, \mu)$.

Stationarity:
$$0 = \nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) + \sum_{j=1}^{p} \mu_j \nabla h_j(x)$$

Primal feasibility:
$$\forall i \in [1;m]\quad g_i(x) \le 0, \qquad \forall j \in [1;p]\quad h_j(x) = 0$$

Dual feasibility:
$$\forall i \in [1;m]\quad \lambda_i \ge 0$$

Complementary slackness:
$$\forall i \in [1;m]\quad \lambda_i g_i(x) = 0$$
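To make the four conditions concrete, here is a minimal NumPy sketch that measures how far a candidate $(x, \lambda, \mu)$ is from satisfying each one; every returned value is zero exactly when the corresponding condition holds. The function name `kkt_violations` and the toy problem (minimize $x_1^2 + x_2^2$ subject to $g(x) = 1 - x_1 - x_2 \le 0$) are hypothetical, chosen only for illustration. At $x = (0.5, 0.5)$ with $\lambda = 1$, stationarity reads $(1, 1) + 1 \cdot (-1, -1) = 0$ and the constraint is active, so all conditions hold.

```python
import numpy as np


def kkt_violations(grad_f, grad_g, g, lam, grad_h=None, h=None, mu=None):
    """Measure how far a candidate (x, lam, mu) is from satisfying the KKT conditions.

    grad_f : (n,) gradient of f at x
    grad_g : (m, n) rows are the gradients of g_i at x; g : (m,) values g_i(x)
    grad_h : (p, n) rows are the gradients of h_j at x; h : (p,) values h_j(x)
    lam, mu : (m,) and (p,) multipliers
    Each returned value is 0 exactly when the corresponding condition holds.
    """
    grad_f = np.asarray(grad_f, dtype=float)
    grad_g = np.atleast_2d(np.asarray(grad_g, dtype=float))
    g = np.asarray(g, dtype=float)
    lam = np.asarray(lam, dtype=float)

    # Stationarity: grad f + sum_i lam_i grad g_i + sum_j mu_j grad h_j = 0
    stationarity = grad_f + grad_g.T @ lam
    eq_violation = 0.0
    if grad_h is not None:
        grad_h = np.atleast_2d(np.asarray(grad_h, dtype=float))
        h = np.asarray(h, dtype=float)
        mu = np.asarray(mu, dtype=float)
        stationarity = stationarity + grad_h.T @ mu
        eq_violation = np.max(np.abs(h))         # primal feasibility: h_j(x) = 0

    return {
        "stationarity": float(np.linalg.norm(stationarity)),
        "primal_ineq": float(np.max(np.maximum(g, 0.0))),   # g_i(x) <= 0
        "primal_eq": float(eq_violation),
        "dual": float(np.max(np.maximum(-lam, 0.0))),       # lam_i >= 0
        "comp_slack": float(np.max(np.abs(lam * g))),       # lam_i g_i(x) = 0
    }


# Hypothetical toy problem: minimize x1^2 + x2^2 subject to g(x) = 1 - x1 - x2 <= 0
x = np.array([0.5, 0.5])
res = kkt_violations(grad_f=2 * x, grad_g=[[-1.0, -1.0]], g=[1 - x.sum()], lam=[1.0])
print(res)  # every entry is 0.0
```

Trying `lam=[0.5]` instead leaves a nonzero stationarity residual, which is how the conditions pin down the multiplier.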

SLIDE 4

KKT CONDITIONS: NECESSITY AND SUFFICIENCY

Theorem (KKT necessity): If $x^*$ and $(\lambda^*, \mu^*)$ are primal and dual solutions with zero duality gap, then $x^*$ and $(\lambda^*, \mu^*)$ satisfy the KKT conditions.

Theorem (KKT sufficiency): If the original problem is convex and $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\mu})$ satisfy the KKT conditions, then $\tilde{x}$ is primal optimal, $(\tilde{\lambda}, \tilde{\mu})$ is dual optimal, and the duality gap is zero.

If a constrained optimization problem is differentiable and convex, and strong duality holds (e.g., under Slater's condition):
- the KKT conditions are necessary and sufficient for primal/dual optimality (with zero duality gap);
- we can use the KKT conditions to find a solution to our optimization problem.

We're in luck: the optimal soft-margin hyperplane falls in this category!
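As a numerical sanity check of zero duality gap, the sketch below evaluates the Lagrange dual function of a hypothetical convex toy problem, minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \le 0$, on a grid of $\lambda \ge 0$. Minimizing the Lagrangian over $x$ in closed form gives $x = (\lambda/2, \lambda/2)$ and $L_D(\lambda) = \lambda - \lambda^2/2$, so the dual maximum $1/2$ at $\lambda = 1$ should match the primal optimum $f(0.5, 0.5) = 1/2$. The grid search is purely illustrative.

```python
import numpy as np

# Hypothetical toy problem: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0.


def lagrangian(x, lam):
    return x @ x + lam * (1.0 - x.sum())


def dual_function(lam):
    # L is a convex quadratic in x; setting its gradient 2x - lam * (1, 1) to zero
    # gives the minimizer x = (lam/2, lam/2).
    x_min = np.full(2, lam / 2.0)
    return lagrangian(x_min, lam)


lams = np.linspace(0.0, 3.0, 3001)            # grid over the dual feasible set lam >= 0
d_vals = np.array([dual_function(l) for l in lams])
d_star = d_vals.max()                         # best dual value (lower bound on p*)
lam_star = lams[d_vals.argmax()]
x_star = np.array([0.5, 0.5])                 # primal optimum, from the KKT conditions
p_star = x_star @ x_star
print(lam_star, d_star, p_star)
```

Every grid value of $L_D$ stays below $p^* = 1/2$ (weak duality), and the best one reaches it, which is what the sufficiency theorem predicts for a convex problem.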

SLIDE 5

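OPTIMAL SOFT-MARGIN HYPERPLANE REVISITED

The optimal soft-margin hyperplane is the solution of the following optimization problem:
$$\operatorname*{argmin}_{w,b,\xi}\ \frac{1}{2}\|w\|_2^2 + \frac{C}{N}\sum_{i=1}^{N}\xi_i \quad\text{s.t.}\quad \forall i \in [1;N]\quad y_i(w^\intercal x_i + b) \ge 1 - \xi_i \ \text{and}\ \xi_i \ge 0$$

- The problem is differentiable and convex (and its constraints are affine and always feasible, so strong duality holds): the KKT conditions are necessary and sufficient, and the duality gap is zero.
- We will kernelize the dual problem.

The Lagrangian is
$$L(w, b, \xi, \lambda, \mu) \triangleq \frac{1}{2} w^\intercal w + \frac{C}{N}\sum_{i=1}^{N}\xi_i + \sum_{i=1}^{N}\lambda_i\big(1 - \xi_i - y_i(w^\intercal x_i + b)\big) - \sum_{i=1}^{N}\mu_i\xi_i$$
with $\lambda \ge 0$, $\mu \ge 0$. The Lagrange dual function is
$$L_D(\lambda, \mu) = \min_{w,b,\xi} L(w, b, \xi, \lambda, \mu)$$
The dual problem is
$$\max_{\lambda \ge 0,\, \mu \ge 0} L_D(\lambda, \mu)$$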
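A minimal NumPy sketch of the primal objective and the Lagrangian above; the helper names and the two-point dataset are hypothetical. It relies on one observation: for fixed $(w, b)$, the smallest slacks satisfying both constraints are $\xi_i = \max(0, 1 - y_i(w^\intercal x_i + b))$, so the primal objective can be evaluated as a hinge-loss penalty.

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """Primal objective 1/2 ||w||^2 + (C/N) sum_i xi_i with the smallest feasible slacks.

    For fixed (w, b), xi_i >= 1 - y_i (w^T x_i + b) and xi_i >= 0 are tightest at
    xi_i = max(0, 1 - y_i (w^T x_i + b)), i.e. the hinge loss.
    """
    N = X.shape[0]
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + (C / N) * xi.sum(), xi


def lagrangian(w, b, xi, lam, mu, X, y, C):
    """L(w, b, xi, lam, mu) term by term as on the slide; lam, mu >= 0 elementwise."""
    N = X.shape[0]
    margins = y * (X @ w + b)
    return (0.5 * w @ w + (C / N) * xi.sum()
            + lam @ (1.0 - xi - margins) - mu @ xi)


# Hypothetical two-point dataset: x_1 = (1, 0) with y_1 = +1, x_2 = (-1, 0) with y_2 = -1.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
obj, xi = soft_margin_objective(np.array([1.0, 0.0]), 0.0, X, y, C=1.0)
print(obj, xi)  # 0.5 [0. 0.]
```

With $\lambda = (0.5, 0.5)$ and $\mu = 0$ (so $\lambda_i + \mu_i = C/N$), `lagrangian` returns the same $0.5$ at this point, since every constraint is active and complementary slackness holds.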

SLIDE 8

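OPTIMAL SOFT-MARGIN HYPERPLANE: KERNELIZATION

Let's simplify $L_D(\lambda, \mu)$ using the KKT conditions. Stationarity of $L$ with respect to $w$, $b$, and $\xi_i$ gives $w = \sum_{i=1}^{N}\lambda_i y_i x_i$, $\sum_{i=1}^{N}\lambda_i y_i = 0$, and $\lambda_i + \mu_i = \frac{C}{N}$.

Lemma (Simplification of dual function): The dual function is
$$L_D(\lambda, \mu) = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j y_i y_j x_i^\intercal x_j + \sum_{i=1}^{N}\lambda_i$$
(for $\sum_{i}\lambda_i y_i = 0$ and $\lambda_i + \mu_i = \frac{C}{N}$; otherwise $L_D(\lambda, \mu) = -\infty$).

Lemma (Simplification of dual problem): The dual optimization problem is
$$\max_{\lambda}\ -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j y_i y_j x_i^\intercal x_j + \sum_{i=1}^{N}\lambda_i \quad\text{s.t.}\quad \begin{cases}\sum_{i=1}^{N}\lambda_i y_i = 0\\ \forall i \in [1;N]\quad 0 \le \lambda_i \le \frac{C}{N}\end{cases}$$

The data enter only through the inner products $x_i^\intercal x_j$, so replacing them with a kernel $k(x_i, x_j)$ kernelizes the problem. Since this is a quadratic program in $\lambda$, we can solve for $\lambda^*$ very efficiently.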
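The dual above is a concave quadratic program over a box intersected with one hyperplane. The slides do not specify a solver, and dedicated methods (e.g., SMO) are what one would use in practice; the sketch below instead swaps in plain projected-gradient ascent for simplicity. The projection onto $\{0 \le \lambda_i \le C/N,\ \sum_i \lambda_i y_i = 0\}$ is computed by bisection on a scalar shift, assuming both classes are present. Because the data enter only through the Gram matrix `K`, passing a kernel matrix instead of `X @ X.T` gives the kernelized version. The function names, iteration counts, and the toy dataset are hypothetical.

```python
import numpy as np


def project(v, y, upper, n_bisect=100):
    """Euclidean projection of v onto {lam : 0 <= lam_i <= upper, sum_i lam_i y_i = 0}.

    The projection has the form clip(v - tau * y, 0, upper). Because every y_i is
    +1 or -1, h(tau) = sum_i y_i clip(v_i - tau y_i, 0, upper) is non-increasing
    in tau, so tau can be found by bisection. Assumes both classes are present.
    """
    def h(tau):
        return y @ np.clip(v - tau * y, 0.0, upper)

    lo = -(np.abs(v).max() + upper)    # here h(lo) = upper * (number of +1 labels) > 0
    hi = np.abs(v).max() + upper       # here h(hi) = -upper * (number of -1 labels) < 0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return np.clip(v - 0.5 * (lo + hi) * y, 0.0, upper)


def solve_dual(K, y, C, n_iter=5000, tol=1e-12):
    """Projected-gradient ascent on the soft-margin SVM dual.

    Maximizes sum(lam) - 1/2 lam^T Q lam with Q_ij = y_i y_j K_ij, subject to
    sum_i lam_i y_i = 0 and 0 <= lam_i <= C/N. K is the N x N Gram matrix:
    K = X @ X.T gives the linear SVM; a kernel matrix K_ij = k(x_i, x_j) kernelizes it.
    """
    N = len(y)
    upper = C / N
    Q = np.outer(y, y) * K
    step = 1.0 / np.linalg.eigvalsh(Q).max()    # 1 / Lipschitz constant of the gradient
    lam = np.zeros(N)
    for _ in range(n_iter):
        grad = 1.0 - Q @ lam                     # gradient of the dual objective
        new = project(lam + step * grad, y, upper)
        if np.max(np.abs(new - lam)) < tol:      # stop once the iterates stop moving
            return new
        lam = new
    return lam


# Hypothetical two-point dataset: x_1 = (1, 0), y_1 = +1 and x_2 = (-1, 0), y_2 = -1.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
lam = solve_dual(X @ X.T, y, C=10.0)
w = (lam * y) @ X                                # w* = sum_i lam_i y_i x_i
print(lam, w)  # approximately lam = (0.5, 0.5) and w = (1, 0)
```

For this example, the constraint $\sum_i \lambda_i y_i = 0$ forces $\lambda_1 = \lambda_2 = \lambda$, and the dual objective becomes $2\lambda - 2\lambda^2$, maximized at $\lambda = 1/2$ (which is below $C/N = 5$), matching the solver.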

SLIDE 12

OPTIMAL SOFT-MARGIN HYPERPLANE: PRIMAL SOLUTIONS

Assume that we now know $(\lambda^*, \mu^*)$. How do we find $(w^*, b^*)$?

Lemma (Finding primal solutions):
$$w^* = \sum_{i=1}^{N}\lambda_i^* y_i x_i \quad\text{and}\quad b^* = y_i - {w^*}^\intercal x_i$$
for some $i \in [1;N]$ such that $0 < \lambda_i^* < \frac{C}{N}$. (For such an $i$, $\mu_i^* = \frac{C}{N} - \lambda_i^* > 0$, so $\xi_i^* = 0$ and $y_i({w^*}^\intercal x_i + b^*) = 1$; multiplying by $y_i \in \{\pm 1\}$ gives $b^*$.)

- The only data points that matter are those for which $\lambda_i^* \neq 0$.
- By complementary slackness, they are the ones for which $y_i({w^*}^\intercal x_i + b^*) = 1 - \xi_i^*$.
- These points are called support vectors; they lie on or inside the margin.
- In practice, the number of support vectors is often $\ll N$.
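The lemma translates directly into code. This sketch assumes a linear kernel, so that $w^*$ can be formed explicitly, and uses a numerical tolerance `tol` (an assumption) to decide when $\lambda_i^*$ counts as zero or as equal to $C/N$. Instead of picking a single on-margin index as the lemma does, it averages $y_i - {w^*}^\intercal x_i$ over all of them, a common choice to damp round-off. The function name, the three-point dataset, and its dual solution are hypothetical; the dual solution was worked out by hand from the KKT conditions.

```python
import numpy as np


def primal_from_dual(lam, X, y, C, tol=1e-8):
    """Recover w*, b* and the support-vector indices from a dual solution lam.

    Assumes a linear kernel (so w* can be formed explicitly) and at least one
    index with 0 < lam_i < C/N.
    """
    N = len(y)
    upper = C / N
    w = (lam * y) @ X                                  # w* = sum_i lam_i y_i x_i
    support = np.flatnonzero(lam > tol)                # support vectors: lam_i != 0
    on_margin = np.flatnonzero((lam > tol) & (lam < upper - tol))   # 0 < lam_i < C/N
    # On-margin points have xi_i = 0, so y_i (w^T x_i + b) = 1 and b = y_i - w^T x_i.
    # Any one of them suffices; averaging over all of them reduces round-off.
    b = np.mean(y[on_margin] - X[on_margin] @ w)
    return w, b, support


# Hypothetical three-point dataset; lam_star satisfies the KKT conditions with
# w* = (1, 0), b* = 0 (the third point lies strictly outside the margin, so lam_3 = 0).
X = np.array([[1.0, 0.0], [-1.0, 0.0], [3.0, 0.0]])
y = np.array([1.0, -1.0, 1.0])
lam_star = np.array([0.5, 0.5, 0.0])
w, b, support = primal_from_dual(lam_star, X, y, C=10.0)
print(w, b, support)
```

Here $w^* = (1, 0)$, $b^* = 0$, and only the first two points are support vectors; the third point has $y_3({w^*}^\intercal x_3 + b^*) = 3 > 1$, so $\lambda_3^* = 0$ and it has no influence on the hyperplane.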
