SUPPORT VECTOR MACHINES
Matthieu R Bloch
Tuesday, February 25, 2020
LOGISTICS

TAs and office hours
- Tuesday: Dr. Bloch (College of Architecture Cafe) - 11:00am-11:55am
- Tuesday: TJ (VL C449 Cubicle D) - 1:30pm-2:45pm
- Thursday: Hossein (VL C449 Cubicle B) - 10:45am-12:00pm
- Friday: Brighton (TSRB 523a) - 12:00pm-1:15pm

Projects
- Thanks for forming teams
- Start working on your proposals!
- Discussion: proposal deadline extension

Midterm
- March 5th
- Sample midterm posted (do not share)
- Open notes
Assume \(f\), \(\{g_i\}\), \(\{h_j\}\) are all differentiable. Consider the KKT conditions:

- Stationarity: \(\nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) + \sum_{j=1}^{p} \mu_j \nabla h_j(x) = 0\)
- Primal feasibility: \(\forall i \in [1;m]\; g_i(x) \leq 0\) and \(\forall j \in [1;p]\; h_j(x) = 0\)
- Dual feasibility: \(\forall i \in [1;m]\; \lambda_i \geq 0\)
- Complementary slackness: \(\forall i \in [1;m]\; \lambda_i g_i(x) = 0\)
Theorem (KKT necessity). If \(x^*\) and \((\lambda^*, \mu^*)\) are primal and dual solutions with zero duality gap, then \(x^*\) and \((\lambda^*, \mu^*)\) satisfy the KKT conditions.

Theorem (KKT sufficiency). If the original problem is convex and \(\tilde{x}\) and \((\tilde{\lambda}, \tilde{\mu})\) satisfy the KKT conditions, then \(\tilde{x}\) is primal optimal, \((\tilde{\lambda}, \tilde{\mu})\) is dual optimal, and the duality gap is zero.

If a constrained optimization problem is differentiable and convex:
- the KKT conditions are necessary and sufficient for primal/dual optimality (with zero duality gap)
- we can use the KKT conditions to find a solution to our optimization problem

We're in luck: the optimal soft-margin hyperplane falls in this category!
The optimal soft-margin hyperplane is the solution of the following optimization problem:

\[
\operatorname{argmin}_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad \forall i \in [1;N]\; y_i(w^\intercal x_i + b) \geq 1 - \xi_i \;\text{ and }\; \xi_i \geq 0
\]

- The optimization problem is differentiable and convex
- The KKT conditions are necessary and sufficient, and the duality gap is zero
- We will kernelize the dual problem

The Lagrangian is

\[
L(w, b, \xi, \lambda, \mu) \triangleq \frac{1}{2} w^\intercal w + \frac{C}{N}\sum_{i=1}^{N}\xi_i + \sum_{i=1}^{N}\lambda_i\left(1 - \xi_i - y_i(w^\intercal x_i + b)\right) - \sum_{i=1}^{N}\mu_i\xi_i
\]

with \(\lambda \geq 0\), \(\mu \geq 0\). The Lagrange dual function is

\[
L_D(\lambda, \mu) = \min_{w,b,\xi} L(w, b, \xi, \lambda, \mu)
\]

The dual problem is

\[
\max_{\lambda \geq 0, \mu \geq 0} L_D(\lambda, \mu)
\]
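The simplification of \(L_D\) that follows comes from the stationarity KKT conditions of this Lagrangian; spelling out this standard intermediate step:

\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N}\lambda_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\lambda_i y_i = 0,
\qquad
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \frac{C}{N} - \lambda_i - \mu_i = 0.
\]

Substituting these back into \(L\) eliminates \(w\), \(b\), and \(\xi\); the last condition, combined with \(\mu_i \geq 0\), turns the constraint \(\lambda_i \geq 0\) into the box constraint \(0 \leq \lambda_i \leq \frac{C}{N}\).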
Let's simplify using the KKT conditions.

Lemma (Simplification of dual function). The dual function is

\[
L_D(\lambda, \mu) = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j y_i y_j x_i^\intercal x_j + \sum_{i=1}^{N}\lambda_i
\]

Lemma (Simplification of dual problem). The dual optimization problem is

\[
\max_{\lambda,\mu} \; -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j y_i y_j x_i^\intercal x_j + \sum_{i=1}^{N}\lambda_i
\quad \text{s.t.} \quad \sum_{i=1}^{N}\lambda_i y_i = 0 \;\text{ and }\; \forall i \in [1;N]\; 0 \leq \lambda_i \leq \frac{C}{N}
\]

We can very efficiently solve for \(\lambda^*\).
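This dual is a small quadratic program. As a hedged illustration of how it could be solved numerically, here is a sketch using `scipy.optimize.minimize`; the toy dataset, the value of C, and the SLSQP solver choice are my assumptions, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Toy 2D linearly separable data (illustrative, not from the lecture)
X = np.vstack([rng.normal(-2.0, 0.5, (10, 2)),
               rng.normal(2.0, 0.5, (10, 2))])
y = np.array([-1.0] * 10 + [1.0] * 10)
N, C = len(y), 1.0

# G[i, j] = y_i y_j x_i^T x_j
G = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(lam):
    # SLSQP minimizes, so negate the dual objective
    return 0.5 * lam @ G @ lam - lam.sum()

res = minimize(neg_dual, np.zeros(N), method="SLSQP",
               bounds=[(0.0, C / N)] * N,
               constraints=[{"type": "eq", "fun": lambda lam: lam @ y}])
lam_star = res.x  # approximate dual solution
print("dual objective:", -res.fun)
```

In practice, dedicated QP solvers (e.g. the SMO algorithm used by libsvm) are far more efficient than a generic solver; this sketch is only meant to make the dual concrete.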
Assume that we now know \((\lambda^*, \mu^*)\); how do we find \((w^*, b^*)\)?

Lemma (Finding primal solutions).

\[
w^* = \sum_{i=1}^{N}\lambda_i^* y_i x_i
\quad \text{and} \quad
b^* = y_i - w^{*\intercal} x_i \;\text{ for some } i \in [1;N] \text{ such that } 0 < \lambda_i^* < \frac{C}{N}
\]

- The only data points that matter are those for which \(\lambda_i^* \neq 0\)
- By complementary slackness, they are the ones for which \(y_i(w^{*\intercal} x_i + b) = 1 - \xi_i^*\)
- These points are called support vectors
- Support vectors lie on or inside the margin
- In practice, the number of support vectors is often \(\ll N\)
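The primal recovery above can be checked by hand on a two-point example (my construction, not from the slides): with \(x_1 = (-1, 0)\), \(y_1 = -1\) and \(x_2 = (1, 0)\), \(y_2 = +1\), the dual solution is \(\lambda^* = (1/2, 1/2)\), giving \(w^* = (1, 0)\) and \(b^* = 0\).

```python
import numpy as np

# Two-point toy example (illustrative): the dual can be solved by hand,
# giving lambda* = (1/2, 1/2).
X = np.array([[-1.0, 0.0], [1.0, 0.0]])
y = np.array([-1.0, 1.0])
lam = np.array([0.5, 0.5])
C_over_N = 10.0  # assume C/N large enough that 0 < lambda_i < C/N

# Primal recovery: w* = sum_i lambda_i* y_i x_i
w = (lam * y) @ X

# b* = y_i - w*^T x_i for any i with 0 < lambda_i* < C/N
sv = np.where((lam > 1e-8) & (lam < C_over_N - 1e-8))[0]
b = y[sv[0]] - w @ X[sv[0]]

print(w, b)  # -> [1. 0.] 0.0; both points are support vectors
```

Both points sit exactly on the margin \(y_i(w^{*\intercal} x_i + b^*) = 1\), consistent with the complementary-slackness characterization above.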