10701 Recitation 5: Duality and SVM (Ahmed Hefny)



slide-1
SLIDE 1

10701 Recitation 5 Duality and SVM

Ahmed Hefny

slide-2
SLIDE 2

Outline

  • Lagrangian and Duality

– The Lagrangian
– Duality
– Examples

  • Support Vector Machines

– Primal Formulation
– Dual Formulation
– Soft Margin and Hinge Loss

slide-3
SLIDE 3

Lagrangian

  • Consider the problem

$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) = 0$$

  • Add a Lagrange multiplier for each constraint

$$L(x, u) = f(x) + \sum_i u_i g_i(x)$$

slide-4
SLIDE 4

Lagrangian

  • Lagrangian

$$L(x, u) = f(x) + \sum_i u_i g_i(x)$$

  • Setting the gradient (w.r.t. $u$ and $x$) to 0 gives

– $g_i(x) = 0$ [Feasible point]
– $\nabla f(x) + \sum_i u_i \nabla g_i(x) = 0$ [Cannot decrease $f$ except by violating constraints]
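A small worked instance of these two conditions, using a hypothetical problem chosen only for illustration (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$). At the solution, $\nabla f = (1, 1)$ is exactly $-u \nabla g$, so moving along the constraint cannot decrease $f$:

```latex
\begin{aligned}
L(x, u) &= x_1^2 + x_2^2 + u\,(x_1 + x_2 - 1) \\
\partial L / \partial x_1 &= 2x_1 + u = 0, \qquad \partial L / \partial x_2 = 2x_2 + u = 0 \\
\partial L / \partial u &= x_1 + x_2 - 1 = 0 \\
\Rightarrow\; x_1 &= x_2 = \tfrac{1}{2}, \qquad u = -1, \qquad f(x^*) = \tfrac{1}{2}
\end{aligned}
```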

slide-5
SLIDE 5

Lagrangian

  • Consider the problem

$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) = 0, \quad h_j(x) \le 0$$

  • Add a Lagrange multiplier for each constraint

$$L(x, u, \lambda) = f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x)$$

slide-6
SLIDE 6

Duality

slide-7
SLIDE 7

Duality

  • Primal problem

$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) = 0, \quad h_j(x) \le 0$$

  • Equivalent to

$$\min_x \max_{\lambda \ge 0,\, u} \; f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x)$$

slide-8
SLIDE 8

Duality

  • Primal problem

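$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) = 0, \quad h_j(x) \le 0$$

  • Equivalent to

$$\min_x \begin{cases} f(x) & x \text{ is feasible} \\ \infty & \text{o.w.} \end{cases}$$

The inner maximization returns $f(x)$ when $x$ is feasible (every $g_i(x) = 0$, and each term $\lambda_j h_j(x) \le 0$ is maximized by $\lambda_j = 0$) and $\infty$ when any constraint is violated (send the corresponding $u_i$ or $\lambda_j$ to $\pm\infty$).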
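A minimal numeric sketch of this equivalence, assuming a hypothetical one-dimensional problem (minimize $x^2$ subject to $h(x) = 1 - x \le 0$) and replacing the max over all $\lambda \ge 0$ with a max over a finite grid (`LAM_GRID` is an illustrative choice). For a feasible $x$ the inner max equals $f(x)$; for an infeasible $x$ it keeps growing as larger multipliers are allowed:

```python
def lagrangian(x, lam):
    # Hypothetical toy problem: minimize f(x) = x^2 subject to h(x) = 1 - x <= 0
    return x * x + lam * (1.0 - x)

def inner_max(x, lam_grid):
    """Max of L(x, lam) over a finite grid of lam >= 0 (a stand-in for the sup over all lam >= 0)."""
    return max(lagrangian(x, lam) for lam in lam_grid)

LAM_GRID = [0.0, 1.0, 10.0, 100.0, 1000.0]

print(inner_max(2.0, LAM_GRID))  # 4.0 = f(2): feasible, the best multiplier is 0
print(inner_max(0.5, LAM_GRID))  # 500.25: infeasible, driven by the largest multiplier
```

Adding a larger multiplier to the grid only increases the value at the infeasible point, which is the finite-grid shadow of the $\infty$ branch above.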

slide-9
SLIDE 9

Duality

  • Dual Problem

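$$\max_{\lambda \ge 0,\, u} \; \min_x \; f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x)$$

  • Dual function:

– Concave, regardless of the convexity of the primal (it is a pointwise minimum of functions that are affine in $(\lambda, u)$)
– Lower bound on the primal optimum

The inner minimization defines the Lagrangian dual function $L(\lambda, u)$.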
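To see both properties on a concrete case, here is a hypothetical one-dimensional problem (not from the slides) worked by hand. The dual function is a concave parabola, and it never exceeds the primal optimum $f(x^*) = 1$:

```latex
\begin{aligned}
&\min_x \; x^2 \quad \text{s.t.} \quad h(x) = 1 - x \le 0 \qquad (x^* = 1,\ f(x^*) = 1) \\
&L(x, \lambda) = x^2 + \lambda (1 - x), \qquad \partial_x L = 2x - \lambda = 0 \;\Rightarrow\; x = \lambda / 2 \\
&L(\lambda) = \min_x L(x, \lambda) = \lambda - \tfrac{\lambda^2}{4} \qquad (\text{concave in } \lambda) \\
&1 - L(\lambda) = \left(1 - \tfrac{\lambda}{2}\right)^2 \ge 0 \;\Rightarrow\; L(\lambda) \le 1 \text{ for every } \lambda \ge 0
\end{aligned}
```

The bound is attained at $\lambda = 2$, so for this problem the duality gap is zero.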

slide-10
SLIDE 10

Duality

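[Figure: table of values of $L(x, \lambda)$, rows indexed by $x$, columns indexed by $\lambda$]

Primal problem: $\min_x \max_{\lambda \ge 0} L(x, \lambda)$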

slide-11
SLIDE 11

Duality

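[Figure: table of values of $L(x, \lambda)$, rows indexed by $x$, columns indexed by $\lambda$]

Primal problem: $\min_x \max_{\lambda \ge 0} L(x, \lambda)$

For each row (choice of $x$), pick the largest element, then select the minimum.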

slide-12
SLIDE 12

Duality

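[Figure: table of values of $L(x, \lambda)$, rows indexed by $x$, columns indexed by $\lambda$]

Dual problem: $\max_{\lambda \ge 0} \min_x L(x, \lambda)$

For each column (choice of $\lambda$), pick the smallest element, then select the maximum.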
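The row and column procedures above can be sketched directly on a finite table. The table below is made up for illustration (rows play the role of choices of $x$, columns choices of $\lambda$); it is chosen so that the two values differ, showing a nonzero gap:

```python
def primal_value(table):
    """Rows are choices of x: take the largest entry of each row, then the minimum."""
    return min(max(row) for row in table)

def dual_value(table):
    """Columns are choices of lambda: take the smallest entry of each column, then the maximum."""
    return max(min(col) for col in zip(*table))

# Hypothetical table of values: TABLE[r][c] plays the role of L(x_r, lambda_c)
TABLE = [
    [1, 4],
    [3, 2],
]

print(primal_value(TABLE), dual_value(TABLE))  # 3 2, i.e. a gap of 1
```

Trying other tables never produces a "min max" below the "max min", which is the claim made next.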

slide-13
SLIDE 13

Duality

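[Figure: the same table, with the entry at $(x^*, \lambda^*)$ highlighted]

Claim:

$$\min_x \max_{\lambda \ge 0} L(x, \lambda) \ge \max_{\lambda \ge 0} \min_x L(x, \lambda)$$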

slide-14
SLIDE 14

Duality

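[Figure: the same table, with the entry at $(x^*, \lambda^*)$ highlighted]

Claim:

$$\min_x \max_{\lambda \ge 0} L(x, \lambda) \ge \max_{\lambda \ge 0} \min_x L(x, \lambda)$$

Let $x^*$ attain the primal minimum and $\lambda^* = \arg\max_{\lambda \ge 0} L(x^*, \lambda)$, so $L(x^*, \lambda^*)$ is the primal optimum. For any $\lambda \ge 0$:

$$\min_x L(x, \lambda) \le L(x^*, \lambda) \le L(x^*, \lambda^*)$$

The difference between the primal minimum and the dual maximum is called the duality gap. A duality gap of 0 is called strong duality.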

slide-15
SLIDE 15

Duality

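[Figure: the same table, with the entry at $(x^*, \lambda^*)$ highlighted]

When does the following equality hold?

$$\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$$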

slide-16
SLIDE 16

Duality

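[Figure: the same table, with the entry at $(x^*, \lambda^*)$ highlighted]

When does the following equality hold?

$$\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$$

When $(x^*, \lambda^*)$ is a saddle point:

$$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*) \quad \forall x,\ \lambda \ge 0$$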

slide-17
SLIDE 17

Duality

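[Figure: the same table, with the entry at $(x^*, \lambda^*)$ highlighted]

When does the following equality hold?

$$\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$$

When $(x^*, \lambda^*)$ is a saddle point:

$$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*) \quad \forall x,\ \lambda \ge 0$$

Necessity: by definition of the dual.
Sufficiency: $L(\lambda) = \min_x L(x, \lambda) \le L(x^*, \lambda) \le L(x^*, \lambda^*)$ and $L(\lambda^*) = L(x^*, \lambda^*)$.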

slide-18
SLIDE 18

Duality

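[Figure: the same table, with the entry at $(x^*, \lambda^*)$ highlighted]

When does the following equality hold?

$$\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$$

When $(x^*, \lambda^*)$ is a saddle point:

$$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*) \quad \forall x,\ \lambda \ge 0$$

Necessity: by definition of the dual.
Sufficiency: $L(\lambda) = \min_x L(x, \lambda) \le L(x^*, \lambda) \le L(x^*, \lambda^*)$ for every $\lambda \ge 0$, and $L(\lambda^*) = \min_x L(x, \lambda^*) = L(x^*, \lambda^*)$ by the right-hand inequality. The dual at $\lambda^*$ attains the upper bound. Since also $\max_{\lambda \ge 0} L(x^*, \lambda) = L(x^*, \lambda^*)$, the primal value is at most $L(x^*, \lambda^*)$, and weak duality gives equality.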
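A minimal grid check of the saddle-point condition, assuming the hypothetical toy problem "minimize $x^2$ subject to $1 - x \le 0$" with candidate $(x^*, \lambda^*) = (1, 2)$. The grids `XS` and `LAMS` are illustrative, so this is a numerical sanity check, not a proof:

```python
def lagrangian(x, lam):
    # Hypothetical toy problem: minimize x^2 subject to h(x) = 1 - x <= 0
    return x * x + lam * (1.0 - x)

def dual_function(lam):
    # Closed form of min_x L(x, lam), attained at x = lam / 2
    return lam - lam * lam / 4.0

def is_saddle_point(x_star, lam_star, xs, lams, tol=1e-12):
    """Check L(x*, lam) <= L(x*, lam*) <= L(x, lam*) on finite grids of x and lam >= 0."""
    center = lagrangian(x_star, lam_star)
    no_better_lam = all(lagrangian(x_star, lam) <= center + tol for lam in lams)
    no_better_x = all(center <= lagrangian(x, lam_star) + tol for x in xs)
    return no_better_lam and no_better_x

XS = [i / 10 for i in range(-20, 41)]  # x in [-2, 4]
LAMS = [i / 10 for i in range(0, 51)]  # lambda in [0, 5]

print(is_saddle_point(1.0, 2.0, XS, LAMS))  # True
print(is_saddle_point(1.0, 1.0, XS, LAMS))  # False: x = 0.5 gives L(0.5, 1) = 0.75 < 1
print(max(LAMS, key=dual_function))         # 2.0, where the dual value equals the primal optimum 1.0
```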

slide-19
SLIDE 19

Duality

  • If strong duality holds, the KKT conditions hold at the optimal point:

– Stationarity: $\nabla_x L(x, u, \lambda) = 0$
– Primal feasibility
– Dual feasibility ($\lambda \ge 0$)
– Complementary slackness ($\lambda_j h_j(x) = 0$)

  • KKT conditions are

– Sufficient (for convex problems)
– Necessary under strong duality
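A hedged sketch that evaluates each KKT condition as a residual, again for the hypothetical toy problem "minimize $x^2$ subject to $h(x) = 1 - x \le 0$" (no equality constraints). The function names and the tolerance are illustrative choices:

```python
def kkt_residuals(x, lam):
    """Residual of each KKT condition for: minimize x^2 s.t. h(x) = 1 - x <= 0."""
    grad_f = 2.0 * x
    grad_h = -1.0
    h = 1.0 - x
    return {
        "stationarity": grad_f + lam * grad_h,  # dL/dx, should be 0
        "primal_feasibility": max(0.0, h),      # amount by which h(x) <= 0 is violated
        "dual_feasibility": max(0.0, -lam),     # amount by which lam >= 0 is violated
        "complementary_slackness": lam * h,     # should be 0
    }

def satisfies_kkt(x, lam, tol=1e-9):
    return all(abs(v) <= tol for v in kkt_residuals(x, lam).values())

print(satisfies_kkt(1.0, 2.0))  # True: the optimal pair
print(satisfies_kkt(0.0, 0.0))  # False: stationary, but infeasible
print(satisfies_kkt(2.0, 0.0))  # False: feasible, but not stationary
```

Because this toy problem is convex, passing the check is enough to certify optimality, matching the "sufficient (for convex problems)" bullet.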

slide-20
SLIDE 20

Example: LP

  • Primal

$$\min_x c^T x \quad \text{s.t.} \quad Ax \ge b$$

slide-21
SLIDE 21

Example: LP

  • Primal

$$\min_x c^T x \quad \text{s.t.} \quad Ax \ge b$$

  • Lagrangian

$$L(x, \lambda) = c^T x - \lambda^T (Ax - b)$$

slide-22
SLIDE 22

Example: LP

  • Dual Function

$$L(\lambda) = \min_x \; c^T x - \lambda^T (Ax - b)$$

slide-23
SLIDE 23

Example: LP

  • Dual Function

$$L(\lambda) = \min_x \; c^T x - \lambda^T (Ax - b)$$

  • Set gradient w.r.t. $x$ to 0

$$c - A^T \lambda = 0$$

slide-24
SLIDE 24

Example: LP

  • Dual Function

$$L(\lambda) = \min_x \; c^T x - \lambda^T (Ax - b)$$

  • Set gradient w.r.t. $x$ to 0

$$c - A^T \lambda = 0$$

  • Dual Problem

$$\max_{\lambda \ge 0} \lambda^T b \quad \text{s.t.} \quad c - A^T \lambda = 0$$

Why keep this as a constraint?
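A minimal sketch that checks strong duality on a hypothetical two-variable LP (minimize $x_1 + 2x_2$ subject to $x_1 \ge 1$, $x_2 \ge 1$, $x_1 + x_2 \ge 3$). It assumes the LP is bounded and has an optimal vertex, and solves both problems by brute-force vertex enumeration with exact fractions, which only makes sense for tiny examples like this one:

```python
from fractions import Fraction
from itertools import combinations

def solve2x2(m, r):
    """Solve the 2x2 system m z = r by Cramer's rule; return None if m is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        return None
    return ((r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - r[0] * m[1][0]) / det)

def primal_by_vertices(A, b, c):
    """min c^T x s.t. A x >= b, x in R^2: try every vertex where two rows are tight."""
    best = None
    for i, k in combinations(range(len(A)), 2):
        x = solve2x2([A[i], A[k]], [b[i], b[k]])
        if x is None:
            continue
        if all(row[0] * x[0] + row[1] * x[1] >= bi for row, bi in zip(A, b)):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value < best:
                best = value
    return best

def dual_by_vertices(A, b, c):
    """max b^T lam s.t. A^T lam = c, lam >= 0: try every basic solution with two nonzeros."""
    best = None
    for i, k in combinations(range(len(A)), 2):
        # Columns of A^T for rows i and k; every other multiplier is fixed at 0
        m = [[A[i][0], A[k][0]], [A[i][1], A[k][1]]]
        lam = solve2x2(m, c)
        if lam is None or lam[0] < 0 or lam[1] < 0:
            continue
        value = b[i] * lam[0] + b[k] * lam[1]
        if best is None or value > best:
            best = value
    return best

# Hypothetical toy LP: x1 >= 1, x2 >= 1, x1 + x2 >= 3; minimize x1 + 2 x2
A = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)], [Fraction(1), Fraction(1)]]
b = [Fraction(1), Fraction(1), Fraction(3)]
c = [Fraction(1), Fraction(2)]

print(primal_by_vertices(A, b, c), dual_by_vertices(A, b, c))  # 4 4
```

The primal optimum is at $x = (2, 1)$ and the dual optimum at $\lambda = (0, 1, 1)$, both with value 4.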

slide-25
SLIDE 25

Example: LASSO

  • We will use duality to transform LASSO into a QP

slide-26
SLIDE 26

Example: LASSO

Primal:

$$\min_w \tfrac{1}{2}\|y - Xw\|^2 + \gamma \|w\|_1$$

What is the dual function in this case?

slide-27
SLIDE 27

Example: LASSO

Reformulated primal:

$$\min_{z, w} \tfrac{1}{2}\|y - z\|^2 + \gamma \|w\|_1 \quad \text{s.t.} \quad z = Xw$$

Dual:

$$L(\lambda) = \min_{z, w} \tfrac{1}{2}\|y - z\|^2 + \gamma \|w\|_1 + \lambda^T (z - Xw)$$

slide-28
SLIDE 28

Example: LASSO

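Dual:

$$L(\lambda) = \min_{z, w} \tfrac{1}{2}\|y - z\|^2 + \gamma \|w\|_1 + \lambda^T (z - Xw)$$

Setting the gradient w.r.t. $z$ to zero gives $z = y - \lambda$. The minimum over $w$ is finite (and equal to 0) only if $\|X^T \lambda\|_\infty \le \gamma$.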

slide-29
SLIDE 29

Example: LASSO

  • Dual Problem

$$\max_\lambda \; -\tfrac{1}{2}\|\lambda\|^2 + \lambda^T y \quad \text{s.t.} \quad \|X^T \lambda\|_\infty \le \gamma$$
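A hedged numeric check of this dual, assuming the special case where $X$ has a single column (so the LASSO solution has the closed form $w^* = S(X^T y, \gamma) / \|X\|^2$ with soft-thresholding $S$). The data and function names are hypothetical. Using $\lambda = y - Xw^*$ (from $z = y - \lambda$), the dual objective matches the primal objective:

```python
from fractions import Fraction

def soft_threshold(t, gamma):
    """S(t, gamma) = sign(t) * max(|t| - gamma, 0)."""
    if t > gamma:
        return t - gamma
    if t < -gamma:
        return t + gamma
    return Fraction(0)

def lasso_single_feature(x, y, gamma):
    """Closed-form minimizer of 1/2 ||y - x w||^2 + gamma |w| when X is one column x."""
    xty = Fraction(sum(xi * yi for xi, yi in zip(x, y)))
    xtx = Fraction(sum(xi * xi for xi in x))
    return soft_threshold(xty, gamma) / xtx

def primal_objective(w, x, y, gamma):
    return Fraction(1, 2) * sum((yi - xi * w) ** 2 for xi, yi in zip(x, y)) + gamma * abs(w)

def dual_objective(lam, y):
    # -1/2 ||lam||^2 + lam^T y
    return -Fraction(1, 2) * sum(l * l for l in lam) + sum(l * yi for l, yi in zip(lam, y))

def dual_feasible(lam, x, gamma):
    # ||X^T lam||_inf <= gamma, with X a single column
    return abs(sum(xi * l for xi, l in zip(x, lam))) <= gamma

X, Y, GAMMA = [1, 2], [1, 3], 2  # hypothetical data
w = lasso_single_feature(X, Y, GAMMA)
lam = [yi - xi * w for xi, yi in zip(X, Y)]  # lam = y - z with z = X w

print(w, lam)  # 1 [Fraction(0, 1), Fraction(1, 1)]
print(primal_objective(w, X, Y, GAMMA), dual_objective(lam, Y))  # 5/2 5/2
```

Any other feasible $\lambda$ gives a smaller dual value, for example $\lambda = (0, \tfrac12)$ gives $\tfrac{11}{8}$.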

slide-30
SLIDE 30

Support Vector Machines

[Figure; image source: docs.opencv.org]

slide-31
SLIDE 31

Support Vector Machines

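  • Find the maximum-margin hyperplane
  • “Distance” from a point $x_i$ to the hyperplane $\langle w, x \rangle + b = 0$ is given by $d_i = (\langle w, x_i \rangle + b)/\|w\|$
  • $\text{Margin} = \min_i y_i d_i = \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b)\, y_i$
  • Max Margin:

$$\max_{w, b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b)\, y_i$$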

slide-32
SLIDE 32

Support Vector Machines

  • Max Margin

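$$\max_{w, b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b)\, y_i$$

  • Unpleasant (max min?)
  • No unique solution: rescaling $(w, b)$ to $(\kappa w, \kappa b)$ with any $\kappa > 0$ leaves the objective unchanged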
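A small sketch of the margin definition and of the non-uniqueness, on hypothetical linearly separable 2-D data. `margin` computes $\min_i y_i d_i$; rescaling $(w, b)$ describes the same hyperplane and gives the same margin, while a different hyperplane gives a smaller one:

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def margin(w, b, xs, ys):
    """min_i y_i d_i with signed distance d_i = (<w, x_i> + b) / ||w||."""
    norm = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) / norm for x, y in zip(xs, ys))

# Hypothetical linearly separable toy data
XS = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
YS = [1, 1, -1, -1]

print(margin((1.0, 0.0), -1.0, XS, YS))  # 1.0
print(margin((5.0, 0.0), -5.0, XS, YS))  # 1.0: same hyperplane, rescaled (w, b)
print(margin((1.0, 0.0), -0.5, XS, YS))  # 0.5: a worse hyperplane
```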
slide-33
SLIDE 33

Support Vector Machines

  • Max Margin

$$\max_{w, b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b)\, y_i \quad \text{s.t. ???}$$

slide-34
SLIDE 34

Support Vector Machines

  • Max Margin

$$\max_{w, b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b)\, y_i \quad \text{s.t.} \quad \min_i (\langle w, x_i \rangle + b)\, y_i = 1$$

slide-35
SLIDE 35

Support Vector Machines

  • Max Margin

$$\min_{w, b} \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad \min_i (\langle w, x_i \rangle + b)\, y_i = 1$$

slide-36
SLIDE 36

Support Vector Machines

  • Max Margin (Canonical Representation)

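$$\min_{w, b} \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad (\langle w, x_i \rangle + b)\, y_i \ge 1, \ \forall i$$

  • QP, much better than

$$\max_{w, b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b)\, y_i$$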
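A minimal sketch of how any separating $(w, b)$ is brought into the canonical representation: divide by $\min_i (\langle w, x_i \rangle + b)\, y_i$, after which the geometric margin is $1/\|w\|$. The data and the helper names (`functional_margin`, `canonicalize`) are hypothetical:

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def functional_margin(w, b, xs, ys):
    """min_i (<w, x_i> + b) y_i, the quantity fixed to 1 in the canonical representation."""
    return min((dot(w, x) + b) * y for x, y in zip(xs, ys))

def canonicalize(w, b, xs, ys):
    """Rescale (w, b) so that min_i (<w, x_i> + b) y_i = 1; (w, b) must separate the data."""
    s = functional_margin(w, b, xs, ys)
    if s <= 0:
        raise ValueError("(w, b) does not separate the data")
    return [wk / s for wk in w], b / s

# Hypothetical linearly separable toy data
XS = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
YS = [1, 1, -1, -1]

w, b = canonicalize([3.0, 0.0], -3.0, XS, YS)
print(w, b)                        # [1.0, 0.0] -1.0
print(1.0 / math.sqrt(dot(w, w)))  # geometric margin 1/||w|| = 1.0
```

Minimizing $\tfrac{1}{2}\|w\|^2$ over canonical $(w, b)$ is then the same as maximizing $1/\|w\|$, the margin.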

slide-37
SLIDE 37

SVM Dual Problem

Recall that the Lagrangian is formed by adding a Lagrange multiplier for each constraint.

$$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[(\langle w, x_i \rangle + b)\, y_i - 1\right]$$

slide-38
SLIDE 38

SVM Dual Problem

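$$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[(\langle w, x_i \rangle + b)\, y_i - 1\right]$$

Fix $\alpha$ and minimize w.r.t. $w, b$:

$$w - \sum_i \alpha_i y_i x_i = 0, \qquad \sum_i \alpha_i y_i = 0$$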

slide-39
SLIDE 39

SVM Dual Problem

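$$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[(\langle w, x_i \rangle + b)\, y_i - 1\right]$$

Fix $\alpha$ and minimize w.r.t. $w, b$:

$$w - \sum_i \alpha_i y_i x_i = 0, \qquad \sum_i \alpha_i y_i = 0$$

Plug in the first condition; keep the second as a constraint (why?)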
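The substitution can be carried out term by term. Expanding the Lagrangian as $\tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i y_i \langle w, x_i \rangle - b \sum_i \alpha_i y_i + \sum_i \alpha_i$ and plugging in $w = \sum_i \alpha_i y_i x_i$ gives the dual objective; note that the $b$ term disappears only because of $\sum_i \alpha_i y_i = 0$:

```latex
\begin{aligned}
\tfrac{1}{2}\|w\|^2 &= \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \\
\sum_i \alpha_i y_i \langle w, x_i \rangle &= \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \\
b \sum_i \alpha_i y_i &= 0 \\
\Rightarrow\; L(w, b, \alpha) &= -\tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i
\end{aligned}
```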

slide-40
SLIDE 40

SVM Dual Problem

Dual Problem

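$$\max_\alpha \; -\tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i \quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \quad \alpha_i \ge 0$$

Another QP. So what?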

slide-41
SLIDE 41

SVM Dual Problem

  • Only inner products → kernel trick
  • Complementary slackness → support vectors
  • KKT conditions lead to efficient optimization algorithms (compared to a general QP solver)
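As a hedged illustration of the last bullet, here is a simplified version of Platt's SMO (sequential minimal optimization), a well-known dual solver that these slides do not describe. It repeatedly picks a pair $(\alpha_i, \alpha_j)$ where $\alpha_i$ violates the KKT conditions and solves the two-variable QP in closed form, which keeps $\sum_i \alpha_i y_i = 0$. Assumptions: it targets the soft-margin dual, which (not derived here) adds the box constraint $0 \le \alpha_i \le C$ to the dual above; for separable data and a large enough $C$ it returns the hard-margin solution. The second index is chosen deterministically (largest $|E_i - E_j|$) instead of at random. Names and defaults (`smo`, `linear_kernel`, `C`, `tol`, `max_passes`) are illustrative. The solver only touches the data through the kernel, i.e. through inner products:

```python
def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def smo(xs, ys, C=10.0, tol=1e-3, max_passes=5, kernel=linear_kernel):
    """Simplified SMO for the SVM dual with 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)]  # only inner products
    alpha = [0.0] * n
    b = 0.0

    def error(k):
        # E_k = f(x_k) - y_k with f(x) = sum_i alpha_i y_i K(x_i, x) + b
        return sum(alpha[i] * ys[i] * K[i][k] for i in range(n)) + b - ys[k]

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            e_i = error(i)
            # Skip alpha_i if it already satisfies the KKT conditions (up to tol)
            if not ((ys[i] * e_i < -tol and alpha[i] < C) or (ys[i] * e_i > tol and alpha[i] > 0)):
                continue
            # Second index: the one with the largest |E_i - E_j|
            j = max((k for k in range(n) if k != i), key=lambda k: abs(e_i - error(k)))
            e_j = error(j)
            a_i_old, a_j_old = alpha[i], alpha[j]
            # Feasible segment for alpha_j that keeps sum_i alpha_i y_i fixed and 0 <= alpha <= C
            if ys[i] != ys[j]:
                lo, hi = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
            else:
                lo, hi = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
            eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            # Unconstrained optimum of the two-variable QP, clipped to [lo, hi]
            alpha[j] = min(hi, max(lo, a_j_old - ys[j] * (e_i - e_j) / eta))
            if abs(alpha[j] - a_j_old) < 1e-8:
                alpha[j] = a_j_old
                continue
            alpha[i] = a_i_old + ys[i] * ys[j] * (a_j_old - alpha[j])
            # Recompute the bias so the updated pair satisfies the KKT conditions
            b1 = b - e_i - ys[i] * (alpha[i] - a_i_old) * K[i][i] - ys[j] * (alpha[j] - a_j_old) * K[i][j]
            b2 = b - e_j - ys[i] * (alpha[i] - a_i_old) * K[i][j] - ys[j] * (alpha[j] - a_j_old) * K[j][j]
            if 0 < alpha[i] < C:
                b = b1
            elif 0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# Hypothetical separable data on the first axis
XS = [(1.0, 0.0), (3.0, 0.0), (-1.0, 0.0), (-3.0, 0.0)]
YS = [1, 1, -1, -1]
alpha, b = smo(XS, YS)
w = [sum(a * y * x[d] for a, y, x in zip(alpha, YS, XS)) for d in range(2)]
print(alpha, b, w)  # [0.5, 0.0, 0.5, 0.0] 0.0 [1.0, 0.0]
```

Only the two points closest to the boundary get nonzero $\alpha_i$: these are the support vectors predicted by complementary slackness.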

slide-42
SLIDE 42

SVM Dual Problem

  • Classification of a test point

$$f(x) = \langle w, x \rangle + b = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$$

  • To get $b$, use the fact that $y_i f(x_i) = 1$ for any support vector.
  • For numerical stability, average over all support vectors.
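A minimal sketch of both bullets, assuming a hypothetical two-point data set whose dual optimum $\alpha = (\tfrac12, \tfrac12)$ was worked out by hand. Since $y_i \in \{\pm 1\}$, the condition $y_i f(x_i) = 1$ gives $b = y_i - \sum_j \alpha_j y_j K(x_j, x_i)$ for each support vector, and the sketch averages these values. The kernel is a parameter, so swapping in another kernel would change only `kernel`:

```python
def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def kernel_expansion(x, alphas, xs, ys, kernel=linear_kernel):
    """sum_i alpha_i y_i K(x_i, x): the decision function without the bias."""
    return sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, ys, xs))

def bias(alphas, xs, ys, kernel=linear_kernel, eps=1e-8):
    """Average over support vectors (alpha_i > eps) of b_i = y_i - sum_j alpha_j y_j K(x_j, x_i)."""
    support = [i for i, a in enumerate(alphas) if a > eps]
    return sum(ys[i] - kernel_expansion(xs[i], alphas, xs, ys, kernel) for i in support) / len(support)

def decision(x, alphas, xs, ys, b, kernel=linear_kernel):
    return kernel_expansion(x, alphas, xs, ys, kernel) + b

XS = [(2.0, 0.0), (0.0, 0.0)]
YS = [1, -1]
ALPHAS = [0.5, 0.5]  # dual optimum for this two-point set, worked out by hand

b = bias(ALPHAS, XS, YS)
print(b)                                        # -1.0
print(decision((3.0, 0.0), ALPHAS, XS, YS, b))  # 2.0, classified as +1
```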

slide-43
SLIDE 43

Soft Margin SVM

Hard Margin SVM

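$$\min_{w, b} \sum_i E_\infty\big(1 - (\langle w, x_i \rangle + b)\, y_i\big) + \tfrac{1}{2}\|w\|^2$$

where $E_\infty(t) = \infty$ if $t > 0$ and $E_\infty(t) = 0$ if $t \le 0$.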

slide-44
SLIDE 44

Soft Margin SVM

Hard Margin SVM

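$$\min_{w, b} \sum_i E_\infty\big(1 - (\langle w, x_i \rangle + b)\, y_i\big) + \tfrac{1}{2}\|w\|^2$$

where $E_\infty(t) = \infty$ if $t > 0$ and $E_\infty(t) = 0$ if $t \le 0$.

[Figure: loss plotted against $y_i f(x_i)$]

The first term is the loss; the second term is the regularization.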

slide-45
SLIDE 45

Soft Margin SVM

Relax it a little bit

$$\min_{w, b} \sum_i E_C\big(1 - (\langle w, x_i \rangle + b)\, y_i\big) + \tfrac{1}{2}\|w\|^2$$

where $E_C(t) = Ct$ if $t \ge 0$ and $E_C(t) = 0$ if $t < 0$.

slide-46
SLIDE 46

Soft Margin SVM

Relax it a little bit

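$$\min_{w, b} \sum_i E_C\big(1 - (\langle w, x_i \rangle + b)\, y_i\big) + \tfrac{1}{2}\|w\|^2$$

where $E_C(t) = Ct$ if $t \ge 0$ and $E_C(t) = 0$ if $t < 0$.

[Figure: loss plotted against $y_i f(x_i)$]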

slide-47
SLIDE 47

Soft Margin SVM

Relax it a little bit

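$$\min_{w, b} C \sum_i \big[1 - (\langle w, x_i \rangle + b)\, y_i\big]_+ + \tfrac{1}{2}\|w\|^2$$

where $[t]_+ = \max(0, t)$ (the hinge loss).

[Figure: loss plotted against $y_i f(x_i)$]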

slide-48
SLIDE 48

Soft Margin SVM

Equivalent Formulation

$$\min_{w, b, \xi} C \sum_i \xi_i + \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad \xi_i \ge 0, \quad (\langle w, x_i \rangle + b)\, y_i \ge 1 - \xi_i$$
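Why the two formulations agree: for fixed $(w, b)$, the smallest slack satisfying both constraints is $\xi_i = \max(0, 1 - (\langle w, x_i \rangle + b)\, y_i)$, which is exactly the hinge term. A minimal sketch on hypothetical data (the third point violates the margin), with illustrative function names:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge_objective(w, b, xs, ys, C):
    """C * sum_i [1 - (<w, x_i> + b) y_i]_+ + 1/2 ||w||^2."""
    return C * sum(max(0.0, 1.0 - (dot(w, x) + b) * y) for x, y in zip(xs, ys)) + 0.5 * dot(w, w)

def best_slacks(w, b, xs, ys):
    """Smallest xi_i with xi_i >= 0 and (<w, x_i> + b) y_i >= 1 - xi_i, for fixed (w, b)."""
    return [max(0.0, 1.0 - (dot(w, x) + b) * y) for x, y in zip(xs, ys)]

def slack_objective(w, xi, C):
    """C * sum_i xi_i + 1/2 ||w||^2 from the constrained formulation."""
    return C * sum(xi) + 0.5 * dot(w, w)

XS = [(2.0, 0.0), (0.0, 0.0), (0.5, 0.0)]  # hypothetical data; the third point is inside the margin
YS = [1, -1, 1]
w, b, C = (1.0, 0.0), -1.0, 1.0

xi = best_slacks(w, b, XS, YS)
print(xi)                                # [0.0, 0.0, 1.5]
print(hinge_objective(w, b, XS, YS, C))  # 2.0
print(slack_objective(w, xi, C))         # 2.0
```

Any larger feasible slack only increases $C \sum_i \xi_i$, so minimizing over $\xi$ recovers the hinge-loss objective.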

slide-49
SLIDE 49

Conclusions

  • Duality allows for establishing a lower bound on a minimization problem.
  • Key idea

– “min max” upper bounds “max min”

  • Strong duality → necessity of KKT conditions
  • Duality on SVMs

– Kernel trick
– Support vectors

  • Soft margin SVM = hinge loss
slide-50
SLIDE 50

Resources

  • Bishop, “Pattern Recognition and Machine Learning”, Chapter 7
  • Gordon & Tibshirani, 10725 Optimization (Fall 2012) lecture slides: http://www.cs.cmu.edu/~ggordon/10725-F12/schedule.html
  • Fiterau, Kernels and SVM: http://alex.smola.org/teaching/cmu2013-10-701/slides/6_Recitation_Kernels.pdf