10701 recitation 5


  1. 10701 Recitation 5: Duality and SVM (Ahmed Hefny)

  2. Outline
     • Lagrangian and Duality
       – The Lagrangian
       – Duality
       – Examples
     • Support Vector Machines
       – Primal Formulation
       – Dual Formulation
       – Soft Margin and Hinge Loss

  3. Lagrangian
     • Consider the problem $\min_x f(x)$ s.t. $g_i(x) = 0$
     • Add a Lagrange multiplier for each constraint: $L(x, u) = f(x) + \sum_i u_i g_i(x)$

  4. Lagrangian
     • Lagrangian: $L(x, u) = f(x) + \sum_i u_i g_i(x)$
     • Setting the gradient to 0 gives
       – $g_i(x) = 0$ [feasible point]
       – $\nabla f(x) + \sum_i u_i \nabla g_i(x) = 0$ [cannot decrease $f$ except by violating the constraints]
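Here is a minimal worked example of the two stationarity conditions. The toy problem is an illustrative assumption, not taken from the slides: minimize $f(x) = x_1^2 + x_2^2$ s.t. $g_1(x) = x_1 + x_2 - 1 = 0$. Both conditions are linear in $(x_1, x_2, u_1)$ for this problem, so a single NumPy solve finds the stationary point:

```python
import numpy as np

# Toy problem (illustrative assumption): minimize f(x) = x1^2 + x2^2
# subject to g(x) = x1 + x2 - 1 = 0.
# Lagrangian: L(x, u) = x1^2 + x2^2 + u * (x1 + x2 - 1).
# dL/dx1 = 2*x1 + u = 0
# dL/dx2 = 2*x2 + u = 0
# dL/du  = x1 + x2 - 1 = 0   (recovers the constraint: feasible point)
K = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 0.0],
])
rhs = np.array([0.0, 0.0, 1.0])

x1, x2, u = np.linalg.solve(K, rhs)
print(f"x = ({x1:.3f}, {x2:.3f}), u = {u:.3f}")  # x = (0.500, 0.500), u = -1.000
```

At the solution $\nabla f(x) = (1, 1) = -u_1 \nabla g_1(x)$. The gradient of $f$ is therefore normal to the constraint surface, so no feasible direction decreases $f$.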

  5. Lagrangian
     • Consider the problem $\min_x f(x)$ s.t. $g_i(x) = 0$, $h_j(x) \le 0$
     • Add a Lagrange multiplier for each constraint: $L(x, u, \lambda) = f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x)$

  6. Duality

  7. Duality
     • Primal problem: $\min_x f(x)$ s.t. $g_i(x) = 0$, $h_j(x) \le 0$
     • Equivalent to $\min_x \Big[ f(x) + \max_{\lambda \ge 0,\, u} \Big( \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x) \Big) \Big]$

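  8. Duality
     • Primal problem: $\min_x f(x)$ s.t. $g_i(x) = 0$, $h_j(x) \le 0$
     • Equivalent to $\min_x \begin{cases} f(x) & \text{if } x \text{ is feasible} \\ \infty & \text{o.w.} \end{cases}$, because the inner maximum is $0$ at a feasible $x$ and $\infty$ as soon as any constraint is violated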

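  9. Duality
     • Dual problem: $\max_{\lambda \ge 0,\, u} \min_x \Big[ f(x) + \sum_i u_i g_i(x) + \sum_j \lambda_j h_j(x) \Big]$, where the inner minimum is the Lagrangian dual function $L(\lambda, u)$
     • Dual function:
       – Concave, regardless of the convexity of the primal (it is a pointwise minimum of functions that are affine in $(\lambda, u)$)
       – Lower bound on the primal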
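Both properties of the dual function can be checked numerically on a deliberately nonconvex toy problem. The sketch assumes $f(x) = x^4 - 3x^2 + x$ with one inequality constraint $h(x) = 1 - x \le 0$. It approximates every minimization over $x$ by a dense grid, so it illustrates the claims rather than proving them:

```python
import numpy as np

# Illustrative nonconvex primal (an assumption, not from the slides):
#   minimize f(x) = x^4 - 3x^2 + x   subject to   h(x) = 1 - x <= 0
# Every "min over x" is approximated by a dense grid of candidate x values.
xs = np.linspace(-3.0, 3.0, 6001)
f = xs**4 - 3.0 * xs**2 + xs
h = 1.0 - xs

# Primal optimum over the grid: the best objective among feasible points.
p_star = f[h <= 0].min()

# Dual function L(lam) = min_x [f(x) + lam * h(x)] on a grid of lam >= 0.
lams = np.linspace(0.0, 10.0, 1001)
dual = np.array([np.min(f + lam * h) for lam in lams])

# Weak duality: every dual value is a lower bound on the primal optimum.
print("p* =", p_star, " max dual =", dual.max())
assert np.all(dual <= p_star + 1e-12)

# Concavity: second differences on a uniform lam grid are never positive.
second_diff = dual[:-2] - 2.0 * dual[1:-1] + dual[2:]
assert np.all(second_diff <= 1e-9)
```

The grid stands in for the exact minimization. Each grid point contributes one function that is affine in $\lambda$, so the computed dual is concave and piecewise linear exactly as the slide states.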

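  10. Duality
     • Primal problem: $\min_x \max_{\lambda \ge 0} L(x, \lambda)$
     [figure: table of values $L(x, \lambda)$, rows indexed by $x$, columns by $\lambda$]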

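  11. Duality
     • Primal problem: $\min_x \max_{\lambda \ge 0} L(x, \lambda)$
     • For each row (choice of $x$), pick the largest element, then select the minimum.
     [figure: table of values $L(x, \lambda)$, rows indexed by $x$, columns by $\lambda$]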

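  12. Duality
     • Dual problem: $\max_{\lambda \ge 0} \min_x L(x, \lambda)$
     • For each column (choice of $\lambda$), pick the smallest element, then select the maximum.
     [figure: table of values $L(x, \lambda)$, rows indexed by $x$, columns by $\lambda$]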

  13. Duality
     • Claim: $\min_x \max_{\lambda \ge 0} L(x, \lambda) \ge \max_{\lambda \ge 0} \min_x L(x, \lambda)$
     [figure: same table, with the entry $(x^*, \lambda^*)$ marked]

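  14. Duality
     • Claim: $\min_x \max_{\lambda \ge 0} L(x, \lambda) \ge \max_{\lambda \ge 0} \min_x L(x, \lambda)$
     • Let $x^*$ be the primal minimizer and $\lambda^* = \arg\max_{\lambda \ge 0} L(x^*, \lambda)$. For any $\lambda \ge 0$: $\min_x L(x, \lambda) \le L(x^*, \lambda) \le L(x^*, \lambda^*)$
     • The difference between the primal minimum and the dual maximum is called the duality gap
     • Duality gap $= 0$ $\Rightarrow$ strong duality
     [figure: same table, with the entry $(x^*, \lambda^*)$ marked]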
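The row/column picture can be reproduced with a finite table of values. In this sketch a random matrix stands in for $L(x, \lambda)$: rows are choices of $x$ and columns are choices of $\lambda \ge 0$. The helper name `minimax_and_maximin` is hypothetical:

```python
import numpy as np

def minimax_and_maximin(M):
    """Rows of M index choices of x, columns index choices of lambda.

    primal: for each row pick the largest entry, then take the minimum.
    dual:   for each column pick the smallest entry, then take the maximum.
    """
    primal = M.max(axis=1).min()
    dual = M.min(axis=0).max()
    return primal, dual

rng = np.random.default_rng(0)
for _ in range(1000):
    M = rng.normal(size=(5, 7))
    primal, dual = minimax_and_maximin(M)
    # Weak duality: the duality gap (primal - dual) is never negative.
    assert primal >= dual
```

The check only compares entries and never does arithmetic, so it holds exactly for every sampled matrix. This is the same inequality as the claim on the slide.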

  15. Duality
     • When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?
     [figure: same table, with the entry $(x^*, \lambda^*)$ marked]

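  16. Duality
     • When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?
     • When $(x^*, \lambda^*)$ is a saddle point: $L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*)$ for all $x$ and all $\lambda \ge 0$
     [figure: same table, with the saddle point $(x^*, \lambda^*)$ highlighted]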

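  17. Duality
     • When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?
     • When $(x^*, \lambda^*)$ is a saddle point: $L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*)$ for all $x$ and all $\lambda \ge 0$
     • Necessity: by definition of the dual
     • Sufficiency: $L(\lambda) = \min_x L(x, \lambda) \le L(x^*, \lambda^*)$ and $L(\lambda^*) = L(x^*, \lambda^*)$
     [figure: same table, with the saddle point $(x^*, \lambda^*)$ highlighted]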

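  18. Duality
     • When does $\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$?
     • When $(x^*, \lambda^*)$ is a saddle point: $L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*)$ for all $x$ and all $\lambda \ge 0$
     • Necessity: by definition of the dual
     • Sufficiency: $L(\lambda) = \min_x L(x, \lambda) \le L(x^*, \lambda) \le L(x^*, \lambda^*)$ for every $\lambda \ge 0$, and $L(\lambda^*) = \min_x L(x, \lambda^*) = L(x^*, \lambda^*)$
     • The dual function at $\lambda^*$ attains this upper bound
     [figure: same table, with the saddle point $(x^*, \lambda^*)$ highlighted]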
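For a finite table the saddle-point condition is easy to test directly. An entry is a saddle point when it is the largest entry in its row and the smallest entry in its column. The minimal sketch below uses two hand-picked matrices (illustrative assumptions). It shows a zero gap when a saddle point exists and a positive gap when it does not:

```python
import numpy as np

def saddle_points(M):
    """Return (row, col) pairs where M[row, col] is the largest entry in its
    row and the smallest entry in its column. Rows play the role of x and
    columns the role of lambda."""
    row_max = M.max(axis=1, keepdims=True)
    col_min = M.min(axis=0, keepdims=True)
    rows, cols = np.nonzero((M == row_max) & (M == col_min))
    return list(zip(rows.tolist(), cols.tolist()))

def duality_gap(M):
    """min over rows of the row max, minus max over columns of the column min."""
    return M.max(axis=1).min() - M.min(axis=0).max()

with_saddle = np.array([[1, 3, 2],
                        [6, 5, 7],
                        [0, 4, 8]])
without_saddle = np.array([[1, 0],
                           [0, 1]])

print(saddle_points(with_saddle), duality_gap(with_saddle))        # [(0, 1)] 0
print(saddle_points(without_saddle), duality_gap(without_saddle))  # [] 1
```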

  19. Duality
     • If strong duality holds, the KKT conditions hold at the optimal point:
       – Stationarity: $\nabla_x L(x, u, \lambda) = 0$
       – Primal feasibility
       – Dual feasibility ($\lambda \ge 0$)
       – Complementary slackness ($\lambda_j h_j(x) = 0$)
     • KKT conditions are
       – Sufficient (for convex problems)
       – Necessary under strong duality
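The sketch below checks the four conditions for an illustrative convex problem (assumed here, not from the slides): minimize $(x - 2)^2$ s.t. $h(x) = x - 1 \le 0$. Stationarity of $L(x, \lambda) = (x - 2)^2 + \lambda (x - 1)$ at the active constraint $x = 1$ gives $\lambda = 2$. The function name `kkt_report` is hypothetical:

```python
def kkt_report(x, lam, tol=1e-9):
    """Check the four KKT conditions for the illustrative convex problem
    minimize (x - 2)^2  subject to  h(x) = x - 1 <= 0."""
    grad_lagrangian = 2.0 * (x - 2.0) + lam   # d/dx [(x - 2)^2 + lam * (x - 1)]
    h = x - 1.0
    return {
        "stationarity": abs(grad_lagrangian) <= tol,
        "primal_feasibility": h <= tol,
        "dual_feasibility": lam >= -tol,
        "complementary_slackness": abs(lam * h) <= tol,
    }

print(kkt_report(1.0, 2.0))  # all four conditions True: x* = 1, lambda* = 2
print(kkt_report(0.5, 0.0))  # stationarity fails: 2 * (0.5 - 2) = -3
```

This problem is convex, so a point that passes all four checks is optimal. For a nonconvex problem the same checks would only identify candidates.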

  20. Example: LP
     • Primal: $\min_x c^T x$ s.t. $Ax \ge b$

  21. Example: LP
     • Primal: $\min_x c^T x$ s.t. $Ax \ge b$
     • Lagrangian: $L(x, \lambda) = c^T x - \lambda^T (Ax - b)$

  22. Example: LP
     • Dual function: $L(\lambda) = \min_x \; c^T x - \lambda^T (Ax - b)$

  23. Example: LP
     • Dual function: $L(\lambda) = \min_x \; c^T x - \lambda^T (Ax - b)$
     • Set the gradient w.r.t. $x$ to 0: $c - A^T \lambda = 0$

  24. Example: LP
     • Dual function: $L(\lambda) = \min_x \; c^T x - \lambda^T (Ax - b)$
     • Set the gradient w.r.t. $x$ to 0: $c - A^T \lambda = 0$
     • Dual problem: $\max_{\lambda \ge 0} \lambda^T b$ s.t. $c - A^T \lambda = 0$
       – Why keep this as a constraint?
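Strong duality for LPs can be observed on a tiny instance. The sketch assumes a two-variable LP chosen for illustration and finds both optima by brute-force vertex enumeration. That approach only works for very small, bounded problems; in practice one would call an LP solver such as `scipy.optimize.linprog` instead:

```python
import itertools
import numpy as np

# Tiny illustrative LP (an assumption, not from the slides):
#   primal: minimize c^T x   s.t.  A x >= b
#   dual:   maximize b^T lam s.t.  A^T lam = c, lam >= 0
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
c = np.array([1.0, 2.0])

def primal_vertices(A, b, tol=1e-9):
    """Yield feasible points where n linearly independent constraints are tight."""
    m, n = A.shape
    for rows in itertools.combinations(range(m), n):
        sub = A[list(rows)]
        if abs(np.linalg.det(sub)) < tol:
            continue
        x = np.linalg.solve(sub, b[list(rows)])
        if np.all(A @ x >= b - tol):
            yield x

def dual_vertices(A, c, tol=1e-9):
    """Yield basic feasible solutions of A^T lam = c, lam >= 0."""
    m, n = A.shape
    for cols in itertools.combinations(range(m), n):
        sub = A.T[:, list(cols)]
        if abs(np.linalg.det(sub)) < tol:
            continue
        lam = np.zeros(m)
        lam[list(cols)] = np.linalg.solve(sub, c)
        if np.all(lam >= -tol):
            yield lam

primal_value = min(c @ x for x in primal_vertices(A, b))
dual_value = max(b @ lam for lam in dual_vertices(A, c))
print(primal_value, dual_value)  # both equal 1: zero duality gap
```

The dual enumeration keeps $A^T \lambda = c$ as an explicit constraint. Any $\lambda$ that violates it makes $\min_x$ of the Lagrangian unbounded below, so such points never contribute to the dual maximum.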

  25. Example: LASSO
     • We will use duality to transform LASSO into a QP

  26. Example: LASSO
     • Primal: $\min_w \frac{1}{2} \|y - Xw\|_2^2 + \gamma \|w\|_1$
     • What is the dual function in this case?

  27. Example: LASSO
     • Reformulated primal: $\min_{w, z} \frac{1}{2} \|y - z\|_2^2 + \gamma \|w\|_1$ s.t. $z = Xw$
     • Dual function: $L(\lambda) = \min_{z, w} \frac{1}{2} \|y - z\|_2^2 + \gamma \|w\|_1 + \lambda^T (z - Xw)$

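  28. Example: LASSO
     • Dual function: $L(\lambda) = \min_{z, w} \frac{1}{2} \|y - z\|_2^2 + \gamma \|w\|_1 + \lambda^T (z - Xw)$
     • Setting the gradient w.r.t. $z$ to zero gives $z = y - \lambda$
     • The minimum over $w$ is finite (and equal to $0$) only if $\|X^T \lambda\|_\infty \le \gamma$; otherwise it is $-\infty$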

  29. Example: LASSO
     • Dual problem: $\max_\lambda -\frac{1}{2} \|\lambda\|_2^2 + \lambda^T y$ s.t. $\|X^T \lambda\|_\infty \le \gamma$
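As a sanity check on the derived dual, the sketch below solves the primal with ISTA, a proximal-gradient method that is not discussed in the slides. The random data and $\gamma = 1$ are illustrative assumptions. The sketch then recovers a dual point from $z = y - \lambda$ with $z = Xw$ and compares the two objective values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, gamma = 20, 5, 1.0                 # illustrative sizes and regularization weight
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=n)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Primal: ISTA (proximal gradient) on 1/2 ||y - Xw||^2 + gamma * ||w||_1.
step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
w = np.zeros(p)
for _ in range(5000):
    w = soft_threshold(w - step * X.T @ (X @ w - y), step * gamma)

# Dual candidate from z = y - lam and z = Xw.
lam = y - X @ w
# Scale lam into the feasible set ||X^T lam||_inf <= gamma (no-op if already inside).
lam *= min(1.0, gamma / np.max(np.abs(X.T @ lam)))

primal = 0.5 * np.sum((y - X @ w) ** 2) + gamma * np.sum(np.abs(w))
dual = -0.5 * lam @ lam + lam @ y
print(f"primal={primal:.6f} dual={dual:.6f} gap={primal - dual:.2e}")
```

Scaling $\lambda$ into the feasible set keeps the dual value a valid lower bound. The printed gap therefore certifies how close the ISTA iterate is to optimal.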

  30. Support Vector Machines
     [figure; image source: docs.opencv.org]

  31. Support Vector Machines
     • Find the maximum-margin hyperplane
     • "Distance" from a point $x_i$ to the hyperplane $\langle w, x \rangle + b = 0$ is given by $d_i = (\langle w, x_i \rangle + b) / \|w\|$
     • Margin $= \min_i y_i d_i = \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$
     • Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$

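  32. Support Vector Machines
     • Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$
     • Unpleasant (max min?)
     • No unique solution: rescaling $(w, b)$ by any $s > 0$ leaves the objective unchanged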

  33. Support Vector Machines
     • Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$ s.t. ???

  34. Support Vector Machines
     • Max margin: $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$ s.t. $\min_i (\langle w, x_i \rangle + b) y_i = 1$
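The scale invariance and the canonical rescaling can be seen numerically. The minimal sketch below uses a toy separable dataset (an illustrative assumption). `geometric_margin` is a hypothetical helper that computes the max-margin objective:

```python
import numpy as np

# Toy separable data (illustrative assumption).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def geometric_margin(w, b):
    """min_i y_i (<w, x_i> + b) / ||w||, the max-margin objective."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

w, b = np.array([1.0, 0.5]), 0.2           # some separating hyperplane
for s in (0.1, 1.0, 7.0):
    # Rescaling (w, b) by s > 0 leaves the objective unchanged: no unique solution.
    print(s, geometric_margin(s * w, s * b))

# Canonical representation: rescale so that min_i y_i (<w, x_i> + b) = 1.
scale = np.min(y * (X @ w + b))
w_c, b_c = w / scale, b / scale
print(np.min(y * (X @ w_c + b_c)))         # 1 (up to rounding)
print(geometric_margin(w_c, b_c), 1.0 / np.linalg.norm(w_c))  # the two agree
```

After the rescaling, the margin is exactly $1/\|w\|$. This is why maximizing the margin becomes minimizing $\frac{1}{2}\|w\|^2$ in the next step.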

  35. Support Vector Machines
     • Max margin: $\min_{w,b} \frac{1}{2} \|w\|^2$ s.t. $\min_i (\langle w, x_i \rangle + b) y_i = 1$

  36. Support Vector Machines
     • Max margin (canonical representation): $\min_{w,b} \frac{1}{2} \|w\|^2$ s.t. $(\langle w, x_i \rangle + b) y_i \ge 1, \ \forall i$
     • QP, much better than $\max_{w,b} \frac{1}{\|w\|} \min_i (\langle w, x_i \rangle + b) y_i$

  37. SVM Dual Problem
     • Recall that the Lagrangian is formed by adding a Lagrange multiplier for each constraint:
       $L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ (\langle w, x_i \rangle + b) y_i - 1 \right]$

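  38. SVM Dual Problem
     • $L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ (\langle w, x_i \rangle + b) y_i - 1 \right]$
     • Fix $\alpha$ and minimize w.r.t. $w, b$:
       – $w - \sum_i \alpha_i y_i x_i = 0$
       – $\sum_i \alpha_i y_i = 0$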

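  39. SVM Dual Problem
     • $L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ (\langle w, x_i \rangle + b) y_i - 1 \right]$
     • Fix $\alpha$ and minimize w.r.t. $w, b$:
       – $w - \sum_i \alpha_i y_i x_i = 0$ (plug in)
       – $\sum_i \alpha_i y_i = 0$ (constraint; why?)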

  40. SVM Dual Problem
     • Dual problem: $\max_\alpha -\frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$ s.t. $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0$
     • Another QP. So what?

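  41. SVM Dual Problem
     • Only inner products $\Rightarrow$ kernel trick
     • Complementary slackness $\Rightarrow$ support vectors ($\alpha_i > 0$ only for points with $(\langle w, x_i \rangle + b) y_i = 1$)
     • KKT conditions lead to efficient optimization algorithms (compared to a general QP solver)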

  42. SVM Dual Problem
     • Classification of a test point: $f(x) = \langle w, x \rangle + b = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$
     • To get $b$, use the fact that $y_i f(x_i) = 1$ for any support vector
     • For numerical stability, average over all support vectors
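The minimal sketch below shows how the dual solution is used at test time. The four-point dataset and its dual solution $\alpha$ are illustrative assumptions, worked out by hand: $w = (0.5, 0.5)$, $b = 0$, and only the first two points are support vectors. Only inner products of the data appear, so replacing the hypothetical `linear_kernel` helper with another kernel function is where the kernel trick would enter:

```python
import numpy as np

# Toy data plus a dual solution alpha worked out by hand (illustrative assumption):
# only x_0 and x_1 lie on the margin, so only they get alpha_i > 0.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 3.0], [-3.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = np.array([0.25, 0.25, 0.0, 0.0])

def linear_kernel(a, b):
    """Matrix of inner products <a_i, b_j>."""
    return a @ b.T

K = linear_kernel(X, X)                  # Gram matrix <x_i, x_j>
support = alpha > 1e-8                   # support vectors: alpha_i > 0

# Dual feasibility and the dual objective.
assert np.all(alpha >= 0) and abs(alpha @ y) < 1e-12
dual_obj = alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

# w = sum_i alpha_i y_i x_i; b from y_i f(x_i) = 1, averaged over support vectors.
w = (alpha * y) @ X
b = np.mean(y[support] - K[support] @ (alpha * y))
primal_obj = 0.5 * w @ w

def decision_function(x_new):
    """f(x) = sum_i alpha_i y_i <x_i, x> + b, using only inner products."""
    return linear_kernel(np.atleast_2d(x_new), X) @ (alpha * y) + b

print(w, b, primal_obj, dual_obj)        # w = (0.5, 0.5), b = 0, both objectives 0.25
print(y * decision_function(X))          # margins 1, 1, 2.5, 2
```

The non-support points have margins above 1 and $\alpha_i = 0$, which is complementary slackness in action.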

  43. Soft Margin SVM
     • Hard margin SVM: $\min_{w,b} \sum_i E_\infty\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$, where $E_\infty(t) = \infty$ if $t > 0$ and $0$ if $t \le 0$

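  44. Soft Margin SVM
     • Hard margin SVM: $\min_{w,b} \sum_i E_\infty\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$, where $E_\infty(t) = \infty$ if $t > 0$ and $0$ if $t \le 0$
     • First term: loss, a function of $y_i f(x_i)$; second term: regularization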

  45. Soft Margin SVM
     • Relax it a little bit: $\min_{w,b} \sum_i E_C\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$, where $E_C(t) = Ct$ if $t \ge 0$ and $0$ if $t < 0$

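  46. Soft Margin SVM
     • Relax it a little bit: $\min_{w,b} \sum_i E_C\big(1 - (\langle w, x_i \rangle + b) y_i\big) + \frac{1}{2} \|w\|^2$, where $E_C(t) = Ct$ if $t \ge 0$ and $0$ if $t < 0$
     • First term: loss, a function of $y_i f(x_i)$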

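  47. Soft Margin SVM
     • Relax it a little bit: $\min_{w,b} \frac{1}{2} \|w\|^2 + C \sum_i \left[ 1 - (\langle w, x_i \rangle + b) y_i \right]_+$, where $[t]_+ = \max(0, t)$
     • The second term is the (hinge) loss, a function of $y_i f(x_i)$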

  48. Soft Margin SVM
     • Equivalent formulation: $\min_{w,b,\xi} \frac{1}{2} \|w\|^2 + C \sum_i \xi_i$ s.t. $\xi_i \ge 0$, $(\langle w, x_i \rangle + b) y_i \ge 1 - \xi_i$
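The hinge-loss and slack formulations are equivalent because, for fixed $(w, b)$, the smallest feasible slack is $\xi_i = \max(0, 1 - y_i f(x_i))$. The short numerical check below uses random data and a fixed $(w, b, C)$, all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=8) > 0, 1.0, -1.0)   # toy labels
w, b, C = np.array([1.0, -0.5]), 0.1, 2.0                         # any fixed (w, b)

margins = y * (X @ w + b)                    # y_i f(x_i)

# Unconstrained soft-margin objective with the hinge loss [1 - y_i f(x_i)]_+.
hinge_obj = 0.5 * w @ w + C * np.sum(np.maximum(0.0, 1.0 - margins))

# Slack formulation: for fixed (w, b) the smallest feasible xi_i is
# max(0, 1 - y_i f(x_i)); any smaller xi_i would violate a constraint.
xi = np.maximum(0.0, 1.0 - margins)
assert np.all(xi >= 0) and np.all(margins >= 1.0 - xi - 1e-12)
slack_obj = 0.5 * w @ w + C * np.sum(xi)

print(hinge_obj, slack_obj)                  # identical
```

Minimizing the slack objective over $\xi$ first leaves exactly the hinge-loss objective in $(w, b)$. The two problems therefore share the same optimal $(w, b)$.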
