QP & cone program duality; Support vector machines
10-725 Optimization (PowerPoint presentation)


  1. QP & cone program duality; Support vector machines (10-725 Optimization, Fall 2012: Geoff Gordon, Ryan Tibshirani)

  2. Review • Quadratic programs • Cone programs ‣ SOCP, SDP ‣ QP ⊆ SOCP ⊆ SDP ‣ SOC and S_+ are self-dual • Poly-time algorithms (but not strongly poly-time, yet) • Examples: group lasso, Huber regression, matrix completion

  3. Matrix completion • Observe A_ij for ij ∈ E; the mask P has P_ij = 1 for ij ∈ E and 0 otherwise • min_X ‖(X − A) ∘ P‖_F² + λ ‖X‖_*
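One standard way to attack this objective (not necessarily the method covered in lecture) is the soft-impute iteration: alternately fill in the unobserved entries with the current estimate and soft-threshold the singular values. A minimal NumPy sketch on made-up toy data:

```python
import numpy as np

def soft_impute(A, mask, lam, iters=200):
    """Soft-impute iteration for the nuclear-norm completion problem
    min_X (1/2)||(X - A) o P||_F^2 + lam * ||X||_*."""
    X = np.zeros_like(A)
    for _ in range(iters):
        # keep the observed entries of A, current estimate elsewhere
        Z = np.where(mask, A, X)
        # soft-threshold the singular values (prox of lam * nuclear norm)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
    return X

# rank-1 ground truth with ~30% of entries hidden (hypothetical data)
rng = np.random.default_rng(0)
A = np.outer(rng.standard_normal(6), rng.standard_normal(5))
mask = rng.random(A.shape) > 0.3
X = soft_impute(A, mask, lam=0.01)
```

Each singular-value soft-thresholding step is the proximal operator of λ‖X‖_*, so the loop is a proximal method for the completion objective; with small λ the recovered X matches A closely on the observed entries.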

  4. Max-variance unfolding, aka semidefinite embedding • Goal: given x_1, …, x_T ∈ R^n ‣ find y_1, …, y_T ∈ R^k (k ≪ n) ‣ ‖y_i − y_j‖ ≈ ‖x_i − x_j‖ ∀ i,j ∈ E • If the x_i were near a k-dim subspace of R^n, PCA would suffice • Instead, two steps: ‣ first look for z_1, …, z_T ∈ R^n with ‖z_i − z_j‖ = ‖x_i − x_j‖ ∀ i,j ∈ E and var(z) as big as possible ‣ then use PCA to get y_i from z_i [figure: 2-D example point set]

  5. MVU/SDE • max_z tr(cov(z)) s.t. ‖z_i − z_j‖ = ‖x_i − x_j‖ ∀ i,j ∈ E

  6. Result • Embed 400 images of a teapot into 2-D [Weinberger & Saul, AAAI 2006] • Euclidean distance from the query to A is smaller; after MVU, the distance to B is smaller [figure: teapot embedding with query and images A, B]

  7. Duality for QPs and cone programs • Combined QP/CP: ‣ min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L ‣ cones K, L implement any/all of equality, inequality, generalized inequality ‣ assume K, L proper (closed, convex, solid, pointed)

  8. Primal-dual pair • Primal: min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L • Dual: max −z^T H z/2 − b^T y s.t. Hz + c − A^T y ∈ L*, y ∈ K*

  9. KKT conditions: primal-dual pair ‣ primal: min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L ‣ dual: max −b^T y − z^T H z/2 s.t. Hz + c − A^T y ∈ L*, y ∈ K*

  10. KKT conditions ‣ primal feasibility: Ax + b ∈ K, x ∈ L ‣ dual feasibility: Hz + c − A^T y ∈ L*, y ∈ K* ‣ quadratic term: Hx = Hz ‣ complementary slackness: y^T(Ax + b) = 0, x^T(Hz + c − A^T y) = 0
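For the special case K = {0} (pure equality constraints) and L = R^n, the KKT conditions above collapse to one linear system: Hx + c = A^T y together with Ax = b. A NumPy check on a made-up example:

```python
import numpy as np

# Equality-constrained QP: min c^T x + x^T H x / 2  s.t.  A x = b
# (the K = {0}, L = R^n case; numbers are a made-up example).
H = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# KKT system:  [H  -A^T] [x]   [-c]
#              [A    0 ] [y] = [ b]
KKT = np.block([[H, -A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x, y = sol[:2], sol[2:]

# stationarity and primal feasibility both hold at the solution
print(x, y, np.allclose(H @ x + c, A.T @ y), np.allclose(A @ x, b))
```

Here the unconstrained minimizer is (1, 2); the KKT system projects it onto the line x_1 + x_2 = 1, giving x = (0, 1) with multiplier y = −2.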

  11. Support vector machines (separable case)

  12. Maximizing margin • margin M = min_i y_i(x_i · w − b) • max M s.t. M ≤ y_i(x_i · w − b) ∀ i

  13. For example [figure: separable 2-D data set with maximum-margin separator]

  14. Slacks • min ‖v‖²/2 s.t. y_i(x_i^T v − d) ≥ 1 ∀ i [figure: the same 2-D data set]

  15. SVM duality • min ‖v‖²/2 + Σ_i s_i s.t. y_i(x_i^T v − d) ≥ 1 − s_i, s_i ≥ 0 ∀ i • in matrix form: min v^T v/2 + 1^T s s.t. Av − yd + s − 1 ≥ 0, s ≥ 0
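For any fixed candidate (v, d), the optimal slacks have the closed form s_i = max(0, 1 − y_i(x_i^T v − d)): zero for points on the correct side of the margin, positive for violators. A tiny NumPy check on made-up points:

```python
import numpy as np

# Hinge slacks for a candidate separator (v, d); data is hypothetical.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.3, 0.0]])
y = np.array([1.0, -1.0, 1.0])
v, d = np.array([1.0, 0.0]), 0.0

# s_i = max(0, 1 - y_i (x_i^T v - d))
s = np.maximum(0.0, 1.0 - y * (X @ v - d))
print(s)   # third point sits inside the margin, so only its slack is positive
```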

  16. Interpreting the dual • max 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1 [figure: example data annotated by the cases α > 0, α < 1, and the constraint y^T α = 0]

  17. From dual to primal • max 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1 [figure: separator recovered from the dual solution]
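A toy instance (hypothetical data, not from the slides) where the dual is solvable by hand, followed by the dual-to-primal recovery v = Σ_i α_i y_i x_i:

```python
import numpy as np

# Two points: x1 = (1,0) with y1 = +1, x2 = (-1,0) with y2 = -1.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
K = (y[:, None] * X) @ (y[:, None] * X).T   # K_ij = y_i y_j x_i . x_j

# y^T a = 0 forces a1 = a2 = a, so the dual 1^T a - a^T K a / 2
# becomes 2a - 2a^2, maximized at a = 1/2.
alpha = np.array([0.5, 0.5])

# dual -> primal: v = sum_i alpha_i y_i x_i; both points are support vectors
v = (alpha * y) @ X
d = X[0] @ v - 1.0 / y[0]   # from the active constraint y_1(x_1^T v - d) = 1
print(v, d)                 # v = [1. 0.], d = 0.0
```

Both α's are strictly inside (0, 1), so both points are on the margin, which pins down the offset d exactly.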

  18. A suboptimal support set [figure: 2-D data set with a suboptimal choice of support vectors]

  19. SVM duality: the applet

  20. Why is the dual useful? aka the kernel trick • max 1^T α − α^T A A^T α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1 • SVM with n examples, m features ‣ primal: ‣ dual:
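The point of the kernel trick is that the dual touches the data only through the n × n matrix K = AA^T, so any kernel k(x_i, x_j) can be substituted without ever forming m-dimensional feature maps. A NumPy sketch (the dimensions and the Gaussian bandwidth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10_000                  # few examples, many features
X = rng.standard_normal((n, m))
y = rng.choice([-1.0, 1.0], size=n)

A = y[:, None] * X                 # rows y_i x_i, as in the dual above
K_linear = A @ A.T                 # n x n; never builds an m x m object

# Gaussian (RBF) kernel: replaces the inner products x_i . x_j with
# exp(-||x_i - x_j||^2 / (2 sigma^2)), here with sigma^2 = m.
sq = np.sum(X**2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
K_rbf = (y[:, None] * y[None, :]) * np.exp(-D2 / (2 * m))
print(K_linear.shape, K_rbf.shape)   # both (50, 50)
```

The dual's size depends on n alone, so the RBF kernel, whose implicit feature space is infinite-dimensional, costs no more than the linear one.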

  21. Interior-point methods (10-725 Optimization: Geoff Gordon, Ryan Tibshirani)

  22. Ball center, aka Chebyshev center • X = { x | Ax + b ≥ 0 } • Ball center: ‣ ‣ if ‖a_i‖ = 1: ‣ in general:
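In the general case, the ball center is the LP max r s.t. a_i^T x + b_i ≥ r‖a_i‖ (when ‖a_i‖ = 1 the right-hand side is just r). A sketch using scipy.optimize.linprog on a made-up region, the unit box in R²:

```python
import numpy as np
from scipy.optimize import linprog

# Region {x | Ax + b >= 0}: the unit box 0 <= x <= 1 in R^2 (toy example).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])

norms = np.linalg.norm(A, axis=1)
# Variables z = (x, r).  linprog minimizes, so the objective is -r, and
# a_i^T x + b_i >= r ||a_i||  becomes  -a_i^T x + r ||a_i|| <= b_i.
A_ub = np.hstack([-A, norms[:, None]])
c = np.array([0.0, 0.0, -1.0])
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])
print(res.x)   # center (0.5, 0.5), radius 0.5
```

For the unit box the answer is visible by symmetry, which makes it a convenient sanity check for the LP encoding.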

  23. Analytic center • Let s = Ax + b • Analytic center: ‣ ‣

  24. Bad conditioning? No problem.

  25. Newton for analytic center • Lagrangian L(x, s, y) = −∑_i ln s_i + y^T(s − Ax − b)
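Eliminating s and y from this Lagrangian, a Newton step for the analytic center minimizes f(x) = −∑_i ln(a_i^T x + b_i), with gradient −A^T(1/s) and Hessian A^T diag(1/s²) A. A NumPy sketch on the unit box (a made-up region), with a simple backtracking rule to stay strictly feasible:

```python
import numpy as np

# Analytic center of {x | s = Ax + b >= 0}: minimize -sum_i ln s_i.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])   # unit box 0 <= x <= 1 (toy example)

x = np.array([0.1, 0.9])             # any strictly feasible start
for _ in range(20):
    s = A @ x + b
    g = -A.T @ (1.0 / s)             # gradient of the log barrier
    H = A.T @ np.diag(1.0 / s**2) @ A
    dx = np.linalg.solve(H, -g)      # Newton direction
    t = 1.0
    while np.min(A @ (x + t * dx) + b) <= 0:   # backtrack to stay feasible
        t *= 0.5
    x = x + t * dx
print(x)   # converges to the box's analytic center (0.5, 0.5)
```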

  26. Adding an objective • Analytic center was for { x | Ax + b = s ≥ 0 } • Now: min c^T x s.t. Ax + b = s ≥ 0 • Same trick: ‣ min t c^T x − ∑_i ln s_i s.t. Ax + b = s ≥ 0 ‣ parameter t ≥ 0 ‣ central path = ‣ t → 0: t → ∞: ‣ L(x, s, y) =

  27. Newton for central path • L(x, s, y) = t c^T x − ∑_i ln s_i + y^T(s − Ax − b)
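Putting the pieces together gives a bare-bones barrier method: Newton-center for the current t, then increase t and repeat, following the central path toward a solution of min c^T x s.t. Ax + b ≥ 0 (t near 0 gives the analytic center; t → ∞ approaches the optimum). A NumPy sketch on a toy LP, minimizing x_1 over the unit box; the doubling schedule for t is an arbitrary choice:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])   # unit box (toy example)
c = np.array([1.0, 0.0])             # objective: minimize x1

x = np.array([0.5, 0.5])             # strictly feasible start
t = 1.0
for _ in range(30):                  # outer loop: increase t
    for _ in range(20):              # inner loop: Newton on t c^T x - sum ln s_i
        s = A @ x + b
        g = t * c - A.T @ (1.0 / s)
        H = A.T @ np.diag(1.0 / s**2) @ A
        dx = np.linalg.solve(H, -g)
        step = 1.0
        while np.min(A @ (x + step * dx) + b) <= 0:   # stay feasible
            step *= 0.5
        x = x + step * dx
    t *= 2.0
print(x)   # approaches the LP optimum (0, 0.5) along the central path
```

Each outer iteration warm-starts Newton from the previous central-path point, which is why a handful of inner steps per value of t suffices.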
