  1. Administrivia
     • HW4 out
       ‣ based on feedback survey
       ‣ fewer questions: 4, but only do 3
       ‣ range of problem types: focus on those that help your understanding
       ‣ split out “spoilers” for Q2
     • Midterm
       ‣ mean 65 (out of 95), std dev 11.3
       ‣ back at end of class
     Geoff Gordon, 10-725 Optimization, Fall 2012

  2. Review
     • Cone & QP duality
       ‣ $\min\; c^\top x + x^\top H x/2$ s.t. $Ax + b \in K$, $x \in L$
       ‣ $\max\; -z^\top H z/2 - b^\top y$ s.t. $Hz + c - A^\top y \in L^*$, $y \in K^*$
     • KKT conditions
       ‣ primal: $Ax + b \in K$, $x \in L$
       ‣ dual: $Hz + c - A^\top y \in L^*$, $y \in K^*$
       ‣ quadratic: $Hx = Hz$
       ‣ comp. slack: $y^\top (Ax + b) = 0$, $x^\top (Hz + c - A^\top y) = 0$

  3. Review
     • Support vector machines
     • Maximum-variance unfolding
     [Figures omitted; surviving labels: query, B, A]

  4. Support vector machines (10-725 Optimization; Geoff Gordon, Ryan Tibshirani)

  5. SVM duality
     • $\min\; \|v\|^2/2 + \sum_i s_i$ s.t. $y_i (x_i^\top v - d) \ge 1 - s_i$, $s_i \ge 0$
     • $\min\; v^\top v/2 + \mathbf{1}^\top s$ s.t. $Av - yd + s - \mathbf{1} \ge 0$

  6. Interpreting the dual
     • $\max\; \mathbf{1}^\top \alpha - \alpha^\top K \alpha/2$ s.t. $y^\top \alpha = 0$, $0 \le \alpha \le \mathbf{1}$
       ‣ $\alpha$:
       ‣ $\alpha > 0$:
       ‣ $\alpha < 1$:
       ‣ $y^\top \alpha = 0$:
     [Plot omitted]
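A sketch of how the dual on this slide follows from the second primal form on the "SVM duality" slide. Two things are assumptions made here for illustration and are not spelled out in the slides: the multiplier names $\alpha$ (for $Av - yd + s - \mathbf{1} \ge 0$) and $\beta$ (for $s \ge 0$), and the identification $K = AA^\top$, where row $i$ of $A$ is $y_i x_i^\top$.

```latex
\begin{aligned}
L(v,d,s,\alpha,\beta) &= \tfrac12 v^\top v + \mathbf{1}^\top s
   - \alpha^\top (Av - yd + s - \mathbf{1}) - \beta^\top s,
   \qquad \alpha \ge 0,\; \beta \ge 0 \\
\partial L/\partial v = 0 &\;\Rightarrow\; v = A^\top \alpha \\
\partial L/\partial d = 0 &\;\Rightarrow\; y^\top \alpha = 0 \\
\partial L/\partial s = 0 &\;\Rightarrow\; \mathbf{1} - \alpha - \beta = 0
   \;\Rightarrow\; 0 \le \alpha \le \mathbf{1} \\
\text{substituting back} &\;\Rightarrow\;
   \max_{\alpha}\; \mathbf{1}^\top \alpha - \tfrac12\, \alpha^\top K \alpha,
   \qquad K = AA^\top
\end{aligned}
```

Under these assumptions the stationarity condition $v = A^\top \alpha$ is also what recovers the primal weight vector from a dual solution.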

  7. From dual to primal
     • $\max\; \mathbf{1}^\top \alpha - \alpha^\top K \alpha/2$ s.t. $y^\top \alpha = 0$, $0 \le \alpha \le \mathbf{1}$
     [Plot omitted]

  8. A suboptimal support set
     [Plot omitted; horizontal axis spans −1 to 2, vertical axis spans −1 to 2.5]

  9. SVM duality: the applet

  10. Why is the dual useful?
      $\max\; \mathbf{1}^\top \alpha - \alpha^\top K \alpha/2$ s.t. $y^\top \alpha = 0$, $0 \le \alpha \le \mathbf{1}$
      • SVM: $n$ examples, $m$ features: $x_i = \phi(u_i) \in \mathbb{R}^m$
        ‣ primal:
        ‣ dual:

  11. The kernel trick
      • Don’t even need to know features $x_i = \phi(u_i)$, as long as we can compute dot products $x_i^\top x_j$
      • Matrix of dot products:
        ‣ $K_{ij} = x_i^\top x_j = k(u_i, u_j)$
        ‣ only need subroutine for $k$ (don’t care about $\phi$)
        ‣ how do we know $k$ works?
        ‣ $k$ works if the matrix $K$ is positive semidefinite for every finite set of inputs $u_1, \dots, u_n$
        ‣ this is a “positive definite function,” aka “Mercer kernel”; many examples exist

  12. Examples of kernels
      • $K(u_i, u_j) = (1 + u_i^\top u_j)^d$
        ‣ can represent any degree-$d$ polynomial
        ‣ i.e., decision surface is $p(u) = b$ for degree-$d$ poly $p$
      • $K(u_i, u_j) = (u_i^\top u_j)^d$
        ‣ polynomial where all terms have degree exactly $d$
        ‣ $d = 1$ reduces to original (linear) SVM
      • $K(u_i, u_j) = \exp(-\|u_i - u_j\|^2 / (2\sigma^2))$
        ‣ Gaussian radial basis functions of width $\sigma$
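A minimal NumPy sketch of the three kernels listed above, with a numerical check that each kernel matrix is positive semidefinite. The function names, the `offset` parameter, and the random test points are hypothetical choices made for illustration; they do not come from the slides.

```python
import numpy as np

def poly_kernel(U, V, d, offset=1.0):
    """(offset + u^T v)^d for every pair of rows of U and V."""
    return (offset + U @ V.T) ** d

def gaussian_kernel(U, V, sigma):
    """exp(-||u - v||^2 / (2 sigma^2)) for every pair of rows of U and V."""
    sq_dist = (U**2).sum(axis=1)[:, None] + (V**2).sum(axis=1)[None, :] - 2.0 * U @ V.T
    # Clip tiny negative values caused by rounding before exponentiating.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

# Hypothetical data: 6 random points in the plane (not from the slides).
rng = np.random.default_rng(0)
U = rng.normal(size=(6, 2))

kernels = {
    "inhomogeneous poly, d=3": poly_kernel(U, U, d=3),
    "homogeneous poly, d=1": poly_kernel(U, U, d=1, offset=0.0),
    "gaussian, sigma=0.5": gaussian_kernel(U, U, sigma=0.5),
}

# Smallest eigenvalue of each symmetric kernel matrix.
# For a valid kernel it is nonnegative up to rounding error.
min_eig = {name: np.linalg.eigvalsh(K).min() for name, K in kernels.items()}
```

With `offset=0.0` and `d=1` the kernel matrix is just `U @ U.T`, which matches the remark that $d = 1$ reduces to the linear SVM.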

  13. Gaussian kernel
      [Plot omitted: $\sigma = 0.5$; both axes span −2 to 2]

  14. Interior-point methods (10-725 Optimization; Geoff Gordon, Ryan Tibshirani)

  15. Ball center, aka Chebyshev center
      • $X = \{ x \mid Ax + b \ge 0 \}$
      • Ball center:
        ‣ if $\|a_i\| = 1$:
        ‣ in general:
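A sketch of the standard linear-program formulation of the ball center. The slide leaves these entries blank, so treating this as their intended content is an assumption; the radius variable $r$ and the direction $u$ are introduced here for illustration. A ball of radius $r$ around $x$ lies inside $X$ exactly when every constraint still holds at the worst-case point of the ball:

```latex
\begin{aligned}
a_i^\top (x + r u) + b_i \ge 0 \;\; \forall\, \|u\| \le 1
  \quad&\Longleftrightarrow\quad a_i^\top x + b_i \ge r\,\|a_i\| \\
\text{in general:}\quad & \max_{x,\,r}\; r \quad \text{s.t.}\quad
  a_i^\top x + b_i \ge r\,\|a_i\| \;\; \forall i \\
\text{if } \|a_i\| = 1:\quad & \max_x\; \min_i\; (a_i^\top x + b_i)
\end{aligned}
```

Both forms are linear programs in $(x, r)$, since the norms $\|a_i\|$ are constants.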

  16. Ellipsoid center, aka max-volume inscribed ellipsoid
      • Center $d$ of largest inscribed ellipsoid
        ‣ $E = \{ Bu + d \mid \|u\|_2 \le 1 \}$
        ‣ $\mathrm{vol}(E) \ge \mathrm{vol}(X)/n^n$ in $\mathbb{R}^n$
      • $\min\; \log\det B^{-1}$ s.t.
        ‣ $a_i^\top (Bu + d) + b_i \ge 0$ $\forall i$, $\forall u$ with $\|u\| \le 1$
        ‣ $B \succeq 0$
      • Convex optimization, but relatively expensive:
        ‣ convex objective, semidefinite constraint
        ‣ each $(u, a_i, b_i)$ yields a linear constraint on $B$, $d$

  17. Analytic center
      • Let $s = Ax + b$
      • Analytic center:
        ‣ $\arg\min_x\; -\sum_i \ln s_i$

  18. Bad conditioning? No problem.
      • $a_i^\top x + b_i \ge 0$
      • $\min\; -\sum_i \ln(a_i^\top x + b_i)$
      • $y = Mx + q$

  19. Newton for analytic center
      • $f(x) = -\sum_i \ln(a_i^\top x + b_i)$
        ‣ $df/dx = -\sum_i a_i / (a_i^\top x + b_i)$
        ‣ $d^2 f/dx^2 =$
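A minimal sketch of Newton's method for the analytic center, using the gradient given above and the Hessian $\sum_i a_i a_i^\top / (a_i^\top x + b_i)^2 = A^\top S^{-2} A$ (the same matrix that appears on the Dikin ellipsoid slides). The backtracking line search, its constants, the stopping rule, and the triangle example are assumptions made for illustration, not details from the lecture.

```python
import numpy as np

def analytic_center(A, b, x0, tol=1e-12, max_iter=50):
    """Damped Newton on f(x) = -sum_i ln(a_i^T x + b_i); x0 must be strictly feasible."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = A @ x + b                         # slacks, all > 0
        grad = -A.T @ (1.0 / s)               # -sum_i a_i / s_i
        hess = A.T @ (A / s[:, None] ** 2)    # sum_i a_i a_i^T / s_i^2
        dx = np.linalg.solve(hess, -grad)     # Newton direction
        decrement_sq = -grad @ dx             # squared Newton decrement
        if decrement_sq / 2.0 <= tol:
            break
        # Backtracking (assumed constants): keep the iterate strictly feasible
        # and require a sufficient decrease in f.
        f_x = -np.log(s).sum()
        step = 1.0
        while step > 1e-12:
            s_new = A @ (x + step * dx) + b
            if np.all(s_new > 0) and -np.log(s_new).sum() <= f_x - 0.25 * step * decrement_sq:
                break
            step *= 0.5
        x = x + step * dx
    return x

# Hypothetical example: the triangle x1 >= 0, x2 >= 0, x1 + x2 <= 1.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.0])
x_ac = analytic_center(A, b, x0=[0.1, 0.1])
```

For this triangle the barrier is $-\ln x_1 - \ln x_2 - \ln(1 - x_1 - x_2)$, whose minimizer is $(1/3, 1/3)$ by symmetry, so `x_ac` should land there.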

  20. Adding an objective
      • Analytic center was for: find $x$ s.t. $Ax + b \ge 0$
      • Now: $\min\; c^\top x$ s.t. $Ax + b \ge 0$
      • Same trick:
        ‣ $\min\; f_t(x) = c^\top x - (1/t) \sum_i \ln(a_i^\top x + b_i)$
        ‣ parameter $t > 0$
        ‣ central path =
        ‣ $t \to 0$:
        ‣ $t \to \infty$:

  21. Newton for central path
      • $\min\; f_t(x) = c^\top x - (1/t) \sum_i \ln(a_i^\top x + b_i)$
        ‣ $df/dx =$
        ‣ $d^2 f/dx^2 =$
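A hedged sketch of following the central path with Newton's method. Differentiating $f_t$ gives gradient $c - (1/t) A^\top S^{-1} \mathbf{1}$ and Hessian $(1/t) A^\top S^{-2} A$ with $S = \mathrm{diag}(Ax + b)$. The code works with $t \cdot f_t$ instead, which has the same minimizer and the same Newton step but keeps the stopping test on a consistent scale. The schedule for $t$, the line-search constants, and the small LP are assumptions made for illustration.

```python
import numpy as np

def center(A, b, c, t, x, tol=1e-9, max_iter=100):
    """Damped Newton on F(x) = t * c^T x - sum_i ln(a_i^T x + b_i), i.e. t * f_t(x)."""
    def F(z):
        return t * (c @ z) - np.log(A @ z + b).sum()

    for _ in range(max_iter):
        s = A @ x + b
        grad = t * c - A.T @ (1.0 / s)        # t times the gradient of f_t
        hess = A.T @ (A / s[:, None] ** 2)    # t times the Hessian of f_t
        dx = np.linalg.solve(hess, -grad)
        decrement_sq = -grad @ dx
        if decrement_sq / 2.0 <= tol:
            break
        step = 1.0
        while step > 1e-12:                   # backtracking with assumed constants
            z = x + step * dx
            if np.all(A @ z + b > 0) and F(z) <= F(x) - 0.25 * step * decrement_sq:
                break
            step *= 0.5
        x = x + step * dx
    return x

def central_path(A, b, c, x0, t0=1.0, growth=10.0, t_max=1e4):
    """Re-center for t = t0, t0*growth, ... up to t_max, warm-starting each solve."""
    x, t, path = np.asarray(x0, dtype=float), t0, []
    while t <= t_max:
        x = center(A, b, c, t, x)
        path.append((t, x.copy()))
        t *= growth
    return path

# Hypothetical LP: minimize x1 + 2*x2 over the triangle x1 >= 0, x2 >= 0, x1 + x2 <= 1.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.0])
c = np.array([1.0, 2.0])
path = central_path(A, b, c, x0=[0.3, 0.3])
```

Each entry of `path` is a pair `(t, x(t))`. As $t$ grows, the points move toward the LP optimum, which for this example is the vertex at the origin.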

  22. Central path example
      [Plot omitted; surviving labels: objective, $t \to 0$, $t \to \infty$]

  23. Dikin ellipsoid
      • $E(x_0) = \{ x \mid (x - x_0)^\top H (x - x_0) \le 1 \}$
        ‣ $H$ = Hessian of log barrier at $x_0$
        ‣ unit ball of Hessian norm at $x_0$
      • $E(x) \subseteq X$ for any strictly feasible $x$
        ‣ affine constraints can be just feasible
        ‣ $E(x)$: as above, but intersected w/ affine constraints
      • $\mathrm{vol}(E(x_{ac})) \ge \mathrm{vol}(X)/m^n$, where $x_{ac}$ is the analytic center
        ‣ weaker than ellipsoid center, but still very useful

  24. $E(x_0) \subseteq X$
      • $E(x_0) = \{ x \mid (x - x_0)^\top H (x - x_0) \le 1 \}$
        ‣ $H = A^\top S^{-2} A$
        ‣ $S = \mathrm{diag}(s) = \mathrm{diag}(Ax_0 + b)$
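A short worked argument for the containment claimed in the slide title, using only the definitions of $H$ and $S$ given above. It is one standard way to see the result and is not necessarily the argument given in lecture.

```latex
\begin{aligned}
(x - x_0)^\top A^\top S^{-2} A\, (x - x_0)
  &= \sum_i \left( \frac{a_i^\top (x - x_0)}{s_i} \right)^{2} \le 1 \\
&\Rightarrow\; |a_i^\top (x - x_0)| \le s_i \quad \forall i \\
&\Rightarrow\; a_i^\top x + b_i = s_i + a_i^\top (x - x_0) \ge 0 \quad \forall i
\end{aligned}
```

So every point of $E(x_0)$ satisfies all the inequalities defining $X$.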

  25. Constraint form of central path
      • $\min\; -\sum_i \ln s_i$ s.t. $Ax + b \ge 0$, $c^\top x \le \lambda$
      • $\exists$ a 1-1 mapping $\lambda(t)$ w/ $x(\lambda(t)) = x(t)$ $\forall t > 0$
        ‣ but this form is slightly less convenient since we don’t know minimal feasible value of $\lambda$ or maximal nontrivial value of $\lambda$

  26. Dual of central path
      • $\min\; c^\top x - (1/t) \sum_i \ln s_i$ s.t. $Ax + b = s \ge 0$
        ‣ $\min_{x,s} \max_y\; L(x,s,y) = c^\top x - (1/t) \sum_i \ln s_i + y^\top (s - Ax - b)$

  27. Primal-dual correspondence
      • Primal and dual for central path:
        ‣ $\min\; c^\top x - (1/t) \sum_i \ln s_i$ s.t. $Ax + b = s \ge 0$
        ‣ $\max\; (m \ln t)/t + m/t + (1/t) \sum_i \ln y_i - y^\top b$ s.t. $A^\top y = c$, $y \ge 0$
      • $L(x,s,y) = c^\top x - (1/t) \sum_i \ln s_i + y^\top (s - Ax - b)$
        ‣ grad wrt $s$:
        ‣ to get $x$:
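A sketch of how the dual objective above comes out of the Lagrangian: minimize $L$ over $x$ and $s$ for fixed $y$. The slide leaves the intermediate steps blank, so these lines are a reconstruction under that assumption rather than the lecture's own derivation.

```latex
\begin{aligned}
\partial L/\partial s_i = -\frac{1}{t\,s_i} + y_i = 0
  \quad&\Rightarrow\quad s_i = \frac{1}{t\,y_i}, \qquad y > 0 \\
\partial L/\partial x = c - A^\top y = 0
  \quad&\Rightarrow\quad A^\top y = c \\
\min_{x,s} L(x,s,y)
  &= -\frac{1}{t} \sum_i \ln \frac{1}{t\,y_i} + \sum_i y_i s_i - y^\top b \\
  &= \frac{m \ln t}{t} + \frac{1}{t} \sum_i \ln y_i + \frac{m}{t} - y^\top b
\end{aligned}
```

Given an optimal $y$, the slacks follow from $s_i = 1/(t\,y_i)$, and $x$ can then be recovered by solving $Ax + b = s$.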

  28. Duality gap
      • At optimum:
        ‣ primal value $c^\top x - (1/t) \sum_i \ln s_i$ = dual value $(m \ln t)/t + m/t + (1/t) \sum_i \ln y_i - y^\top b$
        ‣ $s \circ y = e/t$
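A short worked computation of the gap between the LP objective $c^\top x$ and the LP dual objective $-b^\top y$ at a point on the central path. It assumes the stationarity conditions of the Lagrangian $L(x,s,y)$, namely $A^\top y = c$ and $s_i y_i = 1/t$ for each of the $m$ constraints.

```latex
\begin{aligned}
c^\top x - (-b^\top y)
  &= (A^\top y)^\top x + b^\top y
   = y^\top (Ax + b)
   = y^\top s \\
  &= \sum_{i=1}^{m} y_i s_i = \frac{m}{t}
\end{aligned}
```

Under these assumptions the gap shrinks like $1/t$, which is why driving $t \to \infty$ along the central path approaches an LP optimum.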

  29. Primal-dual constraint form
      • Primal-dual pair:
        ‣ $\min\; c^\top x$ s.t. $Ax + b \ge 0$
        ‣ $\max\; -b^\top y$ s.t. $A^\top y = c$, $y \ge 0$
      • KKT:
        ‣ $Ax + b \ge 0$ (primal feasibility)
        ‣ $y \ge 0$, $A^\top y = c$ (dual feasibility)
        ‣ $c^\top x + b^\top y \le 0$ (strong duality)
        ‣ …or, $c^\top x + b^\top y \le \lambda$ (relaxed strong duality)

  30. Analytic center of relaxed KKT
      • Relaxed KKT conditions:
        ‣ $Ax + b \ge 0$
        ‣ $y \ge 0$
        ‣ $A^\top y = c$
        ‣ $c^\top x + b^\top y \le \lambda$
      • Central path = {analytic centers of relaxed KKT}
