
Sparse Convex Optimization Methods for Machine Learning - PhD Defense Talk (PowerPoint Presentation)



  1. Sparse Convex Optimization Methods for Machine Learning. PhD Defense Talk, 2011/10/04, Martin Jaggi. Examiner: Emo Welzl. Co-Examiners: Bernd Gärtner, Elad Hazan, Joachim Giesen, Joachim Buhmann.

  2. Convex Optimization over a domain D ⊂ R^n.

  3.–6. The problem: min_{x ∈ D} f(x) over a compact convex domain D ⊂ R^n. (These slides build up a 3D plot of f over the domain D step by step.)

  7. The Linearized Problem: min_{y ∈ D} f(x) + ⟨y − x, d_x⟩.

     Algorithm 1: Greedy on a Compact Convex Set
       Pick an arbitrary starting point x^(0) ∈ D
       for k = 0 ... ∞ do
         Let d_x ∈ ∂f(x^(k)) be a subgradient to f at x^(k)
         Compute s := approx. arg min_{y ∈ D} ⟨y, d_x⟩
         Let α := 2/(k+2)
         Update x^(k+1) := x^(k) + α(s − x^(k))
       end for

     Theorem: the algorithm obtains accuracy ε after O(1/ε) steps.
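To make the generic scheme concrete, here is a minimal Python sketch (not from the slides); the names `greedy_on_convex_set`, `grad` and `lmo`, and the fixed iteration count, are assumptions, and the linear minimization oracle is kept abstract so that the domains on the following slides can be plugged in.

```python
import numpy as np

def greedy_on_convex_set(grad, lmo, x0, num_steps=1000):
    """Sketch of Algorithm 1 (greedy on a compact convex set).

    grad(x) -- returns a (sub)gradient d_x of f at x
    lmo(d)  -- linear minimization oracle: returns s = argmin_{y in D} <y, d>
    x0      -- arbitrary starting point inside D
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_steps):
        d_x = grad(x)              # subgradient at the current iterate
        s = lmo(d_x)               # (approximately) solve the linearized problem over D
        alpha = 2.0 / (k + 2)      # the step size 2/(k+2) from the slide
        x = x + alpha * (s - x)    # convex combination, so x stays in D
    return x
```

By the theorem on this slide, O(1/ε) such steps give an ε-accurate solution; each step adds at most one new vertex s, which is what produces the sparse or low-rank iterates on the domains below.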

  8. Comparison with gradient descent:
     • Cost per step: our method approximately solves the linearized problem min_{y ∈ D} f(x) + ⟨y − x, d_x⟩ on D; gradient descent needs a projection back to D.
     • Convergence: 1/k for both.
     • Sparse / low-rank solutions (depending on the domain): ✓ our method, ✗ gradient descent.

  9. History & Related Work:

     Work                   Domain                                   Known stepsize   Approx. subproblem   Primal-dual guarantee
     Frank & Wolfe 1956     linear inequality constraints            ✗                ✗                    ✗
     Dunn 1978, 1980        general bounded convex domain            ✓                ✗                    ✗
     Zhang 2003             convex hulls                             ✓                ✗                    ✗
     Clarkson 2008, 2010    unit simplex                             ✓                ✓                    ✗
     Hazan 2008             semidefinite matrices of bounded trace   ✓                ✓                    ✓
     J. (this thesis)       general bounded convex domain            ✓                ✓                    ✓

  10. Sparse Approximation over the unit simplex: min_{x ∈ Δ_n} f(x), where D := conv({e_i | i ∈ [n]}) is the unit simplex.

      for k = 0 ... ∞ do
        Let d_x ∈ ∂f(x^(k)) be a subgradient to f at x^(k)
        Compute i := arg min_i (d_x)_i
        Let α := 2/(k+2)
        Update x^(k+1) := x^(k) + α(e_i − x^(k))
      end for

      Corollary: the algorithm gives an ε-approximate solution of sparsity O(1/ε) ("coresets"). [Clarkson SODA '08]
      Lower bound: Ω(1/ε) sparsity as a function of the approximation quality.
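As an illustration (assumed code, not from the slides), the linear subproblem over the unit simplex reduces to picking the coordinate with the smallest subgradient entry; the resulting oracle plugs directly into the `greedy_on_convex_set` sketch above.

```python
import numpy as np

def lmo_simplex(d):
    """Linear minimization oracle over the unit simplex conv({e_i}):
    the minimizer of <y, d> is the vertex e_i with the smallest entry of d."""
    s = np.zeros_like(d)
    s[np.argmin(d)] = 1.0
    return s
```

Starting from a vertex, each step moves towards a single vertex e_i, so after k steps the iterate has at most k + 1 nonzero coordinates, matching the O(1/ε) sparsity bound.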

  11. Applications:
      • Smallest enclosing ball
      • Linear classifiers (such as Support Vector Machines with ℓ2-loss): min_{x ∈ Δ_n} x^T (K + t·1) x
      • Model Predictive Control
      • Mean-variance portfolio optimization: min_{x ∈ Δ_n} x^T C x − t · b^T x (a small sketch follows below)
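A usage sketch for the last bullet (assumed data C, b, t; it reuses the `greedy_on_convex_set` and `lmo_simplex` sketches from above).

```python
import numpy as np

def mean_variance_portfolio(C, b, t, num_steps=500):
    """Mean-variance portfolio optimization over the simplex,
    f(x) = x^T C x - t * b^T x, with C a covariance matrix and b expected
    returns; solved with the greedy scheme and the simplex oracle above."""
    grad = lambda x: 2.0 * C @ x - t * b     # gradient of the quadratic objective
    x0 = np.full(len(b), 1.0 / len(b))       # uniform portfolio as starting point
    return greedy_on_convex_set(grad, lmo_simplex, x0, num_steps)
```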

  12. Sparse Approximation over the ℓ1-ball: min_{‖x‖_1 ≤ 1} f(x), where D := conv({±e_i | i ∈ [n]}) is the ℓ1-ball.

      for k = 0 ... ∞ do
        Let d_x ∈ ∂f(x^(k)) be a subgradient to f at x^(k)
        Compute i := arg max_i |(d_x)_i|, and let s := e_i · sign((−d_x)_i)
        Let α := 2/(k+2)
        Update x^(k+1) := x^(k) + α(s − x^(k))
      end for

      Corollary: the algorithm gives an ε-approximate solution of sparsity O(1/ε) ("coresets").
      Lower bound: Ω(1/ε) sparsity as a function of the approximation quality.
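Analogously, a sketch (not from the slides) of the linear subproblem over the ℓ1-ball: the minimizer is a single signed coordinate vector; the `radius` parameter is an assumed generalization of the unit ball used on this slide.

```python
import numpy as np

def lmo_l1_ball(d, radius=1.0):
    """Linear minimization oracle over the l1-ball conv({+/- radius * e_i}):
    pick the coordinate with the largest |d_i| and move against its sign."""
    i = int(np.argmax(np.abs(d)))
    s = np.zeros_like(d)
    s[i] = -radius * np.sign(d[i])
    return s
```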

  13. Applications:
      • ℓ1-regularized regression (sparse recovery): min_{‖x‖_1 ≤ t} ‖Ax − b‖_2^2 (sketch below)
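A usage sketch for this application (assumed data A, b and radius t; it reuses the sketches above).

```python
import numpy as np

def sparse_recovery(A, b, t, num_steps=500):
    """l1-constrained least squares, min ||A x - b||_2^2 s.t. ||x||_1 <= t,
    solved with the greedy scheme and the l1-ball oracle sketched above."""
    grad = lambda x: 2.0 * A.T @ (A @ x - b)     # gradient of the squared loss
    lmo = lambda d: lmo_l1_ball(d, radius=t)     # vertices of the scaled l1-ball
    x0 = np.zeros(A.shape[1])                    # the origin lies inside the ball
    return greedy_on_convex_set(grad, lmo, x0, num_steps)
```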

  14. Low-Rank Approximation: min_{X ∈ D} f(X), where
      D := conv({vv^T | v ∈ R^n, ‖v‖_2 = 1}) = {X ∈ Sym_{n×n} | X ⪰ 0, Tr(X) = 1} is the spectahedron.

      for k = 0 ... ∞ do
        Let D_X ∈ ∂f(X^(k)) be a subgradient to f at X^(k)
        Let α := 2/(k+2)
        Compute v := v^(k) = ApproxEV(D_X, α·C_f)
        Update X^(k+1) := X^(k) + α(vv^T − X^(k))
      end for

      Corollary: the algorithm gives an ε-approximate solution of rank O(1/ε). [Hazan LATIN '08]
      Lower bound: Ω(1/ε).
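A sketch of the corresponding linear subproblem (assumed code): over the spectahedron the minimizer of ⟨X, D_X⟩ is the rank-1 matrix vv^T built from an eigenvector for the smallest eigenvalue of D_X; `scipy.sparse.linalg.eigsh` stands in here for the slide's approximate routine ApproxEV.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def lmo_spectahedron(D_X):
    """Linear minimization oracle over {X in Sym, X >= 0, Tr(X) = 1}:
    return a unit eigenvector for the smallest eigenvalue of D_X;
    the greedy step then uses the rank-1 vertex v v^T."""
    _, v = eigsh(D_X, k=1, which='SA')   # 'SA' = smallest algebraic eigenvalue
    return v[:, 0]

# One greedy step on the spectahedron (alpha = 2 / (k + 2)):
#   v = lmo_spectahedron(D_X)
#   X = X + alpha * (np.outer(v, v) - X)
```

Since each step adds one rank-1 term, the iterate X^(k) has rank at most k + 1, matching the O(1/ε) rank bound.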

  15. Applications:
      • Trace-norm regularized problems (low-rank matrix recovery): min_{‖X‖_* ≤ t} f(X)
      • Max-norm regularized problems

  16. Matrix Factorizations for recommender systems. The Netflix challenge: 17k movies, 500k customers, 100M observed entries (≈ 1% of the matrix). Approximate the ratings matrix as Y ≈ UV^T, with U = [u^(1), ..., u^(k)] and V = [v^(1), ..., v^(k)].

      The problem
        min_{U,V} Σ_{(i,j) ∈ Ω} (Y_ij − (UV^T)_ij)^2   s.t. ‖U‖_Fro^2 + ‖V‖_Fro^2 = t
      is equivalent to
        min_X f(X)   s.t. X ⪰ 0, Tr(X) = t,   with X := [UU^T  UV^T; VU^T  VV^T].

      [J, Sulovský ICML 2010]
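For illustration only (the function name, the triple-list layout of the observations, and the split into m movie rows are assumptions): in the reformulated problem the objective touches only the observed entries of the off-diagonal block of X, so its gradient is sparse and symmetric, which keeps each approximate-eigenvector step cheap.

```python
import numpy as np

def recommender_gradient(X, observations, m):
    """Gradient sketch for f(X) = sum over observed (i, j) of
    (X[i, m + j] - y_ij)^2, where the upper-right m x n block of the
    symmetric variable X plays the role of U V^T."""
    G = np.zeros_like(X)
    for i, j, y in observations:
        r = X[i, m + j] - y      # residual on one observed rating
        G[i, m + j] += r         # symmetric gradient: the residual appears
        G[m + j, i] += r         # in both mirrored positions of Sym
    return G
```

Since the trace bound here is t rather than 1, the vertices of the domain are t · vv^T, so the rank-1 update from the previous sketch is scaled by t.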

  17. A Simple Alternative Optimization Duality.
      The problem: min_{x ∈ D} f(x), D ⊂ R^n.
      The dual: ω(x) := min_{y ∈ D} f(x) + ⟨y − x, d_x⟩, with duality gap g(x) := f(x) − ω(x).
      Weak duality: ω(x) ≤ f(x*) ≤ f(x′) for any x, x′ ∈ D.
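A sketch (assumed code) of how this gap is evaluated in practice: since ω(x) = f(x) + ⟨s − x, d_x⟩ for the linear minimizer s, the gap g(x) = ⟨x − s, d_x⟩ comes essentially for free from the greedy step and certifies the current approximation quality.

```python
import numpy as np

def duality_gap(x, d_x, lmo):
    """g(x) = f(x) - omega(x) = <x - s, d_x>, where s minimizes the
    linearization over D.  By weak duality it upper-bounds f(x) - f(x*)."""
    s = lmo(d_x)
    return float(np.vdot(x - s, d_x))   # works for vector or matrix iterates

# A natural stopping rule (an assumption, not stated on this slide):
# stop the greedy loop once duality_gap(x, grad(x), lmo) <= eps.
```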

  18. Pathwise Optimization.
      The parameterized problem: min_{x ∈ D} f_t(x).
      (Plot: primal value f_{t′}(x), dual value ω_{t′}(x), and optimum f_t(x*_t) along the parameter t.)
      “Better than necessary”: g_t(x) ≤ ε/2.
      “Continuity in the parameter”: g_{t′}(x) − g_t(x) ≤ ε/2 whenever |t′ − t| ≤ ε · P_f^{-1}.
      “Still good enough”: together these give g_{t′}(x) ≤ ε.
      Theorem: there are O(1/ε) many intervals of piecewise constant ε-approximate solutions.
      [Giesen, J, Laue ESA 2010]
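A rough sketch of the pathwise idea (the interfaces `solve_to_gap(t, eps)` and `gap(x, t)` are assumptions): compute a solution that is "better than necessary" (gap ≤ ε/2), keep it as long as its gap at the new parameter value stays ≤ ε, and only then re-optimize.

```python
def approximation_path(solve_to_gap, gap, t_values, eps):
    """Track eps-approximate solutions along a grid of parameter values t,
    reusing the current solution while it remains 'still good enough'."""
    x = solve_to_gap(t_values[0], eps / 2.0)    # better than necessary
    path = []
    for t in t_values:
        if gap(x, t) > eps:                     # no longer good enough at t
            x = solve_to_gap(t, eps / 2.0)      # re-optimize (warm start possible)
        path.append((t, x))                     # piecewise constant eps-approx. path
    return path
```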

  19. Applications:
      • Smallest enclosing ball of moving points
      • SVMs, MKL (with 2 base kernels): min_{x ∈ Δ_n} x^T (K + t·1) x
        (Plot: test accuracy as a function of t on the ionosphere and breast-cancer datasets.)
      • Model Predictive Control
      • Robust PCA
      • Mean-variance portfolio optimization: min_{x ∈ Δ_n} x^T C x − t · b^T x
      • Recommender systems

  20. Thanks.
      Co-authors: Bernd Gärtner, Joachim Giesen, Soeren Laue, Marek Sulovský.
      3D visualization: Robert Carnecky.
