


1. Sketchy Decisions

Joel A. Tropp
Steele Family Professor of Applied & Computational Mathematics
jtropp@cms.caltech.edu
Department of Computing + Mathematical Sciences
California Institute of Technology

Collaborators: Volkan Cevher (EPFL), Roarke Horstmeyer (Duke), Quoc Tran-Dinh (UNC), Madeleine Udell (Cornell), Alp Yurtsever (EPFL)

Research supported by ONR, AFOSR, NSF, DARPA, ERC, SNF, Sloan, and Moore.

2. Motivation: MaxCut

Joel A. Tropp (Caltech), Sketchy Decisions, CSA, Matheon, TU Berlin, 7 December 2017

3. Graphs & Cuts

❧ Let G = (V, E) be an undirected graph with V = {1, ..., n}
❧ A cut is a subset S ⊂ V
❧ The weight of the cut is the number of edges between S and V \ S
❧ The associated cut vector χ ∈ Rⁿ has entries χ_i = +1 if i ∈ S and χ_i = −1 if i ∉ S
❧ The Laplacian L is the n × n positive-semidefinite (psd) matrix
  L = Σ_{{i,j} ∈ E} (e_i − e_j)(e_i − e_j)^t
❧ Calculate the weight of the cut algebraically:
  χ^t L χ = Σ_{{i,j} ∈ E} (χ_i − χ_j)² = 4 · weight(S)
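The Laplacian identity above is easy to check numerically. The following sketch builds L from the rank-one edge terms on a small made-up graph (the graph itself is illustrative, not from the slides) and verifies that the quadratic form counts crossing edges:

```python
import numpy as np

# Toy undirected graph on n = 4 vertices (0-indexed for convenience)
n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]

# Graph Laplacian: L = sum over edges {i,j} of (e_i - e_j)(e_i - e_j)^t
L = np.zeros((n, n))
for i, j in edges:
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    L += np.outer(d, d)

# Cut S = {0, 1}: cut vector is +1 on S and -1 on the complement
chi = np.array([1.0, 1.0, -1.0, -1.0])

# chi^t L chi = sum over edges of (chi_i - chi_j)^2 = 4 * weight(S);
# the edges crossing the cut here are (1,2), (0,3), (0,2), so weight(S) = 3
weight = chi @ L @ chi / 4
```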

4. The Most Unkindest Cut of All

❧ Calculate the maximum cut via a mathematical program:
  maximize x^t L x subject to x ∈ {±1}ⁿ   (MAXCUT)
❧ NP-hard, so relax to a semidefinite program (SDP) via the map xx^t ↦ X:
  maximize trace(LX) subject to diag(X) = 1, X psd   (MAXCUT SDP)
❧ Report signum of maximum eigenvector of solution (or randomly round)
❧ Provably good idea, but...
  ❧ Laplacian L of a graph with m edges has Θ(m) nonzeros
  ❧ SDP decision variable X has Θ(n²) degrees of freedom
  ❧ Storage! Communication! Computation!

Sources: Goemans & Williamson 1996.
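The randomized-rounding step can be sketched as follows. For illustration the feasible point X = VVᵗ below is synthetic (unit-norm random rows, so diag(X) = 1 and X is psd); a real run would take X from an SDP solver. Goemans–Williamson rounding cuts the Gram vectors with a random hyperplane:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# A feasible point of the MaxCut SDP: X = V V^t with unit-norm rows of V,
# so X is psd and diag(X) = 1. (Stand-in for an actual SDP solution.)
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)
X = V @ V.T

# Randomized rounding: sign pattern of the Gram vectors against a
# random Gaussian direction gives a +/-1 cut vector
g = rng.standard_normal(n)
x = np.sign(V @ g)
```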

5. Optimization with Optimal Storage

6. Optimization with Optimal Storage

Can we develop algorithms that reliably solve an optimization problem using storage that does not exceed the size of the problem data or the size of the solution?

7. Model Problem: Trace-Constrained SDP

  minimize trace(CX) subject to A(X) = b, X ∈ Δₙ   (SDP (P))

Details:
❧ C ∈ Hⁿ and b ∈ R^d, where Hⁿ = set of n × n Hermitian matrices
❧ A: Hⁿ → R^d is a real-linear map
❧ Δₙ = {X ∈ Hⁿ : X psd, trace X = 1}
❧ In many applications, d ≪ n² and all solutions have low rank
❧ Goal: Produce a rank-r approximation to a solution of (SDP (P))

8. Optimal Storage

What kind of storage bounds can we hope for?

❧ Assume black-box implementation of operations with objective + constraint:
  u ↦ Cu (Cⁿ → Cⁿ),  u ↦ A(uu*) (Cⁿ → R^d),  (u, z) ↦ (A*z)u (Cⁿ × R^d → Cⁿ)
❧ Need Θ(n + d) storage for output of black-box operations
❧ Need Θ(rn) storage for rank-r approximate solution of model problem

Definition. An algorithm for the model problem has optimal storage if its working storage is Θ(d + rn) rather than Θ(n²).

Sources: Yurtsever et al. 2017; Cevher et al. 2017.
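To make the black-box model concrete, here is a hypothetical instantiation of the three primitives for the MaxCut SDP written in the model-problem form, where A(X) = diag(X) (so d = n and A*z = diag(z)) and C = −L. The edge list and `L_mv` are illustrative stand-ins for sparse problem data; each primitive touches only Θ(n + m) storage, never an n × n matrix:

```python
import numpy as np

# Toy edge data standing in for a sparse Laplacian (illustrative only)
n = 4
edges = [(0, 1), (1, 2), (2, 3)]

def L_mv(u):
    # u -> L u, computed edge by edge without forming L
    out = np.zeros_like(u)
    for i, j in edges:
        out[i] += u[i] - u[j]
        out[j] += u[j] - u[i]
    return out

# The three primitives from the slide, for A(X) = diag(X):
C_mv = lambda u: -L_mv(u)          # u -> Cu, with C = -L (minimization form)
A_quad = lambda u: u * np.conj(u)  # u -> A(uu*) = diag(uu*) = |u|^2
Astar_mv = lambda u, z: z * u      # (u, z) -> (A*z)u, since A*z = diag(z)
```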

9. So Many Algorithms...

❧ 1990s: Interior-point methods
  ❧ Storage cost Θ(n⁴) for Hessian
❧ Late 1990s: Dual (sub)gradient methods
  ❧ Dual cutting-plane methods, spectral bundle methods
  ❧ Storage grows in each iteration; slow convergence
❧ 2000s: Convex first-order methods
  ❧ (Accelerated) proximal gradient, operator splitting, and others
  ❧ Store matrix variable, Θ(n²); projection onto psd cone
❧ 2003–present: Nonconvex heuristics
  ❧ Burer–Monteiro factorization idea + various nonlinear programming methods
  ❧ Store low-rank matrix factors, Θ(rn)
  ❧ For guaranteed solution, need unrealistic + unverifiable statistical assumptions

Sources: Interior-point: Nemirovski & Nesterov 1994; ... First-order: Rockafellar 1976; Helmberg & Rendl 1997; Auslender & Teboulle 2006; ... CGM: Frank & Wolfe 1956; Levitin & Poljak 1967; Jaggi 2013; ... Heuristics: Burer & Monteiro 2003; Keshavan et al. 2009; Jain et al. 2012; Bhojanapalli et al. 2015; Boumal et al. 2016; ....

10. The Challenge

❧ Some algorithms provably solve the model problem...
❧ Some algorithms have optimal storage guarantees...

Is there a practical algorithm that provably computes a low-rank approximation to a solution of the model problem + has optimal storage guarantees?

11. SketchyPDA

12. Algorithm Design Principles

1. Dualize (SDP (P)) carefully
  ❧ Dimension of dual problem equals number d of primal constraints
  ❧ Compute dual objective and its subgradient via eigenvector computation
2. Solve using a primal–dual averaging (PDA) method
  ❧ Subgradient ascent on dual objective
  ❧ Construct primal solution as a sequence of rank-one updates
3. Sketch stream of primal updates
  ❧ Dimension of sketch is proportional to size of solution, Θ(rn)
  ❧ Sketching a rank-one update is inexpensive
  ❧ Return primal solution in factored form after optimization converges

13. Dual in the Sun

❧ Lagrangian:
  L(X; z) = ⟨C, X⟩ + ⟨z, A(X) − b⟩ = ⟨C + A*z, X⟩ − ⟨z, b⟩
❧ Dual function:
  q(z) = min_{X ∈ Δₙ} L(X; z) = λ_min(C + A*z) − ⟨z, b⟩
❧ Subgradients of dual function:
  ∇q(z) = A(vv*) − b, where v is a minimum eigenvector of C + A*z
❧ Compute dual objective + subgradient via Lanczos method
  ❧ Based on matrix–vector multiply: (C + A*z)u
❧ Can complete all calculations using our primitives!
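The dual objective and subgradient only need matvecs with C + A*z, which is exactly what a Lanczos eigensolver consumes. A minimal sketch using SciPy's ARPACK wrapper, on a toy instance with A(X) = diag(X) and made-up data (in a real problem C would never be formed densely; only its matvec would be supplied):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# Toy data: small random symmetric C, b on the simplex scale, some dual z
n = 50
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
C = (M + M.T) / 2
b = np.ones(n) / n
z = rng.standard_normal(n)

# Matvec with C + A*z, where A*z = diag(z) for A(X) = diag(X)
matvec = lambda u: C @ u + z * u
op = LinearOperator((n, n), matvec=matvec)

# Minimum (smallest algebraic) eigenpair via Lanczos
lam, V = eigsh(op, k=1, which='SA')
v = V[:, 0]

q = lam[0] - z @ b        # dual objective q(z) = lambda_min(C + A*z) - <z, b>
grad = v * v - b          # subgradient grad q(z) = A(vv*) - b
```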

14. SDP Primal & Dual

  minimize ⟨C, X⟩ subject to A(X) = b, X ∈ Δₙ   (SDP (P))

  maximize λ_min(C + A*z) − ⟨z, b⟩ subject to z ∈ R^d   (SDP (D))

15. PDA: Primal–Dual Averaging

❧ Initialize: z ← 0 and X ← 0
❧ Compute (approximate) minimum eigenvector v of matrix C + A*z
❧ Form subgradient: ∇q(z) = A(vv*) − b
❧ Subgradient ascent on dual variable: z ← z + η ∇q(z)
❧ Update primal variable by averaging: X ← (1 − θ)X + θ(vv*)
❧ Observe: Primal expressed as a stream of rank-one linear updates!
❧ Warning: This is not the whole story!

Sources: Nesterov 2007, 2015; Yurtsever et al. 2015; ....
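The loop above can be sketched on a tiny instance with A(X) = diag(X) and b = (1/n, ..., 1/n). This is only illustrative: the dense `eigh` stands in for Lanczos, and the fixed step η and averaging weight θ = 1/(t+1) are simple choices of mine, not the schedule analyzed in the sources. Note that each update keeps X a convex combination of rank-one psd matrices, so X ∈ Δₙ throughout:

```python
import numpy as np

# Toy instance of the model problem with A(X) = diag(X)
n = 6
rng = np.random.default_rng(2)
M = rng.standard_normal((n, n))
C = (M + M.T) / 2
b = np.ones(n) / n

z = np.zeros(n)          # dual variable
X = np.zeros((n, n))     # primal average (kept densely here for illustration)
eta = 0.5                # illustrative fixed dual step size

for t in range(200):
    w, V = np.linalg.eigh(C + np.diag(z))     # dense stand-in for Lanczos
    v = V[:, 0]                               # minimum eigenvector
    g = v * v - b                             # subgradient A(vv*) - b
    z = z + eta * g                           # dual subgradient ascent
    theta = 1.0 / (t + 1)
    X = (1 - theta) * X + theta * np.outer(v, v)   # primal averaging
```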

16. Sketching the Primal Decision Variable

❧ Fix target rank r for solution
❧ Draw Gaussian dimension reduction map Ω ∈ C^{n×k} where k = 2r
❧ Sketch takes form Y = XΩ ∈ C^{n×k}
❧ Can perform rank-one linear update X ← (1 − θ)X + θ(vv*) on sketch:
  Y ← (1 − θ)Y + θ v(v*Ω)
❧ Can compute provably good rank-r approximation X̂ from sketch:
  E ‖X − X̂‖_{S1} ≤ 2 ‖X − [X]_r‖_{S1}
❧ Only uses storage Θ(rn)!

Sources: Woolfe et al. 2008; Clarkson & Woodruff 2009, 2017; Halko et al. 2009; Gittens 2011, 2013; Woodruff 2014; Cohen et al. 2015; Boutsidis et al. 2015; Pourkamali-Anaraki & Becker 2016; Tropp et al. 2016, 2017; Wang et al. 2017; ....

17. SketchyPDA for the Model Problem

Input: Problem data; dual Lipschitz constant L; suboptimality ε; target rank r
Output: Rank-r approximate solution X̂ = VΛV* in factored form

function SKETCHYPDA
    SKETCH.INIT(n, r)                    ⊲ Initialize primal sketch
    z ← 0                                ⊲ Initialize dual variable
    γ ← ε/L² and β ← 0                   ⊲ Step size, distance traveled
    for t ← 0, 1, 2, 3, ... do
        v ← MinEigVec(C + A*z)           ⊲ Use Lanczos!
        g ← A(vv*) − b                   ⊲ Form subgradient
        z ← z + γ g                      ⊲ Dual subgradient ascent
        β ← β + γ                        ⊲ Update distance traveled
        SKETCH.UPDATE(v, γ/β)            ⊲ Update primal sketch
    (V, Λ) ← SKETCH.RECONSTRUCT()        ⊲ Approx. eigendecomp of X
    return (V, Λ)

Source: Cevher et al. 2017.

18. Methods for SKETCH Object

function SKETCH.INIT(n, r)               ⊲ Rank-r approx of n × n psd matrix
    k ← 2r
    Ω ← randn(C, n, k)
    Y ← zeros(n, k)

function SKETCH.UPDATE(v, θ)             ⊲ Average vv* into sketch
    Y ← (1 − θ)Y + θ v(v*Ω)

function SKETCH.RECONSTRUCT()
    C ← chol(Ω*Y)                        ⊲ Cholesky decomposition
    Z ← Y / C                            ⊲ Solve least-squares problems
    (U, Σ, ∼) ← svds(Z, r)               ⊲ Compute r-truncated SVD
    return (U, Σ²)                       ⊲ Return eigenvalue factorization

❧ Modify reconstruction procedure for numerical stability!

Sources: Yurtsever et al. 2017; Cevher et al. 2017; Tropp et al. 2016, 2017.
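A Python rendering of this object (the slides' pseudocode is MATLAB-flavored, and real-valued data is assumed here for simplicity). The reconstruction follows the same route as the pseudocode: factor Ω^t Y by Cholesky, solve for Z = Y C^{-t} so that ZZ^t = Y(Ω^t Y)^{-1}Y^t, then truncate its SVD to rank r. As the slide warns, this plain Cholesky route is not numerically robust when Ω^t Y is nearly singular:

```python
import numpy as np

class Sketch:
    """Track Y = X @ Omega for an n x n psd matrix X built from
    rank-one averaging updates; storage is Theta(r n), never n^2."""

    def __init__(self, n, r, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        k = 2 * r
        self.r = r
        self.Omega = rng.standard_normal((n, k))  # Gaussian test matrix
        self.Y = np.zeros((n, k))                 # sketch Y = X @ Omega

    def update(self, v, theta):
        # Apply X <- (1 - theta) X + theta v v^t on the sketch alone
        self.Y = (1 - theta) * self.Y + theta * np.outer(v, v @ self.Omega)

    def reconstruct(self):
        # Xhat = Y (Omega^t Y)^{-1} Y^t = Z Z^t, reported as a truncated
        # eigendecomposition without ever forming the n x n matrix Xhat
        B = self.Omega.T @ self.Y
        B = (B + B.T) / 2                         # symmetrize (X is psd)
        Ct = np.linalg.cholesky(B)                # B = Ct Ct^t, Ct lower
        Z = np.linalg.solve(Ct, self.Y.T).T       # Z = Y Ct^{-t}
        U, s, _ = np.linalg.svd(Z, full_matrices=False)
        return U[:, :self.r], s[:self.r] ** 2     # eigvecs, eigvals of Xhat
```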
