

SLIDE 1

Sketchy Decisions

Joel A. Tropp

Steele Family Professor of Applied & Computational Mathematics

jtropp@cms.caltech.edu

Department of Computing + Mathematical Sciences California Institute of Technology

Collaborators: Volkan Cevher (EPFL), Roarke Horstmeyer (Duke), Quoc Tran-Dinh (UNC), Madeleine Udell (Cornell), Alp Yurtsever (EPFL)

Research supported by ONR, AFOSR, NSF, DARPA, ERC, SNF, Sloan, and Moore.

SLIDE 2

Motivation: MaxCut

Joel A. Tropp (Caltech), Sketchy Decisions, CSA, Matheon, TU Berlin, 7 December 2017 2

SLIDE 3

Graphs & Cuts

❧ Let G = (V, E) be an undirected graph with V = {1,...,n}
❧ A cut is a subset S ⊂ V
❧ The weight of the cut is the number of edges between S and V \ S
❧ The associated cut vector χ ∈ Rⁿ has entries

χᵢ = +1 if i ∈ S, and χᵢ = −1 if i ∉ S

❧ The Laplacian L is the n × n positive-semidefinite (psd) matrix

L = Σ_{{i,j}∈E} (eᵢ − eⱼ)(eᵢ − eⱼ)ᵗ

❧ Calculate the weight of the cut algebraically:

χᵗLχ = Σ_{{i,j}∈E} (χᵢ − χⱼ)² = 4 · weight(S)

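The identity χᵗLχ = 4 · weight(S) is easy to sanity-check numerically. The following sketch (NumPy; the graph and cut are arbitrary choices for illustration, not from the talk) builds the Laplacian from an edge list and compares the quadratic form with a direct edge count.

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian: L = sum over {i,j} in E of (e_i - e_j)(e_i - e_j)^t."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L

def cut_weight(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum((i in S) != (j in S) for i, j in edges)

n = 5
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
S = {0, 3}
chi = np.array([1.0 if i in S else -1.0 for i in range(n)])
L = laplacian(n, edges)
assert chi @ L @ chi == 4 * cut_weight(edges, S)   # chi^t L chi = 4 * weight(S)
```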

SLIDE 4

The Most Unkindest Cut of All

❧ Calculate the maximum cut via a mathematical program:

maximize xᵗLx subject to x ∈ {±1}ⁿ  (MAXCUT)

❧ NP-hard, so relax to a semidefinite program (SDP) via the map xxᵗ ↦ X:

maximize trace(LX) subject to diag(X) = 1, X psd  (MAXCUT SDP)

❧ Report signum of maximum eigenvector of solution (or randomly round)
❧ Provably good idea, but...
❧ Laplacian L of a graph with m edges has Θ(m) nonzeros
❧ SDP decision variable X has Θ(n²) degrees of freedom
❧ Storage! Communication! Computation!

Sources: Goemans & Williamson 1996.
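The "randomly round" step is the Goemans–Williamson hyperplane rounding. A minimal sketch, assuming a Gram factorization X = VVᵗ of a feasible SDP point is available (here the trivial feasible point X = I on the triangle graph, chosen only for illustration):

```python
import numpy as np

def random_round(V, rng):
    """Goemans-Williamson rounding: rows of V are unit vectors with X = V V^t;
    the sign pattern against a random hyperplane defines a cut vector."""
    g = rng.standard_normal(V.shape[1])
    x = np.sign(V @ g)
    x[x == 0] = 1.0              # break (measure-zero) ties deterministically
    return x

# Triangle graph K3: Laplacian below; X = I is SDP-feasible (diag(X) = 1, X psd)
L = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])
rng = np.random.default_rng(1)
x = random_round(np.eye(3), rng)
cut = (x @ L @ x) / 4            # cut weight recovered from the quadratic form
assert cut in (0.0, 2.0)         # any +-1 vector cuts 0 or 2 edges of a triangle
```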

SLIDE 5

Optimization with Optimal Storage


SLIDE 6

Optimization with Optimal Storage

Can we develop algorithms that reliably solve an optimization problem using storage that does not exceed the size of the problem data or the size of the solution?


SLIDE 7

Model Problem: Trace-Constrained SDP

minimize trace(CX) subject to A(X) = b, X ∈ ∆ₙ  (SDP (P))

Details:

❧ C ∈ Hⁿ and b ∈ Rᵈ
❧ A : Hⁿ → Rᵈ is a real-linear map
❧ ∆ₙ = {X ∈ Hⁿ : X psd, trace X = 1}
❧ In many applications, d ≪ n² and all solutions have low rank
❧ Goal: Produce a rank-r approximation to a solution of (SDP (P))

Hⁿ = set of n × n Hermitian matrices


SLIDE 8

Optimal Storage

What kind of storage bounds can we hope for?

❧ Assume black-box implementation of operations with objective + constraint:

u ↦ Cu (Cⁿ → Cⁿ),  u ↦ A(uu∗) (Cⁿ → Rᵈ),  (u, z) ↦ (A∗z)u (Cⁿ × Rᵈ → Cⁿ)

❧ Need Θ(n + d) storage for output of black-box operations
❧ Need Θ(rn) storage for rank-r approximate solution of model problem

Definition. An algorithm for the model problem has optimal storage if its working storage is Θ(d + rn) rather than Θ(n²).

Source: Yurtsever et al. 2017; Cevher et al. 2017.

SLIDE 9

So Many Algorithms...

❧ 1990s: Interior-point methods
  ❧ Storage cost Θ(n⁴) for Hessian
❧ Late 1990s: Dual (sub)gradient methods
  ❧ Dual cutting plane methods, spectral bundle methods
  ❧ Storage grows in each iteration; slow convergence
❧ 2000s: Convex first-order methods
  ❧ (Accelerated) proximal gradient, operator splitting, and others
  ❧ Store matrix variable Θ(n²); projection onto psd cone
❧ 2003–Present: Nonconvex heuristics
  ❧ Burer–Monteiro factorization idea + various nonlinear programming methods
  ❧ Store low-rank matrix factors Θ(rn)
  ❧ For guaranteed solution, need unrealistic + unverifiable statistical assumptions

Sources: Interior-point: Nemirovski & Nesterov 1994; ... First-order: Rockafellar 1976; Helmberg & Rendl 1997; Auslender & Teboulle 2006; ... CGM: Frank & Wolfe 1956; Levitin & Poljak 1967; Jaggi 2013; ... Heuristics: Burer & Monteiro 2003; Keshavan et al. 2009; Jain et al. 2012; Bhojanapalli et al. 2015; Boumal et al. 2016; ....

SLIDE 10

The Challenge

❧ Some algorithms provably solve the model problem...
❧ Some algorithms have optimal storage guarantees...

Is there a practical algorithm that provably computes a low-rank approximation to a solution of the model problem + has optimal storage guarantees?


SLIDE 11

SketchyPDA


SLIDE 12

Algorithm Design Principles

1. Dualize (SDP (P)) carefully
❧ Dimension of dual problem equals number d of primal constraints
❧ Compute dual objective and its subgradient via eigenvector computation

2. Solve using a primal–dual averaging (PDA) method
❧ Subgradient ascent on dual objective
❧ Construct primal solution as a sequence of rank-one updates

3. Sketch stream of primal updates
❧ Dimension of sketch is proportional to size of solution Θ(rn)
❧ Sketching a rank-one update is inexpensive
❧ Return primal solution in factored form after optimization converges


SLIDE 13

Dual in the Sun

❧ Lagrangian:

L(X; z) = ⟨C, X⟩ + ⟨z, A(X) − b⟩ = ⟨C + A∗z, X⟩ − ⟨z, b⟩

❧ Dual function:

q(z) = min_{X∈∆ₙ} L(X; z) = λmin(C + A∗z) − ⟨z, b⟩

❧ Subgradients of dual function:

∇q(z) = A(vv∗) − b where v is a minimum eigenvector of C + A∗z

❧ Compute dual objective + subgradient via Lanczos method
❧ Based on matrix–vector multiply: (C + A∗z)u
❧ Can complete all calculations using our primitives!

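For a small dense instance these formulas can be checked with a full eigendecomposition (the slide's Lanczos method is the scalable replacement). The sketch below uses a hypothetical constraint map A(X) = diag(X), so A∗z = diag(z), with b = (1/n)·1, and verifies the concavity (supergradient) inequality q(w) ≤ q(z) + ⟨∇q(z), w − z⟩ at random points:

```python
import numpy as np

def dual_and_supergradient(C, z, b):
    """Dual objective q(z) = lambda_min(C + A*z) - <z, b> and a supergradient,
    for the hypothetical constraint map A(X) = diag(X), so A*z = diag(z)."""
    evals, evecs = np.linalg.eigh(C + np.diag(z))  # full eig here; Lanczos at scale
    v = evecs[:, 0]                                # a minimum eigenvector
    q = evals[0] - z @ b
    g = np.abs(v) ** 2 - b                         # A(v v*) - b = diag(v v*) - b
    return q, g

rng = np.random.default_rng(0)
n = 8
C = rng.standard_normal((n, n))
C = (C + C.T) / 2                                  # symmetric objective matrix
b = np.full(n, 1.0 / n)                            # diag(X) = 1/n is consistent with trace X = 1
z = rng.standard_normal(n)
q, g = dual_and_supergradient(C, z, b)
# q is concave (a minimum of linear functions), so the supergradient
# inequality holds at every other point
for _ in range(5):
    w = rng.standard_normal(n)
    qw, _ = dual_and_supergradient(C, w, b)
    assert qw <= q + g @ (w - z) + 1e-9
```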

SLIDE 14

SDP Primal & Dual

minimize ⟨C, X⟩ subject to A(X) = b, X ∈ ∆ₙ  (SDP (P))

maximize λmin(C + A∗z) − ⟨z, b⟩ subject to z ∈ Rᵈ  (SDP (D))


SLIDE 15

PDA: Primal–Dual Averaging

❧ Initialize: z ← 0 and X ← 0
❧ Compute (approximate) minimum eigenvector v of matrix C + A∗z
❧ Form subgradient:

∇q(z) = A(vv∗) − b

❧ Subgradient ascent on dual variable:

z ← z + η∇q(z)

❧ Update primal variable by averaging:

X ← (1 − θ)X + θ(vv∗)

❧ Observe: Primal expressed as a stream of rank-one linear updates!
❧ Warning: This is not the whole story!

Sources: Nesterov 2007, 2015; Yurtsever et al. 2015; ....
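One useful invariant of the averaging step: with the γ/β weighting used later in the SketchyPDA pseudocode, θ = 1 on the first iteration, so X is always a convex combination of rank-one matrices vv∗ with ‖v‖ = 1 and therefore stays in ∆ₙ. A minimal check (random unit vectors stand in for the actual eigenvector computation; this only demonstrates the invariant, not convergence):

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 6, 0.1
X = np.zeros((n, n))
beta = 0.0
for t in range(20):
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)            # stand-in for the minimum eigenvector
    beta += gamma
    theta = gamma / beta              # theta = 1 on the very first step
    X = (1 - theta) * X + theta * np.outer(v, v)
# X is a convex combination of rank-one matrices v v^t, so it lies in Delta_n
assert abs(np.trace(X) - 1) < 1e-12   # trace X = 1
assert np.linalg.eigvalsh(X).min() > -1e-12   # X psd (up to rounding)
```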

SLIDE 16

Sketching the Primal Decision Variable

❧ Fix target rank r for solution
❧ Draw Gaussian dimension reduction map

Ω ∈ C^{n×k} where k = 2r

❧ Sketch takes form

Y = XΩ ∈ C^{n×k}

❧ Can perform rank-one linear update X ← (1 − θ)X + θ(vv∗) on sketch:

Y ← (1 − θ)Y + θv(v∗Ω)

❧ Can compute provably good rank-r approximation X̂ from sketch:

E‖X − X̂‖_{S₁} ≤ 2‖X − [X]ᵣ‖_{S₁}

where [X]ᵣ is a best rank-r approximation of X

❧ Only uses storage Θ(rn)!

Sources: Woolfe et al. 2008; Clarkson & Woodruff 2009, 2017; Halko et al. 2009; Gittens 2011, 2013; Woodruff 2014; Cohen et al. 2015; Boutsidis et al. 2015; Pourkamali-Anaraki & Becker 2016; Tropp et al. 2016, 2017; Wang et al. 2017; ....

SLIDE 17

SketchyPDA for the Model Problem

Input: Problem data; dual Lipschitz constant L; suboptimality ε; target rank r
Output: Rank-r approximate solution X̂ = VΛV∗ in factored form

function SKETCHYPDA
    SKETCH.INIT(n, r)               ⊲ Initialize primal sketch
    z ← 0                           ⊲ Initialize dual variable
    γ ← ε/L² and β ← 0              ⊲ Step size, distance traveled
    for t ← 0, 1, 2, 3, ... do
        v ← MinEigVec(C + A∗z)      ⊲ Use Lanczos!
        g ← A(vv∗) − b              ⊲ Form subgradient
        z ← z + γg                  ⊲ Dual subgradient ascent
        β ← β + γ                   ⊲ Update distance traveled
        SKETCH.UPDATE(v, γ/β)       ⊲ Update primal sketch
    (V, Λ) ← SKETCH.RECONSTRUCT()   ⊲ Approx. eigendecomposition of X
    return (V, Λ)

Source: Cevher et al. 2017.

SLIDE 18

Methods for SKETCH Object

function SKETCH.INIT(n, r)              ⊲ Rank-r approx of n × n psd matrix
    k ← 2r
    Ω ← randn(C, n, k)
    Y ← zeros(n, k)

function SKETCH.UPDATE(v, θ)
    Y ← (1 − θ)Y + θv(v∗Ω)              ⊲ Average vv∗ into sketch

function SKETCH.RECONSTRUCT()
    C ← chol(Ω∗Y)                       ⊲ Cholesky decomposition
    Z ← Y/C                             ⊲ Solve least-squares problems
    (U, Σ, ∼) ← svds(Z, r)              ⊲ Compute r-truncated SVD
    return (U, Σ²)                      ⊲ Return eigenvalue factorization

❧ Modify reconstruction procedure for numerical stability!

Sources: Yurtsever et al. 2017; Cevher et al. 2017; Tropp et al. 2016, 2017.
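A possible NumPy rendering of the SKETCH object. It includes one common stabilization of the reconstruction, shifting Ω∗Y by a small multiple of Ω∗Ω before the Cholesky factorization (in the spirit of the stable fixed-rank psd approximation of Tropp et al. 2017; the class name and the exact shift size here are assumptions, not the talk's implementation):

```python
import numpy as np

class Sketch:
    """Streaming rank-r approximation of an n x n psd matrix, tracking Y = X @ Omega."""

    def __init__(self, n, r, rng):
        self.r = r
        k = 2 * r
        # complex standard normal test matrix Omega in C^{n x k}
        self.Omega = (rng.standard_normal((n, k))
                      + 1j * rng.standard_normal((n, k))) / np.sqrt(2)
        self.Y = np.zeros((n, k), dtype=complex)

    def update(self, v, theta):
        """Rank-one linear update: Y <- (1 - theta) Y + theta v (v* Omega)."""
        self.Y = (1 - theta) * self.Y + theta * np.outer(v, v.conj() @ self.Omega)

    def reconstruct(self):
        """Nystrom reconstruction Y (Omega* Y)^{-1} Y*, truncated to rank r.
        The small shift nu regularizes Omega* Y (the shift size is an assumption)."""
        nu = np.sqrt(self.Y.shape[0]) * np.finfo(float).eps * np.linalg.norm(self.Y)
        Yv = self.Y + nu * self.Omega                  # sketch of X + nu * I
        B = self.Omega.conj().T @ Yv
        B = (B + B.conj().T) / 2                       # enforce Hermitian symmetry
        Lc = np.linalg.cholesky(B)                     # B = Lc Lc*
        Z = np.linalg.solve(Lc, Yv.conj().T).conj().T  # Z = Yv Lc^{-*}, so Z Z* = Yv B^{-1} Yv*
        U, s, _ = np.linalg.svd(Z, full_matrices=False)
        lam = np.maximum(s[:self.r] ** 2 - nu, 0)      # remove the shift from eigenvalues
        return U[:, :self.r], lam

# Stream two unit vectors so that X = 0.5 (v1 v1* + v2 v2*), a rank-2 psd matrix
rng = np.random.default_rng(0)
n, r = 40, 2
sk = Sketch(n, r, rng)
v1 = rng.standard_normal(n) + 1j * rng.standard_normal(n); v1 /= np.linalg.norm(v1)
v2 = rng.standard_normal(n) + 1j * rng.standard_normal(n); v2 /= np.linalg.norm(v2)
sk.update(v1, 1.0)       # theta = 1 on the first step
sk.update(v2, 0.5)
U, lam = sk.reconstruct()
X = 0.5 * (np.outer(v1, v1.conj()) + np.outer(v2, v2.conj()))
assert np.linalg.norm((U * lam) @ U.conj().T - X) < 1e-6   # rank(X) <= r: near-exact recovery
```

The shift is exactly the failure mode the slide warns about: when X has rank at most r, the k × k matrix Ω∗Y is singular and an unmodified Cholesky factorization breaks down.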

SLIDE 19

Refinements: SKETCHYUPD

❧ Line search on dual objective with relaxed descent condition
❧ Dual step size determined by line search
❧ Homotopy on line search tolerance
❧ Adapt accuracy of Lanczos eigenvector computation
❧ Restart primal average at dyadic intervals
❧ Reliable convergence test
❧ Handle more general inclusions: A(X) ∈ K
  ❧ Need inexpensive prox for support function of K
❧ Outcome: Reliable, provable algorithm with good empirical behavior!


SLIDE 20

Less Filling / Great Taste

Theorem 1 (CYTT 2017). SKETCHYUPD has the following properties:

❧ SKETCHYUPD has optimal storage guarantee Θ(d + rn)
❧ UPD produces iterates Xₜ ∈ ∆ₙ that satisfy

|trace(CXₜ) − trace(CX⋆)| = Õ(t^{−1/2}) and ‖A(Xₜ) − b‖ = Õ(t^{−1/2})

❧ Suppose the UPD iterates Xₜ converge to X^{upd}. Then SKETCHYUPD produces rank-r iterates X̂ₜ that satisfy

limsup_{t→∞} E‖X̂ₜ − X^{upd}‖_{S₁} ≤ 2‖X^{upd} − [X^{upd}]ᵣ‖_{S₁}

In particular, if rank(X^{upd}) ≤ r, then E‖X̂ₜ − X^{upd}‖_{S₁} → 0

Source: “Everything you always wanted in an algorithm. And less.” https://www.youtube.com/watch?v=0agZEMEpiVI.

SLIDE 21

Performance of SketchyUPD


SLIDE 22

Results for MAXCUT SDP

[Plots: relative objective error |p(X) − p⋆|/|p⋆|, infeasibility ‖AX − b‖₂, and cut value versus iteration, 10⁰–10³.]

❧ Instance: G67
❧ Graph properties: n = 1.00·10⁴, m = 2.00·10⁴
❧ Primal dim: n² = 1.00·10⁸; dual dim: d = 1.00·10⁴
❧ s/iter ≈ 0.595
❧ s/SVD ≈ 96.57

Sources: Davis & Hu 2011; Boumal et al. 2015–2017.

SLIDE 23

Results for MAXCUT SDP

[Plots: objective magnitude |p(X)|, infeasibility ‖AX − b‖₂, and cut value versus iteration, 10⁰–10³.]

❧ Instance: Gn-pin-pout
❧ Graph properties: n = 1.00·10⁶; m = 5.02·10⁷
❧ Primal dim: n² = 1.00·10¹²; dual dim: d = 1.00·10⁶
❧ s/iter ≈ 0.942
❧ s/SVD ≈ —

Sources: Davis & Hu 2011.

SLIDE 24

SDP Relaxation of Abstract Phase Retrieval

❧ Let x ∈ Cⁿ be an unknown signal with ‖x‖ = 1
❧ Let aᵢ ∈ Cⁿ be known measurement vectors for i = 1,...,d
❧ Collect data bᵢ = |⟨aᵢ, x⟩|²
❧ Solve phase retrieval SDP:

minimize trace(X) subject to aᵢ∗ X aᵢ = bᵢ for i = 1,...,d, X ∈ ∆ₙ

❧ Reconstruction: Maximum eigenvector of SDP solution

Sources: AIM Frames Workshop 2008; Balan et al. 2009; Chai et al. 2011; Candès et al. 2013; ....
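A small numerical check of the lifting (measurement vectors aᵢ are rows of a hypothetical complex Gaussian matrix, with ⟨aᵢ, x⟩ = aᵢ∗x): the rank-one matrix X = xx∗ satisfies the SDP constraints exactly, and its maximum eigenvector recovers x up to a global phase.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 80
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                      # unknown signal with ||x|| = 1
A = rng.standard_normal((d, n)) + 1j * rng.standard_normal((d, n))  # row i is a_i
b = np.abs(A.conj() @ x) ** 2               # phaseless data b_i = |a_i* x|^2

# The lifted rank-one matrix X = x x* is feasible: a_i* X a_i = b_i, X in Delta_n
X = np.outer(x, x.conj())
constraints = np.einsum('ij,jk,ik->i', A.conj(), X, A).real   # a_i* X a_i
assert np.allclose(constraints, b)
assert abs(np.trace(X).real - 1) < 1e-12    # trace X = ||x||^2 = 1

# Reconstruction: maximum eigenvector of X recovers x up to a global phase
evals, evecs = np.linalg.eigh(X)
v = evecs[:, -1]
assert abs(abs(v.conj() @ x) - 1) < 1e-8
```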

SLIDE 25

Example: Fourier Ptychography

[Figure: x and y phase-gradient reconstructions. 29 illuminations; 250 × 250 pixels each; d = 1.81·10⁶ measurements; image size n = 501 × 501 pixels; matrix size n² = 6.25·10¹⁰.]

Sources: Cevher et al. 2017.

SLIDE 26

The Quadratic Assignment Problem (QAP)

❧ Let A, B ∈ Hⁿ be fixed (sparse) matrices
❧ Quadratic assignment problem:

minimize trace(APBPᵗ) subject to P is a permutation matrix

❧ Applications:
  ❧ Graph matching
  ❧ Alignment in NMR spectroscopy
  ❧ ...
❧ Considered to be one of the hardest combinatorial problems to solve

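For tiny instances the QAP can be solved by brute force over all permutations, which makes the objective trace(APBPᵗ) concrete and shows why relaxations are needed: the search space has n! points. A sketch with arbitrary random symmetric 0–1 matrices (chosen for illustration only):

```python
import numpy as np
from itertools import permutations

def qap_cost(A, B, perm):
    """Objective trace(A P B P^t), where P is the permutation matrix P[i, perm[i]] = 1."""
    n = len(perm)
    P = np.zeros((n, n))
    P[np.arange(n), perm] = 1.0
    return float(np.trace(A @ P @ B @ P.T))

rng = np.random.default_rng(0)
n = 5
A = rng.integers(0, 2, (n, n)); A = np.triu(A, 1); A = (A + A.T).astype(float)
B = rng.integers(0, 2, (n, n)); B = np.triu(B, 1); B = (B + B.T).astype(float)

# Exhaustive search over all n! permutations gives the exact optimum
best = min(qap_cost(A, B, p) for p in permutations(range(n)))
identity_cost = qap_cost(A, B, tuple(range(n)))
assert best <= identity_cost     # any single permutation only upper-bounds the optimum
```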

SLIDE 27

An SDP Relaxation of QAP

minimize trace((A ⊗ B)Y)
subject to diag(Y) = vec(P)
           P1 = 1, 1ᵗP = 1ᵗ, P ≥ 0
           trace₁(Y) = I, trace₂(Y) = I
           (Y)ᵢⱼ ≥ 0 for (i, j) ∈ supp(A ⊗ B)
           [ 1        vec(P)ᵗ ]
           [ vec(P)   Y       ]  psd,  trace(Y) = n

❧ Variable Y has dimension n² × n²
❧ DNN formulation hard for n ≥ 50
❧ Progress on other relaxations for n ≤ 1000

Sources: Zhao et al. 1998; Bravo Ferreira et al. 2017; Cevher et al. 2017.

SLIDE 28

Upper Bounds for QAP

[Plots: objective magnitude |p(X)|, feasibility gap, and cost versus iteration, 10⁰–10⁴.]

❧ Instance: esc64a
❧ Properties: n = 64
❧ Primal dim: n⁴ = 1.68·10⁷
❧ QAP optimum: 116
❧ SketchyUPD upper bound: ≈ 135
❧ [BKS17] upper bound: 178

Sources: Burkard et al. 2017; Bravo Ferreira et al. 2017; Cevher et al. 2017.

SLIDE 29

Upper Bounds for QAP

[Plots: objective magnitude |p(X)|, feasibility gap, and cost versus iteration, 10⁰–10⁴.]

❧ Instance: esc128
❧ Properties: n = 128
❧ Primal dim: n⁴ = 2.68·10⁸
❧ QAP optimum: 64
❧ SketchyUPD upper bound: ≈ 90
❧ [BKS17] upper bound: 176

Sources: Burkard et al. 2017; Bravo Ferreira et al. 2017; Cevher et al. 2017.

SLIDE 30

To learn more...

E-mail: jtropp@cms.caltech.edu
Web: http://users.cms.caltech.edu/∼jtropp

Related Papers:

❧ Cevher, Yurtsever, Tran, & Tropp, “Storage-optimal algorithms for semidefinite programming,” coming soon!
❧ Yurtsever, Udell, Tropp, & Cevher, “Sketchy decisions: Convex low-rank matrix optimization with optimal storage,” AISTATS 2017, arXiv:1702.06838
❧ Tropp, Yurtsever, Udell, & Cevher, “Fixed-rank approximation of a positive-semidefinite matrix from streaming data,” NIPS 2017, arXiv:1706.05736
❧ Tropp, Yurtsever, Udell, & Cevher, “Practical sketching algorithms for low-rank matrix approximation,” SIMAX 2017, arXiv:1609.00048
❧ Horstmeyer et al., “Solving ptychography with a convex relaxation,” New J. Physics, 2015
❧ Halko, Martinsson, & Tropp, “Finding structure with randomness: Probabilistic algorithms for computing approximate matrix decompositions,” SIAM Review, 2011
❧ More to come!
