

SLIDE 1

Sketchy Decisions

Joel A. Tropp

Steele Family Professor of Applied & Computational Mathematics

jtropp@cms.caltech.edu

Department of Computing + Mathematical Sciences California Institute of Technology

Collaborators: Volkan Cevher (EPFL), Roarke Horstmeyer (Duke), Quoc Tran-Dinh (UNC), Madeleine Udell (Cornell), Alp Yurtsever (EPFL)

Research supported by ONR, AFOSR, NSF, DARPA, ERC, SNF, Sloan, and Moore.

SLIDE 2

Motivation: MaxCut

Joel A. Tropp (Caltech), Sketchy Decisions, CSA, Matheon, TU Berlin, 7 December 2017 2

SLIDE 3

Graphs & Cuts

❧ Let G = (V, E) be an undirected graph with V = {1,...,n}
❧ A cut is a subset S ⊂ V
❧ The weight of the cut is the number of edges between S and V \ S
❧ The associated cut vector χ ∈ Rⁿ has entries

χᵢ = +1 if i ∈ S, and χᵢ = −1 if i ∉ S

❧ The Laplacian L is the n × n positive-semidefinite (psd) matrix

L = Σ_{{i,j}∈E} (eᵢ − eⱼ)(eᵢ − eⱼ)ᵗ

❧ Calculate the weight of the cut algebraically:

χᵗLχ = Σ_{{i,j}∈E} (χᵢ − χⱼ)² = 4 · weight(S)

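The identity χᵗLχ = 4 · weight(S) is easy to sanity-check numerically. The following sketch (NumPy; the graph and cut are arbitrary choices for illustration, not from the talk) builds the Laplacian from an edge list and compares the quadratic form with a direct edge count.

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian: L = sum over {i,j} in E of (e_i - e_j)(e_i - e_j)^t."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L

def cut_weight(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum((i in S) != (j in S) for i, j in edges)

n = 5
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
S = {0, 3}
chi = np.array([1.0 if i in S else -1.0 for i in range(n)])
L = laplacian(n, edges)
assert chi @ L @ chi == 4 * cut_weight(edges, S)   # chi^t L chi = 4 * weight(S)
```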

SLIDE 4

The Most Unkindest Cut of All

❧ Calculate the maximum cut via a mathematical program:

maximize xᵗLx subject to x ∈ {±1}ⁿ  (MAXCUT)

❧ NP-hard, so relax to a semidefinite program (SDP) via the map xxᵗ ↦ X:

maximize trace(LX) subject to diag(X) = 1, X psd  (MAXCUT SDP)

❧ Report signum of maximum eigenvector of solution (or randomly round)
❧ Provably good idea, but...
❧ Laplacian L of a graph with m edges has Θ(m) nonzeros
❧ SDP decision variable X has Θ(n²) degrees of freedom
❧ Storage! Communication! Computation!

Sources: Goemans & Williamson 1996.
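The "randomly round" step is the Goemans–Williamson hyperplane rounding. A minimal sketch, assuming a Gram factorization X = VVᵗ of a feasible SDP point is available (here the trivial feasible point X = I on the triangle graph, chosen only for illustration):

```python
import numpy as np

def random_round(V, rng):
    """Goemans-Williamson rounding: rows of V are unit vectors with X = V V^t;
    the sign pattern against a random hyperplane defines a cut vector."""
    g = rng.standard_normal(V.shape[1])
    x = np.sign(V @ g)
    x[x == 0] = 1.0              # break (measure-zero) ties deterministically
    return x

# Triangle graph K3: Laplacian below; X = I is SDP-feasible (diag(X) = 1, X psd)
L = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])
rng = np.random.default_rng(1)
x = random_round(np.eye(3), rng)
cut = (x @ L @ x) / 4            # cut weight recovered from the quadratic form
assert cut in (0.0, 2.0)         # any +-1 vector cuts 0 or 2 edges of a triangle
```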

SLIDE 5

Optimization with Optimal Storage


SLIDE 6

Optimization with Optimal Storage

Can we develop algorithms that reliably solve an optimization problem using storage that does not exceed the size of the problem data or the size of the solution?


SLIDE 7

Model Problem: Trace-Constrained SDP

minimize trace(CX) subject to A(X) = b, X ∈ ∆ₙ  (SDP (P))

Details:

❧ C ∈ Hⁿ and b ∈ Rᵈ
❧ A : Hⁿ → Rᵈ is a real-linear map
❧ ∆ₙ = {X ∈ Hⁿ : X psd, trace X = 1}
❧ In many applications, d ≪ n² and all solutions have low rank
❧ Goal: Produce a rank-r approximation to a solution of (SDP (P))

Hⁿ = set of n × n Hermitian matrices


SLIDE 8

Optimal Storage

What kind of storage bounds can we hope for?

❧ Assume black-box implementation of operations with objective + constraint:

u ↦ Cu (Cⁿ → Cⁿ),  u ↦ A(uu∗) (Cⁿ → Rᵈ),  (u, z) ↦ (A∗z)u (Cⁿ × Rᵈ → Cⁿ)

❧ Need Θ(n + d) storage for output of black-box operations
❧ Need Θ(rn) storage for rank-r approximate solution of model problem

Definition. An algorithm for the model problem has optimal storage if its working storage is Θ(d + rn) rather than Θ(n²).

Source: Yurtsever et al. 2017; Cevher et al. 2017.

SLIDE 9

So Many Algorithms...

❧ 1990s: Interior-point methods
  ❧ Storage cost Θ(n⁴) for Hessian
❧ Late 1990s: Dual (sub)gradient methods
  ❧ Dual cutting plane methods, spectral bundle methods
  ❧ Storage grows in each iteration; slow convergence
❧ 2000s: Convex first-order methods
  ❧ (Accelerated) proximal gradient, operator splitting, and others
  ❧ Store matrix variable Θ(n²); projection onto psd cone
❧ 2003–Present: Nonconvex heuristics
  ❧ Burer–Monteiro factorization idea + various nonlinear programming methods
  ❧ Store low-rank matrix factors Θ(rn)
  ❧ For guaranteed solution, need unrealistic + unverifiable statistical assumptions

Sources: Interior-point: Nemirovski & Nesterov 1994; ... First-order: Rockafellar 1976; Helmberg & Rendl 1997; Auslender & Teboulle 2006; ... CGM: Frank & Wolfe 1956; Levitin & Poljak 1967; Jaggi 2013; ... Heuristics: Burer & Monteiro 2003; Keshavan et al. 2009; Jain et al. 2012; Bhojanapalli et al. 2015; Boumal et al. 2016; ....

SLIDE 10

The Challenge

❧ Some algorithms provably solve the model problem...
❧ Some algorithms have optimal storage guarantees...

Is there a practical algorithm that provably computes a low-rank approximation to a solution of the model problem + has optimal storage guarantees?


SLIDE 11

SketchyPDA


SLIDE 12

Algorithm Design Principles

1. Dualize (SDP (P)) carefully
❧ Dimension of dual problem equals number d of primal constraints
❧ Compute dual objective and its subgradient via eigenvector computation

2. Solve using a primal–dual averaging (PDA) method
❧ Subgradient ascent on dual objective
❧ Construct primal solution as a sequence of rank-one updates

3. Sketch stream of primal updates
❧ Dimension of sketch is proportional to size of solution Θ(rn)
❧ Sketching a rank-one update is inexpensive
❧ Return primal solution in factored form after optimization converges


SLIDE 13

Dual in the Sun

❧ Lagrangian:

L(X; z) = ⟨C, X⟩ + ⟨z, A(X) − b⟩ = ⟨C + A∗z, X⟩ − ⟨z, b⟩

❧ Dual function:

q(z) = min_{X∈∆ₙ} L(X; z) = λmin(C + A∗z) − ⟨z, b⟩

❧ Subgradients of dual function:

∇q(z) = A(vv∗) − b where v is a minimum eigenvector of C + A∗z

❧ Compute dual objective + subgradient via Lanczos method
❧ Based on matrix–vector multiply: (C + A∗z)u
❧ Can complete all calculations using our primitives!

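For a small dense instance these formulas can be checked with a full eigendecomposition (the slide's Lanczos method is the scalable replacement). The sketch below uses a hypothetical constraint map A(X) = diag(X), so A∗z = diag(z), with b = (1/n)·1, and verifies the concavity (supergradient) inequality q(w) ≤ q(z) + ⟨∇q(z), w − z⟩ at random points:

```python
import numpy as np

def dual_and_supergradient(C, z, b):
    """Dual objective q(z) = lambda_min(C + A*z) - <z, b> and a supergradient,
    for the hypothetical constraint map A(X) = diag(X), so A*z = diag(z)."""
    evals, evecs = np.linalg.eigh(C + np.diag(z))  # full eig here; Lanczos at scale
    v = evecs[:, 0]                                # a minimum eigenvector
    q = evals[0] - z @ b
    g = np.abs(v) ** 2 - b                         # A(v v*) - b = diag(v v*) - b
    return q, g

rng = np.random.default_rng(0)
n = 8
C = rng.standard_normal((n, n))
C = (C + C.T) / 2                                  # symmetric objective matrix
b = np.full(n, 1.0 / n)                            # diag(X) = 1/n is consistent with trace X = 1
z = rng.standard_normal(n)
q, g = dual_and_supergradient(C, z, b)
# q is concave (a minimum of linear functions), so the supergradient
# inequality holds at every other point
for _ in range(5):
    w = rng.standard_normal(n)
    qw, _ = dual_and_supergradient(C, w, b)
    assert qw <= q + g @ (w - z) + 1e-9
```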

SLIDE 14

SDP Primal & Dual

minimize ⟨C, X⟩ subject to A(X) = b, X ∈ ∆ₙ  (SDP (P))

maximize λmin(C + A∗z) − ⟨z, b⟩ subject to z ∈ Rᵈ  (SDP (D))


SLIDE 15

PDA: Primal–Dual Averaging

❧ Initialize: z ← 0 and X ← 0
❧ Compute (approximate) minimum eigenvector v of matrix C + A∗z
❧ Form subgradient:

∇q(z) = A(vv∗) − b

❧ Subgradient ascent on dual variable:

z ← z + η∇q(z)

❧ Update primal variable by averaging:

X ← (1 − θ)X + θ(vv∗)

❧ Observe: Primal expressed as a stream of rank-one linear updates!
❧ Warning: This is not the whole story!

Sources: Nesterov 2007, 2015; Yurtsever et al. 2015; ....
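One useful invariant of the averaging step: with the γ/β weighting used later in the SketchyPDA pseudocode, θ = 1 on the first iteration, so X is always a convex combination of rank-one matrices vv∗ with ‖v‖ = 1 and therefore stays in ∆ₙ. A minimal check (random unit vectors stand in for the actual eigenvector computation; this only demonstrates the invariant, not convergence):

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 6, 0.1
X = np.zeros((n, n))
beta = 0.0
for t in range(20):
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)            # stand-in for the minimum eigenvector
    beta += gamma
    theta = gamma / beta              # theta = 1 on the very first step
    X = (1 - theta) * X + theta * np.outer(v, v)
# X is a convex combination of rank-one matrices v v^t, so it lies in Delta_n
assert abs(np.trace(X) - 1) < 1e-12   # trace X = 1
assert np.linalg.eigvalsh(X).min() > -1e-12   # X psd (up to rounding)
```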

SLIDE 16

Sketching the Primal Decision Variable

❧ Fix target rank r for solution
❧ Draw Gaussian dimension reduction map

Ω ∈ C^{n×k} where k = 2r

❧ Sketch takes form

Y = XΩ ∈ C^{n×k}

❧ Can perform rank-one linear update X ← (1 − θ)X + θ(vv∗) on sketch:

Y ← (1 − θ)Y + θv(v∗Ω)

❧ Can compute provably good rank-r approximation X̂ from sketch:

E‖X − X̂‖_{S₁} ≤ 2‖X − [X]ᵣ‖_{S₁}

where [X]ᵣ is a best rank-r approximation of X

❧ Only uses storage Θ(rn)!

Sources: Woolfe et al. 2008; Clarkson & Woodruff 2009, 2017; Halko et al. 2009; Gittens 2011, 2013; Woodruff 2014; Cohen et al. 2015; Boutsidis et al. 2015; Pourkamali-Anaraki & Becker 2016; Tropp et al. 2016, 2017; Wang et al. 2017; ....

SLIDE 17

SketchyPDA for the Model Problem

Input: Problem data; dual Lipschitz constant L; suboptimality ε; target rank r
Output: Rank-r approximate solution X̂ = VΛV∗ in factored form

function SKETCHYPDA
    SKETCH.INIT(n, r)               ⊲ Initialize primal sketch
    z ← 0                           ⊲ Initialize dual variable
    γ ← ε/L² and β ← 0              ⊲ Step size, distance traveled
    for t ← 0, 1, 2, 3, ... do
        v ← MinEigVec(C + A∗z)      ⊲ Use Lanczos!
        g ← A(vv∗) − b              ⊲ Form subgradient
        z ← z + γg                  ⊲ Dual subgradient ascent
        β ← β + γ                   ⊲ Update distance traveled
        SKETCH.UPDATE(v, γ/β)       ⊲ Update primal sketch
    (V, Λ) ← SKETCH.RECONSTRUCT()   ⊲ Approx. eigendecomposition of X
    return (V, Λ)

Source: Cevher et al. 2017.

SLIDE 18

Methods for SKETCH Object

function SKETCH.INIT(n, r)              ⊲ Rank-r approx of n × n psd matrix
    k ← 2r
    Ω ← randn(C, n, k)
    Y ← zeros(n, k)

function SKETCH.UPDATE(v, θ)
    Y ← (1 − θ)Y + θv(v∗Ω)              ⊲ Average vv∗ into sketch

function SKETCH.RECONSTRUCT()
    C ← chol(Ω∗Y)                       ⊲ Cholesky decomposition
    Z ← Y/C                             ⊲ Solve least-squares problems
    (U, Σ, ∼) ← svds(Z, r)              ⊲ Compute r-truncated SVD
    return (U, Σ²)                      ⊲ Return eigenvalue factorization

❧ Modify reconstruction procedure for numerical stability!

Sources: Yurtsever et al. 2017; Cevher et al. 2017; Tropp et al. 2016, 2017.
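A possible NumPy rendering of the SKETCH object. It includes one common stabilization of the reconstruction, shifting Ω∗Y by a small multiple of Ω∗Ω before the Cholesky factorization (in the spirit of the stable fixed-rank psd approximation of Tropp et al. 2017; the class name and the exact shift size here are assumptions, not the talk's implementation):

```python
import numpy as np

class Sketch:
    """Streaming rank-r approximation of an n x n psd matrix, tracking Y = X @ Omega."""

    def __init__(self, n, r, rng):
        self.r = r
        k = 2 * r
        # complex standard normal test matrix Omega in C^{n x k}
        self.Omega = (rng.standard_normal((n, k))
                      + 1j * rng.standard_normal((n, k))) / np.sqrt(2)
        self.Y = np.zeros((n, k), dtype=complex)

    def update(self, v, theta):
        """Rank-one linear update: Y <- (1 - theta) Y + theta v (v* Omega)."""
        self.Y = (1 - theta) * self.Y + theta * np.outer(v, v.conj() @ self.Omega)

    def reconstruct(self):
        """Nystrom reconstruction Y (Omega* Y)^{-1} Y*, truncated to rank r.
        The small shift nu regularizes Omega* Y (the shift size is an assumption)."""
        nu = np.sqrt(self.Y.shape[0]) * np.finfo(float).eps * np.linalg.norm(self.Y)
        Yv = self.Y + nu * self.Omega                  # sketch of X + nu * I
        B = self.Omega.conj().T @ Yv
        B = (B + B.conj().T) / 2                       # enforce Hermitian symmetry
        Lc = np.linalg.cholesky(B)                     # B = Lc Lc*
        Z = np.linalg.solve(Lc, Yv.conj().T).conj().T  # Z = Yv Lc^{-*}, so Z Z* = Yv B^{-1} Yv*
        U, s, _ = np.linalg.svd(Z, full_matrices=False)
        lam = np.maximum(s[:self.r] ** 2 - nu, 0)      # remove the shift from eigenvalues
        return U[:, :self.r], lam

# Stream two unit vectors so that X = 0.5 (v1 v1* + v2 v2*), a rank-2 psd matrix
rng = np.random.default_rng(0)
n, r = 40, 2
sk = Sketch(n, r, rng)
v1 = rng.standard_normal(n) + 1j * rng.standard_normal(n); v1 /= np.linalg.norm(v1)
v2 = rng.standard_normal(n) + 1j * rng.standard_normal(n); v2 /= np.linalg.norm(v2)
sk.update(v1, 1.0)       # theta = 1 on the first step
sk.update(v2, 0.5)
U, lam = sk.reconstruct()
X = 0.5 * (np.outer(v1, v1.conj()) + np.outer(v2, v2.conj()))
assert np.linalg.norm((U * lam) @ U.conj().T - X) < 1e-6   # rank(X) <= r: near-exact recovery
```

The shift is exactly the failure mode the slide warns about: when X has rank at most r, the k × k matrix Ω∗Y is singular and an unmodified Cholesky factorization breaks down.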

SLIDE 19

Refinements: SKETCHYUPD

❧ Line search on dual objective with relaxed descent condition
❧ Dual step size determined by line search
❧ Homotopy on line search tolerance
❧ Adapt accuracy of Lanczos eigenvector computation
❧ Restart primal average at dyadic intervals
❧ Reliable convergence test
❧ Handle more general inclusions: A(X) ∈ K
  ❧ Need inexpensive prox for support function of K
❧ Outcome: Reliable, provable algorithm with good empirical behavior!


SLIDE 20

Less Filling / Great Taste

Theorem 1 (CYTT 2017). SKETCHYUPD has the following properties:

❧ SKETCHYUPD has optimal storage guarantee Θ(d + rn)
❧ UPD produces iterates Xₜ ∈ ∆ₙ that satisfy

|trace(CXₜ) − trace(CX⋆)| = Õ(t^{−1/2}) and ‖A(Xₜ) − b‖ = Õ(t^{−1/2})

❧ Suppose the UPD iterates Xₜ converge to X^{upd}. Then SKETCHYUPD produces rank-r iterates X̂ₜ that satisfy

limsup_{t→∞} E‖X̂ₜ − X^{upd}‖_{S₁} ≤ 2‖X^{upd} − [X^{upd}]ᵣ‖_{S₁}

In particular, if rank(X^{upd}) ≤ r, then E‖X̂ₜ − X^{upd}‖_{S₁} → 0

Source: “Everything you always wanted in an algorithm. And less.” https://www.youtube.com/watch?v=0agZEMEpiVI.

SLIDE 21

Performance of SketchyUPD


SLIDE 22

Results for MAXCUT SDP

[Plots: relative objective error |p(X) − p⋆|/|p⋆|, infeasibility ‖AX − b‖₂, and cut value versus iteration, 10⁰–10³.]

❧ Instance: G67
❧ Graph properties: n = 1.00·10⁴, m = 2.00·10⁴
❧ Primal dim: n² = 1.00·10⁸; dual dim: d = 1.00·10⁴
❧ s/iter ≈ 0.595
❧ s/SVD ≈ 96.57

Sources: Davis & Hu 2011; Boumal et al. 2015–2017.

SLIDE 23

Results for MAXCUT SDP

[Plots: objective magnitude |p(X)|, infeasibility ‖AX − b‖₂, and cut value versus iteration, 10⁰–10³.]

❧ Instance: Gn-pin-pout
❧ Graph properties: n = 1.00·10⁶; m = 5.02·10⁷
❧ Primal dim: n² = 1.00·10¹²; dual dim: d = 1.00·10⁶
❧ s/iter ≈ 0.942
❧ s/SVD ≈ —

Sources: Davis & Hu 2011.

SLIDE 24

SDP Relaxation of Abstract Phase Retrieval

❧ Let x ∈ Cⁿ be an unknown signal with ‖x‖ = 1
❧ Let aᵢ ∈ Cⁿ be known measurement vectors for i = 1,...,d
❧ Collect data bᵢ = |⟨aᵢ, x⟩|²
❧ Solve phase retrieval SDP:

minimize trace(X) subject to aᵢ∗ X aᵢ = bᵢ for i = 1,...,d, X ∈ ∆ₙ

❧ Reconstruction: Maximum eigenvector of SDP solution

Sources: AIM Frames Workshop 2008; Balan et al. 2009; Chai et al. 2011; Candès et al. 2013; ....
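A small numerical check of the lifting (measurement vectors aᵢ are rows of a hypothetical complex Gaussian matrix, with ⟨aᵢ, x⟩ = aᵢ∗x): the rank-one matrix X = xx∗ satisfies the SDP constraints exactly, and its maximum eigenvector recovers x up to a global phase.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 80
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                      # unknown signal with ||x|| = 1
A = rng.standard_normal((d, n)) + 1j * rng.standard_normal((d, n))  # row i is a_i
b = np.abs(A.conj() @ x) ** 2               # phaseless data b_i = |a_i* x|^2

# The lifted rank-one matrix X = x x* is feasible: a_i* X a_i = b_i, X in Delta_n
X = np.outer(x, x.conj())
constraints = np.einsum('ij,jk,ik->i', A.conj(), X, A).real   # a_i* X a_i
assert np.allclose(constraints, b)
assert abs(np.trace(X).real - 1) < 1e-12    # trace X = ||x||^2 = 1

# Reconstruction: maximum eigenvector of X recovers x up to a global phase
evals, evecs = np.linalg.eigh(X)
v = evecs[:, -1]
assert abs(abs(v.conj() @ x) - 1) < 1e-8
```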

SLIDE 25

Example: Fourier Ptychography

[Figure: x and y phase-gradient reconstructions. 29 illuminations; 250 × 250 pixels each; d = 1.81·10⁶ measurements; image size n = 501 × 501 pixels; matrix size n² = 6.25·10¹⁰.]

Sources: Cevher et al. 2017.

SLIDE 26

The Quadratic Assignment Problem (QAP)

❧ Let A, B ∈ Hⁿ be fixed (sparse) matrices
❧ Quadratic assignment problem:

minimize trace(APBPᵗ) subject to P is a permutation matrix

❧ Applications:
  ❧ Graph matching
  ❧ Alignment in NMR spectroscopy
  ❧ ...
❧ Considered to be one of the hardest combinatorial problems to solve

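For tiny instances the QAP can be solved by brute force over all permutations, which makes the objective trace(APBPᵗ) concrete and shows why relaxations are needed: the search space has n! points. A sketch with arbitrary random symmetric 0–1 matrices (chosen for illustration only):

```python
import numpy as np
from itertools import permutations

def qap_cost(A, B, perm):
    """Objective trace(A P B P^t), where P is the permutation matrix P[i, perm[i]] = 1."""
    n = len(perm)
    P = np.zeros((n, n))
    P[np.arange(n), perm] = 1.0
    return float(np.trace(A @ P @ B @ P.T))

rng = np.random.default_rng(0)
n = 5
A = rng.integers(0, 2, (n, n)); A = np.triu(A, 1); A = (A + A.T).astype(float)
B = rng.integers(0, 2, (n, n)); B = np.triu(B, 1); B = (B + B.T).astype(float)

# Exhaustive search over all n! permutations gives the exact optimum
best = min(qap_cost(A, B, p) for p in permutations(range(n)))
identity_cost = qap_cost(A, B, tuple(range(n)))
assert best <= identity_cost     # any single permutation only upper-bounds the optimum
```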

SLIDE 27

An SDP Relaxation of QAP

minimize trace((A ⊗ B)Y)
subject to diag(Y) = vec(P)
           P1 = 1, 1ᵗP = 1ᵗ, P ≥ 0
           trace₁(Y) = I, trace₂(Y) = I
           (Y)ᵢⱼ ≥ 0 for (i, j) ∈ supp(A ⊗ B)
           [ 1        vec(P)ᵗ ]
           [ vec(P)   Y       ]  psd,  trace(Y) = n

❧ Variable Y has dimension n² × n²
❧ DNN formulation hard for n ≥ 50
❧ Progress on other relaxations for n ≤ 1000

Sources: Zhao et al. 1998; Bravo Ferreira et al. 2017; Cevher et al. 2017.

SLIDE 28

Upper Bounds for QAP

[Plots: objective magnitude |p(X)|, feasibility gap, and cost versus iteration, 10⁰–10⁴.]

❧ Instance: esc64a
❧ Properties: n = 64
❧ Primal dim: n⁴ = 1.68·10⁷
❧ QAP optimum: 116
❧ SketchyUPD upper bound: ≈ 135
❧ [BKS17] upper bound: 178

Sources: Burkard et al. 2017; Bravo Ferreira et al. 2017; Cevher et al. 2017.

SLIDE 29

Upper Bounds for QAP

[Plots: objective magnitude |p(X)|, feasibility gap, and cost versus iteration, 10⁰–10⁴.]

❧ Instance: esc128
❧ Properties: n = 128
❧ Primal dim: n⁴ = 2.68·10⁸
❧ QAP optimum: 64
❧ SketchyUPD upper bound: ≈ 90
❧ [BKS17] upper bound: 176

Sources: Burkard et al. 2017; Bravo Ferreira et al. 2017; Cevher et al. 2017.

SLIDE 30

To learn more...

E-mail: jtropp@cms.caltech.edu
Web: http://users.cms.caltech.edu/∼jtropp

Related Papers:

❧ Cevher, Yurtsever, Tran, & Tropp, “Storage-optimal algorithms for semidefinite programming,” coming soon!
❧ Yurtsever, Udell, Tropp, & Cevher, “Sketchy decisions: Convex low-rank matrix optimization with optimal storage,” AISTATS 2017, arXiv:1702.06838
❧ Tropp, Yurtsever, Udell, & Cevher, “Fixed-rank approximation of a positive-semidefinite matrix from streaming data,” NIPS 2017, arXiv:1706.05736
❧ Tropp, Yurtsever, Udell, & Cevher, “Practical sketching algorithms for low-rank matrix approximation,” SIMAX 2017, arXiv:1609.00048
❧ Horstmeyer et al., “Solving ptychography with a convex relaxation,” New J. Physics, 2015
❧ Halko, Martinsson, & Tropp, “Finding structure with randomness: Probabilistic algorithms for computing approximate matrix decompositions,” SIAM Review, 2011
❧ More to come!
