

SLIDE 1

Living on the Edge

Phase Transitions in Convex Programs with Random Data Joel A. Tropp Michael B. McCoy

Computing + Mathematical Sciences California Institute of Technology Joint with Dennis Amelunxen and Martin Lotz (Manchester)

Research supported in part by ONR, AFOSR, DARPA, and the Sloan Foundation

SLIDE 2

Convex Programs with Random Data

Examples...

❧ Stat and ML. Random data models; fit model via optimization
❧ Sensing. Collect random measurements; reconstruct via optimization
❧ Coding. Random channel models; decode via optimization

Motivations...

❧ Average-case analysis. Randomness describes “typical” behavior
❧ Fundamental bounds. Opportunities and limits for convex methods

Living on the Edge, Modern Time–Frequency Analysis, Strobl, 3 June 2014

SLIDE 3

Research Challenge...

Understand and predict the precise behavior of random convex programs

References: Donoho–Maleki–Montanari 2009, Donoho–Johnstone–Montanari 2011, Donoho–Gavish–Montanari 2013

SLIDE 4

A Theory Emerges...

❧ Vershik & Sporyshev, “An asymptotic estimate for the average number of steps...” 1986
❧ Donoho, “High-dimensional centrally symmetric polytopes...” 2/2005
❧ Rudelson & Vershynin, “On sparse reconstruction...” 2/2006
❧ Donoho & Tanner, “Counting faces of randomly projected polytopes...” 5/2006
❧ Xu & Hassibi, “Compressed sensing over the Grassmann manifold...” 9/2008
❧ Stojnic, “Various thresholds for ℓ1 optimization...” 7/2009
❧ Bayati & Montanari, “The LASSO risk for gaussian matrices” 8/2010
❧ Oymak & Hassibi, “New null space results and recovery thresholds...” 11/2010
❧ Chandrasekaran, Recht, et al., “The convex geometry of linear inverse problems” 12/2010
❧ McCoy & Tropp, “Sharp recovery bounds for convex demixing...” 5/2012
❧ Bayati, Lelarge, & Montanari, “Universality in polytope phase transitions...” 7/2012
❧ Chandrasekaran & Jordan, “Computational & statistical tradeoffs...” 10/2012
❧ Amelunxen, Lotz, McCoy, & Tropp, “Living on the edge...” 3/2013
❧ Stojnic, various works 3/2013
❧ Foygel & Mackey, “Corrupted sensing: Novel guarantees...” 5/2013
❧ Oymak & Hassibi, “Asymptotically exact denoising...” 5/2013
❧ McCoy & Tropp, “From Steiner formulas for cones...” 8/2013
❧ McCoy & Tropp, “The achievable performance of convex demixing...” 9/2013
❧ Oymak, Thrampoulidis, & Hassibi, “The squared-error of generalized LASSO...” 11/2013


SLIDE 5

The Core Question

How big is a cone?


SLIDE 6

Regularized Denoising

SLIDE 7

Denoising a Piecewise Smooth Signal

[Figure: a piecewise smooth function with additive white noise, and the same signal denoised by wavelet shrinkage.]

❧ Observation: z = x♮ + σg where g ∼ normal(0, I)
❧ Denoise via wavelet shrinkage = convex optimization:
  minimize (1/2)‖x − z‖₂² + λ‖Wx‖₁

Reference: Donoho & Johnstone, early 1990s

SLIDE 8

Setup for Regularized Denoising

❧ Let x♮ ∈ Rd be “structured” but unknown
❧ Let f : Rd → R be a convex function that reflects “structure”
❧ Observe z = x♮ + σg where g ∼ normal(0, I)
❧ Remove noise by solving the convex program*
  minimize (1/2)‖x − z‖₂²  subject to f(x) ≤ f(x♮)
❧ Hope: the minimizer x̂ approximates x♮

*We assume the side information f(x♮) is available. This is equivalent to knowing the optimal choice of Lagrange multiplier for the constraint.


SLIDE 9

Geometry of Denoising I

[Figure: the sublevel set {x : f(x) ≤ f(x♮)}, the observation z = x♮ + σg, and the estimate x̂.]

References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013

SLIDE 10

Descent Cones

Definition. The descent cone of a function f at a point x is D(f, x) := {h : f(x + εh) ≤ f(x) for some ε > 0}

[Figure: the sublevel set {y : f(y) ≤ f(x)} and the shifted descent cone x + D(f, x), which is generated by the descent directions {h : f(x + h) ≤ f(x)}.]

References: Rockafellar 1970, Hiriart-Urruty & Lemaréchal 1996
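For a concrete instance, membership in the descent cone of the ℓ1 norm can be tested via the directional derivative f′(x; h), which is nonpositive exactly when h is a descent direction (up to taking the closure of the cone). A small sketch; the helper name is chosen here for illustration:

```python
import numpy as np

def in_l1_descent_cone(x, h, tol=1e-12):
    # h lies in the descent cone of ||.||_1 at x (up to closure) iff the
    # directional derivative is nonpositive:
    #   f'(x; h) = sum_{i in supp(x)} sign(x_i) h_i
    #            + sum_{i not in supp(x)} |h_i|
    supp = x != 0
    dd = np.sum(np.sign(x[supp]) * h[supp]) + np.sum(np.abs(h[~supp]))
    return dd <= tol
```

At x = (1, 0), moving the nonzero coordinate down is a descent direction even with a small perturbation off the support, while increasing it is not.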

SLIDE 11

Geometry of Denoising II

[Figure: the observation z = x♮ + σg, the shifted descent cone x♮ + K with K = D(f, x♮), and the projection ΠK(σg), which controls the error of the estimate x̂.]

References: Chandrasekaran & Jordan 2012, Oymak & Hassibi 2013

SLIDE 12

The Risk of Regularized Denoising

Theorem 1. [Oymak & Hassibi 2013] Assume
❧ We observe z = x♮ + σg where g is standard normal
❧ The vector x̂ solves  minimize (1/2)‖z − x‖₂²  subject to f(x) ≤ f(x♮)
Then
  sup_{σ>0} E‖x̂ − x♮‖₂² / σ² = E‖ΠK(g)‖₂²
where K = D(f, x♮) and ΠK is the Euclidean metric projector onto K.

Related: Bhaskar–Tang–Recht 2012, Donoho–Johnstone–Montanari 2012, Chandrasekaran & Jordan 2012

SLIDE 13

Statistical Dimension

SLIDE 14

Statistical Dimension: The Motion Picture

[Figure: for a small cone K, the projection ΠK(g) of a standard normal vector g is typically short; for a big cone, it is typically long.]

SLIDE 15

The Statistical Dimension of a Cone

Definition. The statistical dimension δ(K) of a closed, convex cone K is the quantity
  δ(K) := E‖ΠK(g)‖₂²
where
❧ ΠK is the Euclidean metric projector onto K
❧ g ∼ normal(0, I) is a standard normal vector

References: Rudelson & Vershynin 2006, Stojnic 2009, Chandrasekaran et al. 2010, Chandrasekaran & Jordan 2012, Amelunxen et al. 2013
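The definition is easy to check by Monte Carlo for a cone whose projector is explicit: projection onto the nonnegative orthant just clips negative coordinates at zero, and by self-duality δ(R^d_+) = d/2. A quick sketch (the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials = 100, 20000
g = rng.standard_normal((trials, d))
proj = np.maximum(g, 0.0)  # metric projection onto the nonnegative orthant
delta_est = np.mean(np.sum(proj ** 2, axis=1))
# delta_est should be close to delta(R^d_+) = d/2 = 50
```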

SLIDE 16

Basic Statistical Dimension Calculations

Cone                    Notation   Statistical dimension
j-dimensional subspace  Lj         j
Nonnegative orthant     Rd+        d/2
Second-order cone       Ld+1       (d + 1)/2
Real psd cone           Sd+        d(d + 1)/4
Complex psd cone        Hd+        d²/2

References: Chandrasekaran et al. 2010, Amelunxen et al. 2013

SLIDE 17

Circular Cones

[Figure: the statistical dimension of a circular cone as a function of its angle.]

References: Amelunxen et al. 2013, Mu et al. 2013, McCoy & Tropp 2013

SLIDE 18

Descent Cone of ℓ1 Norm at Sparse Vector

[Figure: the statistical dimension of the descent cone of the ℓ1 norm at an s-sparse vector, plotted against the sparsity fraction s/d.]

References: Stojnic 2009, Donoho & Tanner 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Mackey & Foygel 2013
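This curve can be approximated from the standard bound δ(D(f, x)) ≤ inf_{τ≥0} E dist²(g, τ·∂f(x)), which for the ℓ1 norm at an s-sparse vector reduces to a one-dimensional minimization with the Gaussian integrals in closed form. A sketch (the function name is illustrative, not from the talk):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def l1_descent_delta(d, s):
    # Bound: delta(D(||.||_1, x)) <= inf_{tau>=0} E dist^2(g, tau*subdiff)
    # for an s-sparse x in R^d.  On the support each coordinate contributes
    # 1 + tau^2; off the support, E (|g| - tau)_+^2, which equals
    # 2[(1 + tau^2) Phi(-tau) - tau phi(tau)].
    def J(tau):
        tail = 2.0 * ((1 + tau ** 2) * norm.cdf(-tau) - tau * norm.pdf(tau))
        return s * (1 + tau ** 2) + (d - s) * tail
    return minimize_scalar(J, bounds=(0.0, 10.0), method="bounded").fun
```

For 10% sparsity (s = 10, d = 100) the minimization lands near 0.33·d, matching the familiar ℓ1 phase-transition curve at that sparsity level.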

SLIDE 19

Descent Cone of S1 Norm at Low-Rank Matrix

[Figure: the statistical dimension of the descent cone of the S1 (nuclear) norm at a low-rank matrix, plotted against the rank fraction.]

References: Oymak & Hassibi 2010, Chandrasekaran et al. 2010, Amelunxen et al. 2013, Foygel & Mackey 2013

SLIDE 20

Regularized Linear Inverse Problems

SLIDE 21

Example: The Random Demodulator

[Figure: time–frequency spectrograms of the input and reconstructed signals, and a block diagram of the random demodulator: a pseudorandom number generator seeds a linear data-acquisition system, and the signal is reconstructed from the sensor output, e.g., by convex optimization.]

Reference: Tropp et al. 2010

SLIDE 22

Setup for Linear Inverse Problems

❧ Let x♮ ∈ Rd be a structured, unknown vector
❧ Let f : Rd → R be a convex function that reflects structure
❧ Let A ∈ Rm×d be a measurement operator
❧ Observe z = Ax♮
❧ Find estimate x̂ by solving the convex program
  minimize f(x)  subject to Ax = z
❧ Hope: x̂ = x♮

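With f = ‖·‖₁, the program above is a linear program after the standard split x = u − v with u, v ≥ 0, so a small recovery experiment needs only scipy. A sketch with arbitrary dimensions, chosen so that m sits well above the statistical-dimension threshold and exact recovery is expected with high probability:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
d, m, s = 40, 25, 3
x_true = np.zeros(d)
x_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, d))   # standard normal measurement operator
z = A @ x_true

# min ||x||_1  s.t. Ax = z, as an LP over (u, v) with x = u - v, u, v >= 0
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=z, method="highs")
x_hat = res.x[:d] - res.x[d:]
```

Here δ(D(‖·‖₁, x♮)) is roughly 11 for s = 3, d = 40, so m = 25 measurements place the experiment comfortably in the success region.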

SLIDE 23

Geometry of Linear Inverse Problems

[Figure: success occurs when the affine space x♮ + null(A) meets the sublevel set {x : f(x) ≤ f(x♮)} only at x♮; failure occurs when x♮ + null(A) cuts through the shifted descent cone x♮ + D(f, x♮).]

References: Candès–Romberg–Tao 2005, Rudelson–Vershynin 2006, Chandrasekaran et al. 2010, Amelunxen et al. 2013

SLIDE 24

Linear Inverse Problems with Random Data

Theorem 2. [CRPW10; ALMT13] Assume
❧ The vector x♮ ∈ Rd is unknown
❧ The observation z = Ax♮ where A ∈ Rm×d is standard normal
❧ The vector x̂ solves  minimize f(x)  subject to Ax = z
Then
  m ≳ δ(D(f, x♮))  ⟹  x̂ = x♮ whp  [CRPW10; ALMT13]
  m ≲ δ(D(f, x♮))  ⟹  x̂ ≠ x♮ whp  [ALMT13]

References: Stojnic 2009, Chandrasekaran et al. 2010, Amelunxen et al. 2013

SLIDE 25

Sparse Recovery via ℓ1 Minimization

[Figure: empirical phase transition for sparse recovery via ℓ1 minimization.]

SLIDE 26

Low-Rank Recovery via S1 Minimization

[Figure: empirical phase transition for low-rank recovery via S1 minimization.]


SLIDE 27

Demixing Structured Signals

SLIDE 28

Example: Demixing Spikes + Sines

[Figure: Uy♮, a signal with sparse DCT coefficients; x♮, sparse spike noise; and the observed noisy signal.]

Observation: z = x♮ + Uy♮ where U is the DCT

References: Starck–Donoho–Candès 2002, Starck–Elad–Donoho 2004

SLIDE 29

Convex Demixing Yields...

[Figure: the demixed spike and sinusoid components overlaid on the originals.]

  minimize ‖x‖₁ + λ‖y‖₁  subject to z = x + Uy

References: Starck–Donoho–Candès 2002, Starck–Elad–Donoho 2004

SLIDE 30

Setup for Demixing Problems

❧ Let x♮ ∈ Rd and y♮ ∈ Rd be structured, unknown vectors
❧ Let f, g : Rd → R be convex functions that reflect structure
❧ Let U ∈ Rd×d be a known orthogonal matrix
❧ Observe z = x♮ + Uy♮
❧ Demix via the convex program
  minimize f(x)  subject to g(y) ≤ g(y♮), x + Uy = z
❧ Hope: (x̂, ŷ) = (x♮, y♮)

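With f = g = ‖·‖₁, this demixing program is again a linear program, so the setup can be exercised end to end. A sketch with small arbitrary dimensions, chosen so that the two statistical dimensions sum to well below d and exact demixing is expected with high probability:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
d, s = 40, 2
U, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
x_true = np.zeros(d)
x_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
y_true = np.zeros(d)
y_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
z = x_true + U @ y_true

# minimize ||x||_1  s.t.  ||y||_1 <= ||y_true||_1,  x + U y = z
# LP variables: x = u - v, y = p - q, all nonnegative
c = np.concatenate([np.ones(2 * d), np.zeros(2 * d)])
A_eq = np.hstack([np.eye(d), -np.eye(d), U, -U])
A_ub = np.concatenate([np.zeros(2 * d), np.ones(2 * d)])[None, :]
res = linprog(c, A_ub=A_ub, b_ub=[np.abs(y_true).sum()],
              A_eq=A_eq, b_eq=z, method="highs")
x_hat = res.x[:d] - res.x[d:2 * d]
y_hat = res.x[2 * d:3 * d] - res.x[3 * d:]
```

For two 2-sparse components in R^40 the descent-cone statistical dimensions sum to roughly 16 ≪ 40, so the random rotation U puts the experiment deep in the success region.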

SLIDE 31

Geometry of Demixing Problems

[Figure: demixing succeeds when the shifted cones x♮ + D(f, x♮) and x♮ − UD(g, y♮) meet only at x♮, and fails when they share other points.]

References: McCoy & Tropp 2012, Amelunxen et al. 2013, McCoy & Tropp 2013

SLIDE 32

Demixing Problems with Random Incoherence

Theorem 3. [MT12; ALMT13] Assume
❧ The vectors x♮ ∈ Rd and y♮ ∈ Rd are unknown
❧ The observation z = x♮ + Qy♮ where Q is a random orthogonal matrix
❧ The pair (x̂, ŷ) solves  minimize f(x)  subject to g(y) ≤ g(y♮), x + Qy = z
Then
  δ(D(f, x♮)) + δ(D(g, y♮)) ≲ d  ⟹  (x̂, ŷ) = (x♮, y♮) whp
  δ(D(f, x♮)) + δ(D(g, y♮)) ≳ d  ⟹  (x̂, ŷ) ≠ (x♮, y♮) whp


SLIDE 33

Sparse + Sparse via ℓ1 + ℓ1 Minimization

[Figure: empirical phase transition for sparse + sparse demixing via ℓ1 + ℓ1 minimization.]


SLIDE 34

Low-Rank + Sparse via S1 + ℓ1 Minimization

[Figure: empirical phase transition for low-rank + sparse demixing via S1 + ℓ1 minimization.]


SLIDE 35

Statistical Dimension & Phase Transitions

❧ Key question: when do two randomly oriented cones strike?
❧ Intuition: when do two randomly oriented subspaces strike?

The Approximate Kinematic Formula. Let C and K be closed convex cones in Rd, and let Q be a random orthogonal matrix. Then
  δ(C) + δ(K) ≲ d  ⟹  P{C ∩ QK = {0}} ≈ 1
  δ(C) + δ(K) ≳ d  ⟹  P{C ∩ QK = {0}} ≈ 0

References: Amelunxen et al. 2013, McCoy & Tropp 2013
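The subspace intuition is easy to check numerically: random subspaces of dimensions a and b in R^d share a nonzero vector precisely when a + b > d (with probability one), mirroring the comparison of δ(C) + δ(K) with d for cones. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 30

def subspaces_strike(a, b):
    # Random a- and b-dimensional subspaces of R^d share a nonzero
    # vector iff the dimension of their sum is less than a + b,
    # i.e. iff rank([U V]) < a + b.
    U = rng.standard_normal((d, a))
    V = rng.standard_normal((d, b))
    return np.linalg.matrix_rank(np.hstack([U, V])) < a + b
```

Since a subspace's statistical dimension is its linear dimension, this reproduces the kinematic dichotomy exactly: a + b ≤ d means a miss, a + b > d means a strike.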

SLIDE 36

To learn more...

E-mail: jtropp@cms.caltech.edu
Web: http://users.cms.caltech.edu/~jtropp

Main Papers Discussed:

❧ Chandrasekaran, Recht, et al., “The convex geometry of linear inverse problems.” FOCM 2012
❧ MT, “Sharp recovery bounds for convex deconvolution, with applications.” FOCM 2014
❧ ALMT, “Living on the edge: A geometric theory of phase transitions in convex optimization.” I&I 2014
❧ Oymak & Hassibi, “Asymptotically exact denoising in relation to compressed sensing.” arXiv cs.IT 1305.2714
❧ MT, “From Steiner formulas for cones to concentration of intrinsic volumes.” DCG 2014
❧ MT, “The achievable performance of convex demixing.” arXiv cs.IT 1309.7478
❧ More to come!
