Beyond Compressed Sensing: The Effectiveness of Convex Programming in the Information and Physical Sciences
Emmanuel Candès, EUSIPCO 2015, Nice, September 2015
Three stories about signal recovery from missing information
Today I want to tell you three stories from my life. That's it. No big deal. Just three stories. (Steve Jobs)

(1) Missing phase (phase retrieval)
(2) Missing and/or corrupted entries in a data matrix (robust PCA)
(3) Missing high-frequency spectrum (super-resolution)

Missing information makes signal/data recovery difficult.
Convex programming usually (but not always) returns the right answer!
Collaborators: Y. Eldar, X. Li, T. Strohmer, V. Voroninski
X-ray crystallography: a method for determining atomic structure within a crystal. [Figures: principle; typical setup.] 10 Nobel Prizes in X-ray crystallography, and counting...
[Figure: Franklin's photograph, the diffraction pattern of DNA]
Detectors record intensities of diffracted rays → phaseless data only! Fraunhofer diffraction gives the intensity of the electrical field, $|\hat{x}(f_1, f_2)|^2 = \left| \int x(t_1, t_2)\, e^{-i2\pi(f_1 t_1 + f_2 t_2)}\, dt_1\, dt_2 \right|^2$
How can we recover the phase (or, equivalently, the signal $x(t_1, t_2)$) from $|\hat{x}(f_1, f_2)|$?
Dierolf (2010)
Imaging single large protein complexes
Phaseless measurements about $x_0 \in \mathbb{C}^n$: $b_k = |\langle a_k, x_0 \rangle|^2$, $k \in \{1, \ldots, m\} = [m]$

Phase retrieval is a feasibility problem: find $x$ subject to $|\langle a_k, x \rangle|^2 = b_k$, $k \in [m]$

Solving quadratic equations is NP-hard in general

1985 Nobel Prize for Hauptman & Karle: use very specific prior knowledge

Standard approach: the Gerchberg–Saxton (or Fienup) iterative algorithm. Sometimes works well; sometimes does not (a sketch of the idea follows below).
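For concreteness, here is a minimal sketch of the alternating-projection idea behind these iterative methods, written for the generic model $b_k = |\langle a_k, x \rangle|^2$ rather than the Fourier-magnitude setting; the instance sizes, iteration count, and random data are illustrative assumptions, not from the talk.

```python
import numpy as np

# Alternating-projection sketch (Gerchberg-Saxton / error-reduction style):
# alternate between matching the measured magnitudes and staying consistent
# with the linear model. As the talk notes, this sometimes stagnates.
rng = np.random.default_rng(0)
n, m = 32, 256
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x0) ** 2

A_pinv = np.linalg.pinv(A)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # random initializer
for _ in range(500):
    z = A @ x
    z = np.sqrt(b) * np.exp(1j * np.angle(z))  # impose magnitudes, keep phases
    x = A_pinv @ z                             # least-squares consistency step

phase = np.vdot(x, x0)
phase /= abs(phase)  # recovery is only defined up to a global phase
print("relative error:", np.linalg.norm(x * phase - x0) / np.linalg.norm(x0))
```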
Lifting: set $X = xx^*$, so that $b_k = |\langle a_k, x \rangle|^2 = a_k^* x x^* a_k = a_k^* X a_k$

This turns quadratic measurements into linear measurements $b := \mathcal{A}(X)$ about $xx^*$:

find $X$ subject to $\mathcal{A}(X) = b$, $X \succeq 0$, $\operatorname{rank}(X) = 1$
$\iff$ minimize $\operatorname{rank}(X)$ subject to $\mathcal{A}(X) = b$, $X \succeq 0$ (combinatorially hard)

Instead: minimize $\operatorname{Tr}(X)$ subject to $\mathcal{A}(X) = b$, $X \succeq 0$

This is a semidefinite program (SDP); the trace is a convex proxy for rank. (A solver sketch follows below.)
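A hedged sketch of this lifted trace-minimization program using cvxpy, a generic convex-modeling tool (not the solver behind the talk's experiments); the small random Gaussian instance is an illustrative assumption that keeps the SDP cheap.

```python
import numpy as np
import cvxpy as cp

# PhaseLift-style SDP: minimize Tr(X) s.t. a_k^* X a_k = b_k and X >= 0.
rng = np.random.default_rng(1)
n, m = 10, 60
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x0) ** 2

X = cp.Variable((n, n), hermitian=True)
constraints = [X >> 0]
constraints += [cp.real(A[k].conj() @ X @ A[k]) == b[k] for k in range(m)]
cp.Problem(cp.Minimize(cp.real(cp.trace(X))), constraints).solve()

# If X is (numerically) rank one, its top eigenpair recovers x0 up to phase.
w, V = np.linalg.eigh(X.value)
xhat = np.sqrt(max(w[-1], 0)) * V[:, -1]
phase = np.vdot(xhat, x0)
phase /= abs(phase)
print("top two eigenvalues:", w[-1], w[-2])
print("relative error:", np.linalg.norm(xhat * phase - x0) / np.linalg.norm(x0))
```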
SDPs are a special class of convex optimization problems: a relatively natural extension of linear programming (LP), with 'efficient' numerical solvers (interior-point methods).
LP (standard form): minimize $\langle c, x \rangle$ subject to $a_k^T x = b_k$, $k = 1, \ldots, m$, and $x \geq 0$

SDP (standard form): minimize $\langle C, X \rangle$ subject to $\langle A_k, X \rangle = b_k$, $k = 1, \ldots, m$, and $X \succeq 0$, with the standard inner product $\langle C, X \rangle = \operatorname{Tr}(C^* X)$
Relaxation of quadratically constrained QPs: Shor ('87) [lower bounds on nonconvex quadratic optimization problems]; Goemans and Williamson ('95) [MAX-CUT]; Ben-Tal and Nemirovskii ('01) [monograph]; ... Similar approach for array imaging: Chai, Moscoso, Papanicolaou ('11)
Quadratic equations $b = \mathcal{A}(xx^*)$: $b_k = |\langle a_k, x \rangle|^2$, $k \in [m]$. After lifting, $b = \mathcal{A}(X)$: minimize $\operatorname{Tr}(X)$ subject to $\mathcal{A}(X) = b$, $X \succeq 0$. The system is highly underdetermined ($m \ll n^2$).

Simplest model: $a_k$ independently and uniformly sampled on the unit sphere.

Theorem: Assume $m \gtrsim n$. With probability $1 - O(e^{-\gamma m})$, for all $x \in \mathbb{C}^n$, the only point in the feasible set $\{X : \mathcal{A}(X) = b \text{ and } X \succeq 0\}$ is $xx^*$.

Injectivity if $m \geq 4n - 2$ (Balan, Bodmann, Casazza, Edidin '09)
Random masking + diffraction. Similar theory: C., Li and Soltanolkotabi ('13)
(a) Smooth signal (real part) (b) Random signal (real part)
Figure: Recovery (with reweighting) of n-dimensional complex signal (2n unknowns) from 4n quadratic measurements (random binary masks)
How can the feasible set $\{X \succeq 0\} \cap \{\mathcal{A}(X) = b\}$ have a unique point? [Figure: the intersection of the PSD cone with an affine slice.] Rank-1 matrices lie on the boundary (extreme rays) of the PSD cone.
[Figures: reconstructions from 3 Gaussian masks; reconstruction and error with 8 binary masks]
Low SNR (20 dB) vs. high SNR (60 dB): reconstructions from noisy data using 32 Gaussian random masks. Similar output with 16 masks.
Figure: Relative MSE vs. signal-to-noise ratio on dB scale (binary masks)
Collaborators: X. Li, Y. Ma, J. Wright
M = L + S
M: data matrix (observed)
L: low-rank (unobserved)
S: sparse (unobserved)
Again, missing information
PCA is sensitive to outliers: it breaks down with one (badly) corrupted data point (❆ marks a corrupted entry):

$\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{d1} & x_{d2} & \cdots & x_{dn} \end{pmatrix} \Longrightarrow \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{d1} & ❆ & \cdots & x_{dn} \end{pmatrix}$
Data are increasingly high dimensional, and gross errors frequently occur in many applications.
Applications: image processing, web data analysis, bioinformatics, ...
Error sources: occlusions, malicious tampering, sensor failures, ...
$\begin{pmatrix} ❆ & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & ❆ \\ \vdots & \vdots & \ddots & \vdots \\ x_{d1} & ❆ & \cdots & x_{dn} \end{pmatrix}$ With many corrupted entries, it is important to make PCA robust.
Movies × users matrix (× = observed, ❆ = tampered): observe corrupted entries $Y_{ij} = L_{ij} + S_{ij}$, $(i, j) \in \Omega_{\text{obs}}$
L: low-rank matrix
S: entries that have been tampered with (impulsive noise)
Goal: recover L from missing and corrupted samples
When is the decomposition M = L + S well posed?

The sparse component cannot be low rank: its sparsity pattern will be assumed (uniformly) random.

The low-rank component cannot be sparse.
Example: take $L$ of rank one with every row equal to $(x_1, x_2, \ldots, x_n)$. If $S$ corrupts the first column, every row of $L + S$ reads $(❆, x_2, \ldots, x_n)$ and $x_1$ cannot be recovered.
Incoherence condition [C. and Recht ('08)]: the column and row spaces are not aligned with the coordinate axes (there cannot be small subsets of rows and/or columns that are singular).
M = L + S
L unknown (rank unknown)
S unknown (number of nonzero entries, their locations, and magnitudes all unknown)
minimize $\|\hat{L}\|_* + \lambda \|\hat{S}\|_1$ subject to $\hat{L} + \hat{S} = M$. See also Chandrasekaran, Sanghavi, Parrilo, Willsky ('09).

Nuclear norm: $\|L\|_* = \sum_i \sigma_i(L)$ (sum of singular values)
$\ell_1$ norm: $\|S\|_1 = \sum_{ij} |S_{ij}|$ (sum of absolute values)
minimize $\|\hat{L}\|_* + \lambda \|\hat{S}\|_1$ subject to $\hat{L} + \hat{S} = M$

Theorem: Suppose $L$ is $n \times n$ with $\operatorname{rank}(L) \leq \rho_r n (\log n)^{-2}$ and incoherent, and $S$ is $n \times n$ with a random sparsity pattern of cardinality at most $\rho_s n^2$. Then with probability $1 - O(n^{-10})$, the SDP with $\lambda = 1/\sqrt{n}$ is exact: $\hat{L} = L$, $\hat{S} = S$.

Same conclusion for rectangular matrices with $\lambda = 1/\sqrt{\max \dim}$. No tuning parameter! Whatever the magnitudes of L and S. (A sketch follows below.)
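A minimal sketch of this principal-component-pursuit program in cvxpy with the theorem's $\lambda = 1/\sqrt{n}$; the toy sizes, rank, and 5% corruption rate are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

# min ||L||_* + lambda ||S||_1  s.t.  L + S = M, with lambda = 1/sqrt(n).
rng = np.random.default_rng(2)
n, r = 40, 2
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / n  # low rank
S0 = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05              # ~5% random corruptions
S0[mask] = rng.standard_normal(mask.sum())    # arbitrary magnitudes
M = L0 + S0

L = cp.Variable((n, n))
S = cp.Variable((n, n))
lam = 1 / np.sqrt(n)
cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S))),
           [L + S == M]).solve()
print("rel. error in L:", np.linalg.norm(L.value - L0) / np.linalg.norm(L0))
```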
(a) RPCA, Random Signs (b) RPCA, Coherent Signs (c) Matrix Completion
$L = XY^T$ is a product of independent $n \times r$ i.i.d. $N(0, 1/n)$ matrices
With missing data, the same program is solved with the constraint restricted to observed entries: minimize $\|\hat{L}\|_* + \lambda \|\hat{S}\|_1$ subject to $\hat{L}_{ij} + \hat{S}_{ij} = L_{ij} + S_{ij}$, $(i, j) \in \Omega_{\text{obs}}$. [Matrix with observed (×), corrupted (❆), and missing (?) entries.]

Same theorem: with high probability, $\lambda = 1/\sqrt{n}$ $\Longrightarrow$ $\hat{L} = L$! (Variant sketched below.)
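A sketch of the missing-data variant: the equality constraint is enforced only on observed entries, encoded here through a 0/1 mask $W$ (the 80% observation rate and toy sizes are assumptions for illustration).

```python
import numpy as np
import cvxpy as cp

# Missing-data RPCA: enforce L + S = M only on the observed set Omega_obs.
rng = np.random.default_rng(3)
n, r = 30, 2
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / n
S0 = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S0[mask] = rng.standard_normal(mask.sum())
M = L0 + S0
W = (rng.random((n, n)) < 0.8).astype(float)  # 1 = observed, 0 = missing

L = cp.Variable((n, n))
S = cp.Variable((n, n))
cp.Problem(cp.Minimize(cp.normNuc(L) + cp.sum(cp.abs(S)) / np.sqrt(n)),
           [cp.multiply(W, L + S) == W * M]).solve()
print("rel. error in L:", np.linalg.norm(L.value - L0) / np.linalg.norm(L0))
```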
Sequence of 200 video frames (144 × 172 pixels) with a static background. Problem: detect any activity in the foreground.

RPCA decomposes M = L + S into background (L) and foreground activity (S), with automatic and improved background suppression. [Comparison frames from GoDec.] (A toy sketch follows below.)
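As a usage sketch (a synthetic toy clip, not the surveillance sequence from the talk): each frame is vectorized into one column of $M$, and the same nuclear-plus-$\ell_1$ program splits background from activity. The foreground pattern here is structured rather than random, so the split is only approximate.

```python
import numpy as np
import cvxpy as cp

# Background subtraction via RPCA on a tiny synthetic "video".
rng = np.random.default_rng(5)
h, w, T = 12, 14, 30
background = rng.random((h, w))                    # static scene
frames = np.repeat(background[None], T, axis=0)
for f in range(10, 20):                            # an object drifting by
    frames[f, 4:8, f - 6] += 1.0
M = frames.reshape(T, h * w).T                     # pixels x frames

L = cp.Variable(M.shape)
S = cp.Variable(M.shape)
lam = 1 / np.sqrt(max(M.shape))
cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S))),
           [L + S == M]).solve()

# Columns of S with large entries flag frames containing foreground motion.
active = np.unique(np.nonzero(np.abs(S.value) > 0.5)[1])
print("frames with detected activity:", active)
```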
Dynamic MRI (joint with R. Otazo and D. Sodickson): minimize $\|L\|_* + \lambda \|S\|_1$ subject to $\mathcal{A}(L + S) = y$, achieving a 12.8-fold acceleration. [Figures: NUFFT vs. standard L + S vs. motion-guided L + S reconstructions; standard L + S shows temporal blurring.]
Collaborator: C. Fernandez-Granda
In any optical imaging system, diffraction imposes a fundamental limit on resolution.
The physical phenomenon called diffraction is of the utmost importance in the theory of optical imaging systems (Joseph Goodman)
We are interested in the usual bandlimited imaging systems. [Figures: pupil; Airy disk; cross section; Lord Rayleigh.]
Retrieve fine-scale information from low-pass data; equivalent description: extrapolate the spectrum. A fundamental problem in radar, microscopy, spectroscopy, medical imaging, astronomy, geophysics, ...
Microscope receives light from fluorescent molecules
Resolution is much coarser than the size of individual molecules (low-pass data). Can we 'beat' the diffraction limit and super-resolve those molecules? Higher molecule density → faster imaging.
Signal: $x = \sum_j a_j \delta_{\tau_j}$, $a_j \in \mathbb{C}$, $\tau_j \in [0, 1]$

Data $y = \mathcal{F}_n x$: $n = 2 f_{lo} + 1$ low-frequency coefficients (Nyquist sampling), $y(k) = \int_0^1 e^{-i2\pi k t}\, x(dt) = \sum_j a_j e^{-i2\pi k \tau_j}$, $k \in \mathbb{Z}$, $|k| \leq f_{lo}$

Resolution limit: $1/f_{lo} = \lambda_{lo}$ ($\lambda_{lo}/2$ is the Rayleigh distance)
Can we resolve the signal beyond this limit?
Low-frequency data about spike train
minimize $\|\tilde{x}\|_{TV}$ subject to $\mathcal{F}_n \tilde{x} = y$, where the total-variation norm $\|x\|_{TV} = \sum_t |x_t|$ is the continuous analogue of the $\ell_1$ norm: $x = \sum_j a_j \delta_{\tau_j} \Longrightarrow \|x\|_{TV} = \sum_j |a_j|$. (A discretized sketch follows below.)

Work on $\ell_1$ minimization: Logan, Donoho, Stark, Tropp, Elad, C., Tao, ...
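On a fine grid the TV norm of a discrete measure reduces to the $\ell_1$ norm of its amplitude vector, which gives a simple discretized sketch; grid size, cutoff $f_{lo}$, spike locations, and the real amplitudes are all illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

# Discretized TV minimization: l1 recovery of spikes from low-pass Fourier data.
f_lo, N = 20, 512                           # cutoff and grid size (1/N spacing)
t = np.arange(N) / N
k = np.arange(-f_lo, f_lo + 1)              # n = 2 f_lo + 1 frequencies
F = np.exp(-2j * np.pi * np.outer(k, t))    # low-pass Fourier matrix

support = np.array([60, 160, 300, 420])     # separations well above 1.86/f_lo
amplitudes = np.array([1.0, -2.0, 1.5, 0.5])
x0 = np.zeros(N)
x0[support] = amplitudes
y = F @ x0                                  # n noiseless low-frequency samples

x = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm1(x)), [F @ x == y]).solve()
print("recovered support:", np.nonzero(np.abs(x.value) > 1e-4)[0])
```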
$y(k) = \int_0^1 e^{-i2\pi k t}\, x(dt)$, $|k| \leq f_{lo}$

Theorem: If the spikes are separated by at least $1.86/f_{lo} := 1.86\, \lambda_{lo}$, then the min-TV solution is exact! (Current state of the art: $1.4\, \lambda_{lo}$)
Infinite precision! (Whatever the amplitudes)
Cannot go below $\lambda_{lo}$
Can recover $(2\lambda_{lo})^{-1} = f_{lo}/2 = n/4$ spikes from $n$ low-frequency samples
Essentially the same result in higher dimensions
Primal problem: minimize $\|x\|_{TV}$ s.t. $\mathcal{F}_n x = y$ (infinite-dimensional variable $x$, finitely many constraints)

Dual problem: maximize $\operatorname{Re}\langle y, c \rangle$ s.t. $\|\mathcal{F}_n^* c\|_\infty \leq 1$ (finite-dimensional variable $c$, infinitely many constraints), where $(\mathcal{F}_n^* c)(t) = \sum_k c_k e^{i2\pi k t}$
$|(\mathcal{F}_n^* c)(t)| \leq 1$ for all $t \in [0, 1]$ is equivalent to: (1) there is a Hermitian $Q$ such that $\begin{pmatrix} Q & c \\ c^* & 1 \end{pmatrix} \succeq 0$; (2) the diagonal of $Q$ sums to one; (3) the sums along superdiagonals vanish: $\sum_{i=1}^{n-j} Q_{i,i+j} = 0$ for $1 \leq j \leq n-1$.
$c$: coefficients of a low-pass trigonometric polynomial $\sum_k c_k e^{i2\pi k t}$ interpolating the sign of the primal solution. (An SDP sketch follows below.)
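A sketch of the dual in this semidefinite form, mirroring the setup of the discretized sketch above; packaging $(Q, c)$ into one Hermitian variable $Z$ is an implementation convenience, not notation from the talk. By strong duality the optimal value should match $\sum_j |a_j|$, and the dual polynomial should reach magnitude one at the true spikes.

```python
import numpy as np
import cvxpy as cp

# Dual of min TV: maximize Re<y, c> subject to |(F_n^* c)(t)| <= 1 on [0, 1],
# written via the (Q, c) semidefinite characterization stated above.
f_lo, N = 20, 512
t = np.arange(N) / N
k = np.arange(-f_lo, f_lo + 1)
n = k.size
F = np.exp(-2j * np.pi * np.outer(k, t))
support = np.array([60, 160, 300, 420])
amplitudes = np.array([1.0, -2.0, 1.5, 0.5])
x0 = np.zeros(N)
x0[support] = amplitudes
y = F @ x0

Z = cp.Variable((n + 1, n + 1), hermitian=True)   # Z = [[Q, c], [c*, 1]]
Q, c = Z[:n, :n], Z[:n, n]
constraints = [Z >> 0, Z[n, n] == 1]
constraints += [sum(Q[i, i + j] for i in range(n - j)) == (1 if j == 0 else 0)
                for j in range(n)]
prob = cp.Problem(cp.Maximize(cp.real(y.conj() @ c)), constraints)
prob.solve()

poly = F.conj().T @ c.value   # evaluates (F_n^* c)(t) on the grid
print("dual value:", prob.value, "vs TV norm:", np.abs(amplitudes).sum())
print("|dual poly| at spikes:", np.round(np.abs(poly)[support], 3))
```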
[Figures: signal vs. estimate, from noiseless and noisy data]
Noisy variants, same templates:
Phase retrieval: minimize $\|b - \mathcal{A}(X)\|_1$ subject to $X \succeq 0$
Robust PCA: minimize $\|L\|_* + \lambda \|S\|_1 + \gamma \|Y - L - S\|_F^2$
Super-resolution: minimize $\|x\|_{TV}$ subject to $\|y - \mathcal{F}_n x\|_2^2 \leq \sigma^2$
Stability in all cases, sometimes near the information-theoretic limit
Noise, algorithms, applications: an avalanche of related works
Phase retrieval
Netrapalli, Jain, Sanghavi, Phase retrieval using alternating minimization (’13) Waldspurger, d’Aspremont, Mallat, Phase recovery, MaxCut and complex semidefinite programming (’12)
Robust PCA
Gross, Recovering low-rank matrices from few coefficients in any basis (’09) Chandrasekaran, Parrilo and Willsky, Latent variable graphical model selection via convex optimization (’11) Hsu, Kakade and Zhang, Robust matrix decomposition with outliers (’11)
Super-resolution
Kahane, Analyse et synthèse harmoniques ('11); Slepian, Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V - The discrete case ('78)
Three important problems with missing data
Phase retrieval Matrix completion/RPCA Super-resolution
Three simple and model-free recovery procedures via convex programming. Three near-perfect solutions, three theorems. Three fundamental modeling/computational tools: matrices, probability, (convex) optimization.
CS: sparse signals are 'away' from the null space of the sampling operator. Super-res: this is not the case. [Figures: signal and spectrum, log scale.]
David Slepian: if the distance between spikes is less than $\lambda_{lo}/2$ (Rayleigh), the problem is hopelessly ill posed.