SLIDE 1

Structured random measurements in compressed sensing

Holger Rauhut
Lehrstuhl C für Mathematik (Analysis), RWTH Aachen
Winter School on Compressed Sensing, TU Berlin, December 3–5, 2014

SLIDE 2

Too Few Data

Often it is hard, expensive, time consuming, or otherwise difficult to collect enough measurements in order to reconstruct/acquire a signal.

◮ Magnetic resonance imaging
◮ Wireless communication
◮ Radar

SLIDE 3

Compressive sensing

Reconstruction of signals from a minimal amount of measured data. Key ingredients:

◮ Compressibility / sparsity (small complexity of the relevant information)
◮ Efficient algorithms (convex optimization)
◮ Randomness (random matrices)

Here: structured random matrices

SLIDE 4

Sparsity / Compressibility

Data compression: Most types of signals can be represented well by a sparse expansion, i.e., with only a few nonzero coefficients in an appropriate basis (JPEG, MPEG, MP3, etc.).

Compressive sensing / sparse recovery: Sparse / compressible signals can be recovered from only a few linear measurements via efficient algorithms.

SLIDE 5

[Figure: Fourier coefficients of a time-domain signal with 30 samples; traditional reconstruction vs. sparse recovery method.]

SLIDE 6

Mathematical formulation

Recover a vector x ∈ C^N from underdetermined linear measurements

    y = Ax,  A ∈ C^{m×N},  where m ≪ N.

Key finding of compressive sensing: recovery is possible if x belongs to a set of low complexity.

◮ Standard compressive sensing: sparsity (small number of nonzero coefficients)
◮ Low-rank matrix recovery
◮ Phase retrieval
◮ Low-rank tensor recovery (only partial results available so far)

SLIDE 7

Sparsity and Compressibility

◮ Coefficient vector: x ∈ C^N, N ∈ ℕ
◮ Support of x: supp x := {j : x_j ≠ 0}
◮ s-sparse vectors: ‖x‖_0 := |supp x| ≤ s

s-term approximation error:

    σ_s(x)_q := inf{‖x − z‖_q : z is s-sparse},  0 < q ≤ ∞.

x is called compressible if σ_s(x)_q decays quickly in s. Recall ‖x‖_q = (Σ_{j=1}^N |x_j|^q)^{1/q}.

Stechkin estimate, for p < q:

    σ_s(x)_q ≤ s^{1/q−1/p} ‖x‖_p.

The unit balls B^N_p = {x ∈ C^N : ‖x‖_p ≤ 1}, 0 < p ≤ 1, are good models for compressible signals.
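
The s-term approximation error has a closed form: the ℓ_q norm of everything outside the s largest-magnitude entries. A minimal Python sketch (not from the slides; the function name is illustrative):

```python
import numpy as np

def sigma_s(x: np.ndarray, s: int, q: float = 1.0) -> float:
    """Best s-term approximation error sigma_s(x)_q of a vector x."""
    order = np.argsort(np.abs(x))[::-1]      # indices by decreasing magnitude
    tail = np.abs(x[order[s:]])              # entries outside the best support
    return float(np.sum(tail ** q) ** (1.0 / q))

x = np.array([5.0, 0.01, -3.0, 0.02, 1.0, 0.0])
print(sigma_s(x, s=2, q=1.0))                # 1.03: x is nearly 2-sparse
```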

SLIDE 8

Compressive Sensing Problem

Reconstruct an s-sparse vector x ∈ C^N (or a compressible vector) from its vector y of m measurements, y = Ax, A ∈ C^{m×N}. Interesting case: s < m ≪ N. Preferably with a fast reconstruction algorithm!

SLIDE 9

ℓ1-minimization

ℓ0-minimization is NP-hard:

    min_{x∈C^N} ‖x‖_0  subject to  Ax = y.

ℓ1-minimization:

    min_x ‖x‖_1  subject to  Ax = y.

Convex relaxation of the ℓ0-minimization problem. Efficient minimization methods available.

Alternatives:
◮ Greedy algorithms (matching pursuits)
◮ Iterative hard thresholding
◮ Iteratively reweighted least squares
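
A hedged sketch (not the speaker's code) of the real-valued case: basis pursuit becomes a linear program via the split z = u − v with u, v ≥ 0, so the objective Σ(u_j + v_j) is linear.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """min ||z||_1 s.t. Az = y, as an LP in (u, v) with z = u - v."""
    m, N = A.shape
    c = np.ones(2 * N)                            # objective sum(u) + sum(v)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(0)
N, m, s = 100, 40, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)      # Gaussian measurements
x = np.zeros(N)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
print(np.linalg.norm(x - basis_pursuit(A, A @ x)))  # ~0: exact recovery
```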

SLIDE 10

A typical result in compressive sensing

For a draw of a Gaussian random matrix A ∈ R^{m×N}, an s-sparse vector x ∈ R^N can be recovered via ℓ1-minimization (and other algorithms) with high probability from y = Ax, provided

    m ≥ C s ln(eN/s).

Similar results hold for certain structured random matrices (random partial Fourier, partial random circulant, ...).

SLIDE 11

Recovery conditions

SLIDE 12

Uniform vs. nonuniform recovery

Often recovery results are for random matrices A ∈ R^{m×N}.

◮ Uniform recovery
  With high probability on A, every sparse vector (low-rank matrix, etc.) is recovered:
      P(∀ s-sparse x: recovery of x is successful using A) ≥ 1 − ε.
  Recovery conditions on A:
  ◮ Null space property
  ◮ Restricted isometry property

◮ Nonuniform recovery
  A fixed sparse vector (low-rank matrix, etc.) is recovered with high probability using A ∈ R^{m×N}:
      ∀ s-sparse x: P(recovery of x is successful using A) ≥ 1 − ε.
  Recovery conditions on A:
  ◮ Tangent cone (descent cone) of the norm at x intersects ker A trivially
  ◮ Dual certificates

SLIDE 13

Recovery via tangent cones

For x ∈ R^N, consider the optimization problem

    min ‖z‖  subject to  Az = Ax.

Tangent cone (descent cone):

    T(x) = cone{z − x : z ∈ R^N, ‖z‖ ≤ ‖x‖}.

Theorem
x is recovered if ker A ∩ T(x) = {0}.

SLIDE 14

Recovery from Gaussian random matrices

Entries of A ∈ R^{m×N}: independent standard Gaussian random variables. Recovery guarantees via the Gaussian width: for a set T ⊂ R^N and a standard Gaussian vector g ∈ R^N,

    ℓ(T) = E sup_{x∈T} ⟨x, g⟩.

Theorem (Chandrasekaran, Parrilo, Recht, Willsky 2010)
A vector x ∈ R^N is recovered from y = Ax via min ‖z‖ s.t. Az = y with high probability if

    m ≳ ℓ(T(x) ∩ S^{N−1})².

Based on Gordon's escape through a mesh theorem (Gordon's comparison theorem and concentration of measure for Lipschitz functions). For s-sparse vector recovery via ℓ1-minimization: m ≳ 2s ln(eN/s).

SLIDE 15

Dual certificate

Theorem (Fuchs 2004, Tropp 2005)
Let A ∈ C^{m×N}. A vector x ∈ C^N with support S is the unique solution of min ‖z‖_1 subject to Az = Ax if one of the following equivalent conditions holds:

(a) |Σ_{j∈S} sgn(x_j) v_j| < ‖v_{S^c}‖_1 for all v ∈ ker A \ {0};

(b) A_S is injective and there exists a dual vector h ∈ C^m such that

    (A*h)_j = sgn(x_j), j ∈ S,   |(A*h)_ℓ| < 1, ℓ ∉ S.

SLIDE 16

Dual certificate (II)

Corollary
Let a_1, ..., a_N be the columns of A ∈ C^{m×N}. For x ∈ C^N with support S, if the matrix A_S is injective and if

    |⟨A_S† a_ℓ, sgn(x_S)⟩| < 1  for all ℓ ∉ S,

then x is the unique ℓ1-minimizer with y = Ax. Here, A_S† is the Moore–Penrose pseudo-inverse of A_S.

One ingredient: check that A_S is well-conditioned, i.e.,

    ‖A_S* A_S − I‖_{2→2} ≤ δ < 1.

SLIDE 17

Null space property

The null space property is necessary and sufficient for exact recovery of all s-sparse vectors via ℓ1-minimization with A:

    ‖v_S‖_1 ≤ ρ ‖v_{S^c}‖_1  for all v ∈ ker A and all S ⊂ {1, ..., N}, |S| = s,

for some 0 < ρ < 1. It also implies stability of the reconstruction:

    ‖x − x♯‖_1 ≤ (2(1 + ρ)/(1 − ρ)) σ_s(x)_1.

SLIDE 18

Restricted Isometry Property (RIP)

Definition
The restricted isometry constant δ_s of a matrix A ∈ C^{m×N} is defined as the smallest δ such that

    (1 − δ) ‖x‖_2² ≤ ‖Ax‖_2² ≤ (1 + δ) ‖x‖_2²

for all s-sparse x ∈ C^N. It requires that all s-column submatrices of A are well-conditioned.
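
The definition can be checked directly on tiny instances: δ_s is the worst conditioning of an s-column submatrix. A brute-force sketch (illustrative only; computing δ_s exactly is NP-hard in general):

```python
import numpy as np
from itertools import combinations

def rip_constant(A: np.ndarray, s: int) -> float:
    """delta_s = max over |S| = s of ||A_S^* A_S - I||_{2->2}."""
    delta = 0.0
    for S in combinations(range(A.shape[1]), s):
        eigs = np.linalg.eigvalsh(A[:, S].conj().T @ A[:, S])
        delta = max(delta, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))
    return delta

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 12)) / np.sqrt(20)   # normalized Gaussian matrix
print(rip_constant(A, s=3))                       # typically well below 1
```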

SLIDE 19

RIP implies recovery by ℓ1-minimization

Theorem (Candès, Romberg, Tao '04 – Candès '08 – Foucart, Lai '09 – Foucart '09/'12 – Li, Mo '11 – Andersson, Strömberg '12 – Cai, Zhang '13)
Assume that the restricted isometry constant of A ∈ C^{m×N} satisfies δ_{2s} < 1/√2 ≈ 0.7071. Then ℓ1-minimization reconstructs every s-sparse vector x ∈ C^N from y = Ax.

Extends to low-rank matrix recovery via nuclear norm minimization.

SLIDE 20

Stability

Theorem (Candès, Romberg, Tao '04 – Candès '08 – Foucart, Lai '09 – Foucart '09/'12 – Li, Mo '11 – Andersson, Strömberg '12 – Cai, Zhang '13)
Let A ∈ C^{m×N} with δ_{2s} < 1/√2 ≈ 0.7071. Let x ∈ C^N, and assume that noisy data are observed, y = Ax + η with ‖η‖_2 ≤ σ. Let x# be a solution of

    min_z ‖z‖_1  such that  ‖Az − y‖_2 ≤ σ.

Then

    ‖x − x#‖_2 ≤ C σ_s(x)_1/√s + D σ   and   ‖x − x#‖_1 ≤ C σ_s(x)_1 + D √s σ

for constants C, D > 0 that depend only on δ_{2s}.

SLIDE 21

Matrices satisfying the RIP

Open problem: give explicit matrices A ∈ C^{m×N} with small δ_{2s} ≤ 0.7 for "large" s. Goal: δ_s ≤ δ if m ≥ C_δ s ln^α(N), for constants C_δ and α. Deterministic matrices are known for which m ≥ C_{δ,k} s² suffices if N ≤ m^k. Way out: consider random matrices.

SLIDE 22

Random matrices

SLIDE 23

RIP for Gaussian and Bernoulli matrices

Gaussian: entries of A independent N(0, 1) random variables.
Bernoulli: entries of A independent Bernoulli ±1 distributed random variables.

Theorem
Let A ∈ R^{m×N} be a Gaussian or Bernoulli random matrix and assume

    m ≥ C δ^{−2} (s ln(eN/s) + ln(2ε^{−1}))

for a universal constant C > 0. Then with probability at least 1 − ε the restricted isometry constant of (1/√m) A satisfies δ_s ≤ δ.

Consequence: recovery via ℓ1-minimization with probability exceeding 1 − e^{−cm}, provided m ≥ C s ln(eN/s). The bound is optimal, as follows from the lower bound for Gelfand widths of ℓ_p-balls, 0 < p ≤ 1 (Gluskin, Garnaev 1984 – Foucart, Pajor, R, Ullrich 2010).

SLIDE 24

Structured Random Measurements

SLIDE 25

Structured Random Measurements

Why structure?

◮ Applications impose structure due to physical constraints; there is limited freedom to inject randomness.
◮ Fast matrix–vector multiplies (FFT) in recovery algorithms; unstructured random matrices are impracticable for large-scale applications.
◮ Storage problems for unstructured matrices.

SLIDE 26

Random Sampling

SLIDE 27

Random Sampling of Sparse Trigonometric Polynomials

Consider a trigonometric polynomial

    f(t) = Σ_{k∈Γ} x_k e^{2πi k·t},  t ∈ [0, 1]^d,  Γ ⊂ Z^d,  |Γ| = N.

f is called s-sparse if x is s-sparse.

Task: reconstruct f from sample values f(t_ℓ), t_ℓ ∈ [0, 1]^d, ℓ = 1, ..., m.

Nonequispaced Fourier matrix: A_{ℓ,k} = e^{2πi k·t_ℓ}, ℓ = 1, ..., m, k ∈ Γ. Then y = (f(t_ℓ))_{ℓ=1}^m is given as y = Ax.

Choose the sampling points t_ℓ independently and uniformly at random from [0, 1]^d! Then A is a structured random matrix.
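
A sketch of this setup under the assumptions d = 1 and Γ = {0, ..., N−1}, with recovery by orthogonal matching pursuit (one of the greedy alternatives from slide 9); the parameters mirror the numerical example on slide 30, where recovery succeeds with high probability:

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, s = 81, 25, 6
t = rng.uniform(0.0, 1.0, m)                        # random sampling points
A = np.exp(2j * np.pi * np.outer(t, np.arange(N)))  # nonequispaced Fourier

x = np.zeros(N, dtype=complex)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ x                                           # samples y_l = f(t_l)

def omp(A, y, s):
    """Greedy recovery: pick the best-correlated column s times."""
    residual, support = y.copy(), []
    for _ in range(s):
        support.append(int(np.argmax(np.abs(A.conj().T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1], dtype=complex)
    x_hat[support] = coef
    return x_hat

print(np.linalg.norm(x - omp(A, y, s)))             # small: recovery succeeds
```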

SLIDE 28

Restricted isometry property

Theorem (Candès, Tao '06 – Rudelson, Vershynin '06 – R '08/'10 – Bourgain '14 – Haviv, Regev '15)
Let A ∈ C^{m×N} be the random Fourier matrix generated with independent and uniformly distributed sampling points. If

    m ≥ C δ^{−2} s max{ln²(s) ln(N), ln(ε^{−1})},

then the restricted isometry constant of (1/√m) A satisfies δ_s ≤ δ with probability at least 1 − ε. The constant C > 0 is universal.

Implies stable recovery of all s-sparse trigonometric polynomials from m ≥ C s ln²(s) ln(N) random samples via ℓ1-minimization. Simplified condition: m ≥ C_δ s ln³(N) for the RIP to hold with probability at least 1 − N^{−ln²(N)}.

SLIDE 29

Nonuniform recovery

Theorem (Candès, Romberg, Tao '06 – R '07)
Let x ∈ C^N be s-sparse and A ∈ C^{m×N} the random Fourier matrix generated with independent and uniformly distributed sampling points. If m ≥ C s ln(N/ε), then ℓ1-minimization reconstructs x from y = Ax exactly with probability at least 1 − ε.

The proof uses combinatorial arguments. It can be generalized (and simplified) using Gross' golfing scheme (which yields m ≥ C s ln(N) ln(ε^{−1})).

SLIDE 30

Numerical Example

The real part of a sparse trigonometric polynomial with sparsity s = 6, N = 81 (maximal degree 40), and m = 25 random sampling points. Reconstruction by ℓ1-minimization is exact!

SLIDE 31

Application: Magnetic Resonance Imaging

Comparison of a traditional MRI reconstruction (left) and a compressive sensing reconstruction (right). Acquisition accelerated by a factor of 7.2 by random subsampling of the frequency domain.

Image courtesy of Michael Lustig and Shreyas Vasanawala, Stanford University.

SLIDE 32

Bounded orthonormal systems (BOS)

D ⊂ R^d endowed with a probability measure ν; φ_1, ..., φ_N : D → C a function system on D.

Orthonormality:

    ∫_D φ_j(t) φ̄_k(t) dν(t) = δ_{j,k} = {1 if j = k, 0 if j ≠ k}.

Uniform bound in L∞:

    ‖φ_j‖_∞ = sup_{t∈D} |φ_j(t)| ≤ K  for all j ∈ [N].

Example: Fourier system.

SLIDE 33

Sampling

Consider functions

    f(t) = Σ_{k=1}^N x_k φ_k(t),  t ∈ D.

f is called s-sparse if x is s-sparse. Sampling points t_1, ..., t_m ∈ D; sample values

    y_ℓ = f(t_ℓ) = Σ_{k=1}^N x_k φ_k(t_ℓ),  ℓ ∈ [m].

Sampling matrix A ∈ C^{m×N} with entries A_{ℓ,k} = φ_k(t_ℓ), ℓ ∈ [m], k ∈ [N]. Then y = Ax.

Choose the sampling points t_1, ..., t_m independently at random according to ν. Then A is a structured random matrix.

SLIDE 34

Restricted isometry property

Theorem (Candès, Tao '06 – Rudelson, Vershynin '06 – R '08/'10 – Bourgain '14 – Haviv, Regev '15)
Let A ∈ C^{m×N} be the random sampling matrix associated to a BOS with constant K ≥ 1, generated from independent random sampling points. If

    m ≥ C K² δ^{−2} s max{ln²(s) ln(N), ln(ε^{−1})},

then the restricted isometry constant of (1/√m) A satisfies δ_s ≤ δ with probability at least 1 − ε. The constant C > 0 is universal.

Implies stable recovery of all s-sparse expansions in the BOS from m ≥ C K² s ln²(s) ln(N) random samples via ℓ1-minimization. Simplified condition: m ≥ C_δ s ln³(N) for the RIP to hold with probability at least 1 − N^{−ln³(N)}.

SLIDE 35

Legendre Polynomials

Consider D = [−1, 1] with the normalized Lebesgue measure and the orthonormal system of Legendre polynomials φ_j = P_j, j = 0, ..., N − 1. It holds that ‖P_j‖_∞ = √(2j + 1), so K = √(2N − 1). The previous result yields only the (almost) trivial bound m ≥ C N s log²(s) log(m) log(N) > N. Can we do better?

SLIDE 36

Preconditioning (R, Ward 2012)

For an orthonormal system {φ_j} on D w.r.t. a probability measure ν, choose a positive weight function w on D with

    ∫_D w^{−2}(t) dν(t) = 1.

Then dµ(t) := w^{−2}(t) dν(t) defines a probability measure. Define ψ_j(t) = w(t) φ_j(t). Then

    ∫_D ψ_j(t) ψ̄_k(t) dµ(t) = ∫_D φ_j(t) w(t) φ̄_k(t) w(t) w^{−2}(t) dν(t) = δ_{j,k},

so that {ψ_j} is an ONS w.r.t. µ. If max_j ‖ψ_j‖_∞ = max_j ‖φ_j w‖_∞ ≤ K, then the random sampling matrix with sampling points chosen according to µ satisfies the RIP with high probability if

    m ≥ C δ^{−2} K² s ln²(s) ln(N).

Samples with respect to the preconditioned system:

    y′_ℓ = Σ_{k=1}^N x_k ψ_k(t_ℓ) = Σ_{k=1}^N x_k w(t_ℓ) φ_k(t_ℓ) = w(t_ℓ) y_ℓ.

Legendre polynomials: choose the Chebyshev weight w(x) = √(π/2) (1 − x²)^{1/4}. Then ‖w P_j‖_∞ ≤ √3 for all j ∈ ℕ_0.
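
A sketch of the preconditioned Legendre system under the stated choices (not code from the talk): Chebyshev-distributed sampling points, rows weighted by w, and a check that the matrix entries stay bounded by √3.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)
N, m = 30, 200
t = np.cos(np.pi * rng.uniform(0.0, 1.0, m))     # Chebyshev-distributed points
w = np.sqrt(np.pi / 2.0) * (1.0 - t**2) ** 0.25  # preconditioning weight

# L^2-normalized Legendre polynomials sqrt(2j+1) P_j, columns j = 0..N-1
V = legendre.legvander(t, N - 1) * np.sqrt(2 * np.arange(N) + 1)
A = w[:, None] * V / np.sqrt(m)                  # preconditioned sampling matrix

print(np.abs(np.sqrt(m) * A).max())              # bounded by sqrt(3) ~ 1.732
print(np.linalg.norm(A.T @ A - np.eye(N), 2))    # deviation shrinks as m grows
```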

SLIDE 37

Towards tomography: Sparsity in Wavelets

Simple image model: sparsity in a (discrete) wavelet basis W ∈ R^{N×N}, z = Wx for sparse x. Measurements w.r.t. randomly selected Fourier coefficients.

ONS: φ_j(t) = e^{2πij·t/N}, t ∈ Z_N, the columns of the Fourier matrix F. (Uniform) random selection operator R : C^N → C^m. Samples:

    y = RF*z = RF*Wx = RUx  with the new ONS U = F*W.

The entries U_{k,j} = ⟨w_k, φ_j⟩ satisfy max_{j,k} |U_{j,k}| ≍ √N (after correct normalization).

Preconditioning (Krahmer, Ward '14): use variable density sampling of the Fourier coefficients! Probabilities in 2D:

    p(j, k) = c / (1 + j² + k²).

The preconditioned matrix satisfies the RIP with high probability if m ≥ C δ^{−2} s ln³(s) log²(N).
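
A sketch of the variable density sampling step (the helper name is hypothetical): 2D frequencies drawn with probability proportional to 1/(1 + j² + k²), so low frequencies are sampled much more densely.

```python
import numpy as np

def variable_density_frequencies(n: int, m: int, rng) -> np.ndarray:
    """Draw m distinct frequencies (j, k) in {-n, ..., n}^2 from p(j, k)."""
    freqs = np.arange(-n, n + 1)
    J, K = np.meshgrid(freqs, freqs, indexing="ij")
    p = 1.0 / (1.0 + J**2 + K**2)
    p /= p.sum()                                  # normalize to a distribution
    flat = rng.choice(p.size, size=m, replace=False, p=p.ravel())
    return np.column_stack([J.ravel()[flat], K.ravel()[flat]])

rng = np.random.default_rng(4)
print(variable_density_frequencies(n=64, m=2000, rng=rng)[:5])
```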

SLIDE 38

Fourier recovery via total variation minimization

The relation of the total variation semi-norm to wavelet coefficients (Needell, Ward '13) leads to

Theorem (Krahmer, Ward '13)
Randomly choose m Fourier coefficients of a vector x indexed by {−N, ..., N} × {−N, ..., N} according to the probability distribution p(j, k) = c/(1 + j² + k²), j, k ∈ {−N, ..., N}. If

    m ≥ C s log³(s) log⁵(N),

then with high probability 2D-TV minimization recovers x up to the error

    ‖x − x♯‖_2 ≤ C ‖∇x − (∇x)_s‖_1 / √s,

where (∇x)_s is the best s-term approximation to the gradient ∇x.

SLIDE 39

Extension to weighted ℓ1-minimization (R, Ward '14)

Weighted sparsity: ‖x‖_{0,ω} = Σ_{j∈supp(x)} ω_j².

Recovery of weighted s-sparse vectors via weighted ℓ1-minimization:

    min Σ_{j=1}^N |z_j| ω_j  subject to  Az = y.

May promote smoothness in addition to sparsity! Recovery is guaranteed under a weighted version of the RIP. The weighted RIP holds with high probability if the ONS {φ_j} satisfies ‖φ_j‖_∞ ≤ ω_j for all j and m ≥ C s log³(s) log(N).
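
The weighted program is the same LP as the basis pursuit sketch after slide 9, with weights in the objective (real-valued case, illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def weighted_basis_pursuit(A, y, omega):
    """min sum_j omega_j |z_j| s.t. Az = y, via the split z = u - v."""
    m, N = A.shape
    c = np.concatenate([omega, omega])            # weights on both u and v
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]
```

Weights ω_j that grow with |j|, as on the next two slides, penalize high-degree coefficients and thereby promote smooth interpolants.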

SLIDE 40

Interpolation via weighted ℓ1-minimization

Runge's example: f(x) = 1/(1 + 25x²). Weights: ω_j = 1 + |j|. 20 interpolation points chosen uniformly at random from [−1, 1].

[Figure: original function; least squares solution, unweighted ℓ1 minimizer, and weighted ℓ1 minimizer, each shown with its residual error.]

SLIDE 41

Interpolation via weighted ℓ1-minimization

f(x) = |x|. Weights: ω_j = 1 + |j|. 20 interpolation points chosen uniformly at random from [−1, 1].

[Figure: original function; least squares solution, unweighted ℓ1 minimizer, and weighted ℓ1 minimizer, each shown with its residual error.]

SLIDE 42

Subsampled random convolutions

SLIDE 43

Subsampled random convolutions

Cyclic convolution: (b ∗ x)_ℓ = Σ_{j=1}^N b_{ℓ−j mod N} x_j.

Subsampling: Ax = Φ_Θ(b) x = R_Θ(b ∗ x), where R_Θ : C^N → C^m is the restriction to the entries in Θ ⊂ {1, ..., N}, #Θ = m. Example: Θ = {1, 2, ..., m}.

Task: recovery of sparse (compressible) x from the subsampled convolution y = Φ_Θ(b) x!

Choice of b = ξ as a subgaussian random vector (independent, mean-zero, variance-one, subgaussian entries: P(|ξ_j| ≥ t) ≤ 2e^{−ct²}).

Examples:
◮ Rademacher b = ε: independent ±1 entries
◮ Gaussian b = g: standard Gaussian random vector, g ∼ N(0, Id)

SLIDE 44

Partial random circulant matrices

Circulant matrix Φ = Φ(b) ∈ C^{N×N} with entries Φ_{i,j} = b_{j−i mod N}, so that Φx = Φ(b)x = b ∗ x. The subsampled convolution with a random vector ξ corresponds to the partial random circulant matrix Φ_Θ(ξ) = R_Θ Φ(ξ). Fast matrix–vector multiplication via the FFT.

Variations: partial random Toeplitz matrices (noncyclic convolution), subsampled 2D convolutions, ...
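
A minimal sketch of the fast measurement map: cyclic convolution via the FFT in O(N log N), then restriction to Θ (here Θ = {1, ..., m}, 0-based in the code):

```python
import numpy as np

def partial_circulant(b: np.ndarray, x: np.ndarray, m: int) -> np.ndarray:
    """R_Theta(b * x) for Theta = first m indices, computed via FFT."""
    conv = np.fft.ifft(np.fft.fft(b) * np.fft.fft(x))   # cyclic convolution
    return conv[:m].real       # real inputs give real output, up to roundoff

rng = np.random.default_rng(5)
N, m = 500, 100
xi = rng.choice([-1.0, 1.0], size=N)              # Rademacher generator
x = np.zeros(N)
x[rng.choice(N, 10, replace=False)] = 1.0
y = partial_circulant(xi, x, m) / np.sqrt(m)      # m normalized measurements
print(y.shape)                                    # (100,)
```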

SLIDE 45

Motivation I: Radar

The received signal is a superposition of delayed versions of the sent signal. Task: determine the delays (corresponding to distances) from the subsampled receive signal! Advantages: uses slow analog-to-digital converters, avoids the matched filter, can resolve small targets.

SLIDE 46

Motivation II: Compressive Coded Aperture Imaging

Super-resolution based on compressive sensing:

◮ Use a coded mask instead of a pinhole (or lens)
◮ The observed coded aperture image is a subsampled 2D convolution of the image x with the point-spread function b

Marcia, Willett 2009 – Romberg 2009

SLIDE 47

No lens? No problem for FlatCam

News (November 23, 2015): the group around Richard Baraniuk developed FlatCam, a thin sensor chip with a mask that replaces the lens.

SLIDE 48

Numerical experiments

Sparse recovery via ℓ1-minimization with a partial random circulant matrix A ∈ R^{m×N}, N = 500, m = 100.

[Figure: empirical recovery rate as a function of the sparsity s ∈ {5, ..., 40}.]

SLIDE 49

RIP estimate for partial random circulant matrices

Theorem (Krahmer, Mendelson, R 2012)
Let Θ ⊂ [N] be an arbitrary (deterministic) set of cardinality m. Let ξ be a subgaussian random vector in C^N (e.g. Rademacher or Gaussian). Assume that

    m ≥ C δ^{−2} s max{ln²(s) ln²(N), ln(ε^{−1})}.

Then with probability at least 1 − ε the restricted isometry constants of (1/√m) Φ_Θ(ξ) satisfy δ_s ≤ δ.

Analysis via new bounds for chaos processes. Previous bounds:
◮ Haupt, Bajwa, Raz (2008): m ≥ C_δ s² ln N
◮ R, Romberg, Tropp (2010): m ≥ C_δ s^{3/2} ln^{3/2}(N)
◮ Random sets Θ, Romberg (2009): m ≥ C_δ s ln⁶ N

Nonuniform recovery (Rauhut '09, Aschenbrücker '13): m ≥ C s ln(N/ε).

SLIDE 50

RIP for time-frequency structured random matrices (Krahmer, Mendelson, Rauhut 2012)

Translation and modulation on C^m:

    (T_k g)_j = g_{j−k mod m},   (M_ℓ g)_j = e^{2πiℓj/m} g_j.

Gabor synthesis matrix: A_g = (T_k M_ℓ g)_{(k,ℓ)∈[m]²} ∈ C^{m×m²}. For a subgaussian random vector g, the matrix A_g satisfies the RIP with high probability if

    m ≥ C s ln²(s) ln²(m).

The analysis uses a bound for suprema of chaos processes and improves the previous bound by R, Pfander, Tropp 2011. Nonuniform recovery (Pfander, R 2010): m ≥ C s ln(m/ε).
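
A small sketch constructing A_g column by column from the definitions above (illustrative, not the authors' code):

```python
import numpy as np

def gabor_synthesis(g: np.ndarray) -> np.ndarray:
    """Columns T_k M_l g for (k, l) in [m]^2, giving an m x m^2 matrix."""
    m = g.size
    cols = []
    for k in range(m):                            # translations T_k
        for l in range(m):                        # modulations M_l
            mod = np.exp(2j * np.pi * l * np.arange(m) / m) * g
            cols.append(np.roll(mod, k))          # shift after modulating
    return np.column_stack(cols)

g = np.random.default_rng(6).standard_normal(8)
print(gabor_synthesis(g / np.linalg.norm(g)).shape)   # (8, 64)
```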

SLIDE 51

Radar

SLIDE 52

Remote sensing (radar imaging)

n antenna elements on the square [0, B]² in the plane z = 0. Targets in the plane z = z_0 on a grid of resolution cells r_j ∈ [−L, L]² × {z_0}, j = 1, ..., N, with mesh size h. x ∈ C^N: vector of reflectivities in the resolution cells (r_j)_{j=1,...,N}. Often a sparse scene!

SLIDE 53

Sensing mechanism (Fannjiang, Strohmer, Yan 2010)

An antenna at position a ∈ R³ emits a monochromatic wave (wavelength λ, wavenumber ω) whose amplitude at position r ∈ R³ is given by the Green's function of the Helmholtz equation:

    H(a, r) = exp(2πi‖r − a‖_2/λ) / (4π‖r − a‖_2).

Approximation (valid for large z_0): H(a, r) ≈ (e^{iωz_0}/(4πz_0)) G(a, r) with

    G(a, r) = exp( (iω/(2z_0)) (|r_1 − a_1|² + |r_2 − a_2|²) ).

Signal corresponding to emitting antenna a_ℓ and receiving antenna a_k (Born approximation):

    y_{(k,ℓ)} = Σ_{j=1}^N x_j G(a_ℓ, r_j) G(r_j, a_k) = (Ax)_{(k,ℓ)},  k, ℓ = 1, ..., n;

n² measurements.

SLIDE 54

Random scattering matrix

Choose the antenna positions a_j, j ∈ [n], independently and uniformly at random in [0, B]². Then A ∈ C^{n²×N} is a structured random matrix with entries

    A_{(k,ℓ);j} = G(a_k, r_j) G(r_j, a_ℓ),  (k, ℓ) ∈ [n]², j ∈ [N].

Define v(a_k, a_ℓ) = (G(a_k, r_j) G(r_j, a_ℓ))_{j∈[N]} ∈ C^N. Then the rows of A are v(a_1, a_1), v(a_1, a_2), ..., v(a_2, a_1), ..., v(a_n, a_n): rows and columns are coupled. Under the condition hB/(λz_0) ∈ ℕ we have E A*A = I.

SLIDE 55

Reconstruction via ℓ1-minimization

Sparse scene (sparsity s = 100, 6400 grid points) and its reconstruction (n = 30 antennas, 900 noisy measurements, SNR 20 dB).

SLIDE 56

Nonuniform recovery

Theorem (Hügel, R, Strohmer 2012)
Let x ∈ C^N. Choose the n antenna positions independently and uniformly at random in [0, B]². Assume hB/(λz_0) ∈ ℕ, where h is the mesh size and λ the wavelength; further

    n² ≥ C s ln²(N/ε).

Let y = Ax + e ∈ C^{n²} with ‖e‖_2 ≤ ηn. Let x# be the solution of

    min ‖z‖_1  subject to  ‖y − Az‖_2 ≤ ηn.

Then with probability at least 1 − ε,

    ‖x − x#‖_2 ≤ C_1 σ_s(x)_1 + C_2 √s η.

Exact recovery when η = 0 and σ_s(x)_1 = 0. An RIP estimate remains open.

SLIDE 57

Reduction to bounding a stochastic process

Many problems in analyzing random measurements in sparse and low-rank recovery lead to bounding a stochastic process

    sup_{t∈T} |X_t|.

Choices for T:
◮ S^{N−1} = {x ∈ R^N : ‖x‖_2 = 1} (operator norm)
◮ T_s = {x ∈ R^N : ‖x‖_2 ≤ 1, ‖x‖_0 ≤ s} (RIP constants)
◮ descent cone T(x)

Various random matrices lead to different types of processes:
◮ Empirical processes (subgaussian matrices, random partial Fourier matrices)
◮ Chaos processes (partial random circulant matrices)

Proof techniques: ε-nets (covering numbers), chaining, noncommutative Bernstein inequality.

SLIDE 58

Proof techniques

SLIDE 59

RIP for Gaussian random matrices

Recall: the restricted isometry constant of a matrix A ∈ R^{m×N} is the smallest number δ_s such that

    (1 − δ_s) ‖x‖_2² ≤ ‖Ax‖_2² ≤ (1 + δ_s) ‖x‖_2²

for all s-sparse x ∈ R^N. Equivalently, with T_s = {x ∈ R^N : ‖x‖_2 ≤ 1, ‖x‖_0 ≤ s} and B_S = {x ∈ R^N : supp x ⊂ S, ‖x‖_2 ≤ 1}:

    δ_s = max_{x∈T_s} | ‖Ax‖_2² − ‖x‖_2² |
        = max_{S⊂[N], #S≤s} max_{x∈B_S} | ‖Ax‖_2² − ‖x‖_2² |
        = max_{S⊂[N], #S≤s} ‖A_S* A_S − I‖_{2→2}.

Estimation of δ_s for a Gaussian random matrix A:
◮ Fix x ∈ T_s and estimate P(|‖Ax‖_2² − ‖x‖_2²| ≥ u)
◮ Take a union bound over a sufficiently fine ε-net of T_s
◮ Extend the bound to all of T_s

SLIDE 60

Step 1: Concentration inequality

Lemma
For a fixed vector x ∈ R^N with ‖x‖_2 ≤ 1 and a random draw of a Gaussian matrix A ∈ R^{m×N} with i.i.d. entries N(0, 1/m), it holds that

    P(|‖Ax‖_2² − ‖x‖_2²| ≥ u) ≤ 2 exp(−c m u²),  u ∈ (0, 1).

Proof sketch:
◮ ‖Ax‖_2² = Σ_{j=1}^m |⟨A_j, x⟩|² with A_1, ..., A_m the rows of A.
◮ The random variables ⟨A_j, x⟩ are Gaussian with variance ‖x‖_2²/m, so that Z_j = |⟨A_j, x⟩|² − ‖x‖_2²/m are independent mean-zero subexponential random variables.
◮ ‖Ax‖_2² − ‖x‖_2² = Σ_{j=1}^m Z_j is a sum of independent mean-zero subexponential random variables. Bernstein's inequality implies, for 0 < u ≤ 1,

    P(|‖Ax‖_2² − ‖x‖_2²| ≥ u) = P(|Σ_{j=1}^m Z_j| ≥ u) ≤ 2 exp(−c m u²).
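
A quick Monte Carlo illustration of the lemma, using the proof sketch directly: the entries ⟨A_j, x⟩ are i.i.d. N(0, 1/m) for ‖x‖_2 = 1, so ‖Ax‖_2² can be simulated without forming A (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
m, trials, u = 200, 100_000, 0.3
# <A_j, x> ~ N(0, 1/m) i.i.d., so ||Ax||_2^2 is a sum of m squared Gaussians
inner = rng.standard_normal((trials, m)) / np.sqrt(m)
dev = np.abs((inner**2).sum(axis=1) - 1.0)        # | ||Ax||_2^2 - ||x||_2^2 |
print((dev >= u).mean())                          # tiny, like exp(-c m u^2)
```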

SLIDE 61

ε-nets and covering numbers

Definition
For a subset T ⊂ X of a metric space (X, d), the covering number N(T, d, ε) is the smallest number n such that there exist points x_1, ..., x_n ∈ T with

    T ⊂ ∪_{j=1}^n B_d(x_j, ε),  B_d(x_j, ε) = {x ∈ X : d(x, x_j) ≤ ε}.

A set {x_1, ..., x_n} with this property is called an ε-net of T.

Lemma
For any norm ‖·‖ on R^n, the covering numbers of the unit ball B = {x ∈ R^n : ‖x‖ ≤ 1} satisfy N(B, ‖·‖, ε) ≤ (1 + 2/ε)^n.

Proof idea: volume comparison.

SLIDE 62

Union bound over net

Let {x_1, ..., x_n} be an ε-net of B_S, #S ≤ s, with n ≤ (1 + 2/ε)^s. Then

    P(max_{j=1,...,n} |‖Ax_j‖_2² − ‖x_j‖_2²| ≥ u) ≤ Σ_{j=1}^n P(|‖Ax_j‖_2² − ‖x_j‖_2²| ≥ u)
        ≤ 2n exp(−c m u²) ≤ 2(1 + 2/ε)^s exp(−c m u²) = 2 exp(−c m u² + s ln(1 + 2/ε)).

This estimate can be extended to all x ∈ B_S; with ε = 1/4, say:

    P(max_{x∈B_S} |‖Ax‖_2² − ‖x‖_2²| ≥ u) ≤ 2 exp(−c′ m u² + c″ s).

RIP estimate:

    P(δ_s ≥ u) = P(max_{S⊂[N], #S≤s} max_{x∈B_S} |‖Ax‖_2² − ‖x‖_2²| ≥ u)
        ≤ Σ_{S⊂[N], #S≤s} P(max_{x∈B_S} |‖Ax‖_2² − ‖x‖_2²| ≥ u)
        ≤ (N choose s) · 2 e^{−c′mu² + c″s} ≤ 2 (eN/s)^s e^{−c′mu² + c″s}
        = 2 exp(−c′ m u² + s ln(eN/s) + c″ s) ≤ η

if m ≥ C u^{−2} (s ln(eN/s) + ln(2η^{−1})).

SLIDE 63

Structured random matrices

The simple net technique does not work! Consider random partial Fourier matrices: the rows of A ∈ C^{m×N} are selected independently and uniformly at random among the vectors φ_j = (e^{2πijk/N})_{k=1}^N ∈ C^N, j = 1, ..., N. Set Ã = (1/√m) A. Then

    ‖Ãx‖_2² − ‖x‖_2² = (1/m) Σ_{k=1}^m (|⟨φ_{j_k}, x⟩|² − ‖x‖_2²) = (1/m) Σ_{k=1}^m Z_k.

Observe that, by Plancherel's theorem for the Fourier transform x̂_j = N^{−1/2} Σ_{k=1}^N x_k e^{−2πijk/N},

    E|⟨x, φ_{j_k}⟩|² = (1/N) Σ_{j=1}^N |⟨x, φ_j⟩|² = Σ_{j=1}^N |x̂_j|² = ‖x‖_2²,

so that Z_k := |⟨φ_{j_k}, x⟩|² − ‖x‖_2² satisfies EZ_k = 0, and Z_1, ..., Z_m are independent.

SLIDE 64

Application of Bernstein's inequality

    ‖Ãx‖_2² − ‖x‖_2² = (1/m) Σ_{k=1}^m Z_k,  Z_k = |⟨φ_{j_k}, x⟩|² − ‖x‖_2².

Theorem (Bernstein)
Let Y_1, ..., Y_m be a sequence of independent mean-zero random variables with |Y_k| ≤ K and E|Y_k|² ≤ σ_k² for all k = 1, ..., m. Then, with σ² = Σ_{k=1}^m σ_k², for all t > 0,

    P(|Σ_{k=1}^m Y_k| ≥ t) ≤ 2 exp(− (t²/2) / (σ² + Kt/3)).

If ‖x‖_2 = 1 and ‖x‖_0 ≤ s, then ‖x‖_1 ≤ √s and

    |Z_k| = | |⟨φ_{j_k}, x⟩|² − ‖x‖_2² | ≤ ‖φ_{j_k}‖_∞² ‖x‖_1² − ‖x‖_2² ≤ s,
    E|Z_k|² = E|⟨φ_{j_k}, x⟩|⁴ − ‖x‖_2⁴ ≤ s E|⟨φ_{j_k}, x⟩|² − 1 = s − 1 ≤ s.

Plugging into Bernstein's inequality yields, for 0 < u ≤ 1,

    P(|‖Ãx‖_2² − ‖x‖_2²| ≥ u) ≤ 2 exp(− ((mu)²/2) / (ms + smu/3)) ≤ 2 exp(−(3/8)(m/s)u²).

SLIDE 65

Union bound over net

Fix a support S ⊂ [N], #S = s, and choose a 1/4-net x_1, ..., x_n of B_S of cardinality n ≤ (1 + 2/(1/4))^s = 9^s. The union bound yields

    P(max_{j=1,...,n} |‖Ãx_j‖_2² − ‖x_j‖_2²| ≥ u) ≤ 2n exp(−3mu²/(8s))
        ≤ 2 · 9^s exp(−3mu²/(8s)) = 2 exp(−3mu²/(8s) + s ln 9).

Requiring that the right-hand side is less than 1/2, say, leads to

    3mu²/(8s) − s ln 9 ≥ ln 4  ⟺  m ≥ (8/(3u²)) (ln(9) s² + ln(4) s).

The net technique does not avoid the square-root bottleneck!

SLIDE 66

Noncommutative Bernstein inequality

Recall: one ingredient for proving nonuniform recovery guarantees is a bound for

    ‖A_S* A_S − I‖_{2→2}

for a fixed support S ⊂ [N].

Theorem (Ahlswede, Winter '01; Tropp '12)
Let X_1, ..., X_m ∈ C^{d×d} be independent mean-zero self-adjoint random matrices. Assume that ‖X_ℓ‖_{2→2} ≤ K almost surely, ℓ ∈ [m], and set σ² := ‖Σ_{ℓ=1}^m E(X_ℓ²)‖_{2→2}. Then, for t > 0,

    P(‖Σ_{ℓ=1}^m X_ℓ‖_{2→2} ≥ t) ≤ 2d exp(− (t²/2) / (σ² + Kt/3)).

SLIDE 67

Conditioning of submatrices of partial Fourier matrices

Theorem
Let S ⊂ [N] with #S = s and let A ∈ C^{m×N} be a random draw of the partial Fourier matrix. Then Ã_S = (1/√m) A_S satisfies, for 0 < u ≤ 1,

    P(‖Ã_S* Ã_S − I‖_{2→2} ≥ u) ≤ 2s exp(−3mu²/(8s)).

In other words, ‖Ã_S* Ã_S − I‖_{2→2} ≤ δ with probability at least 1 − η if m ≥ (8/(3δ²)) s ln(2s/η). This can be used to show nonuniform recovery if m ≥ C s ln(N). For the RIP, a union bound over all S ⊂ [N] with #S = s again results in quadratic scaling.

SLIDE 68

Analysis of the RIP for bounded orthonormal systems: empirical processes

With X_ℓ = (φ_k(t_ℓ))_k ∈ C^N, a (normalized) measurement (1/√m)(Ax)_ℓ is given by

    (1/√m) Σ_{k∈Γ} x_k φ_k(t_ℓ) = (1/√m) ⟨X_ℓ, x⟩.

The vectors X_ℓ, ℓ = 1, ..., m, are independent, mean zero, and E|⟨X_ℓ, x⟩|² = ‖x‖_2².

The restricted isometry constant δ_s can be written as

    δ_s = sup_{x∈T_s} | ‖m^{−1/2} Ax‖_2² − ‖x‖_2² | = sup_{x∈T_s} | (1/m) Σ_{ℓ=1}^m (|⟨X_ℓ, x⟩|² − E|⟨X_ℓ, x⟩|²) |

with T_s = {x ∈ C^N : ‖x‖_2 ≤ 1, ‖x‖_0 ≤ s}. Introducing f_x(t_ℓ) = (1/√m) ⟨X_ℓ, x⟩, δ_s is the supremum of an empirical process:

    δ_s = sup_{x∈T_s} | Σ_{ℓ=1}^m (|f_x(t_ℓ)|² − E|f_x(t_ℓ)|²) |.

SLIDE 69

Reduction to a subgaussian process via symmetrization

Symmetrization (ε denoting a vector with independent ±1 entries):

    (E δ_s^p)^{1/p} ≤ 2 ( E sup_{x∈T_s} | Σ_{ℓ=1}^m ε_ℓ |f_x(t_ℓ)|² |^p )^{1/p},  p ≥ 1.

Conditional on (t_ℓ), X_x := Σ_{ℓ=1}^m ε_ℓ |f_x(t_ℓ)|² is a subgaussian process:

    P( | Σ_{ℓ=1}^m ε_ℓ (|f_x(t_ℓ)|² − |f_y(t_ℓ)|²) | ≥ u d(x, y) ) ≤ 2 e^{−cu²}

with

    d(x, y) = sup_{z∈T_s} ( Σ_{ℓ=1}^m |f_z(t_ℓ)|² )^{1/2} d_t(x, y),  d_t(x, y) = max_{ℓ=1,...,m} |f_x(t_ℓ) − f_y(t_ℓ)|.

SLIDE 70

Bounds for suprema of subgaussian processes via generic chaining

Theorem (Dirksen '13)
Let X_t, t ∈ T, be a process satisfying P(|X_t − X_r| ≥ u d(t, r)) ≤ 2e^{−cu²}. Then, for any t_0 ∈ T and p ≥ 1,

    ( E sup_{t∈T} |X_t − X_{t_0}|^p )^{1/p} ≤ C γ_2(T, d) + 2 sup_{t∈T} ( E|X_t − X_{t_0}|^p )^{1/p},

and as a consequence,

    P( sup_{t∈T} |X_t − X_{t_0}| ≥ C_1 γ_2(T, d) + C_2 Δ_d(T) u ) ≤ exp(−u²/2).

Here, γ_2(T, d) is Talagrand's γ_2-functional and Δ_d(T) = sup_{r,t} d(r, t).

SLIDE 71

Generic Chaining (Talagrand)

Let T be a subset of a space with metric d. A sequence of subsets T_r ⊂ T, r ∈ ℕ_0, is called admissible if |T_0| = 1 and |T_r| ≤ 2^{2^r}, r ≥ 1. For α > 0, define the γ_α-functional (most important: α = 2)

    γ_α(T, d) = inf sup_{t∈T} Σ_{r=0}^∞ 2^{r/α} d(t, T_r),  d(t, T_r) = inf_{t_r∈T_r} d(t, t_r),

where the infimum is over all admissible sequences (T_r). Estimate by a Dudley-type integral:

    γ_α(T, d) ≤ C ∫_0^{Δ_d(T)} (ln N(T, d, u))^{1/α} du,

where N(T, d, u) denotes the smallest number of balls of radius u in the metric d required to cover T.

SLIDE 72

RIP Analysis of Random Fourier Matrices Continued

    (E δ_s^p)^{1/p} ≤ 2 ( E sup_{x∈T_s} | Σ_{ℓ=1}^m ε_ℓ |f_x(t_ℓ)|² |^p )^{1/p}
        ≲ E_t [ sup_{z∈T_s} ( Σ_{ℓ=1}^m |f_z(t_ℓ)|² )^{1/2} γ_2(T_s, d_t) ] + E_t sup_{z∈T_s} ( E_ε | Σ_{ℓ=1}^m ε_ℓ |f_z(t_ℓ)|² |^p )^{1/p}.

Using two bounds for covering numbers (a volumetric estimate and Maurey's lemma) together with Khintchine's inequality:

    γ_2(T_s, d_t) ≲ √(s/m) log(s) log(N),
    sup_{z∈T_s} ( E_ε | Σ_{ℓ=1}^m ε_ℓ |f_z(t_ℓ)|² |^p )^{1/p} ≲ √p √(s/m).

After some manipulations:

    (E δ_s^p)^{1/p} ≲ K ( √(s/m) log(s) log(N) + (s/m) log²(s) log²(N) + √p √(s/m) + p (s/m) ).

SLIDE 73

Consequence: if m ≳ K² δ^{−2} s max{log²(s) log²(N), log(η^{−1})}, then P(δ_s ≥ δ) ≤ η.

SLIDE 74

Recall: RIP estimate for partial random circulant matrices

Theorem (Krahmer, Mendelson, R 2012)
Let Θ ⊂ [N] be an arbitrary (deterministic) set of cardinality m. Let ξ be a subgaussian random vector in C^N (e.g. Rademacher or Gaussian). Assume that m ≥ C δ^{−2} s max{ln²(s) ln²(N), ln(ε^{−1})}. Then with probability at least 1 − ε the restricted isometry constants of (1/√m) Φ_Θ(ξ) satisfy δ_s ≤ δ.

SLIDE 75

Mathematical Analysis: Chaos processes

With T_s = {x ∈ C^N : ‖x‖_2 ≤ 1, ‖x‖_0 ≤ s}:

    δ_s = sup_{x∈T_s} | ‖Ax‖_2² − ‖x‖_2² |.

For partial random circulant matrices generated by the random vector ξ,

    Ax = (1/√m) R_Θ(ξ ∗ x) =: V_x ξ

with an appropriate V_x ∈ R^{m×N}. Furthermore, E‖V_x ξ‖_2² = ‖x‖_2². Therefore, δ_s is the supremum of a chaos process:

    δ_s = sup_{x∈T_s} | ‖V_x ξ‖_2² − E‖V_x ξ‖_2² |.
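
Since convolution is commutative, R_Θ(ξ ∗ x) = R_Θ(x ∗ ξ), so V_x is simply 1/√m times the first m rows of the circulant matrix generated by x (for Θ = {1, ..., m}). A sketch of this reformulation:

```python
import numpy as np
from scipy.linalg import circulant

def V(x: np.ndarray, m: int) -> np.ndarray:
    """V_x: rows of circulant(x) indexed by Theta, normalized by 1/sqrt(m)."""
    return circulant(x)[:m] / np.sqrt(m)

rng = np.random.default_rng(8)
N, m = 64, 16
x = rng.standard_normal(N)
xi = rng.choice([-1.0, 1.0], size=N)              # Rademacher vector
fast = np.fft.ifft(np.fft.fft(xi) * np.fft.fft(x)).real[:m] / np.sqrt(m)
print(np.allclose(V(x, m) @ xi, fast))            # True: Ax = V_x xi
```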

SLIDE 76

Generic Chaining for Chaos Processes

Theorem (Krahmer, Mendelson, R 2012)
Let B = −B ⊂ C^{m×N} be a symmetric set of matrices and ξ ∈ C^N a subgaussian random vector. Then

    E sup_{B∈B} | ‖Bξ‖_2² − E‖Bξ‖_2² | ≤ C_1 γ_2(B, ‖·‖_{2→2})² + C_2 Δ_F(B) γ_2(B, ‖·‖_{2→2}).

Here, ‖B‖_F = √(tr(B*B)) denotes the Frobenius norm. The symmetry assumption B = −B can be dropped at the cost of a slightly more complicated bound.

SLIDE 77

Tail bound

Theorem (Krahmer, Mendelson, R '12 – Dirksen '13)
Let B = −B ⊂ C^{m×N} and ξ ∈ C^N be a subgaussian random vector. Then

    P( sup_{B∈B} | ‖Bξ‖_2² − E‖Bξ‖_2² | ≥ C_1 E + t ) ≤ 2 exp(−C_2 min{t²/V², t/U}),

where

    E := Δ_F(B) γ_2(B, ‖·‖_{2→2}) + γ_2(B, ‖·‖_{2→2})²,
    V := Δ_{2→2}(B) Δ_F(B),
    U := Δ_{2→2}(B)².

SLIDE 78

Relation to a previous estimate of Talagrand

In the Rademacher case ξ = ε, rewrite the process as

    sup_{B∈B} | ‖Bε‖_2² − E‖Bε‖_2² | = sup_{B∈B} | Σ_{j≠k} ε_j ε_k (B*B)_{j,k} |,

a chaos process indexed by D = {B*B : B ∈ B}. General estimate (Talagrand, 1993):

    E sup_{D∈D} | Σ_{j≠k} ε_j ε_k D_{j,k} | ≤ C_1 γ_2(D, ‖·‖_F) + C_2 γ_1(D, ‖·‖_{2→2}).

This bound was used in the previous RIP estimate due to R, Romberg, Tropp (2010). The appearance of the γ_1-functional results in the exponent 3/2.

SLIDE 79

Application to RIP estimate

For B = {V_x : x ∈ T_s} with T_s = {x ∈ C^N : ‖x‖_2 ≤ 1, ‖x‖_0 ≤ s} and V_x ξ = (1/√m) R_Θ(x ∗ ξ) we have

    Δ_F(B) = 1,  Δ_{2→2}(B) ≤ √(s/m).

Covering number estimates are similar to the ones for random Fourier sampling and lead to

    γ_2(B, ‖·‖_{2→2}) ≲ ∫_0^{Δ_{2→2}(B)} √(ln N(B, ‖·‖_{2→2}, u)) du ≲ √( s log²(s) log²(N) / m ).

SLIDE 80

Application of the bound for chaos processes to subgaussian random matrices

For A a subgaussian random matrix:

    E sup_{x∈T} | ‖Ax‖_2² − ‖x‖_2² | ≲ γ_2(T, ‖·‖_2)² + sup_{x∈T} ‖x‖_2 · γ_2(T, ‖·‖_2).

Relation to the Gaussian width: γ_2(T, ‖·‖_2) ≍ ℓ(T).

SLIDE 81

Thanks very much for your attention!

SLIDE 82

References

H. Rauhut. Random sampling of sparse trigonometric polynomials. Appl. Comput. Harmon. Anal., 22(1):16-42, 2007.

H. Rauhut. Stability results for random sampling of sparse trigonometric polynomials. IEEE Trans. Information Theory, 54(12):5661-5670, 2008.

H. Rauhut. Compressive sensing and structured random matrices. In M. Fornasier, editor, Theoretical Foundations and Numerical Methods for Sparse Recovery, volume 9 of Radon Series Comp. Appl. Math., pages 1-92. deGruyter, 2010.

S. Foucart, A. Pajor, H. Rauhut, and T. Ullrich. The Gelfand widths of ℓp-balls for 0 < p ≤ 1. J. Complexity, 26:629-640, 2010.

M. Fornasier and H. Rauhut. Compressive Sensing. In O. Scherzer, editor, Handbook of Mathematical Methods in Imaging, pages 187-228. Springer, 2011.

M. Fornasier, H. Rauhut, and R. Ward. Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM J. Optimization, 21(4):1614-1640, 2011.

H. Rauhut, J. Romberg, and J. Tropp. Restricted isometries for partial random circulant matrices. Appl. Comput. Harmonic Anal., 32(2):242-254, 2012.

H. Rauhut and R. Ward. Sparse Legendre expansions via ℓ1-minimization. J. Approx. Theory, 164(5):517-533, 2012.

G. Pfander, H. Rauhut, J. Tropp. The restricted isometry property for time-frequency structured random matrices. Prob. Theory Rel. Fields, 156:707-737, 2013.

M. Hügel, H. Rauhut, T. Strohmer. Remote sensing via ℓ1-minimization. Found. Comp. Math., 14:115-150, 2014.

F. Krahmer, S. Mendelson, and H. Rauhut. Suprema of chaos processes and the restricted isometry property. Comm. Pure Appl. Math., 67(11):1877-1904, 2014.

F. Krahmer and R. Ward. Stable and robust sampling strategies for compressive imaging. IEEE Trans. Image Proc., 23(2):612-622, 2014.

J. Bourgain. An improved estimate in the restricted isometry problem. In Geometric Aspects of Functional Analysis, volume 2116 of Lecture Notes in Mathematics, pages 65-70. Springer, 2014.

I. Haviv, O. Regev. The restricted isometry property of subsampled Fourier matrices. arXiv:1507.01768, 2015.

S. Dirksen. Tail bounds via generic chaining. Electron. J. Probab., 20(53):1-29, 2015.

S. Foucart, H. Rauhut. A Mathematical Introduction to Compressive Sensing. Birkhäuser, ANHA series, 2013.