Going off the grid. Benjamin Recht, University of California, Berkeley. PowerPoint PPT Presentation.



SLIDE 1

Going off the grid

Benjamin Recht, University of California, Berkeley

Joint work with Badri Bhaskar, Parikshit Shah, and Gongguo Tang

SLIDE 2

Applications: DOA estimation, GPS, radar, sampling, ultrasound, astronomy, imaging, seismology, spectroscopy

$$x(t) = \sum_{j=1}^{k} c_j\, e^{i2\pi f_j t}$$

SLIDE 3

Applications: DOA estimation, GPS, radar, sampling, ultrasound, astronomy, imaging, seismology, spectroscopy

$$x(t) = \sum_{j=1}^{k} c_j\, g(t - \tau_j)$$

SLIDE 4

Applications: DOA estimation, GPS, radar, sampling, ultrasound, astronomy, imaging, seismology, spectroscopy

$$x(t) = \sum_{j=1}^{k} c_j\, g(t - \tau_j)\, e^{i2\pi f_j t}$$

SLIDE 5

Applications: DOA estimation, GPS, radar, sampling, ultrasound, astronomy, imaging, seismology, spectroscopy

$$x(s) = \mathrm{PSF} \ast \left\{ \sum_{j=1}^{k} c_j\, \delta(s - s_j) \right\}$$

SLIDE 6

Applications: DOA estimation, GPS, radar, sampling, ultrasound, astronomy, imaging, seismology, spectroscopy

$$\hat{x}(\omega) = \widehat{\mathrm{PSF}}(\omega) \sum_{j=1}^{k} c_j\, e^{-i2\pi\omega s_j}$$

SLIDE 7

Spectrum Estimation: find a combination of sinusoids agreeing with time series data. Observe a sparse combination of sinusoids:

$$x_m = \sum_{k=1}^{s} c_k\, e^{i2\pi m u_k} \quad \text{for some } u_k \in [0, 1)$$

Classic (1790...): Prony's method. Assume coefficients are positive for simplicity:

$$\sum_{k=1}^{r} c_k \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix} \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix}^{*} = \begin{bmatrix} x_0 & \bar{x}_1 & \bar{x}_2 & \bar{x}_3 \\ x_1 & x_0 & \bar{x}_1 & \bar{x}_2 \\ x_2 & x_1 & x_0 & \bar{x}_1 \\ x_3 & x_2 & x_1 & x_0 \end{bmatrix} =: \mathrm{toep}(x)$$

• toep(x) is positive semidefinite, and any null vector corresponds to a polynomial that vanishes at the points e^{i2πu_k}
• MUSIC, ESPRIT, Cadzow, etc.
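This Prony/Toeplitz idea can be sketched numerically: with positive coefficients, toep(x) is PSD with rank s, and a null vector of toep(x) gives a polynomial whose roots sit at e^{-i2πu_k}. A minimal numpy illustration (the frequencies and amplitudes below are made-up, not from the talk):

```python
import numpy as np

# Prony-style frequency recovery from the null space of toep(x).
true_u = np.array([0.15, 0.4, 0.75])   # illustrative frequencies
c = np.array([1.0, 2.0, 0.5])          # positive coefficients
n = 4                                  # s + 1 samples -> one-dimensional null space
m = np.arange(n)
x = (np.exp(1j * 2 * np.pi * np.outer(m, true_u)) * c).sum(axis=1)

# Hermitian Toeplitz moment matrix toep(x).
T = np.array([[x[i - j] if i >= j else np.conj(x[j - i])
               for j in range(n)] for i in range(n)])

# Eigenvector of the (numerically) zero eigenvalue is a null vector; the
# roots of its polynomial v0 + v1 z + v2 z^2 + v3 z^3 are e^{-i 2 pi u_k}.
v = np.linalg.eigh(T)[1][:, 0]
roots = np.roots(v[::-1])
u_hat = np.sort(np.mod(-np.angle(roots) / (2 * np.pi), 1))
```

With exact moments the recovery is exact; with noise this step becomes unstable, which is precisely the weakness the rest of the talk addresses.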

SLIDE 8

Spectrum Estimation: find a combination of sinusoids agreeing with time series data. Observe a sparse combination of sinusoids:

$$x_m = \sum_{k=1}^{s} c_k\, e^{i2\pi m u_k} \quad \text{for some } u_k \in [0, 1)$$

Contemporary: grid the frequencies, so that x ≈ Fc with c sparse, where F is the n × N matrix with entries F_{ab} = exp(i2πab/N). Solve with LASSO:

$$\underset{c}{\text{minimize}}\;\; \|x - Fc\|_2^2 + \mu \|c\|_1$$
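The gridded LASSO above can be sketched with a plain ISTA (proximal-gradient) loop; the sizes, the value of μ, and the test signal below are illustrative choices, not the talk's settings:

```python
import numpy as np

# Gridded LASSO for line-spectrum denoising, solved by ISTA.
rng = np.random.default_rng(0)
n, N = 32, 256
F = np.exp(1j * 2 * np.pi * np.outer(np.arange(n), np.arange(N)) / N)

c_true = np.zeros(N, dtype=complex)
c_true[[20, 90, 200]] = [1.0, 0.8, 1.2]        # on-grid sparse spectrum
x = F @ c_true + 0.05 * rng.standard_normal(n)  # noisy samples

def soft(z, t):
    """Complex soft-threshold: shrink the modulus, keep the phase."""
    mag = np.abs(z)
    return np.where(mag > t, (1 - t / np.maximum(mag, 1e-12)) * z, 0.0)

mu, L = 0.5, float(N)          # L = ||F||_2^2 since F F* = N I
c = np.zeros(N, dtype=complex)
for _ in range(5000):
    c = soft(c + F.conj().T @ (x - F @ c) / L, mu / L)

rel_err = np.linalg.norm(F @ c - F @ c_true) / np.linalg.norm(F @ c_true)
```

When the true frequencies lie off this grid, the same code exhibits the basis-mismatch leakage discussed on the next slide.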

SLIDE 9

Spectrum Estimation: find a combination of sinusoids agreeing with time series data,

$$x_m = \sum_{k=1}^{s} c_k\, e^{i2\pi m u_k} \quad \text{for some } u_k \in [0, 1)$$

Classic (SVD-based):
• grid free
• model selection
• need to know the model order
• lack of quantitative theory
• unstable in practice

Contemporary (gridding + ℓ1 minimization):
• robust
• quantitative theory
• discretization error
• basis mismatch
• numerical instability

Can we bridge the gap?

SLIDE 10

Parsimonious Models

• Search for the best linear combination of the fewest atoms
• "rank" = fewest atoms needed to describe the model

[Figure: a model written as a weighted sum of atoms; labels: atoms, model, weights, rank]

SLIDE 11

Atomic Norms

• Given a basic set of atoms A, define the function

$$\|x\|_{\mathcal{A}} = \inf\Big\{ \sum_{a\in\mathcal{A}} |c_a| : x = \sum_{a\in\mathcal{A}} c_a\, a \Big\}$$

• Equivalently, $\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}$
• When A is centrosymmetric, we get a norm
• When does this work?

IDEA: minimize ‖z‖_A subject to Φz = y
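For a finite, centrosymmetric atom set, the infimum defining ‖x‖_A is an ℓ1-style linear program, which can be sketched directly (the three atoms and the target x below are arbitrary illustrations):

```python
import numpy as np
from scipy.optimize import linprog

# Atomic norm over a finite atom set: minimize sum |c_a| s.t. x = sum c_a a.
atoms = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.6, 0.8]]).T       # columns are atoms in R^2
x = np.array([1.2, 1.6])               # equals 2 * (0.6, 0.8)

# Split coefficients into positive and negative parts: c = cp - cn, cp, cn >= 0,
# which turns the l1 objective into a linear one.
d, m = atoms.shape
A_eq = np.hstack([atoms, -atoms])
res = linprog(c=np.ones(2 * m), A_eq=A_eq, b_eq=x, bounds=(0, None))
atomic_norm = res.fun                  # best single-atom representation wins here
```

Representing x with the coordinate atoms costs 1.2 + 1.6 = 2.8, while the third atom alone yields x at cost 2, so the atomic norm is 2: the norm automatically prefers the parsimonious decomposition.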

SLIDE 12

Atomic Norm Minimization

IDEA: minimize ‖z‖_A subject to Φz = y

• Generalizes existing, powerful methods
• Rigorous formula for developing new analysis algorithms
• Precise, tight bounds on the number of measurements needed for model recovery
• One algorithm prototype for a myriad of data-analysis applications

Chandrasekaran, R., Parrilo, and Willsky

SLIDE 13

Spectrum Estimation

Observe a sparse combination of sinusoids (signal plus noise):

$$x^\star_m = \sum_{k=1}^{s} c^\star_k\, e^{i2\pi m u^\star_k} \quad \text{for some } u^\star_k \in [0, 1), \qquad y = x^\star + \omega$$

Atomic Set:

$$\mathcal{A} = \left\{ \begin{bmatrix} e^{i\theta} \\ e^{i2\pi\phi + i\theta} \\ e^{i4\pi\phi + i\theta} \\ \vdots \\ e^{i2\pi\phi n + i\theta} \end{bmatrix} : \theta \in [0, 2\pi),\ \phi \in [0, 1) \right\}$$

Classical techniques (Prony, Matrix Pencil, MUSIC, ESPRIT, Cadzow) use the fact that noiseless moment matrices are low-rank:

$$\sum_{k=1}^{r} \alpha_k \begin{bmatrix} 1 \\ e^{i\phi_k} \\ e^{2i\phi_k} \\ e^{3i\phi_k} \end{bmatrix} \begin{bmatrix} 1 \\ e^{i\phi_k} \\ e^{2i\phi_k} \\ e^{3i\phi_k} \end{bmatrix}^{*} = \begin{bmatrix} \mu_0 & \mu_1 & \mu_2 & \mu_3 \\ \bar{\mu}_1 & \mu_0 & \mu_1 & \mu_2 \\ \bar{\mu}_2 & \bar{\mu}_1 & \mu_0 & \mu_1 \\ \bar{\mu}_3 & \bar{\mu}_2 & \bar{\mu}_1 & \mu_0 \end{bmatrix} \succeq 0$$

SLIDE 14

Atomic Norm for Spectrum Estimation

• How do we solve the optimization problem?
• Can we approximate the true signal from partial and noisy measurements?
• Can we estimate the frequencies from partial and noisy measurements?

IDEA: minimize ‖z‖_A subject to ‖Φz − y‖ ≤ δ

Atomic Set:

$$\mathcal{A} = \left\{ \begin{bmatrix} e^{i\theta} \\ e^{i2\pi\phi + i\theta} \\ e^{i4\pi\phi + i\theta} \\ \vdots \\ e^{i2\pi\phi n + i\theta} \end{bmatrix} : \theta \in [0, 2\pi),\ \phi \in [0, 1) \right\}$$

SLIDE 15

Which atomic norm for sinusoids?

$$x_m = \sum_{k=1}^{s} c_k\, e^{i2\pi m u_k} \quad \text{for some } u_k \in [0, 1)$$

• The atoms trace out a moment curve of [t, t², t³, t⁴, ...], t ∈ S¹
• When the c_k are positive:

$$\sum_{k=1}^{r} c_k \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix} \begin{bmatrix} 1 \\ e^{i2\pi u_k} \\ e^{i4\pi u_k} \\ e^{i6\pi u_k} \end{bmatrix}^{*} = \begin{bmatrix} x_0 & \bar{x}_1 & \bar{x}_2 & \bar{x}_3 \\ x_1 & x_0 & \bar{x}_1 & \bar{x}_2 \\ x_2 & x_1 & x_0 & \bar{x}_1 \\ x_3 & x_2 & x_1 & x_0 \end{bmatrix} \succeq 0$$

• For general coefficients, the convex hull is characterized by linear matrix inequalities (Toeplitz positive semidefinite):

$$\|x\|_{\mathcal{A}} = \inf\left\{ \tfrac{1}{2}t + \tfrac{1}{2}w_0 : \begin{bmatrix} t & x^* \\ x & \mathrm{toep}(w) \end{bmatrix} \succeq 0 \right\}$$
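The positive-coefficient claim is easy to verify numerically: a positive mixture of Vandermonde outer products is a PSD Toeplitz matrix whose rank equals the number of sinusoids. A small check with arbitrary frequencies and weights:

```python
import numpy as np

# Positive mixture of sinusoid atoms -> PSD Toeplitz moment matrix.
u = np.array([0.1, 0.35, 0.7])
c = np.array([1.5, 0.7, 2.0])          # positive weights
n = 6

V = np.exp(1j * 2 * np.pi * np.outer(np.arange(n), u))  # columns: atoms a(u_k)
T = (V * c) @ V.conj().T                                 # sum_k c_k a(u_k) a(u_k)*

# The same matrix built directly from the moments x_m = sum_k c_k e^{i2 pi m u_k}.
x = V @ c
T_moments = np.array([[x[i - j] if i >= j else np.conj(x[j - i])
                       for j in range(n)] for i in range(n)])

eigs = np.linalg.eigvalsh(T)
rank = int((eigs > 1e-8).sum())        # equals the number of sinusoids
```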
SLIDE 16

[Figure: the atom set A, its symmetrization A ∪ {−A}, and the convex hulls conv(A) and conv(A ∪ {−A})]

SLIDE 17

[Figure: the signals cos(2πu₁k)/2 + cos(2πu₂k)/2 and cos(2πu₁k)/2 − cos(2πu₂k)/2 for u₁ = 0.1413, u₂ = 0.1411, illustrating how hard nearby frequencies are to distinguish]

SLIDE 18

Nearly optimal rates

Observe a sparse combination of sinusoids (signal plus noise):

$$x^\star_m = \sum_{k=1}^{s} c^\star_k\, e^{i2\pi m u^\star_k} \quad \text{for some } u^\star_k \in [0, 1), \qquad y = x^\star + \omega$$

Assume frequencies are far apart:

$$\min_{p \neq q} d(u_p, u_q) \geq \frac{4}{n}$$

Solve:

$$\hat{x} = \arg\min_x\; \tfrac{1}{2}\|x - y\|_2^2 + \mu \|x\|_{\mathcal{A}}$$

Error rate:

$$\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \leq \frac{C\sigma^2 s \log(n)}{n}$$

No algorithm can do better than $\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \geq \frac{C_0\sigma^2 s}{n}$ even if we knew all of the frequencies $u_k^\star$. No algorithm can do better than $\mathbb{E}\left[\frac{1}{n}\|\hat{x} - x^\star\|_2^2\right] \geq \frac{C_0\sigma^2 s \log(n/s)}{n}$ even if the frequencies are well-separated.

SLIDE 19

Mean Square Error Performance Profile

• Frequencies generated at random with 1/n separation; random phases, fading amplitudes.
• Cadzow and MUSIC provided the model order; AST (Atomic Norm Soft Thresholding) and LASSO estimate the noise power.
• Performance profile over all parameter values and settings. For algorithm s:

$$P_s(\beta) = \frac{\#\{p \in \mathcal{P} : \mathrm{MSE}_s(p) \leq \beta\, \min_s \mathrm{MSE}_s(p)\}}{\#(\mathcal{P})}$$

Lower MSE is better; a higher profile is better.
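The performance-profile statistic P_s(β) is straightforward to compute from a table of per-instance MSEs; the table below is made-up for illustration:

```python
import numpy as np

# Performance profile: fraction of problem instances on which an algorithm's
# MSE is within a factor beta of the best MSE any algorithm achieved there.
mse = {                       # rows: four problem instances (illustrative)
    "AST":    np.array([1.0, 2.0, 1.5, 4.0]),
    "MUSIC":  np.array([1.2, 1.0, 3.0, 2.0]),
    "Cadzow": np.array([2.0, 1.1, 1.5, 2.0]),
}
best = np.min(np.stack(list(mse.values())), axis=0)  # per-instance best MSE

def profile(alg, beta):
    """P_alg(beta): fraction of instances within a factor beta of the best."""
    return np.mean(mse[alg] <= beta * best)
```

P_s(1) is the fraction of instances an algorithm outright wins; as β grows every profile climbs to 1, and curves that climb faster dominate.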

SLIDE 20

Extracting the decomposition

• How do you extract the frequencies? Look at the dual norm:

$$\|v\|_{\mathcal{A}}^{*} = \max_{a \in \mathcal{A}} \langle a, v \rangle = \max_{u \in [0, 1)} \left| \sum_{k=1}^{n} v_k\, e^{2\pi i k u} \right|$$

• The dual norm is the maximum modulus of a polynomial.
• At optimality, the maximum modulus is attained at the support of the signal; in practice this works far better than Prony interpolation.
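The extraction step amounts to evaluating |Σ_k v_k e^{2πiku}| on a fine grid and keeping the near-maximal peaks. In the sketch below, as a stand-in for a dual-optimal vector we use the conjugated signal itself, whose polynomial peaks at the true frequencies when they are well separated; the frequencies are illustrative:

```python
import numpy as np

# Locating frequencies as peaks of the (stand-in) dual polynomial.
true_u = np.array([0.2, 0.6])
n = 64
k = np.arange(n)
v = np.exp(-1j * 2 * np.pi * np.outer(k, true_u)).sum(axis=1) / n

grid = np.linspace(0.0, 1.0, 4096, endpoint=False)
poly = np.abs(np.exp(1j * 2 * np.pi * np.outer(grid, k)) @ v)

# Keep local maxima that come within 90% of the global maximum modulus.
is_peak = (poly > np.roll(poly, 1)) & (poly > np.roll(poly, -1))
peaks = grid[is_peak & (poly > 0.9 * poly.max())]
```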

SLIDE 21

Aside: "the proof"

• How do you prove any compressed sensing-esque result? Look at the dual norm:

$$\|v\|_{\mathcal{A}}^{*} = \max_{a \in \mathcal{A}} \langle a, v \rangle = \max_{u \in [0, 1)} \left| \sum_{k=1}^{n} v_k\, e^{2\pi i k u} \right|$$

Generic proof:
• Construct a polynomial such that the atoms you want maximize the polynomial.
• Then prove that the polynomial is bounded everywhere else.

SLIDE 22

Localization Guarantees

[Figure: dual polynomial magnitude vs. frequency, marking true, estimated, and spurious frequencies; a near region of width 0.16/n surrounds each true frequency]

Spurious amplitudes (F the far region):

$$\sum_{l:\,\hat{f}_l \in F} |\hat{c}_l| \leq C_1 \sigma \sqrt{\frac{k^2 \log(n)}{n}}$$

Weighted frequency deviation:

$$\sum_{l:\,\hat{f}_l \in N_j} |\hat{c}_l|\, \min_{f_j \in T} d(f_j, \hat{f}_l)^2 \leq C_2 \sigma \sqrt{\frac{k^2 \log(n)}{n}}$$

Near region approximation:

$$\left| c_j - \sum_{l:\,\hat{f}_l \in N_j} \hat{c}_l \right| \leq C_3 \sigma \sqrt{\frac{k^2 \log(n)}{n}}$$

SLIDE 23

Frequency Localization

[Figure: performance profiles P_s(β) and the metrics m₁ (spurious amplitudes), m₂ (weighted frequency deviation), m₃ (near region approximation) vs. SNR for AST, MUSIC, and Cadzow at k = 32]

The three metrics are the spurious-amplitude, weighted-frequency-deviation, and near-region-approximation quantities bounded on the previous slide.

SLIDE 24

Incomplete Data / Random Sampling

• Observe a random subset of samples, T ⊂ {0, 1, ..., n − 1}
• On the grid: Candès, Romberg & Tao (2004)
• Off the grid: new; compressed sensing extended to the continuous domain
• Recover the missing part by solving

$$\underset{z}{\text{minimize}}\;\; \|z\|_{\mathcal{A}} \quad \text{subject to} \quad z_T = x_T$$

• Extract frequencies from the dual optimal solution

[Figure: full observation vs. random sampling]

SLIDE 25

Exact Recovery (off the grid)

WANT: {u_k, c_k} from

$$x_m = \sum_{k=1}^{s} c_k \exp(2\pi i m u_k)$$

Theorem (Candès and Fernandez-Granda, 2012): A line spectrum with minimum frequency separation Δ > 4/s can be recovered from the first 2s Fourier coefficients via atomic norm minimization.

Theorem (Tang, Bhaskar, Shah, and R., 2012): A line spectrum with minimum frequency separation Δ > 4/n can be recovered from most subsets of the first n Fourier coefficients of size at least O(s log(s) log(n)).

• s random samples are better than s equispaced samples.
• On a grid, this is just compressed sensing.
• Off the grid, this is new.
• See [BCGLS12] for a sketching approach to a similar problem.
• No balancing of coherence as n grows.

SLIDE 26

Mean Square Error Performance Profile

[Figure: P(β) performance profiles comparing the SDP to basis pursuit on grids at 4x and 64x refinement (SDP, BP:4x, BP:64x)]

SLIDE 27

Discretization

• Discretize the parameter space to get a finite dictionary:

$$x(t) = \sum_{l=1}^{N} c_l\, e^{i2\pi f_l t}, \quad f_l \in \text{grid}$$

Wakin, Becker, et al. (2012); Fannjiang, Strohmer & Yan (2010); Bajwa, Haupt, Sayeed & Nowak (2010); Tropp, Laska, Duarte, Romberg & Baraniuk (2010); Herman & Strohmer (2009); Malioutov, Cetin & Willsky (2005); Candès, Romberg & Tao (2004)

• Basis mismatch: Chi, Scharf, Pezeshki & Calderbank (2011); Herman & Strohmer (2010)

SLIDE 28

Approximation through discretization

• Relaxations: $\mathcal{A}_1 \subset \mathcal{A}_2 \implies \|x\|_{\mathcal{A}_2} \leq \|x\|_{\mathcal{A}_1}$
• Let $T_\epsilon \subset T$ be a finite net.
• Let $\Psi$ be a matrix whose columns are the set $\mathcal{A}_\epsilon$:

$$\|x\|_{\mathcal{A}_\epsilon} = \inf\left\{ \sum_k |c_k| : x = \Psi c \right\}$$

(an $\ell_1$ problem, plus extra equality constraints)

• When polynomials of bounded degree are Lipschitz on T, there is a $\lambda_\epsilon \in (0, 1]$ such that

$$\lambda_\epsilon \|x\|_{\mathcal{A}_\epsilon} \leq \|x\|_{\mathcal{A}} \leq \|x\|_{\mathcal{A}_\epsilon}$$


SLIDE 30

Discretization for sinusoids

Discretized Atomic Set (frequencies restricted to a grid of size N):

$$\mathcal{A}_N = \left\{ \begin{bmatrix} e^{i\theta} \\ e^{i2\pi\phi + i\theta} \\ e^{i4\pi\phi + i\theta} \\ \vdots \\ e^{i2\pi\phi n + i\theta} \end{bmatrix} : \phi = \frac{k}{N},\ k = 0, \ldots, N-1,\ \theta \in [0, 2\pi) \right\}$$

$$\|x\|_{\mathcal{A}_N} = \inf\left\{ \sum_{k=1}^{N} |c_k| : x = F[N, n]\, c \right\}$$

where F[N, n] is the first n rows of the N × N DFT matrix.

• The approximation is fast to compute using first-order methods.
• The approximation only improves as N grows.
• It allows you to estimate the signal, but you can't recover the exact frequencies.

$$\left(1 - \frac{2\pi n}{N}\right) \|x\|_{\mathcal{A}_N} \leq \|x\|_{\mathcal{A}} \leq \|x\|_{\mathcal{A}_N}$$
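Constructing F[N, n] is a one-liner, and with this sign convention its synthesis map is an inverse FFT, which is why the discretized norm is cheap to work with in first-order methods (sizes below are small illustrative choices):

```python
import numpy as np

# F[N, n]: first n rows of the N x N DFT matrix, entries exp(i 2 pi a b / N).
n, N = 8, 32
F = np.exp(1j * 2 * np.pi * np.outer(np.arange(n), np.arange(N)) / N)

# Column k samples the on-grid sinusoid e^{i 2 pi m (k/N)}, m = 0, ..., n-1.
atom5 = np.exp(1j * 2 * np.pi * np.arange(n) * 5 / N)

# An on-grid signal is exactly representable by a single column; synthesis
# F @ c matches N * ifft(c) truncated to the first n samples.
c = np.zeros(N, dtype=complex)
c[5] = 2.0
x_on_grid = F @ c
```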
SLIDE 31

Discretization

• Discretize the parameter space to get a finite number of grid points.
• Enforce a finite number of constraints in the dual:

$$|\langle \Phi^* z, a(\omega_j)\rangle| \leq 1, \quad \omega_j \in \Omega_m$$

• Equivalently, in the primal, replace the atomic norm with a discrete one.
• What happens to the solutions as the grid Ω_m is refined?

SLIDE 32

Convergence in Dual

• Assumption: there exist parameters ω_j such that the atoms a(ω_j) are linearly independent.
• Enforce finite constraints in the dual:

$$|\langle \Phi^* z, a(\omega_j)\rangle| \leq 1, \quad \omega_j \in \Omega_m$$

Theorem:
• The discretized optimal objectives converge to the original objective.
• Any solution sequence {ẑ_m} of the discretized problems has a subsequence that converges to the solution set of the original problem.
• For the LASSO dual, the convergence speed is O(ρ_m).

[Figure: dual polynomials for m = 32, 128, 512, and log-log plots of |f_m − f*| and ‖ẑ_m − ẑ‖ against log₂(m)]

SLIDE 33

Convergence in Primal

• The original primal solution x̂ is associated with a measure μ̂:

$$\hat{x} = \int_\Omega a(\omega)\, \hat{\mu}(d\omega), \qquad \|\hat{x}\|_{\mathcal{A}} = \|\hat{\mu}\|_{TV}$$

• The discretized primal solutions x̂_m are associated with measures μ̂_m supported only on Ω_m.

Theorem:
• The discretized optimal measures converge in distribution to an original optimal measure.
• Fix a neighborhood of the support of an original optimal measure; the supports of the discretized optimal measures eventually lie within that neighborhood.

[Figure: recovered measures for m = 64, 256, 1024]

SLIDE 34

Single Molecule Imaging

Courtesy of Zhuang Research Lab

SLIDE 35

Single Molecule Imaging

• Bundles of 8 tubes of 30 nm diameter
• Sparse density: 81,049 molecules on 12,000 frames
• Resolution: 64 x 64 pixels
• Pixel size: 100 nm x 100 nm
• Field of view: 6400 nm x 6400 nm
• Target resolution: 10 nm x 10 nm
• Discretize the FOV into 640 x 640 pixels

$$I(x, y) = \sum_j c_j\, \mathrm{PSF}(x - x_j, y - y_j), \qquad (x_j, y_j) \in [0, 6400]^2,\ (x, y) \in \{50, 150, \ldots, 6350\}^2$$
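The forward model can be sketched on the 64 x 64 camera grid with 100 nm pixels. A Gaussian point-spread function is a common stand-in for the microscope's PSF; its width and the molecule positions below are made-up illustrations:

```python
import numpy as np

# Image formation I(x, y) = sum_j c_j PSF(x - x_j, y - y_j) on the camera grid.
pix = np.arange(50.0, 6400.0, 100.0)           # pixel centers {50, 150, ..., 6350}
X, Y = np.meshgrid(pix, pix, indexing="ij")

def psf(dx, dy, sigma=150.0):
    """Isotropic Gaussian PSF (an assumed stand-in model)."""
    return np.exp(-(dx**2 + dy**2) / (2 * sigma**2))

# Off-grid molecule positions (nm) and brightnesses; the first two are closer
# than the PSF width, so they blur into a single diffraction-limited blob.
molecules = [(1230.4, 870.9, 1.0), (1310.2, 905.5, 0.7), (4000.0, 4000.0, 1.3)]
I = np.zeros_like(X)
for xj, yj, cj in molecules:
    I += cj * psf(X - xj, Y - yj)
```

Localizing the merged pair at the 10 nm target resolution is exactly the off-grid sparse recovery problem the talk's machinery addresses.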

SLIDE 36

Single Molecule Imaging

SLIDE 37

Single Molecule Imaging

[Figure: precision, recall, Jaccard index, and F-score vs. tolerance radius (10-50 nm) for Sparse, CoG, and quickPALM]

SLIDE 38

Summary and Future Work

• Unified approach for continuous problems in estimation
• Obviates incoherence and basis mismatch
• New insight into bounds on estimation error and exact recovery through convex duality and algebraic geometry
• State-of-the-art results in classic fields of spectrum estimation and system identification

Future work:
• Recovery of general sparse signal trains
• More general moment problems in signal processing

SLIDE 39

Acknowledgements

Work developed with Venkat Chandrasekaran, Babak Hassibi (Caltech), Weiyu Xu (Iowa), Pablo A. Parrilo, Alan Willsky (MIT), Maryam Fazel (Washington), Badri Bhaskar, Rob Nowak, Nikhil Rao, and Gongguo Tang (Wisconsin).

For all references, see: http://pages.cs.wisc.edu/~brecht/publications.html

SLIDE 40

References

• Atomic norm denoising with applications to line spectral estimation. Badri Narayan Bhaskar, Gongguo Tang, and Benjamin Recht. IEEE Transactions on Signal Processing, vol. 61, no. 23, pages 5987-5999, 2013.
• Compressed sensing off the grid. Gongguo Tang, Badri Bhaskar, Parikshit Shah, and Benjamin Recht. IEEE Transactions on Information Theory, vol. 59, no. 11, pages 7465-7490, 2013.
• Near minimax line spectral estimation. Gongguo Tang, Badri Narayan Bhaskar, and Benjamin Recht. Submitted to IEEE Transactions on Information Theory, 2013.
• The convex geometry of inverse problems. Venkat Chandrasekaran, Benjamin Recht, Pablo Parrilo, and Alan Willsky. Foundations of Computational Mathematics, vol. 12, no. 6, pages 805-849, 2012.

All references can be found at http://pages.cs.wisc.edu/~brecht/publications.html