Compressed sensing in the real world - The need for a new theory (PowerPoint presentation)

SLIDE 1

Compressed sensing in the real world - The need for a new theory

Anders C. Hansen (Cambridge)

Joint work with:

  • B. Adcock (Purdue)
  • C. Poon (Cambridge)
  • B. Roman (Cambridge)

Paris, January 13, 2014

1 / 49

SLIDE 2

Compressed Sensing in Inverse Problems

Typical analog/infinite-dimensional inverse problems where compressed sensing is, or can be, used:

(i) Magnetic Resonance Imaging (MRI)
(ii) X-ray Computed Tomography
(iii) Thermoacoustic and Photoacoustic Tomography
(iv) Single Photon Emission Computerized Tomography
(v) Electrical Impedance Tomography
(vi) Electron Microscopy
(vii) Reflection seismology
(viii) Radio interferometry
(ix) Fluorescence Microscopy

2 / 49

SLIDE 3

Compressed Sensing in Inverse Problems

Most of these problems are modelled by the Fourier transform

$$\mathcal{F}f(\omega) = \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i \omega \cdot x}\, dx,$$

or by the Radon transform $\mathcal{R}f : \mathbb{S} \times \mathbb{R} \to \mathbb{C}$ (where $\mathbb{S}$ denotes the circle),

$$\mathcal{R}f(\theta, p) = \int_{\langle x, \theta \rangle = p} f(x)\, dm(x),$$

where $dm$ denotes Lebesgue measure on the hyperplane $\{x : \langle x, \theta \rangle = p\}$.

◮ Fourier slice theorem ⇒ both problems can be viewed as the problem of reconstructing $f$ from pointwise samples of its Fourier transform:

$$g = \mathcal{F}f, \quad f \in L^2(\mathbb{R}^d). \tag{1}$$
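As a quick sanity check, here is a tiny numerical illustration of the slice theorem (my own sketch, not from the talk) for the angle θ = 0, where the Radon projection reduces to a sum along one axis and its 1D Fourier transform matches the corresponding central slice of the 2D Fourier transform:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((64, 64))                     # toy discrete image f(x, y)

proj = f.sum(axis=1)                         # Radon projection at theta = 0:
                                             # integrate f over lines x = const
slice_from_proj = np.fft.fft(proj)           # 1D transform of the projection
central_slice = np.fft.fft2(f)[:, 0]         # the ky = 0 line of the 2D transform

print(np.allclose(slice_from_proj, central_slice))  # True: Fourier slice theorem
```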

3 / 49

SLIDE 4

Compressed Sensing

◮ Given the linear system

$$U x_0 = y.$$

◮ Solve

$$\min_z \|z\|_1 \quad \text{subject to} \quad P_\Omega U z = P_\Omega y,$$

where $P_\Omega$ is a projection and $\Omega \subset \{1, \ldots, N\}$ is subsampled uniformly at random with $|\Omega| = m$. If

$$m \geq C \cdot N \cdot \mu(U) \cdot s \cdot \log(\epsilon^{-1}) \cdot \log(N),$$

then $\mathbb{P}(z = x_0) \geq 1 - \epsilon$, where

$$\mu(U) = \max_{i,j} |U_{i,j}|^2$$

is referred to as the incoherence parameter.
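As a concrete illustration of this setup (a minimal sketch, not code from the talk), the following assumes numpy and cvxpy are available, uses the unitary DFT as $U$, and stacks real and imaginary parts so the solver works over the reals:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, s, m = 128, 5, 60

x0 = np.zeros(N)                             # s-sparse signal, arbitrary support
x0[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)

U = np.fft.fft(np.eye(N)) / np.sqrt(N)       # unitary DFT matrix, mu(U) = 1/N
Omega = rng.choice(N, size=m, replace=False) # uniform random subsampling
A, y = U[Omega], U[Omega] @ x0               # P_Omega U and P_Omega y

# min ||z||_1 subject to P_Omega U z = P_Omega y (real/imaginary parts stacked)
z = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm1(z)),
           [np.vstack([A.real, A.imag]) @ z ==
            np.concatenate([y.real, y.imag])]).solve()
print("recovery error:", np.linalg.norm(z.value - x0))
```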

4 / 49

SLIDE 5

Pillars of Compressed Sensing

◮ Sparsity
◮ Incoherence
◮ Uniform random subsampling

In addition: the Restricted Isometry Property and uniform recovery.

Problem: These concepts are absent in virtually all the problems listed above. Moreover, uniform random subsampling gives highly suboptimal results. Compressed sensing is currently used with great success in these fields; however, the current theory does not cover this.

5 / 49

SLIDE 6

Uniform Random Subsampling

$$U = U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1}.$$

Figure: 5% subsampling map, reconstruction, and enlarged detail.

6 / 49

SLIDE 7

Sparsity

◮ The classical idea of sparsity in compressed sensing is that there are $s$ important coefficients in the vector $x_0$ that we want to recover.

◮ The location of these coefficients is arbitrary.

7 / 49

SLIDE 8

Sparsity and the Flip Test

Let $x$ be the image to recover, and let

$$y = U_{\mathrm{dft}} x, \qquad A = P_\Omega U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1},$$

where $P_\Omega$ is a projection and $\Omega \subset \{1, \ldots, N\}$ is subsampled with $|\Omega| = m$. Solve

$$\min_z \|z\|_1 \quad \text{subject to} \quad Az = P_\Omega y.$$

8 / 49

SLIDE 9

Sparsity - The Flip Test

Figure: Wavelet coefficients (truncated, max = 151.58) and subsampling reconstructions from 10% of Fourier coefficients with distributions $(1 + \omega_1^2 + \omega_2^2)^{-1}$ and $(1 + \omega_1^2 + \omega_2^2)^{-3/2}$.

If sparsity is the right model, we should be able to flip the coefficients. Let $z_f$ be the vector of wavelet coefficients in reversed order.

9 / 49

SLIDE 10

Sparsity - The Flip Test

◮ Let $\tilde{y} = U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1} z_f$.

◮ Solve $\min_z \|z\|_1$ subject to $Az = P_\Omega \tilde{y}$ to get $\hat{z}_f$.

◮ Flip the coefficients of $\hat{z}_f$ back to get $\hat{z}$, and let $\hat{x} = V_{\mathrm{dwt}}^{-1} \hat{z}$.

◮ If the ordering of the wavelet coefficients did not matter, i.e. sparsity is the right model, then $\hat{x}$ should be close to $x$.
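To make the procedure concrete, here is a small 1D sketch of the flip test (my own illustration, not the authors' code), assuming numpy, pywt and cvxpy; `dwt_matrix` builds the orthogonal DWT as an explicit matrix so the flip can be applied to the coefficient vector directly:

```python
import numpy as np
import pywt
import cvxpy as cp

def dwt_matrix(n, wavelet="db4"):
    # Orthogonal DWT as an explicit matrix: transform each standard basis vector.
    cols = [np.concatenate(pywt.wavedec(e, wavelet, mode="periodization"))
            for e in np.eye(n)]
    return np.array(cols).T          # V @ x = wavelet coefficients of x

def l1_recover(A, y):
    # min ||z||_1 s.t. A z = y, with real/imaginary parts stacked
    z = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm1(z)),
               [np.vstack([A.real, A.imag]) @ z ==
                np.concatenate([y.real, y.imag])]).solve()
    return z.value

n = 64
t = np.linspace(0, 1, n)
x = np.sin(6 * np.pi * t) + (t > 0.5)          # toy signal with a jump
V = dwt_matrix(n)                               # V_dwt (orthogonal: V^{-1} = V.T)
F = np.fft.fft(np.eye(n)) / np.sqrt(n)          # U_dft
Omega = np.random.default_rng(1).choice(n, 40, replace=False)
A = F[Omega] @ V.T                              # P_Omega U_dft V_dwt^{-1}

z_hat = l1_recover(A, F[Omega] @ x)             # ordinary CS reconstruction
zf = (V @ x)[::-1]                              # flipped wavelet coefficients
z_f_hat = l1_recover(A, A @ zf)[::-1]           # recover, then flip back
print("plain  :", np.linalg.norm(V.T @ z_hat - x))
print("flipped:", np.linalg.norm(V.T @ z_f_hat - x))
```

On signals like this, the flipped reconstruction is noticeably worse, which is the point of the test.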

10 / 49

SLIDE 11

Sparsity - The Flip Test: Results

Figure: The reconstructions from the reversed coefficients.

Conclusion: The ordering of the coefficients did matter. Moreover, this phenomenon occurs with all wavelets, curvelets, contourlets and shearlets, and with any reasonable subsampling scheme.

Question: Is sparsity really the right model?

11 / 49

SLIDE 12

Sparsity - The Flip Test

Figure: CS reconstruction, CS reconstruction with flipped coefficients, and subsampling pattern.

◮ Fluorescence Microscopy: 512×512, 20% subsampling, $U_{\mathrm{Had}} V_{\mathrm{dwt}}^{-1}$.
◮ Compressive Imaging, Hadamard Spectroscopy: 1024×1024, 12% subsampling, $U_{\mathrm{Had}} V_{\mathrm{dwt}}^{-1}$.

12 / 49

SLIDE 13

Sparsity - The Flip Test (contd.)

Figure: CS reconstruction, CS reconstruction with flipped coefficients, and subsampling pattern.

◮ Magnetic Resonance Imaging: 1024×1024, 20% subsampling, $U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1}$.
◮ Tomography, Electron Microscopy: 512×512, 12% subsampling, $U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1}$.

13 / 49

SLIDE 14

Sparsity - The Flip Test (contd.)

Figure: CS reconstruction, CS reconstruction with flipped coefficients, and subsampling pattern.

◮ Radio interferometry: 1024×1024, 10% subsampling, $U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1}$.

14 / 49

SLIDE 15

What about the RIP?

◮ Did any of the matrices used in the examples satisfy the RIP?

15 / 49

SLIDE 16

Images are not sparse, they are asymptotically sparse

How to measure asymptotic sparsity: Suppose

$$f = \sum_{j=1}^{\infty} \beta_j \varphi_j.$$

Let $\mathbb{N} = \bigcup_{k \in \mathbb{N}} \{M_{k-1}+1, \ldots, M_k\}$, where $0 = M_0 < M_1 < M_2 < \ldots$ and $\{M_{k-1}+1, \ldots, M_k\}$ is the set of indices corresponding to the $k$th scale. Let $\epsilon \in (0, 1]$ and let

$$s_k := s_k(\epsilon) = \min\left\{ K : \left\| \sum_{i=1}^{K} \beta_{\pi(i)} \varphi_{\pi(i)} \right\| \geq \epsilon \left\| \sum_{i=M_{k-1}+1}^{M_k} \beta_i \varphi_i \right\| \right\};$$

in other words, $s_k$ is the effective sparsity at the $k$th scale. Here $\pi : \{1, \ldots, M_k - M_{k-1}\} \to \{M_{k-1}+1, \ldots, M_k\}$ is a bijection such that $|\beta_{\pi(i)}| \geq |\beta_{\pi(i+1)}|$.
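A small sketch (an illustration only, assuming numpy and pywt) of computing the relative effective sparsity $s_k(\epsilon)/(M_k - M_{k-1})$ per wavelet scale; for an orthonormal basis, the norm of the best $K$-term approximation is the square root of the sum of the $K$ largest squared coefficients:

```python
import numpy as np
import pywt

def relative_sparsity(x, eps, wavelet="db8"):
    # One entry per scale: s_k(eps) / (M_k - M_{k-1})
    out = []
    for c in pywt.wavedec(x, wavelet, mode="periodization"):
        mags = np.sort(np.abs(c))[::-1]           # |beta_pi(1)| >= |beta_pi(2)| >= ...
        norms = np.sqrt(np.cumsum(mags ** 2))     # norm of the best K-term approximation
        k = int(np.searchsorted(norms, eps * norms[-1])) + 1
        out.append(min(k, len(c)) / len(c))
    return out

t = np.linspace(0, 1, 1024)
signal = np.sin(4 * np.pi * t) + 0.5 * (t > 0.3)
print(relative_sparsity(signal, eps=0.9))         # much smaller at fine scales
```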

16 / 49

SLIDE 17

Images are not sparse, they are asymptotically sparse

Figure: Relative sparsity $s_k(\epsilon)/(M_k - M_{k-1})$ of Daubechies 8 wavelet coefficients as a function of the relative threshold $\epsilon$, for levels 1–8, together with the worst and best sparsity curves.

17 / 49

SLIDE 18

Images are not sparse, they are asymptotically sparse

Figure: Relative sparsity $s_k(\epsilon)/(M_k - M_{k-1})$ as a function of the relative threshold $\epsilon$ for curvelets (levels 1–7), contourlets (levels 1–6) and shearlets (levels 1–5), each with worst and best sparsity curves.

18 / 49

SLIDE 19

Analog inverse problems are coherent

Let

$$U_n = U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1} \in \mathbb{C}^{n \times n},$$

where $U_{\mathrm{dft}}$ is the discrete Fourier transform and $V_{\mathrm{dwt}}$ is the discrete wavelet transform. Then $\mu(U_n) = 1$ for all $n$ and all Daubechies wavelets!
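The claim is easy to test numerically. Below is a small check (my own sketch, assuming numpy and pywt), using Haar, the simplest Daubechies wavelet, where the full-level decomposition makes the coarsest scaling vector the constant vector, which aligns perfectly with the zero-frequency row of the DFT:

```python
import numpy as np
import pywt

def dwt_matrix(n, wavelet):
    # Orthogonal DWT as an explicit matrix (see the flip-test sketch above).
    cols = [np.concatenate(pywt.wavedec(e, wavelet, mode="periodization"))
            for e in np.eye(n)]
    return np.array(cols).T

for n in (64, 128, 256, 512):
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT
    V = dwt_matrix(n, "haar")                # orthogonal DWT, so V^{-1} = V.T
    U = F @ V.T                              # U_n = U_dft V_dwt^{-1}
    print(n, np.max(np.abs(U)) ** 2)         # mu(U_n): stays at 1.0 for every n
```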

19 / 49

SLIDE 20

Analog inverse problems are coherent, why?

Note that

$$\operatorname{WOT-lim}_{n \to \infty} U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1} = U,$$

where

$$U = \begin{pmatrix} \langle \varphi_1, \psi_1 \rangle & \langle \varphi_2, \psi_1 \rangle & \cdots \\ \langle \varphi_1, \psi_2 \rangle & \langle \varphi_2, \psi_2 \rangle & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}.$$

Thus, we will always have $\mu(U_{\mathrm{dft}} V_{\mathrm{dwt}}^{-1}) \geq c$.

20 / 49

SLIDE 21

Analog inverse problems are asymptotically incoherent

Figure: Plots of the absolute values of the entries of the matrix U: Fourier to DB4 (left) and Fourier to Legendre polynomials (right).

21 / 49

SLIDE 22

Hadamard and wavelets are coherent

Let

$$U_n = H V_{\mathrm{dwt}}^{-1} \in \mathbb{C}^{n \times n},$$

where $H$ is a Hadamard matrix and $V_{\mathrm{dwt}}$ is the discrete wavelet transform. Then $\mu(U_n) = 1$ for all $n$ and all Daubechies wavelets!

22 / 49

SLIDE 23

Hadamard and wavelets are asymptotically incoherent

Figure: Hadamard to Haar (left) and Hadamard to DB8 (right).

23 / 49

SLIDE 24

We need a new theory

◮ Such a theory must incorporate asymptotic sparsity and asymptotic incoherence.

◮ It must explain the two intriguing phenomena observed in practice:

  ◮ The optimal sampling strategy is signal structure dependent.
  ◮ The success of compressed sensing is resolution dependent.

◮ The theory cannot be RIP based (at least not with the classical definition of the RIP).

24 / 49

SLIDE 25

Sparsity in levels

Definition

For $r \in \mathbb{N}$ let $M = (M_1, \ldots, M_r) \in \mathbb{N}^r$ with $1 \leq M_1 < \ldots < M_r$ and $s = (s_1, \ldots, s_r) \in \mathbb{N}^r$, with $s_k \leq M_k - M_{k-1}$, $k = 1, \ldots, r$, where $M_0 = 0$. We say that $\beta \in \ell^2(\mathbb{N})$ is $(s, M)$-sparse if, for each $k = 1, \ldots, r$,

$$\Delta_k := \operatorname{supp}(\beta) \cap \{M_{k-1}+1, \ldots, M_k\}$$

satisfies $|\Delta_k| \leq s_k$. We denote the set of $(s, M)$-sparse vectors by $\Sigma_{s,M}$.
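For finite vectors the definition is easy to operationalize; the helper below is a hypothetical illustration (assuming numpy), not notation from the talk:

```python
import numpy as np

def is_sparse_in_levels(beta, s, M):
    # beta: coefficient vector; M = (M_1, ..., M_r); s = (s_1, ..., s_r)
    edges = [0] + list(M)                      # prepend M_0 = 0
    for k in range(1, len(edges)):
        level = beta[edges[k-1]:edges[k]]      # indices {M_{k-1}+1, ..., M_k}
        if np.count_nonzero(level) > s[k-1]:   # |Delta_k| <= s_k must hold
            return False
    return True

beta = np.array([1.0, 0, 2.0, 0, 0, 0, 3.0, 0])
print(is_sparse_in_levels(beta, s=(2, 1), M=(4, 8)))  # True
```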

25 / 49

SLIDE 26

Sparsity in levels

Definition

Let $f = \sum_{j \in \mathbb{N}} \beta_j \varphi_j \in \mathcal{H}$, where $\beta = (\beta_j)_{j \in \mathbb{N}} \in \ell^1(\mathbb{N})$. Let

$$\sigma_{s,M}(f) := \min_{\eta \in \Sigma_{s,M}} \|\beta - \eta\|_{\ell^1}. \tag{2}$$

26 / 49

SLIDE 27

Multi-level sampling scheme

Definition

Let $r \in \mathbb{N}$, $N = (N_1, \ldots, N_r) \in \mathbb{N}^r$ with $1 \leq N_1 < \ldots < N_r$, and $m = (m_1, \ldots, m_r) \in \mathbb{N}^r$, with $m_k \leq N_k - N_{k-1}$, $k = 1, \ldots, r$, and suppose that

$$\Omega_k \subseteq \{N_{k-1}+1, \ldots, N_k\}, \quad |\Omega_k| = m_k, \quad k = 1, \ldots, r,$$

are chosen uniformly at random, where $N_0 = 0$. We refer to the set

$$\Omega = \Omega_{N,m} := \Omega_1 \cup \ldots \cup \Omega_r$$

as an $(N, m)$-multilevel sampling scheme.
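A minimal sketch (assuming numpy; the function name is my own) of drawing such a scheme: $m_k$ indices chosen uniformly at random from each band $\{N_{k-1}+1, \ldots, N_k\}$:

```python
import numpy as np

def multilevel_scheme(N, m, seed=None):
    # N = (N_1, ..., N_r), m = (m_1, ..., m_r) with m_k <= N_k - N_{k-1}
    rng = np.random.default_rng(seed)
    edges, Omega = [0] + list(N), []
    for k in range(1, len(edges)):
        band = np.arange(edges[k-1], edges[k])          # k-th sampling band (0-based)
        Omega.append(rng.choice(band, size=m[k-1], replace=False))
    return np.sort(np.concatenate(Omega))

# e.g. take all 32 lowest frequencies, then subsample the higher bands
Omega = multilevel_scheme(N=(32, 128, 512), m=(32, 48, 64), seed=0)
print(len(Omega))  # 144 samples out of 512
```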

27 / 49

SLIDE 28

Local coherence

Definition

Let $U \in \mathbb{C}^{N \times N}$. If $N = (N_1, \ldots, N_r) \in \mathbb{N}^r$ and $M = (M_1, \ldots, M_r) \in \mathbb{N}^r$ with $1 \leq N_1 < \ldots < N_r$ and $1 \leq M_1 < \ldots < M_r$, we define the $(k,l)$th local coherence of $U$ with respect to $N$ and $M$ by

$$\mu_{N,M}(k,l) = \sqrt{ \mu\big(P_{N_{k-1}}^{N_k} U P_{M_{l-1}}^{M_l}\big) \cdot \mu\big(P_{N_{k-1}}^{N_k} U\big) }, \quad k, l = 1, \ldots, r,$$

where $N_0 = M_0 = 0$ and $P_a^b$ denotes the projection onto $\operatorname{span}\{e_j : a < j \leq b\}$.
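For a finite matrix this is straightforward to evaluate; the sketch below (my own helper, assuming numpy) computes $\mu_{N,M}(k,l)$ with the levels passed as tuples:

```python
import numpy as np

def local_coherence(U, Nlv, Mlv, k, l):
    # (k,l)-th local coherence of U w.r.t. row levels Nlv and column levels Mlv
    N, M = [0] + list(Nlv), [0] + list(Mlv)
    rows, cols = slice(N[k-1], N[k]), slice(M[l-1], M[l])
    mu_block = np.max(np.abs(U[rows, cols])) ** 2   # mu(P U P) on the (k,l) block
    mu_rows = np.max(np.abs(U[rows, :])) ** 2       # mu(P U) on the k-th row band
    return np.sqrt(mu_block * mu_rows)

F = np.fft.fft(np.eye(64)) / 8.0                    # unitary 64x64 DFT (sqrt(64) = 8)
print(local_coherence(F, Nlv=(8, 32, 64), Mlv=(8, 32, 64), k=1, l=1))  # 1/64
```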

28 / 49

SLIDE 29

The optimization problem

$$\inf_{\eta \in \ell^1(\mathbb{N})} \|\eta\|_{\ell^1} \quad \text{subject to} \quad \|P_\Omega U \eta - y\| \leq \delta. \tag{3}$$

29 / 49

SLIDE 30

Theoretical Results

Theorem

Let $U \in \mathbb{C}^{N \times N}$ be an isometry and $\beta \in \mathbb{C}^N$. Suppose that $\Omega = \Omega_{N,m}$ is a multilevel sampling scheme, where $N = (N_1, \ldots, N_r) \in \mathbb{N}^r$ and $m = (m_1, \ldots, m_r) \in \mathbb{N}^r$. Let $(s, M)$, where $M = (M_1, \ldots, M_r) \in \mathbb{N}^r$, $M_1 < \ldots < M_r$, and $s = (s_1, \ldots, s_r) \in \mathbb{N}^r$, be any pair such that the following holds: for $\epsilon > 0$ and $1 \leq k \leq r$,

$$1 \gtrsim \frac{N_k - N_{k-1}}{m_k} \cdot \log(\epsilon^{-1}) \cdot \left( \sum_{l=1}^{r} \mu_{N,M}(k,l) \cdot s_l \right) \cdot \log(N), \tag{4}$$

and $m_k \gtrsim \hat{m}_k \cdot (\log(\epsilon^{-1}) + 1) \cdot \log(N)$, with $\hat{m}_k$ satisfying

$$1 \gtrsim \sum_{k=1}^{r} \left( \frac{N_k - N_{k-1}}{\hat{m}_k} - 1 \right) \cdot \mu_{N,M}(k,l) \cdot \tilde{s}_k, \quad \forall\, l = 1, \ldots, r, \tag{5}$$

for all $\tilde{s}_1, \ldots, \tilde{s}_r \in (0, \infty)$ such that

30 / 49

SLIDE 31

Theoretical Results

Theorem

$$\tilde{s}_1 + \ldots + \tilde{s}_r \leq s_1 + \ldots + s_r = s, \qquad \tilde{s}_k \leq S_k(N, M, s).$$

Suppose that $\xi \in \ell^1(\mathbb{N})$ is a minimizer of (3). Then, with probability exceeding $1 - s\epsilon$, we have that

$$\|\xi - \beta\| \leq C \cdot \left( \delta \cdot \sqrt{K} \cdot \left(1 + L \cdot \sqrt{s}\right) + \sigma_{s,M}(f) \right), \tag{6}$$

for some constant $C$, where $\sigma_{s,M}(f)$ is as in (2),

$$L = 1 + \sqrt{\frac{\log_2(6\epsilon^{-1})}{\log_2(4KM\sqrt{s})}}, \qquad K = \max_{k=1,\ldots,r} \left\{ \frac{N_k - N_{k-1}}{m_k} \right\}.$$

31 / 49

SLIDE 32

Theoretical Results

$$S_k = S_k(N, M, s) = \max_{\eta \in \Theta} \|P_{N_{k-1}}^{N_k} U \eta\|^2,$$

where $\Theta$ is given by

$$\Theta = \{\eta : \|\eta\|_{\ell^\infty} \leq 1,\ |\operatorname{supp}(P_{M_{l-1}}^{M_l} \eta)| = s_l,\ l = 1, \ldots, r\}.$$

32 / 49

SLIDE 33

Fourier to wavelets

$$m_k \gtrsim \log(\epsilon^{-1}) \cdot \log(N) \cdot \frac{N_k - N_{k-1}}{N_{k-1}} \cdot \left( \hat{s}_k + \sum_{l=1}^{k-2} s_l \cdot 2^{-\alpha(k-1-l)} + \sum_{l=k+2}^{r} s_l \cdot 2^{-v(l-1-k)} \right),$$

where $\hat{s}_k = \max\{s_{k-1}, s_k, s_{k+1}\}$.

33 / 49

SLIDE 34

r-level Sampling Scheme

Figure: The typical sampling pattern that will be used.

34 / 49

SLIDE 35

The Berlin Cathedral (256x256)

                  15% Random           15% Multilevel
                  Bernoulli to DB8     Hadamard to DB8
RAM (GB):         4.8                  < 0.1
Speed (it/s):     1.31                 18.1
Conv. (sec):      (4m27s) 267          18.6
Rel. err. (%):    22.4                 14.7

(The fully sampled original image is shown alongside for reference.)

35 / 49

SLIDE 36

The Berlin Cathedral (512x512)

                  15% Random           15% Multilevel
                  Bernoulli to DB8     Hadamard to DB8
RAM (GB):         76.8                 < 0.1
Speed (it/s):     0.15                 4.9
Conv. (sec):      (42m) 2517           (1m13s) 73.4
Rel. err. (%):    19.0                 12.2

(The fully sampled original image is shown alongside for reference.)

36 / 49

SLIDE 37

The Berlin Cathedral (1024x1024)

                  15% Random           15% Multilevel
                  Bernoulli to DB8     Hadamard to DB8
RAM (GB):         1229                 < 0.1
Speed (it/s):     0.0161               1.07
Conv. (sec):      6h 36m               (3m45s) 225.4
Rel. err. (%):    —                    10.4

(The fully sampled original image is shown alongside for reference.)

37 / 49

SLIDE 38

The Berlin Cathedral (2048x2048)

                  15% Random           15% Multilevel
                  Bernoulli to DB8     Hadamard to DB8
RAM (GB):         19661                < 0.1
Speed (it/s):     —                    0.17
Conv. (sec):      —                    (28m) 1687
Rel. err. (%):    —                    8.5

(The fully sampled original image is shown alongside for reference.)

38 / 49

SLIDE 39

The Berlin Cathedral (4096x4096)

                  15% Random           15% Multilevel
                  Bernoulli to DB8     Hadamard to DB8
RAM (GB):         314573               < 0.1
Speed (it/s):     —                    0.041
Conv. (sec):      —                    (1h37m) 5852
Rel. err. (%):    —                    6.56

(The fully sampled original image is shown alongside for reference.)

39 / 49

SLIDE 40

The Berlin Cathedral (8192x8192)

                  15% Random           15% Multilevel
                  Bernoulli to DB8     Hadamard to DB8
RAM (GB):         5033165              < 0.1
Speed (it/s):     —                    0.0064
Conv. (sec):      238d                 (8h30m) 30623
Rel. err. (%):    —                    3.5

(The fully sampled original image is shown alongside for reference.)

40 / 49

SLIDE 41

The Key Question

◮ Is universality really what we want?

41 / 49

SLIDE 42

Gaussian w/model based CS vs multi-level and Hadamard

Figure: Gaussian with model-based CS vs multilevel Hadamard: 30% Gaussian to wavelet (left), 30% Hadamard to wavelet (middle), original image (right).

42 / 49

SLIDE 43

Gaussian w/model based CS vs multi-level and Hadamard

Figure: Gaussian with model-based CS vs multilevel Hadamard: 30% Gaussian to wavelet (left), 30% Hadamard to curvelets (middle), original image (right).

43 / 49

SLIDE 44

The GLPU-Phantom

The Guerquin-Kern, Lejeune, Pruessmann, Unser phantom (ETH and EPFL).

44 / 49

SLIDE 45

256 × 256 full sampling and 5% subsampling (DB4)

The MSEs of the two reconstructions are clearly different.

45 / 49

SLIDE 46

4096 × 4096 full sampling and 4% subsampling (DB4)

The MSE is the same for both reconstructions.

46 / 49

SLIDE 47

Seeing further with compressed sensing

Figure: 512 × 512 full sampling (= 262144 samples) with 2048 × 2048 zero padding.

47 / 49

SLIDE 48

Seeing further with compressed sensing

Figure: 6.25% subsampling from 2048 × 2048 (= 262144 samples) with DB4.

48 / 49

SLIDE 49

Take home message: Compressed Sensing Vol. II

◮ The optimal sampling strategy depends on the structure of the signal (unless you have perfect incoherence).

◮ Real-world problems are usually completely coherent, yet asymptotically incoherent. Thus, one must use multi-level sampling.

◮ We have covered the abstract orthonormal basis case, but there is tons of work to be done (frames, TV, curvelets, contourlets, shearlets, polynomials, etc.).

◮ When building hardware one does not need to strive for incoherence; one only needs asymptotic incoherence.

◮ Speed and storage issues in compressive imaging can be solved by using multi-level sampling.

◮ Related work:

  ◮ Krahmer and Ward
  ◮ Baraniuk, Cevher, Duarte, Hegde (model-based CS)
  ◮ Calderbank, Carin, Carson, Chen, Rodrigues

49 / 49