

SLIDE 1

Iteratively Reweighted ℓ1 Approaches to Sparse Composite Regularization Phil Schniter

Joint work with Prof. Rizwan Ahmad (OSU)

Supported in part by NSF grant CCF-1018368.

MATHEON Conf. on Compressed Sensing and its Applications TU-Berlin — Dec 11, 2015

SLIDE 2

Introduction and Motivation for Composite Penalties

Outline

1. Introduction and Motivation for Composite Penalties
2. Co-L1 and its Interpretations
3. Co-IRW-L1 and its Interpretations
4. Numerical Experiments

Phil Schniter (Ohio State) Composite ℓ1 Regularization MATHEON — Dec’15 2 / 29

SLIDE 3

Introduction and Motivation for Composite Penalties

Introduction

Goal: Recover the signal x ∈ C^N from noisy linear measurements y = Φx + w ∈ C^M, where usually M ≪ N.

Approach: Solve the optimization problem

    x̂ = arg min_x γ‖y − Φx‖₂² + R(x),

with γ > 0 controlling the measurement fidelity.

Question: How should we choose the penalty/regularization R(x)?
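As an aside, a minimal numerical sketch (ours, not from the talk) of solving this problem for the special case R(x) = λ‖x‖₁ via proximal gradient (ISTA); the function name and parameters are illustrative assumptions:

```python
import numpy as np

def ista(Phi, y, lam, gamma=1.0, n_iters=3000):
    """Minimize gamma*||y - Phi x||_2^2 + lam*||x||_1 via ISTA.

    A sketch for the identity analysis operator (Psi = I); the talk's
    generalized-analysis case needs ADMM or a similar solver instead.
    """
    M, N = Phi.shape
    L = 2 * gamma * np.linalg.norm(Phi, 2) ** 2      # Lipschitz constant of the smooth term
    x = np.zeros(N)
    for _ in range(n_iters):
        grad = 2 * gamma * Phi.T @ (Phi @ x - y)     # gradient of gamma*||y - Phi x||_2^2
        z = x - grad / L                             # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold (prox of L1)
    return x
```

Any of the faster solvers named on the next slides (MFISTA, NESTA-UP, ...) plays the same role; ISTA is shown only because it fits in a few lines.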

SLIDE 4

Introduction and Motivation for Composite Penalties

Typical Choices of Penalty

Say Ψx is (approximately) sparse for an "analysis operator" Ψ ∈ C^{L×N}.

ℓ0 penalty: R(x) = ‖Ψx‖₀
    Impractical: the optimization problem is NP-hard.

ℓ1 penalty (generalized LASSO): R(x) = ‖Ψx‖₁
    Tightest convex relaxation of the ℓ0 penalty.
    Fast algorithms: ADMM, MFISTA, NESTA-UP, grAMPa, ...

Non-convex penalties:
    R(x) = ‖Ψx‖_p^p for p ∈ (0, 1) (via IRW-L2)
    R(x) = Σ_{l=1}^L log(ǫ + |ψᵀ_l x|) with ǫ ≥ 0 (via IRW-L1)

many others...

SLIDE 5

Introduction and Motivation for Composite Penalties

Choice of Analysis Operator

How to choose Ψ in practice?
Maybe a wavelet transform? Which one?
Maybe a concatenation of several transforms, Ψ = [Ψ₁; ...; Ψ_D] (e.g., SARA¹)?
What if the signal is more sparse in one dictionary than another? Can we compensate for this? Can we exploit this?

¹ Carrillo, McEwen, Van De Ville, Thiran, Wiaux, "Sparsity averaged reweighted analysis," IEEE SPL, 2013.

SLIDE 6

Introduction and Motivation for Composite Penalties

Example: Undecimated Wavelet Transform of MRI Cine

Note the different sparsity rate in each subband of a 1-level UWT:

SLIDE 7

Introduction and Motivation for Composite Penalties

Composite ℓ1 Penalties

We propose to use composite ℓ1 (Co-L1) penalties of the form

    R(x; λ) = Σ_{d=1}^D λ_d ‖Ψ_d x‖₁,   λ_d ≥ 0,

where the Ψ_d ∈ C^{L_d×N} have unit-norm rows. The Ψ_d could be chosen, for example, as
    different DWTs (i.e., db1, db2, db3, ..., db10),
    different subbands of a given DWT,
    row-subsets of I (i.e., group/hierarchical sparsity),
    or all of the above.

We then aim to simultaneously tune the weights {λd} and recover the signal x.
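The penalty itself is simple to evaluate; a NumPy sketch (the helper name is ours):

```python
import numpy as np

def co_l1_penalty(x, Psis, lams):
    """Composite L1 penalty R(x; lam) = sum_d lam_d * ||Psi_d x||_1.

    Psis: list of (L_d x N) analysis operators with unit-norm rows.
    lams: list of per-dictionary weights lam_d >= 0.
    """
    return sum(lam * np.abs(Psi @ x).sum() for lam, Psi in zip(lams, Psis))
```

With a single identity dictionary and unit weight, this reduces to the ordinary ℓ1 norm, as expected.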

SLIDE 8

Introduction and Motivation for Composite Penalties

The Co-L1 Algorithm

1: input: {Ψ_d}_{d=1}^D, Φ, y, γ > 0, ǫ ≥ 0
2: if Ψ_d x ∈ R^{L_d} then C_d = 1; elseif Ψ_d x ∈ C^{L_d} then C_d = 2
3: initialization: λ_d^{(1)} = 1 ∀d
4: for t = 1, 2, 3, ...
5:     x^{(t)} ← arg min_x γ‖y − Φx‖₂² + Σ_{d=1}^D λ_d^{(t)} ‖Ψ_d x‖₁
6:     λ_d^{(t+1)} ← C_d L_d / (ǫ + ‖Ψ_d x^{(t)}‖₁),  d = 1, ..., D
7: end
8: output: x^{(t)}

The algorithm
    leverages existing ℓ1 solvers (e.g., ADMM, MFISTA, NESTA-UP, grAMPa),
    reduces to the IRW-L1 algorithm [Figueiredo, Nowak '07] when L_d = 1 ∀d (single-atom dictionaries),
    applies to both real- and complex-valued cases.
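The outer loop can be sketched as follows, with any weighted-ℓ1 solver plugged in for step 5; the function names and the real-valued restriction (C_d = 1) are our assumptions:

```python
import numpy as np

def co_l1(Phi, y, Psis, gamma, eps=1e-3, n_outer=10, l1_solver=None):
    """Sketch of the Co-L1 iteration (real-valued case, C_d = 1).

    l1_solver(Phi, y, Psis, lams, gamma) must return
    arg min_x gamma*||y - Phi x||_2^2 + sum_d lams[d]*||Psi_d x||_1;
    any weighted-L1 solver (ADMM, MFISTA, ...) can be plugged in.
    """
    D = len(Psis)
    lams = np.ones(D)                                # step 3: lam_d^(1) = 1
    for _ in range(n_outer):                         # step 4
        x = l1_solver(Phi, y, Psis, lams, gamma)     # step 5: weighted-L1 subproblem
        for d, Psi in enumerate(Psis):               # step 6: reweighting
            lams[d] = Psi.shape[0] / (eps + np.abs(Psi @ x).sum())
    return x, lams
```

Note how step 6 automatically upweights the dictionaries in which the current iterate is sparser, which is precisely the "learn the weights" behavior motivated earlier.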

SLIDE 9

Introduction and Motivation for Composite Penalties

The Co-IRW-L1 Algorithm

1: input: {Ψ_d}_{d=1}^D, Φ, y, γ > 0
2: initialization: λ_d^{(1)} = 1 ∀d, W_d^{(1)} = I ∀d
3: for t = 1, 2, 3, ...
4:     x^{(t)} ← arg min_x γ‖y − Φx‖₂² + Σ_{d=1}^D λ_d^{(t)} ‖W_d^{(t)} Ψ_d x‖₁
5:     (λ_d^{(t+1)}, ǫ_d^{(t+1)}) ← arg max_{λ_d∈Λ, ǫ_d>0} log p(x^{(t)}; λ, ǫ),  d = 1, ..., D
6:     W_d^{(t+1)} ← diag( 1/(ǫ_d^{(t+1)} + |ψᵀ_{d,1} x^{(t)}|), ..., 1/(ǫ_d^{(t+1)} + |ψᵀ_{d,L_d} x^{(t)}|) ),  d = 1, ..., D
7: end
8: output: x^{(t)}

The algorithm tunes both λ_d and the diagonal W_d for all d: hierarchical weighting.
It also tunes the regularization parameters ǫ_d for all d.

SLIDE 10

Introduction and Motivation for Composite Penalties

Understanding Co-L1 and Co-IRW-L1

In the sequel, we provide four interpretations of each algorithm:

1. majorization-minimization (MM) for a particular non-convex penalty,
2. a particular approximation of ℓ0 minimization,
3. Bayesian estimation according to a particular hierarchical prior,
4. a variational EM algorithm under a particular prior.

SLIDE 11

Co-L1 and its Interpretations

Outline

1. Introduction and Motivation for Composite Penalties
2. Co-L1 and its Interpretations
3. Co-IRW-L1 and its Interpretations
4. Numerical Experiments

SLIDE 12

Co-L1 and its Interpretations

Optimization Interpretations of Co-L1

Co-L1 is an MM approach to the weighted log-sum optimization problem

    arg min_x γ‖y − Φx‖₂² + Σ_{d=1}^D L_d log(ǫ + ‖Ψ_d x‖₁)

and, as ǫ → 0, Co-L1 aims to solve the weighted ℓ1,0 problem

    arg min_x γ‖y − Φx‖₂² + Σ_{d=1}^D L_d 1_{‖Ψ_d x‖₁ > 0}

Note: L_d is the number of atoms in dictionary Ψ_d, and 1 is the indicator function.

Phil Schniter (Ohio State) Composite ℓ1 Regularization MATHEON — Dec’15 12 / 29

slide-13
SLIDE 13

Co-L1 and its Interpretations

Approximate-ℓ0 Interpretation of Log-Sum Penalty

(1/log(1/ǫ)) Σ_{n=1}^N log(ǫ + |x_n|)
    = (1/log(1/ǫ)) [ Σ_{n: x_n=0} log(ǫ) + Σ_{n: x_n≠0} log(ǫ + |x_n|) ]
    = ‖x‖₀ − N + Σ_{n: x_n≠0} log(ǫ + |x_n|) / log(1/ǫ)

[Figure: scaled log-sum penalty vs. |x_n| for ǫ ∈ {0.1, 0.001, 1e-13}, compared with the ℓ1 and ℓ0 penalties.]

As ǫ→0, the log-sum penalty becomes a scaled and shifted version of the ℓ0 penalty.
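A quick numerical check (ours, not from the talk) that the scaled log-sum approaches ‖x‖₀ − N:

```python
import numpy as np

def scaled_log_sum(x, eps):
    """The slide's scaled log-sum: (1/log(1/eps)) * sum_n log(eps + |x_n|)."""
    return np.sum(np.log(eps + np.abs(x))) / np.log(1.0 / eps)

x = np.array([0.0, 0.0, 0.7, 1.3])   # ||x||_0 = 2, N = 4
for eps in [1e-2, 1e-6, 1e-12]:
    # approaches ||x||_0 - N = -2 as eps -> 0
    print(eps, scaled_log_sum(x, eps))
```

The zero entries dominate (each contributes log(ǫ)/log(1/ǫ) = −1), while each nonzero entry's contribution vanishes at rate 1/log(1/ǫ).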

SLIDE 14

Co-L1 and its Interpretations

Bayesian Interpretations of Co-L1

Co-L1 is an MM approach to Bayesian MAP estimation under an AWGN likelihood and the hierarchical prior

    p(x|λ) = Π_{d=1}^D (λ_d/2)^{L_d} exp(−λ_d ‖Ψ_d x‖₁)   [i.i.d. Laplacian]
    p(λ) = Π_{d=1}^D Γ(0, 1/ǫ)   [i.i.d. Gamma (i.i.d. Jeffreys as ǫ → 0)]

and, as ǫ → 0, Co-L1 is a variational EM approach to estimating the (deterministic) λ under an AWGN likelihood and the prior

    p(x; λ) = Π_{d=1}^D (λ_d/2)^{L_d} exp(−λ_d (‖Ψ_d x‖₁ + ǫ))   [i.i.d. Laplacian as ǫ → 0]

SLIDE 15

Co-IRW-L1 and its Interpretations

Outline

1. Introduction and Motivation for Composite Penalties
2. Co-L1 and its Interpretations
3. Co-IRW-L1 and its Interpretations
4. Numerical Experiments

SLIDE 16

Co-IRW-L1 and its Interpretations

A Simplified Version of Co-IRW-L1

Consider the real-valued, fixed-ǫ_d variant of Co-IRW-L1.

1: input: {Ψ_d}_{d=1}^D, Φ, y, γ > 0, ǫ_d > 0 ∀d
2: initialization: λ_d^{(1)} = 1 ∀d, W_d^{(1)} = I ∀d
3: for t = 1, 2, 3, ...
4:     x^{(t)} ← arg min_x γ‖y − Φx‖₂² + Σ_{d=1}^D λ_d^{(t)} ‖W_d^{(t)} Ψ_d x‖₁
5:     λ_d^{(t+1)} ← [ (1/L_d) Σ_{l=1}^{L_d} log(1 + |ψᵀ_{d,l} x^{(t)}| / ǫ_d) ]⁻¹ + 1,  d = 1, ..., D
6:     W_d^{(t+1)} ← diag( 1/(ǫ_d + |ψᵀ_{d,1} x^{(t)}|), ..., 1/(ǫ_d + |ψᵀ_{d,L_d} x^{(t)}|) ),  d = 1, ..., D
7: end
8: output: x^{(t)}
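Steps 5 and 6 of this simplified variant are cheap elementwise updates; a NumPy sketch (the helper name is ours):

```python
import numpy as np

def coirw_updates(x, Psis, eps_d):
    """Steps 5-6 of the simplified (real-valued, fixed-eps) Co-IRW-L1:
    per-dictionary weight lam_d and diagonal reweighting matrix W_d."""
    lams, Ws = [], []
    for Psi, eps in zip(Psis, eps_d):
        v = np.abs(Psi @ x)                            # |psi_{d,l}^T x|, l = 1..L_d
        lam = 1.0 / np.mean(np.log1p(v / eps)) + 1.0   # step 5
        W = np.diag(1.0 / (eps + v))                   # step 6
        lams.append(lam)
        Ws.append(W)
    return lams, Ws
```

The resulting λ_d always exceeds 1, consistent with the real-valued domain Λ = (1, ∞) used by the full algorithm on slide 19.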

SLIDE 17

Co-IRW-L1 and its Interpretations

Optimization Interpretations of real-Co-IRW-L1-ǫ

Real-Co-IRW-L1-ǫ is an MM approach to the non-convex optimization problem

    arg min_x γ‖y − Φx‖₂² + Σ_{d=1}^D Σ_{l=1}^{L_d} log[ (ǫ_d + |ψᵀ_{d,l} x|) Σ_{i=1}^{L_d} log(1 + |ψᵀ_{d,i} x| / ǫ_d) ]

and, as ǫ_d → 0, Real-Co-IRW-L1-ǫ aims to solve the ℓ0 + weighted ℓ0,0 problem

    arg min_x γ‖y − Φx‖₂² + ‖Ψx‖₀ + Σ_{d=1}^D L_d 1_{‖Ψ_d x‖₀ > 0}

Note: L_d is the size of dictionary Ψ_d, and 1 is the indicator function.

SLIDE 18

Co-IRW-L1 and its Interpretations

Bayesian Interpretations of real-Co-IRW-L1-ǫ

Real-Co-IRW-L1 is an MM approach to Bayesian MAP estimation under an AWGN likelihood and the hierarchical prior

    p(x|λ) = Π_{d=1}^D Π_{l=1}^{L_d} (λ_d / (2ǫ_d)) (1 + |ψᵀ_{d,l} x| / ǫ_d)^{−(λ_d+1)}   [i.i.d. generalized Pareto]
    p(λ) = Π_{d=1}^D p(λ_d),  p(λ_d) ∝ 1/λ_d for λ_d > 0, else 0   [Jeffreys non-informative]

and Real-Co-IRW-L1 is a variational EM approach to estimating the (deterministic) λ under an AWGN likelihood and the prior

    p(x; λ) = Π_{d=1}^D Π_{l=1}^{L_d} ((λ_d − 1) / (2ǫ_d)) (1 + |ψᵀ_{d,l} x| / ǫ_d)^{−λ_d}   [i.i.d. generalized Pareto]

SLIDE 19

Co-IRW-L1 and its Interpretations

The Co-IRW-L1 Algorithm

Finally, we self-tune ǫd ∀d and allow for real or complex quantities:

1: input: {Ψ_d}_{d=1}^D, Φ, y, γ > 0
2: if Ψx ∈ R^L, use Λ = (1, ∞) and the real version of log p(x; λ, ǫ);
   elseif Ψx ∈ C^L, use Λ = (2, ∞) and the complex version of log p(x; λ, ǫ)
3: initialization: λ_d^{(1)} = 1 ∀d, W_d^{(1)} = I ∀d
4: for t = 1, 2, 3, ...
5:     x^{(t)} ← arg min_x γ‖y − Φx‖₂² + Σ_{d=1}^D λ_d^{(t)} ‖W_d^{(t)} Ψ_d x‖₁
6:     (λ_d^{(t+1)}, ǫ_d^{(t+1)}) ← arg max_{λ_d∈Λ, ǫ_d>0} log p(x^{(t)}; λ, ǫ),  d = 1, ..., D
7:     W_d^{(t+1)} ← diag( 1/(ǫ_d^{(t+1)} + |ψᵀ_{d,1} x^{(t)}|), ..., 1/(ǫ_d^{(t+1)} + |ψᵀ_{d,L_d} x^{(t)}|) ),  d = 1, ..., D
8: end
9: output: x^{(t)}

SLIDE 20

Numerical Experiments

Outline

1. Introduction and Motivation for Composite Penalties
2. Co-L1 and its Interpretations
3. Co-IRW-L1 and its Interpretations
4. Numerical Experiments

SLIDE 21

Numerical Experiments

Experiment: Synthetic finite difference image

[Figure: synthetic test images for α = 1 and α = 27.]

48×48 image with a total of 28 horizontal & vertical transitions; α = (# vertical transitions) / (# horizontal transitions).
Ψ₁ = vertical finite difference, Ψ₂ = horizontal finite difference.
"Spread-spectrum" Φ, sampling ratio M/N = 0.3.
AWGN @ 30 dB SNR.

[Figure: median recovery SNR [dB] vs. transition ratio α for L1, Co-L1, IRW-L1, and Co-IRW-L1.]

⇒ The composite algorithms significantly outperform the non-composite ones ⇒ Performance improves as sparsities become more disparate!
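For reference, vertical/horizontal finite-difference operators like Ψ₁ and Ψ₂ can be built with Kronecker products; a sketch under the assumption of simple forward differences without wraparound (the experiment's exact boundary handling may differ):

```python
import numpy as np

def finite_diff_operators(h, w):
    """Vertical and horizontal finite-difference analysis operators
    acting on a vectorized (column-major) h x w image.

    Rows are scaled by 1/sqrt(2) to satisfy the unit-norm-row
    requirement on the Psi_d (slide 7)."""
    I_h, I_w = np.eye(h), np.eye(w)
    Dh = (np.eye(h - 1, h, 1) - np.eye(h - 1, h)) / np.sqrt(2)  # (h-1) x h forward difference
    Dw = (np.eye(w - 1, w, 1) - np.eye(w - 1, w)) / np.sqrt(2)
    Psi_v = np.kron(I_w, Dh)    # vertical differences within each column
    Psi_h = np.kron(Dw, I_h)    # horizontal differences across columns
    return Psi_v, Psi_h
```

An image that varies only vertically then has ‖Ψ₂x‖₁ = 0 while ‖Ψ₁x‖₁ > 0, which is exactly the sparsity disparity (large α) that the composite algorithms exploit.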

SLIDE 22

Numerical Experiments

Experiment: Shepp-Logan Phantom

96 × 96 image; Ψ ∈ R^{7N×N} = 2D UWT-db1 with Ψ_d ∈ R^{N×N} ∀d; "spread-spectrum" Φ; AWGN @ 30 dB SNR.

[Figure: median recovery SNR [dB] vs. sampling ratio M/N for L1, Co-L1, IRW-L1, and Co-IRW-L1.]

⇒ The composite algorithms significantly outperform the non-composite ones ⇒ Performance gap is larger for small M/N

SLIDE 23

Numerical Experiments

Experiment: Cameraman

96 × 104 image; Ψ ∈ R^{7N×N} = 2D UWT-db1 with Ψ_d ∈ R^{N×N} ∀d; "spread-spectrum" Φ; AWGN @ 40 dB SNR.

[Figure: median recovery SNR [dB] vs. sampling ratio M/N for L1, Co-L1, IRW-L1, and Co-IRW-L1.]

⇒ The composite algorithms significantly outperform the non-composite ones ⇒ Performance gap is larger for small M/N

SLIDE 24

Numerical Experiments

Experiment: 1D Dynamic MRI

[Figure: x-y profile, x-t profile, and k-t sampling pattern.]

144 × 48 spatiotemporal profile extracted from an MRI cine; Ψ ∈ R^{3N×N}: [db1; db2; db3] 2D DWT; Φ: variable-density random Fourier; AWGN @ 30 dB SNR.

SLIDE 25

Numerical Experiments

Experiment: 1D Dynamic MRI (cont.)

[Figure: recoveries at sampling ratio M/N = 0.3 for L1, Co-L1, IRW-L1, and Co-IRW-L1.]

[Figure: median recovery SNR [dB] vs. sampling ratio M/N for L1, Co-L1, IRW-L1, and Co-IRW-L1.]

The composite algorithms significantly outperform the non-composite ones at small measurement ratios M/N.
Little advantage to Co-IRW-L1 over Co-L1 in this experiment.

SLIDE 26

Numerical Experiments

Average Runtimes for Previous Experiments

            Shepp-Logan   Cameraman   dMRI
L1             8.12 s       9.88 s    22.0 s
Co-L1          8.83 s      12.8 s     21.7 s
IRW-L1         7.95 s      12.7 s     24.1 s
Co-IRW-L1      9.29 s      16.9 s     29.6 s

The composite algorithms run only about 1.3× longer than the non-composite ones.

SLIDE 27

Numerical Experiments

Open Questions

Performance guarantees?
Convergence guarantees? (So far we have only established an asymptotic stationary-point condition using an MM analysis of Julien Mairal.²)
Design of the dictionaries {Ψ_d}?
Extension to matrix compressive sensing (e.g., low-rank, row-sparse, column-sparse, etc.)?

² J. Mairal, "Optimization with first-order surrogate functions," ICML, 2013.

SLIDE 28

Numerical Experiments

Conclusions

We proposed a new "composite-L1" approach to ℓ2-penalized signal reconstruction that learns and exploits differences in sparsity across sub-dictionaries.

Relative to standard ℓ1 methods, our composite ℓ1 methods give significant improvements in reconstruction SNR at low sampling rates, at the cost of a very mild complexity increase.

Our algorithms can be interpreted as MM approaches to non-convex optimization, approximate ℓ0 methods, Bayesian methods, and variational Bayesian methods.

SLIDE 29

Numerical Experiments

References

Thanks!

1. R. Ahmad and P. Schniter, "Iteratively Reweighted L1 Approaches to Sparse Composite Regularization," IEEE Transactions on Computational Imaging, to appear. (See also http://arxiv.org/abs/1504.05110v4)
2. R. Ahmad and P. Schniter, "Iteratively Reweighted L1 Approaches to L2-Constrained Sparse Composite Regularization." (See http://arxiv.org/abs/1504.05110v2)
