SLIDE 1

On the convergence rate of iterative ℓ1 minimization algorithms

Ignace Loris

Applied Inverse Problems

Vienna 21/7/2009

SLIDE 2

Theme

comparison of iterative algorithms for sparse recovery:

sparse ↔ “ℓ1” ← algorithms

  • influence of noise and ill-conditioning on effectiveness
  • influence of noise and ill-conditioning on speed

SLIDE 3

Overview

  • sparsity through ℓ1 penalty: illustration on a toy example
  • When does it work? ↔ How to make it work?
  • Speed comparison of iterative algorithms
  • Speed comparison of ‘continuation’ algorithms

SLIDE 4

Setting

Data vector: y (noisy)
Model vector: x (coefficients in some basis)
Linear relationship: Kx = y
Problems: insufficient data, inconsistent data, ill-conditioning of K
Solution: minimize a penalized least-squares functional:

x_rec = arg min_x ‖Kx − y‖² + penalty

SLIDE 5

Old and new penalties

Traditional: ℓ2-norm penalties

x_rec = arg min_x ‖Kx − y‖² + λ‖x‖²   (advantage: linear equations)

Sometimes better reconstructions are obtained by using other information. Suppose the desired model x is sparse (few nonzero components). In that case an ℓ1 penalty is suitable (Donoho et al. 2006, etc.):

x_rec = arg min_x ‖Kx − y‖² + 2λ‖x‖₁
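To make the “linear equations” advantage concrete: with the ℓ2 penalty the minimizer has a closed form, which is not the case for ℓ1. A minimal NumPy sketch (my illustration, not from the slides; Python is used for all sketches below):

```python
import numpy as np

def ridge(K, y, lam):
    """l2 penalty: argmin_x ||Kx - y||^2 + lam * ||x||^2.
    Setting the gradient to zero gives the linear system
    (K^T K + lam * I) x = K^T y."""
    N = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(N), K.T @ y)
```

No analogous closed form exists for the ℓ1 penalty; hence the iterative algorithms discussed below.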

SLIDE 6

Question 1: When does it work?

Experiment:
  • Input: x_input = sparse
  • Data: y = Kx_input + n
  • ℓ1 reconstruction: x_rec
  • Compare x_rec to x_input

When does this work?

SLIDE 7

ℓ1 penalty for sparse recovery: toy example

[Figure: the sparse model x_input and the matrix K]

Synthetic noisy data: y = Kx_input + n

SLIDE 8

ℓ1 penalty for sparse recovery: toy example

[Figure: the matrix K and the synthetic noisy data y = Kx_input + n]

min_x ‖Kx − y‖² + λ‖x‖²   vs.   min_x ‖Kx − y‖² + 2λ‖x‖₁

SLIDE 9

ℓ1 penalty for sparse recovery: toy example

[Figure: the synthetic noisy data y = Kx_input + n and the two reconstructions]

min_x ‖Kx − y‖² + λ‖x‖²   vs.   min_x ‖Kx − y‖² + 2λ‖x‖₁

SLIDE 10

ℓ1 penalty for sparse recovery: toy example

[Figure: x_input, the synthetic noisy data y = Kx_input + n, and the reconstructions from]

min_x ‖Kx − y‖² + λ‖x‖²   and   min_x ‖Kx − y‖² + 2λ‖x‖₁

SLIDE 11

Success rate of noiseless sparse recovery with ℓ1

No noise, K = random matrix (elements from a Gaussian distribution).
Prepare data: y = Kx_input
Reconstruct: x_rec = arg min_{Kx = y} ‖x‖₁
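The noiseless problem min ‖x‖₁ subject to Kx = y can be posed as a linear program via the split x = u − v with u, v ≥ 0. A sketch of one trial of the experiment using scipy.optimize.linprog (my own framing; the slides do not specify a solver):

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_eq(K, y):
    """min ||x||_1 s.t. Kx = y, as the LP:
    min 1^T (u + v) s.t. Ku - Kv = y, u, v >= 0, with x = u - v."""
    N = K.shape[1]
    res = linprog(np.ones(2 * N), A_eq=np.hstack([K, -K]), b_eq=y,
                  bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]

# one trial: draw a Gaussian K and a k-sparse x_input, test for perfect recovery
rng = np.random.default_rng(0)
N, n, k = 100, 50, 10
K = rng.standard_normal((n, N))
x_input = np.zeros(N)
x_input[rng.choice(N, size=k, replace=False)] = 1.0
recovered = np.allclose(l1_min_eq(K, K @ x_input), x_input, atol=1e-6)
```

Averaging `recovered` over many trials for each (n/N, k/n) pair fills in one pixel of the phase diagram on the next slide.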

SLIDE 12

Success rate of noiseless sparse recovery with ℓ1

No noise, K = random matrix (elements from a Gaussian distribution).
Prepare data: y = Kx_input
Reconstruct: x_rec = arg min_{Kx = y} ‖x‖₁

[Figure: phase diagram in the (n/N, k/n) plane; gray level ∼ probability of x_rec = x_input; white: success rate of perfect recovery = 100%]

N = number of variables
n = number of data
k = number of nonzero components in the model

“Compressed Sensing”; Bruckstein et al. (2009).

SLIDE 13

Reconstruction error for noisy sparse recovery with ℓ1

2% noise, K = random matrix (elements from a Gaussian distribution).
Prepare data: y = Kx_input + n
Reconstruct: x_rec = arg min_x ‖Kx − y‖² + 2λ‖x‖₁, with λ such that ‖Kx_rec − y‖ = ‖n‖

SLIDE 14

Reconstruction error for noisy sparse recovery with ℓ1

2% noise, K = random matrix (elements from a Gaussian distribution).
Prepare data: y = Kx_input + n
Reconstruct: x_rec = arg min_x ‖Kx − y‖² + 2λ‖x‖₁, with ‖Kx_rec − y‖ = ‖n‖

[Figure: (n/N, k/n) plane; gray level ∼ average relative error ‖x_rec − x_input‖/‖x_input‖; the marked level corresponds to reconstruction error = 2%]

N = number of variables
n = number of data
k = number of nonzero components in x_input

Loris and Verhoeven (2009).

SLIDE 15

Reconstruction error for noisy sparse recovery with ℓ1

2% noise, K = submatrix of a square matrix with condition number 10⁴.
Prepare data: y = Kx_input + n
Reconstruct: x_rec = arg min_x ‖Kx − y‖² + 2λ‖x‖₁, with λ such that ‖Kx_rec − y‖ = ‖n‖

SLIDE 16

Reconstruction error for noisy sparse recovery with ℓ1

2% noise, K = submatrix of a square matrix with condition number 10⁴.
Prepare data: y = Kx_input + n
Reconstruct: x_rec = arg min_x ‖Kx − y‖² + 2λ‖x‖₁, with ‖Kx_rec − y‖ = ‖n‖

[Figure: (n/N, k/n) plane; gray level ∼ average relative error ‖x_rec − x_input‖/‖x_input‖; the marked level corresponds to reconstruction error = 2%]

N = number of variables
n = number of data
k = number of nonzero components in x_input

Loris and Verhoeven (2009).

SLIDE 17

Important consequence for evaluation of speed-up:

Success of sparse recovery by ℓ1 minimization depends on noise and on the condition number.

OUR goal here: evaluate the convergence rate of iterative ℓ1 minimization algorithms:

x(n) →? x̄ ≡ arg min_x ‖Kx − y‖² + 2λ‖x‖₁

Hence: compare x(n) with x̄, not with x_input! Catch-22? No: x̄ can be computed independently by a direct method (next slide).

SLIDE 18

Direct method for finding ¯ x(λ)

Minimization of: F(x) = ‖Kx − y‖² + 2λ‖x‖₁

Nonlinear variational equations:

  (Kᵀ(y − Kx̄))ᵢ = λ sgn(x̄ᵢ)   if x̄ᵢ ≠ 0
  |(Kᵀ(y − Kx̄))ᵢ| ≤ λ          if x̄ᵢ = 0

The variational equations are piecewise linear ⇓

1. x̄(λ) is piecewise linear in λ
2. a direct method exists (Efron et al., 2004; Osborne et al., 2000)
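The variational equations double as a numerical test of optimality for any candidate minimizer. A minimal sketch (function name and tolerance are mine):

```python
import numpy as np

def satisfies_optimality(K, y, x_bar, lam, tol=1e-8):
    """Check the variational equations of F(x) = ||Kx - y||^2 + 2*lam*||x||_1:
    on the support of x_bar:  (K^T(y - K x_bar))_i = lam * sgn(x_bar_i),
    off the support:          |(K^T(y - K x_bar))_i| <= lam."""
    g = K.T @ (y - K @ x_bar)
    on = np.abs(x_bar) > tol
    return (np.all(np.abs(g[on] - lam * np.sign(x_bar[on])) <= tol)
            and np.all(np.abs(g[~on]) <= lam + tol))
```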

SLIDE 19

Direct method for finding ¯ x(λ)

If λ ≥ λ_max ≡ max_i |(Kᵀy)ᵢ|, then x̄(λ) = 0.
x̄(λ > 0) is found by letting λ decrease step by step.
Need to solve a ‘small’ linear system at every step.
Continue until the desired solution x̄(λ_n) is reached (e.g. when ‖Kx̄ − y‖² = ‖n‖²).

Result: x̄(λ) for λ_max ≥ λ ≥ λ_min.
Time complexity: at first linear, later cubic.

[Figure: computation time (s) vs. number of nonzero components of x̄]

Alternative: iterative methods.

SLIDE 20

Iterative soft-thresholding

Iterative Soft-Thresholding: x(n+1) = T(x(n)), x(0) = 0, with

  T(x) ≡ S_λ(x + Kᵀ(y − Kx))

and S_λ = component-wise soft-thresholding.

Advantages:

1. very simple
2. F(x(n)) − F(x̄) ≤ ‖x(0) − x̄‖² / 2n   ∀n > 0
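A minimal sketch of IST; the unit step size implicitly assumes ‖K‖ ≤ 1 (an assumption of this sketch — rescale K and y otherwise):

```python
import numpy as np

def soft_threshold(z, lam):
    """Component-wise soft-thresholding S_lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ist(K, y, lam, n_iter=1000):
    """x^(n+1) = S_lam(x^(n) + K^T(y - K x^(n))), x^(0) = 0,
    for min_x ||Kx - y||^2 + 2*lam*||x||_1 (assumes ||K|| <= 1)."""
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + K.T @ (y - K @ x), lam)
    return x
```

Note that every iterate is sparse, a point the conclusions return to.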

SLIDE 21

Iterative soft-thresholding

IST is not very fast:

[Figure: relative error ‖x(n) − x̄‖/‖x̄‖ vs. computation time (5h to 20h)]

Real inversions: millions of variables.
Speed-up needed.

SLIDE 22

Fast Iterative Soft-Thresholding Algorithm

FISTA (Beck and Teboulle, 2008): x(n+1) = T(x(n) + β(n)(x(n) − x(n−1))) with the same

  T(x) ≡ S_λ(x + Kᵀ(y − Kx))

and where (β(n))ₙ is a fixed sequence of numbers:

[Figure: the sequence β(n), increasing towards 1]

Advantages:

1. very simple
2. F(x(n)) − F(x̄) ≤ 2‖x(0) − x̄‖² / n²   ∀n > 0

Goes back to (Nesterov, 1983)
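A sketch using the standard Nesterov/Beck–Teboulle choice of the β(n) sequence, which increases towards 1 like the plotted one (the slides do not spell out the formula, so this exact sequence is an assumption):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fista(K, y, lam, n_iter=1000):
    """x^(n+1) = T(x^(n) + beta_n (x^(n) - x^(n-1))) with the IST map T;
    beta_n = (t_n - 1)/t_{n+1}, t_{n+1} = (1 + sqrt(1 + 4 t_n^2))/2.
    Assumes ||K|| <= 1, as in the IST sketch."""
    x_prev = np.zeros(K.shape[1])
    x, t = x_prev.copy(), 1.0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)   # extrapolation step
        x_prev = x
        x = soft_threshold(z + K.T @ (y - K @ z), lam)
        t = t_next
    return x
```

The per-iteration cost is essentially that of IST; only the extrapolation step is added.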

SLIDE 23

Other iterative methods

Projected Steepest Descent (Daubechies et al. 2008):

  x(n+1) = P_R(x(n) + β(n) r(n))

with r(n) = Kᵀ(y − Kx(n)) and β(n) = ‖r(n)‖²/‖Kr(n)‖², where P_R(·) is the orthogonal projection onto an ℓ1 ball of radius R.
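A sketch of this iteration; the projection P_R uses the standard sort-based algorithm (not necessarily the implementation of the cited authors):

```python
import numpy as np

def project_l1(x, R):
    """Euclidean projection onto {z : ||z||_1 <= R}: soft-thresholding
    with a data-dependent level theta found from the sorted magnitudes."""
    if np.abs(x).sum() <= R:
        return x.copy()
    u = np.sort(np.abs(x))[::-1]
    cs = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, x.size + 1) > cs - R)[0][-1]
    theta = (cs[rho] - R) / (rho + 1)
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def psd(K, y, R, n_iter=500):
    """Projected steepest descent: x <- P_R(x + beta * r),
    with r = K^T(y - Kx) and beta = ||r||^2 / ||Kr||^2."""
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        r = K.T @ (y - K @ x)
        if not np.any(r):          # stationary point reached
            break
        beta = (r @ r) / np.sum((K @ r) ** 2)
        x = project_l1(x + beta * r, R)
    return x
```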

Other methods:

  • the ‘GPSR’ algorithm (Figueiredo et al., 2007) uses auxiliary variables u, v ≥ 0 with x = u − v
  • the ‘ℓ1-ls’ algorithm (Kim et al., 2007), an interior-point method using preconditioned conjugate-gradient substeps
  • SpaRSA (Wright et al., 2009), . . .

How to choose? → How to compare?

SLIDE 24

How to compare? Evaluation of speed-up

Convergence: ‖x(n) − x̄‖ → 0, but how fast?

[Figure: relative error vs. computation time for algorithms (a)–(e), on time scales from minutes to hours]

This only gives an idea for one value of the penalization parameter λ. → Prefer a range of λ: λ_max ≥ λ ≥ λ_min. In one picture:

1. error: ‖x(n) − x̄‖/‖x̄‖
2. penalty: λ
3. computer time needed: t

How? (One measurement harness is sketched below.)
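One way to collect the data behind such a picture (my own harness; the slides only show the resulting plots): fix wall-clock budgets, run the iteration, and record the relative error each time a budget is crossed; repeat over a grid of λ.

```python
import time
import numpy as np

def isochrone_errors(step, x0, x_bar, budgets):
    """Iterate x <- step(x); whenever the elapsed wall-clock time passes
    the next budget (in seconds), record ||x - x_bar|| / ||x_bar||."""
    x, errs, b = x0.copy(), [], 0
    t0 = time.perf_counter()
    while b < len(budgets):
        x = step(x)
        if time.perf_counter() - t0 >= budgets[b]:
            errs.append(np.linalg.norm(x - x_bar) / np.linalg.norm(x_bar))
            b += 1
    return errs
```

One isochrone is then errs[i] as a function of λ at the fixed budget budgets[i].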

SLIDE 25

Approximation isochrones: IST

[Figure: approximation isochrones for IST — ‖x(n) − x̄‖/‖x̄‖ vs. log₂(λ_max/λ), top axis |supp x̄(λ)|; ← large λ (more sparse) · small λ (less sparse) →; curves at t = 1m, ..., 10m]

SLIDE 26

Approximation isochrones: FISTA

[Figure: approximation isochrones for FISTA — same axes: ‖x(n) − x̄‖/‖x̄‖ vs. log₂(λ_max/λ), top axis |supp x̄(λ)|; ← more sparse · less sparse →]

SLIDE 27

Approximation isochrones: Proj. Steepest Descent

[Figure: approximation isochrones for projected steepest descent — same axes]

SLIDE 28

Approximation isochrones: GPSR

[Figure: approximation isochrones for GPSR — same axes]

SLIDE 29

Approximation isochrones: L1LS

[Figure: approximation isochrones for ℓ1-ls — same axes]

SLIDE 30

Approximation isochrones: ‘Seismology’

[Figure: six isochrone panels, each plotting ‖x(n) − x̄‖/‖x̄‖ vs. log₂(λ_max/λ) (top axis |supp x̄(λ)|): (a), (a’) iterative thresholding, (b) projected steepest descent, (c) GPSR, (d) ℓ1-ls, (e) FISTA]

K = from seismic tomography; t = 1, 2, . . . , 10m

SLIDE 31

Approximation isochrones: ‘Seismology’

[Figure: the same six isochrone panels]

K = from seismic tomography. Severely ill-conditioned:

[Figure: log₁₀ λ_n vs. n]

t = 1, 2, . . . , 10m

SLIDE 32

Approximation isochrones: ‘Gaussian’

[Figure: the same six isochrone panels]

K = random matrix. Well-conditioned:

[Figure: log₁₀ λ_n vs. n]

t = 6, 12, . . . , 60s

SLIDE 33

Approximation isochrones: Mix

[Figure: the same six isochrone panels]

K = random matrix with its spectrum replaced; ill-conditioned:

[Figure: log₁₀ λ_n vs. n]

t = 1, 2, . . . , 10m

SLIDE 34

Continuation algorithms

Idea: start from a large λ and let it decrease slowly during the iteration (or, equivalently, let ‖x‖₁ increase slowly).
Goal: increase the size of the active set gradually.
Problem: what is a good way of decreasing λ or increasing ‖x‖₁?

SLIDE 35

Continuation algorithms

Two options used:

1. geometric decrease of λ: λ_n = λ_max (λ_min/λ_max)^(n/N)   (Hale et al., 2008)
2. linear increase of ‖x(n)‖₁: R_n = (n/N) R_max   (Daubechies et al. 2008)

Both depend on the total number of iterations N, and on λ_min or R_max. (A sketch of option 1 follows below.)

[Figure: ‖x(n) − x̄‖/‖x̄‖ vs. log₂(λ_max/λ), with ‖x̄(λ)‖₁ overlaid: (A) fixed-point continuation, (B) adaptive steepest descent]

Good for tracing out the trade-off curve in the (‖Kx̄(λ) − y‖², ‖x̄(λ)‖₁) plane for various values of λ.
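A sketch of option 1, the geometric decrease of λ inside IST, in the spirit of Hale et al. (one thresholded step per λ level is my simplification):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ist_continuation(K, y, lam_min, N=500):
    """IST with lam decreased geometrically from lam_max to lam_min:
    lam_n = lam_max * (lam_min / lam_max)**(n / N).  Assumes ||K|| <= 1."""
    lam_max = np.max(np.abs(K.T @ y))   # for lam >= lam_max the minimizer is 0
    x = np.zeros(K.shape[1])
    for n in range(1, N + 1):
        lam = lam_max * (lam_min / lam_max) ** (n / N)
        x = soft_threshold(x + K.T @ (y - K @ x), lam)
    return x
```

Because λ starts at λ_max (where the minimizer is exactly 0), the active set grows gradually as λ decreases.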

SLIDE 36

Conclusions

The “proof of speed-up” lies in an objective evaluation of algorithms:

1. compare x(n) with x̄, not with x_input (x̄ can be found)
2. use multiple values of λ
3. use different ‘types’ of operators

All algorithms do well for large penalty parameters λ:

1. for large values of λ (very noisy data): SpaRSA?
2. FISTA can also do well for small values of λ
3. it depends on the condition number of K

Continuation algorithms: the P_R-based algorithm does somewhat better than the S_λ-based one.
Iterative algorithms yield sparse vectors at every step ⇒ (in practice) it is not always necessary to run them to the end.
More: http://homepages.vub.ac.be/~igloris
