

  1. On the convergence rate of iterative ℓ1 minimization algorithms. Ignace Loris. Applied Inverse Problems, Vienna, 21/7/2009.

  2. Theme: comparison of iterative algorithms for sparse recovery (sparse ↔ “ℓ1” ← algorithms); the influence of noise and ill-conditioning on effectiveness; the influence of noise and ill-conditioning on speed.

  3. Overview: sparsity through the ℓ1 penalty, illustrated on a toy example; when does it work? ↔ how to make it work?; speed comparison of iterative algorithms; speed comparison of ‘continuation’ algorithms.

  4. Setting. Data vector: y (noisy). Model vector: x (coefficients in some basis). Linear relationship: Kx = y. Problems: insufficient data, inconsistent data, ill-conditioning of K. Solution: minimize a penalized least-squares functional: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + \text{penalty}$.

  5. Old and new penalties. Traditional: ℓ2-norm penalties, $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + \lambda \|x\|^2$ (advantage: linear equations). Sometimes better reconstructions are obtained by using other information. Suppose the desired model x is sparse (few nonzero components); in that case an ℓ1 penalty is suitable (Donoho et al. 2006, etc.): $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$.
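To make the ‘linear equations’ advantage concrete, here is a minimal Python sketch (the problem sizes, noise level, random seed and regularization value are illustrative assumptions, not the setup used in the talk): the ℓ2-penalized minimizer is obtained from a single linear solve, while the ℓ1-penalized one has no closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 50, 100                                # fewer data than unknowns (illustrative)
K = rng.standard_normal((n, N)) / np.sqrt(n)
x_true = np.zeros(N)
x_true[rng.choice(N, 5, replace=False)] = 0.5  # sparse model
y = K @ x_true + 0.01 * rng.standard_normal(n)

# l2 penalty: the minimizer of ||Kx - y||^2 + lam*||x||^2 solves a linear system
lam = 0.1
x_l2 = np.linalg.solve(K.T @ K + lam * np.eye(N), K.T @ y)

# l1 penalty: no closed form; an iterative or direct (homotopy) method is needed
```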

  6. Question 1: When does it work? Experiment: input: $x_{\mathrm{input}}$ = sparse; data: $y = Kx_{\mathrm{input}} + n$; ℓ1 reconstruction: $x_{\mathrm{rec}}$. Compare $x_{\mathrm{rec}}$ to $x_{\mathrm{input}}$. When does this work?

  7. ℓ1 penalty for sparse recovery: toy example. [Figure: the matrix K and the sparse input model $x_{\mathrm{input}}$.] Synthetic noisy data: $y = Kx_{\mathrm{input}} + n$.

  8. ℓ1 penalty for sparse recovery: toy example. [Figure: the matrix K and the synthetic noisy data $y = Kx_{\mathrm{input}} + n$.] Two reconstructions: $\min_x \|Kx - y\|^2 + \lambda \|x\|^2$ versus $\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$.

  9. ℓ1 penalty for sparse recovery: toy example (continued). [Figure: the ℓ2-penalized and ℓ1-penalized reconstructions computed from the synthetic noisy data.]

  10. ℓ1 penalty for sparse recovery: toy example (continued). [Figure: the ℓ2-penalized and ℓ1-penalized reconstructions compared with the sparse input model $x_{\mathrm{input}}$.]

  11. Success rate of noiseless sparse recovery with ℓ1. No noise; K = random matrix (elements drawn from a Gaussian distribution). Prepare data: $y = Kx_{\mathrm{input}}$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_{Kx = y} \|x\|_1$.
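One standard way to carry out this noiseless experiment is to cast $\min_{Kx = y} \|x\|_1$ (basis pursuit) as a linear program. The sketch below uses SciPy's linprog with an illustrative problem size; this is an assumption about the implementation, not the solver used for the figures in the talk.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(K, y):
    """min ||x||_1 subject to Kx = y, as a linear program: split x = u - v, u, v >= 0."""
    n, N = K.shape
    c = np.ones(2 * N)                        # objective sum(u) + sum(v) = ||x||_1
    res = linprog(c, A_eq=np.hstack([K, -K]), b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

# noiseless experiment (sizes are illustrative): exact recovery is expected when x_input is sparse enough
rng = np.random.default_rng(1)
n, N, k = 40, 100, 5                          # data, variables, nonzero components
K = rng.standard_normal((n, N))
x_input = np.zeros(N)
x_input[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_rec = basis_pursuit(K, K @ x_input)         # noiseless data y = K x_input
print(np.linalg.norm(x_rec - x_input) / np.linalg.norm(x_input))   # ~0 on success
```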

  12. Success rate of noiseless sparse recovery with ℓ1. No noise; K = random matrix (Gaussian entries). Prepare data: $y = Kx_{\mathrm{input}}$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_{Kx = y} \|x\|_1$. [Figure: phase diagram with $n/N$ on the horizontal axis (0 to 1) and $k/n$ on the vertical axis (0 to 1), where N = number of variables, n = number of data, k = number of nonzero components in the model; the gray level shows the probability that $x_{\mathrm{rec}} = x_{\mathrm{input}}$; white: success rate of perfect recovery = 100% (“Compressed Sensing”).] Bruckstein et al. (2009).

  13. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise; K = random matrix (elements drawn from a Gaussian distribution). Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with $\|Kx_{\mathrm{rec}} - y\| = \|n\|$.

  14. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise; K = random matrix (Gaussian entries). Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with $\|Kx_{\mathrm{rec}} - y\| = \|n\|$. [Figure: same axes as the previous diagram ($n/N$ horizontal, $k/n$ vertical, with N = number of variables, n = number of data, k = number of nonzero components in $x_{\mathrm{input}}$); the gray level shows the average error $\|x_{\mathrm{rec}} - x_{\mathrm{input}}\| / \|x_{\mathrm{input}}\|$; a contour marks reconstruction error = 2%.] Loris and Verhoeven (2009).

  15. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise; K = submatrix of a square matrix with condition number $10^4$. Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with $\|Kx_{\mathrm{rec}} - y\| = \|n\|$.

  16. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise; K = submatrix of a square matrix with condition number $10^4$. Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with $\|Kx_{\mathrm{rec}} - y\| = \|n\|$. [Figure: same axes as before; the gray level shows the average error $\|x_{\mathrm{rec}} - x_{\mathrm{input}}\| / \|x_{\mathrm{input}}\|$; a contour marks reconstruction error = 2%.] Loris and Verhoeven (2009).

  17. Important consequence for the evaluation of speed-up: the success of sparse recovery by ℓ1 minimization depends on noise and condition number. Our goal here: evaluate the convergence rate of iterative ℓ1 minimization algorithms: does $x^{(n)} \to \bar{x} \equiv \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$, and how fast? Hence: compare $x^{(n)}$ with $\bar{x}$, not with $x_{\mathrm{input}}$! Catch-22?

  18. Direct method for finding $\bar{x}(\lambda)$. Minimization of $F(x) = \|Kx - y\|^2 + 2\lambda \|x\|_1$. Nonlinear variational equations: $(K^T(y - K\bar{x}))_i = \lambda\,\mathrm{sgn}(\bar{x}_i)$ if $\bar{x}_i \neq 0$, and $|(K^T(y - K\bar{x}))_i| \leq \lambda$ if $\bar{x}_i = 0$. The variational equations are piecewise linear ⇒ (1) $\bar{x}(\lambda)$ is piecewise linear in λ; (2) a direct method exists (Efron et al., 2004; Osborne et al., 2000).
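These optimality conditions can be checked numerically for any candidate minimizer. A minimal sketch (the function name and tolerances are mine, not from the talk) that tests the two conditions stated above:

```python
import numpy as np

def check_l1_optimality(K, y, x, lam, tol=1e-6):
    """Test the variational equations for F(x) = ||Kx - y||^2 + 2*lam*||x||_1:
    on the support (K^T(y - Kx))_i = lam * sgn(x_i); elsewhere |(K^T(y - Kx))_i| <= lam."""
    g = K.T @ (y - K @ x)                     # correlation of the residual with the columns of K
    on = np.abs(x) > tol                      # (approximate) support of x
    return (np.allclose(g[on], lam * np.sign(x[on]), atol=tol)
            and np.all(np.abs(g[~on]) <= lam + tol))
```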

  19. Direct method for finding $\bar{x}(\lambda)$. If $\lambda \geq \lambda_{\max} \equiv \max_i |(K^T y)_i|$, then $\bar{x}(\lambda) = 0$. $\bar{x}(\lambda > 0)$ is found by letting λ decrease from $\lambda_{\max}$; a ‘small’ linear system must be solved at every step. Continue until the desired solution $\bar{x}(\lambda_n)$ is reached (e.g. when $\|K\bar{x} - y\|^2 = \|n\|^2$). Result: $\bar{x}(\lambda)$ for $\lambda_{\max} \geq \lambda \geq \lambda_{\min}$. Time complexity: at first linear, later cubic. [Figure: computation time (s) versus the number of nonzero components of $\bar{x}$.] Alternative: iterative methods.

  20. Iterative soft-thresholding. Iterative soft-thresholding (IST): $x^{(0)} = 0$, $x^{(n+1)} = T(x^{(n)})$ with $T(x) \equiv S_\lambda\!\big(x + K^T(y - Kx)\big)$ and $S_\lambda$ = component-wise soft-thresholding. Advantages: (1) very simple; (2) $F(x^{(n)}) - F(\bar{x}) \leq \|x^{(0)} - \bar{x}\|^2 / (2n)$ for all $n > 0$.
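A direct transcription of the IST iteration in Python, as a sketch; the rescaling condition ‖K‖ ≤ 1 (under which this simple step converges) and the fixed iteration count are assumptions made explicit in the comments.

```python
import numpy as np

def soft_threshold(z, lam):
    """Component-wise soft-thresholding S_lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ist(K, y, lam, n_iter=500):
    """Iterative soft-thresholding for min ||Kx - y||^2 + 2*lam*||x||_1.

    The simple step x <- S_lam(x + K^T (y - K x)) assumes ||K|| <= 1
    (rescale K and y otherwise); n_iter is an arbitrary budget."""
    x = np.zeros(K.shape[1])                  # x^(0) = 0
    for _ in range(n_iter):
        x = soft_threshold(x + K.T @ (y - K @ x), lam)
    return x
```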

  21. Iterative soft-thresholding. IST is not very fast. [Figure: relative error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$ versus computation time, over roughly 20 hours.] Real inversions involve millions of variables; a speed-up is needed.

  22. Fast Iterative Soft-Thresholding Algorithm. FISTA (Beck and Teboulle, 2008): $x^{(n+1)} = T\!\big(x^{(n)} + \beta_n (x^{(n)} - x^{(n-1)})\big)$ with the same $T(x) \equiv S_\lambda\!\big(x + K^T(y - Kx)\big)$, and where $(\beta_n)_n$ is a fixed sequence of numbers. [Figure: the sequence $\beta_n$ plotted against n.] Advantages: (1) very simple; (2) $F(x^{(n)}) - F(\bar{x}) \leq 2\|x^{(0)} - \bar{x}\|^2 / n^2$ for all $n > 0$. Goes back to (Nesterov, 1983).
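FISTA differs from IST only by the extrapolation step. In the sketch below the weights are the usual Beck–Teboulle/Nesterov choice $\beta_n = (t_n - 1)/t_{n+1}$ with $t_{n+1} = (1 + \sqrt{1 + 4t_n^2})/2$; I assume this is the ‘fixed sequence’ meant on the slide, and again assume ‖K‖ ≤ 1.

```python
import numpy as np

def fista(K, y, lam, n_iter=500):
    """FISTA for min ||Kx - y||^2 + 2*lam*||x||_1 (assumes ||K|| <= 1)."""
    S = lambda z: np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # soft-thresholding S_lam
    x = x_prev = np.zeros(K.shape[1])
    t = 1.0
    for _ in range(n_iter):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next              # extrapolation weight beta_n
        z = x + beta * (x - x_prev)            # x^(n) + beta_n (x^(n) - x^(n-1))
        x_prev, x = x, S(z + K.T @ (y - K @ z))  # same thresholded map T as in IST
        t = t_next
    return x
```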

  23. Other iterative methods. Projected steepest descent (Daubechies et al. 2008): $x^{(n+1)} = P_R\!\big(x^{(n)} + \beta^{(n)} r^{(n)}\big)$ with $r^{(n)} = K^T(y - Kx^{(n)})$ and $\beta^{(n)} = \|r^{(n)}\|^2 / \|Kr^{(n)}\|^2$; $P_R(\cdot)$ = orthogonal projection onto an ℓ1 ball of radius R. The ‘GPSR’ algorithm (Figueiredo et al., 2007) uses auxiliary variables $u, v \geq 0$ with $x = u - v$. The ‘ℓ1-ls’ algorithm (Kim et al., 2007) is an interior-point method using preconditioned conjugate-gradient substeps. SpaRSA (Wright et al., 2009). . . . How to choose? → How to compare?
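The projection $P_R$ onto the ℓ1 ball has a well-known sort-based implementation (in the spirit of Duchi et al., 2008); the sketch below (helper names are mine, not from the talk) shows it together with one projected-steepest-descent step as defined above.

```python
import numpy as np

def project_l1_ball(w, R):
    """Euclidean projection of w onto the l1 ball {x : ||x||_1 <= R} (sort-based)."""
    if np.abs(w).sum() <= R:
        return w.copy()
    u = np.sort(np.abs(w))[::-1]              # magnitudes, sorted in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - R) / j > 0)[0][-1]
    theta = (css[rho] - R) / (rho + 1)        # threshold that puts the result on the ball
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def psd_step(K, y, x, R):
    """One projected steepest-descent step x -> P_R(x + beta r)."""
    r = K.T @ (y - K @ x)                     # steepest-descent direction
    beta = (r @ r) / np.linalg.norm(K @ r) ** 2
    return project_l1_ball(x + beta * r, R)
```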

  24. How to compare? Evaluation of speed-up. Convergence: $\|x^{(n)} - \bar{x}\| \to 0$, but how fast? [Figure: relative error versus computation time for several algorithms (a)–(e), shown on a 10-minute scale and on an 8-hour scale.] This only gives an idea for one value of the penalization parameter λ. Prefer a range of λ: $\lambda_{\max} \geq \lambda \geq \lambda_{\min}$. In one picture: (1) error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$; (2) penalty λ; (3) computer time needed t. How?
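One way to gather such data is, for each λ in the range, to run an algorithm and record the relative error reached within a few fixed time budgets; plotting those errors against $\log_2(\lambda_{\max}/\lambda)$ then gives one isochrone per budget. In the sketch below the reference minimizers $\bar{x}(\lambda)$, the budgets, and the use of IST as the inner solver are all illustrative assumptions.

```python
import time
import numpy as np

def ist_isochrones(K, y, lambdas, x_bar, budgets=(60.0, 600.0)):
    """For each lambda, record the relative error ||x^(n) - x_bar(lambda)|| / ||x_bar(lambda)||
    reached within each time budget (seconds), using IST as the inner solver.
    x_bar is a mapping from lambda to a reference minimizer (e.g. from the direct method)."""
    S = lambda z, lam: np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # soft-thresholding
    errors = {b: [] for b in budgets}
    for lam in lambdas:
        x = np.zeros(K.shape[1])
        t0, remaining = time.perf_counter(), sorted(budgets)
        while remaining:
            x = S(x + K.T @ (y - K @ x), lam)                 # one IST step
            if time.perf_counter() - t0 >= remaining[0]:
                b = remaining.pop(0)
                errors[b].append(np.linalg.norm(x - x_bar[lam])
                                 / np.linalg.norm(x_bar[lam]))
    return errors

# Plot errors[b] against log2(lambda_max / lam) to obtain one isochrone per time budget.
```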

  25. Approximation isochrones: IST. [Figure: relative error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$ (vertical axis, 0 to 1) versus $\log_2(\lambda_{\max}/\lambda)$ (horizontal axis, 0 to 14; large λ on the left, small λ on the right); the top axis indicates the size of supp $\bar{x}(\lambda)$ (20, 100, 200; more sparse on the left, less sparse on the right); isochrones shown for computation times between 1 minute and 10 minutes.]

  26. Approximation isochrones: FISTA. [Figure: same axes as the previous slide: relative error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$ versus $\log_2(\lambda_{\max}/\lambda)$, with the size of supp $\bar{x}(\lambda)$ (20, 100, 200) indicated along the top.]
