SLIDE 1

Convergence of Iterative Hard Thresholding Variants with Application to Asynchronous Parallel Methods for Sparse Recovery

Jamie Haddock

Asilomar Conference on Signals, Systems, and Computers, November 4, 2019

Computational and Applied Mathematics UCLA

joint with Deanna Needell, Nazanin Rahnavard, and Alireza Zaeemzadeh

SLIDE 2

Sparse Recovery Problem

Sparse Recovery: reconstruct an approximately sparse x ∈ R^N from few nonadaptive, linear, and noisy measurements, y = Ax + e
⊲ A ∈ R^(m×N): measurement matrix
⊲ e ∈ R^m: noise

SLIDE 3

Sparse Recovery Problem

Sparse Recovery: reconstruct an approximately sparse x ∈ R^N from few nonadaptive, linear, and noisy measurements, y = Ax + e
⊲ A ∈ R^(m×N): measurement matrix
⊲ e ∈ R^m: noise

Approach:

min_{x ∈ R^N} ‖x‖_1 s.t. ‖Ax − y‖_2 ≤ ε

min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s
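To make the measurement model concrete, here is a minimal synthetic-instance sketch (not from the slides; the sizes m, N, s and the Gaussian/noise choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, s = 100, 400, 10          # illustrative sizes: m measurements, N-dim signal, sparsity s

# s-sparse signal: random support, random Gaussian values
x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.standard_normal(s)

# Gaussian measurement matrix, scaled by 1/sqrt(m)
A = rng.standard_normal((m, N)) / np.sqrt(m)

# noisy linear measurements y = Ax + e
e = 0.01 * rng.standard_normal(m)
y = A @ x + e
```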

SLIDE 4

Sparse Recovery Problem

Applications:
⊲ image reconstruction
⊲ hyperspectral imaging
⊲ wireless communications
⊲ analog-to-digital conversion

Sparse Recovery: reconstruct an approximately sparse x ∈ R^N from few nonadaptive, linear, and noisy measurements, y = Ax + e
⊲ A ∈ R^(m×N): measurement matrix
⊲ e ∈ R^m: noise

Approach:

min_{x ∈ R^N} ‖x‖_1 s.t. ‖Ax − y‖_2 ≤ ε

min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s

SLIDE 5

Algorithmic Approaches

Convex optimization:
⊲ linear programming
⊲ (proximal) gradient descent
⊲ coordinate descent
⊲ stochastic iterative methods (SGD)

SLIDE 6

Algorithmic Approaches

Convex optimization:
⊲ linear programming
⊲ (proximal) gradient descent
⊲ coordinate descent
⊲ stochastic iterative methods (SGD)

Greedy pursuits:
⊲ orthogonal matching pursuit (OMP)
⊲ regularized OMP (ROMP)
⊲ compressive sampling matching pursuit (CoSaMP)
⊲ iterative hard thresholding (IHT)

SLIDE 7

Algorithmic Approaches

Convex optimization:
⊲ linear programming
⊲ (proximal) gradient descent
⊲ coordinate descent
⊲ stochastic iterative methods (SGD)

Greedy pursuits:
⊲ orthogonal matching pursuit (OMP)
⊲ regularized OMP (ROMP)
⊲ compressive sampling matching pursuit (CoSaMP)
⊲ iterative hard thresholding (IHT)

IHT: x^(n+1) = H_k(x^(n) + A^T(y − Ax^(n)))
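A minimal sketch of this update in Python (assuming, as is standard, that H_k keeps the k largest-magnitude entries and zeroes the rest; the unit step size matches the formula above):

```python
import numpy as np

def hard_threshold(z, k):
    """H_k: keep the k largest-magnitude entries of z; zero the rest."""
    out = np.zeros_like(z)
    idx = np.argpartition(np.abs(z), -k)[-k:]
    out[idx] = z[idx]
    return out

def iht(A, y, k, n_iter=100):
    """Iterative hard thresholding: x <- H_k(x + A^T (y - A x))."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (y - A @ x), k)
    return x
```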

SLIDE 8

StoIHT


¹ Nguyen, Needell, Woolf, IEEE Transactions on Information Theory ’17
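The StoIHT pseudocode itself appears as a figure in the original deck. The following is a rough sketch of the idea only, patterned on the randomize/proxy/identify/estimate steps shown on the Bayesian slide below, with uniform block selection p(B) = 1/M assumed; hard_threshold is the helper from the IHT sketch above, and the details here are assumptions, not the paper's exact pseudocode:

```python
import numpy as np

def stoiht(A, y, s, M=10, gamma=1.0, n_iter=500, rng=None):
    """Rough sketch of StoIHT: per iteration, take a gradient step using
    one randomly chosen block of rows, then keep the top-s entries."""
    rng = np.random.default_rng(rng)
    m, N = A.shape
    blocks = np.array_split(np.arange(m), M)   # partition rows into M subproblems
    x = np.zeros(N)
    for _ in range(n_iter):
        Bt = rng.integers(M)                   # select B_t with p(B_t) = 1/M
        Ab, yb = A[blocks[Bt]], y[blocks[Bt]]
        # proxy step: b = x + gamma / (M p(B_t)) * A_Bt^T (y_Bt - A_Bt x);
        # with p(B_t) = 1/M the weight gamma / (M * 1/M) simplifies to gamma
        b = x + gamma * Ab.T @ (yb - Ab @ x)
        x = hard_threshold(b, s)               # identify + estimate
    return x
```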

SLIDE 9

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x

SLIDE 10

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary

SLIDE 11

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

SLIDE 12

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

Challenge: the objective of min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s is dense in x

SLIDE 13

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

Challenge: the objective of min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s is dense in x
⊲ likely that the same non-zero entries are updated from one iteration to the next

SLIDE 14

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

Challenge: the objective of min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s is dense in x
⊲ likely that the same non-zero entries are updated from one iteration to the next
⊲ a slow core could easily “undo” the progress of previous updates by faster cores

SLIDE 15

Asynchronous StoIHT


² Needell, Woolf, Proc. Information Theory and Applications ’17

SLIDE 16

Bayesian Asynchronous StoIHT

Require: the number of subproblems M and a probability of selection p(B). The reliability score distribution parameters β̂_i^1 and β̂_i^0, and the tally score parameters â_n^1 and â_n^0, are available to each processor.

Each processor performs the following at each iteration:

1: randomize: select B_t ∈ [M] with probability p(B_t)
2: proxy: b^(t) = x^(t) + (γ / (M p(B_t))) A_{B_t}^*(y_{B_t} − A_{B_t} x^(t))
3: identify: Ŝ^(t) = supp_s(b^(t)) and T̃^(t) = supp_s(φ)
4: estimate: x^(t+1) = b^(t)_{Ŝ^(t) ∪ T̃^(t)}
5: repeat
6: update E_Q[u_ni] = Q{u_ni = 1}
7: update β̂_i^1 and β̂_i^0, â_n^1 and â_n^0
8: until convergence
9: update φ
10: t = t + 1

² Zaeemzadeh, H., Rahnavard, Needell, Proc. 49th Asilomar Conf. on Signals, Systems and Computers ’18

SLIDE 17

Experimental Convergence

SLIDE 18

Tools for Analysis

First step: analyze the IHT variant running on each node of the parallel system

SLIDE 19

Tools for Analysis

First step: analyze the IHT variant running on each node of the parallel system

IHT_{k,k̃}: x^(n+1) = H_{k,k̃}(x^(n) + A^T(y − Ax^(n)))
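This slide does not spell out H_{k,k̃}. One plausible reading, consistent with the “Improved Scenario” slide below (k entries kept greedily, k̃ further indices selected non-greedily), is sketched here with uniformly random extra indices as an illustrative assumption:

```python
import numpy as np

def hard_threshold_k_ktilde(z, k, k_tilde, rng=None):
    """Sketch of H_{k,k~}: keep the k largest-magnitude entries of z,
    plus k~ further indices chosen non-greedily (here: uniformly at
    random from the remaining coordinates -- an illustrative choice)."""
    rng = np.random.default_rng(rng)
    order = np.argsort(-np.abs(z))                        # indices by decreasing magnitude
    extra = rng.choice(order[k:], size=k_tilde, replace=False)
    keep = np.concatenate([order[:k], extra])
    out = np.zeros_like(z)
    out[keep] = z[keep]
    return out
```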

SLIDE 20

Tools for Analysis

First step: analyze the IHT variant running on each node of the parallel system

IHT_{k,k̃}: x^(n+1) = H_{k,k̃}(x^(n) + A^T(y − Ax^(n)))

Non-Symmetric Restricted Isometry Property: (1 − β_k)‖z‖_2^2 ≤ ‖Az‖_2^2 ≤ ‖z‖_2^2 for all k-sparse z
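Not from the slides: one can sanity-check this property numerically. Monte Carlo over random k-sparse vectors yields a lower bound on β_k, assuming A has been scaled so that ‖Az‖_2 ≤ ‖z‖_2 holds for sparse z (e.g., by dividing A by its spectral norm):

```python
import numpy as np

def beta_k_lower_bound(A, k, trials=2000, rng=None):
    """Monte Carlo lower bound on beta_k: the largest observed value of
    1 - ||Az||^2 / ||z||^2 over random k-sparse vectors z."""
    rng = np.random.default_rng(rng)
    N = A.shape[1]
    worst = 0.0
    for _ in range(trials):
        z = np.zeros(N)
        S = rng.choice(N, size=k, replace=False)   # random support of size k
        z[S] = rng.standard_normal(k)
        ratio = np.linalg.norm(A @ z) ** 2 / np.linalg.norm(z) ** 2
        worst = max(worst, 1.0 - ratio)
    return worst
```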

SLIDE 21

Convergence of IHT_{k,k̃}

Theorem (H., Needell, Zaeemzadeh, Rahnavard ’19+): If A has the non-symmetric restricted isometry property with β_{3k+2k̃} < 1/8, then in iteration n, the IHT_{k,k̃} algorithms with input observations y = Ax + e recover the approximation x^(n) with

‖x − x^(n)‖ ≤ 2^(−n)‖x_k‖ + 5‖x − x_k‖ + (4/√k)‖x − x_k‖_1 + 4‖e‖.
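One immediate consequence (our reading, not stated on the slide): if x is exactly k-sparse and e = 0, then x_k = x, the approximation-error and noise terms vanish, and the bound reduces to

```latex
% exactly k-sparse x, noiseless measurements (e = 0)
\|x - x^{(n)}\| \le 2^{-n}\|x\|
```

i.e., linear convergence at rate 1/2.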

SLIDE 22

An Improved Scenario

Theorem (H., Needell, Zaeemzadeh, Rahnavard ’19+): Suppose the signal x has constant values on its support, and the k̃ indices selected (non-greedily) by the IHT_{k,k̃} algorithm each lie uniformly in the support of x with probability p. If A has the non-symmetric restricted isometry property with β_{3k+2k̃} < 1/8, then in iteration n, the IHT_{k,k̃} algorithms with input observations y = Ax + e recover the approximation x^(n) with

E_{k̃}‖x − x^(n)‖ ≤ 2^(−n)‖x‖ + 5 E_{k̃}‖x − x̃^(n)‖ + (4/√k) E_{k̃}‖x − x̃^(n)‖_1 + 4‖e‖
≤ 2^(−n)‖x‖ + (5α + 4α/√k)‖x‖_1 + 4‖e‖,

where α = ((|supp(x)| − k)/|supp(x)|) · ((|supp(x)| − p k̃)/|supp(x)|).

SLIDE 23

Experimental Convergence of IHT_{k,k̃}

Figure 1: Plot of error ‖x − x^(n)‖ vs. iteration for 100 iterations of IHT_{k,k̃} with various probabilities p that the k̃ indices lie in supp(x).

SLIDE 24

Rate of Support Intersection

Figure 2: The rate at which the shared indices between nodes lie in the true support of the signal x, for iterations of (a) AStoIHT and (b) BAStoIHT.

SLIDE 25

Conclusions and Future Work

SLIDE 26

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant

SLIDE 27

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence

SLIDE 28

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence
⊲ provided a heuristic for why asynchronous versions of StoIHT converge faster than the non-parallelized version

SLIDE 29

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence
⊲ provided a heuristic for why asynchronous versions of StoIHT converge faster than the non-parallelized version
⊲ analyze StoIHT_{k,k̃}

SLIDE 30

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence
⊲ provided a heuristic for why asynchronous versions of StoIHT converge faster than the non-parallelized version
⊲ analyze StoIHT_{k,k̃}
⊲ extend to a non-heuristic analysis of Asynchronous StoIHT

SLIDE 31

Thanks for listening!

Questions?

[1] J. Haddock, D. Needell, N. Rahnavard, and A. Zaeemzadeh. Convergence of iterative hard thresholding variants with application to asynchronous parallel methods for sparse recovery. In Proc. Asilomar Conf. Sig. Sys. Comp., 2019.

[2] D. Needell and T. Woolf. An asynchronous parallel approach to sparse recovery. In Proc. Information Theory and Applications Workshop (ITA), pages 1–5. IEEE, 2017.

[3] N. Nguyen, D. Needell, and T. Woolf. Linear convergence of stochastic iterative greedy algorithms with sparse constraints. IEEE Transactions on Information Theory, 63(11):6869–6895, 2017.

[4] A. Zaeemzadeh, J. Haddock, N. Rahnavard, and D. Needell. A Bayesian approach for asynchronous parallel sparse recovery. In Proc. Asilomar Conf. Sig. Sys. Comp., 2018.
