SLIDE 1

Convergence of Iterative Hard Thresholding Variants with Application to Asynchronous Parallel Methods for Sparse Recovery

Jamie Haddock

Asilomar Conference on Signals, Systems, and Computers, November 4, 2019

Computational and Applied Mathematics UCLA

joint with Deanna Needell, Nazanin Rahnavard, and Alireza Zaeemzadeh

SLIDE 2

Sparse Recovery Problem

Sparse Recovery: reconstruct an approximately sparse x ∈ R^N from few nonadaptive, linear, and noisy measurements, y = Ax + e
⊲ A ∈ R^(m×N): measurement matrix
⊲ e ∈ R^m: noise

SLIDE 3

Sparse Recovery Problem

Sparse Recovery: reconstruct an approximately sparse x ∈ R^N from few nonadaptive, linear, and noisy measurements, y = Ax + e
⊲ A ∈ R^(m×N): measurement matrix
⊲ e ∈ R^m: noise

Approach:

min_{x ∈ R^N} ‖x‖_1 s.t. ‖Ax − y‖_2 ≤ ε

min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s
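To make the measurement model concrete, here is a minimal synthetic-instance sketch (not from the slides; the sizes m, N, s and the Gaussian/noise choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, s = 100, 400, 10          # illustrative sizes: m measurements, N-dim signal, sparsity s

# s-sparse signal: random support, random Gaussian values
x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.standard_normal(s)

# Gaussian measurement matrix, scaled by 1/sqrt(m)
A = rng.standard_normal((m, N)) / np.sqrt(m)

# noisy linear measurements y = Ax + e
e = 0.01 * rng.standard_normal(m)
y = A @ x + e
```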

SLIDE 4

Sparse Recovery Problem

Applications:
⊲ image reconstruction
⊲ hyperspectral imaging
⊲ wireless communications
⊲ analog-to-digital conversion

Sparse Recovery: reconstruct an approximately sparse x ∈ R^N from few nonadaptive, linear, and noisy measurements, y = Ax + e
⊲ A ∈ R^(m×N): measurement matrix
⊲ e ∈ R^m: noise

Approach:

min_{x ∈ R^N} ‖x‖_1 s.t. ‖Ax − y‖_2 ≤ ε

min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s

SLIDE 5

Algorithmic Approaches

Convex optimization:
⊲ linear programming
⊲ (proximal) gradient descent
⊲ coordinate descent
⊲ stochastic iterative methods (SGD)

SLIDE 6

Algorithmic Approaches

Convex optimization:
⊲ linear programming
⊲ (proximal) gradient descent
⊲ coordinate descent
⊲ stochastic iterative methods (SGD)

Greedy pursuits:
⊲ orthogonal matching pursuit (OMP)
⊲ regularized OMP (ROMP)
⊲ compressive sampling matching pursuit (CoSaMP)
⊲ iterative hard thresholding (IHT)

SLIDE 7

Algorithmic Approaches

Convex optimization:
⊲ linear programming
⊲ (proximal) gradient descent
⊲ coordinate descent
⊲ stochastic iterative methods (SGD)

Greedy pursuits:
⊲ orthogonal matching pursuit (OMP)
⊲ regularized OMP (ROMP)
⊲ compressive sampling matching pursuit (CoSaMP)
⊲ iterative hard thresholding (IHT)

IHT: x^(n+1) = H_k(x^(n) + A^T(y − Ax^(n)))
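A minimal sketch of this update in Python (assuming, as is standard, that H_k keeps the k largest-magnitude entries and zeroes the rest; the unit step size matches the formula above):

```python
import numpy as np

def hard_threshold(z, k):
    """H_k: keep the k largest-magnitude entries of z; zero the rest."""
    out = np.zeros_like(z)
    idx = np.argpartition(np.abs(z), -k)[-k:]
    out[idx] = z[idx]
    return out

def iht(A, y, k, n_iter=100):
    """Iterative hard thresholding: x <- H_k(x + A^T (y - A x))."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (y - A @ x), k)
    return x
```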

SLIDE 8

StoIHT


¹ Nguyen, Needell, Woolf, IEEE Transactions on Information Theory ’17
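The StoIHT pseudocode itself appears as a figure in the original deck. The following is a rough sketch of the idea only, patterned on the randomize/proxy/identify/estimate steps shown on the Bayesian slide below, with uniform block selection p(B) = 1/M assumed; hard_threshold is the helper from the IHT sketch above, and the details here are assumptions, not the paper's exact pseudocode:

```python
import numpy as np

def stoiht(A, y, s, M=10, gamma=1.0, n_iter=500, rng=None):
    """Rough sketch of StoIHT: per iteration, take a gradient step using
    one randomly chosen block of rows, then keep the top-s entries."""
    rng = np.random.default_rng(rng)
    m, N = A.shape
    blocks = np.array_split(np.arange(m), M)   # partition rows into M subproblems
    x = np.zeros(N)
    for _ in range(n_iter):
        Bt = rng.integers(M)                   # select B_t with p(B_t) = 1/M
        Ab, yb = A[blocks[Bt]], y[blocks[Bt]]
        # proxy step: b = x + gamma / (M p(B_t)) * A_Bt^T (y_Bt - A_Bt x);
        # with p(B_t) = 1/M the weight gamma / (M * 1/M) simplifies to gamma
        b = x + gamma * Ab.T @ (yb - Ab @ x)
        x = hard_threshold(b, s)               # identify + estimate
    return x
```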

SLIDE 9

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x

SLIDE 10

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary

SLIDE 11

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

SLIDE 12

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

Challenge: the objective of min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s is dense in x

SLIDE 13

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

Challenge: the objective of min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s is dense in x
⊲ likely that the same non-zero entries are updated from one iteration to the next

SLIDE 14

Asynchronous Parallelization

Asynchronous approaches: popular when the objective function is sparse in x
⊲ all cores run simultaneously, accessing and updating shared memory as necessary
⊲ eliminates the idle time of synchronous approaches

Challenge: the objective of min_{x ∈ R^N} (1/√m) ‖Ax − y‖_2 s.t. ‖x‖_0 ≤ s is dense in x
⊲ likely that the same non-zero entries are updated from one iteration to the next
⊲ a slow core could easily “undo” the progress of previous updates by faster cores

SLIDE 15

Asynchronous StoIHT


² Needell, Woolf, Proc. Information Theory and Applications ’17

SLIDE 16

Bayesian Asynchronous StoIHT

Require: the number of subproblems M and a probability of selection p(B). The reliability score distribution parameters β̂_i^1 and β̂_i^0, and the tally score parameters â_n^1 and â_n^0, are available to each processor.

Each processor performs the following at each iteration:

1: randomize: select B_t ∈ [M] with probability p(B_t)
2: proxy: b^(t) = x^(t) + (γ / (M p(B_t))) A_{B_t}^*(y_{B_t} − A_{B_t} x^(t))
3: identify: Ŝ^(t) = supp_s(b^(t)) and T̃^(t) = supp_s(φ)
4: estimate: x^(t+1) = b^(t)_{Ŝ^(t) ∪ T̃^(t)}
5: repeat
6: update E_Q[u_ni] = Q{u_ni = 1}
7: update β̂_i^1 and β̂_i^0, â_n^1 and â_n^0
8: until convergence
9: update φ
10: t = t + 1

² Zaeemzadeh, H., Rahnavard, Needell, Proc. 49th Asilomar Conf. on Signals, Systems and Computers ’18

SLIDE 17

Experimental Convergence

SLIDE 18

Tools for Analysis

First step: analyze the IHT variant running on each node of the parallel system

SLIDE 19

Tools for Analysis

First step: analyze the IHT variant running on each node of the parallel system

IHT_{k,k̃}: x^(n+1) = H_{k,k̃}(x^(n) + A^T(y − Ax^(n)))
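This slide does not spell out H_{k,k̃}. One plausible reading, consistent with the “Improved Scenario” slide below (k entries kept greedily, k̃ further indices selected non-greedily), is sketched here with uniformly random extra indices as an illustrative assumption:

```python
import numpy as np

def hard_threshold_k_ktilde(z, k, k_tilde, rng=None):
    """Sketch of H_{k,k~}: keep the k largest-magnitude entries of z,
    plus k~ further indices chosen non-greedily (here: uniformly at
    random from the remaining coordinates -- an illustrative choice)."""
    rng = np.random.default_rng(rng)
    order = np.argsort(-np.abs(z))                        # indices by decreasing magnitude
    extra = rng.choice(order[k:], size=k_tilde, replace=False)
    keep = np.concatenate([order[:k], extra])
    out = np.zeros_like(z)
    out[keep] = z[keep]
    return out
```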

SLIDE 20

Tools for Analysis

First step: analyze the IHT variant running on each node of the parallel system

IHT_{k,k̃}: x^(n+1) = H_{k,k̃}(x^(n) + A^T(y − Ax^(n)))

Non-Symmetric Restricted Isometry Property: (1 − β_k)‖z‖_2^2 ≤ ‖Az‖_2^2 ≤ ‖z‖_2^2 for all k-sparse z
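Not from the slides: one can sanity-check this property numerically. Monte Carlo over random k-sparse vectors yields a lower bound on β_k, assuming A has been scaled so that ‖Az‖_2 ≤ ‖z‖_2 holds for sparse z (e.g., by dividing A by its spectral norm):

```python
import numpy as np

def beta_k_lower_bound(A, k, trials=2000, rng=None):
    """Monte Carlo lower bound on beta_k: the largest observed value of
    1 - ||Az||^2 / ||z||^2 over random k-sparse vectors z."""
    rng = np.random.default_rng(rng)
    N = A.shape[1]
    worst = 0.0
    for _ in range(trials):
        z = np.zeros(N)
        S = rng.choice(N, size=k, replace=False)   # random support of size k
        z[S] = rng.standard_normal(k)
        ratio = np.linalg.norm(A @ z) ** 2 / np.linalg.norm(z) ** 2
        worst = max(worst, 1.0 - ratio)
    return worst
```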

SLIDE 21

Convergence of IHT_{k,k̃}

Theorem (H., Needell, Zaeemzadeh, Rahnavard ’19+): If A has the non-symmetric restricted isometry property with β_{3k+2k̃} < 1/8, then in iteration n, the IHT_{k,k̃} algorithms with input observations y = Ax + e recover the approximation x^(n) with

‖x − x^(n)‖ ≤ 2^(−n)‖x_k‖ + 5‖x − x_k‖ + (4/√k)‖x − x_k‖_1 + 4‖e‖.
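One immediate consequence (our reading, not stated on the slide): if x is exactly k-sparse and e = 0, then x_k = x, the approximation-error and noise terms vanish, and the bound reduces to

```latex
% exactly k-sparse x, noiseless measurements (e = 0)
\|x - x^{(n)}\| \le 2^{-n}\|x\|
```

i.e., linear convergence at rate 1/2.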

SLIDE 22

An Improved Scenario

Theorem (H., Needell, Zaeemzadeh, Rahnavard ’19+): Suppose the signal x has constant values on its support, and the k̃ indices selected (non-greedily) by the IHT_{k,k̃} algorithm each lie uniformly in the support of x with probability p. If A has the non-symmetric restricted isometry property with β_{3k+2k̃} < 1/8, then in iteration n, the IHT_{k,k̃} algorithms with input observations y = Ax + e recover the approximation x^(n) with

E_{k̃}‖x − x^(n)‖ ≤ 2^(−n)‖x‖ + 5 E_{k̃}‖x − x̃^(n)‖ + (4/√k) E_{k̃}‖x − x̃^(n)‖_1 + 4‖e‖
≤ 2^(−n)‖x‖ + (5α + 4α/√k)‖x‖_1 + 4‖e‖,

where α = ((|supp(x)| − k)/|supp(x)|) · ((|supp(x)| − p k̃)/|supp(x)|).

SLIDE 23

Experimental Convergence of IHT_{k,k̃}

Figure 1: Plot of error ‖x − x^(n)‖ vs. iteration for 100 iterations of IHT_{k,k̃} with various probabilities p that the k̃ indices lie in supp(x).

SLIDE 24

Rate of Support Intersection

Figure 2: The rate at which the shared indices between nodes lie in the true support of the signal x, for iterations of (a) AStoIHT and (b) BAStoIHT.

SLIDE 25

Conclusions and Future Work

SLIDE 26

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant

SLIDE 27

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence

SLIDE 28

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence
⊲ provided a heuristic for why asynchronous versions of StoIHT converge faster than the non-parallelized version

SLIDE 29

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence
⊲ provided a heuristic for why asynchronous versions of StoIHT converge faster than the non-parallelized version
⊲ analyze StoIHT_{k,k̃}

SLIDE 30

Conclusions and Future Work

⊲ provided a convergence analysis for an IHT variant
⊲ identified a scenario in which the IHT variant has potentially faster convergence
⊲ provided a heuristic for why asynchronous versions of StoIHT converge faster than the non-parallelized version
⊲ analyze StoIHT_{k,k̃}
⊲ extend to a non-heuristic analysis of Asynchronous StoIHT

SLIDE 31

Thanks for listening!

Questions?

[1] J. Haddock, D. Needell, N. Rahnavard, and A. Zaeemzadeh. Convergence of iterative hard thresholding variants with application to asynchronous parallel methods for sparse recovery. In Proc. Asilomar Conf. Sig. Sys. Comp., 2019.

[2] D. Needell and T. Woolf. An asynchronous parallel approach to sparse recovery. In Proc. Information Theory and Applications Workshop (ITA), pages 1–5. IEEE, 2017.

[3] N. Nguyen, D. Needell, and T. Woolf. Linear convergence of stochastic iterative greedy algorithms with sparse constraints. IEEE Transactions on Information Theory, 63(11):6869–6895, 2017.

[4] A. Zaeemzadeh, J. Haddock, N. Rahnavard, and D. Needell. A Bayesian approach for asynchronous parallel sparse recovery. In Proc. Asilomar Conf. Sig. Sys. Comp., 2018.
