Sequential Compressed Sensing


  1. Sequential Compressed Sensing
Dmitry Malioutov (DRW; research done mostly at MIT)
Joint work with Sujay Sanghavi and Alan Willsky
September 03, 2009

  2. Motivation
Many important classes of signals are either sparse or compressible.
[Figure: sparsity of the same signal in (a) the standard basis, (b) the Fourier basis, (c) the wavelet basis.]
CS: K-sparse x ∈ R^N. We take M << N measurements y = Ax + n, and try to recover x knowing that it is sparse.
Related problems: recovering the structure of graphical models from samples, recovering low-rank matrices from few measurements, ...
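A minimal setup sketch for the problem just stated. The parameter values and the names (rng, x_true, A, y) are illustrative choices, not from the slides, and are reused by the later sketches; noise is omitted since the sequential results below are for the noiseless case.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 100, 10, 40                      # ambient dim, sparsity, samples
x_true = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x_true[support] = rng.standard_normal(K)   # K-sparse signal x*

A = rng.standard_normal((M, N))            # random measurement matrix
y = A @ x_true                             # noiseless measurements y = A x*
```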

  3. Motivation
CS: for certain random A, x can be efficiently recovered with high probability after O(K log(N/K)) samples, where x is K-sparse. However:
• we may not know K a priori,
• such bounds are not available for all decoders,
• the constants may not be tight.
[Figure: required M for a signal with K = 10, Gaussian and Bernoulli measurement ensembles.]
So how many samples should we get? Our approach: receive samples sequentially, y_i = a_i' x, and stop once we know that enough samples have been received.

  4. Presentation Outline
1. CS formulation with sequential observations.
2. Stopping rule for the Gaussian case.
3. Stopping rule for the Bernoulli case.
4. Near-sparse and noisy signals.
5. Efficient solution of the sequential problem.

  5. Batch CS
Batch CS: suppose y = Ax∗. Find the sparsest x satisfying y = Ax. Relaxations: greedy methods, convex ℓ1, non-convex ℓp, sparse Bayesian learning, message passing, etc.; these all give sparse solutions. But how do we verify that the solution also recovers x∗?
[Figure. Top plot: reconstruction from M = 20 samples, N = 80. Bottom plot: using M = 21 samples (correct).]
To guarantee correct reconstruction with high probability, batch CS needs a superfluous number of samples, to be on the 'safe side'.

  6. Sequential CS formulation
Observations are available in sequence: y_i = a_i' x∗, i = 1, ..., M. At step M we use any sparse decoder to get a feasible solution x̂^M, e.g. the ℓ1 decoder:
x̂^M = arg min ‖x‖_1 s.t. a_i' x = y_i, i = 1, ..., M,
and either declare victory and stop, or ask for another sample.
Q: How does one know when enough samples have been received? Waiting for M ∝ C K log(N/K) requires knowledge of K = ‖x∗‖_0; also, only rough bounds on the proportionality constants may be known, and not even for all algorithms.
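The ℓ1 decoder above is a linear program. Below is a minimal sketch of the standard reformulation with auxiliary variables t_j ≥ |x_j|, solved with scipy's linprog; the helper name l1_decode is ours, and the slides do not prescribe a particular solver.

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode(A, y):
    """min ||x||_1 s.t. A x = y, as an LP over (x, t) with |x_j| <= t_j."""
    M, N = A.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])   # objective: sum_j t_j
    I = np.eye(N)
    A_ub = np.block([[I, -I], [-I, -I]])            # x - t <= 0, -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([A, np.zeros((M, N))])         # equality: A x = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]
```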

  7. Gaussian measurement case
Receive y_i = a_i' x∗, where the a_i ∼ N(0, I) are i.i.d. Gaussian samples. Let A_M ≜ [a_1, ..., a_M]'.
Claim: if x̂^{M+1} = x̂^M then x̂^M = x∗ with probability 1.
[Figure: x∗ and x̂^M on the constraint set A_M x = y_1:M, intersected by the new hyperplane a_{M+1}' x = y_{M+1}.]
A new sample a_{M+1}' x = y_{M+1} passes a random hyperplane through x∗. The probability that this hyperplane also goes through x̂^M (when x̂^M ≠ x∗) is zero.

  8. Gaussian case (continued)
Even simpler rules: if (i) ‖x̂^M‖_0 < M, or if (ii) a_{M+1}' x̂^M = y_{M+1}, then x̂^M = x∗. This works because for a random Gaussian matrix all M × M submatrices are non-singular with probability 1.
Example: N = 100 and K = 10. [Figure. Top plot: ‖x̂^M‖_0. Middle plot: ‖x̂^M‖_1. Bottom plot: ‖x∗ − x̂^M‖_2, each as a function of M.]
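A sketch of the resulting sequential procedure, using both simple rules; it assumes the l1_decode helper and the numpy setup from the earlier sketches, and the tolerance and loop structure are our own choices for illustration.

```python
def sequential_cs_gaussian(x_true, rng, max_M=None, tol=1e-6):
    N = x_true.size
    max_M = max_M or N
    rows, ys = [], []
    x_hat = np.zeros(N)
    for M in range(1, max_M + 1):
        a_new = rng.standard_normal(N)        # next Gaussian sample a_M
        y_new = a_new @ x_true
        # Rule (ii): the new sample is already explained by x_hat,
        # so x_hat = x* with probability 1 (M - 1 samples sufficed).
        if M > 1 and abs(a_new @ x_hat - y_new) < tol:
            return x_hat, M - 1
        rows.append(a_new)
        ys.append(y_new)
        x_hat = l1_decode(np.array(rows), np.array(ys))
        # Rule (i): the solution is strictly sparser than M.
        if np.count_nonzero(np.abs(x_hat) > tol) < M:
            return x_hat, M
    return x_hat, max_M

# usage: x_hat, M_used = sequential_cs_gaussian(x_true, rng)
```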

  9. Bernoulli case
Let a_i have equiprobable i.i.d. Bernoulli entries ±1. Now M × M submatrices of A_M can be singular (with non-zero probability), so the stopping rule for the Gaussian case does not hold. We modify it as follows: wait until x̂^M = x̂^{M+1} = ... = x̂^{M+T}.
Claim: after T-step agreement, P(x̂^{M+T} ≠ x∗) < 2^{−T}.
The proof depends on a lemma of Tao and Vu: let a ∈ {−1, 1}^N be i.i.d. equiprobable Bernoulli, and let W be a fixed d-dimensional subspace of R^N, 0 ≤ d < N. Then P(a ∈ W) ≤ 2^{d−N}.
Proof sketch: suppose x̂^M ≠ x∗. Let J and I be their supports, and L = |I ∪ J|. Then A = {a_{I∪J} | (x̂^M − x∗)' a_{I∪J} = 0} is an (L − 1)-dimensional subspace of R^L, so the probability that a_{M+1, I∪J} belongs to A is at most 1/2. ⋄
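A sketch of the T-step agreement rule, again reusing l1_decode and numpy from the earlier sketches. T = 5 is an arbitrary choice giving error probability below 2^{-5} by the claim above.

```python
def sequential_cs_bernoulli(x_true, rng, T=5, max_M=None, tol=1e-6):
    N = x_true.size
    max_M = max_M or N
    rows, ys = [], []
    x_prev, agree = None, 0
    for M in range(1, max_M + 1):
        a_new = rng.choice([-1.0, 1.0], size=N)  # equiprobable +/-1 entries
        rows.append(a_new)
        ys.append(a_new @ x_true)
        x_hat = l1_decode(np.array(rows), np.array(ys))
        if x_prev is not None and np.max(np.abs(x_hat - x_prev)) < tol:
            agree += 1                           # solution unchanged
            if agree >= T:                       # T-step agreement reached
                return x_hat, M
        else:
            agree = 0
        x_prev = x_hat
    return x_hat, max_M
```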

  10. Bernoulli case (continued)
The rule only uses T; ideally we should also use M and N, since errors are more likely for smaller M and N.
Conjecture: for an M × M Bernoulli matrix, P(det(A) = 0) ∝ M^2 2^{1−M} (the main failure mode: a pair of equal rows or columns). The best provable upper bound is still quite loose; such analysis could allow a shorter delay.
[Figure: ‖x̂^M‖_0 and ‖y − Ax̂^M‖_2 versus M. Left: example with K = 3, N = 40. Right: example with K = 10, N = 40.]
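An illustrative Monte Carlo check of the conjecture: estimate P(det(A) = 0) for M × M equiprobable ±1 matrices and print it next to M^2 2^{1−M}. Since the conjectured rate is asymptotic, only the trend should match; trial counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
for M in (4, 6, 8, 10):
    trials = 20000
    singular = sum(
        np.linalg.matrix_rank(rng.choice([-1.0, 1.0], size=(M, M))) < M
        for _ in range(trials)
    )
    print(M, singular / trials, M**2 * 2.0 ** (1 - M))
```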

  11. Near-sparse signals
In many practical settings signals are near-sparse: e.g. Fourier or wavelet transforms of smooth signals.
[Figure: (a) signal, (b) wavelet coefficients, (c) coefficients sorted.]
CS results: with roughly O(K log N) samples, x̂^M has error similar to keeping the K largest entries of x∗.
Our approach: given x̂^M, we obtain T new samples and find the distance from x̂^M to H^{M+T} ≜ {x | y_i = a_i' x, 1 ≤ i ≤ M + T}. This distance can be used to bound the reconstruction error ‖x∗ − x̂^M‖_2.
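A sketch of this distance computation: project x̂^M orthogonally onto the affine set H = {x : Ax = y} and return the step length. A_all is the (M+T) × N matrix of all samples; for Gaussian a_i it has full row rank with probability 1, so the solve below is well posed. The helper name is ours.

```python
def dist_to_hyperplanes(x_hat, A_all, y_all):
    r = A_all @ x_hat - y_all                     # constraint residuals
    step = A_all.T @ np.linalg.solve(A_all @ A_all.T, r)
    return float(np.linalg.norm(step))            # = d(x_hat, H^{M+T})
```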

  12. Near-sparse signals (continued)
Let H^{M+T} ≜ {x | y_i = a_i' x, i = 1, ..., M + T}, and let θ_T be the angle between the line through (x∗, x̂^M) and H^{M+T}. Then
d(x∗, x̂^M) = d(x̂^M, H^{M+T}) / sin(θ_T) ≜ C_T d(x̂^M, H^{M+T}).
Let L = N − M. Using properties of the χ_L and χ²_L distributions and Jensen's inequality, we have:
E[1/sin(θ_T)] ≈ √(L/T) ≤ √((L−2)/(T−2)),
Var(1/sin(θ_T)) ≤ (L−2)/(T−2) − L/T.
[Figure: x∗, x̂^M, x̂^{M+T} and the angle θ_T, with the sets A_M x = y_1:M and H^{M+T}.]
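A sketch combining the pieces above: an estimate of the bound ‖x∗ − x̂^M‖_2 ≤ C_T d(x̂^M, H^{M+T}) using the bound on the mean of C_T = 1/sin(θ_T). It assumes dist_to_hyperplanes from the previous sketch, T > 2, and that A_all, y_all hold all M + T samples.

```python
import math

def error_bound(x_hat, A_all, y_all, T):
    N = x_hat.size
    L = N - (A_all.shape[0] - T)         # L = N - M, as on the slide
    c_t = math.sqrt((L - 2) / (T - 2))   # bound on E[1/sin(theta_T)]
    return c_t * dist_to_hyperplanes(x_hat, A_all, y_all)
```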

  13. Examples: near-sparse signals
[Figure, left: (top) sample C_T, its approximate mean, and the bound on the mean; (bottom) sample standard deviation of C_T and a bound; L = 100. Right: errors and bounds in dB versus M for (top) a sparse signal, N = 100, T = 5, K = 10, and (bottom) a power-law decay signal, N = 1000, T = 10.]

  14. Near-sparse and noisy: simplified approach
Current solution x̂, true signal x∗. Take T new samples y_i = a_i' x∗ and compute ŷ_i = a_i' x̂. Denote the error by δ = x̂ − x∗, and let z_i = ŷ_i − y_i. Then
z_i = a_i' δ, 1 ≤ i ≤ T.
Now the z_i are i.i.d. from a zero-mean distribution with variance ‖δ‖²_2 Var(a_ij), so we can estimate ‖δ‖²_2 by estimating the variance of the z_i. For example, for Gaussian a_i, confidence bounds on ‖δ‖²_2 can be obtained from the χ²_T distribution.
This is related to the recent paper by Ward, "Compressed sensing with cross-validation", which uses the Johnson-Lindenstrauss lemma.
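A sketch of this estimate for Gaussian a_i, where each z_i ∼ N(0, ‖δ‖²) so that Σ z_i² / ‖δ‖² is χ² with T degrees of freedom; scipy.stats.chi2 supplies the quantiles. The function name and the confidence level alpha are our own choices.

```python
import numpy as np
from scipy.stats import chi2

def estimate_error_norm(x_hat, A_new, y_new, alpha=0.05):
    z = A_new @ x_hat - y_new            # z_i = a_i' (x_hat - x*), i = 1..T
    T = z.size
    s = float(np.sum(z ** 2))
    est = s / T                          # point estimate of ||delta||_2^2
    lo = s / chi2.ppf(1 - alpha / 2, T)  # two-sided (1 - alpha) interval
    hi = s / chi2.ppf(alpha / 2, T)
    return est, (lo, hi)
```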

  15. Solving sequential CS
The main goal of sequential CS is to minimize the number of samples. Yet we also want an efficient solution: not just re-solving from scratch each time.
Warm-starting the simplex: x̂^M is not feasible after the (M+1)-st sample. Add a 'slack' variable:
min ‖x‖_1 + Q z s.t. y_1:M = A_M x, y_{M+1} = a_{M+1}' x − z, z ≥ 0.
For Q large enough, z is forced to 0.
[Figure: number of simplex iterations versus M for LP1 and LP2.]
An alternative approach: homotopy continuation between x̂^M and x̂^{M+1}, following the piecewise-linear solution path (Garrigues and El Ghaoui, 2008; independently Asif and Romberg, 2008).
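A sketch of the slack-variable formulation as a single linprog call; scipy does not expose simplex warm starts across calls, so this only illustrates the LP, not the warm-starting itself. We split the slack as z = z⁺ − z⁻ with z⁺, z⁻ ≥ 0 so either sign of infeasibility is covered, a small deviation from the slide's one-sided z ≥ 0; Q is an arbitrary large penalty.

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode_slack(A_old, y_old, a_new, y_new, Q=1e4):
    M, N = A_old.shape
    # variables: x (N, free), t (N, >=0) with |x| <= t, z+ and z- (>=0)
    c = np.concatenate([np.zeros(N), np.ones(N), [Q, Q]])
    I = np.eye(N)
    A_ub = np.block([[I, -I, np.zeros((N, 2))],
                     [-I, -I, np.zeros((N, 2))]])
    b_ub = np.zeros(2 * N)
    # old equalities A_M x = y_1:M, plus a_{M+1}' x - (z+ - z-) = y_{M+1}
    A_eq = np.vstack([np.hstack([A_old, np.zeros((M, N + 2))]),
                      np.concatenate([a_new, np.zeros(N), [-1.0, 1.0]])])
    b_eq = np.concatenate([y_old, [y_new]])
    bounds = [(None, None)] * N + [(0, None)] * (N + 2)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:N]
```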

  16. Summary and future work
Sequential processing can minimize the number of required measurements.
• Gaussian case: a simple rule requires the least possible number of samples.
• Bernoulli case: trade-off between probability of error and delay.
• Near-sparse and noisy case: change in the solutions gives us information about solution accuracy.
Interesting questions:
• Related results in graphical-model structure recovery from samples, and in low-rank matrix recovery.
• More efficient sequential solutions.
• Comparison with active-learning approaches.
