

  1. Recent Theoretical Advances in Sparse Approximation ❦ Joel A. Tropp <jtropp@ices.utexas.edu>, Institute for Computational Engineering and Sciences, The University of Texas at Austin. Includes joint work with A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss of AT&T Research; S. Muthukrishnan is also affiliated with Rutgers Univ.

  2. What is Sparse Approximation? ❦ ❧ We work in the finite-dimensional Hilbert space C^d ❧ Let D = {ϕ_ω} be a dictionary of N unit-norm atoms indexed by Ω ❧ Let m be a fixed, positive integer ❧ Suppose x is an arbitrary input vector ❧ The sparse approximation problem is to solve

  $$\min_{\Lambda \subset \Omega} \; \min_{b \in \mathbb{C}^{\Lambda}} \Bigl\| x - \sum_{\lambda \in \Lambda} b_\lambda \varphi_\lambda \Bigr\|_2 \quad \text{subject to} \quad |\Lambda| \le m$$

  ❧ The inner minimization is a least-squares problem ❧ But the outer minimization is combinatorial ❧ Formally, we call the problem (D, m)-Sparse
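The problem statement above translates directly into a brute-force procedure. Below is a minimal sketch (my own illustration, not from the talk) that enumerates every index set Λ with |Λ| ≤ m and solves the inner least-squares problem for each; the function name and the NumPy setup are assumptions.

```python
# Exhaustive solution of (D, m)-Sparse: try every candidate index set Lambda,
# solve the inner least-squares problem, and keep the best residual.
import itertools
import numpy as np

def sparse_approx_exhaustive(x, D, m):
    """D is a d x N array whose columns are the unit-norm atoms phi_omega."""
    _, N = D.shape
    best_err, best_idx, best_coef = np.inf, (), np.zeros(0)
    for k in range(1, m + 1):
        for idx in itertools.combinations(range(N), k):
            A = D[:, list(idx)]                            # atoms indexed by Lambda
            b, *_ = np.linalg.lstsq(A, x, rcond=None)      # inner least-squares step
            err = np.linalg.norm(x - A @ b)
            if err < best_err:
                best_err, best_idx, best_coef = err, idx, b
    return best_idx, best_coef, best_err
```

The outer loop visits on the order of $\binom{N}{m}$ index sets, which is exactly the combinatorial bottleneck noted above.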

  3. Basic Dictionary Properties ❦ ❧ The dictionary is complete if the atoms span C^d ❧ The dictionary is redundant if it contains linearly dependent atoms ❧ A complete dictionary can represent every vector without error ❧ Each vector has infinitely many representations over a redundant dictionary ❧ In most modern applications, dictionaries are complete and redundant

  4. Subset Selection in Regression ❦ ❧ Suppose x is a vector of d observations of a random variable X ❧ Suppose ϕ_ω is a vector of d observations of a random variable Φ_ω ❧ We want to find a small subset of {Φ_ω} for linear prediction of X ❧ Method: solve the sparse approximation problem! ❧ Statisticians have developed many approaches (a sketch of forward selection appears below): 1. Forward selection 2. Backward elimination 3. Sequential replacement 4. Stepwise regression [Efroymson 1960] 5. Exhaustive search [Garside 1965, Beale et al. 1967] 6. Projection Pursuit Regression [Friedman–Stuetzle 1981] Reference: [A. J. Miller 2002]
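As a concrete illustration of item 1 in the list above, here is a minimal sketch of forward selection (my own code, real-valued data assumed, not a method from the talk): at each step it adds the predictor most correlated with the current residual, then refits all chosen predictors by least squares.

```python
# Forward selection: greedily grow the predictor subset, refitting at each step.
import numpy as np

def forward_selection(x, Phi, m):
    """Phi is a d x N array of candidate predictors; select at most m columns."""
    _, N = Phi.shape
    chosen, coef = [], np.zeros(0)
    residual = x.copy()
    for _ in range(m):
        # Score each unused predictor by its correlation with the current residual.
        scores = [abs(Phi[:, j] @ residual) if j not in chosen else -np.inf
                  for j in range(N)]
        chosen.append(int(np.argmax(scores)))
        A = Phi[:, chosen]
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)   # refit on all chosen predictors
        residual = x - A @ coef
    return chosen, coef
```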

  5. Transform Coding ❦ ❧ In its simplest form, transform coding can be viewed as a sparse approximation problem [figure: DCT analysis followed by IDCT synthesis] Reference: [Evans–Mersereau 2003]

  6. Computational Complexity ❦ Theorem 1. [Davis (1994), Natarajan (1995)] Any instance of Exact Cover by Three Sets (x3c) is reducible in polynomial time to a sparse approximation problem. [figure: an instance of x3c]

  7. Computational Complexity II ❦ Corollary 2. Any algorithm that can solve (D, m)-Sparse for every dictionary and sparsity level must solve an NP-hard problem. ❧ It is widely believed that no tractable algorithms exist for NP-hard problems ❧ BUT a specific (D, m)-Sparse problem may be easy ❧ AND preprocessing is allowed

  8. Orthonormal Dictionaries ❦ ❧ Suppose that D is an orthonormal basis (ONB) ❧ For any vector x and sparsity level m: 1. Sort the indices {ω_n} so that the numbers $|\langle x, \varphi_{\omega_n} \rangle|$ are decreasing 2. The solution to (D, m)-Sparse for input x is

  $$\sum_{n=1}^{m} \langle x, \varphi_{\omega_n} \rangle \, \varphi_{\omega_n}$$

  3. The squared approximation error is

  $$\sum_{n=m+1}^{d} |\langle x, \varphi_{\omega_n} \rangle|^2$$

  Insight: (D, m)-Sparse can be solved approximately so long as sub-collections of m atoms in D are sufficiently close to being orthogonal. (A sketch of this coefficient-thresholding procedure appears below.)
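The three steps above amount to keeping the m largest coefficients of the orthonormal expansion. A minimal sketch, assuming the ONB is stored as a d x d NumPy array D whose columns are the atoms (the function name is mine):

```python
# Best m-term approximation over an orthonormal basis: threshold the coefficients.
import numpy as np

def onb_sparse_approx(x, D, m):
    coeffs = D.conj().T @ x                    # all inner products <x, phi_omega>
    order = np.argsort(-np.abs(coeffs))        # indices sorted by decreasing |<x, phi>|
    approx = D[:, order[:m]] @ coeffs[order[:m]]         # keep the m largest terms
    sq_error = np.sum(np.abs(coeffs[order[m:]]) ** 2)    # energy in the discarded tail
    return approx, sq_error
```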

  9. Coherence ❦ ❧ Donoho and Huo introduced the coherence parameter µ of a dictionary:

  $$\mu = \max_{j \ne k} \bigl| \langle \varphi_{\omega_j}, \varphi_{\omega_k} \rangle \bigr|$$

  ❧ Measures how much distinct atoms look alike ❧ Many natural dictionaries are incoherent [Donoho–Huo 2000] ❧ Example: spikes + sines [figure: a spike atom and a sine atom]
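The coherence is straightforward to compute from the Gram matrix of the dictionary. A minimal sketch (my own; the spikes-plus-complex-exponentials dictionary below is a stand-in for the slide's spikes + sines example):

```python
import numpy as np

def coherence(D):
    """Largest |<phi_j, phi_k>| over distinct columns of D (unit-norm atoms assumed)."""
    G = np.abs(D.conj().T @ D)     # pairwise inner-product magnitudes
    np.fill_diagonal(G, 0.0)       # ignore each atom paired with itself
    return G.max()

# Spikes together with unit-norm complex exponentials in C^d: coherence 1/sqrt(d).
d = 64
spikes = np.eye(d)
exponentials = np.fft.fft(np.eye(d)) / np.sqrt(d)
print(coherence(np.hstack([spikes, exponentials])))   # about 0.125 = 1/sqrt(64)
```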

  10. Coherence Bounds ❦ ❧ In general,

  $$\mu \ge \sqrt{\frac{N - d}{d(N - 1)}}$$

  ❧ If the dictionary contains an orthonormal basis,

  $$\mu \ge \frac{1}{\sqrt{d}}$$

  ❧ Incoherent dictionaries can be enormous [GMS 2003]
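For concreteness, a quick numerical check of the two lower bounds above (my own snippet; the dimensions are arbitrary):

```python
import numpy as np

def coherence_lower_bound(d, N):
    """General lower bound on the coherence of N unit-norm atoms in dimension d."""
    return np.sqrt((N - d) / (d * (N - 1)))

d, N = 64, 128
print(coherence_lower_bound(d, N))   # general bound, about 0.089 here
print(1 / np.sqrt(d))                # bound when the dictionary contains an ONB: 0.125
```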

  11. Quasi-Coherence ❦ ❧ Donoho–Elad [2003] and JAT [2003] independently introduced the quasi-coherence:

  $$\mu_1(m) = \max_{\omega} \; \max_{\lambda_1, \dots, \lambda_m} \; \sum_{t=1}^{m} \bigl| \langle \varphi_\omega, \varphi_{\lambda_t} \rangle \bigr|$$

  where the inner maximum runs over m distinct indices, none equal to ω ❧ Observe that $\mu_1(1) = \mu$ ❧ Generalizes the coherence: $\mu_1(m) \le \mu m$
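Computed from the Gram matrix, µ_1(m) is the worst-case sum of the m largest off-diagonal magnitudes in any row. A minimal sketch (my own; assumes a NumPy array D with unit-norm columns):

```python
import numpy as np

def quasi_coherence(D, m):
    """mu_1(m): worst-case sum of the m largest |<phi_omega, phi_lambda>|, lambda != omega."""
    G = np.abs(D.conj().T @ D)
    np.fill_diagonal(G, 0.0)               # exclude lambda = omega
    G.sort(axis=1)                         # ascending within each row
    return G[:, -m:].sum(axis=1).max()     # sum the m largest entries, take the worst row
```

Note that quasi_coherence(D, 1) reduces to the coherence, matching $\mu_1(1) = \mu$.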

  12. Quasi-Coherence Example ❦ ❧ Consider the dictionary of translates of a double pulse [figure: a pulse with heights $\sqrt{35}/6$ and $1/6$] ❧ The coherence is $\mu = \sqrt{35}/36$ ❧ The quasi-coherence is

  $$\mu_1(m) = \begin{cases} \sqrt{35}/36, & m = 1 \\ \sqrt{35}/18, & m = 2 \\ \sqrt{35}/12, & m \ge 3 \end{cases}$$

  13. Roadmap ❦ ❧ First, a few basic algorithms for sparse approximation ❧ Then, the role of quasi-coherence in the performance of these algorithms ❧ Finally, a new algorithm that offers better approximation guarantees

  14. Matching Pursuit (MP) ❦ ❧ In 1993, Mallat and Zhang presented a greedy method for sparse approximation over redundant dictionaries ❧ Equivalent to Projection Pursuit Regression [Friedman–Stuetzle 1981] ❧ Developed independently by Qian and Chen [1993] ❧ Procedure: 1. Initialize $a_0 = 0$ and $r_0 = x$ 2. At step t, select an atom $\varphi_{\lambda_t}$ that solves

  $$\max_{\omega} \; \bigl| \langle r_{t-1}, \varphi_\omega \rangle \bigr|$$
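To make the procedure concrete, here is a minimal sketch of the MP iteration. The selection rule matches step 2 above; the approximation and residual updates are the standard Mallat–Zhang updates, which are not shown on the slide and are included here for completeness. Variable names are my own.

```python
# Matching Pursuit: greedily peel off one atom's contribution per step.
import numpy as np

def matching_pursuit(x, D, num_steps):
    """D is a d x N array of unit-norm atoms; returns the approximation and residual."""
    a = np.zeros_like(x)                 # current approximation a_t
    r = x.copy()                         # current residual r_t
    for _ in range(num_steps):
        corr = D.conj().T @ r                      # <r_{t-1}, phi_omega> for every atom
        k = int(np.argmax(np.abs(corr)))           # greedy selection (step 2 above)
        a = a + corr[k] * D[:, k]                  # add the selected component
        r = r - corr[k] * D[:, k]                  # remove it from the residual
    return a, r
```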
