

  1. Recent Theoretical Advances in Sparse Approximation ❦ Joel A. Tropp <jtropp@ices.utexas.edu>, Institute for Computational Engineering and Sciences, The University of Texas at Austin. Includes joint work with A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss of AT&T Research; S. Muthukrishnan is also affiliated with Rutgers Univ.

  2. What is Sparse Approximation? ❦ ❧ We work in the finite-dimensional Hilbert space C^d ❧ Let D = {ϕ_ω} be a dictionary of N unit-norm atoms indexed by Ω ❧ Let m be a fixed, positive integer ❧ Suppose x is an arbitrary input vector ❧ The sparse approximation problem is to solve

  $$\min_{\Lambda \subset \Omega} \; \min_{b \in \mathbb{C}^{\Lambda}} \Bigl\| x - \sum_{\lambda \in \Lambda} b_\lambda \varphi_\lambda \Bigr\|_2 \quad \text{subject to} \quad |\Lambda| \le m$$

  ❧ The inner minimization is a least-squares problem ❧ But the outer minimization is combinatorial ❧ Formally, we call the problem (D, m)-Sparse
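The problem statement above translates directly into a brute-force procedure. Below is a minimal sketch (my own illustration, not from the talk) that enumerates every index set Λ with |Λ| ≤ m and solves the inner least-squares problem for each; the function name and the NumPy setup are assumptions.

```python
# Exhaustive solution of (D, m)-Sparse: try every candidate index set Lambda,
# solve the inner least-squares problem, and keep the best residual.
import itertools
import numpy as np

def sparse_approx_exhaustive(x, D, m):
    """D is a d x N array whose columns are the unit-norm atoms phi_omega."""
    _, N = D.shape
    best_err, best_idx, best_coef = np.inf, (), np.zeros(0)
    for k in range(1, m + 1):
        for idx in itertools.combinations(range(N), k):
            A = D[:, list(idx)]                            # atoms indexed by Lambda
            b, *_ = np.linalg.lstsq(A, x, rcond=None)      # inner least-squares step
            err = np.linalg.norm(x - A @ b)
            if err < best_err:
                best_err, best_idx, best_coef = err, idx, b
    return best_idx, best_coef, best_err
```

The outer loop visits on the order of $\binom{N}{m}$ index sets, which is exactly the combinatorial bottleneck noted above.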

  3. Basic Dictionary Properties ❦ ❧ The dictionary is complete if the atoms span C^d ❧ The dictionary is redundant if it contains linearly dependent atoms ❧ A complete dictionary can represent every vector without error ❧ Each vector has infinitely many representations over a redundant dictionary ❧ In most modern applications, dictionaries are complete and redundant

  4. Subset Selection in Regression ❦ ❧ Suppose x is a vector of d observations of a random variable X ❧ Suppose ϕ_ω is a vector of d observations of a random variable Φ_ω ❧ We want to find a small subset of {Φ_ω} for linear prediction of X ❧ Method: solve the sparse approximation problem! ❧ Statisticians have developed many approaches (a sketch of forward selection appears below): 1. Forward selection 2. Backward elimination 3. Sequential replacement 4. Stepwise regression [Efroymson 1960] 5. Exhaustive search [Garside 1965, Beale et al. 1967] 6. Projection Pursuit Regression [Friedman–Stuetzle 1981] Reference: [A. J. Miller 2002]
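As a concrete illustration of item 1 in the list above, here is a minimal sketch of forward selection (my own code, real-valued data assumed, not a method from the talk): at each step it adds the predictor most correlated with the current residual, then refits all chosen predictors by least squares.

```python
# Forward selection: greedily grow the predictor subset, refitting at each step.
import numpy as np

def forward_selection(x, Phi, m):
    """Phi is a d x N array of candidate predictors; select at most m columns."""
    _, N = Phi.shape
    chosen, coef = [], np.zeros(0)
    residual = x.copy()
    for _ in range(m):
        # Score each unused predictor by its correlation with the current residual.
        scores = [abs(Phi[:, j] @ residual) if j not in chosen else -np.inf
                  for j in range(N)]
        chosen.append(int(np.argmax(scores)))
        A = Phi[:, chosen]
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)   # refit on all chosen predictors
        residual = x - A @ coef
    return chosen, coef
```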

  5. Transform Coding ❦ ❧ In its simplest form, transform coding can be viewed as a sparse approximation problem [figure: DCT analysis followed by IDCT synthesis] Reference: [Evans–Mersereau 2003]

  6. Computational Complexity ❦ Theorem 1. [Davis (1994), Natarajan (1995)] Any instance of Exact Cover by Three Sets (x3c) is reducible in polynomial time to a sparse approximation problem. [figure: an instance of x3c]

  7. Computational Complexity II ❦ Corollary 2. Any algorithm that can solve (D, m)-Sparse for every dictionary and sparsity level must solve an NP-hard problem. ❧ It is widely believed that no tractable algorithms exist for NP-hard problems ❧ BUT a specific (D, m)-Sparse problem may be easy ❧ AND preprocessing is allowed

  8. Orthonormal Dictionaries ❦ ❧ Suppose that D is an orthonormal basis (ONB) ❧ For any vector x and sparsity level m: 1. Sort the indices {ω_n} so that the numbers $|\langle x, \varphi_{\omega_n} \rangle|$ are decreasing 2. The solution to (D, m)-Sparse for input x is

  $$\sum_{n=1}^{m} \langle x, \varphi_{\omega_n} \rangle \, \varphi_{\omega_n}$$

  3. The squared approximation error is

  $$\sum_{n=m+1}^{d} |\langle x, \varphi_{\omega_n} \rangle|^2$$

  Insight: (D, m)-Sparse can be solved approximately so long as sub-collections of m atoms in D are sufficiently close to being orthogonal. (A sketch of this coefficient-thresholding procedure appears below.)
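The three steps above amount to keeping the m largest coefficients of the orthonormal expansion. A minimal sketch, assuming the ONB is stored as a d x d NumPy array D whose columns are the atoms (the function name is mine):

```python
# Best m-term approximation over an orthonormal basis: threshold the coefficients.
import numpy as np

def onb_sparse_approx(x, D, m):
    coeffs = D.conj().T @ x                    # all inner products <x, phi_omega>
    order = np.argsort(-np.abs(coeffs))        # indices sorted by decreasing |<x, phi>|
    approx = D[:, order[:m]] @ coeffs[order[:m]]         # keep the m largest terms
    sq_error = np.sum(np.abs(coeffs[order[m:]]) ** 2)    # energy in the discarded tail
    return approx, sq_error
```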

  9. Coherence ❦ ❧ Donoho and Huo introduced the coherence parameter µ of a dictionary:

  $$\mu = \max_{j \ne k} \bigl| \langle \varphi_{\omega_j}, \varphi_{\omega_k} \rangle \bigr|$$

  ❧ Measures how much distinct atoms look alike ❧ Many natural dictionaries are incoherent [Donoho–Huo 2000] ❧ Example: spikes + sines [figure: a spike atom and a sine atom]
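The coherence is straightforward to compute from the Gram matrix of the dictionary. A minimal sketch (my own; the spikes-plus-complex-exponentials dictionary below is a stand-in for the slide's spikes + sines example):

```python
import numpy as np

def coherence(D):
    """Largest |<phi_j, phi_k>| over distinct columns of D (unit-norm atoms assumed)."""
    G = np.abs(D.conj().T @ D)     # pairwise inner-product magnitudes
    np.fill_diagonal(G, 0.0)       # ignore each atom paired with itself
    return G.max()

# Spikes together with unit-norm complex exponentials in C^d: coherence 1/sqrt(d).
d = 64
spikes = np.eye(d)
exponentials = np.fft.fft(np.eye(d)) / np.sqrt(d)
print(coherence(np.hstack([spikes, exponentials])))   # about 0.125 = 1/sqrt(64)
```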

  10. Coherence Bounds ❦ ❧ In general,

  $$\mu \ge \sqrt{\frac{N - d}{d(N - 1)}}$$

  ❧ If the dictionary contains an orthonormal basis,

  $$\mu \ge \frac{1}{\sqrt{d}}$$

  ❧ Incoherent dictionaries can be enormous [GMS 2003]
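For concreteness, a quick numerical check of the two lower bounds above (my own snippet; the dimensions are arbitrary):

```python
import numpy as np

def coherence_lower_bound(d, N):
    """General lower bound on the coherence of N unit-norm atoms in dimension d."""
    return np.sqrt((N - d) / (d * (N - 1)))

d, N = 64, 128
print(coherence_lower_bound(d, N))   # general bound, about 0.089 here
print(1 / np.sqrt(d))                # bound when the dictionary contains an ONB: 0.125
```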

  11. Quasi-Coherence ❦ ❧ Donoho–Elad [2003] and JAT [2003] independently introduced the quasi-coherence:

  $$\mu_1(m) = \max_{\omega} \; \max_{\lambda_1, \dots, \lambda_m} \; \sum_{t=1}^{m} \bigl| \langle \varphi_\omega, \varphi_{\lambda_t} \rangle \bigr|$$

  where the inner maximum runs over m distinct indices, none equal to ω ❧ Observe that $\mu_1(1) = \mu$ ❧ Generalizes the coherence: $\mu_1(m) \le \mu m$
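Computed from the Gram matrix, µ_1(m) is the worst-case sum of the m largest off-diagonal magnitudes in any row. A minimal sketch (my own; assumes a NumPy array D with unit-norm columns):

```python
import numpy as np

def quasi_coherence(D, m):
    """mu_1(m): worst-case sum of the m largest |<phi_omega, phi_lambda>|, lambda != omega."""
    G = np.abs(D.conj().T @ D)
    np.fill_diagonal(G, 0.0)               # exclude lambda = omega
    G.sort(axis=1)                         # ascending within each row
    return G[:, -m:].sum(axis=1).max()     # sum the m largest entries, take the worst row
```

Note that quasi_coherence(D, 1) reduces to the coherence, matching $\mu_1(1) = \mu$.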

  12. Quasi-Coherence Example ❦ ❧ Consider the dictionary of translates of a double pulse [figure: a pulse with heights $\sqrt{35}/6$ and $1/6$] ❧ The coherence is $\mu = \sqrt{35}/36$ ❧ The quasi-coherence is

  $$\mu_1(m) = \begin{cases} \sqrt{35}/36, & m = 1 \\ \sqrt{35}/18, & m = 2 \\ \sqrt{35}/12, & m \ge 3 \end{cases}$$

  13. Roadmap ❦ ❧ First, a few basic algorithms for sparse approximation ❧ Then, the role of quasi-coherence in the performance of these algorithms ❧ Finally, a new algorithm that offers better approximation guarantees

  14. Matching Pursuit (MP) ❦ ❧ In 1993, Mallat and Zhang presented a greedy method for sparse approximation over redundant dictionaries ❧ Equivalent to Projection Pursuit Regression [Friedman–Stuetzle 1981] ❧ Developed independently by Qian and Chen [1993] ❧ Procedure: 1. Initialize $a_0 = 0$ and $r_0 = x$ 2. At step t, select an atom $\varphi_{\lambda_t}$ that solves

  $$\max_{\omega} \; \bigl| \langle r_{t-1}, \varphi_\omega \rangle \bigr|$$
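To make the procedure concrete, here is a minimal sketch of the MP iteration. The selection rule matches step 2 above; the approximation and residual updates are the standard Mallat–Zhang updates, which are not shown on the slide and are included here for completeness. Variable names are my own.

```python
# Matching Pursuit: greedily peel off one atom's contribution per step.
import numpy as np

def matching_pursuit(x, D, num_steps):
    """D is a d x N array of unit-norm atoms; returns the approximation and residual."""
    a = np.zeros_like(x)                 # current approximation a_t
    r = x.copy()                         # current residual r_t
    for _ in range(num_steps):
        corr = D.conj().T @ r                      # <r_{t-1}, phi_omega> for every atom
        k = int(np.argmax(np.abs(corr)))           # greedy selection (step 2 above)
        a = a + corr[k] * D[:, k]                  # add the selected component
        r = r - corr[k] * D[:, k]                  # remove it from the residual
    return a, r
```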
