Sparse Representations
❦
Joel A. Tropp
Department of Mathematics, The University of Michigan
jtropp@umich.edu
Research supported in part by NSF and DARPA
Systems of Linear Equations
We consider linear systems of the form Φx = b
Assume that
❧ Φ has dimensions d × N with N ≥ d
❧ Φ has full rank
❧ The columns of Φ have unit ℓ2 norm
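As a concrete illustration, here is a minimal NumPy sketch of such a system; the dimensions and the random construction are illustrative choices, not from the talk.

```python
import numpy as np

# Hypothetical instance of the setup: d x N with N >= d, unit-norm columns
d, N = 64, 256
rng = np.random.default_rng(0)
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)        # normalize each column to unit l2 norm
assert np.linalg.matrix_rank(Phi) == d    # full rank (holds almost surely here)
```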
The Trichotomy Theorem
Theorem 1. For a linear system Φx = b, exactly one of the following situations obtains:
❧ The system has no solution
❧ The system has exactly one solution
❧ The system has infinitely many solutions
Minimum-Energy Solutions
Classical approach to underdetermined systems:
min ‖x‖2 subject to Φx = b
Advantages:
❧ Analytically tractable
❧ Physical interpretation as minimum energy
❧ Principled way to pick a unique solution
Disadvantages:
❧ Solution is typically nonzero in every component
❧ The wrong principle for most applications
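A small sketch of the minimum-energy solution via the pseudoinverse (dimensions invented for illustration), which also exhibits the density drawback listed above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 8, 32
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)
b = rng.standard_normal(d)

# Minimum-energy solution: x = pinv(Phi) @ b = Phi^T (Phi Phi^T)^{-1} b
x = np.linalg.pinv(Phi) @ b
print(np.allclose(Phi @ x, b))    # True: the system is solved...
print(np.count_nonzero(x))        # 32: ...but every component is nonzero
```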
Regularization via Sparsity
Another approach to underdetermined systems:
min ‖x‖0 subject to Φx = b (P0)
where ‖x‖0 = #{j : xj ≠ 0}
Advantages:
❧ Principled way to choose a solution
❧ A good principle for many applications
Disadvantages:
❧ In general, computationally intractable
Sparse Approximation
❧ In practice, we solve a noise-aware variant, such as
min ‖x‖0 subject to ‖Φx − b‖2 ≤ ε
❧ This is called a sparse approximation problem
❧ The noiseless problem (P0) corresponds to ε = 0
❧ The ε = 0 case is called the sparse representation problem
Variable Selection in Regression
❧ The oldest application of sparse approximation is linear regression
❧ The columns of Φ are explanatory variables
❧ The right-hand side b is the response variable
❧ Φx is a linear predictor of the response
❧ Want to use few explanatory variables
❧ Reduces variance of estimator
❧ Limits sensitivity to noise
Reference: [Miller 2002]
Seismic Imaging
"In deconvolving any observed seismic trace, it is rather disappointing to discover that there is a nonzero spike at every point in time regardless of the data sampling rate. One might hope to find spikes only where real geologic discontinuities take place." References: [Claerbout–Muir 1973]
Transform Coding
❧ Transform coding can be viewed as a sparse approximation problem
[Figure: an image is mapped by the DCT to transform coefficients, and the IDCT reconstructs it from a few large coefficients]
Reference: [Daubechies–DeVore–Donoho–Vetterli 1998]
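A toy one-dimensional sketch of the idea using SciPy's DCT; the test signal and the sparsity budget k are invented for illustration:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(2)
signal = np.cos(np.linspace(0, 8 * np.pi, 256)) + 0.05 * rng.standard_normal(256)

coeffs = dct(signal, norm='ortho')
k = 16                                        # sparsity budget (illustrative)
coeffs[np.argsort(np.abs(coeffs))[:-k]] = 0   # zero all but the k largest
approx = idct(coeffs, norm='ortho')           # reconstruct from k coefficients
print(np.linalg.norm(signal - approx))        # small: the signal is compressible
```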
Sparse Representation is Hard
Theorem 2. [Davis (1994), Natarajan (1995)] Any algorithm that can solve the sparse representation problem for every matrix and right-hand side must solve an NP-hard problem.
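To see the combinatorial character concretely, here is a brute-force sketch for (P0); the function name sparsest_solution is ours, and the exhaustive search is exponential by design:

```python
import numpy as np
from itertools import combinations

def sparsest_solution(Phi, b, tol=1e-10):
    """Exhaustive search for (P0): try every support of size 1, 2, ...
    The cost grows like sum_m binomial(N, m) -- exponential in general."""
    d, N = Phi.shape
    for m in range(1, d + 1):
        for idx in combinations(range(N), m):
            S = list(idx)
            y, *_ = np.linalg.lstsq(Phi[:, S], b, rcond=None)
            if np.linalg.norm(Phi[:, S] @ y - b) <= tol:
                x = np.zeros(N)
                x[S] = y
                return x
    return None   # no exact representation found
```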
But... many interesting instances are tractable!
Basic example: Φ is orthogonal (then x = Φᵀb is the unique solution, and sparse approximations follow by thresholding its entries)
Algorithms for Sparse Representation
❧ Greedy methods make a sequence of locally optimal choices in hope of determining a globally optimal solution
❧ Convex relaxation methods replace the combinatorial sparse approximation problem with a related convex program in hope that the solutions coincide
❧ Other approaches include brute force, nonlinear programming, Bayesian methods, dynamic programming, algebraic techniques...
Refs: [Baraniuk, Barron, Bresler, Candès, DeVore, Donoho, Efron, Fuchs, Gilbert, Golub, Hastie, Huo, Indyk, Jones, Mallat, Muthukrishnan, Rao, Romberg, Stewart, Strauss, Tao, Temlyakov, Tewfik, Tibshirani, Willsky...]
Orthogonal Matching Pursuit (OMP)
Input: The matrix Φ, right-hand side b, and sparsity level m
Initialize the residual r0 = b
For t = 1, . . . , m do
  ωt = arg max_{j=1,...,N} |⟨r_{t−1}, φj⟩|
  Φt = [φω1 . . . φωt]
  yt = arg min_y ‖b − Φt y‖2
  rt = b − Φt yt
Output: Estimate x̂ with x̂(ωj) = ym(j)
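A minimal NumPy sketch of this loop (the function name omp is ours; for complex dictionaries, replace Phi.T with the conjugate transpose):

```python
import numpy as np

def omp(Phi, b, m):
    """Orthogonal Matching Pursuit, following the pseudocode above."""
    d, N = Phi.shape
    residual = b.astype(float).copy()
    support = []
    for _ in range(m):
        # Greedy step: pick the column most correlated with the residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit by least squares over all columns selected so far
        y, *_ = np.linalg.lstsq(Phi[:, support], b, rcond=None)
        residual = b - Phi[:, support] @ y
    x = np.zeros(N)
    x[support] = y
    return x
```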
ℓ1 Minimization
Sparse Representation as a Combinatorial Problem
min ‖x‖0 subject to Φx = b (P0)
Relax to a Convex Program
min ‖x‖1 subject to Φx = b (P1)
❧ Any numerical method can be used to perform the minimization
❧ Projected gradient and interior-point methods seem to work best
References: [Donoho et al. 1999, Figueiredo et al. 2007]
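One standard reduction (a sketch, not necessarily the solvers cited above): (P1) can be cast as a linear program by splitting x into positive and negative parts, here via scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, b):
    """Solve (P1) as an LP: write x = u - v with u, v >= 0, so that
    ||x||_1 = sum(u) + sum(v) at the optimum."""
    d, N = Phi.shape
    c = np.ones(2 * N)                 # objective: 1^T u + 1^T v
    A_eq = np.hstack([Phi, -Phi])      # constraint: Phi u - Phi v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
    u, v = res.x[:N], res.x[N:]
    return u - v
```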
Why an ℓ1 objective?
[Figure: unit balls of the ℓ0 quasi-norm, the ℓ1 norm, and the ℓ2 norm. The ℓ1 ball is the convex body closest to the ℓ0 ball, and its vertices sit at the sparse vectors.]
Relative Merits
                          OMP    (P1)
Computational Cost         ✓
Ease of Implementation     ✓
Effectiveness                     ✓
Key Insight
Sparse representation is tractable when the matrix Φ is sufficiently nice
(More precisely, column submatrices of Φ must be well conditioned)
Quantifying Niceness
❧ We say Φ is incoherent when max_{j≠k} |⟨φj, φk⟩| ≤ 1/√d
❧ Incoherent matrices appear often in signal processing applications
❧ We call Φ a tight frame when ΦΦᵀ = (N/d) I
❧ Tight frames have minimal spectral norm among conformal matrices
Note: Both conditions can be relaxed substantially
Example: Identity + Fourier
[Figure: Φ = [I | F], the d impulses (height 1) alongside the d complex exponentials (magnitude 1/√d): an incoherent tight frame]
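A short numerical check of both properties for this example (the dimension d = 64 is chosen for illustration):

```python
import numpy as np

d = 64
F = np.fft.fft(np.eye(d)) / np.sqrt(d)   # unit-norm complex exponentials
Phi = np.hstack([np.eye(d), F])          # d x 2d dictionary [ I | F ]

# Coherence: the largest inner product between distinct columns is 1/sqrt(d)
G = np.abs(Phi.conj().T @ Phi)
np.fill_diagonal(G, 0)
print(np.isclose(G.max(), 1 / np.sqrt(d)))               # True

# Tight frame: Phi Phi^* = (N/d) I with N = 2d
print(np.allclose(Phi @ Phi.conj().T, 2 * np.eye(d)))    # True
```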
Finding Sparse Solutions
Theorem 3. [T 2004] Let Φ be incoherent. Suppose that the linear system Φx = b has a solution x⋆ that satisfies ‖x⋆‖0 < (1/2)(√d + 1). Then the vector x⋆ is the unique sparsest solution, and both OMP and ℓ1 minimization (P1) recover it.
References: [Donoho–Huo 2001, Greed is Good, Just Relax]
The Square-Root Threshold
❧ Sparse representations are not necessarily unique past the √d threshold
Example: The Dirac Comb
❧ Consider the Identity + Fourier matrix with d = p²
❧ There is a vector b that can be written as either p spikes or p sines
❧ By the Poisson summation formula,
b(t) = Σ_{j=0}^{p−1} δ_{pj}(t) = (1/√d) Σ_{j=0}^{p−1} e^{−2πipjt/d} for t = 0, 1, . . . , d − 1
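A quick numerical confirmation of this identity (with p = 8, chosen for illustration):

```python
import numpy as np

p = 8
d = p * p
t = np.arange(d)

spikes = np.zeros(d)
spikes[::p] = 1.0                   # p spikes at the multiples of p

k = np.arange(p)[:, None]           # sum of p complex exponentials
sines = np.exp(-2j * np.pi * p * k * t / d).sum(axis=0) / np.sqrt(d)

print(np.allclose(spikes, sines))   # True: two representations of the same b
```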
Enter Probability
Insight: The bad vectors are atypical
❧ It is usually possible to identify random sparse vectors
❧ The next theorem is the first step toward quantifying this intuition
Conditioning of Random Submatrices
Theorem 4. [T 2006] Let Φ be an incoherent tight frame with at least twice as many columns as rows. Suppose that m ≤ cd/log d. If A is a random m-column submatrix of Φ, then with high probability A is well conditioned: ‖A∗A − I‖ ≤ 1/2.
The number c is a positive absolute constant.
Reference: [Random Subdictionaries]
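An empirical illustration (not the theorem's proof): sample a random submatrix of the Identity + Fourier frame and measure its conditioning. The constant c = 0.5 here is a guess for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 256
F = np.fft.fft(np.eye(d)) / np.sqrt(d)
Phi = np.hstack([np.eye(d), F])            # incoherent tight frame, N = 2d

m = int(0.5 * d / np.log(d))               # m <= c d / log d, with c = 0.5
cols = rng.choice(2 * d, size=m, replace=False)
A = Phi[:, cols]

# Spectral norm of A*A - I measures conditioning; typically small
print(np.linalg.norm(A.conj().T @ A - np.eye(m), 2))
```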
Recovering Random Sparse Vectors
Model (M) for b = Φx:
❧ The matrix Φ is an incoherent tight frame
❧ The nonzero entries of x number m ≤ cd/log N, have uniformly random positions, and are independent, zero-mean Gaussian RVs
Theorem 5. [T 2006] Let b = Φx be a random vector drawn according to Model (M). Then, with high probability, x is the unique solution to (P1), so ℓ1 minimization recovers it.
Reference: [Random Subdictionaries]
Methods of Proof
❧ Functional success criterion for OMP
❧ Duality results for convex optimization
❧ Banach algebra techniques for estimating matrix norms
❧ Concentration of measure inequalities
❧ Banach space methods for studying spectra of random matrices
❧ Decoupling of dependent random variables
❧ Symmetrization of random subset sums
❧ Noncommutative Khintchine inequalities
❧ Bounds for suprema of empirical processes
Compressive Sampling I
❧ In many applications, signals of interest have sparse representations
❧ Traditional methods acquire the entire signal, then extract information
❧ Sparsity can be exploited when acquiring these signals
❧ Want number of samples proportional to amount of information
❧ Approach: Introduce randomness in the sampling procedure
❧ Assumption: Each random sample has unit cost
Compressive Sampling II
[Figure: a sparse signal x is mapped by a linear measurement process Φ to a short data vector b = Φx]
❧ Given data b = Φx, must identify the sparse signal x
❧ This is a sparse representation problem with a random matrix
References: [Candès–Romberg–Tao 2004, Donoho 2004]
Compressive Sampling and OMP
Theorem 6. [T–Gilbert 2005] Assume that
❧ x is a vector in R^N with m nonzeros and
❧ Φ is a d × N Gaussian matrix with d ≥ Cm log N
❧ Execute OMP with b = Φx to obtain the estimate x̂
Then x̂ equals x with probability at least 99.44%.
Reference: [Signal Recovery via OMP]
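A small demonstration in the spirit of the theorem, reusing the omp sketch from the OMP slide; the constant C = 4 and the random seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 1024, 10
d = 4 * m * int(np.log(N))                       # d >= C m log N, with C = 4
Phi = rng.standard_normal((d, N)) / np.sqrt(d)   # Gaussian measurement matrix

x = np.zeros(N)
support = rng.choice(N, size=m, replace=False)
x[support] = rng.standard_normal(m)              # random m-sparse signal

x_hat = omp(Phi, Phi @ x, m)                     # omp() from the sketch above
print(np.allclose(x_hat, x))                     # recovery succeeds w.h.p.
```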
Compressive Sampling with ℓ1 Minimization
Theorem 7. [Various] Assume that
❧ Φ is a d × N Gaussian matrix with d ≥ Cm log(N/m)
With probability 99.44%, the following statement holds:
❧ Let x be a vector in R^N with m nonzeros
❧ Execute ℓ1 minimization with b = Φx to obtain the estimate x̂
Then x̂ equals x.
References: [Candès et al. 2004–2006], [Donoho et al. 2004–2006], [Rudelson–Vershynin 2006]
Sublinear Compressive Sampling
❧ There are algorithms that can recover sparse signals from random measurements in time proportional to the number of measurements
❧ This is an exponential speedup over OMP and ℓ1 minimization
❧ The cost is a logarithmic number of additional measurements
References: [Algorithmic dimension reduction, One sketch for all]
Joint with Gilbert, Strauss, Vershynin
Simultaneous Sparsity
❧ In some applications, one seeks solutions to the matrix equation ΦX = B where X has a minimal number of nonzero rows
❧ We have studied algorithms for this problem
References: [Simultaneous Sparse Approximation I and II]
Joint with Gilbert, Strauss
Projective Packings
❧ The coherence statistic plays an important role in sparse representation
❧ What can we say about matrices Φ with minimal coherence?
❧ Equivalent to studying packing in projective space
❧ We have theory about when optimal packings can exist
❧ We have numerical algorithms for constructing packings
References: [Existence of ETFs, Constructing Structured TFs, . . . ]
Joint with Dhillon, Heath, Sra, Strohmer, Sustik
To learn more...
Web: http://www.umich.edu/~jtropp
E-mail: jtropp@umich.edu
Partial List of Papers
❧ "Greed is good," Trans. IT, 2004
❧ "Constructing structured tight frames," Trans. IT, 2005
❧ "Just relax," Trans. IT, 2006
❧ "Simultaneous sparse approximation I and II," J. Signal Process., 2006
❧ "One sketch for all," to appear, STOC 2007
❧ "Existence of equiangular tight frames," submitted, 2004
❧ "Signal recovery from random measurements via OMP," submitted, 2005
❧ "Algorithmic dimension reduction," submitted, 2006
❧ "Random subdictionaries," submitted, 2006
❧ "Constructing packings in Grassmannian manifolds," submitted, 2006
Coauthors: Dhillon, Gilbert, Heath, Muthukrishnan, Rice DSP, Sra, Strauss, Strohmer, Sustik, Vershynin