SLIDE 1

Reconstruction from Anisotropic Random Measurements

Mark Rudelson and Shuheng Zhou

The University of Michigan, Ann Arbor

Coding, Complexity, and Sparsity Workshop, 2013, Ann Arbor, Michigan

August 7, 2013

SLIDE 2

Want to estimate a parameter β ∈ R^p.

Example: How is a response y ∈ R, related to Parkinson's disease, affected by a set of genes among the Chinese population? Construct a linear model: y = βᵀx + ε, where E(y | x) = βᵀx.

◮ Parameter: the non-zero entries of β (the sparsity of β) identify a subset of genes and indicate how much they influence y.

Take a random sample (X, Y), and use the sample to estimate β; that is, we have Y = Xβ + ε.

SLIDE 3

Model selection and parameter estimation

When can we approximately recover β from n noisy observations Y?

Questions:
◮ How many measurements n do we need in order to recover the non-zero positions in β?
◮ How does n scale with p or s, where s is the number of non-zero entries of β?
◮ What assumptions about the data matrix X are reasonable?

SLIDE 4

Sparse recovery

When β is known to be s-sparse for some 1 ≤ s ≤ n, which means that at most s of the coefficients of β can be non-zero:

Assume every 2s columns of X are linearly independent (an identifiability condition, reasonable once n ≥ 2s):

Λmin(2s) := min_{υ≠0, υ 2s-sparse} ‖Xυ‖₂² / (n ‖υ‖₂²) > 0.

Proposition (Candès-Tao 05). Suppose that any 2s columns of the n × p matrix X are linearly independent. Then any s-sparse signal β ∈ R^p can be reconstructed uniquely from Xβ.
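As a sanity check (not from the slides), Λmin(2s) can be computed by brute force for a toy design matrix: over each fixed support T of size 2s the minimum is the smallest eigenvalue of X_Tᵀ X_T / n. The enumeration is exponential in 2s and only meant for tiny p; all sizes below are illustrative.

```python
import itertools
import numpy as np

def lambda_min_2s(X, s):
    """Brute-force Lambda_min(2s) = min over 2s-sparse v != 0 of
    ||X v||_2^2 / (n ||v||_2^2).  Over a fixed support T this equals the
    smallest eigenvalue of X_T^T X_T / n, so enumerate supports (tiny p only).
    """
    n, p = X.shape
    best = np.inf
    for T in itertools.combinations(range(p), 2 * s):
        lam = np.linalg.eigvalsh(X[:, T].T @ X[:, T] / n).min()
        best = min(best, lam)
    return best

rng = np.random.default_rng(0)
n, p, s = 8, 12, 2
X = rng.standard_normal((n, p))   # Gaussian design: any 2s columns are a.s. independent
print(lambda_min_2s(X, s))        # strictly positive: beta is identifiable
```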

SLIDE 5

ℓ0-minimization

How to reconstruct an s-sparse signal β ∈ R^p from the measurements Y = Xβ, given Λmin(2s) > 0? Let β̂ be the unique sparsest solution to Xβ = Y:

β̂ = arg min_{β : Xβ=Y} ‖β‖₀, where ‖β‖₀ := #{1 ≤ i ≤ p : βi ≠ 0} is the sparsity of β.

Unfortunately, ℓ0-minimization is computationally intractable (in fact, it is an NP-complete problem).
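The intractability is visible in the only general-purpose algorithm available: exhaustive search over supports. A small illustrative sketch (toy sizes, not from the slides):

```python
import itertools
import numpy as np

def l0_min(X, Y, s_max):
    """Exhaustive l0-minimization: scan supports of growing size and return
    the sparsest beta (as a dense vector in R^p) with X beta = Y."""
    n, p = X.shape
    if np.allclose(Y, 0):
        return np.zeros(p)                  # the 0-sparse solution
    for s in range(1, s_max + 1):
        for T in itertools.combinations(range(p), s):
            cols = list(T)
            beta_T, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
            if np.allclose(X[:, cols] @ beta_T, Y):   # exact fit found
                beta = np.zeros(p)
                beta[cols] = beta_T
                return beta
    return None

rng = np.random.default_rng(1)
n, p = 6, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[2, 7]] = [1.5, -2.0]           # 2-sparse signal, n = 6 >= 2s
beta_hat = l0_min(X, X @ beta_true, 2)
print(np.allclose(beta_hat, beta_true))   # unique recovery, a.s. for Gaussian X
```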

SLIDE 6

Basis pursuit

Consider the following convex optimization problem:

β* := arg min_{β : Xβ=Y} ‖β‖₁.

Basis pursuit works whenever the n × p measurement matrix X is sufficiently incoherent. RIP (Candès-Tao 05) requires that for all T ⊂ {1, ..., p} with |T| ≤ s and for all coefficient sequences (cj)_{j∈T},

(1 − δs) ‖c‖₂² ≤ ‖X_T c‖₂² / n ≤ (1 + δs) ‖c‖₂²

holds for some 0 < δs < 1 (the s-restricted isometry constant). The "good" matrices for compressed sensing should satisfy these inequalities for the largest possible s.
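Basis pursuit is a linear program, so a minimal sketch can be built on `scipy.optimize.linprog` via the standard split β = u − v with u, v ≥ 0. This is only one possible implementation (the slides do not prescribe one), and it assumes SciPy is available; all sizes are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, Y):
    """Solve min ||b||_1 s.t. X b = Y as an LP: write b = u - v with
    u, v >= 0 and minimize sum(u) + sum(v) = ||b||_1."""
    n, p = X.shape
    c = np.ones(2 * p)
    A_eq = np.hstack([X, -X])            # X u - X v = Y
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=(0, None))
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(2)
n, p = 20, 40
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[3, 17, 31]] = [1.0, -2.0, 0.5]     # 3-sparse signal, n = 20 measurements
beta_hat = basis_pursuit(X, X @ beta)
print(np.max(np.abs(beta_hat - beta)))   # tiny: exact recovery by l1-minimization
```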

SLIDE 7

Restricted Isometry Property (RIP): examples

◮ For a Gaussian random matrix, or any sub-Gaussian ensemble, RIP holds with s ≍ n / log(p/n).

◮ For the random Fourier ensemble, or randomly sampled rows of orthonormal matrices, RIP holds for s = O(n / log⁴ p).

◮ For a random matrix composed of columns that are independent isotropic vectors with log-concave densities, RIP holds for s = O(n / log²(p/n)).

References: Candès-Tao 05, 06, Rudelson-Vershynin 05, Donoho 06, Baraniuk et al. 08, Mendelson et al. 08, Adamczak et al. 09.

SLIDE 8

Basis pursuit for high dimensional data

These algorithms are also robust to noise, and RIP will be replaced by more relaxed conditions. In particular, the isotropicity condition, which has been assumed in all the literature cited above, needs to be dropped.

Let Xi ∈ R^p, i = 1, ..., n, be the i.i.d. random row vectors of the design matrix X.

Covariance matrix: Σ(Xi) = E Xi ⊗ Xi = E Xi Xiᵀ.

Sample covariance matrix: Σ̂n = (1/n) ∑_{i=1}^n Xi ⊗ Xi = (1/n) ∑_{i=1}^n Xi Xiᵀ.

Xi is isotropic if Σ(Xi) = I, in which case E ‖Xi‖₂² = p.

SLIDE 9

Sparse recovery for Y = Xβ + ε

Lasso (Tibshirani 96), a.k.a. Basis Pursuit (Chen, Donoho and Saunders 98, and others):

β̂ = arg min_β ‖Y − Xβ‖₂² / (2n) + λn ‖β‖₁,

where the scaling factor 1/(2n) is chosen for convenience.

Dantzig selector (Candès-Tao 07):

(DS) arg min_{β̂∈R^p} ‖β̂‖₁ subject to ‖Xᵀ(Y − Xβ̂)/n‖∞ ≤ λn.

References: Greenshtein-Ritov 04, Meinshausen-Bühlmann 06, Zhao-Yu 06, Bunea et al. 07, Candès-Tao 07, van de Geer 08, Zhang-Huang 08, Wainwright 09, Koltchinskii 09, Meinshausen-Yu 09, Bickel et al. 09, and others.
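As an illustration (not part of the slides), the Lasso objective above can be minimized with plain proximal gradient descent (ISTA). The step size 1/L and the choice of λn below are illustrative; λn is taken of order σ√(2 log p / n) as in the theory.

```python
import numpy as np

def lasso_ista(X, Y, lam, n_iter=500):
    """Minimize ||Y - X b||_2^2 / (2n) + lam * ||b||_1 by proximal gradient
    (ISTA): a gradient step on the quadratic part, then soft-thresholding."""
    n, p = X.shape
    L = np.linalg.eigvalsh(X.T @ X / n).max()    # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - (X.T @ (X @ beta - Y) / n) / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return beta

rng = np.random.default_rng(3)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[5, 50, 150]] = [2.0, -1.5, 1.0]
Y = X @ beta_true + 0.1 * rng.standard_normal(n)     # sigma = 0.1 noise
lam = 0.1 * np.sqrt(2 * np.log(p) / n)               # ~ sigma * sqrt(2 log p / n)
beta_hat = lasso_ista(X, Y, lam)
print(np.flatnonzero(np.abs(beta_hat) > 0.5))        # recovers the support {5, 50, 150}
```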
SLIDE 10

The Cone Constraint

For an appropriately chosen λn, the solution of the Lasso or the Dantzig selector satisfies (under i.i.d. Gaussian noise), with high probability,

υ := β̂ − β ∈ C(s, k0),

with k0 = 1 for the Dantzig selector and k0 = 3 for the Lasso.

Object of interest: for 1 ≤ s0 ≤ p and a positive number k0,

C(s0, k0) = { x ∈ R^p | ∃ J ⊂ {1, ..., p}, |J| = s0, s.t. ‖x_{J^c}‖₁ ≤ k0 ‖x_J‖₁ }.

This object has appeared in earlier work in the noiseless setting.

References: Donoho-Huo 01, Elad-Bruckstein 02, Feuer-Nemirovski 03, Candès-Tao 07, Bickel-Ritov-Tsybakov 09, Cohen-Dahmen-DeVore 09.

SLIDE 11

The Lasso solution

SLIDE 12

Restricted Eigenvalue (RE) condition

Object of interest:

C(s0, k0) = { x ∈ R^p | ∃ J ⊂ {1, ..., p}, |J| = s0, s.t. ‖x_{J^c}‖₁ ≤ k0 ‖x_J‖₁ }.

Definition

A matrix A_{q×p} satisfies the RE(s0, k0, A) condition with parameter K(s0, k0, A) if for any υ ≠ 0,

1/K(s0, k0, A) := min_{J⊆{1,...,p}, |J|≤s0}  min_{‖υ_{J^c}‖₁ ≤ k0 ‖υ_J‖₁}  ‖Aυ‖₂ / ‖υ_J‖₂ > 0.

References: van de Geer 07, Bickel-Ritov-Tsybakov 09, van de Geer-Bühlmann 09.

SLIDE 13

An elementary estimate

Lemma

For each vector υ ∈ C(s0, k0), let T0 denote the locations of the s0 largest coefficients of υ in absolute value. Then

‖υ_{T0^c}‖₁ ≤ k0 ‖υ_{T0}‖₁,  and  ‖υ_{T0}‖₂ ≥ ‖υ‖₂ / √(1 + k0).

Implication: let A be a q × p matrix such that the RE(s0, 3k0, A) condition holds with 0 < K(s0, 3k0, A) < ∞. Then for all υ ∈ C(s0, k0) ∩ S^{p−1},

‖Aυ‖₂ ≥ ‖υ_{T0}‖₂ / K(s0, k0, A) ≥ 1 / (K(s0, k0, A) · √(1 + k0)) > 0.
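Both claims of the lemma can be checked numerically on a randomly generated vector of the cone C(s0, k0); the construction below (support J, rescaling of the off-support part) is a hypothetical example, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(4)
p, s0, k0 = 50, 5, 3.0

# Construct v in the cone C(s0, k0): choose support J = {0,...,s0-1} and shrink
# the off-support part so that ||v_{J^c}||_1 <= k0 ||v_J||_1 holds strictly.
v = rng.standard_normal(p)
J = np.arange(s0)
v[s0:] *= 0.9 * k0 * np.abs(v[J]).sum() / np.abs(v[s0:]).sum()

T0 = np.argsort(-np.abs(v))[:s0]     # locations of the s0 largest |coordinates|
mask = np.zeros(p, dtype=bool)
mask[T0] = True

claim1 = np.abs(v[~mask]).sum() <= k0 * np.abs(v[mask]).sum()
claim2 = np.linalg.norm(v[mask]) >= np.linalg.norm(v) / np.sqrt(1 + k0)
print(claim1, claim2)                # both inequalities of the lemma hold
```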

SLIDE 14

Sparse eigenvalues

Definition

For m ≤ p, we define the largest and smallest m-sparse eigenvalues of a q × p matrix A to be

ρmax(m, A) := max_{t∈R^p, t≠0, m-sparse} ‖At‖₂² / ‖t‖₂²,
ρmin(m, A) := min_{t∈R^p, t≠0, m-sparse} ‖At‖₂² / ‖t‖₂².

If RE(s0, k0, A) is satisfied with k0 ≥ 1, then the square submatrices of size 2s0 of AᵀA are necessarily positive definite; that is, ρmin(2s0, A) > 0.
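For tiny p the m-sparse eigenvalues can be computed exactly by enumerating supports, since over a fixed support T of size m they are the extreme eigenvalues of A_Tᵀ A_T. A brute-force sketch (not from the slides, illustrative sizes):

```python
import itertools
import numpy as np

def sparse_eigs(A, m):
    """Exact m-sparse eigenvalues of A by support enumeration: over a fixed
    support T of size m they are the extreme eigenvalues of A_T^T A_T."""
    q, p = A.shape
    lo, hi = np.inf, -np.inf
    for T in itertools.combinations(range(p), m):
        w = np.linalg.eigvalsh(A[:, T].T @ A[:, T])
        lo, hi = min(lo, w[0]), max(hi, w[-1])
    return lo, hi

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 10))
rho_min, rho_max = sparse_eigs(A, 3)
print(rho_min, rho_max)   # rho_min > 0: every 3-column submatrix has full rank
```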

SLIDE 15

Examples of matrices A which satisfy the Restricted Eigenvalue condition, but not RIP (Raskutti, Wainwright, and Yu 10):

Spiked identity matrix: for a ∈ [0, 1),

Σ_{p×p} = (1 − a) I_{p×p} + a 1 1ᵀ, where 1 ∈ R^p is the vector of all ones; ρmin(Σ) > 0.

Then for every s0 × s0 principal submatrix Σ_SS we have

ρmax(Σ_SS) / ρmin(Σ_SS) = (1 + a(s0 − 1)) / (1 − a).

The largest sparse eigenvalue → ∞ as s0 → ∞, but ‖Σ^{1/2} ej‖₂ = 1 remains bounded.
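The ratio above is easy to confirm numerically; the sizes p, a, s0 below are arbitrary illustrative choices.

```python
import numpy as np

p, a, s0 = 100, 0.5, 10
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))   # spiked identity matrix

S = Sigma[:s0, :s0]                  # any s0 x s0 principal submatrix
w = np.linalg.eigvalsh(S)
ratio = w[-1] / w[0]                 # rho_max(S) / rho_min(S)
predicted = (1 + a * (s0 - 1)) / (1 - a)
print(ratio, predicted)              # both equal 11.0 for these parameters
```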

SLIDE 16

Motivation: to construct classes of design matrices for which the Restricted Eigenvalue condition is satisfied.

◮ Design matrix X with merely independent rows, rather than independent entries: e.g., for some matrix A_{q×p}, consider X = ΨA, where the rows of the matrix Ψ_{n×q} are independent isotropic vectors with subgaussian marginals, and RE(s0, (1 + ε)k0, A) holds for some ε > 0, p > s0 ≥ 0, and k0 > 0.

◮ Design matrix X consisting of independent identically distributed rows with bounded entries, whose covariance matrix Σ(Xi) = E Xi Xiᵀ satisfies RE(s0, (1 + ε)k0, Σ^{1/2}).

The rows of X will be sampled from some distribution in R^p; the distribution may be highly non-Gaussian and perhaps discrete.

SLIDE 17

Outline

Introduction

The main results

◮ The reduction principle
◮ Applications of the reduction principle

Ingredients of the proof

Conclusion

SLIDE 18

Notation

Let e1, ..., ep be the canonical basis of R^p. For a set J ⊂ {1, ..., p}, denote EJ = span{ej : j ∈ J}. For a matrix A, we use ‖A‖₂ to denote its operator norm. For a set V ⊂ R^p, we let conv V denote the convex hull of V. For a finite set Y, the cardinality is denoted by |Y|. Let B₂^p and S^{p−1} be the unit Euclidean ball and the unit sphere, respectively.

SLIDE 19

The reduction principle:

Theorem

Let E = ∪_{|J|=d} EJ for d = d(3k0) < p, where

d(3k0) = s0 + s0 · max_j ‖Aej‖₂² · 16 K²(s0, 3k0, A) (3k0)² (3k0 + 1) / δ²,

and E denotes R^p otherwise. Let Ψ be a matrix such that for all x ∈ AE,

(1 − δ) ‖x‖₂ ≤ ‖Ψx‖₂ ≤ (1 + δ) ‖x‖₂.

Then RE(s0, k0, ΨA) holds with

0 < K(s0, k0, ΨA) ≤ K(s0, k0, A) / (1 − 5δ).

In words: if the matrix Ψ acts as an almost isometry on the images of the d-sparse vectors under A, then the product ΨA satisfies the RE condition with the smaller parameter k0.

SLIDE 20

Reformulation of the reduction principle:

Theorem (restricted isometry)

Let Ψ be a matrix such that for all x ∈ AE,

(1 − δ) ‖x‖₂ ≤ ‖Ψx‖₂ ≤ (1 + δ) ‖x‖₂.

Then for any x ∈ A(C(s0, k0)) ∩ S^{q−1},

(1 − 5δ) ≤ ‖Ψx‖₂ ≤ (1 + 3δ).

If the matrix Ψ acts as an almost isometry on the images of the d-sparse vectors under A, then it acts the same way on the images of C(s0, k0). The problem is thus reduced to checking that the almost isometry property holds for all vectors in some low-dimensional subspaces, which is easier than checking the RE property directly.

SLIDE 21

Definition: subgaussian random vectors

Let Y be a random vector in R^p.

1. Y is called isotropic if for every y ∈ R^p, E |⟨Y, y⟩|² = ‖y‖₂².

2. Y is ψ2 with constant α if for every y ∈ R^p,

‖⟨Y, y⟩‖_{ψ2} := inf{ t : E exp(⟨Y, y⟩² / t²) ≤ 2 } ≤ α ‖y‖₂.

The ψ2 condition on a scalar random variable V is equivalent to the subgaussian tail decay of V, which means that for some constant c,

P(|V| > t) ≤ 2 exp(−t²/c²), for all t > 0.

A random vector Y in R^p is subgaussian if the one-dimensional marginals ⟨Y, y⟩ are subgaussian random variables for all y ∈ R^p.

SLIDE 22

The first application of the reduction principle

Let A be a q × p matrix satisfying the RE(s0, 3k0, A) condition. Let m = min(d, p), where

d = s0 + s0 · max_j ‖Aej‖₂² · 16 K²(s0, 3k0, A) (3k0)² (3k0 + 1) / δ².

Theorem

Let Ψ be an n × q matrix whose rows are independent isotropic ψ2 random vectors in R^q with constant α. Suppose

n ≥ (2000 m α⁴ / δ²) log(60 e p / (m δ)).

Then with probability at least 1 − 2 exp(−δ² n / (2000 α⁴)), the RE(s0, k0, (1/√n)ΨA) condition holds with

0 < K(s0, k0, (1/√n)ΨA) ≤ K(s0, k0, A) / (1 − δ).

SLIDE 23

Examples of subgaussian vectors

◮ The random vector Y with i.i.d. N(0, 1) coordinates.

◮ The discrete Gaussian vector: a random vector taking values on the integer lattice Z^p with distribution P(X = m) = C exp(−‖m‖₂²/2) for m ∈ Z^p.

◮ A vector with independent centered bounded random coordinates. In particular, vectors with random symmetric Bernoulli coordinates, in other words, random vertices of the discrete cube.
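A quick empirical check (not from the slides) that the third example is isotropic: for a random vertex Y of the discrete cube, E YYᵀ = I, so the sample covariance of many draws should be close to the identity.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 5, 200_000

# Random vertices of the discrete cube: +/-1 coordinates with equal probability.
Y = rng.choice([-1.0, 1.0], size=(n, p))
Sigma_hat = Y.T @ Y / n              # sample covariance; E Y Y^T = I exactly
print(np.max(np.abs(Sigma_hat - np.eye(p))))   # O(1/sqrt(n)), i.e. small
```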

SLIDE 24

Previous results on (sub)Gaussian random vectors

Raskutti, Wainwright, and Yu 10: RE(s0, k0, X) holds for a random Gaussian measurement/design matrix X which consists of n = O(s0 log p) independent copies of a Gaussian random vector Y ∼ N_p(0, Σ), assuming that the RE condition holds for Σ^{1/2}. Their proof relies on a deep result from the theory of Gaussian random processes, Gordon's Minimax Lemma.

To establish the RE condition for more general classes of random matrices, we introduce a new approach based on geometric functional analysis, namely the reduction principle.

The bound n = O(s0 log p) can be improved to the optimal one, n = O(s0 log(p/s0)), when RE(s0, k0, Σ^{1/2}) is replaced with RE(s0, (1 + ε)k0, Σ^{1/2}) for any ε > 0.

SLIDE 25

In Zhou 09, subgaussian random matrices of the form X = ΨΣ^{1/2} were considered, where Σ is a p × p positive semidefinite matrix: X satisfies the RE(s0, k0) condition with overwhelming probability if, for K := K(s0, k0, Σ^{1/2}),

n > (9c′α⁴/δ²) (2 + k0)² K² ( 4 ρmax(s0, Σ^{1/2}) s0 log(5ep/s0) ∧ s0 log p ).

The analysis there used a result in Mendelson et al. 07, 08. The current result involves neither ρmax(s0, A) nor the global parameters of the matrices A and Ψ, such as the norm or the smallest singular value.

Recall the spiked identity matrix: for a ∈ [0, 1), Σ_{p×p} = (1 − a) I_{p×p} + a 1 1ᵀ satisfies the RE condition, while ρmax(s0, Σ^{1/2}) grows linearly with s0 and the maximum of ‖Σ^{1/2} ej‖₂ = 1 stays bounded.
SLIDE 26

Design matrices with uniformly bounded entries

Let Y ∈ R^p be a random vector such that ‖Y‖∞ ≤ M a.s., and denote Σ = E YYᵀ. Let X be an n × p matrix whose rows X1, ..., Xn are independent copies of Y. Set

d = s0 + s0 · max_j ‖Σ^{1/2} ej‖₂² · 16 K²(s0, 3k0, Σ^{1/2}) (3k0)² (3k0 + 1) / δ².

Theorem

Assume that d ≤ p and ρ = ρmin(d, Σ^{1/2}) > 0. Let Σ satisfy the RE(s0, 3k0, Σ^{1/2}) condition. Suppose

n ≥ C (M² d log p / (ρ δ²)) · log³( C M² d log p / (ρ δ²) ).

Then with probability at least 1 − exp(−δρn/(6M²d)), RE(s0, k0, X) holds for the matrix X/√n with

0 < K(s0, k0, X/√n) ≤ K(s0, k0, Σ^{1/2}) / (1 − δ).

SLIDE 27

Remarks on applying the reduction principle

To analyze different classes of random design matrices:

◮ Unlike the case of a random matrix with subgaussian marginals, the estimate in the second example contains the minimal sparse singular value ρmin(d, Σ^{1/2}).

◮ The reconstruction of sparse signals by subgaussian design matrices or by the random Fourier ensemble was analyzed in the literature before, however only under RIP assumptions.

◮ The reduction principle can be applied to other types of random variables: e.g., random vectors with heavy-tailed marginals, or random vectors with log-concave densities.

References: Rudelson-Vershynin 05, Baraniuk et al. 08, Mendelson et al. 08, Vershynin 11a, b, Adamczak et al. 09.

SLIDE 28

Maurey’s empirical approximation argument (Pisier 81)

Let u1, ..., uM ∈ R^q and let y ∈ conv(u1, ..., uM):

y = ∑_{j∈{1,...,M}} αj uj, where αj ≥ 0 and ∑_j αj = 1.

Then there exists a set L ⊂ {1, 2, ..., M} with

|L| ≤ m = 4 max_{j∈{1,...,M}} ‖uj‖₂² / ε²,

and a vector y′ ∈ conv(uj : j ∈ L) such that ‖y′ − y‖₂ ≤ ε.

Proof: an application of the probabilistic method. If we only want to approximate y, rather than exactly represent it as a convex combination of u1, ..., uM, this is possible with much fewer points, namely u1, ..., u_{|L|}.
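Maurey's argument is constructive and can be run directly: sample m indices with probabilities αj and average. The sketch below (not from the slides) uses unit vectors so that max_j ‖uj‖₂² = 1; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
q, M = 30, 1000

U = rng.standard_normal((M, q))
U /= np.linalg.norm(U, axis=1, keepdims=True)    # unit vectors: max_j ||u_j||_2 = 1

alpha = rng.random(M)
alpha /= alpha.sum()                              # convex weights
y = alpha @ U                                     # y in conv(u_1, ..., u_M)

eps = 0.3
m = int(np.ceil(4 * 1.0 / eps**2))                # m = 4 max_j ||u_j||_2^2 / eps^2

idx = rng.choice(M, size=m, p=alpha)              # sample Y_1, ..., Y_m i.i.d. ~ alpha
y_prime = U[idx].mean(axis=0)                     # in conv(u_l : l in L), |L| <= m
print(np.linalg.norm(y_prime - y))                # <= eps with high probability
```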

SLIDE 29

Let y = ∑_{j∈{1,...,M}} αj uj, where αj ≥ 0 and ∑_j αj = 1.

(Figure: the points u1, ..., uM with y inside their convex hull.)

Goal: to find a vector y′ ∈ conv(uj : j ∈ L) such that ‖y′ − y‖₂ ≤ ε.
SLIDE 30

Let Y be a random vector in R^q such that P(Y = uℓ) = αℓ, ℓ ∈ {1, ..., M}. Then

E(Y) = ∑_{ℓ∈{1,...,M}} αℓ uℓ = y.

(Figure: y as the expectation of the random vertex Y.)
SLIDE 31

Let Y1, ..., Ym be independent copies of Y, and let ε1, ..., εm be i.i.d. mean-zero ±1 Bernoulli random variables, chosen independently of Y1, ..., Ym. By the standard symmetrization argument we have

E ‖ y − (1/m) ∑_{j=1}^m Yj ‖₂²  ≤  4 E ‖ (1/m) ∑_{j=1}^m εj Yj ‖₂²  =  (4/m²) ∑_{j=1}^m E ‖Yj‖₂²  ≤  4 max_{ℓ∈{1,...,M}} ‖uℓ‖₂² / m  ≤  ε²,   (1)

where E ‖Yj‖₂² ≤ sup ‖Yj‖₂² ≤ max_{ℓ∈{1,...,M}} ‖uℓ‖₂², and the last inequality in (1) follows from the definition of m.

SLIDE 32

Fix a realization Yj = u_{kj}, j = 1, ..., m, for which

‖ y − (1/m) ∑_{j=1}^m Yj ‖₂ ≤ ε.

(Figure: the sampled points u_{k1}, u_{k2}, ..., u_{km} among u1, ..., uM.)

The vector (1/m) ∑_{j=1}^m Yj belongs to the convex hull of {uℓ : ℓ ∈ L}, where L is the set of distinct elements of the sequence k1, ..., km. Obviously |L| ≤ m, and the lemma is proved. Q.E.D.

SLIDE 33

The Inclusion Lemma

To prove the restricted isometry of Ψ over the set of vectors in A(C(s0, k0)) ∩ S^{q−1}, we show that this set is contained in the convex hull of the images of the sparse vectors with norms not exceeding (1 − δ)^{−1}.

Lemma

Let 1 > δ > 0. Suppose the RE(s0, k0, A) condition holds for the matrix A_{q×p}. For a set J ⊂ {1, ..., p}, let EJ = span{ej : j ∈ J}. Set

d = d(k0, A) = s0 + s0 · max_j ‖Aej‖₂² · 16 K²(s0, k0, A) k0² (k0 + 1) / δ².

Then

A(C(s0, k0)) ∩ S^{q−1} ⊂ (1 − δ)^{−1} conv( ∪_{|J|≤d} A EJ ∩ S^{q−1} ),

where for d ≥ p, EJ is understood to be R^p.

SLIDE 34

Conclusion

We prove a general reduction principle showing that if the matrix Ψ acts as an almost isometry on the images of the sparse vectors under A, then the product ΨA satisfies the RE condition with a smaller parameter k0.

We apply the reduction principle to analyze different classes of random design matrices. This analysis is reduced to checking that the almost isometry property holds for all vectors in some low-dimensional subspaces, which is easier than checking the RE property directly.

SLIDE 35

Thank you!