Reconstruction from Anisotropic Random Measurements
Mark Rudelson and Shuheng Zhou
The University of Michigan, Ann Arbor
Coding, Complexity, and Sparsity Workshop 2013, Ann Arbor, Michigan
August 7, 2013
Want to estimate a parameter β ∈ Rp
Example: how is a response y ∈ R, measuring the status of Parkinson's disease, affected by a set of genes in the Chinese population? Construct a linear model: y = βᵀx + ε, where E(y | x) = βᵀx.
◮ Parameter: Non-zero entries in β (sparsity of β) identify a subset of
genes and indicate how much they influence y.
Take a random sample of (X, Y), and use the sample to estimate β; that is, we have Y = Xβ + ε.
Model selection and parameter estimation
When can we approximately recover β from n noisy observations Y? Questions:
◮ How many measurements n do we need in order to recover the non-zero positions in β?
◮ How does n scale with p or s, where s is the number of non-zero entries of β?
◮ What assumptions about the data matrix X are reasonable?
Sparse recovery
When β is known to be s-sparse for some 1 ≤ s ≤ n, at most s of the coefficients of β can be non-zero. Assume every 2s columns of X are linearly independent, an identifiability condition (reasonable once n ≥ 2s):

Λmin(2s) := min_{υ≠0, 2s-sparse} ‖Xυ‖₂² / (n‖υ‖₂²) > 0.

Proposition (Candès-Tao 05). Suppose that any 2s columns of the n × p matrix X are linearly independent. Then any s-sparse signal β ∈ Rp can be reconstructed uniquely from Xβ.
ℓ0-minimization
How do we reconstruct an s-sparse signal β ∈ Rp from the measurements Y = Xβ, given Λmin(2s) > 0? Let β̂ be the unique sparsest solution to Xβ = Y:

β̂ = arg min_{β: Xβ=Y} ‖β‖₀, where ‖β‖₀ := #{1 ≤ i ≤ p : βi ≠ 0} is the sparsity of β.

Unfortunately, ℓ0-minimization is computationally intractable (in fact, it is an NP-complete problem).
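As a minimal sketch of why this is intractable (not from the talk; names and tolerances are illustrative), the brute-force ℓ0 solver below enumerates every support of size at most s and solves a least-squares problem on each, so its running time grows combinatorially in p and s:

```python
# Brute-force l0-minimization: enumerate supports of size <= s and solve a
# least-squares problem on each; feasible only for tiny p and s.
from itertools import combinations

import numpy as np

def l0_min(X, Y, s, tol=1e-9):
    """Return the sparsest beta with X @ beta = Y among supports of size <= s."""
    n, p = X.shape
    if np.linalg.norm(Y) <= tol:                    # the 0-sparse solution works
        return np.zeros(p)
    for size in range(1, s + 1):                    # smaller supports first
        for S in combinations(range(p), size):      # C(p, size) candidates
            cols = list(S)
            beta_S, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
            if np.linalg.norm(X[:, cols] @ beta_S - Y) <= tol:
                beta = np.zeros(p)
                beta[cols] = beta_S
                return beta                         # first hit is the sparsest
    return None
```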
Basis pursuit
Consider the following convex optimization problem:

β∗ := arg min_{β: Xβ=Y} ‖β‖₁.

Basis pursuit works whenever the n × p measurement matrix X is sufficiently incoherent: RIP (Candès-Tao 05) requires that for all T ⊂ {1, . . . , p} with |T| ≤ s and for all coefficient sequences (cj)_{j∈T},

(1 − δs)‖c‖₂² ≤ ‖XT c‖₂²/n ≤ (1 + δs)‖c‖₂²

holds for some 0 < δs < 1 (the s-restricted isometry constant). The "good" matrices for compressed sensing should satisfy these inequalities for the largest possible s.
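Since the objective and constraint are piecewise linear, basis pursuit can be solved as a linear program. A minimal sketch (not from the talk; dimensions and seed are illustrative) using scipy, with the standard split β = u − v, u, v ≥ 0:

```python
# Basis pursuit as a linear program: min ||b||_1 s.t. X b = Y, with the
# substitution b = u - v, u, v >= 0, so the objective is sum(u) + sum(v).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5
X = rng.standard_normal((n, p))                     # random measurement matrix
beta = np.zeros(p)
beta[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
Y = X @ beta                                        # noiseless measurements

c = np.ones(2 * p)                                  # sum(u) + sum(v) = ||b||_1
A_eq = np.hstack([X, -X])                           # X (u - v) = Y
res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=(0, None))
beta_hat = res.x[:p] - res.x[p:]
print("recovery error:", np.linalg.norm(beta_hat - beta))
```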
Restricted Isometry Property (RIP): examples
◮ For a Gaussian random matrix, or any sub-Gaussian ensemble, RIP holds with s ≍ n/log(p/n).
◮ For the random Fourier ensemble, i.e., randomly sampled rows of the discrete Fourier transform matrix, RIP holds for s ≍ n/log⁴ p.
◮ For a random matrix composed of columns that are independent isotropic vectors with log-concave densities, RIP holds for s = O(n/log²(p/n)).
References: Candès-Tao 05, 06, Rudelson-Vershynin 05, Donoho 06, Baraniuk et al. 08, Mendelson et al. 08, Adamczak et al. 09.
Basis pursuit for high dimensional data
These algorithms are also robust with respect to noise, and RIP can be replaced by more relaxed conditions. In particular, the isotropicity condition, which has been assumed in all of the literature cited above, needs to be dropped. Let Xi ∈ Rp, i = 1, . . . , n, be the i.i.d. random row vectors of the design matrix X.

Covariance matrix: Σ(Xi) = E Xi ⊗ Xi = E XiXiᵀ.
Sample covariance matrix: Σ̂ = (1/n) Σ_{i=1}^n Xi ⊗ Xi = (1/n) Σ_{i=1}^n XiXiᵀ.

Xi is isotropic if Σ(Xi) = I, in which case E‖Xi‖₂² = p.
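A quick numerical illustration (not from the talk; sizes and the covariance matrix are arbitrary choices): the sample covariance of anisotropic rows converges to Σ, which is far from the identity:

```python
# The sample covariance of anisotropic rows approaches Sigma, not I.
import numpy as np

rng = np.random.default_rng(1)
n, p, a = 20000, 5, 0.5
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))   # an anisotropic covariance
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T               # i.i.d. rows X_i ~ N(0, Sigma)

Sigma_hat = X.T @ X / n                             # (1/n) sum_i X_i X_i^T
print(np.round(Sigma_hat, 2))                       # close to Sigma, far from I
```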
Sparse recovery for Y = Xβ + ε
Lasso (Tibshirani 96), a.k.a. Basis Pursuit (Chen, Donoho and Saunders 98, and others):

β̂ = arg min_β ‖Y − Xβ‖₂²/(2n) + λn‖β‖₁,

where the scaling factor 1/(2n) is chosen for convenience. Dantzig selector (Candès-Tao 07):

(DS) arg min_β ‖β‖₁ subject to ‖Xᵀ(Y − Xβ)/n‖∞ ≤ λn.
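For illustration only (not the authors' code): scikit-learn's Lasso minimizes exactly the objective above, including the 1/(2n) scaling; the data, noise level, and penalty are hypothetical choices:

```python
# scikit-learn's Lasso minimizes ||Y - X b||_2^2 / (2n) + alpha ||b||_1,
# matching the 1/(2n) scaling above.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, s = 100, 500, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0                                      # s-sparse truth
Y = X @ beta + 0.1 * rng.standard_normal(n)         # noisy observations

lam = np.sqrt(np.log(p) / n)                        # lambda_n ~ sqrt(log p / n)
fit = Lasso(alpha=lam).fit(X, Y)
print("estimated support:", np.flatnonzero(fit.coef_))
```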
References: Greenshtein-Ritov 04, Meinshausen-Bühlmann 06, Zhao-Yu 06, Bunea et al. 07, Candès-Tao 07, van de Geer 08, Zhang-Huang 08, Wainwright 09, Koltchinskii 09, Meinshausen-Yu 09, Bickel et al. 09, and others.
The Cone Constraint
For an appropriately chosen λn, the solution of the Lasso or the Dantzig selector satisfies (under i.i.d. Gaussian noise), with high probability,

υ := β̂ − β ∈ C(s, k0),

with k0 = 1 for the Dantzig selector and k0 = 3 for the Lasso. Object of interest: for 1 ≤ s0 ≤ p and a positive number k0,

C(s0, k0) := {υ ∈ Rp : ‖υ_{J^c}‖₁ ≤ k0‖υ_J‖₁ for some J ⊆ {1, . . . , p}, |J| ≤ s0}.
References: Donoho-Huo 01, Elad-Bruckstein 02, Feuer-Nemirovski 03, Candès-Tao 07, Bickel-Ritov-Tsybakov 09, Cohen-Dahmen-DeVore 09.
The Lasso solution
Restricted Eigenvalue (RE) condition
Object of interest:

C(s0, k0) := {υ ∈ Rp : ‖υ_{J^c}‖₁ ≤ k0‖υ_J‖₁ for some J ⊆ {1, . . . , p}, |J| ≤ s0}.

Definition
The matrix Aq×p satisfies the RE(s0, k0, A) condition with parameter K(s0, k0, A) if for any υ ≠ 0,

1/K(s0, k0, A) := min_{J⊆{1,...,p}, |J|≤s0}  min_{υ≠0, ‖υ_{J^c}‖₁≤k0‖υ_J‖₁}  ‖Aυ‖₂/‖υ_J‖₂ > 0.
References: van de Geer 07, Bickel-Ritov-Tsybakov 09, van de Geer-Bühlmann 09.
An elementary estimate
Lemma
For each vector υ ∈ C(s0, k0), let T0 denote the locations of the s0 largest coefficients of υ in absolute value. Then

‖υ_{T0}‖₂ ≥ ‖υ‖₂/√(1 + k0).
Implication: let A be a q × p matrix such that the RE(s0, 3k0, A) condition holds with 0 < K(s0, 3k0, A) < ∞. Then for all υ ∈ C(s0, k0) ∩ S^{p−1},

‖Aυ‖₂ ≥ ‖υ_{T0}‖₂/K(s0, k0, A) ≥ 1/(K(s0, k0, A) · √(1 + k0)) > 0.
Sparse eigenvalues
Definition
For m ≤ p, we define the largest and smallest m-sparse eigenvalues of a q × p matrix A to be

ρmax(m, A) := max_{t∈Rp, t≠0, m-sparse} ‖At‖₂²/‖t‖₂²,
ρmin(m, A) := min_{t∈Rp, t≠0, m-sparse} ‖At‖₂²/‖t‖₂².
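For intuition (not part of the talk), the m-sparse eigenvalues are the extreme eigenvalues of the m × m principal submatrices of AᵀA, which the brute-force sketch below computes; the enumeration is exponential in m, so it is usable only for toy sizes:

```python
# m-sparse eigenvalues by brute force: the extremes of the Rayleigh quotient
# over m-sparse t are the extreme eigenvalues of the m x m principal
# submatrices of A^T A.
from itertools import combinations

import numpy as np

def sparse_eigenvalues(A, m):
    p = A.shape[1]
    gram = A.T @ A
    rho_min, rho_max = np.inf, -np.inf
    for S in combinations(range(p), m):
        w = np.linalg.eigvalsh(gram[np.ix_(S, S)])  # spectrum of (A^T A)_SS
        rho_min, rho_max = min(rho_min, w[0]), max(rho_max, w[-1])
    return rho_min, rho_max
```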
If RE(s0, k0, A) is satisfied with k0 ≥ 1, then the square submatrices of size 2s0 of AᵀA are necessarily positive definite; that is, ρmin(2s0, A) > 0.
Example of a matrix A that satisfies the Restricted Eigenvalue condition but not RIP (Raskutti, Wainwright, and Yu 10). Spiked identity matrix: for a ∈ [0, 1),

Σp×p = (1 − a)Ip×p + a 1 1ᵀ,

where 1 ∈ Rp is the vector of all ones; here ρmin(Σ) = 1 − a > 0. Then for every s0 × s0 principal submatrix ΣSS, we have

ρmax(ΣSS)/ρmin(ΣSS) = (1 + a(s0 − 1))/(1 − a).

The largest sparse eigenvalue → ∞ as s0 → ∞, but the smallest one stays bounded away from zero.
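A quick numerical check (illustrative values) of the displayed ratio: every s0 × s0 principal submatrix of the spiked identity has smallest eigenvalue 1 − a and largest eigenvalue 1 + a(s0 − 1):

```python
# Every s0 x s0 principal submatrix of the spiked identity has eigenvalues
# 1 - a (multiplicity s0 - 1) and 1 + a(s0 - 1), so the ratio blows up.
import numpy as np

a, p = 0.9, 50
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))
for s0 in (2, 5, 20):
    w = np.linalg.eigvalsh(Sigma[:s0, :s0])         # any support gives the same spectrum
    print(s0, w[0], w[-1], (1 + a * (s0 - 1)) / (1 - a))
```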
Motivation: to construct classes of design matrices for which the Restricted Eigenvalue condition is satisfied.
◮ Design matrix X with just independent rows, rather than independent entries: e.g., consider, for some matrix Aq×p, X = ΨA, where the rows of the matrix Ψn×q are independent isotropic vectors with subgaussian marginals, and RE(s0, (1 + ε)k0, A) holds for some ε > 0, p > s0 ≥ 0, and k0 > 0.
◮ Design matrix X consisting of independent identically distributed rows with bounded entries, whose covariance matrix Σ(Xi) = E XiXiᵀ satisfies RE(s0, (1 + ε)k0, Σ^{1/2}).
The rows of X are sampled from some distribution in Rp; the distribution may be highly non-Gaussian and perhaps discrete. A sketch of the first construction follows below.
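A minimal sketch of the first construction (assumptions: Rademacher entries for Ψ, the spiked identity for Σ, and q = p) with A = Σ^{1/2}: the rows of X = ΨA are independent, but the entries within each row are correlated:

```python
# X = Psi A with A = Sigma^{1/2}: rows of X are independent, entries within
# a row are correlated; the sample covariance of the rows is close to Sigma.
import numpy as np

rng = np.random.default_rng(3)
n, p, a = 2000, 30, 0.5
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))
w, V = np.linalg.eigh(Sigma)
A = (V * np.sqrt(w)) @ V.T                          # A = Sigma^{1/2}, q = p here
Psi = rng.choice([-1.0, 1.0], size=(n, p))          # isotropic subgaussian rows
X = Psi @ A                                         # anisotropic design
print(np.abs(X.T @ X / n - Sigma).max())            # small: near Sigma, not I
```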
Outline
Introduction
The main results
◮ The reduction principle
◮ Applications of the reduction principle
Ingredients of the proof
Conclusion
Notation
Let e1, . . . , ep be the canonical basis of Rp. For a set J ⊂ {1, . . . , p}, denote EJ = span{ej : j ∈ J}. For a matrix A, we use ‖A‖₂ to denote its operator norm. For a set V ⊂ Rp, we let conv V denote the convex hull of V. For a finite set Y, the cardinality is denoted by |Y|. Let B₂^p and S^{p−1} be the unit Euclidean ball and the unit sphere, respectively.
The reduction principle:
Theorem
Let E = ⋃_{|J|=d} EJ for d(3k0) < p, where

d(3k0) = s0 + s0 max_j ‖Aej‖₂² · 16K²(s0, 3k0, A)(3k0)²(3k0 + 1)/δ²,

and E denotes Rp otherwise. Let Ψ be a matrix such that for all x ∈ AE,

(1 − δ)‖x‖₂ ≤ ‖Ψx‖₂ ≤ (1 + δ)‖x‖₂.

Then RE(s0, k0, ΨA) holds with

0 < K(s0, k0, ΨA) ≤ K(s0, k0, A)/(1 − 5δ).

In words: if the matrix Ψ acts as an almost isometry on the images of the d-sparse vectors under A, then the product ΨA satisfies the RE condition with the smaller parameter k0.
Reformulation of the reduction principle:
Theorem
Restricted isometry. Let Ψ be a matrix such that for all x ∈ AE,

(1 − δ)‖x‖₂ ≤ ‖Ψx‖₂ ≤ (1 + δ)‖x‖₂.

Then for any x ∈ A C(s0, k0),

(1 − 5δ)‖x‖₂ ≤ ‖Ψx‖₂.

If the matrix Ψ acts as an almost isometry on the images of the d-sparse vectors under A, then it acts the same way on the images of C(s0, k0). The problem is thus reduced to checking that the almost isometry property holds for all vectors from certain low-dimensional subspaces, which is easier than checking the RE property directly.
Definition: subgaussian random vectors
Let Y be a random vector in Rp.
1. Y is called isotropic if for every y ∈ Rp, E⟨Y, y⟩² = ‖y‖₂².
2. Y is ψ2 with a constant α if for every y ∈ Rp,

‖⟨Y, y⟩‖_{ψ2} := inf{t : E exp(⟨Y, y⟩²/t²) ≤ 2} ≤ α‖y‖₂.

The ψ2 condition on a scalar random variable V is equivalent to the subgaussian tail decay of V, which means that for some constant c,

P(|V| > t) ≤ 2 exp(−t²/c²) for all t > 0.

A random vector Y in Rp is subgaussian if the one-dimensional marginals ⟨Y, y⟩ are subgaussian random variables for all y ∈ Rp.
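A Monte Carlo illustration (not from the talk) of this tail decay for a bounded vector: for Y with Rademacher coordinates and y = (1, . . . , 1)/√p, Hoeffding's inequality gives the subgaussian bound with c² = 2:

```python
# Empirical tails of V = <Y, y> for Rademacher Y and y = 1/sqrt(p):
# Hoeffding's inequality gives P(|V| > t) <= 2 exp(-t^2 / 2).
import numpy as np

rng = np.random.default_rng(4)
p, trials = 100, 200000
S = rng.binomial(p, 0.5, size=trials)               # number of +1 coordinates
V = (2 * S - p) / np.sqrt(p)                        # normalized Rademacher sum
for t in (1.0, 2.0, 3.0):
    print(t, np.mean(np.abs(V) > t), 2 * np.exp(-t**2 / 2))
```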
The first application of the reduction principle
Let A be a q × p matrix satisfying the RE(s0, 3k0, A) condition. Let m = min(d, p), where

d = s0 + s0 max_j ‖Aej‖₂² · 16K²(s0, 3k0, A)(3k0)²(3k0 + 1)/δ².

Theorem
Let Ψ be an n × q matrix whose rows are independent isotropic ψ2 random vectors in Rq with constant α. Suppose

n ≥ (2000 m α⁴/δ²) log(60ep/(mδ)).

Then with probability at least 1 − 2 exp(−δ²n/2000α⁴), the RE(s0, k0, (1/√n)ΨA) condition holds with

0 < K(s0, k0, (1/√n)ΨA) ≤ K(s0, k0, A)/(1 − δ).
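To see how the sample-size requirement behaves, one can simply plug numbers into the two displayed formulas; the sketch below does exactly that (all parameter values are hypothetical, and the constants in the theorem are not optimized):

```python
# Plug numbers into the displayed formulas for d and the sample size n.
import numpy as np

def required_n(s0, k0, K, max_col_norm, alpha, delta, p):
    d = s0 + s0 * max_col_norm**2 * 16 * K**2 * (3 * k0)**2 * (3 * k0 + 1) / delta**2
    m = min(d, p)                                   # m = min(d, p) as above
    n = 2000 * m * alpha**4 / delta**2 * np.log(60 * np.e * p / (m * delta))
    return m, n

m, n = required_n(s0=5, k0=3, K=2.0, max_col_norm=1.0, alpha=1.0, delta=0.5, p=10000)
print(f"m = {m:.0f}, required n >= {n:.0f}")
```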
Examples of subgaussian vectors
◮ The random vector Y with i.i.d. N(0, 1) random coordinates.
◮ A discrete Gaussian vector: a random vector taking values in Zp with distribution P(X = m) = C exp(−‖m‖₂²/2) for m ∈ Zp.
◮ A vector with independent centered bounded random coordinates; in particular, vectors with random symmetric Bernoulli coordinates, in other words, random vertices of the discrete cube.
Previous results on (sub)Gaussian random vectors
Raskutti, Wainwright, and Yu 10: RE(s0, k0, X) holds for a random Gaussian design matrix X consisting of n = O(s0 log p) independent copies of a Gaussian random vector Y ∼ Np(0, Σ), assuming that the RE condition holds for Σ^{1/2}. Their proof relies on a deep result from the theory of Gaussian random processes, Gordon's Minimax Lemma.
To establish the RE condition for more general classes of random matrices, we had to introduce a new approach based on geometric functional analysis, namely the reduction principle.
The bound n = O(s0 log p) can be improved to the optimal one, n = O(s0 log(p/s0)), when RE(s0, k0, Σ^{1/2}) is replaced with RE(s0, (1 + ε)k0, Σ^{1/2}) for any ε > 0.
In Zhou 09, subgaussian random matrices of the form X = ΨΣ^{1/2} were considered, where Σ is a p × p positive semidefinite matrix: X satisfies the RE(s0, k0) condition with overwhelming probability if, for K := K(s0, k0, Σ^{1/2}),

n > (9c′α⁴/δ²)(2 + k0)² K² (ρmax(s0, Σ^{1/2}) s0 ∧ s0 log p).
The analysis there used a result of Mendelson et al. 07, 08. The current result involves neither ρmax(s0, A) nor global parameters of the matrices A and Ψ, such as the norm or the smallest singular value. Recall the spiked identity matrix: for a ∈ [0, 1), Σp×p = (1 − a)Ip×p + a 1 1ᵀ, which satisfies the RE condition and for which ρmax(s0, Σ^{1/2}) grows linearly with s0, while max_j ‖Σ^{1/2}ej‖₂ = 1 stays bounded.
Design matrices with uniformly bounded entries
Let Y ∈ Rp be a random vector such that ‖Y‖∞ ≤ M a.s., and denote Σ = EYYᵀ. Let X be an n × p matrix whose rows X1, . . . , Xn are independent copies of Y. Set

d = s0 + s0 max_j ‖Σ^{1/2}ej‖₂² · 16K²(s0, 3k0, Σ^{1/2})(3k0)²(3k0 + 1)/δ².
Theorem
Assume that d ≤ p and ρ = ρmin(d, Σ^{1/2}) > 0. Let Σ satisfy the RE(s0, 3k0, Σ^{1/2}) condition. Suppose

n ≥ (CM²d log p/(ρδ²)) · log³(CM²d log p/(ρδ²)).

Then, with high probability, the RE(s0, k0, X/√n) condition holds with

0 < K(s0, k0, X/√n) ≤ K(s0, k0, Σ^{1/2})/(1 − δ).
Remarks on applying the reduction principle
To analyze different classes of random design matrices:
◮ Unlike the case of a random matrix with subgaussian marginals, the estimate in the second example involves the minimal sparse singular value ρmin(d, Σ^{1/2}).
◮ The reconstruction of sparse signals by subgaussian design matrices or by the random Fourier ensemble was analyzed in the literature before, however only under RIP assumptions.
◮ The reduction principle can be applied to other types of random variables: e.g., random vectors with heavy-tailed marginals, or random vectors with log-concave densities.
References: Rudelson-Vershynin 05, Baraniuk et al. 08, Mendelson et al. 08, Vershynin 11a, b, Adamczak et al. 09.
Maurey’s empirical approximation argument (Pisier 81)
Let u1, . . . , uM ∈ Rq, and let y ∈ conv(u1, . . . , uM): y = Σj αj uj, where αj ≥ 0 and Σj αj = 1. There exists a set L ⊂ {1, 2, . . . , M} such that

|L| ≤ m = 4 max_{j∈{1,...,M}} ‖uj‖₂²/ε²

and a vector y′ ∈ conv(uj : j ∈ L) such that ‖y − y′‖₂ ≤ ε.
Proof. An application of the probabilistic method: we only want to approximate y, rather than represent it exactly as a convex combination of u1, . . . , u|L|. Let y = Σj αj uj, where αj ≥ 0 and Σj αj = 1.
Goal: to find a vector y′ ∈ conv(uj : j ∈ L) such that ‖y − y′‖₂ ≤ ε. Let Y be a random vector in Rq such that P(Y = uℓ) = αℓ for ℓ ∈ {1, . . . , M}. Then E(Y) = Σℓ αℓ uℓ = y.
Let Y1, . . . , Ym be independent copies of Y, and let ε1, . . . , εm be i.i.d. mean-zero ±1 Bernoulli random variables, chosen independently of Y1, . . . , Ym. By the standard symmetrization argument we have

E‖y − (1/m) Σ_{j=1}^m Yj‖₂² ≤ 4E‖(1/m) Σ_{j=1}^m εjYj‖₂² = (4/m²) Σ_{j=1}^m E‖Yj‖₂² ≤ 4 max_{ℓ∈{1,...,M}} ‖uℓ‖₂²/m ≤ ε²,   (1)

where E‖Yj‖₂² ≤ max_{ℓ∈{1,...,M}} ‖uℓ‖₂², and the last inequality in (1) follows from the definition of m.
Fix a realization Yj = u_{kj}, j = 1, . . . , m, for which

‖y − (1/m) Σ_{j=1}^m Yj‖₂ ≤ ε.
[Figure: the sampled points u_{k1}, . . . , u_{km}.]
The vector (1/m) Σ_{j=1}^m Yj belongs to the convex hull of {uℓ : ℓ ∈ L}, where L is the set of distinct elements of the sequence k1, . . . , km. Obviously |L| ≤ m, and the lemma is proved. Q.E.D.
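A direct simulation of the argument (illustrative; the points, weights, and ε are arbitrary choices): sample m indices with probabilities αℓ, average the corresponding uℓ, and compare the resulting error with ε:

```python
# Maurey's argument, simulated: draw m indices with probabilities alpha_l,
# average the corresponding u_l, and compare the error with eps.
import numpy as np

rng = np.random.default_rng(5)
q, M, eps = 20, 500, 0.5
U = rng.standard_normal((M, q))                     # the points u_1, ..., u_M
alpha = rng.dirichlet(np.ones(M))                   # convex weights alpha_l
y = alpha @ U                                       # y in conv(u_1, ..., u_M)

m = int(np.ceil(4 * np.linalg.norm(U, axis=1).max() ** 2 / eps**2))
idx = rng.choice(M, size=m, p=alpha)                # Y_1, ..., Y_m with P(Y = u_l) = alpha_l
y_prime = U[idx].mean(axis=0)                       # lies in conv(u_l : l in L)
print(m, np.linalg.norm(y - y_prime))               # error is at most ~ eps on average
```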
The Inclusion Lemma
To prove the restricted isometry of Ψ over the set of vectors in A C(s0, k0), it is enough to show that this set is contained in the convex hull of the images of the d-sparse vectors with norms not exceeding (1 − δ)^{−1}.
Lemma
Let 1 > δ > 0. Suppose the RE(s0, k0, A) condition holds for the matrix Aq×p. For a set J ⊂ {1, . . . , p}, EJ = span{ej : j ∈ J}. Set

d = d(k0, A) = s0 + s0 max_j ‖Aej‖₂² · 16K²(s0, k0, A) k0²(k0 + 1)/δ².

Then

A C(s0, k0) ∩ S^{q−1} ⊂ (1 − δ)^{−1} conv( ⋃_{|J|≤d} AEJ ∩ S^{q−1} ),

where for d ≥ p, EJ is understood to be Rp.
Conclusion
We prove a general reduction principle showing that if the matrix Ψ acts as an almost isometry on the images of the d-sparse vectors under A, then the product ΨA satisfies the RE condition with a smaller parameter k0.
We apply the reduction principle to analyze different classes of random design matrices. The analysis is reduced to checking that the almost isometry property holds for all vectors from certain low-dimensional subspaces, which is easier than checking the RE property directly.
Thank you!