SLIDE 1 Non-asymptotic study of the singular values of some random covariance matrices.
Olivier Guédon
Université Paris-Est Marne-la-Vallée
High-dimensional problems and quantum physics. June 2015
SLIDE 2 The setting.
Let $A$ be a random matrix defined as $A = (X_1 \cdots X_N)$, where $X_1, \dots, X_N$ are independent random vectors in $\mathbb{R}^n$. What can we say about the singular values of $A$? Study of
$$M = AA^T = (X_1 \cdots X_N) \begin{pmatrix} X_1^T \\ \vdots \\ X_N^T \end{pmatrix} = \sum_{i=1}^N X_i X_i^T.$$
SLIDE 3 The setting.
Same setting. Moreover,
$$\lambda_{\max}(AA^T) = \sup_{a \in S^{n-1}} \sum_{i=1}^N \langle X_i, a\rangle^2, \qquad \lambda_{\min}(AA^T) = \inf_{a \in S^{n-1}} \sum_{i=1}^N \langle X_i, a\rangle^2.$$
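As a numerical aside (not from the slides): the eigenvalues of $M = AA^T$ are exactly the squared singular values of $A$, and the extreme ones realize the sup/inf above. A minimal sketch with Gaussian columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 50, 400
A = rng.standard_normal((n, N))             # columns are the vectors X_i
M = A @ A.T                                 # = sum_i X_i X_i^T
eigs = np.linalg.eigvalsh(M)                # ascending eigenvalues
svals = np.linalg.svd(A, compute_uv=False)  # descending singular values of A
lam_max, lam_min = eigs[-1], eigs[0]
# lambda_max(AA^T) = s_max(A)^2 and lambda_min(AA^T) = s_min(A)^2
assert np.isclose(lam_max, svals[0] ** 2)
assert np.isclose(lam_min, svals[-1] ** 2)
```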
SLIDE 4
Random Matrix Theory.
All the $X_i$'s have independent identically distributed random entries: $A = (a_{ij})_{1\le i\le n,\,1\le j\le N}$, $M_n = AA^T$. Let $n$ and $N$ go to infinity with $N/n \to c > 1$.
SLIDE 5 Random Matrix Theory.
Same setting. Empirical (counting) spectral measure:
$$\nu_{n,N} = \frac{1}{n}\sum_{k=1}^n \delta_{\lambda_k(M_n/N)}.$$
SLIDE 6 Random Matrix Theory.
Marchenko-Pastur '67 (bulk of the spectrum). If $\mathbb{E}a_{i,j}^2 = 1$, then with probability one, for any continuous bounded function $f : [0,+\infty) \to \mathbb{R}$,
$$\lim_{n\to+\infty} \int f \, d\nu_{n,N} = \int f \, d\nu,$$
where $\nu$ is the Marchenko-Pastur law.
SLIDE 7 Random Matrix Theory.
Bai-Yin '93 (edge of the spectrum). If $\mathbb{E}a_{i,j}^2 = 1$, $\mathbb{E}a_{i,j} = 0$ and $\mathbb{E}a_{i,j}^4 < +\infty$, then with probability one,
$$\lim_{n\to\infty} \lambda_{\max}\Big(\frac{1}{N}AA^T\Big) = \Big(1+\sqrt{\frac{n}{N}}\Big)^2, \qquad \lim_{n\to\infty} \lambda_{\min}\Big(\frac{1}{N}AA^T\Big) = \Big(1-\sqrt{\frac{n}{N}}\Big)^2.$$
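A quick simulation of the Bai-Yin edges (my illustration; Gaussian entries, but any distribution with the stated moments would do):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 100, 1000
A = rng.standard_normal((n, N))
w = np.linalg.eigvalsh(A @ A.T / N)   # ascending eigenvalues of (1/N) A A^T
lam_lo, lam_hi = w[0], w[-1]
edge_lo = (1 - np.sqrt(n / N)) ** 2   # Bai-Yin lower edge, about 0.47
edge_hi = (1 + np.sqrt(n / N)) ** 2   # Bai-Yin upper edge, about 1.73
```

For these sizes the extreme eigenvalues already sit close to the deterministic edges.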
SLIDE 8 Frame in Harmonic Analysis
Take an orthonormal basis of $\mathbb{R}^M$ and project it onto $\mathbb{R}^n$ (or onto an $n$-dimensional subspace of $\mathbb{R}^M$). You get $v_1, \dots, v_M$ with
$$\forall x \in \mathbb{R}^n, \quad |x|_2^2 = \sum_{j=1}^M c_j \langle x, u_j\rangle^2, \quad \text{where } u_j = \frac{v_j}{|v_j|_2} \text{ and } c_j = |v_j|_2^2.$$
Define a random vector $X$ in $\mathbb{R}^n$ by $X = \sqrt{n}\, u_j$ with probability $c_j/n$.
Then for every vector $\theta \in \mathbb{R}^n$,
$$\mathbb{E}\langle X, \theta\rangle^2 = \sum_{j=1}^M c_j \langle u_j, \theta\rangle^2 = |\theta|_2^2.$$
SLIDE 9 Frame in Harmonic Analysis
Same construction. Hence $\Sigma = \mathbb{E}\, X \otimes X = \mathrm{Id}$.
SLIDE 10 Frame in Harmonic Analysis
Same construction, so $\Sigma = \mathbb{E}\, X \otimes X = \mathrm{Id}$. Question: find the size $N(\varepsilon)$ of a sample such that
$$\forall \theta \in \mathbb{R}^n, \quad (1-\varepsilon)|\theta|_2^2 \le \frac{1}{N}\sum_{j=1}^N \langle X_j, \theta\rangle^2 \le (1+\varepsilon)|\theta|_2^2.$$
SLIDE 11 Frame in Harmonic Analysis
This gives a subset with a very particular structure.
SLIDE 12 Frame in Harmonic Analysis
Theorem (Rudelson, '97). If $X$ is a random vector in $\mathbb{R}^n$ such that $|X|_2 \le K\sqrt{n}$ a.s. and $\forall \theta \in \mathbb{R}^n$, $\mathbb{E}\langle X, \theta\rangle^2 = |\theta|_2^2$ (isotropy), then for $N \approx C_K(\varepsilon)\, n \log n$,
$$\mathbb{E}\, \Big\| \frac{1}{N}\sum_{j=1}^N X_j X_j^T - \mathrm{Id} \Big\| \le \varepsilon.$$
SLIDE 13 Frame in Harmonic Analysis
Same theorem. The main assumption is $|X|_2 \le K\sqrt{n}$; otherwise, there is no moment assumption on the entries.
SLIDE 14 Frame in Harmonic Analysis
Same theorem and remark. You cannot do better! Coupon collector.
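The coupon-collector obstruction can be seen numerically; a sketch (my illustration, not from the slides) with the isotropic vector $X = \sqrt{n}\, e_i$ chosen uniformly among coordinate directions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
# X = sqrt(n) e_i with probability 1/n is isotropic, but the empirical
# covariance stays singular until every coordinate direction has been
# drawn -- the coupon collector problem, which needs about n log n draws.
draws_small = rng.integers(0, n, size=n)                       # N = n draws
draws_large = rng.integers(0, n, size=int(3 * n * np.log(n)))  # N ~ 3 n log n
covered_small = np.unique(draws_small).size
covered_large = np.unique(draws_large).size
```

With only $N = n$ draws many directions are missed, so $N \approx n \log n$ is necessary for this example.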
SLIDE 15
Computing the volume of a convex body
K ⊂ Rn is given by a separation oracle
SLIDE 16
Computing the volume of a convex body
$K \subset \mathbb{R}^n$ is given by a separation oracle. Elekes ('86), Bárány-Füredi ('86): it is not possible to compute the volume of a convex body (even approximately) with a deterministic algorithm in polynomial time.
SLIDE 17
Computing the volume of a convex body
Randomization - Given $\varepsilon$ and $\eta$, Dyer-Frieze-Kannan ('89) established randomized algorithms returning a non-negative number $\zeta$ such that $(1-\varepsilon)\zeta < \mathrm{Vol}\, K < (1+\varepsilon)\zeta$ with probability at least $1-\eta$. The running time of the algorithm is polynomial in $n$, $1/\varepsilon$ and $\log(1/\eta)$.
SLIDE 18
Computing the volume of a convex body
The number of oracle calls is a random variable, and the bound is, for example, on its expected value.
SLIDE 19
Computing the volume of a convex body
The randomized algorithm proposed by Kannan, Lovász and Simonovits '97 significantly improves the polynomial dependence.
SLIDE 20 Computing the volume of a convex body
Rounding - Put the convex body in a position where
$$B_2^n \subset K \subset d\, B_2^n, \quad \text{where } d \le n^{\mathrm{const}}.$$
SLIDE 21 Computing the volume of a convex body
Rounding, continued.
- John ('48): $d \le n$ (or $d \le \sqrt{n}$ in the symmetric case).
How to find an algorithm to do so?
SLIDE 22 Computing the volume of a convex body
- Idea: find an algorithm which produces in polynomial time a matrix $A$ such that $AK$ is in an approximate isotropic position. Conjecture 2 of KLS ('97): solved in 2010 by Adamczak, Litvak, Pajor, Tomczak-Jaegermann.
SLIDE 23 Computing the volume of a convex body
Computing the volume - Monte Carlo algorithm, estimates. Conjecture 1 of KLS ('95): isoperimetric inequality.
SLIDE 24 Approximation of the covariance matrix.
Question of KLS ('97): let $X$ be a vector uniformly distributed on a convex body $K$, and $X_1, \dots, X_N$ independent copies of $X$. What is the smallest $N$ such that
$$\Big\| \frac{1}{N}\sum_{j=1}^N X_j X_j^T - \mathbb{E}\, X X^T \Big\| \le \varepsilon\, \big\| \mathbb{E}\, X X^T \big\|,$$
where $\|\cdot\|$ is the operator norm?
SLIDE 25 Approximation of the covariance matrix.
Assume $\mathbb{E}\, X X^T = \mathrm{Id}$; the question becomes: what is the smallest $N$ such that
$$\Big\| \frac{1}{N}\sum_{j=1}^N X_j X_j^T - \mathrm{Id} \Big\| \le \varepsilon\ ?$$
SLIDE 26 Approximation of the covariance matrix.
Assuming $\mathbb{E}\, X X^T = \mathrm{Id}$, you want to control the smallest and the largest singular values:
$$1-\varepsilon \le \lambda_{\min}\Big(\frac{1}{N}\sum_{j=1}^N X_j X_j^T\Big) \le \lambda_{\max}\Big(\frac{1}{N}\sum_{j=1}^N X_j X_j^T\Big) \le 1+\varepsilon.$$
KLS $n^2/\varepsilon^2$, Bourgain $n\log^3 n/\varepsilon^2$, ... (Rudelson, Guédon, Paouris, Aubrun, Giannopoulos), ALPT ('10) $n/\varepsilon^2$: for general log-concave vectors.
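A quick experiment (my illustration; uniform on a cube rather than a general convex body) showing that $N \approx n/\varepsilon^2$ samples already give operator-norm error of order $\varepsilon$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps = 50, 0.25
N = int(n / eps ** 2)   # N = 800, the n/eps^2 sample size of ALPT
# Uniform on [-sqrt(3), sqrt(3)]^n: an isotropic log-concave vector
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, N))
dev = np.linalg.norm(X @ X.T / N - np.eye(n), 2)  # operator-norm deviation
# dev is of order eps for this sample size
```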
SLIDE 27
Log-concave random vectors
$X$ is a log-concave random vector $\leftrightarrow$ $X$ has a log-concave density with respect to the Lebesgue measure on $\mathbb{R}^n$.
SLIDE 28 Log-concave random vectors
Main property of linear functionals: $\exists C > 1$ such that for all log-concave $X$ and all $p \ge 2$,
$$\forall \theta \in \mathbb{R}^n, \quad \big(\mathbb{E}|\langle X, \theta\rangle|^p\big)^{1/p} \le C\, p\, \big(\mathbb{E}|\langle X, \theta\rangle|^2\big)^{1/2}.$$
SLIDE 29 Log-concave random vectors
Or equivalently, in isotropic position,
$$\forall \theta \in S^{n-1}, \ \forall t \ge 1, \quad \mathbb{P}\big(|\langle X, \theta\rangle| \ge t\big) \le \exp(-c\, t).$$
SLIDE 30 Log-concave random vectors
Theorem (Paouris, '06). If $X$ is an isotropic log-concave random vector in $\mathbb{R}^n$ then
$$\forall t \ge 1, \quad \mathbb{P}\big( |X|_2 \ge c\, t\sqrt{n} \big) \le \exp(-C\, t\sqrt{n}).$$
SLIDE 31 Log-concave random vectors
Theorem (ALPT, '10). If $X$ is isotropic log-concave then with probability greater than $1 - 2\exp(-c\sqrt{n})$,
$$\sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \langle X_j, a\rangle^2 - 1 \Big| \le C\sqrt{\frac{n}{N}}.$$
SLIDE 32 Log-concave random vectors
Consequently,
$$\Big\| \frac{1}{N}\sum_{j=1}^N X_j X_j^T - \mathrm{Id} \Big\| \le C\sqrt{\frac{n}{N}}.$$
Take $N \approx \dfrac{n}{\varepsilon^2}$.
SLIDE 33
Relaxation of the assumptions.
Several recent works in this direction: Srivastava-Vershynin, Vershynin, Mendelson-Paouris.
SLIDE 34 Relaxation of the assumptions.
Theorem (Vershynin, '11). If $X$ is an isotropic random vector in $\mathbb{R}^n$ such that $|X|_2 \le K\sqrt{n}$ a.s. and $\forall \theta \in S^{n-1}$, $\mathbb{E}|\langle X, \theta\rangle|^q \le L^q$ for some $q > 4$, then with probability greater than $1-\delta$,
$$\sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \langle X_j, a\rangle^2 - 1 \Big| \le C_{q,L,\delta}\, (\log\log n)^2\, \Big(\frac{n}{N}\Big)^{\frac{1}{2}-\frac{1}{q}}.$$
SLIDE 35 Relaxation of the assumptions.
Theorem (Mendelson-Paouris, '12). If $X$ is an isotropic random vector in $\mathbb{R}^n$ such that $|X|_2 \le K(Nn)^{1/4}$ a.s. and $\forall \theta \in S^{n-1}$, $\mathbb{E}|\langle X, \theta\rangle|^q \le L^q$ for some $q > 8$, then with probability greater than $1 - \big(\frac{1}{N^\beta} + \exp(-cn)\big)$,
$$\sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \langle X_j, a\rangle^2 - 1 \Big| \le C\sqrt{\frac{n}{N}}.$$
SLIDE 36 Result
Theorem (GLPT, '14). Let $X_1, \dots, X_N$ be random vectors in $\mathbb{R}^n$ such that
$$\forall j, \ \forall \theta \in S^{n-1}, \ \forall t > 0, \quad \mathbb{P}\big( |\langle X_j, \theta\rangle| > t \big) \le \frac{1}{t^p} \quad \text{for some } p \in (4,8].$$
Let $\varepsilon \le \min\big(1, \frac{p-4}{4}\big)$ and $\gamma = p-4-2\varepsilon$. Then with probability greater than $1 - 8\exp(-n) - 2\varepsilon^{-p/2}\max\big(N^{-3/2}, n^{-\frac{p-4}{4}}\big)$,
$$\sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \langle X_j, a\rangle^2 - 1 \Big| \le C\, \frac{1}{N}\max_i |X_i|_2^2 + C_{p,\varepsilon} \Big(\frac{n}{N}\Big)^{\gamma/p}.$$
SLIDE 37 Restricted Isometry Property
Let $1 \le m \le N$. A vector $z \in \mathbb{R}^N$ is said to be $m$-sparse if $|\{j : z_j \neq 0\}| \le m$, and $U_m = S^{N-1} \cap \{m\text{-sparse vectors}\}$. Let $T$ be an $n \times N$ matrix; $\delta_m(T)$ is the smallest number such that for every $m$-sparse vector $z$,
$$(1-\delta_m(T))\,|z|_2^2 \le |Tz|_2^2 \le (1+\delta_m(T))\,|z|_2^2.$$
SLIDE 38 Restricted Isometry Property
Equivalently,
$$\delta_m(T) = \sup_{z \in U_m} \big|\, |Tz|_2^2 - 1 \,\big|.$$
SLIDE 39 Restricted Isometry Property
Take $T = \frac{1}{\sqrt{n}}A$. Then
$$\delta_m\Big(\frac{A}{\sqrt{n}}\Big) = \sup_{z \in U_m} \Big| \frac{1}{n}\Big|\sum_{j=1}^N z_j X_j\Big|_2^2 - 1 \Big|.$$
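For small sizes $\delta_m$ can be computed exactly by enumerating supports; a brute-force sketch (my illustration, feasible only for small $N$ and $m$):

```python
import itertools
import numpy as np

def rip_constant(T, m):
    # delta_m(T) = sup over m-sparse unit z of | |Tz|_2^2 - 1 |, computed
    # exactly as the worst eigenvalue deviation over all m-column Gram matrices.
    N = T.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(N), m):
        G = T[:, list(S)].T @ T[:, list(S)]   # Gram matrix of the chosen columns
        w = np.linalg.eigvalsh(G)
        delta = max(delta, abs(w[-1] - 1.0), abs(1.0 - w[0]))
    return delta

rng = np.random.default_rng(5)
n, N = 60, 12
T = rng.standard_normal((n, N)) / np.sqrt(n)  # T = A / sqrt(n)
d1, d2 = rip_constant(T, 1), rip_constant(T, 2)
# delta_m is increasing in m, so d1 <= d2
```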
SLIDE 40 Restricted Isometry Property
Take $T = \frac{1}{\sqrt{n}}A$. Observe that $\delta_m$ is increasing in $m$, so $\delta_1\big(\frac{A}{\sqrt{n}}\big) \le \delta_m\big(\frac{A}{\sqrt{n}}\big)$, and
$$\delta_m\Big(\frac{A}{\sqrt{n}}\Big) \le \sup_{z \in U_m} \frac{1}{n}\Big|\, \Big|\sum_{j=1}^N z_j X_j\Big|_2^2 - \sum_{j=1}^N z_j^2\, |X_j|_2^2 \,\Big| + \delta_1\Big(\frac{A}{\sqrt{n}}\Big).$$
SLIDE 41 Restricted Isometry Property
But
$$\delta_1\Big(\frac{A}{\sqrt{n}}\Big) = \max_{1 \le j \le N} \Big| \frac{|X_j|_2^2}{n} - 1 \Big|.$$
SLIDE 42 Restricted Isometry Property
Define
$$P(\delta) = \mathbb{P}\Big( \max_{1 \le j \le N} \Big| \frac{|X_j|_2^2}{n} - 1 \Big| > \delta \Big).$$
SLIDE 43 Result
Theorem (GLPT, '14). Let $X_1, \dots, X_N$ be random vectors in $\mathbb{R}^n$ such that
$$\forall j, \ \forall \theta \in S^{n-1}, \ \forall t > 0, \quad \mathbb{P}\big( |\langle X_j, \theta\rangle| > t \big) \le \frac{1}{t^p} \quad \text{for some } p > 4.$$
Let $\varepsilon \le \min\big(1, \frac{p-4}{4}\big)$ and $\gamma = p-4-2\varepsilon$. Let $\delta \in (0,1)$ and
$$m = C_{p,\varepsilon,\delta}\, N\, n^{-\frac{2(2+\varepsilon)}{p-4-2\varepsilon}}.$$
Then with probability greater than $1 - C_{\varepsilon,p}\big(\frac{1}{N} + \frac{N}{n^{p/4}}\big) - P(\delta/2)$,
$$\delta_m\Big(\frac{A}{\sqrt{n}}\Big) \le \delta.$$
SLIDE 44 The problem.
On average: estimate
$$\mathbb{E}\sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \big( \langle X_j, a\rangle^2 - \mathbb{E}\langle X_j, a\rangle^2 \big) \Big|.$$
SLIDES 45-48 The problem.
Introduce independent copies $X'_1, \dots, X'_N$ and independent signs $\varepsilon_1, \dots, \varepsilon_N$:
$$\mathbb{E}\,\mathbb{E}'\sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \big( \langle X_j, a\rangle^2 - \langle X'_j, a\rangle^2 \big) \Big| = \mathbb{E}\,\mathbb{E}'\,\mathbb{E}_\varepsilon \sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \varepsilon_j \big( \langle X_j, a\rangle^2 - \langle X'_j, a\rangle^2 \big) \Big| \le 2\, \mathbb{E}\,\mathbb{E}_\varepsilon \sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \varepsilon_j \langle X_j, a\rangle^2 \Big|.$$
SLIDE 49 The very very first step.
Symmetrization.
$$\mathbb{E}\sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \big( \langle X_j, a\rangle^2 - \mathbb{E}\langle X_j, a\rangle^2 \big) \Big| \le 2\, \mathbb{E}\,\mathbb{E}_\varepsilon \sup_{a \in S^{n-1}} \Big| \frac{1}{N}\sum_{j=1}^N \varepsilon_j \langle X_j, a\rangle^2 \Big|.$$
SLIDE 50 The very very first step.
Same symmetrization; then use subgaussian or non-commutative Khintchine inequalities.
SLIDE 51 The very very first step.
Observation: for any $p \ge 2$,
$$\Big( \mathbb{E}_\varepsilon \Big| \sum_{j=1}^N \varepsilon_j t_j \Big|^p \Big)^{1/p} \le C\sqrt{p}\, \Big( \sum_{j=1}^N t_j^2 \Big)^{1/2} \quad \text{(subgaussian)}$$
and
$$\Big| \sum_{j=1}^N \varepsilon_j t_j \Big| \le \sum_{j=1}^N |t_j| \quad \text{(trivial)}.$$
SLIDE 52 The very very first step.
Combine the two bounds, with $(t_j^*)$ the non-increasing rearrangement of $(|t_j|)$:
$$\Big( \mathbb{E}_\varepsilon \Big| \sum_{j=1}^N \varepsilon_j t_j \Big|^p \Big)^{1/p} \le \sum_{j=1}^k t_j^* + C\sqrt{p}\, \Big( \sum_{j=k+1}^N (t_j^*)^2 \Big)^{1/2}.$$
SLIDE 53 The very very first step.
Back to
$$\mathbb{E}\,\mathbb{E}_\varepsilon \sup_{a \in S^{n-1}} \Big| \sum_{j=1}^N \varepsilon_j \langle X_j, a\rangle^2 \Big| :$$
with $t_j = \langle X_j, a\rangle^2$, what is
$$\sup_{a \in S^{n-1}} \sum_{j=1}^k t_j^* = \sup_{a \in S^{n-1}} \sum_{j=1}^k \big( \langle X_j, a\rangle^* \big)^2 \ ?$$
SLIDE 54 The very very first step.
Duality and sparsity:
$$\sup_{a \in S^{n-1}} \Big( \sum_{j=1}^k t_j^* \Big)^{1/2} = \sup_{a \in S^{n-1}} \Big( \sum_{j=1}^k \big( \langle X_j, a\rangle^* \big)^2 \Big)^{1/2} = \sup_{a \in S^{n-1}} \sup_{\alpha \in S^{k-1}} \sum_{j=1}^k \alpha_j \langle X_j, a\rangle^* = \sup_{a \in S^{n-1}} \sup_{\alpha \in U_k} \sum_{j=1}^N \alpha_j \langle X_j, a\rangle = \sup_{\alpha \in U_k} \Big| \sum_{j=1}^N \alpha_j X_j \Big|_2.$$
SLIDE 55 The key Lemma. Correct symmetrization
Define $A_k$ by
$$A_k = \sup_{\alpha \in U_k} \Big| \sum_{j=1}^N \alpha_j X_j \Big|_2.$$
Lemma. For every $A, Z > 0$,
$$\sup_{a \in S^{n-1}} \Big| \sum_{j=1}^N \big( \langle X_j, a\rangle^2 - \mathbb{E}\langle X_j, a\rangle^2 \big) \Big| \le 2A^2 + 6\sqrt{n}\, Z + 8N^{\frac{2}{\min(p,4)}}$$
with probability larger than
$$1 - 4\exp(-n) - 4\,\mathbb{P}(A_k > A) - 4\cdot 9^n \sup_{a \in S^{n-1}} \mathbb{P}\Big( \sum_{i>k} \big( \langle X_i, a\rangle^* \big)^4 > Z^2 \Big).$$
SLIDE 56 The key Lemma. Correct symmetrization
Same lemma. Mixture of the trivial bound for the Rademacher averages (combinatorics on the rearrangement) and of the subgaussian bound.
SLIDE 57 Second step.
Choose $k = n$. If $p > 4$ and for all $t \ge 1$ and all $a \in S^{n-1}$, $\mathbb{P}(|\langle X_i, a\rangle| > t) \le \frac{1}{t^p}$, then with probability greater than $1 - 10^{-n}$,
$$\sum_{i>k} \big( \langle X_i, a\rangle^* \big)^4 \le C_p\, N \quad \Rightarrow \quad Z = C_p\sqrt{N}.$$
SLIDE 58 Third and main step. Evaluate $A_k$
$$A_k^2 = \sup_{\alpha \in U_k} \Big| \sum_{j=1}^N \alpha_j X_j \Big|_2^2 = \sup_{\alpha \in U_k} \Big( \sum_{i \neq j} \alpha_i \alpha_j \langle X_i, X_j\rangle + \sum_{j=1}^N \alpha_j^2\, |X_j|_2^2 \Big) \le \sup_{\alpha \in U_k} \sum_{i \neq j} \alpha_i \alpha_j \langle X_i, X_j\rangle + \max_{j \le N} |X_j|_2^2.$$
SLIDE 59 Third and main step. Evaluate $A_k$
Call
$$B_k^2 = \sup_{\alpha \in U_k} \sum_{i \neq j} \alpha_i \alpha_j \langle X_i, X_j\rangle.$$
SLIDE 60 Third and main step. Evaluate $A_k$
- Combinatorial (decoupling) argument:
$$\sum_{i \neq j} \alpha_i \alpha_j \langle X_i, X_j\rangle = \frac{4}{2^N} \sum_{I \subset \{1,\dots,N\}} \Big\langle \sum_{i \in I} \alpha_i X_i,\ \sum_{j \notin I} \alpha_j X_j \Big\rangle.$$
SLIDE 61 Third and main step. Evaluate $A_k$
Call
$$Q_k(I) = \sup_{\alpha \in U_k} \Big\langle \sum_{i \in I} \alpha_i X_i,\ \sum_{j \notin I} \alpha_j X_j \Big\rangle, \quad \text{so that} \quad B_k^2 \le \frac{4}{2^N} \sum_{I \subset \{1,\dots,N\}} Q_k(I).$$
SLIDE 62 Third and main step. Evaluate $A_k$
$$B_k^2 \le \frac{4}{2^N} \sum_{I \subset \{1,\dots,N\}} Q_k(I).$$
Study of quadratic forms.
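The decoupling identity used above can be verified numerically; a small sketch (my illustration, exhaustive over all $2^N$ subsets for a tiny $N$):

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
N, n = 8, 5
X = rng.standard_normal((n, N))        # columns X_1, ..., X_N
alpha = rng.standard_normal(N)
# Left-hand side: sum over ordered pairs i != j of alpha_i alpha_j <X_i, X_j>
G = X.T @ X
lhs = alpha @ G @ alpha - alpha ** 2 @ np.diag(G)
# Right-hand side: (4 / 2^N) * sum over all subsets I of
# < sum_{i in I} alpha_i X_i , sum_{j not in I} alpha_j X_j >
rhs = 0.0
for r in range(N + 1):
    for I in itertools.combinations(range(N), r):
        mask = np.zeros(N, dtype=bool)
        mask[list(I)] = True
        rhs += (X[:, mask] @ alpha[mask]) @ (X[:, ~mask] @ alpha[~mask])
rhs *= 4.0 / 2 ** N
```

Each ordered pair $(i,j)$, $i \neq j$, has $i \in I$ and $j \notin I$ for exactly a quarter of the subsets, which is where the factor $4/2^N$ comes from.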
SLIDE 63
Remarks.
Recent works about the smallest singular value: Srivastava-Vershynin, Oliveira (Roberto Imbuzeiro), Koltchinskii-Mendelson.
→ It is "easier" in the sense that you need weaker assumptions: a fourth moment assumption on the linear functionals gives the good rate, as in the Bai-Yin result.
→ Reconstruction.
SLIDE 64
THANK YOU