
slide-1
SLIDE 1

Approximating the covariance matrix with heavy tailed columns and RIP.

Alexander Litvak

University of Alberta

based on a joint work with

  • O. Guédon, A. Pajor and N. Tomczak-Jaegermann

(the paper “On the interval ...” available at: http://www.math.ualberta.ca/~alexandr/)

Aleksander Pełczyński Memorial Conference, Będlewo, 2014

Alexander Litvak (Univ. of Alberta) Approximating the covariance matrix and RIP Bedlewo, 2014 1 / 21


slide-4
SLIDE 4

Notations

⟨·, ·⟩ denotes the canonical inner product on ℝ^n and |·| the canonical Euclidean norm on ℝ^n.

A random vector X ∈ ℝ^n is called isotropic if for all y ∈ ℝ^n

E⟨X, y⟩ = 0 and E|⟨X, y⟩|² = |y|².

In other words, X is centered and its covariance matrix is the identity: E X ⊗ X = Id (recall that (X ⊗ Y)(z) = ⟨X, z⟩Y, or, entrywise, X ⊗ Y = {Y_i X_j}_{ij}).

For an n × N matrix T its operator norm from ℓ₂^N to ℓ₂^n is denoted by

∥T∥ = sup_{|x|=1} |Tx|.
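The isotropy condition E X ⊗ X = Id can be illustrated numerically: for standard Gaussian vectors (which are isotropic), the empirical average of X_i ⊗ X_i approaches the identity. A minimal numpy sketch, not part of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 200_000
X = rng.standard_normal((N, n))   # N isotropic vectors in R^n (rows)

# (1/N) sum_i X_i (x) X_i, computed as X^T X / N, should be close to Id:
emp_cov = X.T @ X / N
print(np.abs(emp_cov - np.eye(n)).max())  # small deviation, O(1/sqrt(N))
```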


slide-6
SLIDE 6

KLS problem

We consider the following model: X₁, …, X_N are independent random vectors in ℝ^n. For simplicity we assume that they are identically distributed and isotropic.

Approximation of the covariance matrix (Kannan–Lovász–Simonovits (KLS) question): How many random vectors X_i are needed for the empirical covariance matrix

(1/N) Σ_{i=1}^N X_i ⊗ X_i

to approximate the identity with overwhelming probability? (In Asymptotic Geometric Analysis this question was first asked about vectors uniformly distributed in an isotropic convex body; the approximation was needed in order to estimate the complexity of an algorithm computing the volume of the body.)


slide-14
SLIDE 14

KLS problem

Given ε > 0, how large must N be in order to have

∥(1/N) Σ_{i=1}^N X_i ⊗ X_i − Id∥ = sup_{y∈S^{n−1}} |(1/N) Σ_{i=1}^N ⟨X_i, y⟩² − 1| ≤ ε

or, equivalently,

∀ y ∈ S^{n−1}:  1 − ε ≤ (1/N) Σ_{i=1}^N ⟨X_i, y⟩² ≤ 1 + ε.

KLS (95/97): N ∼ C(ε, δ) n² with Prob ≥ 1 − δ.

Bourgain (96/99): N ∼ C(ε, δ) n ln³n with Prob ≥ 1 − δ.

Improved to N ∼ C(ε, δ) n ln²n by Rudelson, and to N ∼ C(ε, δ) n ln n by Giannopoulos, Hartzoulaki, Tsolomitis and by Paouris.

Aubrun (07): N ∼ n/ε² if X₁ is unconditional, with Prob ≥ 1 − exp(−c n^{1/5}).
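Since the supremum over y ∈ S^{n−1} of |(1/N) Σ ⟨X_i, y⟩² − 1| equals the spectral norm of the symmetric matrix (1/N) Σ X_i ⊗ X_i − Id, the quantity in the question can be computed from eigenvalues. A sketch for Gaussian vectors (an illustrative choice, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 50, 5000
X = rng.standard_normal((N, n))   # isotropic rows

dev = X.T @ X / N - np.eye(n)     # (1/N) sum X_i (x) X_i - Id
# sup_{|y|=1} |(1/N) sum <X_i,y>^2 - 1| = largest |eigenvalue| of dev:
eps = np.abs(np.linalg.eigvalsh(dev)).max()
print(eps)
```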


slide-17
SLIDE 17

Solution of the KLS problem in the log-concave setting

Theorem (Adamczak–LPT, 2010)

Let X₁, …, X_N be independent isotropic log-concave random vectors and let ε ∈ (0, 1). Then for N ≥ C n/ε² one has

P( sup_{y∈S^{n−1}} |(1/N) Σ_{i=1}^N (⟨X_i, y⟩² − E⟨X_i, y⟩²)| ≤ ε ) ≥ 1 − exp(−c√n).

In other words, for N ≥ Cn, with high probability we have

∥(1/N) Σ_{i=1}^N X_i ⊗ X_i − Id∥ ≤ C √(n/N).

Remark. A measure µ on ℝ^n is log-concave if for every measurable A, B ⊂ ℝ^n and every θ ∈ [0, 1], µ(θA + (1 − θ)B) ≥ µ(A)^θ µ(B)^{1−θ}.
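The √(n/N) rate can be observed numerically, at least for Gaussian vectors (which are isotropic and log-concave); a rough sketch assuming the constant is modest:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
errs = []
for N in (400, 1600, 6400):
    X = rng.standard_normal((N, n))
    # spectral-norm deviation of the empirical covariance from Id
    err = np.linalg.norm(X.T @ X / N - np.eye(n), 2)
    errs.append(err)
    print(N, err, np.sqrt(n / N))  # err tracks sqrt(n/N) up to a constant
```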


slide-19
SLIDE 19

Relations to standard Random Matrix Theory (RMT)

RMT studies, in particular, the limiting behavior of singular values of random matrices. Recall that for an n × N matrix A, the largest and smallest singular values are defined as

s₁(A) = sup_{|x|=1} |Ax| = ∥A∥ and s_n(A) = inf_{|x|=1} |Ax| = 1/∥A⁻¹∥.

A classical result is the Bai–Yin theorem.

Theorem (Bai–Yin)

Let A be an n × N random matrix with i.i.d. entries whose 4-th moments are bounded, and let β = lim_{n→∞} n/N ∈ (0, 1). Then

1 − √β = lim_{n→∞} s_n(A/√N) ≤ lim_{n→∞} s₁(A/√N) = 1 + √β.

AGA point of view: we are interested in asymptotic, non-limit behavior, i.e. we would like to provide quantitative estimates on the rate of convergence.
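The Bai–Yin limits 1 ± √β are already visible at moderate sizes; a quick numerical illustration with Gaussian entries (an illustrative choice of i.i.d. entries):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 200, 800                  # beta = n/N = 0.25
A = rng.standard_normal((n, N))
s = np.linalg.svd(A / np.sqrt(N), compute_uv=False)  # descending order
beta = n / N
print(s[0], 1 + np.sqrt(beta))   # largest singular value, near 1.5
print(s[-1], 1 - np.sqrt(beta))  # smallest singular value, near 0.5
```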


slide-21
SLIDE 21

Relations to standard RMT

Theorem (ALPT)

Let n ≤ N and let A be a random n × N matrix whose columns X₁, …, X_N are isotropic log-concave independent random vectors in ℝ^n. Denoting β = n/N, we have

1 − C√β ≤ s_n(A/√N) ≤ s₁(A/√N) ≤ 1 + C√β

with probability at least 1 − 2 exp(−c√n).

Compare with the Bai–Yin theorem: 1 − √β = lim_{n→∞} s_n ≤ lim_{n→∞} s₁ = 1 + √β.


slide-25
SLIDE 25

Approximation of covariance matrix: heavy tails

Question: Under what conditions can the KLS problem be solved with N ∼ n? For example, is it enough to assume that |X_i| ≤ C√n with high probability and

σ_q(X₁) := sup_{|x|=1} (E|⟨X₁, x⟩|^q)^{1/q} ≤ C for some q > 2?

Vershynin (2012): For q > 4, if σ_q(X₁) ≤ C₁ and |X_i| ≤ C₁√n a.s., then

∥(1/N) Σ_{i=1}^N X_i ⊗ X_i − Id∥ ≤ C (ln ln n)² (n/N)^{1/2 − 2/q}.

He also conjectured that “ln ln n is not needed for an appropriate q, probably q = 4 or even q > 2.”
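Heavy-tailed isotropic vectors can be produced, for instance, from rescaled Student-t coordinates: the variance of t_ν is ν/(ν − 2), so dividing by its square root gives unit-variance coordinates. This particular construction is an illustrative assumption, not from the talk; it shows the covariance error behaving worse than in the Gaussian case at the same n, N:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N, df = 30, 3000, 4.5          # df just above 4: heavy but finite 4th moment
# Student-t coordinates rescaled to variance 1, so the vectors are isotropic
X = rng.standard_t(df, size=(N, n)) / np.sqrt(df / (df - 2))
err = np.linalg.norm(X.T @ X / N - np.eye(n), 2)
print(err)   # typically larger than the Gaussian ~2*sqrt(n/N) at these sizes
```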


slide-30
SLIDE 30

Approximation of covariance matrix: heavy tails

Srivastava and Vershynin (2013): A solution (in average) under a strong assumption on projections: there is η > 0 such that for every orthogonal projection P of rank k and every t ≥ C√k,

P(|PX₁| ≥ t) ≤ C/t^{2(1+η)}.

Moreover, for the smallest singular value only 1-dimensional projections are needed: σ_q(X₁) ≤ C for some q > 2.

Mendelson and Paouris (2012, 2014): A solution with high probability:

  1. For q > 4, assuming that X₁ is unconditional and that for some p > 2, ∥X₁∥_{ℓ_p^n} ≤ C n^{1/p} a.s.

  2. For q > 8, assuming that σ_q(X₁) ≤ C and that max_{i} |X_i| ≤ (nN)^{1/4} a.s.

In both the MP and SV works: i.i.d. entries with a bounded moment of order q > 4.


slide-32
SLIDE 32

KLS: 4-th moment

Theorem (GLPT)

Let X₁, …, X_N be independent isotropic random vectors, let 4 < q ≤ 8 and p < q − 4, and assume that

∀ y ∈ S^{n−1} ∀ t > 0: P(|⟨X, y⟩| > t) ≤ t^{−q}.

Then with high probability

∥(1/N) Σ_{i=1}^N X_i ⊗ X_i − Id∥ ≤ (C/N) max_{i≤N} |X_i|² + C(p, q) (n/N)^{p/q}.

In particular, if max_{i≤N} |X_i|² ≤ n^{p/q} N^{1−p/q}, then

∥(1/N) Σ_{i=1}^N X_i ⊗ X_i − Id∥ ≤ C(p, q) (n/N)^{p/q}.


slide-37
SLIDE 37

RIP

Candès and Tao (2005) introduced the concept of the Restricted Isometry Property (RIP) for a given matrix T in search of sufficient conditions for T to satisfy certain “reconstruction” conditions from compressed sensing and coding theory.

RIP parameter of order m

δ = δ_m(T) is the smallest number such that for every m-sparse vector x ∈ ℝ^N

(1 − δ)|x|² ≤ |Tx|² ≤ (1 + δ)|x|²

(x is m-sparse if it has at most m non-zero coordinates). That is, every sub-matrix of T obtained by taking m columns is an “almost” isometry. Note that the columns of T have to satisfy |T_i| = |Te_i| ∼ 1. There are many papers on these topics.
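For tiny sizes, δ_m(T) can be computed exactly by enumerating the m-column submatrices: on each support S, the supremum over unit vectors of | |Tx|² − |x|² | is the largest absolute eigenvalue of the Gram deviation T_S^T T_S − Id. A brute-force sketch (illustrative only, exponential in N):

```python
import numpy as np
from itertools import combinations

def rip_delta(T, m):
    """Exact delta_m(T) by enumerating all m-column submatrices (tiny N only)."""
    N = T.shape[1]
    delta = 0.0
    for S in combinations(range(N), m):
        # Gram deviation on support S; its spectral radius is the local RIP defect
        G = T[:, list(S)].T @ T[:, list(S)] - np.eye(m)
        delta = max(delta, np.abs(np.linalg.eigvalsh(G)).max())
    return delta

rng = np.random.default_rng(5)
n, N = 40, 12
T = rng.standard_normal((n, N)) / np.sqrt(n)   # columns have |T_i| ~ 1
print(rip_delta(T, 2))
```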


slide-40
SLIDE 40

RIP

Candès proved that if a matrix T satisfies δ_{2m}(T) < δ₀ = √2 − 1, then the following holds.

Basis pursuit algorithm (exact reconstruction by ℓ₁ minimization): whenever Tz = y has an m-sparse solution z₀, then z₀ is the unique solution of the problem

min ∥z∥₁ subject to Tz = y.

Donoho (2005) showed that the latter condition is equivalent to a condition on the neighborliness of polytopes in ℝ^n (i.e. the above is equivalent to the following: TB₁^N is m-centrally-neighborly, that is, every set of m vertices containing no opposite pairs forms the vertex set of a face).
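The ℓ₁-minimization step can be sketched with a generic LP solver by splitting z = u − v with u, v ≥ 0, so that ∥z∥₁ = Σ(u + v) at the optimum. This scipy-based example is an illustration, not the talk's method; with Gaussian columns and m well below n, the sparse solution is recovered exactly:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
n, N, m = 30, 60, 3
T = rng.standard_normal((n, N)) / np.sqrt(n)
z0 = np.zeros(N)
z0[:m] = [2.0, -1.5, 1.0]          # an m-sparse signal
y = T @ z0

# min ||z||_1 subject to Tz = y, as an LP over (u, v) with z = u - v:
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([T, -T]), b_eq=y, bounds=(0, None))
z = res.x[:N] - res.x[N:]
print(np.abs(z - z0).max())        # ~0: exact recovery of the sparse solution
```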


slide-44
SLIDE 44

RIP

Let X₁, …, X_N be i.i.d. random vectors in ℝ^n and assume that their Euclidean norms are concentrated around √n. Let T be the n × N matrix whose columns are X_i/√n.

In the Gaussian case, the Bernoulli (±1) case and the sub-Gaussian case one has δ_{2m} ≤ δ₀ with high probability for m = C n/ln(2N/n). Many works by: Baraniuk, Candès, Cohen, Dahmen, Davenport, DeVore, Donoho, Kashin, Mendelson, Pajor, Romberg, Rudelson, Tao, Temlyakov, Vershynin, Tomczak-Jaegermann, Wakin, …

ALPT (2011): Similar estimates with m = C n/ln²(2N/n) provided that

∀ y ∈ S^{n−1}: P(|⟨X₁, y⟩| > t) ≤ C exp(−ct),

for example for isotropic log-concave vectors.

Question. Under what (weakest) conditions on the X_i can one obtain RIP?


slide-46
SLIDE 46

Main Results

Theorem (GLPT)

Let q > 4 and p > 4/(q − 4). Let X₁, …, X_N be independent random vectors in ℝ^n such that their Euclidean norms are concentrated around √n, and assume

∀ y ∈ S^{n−1}: P(|⟨X_i, y⟩| > t) ≤ C/t^q.

Then the matrix T whose columns are X_i/√n satisfies δ_{2m}(T) ≤ δ₀ with high probability for

m = C(p, q) n/(N/n)^p.

Sharpness: for p < 2/(q − 2), one cannot do better than C₀(p, q) n/(N/n)^p.

slide-47
SLIDE 47

Main Results

Theorem (GLPT)

Let α ∈ (0, 2]. Let X₁, …, X_N be independent random vectors in ℝ^n such that their Euclidean norms are concentrated around √n, and assume

∀ y ∈ S^{n−1}: P(|⟨X_i, y⟩| > t) ≤ C exp(−c t^α).

Then the matrix T whose columns are X_i/√n satisfies δ_{2m}(T) ≤ δ₀ with high probability for

m = C_α n/(ln(2N/n))^{2/α}.

Sharpness: the bound on m is sharp up to the constant C_α.


slide-53
SLIDE 53

Ideas of proofs

The main technical tool is obtaining bounds on the following two parameters. For m ≤ N and random vectors X₁, …, X_N in ℝ^n, define A_m and B_m by

1. A_m := sup_{a∈S^{N−1}, |supp(a)|≤m} |Σ_{i=1}^N a_i X_i|.

Note that if A is the matrix with columns X_i, then A_m is the supremum of the norms of the submatrices consisting of m columns of A. The problem of estimating A_m is interesting in itself, although for the KLS problem only m = n is needed.

2. B_m² := sup_{a∈S^{N−1}, |supp(a)|≤m} | |Σ_{i=1}^N a_i X_i|² − Σ_{i=1}^N a_i²|X_i|² | = sup_{a∈S^{N−1}, |supp(a)|≤m} |Σ_{i≠j} ⟨a_i X_i, a_j X_j⟩|.

B_m is related to concentration; an upper bound on it plays the crucial role for RIP.
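Both parameters can be computed by brute force for small sizes: on each support S of size m, A_m² is the top eigenvalue of the Gram matrix of the chosen columns, and B_m² is the spectral radius of its off-diagonal part. A sketch (exponential in N, illustrative only):

```python
import numpy as np
from itertools import combinations

def Am_Bm(X, m):
    """Brute-force A_m and B_m for the columns of X (n x N); tiny N only."""
    N = X.shape[1]
    Am2 = Bm2 = 0.0
    for S in combinations(range(N), m):
        G = X[:, list(S)].T @ X[:, list(S)]          # Gram matrix of m columns
        Am2 = max(Am2, np.linalg.eigvalsh(G)[-1])    # = s_1(X_S)^2
        H = G - np.diag(np.diag(G))                  # off-diagonal part
        Bm2 = max(Bm2, np.abs(np.linalg.eigvalsh(H)).max())
    return np.sqrt(Am2), np.sqrt(Bm2)

rng = np.random.default_rng(7)
X = rng.standard_normal((20, 10))
Am, Bm = Am_Bm(X, 3)
print(Am, Bm)
```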


slide-58
SLIDE 58

RIP

The RIP parameter δ_m can be rewritten as

δ_m(A/√n) = sup_{a∈S^{N−1}, |supp(a)|≤m} | (1/n)|Aa|² − 1 |

≤ sup_{a∈S^{N−1}, |supp(a)|≤m} | (1/n)|Aa|² − (1/n) Σ_{i=1}^N a_i²|X_i|² | + sup_{a∈S^{N−1}, |supp(a)|≤m} | (1/n) Σ_{i=1}^N a_i²|X_i|² − 1 |

≤ (1/n) B_m² + sup_{a∈S^{N−1}, |supp(a)|≤m} Σ_{i=1}^N a_i² | (1/n)|X_i|² − 1 |

≤ (1/n) B_m² + max_{i≤N} | (1/n)|X_i|² − 1 |.

Note that

max_{i≤N} | (1/n)|X_i|² − 1 | = δ₁(A/√n) ≤ δ_m(A/√n),

that is, concentration of |X_i| around √n is needed.
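The chain of inequalities, δ_m(A/√n) ≤ B_m²/n + max_i |(1/n)|X_i|² − 1|, can be checked numerically for small sizes by enumerating supports; a sketch under the assumption of Gaussian columns:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(8)
n, N, m = 25, 8, 3
X = rng.standard_normal((n, N))          # columns X_i with |X_i| ~ sqrt(n)

def sup_over_supports(M, m, off_diag_only):
    # largest |eigenvalue| of the (off-diagonal or Id-shifted) Gram matrix,
    # maximized over all supports of size m
    d = 0.0
    for S in combinations(range(M.shape[1]), m):
        G = M[:, list(S)].T @ M[:, list(S)]
        G = G - np.diag(np.diag(G)) if off_diag_only else G - np.eye(m)
        d = max(d, np.abs(np.linalg.eigvalsh(G)).max())
    return d

delta_m = sup_over_supports(X / np.sqrt(n), m, False)   # RIP parameter
Bm2 = sup_over_supports(X, m, True)                     # B_m^2
bound = Bm2 / n + np.abs((X**2).sum(axis=0) / n - 1).max()
print(delta_m, bound)   # delta_m <= bound, as the slide's estimate asserts
```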


slide-61
SLIDE 61

KLS

We need to estimate

P( sup_{a∈S^{n−1}} |Σ_{i=1}^N (⟨X_i, a⟩² − E⟨X_i, a⟩²)| > t ).

First, using symmetrization we pass to

P( sup_{a∈S^{n−1}} |Σ_{i=1}^N ε_i ⟨X_i, a⟩²| > t ),

where the ε_i are independent Bernoulli ±1 random variables. Conditioning on the X_i and considering the decreasing rearrangement (⟨X_i, a⟩*²)_i of (⟨X_i, a⟩²)_i,

|Σ_{i=1}^N ε_i ⟨X_i, a⟩²| ≤ Σ_{i=1}^m ⟨X_i, a⟩*² + |Σ_{i=m+1}^N ε_{π(i)} ⟨X_i, a⟩*²|

for some permutation π.


slide-63
SLIDE 63

KLS

Now,

Σ_{i=1}^m ⟨X_i, a⟩*² ≤ A_m²,

and, using Hoeffding's inequality, for every t > 0

P_{(ε_i)}( |Σ_{i=m+1}^N ε_{π(i)} ⟨X_i, a⟩*²| ≥ t (Σ_{i=m+1}^N ⟨X_i, a⟩*⁴)^{1/2} ) ≤ 2 exp(−t²/2).

Finally we estimate P( Σ_{i=m+1}^N ⟨X_i, a⟩*⁴ > s ) and choose the parameters appropriately (m = n, t = √n, ...).
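Hoeffding's inequality for Rademacher sums, P(|Σ ε_i b_i| ≥ t (Σ b_i²)^{1/2}) ≤ 2 exp(−t²/2), can be checked by simulation for fixed coefficients b_i (playing the role of the ⟨X_i, a⟩*² after conditioning); an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(9)
b = rng.standard_normal(50)            # fixed coefficients (conditioned on)
t = 2.0
trials = 100_000
eps = rng.choice([-1.0, 1.0], size=(trials, b.size))   # Rademacher signs
tail = (np.abs(eps @ b) >= t * np.linalg.norm(b)).mean()
print(tail, 2 * np.exp(-t**2 / 2))     # empirical tail vs the Hoeffding bound
```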


slide-66
SLIDE 66

Bounds on A_m, B_m

Using a decoupling argument,

Σ_{i≠j} ⟨a_i X_i, a_j X_j⟩ = 2^{2−N} Σ_{I⊂{1,2,…,N}} ⟨Σ_{i∈I} a_i X_i, Σ_{j∈I^c} a_j X_j⟩.

We denote

Q(a, I, I^c) := ⟨Σ_{i∈I} a_i X_i, Σ_{j∈I^c} a_j X_j⟩ and Q_m(I) := sup_{a∈S^{N−1}, |supp(a)|≤m} Q(a, I, I^c).

Therefore

B_m² = sup_{a∈S^{N−1}, |supp(a)|≤m} |Σ_{i≠j} ⟨a_i X_i, a_j X_j⟩| ≤ 2^{2−N} Σ_{I⊂{1,2,…,N}} Q_m(I).
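The decoupling identity holds because each ordered pair (i, j) with i ≠ j has i ∈ I and j ∈ I^c for exactly 2^{N−2} of the 2^N subsets I, so the sum over all subsets overcounts each cross term by that factor. It can be verified directly for small N:

```python
import numpy as np
from itertools import chain, combinations

rng = np.random.default_rng(10)
n, N = 4, 6
X = rng.standard_normal((N, n))
a = rng.standard_normal(N)
Y = a[:, None] * X                     # rows a_i X_i

lhs = sum(Y[i] @ Y[j] for i in range(N) for j in range(N) if i != j)

# sum <sum_{i in I} a_i X_i, sum_{j in I^c} a_j X_j> over all subsets I
rhs = 0.0
for I in chain.from_iterable(combinations(range(N), k) for k in range(N + 1)):
    Ic = [j for j in range(N) if j not in I]
    rhs += Y[list(I)].sum(axis=0) @ Y[Ic].sum(axis=0)
rhs *= 2.0 ** (2 - N)
print(lhs, rhs)                        # equal up to floating-point rounding
```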


slide-71
SLIDE 71

Bounds on A_m, B_m

We prove that for some γ ∈ [1/2, 1) and every ε ∈ (0, 1), t > 1, with high probability

Q_m(I) ≤ (1 + ε)(Q_{γm}(I) + t A_m).

Then we use an iteration procedure, choosing appropriate ε_ℓ and t_ℓ at every step and controlling the probability. This gives a bound of the type

Q_m(I) ≤ Π_{ℓ=1}^k (1 + ε_ℓ) ( Q_{γ^k m}(I) + A_m Σ_{ℓ=1}^k t_ℓ ) ≤ C ( max |X_i|² + √m (N/m)^β A_m ),

which leads to

B_m² ≤ C₁ ( max |X_i|² + √m (N/m)^β A_m ) ≤ C₂ ( max |X_i|² + √m (N/m)^β B_m + √m (N/m)^β max |X_i| ).

Alexander Litvak (Univ. of Alberta) Approximating the covariance matrix and RIP Bedlewo, 2014 21 / 21