Matrix-valued Chernoff Bounds and Applications, China Theory Week - PowerPoint PPT Presentation



slide-1
SLIDE 1

Matrix-valued Chernoff Bounds and Applications

China Theory Week Anastasios Zouzias

University of Toronto

September 2010

slide-2
SLIDE 2

Introduction

Probability theory is the backbone of the analysis of randomized algorithms, and random sampling is its most fundamental technique. Several inequalities are used to analyze the quality of approximation: Markov, Chebyshev, Chernoff, Azuma, etc. In this talk we discuss recent matrix-valued probabilistic inequalities and their applications.

Agenda:

1. Review real-valued probabilistic inequalities
2. Present recent matrix-valued variants
3. A low-rank matrix-valued inequality
4. Two applications: matrix sparsification and approximate matrix multiplication

slide-7
SLIDE 7

Law of Large Numbers

The fundamental principle behind random sampling is the Law of Large Numbers (LLN): the empirical average converges to the true average. Its classical form is stated for reals rather than matrices.

Let X1, ..., Xt be independent copies of a random variable X. Goal: estimate the mean E[X] using the samples X1, ..., Xt, approximating it by the empirical mean

(1/t) ∑_{i=1}^t X_i ≈ E[X]

How good is the approximation (non-asymptotically)?

slide-8
SLIDE 8

Law of Large Numbers

Question: Is there a matrix-valued LLN?

slide-9
SLIDE 9

Matrix-valued Random Variables

Let (Ω, F, P) be a probability space. A matrix-valued random variable is a measurable function

M : Ω → R^{d×d}

Its expectation is a d×d matrix, denoted by E[M] ∈ R^{d×d}. A self-adjoint matrix-valued random variable is M : Ω → S^{d×d}. Caveat: the entries may or may not be correlated with each other.

slide-10
SLIDE 10

Matrix-valued Random Variables

In short: a matrix-valued random variable is a random matrix with (possibly) correlated entries.

slide-11
SLIDE 11

Real-valued Probabilistic Inequalities

Lemma (Markov)
Let X ≥ 0 be a real-valued random variable (r.v.) and α > 0. Then

P(X ≥ α) ≤ E[X]/α.

slide-12
SLIDE 12

Real-valued Probabilistic Inequalities

Lemma (Chernoff-Hoeffding)
Let X1, X2, ..., Xt be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ, then

P( |(1/t) ∑_{i=1}^t X_i − E[X]| > ε ) ≤ 2 exp( −C ε²t / γ² ).
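A quick numerical sanity check of the Chernoff-Hoeffding bound (a sketch, not from the slides; the uniform distribution on [-1, 1] and the explicit constant C = 1/2 for a r.v. with that range are assumptions):

```python
import math
import random

random.seed(0)
t, trials, eps = 1000, 200, 0.1

# X uniform on [-1, 1]: |X| <= gamma = 1 and E[X] = 0.
def empirical_mean():
    return sum(random.uniform(-1.0, 1.0) for _ in range(t)) / t

# Fraction of trials in which the empirical mean deviates by more than eps.
frac = sum(abs(empirical_mean()) > eps for _ in range(trials)) / trials

# Hoeffding's explicit bound for a r.v. supported on [-1, 1]:
# P(|mean - E[X]| > eps) <= 2 * exp(-eps^2 * t / 2)
bound = 2.0 * math.exp(-eps * eps * t / 2.0)
```

With these parameters the bound is about 0.013, and the observed deviation frequency stays below it.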
slide-13
SLIDE 13

Real-valued Probabilistic Inequalities

Lemma (Bernstein)
Let X1, X2, ..., Xt be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ and Var(X) ≤ ρ², then

P( |(1/t) ∑_{i=1}^t X_i − E[X]| > ε ) ≤ 2 exp( −C ε²t / (ρ² + γε/3) ).
slide-14
SLIDE 14

Real-valued Probabilistic Inequalities

...and many more...

slide-15
SLIDE 15

Real-valued Probabilistic Inequalities

Question: What would the matrix-valued generalizations look like?

slide-16
SLIDE 16

Real-valued to Matrix-valued

Is there a meaningful way to generalize the real-valued inequalities to matrix-valued ones? Would these inequalities be useful to us?

slide-17
SLIDE 17

Real-valued to Matrix-valued

The dictionary between the two settings:

α, β ∈ R        A, B ∈ S^{d×d}      Comments
α > β           A ⪰ B               A − B is p.s.d.
|α|             ‖A‖                 Spectral norm
e^α             e^A                 Matrix exponential
slide-19
SLIDE 19

Matrix-valued Probabilistic Inequalities

Lemma (Markov)
Let X ≥ 0 be a real-valued r.v. and α > 0. Then

P(X ≥ α) ≤ E[X]/α.

Lemma (Matrix-valued Markov [AW02])
Let M ⪰ 0 be a self-adjoint matrix-valued r.v. and α > 0. Then

P(M ⋠ α·I) ≤ tr(E[M])/α.

Remark: P(M ⋠ α·I) = P(λmax(M) > α)

slide-21
SLIDE 21

Matrix-valued Probabilistic Inequalities

Theorem (Chernoff)
Let X1, X2, ..., Xt be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ, then

P( |(1/t) ∑_{i=1}^t X_i − E[X]| > ε ) ≤ 2 exp( −C ε²t / γ² ).

Theorem (Matrix-valued Chernoff [AW02, WX08])
Let M1, M2, ..., Mt be i.i.d. copies of a self-adjoint matrix-valued r.v. M of size d. If ‖M‖ ≤ γ a.s., then

P( ‖(1/t) ∑_{i=1}^t M_i − E[M]‖ > ε ) ≤ d exp( −C ε²t / γ² ).
slide-22
SLIDE 22

Matrix-valued Probabilistic Inequalities

Remark: The proof is similar to the real-valued case (it uses the matrix exponential!).

slide-23
SLIDE 23

Matrix-valued Probabilistic Inequalities

Question: Can we remove the dependence on the dimensionality d?
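A small numerical illustration of the matrix-valued Chernoff bound (a sketch, not from the slides; the toy distribution below is an assumption chosen for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
d, t = 20, 5000

# Toy self-adjoint matrix-valued r.v.: M = d * e_i e_i^T for a uniformly
# random standard basis vector e_i, so E[M] = I_d and ||M|| = d =: gamma a.s.
idx = rng.integers(d, size=t)
counts = np.bincount(idx, minlength=d)
avg = np.zeros((d, d))
np.fill_diagonal(avg, d * counts / t)  # empirical average (1/t) sum_i M_i

# Spectral-norm deviation of the empirical average from E[M] = I.
dev = np.linalg.norm(avg - np.eye(d), 2)
```

The deviation concentrates at the scale predicted by the theorem, roughly sqrt(gamma² log d / t) here.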

slide-24
SLIDE 24

In general, no!

Set M = diag(g₁, ..., g_d) with independent g_i ∼ N(0,1). Then E[M] = 0_{d×d} and

‖(1/t) ∑_{i=1}^t M_i − E[M]‖ ≈ (1/√t) ‖(g₁, g₂, ..., g_d)‖_∞,

i.e., the maximum deviation of d independent Gaussian r.v.'s, which grows like √(log d).

slide-25
SLIDE 25

In general, no!

Question: Are there any natural assumptions that avoid the dependence on d?

slide-26
SLIDE 26

In general, no!

What if M has rank one [RV07, Rud99]?

slide-27
SLIDE 27

In general, no!

What about low rank [MZ10]?

slide-28
SLIDE 28

Let M1, M2, ..., Mt be i.i.d. copies of a self-adjoint matrix-valued r.v. M of size d.

Theorem ("Restated" Matrix-valued Chernoff)
If ‖M‖ ≤ γ a.s. and t = Ω(γ²/ε² · log d), then

P( ‖(1/t) ∑_{i=1}^t M_i − E[M]‖ > ε ) ≤ 1/poly(d).

slide-29
SLIDE 29

Low Rank Matrix-valued Chernoff

Theorem (Low Rank Matrix-valued Chernoff [MZ10])
Let M1, M2, ..., Mt be i.i.d. copies of a self-adjoint matrix-valued r.v. M of size d. If ‖M‖ ≤ γ, rank(M) = O(1) a.s., ‖E[M]‖ ≤ 1 and t = Ω(γ/ε² · log(γ/ε²)), then

P( ‖(1/t) ∑_{i=1}^t M_i − E[M]‖ > ε ) ≤ 1/poly(t).

slide-30
SLIDE 30

Warm-up (Real-valued case)

Let's start by proving the real-valued case. Let X1, X2, ..., Xt be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ, then

P( |(1/t) ∑_{i=1}^t X_i − E[X]| > ε ) ≤ 2 exp( −C ε²t / γ² ).

p-th moments: E_p := ( E |(1/t) ∑_{i=1}^t X_i − E[X]|^p )^{1/p}

Approach: give tight bounds for E_p, then mimic the real-valued argument in the matrix-valued case.

Fact
If g ∼ N(0, σ²), then (E|g|^p)^{1/p} = O(σ√p).

slide-31
SLIDE 31

Proof (Warm-up)

Reduce the general r.v.'s X_i to Rademacher signs ε_i ∼ ±1 (symmetrisation argument):

E_p := ( E_{X_i} |(1/t) ∑_{i=1}^t X_i − E[X]|^p )^{1/p} ≤ (2/t) ( E_{X_i} E_{ε_i} |∑_{i=1}^t ε_i X_i|^p )^{1/p}

Then bound E_{ε_i} |∑_{i=1}^t ε_i X_i|^p. By Khintchine's inequality,

E_{ε_i} |∑_{i=1}^t ε_i X_i|^p ≤ ( C·p · ∑_{i=1}^t X_i² )^{p/2}

slide-34
SLIDE 34

Proof (Warm-up) - Continued

E_p ≤ (2/t) ( E_{X_i} E_{ε_i} |∑_{i=1}^t ε_i X_i|^p )^{1/p}      (symmetrisation)
    ≤ (2C√p/t) ( E_{X_i} (∑_{i=1}^t X_i²)^{p/2} )^{1/p}          (Khintchine)
    ≤ (2C√p/t) (tγ²)^{1/2}                                       (∑_{i=1}^t X_i² ≤ tγ²)
    = 2Cγ √p/√t

slide-38
SLIDE 38

Theorem (Low Rank Matrix-valued Chernoff [MZ10])
Let M1, M2, ..., Mt be i.i.d. copies of a self-adjoint matrix-valued r.v. M of size d. If ‖M‖ ≤ γ, rank(M) = O(1) a.s., ‖E[M]‖ ≤ 1 and t = Ω(γ/ε² · log(γ/ε²)), then

P( ‖(1/t) ∑_{i=1}^t M_i − E[M]‖ > ε ) ≤ 1/poly(t).

Let Z = ‖(1/t) ∑_{i=1}^t M_i − E[M]‖.
Goal: prove a bound on (E Z^p)^{1/p} similar to the real-valued case above.
slide-39
SLIDE 39

Main Problem
There is no Khintchine inequality for the operator norm ‖·‖ as there is for the reals.

slide-40
SLIDE 40

...however, there is a Khintchine inequality for the Schatten space...

slide-41
SLIDE 41

Schatten Space

Let A ∈ R^{d×d}. Denote by C_p^d the p-th Schatten space: R^{d×d} equipped with the norm

‖A‖_{C_p^d} := ( ∑_{i=1}^d σ_i(A)^p )^{1/p},

where σ_i(A) are the singular values of A.

p = ∞: operator norm; p = 2: Frobenius (Hilbert-Schmidt) norm; p = 1: nuclear norm.

‖A‖ ≤ ‖A‖_{C_p^d} ≤ (rank(A))^{1/p} ‖A‖ for any p ≥ 1.

The C_p^d space has a Khintchine inequality [LPP91, LP86]!

( E_{ε_i} ‖∑_{i=1}^t ε_i M_i‖_{C_p^d}^p )^{1/p} ≤ O(√p) ‖( ∑_{i=1}^t M_i² )^{1/2}‖_{C_p^d},

where ε_i ∼ ±1.
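The Schatten norm and the rank inequality above are easy to check numerically (a sketch; the rank-3 test matrix is a hypothetical example):

```python
import numpy as np

def schatten_norm(A, p):
    """p-th Schatten norm: the l_p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

rng = np.random.default_rng(1)
# A rank-3 matrix in R^{6x6}.
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 6))

p = 4
op = float(np.linalg.norm(A, 2))  # operator norm = sigma_max = Schatten-infinity
sp = schatten_norm(A, p)
r = int(np.linalg.matrix_rank(A))

# Sandwich inequality: ||A|| <= ||A||_{C_p} <= rank(A)^{1/p} * ||A||
```

For p = 2 this recovers the Frobenius norm, and as p grows the Schatten norm decreases toward the operator norm.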

slide-45
SLIDE 45

What we proved before...

Real-valued:

( E_{X_i} |(1/t) ∑_{i=1}^t X_i − E[X]|^p )^{1/p} ≤ (C√p/t) ( E_{X_i} (∑_{i=1}^t X_i²)^{p/2} )^{1/p}

slide-46
SLIDE 46

...what we get now

Lemma (Main Lemma [MZ10])
Let M1, ..., Mt be i.i.d. copies of a self-adjoint matrix-valued r.v. M with rank at most r almost surely. Then for every p ≥ 2,

( E ‖(1/t) ∑_{i=1}^t M_i − E[M]‖^p )^{1/p} ≤ C (rt)^{1/p} (√p/t) ( E_{M_j} ‖∑_{j=1}^t M_j²‖^{p/2} )^{1/p}

slide-47
SLIDE 47

Proof Sketch

Let E_p := ( E ‖(1/t) ∑_{i=1}^t M_i − E[M]‖^p )^{1/p}
slide-54
SLIDE 54

Proof Sketch

Let E_p := ( E ‖(1/t) ∑_{i=1}^t M_i − E[M]‖^p )^{1/p}. Then

E_p ≤ (2/t) ( E_{M_i} E_{ε_i} ‖∑_{i=1}^t ε_i M_i‖^p )^{1/p}                (symmetrisation)
    ≤ (2/t) ( E_{M_i} E_{ε_i} ‖∑_{i=1}^t ε_i M_i‖_{C_p^d}^p )^{1/p}        (‖A‖ ≤ ‖A‖_{C_p^d})
    ≤ (2C√p/t) ( E_{M_i} ‖(∑_{i=1}^t M_i²)^{1/2}‖_{C_p^d}^p )^{1/p}        (Khintchine)
    ≤ (2C(rt)^{1/p}√p/t) ( E_{M_i} ‖∑_{i=1}^t M_i²‖^{p/2} )^{1/p}          (‖A‖_{C_p^d} ≤ rank(A)^{1/p} ‖A‖)

slide-55
SLIDE 55

SECOND PART

APPLICATIONS

slide-56
SLIDE 56

Matrix Sparsification

A := [a dense 14×14 matrix of integer entries, shown on the slide]

slide-57
SLIDE 57

Matrix Sparsification

Ã := [the same matrix with some entries zeroed out, shown on the slide]

slide-58
SLIDE 58

Matrix Sparsification

Goal: Given A ∈ R^{n×n} and ε > 0, find a sparse Ã such that ‖A − Ã‖ ≤ ε‖A‖.
slide-59
SLIDE 59

Matrix Sparsification

Problem
Given A ∈ R^{n×n} and ε > 0, find a sparse Ã such that ‖A − Ã‖ ≤ ε‖A‖.

Achlioptas, McSherry [AM07]: sparsify each entry (i,j) independently w.p. ≈ |A_ij|. Analysis: A − Ã is a random matrix with independent entries. Arora et al. [AHK06] simplified their analysis using real-valued Chernoff bounds.

Drineas, Z. [DZ10]: sample each entry (i,j) independently w.p. ≈ A_ij² / ‖A‖_F², improving the above results using matrix-valued Chernoff bounds (matrix-valued Bernstein).

slide-60
SLIDE 60

Analysis via matrix-valued Chernoff

Define a matrix-valued r.v. M with E[M] = A. Each sample of M is a zero d×d matrix with only one non-zero entry. Let p_ij = A_ij² / ‖A‖_F² (the probability of selecting entry (i,j)):

P( M = (1/p_ij) A_ij e_i e_j⊤ ) = p_ij
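This sampling scheme can be sketched in a few lines (an illustration under assumed parameters n and t, not the paper's tuned sample size):

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 14, 20000  # matrix size and number of sampled entries (illustrative)
A = rng.integers(1, 21, size=(n, n)).astype(float)

# Sampling probabilities p_ij = A_ij^2 / ||A||_F^2, flattened.
p = (A ** 2 / np.sum(A ** 2)).ravel()

# Each sample of M is (1/p_ij) * A_ij * e_i e_j^T, so E[M] = A;
# A_tilde is the empirical average of t such samples.
idx = rng.choice(n * n, size=t, p=p)
acc = np.zeros(n * n)
np.add.at(acc, idx, A.ravel()[idx] / p[idx])
A_tilde = (acc / t).reshape(n, n)

rel_err = np.linalg.norm(A - A_tilde, 2) / np.linalg.norm(A, 2)
```

The number of non-zero entries of A_tilde is at most t, while the spectral-norm error stays small, as the matrix-valued bound predicts.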
slide-61
SLIDE 61

Analysis via matrix-valued Chernoff

[Animation over slides 61-67: the dense matrix A; i.i.d. samples M1, M2, M3, ..., Mt, each a 14×14 matrix with a single non-zero entry (14, 16, 15, ..., 13); and the resulting sparsified matrix Ã.]

Set Ã := (1/t) ∑_{i=1}^t M_i
slide-68
SLIDE 68

Analysis via matrix-valued Chernoff

The number of samples t bounds the number of non-zero entries of Ã, and the matrix-valued Chernoff bound guarantees ‖A − Ã‖ ≤ ε‖A‖.
slide-69
SLIDE 69

Approximate Matrix Multiplication

Problem
Given A ∈ R^{n×m}, B ∈ R^{n×p} and ε > 0, approximate the matrix product A⊤B: compute Ã ∈ R^{t×m} and B̃ ∈ R^{t×p} (t ≪ m, p, n) such that

‖Ã⊤B̃ − A⊤B‖ ≤ ε‖A‖‖B‖.

Approaches: randomly project their columns; non-uniform row sampling.
Related work: many results w.r.t. the Frobenius norm [DKM06, Sar06, CW09]; "weak" bounds w.r.t. the spectral norm [DK01, DKM06, Sar06]; similar strong bounds for the special case A = B in [RV07].

slide-72
SLIDE 72

Non-uniform Row Sampling

Recall that A⊤B = ∑_{i=1}^n A_i⊤ B_i (= ∑_{i=1}^n A_i ⊗ B_i), a sum of outer products of the rows of A and B.

[Figure: the rows of A (n × m) and B (n × p) paired up as rank-one outer products.]

slide-73
SLIDE 73

Non-uniform Row Sampling

[Figure: a few rows sampled non-uniformly from A and B.]

slide-74
SLIDE 74

Non-uniform Row Sampling

Theorem
There exists a probability distribution p_i such that, if we form a t×m matrix Ã and a t×p matrix B̃ by taking t i.i.d. (row-index) samples from p_i with t = Ω(r̃/ε² · log(r̃/ε²)), then

P( ‖Ã⊤B̃ − A⊤B‖ ≤ ε‖A‖‖B‖ ) ≥ 1 − o_r̃(1),

where r̃ is st.rank(A) + st.rank(B), and st.rank(A) := ‖A‖_F²/‖A‖² ≤ rank(A).
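A minimal sketch of non-uniform row sampling for approximate matrix multiplication (the specific distribution p_i ∝ ‖A_i‖·‖B_i‖ and the sizes below are assumptions for illustration; the theorem only asserts that some such distribution works):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p_cols, t = 500, 30, 40, 2000

A = rng.standard_normal((n, m))
B = rng.standard_normal((n, p_cols))

# Assumed sampling distribution: row i w.p. proportional to ||A_i|| * ||B_i||.
w = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
prob = w / w.sum()

# Sample t row indices and rescale so that E[A_t^T B_t] = A^T B:
# each sampled pair contributes (A_i^T B_i) / (t * prob_i).
idx = rng.choice(n, size=t, p=prob)
scale = 1.0 / np.sqrt(t * prob[idx])
A_t = A[idx] * scale[:, None]  # t x m
B_t = B[idx] * scale[:, None]  # t x p

rel_err = (np.linalg.norm(A_t.T @ B_t - A.T @ B, 2)
           / (np.linalg.norm(A, 2) * np.linalg.norm(B, 2)))
```

The estimator is unbiased, and the spectral-norm error relative to ‖A‖‖B‖ shrinks as t grows.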

slide-75
SLIDE 75

Proof Sketch

Define a distribution over R^{(m+p)×(m+p)} by

P( X = (1/p_i) [[0, B_i⊤A_i], [A_i⊤B_i, 0]] ) = p_i.

Then E[X] = [[0, B⊤A], [A⊤B, 0]]. Every (matrix) sample has rank at most two, and ‖X‖ ≤ r̃_A + r̃_B (≤ r̃) a.s.

Applying the theorem with t = Ω(r̃/ε² · log(r̃/ε²)), we get indices i1, i2, ..., it from [n] such that, with high probability,

‖ (1/t) ∑_{j=1}^t (1/p_{i_j}) [[0, B_{i_j}⊤A_{i_j}], [A_{i_j}⊤B_{i_j}, 0]] − [[0, B⊤A], [A⊤B, 0]] ‖ ≤ ε‖A‖‖B‖

slide-76
SLIDE 76

Conclusion and Open Problems

Matrix-valued probabilistic inequalities are powerful tools. We presented two applications: matrix sparsification and approximate matrix multiplication. More applications: graph sparsifiers [SS08], matrix completion [Rec09], bounding integrality gaps [Nem07], Cayley graph expansion, etc. Many connections remain unexplored. Matrix martingales - adaptive sampling? See [Tro10].

slide-77
SLIDE 77

Thank You

slide-78
SLIDE 78

References I

S. Arora, E. Hazan, and S. Kale. A Fast Random Sampling Algorithm for Sparsifying Matrices. In Proceedings of the International Workshop on Randomization and Approximation Techniques (RANDOM), pages 272–279, 2006.

D. Achlioptas and F. McSherry. Fast Computation of Low-rank Matrix Approximations. J. ACM, 54(2):9, 2007.

R. Ahlswede and A. Winter. Strong Converse for Identification via Quantum Channels. IEEE Transactions on Information Theory, 48(3):569–579, 2002.

K. L. Clarkson and D. P. Woodruff. Numerical Linear Algebra in the Streaming Model. In Proceedings of the Symposium on Theory of Computing (STOC), pages 205–214, 2009.

P. Drineas and R. Kannan. Fast Monte-Carlo Algorithms for Approximate Matrix Multiplication. In Proceedings of the Symposium on Foundations of Computer Science (FOCS), pages 452–459, 2001.

P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication. SIAM J. Comput., 36(1):132–157, 2006.

P. Drineas and A. Zouzias. A Note on Element-wise Matrix Sparsification via Matrix-valued Chernoff Bounds. Available at arXiv:1006.0407, June 2010.

F. Lust-Piquard. Inégalités de Khintchine dans Cp (1 < p < ∞). C. R. Acad. Sci. Paris Sér. I Math., 303(7):289–292, 1986.
slide-79
SLIDE 79

F. Lust-Piquard and G. Pisier. Non Commutative Khintchine and Paley Inequalities. Arkiv för Matematik, 29(1-2):241–260, December 1991.

A. Magen and A. Zouzias. Low Rank Matrix-valued Chernoff Bounds and Approximate Matrix Multiplication, 2010.

A. Nemirovski. Sums of Random Symmetric Matrices and Quadratic Optimization under Orthogonality Constraints. Mathematical Programming, 109(2):283–317, 2007.

B. Recht. A Simpler Approach to Matrix Completion. Available at arXiv:0910.0651, October 2009.

M. Rudelson. Random Vectors in the Isotropic Position. J. Funct. Anal., 164(1):60–72, 1999.

M. Rudelson and R. Vershynin. Sampling from Large Matrices: An Approach through Geometric Functional Analysis. J. ACM, 54(4):21, 2007.

T. Sarlos. Improved Approximation Algorithms for Large Matrices via Random Projections. In Proceedings of the Symposium on Foundations of Computer Science (FOCS), pages 143–152, 2006.

D. A. Spielman and N. Srivastava. Graph Sparsification by Effective Resistances. In Proceedings of the Symposium on Theory of Computing (STOC), pages 563–568, 2008.

slide-80
SLIDE 80

References III

J. A. Tropp. User-Friendly Tail Bounds for Sums of Random Matrices. Available at arXiv:1004.4389, April 2010.

A. Wigderson and D. Xiao. Derandomizing the Ahlswede-Winter Matrix-valued Chernoff Bound using Pessimistic Estimators, and Applications. Theory of Computing, 4(1):53–76, 2008.