Declaring Independence via the Sketching of Sketches - PowerPoint PPT Presentation



SLIDE 1

Declaring Independence via the Sketching of Sketches

Piotr Indyk (Massachusetts Institute of Technology)
Andrew McGregor (University of California, San Diego)

Until August ’08 -- Hire Me!

SLIDE 2

The Problem

SLIDES 3-5

The Problem

The Center for Disease Control (CDC) has massive amounts of data on disease occurrences and their locations. “How correlated is your zip code to the diseases you’ll catch this year?”

  • Sample (sub-linear time): How many samples are required to distinguish independence from “ε-far” from independence? [Batu et al. ’01], [Alon et al. ’07], [Valiant ’08]
  • Stream (sub-linear space): Access the pairs sequentially (“online”) with limited memory.

Image from http://www.cdc.gov/flu/weekly/weeklyarchives2006-2007/images/usmap02.jpg

SLIDES 6-11

Formulation

  • Stream of m pairs in [n] × [n]:

    (3,5), (5,3), (2,7), (3,4), (7,1), (1,2), (3,9), (6,6), ...

  • Define “empirical” distributions:
    • Marginals: (p1, ..., pn), (q1, ..., qn)
    • Joint: (r11, r12, ..., rnn)
    • Product: (s11, s12, ..., snn) where sij = pi·qj
  • Question: How correlated are the first and second terms? E.g.,

    L1(s − r) = Σ_{i,j} |sij − rij|
    L2(s − r) = √( Σ_{i,j} (sij − rij)² )
    I(s, r) = H(p) − H(p|q)

  • Previous work: Can estimate L1 and L2 between the marginals.
    [Alon, Matias, Szegedy ’96], [Feigenbaum et al. ’99], [Indyk ’00], [Guha, Indyk, McGregor ’07], [Ganguly, Cormode ’07]
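On a toy stream these empirical quantities are small enough to compute exactly, which is useful for sanity-checking the sketching algorithms that follow. A minimal Python sketch (the helper name `empirical_distances` is ours, not from the slides):

```python
import math
from collections import Counter

def empirical_distances(stream, n):
    """Given a stream of m pairs in [n] x [n], build the empirical marginals
    p, q, the joint r, and the product s, and return L1(s-r), L2(s-r)."""
    m = len(stream)
    joint = Counter(stream)  # counts of each pair (i, j); missing pairs count 0
    p = [sum(joint[(i, j)] for j in range(1, n + 1)) / m for i in range(1, n + 1)]
    q = [sum(joint[(i, j)] for i in range(1, n + 1)) / m for j in range(1, n + 1)]
    l1 = l2sq = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r_ij = joint[(i, j)] / m        # joint probability
            s_ij = p[i - 1] * q[j - 1]      # product of the marginals
            l1 += abs(s_ij - r_ij)
            l2sq += (s_ij - r_ij) ** 2
    return l1, math.sqrt(l2sq)

# An exactly independent stream gives distance 0; a perfectly
# correlated stream does not.
print(empirical_distances([(1, 1), (1, 2), (2, 1), (2, 2)], 2))  # (0.0, 0.0)
print(empirical_distances([(1, 1), (2, 2)], 2))                  # (1.0, 0.5)
```

This uses linear space, of course; the point of the talk is to approximate these distances in sub-linear space.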

SLIDES 12-15

Our Results

  • Estimating L2(s−r):
    • (1+ε)-factor approx. in Õ(ε⁻² ln δ⁻¹) space.
    • “Neat” result extending AMS sketches.
  • Estimating L1(s−r):
    • O(ln n)-factor approx. in Õ(ln δ⁻¹) space.
    • Sketches of sketches and sketches/embeddings.
  • Other Results:
    • L1(s−r): Additive approximations.
    • Mutual Information: Additive but not (1+ε)-factor approx.
    • Distributed Model: Pairs are observed by different parties.
SLIDES 16-17

a) Neat Result for L2

b) Sketching Sketches

c) Other Results

SLIDES 18-25

First Attempt

  • Random Projection: Let z ∈ {−1, 1}^(n×n) where the entries zij are unbiased and 4-wise independent. [Alon, Matias, Szegedy ’96]
  • Estimator: Suppose we can compute the estimator T = (z·r − z·s)².
  • Correct in expectation and has small variance (writing aij = rij − sij, with sums over all index tuples):

    E[T] = Σ_{i1,j1,i2,j2} E[z_{i1,j1} z_{i2,j2}] a_{i1,j1} a_{i2,j2} = (L2(r − s))²
    Var[T] ≤ E[T²] = Σ_{i1,j1,...,i4,j4} E[z_{i1,j1} z_{i2,j2} z_{i3,j3} z_{i4,j4}] a_{i1,j1} a_{i2,j2} a_{i3,j3} a_{i4,j4} ≤ 3 E[T]²

  • Repeat O(ε⁻² ln δ⁻¹) times and take the mean.
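For small n the unbiasedness claim can be verified by brute force: average T over every sign matrix z (full independence, which agrees with 4-wise independence on the second and fourth moments used here). An illustrative check with toy values for a, not part of the original slides:

```python
import itertools
import math

def exact_moments(a):
    """Exact E[T] and E[T^2] for T = (sum_ij z_ij * a_ij)^2, averaging over
    all sign matrices z in {-1,1}^(n x n) (fully independent entries)."""
    flat = [v for row in a for v in row]
    signs = list(itertools.product([-1, 1], repeat=len(flat)))
    e_t = e_t2 = 0.0
    for z in signs:
        t = sum(zi * ai for zi, ai in zip(z, flat)) ** 2
        e_t += t
        e_t2 += t * t
    return e_t / len(signs), e_t2 / len(signs)

a = [[0.3, -0.1], [0.2, 0.4]]                 # toy values of a_ij = r_ij - s_ij
l2_sq = sum(v * v for row in a for v in row)  # (L2(r - s))^2
et, et2 = exact_moments(a)
print(math.isclose(et, l2_sq), et2 <= 3 * et * et)  # True True
```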

SLIDES 26-31

Computing Estimator

  • Need to compute z·r and z·s.
  • Good News: The first term is easy:

    1) Let A = 0
    2) For each stream element: if the element is (i,j), then A ← A + zij/m

  • Bad News: Can’t compute the second term!
  • Good News: Use a bilinear sketch: if zij = xi·yj for x, y ∈ {−1, 1}^n, then

    z·s = Σ_{i,j} zij sij = (x·p)(y·q)

    i.e., the product of the sketches is a sketch of the product.
  • Bad News: z is no longer 4-wise independent, even if x and y are fully random, e.g.,

    z11 z12 z21 z22 = (x1)²(x2)²(y1)²(y2)² = 1
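A one-pass version of this plan might look as follows (hypothetical helper name; the bilinear trick means only the two n-dimensional marginal sketches x·p and y·q are needed for z·s, and this computes a single estimate T, which the algorithm then repeats):

```python
import random

def bilinear_sketch(stream, n, seed=0):
    """One pass over the stream: with z_ij = x_i * y_j, maintain A = z.r
    incrementally and the marginal sketches x.p and y.q; then use
    z.s = (x.p)(y.q) and return one estimate T = (z.r - z.s)^2."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(n)]
    y = [rng.choice([-1, 1]) for _ in range(n)]
    m = len(stream)
    zr = xp = yq = 0.0
    for (i, j) in stream:              # pairs with coordinates in 1..n
        zr += x[i - 1] * y[j - 1] / m  # A <- A + z_ij / m
        xp += x[i - 1] / m             # sketch of the marginal p
        yq += y[j - 1] / m             # sketch of the marginal q
    return (zr - xp * yq) ** 2

stream = [(3, 5), (5, 3), (2, 7), (3, 4), (7, 1), (1, 2), (3, 9), (6, 6)]
print(bilinear_sketch(stream, 9))
# For a stream whose empirical joint equals the product of its marginals,
# the estimate is exactly 0:
print(bilinear_sketch([(1, 1), (1, 2), (2, 1), (2, 2)], 2))  # 0.0
```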

SLIDES 32-39

Still Get Low Variance

  • Lemma: The variance has at most tripled.
  • Proof:

    z = [ x1y1  x2y1  ...  xny1 ]
        [ x1y2  x2y2  ...  xny2 ]
        [  ...   ...  ...   ... ]
        [ x1yn  x2yn  ...  xnyn ]

    • The product of four entries is biased iff the entries lie in a rectangle.
    • Hence,

      Var[T] ≤ Σ_{(i1,j1),(i2,j2),(i3,j3),(i4,j4) in a rectangle} a_{i1,j1} a_{i2,j2} a_{i3,j3} a_{i4,j4} ≤ 3 E[T]²

      since 2 a_{i1,j1} a_{i2,j2} a_{i3,j3} a_{i4,j4} ≤ (a_{i1,j1} a_{i2,j2})² + (a_{i3,j3} a_{i4,j4})² and a rectangle is uniquely specified by a diagonal.
    • Less independence is also useful for range-sums. [Rusu, Dobra ’06]
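For small n one can enumerate every x, y ∈ {−1,1}^n and confirm both that the bilinear z still gives E[T] = (L2(r−s))² and that the lemma’s bound E[T²] ≤ 3 E[T]² holds. A brute-force check with toy values (not from the slides):

```python
import itertools
import math

def bilinear_moments(a):
    """Exact E[T] and E[T^2] for T = (sum_ij x_i y_j a_ij)^2, averaging over
    all sign vectors x, y in {-1,1}^n (the bilinear sketch z_ij = x_i y_j)."""
    n = len(a)
    combos = list(itertools.product([-1, 1], repeat=n))
    e_t = e_t2 = 0.0
    for x in combos:
        for y in combos:
            t = sum(x[i] * y[j] * a[i][j]
                    for i in range(n) for j in range(n)) ** 2
            e_t += t
            e_t2 += t * t
    k = len(combos) ** 2
    return e_t / k, e_t2 / k

a = [[0.3, -0.1], [0.2, 0.4]]                 # toy values of a_ij = r_ij - s_ij
l2_sq = sum(v * v for row in a for v in row)
et, et2 = bilinear_moments(a)
print(math.isclose(et, l2_sq), et2 <= 3 * et * et)  # True True
```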

SLIDE 40

Summary of L2 Result

  • Thm: (1+ε)-factor approx. (w/p 1−δ) in Õ(ε⁻² ln δ⁻¹) space.
  • Proof Ideas:
    1) First attempt: Use the AMS technique.
    2) Road block: Can’t sketch the product distribution.
    3) Bilinear sketch: The product of sketches was a sketch of the product!
    4) Panic: No longer 4-wise independent.
    5) Relax: We didn’t need full 4-wise independence.

SLIDE 41

a) Neat Result for L2

b) Sketching Sketches

c) Other Results

SLIDES 42-46

L1 Result

  • Thm: O(ln n)-factor approx. of L1(s−r) in Õ(ln δ⁻¹) space.
  • Why not a (1+ε)-factor using Indyk’s p-stable technique? [Indyk ’00]
  • Review of L1 sketching:
    • Let the entries of z be Cauchy(0,1).
    • Compute the estimator |z·a|.
    • Repeat k = O(ε⁻² ln δ⁻¹) times with different z.
    • Take the median and appeal to concentration lemmas.
  • N.B. If the median were a mean, we’d have an L1 dimensionality-reduction result that doesn’t exist. [Brinkman, Charikar ’03]
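A minimal simulation of this median-of-Cauchy-sketches estimator (the function name and parameter choices are ours; Cauchy(0,1) is sampled via the inverse CDF):

```python
import math
import random

def l1_sketch_estimate(a, k=801, seed=1):
    """Indyk-style L1 estimate: by 1-stability, z.a with Cauchy(0,1) entries
    is distributed as L1(a) times a Cauchy(0,1) variable, and the median of
    |Cauchy(0,1)| is exactly 1, so the median of |z.a| over k independent
    sketches concentrates around L1(a)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(k):
        # Cauchy(0,1) via the inverse CDF: tan(pi * (U - 1/2))
        z = [math.tan(math.pi * (rng.random() - 0.5)) for _ in a]
        estimates.append(abs(sum(zi * ai for zi, ai in zip(z, a))))
    estimates.sort()
    return estimates[k // 2]  # the median of the k estimates

a = [0.25, -0.25, 0.25, -0.25]    # a = s - r, say; L1(a) = 1
print(l1_sketch_estimate(a))      # should be near L1(a) = 1
```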

SLIDES 47-52

Sketching Sketches

  • To sketch the product distribution we need z = y · M(x), where M(x) is a block-diagonal matrix with n copies of x:

    z = ( y ) · [ (x)                 ]
       (1×n)    [      (x)            ]
                [            ...      ]
                [                 (x) ]
                       (n × n²)

  • Sketch = Inner Sketch followed by Outer Sketch:

    Inner Sketch:  R^(n²) → R^n,   a ↦ M(x)·a
    Outer Sketch:  R^n → R,   M(x)·a ↦ y·(M(x)·a)

  • The Problem:
    • Need to take the median of multiple inner sketches before taking the outer sketch.
    • The size of the inner sketch is large.
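The factorization z·a = y·(M(x)·a) is easy to check numerically. The sketch below uses arbitrary ±1 entries purely to verify the algebra of composing the two stages (helper names ours); the actual construction uses (truncated) Cauchy entries:

```python
def inner_sketch(x, a):
    """M(x): R^(n^2) -> R^n; coordinate j collapses column j of a with
    the signs/weights in x, i.e. (M(x)a)_j = sum_i x_i * a_ij."""
    n = len(x)
    return [sum(x[i] * a[i][j] for i in range(n)) for j in range(n)]

def outer_sketch(y, v):
    """R^n -> R: an ordinary linear sketch of the inner-sketch vector."""
    return sum(yj * vj for yj, vj in zip(y, v))

# Verify z.a = y.(M(x).a) for z_ij = x_i * y_j on a toy 3x3 input.
x, y = [1, -1, 1], [-1, 1, 1]
a = [[0.1, 0.2, 0.0], [0.3, -0.1, 0.2], [0.0, 0.1, -0.2]]
composed = outer_sketch(y, inner_sketch(x, a))
direct = sum(x[i] * y[j] * a[i][j] for i in range(3) for j in range(3))
print(abs(composed - direct) < 1e-12)  # True
```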

SLIDES 53-57

L1 Result

  • Thm: O(ln n)-factor approx. of L1(s−r) in Õ(ln δ⁻¹) space.
  • Proof:
    • Outer sketch: Entries of y are Cauchy(0,1).
    • Inner sketch: Entries of x are “truncated” Cauchy(0,1), so that

      Pr[ Ω(1) ≤ |M(x)·a| / |a| ≤ O(log n) ] ≥ 9/10

    • Repeat Õ(ln δ⁻¹) times and take the median.
SLIDE 58

a) Neat Result for L2

b) Sketching Sketches

c) Other Results

SLIDES 59-62

Other Results

  • Mutual Information:
    • Can’t (1+ε)-factor approximate in o(n) space.
    • Can approximate to within ±ε using algorithms for approximating entropy. [Chakrabarti, Cormode, McGregor ’07]
  • Distributed Model:
    • Player 1 sees (3,·), (5,·), (2,·), (3,·), (7,·), (1,·), (3,·), (6,·), ...
    • Player 2 sees (·,5), (·,3), (·,7), (·,4), (·,1), (·,2), (·,9), (·,6), ...
    • Very hard in general, e.g., can’t even check whether L1(s−r) = 0.
  • Additive Approximation for L1(s−r):
    • Uses the identity L1(s − r) = Σ_i pi L1(q − qi), where qi is q conditioned on the first term equaling i. [Guha, McGregor, Venkatasubramanian ’06]
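The identity underlying the additive approximation follows from sij − rij = pi(qj − (qi)j), and can be checked directly on an empirical stream (helper name ours):

```python
from collections import Counter

def check_identity(stream, n):
    """Check L1(s - r) = sum_i p_i * L1(q - q_i) on an empirical stream,
    where q_i is the distribution of the second term given first term i."""
    m = len(stream)
    joint = Counter(stream)
    p = [sum(joint[(i, j)] for j in range(1, n + 1)) / m for i in range(1, n + 1)]
    q = [sum(joint[(i, j)] for i in range(1, n + 1)) / m for j in range(1, n + 1)]
    # Left-hand side: L1 distance between product and joint.
    lhs = sum(abs(p[i - 1] * q[j - 1] - joint[(i, j)] / m)
              for i in range(1, n + 1) for j in range(1, n + 1))
    # Right-hand side: p-weighted L1 distances to the conditionals q_i.
    rhs = 0.0
    for i in range(1, n + 1):
        if p[i - 1] == 0:
            continue
        q_i = [joint[(i, j)] / m / p[i - 1] for j in range(1, n + 1)]
        rhs += p[i - 1] * sum(abs(q[j - 1] - q_i[j - 1]) for j in range(1, n + 1))
    return lhs, rhs

lhs, rhs = check_identity([(1, 1), (1, 2), (2, 2), (2, 2)], 2)
print(abs(lhs - rhs) < 1e-12)  # True
```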

SLIDE 63

Main Results

  • Can estimate L2(r−s) well using a neat extension of the AMS sketch.
  • Can estimate L1(r−s) up to an O(log n) factor using p-stable distributions.
  • Can estimate mutual information additively using entropy algorithms.

Questions?