Declaring Independence via the Sketching of Sketches - PowerPoint PPT Presentation



SLIDE 1

Declaring Independence via the Sketching of Sketches

Piotr Indyk (Massachusetts Institute of Technology)
Andrew McGregor (University of California, San Diego)

Until August ’08 -- Hire Me!

SLIDE 2

The Problem

SLIDES 3-5

The Problem

The Center for Disease Control (CDC) has massive amounts of data on disease occurrences and their locations. “How correlated is your zip code to the diseases you’ll catch this year?”

  • Sample (sub-linear time): How many samples are required to distinguish independence from “ε-far” from independence? [Batu et al. ’01], [Alon et al. ’07], [Valiant ’08]
  • Stream (sub-linear space): Access the pairs sequentially (“online”) with limited memory.

Image from http://www.cdc.gov/flu/weekly/weeklyarchives2006-2007/images/usmap02.jpg

SLIDES 6-11

Formulation

  • Stream of m pairs in [n] × [n]:

    (3,5), (5,3), (2,7), (3,4), (7,1), (1,2), (3,9), (6,6), ...

  • Define “empirical” distributions:
    • Marginals: (p1, ..., pn), (q1, ..., qn)
    • Joint: (r11, r12, ..., rnn)
    • Product: (s11, s12, ..., snn) where sij = pi·qj
  • Question: How correlated are the first and second terms? E.g.,

    L1(s − r) = Σ_{i,j} |sij − rij|
    L2(s − r) = √( Σ_{i,j} (sij − rij)² )
    I(s, r) = H(p) − H(p|q)

  • Previous work: Can estimate L1 and L2 between the marginals.
    [Alon, Matias, Szegedy ’96], [Feigenbaum et al. ’99], [Indyk ’00], [Guha, Indyk, McGregor ’07], [Ganguly, Cormode ’07]
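On a toy stream these empirical quantities are small enough to compute exactly, which is useful for sanity-checking the sketching algorithms that follow. A minimal Python sketch (the helper name `empirical_distances` is ours, not from the slides):

```python
import math
from collections import Counter

def empirical_distances(stream, n):
    """Given a stream of m pairs in [n] x [n], build the empirical marginals
    p, q, the joint r, and the product s, and return L1(s-r), L2(s-r)."""
    m = len(stream)
    joint = Counter(stream)  # counts of each pair (i, j); missing pairs count 0
    p = [sum(joint[(i, j)] for j in range(1, n + 1)) / m for i in range(1, n + 1)]
    q = [sum(joint[(i, j)] for i in range(1, n + 1)) / m for j in range(1, n + 1)]
    l1 = l2sq = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r_ij = joint[(i, j)] / m        # joint probability
            s_ij = p[i - 1] * q[j - 1]      # product of the marginals
            l1 += abs(s_ij - r_ij)
            l2sq += (s_ij - r_ij) ** 2
    return l1, math.sqrt(l2sq)

# An exactly independent stream gives distance 0; a perfectly
# correlated stream does not.
print(empirical_distances([(1, 1), (1, 2), (2, 1), (2, 2)], 2))  # (0.0, 0.0)
print(empirical_distances([(1, 1), (2, 2)], 2))                  # (1.0, 0.5)
```

This uses linear space, of course; the point of the talk is to approximate these distances in sub-linear space.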

SLIDES 12-15

Our Results

  • Estimating L2(s−r):
    • (1+ε)-factor approx. in Õ(ε⁻² ln δ⁻¹) space.
    • “Neat” result extending AMS sketches.
  • Estimating L1(s−r):
    • O(ln n)-factor approx. in Õ(ln δ⁻¹) space.
    • Sketches of sketches and sketches/embeddings.
  • Other Results:
    • L1(s−r): Additive approximations.
    • Mutual Information: Additive but not (1+ε)-factor approx.
    • Distributed Model: Pairs are observed by different parties.
SLIDES 16-17

a) Neat Result for L2

b) Sketching Sketches

c) Other Results

SLIDES 18-25

First Attempt

  • Random Projection: Let z ∈ {−1, 1}^(n×n) where the entries zij are unbiased and 4-wise independent. [Alon, Matias, Szegedy ’96]
  • Estimator: Suppose we can compute the estimator T = (z·r − z·s)².
  • Correct in expectation and has small variance (writing aij = rij − sij, with sums over all index tuples):

    E[T] = Σ_{i1,j1,i2,j2} E[z_{i1,j1} z_{i2,j2}] a_{i1,j1} a_{i2,j2} = (L2(r − s))²
    Var[T] ≤ E[T²] = Σ_{i1,j1,...,i4,j4} E[z_{i1,j1} z_{i2,j2} z_{i3,j3} z_{i4,j4}] a_{i1,j1} a_{i2,j2} a_{i3,j3} a_{i4,j4} ≤ 3 E[T]²

  • Repeat O(ε⁻² ln δ⁻¹) times and take the mean.
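For small n the unbiasedness claim can be verified by brute force: average T over every sign matrix z (full independence, which agrees with 4-wise independence on the second and fourth moments used here). An illustrative check with toy values for a, not part of the original slides:

```python
import itertools
import math

def exact_moments(a):
    """Exact E[T] and E[T^2] for T = (sum_ij z_ij * a_ij)^2, averaging over
    all sign matrices z in {-1,1}^(n x n) (fully independent entries)."""
    flat = [v for row in a for v in row]
    signs = list(itertools.product([-1, 1], repeat=len(flat)))
    e_t = e_t2 = 0.0
    for z in signs:
        t = sum(zi * ai for zi, ai in zip(z, flat)) ** 2
        e_t += t
        e_t2 += t * t
    return e_t / len(signs), e_t2 / len(signs)

a = [[0.3, -0.1], [0.2, 0.4]]                 # toy values of a_ij = r_ij - s_ij
l2_sq = sum(v * v for row in a for v in row)  # (L2(r - s))^2
et, et2 = exact_moments(a)
print(math.isclose(et, l2_sq), et2 <= 3 * et * et)  # True True
```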

SLIDES 26-31

Computing Estimator

  • Need to compute z·r and z·s.
  • Good News: The first term is easy:

    1) Let A = 0
    2) For each stream element: if the element is (i,j), then A ← A + zij/m

  • Bad News: Can’t compute the second term!
  • Good News: Use a bilinear sketch: if zij = xi·yj for x, y ∈ {−1, 1}^n, then

    z·s = Σ_{i,j} zij sij = (x·p)(y·q)

    i.e., the product of the sketches is a sketch of the product.
  • Bad News: z is no longer 4-wise independent, even if x and y are fully random, e.g.,

    z11 z12 z21 z22 = (x1)²(x2)²(y1)²(y2)² = 1
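A one-pass version of this plan might look as follows (hypothetical helper name; the bilinear trick means only the two n-dimensional marginal sketches x·p and y·q are needed for z·s, and this computes a single estimate T, which the algorithm then repeats):

```python
import random

def bilinear_sketch(stream, n, seed=0):
    """One pass over the stream: with z_ij = x_i * y_j, maintain A = z.r
    incrementally and the marginal sketches x.p and y.q; then use
    z.s = (x.p)(y.q) and return one estimate T = (z.r - z.s)^2."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(n)]
    y = [rng.choice([-1, 1]) for _ in range(n)]
    m = len(stream)
    zr = xp = yq = 0.0
    for (i, j) in stream:              # pairs with coordinates in 1..n
        zr += x[i - 1] * y[j - 1] / m  # A <- A + z_ij / m
        xp += x[i - 1] / m             # sketch of the marginal p
        yq += y[j - 1] / m             # sketch of the marginal q
    return (zr - xp * yq) ** 2

stream = [(3, 5), (5, 3), (2, 7), (3, 4), (7, 1), (1, 2), (3, 9), (6, 6)]
print(bilinear_sketch(stream, 9))
# For a stream whose empirical joint equals the product of its marginals,
# the estimate is exactly 0:
print(bilinear_sketch([(1, 1), (1, 2), (2, 1), (2, 2)], 2))  # 0.0
```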

SLIDES 32-39

Still Get Low Variance

  • Lemma: The variance has at most tripled.
  • Proof:

    z = [ x1y1  x2y1  ...  xny1 ]
        [ x1y2  x2y2  ...  xny2 ]
        [  ...   ...  ...   ... ]
        [ x1yn  x2yn  ...  xnyn ]

    • The product of four entries is biased iff the entries lie in a rectangle.
    • Hence,

      Var[T] ≤ Σ_{(i1,j1),(i2,j2),(i3,j3),(i4,j4) in a rectangle} a_{i1,j1} a_{i2,j2} a_{i3,j3} a_{i4,j4} ≤ 3 E[T]²

      since 2 a_{i1,j1} a_{i2,j2} a_{i3,j3} a_{i4,j4} ≤ (a_{i1,j1} a_{i2,j2})² + (a_{i3,j3} a_{i4,j4})² and a rectangle is uniquely specified by a diagonal.
    • Less independence is also useful for range-sums. [Rusu, Dobra ’06]
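For small n one can enumerate every x, y ∈ {−1,1}^n and confirm both that the bilinear z still gives E[T] = (L2(r−s))² and that the lemma’s bound E[T²] ≤ 3 E[T]² holds. A brute-force check with toy values (not from the slides):

```python
import itertools
import math

def bilinear_moments(a):
    """Exact E[T] and E[T^2] for T = (sum_ij x_i y_j a_ij)^2, averaging over
    all sign vectors x, y in {-1,1}^n (the bilinear sketch z_ij = x_i y_j)."""
    n = len(a)
    combos = list(itertools.product([-1, 1], repeat=n))
    e_t = e_t2 = 0.0
    for x in combos:
        for y in combos:
            t = sum(x[i] * y[j] * a[i][j]
                    for i in range(n) for j in range(n)) ** 2
            e_t += t
            e_t2 += t * t
    k = len(combos) ** 2
    return e_t / k, e_t2 / k

a = [[0.3, -0.1], [0.2, 0.4]]                 # toy values of a_ij = r_ij - s_ij
l2_sq = sum(v * v for row in a for v in row)
et, et2 = bilinear_moments(a)
print(math.isclose(et, l2_sq), et2 <= 3 * et * et)  # True True
```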

SLIDE 40

Summary of L2 Result

  • Thm: (1+ε)-factor approx. (w/p 1−δ) in Õ(ε⁻² ln δ⁻¹) space.
  • Proof Ideas:
    1) First attempt: Use the AMS technique.
    2) Road block: Can’t sketch the product distribution.
    3) Bilinear sketch: The product of sketches was a sketch of the product!
    4) Panic: No longer 4-wise independent.
    5) Relax: We didn’t need full 4-wise independence.

SLIDE 41

a) Neat Result for L2

b) Sketching Sketches

c) Other Results

SLIDES 42-46

L1 Result

  • Thm: O(ln n)-factor approx. of L1(s−r) in Õ(ln δ⁻¹) space.
  • Why not a (1+ε)-factor using Indyk’s p-stable technique? [Indyk ’00]
  • Review of L1 sketching:
    • Let the entries of z be Cauchy(0,1).
    • Compute the estimator |z·a|.
    • Repeat k = O(ε⁻² ln δ⁻¹) times with different z.
    • Take the median and appeal to concentration lemmas.
  • N.B. If the median were a mean, we’d have an L1 dimensionality-reduction result that doesn’t exist. [Brinkman, Charikar ’03]
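A minimal simulation of this median-of-Cauchy-sketches estimator (the function name and parameter choices are ours; Cauchy(0,1) is sampled via the inverse CDF):

```python
import math
import random

def l1_sketch_estimate(a, k=801, seed=1):
    """Indyk-style L1 estimate: by 1-stability, z.a with Cauchy(0,1) entries
    is distributed as L1(a) times a Cauchy(0,1) variable, and the median of
    |Cauchy(0,1)| is exactly 1, so the median of |z.a| over k independent
    sketches concentrates around L1(a)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(k):
        # Cauchy(0,1) via the inverse CDF: tan(pi * (U - 1/2))
        z = [math.tan(math.pi * (rng.random() - 0.5)) for _ in a]
        estimates.append(abs(sum(zi * ai for zi, ai in zip(z, a))))
    estimates.sort()
    return estimates[k // 2]  # the median of the k estimates

a = [0.25, -0.25, 0.25, -0.25]    # a = s - r, say; L1(a) = 1
print(l1_sketch_estimate(a))      # should be near L1(a) = 1
```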

SLIDES 47-52

Sketching Sketches

  • To sketch the product distribution we need z = y · M(x), where M(x) is a block-diagonal matrix with n copies of x:

    z = ( y ) · [ (x)                 ]
       (1×n)    [      (x)            ]
                [            ...      ]
                [                 (x) ]
                       (n × n²)

  • Sketch = Inner Sketch followed by Outer Sketch:

    Inner Sketch:  R^(n²) → R^n,   a ↦ M(x)·a
    Outer Sketch:  R^n → R,   M(x)·a ↦ y·(M(x)·a)

  • The Problem:
    • Need to take the median of multiple inner sketches before taking the outer sketch.
    • The size of the inner sketch is large.
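The factorization z·a = y·(M(x)·a) is easy to check numerically. The sketch below uses arbitrary ±1 entries purely to verify the algebra of composing the two stages (helper names ours); the actual construction uses (truncated) Cauchy entries:

```python
def inner_sketch(x, a):
    """M(x): R^(n^2) -> R^n; coordinate j collapses column j of a with
    the signs/weights in x, i.e. (M(x)a)_j = sum_i x_i * a_ij."""
    n = len(x)
    return [sum(x[i] * a[i][j] for i in range(n)) for j in range(n)]

def outer_sketch(y, v):
    """R^n -> R: an ordinary linear sketch of the inner-sketch vector."""
    return sum(yj * vj for yj, vj in zip(y, v))

# Verify z.a = y.(M(x).a) for z_ij = x_i * y_j on a toy 3x3 input.
x, y = [1, -1, 1], [-1, 1, 1]
a = [[0.1, 0.2, 0.0], [0.3, -0.1, 0.2], [0.0, 0.1, -0.2]]
composed = outer_sketch(y, inner_sketch(x, a))
direct = sum(x[i] * y[j] * a[i][j] for i in range(3) for j in range(3))
print(abs(composed - direct) < 1e-12)  # True
```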

SLIDES 53-57

L1 Result

  • Thm: O(ln n)-factor approx. of L1(s−r) in Õ(ln δ⁻¹) space.
  • Proof:
    • Outer sketch: Entries of y are Cauchy(0,1).
    • Inner sketch: Entries of x are “truncated” Cauchy(0,1), so that

      Pr[ Ω(1) ≤ |M(x)·a| / |a| ≤ O(log n) ] ≥ 9/10

    • Repeat Õ(ln δ⁻¹) times and take the median.
SLIDE 58

a) Neat Result for L2

b) Sketching Sketches

c) Other Results

SLIDES 59-62

Other Results

  • Mutual Information:
    • Can’t (1+ε)-factor approximate in o(n) space.
    • Can approximate to within ±ε using algorithms for approximating entropy. [Chakrabarti, Cormode, McGregor ’07]
  • Distributed Model:
    • Player 1 sees (3,·), (5,·), (2,·), (3,·), (7,·), (1,·), (3,·), (6,·), ...
    • Player 2 sees (·,5), (·,3), (·,7), (·,4), (·,1), (·,2), (·,9), (·,6), ...
    • Very hard in general, e.g., can’t even check whether L1(s−r) = 0.
  • Additive Approximation for L1(s−r):
    • Uses the identity L1(s − r) = Σ_i pi L1(q − qi), where qi is q conditioned on the first term equaling i. [Guha, McGregor, Venkatasubramanian ’06]
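The identity underlying the additive approximation follows from sij − rij = pi(qj − (qi)j), and can be checked directly on an empirical stream (helper name ours):

```python
from collections import Counter

def check_identity(stream, n):
    """Check L1(s - r) = sum_i p_i * L1(q - q_i) on an empirical stream,
    where q_i is the distribution of the second term given first term i."""
    m = len(stream)
    joint = Counter(stream)
    p = [sum(joint[(i, j)] for j in range(1, n + 1)) / m for i in range(1, n + 1)]
    q = [sum(joint[(i, j)] for i in range(1, n + 1)) / m for j in range(1, n + 1)]
    # Left-hand side: L1 distance between product and joint.
    lhs = sum(abs(p[i - 1] * q[j - 1] - joint[(i, j)] / m)
              for i in range(1, n + 1) for j in range(1, n + 1))
    # Right-hand side: p-weighted L1 distances to the conditionals q_i.
    rhs = 0.0
    for i in range(1, n + 1):
        if p[i - 1] == 0:
            continue
        q_i = [joint[(i, j)] / m / p[i - 1] for j in range(1, n + 1)]
        rhs += p[i - 1] * sum(abs(q[j - 1] - q_i[j - 1]) for j in range(1, n + 1))
    return lhs, rhs

lhs, rhs = check_identity([(1, 1), (1, 2), (2, 2), (2, 2)], 2)
print(abs(lhs - rhs) < 1e-12)  # True
```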

SLIDE 63

Main Results

  • Can estimate L2(r−s) well using a neat extension of the AMS sketch.
  • Can estimate L1(r−s) up to an O(log n) factor using p-stable distributions.
  • Can estimate mutual information additively using entropy algorithms.

Questions?