Declaring Independence via the Sketching of Sketches
Piotr Indyk (Massachusetts Institute of Technology)
Andrew McGregor (University of California, San Diego)
Until August ’08 -- Hire Me!
Center for Disease Control (CDC) has massive amounts of data on disease occurrences and their locations. “How correlated is your zip code to the diseases you’ll catch this year?”
Image from http://www.cdc.gov/flu/weekly/weeklyarchives2006-2007/images/usmap02.jpg
How many samples are required to distinguish independence from being “ε-far” from independence? [Batu et al. ’01], [Alon et al. ’07], [Valiant ’08]
Pairs are accessed sequentially, or “online”, with limited memory.
(3,5), (5,3), (2,7), (3,4), (7,1), (1,2), (3,9), (6,6), ...
$L_1(s - r) = \sum_{i,j} |s_{ij} - r_{ij}|$
$L_2(s - r) = \sqrt{\sum_{i,j} (s_{ij} - r_{ij})^2}$
$I(s, r) = H(p) - H(p \mid q)$
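As a reference point for the three measures above, here is a minimal non-streaming sketch that computes them exactly from a list of pairs. It assumes r is the empirical joint distribution of the pairs, p and q are the empirical marginals of the first and second coordinates, and s is the product distribution with s_ij = p_i q_j. It also reads H(p|q) as the conditional entropy of the first coordinate given the second, so that H(p|q) = H(r) − H(q). The function name `empirical_measures` is made up for this illustration.

```python
from collections import Counter
from math import log2, sqrt


def empirical_measures(stream):
    """Return (L1(s - r), L2(s - r), I(s, r) in bits) for a list of (i, j) pairs."""
    m = len(stream)
    # Empirical joint distribution r and marginals p, q (assumed definitions).
    r = {pair: c / m for pair, c in Counter(stream).items()}
    p = {i: c / m for i, c in Counter(i for i, _ in stream).items()}
    q = {j: c / m for j, c in Counter(j for _, j in stream).items()}

    l1 = 0.0
    l2_squared = 0.0
    # Every cell where r or s is nonzero has p_i > 0 and q_j > 0.
    for i, p_i in p.items():
        for j, q_j in q.items():
            diff = p_i * q_j - r.get((i, j), 0.0)  # s_ij - r_ij
            l1 += abs(diff)
            l2_squared += diff * diff

    def entropy(dist):
        return -sum(v * log2(v) for v in dist.values() if v > 0)

    # H(p) - H(p|q), using H(p|q) = H(r) - H(q).
    mutual_information = entropy(p) + entropy(q) - entropy(r)
    return l1, sqrt(l2_squared), mutual_information
```

This uses memory proportional to the number of distinct pairs, which is exactly what the streaming setting tries to avoid. It is useful only as ground truth to compare a sketch against, for example on the pairs (3,5), (5,3), (2,7), ... shown above.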
(3,5), (5,3), (2,7), (3,4), (7,1), (1,2), (3,9), (6,6), ...
L1(s − r) =
i,j |sij − rij|
L2(s − r) = √
i,j(sij − rij)2
I(s, r) = H(p) − H(p|q)
Let $z \in \{-1, 1\}^{n \times n}$ where the $z_{ij}$ are unbiased and 4-wise independent. [Alon, Matias, Szegedy ’96]
Estimator: $T = (z \cdot r - z \cdot s)^2$
Write $a_{ij} = r_{ij} - s_{ij}$. Then:
$E[T] = \sum_{i_1,j_1,i_2,j_2} E[z_{i_1 j_1} z_{i_2 j_2}] \, a_{i_1 j_1} a_{i_2 j_2} = (L_2(r - s))^2$
$\mathrm{Var}[T] \le E[T^2] = \sum_{i_1,j_1,i_2,j_2,i_3,j_3,i_4,j_4} E[z_{i_1 j_1} z_{i_2 j_2} z_{i_3 j_3} z_{i_4 j_4}] \, a_{i_1 j_1} a_{i_2 j_2} a_{i_3 j_3} a_{i_4 j_4} \le E[T]^2$
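The expectation claim can be checked exactly on a tiny instance by averaging T over every sign pattern. The sketch below assumes fully independent uniform signs, which are in particular unbiased and 4-wise independent, and takes the difference table a_ij = r_ij − s_ij as a flat list. The function name `exact_mean_and_variance` is hypothetical.

```python
from itertools import product


def exact_mean_and_variance(a):
    """Average T = (z . a)^2 over every z in {-1, 1}^len(a); return (E[T], Var[T])."""
    total_t = 0.0
    total_t_squared = 0.0
    count = 0
    for z in product((-1, 1), repeat=len(a)):
        t = sum(z_k * a_k for z_k, a_k in zip(z, a)) ** 2
        total_t += t
        total_t_squared += t * t
        count += 1
    mean = total_t / count
    variance = total_t_squared / count - mean * mean
    return mean, variance


# Difference table for the stream (0,0), (1,1): r - s flattened row by row.
mean, variance = exact_mean_and_variance([0.25, -0.25, -0.25, 0.25])
print(mean, sum(v * v for v in [0.25, -0.25, -0.25, 0.25]))  # both equal (L2)^2
```

Enumeration costs 2^(n·n) steps, so this is only a sanity check for very small tables. The variance is returned so it can be compared by hand with the bound on T.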
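Need to compute $z \cdot r$ and $z \cdot s$ from the stream.
Computing $z \cdot r$:
1) Let A = 0
2) For each stream element:
2.1) If stream element = (i,j) then A ← A + $z_{ij}$/m
Computing $z \cdot s$ with a bilinear sketch: take $x, y \in \{-1, 1\}^n$ and set $z_{ij} = x_i y_j$, so that
$z \cdot s = \sum_{ij} z_{ij} s_{ij} = (x \cdot p)(y \cdot q)$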
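The steps above can be sketched as a single pass over the stream. This is an illustration under several assumptions, not the authors' implementation: x and y are stored explicitly as fully random sign arrays (a small-space version would generate them from a hash family), indices run from 0 to n−1, the stream length m is known up front, and independent copies are combined with a plain average. The function names are made up.

```python
import random


def bilinear_sketch_estimate(stream, n, rng):
    """One copy of T = (z.r - z.s)^2 with z_ij = x_i * y_j, computed in one pass."""
    m = len(stream)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    y = [rng.choice((-1, 1)) for _ in range(n)]
    z_dot_r = 0.0  # the accumulator A from the steps above
    x_dot_p = 0.0  # sketch of the first-coordinate marginal p
    y_dot_q = 0.0  # sketch of the second-coordinate marginal q
    for i, j in stream:
        z_dot_r += x[i] * y[j] / m
        x_dot_p += x[i] / m
        y_dot_q += y[j] / m
    z_dot_s = x_dot_p * y_dot_q  # product of sketches = sketch of product
    return (z_dot_r - z_dot_s) ** 2


def estimate_l2_squared(stream, n, copies=200, seed=0):
    """Average several independent copies of T to estimate (L2(r - s))^2."""
    rng = random.Random(seed)
    total = sum(bilinear_sketch_estimate(stream, n, rng) for _ in range(copies))
    return total / copies
```

Each copy keeps only three running numbers besides x and y. With the pairs from the stream shown earlier, a call such as `estimate_l2_squared(stream, n=10)` would work directly, since indices 1 through 9 fit in arrays of length 10.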
But z is no longer 4-wise independent, even if x and y are fully random, e.g.,
$z_{11} z_{12} z_{21} z_{22} = (x_1)^2 (x_2)^2 (y_1)^2 (y_2)^2 = 1$
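The loss of 4-wise independence is easy to see by enumeration. The sketch below compares the average of z11·z12·z21·z22 when z_ij = x_i y_j against the same average when all four entries are independent signs. Both function names are hypothetical and n = 2 is chosen only to keep the enumeration tiny.

```python
from itertools import product


def mean_rectangle_product_bilinear():
    """Average z11*z12*z21*z22 over all x, y in {-1, 1}^2 with z_ij = x_i * y_j."""
    values = []
    for x in product((-1, 1), repeat=2):
        for y in product((-1, 1), repeat=2):
            z = [[x_i * y_j for y_j in y] for x_i in x]
            values.append(z[0][0] * z[0][1] * z[1][0] * z[1][1])
    return sum(values) / len(values)


def mean_rectangle_product_independent():
    """Same average when the four entries are independent uniform signs."""
    values = [z11 * z12 * z21 * z22
              for z11, z12, z21, z22 in product((-1, 1), repeat=4)]
    return sum(values) / len(values)


print(mean_rectangle_product_bilinear())     # 1.0: the product is always 1
print(mean_rectangle_product_independent())  # 0.0: what 4-wise independence would give
```

Four entries at the corners of a rectangle are therefore the terms that need extra care in the variance calculation.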
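Variance has at most tripled.
$$z = \begin{pmatrix} x_1 y_1 & x_2 y_1 & \cdots & x_n y_1 \\ x_1 y_2 & x_2 y_2 & \cdots & x_n y_2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1 y_n & x_2 y_n & \cdots & x_n y_n \end{pmatrix}$$
$2 a_{i_1 j_1} a_{i_2 j_2} a_{i_3 j_3} a_{i_4 j_4} \le (a_{i_1 j_1} a_{i_2 j_2})^2 + (a_{i_3 j_3} a_{i_4 j_4})^2$
$\mathrm{Var}[T] \le \sum_{(i_1,j_1),(i_2,j_2),(i_3,j_3),(i_4,j_4) \text{ in rectangle}} a_{i_1 j_1} a_{i_2 j_2} a_{i_3 j_3} a_{i_4 j_4} \le 3 E[T]^2$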
1) First attempt: Use AMS technique.
2) Road block: Can’t sketch product distribution.
3) Bilinear sketch: Product of sketches was sketch of product!
4) PANIC: No longer 4-wise independence.
5) Relax: We didn’t need full 4-wise independence.
Estimating L1 the same way would need a dimensionality reduction result that doesn’t exist. [Brinkman, Charikar ’03]
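$$z = (\; y \;) \begin{pmatrix} (\; x \;) & & & \\ & (\; x \;) & & \\ & & \ddots & \\ & & & (\; x \;) \end{pmatrix} = y M_x$$
$M_x : \mathbb{R}^{n^2} \to \mathbb{R}^n$, $a \mapsto M_x a$
$y : \mathbb{R}^n \to \mathbb{R}$, $M_x a \mapsto y M_x a$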
taking outer sketch.
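The two maps above compose into a single linear sketch. The following illustration assumes that M_x is block-diagonal with one copy of x per block and that the table a is stored as a list of n rows, so that M_x a is the vector of per-row dot products with x. The entries of x and y are left as plain inputs, and all three function names are made up.

```python
def inner_sketch(a, x):
    """Apply M_x: map an n x n table a (list of rows) to the n values x . a[i]."""
    return [sum(x_j * a_ij for x_j, a_ij in zip(x, row)) for row in a]


def outer_sketch(v, y):
    """Apply y: map a length-n vector v to the single number y . v."""
    return sum(y_i * v_i for y_i, v_i in zip(y, v))


def direct_sketch(a, x, y):
    """Compute z . a in one step, with z_ij = y_i * x_j."""
    return sum(y[i] * x[j] * a[i][j]
               for i in range(len(y)) for j in range(len(x)))


a = [[1, 2], [3, 4]]
x = [1, -1]
y = [1, 1]
print(outer_sketch(inner_sketch(a, x), y))  # -2
print(direct_sketch(a, x, y))               # -2
```

Keeping the two stages separate leaves room for a step between the inner and outer sketch, which a single combined dot product would not allow.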
Pr
|a|
≤ O(log n)
Venkatasubramanian ’06]
$L_1(p - q) = \sum_i p_i L_1(q - q_i)$
Can estimate L2(r-s) well using neat extension of AMS sketch.
Can estimate L1(r-s) up to O(log n) factor using p-stable distributions.
Can estimate mutual information additively using entropy algorithms.
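To give a feel for the p-stable ingredient, here is the standard 1-stable (Cauchy) sketch applied to a plain vector. It is not the nested construction from the talk and carries no O(log n) guarantee for product distributions. It relies on the fact that a Cauchy-weighted sum of the a_i is distributed as (Σ|a_i|) times a standard Cauchy variable, whose absolute value has median 1. The function name, the number of copies and the seed are arbitrary choices made for this illustration.

```python
import math
import random
from statistics import median


def cauchy_l1_estimate(a, copies=2001, seed=0):
    """Estimate sum(|a_i|) as the median of |c . a| over independent Cauchy vectors c."""
    rng = random.Random(seed)
    projections = []
    for _ in range(copies):
        # Standard Cauchy samples via the inverse CDF: tan(pi * (U - 1/2)).
        dot = sum(a_i * math.tan(math.pi * (rng.random() - 0.5)) for a_i in a)
        projections.append(abs(dot))
    return median(projections)


# Difference table for the stream (0,0), (1,1), flattened; its exact L1 norm is 1.0.
print(cauchy_l1_estimate([0.25, -0.25, -0.25, 0.25]))
```

The median is used instead of the mean because Cauchy variables have no finite expectation, so averaging the projections would not concentrate.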