

slide-1
SLIDE 1

Sketching and Streaming for Distributions

Piotr Indyk (Massachusetts Institute of Technology)
Andrew McGregor (University of California, San Diego)

Main Material:

  • Stable distributions, pseudo-random generators, embeddings, and data stream computation. Piotr Indyk (FOCS 2000)
  • Sketching information divergences. Sudipto Guha, Piotr Indyk, Andrew McGregor (COLT 2007)
  • Declaring independence via the sketching of sketches. Piotr Indyk, Andrew McGregor (SODA 2008)

slide-6
SLIDE 6

The Problem

  • List of m red values and m green values in [n]

3,5,3,7,5,4,8,5,3,7,5,4,8,6,3,2,6,4,7,3,4, ...

  • Define distributions (p1, ..., pn) and (q1, ..., qn)
  • How “different” are p and q?

Variational: $\sum_i |p_i - q_i|$
Euclidean: $\sum_i (p_i - q_i)^2$
Hellinger: $\sum_i (\sqrt{p_i} - \sqrt{q_i})^2$
Kullback-Leibler: $\sum_i p_i \log(p_i/q_i)$
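As a concrete illustration of these four measures, here is a minimal Python sketch. The function names and the example counts are assumptions made for illustration, and each measure is taken as a sum over i, which is the usual definition.

```python
import math

def normalize(counts):
    """Turn a list of counts into an empirical distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def variational(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def euclidean_sq(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def hellinger(p, q):
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def kullback_leibler(p, q):
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf   # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

# Hypothetical counts for red (p) and green (q) values over [3].
p = normalize([2, 1, 1])
q = normalize([1, 1, 2])
results = {
    "variational": variational(p, q),
    "euclidean_sq": euclidean_sq(p, q),
    "hellinger": hellinger(p, q),
    "kullback_leibler": kullback_leibler(p, q),
}
```

Note that all four are computed here from the full vectors p and q, which is exactly what the streaming setting rules out.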

slide-7
SLIDE 7

The Problem

  • How “different” are p and q? General divergence families:

f-divergence: $D_f(p, q) = \sum_i p_i f(q_i/p_i)$
Bregman divergence: $B_F(p, q) = \sum_i [F(p_i) - F(q_i) - (p_i - q_i) F'(q_i)]$

where f and F are convex and f(1) = 0.

slide-12
SLIDE 12

The Catch...

  • What if m and n are huge and you can’t store the list?
  • Applications: monitoring internet traffic, I/O efficient external memory, processing huge log files, database query planning, sensor networks, ...
  • Data Stream Model:
  • No control over the order of the stream
  • Limited working memory, e.g., polylog(n,m) space
  • Limited time to process each element
  • Previous work: quantiles, frequency moments, histograms, clustering, entropy, graph problems, ...

see, e.g., Muthukrishnan “Data Streams: Algorithms and Applications”

slide-16
SLIDE 16

Today’s Talk

  • Sketching Lp distances (0<p≤2):
  • (1+ε)-approx. with prob. 1-δ in Õ(ε^{-2} ln δ^{-1}) space
  • Stable distributions and pseudo-random generators
  • Stable distributions, pseudo-random generators, embeddings & data stream computation (Indyk, FOCS 2000)
  • Impossibility of Extending to Other Divergences:
  • Can we sketch other divergences such as Hellinger?
  • Lower bounds via communication complexity
  • Sketching information divergences (Guha, Indyk, McGregor, COLT 2007)
  • Using sketches to test independence:
  • Testing independence between data streams
  • Declaring independence via the sketching of sketches (Indyk, McGregor, SODA 2008)

slide-17
SLIDE 17
  • 1. Sketching Lp distances

p-stable distributions, pseudo-random generators

  • 2. The Unsketchables

information divergences, communication complexity

  • 3. Sketching Sketches

identifying correlations in data streams

slide-21
SLIDE 21

Stable Distributions

  • A p-stable distribution μ has the following property:

If X, Y, Z ∼ μ and a, b ∈ R then: aX + bY ∼ (|a|^p + |b|^p)^{1/p} Z

  • Examples:

Normal(0,1) is 2-stable, with density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
Cauchy is 1-stable, with density $\frac{1}{\pi} \cdot \frac{1}{1 + x^2}$
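The stability property can be checked empirically. The following is a minimal Python sketch under stated assumptions: the coefficients a and b, the sample size, and the seed are arbitrary illustrative choices. For the Cauchy case it compares the median of |aX + bY| with |a| + |b| (the median of a standard Cauchy in absolute value is 1), and for the Normal case it compares the root mean square of aX + bY with sqrt(a^2 + b^2).

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy sample via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(1)          # fixed seed, chosen arbitrarily
a, b, trials = 3.0, -2.0, 20000

cauchy_mix = [a * cauchy(rng) + b * cauchy(rng) for _ in range(trials)]
normal_mix = [a * rng.gauss(0, 1) + b * rng.gauss(0, 1) for _ in range(trials)]

# p = 1: aX + bY should look like (|a| + |b|) Z, so its median absolute
# value should be close to |a| + |b|.
cauchy_scale = statistics.median(abs(v) for v in cauchy_mix)

# p = 2: aX + bY should look like sqrt(a^2 + b^2) Z, so its root mean
# square should be close to sqrt(a^2 + b^2).
normal_scale = math.sqrt(sum(v * v for v in normal_mix) / trials)
```

The two scales are read off differently because the Cauchy distribution has no finite variance, which is also why the estimator on the following slides uses a median rather than a mean.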

slide-27
SLIDE 27

Approximating L1 and L2

  • Let μ be a p-stable distribution (0<p≤2)
  • Ideal Algorithm:

For i = 1 to k:
    Let x be a length-n vector with each x_j ~ μ
    Compute t_i = |x.(p-q)|
Return median(t_1, t_2, ..., t_k)/median(|μ|)

Easy to compute x.(p-q): add x_j for each red value j, subtract x_j for each green value j, and scale by 1/m. For the stream 3,5,3,7,5, ... this gives x_3 - x_5 + x_3 - x_7 - x_5 - ...

  • Lemma: Returns (1±ε) L_p(p-q) with prob. 1-δ, if k = Õ(ε^{-2} ln δ^{-1}).
  • Proof:
  • Each t_i ~ L_p(p-q) |μ| by the p-stability property.
  • Apply Chernoff bounds.
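The ideal algorithm can be sketched in Python for the case p = 1, where μ is the standard Cauchy distribution and median(|μ|) = 1. This is an illustrative sketch only: the function names, the seed, and the red and green streams are hypothetical, and the random vectors are stored explicitly, so it makes no attempt to be space-efficient.

```python
import math
import random
import statistics
from collections import Counter

def cauchy(rng):
    """Standard Cauchy (1-stable) sample via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def l1_estimate(red, green, n, k, seed=0):
    """Median of k projections |x.(p-q)|, each with fresh Cauchy entries."""
    rng = random.Random(seed)
    ts = []
    for _ in range(k):
        x = [cauchy(rng) for _ in range(n + 1)]   # x[1..n]; x[0] is unused
        dot = 0.0
        for j in red:                              # red values define p
            dot += x[j] / len(red)
        for j in green:                            # green values define q
            dot -= x[j] / len(green)
        ts.append(abs(dot))
    return statistics.median(ts)                   # median(|Cauchy|) is 1

def l1_exact(red, green):
    """Reference value computed from the full frequency vectors."""
    cr, cg = Counter(red), Counter(green)
    keys = set(cr) | set(cg)
    return sum(abs(cr[j] / len(red) - cg[j] / len(green)) for j in keys)

# Hypothetical streams of m = 10 red and m = 10 green values in [8].
red = [3, 3, 5, 4, 8, 3, 7, 5, 4, 8]
green = [5, 7, 5, 6, 3, 2, 6, 4, 7, 3]
estimate = l1_estimate(red, green, n=8, k=2000)
exact = l1_exact(red, green)
```

Each stream element only touches one coordinate of each x, so the k running dot products are all the state that depends on the stream.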
slide-38
SLIDE 38

Sketches and Space

  • Sketch/Embedding into Small Dimension:
  • Let x^1, x^2, ..., x^k be length-n vectors with x^i_j ~ μ
  • Let C(y) = (x^1.y, ..., x^k.y)
  • Approximate L_1(p-q) from C(p) and C(q)
  • CAUTION: Not an embedding into a normed space.
  • Can we also construct the sketch in small space?
  • Storing all x^i requires Ω(nk) space.
  • Generate x^i with Nisan’s pseudo-random generator.
  • Can store the seed in O(polylog n) space.
  • Thm: Can (1+ε)-approx L_p(p-q) in Õ(ε^{-2} ln δ^{-1}) space.
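The idea of regenerating entries from a short seed instead of storing them can be illustrated as follows. This is a sketch under loud assumptions: `entry` re-seeds Python's built-in generator from the triple (seed, i, j), which is only a stand-in for Nisan's generator and carries none of its guarantees; the names and streams are hypothetical. What it does show is that C is linear, so C(p) and C(q) can be built separately (in any stream order) and combined afterwards.

```python
import math
import random
import statistics

def entry(seed, i, j):
    """Regenerate x^i_j on demand from (seed, i, j).

    Stand-in only: this is NOT Nisan's pseudo-random generator.
    """
    rng = random.Random(f"{seed}:{i}:{j}")
    return math.tan(math.pi * (rng.random() - 0.5))   # standard Cauchy

def sketch(stream, k, seed):
    """C(y) for the empirical distribution y of `stream`: k counters."""
    c = [0.0] * k
    for j in stream:
        for i in range(k):
            c[i] += entry(seed, i, j) / len(stream)
    return c

def l1_from_sketches(cp, cq):
    """C(p) - C(q) = C(p - q), so take the median of |differences|."""
    return statistics.median(abs(u - v) for u, v in zip(cp, cq))

# Hypothetical streams; both sketches must share the same seed.
red = [3, 3, 5, 4, 8, 3, 7, 5, 4, 8]
green = [5, 7, 5, 6, 3, 2, 6, 4, 7, 3]
cp = sketch(red, k=1500, seed=42)
cq = sketch(green, k=1500, seed=42)
estimate = l1_from_sketches(cp, cq)
```

Only the seed and the k counters are kept per stream; the n-by-k matrix of random entries never exists in memory.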
slide-39
SLIDE 39
  • 1. Sketching Lp distances

p-stable distributions, pseudo-random generators

  • 2. The Unsketchables

information divergences, communication complexity

  • 3. Sketching Sketches

identifying correlations in data streams

slide-48
SLIDE 48

Results

  • Thm (Shift Invariance): t-approx. of $\sum_i \phi(p_i, q_i)$ needs Ω(n) space if for some a, b, c and m = an/4 + bn + cn/2,

$\phi\left(\frac{a}{m}, \frac{a+c}{m}\right) > \frac{t^2 n}{4} \left[ \phi\left(\frac{b+c}{m}, \frac{b}{m}\right) + \phi\left(\frac{b}{m}, \frac{b+c}{m}\right) \right]$

  • Corollary: Poly(n)-approx. of D_f requires Ω(n) space if f is twice differentiable and strictly convex.
  • Corollary: Poly(n)-approx. of B_F requires Ω(n) space if there exist ρ, z_0 with

$\forall\, 0 \le z_2 \le z_1 \le z_0:\ F''(z_1)/F''(z_2) \ge (z_1/z_2)^{\rho}$

or

$\forall\, 0 \le z_2 \le z_1 \le z_0:\ F''(z_1)/F''(z_2) \le (z_2/z_1)^{\rho}$

  • Only exceptions are L1 and L2!

BREAKING NEWS: Many of these lower bounds also apply for randomly ordered streams [Chakrabarti, Cormode, McGregor 2007]
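To see why L1 and L2 escape the theorem, one can evaluate its condition numerically: φ(a/m, (a+c)/m) > (t²n/4) [φ((b+c)/m, b/m) + φ(b/m, (b+c)/m)] with m = an/4 + bn + cn/2. The Python sketch below is illustrative only; the function name and the values of n, t, a, b, c are assumptions. Failing the check for particular values just means the theorem does not apply there; it is not by itself a proof that a measure can be sketched.

```python
import math

def condition_holds(phi, a, b, c, n, t):
    """Evaluate the Shift Invariance condition for given a, b, c, n, t."""
    m = a * n / 4 + b * n + c * n / 2
    lhs = phi(a / m, (a + c) / m)
    rhs = (t * t * n / 4) * (phi((b + c) / m, b / m) + phi(b / m, (b + c) / m))
    return lhs > rhs

def hellinger_term(x, y):
    return (math.sqrt(x) - math.sqrt(y)) ** 2

def l1_term(x, y):
    return abs(x - y)

def l2_sq_term(x, y):
    return (x - y) ** 2

n, t = 1000, 2

# Hellinger: a shift of c/m matters much more near a/m than near b/m when
# b is large, so the condition can be met (here with b = t^2 * n).
hellinger_ok = condition_holds(hellinger_term, a=1, b=t * t * n, c=1, n=n, t=t)

# L1 and squared L2 depend only on the difference x - y, so both sides see
# the same shift and the factor t^2 * n / 4 can never be overcome.
l1_ok = any(condition_holds(l1_term, 1, b, 1, n, t) for b in (1, 10, 1000, 10**6))
l2_ok = any(condition_holds(l2_sq_term, 1, b, 1, n, t) for b in (1, 10, 1000, 10**6))
```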

slide-50
SLIDE 50

Alice: x ∈ {0,1}^n, weight n/4
Bob: y ∈ {0,1}^n, weight n/4

Question: Are x and y disjoint, i.e., x.y = 0?
Thm (Razborov ’92): Needs Ω(n) communication.

slide-55
SLIDE 55

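Alice: x ∈ {0,1}^n, weight n/4
Bob: y ∈ {0,1}^n, weight n/4

Alice's stream: a x_i + b(1-x_i) copies of red i and green i (i ∈ [n]); b copies of red i+n and green i+n (i ∈ [n/4])
Bob's stream: c y_i copies of green i (i ∈ [n]); c copies of red i+n (i ∈ [n/4])

If x.y = 0 then divergence is

$\frac{n}{4} \left[ \phi\left(\frac{b}{m}, \frac{b+c}{m}\right) + \phi\left(\frac{b+c}{m}, \frac{b}{m}\right) \right]$

If x.y = 1 then divergence is at least

$\phi\left(\frac{a}{m}, \frac{a+c}{m}\right)$

  • Factor t^2 difference by assumption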
slide-57
SLIDE 57

Let A be a t-approx algorithm:

  • 1. Alice runs A on first half
  • 2. Transmits memory state
  • 3. Bob instantiates A
  • 4. Continues A on second half

Thm: Any t-approx algorithm for the divergence requires Ω(n) memory.

slide-65
SLIDE 65

Corollary to Df

  • Corollary: Any poly(n) approx. of $D_f(p, q) = \sum_i p_i f(q_i/p_i)$ requires Ω(n) space if f'' exists and is strictly positive.
  • Proof: Set a = c = 1 and b = t^2 n (f''(1)+1)/(8 f(2)).

Take Taylor expansion of f around 1:

$\phi(b/m, (b+c)/m) = (b/m) \left[ f(1) + f'(1)/b + f''(1+\gamma)/(2b^2) \right] \le (f''(1)+1)/(2mb) \le 8\,\phi(a/m, (a+c)/m)/(t^2 n)$

Similarly,

$\phi((b+c)/m, b/m) \le 8\,\phi(a/m, (a+c)/m)/(t^2 n)$

Result follows by the Shift Invariance Theorem.

slide-68
SLIDE 68

Additive Error Algorithms

  • f-Divergences:

Thm: If D_f bounded, O_ε(ln δ^{-1}) space is sufficient for ±ε approx. with prob. 1-δ.
Thm: If D_f unbounded, need Ω(n) space.

  • Bregman Divergences:

Thm: If F(0), F'(0), and F''(.) exist, O_ε(ln δ^{-1}) space is sufficient for ±ε approx. with prob. 1-δ.
Thm: If F(0) or F'(0) infinite, need Ω(n) space.

slide-69
SLIDE 69
  • 1. Sketching Lp distances

p-stable distributions, pseudo-random generators

  • 2. The Unsketchables

information divergences, communication complexity

  • 3. Sketching Sketches

identifying correlations in data streams

slide-76
SLIDE 76

New Problem

  • List of m pairs in [n] × [n]:

(3,5), (5,3), (2,7), (3,4), (7,1), (1,2), (3,9), (6,6), ...

  • Stream defines random variables:
  • Xp with distribution (p_1, ..., p_n)
  • Xq with distribution (q_1, ..., q_n)
  • (Xp, Xq) with distribution (r_11, r_12, ..., r_nn)
  • “How independent are Xp and Xq?”
  • Is the joint distribution “far” from the product distribution (s_11, s_12, ..., s_nn), where s_ij = p_i q_j?
  • Consider L_1(s-r) or L_2(s-r) or mutual information:

$I(X_p; X_q) = H(X_p) + H(X_q) - H(X_p, X_q) = \sum_{i,j} r_{ij} \lg \frac{r_{ij}}{p_i q_j}$
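For reference, the three quantities can be computed exactly when the whole list of pairs fits in memory. The Python sketch below is a non-streaming baseline with hypothetical names. It takes the product distribution as s_ij = p_i q_j and the mutual information as the sum over i, j of r_ij lg(r_ij / (p_i q_j)), which is what H(Xp) + H(Xq) − H(Xp, Xq) expands to.

```python
import math
from collections import Counter

def independence_measures(pairs):
    """Return (L1(s-r), L2(s-r), I(Xp; Xq)) for a list of (i, j) pairs."""
    m = len(pairs)
    r = {ij: cnt / m for ij, cnt in Counter(pairs).items()}
    p = {i: cnt / m for i, cnt in Counter(i for i, _ in pairs).items()}
    q = {j: cnt / m for j, cnt in Counter(j for _, j in pairs).items()}
    l1 = l2_sq = info = 0.0
    # Cells outside the support of p x q have r_ij = s_ij = 0 and add nothing.
    for i in p:
        for j in q:
            s_ij = p[i] * q[j]
            r_ij = r.get((i, j), 0.0)
            l1 += abs(s_ij - r_ij)
            l2_sq += (s_ij - r_ij) ** 2
            if r_ij > 0:
                info += r_ij * math.log2(r_ij / s_ij)
    return l1, math.sqrt(l2_sq), info

# Two toy streams: one where the coordinates are independent, one where the
# second coordinate always equals the first.
independent = independence_measures([(1, 1), (1, 2), (2, 1), (2, 2)])
correlated = independence_measures([(1, 1), (2, 2)])
```

This baseline stores up to n² joint frequencies, which is the cost the sketching approach is meant to avoid.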

slide-77
SLIDE 77

Results

  • Estimating L_2(s-r):
  • Thm: (1+ε)-factor approx. (w/p 1-δ) in Õ(ε^{-2} ln δ^{-1}) space.
  • Estimating L_1(s-r):
  • Thm: O(ln n)-factor approx. (w/p 1-δ) in Õ(ln δ^{-1}) space.
  • Thm: ±ε approx. (w/p 1-δ) in Õ(ε^{-4} ln δ^{-1}) space (2-pass).
  • Estimating I(Xp; Xq):
  • Thm: No 5/4-factor approx. (w/p 4/5) in o(n) space.
  • Thm: ±ε approx. (w/p 1-δ) in Õ(ε^{-2} ln δ^{-1}) space.
slide-85
SLIDE 85

L2 Sketching Revisited

  • Let x ∈ {-1,1}^n where the x_i are unbiased and 4-wise independent.
  • Compute x.(p-q):

$E[(x.(p-q))^2] = \sum_{i,j} E[x_i x_j](p_i - q_i)(p_j - q_j) = (L_2(p-q))^2$

$Var[(x.(p-q))^2] \le E[(x.(p-q))^4] = \sum_{i,j,k,l} E[x_i x_j x_k x_l](p_i - q_i)(p_j - q_j)(p_k - q_k)(p_l - q_l) \le 3\,(L_2(p-q))^4$

  • Thm: By Chebyshev bounds, averaging the squared values of O(ε^{-2}) repetitions, and taking the median of O(ln δ^{-1}) such averages, yields (1±ε) L_2(p-q) with prob. 1-δ.
  • [Alon, Matias, Szegedy 1996]
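The sign-based estimator can be sketched in Python as follows. Everything named here is an assumption for illustration: `make_sign` draws a random degree-3 polynomial over a prime field and uses the low bit of its value as the sign, a common way to get (very nearly unbiased) 4-wise independent signs from four stored coefficients. For brevity the sketch only averages the squared projections and omits the median step used to boost the success probability.

```python
import math
import random
from collections import Counter

P = (1 << 61) - 1   # a Mersenne prime, used as the field size

def make_sign(rng):
    """Sign function j -> +1/-1 from a random degree-3 polynomial mod P."""
    coeffs = [rng.randrange(P) for _ in range(4)]
    def sign(j):
        h = 0
        for c in coeffs:            # Horner evaluation
            h = (h * j + c) % P
        return 1 if h & 1 else -1
    return sign

def l2_estimate(red, green, k, seed=0):
    """Square root of the average of k squared projections (x.(p-q))^2."""
    rng = random.Random(seed)
    signs = [make_sign(rng) for _ in range(k)]
    dots = [0.0] * k
    for j in red:                   # red values define p
        for i, s in enumerate(signs):
            dots[i] += s(j) / len(red)
    for j in green:                 # green values define q
        for i, s in enumerate(signs):
            dots[i] -= s(j) / len(green)
    return math.sqrt(sum(d * d for d in dots) / k)

def l2_exact(red, green):
    cr, cg = Counter(red), Counter(green)
    keys = set(cr) | set(cg)
    return math.sqrt(sum((cr[j] / len(red) - cg[j] / len(green)) ** 2 for j in keys))

# Hypothetical streams of 10 red and 10 green values.
red = [3, 3, 5, 4, 8, 3, 7, 5, 4, 8]
green = [5, 7, 5, 6, 3, 2, 6, 4, 7, 3]
estimate = l2_estimate(red, green, k=2000)
exact = l2_exact(red, green)
```

Per repetition the state is four coefficients and one running sum, independent of n.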

slide-92
SLIDE 92

Testing L2 Independence

  • Idea: Estimate L_2(r-s) using z ∈ {−1, 1}^{n×n} where the z_ij are unbiased and 4-wise independent.
  • Problem: Can’t compute sketch of product distribution!
  • Solution: Let x, y ∈ {−1, 1}^n be 4-wise independent and set z_ij = x_i y_j. Then

$z.s = \sum_{ij} z_{ij} s_{ij} = (x.p)(y.q)$

  • Entries are no longer 4-wise independent but it’s okay.

Let a_ij = r_ij - s_ij, and consider T = Σ_ij z_ij a_ij:

$Var[T^2] \le E[T^4] = \sum_{i_1,j_1,i_2,j_2,i_3,j_3,i_4,j_4} E[z_{i_1 j_1} z_{i_2 j_2} z_{i_3 j_3} z_{i_4 j_4}]\, a_{i_1 j_1} a_{i_2 j_2} a_{i_3 j_3} a_{i_4 j_4} \le 9\,(L_2(r-s))^4$

  • Repeat O(ε^{-2} ln δ^{-1}) times to deduce (1±ε) L_2(r-s)
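The product construction z_ij = x_i y_j can be sketched in Python as below. The names are hypothetical, the sign functions use random degree-3 polynomials over a prime field as an assumed way of getting 4-wise independent signs, and the median step is omitted. For readability the code loops over the stream once per repetition; a streaming version would instead update all three counters of every repetition as each pair arrives.

```python
import math
import random
from collections import Counter

P = (1 << 61) - 1   # a Mersenne prime, used as the field size

def make_sign(rng):
    """Sign function from a random degree-3 polynomial mod P (low bit)."""
    coeffs = [rng.randrange(P) for _ in range(4)]
    def sign(j):
        h = 0
        for c in coeffs:
            h = (h * j + c) % P
        return 1 if h & 1 else -1
    return sign

def l2_independence_estimate(pairs, k, seed=0):
    rng = random.Random(seed)
    m = len(pairs)
    total = 0.0
    for _ in range(k):
        x, y = make_sign(rng), make_sign(rng)
        zr = xp = yq = 0.0
        for i, j in pairs:
            zr += x(i) * y(j) / m   # z.r, with z_ij = x_i * y_j
            xp += x(i) / m          # x.p
            yq += y(j) / m          # y.q
        t = zr - xp * yq            # T = z.(r - s), since z.s = (x.p)(y.q)
        total += t * t
    return math.sqrt(total / k)

def l2_independence_exact(pairs):
    m = len(pairs)
    r = Counter(pairs)
    p = Counter(i for i, _ in pairs)
    q = Counter(j for _, j in pairs)
    return math.sqrt(sum((r[(i, j)] / m - p[i] * q[j] / m ** 2) ** 2
                         for i in p for j in q))

# The first eight pairs of the example stream shown earlier in the talk.
pairs = [(3, 5), (5, 3), (2, 7), (3, 4), (7, 1), (1, 2), (3, 9), (6, 6)]
estimate = l2_independence_estimate(pairs, k=3000)
exact = l2_independence_exact(pairs)
```

The sketch of the product distribution is never formed directly; it is recovered as the product of the two marginal sketches x.p and y.q.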

slide-93
SLIDE 93

Summary

Small space sketches of L1 and L2 using p-stable distributions.

No small space sketches exist for other information divergences.

Can use sketching ideas to estimate independence.
