CS 498ABD: Algorithms for Big Data, Spring 2019
CountMin and Count Sketches
Lecture 10
February 14, 2019
Chandra (UIUC) CS498ABD 1 Spring 2019 1 / 18
Heavy Hitters Problem: Find all items $i$ such that $f_i > m/k$ for some fixed $k$.
Heavy hitters are very frequent items.
We saw the deterministic Misra-Gries algorithm, which uses $O(k)$ space and finds the heavy hitters, assuming they exist.
With a second pass over the stream to verify the candidates, the heavy hitters are identified correctly.
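To make the definition concrete, here is a minimal exact baseline that stores every frequency. It is not Misra-Gries and uses far more than $O(k)$ space; the function name `heavy_hitters` is an illustrative assumption, not something from the lecture.

```python
from collections import Counter


def heavy_hitters(stream, k):
    """Return the set of items whose frequency f_i exceeds m / k.

    Exact baseline: stores one counter per distinct item, so it is only
    a reference point for the small-space algorithms discussed here.
    """
    m = len(stream)          # stream length
    freq = Counter(stream)   # f_i for every distinct item i
    return {item for item, f in freq.items() if f > m / k}


stream = list("aaaabbbcd")   # m = 9
print(heavy_hitters(stream, 3))
print(heavy_hitters(stream, 4))
```

Note that the threshold is strict: an item with frequency exactly $m/k$ is not reported.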
Turnstile model: each update is $(i_j, \Delta_j)$ where $\Delta_j$ can be positive or negative.
Strict turnstile: need $x_i \ge 0$ at all times for all $i$.
In terms of frequent items, we want an additive error guarantee for each $x_i$.
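The two models can be illustrated with a small sketch that maintains $x$ explicitly. The function name `apply_updates` and the 0-based item indices are assumptions made for this example only.

```python
def apply_updates(n, updates):
    """Apply turnstile updates (i, delta) to a length-n vector x.

    Returns the final vector and a flag telling whether the stream was
    strict turnstile, i.e. no coordinate ever dropped below zero.
    """
    x = [0] * n
    strict = True
    for i, delta in updates:
        x[i] += delta
        if x[i] < 0:         # violated x_i >= 0 at this point in time
            strict = False
    return x, strict


print(apply_updates(3, [(0, 2), (1, 1), (0, -1)]))
print(apply_updates(2, [(0, -1), (0, 2)]))
```

The second stream ends with a non-negative vector but is not strict turnstile, because $x_0$ is negative after the first update.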
Heavy Hitters Problem: Find all items $i$ such that $f_i > m/k$.
Let $b_1, b_2, \ldots, b_k$ be the heavy hitters (there are at most $k$ of them).
Suppose we pick $h : [n] \to [ck]$ for some $c > 1$.
$h$ spreads $b_1, \ldots, b_k$ among the buckets ($k$ balls into $ck$ bins).
In the ideal situation, each bucket can be used to count a separate heavy hitter.
[Cormode-Muthukrishnan]
CountMin-Sketch($w$, $d$):
    $h_1, h_2, \ldots, h_d$ are pairwise independent hash functions from $[n] \to [w]$.
    While (stream is not empty) do
        $e_t = (i_t, \Delta_t)$ is the current item
        for $\ell = 1$ to $d$ do
            $C[\ell, h_\ell(i_t)] \leftarrow C[\ell, h_\ell(i_t)] + \Delta_t$
    endWhile
    For $i \in [n]$ set $\tilde{x}_i = \min_{\ell=1}^{d} C[\ell, h_\ell(i)]$.

Counter $C[\ell, j]$ simply holds the sum of all $x_i$ such that $h_\ell(i) = j$. That is, $C[\ell, j] = \sum_{i : h_\ell(i) = j} x_i$.
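The pseudocode above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the lecture's implementation: the class name `CountMinSketch`, the `seed` parameter, and the hash family $h(i) = ((a i + b) \bmod P) \bmod w$ with a fixed prime $P$ are all choices made here. That family is a standard stand-in that is only approximately pairwise independent after the final reduction modulo $w$.

```python
import random

P = 2_147_483_647  # prime 2^31 - 1; assumes item ids are integers below P


class CountMinSketch:
    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        # One (a, b) pair per row defines h_l(i) = ((a*i + b) mod P) mod w.
        self.coeffs = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(d)]
        self.C = [[0] * w for _ in range(d)]   # d rows of w counters

    def _bucket(self, row, i):
        a, b = self.coeffs[row]
        return ((a * i + b) % P) % self.w

    def update(self, i, delta=1):
        """Process one stream element (i, delta)."""
        for row in range(self.d):
            self.C[row][self._bucket(row, i)] += delta

    def estimate(self, i):
        """Return the minimum over rows of the counter that i hashes to."""
        return min(self.C[row][self._bucket(row, i)] for row in range(self.d))


cm = CountMinSketch(w=20, d=4)
for item, delta in [(1, 5), (2, 3), (1, -2), (7, 4)]:
    cm.update(item, delta)
print(cm.estimate(1), cm.estimate(2), cm.estimate(7))
```

In the strict turnstile model every estimate is at least the true value, whatever the hash functions turn out to be; how close it is depends on the collisions.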
Suppose there are $k$ heavy hitters $b_1, b_2, \ldots, b_k$.
Consider $b_i$: hash function $h_\ell$ sends $b_i$ to bucket $h_\ell(b_i)$.
$C[\ell, h_\ell(b_i)]$ counts $x_{b_i}$ and also the other items that hash to the same bucket $h_\ell(b_i)$, so we never undercount (this uses the strict turnstile model).
Repeating with many hash functions and taking the minimum is the right thing to do: for $b_i$, the goal is to avoid other heavy hitters colliding with it.
Lemma: Let $d = \Omega(\log \frac{1}{\delta})$ and $w > \frac{2}{\epsilon}$. Then for any fixed $i \in [n]$, $x_i \le \tilde{x}_i$ and $\Pr[\tilde{x}_i \ge x_i + \epsilon \|x\|_1] \le \delta$.
Unlike Misra-Gries, CountMin gives overestimates.
Actual items are not stored (recovering the heavy hitters requires extra work).
Works in the strict turnstile model and hence can handle deletions.
Space usage is $O(\frac{\log(1/\delta)}{\epsilon})$ counters and hence $O(\frac{\log(1/\delta)}{\epsilon} \log m)$ bits.
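As a worked example of the space bound, the following sketch picks concrete dimensions. The constant inside $\Omega(\log \frac{1}{\delta})$ is not fixed by the lemma; the choice $d = \lceil \log_2 (1/\delta) \rceil$ used here is an assumption taken from the requirement $(1/2)^d \le \delta$ that arises when each row fails with probability at most $1/2$. The function name `countmin_dims` is hypothetical.

```python
import math


def countmin_dims(eps, delta):
    """Pick (w, d) with w > 2/eps and (1/2)**d <= delta."""
    w = math.floor(2 / eps) + 1                   # smallest integer strictly above 2/eps
    d = max(1, math.ceil(math.log2(1 / delta)))   # rows needed so that 2**-d <= delta
    return w, d


w, d = countmin_dims(eps=0.25, delta=0.125)
print(w, d, w * d)   # width, depth, total number of counters
```

The total number of counters is $w \cdot d$, which grows linearly in $1/\epsilon$ and only logarithmically in $1/\delta$.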
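Proof: Fix $\ell$: $h_\ell(i)$ is the bucket that $h_\ell$ hashes $i$ to.
$Z_\ell = C[\ell, h_\ell(i)]$ is the value of the counter that $i$ is hashed to.
$E[Z_\ell] = x_i + \sum_{i' \ne i} \Pr[h_\ell(i') = h_\ell(i)] \, x_{i'}$
By pairwise independence, $E[Z_\ell] = x_i + \sum_{i' \ne i} x_{i'}/w \le x_i + \epsilon \|x\|_1 / 2$.
Via Markov's inequality applied to $Z_\ell - x_i$ (which is non-negative; we use strict turnstile here), $\Pr[Z_\ell \ge x_i + \epsilon \|x\|_1] \le 1/2$.
Since the $d$ hash functions are independent, $\Pr[\min_\ell Z_\ell \ge x_i + \epsilon \|x\|_1] \le 1/2^d \le \delta$.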
Applying the lemma with $d = 2 \log_2 n$ and $w = 2/\epsilon$: we have $\Pr[\tilde{x}_i \ge x_i + \epsilon \|x\|_1] \le 1/2^d = 1/n^2$.
By the union bound, with probability at least $1 - 1/n$, for all $i \in [n]$, $\tilde{x}_i \le x_i + \epsilon \|x\|_1$.
Total space is $O(\frac{1}{\epsilon} \log n)$ counters and hence $O(\frac{1}{\epsilon} \log n \log m)$ bits.
Question: Why is CountMin a linear sketch?
Recall that for $1 \le \ell \le d$ and $1 \le s \le w$: $C[\ell, s] = \sum_{i : h_\ell(i) = s} x_i$.
Thus, once hash function $h_\ell$ is fixed, $C[\ell, s] = \langle u, x \rangle$ where $u$ is a row vector in $\{0, 1\}^n$ such that $u_i = 1$ if $h_\ell(i) = s$ and $u_i = 0$ otherwise.
Thus, once the hash functions are fixed, the counter values can be written as $Mx$ where $M \in \{0, 1\}^{wd \times n}$ is the sketch matrix.
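The linearity claim can be checked directly on a small instance by building $M$ explicitly. This is a sketch under assumptions: items are indexed from 0 rather than 1, the helper names `make_hashes`, `sketch_matrix`, and `apply_sketch` are hypothetical, and the hash family is the same approximate stand-in $((a i + b) \bmod P) \bmod w$ rather than anything prescribed by the lecture.

```python
import random

P = 2_147_483_647  # prime modulus for the illustrative hash family


def make_hashes(d, w, seed=0):
    """Return d hash functions from item ids to buckets 0..w-1."""
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(d)]
    return [lambda i, a=a, b=b: ((a * i + b) % P) % w for a, b in coeffs]


def sketch_matrix(hashes, w, n):
    """Build M with one row per (l, s); entry is 1 iff h_l(i) == s."""
    M = []
    for h in hashes:
        for s in range(w):
            M.append([1 if h(i) == s else 0 for i in range(n)])
    return M


def apply_sketch(M, x):
    """Compute the matrix-vector product M x with plain lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


hashes = make_hashes(d=2, w=3)
M = sketch_matrix(hashes, w=3, n=8)
x = [4, 0, 1, 0, 2, 0, 0, 3]
y = [1, 1, 0, 0, -2, 0, 5, 0]
combined = apply_sketch(M, [a + b for a, b in zip(x, y)])
separate = [p + q for p, q in zip(apply_sketch(M, x), apply_sketch(M, y))]
print(combined == separate)
```

Linearity is what allows sketches of two streams, built with the same hash functions, to be added counter by counter to obtain the sketch of the combined stream.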
[Charikar-Chen-Farach-Colton]
Count-Sketch($w$, $d$):
    $h_1, h_2, \ldots, h_d$ are pairwise independent hash functions from $[n] \to [w]$.
    $g_1, g_2, \ldots, g_d$ are pairwise independent hash functions from $[n] \to \{-1, 1\}$.
    While (stream is not empty) do
        $e_t = (i_t, \Delta_t)$ is the current item
        for $\ell = 1$ to $d$ do
            $C[\ell, h_\ell(i_t)] \leftarrow C[\ell, h_\ell(i_t)] + g_\ell(i_t) \Delta_t$
    endWhile
    For $i \in [n]$ set $\tilde{x}_i = \mathrm{median}\{g_1(i) C[1, h_1(i)], \ldots, g_d(i) C[d, h_d(i)]\}$.

Like CountMin, Count sketch has $wd$ counters. Now counter values can become negative even if $x$ is non-negative.
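A Python sketch of the pseudocode above, for illustration only. The class name `CountSketch`, the `seed` parameter, and both hash families are assumptions: buckets use $((a i + b) \bmod P) \bmod w$ and signs use the parity of a second such hash, which are only approximately pairwise independent.

```python
import random
from statistics import median

P = 2_147_483_647  # prime 2^31 - 1; assumes item ids are integers below P


class CountSketch:
    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        # Independent coefficient pairs for the bucket hashes h_l and sign hashes g_l.
        self.h = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(d)]
        self.g = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(d)]
        self.C = [[0] * w for _ in range(d)]

    def _bucket(self, row, i):
        a, b = self.h[row]
        return ((a * i + b) % P) % self.w

    def _sign(self, row, i):
        a, b = self.g[row]
        return 1 if ((a * i + b) % P) % 2 == 0 else -1

    def update(self, i, delta=1):
        """Add g_l(i) * delta to counter C[l, h_l(i)] in every row."""
        for row in range(self.d):
            self.C[row][self._bucket(row, i)] += self._sign(row, i) * delta

    def estimate(self, i):
        """Median over rows of g_l(i) * C[l, h_l(i)]."""
        return median(
            self._sign(row, i) * self.C[row][self._bucket(row, i)]
            for row in range(self.d)
        )


cs = CountSketch(w=16, d=5)
for item, delta in [(3, 7), (9, 4), (3, -2), (9, -4)]:
    cs.update(item, delta)
print(cs.estimate(3))
```

In this usage example item 9 is fully deleted, so item 3 is the only item with a non-zero coordinate and every row returns $g_\ell(3)^2 x_3 = x_3$ exactly. An odd $d$ keeps the median equal to one of the row estimates.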
Each hash function $h_\ell$ spreads the elements across $w$ buckets.
The hash function $g_\ell$ induces cancellations (inspired by the $F_2$ estimation algorithm).
Since a row's estimate may be negative even if $x \ge 0$, we take the median rather than the minimum.
Exercise: Show that Count sketch is also a linear sketch.
Lemma: Let $d \ge 4 \log \frac{1}{\delta}$ and $w > \frac{3}{\epsilon^2}$. Then for any fixed $i \in [n]$, $E[\tilde{x}_i] = x_i$ and $\Pr[|\tilde{x}_i - x_i| \ge \epsilon \|x\|_2] \le \delta$.
Comparison to CountMin:
The error guarantee is with respect to $\|x\|_2$ instead of $\|x\|_1$. For $x \ge 0$, $\|x\|_2 \le \|x\|_1$, and in some cases $\|x\|_2 \ll \|x\|_1$.
Space increases to $O(\frac{1}{\epsilon^2} \log n)$ counters from $O(\frac{1}{\epsilon} \log n)$ counters.
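A small numeric illustration of the gap between the two norms, using two hand-picked vectors that are not from the lecture: a flat vector, where $\|x\|_2 = \|x\|_1 / \sqrt{n}$, and a single spike, where the norms coincide. The helper names `l1` and `l2` are assumptions.

```python
import math


def l1(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def l2(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))


flat = [1] * 10_000                  # every item appears once
spike = [10_000] + [0] * 9_999       # one item carries all the mass

print(l1(flat), l2(flat))
print(l1(spike), l2(spike))
```

On flat vectors the $\|x\|_2$ error bound is much smaller than the $\|x\|_1$ bound for the same $\epsilon$, which is the case where Count sketch's extra $1/\epsilon$ factor in space can pay off.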
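Proof: Fix an $i \in [n]$. Let $Z_\ell = g_\ell(i) C[\ell, h_\ell(i)]$.
For $i' \in [n]$ let $Y_{i'}$ be the indicator random variable that is 1 if $h_\ell(i) = h_\ell(i')$; that is, $i$ and $i'$ collide in $h_\ell$. For $i' \ne i$, $E[Y_{i'}] = E[Y_{i'}^2] = 1/w$ from pairwise independence of $h_\ell$.
$Z_\ell = g_\ell(i) C[\ell, h_\ell(i)] = g_\ell(i) \sum_{i'} g_\ell(i') x_{i'} Y_{i'}$
Therefore, $E[Z_\ell] = x_i + \sum_{i' \ne i} E[g_\ell(i) g_\ell(i') Y_{i'}] x_{i'} = x_i$, because $E[g_\ell(i) g_\ell(i')] = 0$ for $i \ne i'$ from pairwise independence of $g_\ell$ (and $g_\ell$ is chosen independently of $h_\ell$).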
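$Z_\ell = g_\ell(i) C[\ell, h_\ell(i)]$, and $E[Z_\ell] = x_i$.
$\mathrm{Var}(Z_\ell) = E[(Z_\ell - x_i)^2]$
$= E\left[\left(\sum_{i' \ne i} g_\ell(i) g_\ell(i') Y_{i'} x_{i'}\right)^2\right]$
$= E\left[\sum_{i' \ne i} x_{i'}^2 Y_{i'}^2 + \sum_{i' \ne i''} x_{i'} x_{i''} g_\ell(i') g_\ell(i'') Y_{i'} Y_{i''}\right]$
$= \sum_{i' \ne i} x_{i'}^2 E[Y_{i'}^2]$
$\le \|x\|_2^2 / w$.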
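$Z_\ell = g_\ell(i) C[\ell, h_\ell(i)]$. We have seen: $E[Z_\ell] = x_i$ and $\mathrm{Var}(Z_\ell) \le \|x\|_2^2 / w$.
Using Chebyshev: $\Pr[|Z_\ell - x_i| \ge \epsilon \|x\|_2] \le \frac{\mathrm{Var}(Z_\ell)}{\epsilon^2 \|x\|_2^2} \le \frac{1}{\epsilon^2 w} \le 1/3$.
Via the Chernoff bound, $\Pr[|\mathrm{median}\{Z_1, \ldots, Z_d\} - x_i| \ge \epsilon \|x\|_2] \le e^{-cd} \le \delta$.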
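The Chernoff step can be unpacked as follows. This is a sketch of the standard median argument, assuming the $d$ rows use independently chosen hash functions; the indicator $B_\ell$ is notation introduced here, and the constant $c$ is left unspecified as on the slide.

```latex
\begin{align*}
B_\ell &= \mathbf{1}\bigl[\,|Z_\ell - x_i| \ge \epsilon \|x\|_2\,\bigr],
  \qquad \Pr[B_\ell = 1] \le \tfrac{1}{3}, \\
|\mathrm{median}\{Z_1,\dots,Z_d\} - x_i| \ge \epsilon \|x\|_2
  &\;\Longrightarrow\; \sum_{\ell=1}^{d} B_\ell \ge \tfrac{d}{2}, \\
\mathbb{E}\Bigl[\sum_{\ell=1}^{d} B_\ell\Bigr] \le \tfrac{d}{3}
  &\;\Longrightarrow\; \Pr\Bigl[\sum_{\ell=1}^{d} B_\ell \ge \tfrac{d}{2}\Bigr] \le e^{-cd}.
\end{align*}
```

The median can be far from $x_i$ only if at least half of the rows are individually far, while on average at most a third of them are; the Chernoff bound controls the probability of that constant-factor deviation.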
Applying the lemma with $d = \Theta(\log n)$ and $w = 3/\epsilon^2$: we have $\Pr[|\tilde{x}_i - x_i| \ge \epsilon \|x\|_2] \le 1/n^2$.
By the union bound, with probability at least $1 - 1/n$, for all $i \in [n]$, $|\tilde{x}_i - x_i| \le \epsilon \|x\|_2$.
Total space is $O(\frac{1}{\epsilon^2} \log n)$ counters and hence $O(\frac{1}{\epsilon^2} \log n \log m)$ bits.