Improved Concentration Bounds for Count-Sketch


Gregory T. Minton (MIT → MSR New England) and Eric Price (MIT → IBM Almaden → UT Austin), 2014-01-06


SLIDE 1

Improved Concentration Bounds for Count-Sketch

Gregory T. Minton¹, Eric Price²

¹ MIT → MSR New England
² MIT → IBM Almaden → UT Austin

2014-01-06

Gregory T. Minton, Eric Price (IBM) Improved Concentration Bounds for Count-Sketch 2014-01-06 1 / 24

SLIDE 2

Count-Sketch: a classic streaming algorithm

Charikar, Chen, Farach-Colton 2002

Solves the “heavy hitters” problem.

Estimate a vector x ∈ ℝ^n from a low-dimensional sketch Ax ∈ ℝ^m.

Nice algorithm:

◮ Simple
◮ Used in Google’s MapReduce standard library

[CCF02] bounds the maximum error over all coordinates. We show, for the same algorithm,

◮ Most coordinates have asymptotically better estimation accuracy.
◮ The average accuracy over many coordinates will be asymptotically better with high probability.
◮ Experiments show our asymptotics are correct.

Caveat: we assume fully independent hash functions.

SLIDE 3

Outline

1. Robust Estimation of Symmetric Variables
   ◮ Lemma
   ◮ Relevance to Count-Sketch

2. Electoral Colleges and Direct Elections
   ◮ Lemma
   ◮ Relevance to Count-Sketch

3. Experiments!

SLIDE 8

Estimating a symmetric random variable’s mean

[Figure: distribution X with mean µ and standard deviation σ; the plot carries the labels 1/2, µ ± ∞, and 1/2.]

Unknown distribution X over ℝ, symmetric about unknown µ.

◮ Given samples x_1, . . . , x_R ∼ X.
◮ How to estimate µ?

Mean:

◮ Converges to µ as σ/√R.
◮ No robustness to outliers

Median:

◮ Extremely robust
◮ Doesn’t necessarily converge to µ.

SLIDE 9

Estimating a symmetric random variable’s mean

[Figure: density of X, with µ − σ, µ, and µ + σ marked on the axis.]

Median doesn’t converge.

Consider: median of pairwise means

$$\hat{\mu} = \operatorname*{median}_{i \in \{1, 3, 5, \ldots\}} \frac{x_i + x_{i+1}}{2}$$

◮ Converges as O(σ/√R), even with outliers.

That is: median of (X + X) converges. [See also: Hodges-Lehmann estimator.]
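The estimator above can be sketched in a few lines of Python. This is only an illustration: the two-point distribution (µ − 1 or µ + 1 with equal probability), the seed, and the function names are assumptions chosen so that the plain median visibly fails while the median of pairwise means lands on µ.

```python
import random
import statistics


def median_of_pairwise_means(samples):
    """Median over i in {1, 3, 5, ...} of (x_i + x_{i+1}) / 2, using disjoint pairs."""
    pair_means = [
        (samples[i] + samples[i + 1]) / 2
        for i in range(0, len(samples) - 1, 2)
    ]
    return statistics.median(pair_means)


def draw_two_point(rng, mu):
    """Illustrative distribution (an assumption): mu - 1 or mu + 1, each w.p. 1/2."""
    return mu + rng.choice([-1.0, 1.0])


rng = random.Random(0)
mu = 3.0
samples = [draw_two_point(rng, mu) for _ in range(2000)]

# The plain median typically sits on one of the two atoms, a full unit from mu.
plain_median = statistics.median(samples)
# Half of the pair means equal mu exactly, so their median concentrates at mu.
pairwise = median_of_pairwise_means(samples)
print(plain_median, pairwise)
```

The pairs are disjoint, so the pair means are independent samples of (X + X)/2, which is the setting the slide analyzes.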

SLIDE 10

Why does median converge for X + X?

WLOG µ = 0. Define the Fourier transform F_X of X:

$$F_X(t) = \mathbb{E}_{x \sim X}[\cos(\tau x t)]$$

(standard Fourier transform of the PDF, specialized to symmetric X; here τ = 2π ≈ 6.28.)

Convolution ⟺ multiplication

◮ F_{X+X}(t) = (F_X(t))² ≥ 0 for all t.

Theorem

Let Y be symmetric about 0 with F_Y(t) ≥ 0 for all t and E[Y²] = σ². Then for all ε ≤ 1,

$$\Pr[|y| \le \epsilon\sigma] \gtrsim \epsilon.$$

Standard Chernoff bounds: the median of y_1, . . . , y_R converges as σ/√R.
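The identity F_{X+X}(t) = (F_X(t))² can be checked numerically on a small example. The sketch below assumes a finite symmetric distribution given as a `{value: probability}` dictionary; the specific distribution and the helper names `fourier` and `self_convolve` are illustrative and not from the talk.

```python
import math

TAU = 2 * math.pi  # the slides' tau


def fourier(dist, t):
    """F_X(t) = E[cos(tau * x * t)] for a finite symmetric distribution {value: prob}."""
    return sum(p * math.cos(TAU * x * t) for x, p in dist.items())


def self_convolve(dist):
    """Distribution of X + X' for two independent copies X, X' drawn from dist."""
    out = {}
    for x1, p1 in dist.items():
        for x2, p2 in dist.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out


# Illustrative symmetric distribution (an assumption, not from the talk).
X = {-2.0: 0.3, -0.5: 0.2, 0.5: 0.2, 2.0: 0.3}
XX = self_convolve(X)

for t in [0.0, 0.1, 0.25, 0.37, 1.0]:
    fx, fxx = fourier(X, t), fourier(XX, t)
    # F_X itself can go negative, but F_{X+X} matches F_X^2 and so cannot.
    print(f"t={t:.2f}  F_X={fx:+.4f}  F_(X+X)={fxx:+.4f}  F_X^2={fx * fx:+.4f}")
```

At t = 0.25 this particular F_X is negative, which is why the theorem is applied to X + X and not to X directly.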

SLIDE 11

Proof

Theorem

Let F_Y(t) ≥ 0 for all t and E[Y²] = 1. Then for all ε ≤ 1, Pr[|y| ≤ ε] ≳ ε.

$$F_Y(t) = \mathbb{E}[\cos(\tau y t)] \ge 1 - \frac{\tau^2}{2} t^2$$

Pr[|y| ≤ ε] = Y · [plot; labels 1, ε] ≥ Y · [plot; labels 1, ε] = F_Y · [plot; labels ε, 1/ε] ≥ [plot; labels 1, 0.2] · [plot; labels ε, 1/ε] ≳ ε. □

SLIDE 14

Count-Sketch

[Figure: the sketch as a table with k columns and R rows.]

Want to estimate x ∈ ℝ^n from a small “sketch.”

Hash to k buckets and sum up with random signs.

Choose random h : [n] → [k], s : [n] → {±1}. Store

$$y_j = \sum_{i \,:\, h(i) = j} s(i)\, x_i$$

Can estimate x_i by

$$\tilde{x}_i = y_{h(i)}\, s(i).$$

Repeat R times, take the median.

For each row,

$$\tilde{x}_i - x_i = \sum_{j \ne i} \begin{cases} \pm x_j & \text{with probability } 1/k \\ 0 & \text{otherwise} \end{cases}$$

Symmetric, non-negative Fourier transform.
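The construction above can be written as a short Python class. This is a minimal sketch, not a production implementation: it stores h and s as fully random lookup tables (mirroring the talk's independence assumption, where a practical version would use hash functions), and the class and method names are illustrative.

```python
import random
import statistics


class CountSketch:
    """R independent rows, each hashing n coordinates into k signed buckets."""

    def __init__(self, n, k, R, seed=0):
        rng = random.Random(seed)
        self.k, self.R = k, R
        # Fully random h : [n] -> [k] and s : [n] -> {+1, -1}, one pair per row.
        self.h = [[rng.randrange(k) for _ in range(n)] for _ in range(R)]
        self.s = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(R)]
        # y[r][j] = sum over i with h_r(i) = j of s_r(i) * x_i
        self.y = [[0.0] * k for _ in range(R)]

    def update(self, i, delta):
        """Apply x_i += delta; the sketch is linear in x."""
        for r in range(self.R):
            self.y[r][self.h[r][i]] += self.s[r][i] * delta

    def estimate(self, i):
        """Median over the R rows of the per-row estimate y_{h(i)} * s(i)."""
        return statistics.median(
            self.y[r][self.h[r][i]] * self.s[r][i] for r in range(self.R)
        )


# One heavy coordinate plus many small ones (made-up data).
sketch = CountSketch(n=1000, k=50, R=15, seed=2)
sketch.update(0, 100.0)
for j in range(1, 1000):
    sketch.update(j, 0.01)
print(sketch.estimate(0))  # close to 100; the error comes from colliding small coordinates
```

Because every update touches one bucket per row, the sketch uses R · k counters regardless of n (apart from the lookup tables, which exist here only to keep the example self-contained).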

SLIDE 15

Count-Sketch Analysis

Let

$$\sigma^2 = \frac{1}{k} \min_{k\text{-sparse } x_{[k]}} \|x - x_{[k]}\|_2^2$$

be the “typical” error for a single row of Count-Sketch with k columns.

Theorem

For any coordinate i, we have for all t ≤ R that

$$\Pr\left[|\hat{x}_i - x_i| > \sqrt{\frac{t}{R}}\,\sigma\right] \le e^{-\Omega(t)}.$$

(CCF02: the t = R = O(log n) case; ‖x̂ − x‖_∞ ≲ σ w.h.p.)

Corollary

Excluding e^{−Ω(R)} probability events, we have for each i that

$$\mathbb{E}[(\hat{x}_i - x_i)^2] = \sigma^2 / R$$
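To make σ concrete: the minimum over k-sparse vectors is attained by keeping the k largest-magnitude entries of x, so σ² is the energy of the remaining tail divided by k. The helper names below are illustrative assumptions, and `error_threshold` simply evaluates the deviation √(t/R)·σ that appears in the theorem.

```python
import math


def typical_row_error(x, k):
    """sigma = sqrt((1/k) * min over k-sparse x_[k] of ||x - x_[k]||_2^2).

    The best k-sparse approximation keeps the k largest-magnitude entries,
    so only the remaining "tail" entries contribute to the norm.
    """
    tail = sorted((abs(v) for v in x), reverse=True)[k:]
    return math.sqrt(sum(v * v for v in tail) / k)


def error_threshold(x, k, R, t):
    """The deviation sqrt(t / R) * sigma from the theorem, stated for t <= R."""
    if not 0 <= t <= R:
        raise ValueError("the theorem is stated for 0 <= t <= R")
    return math.sqrt(t / R) * typical_row_error(x, k)


x = [10.0, 3.0, -4.0]
print(typical_row_error(x, k=1))           # tail is (4, 3): sqrt(25 / 1) = 5.0
print(error_threshold(x, k=1, R=16, t=4))  # sqrt(4 / 16) * 5.0 = 2.5
```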

SLIDE 16

Estimation of multiple coordinates?

What about the average error on a set S of k coordinates?

Linearity of expectation:

$$\mathbb{E}[\|\hat{x}_S - x_S\|_2^2] = \frac{O(1)}{R}\, k\sigma^2.$$

Does it concentrate?

$$\Pr\left[\|\hat{x}_S - x_S\|_2^2 > \frac{O(1)}{R}\, k\sigma^2\right] < p = \;???$$

By expectation: p = Θ(1). If independent: p = e^{−Ω(k)}.

Sum of many variables, but not independent...

Chebyshev’s inequality, bounding covariance of error:

◮ Feasible to analyze (though kind of nasty).
◮ Ideally get: p = 1/√k.
◮ We can get p = 1/k^{1/14}.

Can we at least get “high probability,” i.e. 1/k^c for arbitrary constant c?

SLIDE 17

Boosting the error probability

in a black box manner

We know that ‖x̂_S − x_S‖_2 is “small” with all but k^{−1/14} probability.

Way to get all but k^{−c} probability: repeat 100c times and take the median of results.

◮ With all but k^{−c} probability, > 75c of the x̂_S^{(i)} will have “small” error.
◮ Median of results has at most 3× “small” total error.

But resulting algorithm is stupid:

◮ Run count-sketch with R′ = O(cR).
◮ Arbitrarily partition into blocks of R rows.
◮ Estimate is median (over blocks) of median (within block) of individual estimates.

Can we show that the direct median is as good as the median-of-medians?
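The two estimators being compared can be sketched for a single coordinate as follows. The per-row estimates are made-up numbers, the function names are illustrative, and the blocks are taken as consecutive runs of rows (one arbitrary partition among many).

```python
import statistics


def direct_median(row_estimates):
    """Standard Count-Sketch: one median over all R' per-row estimates."""
    return statistics.median(row_estimates)


def median_of_medians(row_estimates, block_size):
    """Black-box boosting: median over blocks of the median within each block."""
    blocks = [
        row_estimates[start:start + block_size]
        for start in range(0, len(row_estimates), block_size)
    ]
    return statistics.median(statistics.median(block) for block in blocks)


# Made-up per-row estimates of one coordinate whose true value is 5.0;
# two of the nine rows suffered heavy collisions.
rows = [5.1, 4.9, 97.0, 5.0, 5.2, -40.0, 4.8, 5.3, 5.0]
print(direct_median(rows))         # 5.0
print(median_of_medians(rows, 3))  # 5.0 (block medians are 5.1, 5.0, 5.0)
```

On this input both estimators agree; in general they can differ, which is what motivates the question on the slide.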

SLIDE 19

Electoral Colleges

Suppose you have a two-party election for k offices.

◮ Voters come from a distribution X over {0, 1}^k.
◮ “True” majority slate of candidates x ∈ {0, 1}^k.
◮ Election day, receive ballots x_1, . . . , x_n ∼ X.

How to best estimate x? For each office,

[Diagram: x_1, x_2, x_3, · · · , x_{n−1}, x_n → x_majority]
[Diagram: x_1, · · · , x_{|CA|}, · · · , x_{n−|TX|+1}, · · · , x_n → x_CA, · · · , x_TX → x_electoral]

Is x_majority better than x_electoral in every way? Is

$$\Pr[\|x_{\text{majority}} - x\| > \alpha] \le \Pr[\|x_{\text{electoral}} - x\| > \alpha]$$

for all α, ‖·‖?

SLIDE 20

Electoral Colleges

Is x_majority better than x_electoral in every way, so

$$\Pr[\|x_{\text{majority}} - x\| > \alpha] \le \Pr[\|x_{\text{electoral}} - x\| > \alpha]$$

for all α, ‖·‖? Don’t know, but

Theorem

$$\Pr[\|x_{\text{majority}} - x\| > 3\alpha] \le 4 \cdot \Pr[\|x_{\text{electoral}} - x\| > \alpha]$$

for all p-norms ‖·‖.

SLIDE 21

Proof

Theorem

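$$\Pr[\|x_{\text{majority}} - x\| > 3\alpha] \le 4 \cdot \Pr[\|x_{\text{electoral}} - x\| > \alpha]$$

for all p-norms ‖·‖. Follows easily from:

Lemma (median³)

For any x_1, . . . , x_n ∈ ℝ^k, we have

$$\operatorname*{median}_{\text{partitions into states}} \; \operatorname*{median}_{\text{states}} \; \operatorname*{median}_{\text{within state}} x_i \;=\; \operatorname*{median}_{\text{populace}} x_i$$

(With 4p failure probability, 3/4 of partitions have error at most α; then their median has error 3α.)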
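The lemma can be sanity-checked by brute force on a tiny instance. The sketch below makes several simplifying assumptions that are not stated on the slide: it works in one dimension (k = 1), uses nine distinct ballots, and enumerates every split into three equal-size states of three voters each. It is a check on one example, not a proof.

```python
from itertools import combinations
import statistics


def ordered_partitions(items, state_size):
    """Yield every split of items into an ordered list of states of state_size.

    Assumes the items are distinct. Each unordered partition appears once per
    ordering of its states, so all partitions are weighted equally.
    """
    if not items:
        yield []
        return
    for state in combinations(items, state_size):
        rest = [v for v in items if v not in state]
        for tail in ordered_partitions(rest, state_size):
            yield [list(state)] + tail


def electoral(states):
    """Median over states of the median within each state."""
    return statistics.median(statistics.median(state) for state in states)


ballots = [3, 1, 4, 15, 9, 2, 6, 5, 35]  # made-up values for a single office
results = [electoral(p) for p in ordered_partitions(ballots, 3)]

# Median over all partitions of the electoral outcome vs. the direct median.
print(len(results), statistics.median(results), statistics.median(ballots))
```

The agreement follows from a symmetry argument: reversing the rank order of the ballots maps partitions to partitions and reflects each electoral outcome, so the distribution of outcomes is symmetric about the overall median.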

SLIDE 23

Concentration for sets

We know that a “median-of-medians” variant of Count-Sketch would give good estimation of sets with high probability. Therefore the standard Count-Sketch would as well.

Theorem

For any constant c, we have for any set S of coordinates that

$$\Pr\left[\|\hat{x}_S - x_S\|_2 > O\!\left(\sqrt{\frac{|S|}{R}}\,\sigma\right)\right] \lesssim |S|^{-c}.$$

SLIDE 25

Experiments

Claims

1. Individual coordinates have error that concentrates like a Gaussian with standard deviation σ/√R.

2. Sets of coordinates have error O(σ√(k/R)) with high probability.

Evaluate on power-law distribution with typical parameters.

SLIDE 26

Experiments

1. Individual coordinates have error that concentrates like a Gaussian with standard deviation σ/√R.

Compare observed error to expected error for various R, C.

[Figure: “Distribution of errors, 10 trials at n = 1000000.” x-axis: the ratio |x̂_i − x_i|/m_{R,C}; y-axis: probability density. One curve per (R, C) pair, with R ranging from 20 to 1000 and C from 20 to 10000.]

SLIDE 27

Experiments

2. Sets of coordinates have error O(σ√(k/R)) with high probability (for large enough R, C).

Compare observed error to expected error for various R, C.

[Figure, first panel: “Distribution of E_k for various C with n = 10000, k = 25, R = 50.” x-axis: E_k/(m_{R,C} √k); y-axis: probability density. Curves for C = 20, 50, 100, 200, 500, 1000.]

[Figure, second panel: “Distribution of E_k for various R with n = 10000, k = 25, C = 100.” x-axis: E_k/(m_{R,C} √k); y-axis: probability density. Curves for R = 10, 20, 50, 100, 200, 500, 1000, 2000, 4000.]

SLIDE 28

Conclusions

We present an improved analysis of Count-Sketch, a classic algorithm used in practice.

Experiments show it gives the right asymptotics.

More applications of our lemmas?

Independence?

Thank You
