Improved Concentration Bounds for Count-Sketch
Gregory T. Minton1 Eric Price2
1MIT → MSR New England 2MIT → IBM Almaden → UT Austin
2014-01-06
Gregory T. Minton, Eric Price (IBM) Improved Concentration Bounds for Count-Sketch 2014-01-06 1 / 24
◮ Simple
◮ Used in Google’s MapReduce standard library
◮ Most coordinates have asymptotically better estimation accuracy.
◮ The average accuracy over many coordinates will be asymptotically better.
◮ Experiments show our asymptotics are correct.
◮ Given samples x_1, …, x_R ∼ X.
◮ How to estimate µ?
◮ Mean:
  ◮ Converges to µ as σ/√R
  ◮ No robustness to outliers
◮ Median:
  ◮ Extremely robust
  ◮ Doesn’t necessarily converge to µ.
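The trade-off between the two estimators can be checked with a small experiment. The sketch below is illustrative only: the helper names (`estimate_both`, `contaminated_samples`), the Gaussian choice for X, and the outlier value are assumptions made for the example, not something taken from the slides.

```python
import random
from statistics import mean, median


def estimate_both(samples):
    """Return (sample mean, sample median) of a list of numbers."""
    return mean(samples), median(samples)


def contaminated_samples(mu, sigma, count, outliers, outlier_value, seed=0):
    """Draw Gaussian samples, then replace the last `outliers` of them
    with a fixed, wildly wrong value (hypothetical contamination model)."""
    rng = random.Random(seed)
    clean = [rng.gauss(mu, sigma) for _ in range(count - outliers)]
    return clean + [outlier_value] * outliers


if __name__ == "__main__":
    data = contaminated_samples(mu=5.0, sigma=1.0, count=1001,
                                outliers=10, outlier_value=1e6)
    m, med = estimate_both(data)
    # The mean is dragged far from mu by ten bad samples; the median is not.
    print("mean:", m)
    print("median:", med)
```

With only 1% of the samples corrupted, the mean is off by several orders of magnitude while the median stays near µ.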
◮ Converges as O(σ/√R)
◮ F_{X+X}(t) = (F_X(t))² ≥ 0 for all t.
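The identity can be verified numerically for a small discrete distribution. This is a minimal sketch under two assumptions that the surviving slide text does not spell out: "X+X" is read as the sum of two independent copies of X, and X is symmetric about 0, so that F_X is real and its square is nonnegative. The helper names `char_fn` and `convolve` are made up for the example.

```python
import cmath
import math


def char_fn(pmf, t):
    """Characteristic function E[exp(i t X)] of a finite pmf {value: prob}."""
    return sum(p * cmath.exp(1j * t * v) for v, p in pmf.items())


def convolve(pmf_a, pmf_b):
    """pmf of A + B for independent A and B."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


if __name__ == "__main__":
    # Assumed example: X uniform on {-1, 0, 1}, which is symmetric about 0.
    x = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
    x_plus_x = convolve(x, x)
    for t in (0.5, 2.0, math.pi):
        # F_X can be negative, but F_{X+X} = F_X^2 never is.
        print(t, char_fn(x, t).real, char_fn(x_plus_x, t).real)
```

For this X, F_X(t) = (1 + 2 cos t)/3, which is negative at t = π, while the transform of the sum stays nonnegative.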
◮ Feasible to analyze (though kind of nasty).
◮ Ideally get: p = 1/
◮ We can get p = 1/k^{1/14}.
◮ With all but k^{−c} probability, > 75c of the S will have “small” error.
◮ Median of results has at most 3× “small” total error.
◮ Run count-sketch with R′ = O(cR).
◮ Arbitrarily partition into blocks of R rows.
◮ Estimate is median (over blocks) of median (within block) of
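The blocking scheme above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the `CountSketch` class and `blocked_estimate` function are hypothetical names, explicit random lookup tables stand in for hash functions, and because the last bullet is cut off in the source, the innermost quantity is assumed here to be the per-row estimate of a single coordinate.

```python
import random
from statistics import median


class CountSketch:
    """Toy count-sketch with explicit lookup tables in place of hash functions."""

    def __init__(self, n, rows, cols, seed=0):
        rng = random.Random(seed)
        # bucket[r][i]: column that coordinate i maps to in row r.
        self.bucket = [[rng.randrange(cols) for _ in range(n)]
                       for _ in range(rows)]
        # sign[r][i]: random sign applied to coordinate i in row r.
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)]
                     for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def update(self, i, delta):
        """Add delta to coordinate i of the sketched vector."""
        for r, row in enumerate(self.table):
            row[self.bucket[r][i]] += self.sign[r][i] * delta

    def row_estimates(self, i):
        """One estimate of coordinate i per row."""
        return [self.sign[r][i] * row[self.bucket[r][i]]
                for r, row in enumerate(self.table)]


def blocked_estimate(sketch, i, block_rows):
    """Median over blocks of the median within each block of row estimates
    (assumed reading of the truncated bullet)."""
    ests = sketch.row_estimates(i)
    blocks = [ests[j:j + block_rows]
              for j in range(0, len(ests), block_rows)]
    return median(median(block) for block in blocks)
```

A sketch built with `rows = R′` and queried with `block_rows = R` follows the partition described in the bullets; the partition here is simply consecutive rows, which is one arbitrary choice among many.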
◮ Voters come from a distribution X over {0, 1}^k.
◮ “True” majority slate of candidates x ∈ {0, 1}^k.
◮ Election day, receive ballots x_1, …, x_n ∼ X.
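To make the setup concrete, one natural way to estimate the slate from the ballots is a coordinate-wise majority vote. The slides do not say at this point which estimator is intended, so the function below (`coordinatewise_majority`, a made-up name) is only an assumed baseline for illustration.

```python
def coordinatewise_majority(ballots):
    """Estimate a slate in {0, 1}^k by majority vote in each coordinate.

    `ballots` is a non-empty list of equal-length 0/1 lists.
    Ties are broken toward 0 (an arbitrary choice for this sketch).
    """
    n = len(ballots)
    k = len(ballots[0])
    return [1 if 2 * sum(b[j] for b in ballots) > n else 0
            for j in range(k)]


if __name__ == "__main__":
    ballots = [[1, 0, 1],
               [1, 1, 0],
               [0, 0, 1]]
    print(coordinatewise_majority(ballots))
```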
[Figure: Distribution of errors, 10 trials at n = 1000000. x-axis: the ratio |x̂_i − x_i|/m_{R,C}; y-axis: probability density. One curve per (R, C) setting: R = 20 with C = 20, 30, 50, 100, 200, 500, 1000, 2000, 5000, 10000; R = 50 with C = 20, 30, 50, 100, 200; R = 100 with C = 20, 30, 50, 100, 200; R = 200 with C = 20, 30, 50, 100; R = 500 with C = 20, 30; R = 1000 with C = 20.]
[Figure: Distribution of E_k for various C with n = 10000, k = 25, R = 50. x-axis: E_k/(m_{R,C} √k); y-axis: probability density. Curves: C = 20, 50, 100, 200, 500, 1000.]
[Figure: Distribution of E_k for various R with n = 10000, k = 25, C = 100. x-axis: E_k/(m_{R,C} √k); y-axis: probability density. Curves: R = 10, 20, 50, 100, 200, 500, 1000, 2000, 4000.]