Improved Concentration Bounds for Count-Sketch
Gregory T. Minton and Eric Price


  1. Improved Concentration Bounds for Count-Sketch
     Gregory T. Minton (1), Eric Price (2)
     (1) MIT → MSR New England
     (2) MIT → IBM Almaden → UT Austin
     2014-01-06

  2. Count-Sketch: a classic streaming algorithm
     - Charikar, Chen, Farach-Colton 2002. Solves the "heavy hitters" problem:
       estimate a vector x ∈ R^n from a low-dimensional sketch Ax ∈ R^m.
     - Nice algorithm: simple, and used in Google's MapReduce standard library.
     - [CCF02] bounds the maximum error over all coordinates. We show, for the same algorithm:
       - Most coordinates have asymptotically better estimation accuracy.
       - The average accuracy over many coordinates is asymptotically better with high probability.
       - Experiments show our asymptotics are correct.
     - Caveat: we assume fully independent hash functions.

  3-4. Outline
     1. Robust Estimation of Symmetric Variables: Lemma; Relevance to Count-Sketch
     2. Electoral Colleges and Direct Elections: Lemma; Relevance to Count-Sketch
     3. Experiments!

  5-8. Estimating a symmetric random variable's mean
     [Figure: a distribution X with mean μ and standard deviation σ, symmetric about μ, half its mass on each side.]
     - Unknown distribution X over R, symmetric about an unknown μ.
       - Given samples x_1, ..., x_R ∼ X.
       - How to estimate μ?
     - Mean:
       - Converges to μ as σ/√R.
       - No robustness to outliers.
     - Median:
       - Extremely robust.
       - Doesn't necessarily converge to μ.
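A minimal simulation sketch (mine, not from the talk) contrasting the sample mean and sample median as estimators of the center μ of a symmetric distribution; the distributions, seed, and parameters are illustrative choices, assuming NumPy is available.

```python
# Contrast the sample mean and sample median on two symmetric distributions:
# one with rare huge outliers (hurts the mean) and a two-point distribution
# with no mass at mu (the median never converges to mu).
import numpy as np

rng = np.random.default_rng(0)
mu, R, trials = 5.0, 1001, 200

def sample_with_outliers(size):
    # Symmetric about mu: unit Gaussian noise, but ~1% of samples are huge
    # symmetric outliers, which inflates sigma and wrecks the mean.
    x = mu + rng.standard_normal(size)
    mask = rng.random(size) < 0.01
    x[mask] = mu + rng.choice([-1.0, 1.0], mask.sum()) * 1e6
    return x

def sample_two_point(size):
    # Symmetric about mu but with no mass at mu itself: X = mu +/- 1,
    # each with probability 1/2, so the sample median stays a distance ~1 away.
    return mu + rng.choice([-1.0, 1.0], size)

for name, sampler in [("outliers ", sample_with_outliers),
                      ("two-point", sample_two_point)]:
    mean_err = np.mean([abs(np.mean(sampler(R)) - mu) for _ in range(trials)])
    med_err = np.mean([abs(np.median(sampler(R)) - mu) for _ in range(trials)])
    print(f"{name}: |mean - mu| ~ {mean_err:.3g}, |median - mu| ~ {med_err:.3g}")
```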

  9. Estimating a symmetric random variable's mean
     [Figure: a distribution X with its mass at μ − σ and μ + σ; the sample median doesn't converge to μ.]
     - Consider instead the median of pairwise means:
       μ̂ = median over i ∈ {1, 3, 5, ...} of (x_i + x_{i+1}) / 2.
     - Converges as O(σ/√R), even with outliers.
     - That is: the median of (X + X) converges. [See also: the Hodges-Lehmann estimator.]
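A quick check (my own construction, not the authors' code) that the median of pairwise means recovers μ on the two-point example where the plain sample median fails; parameters are arbitrary illustrative values.

```python
# Median of pairwise means vs. plain median on X = mu +/- 1 (prob 1/2 each).
import numpy as np

rng = np.random.default_rng(1)
mu, R, trials = 5.0, 1000, 200          # R even so the samples pair up cleanly

def median_of_pairwise_means(x):
    return np.median((x[0::2] + x[1::2]) / 2)   # (x_1+x_2)/2, (x_3+x_4)/2, ...

plain_errs, paired_errs = [], []
for _ in range(trials):
    x = mu + rng.choice([-1.0, 1.0], R)         # X = mu +/- sigma with sigma = 1
    plain_errs.append(abs(np.median(x) - mu))
    paired_errs.append(abs(median_of_pairwise_means(x) - mu))

print("plain median error           :", np.mean(plain_errs))    # stays near sigma
print("median-of-pairwise-means err :", np.mean(paired_errs))   # collapses toward 0
```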

  10. Why does the median converge for X + X?
     - WLOG μ = 0. Define the Fourier transform F_X of X (with τ = 2π ≈ 6.28):
       F_X(t) = E_{x∼X}[cos(τxt)]
       (the standard Fourier transform of the PDF, specialized to symmetric X).
     - Convolution ⇔ multiplication:
       F_{X+X}(t) = (F_X(t))² ≥ 0 for all t.
     - Theorem. Let Y be symmetric about 0 with F_Y(t) ≥ 0 for all t and E[Y²] = σ². Then for all ε ≤ 1,
       Pr[|Y| ≤ εσ] ≳ ε.
     - Standard Chernoff bounds then give: the median of y_1, ..., y_R converges as σ/√R.
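The theorem is easy to probe numerically. Below is a small Monte Carlo check (my own, not from the talk) on a symmetric X that has no mass near 0 at all: the anti-concentration bound Pr[|Y| ≤ εσ] ≳ ε fails for X itself but holds, with visibly linear scaling in ε, for Y = X + X′.

```python
# Monte Carlo check of Pr[|Y| <= eps * sigma] >~ eps for Y = X + X', where X is
# symmetric about 0 with no mass near 0 (so the bound fails badly for X alone).
import numpy as np

rng = np.random.default_rng(2)
N = 2_000_000

def sample_x(size):
    # Symmetric about 0, supported on [-1.5, -0.5] and [0.5, 1.5].
    return rng.choice([-1.0, 1.0], size) * rng.uniform(0.5, 1.5, size)

x = sample_x(N)
y = sample_x(N) + sample_x(N)          # Y = X + X'

for name, z in [("X     ", x), ("X + X'", y)]:
    sigma = np.sqrt(np.mean(z ** 2))
    probs = [np.mean(np.abs(z) <= eps * sigma) for eps in (0.02, 0.05, 0.1, 0.2)]
    print(name, "Pr[|Z| <= eps*sigma] for eps = 0.02, 0.05, 0.1, 0.2:",
          [round(p, 4) for p in probs])
```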

  11. Proof
     - Theorem. Let F_Y(t) ≥ 0 for all t and E[Y²] = 1. Then for all ε ≤ 1, Pr[|Y| ≤ ε] ≳ ε.
     - Since cos(z) ≥ 1 − z²/2 and E[Y²] = 1,
       F_Y(t) = E[cos(τyt)] ≥ 1 − (τ²/2) t².
     - [Pictorial argument on the slide: Pr[|Y| ≤ ε] is the mass of Y against a width-ε window; lower-bound it by
       pairing Y against an ε-scaled test function, pass to the Fourier side (where F_Y ≥ 0), and apply the bound
       above at scale 1/ε to conclude Pr[|Y| ≤ ε] ≥ 0.2 ε ≳ ε.]

  12. Outline
     1. Robust Estimation of Symmetric Variables: Lemma; Relevance to Count-Sketch
     2. Electoral Colleges and Direct Elections: Lemma; Relevance to Count-Sketch
     3. Experiments!

  13-14. Count-Sketch
     - Want to estimate x ∈ R^n from a small "sketch."
     - Hash to k buckets and sum up with random signs:
       choose random h : [n] → [k] and s : [n] → {±1}, and store
       y_j = Σ_{i : h(i) = j} s(i) x_i.
     - Can estimate x_i by x̃_i = y_{h(i)} s(i).
     - Repeat R times (an R × k sketch) and take the median of the R estimates.
     - For each row,
       x̃_i − x_i = Σ_{j ≠ i} { ±x_j with probability 1/k; 0 otherwise },
       which is symmetric with a non-negative Fourier transform.
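A compact reference implementation of the scheme just described, assuming NumPy. This is my own sketch, not the authors' code; it draws fully random h and s per row (matching the talk's full-independence caveat) and recovers each x̃_i as the median over rows of y_{h(i)} s(i).

```python
# Count-Sketch with R rows of k buckets: per row, hash h: [n] -> [k] and signs
# s: [n] -> {+-1}; estimate x_i as the median over rows of y[r, h_r(i)] * s_r(i).
import numpy as np

def count_sketch(x, k, R, rng):
    """Return the R x k sketch together with the hash/sign tables."""
    n = len(x)
    h = rng.integers(0, k, size=(R, n))          # bucket of coordinate i in row r
    s = rng.choice([-1.0, 1.0], size=(R, n))     # sign of coordinate i in row r
    y = np.zeros((R, k))
    for r in range(R):
        np.add.at(y[r], h[r], s[r] * x)          # y[r, j] = sum_{i: h_r(i)=j} s_r(i) x_i
    return y, h, s

def estimate(y, h, s):
    """Median over rows of the per-row estimates y[r, h_r(i)] * s_r(i)."""
    R, n = h.shape
    per_row = y[np.arange(R)[:, None], h] * s    # shape (R, n)
    return np.median(per_row, axis=0)

# Usage: recover a vector with a few heavy coordinates from a small sketch.
rng = np.random.default_rng(3)
n, k, R = 10_000, 100, 7
x = rng.standard_normal(n)
x[:10] += 100.0                                  # ten heavy hitters
y, h, s = count_sketch(x, k, R, rng)
x_hat = estimate(y, h, s)
print("largest estimated coords:", np.argsort(-np.abs(x_hat))[:10])
print("max estimation error    :", np.max(np.abs(x_hat - x)))
```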

  15. Count-Sketch Analysis
     - Let σ² = (1/k) · min_{k-sparse x_[k]} ‖x − x_[k]‖₂²
       be the "typical" error for a single row of Count-Sketch with k columns.
     - Theorem. For any coordinate i, we have for all t ≤ R that
       Pr[ |x̃_i − x_i| > √(t/R) · σ ] ≤ e^{−Ω(t)}.
       (CCF02 is the t = R = O(log n) case: ‖x̃ − x‖_∞ ≲ σ w.h.p.)
     - Corollary. Excluding e^{−Ω(R)}-probability events, we have for each i that
       E[(x̃_i − x_i)²] = σ²/R.
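The √(t/R)·σ scale (rather than the worst-case σ of [CCF02]) is easy to see empirically. The following Monte Carlo sketch, my own construction with arbitrary parameters, simulates the per-row error of a single coordinate directly (the signed sum of the colliding coordinates) and checks that the RMS error of the median over R rows is on the order of σ/√R.

```python
# Monte Carlo look at the error of one Count-Sketch coordinate: the row error
# is the signed sum of the other coordinates that collide with it, and the
# final error is the median of R such row errors.
import numpy as np

rng = np.random.default_rng(4)
n, k, R, trials = 5_000, 100, 25, 500

x_rest = rng.standard_normal(n)                         # the other coordinates
sigma = np.sqrt(np.sum(np.sort(x_rest ** 2)[:-k]) / k)  # sigma^2 = min ||x - x_[k]||_2^2 / k

errors = np.empty(trials)
for t in range(trials):
    collide = rng.random((R, n)) < 1.0 / k              # which coords hash with ours
    signs = rng.choice([-1.0, 1.0], (R, n))
    row_err = (collide * signs * x_rest).sum(axis=1)
    errors[t] = np.median(row_err)

print("sigma          :", sigma)
print("sigma / sqrt(R):", sigma / np.sqrt(R))
print("RMS median err :", np.sqrt(np.mean(errors ** 2)))  # same order as sigma/sqrt(R)
```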

  16. Estimation of multiple coordinates?
     - What about the average error on a set S of k coordinates?
     - Linearity of expectation: E[‖x̃_S − x_S‖₂²] = O(1/R) · kσ².
     - Does it concentrate?
       Pr[ ‖x̃_S − x_S‖₂² > O(1/R) · kσ² ] < p = ???
     - By expectation: p = Θ(1). If the errors were independent: p = e^{−Ω(k)}.
       This is a sum of many variables, but they are not independent...
     - Chebyshev's inequality, bounding the covariance of the errors:
       - Feasible to analyze (though kind of nasty).
       - Ideally would get p = 1/√k.
       - We can get p = 1/k^{1/14}.
     - Can we at least get "high probability," i.e. 1/k^c for an arbitrary constant c?

  17. Boosting the error probability in a black-box manner
     - We know that ‖x̃_S − x_S‖₂² is "small" with all but k^{−1/14} probability.
     - Way to get all but k^{−c} probability: repeat 100c times and take the median of the results.
       - With all but k^{−c} probability, more than 75c of the x̃_S^(i) will have "small" error.
       - The median of the results then has at most 3× the "small" total error.
     - But the resulting algorithm is stupid:
       - Run Count-Sketch with R′ = O(cR) rows.
       - Arbitrarily partition the rows into blocks of R.
       - The estimate is the median (over blocks) of the median (within each block) of the individual estimates.
     - Can we show that the direct median is as good as the median-of-medians? (See the sketch below.)
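To get a feel for what the block structure costs, here is a toy comparison of my own: the same per-row estimates of a single coordinate, aggregated either by one direct median over all R′ = cR rows or by the median-of-medians over c blocks of R rows. Note the talk's question is about whole sets of coordinates; this sketch only probes the single-coordinate case.

```python
# Direct median over c*R rows vs. median-of-medians over c blocks of R rows,
# on simulated per-row errors of a single Count-Sketch coordinate.
import numpy as np

rng = np.random.default_rng(6)
n, k = 5_000, 100
R, c, trials = 10, 10, 500            # R' = c * R rows in total

x_rest = rng.standard_normal(n)       # coordinates that may collide with ours

direct_err, mom_err = np.empty(trials), np.empty(trials)
for t in range(trials):
    collide = rng.random((c * R, n)) < 1.0 / k
    signs = rng.choice([-1.0, 1.0], (c * R, n))
    row_err = (collide * signs * x_rest).sum(axis=1)
    direct_err[t] = np.median(row_err)                        # one global median
    block_medians = np.median(row_err.reshape(c, R), axis=1)  # median within each block
    mom_err[t] = np.median(block_medians)                     # median of block medians

print("RMS error, direct median    :", np.sqrt(np.mean(direct_err ** 2)))
print("RMS error, median-of-medians:", np.sqrt(np.mean(mom_err ** 2)))
```

In this toy model the direct median should do at least as well as the median-of-medians; the next section's electoral-college lemma is what lets the talk make a comparison of this flavor rigorously for the full set error.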

  18. Outline
     1. Robust Estimation of Symmetric Variables: Lemma; Relevance to Count-Sketch
     2. Electoral Colleges and Direct Elections: Lemma; Relevance to Count-Sketch
     3. Experiments!

  19. Electoral Colleges
     - Suppose you have a two-party election for k offices.
       - Voters come from a distribution X over {0,1}^k.
       - There is a "true" majority slate of candidates x ∈ {0,1}^k.
       - On election day, we receive ballots x_1, ..., x_n ∼ X.
     - How to best estimate x? For each office:
       - x^majority: the majority vote over all ballots x_1, ..., x_n.
       - x^electoral: group the ballots into states (x_1, ..., x_|CA| for CA, ..., x_{n−|TX|+1}, ..., x_n for TX),
         take the majority within each state, then the majority of the state outcomes.
     - Is x^majority better than x^electoral in every way? That is, is
       Pr[‖x^majority − x‖ > α] ≤ Pr[‖x^electoral − x‖ > α]
       for all α and all norms ‖·‖?
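A toy simulation (my construction, with made-up parameters) of the two estimators: each "voter" is the true slate with every office independently flipped with some probability, ballots are split evenly into states, and we compare the Hamming error of the direct majority against the electoral-college slate.

```python
# Direct majority vs. electoral college for a k-office two-party election in
# which each ballot is the true slate x with independent per-office flips.
import numpy as np

rng = np.random.default_rng(5)
k, n_states, per_state, trials = 50, 11, 101, 500
flip_prob = 0.47                                  # each office flipped w.p. 0.47

x_true = rng.integers(0, 2, k)

maj_err, elec_err = [], []
for _ in range(trials):
    flips = rng.random((n_states * per_state, k)) < flip_prob
    ballots = np.bitwise_xor(x_true, flips)       # noisy copies of the true slate
    x_maj = (ballots.mean(axis=0) > 0.5).astype(int)
    state_maj = ballots.reshape(n_states, per_state, k).mean(axis=1) > 0.5
    x_elec = (state_maj.mean(axis=0) > 0.5).astype(int)
    maj_err.append(np.sum(x_maj != x_true))
    elec_err.append(np.sum(x_elec != x_true))

print("avg Hamming error, direct majority  :", np.mean(maj_err))
print("avg Hamming error, electoral college:", np.mean(elec_err))
```

With noisy-enough ballots the direct majority tends to make fewer mistakes; the theorem on the next slide bounds how far the comparison can go the other way, up to the constants 3 and 4.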

  20. Electoral Colleges
     - Is x^majority better than x^electoral in every way, so that
       Pr[‖x^majority − x‖ > α] ≤ Pr[‖x^electoral − x‖ > α] for all α and ‖·‖?
     - We don't know, but:
     - Theorem. Pr[‖x^majority − x‖ > 3α] ≤ 4 · Pr[‖x^electoral − x‖ > α] for all p-norms ‖·‖.
