Improved Concentration Bounds for Count-Sketch
Gregory T. Minton (MIT → MSR New England), Eric Price (MIT → IBM Almaden → UT Austin)
2014-01-06

Count-Sketch: a classic streaming algorithm
Charikar, Chen, Farach-Colton 2002 [CCF02]
Solves the “heavy hitters” problem.
Estimate a vector $x \in \mathbb{R}^n$ from a low-dimensional sketch $Ax \in \mathbb{R}^m$.
Nice algorithm:
◮ Simple
◮ Used in Google’s MapReduce standard library
[CCF02] bounds the maximum error over all coordinates.
We show, for the same algorithm:
◮ Most coordinates have asymptotically better estimation accuracy.
◮ The average accuracy over many coordinates will be asymptotically better with high probability.
◮ Experiments show our asymptotics are correct.
Caveat: we assume fully independent hash functions.

Outline
1. Robust Estimation of Symmetric Variables
◮ Lemma
◮ Relevance to Count-Sketch
2. Electoral Colleges and Direct Elections
◮ Lemma
◮ Relevance to Count-Sketch
3. Experiments!

Estimating a symmetric random variable’s mean
[Figure: X is, with probability 1/2, a component with mean µ and standard deviation σ, and with probability 1/2, µ ± ∞.]
Unknown distribution X over $\mathbb{R}$, symmetric about unknown µ.
◮ Given samples $x_1, \ldots, x_R \sim X$.
◮ How to estimate µ?
Mean:
◮ Converges to µ as $\sigma/\sqrt{R}$.
◮ No robustness to outliers
Median:
◮ Extremely robust
◮ Doesn’t necessarily converge to µ.

Estimating a symmetric random variable’s mean
[Figure: distribution X with axis marks at µ − σ, µ, µ + σ. Caption: median doesn’t converge.]
Consider: median of pairwise means
$\hat{\mu} = \operatorname{median}_{i \in \{1, 3, 5, \ldots\}} \frac{x_i + x_{i+1}}{2}$
◮ Converges as $O(\sigma/\sqrt{R})$, even with outliers.
That is: median of (X + X) converges.
[See also: Hodges-Lehmann estimator.]
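A minimal Python sketch of the median-of-pairwise-means estimator, under assumptions that are not in the slides: samples are paired consecutively, a trailing unpaired sample is dropped, and the test distribution (µ − σ or µ + σ with equal probability) is a hypothetical example chosen because its plain median can never equal µ.

```python
import random
import statistics


def median_of_pairwise_means(samples):
    """Median of (x_1 + x_2)/2, (x_3 + x_4)/2, ... over disjoint pairs."""
    # Pair up (x_i, x_{i+1}) for i = 1, 3, 5, ...; a trailing unpaired
    # sample is dropped (an assumption, the slides do not say).
    pair_means = [
        (samples[i] + samples[i + 1]) / 2
        for i in range(0, len(samples) - 1, 2)
    ]
    return statistics.median(pair_means)


def sample_two_point(rng, mu, sigma):
    """Hypothetical test distribution: mu - sigma or mu + sigma, each w.p. 1/2."""
    return mu + sigma * rng.choice([-1.0, 1.0])


rng = random.Random(0)
mu, sigma, R = 3.0, 1.0, 2001
xs = [sample_two_point(rng, mu, sigma) for _ in range(R)]

# With an odd sample count the plain median is one of the two support points.
plain_median = statistics.median(xs)
# Each pair mean equals mu with probability 1/2, so the median lands on mu.
pairwise = median_of_pairwise_means(xs)
print("plain median:", plain_median, " median of pairwise means:", pairwise)
```

On this two-point example the plain median stays a distance σ from µ no matter how large R is, while the pair means put half their mass exactly on µ.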

Why does median converge for X + X?
WLOG µ = 0.
Define the Fourier transform $\mathcal{F}_X$ of X:
$\mathcal{F}_X(t) = \mathbb{E}_{x \sim X}[\cos(\tau x t)]$, where $\tau = 2\pi \approx 6.28$
(standard Fourier transform of the PDF, specialized to symmetric X.)
Convolution ⟺ multiplication:
◮ $\mathcal{F}_{X+X}(t) = (\mathcal{F}_X(t))^2 \ge 0$ for all $t$.
Theorem. Let Y be symmetric about 0 with $\mathcal{F}_Y(t) \ge 0$ for all $t$ and $\mathbb{E}[Y^2] = \sigma^2$. Then for all $\epsilon \le 1$,
$\Pr[|y| \le \epsilon\sigma] \gtrsim \epsilon$.
Standard Chernoff bounds: the median of $y_1, \ldots, y_R$ converges as $\sigma/\sqrt{R}$.
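The identity $\mathcal{F}_{X+X} = (\mathcal{F}_X)^2$ can be checked numerically on a small discrete example. The sketch below is illustrative only: the helper names `cosine_transform` and `self_convolve` and the two-point distribution are assumptions, not part of the talk.

```python
import math

TAU = 2 * math.pi  # the slides' tau


def cosine_transform(pmf, t):
    """F_X(t) = E[cos(tau * x * t)] for a discrete pmf given as {value: prob}."""
    return sum(p * math.cos(TAU * x * t) for x, p in pmf.items())


def self_convolve(pmf):
    """Distribution of X + X' for independent X, X' with the same pmf."""
    out = {}
    for a, pa in pmf.items():
        for b, pb in pmf.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


# Hypothetical example: X is -1 or +1 with probability 1/2 each.
pmf_x = {-1.0: 0.5, 1.0: 0.5}
pmf_sum = self_convolve(pmf_x)

for t in [0.0, 0.1, 0.25, 0.3, 0.5, 0.7]:
    fx = cosine_transform(pmf_x, t)
    fsum = cosine_transform(pmf_sum, t)
    print(f"t={t}: F_X={fx:+.4f}  F_(X+X)={fsum:+.4f}  F_X^2={fx * fx:+.4f}")
```

For this X the transform is $\cos(\tau t)$, which goes negative, whereas the transform of X + X is its square and so never does. That non-negativity is the hypothesis the theorem needs.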

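Proof
Theorem. Let $\mathcal{F}_Y(t) \ge 0$ for all $t$ and $\mathbb{E}[Y^2] = 1$. Then for all $\epsilon \le 1$,
$\Pr[|y| \le \epsilon] \gtrsim \epsilon$.
$\mathcal{F}_Y(t) = \mathbb{E}[\cos(\tau y t)] \ge 1 - \frac{\tau^2}{2} t^2$
[Pictorial derivation; the figures did not survive extraction. The surviving chain reads $\Pr[|y| \le \epsilon] = Y \cdot (\ldots) \ge Y \cdot (\ldots) = \mathcal{F}_Y \cdot (\ldots) \ge \ldots \gtrsim \epsilon$, with figure labels $\epsilon$, 1, $1/\epsilon$, and 0.2.]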

Count-Sketch
Want to estimate $x \in \mathbb{R}^n$ from a small “sketch.”
Hash to k buckets and sum up with random signs.
Choose random $h : [n] \to [k]$, $s : [n] \to \{\pm 1\}$. Store
$y_j = \sum_{i : h(i) = j} s(i)\, x_i$
[Figure: the sketch drawn with k columns and R rows.]
Can estimate $x_i$ by $\tilde{x}_i = y_{h(i)}\, s(i)$.
Repeat R times, take the median.
For each row,
$\tilde{x}_i - x_i = \sum_{j \ne i} \begin{cases} \pm x_j & \text{with probability } 1/k \\ 0 & \text{otherwise} \end{cases}$
Symmetric, non-negative Fourier transform.
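The construction above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the class name `CountSketch` is hypothetical, and the hash and sign functions are stored as explicit random tables to mimic the fully independent hashing the talk assumes, which a practical implementation would replace with compact hash families.

```python
import random
import statistics


class CountSketch:
    """R independent rows, each hashing n coordinates into k signed buckets."""

    def __init__(self, n, k, R, seed=0):
        rng = random.Random(seed)
        self.k, self.R = k, R
        # Explicit tables for h_r : [n] -> [k] and s_r : [n] -> {-1, +1}
        # (assumption: stands in for fully independent hash functions).
        self.h = [[rng.randrange(k) for _ in range(n)] for _ in range(R)]
        self.s = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(R)]
        self.y = [[0.0] * k for _ in range(R)]

    def update(self, i, delta):
        """Apply x_i += delta; the sketch is linear in x."""
        for r in range(self.R):
            self.y[r][self.h[r][i]] += self.s[r][i] * delta

    def estimate(self, i):
        """Median over rows of the per-row estimate y_{h(i)} * s(i)."""
        return statistics.median(
            self.y[r][self.h[r][i]] * self.s[r][i] for r in range(self.R)
        )


# Hypothetical input: one heavy coordinate plus 100 small ones.
n, k, R = 1000, 50, 9
sketch = CountSketch(n, k, R, seed=1)
sketch.update(7, 100.0)
for i in range(100, 200):
    sketch.update(i, 1.0)
print("estimate of x_7:", sketch.estimate(7))
```

Each row's error for coordinate 7 is the signed sum of whatever other coordinates share its bucket, which is the per-row error distribution written on the slide.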

Count-Sketch Analysis
Let
$\sigma^2 = \frac{1}{k} \min_{k\text{-sparse } x_{[k]}} \|x - x_{[k]}\|_2^2$
be the “typical” error for a single row of Count-Sketch with k columns.
Theorem. For any coordinate $i$, we have for all $t \le R$ that
$\Pr\left[|\hat{x}_i - x_i| > \sqrt{t/R}\,\sigma\right] \le e^{-\Omega(t)}$.
(CCF02: $t = R = O(\log n)$ case; $\|\hat{x} - x\|_\infty \lesssim \sigma$ w.h.p.)
Corollary. Excluding $e^{-\Omega(R)}$ probability events, we have for each $i$ that
$\mathbb{E}[(\hat{x}_i - x_i)^2] = \sigma^2 / R$.
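As a worked illustration of the definition of $\sigma^2$: the best k-sparse approximation keeps the k largest-magnitude entries of x, so the minimum is the squared norm of the remaining tail. The function name `typical_error_sq` is a hypothetical helper, not something from the talk.

```python
def typical_error_sq(x, k):
    """sigma^2 = (1/k) * min over k-sparse x_[k] of ||x - x_[k]||_2^2."""
    # Keeping the k largest-magnitude entries minimizes the residual,
    # so the residual is the sum of squares of all remaining entries.
    tail = sorted((v * v for v in x), reverse=True)[k:]
    return sum(tail) / k


# Hypothetical vector: two heavy entries and a small tail.
x = [10.0, -3.0, 2.0, 1.0]
print(typical_error_sq(x, 2))
```

With k = 2 the heavy entries 10 and −3 are removed, leaving a tail of 2 and 1, so $\sigma^2 = (4 + 1)/2$.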

Estimation of multiple coordinates?
What about the average error on a set S of k coordinates?
Linearity of expectation: $\mathbb{E}[\|\hat{x}_S - x_S\|_2^2] = O(\frac{1}{R})\, k\sigma^2$.
Does it concentrate?
$\Pr\left[\|\hat{x}_S - x_S\|_2^2 > O(\frac{1}{R})\, k\sigma^2\right] < p = ???$
By expectation: $p = \Theta(1)$.
If independent: $p = e^{-\Omega(k)}$.
Sum of many variables, but not independent...
Chebyshev’s inequality, bounding covariance of error:
◮ Feasible to analyze (though kind of nasty).
◮ Ideally get: $p = 1/\sqrt{k}$.
◮ We can get $p = 1/k^{1/14}$.
Can we at least get “high probability,” i.e. $1/k^c$ for arbitrary constant $c$?

Boosting the error probability in a black box manner
We know that $\|\hat{x}_S - x_S\|_2$ is “small” with all but $k^{-1/14}$ probability.
Way to get all but $k^{-c}$ probability: repeat $100c$ times and take the median of results.
◮ With all but $k^{-c}$ probability, $> 75c$ of the $\hat{x}_S^{(i)}$ will have “small” error.
◮ Median of results has at most 3× “small” total error.
But resulting algorithm is stupid:
◮ Run count-sketch with $R' = O(cR)$.
◮ Arbitrarily partition into blocks of R rows.
◮ Estimate is median (over blocks) of median (within block) of individual estimates.
Can we show that the direct median is as good as the median-of-medians?
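The two estimators being compared can be written side by side. The sketch below handles a single coordinate's per-row estimates and uses consecutive blocks; both choices, and the function names, are assumptions made for illustration (the slides allow an arbitrary partition and apply this to every coordinate in S).

```python
import statistics


def direct_median(row_estimates):
    """Median over all R' per-row estimates at once."""
    return statistics.median(row_estimates)


def median_of_block_medians(row_estimates, block_size):
    """Median (over blocks) of the median (within each block of rows)."""
    blocks = [
        row_estimates[start:start + block_size]
        for start in range(0, len(row_estimates), block_size)
    ]
    return statistics.median(statistics.median(b) for b in blocks)


# Hypothetical per-row estimates for one coordinate, R' = 9 rows, R = 3.
rows = [1, 2, 9, 3, 4, 8, 5, 6, 7]
print(direct_median(rows), median_of_block_medians(rows, 3))
```

On this input the block medians are 2, 4, and 6, so the two estimators disagree, which is why the closing question on the slide is not trivial.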

Electoral Colleges
Suppose you have a two-party election for k offices.
◮ Voters come from a distribution X over $\{0, 1\}^k$.
◮ “True” majority slate of candidates $x \in \{0, 1\}^k$.
◮ Election day, receive ballots $x_1, \ldots, x_n \sim X$.
How to best estimate x? For each office:
[Figure: $x_{\text{majority}}$ is computed directly from all ballots $x_1, x_2, x_3, \ldots, x_{n-1}, x_n$; $x_{\text{electoral}}$ is computed from per-state results $x_{CA}, \ldots, x_{TX}$, where $x_{CA}$ comes from ballots $x_1, \ldots, x_{|CA|}$ and $x_{TX}$ from ballots $x_{n-|TX|+1}, \ldots, x_n$.]
Is $x_{\text{majority}}$ better than $x_{\text{electoral}}$ in every way?
Is $\Pr[\|x_{\text{majority}} - x\| > \alpha] \le \Pr[\|x_{\text{electoral}} - x\| > \alpha]$ for all $\alpha$, $\|\cdot\|$?

Electoral Colleges
Is $x_{\text{majority}}$ better than $x_{\text{electoral}}$ in every way, so
$\Pr[\|x_{\text{majority}} - x\| > \alpha] \le \Pr[\|x_{\text{electoral}} - x\| > \alpha]$
for all $\alpha$, $\|\cdot\|$?
Don’t know, but
Theorem.
$\Pr[\|x_{\text{majority}} - x\| > 3\alpha] \le 4 \cdot \Pr[\|x_{\text{electoral}} - x\| > \alpha]$
for all p-norms $\|\cdot\|$.
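For concreteness, the two estimators can be sketched as follows. Several details are assumptions not fixed by the slides: states are consecutive groups of ballots, each state counts equally in the second-level majority, ties resolve to 0, and the function names are hypothetical.

```python
def majority(bits):
    """1 if strictly more than half of the 0/1 votes are 1, else 0."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def direct_election(ballots):
    """Per-office majority over all ballots; each ballot is a tuple in {0,1}^k."""
    return [majority(office_votes) for office_votes in zip(*ballots)]


def electoral_college(ballots, state_sizes):
    """Per-office majority of per-state majorities.

    States are consecutive groups of ballots with the given sizes, which
    are assumed to sum to len(ballots).
    """
    state_results, start = [], 0
    for size in state_sizes:
        state_results.append(direct_election(ballots[start:start + size]))
        start += size
    return direct_election(state_results)


# Hypothetical election: k = 2 offices, three states of sizes 3, 3, 5.
ballots = [(1, 0), (1, 0), (0, 0)] * 2 + [(0, 0)] * 5
print(direct_election(ballots), electoral_college(ballots, [3, 3, 5]))
```

In this example the first office has 4 of 11 votes overall yet carries two of the three states, so the two estimators return different slates. The theorem bounds how much worse the direct majority can be, without claiming it is never worse.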
