
Estimating Frequency Moments - PowerPoint Presentation



  1. Estimating Frequency Moments
     Anil Maheshwari (anil@scs.carleton.ca)
     School of Computer Science, Carleton University, Canada

  2. Outline
     1. Frequency Moments
     2. Estimating $F_0$
     3. Algorithm
     4. Correctness
     5. Further Improvements
     6. Estimating $F_2$
     7. Correctness
     8. Improving Variance
     9. Complexity

  3. Frequency Moments
     Definition: Let $A = (a_1, a_2, \ldots, a_n)$ be a stream whose elements are from the universe $U = \{1, \ldots, u\}$. Let $m_i$ be the number of elements in $A$ that are equal to $i$. The $k$-th frequency moment is $F_k = \sum_{i=1}^{u} m_i^k$, where $0^0 = 0$.

  4. Example
     $F_k = \sum_{i=1}^{u} m_i^k$
     $A = (3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2)$, so $m_1 = m_3 = 3$, $m_2 = 10$, $m_4 = 2$, $m_7 = 1$, $m_5 = m_6 = 0$.
     $F_0 = \sum_{i=1}^{7} m_i^0 = 3^0 + 10^0 + 3^0 + 2^0 + 0^0 + 0^0 + 1^0 = 5$ (number of distinct elements in $A$)
     $F_1 = \sum_{i=1}^{7} m_i^1 = 3^1 + 10^1 + 3^1 + 2^1 + 0^1 + 0^1 + 1^1 = 19$ (number of elements in $A$)
     $F_2 = \sum_{i=1}^{7} m_i^2 = 3^2 + 10^2 + 3^2 + 2^2 + 0^2 + 0^2 + 1^2 = 123$ (surprise number)
     ...
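
For reference, the moments in the example can be computed exactly with a minimal, non-streaming Python sketch (the function name `frequency_moment` is hypothetical). It keeps one counter per distinct element, i.e. the linear-space baseline that the streaming algorithms below try to avoid.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k = sum over i in U of m_i^k, with the convention 0^0 = 0.

    Counter only stores elements that occur (m_i >= 1), so values with
    m_i = 0 contribute nothing, which matches 0^0 = 0 when k = 0.
    """
    counts = Counter(stream)  # m_i for every i that appears in the stream
    return sum(m ** k for m in counts.values())


A = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
print([frequency_moment(A, k) for k in (0, 1, 2)])  # [5, 19, 123]
```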

  5. Streaming Problem
     Find frequency moments in a stream.
     Input: A stream $A$ consisting of $n$ elements from the universe $U = \{1, \ldots, u\}$.
     Output: Estimates of the frequency moments $F_k$ for different values of $k$.
     Our task: Estimate $F_0$ and $F_2$ using sublinear space.
     Reference: Noga Alon, Yossi Matias, and Mario Szegedy, "The space complexity of approximating the frequency moments", Journal of Computer and System Sciences, 1999.

  6. Estimating $F_0$
     Computation of $F_0$
     Input: Stream $A = (a_1, a_2, \ldots, a_n)$, where each $a_i \in U = \{1, \ldots, u\}$.
     Output: An estimate $\hat{F}_0$ of the number of distinct elements in $A$ such that $\Pr\left(\frac{1}{c} \le \frac{\hat{F}_0}{F_0} \le c\right) \ge 1 - \frac{2}{c}$ for some constant $c$, using sublinear space.

  7. Algorithm for Estimating $F_0$
     Input: Stream $A$ and a hash function $h : U \to U$
     Output: Estimate $\hat{F}_0$
     Step 1: Initialize $R := 0$
     Step 2: For each element $a_i \in A$ do:
        1. Compute the binary representation of $h(a_i)$
        2. Let $r$ be the location of the rightmost 1 in the binary representation
        3. If $r > R$, set $R := r$
     Step 3: Return $\hat{F}_0 = 2^R$
     Space requirement: $O(\log u)$ bits
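
A minimal Python sketch of Steps 1 to 3, under stated assumptions: locations are numbered from 1 at the least significant bit, and the slide's $h$ is stood in for by a hypothetical linear hash $h(x) = (ax + b) \bmod P$ with $P = 2^{61} - 1$ assumed to be at least $u$. This is a swapped-in hash family, not necessarily the one used in the lecture. The names `rightmost_one`, `make_hash`, and `estimate_f0` are illustrative.

```python
import random

P = 2**61 - 1  # assumption: a Mersenne prime at least as large as u


def rightmost_one(x):
    """1-based location of the rightmost 1 bit of x (location 1 = least
    significant bit); returns 0 when x == 0 because no bit is set."""
    return (x & -x).bit_length()


def make_hash(rng):
    """Hypothetical stand-in for h: U -> U, h(x) = (a*x + b) mod P."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: (a * x + b) % P


def estimate_f0(stream, h):
    R = 0                            # Step 1
    for x in stream:                 # Step 2: one pass over the stream
        r = rightmost_one(h(x))      # 2.1-2.2: rightmost 1 in h(x)
        if r > R:                    # 2.3: keep the largest location
            R = r
    return 2 ** R                    # Step 3


# With the identity as "hash", locations are 1, 2, 3, 4, so the estimate is 2^4.
print(estimate_f0([1, 2, 4, 8, 8, 4], lambda x: x))  # 16
print(estimate_f0(range(1, 1001), make_hash(random.Random(0))))
```

Only the current maximum $R$ and the hash parameters are stored; the duplicates in the first call do not change the result, which is the point of a distinct-count sketch.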

  8. Observation 1
     Let $d$ be the smallest integer such that $2^d \ge u$ ($d$ bits are sufficient to represent the numbers in $U$).
     Observation 1: $\Pr(\text{rightmost 1 in } h(a_i) \text{ is at location} \ge r + 1) = \frac{1}{2^r}$ (for $h(a_i)$ uniformly distributed over $d$-bit strings, with locations numbered from 1 at the least significant bit).
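
"Rightmost 1 at location $\ge r + 1$" means the $r$ lowest bits are all 0. Counting the all-zero value as satisfying this event (which is what makes the probability exactly $1/2^r$), a short exact enumeration over all $d$-bit values confirms Observation 1. The helper name `prob_low_bits_zero` is hypothetical.

```python
from fractions import Fraction


def prob_low_bits_zero(d, r):
    """Exact fraction of d-bit values whose r lowest bits are all 0."""
    hits = sum(1 for v in range(2 ** d) if v % (2 ** r) == 0)
    return Fraction(hits, 2 ** d)


print(prob_low_bits_zero(8, 3))  # 1/8
```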

  9. Observation 2
     Observation 2: For $a_i \ne a_j$, $\Pr(\text{rightmost 1 in } h(a_i) \ge r + 1 \text{ and rightmost 1 in } h(a_j) \ge r + 1) = \frac{1}{2^{2r}}$.
     This uses that $h(a_i)$ and $h(a_j)$ are independent, e.g. when $h$ is drawn from a pairwise-independent family.
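
Observation 2 can be checked exactly for one standard pairwise-independent family, the linear maps $h(x) = ax + b$ over the finite field $GF(2^d)$. This is an illustrative sketch with assumed parameters: $d = 4$, field modulus $x^4 + x + 1$, and hypothetical helper names. For $x \ne y$ the map $(a, b) \mapsto (h(x), h(y))$ is a bijection, so enumerating all $2^{2d}$ choices of $(a, b)$ gives the joint probability exactly.

```python
from fractions import Fraction

D = 4              # assumption: tiny field GF(2^4) so we can enumerate everything
IRRED = 0b10011    # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply a and b in GF(2^D): carry-less product reduced by IRRED."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^D) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << D):
            a ^= IRRED         # reduce back below 2^D
    return result


def joint_prob(x, y, r):
    """Exact Pr over all (a, b) that both h(x) and h(y) have r low zero bits."""
    mask = (1 << r) - 1
    hits = 0
    for a in range(1 << D):
        for b in range(1 << D):
            hx = gf_mul(a, x) ^ b
            hy = gf_mul(a, y) ^ b
            if (hx & mask) == 0 and (hy & mask) == 0:
                hits += 1
    return Fraction(hits, 1 << (2 * D))


print(joint_prob(3, 7, 2))  # 1/16 = 1/2^(2r) for r = 2
```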

  10. Observation 3
     Fix $r \in \{1, \ldots, d\}$. For every $x \in A$, define the indicator r.v.
        $I^r_x = 1$ if the rightmost 1 in $h(x)$ is at location $\ge r + 1$, and $I^r_x = 0$ otherwise.
     Let $Z_r = \sum_x I^r_x$ (the sum is over the distinct elements of $A$).
     Observation 3: The following hold:
        1. $E[I^r_x] = \frac{1}{2^r}$
        2. $\mathrm{Var}[I^r_x] = \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right)$
        3. $E[Z_r] = \frac{F_0}{2^r}$
        4. $\mathrm{Var}[Z_r] \le E[Z_r]$

  11. Observation 3.1
     $E[I^r_x] = \Pr(I^r_x = 1) = \frac{1}{2^r}$ (by Observation 1)

  12. Observation 3.2
     $\mathrm{Var}[I^r_x] = E[(I^r_x)^2] - E[I^r_x]^2 = \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right)$, since $(I^r_x)^2 = I^r_x$.

  13. Observation 3.3
     $E[Z_r] = \frac{F_0}{2^r}$ (by linearity of expectation over the $F_0$ distinct elements)

  14. Observation 3.4
     $\mathrm{Var}[Z_r] = F_0 \cdot \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right) \le \frac{F_0}{2^r} = E[Z_r]$
     The equality uses Observation 2: pairwise independence makes all covariances vanish, so the variance of the sum is the sum of the variances.
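
Observations 3.3 and 3.4 can be verified exactly on a small instance. The sketch below assumes the same illustrative family $h(x) = ax + b$ over $GF(2^4)$ (modulus $x^4 + x + 1$), treats "rightmost 1 at location $\ge r + 1$" as "the $r$ low bits are zero", and enumerates every $(a, b)$ to get the exact mean and variance of $Z_r$. The helper names are hypothetical.

```python
from fractions import Fraction

D = 4              # assumption: GF(2^4), small enough to enumerate all hashes
IRRED = 0b10011    # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply a and b in GF(2^D): carry-less product reduced by IRRED."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << D):
            a ^= IRRED
    return result


def z_moments(distinct, r):
    """Exact (E[Z_r], Var[Z_r]) over all 2^(2D) hashes h(x) = a*x + b."""
    mask = (1 << r) - 1
    values = []
    for a in range(1 << D):
        for b in range(1 << D):
            # Z_r counts distinct elements whose hash has r low zero bits
            z = sum(1 for x in distinct if ((gf_mul(a, x) ^ b) & mask) == 0)
            values.append(z)
    n = len(values)
    mean = Fraction(sum(values), n)
    var = Fraction(sum(v * v for v in values), n) - mean * mean
    return mean, var


mean, var = z_moments([1, 2, 3, 5, 9], r=2)  # F_0 = 5
print(mean, var)  # 5/4 15/16, i.e. F_0/2^r and F_0 (1/2^r)(1 - 1/2^r)
```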

  15. Observation 4
     If $2^r > cF_0$, then $\Pr(Z_r > 0) < \frac{1}{c}$.
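
One way to fill in the proof of Observation 4 is a sketch using Markov's inequality (not stated on the slide). Since $Z_r$ is a non-negative integer, combining Markov's inequality with Observation 3.3 gives:

```latex
\Pr(Z_r > 0) = \Pr(Z_r \ge 1) \le \frac{E[Z_r]}{1} = \frac{F_0}{2^r} < \frac{F_0}{c F_0} = \frac{1}{c}
```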

  16. Chebyshev's Inequality
     $\Pr(|X - E[X]| \ge \alpha) \le \frac{\mathrm{Var}[X]}{\alpha^2}$

  17. Observation 5
     If $c \cdot 2^r < F_0$, then $\Pr(Z_r = 0) < \frac{1}{c}$.
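
A sketch of how Chebyshev's inequality yields Observation 5. Take $\alpha = E[Z_r] = F_0 / 2^r$, which is larger than $c$ under the hypothesis, and use $\mathrm{Var}[Z_r] \le E[Z_r]$ from Observation 3.4:

```latex
\Pr(Z_r = 0) \le \Pr\bigl(|Z_r - E[Z_r]| \ge E[Z_r]\bigr)
  \le \frac{\mathrm{Var}[Z_r]}{E[Z_r]^2}
  \le \frac{1}{E[Z_r]} = \frac{2^r}{F_0} < \frac{1}{c}
```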

  18. Observation 6
     Claim: Set $\hat{F}_0 = 2^R$. We have $\Pr\left(\frac{1}{c} \le \frac{\hat{F}_0}{F_0} \le c\right) \ge 1 - \frac{2}{c}$.
     Observation 4: If $2^r > cF_0$, then $\Pr(Z_r > 0) < \frac{1}{c}$.
     Observation 5: If $c \cdot 2^r < F_0$, then $\Pr(Z_r = 0) < \frac{1}{c}$.

  19. Improving success probability
     Execute the algorithm $s$ times in parallel (with independent hash functions).
     Let $R$ be the median value among these runs.
     Return $\hat{F}_0 = 2^R$.
     Note: The algorithm uses $O(s \log u)$ bits.
     Claim: For $c > 4$ and any $\epsilon > 0$, there exists $s = O(\log \frac{1}{\epsilon})$ such that $\Pr\left(\frac{1}{c} \le \frac{\hat{F}_0}{F_0} \le c\right) \ge 1 - \epsilon$.
     Technique: Median + Chernoff bounds

  20. Improving success probability (contd.)
     $i$-th run of the algorithm (with its own hash function $h_i$):
     Step 1: Initialize $R_i := 0$
     Step 2: For each element $a_j \in A$ do:
        1. Compute the binary representation of $h_i(a_j)$
        2. Let $r$ be the location of the rightmost 1 in the binary representation
        3. If $r > R_i$, set $R_i := r$
     Step 3: Return $R_i$
     Let $R = \mathrm{Median}(R_1, R_2, \ldots, R_s)$.
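
A minimal Python sketch of the median trick. It makes one pass over the stream and updates all $s$ registers $R_1, \ldots, R_s$ for each element, so the runs really are "in parallel". As before, the hash family $h_i(x) = (a_i x + b_i) \bmod P$ with $P = 2^{61} - 1$ is an assumed stand-in, and the function names are hypothetical. `statistics.median_low` returns an actual element of the list, so $R$ is always one of the $R_i$. Choosing $s$ odd makes the median unique.

```python
import random
import statistics

P = 2**61 - 1  # assumption: a Mersenne prime at least as large as u


def rightmost_one(x):
    """1-based location of the rightmost 1 bit (0 if x == 0)."""
    return (x & -x).bit_length()


def draw_hashes(s, seed=None):
    """s independent (a_i, b_i) pairs for the hypothetical h_i(x) = (a_i x + b_i) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(s)]


def median_estimate_f0(stream, hashes):
    Rs = [0] * len(hashes)                  # Step 1 for every run i
    for x in stream:                        # a single pass over the stream
        for i, (a, b) in enumerate(hashes):
            r = rightmost_one((a * x + b) % P)
            if r > Rs[i]:                   # Step 2.3 for run i
                Rs[i] = r
    R = statistics.median_low(Rs)           # R = Median(R_1, ..., R_s)
    return 2 ** R


print(median_estimate_f0(range(1, 1001), draw_hashes(9, seed=7)))
```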

  21. Indicator Random Variables
     Define indicator random variables $X_1, \ldots, X_s$:
        $X_i = 0$ if run $i$ succeeds, i.e. $\frac{1}{c} \le \frac{2^{R_i}}{F_0} \le c$, and $X_i = 1$ otherwise.
     1. $E[X_i] = \Pr(X_i = 1) \le \frac{2}{c} = \beta < \frac{1}{2}$ (since $c > 4$)
     2. Let $X = \sum_{i=1}^{s} X_i$ = number of failures in $s$ runs
     3. $E[X] \le s\beta < \frac{s}{2}$
     4. If $X < \frac{s}{2}$, then $\frac{1}{c} \le \frac{2^R}{F_0} \le c$, where $R = \mathrm{Median}(R_1, R_2, \ldots, R_s)$: the successful values of $R_i$ form a contiguous range, and more than half of the $R_i$ lie in it, so the median does too.

  22. Chernoff Bounds
     If the r.v. $X$ is a sum of independent, identically distributed indicator r.v.s and $0 < \delta < 1$, then $\Pr(X \ge (1 + \delta) E[X]) \le e^{-\delta^2 E[X] / 3}$.
     Proof: See my notes.
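
As a numerical sanity check (not a proof), the sketch below compares the bound with the exact upper tail of a binomial $X$, which is a sum of independent identical indicators with $E[X] = np$. The parameter choices are arbitrary illustrations and the function names are hypothetical.

```python
import math


def binomial_upper_tail(n, p, t):
    """Exact Pr[X >= t] for X ~ Binomial(n, p)."""
    k_min = max(0, math.ceil(t))
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def chernoff_bound(mu, delta):
    """Upper bound e^(-delta^2 mu / 3) on Pr[X >= (1 + delta) mu], 0 < delta < 1."""
    return math.exp(-delta**2 * mu / 3)


for n, p, delta in [(100, 0.25, 0.5), (200, 0.5, 0.25), (80, 0.125, 0.75)]:
    mu = n * p
    exact = binomial_upper_tail(n, p, (1 + delta) * mu)
    print(n, p, delta, exact, chernoff_bound(mu, delta))
```

The exact tails come out well below the bound, which is expected: the Chernoff bound trades tightness for a simple closed form that is exponential in $E[X]$.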

  23. Main Result
     Claim: For any $\epsilon > 0$, if $s = O(\log \frac{1}{\epsilon})$, then $\Pr(X < \frac{s}{2}) \ge 1 - \epsilon$.
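
To see the claim numerically, assume each run fails independently with probability at most $\beta = 2/c$. Then $\Pr(X \ge s/2)$ is at most the exact tail of $\mathrm{Binomial}(s, \beta)$, and we can search for the smallest odd $s$ that pushes this tail below $\epsilon$. This is an illustrative sketch, not the lecture's constant; the names `failure_tail` and `runs_needed` are hypothetical. Terms are summed in log space via `math.lgamma` so that large $s$ does not overflow a float.

```python
from math import exp, lgamma, log


def failure_tail(s, beta):
    """Pr[Bin(s, beta) >= s/2]: upper bound on Pr[X >= s/2] when each of the
    s independent runs fails with probability at most beta (0 < beta < 1)."""
    k_min = (s + 1) // 2  # smallest integer k with k >= s/2
    total = 0.0
    for k in range(k_min, s + 1):
        log_term = (lgamma(s + 1) - lgamma(k + 1) - lgamma(s - k + 1)
                    + k * log(beta) + (s - k) * log(1 - beta))
        total += exp(log_term)
    return total


def runs_needed(eps, beta, max_s=10_001):
    """Smallest odd s (so the median is unique) with failure_tail(s, beta) <= eps."""
    for s in range(1, max_s + 1, 2):
        if failure_tail(s, beta) <= eps:
            return s
    raise ValueError("no odd s <= max_s suffices")


beta = 2 / 5  # c = 5 > 4
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, runs_needed(eps, beta))
```

Each tenfold decrease in $\epsilon$ adds a roughly constant number of runs, consistent with $s = O(\log \frac{1}{\epsilon})$.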
