
Count-Min Sketch

Anil Maheshwari, School of Computer Science, Carleton University, Canada


  1. Count-Min Sketch (title slide). Anil Maheshwari, School of Computer Science, Carleton University, Canada.

  2. Outline
     - Review
     - Count-Min Sketch
     - Complexity Analysis
     - Probability Preliminaries
     - Proof of the claim
     - Conclusions

  3. Majority Element Problem
     Finding the Majority Element
     Input: A stream of $n$ elements that is known to contain a majority element.
     Output: The majority element.
     Straightforward approaches:
     - Store the stream in an array $A$. Sort it and pick the middle element (if the elements can be ordered).
     - Count the frequency of each element.
     Issue: Both may need $O(n)$ memory.

  4. Majority Algorithm
     Input: Array $A$ of size $n$ containing a majority element
     Output: The majority element

         c ← 0
         for i = 1 to n do
             if c = 0 then
                 current ← A[i]; c ← c + 1
             else if A[i] = current then
                 c ← c + 1
             else
                 c ← c − 1
             end
         end
         return current
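The pseudocode on this slide translates almost line for line into Python. The following is a minimal sketch; the function name `majority_candidate` is chosen here for illustration, and the result is only meaningful under the slide's assumption that a majority element exists.

```python
from typing import Hashable, Iterable, Optional


def majority_candidate(stream: Iterable[Hashable]) -> Optional[Hashable]:
    """Single pass, two variables: the candidate and its counter."""
    current = None
    c = 0
    for item in stream:
        if c == 0:
            # No candidate survives so far: adopt the incoming item.
            current = item
            c = 1
        elif item == current:
            c += 1
        else:
            # A different item cancels one occurrence of the candidate.
            c -= 1
    return current


print(majority_candidate(["a", "b", "a", "c", "a"]))  # prints a
```

If the stream has no majority element, the returned value is arbitrary; a second pass that counts its occurrences would be needed to confirm it.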

  5. Analysis of Majority Algorithm
     Observations:
     1. The algorithm maintains only two variables: c and current.
     2. Correctness: Each non-majority element can 'kill' at most one majority element.
     Claim: By performing a single pass, using only $O(1)$ additional space, we can report the majority element of $A$ (if it exists).

  6. Misra & Gries [82] Algorithm
     Finding Heavy Hitters
     Input: A stream of $n$ elements and a fixed integer $k < n$.
     Output: Report all elements that occur $\ge n/k$ times.
     1. Initialize $k$ bins, each with a null element and a counter set to 0.
     2. For each element $x$ in the stream do:
        - if $x$ is in bin $b$, increment bin $b$'s counter;
        - else if there is a bin whose counter is 0, assign $x$ to this bin and set its counter to 1;
        - else decrement the counter of every bin.
     3. Output the elements in the bins.
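The three steps of this slide can be sketched in Python as follows. This is an illustrative version, not the authors' code: it stores the bins in a dictionary and represents a bin whose counter is 0 by the absence of a key, which is an implementation choice made here.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return the surviving bins (element -> counter) after one pass."""
    bins: Dict[Hashable, int] = {}
    for x in stream:
        if x in bins:
            bins[x] += 1                 # x already owns a bin
        elif len(bins) < k:
            bins[x] = 1                  # a bin with counter 0 is available
        else:
            # All k bins are occupied: decrement every counter and
            # free the bins whose counter reaches 0.
            for key in list(bins):
                bins[key] -= 1
                if bins[key] == 0:
                    del bins[key]
    return bins


stream = ["a"] * 6 + ["b", "c", "d", "e"]
print(misra_gries(stream, k=2))  # prints {'a': 4}
```

Here $n = 10$ and $k = 2$, so only `a` (6 occurrences) reaches the $n/k = 5$ threshold. The counters are lower bounds on the true frequencies, so the output is a candidate set rather than an exact count.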

  7. Analysis of Misra and Gries Algorithm
     Claim: Let $f^*_x$ be the frequency of $x$ in the stream. Each heavy hitter $x$ is in one of the bins with counter value $\ge f^*_x - n/k$.
     Running time:
     - Initializing $k$ bins: $O(k)$ time.
     - Processing each element requires looking at $O(k)$ bins.
     - Total running time: $O(nk)$.

  8. Generalize More
     For a data stream, using very little space, we want to report:
     1. All the elements that occur frequently, e.g., in at least 2% of the stream.
     2. For each element, its (approximate) frequency.

  9. Count-Min Sketch Data Structure
     Input: An array (stream) $A$ consisting of $n$ numbers and $r$ hash functions $h_1, \ldots, h_r$, where $h_i : \mathbb{N} \to \{1, \ldots, b\}$
     Output: Table CMS[·, ·] consisting of $r$ rows and $b$ columns

         for i = 1 to r do
             for j = 1 to b do
                 CMS[i, j] ← 0
             end
         end
         for i = 1 to n do
             for j = 1 to r do
                 CMS[j, h_j(A[i])] ← CMS[j, h_j(A[i])] + 1
             end
         end
         return CMS[·, ·]
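A minimal Python sketch of this construction follows. The slide takes the hash functions $h_1, \ldots, h_r$ as given; the family used below, $h(x) = ((a x + c) \bmod p) \bmod b$ with random $a, c$ and a fixed prime $p$, is an assumption made here for illustration, as are the names `CountMinSketch` and `build_cms`. Columns are 0-based, unlike the 1-based slides.

```python
import random
from typing import Iterable, List, Tuple

P = (1 << 61) - 1  # Mersenne prime used as modulus (illustrative choice)


class CountMinSketch:
    def __init__(self, r: int, b: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.r, self.b = r, b
        # One (a, c) pair per row defines that row's hash function.
        self.params: List[Tuple[int, int]] = [
            (rng.randrange(1, P), rng.randrange(P)) for _ in range(r)
        ]
        # r rows, b columns, all counters initialised to 0.
        self.table: List[List[int]] = [[0] * b for _ in range(r)]

    def _h(self, j: int, x: int) -> int:
        a, c = self.params[j]
        return ((a * x + c) % P) % self.b

    def add(self, x: int) -> None:
        # Increment exactly one counter in every row.
        for j in range(self.r):
            self.table[j][self._h(j, x)] += 1


def build_cms(stream: Iterable[int], r: int, b: int, seed: int = 0) -> CountMinSketch:
    cms = CountMinSketch(r, b, seed)
    for x in stream:
        cms.add(x)
    return cms


cms = build_cms([5, 7, 7], r=3, b=10)
print([sum(row) for row in cms.table])  # prints [3, 3, 3]
```

Each row's counters sum to the stream length, because every item increments exactly one cell per row.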

  10. Updating CMS table
      An example with $b = 10$ and $r = 3$; assume that the stream is $A = xyy$.
      After initialization (rows 1 to 3, columns 1 to 10):

              1  2  3  4  5  6  7  8  9 10
          1   0  0  0  0  0  0  0  0  0  0
          2   0  0  0  0  0  0  0  0  0  0
          3   0  0  0  0  0  0  0  0  0  0

  11. Execution of Algorithm
      An example with $b = 10$ and $r = 3$; assume that the stream is $A = xyy$.
      Assume the following $h$-values for $x$ and $y$:
      - For $x$: $h_1(x) = 3$, $h_2(x) = 8$, and $h_3(x) = 5$
      - For $y$: $h_1(y) = 6$, $h_2(y) = 8$, and $h_3(y) = 1$
      (The slide shows an empty table with rows 1 to 3 and columns 1 to 10.)

  12. Updating CMS table
      Insertion of $x$: $h_1(x) = 3$, $h_2(x) = 8$, and $h_3(x) = 5$.
      Before:

              1  2  3  4  5  6  7  8  9 10
          1   0  0  0  0  0  0  0  0  0  0
          2   0  0  0  0  0  0  0  0  0  0
          3   0  0  0  0  0  0  0  0  0  0

      After inserting $x$:

              1  2  3  4  5  6  7  8  9 10
          1   0  0  1  0  0  0  0  0  0  0
          2   0  0  0  0  0  0  0  1  0  0
          3   0  0  0  0  1  0  0  0  0  0

  13. Updating CMS table
      Insertion of the 1st $y$: $h_1(y) = 6$, $h_2(y) = 8$, and $h_3(y) = 1$, so $y$ hashes to locations 6, 8, and 1.
      Before:

              1  2  3  4  5  6  7  8  9 10
          1   0  0  1  0  0  0  0  0  0  0
          2   0  0  0  0  0  0  0  1  0  0
          3   0  0  0  0  1  0  0  0  0  0

      After inserting the 1st $y$:

              1  2  3  4  5  6  7  8  9 10
          1   0  0  1  0  0  1  0  0  0  0
          2   0  0  0  0  0  0  0  2  0  0
          3   1  0  0  0  1  0  0  0  0  0

  14. Updating CMS table
      Insertion of the 2nd $y$ (hashes to the same locations 6, 8, and 1).
      Before:

              1  2  3  4  5  6  7  8  9 10
          1   0  0  1  0  0  1  0  0  0  0
          2   0  0  0  0  0  0  0  2  0  0
          3   1  0  0  0  1  0  0  0  0  0

      After inserting the 2nd $y$:

              1  2  3  4  5  6  7  8  9 10
          1   0  0  1  0  0  2  0  0  0  0
          2   0  0  0  0  0  0  0  3  0  0
          3   2  0  0  0  1  0  0  0  0  0
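The walk-through of slides 10 to 14 can be replayed with a few lines of Python. This sketch hard-codes the $h$-values given on the slides instead of using real hash functions; the names `H`, `insert`, and `table` are chosen here for illustration.

```python
# Hash values taken from the slides (1-based column indices).
H = {"x": (3, 8, 5), "y": (6, 8, 1)}
R, B = 3, 10


def insert(table, item):
    # Row j uses the j-th hash value of the item; shift to 0-based columns.
    for j, col in enumerate(H[item]):
        table[j][col - 1] += 1


table = [[0] * B for _ in range(R)]
for item in "xyy":
    insert(table, item)
    print(f"after inserting {item}:")
    for row in table:
        print(" ".join(map(str, row)))
```

The last table printed matches the one on slide 14. Note that $x$ and $y$ collide in row 2 (both hash to column 8), which is why that cell ends at 3.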

  15. Observations on CMS Table Entries
      Let $n$ = total number of items in the stream.
      Let $f^*_x$ = true frequency of $x$ in the stream.
      Let $f_x = \min\{\mathrm{CMS}[1, h_1(x)], \ldots, \mathrm{CMS}[r, h_r(x)]\}$. This is the estimate of the frequency of $x$ that we report.
      1. The size of the CMS table ($= br$) is independent of $n$.
      2. The CMS table can be computed in $O(br + nr)$ time.
      3. For any $x \in A$ and any $j = 1, \ldots, r$: $\mathrm{CMS}[j, h_j(x)] \ge f^*_x$.
      4. Therefore, $f_x \ge f^*_x$ (i.e., $f_x$ is an overestimate).
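To make the estimate $f_x$ concrete, the sketch below evaluates it on the final table of the $A = xyy$ example, again with the slides' $h$-values hard-coded. The names `TABLE`, `H`, and `estimate` are illustrative.

```python
# Final table and hash values from the slides' example (stream A = xyy).
TABLE = [
    [0, 0, 1, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 3, 0, 0],
    [2, 0, 0, 0, 1, 0, 0, 0, 0, 0],
]
H = {"x": (3, 8, 5), "y": (6, 8, 1)}  # 1-based column indices


def estimate(table, hashes, item) -> int:
    # f_x is the minimum over the r cells that the item hashes to.
    return min(table[j][col - 1] for j, col in enumerate(hashes[item]))


for item in "xy":
    cells = [TABLE[j][col - 1] for j, col in enumerate(H[item])]
    print(item, cells, estimate(TABLE, H, item))
# prints:
# x [1, 3, 1] 1
# y [2, 3, 2] 2
```

Every cell is at least the true frequency (1 for $x$, 2 for $y$). The collision in row 2 inflates that cell to 3, and taking the minimum across rows discards it.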

  16. Assume (proof comes later)
      Claim: Let $b = 2/\epsilon$. Then $\Pr[f_x - f^*_x \ge \epsilon n] \le 1/2^r$.
      Corollary: With probability at least $1 - 1/2^r$, $f^*_x \le f_x \le f^*_x + \epsilon n$.
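Read as a sizing rule, the claim says how to pick the table dimensions for a target accuracy. The helper below is a small sketch of that arithmetic; the failure-probability parameter `delta` and the function name `cms_dimensions` are introduced here and do not appear on the slides. It picks the smallest $r$ with $1/2^r \le \delta$.

```python
import math
from typing import Tuple


def cms_dimensions(eps: float, delta: float) -> Tuple[int, int]:
    """Return (r, b) such that b >= 2/eps and 1/2**r <= delta."""
    b = math.ceil(2 / eps)                # number of columns
    r = math.ceil(math.log2(1 / delta))   # number of rows (hash functions)
    return r, b


r, b = cms_dimensions(eps=1 / 64, delta=1 / 1024)
print(r, b)  # prints 10 128
```

With these values the table holds $rb = 1280$ counters regardless of the stream length $n$.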

  17. Reporting Frequent Elements
      Suppose we want to report all the elements of $A$ that occur approximately $\ge n/k$ times for some integer $k$.
      - In the Claim, set $\epsilon = 1/(3k)$. Then $b = 2/\epsilon = 6k$.
      - Construct a CMS table of size $br = 6kr$.
      - Scan $A$ and compute the entries in the CMS table.
      - Maintain a set of $O(k)$ items that occur most frequently among all the elements of $A$ scanned so far. How?

  18. Heap Data Structure
      The items are stored in a heap with the $f_x$ values as keys.
      What is a heap? An array that stores $n$ elements and supports:
      - Find Max or Min: Report the element with the largest/smallest key value in the heap in $O(1)$ time.
      - Insert($x$, $k$): Insert element $x$ with key $k$ in the heap in $O(\log n)$ time.
      - Delete($x$): Delete element $x$ from the heap in $O(\log n)$ time.
      - ...

  19. Reporting Frequent Elements contd.
      Assume we have scanned $i - 1$ items and have updated the CMS table and the heap.
      Consider the $i$-th item (say $x = A[i]$) and perform the following:
      1. For $j = 1$ to $r$: update the CMS table by executing $\mathrm{CMS}[j, h_j(x)] \leftarrow \mathrm{CMS}[j, h_j(x)] + 1$.
      2. Let $f_x = \min\{\mathrm{CMS}[1, h_1(x)], \ldots, \mathrm{CMS}[r, h_r(x)]\}$.
         If $f_x \ge i/k$, do:
         1. If $x \in$ heap, delete $x$ and re-insert it with the updated $f_x$ value.
         2. If $x \notin$ heap, insert it in the heap and remove all the elements whose count is less than $i/k$.
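The per-item procedure above can be sketched in Python with the standard `heapq` module. Several details are assumptions made here rather than taken from the slides: `heapq` has no delete-by-element operation, so "delete and re-insert" is emulated by pushing a fresh entry and skipping stale ones during pruning; the hash family is an illustrative choice; and the names `heavy_hitters` and `tracked` are hypothetical.

```python
import heapq
import random
from typing import Dict, Iterable, List, Tuple

P = (1 << 61) - 1  # prime modulus for the illustrative hash family


def heavy_hitters(stream: Iterable[int], k: int, r: int = 5, seed: int = 0) -> Dict[int, int]:
    b = 6 * k  # eps = 1/(3k) gives b = 2/eps = 6k
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(r)]
    table = [[0] * b for _ in range(r)]
    tracked: Dict[int, int] = {}       # item -> latest estimate f_x
    heap: List[Tuple[int, int]] = []   # (estimate, item); may hold stale entries

    for i, x in enumerate(stream, start=1):
        # Step 1: update one counter per row.
        cols = [((a * x + c) % P) % b for a, c in params]
        for j, col in enumerate(cols):
            table[j][col] += 1
        # Step 2: current estimate for x.
        f_x = min(table[j][col] for j, col in enumerate(cols))
        if f_x * k >= i:               # f_x >= i/k, tested without floats
            is_new = x not in tracked
            tracked[x] = f_x
            heapq.heappush(heap, (f_x, x))  # any older entry for x is now stale
            if is_new:
                # Remove everything whose count is below i/k.
                while heap and heap[0][0] * k < i:
                    f, y = heapq.heappop(heap)
                    if tracked.get(y) == f:  # ignore stale entries
                        del tracked[y]
    return tracked


result = heavy_hitters([7, 1, 7, 2, 7, 3, 7, 4, 7, 5], k=2)
print(7 in result, result[7] >= 5)  # prints True True
```

Which other items appear in `result` depends on the hash values, since low-count entries are only pruned when a new item enters the heap. The item 7, which occurs 5 of 10 times, is always retained because its estimate never falls below its true count.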
