Count-Min Sketch


  1. Count-Min Sketch
     Anil Maheshwari (anil@scs.carleton.ca), School of Computer Science, Carleton University, Canada

  2. Outline
     (1) Majority element
     (2) Count-Min Sketch
     (3) Complexity Analysis
     (4) Markov’s Inequality
     (5) Proof of the claim
     (6) Conclusions

  3. Problem: Finding the Majority Element
     Input: A stream of $n$ elements that is guaranteed to contain a majority element, i.e. an element that occurs at least $1 + \lfloor n/2 \rfloor$ times.
     Output: The majority element.
     An example with $n = 19$: Input stream = [3 2 4 7 2 2 3 2 2 1 4 2 2 2 1 1 2 3 2]

  4. Straightforward Solutions
     Solution 1: Store the stream in an array $A$. Sort and pick the middle element.
     Complexity: $O(n \log n)$ time and $O(n)$ space.
     Solution 2: Count the frequency of each element.
     Input: 3 2 4 7 2 2 3 2 2 1 4 2 2 2 1 1 2 3 2
       Element    1   2   3   4   7
       Frequency  3  10   3   2   1
     Complexity: ?
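
The two straightforward solutions can be sketched in a few lines of Python. This is only an illustration on the example stream; the function names are made up here, and `collections.Counter` stands in for the frequency table of Solution 2.

```python
from collections import Counter

stream = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]

def majority_by_sorting(a):
    """Solution 1: sort a copy of the stream and return the middle element."""
    return sorted(a)[len(a) // 2]

def majority_by_counting(a):
    """Solution 2: count every element, then return the most frequent one."""
    counts = Counter(a)
    element, _ = counts.most_common(1)[0]
    return element

print(majority_by_sorting(stream))       # 2
print(majority_by_counting(stream))      # 2
print(sorted(Counter(stream).items()))   # [(1, 3), (2, 10), (3, 3), (4, 2), (7, 1)]
```

Both functions keep the whole stream, or one counter per distinct element, in memory.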

  5. Do we need that much space?
     Problem: Finding the Majority Element
     Input: A stream of $n$ elements that is guaranteed to contain a majority element.
     Output: The majority element.
     The memory required in Solutions 1 & 2 grows with the number of distinct elements in the stream.
     What if we can only use $O(1)$ space?

  6. Majority Algorithm
     Input: Array $A$ of size $n$ containing a majority element
     Output: The majority element
       c ← 0
       for i = 1 to n do
         if c = 0 then
           current ← A[i]; c ← c + 1
         else if A[i] = current then
           c ← c + 1
         else
           c ← c − 1
         end
       end
       return current
     Trace on the example stream (the remaining entries are left blank on the slide):
       A[i]:     3 2 4 7 2 2 3 2 ...
       current:  ...
       c:        0 ...
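
A direct Python transcription of the pseudocode above, as a minimal sketch. The function name `majority_element` is chosen here for illustration, and the result is only meaningful when the input really contains a majority element.

```python
def majority_element(a):
    """Single pass with two variables: a candidate and a counter."""
    current = None
    c = 0
    for item in a:
        if c == 0:
            # No surviving candidate: adopt the current item.
            current = item
            c = 1
        elif item == current:
            c += 1
        else:
            # A different item cancels one occurrence of the candidate.
            c -= 1
    return current

stream = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
print(majority_element(stream))  # 2
```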

  7. Analysis of Majority Algorithm
     Observations:
     (1) The algorithm maintains only two variables: c and current.
     (2) Correctness: Each non-majority element can ‘kill’ at most one majority element.
     Claim: By performing a single pass and using only $O(1)$ additional space, we can report the majority element of $A$ (if it exists).

  8. Misra & Gries [82] Algorithm: Finding Heavy Hitters
     Input: A stream of $n$ elements and a fixed integer $k < n$.
     Output: Report all heavy hitters, i.e. elements that occur $\ge n/k$ times.
     (1) Initialize $k$ bins, each with a null element and a counter set to 0.
     (2) For each element $x$ in the stream do
           if $x$ is in some bin $b$ then increment bin $b$'s counter
           else if some bin has counter 0 then assign $x$ to this bin and set its counter to 1
           else decrement the counter of every bin.
     (3) Output the elements in the bins.
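
The three steps above can be sketched in Python as follows. This is an illustrative version, not the authors' code: a dictionary plays the role of the $k$ bins, and a missing key stands for a bin whose counter is 0.

```python
def misra_gries(stream, k):
    """Return {element: counter} for the at most k elements left in the bins."""
    bins = {}  # element -> counter; an absent key is an empty bin
    for x in stream:
        if x in bins:
            bins[x] += 1
        elif len(bins) < k:
            # Some bin has counter 0: assign x to it with counter 1.
            bins[x] = 1
        else:
            # All k bins are occupied: decrement every counter.
            for y in list(bins):
                bins[y] -= 1
                if bins[y] == 0:
                    del bins[y]  # the bin becomes free again
    return bins

stream = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
print(misra_gries(stream, 2))  # {2: 6, 1: 1}
```

The surviving counters are lower bounds on the true frequencies, so a second pass over the stream would be needed to obtain exact counts.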

  9. Analysis of Misra and Gries Algorithm
     Claim: Let $f^*_x$ = frequency of $x$ in the stream. Each heavy hitter $x$ is in one of the bins with counter value $\ge f^*_x - n/k$.
     Correctness: What can be the minimum value of the counter of a heavy hitter?
     Running Time: Initializing $k$ bins takes $O(k)$ time. Processing each element requires looking at $O(k)$ bins. Total running time = $O(nk)$.
     Space: $O(k)$
     Reference: J. Misra and D. Gries, “Finding repeated elements”, Science of Computer Programming, Vol. 2 (2): 143-152, 1982.

  10. Count-Min Sketch
      Problem: For a data stream, using very little space, we want to report:
      (1) All the elements that occur frequently, e.g. at least 2% of the time.
      (2) For each element, its (approximate) frequency.

  11. Count-Min Sketch Data Structure
      Input: An array (stream) $A$ consisting of $n$ numbers and $r$ hash functions $h_1, \dots, h_r$, where $h_i : \mathbb{N} \to \{1, \dots, b\}$
      Output: CMS[·, ·] table consisting of $r$ rows and $b$ columns
        for i = 1 to r do
          for j = 1 to b do
            CMS[i, j] ← 0
          end
        end
        for i = 1 to n do
          for j = 1 to r do
            CMS[j, h_j(A[i])] ← CMS[j, h_j(A[i])] + 1
          end
        end
        return CMS[·, ·]
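
A minimal Python sketch of the construction above. The slides do not say how the hash functions are chosen, so `make_hash_functions` is an assumption made here: it draws functions of the form ((a·x + c) mod p) mod b with a fixed prime p, and it assumes the stream items are non-negative integers smaller than p. Columns are 0-based in the code, whereas the slides number them 1 to $b$.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; assumed to be larger than every stream item

def make_hash_functions(r, b, seed=0):
    """Hypothetical helper: r random hash functions mapping integers to 0..b-1."""
    rng = random.Random(seed)
    funcs = []
    for _ in range(r):
        a = rng.randrange(1, PRIME)
        c = rng.randrange(PRIME)
        # Default arguments freeze a and c for this particular function.
        funcs.append(lambda x, a=a, c=c: ((a * x + c) % PRIME) % b)
    return funcs

def build_cms(stream, hash_funcs, b):
    """Return the CMS table: one row per hash function, b counters per row."""
    r = len(hash_funcs)
    cms = [[0] * b for _ in range(r)]
    for item in stream:
        for j in range(r):
            cms[j][hash_funcs[j](item)] += 1
    return cms

stream = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
table = build_cms(stream, make_hash_functions(r=3, b=10), b=10)
for row in table:
    print(row)
```

Every item increments exactly one counter per row, so each row of the table sums to $n$.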

  12. Illustration of Algorithm
      Let $b = 10$ and $r = 3$. Assume that stream $A = xyy$.
      Assume the following $h$-values for $x$ and $y$:
      For $x$: $h_1(x) = 3$, $h_2(x) = 8$, and $h_3(x) = 5$
      For $y$: $h_1(y) = 6$, $h_2(y) = 8$, and $h_3(y) = 1$
      CMS[∗, ∗] is a table with rows 1 to 3 and columns 1 to 10 (shown as an empty grid on the slide), filled in by:
        for i = 1 to n do
          for j = 1 to r do
            CMS[j, h_j(A[i])] ← CMS[j, h_j(A[i])] + 1
          end
        end
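
The illustration can be replayed in Python with the $h$-values given above hard-coded as lookup tables. This is a sketch for this one example only; the printed rows use 0-based positions, so column 3 of the slide is index 2.

```python
b, r = 10, 3
stream = ["x", "y", "y"]

# h[j][item] is the 1-based column that hash function j+1 assigns to item.
h = [
    {"x": 3, "y": 6},
    {"x": 8, "y": 8},
    {"x": 5, "y": 1},
]

cms = [[0] * b for _ in range(r)]
for item in stream:
    for j in range(r):
        cms[j][h[j][item] - 1] += 1  # -1 converts the 1-based column to 0-based

for row in cms:
    print(row)
# [0, 0, 1, 0, 0, 2, 0, 0, 0, 0]
# [0, 0, 0, 0, 0, 0, 0, 3, 0, 0]
# [2, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

In row 2 both elements hash to column 8, so that single counter holds the combined count 3.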

  13. Observations
      Let $n$ = total number of items in the stream and $f^*_x$ = true frequency of $x$ in the stream.
      Let $f_x = \min\{CMS[1, h_1(x)], \dots, CMS[r, h_r(x)]\}$. Report $f_x$ as the estimate of the frequency of $x$.
      Observations:
      (1) The size of the CMS table ($= br$) is independent of $n$.
      (2) The CMS table can be computed in $O(br + nr)$ time.
      (3) For any $x \in A$ and for any $j = 1, \dots, r$, $CMS[j, h_j(x)] \ge f^*_x$.
      (4) $f_x$ is an overestimate, as $f_x \ge f^*_x$.
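
Observations (3) and (4) follow from how the table is filled. In a fixed row $j$, the cell that $x$ hashes to is incremented once for every stream item that hashes to the same cell, so it counts $x$ itself plus every element colliding with $x$ in that row. Written out, using only the symbols defined above:

```latex
\[
  CMS[j, h_j(x)]
    \;=\; f^*_x \;+\; \sum_{\substack{y \neq x \\ h_j(y) = h_j(x)}} f^*_y
    \;\ge\; f^*_x
  \qquad \text{for every } j = 1, \dots, r,
\]
\[
  \text{and therefore}\quad
  f_x \;=\; \min_{1 \le j \le r} CMS[j, h_j(x)] \;\ge\; f^*_x .
\]
```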

  14. Assume - Proof comes later
      Claim: Let $b = 2/\epsilon$. Then $\Pr[f_x - f^*_x \ge \epsilon n] \le 1/2^r$.
      Corollary: With probability at least $1 - 1/2^r$, $f^*_x \le f_x \le f^*_x + \epsilon n$.

  15. Reporting Frequent Elements
      Suppose we want to report all the elements of $A$ that occur approximately $n/k$ times or more, for some integer $k$.
      In the Claim, set $\epsilon = 1/(3k)$. Then $b = 2/\epsilon = 6k$.
      Construct a CMS table of size $br = 6kr$.
      Scan $A$ and compute the entries in the CMS table.
      Maintain a set of $O(k)$ items that occur most frequently among all the elements of $A$ scanned so far.
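
As a small worked example of this parameter choice, the helper below computes the table dimensions for a given $k$. The parameter `delta` is an assumption introduced here and does not appear in the slides: it is a target failure probability for a single estimate, and $r$ is chosen as the smallest integer with $1/2^r \le$ `delta`, following the Corollary.

```python
import math

def cms_dimensions(k, delta):
    """Return (b, r) for reporting elements that occur about n/k times.

    b follows the slide: epsilon = 1/(3k), so b = 2/epsilon = 6k.
    r is picked so that the failure probability 1/2**r is at most delta
    (delta is a hypothetical parameter, not part of the slides).
    """
    b = 6 * k
    r = math.ceil(math.log2(1 / delta))
    return b, r

# Elements occurring at least 2% of the time correspond to k = 50.
print(cms_dimensions(50, 0.01))  # (300, 7)
```

With these values the table has $br = 2100$ counters, whatever the length of the stream.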

  16. Heap Data Structure
      The items are stored in a heap with the $f_x$ values as keys.
      What is a heap? An array that stores $n$ elements and supports:
      Find Min or Max: Report the element with the smallest/largest key value in the heap in $O(1)$ time.
      Insert($x, k$): Insert element $x$ with key $k$ into the heap in $O(\log n)$ time.
      Delete($x$): Delete element $x$ from the heap in $O(\log n)$ time.
      ...

  17. Reporting Frequent Elements contd.
      Assume we have scanned $i - 1$ items and have updated the CMS table and the heap.
      Consider the $i$-th item (say $x = A[i]$) and perform the following:
      (1) For $j = 1$ to $r$: update the CMS table by executing CMS[j, h_j(x)] ← CMS[j, h_j(x)] + 1.
      (2) Let $f_x = \min\{CMS[1, h_1(x)], \dots, CMS[r, h_r(x)]\}$. If $f_x \ge i/k$, do:
          (a) If $x \in$ heap, delete $x$ and re-insert it with the updated $f_x$ value.
          (b) If $x \notin$ heap, insert it into the heap and remove all the elements whose count is less than $i/k$.
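
The per-item procedure above can be sketched in Python with the standard `heapq` module. Several details are assumptions made for this sketch and are not taken from the slides: stream items are non-negative integers, the hash functions have the form ((a·x + c) mod p) mod b, and "delete and re-insert" is done lazily, by pushing a fresh pair and skipping outdated pairs when they reach the top of the heap, because `heapq` has no delete-by-element operation.

```python
import heapq
import random

PRIME = 2_147_483_647  # 2**31 - 1; assumed to exceed every stream item

def frequent_elements(stream, k, r=5, seed=0):
    """One pass over the stream; returns {element: estimate} for the candidates."""
    b = 6 * k  # epsilon = 1/(3k), so b = 2/epsilon = 6k
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(r)]
    cms = [[0] * b for _ in range(r)]
    estimate_of = {}  # elements currently in the heap -> their f_x
    heap = []         # min-heap of (f_x, element); may hold outdated pairs

    for i, x in enumerate(stream, start=1):
        # Step 1: update one counter per row.
        cells = [((a * x + c) % PRIME) % b for a, c in coeffs]
        for j in range(r):
            cms[j][cells[j]] += 1
        # Step 2: the estimate is the minimum over the r rows.
        f_x = min(cms[j][cells[j]] for j in range(r))
        if f_x >= i / k:
            is_new = x not in estimate_of
            estimate_of[x] = f_x            # older pairs for x are now outdated
            heapq.heappush(heap, (f_x, x))
            if is_new:
                # Remove all elements whose count is less than i/k.
                while heap and heap[0][0] < i / k:
                    f, y = heapq.heappop(heap)
                    if estimate_of.get(y) == f:  # ignore outdated pairs
                        del estimate_of[y]
    return estimate_of

stream = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
print(frequent_elements(stream, k=2))
```

Because estimates never underestimate, the majority element 2 is always among the reported candidates here; other elements may also appear, depending on hash collisions and on when the last pruning happened.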

  18. Reporting Frequent Elements contd.
      Claim [Cormode and Muthukrishnan 2005]: Elements that occur approx. $n/k$ times in a data stream of size $n$ can be reported in $O(kr + nr + n \log k)$ time using $O(kr)$ space with high probability.
      Proof. Recall the Corollary: $f^*_x \le f_x \le f^*_x + \epsilon n = f^*_x + n/(3k)$.
      This implies that the heap contains elements whose frequency is at least $n/k - n/(3k) = 2n/(3k) \approx 0.667\, n/k$ (with high probability).
      Size of heap = $O(k)$.
      Time complexity: $O(br + nr + n \log k) = O(kr + nr + n \log k)$, as $b = 2/\epsilon = 6k$.
      Total space = $O(br + k) = O(kr)$.

  19. Markov’s Inequality
      Theorem: Let $X$ be a non-negative discrete random variable and $s > 0$ be a constant. Then $P(X \ge s) \le E[X]/s$.
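
For reference, the standard argument behind this inequality takes one line. It is the textbook proof for a discrete variable and is not taken from the slides: terms with value below $s$ are dropped from the expectation, and each remaining value is bounded from below by $s$.

```latex
\[
  E[X] \;=\; \sum_{x} x \, P(X = x)
       \;\ge\; \sum_{x \ge s} x \, P(X = x)
       \;\ge\; s \sum_{x \ge s} P(X = x)
       \;=\; s \, P(X \ge s).
\]
```

Dividing both sides by $s > 0$ gives $P(X \ge s) \le E[X]/s$. The first inequality uses the fact that $X$ is non-negative.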
