Frequency Moments and Counting Distinct Elements (Lecture 05)

  1. CS 498ABD: Algorithms for Big Data. Frequency Moments and Counting Distinct Elements. Lecture 05, September 8, 2020. Chandra (UIUC), Fall 2020.

  2. Part I: Frequency Moments.

  3. Streaming model. The input consists of m objects/items/tokens e_1, e_2, ..., e_m that are seen one by one by the algorithm. The algorithm has "limited" memory, say for B tokens where B < m (often B ≪ m), and hence cannot store all of the input. We want to compute interesting functions over the input. Examples:
     Each token is a number from [n]
     High-speed network switch: tokens are packets with source and destination IP addresses and message contents
     Each token is an edge in a graph (graph streams)
     Each token is a point in some feature space
     Each token is a row/column of a matrix
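To make the model concrete, here is a minimal sketch of the one-pass interface (my own illustration, not code from the lecture): the algorithm sees tokens one at a time, keeps only a small state, and answers a query at the end. The StreamLength class below is a hypothetical toy that only tracks the stream length.

    # Minimal sketch of the one-pass streaming interface (illustration only).
    class StreamLength:
        """Toy streaming algorithm: O(log m) bits of state, tracks the stream length."""

        def __init__(self):
            self.count = 0                       # the entire memory of the algorithm

        def process(self, token):
            # called once per token e_1, e_2, ..., e_m, in arrival order
            self.count += 1

        def query(self):
            return self.count

    alg = StreamLength()
    for token in [4, 2, 4, 1, 1, 1, 4, 5]:       # the stream is read once, left to right
        alg.process(token)
    print(alg.query())                           # 8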

  4-5. Frequency Moment Problem(s). A fundamental class of problems, formally introduced in the seminal paper of Alon, Matias, and Szegedy titled "The Space Complexity of Approximating the Frequency Moments" (1999). Stream consists of e_1, e_2, ..., e_m where each e_i is an integer in [n]; we know n in advance (or an upper bound). Example: n = 5 and the stream is 4, 2, 4, 1, 1, 1, 4, 5.

  6. Frequency Moments. Stream consists of e_1, e_2, ..., e_m where each e_i is an integer in [n]; we know n in advance (or an upper bound). Given a stream, let f_i denote the frequency of i, i.e., the number of times i is seen in the stream, and consider the vector f = (f_1, f_2, ..., f_n). For k ≥ 0, the k'th frequency moment is F_k = Σ_i f_i^k. We can also consider the ℓ_k norm of f, which is (F_k)^{1/k}. Example: n = 5 and the stream is 4, 2, 4, 1, 1, 1, 4, 5, so m = 8 and f_1 = 3, f_2 = 1, f_3 = 0, f_4 = 3, f_5 = 1.
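As a quick sanity check on these definitions (my own illustration, not part of the slides), the frequency vector of the example stream can be computed directly:

    from collections import Counter

    stream = [4, 2, 4, 1, 1, 1, 4, 5]        # the example from the slide, with n = 5
    n = 5

    counts = Counter(stream)                 # counts[i] = number of occurrences of i
    f = [counts.get(i, 0) for i in range(1, n + 1)]

    print(f)          # [3, 1, 0, 3, 1], i.e. f_1 = 3, f_2 = 1, f_3 = 0, f_4 = 3, f_5 = 1
    print(sum(f))     # 8 = m, the length of the stream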

  7-13. Frequency Moments. Stream consists of e_1, e_2, ..., e_m where each e_i is an integer in [n]; we know n in advance (or an upper bound). Let f_i denote the frequency of i in the stream, consider the vector f = (f_1, f_2, ..., f_n), and for k ≥ 0 let F_k = Σ_i f_i^k be the k'th frequency moment. Important cases/regimes:
     k = 0: F_0 is simply the number of distinct elements in the stream
     k = 1: F_1 is the length of the stream, which is easy
     k = 2: F_2 is fundamental in many ways, as we will see
     k = ∞: F_∞ is the maximum frequency (the heavy hitters problem)
     0 < k < 1 and 1 < k < 2
     2 < k < ∞
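Continuing the same illustrative example, the special cases listed above can be read off the frequency vector. This is just a sketch to check the definitions, not an algorithm from the lecture:

    from collections import Counter

    stream = [4, 2, 4, 1, 1, 1, 4, 5]
    f = Counter(stream)                        # nonzero entries of the frequency vector

    def F(k):
        # k'th frequency moment F_k = sum over appearing items i of f_i^k
        return sum(fi ** k for fi in f.values())

    print(F(0))              # 4: distinct elements (each appearing item contributes f_i^0 = 1)
    print(F(1))              # 8: length of the stream
    print(F(2))              # 20 = 3^2 + 1^2 + 3^2 + 1^2
    print(max(f.values()))   # 3: maximum frequency, i.e. F_infinity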

  14-17. Frequency Moments: Questions.
     Estimation: Given a stream and k, can we estimate F_k exactly or approximately with small memory?
     Sampling: Given a stream and k, can we sample an item i in proportion to f_i^k?
     Sketching: Given a stream and k, can we create a sketch/summary of small size?
     These questions are easy with memory Ω(n): store f explicitly. They become interesting when memory is ≪ n. Ideally we want log^c n memory for some fixed c ≥ 1 (polylog(n)). Note that log n is roughly the memory required to store one token/number.
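To see why all three questions are easy with Ω(n) memory, here is a sketch (my own, assuming we can afford one counter per item that appears) that stores f explicitly and answers estimation, sampling, and sketching exactly:

    import random
    from collections import Counter

    class ExactFrequencies:
        """Baseline with Theta(n) memory in the worst case: store f explicitly."""

        def __init__(self):
            self.f = Counter()

        def process(self, token):
            self.f[token] += 1

        def moment(self, k):
            # Estimation: F_k computed exactly from the stored frequencies
            return sum(fi ** k for fi in self.f.values())

        def sample(self, k):
            # Sampling: return item i with probability f_i^k / F_k
            items = list(self.f)
            weights = [self.f[i] ** k for i in items]
            return random.choices(items, weights=weights, k=1)[0]

        def sketch(self):
            # "Sketch": the whole frequency vector, which can be as large as n
            return dict(self.f)

    alg = ExactFrequencies()
    for token in [4, 2, 4, 1, 1, 1, 4, 5]:
        alg.process(token)
    print(alg.moment(2))   # 20
    print(alg.sample(2))   # 4 or 1 with probability 9/20 each, 2 or 5 with probability 1/20 each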

  18-20. Need for approximation and randomization. For most of the interesting problems there is an Ω(n) lower bound on memory if one wants an exact answer or a deterministic algorithm. Hence we focus on (1 ± ε)-approximation or constant-factor approximation, and on randomized algorithms.

  21. Relative approximation. Let g(σ) be a real-valued non-negative function over streams σ. Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) relative approximation for g if for all σ: Pr[ |A(σ)/g(σ) - 1| > α ] ≤ δ. Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).
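The guarantee in this definition can be checked empirically by independent repetitions. The sketch below is purely illustrative: the noisy estimator is a made-up stand-in for a randomized streaming algorithm A, and g is a fixed true value assumed to be known.

    import random

    def relative_failure_rate(estimator, truth, alpha, trials=10000):
        # empirical estimate of Pr[ |A/g - 1| > alpha ] over independent runs
        failures = sum(abs(estimator() / truth - 1.0) > alpha for _ in range(trials))
        return failures / trials

    g = 100.0                                       # true value g(sigma), assumed known here
    noisy = lambda: g * random.uniform(0.9, 1.1)    # hypothetical randomized estimator A(sigma)

    print(relative_failure_rate(noisy, g, alpha=0.05))   # about 0.5, so not a (0.05, 0.1)-approximation
    print(relative_failure_rate(noisy, g, alpha=0.15))   # 0.0, so a (0.15, delta)-approximation for any delta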

  22. Additive approximation. Let g(σ) be a real-valued function over streams σ. If g(σ) can be negative, we focus on additive approximation. Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) additive approximation for g if for all σ: Pr[ |A(σ) - g(σ)| > α ] ≤ δ. When working with additive approximations, some normalization/scaling is typically necessary. Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).

  23. Part II: Estimating Distinct Elements.
