Frequency Moments and Counting Distinct Elements (Lecture 05)



SLIDE 1

CS 498ABD: Algorithms for Big Data

Frequency moments and Counting Distinct Elements

Lecture 05

September 8, 2020

Chandra (UIUC) CS498ABD 1 Fall 2020 1 / 28

SLIDE 2

Part I: Frequency Moments

SLIDE 3

Streaming model

The input consists of m objects/items/tokens e1, e2, ..., em that are seen one by one by the algorithm. The algorithm has "limited" memory, say for B tokens where B < m (often B ≪ m), and hence cannot store all of the input. We want to compute interesting functions over the input.

Examples:
- Each token is a number from [n]
- High-speed network switch: tokens are packets with source and destination IP addresses and message contents
- Each token is an edge in a graph (graph streams)
- Each token is a point in some feature space
- Each token is a row/column of a matrix

SLIDE 5

Frequency Moment Problem(s)

A fundamental class of problems. Formally introduced in the seminal paper of Alon, Matias, and Szegedy titled "The Space Complexity of Approximating the Frequency Moments" (1999).

The stream consists of e1, e2, ..., em where each ei is an integer in [n]. We know n in advance (or an upper bound).

Example: n = 5 and stream is 4, 2, 4, 1, 1, 1, 4, 5


SLIDE 6

Frequency Moments

The stream consists of e1, e2, ..., em where each ei is an integer in [n]. We know n in advance (or an upper bound).

Given a stream, let fi denote the frequency of i, i.e., the number of times i is seen in the stream. Consider the vector f = (f1, f2, ..., fn). For k ≥ 0, the k-th frequency moment is F_k = Σ_i f_i^k. We can also consider the ℓ_k norm of f, which is (F_k)^{1/k}.

Example: n = 5 and stream is 4, 2, 4, 1, 1, 1, 4, 5

(Handwritten annotation on the example: m = 8; f_1 = 3, f_2 = 1, f_4 = 3, f_5 = 1.)


SLIDE 13

Frequency Moments

The stream consists of e1, e2, ..., em where each ei is an integer in [n]. We know n in advance (or an upper bound).

Given a stream, let fi denote the frequency of i, i.e., the number of times i is seen in the stream. Consider the vector f = (f1, f2, ..., fn). For k ≥ 0, the k-th frequency moment is F_k = Σ_i f_i^k.

Important cases/regimes:
- k = 0: F_0 is simply the number of distinct elements in the stream
- k = 1: F_1 is the length of the stream, which is easy
- k = 2: F_2 is fundamental in many ways, as we will see
- k = ∞: F_∞ is the maximum frequency (heavy hitters problem)
- 0 < k < 1 and 1 < k < 2
- 2 < k < ∞
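These definitions can be checked offline on the example stream from the slides; a minimal Python sketch (the function name is my own, and of course a streaming algorithm cannot afford to build the full frequency table like this):

```python
from collections import Counter

def frequency_moment(stream, k):
    """Compute F_k = sum_i f_i^k from the frequency vector of the stream.
    Offline reference computation: builds the whole table f explicitly."""
    freq = Counter(stream)
    if k == float("inf"):
        return max(freq.values())     # F_inf: the maximum frequency
    return sum(f ** k for f in freq.values())

# The example from the slides: n = 5, stream = 4, 2, 4, 1, 1, 1, 4, 5
stream = [4, 2, 4, 1, 1, 1, 4, 5]
print(frequency_moment(stream, 0))            # 4 distinct elements
print(frequency_moment(stream, 1))            # 8, the stream length
print(frequency_moment(stream, 2))            # 3^2 + 1 + 3^2 + 1 = 20
print(frequency_moment(stream, float("inf"))) # 3, the max frequency
```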


SLIDE 17

Frequency Moments: Questions

Estimation: Given a stream and k, can we estimate F_k exactly/approximately with small memory?

Sampling: Given a stream and k, can we sample an item i in proportion to f_i^k?

Sketching: Given a stream and k, can we create a sketch/summary of small size?

These questions are easy if we have memory Ω(n): store f explicitly. They are interesting when memory is ≪ n. Ideally we want to do it with log^c n memory for some fixed c ≥ 1 (polylog(n)). Note that log n is roughly the memory required to store one token/number.


SLIDE 19

Need for approximation and randomization

For most of the interesting problems there is an Ω(n) lower bound on memory if one wants exact answers or deterministic algorithms. Hence we focus on (1 ± ε)-approximations or constant-factor approximations, and on randomized algorithms.


SLIDE 21

Relative approximation

Let g(σ) be a real-valued non-negative function over streams σ.

Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) relative approximation for a real-valued function g if for all σ:

    Pr[ |A(σ)/g(σ) − 1| > α ] ≤ δ.

Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).

SLIDE 22

Additive approximation

Let g(σ) be a real-valued function over streams σ. If g(σ) can be negative, we focus on additive approximation.

Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) additive approximation for a real-valued function g if for all σ:

    Pr[ |A(σ) − g(σ)| > α ] ≤ δ.

When working with additive approximations, some normalization/scaling is typically necessary. Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).

SLIDE 23

Part II: Estimating Distinct Elements

SLIDE 26

Distinct Elements

Given a stream, how many distinct elements did we see? Example: in a network switch, during some time window, how many distinct destination (or source) IP addresses were seen in the packets?

Offline solution? Via a dictionary data structure.

SLIDE 29

Offline Solution

DistinctElements:
    Initialize dictionary D to be empty
    k ← 0
    While (stream is not empty) do
        Let e be next item in stream
        If (e ∉ D) then
            Insert e into D
            k ← k + 1
    EndWhile
    Output k

Which dictionary data structure?
- Binary search trees: space O(k) and total time O(m log k)
- Hashing: space O(k) and expected time O(m)
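The pseudocode above translates directly to Python, where the hashing-based dictionary is a built-in set; a minimal sketch:

```python
def distinct_elements(stream):
    """Offline exact count: store every distinct item in a hash-based set.
    Space O(k) for k distinct items; expected total time O(m)."""
    seen = set()           # hashing-based dictionary D
    k = 0
    for e in stream:
        if e not in seen:  # membership test, expected O(1)
            seen.add(e)
            k += 1
    return k

print(distinct_elements([4, 2, 4, 1, 1, 1, 4, 5]))  # 4
```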

SLIDE 33

Hashing based idea

Use a hash function h : [n] → [N] for some N polynomial in n. Store only the minimum hash value seen, that is, min_i h(e_i). This needs only O(log n) bits since hash values are in the range [N].

Question: why is this good? Assume an idealized hash function h : [n] → [0, 1] that is fully random over the real interval. Suppose there are k distinct elements in the stream. What is the expected value of the minimum of the hash values?


(Handwritten derivation: E[Y] = ∫₀¹ Pr[Y > t] dt = ∫₀¹ (1 − t)^k dt = 1/(k+1).)
SLIDE 35

Analyzing the idealized hash function

Lemma: Suppose X1, X2, ..., Xk are independent random variables, uniformly distributed in [0, 1], and let Y = min_i X_i. Then E[Y] = 1/(k+1).

DistinctElements:
    Assume ideal hash function h : [n] → [0, 1]
    y ← 1
    While (stream is not empty) do
        Let e be next item in stream
        y ← min(y, h(e))
    EndWhile
    Output 1/y − 1
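A runnable sketch of this basic estimator, simulating the ideal hash h : [n] → [0, 1] by lazily assigning each distinct item an independent uniform value (the memoized-dictionary simulation is my own device for illustration; a real streaming algorithm cannot store such a table):

```python
import random

def min_hash_estimate(stream, seed=0):
    """Basic estimator from the slides: track y = min hash value seen,
    output 1/y - 1. The ideal hash is simulated by memoizing an
    independent Uniform[0,1] draw per distinct item."""
    rng = random.Random(seed)
    h = {}    # memoize so repeated items get the same hash value
    y = 1.0
    for e in stream:
        if e not in h:
            h[e] = rng.random()
        y = min(y, h[e])
    return 1.0 / y - 1.0

# The estimate depends only on the set of distinct items, not on repeats,
# and a single run is noisy: E[y] = 1/(k+1), but 1/y - 1 has high variance.
est = min_hash_estimate([4, 2, 4, 1, 1, 1, 4, 5], seed=1)
print(est)
```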


SLIDE 37

Analyzing the idealized hash function

Lemma: Suppose X1, X2, ..., Xk are independent random variables, uniformly distributed in [0, 1], and let Y = min_i X_i. Then E[Y] = 1/(k+1).

Lemma: With the same X1, ..., Xk and Y = min_i X_i,

    E[Y^2] = 2/((k+1)(k+2))  and  Var(Y) = k/((k+1)^2 (k+2)) ≤ 1/(k+1)^2.
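Both lemmas are easy to check by simulation; a small Monte Carlo sketch (the choices k = 4 and 200,000 trials are illustrative):

```python
import random

# Monte Carlo check: for Y = min of k iid Uniform[0,1] variables,
# E[Y] = 1/(k+1) and E[Y^2] = 2/((k+1)(k+2)).
rng = random.Random(42)
k, trials = 4, 200_000
ys = [min(rng.random() for _ in range(k)) for _ in range(trials)]
mean = sum(ys) / trials
mean_sq = sum(y * y for y in ys) / trials
print(mean)     # should be close to 1/5 = 0.2
print(mean_sq)  # should be close to 2/30 = 0.0667
```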


SLIDE 38

Analyzing the idealized hash function

Apply the standard methodology to go from an exact statistical estimator to good bounds:
- Average h parallel and independent estimates to reduce variance.
- Apply Chebyshev to show that the averaged estimator is a (1 + ε)-approximation with constant probability.
- Use the preceding step and the median trick with O(log(1/δ)) parallel copies to obtain a (1 + ε)-approximation with probability 1 − δ.


SLIDE 43

Averaging and reducing variance

1. Run the basic estimator independently and in parallel h times to obtain X1, X2, ..., Xh.
2. Let Z = (1/h) Σ_i X_i.
3. Output 1/Z − 1.

Claim: E[Z] = 1/(k+1) and Var(Z) ≤ (1/h) · 1/(k+1)^2.

Choosing h = 1/(η ε^2) and using Chebyshev:

    Pr[ |Z − 1/(k+1)| ≥ ε/(k+1) ] ≤ η.

Hence Pr[ |(1/Z − 1) − k| ≥ O(ε) k ] ≤ η.

Repeat O(log(1/δ)) times and output the median. Error probability < δ.
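Putting the averaging and median steps together, a runnable sketch under the idealized-hash assumption (the concrete parameter values h = 64 and 7 copies are illustrative, not the slides' exact h = 1/(η ε^2) and O(log(1/δ))):

```python
import random
import statistics

def estimate_distinct(stream, h=64, copies=7, seed=0):
    """Slides' recipe: run the basic min-hash estimator h times in
    parallel, average the minima into Z to reduce variance, output
    1/Z - 1; repeat `copies` times and take the median."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(copies):
        hashes = [{} for _ in range(h)]  # h independent simulated ideal hashes
        mins = [1.0] * h
        for e in stream:
            for j in range(h):
                if e not in hashes[j]:
                    hashes[j][e] = rng.random()
                mins[j] = min(mins[j], hashes[j][e])
        Z = sum(mins) / h                # averaged estimator, E[Z] = 1/(k+1)
        estimates.append(1.0 / Z - 1.0)
    return statistics.median(estimates)

# 100 distinct elements, each repeated three times; the output is a
# randomized estimate that concentrates around 100.
print(estimate_distinct(list(range(100)) * 3))
```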


SLIDE 44

Algorithm via regular hashing

We do not have an idealized hash function. Instead:
- Use h : [n] → [N] for an appropriate choice of N.
- Use a pairwise independent hash family H, so that a random h ∈ H can be stored in small space, and computation can be done in small memory and fast.

There are several variants of this idea, with different trade-offs between memory, time to process each new element of the stream, approximation quality, and probability of success.
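The slides leave the concrete family unspecified; a standard choice is the affine family h(x) = ((ax + b) mod p) mod N over a prime field, which is pairwise independent up to a small bias introduced by the outer mod. A sketch with illustrative parameter values:

```python
import random

def make_pairwise_hash(p, N, rng):
    """Draw h(x) = ((a*x + b) mod p) mod N from the affine hash family.
    Storing h takes only the pair (a, b): O(log n) bits when p is
    polynomial in n."""
    a = rng.randrange(1, p)  # a != 0
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % N

rng = random.Random(7)
p = 2_147_483_647   # a Mersenne prime; must exceed the universe size n
N = 10_000_000      # hash range [N], polynomial in n
h = make_pairwise_hash(p, N, rng)
print(h(42))        # deterministic for a fixed (a, b)
```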