Frequency Moments and Counting Distinct Elements (Lecture 05)



SLIDE 1

CS 498ABD: Algorithms for Big Data

Frequency moments and Counting Distinct Elements

Lecture 05

September 8, 2020

Chandra (UIUC) CS498ABD 1 Fall 2020 1 / 28

SLIDE 2

Part I: Frequency Moments

SLIDE 3

Streaming model

The input consists of m objects/items/tokens e1, e2, ..., em that are seen one by one by the algorithm. The algorithm has "limited" memory, say for B tokens where B < m (often B ≪ m), and hence cannot store all of the input. We want to compute interesting functions over the input.

Examples:
- Each token is a number from [n]
- High-speed network switch: tokens are packets with source and destination IP addresses and message contents
- Each token is an edge in a graph (graph streams)
- Each token is a point in some feature space
- Each token is a row/column of a matrix

SLIDE 5

Frequency Moment Problem(s)

A fundamental class of problems. Formally introduced in the seminal paper of Alon, Matias, and Szegedy titled "The Space Complexity of Approximating the Frequency Moments" (1999).

The stream consists of e1, e2, ..., em where each ei is an integer in [n]. We know n in advance (or an upper bound).

Example: n = 5 and stream is 4, 2, 4, 1, 1, 1, 4, 5


SLIDE 6

Frequency Moments

The stream consists of e1, e2, ..., em where each ei is an integer in [n]. We know n in advance (or an upper bound).

Given a stream, let fi denote the frequency of i, i.e., the number of times i is seen in the stream. Consider the vector f = (f1, f2, ..., fn). For k ≥ 0, the k-th frequency moment is F_k = Σ_i f_i^k. We can also consider the ℓ_k norm of f, which is (F_k)^{1/k}.

Example: n = 5 and stream is 4, 2, 4, 1, 1, 1, 4, 5

(Handwritten annotation on the example: m = 8; f_1 = 3, f_2 = 1, f_4 = 3, f_5 = 1.)


SLIDE 13

Frequency Moments

The stream consists of e1, e2, ..., em where each ei is an integer in [n]. We know n in advance (or an upper bound).

Given a stream, let fi denote the frequency of i, i.e., the number of times i is seen in the stream. Consider the vector f = (f1, f2, ..., fn). For k ≥ 0, the k-th frequency moment is F_k = Σ_i f_i^k.

Important cases/regimes:
- k = 0: F_0 is simply the number of distinct elements in the stream
- k = 1: F_1 is the length of the stream, which is easy
- k = 2: F_2 is fundamental in many ways, as we will see
- k = ∞: F_∞ is the maximum frequency (heavy hitters problem)
- 0 < k < 1 and 1 < k < 2
- 2 < k < ∞
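These definitions can be checked offline on the example stream from the slides; a minimal Python sketch (the function name is my own, and of course a streaming algorithm cannot afford to build the full frequency table like this):

```python
from collections import Counter

def frequency_moment(stream, k):
    """Compute F_k = sum_i f_i^k from the frequency vector of the stream.
    Offline reference computation: builds the whole table f explicitly."""
    freq = Counter(stream)
    if k == float("inf"):
        return max(freq.values())     # F_inf: the maximum frequency
    return sum(f ** k for f in freq.values())

# The example from the slides: n = 5, stream = 4, 2, 4, 1, 1, 1, 4, 5
stream = [4, 2, 4, 1, 1, 1, 4, 5]
print(frequency_moment(stream, 0))            # 4 distinct elements
print(frequency_moment(stream, 1))            # 8, the stream length
print(frequency_moment(stream, 2))            # 3^2 + 1 + 3^2 + 1 = 20
print(frequency_moment(stream, float("inf"))) # 3, the max frequency
```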


SLIDE 17

Frequency Moments: Questions

Estimation: Given a stream and k, can we estimate F_k exactly/approximately with small memory?

Sampling: Given a stream and k, can we sample an item i in proportion to f_i^k?

Sketching: Given a stream and k, can we create a sketch/summary of small size?

These questions are easy if we have memory Ω(n): store f explicitly. They are interesting when memory is ≪ n. Ideally we want to do it with log^c n memory for some fixed c ≥ 1 (polylog(n)). Note that log n is roughly the memory required to store one token/number.


SLIDE 19

Need for approximation and randomization

For most of the interesting problems there is an Ω(n) lower bound on memory if one wants exact answers or deterministic algorithms. Hence we focus on (1 ± ε)-approximations or constant-factor approximations, and on randomized algorithms.


SLIDE 21

Relative approximation

Let g(σ) be a real-valued non-negative function over streams σ.

Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) relative approximation for a real-valued function g if for all σ:

    Pr[ |A(σ)/g(σ) − 1| > α ] ≤ δ.

Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).

SLIDE 22

Additive approximation

Let g(σ) be a real-valued function over streams σ. If g(σ) can be negative, we focus on additive approximation.

Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) additive approximation for a real-valued function g if for all σ:

    Pr[ |A(σ) − g(σ)| > α ] ≤ δ.

When working with additive approximations, some normalization/scaling is typically necessary. Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).

SLIDE 23

Part II: Estimating Distinct Elements

SLIDE 26

Distinct Elements

Given a stream, how many distinct elements did we see? Example: in a network switch, during some time window, how many distinct destination (or source) IP addresses were seen in the packets?

Offline solution? Via a dictionary data structure.

SLIDE 29

Offline Solution

DistinctElements:
    Initialize dictionary D to be empty
    k ← 0
    While (stream is not empty) do
        Let e be next item in stream
        If (e ∉ D) then
            Insert e into D
            k ← k + 1
    EndWhile
    Output k

Which dictionary data structure?
- Binary search trees: space O(k) and total time O(m log k)
- Hashing: space O(k) and expected time O(m)
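The pseudocode above translates directly to Python, where the hashing-based dictionary is a built-in set; a minimal sketch:

```python
def distinct_elements(stream):
    """Offline exact count: store every distinct item in a hash-based set.
    Space O(k) for k distinct items; expected total time O(m)."""
    seen = set()           # hashing-based dictionary D
    k = 0
    for e in stream:
        if e not in seen:  # membership test, expected O(1)
            seen.add(e)
            k += 1
    return k

print(distinct_elements([4, 2, 4, 1, 1, 1, 4, 5]))  # 4
```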

SLIDE 33

Hashing based idea

Use a hash function h : [n] → [N] for some N polynomial in n. Store only the minimum hash value seen, that is, min_i h(e_i). This needs only O(log n) bits since hash values are in the range [N].

Question: why is this good? Assume an idealized hash function h : [n] → [0, 1] that is fully random over the real interval. Suppose there are k distinct elements in the stream. What is the expected value of the minimum of the hash values?


(Handwritten derivation: E[Y] = ∫₀¹ Pr[Y > t] dt = ∫₀¹ (1 − t)^k dt = 1/(k+1).)
SLIDE 35

Analyzing the idealized hash function

Lemma: Suppose X1, X2, ..., Xk are independent random variables, uniformly distributed in [0, 1], and let Y = min_i X_i. Then E[Y] = 1/(k+1).

DistinctElements:
    Assume ideal hash function h : [n] → [0, 1]
    y ← 1
    While (stream is not empty) do
        Let e be next item in stream
        y ← min(y, h(e))
    EndWhile
    Output 1/y − 1
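A runnable sketch of this basic estimator, simulating the ideal hash h : [n] → [0, 1] by lazily assigning each distinct item an independent uniform value (the memoized-dictionary simulation is my own device for illustration; a real streaming algorithm cannot store such a table):

```python
import random

def min_hash_estimate(stream, seed=0):
    """Basic estimator from the slides: track y = min hash value seen,
    output 1/y - 1. The ideal hash is simulated by memoizing an
    independent Uniform[0,1] draw per distinct item."""
    rng = random.Random(seed)
    h = {}    # memoize so repeated items get the same hash value
    y = 1.0
    for e in stream:
        if e not in h:
            h[e] = rng.random()
        y = min(y, h[e])
    return 1.0 / y - 1.0

# The estimate depends only on the set of distinct items, not on repeats,
# and a single run is noisy: E[y] = 1/(k+1), but 1/y - 1 has high variance.
est = min_hash_estimate([4, 2, 4, 1, 1, 1, 4, 5], seed=1)
print(est)
```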


SLIDE 37

Analyzing the idealized hash function

Lemma: Suppose X1, X2, ..., Xk are independent random variables, uniformly distributed in [0, 1], and let Y = min_i X_i. Then E[Y] = 1/(k+1).

Lemma: With the same X1, ..., Xk and Y = min_i X_i,

    E[Y^2] = 2/((k+1)(k+2))  and  Var(Y) = k/((k+1)^2 (k+2)) ≤ 1/(k+1)^2.
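Both lemmas are easy to check by simulation; a small Monte Carlo sketch (the choices k = 4 and 200,000 trials are illustrative):

```python
import random

# Monte Carlo check: for Y = min of k iid Uniform[0,1] variables,
# E[Y] = 1/(k+1) and E[Y^2] = 2/((k+1)(k+2)).
rng = random.Random(42)
k, trials = 4, 200_000
ys = [min(rng.random() for _ in range(k)) for _ in range(trials)]
mean = sum(ys) / trials
mean_sq = sum(y * y for y in ys) / trials
print(mean)     # should be close to 1/5 = 0.2
print(mean_sq)  # should be close to 2/30 = 0.0667
```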


SLIDE 38

Analyzing the idealized hash function

Apply the standard methodology to go from an exact statistical estimator to good bounds:
- Average h parallel and independent estimates to reduce variance.
- Apply Chebyshev to show that the averaged estimator is a (1 + ε)-approximation with constant probability.
- Use the preceding step and the median trick with O(log(1/δ)) parallel copies to obtain a (1 + ε)-approximation with probability 1 − δ.


SLIDE 43

Averaging and reducing variance

1. Run the basic estimator independently and in parallel h times to obtain X1, X2, ..., Xh.
2. Let Z = (1/h) Σ_i X_i.
3. Output 1/Z − 1.

Claim: E[Z] = 1/(k+1) and Var(Z) ≤ (1/h) · 1/(k+1)^2.

Choosing h = 1/(η ε^2) and using Chebyshev:

    Pr[ |Z − 1/(k+1)| ≥ ε/(k+1) ] ≤ η.

Hence Pr[ |(1/Z − 1) − k| ≥ O(ε) k ] ≤ η.

Repeat O(log(1/δ)) times and output the median. Error probability < δ.
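Putting the averaging and median steps together, a runnable sketch under the idealized-hash assumption (the concrete parameter values h = 64 and 7 copies are illustrative, not the slides' exact h = 1/(η ε^2) and O(log(1/δ))):

```python
import random
import statistics

def estimate_distinct(stream, h=64, copies=7, seed=0):
    """Slides' recipe: run the basic min-hash estimator h times in
    parallel, average the minima into Z to reduce variance, output
    1/Z - 1; repeat `copies` times and take the median."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(copies):
        hashes = [{} for _ in range(h)]  # h independent simulated ideal hashes
        mins = [1.0] * h
        for e in stream:
            for j in range(h):
                if e not in hashes[j]:
                    hashes[j][e] = rng.random()
                mins[j] = min(mins[j], hashes[j][e])
        Z = sum(mins) / h                # averaged estimator, E[Z] = 1/(k+1)
        estimates.append(1.0 / Z - 1.0)
    return statistics.median(estimates)

# 100 distinct elements, each repeated three times; the output is a
# randomized estimate that concentrates around 100.
print(estimate_distinct(list(range(100)) * 3))
```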


SLIDE 44

Algorithm via regular hashing

We do not have an idealized hash function. Instead:
- Use h : [n] → [N] for an appropriate choice of N.
- Use a pairwise independent hash family H, so that a random h ∈ H can be stored in small space, and computation can be done in small memory and fast.

There are several variants of this idea, with different trade-offs between memory, time to process each new element of the stream, approximation quality, and probability of success.
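The slides leave the concrete family unspecified; a standard choice is the affine family h(x) = ((ax + b) mod p) mod N over a prime field, which is pairwise independent up to a small bias introduced by the outer mod. A sketch with illustrative parameter values:

```python
import random

def make_pairwise_hash(p, N, rng):
    """Draw h(x) = ((a*x + b) mod p) mod N from the affine hash family.
    Storing h takes only the pair (a, b): O(log n) bits when p is
    polynomial in n."""
    a = rng.randrange(1, p)  # a != 0
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % N

rng = random.Random(7)
p = 2_147_483_647   # a Mersenne prime; must exceed the universe size n
N = 10_000_000      # hash range [N], polynomial in n
h = make_pairwise_hash(p, N, rng)
print(h(42))        # deterministic for a fixed (a, b)
```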