
CS 473: Algorithms

Chandra Chekuri Ruta Mehta

University of Illinois, Urbana-Champaign

Fall 2016

Chandra & Ruta (UIUC) CS473 1 Fall 2016 1 / 32

CS 473: Algorithms, Fall 2016

Streaming Algorithms

Lecture 12

October 5, 2016

Streaming Algorithms

A topic that is both very old, and very current!

Dawn of CS:

Data was stored on tapes, and the amount of RAM was very small. Too much data, too little space: store only a summary or sketch of the data.

Now:

Terabytes of memory, gigabytes of RAM. Data streams: humongous amounts of data (sometimes never-ending)! We can go over the data at most once, and sometimes not even that! Store only a summary: sub-linear space-time algorithms.

Examples

An internet router sees a stream of packets, and may want to know:
  • which connection is using the most packets,
  • how many different connections there are,
  • the median of the file sizes transferred since midnight,
  • which connections are using more than 0.1% of the bandwidth.
In short: computing aggregate information about data streams.

Outline

  • Computation with data streams.
  • Heavy hitters
      • Majority element (by R. Boyer and J. S. Moore)
      • ε-heavy hitters: deterministic
  • Approximate counting
      • Counting using hashing: Count-Min Sketch (Cormode-Muthukrishnan ’05)
      • Variant of Bloom filters

Data Streams

A stream of data elements, $S = a_1, a_2, \ldots$. Say $a_t$ arrives at time t. For this lecture, assume that the $a_t$'s are numbers. Denote $a[1..t] = a_1, a_2, \ldots, a_t$. Given some function F, we want to compute it continually while using limited space: at any time t, we should be able to query the value of F on the stream seen so far, i.e., $F(a[1..t])$.

Examples

S = 3, 1, 17, 4, −9, 32, 101, 3, −722, 3, 900, 4, 32, ...

Computing Sum

$F(a[1..t]) = \sum_{i=1}^{t} a_i$

Outputs are: 3, 4, 21, 25, 16, 48, 149, 152, −570, ...
Keep a counter, and keep adding to it. If each number has b bits, then after T rounds the counter is at most $T \cdot 2^b$ in absolute value, so $O(b + \log T)$ bits of space suffice.
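As a minimal Python sketch of this counter (the name `running_sums` is illustrative, and the stream is assumed to be a finite list of Python ints), the generator below keeps a single integer as its whole summary and yields the running sum after every arrival. The last lines check the $O(b + \log T)$ space claim on this prefix of S.

```python
from typing import Iterable, Iterator


def running_sums(stream: Iterable[int]) -> Iterator[int]:
    """Yield F(a[1..t]) = a_1 + ... + a_t after each arrival a_t."""
    total = 0  # the whole summary: one counter
    for a in stream:
        total += a
        yield total


S = [3, 1, 17, 4, -9, 32, 101, 3, -722]
print(list(running_sums(S)))  # [3, 4, 21, 25, 16, 48, 149, 152, -570]

# With b-bit inputs, |total| < T * 2**b after T arrivals, so the counter
# never needs more than b + bit_length(T) bits.
b = max(abs(a).bit_length() for a in S)
peak = max(abs(s) for s in running_sums(S))
print(peak.bit_length() <= b + len(S).bit_length())  # True
```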

Examples

S = 3, 1, 17, 4, −9, 32, 101, 3, −722, 3, 900, 4, 32, ...

Computing max

$F(a[1..t]) = \max_{1 \le i \le t} a_i$

Outputs are: 3, 3, 17, 17, 17, 32, 101, 101, ... We just need to store b bits (the current maximum).
Median? A lot trickier.
Number of distinct elements? Also tricky!
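The running maximum in the same style, again a sketch with an illustrative function name: the only state kept between arrivals is the current maximum, i.e. one b-bit number.

```python
def running_max(stream):
    """Yield max(a_1, ..., a_t) after each arrival a_t."""
    best = None  # the whole summary: the largest value seen so far
    for a in stream:
        if best is None or a > best:
            best = a
        yield best


S = [3, 1, 17, 4, -9, 32, 101, 3, -722, 3, 900, 4, 32]
print(list(running_max(S)))
# [3, 3, 17, 17, 17, 32, 101, 101, 101, 101, 900, 900, 900]
```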

Streaming Algorithms: Framework

〈Initialize summary information〉
While stream S is not done
    x ← next element in S
    〈Do something with x and update summary information〉
    〈Output something if needed〉
Return 〈summary〉

Despite these restrictions, we can compute interesting functions if we can tolerate some error.

Streaming Algorithms: One-sided Error

No false negatives

Anything that needs to be considered/counted will be counted.

There may be false positives

We may overcount; that is, we may consider/count something that should not have been counted.

Part I Heavy Hitters

Finding the Majority Element

Find the element that occurs strictly more than half the time, if any. Note that there is at most one such element!
Example stream (16 elements): E, D, B, D, D, D, B, B, B, B, B, E, E, E, E, E
At time 5, it is D. At time 11, it is B. At time 16, there is none!

Finding the Majority Element

  • R. Boyer and J. S. Moore Algorithm

Initialize: mem = ∅ and counter = 0
When element a_t arrives:
    if (counter == 0) set mem = a_t and counter = 1
    else if (a_t == mem) then counter++
    else counter−− (discard a_t and a copy of mem)
Return mem.

Even if there is no majority element, something is returned: a false positive.

Finding the Majority Element: Example

Running the R. Boyer and J. S. Moore algorithm on the stream E, D, B, D, D, D, B, B, B, B, B, E, E, E, E, E:

t        1  2  3  4  5  6  7  8  9  10  11  ...
a_t      E  D  B  D  D  D  B  B  B  B   B   ...
mem      E  E  B  B  D  D  D  D  B  B   B   ...
counter  1  0  1  0  1  2  1  0  1  2   3   ...
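A Python sketch of the R. Boyer and J. S. Moore algorithm that yields (mem, counter) after every arrival, so its output can be compared row by row with the trace. The generator form and the name `boyer_moore_steps` are illustrative choices; `None` stands in for mem = ∅.

```python
from typing import Hashable, Iterable, Iterator, Optional, Tuple


def boyer_moore_steps(
    stream: Iterable[Hashable],
) -> Iterator[Tuple[Optional[Hashable], int]]:
    """Boyer-Moore majority vote; yield (mem, counter) after each arrival."""
    mem: Optional[Hashable] = None  # mem = ∅
    counter = 0
    for a in stream:
        if counter == 0:        # nothing stored: keep a_t
            mem, counter = a, 1
        elif a == mem:          # another copy of the stored element
            counter += 1
        else:                   # discard a_t together with one copy of mem
            counter -= 1
        yield mem, counter


stream = list("EDBDDDBBBBBEEEEE")  # E, D, B, D, D, D, B, B, B, B, B, E, E, E, E, E
steps = list(boyer_moore_steps(stream))
print("".join(m for m, _ in steps[:11]))  # EEBBDDDDBBB
print([c for _, c in steps[:11]])         # [1, 0, 1, 0, 1, 2, 1, 0, 1, 2, 3]
print(steps[-1])                          # ('E', 2)
```

The final state shows that at t = 16 the algorithm returns E even though no element occurs more than 8 times, which is the false positive noted earlier.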

Finding a Majority Element

Correctness, if there is a majority element

Lemma

If there is a majority element, the algorithm will output it.

Proof.

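Decreasing the counter is like throwing away a_t together with one copy of the element in mem, and these two elements are always different. So every discarded pair contains at least one non-majority element. Let n be the length of the stream: there are fewer than n/2 non-majority elements, so fewer than n/2 copies of the majority element are thrown away. Since the majority element has more than n/2 copies, not all of them can be thrown away; the elements that survive are exactly the counter copies of mem, so mem must be the majority element. In fact, at any time t, mem contains the majority element of the sub-stream a[1..t], if any.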

Finding a Majority Element

Correctness, if there is a majority element

Gang war interpretation!

Every element is a gang member. When two members from different gangs meet, they shoot each other. If there is a gang with more than n/2 members, it will be the only one whose members survive!

ε-Heavy Hitters

Definition

Given a stream $S = a_1, a_2, \ldots$, define the count of element e at time t to be $\mathrm{count}_t(e) = |\{ i \le t \mid a_i = e \}|$. Element e is called an ε-heavy hitter at time t if $\mathrm{count}_t(e) > \varepsilon t$.

Goal:

Maintain a structure containing all the ε-heavy hitters so far. At any point there are fewer than 1/ε such elements.

Crucial Note: false positives are OK, but no false negatives

We are NOT allowed to miss any heavy hitters, but we may store non-heavy-hitters.

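ε-Heavy Hitters: Example

If ε = 1/2, then this is the majority element! For the stream E, D, B, D, D, D, B, B, B, B, B, E, E, E, E, E, the 1/3-heavy hitters (count > t/3) are:
  • At time 5: D (count 3 > 5/3).
  • At time 11: both B and D (counts 6 and 4, both > 11/3).
  • At time 15: only B (count 6 > 5; E has count exactly 5, which is not enough).
  • At time 16: both B and E (count 6 each, > 16/3).
As time passes, the set of heavy hitters may change completely.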
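To check the example, here is a brute-force reference in Python (the helper name `exact_heavy_hitters` is hypothetical). It stores every count, so it uses linear rather than sub-linear space and is not a streaming algorithm; ε is passed as a `Fraction` so that the test $\mathrm{count}_t(e) > \varepsilon t$ is exact. It lists the 1/3-heavy hitters and, for ε = 1/2, the majority element at the times discussed in the example.

```python
from collections import Counter
from fractions import Fraction
from typing import Hashable, Iterable, Iterator, Set


def exact_heavy_hitters(stream: Iterable[Hashable], eps: Fraction) -> Iterator[Set[Hashable]]:
    """Yield {e : count_t(e) > eps * t} after each arrival.

    Keeps a full Counter (space linear in the number of distinct elements),
    so this is a brute-force reference, not a streaming algorithm.
    """
    counts: Counter = Counter()
    for t, a in enumerate(stream, start=1):
        counts[a] += 1
        yield {e for e, c in counts.items() if c > eps * t}


stream = list("EDBDDDBBBBBEEEEE")
third = list(exact_heavy_hitters(stream, Fraction(1, 3)))
half = list(exact_heavy_hitters(stream, Fraction(1, 2)))
for t in (5, 11, 15, 16):
    print(t, sorted(third[t - 1]), sorted(half[t - 1]))
# 5 ['D'] ['D']
# 11 ['B', 'D'] ['B']
# 15 ['B'] []
# 16 ['B', 'E'] []
```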

ε-Heavy Hitters: Algorithm

If ε = 1/2, then this is the majority element! Set $k = \lceil 1/\varepsilon \rceil - 1$ (if ε = 1/2 then k = 1).

Algorithm

Keep an array T[1, ..., k] to hold elements.
Keep an array C[1, ..., k] to hold their counters.
Initialize: C[j] = 0 and T[j] = ∅ for all j.
When element a_t arrives:
    If (a_t == T[j] for some j ≤ k), then C[j]++.
    Else if (C[j] == 0 for some j ≤ k), then set T[j] ← a_t and C[j] ← 1.
    Else do C[j]−− for all j. (discard a_t and one copy of each T[j])

Same as the Majority algorithm for ε = 1/2.
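A Python sketch of the algorithm above (this deterministic scheme is widely known as the Misra-Gries summary). As an implementation choice not taken from the slides, the arrays T and C are kept together as one dict from stored element to positive counter; deleting a slot when its counter reaches 0 plays the role of "C[j] == 0", and since such a slot only contributes an estimate of 0, no estimate changes. ε is passed as a `Fraction` so that $k = \lceil 1/\varepsilon \rceil - 1$ is computed exactly. The class and method names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict, Hashable


class HeavyHitters:
    """Deterministic ε-heavy-hitter summary with k = ceil(1/ε) - 1 slots."""

    def __init__(self, eps: Fraction) -> None:
        self.k = math.ceil(1 / eps) - 1
        self.slots: Dict[Hashable, int] = {}  # T[j] -> C[j], only for C[j] > 0

    def update(self, a: Hashable) -> None:
        if a in self.slots:               # a_t == T[j] for some j: C[j]++
            self.slots[a] += 1
        elif len(self.slots) < self.k:    # some C[j] == 0: T[j] <- a_t, C[j] <- 1
            self.slots[a] = 1
        else:                             # decrement all k counters, discard a_t
            for e in list(self.slots):
                self.slots[e] -= 1
                if self.slots[e] == 0:
                    del self.slots[e]     # the slot is free again

    def estimate(self, e: Hashable) -> int:
        """est_t(e): C[j] if e == T[j], else 0."""
        return self.slots.get(e, 0)


stream = list("EDBDDDBBBBBEEEEE")
hh = HeavyHitters(Fraction(1, 3))  # k = 2
for a in stream:
    hh.update(a)
print(hh.k, hh.slots)  # 2 {'B': 2, 'E': 2}
```

With ε = 1/2 (k = 1) the same class reproduces the majority algorithm and ends with {'E': 2} on this stream.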

ε-Heavy Hitters

Algorithm Analysis

At any time t, our estimates are:
$\mathrm{est}_t(e) = \begin{cases} C[j] & \text{if } e = T[j] \\ 0 & \text{otherwise} \end{cases}$

Lemma

Estimates satisfy: $\mathrm{est}_t(e) \le \mathrm{count}_t(e) \le \mathrm{est}_t(e) + \varepsilon t$

For each element, the count is maintained up to εt error! If e is not an ε-heavy hitter, then $\mathrm{count}_t(e) \le \varepsilon t$, so even the estimate 0 would be correct up to εt error.

ε-Heavy Hitters

Algorithm Analysis

Lemma

Estimates satisfy: $\mathrm{est}_t(e) \le \mathrm{count}_t(e) \le \mathrm{est}_t(e) + \varepsilon t$

Corollary

For any time t, T contains all the ε-heavy hitters in a[1..t].

Proof.

If e is a heavy hitter at time t, then $\mathrm{count}_t(e) > \varepsilon t$. Using the lemma, $\mathrm{est}_t(e) \ge \mathrm{count}_t(e) - \varepsilon t > 0$, so e must be stored in T.

ε-Heavy Hitters

Algorithm Analysis

Lemma

Estimates satisfy: $0 \le \mathrm{count}_t(e) - \mathrm{est}_t(e) \le \frac{t}{k+1} \le \varepsilon t$

Proof.

The counter for e increases only when we see e, ∴ $\mathrm{est}_t(e) \le \mathrm{count}_t(e)$. The gap $\mathrm{count}_t(e) - \mathrm{est}_t(e)$ increases, by one, only in a step where we decrease all k counters because the arriving element is outside T. Such a step is like discarding k + 1 distinct elements (a_t and one copy of each T[j]). Up to time t we have only t elements to discard, so there are at most t/(k + 1) such steps. Finally, $k + 1 = \lceil 1/\varepsilon \rceil \ge 1/\varepsilon$, so $t/(k+1) \le \varepsilon t$.

ε-Heavy Hitters: Algorithm

Space usage

Set $k = \lceil 1/\varepsilon \rceil - 1$ (if ε = 1/2 then k = 1).

Algorithm

Keep an array T[1, ..., k] to hold elements.
Keep an array C[1, ..., k] to hold their counters.
...
Maintains O(1/ε) counters and elements: O(log t) bits for each counter, and O(log Σ) bits for each element, where Σ is the largest element. Total: $O\left(\frac{1}{\varepsilon}(\log t + \log \Sigma)\right)$. Recall: this maintains counts for all elements up to εt error.

Part II Use of Hash Functions

Maintaining Counts

Problem Statement:

At any time t, given an element e, estimate the number of times e has appeared so far. If error up to εt is OK, then we can use the ε-heavy hitter algorithm. It takes $O\left(\frac{1}{\varepsilon}(\log t + \log \Sigma)\right)$ space. Can we do better? Yes: a Bloom-filter-like idea.

Recall: Bloom Filter

Storage for inserts and lookups.
Sample hash functions h_1, ..., h_d independently and uniformly at random.

Insert(e)
    For i = 1...d
        Set T_i[h_i(e)] ← 1

Lookup(e)
    For i = 1...d
        If (T_i[h_i(e)] == 0) then return "No"
    Return "Yes"

If e was inserted, then Lookup(e) always returns "Yes". If e was not inserted, it can still return "Yes", with very low probability, because some inserted e′ has $h_i(e') = h_i(e)$. If $\Pr[e \text{ not inserted and } T_i[h_i(e)] = 1] \le \alpha$ for each i, then the combined error probability is at most $\alpha^d$.
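A small Python sketch of the Bloom filter above. The lecture does not fix a hash family here; as an assumption, each h_i is drawn from the Carter-Wegman family h(x) = ((a·x + b) mod P) mod m with P = 2^61 − 1, and keys are integers. The names `BloomFilter`, `random_hash`, `m` (table size) and `seed` are illustrative.

```python
import random
from typing import Callable, List

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1; keys are assumed to be integers


def random_hash(m: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h(x) = ((a*x + b) mod P) mod m (assumed Carter-Wegman family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


class BloomFilter:
    def __init__(self, m: int, d: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hashes = [random_hash(m, rng) for _ in range(d)]       # h_1, ..., h_d
        self.tables: List[List[int]] = [[0] * m for _ in range(d)]  # T_1, ..., T_d

    def insert(self, e: int) -> None:
        for h, T in zip(self.hashes, self.tables):
            T[h(e)] = 1

    def lookup(self, e: int) -> bool:
        for h, T in zip(self.hashes, self.tables):
            if T[h(e)] == 0:
                return False  # "No": e was definitely never inserted
        return True           # "Yes": e was inserted, or this is a false positive


bf = BloomFilter(m=64, d=4, seed=1)
for x in (3, 17, 101, 900):
    bf.insert(x)
print(all(bf.lookup(x) for x in (3, 17, 101, 900)))  # True
```

Keeping one table per hash function, as on the slide, is what makes the d "collision" events independent and gives the $\alpha^d$ bound.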

Count-Min Sketch

By G. Cormode and S. M. Muthukrishnan ’05

Keep d arrays C_1, ..., C_d, each holding m counters. H: a 2-universal family of hash functions h : U → {0, ..., m − 1}. Sample h_1, ..., h_d independently and uniformly at random from H.

CMInsert(e)
    For i = 1...d
        Do C_i[h_i(e)]++

CMEstimate(e)
    est ← ∞
    For i = 1...d
        est ← min{est, C_i[h_i(e)]}
    Return est

As element a_t arrives at time t, call CMInsert(a_t). To get the count of e at any time t, call CMEstimate(e).
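A Python sketch of CMInsert/CMEstimate under stated assumptions: integer elements (Python's % maps negative values into [0, P)), and each h_i drawn independently from the Carter-Wegman family ((a·x + b) mod P) mod m with P = 2^61 − 1, a standard 2-universal construction. The class and method names are hypothetical.

```python
import random
from typing import List, Tuple

P = (1 << 61) - 1  # Mersenne prime; elements are assumed to be integers


class CountMinSketch:
    """d rows of m counters; h_i(x) = ((a_i*x + b_i) mod P) mod m (assumed family)."""

    def __init__(self, m: int, d: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.m = m
        # independent draws of (a_i, b_i), one per row
        self.params: List[Tuple[int, int]] = [
            (rng.randrange(1, P), rng.randrange(0, P)) for _ in range(d)
        ]
        self.C: List[List[int]] = [[0] * m for _ in range(d)]  # C_1, ..., C_d

    def _h(self, i: int, x: int) -> int:
        a, b = self.params[i]
        return ((a * x + b) % P) % self.m

    def insert(self, e: int) -> None:
        """CMInsert: increment C_i[h_i(e)] in every row."""
        for i, row in enumerate(self.C):
            row[self._h(i, e)] += 1

    def estimate(self, e: int) -> int:
        """CMEstimate: minimum of C_i[h_i(e)] over the d rows."""
        return min(row[self._h(i, e)] for i, row in enumerate(self.C))


stream = [3, 1, 17, 4, -9, 32, 101, 3, -722, 3, 900, 4, 32]
cms = CountMinSketch(m=8, d=3, seed=7)
for a in stream:
    cms.insert(a)
for e in (3, 4, 32, 5):
    print(e, stream.count(e), cms.estimate(e))  # the estimate is never below the count
```

With m = 8 collisions are likely, so some estimates may exceed the true counts; they can never fall below them, since every row's counter for e includes all occurrences of e.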

Count-Min Sketch

At time t, let $\mathrm{est}_t(e)$ = CMEstimate(e).
Observation: $\mathrm{est}_t(e) \ge \mathrm{count}_t(e)$, since every occurrence of e increments each counter $C_i[h_i(e)]$ and counters never decrease.
Question: How big can $\mathrm{est}_t(e) - \mathrm{count}_t(e)$ be?
Recall: for any e, y ∈ U with e ≠ y, $\Pr[h_i(y) = h_i(e)] \le 1/m$ for all i.

Count-Min Sketch: Analysis

For simplicity, let $f'_e = \mathrm{est}_t(e)$ and $f_e = \mathrm{count}_t(e)$. We want to bound $f'_e - f_e$.

Observations: Define the indicator variable $X_{i,e,y} = [h_i(y) = h_i(e)]$. For y ≠ e, $E[X_{i,e,y}] = \Pr[h_i(y) = h_i(e)] \le 1/m$.
Let $X_{i,e} := \sum_{y \ne e} X_{i,e,y} \, f_y$ be the total overcounting at $C_i[h_i(e)]$. Then

$C_i[h_i(e)] = X_{i,e} + f_e$,

and since at most t elements have arrived so far,

$E[X_{i,e}] = \sum_{y \ne e} E[X_{i,e,y}] \, f_y \le \frac{1}{m} \sum_{y \ne e} f_y \le \frac{t}{m}$.

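Count-Min Sketch: Analysis

$C_i[h_i(e)] = X_{i,e} + f_e$ and $E[X_{i,e}] \le \frac{t}{m}$.

For ε > 0:
$\Pr[C_i[h_i(e)] - f_e \ge \varepsilon t] = \Pr[X_{i,e} \ge \varepsilon t]$ [definition]
$\le \frac{E[X_{i,e}]}{\varepsilon t}$ [Markov's inequality]
$\le \frac{t/m}{\varepsilon t} = \frac{1}{\varepsilon m}$ [derived above]

Recall: $f'_e = \mathrm{est}_t(e) = \min_{1 \le i \le d} C_i[h_i(e)]$.

$\Pr[f'_e - f_e \ge \varepsilon t] = \Pr[C_i[h_i(e)] - f_e \ge \varepsilon t \text{ for all } i]$
$= \Pr[X_{i,e} \ge \varepsilon t \text{ for all } i]$
$= \prod_{i=1}^{d} \Pr[X_{i,e} \ge \varepsilon t]$ [independence of the $h_i$'s]
$\le \left(\frac{1}{\varepsilon m}\right)^d$ [derived above]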

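Count-Min Sketch: Analysis

$\Pr[\mathrm{est}_t(e) - \mathrm{count}_t(e) \ge \varepsilon t] \le \left(\frac{1}{\varepsilon m}\right)^d$

Setting $m = \lceil 2/\varepsilon \rceil$ and $d = \lceil \lg(1/\delta) \rceil$ gives $\varepsilon m \ge 2$, so $\Pr[\mathrm{est}_t(e) - \mathrm{count}_t(e) \ge \varepsilon t] \le (1/2)^d \le \delta$.

Space: m·d counters, each of size lg t bits, for a total of $O\left(\frac{1}{\varepsilon} \lg\frac{1}{\delta} \lg t\right)$.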
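A quick numeric check of this parameter choice, as a Python sketch (the helper name `cms_parameters` is hypothetical). Passing ε and δ as `Fraction`s keeps m and the failure bound exact; only lg(1/δ) goes through floating point.

```python
import math
from fractions import Fraction
from typing import Tuple


def cms_parameters(eps: Fraction, delta: Fraction) -> Tuple[int, int, Fraction]:
    """Return (m, d, bound): m = ceil(2/eps), d = ceil(lg(1/delta)),
    and bound = (1/(eps*m))**d, the failure probability from the analysis."""
    m = math.ceil(2 / eps)
    d = math.ceil(math.log2(1 / delta))
    bound = (1 / (eps * m)) ** d
    return m, d, bound


m, d, bound = cms_parameters(Fraction(1, 100), Fraction(1, 1000))
print(m, d, m * d, bound)  # 200 10 2000 1/1024
```

For ε = 1/100 and δ = 1/1000 this gives 2000 counters, with failure bound 1/1024 ≤ δ.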

Count Min-Sketch: Analysis

Lemma

Given ε, δ > 0, we can estimate $\mathrm{count}_t(e)$ at any time t for any element e, up to εt error with probability at least 1 − δ, using $O\left(\frac{1}{\varepsilon} \lg\frac{1}{\delta}\right)$ counters.
