Stream Statistics Over Sliding Window


SLIDE 1

Stream Statistics Over Sliding Window Anil Maheshwari Introduction Algorithm Sum Problem Trends References

Stream Statistics Over Sliding Window

Anil Maheshwari

School of Computer Science Carleton University Canada

SLIDE 2

Outline

1. Introduction
2. Algorithm
3. Sum Problem
4. Trends
5. References

SLIDE 3

Problem Setting

Main Problem

The input is an endless stream of binary bits. At any time, among the last N bits received, we are interested in queries that seek an approximate count of the number of 1's among the last k bits of the stream, where k ≤ N.

Result: A data structure of size $O(\frac{1}{\epsilon} \log^2 N)$ that can approximate the count of the number of 1's within a factor of $1 \pm \epsilon$.

Reference: Maintaining stream statistics over sliding windows by Datar, Gionis, Indyk, and Motwani, SIAM Jl. Computing 2002

SLIDE 4

Variants

1. A stream of positive numbers. The query consists of a value k ∈ {1, . . . , N}, and we want to know the (approximate) sum of the last k numbers in the stream. (Uses sublinear space.)
2. A stream consisting of numbers from the set {−1, 0, +1}. We want to maintain the sum of the last N numbers of the stream. (Requires Ω(N) bits of storage to approximate the sum to within a constant factor of the exact sum.)
3. What are the most popular movies in the last week?
4. What is trending in the last week?

SLIDE 5

Main Problem

Main Problem

Report an approximate count of the number of 1's in the stream of binary bits among the last k bits, where k ≤ N.

What about Exact Count?

SLIDE 6

Algorithm for Approximate Count

The algorithm uses two structures:

  • Time Stamps: To track the most recent N bits.
  • Buckets: With the following features:
      • O(log N) buckets maintain the 1's among the latest N bits
      • The number of 1's in a bucket is a power of 2
      • Each 1-bit is assigned to exactly one bucket (a 0-bit may or may not be assigned to any bucket)
      • At most two buckets of a given size (size = #1s)
      • Each bucket stores the time stamp of its most recent bit
      • The most recent bit of any bucket is a 1-bit

SLIDE 7

Algorithm contd.

On receiving a new bit in the data stream:

  • 0-bit: Increment the time stamp of each bucket by 1; if any bucket's time stamp exceeds N, discard that bucket.
  • 1-bit: The following updates are done:

1. Create a bucket $B_0$ consisting of the newest 1-bit, with a time stamp of 1.
2. Scan the list of buckets in order of increasing size.
   Case 1: Two buckets of size 1. Increment the time stamp of each bucket (and possibly discard buckets whose time stamps exceed N).
   Case 2: Three buckets of type $B_0$.
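A minimal Python sketch of the update step, under stated assumptions. The class name `SlidingBitCounter` and the list-of-`[time_stamp, size]` layout are illustrative and do not come from the slides. The text of Case 2 is cut off in the slide, so the merge rule used here is an assumption. It merges the two oldest buckets of a size as soon as three exist, which follows the "at most two buckets of a given size" rule and the merge rule stated under Refining the Analysis.

```python
class SlidingBitCounter:
    """Sketch of the bucket structure for a window of the last N bits."""

    def __init__(self, window):
        self.window = window   # N
        self.buckets = []      # newest first; each entry is [time_stamp, size]

    def add(self, bit):
        # Every bucket ages by one position when a new bit arrives.
        for bucket in self.buckets:
            bucket[0] += 1
        # Only the oldest bucket can have slid out of the window.
        if self.buckets and self.buckets[-1][0] > self.window:
            self.buckets.pop()
        if bit != 1:
            return
        # New bucket of size 1 holding the newest 1-bit, time stamp 1.
        self.buckets.insert(0, [1, 1])
        # Cascade: while some size has three buckets, merge its two oldest.
        start = 0
        while True:
            size = self.buckets[start][1]
            end = start
            while end < len(self.buckets) and self.buckets[end][1] == size:
                end += 1
            if end - start < 3:
                break
            # The merged bucket keeps the time stamp of the newer bucket.
            self.buckets[end - 2][1] = 2 * size
            del self.buckets[end - 1]
            start = end - 2


counter = SlidingBitCounter(window=8)
for bit in [1, 1, 1, 0, 1]:
    counter.add(bit)
print(counter.buckets)  # [[1, 1], [3, 1], [4, 2]]
```

Time stamps here are relative, as in the slides: the newest bit has time stamp 1 and every arrival increments the rest, which costs one pass over the O(log N) buckets per bit.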

SLIDE 8

Illustration

[Figure: the bit stream 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 shown in panels A to E, with its 1-bits grouped into buckets of types B0, B1, and B2. The window of size N runs from Time Stamp 1 to Time Stamp N, next to the unseen part of the stream.]

SLIDE 9

Space Complexity

We have:

  • O(log N) buckets, as the size of the window is N
  • Bucket $B_i$ stores $2^i$ 1-bits
  • For each bucket we store its time stamp and its size
  • Each time stamp requires O(log N) bits
  • Storing i with bucket $B_i$ is sufficient for its size
  • As 0 ≤ i ≤ log N, i can be represented using O(log log N) bits

Total space required: $O(\log N (\log N + \log \log N)) = O(\log^2 N)$ bits
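As a worked instance of this bound, take $N = 2^{20}$. The figures below assume at most two buckets per size, bucket sizes $2^0, \ldots, 2^{\log_2 N}$, and time stamps in $\{1, \ldots, N\}$ stored in $\lceil \log_2 N \rceil$ bits; the constants are illustrative only.

```latex
\begin{align*}
\#\text{buckets} &\le 2(\log_2 N + 1) = 2 \cdot 21 = 42 \\
\text{bits per time stamp} &= \lceil \log_2 N \rceil = 20 \\
\text{bits per index } i &= \lceil \log_2(\log_2 N + 1) \rceil = \lceil \log_2 21 \rceil = 5 \\
\text{total} &\le 42 \cdot (20 + 5) = 1050 \text{ bits}
\end{align*}
```

For comparison, storing the window itself takes $N = 1{,}048{,}576$ bits.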

SLIDE 10

Time Complexity

On receiving a 0-bit:

  • We update time stamps of each of the O(log N) buckets
  • Requires O(log N) time

On receiving a 1-bit:

  • We update the time stamps of each bucket
  • Potentially merge & cascade buckets
  • Time (merge & cascade) ≈ # of buckets
  • Can be performed in O(log N) time

Total Time (per update): O(log N)

SLIDE 11

Answering Query

Query Problem

For any query value k ∈ {1, . . . , N}, report an approximate count of the number of 1’s among the latest k bits of the stream.

1. Initialize count := 0.
2. Traverse buckets from right to left. For each bucket of type $B_i$ that is encountered in the traversal:
   1. $B_i$ is completely contained in the window: increment count by $2^i$.
   2. $B_i$ is completely outside the window: count remains unchanged.
   3. $B_i$ partially overlaps the window: increment count by $2^i / 2$.
3. Report count as the approximate count.
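The query steps can be sketched in Python as follows. Each bucket stores only the time stamp of its most recent bit, so this sketch assumes that the oldest bucket whose time stamp falls inside the window is the partially overlapping one. The function name `approx_count` and the sample bucket list are hypothetical. One deliberate deviation: the halving uses integer division, so a bucket of size 1 (a single bit, which cannot overlap partially) is counted in full.

```python
def approx_count(buckets, k):
    """Approximate number of 1's among the last k bits.

    buckets: list of (time_stamp, size) pairs, newest first, where
    time_stamp is the relative position of the bucket's most recent
    bit (1 = newest bit of the stream).
    """
    count = 0
    oldest_size = 0
    for time_stamp, size in buckets:
        if time_stamp > k:
            break              # this bucket and all older ones are outside
        count += size          # provisionally count the whole bucket
        oldest_size = size
    # Treat the oldest bucket reached as partially overlapping:
    # keep only half of its contribution.
    return count - oldest_size // 2


sample = [(1, 1), (3, 1), (4, 2), (7, 4)]   # hypothetical bucket list
print(approx_count(sample, 5))  # 3
print(approx_count(sample, 8))  # 6
```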

SLIDE 12

Analysis of Approximation Factor

Observation: Except for one bucket, say $B_j$, that is partially in the window of size k, we know that all buckets of type $B_0, B_1, \ldots, B_{j-1}$ are completely within the window.

  • For those buckets, the count of the number of 1-bits is $\sum_{i=0}^{j-1} 2^i \ge 2^j - 1$.
  • The true count (and the approximate count) is at least $2^j$ (as the last bit of $B_j$ is in the window of interest).
  • For the bucket $B_j$ that overlaps partially with the window, the number of bits that can be in the true count can be anywhere from 0 up to $2^j - 1$. But we only took a contribution of $2^{j-1}$ in the reported count.
  • The ratio of the true count to the reported count is within a factor of $(\frac{1}{2}, 2)$.

SLIDE 13

Refining the Analysis

  • Let r ≥ 2 be an integer parameter.
  • Maintain r − 1 or r copies of $B_i$ for each i ≥ 1 (there may be fewer than r − 1 buckets of type $B_0$).
  • Whenever we exceed r copies of any type of bucket, we take the oldest two buckets of that type and merge them to form a new bucket of the next size.
  • For the query, assume that the bucket labelled $B_j$ only partially overlaps the query window.
  • At least $1 + \sum_{i=1}^{j-1} (r-1) 2^i$ 1-bits are in the query window.
  • The true count and the reported value are within a factor of $1 \pm \frac{1}{r-1}$.

⟹ By setting $r = 1 + \frac{1}{\epsilon}$, we obtain a data structure of size $O(\frac{1}{\epsilon} \log^2 N)$ that approximates the count of the number of 1s within a factor of $1 \pm \epsilon$.
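To make the choice of r concrete, here is a small Python sketch. The helper names `copies_for_error` and `max_buckets` are hypothetical. Rounding 1/ε up to an integer is an assumption added here so that r stays an integer while 1/(r − 1) ≤ ε still holds. The bucket bound assumes at most r buckets per size and sizes up to $2^{\lfloor \log_2 N \rfloor}$.

```python
import math


def copies_for_error(eps):
    """Smallest integer r with 1 / (r - 1) <= eps, i.e. r = 1 + ceil(1 / eps)."""
    return math.ceil(1 / eps) + 1


def max_buckets(window, eps):
    """Upper bound on the number of buckets kept for a window of N bits."""
    r = copies_for_error(eps)
    num_sizes = window.bit_length()   # floor(log2 N) + 1 sizes: 2^0 .. 2^floor(log2 N)
    return r * num_sizes


print(copies_for_error(0.25))     # 5
print(max_buckets(1024, 0.25))    # 55
```

Halving ε roughly doubles r, and with it the number of buckets, which matches the 1/ε factor in the size bound.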

SLIDE 14

Computation of Sum

The Sum Problem

A stream of positive numbers. The query consists of a value k ∈ {1, . . . , N}, and we want to know the (approximate) sum of the last k numbers in the stream.

Example stream: 5 7 2 3 9 4 1 6 11 2 4 3

SLIDE 15

Approach I: Computation of Sum

Assuming d-bit numbers. For each bit position, maintain a stream. Approximate the number of 1's in each stream. Report the approximate sum as $\sum_{i=0}^{d-1} \text{count}(i) \cdot 2^i$.

Example stream: 5 7 2 3 9 4 1 6 11 2 4 3
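The decomposition behind Approach I can be checked with a short Python sketch on the example stream. To keep it self-contained it uses exact per-position counts as a stand-in for the approximate counters, so it shows only that the weighted counts add up to the window sum. The function names and the choice d = 4 (enough for values up to 15) are assumptions.

```python
def bit_streams(numbers, d):
    """Split a stream of d-bit numbers into d binary streams, one per bit position."""
    return [[(x >> i) & 1 for x in numbers] for i in range(d)]


def window_sum(numbers, k, d):
    """Sum of the last k numbers, rebuilt from per-position counts of 1's."""
    streams = bit_streams(numbers, d)
    # count(i) is computed exactly here; the slides would use an
    # approximate sliding-window counter per stream instead.
    return sum(sum(stream[-k:]) * 2**i for i, stream in enumerate(streams))


stream = [5, 7, 2, 3, 9, 4, 1, 6, 11, 2, 4, 3]
print(window_sum(stream, k=4, d=4))  # 20, i.e. 11 + 2 + 4 + 3
```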

SLIDE 16

Approach II: Computation of Sum

If the next number in the stream is x, insert x 1's in the stream.

Example stream: 5 7 2 3 9 4 1 6 11 2 4 3

SLIDE 17

What is Trending?

Among the last $10^{12}$ movie tickets sold, which movies are popular? Let c := $10^{-3}$. Maintain (decaying) scores for movies whose score is at least a threshold τ ∈ (0, 1). For each new sale of a ticket (say for Movie M):

1. For each movie whose score is being maintained, multiply its score by (1 − c).
2. If we have the score of M, add 1 to that score. Otherwise, create a new score for M and initialize it to 1.
3. Remove any score that falls below τ.
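The three steps above can be sketched in Python as below. The function name `record_sale` and the dictionary representation are assumptions. The demo uses c = 0.5 and τ = 0.3 so that the decay is visible after a few sales; these are not the values from the slide.

```python
def record_sale(scores, movie, c, tau):
    """Update decaying scores in place for one ticket sale of `movie`."""
    # Step 1: decay every maintained score.
    for name in scores:
        scores[name] *= (1 - c)
    # Step 2: add 1 to the sold movie's score (creating it if needed).
    scores[movie] = scores.get(movie, 0.0) + 1.0
    # Step 3: drop scores that fell below the threshold.
    for name in [n for n, s in scores.items() if s < tau]:
        del scores[name]
    return scores


scores = {}
for sale in ["A", "B", "A", "C"]:
    record_sale(scores, sale, c=0.5, tau=0.3)
print(scores)  # {'A': 0.625, 'C': 1.0}
```

Because τ < 1, a movie that has just been sold always survives step 3 with a score of at least 1.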

SLIDE 18

Questions

1. How many scores are maintained at any given time?
2. What is the sum of all scores at any point in time?
3. Answer the above questions for τ = 1/2 and τ = 1/3.

SLIDE 19

Conclusions

Main References:

1. Maintaining stream statistics over sliding windows, by Datar, Gionis, Indyk, and Motwani, SIAM Jl. Computing 2002.
2. Chapter in MMDS book (mmds.org)
3. Chapter on Data Streams in My Notes on Topics in Algorithm Design