stream statistics over sliding window


Anil Maheshwari (anil@scs.carleton.ca), School of Computer Science, Carleton University, Canada


  1. Stream Statistics Over Sliding Window

  2. Outline
       1. Introduction
       2. Algorithm
       3. Sum Problem
       4. Trends
       5. References

  3. Problem Setting
     Main Problem: The input is an endless stream of binary bits. At any time, among the last N bits received, we are interested in queries that seek an (approximate) count of the number of 1's in the stream among the last k bits, where k ≤ N.
     Result: A data structure of size $O(\frac{1}{\epsilon} \log^2 N)$ that can approximate the count of the number of 1's within a factor of $1 \pm \epsilon$.
     Reference: Maintaining stream statistics over sliding windows, by Datar, Gionis, Indyk, and Motwani, SIAM Jl. Computing 2002.

  4. Variants
       1. A stream of positive numbers. The query consists of a value k ∈ {1, . . . , N}, and we want to know the (approximate) sum of the last k numbers in the stream.
       2. A stream consisting of numbers from the set {−1, 0, +1}. We want to maintain the sum of the last N numbers of the stream. (Approximating the sum to within a constant factor of the exact sum requires Ω(N) bits of storage.)
       3. What are the most popular movies in the last week?
       4. What is trending in the last week?
       5. . . .

  5. Main Problem
     Report an approximate count of the number of 1's in the stream of binary bits among the last k bits, where k ≤ N.
     What about Exact Count?
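As a point of comparison for the exact-count question, here is a minimal Python sketch of the obvious baseline: keep all of the last N bits and sum the most recent k on demand. The class name `ExactWindowCounter` and its methods are illustrative assumptions, not part of the slides. The answer is exact, but the storage is N bits, which is the cost the approximate structure is designed to avoid.

```python
from collections import deque
from itertools import islice


class ExactWindowCounter:
    """Exact count of 1s among the last k <= N bits, storing all N bits."""

    def __init__(self, n):
        # A bounded deque drops the oldest bit automatically when full.
        self.bits = deque(maxlen=n)

    def add(self, bit):
        self.bits.append(bit)

    def count(self, k):
        # Sum the k most recent bits (fewer if the stream is still short).
        return sum(islice(reversed(self.bits), k))


counter = ExactWindowCounter(n=16)
for b in [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0]:
    counter.add(b)

print(counter.count(4))   # 2
print(counter.count(16))  # 10
```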

  6. Data Structure
     The algorithm uses two data structures:
       Time Stamps: To track the most recent N bits.
       Buckets: O(log N) buckets maintain the 1's among the latest N bits.

  7. Update
     [Figure: buckets over the stream, redrawn as the bits 0, 1, 1 arrive in turn]
       1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 0
       1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 0 0
       1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 0 0 1
       1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 0 0 1 1
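One way the update step could be sketched in Python is shown below. Several details are assumptions rather than statements from the slides: each bucket is stored as `[timestamp of its most recent 1, size]`, at most `max_per_size` buckets of each size are kept (2 here), timestamps are plain integers instead of values modulo N, and the name `BucketWindow` is hypothetical. A new 1 becomes a bucket of size 1, and whenever a size has too many buckets the two oldest of that size are merged into one of twice the size.

```python
class BucketWindow:
    """Buckets as [timestamp of most recent 1, size], oldest first."""

    def __init__(self, n, max_per_size=2):
        self.n = n                        # window length N
        self.max_per_size = max_per_size  # assumed cap per bucket size
        self.time = 0                     # position of the latest bit
        self.buckets = []                 # newest bucket is at the end

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its most recent 1 leaves the window.
        if self.buckets and self.buckets[0][0] <= self.time - self.n:
            self.buckets.pop(0)
        if bit != 1:
            return
        self.buckets.append([self.time, 1])
        hi = len(self.buckets)  # equal-size run is buckets[lo:hi]
        while True:
            size = self.buckets[hi - 1][1]
            lo = hi
            while lo > 0 and self.buckets[lo - 1][1] == size:
                lo -= 1
            if hi - lo <= self.max_per_size:
                break
            # Merge the two oldest buckets of this size; the merged
            # bucket keeps the newer of the two timestamps.
            self.buckets[lo + 1][1] = 2 * size
            del self.buckets[lo]
            hi = lo + 1  # continue with the run ending at the merged bucket


window = BucketWindow(n=16)
for bit in [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0]:
    window.add(bit)
after_16 = [tuple(b) for b in window.buckets]
# after_16 == [(5, 4), (8, 2), (12, 2), (13, 1), (14, 1)]

for bit in [0, 1, 1]:
    window.add(bit)
after_19 = [tuple(b) for b in window.buckets]
# after_19 == [(5, 4), (12, 4), (14, 2), (18, 1), (19, 1)]
```

The bucket sizes always add up to the number of 1's covered, and a single arrival can trigger a cascade of merges across sizes, which is where a per-update cost proportional to the number of distinct sizes comes from.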

  8. Complexity Analysis
       1. Space: $O(\log^2 N)$ bits
       2. Total Time (per update): $O(\log N)$

  9. Answering Query
     Query Problem: For any query value k ∈ {1, . . . , N}, report an approximate count of the number of 1's among the latest k bits of the stream.
       1. Initialize C := 0
       2. Traverse buckets from right to left. For each bucket of type $B_i$ that is encountered in the traversal:
            1. $B_i$ is completely contained in the window: $C := C + 2^i$
            2. $B_i$ is completely outside the window: C remains unchanged
            3. $B_i$ partially overlaps the window: $C := C + \frac{2^i}{2}$
       3. Report C as an approximate count
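The traversal above can be sketched in Python as follows. This is a sketch under stated assumptions: buckets are given as `(timestamp, size)` pairs ordered oldest to newest, where the timestamp is the position of the bucket's most recent 1, and the function name `approx_count` is hypothetical. Because only the newest position of each bucket is stored, the sketch cannot tell whether the oldest overlapping bucket is fully inside the window, so it treats that bucket as partially overlapping unless its size is 1. The example bucket list is hand-built for the 16-bit stream 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 0.

```python
def approx_count(buckets, now, k):
    """Estimate the number of 1s among the last k bits."""
    window_start = now - k  # positions > window_start lie in the window
    total = 0
    oldest_size = 0
    for timestamp, size in reversed(buckets):  # right to left
        if timestamp <= window_start:
            break  # this bucket and all older ones are outside
        total += size
        oldest_size = size
    if oldest_size > 1:
        # Count the oldest overlapping bucket as half its size.
        total -= oldest_size // 2
    return total


# Hand-built buckets for the 16-bit example stream (an assumption).
buckets = [(5, 4), (8, 2), (12, 2), (13, 1), (14, 1)]

print(approx_count(buckets, now=16, k=4))   # 2 (true count is 2)
print(approx_count(buckets, now=16, k=8))   # 3 (true count is 4)
print(approx_count(buckets, now=16, k=16))  # 8 (true count is 10)
```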

  10. Analysis of Approximation Factor
     2-factor approximation: Let $C^*$ be the true count of the number of 1's in the query window of size k. Then $\frac{1}{2} \le \frac{C}{C^*} \le 2$.

  11. Improvements
     Let r > 2 be an integer parameter.
     Maintain r − 1 or r copies of $B_i$ for each i ≥ 1 ($B_0$ and the largest bucket may have fewer).
     Whenever the number of buckets of any type exceeds r, take the oldest two buckets of that type and merge them to form a new bucket of the next size.
     Answer queries as before.

  12. Improvements (contd.)
     Claim: For this setting, we have $1 - \frac{1}{r-1} \le \frac{C}{C^*} \le 1 + \frac{1}{r-1}$.
     If $r = 1 + \frac{1}{\epsilon}$, we obtain a data structure of size $O(\frac{1}{\epsilon} \log^2 N)$ that approximates the count of the number of 1's within a factor of $1 \pm \epsilon$.
     True Count $\ge 1 + (r-1) \sum_{i=1}^{j-1} 2^i$
     Error $\le 2^{j-1} - 1$
     Therefore, $\frac{2^{j-1} - 1}{1 + (r-1) \sum_{i=1}^{j-1} 2^i} \le \frac{1}{r-1}$
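The arithmetic behind the final inequality can be spelled out as below. This is a sketch that takes the two bounds on the slide (True Count and Error) as given and only fills in the geometric sum and the substitution $r = 1 + \frac{1}{\epsilon}$; it assumes the `amsmath` package for `align*`.

```latex
\begin{align*}
\sum_{i=1}^{j-1} 2^i &= 2^j - 2 = 2\,(2^{j-1} - 1), \\
\frac{\text{Error}}{\text{True Count}}
  &\le \frac{2^{j-1} - 1}{1 + 2(r-1)(2^{j-1} - 1)}
  \le \frac{1}{2(r-1)} \le \frac{1}{r-1}, \\
r = 1 + \frac{1}{\epsilon}
  &\implies \frac{1}{r-1} = \epsilon
  \implies (1 - \epsilon)\, C^* \le C \le (1 + \epsilon)\, C^* .
\end{align*}
```

The middle step uses $\frac{x}{1 + 2(r-1)x} \le \frac{1}{2(r-1)}$ for every $x \ge 0$. The size bound is consistent with keeping up to r buckets for each of O(log N) bucket sizes, each with an O(log N)-bit time stamp.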

  13. Computation of Sum
     The Sum Problem: A stream of positive numbers. The query consists of a value k ∈ {1, . . . , N}, and we want to know the (approximate) sum of the last k numbers in the stream.
     Example stream: 5 7 2 3 9 4 1 6 11 2 4 3

  14. Approach I: Computation of Sum
     If the next number in the stream is x, insert x 1's in the stream.
     Example stream: 5 7 2 3 9 4 1 6 11 2 4 3
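A minimal Python sketch of this reduction, under the assumption that all x ones produced by one number share that number's arrival position as their timestamp (so a window of the last k numbers stays a window of k timestamps). The helper names are hypothetical, and an exact count stands in for the approximate counter so that the reduction itself can be checked.

```python
def expand_to_ones(numbers):
    """Yield the timestamp of every 1 produced by expanding each number."""
    for t, x in enumerate(numbers, start=1):
        for _ in range(x):  # the number x becomes x ones at time t
            yield t


def sum_last(one_timestamps, now, k):
    """Count the 1s whose timestamp falls among the last k arrivals."""
    return sum(1 for t in one_timestamps if t > now - k)


stream = [5, 7, 2, 3, 9, 4, 1, 6, 11, 2, 4, 3]
ones = list(expand_to_ones(stream))

print(sum_last(ones, now=len(stream), k=3))   # 9, i.e. 2 + 4 + 3
print(sum_last(ones, now=len(stream), k=12))  # 57
```

In this naive form a number x costs x insertions into the counter, so large values make updates slow.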

  15. Approach II: Computation of Sum
     Assuming d-bit numbers. For each bit position i, maintain a stream. Let $C_i$ be the approximate number of 1's in stream i. Report the approximate sum as $\sum_{i=0}^{d-1} 2^i C_i$.
     Stream   5  7  2  3  9  4  1  6 11  2  4  3
     2^3      0  0  0  0  1  0  0  0  1  0  0  0
     2^2      1  1  0  0  0  1  0  1  0  0  1  0
     2^1      0  1  1  1  0  0  0  1  1  1  0  1
     2^0      1  1  0  1  1  0  1  0  1  0  0  1
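The bit-position decomposition can be checked with a short Python sketch. The function names are hypothetical, and the per-stream counts $C_i$ are computed exactly here as a stand-in for the approximate counters, so the reported value equals the true sum. Since every term is non-negative, counts that are each within a factor of $1 \pm \epsilon$ would give a sum within the same factor.

```python
def bit_streams(numbers, d):
    """Return d binary streams; stream i holds bit i of every number."""
    return [[(x >> i) & 1 for x in numbers] for i in range(d)]


def sum_from_bit_counts(counts):
    """Combine per-bit counts C_i into the sum of 2^i * C_i."""
    return sum((1 << i) * c for i, c in enumerate(counts))


stream = [5, 7, 2, 3, 9, 4, 1, 6, 11, 2, 4, 3]
planes = bit_streams(stream, d=4)

k = 5
counts = [sum(plane[-k:]) for plane in planes]  # exact stand-in for C_i
estimate = sum_from_bit_counts(counts)

print(counts)    # [2, 4, 2, 1]
print(estimate)  # 26, i.e. 6 + 11 + 2 + 4 + 3
```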

  16. What is Trending?
     Among the last $10^{12}$ movie tickets sold, list all popular movies.
     Let $c := 10^{-3}$. Maintain (decaying) scores for movies whose score is at least a threshold τ ∈ (0, 1). For each new sale of a ticket (say for Movie M):
       1. For each movie whose score is being maintained, multiply its score by (1 − c).
       2. If we have the score of M, add 1 to that score. Otherwise, create a new score for M and initialize it to 1.
       3. Remove any score that falls below τ.
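The three steps above can be sketched in Python as follows. The function name `record_sale` and the dictionary representation are assumptions, and the demonstration uses c = 0.1 instead of the slide's value purely so that the decay is visible after a handful of sales.

```python
def record_sale(scores, movie, c, tau):
    """Apply one ticket sale to the dict of decaying scores, in place."""
    # Step 1: decay every maintained score.
    for title in scores:
        scores[title] *= (1 - c)
    # Step 2: add 1 for the movie just sold (creating its score if needed).
    scores[movie] = scores.get(movie, 0.0) + 1.0
    # Step 3: drop scores that fell below the threshold.
    for title in [t for t, s in scores.items() if s < tau]:
        del scores[title]


scores = {}
record_sale(scores, "B", c=0.1, tau=0.5)
for _ in range(6):
    record_sale(scores, "A", c=0.1, tau=0.5)
b_still_tracked = "B" in scores  # 0.9**6 is about 0.53, still above tau

record_sale(scores, "A", c=0.1, tau=0.5)
b_dropped = "B" not in scores    # 0.9**7 is about 0.48, below tau
```

Each sale touches every maintained score, so the cost per update is proportional to the number of scores currently kept.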

  17. Questions
       1. How many scores are maintained at any given time?
       2. What is the sum of all scores at any point of time?
       3. Answer the above questions for τ = 1/2 and τ = 1/3.

  18. Variants
       1. Min/Max
       2. Stream with ± numbers
       3. Lower Bounds: Results are more-or-less optimal up to constant factors
       4. . . .

  19. Conclusions
     Main References:
       1. Maintaining stream statistics over sliding windows, by Datar, Gionis, Indyk, and Motwani, SIAM Jl. Computing 2002.
       2. Chapter in MMDS book (mmds.org)
       3. Chapter on Data Streams in My Notes on Topics in Algorithm Design
