A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window, Costas Busch, Rensselaer Polytechnic Institute (PowerPoint presentation)



  1. A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window. Costas Busch, Rensselaer Polytechnic Institute; Srikanta Tirthapura, Iowa State University

  2. Outline of Talk: Introduction, Algorithm, Analysis

  3. Data stream: elements $v_1, v_2, v_3, v_4, v_5$ with timestamps $t_1, t_2, t_3, t_4, t_5$, shown on a timeline from time 1 to the current time $C$. For simplicity, assume unit-valued elements.

  4. [Figure: the most recent time window, of duration $W$, ending at the current time $C$.] Goal: compute the sum of the elements whose timestamps lie in the time window $[C - W, C]$, that is, $\sum_{i:\, C - W \le t_i \le C} v_i$.
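The goal in slide 4 can be computed exactly by storing every element. That violates the small-workspace requirement, but it gives a ground truth against which approximate estimates can be compared. A minimal sketch, assuming the stream is a list of `(value, timestamp)` pairs and the window is the closed interval $[C - W, C]$; the function name `exact_window_sum` is hypothetical.

```python
from typing import Iterable, Tuple

def exact_window_sum(stream: Iterable[Tuple[int, int]], C: int, W: int) -> int:
    """Exact sum of the values v_i whose timestamps t_i lie in [C - W, C].

    It must see every element, so it only serves as a ground-truth baseline,
    not as a small-space summary.
    """
    return sum(v for v, t in stream if C - W <= t <= C)

# (value, timestamp) pairs in arrival order; timestamps are not sorted.
stream = [(1, 1), (1, 2), (1, 3), (1, 5), (1, 4), (1, 9)]
print(exact_window_sum(stream, C=9, W=5))  # window [4, 9] -> 3
```

Because the sum only inspects timestamps, the answer does not depend on arrival order; the difficulty addressed in the talk is achieving this in small space.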

  5. Example I: over all packets on a network link, maintain the number of distinct IP sources in the last hour. Example II: over a large database, continuously maintain averages and frequency moments.

  6. Synchronous stream: the timestamps $t_i$ arrive in ascending order. Asynchronous stream: no order of the $t_i$ is guaranteed.

  7. Why asynchronous data streams? [Figure: a synchronous stream becomes asynchronous after crossing a network, due to network delay and multi-path routing; merging synchronous streams without control also produces an asynchronous stream.]
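The "merge without control" case from slide 7 can be illustrated in a few lines: two streams that are each synchronous become asynchronous when interleaved in arrival order. The round-robin interleaving below is an illustrative assumption standing in for arbitrary network arrival, and both function names are hypothetical.

```python
from typing import List

def is_synchronous(timestamps: List[int]) -> bool:
    """True if the timestamps arrive in non-decreasing (ascending) order."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))

def merge_round_robin(a: List[int], b: List[int]) -> List[int]:
    """Interleave two streams by alternating sources, never comparing
    timestamps (a stand-in for an uncontrolled merge)."""
    merged: List[int] = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            merged.append(a[i])
        if i < len(b):
            merged.append(b[i])
    return merged

s1 = [1, 2, 3, 7]   # synchronous
s2 = [4, 5, 6]      # synchronous
merged = merge_round_robin(s1, s2)
print(merged, is_synchronous(merged))  # [1, 4, 2, 5, 3, 6, 7] False
```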

  8. Processing requirements:
     • One-pass processing
     • Small workspace: poly-logarithmic in the size of the data
     • Fast processing time per element
     • Approximate answers are OK

  9. Our results: a deterministic data aggregation algorithm. Time: $O\left(\frac{\log W}{\epsilon}\log B\right)$; Space: $O\left(\frac{\log W \log B}{\epsilon}(\log W + \log B)\right)$; Relative error: $\frac{|X - S|}{S} \le \epsilon$.

  10. Previous work: [Datar, Gionis, Indyk, Motwani, SIAM Journal on Computing, 2002]: deterministic, synchronous streams, merging buckets. [Tirthapura, Xu, Busch, PODC 2006]: randomized, asynchronous streams, random sampling.

  11. Outline of Talk: Introduction, Algorithm, Analysis

  12. [Figure: stream elements with timestamps $t_1, \dots, t_6$ on a timeline from time 1 to the current time $C$.] For simplicity, assume unit-valued elements.

  13. [Figure: the most recent time window, of duration $W$, ending at the current time $C$.] Goal: compute the sum of the elements whose timestamps lie in the time window $[C - W, C]$.

  14. Divide time into periods of duration $W$, with boundaries at $W, 2W, 3W, 4W, \dots$

  15. The sliding window $[C - W, C]$ may span at most two time periods.

  16. The sum can be written as two sub-sums over the two time periods: $S = S_{\text{left}} + S_{\text{right}}$.
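The decomposition in slides 14 to 16 can be checked with a short sketch. It assumes integer timestamps starting at 1 and periods $[(j-1)W + 1,\, jW]$, matching the division of time at $W, 2W, 3W, \dots$; the helper names are hypothetical. Since the window $[C - W, C]$ holds at most $W + 1$ integer timestamps and each period holds $W$, the window meets at most two periods.

```python
from typing import List, Tuple

def period_of(t: int, W: int) -> int:
    """Index j of the period [(j - 1) * W + 1, j * W] that contains t >= 1."""
    return (t - 1) // W + 1

def split_window(C: int, W: int) -> List[Tuple[int, int, int]]:
    """Pieces (j, lo, hi) of the window [C - W, C], one per period it meets.

    Timestamps are assumed to start at 1, so the window is clamped at 1.
    """
    start = max(1, C - W)
    pieces = []
    j = period_of(start, W)
    while (j - 1) * W + 1 <= C:
        pieces.append((j, max(start, (j - 1) * W + 1), min(C, j * W)))
        j += 1
    return pieces

def sub_sums(timestamps: List[int], C: int, W: int) -> List[int]:
    """Exact count of unit-valued elements in each piece of the window."""
    return [sum(1 for t in timestamps if lo <= t <= hi)
            for _, lo, hi in split_window(C, W)]

timestamps = [3, 17, 12, 22, 15, 25, 21, 9]  # asynchronous arrival order
print(split_window(25, 10))          # [(2, 15, 20), (3, 21, 25)]
print(sub_sums(timestamps, 25, 10))  # [2, 3], so S = S_left + S_right = 5
```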

  17. Data structure $D_{\text{left}}$ maintains an estimate of $S_{\text{left}}$ in the left time period (and $D_{\text{right}}$ does the same for $S_{\text{right}}$ in the right time period).

  18. Without loss of generality, consider data structure $D_{\text{left}}$ in time period $[1, W]$. [Figure: $S_{\text{left}}$ covers the part of this period from $T$ to $W$.]

  19. The data structure $D_{\text{left}}$ consists of various levels $D_1, D_2, \dots, D_L$, where $2^L$ is an upper bound on the sum in a period.

  20. Consider level $D_i$. The bucket at level 0 covers the whole time period $[1, W]$ and counts up to $2^{i+1}$ elements.

  21. Stream: $t_1$, with $1 \le t_1 \le W$. Increase the counter value of bucket $[1, W]$ to 1.

  22. Stream: $t_1, t_2$. Increase the counter value to 2.

  23. Stream: $t_1, t_2, t_3$. Increase the counter value to 3.

  24. Stream: $t_1, t_2, t_3, \dots, t_{2^{i+1}-1}$. Increase the counter value to $2^{i+1} - 1$.

  25. Stream: $t_1, \dots, t_{2^{i+1}}$. The counter threshold of $2^{i+1}$ is reached: split the bucket into $[1, W/2]$ and $[W/2 + 1, W]$, each with counter value $2^i$.

  26. The new buckets also have threshold $2^{i+1}$.
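The counting and splitting steps in slides 20 to 26 can be sketched for a single level. This is a minimal Python sketch, not the authors' code: it assumes $W$ is a power of two, unit-valued elements, a level that keeps every bucket, and a split that divides the counter evenly ($2^i$ to each child). `Bucket` and `SplittingLevel` are hypothetical names.

```python
from typing import List

class Bucket:
    """A time interval [lo, hi] with a counter (hypothetical helper type)."""
    def __init__(self, lo: int, hi: int, count: int = 0) -> None:
        self.lo, self.hi, self.count = lo, hi, count

    def __repr__(self) -> str:
        return f"[{self.lo},{self.hi}]:{self.count}"

class SplittingLevel:
    """One level with splitting threshold 2**(i + 1) over the period [1, W]."""
    def __init__(self, W: int, i: int) -> None:
        self.threshold = 2 ** (i + 1)
        self.buckets: List[Bucket] = [Bucket(1, W)]  # the level-0 bucket [1, W]

    def insert(self, t: int) -> None:
        # Find the bucket whose interval contains t and increase its counter.
        for k, b in enumerate(self.buckets):
            if b.lo <= t <= b.hi:
                b.count += 1
                # Split when the threshold is reached, unless b has duration 1.
                if b.count >= self.threshold and b.lo < b.hi:
                    mid = (b.lo + b.hi - 1) // 2
                    half = b.count // 2  # each child receives 2**i
                    self.buckets[k:k + 1] = [Bucket(b.lo, mid, half),
                                             Bucket(mid + 1, b.hi, b.count - half)]
                return
        raise ValueError(f"timestamp {t} outside [1, W]")

level = SplittingLevel(W=8, i=1)  # threshold 2**2 = 4
for t in [2, 7, 5, 1]:            # asynchronous timestamps
    level.insert(t)
print(level.buckets)  # [[1,4]:2, [5,8]:2]
```

The counter does not remember which timestamps it saw, so a split can only hand each child a count; the analysis in slides 47 and 48 accounts for exactly this.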

  27. Stream: $t_1, \dots, t_{2^{i+1}}, t_{2^{i+1}+1}$. Increase the counter of the appropriate bucket, the one whose interval contains $t_{2^{i+1}+1}$.

  28. Stream: $t_1, \dots, t_{2^{i+1}+2}$. Increase the counter of the appropriate bucket.

  29. Stream: $t_1, \dots, t_{2^{i+1}+3}$. Increase the counter of the appropriate bucket.

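  30. Stream: $t_1, \dots, t_m$, with $W/2 + 1 \le t_m \le W$. The counter of bucket $[W/2 + 1, W]$ reaches $x = 2^{i+1}$, so split it into $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$, each with counter value $2^i$; bucket $[1, W/2]$ keeps its counter $x_1$.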

  31. Buckets after the split: $[1, W/2]$ with counter $x_1$, $[W/2 + 1, 3W/4]$ with $2^i$, and $[3W/4 + 1, W]$ with $2^i$.

  32. Stream: $t_1, \dots, t_m, t_{m+1}$, with $W/2 + 1 \le t_{m+1} \le 3W/4$. Increase the counter of the appropriate bucket, $[W/2 + 1, 3W/4]$.

  33. Stream: $t_1, \dots, t_m, \dots$ The counter of bucket $[W/2 + 1, 3W/4]$ reaches $2^{i+1}$, so split it into $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$, each with counter value $2^i$.

  34. Buckets after the split: $[1, W/2]$ with counter $x_1$, $[W/2 + 1, 5W/8]$ with $2^i$, $[5W/8 + 1, 3W/4]$ with $2^i$, and $[3W/4 + 1, W]$ with $x_4$.

  35. Splitting tree (every bucket has threshold $2^{i+1}$):
      $[1, W]$
        $[1, W/2]$: counter $x_1$
        $[W/2 + 1, W]$
          $[W/2 + 1, 3W/4]$
            $[W/2 + 1, 5W/8]$: counter $x_2$
            $[5W/8 + 1, 3W/4]$: counter $x_3$
          $[3W/4 + 1, W]$: counter $x_4$

  36. Max depth $= \log W$. Leaf buckets of duration 1 are not split any further.
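The splitting tree of slide 35 and the depth bound of slide 36 can be illustrated with a tree-shaped sketch. It assumes $W$ is a power of two and an even counter split; `Node`, `insert`, `leaves`, and `show` are hypothetical names. Each split halves a bucket, so a bucket at depth $d$ has duration $W/2^d$, and a leaf of duration 1 sits at depth $\log_2 W$, which is why the depth never exceeds $\log W$.

```python
import math
from typing import List, Optional

class Node:
    """Bucket [lo, hi] in a splitting tree (hypothetical helper type)."""
    def __init__(self, lo: int, hi: int, count: int = 0, depth: int = 0) -> None:
        self.lo, self.hi, self.count, self.depth = lo, hi, count, depth
        self.children: Optional[List["Node"]] = None

def insert(root: Node, t: int, threshold: int) -> None:
    """Walk down to the leaf containing t, count it, and split it if needed."""
    node = root
    while node.children is not None:
        left, right = node.children
        node = left if t <= left.hi else right
    node.count += 1
    # Leaf buckets of duration 1 (lo == hi) are never split.
    if node.count >= threshold and node.lo < node.hi:
        mid = (node.lo + node.hi - 1) // 2
        half = node.count // 2
        node.children = [Node(node.lo, mid, half, node.depth + 1),
                         Node(mid + 1, node.hi, node.count - half, node.depth + 1)]

def leaves(node: Node) -> List[Node]:
    """Current buckets of the level, in time order."""
    if node.children is None:
        return [node]
    return leaves(node.children[0]) + leaves(node.children[1])

def show(node: Node, indent: str = "") -> None:
    """Print the tree; internal nodes show their counter at split time."""
    print(f"{indent}[{node.lo}, {node.hi}] count={node.count}")
    for child in node.children or []:
        show(child, indent + "  ")

W, threshold = 8, 4                 # i = 1, so the threshold is 2**(i + 1) = 4
root = Node(1, W)
for t in [5] * 12:                  # repeated timestamps force repeated splits
    insert(root, t, threshold)
show(root)
assert max(leaf.depth for leaf in leaves(root)) <= math.log2(W)
```

Here the path $[1, 8] \to [5, 8] \to [5, 6] \to [5, 5]$ has the same shape as the branch toward $[W/2 + 1, 5W/8]$ in slide 35, and the duration-1 leaf $[5, 5]$ keeps counting past the threshold without splitting.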

  37. Leaf buckets: the initial bucket $[1, W]$, with threshold $2^{i+1}$, may be split into many buckets.

  38. Due to space limitations, we only keep the last $\alpha = \frac{2}{\epsilon}\log W$ buckets.
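The space limit of slide 38 can be sketched as a level that discards its oldest buckets. This sketch assumes $\alpha = \lceil (2/\epsilon)\log_2 W \rceil$ and that, after each split, the leftmost buckets beyond the last $\alpha$ are dropped. How discarded history is treated is also an assumption: an element older than every kept bucket is simply ignored at this level, so the level can only answer queries starting at or after `earliest_covered()`. All names are hypothetical.

```python
import math
from typing import List

def make_alpha(W: int, eps: float) -> int:
    """alpha = (2 / eps) * log2(W), rounded up to an integer."""
    return math.ceil(2 * math.log2(W) / eps)

class TruncatedLevel:
    """One level over [1, W] (W a power of two) keeping only the last alpha buckets."""
    def __init__(self, W: int, i: int, alpha: int) -> None:
        self.threshold = 2 ** (i + 1)
        self.alpha = alpha
        self.buckets: List[List[int]] = [[1, W, 0]]  # [lo, hi, count], in time order

    def insert(self, t: int) -> None:
        for k, (lo, hi, count) in enumerate(self.buckets):
            if lo <= t <= hi:
                count += 1
                if count >= self.threshold and lo < hi:
                    mid = (lo + hi - 1) // 2
                    half = count // 2
                    self.buckets[k:k + 1] = [[lo, mid, half], [mid + 1, hi, count - half]]
                else:
                    self.buckets[k][2] = count
                excess = len(self.buckets) - self.alpha
                if excess > 0:
                    del self.buckets[:excess]  # discard the oldest buckets
                return
        # Otherwise t is older than every kept bucket: not recorded at this level.

    def earliest_covered(self) -> int:
        return self.buckets[0][0]

level = TruncatedLevel(W=8, i=0, alpha=2)   # threshold 2, keep 2 buckets
for t in [8, 7, 8, 2]:
    level.insert(t)
print(level.buckets, level.earliest_covered())  # [[5, 6, 1], [7, 8, 1]] 5
```

A level with a small threshold splits often and therefore forgets history quickly, while a level with a large threshold covers more of the period with coarser buckets; slides 40 to 43 choose among levels on this basis.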

  39. Suppose we want to find the sum $S$ of the elements in the time period $[T, W]$.

  40. Consider the various levels, with splitting thresholds $2^1, 2^2, \dots, 2^k, 2^{k+1}$; each level keeps at most $\alpha$ buckets.

  41. Take the first level that has a leaf bucket intersecting the timeline at $T$.

  42. Estimate of $S$: $X = x_1 + x_2 + \dots + x_z$, the counters of the buckets to the right of the timeline at level $2^k$, where $z \le \alpha$.
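The query step in slides 39 to 43 can be sketched given the kept buckets of each level. The level-selection rule below is one reading of the slides (take the smallest splitting threshold whose kept buckets still reach back to the timeline at $T$), and the bucket lists are made-up illustrative data rather than output of the algorithm; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Bucket = Tuple[int, int, int]  # (lo, hi, count) of a kept bucket, in time order

def estimate_at_level(buckets: List[Bucket], T: int) -> int:
    """X = x_1 + ... + x_z: counters of the kept buckets right of the timeline T."""
    return sum(count for lo, hi, count in buckets if lo >= T)

def choose_level(levels: Dict[int, List[Bucket]], T: int) -> Optional[int]:
    """Smallest splitting threshold whose oldest kept bucket starts at or before T."""
    for threshold in sorted(levels):
        buckets = levels[threshold]
        if buckets and buckets[0][0] <= T:
            return threshold
    return None  # no level covers T

# Illustrative kept buckets per splitting threshold (hypothetical data).
levels = {
    2: [(5, 6, 1), (7, 8, 1)],
    4: [(1, 4, 2), (5, 8, 3)],
    8: [(1, 8, 5)],
}
T = 3
k = choose_level(levels, T)
print(k, estimate_at_level(levels[k], T))  # 4 3
```

For $T = 3$ the level with threshold 2 has already discarded $[1, 4]$, so the next level is used; its bucket $[1, 4]$ intersects the timeline and is excluded, which is the source of the error that the analysis bounds.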

  43. Or: take the first level that has a leaf bucket on the right of the timeline.

  44. Outline of Talk: Introduction, Algorithm, Analysis

  45. [Figure: $S$ covers $[T, W]$ within the period $[1, W]$.] Suppose that we use the level with splitting threshold $2^{i+1}$ to compute the estimate.

  46. Consider the level with splitting threshold $2^{i+1}$. A data element $t_k$ is counted in the appropriate bucket $b = [t_l, t_r]$, with $t_l \le t_k \le t_r$, by increasing its counter: $x_b \leftarrow x_b + 1$.

  47. We can assume that element $t_k$ is placed in the respective bucket $[t_l, t_r]$, where $t_l \le t_k \le t_r$.

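  48. When bucket $[t_l, t_r]$ reaches threshold $2^{i+1}$, it splits into $\left[t_l, \frac{t_l + t_r - 1}{2}\right]$ and $\left[\frac{t_l + t_r + 1}{2}, t_r\right]$, each with counter $2^i$. We can assume that when the bucket splits, the element is placed in an arbitrary child bucket.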

  49. Suppose the element is placed in the left child. If $t_l \le t_k \le \frac{t_l + t_r - 1}{2}$: GOOD! The element is counted in the correct bucket.

  50. With the element placed in the left child, if $\frac{t_l + t_r + 1}{2} \le t_k \le t_r$: BAD! The element is counted in the wrong bucket.
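The GOOD/BAD cases in slides 48 to 50 can be made concrete with a small helper. It follows the assumption in those slides that an element of a splitting bucket $[t_l, t_r]$ is assigned to an arbitrary child, and it checks whether that child's interval contains $t_k$; `children` and `placement_ok` are hypothetical names.

```python
from typing import Tuple

def children(t_l: int, t_r: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Split [t_l, t_r] into [t_l, (t_l+t_r-1)/2] and [(t_l+t_r+1)/2, t_r]."""
    mid = (t_l + t_r - 1) // 2
    return (t_l, mid), (mid + 1, t_r)

def placement_ok(t_k: int, t_l: int, t_r: int, placed_left: bool) -> bool:
    """GOOD (True) if the child the element was assigned to contains t_k."""
    left, right = children(t_l, t_r)
    lo, hi = left if placed_left else right
    return lo <= t_k <= hi

print(children(1, 8))                          # ((1, 4), (5, 8))
print(placement_ok(3, 1, 8, placed_left=True))  # True: GOOD
print(placement_ok(6, 1, 8, placed_left=True))  # False: BAD
```

Since a counter stores no timestamps, the algorithm never performs this check itself; the helper only spells out which assignments the analysis charges as errors.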

  51. Consider the leaf buckets, and an element $t_k$ counted in a leaf bucket to the right of $T$. If $T \le t_k \le W$: GOOD!

  52. Consider the leaf buckets, and an element $t_k$ counted in a leaf bucket to the right of $T$. If $t_k < T$: BAD! The element is counted in the wrong bucket.

  53. Consider the leaf buckets: $X - S = |Z_1| - |Z_2|$, where $Z_1$ is the set of elements of the left part counted on the right, and $Z_2$ is the set of elements of the right part counted on the left.

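  54. $Z_1$: elements of the left part ($t_k < T$) counted on the right. Such an element must have been initially inserted in one of the buckets that span the timeline, that is, a bucket whose interval contains both $T - 1$ and $T$.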

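  55. Since the tree depth is $\log W$, at most $\log W$ buckets span the timeline, and each of them receives at most $2^{i+1}$ elements before it splits; hence $|Z_1| = O(2^i \log W)$.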

  56. Since the tree depth is $\log W$, $|Z_1| = O(2^i \log W)$. Similarly, we can prove $|Z_2| = O(2^i \log W)$. Therefore $|X - S| = \big|\, |Z_1| - |Z_2| \,\big| = O(2^i \log W)$.
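The bound in slides 53 to 56 can be probed empirically with a single-level simulation. This sketch assumes $W$ is a power of two, unit-valued elements, no $\alpha$ truncation, and an even counter split at each threshold crossing. For every $T$ it compares $X$ (counters of buckets starting at or after $T$) with the exact $S$ and checks $|X - S| \le 2^{i+1}\log_2 W$, an explicit constant chosen for the test rather than taken from the slides. The names `run_level` and `check_bound` are hypothetical.

```python
import math
import random
from typing import List

def run_level(stream: List[int], W: int, i: int) -> List[List[int]]:
    """Single level with threshold 2**(i + 1); returns [lo, hi, count] buckets."""
    threshold = 2 ** (i + 1)
    buckets = [[1, W, 0]]
    for t in stream:
        for k, (lo, hi, count) in enumerate(buckets):
            if lo <= t <= hi:
                count += 1
                if count >= threshold and lo < hi:
                    mid = (lo + hi - 1) // 2
                    buckets[k:k + 1] = [[lo, mid, count // 2],
                                        [mid + 1, hi, count - count // 2]]
                else:
                    buckets[k][2] = count
                break
    return buckets

def check_bound(W: int, i: int, n: int, seed: int) -> bool:
    """True if |X - S| <= 2**(i + 1) * log2(W) for every query start T."""
    rng = random.Random(seed)
    stream = [rng.randint(1, W) for _ in range(n)]  # random arrival order
    buckets = run_level(stream, W, i)
    for T in range(1, W + 1):
        X = sum(c for lo, hi, c in buckets if lo >= T)  # buckets right of T
        S = sum(1 for t in stream if t >= T)            # exact sum over [T, W]
        if abs(X - S) > 2 ** (i + 1) * math.log2(W):
            return False
    return True

print(all(check_bound(W=64, i=2, n=2000, seed=s) for s in range(10)))  # True
```

A passing run is evidence rather than proof. The reason it passes is the argument of slides 54 and 55: at most $\log_2 W$ buckets ever contain both $T - 1$ and $T$, and each receives at most $2^{i+1}$ elements directly, so both $|Z_1|$ and $|Z_2|$ stay within $2^i(\log_2 W + 1) \le 2^{i+1}\log_2 W$.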

  57. Since $\alpha = \frac{2}{\epsilon}\log W$, it can be proven that $S = \Omega\left(\frac{2^i \log W}{\epsilon}\right)$.

  58. Since $\alpha = \frac{2}{\epsilon}\log W$, it can be proven that $S = \Omega\left(\frac{2^i \log W}{\epsilon}\right)$. Combined with $|X - S| = O(2^i \log W)$, we obtain the relative error $\frac{|X - S|}{S} = O(\epsilon)$.
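The last step of slide 58 is a direct calculation once the two asymptotic bounds are written with explicit constants $c_1, c_2 > 0$ (introduced here for illustration; the slides do not state them). The factor $2^i \log W$ cancels and leaves a constant multiple of $\epsilon$; choosing the constant in $\alpha$ appropriately, or rescaling $\epsilon$, gives a bound of the form $|X - S|/S \le \epsilon$.

```latex
\text{If } |X - S| \le c_1\, 2^i \log W \ \text{ and } \ S \ge c_2\, \frac{2^i \log W}{\epsilon}, \text{ then }
\frac{|X - S|}{S} \;\le\; \frac{c_1\, 2^i \log W}{c_2\, 2^i \log W / \epsilon} \;=\; \frac{c_1}{c_2}\,\epsilon \;=\; O(\epsilon).
```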
