 
A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window
Costas Busch, Rensselaer Polytechnic Institute
Srikanta Tirthapura, Iowa State University
Outline of Talk: Introduction, Algorithm, Analysis
Data stream: elements $(v_i, t_i)$, each with value $v_i$ and timestamp $t_i$, where $1 \le t_i \le C$ and $C$ is the current time. For simplicity, assume unit-valued elements.
Most recent time window of duration $W$, ending at the current time $C$. Goal: compute the sum of the elements with timestamps in the time window $[C - W, C]$: $S = \sum_{i:\ C - W \le t_i \le C} v_i$.
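For checking approximate answers on small inputs, a brute-force reference for this goal is handy. It is only a sketch: it stores every element, which the workspace requirement below rules out. The name `window_sum` and the `(value, timestamp)` pair layout are illustrative assumptions.

```python
def window_sum(stream, C, W):
    """Exact sum of the values v_i whose timestamps t_i lie in [C - W, C].

    `stream` is an iterable of (value, timestamp) pairs in arrival order.
    Every element is examined, so this is a reference, not a summary.
    """
    return sum(v for v, t in stream if C - W <= t <= C)


if __name__ == "__main__":
    # Unit-valued elements arriving out of timestamp order.
    stream = [(1, 3), (1, 7), (1, 5), (1, 12)]
    # Window [6, 12] contains the elements with t = 7 and t = 12.
    print(window_sum(stream, C=12, W=6))  # 2
```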
Example I: for all packets on a network link, maintain the number of distinct IP sources in the last hour. Example II: for a large database, continuously maintain averages and frequency moments.
Synchronous stream: the timestamps $t_i$ arrive in ascending order (e.g., $t_1, t_2, t_3, t_4, t_5$). Asynchronous stream: no order of the $t_i$ is guaranteed (e.g., $t_1, t_2, t_3, t_5, t_4$).
Why asynchronous data streams? A synchronous stream sent across a network can arrive asynchronously because of network delay and multi-path routing. Merging synchronous streams without control also yields an asynchronous stream.
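The second point can be seen on a toy input. Below, two synchronous streams are interleaved by alternating arrivals. This is a minimal sketch: the alternating schedule is an arbitrary stand-in for an uncontrolled merge, and both function names are hypothetical.

```python
def is_synchronous(timestamps):
    """True if the timestamps arrive in non-decreasing order."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


def merge_without_control(a, b):
    """Interleave two streams by alternating arrivals.

    This is just one arbitrary arrival schedule; nothing re-sorts the
    combined stream by timestamp.
    """
    merged = []
    for x, y in zip(a, b):
        merged.extend([x, y])
    shorter = min(len(a), len(b))
    merged.extend(a[shorter:] if len(a) > len(b) else b[shorter:])
    return merged


if __name__ == "__main__":
    a, b = [1, 4, 6], [2, 3, 5]            # two synchronous streams
    merged = merge_without_control(a, b)   # [1, 2, 4, 3, 6, 5]
    print(is_synchronous(a), is_synchronous(b), is_synchronous(merged))
```

Each input stream is sorted, but the merged stream is not. Any summary that relies on in-order arrival breaks on such input.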
Processing requirements:
• One-pass processing
• Small workspace: polylogarithmic in the size of the data
• Fast processing time per element
• Approximate answers are OK
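Our results: a deterministic data aggregation algorithm with
• Time: $O(\log W \cdot \log B)$
• Space: $O\left(\frac{\log W \cdot \log B}{\varepsilon}\,(\log B + \log W)\right)$
• Relative error: $\frac{|X - S|}{S} \le \varepsilon$ ($X$: estimate, $S$: true sum)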
Previous work:
• [Datar, Gionis, Indyk, Motwani, SIAM Journal on Computing, 2002]: deterministic, synchronous; merging buckets
• [Tirthapura, Xu, Busch, PODC 2006]: randomized, asynchronous; random sampling
Outline of Talk: Introduction, Algorithm, Analysis
Data stream: elements with timestamps $t_1, \dots, t_6$ on the time axis from 1 to the current time $C$. For simplicity, assume unit-valued elements.
Most recent time window of duration $W$, ending at the current time $C$. Goal: compute the sum of the elements with timestamps in the time window $[C - W, C]$.
Divide time into periods of duration $W$, with boundaries at $1, W, 2W, 3W, 4W, \dots$
The sliding window of duration $W$, from $T$ to the current time $C$, may span at most two time periods.
The sum can be written as two sub-sums, one in each of the two time periods: $S = S_1 + S_2$, with $S_1 = S_{left}$ and $S_2 = S_{right}$.
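The split into periods is a small piece of arithmetic. The sketch below assumes integer timestamps, the window $[C - W, C]$ clipped at time 1, and periods $[(j-1)W + 1, jW]$. The helper names are hypothetical. The window holds $W + 1$ consecutive timestamps, so it can touch at most two periods.

```python
def period_of(t, W):
    """Index j of the period [(j - 1) * W + 1, j * W] containing t >= 1."""
    return (t - 1) // W + 1


def split_window(C, W):
    """Split the window [C - W, C], clipped at time 1, at period boundaries.

    Returns the sub-windows as (lo, hi) pairs, left to right; summing the
    elements in each gives the sub-sums S_1 and S_2 with S = S_1 + S_2.
    """
    lo, hi = max(1, C - W), C
    left, right = period_of(lo, W), period_of(hi, W)
    assert right - left <= 1  # W + 1 consecutive timestamps touch <= 2 periods
    if left == right:
        return [(lo, hi)]
    boundary = left * W  # last timestamp of the left period
    return [(lo, boundary), (boundary + 1, hi)]


if __name__ == "__main__":
    timestamps = [2, 5, 6, 9, 11, 12]
    C, W = 12, 5
    parts = split_window(C, W)  # [(7, 10), (11, 12)]
    sub_sums = [sum(lo <= t <= hi for t in timestamps) for lo, hi in parts]
    print(parts, sub_sums, sum(sub_sums))
```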
Data structure $D_{left}$ maintains an estimate of $S_{left}$ in the left time period, and $D_{right}$ maintains an estimate of $S_{right}$ in the right time period.
Without loss of generality, consider the data structure $D_{left}$ in the time period $[1, W]$, where $S_{left}$ is the sum over $[T, W]$.
Data structure $D_{left}$ consists of various levels $D_1, D_2, \dots, D_{\log_2 L}$, where $L$ is an upper bound on the sum in a period.
Consider level $D_i$. The bucket at level 0 covers the time period $[1, W]$ and counts up to $2^{i+1}$ elements.
Stream: $t_1$, with $1 \le t_1 \le W$. Increase the counter value (to 1).
Stream: $t_1, t_2$, with $1 \le t_2 \le W$. Increase the counter value (to 2).
Stream: $t_1, t_2, t_3$, with $1 \le t_3 \le W$. Increase the counter value (to 3).
Stream: $t_1, t_2, t_3, \dots, t_{2^{i+1}-1}$. Increase the counter value (to $2^{i+1} - 1$).
Stream: $t_1, t_2, t_3, \dots, t_{2^{i+1}-1}, t_{2^{i+1}}$. Counter threshold of $2^{i+1}$ reached: split the bucket $[1, W]$ into $[1, W/2]$ and $[W/2 + 1, W]$, each with counter value $2^i$.
The new buckets also have threshold $2^{i+1}$.
Stream: $\dots, t_{2^{i+1}}, t_{2^{i+1}+1}$. Increase the counter of the appropriate bucket, i.e., the one whose time range contains the new timestamp.
Stream: $\dots, t_{2^{i+1}+1}, t_{2^{i+1}+2}$. Increase the appropriate bucket.
Stream: $\dots, t_{2^{i+1}+2}, t_{2^{i+1}+3}$. Increase the appropriate bucket.
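The insert-and-split behaviour of a single level can be sketched as follows. The sketch assumes integer timestamps in $[1, W]$, a threshold of $2^{i+1}$, and children that each start at $2^i$, as on the slides. The classes `Bucket` and `Level` are hypothetical names, and the cap on the number of kept buckets is left out here.

```python
class Bucket:
    """A bucket covering the timestamps [lo, hi], with an element counter."""

    def __init__(self, lo, hi, count=0):
        self.lo, self.hi, self.count = lo, hi, count

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]: {self.count}"


class Level:
    """Hypothetical sketch of one level D_i over the period [1, W].

    A bucket splits when its counter reaches 2**(i + 1); each child starts
    with 2**i, whatever the timestamps of the counted elements were, since
    those timestamps are not stored.
    """

    def __init__(self, i, W):
        self.threshold = 2 ** (i + 1)
        self.buckets = [Bucket(1, W)]  # kept in time order, left to right

    def insert(self, t):
        # Increase the counter of the bucket whose range contains t
        # (assumes 1 <= t <= W).
        k = next(k for k, b in enumerate(self.buckets) if b.lo <= t <= b.hi)
        b = self.buckets[k]
        b.count += 1
        # Split at the threshold; a bucket of duration 1 cannot be split.
        if b.count >= self.threshold and b.hi > b.lo:
            mid = (b.lo + b.hi) // 2
            half = self.threshold // 2
            self.buckets[k:k + 1] = [Bucket(b.lo, mid, half),
                                     Bucket(mid + 1, b.hi, half)]


if __name__ == "__main__":
    level = Level(i=1, W=8)          # threshold 4, children start at 2
    for t in [1, 2, 3, 4, 6, 7]:
        level.insert(t)
    print(level.buckets)             # [[1, 4]: 2, [5, 6]: 2, [7, 8]: 2]
```

A split preserves the total count, but which child a given element "lands in" is decided by the even split rather than by its timestamp. This is the source of the counting error analysed later.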
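Stream: $\dots, t_m$. Bucket $[W/2 + 1, W]$ reaches the threshold $2^{i+1}$, while $[1, W/2]$ holds counter value $x$. Split bucket: $[W/2 + 1, W]$ becomes $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$, each with counter value $2^i$.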
Stream: $\dots, t_m$. Buckets: $[1, W/2]$ with counter $x$; $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$ with counter $2^i$ each.
Stream: $\dots, t_m, t_{m+1}$. Increase the appropriate bucket.
Stream: $\dots, t_m, t_{m+1}, \dots$. Split bucket: $[W/2 + 1, 3W/4]$ reaches the threshold and becomes $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$, each with counter value $2^i$.
Buckets: $[1, W/2]$, $[W/2 + 1, 5W/8]$, $[5W/8 + 1, 3W/4]$, $[3W/4 + 1, W]$.
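Splitting tree: the root $[1, W]$ (threshold $2^{i+1}$) splits into $[1, W/2]$ and $[W/2 + 1, W]$; $[W/2 + 1, W]$ splits into $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$; $[W/2 + 1, 3W/4]$ splits into $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$. Each split gives both children counter value $2^i$; the leaves are the current buckets.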
Max depth $= \log W$. Leaf buckets of duration 1 are not split any further.
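The depth bound can be checked directly. The sketch assumes the halving rule $[lo, mid]$, $[mid + 1, hi]$ with $mid = \lfloor (lo + hi)/2 \rfloor$, which matches the slides' split points when $W$ is a power of two. The deepest path always follows the larger child, which has $\lceil n/2 \rceil$ timestamps. So the depth is $\lceil \log_2 W \rceil$, which equals $\log W$ for powers of two. The function name is hypothetical.

```python
def max_split_depth(W):
    """Depth of the splitting tree over [1, W].

    Each split turns [lo, hi] into [lo, mid] and [mid + 1, hi] with
    mid = (lo + hi) // 2, so the larger child has ceil(n / 2) timestamps;
    buckets of duration 1 are leaves.
    """
    depth, n = 0, W
    while n > 1:
        n = (n + 1) // 2  # size of the larger child
        depth += 1
    return depth


if __name__ == "__main__":
    for W in (1, 8, 1024, 1000):
        print(W, max_split_depth(W))  # equals ceil(log2 W)
```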
The initial bucket $[1, W]$ (threshold $2^{i+1}$) may be split into many leaf buckets.
Due to space limitations, we only keep the last $\alpha$ leaf buckets, where $\alpha = 2 \log W / \varepsilon$.
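The space cap per level can be sketched in a few lines. This sketch assumes base-2 logarithms and rounds $\alpha = 2 \log W / \varepsilon$ up to an integer. "Last" is taken to mean the most recent buckets in time order. The helper names are hypothetical.

```python
import math


def alpha(W, eps):
    """Buckets kept per level, assuming alpha = 2 * log2(W) / eps, rounded up."""
    return math.ceil(2 * math.log2(W) / eps)


def keep_last(buckets, a):
    """Keep the a most recent buckets; `buckets` is in time order."""
    return buckets[-a:] if a > 0 else []


if __name__ == "__main__":
    a = alpha(W=1024, eps=0.5)                 # 2 * 10 / 0.5 = 40
    leaves = [(t, t) for t in range(1, 101)]   # 100 leaf buckets of duration 1
    kept = keep_last(leaves, a)
    print(a, len(kept), kept[0])               # 40 40 (61, 61)
```

The explicit `a > 0` guard matters because `buckets[-0:]` in Python returns the whole list, not an empty one.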
Suppose we want to find the sum $S$ of the elements in the time period $[T, W]$, within the period $[1, W]$.
Consider the various levels, with splitting thresholds $2^1, 2^2, \dots, 2^k, 2^{k+1}, \dots$; each level keeps its last $\alpha$ buckets.
Find the first level (splitting threshold $2^k$) with a leaf bucket that intersects the timeline at $T$.
Estimate of $S$: $X = x_1 + x_2 + \dots + x_z$, the sum of the counters of the buckets at level $2^k$ on the right of the timeline, with $z \le \alpha$.
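The query can be sketched as follows. It rests on assumptions that the slides do not pin down. First, levels are scanned from the smallest splitting threshold upward. Second, a level qualifies if one of its kept leaf buckets contains $T$. Third, only buckets starting at or after $T$ count as "on the right", so a bucket straddling $T$ is left out. The function name and the `(lo, hi, count)` tuple layout are hypothetical.

```python
def estimate(levels, T):
    """Estimate S, the number of elements with timestamps in [T, W].

    `levels` holds each level's kept leaf buckets as (lo, hi, count) tuples
    in time order, listed from the smallest splitting threshold upward.
    Sketch assumptions: use the first level that has a leaf bucket
    containing T, and add the counters x_1, ..., x_z of its buckets that
    start at or after T (a bucket straddling T is left out).
    """
    for buckets in levels:
        if any(lo <= T <= hi for lo, hi, _ in buckets):
            return sum(count for lo, _, count in buckets if lo >= T)
    return None  # no level's kept buckets reach back to T


if __name__ == "__main__":
    levels = [
        [(7, 7, 1), (8, 8, 1)],               # smaller threshold: kept buckets miss T = 5
        [(1, 4, 2), (5, 6, 2), (7, 8, 2)],    # larger threshold
    ]
    print(estimate(levels, T=5))  # 4, from buckets [5, 6] and [7, 8]
```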
Or: the first level (splitting threshold $2^k$) with a leaf bucket on the right of the timeline.
Outline of Talk: Introduction, Algorithm, Analysis
Suppose that we use the level with splitting threshold $2^{i+1}$ to compute the estimate of $S$ over $[T, W]$.
Stream: $\dots, t_k, \dots$. Consider the level with splitting threshold $2^{i+1}$. A data element with timestamp $t_k$ is counted in the appropriate bucket $b = [t_l, t_r]$: $x_b \leftarrow x_b + 1$.
We can assume that the element is placed in the respective bucket, i.e., $t_l \le t_k \le t_r$.
When the bucket $[t_l, t_r]$ reaches the threshold $2^{i+1}$, it splits into $[t_l, (t_l + t_r)/2]$ and $[(t_l + t_r)/2 + 1, t_r]$, each with counter value $2^i$. We can assume that the element is placed in an arbitrary child bucket.
Suppose the element is placed in the left child. If $t_l \le t_k \le (t_l + t_r)/2$: GOOD! The element is counted in the correct bucket.
Suppose the element is placed in the left child. If $(t_l + t_r)/2 + 1 \le t_k \le t_r$: BAD! The element is counted in the wrong bucket.
Consider the leaf buckets of the period $[1, W]$ and an element $t_k$ counted in a leaf bucket on the right of the timeline $T$. If $T \le t_k \le W$: GOOD!
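Consider the leaf buckets of the period $[1, W]$ and an element $t_k$ counted in a leaf bucket on the right of the timeline $T$. If $t_k < T$: BAD! The element is counted in the wrong bucket.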
Consider the leaf buckets: $X = S + |Z_1| - |Z_2|$, where $Z_1$ is the set of elements of the left part ($t_k < T$) counted on the right, and $Z_2$ is the set of elements of the right part ($T \le t_k \le W$) counted on the left.
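The bookkeeping behind this identity can be checked on a toy example. This is a minimal sketch with assumptions. Elements are unit-valued with timestamps in $[1, W]$. Each element is paired with the leaf bucket it ended up counted in. A bucket is "on the right" when it starts at or after $T$. The function name `decompose` is hypothetical.

```python
def decompose(placements, T, W):
    """Split the error of X into the sets Z_1 and Z_2 (unit-valued elements).

    `placements` lists (t_k, (lo, hi)) pairs: an element's true timestamp,
    assumed to lie in [1, W], and the leaf bucket it ended up counted in.
    A bucket with lo >= T is on the right of the timeline T.
    """
    def on_right(bucket):
        return bucket[0] >= T

    X = sum(1 for _, b in placements if on_right(b))
    S = sum(1 for t, _ in placements if T <= t <= W)
    Z1 = [t for t, b in placements if t < T and on_right(b)]       # left part, counted right
    Z2 = [t for t, b in placements if t >= T and not on_right(b)]  # right part, counted left
    return X, S, Z1, Z2


if __name__ == "__main__":
    placements = [(2, (5, 8)), (6, (1, 4)), (7, (5, 8)), (3, (1, 4))]
    X, S, Z1, Z2 = decompose(placements, T=5, W=8)
    print(X, S, Z1, Z2, X == S + len(Z1) - len(Z2))  # 2 2 [2] [6] True
```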
Take $t_k \in Z_1$: an element of the left part ($1 \le t_k < T$) counted on the right. It must have been initially inserted in one of the buckets on the splitting-tree path whose time ranges contain $T$.
Since the tree depth is at most $\log W$, and each bucket on that path holds at most $2^{i+1}$ elements when it splits, $|Z_1| = O(2^i \log W)$.
Similarly, we can prove $|Z_2| = O(2^i \log W)$. Therefore: $|X - S| = \big||Z_1| - |Z_2|\big| = O(2^i \log W)$.
Since $\alpha = 2 \log W / \varepsilon$, it can be proven that $S = \Omega(2^i \log W / \varepsilon)$.
Combined with $|X - S| = O(2^i \log W)$, we obtain the relative error $\frac{|X - S|}{S} \le \varepsilon$.
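The analysis can be collected into one chain, starting from the decomposition $X = S + |Z_1| - |Z_2|$ over the leaf buckets. Taken literally, the asymptotic bounds give a relative error of $O(\varepsilon)$. Reaching exactly $\varepsilon$ depends on the constants hidden in $O$ and $\Omega$, which the slides do not spell out.

```latex
\begin{aligned}
X &= S + |Z_1| - |Z_2| \\
|X - S| &= \big||Z_1| - |Z_2|\big| \le \max(|Z_1|, |Z_2|) = O(2^i \log W) \\
S &= \Omega\!\left(\frac{2^i \log W}{\varepsilon}\right) \\
\frac{|X - S|}{S} &= O\!\left(\frac{2^i \log W}{2^i \log W / \varepsilon}\right) = O(\varepsilon)
\end{aligned}
```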