A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window: Costas Busch, Rensselaer Polytechnic Institute (PowerPoint presentation)


SLIDE 1

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

Costas Busch

Rensselaer Polytechnic Institute

Srikanta Tirthapura

Iowa State University

SLIDE 2

Outline of Talk

  • Introduction
  • Algorithm
  • Analysis

SLIDE 3

Data stream: elements $(v_1, t_1), (v_2, t_2), \ldots, (v_5, t_5)$ placed on a time line, where $v_i$ is the value and $t_i$ the time stamp of the $i$-th element. For simplicity, assume unit-valued elements.

SLIDE 4

Current time: $C$. Most recent time window of duration $W$: $[C - W, C]$.

Goal: compute the sum of the elements with time stamps in the time window:

$$\sum_{i \,:\, C - W \le t_i \le C} v_i$$

Data stream: $(v_1, t_1), (v_2, t_2), \ldots, (v_5, t_5)$
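The goal can be stated as a brute-force reference that rescans every element; the algorithm in this talk exists precisely to avoid storing and rescanning the stream. A minimal Python sketch, assuming elements are given as hypothetical `(value, timestamp)` pairs:

```python
from typing import Iterable, Tuple


def exact_window_sum(stream: Iterable[Tuple[int, int]], c: int, w: int) -> int:
    """Exact sum of v_i over all elements (v_i, t_i) with C - W <= t_i <= C.

    Re-reads the whole stream on every query, so it is only a correctness
    reference, not a one-pass, small-space streaming algorithm.
    """
    return sum(v for v, t in stream if c - w <= t <= c)


# Unit-valued elements (v_i = 1) listed in arrival order.
stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 7)]
print(exact_window_sum(stream, c=12, w=5))  # window [7, 12] holds t = 9, 12, 7 -> 3
```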

SLIDE 5

Example I: all packets on a network link; maintain the number of distinct IP sources in the last hour.

Example II: a large database; continuously maintain averages and frequency moments.

SLIDE 6

Data stream: $(v_1, t_1), (v_2, t_2), \ldots, (v_5, t_5)$

  • Synchronous stream: the time stamps $t_i$ arrive in ascending order.
  • Asynchronous stream: no order of the time stamps $t_i$ is guaranteed.
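The distinction concerns only the arrival order of the time stamps. A small Python check, assuming the same hypothetical `(value, timestamp)` pairs and treating equal consecutive time stamps as still in order (our choice):

```python
from typing import List, Tuple


def is_synchronous(stream: List[Tuple[int, int]]) -> bool:
    """True if the time stamps t_i arrive in ascending order (ties allowed)."""
    return all(stream[k][1] <= stream[k + 1][1] for k in range(len(stream) - 1))


sync_stream = [(1, 1), (1, 2), (1, 4), (1, 7)]
async_stream = [(1, 2), (1, 7), (1, 1), (1, 4)]  # same elements, different arrival order
print(is_synchronous(sync_stream), is_synchronous(async_stream))  # True False
```

Both streams contain the same elements, so every window sum is identical; only an algorithm that relies on increasing time stamps can tell them apart.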

SLIDE 7

Why Asynchronous Data Streams?

  • Network delay and multi-path routing: a synchronous stream sent through a network may arrive as an asynchronous stream.
  • Merging without control: synchronous streams merged without coordination form an asynchronous stream.

SLIDE 8

Processing Requirements:

  • One-pass processing
  • Small workspace: poly-logarithmic in the size of the data
  • Fast processing time per element
  • Approximate answers are OK
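SLIDE 9

Our results: a deterministic data aggregation algorithm.

  • Time per element: $O(\log W \log B)$
  • Space: $O\!\left(\frac{1}{\epsilon} \log W \log B \, (\log W + \log B)\right)$
  • Relative error: $|X - S| \le \epsilon S$

Here $S$ is the exact sum, $X$ the estimate, $W$ the window duration and $B$ an upper bound on the sum.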

SLIDE 10

Previous Work:

  • [Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002]: deterministic, synchronous streams; merging buckets.
  • [Tirthapura, Xu, Busch. PODC, 2006]: randomized, asynchronous streams; random sampling.

SLIDE 11

Outline of Talk

  • Introduction
  • Algorithm
  • Analysis

SLIDE 12

Data stream: elements with time stamps $t_1, t_2, \ldots, t_6$ on a time line; current time $C$. For simplicity, assume unit-valued elements.

SLIDE 13

Current time $C$; most recent time window of duration $W$.

Goal: compute the sum of the elements with time stamps in the time window $[C - W, C]$.

Data stream: $t_1, t_2, \ldots, t_6$

SLIDE 14

Divide time into periods of duration $W$: $[1, W]$, $[W + 1, 2W]$, $[2W + 1, 3W]$, $[3W + 1, 4W]$, ...

SLIDE 15

The sliding window $[C - W, C]$, of duration $W$, may span at most two time periods. $T$ marks the left end of the sliding window.

SLIDE 16

The sum over the sliding window can be written as two sub-sums, one for each of the two time periods:

$$S = S_{\mathrm{left}} + S_{\mathrm{right}}$$
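With integer time stamps the split point between the two sub-sums is easy to compute. A Python sketch, assuming periods of exactly $W$ time stamps starting at 1 (as on slide 14) and using the hypothetical helper names `split_window` and `window_sums`:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [lo, hi] of integer time stamps


def split_window(c: int, w: int) -> Tuple[Interval, Interval]:
    """Split the window [C - W, C] at the boundary between periods [(p-1)W+1, pW].

    Assumes integer time stamps and C - W >= 1. Because C - W and C are exactly
    W apart, the two ends always fall in consecutive periods.
    """
    t = c - w                 # T: left end of the window
    p = -(-t // w)            # ceil(T / W): index of the period containing T
    boundary = p * w          # last time stamp of that period
    return (t, boundary), (boundary + 1, c)


def window_sums(stream: List[Tuple[int, int]], c: int, w: int) -> Tuple[int, int]:
    """Exact (S_left, S_right) for a stream of (value, timestamp) pairs."""
    (l_lo, l_hi), (r_lo, r_hi) = split_window(c, w)
    s_left = sum(v for v, t in stream if l_lo <= t <= l_hi)
    s_right = sum(v for v, t in stream if r_lo <= t <= r_hi)
    return s_left, s_right


stream = [(1, 6), (1, 7), (1, 8), (1, 9), (1, 11), (1, 12)]
print(split_window(11, 4))         # ((7, 8), (9, 11))
print(window_sums(stream, 11, 4))  # (2, 2), so S = 2 + 2 = 4
```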

SLIDE 17

$D_{\mathrm{left}}$: data structure that maintains an estimate of $S_{\mathrm{left}}$ in the left time period. Similarly, $D_{\mathrm{right}}$ maintains an estimate of $S_{\mathrm{right}}$ in the right time period.

SLIDE 18

Without loss of generality, consider the data structure $D_{\mathrm{left}}$ in the time period $[1, W]$.

SLIDE 19

The data structure $D_{\mathrm{left}}$ consists of levels $D_1, D_2, \ldots, D_L$, where $2^L$ is an upper bound on the sum in a period.

SLIDE 20

Consider level $D_i$. At level $i$, the bucket for the time period $[1, W]$ counts up to $2^{i-1}$ elements.

SLIDE 21

Stream: $t_1$, with $1 \le t_1 \le W$. Increase the counter value of bucket $[1, W]$: counter $= 1$.

SLIDE 22

Stream: $t_1, t_2$, with $1 \le t_2 \le W$. Increase the counter value: counter $= 2$.

SLIDE 23

Stream: $t_1, t_2, t_3$, with $1 \le t_3 \le W$. Increase the counter value: counter $= 3$.

SLIDE 24

Stream: $t_1, t_2, t_3, \ldots, t_{2^{i-1}-1}$, with $1 \le t_{2^{i-1}-1} \le W$. Increase the counter value: counter $= 2^{i-1} - 1$.

SLIDE 25

Stream: $t_1, t_2, \ldots, t_{2^{i-1}}$, with $1 \le t_{2^{i-1}} \le W$. The counter threshold $2^{i-1}$ is reached.

Split the bucket $[1, W]$ into $[1, W/2]$ and $[W/2 + 1, W]$, each starting with counter value $2^{i-2}$.

SLIDE 26

The new buckets $[1, W/2]$ and $[W/2 + 1, W]$ (counter values $2^{i-2}$ each) also have threshold $2^{i-1}$.

SLIDE 27

Stream: $t_1, \ldots, t_{2^{i-1}}, t_{2^{i-1}+1}$, with $W/2 + 1 \le t_{2^{i-1}+1} \le W$.

Increase the appropriate bucket: $[W/2 + 1, W]$ now holds $2^{i-2} + 1$, while $[1, W/2]$ stays at $2^{i-2}$.

SLIDE 28

Stream: $t_1, \ldots, t_{2^{i-1}+1}, t_{2^{i-1}+2}$, with $W/2 + 1 \le t_{2^{i-1}+2} \le W$.

Increase the appropriate bucket: $[W/2 + 1, W]$ now holds $2^{i-2} + 2$.

SLIDE 29

Stream: $t_1, \ldots, t_{2^{i-1}+2}, t_{2^{i-1}+3}$, with $1 \le t_{2^{i-1}+3} \le W/2$.

Increase the appropriate bucket: $[1, W/2]$ now holds $2^{i-2} + 1$ and $[W/2 + 1, W]$ holds $2^{i-2} + 2$.

SLIDE 30

Stream: $t_1, \ldots, t_m$, with $W/2 + 1 \le t_m \le W$. Bucket $[1, W/2]$ holds counter value $x_1$.

Bucket $[W/2 + 1, W]$ reaches its threshold $2^{i-1}$: split it into $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$.

SLIDE 31

After the split: $[1, W/2]$ holds $x_1$; the new buckets $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$ hold $2^{i-2}$ each and have threshold $2^{i-1}$.

SLIDE 32

Stream: $t_1, \ldots, t_m, t_{m+1}$, with $W/2 + 1 \le t_{m+1} \le 3W/4$.

Increase the appropriate bucket: $[W/2 + 1, 3W/4]$ now holds $2^{i-2} + 1$.

SLIDE 33

Later in the stream, bucket $[W/2 + 1, 3W/4]$ reaches its threshold $2^{i-1}$: split it into $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$. Buckets $[1, W/2]$ and $[3W/4 + 1, W]$ hold $x_1$ and $x_4$.

SLIDE 34

Leaf buckets after the split: $[1, W/2]$ ($x_1$), $[W/2 + 1, 5W/8]$, $[5W/8 + 1, 3W/4]$, $[3W/4 + 1, W]$ ($x_4$).

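SLIDE 35

Splitting Tree (every bucket has threshold $2^{i-1}$):

  • $[1, W]$ splits into $[1, W/2]$ and $[W/2 + 1, W]$
  • $[W/2 + 1, W]$ splits into $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$
  • $[W/2 + 1, 3W/4]$ splits into $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$

Leaf buckets and their counter values: $[1, W/2]$: $x_1$; $[W/2 + 1, 5W/8]$: $x_2$; $[5W/8 + 1, 3W/4]$: $x_3$; $[3W/4 + 1, W]$: $x_4$, with $2^{i-2} \le x_k < 2^{i-1}$.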
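The splitting process of the last few slides can be simulated for a single level. This is a minimal Python sketch, not the authors' code: it assumes $W$ is a power of two, keeps only the leaf buckets (as a sorted list of `[lo, hi, count]`) instead of an explicit tree, and hands half of a splitting bucket's count to each child, as in the example where both new buckets start at $2^{i-2}$. `SplittingLevel` is a hypothetical name.

```python
import bisect
from typing import List


class SplittingLevel:
    """Hypothetical sketch of one level D_i of D_left over the period [1, W].

    Assumptions (ours): W is a power of two, leaf buckets are stored as a sorted
    list of [lo, hi, count], and a splitting bucket gives count // 2 to its left
    child and the rest to its right child.
    """

    def __init__(self, w: int, i: int):
        self.threshold = 2 ** (i - 1)            # splitting threshold of level i
        self.leaves: List[List[int]] = [[1, w, 0]]

    def insert(self, t: int) -> None:
        """Count one unit-valued element with time stamp t, 1 <= t <= W."""
        los = [leaf[0] for leaf in self.leaves]
        k = bisect.bisect_right(los, t) - 1      # leaf whose interval contains t
        self.leaves[k][2] += 1                   # increase the counter value
        lo, hi, count = self.leaves[k]
        if count >= self.threshold and hi > lo:  # threshold reached: split bucket
            mid, half = (lo + hi) // 2, count // 2
            self.leaves[k:k + 1] = [[lo, mid, half], [mid + 1, hi, count - half]]
        # leaves of duration 1 (lo == hi) are never split


level = SplittingLevel(w=8, i=3)                 # threshold 2^(3-1) = 4
for t in [2, 3, 6, 7, 5, 8]:
    level.insert(t)
print(level.leaves)  # [[1, 4, 2], [5, 6, 2], [7, 8, 2]]
```

Handing each child half of the count is where the "arbitrary child" assumption of the analysis enters: with threshold 1 (level $i = 1$), the very first element always ends up in the right child, whatever its time stamp.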

SLIDE 36

Leaf buckets of duration 1 are not split any further. Maximum depth of the splitting tree $= \log W$.

SLIDE 37

The initial bucket $[1, W]$ (threshold $2^{i-1}$) may be split into many leaf buckets.

SLIDE 38

Due to space limitations, we keep only the last $\alpha$ leaf buckets, where

$$\alpha = \frac{2 \log W}{\epsilon}$$
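The pruning rule can be sketched in Python, assuming leaf buckets are stored as `[lo, hi, count]` lists, reading $\log$ as $\log_2$ and rounding $\alpha$ up (both our assumptions). The `cutoff` value is our own bookkeeping for remembering which time stamps a level can no longer count; `alpha` and `keep_last` are hypothetical names.

```python
import math
from typing import List, Tuple


def alpha(w: int, eps: float) -> int:
    """Number of leaf buckets kept per level: 2 log W / eps, rounded up (W >= 2)."""
    return max(1, math.ceil(2 * math.log2(w) / eps))


def keep_last(leaves: List[List[int]], a: int) -> Tuple[List[List[int]], int]:
    """Keep only the a most recent leaf buckets [lo, hi, count].

    Returns the kept leaves and a cutoff: from now on, elements with time stamp
    <= cutoff cannot be counted at this level, because their buckets were discarded.
    """
    leaves = sorted(leaves)                  # leaves cover disjoint, ordered intervals
    dropped, kept = leaves[:-a], leaves[-a:]
    cutoff = dropped[-1][1] if dropped else 0
    return kept, cutoff


print(alpha(1024, 0.5))                                  # ceil(2 * 10 / 0.5) = 40
print(keep_last([[1, 4, 2], [5, 6, 2], [7, 8, 2]], 2))   # ([[5, 6, 2], [7, 8, 2]], 4)
```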

SLIDE 39

Suppose we want to find the sum $S$ of the elements in the time period $[T, W]$.

SLIDE 40

Consider the various levels, of splitting thresholds $2^1, 2^2, \ldots, 2^{k-1}, 2^k$, each keeping its last $\alpha$ leaf buckets.

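SLIDE 41

At the levels of thresholds $2^1, 2^2, \ldots, 2^{k-1}$, the $\alpha$ kept leaf buckets do not reach $T$. The level of threshold $2^k$ is the first level with a leaf bucket that intersects the timeline at $T$.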

SLIDE 42

Estimate of $S$: consider the leaf buckets of level $2^k$ on the right of the timeline $T$, with counter values $x_1, x_2, \ldots, x_z$, where $z \le \alpha$. Then

$$X = x_1 + x_2 + \cdots + x_z$$

SLIDE 43

OR: the level of threshold $2^k$ is the first level with a leaf bucket on the right of the timeline $T$.
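Selecting the level and forming $X$ can be sketched as follows. These are hypothetical Python helpers that take, for each level (ordered by increasing threshold), its kept leaf buckets as `[lo, hi, count]` lists. Two choices are ours: a level "reaches" $T$ when its leftmost kept leaf starts at or before $T$ (covering both cases above), and a leaf counts as "on the right of $T$" only when it starts at or after $T$.

```python
from typing import List, Optional

Leaf = List[int]  # [lo, hi, count]


def choose_level(levels: List[List[Leaf]], t: int) -> Optional[int]:
    """Index of the first level (thresholds increasing) whose kept leaves reach back to T."""
    for idx, kept in enumerate(levels):
        if kept and min(leaf[0] for leaf in kept) <= t:
            return idx
    return None


def estimate(kept: List[Leaf], t: int) -> int:
    """X = x_1 + ... + x_z over the kept leaf buckets on the right of T (z <= alpha)."""
    return sum(count for lo, _, count in kept if lo >= t)


levels = [
    [[9, 12, 3], [13, 16, 4]],   # smaller threshold: kept leaves start after T
    [[1, 8, 5], [9, 16, 6]],     # larger threshold: first level reaching T = 6
]
i = choose_level(levels, t=6)
print(i, estimate(levels[i], t=6))  # 1 6
```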

SLIDE 44

Outline of Talk

  • Introduction
  • Algorithm
  • Analysis

SLIDE 45

Suppose that we use the level of splitting threshold $2^{i-1}$ in order to compute the estimate of $S$ over $[T, W]$.

SLIDE 46

Stream: $\ldots, t_k, \ldots$

Consider the level of splitting threshold $2^{i-1}$. A data element $t_k$ is counted in the appropriate bucket $[t_l, t_r]$.

SLIDE 47

We can assume that the element is placed in the respective bucket $[t_l, t_r]$, where $t_l \le t_k \le t_r$.

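SLIDE 48

We can assume that when the bucket $[t_l, t_r]$ splits (its threshold $2^{i-1}$ is reached), the element $t_k$ is placed in an arbitrary child bucket:

$$\left[t_l, \tfrac{t_l + t_r - 1}{2}\right] \quad \text{or} \quad \left[\tfrac{t_l + t_r + 1}{2}, t_r\right]$$

Each child starts with counter value $2^{i-2}$.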

SLIDE 49

If the element is placed in the left child and

$$t_l \le t_k \le \tfrac{t_l + t_r - 1}{2}$$

GOOD! The element is counted in the correct bucket.

SLIDE 50

If the element is placed in the left child but

$$\tfrac{t_l + t_r + 1}{2} \le t_k \le t_r$$

BAD! The element is counted in the wrong bucket.
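The two cases can be checked mechanically. A Python sketch, assuming bucket durations are even (true when $W$ is a power of two and the bucket is longer than 1); `children` and `counted_correctly` are hypothetical helper names:

```python
from typing import Tuple

Interval = Tuple[int, int]


def children(t_l: int, t_r: int) -> Tuple[Interval, Interval]:
    """Split [t_l, t_r] into [t_l, (t_l+t_r-1)/2] and [(t_l+t_r+1)/2, t_r]."""
    if (t_r - t_l + 1) % 2:
        raise ValueError("bucket duration must be even")
    mid = (t_l + t_r - 1) // 2
    return (t_l, mid), (mid + 1, t_r)


def counted_correctly(t_k: int, child: Interval) -> bool:
    """GOOD (True) if t_k lies in the child the element was assigned to, BAD otherwise."""
    lo, hi = child
    return lo <= t_k <= hi


left, right = children(5, 8)
print(left, right)                                             # (5, 6) (7, 8)
print(counted_correctly(6, left), counted_correctly(7, left))  # True False
```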

SLIDE 51

Consider the leaf buckets, and an element $t_k$ counted in a leaf bucket on the right of $T$. If

$$T \le t_k \le W$$

GOOD!

SLIDE 52

Consider the leaf buckets. If

$$1 \le t_k < T$$

BAD! The element is counted in the wrong bucket.

SLIDE 53

Consider the leaf buckets:

$$|X - S| \le |Z_1| + |Z_2|$$

  • $Z_1$: elements of the left part counted on the right
  • $Z_2$: elements of the right part counted on the left

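SLIDE 54

$Z_1$: elements of the left part counted on the right. Such an element $t_k < T$ must have been initially inserted in one of the buckets whose interval contains $T$, that is, an ancestor of the leaf bucket at $T$ in the splitting tree.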

SLIDE 55

Since the tree depth is at most $\log W$, and each of these buckets counts at most $2^{i-1}$ elements before it splits:

$$|Z_1| = O(2^i \log W)$$

SLIDE 56

Since the tree depth is at most $\log W$:

$$|Z_1| = O(2^i \log W)$$

Similarly, we can prove:

$$|Z_2| = O(2^i \log W)$$

Therefore:

$$|X - S| \le |Z_1| + |Z_2| = O(2^i \log W)$$

SLIDE 57

Since $\alpha = 2 \log W / \epsilon$, it can be proven that

$$S = \Omega\!\left(\frac{2^i \log W}{\epsilon}\right)$$

SLIDE 58

Since $\alpha = 2 \log W / \epsilon$, it can be proven that

$$S = \Omega\!\left(\frac{2^i \log W}{\epsilon}\right)$$

Combined with

$$|X - S| = O(2^i \log W)$$

we obtain the relative error:

$$|X - S| \le \epsilon S$$
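Putting the pieces together, here is a hedged end-to-end sketch of $D_{\mathrm{left}}$ in Python. It is our reconstruction under stated assumptions, not the authors' implementation: unit-valued elements, $W$ a power of two, $\log = \log_2$, counts split evenly between children, leaves kept in sorted lists, and one spare level whose threshold exceeds $B$. `LeftPeriodSketch` and its methods are hypothetical names, and the per-element cost here is $O(\alpha)$ per level rather than the $O(\log W)$ per level of the talk.

```python
import bisect
import math
from typing import Optional


class LeftPeriodSketch:
    """Hypothetical sketch of D_left over the period [1, W] (not the authors' code).

    Level i = 1..L splits a leaf bucket once its counter reaches 2^(i-1) and keeps
    only its last alpha = ceil(2 log2(W) / eps) leaf buckets. Assumes unit-valued
    elements with 1 <= t <= W, W a power of two, and total count <= B.
    """

    def __init__(self, w: int, b: int, eps: float):
        self.alpha = max(1, math.ceil(2 * math.log2(w) / eps))
        # One spare level: the top threshold 2^(L-1) exceeds B, so that level never splits.
        num_levels = b.bit_length() + 1
        self.levels = [
            {"thr": 2 ** (i - 1), "leaves": [[1, w, 0]], "cutoff": 0}
            for i in range(1, num_levels + 1)
        ]

    def insert(self, t: int) -> None:
        for lv in self.levels:
            if t <= lv["cutoff"]:
                continue  # the bucket that would count t was discarded at this level
            leaves = lv["leaves"]
            k = bisect.bisect_right([leaf[0] for leaf in leaves], t) - 1
            leaves[k][2] += 1  # increase the counter of the leaf containing t
            lo, hi, count = leaves[k]
            if count >= lv["thr"] and hi > lo:  # threshold reached: split in halves
                mid, half = (lo + hi) // 2, count // 2
                leaves[k:k + 1] = [[lo, mid, half], [mid + 1, hi, count - half]]
            if len(leaves) > self.alpha:  # keep only the last alpha leaf buckets
                lv["cutoff"] = leaves.pop(0)[1]

    def query(self, t: int) -> Optional[int]:
        """Estimate X of the sum over [T, W], with T = t, from the first level reaching T."""
        for lv in self.levels:
            if lv["leaves"][0][0] <= t:
                return sum(c for lo, _, c in lv["leaves"] if lo >= t)
        return None


sketch = LeftPeriodSketch(w=8, b=8, eps=1.0)   # alpha = ceil(2 * 3 / 1) = 6
for t in [8, 7, 8, 6, 5]:                        # asynchronous arrival order
    sketch.insert(t)
print(sketch.query(5))                           # 5, equal to the exact sum over [5, 8]
```

With $\epsilon = 2$ the same stream gives $\alpha = 3$ and an estimate of 4 instead of 5: when level 2 split $[1, 8]$, one of the first two elements (both with time stamps in $[5, 8]$) was handed to the left child $[1, 4]$, which was later discarded.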