Distributed Online Tracking, by Mingwang Tang, Feifei Li, and Yufei Tao (PowerPoint presentation)



slide-1
SLIDE 1

1

Distributed Online Tracking

Mingwang Tang, Feifei Li, Yufei Tao

slide-4
SLIDE 4

2

Motivation and challenge

Observation: Large volumes of distributed data are generated continuously in many application domains, e.g., sensor networks and location-based services (LBS).

Challenge: How can a function computed over distributed values be tracked continuously in an online fashion?

slide-6
SLIDE 6

3

A concrete example: SAMOS Project

Observation 1: Each ship observes a sequence of values over time, e.g., wind direction, wind speed, sound speed.

Observation 2: Tracking a user-defined function (e.g., the max wind speed) exactly at every time instance over such distributed data is communication-expensive.

slide-7
SLIDE 7

4

At the University of Utah: the MesoWest project

Close to 40,000 stations across the US (http://mesowest.org)

More than 10 billion readings and counting

slide-12
SLIDE 12

5

Problem formulation

m distributed observers {s_1, ..., s_m}, connected to a tracker T through a network topology.

f_i(t) represents the value observed at s_i at time t.

f(t) = f(f_1(t), f_2(t), ..., f_m(t)) for some function f at tracker T.

Tracking f(t) exactly is expensive!

Tracker T: maintain g(t) ∈ [f(t) − ∆, f(t) + ∆] for some error threshold ∆ at every time instance t.
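To make the tracking guarantee concrete, the following Python sketch checks whether a candidate series g(t) stays within ∆ of f(t) at every time instance. The function names `aggregate_series` and `satisfies_guarantee`, and the sample readings, are illustrative assumptions and do not come from the slides.

```python
from typing import Callable, List, Sequence


def aggregate_series(
    streams: Sequence[Sequence[float]],
    func: Callable[[Sequence[float]], float] = max,
) -> List[float]:
    """Return f(t) = func(f_1(t), ..., f_m(t)) for every time instance t."""
    return [func(values_at_t) for values_at_t in zip(*streams)]


def satisfies_guarantee(
    g: Sequence[float], f: Sequence[float], delta: float
) -> bool:
    """Check that g(t) lies in [f(t) - delta, f(t) + delta] for all t."""
    return all(abs(gt - ft) <= delta for gt, ft in zip(g, f))


# Two hypothetical observers, three time instances each.
streams = [[3, 4, 9], [5, 5, 6]]
f = aggregate_series(streams)                    # f = max over observers
print(f, satisfies_guarantee([5, 5, 8], f, delta=1))
```

A tracker that never updates g(t) would fail the check as soon as f(t) moves by more than ∆, which is why some communication is unavoidable.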

slide-14
SLIDE 14

6

Some concrete instantiations

Base case: centralized setting

A chain topology

slide-15
SLIDE 15

7

Some concrete instantiations

a broom topology

slide-16
SLIDE 16

7

Some concrete instantiations

a simple tree topology

slide-17
SLIDE 17

7

Some concrete instantiations

a general tree topology

slide-27
SLIDE 27

8

State-of-the-art result

Ke Yi and Qin Zhang: Multi-Dimensional Online Tracking. SODA 2009.

Centralized setting: a tracker with one observer.

Naive method: unbounded competitive ratio with respect to the number of messages sent by the optimal offline method.

OptTrack method: OptTrack is an O(log ∆)-competitive online algorithm to track any f : Z → Z within ∆.

[Figures: plots of f(t) and g(t) over time t; labels f(0), "> ∆", C, C + ∆, C − ∆, S, ∆.]
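The gap between the two methods can be illustrated with a small Python sketch. It rests on two assumptions that the slides do not spell out: that the naive method resends f(t) whenever it differs from the last sent value by more than ∆, and that a median-based tracker keeps an interval S of candidate values and sends its midpoint. The second function is only a sketch in the spirit of the S and ∆ labels in the figure; it should not be read as the exact OptTrack algorithm. All function names are hypothetical.

```python
def naive_track(values, delta):
    """Assumed naive rule: resend f(t) once it drifts more than delta."""
    messages, g = 0, None
    for f in values:
        if g is None or abs(f - g) > delta:
            g = f
            messages += 1
    return messages


def median_track(values, delta):
    """Assumed median rule: keep a candidate interval S = [lo, hi],
    send its midpoint, and shrink S whenever the guarantee breaks."""
    messages, g = 0, None
    lo = hi = None
    for f in values:
        if g is not None and abs(f - g) <= delta:
            continue                      # g is still valid, stay silent
        if lo is not None:
            lo, hi = max(lo, f - delta), min(hi, f + delta)
        if lo is None or lo > hi:         # S is empty: start a new round
            lo, hi = f - delta, f + delta
        g = (lo + hi) // 2                # midpoint of the remaining S
        messages += 1
    return messages


# f(t) oscillates between C - delta and C + delta with C = 100.
oscillating = [90, 110] * 10
print(naive_track(oscillating, 10), median_track(oscillating, 10))
```

On this input the naive rule sends a message at every step, while an offline method could send the constant C = 100 once. Lengthening the oscillation makes the naive method's ratio grow without bound.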

slide-34
SLIDE 34

9

Chain online tracking: f : Z⁺ → Z

Setting: an observer with f(t), h relay nodes, and a tracker that maintains g(t) = g_{h+1}(t) with g(t) ∈ [f(t) − ∆, f(t) + ∆].

ChainTrackA: distributes ∆ evenly among the h + 1 centralized tracking instances, i.e., ∆/(h + 1) each. Unbounded competitive ratio.

ChainTrackR: distributes ∆ randomly among the h + 1 centralized tracking instances, with $\sum_{i=1}^{h+1} \Delta_i = \Delta$. Unbounded competitive ratio.

ChainTrackO: achieves an O(log ∆) competitive ratio.

Competitive ratio: w.r.t. the offline optimal method.
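How the error budget is split along the chain affects the message count, as the following Python sketch suggests. It makes simplifying assumptions: each hop uses a naive threshold rule rather than a full centralized tracking instance, and the "all budget at the first hop" allocation is only one possible reading of a single ∆ placed at the observer. The function names `hop` and `chain` are hypothetical.

```python
def hop(stream, delta):
    """One link: forward a value when it differs from the last forwarded
    value by more than delta. Returns the receiver's view and the
    number of messages sent on this link."""
    view, g, messages = [], None, 0
    for v in stream:
        if g is None or abs(v - g) > delta:
            g = v
            messages += 1
        view.append(g)
    return view, messages


def chain(values, deltas):
    """Run one hop per entry of deltas; the per-hop errors add up, so
    the tracker's error is at most sum(deltas)."""
    stream, total = values, 0
    for d in deltas:
        stream, sent = hop(stream, d)
        total += sent
    return stream, total


values = list(range(13))                 # a steadily rising signal
even_view, even_cost = chain(values, [2, 2, 2])   # budget split evenly
edge_view, edge_cost = chain(values, [6, 0, 0])   # budget at first hop
print(even_cost, edge_cost)
```

Both allocations keep the tracker within ∆ = 6 of the signal, but on this input the even split sends 15 messages while the first-hop allocation sends 6.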

slide-37
SLIDE 37

10

Distributed: good competitive ratio is impossible!

Consider f = max: f(t) = max{f_1(t), ..., f_m(t)}.

[Figure: the broom model and the general-tree model; observers s_i, h relay nodes, and a tracker T that maintains g(t) ∈ [f(t) − ∆, f(t) + ∆].]

Observation: by knowing the future of the data at all sites, an offline method can "communicate" between leaf nodes for free.

slide-40
SLIDE 40

11

Performance metric of an online algorithm

Centralized and chain model (competitive ratio):

$$\mathrm{cratio}(A) = \max_{I \in \mathcal{I}} \frac{\mathrm{cost}(A, I)}{\mathrm{cost}(\mathrm{offline}, I)}$$

Tree model (competitive ratio for a class $\mathcal{A}$ of algorithms, where $A^*_I$ is the optimal algorithm on an input $I$):

$$\mathrm{ratio}(A, I) = \frac{\mathrm{cost}(A, I)}{\mathrm{cost}(A^*_I, I)}, \qquad \mathrm{ratio}(A) = \max_{I \in \mathcal{I}} \mathrm{ratio}(A, I)$$

Inspired by the concept of instance optimality: R. Fagin, A. Lotem, and M. Naor. Optimal Aggregation Algorithms for Middleware. J. Comput. Syst. Sci., 66(4):614–656, 2003.
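The class-relative ratio can be computed from a table of costs, as in this Python sketch. The cost table is made up for illustration, and the function name `class_ratios` is hypothetical. The sketch assumes the class is finite and that the best algorithm on each input is simply the cheapest entry in the table.

```python
def class_ratios(costs):
    """costs maps an algorithm name to its cost on each input instance.
    For each algorithm A, return max over inputs I of
    cost(A, I) / cost(best algorithm in the class on I, I)."""
    names = list(costs)
    num_inputs = len(costs[names[0]])
    best = [min(costs[a][i] for a in names) for i in range(num_inputs)]
    return {
        a: max(costs[a][i] / best[i] for i in range(num_inputs))
        for a in names
    }


# Two hypothetical algorithms evaluated on two hypothetical inputs.
print(class_ratios({"A": [4, 6], "B": [2, 18]}))
```

Note that the reference point changes per input: B is best on the first input and A on the second, so neither algorithm attains ratio 1 here.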
slide-41
SLIDE 41

12

A_broom: a class of online algorithms for the broom model

  • Every node u keeps a value y_u(t), which represents the knowledge of u about f(t) in the subtree rooted at u at time t.
  • Each leaf node u's function is tracked by its parent v within error ∆ using g_u(t), i.e., |g_u(t) − f_u(t)| ≤ ∆ for every time instance t.

slide-42
SLIDE 42

13

Broom online tracking

Theorem 1: For any algorithm A in A_broom, there exists an input instance I and another algorithm A′ ∈ A_broom such that cost(A, I) is at least h times cost(A′, I), i.e., for any A ∈ A_broom, ratio(A) = Ω(h).

slide-46
SLIDE 46

14

Broom online tracking: f(t) = max{f_1(t), ..., f_m(t)}

Baseline method: m-chain

  • T tracks each f_i(t) within error ∆ using g_i(t) by a chain tracking instance.
  • g(t) = max_i {g_i(t)} at T.

BroomTrack: asks n_1 to do all the tracking; the remaining nodes simply "relay" the updates sent out by n_1.

  • n_1 tracks each function f_i from s_i with error threshold ∆.
  • y_1(t) = max{g_1(t), ..., g_m(t)}
  • If y_1(t) ≠ y_1(t − 1), the new value is propagated up through the chain.
  • g(t) = y_1(t)

[Figure: observers with f_1(t), f_2(t), ..., f_m(t); relay nodes n_1, n_2, ..., n_h; tracker T.]
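The difference between the two broom methods can be sketched by counting messages in Python. The accounting is an assumption made for illustration: every leaf uses a naive threshold rule, an m-chain update crosses all h + 1 links to T, and a BroomTrack update costs one message to n_1 plus h more only when y_1(t) changes. The names `leaf_updates` and `broom_costs` are hypothetical.

```python
def leaf_updates(values, delta):
    """Per time step, return (tracked value g_i(t), whether the leaf sent)."""
    g, out = None, []
    for v in values:
        sent = g is None or abs(v - g) > delta
        if sent:
            g = v
        out.append((g, sent))
    return out


def broom_costs(streams, delta, h):
    """Return (m-chain messages, BroomTrack messages) under the
    simplified accounting described above."""
    per_leaf = [leaf_updates(s, delta) for s in streams]
    m_chain = broom = 0
    prev_y = None
    for step in zip(*per_leaf):
        sent = sum(1 for _, did_send in step if did_send)
        m_chain += sent * (h + 1)        # each update crosses h + 1 links
        broom += sent                    # leaf -> n_1 only
        y = max(g for g, _ in step)      # y_1(t) = max of tracked values
        if y != prev_y:                  # forward only when the max changes
            broom += h
            prev_y = y
    return m_chain, broom


# One steady observer holds the max; another fluctuates well below it.
print(broom_costs([[10, 10, 10, 10], [1, 5, 1, 5]], delta=1, h=3))
```

The fluctuating observer never changes the maximum, so n_1 absorbs its updates instead of sending them all the way to T.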

slide-47
SLIDE 47

15

Performance of broom online track method

Theorem 2: With respect to online algorithms in A_broom, ratio(BroomTrack) < h log ∆.

A trivial result: ratio(m-Chain) = O(h log ∆).

slide-50
SLIDE 50

16

General tree online tracking

  • m-Chain still works for the general tree topology.
  • Similarly, we define the class of online algorithms A_tree and design the TreeTrack method.

Result: There is no instance-optimal algorithm for A_tree.

Result: ratio(TreeTrack) = O(h_max log ∆) w.r.t. A_tree.

slide-51
SLIDE 51

17

Other functions and topologies, asynchronous updates

  • Min, Average, Sum
  • Holistic aggregate: e.g., f = φ-quantile
  • Convert graph model to tree
  • Different arrival rates and times for values at different sites

slide-52
SLIDE 52

18

Experiment

Data sets: two real data sets.

  • TEMP from MesoWest Project.
  • WD from SAMOS project.

Setup:

  • N = 500: number of time instances
  • h = 2: number of relay nodes
  • f = max: aggregate function
  • ∆ = 0.6τ: error threshold, where τ = avg(std(f_1), ..., std(f_m))
  • m = 15: number of observers
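The default threshold in the setup can be derived from the data as in this Python sketch. It assumes that std denotes the population standard deviation of each observer's series, which the slides do not specify. The function name `error_threshold` and the sample series are hypothetical.

```python
from statistics import mean, pstdev


def error_threshold(streams, factor=0.6):
    """Return delta = factor * tau, where tau is the average of the
    per-observer standard deviations (population std is an assumption)."""
    tau = mean(pstdev(series) for series in streams)
    return factor * tau


# Two hypothetical observers with standard deviations 1.0 and 2.0.
print(error_threshold([[1, 3], [2, 6]]))
```

Tying ∆ to τ keeps the threshold on the same scale as the natural variation of the readings, so one setting can be reused across data sets.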

slide-53
SLIDE 53

19

Chain model

TEMP: vary error threshold ∆

slide-54
SLIDE 54

20

Broom model

TEMP: vary error threshold ∆ and # of relay nodes h

slide-55
SLIDE 55

21

General tree model

Vary error threshold ∆

slide-56
SLIDE 56

22

Other functions: sum

Track sum on broom and general tree.

slide-59
SLIDE 59

23

Other related work

Distributed streaming model, e.g., works by G. Cormode, S. Muthukrishnan, K. Yi, M. Garofalakis, and many others.

Threshold monitoring over some features (e.g., item frequency) on streaming elements, e.g., the geometry-based method (I. Sharfman, A. Schuster, and D. Keren) and constraint monitoring (single-sided, e.g., S. Kashyap, J. Ramamirtham, R. Rastogi, and P. Shukla).

Special cases: monotone functions (G. Cormode, S. Muthukrishnan, and K. Yi) and top-k (B. Babcock and C. Olston).

slide-62
SLIDE 62

24

Conclusion

Distributed online tracking is challenging.

Lesson learned: whenever possible, pushing tracking to the edge seems to be the best approach.

Future work:

  • investigating online tracking with an error threshold that may change over time
  • multi-dimensional data
  • more complex analytical functions

slide-63
SLIDE 63

25

Thanks!

slide-64
SLIDE 64

26

Non-leaf observers?

slide-65
SLIDE 65

27

Our goal

Challenge: Our problem is a continuous online problem that requires a good approximation at every time instance.

Challenge: The tracker does not observe data directly; online data are observed by distributed clients only.

Challenge: Each client observes an arbitrary function.

slide-66
SLIDE 66

28

A_broom: a class of online algorithms for broom online tracking (cont.)

Lemma 1: Any algorithm A ∈ A_broom must track functions f_1(t), ..., f_m(t) with an error threshold that is exactly ∆ at the first relay node n_1 in order to minimize ratio(A).

Lemma 2: Whenever y_i(t) ≠ y_i(t − 1) at node n_i for any i ∈ [1, h], any A ∈ A_broom must send an update from n_i to n_{i+1}, and this update message must be y_i(t).

Theorem 1: For any algorithm A in A_broom, there exists an input instance I and another algorithm A′ ∈ A_broom such that cost(A, I) is at least h times cost(A′, I), i.e., for any A ∈ A_broom, ratio(A) = Ω(h).

slide-67
SLIDE 67

29

Other functions: median

Track median on broom and general tree.