1
Distributed Online Tracking
Mingwang Tang, Feifei Li, Yufei Tao
2
Motivation and challenge
Observation: Large volumes of distributed data are being generated continuously in many application domains.
Challenge: How can a function computed over distributed values be tracked continuously in an online fashion?
[Figure: sensor network; LBS service]
3
A concrete example: SAMOS Project
Observation 1: Each ship observes a sequence of values over time, e.g., wind direction, wind speed, sound speed.
Observation 2: Tracking a user-defined function (e.g., the max wind speed) exactly at every time instance over such distributed data is communication-expensive.
4
At the University of Utah: the MesoWest project
Close to 40,000 stations across the US, http://mesowest.org
>10 billion readings and counting
5
Problem formulation
m distributed observers $\{s_1, \ldots, s_m\}$, connected to a tracker $T$ using a network topology.
$f_i(t)$ represents the value observed at $s_i$ at time $t$.
$f(t) = f(f_1(t), f_2(t), \ldots, f_m(t))$ for some function $f$ at tracker $T$.
Tracking $f(t)$ exactly is expensive!
Tracker $T$: maintain $g(t) \in [f(t) - \Delta, f(t) + \Delta]$ for some error threshold $\Delta$ at every time instance $t$.
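As a small illustration of the guarantee above, the following sketch checks whether a reported series g stays within ∆ of f at every time instance. The function name `within_threshold` and the choice f = max are assumptions made for illustration; they are not taken from the slides.

```python
from typing import Sequence


def within_threshold(observations: Sequence[Sequence[float]],
                     reported: Sequence[float],
                     delta: float) -> bool:
    """Return True if |g(t) - f(t)| <= delta at every time instance t.

    observations[i][t] plays the role of f_i(t) and reported[t] the role
    of g(t). The aggregate f is assumed to be max (illustrative choice).
    """
    for t in range(len(reported)):
        f_t = max(site[t] for site in observations)
        if abs(reported[t] - f_t) > delta:
            return False
    return True


# Two observers, three time instances; here f(t) = 11, 12, 15.
obs = [[10, 12, 15], [11, 11, 11]]
print(within_threshold(obs, [11, 11, 14], delta=1))  # True
print(within_threshold(obs, [11, 11, 11], delta=1))  # False
```

The second call fails at the last time instance, where g = 11 is 4 away from f = 15.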
6
Some concrete instantiations
base case: centralized setting
a chain topology
7
Some concrete instantiations
a broom topology
a simple tree topology
a general tree topology
8
State-of-the-art result
Ke Yi and Qin Zhang: Multi-Dimensional Online Tracking. SODA Conference 2009
Centralized setting: a tracker with one observer.
Naive method: unbounded competitive ratio with respect to the number of messages sent by the optimal offline method.
[Figure: naive method; $g(t)$ and $f(t)$ over $t$, starting from $f(0)$, with gaps "$> \Delta$" marked.]
[Figure: $f(t)$ over $t$ with levels $C - \Delta$, $C$, $C + \Delta$.]
OptTrack method: OptTrack is an $O(\log \Delta)$-competitive online algorithm to track any $f : \mathbb{Z} \to \mathbb{Z}$ within $\Delta$.
[Figure: OptTrack; $g(t)$ and $f(t)$ over $t$, with a range $S$ and margins $\Delta$.]
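The naive method is not spelled out on the slide. The sketch below assumes the usual reading: the observer resends f(t) whenever it differs from the last value sent by more than ∆. The function name `naive_track`, the counting of the initial message, and the oscillating input (one reading of the C − ∆ / C + ∆ figure) are illustrative assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def naive_track(values: Iterable[int], delta: int) -> Tuple[List[int], int]:
    """Naive centralized tracking (assumed reading of the slide).

    Sends f(t) whenever it is more than delta away from the last value
    sent. Returns the tracker's series g and the number of messages.
    """
    g: List[int] = []
    last_sent: Optional[int] = None
    messages = 0
    for v in values:
        if last_sent is None or abs(v - last_sent) > delta:
            last_sent = v          # observer -> tracker update
            messages += 1
        g.append(last_sent)
    return g, messages


# f(t) alternates between C - delta and C + delta.
C, delta, n = 100, 5, 1000
oscillating = [C + delta if t % 2 else C - delta for t in range(n)]
_, naive_cost = naive_track(oscillating, delta)
print(naive_cost)  # 1000: one message per time instance
```

On this input an offline method that knows the whole sequence could send the single value C once and stay within ∆ forever, so the message ratio grows with n and is unbounded.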
9
Chain online tracking: $f : \mathbb{Z}^+ \to \mathbb{Z}$
[Figure: chain topology; observer with $f(t)$, $h$ relay nodes, tracker with $g(t) = g_{h+1}(t)$.]
Goal: $g(t) \in [f(t) - \Delta, f(t) + \Delta]$
ChainTrackA: distributes $\Delta$ evenly among the $h + 1$ centralized tracking instances, i.e., $\frac{\Delta}{h+1}$ per instance. Unbounded competitive ratio.
ChainTrackR: distributes $\Delta$ randomly among the $h + 1$ centralized tracking instances as $\Delta_1, \Delta_2, \ldots, \Delta_{h+1}$ with $\sum_{i=1}^{h+1} \Delta_i = \Delta$. Unbounded competitive ratio.
ChainTrackO: achieves an $O(\log \Delta)$ competitive ratio.
Competitive ratio: w.r.t. the offline optimal.
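To make the even split concrete, here is a minimal sketch in the spirit of ChainTrackA. It assumes each of the h + 1 hops runs a simple threshold tracker with budget ∆/(h+1) and that messages are delivered within the same time instance. The slides do not say which centralized tracker each hop uses, so the per-hop rule and the name `chain_track_even` are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def chain_track_even(values: Sequence[float], delta: float,
                     h: int) -> Tuple[List[float], int]:
    """Observer -> h relay nodes -> tracker, delta split evenly per hop.

    Hop i forwards its sender's current value whenever it differs from
    the last value sent over that hop by more than delta / (h + 1).
    Returns the tracker's series g and the total number of messages.
    """
    hops = h + 1
    per_hop = delta / hops
    last_sent: List[Optional[float]] = [None] * hops
    messages = 0
    g: List[float] = []
    for v in values:
        upstream = v                      # value held by the sender of hop 0
        for i in range(hops):
            if last_sent[i] is None or abs(upstream - last_sent[i]) > per_hop:
                last_sent[i] = upstream   # one message over hop i
                messages += 1
            upstream = last_sent[i]       # receiver's estimate feeds next hop
        g.append(upstream)
    return g, messages


f = [0, 1, 3, 4, 9, 9, 2]
g, cost = chain_track_even(f, delta=6, h=2)
print(max(abs(a - b) for a, b in zip(f, g)) <= 6)  # True
```

Each hop contributes at most ∆/(h+1) of error, so the errors add up to at most ∆ at the tracker. As the slide notes, this even split still has an unbounded competitive ratio.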
10
Distributed: good competitive ratio is impossible!
Consider $f = \max$: $f(t) = \max\{f_1(t), \ldots, f_m(t)\}$.
Observation: by knowing the future of data at all sites, an offline method can “communicate” between leaf nodes for free.
[Figure: Broom Model and General-tree Model, each with tracker $T$ maintaining $g(t) \in [f(t) - \Delta, f(t) + \Delta]$; labels: observers $s_i$, $h$ relay nodes.]
11
Performance metric of an online algorithm
Centralized and chain model (competitive ratio):
$\mathrm{cratio}(A) = \max_{I \in \mathcal{I}} \frac{\mathrm{cost}(A, I)}{\mathrm{cost}(\mathrm{offline}, I)}$
Tree model: for a class $\mathcal{A}$ of algorithms, $A^*_I$ is the optimal algorithm on an input $I$.
$\mathrm{ratio}(A, I) = \frac{\mathrm{cost}(A, I)}{\mathrm{cost}(A^*_I, I)}$
$\mathrm{ratio}(A) = \max_{I \in \mathcal{I}} \mathrm{ratio}(A, I)$
Inspired by the concept of instance optimality: R. Fagin, A. Lotem, and M. Naor. Optimal Aggregation Algorithms for Middleware. J. Comput. Syst. Sci., 66(4):614-656, 2003.
12
$\mathcal{A}_{broom}$: a class of online algorithms for the broom model
- Every node $u$ keeps a value $y_u(t)$, which represents the knowledge of $u$ about $f(t)$ in the subtree rooted at $u$ at time $t$.
- Each leaf node $u$'s function is tracked by its parent $v$ within error $\Delta$ using $g_u(t)$, i.e., $|g_u(t) - f_u(t)| \le \Delta$ for every time instance $t$.
13
Broom online tracking
Theorem 1: For any algorithm $A \in \mathcal{A}_{broom}$, there exist an input instance $I$ and another algorithm $A' \in \mathcal{A}_{broom}$ such that $\mathrm{cost}(A, I)$ is at least $h$ times $\mathrm{cost}(A', I)$; i.e., for any $A \in \mathcal{A}_{broom}$, $\mathrm{ratio}(A) = \Omega(h)$.
14
Broom online tracking: $f(t) = \max\{f_1(t), \ldots, f_m(t)\}$
[Figure: broom topology; observers with $f_1(t), f_2(t), \ldots, f_m(t)$, relay nodes $n_1, n_2, \ldots, n_h$, tracker $T$.]
Baseline method: m-Chain
- $T$ tracks each $f_i(t)$ within error $\Delta$ using $g_i(t)$ by a chain tracking instance.
- $g(t) = \max_i \{g_i(t)\}$ at $T$.
BroomTrack: asks $n_1$ to do all the tracking; the remaining nodes simply "relay" the updates sent out by $n_1$.
- $n_1$ tracks each function $f_i$ from $s_i$ with error threshold $\Delta$.
- $y_1(t) = \max\{g_1(t), \ldots, g_m(t)\}$
- If $y_1(t) \ne y_1(t - 1)$, $y_1(t)$ is propagated up through the chain.
- $g(t) = y_1(t)$
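The BroomTrack steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes $n_1$ tracks each leaf with a simple threshold rule (the slides do not fix the per-leaf tracker), that an update from $n_1$ costs one message per hop on the path $n_1 \to \ldots \to n_h \to T$ (h hops), and that delivery is instantaneous. The name `broom_track` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def broom_track(observations: Sequence[Sequence[float]], delta: float,
                h: int) -> Tuple[List[float], int]:
    """Sketch of BroomTrack for f = max on a broom with h relay nodes.

    observations[i][t] is f_i(t). Returns the tracker's series g and the
    total message count (leaf updates plus relayed updates).
    """
    m = len(observations)
    horizon = len(observations[0])
    g_leaf: List[Optional[float]] = [None] * m   # g_i(t) kept at n_1
    y_prev: Optional[float] = None               # y_1(t - 1)
    messages = 0
    g: List[float] = []
    for t in range(horizon):
        for i in range(m):
            v = observations[i][t]
            if g_leaf[i] is None or abs(v - g_leaf[i]) > delta:
                g_leaf[i] = v                    # s_i -> n_1
                messages += 1
        y = max(g_leaf)                          # y_1(t) = max_i g_i(t)
        if y != y_prev:
            messages += h                        # relayed n_1 -> ... -> T
            y_prev = y
        g.append(y_prev)                         # g(t) = y_1(t)
    return g, messages


obs = [[10, 12, 20], [11, 11, 11]]
print(broom_track(obs, delta=2, h=2))  # ([11, 11, 20], 7)
```

Because every $g_i(t)$ is within ∆ of $f_i(t)$, the maximum of the $g_i$ is within ∆ of the maximum of the $f_i$, which is why a single threshold ∆ at $n_1$ suffices in this sketch.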
15
Performance of broom online tracking methods
Theorem 2: With respect to online algorithms in $\mathcal{A}_{broom}$, $\mathrm{ratio}(\mathrm{BroomTrack}) < h \log \Delta$.
A trivial result: $\mathrm{ratio}(m\text{-Chain}) = O(h \log \Delta)$.
16
General tree online tracking
- m-Chain still works for a general tree topology.
- Similarly, we define the class of online algorithms $\mathcal{A}_{tree}$ and design the TreeTrack method.
Result: There is no instance-optimal algorithm for $\mathcal{A}_{tree}$.
Result: $\mathrm{ratio}(\mathrm{TreeTrack}) = O(h_{\max} \log \Delta)$ w.r.t. $\mathcal{A}_{tree}$.
17
Other functions and topologies, asynchronous updates
- Min, Average, Sum
- Holistic aggregate: e.g., f = φ-quantile
- Convert graph model to tree
- Different arrival rates and times for values at different sites
18
Experiment
Data sets: two real data sets.
- TEMP from MesoWest Project.
- WD from SAMOS project.
Setup:
- N = 500: number of time instances
- h = 2: number of relay nodes
- f = max: aggregate function
- ∆ = 0.6τ: error threshold, where τ = avg(std(f_1), . . . , std(f_m))
- m = 15: number of observers
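The error threshold in this setup can be computed as sketched below. The slide does not say whether std is the population or the sample standard deviation; the sketch assumes the population version (`statistics.pstdev`), and the function name `error_threshold` is illustrative.

```python
from statistics import mean, pstdev
from typing import Sequence


def error_threshold(observations: Sequence[Sequence[float]],
                    factor: float = 0.6) -> float:
    """Return delta = factor * tau, with tau = avg(std(f_1), ..., std(f_m)).

    observations[i] is the series observed at site i. Population standard
    deviation is an assumption; the slide only writes std.
    """
    tau = mean([pstdev(site) for site in observations])
    return factor * tau


# Toy data: std values are 1.0 and 2.0, so tau = 1.5 and delta is about 0.9.
sites = [[1, 3], [2, 6]]
print(error_threshold(sites))
```

Scaling ∆ by the average per-site standard deviation ties the threshold to how much the data actually varies, so the same factor can be reused across data sets such as TEMP and WD.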
19
Chain model
TEMP: vary error threshold ∆
20
Broom model
TEMP: vary error threshold ∆ and # of relay nodes h
21
General tree model
Vary error threshold ∆
22
Other functions: sum
Track sum on broom and general tree.
23
Other related work
Distributed streaming model, e.g., works by G. Cormode, S. Muthukrishnan, K. Yi, M. Garofalakis, and many others.
Threshold monitoring over some features (e.g., item frequency) on streaming elements, e.g., the geometric-based method (I. Sharfman, A. Schuster, and D. Keren) and constraint monitoring (single-sided, e.g., S. Kashyap, J. Ramamirtham, R. Rastogi, and P. Shukla).
Special case: monotone functions (G. Cormode, S. Muthukrishnan, and K. Yi), top-k (B. Babcock and C. Olston).
24
Conclusion
Distributed online tracking is challenging.
Lesson learned: whenever possible, pushing tracking to the edge seems to be the best approach.
Future work:
- investigating online tracking with an error threshold that may change over time
- multi-dimensional data
- more complex analytical functions
25
Thanks!
26
Non-leaf observers?
27
Our goal
Challenge: Our problem is a continuous online problem that requires a good approximation at every time instance.
Challenge: The tracker does not observe data directly; online data are observed by distributed clients only.
Challenge: Each client is observing an arbitrary function.
28
$\mathcal{A}_{broom}$: a class of online algorithms for broom online tracking (cont.)
Lemma 1: Any algorithm $A \in \mathcal{A}_{broom}$ must track functions $f_1(t), \ldots, f_m(t)$ with an error threshold that is exactly $\Delta$ at the first relay node $n_1$ in order to minimize $\mathrm{ratio}(A)$.
Lemma 2: Whenever $y_i(t) \ne y_i(t - 1)$ at node $n_i$ for any $i \in [1, h]$, any $A \in \mathcal{A}_{broom}$ must send an update from $n_i$ to $n_{i+1}$, and this update message must be $y_i(t)$.
Theorem 1: For any algorithm $A \in \mathcal{A}_{broom}$, there exist an input instance $I$ and another algorithm $A' \in \mathcal{A}_{broom}$ such that $\mathrm{cost}(A, I)$ is at least $h$ times $\mathrm{cost}(A', I)$; i.e., for any $A \in \mathcal{A}_{broom}$, $\mathrm{ratio}(A) = \Omega(h)$.