

  1. Robust Aggregation in Sensor Networks
     Jie Gao, Computer Science Department, Stony Brook University
     10/25/05 Jie Gao, CSE590-fall05

     Papers
     • [Shrivastava04] Nisheeth Shrivastava, Chiranjeeb Buragohain, Divy Agrawal, Subhash Suri, Medians and Beyond: New Aggregation Techniques for Sensor Networks, ACM SenSys '04, Nov. 3-5, Baltimore, MD.
     • [Nath04] Suman Nath, Phillip B. Gibbons, Zachary Anderson, Srinivasan Seshan, Synopsis Diffusion for Robust Aggregation in Sensor Networks, Proc. ACM SenSys '04.
     • [Considine04] Jeffrey Considine, Feifei Li, George Kollios, John Byers, Approximate Aggregation Techniques for Sensor Databases, Proc. ICDE, 2004.
     • [Przydatek03] Bartosz Przydatek, Dawn Song, Adrian Perrig, SIA: Secure Information Aggregation in Sensor Networks, SenSys '03.

     Problem I: median
     • Computing the average is simple on an aggregation tree.
       – Each node x stores the average a(x) and the number of nodes in its subtree n(x).
       – The average at a node x can be computed from its children u, v: n(x) = n(u) + n(v), a(x) = (a(u)n(u) + a(v)n(v)) / n(x).
     • Computing the median with fixed-size messages is hard.
       – We do not know the rank of u's median in v's dataset.
       – We resort to approximations.
     [Figure: node x with children u and v]

     Median and random sampling
     • Problem: compute the median a of n unsorted elements {a_i}.
     • Take a random sample of k elements. Compute the median x of the sample.
     • Claim: x has rank between (½ − ε)n and (½ + ε)n with probability at least 1 − 2/exp(2kε²). (Proof left as an exercise.)
     • Choose k = ln(2/δ) / (2ε²); then x is an approximate median with probability 1 − δ.
     • A deterministic algorithm?
     • How about an approximate histogram?
     • What if a sensor generates a list of values?

     Quantile digest (q-digest)
     • A data structure that answers
       – Approximate quantile queries: the median, the kth largest reading.
       – Range queries: the kth to lth largest readings.
       – Most frequent items.
       – Histograms.
     • Properties:
       – Deterministic algorithm.
       – Error-memory trade-off.
       – Confidence factor.
       – Supports multiple queries.
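The averaging rule and the sample-size choice from the "Problem I: median" and "Median and random sampling" slides can be sketched in Python as follows. This is only an illustration: the function names, the (average, count) pair representation, and sampling with replacement are assumptions made here, not details given on the slides.

```python
import math
import random


def merge_average(children):
    """Combine (average, count) pairs from child subtrees.

    Implements n(x) = n(u) + n(v) and
    a(x) = (a(u) n(u) + a(v) n(v)) / n(x), for any number of children.
    A node's own reading could be passed in as the pair (value, 1).
    """
    n = sum(count for _, count in children)
    a = sum(avg * count for avg, count in children) / n
    return a, n


def sample_size(eps, delta):
    """Smallest integer k with k >= ln(2/delta) / (2 eps^2)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def sampled_median(values, eps, delta, rng=None):
    """Median of k uniform samples (drawn with replacement, an assumption)."""
    rng = rng or random.Random()
    k = sample_size(eps, delta)
    sample = sorted(rng.choice(values) for _ in range(k))
    return sample[k // 2]
```

Note that the sample size depends only on ε and δ, not on n, which is what makes sampling attractive when messages must stay small.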

  2. Q-digest
     • Exact data: frequency of each data value {f_1, f_2, …, f_σ}.
     • Compress the data:
       – detailed information concerning frequent data is preserved;
       – less frequently occurring values are lumped into larger buckets, resulting in information loss.
     • Buckets: the nodes in a binary partition of the range [1, σ]. Each bucket v has range [v.min, v.max].
     • Only store non-zero buckets.
     • Digest property (p is the parent of v, s is its sibling):
       – Count(v) ≤ n/k. (except leaf)
       – Count(v) + Count(p) + Count(s) > n/k. (except root)

     Example
     [Figure: input data bucketed over the binary partition, and the resulting q-digest; merged buckets show the information loss]

     Construct a q-digest
     • Each sensor constructs a q-digest based on its value.
     • Check the digest property bottom up: the counts of two "small" children are added up and moved to the parent.

     Merging two q-digests
     • Merge the q-digests from two children.
     • Add up the values in the buckets.
     • Re-evaluate the digest property bottom up.
     • Information loss: t undercounts, since some of its values appear at ancestors.

     Space complexity and error bound
     • A q-digest with compression parameter k has at most 3k buckets.
       – By property 2, for the set of buckets Q: Σ_{v∈Q} [Count(v) + Count(p) + Count(s)] > |Q|·n/k.
       – Σ_{v∈Q} [Count(v) + Count(p) + Count(s)] ≤ 3 Σ_{v∈Q} Count(v) = 3n.
       – Hence |Q| < 3k.
     • Any value that should be counted in v can be present in one of the ancestors of v.
       1. Count(v) has maximum error log σ · n/k, since v has at most log σ ancestors:
          Error(v) ≤ Σ_{ancestor p} Count(p) ≤ Σ_{ancestor p} n/k ≤ log σ · n/k.
       2. MERGE maintains the same relative error:
          Error(v) ≤ Σ_i Error(v_i) ≤ Σ_i log σ · n_i/k ≤ log σ · n/k.

     Median and quantile query
     • Given q ∈ (0, 1), find the value whose rank is qn.
     • Relative error ε = |r − qn|/n, where r is the true rank.
     • Traverse Q in post-order and sum the counts of all nodes visited before a node v; this sum is a lower bound on the number of values less than v.max. Report v.max the first time the sum exceeds qn.
     • Error bound: log σ / k = 3 log σ / m, where m = 3k is the storage bound for each sensor.
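The construction, merge, and quantile-query steps described on the q-digest slides can be sketched in Python as below. This is a minimal sketch under several assumptions that are not on the slides: σ is a power of two, buckets are identified by heap-style node ids (root 1, children 2i and 2i+1, leaves σ..2σ−1), the threshold is taken as ⌊n/k⌋, and all function names are hypothetical.

```python
from collections import Counter


def node_range(node, sigma):
    """Value range [lo, hi] covered by a heap-numbered bucket (root = 1)."""
    depth = node.bit_length() - 1
    width = sigma >> depth
    lo = (node - (1 << depth)) * width + 1
    return lo, lo + width - 1


def compress(counts, n, k, sigma):
    """Bottom-up pass: move small sibling pairs into their parent, in place."""
    threshold = n // k
    # Left children have even ids; descending ids visit deeper levels first.
    for left in range(2 * sigma - 2, 1, -2):
        parent = left // 2
        child_sum = counts.get(left, 0) + counts.get(left + 1, 0)
        if child_sum and child_sum + counts.get(parent, 0) <= threshold:
            counts[parent] = counts.get(parent, 0) + child_sum
            counts.pop(left, None)
            counts.pop(left + 1, None)
    return counts


def build(values, k, sigma):
    """Digest of readings in 1..sigma; returns (bucket counts, n)."""
    counts = dict(Counter(sigma + v - 1 for v in values))  # leaf buckets
    return compress(counts, len(values), k, sigma), len(values)


def merge(a, b, k, sigma):
    """Add bucket counts of two digests, then re-check the digest property."""
    counts = dict(Counter(a[0]) + Counter(b[0]))
    n = a[1] + b[1]
    return compress(counts, n, k, sigma), n


def quantile(digest, q, sigma):
    """Scan buckets in post-order; report v.max once the sum exceeds q*n."""
    counts, n = digest

    def post_order_key(node):
        lo, hi = node_range(node, sigma)
        return hi, hi - lo  # smaller max first, then narrower bucket first

    running = 0
    for node in sorted(counts, key=post_order_key):
        running += counts[node]
        if running > q * n:
            return node_range(node, sigma)[1]
    return sigma
```

Only non-zero buckets are stored, so the dictionary size is what a sensor would transmit. With a small k the answer can be far from the exact quantile; the log σ / k bound above is what limits the rank error.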

  3. Other queries
     • Inverse quantile: given a value x, determine its rank.
       – Traverse the tree in post-order and report the sum of the counts of the buckets v for which x > v.max, which is within [rank(x), rank(x) + εn].
     • Range query: find the number of values in the range [l, h].
       – Perform two inverse quantile queries and take the difference. The error bound is 2εn.
     • Frequent items: given s ∈ (0, 1), find all values reported by more than sn sensors.
       – Report the leaf buckets whose counts are more than (s − ε)n, so that no value with true count above sn is missed.
       – Small number of false positives: values with count between (s − ε)n and sn may also be reported as frequent.

     Simulation setup
     • A typical aggregation tree (BFS tree) on 40 nodes in a 200 by 200 area. The simulations use 4000 to 8000 nodes.
     [Figure: BFS aggregation tree]

     Simulation setup
     • Random data;
     • Correlated data: 3D elevation values from Death Valley.

     Histogram vs. q-digest
     • Comparison of histogram and q-digest.
     [Figure: histogram compared with q-digest]

     Tradeoff between error and msg size
     [Figure: error versus message size]

     Saving on message size
     [Figure: savings in message size]
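The inverse quantile and range queries listed under "Other queries" can be sketched in Python as follows. The sketch assumes a digest stored as a dictionary from heap-style bucket ids (root 1, leaves σ..2σ−1, σ a power of two) to counts, and integer readings so that a range [l, h] can be answered as rank(h + 1) minus rank(l). Both the representation and the function names are assumptions made for illustration.

```python
def node_range(node, sigma):
    """Value range [lo, hi] covered by a heap-numbered bucket (root = 1)."""
    depth = node.bit_length() - 1
    width = sigma >> depth
    lo = (node - (1 << depth)) * width + 1
    return lo, lo + width - 1


def inverse_quantile(counts, x, sigma):
    """Sum the counts of all buckets v with x > v.max."""
    return sum(c for node, c in counts.items()
               if x > node_range(node, sigma)[1])


def range_count(counts, low, high, sigma):
    """Estimate the number of readings in [low, high] from two rank queries."""
    return (inverse_quantile(counts, high + 1, sigma)
            - inverse_quantile(counts, low, sigma))
```

Because the sum does not depend on the visiting order, the post-order traversal is replaced here by a plain scan over the stored buckets.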

  4. Problem II: Aggregation along a spanning tree in practice
     • The impact of link dynamics on the aggregation tree.
     • If a link fails, the data from the entire subtree is lost.
       – Wrong aggregated value;
       – Inconsistency.

     Problem II: Aggregation along a spanning tree in practice
     • Solution: use multi-path routing (e.g., a DAG) to improve robustness under link dynamics.
     • But if both paths succeed, the same data is received twice!
     • This is OK for some aggregates such as MAX and MIN.
     • How about COUNT and SUM?
     [Figure: multi-path routing DAG on nodes 1 to 6]

     Aggregation along a spanning tree
     • Problem with a spanning tree: link dynamics.
     • If a link fails, the data from the entire subtree is lost.
     • Decouple routing and data aggregation.
     • Use multi-path routing to improve the routing robustness.
     • If multiple paths succeed, the sink receives multiple copies of the same data.
     • Design an aggregation algorithm that is insensitive to order and duplication.

     Order and duplicate insensitive (ODI) synopses
     • The aggregated value is insensitive to the sequence or duplication of input data.
     • Small-size digests such that any particular sensor reading is accounted for only once.
       – MAX and MIN admit a natural ODI synopsis.
       – ODI synopses for SUM, COUNT, MEDIAN, and AVG are more challenging.
     • Synopsis generation: SG(⋅).
     • Synopsis fusion: SF(⋅) takes two synopses and generates a new synopsis of the union of the input data.
     • Synopsis evaluation: SE(⋅) translates the synopsis to the final answer.

     ODI synopsis for MAX/MIN
     • Synopsis generation: SG(⋅).
       – Output the value itself.
     • Synopsis fusion: SF(⋅).
       – Take the MAX/MIN of the two input values.
     • Synopsis evaluation: SE(⋅).
       – Output the synopsis.

     ODI correctness
     • A synopsis diffusion algorithm is ODI-correct if SF() and SG() are order and duplicate insensitive functions.
     • Or, if for any aggregation DAG, the resulting synopsis is identical to the synopsis produced by the canonical left-deep tree.
     • The final result is independent of the underlying routing topology.
     [Figure: aggregation DAG with SG() at the leaves, SF() at the internal nodes, and SE() at the root]
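The three synopsis functions for MAX from the ODI slides can be sketched in Python as below. The lowercase function names and the `aggregate` helper, which simply folds the readings in arrival order, are assumptions made for this illustration.

```python
from functools import reduce


def sg(reading):
    """Synopsis generation: the synopsis of one reading is the reading itself."""
    return reading


def sf(left, right):
    """Synopsis fusion: the MAX of the two input synopses."""
    return max(left, right)


def se(synopsis):
    """Synopsis evaluation: the synopsis is already the answer."""
    return synopsis


def aggregate(readings):
    """Fuse readings in whatever order (and multiplicity) they arrive."""
    return se(reduce(sf, map(sg, readings)))
```

Since `max` is commutative, associative, and idempotent, reordering the readings or delivering some of them twice leaves the result unchanged. A plain running sum lacks the idempotence, which is why SUM and COUNT need a different synopsis.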
