10/25/05 Jie Gao, CSE590-fall05 1

Robust Aggregation in Sensor Networks

Jie Gao

Computer Science Department Stony Brook University


Papers

  • [Shrivastava04] Nisheeth Shrivastava, Chiranjeeb Buragohain, Divy Agrawal, Subhash Suri, "Medians and Beyond: New Aggregation Techniques for Sensor Networks," ACM SenSys '04, Nov. 3-5, 2004, Baltimore, MD.
  • [Nath04] Suman Nath, Phillip B. Gibbons, Zachary Anderson, and Srinivasan Seshan, "Synopsis Diffusion for Robust Aggregation in Sensor Networks," ACM SenSys '04.
  • [Considine04] Jeffrey Considine, Feifei Li, George Kollios, and John Byers, "Approximate Aggregation Techniques for Sensor Databases," Proc. ICDE, 2004.
  • [Przydatek03] Bartosz Przydatek, Dawn Song, Adrian Perrig, "SIA: Secure Information Aggregation in Sensor Networks," ACM SenSys '03.


Problem I: median


Problem I: median

  • Computing the average is simple on an aggregation tree.
– Each node x stores the average a(x) and the number of nodes n(x) in its subtree.
– The average at node x is computed from its children u, v: n(x) = n(u) + n(v), a(x) = (a(u)·n(u) + a(v)·n(v)) / n(x).

  • Computing the median with a fixed message size is hard.
– We do not know the rank of u's median in v's dataset.
– We resort to approximations.
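The per-node averaging rule above can be sketched as follows (a minimal sketch; the function and variable names are illustrative, not from the paper):

```python
def merge_avg(n_u, a_u, n_v, a_v):
    """Combine the (count, average) pairs of children u and v into the
    parent's pair: n(x) = n(u) + n(v),
    a(x) = (a(u)*n(u) + a(v)*n(v)) / n(x)."""
    n_x = n_u + n_v
    a_x = (a_u * n_u + a_v * n_v) / n_x
    return n_x, a_x
```

For example, merge_avg(3, 2.0, 1, 6.0) returns (4, 3.0): four readings with overall average (2·3 + 6·1)/4 = 3.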


Median and random sampling

  • Problem: compute the median of n unsorted elements {a_i}.
  • Take a random sample of k elements and compute its median x.
  • Claim: x has rank between (½-ε)n and (½+ε)n with probability at least 1 - 2·exp(-2kε²). (Proof left as an exercise.)
  • Choose k = ln(2/δ)/(2ε²); then x is an approximate median with probability 1 - δ.

  • A deterministic algorithm?
  • How about approximate histogram?
  • What if a sensor generates a list of values?
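The sampling approach above can be sketched as follows (a hedged sketch; `approx_median` is an illustrative name, and sampling is with replacement):

```python
import math
import random

def approx_median(values, eps, delta, rng=random):
    """Sample-based approximate median: k = ln(2/delta)/(2*eps^2) samples
    give rank within (1/2 +- eps)*n with probability >= 1 - delta,
    per the claim above."""
    k = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    sample = sorted(rng.choice(values) for _ in range(k))
    return sample[len(sample) // 2]
```

With eps = 0.1 and delta = 0.05 this draws only k = 185 samples, independent of n.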


Quantile digest (q-digest)

  • A data structure that answers:
– Approximate quantile queries: the median, the k-th largest reading.
– Range queries: the k-th to l-th largest readings.
– Most frequent items.
– Histograms.

  • Properties:
– Deterministic algorithm.
– Error-memory trade-off.
– Confidence factor.
– Supports multiple queries.


Q-digest
  • Exact data: frequencies of the data values {f1, f2, …, fσ}.
  • Compress the data:
– Detailed information about frequent values is preserved.
– Less frequently occurring values are lumped into larger buckets, resulting in information loss.
  • Buckets: the nodes of a binary partition of the range [1, σ]. Each bucket v has a range [v.min, v.max].
  • Only non-zero buckets are stored.
  • Digest property (with compression parameter k):
– Count(v) ≤ n/k (except leaf buckets).
– Count(v) + Count(parent(v)) + Count(sibling(v)) > n/k (except the root).


Example

[Figure: input data, the bucketed q-digest, and the resulting information loss]


Construct a q-digest

  • Each sensor constructs a q-digest based on its value.
  • Check the digest property bottom-up: two "small" children's counts are added up and moved to the parent.
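A simplified sketch of the bottom-up compression (all names are illustrative; buckets are indexed heap-style over [0, σ) with σ a power of two, and a real q-digest re-checks the full digest property during merges):

```python
def qdigest_compress(counts, sigma, k):
    """Compress exact value counts into q-digest buckets.

    Buckets are nodes of a complete binary tree over [0, sigma),
    indexed heap-style (root = 1, leaves = sigma .. 2*sigma - 1).
    A sibling pair whose combined count (together with the parent's)
    is at most n/k is folded into the parent bucket."""
    n = sum(counts.values())
    buckets = {sigma + v: c for v, c in counts.items() if c > 0}
    level = sigma
    while level > 1:
        for node in range(level, 2 * level, 2):
            small = (buckets.get(node, 0) + buckets.get(node + 1, 0)
                     + buckets.get(node // 2, 0))
            if 0 < small <= n // k:      # digest property violated: merge up
                buckets[node // 2] = small
                buckets.pop(node, None)
                buckets.pop(node + 1, None)
        level //= 2
    return buckets
```

For counts {0: 1, 1: 1, 2: 1, 3: 10} with sigma = 4 and k = 2, the light values fold upward while the heavy leaf for value 3 survives exactly.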


Merging two q-digests

  • Merge the q-digests from the two children.
  • Add up the values in corresponding buckets.
  • Re-evaluate the digest property bottom-up.

Information loss: a bucket t undercounts, since some of its values appear in its ancestors.


Space complexity and error bound

1. A q-digest with compression parameter k has at most 3k buckets.
– By digest property 2, for the set of buckets Q: Σ_{v∈Q} [Count(v) + Count(p) + Count(s)] > |Q|·n/k.
– Also Σ_{v∈Q} [Count(v) + Count(p) + Count(s)] ≤ 3·Σ_{v∈Q} Count(v) = 3n.
– Hence |Q| < 3k.

2. Count(v) has maximum error logσ·n/k.
– Any value that should be counted in v may instead be counted in one of its ancestors.
– Error(v) ≤ Σ_{ancestors p} Count(p) ≤ Σ_{ancestors p} n/k ≤ logσ·n/k.

3. MERGE maintains the same relative error.
– Error(v) ≤ Σ_i Error(v_i) ≤ Σ_i logσ·n_i/k = logσ·n/k.


Median and quantile query

  • Given q∈(0, 1), find the value whose rank is qn.
  • Relative error: ε = |r - qn|/n, where r is the true rank of the reported value.
  • Do a post-order traversal of the q-digest. The sum of the counts of all buckets visited before a bucket v is a lower bound on the number of values less than v.max.
  • Report v.max the first time this sum exceeds qn.
  • Error bound: logσ/k = 3logσ/m, where m = 3k is the storage bound for each sensor.
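The traversal can be sketched like so (a sketch, assuming the heap-indexed bucket layout over [0, σ) described earlier; names are illustrative):

```python
def qdigest_quantile(buckets, sigma, q, n):
    """Post-order walk of the bucket tree; return the v.max of the
    first bucket at which the running count reaches q*n."""
    def postorder(node):
        if node < sigma:                  # internal node: children first
            yield from postorder(2 * node)
            yield from postorder(2 * node + 1)
        yield node

    running = 0
    for node in postorder(1):
        running += buckets.get(node, 0)
        if running >= q * n:
            while node < sigma:           # descend to the bucket's v.max leaf
                node = 2 * node + 1
            return node - sigma
    return sigma - 1
```

With buckets {6: 1, 7: 10, 1: 2} over sigma = 4 (13 readings in total), the q = 0.5 query returns value 3, the true median of that data.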


Other queries

  • Inverse quantile: given a value x, determine its rank.
– Traverse the tree in post-order; report the sum of the counts of buckets v with x > v.max. This is within [rank(x), rank(x) + εn].
  • Range query: find the number of values in a range [l, h].
– Perform two inverse quantile queries and take the difference. The error bound is 2εn.
  • Frequent items: given s∈(0, 1), find all values reported by more than sn sensors.
– Report the leaf buckets whose counts exceed sn.
– Small false positives: values with count between (s-ε)n and sn may also be reported as frequent.


Simulation setup

  • A typical aggregation tree (BFS tree) on 40 nodes in a 200-by-200 area. The simulations use 4000-8000 nodes.


Simulation setup

  • Random data.
  • Correlated data: 3D elevation values from Death Valley.


Histogram vs. q-digest

  • Comparison of histogram and q-digest.


Tradeoff between error and message size


Saving on message size


Problem II: Aggregation along a spanning tree in practice

  • The impact of link dynamics on the aggregation tree.
  • If a link fails, the data from the entire subtree is lost.
– Wrong aggregated value.
– Inconsistency.


Problem II: Aggregation along a spanning tree in practice

  • Solution: use multi-path routing (e.g., a DAG) to improve robustness under link dynamics.
  • But if both paths succeed, the same data is received twice!
  • This is fine for some aggregates such as MAX and MIN.
  • How about COUNT and SUM?



Aggregation along a spanning tree

  • Problem with a spanning tree: link dynamics.
  • If a link fails, the data from the entire subtree is lost.
  • Decouple routing from data aggregation.
  • Use multi-path routing to improve routing robustness.
  • If multiple paths succeed, the sink receives multiple copies of the same data.
  • Design an aggregation algorithm that is insensitive to ordering and duplication.


Order and duplicate insensitive (ODI) synopses

  • The aggregated value is insensitive to the order or duplication of the input data.
  • Small-size digests in which any particular sensor reading is accounted for only once.
– MAX and MIN admit natural ODI synopses.
– ODI synopses for SUM, COUNT, MEDIAN, and AVG are more challenging.
  • Synopsis generation SG(⋅): produces a synopsis from a reading.
  • Synopsis fusion SF(⋅): takes two synopses and generates a new synopsis of the union of the input data.
  • Synopsis evaluation SE(⋅): translates the synopsis into the final answer.


ODI synopsis for MAX/MIN

  • Synopsis generation SG(⋅): output the value itself.
  • Synopsis fusion SF(⋅): take the MAX/MIN of the two input values.
  • Synopsis evaluation SE(⋅): output the synopsis.


ODI correctness

  • A synopsis diffusion algorithm is ODI-correct if SF() and SG() are order and duplicate insensitive functions.
  • Equivalently: for any aggregation DAG, the resulting synopsis is identical to the synopsis produced by the canonical left-deep tree.
  • The final result is then independent of the underlying routing topology.


ODI synopsis

Connection to the streaming model: data items arrive one by one.


Test for ODI correctness

1. SG() preserves duplicates: if two readings are considered duplicates (e.g., two nodes with the same temperature reading), the same synopsis is generated.
2. SF() is commutative.
3. SF() is associative.
4. SF() is same-synopsis idempotent: SF(s, s) = s.

Theorem: the above properties are necessary and sufficient for ODI-correctness.

Proof idea: transform an aggregation DAG into a left-deep tree with the same output by using these properties.
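As a quick sanity check (not a proof), conditions 2-4 hold for bitwise-OR fusion over bit-vector synopses:

```python
def sf(s, t):
    """Synopsis fusion by bitwise OR over bit-vector synopses."""
    return s | t

# commutative, associative, same-synopsis idempotent
assert sf(0b101, 0b011) == sf(0b011, 0b101)
assert sf(sf(0b100, 0b010), 0b001) == sf(0b100, sf(0b010, 0b001))
assert sf(0b101, 0b101) == 0b101
```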


Proof of ODI correctness

1. Start from the DAG. Duplicate a node with out-degree k into k nodes, each with out-degree 1. (Uses: duplicate preservation.)


Proof of ODI correctness

2. Re-order the leaf nodes by increasing value of the synopsis. (Uses: commutativity.)


Proof of ODI correctness

3. Re-organize the tree so that adjacent leaves with the same value are inputs to an SF function. (Uses: associativity.)

[Figure: expression tree of SF fusions over SG(r1), SG(r2), SG(r2), SG(r3)]


Proof of ODI correctness

4. Replace SF(s, s) by s. (Uses: same-synopsis idempotence.)

[Figure: replacing SF(s, s) by s collapses the duplicate SG(r2) inputs, turning the tree over r1, r2, r2, r3 into a tree over r1, r2, r3]


Proof of ODI correctness

5. Re-order the leaf nodes into the increasing canonical order. (Uses: commutativity.)


Counting distinct elements

  • Each sensor generates a reading. Count the total number of distinct readings.
  • This is counting distinct elements in a multi-set (Flajolet and Martin, 1985).
  • Coin-tossing experiment: CT(x) = the number of coin tosses until the first head occurs, or x tosses with no heads.
  • Pr{CT(x) = i} = Pr{i-1 tails, then 1 head} = 2^(-i).
  • Use a pseudo-random generator so that CT(x) is a hash function.


Counting distinct elements

  • Synopsis: a bit vector of length k > log n.
  • SG(): output a bit vector of length k with the CT(x)-th bit set.
  • SF(): bit-wise boolean OR of the inputs s and s'.
  • SE(): if i is the lowest index of a bit that is still 0, output 2^(i-1)/0.77351.
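Putting SG/SF/SE together (a sketch; SHA-256 stands in for the slide's pseudo-random generator, and 0.77351 is the Flajolet-Martin correction factor):

```python
import hashlib

def ct(x, k):
    """CT(x): number of coin tosses until the first head, capped at k.
    The 'coins' are hash bits, so duplicates of x toss identically."""
    bits = int(hashlib.sha256(str(x).encode()).hexdigest(), 16)
    i = 1
    while i < k and bits & 1 == 0:    # a 0-bit is a tail: keep tossing
        bits >>= 1
        i += 1
    return i

def sg(x, k):
    return 1 << (ct(x, k) - 1)        # bit vector with bit CT(x) set

def sf(s, t):
    return s | t                      # bitwise OR: order/duplicate insensitive

def se(s):
    i = 1
    while s & (1 << (i - 1)):         # find the lowest zero bit
        i += 1
    return 2 ** (i - 1) / 0.77351
```

Duplicates of the same reading leave the synopsis unchanged: sf(sg("a", 32), sg("a", 32)) equals sg("a", 32).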


Counting distinct elements

  • Check the ODI-correctness:
– Duplicates are handled by the hash function: the same reading x always generates the same value CT(x).
– Boolean OR is commutative, associative, and same-synopsis idempotent.
  • Total storage: O(log n) bits.



Distinct value counter analysis

  • Lemma: For i < log n - 2 log log n, FM[i] = 1 with high probability (asymptotically close to 1). For i ≥ (3/2) log n + δ, with δ ≥ 0, FM[i] = 0 with high probability.
  • The expected position of the first zero is log(0.77351·n) + P(log n) + o(1), where P(u) is a periodic function of u with period 1 and amplitude bounded by 10^(-5).
  • The error bound (which depends on the variance) can be improved by using multiple copies or stochastic averaging.


Sum

  • Naïve approach: for an item x that occurs c times, make c distinct copies (x, j), j = 1, …, c, and use the distinct-count algorithm.
  • When c is large, set the bits directly, as if we had performed c successive insertions into the FM sketch.
  • First set the lowest δ = log c - log log c bits to 1.
  • The insertions that reach bit δ follow a binomial distribution: each item reaches δ with probability 2^(-δ).
  • Explicitly insert those that reach bit δ by coin flipping.
  • A powerful building block.

Second moment

  • The k-th frequency moment is µ_k = Σ_i x_i^k, where x_i is the number of sensor readings with value i.
– µ0 is the number of distinct elements.
– µ1 is the sum.
– µ2 is the square of the L2 norm.
  • The famous sketch algorithm for frequency moments can be turned into an ODI synopsis easily by using ODI-SUM.

The space complexity of approximating the frequency moments, N. Alon, Y. Matias, and M. Szegedy, STOC 1996.


Second moment

  • Random hash h(x): {0, 1, …, N-1} → {-1, +1}.
  • Define z_i = h(i).
  • Maintain X = Σ_i x_i·z_i.
  • E(X²) = E(Σ_i x_i·z_i)² = Σ_i x_i²·E(z_i²) + Σ_{i≠j} x_i·x_j·E(z_i·z_j).
  • Choose the hash function to be pairwise independent: Pr{h(i)=a, h(j)=b} = ¼.
  • Then E(z_i²) = 1 and E(z_i·z_j) = 0.
  • So E(X²) = Σ_i x_i² = µ2.
  • ODI: each sensor generates x_i·z_i, then use ODI-SUM.


Uniform sample

  • Each sensor has a reading. Compute a uniform sample of a given size k.
  • Synopsis: a sample of k tuples.
  • SG(): output (value, r, id), where r is a uniform random number in the range [0, 1].
  • SF(): output the k tuples with the k largest r values. If there are fewer than k tuples in total, output them all.
  • SE(): output the values in s.
  • ODI-correctness is implied by the "MAX" and union operations in SF().
  • Correctness: the elements carrying the k largest random numbers form a uniform k-sample.
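A sketch of this synopsis (names illustrative; tuples are ordered (r, id, value) so Python's sort compares r first):

```python
import random

def sg(value, node_id, rng):
    """One-tuple synopsis: (r, id, value), with r uniform in [0, 1]."""
    return [(rng.random(), node_id, value)]

def sf(s1, s2, k):
    """Keep the k tuples with the largest r. Duplicates (same node id)
    collapse because the same node always emits the same tuple."""
    merged = {t[1]: t for t in s1 + s2}
    return sorted(merged.values(), reverse=True)[:k]

def se(s):
    return [value for (_, _, value) in s]
```

Fusing in any order, with or without duplicate copies, yields the same top-k sample, since the global top-k survives any intermediate truncation to k.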


Most popular items

  • Return the k values that occur most frequently among all the sensor readings.
  • Synopsis: the set of the k most popular items.
  • SG(): output a (value, weight) pair, with weight = CT(value) computed over k > log n coin tosses.
  • SF(): for each distinct value v, discard all but the pair with the maximum weight; then output the k pairs with the largest weights.
  • SE(): output the set of values.
  • Note: the attached weight estimates the frequency of the value.
  • Many aggregates that can be approximated from random samples now have an ODI synopsis, e.g., the median.


Implicit acknowledgement

  • Explicit acknowledgement:
– 3-way handshake.
– Used on the Internet.
  • Implicit acknowledgement:
– Used in ad hoc wireless networks.
– A node u sending to v snoops on the subsequent broadcast from v to see whether v indeed forwards the message for u.
– Exploits the broadcast property; saves energy.
  • With aggregation this is problematic.
– Say u sends value x to v, and subsequently hears value z.
– u does not know whether or not x is incorporated into z.


Implicit acknowledgement

  • An ODI synopsis enables efficient implicit acknowledgement:
– u sends synopsis x to v.
– Afterwards, u hears v transmitting synopsis z.
– u verifies whether SF(x, z) = z.
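A toy check with a bitwise-OR synopsis (the bit patterns are made up for illustration):

```python
def sf(s, t):
    return s | t                       # bitwise-OR synopsis fusion

x = 0b00100                            # u's synopsis
z_incorporated = 0b10110               # v's broadcast that includes x
z_dropped = 0b10010                    # v's broadcast that missed x

assert sf(x, z_incorporated) == z_incorporated   # implicit ACK: x absorbed
assert sf(x, z_dropped) != z_dropped             # u should retransmit
```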


Decouple routing from aggregation

  • Use arbitrary multi-path routing schemes; message redundancy can adapt to sensor network conditions.
  • Use a directed acyclic graph (DAG) instead of a tree.
  • Rings overlay:
– Query distribution: nodes in ring R_j are j hops from the query node q.
– Query aggregation: a node in ring R_j wakes up in its allocated time slot and receives messages from nodes in R_{j+1}.


Rings and adaptive rings

  • Adaptive rings cope with network dynamics, node deletions and insertions, etc.
  • Each node on ring j monitors the success rate of its parents on ring j-1.
  • If the success rate is low, the node connects to another node whose transmission it overhears often.
  • Nodes in ring 1 may transmit multiple times to ensure robustness.


Error of approximate answers

  • Two sources of error:
– Algorithmic error: due to randomization and approximation.
– Communication error: the fraction of sensor readings not accounted for in the final answer.
  • Algorithmic error depends on the choice of algorithm and is thus relatively controllable.
  • Communication error depends on the network dynamics and the robustness of the routing algorithm.


Simulation results

[Figure: fraction of unaccounted nodes]


Simulation results

[Figure: relative root mean square error]


Conclusion

  • Due to the high dynamics of sensor networks, robust aggregates that are insensitive to order and duplication are very attractive: they provide the flexibility to use any multi-path routing algorithm and re-transmission.
  • ODI synopses can be used as black-box operators to replace naïve operators in more complex data structures.