Mergeable Summaries, Jeff M. Phillips, University of Utah - PowerPoint PPT Presentation

Mergeable Summaries
Jeff M. Phillips, University of Utah
Joint work with Pankaj K. Agarwal (Duke), Graham Cormode (AT&T), Zengfeng Huang (HKUST), Zhewei Wei (HKUST), Ke Yi (HKUST)

Summaries for MASSIVE Data
Summaries allow approximate computation with guarantees and small space.
coreset: a small summary that serves as a proxy for the full data set, with approx guarantees:
• ε-samples of (P, R): approx density
• ε-kernel: approx convex shape
sketch: a (random) (linear) combination of the full data, from which functions can be recovered with approx guarantees:
• Euclidean distance: Johnson-Lindenstrauss random projection
• Count-Min sketch: approx item counts
• Greenwald-Khanna sketch: approx quantiles
• Misra-Gries sketch: approx frequent items
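To make the ε-sample notion concrete, here is a minimal sketch that assumes the range space R is the set of closed intervals on the real line (the slide leaves R general). It checks whether S approximates the density of P, meaning every interval contains nearly the same fraction of S as of P. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Sequence


def interval_discrepancy(P: Sequence[float], S: Sequence[float]) -> float:
    """Largest gap, over all closed intervals I, between |P ∩ I|/|P| and |S ∩ I|/|S|."""
    pts = sorted(set(P) | set(S))
    worst = 0.0
    # Any interval hits a contiguous run of the sorted points, so intervals whose
    # endpoints are those points cover every case (an empty hit has gap 0).
    for lo, hi in combinations_with_replacement(pts, 2):
        frac_p = sum(lo <= x <= hi for x in P) / len(P)
        frac_s = sum(lo <= x <= hi for x in S) / len(S)
        worst = max(worst, abs(frac_p - frac_s))
    return worst


def is_eps_sample(P: Sequence[float], S: Sequence[float], eps: float) -> bool:
    """True if S approximates the density of P within eps on every interval."""
    return interval_discrepancy(P, S) <= eps


if __name__ == "__main__":
    P = list(range(10))
    S = [1, 3, 5, 7, 9]  # every other point
    print(round(interval_discrepancy(P, S), 3))  # 0.1
```

The brute force over all endpoint pairs is quadratic in the number of distinct points; it is meant only to state the definition precisely, not to be an efficient check.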

Massive Distributed Computation
Settings: data centers, sensor networks, multi-core.
Data sets P and Q are summarized separately as S(P, ε) and S(Q, ε); merging the two yields S(P ∪ Q, ε) with the same error ε, and the size of S(X, ε) is always m.
• Similar to MUD and Dremel, but more restrictive and "natural"
• Generalizes streaming
• Allows archiving summaries
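The claim that mergeability generalizes streaming can be sketched with a hypothetical toy summary (count, min, max), which is exact and always the same size. Streaming is then just merging the running summary with a one-element summary, and a distributed setting merges summaries pairwise in a tree; all names below are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MinMaxCount:
    """Hypothetical toy summary: exact, fixed size, trivially mergeable."""
    count: int
    lo: float
    hi: float

    @staticmethod
    def of(x: float) -> "MinMaxCount":
        return MinMaxCount(1, x, x)

    def merge(self, other: "MinMaxCount") -> "MinMaxCount":
        return MinMaxCount(self.count + other.count,
                           min(self.lo, other.lo), max(self.hi, other.hi))


def stream(xs: Sequence[float]) -> MinMaxCount:
    """Streaming = merging the running summary with a one-item summary per arrival."""
    summary = MinMaxCount.of(xs[0])
    for x in xs[1:]:
        summary = summary.merge(MinMaxCount.of(x))
    return summary


def tree_merge(parts: List[MinMaxCount]) -> MinMaxCount:
    """Distributed setting: merge summaries pairwise, level by level."""
    while len(parts) > 1:
        nxt = [parts[i].merge(parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            nxt.append(parts[-1])  # odd one out is carried to the next level
        parts = nxt
    return parts[0]
```

Because merging here is associative, any merge tree gives the same summary; for approximate summaries the point of the model is that the error stays ε regardless of the tree's shape.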

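Random Sample
Attach an independent random value ran to every element. S(X, ε) keeps the elements with the largest ran values (keeping the max element generalizes to keeping the top k elements), which gives a uniform random sample of X.
P:
val  15   7    10   14   20   17   42   3    8    1
ran  .99  .82  .75  .61  .53  .42  .23  .14  .02  .01
Q:
val  31   9    16   11   14   7    2    13   21   4
ran  .90  .85  .80  .57  .50  .37  .31  .12  .10  .08
To merge S(P, ε) and S(Q, ε), keep the top k elements of their union by ran. Here k = 6, so S(P ∪ Q, ε) is:
val  15   31   9    7    16   10
ran  .99  .90  .85  .82  .80  .75
The size of S(X, ε) is always m (here m = k).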
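The top-k merge above can be sketched in a few lines. This is a minimal illustration, assuming elements are stored as (ran, val) pairs; the helper names are hypothetical. Since the top k of S(P) ∪ S(Q) is exactly the top k of P ∪ Q, merging loses nothing compared with sampling the union directly.

```python
import heapq
import random
from typing import Iterable, List, Tuple

Tagged = Tuple[float, int]  # (random tag "ran", value "val")


def tag(values: Iterable[int], rng: random.Random) -> List[Tagged]:
    """Attach an independent uniform random tag to every element."""
    return [(rng.random(), v) for v in values]


def summarize(tagged: Iterable[Tagged], k: int) -> List[Tagged]:
    """S(X): the k elements with the largest tags, i.e. a uniform sample of size k."""
    return heapq.nlargest(k, tagged)


def merge(s_p: List[Tagged], s_q: List[Tagged], k: int) -> List[Tagged]:
    """Top k of S(P) ∪ S(Q) equals the top k of P ∪ Q, so the merge is exact."""
    return heapq.nlargest(k, s_p + s_q)


# Tags and values copied from the slide's P and Q tables.
P = list(zip([.99, .82, .75, .61, .53, .42, .23, .14, .02, .01],
             [15, 7, 10, 14, 20, 17, 42, 3, 8, 1]))
Q = list(zip([.90, .85, .80, .57, .50, .37, .31, .12, .10, .08],
             [31, 9, 16, 11, 14, 7, 2, 13, 21, 4]))

if __name__ == "__main__":
    merged = merge(summarize(P, 6), summarize(Q, 6), 6)
    print([v for _, v in merged])  # [15, 31, 9, 7, 16, 10]
```

In practice the tags would come from `tag(...)`; the fixed tags here only reproduce the slide's example.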

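Linear Sketches
Count-Min sketch of vector P[1 ... U]:
• Linear sketch stored as a w × d array CM[i, j]
• Use d hash functions h_1, ..., h_d, each mapping an index x ∈ [1 ... U] to [1 ... w]; an update to P[x] adds to CM[h_j(x), j] for every j
• Estimate $\hat{P}[i] = \min_j CM[h_j(i), j]$
Mergeable: $CM(P + Q) = CM(P) + CM(Q)$, provided both sketches use the same w, d and hash functions.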
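A minimal Count-Min sketch with the merge above, assuming a Carter-Wegman style hash ((a·x + b) mod p) mod w with p = 2^31 − 1 (the slide does not fix the hash family) and non-negative integer indices below p. Sharing a seed is how this sketch guarantees that two summaries use the same hash functions; the class and parameter names are hypothetical.

```python
import random
from typing import List, Tuple


class CountMin:
    """Count-Min sketch with d rows of width w (the slide's w x d array, stored row by row)."""

    PRIME = 2_147_483_647  # 2**31 - 1; assumed hash family ((a*x + b) mod p) mod w

    def __init__(self, w: int, d: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # same seed -> same hash functions
        self.w, self.d, self.seed = w, d, seed
        self.params: List[Tuple[int, int]] = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME)) for _ in range(d)
        ]
        self.table: List[List[int]] = [[0] * w for _ in range(d)]

    def _h(self, j: int, x: int) -> int:
        a, b = self.params[j]
        return ((a * x + b) % self.PRIME) % self.w

    def update(self, x: int, c: int = 1) -> None:
        """P[x] += c: add c to one cell in every row."""
        for j in range(self.d):
            self.table[j][self._h(j, x)] += c

    def estimate(self, i: int) -> int:
        """min_j CM[h_j(i), j]; never below P[i] when all updates are non-negative."""
        return min(self.table[j][self._h(j, i)] for j in range(self.d))

    def merge(self, other: "CountMin") -> "CountMin":
        """CM(P + Q) = CM(P) + CM(Q), valid only with identical w, d and hash functions."""
        if (self.w, self.d, self.seed) != (other.w, other.d, other.seed):
            raise ValueError("sketches must share w, d and hash functions")
        out = CountMin(self.w, self.d, self.seed)
        out.table = [[x + y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        return out


if __name__ == "__main__":
    p, q = CountMin(16, 4, seed=1), CountMin(16, 4, seed=1)
    for x in [1, 2, 2, 3, 3, 3]:
        p.update(x)
    for x in [3, 4, 4, 5]:
        q.update(x)
    print(p.merge(q).estimate(3))  # at least 4, the true count of 3 in P + Q
```

Because the sketch is linear, the merged table is cell-for-cell identical to the table built from the combined stream.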

Heavy Hitters Summaries
Misra-Gries (MG) sketch of P[1 ... U]:
• Keep k (index, count) pairs
• If an existing index arrives, update its count
• If a new index arrives, make a new pair if fewer than k pairs are in use; otherwise decrement all counts and drop any pair whose count reaches 0
Example (k = 5 here): (1,5) (3,6) (8,1) (11,1) (14,3). Index 11 arrives, giving (11,2). Then a new index arrives while all 5 pairs are in use, so all counts are decremented and index 8 drops out: (1,4) (3,5) (11,1) (14,2).
This MG(P) serves as S(P, ε), with $|P[i] - MG[i]| \le \varepsilon = \hat{m}/(k+1)$, where MG[i] = 0 for an index that is not stored.
Mergeable: stack MG(P) + MG(Q) by adding the counts of matching indices, then subtract C_{k+1}, the (k+1)-th largest count, from all counts and keep only the positive ones.
Example (k = 5): stacking gives (1,6) (3,6) (5,2) (9,5) (11,1) (14,6). Here C_6 = 1, so S(P ∪ Q, ε) is (1,5) (3,5) (5,1) (9,4) (14,5), again at most k pairs, since the size of S(X, ε) is always m.
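The update and merge rules above can be sketched in Python with counters kept in a dict. This is a minimal illustration; the function names are hypothetical, and the usage line replays the slide's merge example with k = 5.

```python
from typing import Dict

Counters = Dict[int, int]  # index -> count, at most k entries


def mg_update(counters: Counters, x: int, k: int) -> None:
    """Process one arrival of index x in place."""
    if x in counters:
        counters[x] += 1                 # existing index: update its count
    elif len(counters) < k:
        counters[x] = 1                  # room left: make a new pair
    else:
        for key in list(counters):       # all k pairs in use: decrement all counts
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]


def mg_merge(a: Counters, b: Counters, k: int) -> Counters:
    """Stack MG(P) + MG(Q), then subtract C_{k+1} and keep positive counts."""
    stacked = dict(a)
    for key, c in b.items():
        stacked[key] = stacked.get(key, 0) + c
    if len(stacked) <= k:
        return stacked                   # already small enough; nothing to subtract
    c_k1 = sorted(stacked.values(), reverse=True)[k]  # (k+1)-th largest count
    return {key: c - c_k1 for key, c in stacked.items() if c > c_k1}


if __name__ == "__main__":
    # Stacked counts from the slide's merge example.
    stacked = {1: 6, 3: 6, 5: 2, 9: 5, 11: 1, 14: 6}
    print(mg_merge(stacked, {}, k=5))  # {1: 5, 3: 5, 5: 1, 9: 4, 14: 5}
```

Subtracting a single value C_{k+1} from every counter keeps at most k positive counts, so the merged summary stays within the same size bound.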
