Application of Graphical Models
Part II: Applications in Database Systems
Some slides courtesy of Amol Deshpande

Outline
- Selectivity Estimation and Query Optimization
- Probabilistic Relational Models
- Probabilistic Databases
- Sensor/Stream Data Management
- References

Selectivity Estimation
- Estimate the intermediate result sizes that may be generated during query processing
- Equivalently, estimate the selectivity of predicates over tables
- Key to obtaining good plans during optimization
- Single-table predicates (on Customer): income > 90,000 and homeowner = 'yes'
- Multi-table predicates (over Customer c and Purchases p): c.homeowner = 'yes' and p.amount > 10,000 and p.ssn = c.ssn
Customer:
| SSN | .. | Income | .. | Homeowner? |
|---|---|---|---|---|
| .. | .. | 100000 | .. | Yes |
| .. | .. | 11000 | .. | Yes |
Purchases:
| SSN | Store | .. | Amount |
|---|---|---|---|

Optimizer’s Assumption
- Attribute value independence assumption
  - Attributes assumed to be independently distributed
  - Rarely true in practice
- Estimate p(income > 90,000 and homeowner = yes) as p(income > 90,000) * p(homeowner = yes)
  - Can result in severe underestimation
  - In reality: p(income > 90,000, homeowner = yes) ≈ p(homeowner = yes)
Customer:
| SSN | .. | Income | .. | Homeowner? |
|---|---|---|---|---|
| .. | .. | 100000 | .. | Yes |
| .. | .. | 11000 | .. | Yes |
| .. | .. | 50000 | .. | No |
| .. | .. | 30000 | .. | No |
| .. | .. | 200000 | .. | Yes |
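The gap between the independence-based estimate and the true selectivity can be checked directly on the five Customer rows shown on the slide. The following is a minimal sketch, taking the rows exactly as printed; the helper name `selectivity` is an assumption made for illustration.

```python
# Five Customer rows copied from the slide as (income, homeowner?).
customers = [
    (100000, True),
    (11000, True),
    (50000, False),
    (30000, False),
    (200000, True),
]


def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Per-attribute selectivities, as an optimizer would keep them separately.
p_income = selectivity(customers, lambda r: r[0] > 90000)
p_home = selectivity(customers, lambda r: r[1])

# Attribute value independence: multiply the marginals.
independence_estimate = p_income * p_home

# True selectivity of the conjunction on the same rows.
actual = selectivity(customers, lambda r: r[0] > 90000 and r[1])

print(independence_estimate, actual)
```

On these rows the product of the marginals is smaller than the true joint selectivity, which is the underestimation the slide warns about.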

Optimizer’s Assumption
- Join uniformity assumption
  - Tuples from one relation assumed equally likely to join with tuples from the other relation
  - Real datasets exhibit large skews
Purchases:
| SSN | Store | .. | Amount |
|---|---|---|---|
Customer:
| SSN | .. | Income | .. | Homeowner? |
|---|---|---|---|---|
| .. | .. | 100,000 | .. | Yes |
| .. | .. | 11,000 | .. | Yes |
| .. | .. | 50,000 | .. | No |
| .. | .. | 30,000 | .. | No |
| .. | .. | 200,000 | .. | Yes |
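A small sketch of how join uniformity can mislead under skew. The slide shows no Purchases values, so all data below is hypothetical: five customers keyed by SSN, and a purchase list in which homeowners buy far more often than non-homeowners.

```python
# Hypothetical data (not from the slides): ssn -> homeowner?
customers = {1: True, 2: True, 3: False, 4: False, 5: False}

# Hypothetical, skewed Purchases: the ssn of each purchase row.
purchases = [1, 1, 1, 1, 2, 2, 2, 3, 4, 5]

# Join uniformity: every purchase is assumed equally likely to join with any
# customer, so the homeowner fraction among joined rows is taken to be the
# homeowner fraction in Customer.
p_home = sum(customers.values()) / len(customers)
uniform_estimate = p_home * len(purchases)

# Actual number of purchase rows that join with a homeowner.
actual = sum(1 for ssn in purchases if customers[ssn])

print(uniform_estimate, actual)
```

With this skew the uniform estimate is 4 joined rows while 7 purchases actually belong to homeowners.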

Selectivity Estimation using PGMs
- Eliminating the attribute value independence assumption [GTK’01, DGR’01, LWV’03, PMW’03]
- Learn a PGM over the attributes of Customer (nodes shown: Age, Income, Homeowner?)
- Approximate CPDs using histograms
- Learning process modified to optimize for accuracy as well as storage space
- At query time: Query → Inference Algorithm → Selectivity Estimates
Customer:
| SSN | age | Income | zipcode | Homeowner? |
|---|---|---|---|---|
| .. | .. | 100000 | .. | Yes |
| .. | .. | 11000 | .. | Yes |
| .. | .. | 50000 | .. | No |
| .. | .. | 30000 | .. | No |
| .. | .. | 200000 | .. | Yes |
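As a rough illustration of the pipeline above, the sketch below fits a two-node model Income → Homeowner? to the five Customer rows and answers a selectivity query by multiplying the factors. It is not the learning procedure of the cited papers: the structure is fixed by hand, the single histogram boundary at 90,000 is an assumption chosen to match the example query, and the names `bucket` and `estimate` are hypothetical.

```python
from collections import Counter

# Customer rows from the slide as (income, homeowner?).
rows = [(100000, True), (11000, True), (50000, False),
        (30000, False), (200000, True)]


def bucket(income):
    """Two-bucket histogram over income (boundary is an assumption)."""
    return "high" if income > 90000 else "low"


bucket_counts = Counter(bucket(inc) for inc, _ in rows)
joint_counts = Counter((bucket(inc), home) for inc, home in rows)

# P(income bucket), stored as a histogram.
p_bucket = {b: n / len(rows) for b, n in bucket_counts.items()}

# CPD P(homeowner | income bucket).
p_home_given_bucket = {
    (b, home): joint_counts[(b, home)] / bucket_counts[b]
    for b in bucket_counts
    for home in (True, False)
}


def estimate(income_bucket, homeowner):
    """Selectivity of (income in bucket AND homeowner = value)."""
    return p_bucket[income_bucket] * p_home_given_bucket[(income_bucket, homeowner)]


print(estimate("high", True))
```

With only two attributes the factorization reproduces the joint distribution exactly; the storage savings the slide mentions only appear when many attributes share a sparse structure.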

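Example Probabilistic Database
- Example from Dalvi and Suciu [2004]
S:
| tuple | A | B | prob |
|---|---|---|---|
| s1 | m | 1 | 0.6 |
| s2 | n | 1 | 0.5 |
T:
| tuple | C | D | prob |
|---|---|---|---|
| t1 | 1 | p | 0.4 |
Possible worlds:
| instance | probability |
|---|---|
| {s1, s2, t1} | 0.12 |
| {s1, s2} | 0.18 |
| {s1, t1} | 0.12 |
| {s1} | 0.18 |
| {s2, t1} | 0.08 |
| {s2} | 0.12 |
| {t1} | 0.08 |
| {} | 0.12 |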
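The possible-worlds table follows from treating the three tuples as independent events. A minimal sketch that enumerates the worlds from the tuple probabilities on the slide (the function name `possible_worlds` is an assumption):

```python
from itertools import product

# Tuple existence probabilities from the slide.
tuple_probs = {"s1": 0.6, "s2": 0.5, "t1": 0.4}


def possible_worlds(probs):
    """Map each subset of tuples to its probability, assuming independence."""
    names = list(probs)
    worlds = {}
    for present in product([True, False], repeat=len(names)):
        p = 1.0
        for name, is_in in zip(names, present):
            p *= probs[name] if is_in else 1.0 - probs[name]
        world = frozenset(n for n, is_in in zip(names, present) if is_in)
        worlds[world] = p
    return worlds


worlds = possible_worlds(tuple_probs)
for world, p in worlds.items():
    print(sorted(world), round(p, 2))
```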

Probabilistic Databases
- Much of probabilistic data is naturally correlated
  - E.g., sensor data, data integration [AFM’06]
- If not, query processing introduces correlation
- Can use graphical models to capture such correlations

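Example: Mutual Exclusiveness
S:
| tuple | A | B | prob |
|---|---|---|---|
| s1 | m | 1 | 0.6 |
| s2 | n | 1 | 0.5 |
T:
| tuple | C | D | prob |
|---|---|---|---|
| t1 | 1 | p | 0.4 |
Factor f1 over (X_s1, X_t1):
| X_s1 | X_t1 | f1() |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0.4 |
| 1 | 0 | 0.6 |
| 1 | 1 | 0 |
Factor f2 over X_s2:
| X_s2 | f2() |
|---|---|
| 0 | 0.5 |
| 1 | 0.5 |
Possible worlds (if desired) computed using inference:
| instance | probability |
|---|---|
| {s1, s2, t1} | 0 |
| {s1, s2} | 0.3 |
| {s1, t1} | 0 |
| {s1} | 0.3 |
| {s2, t1} | 0.2 |
| {s2} | 0 |
| {t1} | 0.2 |
| {} | 0 |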
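With correlated tuples, the probability of a world is the product of the factor values it selects. The sketch below multiplies the two factors from the slide for every assignment of the indicator variables; the function name `world_probabilities` is an assumption, and brute-force enumeration stands in for a real inference algorithm.

```python
from itertools import product

# Factors copied from the slide.
f1 = {(0, 0): 0.0, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.0}  # over (X_s1, X_t1)
f2 = {0: 0.5, 1: 0.5}                                      # over X_s2


def world_probabilities():
    """Probability of each world as the product of the two factors."""
    worlds = {}
    for x_s1, x_s2, x_t1 in product([0, 1], repeat=3):
        members = [name for name, x in (("s1", x_s1), ("s2", x_s2), ("t1", x_t1)) if x]
        worlds[frozenset(members)] = f1[(x_s1, x_t1)] * f2[x_s2]
    return worlds


worlds = world_probabilities()

# Marginal probability that s1 exists, recovered by summing over worlds.
p_s1 = sum(p for world, p in worlds.items() if "s1" in world)
print(p_s1)
```

The factors here already sum to one over all assignments, so no normalization step is needed; in general the product would have to be divided by its total.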

Motivation
- Unprecedented, and rapidly increasing, instrumentation of our everyday world
  - Distributed measurement networks (e.g., GPS)
  - RFID
  - Wireless sensor networks
  - Network monitoring
  - Industrial monitoring

Outline
- A generic temporal model for sensor stream data
- A range of applications
  - Model-based query processing
  - Object tracking and monitoring
  - …

[Figure: graphical model over a five-node sensor network (nodes 1 to 5), with hidden variables X_{1,t} … X_{5,t} and observed variables O_{1,t} … O_{5,t}]
- X_{1,t}: true temperature at X_1 at time t
- O_{1,t}: observed temperature at X_1 at time t
- Interpretation: X_{4,t} is independent of X_{2,t} given X_{1,t} and X_{5,t}

[Figure: the model for sensor nodes 1 to 3 unrolled over time steps t-1, t, t+1, with hidden variables X_{i,t} and observed variables O_{i,t}]
Markov Property
- Interpretation: {X_{i,t+1}} is independent of {X_{i,t-1}} given {X_{i,t}}

[Figure: the model for sensor nodes 1 to 3 unrolled over time steps t-1, t, t+1, with hidden variables X_{i,t} and observed variables O_{i,t}]
State evolution can be modeled as a Dynamic Bayesian Network

[Figure: the model for sensor nodes 1 to 3 unrolled over time steps t-1, t, t+1, with hidden variables X_{i,t} and observed variables O_{i,t}]
Parameters? (1) System model
- Prior: p(X_{1,0}, X_{2,0}, X_{3,0})
- Evolution: p(X_{1,t}, X_{2,t}, X_{3,t} | X_{1,t-1}, X_{2,t-1}, X_{3,t-1})

[Figure: the model for sensor nodes 1 to 3 unrolled over time steps t-1, t, t+1, with hidden variables X_{i,t} and observed variables O_{i,t}]
Parameters? (2) Measurement model
- p(O_{1,t}, O_{2,t}, O_{3,t} | X_{1,t}, X_{2,t}, X_{3,t})
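Once a prior, an evolution model, and a measurement model are fixed, filtering alternates a prediction step with an observation update. The sketch below shows this for a single sensor under linear-Gaussian assumptions (a one-dimensional Kalman filter). The slides do not commit to Gaussian models or to these parameter values; all numbers and function names are illustrative assumptions, and the joint model over several nodes is not shown.

```python
def predict(mean, var, process_var):
    """System model (assumed): X_t = X_{t-1} + noise with variance process_var."""
    return mean, var + process_var


def update(mean, var, obs, obs_var):
    """Measurement model (assumed): O_t = X_t + noise with variance obs_var."""
    gain = var / (var + obs_var)
    return mean + gain * (obs - mean), (1.0 - gain) * var


def filter_stream(observations, prior_mean, prior_var, process_var, obs_var):
    """Return the posterior (mean, variance) of X_t after each observation."""
    mean, var = prior_mean, prior_var
    estimates = []
    for obs in observations:
        mean, var = predict(mean, var, process_var)
        mean, var = update(mean, var, obs, obs_var)
        estimates.append((mean, var))
    return estimates


# Hypothetical temperature readings from one sensor node.
estimates = filter_stream([22.9, 22.6, 22.8], prior_mean=20.0, prior_var=4.0,
                          process_var=0.01, obs_var=0.25)
final_mean, final_var = estimates[-1]
print(final_mean, final_var)
```

The posterior variance shrinks below the measurement variance after a few readings, which is what lets a model answer queries with stated confidence.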

Application: Model-based Query Processing [DGMHH’04, SBEMY’06]
- User issues a declarative query: Select nodeID, temp ± .1C, conf(.95) Where nodeID in {1..6}
- The query processor consults the probabilistic model and sends an observation plan to the sensor network (nodes 1 to 6): {[temp, 1], [voltage, 3], [voltage, 6]}
- Data returned by the network: 1, temp = 22.73; 3, voltage = 2.73; 6, voltage = 2.65
- Query results returned to the user: 1, 22.73, 100%; …; 6, 22.1, 99%
Advantages:
- Exploit correlations
- Handle noise, biases in the data
- Predict missing or future values
- Reduce communication cost
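One way to picture the query processor's decision is as a confidence check per node: if the model's current belief already puts enough probability mass within the requested error bound, answer from the model; otherwise add the node to the observation plan. The sketch below assumes independent Gaussian beliefs per node and hypothetical belief values. It does not reproduce the planning algorithms of the cited systems, which also exploit correlated attributes such as voltage.

```python
import math


def prob_within(std, epsilon):
    """P(|X - mean| <= epsilon) for X ~ Normal(mean, std**2)."""
    return math.erf(epsilon / (std * math.sqrt(2.0)))


def plan_observations(beliefs, epsilon, confidence):
    """Node IDs whose model-based answer is not confident enough."""
    return [node for node, (_mean, std) in sorted(beliefs.items())
            if prob_within(std, epsilon) < confidence]


# Hypothetical beliefs: node -> (mean temperature, standard deviation).
beliefs = {1: (22.7, 0.30), 2: (22.4, 0.04), 3: (22.1, 0.05)}

# Query from the slide: temp within ± 0.1 C at confidence 0.95.
to_observe = plan_observations(beliefs, epsilon=0.1, confidence=0.95)
print(to_observe)
```

Only node 1 needs a physical reading here; the other two are answered from the model, which is where the communication savings come from.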

Object Tracking and Monitoring
- Mobile RFID readers
  - Handheld, robot-mounted
- Incomplete, noisy data
  - Environmental factors
  - Orientation of reading
- Not directly queryable
  - Raw data: <tag id, reader id, ts>
  - Data needed for querying: e.g., precise object locations

Graphical Modeling
- A generative model p(X, O)
  - X: true object location (x, y, z)
  - O: boolean for RFID readings
- How the state of the world changes
  - Object movement, reader motion
- How sensing generates data from the state of the world
  - Sensor measurement model
Reference: T. Tran, C. Sutton, R. Cocci, Y. Nie, Y. Diao, and P. Shenoy. Probabilistic Inference over RFID Streams in Mobile Environments. ICDE 2009.

Inference over RFID Streams
- Probabilistic inference over streams: p(X | O)
- Particle filtering: sampling-based inference
- Key to performance: using a small number of samples
| | Particle filtering | Our optimizations |
|---|---|---|
| Accuracy | 0.6 - 0.8 foot | 0.1 - 0.5 foot |
| Performance | 0.1 reading/sec for 20 objects | > 1000 readings/sec for 20,000 objects |
7 orders of magnitude improvement!
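A basic particle filter approximates p(X | O) with weighted samples: move each particle according to the motion model, weight it by the likelihood of the new reading, then resample. The sketch below is a deliberately simplified one-dimensional version with a nearly static object and a continuous noisy position reading; the slides' model uses 3-D locations and boolean RFID readings, and the optimizations reported in the table are not shown. All parameter values are assumptions.

```python
import math
import random


def particle_filter(observations, n_particles, obs_std, jitter_std, rng):
    """Estimate a 1-D position from noisy readings with a bootstrap filter."""
    # Prior (assumed): position uniform on [0, 10].
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    for z in observations:
        # Motion model (assumed): near-static object with a small random walk.
        particles = [x + rng.gauss(0.0, jitter_std) for x in particles]
        # Measurement model (assumed): Gaussian noise around the true position.
        weights = [math.exp(-0.5 * ((z - x) / obs_std) ** 2) for x in particles]
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return sum(particles) / n_particles


rng = random.Random(0)
true_position = 5.0
observations = [true_position + rng.gauss(0.0, 0.5) for _ in range(20)]
estimate = particle_filter(observations, n_particles=500, obs_std=0.5,
                           jitter_std=0.05, rng=rng)
print(estimate)
```

The cost per reading is linear in the number of particles, which is why the slide singles out a small sample count as the key to performance.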

Open Discussion
- Where does our contribution lie when applying graphical models?
- Devise the right model
  - Local probability distributions
  - Parameter estimation
- Efficiency and scalability
  - Number of variables (e.g., objects)
  - Inference on streams (one pass, constant time/item)
- Distributed query processing
  - The giant graphical model is distributed
