Streaming Queries over Streaming Data
Sirish Chandrasekaran, UC Berkeley
with Michael J. Franklin
VLDB 2002, August 20, 2002
Motivation (1): Queries over Data Streams
Queries over the past
  Select all the Hondas that passed I-80 at Ashby Ave. between 2:15 and 2:45 pm today
Queries over the future: Continuous Query
  Landmark Window Query
    Continuously Select all the Hondas that pass I-80 at Ashby Ave. starting now
  Sliding Window Query
    Continuously Select all the Hondas that pass I-80 at Ashby Ave. in the latest half hour starting now
Hybrid Query
  Continuously Select all the Hondas that have passed and will pass I-80 at Ashby Ave. since 2:30 pm today
Conventional Databases
Data is stored/indexed in the system
Queries are applied to stored data as they “stream through”
Only support queries over the past
[Diagram: Data, Query, Index, Result]
CQ Engines
Queries are stored/indexed in the system
Data is applied to stored queries as they “stream through”
Only support landmark window queries over the future
[Diagram: Queries, Data, Index, Result]
PSoup Insight #1
Queries and data are duals
  Store new queries, apply to data that arrived earlier
  Store new data, apply to queries that arrived earlier
  Multiquery Processing = “join” of query and data
  Supports all three types of queries: queries over the past, (landmark and sliding window) continuous, and hybrid
[Diagram: Data with its Index, Queries with their Index, Result]
Motivation (2): Disconnected Operation
Previous solutions stream out answers immediately
Not feasible/suitable for all applications
  Intermittent connectivity: e.g., applications on hand-held devices (as in this morning’s keynote address)
  Even if connected: not always interested in streaming answers
PSoup Insight #2
Separate computation from delivery
  Query answers continuously generated in background
  Apply windows on-demand to transmit “current” results
  Efficient support for disconnected operation
  Low response time, shared computation and storage across invocations
[Diagram: Data table (ID, R.a, R.b), Query table (ID, Predicate), and a Results Structure of T/F entries indexed by Queries and Data; operations: Register, Invoke]
Outline of Talk
PSoup Overview
  Query Model
  Query Registration
  Background Processing
    Selection Queries, Join Queries
  Query Invocation
PSoup in Telegraph
Experiments and Results
Conclusions and Future Work
PSoup Query Model
SELECT select_list
FROM from_list
WHERE where_clause
BEGIN begin_time
END end_time
WHERE clause: conjunction of boolean factors
BEGIN-END clause: system clock or sequence numbers
(begin_time, end_time):
  (constant, constant): snapshot query
  (constant, variable): landmark window query
  (variable, variable): sliding window query
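The mapping from the (begin_time, end_time) pair to a window type can be sketched as follows. This is only an illustration: the `Now` marker, the `Bound` alias, and the function names are hypothetical and do not come from the talk. A "variable" bound is assumed to mean a bound expressed relative to the current time or sequence number.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Now:
    """Hypothetical marker for a bound that moves with the clock: NOW + offset."""
    offset: int = 0


# int = constant bound, Now = variable bound (assumed encoding)
Bound = Union[int, Now]


def classify_window(begin: Bound, end: Bound) -> str:
    """Map a (begin_time, end_time) pair to the window type listed on the slide."""
    begin_is_var = isinstance(begin, Now)
    end_is_var = isinstance(end, Now)
    if not begin_is_var and not end_is_var:
        return "snapshot"
    if not begin_is_var and end_is_var:
        return "landmark"
    if begin_is_var and end_is_var:
        return "sliding"
    raise ValueError("(variable, constant) is not one of the listed window types")


def resolve(bound: Bound, now: int) -> int:
    """Turn a bound into a concrete time or sequence number at invocation."""
    return bound if isinstance(bound, int) else now + bound.offset


print(classify_window(100, 200))         # both bounds constant
print(classify_window(100, Now()))       # fixed start, moving end
print(classify_window(Now(-30), Now()))  # both bounds move
```

Resolving the bounds only when the query is invoked is what lets the same registered query return different "current" results over time.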
Query Registration
SELECT select_list FROM from_list WHERE where_clause
  Standing Query Clause (SQC): to the Symmetric Join
BEGIN begin_time END end_time
  to the Windows_Table
QueryID: handle for future query invocations
Selections over Single Stream:
Arrival of New Query Specification
(a) Initial State
Data Store (columns ID, R.a, R.b): tuples 48 to 52
Query Store:
  ID  Predicate
  20  0<R.a<=5
  21  R.a>4 and R.b=3
  22  0>R.b>4
  23  R.a=4 and R.b=3
Selections over Single Stream:
Arrival of New Query Specification
(b) Arrival of new Query
New query: Select * From R Where R.a<=4 and R.b>=3
Data Store and Query Store are unchanged from (a)
Selections over Single Stream:
Arrival of New Query Specification
(c) Building Query Store
BUILD: the new query is inserted into the Query Store as ID 24, R.a<=4 and R.b>=3
Query Store now holds IDs 20 to 24; Data Store is unchanged
Selections over Single Stream:
Arrival of New Query Specification
(d) Probing Data Store
PROBE: query 24 (R.a<=4 and R.b>=3) probes the Data Store; two data tuples match
Selections over Single Stream:
Arrival of New Query Specification
(e) Inserting Results
Results (data tuples matching query 24):
  ID  R.a  R.b
  48  4    3
  50  3    8
Results Structure (Queries x Data): T/F entries for query 24 are added against data tuples 48 to 52
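Steps (a) through (e) above can be sketched as a build followed by a probe. This is a minimal illustration, not the PSoup implementation: the class and method names are hypothetical, the stores are plain dictionaries rather than indexes, and the row for tuple 49 is an illustrative value (tuples 48 and 50 are the two matches shown on the slide).

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class SelectionStores:
    """Hypothetical container for a Data Store, Query Store and Results Structure."""

    def __init__(self, data: Dict[int, Row]) -> None:
        self.data: Dict[int, Row] = dict(data)
        self.queries: Dict[int, Predicate] = {}
        # (query_id, data_id) -> True/False, mirroring the T/F grid on the slide
        self.results: Dict[Tuple[int, int], bool] = {}

    def register_query(self, query_id: int, predicate: Predicate) -> List[int]:
        """BUILD into the Query Store, then PROBE the Data Store."""
        self.queries[query_id] = predicate                 # (c) build
        matches: List[int] = []
        for data_id, row in self.data.items():             # (d) probe
            matched = predicate(row)
            self.results[(query_id, data_id)] = matched    # (e) insert results
            if matched:
                matches.append(data_id)
        return matches


stores = SelectionStores({
    48: {"a": 4, "b": 3},
    49: {"a": 7, "b": 1},  # illustrative values, not taken from the slide
    50: {"a": 3, "b": 8},
})
matches_24 = stores.register_query(24, lambda r: r["a"] <= 4 and r["b"] >= 3)
print(matches_24)
```

A linear scan stands in for the probe here; an indexed Data Store would avoid touching every stored tuple.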
Selections over Single Stream:
Arrival of New Data
(a) Initial State
Data Store (columns ID, R.a, R.b): tuples 48 to 52
Query Store:
  ID  Predicate
  20  0<R.a<=5
  21  R.a>4 and R.b=3
  22  0>R.b>4
  23  R.a=4 and R.b=3
  24  R.a<=4 and R.b>=3
Selections over Single Stream:
Arrival of New Data
(b) Arrival of new Data
New data: ID 53, R.a=3, R.b=6
Data Store and Query Store are unchanged from (a)
Selections over Single Stream:
Arrival of New Data
(c) Building Data Store
BUILD: the new tuple 53 (R.a=3, R.b=6) is inserted into the Data Store
Query Store is unchanged
Selections over Single Stream:
Arrival of New Data
(d) Probing Query Store
PROBE: tuple 53 (R.a=3, R.b=6) probes the Query Store; two queries match
Selections over Single Stream:
Arrival of New Data
(e) Inserting Results
Results (queries matching tuple 53):
  ID  Predicate
  20  0<R.a<=5
  24  R.a<=4 and R.b>=3
Results Structure (Queries x Data): the entries for tuple 53 against queries 20 to 24 are T F F F T
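The data-arrival case is the mirror image of query registration: the tuple is built into the Data Store and then probes the Query Store. The sketch below is illustrative only; `insert_data` and the dictionary layout are hypothetical names, and the predicates are transcribed from the slide (query 22 is kept as printed, which can never be true).

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]

# Query Store as listed on the slide, with each predicate written as a function
QUERY_STORE: Dict[int, Callable[[Row], bool]] = {
    20: lambda r: 0 < r["a"] <= 5,
    21: lambda r: r["a"] > 4 and r["b"] == 3,
    22: lambda r: 0 > r["b"] > 4,  # as printed on the slide; always False
    23: lambda r: r["a"] == 4 and r["b"] == 3,
    24: lambda r: r["a"] <= 4 and r["b"] >= 3,
}


def insert_data(
    data_store: Dict[int, Row],
    results: Dict[Tuple[int, int], bool],
    data_id: int,
    row: Row,
) -> List[int]:
    """BUILD into the Data Store, then PROBE the Query Store."""
    data_store[data_id] = row                        # (c) build
    matches: List[int] = []
    for query_id, predicate in QUERY_STORE.items():  # (d) probe
        matched = predicate(row)
        results[(query_id, data_id)] = matched       # (e) insert results
        if matched:
            matches.append(query_id)
    return matches


data_store: Dict[int, Row] = {}
results: Dict[Tuple[int, int], bool] = {}
matches_53 = insert_data(data_store, results, 53, {"a": 3, "b": 6})
print(matches_53)
```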
Query Invocation
Results Structure: T/F entries indexed by Queries (20 to 24) and Data (48 to 53)
Current Window: BEGIN begin_time END end_time
System returns the results corresponding to the current value of the BEGIN-END clause
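Invocation can be pictured as a lookup that applies the current window to entries that were already computed in the background. The sketch below assumes, purely for illustration, that data IDs double as sequence numbers and that the Results Structure is a dictionary keyed by (query ID, data ID); the function name `invoke` is hypothetical.

```python
from typing import Dict, List, Tuple


def invoke(
    results: Dict[Tuple[int, int], bool],
    query_id: int,
    begin: int,
    end: int,
) -> List[int]:
    """Return IDs of data tuples inside [begin, end] that matched the query."""
    return sorted(
        data_id
        for (qid, data_id), matched in results.items()
        if qid == query_id and matched and begin <= data_id <= end
    )


# Entries for query 24 (R.a<=4 and R.b>=3), which matches tuples 48, 50 and 53
results = {
    (24, 48): True, (24, 49): False, (24, 50): True,
    (24, 51): False, (24, 52): False, (24, 53): True,
}

now = 53
print(invoke(results, 24, 48, now))       # landmark window starting at 48
print(invoke(results, 24, now - 2, now))  # sliding window over the latest 3 tuples
```

No predicate is evaluated at invocation time; only the window bounds change between calls.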
Joins over R and S:
Arrival of New Query Specification
(a) Initial State
Query Store:
  ID  Predicate
  20  R.a=5 and R.b<S.b
  21  R.a>4 and R.b<S.b and S.a<10
  22  R.b=4 and R.a+5>S.a and S.b>2
R-Data Store:
  ID  R.a  R.b
  10  2    5
  14  3    3
  31  4    1
  48  9    7
S-Data Store:
  ID  S.a  S.b
  21  2    2
  25  3    3
  36  4    4
  49  5    5
Joins over R and S:
Arrival of New Query Specification
(b) Arrival of new Query
New query: 23  R.a<5 and R.a>S.a and S.b>1
Query Store, R-Data Store, and S-Data Store are unchanged from (a)
Joins over R and S:
Arrival of New Query Specification
(c) Building Query Store
BUILD: query 23 (R.a<5 and R.a>S.a and S.b>1) is inserted into the Query Store
R-Data Store and S-Data Store are unchanged
Joins over R and S:
Arrival of New Query Specification
(d) Probing R-Data Store
PROBE: query 23 (R.a<5 and R.a>S.a and S.b>1) probes the R-Data Store
Matches: the R tuples that satisfy R.a<5
Joins over R and S:
Arrival of New Query Specification
(e) Constructing Hybrid Structs
Matches from the R-Data Store: 10, 14, 31
Hybrid Structs:
  R.ID  Q.ID  Q.Predicate
  10    23    2>S.a and S.b>1
  14    23    3>S.a and S.b>1
  31    23    4>S.a and S.b>1
Joins over R and S:
Arrival of New Query Specification
(f) Probing S-Data Store
PROBE: the Hybrid Structs probe the S-Data Store
Matches (R,S,Q Results):
  14,21,23
  31,21,23
  31,25,23
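The walk-through above can be reproduced with a small sketch in which a hybrid struct is a query predicate with the R side already bound. The types and function names (`Factor`, `HybridStruct`, `build_hybrids`, `probe_s`) are hypothetical, and closures stand in for whatever encoding the real system uses; the stores and query 23 are the ones shown on the slides.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

Row = Dict[str, int]


class Factor(NamedTuple):
    """One boolean factor, tagged with the streams it references."""
    uses_r: bool
    uses_s: bool
    test: Callable[[Optional[Row], Optional[Row]], bool]


class HybridStruct(NamedTuple):
    r_id: int
    q_id: int
    residual: Callable[[Row], bool]  # what is left to check against S


def build_hybrids(q_id: int, factors: List[Factor], r_store: Dict[int, Row]) -> List[HybridStruct]:
    """Probe the R-Data Store with the R-only factors, then bind R into the rest."""
    r_only = [f for f in factors if f.uses_r and not f.uses_s]
    rest = [f for f in factors if f.uses_s]
    hybrids = []
    for r_id, r in r_store.items():
        if all(f.test(r, None) for f in r_only):
            hybrids.append(
                HybridStruct(r_id, q_id, lambda s, r=r: all(f.test(r, s) for f in rest))
            )
    return hybrids


def probe_s(hybrids: List[HybridStruct], s_store: Dict[int, Row]) -> List[Tuple[int, int, int]]:
    """Probe the S-Data Store with each hybrid struct; emit (R, S, Q) results."""
    return [
        (h.r_id, s_id, h.q_id)
        for h in hybrids
        for s_id, s in s_store.items()
        if h.residual(s)
    ]


R_STORE = {10: {"a": 2, "b": 5}, 14: {"a": 3, "b": 3}, 31: {"a": 4, "b": 1}, 48: {"a": 9, "b": 7}}
S_STORE = {21: {"a": 2, "b": 2}, 25: {"a": 3, "b": 3}, 36: {"a": 4, "b": 4}, 49: {"a": 5, "b": 5}}

# Query 23: R.a<5 and R.a>S.a and S.b>1
QUERY_23 = [
    Factor(True, False, lambda r, s: r["a"] < 5),
    Factor(True, True, lambda r, s: r["a"] > s["a"]),
    Factor(False, True, lambda r, s: s["b"] > 1),
]

hybrids = build_hybrids(23, QUERY_23, R_STORE)
print([h.r_id for h in hybrids])
print(probe_s(hybrids, S_STORE))
```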
Joins over R and S:
Arrival of New Data
(a) Initial State
Query Store:
  ID  Predicate
  20  R.a=5 and R.b<S.b
  21  R.a>4 and R.b<S.b and S.a<10
  22  R.b=4 and R.a+5>S.a and S.b>2
  23  R.a<4 and R.b<S.b
R-Data Store:
  ID  R.a  R.b
  47  4    3
  50  5    3
  51  3    8
S-Data Store:
  ID  S.a  S.b
  48  4    4
  49  5    3
  52  3    2
Joins over R and S:
Arrival of New Data
(b) Arrival of new Data
New data (R tuple): ID 53, R.a=5, R.b=4
Query Store, R-Data Store, and S-Data Store are unchanged from (a)
Joins over R and S:
Arrival of New Data
(c) Building R-Data Store
BUILD: tuple 53 is inserted into the R-Data Store
R-Data Store:
  ID  R.a  R.b
  47  4    3
  50  5    3
  51  3    8
  53  5    4
Query Store and S-Data Store are unchanged
Joins over R and S:
Arrival of New Data
(c) Probing Query Store
PROBE: R tuple 53 (R.a=5, R.b=4) probes the Query Store
Matches: queries 20, 21, 22
Joins over R and S:
Arrival of New Data
(d) Constructing Hybrid Structs
Matches from the Query Store: 20, 21, 22
Hybrid Structs:
  R.ID  Q.ID  Q.Predicate
  53    20    4<S.b
  53    21    4<S.b and S.a<10
  53    22    10>S.a and S.b>2
Joins over R and S:
Arrival of New Data
(e) Probing S-Data Store
PROBE: the Hybrid Structs probe the S-Data Store
Matches (R,S,Q Results):
  53,48,22
  53,49,22
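The data-arrival case for joins can be checked the same way. In this sketch each stored query is split by hand into an R-only filter and a residual that still needs an S tuple; that split, the `JoinQuery` type, and the function name are hypothetical conveniences, while the stores, queries, and the new tuple 53 are taken from the slides.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Row = Dict[str, int]


class JoinQuery(NamedTuple):
    r_filter: Callable[[Row], bool]       # factors over R only
    residual: Callable[[Row, Row], bool]  # factors that still need an S tuple


QUERY_STORE: Dict[int, JoinQuery] = {
    # 20: R.a=5 and R.b<S.b
    20: JoinQuery(lambda r: r["a"] == 5, lambda r, s: r["b"] < s["b"]),
    # 21: R.a>4 and R.b<S.b and S.a<10
    21: JoinQuery(lambda r: r["a"] > 4, lambda r, s: r["b"] < s["b"] and s["a"] < 10),
    # 22: R.b=4 and R.a+5>S.a and S.b>2
    22: JoinQuery(lambda r: r["b"] == 4, lambda r, s: r["a"] + 5 > s["a"] and s["b"] > 2),
    # 23: R.a<4 and R.b<S.b
    23: JoinQuery(lambda r: r["a"] < 4, lambda r, s: r["b"] < s["b"]),
}

S_STORE: Dict[int, Row] = {
    48: {"a": 4, "b": 4},
    49: {"a": 5, "b": 3},
    52: {"a": 3, "b": 2},
}


def new_r_tuple(r_id: int, r: Row) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Probe the Query Store with r, then probe the S-Data Store with the residuals."""
    # Queries whose R-only factors hold; each one yields a hybrid struct for r
    matched_queries = [q_id for q_id, q in QUERY_STORE.items() if q.r_filter(r)]
    results = [
        (r_id, s_id, q_id)
        for q_id in matched_queries
        for s_id, s in S_STORE.items()
        if QUERY_STORE[q_id].residual(r, s)
    ]
    return matched_queries, results


matched, rsq_results = new_r_tuple(53, {"a": 5, "b": 4})
print(matched)
print(rsq_results)
```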
Other Queries
N-way Joins
  Similar to 2-way joins
  Probe, generate hybrid structs, repeat
  Can be executed without intermediate tables
Aggregations
  Performed at query invocation
  Uses n-ary ranked tree, clustered on time
Telegraph: Adaptive Dataflow Engine
Eddy [AH00]: Adaptive per-tuple routing
SteMs [Ram01]: Shared data structures across operators
Eddies + SteMs = adaptive N-way symmetric join
[Diagram: two Eddies, each routing among A, B, C, D and producing output; the second uses SteMs]
Telegraph Background: CACQ
CACQ [MSHR02]
  Shared execution of multiple queries with one Eddy
    Tuple lineage
    Query Indices
Queries and Data treated very differently
Only Landmark Continuous Queries
No support for disconnected operation
PSoup in Telegraph
Leverage SteMs to store and index queries
Changes to Eddies
  Encode queries as tuples
    break WHERE clause into individual boolean factors (BF)
    encode each BF as R.a relop [R.b|S.b] [+|-] constant
  Stream Prefix Consistency
    A new query or data tuple is completely processed before any other tuple: no holes in Results Structure.
Results Structure: to buffer the results.
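One way to picture "encode each BF as R.a relop [R.b|S.b] [+|-] constant" is a small record with a left attribute, a relational operator, an optional right attribute, and a signed constant. The field names and the `evaluate` helper below are assumptions made for illustration; the talk does not show the actual tuple layout.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

RELOPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


@dataclass(frozen=True)
class BooleanFactor:
    """left relop [right] + constant, with the sign folded into the constant."""
    left: str              # e.g. "R.a"
    relop: str             # a key of RELOPS
    right: Optional[str]   # e.g. "S.a", or None for a comparison with a constant
    constant: int = 0

    def evaluate(self, values: Dict[str, int]) -> bool:
        rhs = self.constant
        if self.right is not None:
            rhs += values[self.right]
        return RELOPS[self.relop](values[self.left], rhs)


def where_clause(factors: List[BooleanFactor], values: Dict[str, int]) -> bool:
    """A WHERE clause is a conjunction of boolean factors."""
    return all(f.evaluate(values) for f in factors)


# Query 22 from the join slides: R.b=4 and R.a+5>S.a and S.b>2
# R.a+5>S.a is rewritten as R.a > S.a - 5 to fit the encoded form.
query_22 = [
    BooleanFactor("R.b", "=", None, 4),
    BooleanFactor("R.a", ">", "S.a", -5),
    BooleanFactor("S.b", ">", None, 2),
]

print(where_clause(query_22, {"R.a": 5, "R.b": 4, "S.a": 4, "S.b": 4}))
print(where_clause(query_22, {"R.a": 5, "R.b": 4, "S.a": 3, "S.b": 2}))
```

A fixed record shape like this is what makes it possible to store predicates in the same kind of structure as data tuples.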
Experiments and Results
Alternatives
  NoMat: No background processing
  PSoup-Partial: background processing, apply current window on invocation
  PSoup-Complete: current windows are also continuously applied in the background
Experimental Parameters
  Unloaded server with two Intel Pentium III 666 MHz processors and 768 MB RAM
  Data arrives as fast as possible, in domain [0,255]
  Queries of form R.a relop C, where C in [0,255]
  Join Queries of form R.a relop S.b +/- C
Experiments: Response Time vs. Window Size
Interval Predicates, Selection Queries
Experiments: Response Time vs. Window Size
Equality Predicates, Selection Queries
Experiments: Max data arrival rate vs. #SQCs
Window Size = 1000 tuples
PSoup in traditional query processor
PSoup = SQL QUERY over data and client query streams?
Joins = expression evaluators
Notes
  Conventional QPs do not have tuple lineage
  Conventional QPs always use intermediate tables
Conclusions
Treating Queries and Data the same
  Combines approaches for previously studied queries
    Queries over the past and continuous queries
  Allows new functionality: hybrid queries
Separating Result Generation and Delivery
  Makes disconnected operation feasible
  Efficient support for repeated query invocations
Telegraph’s adaptive framework is a convenient substrate on which to build this system