Streaming Queries over Streaming Data, Sirish Chandrasekaran (UC Berkeley)



SLIDE 1

Streaming Queries over Streaming Data

Sirish Chandrasekaran, UC Berkeley
with Michael J. Franklin

VLDB 2002, August 20, 2002

SLIDE 2

Motivation (1): Queries over Data Streams

Queries over the past
  • Select all the Hondas that passed I-80 at Ashby Ave. between 2:15 and 2:45 pm today

Queries over the future: Continuous Query
  • Landmark Window Query: Continuously select all the Hondas that pass I-80 at Ashby Ave. starting now
  • Sliding Window Query: Continuously select all the Hondas that pass I-80 at Ashby Ave. in the latest half hour starting now

Hybrid Query
  • Continuously select all the Hondas that have passed and will pass I-80 at Ashby Ave. since 2:30 pm today

SLIDE 3

Conventional Databases

  • Data is stored/indexed in the system
  • Queries are applied to stored data as they “stream through”
  • Only support queries over the past

[Figure labels: Data, Query, Index, Result]

SLIDE 4

CQ Engines

  • Queries are stored/indexed in the system
  • Data is applied to stored queries as it “streams through”
  • Only support landmark window queries over the future

[Figure labels: Queries, Data, Index, Result]

SLIDE 5

PSoup Insight #1

Queries and data are duals
  • Store new queries, apply to data that arrived earlier
  • Store new data, apply to queries that arrived earlier
  • Multiquery processing = “join” of query and data
      – Supports all three types of queries: queries over the past, continuous queries (landmark and sliding window), and hybrid queries

[Figure labels: Data, Index, Result, Queries, Query, Index]


SLIDE 7

Motivation (2): Disconnected Operation

Previous solutions stream out answers immediately

Not feasible/suitable for all applications
  • Intermittent connectivity: e.g., applications on hand-held devices (as in this morning’s keynote address)
  • Even if connected: not always interested in streaming answers

SLIDE 8

PSoup Insight #2

Separate computation from delivery
  • Query answers continuously generated in background
  • Apply windows on demand to transmit “current” results
  • Efficient support for disconnected operation
      – Low response time
      – Shared computation and storage across invocations

[Figure: Data table (ID, R.a, R.b) and Query table (ID, Predicate); a Results Structure of T/F entries indexed by Queries and Data; Register and Invoke operations]

SLIDE 9

Outline of Talk

  • PSoup Overview
      – Query Model
      – Query Registration
      – Background Processing: Selection Queries, Join Queries
      – Query Invocation
  • PSoup in Telegraph
  • Experiments and Results
  • Conclusions and Future Work

SLIDE 10

PSoup Query Model

    SELECT select_list
    FROM from_list
    WHERE where_clause
    BEGIN begin_time
    END end_time

  • WHERE clause: conjunction of boolean factors
  • BEGIN-END clause: system clock or sequence numbers
  • (begin_time, end_time):
      – (constant, constant): snapshot query
      – (constant, variable): landmark window query
      – (variable, variable): sliding window query
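The mapping from the BEGIN-END clause to a query type can be sketched in a few lines of Python. This is only an illustration of the classification on this slide: the names `Const`, `Now`, `classify`, and `resolve` are hypothetical and do not come from PSoup, and a "variable" bound is assumed to be an offset from the current time or sequence number.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    """A fixed bound (clock value or sequence number)."""
    value: int


@dataclass(frozen=True)
class Now:
    """A variable bound, assumed here to be 'current time + offset'."""
    offset: int = 0


Bound = Union[Const, Now]


def classify(begin: Bound, end: Bound) -> str:
    """Name the query type implied by a (begin_time, end_time) pair."""
    if isinstance(begin, Const) and isinstance(end, Const):
        return "snapshot"
    if isinstance(begin, Const) and isinstance(end, Now):
        return "landmark"
    if isinstance(begin, Now) and isinstance(end, Now):
        return "sliding"
    return "unsupported"  # (variable, constant) is not listed on the slide


def resolve(bound: Bound, now: int) -> int:
    """Turn a bound into a concrete value at the given current time."""
    return bound.value if isinstance(bound, Const) else now + bound.offset


print(classify(Const(10), Const(20)))  # snapshot
print(classify(Const(10), Now()))      # landmark
print(classify(Now(-30), Now()))       # sliding
```

Only the variable bounds need `resolve` at invocation time; constant bounds are fixed at registration.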

SLIDE 11

Query Registration

    SELECT select_list
    FROM from_list
    WHERE where_clause      } Standing Query Clause (SQC): to the Symmetric Join
    BEGIN begin_time
    END end_time            } to the Windows_Table

  • QueryID: handle for future query invocations

SLIDE 12

Selections over Single Stream: Arrival of New Query Specification

(a) Initial State

Query Store
    ID   Predicate
    20   0<R.a<=5
    21   R.a>4 and R.b=3
    22   0>R.b>4
    23   R.a=4 and R.b=3

Data Store: ID 48 49 50 51 R.a 4 7 3 3 3 8 52 8 4 R.b

SLIDE 13

Selections over Single Stream: Arrival of New Query Specification

(b) Arrival of new Query

New query:
    SELECT * FROM R WHERE R.a<=4 AND R.b>=3

Query Store and Data Store are unchanged from the initial state (a).

SLIDE 14

Selections over Single Stream: Arrival of New Query Specification

(c) Building Query Store

BUILD: the new query enters the Query Store with ID 24.

Query Store
    ID   Predicate
    20   0<R.a<=5
    21   R.a>4 and R.b=3
    22   0>R.b>4
    23   R.a=4 and R.b=3
    24   R.a<=4 and R.b>=3

Data Store is unchanged from the initial state (a).

SLIDE 15

Selections over Single Stream: Arrival of New Query Specification

(d) Probing Data Store

PROBE: query 24 (R.a<=4 and R.b>=3) probes the Data Store; two data tuples are marked as matches.

Query Store holds queries 20 to 24; Data Store is unchanged from the initial state (a).

SLIDE 16

Selections over Single Stream: Arrival of New Query Specification

(e) Inserting Results

Results
    ID   R.a  R.b
    48   4    3
    50   3    8

Results Structure, indexed by Queries (20 to 24) and Data (48 to 52); entries inserted across Data 48 to 52: T F T F F

SLIDE 17

Selections over Single Stream: Arrival of New Data

(a) Initial State

Query Store
    ID   Predicate
    20   0<R.a<=5
    21   R.a>4 and R.b=3
    22   0>R.b>4
    23   R.a=4 and R.b=3
    24   R.a<=4 and R.b>=3

Data Store: ID 48 49 50 51 R.a 4 7 3 3 3 8 52 8 4 R.b

SLIDE 18

Selections over Single Stream: Arrival of New Data

(b) Arrival of new Data

New data:
    ID   R.a  R.b
    53   3    6

Query Store and Data Store are unchanged from the initial state (a).

SLIDE 19

Selections over Single Stream: Arrival of New Data

(c) Building Data Store

BUILD: the new tuple 53 (R.a=3, R.b=6) enters the Data Store.

Query Store is unchanged from the initial state (a).

SLIDE 20

Selections over Single Stream: Arrival of New Data

(d) Probing Query Store

PROBE: tuple 53 (R.a=3, R.b=6) probes the Query Store; two queries are marked as matches.

Query Store holds queries 20 to 24; Data Store holds tuples 48 to 53.

SLIDE 21

Selections over Single Stream: Arrival of New Data

(e) Inserting Results

Results
    ID   Predicate
    24   R.a<=4 and R.b>=3
    20   0<R.a<=5

Results Structure, indexed by Queries (20 to 24) and Data (48 to 53); entries inserted for Data 53 across Queries 20 to 24: T F F F T
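The two walk-throughs above (new query, then new data) are the same build-then-probe step applied in opposite directions. A minimal Python sketch of that symmetry follows, assuming in-memory dictionaries and linear probes; the class and method names are hypothetical and nothing here reflects how PSoup indexes its stores. Only tuples whose values are stated unambiguously on the slides (48, 50, 53) are loaded, and predicate 22 is copied exactly as written on the slide.

```python
class SelectionPSoup:
    """Toy model of the Query Store, Data Store and Results Structure."""

    def __init__(self):
        self.data = {}      # Data Store: tuple ID -> (R.a, R.b)
        self.queries = {}   # Query Store: query ID -> predicate(a, b)
        self.results = {}   # Results Structure: (query ID, tuple ID) -> bool

    def add_query(self, qid, predicate):
        self.queries[qid] = predicate              # BUILD into the Query Store
        for tid, (a, b) in self.data.items():      # PROBE the Data Store
            self.results[(qid, tid)] = predicate(a, b)

    def add_data(self, tid, a, b):
        self.data[tid] = (a, b)                    # BUILD into the Data Store
        for qid, predicate in self.queries.items():  # PROBE the Query Store
            self.results[(qid, tid)] = predicate(a, b)

    def matches(self, qid):
        """Tuple IDs currently marked True for one query."""
        return sorted(t for (q, t), ok in self.results.items() if q == qid and ok)


psoup = SelectionPSoup()
psoup.add_data(48, 4, 3)
psoup.add_data(50, 3, 8)
psoup.add_query(20, lambda a, b: 0 < a <= 5)
psoup.add_query(21, lambda a, b: a > 4 and b == 3)
psoup.add_query(22, lambda a, b: 0 > b > 4)        # as written on the slide
psoup.add_query(23, lambda a, b: a == 4 and b == 3)

psoup.add_query(24, lambda a, b: a <= 4 and b >= 3)  # arrival of a new query
print(psoup.matches(24))  # [48, 50]

psoup.add_data(53, 3, 6)                             # arrival of new data
print([psoup.results[(q, 53)] for q in (20, 21, 22, 23, 24)])
# [True, False, False, False, True]
```

The last line reproduces the T F F F T entries shown for tuple 53, and the matches for query 24 agree with the Results table on the new-query slides.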

SLIDE 22

Query Invocation

[Figure: Results Structure indexed by Queries (20 to 24) and Data (48 to 53), with entries T F T F F across Data 48 to 52 and T F F F T for Data 53; a brace marks the Current Window]

    BEGIN begin_time
    END end_time

  • System returns the results corresponding to the current value of the BEGIN-END clause
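Invocation can be pictured as filtering already-computed results by the current window. The sketch below assumes sequence numbers equal to tuple IDs and uses a hypothetical sliding window covering the latest four sequence numbers; `RESULTS`, `WINDOWS`, and `invoke` are illustrative names, and the real Results Structure is not a Python list.

```python
# Results Structure (simplified): query ID -> sequence numbers of matching tuples.
# The matches for query 24 are the ones shown on the preceding slides.
RESULTS = {24: [48, 50, 53]}

# Windows table (simplified): query ID -> (begin, end), each a function of the
# current sequence number. The window width is an assumption for this example.
WINDOWS = {24: (lambda now: now - 3, lambda now: now)}


def invoke(query_id, now):
    """Return the stored matches that fall inside the window as of `now`."""
    begin, end = WINDOWS[query_id]
    lo, hi = begin(now), end(now)
    return [seq for seq in RESULTS[query_id] if lo <= seq <= hi]


print(invoke(24, 53))  # [50, 53]
print(invoke(24, 51))  # [48, 50]
```

No predicate is evaluated at invocation time; the same stored matches serve every later invocation, which is where the shared computation across invocations comes from.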

SLIDE 23

Joins over R and S: Arrival of New Query Specification

(a) Initial State

Query Store
    ID   Predicate
    20   R.a=5 and R.b<S.b
    21   R.a>4 and R.b<S.b and S.a<10
    22   R.b=4 and R.a+5>S.a and S.b>2

R-Data Store
    ID   R.a  R.b
    10   2    5
    14   3    3
    31   4    1
    48   9    7

S-Data Store
    ID   S.a  S.b
    21   2    2
    25   3    3
    36   4    4
    49   5    5

SLIDE 24

Joins over R and S: Arrival of New Query Specification

(b) Arrival of new Query

New query:
    ID   Predicate
    23   R.a<5 and R.a>S.a and S.b>1

Query Store, R-Data Store and S-Data Store are unchanged from the initial state (a).

SLIDE 25

Joins over R and S: Arrival of New Query Specification

(c) Building Query Store

BUILD: query 23 (R.a<5 and R.a>S.a and S.b>1) enters the Query Store.

R-Data Store and S-Data Store are unchanged from the initial state (a).

SLIDE 26

Joins over R and S: Arrival of New Query Specification

(d) Probing R-Data Store

PROBE: query 23 (R.a<5 and R.a>S.a and S.b>1) probes the R-Data Store; a brace marks the matching R tuples.

Query Store holds queries 20 to 23; R-Data Store and S-Data Store are unchanged from the initial state (a).

SLIDE 27

Joins over R and S: Arrival of New Query Specification

(e) Constructing Hybrid Structs

Hybrid Structs (one per matching R tuple)
    R.ID  Q.ID  Q.Predicate
    10    23    2>S.a and S.b>1
    14    23    3>S.a and S.b>1
    31    23    4>S.a and S.b>1

Query Store holds queries 20 to 23; R-Data Store and S-Data Store are unchanged from the initial state (a).

SLIDE 28

Joins over R and S: Arrival of New Query Specification

(f) Probing S-Data Store

PROBE: the hybrid structs probe the S-Data Store.

Hybrid Structs
    R.ID  Q.ID  Q.Predicate
    10    23    2>S.a and S.b>1
    14    23    3>S.a and S.b>1
    31    23    4>S.a and S.b>1

Results (R, S, Q)
    14, 21, 23
    31, 21, 23
    31, 25, 23
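Steps (d) through (f) can be replayed with the slide's own data. The sketch below hard-codes query 23 and models a hybrid struct as a closure that binds the R values and leaves a predicate over S; the function names are hypothetical and the linear scans stand in for whatever indexes the real system uses.

```python
# Stores from the slides: ID -> (a, b).
R_STORE = {10: (2, 5), 14: (3, 3), 31: (4, 1), 48: (9, 7)}
S_STORE = {21: (2, 2), 25: (3, 3), 36: (4, 4), 49: (5, 5)}


def r_only(r_a, r_b):
    """The factor of query 23 that mentions only R: R.a<5."""
    return r_a < 5


def bind_r(r_a, r_b):
    """Substitute R's values into R.a>S.a and S.b>1, leaving a predicate on S."""
    return lambda s_a, s_b: r_a > s_a and s_b > 1


def register_query_23():
    # (d)/(e): probe the R-Data Store, one hybrid struct per matching R tuple.
    hybrids = [(r_id, 23, bind_r(*r)) for r_id, r in R_STORE.items() if r_only(*r)]
    # (f): probe the S-Data Store with each hybrid struct's residual predicate.
    results = [(r_id, s_id, q_id)
               for r_id, q_id, residual in hybrids
               for s_id, s in S_STORE.items() if residual(*s)]
    return hybrids, results


hybrids, results = register_query_23()
print([r_id for r_id, _, _ in hybrids])  # [10, 14, 31]
print(results)  # [(14, 21, 23), (31, 21, 23), (31, 25, 23)]
```

The printed triples match the R,S,Q results on the slide; R tuple 10 produces a hybrid struct but no result because no S tuple has S.a below 2.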

SLIDE 29

Joins over R and S: Arrival of New Data

(a) Initial State

Query Store
    ID   Predicate
    20   R.a=5 and R.b<S.b
    21   R.a>4 and R.b<S.b and S.a<10
    22   R.b=4 and R.a+5>S.a and S.b>2
    23   R.a<4 and R.b<S.b

R-Data Store
    ID   R.a  R.b
    47   4    3
    50   5    3
    51   3    8

S-Data Store
    ID   S.a  S.b
    48   4    4
    49   5    3
    52   3    2

SLIDE 30

Joins over R and S: Arrival of New Data

(b) Arrival of new Data

New data:
    ID   R.a  R.b
    53   5    4

Query Store, R-Data Store and S-Data Store are unchanged from the initial state (a).

SLIDE 31

Joins over R and S: Arrival of New Data

(c) Building R-Data Store

BUILD: the new tuple enters the R-Data Store.

R-Data Store
    ID   R.a  R.b
    47   4    3
    50   5    3
    51   3    8
    53   5    4

Query Store and S-Data Store are unchanged from the initial state (a).

SLIDE 32

Joins over R and S: Arrival of New Data

(c) Probing Query Store

PROBE: tuple 53 (R.a=5, R.b=4) probes the Query Store; a brace marks the matching queries.

R-Data Store holds tuples 47, 50, 51 and 53; Query Store and S-Data Store are unchanged from the initial state (a).

SLIDE 33

Joins over R and S: Arrival of New Data

(d) Constructing Hybrid Structs

Hybrid Structs (one per matching query)
    R.ID  Q.ID  Q.Predicate
    53    20    4<S.b
    53    21    4<S.b and S.a<10
    53    22    10>S.a and S.b>2

R-Data Store holds tuples 47, 50, 51 and 53; Query Store and S-Data Store are unchanged from the initial state (a).

SLIDE 34

Joins over R and S: Arrival of New Data

(e) Probing S-Data Store

PROBE: the hybrid structs probe the S-Data Store.

Hybrid Structs
    R.ID  Q.ID  Q.Predicate
    53    20    4<S.b
    53    21    4<S.b and S.a<10
    53    22    10>S.a and S.b>2

Results (R, S, Q)
    53, 48, 22
    53, 49, 22
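The data-arrival direction follows the same pattern with the roles swapped: the new R tuple probes the Query Store, and each matching query yields a hybrid struct that then probes the S-Data Store. In this sketch each query is split by hand into an R-only test and a function that binds R's values; that encoding, and the names `QUERIES` and `on_new_r`, are assumptions made for illustration.

```python
# Query Store: ID -> (R-only test, function binding R values into an S predicate).
QUERIES = {
    20: (lambda a, b: a == 5, lambda a, b: (lambda sa, sb: b < sb)),
    21: (lambda a, b: a > 4,  lambda a, b: (lambda sa, sb: b < sb and sa < 10)),
    22: (lambda a, b: b == 4, lambda a, b: (lambda sa, sb: a + 5 > sa and sb > 2)),
    23: (lambda a, b: a < 4,  lambda a, b: (lambda sa, sb: b < sb)),
}

# S-Data Store from the slides: ID -> (S.a, S.b).
S_STORE = {48: (4, 4), 49: (5, 3), 52: (3, 2)}


def on_new_r(r_id, a, b):
    """Process one arriving R tuple; return matching query IDs and R,S,Q results."""
    # Probe the Query Store and build one hybrid struct per matching query.
    hybrids = [(r_id, q_id, bind(a, b))
               for q_id, (r_test, bind) in QUERIES.items() if r_test(a, b)]
    # Probe the S-Data Store with each hybrid struct.
    results = [(rid, s_id, q_id)
               for rid, q_id, residual in hybrids
               for s_id, s in S_STORE.items() if residual(*s)]
    return [q_id for _, q_id, _ in hybrids], results


matched, results = on_new_r(53, 5, 4)
print(matched)  # [20, 21, 22]
print(results)  # [(53, 48, 22), (53, 49, 22)]
```

Queries 20 and 21 produce hybrid structs that find no S tuple with S.b above 4, so only query 22 contributes results, as on the slide.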

SLIDE 35

Other Queries

N-way Joins
  • Similar to 2-way joins
  • Probe, generate hybrid structs, repeat
  • Can be executed without intermediate tables

Aggregations
  • Performed at query invocation
  • Uses n-ary ranked tree, clustered on time

SLIDE 36

Outline of Talk

  • PSoup Overview
      – Query Model
      – Query Registration
      – Background Processing: Selection Queries, Join Queries
      – Query Invocation
  • PSoup in Telegraph
  • Experiments and Results
  • Conclusions and Future Work

SLIDE 37

Telegraph: Adaptive Dataflow Engine

  • Eddy [AH00]: Adaptive per-tuple routing
  • SteMs [Ram01]: Shared data structures across operators
  • Eddies + SteMs = adaptive N-way symmetric join

[Figure: two Eddy diagrams over A, B, C, D, each with an output; one is labeled SteMs]

SLIDE 38

Telegraph Background: CACQ

CACQ [MSHR02]
  • Shared execution of multiple queries with one Eddy
      – Tuple lineage
      – Query indices
  • Queries and data treated very differently
  • Only landmark continuous queries
  • No support for disconnected operation

SLIDE 39

PSoup in Telegraph

  • Leverage SteMs to store and index queries
  • Changes to Eddies
  • Encode queries as tuples
      – Break WHERE clause into individual boolean factors (BF)
      – Encode each BF as R.a relop [R.b|S.b] [+|-] constant
  • Stream Prefix Consistency
      – A new query or data tuple is completely processed before any other tuple: no holes in the Results Structure
  • Results Structure: buffers the results
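The boolean-factor encoding named above, R.a relop [R.b|S.b] [+|-] constant, can be written down as a small record type. The sketch below is one possible reading of that form; the field names, the `RELOPS` table, and the dictionary-based row are assumptions, not the tuple layout used inside Telegraph.

```python
import operator
from dataclasses import dataclass
from typing import Optional

RELOPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
          ">=": operator.ge, ">": operator.gt}


@dataclass(frozen=True)
class BooleanFactor:
    """One factor: left relop [right attribute] [+|-] constant."""
    left: str                    # attribute name, e.g. "R.a"
    relop: str                   # one of the keys of RELOPS
    right: Optional[str] = None  # attribute name, or None for a pure constant
    constant: int = 0            # signed constant added to the right-hand side

    def evaluate(self, row):
        rhs = (row[self.right] if self.right else 0) + self.constant
        return RELOPS[self.relop](row[self.left], rhs)


def evaluate_query(factors, row):
    """A WHERE clause is a conjunction of its boolean factors."""
    return all(f.evaluate(row) for f in factors)


# Query 22 from the join slides: R.b=4 and R.a+5>S.a and S.b>2.
# R.a+5>S.a is rewritten as R.a > S.a - 5 to fit the encoding.
query_22 = [
    BooleanFactor("R.b", "=", None, 4),
    BooleanFactor("R.a", ">", "S.a", -5),
    BooleanFactor("S.b", ">", None, 2),
]

print(evaluate_query(query_22, {"R.a": 5, "R.b": 4, "S.a": 4, "S.b": 4}))  # True
print(evaluate_query(query_22, {"R.a": 5, "R.b": 4, "S.a": 3, "S.b": 2}))  # False
```

Because every factor has the same shape, a stored query becomes an ordinary row of fixed-width fields, which is what allows queries to be stored and indexed like data.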

SLIDE 40

Outline of Talk

  • PSoup Overview
      – Query Model
      – Query Registration
      – Background Processing: Selection Queries, Join Queries
      – Query Invocation
  • PSoup in Telegraph
  • Experiments and Results
  • Conclusions and Future Work

SLIDE 41

Experiments and Results

Alternatives
  • NoMat: no background processing
  • PSoup-Partial: background processing, apply current window on invocation
  • PSoup-Complete: current windows are also continuously applied in the background

Experimental Parameters
  • Unloaded server with two Intel Pentium III 666 MHz processors and 768 MB RAM
  • Data arrives as fast as possible, in domain [0,255]
  • Queries of form R.a relop C, where C in [0,255]
  • Join queries of form R.a relop S.b +/- C

SLIDE 42

Experiments: Response Time vs. Window Size

[Plot: Interval Predicates, Selection Queries]

SLIDE 43

Experiments: Response Time vs. Window Size

[Plot: Equality Predicates, Selection Queries]

SLIDE 44

Experiments: Max data arrival rate vs. #SQCs

[Plot: Window Size = 1000 tuples]

SLIDE 45

PSoup in traditional query processor

  • PSoup = SQL query over data and client query streams?
  • Joins = expression evaluators

Notes
  • Conventional QPs do not have tuple lineage
  • Conventional QPs always use intermediate tables

SLIDE 46

Outline of Talk

  • PSoup Overview
      – Query Model
      – Query Registration
      – Background Processing: Selection Queries, Join Queries
      – Query Invocation
  • PSoup in Telegraph
  • Experiments and Results
  • Conclusions and Future Work

SLIDE 47

Conclusions

Treating Queries and Data the same
  • Combines approaches for previously studied queries
      – Queries over the past and continuous queries
  • Allows new functionality: hybrid queries

Separating Result Generation and Delivery
  • Makes disconnected operation feasible
  • Efficient support for repeated query invocations

Telegraph’s adaptive framework is a convenient substrate on which to build this system

SLIDE 48

Future Work

Disk and Query=Data
  • Swapping data and queries to disk
      – Data = RxW algorithm [Aksoy et al.]
      – Queries = PT algorithm [Acharya et al.]
  • Disk-based query operators
      – N-way XJoin

Query Semantics
  • More complete query semantics
      – Not just snapshot, landmark and sliding windows

Integration into TelegraphCQ