Presented by Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, - - PowerPoint PPT Presentation

presented by
SMART_READER_LITE
LIVE PREVIEW

Presented by Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, - - PowerPoint PPT Presentation

Presented by Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, Archana Suhas Joshi, Izi Aviyente 1 Mar. 18,2003 Query Languages Timeline ! 1:40 Talk Outline ! 1:45 CQL Primer ! 2:05 SQuAl Primer ! 2:25 ATLaS Primer ! 2:50


slide-1
SLIDE 1
  • Mar. 18,2003

Query Languages

1

Presented by

Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, Archana Suhas Joshi, Izi Aviyente

slide-2
SLIDE 2
  • Mar. 18,2003

Query Languages

2

Timeline

! 1:40 – Talk Outline ! 1:45 – CQL Primer ! 2:05 – SQuAl Primer ! 2:25 – ATLaS Primer ! 2:50 – Break ! 3:00 – Sequence Query Processing ! 3:40 – Similarities and Differences (ATLaS, CQL,

Aurora)

! 3:55 – Class Discussion ! 4:25 – Review Presentation Shortcomings

slide-3
SLIDE 3
  • Mar. 18,2003

Query Languages

3

References

  • A. Arasu, S.Babu, J.Widom. An Abstract Semantics and

Concrete Language for Continuous Queries over Streams and Relations. Technical Report, November 2002. SQR- A Stream Query Repository. http://www.db.stanford.edu/stream/sqr. Joint effort of several data stream research groups.

  • R. Motwani, J. Widom, et al. Query processing, resource

management, and approximation in a data stream management system. In Proc. First Biennnial Conf. On Innovative Data Systems Research, January 2003. D.Abadi, D.Carney, U.Cetintemel, M. Cherniack, C, Convey, S. Lee, M. Stonebraker, N. Tatbul, S.Zdonik. Aurora: A New Model and Architecture for Data Stream Management.

slide-4
SLIDE 4
  • Mar. 18,2003

Query Languages

4

Outline

! Example Application: Boston Marathon ! STREAM Solution

! Review of STREAM ! CQL ! Example Queries

! AURORA Solution: SQuAl

! Aurora operators ! Example Queries

! Comparisons between Query Languages

slide-5
SLIDE 5
  • Mar. 18,2003

Query Languages

5

Example Application

! Boston Marathon

! Goal: To collect and process data about each runner

continuously

Speed Sensors Distance Sensors Finish Line Alerter Timer

Computer Base Station

Satellite

INTERNET INTERNET (update on (update on Marathon) Marathon)

slide-6
SLIDE 6
  • Mar. 18,2003

Query Languages

6

Example Queries

1.

Find all runners running with speed greater than 6mph.

2.

Count number of runners who passed the 10 mile split in the last 15 minutes

3.

Specify the Athlete_Ids who finished the marathon within 5 hours ….

Problem: How can we express these queries???

slide-7
SLIDE 7
  • Mar. 18,2003

Query Languages

7

Previous Solutions Proposed

! TQL (1992) – SQL based query language of

Tapestry

Set て=-∞ FOREVER DO Set t:= current time Execute Queries QM(t) and QM(て) Return QM(t)-QM(て) to user set て:=t Sleep for some period of time ENDLOOP

slide-8
SLIDE 8
  • Mar. 18,2003

Query Languages

8

Previous Solutions Proposed

! NiagraCQ(2000) – XML-QL is used to express

queries

CREATE CQ_NAME XML-QL QUERY DO action | START start_time] |EVERY time_interval] |EXPIRE expiration_time]

WHERE <athlete> <country>Spain</name> <name> <lastname>$1</lastname> </> </>IN* CONSTRUCT <lastname>$1</lastname>

slide-9
SLIDE 9
  • Mar. 18,2003

Query Languages

9

Previous Solutions Proposed

! PSoup(2002) – SQL style queries, windows for Select

Project Join queries

SELECT * FROM SPEED WHERE (SPEED.RUNRATE<5) BEGIN (NOW-10) END (NOW)

slide-10
SLIDE 10
  • Mar. 18,2003

Query Languages

10

STREAM’s solution:

! Define a concrete query language (CQL) that

! Exploits relational semantics ! Is easy and compact to write ! Uses standard relational transformations for query

  • ptimization
slide-11
SLIDE 11
  • Mar. 18,2003

Query Languages

11

Recall: STREAM

Define て : global, discrete, ordered time domain Stream S is unbounded bag of elements <s,t> Relation R is a mapping from て to a bag of tuples where each tuple belongs to schema of the

  • relation. (R(t))

Tuple belonging to the schema

t∈ て(timestamp)

slide-12
SLIDE 12
  • Mar. 18,2003

Query Languages

12

RECALL: STREAM (continued)

Need mappings between streams and relations: Streams Relations

Relation-to-Stream Operators (IStream, DStream, RStream) Window Specification Language Relational Query Language

slide-13
SLIDE 13
  • Mar. 18,2003

Query Languages

13

Schema of Boston Marathon

Speed (Athlete_ID /* unique identifier of the athlete */, runRate /* the rate at which the athlete runs */) timestamp /* time of measurement */ Timer (Athlete_ID, time_elapsed /* time measured from start */) timestamp Distance_covered (Athlete_ID, distance /* distance from start */) timestamp Finished (Athlete_ID) timestamp Athlete (Athlete_ID, name, age, country)

S T R E A M S

slide-14
SLIDE 14
  • Mar. 18,2003

Query Languages

14

EXAMPLE

We want to stream the average of the last 5 speed recordings taken: 8:30:05 …. 6 …. 80 …. 8:30:04 7 79 8:30:04 8 68 8:30:03 7 54 8:30:02 5 83 8:30:01 7 73 8:30:00 6 62 8:30:00 5 51 ts Runrate Athlete_id

Speed Stream

Rate ID

6 80 7 79 8 68 7 54 5 83

S T R E A M

  • R

E L A T I O N

Average

6.6

Stream Relation

Window Specification Language Relation- Stream Operators

R Q L

R E L A T I O N

  • R

E L A T I O N

Average Relation

I S T R E A M s

…. 6.6 Average

  • IStream(R) contains all (s,T)

where s Є R at time T but s Є R at time T-1

  • DStream(R) contains all (s,T)

where s Є R at time T-1 but s Є R at time T

  • RStream(R) contains all (s,T)

where s Є R at time T

slide-15
SLIDE 15
  • Mar. 18,2003

Query Languages

15

RECALL: Window Specification Language

!

Time-based Windows:

!

S[Range T] if T=0 S[Now] if T= ∞ S[Range Unbounded]

!

Tuple-based Windows:

!

S[Rows N]

!

Partitioned Windows:

!

S[Partition By A1, A2,…..,AK Rows N]

slide-16
SLIDE 16
  • Mar. 18,2003

Query Languages

16

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Sample Query

Find all runners running with speed greater than 6 mph

SELECT * FROM Speed S WHERE S.runRate > 6

Unbounded window by default

slide-17
SLIDE 17
  • Mar. 18,2003

Query Languages

17

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Sample Query

Count number of runners who passed the 10 mile split in the last 15 minutes

SELECT Count(*) FROM Distance_covered D [Range 15 Min] WHERE D.distance >= 10

Relational result

slide-18
SLIDE 18
  • Mar. 18,2003

Query Languages

18

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Sample Query

Specify the Athlete_IDs who finished within 5 hours of their starting

SELECT Istream(F.Athlete_ID) FROM Finished[Now] F, Timer [Range 5 Hours] T WHERE F.Athlete_ID = T.Athlete_ID

slide-19
SLIDE 19
  • Mar. 18,2003

Query Languages

19

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Sample Query

Specify runners from USA who have just finished

SELECT Istream (A.*) FROM Finished[Now] F, Athletes A WHERE F.Athelete_ID = A.Athlete_ID AND A.country = ‘USA’

slide-20
SLIDE 20
  • Mar. 18,2003

Query Languages

20

Implementation: Query Optimization

" Window Reduction SELECT Istream (*) FROM Speed as S[Range Unbounded] WHERE Speed>6 SELECT Istream(*) FROM Speed as S[Now] WHERE Speed>6

slide-21
SLIDE 21
  • Mar. 18,2003

Query Languages

21

Query Optimization (Contd.)

" Filter Window Commutativity SELECT Count(*) FROM Distance_covered D [Range 15 Min] WHERE D.distance >= 10 (SELECT Count(*) FROM Distance_covered D WHERE D.distance >= 10) [Range 15 Min]

slide-22
SLIDE 22
  • Mar. 18,2003

Query Languages

22

Aurora GUI

i1 i2 i0

  • 1
  • 2

Filter Rstream Map Map Filter Filter Map Rstream Union

Filter Box Map Box Tumble Box Slide Box Xsection Box Wsort Box Restream Box Merge Box Join Box Resample Box Super Box

Starting an arc at Map Ending an arc at Filter Starting an arc at Filter Ending an arc at Map Box 4 Starting an arc at Map Box 4 Ending an arc at Union

slide-23
SLIDE 23
  • Mar. 18,2003

Query Languages

23

SQuAl (Stream Query Algebra) Operators

! Order-Agnostic Operators

! Filter ! Map ! Union

! Order-Sensitive Operators

! BSort ! Aggregate ! Join ! Resample

slide-24
SLIDE 24
  • Mar. 18,2003

Query Languages

24

Order-Agnostic Operators

Filter(P1,…,Pm)(S)

" Similar to relational selection " P1,…,Pm : predicates " S: input stream " Output: m+1 streams with same schema as S

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-25
SLIDE 25
  • Mar. 18,2003

Query Languages

25

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Order-Agnostic Operators

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample

Filter(P1,P2)(Distance_covered)

P1: Distance< 3 mile P2: Distance> 5 mile Result will have three streams…

t=(TS=ts, A1=v1,……,Ak=vk)

slide-26
SLIDE 26
  • Mar. 18,2003

Query Languages

26

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Order-Agnostic Operators

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample

08:20 6.6 9915 08:19 4.9 9909 08:18 5.5 9908 08:17 2.8 9911 08:16 6.2 9915 08:15 4.1 9909 08:14 5 9908 Timestamp Distance Athlete_ID

slide-27
SLIDE 27
  • Mar. 18,2003

Query Languages

27

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Order-Agnostic Operators

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample

08:18 5.5 9908 08:14 5 9908 08:20 6.6 9915 08:16 6.2 9915 08:17 2.8 9911 08:19 4.9 9909 08:15 4.1 9909 Timestamp Distance Athlete_ID

slide-28
SLIDE 28
  • Mar. 18,2003

Query Languages

28

Order-Agnostic Operators

Map(B1=F1,…,Bm=Fm)(S)

" Similar to relational projection " S: input stream " Maps every tuple t to (TS=t.ts, B1=F1(t),…,Bm=Fm(t)) " Output schema does not have to be the same as input schema

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-29
SLIDE 29
  • Mar. 18,2003

Query Languages

29

08:20 6.6 9915 08:19 4.9 9909 08:18 5.5 9908 08:17 2.8 9911 08:16 6.2 9915 08:15 4.1 9909 08:14 5 9908 Timestamp Distance Athlete_ID

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Order-Agnostic Operators

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample

Map(Athlete_ID=F1,distance=F2)(S)

08:20 10.56 9915 08:19 7.84 9909 08:18 8.8 9908 08:17 4.48 9911 08:16 9.92 9915 08:15 6.56 9909 08:14 8 9908 Timestamp Distance Athlete_ID

slide-30
SLIDE 30
  • Mar. 18,2003

Query Languages

30

Order-Agnostic Operators

Union(S1,…Sn)

Merges two or more streams into a single stream

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample

08:12:17 2.8 9904 08:12:16 6.2 9903 08:12:15 4.7 9902 08:12:14 5 9901 Timestamp Distance Athlete _ID 08:12:20 6.6 9915 08:12:19 4.9 9909 08:12:18 5.3 9908 Timestamp Distance Athlete _ID

Female runners Male runners

08:12:20 6.6 9915 08:12:19 4.9 9909 08:12:18 5.3 9908 08:12:17 2.8 9904 08:12:16 6.2 9903 08:12:15 4.7 9902 08:12:14 5 9901 Timestamp Distance Athlete_ID

slide-31
SLIDE 31
  • Mar. 18,2003

Query Languages

31

Order Sensitive Operators

" Assume some ordering on input stream " Order (On A, slack n, GroupBy B1,…,Bn) " For n = 0 and GroupBy clause is empty, tuples arrive in order " Otherwise it is out of order by n that means there are n tuples before the tuple t in S say u, such that u.A > t.A 5.. 5 2 4 3 2 1

slide-32
SLIDE 32
  • Mar. 18,2003

Query Languages

32

BSort

" Bsort(Assuming O)(S) " Equivalent to n passes of bubble sort where slack = n " Maintains a buffer of n + 1 tuples … 10 9 8 7 6 5 4 3 2 1 … 5 2 3 8 4 4 1 3 2 1

Time Input Stream

  • 1
  • 2

1

Buffer

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-33
SLIDE 33
  • Mar. 18,2003

Query Languages

33

BSort

" Bsort(Assuming O)(S) " Equivalent to n passes of bubble sort where slack = n " Maintains a buffer of n + 1 tuples 10 9 8 7 6 5 4 3 2 1 5 2 3 8 4 4 1 3 2 1

Time

  • 2

1

  • 3

2 1 1 3 2 1

  • 1

Input Stream

  • 1
  • Buffer

Output Stream

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-34
SLIDE 34
  • Mar. 18,2003

Query Languages

34

BSort

" Bsort(Assuming O)(S) " Equivalent to n passes of bubble sort where slack = n " Maintains a buffer of n + 1 tuples 10 9 8 7 6 5 4 3 2 1 5 2 3 8 4 4 1 3 2 1

Time

5 8 4 4 1 3 2 1 4 1 3 2 4 4 3 2 8 4 4 3 3 8 4 4 2 8 4 4

  • 2

1

  • 3

2 1

  • 1

Input Stream

4 2 3 3 2 1 1

  • Buffer

Output Stream

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-35
SLIDE 35
  • Mar. 18,2003

Query Languages

35

Aggregate

Aggregate(F, Assuming O, Size S, Advance I)(S) " F is aggregation operation or user defined function " Can handle out of order tuples by specifying slack n " Can specify time out for blocking operators

Example follows

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-36
SLIDE 36
  • Mar. 18,2003

Query Languages

36

Aggregate Example

Compute an hourly average speed of each runner . Input Stream Initially After 1 hour Output Stream Aggregate(Avg(speed), Assuming Order (On Time, GroupBy Athlete_ID), Size 1 hour, Advance 1 hour) (Speed)

205 : 1:00 – 1:59 Tuples : 1, 7 102 : 1:00 – 1:59 Tuples : 2, 6 300 : 1:00 – 1:59 Tuples : 3, 4, 5, 8 205 : 2:00 – 2:59 Tuples : 10, … 102 : 2:00 – 2:59 Tuples : 11, … 300 : 2:00 – 2:59 Tuples : 9, …

1. (300, 1:00, 5.45) 2. (205, 1:00, 5.9) 3. (102, 1:00, 5.5) … 1(205, 1:00, 6.0) 2(102, 1:00, 6.0) 3(300, 1:00, 5.8) 4(300, 1:15, 5.5) 5(300, 1:30, 5.0) 6(102, 1:30, 5.0) 7(205, 1:30, 5.8) 8(300, 1:45, 5.5) 9(300, 2:00, 5.1) 10(205, 2:00, 5.6) 11(102, 2:00, 5.7)

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-37
SLIDE 37
  • Mar. 18,2003

Query Languages

37

Aggregate Example

Compute an hourly average speed of each runner . Input Stream Initially After 1 hour Output Stream Aggregate(Avg(speed), Assuming Order (On Time, GroupBy Athlete_ID), Size 1 hour, Advance 1 hour) (Speed)

205 : 1:00 – 1:59 Tuples : 1, 7 102 : 1:00 – 1:59 Tuples : 2, 6 300 : 1:00 – 1:59 Tuples : 3, 4, 5, 8 205 : 2:00 – 2:59 Tuples : 10, … 102 : 2:00 – 2:59 Tuples : 11, … 300 : 2:00 – 2:59 Tuples : 9, …

1. (300, 1:00, 5.45) 2. (205, 1:00, 5.9) 3. (102, 1:00, 5.5) … 1(205, 1:00, 6.0) 2(102, 1:00, 6.0) 3(300, 1:00, 5.8) 4(300, 1:15, 5.5) 5(300, 1:30, 5.0) 6(102, 1:30, 5.0) 7(205, 1:30, 5.8) 8(300, 2:00, 5.1) 9(205, 2:00, 5.6) 10(102, 2:00, 5.7) 11(300, 1:45, 5.5)

Slack=1

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-38
SLIDE 38
  • Mar. 18,2003

Query Languages

38

Join

" Join(P, Size S, Left Assuming O1, Right Assuming O2)(S1, S2) " Concatenates two tuples t in S1 and u in S2 if " Both tuples satisfy the predicates (t.A – u.B) <= S " Consider two streams professor and student having the schema (name, time, location) " Join two streams having identical location within 10 minutes Professors (P) Students (S) Join(P, Size 10 min, …)(P, S)

1. (Rick, 9:00, V205) 2. (Tim, 9:00, V101) 3. (James, 9:00, V101) 4. (Rick, 9:05, V105) 5. (Tim, 9:05, V106) 6. (James, 9:05, V101) 1. (Sam, 8:55, V205) 2. (John, 8:55, V105) 3. (Adam, 8:55, V104) 4. (Jack, 8:55, V106) 5. (John, 9:05, V205) 6. (Adam, 9:05, V105)

  • 1. (Rick, 9:00, V205, Sam, 8:55, V205)
  • 2. (Rick, 9:00, V205, John, 9:05, V205)
  • 3. (Rick, 9:05, V105, John, 8:55, V105)
  • 4. (Rick, 9:05, V105, Adam, 9:05, V105)
  • 5. (Tim, 9:05, V106, Jack, 8:55, V106)
  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
slide-39
SLIDE 39
  • Mar. 18,2003

Query Languages

39

Resample

Resample (F,Size 6, Left Assuming O1, Right Assuming O2) (Time,Distance_covered)

Time(S) Distance_covered(D) Resample(F,Size6,OnTime, Assuming O) 12:00 (15,12:03,3) 12:10 (25,12:02,2) (35,12:01,2.5) (15,12:09,3.2) (25,12:07,3.4) (25,12:10,3.5) (15,12:13,3.7)

  • Filter
  • Map
  • Union
  • BSort
  • Aggregate
  • Join
  • Resample
  • Align streams by producing a synthetic stream
  • btained form interpolation of actual tuples

15,12:00,3 25,12:00,2 35,12:00,2.5 15,12:10,3.45 25,12:10,3.45

slide-40
SLIDE 40
  • Mar. 18,2003

Query Languages

40

Speed (Athlete_ID, runRate, timestamp)

SCHEMA: SCHEMA:

Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country)

Sample Query

Count number of runners whose hourly average speed is greater than 6mph

Filter(average>6)

Aggregate(Avg(runRate),Assuming O,Size 1 hour,Advance 1 hour)(Speed)

O=Order(On Time,Slack n)

slide-41
SLIDE 41
  • Mar. 18,2003

Query Languages

41

CQL & SQuAl

! CQL

! Declarative ! No support for time-out

  • perator

! Supports relations ! No support for out-of-order

streams

! SQuAl

! Procedural ! Support for time-out

  • perator

! Does not explicitly support

relations

! Tuples need not arrive in

  • rder- out of order streams

supported

What to take Away…