presented by
play

Presented by Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, - PowerPoint PPT Presentation

Presented by Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, Archana Suhas Joshi, Izi Aviyente 1 Mar. 18,2003 Query Languages Timeline ! 1:40 Talk Outline ! 1:45 CQL Primer ! 2:05 SQuAl Primer ! 2:25 ATLaS Primer ! 2:50


  1. Presented by Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, Archana Suhas Joshi, Izi Aviyente 1 Mar. 18,2003 Query Languages

  2. Timeline ! 1:40 – Talk Outline ! 1:45 – CQL Primer ! 2:05 – SQuAl Primer ! 2:25 – ATLaS Primer ! 2:50 – Break ! 3:00 – Sequence Query Processing ! 3:40 – Similarities and Differences (ATLaS, CQL, Aurora) ! 3:55 – Class Discussion ! 4:25 – Review Presentation Shortcomings 2 Mar. 18,2003 Query Languages

  3. References A. Arasu, S.Babu, J.Widom. An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations. Technical Report, November 2002. SQR- A Stream Query Repository. http://www.db.stanford.edu/stream/sqr. Joint effort of several data stream research groups. R. Motwani, J. Widom, et al. Query processing, resource management, and approximation in a data stream management system. In Proc. First Biennnial Conf. On Innovative Data Systems Research , January 2003. D.Abadi, D.Carney, U.Cetintemel, M. Cherniack, C, Convey, S. Lee, M. Stonebraker, N. Tatbul, S.Zdonik. Aurora: A New Model and Architecture for Data Stream Management. 3 Mar. 18,2003 Query Languages

  4. Outline ! Example Application: Boston Marathon ! STREAM Solution ! Review of STREAM ! CQL ! Example Queries ! AURORA Solution: SQuAl ! Aurora operators ! Example Queries ! Comparisons between Query Languages 4 Mar. 18,2003 Query Languages

  5. Example Application ! Boston Marathon ! Goal: To collect and process data about each runner continuously Satellite Speed Sensors Computer Base Station INTERNET INTERNET (update on (update on Finish Line Distance Timer Alerter Sensors Marathon) Marathon) 5 Mar. 18,2003 Query Languages

  6. Example Queries Find all runners running with speed greater than 1. 6mph. Count number of runners who passed the 10 2. mile split in the last 15 minutes Specify the Athlete_Ids who finished the 3. marathon within 5 hours …. Problem: How can we express these queries??? 6 Mar. 18,2003 Query Languages

  7. Previous Solutions Proposed ! TQL (1992) – SQL based query language of Tapestry Set て =- ∞ FOREVER DO Set t:= current time Execute Queries Q M (t) and Q M ( て ) Return Q M (t)-Q M ( て ) to user set て :=t Sleep for some period of time ENDLOOP 7 Mar. 18,2003 Query Languages

  8. Previous Solutions Proposed ! NiagraCQ(2000) – XML-QL is used to express queries CREATE CQ_NAME XML-QL QUERY DO action | START start_time] |EVERY time_interval] |EXPIRE expiration_time] WHERE <athlete> <country>Spain</name> <name> <lastname>$1</lastname> </> </>IN* CONSTRUCT <lastname>$1</lastname> 8 Mar. 18,2003 Query Languages

  9. Previous Solutions Proposed ! PSoup(2002) – SQL style queries, windows for Select Project Join queries SELECT * FROM SPEED WHERE (SPEED.RUNRATE<5) BEGIN (NOW-10) END (NOW) 9 Mar. 18,2003 Query Languages

  10. STREAM’s solution: ! Define a concrete query language (CQL) that ! Exploits relational semantics ! Is easy and compact to write ! Uses standard relational transformations for query optimization 10 Mar. 18,2003 Query Languages

  11. Recall: STREAM Define て : global, discrete, ordered time domain Stream S is unbounded bag of elements <s,t> Tuple belonging to the schema t ∈ て (timestamp) Relation R is a mapping from て to a bag of tuples where each tuple belongs to schema of the relation. (R(t)) 11 Mar. 18,2003 Query Languages

  12. RECALL: STREAM (continued) Need mappings between streams and relations: Window Specification Language Relational Streams Relations Query Language Relation-to-Stream Operators (IStream, DStream, RStream) 12 Mar. 18,2003 Query Languages

  13. Schema of Boston Marathon Speed (Athlete_ID /* unique identifier of the athlete */, S T R E A M S runRate /* the rate at which the athlete runs */) timestamp /* time of measurement */ Timer (Athlete_ID, time_elapsed /* time measured from start */) timestamp Distance_covered (Athlete_ID, distance /* distance from start */) timestamp Finished (Athlete_ID) timestamp Athlete (Athlete_ID, name, age, country) 13 Mar. 18,2003 Query Languages

  14. IStream(R) contains all (s,T) Window Specification • Language where s Є R at time T but s Є R at R time T-1 Q Stream Relation L DStream(R) contains all (s,T) Relation- Stream • EXAMPLE where s Є R at time T-1 but s Є R Operators at time T Speed Stream RStream(R) contains all (s,T) We want to stream the • where s Є R at time T average of the last 5 speed recordings taken: Athlete_id Runrate ts S R 51 5 8:30:00 T E R L 62 6 8:30:00 A E Average ID Rate T Average I 73 7 8:30:01 A I Relation 83 5 S M 6.6 83 5 8:30:02 O T - Average N 54 7 54 7 8:30:03 R R - …. E E R 68 8 6.6 68 8 8:30:04 A E L 79 7 L 79 7 8:30:04 M A s A T 80 6 8:30:05 80 6 T I I …. …. …. O O N N 14 Mar. 18,2003 Query Languages

  15. RECALL: Window Specification Language Time-based Windows: ! S[Range T] ! if T=0 S[Now] if T= ∞ S[Range Unbounded] Tuple-based Windows: ! S[Rows N] ! Partitioned Windows: ! S[Partition By A 1 , A 2 ,…..,A K Rows N] ! 15 Mar. 18,2003 Query Languages

  16. Sample Query Find all runners running with speed greater than 6 mph SELECT * Unbounded FROM Speed S window by WHERE S.runRate > 6 default Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 16 Mar. 18,2003 Query Languages

  17. Sample Query Count number of runners who passed the 10 mile split in the last 15 minutes SELECT Count(*) Relational FROM Distance_covered D [Range 15 Min] result WHERE D.distance >= 10 Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 17 Mar. 18,2003 Query Languages

  18. Sample Query Specify the Athlete_IDs who finished within 5 hours of their starting SELECT Istream(F.Athlete_ID) FROM Finished[Now] F, Timer [Range 5 Hours] T WHERE F.Athlete_ID = T.Athlete_ID Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 18 Mar. 18,2003 Query Languages

  19. Sample Query Specify runners from USA who have just finished SELECT Istream (A.*) FROM Finished[Now] F, Athletes A WHERE F.Athelete_ID = A.Athlete_ID AND A.country = ‘USA’ Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 19 Mar. 18,2003 Query Languages

  20. Implementation: Query Optimization " Window Reduction SELECT Istream (*) FROM Speed as S[Range Unbounded] WHERE Speed>6 SELECT Istream(*) FROM Speed as S[Now] WHERE Speed>6 20 Mar. 18,2003 Query Languages

  21. Query Optimization (Contd.) " Filter Window Commutativity SELECT Count(*) FROM Distance_covered D [Range 15 Min] WHERE D.distance >= 10 (SELECT Count(*) FROM Distance_covered D WHERE D.distance >= 10) [Range 15 Min] 21 Mar. 18,2003 Query Languages

  22. Aurora GUI Rstream Filter Map i0 o0 Filter Box Map Box Tumble Box Slide Box Xsection Box Wsort Box Union Restream Box Merge Box Join Box Resample Box Filter Map Super Box i1 o1 Map Filter Rstream i2 o2 Starting an arc at Map Ending an arc at Filter Starting an arc at Filter Ending an arc at Map Box 4 Starting an arc at Map Box 4 Ending an arc at Union 22 Mar. 18,2003 Query Languages

  23. SQuAl (Stream Query Algebra) Operators ! Order-Agnostic Operators ! Order-Sensitive Operators ! Filter ! BSort ! Map ! Aggregate ! Union ! Join ! Resample 23 Mar. 18,2003 Query Languages

  24. •Filter •Map Order-Agnostic Operators •Union •BSort •Aggregate •Join •Resample Filter(P 1 ,…,P m )(S) " Similar to relational selection " P 1 ,…,P m : predicates " S: input stream " Output: m+1 streams with same schema as S 24 Mar. 18,2003 Query Languages

  25. •Filter •Map Order-Agnostic Operators •Union •BSort •Aggregate t=(TS=ts, A 1 =v 1 ,……,A k =v k ) •Join •Resample Filter(P 1 ,P 2 )(Distance_covered) P 1 : Distance< 3 mile P 2 : Distance> 5 mile Result will have three streams… Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 25 Mar. 18,2003 Query Languages

  26. •Filter •Map Order-Agnostic Operators •Union •BSort •Aggregate •Join •Resample Athlete_ID Distance Timestamp 9908 5 08:14 9909 4.1 08:15 9915 6.2 08:16 9911 2.8 08:17 9908 5.5 08:18 9909 4.9 08:19 9915 6.6 08:20 Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 26 Mar. 18,2003 Query Languages

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend