sequence query processing
play

SEQUENCE QUERY PROCESSING Praveen Seshadri, Miron Livny, Raghu - PowerPoint PPT Presentation

SEQUENCE QUERY PROCESSING Praveen Seshadri, Miron Livny, Raghu Ramakrishnan (CS Department, University of Wisconsin-Madison, WI) Sedat Behar and Yevgeny Ioffe REFERENCES Praveen Sheshadri, Management of Sequence Data (PhD Thesis)


  1. SEQUENCE QUERY PROCESSING Praveen Seshadri, Miron Livny, Raghu Ramakrishnan (CS Department, University of Wisconsin-Madison, WI) Sedat Behar and Yevgeny Ioffe

  2. REFERENCES • Praveen Sheshadri, Management of Sequence Data (PhD Thesis) • R.Ramakrishnan, M.Chend, M.Livny, P.Sheshadri. What’s Next? Sequence Queries • P.Sheshadri, M.Livny, R.Ramakrishnan. SEQ: A Model for Sequence Databases • H.Gunadhi, A.Segev. Event-join optimization in temporal relational databases

  3. WHY? (Difficulties with expressing Sequences) • Relational Databases: – Data collections treated as sets not sequences • Difficult to evaluate in SQL – Data model does not help evaluation • Hard to optimize: – scan two sequences in lock-step

  4. Query: Find the Volcano that caused an earthquake of magnitude 7.0 or greater SELECT V.name FROM Volcanos V, Earthquakes E WHERE E.Strength > 7.0 AND E.time = (SELECT max(E1.time) FROM Earthquakes E1 WHERE E1.time < V.time) It is difficult and not efficient!

  5. SEQUENCE MODEL Definitions : • Record : <A 1 :T 1 , A 2 :T 2 ,…, A n :T n > • Attribute : data type (int, String…) • Type : an instance in the type domain (Null associated) • Position : location of an entry in the record • Position Ordering : function that returns the position (index) • Type Domain : ( T 1 x T 2 x …x T n ) Example of a Record: <Name:String, HeartBeat:Int, BloodPres:Int …>

  6. SEQUENCE MODEL • Types of Sequences: BASE SEQUENCES : some positions map to some records CONSTANT SEQUENCES : every position maps to the same unique record Ex: Perfect Health (Mr.Perfect, 80, 80/120) DERIVED SEQUENCES : defined by a sequence operator

  7. SEQUENCE MODEL - OPERATORS • All operators are compositional: produce a single derived sequence from 2 sequences • Simple Unary Operators: – Selection: similar to Relational Databases – Projection : similar to Relational Databases – Positional Offset : shifts input sequence by offset – Value Offset : shifts the non-null entries by offset • Aggregate Unary Operators – agg_pos(i): selects set P of positions for each position i – agg_func: aggregate function over records in input stream at positions p in P • Compose Operator ( positional join operator ) – binary operator composing records r1 and r2 of two input sequences at each position i ( Null exception )

  8. SEQUENCE MODEL Sequence Queries: Project • A sequence query is an V.name acyclic graph of operators compose (just like Aurora!) E v • Output of query is output Select STR>7.0 sequence previous Volcanos • No output can be input to more than 1 operator ! Earthquakes graph is a tree

  9. SCOPE OF OPERATOR • important for optimization and evaluation! • Operator can be described by 2 functions: – Scope : defines the positions of input records to look at – OpFunc : an operator function that actually works with input records defined by Scope to define the output sequence

  10. OPERATOR PROPERTIES • Operator properties: – Scope size at position i – Scope sequentiality – Scope relativity at position i • Complex operator – Is an acyclic composition of basic operators – Properties of basic operators determine its property ( fixed scope; sequential scope; relative scope )

  11. QUERY OPTIMIZATION - TRANSFORMATIONS • Transform declarative query into equivalent one • Two sequence queries are equivalent if both have same: – input sequences – scopes on input sequences – operator function • Equivalence is independent of actual data in input sequences

  12. QUERY OPTIMIZATION – TRANSFORMATIONS • Alter a sub-query but not the entire graph • Incorrect transformations • Good idea to propagate selections, projections, and positional offsets as far down as possible • Non-unit scope operators ! break query into blocks

  13. GLOBAL SPAN OPTIMIZATION 200…350 200…350 PROJECT PROJECT 200…350 #4 .price #4 .price PROJECT #4 .price 200…350 200…350 200…350 COMPOSE COMPOSE COMPOSE #4 #4 #1#2 #1#2 #4 #1#2 200…350 200…350 200…350 200…350 200…500 COMPOSE COMPOSE #1.close>#2.close #1.close>#2.close 1…350 COMPOSE #1.close>#2.close DEC DEC #1 #1 #2 #2 DEC #1 #2 200…350 200…350 200…350 1…750 200…500 IBM IBM HP HP IBM HP • reduce query processing costs by restricting the span of a sequence based on span of other sequences ! can modify span of output based on input and vice versa!

  14. META-INFORMATION • Concepts: Span and Density 200…350 200…350 COMPOSE COMPOSE Density=0.48 #1 #1 Selectivity=0.5 Selectivity=0.5 Close>25 Close>25 Density=0.7 Density=0.7 Density=0.95 Width=20 Density=0.95 Width=100 IBM DEC IBM DEC 200…350 200…350 200…350 200…350 A. B. • Utilize span first then density to reduce the workload on join

  15. ACCESS MODES 200…350 200…350 COMPOSE COMPOSE Density=0.48 #1 #1 Selectivity=0.5 Selectivity=0.5 Close>25 Close>25 Density=0.7 Density=0.7 Density=0.95 Width=20 Density=0.95 Width=100 A. B. IBM DEC IBM DEC 200…350 200…350 200…350 200…350 • Join Strategy-A – two ways of doing it – analogous to NLJ • Join Strategy-B: Stream both in lock-step • “stream access” = get the next non-Null record • “probed access” = get the record at a specific position

  16. CACHING OF DERIVED SEQUENCES #4 #4 SUM SUM PREVIOUS [pos, pos-5] [pos, pos-5] #3 #3 #1 #1 #1.close>#2.close PROJECT PROJECT close close COMPOSE COMPOSE #1 #1 #2 #2 IBM IBM IBM HP HP • Cache-Strategy-A: – Cache last 6 values of sequence #1 (figure on left) – If scope is large or variable ! may not be feasible to cache whole scope • Cache-Strategy-B (incremental cache strategy): – Cache the value of #4 at previous position ! then the record at some position p is either cached record at previous position OR it’s the non-Null record from #3 at previous position

  17. QUERY PLAN GENERATION Sequences may be queried for - specific positions - range of positions start Plan Generation Algorithm: compose SQG: signifies that there might be a derived sequence represented below, not necessarily a base sequence. Position SQG Start: initiates query evaluation by invoking stream Sequence access on its input

  18. QUERY PLAN GENERATION Step 1- Query Specification Step 2- Meta-Information Propagation: - Bottom-up - Top-down Step 3- Query Transformation Step 4- Identification of Query Blocks Step 5- Block-wise Plan Generation Step 6- Plan Selection

  19. QUERY PLAN GENERATION Access costs to Base Sequences: size of valid range, density of sequence, access paths available Costs: Blocks with Non-Unit Scope: (Aggregate & Value Offset) –If stream access, • cost = stream cost of input + cost of storing in cache each record + cost of cache access + computational cost –If probed access, • cost = probed access cost × operator scope size Costs: Blocks with Positional Joins: Stream Access: Stream access on stream-1 and probe on stream-2; or converse or stream access on both streams Probed Access: Access stream-1 in probed fashion and for every record join stream- 2 in probed fashion Algorithmic Analysis: left-deep trees, time and space complexity

  20. Extensions • To Model: General Sequences, Ordering Domains, Multiple Orderings, Sequence Groupings • To Queries: Generalized Query Graph, Correlated Queries • To Framework: Optimization Framework, Materialization of Derived Sequences, Optimizations on base sequences (sorting)

  21. Related Work • Compare to TS-based models • Aurora, Stream-based systems

  22. CONCLUSIONS/Things to remember • Two important things: scope and query generation plans Discussion Questions • How can this algorithm extend (relate) to Aurora or Atlas? • How is operator scope extended/implemented in CQL?

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend