cse 232a graduate database systems
play

CSE 232A Graduate Database Systems Arun Kumar Topic 4: Query - PowerPoint PPT Presentation

CSE 232A Graduate Database Systems Arun Kumar Topic 4: Query Optimization Chapters 12 and 15 of Cow Book Slide ACKs: Jignesh Patel, Paris Koutris 1 Lifecycle of a Query Query Result Query Database Server Query Execute Parser


  1. CSE 232A 
 Graduate Database Systems Arun Kumar Topic 4: Query Optimization Chapters 12 and 15 of Cow Book Slide ACKs: Jignesh Patel, Paris Koutris 1

  2. Lifecycle of a Query Query Result Query Database Server Query 
 Execute Parser Optimizer Scheduler Operators |…|……|………..|………..| Select R.text from |…|……|………..|………..| Report R, Weather W |…|……|………..|………..| where W.image.rain() |…|……|………..|………..| and W.city = R.city |…|……|………..|………..| and W.date = R.date |…|……|………..|………..| and |…|……|………..|………..| R.text. |…|……|………..|………..| matches(“insurance claims”) |…|……|………..|………..| Query |…|……|………..|………..| |…|……|………..|………..| Query Result Syntax Tree and Physical Logical Query Plan Query Plan Segments 2

  3. Recall the Netflix Schema Ratings RatingID Stars RateDate UID MID 1 3.5 08/27/15 79 20 … … … … … UID Name Age JoinDate Users 79 Alice 23 01/10/13 80 Bob 41 05/10/13 Movies MID Name Year Director 20 Inception 2010 Christopher Nolan 16 Avatar 2009 Jim Cameron 3

  4. Example SQL Query RatingID Stars RateDate UID MID UID Name Age JoinDate MID Name Year Director SELECT M.Year, COUNT(*) AS NumBest Ratings R, Movies M FROM R.MID = M.MID WHERE AND R.Stars = 5 GROUP BY M.Year ORDER BY NumBest DESC Suppose, we also have a B+Tree Index on Ratings (Stars) 4

  5. Logical Query Plan Result Table SORT On NumBest Called “ Logical ” Operators GROUP BY AGGREGATE M.Year, COUNT(*) From extended RA Each one has JOIN alternate “physical” R.MID = M.MID implementations SELECT SELECT R.stars = 5 No predicate Ratings Table Movies Table 5

  6. Physical Query Plan Result Table External Merge-Sort In-mem quicksort; B=50 Called “ Physical ” Operators Sort-based Specifies exact Aggregate algorithm/code to run for each logical operator, with all Index-Nested Loop Join parameters (if any) Aka “ Query Indexed Access File Scan Use Index on Stars Read heapfile Evaluation Plan ” Ratings Table Movies Table 6

  7. Physical Query Plan Result Table External Merge-Sort In-mem quicksort; B=50 This is also a correct PQP for the given LQP! Hash-based Aggregate Q: Which PQP is faster? This is a key job of the Hash Join RDBMS Query Optimizer! File Scan File Scan Read Index leaf pages Read heapfile Ratings Table Movies Table 7

  8. So, what is query optimization and how does it work? 8

  9. Meet Query Optimization A given LQP could have several possible Basic Idea: PQPs with very different runtime performance Goal (Ideal): Get the optimal (fastest) PQP for a given LQP Goal (Realistic): Fine, just avoid the “clearly awful” PQPs! Query optimization is a metaphor for life itself! It is often hard to even know what an optimal plan would be, but it is feasible to avoid many obviously bad plans! Jeff Naughton 9

  10. Query Optimization Overview of Query Optimizer ❖ Physical Query Plan (PQP) ❖ Concept: Pipelining Mechanism: Iterator Interface Enumerating Alternative PQPs ❖ Logical: Algebraic Rewrites Physical: Choosing Phy. Op. Impl. Costing PQPs ❖ Materialized Views ❖ 10

  11. Overview of Query Optimizer SQL Query Parser Logical Query Plan Plan Plan Cost Enumerator Estimator Catalog Optimizer Physical Query Plan (Optimized) To Scheduler/Executor 11

  12. System Catalog ❖ Set of pre-defined relations for metadata about DB (schema) ❖ For each Relation : Relation name, File name File structure (heap file vs. clustered B+ tree, etc.) Attribute names and types; Integrity constraints; Indexes ❖ For each Index : Index name, Structure (B+ tree vs. hash, etc.); Index key ❖ For each View : View name, and View definition 12

  13. Statistics in the System Catalog ❖ RDBMS periodically collects stats about DB (instance) ❖ For each Table R : Cardinality, i.e., number of tuples, NTuples (R) Size, i.e., number of pages, NPages (R) , or just N R ❖ For each Index X : Cardinality, i.e., number of distinct keys IKeys (X) Size, i.e., number of pages IPages (X) (for a B+ tree, this is the number of leaf pages only) Height (for tree indexes) IHeight (X) Min and max keys in index ILow (X) , IHigh (X) 13

  14. Query Optimization Overview of Query Optimizer ❖ Physical Query Plan (PQP) ❖ Concept: Pipelining Mechanism: Iterator Interface Enumerating Alternative PQPs ❖ Logical: Algebraic Rewrites Physical: Choosing Phy. Op. Impl. Costing PQPs ❖ Materialized Views ❖ 14

  15. Concept: Pipelining Result Table External Merge-Sort In-mem quicksort; B=50 Q: Does the hash-based aggregate have to wait till the entire output of Hash-based Aggregate the “upstream” hash join is available? Hash Join No! We can “ pipeline ” the output of the join – pass on File Scan File Scan Read Index leaf pages Read heapfile a join output tuple as soon as it is obtained! Movies Table RatingsTable 15

  16. Concept: Pipelining Do not force “downstream” physical operators Basic Idea: to wait till the entire output is available Display output to the user incrementally Benefits: CPU Parallelism in multi-core systems! File Scan Hash Join Tuples Hash-based Aggregate 16

  17. Concept: Pipelining ❖ Crucial for PQPs with workflow of many phy. ops. ❖ Common feature of almost all RDBMSs ❖ Works for many operators: SCAN, HASH JOIN, etc. Q: Are all physical operators amenable to pipelining? No! Some may “stall” the pipeline: “ Blocking Op ” A blocking op. requires its output to be Materialized as a temporary table Usually, any phy. op. involving sorting is blocking! 17

  18. Blocking Op Result Table External Merge-Sort In-mem quicksort; B=50 This phy. op. is blocking because we need to sort Movies and sort Ratings Hash-based Aggregate (materialize the output) before we can start any aggregate computations! Sort-Merge Join File Scan File Scan Read heapfile Read heapfile Movies Table RatingsTable 18

  19. Query Optimization Overview of Query Optimizer ❖ Physical Query Plan (PQP) ❖ Concept: Pipelining Mechanism: Iterator Interface Enumerating Alternative PQPs ❖ Logical: Algebraic Rewrites Physical: Choosing Phy. Op. Impl. Costing PQPs ❖ Materialized Views ❖ 19

  20. Mechanism: Iterator Interface ❖ Software API to process PQP; makes pipelining easy to impl. ❖ Enables us to abstract away individual phy. op. impl. details ❖ Three main functions in usage interface of each phy. op.: Initialize the phy. op. “state”, get arguments Open() : Allocate input and output buffers GetNext() : Ask the phy. op. impl. to “deliver” next output tuple; pass it on; if blocking, wait Close() : Clear phy. op. state, free up space 20

  21. Query Optimization Overview of Query Optimizer ❖ Physical Query Plan (PQP) ❖ Concept: Pipelining Mechanism: Iterator Interface Enumerating Alternative PQPs ❖ Logical: Algebraic Rewrites Physical: Choosing Phy. Op. Impl. Costing PQPs ❖ Materialized Views ❖ 21

  22. Overview of Query Optimizer SQL Query Parser Logical Query Plan Plan Plan Cost Enumerator Estimator Catalog Optimizer Physical Query Plan (Optimized) To Scheduler/Executor 22

  23. Enumerating Alternative PQPs ❖ Plan Enumerator explores various PQPs for a given LQP ❖ Challenge : Space of plans is huge! How to make it feasible? ❖ RDBMS Plan Enumerator has Rules to help determine what plans to enumerate, and also consults Cost models ❖ Two main sources of Rules for enumerating plans: Logical: Algebraic Rewrites : Use relational algebra equivalence to rewrite LQP itself! Physical: Choosing Phy. Op. Impl. : Use different phy. op. impl. for a given log. op. in LQP 23

  24. Query Optimization Overview of Query Optimizer ❖ Physical Query Plan (PQP) ❖ Concept: Pipelining Mechanism: Iterator Interface Enumerating Alternative PQPs ❖ Logical: Algebraic Rewrites Physical: Choosing Phy. Op. Impl. Costing PQPs ❖ Materialized Views ❖ 24

  25. Algebraic Rewrite Rules ❖ Rewrite a given RA query in to another that is equivalent (a logical property) but might be faster (a physical property) ❖ RA operators have some formal properties we can exploit ❖ We will cover only a few rewrite rules: Single-operator Rewrites Unary operators Binary operators Cross-operator Rewrites 25

  26. Unary Operator Rewrites ❖ Key unary operators in RA: σ π ❖ Commutativity of σ σ p 1 ( σ p 2 ( R )) = σ p 2 ( σ p 1 ( R )) ❖ Cascading of σ σ p 1 ( σ p 2 ( . . . σ p n ( R ) . . . )) = σ p 1 ∧ p 2 ∧ ··· ∧ p n ( R ) A i ⊆ A i +1 ∀ i = 1 . . . ( n − 1) ❖ Cascading of π π A 1 ( π A 2 ( . . . π A n ( R ) . . . )) = π A 1 ( R ) Q: Why are cascading rewrites beneficial? 26

  27. Binary Operator Rewrites ❖ Key binary operator in RA: . / ❖ Commutativity of R . / S = S . / R . / ❖ Associativity of ( R . / S ) . / T = R . / ( S . / T ) . / Q: Why are these properties beneficial? Q: What other binary operators in RA satisfy these? 27

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend