course content database management systems
play

Course Content Database Management Systems Introduction - PowerPoint PPT Presentation

Course Content Database Management Systems Introduction Database Design Theory Query Processing and Optimisation Winter 2003 Concurrency Control Data Base Recovery and Security CMPUT 391: Query Processing &


  1. Course Content Database Management Systems • Introduction • Database Design Theory • Query Processing and Optimisation Winter 2003 • Concurrency Control • Data Base Recovery and Security CMPUT 391: Query Processing & Optimization • Object-Oriented Databases • Inverted Index for IR Dr. Osmar R. Zaïane • XML • Data Warehousing • Data Mining • Parallel and Distributed Databases University of Alberta Chapters 12, 13, 14 • Other Advanced Database Topics 15 & 20 of Textbook  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 Database Management Systems University of Alberta 1 Database Management Systems University of Alberta 2 2 Query Processing and Optimization Objectives of Lecture 3 Query Processing and Optimization • Query Processing and Planning • System Catalog • Get a glimpse on query processing and • Evaluation of Relational Operations evaluation. • Merge Sort • Introduce the issue of query planning and • Evaluation of Relational Operations (Continue) plan selection. • Understand the importance of good • Cost Estimation and Plan Selection database design for good performance. • Physical Database Design Issues • Database Tuning  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 3 4 Database Management Systems University of Alberta Database Management Systems University of Alberta

  2. Overview of Query Processing The Need for Optimization • The aim is to transform a query in a high-level declarative language (SQL) into a correct and Consider: efficient execution strategy SELECT name, address • Query Decomposition FROM Customer, Account WHERE Customer.name = Account.name – Analysis AND Balance > 2000 – Conjunctive and disjunctive normalization There are different possibilities for execution: – Semantic analysis π C.name,C.address ( σ C.name=A.name ∧ A.balance>2000 (C × A)) • Query Optimization π C.name,C.address ( σ C.name=A.name (C × σ A.balance>2000 (A)) • Query Evaluation (Execution)  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 Database Management Systems University of Alberta 5 Database Management Systems University of Alberta 6 Architecture for DBMS Query General Approaches to Optimization Processing • Heuristic-based query optimization SQL Query SQL Parser – Given a query expression, perform selections and projections as early as possible. Relational Algebra Expression – Eliminate duplicate computations. Query Optimizer • Cost-based query optimization System Cost Query Plan Catalog Estimator Generator – Estimate the cost of different equivalent query Query Execution Plan expressions (using the heuristics and algebra manipulation) and choose the execution plan with Query Plan Interpreter the lowest cost estimation. Query Result  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 7 8 Database Management Systems University of Alberta Database Management Systems University of Alberta

  3. Heuristic Transformations General Guidelines Selection and projection-based transformations • Cascading Selection • Perform Selections and projections as early as σ cond1 ∧ cond2 (R) ≡σ cond1 ( σ cond2 (R)) possible • Commutativity of selection – Splitting selection formula if necessary σ cond1 ( σ cond2 (R)) ≡ σ cond2 ( σ cond1 (R)) – Adding projections to eliminate unused columns • Cascading of Projection • Eliminating or reducing if possible repeated π Attribs1 ( π Attribs2 (…( π Attribsn (R)…)) ≡ π Attribs1 (R) computations • Commutativity of Selection and Projection • Combine unary operators with binary operators π Attribs ( σ cond (R)) ≡ σ cond ( π Attribs (R))  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 Database Management Systems University of Alberta 9 Database Management Systems University of Alberta 10 Heuristic Transformations Query Trees Pushing selections and projections through joins • A query tree is a tree structure that corresponds to a σ cond (R × S) ≡ R relational algebra expression such that: cond S – Each leaf node represents an input relation; if conditions cond relate to the attributes of both R and S – Each internal node represents a relation obtained by applying σ cond (R × S) ≡ σ cond (R) × S one relational operator to its child nodes if attributes in cond all belong to R (idem with joins) – The root relation represents the answer to the query π Attribs1 (R × S) ≡ π Attribs1 ( π Attribs2 (R) × S ) • Two query trees are equivalent if their root relations are Where attribs1 ⊆ attribs2 ⊆ (R) the same (query result) π Attribs1 (R cond S) ≡ π Attribs1 ( π Attribs2 (R) cond S) • A query tree may have different execution plans Attribs2 should contain all attributes in cond • Some query trees and plans are more efficient to execute than others.  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 11 12 Database Management Systems University of Alberta Database Management Systems University of Alberta

  4. Example of Query Tree and Overview of Query Optimization Query Plan • Query Plan : Tree of Relational Algebra operators with choice of algorithms for each operation. Query Tree Query Plan – Each operator typically implemented using a `pull’ interface: SELECT S.sname FROM Reserves R. Sailors S when an operator is `pulled’ for the next output tuples, it `pulls’ sname On the fly WHERE R.sid = S.sid AND on its inputs and computes them. R.bid = 100 AND • Two main issues: On the fly rating > 5 bid=100 S.rating > 5 – For a given query, what plans are considered? • Algorithm to search plan space for cheapest (estimated) plan. Simple Nested Loop Join – How is the cost of a plan estimated? sid=sid File Scan • Ideally: Want to find best plan. Reserves Sailors File Scan • Practically: Avoid worst plans! π S.sname ( σ R.sid=S.sid ^ R.bid = 100 ^ S.rating>5 ( R × S))  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 Database Management Systems University of Alberta 13 Database Management Systems University of Alberta 14 Query Processing and Optimization System Catalog • Query Processing and Planning • A Database system maintains information • System Catalog about every relation and view it contains. • Evaluation of Relational Operations • This information is stored in special • Merge Sort relations called catalog relations or data • Evaluation of Relational Operations (Continue) dictionary • Cost Estimation and Plan Selection • The data in the data dictionary is • Physical Database Design Issues extensively used for query optimization • Database Tuning  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 15 16 Database Management Systems University of Alberta Database Management Systems University of Alberta

  5. Statistics Stored System Catalog Information • Cardinality (Ntuples(R)) : number of tuples in each relation • Size (Npages(R)): number of pages for each relation • For each relation • Index Cardinality (Nkeys(I)) : number of distinct key values – Relation name, file name, file structure • Index Size (INPages(I)) : number of pages for each index – Attribute name and type for all attributes • Index Height ( IHeight(I)): number of nonleaf levels for each tree – Index name for all indexes on the relation index – Integrity constrains on the relation • Index Range: minimum (ILow(I)) and maximum (IHigh(I)) present key values for each index • For each index • Catalogs updated periodically. – Index name and structure – Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok. – Search key attributes • More detailed information (e.g., histograms of the values in some • For each view field, or attribute weight, etc.) are sometimes stored. – View name and definition  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 Database Management Systems University of Alberta 17 Database Management Systems University of Alberta 18 Query Processing and Estimating the Result Size Optimization • Typical optimizers estimate the size of the • Query Processing and Planning relation resulting from a relational operation. • System Catalog • The result size estimation plays an important role in cost estimation because the output of an • Evaluation of Relational Operations operation can be the input of another operation. • Merge Sort • In a SELECT-FROM-WHERE query, the size of the • Evaluation of Relational Operations (Continue) result is typically the product of the cardinality • Cost Estimation and Plan Selection of the relations in the FROM clause, adjusted by • Physical Database Design Issues the reduction effect by the conditions in the WHERE clause. • Database Tuning  Dr. Osmar R. Zaïane, 2001  Dr. Osmar R. Zaïane, 2001 19 20 Database Management Systems University of Alberta Database Management Systems University of Alberta

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend