database management the query systems
play

Database Management The Query Systems SELECT P.name Find the - PowerPoint PPT Presentation

Database Management The Query Systems SELECT P.name Find the names of Consider: FROM Professor P, Teaching T professors from A Hands-On Example for Query Plan Cost Estimation WHERE P.Id = T.ProfID Computing Science AND T.Semester =


  1. Database Management The Query Systems SELECT P.name Find the names of Consider: FROM Professor P, Teaching T professors from A Hand’s-On Example for Query Plan Cost Estimation WHERE P.Id = T.ProfID Computing Science AND T.Semester = ‘F2000’ who taught a course CMPUT 391: Query Processing & Optimization AND P.Dept = ‘CS’ in the Fall of 2000 Dr. Osmar R. Zaïane Professor Id name Dept … … Semester ProfID Teaching University of Alberta Join  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 Database Management Systems University of Alberta 1 Database Management Systems University of Alberta 2 The Query in RA Query Trees π name ( σ Dept=‘CS’ ∧ Semester=‘F2000’ (Professor Id=ProfID Teaching)) A SELECT P.name FROM Professor P, Teaching T SQL π name ( σ Dept=‘CS’ (Professor) Id=ProfID σ Semester=‘F2000’ (Teaching)) WHERE P.Id = T.ProfID B Possible Relational AND T.Semester = ‘F2000’ Algebra Expressions AND P.Dept = ‘CS’ A π B π name name π name ( σ Dept=‘CS’ ∧ Semester=‘F2000’ (Professor σ A Id=ProfID Teaching)) Dept=‘CS’ ∧ Semester=‘F2000’ Id=ProfID π name ( σ Dept=‘CS’ (Professor) Id=ProfID σ Semester=‘F2000’ (Teaching)) B σ σ Semester= Dept=‘CS’ π name ( σ Semester=‘F2000’ ( σ Dept=‘CS’ (Professor) ‘F2000’ C Id=ProfID Teaching)) Id=ProfID π name ( σ Dept=‘CS’ (Professor Id=ProfID σ Semester=‘F2000’ (Teaching))) D Teaching Teaching Professor Professor  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 3 4 Database Management Systems University of Alberta Database Management Systems University of Alberta

  2. Query Trees Data Dictionary and Indexes π name ( σ Semester=‘F2000’ ( σ Dept=‘CS’ (Professor) C Id=ProfID Teaching)) 200 pages Professor π name ( σ Dept=‘CS’ (Professor Id=ProfID σ Semester=‘F2000’ (Teaching))) 1000 records on professors in 50 Departments D That is 5 tuples per page Hash B+tree Clustered 2-level B + tree index on Dept C D π π Professor Id name Dept … Hash index on Id name name σ σ Dept=‘CS’ Semester=‘F2000’ Teaching 1000 pages 10000 teaching records for the period of 4 semesters Id=ProfID Id=ProfID That is 10 tuples per page σ Dept=‘CS’ σ Semester= Professor Teaching B+tree Hash Clustered 2-level B + tree index on Semester ‘F2000’ Hash index on ProfID … Semester ProfID Teaching Professor Teaching  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 Database Management Systems University of Alberta 5 Database Management Systems University of Alberta 6 Cost Estimation for A Cost Estimation for B Accessing the indices: We have 48 buffer blocks in main memory Both indices are 2-level B+ trees. This means A Query Plan A Query Plan A B π π we need 2 I/Os for each � 4 I/Os We will use 1 block for the output; 1 block name name for the input of Teaching ; and the rest (46 Pipeline Pipeline Estimating the sizes of the selections: σ blocks) for the input of Professor . Write in •There are 1000 professors in 50 departments. Dept=‘CS’ ∧ Semester=‘F2000’ P & T Join Result Temp file Assuming a uniform distribution, CS would Buffer block of P Id=ProfID Pipeline BNLJ (46 pages) have 1000/50=20 professors. (4 pages) + 4 250 σ σ . . . Writing temporary file (4 pages) Semester= BNLJ Dept=‘CS’ •There are 10000 teachings in 4 semester. ‘F2000’ . . . . . . Id=ProfID B+ tree on B+ tree on Assuming a uniform distributions, there would Dept Semester Input buffer for T Output buffer be 10000/4=2500 teachings in the Fall of 2000. 1000 1000 200 200 (250 pages) + writing temp file (250 pages) • We need to read Professor once � 200 I/O Professor Teaching Professor Teaching Clustered • We need to read Teaching 5 times Cost = (4 + 250 + 4) + (4 + 250) Block nested-loop Join: This is because with 46 blocks, we need to fill the Cost = 200 + 5000 = 5200 I/O buffers  200/46  =5 times to read the whole Professor + (4 + 250) = 766 I/O All records of professors in CS fit in the buffer. 200 for reading Professor We would scan the teaching of F2000 only 258 for the select; 254 for writing table. Each time we fill the buffer, we scan Teaching . 5000 for reading Teaching 5 times once. � 4 + 250 I/Os The temporary files; 250 for the join • There is no cost for selecting and projecting.  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 7 8 Database Management Systems University of Alberta Database Management Systems University of Alberta

  3. Cost Estimation for B (cont’) Cost Estimation for C If DBMS is smart: Accessing the indices: No need for temp files •The index is a 2-level B+ trees. This means we Both indices are 2-level B+ trees. This means A Query Plan A Query Plan B π C π need 2 I/Os to access the CS professors. we need 2 I/Os for each � 4 I/Os name name •There are 1000 professors in 50 departments. σ Pipeline Pipeline Assuming a uniform distribution, CS would have Estimating the sizes of the selections and BNLJ: Semester=‘F2000’ 1000/50=20 professors. Since the records are •There are 1000 professors in 50 departments. Id=ProfID BNLJ Assuming a uniform distribution, CS would INLJ clustered, we would need only 4 I/Os. have 1000/50=20 professors. (4 pages) 4 σ 250 • The selection result is piped as input to the join. σ Id=ProfID • Since the result can fit in main memory, the Semester= σ Dept=‘CS’ Use Hash •To match the 20 professor, we need to search the Dept=‘CS’ ‘F2000’ result of the second select can play the role of index 20 times. Thus, accessing the index costs B+ tree on B+ tree on Teaching the scan. As the clustered Teachings of F2000 B+ tree on Dept Semester 20 * 1.2 = 24 I/O Semester 1000 are read, they are joined to the professor records 1000 200 200 •Again, assuming uniform distribution, each Teaching in the buffer. Professor Professor professor teaches 10 teachings (10000/1000). •There are 10000 teachings in 4 semester. Cost = (4 + 250 + 4) = 258 I/O Since the index is not clustered, we need 10 I/Os Assuming a uniform distributions, there would Cost = 2 + 4 + 224 = 230 I/O 258 for the select. per professor to get all teachings. That is 200 I/Os be 2500 teachings in the Fall of 2000. (250 6 for selecting Professor the join is on the fly. pages) in total (20*10) for the teachings of all professors 224 for the join in CS.  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 Database Management Systems University of Alberta 9 Database Management Systems University of Alberta 10 Cost Estimation for D Cost Estimation for D (cont’) We have 48 buffer blocks in main memory We have 48 buffer blocks in main memory A Query Plan A Query Plan D D •There are 10000 teachings in 4 semester. Assuming π π • We need to sort Professors and the teachings in the a uniform distributions, there would be 2500 name name Fall of 2000. The teachings can be sorted as they are σ σ teachings in the Fall of 2000. (250 pages since 10 Pipeline Pipeline selected. While in main memory, the first runs can be Dept=‘CS’ Dept=‘CS’ tuples per page) 250 I/O + 2 I/Os for B+tree. produced. � No temporary files for the selection. • While in main memory after selection, the first SortMerge SortMerge • Sorting Professor with a k-way sort: runs can be produced by sorting the buffers in MM. since we have 48 blocks, this allows us to get 5 runs Id=ProfID Id=ProfID The first runs are obtained after 252 I/O for (  200 pages /48 blocks  =5 runs) σ σ selection and 250 I/Os for writing the 1 st runs. Semester= Semester= 48 pages 48 pages 48 pages 48 pages 8 pages Professor Professor 200 I/O reading We obtain  250 pages /48 blocks  = 6 runs B+ tree on B+ tree on ‘F2000’ ‘F2000’ 200 I/O writing Semester Semester 1000 1000 • We have enough buffers to merge them in one 200 200 Teaching Teaching pass. This adds 250 + 250 I/Os to sort them all. 200 I/O reading • Selecting and sorting the teachings costs 252 + 250 200 I/O writing Cost =800+1002+450= 2252 I/O 200 pages Cost of sort is 800 I/O so far. + 250 + 250 = 1002 I/Os. � Sorting Professor costs us 2 * 200 + 2 * 200 = 800 for sorting P •The sort merge requires an additional scan of both 1002 for selecting and sorting T 800 I/Os. 450 for the sort merge join sorted files: 200 + 250  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 11 12 Database Management Systems University of Alberta Database Management Systems University of Alberta

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend