today s class carnegie mellon univ
play

Todays Class Carnegie Mellon Univ. History & Background Dept. - PowerPoint PPT Presentation

CMU SCS CMU SCS Todays Class Carnegie Mellon Univ. History & Background Dept. of Computer Science Relational Algebra Equivalences 15-415/615 - DB Applications Plan Cost Estimation Plan Enumeration C. Faloutsos A.


  1. CMU SCS CMU SCS Today’s Class Carnegie Mellon Univ. • History & Background Dept. of Computer Science • Relational Algebra Equivalences 15-415/615 - DB Applications • Plan Cost Estimation • Plan Enumeration C. Faloutsos – A. Pavlo Lecture#15: Query Optimization Faloutsos/Pavlo CMU SCS 15-415/615 2 CMU SCS CMU SCS Query Optimization 1970s – Relational Model • Remember that SQL is declarative. • Ted Codd saw the maintenance overhead for IMS/Codasyl. – User tells the DBMS what answer they want, not how to get the answer. • Proposed database abstraction based • There can be a big difference in on relations: Codd performance based on plan is used: – Store database in simple data structures. – See last week: 5.7 days vs. 45 seconds – Access it through high-level language. – Physical storage left up to implementation. Faloutsos/Pavlo CMU SCS 15-415/615 3 Faloutsos/Pavlo CMU SCS 15-415/615 4

  2. CMU SCS CMU SCS IBM System R IBM System R • Skunkworks project at IBM Research in • First implementation of a query optimizer. San Jose to implement Codd’s ideas. • People argued that the DBMS could never • Had to figure out all of the things that we choose a query plan better than what a are discussing in this course themselves. human could write. • IBM never commercialized System R. • A lot of the concepts from System R’s optimizer are still used today. Faloutsos/Pavlo CMU SCS 15-415/615 5 Faloutsos/Pavlo CMU SCS 15-415/615 6 CMU SCS CMU SCS Today’s Class Relational Algebra Equivalences • History & Background • A query can be expressed in different ways. • Relational Algebra Equivalences • The optimizer considers variations and • Plan Cost Estimation choose the one with the lowest cost. • Plan Enumeration • How do we know whether two queries are • Nested Sub-queries equivalent? Faloutsos/Pavlo CMU SCS 15-415/615 7 Faloutsos/Pavlo CMU SCS 15-415/615 8

  3. CMU SCS CMU SCS Relational Algebra Equivalences Predicate Pushdown SELECT name, cid • Two relational algebra expressions are FROM student, enrolled WHERE student.sid = equivalent if they generate the same set of enrolled.sid tuples. AND enrolled.grade = ‘A’ π π name, cid name, cid σ ⨝ sid=sid grade=‘A’ ⨝ σ sid=sid grade=‘A’ student enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 9 Faloutsos/Pavlo CMU SCS 15-415/615 10 CMU SCS CMU SCS Relational Algebra Equivalences Relational Algebra Equivalences SELECT name, cid FROM student, enrolled • Selections: WHERE student.sid = – Perform them early enrolled.sid AND enrolled.grade = ‘A’ – Break a complex predicate, and push down σ p1 ∧ p2 ∧ …pn (R) = σ p1 ( σ p2 ( σ … pn (R))…) π name, cid ( σ grade=‘A’ (student ⋈ enrolled )) • Simplify a complex predicate = – ( X=Y AND Y=3 ) → X=3 AND Y=3 π name, cid ( student ⋈ ( σ grade=‘A’ (enrolled ))) Faloutsos/Pavlo CMU SCS 15-415/615 11 Faloutsos/Pavlo CMU SCS 15-415/615 12

  4. CMU SCS CMU SCS Relational Algebra Equivalences Projection Pushdown SELECT name, cid • Projections: FROM student, enrolled WHERE student.sid = – Perform them early enrolled.sid • Smaller tuples AND enrolled.grade = ‘A’ • Fewer tuples (if duplicates are eliminated) π π – Project out all attributes except the ones name, cid name, cid requested or required (e.g., joining attr.) σ ⨝ sid=sid grade=‘A’ ⨝ σ sid=sid grade=‘A’ • This is not important for a column store… student enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 13 Faloutsos/Pavlo CMU SCS 15-415/615 14 CMU SCS CMU SCS Projection Pushdown Relational Algebra Equivalences SELECT name, cid FROM student, enrolled • Joins: WHERE student.sid = – Commutative, associative enrolled.sid R ⋈ S = S ⋈ R AND enrolled.grade = ‘A’ (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) π π name, cid name, cid σ ⨝ sid=sid grade=‘A’ • Q: How many different orderings are there π ⨝ π sid, cid sid=sid for an n -way join? sid, name σ grade=‘A’ student enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 14 Faloutsos/Pavlo CMU SCS 15-415/615 15

  5. CMU SCS CMU SCS Relational Algebra Equivalences Today’s Class • Joins: How many different orderings are • History & Background there for an n-way join? • Relational Algebra Equivalences • A: Catalan number ~ 4 n • Plan Cost Estimation – Exhaustive enumeration: too slow. • Plan Enumeration • We’ll see in a second how an optimizer limits the search space... Faloutsos/Pavlo CMU SCS 15-415/615 16 Faloutsos/Pavlo CMU SCS 15-415/615 17 CMU SCS CMU SCS Cost Estimation Cost Estimation – Statistics S R • For each relation R we keep: • How long will a query take? #1 – N R → # tuples – CPU : Small cost; tough to estimate #2 #3 – S R → size of tuple in bytes – Disk : # of block transfers – V(A,R) → # of distinct values – Memory : Amount of DRAM used of attribute ‘A’ – Network : # of messages … • How many tuples will be read/written? • What statistics do we need to keep? #N R Faloutsos/Pavlo CMU SCS 15-415/615 18 Faloutsos/Pavlo CMU SCS 15-415/615 19

  6. CMU SCS CMU SCS Derivable Statistics Derivable Statistics S R • F R → max# records/block • SC(A,R) → Selection Cardinality avg# of records with A=given • B R → # blocks F R #1 → N R / V(A,R) • SC(A,R) → selection cardinality #2 • Note that this assumes data uniformity avg# of records with A=given #3 – 10,000 students, 10 colleges – how many … students in SCS? #B R Faloutsos/Pavlo CMU SCS 15-415/615 20 Faloutsos/Pavlo CMU SCS 15-415/615 21 CMU SCS CMU SCS Additional Statistics Statistics • For index i : • Where do we store them? – F i → average fanout (~50 -100) HT i • How often do we update them? – HT i → # levels of index i (~2-3) ~ log(#entries)/log( F i ) • Manual invocations: – LB i # → blocks at leaf level – Postgres/SQLite: ANALYZE – MySQL: ANALYZE TABLE Faloutsos/Pavlo CMU SCS 15-415/615 22 Faloutsos/Pavlo CMU SCS 15-415/615 23

  7. CMU SCS CMU SCS Selection Statistics Selections – Complex Predicates • Selectivity sel(P) of predicate P : • We saw simple predicates ( name=“Kayne” ) == fraction of tuples that qualify • How about more complex predicates, like – salary > 10000 • Formula depends on type of predicate. – age=30 AND jobTitle=“Costermonger” – Equality • What is their selectivity? – Range – Negation – Conjunction – Disjunction Faloutsos/Pavlo CMU SCS 15-415/615 24 Faloutsos/Pavlo CMU SCS 15-415/615 25 CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Selectivity sel(P) of predicate P : • Assume that V(rating, sailors) has 5 == fraction of tuples that qualify distinct values (0–4) and N R = 5 • Equality Predicate : A=constant • Formula depends on type of predicate. – Equality – sel(A=constant) = SC(P) / V(A,R) – Range – Example: sel(rating=‘2’) = – Negation – Conjunction – Disjunction Faloutsos/Pavlo CMU SCS 15-415/615 25 26

  8. CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Assume that V(rating, sailors) has 5 • Assume that V(rating, sailors) has 5 distinct values (0–4) and N R = 5 distinct values (0–4) and N R = 5 • Equality Predicate : A=constant • Equality Predicate : A=constant – sel(A=constant) = SC(P) / V(A,R) – sel(A=constant) = SC(P) / V(A,R) – Example: sel(rating=‘2’) = – Example: sel(rating=‘2’) = count count V(rating,R)=5 0 1 2 3 4 0 1 2 3 4 26 26 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Assume that V(rating, sailors) has 5 • Assume that V(rating, sailors) has 5 distinct values (0–4) and N R = 5 distinct values (0–4) and N R = 5 • Equality Predicate : A=constant • Equality Predicate : A=constant – sel(A=constant) = SC(P) / V(A,R) – sel(A=constant) = SC(P) / V(A,R) – Example: sel(rating=‘2’) = – Example: sel(rating=‘2’) = 1/5 SC(rating=‘2’)=1 SC(rating=‘2’)=1 count count V(rating,R)=5 V(rating,R)=5 0 1 2 3 4 0 1 2 3 4 26 26 rating rating

  9. CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Range Query: • Range Query: – sel( A>a ) = (A max – a) / (A max – A min ) – sel( A>a ) = (A max – a) / (A max – A min ) – Example: sel(rating >= ‘2’) – Example: sel(rating >= ‘2’) = (4 – 2) / (4 – 0) = 1/2 rating min = 0 rating max = 4 count count 0 1 2 3 4 0 1 2 3 4 27 27 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Negation Query • Negation Query – sel(not P ) = 1 – sel( P ) – sel(not P ) = 1 – sel( P ) – Example: sel(rating != ‘2’) – Example: sel(rating != ‘2’) SC(rating=‘2’)=1 count count 0 1 2 3 4 0 1 2 3 4 28 28 rating rating

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend