12
play

12 Part I Intro to Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Execution 12 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Carnegie Mellon University Fall 2019 2 ADM IN ISTRIVIA Homework #3 is due Wed Oct 9 th @ 11:59pm Mid-Term Exam is Wed Oct 16th @ 12:00pm


  1. Query Execution 12 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Carnegie Mellon University Fall 2019

  2. 2 ADM IN ISTRIVIA Homework #3 is due Wed Oct 9 th @ 11:59pm Mid-Term Exam is Wed Oct 16th @ 12:00pm Project #2 is due Sun Oct 20 th @ 11:59pm CMU 15-445/645 (Fall 2019)

  3. 3 Q UERY PLAN SELECT R.id, S.cdate The operators are arranged in a tree. FROM R JOIN S ON R.id = S.id Data flows from the leaves of the tree WHERE S.value > 100 p up towards the root. R.id, S.value The output of the root node is the ⨝ R.id=S.id result of the query. s value>100 R S CMU 15-445/645 (Fall 2019)

  4. 4 TO DAY'S AGEN DA Processing Models Access Methods Expression Evaluation CMU 15-445/645 (Fall 2019)

  5. 5 PRO CESSIN G M O DEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for different workloads. Approach #1: Iterator Model Approach #2: Materialization Model Approach #3: Vectorized / Batch Model CMU 15-445/645 (Fall 2019)

  6. 6 ITERATO R M O DEL Each query plan operator implements a Next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Also called Volcano or Pipeline Model. CMU 15-445/645 (Fall 2019)

  7. 7 ITERATO R M O DEL SELECT R.id, S.cdate Next() for t in child.Next() : FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 Next() for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id Next() for t in child.Next() : s if evalPred (t): emit (t) value>100 Next() Next() for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

  8. 7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

  9. 7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

  10. 7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : Single Tuple s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

  11. 7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 5 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

  12. 7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 5 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)

  13. 8 ITERATO R M O DEL This is used in almost every DBMS. Allows for tuple pipelining . Some operators have to block until their children emit all of their tuples. → Joins, Subqueries, Order By Output control works easily with this approach. CMU 15-445/645 (Fall 2019)

  14. 9 M ATERIALIZATIO N M O DEL Each operator processes its input all at once and then emits its output all at once. → The operator "materializes" its output as a single result. → The DBMS can push down hints into to avoid scanning too many tuples. → Can send either a materialized row or a single column. The output can be either whole tuples (NSM) or subsets of columns (DSM) CMU 15-445/645 (Fall 2019)

  15. 10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)

  16. 10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s if evalPred (t): out.add (t) All Tuples value>100 return out out = [ ] out = [ ] R S 3 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)

  17. 10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s 4 if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S 3 5 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)

  18. 10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s 4 if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S 3 5 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)

  19. 11 M ATERIALIZATIO N M O DEL Better for OLTP workloads because queries only access a small number of tuples at a time. → Lower execution / coordination overhead. → Fewer function calls. Not good for OLAP queries with large intermediate results. CMU 15-445/645 (Fall 2019)

  20. 12 VECTO RIZATIO N M O DEL Like the Iterator Model where each operator implements a Next function in this model. Each operator emits a batch of tuples instead of a single tuple. → The operator's internal loop processes multiple tuples at a time. → The size of the batch can vary based on hardware or query properties. CMU 15-445/645 (Fall 2019)

  21. 13 VECTO RIZATIO N M O DEL out = [ ] 1 for t in child.Next() : SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S if | out |> n : emit ( out ) ON R.id = S.id out = [ ] 2 WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ R.id=S.id out = [ ] for t in child.Next() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 Tuple Batch out = [ ] out = [ ] 3 R S for t in R: for t in S: out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2019)

  22. 13 VECTO RIZATIO N M O DEL out = [ ] 1 for t in child.Next() : SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S if | out |> n : emit ( out ) ON R.id = S.id out = [ ] 2 WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ R.id=S.id out = [ ] 4 for t in child.Next() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 Tuple Batch out = [ ] out = [ ] 3 5 R S for t in R: for t in S: out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2019)

  23. 14 VECTO RIZATIO N M O DEL Ideal for OLAP queries because it greatly reduces the number of invocations per operator. Allows for operators to use vectorized (SIMD) instructions to process batches of tuples. CMU 15-445/645 (Fall 2019)

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend