query processing
play

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 ADM IN ISTRIVIA Project #2 Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on


  1. Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018

  2. 2 ADM IN ISTRIVIA Project #2 – Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on Wednesday October 17 th (in class) CMU 15-445/645 (Fall 2018)

  3. 3 UPCO M IN G DATABASE EVEN TS SQream DB Tech Talk → Thursday Oct 4 th @ 12:00pm → CIC 4 th Floor CMU 15-445/645 (Fall 2018)

  4. 4 Q UERY PLAN SELECT A.id, B.value The operators are arranged in a tree. FROM A, B Data flows from the leaves toward the WHERE A.id = B.id AND B.value > 100 root. p A.id, B.value The output of the root node is the result of the query. ⨝ A.id=B.id s value>100 A B CMU 15-445/645 (Fall 2018)

  5. 5 TO DAY'S AGEN DA Processing Models Access Methods Expression Evaluation CMU 15-445/645 (Fall 2018)

  6. 6 PRO CESSIN G M O DEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for different workloads. Three approaches: → Iterator Model → Materialization Model → Vectorized / Batch Model CMU 15-445/645 (Fall 2018)

  7. 7 ITERATO R M O DEL Each query plan operator implements a next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Top-down plan processing. Also called Volcano or Pipeline Model. CMU 15-445/645 (Fall 2018)

  8. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  9. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  10. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  11. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 5 emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  12. 9 ITERATO R M O DEL This is used in almost every DBMS. Allows for tuple pipelining . Some operators will block until children emit all of their tuples. → Joins, Subqueries, Order By Output control works easily with this approach. → Limit CMU 15-445/645 (Fall 2018)

  13. 10 M ATERIALIZATIO N M O DEL Each operator processes its input all at once and then emits its output all at once. → The operator "materializes" it output as a single result. → The DBMS can push down hints into to avoid scanning too many tuples. Bottom-up plan processing. CMU 15-445/645 (Fall 2018)

  14. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  15. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  16. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } 3 for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 2 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  17. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } 5 FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p 4 for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } 3 for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 2 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  18. 12 M ATERIALIZATIO N M O DEL Better for OLTP workloads because queries typically only access a small number of tuples at a time. → Lower execution / coordination overhead. Not good for OLAP queries with large intermediate results. CMU 15-445/645 (Fall 2018)

  19. 13 VECTO RIZATIO N M O DEL Like Iterator Model, each operator implements a next function. Each operator emits a batch of tuples instead of a single tuple. → The operator's internal loop processes multiple tuples at a time. → The size of the batch can vary based on hardware or query properties. CMU 15-445/645 (Fall 2018)

  20. 14 VECTO RIZATIO N M O DEL out = { } 1 for t in child.Output() : SELECT A.id, B.value out.add ( projection (t)) FROM A, B if | out |> n : emit ( out ) WHERE A.id = B.id out = { } AND B.value > 100 2 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : A.id, B.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 out = { } out = { } A B 3 for t in A : for t in B : out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2018)

  21. 14 VECTO RIZATIO N M O DEL out = { } 1 for t in child.Output() : SELECT A.id, B.value out.add ( projection (t)) FROM A, B if | out |> n : emit ( out ) WHERE A.id = B.id out = { } AND B.value > 100 2 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : A.id, B.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ A.id=B.id out = { } 4 for t in child.Output() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 out = { } out = { } A B 3 5 for t in A : for t in B : out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2018)

  22. 15 VECTO RIZATIO N M O DEL Ideal for OLAP queries → Greatly reduces the number of invocations per operator. → Allows for operators to use vectorized (SIMD) instructions to process batches of tuples. CMU 15-445/645 (Fall 2018)

  23. 16 PRO CESSIN G M O DELS SUM M ARY Iterator / Volcano → Direction: Top-Down → Emits: Single Tuple → Target: General Purpose Vectorized Materialization → Direction: Top-Down → Direction: Bottom-Up → Emits: Tuple Batch → Emits: Entire Tuple Set → Target: OLAP → Target: OLTP CMU 15-445/645 (Fall 2018)

  24. 17 ACCESS M ETH O DS SELECT A.id, B.value An access method is a way that the FROM A, B DBMS can access the data stored in a WHERE A.id = B.id AND B.value > 100 table. p → Not defined in relational algebra. A.id, B.value Three basic approaches: ⨝ A.id=B.id → Sequential Scan s → Index Scan → Multi-Index / "Bitmap" Scan value>100 A B CMU 15-445/645 (Fall 2018)

  25. 18 SEQ UEN TIAL SCAN For each page in the table: for page in table.pages: → Retrieve it from the buffer pool. for t in page.tuples: → Iterate over each tuple and check whether if evalPred (t): to include it. // Do Something! The DBMS maintains an internal cursor that tracks the last page / slot it examined. CMU 15-445/645 (Fall 2018)

  26. 19 SEQ UEN TIAL SCAN : O PTIM IZATIO N S This is almost always the worst thing that the DBMS can do to execute a query. Sequential Scan Optimizations: → Prefetching → Parallelization → Buffer Pool Bypass → Zone Maps → Late Materialization → Heap Clustering CMU 15-445/645 (Fall 2018)

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend