optimal join algorithms meet top

Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang - PowerPoint PPT Presentation

SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 2 : Optimal Join Algorithms Time Slides:


  1. SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- 𝑙 Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 2 : Optimal Join Algorithms Time Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/for details 1

  2. Outline tutorial β€’ Part 1: Top- 𝑙 (Wolfgang): ~20min β€’ Part 2: Optimal Join Algorithms (Mirek): ~30min – Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min 2

  3. Basic Terminology and Assumptions β€’ Terminology - Full conjunctive query (CQ) β€’ Natural join of π‘š relations with O(π‘œ) tuples each β€’ E.g.: 𝑅 𝐡 1 , 𝐡 2 , 𝐡 3 , 𝐡 4 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 β€’ Any selections comparing attributes to constants, e.g., 𝐡 4 < 1 - Query size: O(π‘š) - Output cardinality: 𝑠 β€’ Assumptions - No pre-computed data structures such as indexes, sorted representation, materialized views 3

  4. Complexity Notation β€’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation β€’ Common practice: focus on data complexity - We care about scalability in data size β€’ Treat query size π‘š as a constant - E.g., O 𝑔 π‘š β‹… π‘œ 𝑔(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠 simplifies to O π‘œ 𝑔(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠 4

  5. Complexity Notation β€’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation β€’ Common practice: focus on data complexity - We care about scalability in data size β€’ Treat query size π‘š as a constant - E.g., O 𝑔 π‘š β‹… π‘œ 𝑔(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠 simplifies to O π‘œ 𝑔(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠 β€’ We mostly use ΰ·© O -notation (soft-O) data complexity - Abstracts away polylog factors in input size that clutter formulas - E.g., O π‘œ 𝑔(π‘š) + logπ‘œ 𝑔(π‘š) β‹… 𝑠 further simplifies to ΰ·© O π‘œ 𝑔(π‘š) + 𝑠 5

  6. Outline tutorial β€’ Part 1: Top- 𝑙 (Wolfgang): ~20min β€’ Part 2: Optimal Join Algorithms (Mirek): ~30min – Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min 6

  7. Lower Bound for Any Query β€’ Need to read entire input at least once: Ξ©(π‘šπ‘œ) - Ξ©(π‘œ) data complexity β€’ Need to output every result, each of size π‘š : Ξ©(π‘šπ‘ ) - Ξ©(𝑠) data complexity β€’ Together: Ξ©(π‘œ + 𝑠) time complexity to compute any CQ β€’ Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs - Time complexity ΰ·© O π‘œ + 𝑠 7

  8. Yannakakis Algorithm β€’ Given: acyclic conjunctive query Q as a rooted join tree β€’ Step 1: semi-join reduction (two sweeps) - Semi-join reduction sweep from the leaves to root - Semi-join reduction sweep from root to the leaves β€’ Step 2: use the join tree as the query plan - Compute the joins bottom up, with early projections [Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDB’81] https://dl.acm.org/doi/10.5555/1286831.1286840 8

  9. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 4 60 𝐡 1 , 𝐡 2 𝑆 2 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200 𝐡 2 𝐡 1 , 𝐡 2 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 9

  10. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 4 60 𝐡 1 , 𝐡 2 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 𝑺 πŸ‘ ⋉ 𝑺 πŸ“ 1 20 100 3 10 300 1 40 300 2 30 200 𝐡 2 𝐡 1 , 𝐡 2 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 10

  11. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 4 60 𝐡 1 , 𝐡 2 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 𝑺 πŸ‘ ⋉ 𝑺 πŸ“ 1 20 100 3 10 300 𝑩 𝟐 𝑩 πŸ‘ 1 40 300 2 30 200 1 10 1 20 𝐡 2 2 20 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 11

  12. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 4 60 𝐡 1 , 𝐡 2 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 𝑺 πŸ‘ ⋉ 𝑺 πŸ“ 1 20 100 3 10 300 𝑩 𝟐 𝑩 πŸ‘ 1 40 300 2 30 200 1 10 1 20 𝐡 2 2 20 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 12

  13. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 4 60 𝐡 1 , 𝐡 2 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 𝑺 πŸ‘ ⋉ 𝑺 πŸ’ 𝑺 πŸ‘ ⋉ 𝑺 πŸ“ 1 20 100 3 10 300 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 13

  14. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 𝑺 𝟐 ⋉ 𝑺 πŸ‘ 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 4 60 𝑩 𝟐 𝑩 πŸ‘ 1 10 1 20 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 𝑺 πŸ‘ ⋉ 𝑺 πŸ’ 𝑺 πŸ‘ ⋉ 𝑺 πŸ“ 1 20 100 3 10 300 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 14

  15. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 4 60 𝑺 πŸ‘ ⋉ 𝑺 𝟐 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 3 10 300 1 40 300 2 30 200 𝑺 πŸ’ ⋉ 𝑺 πŸ‘ 𝑺 πŸ“ ⋉ 𝑺 πŸ‘ 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 15

  16. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 1 20 2000 16

  17. Yannakakis Algorithm – Example 𝑆 1 𝑩 𝟐 𝑩 πŸ‘ Database Theory 1 20 1 10 𝑅 = 𝑆 1 𝐡 1 , 𝐡 2 β‹ˆ 𝑆 2 𝐡 1 , 𝐡 2 , 𝐡 3 β‹ˆ 𝑆 3 𝐡 2 β‹ˆ 𝑆 4 𝐡 1 , 𝐡 2 , 𝐡 4 Bottom-up traversal (semi-joins) 𝑆 2 1. 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ’ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa! 𝑆 3 𝑆 4 𝑩 πŸ‘ 𝑩 𝟐 𝑩 πŸ‘ 𝑩 πŸ“ 10 1 10 1000 20 1 20 1000 1 20 2000 17

  18. Yannakakis Algorithm – Properties β€’ Semi-join sweeps take ΰ·© O π‘œ β€’ A join step can never shrink intermediate result size - This does not hold for all trees - Tree must be attribute-connected (more on this soon) β€’ Hence all intermediate results are of size O 𝑠 β€’ Each join step therefore has O π‘œ + 𝑠 input and O 𝑠 output β€’ Easy to compute a binary join with O π‘œ + 𝑠 input and O 𝑠 output in time ΰ·© O π‘œ + 𝑠 , e.g., using sort-merge join 18

  19. Outline tutorial β€’ Part 1: Top- 𝑙 (Wolfgang): ~20min β€’ Part 2: Optimal Join Algorithms (Mirek): ~30min – Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min 19

  20. CQs with Cycles β€’ 3-path: 𝑅 3π‘ž = 𝑆 1 (𝐡 1 , 𝐡 2 ) β‹ˆ 𝑆 2 (𝐡 2 , 𝐡 3 ) β‹ˆ 𝑆 3 (𝐡 3 , 𝐡 4 ) β€’ 3-cycle: 𝑅 3𝑑 = 𝑆 1 (𝐡 1 , 𝐡 2 ) β‹ˆ 𝑆 2 (𝐡 2 , 𝐡 3 ) β‹ˆ 𝑆 3 (𝐡 3 , 𝐡 1 ) 𝑅 3π‘ž 𝑅 3𝑑 𝐡 3 𝐡 3 𝐡 2 𝐡 1 𝐡 2 𝐡 3 𝐡 1 𝐡 2 𝐡 3 𝐡 1 𝐡 2 𝐡 4 20

Recommend


More recommend