SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- π Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 2 : Optimal Join Algorithms Time Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/for details 1
Outline tutorial β’ Part 1: Top- π (Wolfgang): ~20min β’ Part 2: Optimal Join Algorithms (Mirek): ~30min β Lower Bound and the Yannakakis Algorithm β Problems Caused by Cycles β Tree Decompositions β Summary and Further Reading β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min 2
Basic Terminology and Assumptions β’ Terminology - Full conjunctive query (CQ) β’ Natural join of π relations with O(π) tuples each β’ E.g.: π π΅ 1 , π΅ 2 , π΅ 3 , π΅ 4 = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 β’ Any selections comparing attributes to constants, e.g., π΅ 4 < 1 - Query size: O(π) - Output cardinality: π β’ Assumptions - No pre-computed data structures such as indexes, sorted representation, materialized views 3
Complexity Notation β’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation β’ Common practice: focus on data complexity - We care about scalability in data size β’ Treat query size π as a constant - E.g., O π π β π π(π) + log π π(π) β π simplifies to O π π(π) + log π π(π) β π 4
Complexity Notation β’ Standard O and Ξ© notation for time and memory complexity in the RAM model of computation β’ Common practice: focus on data complexity - We care about scalability in data size β’ Treat query size π as a constant - E.g., O π π β π π(π) + log π π(π) β π simplifies to O π π(π) + log π π(π) β π β’ We mostly use ΰ·© O -notation (soft-O) data complexity - Abstracts away polylog factors in input size that clutter formulas - E.g., O π π(π) + logπ π(π) β π further simplifies to ΰ·© O π π(π) + π 5
Outline tutorial β’ Part 1: Top- π (Wolfgang): ~20min β’ Part 2: Optimal Join Algorithms (Mirek): ~30min β Lower Bound and the Yannakakis Algorithm β Problems Caused by Cycles β Tree Decompositions β Summary and Further Reading β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min 6
Lower Bound for Any Query β’ Need to read entire input at least once: Ξ©(ππ) - Ξ©(π) data complexity β’ Need to output every result, each of size π : Ξ©(ππ ) - Ξ©(π ) data complexity β’ Together: Ξ©(π + π ) time complexity to compute any CQ β’ Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs - Time complexity ΰ·© O π + π 7
Yannakakis Algorithm β’ Given: acyclic conjunctive query Q as a rooted join tree β’ Step 1: semi-join reduction (two sweeps) - Semi-join reduction sweep from the leaves to root - Semi-join reduction sweep from root to the leaves β’ Step 2: use the join tree as the query plan - Compute the joins bottom up, with early projections [Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDBβ81] https://dl.acm.org/doi/10.5555/1286831.1286840 8
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 4 60 π΅ 1 , π΅ 2 π 2 π© π π© π π© π 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200 π΅ 2 π΅ 1 , π΅ 2 π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 9
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 4 60 π΅ 1 , π΅ 2 Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 πΊ π β πΊ π 1 20 100 3 10 300 1 40 300 2 30 200 π΅ 2 π΅ 1 , π΅ 2 π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 10
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 4 60 π΅ 1 , π΅ 2 Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 πΊ π β πΊ π 1 20 100 3 10 300 π© π π© π 1 40 300 2 30 200 1 10 1 20 π΅ 2 2 20 π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 11
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 4 60 π΅ 1 , π΅ 2 Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 πΊ π β πΊ π 1 20 100 3 10 300 π© π π© π 1 40 300 2 30 200 1 10 1 20 π΅ 2 2 20 π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 12
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 4 60 π΅ 1 , π΅ 2 Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 πΊ π β πΊ π πΊ π β πΊ π 1 20 100 3 10 300 π© π π© π π© π 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 13
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory πΊ π β πΊ π 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 4 60 π© π π© π 1 10 1 20 Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 πΊ π β πΊ π πΊ π β πΊ π 1 20 100 3 10 300 π© π π© π π© π 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 14
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 4 60 πΊ π β πΊ π Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 3 10 300 1 40 300 2 30 200 πΊ π β πΊ π πΊ π β πΊ π π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 15
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 1 20 2000 16
Yannakakis Algorithm β Example π 1 π© π π© π Database Theory 1 20 1 10 π = π 1 π΅ 1 , π΅ 2 β π 2 π΅ 1 , π΅ 2 , π΅ 3 β π 3 π΅ 2 β π 4 π΅ 1 , π΅ 2 , π΅ 4 Bottom-up traversal (semi-joins) π 2 1. π© π π© π π© π 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa! π 3 π 4 π© π π© π π© π π© π 10 1 10 1000 20 1 20 1000 1 20 2000 17
Yannakakis Algorithm β Properties β’ Semi-join sweeps take ΰ·© O π β’ A join step can never shrink intermediate result size - This does not hold for all trees - Tree must be attribute-connected (more on this soon) β’ Hence all intermediate results are of size O π β’ Each join step therefore has O π + π input and O π output β’ Easy to compute a binary join with O π + π input and O π output in time ΰ·© O π + π , e.g., using sort-merge join 18
Outline tutorial β’ Part 1: Top- π (Wolfgang): ~20min β’ Part 2: Optimal Join Algorithms (Mirek): ~30min β Lower Bound and the Yannakakis Algorithm β Problems Caused by Cycles β Tree Decompositions β Summary and Further Reading β’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min 19
CQs with Cycles β’ 3-path: π 3π = π 1 (π΅ 1 , π΅ 2 ) β π 2 (π΅ 2 , π΅ 3 ) β π 3 (π΅ 3 , π΅ 4 ) β’ 3-cycle: π 3π = π 1 (π΅ 1 , π΅ 2 ) β π 2 (π΅ 2 , π΅ 3 ) β π 3 (π΅ 3 , π΅ 1 ) π 3π π 3π π΅ 3 π΅ 3 π΅ 2 π΅ 1 π΅ 2 π΅ 3 π΅ 1 π΅ 2 π΅ 3 π΅ 1 π΅ 2 π΅ 4 20
Recommend
More recommend