
Optimal Join Algorithms meet Top-k: Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald



  1. SIGMOD 2020 tutorial: Optimal Join Algorithms meet Top-k. Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald. Northeastern University, Boston. Part 3: Ranked Enumeration. Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details. [Figure: ranked results returned over time]

  2. Outline tutorial
     • Part 1: Top-k (Wolfgang): ~20min
     • Part 2: Optimal Join Algorithms (Mirek): ~30min
     • Part 3: Ranked enumeration over Joins (Nikolaos): ~40min
       – Ranked Enumeration
       – Top-1 Result for Path Queries
       – From Top-1 to Any-k
         • Anyk-Part
         • Anyk-Rec
       – Beyond Path Queries
       – Ranking Function
       – Open Problems

  3. Ranked Enumeration Example
     R1(A1, A2, w1): (1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)
     R2(A2, A3, w2): (0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)
     R3(A3, A4, w3): (1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)
     Query: select A1, A2, A3, A4, w1 + w2 + w3 as weight from R1, R2, R3 where R1.A2 = R2.A2 and R2.A3 = R3.A3 order by weight limit k (any-k)
     Results: Rank-1 (1, 0, 2, 3, 17), Rank-2 (2, 0, 2, 3, 18), Rank-3 (3, 0, 2, 3, 19), …
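
The example query can be reproduced with a naive baseline that materializes the whole join and then sorts it. This is only a sketch for checking the ranks on the slide: the function name `ranked_join_bruteforce` is hypothetical, and the join conditions assume the column layout R1(A1, A2, w1), R2(A2, A3, w2), R3(A3, A4, w3).

```python
from itertools import product

# Relations from the slide, one tuple per row: (left attribute, right attribute, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)

def ranked_join_bruteforce(r1, r2, r3):
    """Materialize the full 3-path join, then sort by total weight (batch, not any-k)."""
    out = []
    for (a1, a2, w1), (b2, a3, w2), (b3, a4, w3) in product(r1, r2, r3):
        if a2 == b2 and a3 == b3:  # assumed join conditions: R1.A2 = R2.A2, R2.A3 = R3.A3
            out.append((a1, a2, a3, a4, w1 + w2 + w3))
    return sorted(out, key=lambda t: t[-1])

results = ranked_join_bruteforce(R1, R2, R3)
print(results[:3])  # [(1, 0, 2, 3, 17), (2, 0, 2, 3, 18), (3, 0, 2, 3, 19)]
```

The first result only becomes available after all 24 join results have been produced and sorted, which is the behavior ranked enumeration tries to avoid.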

  4. Ranked Enumeration: Problem Definition
     “Any-k” = Anytime algorithms + Top-k
     • Most important results first (ranking function on output tuples, e.g. sum of weights)
     • All results eventually returned
     • No need to set k in advance
     RAM Cost Model:
     • TT(k) = Time-to-k-th result
     • TTF = Time-to-First = TT(1)
     • Delay
     • TTL = Time-to-Last = TT(|out|)
     [Figure: #results over time, marking TTF, Delay, and TTL]
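
The cost measures above can be observed empirically for any result iterator. The helper below is a hypothetical measurement sketch (the name `time_to_kth` and its interface are assumptions, not part of the tutorial): it records the wall-clock time until the k-th result arrives, so TTF is the entry for k = 1 and TTL is the entry for k = |out|.

```python
import time

def time_to_kth(result_iter, ks):
    """Return {k: seconds elapsed until the k-th result was produced} for each k in ks."""
    wanted = set(ks)
    timings = {}
    start = time.perf_counter()
    for count, _ in enumerate(result_iter, start=1):
        if count in wanted:
            timings[count] = time.perf_counter() - start
    return timings

# Stand-in iterator with 1000 "results"; a real any-k algorithm would be plugged in here.
timings = time_to_kth(iter(range(1000)), [1, 10, 1000])
ttf, ttl = timings[1], timings[1000]
```

The delay between two consecutive results can be estimated the same way by requesting timings for k and k + 1 and taking the difference.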

  5. Top-k, Optimal Join Algorithms, Any-k
     • Top-k: middleware cost model (# accesses); small result size, wish: O(k); return only the k best results
     • Optimal Join Algorithms: RAM cost model; return all results, wish: O(r), r > n; all results are equally important
     • Any-k: conjunctive queries; query decompositions; minimize intermediate results; ranking function; most important results first; incremental computation

  6. Resorting to other paradigms
     • Using Top-k:
       - Most top-k join algorithms can be adapted to support ranked enumeration (k is usually not a hard requirement)
       - But different cost model, huge intermediate results
     • Using (Optimal) Join Algorithms:
       - Batch computation of full output, then sort
       - Good TTL, bad TTF
     How do we push the sorting into the join?

  7. Unranked Enumeration
     Related problem: enumerate join results in no particular order
     R1(A1, A2): (1, 1), (2, 4), (3, 2)
     R2(A2, A3): (2, 1), (5, 2), (1, 3)
     [Figure: timeline with Pre-processing followed by Delay between the results (1, 1, 3) and (3, 2, 1)]
     What if we have projections? [Bagan+ 07]: “Free-connex” acyclic queries
     • Linear pre-processing
     • Constant delay
     [Bagan+ 07] Bagan, Durand, Grandjean. On acyclic conjunctive queries and constant delay enumeration. CSL'07 https://doi.org/10.1007/978-3-540-74915-8_18
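
For the two-relation example on this slide, linear pre-processing followed by constant-delay enumeration can be sketched as follows. This is a minimal illustration for a single join without projections, not the general free-connex construction of [Bagan+ 07]; the names `preprocess` and `enumerate_join` are hypothetical.

```python
from collections import defaultdict

R1 = [(1, 1), (2, 4), (3, 2)]   # (A1, A2)
R2 = [(2, 1), (5, 2), (1, 3)]   # (A2, A3)

def preprocess(r1, r2):
    """Linear time: hash R2 on A2 and drop R1 tuples without a join partner."""
    index = defaultdict(list)
    for a2, a3 in r2:
        index[a2].append(a3)
    alive = [(a1, a2) for a1, a2 in r1 if a2 in index]
    return alive, index

def enumerate_join(alive, index):
    """Every surviving R1 tuple has at least one partner, so no step is wasted."""
    for a1, a2 in alive:
        for a3 in index[a2]:
            yield (a1, a2, a3)

alive, index = preprocess(R1, R2)
print(list(enumerate_join(alive, index)))  # [(1, 1, 3), (3, 2, 1)]
```

Removing dangling tuples up front is what keeps the delay constant: the enumeration loop never visits a tuple that produces no output.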

  8. Unranked Enumeration vs Ranked Enumeration
     Challenge: return the output tuples in the right order
     R1(A1, A2): (1, 1), (2, 4), (3, 2)
     R2(A2, A3): (2, 1), (5, 2), (1, 3)
     [Figure: Pre-processing followed by the results (1, 1, 3) vs ? (3, 2, 1)]
     Our focus: ranking, no projections

  9. Conceptual Roadmap
     Columns: Paths/Serial | Cyclic/General
     Join Problems: Top-1 Path Queries | Top-1 Conjunctive Queries
     Optimization: DP | Union of Tree-DP (UT-DP)
     Ranked Enumeration: Any-k DP | Any-k UT-DP
     Tropical semiring (min, +); Any-k UT-DP over selective dioids

  10. Outline tutorial (repeated; see slide 2), overlaid with the roadmap labels: Top-1 Path Queries, Top-1 Conjunctive Queries, DP, Union of Tree-DP (UT-DP), Any-k DP, Any-k UT-DP, Any-k UT-DP over selective dioids

  11. Top-1 result
      • Idea: Modify the bottom-up phase of Yannakakis to propagate the minimum weight
        - (min, +) operators in each step
        - Top-1 result can be constructed with one top-down traversal

  12. Top-1 result: Example
      R1(A1, A2, w1): (1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)
      R2(A2, A3, w2): (0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)
      R3(A3, A4, w3): (1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)

  13. Top-1 result: Example
      Nodes = Tuples, Edges = Joining pairs, Labels = Weights
      [Graph: R1 nodes 1, 2, 3, 4; R2 nodes 5, 7, 8, 6; R3 nodes 20, 40, 10, 30]

  14. Top-1 result: Example
      Bottom-up
      [Graph: R1 nodes 1, 2, 3, 4; R2 nodes 5, 7, 8, 6; R3 nodes 20, 40, 10, 30; every node initialized to ∞]

  15. Top-1 result: Example
      Each node passes on the minimum total weight it can reach
      [Graph: R3 nodes now hold 20, 40, 10, 30; R1 and R2 nodes still hold ∞]

  16. Top-1 result: Example
      Each node passes on the minimum total weight it can reach
      min(20, 40) + 5 = 25
      [Graph: R2 nodes 5, 7, 8, 6 now hold 25, 27, 28, 16; R3 nodes hold 20, 40, 10, 30; R1 nodes still hold ∞]

  17. Top-1 result: Example
      Each node passes on the minimum total weight it can reach
      [Graph: R1 nodes 1, 2, 3, 4 now hold 17, 18, 19, ∞; R2 nodes hold 25, 27, 28, 16; R3 nodes hold 20, 40, 10, 30]

  18. Top-1 result: Example
      Each node passes on the minimum total weight it can reach
      [Graph: R1 nodes hold 17, 18, 19, ∞; R2 nodes hold 25, 27, 28, 16; R3 nodes hold 20, 40, 10, 30; min over R1 = 17]
      Minimum result weight = 17

  19. Top-1 result: Example
      Top-down for Top-1 result: follow the winning edges
      [Graph: R1 nodes 1, 2, 3, 4; R2 nodes 5, 7, 8, 6; R3 nodes 20, 40, 10, 30]
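
The bottom-up and top-down passes of this example can be sketched in a few lines. The code below is an illustrative sketch, not the tutorial's implementation: `bottom_up` and `top_down` are hypothetical names, tuples are assumed to be stored as (left join value, right join value, weight), and successors are found by a simple nested loop (so this version examines every pair of tuples in adjacent stages).

```python
import math

R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)

def bottom_up(stages):
    """best[i][j]: minimum total weight reachable from tuple j of stage i.
    choice[i][j]: index of the winning successor in stage i + 1 (None if no successor)."""
    best = [[math.inf] * len(s) for s in stages]
    choice = [[None] * len(s) for s in stages]
    last = len(stages) - 1
    for j, (_, _, w) in enumerate(stages[last]):
        best[last][j] = w
    for i in range(last - 1, -1, -1):
        for j, (_, right, w) in enumerate(stages[i]):
            for k, (left, _, _) in enumerate(stages[i + 1]):
                # (min, +): add own weight to the cheapest joining successor
                if right == left and w + best[i + 1][k] < best[i][j]:
                    best[i][j] = w + best[i + 1][k]
                    choice[i][j] = k
    return best, choice

def top_down(stages, best, choice):
    """Follow the winning edges from the best first-stage tuple; None if the join is empty."""
    j = min(range(len(stages[0])), key=lambda idx: best[0][idx])
    if best[0][j] == math.inf:
        return None
    path = []
    for i in range(len(stages)):
        path.append(stages[i][j])
        j = choice[i][j]
    return path

stages = [R1, R2, R3]
best, choice = bottom_up(stages)
print(best[0])                          # [17, 18, 19, inf]
print(top_down(stages, best, choice))   # [(1, 0, 1), (0, 2, 6), (2, 3, 10)]
```

The intermediate values match the slides: stage R2 holds 25, 27, 28, 16 and stage R1 holds 17, 18, 19, ∞.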

  20. Top-1 result & DP
      Rank-1 algorithm for path queries = (Serial) Dynamic Programming
      Subproblem: minimum achievable weight starting from r_i ∈ R_i
      Overlapping Subproblems: e.g. the subproblem from tuple “1” and the subproblem from tuple “5”
      [Graph: R1 nodes 1, 2, 3, 4; R2 nodes 5, 7, 8, 6; R3 nodes 20, 40, 10, 30]

  21. Top-1 result & DP
      Rank-1 algorithm for path queries = (Serial) Dynamic Programming
      • Relations = Stages (Independent problems)
      • Nodes = States (Subproblems)
      • Edges = Decisions (Dependencies)
      Principle of Optimality: An optimal solution must contain optimal solutions (to subproblems)
      [Graph: R1 nodes 1, 2, 3, 4; R2 nodes 5, 7, 8, 6; R3 nodes 20, 40, 10, 30]
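
The principle of optimality can be written as a recurrence over the stages. The notation here is an assumption made for illustration: π(r) denotes the minimum achievable weight starting from tuple r, w(r) is the weight of r, ℓ is the number of stages, and the minimum over an empty set is taken to be ∞.

```latex
\pi(r_\ell) = w(r_\ell), \qquad
\pi(r_i) = w(r_i) + \min_{\substack{r_{i+1} \in R_{i+1} \\ r_{i+1} \text{ joins with } r_i}} \pi(r_{i+1}),
\quad 1 \le i < \ell
```

For the tuple with weight 5 in the example, this gives π = 5 + min(20, 40) = 25, and the minimum result weight is the minimum of π over the first stage, min(17, 18, 19, ∞) = 17.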

  22. DP Equi-join State Space
      [Graph: ℓ stages R1, R2, R3 with n tuples each; 3 × 4 edges between R1 and R2; 3 × 2 + 1 × 2 edges between R2 and R3]
      Total time = #Edges = O(n² ℓ)

  23. DP Equi-join State Space
      Equivalent to the “messages” of Yannakakis
      Transform the state space (at most one incoming/outgoing edge per tuple)
      [Graph: ℓ stages R1, R2, R3 with n tuples each; edge counts 3, 4, 4, 4 through the intermediate nodes]
      Total time = #Edges = O(n ℓ), linear in the size of the database
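
The edge counts on the last two slides can be checked directly. The sketch below uses hypothetical helper names (`naive_edges`, `factorized_edges`) and assumes tuples are stored as (left join value, right join value, weight); it counts one edge per joining pair in the original state space, and one edge per tuple into or out of a shared join-value node in the transformed one.

```python
from collections import Counter

R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)

def naive_edges(left, right):
    """Number of joining pairs between two adjacent stages (can be quadratic)."""
    right_counts = Counter(l for l, _, _ in right)
    return sum(right_counts[r] for _, r, _ in left)

def factorized_edges(left, right):
    """(incoming, outgoing) edges of the intermediate join-value nodes (at most linear)."""
    shared = {r for _, r, _ in left} & {l for l, _, _ in right}
    incoming = sum(1 for _, r, _ in left if r in shared)
    outgoing = sum(1 for l, _, _ in right if l in shared)
    return incoming, outgoing

print(naive_edges(R1, R2), naive_edges(R2, R3))            # 12 8
print(factorized_edges(R1, R2), factorized_edges(R2, R3))  # (3, 4) (4, 4)
```

On this small instance the saving is modest (20 edges versus 15), but with n tuples per stage sharing one join value the first count grows as n² while the second stays at 2n.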

  24. Connection to Factorized Databases
      [Graph: R1 nodes 1, 2, 3, 4; R2 nodes 5, 7, 8, 6; R3 nodes 20, 40, 10, 30; intermediate node A2 = 0 between R1 and R2; intermediate nodes A3 = 1 and A3 = 2 between R2 and R3]
      [Olteanu+ 16]: Conditional independence of the non-joining attributes given the joining attribute value
      [Olteanu+ 16] Olteanu, Schleich. Factorized databases. SIGMOD Record'16 https://doi.org/10.1145/3003665.3003667

  25. Outline tutorial (repeated; see slide 2), overlaid with the roadmap labels: Top-1 Path Queries, Top-1 Conjunctive Queries, DP, Union of Tree-DP (UT-DP), Any-k DP, Any-k UT-DP, Any-k UT-DP over selective dioids

  26. DP as a Shortest Path Problem
      • DP computation equivalent to finding the shortest path in a graph
      [Graph: source node s; stage nodes 1, 2, 3, 4; 5, 7, 8, 6; 20, 40, 10, 30; terminal node t]
      Note: We ignore the artificial intermediate nodes for simplicity

  27. K-Shortest Paths
      • How do we find the k-th best solution to a DP problem?
        - Rank-1 DP solution => shortest path
        - Rank-k DP solution => k-th shortest path
      [Graph: source node s; stage nodes 1, 2, 3, 4; 5, 7, 8, 6; 20, 40, 10, 30; terminal node t; highlighted: Shortest Path (17), 2nd Shortest Path (26)]

  28. K-Shortest Paths
      • Two major approaches for computing the k-th shortest path in a directed acyclic multi-stage graph
      • Anyk-Part
        - Partition the solution space
      • Anyk-Rec
        - Recursively compute the lower-rank paths from all nodes (suffixes)
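
To make ranked enumeration over the multi-stage graph concrete, here is a deliberately simple sketch. It is neither Anyk-Part nor Anyk-Rec as presented in the tutorial: it is a plain best-first expansion of path prefixes, where each prefix is prioritized by its weight plus the exact minimum weight of any completion (the bottom-up DP values). The function names are hypothetical and tuples are assumed to be (left join value, right join value, weight).

```python
import heapq
import math
from collections import defaultdict

R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)

def suffix_minima(stages):
    """best[i][j]: minimum weight of any path suffix starting at tuple j of stage i."""
    best = [[math.inf] * len(s) for s in stages]
    for j, (_, _, w) in enumerate(stages[-1]):
        best[-1][j] = w
    for i in range(len(stages) - 2, -1, -1):
        cheapest = defaultdict(lambda: math.inf)  # join value -> cheapest successor
        for k, (left, _, _) in enumerate(stages[i + 1]):
            cheapest[left] = min(cheapest[left], best[i + 1][k])
        for j, (_, right, w) in enumerate(stages[i]):
            best[i][j] = w + cheapest[right]
    return best

def ranked_paths(stages):
    """Yield (path, weight) in non-decreasing weight order, without fixing k in advance."""
    best = suffix_minima(stages)
    # Heap entries: (weight of best completion, prefix as tuple indices, prefix weight).
    heap = [(best[0][j], (j,), w)
            for j, (_, _, w) in enumerate(stages[0]) if best[0][j] < math.inf]
    heapq.heapify(heap)
    while heap:
        _, prefix, weight = heapq.heappop(heap)
        i = len(prefix) - 1
        if i == len(stages) - 1:  # complete path: its bound equals its true weight
            yield [stages[s][j] for s, j in enumerate(prefix)], weight
            continue
        right = stages[i][prefix[-1]][1]
        for k, (left, _, w) in enumerate(stages[i + 1]):
            if left == right and best[i + 1][k] < math.inf:
                heapq.heappush(heap, (weight + best[i + 1][k], prefix + (k,), weight + w))

all_results = list(ranked_paths([R1, R2, R3]))
print([weight for _, weight in all_results[:4]])  # [17, 18, 19, 26]
```

Because the priorities are exact lower bounds, the first complete path popped from the heap is the shortest path, and later ones follow in rank order. This sketch makes no attempt to bound the delay between results.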

  29. Outline tutorial (repeated; see slide 2), overlaid with the roadmap labels: Top-1 Path Queries, Top-1 Conjunctive Queries, DP, Union of Tree-DP (UT-DP), Any-k DP, Any-k UT-DP, Any-k UT-DP over selective dioids
