Optimal Join Algorithms meet Top-k

SIGMOD 2020 tutorial
Optimal Join Algorithms meet Top-k
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Northeastern University, Boston
Part 3: Ranked Enumeration
Slides: https://northeastern-datalab.github.io/topk-join-tutorial/
DOI: https://doi.org/10.1145/3318464.3383132
Data Lab: https://db.khoury.northeastern.edu
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details.

Outline tutorial
• Part 1: Top-k (Wolfgang): ~20min
• Part 2: Optimal Join Algorithms (Mirek): ~30min
• Part 3: Ranked enumeration over Joins (Nikolaos): ~40min
  – Ranked Enumeration
  – Top-1 Result for Path Queries
  – From Top-1 to Any-k
    • Anyk-Part
    • Anyk-Rec
  – Beyond Path Queries
  – Ranking Function
  – Open Problems

Ranked Enumeration Example

select A1, A2, A3, A4, w1 + w2 + w3 as weight
from R1, R2, R3
where R1.A2 = R2.A2 and R2.A3 = R3.A3
order by weight
limit k (any-k)

R1            R2            R3
A1  A2  w1    A2  A3  w2    A3  A4  w3
1   0   1     0   1   5     1   1   20
2   0   2     0   1   7     1   2   40
3   0   3     0   1   8     2   3   10
4   1   4     0   2   6     2   4   30

Rank-1: (1, 0, 2, 3, 17)
Rank-2: (2, 0, 2, 3, 18)
Rank-3: (3, 0, 2, 3, 19)
…

Ranked Enumeration: Problem Definition
"Any-k" = Anytime algorithms + Top-k
• Most important results first (ranking function on output tuples, e.g. sum of weights)
• All results eventually returned
• No need to set k in advance
RAM Cost Model:
• TT(k) = Time-to-k-th result
• TTF = Time-to-First = TT(1)
• Delay
• TTL = Time-to-Last = TT(|out|)
[Figure: #results over time, marking TTF, Delay, and TTL]

Top-k, Optimal Join Algorithms, and Any-k
• Top-k: middleware cost model (# accesses); small result size, wish: O(k); ranking function; most important results first; return only k-best results; incremental computation
• Optimal Join Algorithms: RAM cost model; return all results, wish: O(r), r > n; conjunctive queries; query decompositions; all results are equally important; minimize intermediate results
• Any-k: combines elements of both

Resorting to other paradigms
• Using Top-k:
  - Most top-k join algorithms can be adapted to support ranked enumeration (k is usually not a hard requirement)
  - But different cost model, huge intermediate results
• Using (Optimal) Join Algorithms:
  - Batch computation of full output, then sort
  - Good TTL, bad TTF
How do we push the sorting into the join?
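The "batch, then sort" baseline can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the tutorial: the function name `batch_then_sort` and the tuple layout `(join-in attribute, join-out attribute, weight)` are assumptions, and the data is the running example of this deck. The cross product is used only for brevity; a real system would use a proper join algorithm. Note that no result is available until the whole output has been materialized and sorted, which is why TTF is bad.

```python
from itertools import product

# Running example; each tuple is (left attribute, right attribute, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def batch_then_sort(r1, r2, r3):
    """Materialize the full join output first, then sort it by total weight."""
    out = []
    for (a1, a2, w1), (b2, a3, w2), (b3, a4, w3) in product(r1, r2, r3):
        if a2 == b2 and a3 == b3:  # R1.A2 = R2.A2 and R2.A3 = R3.A3
            out.append((a1, a2, a3, a4, w1 + w2 + w3))
    out.sort(key=lambda t: t[-1])  # sorting happens only after the join is done
    return out


ranked = batch_then_sort(R1, R2, R3)
```

Here `ranked[:3]` gives the three top-ranked results of the example, but only after all 24 join results have been produced.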

Unranked Enumeration
Related problem: enumerate join results in no particular order

R1        R2
A1  A2    A2  A3
1   1     2   1
2   4     5   2
3   2     1   3

[Timeline: Pre-processing, then (1, 1, 3), (3, 2, 1), with a Delay between consecutive results]

What if we have projections?
[Bagan+ 07]: "Free-connex" acyclic queries
• Linear pre-processing
• Constant delay

[Bagan+ 07] Bagan, Durand, Grandjean. On acyclic conjunctive queries and constant delay enumeration. CSL'07 https://doi.org/10.1007/978-3-540-74915-8_18
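For the two-relation example, the split into a pre-processing phase and an enumeration phase can be sketched as follows. This is a minimal illustration under assumptions, not the algorithm of [Bagan+ 07]: the names `preprocess` and `enumerate_join` are hypothetical, and the sketch covers only a join of two binary relations without projections. Pre-processing builds a hash index on R2 and drops R1 tuples without a join partner, so every remaining R1 tuple produces at least one result and the work between consecutive results stays constant.

```python
from collections import defaultdict

R1 = [(1, 1), (2, 4), (3, 2)]  # (A1, A2)
R2 = [(2, 1), (5, 2), (1, 3)]  # (A2, A3)


def preprocess(r1, r2):
    """Linear-time pre-processing: index R2 on A2, remove dangling R1 tuples."""
    index = defaultdict(list)
    for a2, a3 in r2:
        index[a2].append(a3)
    reduced = [(a1, a2) for a1, a2 in r1 if a2 in index]
    return reduced, index


def enumerate_join(reduced, index):
    """Yield join results one at a time, in no particular order."""
    for a1, a2 in reduced:
        for a3 in index[a2]:  # never empty, thanks to the reduction step
            yield (a1, a2, a3)


results = list(enumerate_join(*preprocess(R1, R2)))
```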

Unranked Enumeration vs Ranked Enumeration
Challenge: return the output tuples in the right order

R1        R2
A1  A2    A2  A3
1   1     2   1
2   4     5   2
3   2     1   3

[Timeline: Pre-processing, then (1, 1, 3), (3, 2, 1) for unranked enumeration vs "?" for ranked enumeration]

Our focus: ranking, no projections

Conceptual Roadmap

                      Paths/Serial          Cyclic/General
Join Problems         Top-1 Path Queries    Top-1 Conjunctive Queries
Optimization          DP                    Union of Tree-DP (UT-DP)
Ranked Enumeration    Any-k DP              Any-k UT-DP

Tropical semiring (min, +); Any-k UT-DP over selective dioids

Outline tutorial
• Part 1: Top-k (Wolfgang): ~20min
• Part 2: Optimal Join Algorithms (Mirek): ~30min
• Part 3: Ranked enumeration over Joins (Nikolaos): ~40min
  – Ranked Enumeration
  – Top-1 Result for Path Queries
  – From Top-1 to Any-k
    • Anyk-Part
    • Anyk-Rec
  – Beyond Path Queries
  – Ranking Function
  – Open Problems
Roadmap overlay: Top-1 Path Queries (DP); Top-1 Conjunctive Queries (Union of Tree-DP, UT-DP); Any-k DP; Any-k UT-DP; Any-k UT-DP over selective dioids

Top-1 result
• Idea: Modify the bottom-up phase of Yannakakis to propagate the minimum weight
  - (min, +) operators in each step
  - Top-1 result can be constructed with one top-down traversal

Top-1 result: Example

R1            R2            R3
A1  A2  w1    A2  A3  w2    A3  A4  w3
1   0   1     0   1   5     1   1   20
2   0   2     0   1   7     1   2   40
3   0   3     0   1   8     2   3   10
4   1   4     0   2   6     2   4   30

Top-1 result: Example
• Nodes = Tuples
• Edges = Joining pairs
• Labels = Weights
[Figure: three-stage graph with node weights R1: 1, 2, 3, 4; R2: 5, 7, 8, 6; R3: 20, 40, 10, 30]

Top-1 result: Example
Bottom-up
[Figure: every node is initialized with minimum weight ∞]

Top-1 result: Example
Each node passes on the minimum total weight it can reach
[Figure: R3 nodes hold 20, 40, 10, 30; R1 and R2 nodes still hold ∞]

Top-1 result: Example
Each node passes on the minimum total weight it can reach
min(20, 40) + 5 = 25
[Figure: R2 nodes hold 25, 27, 28, 16; R3 nodes hold 20, 40, 10, 30; R1 nodes still hold ∞]

Top-1 result: Example
Each node passes on the minimum total weight it can reach
[Figure: R1 nodes hold 17, 18, 19, ∞; R2 nodes hold 25, 27, 28, 16; R3 nodes hold 20, 40, 10, 30]

Top-1 result: Example
Each node passes on the minimum total weight it can reach
[Figure: R1 nodes hold 17, 18, 19, ∞; taking the min over R1 gives 17]
Minimum result weight = 17
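The bottom-up pass of this example can be sketched as below. This is a minimal sketch under assumptions, not the tutorial's implementation: the function name `bottom_up` and the tuple layout `(join-in attribute, join-out attribute, weight)` are chosen for illustration. Each stage first aggregates the next stage into one minimum per join value (the "message") and then adds its own weight, which is the (min, +) step.

```python
import math
from collections import defaultdict

# Running example; each tuple is (join-in attribute, join-out attribute, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def bottom_up(relations):
    """For every tuple, compute the minimum total weight reachable from it."""
    best = [None] * len(relations)
    # Last stage: the best a tuple can do is its own weight.
    best[-1] = [w for (_, _, w) in relations[-1]]
    for i in range(len(relations) - 2, -1, -1):
        # Message from stage i+1: minimum weight per join value ("min").
        msg = defaultdict(lambda: math.inf)
        for (join_in, _, _), b in zip(relations[i + 1], best[i + 1]):
            msg[join_in] = min(msg[join_in], b)
        # Each tuple adds its own weight to the message it receives ("+").
        best[i] = [w + msg[join_out] for (_, join_out, w) in relations[i]]
    return best


best = bottom_up([R1, R2, R3])
min_weight = min(best[0])
```

Tuples without a join partner, such as the R1 tuple with A2 = 1, keep the value ∞ (`math.inf`).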

Top-1 result: Example
Top-down for Top-1 result
Follow the winning edges
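Following the winning edges can be sketched by remembering, for each tuple, which successor achieved the minimum during the bottom-up pass. The function name `top1` and the tuple layout are assumptions made for this illustration. For readability the sketch compares each tuple against all tuples of the next stage, so it does not achieve the linear running time of the message-based formulation.

```python
import math

# Running example; each tuple is (join-in attribute, join-out attribute, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def top1(relations):
    """Return (weight, tuples) of the minimum-weight join result, or None."""
    n = len(relations)
    # best[i][j] = (min weight from tuple j of stage i onward, winning successor)
    best = [None] * n
    best[-1] = [(w, None) for (_, _, w) in relations[-1]]
    for i in range(n - 2, -1, -1):  # bottom-up phase
        best[i] = []
        for (_, join_out, w) in relations[i]:
            candidates = [
                (best[i + 1][j][0], j)
                for j, (join_in, _, _) in enumerate(relations[i + 1])
                if join_in == join_out
            ]
            sub, winner = min(candidates, default=(math.inf, None))
            best[i].append((w + sub, winner))
    # Pick the best starting tuple, then walk down along the winning edges.
    j = min(range(len(best[0])), key=lambda t: best[0][t][0])
    weight = best[0][j][0]
    if weight == math.inf:
        return None  # the join result is empty
    path = []
    for i in range(n):  # top-down phase
        path.append(relations[i][j])
        j = best[i][j][1]
    return weight, path


weight, path = top1([R1, R2, R3])
```

On the example data, `path` holds the three tuples that form the Rank-1 result (1, 0, 2, 3) with weight 17.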

Top-1 result & DP
Rank-1 algorithm for path queries = (Serial) Dynamic Programming
• Subproblem: minimum achievable weight starting from r_i ∈ R_i
• Overlapping subproblems, e.g. the subproblem from tuple "1" and the subproblem from tuple "5"

Top-1 result & DP
Rank-1 algorithm for path queries = (Serial) Dynamic Programming
• Relations = Stages (Independent problems)
• Nodes = States (Subproblems)
• Edges = Decisions (Dependencies)
Principle of Optimality: An optimal solution must contain optimal solutions (to subproblems)

DP Equi-join State Space
• ℓ stages (R1, R2, R3), n tuples per stage
• Edges between R1 and R2: 3 × 4
• Edges between R2 and R3: 3 × 2 + 1 × 2
Total time = #Edges = O(n^2 ℓ)

DP Equi-join State Space
Transform the state space (at most one incoming/outgoing edge per tuple)
• Equivalent to the "messages" of Yannakakis
• Edges per layer after the transformation: 3, 4, 4, 4
Total time = #Edges = O(n ℓ), linear in the size of the database
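The edge counts of the two state spaces can be checked with a short sketch. The helper names `naive_edges` and `transformed_edges` are hypothetical, and the sketch assumes that an intermediate node is created only for join values that occur on both sides. In the naive graph every joining pair of tuples is an edge; in the transformed graph each tuple connects to a single intermediate node per join value.

```python
from collections import Counter

# Running example; each tuple is (join-in attribute, join-out attribute, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def naive_edges(left, right):
    """One edge per joining pair of tuples: quadratic in the worst case."""
    right_count = Counter(join_in for (join_in, _, _) in right)
    return sum(right_count[join_out] for (_, join_out, _) in left)


def transformed_edges(left, right):
    """One edge per tuple into or out of an intermediate join-value node."""
    left_count = Counter(join_out for (_, join_out, _) in left)
    right_count = Counter(join_in for (join_in, _, _) in right)
    shared = left_count.keys() & right_count.keys()
    return sum(left_count[v] + right_count[v] for v in shared)


naive_total = naive_edges(R1, R2) + naive_edges(R2, R3)
transformed_total = transformed_edges(R1, R2) + transformed_edges(R2, R3)
```

For this small example the totals are 20 versus 15 edges; the gap widens as more tuples share a join value.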

Connection to Factorized Databases
[Olteanu+ 16]: Conditional independence of the non-joining attributes given the joining attribute value
[Figure: the three-stage graph with intermediate nodes for the join values A2 = 0, A3 = 1, A3 = 2]

[Olteanu+ 16] Olteanu, Schleich. Factorized databases. SIGMOD Record'16 https://doi.org/10.1145/3003665.3003667

DP as a Shortest Path Problem
• DP computation equivalent to finding the shortest path in a graph
[Figure: the three-stage graph with an added source node s and terminal node t]
Note: We ignore the artificial intermediate nodes for simplicity
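The equivalence can be made concrete by building the graph explicitly and relaxing its edges stage by stage. This is an illustrative sketch under assumptions: the names `build_graph` and `shortest_path_length` are hypothetical, tuple weights are placed on the edge entering each tuple, edges into t have weight 0, and, as on the slide, intermediate join-value nodes are left out.

```python
import math

# Running example; each tuple is (join-in attribute, join-out attribute, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def build_graph(relations):
    """Return edges (u, v, weight); nodes are 's', 't' or (stage, tuple index)."""
    edges = [("s", (0, j), w) for j, (_, _, w) in enumerate(relations[0])]
    for i in range(len(relations) - 1):
        for j, (_, join_out, _) in enumerate(relations[i]):
            for k, (join_in, _, w) in enumerate(relations[i + 1]):
                if join_out == join_in:
                    edges.append(((i, j), (i + 1, k), w))
    last = len(relations) - 1
    edges += [((last, j), "t", 0) for j in range(len(relations[last]))]
    return edges


def shortest_path_length(edges):
    """Relax edges once; valid because build_graph emits them in stage order."""
    dist = {"s": 0}
    for u, v, w in edges:
        if u in dist:
            dist[v] = min(dist.get(v, math.inf), dist[u] + w)
    return dist.get("t", math.inf)


edges = build_graph([R1, R2, R3])
length = shortest_path_length(edges)
```

The shortest s-t path has the same length as the minimum result weight found by the bottom-up pass.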

K-Shortest Paths
• How do we find the k-th best solution to a DP problem?
  - Rank-1 DP solution => shortest path
  - Rank-k DP solution => k-th shortest path
[Figure: graph from source node s to terminal node t, highlighting the Shortest Path (17) and the 2nd Shortest Path (18)]

K-Shortest Paths
• Two major approaches for computing the k-th shortest path in a directed acyclic multi-stage graph
• Anyk-Part
  - Partition the solution space
• Anyk-Rec
  - Recursively compute the lower-rank paths from all nodes (suffixes)
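To give a feel for the partitioning idea, here is a simplified best-first sketch. It is not the tutorial's Anyk-Part or Anyk-Rec algorithm: it is an assumption-laden illustration in which the solution space is partitioned by path prefix, and each prefix is ranked by the weight of its best completion, taken from the bottom-up DP values. The name `ranked_enumeration` and the tuple layout are hypothetical. Because it pushes all successors of a prefix at once, it does not match the delay guarantees of the algorithms named above.

```python
import heapq
import math
from collections import defaultdict

# Running example; each tuple is (join-in attribute, join-out attribute, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def ranked_enumeration(relations):
    """Yield (weight, tuples) for all join results in ascending weight order."""
    n = len(relations)
    # by_value[i][v] = indices of stage-i tuples whose join-in attribute is v
    by_value = []
    for rel in relations:
        groups = defaultdict(list)
        for j, (join_in, _, _) in enumerate(rel):
            groups[join_in].append(j)
        by_value.append(groups)
    # Bottom-up DP: opt[i][j] = min weight of a suffix starting at tuple j.
    opt = [None] * n
    opt[-1] = [w for (_, _, w) in relations[-1]]
    for i in range(n - 2, -1, -1):
        opt[i] = [
            w + min((opt[i + 1][j] for j in by_value[i + 1].get(join_out, [])),
                    default=math.inf)
            for (_, join_out, w) in relations[i]
        ]
    # Heap entries: (weight of best completion, prefix indices, weight before last)
    heap = [(opt[0][j], (j,), 0) for j in range(len(relations[0]))
            if opt[0][j] < math.inf]
    heapq.heapify(heap)
    while heap:
        priority, prefix, done = heapq.heappop(heap)
        i, last = len(prefix) - 1, prefix[-1]
        if i == n - 1:  # complete path: it is the next-best result
            yield priority, [relations[s][j] for s, j in enumerate(prefix)]
            continue
        _, join_out, w = relations[i][last]
        for j in by_value[i + 1].get(join_out, []):
            if opt[i + 1][j] < math.inf:  # skip dead ends
                heapq.heappush(heap, (done + w + opt[i + 1][j], prefix + (j,), done + w))


results = list(ranked_enumeration([R1, R2, R3]))
```

Since a prefix's priority equals the weight of its best completion, popping in priority order returns complete paths in non-decreasing weight, and a caller can stop consuming the generator after any k results.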
