
SIGMOD 2020 tutorial: Optimal Join Algorithms meet Top-k. Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald (Northeastern University, Boston). Part 3: Ranked Enumeration.


SLIDE 1

Optimal Join Algorithms meet Top-k

SIGMOD 2020 tutorial Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Northeastern University, Boston

Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details.

Part 3: Ranked Enumeration

SLIDE 2

Outline tutorial

  • Part 1: Top-k (Wolfgang): ~20min
  • Part 2: Optimal Join Algorithms (Mirek): ~30min
  • Part 3: Ranked enumeration over Joins (Nikolaos): ~40min
    – Ranked Enumeration
    – Top-1 Result for Path Queries
    – From Top-1 to Any-k
      • Anyk-Part
      • Anyk-Rec
    – Beyond Path Queries
    – Ranking Function
    – Open Problems

SLIDE 3

Ranked Enumeration Example

R1
  w1  A1  A2
   1   1   0
   2   2   0
   3   3   0
   4   4   1

R2
  A2  w2  A3
   0   5   1
   0   7   1
   0   8   1
   0   6   2

R3
  A3  A4  w3
   1   1  20
   1   2  40
   2   3  10
   2   4  30

  select A1, A2, A3, A4, w1 + w2 + w3 as weight
  from R1, R2, R3
  where R1.A2 = R2.A2 and R2.A3 = R3.A3
  order by weight
  limit k

Any-k replaces "limit k": results are returned in ranked order without fixing k.

Rank-1: (1, 0, 2, 3, 17)
Rank-2: (2, 0, 2, 3, 18)
Rank-3: (3, 0, 2, 3, 19)
…
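As a baseline for the example above, the following minimal sketch materializes the full join and sorts it (the "batch" approach that any-k tries to avoid). The relation contents are reconstructed from the slide; the function name `batch_ranked_join` and the tuple layouts are illustrative assumptions, not part of the tutorial.

```python
from itertools import product

# Example relations reconstructed from the slide (layouts are an assumption).
R1 = [(1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 4, 1)]      # (w1, A1, A2)
R2 = [(0, 5, 1), (0, 7, 1), (0, 8, 1), (0, 6, 2)]      # (A2, w2, A3)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def batch_ranked_join(r1, r2, r3):
    """Materialize every join result, then sort by total weight."""
    out = []
    for (w1, a1, a2), (b2, w2, a3), (b3, a4, w3) in product(r1, r2, r3):
        if a2 == b2 and a3 == b3:          # equi-join on A2 and on A3
            out.append((a1, a2, a3, a4, w1 + w2 + w3))
    return sorted(out, key=lambda t: t[-1])


results = batch_ranked_join(R1, R2, R3)
for rank, row in enumerate(results[:3], start=1):
    print(f"Rank-{rank}: {row}")
```

The nested-loop join is only for illustration; the point is that nothing is returned until the whole output has been computed and sorted.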

SLIDE 4

Ranked Enumeration: Problem Definition

RAM Cost Model: TT(k) = Time-to-k-th result

  • TTF = Time-to-First = TT(1)
  • Delay = time between consecutive results
  • TTL = Time-to-Last = TT(|out|)

[Figure: number of results returned over time, annotated with TTF, Delay, and TTL]

“Any-k” = Anytime algorithms + Top-k

  • Most important results first (ranking function on output tuples, e.g. sum of weights)
  • All results eventually returned
  • No need to set k in advance

SLIDE 5

[Diagram relating Top-k, Optimal Join Algorithms, and Any-k]

Top-k:
  • middleware cost model (# accesses)
  • return only k-best results
  • small result size; wish: O(k)

Optimal Join Algorithms:
  • all results are equally important
  • query decompositions
  • return all results; wish: O(r), r > n

Any-k:
  • ranking function
  • most important results first
  • RAM cost model
  • minimize intermediate results
  • conjunctive queries
  • incremental computation

SLIDE 6

Resorting to other paradigms

  • Using Top-k:
    • Most top-k join algorithms can be adapted to support ranked enumeration (k is usually not a hard requirement)
    • But different cost model, huge intermediate results
  • Using (Optimal) Join Algorithms:
    • Batch computation of full output then sort
    • Good TTL, Bad TTF

How do we push the sorting into the join?

SLIDE 7

Unranked Enumeration

Related problem: enumerate join results in no particular order

What if we have projections? [Bagan+ 07]: “Free-connex” acyclic queries

  • Linear pre-processing
  • Constant delay

[Figure: timeline with a pre-processing phase, followed by the results (1, 1, 3) and (3, 2, 1) separated by the delay]

R1(A1, A2): (1, 1), (2, 4), (3, 2)
R2(A2, A3): (2, 1), (5, 2), (1, 3)

[Bagan+ 07] Bagan, Durand, Grandjean. On acyclic conjunctive queries and constant delay enumeration. CSL’07 https://doi.org/10.1007/978-3-540-74915-8_18

SLIDE 8

Unranked Enumeration vs Ranked Enumeration

Challenge: return the output tuples in the right order

[Figure: after pre-processing, which result comes first: (1, 1, 3) or (3, 2, 1)?]

R1(A1, A2): (1, 1), (2, 4), (3, 2)
R2(A2, A3): (2, 1), (5, 2), (1, 3)

Our focus: ranking, no projections

SLIDE 9

Conceptual Roadmap

                  Join Problems               Optimization               Ranked Enumeration
Paths/Serial      Top-1 Path Queries          DP                         Any-k DP
Cyclic/General    Top-1 Conjunctive Queries   Union of Tree-DP (UT-DP)   Any-k UT-DP

Tropical semiring (min, +)
Any-k UT-DP over selective dioids


SLIDE 11

Top-1 result

  • Idea: Modify the bottom-up phase of Yannakakis to propagate the minimum weight
  • (min, +) operators in each step
  • Top-1 result can be constructed with one top-down traversal

SLIDE 12

Top-1 result: Example

R1(w1, A1, A2): (1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 4, 1)
R2(A2, w2, A3): (0, 5, 1), (0, 7, 1), (0, 8, 1), (0, 6, 2)
R3(A3, A4, w3): (1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)

SLIDE 13

Top-1 result: Example

[Figure: multi-stage graph with one column of nodes per relation. R1: 1, 2, 3, 4; R2: 5, 7, 8, 6; R3: 20, 40, 10, 30]

  • Nodes = Tuples
  • Edges = Joining pairs
  • Labels = Weights

SLIDE 14

Top-1 result: Example

Bottom-up

[Figure: same graph; every node starts with the value ∞]

SLIDE 15

Top-1 result: Example

Each node passes on the minimum total weight it can reach

[Figure: same graph. R3 values: 20, 40, 10, 30; R2 and R1 values: still ∞]

SLIDE 16

Top-1 result: Example

Each node passes on the minimum total weight it can reach

min(20, 40) + 5 = 25

[Figure: same graph. R2 values: 25, 27, 28, 16; R3 values: 20, 40, 10, 30; R1 values: still ∞]

SLIDE 17

Top-1 result: Example

Each node passes on the minimum total weight it can reach

[Figure: same graph. R1 values: 17, 18, 19, ∞; R2 values: 25, 27, 28, 16; R3 values: 20, 40, 10, 30]

SLIDE 18

Top-1 result: Example

Each node passes on the minimum total weight it can reach

Minimum result weight = min(17, 18, 19, ∞) = 17

SLIDE 19

Top-1 result: Example

Top-down for Top-1 result: follow the winning edges
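The bottom-up and top-down passes of the example can be sketched as follows. This is a minimal illustration, assuming a hypothetical encoding in which every tuple is `(in_key, weight, out_key)` and `in_key` must match the `out_key` of the previous stage; the per-key minimum plays the role of the Yannakakis "message".

```python
import math

# Hypothetical encoding of the example: (in_key, weight, out_key) per tuple.
STAGES = [
    [(None, 1, 0), (None, 2, 0), (None, 3, 0), (None, 4, 1)],      # R1
    [(0, 5, 1), (0, 7, 1), (0, 8, 1), (0, 6, 2)],                  # R2
    [(1, 20, None), (1, 40, None), (2, 10, None), (2, 30, None)],  # R3
]


def bottom_up(stages):
    """best[i][j] = minimum total weight reachable from tuple j of stage i."""
    best = [[math.inf] * len(s) for s in stages]
    best[-1] = [w for (_, w, _) in stages[-1]]
    for i in range(len(stages) - 2, -1, -1):
        # One message per join value: the min over the matching next-stage tuples.
        msg = {}
        for (k_in, _, _), b in zip(stages[i + 1], best[i + 1]):
            msg[k_in] = min(msg.get(k_in, math.inf), b)
        # (min, +): own weight plus the best message; dangling tuples get inf.
        best[i] = [w + msg.get(k_out, math.inf) for (_, w, k_out) in stages[i]]
    return best


def top_down(stages, best):
    """Follow the winning edges to construct the top-1 result (as weights)."""
    path, key = [], None
    for i, stage in enumerate(stages):
        cands = [j for j, t in enumerate(stage) if i == 0 or t[0] == key]
        j = min(cands, key=lambda j: best[i][j])
        path.append(stage[j][1])
        key = stage[j][2]
    return path


best = bottom_up(STAGES)
print(best[0])                 # values at the R1 nodes
print(top_down(STAGES, best))  # weights along the top-1 path
```

Grouping by join value keeps the pass linear in the input size instead of touching every joining pair.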

SLIDE 20

Top-1 result & DP

Rank-1 algorithm for path queries = (Serial) Dynamic Programming

Subproblem: minimum achievable weight starting from r_i ∈ R_i

[Figure: same graph, highlighting the subproblem from tuple “1” and the subproblem from tuple “5”]

Overlapping Subproblems

SLIDE 21

Top-1 result & DP

Rank-1 algorithm for path queries = (Serial) Dynamic Programming

  • Relations = Stages (Independent problems)
  • Nodes = States (Subproblems)
  • Edges = Decisions (Dependencies)

[Figure: same multi-stage graph over R1, R2, R3]

Principle of Optimality: An optimal solution must contain optimal solutions (to subproblems)

SLIDE 22

DP Equi-join State Space

Total time = #Edges = O(n² ℓ), for ℓ relations (stages) with n tuples each

[Figure: same graph; 3 × 4 edges between R1 and R2, and 3 × 2 + 1 × 2 edges between R2 and R3]

SLIDE 23

DP Equi-join State Space

Total time = #Edges = O(n ℓ): linear in the size of the database

Transform the state space (at most one incoming/outgoing edge per tuple)

[Figure: same graph with one intermediate node per join value; edge counts per layer: 3, 4, 4, 4]

Equivalent to the “messages” of Yannakakis

SLIDE 24

Connection to Factorized Databases

[Figure: same graph with intermediate nodes A2 = 0 (between R1 and R2) and A3 = 1, A3 = 2 (between R2 and R3)]

[Olteanu+ 16]: Conditional independence of the non-joining attributes given the joining attribute value

[Olteanu+ 16] Olteanu, Schleich. Factorized databases. SIGMOD Record’16 https://doi.org/10.1145/3003665.3003667


SLIDE 26

DP as a Shortest Path Problem

  • DP computation equivalent to finding the shortest path in a graph

[Figure: same graph with an added source node s and terminal node t]

Note: We ignore the artificial intermediate nodes for simplicity

SLIDE 27

K-Shortest Paths

  • How do we find the k-th best solution to a DP problem?
  • Rank-1 DP solution => shortest path
  • Rank-k DP solution => k-th shortest path

[Figure: graph from source node s to terminal node t, highlighting the Shortest Path (17) and the 2nd Shortest Path (26)]

SLIDE 28

K-Shortest Paths

  • Two major approaches for computing the k-th shortest path in a directed acyclic multi-stage graph
  • Anyk-Part
    • Partition the solution space
  • Anyk-Rec
    • Recursively compute the lower-rank paths from all nodes (suffixes)

SLIDE 30

Lawler-Murty Procedure

[Lawler 72]: generic procedure for ranked enumeration

  • Repeatedly partitions the solution space
  • Applicable to a wide range of problems
  • Generalization of an earlier algorithm of [Murty 68]

[Figure: the original space (variables × values) with the best solution marked, partitioned into disjoint subspaces; in each subspace some variables are fixed and the rest are available]

[Lawler 72] Lawler. A procedure for computing the k best solutions to discrete optimization problems and its application to the shortest path problem. Management Science’72 https://doi.org/10.1287/mnsc.18.7.401

[Murty 68] Murty. An Algorithm for Ranking all the Assignments in Order of Increasing Cost. Operations Research’68 https://doi.org/10.1287/opre.16.3.682

SLIDE 31

2nd Best Path

What can the 2nd best path be?

[Figure: graph from s to t over R1 (1, 2, 3), R2 (5, 7, 8, 6), R3 (20, 40, 10, 30) with the Top-1 path highlighted]

SLIDE 32

2nd Best Path

Option 1: Deviate in the first stage

SLIDE 33

2nd Best Path

Option 2: Keep the first decision, deviate in the second stage

SLIDE 34

2nd Best Path

Option 3: Keep the first and second decisions, deviate in the third stage

SLIDE 35

2nd Best Path

  • Partition the solution space into 3 disjoint subspaces (subgraphs)
  • Compute the best solution in each subspace
  • 2nd best = winner among the 3

[Figure: the three subgraphs with their best solutions: 18 (deviate in the first stage), 26 (deviate in the second stage), 37 (deviate in the third stage); the winner is 18]

SLIDE 36

Rank-k path

  • In general, maintain a global Priority Queue
  • Pop to find winner
  • Partition winner further

[Figure: the subgraphs currently held in the global Priority Queue]

SLIDE 37

Anyk-Part: Default

  • How do we find the best solution in each subspace?
  • Default approach: Shortest path algorithm from scratch

[Figure: full graph from s to t]

SLIDE 38

Anyk-Part: Default

  • How do we find the best solution in each subspace?
  • Default approach: Shortest path algorithm from scratch

[Figure: subspace graph from s to t that keeps node 1 in R1 and nodes 5, 7, 8 in R2]

SLIDE 39

Anyk-Part: Default

  • How do we find the best solution in each subspace?
  • Default approach: Shortest path algorithm from scratch

O(n ℓ) per new subspace

[Figure: subspace graph after bottom-up DP. R3 values: 20, 40, 10, 30; R2 values: 25, 27, 28; value at node 1 and at s: 26]

SLIDE 40

Anyk-Part: Default

[Kimelfeld+ 06]:

  • Ranked enumeration with delay linear in the size of the database
  • Does not fully exploit the structure of the problem

[Kimelfeld+ 06] Kimelfeld, Sagiv. Incrementally Computing Ordered Answers of Acyclic Conjunctive Queries. NGITS’06 https://doi.org/10.1007/11780991_13

SLIDE 41

Anyk-Part: Exploiting the DP structure

Can we calculate the Top-1 weight of each subspace faster?

  • Fixed prefix: Same as previous solution
  • Choose “successor” of previous decision
  • Reach the terminal optimally

Successor: given a prefix and a decision, what is the next best decision we can make?

[Figure: full graph from s to t]

SLIDE 42

Anyk-Part: Exploiting the DP structure

  • Fixed prefix: Same as previous solution
  • Choose “successor” of previous decision
  • Reach the terminal optimally: computed from DP bottom-up

[Figure: full graph from s to t; R2 nodes 5, 7, 8, 6 annotated with their bottom-up values 25, 27, 28, 16]

SLIDE 43

Anyk-Part Variants

  • We already know the minimum weight we can get from choosing each decision
  • We just need to compare them to find the “successor”
  • Do some pre-processing after DP bottom-up to accomplish that
  • 4 different variants

What is the successor of 6?

[Figure: node 1 with decisions 5, 7, 8, 6 and their bottom-up values 25, 27, 28, 16]

SLIDE 44

Anyk-Part Variant 1: “All”

[Yang+ 18]:

  • The solutions will be compared by the global priority queue anyway, so insert all of them as potential successors
  • But delay will again be linear in the size of the database

[Figure: node 1 with decisions 5, 7, 8, 6 (values 25, 27, 28, 16); the successors of 6 are all of 5 [25], 7 [27], 8 [28]]

[Yang+ 18] Yang, Ajwani, Gatterbauer, Nicholson, Riedewald, Sala. Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs. WWW’18 https://doi.org/10.1145/3178876.3186115
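The partitioning scheme with a global priority queue, combined with the “All” way of generating successors, can be sketched as below. This is an illustrative sketch under assumptions, not the authors' implementation: tuples use a hypothetical `(in_key, weight, out_key)` encoding, a queue entry is a fixed prefix whose remaining stages are completed optimally, and all alternatives to the winning decision are pushed as separate entries.

```python
import heapq
import math

# Hypothetical encoding of the example: (in_key, weight, out_key) per tuple.
STAGES = [
    [(None, 1, 0), (None, 2, 0), (None, 3, 0), (None, 4, 1)],      # R1
    [(0, 5, 1), (0, 7, 1), (0, 8, 1), (0, 6, 2)],                  # R2
    [(1, 20, None), (1, 40, None), (2, 10, None), (2, 30, None)],  # R3
]


def bottom_up(stages):
    """best[i][j] = minimum total weight reachable from tuple j of stage i."""
    best = [[math.inf] * len(s) for s in stages]
    best[-1] = [w for (_, w, _) in stages[-1]]
    for i in range(len(stages) - 2, -1, -1):
        msg = {}
        for (k_in, _, _), b in zip(stages[i + 1], best[i + 1]):
            msg[k_in] = min(msg.get(k_in, math.inf), b)
        best[i] = [w + msg.get(k_out, math.inf) for (_, w, k_out) in stages[i]]
    return best


def anyk_part_all(stages):
    """Yield (weight, path weights) in ranked order."""
    best = bottom_up(stages)

    def choices(i, key):
        return [j for j, t in enumerate(stages[i]) if i == 0 or t[0] == key]

    # Global PQ of (best weight in subspace, fixed prefix of tuple indices).
    pq = [(best[0][j], (j,)) for j in choices(0, None) if math.isfinite(best[0][j])]
    heapq.heapify(pq)
    while pq:
        total, prefix = heapq.heappop(pq)
        sol = list(prefix)
        cost = sum(stages[i][j][1] for i, j in enumerate(sol))
        for i in range(len(prefix), len(stages)):
            cands = choices(i, stages[i - 1][sol[-1]][2])
            win = min(cands, key=lambda j: best[i][j])
            for j in cands:                      # "All": push every deviation
                if j != win and math.isfinite(best[i][j]):
                    heapq.heappush(pq, (cost + best[i][j], tuple(sol) + (j,)))
            sol.append(win)                      # keep following winning edges
            cost += stages[i][win][1]
        yield total, [stages[i][j][1] for i, j in enumerate(sol)]


ranked = list(anyk_part_all(STAGES))
print(ranked[:4])
```

Each popped entry pushes up to n alternatives per remaining stage, which is why the delay of this variant stays linear in the database size.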

SLIDE 45

Anyk-Part Variant 2: “Eager”

  • Invest more into pre-processing to get a lower delay
  • Sort the decisions and find the true successor

[Figure: node 1 with decisions 5, 7, 8, 6 (values 25, 27, 28, 16). Sorted List: 6 [16], 5 [25], 7 [27], 8 [28]; the successor of 6 is 5]

SLIDE 46

Anyk-Part Variant 3: “Lazy”

  • Sorting = wasted effort if enumeration is stopped early

[Chang+ 15]:

  • sort incrementally with a priority queue (per node)
  • store order for future reuse

[Figure: per-node PQ over the decisions 5, 7, 8, 6 (values 25, 27, 28, 16); popping the PQ yields the successor 5 [25], which is appended to the Sorted List]

[Chang+ 15] Chang, Lin, Zhang, Yu, Zhang, Qin. Optimal enumeration: Efficient top-k tree matching. PVLDB’15 https://doi.org/10.14778/2735479.2735486

SLIDE 47

Anyk-Part Variant 4: “Take2”

  • We want to lower both preprocessing time and delay

[Tziavelis+ 20]:

  • Build a heap (binary tree) in linear time
  • Heap order gives only two potential successors (asymptotically same as one)

[Figure: binary heap over the decisions 5, 7, 8, 6 (values 25, 27, 28, 16); the two children of a heap node are its potential successors]

[Tziavelis+ 20] Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries. PVLDB’20 https://doi.org/10.14778/3397230.3397250
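The idea behind “Take2” can be illustrated in isolation: heapify the decisions of one node in linear time, and treat the two heap children of a decision as its potential successors. The sketch below is a simplified stand-in (the helper names are hypothetical, and a small local queue plays the role of the global priority queue); it shows that sorted order is recovered without ever sorting the list.

```python
import heapq


def heap_successors(heap, pos):
    """Positions of the (at most two) potential successors of heap[pos]."""
    return [c for c in (2 * pos + 1, 2 * pos + 2) if len(heap) > c]


def enumerate_take2(values):
    """Yield values in increasing order using only heap-child successors."""
    heap = list(values)
    heapq.heapify(heap)                 # O(n) pre-processing, no full sort
    frontier = [(heap[0], 0)] if heap else []
    while frontier:
        val, pos = heapq.heappop(frontier)
        yield val
        for c in heap_successors(heap, pos):
            heapq.heappush(frontier, (heap[c], c))


print(list(enumerate_take2([25, 27, 28, 16])))  # decisions of node 1
```

Because every heap child is at least as large as its parent, the true successor of a popped value is always among the entries already in the queue.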

SLIDE 48

Anyk-Part Complexity

  • O(ℓ n): same as DP bottom-up
  • O(k log k): same as sorting k objects
  • O(k ℓ): needed to enumerate each result

(*) assuming constant-time lookup with hashing


SLIDE 50

Anyk-Rec: Motivation

Πk(s) = k-th shortest path from node s

Principle of Optimality (DP): If Π1(s) begins with node r, then Π1(s) = s ○ Π1(r)

Generalized Principle of Optimality: If Πk(s) begins with node r, then Πk(s) = s ○ Πj(r) for some j ≤ k

[Figure: a path from s through r, shown once for Π1(s) with suffix Π1(r) and once for Πk(s) with suffix Πj(r)]

Martins, Pascoal, Santos. A new improvement for a K shortest paths algorithm. Investigação Operacional’01 http://apdio.pt/documents/10180/15407/IOvol21n1.pdf

SLIDE 51

Anyk-Rec: Example

For each node (e.g. 1) we want to compute the ranking of path suffixes: Π1(1), Π2(1), Π3(1), …

Idea: Store ordering of lower-rank suffixes and reuse it as much as possible

[Figure: full graph from s to t]

SLIDE 52

Anyk-Rec: Example

At node 1 (outgoing edges to 5, 7, 8, 6):

  • PQ (one entry per outgoing edge): 1 ○ Π1(5) [25], 1 ○ Π1(7) [27], 1 ○ Π1(8) [28], 1 ○ Π1(6) [16]
  • Sorted List (stores ordering of suffixes): initially empty

SLIDE 53

Anyk-Rec: Example

Pop: 1 ○ Π1(6) [16]

  • PQ: 1 ○ Π1(5) [25], 1 ○ Π1(7) [27], 1 ○ Π1(8) [28]
  • Sorted List: empty

SLIDE 54

Anyk-Rec: Example

Store: Π1(1) = 1 ○ Π1(6) [16]

  • PQ: 1 ○ Π1(5) [25], 1 ○ Π1(7) [27], 1 ○ Π1(8) [28]
  • Sorted List: Π1(1) = 1 ○ Π1(6) [16]

SLIDE 55

Anyk-Rec: Example

Replace: insert 1 ○ Π2(6) [36], where Π2(6) is computed recursively

  • PQ: 1 ○ Π1(5) [25], 1 ○ Π1(7) [27], 1 ○ Π1(8) [28], 1 ○ Π2(6) [36]
  • Sorted List: Π1(1) = 1 ○ Π1(6) [16]

SLIDE 56

Anyk-Rec: Example

Pop: 1 ○ Π1(5) [25]

  • PQ: 1 ○ Π1(7) [27], 1 ○ Π1(8) [28], 1 ○ Π2(6) [36]
  • Sorted List: Π1(1) = 1 ○ Π1(6) [16]

SLIDE 57

Anyk-Rec: Example

Store: Π2(1) = 1 ○ Π1(5) [25]

  • PQ: 1 ○ Π1(7) [27], 1 ○ Π1(8) [28], 1 ○ Π2(6) [36]
  • Sorted List: Π1(1) = 1 ○ Π1(6) [16], Π2(1) = 1 ○ Π1(5) [25]

SLIDE 58

Anyk-Rec: Example

Replace: insert 1 ○ Π2(5) [45], where Π2(5) is computed recursively

  • PQ: 1 ○ Π1(7) [27], 1 ○ Π1(8) [28], 1 ○ Π2(6) [36], 1 ○ Π2(5) [45]
  • Sorted List: Π1(1) = 1 ○ Π1(6) [16], Π2(1) = 1 ○ Π1(5) [25]

SLIDE 59

Anyk-Rec: Suffix Reuse

Sorted List at node 1: Π1(1) = 1 ○ Π1(6) [16], Π2(1) = 1 ○ Π1(5) [25], …

[Figure: later, two different prefixes (Prefix 1 and Prefix 2) both request Π2(1)]

Reuse Π2(1) for all subsequent calls!
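The Pop / Store / Replace cycle and the reuse of stored suffixes can be sketched with a small recursive class. This is a minimal sketch in the spirit of the recursive k-shortest-path algorithms cited in this part, not the authors' code: the class name `AnykRec`, the string node names, and the explicit source `s` and terminal `t` are assumptions. Unlike the bracketed values on the slides, the costs here include the weight of the node itself.

```python
import heapq


class AnykRec:
    """Recursive ranked enumeration of path suffixes in a multi-stage DAG."""

    def __init__(self, weight, succ, terminal):
        self.weight, self.succ, self.terminal = weight, succ, terminal
        self.ranked = {}    # node -> [(cost, path)] in rank order (Sorted List)
        self.pq = {}        # node -> heap of (cost, child, rank of child suffix)
        self.pending = {}   # node -> (child, rank) of the last popped entry

    def kth(self, v, k):
        """Return (cost, path) of the k-th best suffix from v, or None."""
        if v not in self.ranked:
            self._init(v)
        ranked, pq = self.ranked[v], self.pq[v]
        while k > len(ranked):
            if v in self.pending:                  # "Replace", done lazily
                u, j = self.pending.pop(v)
                nxt = self.kth(u, j + 1)           # computed recursively
                if nxt is not None:
                    heapq.heappush(pq, (self.weight[v] + nxt[0], u, j + 1))
            if not pq:
                return None                        # fewer than k suffixes
            cost, u, j = heapq.heappop(pq)         # "Pop"
            ranked.append((cost, [v] + self.ranked[u][j - 1][1]))  # "Store"
            self.pending[v] = (u, j)
        return ranked[k - 1]                       # reused on later calls

    def _init(self, v):
        self.ranked[v], self.pq[v] = [], []
        if v == self.terminal:
            self.ranked[v].append((0, [v]))
            return
        for u in self.succ[v]:                     # one entry per outgoing edge
            first = self.kth(u, 1)
            if first is not None:
                heapq.heappush(self.pq[v], (self.weight[v] + first[0], u, 1))


names = ["1", "2", "3", "4", "5", "7", "8", "6", "20", "40", "10", "30"]
WEIGHT = {"s": 0, "t": 0, **{n: int(n) for n in names}}
R2_ALL = ["5", "7", "8", "6"]
SUCC = {"s": ["1", "2", "3", "4"], "1": R2_ALL, "2": R2_ALL, "3": R2_ALL, "4": [],
        "5": ["20", "40"], "7": ["20", "40"], "8": ["20", "40"], "6": ["10", "30"],
        "20": ["t"], "40": ["t"], "10": ["t"], "30": ["t"], "t": []}

rec = AnykRec(WEIGHT, SUCC, "t")
costs, k = [], 1
while (res := rec.kth("s", k)) is not None:
    costs.append(res[0])
    k += 1
print(costs[:4])
```

Every node keeps its own sorted list, so a suffix such as the second-best path from node 1 is computed once and then shared by all prefixes that reach node 1.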

SLIDE 60

Anyk-Rec: Suffix Reuse

  • In general, delay is higher than Anyk-Part
    • O(ℓ log n) vs O(log k + ℓ) of Take2
  • But reusing computation may pay off in the end
    • If a lot of suffixes are shared, TTL can be faster than sorting!
  • If join pattern = Cartesian product with n^ℓ results:
    • Anyk-Rec TTL: O(n^ℓ (log n + ℓ))
    • Sorting the output: O(n^ℓ log n ∙ ℓ)
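The gap between the two bounds for the Cartesian-product case follows from expanding the logarithm of the output size. The short derivation below only restates the two bounds given above and compares them; the final inequality is a simple lower bound on their ratio.

```latex
\begin{align*}
\text{Sorting the output:}\quad
  & O\!\left(n^{\ell}\log n^{\ell}\right) = O\!\left(n^{\ell}\cdot \ell \log n\right) \\
\text{Anyk-Rec TTL:}\quad
  & O\!\left(n^{\ell}(\log n + \ell)\right) \\
\text{Ratio of the two bounds:}\quad
  & \frac{\ell \log n}{\log n + \ell} \;\ge\; \tfrac{1}{2}\min(\ell, \log n)
\end{align*}
```

So the product of ℓ and log n in the sorting bound becomes a sum in the Anyk-Rec bound.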

SLIDE 61

More on the History of Anyk-Rec

  • [Bellman+ 60]: Keep the k best solutions per node
  • [Dreyfus 69]: Recursive equations
  • [Jiménez+ 99]: Top-down approach
  • [Deep+ 19]: Application to conjunctive queries
  • [Tziavelis+ 20]: Improved TTL guarantees

[Bellman+ 60] Bellman, Kalaba. On k-th best policies. JSIAM’60 https://doi.org/10.1137/0108044

[Dreyfus 69] Dreyfus. An appraisal of some shortest-path algorithms. Operations Research’69 https://doi.org/10.1287/opre.17.3.395

[Deep+ 19] Deep, Koutris. Ranked enumeration of conjunctive query results. ArXiv’19 http://arxiv.org/abs/1902.02698

[Jiménez+ 99] Jiménez, Marzal. Computing the k shortest paths: A new algorithm and an experimental comparison. WAE’99 https://doi.org/10.1007/3-540-48318-7_4

[Tziavelis+ 20] Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries. PVLDB’20 https://doi.org/10.14778/3397230.3397250

SLIDE 62

Overview

  • Take2 has lower complexity over all instances
  • But there are cases where the recursive approach wins for TTL

(*) assuming constant-time lookup with hashing

SLIDE 63

Some Experimental Results

  • Anyk starts much faster than Batch
  • Anyk-Rec also finishes faster than Batch
  • Anyk-Part is usually faster in the beginning

Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries. PVLDB’20 https://doi.org/10.14778/3397230.3397250

SLIDE 64

Some Experimental Results

  • Boolean (is there any result?) is the best we can do
  • Anyk-Rec is getting faster when there are more opportunities for suffix reuse
  • Anyk is only 2 times slower

SLIDE 66

Acyclic Queries

If the query is acyclic, it can be represented by a join tree

[Figure: join tree over R1(A1,A2,A3), R2(A1,A2,A4), R3(A2,A3,A5), R4(A1,A3,A6)]

SLIDE 67

Tree-DP

[Figure: tree of stages R1, R2, R3, R4 with tuples numbered 1 to 8]

  • Stages of DP form a tree instead of a path (Tree-DP)
  • For Top-1, go bottom-up and choose decisions independently in each branch

SLIDE 68

Ranked Enumeration for Tree-DP

  • Anyk-Part:
    • Serialize the stages and treat it like the path case
    • Complexity guarantees remain the same
  • Anyk-Rec:
    • Apply the path algorithm in each branch
    • Difficulty: how do we combine the solutions from each branch?
    • Improved TTL only if the tree has significant depth

[Deep+ 19] Deep, Koutris. Ranked enumeration of conjunctive query results. ArXiv’19 http://arxiv.org/abs/1902.02698

[Tziavelis+ 20] Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries. PVLDB’20 https://doi.org/10.14778/3397230.3397250

SLIDE 69

Cyclic Queries

  • For cyclic queries, use tree decompositions
  • Submodular width decompositions: union of acyclic queries

[Figure: 6-cycle query over attributes A1 to A6 with relations R1 to R6, decomposed into 7 acyclic queries (Acyclic 1 to Acyclic 7), each annotated with O(n^(5/3))]

SLIDE 70

Ranked Enumeration for Cyclic Queries

  • Straightforward to run any-k with a top-level Priority Queue

[Figure: a top-level PQ over the any-k streams of Acyclic 1 to Acyclic 7, each annotated with O(n^(5/3)); get_next() pops from the PQ]

  • TTF = O(n^(5/3)) for Q6c (same as Boolean query)
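The top-level priority queue can be sketched as a lazy merge over the ranked streams of the individual acyclic queries. The sketch below assumes that each stream already yields `(weight, result)` pairs in increasing weight order and that the streams are disjoint (duplicate elimination across trees is not handled); the function name `union_anyk` and the toy streams are hypothetical.

```python
import heapq


def union_anyk(streams):
    """Merge ranked streams lazily; one get_next() per yielded result."""
    pq = []
    for idx, it in enumerate(map(iter, streams)):
        first = next(it, None)
        if first is not None:
            pq.append((first, idx, it))    # idx breaks ties between streams
    heapq.heapify(pq)
    while pq:
        item, idx, it = heapq.heappop(pq)
        yield item
        nxt = next(it, None)               # pull only from the popped stream
        if nxt is not None:
            heapq.heappush(pq, (nxt, idx, it))


# Toy streams standing in for the any-k enumerators of three acyclic queries.
acyclic_1 = [(17, "a"), (26, "b")]
acyclic_2 = [(18, "c")]
acyclic_3 = []
merged = list(union_anyk([acyclic_1, acyclic_2, acyclic_3]))
print(merged)
```

Only the head of each stream is ever held in the queue, so the first result is available as soon as every stream has produced its own first result.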

SLIDE 72

What ranking functions can be supported?

So far (min, +). Can we substitute these operators with others?

1. We need to be able to do Dynamic Programming
2. The 1st operator has to induce an order on the domain

[Figure: R2 node 5 joined with R3 nodes 20 and 40; min(20, 40) + 5 = 25]

SLIDE 73

Semirings

  • Semiring (W, ⊕, ⊗, 0, 1)
    1. (W, ⊕, 0) is a commutative monoid
    2. (W, ⊗, 1) is a monoid
    3. ⊗ distributes over ⊕: (x ⊕ y) ⊗ z = (x ⊗ z) ⊕ (y ⊗ z). Key property for efficiency (DP)
    4. 0 annihilates ⊗: 0 ⊗ x = 0
  • Examples
    1. (R∞, min, +, ∞, 0): “Tropical semiring”
    2. ({0,1}, ∨, ∧, 0, 1): Boolean
    3. (N, +, ∙, 0, 1): Number of paths. No ordering (What would the 2nd best solution be?)

Aji, McEliece. The generalized distributive law. IEEE Trans. Inf. Theory’00 https://doi.org/10.1109/18.825794

Mohri. Semiring frameworks and algorithms for shortest-distance problems. JALC’02 http://www.jalc.de/issues/2002/issue_7_3/abs-321.pdf

SLIDE 74

Selective Dioids

  • A selective dioid (W, ⊕, ⊗, 0, 1) is a semiring with an additional property
    • ⊕ is selective: (x ⊕ y = x) ∨ (x ⊕ y = y)
  • Selectivity of ⊕ gives us a total order on W:
    • x ≤ y iff x ⊕ y = x
    • E.g. x ≤ y iff min(x, y) = x

Gondran, Minoux. Graphs, Dioids and Semirings: New Models and Algorithms. Springer’08. https://doi.org/10.1007/978-0-387-75450-5
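The definition above translates almost literally into code. The following sketch, with a hypothetical `SelectiveDioid` container, derives the order from ⊕ and instantiates it for the tropical and the Boolean semiring; it does not verify the semiring axioms, it only shows how the order falls out of selectivity.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class SelectiveDioid:
    plus: Callable[[Any, Any], Any]    # ⊕, assumed selective
    times: Callable[[Any, Any], Any]   # ⊗
    zero: Any                          # neutral for ⊕, annihilates ⊗
    one: Any                           # neutral for ⊗

    def leq(self, x, y):
        """Total order induced by ⊕: x ≤ y iff x ⊕ y = x."""
        return self.plus(x, y) == x


TROPICAL = SelectiveDioid(min, lambda x, y: x + y, math.inf, 0)
BOOLEAN = SelectiveDioid(lambda x, y: x or y, lambda x, y: x and y, False, True)

print(TROPICAL.leq(3, 5), TROPICAL.leq(5, 3))
print(BOOLEAN.leq(True, False))   # under ∨, True is ranked before False
```

The counting semiring (N, +, ∙, 0, 1) cannot be plugged in this way, because x + y generally equals neither x nor y.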

SLIDE 75

DP & Yannakakis

[Figure: the same graph over R1, R2, R3, annotated twice.
  Minimum SUM: R1 values 17, 18, 19, ∞; R2 values 25, 27, 28, 16; R3 values 20, 40, 10, 30.
  Yannakakis: R1 values T, T, T, F (the F marks the dangling tuple); all R2 and R3 values T]

Yannakakis Bottom-up: DP over Boolean semiring ({0,1}, ∨, ∧, 0, 1)

Any-k with Boolean semiring? Equivalent to standard query evaluation of Yannakakis if we use “smarter” PQs (sorted lists of 0-1)
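To make the correspondence concrete, the same bottom-up pass can be written once and parameterized by the semiring operators. This is an illustrative sketch under assumptions: tuples use a hypothetical `(in_key, weight, out_key)` encoding, and `lift` is an assumed helper that maps an input weight into the semiring domain.

```python
import math

# Hypothetical encoding of the example: (in_key, weight, out_key) per tuple.
STAGES = [
    [(None, 1, 0), (None, 2, 0), (None, 3, 0), (None, 4, 1)],      # R1
    [(0, 5, 1), (0, 7, 1), (0, 8, 1), (0, 6, 2)],                  # R2
    [(1, 20, None), (1, 40, None), (2, 10, None), (2, 30, None)],  # R3
]


def bottom_up(stages, plus, times, zero, lift):
    """Generic DP pass; returns the values at the tuples of the first stage."""
    vals = [lift(w) for (_, w, _) in stages[-1]]
    for i in range(len(stages) - 2, -1, -1):
        msg = {}
        for (k_in, _, _), v in zip(stages[i + 1], vals):
            msg[k_in] = plus(msg.get(k_in, zero), v)      # aggregate with ⊕
        vals = [times(lift(w), msg.get(k_out, zero))      # combine with ⊗
                for (_, w, k_out) in stages[i]]
    return vals


# Tropical semiring (min, +): minimum total weight per R1 tuple.
min_sum = bottom_up(STAGES, min, lambda x, y: x + y, math.inf, lambda w: w)
# Boolean semiring (or, and): does the R1 tuple join at all?
joins = bottom_up(STAGES, lambda x, y: x or y, lambda x, y: x and y,
                  False, lambda w: True)
print(min_sum)
print(joins)
```

The tuple that receives ∞ under (min, +) is exactly the one that receives False under the Boolean semiring, i.e. the dangling tuple.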

SLIDE 76

Lexicographic Orders

[Figure: the same graph over R1, R2, R3]

Lexicographic Order R2 − R1 − R3: results ordered first on the weight in R2, then R1, then R3

(5, 1, 20), (5, 1, 40), (5, 2, 20), …

SLIDE 77

Lexicographic Orders

Lexicographic Order R2 − R1 − R3

[Figure: the same graph over R1, R2, R3]

  • W: ℓ-dimensional vectors
    • E.g. tuple 5 has input weight (0, 5, 0)
  • ⊕: lexicographic min
    • (0, 5, 20) ⊕ (0, 6, 10) = (0, 5, 20)
  • ⊗: element-wise addition
    • (0, 0, 20) ⊗ (0, 5, 0) = (0, 5, 20)
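The two operators can be tried out directly with Python tuples, whose built-in comparison is already lexicographic. In the sketch below the helper names are hypothetical, and the ranked list at the end uses one assumed convention: vector positions are arranged in priority order (w2, w1, w3) so that plain tuple comparison yields the R2 − R1 − R3 order.

```python
from itertools import product


def lex_plus(x, y):
    """⊕: lexicographic minimum of two weight vectors."""
    return min(x, y)


def lex_times(x, y):
    """⊗: element-wise addition of two weight vectors."""
    return tuple(a + b for a, b in zip(x, y))


print(lex_plus((0, 5, 20), (0, 6, 10)))
print(lex_times((0, 0, 20), (0, 5, 0)))

# Example relations (layouts are an assumption; only join keys and weights kept).
R1 = [(1, 0), (2, 0), (3, 0), (4, 1)]            # (w1, A2)
R2 = [(0, 5, 1), (0, 7, 1), (0, 8, 1), (0, 6, 2)]  # (A2, w2, A3)
R3 = [(1, 20), (1, 40), (2, 10), (2, 30)]        # (A3, w3)

ranked = sorted(
    lex_times(lex_times((w2, 0, 0), (0, w1, 0)), (0, 0, w3))
    for (w1, a2), (b2, w2, a3), (b3, w3) in product(R1, R2, R3)
    if a2 == b2 and a3 == b3
)
print(ranked[:3])
```

The batch sort is used here only to display the order; an any-k algorithm would obtain the same order by running over this dioid instead of (min, +).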


SLIDE 79

Open Problems

  • How does any-k interact with other relational operators?
    • Projections (drawing ideas from constant-delay enumeration)
    • Disjunctions
    • Groupings
  • How does the query plan affect the performance of any-k algorithms? How would the database optimizer choose the best algorithm/join plan?
  • Can we efficiently decompose every query into a union of disjoint trees?
  • Can we prove results beyond the worst-case? (e.g. instance-optimality)
  • Can we “push” the any-k functionality inside the bags of the tree decomposition instead of materializing them beforehand?