optimal join algorithms meet top
play

Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang - PowerPoint PPT Presentation

SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 1 : Top- Time Slides:


  1. SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- 𝑙 Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 1 : Top- 𝑙 Time Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. 1 See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details

  2. Why "Optimal Join Algorithms meet Top- 𝑙 "? Optimal Join algorithms Top- 𝑙 Return all results over joins Given 𝑙 , return 𝑙 β€œbest” results β‡’ How to avoid large β‡’ How to avoid working on intermediate results? any lower ranked results? Ranked Enumeration (Any- 𝑙 ) Incrementally return the 𝑙 β€œbest” results over joins (for any 𝑙 = 1, 2, ...) β‡’ How to most effectively push sorting through joins? 3

  3. Top- 𝑙 Optimal Join Algorithms Any- 𝑙 middleware cost model RAM cost model return all results; (# accesses) wish: 𝑃 𝑠 , 𝑠 > π‘œ conjunctive queries ranking function small result size; query wish: 𝑃(𝑙) decompositions most important all results minimize results first are equally return only intermediate important 𝑙 - best results results incremental computation 4

  4. Outline tutorial β€’ Part 1: Top- 𝑙 (Wolfgang): ~20min – Top- 𝑙 selection problem – Threshold algorithm [Fagin+ '03] – Top- 𝑙 join problem – J* algorithm [Natsev+ '01] – Discussion on cost models β€’ Part 2: Optimal Join Algorithms (Mirek): ~30min β€’ Part 3: Ranked enumeration over joins (Nikolaos): ~40min 5

  5. Top- 𝑙 Selection Query: overall setup β€’ π‘œ objects π‘Œ ! , π‘Œ " , … , π‘Œ # with β„“ numeric weight attributes π‘₯ ! , π‘₯ " , … , π‘₯ β„“ β€’ weight of object = aggregate function over its weights 𝜍 π‘₯ ! , π‘₯ " , … , π‘₯ β„“ = 𝜍 π‘Œ β€’ Goal: Find top- 𝑙 objects according to some order (e.g. min) In most original papers assumed to be max! id π‘₯ ! π‘₯ " π‘₯ % sum Example aggregate function: 𝜍 = sum {π‘₯ ! , π‘₯ " , π‘₯ % } π‘Œ ! 3 4 3 10 π‘Œ " 4 2 4 10 π‘Œ % 6 8 1 15 π‘Œ & 7 6 6 18 Top- 𝑙 : a set of 𝑙 objects s.t. 𝜍 π‘Œ ( ≀ 𝜍(π‘Œ ) ) π‘Œ ' 8 7 5 20 for every π‘Œ ( ∈ π‘ˆ and every π‘Œ ) βˆ‰ π‘ˆ π‘œ = 5 , β„“ = 3 , 𝑙 = 2 6

  6. Top- 𝑙 Selection Query: information in different relations β€’ Weights are stored in β„“ distinct relations 𝑆 ! 𝑆 ! 𝑆 " 𝑆 % id π‘₯ ! π‘₯ " π‘₯ % sum id π‘₯ ! id π‘₯ " id π‘₯ % π‘Œ ! 3 4 3 10 π‘Œ ! 3 π‘Œ ! 4 π‘Œ ! 3 π‘Œ " 4 2 4 10 π‘Œ " 4 π‘Œ " 2 π‘Œ " 4 π‘Œ % 6 8 1 15 π‘Œ % 6 π‘Œ % 1 π‘Œ % 8 π‘Œ & 7 6 6 18 π‘Œ & 7 π‘Œ & 6 π‘Œ & 6 π‘Œ ' 8 7 5 20 π‘Œ ' 8 π‘Œ ' 7 π‘Œ ' 5 7

  7. Top- 𝑙 Selection Query: sorted access β€’ Weights are stored in β„“ distinct relations 𝑆 ! - each 𝑆 ! is sorted by attribute π‘₯ ! 𝑆 ! 𝑆 " 𝑆 % id π‘₯ ! π‘₯ " π‘₯ % sum id π‘₯ ! id π‘₯ " id π‘₯ % π‘Œ ! 3 4 3 10 π‘Œ ! 3 π‘Œ ! 4 π‘Œ ! 3 π‘Œ " 4 2 4 10 π‘Œ " 4 π‘Œ " 2 π‘Œ " 4 π‘Œ % 6 8 1 15 π‘Œ % 6 π‘Œ % 1 π‘Œ % 8 π‘Œ & 7 6 6 18 π‘Œ & 7 π‘Œ & 6 π‘Œ & 6 π‘Œ ' 8 7 5 20 π‘Œ ' 8 π‘Œ ' 7 π‘Œ ' 5 8

  8. Top- 𝑙 Selection Query: sorted access β€’ Weights are stored in β„“ distinct relations 𝑆 ! - each 𝑆 ! is sorted by attribute π‘₯ ! 𝑆 ! 𝑆 " 𝑆 % id π‘₯ ! π‘₯ " π‘₯ % sum id π‘₯ ! id π‘₯ " id π‘₯ % Notice we sort in increasing order π‘Œ ! 3 4 3 10 π‘Œ ! 3 π‘Œ " 2 π‘Œ % 1 π‘Œ " 4 2 4 10 π‘Œ " 4 π‘Œ ! 4 π‘Œ ! 3 π‘Œ % 6 8 1 15 π‘Œ % 6 π‘Œ & 6 π‘Œ " 4 π‘Œ & 7 6 6 18 π‘Œ & 7 π‘Œ ' 7 π‘Œ ' 5 π‘Œ ' 8 7 5 20 π‘Œ ' 8 π‘Œ % 8 π‘Œ & 6 9

  9. Top- 𝑙 Selection Query: "middleware" assumption As Assumption 1: 1: Mi Middl ddleware c cost m mode del : β€’ Weights are stored in β„“ distinct relations 𝑆 ! we aggregate rankings of other services. - each 𝑆 ! is sorted by attribute π‘₯ ! β€’ we only pay for accesses to attribute lists β€’ Goal: Find top- 𝑙 with minimal access cost β€’ 2 types of access: sequential / random - get next object in 𝑆 ! sequentially: "sorted" sequential access cost 𝑑 "#$ - obtain the weight for a specific object in 𝑆 ! : random access (index lookup) cost 𝑑 %&'( 𝑆 ! 𝑆 " 𝑆 % id π‘₯ ! π‘₯ " π‘₯ % sum id π‘₯ ! id π‘₯ " id π‘₯ % Notice we sort in increasing order π‘Œ ! 3 4 3 10 π‘Œ ! 3 π‘Œ " 2 π‘Œ % 1 π‘Œ " 4 2 4 10 π‘Œ " 4 π‘Œ ! 4 π‘Œ ! 3 π‘Œ % 6 8 1 15 π‘Œ % 6 π‘Œ & 6 π‘Œ " 4 π‘Œ & 7 6 6 18 π‘Œ & 7 π‘Œ ' 7 π‘Œ ' 5 π‘Œ ' 8 7 5 20 π‘Œ ' 8 π‘Œ % 8 π‘Œ & 6 10

  10. Top- 𝑙 Selection Query as a Join Problem As Assumption 1: 1: Mi Middl ddleware c cost m mode del : β€’ Weights are stored in β„“ distinct relations 𝑆 ! we aggregate rankings of other services. - each 𝑆 ! is sorted by attribute π‘₯ ! β€’ we only pay for accesses to attribute lists β€’ Goal: Find top- 𝑙 with minimal access cost β€’ 2 types of access: sequential / random - get next object in 𝑆 ! sequentially: "sorted" sequential access cost 𝑑 "#$ - obtain the weight for a specific object in 𝑆 ! : random access (index lookup) cost 𝑑 %&'( 𝑆 ! 𝑆 " 𝑆 % select R 1 .id, id π‘₯ ! π‘₯ " π‘₯ % sum id π‘₯ ! id π‘₯ " id π‘₯ % sum(w 1 ,w 2 ,w 3 ) as weight π‘Œ ! 3 4 3 10 π‘Œ ! 3 π‘Œ " 2 π‘Œ % 1 from R 1 , R 2 , R 3 π‘Œ " 4 2 4 10 π‘Œ " 4 π‘Œ ! 4 π‘Œ ! 3 where R 1 .id=R 2 .id π‘Œ % 6 8 1 15 π‘Œ % 6 π‘Œ & 6 π‘Œ " 4 and R 2 .id=R 3 .id π‘Œ & 7 6 6 18 π‘Œ & 7 π‘Œ ' 7 π‘Œ ' 5 order by weight π‘Œ ' 8 7 5 20 π‘Œ ' 8 π‘Œ % 8 π‘Œ & 6 limit 2 ~ Joins on unique object id: 1–1 relationships 11

  11. Naive algorithm: retrieve all items Assumption 1: As 1: Mi Middl ddleware c cost m mode del : β€’ Weights are stored in β„“ distinct relations 𝑆 ! we aggregate rankings of other services. - each 𝑆 ! is sorted by attribute π‘₯ ! β€’ we only pay for accesses to attribute lists β€’ Goal: Find top- 𝑙 with minimal access cost β€’ 2 types of access: sequential / random - get next object in 𝑆 ! sequentially: "sorted" sequential access cost 𝑑 "#$ - obtain the weight for a specific object in 𝑆 ! : random access (index lookup) cost 𝑑 %&'( 𝑆 ! 𝑆 " 𝑆 % select R 1 .id, id π‘₯ ! π‘₯ " π‘₯ % sum id π‘₯ ! id π‘₯ " id π‘₯ % sum(w 1 ,w 2 ,w 3 ) as weight π‘Œ ! 3 4 3 10 π‘Œ ! 3 π‘Œ " 2 π‘Œ % 1 from R 1 , R 2 , R 3 π‘Œ " 4 2 4 10 π‘Œ " 4 π‘Œ ! 4 π‘Œ ! 3 where R 1 .id=R 2 .id π‘Œ % 6 8 1 15 π‘Œ % 6 π‘Œ & 6 π‘Œ " 4 and R 2 .id=R 3 .id π‘Œ & 7 6 6 18 π‘Œ & 7 π‘Œ ' 7 π‘Œ ' 5 order by weight π‘Œ ' 8 7 5 20 π‘Œ ' 8 π‘Œ % 8 π‘Œ & 6 limit 2 Naive algorithm: retrieve all items, sort, return top- 𝑙 Cost = π‘œ β‹… β„“ β‹… 𝑑 "#$% 12

  12. Assumption 2: monotonicity of 𝜍 Assumption 1: As 1: Mi Middl ddleware c cost m mode del : β€’ Weights are stored in β„“ distinct relations 𝑆 ! we aggregate rankings of other services. - each 𝑆 ! is sorted by attribute π‘₯ ! β€’ we only pay for accesses to attribute lists β€’ Goal: Find top- 𝑙 with minimal access cost β€’ 2 types of access: sequential / random - get next object in 𝑆 ! sequentially: "sorted" sequential access cost 𝑑 "#$ - obtain the weight for a specific object in 𝑆 ! : random access (index lookup) cost 𝑑 %&'( 𝑆 ! 𝑆 " 𝑆 % select R 1 .id, id π‘₯ ! π‘₯ " π‘₯ % sum id π‘₯ ! id π‘₯ " id π‘₯ % sum(w 1 ,w 2 ,w 3 ) as weight π‘Œ ! 3 4 3 10 π‘Œ ! 3 π‘Œ " 2 π‘Œ % 1 from R 1 , R 2 , R 3 π‘Œ " 4 2 4 10 π‘Œ " 4 π‘Œ ! 4 π‘Œ ! 3 where R 1 .id=R 2 .id π‘Œ % 6 8 1 15 π‘Œ % 6 π‘Œ & 6 π‘Œ " 4 and R 2 .id=R 3 .id π‘Œ & 7 6 6 18 π‘Œ & 7 π‘Œ ' 7 π‘Œ ' 5 order by weight π‘Œ ' 8 7 5 20 π‘Œ ' 8 π‘Œ % 8 π‘Œ & 6 limit 2 Part 3: tropical semiring (min, sum) is instance Assumption 2: As 2: The aggregate function 𝜍 is mo monotone : of " sele lective di dioid " (i.e. min(a,b) = a or b). , if π‘₯ ! ≀ π‘₯ ! , for all i , , π‘₯ * , , … , π‘₯ β„“ 𝜍 π‘₯ ) , π‘₯ * , … , π‘₯ β„“ ≀ 𝜍 π‘₯ ) 𝜍 is decomposable: 𝜍 π‘₯ ) , π‘₯ * , π‘₯ - = 𝜍{π‘₯ ) , π‘₯ * , π‘₯ - } 13

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend