
Accelerating the merge phase of sort-merge join (FPL 2019)



  1. Accelerating the merge phase of sort-merge join
     FPL 2019 – The 29th International Conference on Field-Programmable Logic and Applications
     Philippos Papaphilippou, Holger Pirk, Wayne Luk
     Dept. of Computing, Imperial College London, UK
     {pp616, pirk, w.luk}@imperial.ac.uk
     Source code: philippos.info/mergejoin
     9/9/2019

  2. The task: equi-join
     Table A (A-Key, Value): (A1, 2), (A2, 2), (A3, 3), (A4, 3), (A5, 3), (A6, 11)
     Table B (B-Key, Value): (B1, 2), (B2, 2), (B3, 3), (B4, 5), (B5, 6)
     A ⨝ B (A-Key, B-Key, Value): (A1, B1, 2), (A1, B2, 2), (A2, B1, 2), (A2, B2, 2), (A3, B3, 3), (A4, B3, 3), (A5, B3, 3)
     ● Equi-join
       – Join two tables based on key equality
       – Cartesian product of the matching rows when a key appears more than once in either table
     ● Popular algorithms
       – Hash-join → Random access pattern
       – Sort-merge join → Streaming access pattern → FPGA friendly
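The equi-join on this slide can be reproduced in software with a textbook two-pointer sort-merge join. The sketch below is only a sequential reference, not the hardware design from the talk; the function name `sort_merge_join` and the `(name, key)` row layout are assumptions made for illustration.

```python
def sort_merge_join(a, b):
    """Equi-join two lists of (name, key) rows, each already sorted by key."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        ka, kb = a[i][1], b[j][1]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(a) and a[i_end][1] == ka:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][1] == ka:
                j_end += 1
            # Duplicate keys produce the Cartesian product of the two runs.
            for x in range(i, i_end):
                for y in range(j, j_end):
                    out.append((a[x][0], b[y][0], ka))
            i, j = i_end, j_end
    return out


# Tables from the slide: (row name, key value).
table_a = [("A1", 2), ("A2", 2), ("A3", 3), ("A4", 3), ("A5", 3), ("A6", 11)]
table_b = [("B1", 2), ("B2", 2), ("B3", 3), ("B4", 5), ("B5", 6)]
result = sort_merge_join(table_a, table_b)
```

Both inputs are read front to back exactly once, which is the streaming access pattern the slide contrasts with the random accesses of a hash join.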

  3. Challenges in related work
     ● Input properties
       – Presence of duplicate keys → complicates the hardware and access patterns
       – Long input → limited storage inside the FPGA
       – Wide input → moving big rows is expensive
       – Some designs are inapplicable or slow down
     ● Data movement
       – Narrow inter-chip (CPU ↔ FPGA) communication
       – Induced latency
     ● Scalability
       – Future technologies (High-throughput)
       – Big data → arbitrarily long tables

  4. Abstracted solution
     ● High-Throughput Stream processor
     ● Inputs
       – Sorted keys of table A
       – Sorted keys of table B
     ● Output
       – Index ranges where the key was the same
     ● Expand on demand (late materialisation)
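"Expand on demand" can be pictured as a lazy generator that turns index ranges into joined rows only when a consumer asks for them. In this sketch the name `materialise`, the tuple layout, and the inclusive end indices are assumptions; the two range tuples were derived by hand from the equi-join example on slide 2.

```python
from itertools import product


def materialise(ranges, rows_a, rows_b):
    """Lazily expand <a_start, a_end, b_start, b_end, key> ranges into row pairs.

    End indices are assumed to be inclusive.
    """
    for a_start, a_end, b_start, b_end, key in ranges:
        for i, j in product(range(a_start, a_end + 1), range(b_start, b_end + 1)):
            yield rows_a[i], rows_b[j], key


rows_a = ["A1", "A2", "A3", "A4", "A5", "A6"]  # keys 2, 2, 3, 3, 3, 11
rows_b = ["B1", "B2", "B3", "B4", "B5"]        # keys 2, 2, 3, 5, 6
ranges = [(0, 1, 0, 1, 2), (2, 4, 2, 2, 3)]    # hand-derived index ranges

joined = list(materialise(ranges, rows_a, rows_b))
```

Only two small tuples need to leave the accelerator for this example, while the seven joined rows are produced on the host side and only if they are needed.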

  5. Proposal
     Building blocks
       – Round-robin module
       – Co-grouping engine
       – Modified FLiMS

  6. Round-robin module
     ● Stream processor
     ● Rearranges sparse input, before writing in multiple banks
     ● Round-robin effect, but in parallel
     [Figure: block diagram; labels: CAS network (bitonic sorter), Barrel Shifters, MSB, SR]
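A behavioural model may help to picture what "round-robin effect, but in parallel" means. The sketch below is an assumption about the observable behaviour only: it compacts the valid lanes of each cycle and spreads them over the banks starting at a rotating write pointer. It does not model the CAS network or the barrel shifters, and the names `round_robin_write` and `read_round_robin` are hypothetical.

```python
def round_robin_write(cycles, num_banks):
    """Write the valid entries of each cycle into banks in round-robin order.

    Each cycle is a list of lanes; None marks an empty (invalid) lane.
    Assumes at most num_banks valid entries per cycle.
    """
    banks = [[] for _ in range(num_banks)]
    ptr = 0  # next bank to receive an element
    for lanes in cycles:
        valid = [x for x in lanes if x is not None]  # compaction, order kept
        for k, x in enumerate(valid):
            banks[(ptr + k) % num_banks].append(x)
        ptr = (ptr + len(valid)) % num_banks
    return banks


def read_round_robin(banks):
    """Read the banks back in round-robin order, recovering the input order."""
    total = sum(len(bank) for bank in banks)
    return [banks[n % len(banks)][n // len(banks)] for n in range(total)]


cycles = [[None, "a", None, "b"], ["c", None, None, None], ["d", "e", "f", None]]
banks = round_robin_write(cycles, 4)
restored = read_round_robin(banks)
```

In software the inner loop is sequential; in the hardware described on the slide, all valid lanes of a cycle would be placed at once.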

  7. Co-grouping engine
     ● Stream processor
     ● Provides ranges of indexes, <index start, index end, key>, where the key was the same
     ● Input: Sorted keys <index, key>
     ● Output: Unique keys, index ranges
     [Figure: P parallel lanes (0 to P-1) with blocks labelled f and g, a 1 cycle delay, and a Round Robin stage]
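The input and output of the co-grouping engine can be illustrated with a short sequential sketch. It assumes inclusive end indices and processes one key at a time, whereas the slide describes a parallel stream processor; the function name `co_group` is hypothetical.

```python
def co_group(keys):
    """Collapse sorted keys into (index_start, index_end, key) tuples.

    The index of a key is its position in the input; end indices are inclusive.
    """
    groups = []
    for index, key in enumerate(keys):
        if groups and groups[-1][2] == key:
            # Same key as the previous element: extend the current range.
            start, _, _ = groups[-1]
            groups[-1] = (start, index, key)
        else:
            # New key: open a new range of length one.
            groups.append((index, index, key))
    return groups


groups_a = co_group([2, 2, 3, 3, 3, 11])  # keys of table A from slide 2
groups_b = co_group([2, 2, 3, 5, 6])      # keys of table B from slide 2
```

After this step every key appears at most once per stream, regardless of how many duplicates the original table contained.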

  8. Join module
     ● Task: merge 2 co-grouped streams
     ● Output: tuples of the form <index A start, index A end, index B start, index B end, key>
     ● Main idea: Sort them together
       – Based on a high-throughput H/W merge sorter (FLiMS [FPT'18])
       – Match same-key groups, by only looking at consecutive elements
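The "sort them together, then look at consecutive elements" idea can be sketched in a few lines. Here Python's `heapq.merge` stands in for the hardware merge sorter (it is not FLiMS), and the tagging scheme, the tuple layout and the name `join_cogrouped` are assumptions for illustration. The inputs are co-grouped streams of `(index_start, index_end, key)` tuples with unique, sorted keys.

```python
import heapq


def join_cogrouped(groups_a, groups_b):
    """Merge two co-grouped streams and match equal keys among neighbours."""
    # Tag each group with its source (0 = A, 1 = B) so that, for equal keys,
    # the group from A always sorts directly before the group from B.
    tagged_a = [(key, 0, start, end) for start, end, key in groups_a]
    tagged_b = [(key, 1, start, end) for start, end, key in groups_b]
    merged = list(heapq.merge(tagged_a, tagged_b))

    out = []
    for prev, cur in zip(merged, merged[1:]):
        # Keys are unique within each stream, so two neighbours with the
        # same key must be one group from A followed by one group from B.
        if prev[0] == cur[0]:
            out.append((prev[2], prev[3], cur[2], cur[3], cur[0]))
    return out


groups_a = [(0, 1, 2), (2, 4, 3), (5, 5, 11)]
groups_b = [(0, 1, 2), (2, 2, 3), (3, 3, 5), (4, 4, 6)]
matches = join_cogrouped(groups_a, groups_b)
```

Because co-grouping removed the duplicates beforehand, a single comparison between neighbours is enough and no backtracking over runs of equal keys is needed.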

  9. Advantages
     ● Input agnostic
       – Index-based
       – Big data analytics
     ● Stream processor
       – FPGA-friendly
     ● Modular design
       – Novel building blocks
       – Can be combined with other: H/W sorters, filters, ...
     ● High-throughput design
       – Scalable for future architectures
       – Lower resources than related work

  10. Evaluation on a heterogeneous system
      ● Platform
        – Zynq UltraScale+ device
        – Operating system: Petalinux
        – Communication: DMA transfers
      ● Speedup of up to 3.1 times
        – 1-port (H/W) vs 1-thread (S/W)
      ● Input design space exploration
        – Fraction of distinct keys (%)
        – Fraction of key matches (%) (directly related to the output size)
      ● Speedup variation factors
        – CPU performance
        – Length of the DMA transfers (CPU → FPGA)
      [Figure: FPGA speedup (colour scale, 1.5 to 3) plotted over distinct keys in A, B (%), from 0 to 100, and output size (# of rows), from 0 to 16384. Empty space: no more key matches than the number of distinct keys.]

  11. END
      Thank you for your attention!
      Source code for Ultra96: philippos.info/mergejoin
