Efficient Join Processing across Heterogeneous Processors
Henning Funke, Sebastian Breß, Stefan Noll, Jens Teubner
December 15, 2015
GPUs IN DATABASES ARE LIKE A MUSCLE CAR IN A TRAFFIC JAM
Bottleneck
really?
◮ Pipeline join probe and result compaction in shared memory
1 Based on: Alcantara, Dan Anthony Feliciano. Efficient hash tables on the
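The probe-then-compact pipeline named above can be illustrated sequentially. The following is a minimal sketch in plain Python, not the authors' GPU kernel: it assumes a simple linear-probing hash table with unique build keys rather than the hashing scheme of the cited work, and the names `build_table`, `lookup` and `probe_and_compact` are hypothetical. Each probe records a match flag, an exclusive prefix sum over the flags assigns every match its output slot, and the matches are then written densely. On a GPU, the prefix sum and the dense write are the steps that would run per thread block in shared memory.

```python
EMPTY = None  # marker for an unused table slot


def build_table(build_rows, capacity):
    """Insert (key, payload) pairs with linear probing.

    Assumes unique keys and capacity > len(build_rows), so that every
    probe sequence eventually reaches an empty slot.
    """
    table = [EMPTY] * capacity
    for key, payload in build_rows:
        slot = hash(key) % capacity
        while table[slot] is not EMPTY:
            slot = (slot + 1) % capacity
        table[slot] = (key, payload)
    return table


def lookup(table, key):
    """Return (found, payload) for a single probe key."""
    capacity = len(table)
    slot = hash(key) % capacity
    while table[slot] is not EMPTY:
        if table[slot][0] == key:
            return True, table[slot][1]
        slot = (slot + 1) % capacity
    return False, None


def probe_and_compact(table, probe_keys):
    """Probe every key, then compact the matches into a dense output."""
    # Step 1: probe. One flag and one payload per input position.
    flags, payloads = [], []
    for key in probe_keys:
        found, payload = lookup(table, key)
        flags.append(1 if found else 0)
        payloads.append(payload)

    # Step 2: exclusive prefix sum over the flags gives output positions.
    offsets, running = [], 0
    for flag in flags:
        offsets.append(running)
        running += flag

    # Step 3: scatter matches to their positions; the output has no gaps.
    out = [None] * running
    for i, flag in enumerate(flags):
        if flag:
            out[offsets[i]] = (probe_keys[i], payloads[i])
    return out


table = build_table([(1, "a"), (2, "b"), (3, "c")], capacity=8)
result = probe_and_compact(table, [3, 5, 1, 1, 7])
```

Here `result` contains only the matching probes, in input order, without the gaps that a one-slot-per-probe output would leave.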
◮ Scalability to large data
◮ Communication
◮ Local and remote resources
Relations: High-Speed Networks for Distributed Join Processing. In DaMoN, 2009.
[Figure: probe throughput (GB/s) against the number of CPU worker threads, for CPU alone and CPU + GPU, on an Intel Xeon E5-1607 v2 and an NVIDIA GeForce GTX 970.]
[Figure: bandwidth diagram. Labels as shown on the slide:]
149 GB/s
Per core (4): scan 15.8 GB/s, gather 4 GB/s, even share 7.8 GB/s
Local prefix scan 84 GB/s, gather 33.2 GB/s
12 GB/s, 31 GB/s
◮ PCI Express bus and main memory can become a bottleneck
◮ Take bandwidth footprint and throughput into account
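One way to take bandwidth footprint into account is a back-of-the-envelope bound: for every resource, an operator's tuple rate cannot exceed that resource's bandwidth divided by the bytes the operator moves over it per tuple. The sketch below is only an illustration. The function name `throughput_bound`, the resource labels and the per-tuple byte counts are hypothetical, and reading the slide's 12 GB/s and 31 GB/s figures as PCI Express and main-memory bandwidth is an assumption that the slide text does not confirm.

```python
def throughput_bound(resources):
    """Return (limiting resource, upper bound on tuples per second).

    resources maps a name to (bandwidth in GB/s, bytes moved per tuple).
    Resources that move no bytes per tuple impose no bound and are skipped.
    """
    limiting, best_rate = None, float("inf")
    for name, (gb_per_s, bytes_per_tuple) in resources.items():
        if bytes_per_tuple == 0:
            continue
        rate = gb_per_s * 1e9 / bytes_per_tuple
        if rate < best_rate:
            limiting, best_rate = name, rate
    return limiting, best_rate


# Hypothetical footprints: 8 bytes per tuple cross the bus in both cases,
# but the second plan touches main memory more heavily (24 vs. 16 bytes).
# The bandwidth-to-resource assignment is an assumption.
light_plan = {"pcie": (12.0, 8), "main_memory": (31.0, 16)}
heavy_plan = {"pcie": (12.0, 8), "main_memory": (31.0, 24)}

light_limit, light_rate = throughput_bound(light_plan)
heavy_limit, heavy_rate = throughput_bound(heavy_plan)
```

With these assumed footprints the first plan is bounded by the bus and the second by main memory, so either resource can become the limit depending on how many bytes per tuple a plan moves over it.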
◮ Materialize tuples in hash table
◮ Order probe data by hash function
◮ Pipeline data between GPU kernels
◮ Concurrent kernel execution
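Ordering probe data by hash function can be sketched as a stable counting sort on the bucket index, presumably so that consecutive probes touch neighbouring regions of the hash table (the slide does not state the motivation, so this is an assumption). The names `bucket_of` and `order_by_hash` and the bucket count are hypothetical; this is a sequential illustration, not the authors' implementation.

```python
def bucket_of(key, num_buckets):
    """Map a key to its hash bucket."""
    return hash(key) % num_buckets


def order_by_hash(probe_keys, num_buckets):
    """Reorder probe keys so that keys of the same bucket are adjacent.

    Stable counting sort: count keys per bucket, turn the counts into
    start offsets with an exclusive prefix sum, then scatter.
    """
    counts = [0] * num_buckets
    for key in probe_keys:
        counts[bucket_of(key, num_buckets)] += 1

    starts, running = [], 0
    for count in counts:
        starts.append(running)
        running += count

    cursor = list(starts)  # next free position within each bucket's range
    ordered = [None] * len(probe_keys)
    for key in probe_keys:
        b = bucket_of(key, num_buckets)
        ordered[cursor[b]] = key
        cursor[b] += 1
    return ordered


ordered = order_by_hash([5, 2, 9, 4, 1, 6], num_buckets=4)
```

After the reordering, all keys of bucket 0 come first, then bucket 1, and so on, while keys within a bucket keep their original relative order.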
◮ PCIe is not the dominating
◮ Dataflow oriented processing
◮ Move part of probes to coprocessor
◮ Join processing → query processing
◮ Compile pipelined operator sequences
◮ Stream arbitrary columns
Morsel-driven parallelism SIGMOD 2014
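Moving part of the probes to the coprocessor can be sketched as a shared work queue in the spirit of morsel-driven parallelism: the probe input is cut into fixed-size morsels, and every worker pulls the next morsel whenever it is idle, so a faster device takes a larger share without a static split. The following is a minimal sketch under stated assumptions. The worker named `gpu` is an ordinary Python thread standing in for a coprocessor, and `run_probe`, the worker names and the morsel size are hypothetical.

```python
import queue
import threading


def run_probe(probe_keys, build_keys, worker_names, morsel_size=4):
    """Probe in morsels pulled from a shared queue by several workers.

    Returns the sorted matching keys and the number of probe tuples
    each worker processed.
    """
    table = set(build_keys)  # stand-in for the join hash table

    # Cut the probe input into morsels and enqueue them all up front.
    morsels = queue.Queue()
    for start in range(0, len(probe_keys), morsel_size):
        morsels.put(probe_keys[start:start + morsel_size])

    matches = []
    processed = {name: 0 for name in worker_names}
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                morsel = morsels.get_nowait()
            except queue.Empty:
                return  # no work left
            local = [key for key in morsel if key in table]
            with lock:  # protect the shared result and counters
                matches.extend(local)
                processed[name] += len(morsel)

    threads = [threading.Thread(target=worker, args=(name,))
               for name in worker_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(matches), processed


matches, processed = run_probe(list(range(20)), [3, 7, 11, 25],
                               ["cpu0", "cpu1", "gpu"])
```

How the 20 probe tuples are divided among the workers varies from run to run, but the join result does not depend on which worker handled which morsel.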