  1. Spinning Relations: High-Speed Networks for Distributed Join Processing
     Philip Frey, Romulo Goncalves, Martin Kersten, Jens Teubner

  2. Problem Statement
     We address a core database problem, but for large problem sizes: process a join R ⋈θ S (arbitrary join predicate θ).
     R and S are large (many gigabytes, even terabytes).
     Traditional approach: use a big machine and/or suffer the severe disk I/O bottleneck of block nested loops join (see the sketch below). Distributed evaluation works only for certain θ or certain data distributions (or at a high network I/O cost).
     Today: assume a cluster of commodity machines only, and leverage modern high-speed networks (10 Gb/s and beyond).
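     To make the I/O bottleneck concrete, here is a minimal Python sketch of a block nested loops θ-join (my illustration, not the talk's code; `scan_blocks` is a hypothetical helper standing in for block-wise disk reads):

def scan_blocks(relation, block_size):
    """Yield successive memory-sized blocks (lists of tuples) of a relation."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]

def block_nested_loops_join(R, S, theta, block_size):
    """Join R and S under an arbitrary predicate `theta`.

    S is re-read in full once per block of R -- the severe disk I/O
    bottleneck: roughly |R| / block_size passes over all of S.
    """
    result = []
    for r_block in scan_blocks(R, block_size):      # one pass over R
        for s_block in scan_blocks(S, block_size):  # full pass over S per R block
            for r in r_block:
                for s in s_block:
                    if theta(r, s):
                        result.append((r, s))
    return result

# Example: a band join -- a predicate that hash-based distribution cannot handle.
R = [(i,) for i in range(100)]
S = [(j,) for j in range(100)]
out = block_nested_loops_join(R, S, lambda r, s: abs(r[0] - s[0]) <= 1, 16)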

  3. Modern Networks: High Speed?
     It is actually very hard to saturate modern (e.g., 10 Gb/s) networks.
     [Figure: two systems, each with CPU, RAM, and NIC, connected by an underutilized network.]
     High CPU demand
     ◮ Rule of thumb: 1 GHz of CPU per 1 Gb/s of network throughput (!)
     Memory bus contention
     ◮ Data typically has to cross the memory bus three times → ≈ 3 GB/s bus capacity needed for a 10 Gb/s network
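     A quick sanity check of the bus-capacity figure (my arithmetic, not from the slide):

        10 Gb/s ≈ 1.25 GB/s of data on the wire,
        3 bus crossings × 1.25 GB/s ≈ 3.75 GB/s of bus traffic,

     consistent with the slide's ≈ 3 GB/s once protocol overhead keeps the effective payload rate below the raw line rate.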

  4. RDMA: Remote Direct Memory Access
     RDMA-capable network cards (RNICs) can saturate the link using direct data placement (avoid unnecessary bus transfers), OS bypassing (avoid context switches), and TCP offloading (avoid CPU load).
     [Figure: two systems, each with CPU, RAM, and RNIC, connected by a fully utilized network.]
     Data is read/written on both ends using intra-host DMA.
     The transfer proceeds asynchronously after the CPU issues a work request.
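     To illustrate the asynchronous work-request flow, here is a toy Python model (a conceptual simulation only, not the real RNIC interface; all names are made up): the CPU posts a request and keeps working while a separate agent places the data and signals completion.

import threading, queue

work_queue = queue.Queue()        # work requests posted by the CPU
completion_queue = queue.Queue()  # completions later polled by the CPU

def rnic(remote_memory):
    """Toy RNIC: places the data directly, without involving the CPU."""
    while True:
        wr = work_queue.get()
        if wr is None:
            break
        src, remote_addr = wr
        remote_memory[remote_addr:remote_addr + len(src)] = src  # "DMA"
        completion_queue.put(("RDMA_WRITE_COMPLETE", remote_addr))

remote = bytearray(1024)  # stands in for a pre-registered remote buffer
threading.Thread(target=rnic, args=(remote,), daemon=True).start()

work_queue.put((b"tuple data", 0))  # CPU issues the work request ...
# ... and is free to do other work until it polls the completion queue:
print(completion_queue.get())       # ('RDMA_WRITE_COMPLETE', 0)
work_queue.put(None)                # shut the toy RNIC down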

  5. Cyclo-Join Idea
     1 distribute the inputs R and S over the hosts
     2 join locally on every host
     3 rotate R via RDMA: join and rotate until every fragment R_i has met every fragment S_j
     [Figure: six hosts H0–H5 arranged in a ring; each host holds one fragment S_j of S, while the fragments R_i of R rotate around the ring over RDMA links.]
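     The rotation scheme fits in a minimal single-process Python sketch (an illustration of the idea, not the distributed implementation; `local_join` stands in for whatever in-memory join algorithm each host runs):

def cyclo_join(R, S, n_hosts, local_join):
    # 1. distribute: cut both inputs into n fragments, one S fragment per host
    R_frags = [R[i::n_hosts] for i in range(n_hosts)]
    S_frags = [S[i::n_hosts] for i in range(n_hosts)]
    result = []
    for _ in range(n_hosts):
        # 2. join locally on every host
        for host in range(n_hosts):
            result.extend(local_join(R_frags[host], S_frags[host]))
        # 3. rotate the R fragments one position around the ring
        #    (this stands in for the RDMA transfer between neighbours)
        R_frags = R_frags[-1:] + R_frags[:-1]
    return result

# Any in-memory join algorithm and any predicate can be plugged in:
nested_loops = lambda r, s: [(x, y) for x in r for y in s if x[0] == y[0]]
out = cyclo_join([(i,) for i in range(8)], [(j,) for j in range(8)],
                 n_hosts=4, local_join=nested_loops)
assert sorted(out) == [((i,), (i,)) for i in range(8)]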

  6. Analysis
     Cyclo-join has similarities to block nested loops join:
     cut the input data into blocks R_i and S_j, then join all combinations R_i ⋈ S_j in memory.
     As such, cyclo-join
     can be paired with any in-memory join algorithm,
     can be used to distribute the processing of any join predicate.
     Cyclo-join fits into a "cloud-style" environment:
     additional nodes can be hooked in as needed,
     arbitrary assignment host ↔ task,
     cyclo-join consumes and produces distributed tables → n-way joins.

  7. Cyclo-Join Put Into Practice
     We implemented a prototype of cyclo-join:
     four processing nodes
     ◮ Intel Xeon quad-core 2.33 GHz
     ◮ 6 GB RAM per node; memory bandwidth: 3.4 GB/s (measured)
     10 Gb/s Ethernet
     ◮ Chelsio T3 RDMA-enabled network cards
     ◮ Nortel 10 Gb/s Ethernet switch
     in-memory hash join
     ◮ hash phase physically re-organizes the data (on each node) → better cache efficiency during the join phase
     ◮ I/O complexity: O(|R| + |S|)
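     A minimal Python sketch of such a hash join (my illustration, not the prototype's code) shows how the hash phase physically re-organizes both inputs into buckets, so the join phase only ever combines matching, cache-sized buckets and each tuple is touched a constant number of times — hence O(|R| + |S|):

N_BUCKETS = 64  # would be tuned to the cache size in a real system

def hash_phase(relation, key):
    """Physically re-organize `relation` into hash buckets."""
    buckets = [[] for _ in range(N_BUCKETS)]
    for t in relation:
        buckets[hash(key(t)) % N_BUCKETS].append(t)
    return buckets

def hash_join(R, S, key_r, key_s):
    R_buckets = hash_phase(R, key_r)
    S_buckets = hash_phase(S, key_s)
    result = []
    # Join phase: matching keys always land in the same bucket pair,
    # and each pair is small enough to be joined cache-efficiently.
    for rb, sb in zip(R_buckets, S_buckets):
        table = {}
        for r in rb:
            table.setdefault(key_r(r), []).append(r)
        for s in sb:
            for r in table.get(key_s(s), []):
                result.append((r, s))
    return result

# Usage: hash_join(R, S, key_r=lambda t: t[0], key_s=lambda t: t[0])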

  8. Experiments
     Experiment 1: Distribute the evaluation of a join where |R| = |S| = 1.8 GB.
     [Figure: stacked bars of wall-clock time in seconds (hash buildup, synchronization, join execution) for 1–4 hosts, each processing 1.8 GB ⋈ 1.8 GB, next to a single-host MonetDB reference; times range up to ≈ 80 s.]
     Main benefit: reduced hash buildup time.

  9. Experiments
     Experiment 2: Scale up and join a larger S (hash buildup ignored here).
     [Figure: stacked bars of wall-clock time in seconds (synchronization, join execution) for 1–4 hosts and S ⋈ R sizes of 1.8 ⋈ 1.8, 3.6 ⋈ 1.8, 5.4 ⋈ 1.8, and 7.2 ⋈ 1.8 GB; totals grow from ≈ 1.35 s to ≈ 3.54 s, with synchronization accounting for up to ≈ 0.80 s.]
     ✓ The system scales like a machine with large RAM would.
     ✗ CPUs have to wait for network transfers ("synchronization").

  10. Memory Transfers
     Need to wait for the network: does that mean RDMA doesn't work at all?
     Transferring one fragment takes 1.8 GB / 10 Gb/s = 1.44 s.
     [Figure: memory bus bandwidth in GB/s over time for one round on 3 hosts (5.4 GB ⋈ 1.8 GB); the local join R_i ⋈ S_j runs for 2.83 s while the RDMA transfer proceeds in the background, followed by 0.58 s of synchronization.]
     The culprit is the local memory bus! If RDMA hadn't saved us some bus transfers, this would be worse.
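     Restating the slide's arithmetic:

        1.8 GB × 8 bit/byte = 14.4 Gb,  and  14.4 Gb / 10 Gb/s = 1.44 s,

     well below the 2.83 s the local join runs, so the link itself is not the limit: the incoming DMA traffic instead competes with the join for the node's ≈ 3.4 GB/s memory bus (the bandwidth measured on slide 7).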

  11. Conclusions
     I demonstrated cyclo-join:
     a ring topology to process large joins,
     use of distributed memory to process arbitrary joins,
     hardware acceleration via RDMA is crucial:
     ◮ it reduces CPU load and memory bus contention.
     Cyclo-join is part of the Data Cyclotron project:
     support for more local join algorithms,
     processing of full queries in a merry-go-round setup.
