
Design Guidelines for High Performance RDMA Systems (Anuj Kalia)



  1. Design Guidelines for High Performance RDMA Systems. Anuj Kalia (CMU), Michael Kaminsky (Intel Labs), David Andersen (CMU)

  2. RDMA is cheap (and fast!) Mellanox Connect-IB: 2x 56 Gbps InfiniBand, ~2 µs RTT, RDMA, $1300. Problem: performance depends on complex low-level factors.

  3. Background: RDMA read. [Diagram labels: CPU (cores, L3), PCI Express, NIC; arrows: RDMA read request, DMA read, RDMA read response.]

  4. How to design a sequencer? [Diagram: a server and two clients, with the sequence numbers 87 and 88.]
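For readers unfamiliar with the term, a sequencer hands out consecutive numbers so that every caller gets a unique value, as with the 87 and 88 on slide 4. The following is a minimal in-process sketch of that contract only; it involves no RDMA, and the class and method names are made up for illustration rather than taken from the talk.

```python
class Sequencer:
    """Hands out consecutive integers; each call returns a value exactly once."""

    def __init__(self, start: int = 0) -> None:
        self._next = start

    def next(self) -> int:
        # Return the current value, then advance the counter.
        value = self._next
        self._next += 1
        return value


seq = Sequencer(start=87)
first_client = seq.next()
second_client = seq.next()
print(first_client, second_client)
```

The rest of the deck is about where this counter should live and which RDMA operations should reach it.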

  5. Which RDMA ops to use? Remote CPU bypass (one-sided): Read, Write, Fetch-and-add, Compare-and-swap. Remote CPU involved (messaging, two-sided): Send, Recv. Perf? Fetch-and-add: 2.2 M/s.

  6. How we sped up the sequencer by 50X

  7. Large RDMA design space. Operations: READ, WRITE, ATOMIC (remote bypass, one-sided); SEND, RECV (two-sided). Transports: reliable or unreliable; connected or datagram. Optimizations: inlined, unsignaled, doorbell batching, WQE shrinking, 0B-RECVs.

  8. Guidelines. NICs have multiple processing units (PUs): avoid contention, exploit parallelism. PCI Express messages are expensive: reduce CPU-to-NIC messages (MMIOs), reduce NIC-to-CPU messages (DMAs).

  9. High contention w/ atomics. Fetch&Add(A, 1) on sequence counter A. Latency ~500 ns, throughput ~2 M/s. [Diagram labels: CPU (cores, L3, sequence counter A), PCI Express (DMA read, DMA write), NIC (PUs).]

  10. Reduce contention: use CPU cores. Core to L3: 20 ns; PCI Express: 500 ns. [Diagram labels: cores, L3 (counter A), DMA write, NIC, RDMA write (RPC req), SEND (RPC resp).] [HERD, SIGCOMM 14]
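The latencies on slides 9 and 10 fit a simple serialization bound: if each update to the counter must finish before the next one starts, throughput is at most one over the per-update latency. The sketch below is only that arithmetic, assuming strict serialization; the function name is hypothetical.

```python
def serialized_ops_per_sec(latency_ns: float) -> float:
    """Upper bound on ops/s when each op must finish before the next starts."""
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    return 1e9 / latency_ns


# Slide 9: ~500 ns per contended Fetch&Add across PCI Express.
atomics_bound = serialized_ops_per_sec(500)
# Slide 10: ~20 ns for a core to reach the counter in L3.
core_bound = serialized_ops_per_sec(20)

print(f"atomics: {atomics_bound / 1e6:.0f} M/s, core to L3: {core_bound / 1e6:.0f} M/s")
```

The 500 ns figure gives 2 M/s, which matches the throughput quoted on slide 9. The 20 ns figure is only a ceiling for the counter update itself; a real RPC handler also pays for receiving requests and sending responses, and the deck reports 7 M/s for one core.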

  11. Sequencer throughput (M/s): Atomics 2.2; RPC (1 core) 7. [Bar chart, y-axis 0 to 150 M/s, with a 50x annotation.]

  12. Reduce MMIOs w/ Doorbell batching. Push: the CPU writes each SEND to the NIC with MMIOs ⇒ lots of CPU cycles. Pull: the NIC fetches the SENDs from the CPU with DMA.

  13. RPCs w/ Doorbell batching. [Diagram: requests and responses between CPU and NIC in two modes, Push and Pull (Doorbell batching).]
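The push versus pull contrast on slides 12 and 13 can be read as a count of CPU-to-NIC MMIOs. The toy model below assumes one MMIO per SEND in push mode and one doorbell MMIO per batch in pull mode; both are simplifications for illustration, not measured NIC behavior, and `mmio_count` is a made-up helper. The batch size of 14 is borrowed from the "14 resps/doorbell" annotation on slide 21.

```python
def mmio_count(num_sends: int, batch_size: int = 1) -> int:
    """CPU-to-NIC MMIOs under the toy model.

    batch_size == 1 models push mode (assumed: one MMIO per SEND).
    batch_size > 1 models doorbell batching (assumed: one MMIO per batch,
    with the NIC fetching the queued SENDs by DMA).
    """
    if num_sends < 0 or batch_size < 1:
        raise ValueError("num_sends must be >= 0 and batch_size >= 1")
    # Integer ceiling division: a partial final batch still needs a doorbell.
    return (num_sends + batch_size - 1) // batch_size


push_mmios = mmio_count(1400)
batched_mmios = mmio_count(1400, batch_size=14)
print(push_mmios, batched_mmios)
```

Under these assumptions the CPU issues 14 times fewer MMIOs for the same number of responses; the work of moving the SENDs shifts to NIC-initiated DMA.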

  14. Sequencer throughput (M/s): Atomics 2.2; RPC (1 C) 7; +Dbell batching 16.6. [Bar chart, y-axis 0 to 150 M/s, with a 50x annotation.]

  15. Exploit NIC parallelism w/ multiQ. [Diagram labels: CPU (cores, L3, counter A), PCI Express, SEND (RPC resp); one element is marked Idle and another Bottleneck.]

  16. Sequencer throughput (M/s): Atomics 2.2; RPC (1 C) 7; +Dbell batching 16.6; +3 queues 27.4. [Bar chart, y-axis 0 to 150 M/s, with a 50x annotation.]

  17. Sequencer throughput (M/s): Atomics 2.2; RPC (1 C) 7; +Batching 16.6; +3 queues 27.4; +6 cores 97.2. Bottleneck = PCIe DMA bandwidth (paper). [Bar chart, y-axis 0 to 150 M/s, with a 50x annotation.]

  18. Reduce DMA size: Header-only. [Diagram: a SEND spanning bytes 0 to 128 (segments labeled Header, Imm, Size, Data, Unused; sizes shown: 64B, 4B, 8B, 52B) shrinks to bytes 0 to 64 (Header, Imm) once the payload is moved.]

  19. Sequencer throughput (M/s): Atomics 2.2; RPC (1 C) 7; +4 Queues, Dbell batching 27.4; +6 cores 97.2; +Header-only 122. [Bar chart, y-axis 0 to 150 M/s, with a 50x annotation.]
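The cumulative effect of the optimizations can be checked from the throughput values on slides 11 to 19. The sketch below divides each value by the atomics baseline; the pairing of values with labels is inferred from the order in which the bars are added, so treat it as a reading of the charts rather than as data from the paper.

```python
# Throughput in M/s as shown on slides 11 to 19 (label pairing is inferred).
steps = [
    ("Atomics", 2.2),
    ("RPC (1 core)", 7.0),
    ("+Doorbell batching", 16.6),
    ("+Multiple queues", 27.4),
    ("+6 cores", 97.2),
    ("+Header-only", 122.0),
]

baseline = steps[0][1]
# Speedup of each configuration relative to the atomics-based sequencer.
speedups = {name: mops / baseline for name, mops in steps}

for name, mops in steps:
    print(f"{name:<20} {mops:6.1f} M/s  {speedups[name]:5.1f}x")
```

The last ratio comes out at roughly 55, in line with the "50X" in the title of slide 6.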

  20. Evaluation: evaluation of optimizations on 3 RDMA generations; PCIe models, bottlenecks; more atomics experiments; example: atomic operations on multiple addresses.

  21. RPC-based key-value store. HERD [SIGCOMM 14]; 16B keys, 32B values, 5% PUTs. [Line chart: throughput (M/s, 0 to 100) vs. number of cores (0 to 14) for Baseline and +Doorbell Batching; annotations: 9 resps/doorbell and 14 resps/doorbell.]

  22. Conclusion. NICs have multiple processing units (PUs): avoid contention, exploit parallelism. PCI Express messages are expensive: reduce CPU-to-NIC messages (MMIOs), reduce NIC-to-CPU messages (DMAs). Code: https://github.com/anujkaliaiitd/rdma_bench
