
StRoM: Smart Remote Memory



  1. StRoM: Smart Remote Memory
     David Sidler*‡, Zeke Wang†‡, Monica Chiosa‡, Amit Kulkarni‡, Gustavo Alonso‡
     * Microsoft Corporation
     † Collaborative Innovation Center of Artificial Intelligence, Zhejiang University
     ‡ Systems Group, Department of Computer Science, ETH Zürich

  2. Increasing Compute-Bandwidth Gap
     ▪ Increase in CPU cycles allocated towards network processing
     ▪ Context switches between OS network stack and application amplify the issue
     [Figure: relative speedup (log scale) of CPU frequency vs. network bandwidth, 1980 to 2020; the widening difference is labeled "Compute-Bandwidth Gap"]

  3. RDMA (Remote Direct Memory Access)
     Complete hardware offload => bypasses OS and CPU
     [Diagram: two nodes, each with Memory, CPU, and NIC]
     ▪ Distributed key-value stores [1,2]
     ▪ Parallel database systems
     ▪ Distributed graph computation [3]
     [1] C. Mitchell, et al., Using One-sided RDMA Reads to build a fast, CPU-efficient key-value store, ATC’13
     [2] A. Dragojevic, et al., FaRM: Fast Remote Memory, NSDI’14
     [3] M. Wu, et al., GRAM: Scaling graph computation to the trillions, SoCC’15

  4. Get over RDMA: Two-sided vs One-sided
     Two-sided (Send/Receive)
     ▪ Single round trip
     ▪ Simple client-server model
     ▪ Remote CPU involved
     Steps: (1) client sends GET; (2) remote CPU reads the hash entry, compares keys, and reads the value; (3) remote node sends the value
     One-sided (Direct Access)
     ▪ Remote CPU not involved
     ▪ At least 2 RTs necessary
     ▪ Handling of misses costly
     Steps: (1) client reads the hash table; (2) client CPU compares keys; (3) client reads the value
     No clear winner
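The round-trip difference between the two GET variants can be illustrated with a small simulation. This is only a toy sketch, not the authors' code: `RemoteNode`, `Client`, `bucket_of`, and the 64-bucket chained table are hypothetical names and choices made for illustration, and network transfers are modeled as a simple counter. It also ignores the extra reads a miss or a long collision chain might need.

```python
import zlib

NUM_BUCKETS = 64  # assumption: small fixed-size table, chosen for illustration


def bucket_of(key: str) -> int:
    """Deterministic hash so both sides agree on the bucket."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS


class RemoteNode:
    """Toy remote memory: a chained hash table plus a value store."""

    def __init__(self, items):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]
        self.values = []
        for key, value in items.items():
            self.buckets[bucket_of(key)].append((key, len(self.values)))
            self.values.append(value)

    def handle_get(self, key):
        """Two-sided path: the remote CPU does the whole lookup."""
        for stored_key, index in self.buckets[bucket_of(key)]:
            if stored_key == key:
                return self.values[index]
        return None


class Client:
    def __init__(self, node):
        self.node = node
        self.round_trips = 0

    def get_two_sided(self, key):
        self.round_trips += 1  # send GET, receive value
        return self.node.handle_get(key)

    def get_one_sided(self, key):
        self.round_trips += 1  # read of the hash bucket
        entries = list(self.node.buckets[bucket_of(key)])
        for stored_key, index in entries:  # key comparison on the client CPU
            if stored_key == key:
                self.round_trips += 1  # read of the value
                return self.node.values[index]
        return None
```

In this model a successful two-sided GET costs one round trip and a successful one-sided GET costs two, matching the trade-off listed on the slide.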

  5. StRoM: Smart Remote Memory
     StRoM: Deployment of acceleration kernels on the NIC
     StRoM kernel
     ▪ Direct access to host memory
     ▪ Able to receive/transmit data over RDMA
     [Diagram: Memory, CPU, and NIC, with the StRoM kernel located on the NIC]

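  6. GET as StRoM Kernel
     The kernel on the remote NIC:
     ▪ Read hash entry
     ▪ Compare keys
     ▪ Read value
     Properties:
     ▪ Single round trip
     ▪ Remote CPU not involved
     Steps: (1) client issues RDMA RPC GET; (2) kernel on the NIC accesses the hash table and value store in remote memory; (3) kernel writes the value back to the client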
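The three kernel steps (read hash entry, compare keys, read value) can be sketched in software over a flat byte buffer standing in for host memory. The layout below is purely an assumption for illustration: the entry format (16-byte key, 4-byte offset, 4-byte length), the table size, and the names `ENTRY`, `slot_of`, `put`, and `get_kernel` are hypothetical and do not come from the slides or the StRoM code base. Collisions are not handled.

```python
import struct
import zlib

# Assumed hash entry layout: zero-padded 16-byte key, value offset, value length.
ENTRY = struct.Struct("<16sII")
NUM_ENTRIES = 8
TABLE_BYTES = ENTRY.size * NUM_ENTRIES


def slot_of(key: bytes) -> int:
    return zlib.crc32(key) % NUM_ENTRIES


def put(memory: bytearray, key: bytes, value: bytes) -> None:
    """Host-side helper: append the value and fill the hash entry."""
    offset = len(memory)
    memory.extend(value)
    ENTRY.pack_into(memory, slot_of(key) * ENTRY.size, key, offset, len(value))


def get_kernel(memory: bytearray, key: bytes):
    """Models the work done on the NIC for one GET; the host CPU is not involved."""
    # Step 1: read the hash entry from host memory.
    stored_key, offset, length = ENTRY.unpack_from(memory, slot_of(key) * ENTRY.size)
    # Step 2: compare keys.
    if stored_key.rstrip(b"\x00") != key:
        return None
    # Step 3: read the value; it would then be written back to the client.
    return bytes(memory[offset:offset + length])


# Usage: the table occupies the start of memory, values are appended after it.
memory = bytearray(TABLE_BYTES)
put(memory, b"alpha", b"hello")
result = get_kernel(memory, b"alpha")
```

Because all three memory accesses happen next to the remote memory, the client sees a single request and a single response.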

  7. Acceleration Capabilities
     Accelerating Data Access: invoke one-sided RPCs on the remote NIC
     ▪ Traversal of remote data structures
     ▪ Verification of data objects
     ▪ Manipulation of simple data structures
     Accelerating Data Processing: on-the-fly data processing when transmitting/receiving
     ▪ Data shuffling
     ▪ Filtering
     ▪ Pattern/event detection
     ▪ Aggregation
     ▪ Compression
     ▪ Statistics gathering

  8. Use Case: Gathering Statistics
     HyperLogLog (HLL) kernel to estimate cardinality of a data set
     • Bump-in-the-wire kernel
     • Cardinality estimation can augment the optimizer in data processing systems
     [Diagram: a node issues an RDMA RPC WRITE; the HLL kernel on the receiving NIC passes the data through Hash, Leading Zeros, Buckets, and Harmonic Mean stages; data and statistics are written to remote memory]
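The four stages named in the diagram (hash, leading zeros, buckets, harmonic mean) correspond to the standard HyperLogLog algorithm, which can be sketched in software as follows. The parameters are assumptions, not taken from the slides: a 64-bit BLAKE2b hash, 2^12 buckets, and the usual bias constant and small-range correction from the HLL literature. The hardware kernel may differ in all of these.

```python
import hashlib
import math

P = 12          # assumed: number of hash bits used to select a bucket
M = 1 << P      # number of buckets


def hash64(item: bytes) -> int:
    """Hash stage: map an item to a 64-bit integer."""
    return int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")


def hll_estimate(items) -> float:
    buckets = [0] * M
    for item in items:
        h = hash64(item)
        index = h >> (64 - P)                  # top P bits pick the bucket
        rest = h & ((1 << (64 - P)) - 1)       # remaining 52 bits
        # Leading-zeros stage: position of the first 1 bit in the remaining bits.
        rank = (64 - P) - rest.bit_length() + 1
        # Buckets stage: keep the maximum rank seen per bucket.
        buckets[index] = max(buckets[index], rank)

    # Harmonic-mean stage with the standard bias constant for large M.
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -b for b in buckets)

    # Standard small-range correction (linear counting) when many buckets are empty.
    empty = buckets.count(0)
    if raw <= 2.5 * M and empty:
        return M * math.log(M / empty)
    return raw


# Usage: duplicates do not change the estimate of distinct items.
data = [str(i % 5000).encode() for i in range(10000)]
estimate = hll_estimate(data)
```

Since each item only updates one bucket, the sketch needs a single pass over the data, which is what makes it suitable for a bump-in-the-wire kernel.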

  9. Evaluation – StRoM NIC
     • FPGA-based prototype RDMA NIC
     • Extended RoCEv2 implementation with support for StRoM
     StRoM at 10G: Alpha Data ADM-PCIE-7V3
     StRoM at 100G: Xilinx VCU118

  10. Evaluation – GET kernel
     [Figure; annotation: 5 μs]

  11. Evaluation – HLL kernel

  12. Conclusion
     StRoM: Smart Remote Memory
     • Deployment of acceleration kernels on the NIC
     • Acceleration of data access and data processing at up to 100G
     • Research platform
     Open source at github.com/fpgasystems/fpga-network-stack
