FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds



slide-1
SLIDE 1

FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds

Daehyeok Kim

Tianlong Yu¹, Hongqiang Liu³, Yibo Zhu⁴, Jitu Padhye², Shachar Raindel², Chuanxiong Guo⁴, Vyas Sekar¹, Srinivasan Seshan¹

¹Carnegie Mellon University, ²Microsoft, ³Alibaba Group, ⁴ByteDance

slide-2
SLIDE 2

Two Trends in Cloud Applications

Containerization
  • Lightweight isolation
  • Portability

RDMA networking
  • Higher networking performance

slide-3
SLIDE 3

Benefits of Containerization

[Figure: Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) run network apps on Host 1 behind a software switch and a NIC (IP 30.0.0.1). Container 2 migrates to Host 2 (NIC IP 40.0.0.1, also with a software switch) and keeps IP 20.0.0.1. Annotations: namespace isolation, portability.]

slide-4
SLIDE 4

Containerization and RDMA are in Conflict!

[Figure: With an RDMA NIC, Container 1 and Container 2 on Host 1 both carry the IP 10.0.0.1. After migrating to Host 2, Container 2 carries IP 20.0.0.1. Annotations: namespace isolation, portability.]

slide-5
SLIDE 5

Existing H/W based Virtualization Isn’t Working

Using Single Root I/O Virtualization (SR-IOV)

[Figure: On Host 1, Container 1 (IP 10.0.0.1) and Container 2 (IP 10.0.0.2) each attach to a virtual function (VF 1, VF 2) behind the RDMA NIC's NIC switch. After migrating to Host 2, Container 2 attaches to a VF there and its IP becomes 20.0.0.1. VF = Virtual Function. Annotations: namespace isolation, portability.]

slide-6
SLIDE 6

Sub-optimal Performance of Containerized Apps

RDMA networking can improve the training speed of NN models by ~10x!

[Figure: Image classification CNN training. Training speed (images/sec) for ResNet-50, Inception-v3, and AlexNet; Native RDMA vs. Container+TCP.]

[Figure: Speech recognition RNN training. CDF of time per step (sec); Native RDMA vs. Container+TCP.]

Annotated speedups across the two plots: 9.2x and 14.4x.

slide-7
SLIDE 7

Our Work: FreeFlow

  • Enable high-speed RDMA networking capabilities for containerized applications
  • Compatible with existing RDMA applications
  • Close to native RDMA performance
  • Evaluation with real-world data-intensive applications

slide-8
SLIDE 8

Outline

  • Motivation
  • FreeFlow Design
  • Implementation and Evaluation


slide-9
SLIDE 9

FreeFlow Design Overview

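[Figure: Native RDMA vs. FreeFlow. Native RDMA: the RDMA app calls the Verbs API of the Verbs library, which issues NIC commands to the RDMA NIC. FreeFlow: RDMA apps in Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) call the Verbs API; FreeFlow sits between the containers and the host's Verbs library and RDMA NIC (host IP 30.0.0.1).]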

slide-10
SLIDE 10

Background on RDMA

“Host 1 wants to write contents in MEM-1 to MEM-2 on Host 2”

  • 1. Control path
    • Set up RDMA context
    • Post work requests (e.g., write)
  • 2. Data path
    • NIC processes work requests
    • NIC directly accesses memory

[Figure: Host 1 and Host 2, each with an RDMA app, a Verbs library, an RDMA context (RDMA CTX), a memory region (MEM-1 on Host 1, MEM-2 on Host 2), and an RDMA NIC.]
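The split between the two paths can be illustrated with a toy model. The sketch below is plain Python rather than libibverbs, and it collapses both hosts into a single object for brevity; `ToyNic`, `register`, `post_write`, and `process` are hypothetical names chosen for illustration only.

```python
from collections import deque


class ToyNic:
    """Toy stand-in for an RDMA NIC (hypothetical, not a real API)."""

    def __init__(self):
        self.regions = {}          # name -> registered bytearray
        self.send_queue = deque()  # work requests waiting for the NIC

    # Control path: the application sets up state and posts requests.
    def register(self, name, buf):
        self.regions[name] = buf

    def post_write(self, src, dst, length):
        self.send_queue.append((src, dst, length))

    # Data path: the NIC drains the queue and accesses memory itself.
    def process(self):
        done = 0
        while self.send_queue:
            src, dst, length = self.send_queue.popleft()
            self.regions[dst][:length] = self.regions[src][:length]
            done += 1
        return done


mem1 = bytearray(b"hello rdma")
mem2 = bytearray(len(mem1))

nic = ToyNic()
nic.register("MEM-1", mem1)
nic.register("MEM-2", mem2)

nic.post_write("MEM-1", "MEM-2", len(mem1))  # control path only
before = bytes(mem2)                         # MEM-2 is still untouched
completed = nic.process()                    # data path moves the bytes
```

Posting the request changes nothing in MEM-2; the bytes move only when the NIC processes the queue, without the application copying anything itself.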

slide-11
SLIDE 11

FreeFlow in the Scene

“Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2”

[Figure: In each container, the RDMA app and its Verbs library hold an RDMA context (RDMA CTX) and memory (MEM-1 in Container 1, MEM-2 in Container 2). FreeFlow on each side holds a shadow context (S-RDMA CTX) and shadow memory (S-MEM-1, S-MEM-2) in front of the RDMA NIC.]

  • C1: How to forward verbs calls?
  • C2: How to synchronize memory?

slide-12
SLIDE 12

Challenge 1: Verbs forwarding in Control Path

[Figure: Inside the container, the RDMA app calls the Verbs API through a shim. How the call reaches FreeFlow, whose Verbs library issues NIC commands to the RDMA NIC, is marked “?”.]

  • Attempt 1: Forward “as it is” ➔ Incorrect
  • Attempt 2: “Serialize” and forward ➔ Inefficient

ibv_post_send (struct ibv_qp* qp, …)

struct ibv_qp {
    struct ibv_context *context;
    ….
};
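A small sketch of why both attempts fall short, using Python's `ctypes` to mimic a C struct that holds a pointer. `ToyContext` and `ToyQp` are hypothetical, heavily simplified stand-ins and not the real `ibv_context` or `ibv_qp` layouts. Copying the struct's raw bytes carries only an address that is meaningful in the sender's address space, while a hand-written deep serializer has to know and follow every pointer.

```python
import ctypes
import struct
import sys


class ToyContext(ctypes.Structure):
    # Hypothetical, simplified context with a single field.
    _fields_ = [("num_comp_vectors", ctypes.c_int)]


class ToyQp(ctypes.Structure):
    # Hypothetical, simplified queue pair: one pointer plus one integer.
    _fields_ = [("context", ctypes.POINTER(ToyContext)),
                ("qp_num", ctypes.c_uint32)]


ctx = ToyContext(7)
qp = ToyQp(ctypes.pointer(ctx), 42)

# Attempt 1: forward the struct "as it is" (raw bytes).
raw = ctypes.string_at(ctypes.addressof(qp), ctypes.sizeof(qp))
ptr_size = ctypes.sizeof(ctypes.c_void_p)
forwarded_addr = int.from_bytes(raw[:ptr_size], sys.byteorder)
# The receiver gets the sender's address of ctx, not its contents.

# Attempt 2: serialize by hand, following each pointer explicitly.
deep = struct.pack("<iI", qp.context.contents.num_comp_vectors, qp.qp_num)
received = struct.unpack("<iI", deep)
```

Attempt 2 produces correct data, but a serializer of this kind would have to be written and maintained for every verbs structure, on top of the extra work per call.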

slide-13
SLIDE 13

Internal Structure of Verbs Library

[Figure: Same setup as on the previous slide: the RDMA app and shim in the container, and FreeFlow with its Verbs library issuing NIC commands to the RDMA NIC, with the forwarding step marked “?”.]

ibv_post_send (struct ibv_qp* qp, …)

struct ibv_qp {
    struct ibv_context *context;
    ….
};

Parameters are serialized by Verbs library!

slide-14
SLIDE 14

FreeFlow Control Path Channel

Idea: Leveraging the serialized output of verbs library

[Figure: In the container, the RDMA app calls ibv_post_send (struct ibv_qp* qp, ….) on the FreeFlow library (Verbs library plus shim), which issues Write (VNIC_fd, serialized parameters) to a VNIC. The VNIC hands the parameters to the FreeFlow router, whose Verbs library issues NIC commands to the RDMA NIC.]

Parameters are forwarded correctly without manual serialization!
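The channel idea can be sketched as follows, under stated assumptions: a `socket.socketpair()` stands in for the VNIC file descriptor and the router's end of the channel, and the header layout, `CMD_POST_SEND`, `vnic_write`, and `router_read` are hypothetical names, not FreeFlow's actual wire format. The key property is that the container side forwards bytes that were already serialized and never inspects them.

```python
import socket
import struct

HEADER = struct.Struct("<II")  # hypothetical header: command id, payload length
CMD_POST_SEND = 1              # hypothetical command id


def vnic_write(vnic, cmd, payload):
    """Container side: forward already-serialized bytes unchanged."""
    vnic.sendall(HEADER.pack(cmd, len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes from the channel."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("channel closed")
        buf += chunk
    return buf


def router_read(router):
    """Router side: read one command and its opaque payload."""
    cmd, length = HEADER.unpack(recv_exact(router, HEADER.size))
    return cmd, recv_exact(router, length)


vnic, router = socket.socketpair()

# Stand-in for the bytes the verbs library would have produced.
serialized = struct.pack("<IQ", 42, 0x1000)

vnic_write(vnic, CMD_POST_SEND, serialized)
cmd, payload = router_read(router)

vnic.close()
router.close()
```

Because the payload stays opaque on the container side, no per-structure serializer is needed there; the router can hand the same bytes on to the real device.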

slide-15
SLIDE 15

Challenge 2: Synchronizing Memory for Data Path

  • Shadow memory in FreeFlow router
    • A copy of application’s memory region
    • Directly accessed by NICs
  • S-MEM and MEM must be synchronized.
    • How to synchronize S-MEM and MEM?

[Figure: The container holds the RDMA app, Verbs library, VNIC, RDMA CTX, and MEM. The FreeFlow router holds S-RDMA CTX and S-MEM in front of the RDMA NIC.]

slide-16
SLIDE 16

Strawman Approach for Synchronization

“Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2”

[Figure: On each side, a container runs the RDMA app with its Verbs library, VNIC, RDMA CTX, and memory (MEM-1 on the sender, MEM-2 on the receiver). Each FreeFlow router holds S-RDMA CTX and shadow memory (S-MEM-1, S-MEM-2) in front of the RDMA NIC. DATA travels between the two sides; the synchronization step is marked “?”.]

Explicit synchronization
  • High freq. ➔ High overhead
  • Low freq. ➔ Wrong data for app
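The trade-off of explicit synchronization can be made concrete with a toy simulation. This is only an illustration with made-up sizes and a hypothetical `CopySync` class: the application writes to MEM, the NIC reads from a separate S-MEM, and S-MEM is refreshed by a full copy every `sync_every` writes.

```python
class CopySync:
    """Strawman: S-MEM is a separate buffer refreshed by explicit copies."""

    def __init__(self, size, sync_every):
        self.mem = bytearray(size)    # application memory (MEM)
        self.s_mem = bytearray(size)  # shadow memory (S-MEM)
        self.sync_every = sync_every
        self.writes = 0
        self.bytes_copied = 0

    def app_write(self, offset, data):
        self.mem[offset:offset + len(data)] = data
        self.writes += 1
        if self.writes % self.sync_every == 0:
            self.s_mem[:] = self.mem  # explicit full copy
            self.bytes_copied += len(self.mem)

    def nic_read(self, offset, length):
        return bytes(self.s_mem[offset:offset + length])


def run(sync_every, writes=100, size=1024):
    """Return (stale NIC reads, bytes copied) for one sync frequency."""
    sync = CopySync(size, sync_every)
    stale = 0
    for i in range(writes):
        data = bytes([i + 1])
        sync.app_write(0, data)
        if sync.nic_read(0, 1) != data:
            stale += 1
    return stale, sync.bytes_copied


frequent = run(sync_every=1)   # (0, 102400): always fresh, 100 full copies
rare = run(sync_every=10)      # (90, 10240): cheap, but 90 stale reads
```

Syncing on every write keeps the NIC's view correct at the price of 100 full copies; syncing every tenth write cuts the copying tenfold but the NIC sees stale data 90 times out of 100.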

slide-17
SLIDE 17

Containers can Share Memory Regions

MEM and S-MEM can be located on the same physical memory region

  • FreeFlow router is running in a container

[Figure: The container (RDMA app, Verbs library, VNIC, RDMA CTX, MEM-1) and the FreeFlow router (S-RDMA CTX, S-MEM-1) both sit on the host; MEM-1 and S-MEM-1 point to one region, MEM, in host shared memory, which the RDMA NIC accesses.]

slide-18
SLIDE 18

Zero-copy Synchronization in Data Path

[Figure: The container (RDMA app, Verbs library, VNIC, RDMA CTX, MEM-1) and the FreeFlow router (S-RDMA CTX, S-MEM-1) share one region, MEM, in host shared memory, in front of the RDMA NIC.]

How to allocate MEM-1 in the shadow memory space?

Synchronization without explicit memory copy:
  • Method 1: Allocate shared buffers with FreeFlow APIs
  • Method 2: Re-map app’s memory space to shadow memory space

FreeFlow supports both!
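The shared-mapping idea behind both methods can be sketched with two independent mappings of one backing file. This is an assumption-laden illustration, not FreeFlow's allocator or remapping code: a temporary file stands in for the host shared memory region, and the names `mem` and `s_mem` simply mirror the slide's MEM and S-MEM.

```python
import mmap
import os
import tempfile

SIZE = 4096

# One backing object stands in for the host shared memory region.
fd_app, path = tempfile.mkstemp()
os.ftruncate(fd_app, SIZE)

# The application maps it as MEM ...
mem = mmap.mmap(fd_app, SIZE)

# ... and the router maps the same object independently as S-MEM.
fd_router = os.open(path, os.O_RDWR)
s_mem = mmap.mmap(fd_router, SIZE)

mem[0:5] = b"hello"               # application writes into MEM
seen_by_router = bytes(s_mem[0:5])  # visible through S-MEM, no copy step

s_mem[5:10] = b"world"            # incoming data lands in S-MEM
seen_by_app = bytes(mem[0:10])    # visible through MEM, no copy step

# Clean up the mappings and the backing file.
mem.close()
s_mem.close()
os.close(fd_app)
os.close(fd_router)
os.remove(path)
```

Both views are backed by the same pages, so a write through either one is immediately visible through the other, and no synchronization frequency has to be chosen.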

slide-19
SLIDE 19

FreeFlow Design Summary

  • FreeFlow control path channel
  • Zero-copy memory synchronization

FreeFlow provides near-native RDMA performance for containers!

[Figure: RDMA apps in Container 1 (IP 10.0.0.1) and Container 2 (IP 20.0.0.1) each reach the FreeFlow router through the Verbs library and a VNIC; the router drives the host's RDMA NIC (IP 30.0.0.1).]

slide-20
SLIDE 20

Outline

  • Motivation
  • FreeFlow Design
  • Implementation and Evaluation


slide-21
SLIDE 21

Implementation and Experimental Setup

  • FreeFlow Library
    • Adds 4000 lines of C to libibverbs and libmlx4
  • FreeFlow Router
    • 2000 lines of C++
  • Testbed setup
    • Two Intel Xeon E5-2620 8-core CPUs, 64 GB RAM
    • 56 Gbps Mellanox ConnectX-3 NICs
    • Docker containers

slide-22
SLIDE 22

Does FreeFlow Support Low Latency?

[Figure: Latency (μs) vs. message size (64 B, 256 B, 1 KB, 4 KB) for Native RDMA and FreeFlow. Annotation: 0.38 μs.]

slide-23
SLIDE 23

Does FreeFlow Support High Throughput?

[Figure: Throughput (Gbps) vs. message size (2 KB to 1 MB) for Native RDMA and FreeFlow.]

Bounded by control path channel performance

slide-24
SLIDE 24

Do Applications Benefit from FreeFlow?

[Figure: CDF of time per step (sec) for Container+TCP, Native RDMA, and FreeFlow. Annotated speedup: 8.7x.]

slide-25
SLIDE 25

Summary

  • Containerization today can’t benefit from the speed of RDMA.
  • Existing solutions for NIC virtualization don’t work (e.g., SR-IOV).
  • FreeFlow enables containerized apps to use RDMA.
  • Challenges and Key Ideas
    • Control path: Leveraging Verbs library structure for efficient Verbs forwarding
    • Data path: Zero-copy memory synchronization
  • Performance close to native RDMA

github.com/microsoft/freeflow