Generic External Memory for Switch Data Planes (Daehyeok Kim et al.)



SLIDE 1

Generic External Memory for Switch Data Planes

Daehyeok Kim

Yibo Zhu, Changhoon Kim, Jeongkeun Lee, Srinivasan Seshan

SLIDE 2

Enabling Virtual Switching on ToR Switch

Customers’ Bare-metal servers: cannot install virtual switches on the servers

Move virtual switch to ToR switch

(Tenant, VM IP)    Host IP
(1, 20.0.0.1)      10.0.0.1
(1, 20.0.0.2)      10.0.1.1
(1, 20.0.0.3)      10.0.2.1
…                  …

Multi-million entries ≫ SRAM size!

Limited SRAM space is a bottleneck for memory-intensive applications!

SLIDE 3

Current Trend: Moving Functionality to Switches

All these applications can benefit from large memory space!

SLIDE 4

Programmable Switch Chips Need More Memory

  • Programmable data plane technology
  • E.g., Protocol-Independent Switch Architecture (PISA) + P4
  • Flexible but only with on-chip SRAM cache

Programmable switch chip w/ SRAM cache + DRAM = Lots of innovative applications!

SLIDE 5

Status quo

  • Fixed-function switch chips built with fixed-function external memory
  • These aren’t very useful:
  • Inflexible: Usage fixed at design time
  • Fixed and small scale: Memory size and bandwidth fixed at design time
  • Expensive: Chips get larger and more complex

Is programmable switch chip + general-purpose memory possible?

SLIDE 6

GEM: Generic External Memory for Programmable Data Planes

Programmable switch chip + General-purpose DRAM pool

  • Flexible memory access BW and size
  • Re-use commodity hardware

SLIDE 7

Key Components

Remote table (DRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  20.0.0.2:80    10.0.1.1:20
  20.0.0.3:80    10.0.2.1:20
  …              …

On-chip table (SRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  …              …

Packet flow: Dst: 20.0.0.2:80 -> miss -> Dst: 10.0.1.1:20

  • C1: Remote memory access channel
  • C2: Packet management during remote memory access
  • C3: Remote data structures and APIs

SLIDE 8

C1: Remote Memory Access Channel

  • Goal: Enable programmable switch chip to directly access memory
  • Purely access DRAM: No impact on the server’s existing compute and networking workloads
  • Minimal latency between the chip and memory

Leverage RDMA!*

  • Challenge: How to generate RDMA requests from the data plane?
  • Programmable switch chip cannot generate arbitrary new packets

*RDMA: Remote Direct Memory Access

SLIDE 9

Accessing Remote Memory from Data Plane via RDMA

Remote table (DRAM server):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  20.0.0.2:80    10.0.1.1:20
  20.0.0.3:80    10.0.2.1:20
  …              …

On-chip table (SRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  …              …

Packet flow: Dst: 20.0.0.2:80 -> miss -> READ (entry) to DRAM server -> READ Resp. -> Dst: 10.0.1.1:20

Generating RDMA request (implementable in P4)

  • 1. Clone and truncate a packet
  • 2. Add RDMA headers: ETH Header | RDMA Header (READ)

RDMA context kept per DRAM server:
  Server #1: QP#, SEQ#, ACK#, …
  …
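
The two steps above can be sketched in Python for illustration (the slide says the real logic is implementable in P4). This is a minimal sketch, assuming RoCE-style framing: the 12-byte Base Transport Header (BTH) and 16-byte RDMA Extended Transport Header (RETH) layouts and the READ request opcode 0x0C follow the InfiniBand transport format as I understand it. The `ctx` dictionary, its field names, and `make_read_request` are hypothetical names, and any IP/UDP headers and the trailing ICRC are left out.

```python
import struct

RC_RDMA_READ_REQUEST = 0x0C  # opcode of a Reliable Connection RDMA READ request


def build_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=True):
    """Pack a 12-byte Base Transport Header (flag bits simplified to zero)."""
    flags = 0                                   # SE, MigReq, PadCnt, TVer all 0
    qp_word = dest_qp & 0xFFFFFF                # upper 8 bits are reserved
    psn_word = (int(ack_req) << 31) | (psn & 0xFFFFFF)
    return struct.pack("!BBHII", opcode, flags, pkey, qp_word, psn_word)


def build_reth(remote_addr, rkey, length):
    """Pack a 16-byte RETH: virtual address, remote key, DMA length."""
    return struct.pack("!QII", remote_addr, rkey, length)


def make_read_request(original_packet, eth_header_len, ctx, remote_addr, length):
    # Step 1: clone the packet and truncate the clone to its Ethernet header.
    eth = bytes(original_packet[:eth_header_len])
    # Step 2: append RDMA headers built from the per-server context.
    rdma = (build_bth(RC_RDMA_READ_REQUEST, ctx["qp"], ctx["psn"])
            + build_reth(remote_addr, ctx["rkey"], length))
    # Assumes a single-packet READ response, so the sequence number advances by 1.
    ctx["psn"] = (ctx["psn"] + 1) & 0xFFFFFF
    return eth + rdma
```

The per-server context (QP#, SEQ#, ...) is the only state the sketch keeps between requests, which matches the context table shown on the slide.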

SLIDE 10

Key Components

(Figure repeated from slide 7: a lookup miss in the on-chip table is resolved from the remote table.)

  • C1: Remote memory access channel
  • C2: Packet management during remote memory access
  • C3: Remote data structures and APIs

SLIDE 11

C2: Packet Management during Remote Memory Access

Remote table (DRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  20.0.0.2:80    10.0.1.1:20
  20.0.0.3:80    10.0.2.1:20
  …              …

On-chip table (SRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  …              …

Packet flow: Dst: 20.0.0.2:80 -> miss -> READ (entry) -> READ Resp. -> Dst: 10.0.1.1:20

Packet buffer in on-chip SRAM: Consuming too much SRAM space! :(

SLIDE 12

Depositing Packets on Remote Buffer

Remote table (DRAM):
  Match          Action         Packet
  20.0.0.1:80    10.0.0.1:20
  20.0.0.2:80    10.0.1.1:20
  20.0.0.3:80    10.0.2.1:20
  …              …              …

On-chip table (SRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  …              …

Packet flow: Dst: 20.0.0.2:80 -> miss -> WRITE (pkt), READ (entry) -> READ Resp. -> Dst: 10.0.1.1:20
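
The flow on this slide can be mimicked with a small Python simulation. It is a sketch under assumptions, not the authors' implementation: `RemoteMemory` and `Switch` are hypothetical names, the WRITE is issued before the READ, and the READ response is assumed to carry the parked packet back together with the entry (my reading of the Packet column in the figure). Every destination is assumed to exist in the remote table.

```python
class RemoteMemory:
    """Stand-in for the DRAM server: each row holds an action and a parked packet."""

    def __init__(self, table):
        self.rows = {m: {"action": a, "packet": None} for m, a in table.items()}

    def write_packet(self, match, packet):
        # Models WRITE (pkt): park the packet next to its table entry.
        self.rows[match]["packet"] = packet

    def read_row(self, match):
        # Models READ (entry) and READ Resp.: return the action and the packet.
        row = self.rows[match]
        packet, row["packet"] = row["packet"], None
        return row["action"], packet


class Switch:
    def __init__(self, sram_table, remote):
        self.sram_table = sram_table   # hot entries only
        self.remote = remote

    def process(self, dst, payload):
        action = self.sram_table.get(dst)
        if action is not None:
            return action, payload                 # hit: forward immediately
        self.remote.write_packet(dst, payload)     # miss: deposit packet remotely
        return self.remote.read_row(dst)           # then fetch entry (and packet)
```

Because the packet waits in remote memory rather than on the chip, the switch object above holds no per-packet state while the lookup is outstanding.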

SLIDE 13

Key Components

(Figure repeated from slide 7: a lookup miss in the on-chip table is resolved from the remote table.)

  • C1: Remote memory access channel
  • C2: Packet management during remote memory access
  • C3: Remote data structures and APIs

SLIDE 14

C3: Remote Data Structures and APIs

Remote table (DRAM):
  Addr    Match          Action         Packet
  0xA     20.0.0.1:80    10.0.0.1:20
  0xB     20.0.0.2:80    10.0.1.1:20
  0xC     20.0.0.3:80    10.0.2.1:20
  …       …              …              …

On-chip table (SRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  …              …

Packet flow: Dst: 20.0.0.2:80 -> miss -> WRITE (pkt) @ 0xB, READ (entry) @ 0xB -> READ Resp.

How to locate remote entry?

Match-to-address mapping:
  Match          Mem addr
  20.0.0.1:80    0xA
  20.0.0.2:80    0xB
  20.0.0.3:80    0xC
  …              …

Consuming too much SRAM space! :(
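
The slide poses the question without answering it. As a rough illustration only, the Python sketch below first estimates what a per-entry match-to-address mapping would cost, then shows one common alternative that needs no per-entry state: deriving the address from a hash of the match key. The hash scheme is my assumption, not something the slides state, and `BASE_ADDR`, `ENTRY_SIZE`, `NUM_SLOTS`, and the byte widths are hypothetical values. Collisions are not handled.

```python
import zlib

BASE_ADDR = 0xA000    # hypothetical start of the remote table region
ENTRY_SIZE = 64       # hypothetical bytes reserved per remote slot
NUM_SLOTS = 1 << 20   # hypothetical number of slots in the remote table


def mapping_table_bytes(num_entries, key_bytes=6, addr_bytes=8):
    """SRAM needed if every match key stores its own remote address."""
    return num_entries * (key_bytes + addr_bytes)


def remote_addr(match_key):
    """Compute the remote slot address from the key alone (no mapping table)."""
    slot = zlib.crc32(match_key.encode()) % NUM_SLOTS
    return BASE_ADDR + slot * ENTRY_SIZE
```

With these assumed widths, two million entries would need 28,000,000 bytes of mapping state, whereas the hashed address needs only the three constants.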

SLIDE 15

General Data Structures and APIs?

  • Ongoing work: designing general data structures for remote memory
  • Proof-of-concept use cases for specific applications
  • Lookup table extension for extending virtual switch table
  • Packet buffer extension for mitigating packet drops due to incast
  • State store extension for network telemetry

SLIDE 16

Use Case: Extending Lookup Table

Customers’ Bare-metal servers

On-chip table (hot entries on SRAM):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  …              …

Remote table (entire entries):
  Match          Action
  20.0.0.1:80    10.0.0.1:20
  20.0.0.2:80    10.0.1.1:20
  20.0.0.3:80    10.0.2.1:20
  …              …

Fetch entries from remote tables :)
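
The split between hot entries on SRAM and the entire table in remote memory behaves like a cache in front of a backing store. The Python sketch below illustrates that idea under assumptions: `CachedLookupTable` is a hypothetical name, and least-recently-used eviction is my choice for the example, since the slides leave the caching policy open.

```python
from collections import OrderedDict


class CachedLookupTable:
    """Small on-chip cache of hot entries backed by the full remote table."""

    def __init__(self, remote_table, capacity):
        self.remote_table = remote_table   # stands in for the DRAM-resident table
        self.capacity = capacity           # number of entries that fit in SRAM
        self.cache = OrderedDict()
        self.remote_fetches = 0

    def lookup(self, match):
        if match in self.cache:
            self.cache.move_to_end(match)          # mark as most recently used
            return self.cache[match]
        self.remote_fetches += 1                   # miss: fetch from remote table
        action = self.remote_table.get(match)
        if action is None:
            return None                            # unknown destination, not cached
        self.cache[match] = action
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)         # evict least recently used entry
        return action
```

Repeated lookups of the same destination hit the cache, so only the first packet of a flow pays the remote fetch.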

SLIDE 17

Other Use Cases

  • Packet buffer extension for mitigating packet drops

Queue is full… Dropping packets :( -> Remote buffer servers: Reduce packet drops :)

  • State store extension for network telemetry

Can’t maintain many stateful objects :( -> Remote state stores: Update the remote stores :)
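
The packet buffer extension can be pictured as a queue that spills to remote memory instead of dropping. The Python sketch below is an illustration under assumptions: `ExtendedQueue` is a hypothetical name, the remote deque stands in for RDMA WRITE and READ operations to a buffer server, and the refill-on-dequeue policy is chosen only to keep FIFO order. It assumes a local capacity of at least 1.

```python
from collections import deque


class ExtendedQueue:
    """On-chip queue that spills overflow to a remote buffer instead of dropping."""

    def __init__(self, local_capacity):
        self.local_capacity = local_capacity
        self.local = deque()    # packets held in on-chip memory
        self.remote = deque()   # packets parked on a remote buffer server

    def enqueue(self, packet):
        # Once spilling has started, keep spilling so packet order is preserved.
        if self.remote or len(self.local) >= self.local_capacity:
            self.remote.append(packet)      # stands in for an RDMA WRITE
        else:
            self.local.append(packet)

    def dequeue(self):
        if not self.local:
            return None
        packet = self.local.popleft()
        if self.remote:
            # Refill the freed local slot from the remote buffer (RDMA READ).
            self.local.append(self.remote.popleft())
        return packet
```

A burst larger than the local capacity is absorbed by the remote deque and drained in arrival order, which is the "reduce packet drops" behavior the slide describes.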

SLIDE 18

Experiment Setup

Server <-> Server: Run NPTcp

  • Packet in: ETH | IP/DSCP=0x00 | TCP | Payload
  • Action: DSCP -> 0x28
  • Packet out: ETH | IP/DSCP=0x28 | TCP | Payload

*Baseline: Simple L2 switch

SLIDE 19

Results

  • End-to-end latency: 1 - 2 μs additional latency
  • Packet store / load throughput: close to the line rate (≈ 37.5 Gbps)

SLIDE 20

Summary

Vision: Generic External Memory for Programmable Data Plane

GEM will be a key enabler for innovations in networking and computational networking!

  • Q1: Efficient caching on SRAM
  • Q2: Dynamically scaling DRAM pool
  • Q3: Handling server failures