SLIDE 1

ServerSwitch: A Programmable and High Performance Platform for Data Center Networks

Guohan Lu, Chuanxiong Guo, Yulong Li, Zhiqiang Zhou†, Tong Yuan, Haitao Wu, Yongqiang Xiong, Rui Gao, Yongguang Zhang
Microsoft Research Asia    †Tsinghua University

NSDI 2011, Boston, USA

SLIDE 2

Motivations

  • Lots of research and innovation in DCN
    – PortLand, DCell/BCube, CamCube, VL2, …
    – Topology, routing, congestion control, network services, etc.

  • Many DCN designs depart from current practice
    – BCube uses a self-defined packet header for source routing
    – PortLand performs longest prefix matching (LPM) on the destination MAC
    – Quantized Congestion Notification (QCN) requires switches to send explicit congestion notifications

  • Need a platform to prototype existing and many future DCN designs


SLIDE 3

Requirements

  • Programmable and high-performance packet forwarding engine
    – Wire-speed packet forwarding for various packet sizes
    – Various packet forwarding schemes and formats

  • New routing and signaling, flow/congestion control
    – ARP interception (PortLand), adaptive routing (BCube), congestion control (QCN)

  • Support new DCN services by enabling in-network packet processing
    – Network cache service (CamCube), switch-assisted reliable multicast (SideCar)


SLIDE 4

Existing Approaches

  • Existing switches/routers
    – Usually closed systems with no programming interface

  • OpenFlow
    – Mainly focused on the control plane at present
    – Unclear how to support new congestion control mechanisms and in-network data processing

  • Software routers
    – Performance not comparable to a switching ASIC

  • NetFPGA
    – Not a commodity device, and difficult to program


SLIDE 5

Technology Trends

Modern Switching Chip
  • High switching capacity (640 Gbps)
  • Rich protocol support (Ethernet, IP, MPLS)
  • TCAM for advanced packet filtering

PCI-E Interconnect
  • High bandwidth (160 Gbps)
  • Low latency (<1 µs)

Commodity Server
  • Multi-core CPUs
  • Multi-10GE packet processing capability


SLIDE 6

Design Goals


  • Programmable packet forwarding engine in silicon
    – Leverage the high capacity and programmability of the modern switching chip for packet forwarding

  • Low-latency software processing for control-plane and congestion-control messages
    – Leverage the low-latency PCI-E interface for latency-sensitive schemes

  • Software-based in-network packet processing
    – Leverage the rich programmability and high performance of a modern server

SLIDE 7
Architecture

  • Hardware
    – Modern switching chip
    – Multi-core CPU
    – PCI-E interconnect

  • Software Stack
    – C APIs for switching chip management
    – Packet processing in both kernel and user space

[Figure: ServerSwitch architecture — the ServerSwitch card (switching chip with TCAM, NIC chips, and external ports) attaches to the server over PCI-E; in the kernel, the switching chip (SC) driver, NIC driver, and ServerSwitch driver support the API/library, the TCP/IP stack, and user-space apps.]

SLIDE 8

Programmable Packet Forwarding Engine

  • Destination-based forwarding, e.g., IP, Ethernet
  • Tag-based forwarding, e.g., MPLS
  • Source-routing-based forwarding, e.g., BCube


[Figure: forwarding pipeline of the BCM56338 — a programmable parser that extracts user-defined lookup keys (UDLK) sits beside the fixed Ethernet, IP, and MPLS parsers; the extracted keys (DMAC, DIP, MPLS label, index) feed the EM(MAC), EM(IP), EM(MPLS), LPM, and TCAM tables plus a classifier, whose results select interface-table entries that drive the L2, IP, and MPLS modifiers. The TCAM path offers high programmability, the parser/classifier path limited programmability, and the fixed protocol paths none.]

SLIDE 9

TCAM Basics


[Figure: TCAM lookup — the search key is compared against all stored entries; each entry marks its bits as cared or non-cared (wildcard), and the first matching entry returns its associated value.]
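To make the cared/non-cared semantics concrete, here is a minimal C sketch of a single TCAM lookup. It is illustrative only: real TCAM hardware checks every entry in parallel, and the loop below merely models the priority ("first match wins") semantics.

#include <stdint.h>
#include <stddef.h>

/* One TCAM entry: 'value' holds the cared bit pattern and 'mask'
 * marks which bits are cared (1) versus non-cared/wildcard (0). */
struct tcam_entry {
    uint64_t value;
    uint64_t mask;
    int      result;            /* e.g., an output port or action id */
};

/* Return the result of the first (highest-priority) matching entry,
 * or -1 if no entry matches the key. */
int tcam_lookup(const struct tcam_entry *tbl, size_t n, uint64_t key)
{
    for (size_t i = 0; i < n; i++)
        if ((key & tbl[i].mask) == (tbl[i].value & tbl[i].mask))
            return tbl[i].result;
    return -1;
}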

SLIDE 10

TCAM Based Source Routing


[Figure: TCAM-based source routing — the incoming packet carries an index field (Idx) and intermediate addresses IA1–IA3; each TCAM entry cares only about Idx and the IA slot that Idx selects, so the matching entry maps IA[Idx] directly to an output port.]
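A hedged sketch of how such entries could be installed: for every index position and every directly reachable neighbor, add a rule that cares only about the Idx field and the IA slot it selects, leaving all other IA slots non-cared. The key layout, the tcam_install() helper, and the neighbor table are hypothetical stand-ins for the real ServerSwitch API calls.

#include <stdint.h>

#define N_HOPS 3                /* IA fields carried per packet (assumption) */

struct neighbor { uint32_t addr; int port; };

/* Hypothetical helper that writes one TCAM rule; with the real API this
 * would be a SetLookupTable() call taking hex key/mask strings. */
void tcam_install(int priority, uint8_t idx, uint8_t idx_mask,
                  const uint32_t ia[N_HOPS], const uint32_t ia_mask[N_HOPS],
                  int out_port);

void install_source_routes(const struct neighbor *nbr, int n_nbr)
{
    int prio = 0;
    for (uint8_t idx = 1; idx <= N_HOPS; idx++) {
        for (int j = 0; j < n_nbr; j++) {
            uint32_t ia[N_HOPS]      = {0};
            uint32_t ia_mask[N_HOPS] = {0};
            ia[idx - 1]      = nbr[j].addr;   /* care: IA[Idx] only  */
            ia_mask[idx - 1] = 0xFFFFFFFFu;   /* other slots stay 0  */
            tcam_install(prio++, idx, 0xFF, ia, ia_mask, nbr[j].port);
        }
    }
}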

SLIDE 11

ServerSwitch API

  • Switching chip management
    – User-defined lookup key extraction
    – Forwarding table manipulation
    – Traffic statistics collection

  • Examples:
    – SetUDLK(1, (B0-5))
    – SetLookupTable(TCAM, 1, 1, “000201000000”, “FFFFFF000000”, {act=REDIRECT_VIF, vif=3})
    – ReadRegister(OUTPUT_QUEUE_BYTES_PORT0)
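Put together, a host program might use the API roughly as below. This is a sketch only: the prototypes and constants are paraphrased from the examples above, not taken from the real library header.

#include <stdint.h>

/* Hypothetical prototypes paraphrasing the slide's examples; the real
 * C library's names and types may differ. */
typedef struct { int act; int vif; } ss_action_t;
int      SetUDLK(int udlk_id, const char *byte_range);
int      SetLookupTable(int table, int entry, int udlk_id,
                        const char *key_hex, const char *mask_hex,
                        ss_action_t action);
uint32_t ReadRegister(int reg);

#define TCAM                      0
#define REDIRECT_VIF              1
#define OUTPUT_QUEUE_BYTES_PORT0  0x100   /* placeholder register id */

int main(void)
{
    /* Use packet bytes B0-B5 (the destination MAC / BCube NHA region)
     * as user-defined lookup key 1. */
    SetUDLK(1, "B0-5");

    /* TCAM entry 1: if the first three key bytes are 00 02 01
     * (mask FF FF FF), redirect the packet to virtual interface 3,
     * i.e., up to the server for software processing. */
    ss_action_t a = { .act = REDIRECT_VIF, .vif = 3 };
    SetLookupTable(TCAM, 1, 1, "000201000000", "FFFFFF000000", a);

    /* Poll the output queue occupancy of port 0 (e.g., for QCN). */
    uint32_t qlen = ReadRegister(OUTPUT_QUEUE_BYTES_PORT0);
    (void)qlen;
    return 0;
}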


SLIDE 12
Implementation

  • Hardware
    – 4 GE external ports
    – x4 PCI-E to the server
    – 2×10GE board-to-board interconnect
    – Cost: $400 per card at a volume of 80
    – Power consumption: 15.7 W

  • Software
    – Windows Server 2008 R2
    – Switching chip driver (2,670 lines of C)
    – NIC driver (binary from Intel)
    – ServerSwitch driver (20,719 lines of C)
    – User library (based on the Broadcom SDK)

[Figure: the ServerSwitch card — a BCM56338 switching chip with 4×GE external ports and a 2×10GE interconnect, plus Intel 82576EB Ethernet controllers.]

SLIDE 13

Example 1: BCube

  • Self-defined packet header for BCube source routing
  • Easy to program: less than 200 lines of code to program the switching chip

[Figure: BCube packet header, frame bytes B14–B45 — laid out like a standard IPv4 header (Version, HL, ToS, Total length, Identification, Flags, Fragment offset, TTL, Protocol, Header checksum, Source Address, Destination Address) followed by the next-hop address array NHA1–NHA8, a Pad field, the BCube Protocol field, and the next-hop index NH.]
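As a sketch, the header in the figure could be declared in C as follows. The widths of the trailing Pad, BCube Protocol, and NH fields are approximations chosen to fill the 32-byte region B14–B45, so treat this as illustrative rather than the authoritative on-wire definition. Because the layout mirrors an IPv4 header, the chip's programmable parser can presumably extract the NHA region via a user-defined lookup key, which is consistent with the small amount of chip-programming code reported above.

#include <stdint.h>

#pragma pack(push, 1)
struct bcube_header {
    uint8_t  ver_hl;        /* Version (4 bits) + header length (4 bits)  */
    uint8_t  tos;
    uint16_t total_length;
    uint16_t identification;
    uint16_t flags_frag;    /* Flags (3 bits) + fragment offset (13 bits) */
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t checksum;
    uint32_t src_addr;
    uint32_t dst_addr;
    uint8_t  nha[8];        /* NHA1..NHA8: the source route               */
    uint8_t  pad;
    uint8_t  bcube_proto;   /* protocol carried above BCube               */
    uint16_t nh;            /* index of the next hop in nha[] (width assumed) */
};
#pragma pack(pop)

_Static_assert(sizeof(struct bcube_header) == 32,
               "mirrors frame bytes B14-B45 from the slide");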


SLIDE 14

BCube Experiment

  • ServerSwitch: wire-speed packet forwarding for 64 B packets
  • ServerSwitch: 15.6 µs forwarding latency, about 1/3 of the software forwarding latency


[Figure: forwarding rate and latency for ServerSwitch vs. software forwarding on a 4-core i7 server, measured with NetFPGA.]

SLIDE 15

Example 2: Quantized Congestion Notification

  • Congestion notification generation requires very low latency


[Figure: QCN setup — a UDP source shaped by a token bucket acts as the reaction point (RP); ServerSwitch is the congestion point (CP), where a software packet marker reads the output-port queue length (qlen) through the NIC and sends congestion notifications back to the source.]
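The packet marker's job can be sketched as the standard 802.1Qau congestion-point computation below. The equilibrium point, weight, and 6-bit clamp are illustrative assumptions, not values from the ServerSwitch implementation.

#include <stdint.h>

#define Q_EQ   32000   /* equilibrium queue length, bytes (assumption) */
#define QCN_W  2       /* weight on queue growth (assumption)          */

static int32_t qlen_old;

/* Returns the quantized feedback to carry in a congestion notification,
 * or -1 when the queue is uncongested and no message should be sent. */
int qcn_cp_sample(int32_t qlen)
{
    int32_t q_off   = qlen - Q_EQ;      /* offset from equilibrium        */
    int32_t q_delta = qlen - qlen_old;  /* queue growth since last sample */
    qlen_old = qlen;

    int32_t fb = q_off + QCN_W * q_delta;  /* |Fb|: positive when congested */
    if (fb <= 0)
        return -1;

    return fb > 63 ? 63 : (int)fb;      /* quantize to 6 bits (simplified) */
}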

SLIDE 16

QCN Experiment


  • Queue length fluctuates around the equilibrium point (Q_EQ), even as the available bandwidth changes

[Figure: sender and receiver connected through ServerSwitch; queue length and throughput over time while the link bandwidth is changed.]
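For context, the fluctuation around Q_EQ comes from the reaction point's rate control. Below is a simplified C sketch loosely following 802.1Qau; it merges fast recovery and active increase into one timer path, and all constants and names are assumptions rather than ServerSwitch code.

#define QCN_GD      (1.0 / 128.0)   /* multiplicative-decrease gain */
#define QCN_RATE_AI 5.0             /* active-increase step, Mbps   */

static double target_rate  = 1000.0;   /* Mbps */
static double current_rate = 1000.0;

/* On receiving a congestion notification with quantized feedback fb
 * (1..63): remember the pre-cut rate and decrease multiplicatively. */
void qcn_rp_on_cn(int fb)
{
    target_rate  = current_rate;
    current_rate = current_rate * (1.0 - QCN_GD * fb);
}

/* Periodically, while no notifications arrive: raise the target and
 * close half the gap toward it, so the rate probes upward again.
 * (Real QCN first runs five fast-recovery cycles with the target
 * frozen before active increase begins.) */
void qcn_rp_on_timer(void)
{
    target_rate  += QCN_RATE_AI;
    current_rate  = (current_rate + target_rate) / 2.0;
}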

SLIDE 17

Limitations

  • Only supports modifications to standard protocol fields
    – Ethernet MACs, IP TTL, MPLS label

  • Not suitable for low-latency, per-packet software processing
    – e.g., XCP

  • Limited number of ports and port speed
    – Cannot be directly used for fat-tree or VL2
    – Four ServerSwitch cards can form a 16-port ServerSwitch, which is still viable for prototyping fat-tree and VL2


SLIDE 18

Summary

  • ServerSwitch: integrating a high-performance ASIC switching chip, programmable within limits, with a powerful, fully programmable server
    – Line-rate forwarding for various user-defined forwarding schemes
    – Support for new signaling and congestion control mechanisms
    – In-network data processing

  • Ongoing work: a 10GE ServerSwitch


SLIDE 19

Thanks.

Q&A
