SLIDE 1

ServerSwitch: A Programmable and High Performance Platform for Data Center Networks

Guohan Lu, Chuanxiong Guo, Yulong Li, Zhiqiang Zhou†, Tong Yuan, Haitao Wu, Yongqiang Xiong, Rui Gao, Yongguang Zhang
Microsoft Research Asia    †Tsinghua University

NSDI 2011, Boston, USA

SLIDE 2

Motivations

  • Lots of research and innovation in DCN
    – PortLand, DCell/BCube, CamCube, VL2, …
    – Topology, routing, congestion control, network services, etc.

  • Many DCN designs depart from current practice
    – BCube uses a self-defined packet header for source routing
    – PortLand performs longest prefix matching (LPM) on the destination MAC
    – Quantized Congestion Notification (QCN) requires switches to send explicit congestion notifications

  • Need a platform to prototype existing and many future DCN designs


SLIDE 3

Requirements

  • Programmable and high-performance packet forwarding engine
    – Wire-speed packet forwarding for various packet sizes
    – Various packet forwarding schemes and formats

  • New routing and signaling, flow/congestion control
    – ARP interception (PortLand), adaptive routing (BCube), congestion control (QCN)

  • Support new DCN services by enabling in-network packet processing
    – Network cache service (CamCube), switch-assisted reliable multicast (SideCar)


SLIDE 4

Existing Approaches

  • Existing switches/routers
    – Usually closed systems with no programming interface

  • OpenFlow
    – Mainly focused on the control plane at present
    – Unclear how to support new congestion control mechanisms and in-network data processing

  • Software routers
    – Performance not comparable to a switching ASIC

  • NetFPGA
    – Not a commodity device, and difficult to program


SLIDE 5

Technology Trends

Modern Switching Chip
  • High switching capacity (640 Gbps)
  • Rich protocol support (Ethernet, IP, MPLS)
  • TCAM for advanced packet filtering

PCI-E Interconnect
  • High bandwidth (160 Gbps)
  • Low latency (<1 µs)

Commodity Server
  • Multi-core CPUs
  • Multi-10GE packet processing capability


SLIDE 6

Design Goals


  • Programmable packet forwarding engine in silicon
    – Leverage the high capacity and programmability of the modern switching chip for packet forwarding

  • Low-latency software processing for control-plane and congestion-control messages
    – Leverage the low-latency PCI-E interface for latency-sensitive schemes

  • Software-based in-network packet processing
    – Leverage the rich programmability and high performance of a modern server

SLIDE 7
Architecture

  • Hardware
    – Modern switching chip
    – Multi-core CPU
    – PCI-E interconnect

  • Software Stack
    – C APIs for switching chip management
    – Packet processing in both kernel and user space

[Figure: ServerSwitch architecture — the ServerSwitch card (switching chip with TCAM, NIC chips, and external ports) attaches to the server over PCI-E; in the kernel, the switching chip (SC) driver, NIC driver, and ServerSwitch driver support the API/library, the TCP/IP stack, and user-space apps.]

SLIDE 8

Programmable Packet Forwarding Engine

  • Destination-based forwarding, e.g., IP, Ethernet
  • Tag-based forwarding, e.g., MPLS
  • Source-routing-based forwarding, e.g., BCube


[Figure: forwarding pipeline of the BCM56338 — a programmable parser that extracts user-defined lookup keys (UDLK) sits beside the fixed Ethernet, IP, and MPLS parsers; the extracted keys (DMAC, DIP, MPLS label, index) feed the EM(MAC), EM(IP), EM(MPLS), LPM, and TCAM tables plus a classifier, whose results select interface-table entries that drive the L2, IP, and MPLS modifiers. The TCAM path offers high programmability, the parser/classifier path limited programmability, and the fixed protocol paths none.]

SLIDE 9

TCAM Basics


[Figure: TCAM lookup — the search key is compared against all stored entries; each entry marks its bits as cared or non-cared (wildcard), and the first matching entry returns its associated value.]
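To make the cared/non-cared semantics concrete, here is a minimal C sketch of a single TCAM lookup. It is illustrative only: real TCAM hardware checks every entry in parallel, and the loop below merely models the priority ("first match wins") semantics.

#include <stdint.h>
#include <stddef.h>

/* One TCAM entry: 'value' holds the cared bit pattern and 'mask'
 * marks which bits are cared (1) versus non-cared/wildcard (0). */
struct tcam_entry {
    uint64_t value;
    uint64_t mask;
    int      result;            /* e.g., an output port or action id */
};

/* Return the result of the first (highest-priority) matching entry,
 * or -1 if no entry matches the key. */
int tcam_lookup(const struct tcam_entry *tbl, size_t n, uint64_t key)
{
    for (size_t i = 0; i < n; i++)
        if ((key & tbl[i].mask) == (tbl[i].value & tbl[i].mask))
            return tbl[i].result;
    return -1;
}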

SLIDE 10

TCAM Based Source Routing


[Figure: TCAM-based source routing — the incoming packet carries an index field (Idx) and intermediate addresses IA1–IA3; each TCAM entry cares only about Idx and the IA slot that Idx selects, so the matching entry maps IA[Idx] directly to an output port.]
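A hedged sketch of how such entries could be installed: for every index position and every directly reachable neighbor, add a rule that cares only about the Idx field and the IA slot it selects, leaving all other IA slots non-cared. The key layout, the tcam_install() helper, and the neighbor table are hypothetical stand-ins for the real ServerSwitch API calls.

#include <stdint.h>

#define N_HOPS 3                /* IA fields carried per packet (assumption) */

struct neighbor { uint32_t addr; int port; };

/* Hypothetical helper that writes one TCAM rule; with the real API this
 * would be a SetLookupTable() call taking hex key/mask strings. */
void tcam_install(int priority, uint8_t idx, uint8_t idx_mask,
                  const uint32_t ia[N_HOPS], const uint32_t ia_mask[N_HOPS],
                  int out_port);

void install_source_routes(const struct neighbor *nbr, int n_nbr)
{
    int prio = 0;
    for (uint8_t idx = 1; idx <= N_HOPS; idx++) {
        for (int j = 0; j < n_nbr; j++) {
            uint32_t ia[N_HOPS]      = {0};
            uint32_t ia_mask[N_HOPS] = {0};
            ia[idx - 1]      = nbr[j].addr;   /* care: IA[Idx] only  */
            ia_mask[idx - 1] = 0xFFFFFFFFu;   /* other slots stay 0  */
            tcam_install(prio++, idx, 0xFF, ia, ia_mask, nbr[j].port);
        }
    }
}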

SLIDE 11

ServerSwitch API

  • Switching chip management
    – User-defined lookup key extraction
    – Forwarding table manipulation
    – Traffic statistics collection

  • Examples:
    – SetUDLK(1, (B0-5))
    – SetLookupTable(TCAM, 1, 1, “000201000000”, “FFFFFF000000”, {act=REDIRECT_VIF, vif=3})
    – ReadRegister(OUTPUT_QUEUE_BYTES_PORT0)
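Put together, a host program might use the API roughly as below. This is a sketch only: the prototypes and constants are paraphrased from the examples above, not taken from the real library header.

#include <stdint.h>

/* Hypothetical prototypes paraphrasing the slide's examples; the real
 * C library's names and types may differ. */
typedef struct { int act; int vif; } ss_action_t;
int      SetUDLK(int udlk_id, const char *byte_range);
int      SetLookupTable(int table, int entry, int udlk_id,
                        const char *key_hex, const char *mask_hex,
                        ss_action_t action);
uint32_t ReadRegister(int reg);

#define TCAM                      0
#define REDIRECT_VIF              1
#define OUTPUT_QUEUE_BYTES_PORT0  0x100   /* placeholder register id */

int main(void)
{
    /* Use packet bytes B0-B5 (the destination MAC / BCube NHA region)
     * as user-defined lookup key 1. */
    SetUDLK(1, "B0-5");

    /* TCAM entry 1: if the first three key bytes are 00 02 01
     * (mask FF FF FF), redirect the packet to virtual interface 3,
     * i.e., up to the server for software processing. */
    ss_action_t a = { .act = REDIRECT_VIF, .vif = 3 };
    SetLookupTable(TCAM, 1, 1, "000201000000", "FFFFFF000000", a);

    /* Poll the output queue occupancy of port 0 (e.g., for QCN). */
    uint32_t qlen = ReadRegister(OUTPUT_QUEUE_BYTES_PORT0);
    (void)qlen;
    return 0;
}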


SLIDE 12
Implementation

  • Hardware
    – 4 GE external ports
    – x4 PCI-E to the server
    – 2×10GE board-to-board interconnect
    – Cost: $400 per card at a volume of 80
    – Power consumption: 15.7 W

  • Software
    – Windows Server 2008 R2
    – Switching chip driver (2,670 lines of C)
    – NIC driver (binary from Intel)
    – ServerSwitch driver (20,719 lines of C)
    – User library (based on the Broadcom SDK)

[Figure: the ServerSwitch card — a BCM56338 switching chip with 4×GE external ports and a 2×10GE interconnect, plus Intel 82576EB Ethernet controllers.]

SLIDE 13

Example 1: BCube

  • Self-defined packet header for BCube source routing
  • Easy to program: less than 200 lines of code to program the switching chip

[Figure: BCube packet header, frame bytes B14–B45 — laid out like a standard IPv4 header (Version, HL, ToS, Total length, Identification, Flags, Fragment offset, TTL, Protocol, Header checksum, Source Address, Destination Address) followed by the next-hop address array NHA1–NHA8, a Pad field, the BCube Protocol field, and the next-hop index NH.]
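As a sketch, the header in the figure could be declared in C as follows. The widths of the trailing Pad, BCube Protocol, and NH fields are approximations chosen to fill the 32-byte region B14–B45, so treat this as illustrative rather than the authoritative on-wire definition. Because the layout mirrors an IPv4 header, the chip's programmable parser can presumably extract the NHA region via a user-defined lookup key, which is consistent with the small amount of chip-programming code reported above.

#include <stdint.h>

#pragma pack(push, 1)
struct bcube_header {
    uint8_t  ver_hl;        /* Version (4 bits) + header length (4 bits)  */
    uint8_t  tos;
    uint16_t total_length;
    uint16_t identification;
    uint16_t flags_frag;    /* Flags (3 bits) + fragment offset (13 bits) */
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t checksum;
    uint32_t src_addr;
    uint32_t dst_addr;
    uint8_t  nha[8];        /* NHA1..NHA8: the source route               */
    uint8_t  pad;
    uint8_t  bcube_proto;   /* protocol carried above BCube               */
    uint16_t nh;            /* index of the next hop in nha[] (width assumed) */
};
#pragma pack(pop)

_Static_assert(sizeof(struct bcube_header) == 32,
               "mirrors frame bytes B14-B45 from the slide");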


SLIDE 14

BCube Experiment

  • ServerSwitch: wire-speed packet forwarding for 64 B packets
  • ServerSwitch: 15.6 µs forwarding latency, about 1/3 of the software forwarding latency


[Figure: forwarding rate and latency for ServerSwitch vs. software forwarding on a 4-core i7 server, measured with NetFPGA.]

SLIDE 15

Example 2: Quantized Congestion Notification

  • Congestion notification generation requires very low latency


[Figure: QCN setup — a UDP source shaped by a token bucket acts as the reaction point (RP); ServerSwitch is the congestion point (CP), where a software packet marker reads the output-port queue length (qlen) through the NIC and sends congestion notifications back to the source.]
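The packet marker's job can be sketched as the standard 802.1Qau congestion-point computation below. The equilibrium point, weight, and 6-bit clamp are illustrative assumptions, not values from the ServerSwitch implementation.

#include <stdint.h>

#define Q_EQ   32000   /* equilibrium queue length, bytes (assumption) */
#define QCN_W  2       /* weight on queue growth (assumption)          */

static int32_t qlen_old;

/* Returns the quantized feedback to carry in a congestion notification,
 * or -1 when the queue is uncongested and no message should be sent. */
int qcn_cp_sample(int32_t qlen)
{
    int32_t q_off   = qlen - Q_EQ;      /* offset from equilibrium        */
    int32_t q_delta = qlen - qlen_old;  /* queue growth since last sample */
    qlen_old = qlen;

    int32_t fb = q_off + QCN_W * q_delta;  /* |Fb|: positive when congested */
    if (fb <= 0)
        return -1;

    return fb > 63 ? 63 : (int)fb;      /* quantize to 6 bits (simplified) */
}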

SLIDE 16

QCN Experiment


  • Queue length fluctuates around the equilibrium point (Q_EQ), even as the available bandwidth changes

[Figure: sender and receiver connected through ServerSwitch; queue length and throughput over time while the link bandwidth is changed.]
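For context, the fluctuation around Q_EQ comes from the reaction point's rate control. Below is a simplified C sketch loosely following 802.1Qau; it merges fast recovery and active increase into one timer path, and all constants and names are assumptions rather than ServerSwitch code.

#define QCN_GD      (1.0 / 128.0)   /* multiplicative-decrease gain */
#define QCN_RATE_AI 5.0             /* active-increase step, Mbps   */

static double target_rate  = 1000.0;   /* Mbps */
static double current_rate = 1000.0;

/* On receiving a congestion notification with quantized feedback fb
 * (1..63): remember the pre-cut rate and decrease multiplicatively. */
void qcn_rp_on_cn(int fb)
{
    target_rate  = current_rate;
    current_rate = current_rate * (1.0 - QCN_GD * fb);
}

/* Periodically, while no notifications arrive: raise the target and
 * close half the gap toward it, so the rate probes upward again.
 * (Real QCN first runs five fast-recovery cycles with the target
 * frozen before active increase begins.) */
void qcn_rp_on_timer(void)
{
    target_rate  += QCN_RATE_AI;
    current_rate  = (current_rate + target_rate) / 2.0;
}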

SLIDE 17

Limitations

  • Only supports modifications to standard protocol fields
    – Ethernet MACs, IP TTL, MPLS label

  • Not suitable for low-latency, per-packet software processing
    – e.g., XCP

  • Limited number of ports and port speed
    – Cannot be directly used for fat-tree or VL2
    – Four ServerSwitch cards can form a 16-port ServerSwitch, which is still viable for prototyping fat-tree and VL2


SLIDE 18

Summary

  • ServerSwitch: integrating a high-performance ASIC switching chip, programmable within limits, with a powerful, fully programmable server
    – Line-rate forwarding for various user-defined forwarding schemes
    – Support for new signaling and congestion control mechanisms
    – In-network data processing

  • Ongoing work: a 10GE ServerSwitch


SLIDE 19

Thanks.

Q&A
