

SLIDE 1

Using CPU as a Traffic Co-processing Unit in Commodity Switches

Guohan Lu, Rui Miao+, Yongqiang Xiong and Chuanxiong Guo
Microsoft Research Asia

+Tsinghua University

SLIDE 2
Background

  • Commodity switches are the basic building blocks in enterprise and data center networks
    – PortLand and VL2 build entire DCNs with 1U commodity switches

SLIDE 3

Background (cont’)

  • Commodity switches now widely adopt a single-switching-chip design
  • This greatly simplifies switch design and lowers the cost

[Figure: single-chip switch architecture — all-in-one switching ASIC, CPU for the control plane, DRAM]

SLIDE 4

Limitation (I)

  • Limited forwarding table size for flow-based forwarding schemes, e.g., OpenFlow
    – OpenFlow provides the finest granularity for better security (Ethane), traffic load balancing (Hedera), and energy saving (ElasticTree)
    – 4k flow entries on the most recent BRCM switching chip

[Figure: in a data center running map-reduce-style applications with 120 ToRs and ~5k servers, the number of active flows is ≥ 4,096 for 95%+ of the time]

SLIDE 5

Limitation (II)

  • Shallow packet buffer for bursty traffic
    – The switching ASIC has only several MB of buffer
    – Bursty traffic patterns, e.g., TCP incast, TCP flash crowds
    – Packet drops lead to degraded network performance

[Figure: TCP incast scenario — senders transmit to one receiver through the switch]

SLIDE 6

Design Goals

  • Large forwarding table
    – Support large forwarding tables for forwarding schemes such as OpenFlow
  • Deep packet buffer
    – Absorb temporary traffic bursts, e.g., TCP incast, TCP flash crowds

SLIDE 7

Assumptions for Commodity Switches

  • Multicore CPU for packet processing
  • High-speed interconnect as a high-speed data channel
  • Large DRAM as an off-chip packet buffer

[Figure: future switch box — Ethernet ports, all-in-one switching ASIC, multicore CPU, DRAM]

SLIDE 8

Large forwarding table

  • Complete forwarding table in software
  • Partial forwarding table in hardware

[Figure: complete software forwarding table in CPU memory; partial hardware forwarding table in the switch ASIC]
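The split can be sketched as a small Python model (names like `TwoLevelForwarder` are illustrative, not from the paper's implementation): a limited-size hardware table acts as a fast path in front of the complete software table.

```python
class TwoLevelForwarder:
    """Sketch: partial HW table backed by a complete SW table.

    `hw_capacity` models the ASIC's limited flow-entry count.
    """

    def __init__(self, hw_capacity):
        self.hw_capacity = hw_capacity
        self.hw_table = {}   # flow -> next hop (fast path, limited size)
        self.sw_table = {}   # flow -> next hop (complete, in DRAM)

    def add_flow(self, flow, next_hop):
        # Every flow goes into the complete software table.
        self.sw_table[flow] = next_hop
        # Install into hardware only while capacity remains.
        if len(self.hw_table) < self.hw_capacity:
            self.hw_table[flow] = next_hop

    def lookup(self, flow):
        # Hardware hit: forwarded by the ASIC at line rate.
        if flow in self.hw_table:
            return self.hw_table[flow], "hw"
        # Hardware miss: the packet is handled by the CPU's software table.
        return self.sw_table.get(flow), "sw"
```

Packets whose flows miss in hardware are still forwarded correctly, only along the slower CPU path.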

SLIDE 9

TraFfic Offloading Ratio (TFOR)

  • TFOR: traffic forwarded by HW vs. all traffic
  • Obtaining TFOR: every minute, get the flow rates, sort the flows by rate, and put the k fastest flows in HW
  • TFOR ≥ 92% for 95%+ of the time when k = 4096
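As a sketch, the per-interval ratio can be computed from per-flow byte counts (a hypothetical helper, assuming the k fastest flows sit in hardware for the whole interval):

```python
def traffic_offloading_ratio(flow_bytes, k):
    """Sketch: TFOR for one measurement interval.

    `flow_bytes` maps each flow to the bytes it sent in the interval;
    the k fastest flows are assumed to be placed in hardware.
    """
    rates = sorted(flow_bytes.values(), reverse=True)
    hw_bytes = sum(rates[:k])   # bytes carried by the k fastest flows
    total = sum(rates)
    return hw_bytes / total if total else 1.0
```

Because DCN traffic is heavy-tailed, a few fast flows carry most bytes, which is why a small k already yields a high ratio.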
SLIDE 10

Flow Management

  • The k fastest flows are forwarded by hardware; the rest are forwarded by software
  • Assume one byte counter per flow in hardware
  • Procedure
    – Count software-forwarded flow bytes; periodically read the counters from hardware
    – Rank flows by their rates and determine the k fastest flows
    – Offload fast flows to hardware and onload slow flows to software
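One round of this procedure might look like the following Python sketch (the data structures are hypothetical):

```python
def rebalance(hw_flows, hw_counters, sw_counters, k):
    """Sketch of one flow-management round.

    hw_flows:    set of flows currently installed in hardware
    hw_counters: flow -> bytes counted by the ASIC this period
    sw_counters: flow -> bytes counted by software this period
    Returns (to_offload, to_onload): flows to move into / out of hardware.
    """
    # Merge per-period byte counts from both paths into one rate view.
    rates = dict(sw_counters)
    rates.update(hw_counters)
    # Rank all flows by rate; the k fastest belong in hardware.
    fastest = set(sorted(rates, key=rates.get, reverse=True)[:k])
    to_offload = fastest - hw_flows   # fast flows currently in software
    to_onload = hw_flows - fastest    # slow flows occupying HW entries
    return to_offload, to_onload
```

Only flows whose ranking crossed the boundary are moved, so a stable traffic mix causes few table updates per period.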

SLIDE 11

Deep Packet Buffer

  • Phase 1: traffic redirection (triggered at a high watermark on the ASIC buffer)
  • Phase 2: cancel redirection (once the buffer drains to a low watermark)

[Figure: burst traffic redirected from the switching chip over the internal high-bandwidth channel into CPU memory, then drained back; high/low watermarks control the two phases]
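The two phases form a simple hysteresis loop. A minimal sketch, assuming the watermarks are configured thresholds on the ASIC buffer occupancy:

```python
class DeepBuffer:
    """Sketch: two-phase redirection controlled by buffer watermarks.

    Threshold values are illustrative, not from the paper's prototype.
    """

    def __init__(self, high_watermark, low_watermark):
        self.high = high_watermark
        self.low = low_watermark
        self.redirecting = False

    def on_buffer_level(self, level):
        # Phase 1: ASIC buffer fills past the high watermark ->
        # redirect the burst into large CPU DRAM.
        if not self.redirecting and level >= self.high:
            self.redirecting = True
        # Phase 2: buffer drains below the low watermark ->
        # cancel redirection and resume pure hardware forwarding.
        elif self.redirecting and level <= self.low:
            self.redirecting = False
        return self.redirecting
```

The gap between the two watermarks prevents the switch from flapping between phases while the buffer hovers near a single threshold.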

SLIDE 12

Internal Bandwidth Needed

  • Receiver: delayed ACK disabled
  • Senders: TCP slow start
  • No packet drops when the internal bandwidth is larger than 2C

With delayed ACKs disabled and the senders in slow start, every ACK that reaches a sender triggers two new data packets. In packets per second:

  R_out^data = C / MSS   (data drained to the receiver at link capacity C)
  R_in^ack   = C / MSS   (one ACK returned per delivered data packet)
  R_in^data  = 2C / MSS  (two data packets sent per ACK during slow start)

So redirected data arrives at up to 2C, and an internal channel of bandwidth ≥ 2C is enough to avoid packet drops.

[Figure: data flows from senders S through the switch ASIC to the CPU; ACKs flow back from receiver R]
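A quick numeric check of the three rates (Python; the values of C and MSS are illustrative, the slide keeps them symbolic):

```python
def incoming_data_rate(C, MSS):
    """Worst-case data arrival rate at the switch during slow start
    with delayed ACKs disabled, following the three rate equations."""
    r_out_data = C / MSS       # packets/s drained to the receiver
    r_in_ack = r_out_data      # one ACK per delivered data packet
    r_in_data = 2 * r_in_ack   # slow start: two packets per ACK
    return r_in_data * MSS     # bytes/s arriving at the switch

# Example: a 1.5 Gb/s drain rate implies up to 3 Gb/s arriving,
# i.e. the internal channel must provide at least 2C.
```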

SLIDE 13

Prototype

  • A 16×GE-port switch using 4 ServerSwitch cards
  • HP z800 workstation
    – 8 CPU cores
    – 48 GB DRAM
  • Kernel code for packet forwarding
  • User-space code for switch ASIC management

[Figure: prototype with 16×GE front ports and 10GE links]

SLIDE 14

Large Forwarding Table

  • 10 min of synthesized traffic using the flow size distribution from DCN measurements
  • 1,792 HW flow entries

  Interval ratio   Total bytes (GB)   # of active flows   TFOR
  1x               33.6               10,644              96.1%
  1/10x            336                106,544             90.5%

[Figure: testbed topology with senders S and receivers R]

SLIDE 15

Deep Packet Buffer

  • 1,024 requests; 15 servers send responses to one client
  • TCP flash crowds last for 1 second

           SYN/ACK timeout   Data timeout   Fast Recovery   Packet drops
  TCP      109               180            690             15,962
  DCTCP    23                395            173             3,302
  DeepBuf

SLIDE 16

Conclusions

  • Two major limitations of current commodity switches
    – Limited forwarding table for OpenFlow
    – Shallow packet buffer for bursty traffic patterns
  • Use the CPU as a traffic co-processor to address these two limitations

SLIDE 17

QUESTIONS?