Tagger: Practical PFC Deadlock Prevention in Data Center Networks



SLIDE 1

Tagger: Practical PFC Deadlock Prevention in Data Center Networks

Shuihai Hu* (HKUST), Yibo Zhu, Peng Cheng, Chuanxiong Guo* (Toutiao), Kun Tan* (Huawei), Jitendra Padhye, Kai Chen (HKUST)
Microsoft

CoNEXT 2017, Incheon, South Korea

* Work done while at Microsoft

SLIDE 2

RDMA is Being Widely Deployed

RDMA: Remote Direct Memory Access

• High throughput and low latency with low CPU overhead
• Microsoft, Google, etc. are deploying RDMA

[Figure: two RDMA applications bypass the kernel via RDMA NICs and communicate over a lossless network (with PFC)]

SLIDE 3

Priority Flow Control (PFC)

• PAUSE the upstream switch when the PFC threshold is reached
• Avoids packet drops due to buffer overflow

[Figure: congestion builds at a switch queue; once the PFC threshold (3 packets in this example) is reached, a PAUSE frame is sent upstream]
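The PAUSE mechanism above can be sketched as a toy model in Python. The class and names are hypothetical, not a switch API, and real PFC uses a separate, lower RESUME threshold for hysteresis; here both thresholds coincide for brevity:

```python
# Minimal sketch of PFC ingress-side behavior: when the ingress queue
# reaches the PFC threshold, the switch pauses the upstream sender
# instead of dropping packets; it resumes once the buffer drains.

PFC_THRESHOLD = 3  # packets, matching the slide's example

class IngressQueue:
    def __init__(self):
        self.occupancy = 0
        self.paused_upstream = False

    def enqueue(self):
        self.occupancy += 1
        if self.occupancy >= PFC_THRESHOLD and not self.paused_upstream:
            self.paused_upstream = True   # send PFC PAUSE upstream

    def dequeue(self):
        if self.occupancy > 0:
            self.occupancy -= 1
        if self.occupancy < PFC_THRESHOLD and self.paused_upstream:
            self.paused_upstream = False  # send PFC RESUME upstream

q = IngressQueue()
for _ in range(3):
    q.enqueue()
print(q.paused_upstream)  # True: threshold reached, upstream paused
q.dequeue()
print(q.paused_upstream)  # False: buffer drained, upstream resumed
```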

SLIDE 4

A Simple Illustration of PFC Deadlock

• Due to Cyclic Buffer Dependency (CBD): A -> B -> C -> A
• Not just a theoretical problem: we have seen it in our data centers too!

[Figure: switches A, B, and C each reach the PFC threshold and PAUSE one another in a cycle]
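The deadlock condition can be checked mechanically: PFC deadlock is possible only if the buffer dependency graph contains a directed cycle. A minimal sketch, with a hypothetical adjacency-dict encoding of the graph:

```python
# Detect cyclic buffer dependency (CBD) with a DFS over the directed
# buffer dependency graph. The example edges reproduce the cycle from
# the later Clos slides: L2 -> S1 -> L3 -> S2 -> L2.

def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def dfs(v):
        color[v] = GRAY
        for w in graph.get(v, []):
            if color.get(w, WHITE) == GRAY:   # back edge: cycle found
                return True
            if color.get(w, WHITE) == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in list(graph))

deps = {"L2": ["S1"], "S1": ["L3"], "L3": ["S2"], "S2": ["L2"]}
print(has_cycle(deps))  # True: an A->B->C->A style cycle, deadlock possible
```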

SLIDE 5

CBD in the Clos Network

[Figure: Clos topology with spines S1-S2, leaves L1-L4, and ToRs T1-T4]

SLIDE 6

CBD in the Clos Network

Consider two flows that initially follow shortest UP-DOWN paths.

[Figure: flows 1 and 2 on shortest UP-DOWN paths in the Clos topology]

SLIDE 7

CBD in the Clos Network

Due to link failures, both flows are locally rerouted onto non-shortest paths.

[Figure: flows 1 and 2 rerouted around the failed links]

SLIDE 8

CBD in the Clos Network

These two DOWN-UP bounced flows create a CBD: L2 -> S1 -> L3 -> S2 -> L2.

[Figure: buffer dependency graph over the ingress (RX) buffers of L2, S1, L3, and S2, forming a cycle]

SLIDE 9

Real in Production Data Centers?

Packet reroute measurements in more than 20 data centers: ~100,000 DOWN-UP reroutes!

SLIDE 10

Handling Deadlock is Important

• #1: a transient problem → a PERMANENT deadlock
  – Transient loops due to link failures
  – Packet flooding
  – …
• #2: a small deadlock can cause a large deadlock

[Figure: PAUSE frames propagate outward from a deadlocked cycle, spreading the deadlock across the network]

SLIDE 11

Three Key Challenges

What are the challenges in designing a practical deadlock prevention solution?

• No change to existing routing protocols or hardware
• Link failures & routing errors are unavoidable at scale
• Switches support at most 8 lossless priorities (and typically only two can be used)

SLIDE 12

Existing Deadlock Prevention Solutions

• #1: deadlock-free routing protocols
  – not supported by commodity switches (fail challenge #1)
  – do not work with link failures or routing errors (fail challenge #2)
• #2: buffer management schemes
  – require a lot of lossless priorities (fail challenge #3)

Our answer: Tagger

SLIDE 13

TAGGER DESIGN

SLIDE 14

Important Observation

• Fat-tree [SIGCOMM'08], VL2 [SIGCOMM'09]: desired path set = all shortest paths
• BCube [SIGCOMM'09], HyperX [SC'09]: desired path set = dimension-order paths

Takeaway: In a data center, we can ask the operator to supply a set of expected lossless paths (ELP)!

SLIDE 15

Basic Idea of Tagger

• 1. Ask operators to provide: topology & expected lossless paths (ELP)
• 2. Packets carry tags while in the network
• 3. Pre-install match-action rules at switches for tag manipulation and packet queueing
  – packets traveling over the ELP: lossless queues, and CBD never forms
  – packets deviating from the ELP: lossy queue, so PFC is not triggered

SLIDE 16

Illustrating Tagger for the Clos Topology

ELP = all shortest paths (CBD-free)

Root cause of CBD: packets deviate from UP-DOWN routing!

[Figure: flows 1 and 2 in the Clos topology]

SLIDE 17

Illustrating Tagger for the Clos Topology

• Under Tagger, packets carry tags while traveling in the network
• Initially, tag value = NoBounce
• At switches, Tagger pre-installs match-action rules for tag manipulation

Match-action rules installed at switches (example for flow 1 at L3):

  Tag       InPort   OutPort   NewTag
  NoBounce  S1       S2        Bounced
  …         …        …         …
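The rule table behaves like an exact match on (Tag, InPort, OutPort) with a tag-rewrite action. A minimal lookup sketch, where a Python dict stands in for the switch's ACL/TCAM rules:

```python
# Pre-installed match-action rules at leaf switch L3 (from the slide):
# a packet arriving from spine S1 and leaving toward spine S2 is a
# DOWN-UP bounce, so its tag is rewritten from NoBounce to Bounced.
rules = {
    ("NoBounce", "S1", "S2"): "Bounced",  # DOWN-UP bounce detected
}

def apply_rules(tag, in_port, out_port):
    # Default action when no rule matches: keep the tag unchanged.
    return rules.get((tag, in_port, out_port), tag)

print(apply_rules("NoBounce", "S1", "S2"))  # 'Bounced'
print(apply_rules("NoBounce", "T3", "S2"))  # 'NoBounce' (normal UP hop)
```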

SLIDE 18

Illustrating Tagger for the Clos Topology

A packet of flow 1, carrying tag = NoBounce, is received by switch L3.

[Figure: flow 1's packet arrives at L3 from S1; the match-action table (Tag = NoBounce, InPort = S1, OutPort = S2 → NewTag = Bounced) is installed at the switch]

SLIDE 19

Illustrating Tagger for the Clos Topology

A DOWN-UP bounce is observed! The matching rule rewrites the tag to Bounced as soon as the bounce (InPort = S1, OutPort = S2) is detected.

[Figure: flow 1's packet leaves L3 toward S2 with its tag rewritten to Bounced]

SLIDE 20

Illustrating Tagger for the Clos Topology

• S2 knows it is a bounced packet that deviates from the ELP → placed in the lossy queue
• No PFC PAUSE is sent from S2 to L3 → the buffer dependency from L3 to S2 is removed

[Figure: flow 1's packet, now tagged Bounced, enters S2's lossy queue]

SLIDE 21

Illustrating Tagger for the Clos Topology

• Tagger does the same for packets of flow 2
• Two buffer dependency edges are removed → the CBD is eliminated

[Figure: buffer dependency graph before (cycle L2 -> S1 -> L3 -> S2 -> L2 over the RX buffers) and after (the two edges removed, leaving no cycle)]

SLIDE 22

What If ELP Has CBD?

ELP = shortest paths + 1-bounce paths (the ELP itself has CBD now!)

[Figure: Clos topology with 1-bounce paths included in the ELP]

SLIDE 23

Segmenting ELP into CBD-free Subsets

• The two bounced paths are in the ELP now
• Path segments before the bounce: only UP-DOWN paths, no CBD
• Path segments after the bounce: only UP-DOWN paths, no CBD

[Figure: the bounced paths of flows 1 and 2, split into the segments before and after the bounce]
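The segmentation above can be sketched as splitting a path at every DOWN -> UP turn, so that each resulting segment is a pure UP-DOWN path. The hop encoding below ('UP'/'DOWN' moves) is a hypothetical simplification of the real topology-aware rules:

```python
# Split a path into CBD-free segments: a new segment starts whenever a
# DOWN move is followed by an UP move (a bounce). Segment i would then
# be assigned tag i+1.

def segment_path(moves):
    segments, current, prev = [], [], None
    for move in moves:
        if prev == "DOWN" and move == "UP":  # bounce: start a new segment
            segments.append(current)
            current = []
        current.append(move)
        prev = move
    segments.append(current)
    return segments

# Flow 1 in the slides: up to a spine, bounced DOWN-UP at a leaf, then down.
path = ["UP", "UP", "DOWN", "UP", "DOWN", "DOWN"]
print(segment_path(path))  # [['UP', 'UP', 'DOWN'], ['UP', 'DOWN', 'DOWN']]
```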

SLIDE 24

Isolating Path Segments with Tags

• tag 1 → path segments before the bounce
• tag 2 → path segments after the bounce

[Figure: flows 1 and 2 with pre-bounce segments labeled tag 1 and post-bounce segments labeled tag 2]

SLIDE 25

Isolating Path Segments with Tags

Adding a rule at switch L3: (Tag = 1, InPort = S1, OutPort = S2) → NewTag = 2

[Figure: flow 1 carries tag = 1 before the bounce at L3 and tag = 2 afterwards]

SLIDE 26

No CBD after Segmentation

Packets with tag i → i-th lossless queue.

[Figure: the former cycle L2 -> S1 -> L3 -> S2 -> L2 in the buffer dependency graph now alternates between queue 1 and queue 2, so no cycle remains within any single lossless queue]

SLIDE 27

What If k-bounce Paths Are All in ELP?

ELP = shortest UP-DOWN paths + 1-bounce paths + … + k-bounce paths

Solution: just segment the ELP into CBD-free subsets based on the number of bounces so far (one subset per bounce count, i.e., k+1 subsets for paths with up to k bounces)!

SLIDE 28

Summary: Tagger Design for the Clos Topology

• 1. Initially, packets carry tag = 1
• 2. Pre-install match-action rules at switches:
  – On a DOWN-UP bounce: increase the tag by 1
  – Enqueue packets with tag i into the i-th lossless queue (i <= k+1)
  – Enqueue packets with tag i into the lossy queue (i > k+1)

For the Clos topology, Tagger is optimal in terms of the number of lossless priorities.
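The summary above amounts to a small amount of per-hop logic. A sketch under the slide's assumptions (ELP covers up to k bounces, so k+1 lossless queues are available; function names and the port-direction flags are illustrative):

```python
# Per-hop Tagger logic for a Clos network: tags start at 1, every
# DOWN-UP bounce increments the tag, and tag i maps to the i-th
# lossless queue until the budget of k+1 lossless priorities runs out.

K = 1             # number of bounces covered by the ELP (slides: 1-bounce)
LOSSY_QUEUE = -1  # sentinel for the single lossy queue

def next_tag(tag, in_port_is_down, out_port_is_up):
    # A DOWN-UP turn (received from a spine, forwarded back toward a
    # spine) means the packet deviates further from UP-DOWN routing.
    if in_port_is_down and out_port_is_up:
        return tag + 1
    return tag

def select_queue(tag, k=K):
    # Tags within the ELP budget stay lossless; beyond it, the packet
    # goes to the lossy queue and can never trigger a PFC PAUSE.
    return tag if tag <= k + 1 else LOSSY_QUEUE

tag = 1                            # initial tag at the sender
tag = next_tag(tag, True, True)    # first DOWN-UP bounce
print(select_queue(tag))           # 2: second lossless queue, still in ELP
tag = next_tag(tag, True, True)    # a second bounce exceeds the ELP
print(select_queue(tag))           # -1: lossy queue, PFC not triggered
```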

SLIDE 29

How to Implement Tagger?

• Use the DSCP field in the IP header as the tag carried in packets
• Build a 3-step match-action pipeline with the basic ACL rules available in commodity switches
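Since DSCP occupies the upper 6 bits of the IP ToS byte, a tag in the range 0-63 fits without any extra header. A host-side encoding sketch (hypothetical function names, not the switch ASIC pipeline):

```python
# Carry the Tagger tag in the IPv4 DSCP field: DSCP is the top 6 bits
# of the ToS byte; the bottom 2 bits are the ECN field and must be
# preserved, since RDMA deployments also rely on ECN marking.

def encode_tag(tos_byte, tag):
    assert 0 <= tag < 64, "DSCP holds only 6 bits"
    return (tag << 2) | (tos_byte & 0x03)  # keep the 2 ECN bits intact

def decode_tag(tos_byte):
    return tos_byte >> 2

tos = encode_tag(0b00000010, tag=2)  # tag 2, ECN bits preserved
print(decode_tag(tos))  # 2
print(tos & 0x03)       # 2: ECN bits unchanged
```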

SLIDE 30

Tagger Meets All Three Challenges

• 1. Works with existing routing protocols & hardware
• 2. Works with link failures & routing errors
• 3. Works with a limited number of lossless queues
SLIDE 31

More Details in the Paper

• Proof of deadlock freedom
• Analysis & discussions
  – Algorithm complexity
  – Optimality
  – Compression of match-action rules
  – …

SLIDE 32

Evaluation 1: Tagger Prevents Deadlock

Scenario: two flows form a CBD; without Tagger, this leads to deadlock.

Tagger avoids the CBD caused by bounced flows and prevents deadlock!

[Figure: the two bounced flows in the Clos topology, annotated "deadlock!" for the run without Tagger]

SLIDE 33

Evaluation 2: Scalability of Tagger

Tagger is scalable in terms of the number of lossless priorities and ACL rules.

[Table: match-action rules and priorities required for the Jellyfish topology; the last entry includes an additional 20,000 random paths]

SLIDE 34

Evaluation 3: Overhead of Tagger

Tagger rules have no impact on throughput or latency.

SLIDE 35

Conclusion

• Tagger: a tagging system that guarantees deadlock freedom
  – Practical:
    • requires no change to existing routing protocols
    • implementable with existing commodity switching ASICs
    • works with a limited number of lossless priorities
  – General:
    • works with any topology
    • works with any ELP

SLIDE 36

Thanks!