Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers



slide-1
SLIDE 1

Fast and Cautious:

Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers

Guo Chen

Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong (Larry) Luo, Yongqiang Xiong, Xiaoliang Wang, and Youjian Zhao

slide-2
SLIDE 2

Motivation

• Services care about the tail flow completion time (tail FCT)
  • Large number of flows generated in each operation
  • Overall performance governed by the last completed flows

16/6/25 2

[Figure: a large-scale web application hosted in a Data Center Network (DCN); many App Logic servers respond to a single user request]
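The claim that the last completed flows govern an operation can be illustrated with a small calculation. This is a minimal sketch, assuming flows are independent and that a fixed fraction of them are "slow"; the function names and the numbers in the usage note are illustrative, not taken from the talk.

```python
def operation_completion_time(flow_fcts):
    """An operation finishes only when its slowest flow finishes."""
    return max(flow_fcts)


def prob_operation_hits_tail(num_flows, slow_fraction):
    """Probability that at least one of num_flows independent flows is slow.

    slow_fraction is the per-flow probability of landing in the tail
    (for example 0.01 for the 99th percentile). Independence is an assumption.
    """
    if num_flows < 0 or not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("num_flows must be >= 0 and slow_fraction in [0, 1]")
    return 1.0 - (1.0 - slow_fraction) ** num_flows
```

Under these assumptions, an operation that fans out to 100 flows has about a 63% chance of containing at least one 99th-percentile flow, which is why the tail FCT matters more than the average.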

slide-3
SLIDE 3

Motivation

• Services care about the tail flow completion time (tail FCT)
  • Large number of flows generated in each operation
  • Overall performance governed by the last completed flows
• But packet loss hurts tail FCT
  • Real case in a Microsoft Azure DCN: a spine switch with a 2% random drop rate increased the 99th percentile latency of all users

[Figure: DCN tail latency visualization, (a) Normal, (b) Spine failure] [Pingmesh (SIGCOMM’15)]

slide-4
SLIDE 4

Outline

• Motivation
• Packet Loss in DCN
• Impact of Packet Loss
• Challenge for Loss Recovery
• FUSO Design
• Evaluation
• Summary

slide-5
SLIDE 5

Packet Loss in DCN

• Loss characteristics
  • Measured in a Microsoft production DCN during Dec. 1st-5th, 2015

[Figure: loss rate and location distribution of lossy links (loss rate > 1%). Annotations: mean loss rate 4%; 78% above ToR; similar across the 5 days]

1) Loss happens frequently (although the overall loss rate is low)
2) Most losses happen in the network instead of at the edge

slide-6
SLIDE 6

Packet Loss in DCN

• Reasons causing loss
  • Congestion loss (bursty; transient)
    • Uneven load-balance
    • Incast
    • Greatly mitigated (e.g., 1% -> 0.01%) [Jupiter Rising SIGCOMM’15]
  • Failure loss (complex; hard to detect)
    • Silent random drop
    • Packet black-hole
    • Common & huge impact on performance [Pingmesh SIGCOMM’15]

slide-7
SLIDE 7

Outline

• Motivation
• Packet Loss in DCN
• Impact of Packet Loss
  • Why does loss hurt the tail?
  • How badly does loss hurt?
• Challenge for Loss Recovery
• FUSO Design
• Evaluation
• Summary

slide-8
SLIDE 8

How TCP Handles Loss?

• Fast recovery
  • Wait for a certain number of duplicate ACKs (DACKs) to detect the loss and retransmit

[Figure: sender/receiver timeline with labels: 1-2, Ack 1-2, 3-6, DupAck 3, Retran 3; intervals: RTT, RTT]

slide-9
SLIDE 9

How TCP Handles Loss?

• Fast recovery
  • Wait for a certain number of duplicate ACKs (DACKs) to detect the loss and retransmit
• Timeout (RTO)
  • If not enough DACKs return, retransmit after a timeout
  • RTO >> RTT, e.g., RTO = 5 ms, RTT < 100 us [Pingmesh (SIGCOMM’15), DCTCP (SIGCOMM’10)]

[Figure: sender/receiver timeline with labels: 1-2, Ack 1-2, 3-6, Retran 3; intervals: RTT, Timeout]

slide-10
SLIDE 10

How TCP Handles Loss?

• Fast recovery
  • Wait for a certain number of duplicate ACKs (DACKs) to detect the loss and retransmit
• Timeout (RTO)
  • If not enough DACKs return, retransmit after a timeout
  • RTO >> RTT, e.g., RTO = 5 ms, RTT < 100 us [Pingmesh (SIGCOMM’15), DCTCP (SIGCOMM’10)]

[Figure: sender/receiver timeline with labels: 1-2, Ack 1-2, 3-6, Retran 3; intervals: RTT, Timeout]

Encountering one RTO -> dramatically increases the FCT
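To make the RTO penalty concrete, the arithmetic can be sketched as follows. This is a rough illustration that uses the slide's example values (RTO = 5 ms, RTT taken at its 100 us upper bound); the number of round trips a short flow needs is a hypothetical parameter, not a figure from the talk.

```python
def fct_seconds(rtt_s, round_trips, rto_s, timeouts=0):
    """Rough flow completion time: round trips plus any timeouts that were hit.

    Ignores transmission and queuing delay; every timeout adds one full RTO.
    """
    return round_trips * rtt_s + timeouts * rto_s


RTT = 100e-6   # 100 us, the upper bound quoted on the slide
RTO = 5e-3     # 5 ms, the example RTO quoted on the slide
ROUNDS = 3     # hypothetical: a short flow that finishes in 3 round trips

clean = fct_seconds(RTT, ROUNDS, RTO)
one_timeout = fct_seconds(RTT, ROUNDS, RTO, timeouts=1)
inflation = one_timeout / clean
```

With these values a single timeout stretches the flow from about 0.3 ms to about 5.3 ms, an inflation of roughly 17 to 18 times.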

slide-11
SLIDE 11

Loss Incurs Timeout

[Figure: timeout probability of flows with different sizes passing a path with different packet loss rates. Curves: 10KB (testbed), 100KB (testbed), 100KB (analysis), 10KB (analysis)]

  • 1. 1% loss -> more than 1% of flows time out (99th FCT > RTO)
  • 2. Larger flows (e.g., 100KB)
    a. Timeout ratio grows sharply when loss rate > 1% (3% loss -> ~10% timeout)

• A little loss causes enough timeouts to hurt the tail FCT
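Why a small loss rate produces a disproportionate number of timeouts can be sketched with a simplified model. This is not the analysis behind the "analysis" curves in the figure; it is an assumption-laden illustration in which packets are lost independently, a loss among the last three packets cannot gather enough DACKs for fast retransmit, and a fast retransmission that is itself lost also ends in a timeout. The MSS of 1460 bytes is likewise an assumed value.

```python
import math


def packets_in_flow(size_bytes, mss_bytes=1460):
    """Number of full-or-partial segments needed for a flow (MSS is assumed)."""
    return max(1, math.ceil(size_bytes / mss_bytes))


def timeout_probability(num_packets, loss_rate, dupack_threshold=3):
    """Probability that a flow hits at least one RTO under the simplified model.

    Tail packets (the last dupack_threshold ones) time out if lost at all.
    Earlier packets time out only if both the packet and its retransmission
    are lost. Bursty (correlated) loss is ignored.
    """
    if num_packets < 1 or not 0.0 <= loss_rate <= 1.0:
        raise ValueError("num_packets must be >= 1 and loss_rate in [0, 1]")
    tail = min(dupack_threshold, num_packets)
    body = num_packets - tail
    p_body_ok = (1.0 - loss_rate ** 2) ** body
    p_tail_ok = (1.0 - loss_rate) ** tail
    return 1.0 - p_body_ok * p_tail_ok
```

Under these assumptions a 7-packet flow on a path with 1% loss times out about 3% of the time, which is already enough to push the 99th percentile FCT past the RTO.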

slide-12
SLIDE 12

Loss Incurs Timeout

[Figure: timeout probability of flows with different sizes passing a path with different packet loss rates. Curves: 10KB (testbed), 100KB (testbed), 100KB (analysis), 10KB (analysis)]

  • 1. 1% loss -> more than 1% of flows time out (99th FCT > RTO)
  • 2. Larger flows (e.g., 100KB)
    a. Timeout ratio grows sharply when loss rate > 1% (3% loss -> ~10% timeout)

• A little loss causes enough timeouts to hurt the tail FCT

To avoid RTO

slide-13
SLIDE 13

Outline

• Motivation
• Packet Loss in DCN
• Impact of Packet Loss
• Challenge for Loss Recovery
• FUSO Design
• Evaluation
• Summary

slide-14
SLIDE 14

Challenge for TCP Loss Recovery

• Prior works add aggressiveness to congestion control to do loss recovery before timeout (RTO)
  • Tail Loss Probe (TLP) [SIGCOMM’13, RFC 5827]
    • Transmit one probe after 2 RTTs
  • Instant Recovery (TCP-IR) [SIGCOMM’13, RFC 5827]
    • Generate an FEC packet for every group of packets (up to 16)
    • FEC packets also act as probes, delayed 1/4 RTT before being sent
  • Proactive/RepFlow [SIGCOMM’13, INFOCOM’14]
    • Duplicate every packet/flow
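The idea of one FEC packet protecting a group of packets can be sketched with plain XOR parity. The slides do not describe TCP-IR's actual encoding, so XOR parity is an assumption chosen as the simplest scheme that recovers a single loss per group; the function names are hypothetical.

```python
from functools import reduce


def xor_parity(packets):
    """Build one parity packet by XOR-ing a group of packets byte by byte.

    Shorter packets are zero-padded to the longest length in the group.
    """
    if not packets:
        raise ValueError("need at least one packet")
    size = max(len(p) for p in packets)
    padded = [p.ljust(size, b"\x00") for p in packets]
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def recover_missing(received, parity):
    """Rebuild the single missing packet of a group from the rest plus parity.

    Works only when exactly one packet of the group was lost; the result is
    padded to the group's longest packet length.
    """
    return xor_parity(list(received) + [parity])
```

For example, with a group of three equal-length packets, XOR-ing the two that arrived with the parity packet yields the lost one. Two losses in the same group cannot be repaired this way, which is one reason bursty congestion loss is hard for a fixed FEC scheme.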

slide-15
SLIDE 15

Challenge for TCP Loss Recovery

• How long to wait before sending recovery packets?
  • For congestion loss
    • Should delay long enough to avoid worsening congestion
    • Bursty: leads to multiple consecutive losses [Incast (WREN’09), DCTCP (SIGCOMM’10)]

slide-16
SLIDE 16

Challenge for TCP Loss Recovery

• How long to wait before sending recovery packets?
  • For congestion loss
    • Should delay long enough to avoid worsening congestion
  • For failure loss such as random drop
    • Should recover as fast as possible; any wait already increases the FCT
    • Waiting 2 RTTs is too costly [TLP SIGCOMM’13, RFC 5827]
    • Accurate, high-precision RTT measurement is challenging

slide-17
SLIDE 17

Brief Summary

• Loss easily incurs timeout, which hurts the tail
• To prevent timeout, prior works add fixed aggressiveness to recover loss before timeout
• Hard to adapt to various loss conditions
  • Should be fast for failure loss
  • Should be cautious for congestion loss

How to recover from loss as fast as possible under various loss conditions, without causing congestion?

slide-18
SLIDE 18

Outline

• Motivation
• Packet Loss in DCN
• Impact of Packet Loss
• Challenge for Loss Recovery
• FUSO Design
• Evaluation
• Summary

slide-19
SLIDE 19

FUSO: Fast Multi-path Loss Recovery

• Utilize the “good” paths to proactively conduct loss recovery for “bad” paths
  • Leveraging path diversity (multiple paths; a few encounter loss)
• Fast and Cautious
  • Fast
    • Proactive (immediate) recovery for potential packet loss, utilizing spare transmission opportunities
  • Cautious
    • Strictly follow congestion control without adding aggressiveness

slide-20
SLIDE 20

Multi-path Transport Background

[Figure: sender and receiver connected by sub-flows SF1, SF2, SF3, with per-sub-flow windows CWND1, CWND2, CWND3 inside CWNDtotal. Labeled components: multi-path congestion control; data distribution; sub-flows, implicitly/explicitly mapped to physical paths]

slide-21
SLIDE 21

FUSO

[Figure, animation frames for slides 21-25: sender and receiver connected by sub-flows SF1, SF2, SF3 (CWND1, CWND2, CWND3 inside CWNDtotal); packets P1-P5 are in flight across the sub-flows]

slide-26
SLIDE 26

FUSO

[Figure, animation frames for slides 26-35: same sender/receiver layout with sub-flows SF1-SF3. P2 is marked Lost while P1, P3, P4, P5 continue toward the receiver; ACKs return first for P3, then for P1 and for P4 & P5, leaving P2 un-ACKed]

slide-36
SLIDE 36

FUSO

[Figure, animation frames for slides 36-43: same sender/receiver layout with sub-flows SF1-SF3. P2 is marked Lost on the “worst” sub-flow. Another sub-flow has spare CWND and no new data to send; as the “best” sub-flow it carries a copy of P2 (proactive loss recovery). Final frame: Done!]

slide-44
SLIDE 44

Standard MPTCP

[Figure: same sender/receiver layout with sub-flows SF1-SF3 (CWND1, CWND2, CWND3 inside CWNDtotal) and packets P1-P5; P2 is marked Lost and is retransmitted only after an RTO]

slide-45
SLIDE 45

FUSO: Path Selection

[Figure: sender with sub-flows SF1, SF2, SF3 (CWND1, CWND2, CWND3 inside CWNDtotal); P2 is marked Lost; sub-flows are compared by their possibility of encountering loss]

slide-46
SLIDE 46

FUSO: Path Selection

• “Worst” sub-flow
  • With un-ACKed data
  • Most likely having loss

[Figure: sender with sub-flows SF1, SF2, SF3 (CWND1, CWND2, CWND3 inside CWNDtotal); P2 is marked Lost on the “worst” sub-flow, which holds un-ACKed data; sub-flows are compared by their possibility of encountering loss]

slide-47
SLIDE 47

FUSO: Path Selection

• “Worst” sub-flow
  • With un-ACKed data
  • Most likely having loss
• “Best” sub-flow
  • With spare CWND
  • Least likely having loss

[Figure: sender with sub-flows SF1, SF2, SF3 (CWND1, CWND2, CWND3 inside CWNDtotal); P2 is marked Lost on the “worst” sub-flow; the “best” sub-flow has spare CWND; sub-flows are compared by their possibility of encountering loss]
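The two selection rules above can be sketched as a pair of small functions. This is a minimal illustration, not the FUSO kernel code: the `SubFlow` fields and in particular `loss_estimate` (a per-sub-flow score standing in for the "possibility of encountering loss") are hypothetical, since the slides do not say how that possibility is computed.

```python
from dataclasses import dataclass


@dataclass
class SubFlow:
    name: str
    cwnd: int             # congestion window, in packets
    in_flight: int        # un-ACKed packets currently outstanding
    loss_estimate: float  # hypothetical loss-likelihood score in [0, 1]

    @property
    def spare_cwnd(self):
        """Window space not used by outstanding packets."""
        return max(0, self.cwnd - self.in_flight)


def pick_worst(subflows):
    """'Worst' sub-flow: has un-ACKed data and is most likely to have loss."""
    candidates = [s for s in subflows if s.in_flight > 0]
    return max(candidates, key=lambda s: s.loss_estimate, default=None)


def pick_best(subflows):
    """'Best' sub-flow: has spare CWND and is least likely to have loss."""
    candidates = [s for s in subflows if s.spare_cwnd > 0]
    return min(candidates, key=lambda s: s.loss_estimate, default=None)
```

Both functions return `None` when no sub-flow qualifies, in which case there is nothing to recover or no window to recover it with.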

slide-48
SLIDE 48

FUSO in 1 Slide

• If (spare CWND) && (no new data)
  • Utilize the transmission opportunity to proactively recover
  • Use “good” paths to help “bad” paths
• Multi-path diversity offers many transmission opportunities
  • “Good” paths have spare window

[Figure: sender and receiver, each with App Data above Sub-Flow 1, Sub-Flow 2, ..., Sub-Flow N, under multipath congestion control. Sub-flows hold un-ACKed data or spare window; recovery packets (R) for un-ACKed data are sent to the best sub-flow, and the receiver recovers the data from them]
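The rule on this slide can be read as a decision taken at each transmission opportunity. The sketch below is one hedged interpretation, not the actual Linux implementation: the dictionary layout, the `loss` score, the choice of the first sub-flow with spare window for new data, and the rule that the recovery copy must travel on a different sub-flow than the one that lost it are all assumptions made for illustration.

```python
def decide(has_new_data, subflows):
    """Decide what to send on a transmission opportunity.

    subflows maps a name to {"cwnd": int, "unacked": list, "loss": float}.
    Returns ("send_new", subflow), ("recover", packet, worst, best), or None.
    """
    spare = [n for n, s in subflows.items() if s["cwnd"] - len(s["unacked"]) > 0]
    if not spare:
        return None  # cautious: never send beyond the congestion window
    if has_new_data:
        return ("send_new", spare[0])  # new data always takes priority
    # Spare window and no new data: use the opportunity for recovery.
    lossy = [n for n, s in subflows.items() if s["unacked"]]
    if not lossy:
        return None
    worst = max(lossy, key=lambda n: subflows[n]["loss"])
    helpers = [n for n in spare if n != worst]
    if not helpers:
        return None
    best = min(helpers, key=lambda n: subflows[n]["loss"])
    # Re-send the oldest un-ACKed packet of the worst sub-flow on the best one.
    return ("recover", subflows[worst]["unacked"][0], worst, best)
```

The function only spends window that congestion control has already granted, which is how a scheme of this shape can be fast without adding aggressiveness.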

slide-49
SLIDE 49

FUSO Implementation

• Implemented in Linux kernel; ~900 lines of code

https://github.com/1989chenguo/FUSO

slide-50
SLIDE 50

Outline

• Motivation
• Packet Loss in DCN
• Impact of Packet Loss
• Challenge for Loss Recovery
• FUSO Design
• Evaluation
• Summary

slide-51
SLIDE 51

Testbed Settings

• Network
  • 1Gbps fabric & 1Gbps hosts; ECMP routing; ECN enabled
• TCP
  • Init_cwnd=16; min_RTO=5ms

slide-52
SLIDE 52

Testbed Results

• Failure loss
  • Random-drop (loss rate: 0.125%-4%)

[Figure: 99th FCT and % of flows encountering timeout, for latency-sensitive flows; “better” direction marked]

Fast: reducing the 99th FCT by up to ~82.3%; reducing the flows encountering timeout by up to 100%

slide-53
SLIDE 53

Testbed Results

• Congestion loss
  • Incast

[Figure: results under incast with concurrent responses; “better” direction marked]

Cautious: performs the best

slide-54
SLIDE 54

Testbed Results

• Failure loss & congestion loss
  • From failure-loss-dominated to congestion-loss-dominated (loss rate: 2%)

[Figure: latency-sensitive flows versus background long flows; “better” direction marked]

Adapts to various loss conditions

slide-55
SLIDE 55

Larger-scale Simulations

• Simulation settings
  • NS2 simulator; 3-layer, 4-port FatTree
  • 40Gbps fabric, 10Gbps host; 64 hosts, 20 switches
  • Empirical failure generation

[Figure: simulated topology with latency-sensitive flows, background long flows, and random failure]

slide-56
SLIDE 56

Larger-scale Simulations

• Simulation settings
  • NS2 simulator; 3-layer, 4-port FatTree fabric
  • 40Gbps fabric, 10Gbps host; 64 hosts, 20 switches
  • Empirical failure generation

[Figure: simulation results; “better” direction marked]

Reducing the average FCT by up to ~60.3%; reducing the 99th FCT by up to ~87.4%

slide-57
SLIDE 57

Outline

• Motivation
• Packet Loss in DCN
• Impact of Packet Loss
• Challenge for Loss Recovery
• FUSO Design
• Evaluation
• Summary

slide-58
SLIDE 58

Summary

• Loss hurts tail latency
  • Loss is not uncommon
  • A little loss leads to enough timeouts to hurt the tail
• Challenges for loss recovery
  • How to accelerate loss recovery under various loss conditions without causing congestion?
• Philosophy for FUSO
  • Being fast and being cautious are equally important
  • Fast: proactive loss recovery utilizing spare transmission opportunities, leveraging multipath diversity
  • Cautious: strictly follows congestion control without adding aggressiveness

slide-59
SLIDE 59

Q&A?

Thanks