RACK: a time-based fast loss recovery draft-ietf-tcpm-rack-01
Yuchung Cheng, Neal Cardwell, Nandita Dukkipati (Google)
IETF97: Seoul, Nov 2016
What's RACK (Recent ACK)?
Key Idea: time-based loss inferences (not packet or sequence counting)
○ If a packet is (S)ACKed, unacked packets sent chronologically before it are either lost or reordered
○ Wait RTT/4 before marking a packet lost, in case the unacked packet is just delayed; RTT/4 is empirically determined
○ Conceptually, one timer per packet sent; the timers are updated by the latest RTT measurement
[Timeline: SYN → SYN/ACK → ACK; P1, P2 sent; SACK of P2 arrives: expect the ACK of P1 by then … wait RTT/4 in case P1 is reordered; then retransmit P1; ACK of P1/P2]
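A minimal sketch of this inference in Python, assuming per-packet transmit timestamps; the names (Packet, rack_detect_losses, reo_wnd) are illustrative, not the draft's pseudocode:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int
    xmit_time: float   # timestamp of this packet's latest (re)transmission
    sacked: bool = False

def rack_detect_losses(packets, rack_xmit_time, rtt):
    """rack_xmit_time: transmit time of the most recently *sent* packet
    known to be delivered (updated on every cumulative ACK / SACK).
    An unacked packet sent more than reo_wnd before that packet is
    deemed lost; a younger one may merely be reordered, so keep
    waiting (a real stack arms a reordering timer and re-checks)."""
    reo_wnd = rtt / 4                     # empirically determined
    lost, waiting = [], []
    for p in packets:
        if p.sacked:
            continue
        if rack_xmit_time - p.xmit_time > reo_wnd:
            lost.append(p)
        elif rack_xmit_time > p.xmit_time:
            waiting.append(p)             # re-check when the timer fires
    return lost, waiting

# P1 and P2 sent 1ms apart; the SACK of P2 arrives while P1 is unacked.
p1, p2 = Packet(1, 0.000), Packet(2, 0.001, sacked=True)
print(rack_detect_losses([p1, p2], p2.xmit_time, rtt=0.020))
# P1 is within RTT/4 of P2, so it lands in `waiting`, not `lost`
```

Note that the rule never counts dupacks, which is what makes it usable on short flows and under reordering.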
What's TLP (Tail Loss Probe)?
○ Tail drops are common on request-response traffic
○ Tail drops lead to timeouts, which are often 10x longer than fast recovery
○ 70% of losses on Google.com are recovered via timeouts
○ Goal: reduce tail latency of request-response transactions
○ Convert RTOs to fast recovery: retransmit the last packet after 2 RTTs to trigger RACK-based fast recovery
○ Past presentations @ IETF 87, 86, 85, 84; previously depended on non-standard FACK
[Timeline: SYN → SYN/ACK → ACK; P1, P2 sent; after 2 RTTs, send TLP (retransmit of P2) to get a SACK and start RACK recovery; SACK of P2; retransmit P1; ACK of P1/P2]
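A hedged sketch of the probe timer arithmetic (illustrative names; the 1.5*RTT + 200ms variant anticipates the delayed-ACK discussion later in this deck):

```python
def tlp_probe_timeout(srtt, flight_size, rto):
    """Probe timeout (PTO): if no ACK arrives within it, retransmit the
    last unacked segment to solicit a SACK that starts RACK recovery.
    With a single packet in flight the peer may be holding a delayed
    ACK, so that case pads the timer instead of using 2*SRTT."""
    if flight_size == 1:
        pto = 1.5 * srtt + 0.200   # worst-case delayed-ACK allowance
    else:
        pto = 2 * srtt
    return min(pto, rto)           # the probe must fire before the RTO

print(tlp_probe_timeout(srtt=0.020, flight_size=10, rto=1.0))  # 0.04s
```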
Problems in existing recovery (e.g., waiting for 3 dupacks to start the repair process):
1. Poor performance
○ Losses on short flows, tail losses, and lost retransmits often resort to timeouts
○ Works poorly with common reordering scenarios, e.g. the last packet is delivered before the first N-1 packets; tolerating that requires a dupack threshold of N-1 (see the toy replay below)
2. Complex
○ Many additional case-by-case heuristics: RFC5681, RFC6675, RFC5827, RFC4653, RFC5682, FACK, thin-dupack (Linux has all of them!)
RACK + TLP's goal is to solve both problems: performant and simple recovery!
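To make the reordering bullet concrete, a toy replay (purely illustrative): one packet is delayed behind the four sent after it, and dupack counting with the standard threshold of 3 fires a spurious fast retransmit that a time-based check would avoid:

```python
DUPTHRESH = 3                               # standard dupack threshold
arrivals = ["P2", "P3", "P4", "P5", "P1"]   # P1 reordered to the back

dupacks = 0
for pkt in arrivals:
    if pkt == "P1":
        print("P1 arrives after all: no retransmit was ever needed")
    else:
        dupacks += 1                # SACKs of P2..P5 don't advance cum-ack
        if dupacks == DUPTHRESH:
            print(f"{dupacks} dupacks: spurious fast retransmit of P1")
# Tolerating this reordering needs a threshold of N-1 == 4; RACK instead
# waits reo_wnd (RTT/4) and sees P1 arrive in time.
```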
A/B test on Google.com in Western Europe for 3 days in Oct 2016
Impact
○ >30% of TLPs are spurious, as indicated by DSACKs
○ TODO: poor-connectivity regions; compare w/ RACK + TLP only
20ms RTT, 10Gbps, 1% random drop, BBR congestion control. Two tests overlaid: A: 9.6Gbps w/ RACK; B: 5.4Gbps w/o RACK
B: w/o RACK: a lost retransmit every 10000 packets causes a timeout
A: w/ RACK: the lost retransmit is repaired in 1 RTT
Overlaid time-sequence graphs of A & B. White line: sequence sent; green line: cumulative ACKs received; purple line: selective acknowledgements; yellow line: highest sequence the receive window allows; red dots: retransmissions
[Timeline; legend: data/RTX, loss probe, ACK]
○ Send a loss probe after 2*RTT
○ The ACK of the loss probe triggers RACK to retransmit the rest (assuming cwnd==3)
○ The ACK of the 2nd loss probe triggers RACK to retransmit the rest
○ The RACK reo_timer fires after RTT/4 to retransmit the rest
[Timeline; legend: data/RTX, loss probe, ACK]
w/o RACK + TLP: slow repair by timeout (diagram assumes RTO = 3*RTT for illustration)
w/ RACK + TLP: same as the previous slide
○ Receiver may delay the ACK: is 2*RTT too aggressive?
■ Alternative: 1.5*RTT + 200ms
○ TLP (a retransmit of the last packet) may mask a real loss event
■ The draft suggests a (slightly complicated) detection mechanism
■ Do we really care about a 1-packet loss event?
○ Number of loss probes: the draft uses 1, but would more help?
○ Can easily be implemented with one real timer, b/c only one is active at any time (see the sketch below)
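A sketch of that single-timer observation, assuming the RTO, TLP probe, and RACK reordering timers are mutually exclusive (names are illustrative):

```python
import enum

class TimerKind(enum.Enum):
    RTO = enum.auto()
    TLP = enum.auto()
    RACK_REO = enum.auto()

class SingleTimer:
    """One real timer backs all three events: arming any event replaces
    whatever was pending, since at most one of RTO / TLP / RACK
    reordering is logically active at any time."""
    def __init__(self):
        self.kind, self.expiry = None, None

    def arm(self, kind, expiry):
        self.kind, self.expiry = kind, expiry

    def on_tick(self, now, handlers):
        if self.expiry is not None and now >= self.expiry:
            kind, self.kind, self.expiry = self.kind, None, None
            handlers[kind]()   # dispatch to the matching recovery logic

timer = SingleTimer()
timer.arm(TimerKind.TLP, 0.040)       # 2*SRTT probe pending
timer.arm(TimerKind.RACK_REO, 0.005)  # recovery started: replace the probe
timer.on_tick(0.006, {TimerKind.RACK_REO: lambda: print("reo_timer fired")})
```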
Retransmission storm induced by spurious RTO:
1. (Spurious) timeout! Mark all packets (P1 … P100) lost, retransmit P1
2. ACK of original P1: retransmit P2, P3 spuriously
3. ACK of original P2: retransmit P4, P5 spuriously
4. … end up spuriously retransmitting everything, doubling the bloat and the queue
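To put numbers on the storm, a toy replay under the assumption (as in steps 2-3 above) that each ACK of an original packet clocks out two more retransmits of packets still marked lost:

```python
FLIGHT = 100
marked_lost = set(range(1, FLIGHT + 1))   # step 1: mark P1..P100 lost
retransmitted = [1]                       # ...and retransmit P1

for acked in range(1, FLIGHT + 1):        # the originals arrive anyway
    marked_lost.discard(acked)
    # each ACK frees cwnd for ~2 more retransmits of "lost" packets
    for p in sorted(marked_lost - set(retransmitted))[:2]:
        retransmitted.append(p)

print(len(retransmitted))   # 100: the whole window sent twice (2x bloat)
```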
Time-series of bytes received by Chrome loading many images in parallel from pinterests.com: incast -> delay spikes -> false RTOs -> spurious RTX storms
[Figure: received bytes over time; legend: (false) RTX data]
Extending RACK + TLP to RTOs could save this!
1. (Spurious) timeout! Mark only the first packet (P1) lost, retransmit P1
2. ACK of original P1: retransmit P99 and P100 (TLP)
3. ACK of original P2 ==> we never retransmitted P2, so stop!
(If the timeout is genuine, step 3 instead receives ACKs of P99 and P100, and RACK then repairs P2 … P98)
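A hedged sketch of the proposed fix in the same toy terms (illustrative names, not draft text):

```python
FLIGHT = 100

def on_rto(retransmit):
    """On timeout, mark and retransmit only the head of the window."""
    retransmit(1)                         # step 1: retransmit P1 only

def on_ack(acked, ever_retransmitted, retransmit):
    """Let ACKs decide what happens next. An ACK covering data we never
    retransmitted (the original P2 arriving) proves the timeout was
    spurious, so stop. After a genuine timeout, SACKs of the probes
    (P99/P100) come back instead and RACK repairs P2..P98 as usual."""
    if acked == 1:
        retransmit(FLIGHT - 1)            # step 2: TLP-style probes
        retransmit(FLIGHT)
    elif acked not in ever_retransmitted:
        print(f"ACK of original P{acked}: spurious timeout, stop repairing")

sent = []
on_rto(sent.append)
on_ack(1, set(sent), sent.append)         # ACK of original P1
on_ack(2, set(sent), sent.append)         # ACK of original P2 => stop
print(sent)                               # [1, 99, 100]: no storm
```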
○ The timeout can then be long and conservative
○ Ends the RTO-tweaking game that risks falsely resetting cwnd to 1
○ Progressively replacing existing conventional approaches
○ Deployed in Linux 4.4, Windows 10/Server 2016, FreeBSD (Netflix)
RACK + TLP Example: tail loss + lost retransmit
[Each step below was a time-sequence diagram: x-axis time, y-axis sequence; legend: packet, SACKed packet, lost packet, TLP; "2RTT" marks the probe timeout]
1. After 2RTT, TLP retransmits the tail, soliciting an ACK/SACK
2. RACK detects from the ACK/SACK that the first 3 packets are lost, and retransmits them
3. After 2RTT, send a TLP again (need to update draft-02 to probe in recovery)
4. The TLP solicits another ACK/SACK
5. The ACK/SACK lets RACK detect that the first two retransmits are lost, and retransmit them (again)
6. The new ACK/SACK indicates the 1st packet is lost for the 3rd time
7. After waiting the reordering window, RACK detects the lost retransmission and retransmits again
8. All acked and repaired: loss rate = 8/4 = 200%!