 
Responding to Spurious Timeouts in TCP
Andrei Gurtov, University of Helsinki
Reiner Ludwig, Ericsson Research

Outline
• Motivation
• Spurious Timeouts in TCP
• Robustness to Packet Losses
• Undoing Congestion Control
• Adapting the Retransmit Timer
• Performance Evaluation
• Conclusions

Motivation
• Delay variation in wireless networks
  – Cell reselections in GPRS last 3-15 s
  – Bandwidth oscillation in CDMA2000
  – Link-layer persistent error recovery
• Deployed aggressive retransmission timers
  – 10 ms granularity and 200 ms minimum in Linux TCP
  – Solaris
[Figure: Sequence Number (B) vs. Time of Day (s), series Snd_Data and Snd_Ack. A TCP trace shows spurious retransmissions caused by two cell reselections in a live GPRS network.]

Spurious Timeouts in TCP
• Spurious timeouts hurt TCP performance
  – Unnecessary retransmissions during go-back-N
  – Disrupted congestion control
    • In the short run, slow start causes congestion
    • In the long run, the reduced ssthresh causes underutilization

Spurious Timeouts in TCP
• Detecting spurious timeouts
  – Eifel, F-RTO, etc.
• Response after detecting a spurious timeout
  – Our focus

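Detection is outside the scope of this talk. However, because the response presupposes it, a minimal sketch of the Eifel detection rule from RFC 3522 may help. The sketch assumes the TCP timestamps option is in use and ignores timestamp wraparound. After a timeout, the sender remembers the TSval of the first retransmission. If the first acceptable ACK echoes an older timestamp, that ACK was triggered by the original transmission, so the timeout was spurious. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EifelDetector:
    """Sketch of Eifel detection (RFC 3522), assuming the TCP timestamps
    option is negotiated. Timestamp wraparound is ignored for brevity."""

    # TSval carried by the first timeout-based retransmission, if any.
    retransmit_ts: Optional[int] = None

    def on_timeout_retransmit(self, tsval: int) -> None:
        # Only the first retransmission after the timeout is remembered.
        if self.retransmit_ts is None:
            self.retransmit_ts = tsval

    def on_first_acceptable_ack(self, tsecr: int) -> bool:
        """Return True if the timeout is judged spurious."""
        if self.retransmit_ts is None:
            return False
        # An ACK echoing a timestamp older than the retransmission's was
        # triggered by the original transmission: the timeout was spurious.
        spurious = tsecr < self.retransmit_ts
        self.retransmit_ts = None  # detection is finished for this timeout
        return spurious


# Original sent with TSval 100, retransmitted after the timeout with TSval 400.
det = EifelDetector()
det.on_timeout_retransmit(400)
print(det.on_first_acceptable_ack(tsecr=100))  # True: the ACK echoes the original
```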
Main Issues w.r.t. Response
1. Robustness to packet losses
  – Danger of genuine timeouts
2. How to restore the congestion control state
  – Does a full restore of cwnd and ssthresh cause a burst?
  – Do partial restore options perform well?
3. How to adapt the retransmit timer
  – To avoid clogging the network with unnecessary retransmissions in the future

1. Robustness to Packet Losses
• State-of-the-art TCP is often sufficient
  – Fast Retransmit + SACK + Limited Transmit
• Heavy losses do trigger genuine timeouts
  – TCP gets low throughput
  – TCP cannot adapt the RTO to a more conservative level
• Solutions
  – FACK works well, but not with reordering
  – NewReno+SACK works almost as well and appears safe for the Internet

2. Undoing Congestion Control
• Full undo: too aggressive?
  – No, it is appropriate; no bursts were observed
• Partial undo leaves the sender idle for a while
  – The flight size is larger than the reduced cwnd
  – The ACK clock can be lost
• A new proposal: keep the ACK clock but continue in congestion avoidance
  – ssthresh = cwnd_old, cwnd = ssthresh

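The sketch below contrasts three of the options, counting windows in segments. "No undo" is the standard RFC 5681 timeout response. "Full undo" restores the saved cwnd and ssthresh. The new proposal is modeled by reading "ssthresh = cwnd_old, cwnd = ssthresh" literally as two assignments applied in order. Under that reading, cwnd returns to its old value and stops below ssthresh, so the sender is no longer in slow start. That reading is an assumption. The slides give no formula for "partial undo", so that option is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcState:
    cwnd: int      # congestion window, in segments
    ssthresh: int  # slow-start threshold, in segments


def in_slow_start(s: CcState) -> bool:
    # RFC 5681: slow start applies while cwnd < ssthresh.
    return s.cwnd < s.ssthresh


def timeout_response(flight_size: int) -> CcState:
    # Standard RTO response (RFC 5681), i.e. "no undo":
    # ssthresh = max(FlightSize / 2, 2 segments), cwnd = 1 segment.
    return CcState(cwnd=1, ssthresh=max(flight_size // 2, 2))


def full_undo(saved: CcState) -> CcState:
    # Restore the state saved just before the spurious timeout.
    return saved


def new_partial_undo(saved: CcState) -> CcState:
    # Assumed literal, in-order reading of "ssthresh = cwnd_old, cwnd = ssthresh":
    # cwnd returns to its pre-timeout value (ACK clock kept) and ssthresh is
    # capped at that value, so the sender leaves slow start.
    ssthresh = saved.cwnd
    cwnd = ssthresh
    return CcState(cwnd=cwnd, ssthresh=ssthresh)


saved = CcState(cwnd=20, ssthresh=64)  # sender was still in slow start
print(timeout_response(flight_size=20))  # CcState(cwnd=1, ssthresh=10)
print(full_undo(saved))                  # CcState(cwnd=20, ssthresh=64)
print(new_partial_undo(saved))           # CcState(cwnd=20, ssthresh=20)
```

In this example, full undo would resume slow start from 20 segments toward 64. The literal new proposal would grow linearly from 20 instead.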
TCP RTO
• TCP does not take RTT samples from retransmitted segments (Karn's algorithm), so a delayed segment that was spuriously retransmitted yields no sample
• TCP with timestamps can take such samples
  – The RTO becomes more conservative but decays quite fast
• TCP with Eifel uses timestamps
  – Already more conservative than standard TCP
  – Also maintains a larger window, which results in a higher RTT and a higher RTO

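A minimal sketch of the difference, assuming the RFC 6298 estimator with a 1 s minimum RTO (pass `min_rto=0.2` to mimic the Linux minimum cited earlier). Timestamps are simplified to send times in seconds, and exponential back-off is left out. Under Karn's algorithm, an ACK for a retransmitted segment is ambiguous, so the delay spike never reaches the estimator. With timestamps, the echoed value identifies the original transmission, so the spike is sampled and the RTO jumps. The names `RtoEstimator` and `rtt_sample` are hypothetical.

```python
from typing import Optional


class RtoEstimator:
    """RFC 6298 RTO estimator; all times in seconds."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, min_rto: float = 1.0, granularity: float = 0.01):
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto = 1.0  # initial RTO before any sample
        self.min_rto = min_rto
        self.g = granularity

    def add_sample(self, r: float) -> None:
        if self.srtt is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.min_rto, self.srtt + max(self.g, self.K * self.rttvar))


def rtt_sample(now: float, sent_at: float, retransmitted: bool,
               tsecr: Optional[float]) -> Optional[float]:
    """Return an RTT sample for an ACK, or None if none may be taken."""
    if tsecr is not None:
        # Timestamps (simplified to send times): the echo names the acked transmission.
        return now - tsecr
    if retransmitted:
        # Karn's algorithm: the ACK is ambiguous, so no sample is taken.
        return None
    return now - sent_at


karn, ts = RtoEstimator(), RtoEstimator()
for est, tsecr in ((karn, None), (ts, 0.0)):
    est.add_sample(0.4)  # a normal 0.4 s RTT
    # A segment sent at t=0 is delayed, spuriously retransmitted, then ACKed at t=4.
    r = rtt_sample(now=4.0, sent_at=0.0, retransmitted=True, tsecr=tsecr)
    if r is not None:
        est.add_sample(r)
print(round(karn.rto, 2), round(ts.rto, 2))  # 1.2 5.05
```

Subsequent normal samples shrink RTTVAR by a factor of 3/4 each, which is why the timestamp-based RTO decays fairly quickly after the spike.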
3. Adapting RTO
• Upon a spurious timeout
  – Reseed: initialize SRTT and RTTVAR with the new sample (discarding history)
  – Back-off: keep the exponential back-off count
  – Min++: increase the minimum RTO (by 1 s)
• Reset to the standard timer upon a genuine timeout

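The four timer variants can be sketched on top of an RFC 6298-style estimator. The sketch makes several assumptions that are not taken from the slides. Reseed initializes the estimator as for a first measurement (SRTT = R, RTTVAR = R/2). Back-off means that a fresh RTT sample no longer collapses the doubled timer. The RTO is capped at 60 s. A genuine timeout undoes the Back-off and Min++ adaptations. The class, method, and variant names are hypothetical.

```python
from typing import Optional

MAX_RTO = 60.0  # assumed upper bound, as permitted by RFC 6298


class AdaptiveRto:
    """Sketch of Std / Reseed / Back-off / Min++ on an RFC 6298-style
    estimator (seconds). Variant names are illustrative only."""

    def __init__(self, variant: str = "std", base_min_rto: float = 1.0,
                 granularity: float = 0.01):
        if variant not in ("std", "reseed", "backoff", "minpp"):
            raise ValueError(variant)
        self.variant = variant
        self.base_min_rto = base_min_rto
        self.min_rto = base_min_rto
        self.g = granularity
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.backoff = 0           # number of timer doublings in effect
        self.keep_backoff = False  # Back-off variant: doublings survive new samples

    def rto(self) -> float:
        base = 1.0 if self.srtt is None else self.srtt + max(self.g, 4 * self.rttvar)
        return min(MAX_RTO, max(self.min_rto, base) * 2 ** self.backoff)

    def on_rtt_sample(self, r: float) -> None:
        if self.srtt is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - r)
            self.srtt = 0.875 * self.srtt + 0.125 * r
        if not self.keep_backoff:
            self.backoff = 0  # standard behaviour: a fresh sample clears back-off

    def on_timeout(self) -> None:
        self.backoff += 1  # exponential back-off on every expiry

    def on_spurious_timeout(self, r: float) -> None:
        if self.variant == "reseed":
            # Discard history; seed as for a first measurement (assumption).
            self.srtt, self.rttvar, self.backoff = r, r / 2, 0
        elif self.variant == "backoff":
            self.keep_backoff = True
        elif self.variant == "minpp":
            self.min_rto += 1.0

    def on_genuine_timeout(self) -> None:
        # Return to the standard timer.
        self.min_rto = self.base_min_rto
        self.keep_backoff = False


for v in ("std", "reseed", "backoff", "minpp"):
    t = AdaptiveRto(v)
    t.on_rtt_sample(0.4)        # normal path: RTO = 1.2 s
    t.on_timeout()              # a delay spike fires the timer
    t.on_spurious_timeout(4.0)  # detected as spurious; 4 s sample
    t.on_rtt_sample(0.4)        # path back to normal
    print(v, round(t.rto(), 2))  # std 1.0, reseed 13.15, backoff 2.0, minpp 2.0
```

After the spike, Std returns to the shortest timer, while Reseed, Back-off, and Min++ each leave a more conservative RTO in place until the next genuine timeout.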
Performance Evaluation
• ns-2, dumbbell topology with TCP and CBR sources
  – 3G or satellite link: 2 Mbps, 400 ms RTT
  – Periodic delay spikes
• 250% throughput gain when delay spikes occur on an uncongested path (without CBR)
• TCP fairness does not suffer because the response to packet losses is unchanged

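A rough, back-of-the-envelope illustration of why a spurious timeout is costly on this link. It is not a derivation of the measured gain, and it makes several assumptions: 1500-byte segments, a window equal to the bandwidth-delay product before the timeout, ssthresh halved to about 33 segments, growth of one segment per RTT in congestion avoidance, and no further losses.

```latex
\begin{aligned}
\text{BDP} &= 2\ \text{Mbit/s} \times 0.4\ \text{s} = 0.8\ \text{Mbit} = 100\,000\ \text{bytes} \approx 67\ \text{segments} \\
\text{regrowth after one RTO} &\approx \underbrace{\log_2 33}_{\text{slow start to } 33} + \underbrace{(67 - 33)}_{\text{congestion avoidance}} \approx 5 + 34 = 39\ \text{RTTs} \approx 39 \times 0.4\ \text{s} \approx 16\ \text{s}
\end{aligned}
```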
Robustness to Packet Losses: TCPs with CBR
[Figure: normalized Download Time, Segments Sent, Spurious RTOs, and Genuine RTOs (scale 0 to 1) for the std and eifel variants of Reno-SACK, NewReno-SACK, and FACK]

Undo of Congestion Control: TCP FACK with CBR
[Figure: normalized Download Time, Segments Sent, Spurious RTOs, and Genuine RTOs (scale 0 to 1) for the Full, Partial, None, and New partial undo options]

Adapting RTO: TCP FACK with CBR
[Figure: normalized Download Time, Segments Sent, Spurious RTOs, and Genuine RTOs (scale 0.5 to 1) for the Std, Reseed, Back-off, and Min++ timer variants]

Summary
• An update of the TCP sender that improves performance over paths with variable delays
• Up to 250% throughput gain on links with a high bandwidth-delay product
• Adequate performance on congested paths
  – NewReno-SACK is robust to packet losses
• A full restore of congestion control after a spurious timeout is appropriate
• Using back-offs or increasing the minimum RTO can reduce the number of spurious RTOs by 40% with only slightly lower throughput
• We have some real measurements for 2.5G links