Sliding Window Protocol and TCP Congestion Control
Simon S. Lam


SLIDE 1

Sliding Window Protocol and TCP Congestion Control

Simon S. Lam
Department of Computer Science
The University of Texas at Austin

TCP Congestion Control (Simon S. Lam)

SLIDE 2

Reliable data transfer

• important in application, transport, and link layers
• characteristics of the unreliable channel will determine the complexity of the reliable data transfer protocol (rdt)

SLIDE 3

Channel Abstractions

• Lossy FIFO channel
  - delivers a subsequence in FIFO order
  - example: delivery service provided by a physical link
• Lossy, reordering, duplicative (LRD) channel
  - example: delivery service provided by IP or by UDP protocol

SLIDE 4

Sliding Window Protocol

• Consider an infinite array, Source, at the sender, and an infinite array, Sink, at the receiver.

[Figure: sender P1 with array Source (cells 1, 2, ..., a–1, a, ..., s–1, s; acknowledged cells, then the send window of unacknowledged cells) and receiver P2 with array Sink (cells 1, 2, ..., r, ..., r + RW – 1; delivered cells, then the receive window starting at the next expected cell r, with some cells already received)]

SW: send window size (s – a ≤ SW)
RW: receive window size
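A minimal Python sketch of the window bookkeeping described on this slide. It assumes that a is the oldest unacknowledged index, s is the next index to send, and r is the next index expected by the receiver; the class and method names are hypothetical, and the channel, timers, and retransmissions are left out.

```python
class Sender:
    """P1: tracks the send window over the Source array."""

    def __init__(self, sw):
        self.sw = sw  # send window size SW
        self.a = 1    # oldest unacknowledged index
        self.s = 1    # next index to send

    def can_send(self):
        # Sending one more data unit must keep s - a <= SW.
        return self.s - self.a < self.sw

    def send(self):
        assert self.can_send()
        n = self.s
        self.s += 1
        return n

    def on_ack(self, next_expected):
        # Cumulative ack: every index below next_expected is acknowledged.
        if self.a < next_expected <= self.s:
            self.a = next_expected


class Receiver:
    """P2: tracks the receive window over the Sink array."""

    def __init__(self, rw):
        self.rw = rw           # receive window size RW
        self.r = 1             # next expected index
        self.buffered = set()  # received but not yet delivered

    def on_data(self, n):
        # Accept only indices inside the receive window r .. r + RW - 1.
        if self.r <= n <= self.r + self.rw - 1:
            self.buffered.add(n)
        # Deliver in order and slide the window forward.
        while self.r in self.buffered:
            self.buffered.remove(self.r)
            self.r += 1
        return self.r  # cumulative ack: next expected index
```

For example, if data units 2 and 3 arrive before 1, the receiver keeps answering with ack 1; once 1 arrives it delivers all three and answers with ack 4, which lets the sender slide its window.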

SLIDE 5

Sliding Windows in Action

• Data unit r has just been received by P2
  - Receive window slides forward
• P2 sends cumulative ack with the sequence number it expects to receive next (r+3)

[Figure: same Source/Sink diagram; at P2 the receive window has slid forward and the next expected cell is r+3]

SLIDE 6

Sliding Windows in Action

• P1 has just received cumulative ack with r+3 as the next expected sequence number
  - Send window slides forward

[Figure: same Source/Sink diagram; at P1 the send window has slid forward past the newly acknowledged cells]

SLIDE 7

Sliding Window protocol

• Functions provided
  - error control (reliable delivery)
  - in-order delivery
  - flow and congestion control (by varying send window size)
• TCP uses cumulative acks (needed for correctness)
• Other kinds of acks (to improve performance)
  - selective nack
  - selective ack (TCP SACK)
  - bit-vector representing entire state of receive window (in addition to first sequence number of window)

SLIDE 8

Sliding Windows for Lossy FIFO Channels

• A small number of bits in packet header for sequence number
• Necessary and sufficient condition for correct operation: SW + RW ≤ MaxSeqNum
• Necessity:

[Figure: Source/Sink diagram illustrating necessity, with send window size SW at P1 (Sender) and receive window size RW at P2 (Receiver)]
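One way to see why the condition is necessary is to check a single bad scenario in Python. The sketch below assumes that MaxSeqNum is the number of distinct sequence numbers (numbers wrap modulo MaxSeqNum) and uses a hypothetical function name. It only tests this one scenario, so it illustrates necessity and says nothing about sufficiency.

```python
def old_data_accepted_as_new(sw, rw, max_seq_num):
    """Return True if a retransmission can be mistaken for new data.

    Scenario: the sender's window holds indices 0 .. sw-1. All of them were
    received and delivered, but every ack was lost. The receiver now expects
    indices sw .. sw+rw-1, while the sender retransmits indices 0 .. sw-1.
    """
    retransmitted = {i % max_seq_num for i in range(sw)}
    expected_new = {j % max_seq_num for j in range(sw, sw + rw)}
    # Any shared sequence number means old data falls inside the receive window.
    return bool(retransmitted & expected_new)
```

With a 3-bit sequence number (8 values), SW = 7 and RW = 1 is safe, while SW = 8 and RW = 1 is not.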

SLIDE 9

Sliding Windows for Lossy FIFO Channels

• Sufficiency can only be demonstrated by using a formal method to prove that the protocol provides reliable in-order delivery. See Shankar and Lam, ACM TOPLAS, Vol. 14, No. 3, July 1992.
• Interesting special cases
  - SW = RW = 1: alternating-bit protocol
  - SW = 7, RW = 1: out-of-order arrivals not accepted, e.g., HDLC
  - SW = RW

SLIDE 10

Sliding Windows for LRD Channels

• Assumption: Packets have bounded lifetime L
• Be careful how fast sequence numbers are consumed (i.e., by arrival of data to be sent into network): (send rate) × L < MaxSeqNum
• TCP
  - 32-bit sequence numbers
  - counts bytes
  - assumes that datagrams will be discarded by IP if too old
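A small Python calculation of the bound (send rate) × L < MaxSeqNum for TCP's 32-bit byte sequence numbers. The 120-second lifetime is an assumed illustrative value, not a figure from the slides, and the function name is hypothetical.

```python
def max_safe_send_rate(max_seq_num, lifetime_s):
    """Upper bound on send rate (sequence numbers per second) from
    (send rate) x L < MaxSeqNum."""
    return max_seq_num / lifetime_s


TCP_SEQ_SPACE = 2 ** 32     # 32-bit sequence numbers, counting bytes
ASSUMED_LIFETIME_S = 120.0  # assumption: packet lifetime L of 120 seconds

rate = max_safe_send_rate(TCP_SEQ_SPACE, ASSUMED_LIFETIME_S)
print(f"send rate must stay below {rate / 1e6:.1f} Mbytes/sec "
      f"({rate * 8 / 1e6:.0f} Mbps)")
```

Under this assumed lifetime the bound is a few hundred Mbps, which shows why a 32-bit sequence space becomes tight on fast links.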

SLIDE 11

Window Size Controls Sending Rate

[Figure: time-sequence diagram between Source and Destination; in each RTT the source sends data packets 1, 2, ..., W and the destination returns ACKs]

• ~ W packets per RTT when no loss

SLIDE 12

Throughput

• Limit the number of unacked transmitted packets in the network to window size W
• Max. throughput

$$\frac{W}{RTT} \text{ packets/sec} = \frac{W \times MSS}{RTT} \text{ bytes/sec}$$

  (assuming no loss, MSS denotes maximum segment size)
• Where did we apply Little's Law? Answer: Consider the TCP send buffer
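The formula can be evaluated directly. A minimal Python sketch with a hypothetical function name; the numbers in the usage note are illustrative assumptions.

```python
def max_throughput(window_pkts, rtt_s, mss_bytes):
    """Loss-free upper bound: W / RTT packets/sec and W x MSS / RTT bytes/sec."""
    pkts_per_s = window_pkts / rtt_s
    return pkts_per_s, pkts_per_s * mss_bytes
```

For example, `max_throughput(10, 0.1, 1460)` gives about 100 packets/sec and 146,000 bytes/sec. In Little's Law terms, the send buffer holds at most W packets and each stays one RTT, so the rate through it is at most W / RTT.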

SLIDE 13

Throughput or send rate?

• Previous formula actually provides an upper bound
  - Average number in the send buffer is less than W unless packet arrival rate to send buffer is infinite
  - If a packet is lost in the network with probability p, then the average time in send buffer is

$$(1 - p) \times RTT + p \times T_O$$

    Since $T_O > RTT$, actual throughput is smaller.
• The throughput of a host's TCP send buffer is the host's send rate into the network (including original transmissions and retransmissions)
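A minimal Python sketch of how loss lowers the bound, applying Little's Law with at most W packets in the send buffer. The function names and the numbers in the usage note are illustrative assumptions.

```python
def avg_time_in_send_buffer(p, rtt_s, timeout_s):
    """Average time a packet spends in the send buffer:
    (1 - p) x RTT + p x T_O."""
    return (1 - p) * rtt_s + p * timeout_s


def send_rate_bound(window_pkts, p, rtt_s, timeout_s):
    """Little's Law: rate = (number in buffer) / (time in buffer),
    with at most W packets in the buffer."""
    return window_pkts / avg_time_in_send_buffer(p, rtt_s, timeout_s)
```

With W = 10, RTT = 0.1 s, T_O = 1 s, and p = 0.01, the bound drops from 100 packets/sec to roughly 92 packets/sec.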

SLIDE 14

Fast Retransmit

• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs.
• If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
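The duplicate-ACK rule can be sketched in Python as a small counter. This is an illustration under the assumption that each ACK carries the next expected sequence number; the class name is hypothetical and no timers or window changes are modeled.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals a fast retransmit on the third one."""

    DUP_THRESHOLD = 3

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no):
        """Return the sequence number to resend, or None."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.DUP_THRESHOLD:
                # The segment starting at ack_no is presumed lost.
                return ack_no
        else:
            # A new cumulative ACK resets the duplicate counter.
            self.last_ack = ack_no
            self.dup_count = 0
        return None
```

Feeding it the ACK sequence 1, 1, 1, 1 triggers a retransmission of segment 1 on the fourth ACK, which is the third duplicate.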

SLIDE 15

[Figure: time-sequence diagram between Host A and Host B; a segment is lost (marked X) and Host A resends the 2nd segment before its timeout expires]

Resending a segment after triple duplicate ACK without waiting for timeout

SLIDE 16

TCP Flow Control

flow control: sender won't overrun receiver's buffers by transmitting too much, too fast

receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
• RcvWindow field in TCP segment header

sender: keeps amount of transmitted, unACKed data less than most recently received RcvWindow value

[Figure: buffer at receive side of a TCP connection]

SLIDE 17

Causes/costs of congestion: scenario

• four senders
• multi-hop paths
• Timeout & retransmit

Q: what happens as λin and λ'in increase at many senders?

positive feedback -> instability

[Figure: Host A and Host B sending through finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

SLIDE 18

Effect of Congestion

• W too big for many flows -> congestion
• Packet loss -> transmissions on links a packet has traversed prior to loss are wasted
• Congestion collapse due to too many retransmissions and too much wasted transmission capacity
• October 1986, Internet had its first congestion collapse

[Figure: goodput versus load (aggregate send rate), showing the upper bound, the desired curve, and the collapse curve]

SLIDE 19

TCP Window Control

• Receiver flow control
  - Avoid overloading receiver
  - rcvwindow: receiver's advertised window (also rwnd)
  - Receiver sends rcvwindow to sender
• Network congestion control
  - Sender tries to avoid overloading network
  - It infers network congestion from "loss indications"
  - congwin: congestion window (also cwnd)
• Sender sets W = min (congwin, rcvwindow)
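A minimal Python sketch of the window rule W = min (congwin, rcvwindow). The `bytes_in_flight` parameter (transmitted but unACKed bytes) and both function names are hypothetical additions for illustration.

```python
def send_window(congwin, rcvwindow):
    """W = min(congwin, rcvwindow), both in bytes."""
    return min(congwin, rcvwindow)


def bytes_sendable(congwin, rcvwindow, bytes_in_flight):
    """How many more bytes may be sent without exceeding W."""
    return max(0, send_window(congwin, rcvwindow) - bytes_in_flight)
```

With congwin = 8000 and rcvwindow = 4000 the receiver is the bottleneck, so W = 4000; with 1500 bytes already in flight, 2500 more bytes may be sent.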

SLIDE 20

TCP Congestion Control

• end-to-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ CongWin
• Roughly, the send buffer's
    throughput ≤ CongWin / RTT bytes/sec
  where CongWin is in bytes

How does sender determine CongWin?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces CongWin after a loss event

three mechanisms:
• slow start
• reduce to 1 segment after timeout event
• AIMD (additive increase multiplicative decrease)

Note: For now consider RcvWindow to be very large such that the send window size is equal to CongWin.

SLIDE 21

TCP Slow Start

• Probing for usable bandwidth
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 2500 bytes/sec = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a higher rate

SLIDE 22

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event or "threshold"
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: time-sequence diagram between Host A and Host B over successive RTTs]
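A short Python sketch showing why adding 1 MSS per ACK doubles CongWin every RTT. It assumes one ACK per segment, no loss, and no threshold; CongWin is counted in segments and the function names are hypothetical.

```python
def slow_start_windows(rounds, initial_cwnd=1):
    """CongWin (in segments) at the start of each RTT round."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd        # one ACK comes back per segment sent this round
        for _ in range(acks):
            cwnd += 1      # CongWin grows by 1 MSS for every ACK received
    return history


def rate_bytes_per_s(cwnd_segments, mss_bytes, rtt_s):
    """Send rate implied by a window of cwnd_segments per RTT."""
    return cwnd_segments * mss_bytes / rtt_s
```

`slow_start_windows(5)` yields 1, 2, 4, 8, 16 segments. With MSS = 500 bytes and RTT = 200 msec, `rate_bytes_per_s(1, 500, 0.2)` gives the initial rate of 2500 bytes/sec, that is 20 kbps.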

SLIDE 23

Congestion avoidance state & responses to loss events

Q: If no loss, when should the exponential increase switch to linear?
A: When CongWin gets to current value of threshold

Implementation:
• For initial slow start, threshold is set to a large value (e.g., 64 Kbytes)
• Subsequently, threshold is variable
• At a loss event, threshold is set to 1/2 of CongWin just before loss event

[Figure: congestion window size (segments) versus transmission round (1 to 15) for TCP Tahoe and TCP Reno, showing the threshold and the two variants' responses to 3 dup ACKs]

Note: For simplicity, CongWin is in number of segments in the above graph.

SLIDE 24

Rationale for Reno's Fast Recovery

• After 3 dup ACKs:
  - CongWin is cut in half (multiplicative decrease)
  - window then grows linearly (additive increase)
• But after timeout event:
  - CongWin is set to 1 MSS instead;
  - window then grows exponentially to threshold, then grows linearly

• 3 dup ACKs indicate network capable of delivering some segments
• timeout occurring before 3 dup ACKs is "more alarming"

Additive Increase Multiplicative Decrease (AIMD)

SLIDE 25

TCP Reno (example scenario)

[Figure: CongWin versus time t for TCP Reno, annotated with: initial slow start; 3 dupACKs (halved); timeout; threshold reached during slow start]

In this example, 3 dupACKs during slow start before reaching initial threshold

SLIDE 26

Example: FR/FR entry and exit

[Figure: time-sequence diagram between sender S and receiver R for packets 1 to 11, marking the loss, the cwnd and ssthresh values over time, the window deflation ("deflate cwnd"), and the exit from FR/FR]

• Above scenario: Packet 1 is lost, packets 2, 3, and 4 are received; 3 dupACKs with seq. no. 1 returned
• Fast retransmit
  - Retransmit packet 1 upon 3 dupACKs
• Fast recovery (in steps)
  - Inflate cwnd with #dupACKs such that new packets 9, 10, and 11 can be sent while repairing loss

SLIDE 27

FR/FR (in more detail)

• Enter FR/FR after 3 dupACKs
  - Set ssthresh ← max(flightsize/2, 2)
  - Retransmit lost packet
  - Set cwnd ← ssthresh + #dupACKs (window inflation)
  - Wait till W = min(rwnd, cwnd) is large enough; transmit new packet(s)
  - On non-dup ACK (1 RTT later), set cwnd ← ssthresh (window deflation)
• Enter Congestion Avoidance
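The FR/FR steps can be sketched in Python as follows. Windows are counted in packets, flightsize/2 is taken as integer division (an assumption), and the class name is hypothetical; retransmission and the rwnd limit are not modeled.

```python
class FastRecovery:
    """Window bookkeeping for fast retransmit / fast recovery (FR/FR)."""

    def __init__(self, flightsize, dup_acks=3):
        # Entered after 3 dupACKs: set ssthresh, then inflate cwnd.
        self.ssthresh = max(flightsize // 2, 2)
        self.cwnd = self.ssthresh + dup_acks  # window inflation

    def on_dup_ack(self):
        # Each further dupACK means one more packet has left the network.
        self.cwnd += 1

    def on_new_ack(self):
        # Non-dup ACK: the loss is repaired, so deflate the window.
        self.cwnd = self.ssthresh
```

Starting from a flight size of 8, this gives ssthresh = 4 and cwnd = 7 on entry; four more dupACKs inflate cwnd to 11, and the first non-dup ACK deflates it to 4.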

SLIDE 28

Summary: TCP Congestion Control (Reno)

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially (until loss event or exceeding threshold).
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold (also fast retransmit)
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
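The four rules can be combined into a per-round Python sketch. It is a simplification under stated assumptions: windows are counted in whole segments, one call represents one transmission round, slow-start doubling is capped at Threshold, and halving uses integer division with a floor of 1. The function name and event labels are hypothetical.

```python
def reno_round(cwnd, ssthresh, event):
    """Advance one transmission round; windows are in segments (MSS units).

    event is "ok" (everything ACKed), "triple_dup", or "timeout".
    Returns the new (cwnd, ssthresh).
    """
    if event == "ok":
        if cwnd < ssthresh:
            # Slow start: exponential growth, capped at the threshold.
            return min(2 * cwnd, ssthresh), ssthresh
        # Congestion avoidance: linear growth, 1 segment per round.
        return cwnd + 1, ssthresh
    new_ssthresh = max(cwnd // 2, 1)  # assumption: integer halving, floor of 1
    if event == "triple_dup":
        return new_ssthresh, new_ssthresh
    if event == "timeout":
        return 1, new_ssthresh
    raise ValueError(f"unknown event: {event}")
```

Starting from cwnd = 1 and Threshold = 8, loss-free rounds give 2, 4, 8, 9, 10. A triple duplicate ACK at cwnd = 10 sets both values to 5, while a timeout at cwnd = 6 sets Threshold to 3 and cwnd back to 1.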

SLIDE 29

Successive Timeouts

• When there is another timeout, double the timeout value
• Keep doing so for each additional loss-retransmission
• Exponential backoff up to max timeout value equal to 64 times initial timeout value (There are other variations.)

[Figure: timeline of successive timeouts]
Note: red line in figure denotes first timeout
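A minimal Python sketch of the backoff rule, assuming the cap is exactly 64 times the initial timeout as stated on this slide. The function name is hypothetical.

```python
def timeout_schedule(initial_timeout_s, n_timeouts, max_factor=64):
    """Timeout values for successive timeouts of the same packet:
    doubled each time, capped at max_factor times the initial value."""
    return [initial_timeout_s * min(2 ** k, max_factor)
            for k in range(n_timeouts)]
```

With an initial timeout of 1 second, eight successive timeouts use 1, 2, 4, 8, 16, 32, 64, and 64 seconds.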

SLIDE 30

AIMD in steady state (when no timeout)

• multiplicative decrease: cut CongWin in half after loss event (3 dup acks)
• additive increase: increase CongWin by 1 MSS every RTT in the absence of any loss event: probing

[Figure: saw-tooth plot of congestion window (8, 16, and 24 Kbytes marked) versus time for a long-lived TCP connection]

What limits the average window size (or throughput)?

SLIDE 31

First approximation

• M. Mathis, et al., "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm," ACM Computer Communication Review, 27(3), 1997.
• No slow-start, no timeout, long-lived TCP connection
• Independent identically distributed "periods"
• Three dupACKs are received in a round with probability p

[Figure: saw-tooth window plot versus number of RTTs, with the average marked]

SLIDE 32

Geometric Distribution

Independent trials - a trial fails with probability p

• Ave. no. of transmissions to get first "failure"

$$b_i = (1-p)^{i-1}\,p$$

$$n = \sum_{i=1}^{\infty} i\,b_i = p \sum_{i=1}^{\infty} i\,(1-p)^{i-1} = -p \sum_{i=1}^{\infty} \frac{d}{dp}(1-p)^{i}$$

$$= -p\,\frac{d}{dp} \sum_{i=1}^{\infty} (1-p)^{i} = -p\,\frac{d}{dp}\left(\frac{1}{p} - 1\right) = -p\left(-\frac{1}{p^{2}}\right) = 1/p$$

• Ave. no. of trials to get first "success" is 1/(1-p)

SLIDE 33

First approximation (cont.)

• Average number of packets delivered in one period (area under one saw-tooth):

$$\left(\frac{W}{2}\right)^{2} + \frac{1}{2}\left(\frac{W}{2}\right)^{2} = \frac{3}{8} W^{2}$$

• Average number of packets sent per period is 1/p
• Equate the two and solve for W, we get

$$W = \sqrt{\frac{8}{3p}}$$

send rate (in packets/sec)

$$\frac{\text{no. of packets/period}}{\text{time per period}} = \frac{\frac{3}{8} W^{2}}{\frac{W}{2}\,RTT} = \frac{1/p}{RTT \sqrt{\frac{2}{3p}}} = \frac{1}{RTT} \sqrt{\frac{3}{2p}}$$
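A short Python check of this derivation: it computes W from p, then confirms that the saw-tooth area equals 1/p and that packets per period divided by time per period matches the closed-form send rate. The function names are hypothetical.

```python
import math


def peak_window(p):
    """W from equating the saw-tooth area (3/8) W^2 with 1/p."""
    return math.sqrt(8 / (3 * p))


def send_rate_pkts_per_s(p, rtt_s):
    """Closed form: (1 / RTT) * sqrt(3 / (2p)) packets/sec."""
    return math.sqrt(3 / (2 * p)) / rtt_s


def send_rate_from_sawtooth(p, rtt_s):
    """Same rate computed as (packets per period) / (time per period)."""
    w = peak_window(p)
    packets_per_period = 3 / 8 * w ** 2
    time_per_period = (w / 2) * rtt_s
    return packets_per_period / time_per_period
```

For p = 0.01 and RTT = 100 msec, W is about 16.3 packets and both rate functions give about 122 packets/sec.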

SLIDE 34

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
TCP Receiver action: Immediately send duplicate ACK, indicating seq. # of next expected byte

Event at Receiver: Arrival of segment that partially or completely fills gap
TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap

SLIDE 35

Receiver implements Delayed ACKs

• Receiver sends one ACK for every two packets received -> each saw-tooth is W×RTT wide -> area under a saw-tooth is

$$\frac{3}{4} W^{2} = \frac{1}{p}$$

• Send rate is

$$\frac{1/p}{W \cdot RTT} = \frac{1/p}{RTT \cdot \sqrt{4/(3p)}} = \frac{1}{RTT} \sqrt{\frac{3}{4p}}$$

• One ACK for every b packets received -> send rate is

$$\frac{1}{RTT} \sqrt{\frac{3}{2bp}}$$
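A Python sketch that recomputes the send rate from the saw-tooth geometry for a general b. It assumes that with one ACK per b packets the saw-tooth is b × (W/2) RTTs wide and its area is (3b/8) W², which reproduces the b = 2 case on this slide; this generalization and the function names are assumptions made for illustration.

```python
import math


def send_rate_closed_form(p, rtt_s, b=1):
    """(1 / RTT) * sqrt(3 / (2 b p)) packets/sec."""
    return math.sqrt(3 / (2 * b * p)) / rtt_s


def send_rate_from_sawtooth(p, rtt_s, b=1):
    """Rate from geometry: (1/p packets per period) / (period duration)."""
    # Assumed area under one saw-tooth: (3b/8) W^2 = 1/p, solved for W.
    w = math.sqrt(8 / (3 * b * p))
    # Assumed width of one saw-tooth: b * (W/2) round-trip times.
    period_s = b * (w / 2) * rtt_s
    return (1 / p) / period_s
```

For b = 2 the two functions agree with the slide's (1/RTT)·sqrt(3/(4p)), and delayed ACKs lower the rate by a factor of sqrt(2) relative to b = 1.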

SLIDE 36

Challenges in the future

• TCP average throughput (approximate) in terms of loss rate, p

$$\frac{1.22 \cdot MSS}{RTT \sqrt{p}} \quad \text{for } b = 1$$

• Example: 1500-byte segments, 100ms RTT, to get 10 Gbps throughput, loss rate needs to be very low

$$p = 2 \times 10^{-10}$$

• New versions of TCP needed for connections with large delay-bandwidth product
  - E.g., data center networks (local, global)
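The example can be reproduced in Python by solving the throughput formula for p. The function name is hypothetical; the constant 1.22 is the rounded value of sqrt(3/2) used on this slide.

```python
def required_loss_rate(target_bps, mss_bytes, rtt_s, c=1.22):
    """Solve c * MSS / (RTT * sqrt(p)) = target throughput for p."""
    target_bytes_per_s = target_bps / 8
    return (c * mss_bytes / (rtt_s * target_bytes_per_s)) ** 2


p = required_loss_rate(target_bps=10e9, mss_bytes=1500, rtt_s=0.1)
print(f"required loss rate: {p:.2e}")
```

For 1500-byte segments, 100 ms RTT, and a 10 Gbps target, this gives a loss rate on the order of 2 × 10^-10, in line with the slide.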

SLIDE 37

A more detailed model

Reference:
• J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Throughput: A Simple Model and its Empirical Validation," Proceedings ACM SIGCOMM, 1998.

SLIDE 38

Motivation

• Previous formulas not so accurate when loss rates are high
• TCP traces show that there are more loss indications due to timeouts (TO) than due to triple dupACKs (TD)

SLIDE 39

AIMD with Timeouts

• No slow start
• b = 1 (no delayed ack)

[Figure: AIMD window evolution with timeouts; label: triple dup acks]

SLIDE 40

Problem 3 in HW #2

Simplified:
• no triple duplicate Acks
• packet loss (timeout) with probability p
• timeout interval fixed at T0 after each loss

[Figure label: First success in next cycle]

SLIDE 41

The End