Transport Layer (TCP/UDP) - PowerPoint PPT Presentation

SLIDE 1

Transport Layer (TCP/UDP)

SLIDE 2

Where we are in the Course

  • Moving on up to the Transport Layer!

CSE 461 University of Washington 2

[Figure: protocol stack - Physical, Link, Network, Transport, Application]

SLIDE 3

Recall

  • Transport layer provides end-to-end connectivity across the network

[Figure: end-to-end path - app and TCP on the hosts, IP on hosts and router, 802.11 and Ethernet links]
SLIDE 4

Recall (2)

  • Segments carry application data across the network
  • Segments are carried within packets within frames

[Figure: encapsulation - App data (e.g., HTTP) in a TCP segment, inside an IP packet, inside an 802.11 frame]
SLIDE 5

Transport Layer Services

  • Provide different kinds of data delivery across the network to applications

[Table: service models - Messages → unreliable Datagrams (UDP); Bytestream → reliable Streams (TCP)]
SLIDE 6

Comparison of Internet Transports

  • TCP is full-featured, UDP is a glorified packet

TCP (Streams)                                   UDP (Datagrams)
Connections                                     Datagrams
Bytes are delivered once, reliably, in order    Messages may be lost, reordered, duplicated
Arbitrary length content                        Limited message size
Flow control matches sender to receiver         Can send regardless of receiver state
Congestion control matches sender to network    Can send regardless of network state
SLIDE 7

Socket API

  • Simple abstraction to use the network
  • The “network” API (really Transport service) used to write all Internet apps
  • Part of all major OSes and languages; originally Berkeley (Unix) ~1983
  • Supports both Internet transport services (Streams and Datagrams)
SLIDE 8

Socket API (2)

  • Sockets let apps attach to the local network at different ports

[Figure: two apps attached to the network via Socket, Port #1 and Socket, Port #2]
SLIDE 9

Socket API (3)

  • Same API used for Streams and Datagrams

Primitive        Meaning
SOCKET           Create a new communication endpoint
BIND             Associate a local address (port) with a socket
LISTEN           Announce willingness to accept connections
ACCEPT           Passively establish an incoming connection
CONNECT          Actively attempt to establish a connection
SEND(TO)         Send some data over the socket
RECEIVE(FROM)    Receive some data over the socket
CLOSE            Release the socket

(LISTEN, ACCEPT, and CONNECT are only needed for Streams; the TO/FROM forms are for Datagrams)
SLIDE 10

Ports

  • An application process is identified by the tuple: IP address, protocol, and port
  • Ports are 16-bit integers representing local “mailboxes” that a process leases

  • Servers often bind to “well-known ports”
  • <1024, require administrative privileges
  • Clients often assigned “ephemeral” ports
  • Chosen by OS, used temporarily

SLIDE 11

Some Well-Known Ports

Port     Protocol   Use
20, 21   FTP        File transfer
22       SSH        Remote login, replacement for Telnet
25       SMTP       Email
80       HTTP       World Wide Web
110      POP-3      Remote email access
143      IMAP       Remote email access
443      HTTPS      Secure Web (HTTP over SSL/TLS)
554      RTSP       Media player control
631      IPP        Printer sharing
SLIDE 12

Topics

  • Service models
  • Socket API and ports
  • Datagrams, Streams
  • User Datagram Protocol (UDP)
  • Connections (TCP)
  • Sliding Window (TCP)
  • Flow control (TCP)
  • Retransmission timers (TCP)
  • Congestion control (TCP)

SLIDE 13

UDP

SLIDE 14

User Datagram Protocol (UDP)

  • Used by apps that don’t want reliability or bytestreams
  • Like what?
SLIDE 15

User Datagram Protocol (UDP)

  • Used by apps that don’t want reliability or bytestreams
  • Voice-over-IP
  • DNS, RPC
  • DHCP

(If an application wants reliability and messages, then it has work to do!)
SLIDE 16

Datagram Sockets

[Figure: client (host 1) sends a request to server (host 2), which returns a reply]
SLIDE 17

Datagram Sockets (2)

[Figure: datagram socket call sequence - server: socket, bind, recvfrom* (blocks), sendto, close; client: socket, sendto, recvfrom* (blocks), close; * = call blocks]
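The call sequence above maps directly onto the Socket API; here is a minimal sketch in Python run on localhost (the port number 9999 is an arbitrary choice for the example):

```python
import socket
import threading

SERVER_ADDR = ("127.0.0.1", 9999)   # arbitrary localhost port for the example

def server():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # socket
    s.bind(SERVER_ADDR)                                   # bind
    data, client = s.recvfrom(1024)                       # recvfrom (blocks)
    s.sendto(b"reply to " + data, client)                 # sendto the reply
    s.close()                                             # close

t = threading.Thread(target=server)
t.start()

c = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)      # socket (client side)
c.sendto(b"request", SERVER_ADDR)                         # sendto the request
reply, _ = c.recvfrom(1024)                               # recvfrom (blocks)
c.close()                                                 # close
t.join()
print(reply)  # b'reply to request'
```

Note there is no connection setup: each sendto names its destination, and either side can transmit as soon as it has a socket.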

SLIDE 18

UDP Buffering

[Figure: UDP buffering - apps attach at ports; a port mux/demux in the Transport layer sits above Network (IP); each port has its own message queue]
SLIDE 19

UDP Header

  • Uses ports to identify sending and receiving application processes
  • Datagram length up to 64K
  • Checksum (16 bits) for reliability
SLIDE 20

UDP Header (2)

  • Optional checksum covers the UDP segment and IP pseudoheader
  • Checks key IP fields (addresses)
  • Value of zero means “no checksum”
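The checksum computation can be sketched as follows; this is the standard 16-bit ones'-complement Internet checksum, with illustrative addresses and port numbers chosen for the example:

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement sum, as used by UDP (and TCP/IP)."""
    if len(data) % 2:
        data += b"\x00"                            # pad to an even length
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total > 0xFFFF:                          # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    # IP pseudoheader: source addr, dest addr, zero, protocol 17, UDP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    return internet_checksum(pseudo + udp_segment)

src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])   # example addresses
payload = b"A"
# UDP header: src port, dst port, length, checksum (zeroed while computing).
header = struct.pack("!HHHH", 1234, 5678, 8 + len(payload), 0)
csum = udp_checksum(src, dst, header + payload)
segment = struct.pack("!HHHH", 1234, 5678, 8 + len(payload), csum) + payload
```

A receiver verifies by summing the pseudoheader and segment including the checksum; a valid segment folds to zero. (Since a value of 0 means “no checksum”, a computed 0 is transmitted as 0xFFFF.)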

SLIDE 21

TCP

SLIDE 22

TCP

  • TCP consists of 3 primary phases:
  • Connection Establishment (Setup)
  • Sliding Windows/Flow Control
  • Connection Release (Teardown)
SLIDE 23

Connection Establishment

  • Both sender and receiver must be ready before we start the transfer of data
  • Need to agree on a set of parameters
  • e.g., the Maximum Segment Size (MSS)
  • This is signaling
  • It sets up state at the endpoints
  • Like “dialing” for a telephone call
SLIDE 24


Three-Way Handshake

  • Used in TCP; opens connection for data in both directions
  • Each side probes the other with a fresh Initial Sequence Number (ISN)
  • Sends on a SYNchronize segment
  • Echo on an ACKnowledge segment
  • Chosen to be robust even against delayed duplicates

[Figure: SYN/ACK exchange between active party (client) and passive party (server)]
SLIDE 25


Three-Way Handshake (2)

  • Three steps:
  • Client sends SYN(x)
  • Server replies with SYN(y)ACK(x+1)
  • Client replies with ACK(y+1)
  • SYNs are retransmitted if lost
  • Sequence and ack numbers carried on further segments

[Figure: handshake timeline - 1: SYN(x), 2: SYN(y)ACK(x+1), 3: ACK(y+1) between client and server]
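The sequence/ack arithmetic of the three steps can be checked with a toy model (an illustrative Python sketch, not real TCP segments):

```python
import random

x = random.randrange(2**32)   # client's fresh 32-bit ISN
y = random.randrange(2**32)   # server's fresh 32-bit ISN

syn     = {"flags": "SYN",     "seq": x}                  # 1: client -> server
syn_ack = {"flags": "SYN+ACK", "seq": y, "ack": x + 1}    # 2: server -> client
ack     = {"flags": "ACK",     "ack": y + 1}              # 3: client -> server

# Each side accepts only an ACK that covers its own fresh ISN, which is what
# makes the handshake robust against delayed duplicate segments.
client_ok = syn_ack["ack"] == syn["seq"] + 1
server_ok = ack["ack"] == syn_ack["seq"] + 1
```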
SLIDE 26


Three-Way Handshake (3)

  • Suppose delayed, duplicate copies of the SYN and ACK arrive at the server!
  • Improbable, but anyhow …
SLIDE 27


Three-Way Handshake (4)

  • Suppose delayed, duplicate copies of the SYN and ACK arrive at the server!
  • Improbable, but anyhow …
  • Connection will be cleanly rejected on both sides 

[Figure: server answers the duplicate SYN and ACK; both are rejected (REJECT)]
SLIDE 28

TCP Connection State Machine

  • Captures the states ([]) and transitions (->)
  • A/B means event A triggers the transition, with action B
  • Both parties run instances of this state machine
SLIDE 29

TCP Connections (2)

  • Follow the path of the client:
SLIDE 30

TCP Connections (3)

  • And the path of the server:
SLIDE 31

TCP Connections (4)

  • Again, with states …

[Figure: handshake with states - client: CLOSED → SYN_SENT → ESTABLISHED; server: CLOSED → LISTEN → SYN_RCVD → ESTABLISHED]
SLIDE 32

TCP Connections (5)

  • Finite state machines are a useful tool to specify and check the handling of all cases that may occur
  • TCP allows for simultaneous open
  • i.e., both sides open instead of the client-server pattern
  • Try at home to confirm it works 
SLIDE 33

Connection Release

  • Orderly release by both parties when done
  • Delivers all pending data and “hangs up”
  • Cleans up state in sender and receiver
  • Key problem is to provide reliability while releasing
  • TCP uses a “symmetric” close in which both sides shut down independently

TCP Connection Release

  • Two steps:
  • Active sends FIN(x), passive ACKs
  • Passive sends FIN(y), active ACKs
  • FINs are retransmitted if lost
  • Each FIN/ACK closes one direction of data transfer
SLIDE 35


TCP Connection Release (2)

  • Two steps:
  • Active sends FIN(x), passive ACKs
  • Passive sends FIN(y), active ACKs
  • FINs are retransmitted if lost
  • Each FIN/ACK closes one direction of data transfer

[Figure: 1: FIN(x) exchange, 2: FIN(y) exchange between active and passive parties]
SLIDE 36

TCP Connection State Machine

Both parties run instances of this state machine
SLIDE 37

TCP Release

  • Follow the active party

SLIDE 38

TCP Release (2)

  • Follow the passive party

SLIDE 39

TCP Release (3)

  • Again, with states …

[Figure: release with states - active: ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → (timeout) CLOSED; passive: ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED]
SLIDE 40

TIME_WAIT State

  • Wait a long time after sending all segments and before completing the close
  • Two times the maximum segment lifetime of 60 seconds
  • Why?
  • ACK might have been lost, in which case the FIN will be resent for an orderly close
  • Could otherwise interfere with a subsequent connection
SLIDE 41

Flow Control

SLIDE 42

Recall

  • ARQ with one message at a time is Stop-and-Wait (normal case below)

[Figure: Stop-and-Wait timeline - sender sends Frame 0, receives ACK 0 within the timeout, then Frame 1, ACK 1]
SLIDE 43

Limitation of Stop-and-Wait

  • It allows only a single message to be outstanding from the sender:
  • Fine for LAN (only one frame fits)
  • Not efficient for network paths with BD >> 1 packet
SLIDE 44

Limitation of Stop-and-Wait (2)

  • Example: R=1 Mbps, D = 50 ms, 10kb packets
  • RTT (Round Trip Time) = 2D = 100 ms
  • How many packets/sec?
  • What if R=10 Mbps?
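A quick back-of-the-envelope check of the example (a sketch that ignores transmission time, so at most one packet per RTT):

```python
R = 1e6             # link rate, bits/sec
D = 50e-3           # one-way delay, sec
packet_bits = 10e3  # 10 kbit packets

rtt = 2 * D                      # 0.1 sec
packets_per_sec = 1 / rtt        # one outstanding packet at a time -> 10/sec
utilization = packet_bits * packets_per_sec / R   # 10% of a 1 Mbps link
```

At R = 10 Mbps the same 10 packets/sec is only 1% utilization: the faster the path, the worse Stop-and-Wait does.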

SLIDE 45

Sliding Window

  • Generalization of stop-and-wait
  • Allows W packets to be outstanding
  • Can send W packets per RTT (=2D)
  • Pipelining improves performance
  • Need W=2BD to fill network path

SLIDE 46

Sliding Window (2)

  • What W will use the network capacity?
  • Assume 10kb packets
  • Ex: R=1 Mbps, D = 50 ms
  • Ex: What if R=10 Mbps?

SLIDE 47

Sliding Window (3)

  • Ex: R=1 Mbps, D = 50 ms
  • 2BD = 10^6 b/s × 100 × 10^-3 s = 100 kbit
  • W = 2BD = 10 packets of 1250 bytes
  • Ex: What if R=10 Mbps?
  • 2BD = 1000 kbit
  • W = 2BD = 100 packets of 1250 bytes
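The arithmetic generalizes to a one-liner (a sketch; W here is measured in packets):

```python
def window_to_fill(R_bps, D_sec, packet_bits):
    """W = 2BD: packets needed in flight to keep the path busy for one RTT."""
    return R_bps * 2 * D_sec / packet_bits

w_slow = window_to_fill(1e6, 50e-3, 10e3)    # 100 kbit / 10 kbit -> 10 packets
w_fast = window_to_fill(10e6, 50e-3, 10e3)   # 1000 kbit -> 100 packets
```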

SLIDE 48

Sliding Window Protocol

  • Many variations, depending on how buffers, acknowledgements, and retransmissions are handled
  • Go-Back-N
  • Simplest version, can be inefficient
  • Selective Repeat
  • More complex, better performance
SLIDE 49

Sliding Window – Sender

  • Sender buffers up to W segments until they are acknowledged
  • LFS = LAST FRAME SENT, LAR = LAST ACK REC’D
  • Sends while LFS – LAR ≤ W

[Figure: sender sliding window over sequence numbers, W=5 - acked up to LAR, unacked from LAR+1 to LFS, then available and unavailable numbers]
SLIDE 50

Sliding Window – Sender (2)

  • Transport accepts another segment of data from the Application ...
  • Transport sends it (as LFS – LAR ≤ 5)

[Figure: sender window after sending - LFS advances by one]
SLIDE 51

Sliding Window – Sender (3)

  • Next higher ACK arrives from peer…
  • Window advances, buffer is freed
  • LFS–LAR  4 (can send one more)

CSE 461 University of Washington 51

.. 5 6 7 .. 2 3 4 5 2 3 .. LAR W=5 Acked Unacked 3 .. Unavailable Available

  • seq. number

Sliding Window LFS

slide-52
SLIDE 52

Sliding Window – Go-Back-N

  • Receiver keeps only a single packet buffer for the next segment
  • State variable, LAS = LAST ACK SENT
  • On receive:
  • If seq. number is LAS+1, accept and pass it to app, update LAS, send ACK
  • Otherwise discard (as out of order)
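The receiver rule above fits in a few lines (a sketch; sequence numbering starts at 1 and wraparound is ignored):

```python
class GoBackNReceiver:
    def __init__(self):
        self.las = 0         # LAS = last ACK sent
        self.delivered = []  # data passed up to the application

    def on_receive(self, seq, data):
        if seq == self.las + 1:          # next in order: accept and deliver
            self.delivered.append(data)
            self.las = seq
        # anything else is discarded as out of order
        return self.las                  # the (cumulative) ACK sent back

rx = GoBackNReceiver()
rx.on_receive(1, "a")   # accepted, ACK 1
rx.on_receive(3, "c")   # discarded: 2 is still missing, ACK stays 1
rx.on_receive(2, "b")   # accepted, ACK 2
```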

SLIDE 53

Sliding Window – Selective Repeat

  • Receiver passes data to app in order, and buffers out-of-order segments to reduce retransmissions
  • ACK conveys highest in-order segment, plus hints about out-of-order segments
  • TCP uses a selective repeat design; we’ll see the details later
SLIDE 54

Sliding Window – Selective Repeat (2)

  • Buffers W segments, keeps state variable LAS = LAST ACK SENT
  • On receive:
  • Buffer segments [LAS+1, LAS+W]
  • Send app in-order segments from LAS+1, and update LAS
  • Send ACK for LAS regardless
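The same receiver with Selective Repeat buffering (a sketch with the same simplifying assumptions):

```python
class SelectiveRepeatReceiver:
    def __init__(self, w):
        self.w = w
        self.las = 0        # LAS = last ACK sent
        self.buffer = {}    # out-of-order segments held for later
        self.delivered = []

    def on_receive(self, seq, data):
        if self.las + 1 <= seq <= self.las + self.w:   # within [LAS+1, LAS+W]
            self.buffer[seq] = data
        while self.las + 1 in self.buffer:             # deliver any in-order run
            self.delivered.append(self.buffer.pop(self.las + 1))
            self.las += 1
        return self.las                                # ACK for LAS regardless

rx = SelectiveRepeatReceiver(w=4)
rx.on_receive(1, "a")   # delivered immediately, LAS = 1
rx.on_receive(3, "c")   # buffered; LAS stays 1
rx.on_receive(2, "b")   # fills the gap; "b" and "c" both delivered, LAS = 3
```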

SLIDE 55

Sliding Window – Retransmissions

  • Go-Back-N uses a single timer to detect losses
  • On timeout, resends buffered packets starting at LAR+1
  • Selective Repeat uses a timer per unacked segment to detect losses
  • On timeout for a segment, resend it
  • Hope to resend fewer segments
SLIDE 56

Sequence Numbers

  • Need more than 0/1 for Stop-and-Wait …
  • But how many?
  • For Selective Repeat, need W numbers for packets, plus W for acks of earlier packets
  • 2W seq. numbers
  • Fewer for Go-Back-N (W+1)
  • Typically implement seq. number with an N-bit counter that wraps around at 2^N – 1
  • E.g., N=8: …, 253, 254, 255, 0, 1, 2, 3, …
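The wraparound behavior for N=8 in a small sketch:

```python
N = 8
MOD = 2 ** N           # an N-bit counter wraps modulo 2^N

seqs, s = [], 253
for _ in range(6):
    seqs.append(s)
    s = (s + 1) % MOD  # 255 is followed by 0
```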

SLIDE 57

Sequence Time Plot

[Figure: sequence-time plot - transmissions at the sender and ACKs at the receiver, offset by the one-way delay (= RTT/2)]
SLIDE 58

Sequence Time Plot (2)

[Figure: sequence-time plot - Go-Back-N scenario]
SLIDE 59

Sequence Time Plot (3)

[Figure: sequence-time plot - a loss, the timeout, and the retransmissions]
SLIDE 60

ACK Clocking

SLIDE 61

Sliding Window ACK Clock

  • Each in-order ACK advances the sliding window and lets a new segment enter the network
  • ACKs “clock” data segments

[Figure: data segments 11–20 heading into the network as ACKs 1–10 return]
SLIDE 62

Benefit of ACK Clocking

  • Consider what happens when a sender injects a burst of segments into the network

[Figure: fast link, then a queue at the slow (bottleneck) link, then a fast link]
SLIDE 63

Benefit of ACK Clocking (2)

  • Segments are buffered and spread out on the slow link

[Figure: segments “spread out” as they cross the slow (bottleneck) link]
SLIDE 64

Benefit of ACK Clocking (3)

  • ACKs maintain the spread back to the original sender

[Figure: ACKs returning across the slow link keep the same spacing]
SLIDE 65

Benefit of ACK Clocking (4)

  • Sender clocks new segments with the spread
  • Now sending at the bottleneck link rate without queuing!

[Figure: segments stay spread; the queue no longer builds]
SLIDE 66

Benefit of ACK Clocking (5)

  • Helps run with low levels of loss and delay!
  • The network smooths out the burst of data segments
  • ACK clock transfers this smooth timing back to the sender
  • Subsequent data segments are not sent in bursts so do not queue up in the network
SLIDE 67

TCP Uses ACK Clocking

  • TCP uses a sliding window because of the value of ACK clocking
  • Sliding window controls how many segments are inside the network
  • TCP only sends small bursts of segments to let the network keep the traffic smooth
SLIDE 68

Problem

  • Sliding window has pipelining to keep network busy
  • What if the receiver is overloaded?

[Figure: “Big Iron” server streams video to an overwhelmed “Wee Mobile” receiver]
SLIDE 69

Sliding Window – Receiver

  • Consider receiver with W buffers
  • LAS = LAST ACK SENT; app pulls in-order data from buffer with recv() call

[Figure: receiver sliding window, W=5 - sequence numbers up to LAS finished, the next W acceptable, higher numbers too high]
SLIDE 70

Sliding Window – Receiver (2)

  • Suppose the next two segments arrive but the app does not call recv()

[Figure: receiver window - two segments arrive within the acceptable range]
SLIDE 71

Sliding Window – Receiver (3)

  • Suppose the next two segments arrive but the app does not call recv()
  • LAS rises, but we can’t slide the window!

[Figure: receiver window - the two segments are acked but stay buffered; the window cannot slide]
SLIDE 72

Sliding Window – Receiver (4)

  • Further segments arrive (in order); we fill the buffer
  • Must drop segments until the app recvs!

[Figure: receiver buffer full - nothing acceptable]
SLIDE 73

Sliding Window – Receiver (5)

  • App recv() takes two segments
  • Window slides (phew)

[Figure: receiver window slides; two more sequence numbers become acceptable]
SLIDE 74

Flow Control

  • Avoid loss at the receiver by telling the sender the available buffer space
  • WIN = #Acceptable, not W (from LAS)

[Figure: receiver advertises WIN, the number of acceptable buffers beyond LAS]
SLIDE 75

Flow Control (2)

  • Sender uses the lower of the sliding window and flow control window (WIN) as the effective window size

[Figure: with WIN = 3 the effective window shrinks to W=3]
SLIDE 76


Flow Control (3)

  • TCP-style example
  • SEQ/ACK sliding window
  • Flow control with WIN
  • SEQ + length < ACK+WIN
  • 4KB buffer at receiver
  • Circular buffer of bytes
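The sender-side checks can be written as two small predicates (a sketch with illustrative values; byte numbering as in the slide's SEQ/ACK example):

```python
def effective_window(w, win):
    # Sender may keep at most min(sliding window, flow-control window) in flight.
    return min(w, win)

def may_send(seq, length, ack, win):
    # The slide's byte-level test: SEQ + length < ACK + WIN.
    return seq + length < ack + win

eff = effective_window(w=5, win=3)                          # 3
fits = may_send(seq=1000, length=500, ack=1000, win=4096)   # fits in the buffer
over = may_send(seq=4000, length=2000, ack=1000, win=4096)  # would overrun it
```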
SLIDE 77

Topic

  • How to set the timeout for sending a retransmission
  • Adapting to the network path

SLIDE 78

Retransmissions

  • With a sliding window, we detect loss with a timeout
  • Set timer when a segment is sent
  • Cancel timer when ack is received
  • If timer fires, retransmit data as lost

SLIDE 79

Timeout Problem

  • Timeout should be “just right”
  • Too long wastes network capacity
  • Too short leads to spurious resends
  • But what is “just right”?
  • Easy to set on a LAN (Link)
  • Short, fixed, predictable RTT
  • Hard on the Internet (Transport)
  • Wide range, variable RTT

SLIDE 80

Example of RTTs

[Figure: round-trip time (ms) over 200 seconds for a BCN→SEA→BCN path]
SLIDE 81

Example of RTTs (2)

[Figure: same RTT trace - variation is due to queuing at routers, changes in network paths, etc.; the floor is the propagation (+transmission) delay ≈ 2D]
SLIDE 82

Example of RTTs (3)

[Figure: same RTT trace - a fixed timer is sometimes too high and sometimes too low; need to adapt to the network conditions]
SLIDE 83

Adaptive Timeout

  • Smoothed estimates of the RTT (1) and variance in RTT (2)
  • Update estimates with a moving average
  • 1. SRTT_{N+1} = 0.9 × SRTT_N + 0.1 × RTT_{N+1}
  • 2. Svar_{N+1} = 0.9 × Svar_N + 0.1 × |RTT_{N+1} – SRTT_{N+1}|
  • Set timeout to a multiple of estimates
  • To estimate the upper RTT in practice
  • TCP Timeout_N = SRTT_N + 4 × Svar_N
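The two recurrences translate directly into code (a sketch; the initial Svar choice below is an assumption for the example, not from the slide):

```python
class AdaptiveTimeout:
    def __init__(self, first_rtt):
        self.srtt = first_rtt
        self.svar = first_rtt / 2   # assumed initialization for the example

    def update(self, rtt):
        self.srtt = 0.9 * self.srtt + 0.1 * rtt
        self.svar = 0.9 * self.svar + 0.1 * abs(rtt - self.srtt)

    def timeout(self):
        return self.srtt + 4 * self.svar

t = AdaptiveTimeout(first_rtt=100.0)        # milliseconds
for sample in [110, 90, 105, 300, 100]:     # includes a 300 ms spike
    t.update(sample)
```

The 4 × Svar term keeps the timeout above typical RTTs while letting it shrink again when the path is stable.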

SLIDE 84

Example of Adaptive Timeout

[Figure: RTT samples with the smoothed SRTT tracking them and Svar below]
SLIDE 85

Example of Adaptive Timeout (2)

[Figure: RTT samples with the Timeout (SRTT + 4 × Svar) curve above them; one early timeout]
SLIDE 86

Adaptive Timeout (2)

  • Simple to compute, does a good job of tracking the actual RTT
  • Little “headroom” to lower
  • Yet very few early timeouts
  • Turns out to be important for good performance and robustness
SLIDE 87

Congestion

SLIDE 88

TCP to date:

  • We can set up a connection (connection establishment)
  • Tear down a connection (connection release)
  • Keep the sending and receiving buffers from overflowing (flow control)

What’s missing?
SLIDE 89

Network Congestion

  • A “traffic jam” in the network
  • Later we will learn how to control it

SLIDE 90

Congestion Collapse in the 1980s

  • Early TCP used a fixed-size window (e.g., 8 packets)
  • Initially fine for reliability
  • But something happened as the ARPANET grew
  • Links stayed busy but transfer rates fell by orders of magnitude!
SLIDE 91

Nature of Congestion

  • Routers/switches have internal buffering

[Figure: router internals - input buffers, switching fabric, output buffers]
SLIDE 92

Nature of Congestion (2)

  • Simplified view of per port output queues
  • Typically FIFO (First In First Out), discard when full

[Figure: simplified router - a per-port FIFO output queue holding queued packets]
SLIDE 93

Nature of Congestion (3)

  • Queues help by absorbing bursts when input > output rate
  • But if input > output rate persistently, the queue will overflow
  • This is congestion
  • Congestion is a function of the traffic patterns – it can occur even if every link has the same capacity
SLIDE 94

Effects of Congestion

  • What happens to performance as we increase load?
SLIDE 95

Effects of Congestion (2)

  • What happens to performance as we increase load?
SLIDE 96

Effects of Congestion (3)

  • As offered load rises, congestion occurs as queues begin to fill:
  • Delay and loss rise sharply with more load
  • Throughput falls below load (due to loss)
  • Goodput may fall below throughput (due to spurious retransmissions)
  • None of the above is good!
  • Want network performance just before congestion
SLIDE 97

Van Jacobson (1950—)

  • Widely credited with saving the Internet from congestion collapse in the late 80s
  • Introduced congestion control principles
  • Practical solutions (TCP Tahoe/Reno)
  • Much other pioneering work:
  • Tools like traceroute, tcpdump, pathchar
  • IP header compression, multicast tools
SLIDE 98

TCP Tahoe/Reno

  • TCP extensions and features we will study:
  • AIMD
  • Fair Queuing
  • ACK clocking
  • Adaptive timeout (mean and variance)
  • Slow-start
  • Fast Retransmission
  • Fast Recovery

SLIDE 99

TCP Timeline

[Timeline 1970–1990: origins of “TCP” (Cerf & Kahn, ’74); 3-way handshake (Tomlinson, ’75); TCP and IP (RFC 791/793, ’81); TCP/IP “flag day” (BSD Unix 4.2, ’83); congestion collapse observed, ’86; TCP Tahoe (Jacobson, ’88); TCP Reno (Jacobson, ’90). Pre-history, then congestion control]
SLIDE 100

TCP Timeline (2)

[Timeline 1990–2010: TCP Reno (Jacobson, ’90); TCP Vegas (Brakmo, ’93); ECN (Floyd, ’94); TCP New Reno (Hoe, ’95); TCP with SACK (Floyd, ’96); TCP BIC (Linux, ’04); FAST TCP (Low et al., ’04); TCP CUBIC (Linux, ’06); Compound TCP (Windows, ’07); TCP LEDBAT (IETF, ’08). Classic congestion control, then diversification (delay-based, router support, background)]
SLIDE 101

Bandwidth Allocation

  • An important task for the network is to allocate its capacity to senders
  • A good allocation is both efficient and fair
  • Efficient means most capacity is used but there is no congestion
  • Fair means every sender gets a reasonable share of the network
SLIDE 102

Bandwidth Allocation (2)

  • Key observation:
  • In an effective solution, Transport and Network layers must work together
  • Network layer witnesses congestion
  • Only it can provide direct feedback
  • Transport layer causes congestion
  • Only it can reduce offered load
SLIDE 103

Bandwidth Allocation (3)

  • Why is it hard? (Just split equally!)
  • Number of senders and their offered load changes
  • Senders may lack capacity in different parts of the network
  • Network is distributed; no single party has an overall picture of its state
SLIDE 104

Bandwidth Allocation (4)

  • Solution context:
  • Senders adapt concurrently based on their own view of the network
  • Design this adaptation so the network usage as a whole is efficient and fair
  • Adaptation is continuous since offered loads continue to change over time
SLIDE 105

Fair Allocations

SLIDE 106

Fair Allocation

  • What’s a “fair” bandwidth allocation?
  • The max-min fair allocation

SLIDE 107

Recall

  • We want a good bandwidth allocation to be both fair and efficient
  • Now we learn what fair means
  • Caveat: in practice, efficiency is more important than fairness
SLIDE 108

Efficiency vs. Fairness

  • Cannot always have both!
  • Example network with traffic:
  • A→B, B→C, and A→C
  • How much traffic can we carry?

[Figure: A –1– B –1– C; both links have capacity 1]
SLIDE 109

Efficiency vs. Fairness (2)

  • If we care about fairness:
  • Give equal bandwidth to each flow
  • A→B: ½ unit, B→C: ½, and A→C: ½
  • Total traffic carried is 1½ units
SLIDE 110

Efficiency vs. Fairness (3)

  • If we care about efficiency:
  • Maximize total traffic in the network
  • A→B: 1 unit, B→C: 1, and A→C: 0
  • Total traffic rises to 2 units!
SLIDE 111

The Slippery Notion of Fairness

  • Why is “equal per flow” fair anyway?
  • A→C uses more network resources than A→B or B→C
  • Host A sends two flows, B sends one
  • Not productive to seek exact fairness
  • More important to avoid starvation
  • A node that cannot use any bandwidth
  • “Equal per flow” is good enough
SLIDE 112

Generalizing “Equal per Flow”

  • Bottleneck for a flow of traffic is the link that limits its bandwidth
  • Where congestion occurs for the flow
  • For A→C, link A–B is the bottleneck

[Figure: A –1– B –10– C; the capacity-1 A–B link is the bottleneck]
SLIDE 113

Generalizing “Equal per Flow” (2)

  • Flows may have different bottlenecks
  • For A→C, link A–B is the bottleneck
  • For B→C, link B–C is the bottleneck
  • Can no longer divide links equally …

[Figure: A –1– B –10– C]
SLIDE 114

Max-Min Fairness

  • Intuitively, flows bottlenecked on a link get an equal share of that link
  • Max-min fair allocation is one in which:
  • Increasing the rate of one flow will decrease the rate of a smaller flow
  • This “maximizes the minimum” flow
SLIDE 115

Max-Min Fairness (2)

  • To find it given a network, imagine “pouring water into the network”
  • 1. Start with all flows at rate 0
  • 2. Increase the flows until there is a new bottleneck in the network
  • 3. Hold fixed the rate of the flows that are bottlenecked
  • 4. Go to step 2 for any remaining flows
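The four steps above can be coded as a small “water pouring” procedure. The topology below is my reading of the example's figure (flows B, C, D share R4–R5; A and B share R2–R3; all capacities 1), so treat it as an assumption:

```python
def max_min_fair(links, flows):
    """links: {name: capacity}; flows: {name: set of links crossed}."""
    rate = {f: 0.0 for f in flows}
    active = set(flows)                 # step 1: all flows start at rate 0
    while active:
        # Step 2: raise all active flows together until some link saturates.
        def headroom(link):
            used = sum(rate[f] for f in flows if link in flows[f])
            crossing = sum(1 for f in active if link in flows[f])
            return (links[link] - used) / crossing if crossing else float("inf")
        inc = min(headroom(l) for l in links)
        for f in active:
            rate[f] += inc
        # Step 3: freeze flows crossing any link that is now full.
        for l in links:
            used = sum(rate[f] for f in flows if l in flows[f])
            if abs(used - links[l]) < 1e-9:
                active -= {f for f in active if l in flows[f]}
    return rate                         # step 4 is the loop back to step 2

links = {"R2-R3": 1.0, "R4-R5": 1.0}
flows = {"A": {"R2-R3"}, "B": {"R2-R3", "R4-R5"},
         "C": {"R4-R5"}, "D": {"R4-R5"}}
rates = max_min_fair(links, flows)   # A -> 2/3; B, C, D -> 1/3
```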

SLIDE 116

Max-Min Example

  • Example: network with 4 flows, link bandwidth = 1
  • What is the max-min fair allocation?

SLIDE 117

Max-Min Example (2)

  • When rate=1/3, flows B, C, and D bottleneck R4—R5
  • Fix B, C, and D, continue to increase A

SLIDE 118

Max-Min Example (3)

  • When rate=2/3, flow A bottlenecks R2—R3. Done.

SLIDE 119

Max-Min Example (4)

  • End with A=2/3; B, C, D=1/3; and R2—R3, R4—R5 full
  • Other links have extra capacity that can’t be used
SLIDE 120

Adapting over Time

  • Allocation changes as flows start and stop

SLIDE 121

Adapting over Time (2)

[Figure: Flow 1 slows when Flow 2 starts and speeds up when Flow 2 stops; Flow 3’s limit is elsewhere]
SLIDE 122

Bandwidth Allocation

SLIDE 123

Recall

  • Want to allocate capacity to senders
  • Network layer provides feedback
  • Transport layer adjusts offered load
  • A good allocation is efficient and fair
  • How should we perform the allocation?
  • Several different possibilities …

SLIDE 124

Bandwidth Allocation Models

  • Open loop versus closed loop
  • Open: reserve bandwidth before use
  • Closed: use feedback to adjust rates
  • Host versus Network support
  • Who sets/enforces allocations?
  • Window versus Rate based
  • How is allocation expressed?

TCP is closed-loop, host-driven, and window-based
SLIDE 125

Bandwidth Allocation Models (2)

  • We’ll look at closed-loop, host-driven, and window-based too
  • Network layer returns feedback on the current allocation to senders
  • At least tells if there is congestion
  • Transport layer adjusts the sender’s behavior via the window in response
  • How senders adapt is a control law
SLIDE 126

Additive Increase Multiplicative Decrease

  • AIMD is a control law hosts can use to reach a good allocation
  • Hosts additively increase rate while the network is not congested
  • Hosts multiplicatively decrease rate when congested
  • Used by TCP
  • Let’s explore the AIMD game …
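The game can be simulated in a few lines (a toy sketch: two hosts, capacity 1, binary congestion feedback each round; the step sizes are arbitrary choices for the example):

```python
def aimd(rounds=2000, add=0.01, mult=0.5, capacity=1.0):
    x, y = 0.9, 0.1                  # deliberately unfair starting allocation
    for _ in range(rounds):
        if x + y > capacity:         # congested: multiplicative decrease
            x, y = x * mult, y * mult
        else:                        # room left: additive increase
            x, y = x + add, y + add
    return x, y

x, y = aimd()
# Each multiplicative decrease halves the gap between the two rates, while
# additive increase leaves it unchanged, so the allocation converges toward
# the fair, efficient point (0.5, 0.5).
```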

SLIDE 127

AIMD Game

  • Hosts 1 and 2 share a bottleneck
  • But do not talk to each other directly
  • Router provides binary feedback
  • Tells hosts if network is congested

Figure: Hosts 1 and 2 send through a shared bottleneck router into the rest of the network.

slide-128
SLIDE 128

AIMD Game (2)

  • Each point is a possible allocation

Figure: the allocation plane, with Host 1’s and Host 2’s rates on the axes; it shows the fair and efficient lines, the congested region, and the optimal allocation.

slide-129
SLIDE 129

AIMD Game (3)

  • AI and MD move the allocation

Figure: additive increase moves the allocation up at 45°; multiplicative decrease moves it toward the origin. Fair line: y=x; efficient line: x+y=1.

slide-130
SLIDE 130

AIMD Game (4)

  • Play the game!

Figure: an AIMD trajectory played out from a starting point in the allocation plane.

slide-131
SLIDE 131

AIMD Game (5)

  • Always converge to good allocation!

Figure: from any starting point, the AIMD trajectory converges to the fair, efficient allocation.

slide-132
SLIDE 132

AIMD Sawtooth

  • Produces a “sawtooth” pattern over time for the rate of each host
  • This is the TCP sawtooth (later)

Figure: one host’s rate over time; additive-increase ramps are cut by multiplicative decreases.

slide-133
SLIDE 133

AIMD Properties

  • Converges to an allocation that is efficient and fair when hosts run it
  • Holds for more general topologies
  • Other increase/decrease control laws do not! (Try AIAD, MIAD, MIMD)
  • Requires only binary feedback from the network


slide-134
SLIDE 134

Feedback Signals

  • Several possible signals, with different pros/cons
  • We’ll look at classic TCP that uses packet loss as a signal

| Signal | Example Protocol | Pros / Cons |
| --- | --- | --- |
| Packet loss | TCP NewReno, Cubic TCP (Linux) | Hard to get wrong; hear about congestion late |
| Packet delay | Compound TCP (Windows) | Hear about congestion early; need to infer congestion |
| Router indication | TCPs with Explicit Congestion Notification | Hear about congestion early; require router support |

slide-135
SLIDE 135

Slow Start (TCP Additive Increase)

slide-136
SLIDE 136

Practical AIMD

  • We want TCP to follow an AIMD control law for a good allocation
  • Sender uses a congestion window or cwnd to set its rate (≈cwnd/RTT)
  • Sender uses loss as network congestion signal
  • Need TCP to work across a very large range of rates and RTTs
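The rate ≈ cwnd/RTT relation is easy to sanity-check with concrete numbers. The figures below are hypothetical, not from the slides.

```python
# Hypothetical example: 20 segments of 1200 bytes in flight, 50 ms RTT.
cwnd_bytes = 20 * 1200
rtt_seconds = 0.050

rate_bps = cwnd_bytes * 8 / rtt_seconds  # rate ≈ cwnd / RTT
print(rate_bps / 1e6)  # 3.84 (Mbps)
```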


slide-137
SLIDE 137

TCP Startup Problem

  • We want to quickly get near the right rate, cwndIDEAL, but it varies greatly
  • A fixed sliding window doesn’t adapt and is rough on the network (loss!)
  • Additive Increase with small bursts adapts cwnd gently to the network, but might take a long time to become efficient


slide-138
SLIDE 138

Slow-Start Solution

  • Start by doubling cwnd every RTT
  • Exponential growth (1, 2, 4, 8, 16, …)
  • Start slow, quickly reach large values
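The doubling can be sketched in a few lines; the ssthresh cutoff of 64 packets here is an arbitrary illustration.

```python
def slow_start_rounds(ssthresh=64):
    """cwnd per RTT during slow start: each ACK adds one packet,
    so the window doubles every round until it reaches ssthresh."""
    cwnd, history = 1, []
    while cwnd < ssthresh:
        history.append(cwnd)
        cwnd *= 2
    return history

print(slow_start_rounds())  # [1, 2, 4, 8, 16, 32]
```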

Figure: window (cwnd) vs. time; slow-start grows much faster than AI toward a fixed window.

slide-139
SLIDE 139

Slow-Start Solution (2)

  • Eventually packet loss will occur when the network is congested
  • Loss timeout tells us cwnd is too large
  • Next time, switch to AI beforehand
  • Slowly adapt cwnd near the right value
  • In terms of cwnd:
  • Expect loss for cwndC ≈ 2BD+queue
  • Use ssthresh = cwndC/2 to switch to AI
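Plugging assumed numbers into cwndC ≈ 2BD+queue shows the scale involved. Everything here is hypothetical: a 1000 packet/sec bottleneck, 25 ms one-way delay D (so RTT = 2D = 50 ms), and a 20-packet queue.

```python
# Hypothetical path parameters (assumed, not from the slides).
bandwidth_pps = 1000        # bottleneck rate in packets/sec
one_way_delay = 0.025       # D, in seconds (RTT = 2D = 50 ms)
queue_packets = 20          # router queue capacity

bd = bandwidth_pps * one_way_delay   # bandwidth-delay product: 25 packets
cwnd_c = 2 * bd + queue_packets      # expect loss near 70 packets
ssthresh = cwnd_c / 2                # switch to AI at 35 packets
```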


slide-140
SLIDE 140

Slow-Start Solution (3)

  • Combined behavior, after the first time
  • Most time is spent near the right value

Figure: window vs. time; slow-start up to ssthresh, then an AI phase that keeps cwnd near cwndIDEAL (loss expected near cwndC).

slide-141
SLIDE 141

Slow-Start (Doubling) Timeline


Increment cwnd by 1 packet for each ACK

slide-142
SLIDE 142

Additive Increase Timeline


Increment cwnd by 1 packet every cwnd ACKs (or 1 RTT)

slide-143
SLIDE 143

TCP Tahoe (Implementation)

  • Initial slow-start (doubling) phase
  • Start with cwnd = 1 (or small value)
  • cwnd += 1 packet per ACK
  • Later Additive Increase phase
  • cwnd += 1/cwnd packets per ACK
  • Roughly adds 1 packet per RTT
  • Switching threshold (initially infinity)
  • Switch to AI when cwnd > ssthresh
  • Set ssthresh = cwnd/2 after loss
  • Begin with slow-start after timeout
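The rules above can be collected into a small per-ACK sketch. This is illustrative only; real implementations track cwnd in bytes and handle many details omitted here.

```python
class Tahoe:
    """Sketch of the TCP Tahoe congestion-window rules, in packets."""

    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = float("inf")   # switching threshold, initially infinity

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 packet per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # AI: roughly +1 packet per RTT

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2  # remember half the loss point
        self.cwnd = 1.0                # and begin with slow-start again

tcp = Tahoe()
for _ in range(5):
    tcp.on_ack()       # slow start grows cwnd from 1 to 6
tcp.on_timeout()       # ssthresh becomes 3, cwnd resets to 1
```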


slide-144
SLIDE 144

Timeout Misfortunes

  • Why do a slow-start after timeout?
  • Instead of MD cwnd (for AIMD)
  • Timeouts are sufficiently long that the ACK clock will have run down
  • Slow-start ramps up the ACK clock
  • We need to detect loss before a timeout to get to full AIMD


slide-145
SLIDE 145

Fast Recovery (TCP Multiplicative Decrease)

slide-146
SLIDE 146

Practical AIMD (2)

  • We want TCP to follow an AIMD control law for a good allocation
  • Sender uses a congestion window or cwnd to set its rate (≈cwnd/RTT)
  • Sender uses slow-start to ramp up the ACK clock, followed by Additive Increase
  • But after a timeout, sender slow-starts again with cwnd=1 (as it has no ACK clock)


slide-147
SLIDE 147

Inferring Loss from ACKs

  • TCP uses a cumulative ACK
  • Carries highest in-order seq. number
  • Normally a steady advance
  • Duplicate ACKs give us hints about what data hasn’t arrived
  • They tell us some new data did arrive, but it was not the next segment
  • Thus the next segment may be lost


slide-148
SLIDE 148

Fast Retransmit

  • Treat three duplicate ACKs as a loss
  • Retransmit next expected segment
  • Some repetition allows for reordering, but still detects loss quickly
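The duplicate-ACK rule is simple enough to sketch directly; function and variable names here are illustrative.

```python
def fast_retransmits(acks, threshold=3):
    """Return the ACK numbers at which a third duplicate triggers
    a fast retransmit of the next expected segment."""
    triggers, last_ack, dup_count = [], None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                triggers.append(ack)   # resend the segment after `ack`
        else:
            last_ack, dup_count = ack, 0
    return triggers

# The ACK stream from the slide: 1 2 3 4 5 5 5 5 5 5
print(fast_retransmits([1, 2, 3, 4, 5, 5, 5, 5, 5, 5]))  # [5]
```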


ACK stream: 1 2 3 4 5 5 5 5 5 5

slide-149
SLIDE 149

Fast Retransmit (2)

Timeline: ACKs 10–13 arrive normally. Data 14 is lost, but 15–20 arrive, so each triggers another Ack 13. On the third duplicate Ack 13 the sender retransmits Data 14; the retransmission fills the hole at 14 and the ACK jumps to 20.

slide-150
SLIDE 150

Fast Retransmit (3)

  • It can repair a single segment loss quickly, typically before a timeout
  • However, we have quiet time at the sender/receiver while waiting for the ACK to jump
  • And we still need to MD cwnd …


slide-151
SLIDE 151

Inferring Non-Loss from ACKs

  • Duplicate ACKs also give us hints about what data has arrived
  • Each new duplicate ACK means that some new segment has arrived
  • It will be one of the segments after the loss
  • Thus advancing the sliding window will not increase the number of segments stored in the network


slide-152
SLIDE 152

Fast Recovery

  • First fast retransmit, and MD cwnd
  • Then pretend further duplicate ACKs are the expected ACKs
  • Lets new segments be sent for ACKs
  • Reconcile views when the ACK jumps
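A hedged sketch of the sender's reaction on the third duplicate ACK (names are illustrative; real Reno also inflates the window during recovery, which is omitted here):

```python
class RenoSender:
    """Sketch of Reno's loss reaction: fast retransmit plus MD of 1/2."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = None
        self.dup_acks = 0

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:         # third duplicate: infer loss
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh  # MD, but no slow-start
            return "fast_retransmit"
        return None                    # later dups let new segments go out

    def on_new_ack(self):              # ACK jumps: exit fast recovery
        self.dup_acks = 0

s = RenoSender(cwnd=16)
events = [s.on_dup_ack() for _ in range(3)]  # cwnd halves to 8
```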


ACK stream: 1 2 3 4 5 5 5 5 5 5

slide-153
SLIDE 153

Fast Recovery (2)

Timeline: Data 14 was lost earlier, but 15 to 20 arrived, producing duplicate Ack 13s. On the third duplicate, the sender retransmits Data 14 and sets ssthresh and cwnd to cwnd/2. Further duplicate ACKs advance the window, so Data 21 and 22 go out before the ACK jumps. Once the retransmission fills the hole at 14, Ack 20 arrives and the sender exits fast recovery.

slide-154
SLIDE 154

Fast Recovery (3)

  • With fast retransmit, it repairs a single segment loss quickly and keeps the ACK clock running
  • This allows us to realize AIMD
  • No timeouts or slow-start after loss, just continue with a smaller cwnd
  • TCP Reno combines slow-start, fast retransmit, and fast recovery
  • Multiplicative Decrease is ½


slide-155
SLIDE 155

TCP Reno

Figure: the TCP Reno sawtooth; MD of ½ with no slow-start, and the ACK clock keeps running.

slide-156
SLIDE 156

TCP Reno, NewReno, and SACK

  • Reno can repair one loss per RTT
  • Multiple losses cause a timeout
  • NewReno further refines ACK heuristics
  • Repairs multiple losses without timeout
  • Selective ACK (SACK) is a better idea
  • Receiver sends ACK ranges so the sender can retransmit without guesswork


slide-157
SLIDE 157

Network-Side Congestion Control

slide-158
SLIDE 158

Congestion Avoidance vs. Control

  • Classic TCP drives the network into congestion and then recovers
  • Needs to see loss to slow down
  • Would be better to use the network but avoid congestion altogether!
  • Reduces loss and delay
  • But how can we do this?


slide-159
SLIDE 159

Feedback Signals

  • Delay and router signals can let us avoid congestion

| Signal | Example Protocol | Pros / Cons |
| --- | --- | --- |
| Packet loss | Classic TCP, Cubic TCP (Linux) | Hard to get wrong; hear about congestion late |
| Packet delay | Compound TCP (Windows) | Hear about congestion early; need to infer congestion |
| Router indication | TCPs with Explicit Congestion Notification | Hear about congestion early; require router support |

slide-160
SLIDE 160

ECN (Explicit Congestion Notification)

  • Router detects the onset of congestion via its queue
  • When congested, it marks affected packets (IP header)


slide-161
SLIDE 161

ECN (2)

  • Marked packets arrive at receiver; treated as loss
  • TCP receiver reliably informs TCP sender of the congestion


slide-162
SLIDE 162

ECN (3)

  • Advantages:
  • Routers deliver clear signal to hosts
  • Congestion is detected early, no loss
  • No extra packets need to be sent
  • Disadvantages:
  • Routers and hosts must be upgraded


slide-163
SLIDE 163

Random Early Detection (RED)

  • Alternative idea: instead of marking packets, drop them
  • We know senders are using TCP; make use of that fact
  • Signals congestion to sender
  • But without adding headers or doing packet inspection
  • Drop at random, depending on queue size
  • If queue empty, accept packet always
  • If queue full, always drop
  • As queue approaches full, increase likelihood of packet drop
  • Example: 1 queue slot left, 10 packets expected, 90% chance of drop
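The drop rule above can be sketched as a small accept/drop function. This is a simplified linear version for illustration; real RED operates on an averaged queue length with min/max thresholds.

```python
import random

def red_accept(queue_len, queue_max):
    """Simplified RED-style rule: drop probability rises linearly
    with queue occupancy."""
    if queue_len <= 0:
        return True                    # empty queue: always accept
    if queue_len >= queue_max:
        return False                   # full queue: always drop
    return random.random() >= queue_len / queue_max

# With 9 of 10 slots full, a packet is dropped ~90% of the time,
# matching the slide's example.
print(red_accept(0, 10), red_accept(10, 10))  # True False
```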
slide-164
SLIDE 164

RED (Random Early Detection)

  • Router detects the onset of congestion via its queue
  • Prior to congestion, drop a packet to signal


slide-165
SLIDE 165

RED (Random Early Detection)

  • Sender enters MD, slows packet flow
  • We shed load, everyone is happy
