Transport
CSEP 561, University of Washington


slide-1
SLIDE 1

Transport

slide-2
SLIDE 2

Where we are in the Course

  • Moving on up to the Transport Layer!

[Figure: layer stack – Physical, Link, Network, Transport, Application]

slide-3
SLIDE 3

Recall

  • Transport layer provides end-to-end connectivity across the network

[Figure: Host–Router–Host path; protocol stacks app/TCP/IP/802.11 at one host, IP/802.11 and IP/Ethernet at the router, app/TCP/IP/Ethernet at the other host]

slide-4
SLIDE 4

Recall (2)

  • Segments carry application data across the network
  • Segments are carried within packets within frames

[Figure: nesting – an application message (e.g., HTTP) inside a TCP segment, inside an IP packet, inside an 802.11 frame]

slide-5
SLIDE 5

Transport Layer Services

  • Provide different kinds of data delivery across the network to applications

[Figure: delivery models – unreliable messages = Datagrams (UDP); reliable bytestream = Streams (TCP)]

slide-6
SLIDE 6

Comparison of Internet Transports

  • TCP is full-featured; UDP is a glorified packet

  TCP (Streams)                                  UDP (Datagrams)
  Connections                                    Datagrams
  Bytes are delivered once, reliably, in order   Messages may be lost, reordered, duplicated
  Arbitrary length content                       Limited message size
  Flow control matches sender to receiver        Can send regardless of receiver state
  Congestion control matches sender to network   Can send regardless of network state
slide-7
SLIDE 7

Socket API

  • Simple abstraction to use the network
  • The “network” API (really the Transport service) used to write all Internet apps
  • Part of all major OSes and languages; originated in Berkeley Unix, ~1983
  • Supports both Internet transport services (Streams and Datagrams)

slide-8
SLIDE 8

Socket API (2)

  • Sockets let apps attach to the local network at different ports

[Figure: two apps on one host, each attached via its own socket and port]

slide-9
SLIDE 9

Socket API (3)

  • Same API used for Streams and Datagrams

  Primitive       Meaning
  SOCKET          Create a new communication endpoint
  BIND            Associate a local address (port) with a socket
  LISTEN          Announce willingness to accept connections
  ACCEPT          Passively establish an incoming connection
  CONNECT         Actively attempt to establish a connection
  SEND(TO)        Send some data over the socket
  RECEIVE(FROM)   Receive some data over the socket
  CLOSE           Release the socket

  (LISTEN, ACCEPT, and CONNECT are needed only for Streams; the TO/FROM address forms are used for Datagrams)
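These primitives map directly onto the Berkeley-style socket APIs of modern languages. A minimal Python sketch (language, loopback address, and port number are my choices; the slides show no code): the passive side does SOCKET–BIND–LISTEN–ACCEPT, the active side does SOCKET–CONNECT, then both SEND/RECEIVE and CLOSE.

```python
import socket
import threading

def run_echo_once(port=50007):
    """One stream connection on loopback, exercising every primitive once."""
    # SOCKET + BIND + LISTEN: passive side creates an endpoint on a local port
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)

    def serve():
        conn, _ = server.accept()        # ACCEPT: passively establish
        data = conn.recv(1024)           # RECEIVE
        conn.sendall(data)               # SEND (echo it back)
        conn.close()                     # CLOSE

    t = threading.Thread(target=serve)
    t.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))  # CONNECT: actively establish
    client.sendall(b"hello")             # SEND
    reply = client.recv(1024)            # RECEIVE
    client.close()                       # CLOSE
    t.join()
    server.close()
    return reply
```

Note that the client never calls BIND: the OS assigns it an ephemeral port, as described on the Ports slide below.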

slide-10
SLIDE 10

Ports

  • An application process is identified by the tuple (IP address, transport protocol, port)
  • Ports are 16-bit integers representing local “mailboxes” that a process leases
  • Servers often bind to “well-known ports”
    • <1024; binding requires administrative privileges
  • Clients are often assigned “ephemeral” ports
    • Chosen by the OS, used temporarily

slide-11
SLIDE 11

Some Well-Known Ports

  Port     Protocol   Use
  20, 21   FTP        File transfer
  22       SSH        Remote login, replacement for Telnet
  25       SMTP       Email
  80       HTTP       World Wide Web
  110      POP-3      Remote email access
  143      IMAP       Remote email access
  443      HTTPS      Secure Web (HTTP over SSL/TLS)
  543      RTSP       Media player control
  631      IPP        Printer sharing

slide-12
SLIDE 12

UDP

slide-13
SLIDE 13

User Datagram Protocol (UDP)

  • Used by apps that don’t want reliability or bytestreams
  • Like what?

slide-14
SLIDE 14

User Datagram Protocol (UDP)

  • Used by apps that don’t want reliability or bytestreams:
    • Voice-over-IP
    • DNS, RPC
    • DHCP

  (If an application wants reliability and messages, then it has work to do!)

slide-15
SLIDE 15

Datagram Sockets

[Figure: client (host 1) sends a request; server (host 2) returns a reply]

slide-16
SLIDE 16

Datagram Sockets (2)

[Figure: datagram socket timeline – server: socket, bind, recvfrom* (blocks), then sendto the reply; client: socket, sendto the request, recvfrom* (blocks); both sides close. * = call blocks]
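The timeline above can be sketched with datagram sockets in Python (port number arbitrary; both ends run in one process for illustration, so the client needs no explicit bind and the OS assigns it an ephemeral port):

```python
import socket

def udp_request_reply(port=50008):
    """One request/reply exchange over datagram sockets on loopback."""
    # Server: socket, bind to a known port
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", port))

    # Client: socket only; no bind needed
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"request", ("127.0.0.1", port))   # client sendto
    msg, addr = server.recvfrom(1024)                # server recvfrom (blocks)
    server.sendto(b"reply:" + msg, addr)             # server sendto
    reply, _ = client.recvfrom(1024)                 # client recvfrom (blocks)
    client.close()
    server.close()
    return reply
```

Unlike the stream example, there is no connection setup: each sendto names the destination explicitly, and each recvfrom reports the sender's address.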

slide-17
SLIDE 17

UDP Buffering

[Figure: UDP buffering – arriving packets are demultiplexed by port into per-socket message queues, one per application]

slide-18
SLIDE 18

UDP Header

  • Uses ports to identify sending and receiving application processes
  • Datagram length up to 64K
  • Checksum (16 bits) for reliability

slide-19
SLIDE 19

UDP Header (2)

  • Optional checksum covers the UDP segment and an IP pseudoheader
    • Checks key IP fields (addresses)
    • Value of zero means “no checksum”
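The checksum itself is the standard Internet one's-complement sum over 16-bit words (RFC 768/1071 style); a minimal sketch. Note that a real UDP implementation also feeds in the pseudoheader bytes, and transmits a computed value of zero as 0xFFFF, since zero is reserved for “no checksum”:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, complemented at the end."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    return (~total) & 0xFFFF
```

The defining property: summing the data together with its own checksum folds to zero, which is how the receiver verifies it.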

slide-20
SLIDE 20

TCP

slide-21
SLIDE 21

TCP

  • TCP consists of 3 primary phases:
  • Connection Establishment (Setup)
  • Sliding Windows/Flow Control
  • Connection Release (Teardown)
slide-22
SLIDE 22

Connection Establishment

  • Both sender and receiver must be ready before we start the transfer of data
  • Need to agree on a set of parameters
    • e.g., the Maximum Segment Size (MSS)
  • This is signaling
    • It sets up state at the endpoints
    • Like “dialing” for a telephone call

slide-23
SLIDE 23


Three-Way Handshake

  • Used in TCP; opens the connection for data in both directions
  • Each side probes the other with a fresh Initial Sequence Number (ISN)
    • Sends one on a SYNchronize segment
    • Echoes one on an ACKnowledge segment
  • Chosen to be robust even against delayed duplicates

[Figure: SYN/ACK exchange between active party (client) and passive party (server)]

slide-24
SLIDE 24


Three-Way Handshake (2)

  • Three steps:
    • Client sends SYN(x)
    • Server replies with SYN(y)ACK(x+1)
    • Client replies with ACK(y+1)
  • SYNs are retransmitted if lost
  • Sequence and ack numbers are carried on further segments

[Figure: handshake timeline – steps 1–3 between active party (client) and passive party (server)]

slide-25
SLIDE 25


Three-Way Handshake (3)

  • Suppose delayed, duplicate copies of the SYN and ACK arrive at the server!
    • Improbable, but anyhow …

[Figure: the stale SYN and ACK arriving at the passive party (server)]

slide-26
SLIDE 26


Three-Way Handshake (4)

  • Suppose delayed, duplicate copies of the SYN and ACK arrive at the server!
    • Improbable, but anyhow …
  • Connection will be cleanly rejected on both sides

[Figure: both sides answer the stale segments with REJECT]

slide-27
SLIDE 27

Connection Release

  • Orderly release by both parties when done
    • Delivers all pending data and “hangs up”
    • Cleans up state in sender and receiver
  • Key problem is to provide reliability while releasing
    • TCP uses a “symmetric” close in which both sides shut down independently

slide-28
SLIDE 28


TCP Connection Release

  • Two steps:
    • Active sends FIN(x), passive ACKs
    • Passive sends FIN(y), active ACKs
  • FINs are retransmitted if lost
  • Each FIN/ACK closes one direction of data transfer

slide-29
SLIDE 29


TCP Connection Release (2)

  • Two steps:
    • Active sends FIN(x), passive ACKs
    • Passive sends FIN(y), active ACKs
  • FINs are retransmitted if lost
  • Each FIN/ACK closes one direction of data transfer

[Figure: FIN/ACK exchange, steps 1 and 2, between the active and passive parties]

slide-30
SLIDE 30

Flow Control

slide-31
SLIDE 31

Recall

  • ARQ with one message at a time is Stop-and-Wait (normal case below)

[Figure: stop-and-wait timeline – sender transmits Frame 0, receives ACK 0 within the timeout, then Frame 1, ACK 1]

slide-32
SLIDE 32

Limitation of Stop-and-Wait

  • It allows only a single message to be outstanding from the sender:
    • Fine for a LAN (only one frame fits in the network anyhow)
    • Not efficient for network paths with BD >> 1 packet

slide-33
SLIDE 33

Limitation of Stop-and-Wait (2)

  • Example: R = 1 Mbps, D = 50 ms, 10 kb packets
  • RTT (Round Trip Time) = 2D = 100 ms
  • How many packets/sec?
  • What if R = 10 Mbps?
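A quick way to answer the questions above: stop-and-wait delivers one packet per transmission time plus RTT. A small worked sketch (the arithmetic is mine, not from the slides):

```python
def stop_and_wait_rate(R_bps, D_s, packet_bits):
    """Packets/sec for stop-and-wait: serialize one packet, then idle one RTT
    waiting for its ACK before the next packet can go."""
    rtt = 2 * D_s
    per_packet = packet_bits / R_bps + rtt   # transmission time + RTT
    return 1.0 / per_packet
```

With R = 1 Mbps this gives about 9 packets/s (roughly 91 kbps of a 1 Mbps link). Raising R to 10 Mbps barely helps (under 10 packets/s), because the 100 ms RTT dominates: that is exactly the inefficiency the sliding window fixes.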

slide-34
SLIDE 34

Sliding Window

  • Generalization of stop-and-wait
    • Allows W packets to be outstanding
    • Can send W packets per RTT (= 2D)
  • Pipelining improves performance
  • Need W = 2BD to fill the network path

slide-35
SLIDE 35

Sliding Window (2)

  • What W will use the network capacity?
    • Assume 10 kb packets
  • Ex: R = 1 Mbps, D = 50 ms
  • Ex: What if R = 10 Mbps?

slide-36
SLIDE 36

Sliding Window (3)

  • Ex: R = 1 Mbps, D = 50 ms
    • 2BD = 10^6 b/s × 100×10^-3 s = 100 kbit
    • W = 2BD = 10 packets of 1250 bytes
  • Ex: What if R = 10 Mbps?
    • 2BD = 1000 kbit
    • W = 2BD = 100 packets of 1250 bytes
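The window sizing above is just the bandwidth-delay product over the packet size; a one-function sketch of the same arithmetic:

```python
def window_for_path(R_bps, D_s, packet_bits):
    """W = 2BD in packets: how many packets must be in flight per RTT (=2D)
    to keep a path of rate R and one-way delay D fully utilized."""
    bits_per_rtt = R_bps * 2 * D_s
    return bits_per_rtt / packet_bits
```

Plugging in the slide's numbers reproduces its answers: 10 packets at 1 Mbps, 100 packets at 10 Mbps (10 kb = 1250-byte packets).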

slide-37
SLIDE 37

Sliding Window Protocol

  • Many variations, depending on how buffers, acknowledgements, and retransmissions are handled
  • Go-Back-N
    • Simplest version, can be inefficient
  • Selective Repeat
    • More complex, better performance

slide-38
SLIDE 38

Sliding Window – Sender

  • Sender buffers up to W segments until they are acknowledged
    • LFS = LAST FRAME SENT, LAR = LAST ACK REC’D
    • Sends while LFS – LAR ≤ W

[Figure: sender’s sequence-number line – acked segments up to LAR, unacked segments between LAR and LFS (window W = 5), then available and unavailable numbers]

slide-39
SLIDE 39

Sliding Window – Sender (2)

  • Transport accepts another segment of data from the Application ...
  • Transport sends it (as LFS – LAR = 5)

[Figure: as before, with the new segment sent and LFS advanced]

slide-40
SLIDE 40

Sliding Window – Sender (3)

  • Next higher ACK arrives from peer…
    • Window advances, buffer is freed
    • LFS – LAR = 4 (can send one more)

[Figure: window slides right; the acked buffer slot becomes available]

slide-41
SLIDE 41

Sliding Window – Go-Back-N

  • Receiver keeps only a single packet buffer for the next segment
    • State variable, LAS = LAST ACK SENT
  • On receive:
    • If seq. number is LAS+1, accept and pass it to the app, update LAS, send ACK
    • Otherwise discard (as out of order)
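The receive rule above is only a few lines; a sketch (the function name and list-of-pairs representation are mine):

```python
def gbn_receiver(segments, las=0):
    """Go-Back-N receiver: accept only the next in-order segment (LAS+1).
    segments is a list of (seq, data) pairs in arrival order."""
    delivered = []
    for seq, data in segments:
        if seq == las + 1:          # in order: deliver to the app, update LAS
            delivered.append(data)
            las = seq
        # otherwise: discard; either way the ACK sent carries LAS
    return las, delivered
```

For arrivals 1, 3, 2: segment 3 is discarded (out of order), so only 1 and 2 are delivered; the sender will retransmit 3 onward.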

slide-42
SLIDE 42

Sliding Window – Selective Repeat

  • Receiver passes data to the app in order, and buffers out-of-order segments to reduce retransmissions
  • ACK conveys highest in-order segment, plus hints about out-of-order segments
  • TCP uses a selective repeat design; we’ll see the details later

slide-43
SLIDE 43

Sliding Window – Selective Repeat (2)

  • Buffers W segments, keeps state variable LAS = LAST ACK SENT
  • On receive:
    • Buffer segments [LAS+1, LAS+W]
    • Send the app in-order segments from LAS+1, and update LAS
    • Send ACK for LAS regardless
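The selective-repeat receive step can be sketched the same way (names and the dict-as-buffer representation are mine):

```python
def sr_receive(buffer, las, w, seq, data):
    """Selective Repeat receiver step: buffer anything in [LAS+1, LAS+W],
    then deliver whatever prefix is now in order. buffer maps seq -> data."""
    delivered = []
    if las + 1 <= seq <= las + w:
        buffer[seq] = data               # accept anything inside the window
    while las + 1 in buffer:             # slide: pass in-order data to the app
        delivered.append(buffer.pop(las + 1))
        las += 1
    return las, delivered                # the ACK sent carries LAS regardless
```

If segment 2 arrives before segment 1, it is held in the buffer; when 1 arrives, both are delivered in order and LAS jumps to 2 — the retransmission of 2 was avoided.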

slide-44
SLIDE 44

Sliding Window – Selective Retransmission (3)

  • Keep the normal sliding window
  • If we receive something out of order
    • Send the last unacked packet again!

[Figure: an ACK arrives out of order; the sender resends LAR+1]

slide-45
SLIDE 45

Sliding Window – Selective Retransmission (4)

  • Keep the normal sliding window
  • If the correct packet arrives, move the window and LAR, send more messages

[Figure: the correct ACK arrives; the window and LAR advance, freeing slots]

slide-46
SLIDE 46

Sliding Window – Retransmissions

  • Go-Back-N uses a single timer to detect losses
    • On timeout, resends buffered packets starting at LAR+1
  • Selective Repeat uses a timer per unacked segment to detect losses
    • On timeout for a segment, resend it
    • Hope to resend fewer segments

slide-47
SLIDE 47

Sequence Time Plot

[Figure: sequence number vs. time – transmissions at the sender and ACKs at the receiver, separated by a delay of RTT/2]

slide-48
SLIDE 48

Sequence Time Plot (2)

[Figure: sequence-time plot for a Go-Back-N scenario]

slide-49
SLIDE 49

Sequence Time Plot (3)

[Figure: sequence-time plot showing a loss, the timeout, and the retransmissions]

slide-50
SLIDE 50

Problem

  • Sliding window has pipelining to keep the network busy
  • What if the receiver is overloaded?

[Figure: “Big Iron” server streams video to a wee mobile, which can’t keep up – “Arg …”]

slide-51
SLIDE 51

Sliding Window – Receiver

  • Consider a receiver with W buffers
    • LAS = LAST ACK SENT; app pulls in-order data from the buffer with recv() calls

[Figure: receiver’s sequence-number line – finished up to LAS, an acceptable window of W = 5 slots, “too high” beyond it]

slide-52
SLIDE 52

Sliding Window – Receiver (2)

  • Suppose the next two segments arrive but the app does not call recv()

[Figure: two of the acceptable slots fill]

slide-53
SLIDE 53

Sliding Window – Receiver (3)

  • Suppose the next two segments arrive but the app does not call recv()
    • LAS rises, but we can’t slide the window!

[Figure: the two arrived segments are acked but still occupy buffers]

slide-54
SLIDE 54

Sliding Window – Receiver (4)

  • Further segments arrive (in order); we fill the buffer
    • Must drop segments until the app recvs!

[Figure: buffer full – nothing is acceptable]

slide-55
SLIDE 55

Sliding Window – Receiver (5)

  • App recv() takes two segments
  • Window slides (phew)

[Figure: two slots free up and become acceptable again]

slide-56
SLIDE 56

Flow Control

  • Avoid loss at the receiver by telling the sender the available buffer space
    • WIN = #Acceptable, not W (counted from LAS)

[Figure: the receiver advertises its acceptable slots as WIN]

slide-57
SLIDE 57

Flow Control (2)

  • Sender uses the lower of the sliding window and flow control window (WIN) as the effective window size

[Figure: with WIN = 3, the sender’s effective window shrinks to W = 3]

slide-58
SLIDE 58


Flow Control (3)

  • TCP-style example
  • SEQ/ACK sliding window
  • Flow control with WIN
  • SEQ + length < ACK+WIN
  • 4KB buffer at receiver
  • Circular buffer of bytes
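The window check above can be written as a one-line predicate (byte units; off-by-one conventions differ between presentations, so the ≤ below is my assumption for "new data must fit within the advertised window"):

```python
def may_send(seq, length, ack, win):
    """Sender-side flow-control check: the new data occupying [seq, seq+length)
    must fit inside the receiver's advertised window [ack, ack+win)."""
    return seq + length <= ack + win
```

For example, with a 4KB buffer fully advertised (win=4096) and everything acked up to seq, the sender may write at most 4096 more bytes before it must wait for the window to reopen.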
slide-59
SLIDE 59

Topic

  • How to set the timeout for sending a retransmission
    • Adapting to the network path

[Figure: sender wonders whether a packet crossing the network was lost]

slide-60
SLIDE 60

Retransmissions

  • With a sliding window, we detect loss with a timeout
    • Set a timer when a segment is sent
    • Cancel the timer when its ack is received
    • If the timer fires, retransmit the data as lost

slide-61
SLIDE 61

Timeout Problem

  • Timeout should be “just right”
    • Too long wastes network capacity
    • Too short leads to spurious resends
    • But what is “just right”?
  • Easy to set on a LAN (Link)
    • Short, fixed, predictable RTT
  • Hard on the Internet (Transport)
    • Wide range, variable RTT

slide-62
SLIDE 62

Example of RTTs

[Figure: measured round-trip times to BCN over 200 seconds, ranging roughly 100–1000 ms]

slide-63
SLIDE 63

Example of RTTs (2)

[Figure: same RTT trace – the floor is the propagation (+transmission) delay ≈ 2D; the variation above it is due to queuing at routers, changes in network paths, etc.]

slide-64
SLIDE 64

Example of RTTs (3)

[Figure: same RTT trace with two fixed timers, one too high and one too low – we need to adapt to the network conditions]

slide-65
SLIDE 65

Adaptive Timeout

  • Smoothed estimates of the RTT (1) and the variance in RTT (2)
    • Update estimates with a moving average
    1. SRTT_{N+1} = 0.9·SRTT_N + 0.1·RTT_{N+1}
    2. Svar_{N+1} = 0.9·Svar_N + 0.1·|RTT_{N+1} – SRTT_{N+1}|
  • Set the timeout to a multiple of the estimates
    • To estimate the upper RTT in practice
    • TCP Timeout_N = SRTT_N + 4·Svar_N
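The two update rules and the timeout formula above, as code (values in ms, α = 0.9 as on the slide; real TCP stacks use the closely related RFC 6298 constants):

```python
def update_rto(srtt, svar, rtt_sample, alpha=0.9):
    """One RTT sample's worth of Jacobson/Karels-style smoothing:
    returns the new SRTT, the new Svar, and the resulting timeout."""
    srtt = alpha * srtt + (1 - alpha) * rtt_sample          # rule (1)
    svar = alpha * svar + (1 - alpha) * abs(rtt_sample - srtt)  # rule (2)
    return srtt, svar, srtt + 4 * svar                      # Timeout = SRTT + 4*Svar
```

With a steady RTT the variance term decays, so the timeout settles just above the measured RTT; a variable RTT inflates Svar and pushes the timeout up, which is exactly the adaptation the plots below show.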

slide-66
SLIDE 66

Example of Adaptive Timeout

[Figure: SRTT and Svar tracking the RTT trace over time]

slide-67
SLIDE 67

Example of Adaptive Timeout (2)

[Figure: the timeout (SRTT + 4·Svar) plotted over the RTT trace; one early timeout is visible]

slide-68
SLIDE 68

Adaptive Timeout (2)

  • Simple to compute, does a good job of tracking the actual RTT
    • Little “headroom” to lower
    • Yet very few early timeouts
  • Turns out to be important for good performance and robustness

slide-69
SLIDE 69

Congestion

slide-70
SLIDE 70

TCP to date:

  • We can set up a connection (connection establishment)
  • Tear down a connection (connection release)
  • Keep the sending and receiving buffers from overflowing (flow control)

What’s missing?

slide-71
SLIDE 71

Network Congestion

  • A “traffic jam” in the network
  • Later we will learn how to control it

[Figure: packets queue up inside the network – “What’s the hold up?”]

slide-72
SLIDE 72

Congestion Collapse in the 1980s

  • Early TCP used a fixed-size window (e.g., 8 packets)
    • Initially fine for reliability
  • But something happened as the ARPANET grew
    • Links stayed busy, but transfer rates fell by orders of magnitude!

slide-73
SLIDE 73

Nature of Congestion

  • Routers/switches have internal buffering

[Figure: router internals – input buffers, switching fabric, output buffers]

slide-74
SLIDE 74

Nature of Congestion (2)

  • Simplified view of per-port output queues
  • Typically FIFO (First In First Out); discard when full

[Figure: a router modeled as a FIFO queue of packets per output port]

slide-75
SLIDE 75

Nature of Congestion (3)

  • Queues help by absorbing bursts when input rate > output rate
  • But if input > output rate persistently, the queue will overflow
    • This is congestion
  • Congestion is a function of the traffic patterns – it can occur even if every link has the same capacity

slide-76
SLIDE 76

Effects of Congestion

  • What happens to performance as we increase load?
slide-77
SLIDE 77

Effects of Congestion (2)

  • What happens to performance as we increase load?
slide-78
SLIDE 78

Effects of Congestion (3)

  • As offered load rises, congestion occurs as queues begin to fill:
    • Delay and loss rise sharply with more load
    • Throughput falls below load (due to loss)
    • Goodput may fall below throughput (due to spurious retransmissions)
  • None of the above is good!
  • Want to operate the network just before congestion

slide-79
SLIDE 79

Van Jacobson (1950—)

  • Widely credited with saving the Internet from congestion collapse in the late ’80s
    • Introduced congestion control principles
    • Practical solutions (TCP Tahoe/Reno)
  • Much other pioneering work:
    • Tools like traceroute, tcpdump, pathchar
    • IP header compression, multicast

slide-80
SLIDE 80

TCP Tahoe/Reno

  • TCP extensions and features we will study:
  • AIMD
  • Fair Queuing
  • Slow-start
  • Fast Retransmission
  • Fast Recovery


slide-81
SLIDE 81

TCP Timeline

[Figure: timeline 1970–1990 – pre-history: origins of “TCP” (Cerf & Kahn, ’74), 3-way handshake (Tomlinson, ’75), TCP and IP (RFC 791/793, ’81), TCP/IP “flag day” (BSD Unix 4.2, ’83); congestion control: congestion collapse observed ’86, TCP Tahoe (Jacobson, ’88), TCP Reno (Jacobson, ’90)]

slide-82
SLIDE 82

TCP Timeline (2)

[Figure: timeline 1990–2010 – classic congestion control: TCP Reno (Jacobson, ’90), TCP Vegas (Brakmo, ’93, delay based), ECN (Floyd, ’94, router support), TCP New Reno (Hoe, ’95), TCP with SACK (Floyd, ’96); diversification: FAST TCP (Low et al., ’04, delay based), TCP BIC (Linux, ’04), TCP CUBIC (Linux, ’06), Compound TCP (Windows, ’07), TCP LEDBAT (IETF, ’08, background)]

slide-83
SLIDE 83

Bandwidth Allocation

  • An important task for the network is to allocate its capacity to senders
  • A good allocation is both efficient and fair
    • Efficient means most capacity is used, but there is no congestion
    • Fair means every sender gets a reasonable share of the network

slide-84
SLIDE 84

Bandwidth Allocation (2)

  • Key observation:
    • In an effective solution, the Transport and Network layers must work together
  • The Network layer witnesses congestion
    • Only it can provide direct feedback
  • The Transport layer causes congestion
    • Only it can reduce offered load

slide-85
SLIDE 85

Bandwidth Allocation (3)

  • Why is it hard? (Just split equally!)
    • The number of senders and their offered load changes
    • Senders may lack capacity in different parts of the network
    • The network is distributed; no single party has an overall picture of its state

slide-86
SLIDE 86

Bandwidth Allocation (4)

  • Solution context:
    • Senders adapt concurrently based on their own view of the network
    • Design this adaptation so the network usage as a whole is efficient and fair
    • Adaptation is continuous since offered loads continue to change over time

slide-87
SLIDE 87

Fair Allocations

slide-88
SLIDE 88

Fair Allocation

  • What’s a “fair” bandwidth allocation?
  • The max-min fair allocation


slide-89
SLIDE 89

Recall

  • We want a good bandwidth allocation to be both fair and efficient
    • Now we learn what fair means
  • Caveat: in practice, efficiency is more important than fairness

slide-90
SLIDE 90

Efficiency vs. Fairness

  • Cannot always have both!
  • Example network with traffic: A→B, B→C, and A→C
  • How much traffic can we carry?

[Figure: three nodes A–B–C in a line, with two links of capacity 1 each]

slide-91
SLIDE 91

Efficiency vs. Fairness (2)

  • If we care about fairness:
    • Give equal bandwidth to each flow
    • A→B: ½ unit, B→C: ½, and A→C: ½
    • Total traffic carried is 1½ units

[Figure: same A–B–C network, each link of capacity 1]

slide-92
SLIDE 92

Efficiency vs. Fairness (3)

  • If we care about efficiency:
    • Maximize total traffic in the network
    • A→B: 1 unit, B→C: 1, and A→C: 0
    • Total traffic rises to 2 units!

[Figure: same A–B–C network, each link of capacity 1]

slide-93
SLIDE 93

The Slippery Notion of Fairness

  • Why is “equal per flow” fair anyway?
    • A→C uses more network resources than A→B or B→C
    • Host A sends two flows, B sends one
  • Not productive to seek exact fairness
    • More important to avoid starvation
    • (a node that cannot use any bandwidth)
  • “Equal per flow” is good enough

slide-94
SLIDE 94

Generalizing “Equal per Flow”

  • The bottleneck for a flow of traffic is the link that limits its bandwidth
    • Where congestion occurs for the flow
    • For A→C, link A–B is the bottleneck

[Figure: A–B link of capacity 1, B–C link of capacity 10; A–B is the bottleneck]

slide-95
SLIDE 95

Generalizing “Equal per Flow” (2)

  • Flows may have different bottlenecks
    • For A→C, link A–B is the bottleneck
    • For B→C, link B–C is the bottleneck
  • Can no longer divide links equally …

[Figure: A–B link of capacity 1, B–C link of capacity 10]

slide-96
SLIDE 96

Adapting over Time

  • Allocation changes as flows start and stop

[Figure: per-flow bandwidth over time]

slide-97
SLIDE 97

Adapting over Time (2)

[Figure: Flow 1 slows when Flow 2 starts and speeds up when Flow 2 stops; Flow 3’s limit is elsewhere]

slide-98
SLIDE 98

Bandwidth Allocation Models

  • Open loop versus closed loop
    • Open: reserve bandwidth before use
    • Closed: use feedback to adjust rates
  • Host versus Network support
    • Who sets/enforces allocations?
  • Window versus Rate based
    • How is the allocation expressed?

TCP is closed loop, host-driven, and window-based

slide-99
SLIDE 99

Bandwidth Allocation Models (2)

  • We’ll look at closed-loop, host-driven, and window-based allocation too
  • The Network layer returns feedback on the current allocation to senders
    • For TCP, the signal is “a packet dropped”
  • The Transport layer adjusts the sender’s behavior via the window in response
    • How senders adapt is a control law

slide-100
SLIDE 100

Additive Increase Multiplicative Decrease

  • AIMD is a control law hosts can use to reach a good allocation
    • Hosts additively increase their rate while the network is not congested
    • Hosts multiplicatively decrease their rate when congestion occurs
    • Used by TCP
  • Let’s explore the AIMD game …
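The AIMD game can be simulated in a few lines; a sketch under assumed constants (the increment, decrease factor, capacity, and step count are arbitrary choices of mine):

```python
def aimd(x, y, capacity=1.0, add=0.01, mult=0.5, steps=10000):
    """Two hosts share one bottleneck and see only binary congestion feedback:
    when the link is over capacity both halve their rate (MD), otherwise
    both add a small constant (AI)."""
    for _ in range(steps):
        if x + y > capacity:     # congested: multiplicative decrease
            x *= mult
            y *= mult
        else:                    # not congested: additive increase
            x += add
            y += add
    return x, y
```

Starting from a very unfair split like (0.9, 0.1), the rates converge toward the fair, efficient point: AI preserves the difference between the hosts while MD halves it, so the gap shrinks every cycle — the zig-zag path shown on the plots below.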

slide-101
SLIDE 101

AIMD Game

  • Hosts 1 and 2 share a bottleneck
    • But do not talk to each other directly
  • Router provides binary feedback
    • Tells hosts if the network is congested

[Figure: hosts 1 and 2 each feed a capacity-1 link into a bottleneck router, which connects to the rest of the network over a capacity-1 link]

slide-102
SLIDE 102

AIMD Game (2)

  • Each point is a possible allocation

[Figure: Host 1’s rate vs. Host 2’s rate – the fair line, the efficient line, the congested region beyond it, and the optimal allocation where fair meets efficient]

slide-103
SLIDE 103

AIMD Game (3)

  • AI and MD move the allocation

[Figure: same plot, with the fair line y=x and the efficient line x+y=1; additive increase moves the point diagonally up, multiplicative decrease moves it proportionally toward the origin]

slide-104
SLIDE 104

AIMD Game (4)

  • Play the game!

[Figure: a starting point on the rate plot]

slide-105
SLIDE 105

AIMD Game (5)

  • Always converges to the good allocation!

[Figure: the AIMD trajectory zig-zags from the starting point toward the optimal allocation]

slide-106
SLIDE 106

AIMD Sawtooth

  • Produces a “sawtooth” pattern over time for the rate of each host
    • This is the TCP sawtooth (later)

[Figure: a host’s rate over time – additive increase ramps, multiplicative decrease drops]

slide-107
SLIDE 107

AIMD Properties

  • Converges to an allocation that is efficient and fair when hosts run it
    • Holds for more general topologies
  • Other increase/decrease control laws do not! (Try AIAD, MIAD, or MIMD)
  • Requires only binary feedback from the network

slide-108
SLIDE 108

Feedback Signals

  • Several possible signals, with different pros/cons
  • We’ll look at classic TCP, which uses packet loss as its signal

  Signal              Example protocol                  Pros / Cons
  Packet loss         TCP NewReno, Cubic TCP (Linux)    Hard to get wrong; hear about congestion late
  Packet delay        TCP BBR (YouTube)                 Hear about congestion early; need to infer congestion
  Router indication   TCPs with Explicit Congestion     Hear about congestion early; require router support
                      Notification

slide-109
SLIDE 109

Slow Start (TCP Additive Increase)

slide-110
SLIDE 110

Practical AIMD

  • We want TCP to follow an AIMD control law for a good allocation
  • Sender uses a congestion window, or cwnd, to set its rate (≈ cwnd/RTT)
  • Sender uses loss as the network congestion signal
  • Need TCP to work across a very large range of rates and RTTs

slide-111
SLIDE 111

TCP Startup Problem

  • We want to quickly near the right rate, cwnd_IDEAL, but it varies greatly
    • A fixed sliding window doesn’t adapt and is rough on the network (loss!)
    • Additive Increase with small bursts adapts cwnd gently to the network, but might take a long time to become efficient

slide-112
SLIDE 112

Slow-Start Solution

  • Start by doubling cwnd every RTT
    • Exponential growth (1, 2, 4, 8, 16, …)
    • Start slow, quickly reach large values

[Figure: window (cwnd) vs. time – slow-start grows much faster than AI or a fixed window]

slide-113
SLIDE 113

Slow-Start Solution (2)

  • Eventually packet loss will occur when the network is congested
    • A loss timeout tells us cwnd is too large
    • Next time, switch to AI beforehand
    • Slowly adapt cwnd near the right value
  • In terms of cwnd:
    • Expect loss for cwnd_C ≈ 2BD + queue
    • Use ssthresh = cwnd_C/2 to switch to AI

slide-114
SLIDE 114

Slow-Start Solution (3)

  • Combined behavior, after the first time
    • Most time is spent near the right value

[Figure: window vs. time – slow-start up to ssthresh, then an AI phase approaching cwnd_C, near cwnd_IDEAL]

slide-115
SLIDE 115

Slow-Start (Doubling) Timeline

[Figure: slow-start timeline – increment cwnd by 1 packet for each ACK]

slide-116
SLIDE 116

Additive Increase Timeline

[Figure: additive increase timeline – increment cwnd by 1 packet every cwnd ACKs (or 1 RTT)]

slide-117
SLIDE 117

TCP Tahoe (Implementation)

  • Initial slow-start (doubling) phase
    • Start with cwnd = 1 (or a small value)
    • cwnd += 1 packet per ACK
  • Later, Additive Increase phase
    • cwnd += 1/cwnd packets per ACK
    • Roughly adds 1 packet per RTT
  • Switching threshold (initially infinity)
    • Switch to AI when cwnd > ssthresh
    • Set ssthresh = cwnd/2 after loss
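The Tahoe rules above can be sketched as per-ACK and per-timeout updates (cwnd counted in packets; a real implementation counts bytes and handles many more cases, so this is only the control law):

```python
def tahoe_on_ack(cwnd, ssthresh):
    """One ACK's worth of growth: below ssthresh, slow start adds a full
    packet per ACK (doubling per RTT); at or above it, AI adds 1/cwnd per
    ACK (about one packet per RTT)."""
    if cwnd < ssthresh:
        return cwnd + 1.0          # slow start
    return cwnd + 1.0 / cwnd       # additive increase

def tahoe_on_timeout(cwnd):
    """On loss (timeout): remember half the current window as ssthresh and
    restart slow start from cwnd = 1."""
    return 1.0, cwnd / 2.0         # (new cwnd, new ssthresh)
```

For example, in slow start with cwnd = 4, the 4 ACKs of one RTT raise cwnd to 8; a timeout at cwnd = 8 sets ssthresh = 4 and restarts from 1, so the next slow start switches to AI at the halfway point.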

slide-118
SLIDE 118

Timeout Misfortunes

  • Why do a slow-start after a timeout?
    • Instead of MD on cwnd (for AIMD)
  • Timeouts are sufficiently long that the ACK clock will have run down
    • Slow-start ramps up the ACK clock
  • We need to detect loss before a timeout to get to full AIMD

slide-119
SLIDE 119

Fast Recovery (TCP Multiplicative Decrease)

slide-120
SLIDE 120

Practical AIMD (2)

  • We want TCP to follow an AIMD control law for a good allocation
  • Sender uses a congestion window, or cwnd, to set its rate (≈ cwnd/RTT)
  • Sender uses slow-start to ramp up the ACK clock, followed by Additive Increase
  • But after a timeout, the sender slow-starts again with cwnd = 1 (as it has no ACK clock)

slide-121
SLIDE 121

Inferring Loss from ACKs

  • TCP uses a cumulative ACK
    • Carries the highest in-order seq. number
    • Normally a steady advance
  • Duplicate ACKs give us hints about what data hasn’t arrived
    • They tell us some new data did arrive, but it was not the next segment
    • Thus the next segment may be lost

slide-122
SLIDE 122

Fast Retransmit

  • Treat three duplicate ACKs as a loss
  • Retransmit next expected segment
  • Some repetition allows for reordering, but still detects loss

quickly


[Timeline: ACKs 1, 2, 3, 4, 5 arrive in order, then repeated duplicate ACKs of 5 signal a loss]
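
The triple-duplicate-ACK rule can be sketched as a small generator; the names are illustrative, and real TCP tracks this state per connection.

```python
# Hedged sketch of the triple-duplicate-ACK rule.
DUP_THRESHOLD = 3   # three duplicates are treated as a loss

def detect_fast_retransmit(acks):
    """Yield the segment to retransmit whenever 3 duplicate ACKs arrive."""
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == DUP_THRESHOLD:
                yield ack + 1   # next expected segment is presumed lost
        else:
            last_ack, dups = ack, 0

# The ACK stream above (1 2 3 4 5 5 5 5 5 5) triggers one retransmission:
print(list(detect_fast_retransmit([1, 2, 3, 4, 5, 5, 5, 5, 5, 5])))  # [6]
```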

SLIDE 123

Fast Retransmit (2)


[Timeline: ACKs 10 through 13 advance normally. Data 14 was lost earlier, but 15 to 20 arrived, each producing a duplicate ACK 13. On the third duplicate ACK the sender sends 14; the retransmission fills in the hole at 14, and the ACK jumps to 20 once the loss is repaired.]

SLIDE 124

Fast Retransmit (3)

  • It can repair single segment loss quickly, typically before a timeout
  • However, we have quiet time at the sender/receiver while waiting for the ACK to jump
  • And we still need to MD cwnd …

SLIDE 125

Inferring Non-Loss from ACKs

  • Duplicate ACKs also give us hints about what data has arrived
    • Each new duplicate ACK means that some new segment has arrived
    • It will be the segments after the loss
  • Thus advancing the sliding window will not increase the number of segments stored in the network

SLIDE 126

Fast Recovery

  • First fast retransmit, and MD cwnd
  • Then pretend further duplicate ACKs are the expected ACKs
    • Lets new segments be sent for ACKs
  • Reconcile views when the ACK jumps

SLIDE 127

Fast Recovery (2)

CSEP 561 University of Washington 127

[Timeline: Data 14 was lost earlier, but 15 to 20 arrived, producing duplicate ACK 13s. On the third duplicate ACK the sender sends 14 and sets ssthresh, cwnd = cwnd/2. More duplicate ACKs advance the window, so Data 21 and 22 may be sent before the jump. The retransmission fills in the hole at 14; when Ack 20 arrives, the sender exits Fast Recovery.]

SLIDE 128

Fast Recovery (3)

  • With fast retransmit, it repairs a single segment loss quickly and keeps the ACK clock running
  • This allows us to realize AIMD
    • No timeouts or slow-start after loss, just continue with a smaller cwnd
  • TCP Reno combines slow-start, fast retransmit, and fast recovery
    • Multiplicative Decrease is ½
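
Reno's reaction to the two loss signals can be sketched as two handlers. This is a simplified sketch in packet units, not the actual kernel state machine.

```python
# Hedged sketch of TCP Reno's response to loss signals (packet units).

def on_triple_dup_ack(cwnd):
    """Fast retransmit + fast recovery: halve cwnd, keep the ACK clock."""
    ssthresh = max(cwnd / 2.0, 2.0)
    return ssthresh, ssthresh          # continue in AI from the halved window

def on_timeout(cwnd):
    """Timeout: the ACK clock has run down, so slow-start from cwnd = 1."""
    return 1.0, max(cwnd / 2.0, 2.0)   # (new cwnd, new ssthresh)
```

The difference between the two handlers is exactly the sawtooth: duplicate ACKs give a multiplicative decrease with no slow-start, while a timeout forces a full restart.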

SLIDE 129

TCP Reno


[Figure: the TCP sawtooth — MD of ½ with no slow-start; the ACK clock keeps running]

SLIDE 130

TCP Reno, NewReno, and SACK

  • Reno can repair one loss per RTT
    • Multiple losses cause a timeout
  • NewReno further refines ACK heuristics
    • Repairs multiple losses without a timeout
  • Selective ACK (SACK) is a better idea
    • Receiver sends ACK ranges so the sender can retransmit without guesswork

SLIDE 131

TCP CUBIC

  • Standard TCP stack in Linux (> 2.6.19) and Windows (> 10.1709)
  • The Internet has grown to have more long-distance, high-bandwidth connections
  • Seeks to resolve two key problems with “standard” TCP:

⚫ Flows with lower RTTs “grow” faster than those with higher RTTs

⚫ Flows grow too “slowly” (linearly) after congestion
SLIDE 132

TCP CUBIC

1) At the time of a congestion event, the window size at that instant is recorded as Wmax, the maximum window size.
2) Wmax is set as the inflection point of the cubic function that governs the growth of the congestion window.
3) Transmission then restarts with a smaller window value (20%) and, if no congestion is experienced, the window grows along the concave portion of the cubic function (its cadence depends on elapsed time, not on received ACKs).
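
The cubic curve itself is compact enough to sketch. C and BETA below are the constants from RFC 8312 (where the post-loss window is BETA · Wmax, a slightly gentler cut than the 20% figure above); treat the whole block as illustrative, not the Linux source.

```python
# Sketch of the CUBIC window curve (RFC 8312 notation, illustrative).
C = 0.4      # scaling constant
BETA = 0.7   # fraction of the window kept after a congestion event

def cubic_window(t, w_max):
    """Window size t seconds after the last congestion event."""
    # K: elapsed time at which the curve climbs back to w_max
    k = ((w_max * (1 - BETA)) / C) ** (1 / 3)
    return C * (t - k) ** 3 + w_max
```

At t = 0 the window restarts at BETA · Wmax, grows concavely toward the plateau at Wmax (the inflection point), then probes convexly beyond it.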

SLIDE 133

TCP CUBIC

SLIDE 134

TCP CUBIC vs Everyone

SLIDE 135

TCP BBR

  • Bottleneck Bandwidth and Round-trip propagation time
  • Developed at Google in 2016, primarily for YouTube traffic
  • Attempts to solve the “bufferbloat” problem
  • “Model-based” (Vegas) rather than “Loss-based” (CUBIC)

⚫ Measure RTT, latency, and bottleneck bandwidth
⚫ Use these to predict the window size

SLIDE 136

Bufferbloat

  • Larger queues are better than smaller queues, right?

SLIDE 137

Bufferbloat

  • Given TCP loss semantics…
  • Performance can decrease as buffer size is increased
  • Consider a full buffer:

⚫ New packets arrive and are dropped (‘tail drop’)

⚫ SACK doesn’t arrive until the entire buffer is sent

SLIDE 138

TCP BBR

  • BBR has 4 distinct phases:

1) Startup: basically identical to CUBIC. Grow exponentially until RTTs start to increase (instead of waiting for a dropped packet). Set cwnd.
2) Drain: Startup filled a queue, so temporarily reduce the sending rate (known as “pacing gain”).
3) Probe Bandwidth: increase the sending rate to see if there’s more capacity. If not, drain again.
4) Probe RTT: reduce the rate dramatically (to 4 packets) to measure RTT. Use this as the baseline for the phases above.
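
The four phases above form a cycle, which can be sketched as a toy state machine. The phase names come from the slide; the real implementation's timing, gains, and exit conditions are far more involved.

```python
# Toy sketch of BBR's phase cycle (names from the slide, logic simplified).
def next_phase(phase, rtt_rising=False):
    """Return the phase BBR moves to next."""
    if phase == "startup":
        return "drain" if rtt_rising else "startup"  # grow until RTT inflates
    if phase == "drain":
        return "probe_bw"      # queue emptied; cruise at the estimated rate
    if phase == "probe_bw":
        return "probe_rtt"     # periodically drop to ~4 packets to re-measure
    return "probe_bw"          # probe_rtt done; back to probing bandwidth
```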

SLIDE 139

TCP BBR vs Everyone

SLIDE 140

Network-Side Congestion Control

SLIDE 141

Congestion Avoidance vs. Control

  • Classic TCP drives the network into congestion and then recovers
    • Needs to see loss to slow down
  • Would be better to use the network but avoid congestion altogether!
    • Reduces loss and delay
  • But how can we do this?

SLIDE 142

Feedback Signals

  • Delay and router signals can let us avoid congestion


Signal            | Example Protocol                           | Pros / Cons
Packet loss       | Classic TCP, Cubic TCP (Linux)             | Hard to get wrong; hear about congestion late
Packet delay      | TCP BBR (YouTube)                          | Hear about congestion early; need to infer congestion
Router indication | TCPs with Explicit Congestion Notification | Hear about congestion early; require router support

SLIDE 143

ECN (Explicit Congestion Notification)

  • Router detects the onset of congestion via its queue
  • When congested, it marks affected packets (IP header)

SLIDE 144

ECN (2)

  • Marked packets arrive at receiver; treated as loss
  • TCP receiver reliably informs TCP sender of the congestion
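
The endpoint side of this feedback loop can be sketched as two small functions. The flag names echo RFC 3168, but the functions are illustrative, and the CWR handshake that eventually clears ECE is omitted for brevity.

```python
# Hedged sketch of the endpoint side of ECN (simplified; no CWR handshake).

def receiver_ece(ce_marked, ece_latched):
    """Receiver: once a CE-marked packet is seen, echo ECE on ACKs."""
    return ece_latched or ce_marked

def sender_on_ack(ece_flag, cwnd):
    """Sender: treat an ECE-marked ACK like a loss and halve cwnd."""
    return max(cwnd / 2.0, 2.0) if ece_flag else cwnd
```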

SLIDE 145

ECN (3)

  • Advantages:
  • Routers deliver clear signal to hosts
  • Congestion is detected early, no loss
  • No extra packets need to be sent
  • Disadvantages:
  • Routers and hosts must be upgraded (currently 1%)
  • More work at router

SLIDE 146

Random Early Detection (RED)

  • Jacobson (again!) and Floyd
  • Alternative idea: instead of marking packets, drop them
    • We know they’re using TCP; make use of that fact
  • Signals congestion to the sender
    • But without adding headers or doing packet inspection
  • Drop at random, depending on queue size
    • If the queue is empty, always accept the packet
    • If the queue is full, always drop
    • As the queue approaches full, increase the likelihood of a drop
    • Example: 1 queue slot left, 10 packets expected → 90% chance of drop
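
The drop decision above can be sketched as a linear ramp over queue occupancy. This is only the idea from the bullets; the full RED algorithm uses an exponentially weighted average of queue length and min/max thresholds.

```python
# Illustrative sketch of RED's drop decision (linear ramp, simplified).
import random

def red_should_drop(queue_len, queue_cap, rng=random.random):
    """Return True if an arriving packet should be dropped."""
    if queue_len <= 0:
        return False                       # empty queue: always accept
    if queue_len >= queue_cap:
        return True                        # full queue: tail drop
    return rng() < queue_len / queue_cap   # drop probability grows with fill
```

With 9 of 10 slots occupied, an arriving packet is dropped with probability 0.9, matching the example above.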
SLIDE 147

RED (Random Early Detection)

  • Router detects the onset of congestion via its queue
  • Prior to congestion, drop a packet to signal

SLIDE 148

RED (Random Early Detection)

  • Sender enters MD, slows packet flow
  • We shed load, everyone is happy
