SLIDE 1

TCP Overview

Jeff Chase Duke University

These slides draw extensively on material from Srini Seshan and Dave Andersen at CMU (mostly figures), and they also incorporate earlier material by Adolfo Rodriguez and Amin Vahdat. Some congestion control slides are from Ion Stoica.

SLIDE 2

The Internet Protocol Suite

[Figure: the OSI reference stack (Application, Presentation, Session, Transport, Network, Data link, Physical) beside the Internet suite: applications over UDP and TCP, over IP, over data link and physical layers.]

The Hourglass Model

The narrow waist (IP) facilitates interoperability.

SLIDE 3

UDP

  • User Datagram Protocol (UDP)

– Thin veneer on top of IP
– Data sent as individual datagrams

  • From a sender to a receiver: like USPS

– Demultiplexed at the receiver by (IP address, port) pair
– No guarantees about reliability or in-order delivery
– Checksum to detect corruption of data

[Figure: UDP header (SrcPort, DestPort, Len, Checksum) atop the IP and link-layer headers, followed by data.]

SLIDE 4

TCP

  • Transmission Control Protocol (TCP)

– Reliable, in-order delivery of a byte stream
– Full-duplex: each endpoint may send and receive
– Flow control

  • Ensures the sender does not overrun the receiver by sending too fast

– Congestion control

  • Keeps the sender from overrunning the network
  • Many simultaneous connections cross routers (cross traffic)

SLIDE 5

A Brief Internet History

1969: ARPANET created
1972: TELNET (RFC 318)
1973: FTP (RFC 454)
1977: MAIL (RFC 733)
1982: TCP & IP (RFC 793 & 791)
1984: DNS (RFC 883)
1986: NNTP (RFC 977)
1990: ARPANET dissolved
1991: WWW/HTTP
1992: MBONE
1995: Multi-backbone Internet

SLIDE 6

[Figure: TCP sender and receiver event model. The TCP user issues SEND and RECEIVE calls to the TCP/IP implementation, which is driven by timer events, packet arrivals, and packet transmissions through the drivers, and signals COMPLETE back to the user.]

SLIDE 7

Some TCP Challenges

  • Segment byte stream into individual packets

– How big should the packets/segments be?

  • What if packets are delivered out of order?

– May take different paths through the network

  • What if a packet is lost?

– Packets may be dropped in the network

  • What if a packet is corrupted in transit?

– Detect error and fix it or resend

  • How fast should the sender send?
SLIDE 8

Mechanism: Checksums

  • Checksum C = F(contents)
  • Checksum C is small, fixed-size (in essence, a hash)
  • Generate at sender and place in segment
  • Verify at receiver
  • If checksum matches, packet is not corrupt
  • Probably…
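As a concrete illustration, here is a minimal Python sketch of the 16-bit ones'-complement checksum in the style that IP, UDP, and TCP use (RFC 1071). A real TCP checksum also covers a pseudo-header, which is omitted here:

```python
def inet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum (RFC 1071 style).
    Real TCP/UDP checksums also cover a pseudo-header, omitted here."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data
    total = sum((data[i] << 8) | data[i + 1]  # sum 16-bit big-endian words
                for i in range(0, len(data), 2))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                    # ones'-complement of the sum

# Sender computes C and places it in the header; receiver recomputes and compares.
segment = b"hello world"
c = inet_checksum(segment)
assert inet_checksum(segment) == c            # checksum matches: probably not corrupt
assert inet_checksum(b"hellp world") != c     # a single-byte error is detected
```

"Probably" is the operative word on the slide: a 16-bit checksum can miss errors that happen to leave the sum unchanged.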
SLIDE 9

Sequence Number Space

  • Each byte in byte stream is numbered.

– 32-bit value
– Wraps around
– Initial values selected at startup time

  • Each packet/segment has a sequence number and length

– Indicates where it fits in the byte stream

[Figure: packets 8, 9, and 10 covering byte ranges 13450–14950, 14950–16050, and 16050–17550 of the stream.]

SLIDE 10

Sequence Numbers

  • 32 Bits, Unsigned

– Circular Comparison

  • Why So Big?

– Guard against stray packets

  • With IP, packets have maximum lifetime of 120s
  • Sequence number would wrap around in this time at 286MB/s

[Figure: circular comparison on the wrapped 32-bit number line: a < b when the forward distance from a to b is less than half the space, regardless of which value is numerically larger.]
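The circular comparison can be sketched in Python using serial-number arithmetic (the idea later formalized in RFC 1982): a precedes b when the forward distance from a to b is less than half the 32-bit space.

```python
SEQ_MOD = 2 ** 32

def seq_lt(a: int, b: int) -> bool:
    """Circular 'a < b' on 32-bit sequence numbers:
    a precedes b if the forward distance from a to b is
    less than half the sequence space."""
    return a != b and (b - a) % SEQ_MOD < SEQ_MOD // 2

# Near the wrap point, plain unsigned comparison gives the wrong answer:
a, b = SEQ_MOD - 10, 5          # b is 15 bytes "after" a, across the wrap
assert seq_lt(a, b)             # circular compare: a precedes b
assert not seq_lt(b, a)
assert a > b                    # naive comparison disagrees
```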

SLIDE 11

Using the Sequence Numbers

  • Reassembly buffer

– Packets/segments received into (kernel) memory
– Sort them by sequence number
– Deliver segments to application in order!
– Seq(i) + Len(i) < Seq(i+1)?

  • Gap: defer delivery of segment i+1
  • Acknowledgments

– Periodically send back the sequence number of the latest (newest) byte received in order.
– No ack received? Lost segment: retransmit.

  • How long to wait?
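A toy Python sketch of the receive side described above, assuming aligned, non-overlapping segments (real TCP must also handle partial overlap):

```python
class Reassembler:
    """Toy receive-side reassembly: buffer out-of-order segments,
    deliver in order, and track the cumulative ACK (next expected byte).
    Assumes segments are aligned and non-overlapping."""
    def __init__(self, init_seq: int):
        self.next_seq = init_seq      # next byte expected in order
        self.pending = {}             # seq -> data for out-of-order segments
        self.delivered = b""          # bytes handed to the application

    def receive(self, seq: int, data: bytes) -> int:
        if seq >= self.next_seq:      # ignore stale duplicates
            self.pending[seq] = data
        # Deliver any contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq          # cumulative ACK to send back

r = Reassembler(1000)
assert r.receive(1000, b"abc") == 1003   # in order: delivered at once
assert r.receive(1006, b"ghi") == 1003   # gap: buffered, ACK repeats 1003
assert r.receive(1003, b"def") == 1009   # gap filled: both delivered
assert r.delivered == b"abcdefghi"
```

Note how the repeated ACK of 1003 is exactly the duplicate-ack signal the congestion-control slides later exploit.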
SLIDE 12

TCP Header Format

[Figure: TCP header layout: SrcPort, DestPort, SequenceNum, Acknowledgment, HdrLen, Flags, AdvertisedWindow, Checksum, UrgPtr, Options (variable), then data; bit offsets 4, 10, 16, and 31 mark field boundaries.]

  • Without options, the TCP header is 20 bytes

– Thus, a typical Internet packet carries a minimum of 40 bytes of headers

SLIDE 13

Establishing Connection: Three-Way handshake

  • Each side notifies the other of the starting sequence number it will use for sending

– Why not simply choose 0?

  • Must avoid overlap with an earlier incarnation
  • Security issues
  • Each side acknowledges the other’s sequence number

– SYN-ACK: acknowledge sequence number + 1

  • Can combine the second SYN with the first ACK

[Figure: three-way handshake. Client sends SYN with SeqC; server replies SYN with SeqS and ACK SeqC+1; client replies ACK SeqS+1.]

SLIDE 14

TCP State Diagram: Connection Setup

[Figure: TCP connection-setup state diagram (RFC 793) with states CLOSED, LISTEN, SYN SENT, SYN RCVD, ESTABLISHED. Client: active OPEN creates a TCB and sends SYN; on receiving SYN+ACK, it sends ACK and enters ESTABLISHED. Server: passive OPEN creates a TCB and enters LISTEN; on receiving SYN it sends SYN+ACK; on receiving the ACK of its SYN it enters ESTABLISHED. CLOSE deletes the TCB.]

SLIDE 15

Tearing Down Connection

  • Either side can initiate teardown

– Send FIN signal
– “I’m not going to send any more data”

  • Other side can continue sending data

– Half-closed connection
– Must continue to acknowledge

  • Acknowledging FIN

– Acknowledge last sequence number + 1

[Figure: A sends FIN with SeqA; B acks SeqA+1 and keeps sending data, which A acks; later B sends FIN with SeqB and A acks SeqB+1.]

SLIDE 16

Reliable Transmission

  • How do we send a packet reliably when it can be lost?
  • Two mechanisms

– Acks
– Timeouts

  • Simplest reliable protocol: Stop and Wait
SLIDE 17

Stop and Wait

[Figure: stop-and-wait timeline. The sender transmits a packet, starts a timeout, and waits until the ACK arrives before sending the next packet.]

SLIDE 18

Recovering From Error

[Figure: three recovery timelines. ACK lost: the sender times out and retransmits. Packet lost: the sender times out and retransmits. Early timeout: the ACK arrives after the retransmission, producing a duplicate.]

SLIDE 19

Problems with Stop and Wait

  • How to recognize a duplicate transmission?

– Solution: put sequence number in packet

  • Performance

– Unless the latency-bandwidth product is very small, the sender cannot fill the pipe
– Solution: sliding window protocols

SLIDE 20

Keeping the Pipe Full

  • Bandwidth-Delay product measures network capacity
  • How much data can you put into the network before the first byte reaches the receiver?

  • Stop and Wait: 1 data packet per RTT

– Ex. 1.5-Mbps link with 45-ms RTT
– Stop-and-wait: 182 Kbps

  • Ideally, send enough packets to fill the pipe before requiring the first ACK

[Figure: the network as a pipe with capacity = bandwidth × latency.]
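The slide's numbers can be checked directly. This Python sketch assumes 1-KB packets, which is what the 182-Kbps stop-and-wait figure implies:

```python
# Worked numbers from the slide: 1.5-Mbps link, 45-ms RTT, 1-KB packets.
bandwidth = 1.5e6            # bits/second
rtt = 0.045                  # seconds
packet = 1024 * 8            # bits in a 1-KB packet

bdp_bits = bandwidth * rtt   # bandwidth-delay product: data "in the pipe"
stop_and_wait = packet / rtt # stop-and-wait sends one packet per RTT

assert round(bdp_bits) == 67500            # ~67.5 Kbits fit in the pipe
assert round(stop_and_wait / 1e3) == 182   # ~182 Kbps, ~12% of the link
assert round(bdp_bits / packet) == 8       # ~8 packets in flight fill the pipe
```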

SLIDE 21

How Do We Keep the Pipe Full?

  • Send multiple packets without waiting for the first to be ACKed

– How many? Limited by the “window” wnd.
– Flow/congestion policies set wnd.

  • Self-clocking sliding window

– Arrival of an ack opens up another window “slot” to send
– Ideally, the first ACK arrives immediately after the window is filled
– Else pipeline “bubbles” waste bandwidth

  • Throughput = wnd/RTT

[Figure: with wnd = 3, the sender transmits segments 1–3 back to back; each returning ACK releases the next segment, so the window slides forward once per RTT. [Stoica]]
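The wnd/RTT rule can be sketched in a few lines of Python; the link rate caps the result once the window exceeds the bandwidth-delay product:

```python
def throughput(wnd_bytes: float, rtt: float, link_bps: float) -> float:
    """Throughput of a window-limited connection: wnd/RTT, capped at link rate."""
    return min(wnd_bytes * 8 / rtt, link_bps)

# Same 1.5-Mbps / 45-ms link as the previous slide:
assert round(throughput(1024, 0.045, 1.5e6) / 1e3) == 182   # wnd = 1 packet
assert throughput(64 * 1024, 0.045, 1.5e6) == 1.5e6         # big wnd fills the pipe
```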

SLIDE 22

Flow Control

  • Receiver devotes some buffer space to hold incoming bytes until the application consumes them.

– Socket buffers

  • How much? Must place a bound on it.

– Advertise wnd: max number of bytes to accept
– Receiver returns AdvertisedWindow in the TCP header of its acknowledgments back to the sender.

  • Sliding window

– Flow window is the range of bytes the receiver will accept

  • [ack+1, ack + wnd]

– Receiver drops segments/bytes outside the window
– Sender stops transmitting when it fills the window

  • Bytes in transit <= wnd

– Each side advances the window as data is delivered.

SLIDE 23

Window Flow Control: Send Side

[Figure: send-side window. The byte stream divides into sent-and-acked, sent-but-not-acked, and not-yet-sent bytes within the window (next to be sent), followed by bytes not yet written by the TCP user.]

SLIDE 24

Window Flow Control: Send Side

[Figure: the send-side byte stream divides into acknowledged, sent, to-be-sent, and outside-window regions as the application writes. Each packet sent and each packet received carries the full TCP header (Source Port, Dest. Port, Sequence Number, Acknowledgment, HL/Flags, Window, Checksum, Urgent Pointer, Options); the Acknowledgment and Window fields of received packets slide and resize the send window.]

SLIDE 25

Window Flow Control: Receive Side

[Figure: receive-side window. The byte stream divides into acked-but-not-delivered-to-user bytes, not-yet-acked bytes, and the advertised receive-buffer window.]

What should the receiver do with an arriving segment?

– New
– Duplicate
– Out-of-order
– Outside window

SLIDE 26

[Figure: sender and receiver buffers laid out over increasing sequence numbers. Sender: completed sends, then sent-but-unacked data (the retransmission queue, from snd.una to snd.nxt), then not-yet-sent data (posted sends), bounded by snd.bufsize. Receiver: data delivered to the user (up to rcv.user), acknowledged data pending delivery, the reorder buffer up to rcv.nxt, and the lazily allocated not-yet-received window (rcv.wnd, ending at rcv.last), bounded by rcv.bufsize. Data and acks are in flight between the two.]

If you’re interested, all of this basic data transfer is specified in RFC 793.

SLIDE 27

TCP Persist

  • What happens if the window is 0?

– Application has not consumed data fast enough.
– Receiver buffer exhausted: everything grinds to a halt.

  • Must reopen the window when the application reads data

– App reads data: opens up buffer space
– Receiver sends a segment with a “window update”
– What if this update is lost?

  • TCP Persist state (sender idled by a closed window)

– Sender periodically sends 1-byte packets
– Receiver responds with an ACK even if it can’t store the packet
– ACK segment includes the current window

SLIDE 28

Performance

SLIDE 29

Limits to Throughput

  • How fast can the app produce or consume data?
  • Hardware/path limitations

– Wire speed
– Host limitations/overhead

  • Efficient use of the wire

– Leaving the network idle
– Sending duplicate data
– High ratio of control to data
– What causes loss of efficiency?

SLIDE 30

Efficiency: Flow Window Size

  • OS system calls allow the receiving application to set the socket buffer size.

– Defaults are often too small for long/fat networks
– Leaves network bandwidth idle

  • The window size field in the TCP header limits the window that the receiver can advertise.

– 16 bits → 64 KBytes
– 10 msec RTT → 51 Mbit/second
– 100 msec RTT → 5 Mbit/second

  • Solution: TCP options added to get around the 64KB limit

– Window scaling (RFC 1323)
– Shift the advertised window field by a specified number of bits
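The arithmetic behind these limits, sketched in Python (the slide's ~51 Mbit/s reflects slightly different rounding; the scaling shift of 7 below is an arbitrary example value, not mandated by the option):

```python
def max_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    """Window-limited throughput ceiling: window / RTT, in Mbit/s."""
    return window_bytes * 8 / rtt_s / 1e6

assert round(max_throughput_mbps(64 * 1024, 0.010)) == 52   # ~51-52 Mbit/s at 10 ms
assert round(max_throughput_mbps(64 * 1024, 0.100)) == 5    # ~5 Mbit/s at 100 ms

# Window scaling (RFC 1323): the 16-bit advertised field is shifted left
# by a factor negotiated in the SYN options.
shift = 7                              # example shift, negotiated at setup
advertised = 0xFFFF                    # maximum 16-bit field value
effective = advertised << shift        # ~8 MB effective window
assert round(max_throughput_mbps(effective, 0.100)) == 671  # now RTT-limited far higher
```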

SLIDE 31

Efficiency: RTT Estimation

  • Retransmission timer (RTO)

– Underestimate RTT → unnecessary retransmits
– Overestimate RTT → network idles after drops

  • Solution: sender samples RTT by measuring the time between transmitting a segment and receiving its ack.

– TCP now has an option to make this easier

  • Receiver reflects a timestamp placed by the sender
  • But samples will vary: how to get a stable estimate?

– Key technique: exponential smoothing
– Exponentially weighted moving average
– But it’s tricky…

SLIDE 32

TCP RTT Estimator

  • Exponential smoothing:

– Identify persistent behaviors/trends, but do not overreact to transient changes.
– RTT = α (old RTT) + (1 − α) (new sample)
– Recommended value for α: 0.8–0.9

  • 0.875 for most TCPs

Note: it is also tricky to convert the RTT estimate into a good value for the RTO:

  • Track RTT variance.
  • “Loosen” the timer when the RTT variance is high.
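A Python sketch of the estimator with α = 0.875: one outlier barely moves the estimate, while a persistent shift eventually dominates.

```python
def ewma(old: float, sample: float, alpha: float = 0.875) -> float:
    """RTT = alpha * (old RTT) + (1 - alpha) * (new sample)."""
    return alpha * old + (1 - alpha) * sample

rtt = 100.0                       # ms, current smoothed estimate
rtt = ewma(rtt, 200.0)            # one transient 200-ms spike...
assert rtt == 112.5               # ...moves the estimate only a little
for _ in range(20):               # but a persistent shift to 200 ms
    rtt = ewma(rtt, 200.0)        # eventually dominates
assert 190 < rtt < 200
```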

SLIDE 33

Efficiency: Tactical Delays

  • Want to send full-size segments, but what if the app writes just a small amount of data?

– E.g., a user typing at a remote shell.
– Wait for more data? How long?
– Nagle heuristic: allow at most one outstanding unacked short segment.

  • Want to advertise reopenings of the flow window, but what if the app reads just a small amount of data?

– Silly window syndrome (Clark 1982)
– Wait for more reads? How long?
– Heuristic: increase the window by min(MSS, RecvBuffer/2)

  • Want to ack data in a timely fashion, but also take advantage of cumulative acks to reduce ack traffic.

– Heuristic: after receiving an in-order segment, wait up to 500 ms for another one before sending an ack.
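The Nagle decision described in the first bullet can be sketched as a Python predicate (the names and the boolean flag are illustrative, not taken from any real TCP stack):

```python
def nagle_can_send(data_len: int, mss: int, short_segment_unacked: bool) -> bool:
    """Nagle heuristic sketch: send a full-size segment immediately;
    send a short segment only if no short segment is still unacked."""
    if data_len >= mss:
        return True                    # full segment: always worth sending
    return not short_segment_unacked   # at most one short segment in flight

assert nagle_can_send(1460, 1460, True)       # full segment: send
assert nagle_can_send(1, 1460, False)         # first keystroke: send now
assert not nagle_can_send(1, 1460, True)      # next keystrokes: coalesce
```

The effect is self-tuning: on a fast LAN the ack returns quickly and short segments flow freely; on a slow path keystrokes batch up behind the outstanding segment.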

SLIDE 34

Overhead

SLIDE 35

Sources of CPU Overhead

Sender:
– move data from application to system buffer
– TCP/IP protocol: compute checksum
– network driver
– transmit packet to network interface

Receiver:
– deposit packet in host memory
– network driver
– TCP/IP protocol: compare checksum
– move data from system buffer to application

SLIDE 36

Overhead

  • Although TCP/IP family protocol processing itself is reasonably efficient, managing a dumb NIC steals CPU/memory cycles away from the application.

a = application processing per unit of bandwidth
o = host communication overhead per unit of bandwidth
SLIDE 37

The host/network gap

[Figure: application (server) throughput vs. host overhead o. The host saturation throughput curve 1/(a+o) falls below bandwidth (wire speed) as o grows, opening a gap between host and network.]

If overhead is high, then the host CPU will “saturate” below wire speed. What matters: a+o.

SLIDE 38

Outrunning Moore’s Law?

[Figure: network bandwidth per CPU cycle vs. time, for Ethernet and SANs.]

High-speed Small-Area Networks (SANs) and Ethernet are both advancing ahead of Moore’s Law, e.g., roughly one order of magnitude every 4 years.

How much bandwidth do data center applications need? It depends where an application falls between compute-intensive and I/O-intensive. “Amdahl’s other law”: one bit of I/O per MIP per second, etc.

SLIDE 39

Hitting the wall

[Figure: bandwidth per CPU cycle vs. time for Ethernet and SANs, with the host saturation point marked.] Throughput improves as hosts advance, but bandwidth per cycle is constant once the host saturation point is reached.

SLIDE 40

“IP SANs”

  • If you believe in the problem, then the solution is to attach hosts to the faster wires with smarter NICs.

– Hardware checksums, interrupt suppression
– Transport offload (TOE)
– Connection-aware w/ early demultiplexing
– ULP offload (e.g., iSCSI)
– Direct data placement/RDMA (“remote DMA”)

  • Since these NICs take on the key characteristics of SANs, let’s use the generic term “IP-SAN”.

– S stands for Small, Server, Storage, System, …
– Non-IP SANs: Giganet, FibreChannel, Infiniband…

SLIDE 41

How much can IP-SANs help?

  • IP-SAN is a difficult engineering challenge.

– It takes time and money to get it right.

  • LAWS [Shivam&Chase03] is a “back of napkin” analysis to explore potential benefits and limitations.
  • Figure of merit: marginal improvement in peak application throughput (“speedup”)
  • Premise: Internet servers are fully pipelined

– Ignore latency (your mileage may vary)
– IP-SANs can improve throughput if the host saturates.

SLIDE 42

Application ratio (γ)

  • Application ratio (γ) captures “compute-intensity”:

γ = a/o

For a given application, lower overhead increases γ. For a given communication system, γ is a property of the application: it captures processing per unit of bandwidth.

SLIDE 43

γ and Amdahl’s Law

[Figure: throughput increase (%) vs. compute-intensity (γ), bounded above by the 1/γ curve. Network-intensive apps (low γ, e.g., Apache) see high benefit; CPU-intensive apps (high γ, e.g., Apache w/ Perl?) see low benefit.]

Amdahl’s Law bounds the potential improvement to 1/γ when the system is still host-limited after offload. What is γ for “typical” services in the data center?

SLIDE 44

What to Know

  • Overhead often matters for network performance, sometimes more than latency and bandwidth.

– That goes for other I/O as well.
– Overhead matters if, but only if, the CPU saturates.

  • Reducing overhead might improve performance… or not.

– Your mileage may vary: throughput < 1/(a+o)
– Be skeptical of claims.

  • Amdahl’s Law (diminishing returns):

– How much speedup can I get by optimizing one part of the system (e.g., eliminating o)?
– Depends on its share of the total cost.

  • Throughput < 1/a → speedup bounded by o/a
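The bound can be checked with a few lines of Python, following the 1/(a+o) throughput model from the earlier slides:

```python
def max_improvement(a: float, o: float) -> float:
    """LAWS-style back-of-napkin bound: peak throughput ~ 1/(a + o).
    Eliminating all host overhead o caps throughput at 1/a, so the
    marginal improvement is at most o/a = 1/gamma (gamma = a/o)."""
    before = 1 / (a + o)            # host-limited throughput with overhead
    after = 1 / a                   # best case: overhead fully offloaded
    return after / before - 1       # fractional throughput gain

assert max_improvement(a=1.0, o=1.0) == 1.0          # gamma = 1: up to 2x
assert abs(max_improvement(10.0, 1.0) - 0.1) < 1e-9  # CPU-bound: only +10%
```

This is why offload helps network-intensive services far more than compute-intensive ones: the gain is capped by the overhead's share of total per-byte cost.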
SLIDE 45

Congestion

SLIDE 46

Congestion

  • Different sources compete for resources inside network
  • Why is it a problem?

– Sources are unaware of the current state of the resource
– Sources are unaware of each other

  • Manifestations:

– Lost packets (buffer overflow at routers)
– Long delays (queuing in router buffers)
– Can result in throughput less than the bottleneck link, a.k.a. congestion collapse (1.5 Mbps in the example topology, where 10-Mbps and 100-Mbps sources share a 1.5-Mbps link)

SLIDE 47

Causes & Costs of Congestion

  • Four senders – multihop paths
  • Timeout/retransmit

Q: What happens as rate increases?

SLIDE 48

Causes & Costs of Congestion

  • When a packet is dropped, any “upstream” transmission capacity used for that packet was wasted!

SLIDE 49

Congestion Collapse

  • Definition: increase in network load results in a decrease of useful work done
  • Many possible causes

– Spurious retransmissions of packets still in flight

  • Classical congestion collapse
  • Solution: better timers and TCP congestion control

– Undelivered packets

  • Packets consume resources and are dropped elsewhere in the network
  • Solution: congestion control for ALL traffic
SLIDE 50

Congestion Control and Avoidance

  • A mechanism which:

– Uses network resources efficiently
– Preserves fair network resource allocation
– Prevents or avoids collapse

  • Congestion collapse is not just a theory

– Has been frequently observed in many networks

SLIDE 51

What’s Really Happening?

  • Knee: point after which

– Throughput increases very slowly
– Delay increases fast

  • Cliff: point after which

– Throughput starts to decrease very fast, to zero (congestion collapse)
– Delay approaches infinity

  • Note (in an M/M/1 queue)

– Delay = 1/(1 – utilization)

[Figure: throughput and delay vs. load. Throughput flattens at the knee and collapses past the cliff as packet loss sets in; delay blows up near the cliff. [Stoica]]

SLIDE 52

Congestion Control vs. Congestion Avoidance

  • Congestion control goal

– Stay left of cliff

  • Congestion avoidance goal

– Stay left of knee

[Figure: throughput vs. load with the knee, the cliff, and congestion collapse marked.]

SLIDE 53

TCP Congestion Control

  • Sender maintains congestion window cwnd

– Max number of packets in flight is limited by both the flow window and the congestion window
– MIN(wnd, cwnd)

  • Sender probes the network by sending faster and faster, until it encounters congestion.

– Grow the cwnd as acks come back

  • Congestion? Throttle back.
  • Driven by senders: distributed, fair and efficient…

– If we can get the policies right…
– And if everybody plays nice.

SLIDE 54

AIMD and the “TCP Sawtooth”

  • Additive increase, multiplicative decrease
  • TCP periodically probes for available bandwidth by increasing its window (rate) additively, by one segment per window (per RTT).
  • Congestion?

– Cut the window in half → cut the sending rate in half
– Multiplicative rate decrease

  • AIMD turns out to be stable, and it can be shown that most alternatives are either too indolent or too aggressive, e.g., they waste bandwidth or can still drive the network into congestion collapse.

[Figure: the AIMD sawtooth: rate climbs linearly over time, halves on congestion, and repeats.]
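A Python sketch of the sawtooth, with congestion crudely modeled as a fixed network capacity of 20 segments (an arbitrary illustrative number):

```python
def aimd(rtts: int, capacity: int = 20):
    """Window trace, one entry per RTT: additive increase of one segment
    per RTT; the window is halved whenever it exceeds 'capacity'
    (a crude stand-in for a congestion loss)."""
    wnd, trace = 1, []
    for _ in range(rtts):
        if wnd > capacity:
            wnd = max(wnd // 2, 1)      # multiplicative decrease
        else:
            wnd += 1                    # additive increase (per RTT)
        trace.append(wnd)
    return trace

trace = aimd(60)
assert max(trace) <= 21                 # probes just past capacity, then halves
assert trace.count(10) >= 3             # the sawtooth repeats
```

Plotting the trace would reproduce the rate-vs-time sawtooth on the slide.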

SLIDE 55

Detecting Congestion

  • Very simple mechanisms in IP routers

– Congestion → loss
– Many proposals for routers to mark packets to warn of congestion (ECN, XCP), but not yet widely deployed.

  • So: TCP interprets packet drops as a congestion indicator.

– Loss → congestion
– Not necessarily true… but a good heuristic.
– Loss can also result from transient errors, e.g., on wireless links

  • Duplicate acknowledgments are also a good early indication that a packet was dropped…

– …or maybe the packets were just reordered.
– Heuristic: “one duplicate ack could be a reorder, but if it happens again then it must be congestion”.

  • Triple dup ack
SLIDE 56

Detecting Congestion: Summary of Approaches

  • End-to-end congestion control:

– No explicit feedback from the network
– Congestion inferred from end-system observed loss and delay
– Approach taken by TCP

  • Network-assisted congestion control:

– Routers provide feedback to end systems

  • Single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  • Explicit send rate
  • Problem: makes routers complicated

SLIDE 57

Under the Hood (RFC 2581)

  • Ack each segment with the highest seqnum received.
  • Acks drive actions at the sender: self-clocking
  • Below ssthresh, cwnd = cwnd + 1 for each acked segment.

– “Slow start” is really fast, e.g., doubles cwnd per RTT.

  • Above ssthresh, cwnd = cwnd + (1/cwnd) on each acked segment.

– Additive increase, e.g., grows cwnd by one per RTT.
– “Congestion avoidance”

  • Loss? Congestion!

– Multiplicative decrease: ssthresh = cwnd/2; cwnd = 1
– “Congestion control”: back to “slow start”

  • Triple dup ack? Loss! “Fast retransmit”. (Tahoe)
  • After a fast-retransmit loss: cwnd = ssthresh; every ack adds 1/cwnd to the window, even dups.

– Multiplicative decrease with “fast recovery” (Reno)

These details are not to be tested for CPS 196.
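The window rules above condense into a short Python sketch (per-segment acks, no delayed acks, Tahoe-style loss response; the class and its initial ssthresh are illustrative):

```python
class CongestionControl:
    """Sketch of the RFC 2581 window rules: slow start below ssthresh,
    additive increase above it, multiplicative decrease on loss (Tahoe)."""
    def __init__(self):
        self.cwnd = 1.0          # congestion window, in segments
        self.ssthresh = 64.0     # illustrative initial threshold

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow start: +1 per acked segment
        else:                              #   (doubles cwnd per RTT)
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance: ~+1 per RTT

    def on_loss(self):                     # timeout or triple-dup-ack (Tahoe)
        self.ssthresh = self.cwnd / 2      # multiplicative decrease
        self.cwnd = 1.0                    # back to slow start

cc = CongestionControl()
for _ in range(8):
    cc.on_ack()
assert cc.cwnd == 9.0                      # slow start grows fast
cc.on_loss()
assert (cc.cwnd, cc.ssthresh) == (1.0, 4.5)
```

Reno's fast recovery would set cwnd = ssthresh instead of 1 after a triple dup ack, skipping the slow-start phase.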

SLIDE 58

The big picture: Basic AIMD

[Figure: cwnd vs. time: “slow start”, then congestion avoidance; a timeout restarts slow start.]

istoica@cs.berkeley.edu

SLIDE 59

Fast Retransmit and Fast Recovery

  • Retransmit after 3 duplicate acks

– Prevents expensive timeouts

  • No need to slow start again
  • At steady state, cwnd oscillates around the optimal window size.

[Figure: cwnd vs. time: slow start, then a sustained congestion-avoidance sawtooth.]

SLIDE 60

TCP Saw Tooth Behavior

[Figure: congestion window vs. time: an initial slow start, then fast retransmit and recovery; after a timeout, slow start paces packets again. Timeouts may still occur.]
SLIDE 61

Sending Too Fast

  • Overflow at receiver? Receiver drops packet.
  • Overflow network link? NIC drops packet.
  • Overflow router? Router drops packet.
  • Faster than fair share?

– Pro: you win
– Con: somebody else loses
– TCP is a game

SLIDE 62

Max-Min Fairness Criteria

  • “Fair sharing”
  • But flows have differing demands…
  • Flows demanding less than their share get as much as they need.
  • Flows demanding more than their share split the surplus.

  • Generalizes to proportional sharing
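The max-min criterion has a simple iterative algorithm, sketched here in Python: repeatedly grant small demands in full and split the surplus among the rest.

```python
def max_min_share(capacity: float, demands):
    """Max-min fair allocation: flows demanding less than their equal
    share keep their demand; the remaining capacity is split evenly
    among the rest, repeatedly."""
    alloc = [0.0] * len(demands)
    unsat = list(range(len(demands)))
    while unsat and capacity > 1e-12:
        share = capacity / len(unsat)
        satisfied = [i for i in unsat if demands[i] <= share]
        if not satisfied:                    # everyone is capped at the share
            for i in unsat:
                alloc[i] = share
            return alloc
        for i in satisfied:                  # small demands are fully met
            alloc[i] = demands[i]
            capacity -= demands[i]
            unsat.remove(i)
    return alloc

# 10 units of capacity, demands 2, 2.6, 4, 5: the two big flows split
# what is left after the small demands are satisfied.
assert [round(x, 6) for x in max_min_share(10, [2, 2.6, 4, 5])] == [2, 2.6, 2.7, 2.7]
```

Weighting the shares instead of splitting them evenly gives the proportional-sharing generalization mentioned above.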
SLIDE 63

Trust and Rate Control

  • Is gaming TCP a security problem?
  • How should the network deal with this?
  • Whose responsibility is it?
  • What incentive does anyone have to play the game by the rules?

– Good Samaritan?
– Rodney King: “Can’t we all just get along?”
– Judge Judy?
– Adam Smith?

SLIDE 64

Savage TCP (Daytona)

  • Attack: “ack early, ack often”.

– Three variations on a theme.
– Acking early hides congestion loss.
– “Big ack attack”

  • Defense:

– Don’t make hidden assumptions about a peer’s good behavior.

  • One ack per segment? Uh uh.

– Remove incentives to cheat.
– Trust but verify.

  • Nonces and cumulative nonces.

Summary: a malicious TCP receiver can fool an honest sender into sending faster than the network allows, consuming an unfair share of network bandwidth.

Stefan Savage, Neal Cardwell, David Wetherall, and Tom Anderson. “TCP Congestion Control with a Misbehaving Receiver.” ACM Computer Communication Review 29(5):71–78, October 1999.

SLIDE 65

The Role of Routers?

  • Flow: a stream of related packets or demands, e.g., between a given source and destination endpoint.
  • TCP-friendly flow: arrival rate does not exceed that of a compliant TCP with the same RTT and drop rate.
  • Unresponsive flow: not TCP-friendly.

– Does not throttle back (enough) when packets are dropped.
– Can a router identify unresponsive flows? How much time/state does the router need? (hard problem)

  • Principle: congested routers should preferentially drop packets of unresponsive flows, to punish the guilty and create an incentive for cooperation.

Sally Floyd and Kevin Fall. “Promoting the Use of End-to-End Congestion Control in the Internet.” IEEE/ACM Transactions on Networking, August 1999.

SLIDE 66

TCP Link Sharing Behavior

  • Even if all flows are faithful to the Congestion Game, there are other behaviors/anomalies to consider.

– Is it fair? What if some flows through a congested link have higher RTTs than others?
– What if sender and receiver use multiple TCP streams to communicate? (Browsers, gridFTP)
– What if congestion patterns are dynamic at time scales smaller than an RTT?
– How effectively can one TCP flow use the bandwidth? What if the network is long/fat?
– Is more buffering at the routers always better?
– Does it work for small/bursty connections?

SLIDE 67

Suitability of TCP

  • Rate control: required to be a good citizen?

– “Stick with TCP” or “stuck with TCP”?

  • Is in-order delivery always good?
  • Loss tolerance vs. jitter?

– Sometimes jitter is worse than loss.

  • Alternative transports in IETF: SCTP, DCCP
  • Security? TLS/SSL (coming soon…)
  • Does content distribution need different protocols?

– Multicast
– Different sending/receiving rates
– Forward Error Correction

SLIDE 68

Extra slides: may be of interest, but will not be tested.

SLIDE 69

TCP Timeline

1974: TCP described by Vint Cerf and Bob Kahn in IEEE Transactions on Communications
1975: Three-way handshake (Raymond Tomlinson, SIGCOMM 75)
1982: TCP & IP (RFC 793 & 791)
1983: BSD Unix 4.2 supports TCP/IP
1984: Nagle’s algorithm to reduce the overhead of small packets; predicts congestion collapse
1986: Congestion collapse observed
1987: Karn’s algorithm to better estimate round-trip time
1988: Van Jacobson’s algorithms for congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
1990: 4.3BSD Reno: fast retransmit, delayed ACKs

SLIDE 70

TCP: After 1990

1993: TCP Vegas (Brakmo et al.): real congestion avoidance
1994: ECN (Floyd): Explicit Congestion Notification
1994: T/TCP (Braden): Transaction TCP
1996: SACK TCP (Floyd et al.): Selective Acknowledgment
1996: Hoe: improving TCP startup
1996: FACK TCP (Mathis et al.): extension to SACK

Interaction of real-time protocols with TCP? XCP…

SLIDE 71

TCP Flavors

  • TCP Tahoe

– Jacobson’s implementation of congestion control
– No fast recovery

  • TCP Reno

– Fast recovery
– Delayed ACKs

  • TCP Vegas

– Source-based congestion avoidance rather than control
– TCP Reno needs to cause congestion to determine the available bandwidth

SLIDE 72

[Figure: TCP implementation data path. Sender: the TCP user posts SEND; data moves from user transmit buffers through optional TCP transmit buffers and the transmit queue, is checksummed by the TCP/IP protocol, and leaves as outbound segments; SEND COMPLETE is signaled back. Receiver: inbound segments arrive from the network path, are checksum-verified, queued on the receive queue, and copied from optional TCP receive buffers into user receive buffers; RECEIVE COMPLETE is signaled. The TCB tracks window, data, and ack flow in both directions.]

SLIDE 73

TCP Connection Teardown

  • Each side of a TCP connection can independently close the connection

– Thus, it is possible to have a half-closed connection, with data flowing in only one direction

  • Closing process sends a FIN message

– Waits for the ACK of the FIN to come back
– This side of the connection is now closed

SLIDE 74

Jacobson’s Retransmission Timeout

  • Where to set RTO? Originally 2*RTT.

– Timeout? Cut RTO in half: lots of spurious timeouts

  • At high loads, RTT variance is high
  • Solution:

– Base RTO on RTT and its standard deviation

  • RTO = RTT + 4 * rttvar

– new_rttvar = β * dev + (1 − β) * old_rttvar

  • dev = smoothed linear deviation
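A Python sketch of the estimator, using the standard gains (1/8 for the smoothed RTT, 1/4 for the deviation, as later codified in RFC 6298); the initial values below are illustrative:

```python
def update_rtt(srtt: float, rttvar: float, sample: float):
    """One Jacobson-style update: smooth the deviation and the RTT,
    then set RTO = srtt + 4 * rttvar. Returns (srtt, rttvar, rto)."""
    rttvar = 0.75 * rttvar + 0.25 * abs(sample - srtt)   # smoothed deviation
    srtt = 0.875 * srtt + 0.125 * sample                 # smoothed RTT (EWMA)
    return srtt, rttvar, srtt + 4 * rttvar

srtt, rttvar = 100.0, 10.0                # illustrative starting state (ms)
srtt, rttvar, rto = update_rtt(srtt, rttvar, sample=100.0)
assert rto == 130.0                       # steady RTT: variance decays, RTO tightens
srtt, rttvar, rto = update_rtt(srtt, rttvar, sample=300.0)
assert rto > 300.0                        # jittery RTT: RTO loosens well past srtt
```

The 4*rttvar term is what "loosens" the timer under high variance, avoiding the spurious timeouts of the fixed 2*RTT rule.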
SLIDE 75

TCP ACK Generation [RFC 1122, RFC 2581]

Event: in-order segment arrival, no gaps, everything else already ACKed
Receiver action: delayed ACK; wait up to 500 ms for the next segment, and if none arrives, send the ACK

Event: in-order segment arrival, no gaps, one delayed ACK pending
Receiver action: immediately send a single cumulative ACK

Event: out-of-order segment arrival with a higher-than-expected seq. # (gap detected)
Receiver action: send a duplicate ACK indicating the seq. # of the next expected byte

Event: arrival of a segment that partially or completely fills a gap
Receiver action: immediate ACK