Improving the FreeBSD TCP Implementation
An update on all things TCP in FreeBSD and how they affect you
Lawrence Stewart
lastewart@swin.edu.au Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology
BSDCan 2009 http://www.caia.swin.edu.au lastewart@swin.edu.au 2
Research assistant/engineer during/after studies http://caia.swin.edu.au/
Main focus on transport protocols http://caia.swin.edu.au/cv/lstewart/
Experimental research, software development, home networking, servers and personal desktops
Slow start, congestion avoidance, fast retransmit, fast recovery
SACK, ABC, ECN, window scaling, timestamps, etc.
1 See RFC 2001
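The four mechanisms listed above can be illustrated with a minimal sketch of how a Reno-style sender adjusts its congestion window. This is a simplified userspace model and not FreeBSD's code: the `cc_state` struct and all function names are hypothetical, windows are counted in bytes, and details such as partial ACK handling are left out.

```c
#include <stdint.h>

/* Hypothetical per-connection state, for illustration only. */
struct cc_state {
    uint32_t cwnd;      /* congestion window, bytes */
    uint32_t ssthresh;  /* slow start threshold, bytes */
    uint32_t smss;      /* sender maximum segment size, bytes */
};

/* Halve the amount of data in flight, but never go below two segments. */
static uint32_t half_flight(const struct cc_state *s, uint32_t flight_size)
{
    uint32_t half = flight_size / 2;
    return half > 2 * s->smss ? half : 2 * s->smss;
}

/* Called for each ACK that acknowledges new data outside of recovery. */
static void on_new_ack(struct cc_state *s)
{
    if (s->cwnd < s->ssthresh) {
        /* Slow start: one segment per ACK, roughly doubling cwnd per RTT. */
        s->cwnd += s->smss;
    } else {
        /* Congestion avoidance: about one segment per RTT. */
        uint32_t incr = (uint32_t)(((uint64_t)s->smss * s->smss) / s->cwnd);
        s->cwnd += incr > 0 ? incr : 1;
    }
}

/* Third duplicate ACK: fast retransmit the lost segment, enter fast recovery. */
static void on_third_dupack(struct cc_state *s, uint32_t flight_size)
{
    s->ssthresh = half_flight(s, flight_size);
    s->cwnd = s->ssthresh + 3 * s->smss;  /* inflate for the three dupacks */
}

/* The ACK that ends fast recovery deflates the window again. */
static void on_recovery_exit(struct cc_state *s)
{
    s->cwnd = s->ssthresh;
}

/* Retransmission timeout: fall back to slow start from one segment. */
static void on_timeout(struct cc_state *s, uint32_t flight_size)
{
    s->ssthresh = half_flight(s, flight_size);
    s->cwnd = s->smss;
}
```

Feeding this model a stream of ACKs with an occasional loss gives the familiar sawtooth: exponential growth up to `ssthresh`, linear growth afterwards, and a halving on each fast retransmit.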
[Figure: flow 1 cwnd (pkts) versus time (secs) for vanilla FreeBSD 7.0 at 80 ms RTT, 10 Mbps, with the slow start, fast retransmit/fast recovery and congestion avoidance phases annotated]
NewReno is the de facto standard, with warts (LFN, wireless)
Many new proposals
BSD still uses NewReno
Linux uses CUBIC
Windows Vista uses CTCP
CSO/TSO/LRO/TOE
Various locking/caching tricks
Socket buffer autotuning
FAST, HS-TCP, H-TCP, CTCP, CUBIC, etc.
2 Nice summary: http://kb.pert.geant2.net/PERTKB/TcpHighSpeedVariants
TCP CC research
Standardisation process
Select most appropriate CC algorithm for the task
/* specify one of these structs per CC algorithm */
struct cc_algo {
        char name[TCP_CA_NAME_MAX];
        int (*init)(struct tcpcb *tp);
        void (*deinit)(struct tcpcb *tp);
        void (*cwnd_init)(struct tcpcb *tp);
        void (*ack_received)(struct tcpcb *tp, struct tcphdr *th);
        void (*pre_fr)(struct tcpcb *tp, struct tcphdr *th);
        void (*post_fr)(struct tcpcb *tp, struct tcphdr *th);
        void (*after_idle)(struct tcpcb *tp);
        void (*after_timeout)(struct tcpcb *tp);
        STAILQ_ENTRY(cc_algo) entries;
};
/* called during TCP/IP stack initialisation on boot */
void cc_init(void);

/* dynamically registers a new CC algorithm */
int cc_register_algorithm(struct cc_algo *);

/* dynamically deregisters a CC algorithm */
int cc_deregister_algorithm(struct cc_algo *);
struct tcpcb {
        ....
        /* CC function pointers to use for this connection */
        struct cc_algo *cc_algo;
        /* connection specific CC algorithm data */
        void *cc_data;
};
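To show how the hook table, the registration calls and the per-connection pointers might fit together, here is a minimal userspace mock. It is a sketch under assumptions, not the kernel code: `struct tcpcb` is reduced to a few fields, the registry is a plain singly linked list standing in for the STAILQ, the value of `TCP_CA_NAME_MAX` is assumed, only two hooks are modelled (and `ack_received` drops its `struct tcphdr *` argument), the error return of -1 is arbitrary, and the `toy` algorithm and `cc_lookup()` helper are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TCP_CA_NAME_MAX 16          /* assumed value for this sketch */

struct tcpcb;

/* Reduced hook table: two of the hooks from the slides. */
struct cc_algo {
    char name[TCP_CA_NAME_MAX];
    int  (*init)(struct tcpcb *tp);
    void (*ack_received)(struct tcpcb *tp);
    struct cc_algo *next;           /* stand-in for STAILQ_ENTRY */
};

/* Reduced stand-in for the TCP control block. */
struct tcpcb {
    unsigned long snd_cwnd;         /* congestion window, bytes */
    unsigned int t_maxseg;          /* segment size, bytes */
    struct cc_algo *cc_algo;        /* hooks used by this connection */
    void *cc_data;                  /* algorithm-private state */
};

static struct cc_algo *cc_list;     /* registered algorithms */

/* Hypothetical helper: find a registered algorithm by name. */
static struct cc_algo *cc_lookup(const char *name)
{
    for (struct cc_algo *a = cc_list; a != NULL; a = a->next)
        if (strncmp(a->name, name, TCP_CA_NAME_MAX) == 0)
            return a;
    return NULL;
}

/* Mock registration: reject duplicate names, otherwise add to the list. */
int cc_register_algorithm(struct cc_algo *add)
{
    if (cc_lookup(add->name) != NULL)
        return -1;
    add->next = cc_list;
    cc_list = add;
    return 0;
}

/* Mock deregistration: unlink the algorithm if it is on the list. */
int cc_deregister_algorithm(struct cc_algo *rem)
{
    for (struct cc_algo **pp = &cc_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == rem) {
            *pp = rem->next;
            return 0;
        }
    }
    return -1;
}

/* Toy algorithm: no private state, one segment of growth per ACK. */
static int toy_init(struct tcpcb *tp) { tp->cc_data = NULL; return 0; }
static void toy_ack_received(struct tcpcb *tp) { tp->snd_cwnd += tp->t_maxseg; }

static struct cc_algo toy_cc = {
    .name = "toy",
    .init = toy_init,
    .ack_received = toy_ack_received,
};
```

A connection would point `tp->cc_algo` at the result of `cc_lookup("toy")`, call `init` once, and then call `tp->cc_algo->ack_received(tp)` wherever the stack previously hard-coded the NewReno window update.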
available: comma-separated list of available CC algorithms
algorithm: current system default CC algorithm
Override the system default CC algorithm per socket using setsockopt(2)
Uses the same define as Linux, so e.g. the Iperf -Z option works
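A per-socket override might look like the sketch below. It assumes the socket option is `TCP_CONGESTION` at level `IPPROTO_TCP`, which is the Linux name and is also defined in `<netinet/tcp.h>` on current FreeBSD; the wrapper function names are hypothetical, and which algorithm names are accepted depends on what the running kernel has available.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Ask the kernel to use the named CC algorithm for one TCP socket.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(2).
 */
static int set_cc_algo(int fd, const char *name)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t)strlen(name));
}

/*
 * Read back the name of the CC algorithm a socket is using.
 * The buffer is zeroed first and one byte is held back, so the
 * result is always NUL-terminated. Returns 0 on success, -1 on failure.
 */
static int get_cc_algo(int fd, char *buf, socklen_t buflen)
{
    socklen_t len;

    if (buflen == 0)
        return -1;
    memset(buf, 0, buflen);
    len = buflen - 1;
    return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```

An application would call something like `set_cc_algo(fd, "newreno")` after `socket(2)` and before `connect(2)` or `listen(2)`, then check the return value, since a name the kernel does not know is rejected.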
591 line C file
~280 lines of actual source code, of which:
~100 lines is housekeeping/support code
~180 lines is core H-TCP code

412 line C file, 200 line header file
~300 lines of actual source code, of which:
~145 lines is housekeeping/support code
~155 lines is core CUBIC code

3 Available from: http://caia.swin.edu.au/urp/newtcp/tools.html
[Figure: testbed topology showing Host A, Host B, a Router, Host C, Host D and an Endace DAG 3.7GF capture card, with two drop-tail queues and two RTT/2 delays]
[Figure: flow 1 cwnd (pkts) and queue occupancy (Kbytes) versus time (secs)]
4 Available from: http://caia.swin.edu.au/urp/newtcp/tools.html
Implement TCP Appropriate Byte Counting
Implement TCP reassembly queue autotuning
Integrate SIFTR into FreeBSD
Characterise changes on our TCP testbed
5 Available from: http://caia.swin.edu.au/urp/newtcp/tools.html
6 See README in SIFTR distribution for specific details
[Figure: an application in user space sits above the socket API. In kernel space, packets travel from L2 in through ip_input() and tcp_input(), and out through tcp_output() and ip_output() to L2 out. The TCP code queries/updates a set of TCP control blocks, each holding per-connection state (e.g. src_port: 80, dst_port: 54677, cwnd: 4380, rtt: 100, ...)]
[Figure: SIFTR packet processing flowchart. A packet (e.g. src_ip: 1.1.1.1, src_port: 1, dst_ip: 2.2.2.2, dst_port: 2, ...) is matched to its TCP control block (src_port: 1, dst_port: 2, cwnd: 4380, rtt: 100, ...). Network thread(s): packet enters; TCP packet? (true/false); lookup pkt_node; copy stats; enqueue pkt_node; packet exits. pkt_manager thread: dequeue all pkt_nodes; get flow's counter; counter++; counter = (counter % ppl); counter == 0? (true/false); generate & write log message; del pkt_node; more pkt_nodes to process? (yes/no). The legend marks steps with possible lock contention.]
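The per-flow counter step in the flowchart above can be sketched as follows. This is one reading of the diagram rather than SIFTR's actual source: the `flow_state` struct and `should_log()` function are hypothetical, `ppl` is taken to mean packets per log message, and the order "increment, take modulo, compare with zero" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-flow record; the real module tracks more than this. */
struct flow_state {
    uint32_t counter;   /* packets seen since the last log message */
};

/*
 * Advance the flow's packet counter and report whether the current
 * packet should produce a log message. With ppl == 1 every packet
 * is logged; with ppl == n, every n-th packet of the flow is logged.
 */
static bool should_log(struct flow_state *f, uint32_t ppl)
{
    if (ppl == 0)
        ppl = 1;                        /* avoid division by zero */
    f->counter = (f->counter + 1) % ppl;
    return f->counter == 0;
}
```

Keeping one counter per flow, rather than one global counter, means a busy flow cannot starve a quiet one of log entries: each flow is sampled at the same rate relative to its own packets.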
[Figure: cwnd (pkts) versus time (secs), comparing noabc and abc; 100 ms RTT, 10 Mbps, 62500 byte queue]
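For readers unfamiliar with the abc/noabc distinction, the following is a minimal sketch of Appropriate Byte Counting as described in RFC 3465, where the window grows according to the number of bytes acknowledged rather than the number of ACKs received. It is a userspace model, not the FreeBSD implementation; the `abc_state` struct and function name are hypothetical, and the limit L is fixed at 2 segments.

```c
#include <stdint.h>

#define ABC_L 2     /* slow start limit, in segments per ACK */

/* Hypothetical connection state for this sketch; all sizes in bytes. */
struct abc_state {
    uint32_t cwnd;          /* congestion window */
    uint32_t ssthresh;      /* slow start threshold */
    uint32_t smss;          /* sender maximum segment size */
    uint32_t bytes_acked;   /* bytes acked since the last cwnd increase */
};

/* Process an ACK covering 'acked' bytes of previously unacknowledged data. */
static void abc_ack_received(struct abc_state *s, uint32_t acked)
{
    if (s->cwnd < s->ssthresh) {
        /* Slow start: grow by the bytes acked, capped at L segments. */
        uint32_t limit = ABC_L * s->smss;
        s->cwnd += acked < limit ? acked : limit;
    } else {
        /* Congestion avoidance: one segment per cwnd worth of acked bytes. */
        s->bytes_acked += acked;
        if (s->bytes_acked >= s->cwnd) {
            s->bytes_acked -= s->cwnd;
            s->cwnd += s->smss;
        }
    }
}
```

With delayed ACKs, one ACK typically covers two segments, so a per-ACK increase grows the window at about half the intended rate during slow start. Counting bytes compensates for this.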
RTT estimator
Share CC between TCP/SCTP (Randall et al.)
Comprehensive RFC compliance check
Fix slow-start, FR/FR
Framework for dealing with CSO/TSO/LRO/TOE
DTrace-esque instrumentation
Testing framework <- next big project I want to tackle