Making Linux TCP Fast, by Yuchung Cheng and Neal Cardwell (netdev 1.2)



slide-1
SLIDE 1


Making Linux TCP Fast

Yuchung Cheng Neal Cardwell

netdev 1.2 Tokyo, October, 2016

slide-2
SLIDE 2

Once upon a time, there was a TCP ACK...

Here is the story of what happened next...

slide-3
SLIDE 3

RACK: detect losses by packets’ send time

Monitors the delivery process of every (re)transmission.

E.g.:

  • Sent packets P1 and P2
  • Receives a SACK of P2 => P1 is lost if sent more than $RTT + $reo_wnd ago [1]

Reduces timeouts in Disorder state by 80% on Google.com

[1] RACK: draft-ietf-tcpm-rack-00, in Linux since 4.4
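The time-based rule on this slide can be sketched in a few lines. This is only an illustration of the slide's example, not the kernel code: the `SentPacket` type and `rack_mark_lost` function are hypothetical names, and the check that a packet was sent before the most recently delivered one is an assumption drawn from the RACK draft rather than from the slide.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    name: str
    send_time: float      # seconds, sender clock
    sacked: bool = False  # True once a SACK covers this packet


def rack_mark_lost(packets, now, rtt, reo_wnd):
    """Return names of un-SACKed packets that the slide's rule deems lost."""
    sacked_times = [p.send_time for p in packets if p.sacked]
    if not sacked_times:
        return []  # no delivery evidence yet, so nothing can be inferred
    newest_delivered = max(sacked_times)
    lost = []
    for p in packets:
        if p.sacked:
            continue
        # Assumption: only packets sent before a delivered packet are candidates.
        sent_before_delivered = p.send_time < newest_delivered
        # Slide's rule: lost if sent more than RTT + reo_wnd ago.
        old_enough = (now - p.send_time) > (rtt + reo_wnd)
        if sent_before_delivered and old_enough:
            lost.append(p.name)
    return lost


packets = [SentPacket("P1", 0.000), SentPacket("P2", 0.001, sacked=True)]
print(rack_mark_lost(packets, now=0.120, rtt=0.100, reo_wnd=0.010))
```

With RTT = 100 ms and reo_wnd = 10 ms, P1 is marked lost once it has been outstanding for more than 110 ms; before that, the reordering window gives it a chance to be SACKed.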

slide-4
SLIDE 4

congestion control: how fast to send?

4

slide-5
SLIDE 5


Congestion and bottlenecks

slide-6
SLIDE 6

Congestion and bottlenecks

[Figure: delivery rate vs. amount in flight, with BDP and BDP + BufSize marked on the amount-in-flight axis]

slide-7
SLIDE 7

[Figure: delivery rate and RTT vs. amount in flight, with BDP and BDP + BufSize marked on the amount-in-flight axis]

slide-8
SLIDE 8

[Figure: delivery rate and RTT vs. amount in flight, with BDP and BDP + BufSize marked on the amount-in-flight axis; operating point labeled "CUBIC / Reno"]

slide-9
SLIDE 9

Optimal: max BW and min RTT (Gail & Kleinrock, 1981)

[Figure: delivery rate and RTT vs. amount in flight, with BDP and BDP + BufSize marked on the amount-in-flight axis]

slide-10
SLIDE 10

Estimating optimal point (max BW, min RTT)

BDP = (max BW) * (min RTT)

  • Est min RTT = windowed min of RTT samples
  • Est max BW = windowed max of BW samples

[Figure: delivery rate and RTT vs. amount in flight, with BDP and BDP + BufSize marked on the amount-in-flight axis]
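A minimal sketch of the two windowed estimators and the resulting BDP follows. The `WindowedFilter` class is a hypothetical helper, the sample values are made up, and the 1-second bandwidth window is an illustrative assumption; only the 10-second min RTT window appears in these slides.

```python
from collections import deque


class WindowedFilter:
    """Tracks the best (min or max) sample seen within a sliding time window."""

    def __init__(self, window_s, better):
        self.window_s = window_s
        self.better = better    # the builtin min or max
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def update(self, now, value):
        self.samples.append((now, value))
        # Drop samples that have aged out of the window.
        while now - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        return self.best()

    def best(self):
        return self.better(v for _, v in self.samples)


min_rtt = WindowedFilter(window_s=10.0, better=min)  # seconds
max_bw = WindowedFilter(window_s=1.0, better=max)    # bytes per second

# (time, RTT sample, delivery-rate sample): made-up values for illustration.
for t, rtt, bw in [(0.0, 0.105, 11.0e6), (0.1, 0.100, 12.5e6), (0.2, 0.130, 12.0e6)]:
    min_rtt.update(t, rtt)
    max_bw.update(t, bw)

bdp_bytes = max_bw.best() * min_rtt.best()  # BDP = (max BW) * (min RTT)
print(f"estimated BDP: {bdp_bytes / 1e6:.2f} MB")
```

Here the estimates settle at 12.5 MB/s (100 Mbit/s) and 100 ms, giving a BDP of about 1.25 MB. The third sample has a higher RTT and lower rate, so it changes neither estimate.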

slide-11
SLIDE 11

[Figure: delivery rate and RTT vs. amount in flight, with BDP and BDP + BufSize marked on the amount-in-flight axis; region labels "Only min RTT is visible" and "Only max BW is visible"]

But to see both max BW and min RTT, must probe on both sides of BDP...

slide-12
SLIDE 12

One way to stay near (max BW, min RTT) point:

  • Model network, update max BW and min RTT estimates on each ACK
  • Control sending based on the model, to...
    ○ Probe both max BW and min RTT, to feed the model samples
    ○ Pace near estimated BW, to reduce queues and loss
    ○ Vary pacing rate to keep inflight near BDP (for full pipe but small queue)
  • That's BBR congestion control (code in Linux v4.9; paper in ACM Queue, Oct 2016)
    ○ BBR = Bottleneck Bandwidth and Round-trip propagation time
    ○ BBR seeks high tput with small queue by probing BW and RTT sequentially
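The "control sending based on the model" step might look roughly like this. The function name `bbr_controls` and the default gain values are assumptions for illustration; the slide only says to pace near the estimated BW and keep inflight near BDP.

```python
def bbr_controls(max_bw_bytes_per_s, min_rtt_s, pacing_gain=1.0, cwnd_gain=2.0):
    """Derive a pacing rate and an inflight cap from the two model estimates.

    pacing_gain and cwnd_gain are illustrative defaults, not values from the slides.
    """
    bdp_bytes = max_bw_bytes_per_s * min_rtt_s
    pacing_rate = pacing_gain * max_bw_bytes_per_s  # pace near estimated BW
    inflight_cap = cwnd_gain * bdp_bytes            # bound inflight near BDP
    return pacing_rate, inflight_cap


# 100 Mbit/s bottleneck (12.5 MB/s) with a 100 ms min RTT.
rate, cap = bbr_controls(12.5e6, 0.100)
print(f"pace at {rate / 1e6:.1f} MB/s, cap inflight at {cap / 1e6:.2f} MB")
```

A pacing gain above 1 probes for more bandwidth and a gain below 1 drains any queue, which is how varying the pacing rate keeps the flow near the BDP.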

slide-13
SLIDE 13

Confidential + Proprietary

BBR: model-based walk toward max BW, min RTT

  • Optimal operating point

slide-14
SLIDE 14


STARTUP: exponential BW search


slide-15
SLIDE 15


DRAIN: drain the queue created during startup


slide-16
SLIDE 16


PROBE_BW: explore max BW, drain queue, cruise
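One way to picture the explore, drain, cruise pattern is as a repeating cycle of pacing gains. The eight-phase cycle below (1.25, 0.75, then six phases at 1.0) is an assumption based on the published BBR description, not something stated on this slide, and `probe_bw_pacing_rates` is a hypothetical helper.

```python
# Assumed gain cycle: explore (1.25), drain (0.75), then cruise (1.0) for six phases.
PROBE_BW_GAINS = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def probe_bw_pacing_rates(max_bw, phases, start_phase=0):
    """Pacing rate for each of `phases` consecutive phases, starting anywhere in the cycle."""
    n = len(PROBE_BW_GAINS)
    return [PROBE_BW_GAINS[(start_phase + i) % n] * max_bw for i in range(phases)]


rates = probe_bw_pacing_rates(max_bw=100.0, phases=8)
print(rates)
print(sum(rates) / len(rates))  # average rate over one full cycle
```

Because the explore and drain gains offset each other, the average pacing rate over a full cycle equals the estimated max BW. The `start_phase` argument stands in for letting different flows begin at different points in the cycle.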


slide-17
SLIDE 17


PROBE_RTT briefly if min RTT filter expires (=10s)*

[*] if continuously sending

minimal packets in flight for max(0.2s, 1 round trip)
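The two timing rules on this slide can be written down directly. The constant and function names here are hypothetical; the 10 s window and the max(0.2 s, 1 round trip) duration come from the slide.

```python
MIN_RTT_WINDOW_S = 10.0         # min RTT filter lifetime, from the slide
PROBE_RTT_MIN_DURATION_S = 0.2  # floor on time spent in PROBE_RTT, from the slide


def min_rtt_filter_expired(now_s, min_rtt_stamp_s):
    """True if the current min RTT estimate was last refreshed more than 10 s ago."""
    return now_s - min_rtt_stamp_s > MIN_RTT_WINDOW_S


def probe_rtt_duration(round_trip_s):
    """Time to hold minimal packets in flight: max(0.2 s, one round trip)."""
    return max(PROBE_RTT_MIN_DURATION_S, round_trip_s)


print(min_rtt_filter_expired(now_s=12.0, min_rtt_stamp_s=1.0))
print(probe_rtt_duration(0.040))
print(probe_rtt_duration(0.500))
```

On a 40 ms path the 0.2 s floor dominates, while on a 500 ms path the flow stays in PROBE_RTT for one full round trip.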


slide-18
SLIDE 18

Packet scheduling: when to send?


slide-19
SLIDE 19

  • Pacing
  • Fair queuing
  • TCP Small Queues (TSQ)
  • TSO autosizing

[Diagram labels: TCP, fq, NIC, link]
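As a rough illustration of what pacing means for "when to send", the sketch below spaces packets evenly at a given rate instead of sending them back to back. The function name and the 1500-byte packet size are assumptions; the real fq qdisc works on per-flow timestamps in the kernel and is not modeled here.

```python
def paced_send_times(num_packets, packet_bytes, pacing_rate_bytes_per_s, start_s=0.0):
    """Earliest departure time of each packet when spaced evenly at the pacing rate."""
    gap_s = packet_bytes / pacing_rate_bytes_per_s  # time budget per packet
    return [start_s + i * gap_s for i in range(num_packets)]


# 1500-byte packets paced at 12.5 MB/s (100 Mbit/s).
for t in paced_send_times(4, 1500, 12.5e6):
    print(f"{t * 1e6:.0f} us")
```

At 100 Mbit/s each 1500-byte packet gets a 120 microsecond slot, so a burst of four packets is spread over 360 microseconds rather than hitting the bottleneck queue all at once.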

slide-20
SLIDE 20

Performance results...


slide-21
SLIDE 21

Fully use bandwidth, despite high loss

BBR vs CUBIC synthetic bulk TCP test with 1 flow, bottleneck_bw=100Mbps, RTT=100ms

slide-22
SLIDE 22

Low queue delay, despite bloated buffers

BBR vs CUBIC synthetic bulk TCP test with 8 flows, bottleneck_bw=128kbps, RTT=40ms

slide-23
SLIDE 23

BBR is 2-20x faster on Google WAN

  • BBR used for all TCP on Google B4
  • Most BBR flows so far rwin-limited
    ○ max RWIN here was 8MB (tcp_rmem[2])
    ○ 10 Gbps x 100ms = 125MB BDP
  • after lifting rwin limit: BBR 133x faster than CUBIC
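The rwin arithmetic above can be checked with a short calculation. The helper names are hypothetical, and reading "8MB" as 8 * 2^20 bytes is an assumption.

```python
def bdp_bytes(bw_bits_per_s, rtt_s):
    """Bandwidth-delay product in bytes."""
    return bw_bits_per_s * rtt_s / 8


def rwin_limited_throughput_bps(rwin_bytes, rtt_s):
    """Upper bound on throughput when at most one receive window is in flight per RTT."""
    return rwin_bytes * 8 / rtt_s


bdp = bdp_bytes(10e9, 0.100)  # 10 Gbps x 100 ms
rwin = 8 * 2**20              # assumed: 8 MiB for tcp_rmem[2]
limit = rwin_limited_throughput_bps(rwin, 0.100)

print(f"BDP: {bdp / 1e6:.0f} MB")
print(f"rwin-limited throughput: {limit / 1e9:.2f} Gbit/s")
print(f"rwin covers {100 * rwin / bdp:.1f}% of the BDP")
```

With an 8 MB window on a 100 ms path, a single flow tops out at roughly 0.67 Gbit/s no matter which congestion control is used, which is why the receive window had to be raised before the full difference between BBR and CUBIC could show.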

slide-24
SLIDE 24

Conclusion

Algorithms and architecture in Linux TCP have evolved

  • Maximizing BW, minimizing queue, and one-RTT recovery (BBR, RACK)
  • Based on groundwork of a high-performance packet scheduler (fq/pacing/tsq/tso-autosizing)
  • Orders of magnitude higher bandwidth and lower latency

Next: Google, YouTube, and... the Internet?

  • Help us make them better! https://groups.google.com/forum/#!forum/bbr-dev

slide-25
SLIDE 25

Backup slides...


slide-26
SLIDE 26


BBR convergence dynamics

Converge by sync'd PROBE_RTT + randomized cycling phases in PROBE_BW

  • Queue (RTT) reduction is observed by every (active) flow
  • Elephants yield more (multiplicative decrease) to let mice grow

[Figure: bw = 100 Mbit/sec, path rtt = 10ms]