Adapting TCP for Reconfigurable Datacenter Networks

1. Adapting TCP for Reconfigurable Datacenter Networks. Matthew K. Mukerjee*†, Christopher Canel*, Weiyang Wang○, Daehyeok Kim*‡, Srinivasan Seshan*, Alex C. Snoeren○. *Carnegie Mellon University, ○UC San Diego, †Nefeli Networks, ‡Microsoft Research. February 26, 2020

2. Reconfigurable Datacenter Network (RDCN). [Diagram: a packet network built from packet switches provides all-to-all connectivity; a circuit switch provides higher-bandwidth connectivity between certain racks.]

3. Reconfigurable Datacenter Network (RDCN). Circuit technologies include 60 GHz wireless, free-space optics, and optical circuit switching. [Same diagram as slide 2.]

4. Reconfigurable Datacenter Network (RDCN). [Diagram: racks 1..N, each with servers 1..M behind a ToR switch; the ToRs connect to both the packet switch and the circuit switch. A plot shows available bandwidth varying over time as circuits are reconfigured.] [Liu, NSDI '14]

8. Reconfigurable Datacenter Network (RDCN). The RDCN is a black box to hosts: flows are not segregated between the packet and circuit networks. [Same diagram as slide 4.] [Liu, NSDI '14]

9. 2010: RDCNs speed up DC workloads. Hybrid networks achieve higher performance on datacenter workloads. [Figure: performance of a packet network vs. a hybrid network (c-Through) vs. a full bisection bandwidth network.] [Wang, SIGCOMM '10]

10. Today's RDCNs reconfigure 10x as often. Advances in circuit switch technology have led to a 10x reduction in reconfiguration delay ⇒ today, circuits can reconfigure much more frequently. [Figure: available bandwidth over time, 2010 (10 ms) vs. today (180 µs).] Better for datacenters: more flexibility to support dynamic workloads. Better for hosts: less data must be available to saturate the higher-bandwidth network. [Porter, SIGCOMM '13]
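To make the "less data" point concrete, here is a back-of-the-envelope sketch in Python. It holds the circuit rate fixed at the 80 Gb/s used elsewhere in this deck, so it isolates the effect of circuit duration; it is an illustration, not a historical comparison with 2010 hardware:

    # Data that must be queued at the ToR to keep one circuit busy
    # for its entire duration, at a fixed 80 Gb/s circuit rate.
    RATE_BPS = 80e9

    for label, duration_s in [("10 ms circuit (2010-era timescale)", 10e-3),
                              ("180 us circuit (today)", 180e-6)]:
        megabytes = RATE_BPS / 8 * duration_s / 1e6
        print(f"{label}: {megabytes:.1f} MB")
    # -> 100.0 MB vs. 1.8 MB: shorter circuits need far less buffered data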

11. Short-lived circuits pose a problem for TCP. No TCP variant makes use of the high-bandwidth circuits. [Bar chart: average circuit utilization (%) per TCP variant; every variant lands between 26% and 55%.] 16 flows from rack 1 to rack 2; packet network: 10 Gb/s; circuit network: 80 Gb/s

12. TCP cannot ramp up during short circuits. [Figure: bytes delivered over time across no circuit / circuit / no circuit, with the circuit lasting 180 µs; slope = achieved bandwidth (BW). What we expect: 8x BW during the circuit. Reality: roughly 1x BW throughout.]

13. What is the problem? All TCP variants are designed to adapt to changing network conditions • E.g., congestion, bottleneck links, RTT. But bandwidth fluctuations in modern RDCNs are an order of magnitude more frequent (10x shorter circuit duration) and more substantial (10x higher bandwidth) than TCP is designed to handle • RDCNs break the implicit assumption of relatively stable network conditions. This requires an order-of-magnitude shift in how fast TCP reacts.
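The mismatch can be quantified with numbers that appear elsewhere in this deck (40 µs RTT, 180 µs circuits, 10 vs. 80 Gb/s, 9000-byte packets). Assuming an established flow grows its window additively, one packet per RTT, as standard TCP congestion avoidance does, a sketch:

    # RTTs available during one circuit vs. RTTs TCP needs to grow
    # its window from the packet-network BDP to the circuit BDP.
    RTT_S, CIRCUIT_S = 40e-6, 180e-6
    PKT_BYTES = 9000

    pkt_bdp  = 10e9 / 8 * RTT_S / PKT_BYTES    # ~5.6 packets
    circ_bdp = 80e9 / 8 * RTT_S / PKT_BYTES    # ~44.4 packets

    rtts_available = CIRCUIT_S / RTT_S          # 4.5 RTTs per circuit
    rtts_needed    = circ_bdp - pkt_bdp         # ~39 RTTs at +1 packet/RTT
    print(f"{rtts_available:.1f} RTTs available, ~{rtts_needed:.0f} needed")

At roughly 39 RTTs of 40 µs each (~1.6 ms), this also lines up with the ~1800 µs of prebuffering quoted on slide 32.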

14. This talk: Our 2-part solution. In-network: use information about upcoming circuits to transparently "trick" TCP into ramping up more aggressively • High utilization, at the cost of tail latency. At endhosts: a new TCP variant, reTCP, that explicitly reacts to circuit state changes • Mitigates the tail latency penalty. The two techniques can be deployed separately, but work best together.

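A minimal sketch of the endhost idea as stated on this slide: a sender that rescales its congestion window when explicitly told the circuit state changed. The class, the callback names, and the scale factor here are illustrative assumptions; the actual reTCP is implemented inside the kernel TCP stack:

    # Sketch: explicit reaction to circuit state changes (assumed API).
    ALPHA = 8  # assumption: circuit bandwidth is ~8x the packet network

    class ReTCPLikeSender:
        def __init__(self, cwnd_pkts):
            self.cwnd = cwnd_pkts        # congestion window, in packets
            self.saved_cwnd = None

        def on_circuit_up(self):
            # Jump toward the circuit BDP immediately instead of
            # waiting tens of RTTs of additive increase.
            self.saved_cwnd = self.cwnd
            self.cwnd = self.cwnd * ALPHA

        def on_circuit_down(self):
            # Drop back to a packet-network-sized window so the small
            # ToR buffers are not overrun; this is what mitigates the
            # tail latency penalty of the in-network trick.
            if self.saved_cwnd is not None:
                self.cwnd = self.saved_cwnd
                self.saved_cwnd = None

The key design point from the slide is that this endhost reaction and the in-network mechanism are separable but complementary.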

16. Naïve idea: Enlarge switch buffers. What we want: TCP's congestion window (cwnd) to parallel the BW fluctuations. First attempt: make cwnd large all the time. How? Use large ToR buffers. [Figure: available bandwidth over time; desired cwnd vs. the cwnd produced by large, static buffers.]

17. Naïve idea: Enlarge switch buffers. [Diagram: sender → ToR buffer → packet switch (low BDP) and circuit switch (high BDP) → ToR buffer → receiver.]

18. Naïve idea: Enlarge switch buffers. Larger ToR buffers increase utilization of the high-BDP circuit network. [Same diagram, annotated with the bandwidth gain.]

19. Naïve idea: Enlarge switch buffers. [Same diagram, annotated with the latency cost of large buffers.]

20. Large queues increase utilization… [Bar chart: average circuit utilization (%) vs. static buffer size (packets): 4 → 21%, 8 → 31%, 16 → 49%, 32 → 77%, 64 → 100%, 128 → 100%.] 16 flows from rack 1 to rack 2; packet network: 10 Gb/s; circuit network: 80 Gb/s

21. …but result in high latency. [Figure: median latency (µs) and 99th-percentile latency (µs) vs. average circuit utilization (%) for static buffers of varying size; latency climbs steeply as utilization rises.] How can we improve this latency? 16 flows from rack 1 to rack 2; packet network: 10 Gb/s; circuit network: 80 Gb/s

22. Use large buffers only when circuit is up. Dynamic buffer resizing: before a circuit begins, transparently enlarge the ToR buffers. This gives full circuit utilization, with a latency degradation only during the ramp-up period. [Figure: available bandwidth over time; desired cwnd vs. cwnd under large static buffers vs. cwnd under dynamic buffers, which resize just before the circuit begins.]
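One way to picture the in-network side (a sketch under stated assumptions: in the real system this logic runs at the ToR switches, driven by the known circuit schedule, and resize_tor_buffer is a hypothetical stand-in for whatever queue-configuration hook the switch exposes):

    import sched
    import time

    SMALL_BUF = 16     # packets (deck: small buffers = 16 packets)
    LARGE_BUF = 50     # packets (deck: large buffers = 50 packets)
    TAU = 1800e-6      # seconds of prebuffering (deck: ~1800 us, slide 32)

    def resize_tor_buffer(n_packets):
        # Hypothetical stand-in for the actual switch queue-size call.
        print(f"ToR buffer -> {n_packets} packets")

    def schedule_circuit_day(s, circuit_start, circuit_end):
        # Enlarge buffers TAU before the circuit so cwnd can grow
        # toward the circuit BDP; shrink them when the circuit ends
        # to restore low latency on the packet network.
        s.enterabs(circuit_start - TAU, 0, resize_tor_buffer, (LARGE_BUF,))
        s.enterabs(circuit_end, 0, resize_tor_buffer, (SMALL_BUF,))

    s = sched.scheduler(time.monotonic, time.sleep)
    now = time.monotonic()
    schedule_circuit_day(s, now + 0.01, now + 0.01 + 180e-6)
    s.run()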

23. Resize ToR buffers before circuit begins. [Animation, slides 23-31: sender → ToR buffer → packet switch / circuit switch → ToR buffer → receiver, with cwnd plotted over time. On the "Circuit coming!" notification the ToR buffers are enlarged; cwnd grows into the extra buffer space before the circuit arrives, then the buffers shrink back once the circuit ends.]

32. Configuring dynamic buffer resizing. How long in advance should ToR buffers resize (𝝊)? • Long enough for TCP to grow cwnd to the circuit BDP. How large should ToR buffers grow? • Circuit BDP = 80 Gb/s ⨉ 40 µs ≈ 45 9000-byte packets. For our configuration, the ToR buffers must hold ~40 packets to achieve 90% utilization, which requires 1800 µs of prebuffering. We resize ToR buffers between sizes of 16 and 50 packets.
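The arithmetic on this slide, written out (all values straight from the deck):

    import math

    RATE_BPS  = 80e9      # circuit bandwidth
    RTT_S     = 40e-6     # round-trip time
    PKT_BYTES = 9000      # jumbo frames

    bdp_bytes = RATE_BPS / 8 * RTT_S     # 400,000 bytes
    bdp_pkts  = bdp_bytes / PKT_BYTES    # ~44.4
    print(math.ceil(bdp_pkts))           # 45 packets, as on the slide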

33. How long in advance to resize, 𝝊? [Figure, slides 33-35: ToR buffer size (packets) over time across no circuit / circuit / no circuit, with the circuit lasting 180 µs, overlaid with bytes delivered; slope = achieved bandwidth (BW), 8x BW on the circuit vs. 1x BW off it. In the configuration shown, average circuit utilization is 49%.] 16 flows from rack 1 to rack 2; packet network: 10 Gb/s; circuit network: 80 Gb/s; small buffers: 16 packets; large buffers: 50 packets
