AC⚡DC TCP: Virtual Congestion Control Enforcement for Datacenter Networks


  1. AC⚡DC TCP: Virtual Congestion Control Enforcement for Datacenter Networks
  Keqiang He, Eric Rozner, Kanak Agarwal, Yu Gu, Wes Felter, John Carter, Aditya Akella

  2. Datacenter Network Congestion Control
  • Congestion is not rare in datacenter networks [Singh, SIGCOMM’15]
  • Tail latency is huge: 99.9th-percentile latency is orders of magnitude higher than the median [Mogul, HotOS’15]
  • Queueing latency is the major contributor [Jang, SIGCOMM’15]
  • New datacenter TCP congestion control schemes have been proposed, e.g., DCTCP, TIMELY, DCQCN, TCP-Bolt, ICTCP

  3. But We Cannot Control VM TCP Stacks
  • In multi-tenant datacenters, admins cannot control VM TCP stacks, because VMs are set up and managed by different entities
  • Therefore, outdated, inefficient, or misconfigured TCP stacks can be implemented in the VMs
  • This leads to two main problems
  [Figure: tenants 1–3 each run their own TCP/IP stacks in VMs on top of shared virtualization infrastructure (servers, storage, networking)]

  4. Problem #1: Large Queueing Latency
  • TCP RTT can reach tens of milliseconds because of packet queueing (a back-of-envelope estimate follows below)
  • With no queueing latency, TCP RTT is around 60 to 200 microseconds
  [Figure: packets from the sender backing up in a switch queue before reaching the receiver]
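
  For a rough sense of scale (our back-of-envelope figures, not from the slide): draining a 4 MB shared switch buffer at 10 Gbps takes 4 × 8 × 10^6 bits / 10^10 bits/s ≈ 3.2 ms, so a few deep queues along a path push RTTs from the microsecond regime into milliseconds.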

  5. Problem #2: TCP Unfairness
  • ECN and non-ECN coexistence problem [Judd, NSDI’15]
  • Non-ECN: e.g., CUBIC
  • ECN: e.g., DCTCP

  6. Problem #2: TCP Unfairness (cont.)
  • Different congestion control (CC) algorithms lead to unfairness
  [Figure: dumbbell topology, senders to receivers; 5 flows with different CC algorithms congest a 10G link]

  7. AC⚡DC TCP: Administrator Control over Data Center TCP
  • Implements TCP congestion control in the virtual switch
  • Ensures VM TCP stacks cannot impact the network

  8. AC⚡DC: High Level View
  • Congestion control runs in the vSwitch: uniform per-flow CC on the sender side, per-flow CC feedback from the receiver side
  • Case study: DCTCP
  [Figure: VMs (apps, OS, vNIC) on each server sit above a vSwitch whose data path hosts the AC/DC module plus a control plane; AC/DC sender and receiver modules talk across the datacenter network]

  9. AC⚡DC Benefits
  • No modifications to VMs or hardware
  • Low latency provided by state-of-the-art CC algorithms
  • Improved TCP fairness, with support for both ECN and non-ECN flows
  • Enforces per-flow differentiation via congestion control, e.g.:
  • East-west and north-south flows can use different CCs (web server)
  • Higher priority for “mission-critical” traffic (backend VM)

  10. AC⚡DC Design
  • Obtaining Congestion Control State
  • DCTCP Congestion Control in the vSwitch
  • Enforcing Congestion Control
  • Per-flow Differentiation via Congestion Control

  11. Obtaining Congestion Control State
  • Per-flow connection tracking: all traffic goes through the virtual switch, so CC state can be reconstructed by monitoring every packet of a connection
  • Maintain per-flow congestion control variables, e.g., CC-related sequence numbers, dupack counter, etc. (an illustrative sketch follows below)
  [Figure: packet classification maps packets to flows, whose CC variables are updated as packets pass]
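
  For concreteness, a minimal C sketch of what this per-flow state could look like; the field names and layout are our illustration, not structures from the AC⚡DC source (the overhead slide later notes roughly 320 bytes per connection):

    #include <stdint.h>

    /* Hypothetical per-flow CC state kept by the vSwitch. */
    struct acdc_flow_state {
        /* CC-related sequence numbers reconstructed from observed packets */
        uint32_t snd_una;       /* oldest unacknowledged byte */
        uint32_t snd_nxt;       /* highest sequence number sent so far */
        uint32_t dupack_cnt;    /* duplicate-ACK counter for loss detection */
        /* DCTCP-style variables */
        uint32_t wnd;           /* window computed by the vSwitch */
        uint32_t alpha;         /* EWMA of the fraction of ECN-marked bytes */
        uint32_t ecn_bytes;     /* CE-marked bytes seen in the current RTT */
        uint32_t total_bytes;   /* total bytes seen in the current RTT */
        uint32_t next_seq;      /* sequence number that closes this RTT */
    };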

  12. DCTCP Congestion Control in the vSwitch
  • Universal ECN marking
  • Get ECN feedback

  13. Universal ECN Marking
  • Why? Not all VMs run ECN-Capable Transports (ECT) like DCTCP
  • Universal ECN marking: all packets entering the fabric are ECN-marked by the virtual switch (a sketch follows below)
  • Solves the ECN and non-ECN coexistence problem
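
  A rough sketch of how a vSwitch could set ECT(0) on outgoing IPv4 packets, patching the IP header checksum incrementally per RFC 1624; this is our illustration, not the OVS/AC⚡DC code (which can also lean on NIC checksum offload, per the implementation slide):

    #include <stdint.h>
    #include <netinet/ip.h>

    static void mark_ect0(struct iphdr *ip)
    {
        uint16_t *word0 = (uint16_t *)ip;  /* first header word: version/IHL + TOS */
        uint16_t old = *word0;
        uint32_t sum;

        if ((ip->tos & 0x3) != 0)          /* already ECT or CE: leave it alone */
            return;
        ip->tos |= 0x2;                    /* ECN field := ECT(0) */

        /* RFC 1624 incremental update: HC' = ~(~HC + ~m + m') */
        sum = (uint16_t)~ip->check + (uint16_t)~old + *word0;
        sum = (sum & 0xffff) + (sum >> 16);
        ip->check = (uint16_t)~((sum & 0xffff) + (sum >> 16));
    }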

  14. Get ECN Feedback
  • The congested switch sets the Congestion Experienced (CE) mark on packets flowing from the sender side to the receiver side
  • Need a way to carry the congestion information back to the sender

  15. Get ECN Feedback (cont.)
  • Congestion feedback is encoded as 8 bytes: {ECN_bytes, Total_bytes}
  • The receiver-side AC/DC module piggybacks it on an existing TCP ACK (a “PACK”); an illustrative layout follows below
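
  The slide fixes only the two fields, not the exact wire encoding or where in the ACK they travel; one plausible layout, purely for illustration:

    #include <stdint.h>

    /* Hypothetical layout of the 8-byte feedback piggybacked on a PACK. */
    struct acdc_feedback {
        uint32_t ecn_bytes;    /* bytes received with the CE mark set */
        uint32_t total_bytes;  /* all bytes received over the same span */
    };

  The sender-side module divides the two to estimate the fraction of marked bytes, which drives the ⍺ update on the next slide.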

  16. DCTCP Congestion Control in the vSwitch
  For each incoming ACK:
  • Extract CC info if it is a PACK; update connection tracking variables; update ⍺ once every RTT
  • Loss? → ⍺ = max_alpha
  • Congestion, and the window was not already cut in the last RTT? → apply the DCTCP control law: wnd = wnd*(1 - ⍺/2)
  • Neither loss nor congestion? → tcp_cong_avoid()
  • Finally, AC/DC enforces CC on the flow and sends the ACK on to the VM (a compact sketch follows below)
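
  A compact C sketch of this per-ACK logic; the fixed-point convention (ALPHA_SCALE), the EWMA gain g = 1/16, and the flow struct are our assumptions, not the paper’s code:

    #include <stdint.h>

    #define ALPHA_SCALE 1024u   /* alpha kept in fixed point, 1024 == 1.0 */
    #define G_SHIFT     4       /* EWMA gain g = 1/16 */

    struct flow {
        uint32_t wnd, alpha;             /* enforced window; DCTCP alpha */
        uint32_t ecn_bytes, total_bytes; /* accumulated from PACK feedback */
        uint32_t next_seq;               /* sequence that closes this RTT */
        int      cut_this_rtt;           /* window already reduced this RTT? */
    };

    static void on_ack(struct flow *f, uint32_t ack, uint32_t snd_nxt,
                       int loss, int congestion)
    {
        /* Once per RTT: alpha = (1 - g)*alpha + g*F, F = marked fraction. */
        if ((int32_t)(ack - f->next_seq) > 0 && f->total_bytes > 0) {
            uint32_t F = (uint32_t)((uint64_t)f->ecn_bytes * ALPHA_SCALE
                                    / f->total_bytes);
            f->alpha = f->alpha - (f->alpha >> G_SHIFT) + (F >> G_SHIFT);
            f->ecn_bytes = f->total_bytes = 0;
            f->next_seq = snd_nxt;
            f->cut_this_rtt = 0;
        }

        if (loss) {
            f->alpha = ALPHA_SCALE;            /* alpha = max_alpha */
        } else if (congestion && !f->cut_this_rtt) {
            /* DCTCP control law: wnd = wnd * (1 - alpha/2), once per RTT. */
            f->wnd -= (uint32_t)((uint64_t)f->wnd * f->alpha
                                 / (2 * ALPHA_SCALE));
            f->cut_this_rtt = 1;
        } else if (!congestion) {
            /* tcp_cong_avoid(): standard additive increase, elided here. */
        }
        /* AC/DC then enforces f->wnd on the flow and forwards the ACK. */
    }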

  17. Enforcing Congestion Control
  • TCP sends min(CWND, RWND): CWND is the congestion window (congestion control), RWND is the receiver’s advertised window (flow control)
  • AC⚡DC reuses RWND for congestion control, so VMs with unaltered TCP stacks naturally follow our enforcement (see the sketch below)
  • Non-conforming flows can be policed by dropping any excess packets not allowed by the calculated congestion window
  • Loss has to be recovered end-to-end, which incentivizes tenants to respect standards
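
  A simplified sketch of the RWND rewrite on an ACK headed back to the sender VM; window scaling is handled naively and checksum patching is left to NIC offload, so treat this as an illustration rather than the paper’s code:

    #include <stdint.h>
    #include <arpa/inet.h>
    #include <netinet/tcp.h>

    /* Clamp the advertised window so min(CWND, RWND) in the VM's stack
     * never exceeds the window computed by the vSwitch. */
    static void enforce_window(struct tcphdr *tcp, uint32_t computed_wnd,
                               int wscale)
    {
        uint32_t rwnd = (uint32_t)ntohs(tcp->window) << wscale;

        if (computed_wnd < rwnd)
            tcp->window = htons((uint16_t)(computed_wnd >> wscale));
        /* TCP checksum recomputation is delegated to the NIC, as noted
         * on the implementation slide. */
    }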

  18. Control Law for Per-flow Differentiation
  • DCTCP: wnd = wnd*(1 - ⍺/2)
  • AC⚡DC TCP: wnd = wnd*(1 - (⍺ - ⍺β/2))
  • When β is close to 1, it becomes DCTCP; when β is close to 0, it backs off aggressively
  • Larger β for higher priority traffic
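
  In code, the modified back-off could look as follows, reusing the fixed-point convention from the earlier sketch (β is likewise scaled so that ALPHA_SCALE means 1.0):

    #include <stdint.h>

    #define ALPHA_SCALE 1024u  /* fixed point: 1024 == 1.0, as before */

    /* wnd = wnd * (1 - (alpha - alpha*beta/2)); beta = ALPHA_SCALE gives
     * DCTCP's wnd*(1 - alpha/2), beta = 0 cuts the window by (1 - alpha). */
    static uint32_t backoff(uint32_t wnd, uint32_t alpha, uint32_t beta)
    {
        uint64_t cut = alpha - (uint64_t)alpha * beta / (2 * ALPHA_SCALE);
        return wnd - (uint32_t)((uint64_t)wnd * cut / ALPHA_SCALE);
    }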

  19. Implementation
  • Prototype implemented in the Open vSwitch kernel datapath; ~1200 LoC added
  • The design leverages available techniques to improve performance:
  • RCU-enabled hash tables to perform connection tracking
  • AC⚡DC manipulates TCP segments (via TSO), instead of MTU-sized packets
  • AC⚡DC leverages NIC checksumming, so the TCP checksum does not have to be recomputed after header fields are modified
  [Figure: AC⚡DC in the hypervisor manipulates whole TCP segments between the VM stacks and TSO; the NIC segments them into packets and recalculates the TCP checksum]

  20. Evaluation
  • Testbed: 17 servers (6-core, 60GB memory), six 10Gbps switches
  • Microbenchmark topologies: incast (many senders, one receiver) and dumbbell (senders to receivers across a shared link)

  21. Evaluation (cont.)
  • Macrobenchmark topology: 17 servers attached to a 10G switch
  • Metrics: TCP RTT, loss rate, Flow Completion Time (FCT)

  22. Experiment Setting (3 schemes compared)
  • CUBIC: CUBIC stack on top of standard OVS
  • DCTCP: DCTCP stack on top of standard OVS
  • AC⚡DC: CUBIC/Reno/Vegas/HighSpeed/Illinois stacks on top of AC⚡DC
  [Figure: three hypervisor setups; VMs run CUBIC over OVS, DCTCP over OVS, or any stack over AC⚡DC]

  23. Tracking Window Size
  • The DCTCP stack runs on top of AC⚡DC, which only outputs the calculated RWND without enforcing it
  • AC⚡DC closely tracks the window size of DCTCP
  [Figure: dumbbell topology; calculated RWND vs. DCTCP’s window over time]

  24. Convergence
  • AC/DC has convergence properties comparable to DCTCP and better than CUBIC
  [Figure: dumbbell topology; per-flow throughput over time for CUBIC, DCTCP, and AC/DC]

  25. AC⚡DC Improves Fairness When VMs Use Different CCs
  [Figure: dumbbell topology; per-flow throughput under standard OVS vs. AC⚡DC]

  26. Overhead (CPU and Memory)
  • Less than 1% additional sender-side CPU overhead compared with the baseline
  • Each connection uses 320 bytes to maintain CC variables (10k connections use 3.2MB)
  [Figure: dumbbell topology]

  27. TCP Incast: RTT and Drop Rate
  • AC⚡DC tracks the performance of DCTCP closely
  [Figure: incast topology; 50th-percentile RTT, 99.9th-percentile RTT, and packet drop rate]

  28. Flow Completion Time
  • Trace-driven workloads with 17 servers attached to a 10G switch: web-search workload (from DCTCP) and data-mining workload (from CONGA)
  • AC⚡DC obtains the same performance as DCTCP
  • AC⚡DC can reduce FCT by 36%–76% compared with default CUBIC

  29. Summary
  • AC⚡DC allows administrators to regain control over arbitrary tenant TCP stacks by enforcing congestion control in the virtual switch
  • AC⚡DC requires no changes to VMs or network hardware
  • AC⚡DC is scalable, lightweight (< 1% CPU overhead), and flexible

  30. Thanks!

  31. Backup Slides

  32. Related Work
  • DCTCP: ECN-based congestion control for DCNs
  • TIMELY: latency-based congestion control for DCNs, relying on accurate NIC timestamps for latency measurement
  • vCC: vCC and AC⚡DC are closely related works by two independent teams

  33. ECN and Non-ECN Coexistence
  • Switch configured with WRED/ECN
  • When queue occupancy is larger than the marking threshold, non-ECN packets are dropped
  [Figure: an ECN flow and a non-ECN flow share one switch queue]

  34. IPsec
  • AC⚡DC is not able to inspect TCP headers in IPsec traffic
  • It may perform approximate rate limiting based on congestion feedback information
