Reducing Latency in Multi-Tenant Data Centers via Cautious Congestion Watch
The 49th International Conference on Parallel Processing (ICPP) 2020
Ahmed M. Abdelmoniem, Hengky Susanto, and Brahim Bensaou
Outline
- Introduction and background
Introduction and background
- Data centers process data for recommendation, machine learning, business intelligence, scientific research, etc.
[Figure: DCTCP operation in a data-parallel processing application. Sender -> Switch -> Receiver; once the switch buffer exceeds the ECN threshold, packets are marked by ECN, and the ACK packets carry the ECN echo back to the sender.]
[Figure: Incast in data-parallel processing. Worker 1, Worker 2, and Worker 3 (each running DCTCP) send to an Aggregator; the switch buffer exceeds the ECN threshold and packets drop.]
- Goal: gain a better understanding of how well DCTCP copes with the incast problem.
- We conduct a preliminary experiment with the NS2 simulator on a dumbbell network topology and measure the performance of short-lived flows.
- Short-lived flows often carry too few packets to generate 3 DUPACKs and trigger TCP's fast-retransmission scheme, so losses must wait for a retransmission timeout.
- Our findings show that short-lived flows under DCTCP still suffer from packet losses, because large initial congestion windows cause incast.
Goal: provide a holistic view of the problem and a better understanding of the complex interactions between the network components (switch, sender, and receiver).
Due to the nature of TCP-based protocols, there is a wealth of information available at the sender, e.g., congestion window size, RTT, etc. The receiver, in turn, is in a natural position to evaluate the information carried by ECN-marked packets in its inbound traffic.
Combining the information gathered at both ends gives the sender a richer and more holistic view of the network condition. For example, losses can be estimated by subtracting the number of packets received by the receiver from the total number of packets transmitted by the sender.
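As a minimal sketch of this idea (the function and counter names below are our own illustration, not part of the paper):

```python
# Toy sketch: combine sender-side and receiver-side counters to
# estimate losses and the congestion level of the path.
# Names (sent, received, ecn_marked) are illustrative assumptions.

def network_view(sent: int, received: int, ecn_marked: int):
    """Estimate losses and congestion from both endpoints' counters."""
    lost = max(0, sent - received)  # sender knows `sent`, receiver knows `received`
    marked_frac = ecn_marked / received if received else 0.0
    return {"lost": lost, "ecn_marked_fraction": marked_frac}

# Example: the sender transmitted 10 packets, the receiver saw 8,
# 3 of which carried an ECN mark.
print(network_view(sent=10, received=8, ecn_marked=3))
# {'lost': 2, 'ecn_marked_fraction': 0.375}
```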
[Figure: Staggered transmission. At time t, Worker 1 transmits to the Aggregator; at time t+1, Worker 2 and Worker 3 transmit; the switch buffer absorbs each round without overflowing.]
Observation: transmitting packets at different times helps to mitigate packet drops.
We visualize the switch buffer as a bin, and the packets as the items that are packed into the bin.
Key insights gained from our theoretical results:
- Congestion information can be inferred from the number of packets marked and unmarked by ECN.
- Packets should be transmitted in at least two batches (rounds); see the sketch below.
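A minimal toy model of this insight (the buffer size, drain amount, and burst sizes are assumed values, not taken from the paper):

```python
# Toy switch-buffer model: each burst arrives at once, excess packets
# beyond the buffer capacity are dropped, and `drain` packets leave
# the buffer between rounds. All constants are illustrative assumptions.

def send(batches, queued=0, buf=16, drain=8):
    """Total drops when bursts in `batches` hit a buffer holding `queued` packets."""
    drops = 0
    for burst in batches:
        queued += burst
        if queued > buf:                 # buffer overflow: excess is dropped
            drops += queued - buf
            queued = buf
        queued = max(0, queued - drain)  # buffer drains before the next round
    return drops

# 24 packets sent in one shot vs. split into two batches (rounds):
print(send([24]))      # 8 drops: the burst exceeds the buffer
print(send([12, 12]))  # 0 drops: each round fits, the buffer drains in between
```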
Despite the encouraging theoretical results, putting the theory into practice requires addressing practical challenges:
- The sender only receives the ECN echo via ACK packets, which may arrive too late, after the initial burst has already been sent.
- Any fix must impose minimal impact on large flows.
Our solution: HWatch (a hypervisor-based congestion watching mechanism)
- An active network probing scheme that incorporates insights from our theoretical results to determine the initial congestion window based on the congestion level in the network.
- Probing is set up during the TCP synchronization (handshake) stage.
- The advertised receive window (RWND) is scaled down at the receiver in the case of congestion, according to the number of probe packets marked with ECN.
- The sender then sets its initial congestion window (ICWND) according to RWND.
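A minimal sketch of such a mapping (the linear scaling rule and the constants MSS, MAX_IW, and MIN_IW are our assumptions; the paper's exact formula may differ):

```python
# Toy mapping from the probing result to the RWND announced during the
# TCP handshake. Since TCP sends min(cwnd, rwnd), the sender's initial
# congestion window (ICWND) is bounded by this RWND. The linear rule
# and all constants below are assumptions for illustration.

MSS = 1460               # bytes per segment
MAX_IW, MIN_IW = 10, 1   # initial-window bounds in segments (assumed)

def initial_rwnd(probes_sent: int, probes_marked: int) -> int:
    """Scale the advertised RWND down with the fraction of ECN-marked probes."""
    marked_frac = probes_marked / probes_sent if probes_sent else 0.0
    iw = max(MIN_IW, round(MAX_IW * (1 - marked_frac)))
    return iw * MSS  # RWND in bytes, carried in the handshake ACK

print(initial_rwnd(10, 0))  # 14600: no congestion, full initial window
print(initial_rwnd(10, 7))  # 4380:  heavy marking, ICWND limited to 3 segments
```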
[Figure: HWatch architecture in the hypervisor. A packet interceptor sits between the VMs (VM1, VM2, VM3) and the NIC; a TCP flow table tracks per-flow ECN marks (e.g., S1:D1 -> 7, S2:D2 -> 3, S3:D3 -> 5) and updates the RWND carried in the returning ACKs (D1:ACK, D2:ACK, D3:ACK).]
[Figure: Two realizations of the packet interceptor. (a) A Netfilter HWATCH hook on the local TCP/IP stack's Prerouting/Postrouting path (IN hook: Pre_Route, Ip_rcv; OUT hook: Post_route, Ip_finish, rwnd_update). (b) An Open vSwitch datapath: flow_lookup, extract_key, and action_execute in the kernel datapath, with send_packet_to_vswitchd and handle_packet_cmd in the user-space vswitchd daemon. Both maintain the TCP flow table for ECN tracking and RWND updates.]
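To make the datapath concrete, here is a heavily simplified pure-Python stand-in for the interceptor's bookkeeping (the class, function, and field names and the window-halving rule are our own; the real HWatch logic lives inside the kernel hook / vSwitch datapath shown above):

```python
# Simplified stand-in for the HWATCH packet interceptor's state:
# inbound data packets update per-flow ECN counts in the TCP flow table,
# and outbound ACKs get their receive window rewritten accordingly.
# All names and the 0.5 threshold are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class FlowEntry:
    seen: int = 0    # data packets observed for this flow
    marked: int = 0  # of which carried an ECN mark

flow_table: dict[tuple, FlowEntry] = {}  # keyed by (src, dst), e.g. ("S1", "D1")

def on_inbound_data(src, dst, ecn_marked: bool):
    """IN hook: track ECN marks per flow (ECN tracking)."""
    entry = flow_table.setdefault((src, dst), FlowEntry())
    entry.seen += 1
    entry.marked += ecn_marked

def on_outbound_ack(src, dst, rwnd: int) -> int:
    """OUT hook: shrink the ACK's advertised window under congestion (RWND update)."""
    entry = flow_table.get((dst, src))  # the ACK flows in the reverse direction
    if entry and entry.seen and entry.marked / entry.seen > 0.5:  # assumed threshold
        rwnd //= 2
    return rwnd

# Flow S1->D1 sees 7 marked packets out of 8; its ACKs advertise half the window.
for _ in range(7): on_inbound_data("S1", "D1", True)
on_inbound_data("S1", "D1", False)
print(on_outbound_ack("D1", "S1", 65535))  # 32767
```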
Evaluation: (1) simulation experiments; (2) testbed experiments.
[Figure: Simulation setup. Fat-tree topology with core routers, aggregation switches, ToR switches, and servers grouped into pods.]
Compared schemes: TCP-DropTail, TCP-RED, and DCTCP.
[Figure: Performance of short-lived and long-lived flows in the 100-source scenario. Average FCT of short-lived flows; average goodput of long-lived flows; persistent queue over time; bottleneck utilization over time.]
Testbed implementation: HWatch is realized as a Linux kernel module.
[Figure: Testbed setup. Fat-tree topology with four racks (Rack 1 to Rack 4), ToR and core NetFPGA switches, and 1 Gb/s links.]
Testbed experiments confirm that HWatch mitigates packet drops.
[Figure: Testbed results. Average FCT of short-lived flows; average goodput of long-lived flows.]
Conclusion:
- HWatch prompts flows that are actively sending data to scale back and release some buffer space for short-lived flows.
- It improves the performance of short-lived flows while imposing minimal impact on the large flows.
[Figure: Illustration of the switch buffer experiencing incast. X existing packets are queued in the buffer prior to the burst; a burst of Y packets (incast) arrives; packets beyond the buffer size are dropped from buffer overflow; the drain time is the time required to drain the packets queued in the buffer at time t.]
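As a back-of-the-envelope reading of the figure above (a sketch with our own assumed notation: buffer size $B$ in packets and drain rate $C$ in packets per second are not symbols from the paper):

\[
\text{drops} = \max\left(0,\; X + Y - B\right), \qquad
T_{\text{drain}} = \frac{X}{C}.
\]

If the burst of $Y$ packets is deferred by at least $T_{\text{drain}}$, or split into rounds of at most $B - X$ packets, the overflow term vanishes; this is the intuition behind transmitting in at least two batches.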