

SLIDE 1

Reducing Latency in Multi-Tenant Data Centers via Cautious Congestion Watch

The 49th International Conference on Parallel Processing (ICPP) 2020

Ahmed M. Abdelmoniem, Hengky Susanto, and Brahim Bensaou

SLIDE 2

Outline

  • Introduction and background
  • Preliminary Investigation
  • Exploring solution design space
  • The solution (HWatch) design
  • Evaluation
  • Conclusion
SLIDE 3

Living in the Big Data Era

  • A massive amount of data is generated, collected, and processed daily.

› Data for recommendation, machine learning, business intelligence, scientific research, etc.

  • To accelerate processing, this massive amount of data must be processed in parallel.
SLIDE 4

Congestion Control in Data Center

  • Data parallel processing is generally conducted in a data center.
  • Modern data centers employ Data Center TCP (DCTCP).
  • DCTCP is a TCP-like congestion control protocol designed for data center networks.
  • DCTCP leverages Explicit Congestion Notification (ECN) to provide multi-bit feedback to the end host.

[Figure: a data-parallel processing application with DCTCP running on each host.]

SLIDE 5

DCTCP Overview

[Figure: sender, switch, and receiver; once the switch buffer exceeds the ECN threshold, packets are marked by ECN, and the TCP ACK packets carry the ECN echo back to the sender.]
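
For context (standard DCTCP behaviour, not spelled out on the slide): the sender keeps a running estimate of the fraction of marked ACKs and cuts its congestion window in proportion to that estimate.

```latex
% F: fraction of ECN-marked ACKs observed over the last window of data
% g: fixed EWMA gain (e.g. 1/16); \alpha: running congestion estimate
\alpha \leftarrow (1 - g)\,\alpha + g\,F
% once per window in which marks were seen, the window is scaled down
\mathrm{cwnd} \leftarrow \mathrm{cwnd}\left(1 - \frac{\alpha}{2}\right)
```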

SLIDE 6

Incast Problem in Datacenter Network

[Figure: Workers 1-3 sending to an Aggregator through a shared switch buffer; the burst exceeds the ECN threshold and packets are dropped.]

  • Incast is a bloated-buffer incident.
  • It is caused by a burst of packets arriving at the same time.
  • It is a common occurrence for data-parallel processing in data center networks.
  • It often leads to network- and application-level performance degradation.

SLIDE 7

Preliminary Investigation

A better understanding of how well DCTCP copes with the incast problem.

SLIDE 8

Investigating Incast Problem

We conduct a preliminary experiment using the NS2 simulator on a dumbbell network topology.

  • Short-lived flows are sensitive to the choice of the initial sending window size.
  • DCTCP marking results in more aggressive acquisition of the available bandwidth.
  • This favors large flows (e.g. data backup traffic) over short-lived flows (e.g. search queries), which may degrade the performance of short-lived flows.

SLIDE 9

The Root of The Performance Degradation

  • Observation 1: Short-lived flows (e.g. 1 to 5 packets) do not have enough packets to generate 3 DUPACKs and trigger TCP's fast retransmission scheme.
  • Alternatively, the sender may lose the entire window of packets.
  • The sender must then rely on the TCP RTO to detect packet drops (default Linux RTO = 200 ms to 300 ms); the sketch below illustrates the cost.
  • Observation 2: The incast problem primarily affects short-lived flows.
  • Today's data-parallel processing applications (e.g. MapReduce) generate many short-lived flows.
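
To make the cost of RTO-based recovery concrete, here is an illustrative back-of-the-envelope sketch; the 100 µs round-trip time and the recovery scenarios are assumed example values, not measurements from the paper.

```python
# Illustrative only: compare the completion time of a tiny flow that
# recovers a loss via fast retransmit (one extra RTT) with one that must
# wait for an RTO timeout. The RTT value is an assumed example.
RTT_S = 100e-6   # assumed datacenter round-trip time: 100 microseconds
RTO_S = 200e-3   # default Linux minimum RTO cited on the slide: 200 ms

fct_no_loss = RTT_S            # the whole short flow fits in one RTT
fct_fast_retx = 2 * RTT_S      # one extra RTT to retransmit the lost packet
fct_rto = RTT_S + RTO_S        # loss detected only by the retransmission timeout

print(f"no loss:         {fct_no_loss * 1e3:8.3f} ms")
print(f"fast retransmit: {fct_fast_retx * 1e3:8.3f} ms")
print(f"RTO recovery:    {fct_rto * 1e3:8.3f} ms (~{fct_rto / fct_no_loss:.0f}x slower)")
```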
SLIDE 10

Our findings show that short-lived flows under DCTCP still suffer from packet losses, because the use of a large initial congestion window results in incast.

SLIDE 11

Research Question

What is the optimal choice of initial congestion window for DCTCP that mitigates the incast problem while minimizing the average completion time of short-lived flows?

SLIDE 12

Exploring Solution Design Space

To provide a holistic view of the problem and a better understanding of the complex interactions between network components (switch, sender, and receiver).

SLIDE 13

Exploring The Design Space at Sender and Receiver


Due to the nature of TCP-based protocols, there is a wealth of information available at the sender:

  • Number of transmitted packets, congestion window size, RTT, etc.

The receiver, in turn, is naturally positioned to evaluate the information carried by ECN-marked packets in the inbound traffic.

Combining information gathered from both the sender and receiver provides a sender with a richer and more holistic view of the network condition.

  • For instance, the number of dropped packets can be approximated by subtracting the number of packets received by the receiver from the total number of packets transmitted by the sender, as written out below.
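
Written as a formula (the symbols N_sent and N_received are ours, introduced only for illustration):

```latex
\text{dropped} \approx N_{\text{sent}} - N_{\text{received}}
```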

SLIDE 14

Exploring The Design Space at The Switch

[Figure: at time t, Workers 1-3 transmit simultaneously to the Aggregator and overflow the switch buffer; at time t+1, spreading their transmissions over time lets the packets fit in the buffer.]

Observation: Transmitting packets at different times helps to mitigate packet drops.

SLIDE 15

Visualizing Queue Management as Bin-Packing

[Figure: packets from Workers 1-3 packed into the switch buffer at times t and t+1.]

We visualize the switch buffer as the bin and the packets as the items that are packed into it.

SLIDE 16

Visualizing Incast As Bin-Packing

  • A different perspective for understanding the incast problem.
  • A starting point for thinking about how to transmit the bulk data of short-lived flows, which consist of only a few packets.
  • It allows us to inherit the wisdom of earlier studies on the bin-packing problem.
  • Our solution draws inspiration from the classic Next Fit algorithm (see the sketch below).
  • We emulate the bin-packing problem by utilizing the ECN mechanism used in DCTCP.
  • The ECN setup follows the recommendations for DCTCP.
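
For reference, a minimal sketch of the classic Next Fit heuristic mentioned above; framing packets as the items and the switch buffer as the bin is our illustration, while the paper's emulation over ECN is more involved.

```python
def next_fit(item_sizes, bin_capacity):
    """Classic Next Fit bin packing: keep a single open bin; if the next
    item does not fit, close that bin and open a new one. Here the items
    could be packets and the bin a switch buffer (illustrative framing)."""
    bins = []
    current, free = [], bin_capacity
    for size in item_sizes:
        if size > free:              # item does not fit: start a new bin
            bins.append(current)
            current, free = [], bin_capacity
        current.append(size)
        free -= size
    if current:
        bins.append(current)
    return bins

# Example: packet sizes (arbitrary units) packed into buffers of capacity 10
print(next_fit([4, 5, 3, 6, 2, 7], bin_capacity=10))
# -> [[4, 5], [3, 6], [2, 7]]
```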
SLIDE 17

Theoretical Results

Key insights gained from our theoretical results:

  • The initial congestion window size can be approximated by leveraging information based on the number of packets marked and unmarked by ECN (see the sketch below).
  • Given the number of packets marked by ECN, the first 𝑜 transmitted packets should be transmitted in at least two batches (rounds).
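
The slide does not give the approximation itself, so the following is only an assumed illustration of the idea: the more packets the switch marks with ECN, the smaller the initial window should be. The function name, bounds, and proportional mapping are hypothetical.

```python
# Assumed illustration only -- this is not the paper's actual formula.
def approx_initial_cwnd(marked, unmarked, max_icwnd=10, min_icwnd=1):
    """Scale the initial congestion window (in segments) by the fraction
    of unmarked packets observed; clamp to [min_icwnd, max_icwnd]."""
    total = marked + unmarked
    if total == 0:
        return max_icwnd            # no congestion signal observed
    icwnd = round(max_icwnd * unmarked / total)
    return max(min_icwnd, min(max_icwnd, icwnd))

# Example: 7 of 10 packets marked -> start cautiously with a small window
print(approx_initial_cwnd(marked=7, unmarked=3))   # -> 3
```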

SLIDE 18

From Theory to Practice - Practical Challenges

Despite the encouraging theoretical results, turning the theory into practice requires addressing several practical challenges.

  • The incast problem is a distributed, online problem.
  • Bin-packing is an offline problem.
  • Short-lived flows (e.g. 1 to 5 packets) may complete before the sender receives the ECN echo via ACK packets.
  • Short-lived flows only learn about in-network congestion after receiving the ECN echo, which is too late.

SLIDE 19

System Design Requirements of The Solution

  • Improves the performance of short-lived flows with minimal impact on large flows.
  • Simple to deploy in data centers.
  • Independent of the TCP variant.
  • No modifications to the VM's network stack.
  • Addresses the practical challenges above.
SLIDE 20

Our solution: HWatch

(Hypervisor-based congestion watching mechanism)

An active network probe scheme that incorporates insights from our theoretical results to determine the initial congestion window based on the congestion level in the network.

SLIDE 21

HWatch System Design in a Nutshell

  • Injects probe packets at connection start-up, during the TCP synchronization stage.
  • In the case of congestion, the probe packets carry ECN marks to the receiver.
  • The receiver sets the receive window (RWND) according to the number of probe packets marked with ECN.
  • The receiver conveys the RWND to the sender via an ACK packet.
  • The sender sets the initial congestion window (ICWND) according to the RWND (see the sketch below).
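
A minimal sketch of the window negotiation described above, assuming a simple proportional mapping from unmarked probes to window size; the mapping, names, and MSS clamp are our illustration, not the paper's exact algorithm.

```python
MSS = 1460  # assumed maximum segment size in bytes, for the RWND conversion

def receiver_rwnd(probes_sent, probes_marked, max_init_segments=10):
    """Receiver side: derive an advertised window (in bytes) from the
    fraction of probe packets the switches marked with ECN."""
    unmarked = probes_sent - probes_marked
    segments = max(1, round(max_init_segments * unmarked / probes_sent))
    return segments * MSS

def sender_icwnd(rwnd_bytes):
    """Sender side: clamp the initial congestion window (in segments)
    to the RWND advertised in the returning ACK."""
    return max(1, rwnd_bytes // MSS)

rwnd = receiver_rwnd(probes_sent=10, probes_marked=7)
print(rwnd, sender_icwnd(rwnd))   # -> 4380 3
```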

[Figure: HWatch in the hypervisor, between the VMs (VM1-VM3) and the NIC; a per-flow table tracks ECN counts (S1:D1 = 7, S2:D2 = 3, S3:D3 = 5), and IN (pre-routing) / OUT (post-routing) hooks update the RWND carried by the returning ACKs D1:ACK, D2:ACK, D3:ACK.]

SLIDE 22

HWatch Implementation Overview

  • The HWatch module is implemented in the shim layer at both the sender and the receiver.
  • HWatch is implemented via Netfilter by inserting hooks in the forwarding processing path.
  • HWatch is also realized by modifying the Open vSwitch kernel datapath module.
  • This only requires adjusting the flow-processing table.
  • By doing so, HWatch is deployment-friendly in production data centers.

[Figure: two realizations of the HWatch packet interceptor - (1) in the Open vSwitch kernel datapath, between receive_from_vport / extract_key / flow_lookup and the do_output action, with the vSwitch daemon handling new flows in user space; (2) as a Netfilter hook on the FORWARD path of the local TCP/IP stack, between prerouting and postrouting. Both maintain a TCP flow table for ECN tracking and RWND updates.]

SLIDE 23

Evaluation and Analysis

(1) Simulation experiments. (2) Testbed experiments.

SLIDE 24


Simulation Experiment Setups

  • NS2 simulator.
  • Hundreds of servers connected by commodity switches.
  • Compared against different types of TCP traffic sources (TCP-DropTail, TCP-RED, and DCTCP).

[Figure: fat-tree topology with core routers, aggregation switches, ToR switches, and servers grouped into pods.]

SLIDE 25

Simulation Experiment Results

  • HWatch improves the performance of short-lived flows by up to 10x compared to TCP-DropTail, TCP-RED, and DCTCP.
  • Minimal impact on the performance of large flows.

[Figure: short-lived flows' average FCT, long-lived flows' average goodput, persistent queue over time, and bottleneck utilization over time, for a scenario with over 100 sources.]

SLIDE 26

Testbed Experiment Setup

  • We build and deploy the HWatch prototype in a mini data center.
  • A fat-tree topology connects 4 racks of DC-grade servers installed with the Incast Guard end-host module.
  • Commodity (EdgeCore) top-of-rack switches.
  • The core switch is a PC installed with a NetFPGA switch.

[Figure: fat-tree testbed with Racks 1-4, EdgeCore ToR switches, and a NetFPGA core switch, connected by 1 Gb/s links.]

SLIDE 27

Testbed Experiments Results

Testbed experiments confirm that HWatch mitigates packet drops.

  • Improves the performance of short-lived flows by up to 100%.
  • Minimal impact on the performance of large flows.

[Figure: short-lived flows' average FCT and long-lived flows' average goodput.]

SLIDE 28

Key Insights From HWatch Performance

  • Dispersing packet transmissions over time creates smaller incasts.
  • This allows the buffer to absorb the incoming packets with fewer packet drops.
  • Fewer packet drops allow short-lived flows to achieve faster completion times.
  • HWatch stochastically prioritizes the available buffer space for short-lived flows, because the probe packets indirectly reserve buffer space for them.
  • The probing mechanism functions as an incast early-warning system, allowing long-lived flows that are actively sending data to scale back and release buffer space for short-lived flows.

SLIDE 29

Conclusion

  • HWatch mitigates the incast problem in the data center.
  • HWatch strikes a balance between improving the performance of short-lived flows and imposing minimal impact on large flows.
  • HWatch is deployment-friendly in production data centers.
SLIDE 30

Thank You

SLIDE 31

Exploring The Design Space at The Switch

[Figure: illustration of the switch buffer experiencing incast - number of queued packets vs. time. X packets are already queued in the buffer before the burst; a burst of Y packets (the incast) arrives at time t; packets exceeding the buffer size are dropped due to buffer overflow; the queued packets drain by time t'.]