
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 30, NO. 1, JANUARY 2019, p. 133

Luopan: Sampling-Based Load Balancing in Data Center Networks

Peng Wang, Member, IEEE, George Trimponias, Hong Xu, Member, IEEE, and Yanhui Geng

Abstract: Data center networks demand high-performance, robust, and practical data plane load balancing protocols. Despite progress, existing work falls short of meeting these requirements. We design, analyze, and evaluate Luopan, a novel sampling-based load balancing protocol that overcomes these challenges. Luopan operates at flowcell granularity similar to Presto. It periodically samples a few paths for each destination switch and directs flowcells to the least congested one. By being congestion-aware, Luopan improves flow completion time (FCT), and is more robust to topological asymmetries compared to Presto. The sampling approach simplifies the protocol and makes it much more scalable for implementation in large-scale networks compared to existing congestion-aware schemes. We provide analysis to show that Luopan's periodic sampling has the same asymptotic behavior as instantaneous sampling: taking 2 random samples provides exponential improvements over 1 sample. We conduct comprehensive packet-level simulations with production workloads. The results show that Luopan consistently outperforms state-of-the-art schemes in large-scale topologies. Compared to Presto, Luopan with 2 samples improves the 99.9%ile FCT of mice flows by up to 35 percent, and average FCT of medium and elephant flows by up to 30 percent. Luopan also performs significantly better than Local Sampling with large asymmetry.

Index Terms: Data center networks, load balancing, network congestion, distributed

• P. Wang and H. Xu are with the Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong. E-mail: pewang4-c@my.cityu.edu.hk, henry.xu@cityu.edu.hk.
• G. Trimponias is with Huawei Noah's Ark Lab, Hong Kong. E-mail: g.trimponias@huawei.com.
• Y. Geng is with Huawei Montreal Research Centre, Markham, ON L3R 5A4, Canada. E-mail: geng.yanhui@huawei.com.

Manuscript received 11 Dec. 2017; revised 26 Apr. 2018; accepted 9 July 2018. Date of publication 23 July 2018; date of current version 12 Dec. 2018. (Corresponding author: Hong Xu.) Recommended for acceptance by B. He. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TPDS.2018.2858815

1045-9219 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

Data center networks use multi-rooted Clos topologies to provide many equal-cost paths between hosts [4], [18]. To load balance traffic, switches run ECMP (Equal Cost Multi-Path), which forwards packets among equal-cost egress ports using static hashing. Though simple to implement, ECMP's drawbacks are widely recognized in the community. Hash collisions cause flow collisions and congestion, degrading throughput for elephant flows [5], [12], [14] and tail latency for mice flows [7], [8], [25], [37].

Recent work such as Presto [20] proposes to break flows into small flowcells and load balance flowcells across available paths in a round-robin fashion. By transforming the heavy-tailed flows into many smaller flowcells, Presto can better balance the load and improve flow completion time (FCT) for medium and large flows (Section 2.1). However, in practice most flows are small and only have a few flowcells. We find that in one production network 90 percent of the flows have fewer than 6 flowcells (Section 2.2). This implies that a flow can only utilize a few random paths out of the hundreds available in typical large-scale production networks [9], [33]. Further, Presto's round-robin only balances the number of flowcells. It does not work well with link failures and network asymmetry, which are rather common in practice [17]. Even in a symmetric network with uniform flowcells, Presto's round-robin still causes transient congestion in the lower tier of a multi-tier Clos network, because it sequentially uses the ports of a switch first before moving to the next (Section 2.2). Transient load imbalance still exists with Presto, which degrades the tail FCT for mice flows.

A more robust approach is congestion-aware load balancing advocated by CONGA [6] and HULA [24]. Switches monitor congestion levels for each path and direct a flow or flowlet to the least congested path. This is responsive to changing network conditions, and robust to failures and network asymmetry [6], [24]. To make the best load balancing decisions, prior work strives to collect congestion feedback for each path between the source and destination ToR switches. These omniscient schemes perform well in small-scale enterprise networks with simple 2-tier leaf-spine topologies [6]. The challenge is that they have serious scalability and overhead issues that impede the deployment potential in large-scale networks (Section 2.3). Production networks such as Google's [33], Facebook's [9], and Amazon's [3] use 3-tier or even more complex Clos topologies. For a typical 3-tier Clos network, hundreds of paths exist between any two ToR switches, and a ToR switch can communicate with hundreds of other ToR switches [9]. Thus, omniscient per-path feedback requires storing and tracking a daunting number of paths at each ToR in the time scale of RTT (tens of microseconds). Further, acquiring omniscient information involves many switches in the process and makes the control loop slower.

We explore a different direction: what if we use congestion information of just a few random paths for load balancing?
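
The abstract's statement that taking 2 random samples gives an exponential improvement over 1 sample echoes the classic balls-into-bins result known as the power of two choices. The sketch below illustrates only that abstraction, not Luopan's protocol or the paper's analysis: bins stand in for paths, balls for flowcells, and the bin load for a congestion signal. The function names `max_load` and `average_max_load`, the sizes, and the trial count are all hypothetical choices made for illustration, and sampling here is instantaneous (each ball sees current loads), not periodic.

```python
import random


def max_load(num_bins, num_balls, d, rng):
    """Place balls one at a time. Each ball samples d bins uniformly at
    random (with replacement) and joins the least loaded of them.
    Returns the largest bin load after all balls are placed."""
    loads = [0] * num_bins
    for _ in range(num_balls):
        candidates = [rng.randrange(num_bins) for _ in range(d)]
        # Pick the sampled bin with the smallest current load.
        best = min(candidates, key=lambda b: loads[b])
        loads[best] += 1
    return max(loads)


def average_max_load(num_bins, num_balls, d, trials=20, seed=0):
    """Average the maximum load over several independent trials."""
    rng = random.Random(seed)
    total = sum(max_load(num_bins, num_balls, d, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Assumed toy sizes: 1000 "paths" and 1000 "flowcells".
    for d in (1, 2):
        print(f"d={d}: average max load = {average_max_load(1000, 1000, d)}")
```

With d = 1 this behaves like purely random placement; with d = 2 the most loaded bin is typically noticeably lighter, which is the intuition behind sampling just a few paths instead of tracking all of them.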


