B4 and After: Managing Hierarchy, Partitioning, and Asymmetry for Availability and Scale in Google's Software-Defined WAN


SLIDE 1

B4 and After: Managing Hierarchy, Partitioning, and Asymmetry for Availability and Scale in Google's Software-Defined WAN

SLIDE 2

(“Chi”) Chi-yao Hong, Subhasree Mandal, Mohammad Al-Fares, Min Zhu, Richard Alimi, Kondapa Naidu B., Chandan Bhagat, Sourabh Jain, Jay Kaimal, Shiyu Liang, Kirill Mendelev, Steve Padgett, Faro Rabe, Saikat Ray, Malveeka Tewari, Matt Tierney, Monika Zahn, Jonathan Zolla, Joon Ong, Amin Vahdat On behalf of many others in: Google Network Infrastructure and Network SREs

SLIDE 3

3

[Timeline figure, 2011-2018: B4 evolves from Saturn, the first-generation "copy network" with 99% availability, through the J-POP and Stargate generations, to 99.9% and then 99.99% availability while carrying >100x more traffic, toward a highly available, massive-scale network.]

SLIDE 4

Previous B4 paper published in SIGCOMM 2013

4

SLIDE 5

Background: B4 with SDN Traffic Engineering (TE), deployed in 2012

[Architecture diagram: 12-site topology; a demand matrix (via Google BwE) feeds the central TE controller, which programs per-site domain TE controllers with site-level tunnels (tunnels & tunnel splits).]

5

SLIDE 6

Background: B4 with SDN Traffic Engineering (TE), deployed in 2012

Key takeaways:

❏ High efficiency: lower per-byte cost compared with B2 (Google's global backbone running RSVP TE on vendor gear)
❏ Deterministic convergence: fast, global TE optimization and failure handling
❏ Rapid software iteration: ~1 month to develop and deploy a median-size software feature

6

SLIDE 7

But, it also comes with new challenges

7

SLIDE 8

Grand Challenge #1: High Availability Requirements

Service Class | Application Examples | Availability SLO
SC4 | Search ads, DNS, WWW | 99.99%
SC3 | Proto service backend, Email | 99.95%
SC2 | Ads database replication | 99.9%
SC1 | Search index copies, logs | 99%
SC0 | Bulk transfer | N/A

B4 initially had 99% availability in 2013

8

SLIDE 9

Service Class | Application Examples | Availability SLO
SC4 | Search ads, DNS, WWW | 99.99%
SC3 | Proto service backend, Email | 99.95%
SC2 | Ads database replication | 99.9%
SC1 | Search index copies, logs | 99%
SC0 | Bulk transfer | N/A

B4 initially had 99% availability. A very demanding goal, given:

  • inherent unreliability of long-haul links
  • necessary management operations

9

SLIDE 10

Grand Challenge #2: Scale Requirements

Our bandwidth requirement doubled every ~9 months

10

SLIDE 11

traffic increased by >100x in 5 years

11

SLIDE 12

Grand Challenge #2: Scale Requirements

Our bandwidth requirement doubled every ~9 months

Scale increased across dimensions:

  • #Cluster prefixes: 8x
  • #B4 sites: 3x
  • #Control domains: 16x
  • #Tunnels: 60x

12

SLIDE 13

Other challenges: No disruption to existing traffic, maintain high cost efficiency and high feature velocity

13

SLIDE 14

To meet these demanding requirements, we’ve had to aggressively develop many point solutions

14

SLIDE 15

Lessons Learned

1. Flat topology scales poorly and hurts availability
2. Solving the capacity asymmetry problem in a hierarchical topology is key to achieving high availability at scale
3. Scalable switch forwarding rule management is essential to hierarchical TE

15

SLIDE 16

[Diagram: four sites interconnected by the B4 WAN; within a site, BF and CF switch chassis provide 5.12 Tbps toward clusters and 5.12 / 6.4 Tbps toward the WAN (other B4 sites).]

Saturn: first-generation B4 site fabric

16

SLIDE 17

Scaling option #1: Add more chassis (up to 8 chassis per Saturn fabric)

[Diagram: the same Saturn site fabric scaled out with additional BF/CF chassis; sites connect to the B4 WAN.]

17

Saturn: first-generation B4 site fabric

SLIDE 18

Scaling option #2: Build multiple B4 sites in close proximity

[Diagram: multiple B4 sites placed in close proximity.]

Drawbacks: slower central TE controller, limited switch table space, and complicated capacity planning and job allocation

18

SLIDE 19

Jumpgate: Two-layer Topology

[Diagram: a Jumpgate site delivers 80 Tbps toward WAN / clusters / sidelinks; the site is built from supernodes, each a two-layer fabric of edge and spine switches (x16 / x32).]

19

SLIDE 20

Jumpgate: Two-layer Topology

[Diagram: the same Jumpgate site built from two-layer (edge/spine, x16 / x32) supernodes.]

Supports horizontal scaling by adding more supernodes to a site
Supports vertical scaling by upgrading a supernode in place to a new generation
Improves availability with a granular, per-supernode control domain

20

SLIDE 21

Lessons Learned

1. Flat topology scales poorly and hurts availability
2. Solving the capacity asymmetry problem in a hierarchical topology is key to achieving high availability at scale
3. Scalable switch forwarding rule management is essential to hierarchical TE

21

SLIDE 22

[Diagram: Sites A, B, and C; supernode-level links of capacity 4 between neighboring supernodes are abstracted into a single site-level link whose capacity (16) is the sum of the supernode-level link capacities.]

22

SLIDE 23

[Diagram: Sites A, B, and C with asymmetric supernode-level link capacities; the site-level abstraction suggests 14 units of capacity between sites, but only 8 are actually deliverable, creating a bottleneck.]

Abstraction loss: 43% = (14 - 8) / 14
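Stated generally (the symbols below are my labels for the quantities on this slide, not the deck's notation), the loss introduced by the site-level abstraction is the gap between the advertised and the deliverable capacity:

\[
\text{abstraction loss} \;=\; \frac{C_{\text{abstract}} - C_{\text{deliverable}}}{C_{\text{abstract}}} \;=\; \frac{14 - 8}{14} \;\approx\; 43\%
\]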

23

SLIDE 24

[CDF over site-level links and topology events of site-level link capacity loss due to topology abstraction, as a fraction of total capacity (log10 scale): 100% capacity loss in 18% of cases; 2% capacity loss in the median case, due to striping inefficiency.]

24

SLIDE 25

Solution = Sidelinks + Supernode-level TE

25

SLIDE 26

[Diagram: Sites A, B, and C with sidelinks; each supernode-level link carries 3.5 units, with traffic split 57% toward the next site and 43% toward the self site (over sidelinks).]

26

SLIDE 27

Solution = Sidelinks + Supernode-level TE

Multi-layer TE (Site-level & supernode-level) turns out to be challenging!

27

SLIDE 28

Design Proposals

Hierarchical Tunneling

Site-level tunnels + Supernode-level sub-tunnels

Two layers of IP encapsulation lead to inefficient hashing

Supernode-level TE

Supernode-level tunnels

Scaling challenges: increases path allocation run time by 188x

28

SLIDE 29

Tunnel Split Group (TSG): supernode-level traffic splits, no packet encapsulation, calculated per site-level link

[Diagram: Site A (4 supernodes) sending to Site B (2 supernodes), with per-supernode split variables (x, 4x, ...).]

Assume balanced ingress traffic; maximize admissible demand subject to fairness and link capacity constraints

29

SLIDE 30

Greedy Exhaustive Waterfill Algorithm: iteratively allocate each flow on its direct path (w/o sidelinks), or alternatively on its indirect paths (w/ sidelinks at the source site), until no flow can be allocated further
Provably forwarding-loop free
Low abstraction capacity loss
Takes less than 1 second to run
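The waterfill idea maps naturally to a small round-robin allocation loop. The following is a minimal sketch under my own simplifications (a fixed increment, explicit per-flow detour path lists, and invented names such as waterfill and STEP); it is not the production algorithm, which additionally proves loop freedom and runs against the real supernode-level topology.

```python
from collections import defaultdict

STEP = 0.1  # waterfill increment (illustrative granularity)

def waterfill(flows, capacity, sidelink_paths):
    """flows: list of (src_supernode, dst_supernode) demands.
    capacity: dict mapping a directed supernode-level link (u, v) to remaining capacity.
    sidelink_paths: dict mapping a flow to alternative paths (lists of links)
      that detour over sidelinks at the source site."""
    alloc = defaultdict(float)                   # flow -> admitted demand
    active = set(flows)
    while active:                                # grow every active flow by STEP per round (fairness)
        for flow in list(active):
            direct = flow                        # the direct supernode-level link
            if capacity.get(direct, 0.0) >= STEP:
                capacity[direct] -= STEP         # allocate on the direct path
                alloc[flow] += STEP
                continue
            for path in sidelink_paths.get(flow, []):   # otherwise try a sidelink detour
                if all(capacity.get(link, 0.0) >= STEP for link in path):
                    for link in path:
                        capacity[link] -= STEP
                    alloc[flow] += STEP
                    break
            else:
                active.discard(flow)             # flow cannot be allocated further
    return dict(alloc)

# Example: A2's direct link to B1 is down, so it detours over the A2->A1 sidelink;
# both flows end up with an equal share of the A1->B1 link.
caps = {("A1", "B1"): 1.0, ("A2", "A1"): 1.0}
detours = {("A2", "B1"): [[("A2", "A1"), ("A1", "B1")]]}
print(waterfill([("A1", "B1"), ("A2", "B1")], caps, detours))
```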

Site A (4 supernodes) Site B (2 supernodes)

30

SLIDE 31

[Same CDF of site-level link capacity loss due to topology abstraction (log10 scale), with sidelinks and supernode-level TE: cases that previously saw 100% capacity loss now see < 2% loss.]

31

SLIDE 32

TSG Sequencing Problem

[Diagram: current TSGs vs. target TSGs across supernodes A1, A2, B1, and B2.]

Bad properties during update: forwarding loops and blackholes

32

SLIDE 33

Dependency-Graph-Based TSG Update

Loop-free and no extra blackholes; requires no packet tagging
1. Map target TSGs to a supernode dependency graph
2. Apply TSG updates in reverse topological ordering*
One or two steps suffice in >99.7% of TSG ops

* Shares ideas with work on IGP updates:

  • Francois & Bonaventure, Avoiding Transient Loops during IGP Convergence in IP Networks, INFOCOM’05
  • Vanbever et al., Seamless Network-wide IGP Migrations, SIGCOMM’11
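As a rough illustration of step 2 (my own sketch, not the production sequencer; the graph encoding and names are assumptions), the update order can be computed with an off-the-shelf topological sort over the supernode dependency graph, so that a supernode switches to its target TSG only after every supernode it will send through has already switched:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def tsg_update_order(target_next_hops):
    """target_next_hops: dict mapping a supernode to the set of supernodes its
    target TSG forwards traffic through (e.g., sidelink next hops)."""
    order = TopologicalSorter()
    for node, next_hops in target_next_hops.items():
        # A node's next hops must be updated before the node itself; with edges
        # drawn in the traffic direction this matches the deck's "reverse
        # topological ordering". Cycles (rare) would need an extra step not
        # handled by this sketch.
        order.add(node, *next_hops)
    return list(order.static_order())

# Example: A1's target TSG detours via A2, while A2 sends straight to site B.
print(tsg_update_order({"A1": {"A2"}, "A2": set()}))   # ['A2', 'A1']
```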

33

SLIDE 34

Lessons Learned

1. Flat topology scales poorly and hurts availability
2. Solving the capacity asymmetry problem in a hierarchical topology is key to achieving high availability at scale
3. Scalable switch forwarding rule management is essential to hierarchical TE

34

SLIDE 35

[Diagram: a B4 site built from two-layer (x16 / x32) Clos supernodes.]

35

Multi-stage Hashing across Switches in Clos Supernode

1. Ingress traffic at edge switches:
   a. Site-level tunnel split
   b. TSG site-level split (to self-site or next-site)
2. At spine switches:
   a. TSG supernode-level split
   b. Egress edge switch split
3. Egress traffic at edge switches:
   a. Egress port/trunk split

Enables hierarchical TE at scale: overall throughput improved by >6%
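To make the division of labor concrete, here is a minimal sketch (my illustration, not the switch pipeline itself) of how a hierarchical split can be realized as a product of small hashed splits at each stage; stage names and weights follow the slide where possible (the 57/43 split echoes the earlier sidelink example), while tunnel, supernode, and port names are invented:

```python
import hashlib

def _hash(flow_key, stage):
    """Deterministic per-stage hash of the flow key plus a stage salt."""
    digest = hashlib.sha256(f"{flow_key}|{stage}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def weighted_pick(flow_key, stage, choices):
    """choices: list of (name, weight); pick one in proportion to its weight."""
    total = sum(weight for _, weight in choices)
    point = _hash(flow_key, stage) % total
    for name, weight in choices:
        if point < weight:
            return name
        point -= weight
    return choices[-1][0]

def forward(flow_key):
    # 1. Ingress edge switch: site-level tunnel split, then TSG site-level split.
    tunnel = weighted_pick(flow_key, "tunnel", [("tunnel-via-B", 3), ("tunnel-via-C", 1)])
    direction = weighted_pick(flow_key, "tsg-site", [("next-site", 57), ("self-site", 43)])
    # 2. Spine switch: TSG supernode-level split and egress edge switch split.
    supernode = weighted_pick(flow_key, "tsg-supernode", [("B1", 1), ("B2", 1)])
    egress_edge = weighted_pick(flow_key, "egress-edge", [(f"edge-{i}", 1) for i in range(4)])
    # 3. Egress edge switch: egress port/trunk split.
    port = weighted_pick(flow_key, "port", [(f"port-{i}", 1) for i in range(8)])
    return tunnel, direction, supernode, egress_edge, port

print(forward("10.0.0.1,10.1.2.3,51512,443,TCP"))
```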

SLIDE 36

[Timeline figure, 2011-2018: availability milestones 99% → 99.9% → 99.99% while traffic grows >100x, toward a highly available, massive-scale network. Generations and techniques along the way: Saturn (flat topology, SDN TE tunneling, "copy network"); Jumpgate / J-POP / Stargate (two-layer topology, two service classes); TSG hierarchical TE with efficient switch rule management & more service classes.]

36

SLIDE 37

Conclusions

❏ Highly available WAN with plentiful bandwidth offers unique benefits to many cloud services (e.g., Spanner)
❏ Future work: limit the blast radius of rare yet catastrophic failures
❏ Reduce dependencies across components
❏ Network operation via per-QoS canary

37

SLIDE 38

Before | After
Copy network with 99% availability | Highly available network with 99.99% availability
Inter-DC WAN with a moderate number of sites | 100x more traffic, 60x more tunnels
Saturn: flat site topology & per-site domain TE controller | Jumpgate: hierarchical topology & granular TE control domain
Site-level tunneling | Site-level tunneling in conjunction with supernode-level TE (“Tunnel Split Group”)
Tunnel splits implemented at ingress switches | Multi-stage hashing across switches in Clos supernode

B4 and After: Managing Hierarchy, Partitioning, and Asymmetry for Availability and Scale in Google's Software-Defined WAN

SLIDE 39

[Diagram: a B4 site built from two-layer (x16 / x32) Clos supernodes; switch pipeline: ACL (Flow Match) → ECMP (Port Hashing) → Encap (+Tunnel IP).]

39

SLIDE 40

Switch pipeline: ACL (Flow Match) → ECMP (Port Hashing) → Encap (+Tunnel IP)

Size(ACL) ≥ #Sites ✕ #PrefixesPerSite ✕ #ServiceClasses

Per site: >16 aggregated IPv4 & IPv6 cluster prefixes and 6 aggregated QoS classes; the ACL table holds up to 3K entries

Scaling bottleneck: hit the ACL table limit with ~32 sites
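A quick sanity check of the inequality above, using the numbers on this slide (the function name and the exact 3,000-entry constant are my stand-ins for the deck's "up to 3K entries"):

```python
def acl_entries_needed(sites, prefixes_per_site=16, service_classes=6):
    # Lower bound from the slide: Size(ACL) >= sites * prefixes_per_site * service_classes
    return sites * prefixes_per_site * service_classes

ACL_TABLE_LIMIT = 3_000   # roughly the "up to 3K entries" quoted above

for sites in (16, 24, 32, 33):
    need = acl_entries_needed(sites)
    print(sites, need, "ok" if need <= ACL_TABLE_LIMIT else "exceeds ACL table")
# Around 32 sites the requirement (32 * 16 * 6 = 3072) crosses the ~3K limit.
```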

40

SLIDE 41

Switch pipeline (before): ACL (Flow Match) → ECMP (Port Hashing) → Encap (+Tunnel IP)

Switch pipeline (after): VFP (QoS Match) → Per-VRF LPM (Prefix Match) → ECMP (Port Hashing) → Encap (+Tunnel IP), replacing the ACL flow match

Increases # supported sites by 60x

Enables new features: disable per-flow tunneling
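The scaling win comes from splitting one big flow-match table into a small QoS classifier plus per-VRF prefix tables. A minimal sketch of that two-stage lookup (my illustration with invented VRF names and an illustrative DSCP-to-VRF mapping, not the switch's actual tables):

```python
import ipaddress

# Stage 1 (VFP): QoS marking (DSCP) -> VRF. Values are illustrative.
VFP = {46: "vrf-sc4", 34: "vrf-sc3", 10: "vrf-sc1"}

# Stage 2 (per-VRF LPM): destination prefix -> next-hop (ECMP) group.
LPM = {
    "vrf-sc4": {ipaddress.ip_network("10.1.0.0/16"): "ecmp-group-7"},
    "vrf-sc3": {ipaddress.ip_network("10.1.0.0/16"): "ecmp-group-3"},
    "vrf-sc1": {ipaddress.ip_network("0.0.0.0/0"): "ecmp-group-1"},
}

def lookup(dscp, dst_ip):
    vrf = VFP[dscp]                               # QoS match selects the VRF
    dst = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, group in LPM[vrf].items():        # longest-prefix match within the VRF
        if dst in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, group)
    return vrf, best[1] if best else None

# Each VRF's LPM table scales with the number of prefixes only, instead of
# sites x prefixes x service classes as in the flat flow-match ACL.
print(lookup(46, "10.1.2.3"))   # ('vrf-sc4', 'ecmp-group-7')
```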

41

SLIDE 42

Switch pipeline: VFP (QoS Match) → Per-VRF LPM (Prefix Match) → ECMP (Port Hashing) → Encap (+Tunnel IP)

Size(ECMP) ≥ #Sites ✕ #PathingClasses ✕ TunnelSplits ✕ TSG_Splits ✕ SwitchSplits

Example values from the deck: 33 sites, 3 pathing classes, and 4-, 16-, and 32-way splits across the stages

~198K entries required; only 16K supported by our switches
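A back-of-the-envelope version of that sizing (the mapping of the 4-, 16-, and 32-way split widths to particular stages is my assumption, since the flattened slide text does not spell it out; the product of the quoted values lands in the same ballpark as the ~198K figure):

```python
def ecmp_entries_needed(sites, pathing_classes, tunnel_splits, tsg_splits, switch_splits):
    # Lower bound from the slide: the product of the per-stage fan-outs.
    return sites * pathing_classes * tunnel_splits * tsg_splits * switch_splits

need = ecmp_entries_needed(sites=33, pathing_classes=3,
                           tunnel_splits=4, tsg_splits=16, switch_splits=32)
print(f"{need:,} entries needed vs 16K supported")   # ~200K >> 16,384
```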

42

SLIDE 43

Switch pipeline: VFP (QoS Match) → Per-VRF LPM (Prefix Match) → ECMP (Port Hashing) → Encap (+Tunnel IP)

Size(ECMP) ≥ #Sites ✕ #PathingClasses ✕ TunnelSplits ✕ TSG_Splits ✕ SwitchSplits

Scaling bottleneck: hit the ACL table limit with ~32 sites

[Diagram: two-layer (x16 / x32) Clos supernode.]

Overall throughput improved by >6%; supports more sites & pathing classes

43

SLIDE 44

[Diagram: a B4 site built from two-layer (x16 / x32) Clos supernodes; original switch pipeline: ACL (Flow Match) → ECMP (Port Hashing) → Encap (+Tunnel IP).]

Limitations of the original pipeline: supports only up to 32 sites; reduced efficiency with lower path-split granularity

Solutions: efficient flow matching via virtual routing & forwarding (VRF); multi-stage hashing by leveraging source MAC marking and packet load balancing via spine-layer switches

44