

SLIDE 1

Survivable and Bandwidth-Guaranteed Embedding of Virtual Clusters in Cloud Data Centers

Ruozhou Yu, Guoliang Xue, and Xiang Zhang (Arizona State University); Dan Li (Tsinghua University)

IEEE INFOCOM 2017

SLIDE 2

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 3

The Cloud Shift

• Cloud computing: seemingly an omnipotent solution to all kinds of performance requirements
• But is it as mighty as it seems?


[Figure: The Mighty Cloud]

SLIDE 4

Inside the Cloud

• An illusion of infinite computing resources, created by large clusters of interconnected machines in data centers
• Performance bottleneck: the cloud network!


SLIDE 5

VM & Bandwidth

• Traditional approach: network-agnostic VM allocation
• Recent advance: bandwidth-guaranteed VM allocation, or Virtual Cluster Embedding (VCE)!
  ◦ Existing algorithms can allocate bandwidth-guaranteed VMs with minimum bandwidth, migration cost, etc.
• But we know that cloud machines do fail, quite often…


SLIDE 6

Survivable VCE

Q: How can we ensure VM availability even when its host machine may fail?
A: We prepare extra VMs and bandwidth, just in case!
Q: And how much will that cost us?
A: No problem! We can minimize that!
Q: How are we going to achieve that?
A: Dynamic programming!


SLIDE 7

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 8

Network Topology

• Assumption: the DCN has a tree structure
  ◦ Abstracts many common DCN topologies (FatTree, VL2, etc.)


[Figure: an original FatTree (1 Gbps per link) and its abstract tree: core node c; aggregation nodes a1-a4; edge nodes e11-e42; hosts beneath; aggregated link bandwidths of 4 Gbps, 2 Gbps, and 1 Gbps down the layers.]
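As an illustration of the abstraction, here is a toy encoding of such a tree (the dict layout and the 4/2/1 Gbps layer assignment are my reading of the figure's labels, not the paper's data structure):

```python
# Toy abstract-tree encoding: each node stores its uplink bandwidth (Mbps)
# and its children; physical machines (PMs) are the leaves.
abstract_tree = {
    "c":   {"up_mbps": None, "children": ["a1", "a2", "a3", "a4"]},  # core
    "a1":  {"up_mbps": 4000, "children": ["e11", "e12"]},  # aggregation
    "e11": {"up_mbps": 2000, "children": ["pm1", "pm2"]},  # edge / ToR
    "pm1": {"up_mbps": 1000, "children": []},              # physical machine
    "pm2": {"up_mbps": 1000, "children": []},
    # ... a2-a4 and e12-e42 follow the same pattern
}
```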

SLIDE 9

VM Survivability Model

• Primary VMs: VMs that are active during normal operation
• Backup VMs: VMs in standby mode, activated when a primary VM's PM fails
  ◦ Each backup VM synchronizes the states of multiple primary VMs
• Question: can we find a bandwidth-guaranteed allocation of both primary and backup VMs that covers an arbitrary single-PM failure, with the minimum number of backup VMs?

[Diagram: PMs a, b, c; on a failure, a primary VM's state migrates to its backup.]

SLIDE 10

Dynamic Programming for SVCE

• Given: topology tree T, request J = <N, B>
• Assumption: a single PM failure
  ◦ Interpretation: a failure lies either within a subtree or outside it, but never both
• Key observation: each subtree's ability to provide VMs is independent of the rest of the tree, both during normal operation and during an arbitrary failure
• Two layers of dynamic programming (a runnable sketch follows the next slide):
  ◦ Outer DP: over entire subtrees
  ◦ Inner DP: over the first k sub-subtrees of each subtree


SLIDE 11

DP in Details

• Outer DP: Nv[n0, n1] is the minimum total number of VMs needed in subtree Tv to ensure that:
  ◦ Tv can provide at least n0 VMs when no failure is in Tv;
  ◦ Tv can provide at least n1 VMs when any single PM fails in Tv.
• Inner DP: Nv'[n0, n1, k] is the minimum total number of VMs needed in the first k subtrees of v to ensure that:
  ◦ the k subtrees can provide at least n0 VMs when no failure is in them;
  ◦ the k subtrees can provide at least n1 VMs when any single PM fails in them.
• Alternately update the two tables:
  ◦ Nv[n0, n1] depends on Nv'[n0', n1', dv] (dv is the number of subtrees under v);
  ◦ Nv'[n0, n1, k] depends on Nv[n0'', n1''] of lower-layer nodes.
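A minimal runnable sketch of this two-layer DP under the table semantics above. The Node, feasible, and solve names are mine; children are folded in one at a time (the inner DP), and bandwidth feasibility is simplified to a hose-model check against the requested N, so this illustrates the idea rather than the paper's exact algorithm:

```python
import math
from itertools import product

class Node:
    def __init__(self, slots=0, children=(), up_bw=math.inf):
        self.slots = slots              # VM slots (meaningful for PMs)
        self.children = list(children)
        self.up_bw = up_bw              # uplink bandwidth, Mbps

def feasible(n, N, B, up_bw):
    # Hose model: a subtree hosting n of N VMs sends at most
    # min(n, N - n) * B across its uplink.
    return min(n, N - n) * B <= up_bw

def solve(v, N, B):
    """Table T[n0][n1]: min total VMs in v's subtree so that it offers
    at least n0 VMs normally and n1 VMs under any single PM failure."""
    INF = math.inf
    if not v.children:
        # PM level: all of a PM's VMs vanish when it fails, so n1 must be 0.
        T = [[INF] * (N + 1) for _ in range(N + 1)]
        for n0 in range(min(N, v.slots) + 1):
            if feasible(n0, N, B, v.up_bw):
                T[n0][0] = n0
    else:
        # Inner DP: I[b0][b1] is the cost over the children folded so far.
        I = [[INF] * (N + 1) for _ in range(N + 1)]
        I[0][0] = 0
        for child in v.children:
            C = solve(child, N, B)
            J = [[INF] * (N + 1) for _ in range(N + 1)]
            for b0, b1, a0, a1 in product(range(N + 1), repeat=4):
                cost = I[b0][b1] + C[a0][a1]
                if cost == INF:
                    continue
                n0 = min(N, b0 + a0)
                # The single failure is either inside this child (earlier
                # children still offer b0, this child offers a1) or inside
                # an earlier child (b1 plus this child's a0); never both.
                n1 = min(N, b0 + a1, b1 + a0)
                J[n0][n1] = min(J[n0][n1], cost)
            I = J
        # Outer table: adopt the inner table, gated by v's uplink bandwidth.
        T = [[INF] * (N + 1) for _ in range(N + 1)]
        for n0 in range(N + 1):
            if feasible(n0, N, B, v.up_bw):
                for n1 in range(N + 1):
                    T[n0][n1] = I[n0][n1]
    # "At least" semantics: offering more VMs also satisfies smaller demands.
    for n0 in range(N - 1, -1, -1):
        for n1 in range(N + 1):
            T[n0][n1] = min(T[n0][n1], T[n0 + 1][n1])
    for n1 in range(N - 1, -1, -1):
        for n0 in range(N + 1):
            T[n0][n1] = min(T[n0][n1], T[n0][n1 + 1])
    return T

# The SLIDE 12-16 example: two PMs under one switch, J = <2, 100 Mbps>.
root = Node(children=[Node(slots=2, up_bw=100), Node(slots=2, up_bw=100)])
T = solve(root, N=2, B=100)
print(T[2][1], T[2][2])   # 2 and 4, matching the walk-through
```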


SLIDE 12

Work-through Example

• J = <2, 100 Mbps>

Setup: two PMs (PM 1, PM 2) under switch SW 3; all three links carry 100 Mbps.

N1[n0,n1] / N3'[n0,n1,1] (PM 1) and N2[n0,n1] (PM 2) are identical, since a failed PM offers nothing (n1 > 0 is infeasible on a single PM):

  n0\n1 |  0   1   2
  ------+------------
    0   |  0   ∞   ∞
    1   |  1   ∞   ∞
    2   |  2   ∞   ∞

N3'[n0,n1,2] / N3[n0,n1] (SW 3): all entries still to be computed (x).

SLIDE 13

Work-through Example

• J = <2, 100 Mbps>

Evaluating N3[2,2]: combine PM 1 at (n0=2, n1=0) with PM 2 at (n0=2, n1=0) → cost 2 + 2 = 4. When either PM fails, the other must supply both VMs on its own.

SLIDE 14

Work-through Example

• J = <2, 100 Mbps>

N3[2,2] = 4 is now entered in SW 3's table. Evaluating N3[2,1]: PM 1 at (n0=2, n1=0) with PM 2 at (n0=2, n1=0) → cost 4.

SLIDE 15

Work-through Example

• J = <2, 100 Mbps>

Evaluating N3[2,1]: PM 1 at (n0=1, n1=0) with PM 2 at (n0=2, n1=0) → cost 1 + 2 = 3, improving on 4.

SLIDE 16

Work-through Example

• J = <2, 100 Mbps>

Evaluating N3[2,1]: PM 1 at (n0=1, n1=0) with PM 2 at (n0=1, n1=0) → cost 1 + 1 = 2; the surviving PM still offers one VM when the other fails. Final: N3[2,1] = 2.
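Putting the steps on SLIDES 13-16 together (my arithmetic, in the SLIDE 11 notation), the two interesting entries of SW 3's table come out as:

```latex
\begin{align*}
N_3[2,2] &= N_1[2,0] + N_2[2,0] = 2 + 2 = 4
  && \text{(either PM may fail, so each must hold both VMs)}\\
N_3[2,1] &= N_1[1,0] + N_2[1,0] = 1 + 1 = 2
  && \text{(one VM per PM; the survivor still offers one)}
\end{align*}
```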

SLIDE 17

Heuristic SVCE

• Optimal DP time complexity: O(|V|·N^6), where |V| is the number of tree nodes and N is the number of requested VMs
• Question: can we find a near-optimal solution in less time?
• Observation: if we find a normal VCE with N + N' VMs such that each PM hosts at most N' of them, we can always recover from any single PM failure
• Algorithm (sketched below): search N' from 1 to N; at each step, use an existing VCE algorithm to find a VCE with N' extra VMs in which every PM hosts at most N' VMs
• Time complexity: O(N·|V|·log|V|)


SLIDE 18

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 19

Simulation Setups

• Tree-structured DCN
  ◦ 4-layer, 8-ary (512 PMs, 73 switches)
  ◦ 5 VM slots per PM
  ◦ ToR bandwidth: 1 Gbps; Aggregation/Core bandwidth: 10 Gbps

• Tenant VCs
  ◦ 1000 requests
  ◦ 15 VMs and 300 Mbps per VM, on average
  ◦ Poisson arrivals

• Comparison:
  ◦ OPT: optimal DP SVCE algorithm
  ◦ HEU: heuristic SVCE algorithm
  ◦ SBS: shadow-based solution (dedicated VC backup)
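For concreteness, the same setup as an illustrative config dict (the encoding and key names are mine, not from the paper):

```python
# Simulation parameters from this slide, gathered into one config dict.
SIM_CONFIG = {
    "topology": {"layers": 4, "arity": 8, "num_pms": 512, "num_switches": 73},
    "vm_slots_per_pm": 5,
    "link_bw_gbps": {"tor": 1, "aggregation": 10, "core": 10},
    "workload": {"num_requests": 1000, "avg_vms_per_vc": 15,
                 "avg_bw_mbps_per_vm": 300, "arrivals": "poisson"},
    "algorithms": ("OPT", "HEU", "SBS"),
}
```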

SLIDE 20

Simulation Results: Average VM Usage


SLIDE 21

Simulation Results: Acceptance Ratio


SLIDE 22

Simulation Results: Running Time


SLIDE 23

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 24

Conclusions

• A first study of survivable VCE
  ◦ A two-layer optimal DP algorithm
  ◦ A faster near-optimal heuristic algorithm
• Discussion
  ◦ Extension to tree-like topologies (FatTree, VL2, etc.)
  ◦ Extension to cover a constant number of simultaneous failures
• Future work
  ◦ SVCE on generic data center topologies (BCube, Jellyfish, etc.)
  ◦ Covering link failures in addition to PM failures


SLIDE 25

THANK YOU VERY MUCH!

Q&A?


SLIDE 26

Hose Model Bandwidth Guarantee

• Request J = <N, B>
  ◦ N = 7, B = 100 Mbps

[Figure: subtrees a, b, c; the uplink of Tc has 200 Mbps of capacity.]

Number of VMs Tc can offer (bandwidth-constrained): under the hose model the uplink carries at most min(nc, N - nc)·B, so min(nc, 7 - nc)·100 Mbps ≤ 200 Mbps, i.e., nc ∈ [0, 2] ∪ [5, 7].
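A two-line check of this claim (my arithmetic):

```python
# Hose model: n VMs in a subtree of an N-VM cluster push at most
# min(n, N - n) * B across the subtree's uplink.
N, B, uplink_mbps = 7, 100, 200
ok = [n for n in range(N + 1) if min(n, N - n) * B <= uplink_mbps]
print(ok)   # [0, 1, 2, 5, 6, 7], i.e. n in [0, 2] or [5, 7]
```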

SLIDE 27

DP in Details /2


• Outer DP update:
  ◦ PM level: (equation in slide image)
  ◦ Switch level: (equation in slide image)
• Inner DP update:
  ◦ No subtree: (equation in slide image)
  ◦ k-th subtree: (equation in slide image)

[Equation legend: bandwidth-feasible VMs; bandwidth-infeasible VMs; the lower bound of the upper bracket counts bandwidth-feasible VMs.]
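The update formulas themselves did not survive extraction; the following is a hedged reconstruction from the SLIDE 11 definitions (notation mine; the paper's exact recurrences may differ):

```latex
\begin{align*}
\text{PM level: } & N_v[n_0, n_1] =
  \begin{cases}
    n_0 & \text{if } n_1 = 0,\ n_0 \le \mathrm{slots}(v),\ n_0 \text{ bandwidth-feasible on } v\text{'s uplink}\\
    \infty & \text{otherwise}
  \end{cases}\\[4pt]
\text{Switch level: } & N_v[n_0, n_1] = N'_v[n_0, n_1, d_v]
  \quad \text{(if } n_0 \text{ is bandwidth-feasible on } v\text{'s uplink)}\\[4pt]
\text{No subtree: } & N'_v[n_0, n_1, 0] =
  \begin{cases} 0 & \text{if } n_0 = n_1 = 0\\ \infty & \text{otherwise} \end{cases}\\[4pt]
\text{$k$-th subtree: } & N'_v[n_0, n_1, k] =
  \min\Bigl\{ N'_v[b_0, b_1, k-1] + N_{c_k}[a_0, a_1] \ :\ b_0 + a_0 \ge n_0,\\
& \qquad b_0 + a_1 \ge n_1 \ (\text{failure in } T_{c_k}),\ \
  b_1 + a_0 \ge n_1 \ (\text{failure in first } k-1 \text{ subtrees}) \Bigr\}
\end{align*}
```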

SLIDE 28

Work-through Example

• J = <2, 100 Mbps>

Another combination for N3[2,1]: PM 1 at (n0=2, n1=0) with PM 2 at (n0=1, n1=0) → cost 2 + 1 = 3, dominated by the cost-2 combination found on SLIDE 16.