

SLIDE 1

Survivable and Bandwidth-Guaranteed Embedding of Virtual Clusters in Cloud Data Centers

Ruozhou Yu, Guoliang Xue, and Xiang Zhang (Arizona State University); Dan Li (Tsinghua University)

IEEE INFOCOM 2017

SLIDE 2

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 3

The Cloud Shift

• Cloud computing: seemingly an omnipotent solution to all kinds of performance requirements
• But is it as mighty as it seems?


[Figure: The Mighty Cloud]

SLIDE 4

Inside the Cloud

• An illusion of infinite computing resources, created by large clusters of interconnected machines in data centers
• Performance bottleneck: the cloud network!


SLIDE 5

VM & Bandwidth

• Traditional approach: network-agnostic VM allocation
• Recent advance: bandwidth-guaranteed VM allocation, or Virtual Cluster Embedding (VCE)!
  ◦ Existing algorithms can allocate bandwidth-guaranteed VMs with minimum bandwidth, migration cost, etc.
• But we know that cloud machines do fail, quite often…


SLIDE 6

Survivable VCE

Q: How can we ensure VM availability even when its host machine may fail?
A: We prepare extra VMs and bandwidth, just in case!
Q: And how much will that cost us?
A: No problem! We can minimize that!
Q: How are we going to achieve that?
A: Dynamic programming!


SLIDE 7

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 8

Network Topology

• Assumption: the DCN has a tree structure
  ◦ Abstracts many common DCN topologies (FatTree, VL2, etc.)


[Figure: an original FatTree (1 Gbps per link) and its abstract tree: core node c; aggregation nodes a1-a4; edge nodes e11-e42; hosts beneath; aggregated link bandwidths of 4 Gbps, 2 Gbps, and 1 Gbps down the layers.]
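As an illustration of the abstraction, here is a toy encoding of such a tree (the dict layout and the 4/2/1 Gbps layer assignment are my reading of the figure's labels, not the paper's data structure):

```python
# Toy abstract-tree encoding: each node stores its uplink bandwidth (Mbps)
# and its children; physical machines (PMs) are the leaves.
abstract_tree = {
    "c":   {"up_mbps": None, "children": ["a1", "a2", "a3", "a4"]},  # core
    "a1":  {"up_mbps": 4000, "children": ["e11", "e12"]},  # aggregation
    "e11": {"up_mbps": 2000, "children": ["pm1", "pm2"]},  # edge / ToR
    "pm1": {"up_mbps": 1000, "children": []},              # physical machine
    "pm2": {"up_mbps": 1000, "children": []},
    # ... a2-a4 and e12-e42 follow the same pattern
}
```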

SLIDE 9

VM Survivability Model

• Primary VMs: VMs that are active during normal operation
• Backup VMs: VMs in standby mode, activated when a primary VM's PM fails
  ◦ Each backup VM synchronizes the states of multiple primary VMs
• Question: can we find a bandwidth-guaranteed allocation of both primary and backup VMs that covers an arbitrary single-PM failure, with the minimum number of backup VMs?

[Diagram: PMs a, b, c; on a failure, a primary VM's state migrates to its backup.]

SLIDE 10

Dynamic Programming for SVCE

• Given: topology tree T, request J = <N, B>
• Assumption: a single PM failure
  ◦ Interpretation: a failure lies either within a subtree or outside it, but never both
• Key observation: each subtree's ability to provide VMs is independent of the rest of the tree, both during normal operation and during an arbitrary failure
• Two layers of dynamic programming (a runnable sketch follows the next slide):
  ◦ Outer DP: over entire subtrees
  ◦ Inner DP: over the first k sub-subtrees of each subtree


SLIDE 11

DP in Details

• Outer DP: Nv[n0, n1] is the minimum total number of VMs needed in subtree Tv to ensure that:
  ◦ Tv can provide at least n0 VMs when no failure is in Tv;
  ◦ Tv can provide at least n1 VMs when any single PM fails in Tv.
• Inner DP: Nv'[n0, n1, k] is the minimum total number of VMs needed in the first k subtrees of v to ensure that:
  ◦ the k subtrees can provide at least n0 VMs when no failure is in them;
  ◦ the k subtrees can provide at least n1 VMs when any single PM fails in them.
• Alternately update the two tables:
  ◦ Nv[n0, n1] depends on Nv'[n0', n1', dv] (dv is the number of subtrees under v);
  ◦ Nv'[n0, n1, k] depends on Nv[n0'', n1''] of lower-layer nodes.
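A minimal runnable sketch of this two-layer DP under the table semantics above. The Node, feasible, and solve names are mine; children are folded in one at a time (the inner DP), and bandwidth feasibility is simplified to a hose-model check against the requested N, so this illustrates the idea rather than the paper's exact algorithm:

```python
import math
from itertools import product

class Node:
    def __init__(self, slots=0, children=(), up_bw=math.inf):
        self.slots = slots              # VM slots (meaningful for PMs)
        self.children = list(children)
        self.up_bw = up_bw              # uplink bandwidth, Mbps

def feasible(n, N, B, up_bw):
    # Hose model: a subtree hosting n of N VMs sends at most
    # min(n, N - n) * B across its uplink.
    return min(n, N - n) * B <= up_bw

def solve(v, N, B):
    """Table T[n0][n1]: min total VMs in v's subtree so that it offers
    at least n0 VMs normally and n1 VMs under any single PM failure."""
    INF = math.inf
    if not v.children:
        # PM level: all of a PM's VMs vanish when it fails, so n1 must be 0.
        T = [[INF] * (N + 1) for _ in range(N + 1)]
        for n0 in range(min(N, v.slots) + 1):
            if feasible(n0, N, B, v.up_bw):
                T[n0][0] = n0
    else:
        # Inner DP: I[b0][b1] is the cost over the children folded so far.
        I = [[INF] * (N + 1) for _ in range(N + 1)]
        I[0][0] = 0
        for child in v.children:
            C = solve(child, N, B)
            J = [[INF] * (N + 1) for _ in range(N + 1)]
            for b0, b1, a0, a1 in product(range(N + 1), repeat=4):
                cost = I[b0][b1] + C[a0][a1]
                if cost == INF:
                    continue
                n0 = min(N, b0 + a0)
                # The single failure is either inside this child (earlier
                # children still offer b0, this child offers a1) or inside
                # an earlier child (b1 plus this child's a0); never both.
                n1 = min(N, b0 + a1, b1 + a0)
                J[n0][n1] = min(J[n0][n1], cost)
            I = J
        # Outer table: adopt the inner table, gated by v's uplink bandwidth.
        T = [[INF] * (N + 1) for _ in range(N + 1)]
        for n0 in range(N + 1):
            if feasible(n0, N, B, v.up_bw):
                for n1 in range(N + 1):
                    T[n0][n1] = I[n0][n1]
    # "At least" semantics: offering more VMs also satisfies smaller demands.
    for n0 in range(N - 1, -1, -1):
        for n1 in range(N + 1):
            T[n0][n1] = min(T[n0][n1], T[n0 + 1][n1])
    for n1 in range(N - 1, -1, -1):
        for n0 in range(N + 1):
            T[n0][n1] = min(T[n0][n1], T[n0][n1 + 1])
    return T

# The SLIDE 12-16 example: two PMs under one switch, J = <2, 100 Mbps>.
root = Node(children=[Node(slots=2, up_bw=100), Node(slots=2, up_bw=100)])
T = solve(root, N=2, B=100)
print(T[2][1], T[2][2])   # 2 and 4, matching the walk-through
```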


SLIDE 12

Work-through Example

• J = <2, 100 Mbps>

Setup: two PMs (PM 1, PM 2) under switch SW 3; all three links carry 100 Mbps.

N1[n0,n1] / N3'[n0,n1,1] (PM 1) and N2[n0,n1] (PM 2) are identical, since a failed PM offers nothing (n1 > 0 is infeasible on a single PM):

  n0\n1 |  0   1   2
  ------+------------
    0   |  0   ∞   ∞
    1   |  1   ∞   ∞
    2   |  2   ∞   ∞

N3'[n0,n1,2] / N3[n0,n1] (SW 3): all entries still to be computed (x).

SLIDE 13

Work-through Example

• J = <2, 100 Mbps>

Evaluating N3[2,2]: combine PM 1 at (n0=2, n1=0) with PM 2 at (n0=2, n1=0) → cost 2 + 2 = 4. When either PM fails, the other must supply both VMs on its own.

SLIDE 14

Work-through Example

• J = <2, 100 Mbps>

N3[2,2] = 4 is now entered in SW 3's table. Evaluating N3[2,1]: PM 1 at (n0=2, n1=0) with PM 2 at (n0=2, n1=0) → cost 4.

SLIDE 15

Work-through Example

• J = <2, 100 Mbps>

Evaluating N3[2,1]: PM 1 at (n0=1, n1=0) with PM 2 at (n0=2, n1=0) → cost 1 + 2 = 3, improving on 4.

SLIDE 16

Work-through Example

• J = <2, 100 Mbps>

Evaluating N3[2,1]: PM 1 at (n0=1, n1=0) with PM 2 at (n0=1, n1=0) → cost 1 + 1 = 2; the surviving PM still offers one VM when the other fails. Final: N3[2,1] = 2.
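Putting the steps on SLIDES 13-16 together (my arithmetic, in the SLIDE 11 notation), the two interesting entries of SW 3's table come out as:

```latex
\begin{align*}
N_3[2,2] &= N_1[2,0] + N_2[2,0] = 2 + 2 = 4
  && \text{(either PM may fail, so each must hold both VMs)}\\
N_3[2,1] &= N_1[1,0] + N_2[1,0] = 1 + 1 = 2
  && \text{(one VM per PM; the survivor still offers one)}
\end{align*}
```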

SLIDE 17

Heuristic SVCE

• Optimal DP time complexity: O(|V|·N^6), where |V| is the number of tree nodes and N is the number of requested VMs
• Question: can we find a near-optimal solution in less time?
• Observation: if we find a normal VCE with N + N' VMs such that each PM hosts at most N' of them, we can always recover from any single PM failure
• Algorithm (sketched below): search N' from 1 to N; at each step, use an existing VCE algorithm to find a VCE with N' extra VMs in which every PM hosts at most N' VMs
• Time complexity: O(N·|V|·log|V|)


SLIDE 18

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 19

Simulation Setups

• Tree-structured DCN
  ◦ 4-layer, 8-ary (512 PMs, 73 switches)
  ◦ 5 VM slots per PM
  ◦ ToR bandwidth: 1 Gbps; Aggregation/Core bandwidth: 10 Gbps

• Tenant VCs
  ◦ 1000 requests
  ◦ 15 VMs and 300 Mbps per VM, on average
  ◦ Poisson arrivals

• Comparison:
  ◦ OPT: optimal DP SVCE algorithm
  ◦ HEU: heuristic SVCE algorithm
  ◦ SBS: shadow-based solution (dedicated VC backup)
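For concreteness, the same setup as an illustrative config dict (the encoding and key names are mine, not from the paper):

```python
# Simulation parameters from this slide, gathered into one config dict.
SIM_CONFIG = {
    "topology": {"layers": 4, "arity": 8, "num_pms": 512, "num_switches": 73},
    "vm_slots_per_pm": 5,
    "link_bw_gbps": {"tor": 1, "aggregation": 10, "core": 10},
    "workload": {"num_requests": 1000, "avg_vms_per_vc": 15,
                 "avg_bw_mbps_per_vm": 300, "arrivals": "poisson"},
    "algorithms": ("OPT", "HEU", "SBS"),
}
```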

SLIDE 20

Simulation Results: Average VM Usage


SLIDE 21

Simulation Results: Acceptance Ratio


SLIDE 22

Simulation Results: Running Time


SLIDE 23

Outline

• Introduction and Motivation
• System Model and Algorithm Design
• Performance Evaluation
• Conclusions


SLIDE 24

Conclusions

• A first study of survivable VCE
  ◦ A two-layer optimal DP algorithm
  ◦ A faster near-optimal heuristic algorithm
• Discussion
  ◦ Extension to tree-like topologies (FatTree, VL2, etc.)
  ◦ Extension to cover a constant number of simultaneous failures
• Future work
  ◦ SVCE on generic data center topologies (BCube, Jellyfish, etc.)
  ◦ Covering link failures in addition to PM failures


SLIDE 25

THANK YOU VERY MUCH!

Q&A?


SLIDE 26

Hose Model Bandwidth Guarantee

• Request J = <N, B>
  ◦ N = 7, B = 100 Mbps

[Figure: subtrees a, b, c; the uplink of Tc has 200 Mbps of capacity.]

Number of VMs Tc can offer (bandwidth-constrained): under the hose model the uplink carries at most min(nc, N - nc)·B, so min(nc, 7 - nc)·100 Mbps ≤ 200 Mbps, i.e., nc ∈ [0, 2] ∪ [5, 7].
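A two-line check of this claim (my arithmetic):

```python
# Hose model: n VMs in a subtree of an N-VM cluster push at most
# min(n, N - n) * B across the subtree's uplink.
N, B, uplink_mbps = 7, 100, 200
ok = [n for n in range(N + 1) if min(n, N - n) * B <= uplink_mbps]
print(ok)   # [0, 1, 2, 5, 6, 7], i.e. n in [0, 2] or [5, 7]
```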

SLIDE 27

DP in Details /2


• Outer DP update:
  ◦ PM level: (equation in slide image)
  ◦ Switch level: (equation in slide image)
• Inner DP update:
  ◦ No subtree: (equation in slide image)
  ◦ k-th subtree: (equation in slide image)

[Equation legend: bandwidth-feasible VMs; bandwidth-infeasible VMs; the lower bound of the upper bracket counts bandwidth-feasible VMs.]
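The update formulas themselves did not survive extraction; the following is a hedged reconstruction from the SLIDE 11 definitions (notation mine; the paper's exact recurrences may differ):

```latex
\begin{align*}
\text{PM level: } & N_v[n_0, n_1] =
  \begin{cases}
    n_0 & \text{if } n_1 = 0,\ n_0 \le \mathrm{slots}(v),\ n_0 \text{ bandwidth-feasible on } v\text{'s uplink}\\
    \infty & \text{otherwise}
  \end{cases}\\[4pt]
\text{Switch level: } & N_v[n_0, n_1] = N'_v[n_0, n_1, d_v]
  \quad \text{(if } n_0 \text{ is bandwidth-feasible on } v\text{'s uplink)}\\[4pt]
\text{No subtree: } & N'_v[n_0, n_1, 0] =
  \begin{cases} 0 & \text{if } n_0 = n_1 = 0\\ \infty & \text{otherwise} \end{cases}\\[4pt]
\text{$k$-th subtree: } & N'_v[n_0, n_1, k] =
  \min\Bigl\{ N'_v[b_0, b_1, k-1] + N_{c_k}[a_0, a_1] \ :\ b_0 + a_0 \ge n_0,\\
& \qquad b_0 + a_1 \ge n_1 \ (\text{failure in } T_{c_k}),\ \
  b_1 + a_0 \ge n_1 \ (\text{failure in first } k-1 \text{ subtrees}) \Bigr\}
\end{align*}
```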

SLIDE 28

Work-through Example

• J = <2, 100 Mbps>

Another combination for N3[2,1]: PM 1 at (n0=2, n1=0) with PM 2 at (n0=1, n1=0) → cost 2 + 1 = 3, dominated by the cost-2 combination found on SLIDE 16.