SLIDE 1

Understanding Lifecycle Management Complexity of Datacenter Topologies

Mingyang Zhang (USC), Radhika Niranjan Mysore (VMware Research), Sucha Supittayapornpong (USC), Ramesh Govindan (USC)

SLIDE 2

Datacenter topology designs

5-layer Clos, Jellyfish [NSDI12], Xpander [CoNEXT16]

SLIDE 3

Previous focus

Figure: Clos, Jellyfish, and Xpander compared by cost ($) and capacity.

SLIDE 4

Manageability has received very little attention!

Figure: Clos, Jellyfish, and Xpander compared by capacity and cost ($).

SLIDE 5

Manageability has received very little attention!

Figure: Clos, Jellyfish, and Xpander compared by capacity, cost ($), and management complexity.

SLIDE 6

Our focus: lifecycle management

How does the complexity of managing datacenters depend on the topology?

SLIDE 7

Lifecycle management of datacenter topologies

Deployment: logical topology --> physical topology

SLIDE 8

Lifecycle management of datacenter topologies

Deployment: logical topology --> physical topology
Expansion: newly added switches are wired into the physical topology

SLIDE 9

Management complexity is important

⎯ Complex deployment stalls the rollout of services for a long time

SLIDE 10

Management complexity is important

⎯ Complex deployment stalls the rollout of services for a long time
⎯ Expensive, given the increasing traffic demand

From Singh et al., SIGCOMM 2015

SLIDE 11

Management complexity is important

⎯ Topology expansion leads to a capacity drop due to rewiring
⎯ Complex expansion leads to degraded capacity for a long time

Figure: newly added switches being wired into an existing topology.

SLIDE 12

Challenges and contributions

SLIDE 13

Challenges and contributions

Challenge: How to characterize the management complexity?
Contribution: Metrics
⎯ Deployment
⎯ Expansion

SLIDE 14

Challenges and contributions

Challenge: How to characterize the management complexity?
Contribution: Metrics
⎯ Deployment
⎯ Expansion

Challenge: How does topology structure affect the management complexity?
Contribution: Comparison of topologies
⎯ No topology dominates
⎯ Principles learned

SLIDE 15

Challenges and contributions

Challenge: How to characterize the management complexity?
Contribution: Metrics
⎯ Deployment
⎯ Expansion

Challenge: How does topology structure affect the management complexity?
Contribution: Comparison of topologies
⎯ No topology dominates
⎯ Principles learned

Challenge: Is there a topology family with lower management complexity, lower cost, and high capacity?
Contribution: New topology
⎯ FatClique

SLIDE 17

Lifecycle management overview

⎯ Problems: packaging, wiring, placement, rewiring...
⎯ Constraints: switch, rack, patch panel, cable tray...

Pictured: Broadcom Trident 3, optical patch panel, rack, cable tray

SLIDE 18

Methodology

From first principles:
⎯ Understand in detail how topologies are deployed and expanded
⎯ Derive metrics that capture the complexity of these operations

SLIDE 20

Packaging (Deployment)

Figure: switches and servers to be packed into datacenter racks.

SLIDE 21

Packaging (Deployment)

Figure: switches and servers packed into switch racks and server racks.

Metric: number of switches

SLIDE 22

Wiring (Deployment)

Figure: switches within a single rack.

Intra-rack links: short and cheap

SLIDE 23

Wiring complexity (Deployment)

Figure: links running between racks.

Inter-rack links

SLIDE 24

Wiring (Deployment)

Inter-rack links run over cable trays (expensive).

Most wiring complexity comes from inter-rack links!

SLIDE 25

Cable bundling (Deployment)

Too many fibers to be handled individually!

SLIDE 26

Cable bundling (Deployment)

Cable bundle: a fixed number of identical-length fibers between two clusters of network devices.
Bundle type:
⎯ capacity (# fibers in a bundle)
⎯ length

SLIDE 27

Cable bundling (Deployment)

Figure: top view of racks, without bundling.

Without bundling: 16 individual fibers of 4 different lengths
Bundle type: (bundle capacity, bundle length)

SLIDE 28

Cable bundling (Deployment)

Figure: top view of racks, without bundling and with bundling through an aggregator.

Without bundling: 16 individual fibers of 4 different lengths
With bundling: 8 equal-length bundles, i.e., 1 bundle type
Bundle type: (bundle capacity, bundle length)

Metric: the number of bundle types
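A minimal sketch of the bundle-type metric, assuming a bundle type is exactly the (capacity, length) pair defined above. The fiber lengths and the split of 2 fibers per bundle are hypothetical values, chosen only so the counts match the example (16 fibers of 4 lengths vs. 8 equal-length bundles).

```python
def count_bundle_types(bundles):
    """Number of distinct bundle types; each bundle is a (capacity, length_m) pair."""
    return len(set(bundles))


# Without bundling: each fiber is handled on its own (capacity 1).
# Hypothetical lengths in meters: 4 fibers of each of 4 lengths.
unbundled = [(1, length) for length in (10, 20, 30, 40) for _ in range(4)]

# With bundling: the same 16 fibers carried as 8 equal-length bundles,
# hypothetically 2 fibers per bundle, all 25 m long.
bundled = [(2, 25)] * 8

print(len(unbundled), count_bundle_types(unbundled))             # 16 4
print(sum(c for c, _ in bundled), count_bundle_types(bundled))   # 16 1
```

Both layouts carry 16 fibers; the metric counts distinct types rather than fibers, so the bundled layout scores 1 instead of 4.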

SLIDE 29

Cable bundling (Deployment)

It is hard to handle individual fibers of various lengths!

Photos: without bundling vs. with bundling [Singh et al., SIGCOMM 2015]

SLIDE 30

Role of the patch panel in bundling (Deployment)

Aggregator: patch panel

SLIDE 31

Role of the patch panel in bundling (Deployment)

Aggregator: patch panel
⎯ Manual process
Metric: the number of patch panels

SLIDE 32

Deployment complexity metrics

SLIDE 33

Deployment complexity metrics

# switches

SLIDE 34

Deployment complexity metrics

# switches, # patch panels

SLIDE 35

Deployment complexity metrics

# switches, # patch panels, # bundle types
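The three deployment metrics can be sketched for a toy topology. This is a rough model resting on assumptions that are not from the slides: each inter-rack link occupies one patch-panel port, `PORTS_PER_PANEL` is a hypothetical panel size, all inter-rack links between the same two racks travel as one bundle, and `cable_length` is a hypothetical distance function.

```python
import math
from collections import Counter

PORTS_PER_PANEL = 48  # hypothetical patch-panel size (assumption)


def deployment_metrics(rack_of, links, cable_length):
    """Toy estimate of # switches, # patch panels, and # bundle types.

    rack_of: dict mapping switch -> rack id
    links: list of (switch_a, switch_b) pairs
    cable_length: hypothetical function (rack_a, rack_b) -> length in meters
    """
    # Only inter-rack links leave the rack, so only they use patch panels and bundles.
    inter = [(rack_of[a], rack_of[b]) for a, b in links if rack_of[a] != rack_of[b]]
    # Assumption: all links between the same two racks form one bundle.
    bundles = Counter(tuple(sorted(pair)) for pair in inter)
    types = {(size, cable_length(*pair)) for pair, size in bundles.items()}
    return {
        "switches": len(rack_of),
        "patch_panels": math.ceil(len(inter) / PORTS_PER_PANEL),
        "bundle_types": len(types),
    }


racks = {"s0": 0, "s1": 0, "s2": 1, "s3": 1, "s4": 2}
links = [("s0", "s1"), ("s0", "s2"), ("s1", "s3"), ("s2", "s4"), ("s3", "s4")]
print(deployment_metrics(racks, links, lambda a, b: 10 * abs(a - b)))
# {'switches': 5, 'patch_panels': 1, 'bundle_types': 1}
```

Here both rack pairs carry a 2-link bundle of the same length, so there is one bundle type; a length function that distinguishes the two rack pairs would report two.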

SLIDE 37

Expansion complexity

Figure: new switches added to the topology.

Metric: # expansion steps

SLIDE 38

Complexity of a single expansion step

It is hard to move existing links in cable trays.

SLIDE 39

Complexity of a single expansion step

Figure: existing links and new links between two patch panel racks.

SLIDE 40

Complexity of a single expansion step

Figure: a new spine is added; existing links and new links across four patch panel racks.

SLIDE 41

Complexity of a single expansion step

Figure: a new spine is added; existing links and new links across four patch panel racks.

Metric: # rewired links per patch panel rack
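A minimal sketch of the per-rack rewiring metric, assuming the connection of every patch-panel port is recorded before and after one expansion step. The port-map representation and the names (`p0`, `r0`, `new_spine`, ...) are hypothetical; any port whose connection is added, removed, or changed counts as one rewired link in the rack that holds the panel.

```python
from collections import Counter


def rewired_links_per_rack(before, after, rack_of_panel):
    """Count rewired patch-panel ports per patch-panel rack.

    before/after: dict mapping (panel, port) -> device the port connects to
    rack_of_panel: dict mapping panel -> patch-panel rack
    """
    counts = Counter()
    for panel, port in before.keys() | after.keys():
        if before.get((panel, port)) != after.get((panel, port)):
            counts[rack_of_panel[panel]] += 1
    return counts


before = {("p0", 0): "agg1", ("p0", 1): "agg2", ("p1", 0): "agg3"}
after = {("p0", 0): "agg1", ("p0", 1): "new_spine",
         ("p1", 0): "new_spine", ("p1", 1): "agg4"}
per_rack = rewired_links_per_rack(before, after, {"p0": "r0", "p1": "r1"})
print(dict(sorted(per_rack.items())))  # {'r0': 1, 'r1': 2}
print(max(per_rack.values()))          # 2 (the busiest rack)
```

Reporting the busiest rack is one possible way to summarize the per-rack counts; the slides name the metric but not how it is aggregated.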

SLIDE 42

Metrics

Deployment
⎯ # switches
⎯ # patch panels
⎯ # bundle types
Expansion
⎯ # expansion steps
⎯ # rewired links per patch panel rack

SLIDE 44

Topology comparison case study

We equalize the capacities of the topologies.

Compared: 4-layer Clos (Medium) vs. Jellyfish
Metrics: patch panels, bundle types, switches, rewired links per patch panel rack, expansion steps

SLIDE 47

Topology comparison case study

We equalize the capacities of the topologies.

Compared: 4-layer Clos (Medium) vs. Jellyfish
Metrics: patch panels, bundle types, switches, rewired links per patch panel rack, expansion steps

No topology dominates on all metrics!

SLIDE 48

Principles learned

⎯ Importance of regularity
⎯ Importance of maximizing intra-rack links
⎯ Importance of fat edge

SLIDE 49

Principle 1: Importance of regularity

Jellyfish is a random graph, which leads to non-uniform bundles between switch clusters. At large scale, Jellyfish has one order of magnitude more bundle types than Clos!
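The effect of irregularity on bundles can be illustrated on a toy example. Assumptions: switches are grouped into fixed clusters of 4, a bundle's type is reduced to its capacity (lengths ignored), and the random graph is a simple configuration-model pairing of ports that stands in for Jellyfish's random graph, not its actual construction procedure (self-loops dropped, parallel links kept). In both graphs every switch has 3 network ports.

```python
import random
from collections import Counter
from itertools import combinations

CLUSTERS, PER_CLUSTER, DEGREE = 4, 4, 3


def bundle_sizes(edges):
    """Links between each pair of distinct switch clusters (one bundle per pair)."""
    sizes = Counter()
    for u, v in edges:
        cu, cv = u // PER_CLUSTER, v // PER_CLUSTER
        if cu != cv:
            sizes[(min(cu, cv), max(cu, cv))] += 1
    return sizes


def regular_graph():
    # Switch i of each cluster links to switch i of every other cluster.
    return [(a * PER_CLUSTER + i, b * PER_CLUSTER + i)
            for a, b in combinations(range(CLUSTERS), 2)
            for i in range(PER_CLUSTER)]


def random_graph(seed):
    # Shuffle port "stubs" and pair them up; self-loops are dropped (simplification).
    stubs = [s for s in range(CLUSTERS * PER_CLUSTER) for _ in range(DEGREE)]
    random.Random(seed).shuffle(stubs)
    return [(u, v) for u, v in zip(stubs[0::2], stubs[1::2]) if u != v]


for name, edges in (("regular", regular_graph()), ("random", random_graph(seed=1))):
    sizes = bundle_sizes(edges)
    print(name, "distinct bundle sizes:", sorted(set(sizes.values())))
```

The regular graph needs a single bundle size (4 links) for all six cluster pairs, while the random pairing typically spreads links unevenly and yields several sizes; each extra size is another bundle type to manufacture and install.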

SLIDE 50

Principle 2: Importance of maximizing intra-rack links

Figure: a rack in Clos, showing switches with intra-rack and inter-rack links.

SLIDE 51

Principle 2: Importance of maximizing intra-rack links

Figure: a rack in Clos vs. a rack in Jellyfish, showing switches with intra-rack and inter-rack links.

Most links in Jellyfish are inter-rack links, which leads to more patch panel usage and higher wiring complexity!

SLIDE 52

Principle 3: Importance of fat edge

Figure: the network edge.

SLIDE 53

Principle 3: Importance of fat edge

Figure: servers connect to switches via southbound links; switches connect upward via northbound links.

SLIDE 54

Principle 3: Importance of fat edge

Thin edge: North:South = 1:1

SLIDE 55

Principle 3: Importance of fat edge

Thin edge: North:South = 1:1
Fat edge: North:South = 2:1

SLIDE 56

Principle 3: Importance of fat edge

Thin edge: North:South = 1:1
Fat edge: North:South = 2:1

Residual capacity requirement during expansion: 75%
Rewiring leads to a capacity drop, so traffic is drained before rewiring.
⎯ Thin edge: draining 25% of links --> 25% capacity loss

SLIDE 57

Principle 3: Importance of fat edge

Thin edge: North:South = 1:1
Fat edge: North:South = 2:1

Residual capacity requirement during expansion: 75%
Rewiring leads to a capacity drop, so traffic is drained before rewiring.
⎯ Thin edge: draining 25% of links --> 25% capacity loss
⎯ Fat edge: draining 50% of links --> 0% capacity loss

SLIDE 58

Principle 3: Importance of fat edge

Thin edge: North:South = 1:1
Fat edge: North:South = 2:1

Residual capacity requirement during expansion: 75%
⎯ Thin edge: draining 25% of links --> 25% capacity loss
⎯ Fat edge: draining 50% of links --> 0% capacity loss

⎯ At a fat edge, more links can be rewired in a single expansion step.
⎯ Jellyfish has a fat edge = fewer expansion steps
⎯ Clos has a thin edge = more expansion steps
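The percentages above follow from a simple model, assumed here for illustration: an edge switch can deliver at most the smaller of its southbound capacity and its remaining (undrained) northbound capacity. The step estimate at the end is a further simplification, with a hypothetical fraction `to_rewire` of northbound links that must each be drained once, split evenly across steps; it is not the expansion algorithm used in the evaluation.

```python
import math


def capacity_loss(north, south, drained):
    """Fraction of server (southbound) capacity lost when a fraction of northbound links is drained."""
    usable_north = north * (1 - drained)
    return max(0.0, 1 - usable_north / south)


def max_drain(north, south, residual):
    """Largest fraction of northbound links that can be drained while keeping `residual` of capacity."""
    drainable = 1 - residual * south / north
    if drainable <= 0:
        raise ValueError("residual requirement cannot be met while draining any link")
    return min(1.0, drainable)


def steps_needed(north, south, residual, to_rewire):
    """Toy estimate of expansion steps when `to_rewire` of the northbound links must be drained once."""
    return math.ceil(to_rewire / max_drain(north, south, residual))


print(capacity_loss(1, 1, 0.25))                                        # 0.25 (thin edge)
print(capacity_loss(2, 1, 0.50))                                        # 0.0 (fat edge)
print(max_drain(1, 1, 0.75), max_drain(2, 1, 0.75))                     # 0.25 0.625
print(steps_needed(1, 1, 0.75, 0.5), steps_needed(2, 1, 0.75, 0.5))     # 2 1
```

With a 75% residual requirement, the thin edge can drain only 25% of its uplinks per step while the fat edge can drain 62.5%, which is why the fat edge needs fewer steps in this toy model.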

SLIDE 59

Summary of case study

                                 4-layer Clos (Medium)   Jellyfish
Regularity                       yes                     no
Maximizing intra-rack links      yes                     no
Fat edge                         no                      yes

SLIDE 61

FatClique

Sub-block: a clique of switches, with servers attached to each switch

SLIDE 62

FatClique

Sub-block: a clique of switches, with servers attached to each switch

Goal: one or more sub-blocks should be packed into a single rack to maximize intra-rack links.

SLIDE 63

FatClique

Sub-block: a clique of switches
Block: a clique of sub-blocks

SLIDE 64

FatClique

Sub-block: a clique of switches
Block: a clique of sub-blocks
The whole network: a clique of blocks

Goal: blocks should be large enough to form uniform bundles.
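A minimal sketch of the three-level construction, assuming one link between every pair of sub-blocks in a block and one link between every pair of blocks (`sb_links` and `block_links` are hypothetical parameters), with endpoints assigned round-robin. It also reports the intra-rack share under the assumption that each sub-block occupies one rack. The parameter values are arbitrary; the slides choose FatClique's parameters through a constraint-based search.

```python
from itertools import combinations


def build_fatclique(blocks, subblocks, switches, sb_links=1, block_links=1):
    """Edges of a toy FatClique; each switch is named (block, sub_block, index)."""
    edges = []
    # Level 1: each sub-block is a clique of switches.
    for b in range(blocks):
        for s in range(subblocks):
            edges += [((b, s, i), (b, s, j)) for i, j in combinations(range(switches), 2)]
    # Level 2: each block is a clique of sub-blocks (round-robin endpoints).
    rr = 0
    for b in range(blocks):
        for s1, s2 in combinations(range(subblocks), 2):
            for _ in range(sb_links):
                edges.append(((b, s1, rr % switches), (b, s2, rr % switches)))
                rr += 1
    # Level 3: the whole network is a clique of blocks.
    for b1, b2 in combinations(range(blocks), 2):
        for _ in range(block_links):
            s, i = rr % subblocks, rr % switches
            edges.append(((b1, s, i), (b2, s, i)))
            rr += 1
    return edges


edges = build_fatclique(blocks=3, subblocks=3, switches=4)
intra = sum(1 for u, v in edges if u[:2] == v[:2])  # same block and same sub-block
print(len(edges), intra, round(intra / len(edges), 2))  # 66 54 0.82
```

In this toy instance, about 82% of links stay inside a sub-block and would be short intra-rack links if each sub-block shares a rack.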

SLIDE 65

Does FatClique satisfy the principles learned?

⎯ Regularity
⎯ Maximizing intra-rack links
⎯ Fat edge

SLIDE 67

Challenges

Conflict: fat edge vs. maximizing intra-rack links

Sub-block; for each switch:
⎯ 3 servers
⎯ 3 intra-rack links
⎯ 3 inter-rack links

SLIDE 69

Challenges

Conflict: fat edge vs. maximizing intra-rack links

Sub-block; for each switch:
⎯ 3 servers
⎯ 3 intra-rack switch neighbors
⎯ 3 inter-rack switch neighbors

SLIDE 70

Challenges

Conflict: fat edge vs. maximizing intra-rack links

Sub-block; for each switch:
⎯ 3 servers
⎯ 3 intra-rack switch neighbors
⎯ 3 inter-rack switch neighbors

Figure: a sub-block annotated with 3 links of each kind per switch.

SLIDE 71

Challenges

Conflict: fat edge vs. maximizing intra-rack links

Sub-block; for each switch:
⎯ 3 servers
⎯ 3 intra-rack switch neighbors
⎯ 3 inter-rack switch neighbors

This sub-block has a thin edge.

SLIDE 72

Challenges

Conflict: fat edge vs. maximizing intra-rack links

Sub-block; for each switch:
⎯ 3 servers
⎯ 3 intra-rack switch neighbors
⎯ 3 inter-rack switch neighbors

This sub-block has a thin edge. Decreasing intra-rack links per switch from 3 to 2 frees a port, giving each switch 4 inter-rack links against 3 servers: a fat edge.

SLIDE 73

Challenges

Conflicts:
⎯ Fat edge vs. maximizing intra-rack links
⎯ Fat edge vs. minimizing switches

SLIDE 74

Challenges

Conflicts:
⎯ Fat edge vs. maximizing intra-rack links
⎯ Fat edge vs. minimizing switches
Constraints:
⎯ Provide the right amount of capacity
⎯ Minimize rack fragmentation
⎯ Minimize overall cable length
⎯ ...

SLIDE 75

Constraint-based search

Figure: the FatClique hierarchy (switch with servers, sub-block, block).

SLIDE 76

Constraint-based search

Figure: the FatClique hierarchy (switch with servers, sub-block, block).

Constraints:
⎯ Fat edge at a switch: # northbound > # southbound

SLIDE 78

Constraint-based search

Figure: the FatClique hierarchy (switch with servers, sub-block, block).

Constraints:
⎯ Fat edge at a switch: # northbound > # southbound
⎯ Fat edge at a block
⎯ Block size
⎯ ...
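The conflict between a fat edge and intra-rack links can be reproduced with a toy, single-switch version of the search. It assumes, following the sub-block example, that southbound means server ports, northbound means inter-rack links, and a switch has 9 ports (the 3 + 3 + 3 split); `radix` and `servers` are hypothetical inputs. The real search also covers the block-level constraints listed above.

```python
def best_port_split(radix, servers):
    """Toy per-switch search: split the ports not used by servers into intra-rack and
    inter-rack links, require a fat edge (inter-rack > servers), and maximize intra-rack links.
    Returns (intra_rack, inter_rack), or None if no split satisfies the constraint."""
    best = None
    for inter in range(radix - servers + 1):
        intra = radix - servers - inter
        if inter <= servers:  # thin edge: violates # northbound > # southbound
            continue
        if best is None or intra > best[0]:
            best = (intra, inter)
    return best


# The sub-block example: 9 ports per switch, 3 of them for servers.
print(best_port_split(9, 3))  # (2, 4)
```

The result matches the sub-block example: enforcing the fat edge forces intra-rack links down from 3 to 2 so that inter-rack links can rise to 4.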

SLIDE 79

Evaluation

⎯ Does FatClique have lower deployment complexity?
⎯ Does FatClique have lower expansion complexity?

SLIDE 80

Evaluation methodology

⎯ Equalize capacities across topologies
⎯ Compare topologies at different scales
⎯ Highly optimized placement algorithms for different topologies
⎯ Optimal expansion algorithm for symmetric Clos
⎯ Search-based, near-optimal expansion algorithm for FatClique
⎯ Patch panel usage in different topologies
⎯ ...

SLIDE 81

FatClique has low deployment complexity

Figure panels: # switches, # patch panels, # bundle types (C: Clos, J: Jellyfish, X: Xpander, F: FatClique)

FatClique performs best on all deployment metrics.

SLIDE 83

FatClique has low expansion complexity

Figure: results plotted against the residual capacity requirement.

FatClique is as good as expanders:
⎯ 3 or 4 expansion steps, even when the residual capacity requirement is tight

SLIDE 84

FatClique has low expansion complexity

Figure: results plotted against the residual capacity requirement.

FatClique is as good as expanders:
⎯ 3 or 4 expansion steps, even when the residual capacity requirement is tight
⎯ FatClique enables higher availability

SLIDE 85

Conclusions and future work

Management complexity is an important dimension for topology design
⎯ Our work is a first step in this direction
⎯ Metric design
FatClique achieves lower management complexity
⎯ with the same capacity
⎯ with lower cost
Future work
⎯ Control plane complexity
⎯ Network debuggability
⎯ Practical routing for FatClique

Figure: FatClique vs. the considered topologies on cost/capacity and management complexity.

SLIDE 86

Thanks!

SLIDE 87

Backup

SLIDE 88

FatClique has low cabling cost

Figure: cabling cost ($) (C: Clos, J: Jellyfish, X: Xpander, F: FatClique)

FatClique is 23% cheaper than Clos
⎯ Smaller number of links
FatClique is cheaper than expanders
⎯ It maximizes intra-rack links, which saves expensive optical transceivers.

SLIDE 89

Single-step complexity

SLIDE 90

Path diversity

SLIDE 91

Spectral gap

SLIDE 92

Deployment complexity metrics

# switches, # patch panels, # bundle types

SLIDE 93

Deployment: wiring

Pictured: Google’s Watchtower Chassis