CompSci 514: Computer Networks L18: Datacenter Network Architectures II
Xiaowei Yang
Outline
- Design and evaluation of VL2
- Discussion: FatTree vs. VL2
  - What common challenges did each address?
  - What methods did each use to address them?
Reference – “Data Center: Load balancing Data Center Services”, Cisco 2004
[Diagram: conventional data center network. The Internet connects through DC-Layer 3: core routers (CR) and access routers (AR). Below them, DC-Layer 2 contains switches (S) and servers (A). Key: ~1,000 servers/pod == one IP subnet.]
[Diagram: the same CR/AR/switch/server hierarchy, annotated with typical oversubscription ratios that grow toward the core: ~5:1, ~40:1, ~200:1.]
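To make those ratios concrete, here is a small back-of-the-envelope sketch (my own illustration, not from the slides; the 1 Gbps NIC speed is an assumption): the bandwidth left per server when every server transmits at once through a layer oversubscribed at the given ratio.

```python
# Illustrative arithmetic: per-server bandwidth surviving each layer of
# oversubscription in the conventional design. NIC speed is assumed.

def usable_bandwidth_mbps(nic_mbps: float, oversubscription: float) -> float:
    """Bandwidth per server when all servers send at once through a
    layer oversubscribed by the given ratio."""
    return nic_mbps / oversubscription

nic = 1000.0  # 1 Gbps server NIC (assumed)
for ratio in (5, 40, 200):
    mbps = usable_bandwidth_mbps(nic, ratio)
    print(f"{ratio:>4}:1 oversubscription -> {mbps:.0f} Mbps/server")
```

At 200:1, each 1 Gbps server can count on only 5 Mbps of core bandwidth, which is the motivation for the uniform-capacity designs that follow.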
[Diagram: servers partitioned into IP subnet (VLAN) #1 and IP subnet (VLAN) #2, with ~200:1 oversubscription between them.]
[Diagram repeated: moving servers between IP subnet (VLAN) #1 and IP subnet (VLAN) #2 requires complicated manual L2/L3 re-configuration.]
[Diagram: fragmented server pools strand capacity; the result is revenue lost and expense wasted.]
Figure: Flow-size distribution (PDF and CDF of flow size and of total bytes). Mice are numerous; 99% of flows are smaller than 100 MB. However, more than 90% of bytes are in flows between 100 MB and 1 GB.
Figure: Number of concurrent connections has two modes: (1) 10 flows per node more than 50% of the time and (2) 80 flows per node for at least 5% of the time.
Figure: Lack of short-term predictability. The cluster to which a traffic matrix belongs, i.e., the type of traffic mix in the TM, changes quickly and randomly. (Panels: (a) index of the containing cluster over time in 100 s intervals; (b) run-length frequency; (c) log(time to repeat).)
[Diagram: the VL2 abstraction. All servers appear connected to one big virtual Layer-2 switch, with full capacity between servers and performance isolation between services.]

VL2 objectives, approaches, and solutions:
1. Layer-2 semantics -> Employ flat addressing -> Name-location separation & resolution service
2. Uniform high capacity between servers -> Guarantee bandwidth for hose-model traffic -> Flow-based random traffic indirection (Valiant Load Balancing)
3. Performance isolation -> Enforce hose model using existing mechanisms only -> TCP
Figure 1: A VPN based on the customer-pipe model. A mesh of customer-pipes is needed, each extending from one customer endpoint to another. A customer endpoint must maintain a logical interface for each customer-pipe.
Service level agreements following the characterization phase might be based on the current traffic load with provi- sions made for expected gradual growth as well as expected drastic traffic changes that the customer might foresee (or protect against). Both the customer and the provider may play a role in testing whether the SLAs are met. The provider may police (and possibly shape) the incoming traffic to a hose from the customer’s access link to ensure that it stays within the specified profile. Similarly, traffic leaving the VPN at a hose egress (i.e., traffic potentially generated from multi- ple sources that has traversed the network) may have to be monitored and measured at the hose egress point, to ensure that such traffic stays within the specified profile and that the provider has met the SLA. The customer might also be required to specify a policy for actions to be taken should egress traffic be more than the specified egress hose capacity.
2.1 Capacity Management
From a provider's perspective, it is potentially more challenging to support the hose model, due to the need to meet the SLAs with a very weak specification of the traffic matrix. To manage resources so as to deal with this increased uncertainty, we consider two basic mechanisms:

Statistical Multiplexing: As a single QoS assurance applies to a hose, the provider can consider multiplexing all the traffic of a given hose together. Similarly, the set of hoses making up the VPN have a common QoS assurance, and the provider can consider multiplexing all the traffic of a given VPN together. These techniques can be applied to both access links and network internal links.

Resizing: In order to provide tight QoS assurances, the provider may use (aggregate) network resource reservation mechanisms that allocate capacity to a given hose or VPN. A provider can take the approach of allocating this capacity statically, taking into account worst-case demands. Alternatively, a provider can make an initial allocation, and then resize that allocation based on online measurements. Again, such techniques can be applied to both access and network internal links, at a finer time scale than the time scale of provisioning.
Figure 2: A VPN based on the hose model. A customer endpoint maintains just one logical interface, a hose, to the provider access router. In the figure, we show the implementation of hoses using provider-pipes.

These two resource management mechanisms can be used separately or in combination.
Some more remarks are in order on resizing. Provisioning decisions normally have an impact on longer timescales. Within the context of such provisioning, measurements of usage can be used on much shorter timescales to enable efficient capacity management. Underlying this is an assumption that within the network, boundaries will exist between resources that might be used by different classes of traffic, to ensure that performance guarantees are met. For example, traffic from different VPNs might be isolated from each other, and from other classes of traffic. In the context of this paper, resources available for VPN traffic cannot be used by other traffic requiring performance guarantees. We assume that this perspective holds whether the boundaries reflect reservation of resources, as in the case of Intserv, or whether it represents some allocation in a bandwidth broker in a Diffserv environment. If we can use the measurements of usage to reduce the boundary for a given VPN's traffic, more bandwidth will be made available to other traffic and we can make better use of available capacity. In reality, measurements of usage would be used to make a prediction about near-term future usage, and this prediction will be used to resize the share of resources allocated.

In the hose model, this approach can be realized by allowing customers to resize the respective hose capacities of a VPN. Presumably there will be some cost incentive for customers to resize their hose capacities. While we envisage this mechanism to be mainly used to track actual usage, by exposing this interface to the customer, it would also enable the customer to resize its hose capacities based on local policy decisions.

How frequently hoses may be resized will depend on implementation and overheads for resizing and measurement. More important, however, is whether frequent resizing is beneficial and whether it is possible to make predictions with sufficient accuracy. Finally, short-timescale resizing is not a replacement for provisioning and admission control, and the appropriate relationship between these resource management approaches is important.
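The measurement-driven resizing described above can be sketched as a simple predictor. This is my own minimal illustration, not the paper's algorithm; the exponential-moving-average weight and the headroom factor are assumptions.

```python
# Minimal sketch of measurement-based hose resizing: predict near-term
# usage with an exponentially weighted moving average (EWMA), then
# allocate the prediction plus headroom. alpha and headroom are assumed.

def resize_hose(measurements, alpha=0.3, headroom=1.2):
    """Return a capacity allocation for each measurement interval."""
    predicted = measurements[0]   # seed the predictor with the first sample
    allocations = []
    for usage in measurements:
        predicted = alpha * usage + (1 - alpha) * predicted
        allocations.append(predicted * headroom)
    return allocations

# Usage: as hose traffic ramps up, the allocation tracks it with headroom.
usage = [100, 120, 90, 300, 310, 280]
print([round(a) for a in resize_hose(usage)])
```

The design choice to smooth rather than track raw samples reflects the paper's caution: frequent resizing is only beneficial if predictions are sufficiently accurate.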
[Diagram, repeated on two slides: name-location separation. Servers use flat names; switches run link-state routing and maintain only the switch-level topology. A Directory Service stores name-to-location mappings (x -> ToR2, y -> ToR3, z -> ToR4); senders perform a lookup & response to learn a destination's current ToR and encapsulate packets to it (payload for y carries ToR3, payload for z carries ToR4). When z moves behind ToR3, the directory entry changes to z -> ToR3 and traffic to z follows.]
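The lookup-and-encapsulate flow in the diagram can be sketched in a few lines. This is an illustrative toy, not VL2's actual directory protocol or wire format; the class and function names are mine.

```python
# Toy sketch of name-location separation: a directory maps a server's
# flat name to the ToR switch it currently sits behind, and the sender
# wraps each payload with that ToR address.

class DirectoryService:
    def __init__(self):
        self._loc = {}           # flat name -> current ToR

    def update(self, name, tor):
        self._loc[name] = tor    # e.g. called when a VM migrates

    def lookup(self, name):
        return self._loc[name]

def encapsulate(directory, dst_name, payload):
    """Wrap the payload with the ToR address of the destination."""
    return (directory.lookup(dst_name), dst_name, payload)

# The slide's example: z starts behind ToR4, then moves behind ToR3.
d = DirectoryService()
for name, tor in [("x", "ToR2"), ("y", "ToR3"), ("z", "ToR4")]:
    d.update(name, tor)
print(encapsulate(d, "z", "payload"))   # routed via ToR4
d.update("z", "ToR3")                   # migration: only the directory changes
print(encapsulate(d, "z", "payload"))   # now routed via ToR3
```

The key property the sketch shows: migration updates one directory entry, while server names and switch routing state stay untouched.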
[Diagram: the VL2 Clos topology. Int(ermediate) switches at the top, K Aggr(egation) switches with D ports each, ToR switches below with 20 servers each, for 20*(DK/4) servers in total.]

D (# of 10G ports)  Max DC size (# of servers)
48                  11,520
96                  46,080
144                 103,680
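The table's numbers follow directly from the 20*(DK/4) formula; a quick check (the K = D assumption is mine, chosen because it reproduces the listed maxima):

```python
# Reproduce the slide's scaling table: 20 servers per ToR, K aggregation
# switches with D ports each, total 20 * (D*K / 4) servers. The maximum
# sizes listed correspond to K = D.

def max_servers(d_ports, k_aggr=None):
    k = d_ports if k_aggr is None else k_aggr
    return 20 * (d_ports * k // 4)

for d in (48, 96, 144):
    print(f"D={d:>3}: {max_servers(d):>7,} servers")
```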
[Diagram, repeated on two slides: Valiant Load Balancing across ToRs T1 through T6. A sender bounces each flow off a random intermediate switch reached via the anycast address IANY (links used for up paths), which then delivers it to the destination ToR (links used for down paths): the payload for y goes via IANY to T3, the payload for z via IANY to T5.]
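One common way to realize the per-flow random indirection in the diagram is to hash each flow's identifying tuple to an intermediate switch, so a flow's packets stay on one path while different flows spread across the core. This is a sketch under that assumption; the hash choice and switch names are mine, not VL2's exact mechanism (VL2 combines VLB with ECMP on anycast addresses).

```python
# Sketch of flow-based random traffic indirection: hash a flow's
# 5-tuple to pick one intermediate switch (the IANY anycast group).

import hashlib

INTERMEDIATES = ["Int1", "Int2", "Int3"]   # assumed switch names

def pick_intermediate(src, dst, src_port, dst_port, proto="tcp"):
    """Deterministically map a 5-tuple flow to an intermediate switch."""
    key = f"{src}|{dst}|{src_port}|{dst_port}|{proto}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return INTERMEDIATES[digest % len(INTERMEDIATES)]

# All packets of the x->y flow use one intermediate; a different flow
# (x->z) may bounce off another, balancing load across the core.
print(pick_intermediate("x", "y", 12345, 80))
print(pick_intermediate("x", "z", 12345, 80))
```

Hashing (rather than per-packet randomness) keeps a flow on one path, which avoids TCP reordering while still spreading load across flows.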
VLB evaluation:
- Goodput efficiency: 94%
- Fairness§ between flows: 0.995
[Plot: fairness of Aggr-to-Int links' utilization over 0 to 500 s, staying near 1.0]
§Jain's fairness index, defined as (Σxᵢ)² / (n·Σxᵢ²)
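The footnoted fairness index is easy to compute directly from the definition:

```python
# Jain's fairness index, as defined above: (sum x_i)^2 / (n * sum x_i^2).
# It equals 1.0 when all flows get equal throughput and approaches 1/n
# when a single flow takes everything.

def jain_fairness(throughputs):
    n = len(throughputs)
    s = sum(throughputs)
    sq = sum(x * x for x in throughputs)
    return (s * s) / (n * sq)

print(jain_fairness([10, 10, 10, 10]))            # perfectly fair: 1.0
print(round(jain_fairness([40, 1, 1, 1]), 3))     # one flow dominates
```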
Figure: Aggregate goodput and number of active flows during a 2.7 TB shuffle among 75 servers.
Figure: Aggregate goodput of two services with servers intermingled on the ToRs. Service one's goodput is unaffected as service two ramps traffic up and down.
Figure: Aggregate goodput of service one as service two creates bursts containing successively more short TCP connections.
Figure: Aggregate goodput as all links to two of the Intermediate switches are unplugged in succession and then reconnected in succession. Approximate times of link manipulation are marked with vertical lines. The network re-converges in < 1 s after each failure and demonstrates graceful degradation.