NUMFabric: Fast and Flexible Bandwidth Allocation in Datacenters
Kanthi Nagaraj (Stanford), Dinesh Bharadia (M.I.T.), Mohammad Alizadeh (M.I.T.), Hongzi Mao (M.I.T.), Sandeep Chinchali (Stanford) and Sachin Katti (Stanford)
SIGCOMM 2016
Application level: state an objective, translate it to utility functions, and send the utility functions to the hosts.
Minimize average flow completion time: maximize $\sum_i x_i / s_i$
Weighted proportional fairness: maximize $\sum_i w_i \log(x_i)$
Resource pooling: maximize $\sum_i \log(y_i)$, where $y_i$ is the total rate of flow $i$ across all subpaths
$x_i$: rate of flow $i$; $s_i$: size of flow $i$; $w_i$: weight of flow $i$
Sources send traffic.
The network sends congestion signals.
Each source sets its rate based on the gradient of its utility function and the network feedback.
[Figure: flow rates vs. iterations, with link capacity marked]
[Figure: normalized rates vs. iterations, larger steps to optimal, with capacity marked]
[Figure: normalized rates vs. iterations, smaller steps to optimal, with capacity marked]
Overshooting might cause bloated queues and packet drops
Use weights instead of rates! Setting the weights of the flows and letting the fabric allocate rates proportional to those weights enables exactly this.
[Figure: normalized rates vs. iterations, two panels, each labeled "Larger steps to optimal", with capacity marked]
Can we take larger steps to the optimal without overshooting or under-utilization? Overshooting might cause packet drops and bloated queues.
The fabric allocates relative rates proportional to the weights of all flows.
[Figure: rates vs. iterations, setting weights to control rates]
Two layers, connected by weights and network feedback:
Layer that sets the weights of flows based on network feedback
Layer that realizes rates proportional to the weights
Application level: translate objectives to utility functions.
Weighted max-min (WMM) rate allocation according to the weights.
Maximize $\sum_i U_i(x_i)$
KKT conditions: equations that must hold at the optimal solution.
$U_i'(x_i) = \sum_{l \in L(i)} p_l$: at the optimum, the marginal utility of each source equals the sum of the prices along the path of its flow.
$p_l \left( \sum_{i:\, l \in L(i)} x_i - c_l \right) = 0$: at the optimum, either the link is fully utilized or the price of the link is zero.
Here $L(i)$ is the set of links on the path of flow $i$, $c_l$ is the capacity of link $l$, and the price $p_l$ of a link is a variable that indicates the congestion level at the switch.
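The two KKT conditions can be checked numerically on a toy case. The sketch below is only an illustration: the function name `kkt_residuals`, the single 10 Gbps link, and the coefficients `a` are assumptions, and the utility is taken to be $U_i(x) = a_i \log(x)$ (weighted proportional fairness), for which the single-link optimum has a closed form.

```python
def kkt_residuals(rates, prices, paths, capacity, marginal_utility):
    """Return how far a candidate (rates, prices) pair is from the KKT conditions."""
    # Condition 1: U_i'(x_i) minus the sum of prices on the flow's path should be 0.
    stationarity = {
        i: marginal_utility(i, x) - sum(prices[l] for l in paths[i])
        for i, x in rates.items()
    }
    # Condition 2: p_l * (load on link l - c_l) should be 0.
    slackness = {}
    for l, c in capacity.items():
        load = sum(x for i, x in rates.items() if l in paths[i])
        slackness[l] = prices[l] * (load - c)
    return stationarity, slackness


# Hypothetical example: two flows share one link of capacity 10.
a = {"f1": 1.0, "f2": 3.0}          # utility coefficients (assumed values)
paths = {"f1": ["link"], "f2": ["link"]}
capacity = {"link": 10.0}

# Closed-form optimum for U_i = a_i * log(x) on one link:
# x_i = a_i * C / sum(a), and the link price is sum(a) / C.
total = sum(a.values())
rates = {i: a[i] * capacity["link"] / total for i in a}
prices = {"link": total / capacity["link"]}

stationarity, slackness = kkt_residuals(
    rates, prices, paths, capacity, lambda i, x: a[i] / x
)
print(rates, stationarity, slackness)
```

Both residual dictionaries come out at (numerically) zero for this allocation, which is what the KKT conditions require of an optimal point.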
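Solve $U_i'(x_i) = \sum_{l \in L(i)} p_l$ at the sources: sources set the rates of their flows using price feedback, $x_i = U_i'^{-1}\left( \sum_{l \in L(i)} p_l \right)$, the inverse of $U_i'$ evaluated at the path price.
Solve $p_l \left( \sum_{i:\, l \in L(i)} x_i - c_l \right) = 0$ at the switches: switches set their prices by measuring congestion.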
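Network congestion signals: the path price $\sum_{l \in L(i)} p_l$
Switches update prices: $p_l \leftarrow p_l + \alpha \left( \sum_{i:\, l \in L(i)} x_i - c_l \right)$, with step size $\alpha$
Sources set the rates of their flows, then adapt them as the prices change.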
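The price-feedback loop above can be sketched in a few lines. This is a minimal illustration, not the evaluated implementation: the function name `dual_gradient_descent`, the utility $U_i(x) = a_i \log(x)$, the small positive price floor, and the one-link example are all assumptions made for the sketch.

```python
def dual_gradient_descent(a, paths, capacity, step, iterations, p0=1.0):
    """Rate control by price feedback, assuming U_i(x) = a_i * log(x)."""
    prices = {l: p0 for l in capacity}
    history = []
    for _ in range(iterations):
        # Sources: x_i = inverse of U_i' at the path price, i.e. a_i / path price.
        rates = {i: a[i] / sum(prices[l] for l in paths[i]) for i in a}
        history.append(rates)
        # Switches: raise the price when overloaded, lower it when underutilized.
        for l, c in capacity.items():
            load = sum(x for i, x in rates.items() if l in paths[i])
            # Floor at a tiny positive value (assumption) to avoid dividing by zero.
            prices[l] = max(prices[l] + step * (load - c), 1e-6)
    return rates, prices, history


# Hypothetical example: two identical flows on one link of capacity 1.
a = {"f1": 1.0, "f2": 1.0}
paths = {"f1": ["link"], "f2": ["link"]}
capacity = {"link": 1.0}

rates, prices, history = dual_gradient_descent(
    a, paths, capacity, step=0.5, iterations=200
)
first_load = sum(history[0].values())
print(rates, prices, first_load)
```

In this toy run the rates settle at 0.5 each with a link price of 2, but the first iteration offers twice the link capacity: the rates are only feasible once the prices have converged. A larger `step` converges in fewer iterations up to a point, after which the update overshoots and can oscillate instead of settling.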
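KKT conditions: $U_i'(x_i) = \sum_{l \in L(i)} p_l$ and $p_l \left( \sum_{i:\, l \in L(i)} x_i - c_l \right) = 0$
Weights instead of rates: $w_i = U_i'^{-1}\left( \sum_{l \in L(i)} p_l \right)$, the inverse of $U_i'$ evaluated at the path price.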
The WMM layer always achieves 100% link utilization.
Controlling rates directly is what makes existing solutions brittle. The WMM layer converts the weights to rates instead.
Switches adapt prices at every iteration so that the flow rates move closer to the optimal.
Residue of flow $i$: $\text{residue}_i = U_i'(x_i) - \sum_{l \in L(i)} p_l$
Price update at each switch: $p_l \leftarrow p_l + \min_{i:\, l \in L(i)} \text{residue}_i$, the minimum residue among the flows traversing link $l$
KKT conditions: $p_l \left( \sum_{i:\, l \in L(i)} x_i - c_l \right) = 0$ and $U_i'(x_i) = \sum_{l \in L(i)} p_l$ (zero residue)
Weighted max-min: feasible and stable rates for all flows based on weights
Weight adaptation at hosts: takes path prices and rates, produces flow weights and residues
Price adaptation at switches: takes residues, produces prices
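The three pieces above (weighted max-min, weight adaptation at hosts, price adaptation at switches) can be put together in a small simulation. This is a sketch under stated assumptions rather than the authors' implementation: the names `weighted_max_min` and `weight_price_loop`, the water-filling routine standing in for the WMM layer, the utility $U_i(x) = a_i \log(x)$, the damping factor `beta`, and the two-link topology are all hypothetical choices for illustration.

```python
def weighted_max_min(weights, paths, capacity):
    """Water-filling: freeze the flows on the tightest link, then repeat."""
    rates, remaining, active = {}, dict(capacity), set(weights)
    while active:
        # Capacity per unit weight on every link that still carries active flows.
        share = {}
        for link, cap in remaining.items():
            total = sum(weights[i] for i in active if link in paths[i])
            if total > 0:
                share[link] = cap / total
        if not share:
            break  # remaining flows cross no known link
        tight = min(share, key=share.get)
        for i in [i for i in active if tight in paths[i]]:
            rates[i] = weights[i] * share[tight]
            for link in paths[i]:
                remaining[link] -= rates[i]
            active.remove(i)
    return rates


def weight_price_loop(a, paths, capacity, iterations=40, beta=0.5, p0=1.0):
    """Hosts set weights from path prices; switches move prices by the min residue."""
    prices = {l: p0 for l in capacity}
    for _ in range(iterations):
        path_price = {i: sum(prices[l] for l in paths[i]) for i in a}
        # Hosts: w_i = inverse of U_i' at the path price (a_i / price for a_i*log x).
        weights = {i: a[i] / path_price[i] for i in a}
        rates = weighted_max_min(weights, paths, capacity)
        # Residue: U_i'(x_i) minus the path price.
        residue = {i: a[i] / rates[i] - path_price[i] for i in a}
        for l in capacity:
            on_link = [residue[i] for i in a if l in paths[i]]
            if on_link:
                target = max(prices[l] + min(on_link), 1e-9)
                # Damping with beta is an assumption of this sketch.
                prices[l] = beta * prices[l] + (1 - beta) * target
    return rates, prices


# Hypothetical topology: one flow crosses links A and B, two flows cross one link each.
a = {"long": 1.0, "shortA": 1.0, "shortB": 1.0}
paths = {"long": ["A", "B"], "shortA": ["A"], "shortB": ["B"]}
capacity = {"A": 1.0, "B": 1.0}

rates, prices = weight_price_loop(a, paths, capacity)
print(rates, prices)
```

For this example the proportionally fair allocation is 1/3 for the two-link flow and 2/3 for each one-link flow, with both link prices equal to 1.5, and the loop approaches those values. Because the rates always come out of the max-min step, the load on each link never exceeds its capacity while the prices are still moving.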
Topology: 8 racks, 10 Gbps edge links, 40 Gbps fabric links
functions against point solutions for different objectives: pFabric, MPTCP, etc.
every "event".
converge before triggering another event
time (335 µs) of NUMFabric is 2.3x better than the other algorithms
maximize $\sum_i x_i / s_i$
$s_i$: size of the flow
maximize $\sum_i \log(y_i)$
$y_i$: total rate of the flow across all sub-paths
bandwidth allocation for different bandwidth allocation objectives
decouples the objectives of finding optimal rates and stable rates. This makes it 2-3x faster than existing mechanisms.
tenant-level aggregates is the focus of our current and future work.