Traffic Engineering with Forward Fault Correction
Harry Liu Microsoft Research 06/02/2016
1
Joint work with Ratul Mahajan, Srikanth Kandula, Ming Zhang and David Gelernter
2
Cloud services require large network capacity
Cloud applications: growing traffic
Cloud networks: expensive (e.g. cost of WAN: $100M/year)
3
Traffic Engineering (centralized & SDN-Based) WAN Network
Datacenter Network
4
Local view & control: sub-optimal resource allocation.
TE controller (global view & control): optimal resource allocation. The controller decides 1) how much traffic to admit and 2) how to route.
[Figure: four-node example (nodes 1-4), link capacity 10, two flows with demand 10 each, requirement: path length ≤ 2 hops. Total throughput is 15 with local control and 20 with the TE controller.]
5
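TE controller and network form a control loop:
Network view (e.g. topology, capacity, traffic): from the network to the TE controller
Network configuration (e.g. routes, rate limits): from the TE controller to the network
Frequent updates for high utilization (e.g. every 5 min)
Faults: control-plane faults and data-plane faults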
6
Link and switch failures
Rescaling: sending traffic proportionally to residual paths.
[Figure: switches s1-s4, link capacity 10, flows s2→s4 (10) and s3→s4 (10). Traffic labels are 7, 3, 3, 7 before the failure. After a link failure the labels are 10, 7, 3 and rescaling causes congestion.]
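The rescaling rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rescale`, the path names, and the assumption that each flow sends 7 on its direct link and 3 via s1 are a hypothetical reading of the figure labels.

```python
def rescale(path_rates, failed_paths):
    """Move a flow's traffic onto its residual paths, proportionally to
    the rates those paths carried before the failure."""
    total = sum(path_rates.values())
    residual = {p: r for p, r in path_rates.items() if p not in failed_paths}
    residual_sum = sum(residual.values())
    if residual_sum == 0:
        return {}  # no usable path is left for this flow
    return {p: total * r / residual_sum for p, r in residual.items()}


# Assumed reading of the figure: each flow sends 7 on its direct link
# and 3 on the detour via s1. Link capacity is 10.
s2_flow = {"s2-s4": 7, "s2-s1-s4": 3}
s3_flow = {"s3-s4": 7, "s3-s1-s4": 3}

# Assume the direct link s2-s4 fails: all of s2's traffic moves to the detour.
s2_after = rescale(s2_flow, failed_paths={"s2-s4"})

# Link s1-s4 is shared by both detour paths, so their traffic adds up there.
load_s1_s4 = s2_after["s2-s1-s4"] + s3_flow["s3-s1-s4"]
print(s2_after, load_s1_s4)
```

Under these assumptions the shared link s1-s4 ends up carrying more than its capacity of 10, which is the congestion the slide points at.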
7
Failures or long delays to configure a network device
[Figure: the TE controller sends TE configurations to a switch.]
Causes shown: RPC failure, firmware bugs, memory shortage, overloaded CPU.
Control-plane faults can also result in congestion.
8
[Figure: TE controller and network, linked by the network view and the network configuration. Labels: inaccuracy, incompleteness, control-plane faults, data-plane faults.]
9
In a production WAN network (200+ routers, 6000+ links):
Control plane: fault rate = 0.1% to 1% per TE update.
Data plane: fault rate = 25% per 5 minutes.
10
Cannot prevent congestion
Slow (seconds to minutes)
Blocked by control-plane faults
Big loss in throughput
11
Network TE algorithm: not robust enough. Goal: making it robust.
12
Network faults: FFC guarantees no congestion under up to k arbitrary faults, with careful traffic distribution.
Packet loss: FEC guarantees no information loss under up to k arbitrary packet drops, with careful data encoding.
13
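FFC (k=1)
Link capacity: 10
Flows: s2→s4 (9), s3→s4 (9)
[Figure: traffic distribution over switches s1-s4 with FFC, path traffic 8, 1, 1, 8, followed by the failure cases. The single-link-failure panels carry traffic labels 8, 9, 1; then 9, 9; then 9, 1, 8, all within the link capacity. A final panel shows a link failure with labels 15, 5, 10.]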
14
There exists a trade-off between throughput and robustness:
Non-FFC (throughput: 30), path traffic 10, 5, 5, 10
FFC (k=1) (throughput: 18), path traffic 8, 1, 1, 8
FFC does not always sacrifice efficiency for robustness:
Non-FFC (throughput: 18), path traffic 5, 4, 4, 5; after a link failure the traffic labels become 9, 5, 4
Achieving the optimal throughput with FFC guarantee
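The comparison between the three traffic distributions can be replayed with a small brute-force check. The sketch below makes several assumptions not stated on the slides: the topology (five links among s1-s4, each with capacity 10), that each flow has a direct path and a detour via s1, and that the first number of each pair is the direct-path rate. All names are hypothetical.

```python
from itertools import combinations

CAPACITY = 10
# Assumed topology: both sources reach s4 directly or via s1.
LINKS = [("s2", "s4"), ("s3", "s4"), ("s2", "s1"), ("s3", "s1"), ("s1", "s4")]


def paths_for(src):
    """Direct path and detour via s1, each as a list of links."""
    return [[(src, "s4")], [(src, "s1"), ("s1", "s4")]]


def link_loads(alloc, failed):
    """Link loads after rescaling each flow onto its surviving paths."""
    loads = {link: 0.0 for link in LINKS}
    for src, rates in alloc.items():
        alive = [(p, r) for p, r in zip(paths_for(src), rates)
                 if not any(link in failed for link in p)]
        residual = sum(r for _, r in alive)
        if residual == 0:
            continue  # the flow has no usable path left
        for path, rate in alive:
            scaled = sum(rates) * rate / residual
            for link in path:
                loads[link] += scaled
    return loads


def congested_cases(alloc, k=1):
    """Every set of at most k failed links that overloads some link."""
    bad = []
    for n in range(k + 1):
        for failed in combinations(LINKS, n):
            loads = link_loads(alloc, set(failed))
            if any(load > CAPACITY + 1e-9 for load in loads.values()):
                bad.append(failed)
    return bad


# (direct rate, detour rate) per flow, read from the figure labels.
ffc = {"s2": (8, 1), "s3": (8, 1)}           # throughput 18
non_ffc_30 = {"s2": (10, 5), "s3": (10, 5)}  # throughput 30
non_ffc_18 = {"s2": (5, 4), "s3": (5, 4)}    # throughput 18

for name, alloc in [("FFC", ffc), ("non-FFC 30", non_ffc_30),
                    ("non-FFC 18", non_ffc_18)]:
    print(name, congested_cases(alloc))
```

Under these assumptions only the FFC distribution survives every single link failure. The non-FFC distribution with the same throughput of 18 still congests when a direct link fails.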
15
Formulation: How to merge FFC into existing TE framework? Computation: How to find FFC-TE efficiently?
16
LP formulations
TE decisions: sizes of flows $b_f$; traffic on paths $l_{f,t}$
TE objective: maximizing throughput, $\max \sum_f b_f$
Basic TE constraints:
Deliver all granted flows: $\forall f: \sum_t l_{f,t} \ge b_f$
No overloaded link: $\forall e: \sum_f \sum_{t \ni e} l_{f,t} \le c_e$
FFC constraints: no overloaded link up to $k_c$ control-plane faults, $k_e$ link failures, $k_v$ switch failures
…
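The two basic TE constraints can be checked for a candidate solution with a short helper. This is only a feasibility check over fixed numbers, not an LP solver. The function name, the dictionary layout, and the example topology are assumptions made for illustration.

```python
def check_basic_te(granted, path_traffic, path_links, capacity, eps=1e-9):
    """Return the violated basic TE constraints (empty list if feasible).

    granted:      {flow: granted flow size}
    path_traffic: {(flow, path): traffic placed on that path}
    path_links:   {path: list of links the path traverses}
    capacity:     {link: link capacity}
    """
    violations = []
    # Deliver all granted flows: a flow's paths together carry its size.
    for flow, size in granted.items():
        carried = sum(x for (f, _), x in path_traffic.items() if f == flow)
        if carried + eps < size:
            violations.append(("undelivered", flow))
    # No overloaded link: all paths crossing a link fit in its capacity.
    for link, cap in capacity.items():
        load = sum(x for (_, p), x in path_traffic.items()
                   if link in path_links[p])
        if load > cap + eps:
            violations.append(("overloaded", link))
    return violations


# Hypothetical example: two flows, each with a direct path and a detour.
path_links = {
    "p1": ["s2-s4"], "p2": ["s2-s1", "s1-s4"],
    "p3": ["s3-s4"], "p4": ["s3-s1", "s1-s4"],
}
capacity = {link: 10 for links in path_links.values() for link in links}
path_traffic = {("f1", "p1"): 8, ("f1", "p2"): 1,
                ("f2", "p3"): 8, ("f2", "p4"): 1}

print(check_basic_te({"f1": 9, "f2": 9}, path_traffic, path_links, capacity))
print(check_basic_te({"f1": 12, "f2": 9}, path_traffic, path_links, capacity))
```

The first call reports no violations. The second grants flow `f1` more than its paths carry, so it is reported as undelivered.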
17
[Figure: source S and destination D connected by path-1, path-2 and path-3.]
Flow size: $s_f$. Bandwidth allocation: $a_1$, $a_2$, $a_3$. Paths are link-disjoint.
Fault on path-1: $s_f \le a_2 + a_3$
Fault on path-2: $s_f \le a_1 + a_3$
Fault on path-3: $s_f \le a_1 + a_2$
Lemma: FFC is achieved when path-$i$'s weight is $a_i / (a_1 + a_2 + a_3)$.
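The lemma can be checked numerically for a single flow over link-disjoint paths. In the sketch below the source splits traffic by weight (allocation divided by total allocation) and, after a fault, rescales over the surviving paths. The function names and the example allocation `[4, 3, 3]` are assumptions chosen for illustration.

```python
def ffc_weights(alloc):
    """Splitting weights: each path's allocation over the total."""
    total = sum(alloc)
    return [a / total for a in alloc]


def loads_after_fault(flow_size, alloc, failed):
    """Per-path traffic after the paths whose indices are in `failed`
    are lost and the flow is rescaled over the surviving paths.
    Returns None if no allocated path survives."""
    residual = sum(a for i, a in enumerate(alloc) if i not in failed)
    if residual == 0:
        return None
    return [0.0 if i in failed else flow_size * a / residual
            for i, a in enumerate(alloc)]


def survives_any_single_fault(flow_size, alloc, eps=1e-9):
    """True if, for every single-path fault, no surviving path carries
    more than its own bandwidth allocation."""
    for i in range(len(alloc)):
        loads = loads_after_fault(flow_size, alloc, {i})
        if loads is None:
            return False
        if any(load > a + eps for load, a in zip(loads, alloc)):
            return False
    return True


alloc = [4, 3, 3]  # hypothetical allocations on path-1, path-2, path-3
print(ffc_weights(alloc))
print(survives_any_single_fault(6, alloc))  # 6 <= 3 + 3, 4 + 3, 4 + 3
print(survives_any_single_fault(7, alloc))  # 7 > 3 + 3 when path-1 fails
```

With these numbers a flow of size 6 fits under any single fault, while a flow of size 7 overloads path-2 and path-3 when path-1 fails. This matches the three inequalities on the slide.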
18
k-sum linear constraint group (k-sum group): given n paths and $A = \{a_1, a_2, \ldots, a_n\}$, FFC requires that the sum of arbitrary n-k elements in $A$ is ≥ flow size. This takes $O\left(\binom{n}{k}\right)$ constraints.
FFC-TE LP formulation: TE objective, basic TE constraints, and FFC constraints (k-sum group-1 through k-sum group-N), which are too many.
Lossless compression of a k-sum group, from $O\left(\binom{n}{k}\right)$ constraints to:
O(kn) with a bubble sorting network (SIGCOMM 2014)
O(n) with strong duality (MSR TR 2016)
http://www.hongqiangliu.com/publications.html
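To see why a k-sum group is compressible at all, it helps to compare the brute-force check with an equivalent one that only looks at the smallest elements. The sketch below works on fixed numbers and assumes 0 ≤ k ≤ n. It is not the LP compression from the cited papers: inside an LP the allocations are variables and cannot simply be sorted, which is the gap those techniques address. Function names are hypothetical.

```python
from itertools import combinations


def k_sum_brute_force(alloc, k, flow_size):
    """Check all C(n, k) constraints: every choice of n-k elements
    must sum to at least flow_size."""
    n = len(alloc)
    return all(sum(subset) >= flow_size
               for subset in combinations(alloc, n - k))


def k_sum_sorted(alloc, k, flow_size):
    """Equivalent check: the n-k smallest elements give the tightest
    constraint, so only their sum needs to be compared."""
    n = len(alloc)
    return sum(sorted(alloc)[: n - k]) >= flow_size


alloc = [4, 3, 3, 2]  # hypothetical per-path allocations
for k, flow_size in [(1, 8), (1, 9), (2, 5), (2, 6)]:
    print(k, flow_size,
          k_sum_brute_force(alloc, k, flow_size),
          k_sum_sorted(alloc, k, flow_size))
```

Both functions agree on every case. Only the constraint formed by the n-k smallest elements is ever binding, which is why the full group carries redundant information.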
21
[Figure: SDN controller and network, linked by the network view and the network configuration. Labels: network properties, network faults.]