Traffic Engineering with Forward Fault Correction (FFC)
Hongqiang “Harry” Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang, David Gelernter (Yale University)
Cloud services require large network capacity
Cloud services bring growing traffic, and WAN capacity is costly (e.g., cost of WAN: $100M/year).
Traffic engineering (TE): a TE controller collects a network view from the network and pushes network configurations to it.
Frequent updates for high utilization.
Faults: control-plane faults and data-plane faults.
Control-plane faults: a switch may fail to apply the TE configuration sent by the TE controller, e.g., due to memory shortage, RPC failure, firmware bugs, or an overloaded CPU.
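Example: a configuration failure causes congestion
Switches s1, s2, s3, s4; link capacity: 10.
Old TE: flows s2→s4 (10) and s3→s4 (10) each send 7 on the direct link and 3 via s1, so link s1→s4 carries 6.
New flows (traffic demands): s1→s2 (10), s1→s3 (10), s1→s4 (10). The new TE moves s2→s4 and s3→s4 entirely onto their direct links (10 each), and s1→s4 uses link s1→s4 (10).
Configuration failure: s2 does not apply the new configuration and keeps sending 3 via s1, so link s1→s4 carries 10 + 3 = 13 > 10: congestion.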
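A minimal Python sketch of this example's arithmetic, assuming directed links, flow paths written as node sequences, and that a switch that misses the update keeps its flow's old split. The helper names (`link_loads`, `apply_with_failures`, `congested`) are hypothetical and not part of any TE system.

```python
from collections import defaultdict

CAPACITY = 10  # every link in the example has capacity 10


def link_loads(config):
    """config: {flow: [(node path, units), ...]} -> {directed link: total load}."""
    loads = defaultdict(int)
    for paths in config.values():
        for path, units in paths:
            for link in zip(path, path[1:]):  # consecutive nodes form a directed link
                loads[link] += units
    return dict(loads)


def apply_with_failures(old, new, failed_flows):
    """Flows whose ingress switch missed the update keep their old split.

    Assumes every failed flow already existed in the old configuration.
    """
    return {f: (old[f] if f in failed_flows else paths) for f, paths in new.items()}


def congested(loads, capacity=CAPACITY):
    """Return only the links whose load exceeds capacity."""
    return {link: load for link, load in loads.items() if load > capacity}


OLD = {
    "s2->s4": [(("s2", "s4"), 7), (("s2", "s1", "s4"), 3)],
    "s3->s4": [(("s3", "s4"), 7), (("s3", "s1", "s4"), 3)],
}
NEW = {
    "s1->s2": [(("s1", "s2"), 10)],
    "s1->s3": [(("s1", "s3"), 10)],
    "s1->s4": [(("s1", "s4"), 10)],
    "s2->s4": [(("s2", "s4"), 10)],
    "s3->s4": [(("s3", "s4"), 10)],
}

print(link_loads(OLD)[("s1", "s4")])  # 6
print(congested(link_loads(NEW)))  # {}
print(congested(link_loads(apply_with_failures(OLD, NEW, {"s2->s4"}))))  # {('s1', 's4'): 13}
```

The new TE alone is congestion-free; only the mix of s2's stale split with everyone else's new split overloads s1→s4.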
Link and switch failures
Same old TE (s2→s4 and s3→s4, 10 each, split 7 direct and 3 via s1; link capacity: 10).
Link failure on s2→s4: s2 moves all 10 units onto its path via s1, so link s1→s4 carries 10 + 3 = 13 > 10: congestion.
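The slide shows s2 shifting all of its traffic onto the surviving path. This Python sketch assumes the ingress switch rescales proportionally over the paths that survive (an assumption, since the slide does not state the rule); `links`, `rescale`, and `loads_after_link_failure` are hypothetical names.

```python
from collections import defaultdict


def links(path):
    """Directed links along a node path, e.g. ("s2", "s1", "s4") -> [("s2", "s1"), ("s1", "s4")]."""
    return list(zip(path, path[1:]))


def rescale(paths, failed_link):
    """Drop paths through the failed link and scale survivors up to keep the flow's total.

    Proportional rescaling is an assumption of this sketch.
    """
    total = sum(units for _, units in paths)
    alive = [(p, u) for p, u in paths if failed_link not in links(p)]
    alive_total = sum(u for _, u in alive)
    if alive_total == 0:
        return []  # no surviving path carries traffic: the flow is cut off
    return [(p, u * total / alive_total) for p, u in alive]


def loads_after_link_failure(config, failed_link):
    """Per-link load after every flow rescales around the failed link."""
    loads = defaultdict(float)
    for paths in config.values():
        for path, units in rescale(paths, failed_link):
            for link in links(path):
                loads[link] += units
    return dict(loads)


OLD = {
    "s2->s4": [(("s2", "s4"), 7), (("s2", "s1", "s4"), 3)],
    "s3->s4": [(("s3", "s4"), 7), (("s3", "s1", "s4"), 3)],
}

loads = loads_after_link_failure(OLD, ("s2", "s4"))
print(loads[("s2", "s1")], loads[("s1", "s4")])  # 10.0 13.0
```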
In production networks:
Control plane: fault rate = 0.1% to 1% per TE update.
Data plane: fault rate = 25% per 5 minutes.
Cannot prevent congestion
Slow (seconds to minutes)
Blocked by control-plane faults
Big loss in throughput
Analogy with Forward Error Correction (FEC):
FEC (packet loss): guarantees no information loss under up to k arbitrary packet drops, with careful data encoding.
FFC (network faults): guarantees no congestion under up to k arbitrary faults, with careful traffic distribution.
Control-plane FFC (k=1)
The new TE grants s1→s4 only 7 units; the other four flows still get 10 each. If s2 fails to apply the new configuration and keeps sending 3 via s1, link s1→s4 carries 7 + 3 = 10: no congestion.
Non-FFC: s1→s4 gets 10. Throughput: 50.
Control-plane FFC (k=1): s1→s4 gets 7. Throughput: 47.
Control-plane FFC (k=2): s1→s4 gets 4, so link s1→s4 stays within capacity even if both s2 and s3 miss the update (4 + 3 + 3 = 10). Throughput: 44.
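The throughput numbers follow from one rule on the contested link s1→s4: the grant to s1→s4 plus the stale traffic of any k switches that miss the update must fit in capacity 10. This Python sketch assumes each such switch sends its old 3 units over s1→s4 while the new TE puts none of s2→s4 or s3→s4 there; the names are hypothetical, and k=0 corresponds to non-FFC.

```python
from itertools import combinations

CAPACITY = 10
# Old units each flow still sends over link s1->s4 if its ingress switch misses the update;
# the new TE itself puts 0 units of these flows on that link.
STALE_ON_S1_S4 = {"s2->s4": 3, "s3->s4": 3}
OTHER_GRANTS = 4 * 10  # s1->s2, s1->s3, s2->s4, s3->s4 are each granted 10


def max_grant_s1_s4(k):
    """Largest grant for s1->s4 keeping link s1->s4 within capacity under any <= k update failures."""
    worst = max(
        sum(stale)
        for r in range(k + 1)
        for stale in combinations(STALE_ON_S1_S4.values(), r)
    )
    return CAPACITY - worst


for k in (0, 1, 2):
    grant = max_grant_s1_s4(k)
    print(f"k={k}: s1->s4 gets {grant}, throughput {OTHER_GRANTS + grant}")
# k=0: s1->s4 gets 10, throughput 50
# k=1: s1->s4 gets 7, throughput 47
# k=2: s1->s4 gets 4, throughput 44
```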
Two challenges:
Formulation: how to merge FFC into the existing TE framework?
Computation: how to find FFC-TE efficiently?
LP formulations
TE decisions: sizes of flows $b_f$; traffic on paths $l_{f,t}$ (traffic of flow $f$ on path $t$).
TE objective: maximizing throughput, $\max \sum_f b_f$.
Basic TE constraints:
Deliver all granted flows: $\forall f: \sum_t l_{f,t} \ge b_f$
No overloaded link: $\forall e: \sum_f \sum_{t \ni e} l_{f,t} \le c_e$ ($c_e$: capacity of link $e$)
FFC constraints: no overloaded link under up to $k_c$ control-plane faults, $k_e$ link failures, and $k_v$ switch failures.
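This is not an LP solver: it is a small Python check of a candidate TE decision against the two basic constraint families, plus the objective. The data layout (dicts keyed by flow and by (flow, path), with paths as tuples of link ids like "s1-s4") and the function names are hypothetical; the example data is the new TE from the 4-switch example.

```python
def check_basic_te(b, l, capacity, eps=1e-9):
    """Check a candidate TE decision against the basic constraints.

    b: {flow f: granted size b_f}
    l: {(flow f, path t): traffic l_{f,t}}, where a path is a tuple of link ids
    capacity: {link e: capacity c_e}
    """
    for f, size in b.items():
        delivered = sum(v for (g, _), v in l.items() if g == f)
        if delivered < size - eps:  # deliver all granted flows
            return False
    for e, cap in capacity.items():
        load = sum(v for (_, t), v in l.items() if e in t)
        if load > cap + eps:  # no overloaded link
            return False
    return True


def throughput(b):
    """TE objective: total granted traffic, sum_f b_f."""
    return sum(b.values())


capacity = {e: 10 for e in ("s1-s2", "s1-s3", "s1-s4", "s2-s1", "s3-s1", "s2-s4", "s3-s4")}
b = {"s1->s2": 10, "s1->s3": 10, "s1->s4": 10, "s2->s4": 10, "s3->s4": 10}
l = {
    ("s1->s2", ("s1-s2",)): 10,
    ("s1->s3", ("s1-s3",)): 10,
    ("s1->s4", ("s1-s4",)): 10,
    ("s2->s4", ("s2-s4",)): 10,
    ("s3->s4", ("s3-s4",)): 10,
}
print(check_basic_te(b, l, capacity), throughput(b))  # True 50
```

The FFC constraints would add more link-load checks of the same shape, one per fault scenario.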
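Control-plane FFC constraints on a link (switches s1, s2; flows f1, f2, f3)
$l_i^{old}$: $f_i$'s load in the old TE; $l_i^{new}$: $f_i$'s load in the new TE.
Total load under faults?
Fault on f1: $l_1^{old} + l_2^{new} + l_3^{new} \le$ link cap
Fault on f2: $l_1^{new} + l_2^{old} + l_3^{new} \le$ link cap
Fault on f3: $l_1^{new} + l_2^{new} + l_3^{old} \le$ link cap
With n flows and FFC protection k: #constraints = $\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{k}$ for each link.
Challenge: too many constraints.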
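A Python sketch of the naive approach: enumerate every set of 1..k faulty flows on one link and compute the load, assuming a faulty flow keeps its old load and the others use their new load. For illustration it reuses link s1→s4 from the 4-switch example (flows s2→s4, s3→s4, s1→s4, treating the new flow's old load as 0); the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def ffc_link_constraints(old, new, k):
    """Yield (faulty flows, total link load) for every set of 1..k faulty flows.

    old[i] / new[i]: flow i's load on this link in the old / new TE.
    A faulty flow is assumed to keep its old load; the others use the new one.
    """
    n = len(new)
    for r in range(1, k + 1):
        for faulty in combinations(range(n), r):
            yield faulty, sum(old[i] if i in faulty else new[i] for i in range(n))


def count_constraints(n, k):
    """C(n, 1) + C(n, 2) + ... + C(n, k) constraints per link."""
    return sum(comb(n, r) for r in range(1, k + 1))


old, new = [3, 3, 0], [0, 0, 7]  # s2->s4, s3->s4, s1->s4 on link s1->s4 (FFC k=1 grant)
for faulty, load in ffc_link_constraints(old, new, 2):
    print(faulty, load, "ok" if load <= 10 else "overloaded")
print(count_constraints(100, 3))  # 166750
```

With k=1 every scenario fits; the pair (s2, s3) is the one that overloads the link, and the count grows quickly with n and k.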
Our approach: a lossless compression from $O(\binom{n}{k})$ constraints to $O(kn)$ constraints.
Define $y_m$ as the m-th largest element in $X$: $\sum_{m=1}^{k} y_m \le$ link spare capacity.
Expressing $y_m$ with $X$?
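Why the compression is lossless, as a Python sketch. The slide does not define $X$; here we assume $x_i = \max(0, l_i^{old} - l_i^{new})$, the extra load flow $i$ adds if it is faulty, so the link spare capacity is link cap $- \sum_i l_i^{new}$. Under that assumption, the worst load over all sets of up to k faulty flows equals $\sum_i l_i^{new} + \sum_{m=1}^{k} y_m$, so one constraint replaces the whole enumeration. Function names are hypothetical.

```python
import heapq
from itertools import combinations


def worst_case_bruteforce(old, new, k):
    """Max link load over every set of up to k faulty flows (enumerates all subsets)."""
    n = len(new)
    return max(
        sum(old[i] if i in faulty else new[i] for i in range(n))
        for r in range(k + 1)
        for faulty in combinations(range(n), r)
    )


def worst_case_topk(old, new, k):
    """Same value via the k largest extra loads: sum(new) + y_1 + ... + y_k."""
    extra = [max(0, o - w) for o, w in zip(old, new)]  # x_i (assumed definition)
    return sum(new) + sum(heapq.nlargest(k, extra))


def ffc_ok(old, new, k, cap):
    """Compressed check: the k largest x_i fit in the link spare capacity."""
    spare = cap - sum(new)
    return sum(heapq.nlargest(k, [max(0, o - w) for o, w in zip(old, new)])) <= spare


print(worst_case_bruteforce([5, 2, 4], [3, 4, 2], 2))  # 13
print(worst_case_topk([5, 2, 4], [3, 4, 2], 2))  # 13
```

`heapq.nlargest` is used here only to evaluate the top-k sum numerically; inside an LP the $y_m$ must instead be expressed with linear constraints on $X$.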
A comparison: inputs $x_1, x_2$; outputs $z_1 = \max\{x_1, x_2\}$ and $z_2 = \min\{x_1, x_2\}$.
Comparisons are chained in rounds (1st round, 2nd round).
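One way to chain comparisons in rounds, assumed here, is a partial bubble sort: round m moves the m-th largest remaining value into place, so k rounds over n values use (n-1) + ... + (n-k) = O(kn) comparisons. This Python sketch evaluates the comparisons numerically and does not show how they are turned into linear LP constraints; `comparator` and `top_k_by_rounds` are hypothetical names.

```python
def comparator(x1, x2):
    """One comparison: z1 = max{x1, x2}, z2 = min{x1, x2}."""
    return max(x1, x2), min(x1, x2)


def top_k_by_rounds(xs, k):
    """Return (k largest values in descending order, number of comparisons used).

    Round m bubbles the largest of vals[m:] into position m.
    """
    vals = list(xs)
    n = len(vals)
    comparisons = 0
    for m in range(min(k, n - 1)):  # after n-1 rounds the list is fully sorted
        for i in range(n - 1, m, -1):  # sweep from the end toward position m
            vals[i - 1], vals[i] = comparator(vals[i - 1], vals[i])
            comparisons += 1
    return vals[:k], comparisons


print(top_k_by_rounds([2, 7, 1, 9, 4], 2))  # ([9, 7], 7)
```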
paths are disjoint.
A WAN with O(100) switches and O(1000) links.
Injecting faults based …
Scenarios: single-priority traffic in a well-provisioned network; multiple-priority traffic in a well-utilized network.
[Figure: ratio (%) of FFC throughput to optimal throughput, for single-priority traffic and for high priority (high FFC protection), medium priority (low FFC protection), and low priority (no FFC protection) traffic.]
[Figure: ratio of FFC data loss to non-FFC data loss; labeled values: 160% and <0.01%.]
Heavy network
High risk of congestion