SLIDE 1
Perseverance-Aware Traffic Engineering in Rate-Adaptive Networks with Reconfiguration Delay
Shih-Hao Tseng, (pronounced as “She-How Zen”) October 10, 2019
Department of Computing and Mathematical Sciences, California Institute of Technology
SLIDE 2 Optical Networks
- Modern wide-area networks consist of expensive optical fibers.
- The capacity of the optical fibers is determined by the
signal-to-noise ratio (SNR) and the adopted modulation (such as PSK, QAM, etc.).
[Figure: modulation, noise, and capacity]
SLIDE 3 Rate-Adaptive Networks
- In practice, SNR is much better than required.
- RADWAN (Singh et al., 2018) leverages bandwidth variable
transceivers (BVTs) to change the modulation and vary the capacity.
[Figure: modulation, noise, and capacity]
Singh et al., “RADWAN: Rate Adaptive Wide Area Network,” 2018.
SLIDE 5 Rate-Adaptive Networks: Challenge
- Reconfiguration delay: While the modulation is being changed, the
optical link is down for a while.
[Figure: reconfiguration delay]
SLIDE 7 One-Shot Update and Churn
- The reconfiguration delay causes traffic disturbance, which is
named churn in RADWAN.
- Adaptive links bring higher final throughput while causing
churn. RADWAN updates the links in one shot and addresses
the trade-off by
max (final throughput) − ε · (churn)
where ε is the trade-off factor.
SLIDE 8 One-Shot Update
- One-shot update leads to considerable traffic fluctuation.
[Figure: initial and final states]
SLIDE 9 Multi-Step Reconfiguration
- One-shot update leads to considerable traffic fluctuation.
- We can update links in batches to reduce the traffic
fluctuation by introducing intermediate steps similar to SWAN (Hong et al., 2013).
[Figure: initial, step 1, step 2, and final states]
Hong et al., “Achieving High Utilization with Software-Driven WAN,” 2013.
SLIDE 10 Multi-Step Reconfiguration and Perseverance
- Given a multi-step plan, we can consider not only the total
impact (churn) but also the smoothness of the update.
[Figure: throughput over steps 1 to 4, with the churn marked]
SLIDE 11 Multi-Step Reconfiguration and Perseverance
- Given a multi-step plan, we can consider not only the total
impact (churn) but also the smoothness of the update.
- We propose perseverance to describe the smoothness of the
transition. The perseverance level is defined as the maximum
allowed throughput drop between two consecutive steps.
[Figure: throughput over steps 1 to 4; each drop between consecutive steps is ≤ 40%, perseverance level = 40%]
SLIDE 12 Multi-Step Reconfiguration and Perseverance
- Taking perseverance into account, we consider the following
optimization, which differs from RADWAN’s churn-based proposal:
max (final throughput in T steps)
s.t. (perseverance level ≥ ρ)
where ρ is the lower bound on the perseverance level.
- A multi-step reconfiguration allows higher final throughput
without degrading the perseverance level.
SLIDE 13 Multi-Step Reconfiguration and Perseverance
[Figure: initial and final states; perseverance level ρ = 0]
SLIDE 14
Multi-Step Reconfiguration and Perseverance
[Figure: initial, step 1, step 2, and final states; perseverance level ρ = 0.5]
SLIDE 15 Rate Adaptation Planning (RAP) Problem
- Consider a network shared by N flows. Flow n sends at rate
$x_n(t)$ during step t. With horizon T, the discrete-time control
formulation of the rate adaptation planning (RAP) problem is
$$\text{RAP} = \max \sum_{n \in \mathcal{N}} x_n(T)$$
subject to capacity constraints, perseverance constraints, initial constraints, and feasibility constraints.
SLIDE 16 Rate Adaptation Planning (RAP) Problem: Constraints
- Perseverance constraints: Given a perseverance level ρ, the
perseverance constraint is $\rho\, x_n(t-1) \le x_n(t)$ for all $t = 1, 2, \dots, T$.
- Initial constraints: $x_n(0)$ is given for all n. Each link has an
initial capacity.
- Feasibility constraints: Each flow n has a predetermined
path set to send its traffic. $x_n(t)$ is the sum of the traffic along all the paths.
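The perseverance constraint can be checked mechanically for a single flow. The sketch below is only an illustration: the function names and the sample rates are hypothetical, not taken from the talk. It also shows the floor ρ^t · x(0) that follows from chaining the constraint over t steps.

```python
from typing import Sequence


def satisfies_perseverance(rates: Sequence[float], rho: float) -> bool:
    """Return True if rho * x(t-1) <= x(t) holds for every t = 1, ..., T.

    `rates` lists x(0), x(1), ..., x(T) for a single flow.
    """
    return all(rho * prev <= cur for prev, cur in zip(rates, rates[1:]))


def guaranteed_floor(x0: float, rho: float, t: int) -> float:
    """Lower bound rho**t * x(0) on x(t), obtained by chaining the constraint."""
    return (rho ** t) * x0


# Hypothetical rate sequence x(0), ..., x(3) in Gbps.
plan = [100.0, 60.0, 30.0, 80.0]
print(satisfies_perseverance(plan, rho=0.5))  # True: every step keeps at least half
print(satisfies_perseverance(plan, rho=0.8))  # False: 60 < 0.8 * 100
print(guaranteed_floor(100.0, 0.5, 3))        # 12.5
```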
SLIDE 17 Rate Adaptation Planning (RAP) Problem: Constraints
- Capacity constraints: The capacity $c_l$ of a link l is determined
by the adopted modulation (and the underlying SNR). Once the modulation is changed, the link is down for one step.
[Figure: timeline t = 0, 1, ..., T = 5]
SLIDE 19 Mixed Integer Linear Programming Formulation
- Under a fixed SNR, we can show that an optimal update plan
can be achieved by changing the modulation on each link l at most once, to the one providing the highest capacity.
- As such, we introduce the auxiliary integer variable $z_l(t)$ for
each link l to indicate whether the modulation of l has been changed by step t.
[Figure: timeline t = 0, 1, ..., T = 5 with $z_l(0) = 0$ and $z_l(1) = z_l(2) = z_l(3) = z_l(4) = z_l(5) = 1$]
SLIDE 20 Mixed Integer Linear Programming Formulation
[Figure: timeline t = 0, 1, ..., T = 5 with $z_l(0) = z_l(1) = z_l(2) = 0$ and $z_l(3) = z_l(4) = z_l(5) = 1$]
SLIDE 21 Mixed Integer Linear Programming Formulation
[Figure: timeline t = 0, 1, ..., T = 5 with $z_l(0) = z_l(1) = z_l(2) = z_l(3) = z_l(4) = z_l(5) = 0$]
SLIDE 22 Mixed Integer Linear Programming Formulation
- Transformed capacity constraints: Using $z_l(t)$, we can
write the capacity as
$$c_l(t) = c_l^{\min} \bigl(1 - z_l(t)\bigr) + c_l^{\max}\, z_l(t-1)$$
where $c_l^{\min}$ and $c_l^{\max}$ are the minimum and the maximum achievable capacity of the link under the SNR.
[Figure: timeline t = 0, 1, ..., T = 5 with $z_l(0) = 0$ and $z_l(1) = z_l(2) = z_l(3) = z_l(4) = z_l(5) = 1$]
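To see how the transformed capacity constraint encodes the one-step downtime, the following minimal sketch evaluates it on the indicator sequence shown on this slide. The capacities 100 and 200 Gbps and the function name are assumptions chosen for illustration.

```python
from typing import List, Sequence


def link_capacity(z: Sequence[int], c_min: float, c_max: float) -> List[float]:
    """Capacities c(1), ..., c(T) from indicators z(0), ..., z(T).

    Implements c(t) = c_min * (1 - z(t)) + c_max * z(t - 1).
    """
    return [c_min * (1 - z[t]) + c_max * z[t - 1] for t in range(1, len(z))]


# Indicator sequence from the slide: the modulation changes at step 1.
z = [0, 1, 1, 1, 1, 1]
# c_min = 100 and c_max = 200 (Gbps) are hypothetical values.
caps = link_capacity(z, c_min=100.0, c_max=200.0)
print(caps)  # [0.0, 200.0, 200.0, 200.0, 200.0]
```

The link carries nothing at the step where $z_l$ flips from 0 to 1 and runs at the maximum capacity afterwards; a link that never changes stays at the minimum capacity.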
SLIDE 23 Mixed Integer Linear Programming Formulation
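$$
\begin{aligned}
\max \quad & \sum_{n \in \mathcal{N}} x_n(T) && \text{(RAP)} \\
\text{s.t.} \quad & \text{capacity constraints} \\
& \text{perseverance constraints} \\
& \text{initial constraints} \\
& \text{feasibility constraints} \\
& z_l(t-1) \le z_l(t) && \forall t \in \mathcal{T},\ l \in \mathcal{L} \\
& z_l(t) \in \{0, 1\} && \forall t \in \mathcal{T},\ l \in \mathcal{L}
\end{aligned}
$$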
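The formulation can be exercised by brute force on a toy instance. The sketch below is not the MILP solver used in the talk: it assumes a single flow routed over parallel links, so the capacity and feasibility constraints collapse to "rate ≤ total capacity", and it enumerates every switch step per link. All names and numbers are hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Link = Tuple[float, float]  # (c_min, c_max), hypothetical capacities in Gbps


def capacity(switch: Optional[int], t: int, c_min: float, c_max: float) -> float:
    """c(t) = c_min * (1 - z(t)) + c_max * z(t - 1), where z(t) = 1 iff t >= switch."""
    z_t = int(switch is not None and t >= switch)
    z_prev = int(switch is not None and t - 1 >= switch)
    return c_min * (1 - z_t) + c_max * z_prev


def best_plan(links: Sequence[Link], x0: float, rho: float, horizon: int):
    """Try every switch step per link; return (best final rate, switch steps)."""
    choices = [None, *range(1, horizon + 1)]  # None means the link never switches
    best_rate, best_switches = -1.0, None
    for switches in product(choices, repeat=len(links)):
        total = [
            sum(capacity(s, t, lo, hi) for s, (lo, hi) in zip(switches, links))
            for t in range(1, horizon + 1)
        ]
        # With one flow, the slowest allowed decay is x(t) = rho**t * x0, so a
        # plan is feasible iff the total capacity covers that floor at every step.
        feasible = all(total[t - 1] >= rho ** t * x0 for t in range(1, horizon + 1))
        if feasible and total[-1] > best_rate:
            best_rate, best_switches = total[-1], switches
    return best_rate, best_switches


links = [(100.0, 200.0), (100.0, 200.0)]
for horizon in (1, 2, 3):
    print(horizon, best_plan(links, x0=200.0, rho=0.5, horizon=horizon))
```

Under these assumed numbers with ρ = 0.5, the best final rate grows from 200 to 300 to 400 Gbps as the horizon goes from 1 to 3 steps, because the two links must be switched in different steps to keep at least half of the traffic flowing. With ρ = 0 both links can be switched at once and 400 Gbps is reached in 2 steps.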
SLIDE 30 Analysis of Rate Adaptation Planning (RAP) Problem
- Can we solve RAP in polynomial time?
→ Unlikely, RAP is NP-hard.
- Can we approximate RAP within a constant factor?
→ No, unless P=NP.
→ Under some mild assumptions, we can always reach the
optimal final throughput, although the update sequence
may be extremely long.
→ We would prefer to finish the update in a bounded number
of steps. Therefore, we need good heuristics for RAP.
SLIDE 32 Algorithm Design Ideas
- Find a feasible reconfiguration plan.
- Fix the configuration (i.e., which links change their
modulation and when) and maximize the usage of available links.
In sum, we design our algorithm (ALG) to
1. solve a 2-step LP relaxation at time t with relaxed $z_l(t) \in [0, 1]$;
2. round $z_l(t)$ up to integers to form the configuration at t;
3. iterate through t = 1, ..., T − 1 to obtain the configurations and find the work-conserving reconfiguration plan.
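The rounding step can be sketched in isolation. This is one plausible reading of "round up", not the talk's implementation: the LP solve is not shown, the link names are hypothetical, the numerical tolerance is an added assumption, and monotonicity is enforced explicitly to respect $z_l(t-1) \le z_l(t)$.

```python
from typing import Dict


def round_up_configuration(
    z_relaxed: Dict[str, float],
    z_previous: Dict[str, int],
    tol: float = 1e-9,
) -> Dict[str, int]:
    """Round relaxed indicators up to {0, 1} while keeping z(t - 1) <= z(t)."""
    rounded = {}
    for link, value in z_relaxed.items():
        up = 0 if value <= tol else 1  # any strictly positive value rounds up to 1
        rounded[link] = max(up, z_previous.get(link, 0))  # never undo a change
    return rounded


# Hypothetical relaxed LP output for three links at some step t.
relaxed = {"a": 0.0, "b": 0.3, "c": 1.0}
previous = {"a": 0, "b": 0, "c": 1}
print(round_up_configuration(relaxed, previous))  # {'a': 0, 'b': 1, 'c': 1}
```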
SLIDE 33 Proposed Algorithm (ALG)
[Figure: ALG over t = 1, ..., T = 5, with stages labeled 1, 2, 3. For t = 1, ..., 4: constraint $x_n(t) \ge \rho^t x_n(0)$, objective $\max \sum_{n \in \mathcal{N}} x_n(t+1)$, and configuration $z_l(t)$. At t = T = 5: objective $\max \sum_{t \in \mathcal{T}} x_n(t)$. Outputs: $x_n^p(1), \dots, x_n^p(5)$.]
SLIDE 34 Simulations: Questions of Interest
- Is it still beneficial to have rate-adaptive links under the
reconfiguration delay and perseverance constraints?
- What is a reasonable perseverance level?
- How well does perseverance smoothen the process?
- How hard is it to find a perseverance-aware solution?
SLIDE 35 Simulation Setup
- We simulate RADWAN, the optimal solution to RAP (OPT),
and the proposed algorithm (ALG) on existing WAN topologies: SWAN, Internet2, and B4.
- The baseline case: T = 5 and ρ = 0.5.
[Figure: topologies (a) SWAN (8 nodes, 12 links) and (b) B4 (18 nodes, 39 links)]
SLIDE 37 Advantage of Rate Adaptive Links
Table 1: Average throughput (Gbps) under different WANs. Our methods OPT and ALG boost the throughput by 40% to 50% while ensuring a steadier reconfiguration plan.

topology    ρ ≈ 1 (w/o adaptive links)   ρ = 0.5 (OPT)   ρ = 0.5 (ALG)   ρ ≈ 0 (RADWAN†)
SWAN        681.85                       998.623         998.611         998.625
Internet2   1071.15                      1510.13         1509.89         1510.15
B4          2621.12                      3919.81         3919.14         3919.87

†We also simulate RADWAN with ε = 0.001, and the resulting average throughput
remains the same as with ε = 0.1.
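The 40% to 50% claim can be reproduced directly from the Table 1 figures. The snippet is only a convenience check; the helper name is hypothetical.

```python
# Average throughput (Gbps) from Table 1: (without adaptive links, OPT, ALG).
table1 = {
    "SWAN": (681.85, 998.623, 998.611),
    "Internet2": (1071.15, 1510.13, 1509.89),
    "B4": (2621.12, 3919.81, 3919.14),
}


def gain_percent(baseline: float, value: float) -> float:
    """Relative throughput gain over the baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


for name, (base, opt, alg) in table1.items():
    print(f"{name}: OPT +{gain_percent(base, opt):.1f}%, ALG +{gain_percent(base, alg):.1f}%")
```

Every gain printed falls between 40% and 50%, with ALG within a fraction of a percent of OPT on all three topologies.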
SLIDE 38 Convergence under Different Perseverance Levels
- Is it still beneficial to have rate-adaptive links under the
reconfiguration delay and perseverance constraints?
- What is a reasonable perseverance level?
→ How does a perseverance level slow down the convergence to the final throughput?
- How well does perseverance smoothen the process?
- How hard is it to find a perseverance-aware solution?
SLIDE 39
Convergence under Different Perseverance Levels
[Figure: axes are step (1 to 10) and throughput (Gbps); curves for ρ = 0.5, 0.55, 0.75, 0.8, 0.85, 0.9]
Figure 2: Larger perseverance levels ρ (boxed values) prevent aggressive updates with large disturbance; hence, ALG takes more steps to converge to the maximum throughput.
SLIDE 40
Convergence under Different Perseverance Levels
[Figure: axes are number of steps for convergence (1 to 10) and perseverance level ρ (0.5 to 0.95)]
Figure 3: The 1st, 5th, 50th, 95th, and 99th percentiles of the minimum number of steps needed for throughput convergence. When ρ = 0.5, ALG converges in 5 steps in 99% of the 1000 random cases.
SLIDE 42 Mitigation of Transition Fluctuation
[Figure: maximum throughput deviation (Gbps) of OPT, ALG, and RADWAN on (a) SWAN (8 nodes, 12 links) and (b) B4 (18 nodes, 39 links)]
SLIDE 43 Comparison of Computation Overhead
- Is it still beneficial to have rate-adaptive links under the
reconfiguration delay and perseverance constraints?
- What is a reasonable perseverance level?
→ How does a perseverance level slow down the convergence to the final throughput?
- How well does perseverance smoothen the process?
- How hard is it to find a perseverance-aware solution?
→ What are the computation overheads?
SLIDE 44 Comparison of Computation Overhead
Table 2: Average CPU computation time (ms) and the fraction ALG/OPT. ALG uses much less time and scales better than OPT (the current reconfiguration downtime is 68 s, which can potentially be reduced to 35 ms).

topology    OPT         ALG     fraction ALG/OPT
SWAN        67.7        15.0    22.2%
Internet2   497.1       39.6    8.0%
B4          1.2 × 10^6  332.0   0.03%
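The fraction column follows from dividing the ALG time by the OPT time. The check below uses the Table 2 values; labeling the first row as SWAN is an assumption based on the topology order used elsewhere in the talk.

```python
# Average CPU time (ms) from Table 2: (OPT, ALG).
# The first row's label is assumed to be SWAN; it is missing in the extracted table.
cpu_ms = {
    "SWAN": (67.7, 15.0),
    "Internet2": (497.1, 39.6),
    "B4": (1.2e6, 332.0),
}

fractions = {name: 100.0 * alg / opt for name, (opt, alg) in cpu_ms.items()}

for name, pct in fractions.items():
    print(f"{name}: ALG takes {pct:.2f}% of OPT's time")
```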
SLIDE 45 Conclusion
- Instead of one-shot update and churn, we introduce the idea
of perseverance for a multi-step reconfiguration.
- We propose an efficient algorithm (ALG) to approach RAP.
The proposed algorithm improves the overall throughput and smoothens the transition with small computation overhead.
- Besides perseverance, the network operators might maintain
some other properties (such as throughput level) during a multi-step reconfiguration. It is possible to extend the proposed multi-step update framework and examine some
SLIDE 46
Questions & Answers
SLIDE 47 References
- R. Singh, M. Ghobadi, K.-T. Foerster, M. Filer, and P. Gill,
“RADWAN: Rate adaptive wide area network,” in Proc. ACM SIGCOMM. ACM, 2018, pp. 547–560.
- C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill,
M. Nanduri, and R. Wattenhofer, “Achieving high utilization
with software-driven WAN,” ACM SIGCOMM CCR, vol. 43,