Kulfi
Robust Traffic Engineering Using Semi-Oblivious Routing
Praveen Kumar, Yang Yuan, Chris Yu, Bobby Kleinberg, Robert Soulé, & Nate Foster
Cornell, Carnegie Mellon, Microsoft Research, & Lugano
Tastes great, no churn!
NetKAT
Probabilistic NetKAT
Nate Foster¹, Dexter Kozen¹, Konstantinos Mamouras²*, Mark Reitblatt³*, and Alexandra Silva⁴
¹ Cornell University, ² University of Pennsylvania, ³ Facebook, ⁴ University College London
[ESOP ’16] [PLDI ’16]
A Bus Ride...
“Why aren’t more algorithms researchers working on SDN?”
WAN Traffic Engineering
Network infrastructure is expensive!
Operators must balance latency-sensitive customer traffic with high-volume, operational traffic
Many competing objectives:
Balances load
Achieves low latency
Tolerates failures
Simple to implement
Challenges
[Figure labels: West, East]
Sporadic shortcuts
Sparse bisection
Unexpected Failures
Misprediction & Bursts
Device Limitations
Routing Scheme
send traffic from sources to destinations?
flows onto multiple forwarding paths?
Optimal Approach (Strawman MCF)
historical data
sending rates from solution
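For reference, the multi-commodity flow (MCF) strawman is commonly written as a linear program that minimizes the maximum link utilization. The formulation below is the standard textbook one, not taken from the slides, and all symbols are assumptions introduced here: d_{st} is the demand from s to t (presumably estimated from historical data), c_e is the capacity of link e, f_{st}(e) is the flow of commodity (s,t) on link e, and Z is the maximum link utilization.

```latex
\begin{aligned}
\min_{f,\,Z}\quad & Z \\
\text{s.t.}\quad
& \sum_{e \in \delta^{+}(v)} f_{st}(e) - \sum_{e \in \delta^{-}(v)} f_{st}(e) =
  \begin{cases} d_{st} & v = s \\ -d_{st} & v = t \\ 0 & \text{otherwise} \end{cases}
  && \forall (s,t),\ \forall v \\
& \sum_{(s,t)} f_{st}(e) \le Z \cdot c_e && \forall e \\
& f_{st}(e) \ge 0 && \forall (s,t),\ \forall e
\end{aligned}
```

Under this reading, the sending rates would come from decomposing the optimal flow f into paths.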
Centralized Traffic Engineering
SWAN & B4 [SIGCOMM ’13]
forwarding paths between each source and destination (e.g., K-shortest paths)
rates in response to (estimated or scheduled) demands
Talk Outline
Motivation
Randomized Routing
Evaluation
Conclusions
Randomized Routing
ECMP
least-cost paths
packet header fields
least-cost paths
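A minimal sketch of the ECMP idea, assuming the usual scheme in which a hash of packet header fields selects one of the precomputed least-cost paths. The function name, the 5-tuple, and the switch names are hypothetical, and real switches use their own hardware hash functions rather than SHA-256.

```python
import hashlib

def ecmp_select(flow, paths):
    """Pick one equal-cost path by hashing the flow's header fields."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]

# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.0.0.1", "10.0.1.9", 6, 43512, 443)
# Hypothetical least-cost paths between the same pair of switches.
paths = [["s1", "s2", "s4"], ["s1", "s3", "s4"]]
chosen = ecmp_select(flow, paths)
```

Because the choice depends only on the header fields, all packets of one flow follow the same path, which avoids reordering.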
Valiant Load Balancing
intermediate node
intermediate node
to destination
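A minimal sketch of Valiant load balancing, assuming the classic two-phase scheme: forward to a randomly chosen intermediate node, then from there to the destination. The graph, the helper names, and the use of hop-count shortest paths for each phase are assumptions made for illustration.

```python
import random
from collections import deque

def bfs_path(graph, src, dst):
    """Shortest hop-count path in an adjacency-list graph (assumed connected)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def vlb_route(graph, src, dst, rng):
    """Phase 1: src to a random intermediate node. Phase 2: intermediate to dst."""
    mid = rng.choice(sorted(graph))
    return bfs_path(graph, src, mid) + bfs_path(graph, mid, dst)[1:]

# Hypothetical four-node ring.
graph = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
route = vlb_route(graph, "a", "c", random.Random(0))
```

The detour through a random node spreads load regardless of the traffic pattern, at the cost of longer paths.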
Valiant Load Balancing
[Figure labels: West, East]
Oblivious Routing
A routing tree is an overlay in which nodes correspond to physical nodes and edges to physical paths
A randomized routing tree is a probability distribution over routing trees
Intuition: there is a duality between low-stretch routing trees and low-congestion routing schemes
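A minimal sketch of routing with a randomized routing tree: sample one tree from the distribution, then follow the unique tree path between source and destination. The data layout (a child-to-parent map per tree) and all names are assumptions. For brevity, each tree edge is treated as a direct link, whereas in a real routing tree each edge would expand to a physical path.

```python
import random

def tree_path(parent, src, dst):
    """Unique path between two nodes of a rooted tree given as child -> parent."""
    def to_root(node):
        chain = [node]
        while parent[node] is not None:
            node = parent[node]
            chain.append(node)
        return chain

    up, down = to_root(src), to_root(dst)
    # Trim shared ancestors so both chains end at the lowest common ancestor.
    while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
        up.pop()
        down.pop()
    return up + down[:-1][::-1]

def sample_route(trees, src, dst, rng):
    """trees is a list of (probability, parent_map) pairs."""
    weights = [prob for prob, _ in trees]
    _, parent = rng.choices(trees, weights=weights, k=1)[0]
    return tree_path(parent, src, dst)

# Two hypothetical routing trees over nodes a, b, c, d, equally likely.
star = {"a": None, "b": "a", "c": "a", "d": "a"}
chain = {"b": None, "a": "b", "c": "b", "d": "c"}
trees = [(0.5, star), (0.5, chain)]
route = sample_route(trees, "b", "d", random.Random(1))
```

The scheme is oblivious because the distribution over trees is fixed in advance and does not depend on the demands.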
Räcke’s Algorithm
Räcke’s algorithm iteratively constructs a randomized routing tree
At each iteration, it penalizes edges that have been heavily utilized in previous trees
Achieves a polylogarithmic competitive ratio with respect to the
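A toy illustration of the penalization loop only. This is not Räcke's algorithm: it swaps in minimum spanning trees for the low-stretch tree constructions the real algorithm relies on, charges every tree edge equally instead of by routed load, and returns uniform probabilities. The names and the `penalty` factor are assumptions.

```python
def spanning_tree(nodes, lengths):
    """Kruskal's algorithm: minimum spanning tree under the given edge lengths."""
    root = {n: n for n in nodes}

    def find(n):
        while root[n] != n:
            root[n] = root[root[n]]  # path halving
            n = root[n]
        return n

    tree = []
    for u, v in sorted(lengths, key=lambda e: (lengths[e], e)):
        ru, rv = find(u), find(v)
        if ru != rv:
            root[ru] = rv
            tree.append((u, v))
    return tree

def penalized_trees(nodes, edges, rounds, penalty=2.0):
    """Build one tree per round, lengthening edges used by earlier trees."""
    lengths = {e: 1.0 for e in edges}
    trees = []
    for _ in range(rounds):
        tree = spanning_tree(nodes, lengths)
        trees.append(tree)
        for e in tree:
            lengths[e] *= penalty  # used edges become less attractive
    return [(1.0 / rounds, tree) for tree in trees]

# Hypothetical triangle topology.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
distribution = penalized_trees(nodes, edges, rounds=3)
```

On the triangle, each round picks a different pair of edges, so no single link appears in every tree.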
Semi-Oblivious Routing
Semi-oblivious routing combines Räcke’s oblivious routing with dynamic rate adaptation / local failure recovery
Forwarding paths: computed statically
Sending rates: adapt to changing demands
👏 Hajiaghayi et al. proved an Ω(log(n)/log(log(n))) lower bound on the competitive ratio
👎 Realistic workloads differ from the worst case
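A minimal sketch of the split between static paths and adaptive rates. The paths are fixed inputs, and only the per-path traffic fractions change with demand. The greedy chunk-by-chunk assignment is a simple stand-in heuristic chosen for illustration, not the optimization the authors use, and all names and numbers are hypothetical.

```python
def adapt_rates(paths, demands, capacity, chunks=100):
    """Split each demand over its fixed paths, keeping max utilization low.

    paths:    {(src, dst): [path, ...]}, each path a list of nodes
    demands:  {(src, dst): volume}
    capacity: {(u, v): capacity} for every directed link on the paths
    Returns   {(src, dst): [fraction of demand per path]}.
    """
    load = {link: 0.0 for link in capacity}
    counts = {pair: [0] * len(paths[pair]) for pair in demands}

    def links(path):
        return list(zip(path, path[1:]))

    for _ in range(chunks):
        for pair, volume in demands.items():
            unit = volume / chunks

            def worst_utilization(path):
                return max((load[l] + unit) / capacity[l] for l in links(path))

            # Place this chunk on the path whose bottleneck stays least loaded.
            best = min(range(len(paths[pair])),
                       key=lambda i: worst_utilization(paths[pair][i]))
            counts[pair][best] += 1
            for l in links(paths[pair][best]):
                load[l] += unit
    return {pair: [c / chunks for c in cs] for pair, cs in counts.items()}

# Hypothetical example: two fixed paths, one with three times the capacity.
paths = {("s", "t"): [["s", "a", "t"], ["s", "b", "t"]]}
demands = {("s", "t"): 20.0}
capacity = {("s", "a"): 10.0, ("a", "t"): 10.0,
            ("s", "b"): 30.0, ("b", "t"): 30.0}
split = adapt_rates(paths, demands, capacity)[("s", "t")]
```

In this example roughly a quarter of the traffic lands on the low-capacity path, which equalizes utilization across the two paths without installing new forwarding rules.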
SDN Implementation & Evaluation
Kulfi Framework
Implemented over a dozen different traffic engineering schemes
Measured performance in simulator and hardware testbed with a variety of demands and failures
Used “local” failure recovery
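One plausible reading of “local” failure recovery, sketched under that assumption: drop the paths that cross a failed link and renormalize the remaining weights, without recomputing any paths. The slides do not spell out the mechanism, so the function and its behavior are hypothetical.

```python
def recover_locally(weighted_paths, failed_links):
    """Drop paths that cross a failed link, then renormalize the weights.

    weighted_paths: list of (path, weight), each path a list of nodes
    failed_links:   set of (u, v) links, treated as bidirectional
    """
    def crosses_failure(path):
        return any((u, v) in failed_links or (v, u) in failed_links
                   for u, v in zip(path, path[1:]))

    alive = [(p, w) for p, w in weighted_paths if not crosses_failure(p)]
    total = sum(w for _, w in alive)
    if total == 0:
        return []  # no surviving path for this source/destination pair
    return [(p, w / total) for p, w in alive]

# Hypothetical path set; link (a, t) fails.
weighted_paths = [(["s", "a", "t"], 0.5),
                  (["s", "b", "t"], 0.3),
                  (["s", "c", "t"], 0.2)]
recovered = recover_locally(weighted_paths, {("a", "t")})
```

Here the surviving paths keep their relative proportions, ending up with weights of about 0.6 and 0.4.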
Visualizing Routing Schemes
SDN Implementation
[Architecture diagram. Components: SDN Controller, SDN Switch, Linux End Host, Linux Kernel, Netfilter Module, User-Space Agent. Labels: data traffic, forwarding rules, traffic statistics, traffic matrix + path map, historical data]
Hardware Testbed
Facebook Backbone: Simulation
Constant factor
Abilene Topology
[Figure: Abilene topology with switches S1 through S12]
Emulated Abilene topology in hardware testbed
Used real-world and worst-case traffic scenarios
Compared shortest-path, ECMP, MCF, oblivious, and semi-oblivious
Artificial traffic
Abilene Topology: Simulated Workload
[Plot: Abilene Gravity + Artificial Traffic. Link congestion vs. time (minutes). Series: max and median link congestion for SPF, ECMP, Obliv, Semi Obliv, and MCF]
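A minimal sketch of how the plotted metrics could be computed, assuming link congestion means load divided by capacity on each link. The function name and the sample numbers are hypothetical.

```python
from statistics import median

def congestion_summary(load, capacity):
    """Return (max, median) link congestion, where congestion = load / capacity."""
    utilization = [load.get(link, 0.0) / capacity[link] for link in capacity]
    return max(utilization), median(utilization)

# Hypothetical loads on three links of capacity 10 each.
capacity = {("a", "b"): 10.0, ("b", "c"): 10.0, ("c", "d"): 10.0}
load = {("a", "b"): 5.0, ("b", "c"): 8.0, ("c", "d"): 2.0}
max_congestion, median_congestion = congestion_summary(load, capacity)
```

The maximum shows the bottleneck link, while the median indicates how evenly the rest of the network is loaded.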
Topology Zoo: Failures
[Plot axes: % Loss due to Failure; Time]
Selected Topology Zoo: Latency
[Plots for JANET and Geant. Axes: Fraction Delivered; Latency]
Conclusions
Randomization can dramatically simplify traffic engineering while balancing competing objectives
Oblivious routing performs much better in practice than expected, avoids problems associated with churn, and load-balances better
Semi-oblivious routing provides near-optimal performance in real-world scenarios, even in the presence of demand misprediction, traffic bursts, and failures
Ongoing work: working with a large ISP and a content provider to further refine and evaluate Kulfi
Team Kulfi
Chris Yu ’15, Praveen Kumar, Yang Yuan, Bobby Kleinberg, Robert Soulé
https://github.com/merlin-lang/kulfi
Topology Zoo, Traffic Burst
[Plot axes: Throughput; Burst Amount]