SMORE: Semi-Oblivious Traffic Engineering



slide-1
SLIDE 1

SMORE: Semi-Oblivious Traffic Engineering

Praveen Kumar* Yang Yuan* Chris Yu‡ Nate Foster* 


Robert Kleinberg* Petr Lapukhov# Chiun Lin Lim# Robert Soulé §

* Cornell

‡ CMU # Facebook § USI Lugano

slide-2
SLIDE 2

WAN Traffic Engineering

slide-4
SLIDE 4

WAN Traffic Engineering

Objectives:

  • Performance
  • Robustness
  • Latency
  • Operational simplicity

Challenges:

  • Unstructured topology
  • Unexpected failures
  • Misprediction & traffic bursts
  • Heterogeneous capacity
  • Update overheads
  • Device limitations

slide-5
SLIDE 5

TE Approaches

  • Traditional: distributed
  • SDN-based: centralized

[Figure: example topology; most links labeled 1, one link labeled 100]

slide-9
SLIDE 9

TE Approaches

  • Traditional: distributed
  • SDN-based: centralized

Optimal TE? (MCF)

[Figure: example topology; most links labeled 1, two links labeled 100]
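
For reference, one common way to write the multi-commodity flow (MCF) problem for TE is as a linear program that minimizes the maximum link utilization. The formulation below is a standard textbook sketch, not taken from the slides: the symbols $f^{st}_{e}$ (flow of demand $(s,t)$ on link $e$), $d_{st}$ (demand), $c_e$ (link capacity) and $Z$ (maximum utilization) are assumed names, and the objective the authors use may differ.

```latex
\begin{aligned}
\min\quad & Z \\
\text{s.t.}\quad
& \sum_{v:(u,v)\in E} f^{st}_{uv} - \sum_{v:(v,u)\in E} f^{st}_{vu} =
  \begin{cases} d_{st} & u = s \\ -d_{st} & u = t \\ 0 & \text{otherwise} \end{cases}
  && \forall (s,t),\ \forall u \in V \\
& \sum_{(s,t)} f^{st}_{e} \le Z \cdot c_e && \forall e \in E \\
& f^{st}_{e} \ge 0 && \forall (s,t),\ \forall e \in E
\end{aligned}
```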

slide-10
SLIDE 10

Operational Cost of Optimality

[Figure: Solver Time. Axes: Time (seconds), Traffic Matrix]

slide-11
SLIDE 11

Operational Cost of Optimality

[Figure: Path Churn. Axes: Churn (# paths), Traffic Matrix]
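
The slides do not define how churn is counted. A minimal sketch, assuming churn is the number of paths that have to be added or removed when moving from one TE solution to the next (the function name and the example paths are hypothetical):

```python
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]                       # a path is a tuple of node names
PathSet = Dict[Tuple[str, str], Set[Path]]   # (src, dst) -> installed paths


def path_churn(old: PathSet, new: PathSet) -> int:
    """Count paths that must be installed or removed to move from old to new."""
    churn = 0
    for pair in set(old) | set(new):
        before = old.get(pair, set())
        after = new.get(pair, set())
        churn += len(before ^ after)   # symmetric difference: added + removed
    return churn


# Hypothetical solutions for two consecutive traffic matrices.
old_paths: PathSet = {("A", "D"): {("A", "B", "D"), ("A", "C", "D")}}
new_paths: PathSet = {("A", "D"): {("A", "B", "D"), ("A", "E", "D")}}
print(path_churn(old_paths, new_paths))
```

Under this definition, a scheme that keeps its path set fixed across traffic matrices has zero churn by construction.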

slide-14
SLIDE 14

Towards a Practical Model

[Diagram: two-stage TE model]

  1. Path Selection (static): Topology (+ demands) → Paths
  2. Rate Adaptation (dynamic): Paths + Demands → Splitting Ratio

Computing and updating paths is typically expensive and slow. But updating splitting ratios is cheap and fast!
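
To make the static/dynamic split concrete, here is a small sketch in which the path set is fixed and only the splitting ratios change. The topology, capacities, demand and function name are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Static stage: paths per (src, dst) pair, computed once (hypothetical values).
paths = {("A", "D"): [("A", "B", "D"), ("A", "C", "D")]}
capacity = {("A", "B"): 10.0, ("B", "D"): 10.0, ("A", "C"): 10.0, ("C", "D"): 10.0}


def link_utilization(paths, splits, demands, capacity):
    """Return {link: load / capacity} for the given splitting ratios."""
    load = defaultdict(float)
    for pair, demand in demands.items():
        for path, ratio in zip(paths[pair], splits[pair]):
            for link in zip(path, path[1:]):      # consecutive node pairs
                load[link] += demand * ratio
    return {link: load[link] / cap for link, cap in capacity.items()}


demands = {("A", "D"): 12.0}
# Dynamic stage: only the ratios change; the installed paths stay the same.
skewed = link_utilization(paths, {("A", "D"): [1.0, 0.0]}, demands, capacity)
balanced = link_utilization(paths, {("A", "D"): [0.5, 0.5]}, demands, capacity)
print(max(skewed.values()), max(balanced.values()))
```

In this toy case, sending everything on one path overloads it, while an even split keeps both paths below capacity, and no path had to be recomputed or reinstalled.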

slide-15
SLIDE 15

Path Selection Challenges

  • Selecting a good set of paths is tricky!
  • Route the demands (ideally, with competitive latency)
  • React to changes in demands (diurnal changes, traffic bursts, etc.)
  • Be robust under mis-prediction of demands
  • Have sufficient extra capacity to route demands in presence of failures
  • and more …
slide-16
SLIDE 16

Approach

A static set of cleverly-constructed paths can provide near-optimal performance and robustness!

Desired path properties:

  • Low stretch for minimizing latency
  • High diversity for ensuring robustness
  • Good load balancing for performance:
      • Capacity aware
      • Globally optimized

slide-17
SLIDE 17

Path Properties: Capacity Aware

  • Traditional approaches to routing based on shortest paths (e.g., ECMP, KSP) are generally not capacity aware

[Figure: example topology with nodes A to G; links of 100 Gbps and 10 Gbps]
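
A small numeric sketch of why ignoring capacity hurts. It assumes two equal-cost paths whose bottleneck links are 100 Gbps and 10 Gbps, and a hypothetical 55 Gbps demand. This is not the A to G topology from the figure, and the function names are made up for illustration.

```python
def equal_split(demand, bottlenecks):
    """ECMP-style split: every path gets the same share, whatever its capacity."""
    share = demand / len(bottlenecks)
    return [share for _ in bottlenecks]


def capacity_weighted_split(demand, bottlenecks):
    """Capacity-aware split: share proportional to each path's bottleneck."""
    total = sum(bottlenecks)
    return [demand * c / total for c in bottlenecks]


def max_utilization(flows, bottlenecks):
    return max(f / c for f, c in zip(flows, bottlenecks))


bottlenecks = [100.0, 10.0]   # Gbps, one value per path (assumed)
demand = 55.0                 # Gbps (assumed)

ecmp = max_utilization(equal_split(demand, bottlenecks), bottlenecks)
aware = max_utilization(capacity_weighted_split(demand, bottlenecks), bottlenecks)
print(ecmp, aware)
```

With these numbers the equal split pushes 27.5 Gbps onto the 10 Gbps path, while the capacity-weighted split keeps both paths at the same utilization.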

slide-20
SLIDE 20

Path Properties: Globally Optimal

Other approaches based on greedy algorithms are capacity aware, but are still not globally optimal

[Figure: example topology with nodes A to G, comparing CSPF with the globally optimal routing]
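
The sketch below illustrates the general point with a greedy, CSPF-like allocator on a hypothetical six-link topology (not the one in the figure). Each demand takes the shortest path that still has enough residual capacity, so the outcome depends on the order in which demands are placed. The helper names and the hop-count metric are assumptions.

```python
from collections import deque


def cspf(residual, src, dst, demand):
    """Shortest (hop-count) path using only links with enough residual capacity."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for (u, v), cap in residual.items():
            if u == node and v not in parent and cap >= demand:
                parent[v] = node
                queue.append(v)
    if dst not in parent:
        return None                     # no feasible path left
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def greedy_cspf(capacity, demands):
    """Place demands one at a time, reserving capacity along each chosen path."""
    residual = dict(capacity)
    placed = {}
    for (src, dst), demand in demands:
        path = cspf(residual, src, dst, demand)
        placed[(src, dst)] = path
        if path:
            for link in zip(path, path[1:]):
                residual[link] -= demand
    return placed


# Hypothetical directed links, 10 units each.
capacity = {("A", "M"): 10, ("M", "C"): 10, ("A", "X"): 10,
            ("X", "Y"): 10, ("Y", "C"): 10, ("B", "M"): 10}

a_first = greedy_cspf(capacity, [(("A", "C"), 10), (("B", "C"), 10)])
b_first = greedy_cspf(capacity, [(("B", "C"), 10), (("A", "C"), 10)])
print(a_first)
print(b_first)
```

When A is placed first it takes the short path through M and blocks B, whose only route also uses M. Placing B first leaves A the longer detour and both demands fit, which is what a global optimizer would find regardless of order.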

slide-25
SLIDE 25

Path Selection

Algorithm | Load balanced: Capacity aware | Load balanced: Globally Optimized | Diverse | Low-stretch
SPF / ECMP | ❌ | ❌ | ❌ | ✔
CSPF | ✔ | ❌ | ❌ | ✔
k-shortest paths | ❌ | ❌ | ? | ✔
Edge-disjoint KSP | ❌ | ❌ | ✔ | ✔
MCF | ✔ | ✔ | ❌ | ❌
VLB | ❌ | ❌ | ✔ | ❌
B4 | ✔ | ✔ | ❌ | ?

? - Difficult to generalize

slide-31
SLIDE 31

Oblivious Routing

slide-32
SLIDE 32

VLB

  • Route through random intermediate node
  • Works well for mesh topologies
  • WANs are not mesh-like
  • Good resilience
  • Poor performance & latency

[Figure: mesh topology with nodes 1, 2, 3, 4, …, N]
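
A minimal sketch of the "route through random intermediate node" idea on a full mesh, where every node pair is assumed to have a direct link. The function names are hypothetical, and this variant excludes the source and destination as intermediates; other formulations of VLB allow them.

```python
import random


def vlb_path(src, dst, nodes, rng=random):
    """Two-phase routing: src -> random intermediate -> dst."""
    candidates = [n for n in nodes if n not in (src, dst)]
    mid = rng.choice(candidates)
    return (src, mid, dst)


def vlb_split(src, dst, nodes):
    """Expected traffic share per two-hop path under a uniform random choice."""
    candidates = [n for n in nodes if n not in (src, dst)]
    return {(src, mid, dst): 1.0 / len(candidates) for mid in candidates}


nodes = [1, 2, 3, 4, 5]
path = vlb_path(1, 4, nodes, random.Random(0))   # seeded for repeatability
shares = vlb_split(1, 4, nodes)
print(path, shares)
```

The choice ignores both demands and capacities, and every packet takes two hops even when a direct link exists, which matches the resilience and latency trade-off listed above.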

slide-34
SLIDE 34

VLB

  • Route through random intermediate node
  • Works well for mesh topologies
  • WANs are not mesh-like
  • Good resilience
  • Poor performance & latency

[Figure: a non-mesh topology]
slide-36
SLIDE 36

Oblivious [Räcke ’08]

  • Generalizes VLB to non-mesh
  • Distribution over routing trees
  • Approximation algorithm for low-stretch trees [FRT ’04]
  • Penalize links based on usage
  • O(log n) competitive

[Figure: a non-mesh topology and a probability distribution over low-stretch routing trees]
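
A toy sketch of how a distribution over routing trees yields a set of weighted paths per node pair. It simplifies heavily: the trees are plain spanning trees of a hypothetical four-node ring, and the probabilities are made up. The actual construction (building the low-stretch trees and penalizing used links) is not shown.

```python
from collections import defaultdict


def tree_path(parent, src, dst):
    """Unique path between src and dst in a tree given as child -> parent."""
    def to_root(node):
        chain = [node]
        while parent.get(chain[-1]) is not None:
            chain.append(parent[chain[-1]])
        return chain

    up, down = to_root(src), to_root(dst)
    down_set = set(down)
    common = next(n for n in up if n in down_set)   # lowest common ancestor
    return tuple(up[:up.index(common) + 1] + down[:down.index(common)][::-1])


def path_distribution(trees, src, dst):
    """Aggregate tree probabilities into per-path weights for one node pair."""
    dist = defaultdict(float)
    for prob, parent in trees:
        dist[tree_path(parent, src, dst)] += prob
    return dict(dist)


# Ring A-B-C-D-A; each tree drops one ring link (hypothetical probabilities).
trees = [
    (0.5, {"A": None, "B": "A", "C": "B", "D": "A"}),   # omits link C-D
    (0.5, {"A": None, "B": "A", "D": "A", "C": "D"}),   # omits link B-C
]
dist = path_distribution(trees, "A", "C")
print(dist)
```

Each tree contributes exactly one path per pair, so the number of trees bounds the number of paths, and the tree probabilities double as initial path weights.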

slide-39
SLIDE 39

Path Selection

Algorithm | Load balanced: Capacity aware | Load balanced: Globally Optimized | Diverse | Low-stretch
SPF / ECMP | ❌ | ❌ | ❌ | ✔
CSPF | ✔ | ❌ | ❌ | ✔
k-shortest paths | ❌ | ❌ | ? | ✔
Edge-disjoint KSP | ❌ | ❌ | ✔ | ✔
MCF | ✔ | ✔ | ❌ | ❌
VLB | ❌ | ❌ | ✔ | ❌
B4 | ✔ | ✔ | ❌ | ?
SMORE / Oblivious | ✔ | ✔ | ✔ | ✔

slide-40
SLIDE 40

SMORE: Semi-Oblivious Routing

  • Path Selection: Oblivious Routing computes a set of paths which are low-stretch, robust and have good load balancing properties
  • Rate Adaptation: LP Optimizer balances load by dynamically adjusting splitting ratios used to map incoming traffic flows to paths

Semi-Oblivious Traffic Engineering: The Road Not Taken [NSDI ’18]
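
The rate-adaptation step can be sketched as a linear program over the fixed path set. The slide only says that an LP optimizer balances load, so the objective here (minimize the maximum link utilization $Z$) is an assumption, as are the symbols: $P_{st}$ is the set of precomputed paths for pair $(s,t)$, $x_p$ the splitting ratio of path $p$, $d_{st}$ the current demand and $c_e$ the capacity of link $e$.

```latex
\begin{aligned}
\min\quad & Z \\
\text{s.t.}\quad
& \sum_{p \in P_{st}} x_p = 1 && \forall (s,t) \\
& \sum_{(s,t)} \; \sum_{p \in P_{st} :\, e \in p} d_{st}\, x_p \le Z \cdot c_e && \forall e \in E \\
& x_p \ge 0 && \forall p
\end{aligned}
```

Because the variables are ratios over a small, fixed set of paths rather than per-link flows, such a program is much smaller than an edge-based MCF on the same topology.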

slide-41
SLIDE 41

Semi-Oblivious Routing in Practice?

  • Previous work [Hajiaghayi et al.] established a worst-case competitive ratio that is not much better than oblivious routing: Ω(log(n) / log(log(n)))
  • But the real world does not typically exhibit worst-case scenarios
  • Implicit correlation between demands and link capacities

Question: How well does semi-oblivious routing perform in practice?
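
To get a feel for "not much better", the snippet below compares the growth terms log n and log n / log log n for a few network sizes. It is only a loose illustration: it ignores the constants hidden by the asymptotic notation, uses natural logarithms, and compares an upper bound for one scheme with a lower bound for the other.

```python
import math


def oblivious_term(n):
    """Growth term of the O(log n) oblivious routing bound."""
    return math.log(n)


def semi_oblivious_term(n):
    """Growth term of the log n / log log n semi-oblivious lower bound."""
    return math.log(n) / math.log(math.log(n))


for n in (30, 100, 1000, 10**6):
    print(n, round(oblivious_term(n), 2), round(semi_oblivious_term(n), 2))
```

Even at a million nodes the two terms differ by less than a factor of three, so the worst-case theory alone gives little reason to expect a large gain.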

slide-42
SLIDE 42

Evaluation

slide-43
SLIDE 43

YATES

Facebook’s Backbone Network

Source: https://research.fb.com/robust-and-efficient-traffic-engineering-with-oblivious-routing/

YATES: Rapid Prototyping for Traffic Engineering Systems [SOSR ’18]

slide-44
SLIDE 44

Performance

[Figure: Throughput, Congestion Drop, and Max. Link Utilization. Axes: Metric, Time]
slide-46
SLIDE 46

Robustness

[Figure: Throughput, Congestion Drop, Failure Drop, and Max. Link Utilization. Axes: Metric, Time]

slide-47
SLIDE 47

Robustness

Path budget = 4

[Figure: Throughput, Congestion Drop, Failure Drop, and Max. Link Utilization. Axes: Metric, Time]
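
The slides do not say how traffic is moved when a link fails. One simple possibility, sketched here purely as an assumption, is to drop the paths that cross a failed link and renormalize the splitting ratios over the surviving paths. The function name and example values are hypothetical, and links are treated as directed.

```python
def surviving_splits(paths, splits, failed_links):
    """Drop paths that cross a failed link and renormalize the remaining ratios."""
    alive = [(p, r) for p, r in zip(paths, splits)
             if not any(link in failed_links for link in zip(p, p[1:]))]
    total = sum(r for _, r in alive)
    if total == 0:
        return {}   # nothing usable is left: traffic for this pair is dropped
    return {p: r / total for p, r in alive}


paths = [("A", "B", "D"), ("A", "C", "D"), ("A", "E", "D")]
splits = [0.5, 0.3, 0.2]
after_failure = surviving_splits(paths, splits, {("A", "B")})
print(after_failure)
```

With a diverse path set, a single failure removes only part of the paths, so the remaining ones can absorb the traffic without any new path being installed.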

slide-48
SLIDE 48

Operational Constraints - Path Budget

[Figure: Max. Link Utilization vs. Path Budget for Optimal, SMORE, MCF, KSP+MCF, and R-MCF; annotation: 4-8x]
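
A path budget caps how many paths may be installed per node pair. How each scheme in the figure enforces it is not stated on the slide. The sketch below assumes the simplest rule: keep the highest-weight paths and renormalize. The function name and weights are hypothetical.

```python
def apply_path_budget(weights, budget):
    """Keep the `budget` highest-weight paths and renormalize their weights."""
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:budget]
    total = sum(w for _, w in top)
    return {path: w / total for path, w in top}


# Hypothetical path weights for one node pair.
weights = {"p1": 0.40, "p2": 0.25, "p3": 0.20, "p4": 0.10, "p5": 0.05}
pruned = apply_path_budget(weights, budget=4)
print(pruned)
```
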
slide-49
SLIDE 49

Large Scale Simulations

  • Conducted a larger set of simulations on Internet Topology Zoo
  • 30 topologies from ISPs and content providers
  • Multiple traffic matrices (gravity model), failure models and operational conditions
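
A minimal sketch of a gravity-model traffic matrix, in which the demand between two nodes is proportional to the product of their weights. The node weights, the total demand and the function name are assumptions. The slides do not say how the weights were chosen for the simulations.

```python
def gravity_matrix(weights, total_demand):
    """Demand(i, j) proportional to weight_i * weight_j, scaled to total_demand."""
    nodes = list(weights)
    raw = {(i, j): weights[i] * weights[j]
           for i in nodes for j in nodes if i != j}
    scale = total_demand / sum(raw.values())
    return {pair: value * scale for pair, value in raw.items()}


# Hypothetical node weights (e.g. relative site size) and total demand.
tm = gravity_matrix({"A": 3.0, "B": 2.0, "C": 1.0}, total_demand=110.0)
print(tm)
```
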

slide-50
SLIDE 50

Do these results generalize?

Yes*

[Figure labels: Throughput SLA; Probability of achieving SLA; Normalized Capacity. Schemes compared: KSP+MCF, SMORE, R-MCF, FFC, Oblivious, ECMP]

slide-51
SLIDE 51

Takeaways

  • Path selection plays an outsized role in the performance of TE systems
  • Semi-oblivious TE meets the competing objectives of performance and robustness in modern networks
  • Oblivious routing for path selection + Dynamic load-balancing
  • Ongoing and future work:
      • Apply to other networks (e.g. non-Clos DC topologies)
      • SR-based implementations and deployments
slide-52
SLIDE 52

Thank You!

Bobby Kleinberg (Cornell), Robert Soulé (Lugano), Nate Foster (Cornell), Petr Lapukhov (Facebook), Chiun Lin Lim (Facebook), Chris Yu (CMU), Yang Yuan (Cornell)

SMORE: Oblivious routing + Dynamic rate adaptation

Code: github.com/cornell-netlab/yates
Learn more: www.cs.cornell.edu/~praveenk/smore/

NSDI ’18