
SLIDE 1

Resilient Overlay Networks

CS294-4 Presentation Nikita Borisov Sep 15, 2003

Internet Routing Inefficient

BGP is designed for scalability, sacrificing performance

Link outages common, but routing tables take minutes to update

Summarized data creates inefficient paths
No response to congestion

Network Redundancies

Multiple paths exist between most hosts

– Many are not advertised due to private peering

Link outages lead to non-transitive reachability

– A and C can’t reach each other but B can reach them both

Indirect paths often offer better performance

– (though possibly violate AUPs)

SLIDE 2

RON goals

Fast failure detection and recovery

– Seconds, not minutes

Integration with application

– Optimize routes for latency, throughput, etc.

Fine-grained policy specification

– E.g. keep commercial traffic off Internet2

Overlay Network

Small network - 3-50 nodes
Continuous measurement of each pairwise link

Connectivity/performance stats distributed globally

Pick best path out of direct and indirect ones

– Restrict search to one indirect hop
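
The one-indirect-hop search can be sketched as follows. This is a minimal illustration, not the RON implementation: the `cost` table, the node names, and the additive cost (which suits latency; loss rates would combine differently) are all assumptions.

```python
from typing import Dict, List, Tuple

# Assumed representation: (src, dst) -> measured cost, lower is better.
# A missing entry means the virtual link is currently considered down.
Cost = Dict[Tuple[str, str], float]


def best_path(src: str, dst: str, nodes: List[str], cost: Cost) -> Tuple[List[str], float]:
    """Return the cheapest path among the direct link and every one-hop detour."""
    inf = float("inf")
    best = [src, dst]
    best_cost = cost.get((src, dst), inf)
    for mid in nodes:
        if mid in (src, dst):
            continue
        # Cost of the detour src -> mid -> dst (additive, as for latency).
        detour = cost.get((src, mid), inf) + cost.get((mid, dst), inf)
        if detour < best_cost:
            best, best_cost = [src, mid, dst], detour
    return best, best_cost
```

With N nodes this examines at most N - 2 detours per destination, which is cheap at the 3-50 node scale mentioned above.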

Failure Detection

Active monitoring

– Send probes on each virtual link
– One probe every 14s
– Fast timeout probes if one is lost

Detect failure in under 20s

– Faster than any TCP timeout
– Good enough for even human scale
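
A rough sketch of the probing state machine for one virtual link. Only the 14s regular interval comes from the slide; the 1s fast-probe interval and the threshold of 4 consecutive losses are assumed values chosen so that detection stays under 20s, and `LinkMonitor` is a hypothetical name.

```python
class LinkMonitor:
    """Tracks one virtual link and decides when to send the next probe."""

    def __init__(self, probe_interval: float = 14.0,
                 fast_interval: float = 1.0,    # assumed value
                 loss_threshold: int = 4):      # assumed value
        self.probe_interval = probe_interval
        self.fast_interval = fast_interval
        self.loss_threshold = loss_threshold
        self.consecutive_losses = 0
        self.alive = True

    def on_probe_result(self, received: bool) -> float:
        """Record one probe outcome; return seconds to wait before the next probe."""
        if received:
            # Any reply resets the loss counter and restores the link.
            self.consecutive_losses = 0
            self.alive = True
            return self.probe_interval
        self.consecutive_losses += 1
        if self.consecutive_losses >= self.loss_threshold:
            self.alive = False
        # After a loss, switch to rapid follow-up probes.
        return self.fast_interval
```

Under these assumptions, a link that fails just after a successful probe is declared dead roughly one regular interval plus three fast intervals later, ignoring the per-probe timeout.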

Performance Metrics

Estimate latency based on RTT of probes

– Moving weighted average
– Assume latency is symmetric

Estimate loss rate based on probes received

– Average of last 100 samples

Estimate TCP throughput

– Model TCP performance based on latency and loss rate
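
The three estimators above might look like this in code. It is a sketch under stated assumptions: the smoothing weight `alpha`, the `min_loss` floor, and the class and function names are hypothetical, and the throughput score uses the common inverse-square-root TCP approximation (score proportional to 1 / (RTT * sqrt(p))), which may differ from the exact model in the talk.

```python
import math
from collections import deque
from typing import Optional


def tcp_score(rtt: float, loss: float, min_loss: float = 0.01) -> float:
    """Relative TCP throughput score; higher is better. min_loss avoids division by zero."""
    p = max(loss, min_loss)
    return math.sqrt(1.5) / (rtt * math.sqrt(p))


class LinkStats:
    def __init__(self, alpha: float = 0.9, window: int = 100):
        self.alpha = alpha                    # weight given to the old estimate (assumed)
        self.latency: Optional[float] = None  # one-way latency estimate, seconds
        self.outcomes = deque(maxlen=window)  # 1 = probe answered, 0 = probe lost

    def record_probe(self, rtt: Optional[float]) -> None:
        """Record a probe; rtt is in seconds, or None if the probe was lost."""
        if rtt is None:
            self.outcomes.append(0)
            return
        self.outcomes.append(1)
        one_way = rtt / 2.0  # symmetric-latency assumption from the slide
        if self.latency is None:
            self.latency = one_way
        else:
            self.latency = self.alpha * self.latency + (1 - self.alpha) * one_way

    def loss_rate(self) -> float:
        """Fraction of probes lost over the last `window` samples."""
        if not self.outcomes:
            return 0.0
        return 1.0 - sum(self.outcomes) / len(self.outcomes)
```

The `deque` with `maxlen=100` drops the oldest sample automatically, which gives the "average of last 100 samples" behaviour without extra bookkeeping.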

SLIDE 3

Path Selection

Always route around outages
Application can optimize for latency, loss rate, throughput

– Throughput hard to optimize
– Avoid bad-throughput routes instead
– Exhaustively search all one-hop paths

Introduce hysteresis to prevent “route flapping”
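
One simple way to add hysteresis is to switch routes only when the candidate is better by some margin. The 5% margin and the function name below are assumptions for illustration; the slide does not say how large the threshold is.

```python
from typing import Optional, Sequence


def choose_with_hysteresis(current: Optional[Sequence[str]], current_cost: float,
                           candidate: Sequence[str], candidate_cost: float,
                           margin: float = 0.05) -> Sequence[str]:
    """Keep the current route unless the candidate is cheaper by more than `margin`."""
    if current is None:
        return candidate
    # An outage on the current route shows up as infinite cost,
    # so any finite candidate wins immediately.
    if candidate_cost < current_cost * (1.0 - margin):
        return candidate
    return current
```

Small fluctuations in measured cost then leave the route unchanged, while an outage still triggers an immediate switch.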

Routing Policy

Policies specify which virtual links to use
Separate routing tables per policy
Packets classified with policy tag and routed accordingly

Sample policy: exclusive clique

– Only members of clique can use links between each other
– E.g. Internet2 hosts
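
An exclusive-clique policy can be sketched as a filter over virtual links, applied once per policy tag to build that tag's routing table. The tag strings, host names, and function names are hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]


def clique_allows(link: Link, tag: str, clique: Set[str], clique_tag: str) -> bool:
    """Links between two clique members carry only traffic tagged `clique_tag`."""
    a, b = link
    if a in clique and b in clique:
        return tag == clique_tag
    return True  # links touching a non-member are open to every tag


def usable_links(links: List[Link], tag: str, clique: Set[str],
                 clique_tag: str = "internet2") -> List[Link]:
    """Links a packet with this policy tag may be routed over."""
    return [l for l in links if clique_allows(l, tag, clique, clique_tag)]
```

Running path selection separately over `usable_links(..., tag)` for each tag yields the separate routing table per policy described above.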

Measurements

Two studies (RON1 and RON2)
RON recovers from 100% (RON1) or 60% (RON2) outages and high loss rates

Routes around bad throughput failures

– Doubles TCP throughput in 5% of all samples

Reduces loss rate by 0.05 in 5% of samples

Performance Problems

RON worse in some cases

– Measurement inaccuracies
– Information propagation delays
– Hysteresis

But …

– RON wins in most cases
– RON loss never very large
– RON win, though, can be dramatic

SLIDE 4

Overhead

Probing traffic - grows O(N)
Routing state traffic - grows O(N^2)
Total BW consumed

– 2.2Kbps with 10 nodes
– 33Kbps with 50 nodes

A limiting factor for scaling
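
To see how a linear and a quadratic term could combine, the arithmetic below fits a curve of the form a*N + b*N^2 through the two bandwidth figures quoted above. The model form is an assumption made purely for illustration; two data points cannot validate it, and the slides give no per-term breakdown.

```python
from typing import Tuple


def fit_overhead(n1: float, bw1: float, n2: float, bw2: float) -> Tuple[float, float]:
    """Solve a*n + b*n**2 = bw exactly through two (n, bw) points."""
    det = n1 * n2 ** 2 - n2 * n1 ** 2
    a = (bw1 * n2 ** 2 - bw2 * n1 ** 2) / det
    b = (n1 * bw2 - n2 * bw1) / det
    return a, b


def overhead(n: float, a: float, b: float) -> float:
    """Bandwidth in Kbps predicted by the assumed model for n nodes."""
    return a * n + b * n ** 2


# Figures from the slide: 2.2 Kbps at 10 nodes, 33 Kbps at 50 nodes.
a, b = fit_overhead(10, 2.2, 50, 33.0)
```

Under this assumed model the quadratic term overtakes the linear one once N exceeds a / b, which is the sense in which routing-state traffic limits scaling.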

Question

Is this overhead excessive?

– Less than 10% of a broadband link

What if RONs become more popular?
Is using a RON “cheating”?

Applications

Videoconferencing
Cooperating ISPs
Branch offices of companies
Others?