Internet Routing Inefficient BGP is designed for scalability, - - PowerPoint PPT Presentation



SLIDE 1

Resilient Overlay Networks

CS294-4 Presentation Nikita Borisov Sep 15, 2003

Internet Routing Inefficient

  • BGP is designed for scalability, sacrificing performance
  • Link outages common, but routing tables take minutes to update

  • Summarized data creates inefficient paths
  • No response to congestion

Network Redundancies

  • Multiple paths exist between most hosts

– Many are not advertised due to private peering

  • Link outages lead to non-transitive reachability

– A and C can’t reach each other but B can reach them both

  • Indirect paths often offer better performance

– (though they may violate acceptable use policies, AUPs)

SLIDE 2

RON goals

  • Fast failure detection and recovery

– Seconds, not minutes

  • Integration with application

– Optimize routes for latency, throughput, etc.

  • Fine-grained policy specification

– E.g. keep commercial traffic off Internet2

Overlay Network

  • Small network - 3-50 nodes
  • Continuous measurement of each pairwise link
  • Connectivity/performance stats distributed globally
  • Pick best path out of direct and indirect ones

– Restrict search to one indirect hop
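
The one-hop restriction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `best_path`, the nested-dict cost table, and the node names are hypothetical, and the latencies are made up. The example reuses the non-transitive case from the earlier slide, where A and C cannot reach each other directly but both can reach B.

```python
import math

def best_path(cost, src, dst):
    """Return (total_cost, path) for the cheapest direct or one-hop path.

    cost[a][b] is the measured cost of virtual link a -> b
    (math.inf when the link is currently down).
    """
    best = (cost[src][dst], [src, dst])
    for mid in cost:
        if mid in (src, dst):
            continue
        total = cost[src][mid] + cost[mid][dst]
        if total < best[0]:
            best = (total, [src, mid, dst])
    return best

# Hypothetical latencies in ms; the direct A <-> C link is down.
latency_ms = {
    "A": {"A": 0, "B": 20, "C": math.inf},
    "B": {"A": 20, "B": 0, "C": 30},
    "C": {"A": math.inf, "B": 30, "C": 0},
}
print(best_path(latency_ms, "A", "C"))  # (50, ['A', 'B', 'C'])
```

With N nodes, each lookup examines at most N - 2 intermediate nodes, which is why an exhaustive search stays cheap at the 3-50 node sizes mentioned above.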

Failure Detection

  • Active monitoring

– Send probes on each virtual link
– One probe every 14s
– Fast timeout probes if one is lost

  • Detect failure in under 20s

– Faster than any TCP timeout
– Fast enough even on a human time scale
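
A minimal sketch of the probing logic, to show how a 14s probe interval can still give detection in under 20s. The 14s interval comes from the slide; the 1s spacing of follow-up probes and the threshold of four consecutive losses are assumptions chosen for illustration, as is the class name `LinkMonitor`. The sketch also treats each probe outcome as known immediately, ignoring the timeout wait.

```python
from dataclasses import dataclass

@dataclass
class LinkMonitor:
    probe_interval: float = 14.0  # from the slide: one probe every 14 s
    fast_interval: float = 1.0    # assumed spacing of follow-up probes
    loss_threshold: int = 4       # assumed consecutive losses => link down
    consecutive_losses: int = 0
    down: bool = False

    def on_probe_result(self, answered: bool) -> float:
        """Record one probe outcome; return seconds until the next probe."""
        if answered:
            self.consecutive_losses = 0
            self.down = False
            return self.probe_interval
        self.consecutive_losses += 1
        if self.consecutive_losses >= self.loss_threshold:
            self.down = True
            return self.probe_interval
        return self.fast_interval

# Worst case: the link fails right after a successful probe at t = 0.
monitor = LinkMonitor()
clock = monitor.on_probe_result(True)  # next probe is due at t = 14
while not monitor.down:
    wait = monitor.on_probe_result(False)
    if not monitor.down:
        clock += wait
print(clock)  # 17.0
```

Under these assumed parameters the failure is declared 17s after the last good probe, consistent with the "under 20s" figure.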

Performance Metrics

  • Estimate latency based on RTT of probes

– Moving weighted average
– Assume latency is symmetric

  • Estimate loss rate based on probes received

– Average of last 100 samples

  • Estimate TCP throughput

– Model TCP performance based on latency and loss rate
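
The three estimators can be sketched together. The slide does not give the smoothing weight or the TCP model, so both are assumptions here: `alpha = 0.9` is an arbitrary weight for the old estimate, and the throughput score uses the widely used simplified model in which throughput is proportional to 1 / (RTT * sqrt(loss)), with constant sqrt(1.5), in packets per second. The class name `LinkStats` is hypothetical.

```python
import math
from collections import deque

class LinkStats:
    def __init__(self, alpha=0.9, window=100):
        self.alpha = alpha                    # assumed weight of the old estimate
        self.latency = None                   # one-way latency estimate, seconds
        self.outcomes = deque(maxlen=window)  # True = probe answered

    def record(self, rtt=None):
        """rtt is the probe round-trip time in seconds, or None if lost."""
        self.outcomes.append(rtt is not None)
        if rtt is None:
            return
        sample = rtt / 2  # symmetric-latency assumption from the slide
        if self.latency is None:
            self.latency = sample
        else:
            self.latency = self.alpha * self.latency + (1 - self.alpha) * sample

    def loss_rate(self):
        """Fraction of the last `window` probes that were lost."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def throughput_score(self):
        """Assumed simplified TCP model: sqrt(1.5) / (RTT * sqrt(loss))."""
        p = self.loss_rate()
        if self.latency is None or p == 0:
            return math.inf
        return math.sqrt(1.5) / (2 * self.latency * math.sqrt(p))

stats = LinkStats()
for i in range(100):
    stats.record(None if i % 50 == 0 else 0.1)  # 2 of 100 probes lost
print(stats.loss_rate())                   # 0.02
print(round(stats.throughput_score(), 1))  # 86.6
```

The score is only meaningful for ranking paths against each other, which matches the slide's point that throughput is hard to optimize directly.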

SLIDE 3

Path Selection

  • Always route around outages
  • Application can optimize for latency, loss rate, throughput

– Throughput hard to optimize
– Avoid bad-throughput routes instead
– Exhaustively search all one-hop paths

  • Introduce hysteresis to prevent “route flapping”
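
One simple way to express hysteresis is to switch paths only when the alternative is better by some margin. The sketch below assumes a relative margin of 5%, a value picked for illustration rather than taken from the slides; the function name and path labels are hypothetical.

```python
import math

def choose_with_hysteresis(current, candidates, margin=0.05):
    """Keep `current` unless another path is better by more than `margin`.

    candidates maps path name -> cost (lower is better, math.inf = outage).
    margin is a hypothetical relative threshold.
    """
    best = min(candidates, key=candidates.get)
    if current not in candidates:
        return best
    if candidates[best] < candidates[current] * (1 - margin):
        return best
    return current

# A 3% improvement is ignored; a 20% improvement triggers a switch.
print(choose_with_hysteresis("direct", {"direct": 100, "via B": 97}))  # direct
print(choose_with_hysteresis("direct", {"direct": 100, "via B": 80}))  # via B
# An outage on the current path always triggers a switch.
print(choose_with_hysteresis("direct", {"direct": math.inf, "via B": 500}))  # via B
```

The margin trades stability for responsiveness: a larger value suppresses flapping but keeps traffic on a slightly worse path for longer.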

Routing Policy

  • Policies specify which virtual links to use
  • Separate routing tables per policy
  • Packets classified with policy tag and routed accordingly

  • Sample policy: exclusive clique

– Only members of clique can use links between each other
– E.g. Internet2 hosts
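
The exclusive-clique policy can be sketched as one set of usable virtual links per policy tag. Everything named here is hypothetical: the tags `"clique"` and `"default"`, the helper names, and the node labels (U1 and U2 stand in for Internet2 hosts, C1 for a commercial host). It is one plausible reading of the slide, not the RON implementation.

```python
from itertools import permutations

def build_tables(nodes, clique):
    """Return, per policy tag, the set of directed virtual links it may use."""
    all_links = set(permutations(nodes, 2))
    inside = {(a, b) for a, b in all_links if a in clique and b in clique}
    return {
        "clique": all_links,            # clique traffic may use every link
        "default": all_links - inside,  # other traffic avoids intra-clique links
    }

def usable(tables, policy_tag, link):
    """Check whether a packet carrying policy_tag may traverse link."""
    return link in tables[policy_tag]

tables = build_tables(["U1", "U2", "C1"], clique={"U1", "U2"})
print(usable(tables, "clique", ("U1", "U2")))   # True
print(usable(tables, "default", ("U1", "U2")))  # False
print(usable(tables, "default", ("U1", "C1")))  # True
```

Path selection would then run separately over each table, so a packet tagged `"default"` is never offered an indirect path that crosses the clique's internal links.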

Measurements

  • Two studies (RON1 and RON2)
  • RON recovers from 100% (RON1) or 60% (RON2) of outages and high loss rates

  • Routes around bad throughput failures

– Doubles TCP throughput in 5% of all samples

  • Reduces loss rate by 0.05 in 5% of samples

Performance Problems

  • RON worse in some cases

– Measurement inaccuracies
– Information propagation delays
– Hysteresis

  • But …

– RON win in most cases
– RON loss never very large
– RON win, though, can be dramatic

SLIDE 4

Overhead

  • Probing traffic - grows O(N)
  • Routing state traffic - grows O(N^2)
  • Total BW consumed

– 2.2 Kbps with 10 nodes
– 33 Kbps with 50 nodes

  • A limiting factor for scaling
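
As a rough check on the scaling, the two bandwidth figures above imply an empirical growth exponent. This is only back-of-the-envelope arithmetic on two data points, not a fit; the variable names are illustrative.

```python
import math

bw_kbps = {10: 2.2, 50: 33.0}  # figures quoted on the slide

(n1, b1), (n2, b2) = sorted(bw_kbps.items())
# If bandwidth ~ N^k, then k = log(b2 / b1) / log(n2 / n1).
exponent = math.log(b2 / b1) / math.log(n2 / n1)
print(round(exponent, 2))  # 1.68
```

An exponent between 1 and 2 fits a mix of a linear term and a quadratic term at these sizes. As N grows, the quadratic term would be expected to dominate.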

Question

  • Is this overhead excessive?

– Less than 10% of a broadband link

  • What if RONs become more popular?
  • Is using a RON “cheating”?

Applications

  • Videoconferencing
  • Cooperating ISPs
  • Branch offices of companies
  • Others?

Discussion