Best-Path vs. Multi-Path Overlay Routing

  1. Best-Path vs. Multi-Path Overlay Routing David G. Andersen (MIT) Alex C. Snoeren (UCSD) Hari Balakrishnan (MIT) October 2003 http://nms.lcs.mit.edu/ron/

  2. Overview Best-path vs. redundant overlay routing • What tactics work best to – Reduce loss? – Reduce latency? – Avoid outages? • In what circumstances do they perform best? • Implications for new strategies

  3. Context: Reliability via Path Diversity • Backup links provide alternatives ➔ Mechanisms for obtaining diversity (existing diversity) ➔ Mechanisms for using diversity (overlay techniques)

  4. Obtaining Diversity • Engineered diversity [diagram] • Exploiting existing diversity [diagram]

  5. Existing AS-level Redundancy • Traceroute between 12 hosts, showing Autonomous Systems (ASes) [Figure: AS-level graph of the traced paths; node labels include MIT, CMU, Utah, NYU, Cornell, VU-NL, Abilene, vBNS and NYSERNet, and the legend marks known private peering.]

  6. Exploiting Diversity via Overlays • Send packets through cooperating peers • End-hosts only, no network support

  7. Exploiting Diversity via Overlays
       Reactive Routing (probes and routing updates)
       • Probe paths
       • Route via best
       • RON (SOSP’01), Detour

  8. Exploiting Diversity via Overlays
       Reactive Routing (probes and routing updates)
       • Probe paths
       • Route via best
       Redundant Routing
       • Parallel paths
       • No probing
       • Mesh routing (SOSP’01)

  9. Reactive vs. Redundant Routing
       [Figure: % capacity used by data (0 to 100%) vs. desired loss rate improvement (0% to 100%); data traffic and probe/redundant traffic share the capacity limit.]
       • Capacity limits probing and redundancy

  10. Reactive vs. Redundant Routing
       [Figure: % capacity used by data (0 to 100%) vs. desired loss rate improvement (0% to 100%), with the capacity limit and two vertical markers: "Best Expected Path Limit" and "Independence Limit".]
       • Reactive limit: best path performance
       • Redundant limit: Path independence

  11. Reactive vs. Redundant Routing
       [Figure: % capacity used by data (0 to 100%) vs. desired loss rate improvement (0% to 100%), with the capacity limit, the "Best Expected Path Limit" and "Independence Limit" markers, and curves for Reactive and Redundant.]
       • Reactive limit: best path performance
       • Redundant limit: Path independence

  12. Reactive vs. Redundant Routing
       [Figure: % capacity used by data (0 to 100%) vs. desired loss rate improvement (0% to 100%), with the capacity limit, the "Best Expected Path Limit" and "Independence Limit" markers, and curves for Reactive and Redundant.]
       • Reactive limit: best path performance
       • Redundant limit: Path independence
       • Overhead scaling: throughput vs. nodes

  13. 8 Routing Methods
       Direct          Single packet, direct path
       Direct Direct   2 packets, direct, no spacing
       DD 10ms         2 packets, direct, 10ms spacing
       DD 20ms         2 packets, direct, 20ms spacing

  14. 8 Routing Methods
       Direct          Single packet, direct path
       Direct Direct   2 packets, direct, no spacing
       DD 10ms         2 packets, direct, 10ms spacing
       DD 20ms         2 packets, direct, 20ms spacing
       Lat             Reactive routing, min latency
       Loss            Reactive routing, min loss

  15. 8 Routing Methods
       Direct          Single packet, direct path
       Direct Direct   2 packets, direct, no spacing
       DD 10ms         2 packets, direct, 10ms spacing
       DD 20ms         2 packets, direct, 20ms spacing
       Lat             Reactive routing, min latency
       Loss            Reactive routing, min loss
       Direct Rand     2 packets, redundant routing, simplest

  16. 8 Routing Methods
       Direct          Single packet, direct path
       Direct Direct   2 packets, direct, no spacing
       DD 10ms         2 packets, direct, 10ms spacing
       DD 20ms         2 packets, direct, 20ms spacing
       Lat             Reactive routing, min latency
       Loss            Reactive routing, min loss
       Direct Rand     2 packets, redundant routing, simplest
       Lat Loss        2 packets, reactive + redundant (falls back to random)
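
The eight methods listed above can be written down as plain data, which makes the two axes (how many copies, and how each copy's path is chosen) explicit. This is only a sketch: the `Method` type and its field names are hypothetical, and the path choices for "direct rand" (one direct copy plus one via a random intermediate node) and "lat loss" (one copy on the min-latency path, one on the min-loss path) are a reading of the method names, not something the slides spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Method:
    """Hypothetical record describing one routing method from the slide."""
    name: str
    copies: int                 # packets sent per measurement
    spacing_ms: int             # gap between the two copies (0 = back to back)
    first_path: str             # how the first copy is routed
    second_path: Optional[str]  # how the second copy is routed, if any


METHODS = [
    Method("direct", 1, 0, "direct", None),
    Method("direct direct", 2, 0, "direct", "direct"),
    Method("dd 10ms", 2, 10, "direct", "direct"),
    Method("dd 20ms", 2, 20, "direct", "direct"),
    Method("lat", 1, 0, "min_latency", None),
    Method("loss", 1, 0, "min_loss", None),
    # Assumed reading: one direct copy, one via a random intermediate node.
    Method("direct rand", 2, 0, "direct", "random_intermediate"),
    # Assumed reading: one copy per reactive metric.
    Method("lat loss", 2, 0, "min_latency", "min_loss"),
]


def needs_probing(method: Method) -> bool:
    """A method is reactive if any copy relies on measured path quality."""
    reactive = {"min_latency", "min_loss"}
    return method.first_path in reactive or method.second_path in reactive


def is_redundant(method: Method) -> bool:
    """A method is redundant if it sends more than one copy."""
    return method.copies > 1


if __name__ == "__main__":
    for m in METHODS:
        print(f"{m.name:14s} redundant={is_redundant(m)} probing={needs_probing(m)}")
```

Under this reading, "lat loss" is the only method that is both reactive and redundant, which matches its description on the slide.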

  17. Probing on Internet Testbed
       Each node repeats:
       1. Pick random node j
       2. Pick one of the 8 routing types (direct, loss, lat, etc.) in round-robin order. Send to j.
       3. Delay for random interval [0.6s - 1.2s]
       Probes are one-way, recorded at sender & receiver.
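
The per-node loop above can be sketched as a schedule generator. Nothing here sends packets: `probe_schedule` is a hypothetical helper that only decides which node to target, which routing type to use, and how long to wait, assuming the delay is drawn uniformly from the stated interval.

```python
import itertools
import random

# Names follow the routing-method table; order is the assumed round-robin order.
ROUTING_TYPES = [
    "direct", "direct direct", "dd 10ms", "dd 20ms",
    "lat", "loss", "direct rand", "lat loss",
]


def probe_schedule(self_id, nodes, count, rng):
    """Return `count` (target, routing_type, delay_seconds) tuples for one node."""
    peers = [n for n in nodes if n != self_id]
    types = itertools.cycle(ROUTING_TYPES)       # step 2: round-robin over types
    schedule = []
    for _ in range(count):
        target = rng.choice(peers)                # step 1: pick random node j
        routing_type = next(types)
        delay = rng.uniform(0.6, 1.2)             # step 3: random delay in seconds
        schedule.append((target, routing_type, delay))
    return schedule


if __name__ == "__main__":
    for target, routing_type, delay in probe_schedule(0, range(5), 8, random.Random(1)):
        print(f"send to node {target} via {routing_type!r}, then wait {delay:.2f}s")
```

Because the routing type cycles while the target is random, each type is exercised equally often, and over time every pair of nodes is measured with every method.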

  18. Datasets From Internet Deployment
       Dataset       Nodes   Time      Measurements
       RON wide      17      5 days    4.7M
       RON narrow    17      3 days    2.8M
       RON 2003      30      14 days   32.6M
       ✔ Variety of network types and bandwidths: 5 int’l, 3 Cable/DSL, 7 universities...
       ✔ N² path scaling: ∼900 paths
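
As a quick check on the path count, a testbed of N nodes has N(N-1) ordered sender/receiver pairs when each direction is measured separately, which is close to N² for the node counts in the table. The function name below is hypothetical.

```python
def one_way_paths(n_nodes: int) -> int:
    """Number of ordered (sender, receiver) pairs among n_nodes hosts."""
    return n_nodes * (n_nodes - 1)


if __name__ == "__main__":
    for n in (17, 30):
        print(f"{n} nodes: {one_way_paths(n)} one-way paths (N^2 = {n * n})")
```

For the 30-node dataset this gives 870 one-way paths, in line with the roughly 900 quoted on the slide.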

  19. One-way Loss Rates Are Low
       [Figure: fraction of paths (0 to 1) vs. average path-wide loss rate (0 to 7%), for the 2002 and 2003 datasets. Annotation: 90% of paths under 1% loss rate.]
       • Overall loss 0.42% in 2003
       • Includes quiescent periods
       • Outages still (painfully) apparent

  20. Duplication Reduces Overall Loss
       Type            Loss %
       direct          0.42
       direct direct   0.30
       dd 10ms         0.27
       dd 20ms         0.27

  21. Duplication Reduces Overall Loss
       Type            Loss %
       direct          0.42
       direct direct   0.30
       dd 10ms         0.27
       dd 20ms         0.27
       Lat             0.43
       Loss            0.33
       Direct Rand     0.26
       Lat Loss        0.23

  22. Loss Probabilities Sanity Check
       • 0.42% loss << [Paxson 94,95] (2.8%, 5%)
       • Unloaded paths vs. loaded by TCP transfer
       • Conditional loss probabilities are similar
       Study                     P(lose P2 | lost P1)
       Paxson TCP                ∼50%
       Bolot 8ms spacing         60%
       RON 2003 no spacing       72%
       RON 2003 20ms             65%
       RON 2003 direct rand      62%
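
The conditional probabilities above tie back to the overall loss table on slide 21. A two-packet method loses data only when both copies are lost, so P(both lost) = P(lost P1) × P(lose P2 | lost P1). The sketch below assumes the first copy is lost at the single-packet direct rate of 0.42%; that assumption, and the variable names, are not from the slides.

```python
DIRECT_LOSS_PCT = 0.42  # single-packet direct loss rate, in percent (slide 21)

# Conditional loss probabilities for the RON 2003 rows above.
CONDITIONAL = {"direct direct": 0.72, "dd 20ms": 0.65, "direct rand": 0.62}

# Measured two-packet loss rates, in percent (slide 21).
OBSERVED_PCT = {"direct direct": 0.30, "dd 20ms": 0.27, "direct rand": 0.26}


def predicted_pair_loss_pct(first_loss_pct: float, p_second_given_first: float) -> float:
    """P(both copies lost), in percent, from the chain rule."""
    return first_loss_pct * p_second_given_first


def independent_pair_loss_pct(loss_pct: float) -> float:
    """P(both copies lost), in percent, if the two losses were independent."""
    return loss_pct * (loss_pct / 100.0)


if __name__ == "__main__":
    for name, cond in CONDITIONAL.items():
        pred = predicted_pair_loss_pct(DIRECT_LOSS_PCT, cond)
        print(f"{name:14s} predicted {pred:.3f}%  observed {OBSERVED_PCT[name]:.2f}%")
    print(f"independent paths would give {independent_pair_loss_pct(DIRECT_LOSS_PCT):.5f}%")
```

The predicted values agree with the measured ones to two decimal places, while fully independent losses would be more than two orders of magnitude lower. That gap is the distance between what duplication achieves and the independence limit.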

  23. Latency Improvements
       [Figure: fraction of paths (0.7 to 1) vs. latency (0 to 300 ms) for each method. Annotation: 5% of connections exhibit large latency improvement.]
       Method          Mean latency
       lat loss        46.8 ms
       lat             48.0 ms
       direct rand     51.7 ms
       direct          54.1 ms
       Unlike loss, most latency from specific bad paths

  24. # High Loss Periods (1 hr, normalized)
       Type            > 0%
       direct          1 (8817)
       direct direct   0.59
       dd 20ms         0.43
       Lat             1.2    ← Worse than naive duplication for low loss situations
       Loss            0.80
       Direct Rand     0.44
       Lat Loss        0.38

  25. # High Loss Periods (1 hr, normalized)
       Type            > 0%       > 30%
       direct          1 (8817)   1 (630)
       direct direct   0.59       0.93
       dd 20ms         0.43       0.91
       Lat             1.2        0.96    ← on par
       Loss            0.80       0.91
       Direct Rand     0.44       0.92
       Lat Loss        0.38       0.89

  26. # High Loss Periods (1 hr, normalized)
       Type            > 0%       > 30%      > 60%
       direct          1 (8817)   1 (630)    1 (255)
       direct direct   0.59       0.93       0.98
       dd 20ms         0.43       0.91       0.98
       Lat             1.2        0.96       0.91
       Loss            0.80       0.91       0.86 ★
       Direct Rand     0.44       0.92       0.92 ★
       Lat Loss        0.38       0.89       0.84 ★

  27. Measurement Summary ✔ Redundant beats reactive for low loss – “Meshing” beats controls during outages ✔ Reactive finds specific good paths – Latency improvements – Low loss paths ✘ No overlay technique near independent paths – Hypothesis: Access link failures – More severe outages harder to correct

  28. Why Not FEC? • Redundant assumption: fast recovery, low rate • 0.42% loss rate → need little redundancy [Diagram: 1st packet lost (X) ... 100 packets ... recovery] • Failure losses bursty (≥ 0.5 conditional loss) ✘ Spread FEC over even more packets ➔ Latency-critical traffic: 2-redundant mesh

  29. Conclusions • Loss rate for low-rate traffic low (0.42%) • Conditional loss probability high (0.72) even for random mesh (0.62) • 40-60% of loss avoidable ✔ Redundant: Avoiding low loss rates ✔ Reactive: Avoiding high loss, latency ➔ Low loss suggests selective approach ...

  30. Future Work Strategies for avoiding losses and outages: • Selective redundancy: Protecting SYNs, etc. (shameless plug: Currently implementing) • Selective probing: Activate on first loss Measurements: • Engineered network redundancy impact? (testing now, looking for multihomed sites) http://nms.lcs.mit.edu/ron/

  31. Scaling • Reactive: Scales with # nodes • Redundant: Scales with traffic volume

  32. Best Path Scaling
       Routing and probing add packets: Responsiveness vs. overhead vs. size
       [Figure: overhead (bits/second, 0 to 35000) vs. number of nodes (0 to 50).]
       Nodes   Overhead
       10      2.2Kbps
       30      13.3Kbps
       50      33Kbps
       • 50 nodes near limit, enough for many apps.

  33. Best Path Routing: Probes and Routing • Frequently measure all inter-node paths • Exchange routing information • Route along app-specific best path consistent with routing policy

  34. Probing and Outage Detection
       [Diagram: three-packet exchange for probe ID 5. Node A sends Initial Ping (ID 5: time 10); Node B replies with Response 1 (ID 5: time 33); Node A receives it (ID 5: time 15), records "success" with RTT 5, and sends Response 2; Node B receives it (ID 5: time 39) and records "success" with RTT 6.]
       • Probe every random(14) seconds
       • 3 packets, both sides get RTT and reachability
       • If “lost probe,” send next immediately; timeout based on RTT and RTT variance
       • If N lost probes, notify outage
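
The probing rules above amount to a small per-path state machine. The sketch below is one way to write it and rests on several assumptions the slide does not state: the timeout uses the TCP-style form `srtt + 4 * rttvar` with the usual 1/8 and 1/4 smoothing gains, "random(14)" is read as a uniform draw between 0 and 14 seconds, and the outage threshold `OUTAGE_THRESHOLD` (the slide's N) is set to 4 purely for illustration.

```python
import random
from dataclasses import dataclass

PROBE_INTERVAL_S = 14.0   # upper bound of the random probe interval
OUTAGE_THRESHOLD = 4      # hypothetical value for the slide's "N lost probes"


@dataclass
class PathState:
    """Hypothetical per-path probing state."""
    srtt: float = 0.1          # smoothed RTT estimate, seconds
    rttvar: float = 0.05       # smoothed RTT deviation, seconds
    consecutive_losses: int = 0
    outage: bool = False


def probe_timeout(state: PathState) -> float:
    """Assumed TCP-style timeout: smoothed RTT plus four deviations."""
    return state.srtt + 4.0 * state.rttvar


def on_probe_result(state: PathState, rtt, rng: random.Random) -> float:
    """Update state for one probe; return seconds to wait before the next probe.

    `rtt` is the measured round-trip time in seconds, or None if the probe
    timed out (a "lost probe").
    """
    if rtt is None:
        state.consecutive_losses += 1
        if state.consecutive_losses >= OUTAGE_THRESHOLD:
            state.outage = True            # notify outage after N lost probes
        return 0.0                         # lost probe: send the next immediately
    err = rtt - state.srtt
    state.rttvar += 0.25 * (abs(err) - state.rttvar)
    state.srtt += 0.125 * err
    state.consecutive_losses = 0
    state.outage = False
    return rng.uniform(0.0, PROBE_INTERVAL_S)


if __name__ == "__main__":
    rng = random.Random(0)
    state = PathState()
    for rtt in (0.10, 0.12, None, None, None, None, 0.11):
        wait = on_probe_result(state, rtt, rng)
        print(f"rtt={rtt} outage={state.outage} "
              f"timeout={probe_timeout(state):.3f}s next probe in {wait:.1f}s")
```

Sending the follow-up probe immediately after a loss is what lets a failure be confirmed in a few timeouts rather than a few full probe intervals.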

  35. Architecture: Probing ➔ Probe between nodes, determine path qualities – O(N²) probe traffic with active probes – Passive measurements
