Best-Path vs. Multi-Path Overlay Routing David G. Andersen (MIT) - PowerPoint PPT Presentation

Best-Path vs. Multi-Path Overlay Routing David G. Andersen (MIT) Alex C. Snoeren (UCSD) Hari Balakrishnan (MIT) October 2003 http://nms.lcs.mit.edu/ron/

Overview
Best-path vs. redundant overlay routing
• What tactics work best to
  – Reduce loss?
  – Reduce latency?
  – Avoid outages?
• In what circumstances do they perform best?
• Implications for new strategies

Context: Reliability via Path Diversity
• Backup links provide alternatives
➔ Mechanisms for obtaining diversity (existing diversity)
➔ Mechanisms for using diversity (overlay techniques)

Obtaining Diversity
Engineered diversity: [figure not preserved]
Exploiting existing diversity: [figure not preserved]

Existing AS-level Redundancy
• Traceroute between 12 hosts, showing Autonomous Systems (ASes)
[Figure: AS-level topology from the traceroutes. Labels include MIT, Sightpath, Aros, CCI, MA-Cable, CMU, Utah, NYU, Cornell, VU-NL, CA-T1, UTREP, AMNAP, Abilene, vBNS, and NYSERNet; known private peering links are marked.]
ASes shown: AS5650, AS3, AS1239, AS5050, AS6521, AS13649, AS9, AS1742, AS1785, AS7015, AS701, AS210, AS6114, AS226, AS7922, AS7018, AS702, AS1103, AS6453, AS3967, AS7280, AS145, AS1200, AS8297, AS3356, AS3756, AS9057, AS8709, AS13790, AS26, AS1, AS3561, AS1790, AS209

Exploiting Diversity via Overlays
• Send packets through cooperating peers
• End-hosts only, no network support

Exploiting Diversity via Overlays
Reactive Routing (probes and routing updates)
• Probe paths
• Route via best
• RON (SOSP'01), Detour
Redundant Routing
• Parallel paths
• No probing
• Mesh routing
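The contrast between the two approaches can be sketched in a few lines. This is a minimal illustration, not the RON implementation: the function names, the `loss` table of probed loss estimates, the restriction to one-hop detours, and the assumption that the two legs of a detour fail independently are all assumptions made for the example.

```python
import random

def reactive_route(src, dst, nodes, loss):
    """Reactive routing: choose the lowest-loss route among the direct path
    and every one-hop detour, using loss estimates gathered by probing."""
    candidates = [(loss[(src, dst)], (src, dst))]
    for hop in nodes:
        if hop in (src, dst):
            continue
        # Assumption for illustration: the two legs lose packets independently.
        via = 1 - (1 - loss[(src, hop)]) * (1 - loss[(hop, dst)])
        candidates.append((via, (src, hop, dst)))
    return min(candidates)[1]

def redundant_routes(src, dst, nodes, rng=random):
    """Redundant (mesh) routing: no probing. Send one copy on the direct
    path and one copy through a randomly chosen intermediate node."""
    hop = rng.choice([n for n in nodes if n not in (src, dst)])
    return [(src, dst), (src, hop, dst)]
```

The reactive function needs an up-to-date `loss` table, which is what the probing overhead pays for; the redundant function needs no state but doubles the data traffic.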

Reactive vs. Redundant Routing
[Figure: % capacity used by data (0 to 100%) vs. desired loss rate improvement (0% to 100%). Data traffic and probe/redundant traffic share the capacity limit. Vertical markers: best expected path limit and independence limit. Curves: Reactive and Redundant.]
• Capacity limits probing and redundancy
• Reactive limit: best path performance
• Redundant limit: path independence
• Overhead scaling: throughput vs. nodes

8 Routing Methods
Direct          Single packet, direct path
Direct Direct   2 packets, direct, no spacing
DD 10ms         2 packets, direct, 10ms spacing
DD 20ms         2 packets, direct, 20ms spacing
Lat             Reactive routing, min latency
Loss            Reactive routing, min loss
Direct Rand     2 packets, redundant routing, simplest
Lat Loss        2 packets, reactive + redundant (falls back to random)

Probing on Internet Testbed
Each node repeats:
1. Pick random node j
2. Pick one of the 8 routing types (direct, loss, lat, etc.) in round-robin order. Send to j.
3. Delay for random interval [0.6s - 1.2s]
Probes are one-way, recorded at sender & receiver.
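The three-step loop above can be sketched as a generator. This is only an outline of the schedule, not the testbed code: the routing-type labels, the function name, and the choice to exclude the node itself when picking j are assumptions, and actually sending and recording the probe is left out.

```python
import itertools
import random

# Hypothetical labels for the 8 routing types listed in the slides.
ROUTING_TYPES = ["direct", "direct_direct", "dd_10ms", "dd_20ms",
                 "lat", "loss", "direct_rand", "lat_loss"]

def probe_schedule(me, nodes, count, rng):
    """Yield (target, routing_type, delay_s) for `count` probe rounds."""
    peers = [n for n in nodes if n != me]     # assumption: never probe yourself
    types = itertools.cycle(ROUTING_TYPES)    # step 2: round-robin over the types
    for _ in range(count):
        target = rng.choice(peers)            # step 1: pick random node j
        delay = rng.uniform(0.6, 1.2)         # step 3: random delay in [0.6 s, 1.2 s]
        yield target, next(types), delay
```

A caller would send the probe to `target` using `routing_type`, log it, then sleep for `delay_s` before the next round.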

Datasets From Internet Deployment
Dataset      Nodes  Time     Measurements
RON wide     17     5 days   4.7M
RON narrow   17     3 days   2.8M
RON 2003     30     14 days  32.6M
✔ Variety of network types and bandwidths: 5 int'l, 3 Cable/DSL, 7 universities...
✔ N^2 path scaling: ∼900 paths

One-way Loss Rates Are Low
[Figure: CDF of the fraction of paths vs. average path-wide loss rate (%), for the 2002 and 2003 datasets]
• 90% of paths under 1% loss rate
• Overall loss 0.42% in 2003
• Includes quiescent periods
• Outages still (painfully) apparent

Duplication Reduces Overall Loss
Type           Loss %
direct         0.42
direct direct  0.30
dd 10ms        0.27
dd 20ms        0.27
Lat            0.43
Loss           0.33
Direct Rand    0.26
Lat Loss       0.23

Loss Probabilities Sanity Check
• 0.42% loss << [Paxson 94,95] (2.8%, 5%).
• Unloaded paths vs. loaded by TCP transfer
• Conditional loss probabilities are similar
Study                  P(lose P2 | lost P1)
Paxson TCP             ∼50%
Bolot 8ms spacing      60%
RON 2003 no spacing    72%
RON 2003 20ms          65%
RON 2003 direct rand   62%
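The conditional probabilities can be cross-checked against the overall loss rates reported for the duplicated methods. A duplicated packet is lost only if both copies are lost, so its loss rate is roughly the single-packet loss rate times P(lose P2 | lost P1). The short calculation below uses only numbers from the slides; the function name is hypothetical, and treating 0.42% as the first copy's loss rate in every case is a simplifying assumption.

```python
def pair_loss_pct(single_loss_pct, cond_loss):
    """Loss rate (%) of a duplicated packet: both copies must be lost."""
    return single_loss_pct * cond_loss

direct = 0.42  # overall one-way loss (%), from the slides

for name, cond in [("direct direct", 0.72), ("dd 20ms", 0.65), ("direct rand", 0.62)]:
    print(f"{name}: {pair_loss_pct(direct, cond):.2f}%")
# prints 0.30%, 0.27%, 0.26%, matching the measured loss table

# With fully independent paths, the conditional loss would equal the
# unconditional single-packet loss probability (0.0042).
independent = pair_loss_pct(direct, direct / 100)
print(f"independent: {independent:.4f}%")
```

The gap between the independent-path figure (about 0.0018%) and the measured 0.26% for direct rand shows how far the available paths are from independent.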

Latency Improvements
[Figure: CDF of the fraction of paths vs. latency (ms)]
• 5% of connections exhibit large latency improvement
Mean latency:
lat loss     46.8 ms
lat          48.0 ms
direct rand  51.7 ms
direct       54.1 ms
• Unlike loss, most latency comes from specific bad paths

# High Loss Periods (1 hr, normalized)
Type           > 0%       > 30%     > 60%
direct         1 (8817)   1 (630)   1 (255)
direct direct  0.59       0.93      0.98
dd 20ms        0.43       0.91      0.98
Lat            1.2        0.96      0.91
Loss           0.80       0.91      0.86 ★
Direct Rand    0.44       0.92      0.92 ★
Lat Loss       0.38       0.89      0.84 ★
• > 0%: Lat is worse than naive duplication for low loss situations
• > 30%: Lat is on par

Measurement Summary
✔ Redundant beats reactive for low loss
  – "Meshing" beats controls during outages
✔ Reactive finds specific good paths
  – Latency improvements
  – Low loss paths
✘ No overlay technique near independent paths
  – Hypothesis: Access link failures
  – More severe outages harder to correct

Why Not FEC?
Redundant assumption: fast recovery, low rate
• 0.42% loss rate → need little redundancy
[Figure: 1st packet lost (X), ...100 packets..., recovery]
• Failure losses bursty (≥ 0.5 conditional loss)
✘ Spread FEC over even more packets
➔ Latency-critical traffic: 2-redundant mesh

Conclusions
• Loss rate for low-rate traffic is low (0.42%)
• Conditional loss probability is high (0.72), even for random mesh (0.62)
• 40-60% of loss avoidable
✔ Redundant: Avoiding low loss rates
✔ Reactive: Avoiding high loss, latency
➔ Low loss suggests selective approach ...

Future Work
Strategies for avoiding losses and outages:
• Selective redundancy: Protecting SYNs, etc. (shameless plug: Currently implementing)
• Selective probing: Activate on first loss
Measurements:
• Engineered network redundancy impact? (testing now, looking for multihomed sites)
http://nms.lcs.mit.edu/ron/

Scaling
• Reactive: Scales with # nodes
• Redundant: Scales with traffic volume

Best Path Scaling
Routing and probing add packets: responsiveness vs. overhead vs. size
[Figure: overhead (bits/second, 0 to 35000) vs. number of nodes (0 to 50); labeled points: 10 nodes 2.2Kbps, 30 nodes 13.3Kbps, 50 nodes 33Kbps]
• 50 nodes near limit, enough for many apps.
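As a rough check on how the overhead grows, the three labeled points can be compared with pure quadratic growth anchored at the 50-node value. This is a back-of-the-envelope sketch: the function name is hypothetical, the data are just the plot labels, and a pure N^2 model is an assumption rather than the model behind the plot.

```python
# Overhead labels read from the plot (nodes -> Kbps).
measured_kbps = {10: 2.2, 30: 13.3, 50: 33.0}

def quadratic_estimate(n, ref_n=50, ref_kbps=33.0):
    """Overhead predicted by pure N^2 scaling anchored at the reference point."""
    return ref_kbps * (n / ref_n) ** 2

for n, kbps in sorted(measured_kbps.items()):
    est = quadratic_estimate(n)
    print(f"{n} nodes: measured {kbps} Kbps, N^2 estimate {est:.1f} Kbps")
```

The estimates for 10 and 30 nodes come out somewhat below the plotted values, which would be consistent with lower-order terms alongside the quadratic one, though the slides do not break the overhead down.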

Best Path Routing
Probes and Routing
• Frequently measure all inter-node paths
• Exchange routing information
• Route along app-specific best path consistent with routing policy

Probing and Outage Detection
[Figure: three-packet probe exchange for probe ID 5, with each node using its own clock. Node A sends the Initial Ping at time 10. Node B receives it at time 33 and sends Response 1. Node A receives Response 1 at time 15, records "success" with RTT 5, and sends Response 2. Node B receives Response 2 at time 39 and records "success" with RTT 6.]
• Probe every random(14) seconds
• 3 packets, both sides get RTT and reachability
• If "lost probe," send next immediately
  – Timeout based on RTT and RTT variance
• If N lost probes, notify outage
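The loss-handling rules above can be sketched as a small per-peer state machine. This is an illustration under stated assumptions, not the RON code: the class and method names are hypothetical, the timeout uses a TCP-style smoothed RTT plus a multiple of the RTT deviation (the slides say only "based on RTT and RTT variance"), the default values are placeholders, and "N lost probes" is read as N consecutive losses.

```python
class ProbeMonitor:
    """Tracks one peer: adaptive timeout from RTT statistics, outage after N losses."""

    def __init__(self, outage_threshold=4, alpha=0.125, beta=0.25, k=4.0):
        self.outage_threshold = outage_threshold  # N; the value is an assumption
        self.alpha, self.beta, self.k = alpha, beta, k
        self.srtt = None       # smoothed RTT
        self.rttvar = None     # smoothed RTT deviation
        self.lost_in_a_row = 0

    def timeout(self, default=1.0):
        """Timeout from RTT and RTT variance (TCP-style form, assumed)."""
        if self.srtt is None:
            return default
        return self.srtt + self.k * self.rttvar

    def on_response(self, rtt):
        """Record a successful probe and update the RTT estimators."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        self.lost_in_a_row = 0

    def on_timeout(self):
        """Record a lost probe. Returns (probe_again_now, outage_detected)."""
        self.lost_in_a_row += 1
        return True, self.lost_in_a_row >= self.outage_threshold
```

Returning `probe_again_now` as true on every loss mirrors the "send next immediately" rule, so an outage is confirmed within a few timeouts rather than a few probe intervals.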

Architecture: Probing
➔ Probe between nodes, determine path qualities
  – O(N^2) probe traffic with active probes
  – Passive measurements
