Primitives for Active Internet Topology Mapping: Toward High-Frequency Characterization. Robert Beverly, Arthur Berger, Geoffrey Xie. Naval Postgraduate School; MIT/Akamai. November 2, 2010, ACM Internet Measurement Conference.


SLIDE 1

Primitives for Active Internet Topology Mapping: Toward High-Frequency Characterization

Robert Beverly, Arthur Berger∗, Geoffrey Xie

Naval Postgraduate School

∗MIT/Akamai

November 2, 2010 ACM Internet Measurement Conference

  • R. Beverly, A. Berger, G. Xie (NPS)

Primitives for Active Topology IMC 2010 1 / 22

SLIDE 2

The Problem Motivation

Internet Topology

Long-standing question: What is the topology of the Internet?

Difficult to answer – Internet is:
  • A large, complex distributed system (organism)
  • Non-stationary (in time)
  • Difficult to observe, multi-party (information hiding)
  • Poorly instrumented (not part of original design)

⇒ Poorly understood topology (interface, router, or AS level)

SLIDE 4

The Problem Challenges

What is the topology of the Internet?

Why care?
  • Network Robustness: to failure, to attacks, and how to best improve (antithesis: how to mount attacks)
  • Impact on Research: network modeling, routing protocol validation, new architectures, Internet evolution, etc.
  • Easy to get wrong (see e.g. “What are our standards for validation of measurement-based networking research?” [KW08])

These challenges and opportunities are well-known. We bring some novel insights to bear on the problem.

SLIDE 5

The Problem Challenges

Our Work

Our focus:
  • Active probing from a fixed set of vantage points
  • High-frequency, high-fidelity continuous characterization
  • Use external knowledge and adaptive sampling to solve:
      • Which destinations to probe
      • How/where to perform the probe

This Talk:
  1. Characterize production topology mapping systems
  2. Develop/analyze new primitives for active topology discovery

SLIDE 6

The Problem Measurement Techniques

Archipelago/Skitter/iPlane

Production Topology Measurement
  • Ark/Skitter (CAIDA), iPlane (UW)
  • Multiple days and significant resources for complete cycle

Ark probing strategy:
  • IPv4 space divided into /24’s; partitioned across ∼41 monitors
  • From each /24, select a single address at random to probe
  • Probe == Scamper [L10]; record router interfaces on forward path
  • A “cycle” == probes to all routed /24’s

Investigate one vantage point (Jan, 2010):

              Ark      iPlane
  Traces      263K     150K
  Probes      4.4M     2.5M
  Prefixes    55K      30K
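The Ark-style target selection described on this slide can be sketched roughly as follows. This is a minimal illustration, not the CAIDA implementation: the function name `ark_style_targets`, the seeded random generator, and the uniformly random assignment of /24s to monitors are assumptions made for the example.

```python
import ipaddress
import random


def ark_style_targets(routed_prefixes, num_monitors, seed=0):
    """Pick one random address per /24 and deal the /24s across monitors.

    Hypothetical helper: the uniform random monitor assignment is an
    assumption, not a description of the production system.
    """
    rng = random.Random(seed)
    assignments = {monitor: [] for monitor in range(num_monitors)}
    for prefix in routed_prefixes:
        net = ipaddress.ip_network(prefix)
        # Split each routed prefix into its constituent /24 blocks;
        # prefixes longer than /24 are kept as a single block.
        blocks = [net] if net.prefixlen >= 24 else net.subnets(new_prefix=24)
        for block in blocks:
            # One destination address chosen at random inside the block.
            target = block[rng.randrange(block.num_addresses)]
            monitor = rng.randrange(num_monitors)
            assignments[monitor].append(str(target))
    return assignments


if __name__ == "__main__":
    plan = ark_style_targets(["10.0.0.0/22", "192.0.2.0/24"], num_monitors=41)
    for monitor, targets in plan.items():
        if targets:
            print(monitor, targets)
```

One "cycle" in this sketch corresponds to a single call over all routed prefixes; each returned address would then be handed to a traceroute-style prober.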

SLIDE 7

The Problem Measurement Techniques

Path-pair Distance Metric

Q1: How similar are traceroutes to the same destination BGP prefix?
  • Use Levenshtein “edit” distance DP algorithm
  • Determine the minimum number of edits (insert, delete, substitute) to transform one string into another
  • e.g. “robert” → “robber” = 2
  • We use: Σ = {0, 1, . . . , 2^32 − 1}
  • Each unsigned 32-bit IP address along traceroute paths ∈ Σ

Example (ED = 2):
  Path 1: 129.186.6.251 → 129.186.254.131 → 192.245.179.52 → 4.53.34.13
  Path 2: 129.186.6.251 → 192.245.179.52 → 4.69.145.12
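The path-pair metric can be reproduced with a textbook Levenshtein dynamic program in which each symbol is a hop's 32-bit address. The sketch below is an illustration under that reading of the slide; the helper names `edit_distance` and `to_symbols` are made up for the example.

```python
import ipaddress


def edit_distance(path_a, path_b):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(path_b) + 1))
    for i, hop_a in enumerate(path_a, start=1):
        curr = [i]
        for j, hop_b in enumerate(path_b, start=1):
            cost = 0 if hop_a == hop_b else 1
            curr.append(min(prev[j] + 1,          # delete hop_a
                            curr[j - 1] + 1,      # insert hop_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def to_symbols(hops):
    """Map dotted-quad hops to unsigned 32-bit integers (the alphabet Σ)."""
    return [int(ipaddress.IPv4Address(hop)) for hop in hops]


path_1 = to_symbols(["129.186.6.251", "129.186.254.131",
                     "192.245.179.52", "4.53.34.13"])
path_2 = to_symbols(["129.186.6.251", "192.245.179.52", "4.69.145.12"])

print(edit_distance("robert", "robber"))  # 2
print(edit_distance(path_1, path_2))      # 2
```

For the two example paths, the two edits are one deletion (129.186.254.131) and one substitution of the final hop.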

SLIDE 10

The Problem Measurement Techniques

Path-pair Distance Metric

[Figure: CDF of path-pair edit distance. x-axis: Levenshtein Edit Distance; y-axis: Cumulative Fraction of Path Pairs. Series: Intra-BGP Prefix (Ark), Intra-BGP Prefix (iPlane), Random Prefix Pair.]

Q1: How similar are traceroutes to the same destination BGP prefix?
  • ∼60% of traces to destinations in same BGP prefix have ED ≤ 3
  • Fewer than 50% of random traces have ED ≤ 10
  • Confirms our intuition

SLIDE 11

The Problem Measurement Techniques

Edit Distance

Q2: How much path variance is due to the last-hop AS?
  • Intuitively, number of potential paths exponential in the depth
  • More information gain at the end of the traceroute?

[Figure: Monitor, Internet, and routers (Rtr).]

SLIDE 13

The Problem Measurement Techniques

Edit Distance

[Figure: CDF of path-pair edit distance with the last-hop AS removed. x-axis: Levenshtein Edit Distance (last-hop AS removed); y-axis: Cumulative Fraction of Path Pairs. Series: Intra-BGP Prefix (Ark), Intra-BGP Prefix (iPlane), Random Prefix Pair.]

Q2: Variance due to the last-hop AS?
  • Lob off last AS
  • Answer: lots!
  • For ∼70% of probes to same prefix, we get no additional information beyond leaf AS
  • Significant packet savings possible (DoubleTree)

SLIDE 14

Methodology

Adaptive Probing Methodology

Meta-Conclusion: adaptive probing a useful strategy

We develop three primitives:
  1. Subnet Centric Probing
  2. Vantage Point Spreading
  3. Interface Set Cover

These primitives leverage adaptive sampling, external knowledge (e.g., common subnetting structure, BGP, etc.), and data from prior cycles to maximize efficiency and information gain of each probe.

SLIDE 15

Methodology

Adaptive Probing Methodology

We develop three primitives:
  1. Subnet Centric Probing
  2. Vantage Point Spreading
  3. Interface Set Cover

Best explained by understanding sources of path diversity:

[Figure: four vantage points, an AS ingress, and destinations D1, D2, D3.]

SLIDE 16

Methodology

Subnet Centric Probing

Granularity vs. Scaling
  • ∼2^32 − 1 possible destinations (2.9B from Jan 2010 routeviews)
  • What granularity? /24’s? Prefixes? AS’s?

Subnet Centric Probing

[Figure: one vantage point, an AS ingress, and destinations D1, D2, D3.]

  • From a single vantage point, no path diversity into the AS
  • Path diversity due to AS-internal structure

SLIDE 17

Methodology

Subnet Centric Probing

[Figure: one vantage point, an AS ingress, and destinations D1, D2, D3.]

  • Goal: adapt granularity, discover internal structure
  • Leverage BGP as coarse structure
  • Follow least common prefix: iteratively pick destinations within prefix that are maximally distant (in subnetting sense)
  • Address “distance” is misleading: e.g. 18.255.255.100 vs. 19.0.0.4 vs. 18.0.0.5
  • Stopping criterion: ED(t_i, t_{i+1}) ≤ τ; τ = 3
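The point about misleading address "distance" can be made concrete by comparing numeric gaps with shared-prefix lengths. The sketch below is only an illustration: `common_prefix_len`, `numeric_gap`, and `most_distant_pair` are hypothetical helpers, and picking the midpoint of each half of a prefix is an assumption about what "maximally distant in the subnetting sense" could mean, not the authors' exact selection rule.

```python
import ipaddress


def common_prefix_len(addr_a, addr_b):
    """Number of leading bits two IPv4 addresses share (0..32)."""
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    return 32 - (a ^ b).bit_length()


def numeric_gap(addr_a, addr_b):
    """Plain integer difference between two IPv4 addresses."""
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    return abs(a - b)


def most_distant_pair(prefix):
    """One address from each half of `prefix` (assumed selection rule).

    The two results differ in the first bit after the prefix mask, so they
    share no more than the prefix itself. Requires a prefix shorter than /32.
    """
    low, high = ipaddress.ip_network(prefix).subnets(prefixlen_diff=1)
    return low[low.num_addresses // 2], high[high.num_addresses // 2]


print(numeric_gap("18.255.255.100", "19.0.0.4"))        # 160
print(common_prefix_len("18.255.255.100", "19.0.0.4"))  # 7
print(numeric_gap("18.255.255.100", "18.0.0.5"))        # 16777055
print(common_prefix_len("18.255.255.100", "18.0.0.5"))  # 8
print(most_distant_pair("18.0.0.0/8"))
```

The first pair is numerically only 160 addresses apart yet falls on opposite sides of an /8 boundary, while the second pair is millions of addresses apart but sits inside the same /8.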

SLIDE 18

Methodology

Subnet Centric Probing

[Figure, left: degree distribution, Count vs. Degree on logarithmic axes. Series: Prefix Directed, AS Directed, Subnet Centric, Ark (Ground Truth). Right: Difference from Ark Ground Truth for Vertices and Edges. Series: Subnet-Centric, Prefix-Directed, AS-Directed.]

  • Inferred degree distribution approximates ground truth well
  • Captures ≥ 90% of the vertex and edge fidelity

SLIDE 19

Methodology

Subnet Centric Probing

[Figure, left: Difference from Ark Ground Truth for Vertices and Edges. Right: Difference from Ark Ground Truth for Probes (load). Series: Subnet-Centric, Prefix-Directed, AS-Directed.]

  • Captures ≥ 90% of the vertex and edge fidelity
  • Using ∼60% of ground-truth load

SLIDE 20

Methodology

Vantage Point Spreading

Vantage Point Spreading

[Figure: four vantage points, an AS ingress, and destinations D1, D2, D3.]

  • Discover AS ingress points and paths to the AS via multiple vantage points
  • Random assignment of destinations to vantage points is wasteful
  • E.g. empirically, the 16 /24’s in a /20 prefix are hit on average by 12 unique VPs

SLIDE 21

Methodology

Vantage Point Spreading

Vantage Point Spreading

[Figure: four vantage points, an AS ingress, and destinations D1, D2, D3.]

  • Using BGP knowledge, maximize the number of distinct VPs per-prefix
  • Note, this is complementary to SCP
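One simple way to maximize distinct VPs per prefix is to deal the /24s of each BGP prefix to vantage points in round-robin order instead of at random. The sketch below assumes exactly that; the function name `spread_vantage_points` and the round-robin policy are illustrative choices, not necessarily the authors' assignment scheme.

```python
import ipaddress
from itertools import cycle


def spread_vantage_points(bgp_prefixes, vantage_points):
    """Assign each /24 to a VP so /24s of one BGP prefix use distinct VPs.

    Round-robin assignment is an assumed policy: within a prefix, no VP is
    reused until every VP has been used once.
    """
    assignment = {}
    vp_cycle = cycle(vantage_points)
    for prefix in bgp_prefixes:
        net = ipaddress.ip_network(prefix)
        # Prefixes longer than /24 are treated as a single block.
        blocks = [net] if net.prefixlen >= 24 else net.subnets(new_prefix=24)
        for block in blocks:
            assignment[str(block)] = next(vp_cycle)
    return assignment


if __name__ == "__main__":
    vps = [f"vp{i}" for i in range(41)]
    plan = spread_vantage_points(["10.0.0.0/20"], vps)
    print(len(plan), "blocks probed by", len(set(plan.values())), "distinct VPs")
```

With 41 vantage points, the 16 /24s of a /20 each get a different VP, compared with the roughly 12 unique VPs the slide reports for random assignment.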

SLIDE 22

Methodology

Vantage Point Spreading

[Figure, left: VP Influence. x-axis: Number of VPs for Destination; y-axis: Unique Interfaces Discovered; reference line y=10x. Right: VP Selection. x-axis: Size of Prefix from which /24s Drawn; y-axis: Vertices. Series: Spreading, Random, Single.]

  • Diminishing return of vantage point influence
  • Vertices in resulting graph as compared to random: ∼6% increase “for free.”

SLIDE 23

Methodology

Interface Set Cover

Interface Set Cover
  • As shown in preceding analysis, full traces very inefficient
  • Perform greedy minimum set cover approximation (NP-complete)
  • Select subset of prior round probe packets for current round

[Figure: four vantage points and destinations D1, D2, D3.]

SLIDE 24

Methodology

Interface Set Cover

Interface Set Cover
  • Generalizes DoubleTree [DRFC05] without parametrization
  • Efficient
  • Inherently multi-round
  • Additional probing for validation mis-matches (e.g. load balancing, new paths)

[Figure: four vantage points and destinations D1, D2, D3.]

SLIDE 25

Methodology

Interface Set Cover

[Figure, left: Fraction of Missed Interfaces vs. Cycle Separation (Days). Right: Load Ratio to Full Traceroutes vs. Cycle Separation (Days). Series: Full Trace SetCover, ISC.]

  • 20K random IP destinations each day over a two-week period, fraction of missing interfaces using ISC
  • Uses ≤ 20% of the full probing load (∼30% of full trace set cover)

SLIDE 26

Summary

Summary

Take-Aways:
  • Deconstructed Ark/iPlane topology tracing as case study
  • Developed primitives for faster, more efficient probing: Subnet Centric Probing, Interface Set Cover, Vantage Point Spreading
  • Significant load savings without sacrificing fidelity

Future:
  • Combining our primitives on production system
  • Refine ISC “change-driven” logic
  • Build a better Internet scope to detect small-scale dynamics

Thanks! Questions?
