Confidence Weighted Marginal Utility Analyses of Internet Mapping Techniques
Craig Prince and Danny Wyatt
December 6, 2004, CSE561 Networks
Internet Mapping
What is it?
Figure out what the internet looks like
Find routers and their interconnections
Discern a topology
What is it good for?
Research simulations
Problem diagnosis
Routing in overlay networks
Spying on competing ISPs
How do you map the internet?
Cannot directly observe it
Have to send traceroutes through it
What you see depends on:
Source
Target
Routing policies
... and the topology itself!
Can only control source and target...
Errors can occur that do not reflect true topology...
Things change over time...
...so just add as many as you can
Is more really better?
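Aggregating many traceroutes into a single map can be sketched as follows. This is a minimal illustration with made-up addresses, not the pipeline any of the tools discussed actually uses:

```python
# Sketch: aggregating traceroute-style paths into a map graph.
# Hypothetical trace data; real traces come from tools like Skitter.

def build_map(traces):
    """Aggregate hop sequences into a map G = (V, E).

    Each trace is a list of router addresses from source toward target.
    Consecutive hops become directed edges; repeated observations are
    counted, since confidence weighting needs n(e) per edge.
    """
    nodes = set()
    edge_counts = {}  # (u, v) -> number of observations n(e)
    for path in traces:
        nodes.update(path)
        for u, v in zip(path, path[1:]):
            edge_counts[(u, v)] = edge_counts.get((u, v), 0) + 1
    return nodes, edge_counts

traces = [
    ["10.0.0.1", "10.0.1.1", "10.0.2.1"],
    ["10.0.0.1", "10.0.1.1", "10.0.3.1"],
]
nodes, edges = build_map(traces)
# The shared first hop ("10.0.0.1", "10.0.1.1") is observed twice.
```

Counting repeat observations per edge, rather than just recording presence, is what later lets the map discount edges that were seen only once by a possibly erroneous probe.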
Aims
How do different mapping tools compare in their efficient use of data?
Are some kinds of measurements more valuable than others?
If we are uncertain of our observations, how would different methods address that uncertainty?
The Data: 3 Mapping Tools
Skitter
24 distributed sources
Each uses 1 or more of 4 lists of preselected targets
Continually loops through the lists
We use 3 days: 12/18-20, 2002
Scriptroute
70 distributed PlanetLab nodes
Each used the same list of 125,000 address prefixes
Attempted all traces once a day for three days (same days as above)
Rocketfuel
837 distributed public traceroute servers
≈60,000 targets
Heuristic pruning of source-target pairs to maximize coverage
Data collected over January, 2002
Methodology
Some Definitions
A map is a directed graph G = (V, E)
There is some true map Ĝ = (V̂, Ê), impossible to observe fully, with 100% perfect coverage
A map is made by aggregating many measurements
Coverage is how well one map approximates another
Marginal coverage is how much each measurement contributes to its map
We evaluate the marginal coverage of each of the three tools
More Definitions: Confidence Weighting
Traceroutes are noisy sensors with probability of error d
n(e) is the number of observations of edge e ∈ E
Probability that e exists: P(e) = 1 − d^n(e)
Edge coverage of G is the mean probability over all edges:

coverage(G) = ( Σ_{e∈E} P(e) ) / |E|

Node coverage is defined similarly
For each analysis, we also consider how results vary for different values of d
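As a minimal sketch of this confidence weighting, with d the per-trace error probability and n(e) the observation count for each edge, the edge coverage of a map could be computed like this (the dictionary layout is an assumption for illustration):

```python
def edge_coverage(edge_counts, d):
    """Mean existence probability over all observed edges.

    P(e) = 1 - d**n(e), where n(e) is the observation count for edge e
    and d is the per-traceroute error probability.
    """
    if not edge_counts:
        return 0.0
    return sum(1 - d ** n for n in edge_counts.values()) / len(edge_counts)

counts = {("a", "b"): 1, ("b", "c"): 3}
# With d = 0.5: P(e) is 0.5 and 0.875, so the mean coverage is 0.6875
print(edge_coverage(counts, 0.5))
```

Note that at d = 0 every observed edge counts fully (coverage 1.0), while larger d rewards edges that have been re-observed many times.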
Node Coverage per Source
[Figure: node coverage vs. number of sources for Skitter, Scriptroute, and Rocketfuel, with curves for error probabilities d = 0.0, 0.3, 0.5, 0.9 and the total number of nodes probed]
Analyses
Entropy
H(A) = −Σ_{a∈A} P(a) log(P(a))
Average number of bits needed to encode each event a
We take the entropy of the mean node and edge distributions
Should always be changing
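The entropy computation is standard; a small self-contained sketch (with base-2 logs, matching the bits interpretation):

```python
import math

def entropy(probs):
    """H = -sum p * log2(p) over a probability distribution, in bits.

    Zero-probability events contribute nothing, by convention.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over 4 events needs 2 bits per event;
# a peaked distribution needs fewer.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy([0.9, 0.1]))
```

As observations accumulate and edge probabilities approach certainty, the entropy of the coverage distribution shifts, which is what makes it usable as a per-measurement signal.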
Edge Entropy per Source
[Figure: edge entropy vs. number of sources for Skitter, Scriptroute, and Rocketfuel, with curves for error probabilities d = 0.0, 0.3, 0.5, 0.9 and the optimal ordering ("Opt")]
Kullback-Leibler Divergence
KL(A||B) = Σ_a p_A(a) log( p_A(a) / p_B(a) )
Also known as relative entropy
Average extra bits per event for encoding according to the wrong distribution
We measure divergence between coverage up to a measurement and final coverage
Marginal utility is the decrease in K-L divergence between measurements
K-L Divergence per Source
[Figure: K-L divergence vs. number of sources for Skitter, Scriptroute, and Rocketfuel, with curves for error probabilities d = 0.0, 0.3, 0.5, 0.9 and the optimal ordering ("Opt")]
K-L Divergence per Target
[Figure: edge K-L divergence vs. number of targets for Scriptroute and Rocketfuel, with curves for error probabilities d = 0.0, 0.3, 0.5, 0.9 and the optimal ordering ("Opt")]
Conclusions
Adding targets is more useful than adding sources
Half of all coverage comes from the first few sources
Rocketfuel does increase its per-measurement return
More targets always yield more information
More sources have diminishing returns, but higher than the other tools
There is a pronounced trade-off in confidence
Rocketfuel has more divergence between different error probabilities
More redundant tools are less affected
These metrics can be used as heuristics for quicker mapping
Reordering the last two days of Skitter data according to the first day:
[Figure: node coverage per source for Skitter, day 1 vs. days 2 and 3 reordered by the first day; curves for error probabilities d = 0.0, 0.3, 0.5, 0.9 and total probed]
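One way such a reordering heuristic might look is a greedy ranking of sources by how much new coverage each contributed on day 1. This is an assumption about the shape of the procedure, not the authors' exact method; the data layout is hypothetical:

```python
def reorder_by_marginal_coverage(sources, day1_observations):
    """Greedily rank sources by marginal node coverage from day-1 data,
    so later days probe from the highest-utility sources first.

    day1_observations maps each source to the set of nodes it observed.
    """
    order = []
    covered = set()
    remaining = set(sources)
    while remaining:
        # Pick the source contributing the most not-yet-covered nodes.
        best = max(remaining, key=lambda s: len(day1_observations[s] - covered))
        order.append(best)
        covered |= day1_observations[best]
        remaining.remove(best)
    return order

obs = {"s1": {1, 2, 3}, "s2": {2, 3}, "s3": {4, 5, 6, 7}}
print(reorder_by_marginal_coverage(["s1", "s2", "s3"], obs))  # ['s3', 's1', 's2']
```

Note that "s2" ranks last even though it observed real nodes, because everything it saw was already covered by "s1": marginal, not absolute, coverage drives the ordering.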