SLIDE 1 Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements
Romain Fontugne (1), Emile Aben (2), Cristel Pelsser (3), Randy Bush (1)
November 1, 2017
(1) IIJ Research Lab, (2) RIPE NCC, (3) University of Strasbourg / CNRS
1 / 25
SLIDE 2
Understanding Internet health?
2 / 25
SLIDE 6 Understanding Internet health? (Problems)
Manual observations
- Traceroute / Ping / Operators’ group mailing lists
- Slow process
- Small visibility
→ Our goal: Systematically pinpoint network disruptions
- Delay changes
- Forwarding anomalies (not covered here, see the paper)
3 / 25
SLIDE 7
Silly solution: frequent traceroutes to the whole Internet!
→ Doesn’t scale
→ Overloads the network
4 / 25
SLIDE 8
Better solution: mine results from deployed platforms
→ Cooperative and distributed approach
→ Using existing data, no added burden to the network
5 / 25
SLIDE 9 RIPE Atlas
Actively measures Internet connectivity
- Multiple types of measurement: ping, traceroute, DNS, SSL, NTP and HTTP
- 10 000 active probes!
- Data for numerous measurements is made publicly available
6 / 25
SLIDE 10 RIPE Atlas: traceroutes
Two repetitive large-scale measurements
- Builtin: traceroute every 30 minutes to all DNS root servers (≈ 500 server instances)
- Anchoring: traceroute every 15 minutes to 189 collaborative servers
Analyzed dataset
- May to December 2015
- 2.8 billion IPv4 traceroutes
- 1.2 billion IPv6 traceroutes
7 / 25
SLIDE 11 Monitor delays with traceroute?
Challenges:
- Traffic asymmetry
[Figure: Traceroutes from CZ to BD; x-axis: number of hops, y-axis: RTT (ms)]
8 / 25
SLIDE 12
Monitor delays with traceroute?
Traceroute to “www.target.com”
- Round Trip Time (RTT) between B and C?
- Report abnormal RTT between B and C?
9 / 25
SLIDE 13
What is the RTT between B and C?
Differential RTT: ∆CB = RTTC − RTTB = RTTCB ?
10 / 25
SLIDE 15 What is the RTT between B and C?
RTTC - RTTB = RTTCB?
- No!
- Traffic is asymmetric
- RTTB and RTTC take different return paths!
- Differential RTT: ∆CB = RTTC − RTTB = dBC + ep (dBC: delay on link BC; ep: difference between the two return paths)
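The differential RTT above can be sketched in a few lines. This is only an illustration under stated assumptions: the input layout (`hops` as a list of `(ip, [rtt_ms, ...])` pairs) and the function name `differential_rtts` are hypothetical, and collapsing each hop's samples with a median is a simplification made here, not necessarily what the authors do.

```python
from statistics import median

def differential_rtts(hops):
    """Return {(ip_near, ip_far): differential RTT in ms} for adjacent hops.

    `hops` is a hypothetical, simplified traceroute: a list of
    (ip, [rtt_ms, ...]) tuples in path order.
    """
    diffs = {}
    for (ip_b, rtts_b), (ip_c, rtts_c) in zip(hops, hops[1:]):
        if not rtts_b or not rtts_c:
            continue  # skip pairs where one hop did not answer
        # Delta_CB = RTT_C - RTT_B (per-hop samples collapsed with a median)
        diffs[(ip_b, ip_c)] = median(rtts_c) - median(rtts_b)
    return diffs

# Made-up RTT samples for a three-hop path A -> B -> C
trace = [
    ("A", [1.0, 1.2, 1.1]),
    ("B", [10.0, 10.4, 10.2]),
    ("C", [15.1, 15.3, 15.2]),
]
print(differential_rtts(trace))
```

The value for the pair (B, C) still contains the return-path term ep, so a single probe cannot tell which segment changed.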
11 / 25
SLIDE 16 Problem with differential RTT
Monitoring ∆CB over time:
[Figure: ∆RTT over time]
→ Delay change on BC? CD? DA? BA???
12 / 25
SLIDE 20 Proposed Approach: Use probes with different return paths
Differential RTT: ∆CB = {x0, x1, x2, x3, x4}
Median ∆CB:
- Stable if the delay changes on only a few return paths
- Fluctuates if the delay on BC changes
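A toy illustration of why the median behaves this way. All sample values are made up: each number stands for the differential RTT seen by one probe, and each probe is assumed to have its own return path.

```python
from statistics import median

# Hypothetical differential RTTs (ms) for link BC, one value per probe
baseline = [5.0, 5.1, 4.9, 5.2, 5.0]

# Case 1: the delay changes on a single return path (only one probe affected)
one_return_path_shift = [5.0, 5.1, 4.9, 5.2, 45.0]

# Case 2: the delay changes on BC itself (every probe sees it)
link_shift = [x + 20.0 for x in baseline]

print(median(baseline))               # reference behaviour
print(median(one_return_path_shift))  # barely moves
print(median(link_shift))             # moves by the full link delay change
```

The more probes with distinct return paths observe the link, the more return-path changes the median can absorb.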
13 / 25
SLIDE 21 Median Diff. RTT: Example
Tier1 link, 2 weeks of data, 95 probes:
[Figure: 130.117.0.250 (Cogent, Zurich) - 154.54.38.50 (Cogent, Munich), June 2015. Top panel: raw differential RTT values (ms). Bottom panel: median differential RTT (ms) with normal reference.]
- Stable despite noisy RTTs
- Normally distributed
- Confidence interval: Wilson score
- Normal reference: exponential smoothing
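The two ingredients named above can be sketched as follows. This is a sketch under assumptions, not the authors' implementation: the Wilson score interval for p = 0.5 is mapped to ranks in the sorted samples to bound the median (one common way to do it), the rank rounding is approximate, and the smoothing factor `alpha=0.01` is a hypothetical default. All function names are illustrative.

```python
import math

def wilson_bounds(n, p=0.5, z=1.96):
    """Wilson score interval for a proportion p observed over n samples."""
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def median_confidence_interval(samples, z=1.96):
    """Approximate confidence interval of the median via order statistics."""
    xs = sorted(samples)
    n = len(xs)
    lo, hi = wilson_bounds(n, 0.5, z)
    # Map the proportion bounds to ranks in the sorted samples
    low = xs[max(0, math.floor(n * lo))]
    high = xs[min(n - 1, math.ceil(n * hi))]
    return low, high

def update_reference(reference, observed_median, alpha=0.01):
    """Exponential smoothing of the normal reference (alpha is an assumption)."""
    if reference is None:
        return observed_median  # first observation initialises the reference
    return alpha * observed_median + (1 - alpha) * reference

samples = list(range(101))  # made-up differential RTTs, median is 50
print(median_confidence_interval(samples))
print(update_reference(5.0, 15.0, alpha=0.1))
```

A small smoothing factor keeps the reference slow to react, so a short-lived delay change does not become the new normal.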
14 / 25
SLIDE 22 Detecting Delay Changes
[Figure: 72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE)), Nov. 26 to Dec. 1, 2015. Median differential RTT (ms), normal reference, and detected anomalies.]
Significant RTT changes: Confidence interval not overlapping with the normal reference
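The non-overlap test can be written as a small predicate. This sketch assumes that both the current median and the normal reference are carried as `(low, high)` intervals in milliseconds; the function name and the sample values are hypothetical.

```python
def is_delay_anomaly(current_ci, reference_ci):
    """Return True when the two intervals do not overlap.

    Both arguments are (low, high) tuples in milliseconds.
    """
    cur_low, cur_high = current_ci
    ref_low, ref_high = reference_ci
    return cur_low > ref_high or cur_high < ref_low

reference = (4.8, 5.6)  # made-up normal reference interval

print(is_delay_anomaly((5.0, 5.4), reference))    # overlaps the reference
print(is_delay_anomaly((20.0, 25.0), reference))  # entirely above the reference
print(is_delay_anomaly((1.0, 2.0), reference))    # entirely below the reference
```

Checking both directions means that an unusual delay decrease is reported as well as an increase.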
15 / 25
SLIDE 23 Results
Analyzed dataset
- Atlas builtin/anchoring measurements
- From May to Dec. 2015
- Observed 262k IPv4 and 42k IPv6 links
We found a lot of delay changes! Let’s see only two prominent examples
16 / 25
SLIDE 24 Case study: DDoS on DNS root servers
Two attacks:
- Nov. 30th 2015
- Dec. 1st 2015
Almost all servers are anycast
the 531 sites?
instances altered by the attacks
17 / 25
SLIDE 25 Observed delay changes
[Figure: three panels, Nov. 26 to Dec. 1, 2015. Median differential RTT (ms), normal reference, and detected anomalies for:]
- 193.0.14.129 (K-root) - 74.208.6.124 (1&1, Kansas City)
- 72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE))
- 188.93.16.77 (Selectel, St. Petersburg) - 95.213.189.0 (Selectel, Moscow)
affected only by one attack
Russia
18 / 25
SLIDE 26 Unaffected root servers
[Figure: 193.0.14.129 (K-root) - 212.191.229.90 (Poznan, PL), Nov. 26 to Dec. 1, 2015. Median differential RTT (ms) and normal reference.]
Very stable delay during the attacks
- Thanks to anycast!
- Far from the attackers
19 / 25
SLIDE 27
Congested links for servers F, I, and K
→ Concentration of malicious traffic at IXPs
20 / 25
SLIDE 28
Case study: Telekom Malaysia BGP leak
21 / 25
SLIDE 32
Case study: Telekom Malaysia BGP leak
Not only with Google... but about 170k prefixes!
22 / 25
SLIDE 33 Congestion in Level3
Rerouted traffic has congested Level3 (120 reported links)
- Example: 229ms increase between two routers in London!
[Figure: 67.16.133.130 - 67.17.106.150, June 8 to June 13, 2015. Median differential RTT (ms), normal reference, and detected anomalies.]
23 / 25
SLIDE 34 Congestion in Level3
Reported links in London:
[Figure legend: delay increase; delay & packet loss]
→ Traffic staying within UK/Europe may also be altered
24 / 25
SLIDE 35 Summary
Detect and locate delay and forwarding anomalies in billions of traceroutes
- Non-parametric and robust statistics
- Diverse root causes: remote attacks, routing anomalies, etc.
- Gives many new insights into reported events
Online detection for network operators
- http://ihr.iijlab.net/
25 / 25