Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements

Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements. Romain Fontugne (1), Emile Aben (2), Cristel Pelsser (3), Randy Bush (1). November 1, 2017. (1) IIJ Research Lab, (2) RIPE NCC, (3) University of Strasbourg / CNRS


  1. Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements. Romain Fontugne (1), Emile Aben (2), Cristel Pelsser (3), Randy Bush (1). November 1, 2017. (1) IIJ Research Lab, (2) RIPE NCC, (3) University of Strasbourg / CNRS 1 / 25

  2. Understanding Internet health? 2 / 25

  5. Understanding Internet health? (Problems) Manual observations • Traceroute / Ping / Operators’ group mailing lists • Slow process • Small visibility 3 / 25

  6. Understanding Internet health? (Problems) Manual observations • Traceroute / Ping / Operators’ group mailing lists • Slow process • Small visibility → Our goal: Systematically pinpoint network disruptions • Delay changes • Forwarding anomalies (not covered here, see the paper) 3 / 25

  7. Silly solution: frequent traceroutes to the whole Internet! → Doesn’t scale → Overloads the network 4 / 25

  8. Better solution: mine results from deployed platforms → Cooperative and distributed approach → Using existing data, no added burden to the network 5 / 25

  9. RIPE Atlas: actively measures Internet connectivity • Multiple types of measurement: ping, traceroute, DNS, SSL, NTP and HTTP • 10 000 active probes! • Data for numerous measurements is made publicly available 6 / 25

  10. RIPE Atlas: traceroutes Two repetitive large-scale measurements • Builtin: traceroute every 30 minutes to all DNS root servers (≈ 500 server instances) • Anchoring: traceroute every 15 minutes to 189 collaborative servers Analyzed dataset • May to December 2015 • 2.8 billion IPv4 traceroutes • 1.2 billion IPv6 traceroutes 7 / 25

  11. Monitor delays with traceroute? [Figure: Traceroutes from CZ to BD; RTT (ms) vs. number of hops] Challenges: • Noisy data • Traffic asymmetry • Packet loss 8 / 25

  12. Monitor delays with traceroute? Traceroute to “www.target.com” Round Trip Time (RTT) between B and C? Report abnormal RTT between B and C? 9 / 25

  13. What is the RTT between B and C? Differential RTT: $\Delta_{CB} = RTT_C - RTT_B = RTT_{CB}$ ? 10 / 25

  14. What is the RTT between B and C? $RTT_C - RTT_B = RTT_{CB}$ ? • No! • Traffic is asymmetric • $RTT_B$ and $RTT_C$ take different return paths! 11 / 25

  15. What is the RTT between B and C? $RTT_C - RTT_B = RTT_{CB}$ ? • No! • Traffic is asymmetric • $RTT_B$ and $RTT_C$ take different return paths! • Differential RTT: $\Delta_{CB} = RTT_C - RTT_B = d_{BC} + e_p$, where $d_{BC}$ is the delay on link BC and $e_p$ is the error due to the different return paths 11 / 25
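
The differential RTT above can be sketched in a few lines of Python. This is only an illustration, assuming a traceroute has already been reduced to an ordered list of (router, RTT in ms) pairs; the function name `differential_rtts` and the sample values are hypothetical and do not come from the talk.

```python
from typing import Dict, List, Tuple


def differential_rtts(hops: List[Tuple[str, float]]) -> Dict[Tuple[str, str], float]:
    """Return RTT(far hop) - RTT(near hop) for each pair of adjacent hops."""
    return {
        (near, far): rtt_far - rtt_near
        for (near, rtt_near), (far, rtt_far) in zip(hops, hops[1:])
    }


# Hypothetical traceroute seen by one probe: routers A, B, C in path order.
trace = [("A", 1.5), ("B", 10.0), ("C", 15.25)]
deltas = differential_rtts(trace)
print(deltas[("B", "C")])  # 5.25, i.e. RTT_C - RTT_B for this probe
```

The value for ("B", "C") is a single sample of the differential RTT: it still contains the return-path error term, so it should not be read as the delay of link BC on its own.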

  16. Problem with differential RTT Monitoring $\Delta_{CB}$ over time: [Figure: differential RTT over time] → Delay change on BC? CD? DA? BA? 12 / 25

  17. Proposed Approach: Use probes with different return paths Differential RTT: $\Delta_{CB} = \{x_0\}$ 13 / 25

  18. Proposed Approach: Use probes with different return paths Differential RTT: $\Delta_{CB} = \{x_0, x_1\}$ 13 / 25

  19. Proposed Approach: Use probes with different return paths Differential RTT: $\Delta_{CB} = \{x_0, x_1, x_2, x_3, x_4\}$ 13 / 25

  20. Proposed Approach: Use probes with different return paths Differential RTT: $\Delta_{CB} = \{x_0, x_1, x_2, x_3, x_4\}$ Median $\Delta_{CB}$: • Stable if the delay changes on only a few return paths • Fluctuates if the delay on BC changes 13 / 25
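
A small numeric illustration of why the median is used, assuming five probes that each report one differential RTT for link BC. All values are made up for the example.

```python
from statistics import median

# Hypothetical differential RTTs (ms) for link BC, one value per probe.
baseline = [5.0, 5.25, 4.75, 5.5, 5.125]

# Case 1: the return path of a single probe becomes 40 ms slower.
one_return_path_changed = [5.0, 5.25, 4.75, 45.5, 5.125]

# Case 2: link BC itself becomes 40 ms slower, so every probe sees it.
link_changed = [x + 40.0 for x in baseline]

print(median(baseline))                 # 5.125
print(median(one_return_path_changed))  # 5.125 (unchanged)
print(median(link_changed))             # 45.125 (shifted by 40 ms)
```

A mean would move in both cases, which is one reason a rank-based statistic suits this setting.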

  21. Median Diff. RTT: Example Tier1 link, 2 weeks of data, 95 probes: 130.117.0.250 (Cogent, Zurich) - 154.54.38.50 (Cogent, Munich) [Figure, two panels over Jun 2-14, 2015: raw differential RTT values (ms); median differential RTT (ms) with normal reference] • Stable despite noisy RTTs • Confidence interval: Wilson score • Normally distributed • Normal reference: exponential smoothing 14 / 25
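
One way to read "Conf. interval: Wilson score" is to compute the Wilson score interval for the proportion 0.5 and map its bounds to ranks in the sorted samples. The sketch below does that under stated assumptions: the function name, the z = 1.96 default, and the rank-rounding rule are choices made for this example and are not confirmed by the slides.

```python
import math
from statistics import median
from typing import List, Tuple


def median_confidence_interval(
    samples: List[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (lower, median, upper) from Wilson score bounds on the median's rank."""
    if not samples:
        raise ValueError("need at least one sample")
    xs = sorted(samples)
    n = len(xs)
    p = 0.5  # the median is the 0.5 quantile

    # Wilson score interval for a proportion p observed over n samples.
    center = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    scale = 1 + z * z / n
    w_low = (center - spread) / scale
    w_high = (center + spread) / scale

    # Assumption: map the proportion bounds to 0-based positions in xs.
    lo_idx = max(0, math.floor(n * w_low))
    hi_idx = min(n - 1, math.ceil(n * w_high) - 1)
    return xs[lo_idx], median(xs), xs[hi_idx]


# 95 hypothetical samples (the slide's example link also had 95 probes).
values = [float(i) for i in range(95)]
print(median_confidence_interval(values))  # (38.0, 47.0, 56.0)
```

Because the bounds are taken from ranks rather than from a variance estimate, a handful of extreme RTT samples does not widen the interval.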

  22. Detecting Delay Changes 72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE)) [Figure: differential RTT (ms) over Nov 26 - Dec 1, 2015, showing median diff. RTT, normal reference, and detected anomalies] Significant RTT changes: Confidence interval not overlapping with the normal reference 15 / 25
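
The detection rule on this slide, combined with the exponentially smoothed reference mentioned earlier, could look roughly like the sketch below. It assumes each time bin is already summarized as a (lower, upper) confidence interval in ms. The smoothing factor `alpha`, the bootstrap from the first bin, and the choice to skip reference updates on anomalous bins are assumptions made for this example, not details taken from the talk.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds in ms


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def detect_delay_changes(observed: List[Interval], alpha: float = 0.1) -> List[int]:
    """Return indices of time bins whose interval misses the normal reference."""
    if not observed:
        return []
    ref_low, ref_high = observed[0]  # assumption: bootstrap from the first bin
    anomalies = []
    for i, (low, high) in enumerate(observed[1:], start=1):
        if not overlaps((low, high), (ref_low, ref_high)):
            anomalies.append(i)
        else:
            # Exponential smoothing, applied only to bins considered normal.
            ref_low = alpha * low + (1 - alpha) * ref_low
            ref_high = alpha * high + (1 - alpha) * ref_high
    return anomalies


# Hypothetical hourly intervals for one link; bin 3 is a delay spike.
bins = [(4.9, 5.3), (5.0, 5.4), (4.8, 5.2), (12.0, 14.0), (5.0, 5.3)]
print(detect_delay_changes(bins))  # [3]
```

Skipping the update on anomalous bins keeps a long event from dragging the reference toward the abnormal level; a small `alpha` applied to every bin is another reasonable option.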

  23. Results Analyzed dataset • Atlas builtin / anchoring measurements • From May to Dec. 2015 • Observed 262k IPv4 and 42k IPv6 links We found a lot of delay changes! Let’s see only two prominent examples 16 / 25

  24. Case study: DDoS on DNS root servers Two attacks: • Nov. 30th 2015 • Dec. 1st 2015 Almost all servers are anycast • Congestion at the 531 sites? • Found 129 instances altered by the attacks 17 / 25

  25. Observed delay changes [Figures: differential RTT (ms) over Nov 26 - Dec 1, 2015, for three links: 193.0.14.129 (K-root) - 74.208.6.124 (1&1, Kansas City); 72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE)); 188.93.16.77 (Selectel, St. Petersburg) - 95.213.189.0 (Selectel, Moscow); the first two show median diff. RTT, normal reference, and detected anomalies] • Certain servers are affected only by one attack • Continuous attack in Russia 18 / 25

  26. Unaffected root servers 193.0.14.129 (K-root) - 212.191.229.90 (Poznan, PL) [Figure: differential RTT (ms) over Nov 26 - Dec 1, 2015, showing median diff. RTT and normal reference] Very stable delay during the attacks • Thanks to anycast! • Far from the attackers 19 / 25

  27. Congested links for servers F, I, and K → Concentration of malicious traffic at IXPs 20 / 25

  28. Case study: Telekom Malaysia BGP leak 21 / 25

  29. Case study: Telekom Malaysia BGP leak 22 / 25

  32. Case study: Telekom Malaysia BGP leak Not only with Google... but about 170k prefixes! 22 / 25

  33. Congestion in Level3 Rerouted traffic has congested Level3 (120 reported links) • Example: 229 ms increase between two routers in London! 67.16.133.130 - 67.17.106.150 [Figure: differential RTT (ms) over Jun 8-13, 2015, showing median diff. RTT, normal reference, and detected anomalies] 23 / 25

  34. Congestion in Level3 Reported links in London [Figure legend: delay increase; delay & packet loss] → Traffic staying within UK/Europe may also be altered 24 / 25

  35. Summary Detect and locate delay and forwarding anomalies in billions of traceroutes • Non-parametric and robust statistics • Diverse root causes: remote attacks, routing anomalies, etc. • Gives many new insights into reported events Online detection for network operators • http://ihr.iijlab.net/ 25 / 25
