SLIDE 1

Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements

Romain Fontugne (1), Emile Aben (2), Cristel Pelsser (3), Randy Bush (1)

November 1, 2017

(1) IIJ Research Lab, (2) RIPE NCC, (3) University of Strasbourg / CNRS

1 / 25

SLIDE 2

Understanding Internet health?

2 / 25

SLIDE 6

Understanding Internet health? (Problems)

Manual observations

  • Traceroute / Ping / Operators’ group mailing lists
  • Slow process
  • Limited visibility

→ Our goal: Systematically pinpoint network disruptions

  • Delay changes
  • Forwarding anomalies (not covered here, see the paper)

3 / 25

SLIDE 7

Silly solution: frequent traceroutes to the whole Internet!

→ Doesn’t scale
→ Overloads the network

4 / 25

SLIDE 8

Better solution: mine results from deployed platforms

→ Cooperative and distributed approach
→ Using existing data, no added burden to the network

5 / 25

SLIDE 9

RIPE Atlas

Actively measures Internet connectivity

  • Multiple types of measurement: ping, traceroute, DNS, SSL, NTP and HTTP
  • 10 000 active probes!
  • Data for numerous measurements is made publicly available

6 / 25

SLIDE 10

RIPE Atlas: traceroutes

Two recurring large-scale measurements

  • Builtin: traceroute every 30 minutes to all DNS root servers (≈ 500 server instances)
  • Anchoring: traceroute every 15 minutes to 189 collaborative servers

Analyzed dataset

  • May to December 2015
  • 2.8 billion IPv4 traceroutes
  • 1.2 billion IPv6 traceroutes

7 / 25

SLIDE 11

Monitor delays with traceroute?

Challenges:

  • Noisy data
  • Traffic asymmetry
  • Packet loss

[Figure: Traceroutes from CZ to BD. x-axis: Number of hops; y-axis: RTT (ms)]

8 / 25

SLIDE 12

Monitor delays with traceroute?

Traceroute to “www.target.com”

Round Trip Time (RTT) between B and C?

Report abnormal RTT between B and C?

9 / 25

SLIDE 13

What is the RTT between B and C?

Differential RTT: $\Delta_{CB} = RTT_C - RTT_B \overset{?}{=} RTT_{CB}$
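As an illustration of the definition only, the following sketch computes a differential RTT for each pair of adjacent hops in one traceroute. The input layout (a list of `(ip, [rtts])` pairs) and the use of each hop's median reply are simplifying assumptions made here, not the format or method of RIPE Atlas or of the authors. Whether the resulting value equals the RTT of the link is a separate question.

```python
from statistics import median


def differential_rtts(hops):
    """Return {(near_ip, far_ip): delta_ms} for each pair of adjacent hops.

    hops is a list of (ip, [rtt_ms, ...]) in path order (hypothetical layout).
    Pairs where either hop has no reply (packet loss) are skipped.
    """
    deltas = {}
    for (near_ip, near_rtts), (far_ip, far_rtts) in zip(hops, hops[1:]):
        if not near_rtts or not far_rtts:
            continue
        # Differential RTT: RTT of the far hop minus RTT of the near hop.
        deltas[(near_ip, far_ip)] = median(far_rtts) - median(near_rtts)
    return deltas


# Made-up traceroute passing through B, then C, then an unresponsive hop.
trace = [
    ("B", [10.0, 10.4, 10.2]),
    ("C", [15.3, 15.2, 15.9]),
    ("*", []),
]
deltas = differential_rtts(trace)
print(deltas)
```

Only the pair (B, C) yields a value here, because the last hop returned no reply.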

10 / 25

SLIDE 15

What is the RTT between B and C?

$RTT_C - RTT_B = RTT_{CB}$?

  • No!
  • Traffic is asymmetric
  • $RTT_B$ and $RTT_C$ take different return paths!
  • Differential RTT: $\Delta_{CB} = RTT_C - RTT_B = d_{BC} + e_p$
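The decomposition can be spelled out under an assumed topology: probe A sends packets through B to C, B's replies return directly to A, and C's replies return through a router D. This topology is inferred from the link names on the "Problem with differential RTT" slide (BC, CD, DA, BA) and is an assumption, as is the notation $d_{XY}$ for the one-way delay from X to Y.

```latex
\begin{align*}
RTT_B &= d_{AB} + d_{BA} \\
RTT_C &= d_{AB} + d_{BC} + d_{CD} + d_{DA} \\
\Delta_{CB} &= RTT_C - RTT_B
             = d_{BC} + \underbrace{d_{CD} + d_{DA} - d_{BA}}_{e_p}
\end{align*}
```

Under this reading, $e_p$ collects the difference between the two return paths, so it depends on the probe, while $d_{BC}$ is common to every probe whose traceroute crosses the link.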

11 / 25

SLIDE 16

Problem with differential RTT

Monitoring $\Delta_{CB}$ over time:

[Figure: $\Delta$RTT over time]

→ Delay change on BC? CD? DA? BA???

12 / 25

SLIDE 20

Proposed Approach: Use probes with different return paths

Differential RTT: $\Delta_{CB} = \{x_0, x_1, x_2, x_3, x_4\}$

Median $\Delta_{CB}$:

  • Stable if the delay changes on only a few return paths
  • Fluctuates if the delay on BC changes
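A small numeric sketch of these two cases, using made-up differential RTT values (one per probe) rather than measured data:

```python
from statistics import median

# Hypothetical differential RTTs (ms) for link BC, one value per probe.
baseline = [5.0, 5.2, 4.9, 5.1, 5.3]

# Case 1: the return path of a single probe becomes 40 ms slower.
one_return_path_changed = [5.0, 5.2, 4.9, 5.1, 45.3]

# Case 2: link BC itself becomes 40 ms slower, so every probe sees it.
link_changed = [x + 40.0 for x in baseline]

m_base = median(baseline)
m_return = median(one_return_path_changed)
m_link = median(link_changed)

print(m_base, m_return, m_link)
```

The median is unchanged in the first case and shifts by the full 40 ms in the second, which is the property the approach relies on.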

13 / 25

SLIDE 21

Median Diff. RTT: Example

Tier1 link, 2 weeks of data, 95 probes:

[Figure, top panel: raw differential RTT values (ms) for 130.117.0.250 (Cogent, Zurich) - 154.54.38.50 (Cogent, Munich)]

[Figure, bottom panel: median differential RTT (ms) and normal reference, Jun. 2 to Jun. 14, 2015]

  • Stable despite noisy RTTs
  • Normally distributed
  • Confidence interval: Wilson score
  • Normal reference: exponential smoothing
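One way these two ingredients could fit together is sketched below. It assumes the Wilson score interval is evaluated for a proportion of 0.5 and mapped to ranks in the sorted samples, and that the reference is updated by simple exponential smoothing. The function names, the rank rounding, and the `alpha` value are illustrative assumptions, not the authors' implementation.

```python
import math


def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p observed over n samples."""
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def median_with_interval(samples, z=1.96):
    """Return (lower, median, upper) read from the sorted samples.

    Needs at least one sample. The Wilson bounds for p = 0.5 are turned
    into ranks, so the interval is non-parametric.
    """
    xs = sorted(samples)
    n = len(xs)
    lo_frac, hi_frac = wilson_interval(0.5, n, z)
    lower = xs[max(0, math.floor(n * lo_frac))]
    upper = xs[min(n - 1, math.ceil(n * hi_frac))]
    mid = xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
    return lower, mid, upper


def smooth(reference, value, alpha=0.1):
    """Exponentially smoothed reference; alpha is a hypothetical weight."""
    if reference is None:
        return value
    return alpha * value + (1 - alpha) * reference


# 95 made-up samples, matching the number of probes in the example above.
samples = [5.0 + 0.01 * i for i in range(95)]
lower, mid, upper = median_with_interval(samples)
reference = smooth(None, mid)
print(lower, mid, upper, reference)
```

A small `alpha` makes the reference follow slow drifts while largely ignoring short-lived excursions, which is why it can serve as a baseline for comparison.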

14 / 25

SLIDE 22

Detecting Delay Changes

[Figure: median differential RTT (ms), normal reference and detected anomalies for 72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE)), Nov. 26 to Dec. 1, 2015]

Significant RTT changes: Confidence interval not overlapping with the normal reference
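The overlap test can be sketched as follows, assuming both the observed confidence interval and the normal reference are given as `(lower, upper)` bounds in milliseconds. The function name and the sample numbers are hypothetical.

```python
def is_delay_anomaly(observed, reference):
    """Return True when the observed interval does not overlap the reference.

    observed, reference: (lower, upper) bounds in ms.
    """
    obs_lo, obs_hi = observed
    ref_lo, ref_hi = reference
    # No overlap means the observed interval lies entirely above
    # or entirely below the reference interval.
    return obs_lo > ref_hi or obs_hi < ref_lo


reference = (4.9, 5.3)
print(is_delay_anomaly((5.0, 5.4), reference))    # overlapping
print(is_delay_anomaly((12.1, 14.8), reference))  # entirely above
print(is_delay_anomaly((1.0, 2.0), reference))    # entirely below
```

If the reference is a single value rather than an interval, passing it as `(value, value)` gives the same test.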

15 / 25

SLIDE 23

Results

Analyzed dataset

  • Atlas builtin/anchoring measurements
  • From May to Dec. 2015
  • Observed 262k IPv4 and 42k IPv6 links

We found a lot of delay changes!

Let’s look at just two prominent examples.

16 / 25

SLIDE 24

Case study: DDoS on DNS root servers

Two attacks:

  • Nov. 30th 2015
  • Dec. 1st 2015

Almost all servers are anycast

  • Congestion at the 531 sites?
  • Found 129 instances altered by the attacks

17 / 25

SLIDE 25

Observed delay changes

[Figure: median differential RTT (ms), normal reference and detected anomalies for 193.0.14.129 (K-root) - 74.208.6.124 (1&1, Kansas City), Nov. 26 to Dec. 1, 2015]

[Figure: median differential RTT (ms), normal reference and detected anomalies for 72.52.92.14 (HE, Frankfurt) - 80.81.192.154 (DE-CIX (RIPE)), Nov. 26 to Dec. 1, 2015]

[Figure: differential RTT (ms) for 188.93.16.77 (Selectel, St. Petersburg) - 95.213.189.0 (Selectel, Moscow), Nov. 26 to Dec. 1, 2015]

  • Certain servers are affected only by one attack
  • Continuous attack in Russia

18 / 25

SLIDE 26

Unaffected root servers

[Figure: median differential RTT (ms) and normal reference for 193.0.14.129 (K-root) - 212.191.229.90 (Poznan, PL), Nov. 26 to Dec. 1, 2015]

Very stable delay during the attacks

  • Thanks to anycast!
  • Far from the attackers

19 / 25

SLIDE 27

Congested links for servers F, I, and K

→ Concentration of malicious traffic at IXPs

20 / 25

SLIDE 28

Case study: Telekom Malaysia BGP leak

21 / 25

SLIDE 32

Case study: Telekom Malaysia BGP leak

Not only with Google... but about 170k prefixes!

22 / 25

SLIDE 33

Congestion in Level3

Rerouted traffic has congested Level3 (120 reported links)

  • Example: 229 ms increase between two routers in London!

[Figure: median differential RTT (ms), normal reference and detected anomalies for 67.16.133.130 - 67.17.106.150, Jun. 8 to Jun. 13, 2015]

23 / 25

SLIDE 34

Congestion in Level3

Reported links in London:

[Figure: reported links in London. Legend: delay increase; delay & packet loss]

→ Traffic staying within UK/Europe may also be altered

24 / 25

SLIDE 35

Summary

Detect and locate delay and forwarding anomalies in billions of traceroutes

  • Non-parametric and robust statistics
  • Diverse root causes: remote attacks, routing anomalies, etc.
  • Gives many new insights into reported events

Online detection for network operators

  • http://ihr.iijlab.net/

25 / 25