SLIDE 1

Analysis of link failures in an IP backbone network

Gianluca Iannaccone, Sprint ATL

Joint work with: Chen-Nee Chuah, UC Davis; Richard Mortier, Microsoft; Supratik Bhattacharyya, Sprint ATL; Christophe Diot, Sprint ATL

SLIDE 2

November 7th, 2002, Internet Measurement Workshop

Motivation

  • Today’s Service Level Agreements:

– Performance in terms of delay and packet loss
– Availability in terms of “port availability”

  • Need to introduce a “service availability” metric:

– Would allow VoIP/VPN services to be compared with standard telephone networks

Question: “How often does a router have no forwarding information for any given destination prefix?”

SLIDE 3

Methodology

  • Frequency and duration of link failures

– Recorded IS-IS routing updates
– Python Rout(e)ing Toolkit used to listen for failures
– 4 months of data (Dec 2001 – Mar 2002)
– U.S. inter-PoP links
– Failures less than 24 hours long
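The methodology above can be sketched as pairing each link-down event with the next link-up event on the same link. This is only an illustration: the event format (timestamp in seconds, link name, "down"/"up") and the names `Failure` and `extract_failures` are assumptions, not the IS-IS message format or the output of the Python Rout(e)ing Toolkit.

```python
from typing import Iterable, List, NamedTuple, Tuple

MAX_DURATION_S = 24 * 3600  # failures of 24 hours or longer are discarded


class Failure(NamedTuple):
    link: str
    start: float     # time the link went down (seconds)
    duration: float  # seconds until the link came back up


def extract_failures(events: Iterable[Tuple[float, str, str]]) -> List[Failure]:
    """Pair each 'down' event with the next 'up' event on the same link."""
    down_since = {}  # link -> timestamp of the pending 'down' event
    failures = []
    for ts, link, state in sorted(events):
        if state == "down":
            # Keep the earliest 'down' if several arrive before an 'up'.
            down_since.setdefault(link, ts)
        elif state == "up" and link in down_since:
            start = down_since.pop(link)
            duration = ts - start
            if duration < MAX_DURATION_S:
                failures.append(Failure(link, start, duration))
    return failures


# Made-up events, for illustration only.
events = [
    (0.0, "A-B", "down"),
    (30.0, "A-B", "up"),
    (100.0, "C-D", "down"),
    (100.0 + 25 * 3600, "C-D", "up"),  # longer than 24 h, dropped
    (200.0, "A-B", "down"),
    (800.0, "A-B", "up"),
]
failures = extract_failures(events)
```

With these made-up events, two failures on link A-B survive (30 s and 600 s) and the 25-hour outage on C-D is filtered out.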

SLIDE 4

Network-wide Time Between Failures

Average: ~34 min
50th percentile: ~3 min
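An average far above the 50% point indicates a skewed distribution: most failures follow each other closely, and a few long quiet periods pull the mean up. A small sketch of how such statistics could be computed from failure start times; the sample values and the helper name `time_between_failures` are made up for illustration and are not the measured data.

```python
from statistics import mean, median


def time_between_failures(start_times):
    """Gaps between consecutive failure start times, network-wide."""
    ordered = sorted(start_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical failure start times in seconds (any link in the network).
starts = [0, 60, 120, 300, 4200]
gaps = time_between_failures(starts)

average_gap = mean(gaps)
median_gap = median(gaps)
```

Here one long gap is enough to make the average several times larger than the median, the same qualitative pattern as on the slide.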

SLIDE 5

Breakdown by time of the day (EDT)

Higher incidence of failures at night. Likely due to maintenance.

SLIDE 6

Causes of failures

  • Duration may give a hint
  • Some speculations:

– Long (>1hour): fiber cuts, severe failures
– Medium (>10min): router/line card failures
– Short (>1min): line card resets
– Very Short (<1min): optical equipment
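The duration bands above can be written as a simple classifier. This is a sketch: the function and dictionary names are hypothetical, and placing a failure of exactly one minute in the "very short" band is an assumption, since the slide leaves that boundary open. The causes remain the speculations listed on the slide.

```python
# Speculated causes per duration band, as listed on the slide.
SPECULATED_CAUSE = {
    "long": "fiber cuts, severe failures",
    "medium": "router/line card failures",
    "short": "line card resets",
    "very short": "optical equipment",
}


def classify_duration(seconds: float) -> str:
    """Map a failure duration in seconds to one of the slide's bands."""
    if seconds > 3600:
        return "long"
    if seconds > 600:
        return "medium"
    if seconds > 60:
        return "short"
    return "very short"  # assumption: exactly 60 s also lands here


band = classify_duration(45)
cause = SPECULATED_CAUSE[band]
```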

SLIDE 7

Does the duration give any hint?

~50% < 1 min
~80% < 10 min
~94% < 1 hr
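Because these percentages are cumulative, the share of failures in each duration band follows by subtraction. A short worked example; the band labels reuse the categories from the previous slide, the helper name `band_shares` is hypothetical, and the inputs are the slide's approximate ("~") values, so the results are approximate as well.

```python
# Cumulative percentages read from the slide (approximate values).
cumulative_pct = [("< 1 min", 50), ("< 10 min", 80), ("< 1 hr", 94)]

labels = [
    "very short (< 1 min)",
    "short (1 to 10 min)",
    "medium (10 min to 1 hr)",
    "long (> 1 hr)",
]


def band_shares(cumulative):
    """Turn cumulative percentages into per-band percentages."""
    shares = []
    previous = 0
    for _, pct in cumulative:
        shares.append(pct - previous)
        previous = pct
    shares.append(100 - previous)  # whatever lies above the last threshold
    return shares


shares = dict(zip(labels, band_shares(cumulative_pct)))
```

Under these figures roughly half of the failures are very short, about 30% are short, about 14% are medium, and about 6% last longer than an hour.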

SLIDE 8

Controlled failure experiment

SLIDE 9

Impact of a failure: 7 steps to re-route traffic

1. Detect link down: <100 ms
2. Wait to filter out transient flaps: 2 s
3. Wait before sending update out: 50 ms
4. Processing & flooding the update: ~10 ms/hop
5. Wait before computing SPF: 5.5 s
6. Compute shortest paths: 100-400 ms

  • exp. protocol convergence: 5.1s / 5.9s

7. Update the routing tables: ~20 pfx/ms

  • exp. service convergence: 1.5s / 2.1s
  • exp. total disruption: 6.6s / 8.0s
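The figures on this slide are internally consistent: in both columns, total disruption equals protocol convergence plus service convergence. The check below copies the two values per line as-is, since the slide does not say what distinguishes them. The helper `table_update_ms` and the prefix count of 10,000 are hypothetical, used only to show how the ~20 pfx/ms rate translates into time.

```python
# Values copied from the slide, in seconds (first / second column).
protocol_convergence = (5.1, 5.9)
service_convergence = (1.5, 2.1)

# Total disruption per column, rounded to the slide's precision.
total_disruption = tuple(
    round(p + s, 1) for p, s in zip(protocol_convergence, service_convergence)
)


def table_update_ms(prefixes: int, rate_pfx_per_ms: float = 20.0) -> float:
    """Time to update the routing table at the slide's ~20 prefixes/ms."""
    return prefixes / rate_pfx_per_ms


# Hypothetical table size, for illustration only.
update_time_ms = table_update_ms(10_000)
```

At ~20 pfx/ms, a hypothetical 10,000-prefix update would take about half a second, which shows why the table update step is not negligible next to the timers.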

SLIDE 10

Conclusion

  • Link failures are part of everyday operations
  • Majority of failures are short-lived
  • Disruption in packet forwarding depends on

– routing protocol dynamics and implementation
– router architecture
– too many timers and interactions among different components

  • Need to develop a link failure model:

– define IP service availability
– need more data points (4 months are not enough)