SLIDE 1

Understanding risk for network resilience

Paul Smith, Marcus Schoeller (NEC), and David Hutchison
p.smith@comp.lancs.ac.uk
Multi-service Networks, July 2009

SLIDE 2

What is network resilience?

  • The ability of a networked system to provide an acceptable level of service in the face of challenges

  • Challenges
      • Component faults
      • Hardware destruction
      • Human mistakes
      • Malicious attacks
      • ...
SLIDE 3

Nothing comes for free

  • Providing network resilience has a cost
  • We need systems to do this
  • We are resource constrained
      • $£€, CPU cycles, bandwidth
SLIDE 4

Need to prioritise and focus efforts

[Figure: all challenges, narrowed down to the most probable high-impact challenges]

SLIDE 5

Identifying critical challenges

  1. Identify critical assets
  2. Determine cost of asset compromise
  3. Develop system understanding
  4. Identify challenges to the system
  5. Identify system faults
  6. Identify probability of failure
  7. Determine measure of exposure

exposure = cost × probability
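
As a minimal sketch, the exposure formula can be applied to each challenge and the results used to rank them. The `Challenge` class, the `rank_by_exposure` helper, and every cost and probability below are invented for illustration; none of them come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    name: str
    cost: float         # hypothetical cost of the resulting asset compromise
    probability: float  # hypothetical probability of failure, in [0, 1]

    @property
    def exposure(self) -> float:
        # exposure = cost x probability
        return self.cost * self.probability


def rank_by_exposure(challenges):
    """Return the challenges sorted from highest to lowest exposure."""
    return sorted(challenges, key=lambda c: c.exposure, reverse=True)


# Illustrative figures only (assumptions, not measurements).
challenges = [
    Challenge("component fault", cost=10_000, probability=0.20),
    Challenge("hardware destruction", cost=500_000, probability=0.001),
    Challenge("malicious attack", cost=100_000, probability=0.05),
]

for c in rank_by_exposure(challenges):
    print(f"{c.name}: {c.exposure:.0f}")
```

With these made-up inputs, the most costly challenge (hardware destruction) ranks last because its assumed probability is so low, which is the kind of trade-off the exposure measure is meant to surface.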

SLIDE 6

What’s difficult about this?

  • Determining reliable measures for challenge occurrence probabilities (and for the probability that a challenge leads to failure)

  • Quantifying the impact of a challenge
SLIDE 7

Challenge: Getting reliable numbers

  • Off-line analysis
      • Advisories are useful (e.g., www.cert.org)
      • Fault and attack tree analysis
          • Issues of scalability because of complexity
      • Simulation
          • Need to develop good challenge and fault models
  • Record monitoring data from on-line system
      • Classify challenges using machine learning
      • Introduces resource and security concerns
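
To illustrate the fault tree analysis mentioned above, here is a minimal sketch that combines leaf probabilities through AND and OR gates. It assumes the basic events are statistically independent, which real systems often violate. The tree, the event names, and all numbers are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """All inputs must occur: product of the probabilities (independence assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """At least one input occurs: 1 - product of (1 - p) (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the service fails if both redundant links fail,
# or if the server is compromised. All numbers are illustrative.
p_link_a_fails = 0.1
p_link_b_fails = 0.1
p_server_compromised = 0.02

p_both_links_fail = and_gate([p_link_a_fails, p_link_b_fails])
p_service_failure = or_gate([p_both_links_fail, p_server_compromised])

print(f"P(service failure) = {p_service_failure:.4f}")
```

Even this tiny tree needs a probability for every leaf event, which hints at why the approach becomes hard to scale as system complexity grows.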
SLIDE 8

Need for on-line impact measures for automated mitigation

Detect → Identify → Determine impact → Mitigate

  • Detect: detection of symptoms, e.g., service failures, network congestion, poor performance, anomalous traffic
  • Identify: understand the challenge root cause
  • Determine impact: e.g., monetary cost, resource utilisation, effect on other services, loss of data
  • Mitigate
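
The detect, identify, determine impact, and mitigate stages can be sketched as a chain of functions. This is a toy illustration only: the metric names, thresholds, rules, and cost figures are all assumptions made up for the example and do not describe the authors' system.

```python
def detect_symptoms(metrics):
    """Detect: return the symptoms visible in a dict of monitoring metrics."""
    symptoms = []
    if metrics.get("link_utilisation", 0.0) > 0.9:    # hypothetical threshold
        symptoms.append("network congestion")
    if metrics.get("response_time_ms", 0.0) > 500:    # hypothetical threshold
        symptoms.append("poor performance")
    return symptoms


def identify_challenge(symptoms):
    """Identify: map symptoms to a suspected root cause (toy rule)."""
    if not symptoms:
        return None
    if "network congestion" in symptoms:
        return "traffic overload"
    return "unknown challenge"


def determine_impact(affected_services, cost_per_service):
    """Determine impact: a toy monetary measure based on affected services."""
    return len(affected_services) * cost_per_service


def choose_mitigation(challenge, impact, mitigation_cost):
    """Mitigate: act only when the impact outweighs the cost of acting."""
    if challenge is None:
        return "no action"
    if impact > mitigation_cost:
        return "rate-limit traffic"
    return "log and keep monitoring"


# Illustrative run with invented values.
metrics = {"link_utilisation": 0.97, "response_time_ms": 820}
symptoms = detect_symptoms(metrics)
challenge = identify_challenge(symptoms)
impact = determine_impact(["web", "voip"], cost_per_service=100.0)
action = choose_mitigation(challenge, impact, mitigation_cost=50.0)
print(symptoms, challenge, impact, action)
```

The point of the sketch is the dependency between stages: the mitigation decision needs an impact figure at run time, so the impact measure has to be available on-line rather than only from off-line analysis.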

SLIDE 9

Conclusions

  • To make best use of limited resources, we need to determine the high-impact challenges
  • Getting good numbers for challenge probabilities and their impact is hard
  • Some on-line components are necessary, which has implications for system design
  • For more information, see www.resumenet.eu