Handling Failures in Cyber-Physical Systems: Potential Directions



SLIDE 1

Handling Failures in Cyber-Physical Systems: Potential Directions

Taylor Johnson and Sayan Mitra

Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Real-Time Systems Symposium (RTSS) 2009

December 1, 2009

SLIDE 6

Motivational example from distributed computing

Consensus (synchronous)
Every process has an input, and all non-faulty processes must decide on a common value in finite time, in spite of failures.

failures                processes (at least)    rounds
f crash failures        f + 1                   f + 1
t Byzantine failures    3t + 1                  t + 1

Natural question: how many processes are required to tolerate both f crash failures and t Byzantine failures?

CPS can suffer the previous failures and many more!

Interdisciplinary research problem
Develop failure detection and mitigation methods for cyber-physical systems
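To make the crash-failure row concrete, here is a minimal sketch of a FloodSet-style synchronous consensus round loop. FloodSet is a textbook algorithm chosen here for illustration; the slides do not name a specific algorithm. The function name `floodset`, the `crash_plan` format, and the "decide on the minimum" rule are assumptions of this sketch.

```python
def floodset(inputs, f, crash_plan=None):
    """Simulate f + 1 synchronous rounds of a FloodSet-style protocol.

    inputs:     list of input values, one per process (index = process id).
    f:          number of crash failures to tolerate.
    crash_plan: hypothetical fault schedule, mapping process id to
                (round, recipients): the process crashes in that round and
                only the listed recipients still receive its last message.
    Returns the decisions of the processes that did not crash.
    """
    crash_plan = crash_plan or {}
    n = len(inputs)
    known = [{v} for v in inputs]  # values each process has seen so far
    crashed = set()

    for rnd in range(1, f + 2):  # rounds 1 .. f + 1
        inbox = [set() for _ in range(n)]
        for p in range(n):
            if p in crashed:
                continue
            if p in crash_plan and crash_plan[p][0] == rnd:
                recipients = crash_plan[p][1]  # partial broadcast, then crash
                crashed.add(p)
            else:
                recipients = range(n)
            for q in recipients:
                inbox[q] |= known[p]
        for q in range(n):
            if q not in crashed:
                known[q] |= inbox[q]

    # Assumed decision rule for this sketch: pick the smallest value seen.
    return {p: min(known[p]) for p in range(n) if p not in crashed}


# Process 1 crashes in round 1 and reaches only process 0.
plan = {1: (1, {0})}
two_rounds = floodset([3, 1, 2, 5], f=1, crash_plan=plan)
one_round = floodset([3, 1, 2, 5], f=0, crash_plan=plan)
print(two_rounds)  # all surviving processes agree
print(one_round)   # a single round is not enough: decisions differ
```

With one crash and f + 1 = 2 rounds, at least one round is crash-free, so every surviving process ends up with the same set of values. Running only one round leaves process 0 with a value the others never saw.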

SLIDE 7

Outline

1. Introduction
2. Research problem
3. Potential Directions

SLIDE 8

Cyber-physical fault interaction

[Figure: physical state with Safe and Unsafe regions; a cyber fault moves the physical state from Safe to Unsafe.]

SLIDE 9

Cyber-physical fault interaction

[Figure: cyber state with Safe and Unsafe regions; a physical fault moves the cyber state from Safe to Unsafe.]

SLIDE 10

Cyber-physical fault interaction

[Figure: physical state and cyber state shown together, each with Safe and Unsafe regions; cyber faults and physical faults can move either state from Safe to Unsafe.]

SLIDE 11

Classes of failures

Cyber (software) failures
  • Distributed computing: crash; Byzantine
  • General: bugs
  • Real-time systems: timing (missing deadlines)
Physical failures
  • Sensor; actuator and control surface
  • Robustness
Failures between cyber and physical
  • Communications
Occurrence
  • Single, permanent, transient, intermittent, or incessant

SLIDE 13

Prior work

Example solutions
  • Simplex architecture
  • Giotto
  • Etherware
Common theme: solutions through abstraction!

SLIDE 14

Handling failures: active versus passive

Active (non-masking)
  • Failure detectors: reliable failure detectors from unreliable processes ⇒ reliable systems from unreliable components (e.g., COTS, processes, stochastic processors, robustness, etc.)?
  • Fault detection and isolation (FDI)
Passive (masking)
  • Redundancy from the consensus example
  • Self-stabilizing algorithms ⇒ self-stabilizing systems?
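As one illustration of the failure-detector idea, the sketch below shows a timeout-based heartbeat detector that can wrongly suspect a slow process and then corrects itself by backing off its timeout. The class name `HeartbeatDetector`, the doubling rule, and the use of explicit timestamps instead of a real clock are all assumptions made for this example, not something the slides prescribe.

```python
class HeartbeatDetector:
    """Timeout-based (unreliable) failure detector driven by explicit timestamps."""

    def __init__(self, processes, timeout):
        self.timeout = {p: timeout for p in processes}
        self.last_seen = {p: 0.0 for p in processes}
        self.suspected = set()

    def heartbeat(self, p, now):
        """Record a heartbeat from process p received at time `now`."""
        self.last_seen[p] = now
        if p in self.suspected:
            # The suspicion was wrong: trust p again and be more patient.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self, now):
        """Suspect every process whose last heartbeat is older than its timeout."""
        for p, seen in self.last_seen.items():
            if now - seen > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


detector = HeartbeatDetector(["a", "b"], timeout=1.0)
detector.heartbeat("a", 0.5)
detector.heartbeat("b", 0.5)
detector.heartbeat("a", 1.5)
print(detector.check(2.0))    # "b" has been silent for 1.5 time units
detector.heartbeat("b", 2.1)  # late heartbeat: "b" was only slow
print(detector.check(2.5))    # nobody is suspected any more
```

The detector is unreliable in the sense that it may suspect a live process. The back-off step is one common way to make such mistakes rarer over time.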

SLIDE 15

Self-stabilizing algorithms

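[Figure: state space split into Legal and Not Legal states. Closure keeps executions inside the Legal states, a fault can move the system to a Not Legal state, and convergence brings it back to a Legal state.]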
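Closure and convergence can be seen in a small simulation. The sketch below uses Dijkstra's K-state token ring, a classic self-stabilizing algorithm picked here as an example; the slides do not say which algorithm they have in mind. A state is legal when exactly one process holds the privilege (the token). The helper names, the random scheduler, and the starting state are assumptions of this sketch.

```python
import random


def privileged(x):
    """Return the ids of processes that currently hold a privilege."""
    return [i for i in range(len(x))
            if (i == 0 and x[0] == x[-1]) or (i > 0 and x[i] != x[i - 1])]


def step(x, i, k):
    """Let privileged process i make its move (k = number of counter values)."""
    if i == 0:
        x[0] = (x[0] + 1) % k  # the distinguished process increments
    else:
        x[i] = x[i - 1]        # every other process copies its predecessor


def is_legal(x):
    """Legal states have exactly one privilege in the ring."""
    return len(privileged(x)) == 1


def run(x, k, steps, rng):
    """Run `steps` moves under a random scheduler; record legality after each."""
    trace = []
    for _ in range(steps):
        step(x, rng.choice(privileged(x)), k)
        trace.append(is_legal(x))
    return trace


# Arbitrary (faulty) start: three processes are privileged at once.
state = [3, 0, 4, 1, 1]
trace = run(state, k=6, steps=200, rng=random.Random(0))
first_legal = trace.index(True)
print("converged after", first_legal + 1, "moves")
print("closure held afterwards:", all(trace[first_legal:]))
```

Convergence shows up as the trace eventually turning legal from an arbitrary start. Closure shows up as the trace staying legal from that point on, as long as no new fault perturbs the state.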

SLIDE 16

Self-stabilizing systems?

[Figure: the closure and convergence picture applied to systems. Labels: Safe, Good performance, Poor performance, fault, closure, convergence.]

SLIDE 17

Formal methods and verification

Motivation
  • Why formal methods?
  • Provable guarantees
  • Successfully applied in a variety of problems
  • Maturing tools and formalisms
Useful concepts
  • Abstraction
  • Compositional reasoning
  • Temporal logic and verification
  • Actor model
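As a rough illustration of what verification of a safety property looks like, the sketch below checks an invariant by exhaustive breadth-first search over a tiny transition system. The tank model, the "valve stuck" fault, and all names are hypothetical and invented for this example; real model checkers are far more capable than this.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Explore all reachable states; return a counterexample path or None."""
    parent = {s: None for s in initial}
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:  # walk back to an initial state
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def tank(allow_fault):
    """Hypothetical tank: state = (level, valve_stuck), level in 0..5."""
    def successors(state):
        level, stuck = state
        if stuck:
            return [(min(level + 1, 5), True)]  # stuck valve keeps filling
        moves = [(level + 1, False) if level < 3 else (level - 1, False)]
        if allow_fault:
            moves.append((level, True))         # the fault may strike now
        return moves
    return successors


def safe(state):
    return state[0] <= 4  # the tank must never reach level 5


print(check_invariant([(0, False)], tank(allow_fault=False), safe))
print(check_invariant([(0, False)], tank(allow_fault=True), safe))
```

Without the fault the search exhausts the reachable states and reports no violation. With the fault enabled it returns a path from the initial state to an unsafe level, which is the kind of counterexample a model checker produces.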

SLIDE 18

Challenges and questions

Model cyber and physical faults in such a way that they can be decoupled from one another, if possible

Must make any solutions compositional to avoid explosion

  • f interaction cases

The complexity of analyzing all these fault sources simultaneously must be reduced: working out how each fault influences every other fault is intractable

Impossibility results

Formal methods challenges ([Clarke, Emerson, and Sifakis, “Model checking: algorithmic verification and debugging”, Nov. 2009]): model checking for (a) software, (b) real-time systems, (c) hybrid systems, (d) probabilistic systems, and compositional model checking

Lots of work to be done, but many interesting directions!

SLIDE 19

Thank you and questions

Questions
Hopefully there are lots of questions to motivate the discussion!