SLIDE 1
Handling Failures in Cyber-Physical Systems: Potential Directions
Taylor Johnson and Sayan Mitra
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Real-Time Systems Symposium (RTSS) 2009
December 1, 2009
SLIDE 2
SLIDE 3
Motivational example from distributed computing
Consensus (synchronous)
Every process has an input, and all non-faulty processes must decide on a common value in finite time in spite of failures.

Failures                Processes (at least)    Rounds
f crash failures        f + 1                   f + 1
t Byzantine failures    3t + 1                  t + 1
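The f + 1 round bound for crash failures can be illustrated with a small simulation. The sketch below follows the textbook FloodSet idea (each process floods every value it has seen, then decides after f + 1 rounds); it is not taken from the slides, and the function name `floodset`, the `crash_plan` encoding, and the default value are assumptions made for illustration.

```python
def floodset(inputs, f, crash_plan, default=0):
    """Simulate a FloodSet-style synchronous consensus under crash failures.

    inputs: list of initial values, one per process.
    f: number of crash failures to tolerate; the protocol runs f + 1 rounds.
    crash_plan: hypothetical encoding, dict pid -> (round, recipients); the
        process crashes in that round after sending only to `recipients`.
    Returns a dict of decisions for the processes that never crashed.
    """
    n = len(inputs)
    seen = [{v} for v in inputs]            # values each process has seen so far
    crashed = set()
    for rnd in range(1, f + 2):             # rounds 1 .. f + 1
        inbox = [set() for _ in range(n)]
        for p in range(n):
            if p in crashed:
                continue
            if p in crash_plan and crash_plan[p][0] == rnd:
                targets = crash_plan[p][1]  # partial send, then crash
                crashed.add(p)
            else:
                targets = range(n)
            for q in targets:
                inbox[q] |= seen[p]
        for q in range(n):
            if q not in crashed:
                seen[q] |= inbox[q]
    # Decide the single value seen, or the default if several were seen.
    return {p: (next(iter(seen[p])) if len(seen[p]) == 1 else default)
            for p in range(n) if p not in crashed}


# Process 0 holds the only 0 and crashes in round 1 after reaching process 1.
inputs = [0, 1, 1, 1]
plan = {0: (1, [1])}
two_rounds = floodset(inputs, f=1, crash_plan=plan)  # f + 1 = 2 rounds
one_round = floodset(inputs, f=0, crash_plan=plan)   # too few rounds for one crash
```

With two rounds the survivors agree; with a single round, process 1 has seen both values while processes 2 and 3 have seen only one, so the decisions diverge.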
SLIDE 4
Motivational example from distributed computing
Natural question: how many processes are required to tolerate both f crash failures and t Byzantine failures?
SLIDE 5
Motivational example from distributed computing
CPS can suffer the previous failures and many more!
SLIDE 6
Motivational example from distributed computing
Interdisciplinary research problem
Develop failure detection and mitigation methods for cyber-physical systems
SLIDE 7
Outline
1. Introduction
2. Research problem
3. Potential Directions
SLIDE 8
Cyber-physical fault interaction
[Diagram: a cyber fault moves the physical state from Safe to Unsafe]
SLIDE 9
Cyber-physical fault interaction
[Diagram: a physical fault moves the cyber state from Safe to Unsafe]
SLIDE 10
Cyber-physical fault interaction
[Diagram: physical state and cyber state, each with Safe and Unsafe regions; cyber faults and physical faults can move either state from Safe to Unsafe]
SLIDE 11
Classes of failures
Cyber (software) failures
    Distributed computing: crash; Byzantine
    General: bugs
    Real-time systems: timing (missing deadlines)
Physical failures
    Sensor; actuator and control surface
    Robustness
Failures between cyber and physical
    Communications
Occurrence
    Single, permanent, transient, intermittent, or incessant
SLIDE 12
Prior work
Example solutions
    Simplex architecture
    Giotto
    Etherware
SLIDE 13
Prior work
Common theme: solutions through abstraction!
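As one illustration of the abstraction theme, the Simplex architecture named among the example solutions is usually described as pairing an untrusted high-performance controller with a simple trusted safety controller, plus a decision module that chooses between them. The sketch below is a toy version under stated assumptions: the one-dimensional plant, both controllers, the one-step lookahead check, and every name in it are hypothetical and are not drawn from the slides or from any Simplex implementation.

```python
def simplex_step(x, advanced, safety, plant, is_recoverable):
    """Decision module (sketch): accept the advanced controller's command only
    if the predicted next state stays inside the recoverable region."""
    u = advanced(x)
    if is_recoverable(plant(x, u)):
        return u
    return safety(x)


def simulate(x0, steps):
    # Hypothetical faulty high-performance controller: always pushes upward.
    advanced = lambda x: 0.5
    # Hypothetical conservative safety controller: pulls the state toward 0.
    safety = lambda x: -0.5 * x
    # Toy plant: a discrete-time integrator.
    plant = lambda x, u: x + u
    # Assumed recoverable region: |x| <= 1.
    is_recoverable = lambda x: abs(x) <= 1.0

    xs = [x0]
    for _ in range(steps):
        u = simplex_step(xs[-1], advanced, safety, plant, is_recoverable)
        xs.append(plant(xs[-1], u))
    return xs


trajectory = simulate(0.0, 20)
```

Even though the advanced controller is faulty, the trajectory stays within the assumed region because the decision module falls back to the safety controller whenever the predicted state would leave it. A real design would justify the recoverable region analytically rather than with a one-step prediction.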
SLIDE 14
Handling failures: active versus passive
Active (non-masking)
    Failure detectors
        Reliable failure detectors from unreliable processes ⇒ reliable systems from unreliable components (e.g., COTS, processes, stochastic processors, robustness, etc.)?
    Fault detection and isolation (FDI)
Passive (masking)
    Redundancy from the consensus example
    Self-stabilizing algorithms ⇒ self-stabilizing systems?
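A common way to build an active failure detector is with heartbeats and timeouts. The following is a minimal sketch of that idea, not a method from the slides; the class name `HeartbeatDetector`, its methods, and the timeout value are assumptions. Note that such a detector is unreliable by nature: a slow process can be suspected even though it has not crashed.

```python
class HeartbeatDetector:
    """Timeout-based failure detector (sketch with hypothetical names)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}          # pid -> time of the most recent heartbeat

    def heartbeat(self, pid, now):
        """Record a heartbeat from process `pid` at time `now`."""
        self.last_seen[pid] = now

    def suspects(self, now):
        """Return the processes whose last heartbeat is older than the timeout."""
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("a", 0.0)
detector.heartbeat("b", 0.0)
detector.heartbeat("a", 2.0)
suspected_early = detector.suspects(4.0)   # "b" has been silent for 4.0 > 3.0
detector.heartbeat("b", 4.5)               # "b" was only slow, not crashed
suspected_late = detector.suspects(5.0)
```

The late heartbeat from "b" clears the suspicion, which shows why suspicions from this kind of detector are hints rather than proofs of failure.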
SLIDE 15
Self-stabilizing algorithms
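[Diagram: Legal and Not Legal state sets. Closure: executions that start in Legal stay in Legal. Convergence: executions that start in Not Legal eventually reach Legal. Faults move the state from Legal to Not Legal.]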
SLIDE 16
Self-stabilizing systems?
[Diagram: within the Safe region, Good performance and Poor performance state sets; labels: fault, closure, convergence]
SLIDE 17
Formal methods and verification
Motivation: why formal methods?
    Provable guarantees
    Successfully applied in a variety of problems
    Maturing tools and formalisms
Useful concepts
    Abstraction
    Compositional reasoning
    Temporal logic and verification
    Actor model
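To make the verification idea concrete, the sketch below checks a safety invariant by exhaustively exploring the reachable states of a tiny model, which is the simplest form of explicit-state model checking. The toy tank model, the fault (a controller that never switches modes), and all names are assumptions chosen for illustration; they are not from the slides.

```python
from collections import deque


def check_invariant(init, successors, invariant):
    """Breadth-first search over reachable states.

    Returns None if the invariant holds everywhere, otherwise a shortest
    path from an initial state to a violating state.
    """
    parent = {s: None for s in init}
    queue = deque(init)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None


def tank_successors(state, faulty):
    """Hypothetical tank: state is (mode, level) with level in 0..5."""
    mode, level = state
    if not faulty:                   # a healthy controller switches modes
        if level >= 3:
            mode = "drain"
        elif level <= 1:
            mode = "fill"
    level = min(level + 1, 5) if mode == "fill" else max(level - 1, 0)
    return [(mode, level)]


init = [("fill", 0)]
no_overflow = lambda s: s[1] <= 4    # assumed safety property
healthy = check_invariant(init, lambda s: tank_successors(s, False), no_overflow)
broken = check_invariant(init, lambda s: tank_successors(s, True), no_overflow)
```

The healthy model satisfies the invariant, while the faulty model yields a counterexample path ending in an overflow. Real cyber-physical models have far larger (often continuous) state spaces, which is where abstraction and compositional reasoning come in.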
SLIDE 18
Challenges and questions
Model cyber and physical faults in such a way that they can be decoupled from one another, if possible
Solutions must be compositional to avoid an explosion of interaction cases
The complexity of analyzing all these fault sources simultaneously must be reduced: determining how one fault influences another is intractable
Impossibility results
Formal methods challenges ([Emerson, Clarke, and Sifakis, “Model checking: algorithmic verification and debugging”, Nov. 2009]): model checking for (a) software, (b) real-time systems, (c) hybrid systems, (d) probabilistic systems, and compositional model checking
Lots of work to be done, but many interesting directions!
SLIDE 19