Handling Failures in Cyber-Physical Systems: Potential Directions


SLIDE 1

Handling Failures in Cyber-Physical Systems: Potential Directions

Taylor Johnson and Sayan Mitra

Coordinated Science Laboratory University of Illinois at Urbana-Champaign Real-Time Systems Symposium (RTSS) 2009

December 1, 2009

SLIDE 2

Motivational example from distributed computing

Consensus (synchronous): every process has an input, and all non-faulty ones must decide on a common value in finite time.

SLIDE 3

Motivational example from distributed computing

Consensus (synchronous): every process has an input, and all non-faulty ones must decide on a common value in finite time, in spite of failures:

Failures                 Processes (at least)   Rounds
f crash failures         f + 1                  f + 1
t Byzantine failures     3t + 1                 t + 1
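The crash-failure row can be illustrated with FloodSet, a standard synchronous consensus algorithm that the slides do not name: each process floods the set of values it has seen for f + 1 rounds and then decides deterministically (here, on the minimum). The simulation below is a minimal sketch; the function name, the crash_round schedule, the random choice of which processes receive a crashing process's last message, and the min decision rule are all illustrative assumptions.

```python
import random

def floodset(inputs, f, crash_round, seed=0):
    """Simulate synchronous FloodSet consensus tolerating up to f crash failures.

    inputs: initial value of each process (index = process id).
    crash_round: hypothetical map {process id: round in which it crashes}. In
        its crash round a process reaches only a random subset of processes,
        and it sends nothing in later rounds.
    Returns {process id: decision} for the processes that never crash.
    """
    if len(crash_round) > f:
        raise ValueError("more crashes than the algorithm is designed to tolerate")
    rng = random.Random(seed)
    n = len(inputs)
    seen = [{v} for v in inputs]              # values each process has seen so far
    for rnd in range(1, f + 2):               # f + 1 synchronous rounds
        inbox = [set() for _ in range(n)]
        for i in range(n):
            c = crash_round.get(i)
            if c is not None and rnd > c:
                continue                      # crashed in an earlier round: silent
            if c == rnd:                      # crashes part-way through its broadcast
                receivers = [j for j in range(n) if rng.random() < 0.5]
            else:
                receivers = range(n)
            for j in receivers:
                inbox[j] |= seen[i]
        for j in range(n):                    # all messages delivered at round end
            seen[j] |= inbox[j]
    return {i: min(seen[i]) for i in range(n) if i not in crash_round}

decisions = floodset([3, 1, 4, 1, 5], f=2, crash_round={1: 1, 3: 2})
print(decisions)  # {0: 1, 2: 1, 4: 1}
```

With at most f crashes spread over f + 1 rounds, at least one round is crash-free; after it every surviving process holds the same set, so all survivors decide the same value.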

SLIDE 4

Motivational example from distributed computing

Consensus (synchronous): every process has an input, and all non-faulty ones must decide on a common value in finite time, in spite of failures:

Failures                 Processes (at least)   Rounds
f crash failures         f + 1                  f + 1
t Byzantine failures     3t + 1                 t + 1

Natural question: how many processes are required to tolerate both f crash failures and t Byzantine failures?

SLIDE 5

Motivational example from distributed computing

Consensus (synchronous): every process has an input, and all non-faulty ones must decide on a common value in finite time, in spite of failures:

Failures                 Processes (at least)   Rounds
f crash failures         f + 1                  f + 1
t Byzantine failures     3t + 1                 t + 1

Natural question: how many processes are required to tolerate both f crash failures and t Byzantine failures?

CPS can suffer the previous failures and many more!

SLIDE 6

Motivational example from distributed computing

Consensus (synchronous): every process has an input, and all non-faulty ones must decide on a common value in finite time, in spite of failures:

Failures                 Processes (at least)   Rounds
f crash failures         f + 1                  f + 1
t Byzantine failures     3t + 1                 t + 1

Natural question: how many processes are required to tolerate both f crash failures and t Byzantine failures?

CPS can suffer the previous failures and many more!

Interdisciplinary research problem: develop failure detection and mitigation methods for cyber-physical systems.

SLIDE 7

Outline

1. Introduction
2. Research problem
3. Potential Directions

SLIDE 8

Cyber-physical fault interaction

[Figure: physical state space with Safe and Unsafe regions; a cyber fault drives the physical state from Safe to Unsafe]

SLIDE 9

Cyber-physical fault interaction

[Figure: cyber state space with Safe and Unsafe regions; a physical fault drives the cyber state from Safe to Unsafe]

SLIDE 10

Cyber-physical fault interaction

[Figure: physical state and cyber state side by side, each with Safe and Unsafe regions; cyber faults and physical faults can each drive either state from Safe to Unsafe]

SLIDE 11

Classes of failures

Cyber (software) failures
- Distributed computing: crash, Byzantine
- General: bugs
- Real-time systems: timing (missing deadlines)
Physical failures
- Sensor, actuator, and control surface
- Robustness
Failures between cyber and physical
- Communications
Occurrence
- Single, permanent, transient, intermittent, or incessant

SLIDE 12

Prior work

Example solutions: Simplex architecture, Giotto, Etherware

SLIDE 13

Prior work

Example solutions: Simplex architecture, Giotto, Etherware

Common theme: solutions through abstraction!
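To make the Simplex idea concrete: a simple, trusted safety controller runs alongside an unverified high-performance controller, and a decision module switches to the safety controller whenever the high-performance command would endanger the plant. The sketch below assumes a toy 1-D plant, a one-step lookahead check, and hypothetical names and constants (DT, X_SAFE, complex_controller); Simplex designs typically use a recoverable region such as a Lyapunov level set rather than this one-step simplification.

```python
DT = 0.1       # hypothetical control period (s)
X_SAFE = 1.0   # hypothetical safety envelope: the state must satisfy |x| <= X_SAFE

def plant(x, u):
    """Toy 1-D plant x[k+1] = x[k] + DT * u[k] (an illustrative assumption)."""
    return x + DT * u

def safety_controller(x):
    """Simple, easily verified controller: gives x[k+1] = 0.8 * x[k], so |x| never grows."""
    return -2.0 * x

def complex_controller(x, k):
    """Stand-in for an unverified high-performance controller. It converges
    faster, but emits a bad command from step 20 on (a 'cyber fault')."""
    return -5.0 * x if k < 20 else 30.0

def decision_module(x, u):
    """Accept the complex command only if the predicted next state stays in the envelope."""
    if abs(plant(x, u)) <= X_SAFE:
        return u, "complex"
    return safety_controller(x), "safety"

def run(x0, steps=40):
    """Closed loop; returns a list of (step, controller used, resulting state)."""
    x, trace = x0, []
    for k in range(steps):
        u, mode = decision_module(x, complex_controller(x, k))
        x = plant(x, u)
        trace.append((k, mode, x))
    return trace

trace = run(0.9)
print(trace[19][1], trace[20][1])  # complex safety
```

The plant never leaves the envelope even though the high-performance controller fails, because only the small safety controller and decision module need to be trusted.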

SLIDE 14

Handling failures: active versus passive

Active (non-masking)
- Failure detectors: reliable failure detectors from unreliable processes ⇒ reliable systems from unreliable components (e.g., COTS, processes, stochastic processors, robustness, etc.)?
- Fault detection and isolation (FDI)
Passive (masking)
- Redundancy, as in the consensus example
- Self-stabilizing algorithms ⇒ self-stabilizing systems?
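The active/passive distinction can be sketched in a few lines: an active mechanism detects a failure and reports it so something else can react, while a passive one hides the failure outright. Below, a timeout-based heartbeat detector stands for the active side and majority voting over replicas (as in triple modular redundancy) for the passive side. Both are minimal sketches; the class and function names are hypothetical, and the fixed timeout assumes bounded message delays.

```python
from collections import Counter

class TimeoutFailureDetector:
    """Active (non-masking): suspect any process whose last heartbeat is older
    than `timeout`. Accurate only if message delays are bounded; otherwise a
    slow but correct process can be wrongly suspected."""

    def __init__(self, processes, timeout):
        self.timeout = timeout
        self.last_heartbeat = {p: 0.0 for p in processes}

    def heartbeat(self, p, now):
        self.last_heartbeat[p] = now

    def suspects(self, now):
        return {p for p, t in self.last_heartbeat.items() if now - t > self.timeout}

def majority_vote(outputs):
    """Passive (masking): return the value produced by a strict majority of
    replicas, or None if there is no majority (the fault cannot be masked)."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

fd = TimeoutFailureDetector(["a", "b", "c"], timeout=2.0)
fd.heartbeat("a", 4.0)
fd.heartbeat("b", 4.5)
print(fd.suspects(5.0))          # {'c'}
print(majority_vote([7, 7, 9]))  # 7: one faulty replica is masked
print(majority_vote([1, 2, 3]))  # None: too many faults to mask
```

The detector only tells the rest of the system that "c" looks dead; the voter silently returns the right answer as long as a majority of replicas agree.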

SLIDE 15

Self-stabilizing algorithms

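[Figure: states split into Legal and Not Legal; a fault moves the system from Legal to Not Legal, convergence brings it back to Legal, and closure keeps it within Legal]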
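A classic instance of closure and convergence is Dijkstra's K-state token ring, which the slides do not mention explicitly. Machine 0 is privileged when its counter equals that of the last machine, every other machine is privileged when its counter differs from its left neighbour's, and a configuration is Legal when exactly one machine is privileged. The sketch below assumes a central daemon that picks one privileged machine at random per step and K >= n; function names are illustrative.

```python
import random

def privileged(x):
    """Machines holding a privilege (token) in Dijkstra's K-state ring."""
    n = len(x)
    return [i for i in range(n)
            if (x[0] == x[n - 1] if i == 0 else x[i] != x[i - 1])]

def move(x, i, k):
    """Fire the rule of privileged machine i (modifies x in place)."""
    if i == 0:
        x[0] = (x[0] + 1) % k      # machine 0 increments its counter
    else:
        x[i] = x[i - 1]            # other machines copy their left neighbour

def legal(x):
    return len(privileged(x)) == 1

def converge(x, k, rng, max_steps=100_000):
    """Central daemon: let one privileged machine move at a time until the
    configuration is legal; return the number of moves taken."""
    for steps in range(max_steps):
        if legal(x):
            return steps
        move(x, rng.choice(privileged(x)), k)
    raise RuntimeError("no convergence within max_steps")

rng = random.Random(1)
n = k = 5                                  # K >= n machines
x = [rng.randrange(k) for _ in range(n)]   # arbitrary, possibly illegal, start
converge(x, k, rng)                        # convergence
for _ in range(20):                        # closure: every move stays legal
    move(x, privileged(x)[0], k)
    assert legal(x)
x[2] = (x[2] + 1) % k                      # transient fault corrupts one machine
converge(x, k, rng)                        # ...and the ring recovers
print(legal(x))  # True
```

At least one machine is always privileged, so the daemon never stalls; no global reset is needed after the fault.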

SLIDE 16

Self-stabilizing systems?

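[Figure: a Safe region containing Good performance and Poor performance states; a fault moves the system from Good to Poor performance, convergence restores Good performance, with closure arrows analogous to the self-stabilization figure]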

SLIDE 17

Formal methods and verification

Motivation: why formal methods?
- Provable guarantees
- Successfully applied in a variety of problems
- Maturing tools and formalisms
Useful concepts
- Abstraction
- Compositional reasoning
- Temporal logic and verification
- Actor model
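As a toy illustration of a provable guarantee, explicit-state model checking exhaustively explores every reachable state and either proves an invariant or returns a counterexample path. The sketch below is a bare breadth-first search, not a real tool such as SPIN, NuSMV, or UPPAAL; the model (Dijkstra's token ring with N = 3, K = 3) and all names are illustrative assumptions. It proves closure of the Legal set, one half of self-stabilization; convergence is a liveness property and needs more than invariant checking.

```python
from collections import deque
from itertools import product

def check_invariant(initial, successors, invariant):
    """Explicit-state breadth-first search. Returns a shortest path from an
    initial state to a state violating `invariant`, or None if every
    reachable state satisfies it."""
    parent = {s: None for s in initial}
    queue = deque(parent)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            path = []
            while s is not None:           # walk back to an initial state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

N, K = 3, 3   # hypothetical model size: 3 machines, counters mod 3

def privileged(x):
    return [i for i in range(N) if (x[0] == x[N - 1] if i == 0 else x[i] != x[i - 1])]

def successors(x):
    """One step of a central daemon: any single privileged machine moves."""
    for i in privileged(x):
        y = list(x)
        y[i] = (x[0] + 1) % K if i == 0 else x[i - 1]
        yield tuple(y)

def legal(x):
    return len(privileged(x)) == 1

states = list(product(range(K), repeat=N))
print(check_invariant([s for s in states if legal(s)], successors, legal))  # None: closure holds
print(check_invariant(states, successors, legal))  # [(0, 1, 0)]: an illegal start state
```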

SLIDE 18

Challenges and questions

Model cyber and physical faults in such a way that they can be decoupled from one another, if possible

Must make any solutions compositional to avoid an explosion of interaction cases

The complexity of analyzing all these fault sources simultaneously must be reduced: tracking how one fault influences another, which in turn influences another, is intractable

Impossibility results

Formal methods challenges ([Emerson, Clarke, and Sifakis, “Model checking: algorithmic verification and debugging”, Nov. 2009]): model checking for (a) software, (b) real-time systems, (c) hybrid systems, (d) probabilistic systems, and compositional model checking

Lots of work to be done, but many interesting directions!

SLIDE 19

Thank you and questions

Questions? Hopefully there are lots of questions to motivate the discussion!