SLIDE 1

Sandboxing Controllers for Stochastic Cyber-Physical Systems

FORMATS 2019, Amsterdam, August 29, 2019

Bingzhuo Zhong, Technical University of Munich, Germany
Majid Zamani, CU Boulder, USA & Ludwig Maximilian University of Munich, Germany
Marco Caccamo, Technical University of Munich, Germany

SLIDE 2

Motivation

Chair of Cyber-Physical Systems in Production Engineering

In modern cyber-physical systems, many high-performance but unverified controllers, e.g. deep neural networks, are needed for complex tasks. To ensure safety, we borrow the idea of a sandbox from the computer security community.

  • (Isolation) Restrict the behaviour of the untrusted component by isolating it from the critical part of a digital controller.
  • (Supervision) The untrusted component can access the critical part only when it follows the rules given by the sandboxing mechanism.

Sandboxing unverified controllers for functionality and safety

SLIDE 3

Motivation

In this work, we focus on

  • Discrete-time stochastic systems driven by a sequence of independent and identically distributed (i.i.d.) random variables, possibly unbounded.
  • A typical specification: invariance.

SLIDE 4

Basic idea

  • Focus only on safety, aiming to maximize the probability of safety
  • Check inputs from the unverified controller
  • Feed the input provided by the safety advisor as a fallback action once the input from the unverified controller is hazardous

Novelties:

  • Stochastic systems
  • Probabilistic guarantee for fulfilling the safety specification
  • More flexible compromise between safety probability and functionality
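The check-and-fallback scheme on this slide can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: `sandboxed_input`, `unverified_controller`, `safety_advisor`, and `is_hazardous` are hypothetical names introduced for this example, and the toy hazard rule is made up.

```python
def sandboxed_input(state, t, unverified_controller, safety_advisor, is_hazardous):
    """Return the input to apply at time t and a flag telling whether
    the unverified controller's proposal was accepted."""
    proposed = unverified_controller(state, t)
    if is_hazardous(state, t, proposed):
        # Fallback action: use the input recommended by the safety advisor.
        return safety_advisor(state, t), False
    return proposed, True


# Toy usage with a made-up rule: inputs above 1.0 count as hazardous.
def toy_unverified(state, t):
    return 2.0 if t % 2 else 0.5


def toy_advisor(state, t):
    return 0.0


def toy_hazard(state, t, u):
    return u > 1.0


u0, accepted0 = sandboxed_input(0.0, 0, toy_unverified, toy_advisor, toy_hazard)
u1, accepted1 = sandboxed_input(0.0, 1, toy_unverified, toy_advisor, toy_hazard)
```

Keeping the hazard test as a separate callable mirrors the separation between the supervisor (which decides) and the safety advisor (which supplies the fallback input).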

SLIDE 5

Definition

Discrete-time stochastic system, modeled as a controlled discrete-time Markov process with:

  • State space
  • Input space
  • Set of inputs executable at state x
  • Borel-measurable stochastic kernel

Invariance specification: the system is expected to stay within a safety set.

For a controlled discrete-time Markov process:

  • Find a Markov policy which
  • maximizes the probability that the system stays in the safety set, or
  • minimizes the probability that the system reaches the unsafe set

within a finite time horizon. We focus on the case where .

SLIDE 7

Safety Advisor

Synthesis pipeline: controlled Markov process → (discretization) → Markov decision process → (Bellman backward recursion) → Markov policy in finite time horizon

The safety advisor provides an input for each state at each time instant in the time horizon so as to maximize the safety probability.

Remarks:

  • The length of the time horizon is tunable according to the selected maximal tolerable probability of reaching unsafe states.

SLIDE 8

Discretization of Controlled Markov process

Controlled Markov process → (discretization) → Markov decision process

[Figure: discretization of the sets X and A into states x1, x2, x3, ..., xm and of the input set U into inputs u1, u2, u3, ..., uj, plus a sink state.]
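A minimal sketch of such a discretization for a one-dimensional system is given below. It assumes dynamics of the form x' = step(x, u) + w with zero-mean Gaussian noise w, uniform cells with their centers as representative states, and a single absorbing sink state for everything outside the interval. All of these choices, and the names `discretize` and `normal_cdf`, are assumptions made for illustration and are not taken from the slides.

```python
import math


def normal_cdf(z, mean, std):
    """Cumulative distribution function of N(mean, std^2) at z."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))


def discretize(lo, hi, m, inputs, step, noise_std):
    """Build a finite MDP from x' = step(x, u) + w, w ~ N(0, noise_std^2).

    States 0..m-1 are uniform cells of [lo, hi]; state m is an absorbing
    sink standing for everything outside [lo, hi].
    Returns (centers, trans) with trans[s][u] a list of m+1 probabilities.
    """
    width = (hi - lo) / m
    centers = [lo + (i + 0.5) * width for i in range(m)]
    trans = []
    for x in centers:
        row = {}
        for u in inputs:
            mean = step(x, u)
            # Probability mass of the next state falling into each cell.
            probs = [
                normal_cdf(lo + (j + 1) * width, mean, noise_std)
                - normal_cdf(lo + j * width, mean, noise_std)
                for j in range(m)
            ]
            # Remaining mass leaves [lo, hi] and goes to the sink.
            probs.append(max(0.0, 1.0 - sum(probs)))
            row[u] = probs
        trans.append(row)
    # The sink state is absorbing under every input.
    trans.append({u: [0.0] * m + [1.0] for u in inputs})
    return centers, trans


# Toy usage: 4 cells on [0, 4], two inputs, x' = x + u + w.
centers, trans = discretize(0.0, 4.0, 4, [0, 1], lambda x, u: x + u, 0.5)
```

Using the cell center as the representative point is the simplest choice; a finer partition reduces the approximation error at the cost of a larger transition table.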

SLIDE 11

Markov Policy in finite time horizon

Given a time horizon H, the safety advisor (a Markov policy in finite time horizon) for the finite MDP is a matrix with one row per state x1, x2, ..., xm-1, xm and one column per time instant t=0, t=1, t=2, t=3, ..., t=H-2, t=H-1.

Fill in all entries of the matrix.

SLIDE 12

Markov Policy in finite time horizon

To determine the proper input in each entry, a value function is introduced. The safety advisor is then synthesized recursively from this value function, starting from its initialization, until every entry of the matrix (rows x1, ..., xm; columns t=0, ..., t=H-1) is filled.
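The value function and its initialization are not shown here, so the sketch below uses a standard finite-horizon backward recursion that minimizes the probability of reaching an absorbing unsafe (sink) state. This is an assumption about the general technique, not a reproduction of the slide's formulas; `synthesize_advisor` and the toy transition table are hypothetical.

```python
def synthesize_advisor(trans, sink, horizon):
    """Backward recursion on a finite MDP.

    trans[s][u] is a list of next-state probabilities and `sink` is the
    absorbing unsafe state. Returns (policy, value), where value[t][s]
    is the minimal probability of reaching the sink from state s between
    time t and the end of the horizon, and policy[t][s] is an input
    achieving that minimum.
    """
    n = len(trans)
    # Initialization at the end of the horizon: only the sink is unsafe.
    value = [[0.0] * n for _ in range(horizon + 1)]
    value[horizon][sink] = 1.0
    policy = [[None] * n for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n):
            if s == sink:
                value[t][s] = 1.0
                continue
            best_u, best_v = None, float("inf")
            for u, probs in trans[s].items():
                v = sum(p * value[t + 1][s2] for s2, p in enumerate(probs))
                if v < best_v:
                    best_u, best_v = u, v
            policy[t][s], value[t][s] = best_u, best_v
    return policy, value


# Toy MDP: two safe states (0, 1) and a sink (2).
toy_trans = [
    {"stay": [0.9, 0.0, 0.1], "move": [0.0, 0.99, 0.01]},
    {"stay": [0.0, 0.98, 0.02], "move": [0.95, 0.0, 0.05]},
    {"stay": [0.0, 0.0, 1.0], "move": [0.0, 0.0, 1.0]},
]
policy, value = synthesize_advisor(toy_trans, sink=2, horizon=2)
```

Here `policy[t][s]` plays the role of the matrix entry for state s at time t, and `value[0][s]` is the probability of reaching the sink within the whole horizon when the advisor is followed from state s.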

SLIDE 13

Markov Policy in finite time horizon

The safety advisor is the matrix of inputs obtained from the recursion (rows x1, ..., xm; columns t=0, ..., t=H-1).

Remarks: the value function indicates the probability of reaching the unsafe set within the time horizon.

SLIDE 14

Markov Policy in finite time horizon

In our implementation, the time horizon of the safety advisor is determined by ρ, the maximal tolerable probability of reaching the unsafe set.
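One way to picture how ρ can fix the horizon is to pick the longest horizon whose unsafe probability still stays below ρ. The sketch below assumes that this probability is non-decreasing in the horizon length and, purely for the toy example, that each step fails independently with a constant probability. Both `longest_horizon` and `toy_unsafe_prob` are hypothetical, and the exact condition used in the implementation is not reproduced here.

```python
def longest_horizon(unsafe_prob, rho, max_horizon=100_000):
    """Largest H in [0, max_horizon] with unsafe_prob(H) <= rho,
    assuming unsafe_prob is non-decreasing in H.
    Returns None if no horizon satisfies the bound."""
    best = None
    for h in range(max_horizon + 1):
        if unsafe_prob(h) > rho:
            break
        best = h
    return best


def toy_unsafe_prob(h, p_step=0.001):
    """Toy model: each step independently fails with probability p_step."""
    return 1.0 - (1.0 - p_step) ** h


H = longest_horizon(toy_unsafe_prob, rho=0.01)
```

A smaller ρ yields a shorter horizon, which is the trade-off referred to when the horizon length is described as tunable.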

SLIDE 15

History-based Supervisor

Key idea: at every time instant during the execution, check the feasibility of the input from the unverified controller based on the history path.

Example: at time t = k, the supervisor uses the history path up to time t = k, i.e., the states visited and the inputs applied so far.

SLIDE 16

History-based Supervisor

Key idea: at every time instant, check whether ρ can still be respected if the safety advisor is used from then on.

At time t = k, given the history path up to time t = k, the current input given by the unverified controller can only be accepted when an inequality holds. The inequality is annotated as follows:

  • The noise is a sequence of i.i.d. random variables.
  • One part accounts for the input from the unverified controller being accepted.
  • One part accounts for the safety advisor being used afterwards.
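The acceptance inequality itself is not shown, so the following is only one plausible form of such a test, written under explicit assumptions: the probability of having stayed safe along the history is taken as a product of one-step safe probabilities (relying on the i.i.d. noise), and the risk after accepting the candidate input is read from a value table computed for the safety advisor. `accept_input` and all its arguments are hypothetical names.

```python
def accept_input(history_safe_probs, next_dist, value_next, rho):
    """Decide whether a candidate input may be accepted.

    history_safe_probs: one-step probabilities of staying safe for each
        (state, input) pair already executed along the history path.
    next_dist: dict mapping next state -> probability if the candidate
        input is applied now.
    value_next: dict mapping state -> probability of reaching the unsafe
        set from that state when the safety advisor is used afterwards.
    rho: maximal tolerable probability of reaching the unsafe set.
    """
    p_safe_so_far = 1.0
    for p in history_safe_probs:
        p_safe_so_far *= p
    # Risk from now on if the candidate is applied, then the advisor.
    p_unsafe_later = sum(prob * value_next[s] for s, prob in next_dist.items())
    p_unsafe_total = 1.0 - p_safe_so_far * (1.0 - p_unsafe_later)
    return p_unsafe_total <= rho


history = [0.999, 0.998]
values = {"a": 0.001, "sink": 1.0}
risky = accept_input(history, {"a": 0.9, "sink": 0.1}, values, rho=0.01)
benign = accept_input(history, {"a": 1.0}, values, rho=0.01)
```

The check needs only one product and one weighted sum per time step, which fits a supervisor that must run at every sampling instant.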

SLIDE 17

Case Study – Temperature Control Problem

Consider a room equipped with a heater. The dynamics of the system are described in terms of:

  • The temperature of the room at time t
  • The input to the room at time t
  • Temperature of the external environment
  • Temperature of the heater
  • Conduction factor between the heater and the room
  • Conduction factor between the external environment and the room
  • Gaussian white noise

Problem setting

  • Safety specification :
  • Sampling time period : 9 min
  • Time horizon for the safety advisor: [0,40] (6h)
  • Safety guarantee : 99%
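The dynamics equation is not shown, so the sketch below assumes a common linear heat-exchange model built from the listed quantities: the room drifts toward the external temperature, is heated toward the heater temperature when the input is on, and is perturbed by Gaussian noise. The model form, the function names, the safe range, and every parameter value are hypothetical and chosen only to make the example run.

```python
import random


def room_step(x, u, t_ext, t_heater, a_ext, a_heater, noise_std, rng):
    """One sampling step of an assumed room-temperature model."""
    drift = a_ext * (t_ext - x) + a_heater * (t_heater - x) * u
    return x + drift + rng.gauss(0.0, noise_std)


def stays_safe(x0, controller, horizon, safe_low, safe_high, rng, **params):
    """Simulate one path and report whether it stays in [safe_low, safe_high]."""
    x = x0
    for t in range(horizon):
        x = room_step(x, controller(t, x), rng=rng, **params)
        if not (safe_low <= x <= safe_high):
            return False
    return True


# Hypothetical parameters; the noise is switched off to keep the run deterministic.
params = dict(t_ext=-1.0, t_heater=50.0, a_ext=0.1, a_heater=0.05, noise_std=0.0)
rng = random.Random(0)
x_off = room_step(19.01, 0, rng=rng, **params)
x_on = room_step(19.01, 1, rng=rng, **params)
always_off_ok = stays_safe(19.01, lambda t, x: 0, 40, 19.0, 21.0, rng, **params)
```

With the heater always off, this toy room cools below the assumed safe range in the first step, which illustrates why an unverified controller that never heats needs the fallback.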

SLIDE 18

Case Study – Temperature Control Problem

Initial state: 19.01℃
Unverified controller: u is 0 all the time
Percentage of paths in safety set (with Safe-visor): 99.02%
Average acceptance rate of unverified controller: 19.12%
Percentage of paths in safety set (without Safe-visor): 0%
Percentage of paths in safety set (purely with Safety Advisor): 99.18%
Average execution time for History-based Supervisor: 33.42 μs
Safety specification :
Number of simulations :

SLIDE 19

Case Study – Traffic Control Problem

Consider a road traffic control problem for a cell with 2 entries and 1 exit. The dynamics of the system are described in terms of:

  • The density of traffic at time t
  • The input at time t (1 means the green light is on, 0 means the red light is on)
  • Number of cars passing the entry controlled by the traffic light*
  • Number of cars passing the entry without the traffic light*
  • Percentage of cars which leave the cell through the exit*
  • Flow speed of the vehicles on the road
  • Sampling time interval of the system
  • Gaussian white noise

* in one sampling interval

Problem setting

  • Safety specification :
  • Time horizon for the safety advisor: [0,8186] (13.64h)
  • Safety guarantee : 99.95%

SLIDE 20

Case Study – Traffic Control Problem

Initial state: 9
Unverified controller: u(t) = 1 when t is an odd number, otherwise 0
Percentage of paths in safety set (with Safe-visor): 99.958%
Average acceptance rate of unverified controller: 8.5114%
Percentage of paths in safety set (without Safe-visor): 0%
Percentage of paths in safety set (purely with Safety Advisor): 99.989%
Average execution time for History-based Supervisor: 31.82 μs
Safety specification :
Number of simulations :

SLIDE 21

Perspective

Extending our method to

1) systems modeled by partially observable Markov decision processes;
2) more general safety specifications, e.g. co-safe linear temporal logic.

SLIDE 22

Acknowledgements

Funding:

  • H2020 ERC Starting Grant AutoCPS (grant agreement No 804639)
  • German Research Foundation (DFG) (grants ZA 873/1-1 and ZA 873/4-1)
  • German Federal Ministry of Education and Research & Alexander von Humboldt Foundation: Alexander von Humboldt Professorship

SLIDE 23

Q & A
