

SLIDE 1

Safe AI for CPS

André Platzer Carnegie Mellon University Joint work with Nathan Fulton

SLIDE 2

Safety-Critical Systems

"How can we provide people with cyber-physical systems they can bet their lives on?" - Jeannette Wing


SLIDE 4

This Talk

Ensure the safety of autonomous cyber-physical systems. Best of both worlds: learning together with CPS safety.

  • Flexibility of learning
  • Guarantees of CPS formal methods

Diametrically opposed: flexibility and adaptability versus predictability and simplicity.

  • 1. Cyber-Physical Systems with Differential Dynamic Logic
  • 2. Sandboxed reinforcement learning is provably safe
  • 3. Model-update learning addresses uncertainty with multiple models

SLIDE 5

Airborne Collision Avoidance System ACAS X

  • Developed by the FAA to replace the current TCAS in aircraft
  • Approximately optimizes an MDP on a grid
  • Advisory from lookup tables with 5D interpolation regions
  • Identified a safe region per advisory and proved it in KeYmaera X

(Figure: advisory sequence 1–6 with delay.)

STTT’17
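To make "advisory from lookup tables with 5D interpolation regions" concrete, here is a minimal sketch of score-table lookup with grid interpolation, reduced to two dimensions; the grid, scores, and advisory values are illustrative assumptions, not ACAS X data:

# Illustrative sketch only: ACAS X-style advisory selection by
# interpolating per-advisory scores over a state grid. The real tables
# are 5D; the numbers below are made up.
ALTITUDES  = [9600.0, 10000.0, 10400.0]   # ft
TIMES      = [0.0, 10.0, 20.0]            # s to crossing
ADVISORIES = ["DNC", "CL1500", "NoChange"]
# SCORE[i][j][k]: score of advisory k at grid point (ALTITUDES[i], TIMES[j])
SCORE = [[[0.2, 0.5, 0.9], [0.4, 0.6, 0.5], [0.8, 0.3, 0.1]],
         [[0.3, 0.4, 0.8], [0.5, 0.5, 0.4], [0.7, 0.4, 0.2]],
         [[0.1, 0.6, 0.7], [0.2, 0.7, 0.3], [0.6, 0.5, 0.2]]]

def bracket(axis, x):
    # Grid cell index i and weight w with axis[i] <= x <= axis[i+1] (clamped).
    i = max(0, min(len(axis) - 2, sum(1 for a in axis if a <= x) - 1))
    w = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, min(1.0, max(0.0, w))

def advisory(alt, time_to_crossing):
    # Bilinearly interpolate each advisory's score, then pick the best.
    i, u = bracket(ALTITUDES, alt)
    j, v = bracket(TIMES, time_to_crossing)
    def score(k):
        return ((1-u)*(1-v)*SCORE[i][j][k]   + u*(1-v)*SCORE[i+1][j][k]
              + (1-u)*v*SCORE[i][j+1][k]     + u*v*SCORE[i+1][j+1][k])
    return max(ADVISORIES, key=lambda name: score(ADVISORIES.index(name)))

print(advisory(10100.0, 12.0))   # best interpolated advisory at this state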

SLIDE 6

Comparison: ACAS X issues DNC

But CL1500 or no change would not lead to a collision.

(Figure: encounter plot of altitude h (ft), 9,600–10,800, against time to crossing (s), 5–20: intruder path, ownship path without DNC, ownship path with DNC; trajectories following advisory DNC versus no change.)

SLIDE 7

Model-Based Verification

φ

Reinforcement Learning

SLIDE 8

Model-Based Verification

pos < stopSign

Reinforcement Learning

SLIDE 9

Model-Based Verification

pos < stopSign

Reinforcement Learning

ctrl

SLIDE 10

Approach: prove that control software achieves a specification with respect to a model of the physical system.

Model-Based Verification

pos < stopSign

Reinforcement Learning

ctrl


SLIDE 12

Benefits:

  • Strong safety guarantees
  • Automated analysis

Model-Based Verification

φ

Reinforcement Learning

SLIDE 13

Benefits:

  • Strong safety guarantees
  • Automated analysis

Drawbacks:

  • Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”

Model-Based Verification

φ

Reinforcement Learning

SLIDE 14

Benefits:

  • Strong safety guarantees
  • Automated analysis

Drawbacks:

  • Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
  • Assumes accurate model

Model-Based Verification

φ

Reinforcement Learning

SLIDE 15

Benefits:

  • Strong safety guarantees
  • Automated analysis

Drawbacks:

  • Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
  • Assumes accurate model.

Model-Based Verification

φ

Reinforcement Learning

Observe
Act

SLIDE 16

Benefits:

  • Strong safety guarantees
  • Automated analysis

Drawbacks:

  • Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
  • Assumes accurate model.

Model-Based Verification

φ

Reinforcement Learning

Observe
Act

Benefits:

  • No need for complete model
  • Optimal (effective) policies
SLIDE 17

Benefits:

  • Strong safety guarantees
  • Automated analysis

Drawbacks:

  • Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
  • Assumes accurate model.

Model-Based Verification

φ

Reinforcement Learning

Observe
Act

Benefits:

  • No need for complete model
  • Optimal (effective) policies

Drawbacks:

  • No strong safety guarantees
  • Proofs are obtained and checked by hand
  • Formal proofs = decades-long proof development

SLIDE 18

Benefits:

  • Strong safety guarantees
  • Computational aids (ATP)

Drawbacks:

  • Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
  • Assumes accurate model

Model-Based Verification

φ

Reinforcement Learning

Observe
Act

Benefits:

  • No need for complete model
  • Optimal (effective) policies

Drawbacks:

  • No strong safety guarantees
  • Proofs are obtained and checked by hand
  • Formal proofs = decades-long proof development

Goal: Provably correct reinforcement learning

SLIDE 19

Benefits:

  • Strong safety guarantees
  • Computational aids (ATP)

Drawbacks:

  • Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”
  • Assumes accurate model

Model-Based Verification

φ

Reinforcement Learning

Observe
Act

Benefits:

  • No need for complete model
  • Optimal (effective) policies

Drawbacks:

  • No strong safety guarantees
  • Proofs are obtained and checked by hand
  • Formal proofs = decades-long proof development

Goal: Provably correct reinforcement learning

  • 1. Learn Safety
  • 2. Learn a Safe Policy
  • 3. Justify claims of safety
SLIDE 20

Part I: Differential Dynamic Logic

Trustworthy Proofs for Hybrid Systems

SLIDE 21

Hybrid Programs

x := t

(Diagram: a state with x=x₀, y=y₀, z=z₀, … steps to x=t, y=y₀, z=z₀, …)

SLIDE 22

Hybrid Programs

a; b

(Diagram: run a, then run b from a’s resulting state.)

x := t

(Diagram: a state with x=x₀, y=y₀, z=z₀, … steps to x=t, y=y₀, z=z₀, …)

SLIDE 23

Hybrid Programs

?P

If P is true: no change. If P is false: terminate.

a; b

(Diagram: run a, then run b from a’s resulting state.)

x := t

(Diagram: a state with x=x₀, y=y₀, z=z₀, … steps to x=t, y=y₀, z=z₀, …)

SLIDE 24

Hybrid Programs

?P a;b

a;b a b If P is true: no change If P is false: terminate

a*

(Diagram: repeat a any number of times: a … a.)

?P

If P is true: no change. If P is false: terminate.

a; b

(Diagram: run a, then run b from a’s resulting state.)

x := t

(Diagram: a state with x=x₀, y=y₀, z=z₀, … steps to x=t, y=y₀, z=z₀, …)

SLIDE 25

Hybrid Programs

a ∪ b

(Diagram: nondeterministically run either a or b.)

?P

If P is true: no change. If P is false: terminate.

a; b

(Diagram: run a, then run b from a’s resulting state.)

a*

(Diagram: repeat a any number of times: a … a.)

x := t

(Diagram: a state with x=x₀, y=y₀, z=z₀, … steps to x=t, y=y₀, z=z₀, …)

SLIDE 26

Hybrid Programs

a ∪ b

(Diagram: nondeterministically run either a or b.)

?P

If P is true: no change. If P is false: terminate.

a; b

(Diagram: run a, then run b from a’s resulting state.)

a*

(Diagram: repeat a any number of times: a … a.)

x’ = f(x)

(Diagram: from x=x₀, follow the ODE solution F with x=F(0) through x=F(T) for some duration T.)

x := t

(Diagram: a state with x=x₀, y=y₀, z=z₀, … steps to x=t, y=y₀, z=z₀, …)
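For reading the formulas on the upcoming slides: the dL formula [a]P says that P holds after every run of hybrid program a. A compact reminder of the standard dL transition semantics (standard material, not spelled out on the slides), writing ⟦a⟧ for the transition relation of a:

\[
\begin{aligned}
\llbracket x := t \rrbracket &= \{(\omega,\nu) : \nu = \omega \text{ except that } \nu(x) = \text{the value of } t \text{ in } \omega\}\\
\llbracket ?P \rrbracket &= \{(\omega,\omega) : \omega \models P\}\\
\llbracket a; b \rrbracket &= \llbracket a \rrbracket \circ \llbracket b \rrbracket
\qquad
\llbracket a \cup b \rrbracket = \llbracket a \rrbracket \cup \llbracket b \rrbracket
\qquad
\llbracket a^{*} \rrbracket = \textstyle\bigcup_{n \ge 0} \llbracket a \rrbracket^{n}\\
\llbracket x' = f(x) \rrbracket &= \{(\omega,\nu) : \nu \text{ is reached from } \omega \text{ by following a solution of the ODE for some duration } r \ge 0\}\\
\omega \models [a]P &\iff \nu \models P \text{ for all } \nu \text{ with } (\omega,\nu) \in \llbracket a \rrbracket
\end{aligned}
\]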

SLIDE 27

Approaching a Stopped Car

Is this property true?

Stopped Car Own Car

[ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel≥0 & t≤T} }* ](pos <= stoppedCarPos)

SLIDE 28

Approaching a Stopped Car

Assuming we only accelerate when it’s safe to do so, is this property true?

Stopped Car Own Car

[ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel≥0 & t≤T} }* ](pos <= stoppedCarPos)

SLIDE 29

Approaching a Stopped Car

Stopped Car Own Car safeDistance(pos,vel,stoppedCarPos,B)

Yes, if we also assume the system is safe initially:

safeDistance(pos,vel,stoppedCarPos,B) → [ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel≥0 & t≤T} }* ](pos <= stoppedCarPos)

SLIDE 30

Approaching a Stopped Car

safeDistance(pos,vel,stoppedCarPos,B) → [ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel≥0 & t≤T} }* ](pos <= stoppedCarPos)

Stopped Car Own Car safeDistance(pos,vel,stoppedCarPos,B)
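The slides keep safeDistance abstract. Assuming B > 0 is the braking rate of the brake branch, a natural instantiation (the standard stopping-distance condition from the dL car examples; an assumption here, since the slide does not spell it out) is:

\[
\mathit{safeDistance}(\mathit{pos}, \mathit{vel}, \mathit{stoppedCarPos}, B)
\;\equiv\;
\mathit{pos} + \frac{\mathit{vel}^{2}}{2B} \;\le\; \mathit{stoppedCarPos}
\]

Integrating vel’ = −B shows that a car braking from speed vel travels vel²/(2B) before stopping, so the condition says full braking still stops the car before stoppedCarPos; the guard on the accel branch must additionally absorb the worst case of accelerating for up to T seconds before braking begins.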

SLIDE 31

The Fundamental Question

Proofs give strong mathematical evidence of safety. Why would our program not work if we have a proof?

SLIDE 32

The Fundamental Question

Why would our program not work if we have a proof?

  • 1. Was the proof correct?
SLIDE 33

The Fundamental Question

Why would our program not work if we have a proof?

  • 1. Was the proof correct?
  • 2. Was the model accurate enough?

SLIDE 34

The Fundamental Question

Why would our program not work if we have a proof?

  • 1. Was the proof correct? KeYmaera X
  • 2. Was the model accurate enough?

DI Axiom: [{x'=f(x)&Q}]P ↔ [?Q]P, provided Q → [{x'=f(x)&Q}](P)'
Example: [{v'=r_p·v²−g, t'=1}] v ≥ v₀−g·t ↔ … ↔ [v':=r_p·v²−g][t':=1] v' ≥ −g·t' ↔ r_p·v²−g ≥ −g, which follows from H → r_p ≥ 0
Side derivation: (v ≥ v₀−g·t)' ↔ … ↔ v' ≥ −g·t'
dI Tactic: H = r_p ≥ 0 ∧ r_a ≥ 0 ∧ g > 0 ∧ …

Axioms

KyX

qed

ODE & Controls Tooling Clever Bellerophon Programs
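Typeset cleanly, the differential-invariant reasoning the slide sketches is (modulo KeYmaera X's exact formulation of DI):

\[
\text{(dI)}\;
\frac{Q \;\vdash\; [x' := f(x)]\,(P)'}{P \;\vdash\; [\{x' = f(x)\ \&\ Q\}]\,P}
\qquad\text{here: }
(v \ge v_0 - g\,t)' \equiv v' \ge -g\,t'
\;\leadsto\;
r_p v^{2} - g \ge -g,
\text{ which follows from } r_p \ge 0 \text{ in } H.
\]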

SLIDE 35

The Fundamental Question

Why would our program not work if we have a proof?

  • 1. Was the proof correct? KeYmaera X
  • 2. Was the model accurate enough? Safe RL

DI Axiom: [{x'=f(x)&Q}]P ↔ [?Q]P, provided Q → [{x'=f(x)&Q}](P)'
Example: [{v'=r_p·v²−g, t'=1}] v ≥ v₀−g·t ↔ … ↔ [v':=r_p·v²−g][t':=1] v' ≥ −g·t' ↔ r_p·v²−g ≥ −g, which follows from H → r_p ≥ 0
Side derivation: (v ≥ v₀−g·t)' ↔ … ↔ v' ≥ −g·t'
dI Tactic: H = r_p ≥ 0 ∧ r_a ≥ 0 ∧ g > 0 ∧ …

Axioms

KyX

qed

ODE & Controls Tooling Clever Bellerophon Programs

SLIDE 36

Part II: Justified Speculative Control

Safe reinforcement learning in partially modeled environments

AAAI 2018

SLIDE 37

Model-Based Verification

Accurate, analyzable models often exist!

{ {?safeAccel;accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*

SLIDE 38

Model-Based Verification

Accurate, analyzable models often exist!

{ {?safeAccel;accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*

discrete control; continuous motion

SLIDE 39

Model-Based Verification

Accurate, analyzable models often exist!

{ {?safeAccel;accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*

discrete, non-deterministic control; continuous motion

SLIDE 40

Model-Based Verification

Accurate, analyzable models often exist!

init → [{ { ?safeAccel;accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*]pos < stopSign

SLIDE 41

Model-Based Verification

Accurate, analyzable models often exist! Formal verification gives strong safety guarantees.

init → [{ { ?safeAccel;accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*]pos < stopSign

SLIDE 42

Model-Based Verification

Accurate, analyzable models often exist! Formal verification gives strong safety guarantees.

=

  • Computer-checked proofs of safety specification
SLIDE 43

Model-Based Verification

Accurate, analyzable models often exist! Formal verification gives strong safety guarantees.

=

  • Computer-checked proofs of safety specification
  • Formal proofs mapping model to runtime monitors

SLIDE 44

Model-Based Verification Isn’t Enough

Perfect, analyzable models don’t exist!

SLIDE 45

Model-Based Verification Isn’t Enough

Perfect, analyzable models don’t exist!

{ { ?safeAccel;accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*

How to implement?
Only accurate sometimes.

SLIDE 46

Model-Based Verification Isn’t Enough

Perfect, analyzable models don’t exist!

{ { ?safeAccel;accel ∪ brake ∪ ?safeTurn; turn}; {dx’=w*y, dy’=-w*x, ...} }*

How to implement?
Only accurate sometimes.

SLIDE 47

Safe RL Contribution

Justified Speculative Control is an approach toward provably safe reinforcement learning that:

  • 1. learns to resolve nondeterminism without sacrificing formal safety results

SLIDE 48

Safe RL Contribution

Justified Speculative Control is an approach toward provably safe reinforcement learning that:

  • 1. learns to resolve nondeterminism without sacrificing formal safety results
  • 2. allows and directs speculation whenever model mismatches occur

SLIDE 49

Learning to Resolve Non-determinism

Observe & compute reward
Act

SLIDE 50

Learning to Resolve Non-determinism

Observe & compute reward

accel ∪ brake ∪ turn

SLIDE 51

Learning to Resolve Non-determinism

Observe & compute reward

{accel,brake,turn}

SLIDE 52

Learning to Resolve Non-determinism

Observe & compute reward

Policy

{accel,brake,turn}

SLIDE 53

Learning to Resolve Non-determinism

Observe & compute reward

(safe?) Policy

{accel,brake,turn}

SLIDE 54

Learning to Safely Resolve Non-determinism

Observe & compute reward

(safe?) Policy

Safety Monitor

Useful to stay safe during learning. Crucial after deployment.

SLIDE 55

Learning to Safely Resolve Non-determinism

Observe & compute reward

(safe?) Policy

Safety Monitor

≠ “Trust Me”

SLIDE 56

Learning to Safely Resolve Non-determinism

Observe & compute reward

(safe?) Policy

φ

Use a theorem prover to extract: (init→[{{accel∪brake};ODEs}*](safe))

φ


SLIDE 58

(safe?) Policy

Learning to Safely Resolve Non-determinism

Observe & compute reward

φ

Main Theorem: If the ODEs are accurate, then

  • our formal proofs transfer from the non-deterministic model to the learned (deterministic) policy

Use a theorem prover to extract: (init→[{{accel∪brake};ODEs}*](safe))

φ

SLIDE 59

(safe?) Policy

Learning to Safely Resolve Non-determinism

Observe & compute reward

φ

Main Theorem: If the ODEs are accurate, then

  • our formal proofs transfer from the non-deterministic model to the learned (deterministic) policy via the model monitor.

Use a theorem prover to extract: (init→[{{accel∪brake};ODEs}*](safe))

φ
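In code, the sandboxing idea looks roughly as follows. This is a minimal sketch: the Q-learning skeleton, the env.reset()/env.step() API, and hashable states are assumptions; in JSC, ctrl_monitor would be the monitor φ extracted from the KeYmaera X proof.

import random

def safe_actions(state, actions, ctrl_monitor):
    # Keep only the actions the verified controller monitor admits.
    return [a for a in actions if ctrl_monitor(state, a)]

def sandboxed_q_learning(env, actions, ctrl_monitor, fallback,
                         episodes=100, alpha=0.1, gamma=0.9, eps=0.1):
    # Q-learning restricted to monitor-approved actions. Every executed
    # action passes ctrl_monitor (or is the verified fallback, e.g. brake),
    # so the safety proof for the nondeterministic model constrains the
    # learned policy while the model is accurate.
    Q = {}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            allowed = safe_actions(state, actions, ctrl_monitor) or [fallback]
            if random.random() < eps:
                action = random.choice(allowed)   # explore, but only safely
            else:
                action = max(allowed, key=lambda a: Q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)
            best_next = max(Q.get((next_state, a), 0.0) for a in actions)
            Q[(state, action)] = ((1 - alpha) * Q.get((state, action), 0.0)
                                  + alpha * (reward + gamma * best_next))
            state = next_state
    return Q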

SLIDE 60

What about the physical model?

Observe & compute reward

φ

Use a theorem prover to extract: (init→[{{accel∪brake};ODEs}*](safe))

φ

{pos’=vel, vel’=acc} ≠ ?

(safe?) Policy

SLIDE 61

What About the Physical Model?

Observe & compute reward

{brake, accel, turn}

SLIDE 62

What About the Physical Model?

Observe & compute reward

{brake, accel, turn}

Model is accurate.


SLIDE 64

What About the Physical Model?

Observe & compute reward

{brake, accel, turn}

Model is accurate.

Model is inaccurate.

SLIDE 65

What About the Physical Model?

Observe & compute reward

{brake, accel, turn}

Model is accurate.

Model is inaccurate: obstacle!

SLIDE 66

What About the Physical Model?

Observe & compute reward

{brake, accel, turn}

Expected
Reality

SLIDE 67

Speculation is Justified

Observe & compute reward

{brake, accel, turn}

Expected (safe)
Reality (crash!)

SLIDE 68

Leveraging Verification Results to Learn Better

Observe & compute reward

{brake, accel, turn}

Use a real-valued version of the model monitor as a reward signal

SLIDE 69

Safe RL: How?

Details: ☐ Detect modeled vs unmodeled state space correctly at runtime. ☐ Convert monitors into reward signals

SLIDE 70

Detecting Unmodeled State Space

The ModelPlex algorithm, implemented using Bellerophon, generates verified runtime monitors.

[x:=t]P(x) ↔ P(t)
[a;b]P ↔ [a][b]P
[a∪b]P ↔ ([a]P ∧ [b]P)
[{x'=f&Q}]P → (Q → P)
...

AXIOM BASE

KeYmaera X Core

Q.E.D. Programming Languages

Standard Library

ModelPlex

SLIDE 71

Detecting Unmodeled State Space

oldPos := read_sensor(GPS)
actuate(accel)
newPos := read_sensor(GPS)
if (∃t. model_after(t) == newPos):
    # No model deviation.
else:
    # Model deviation…?


SLIDE 73

Detecting Unmodeled State Space

oldPos := read_sensor(GPS)
actuate(accel)
newPos := read_sensor(GPS)
if (QE(∃t. model_after(t) == newPos)):
    # No model deviation.
else:
    # Model deviation…?
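For the concrete dynamics pos’ = vel, vel’ = acc used on these slides, the QE(∃t. …) test has a closed form, so it reduces to a little arithmetic at runtime. A sketch (the tolerance handling is an assumption; the actual monitors are generated by ModelPlex together with correctness proofs):

def transition_consistent(old_pos, old_vel, acc, new_pos, new_vel, T, tol=1e-2):
    # Does some duration t in [0, T] explain the observed transition under
    # pos' = vel, vel' = acc?  (Closed form of: exists t. model_after(t) == new state.)
    if abs(acc) > tol:
        t = (new_vel - old_vel) / acc        # from new_vel = old_vel + acc*t
    elif abs(old_vel) > tol:
        t = (new_pos - old_pos) / old_vel    # from new_pos = old_pos + old_vel*t
    else:
        return abs(new_pos - old_pos) <= tol and abs(new_vel - old_vel) <= tol
    if not 0.0 <= t <= T:
        return False
    pred_pos = old_pos + old_vel * t + acc * t * t / 2.0
    pred_vel = old_vel + acc * t
    return abs(new_pos - pred_pos) <= tol and abs(new_vel - pred_vel) <= tol

# No model deviation while this returns True; otherwise speculate (carefully).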

SLIDE 74

Safe RL: How?

Details: Runtime monitoring separates modeled from unmodeled state space. ☐ Convert monitors into reward signals

SLIDE 75

Safe RL: How?

Details: Runtime monitoring separates modeled from unmodeled state space. ☐ Convert monitors into reward signals: (ℝⁿ→𝔹) → (ℝⁿ→ℝ)!?

SLIDE 76

An Example

init → [{ {?safeAccel; accel ∪ brake ∪ ?safeMaintain; maintainVel}; {pos’ = vel, vel’ = acc, t’=1} }*] safe

SLIDE 77

An Example Monitor

init → [{ {?safeAccel; accel ∪ brake ∪ ?safeMaintain; maintainVel}; {pos’ = vel, vel’ = acc, t’=1} }*] safe

(t_post ≥ 0 ∧ a_post = acc ∧ v_post = acc·t_post + v ∧ p_post = acc·t_post²/2 + v·t_post + p)
∨ (t_post ≥ 0 ∧ a_post = 0 ∧ v_post = v ∧ p_post = v·t_post + p)
∨ etc.

SLIDE 78

An Example Monitor

init → [{ {?safeAccel; accel ∪ brake ∪ ?safeMaintain; maintainVel}; {pos’ = vel, vel’ = acc, t’=1} }*] safe

(t_post ≥ 0 ∧ a_post = acc ∧ v_post = acc·t_post + v ∧ p_post = acc·t_post²/2 + v·t_post + p)
∨ (t_post ≥ 0 ∧ a_post = 0 ∧ v_post = v ∧ p_post = v·t_post + p)
∨ etc.

SLIDE 79

An Example Monitor

init → [{ {?safeAccel; accel ∪ brake ∪ ?safeMaintain; maintainVel}; {pos’ = vel, vel’ = acc, t’=1} }*] safe

(t_post ≥ 0 ∧ a_post = acc ∧ v_post = acc·t_post + v ∧ p_post = acc·t_post²/2 + v·t_post + p)
∨ (t_post ≥ 0 ∧ a_post = 0 ∧ v_post = v ∧ p_post = v·t_post + p)
∨ etc.

SLIDE 80

An Example Monitor

init → [{ {?safeAccel; accel ∪ brake ∪ ?safeMaintain; maintainVel}; {pos’ = vel, vel’ = acc, t’=1} }*] safe

(t_post ≥ 0 ∧ a_post = acc ∧ v_post = acc·t_post + v ∧ p_post = acc·t_post²/2 + v·t_post + p)
∨ (t_post ≥ 0 ∧ a_post = 0 ∧ v_post = v ∧ p_post = v·t_post + p)
∨ etc.

  • Quantifier elimination (QE) for real-closed fields
  • ODE solutions backed by proofs

Quantitative monitor as reward signal
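One way to realize the (ℝⁿ→𝔹) → (ℝⁿ→ℝ) conversion on the example monitor above: each equality becomes a negated absolute difference, each inequality its signed margin; conjunction becomes min and disjunction max. The sketch below illustrates the idea; the quantitative monitors used in the paper are generated from the formula by ModelPlex.

def margin_accel(t_post, a_post, v_post, p_post, acc, v, p):
    # Margin of the acceleration disjunct:
    #   t_post >= 0 & a_post = acc & v_post = acc*t_post + v
    #              & p_post = acc*t_post^2/2 + v*t_post + p
    # Equalities contribute -|lhs - rhs|; 0 means exact agreement.
    return min(t_post,
               -abs(a_post - acc),
               -abs(v_post - (acc * t_post + v)),
               -abs(p_post - (acc * t_post ** 2 / 2 + v * t_post + p)))

def margin_maintain(t_post, a_post, v_post, p_post, v, p):
    # Margin of the maintain-velocity disjunct (a_post = 0).
    return min(t_post,
               -abs(a_post),
               -abs(v_post - v),
               -abs(p_post - (v * t_post + p)))

def monitor_margin(t_post, a_post, v_post, p_post, acc, v, p):
    # Disjunction = max of branch margins: nonnegative iff the boolean
    # monitor holds; its magnitude doubles as a reward signal.
    return max(margin_accel(t_post, a_post, v_post, p_post, acc, v, p),
               margin_maintain(t_post, a_post, v_post, p_post, v, p))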

SLIDE 81

Safe RL: How?

Details: Runtime monitoring separates modeled from unmodeled state space. Convert monitors into gradients: (ℝⁿ→𝔹) → (ℝⁿ→ℝ)

SLIDE 82

Safe RL: How?

Details: Runtime monitoring separates modeled from unmodeled state space. Convert monitors into gradients via ModelPlex: (ℝⁿ→𝔹) → (ℝⁿ→ℝ)

SLIDE 83

Learning to Safely Resolve Non-determinism

Observe & compute reward

(safe?) Policy

φ

Use a theorem prover to extract: (init→[{{accel∪brake};ODEs}*](safe))

φ

SLIDE 84

Learning to Safely Handle Multiple Models

Observe & compute reward

(safe?) Policy

(init → [{ctrlᵢ; ODEᵢ}*](safe))

φᵢ

SLIDE 85

Learning to Safely Handle Multiple Models

Observe & compute reward

(safe?) Policy

φ₁

(init → [{ctrlᵢ; ODEᵢ}*](safe))

φᵢ φ₂ φₙ

SLIDE 86

Learning to Safely Handle Multiple Models

Observe & compute reward

(safe?) Policy

φ₁

(init → [{ctrlᵢ; ODEᵢ}*](safe))

φᵢ φ₂ φₙ

Monitor Conjunction

  • of all plausible models (see the sketch below)
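A sketch of how the monitor conjunction interacts with observations (the Model record and the monitor signatures are schematic assumptions):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    ctrl_monitor: Callable   # (state, action) -> bool: action safe under this model?
    model_monitor: Callable  # (prev_state, action, new_state) -> bool: transition explained?
    plausible: bool = True

def action_admissible(state, action, models):
    # Monitor conjunction: an action is taken only if every still-plausible
    # model approves it.
    return all(m.ctrl_monitor(state, action) for m in models if m.plausible)

def update_plausibility(prev_state, action, new_state, models):
    # Each observation is a (possibly differentiating) experiment: a model
    # whose model monitor rejects the observed transition is ruled out.
    for m in models:
        if m.plausible and not m.model_monitor(prev_state, action, new_state):
            m.plausible = False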
SLIDE 87

Learning to Safely Handle Multiple Models

Observe & compute reward

(safe?) Policy

φ₁

(init → [{ctrlᵢ; ODEᵢ}*](safe))

φᵢ φ₂ φₙ

Differentiating Experiment

SLIDE 88

Learning to Safely Handle Multiple Models

Observe & compute reward

(safe?) Policy

(init → [{ctrlᵢ; ODEᵢ}*](safe))

φᵢ φ₂ φₙ

Differentiating Experiment

SLIDE 89

Learning to Safely Handle Multiple Models

Observe & compute reward

(safe?) Policy

(init → [{ctrlᵢ; ODEᵢ}*](safe))

φᵢ φₙ

Differentiating Experiment

SLIDE 90

Conclusion

KeYmaera X + Justified Speculative Control provide strong safety guarantees for learning-enabled CPS.

  • 1. Was the proof correct?
  • 2. Was the model accurate enough?

SLIDE 91

Conclusion

DI Axiom: [{x'=f(x)&Q}]P ↔ [?Q]P, provided Q → [{x'=f(x)&Q}](P)'
Example: [{v'=r_p·v²−g, t'=1}] v ≥ v₀−g·t ↔ … ↔ [v':=r_p·v²−g][t':=1] v' ≥ −g·t' ↔ r_p·v²−g ≥ −g, which follows from H → r_p ≥ 0
Side derivation: (v ≥ v₀−g·t)' ↔ … ↔ v' ≥ −g·t'
dI Tactic: H = r_p ≥ 0 ∧ r_a ≥ 0 ∧ g > 0 ∧ …

Axioms

KyX

qed

ODE & Controls Tooling Clever Bellerophon Programs

KeYmaera X + Justified Speculative Control provide strong safety guarantees for learning-enabled CPS.

  • 1. Was the proof correct? KeYmaera X
  • 2. Was the model accurate enough?
SLIDE 92

Conclusion


KeYmaera X + Justified Speculative Control provide strong safety guarantees for learning-enabled CPS.

  • 1. Was the proof correct? KeYmaera X
  • 2. Was the model accurate enough? Justified Speculation

Get to here... ...from here

SLIDE 93

Conclusion

KeYmaera X + Justified Speculative Control provide strong safety guarantees for learning-enabled CPS.

  • 1. Was the proof correct? KeYmaera X
  • 2. Was the model accurate enough? Justified Speculation
  • 3. With multiple possible models? µ-learning
  • 4. When off-model? Verification-preserving model update
SLIDE 94

Acknowledgments
