

  1. Safe AI for CPS André Platzer Carnegie Mellon University Joint work with Nathan Fulton

  2. Safety-Critical Systems "How can we provide people with cyber-physical systems they can bet their lives on?" - Jeannette Wing

  3. Safety-Critical Systems "How can we provide people with cyber-physical systems they can bet their lives on?" - Jeannette Wing

  4. This Talk  Ensure the safety of Autonomous Cyber-Physical Systems. Best of both worlds: learning together with CPS safety
      ● Flexibility of learning ● Guarantees of CPS formal methods
      Diametrically opposed: flexibility+adaptability versus predictability+simplicity
      1. Cyber-Physical Systems with Differential Dynamic Logic
      2. Sandboxed reinforcement learning is provably safe
      3. Model-update learning addresses uncertainty with multiple models

  5. Airborne Collision Avoidance System ACAS X  Developed by the FAA to replace the current TCAS in aircraft
      ● Approximately optimizes an MDP on a grid
      ● Advisories from lookup tables with 5D interpolation regions
      ● Identified a safe region per advisory and proved it in KeYmaera X
      [Figure: advisory sequence with pilot delay]  STTT’17
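
As a rough illustration of how a score-table policy with interpolation can be queried (the grids, table values, and 2-D state below are made up for this sketch; the real ACAS X tables are 5-dimensional and the scoring is far more involved), here is a hedged Python sketch:

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    # Toy stand-in for an ACAS X style lookup: one score table per advisory over a
    # small state grid (2-D here instead of the real 5-D, with made-up values).
    rng = np.random.default_rng(0)
    h_grid = np.linspace(-1000, 1000, 5)   # relative altitude (ft)
    t_grid = np.linspace(0, 40, 5)         # time to closest approach (s)
    score_tables = {
        "COC": rng.random((5, 5)),         # clear of conflict
        "CL1500": rng.random((5, 5)),      # climb at 1500 ft/min
        "DES1500": rng.random((5, 5)),     # descend at 1500 ft/min
    }
    interpolators = {adv: RegularGridInterpolator((h_grid, t_grid), table)
                     for adv, table in score_tables.items()}

    def advisory(h, t):
        """Interpolate each advisory's score at the continuous state, pick the best."""
        return max(interpolators, key=lambda adv: float(interpolators[adv]([h, t])))

    print(advisory(250.0, 12.0))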

  6. Comparison: ACAS X issues DNC
      [Figure: altitude (ft) vs. time to crossing (s), showing the intruder path, the ownship path following the DNC advisory, and the ownship path without DNC]
      But CL1500 or no change would not lead to a collision

  7. Model-Based Verification Reinforcement Learning φ

  8. Model-Based Verification Reinforcement Learning pos < stopSign

  9. Model-Based Verification Reinforcement Learning ctrl pos < stopSign

  10. Model-Based Verification Reinforcement Learning ctrl pos < stopSign Approach: prove that control software achieves a specification with respect to a model of the physical system.

  11. Model-Based Verification Reinforcement Learning ctrl pos < stopSign Approach: prove that control software achieves a specification with respect to a model of the physical system.

  12. Model-Based Verification Reinforcement Learning  φ  Benefits: ● Strong safety guarantees ● Automated analysis

  13. Model-Based Verification Reinforcement Learning  φ  Benefits: ● Strong safety guarantees ● Automated analysis  Drawbacks: ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful”

  14. Model-Based Verification Reinforcement Learning  φ  Benefits: ● Strong safety guarantees ● Automated analysis  Drawbacks: ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful” ● Assumes accurate model

  15. Model-Based Verification Reinforcement Learning  φ  Act / Observe  Benefits: ● Strong safety guarantees ● Automated analysis  Drawbacks: ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful” ● Assumes accurate model

  16. Model-Based Verification Reinforcement Learning  φ  Act / Observe
      Model-based verification benefits: ● Strong safety guarantees ● Automated analysis
      Reinforcement learning benefits: ● No need for complete model ● Optimal (effective) policies
      Model-based verification drawbacks: ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful” ● Assumes accurate model

  17. Model-Based Verification Reinforcement Learning  φ  Act / Observe
      Model-based verification benefits: ● Strong safety guarantees ● Automated analysis
      Reinforcement learning benefits: ● No need for complete model ● Optimal (effective) policies
      Model-based verification drawbacks: ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful” ● Assumes accurate model
      Reinforcement learning drawbacks: ● No strong safety guarantees ● Proofs are obtained and checked by hand ● Formal proofs = decades-long proof development

  18. Model-Based Verification Reinforcement Learning  φ  Act / Observe  Goal: Provably correct reinforcement learning
      Model-based verification benefits: ● Strong safety guarantees ● Computational aids (ATP)
      Reinforcement learning benefits: ● No need for complete model ● Optimal (effective) policies
      Model-based verification drawbacks: ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful” ● Assumes accurate model
      Reinforcement learning drawbacks: ● No strong safety guarantees ● Proofs are obtained and checked by hand ● Formal proofs = decades-long proof development

  19. Model-Based Verification Reinforcement Learning  φ  Act / Observe  Goal: Provably correct reinforcement learning  1. Learn Safety  2. Learn a Safe Policy  3. Justify claims of safety
      Model-based verification benefits: ● Strong safety guarantees ● Computational aids (ATP)
      Reinforcement learning benefits: ● No need for complete model ● Optimal (effective) policies
      Model-based verification drawbacks: ● Control policies are typically non-deterministic: answers “what is safe”, not “what is useful” ● Assumes accurate model
      Reinforcement learning drawbacks: ● No strong safety guarantees ● Proofs are obtained and checked by hand ● Formal proofs = decades-long proof development

  20. Part I: Differential Dynamic Logic Trustworthy Proofs for Hybrid Systems

  21. Hybrid Programs
      x := t      discrete assignment: changes the state x=x0, y=y0, z=z0, ... to x=t, y=y0, z=z0, ... (only x is updated)

  22. Hybrid Programs
      x := t      discrete assignment: changes x=x0 to x=t, leaving y=y0, z=z0, ... unchanged
      a;b         sequential composition: first run a, then run b

  23. Hybrid Programs
      x := t      discrete assignment: changes x=x0 to x=t, leaving y=y0, z=z0, ... unchanged
      a;b         sequential composition: first run a, then run b
      ?P          test: if P is true, no change; if P is false, terminate

  24. Hybrid Programs
      x := t      discrete assignment: changes x=x0 to x=t, leaving y=y0, z=z0, ... unchanged
      a;b         sequential composition: first run a, then run b
      ?P          test: if P is true, no change; if P is false, terminate
      a*          nondeterministic repetition: run a any number of times (a; a; ... a)

  25. Hybrid Programs
      x := t      discrete assignment: changes x=x0 to x=t, leaving y=y0, z=z0, ... unchanged
      a;b         sequential composition: first run a, then run b
      ?P          test: if P is true, no change; if P is false, terminate
      a*          nondeterministic repetition: run a any number of times (a; a; ... a)
      a ∪ b       nondeterministic choice: run either a or b

  26. Hybrid Programs
      x := t      discrete assignment: changes x=x0 to x=t, leaving y=y0, z=z0, ... unchanged
      a;b         sequential composition: first run a, then run b
      ?P          test: if P is true, no change; if P is false, terminate
      a*          nondeterministic repetition: run a any number of times (a; a; ... a)
      a ∪ b       nondeterministic choice: run either a or b
      x'=f(x)     continuous evolution: starting from x=x0=F(0), follow the solution F of the differential equation x'=f(x) for any duration T, ending at x=F(T)
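
Purely as an illustration of these run semantics (not the talk's tooling; KeYmaera X reasons about hybrid programs symbolically), the hedged Python sketch below executes one random trace of a toy hybrid program shaped like the loop on the next slide, combining choice, repetition, and an Euler-approximated ODE. All names and concrete numbers are assumptions made for this example.

    import random

    # One simulated trace of a toy hybrid program:
    #   { {accel ∪ brake}; t := 0; {pos' = vel, vel' = accel & vel >= 0 & t <= T} }*
    # Nondeterminism (the choice and the number of loop iterations) is resolved
    # randomly, and the ODE is approximated by Euler steps, so this is only an
    # illustration of the trace semantics, not a verification tool.

    def follow_ode(pos, vel, accel, T, dt=0.01):
        """Continuous evolution pos' = vel, vel' = accel within vel >= 0 & t <= T."""
        t = 0.0
        while t < T and vel >= 0:
            pos += vel * dt
            vel += accel * dt
            t += dt
        return pos, max(vel, 0.0)

    def one_run(pos=0.0, vel=5.0, A=2.0, B=4.0, T=0.5):
        for _ in range(random.randint(0, 20)):          # a* : repeat any number of times
            accel = A if random.random() < 0.5 else -B  # accel ∪ brake : nondeterministic choice
            pos, vel = follow_ode(pos, vel, accel, T)   # t := 0; {pos'=vel, vel'=accel, ...}
        return pos, vel

    print(one_run())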

  27. Approaching a Stopped Car Own Car Stopped Car Is this property true? [ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel ≥ 0 & t ≤ T} }* ](pos <= stoppedCarPos)

  28. Approaching a Stopped Car Own Car Stopped Car Assuming we only accelerate when it’s safe to do so , is this property true? [ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel ≥ 0 & t ≤ T} }* ](pos <= stoppedCarPos)

  29. Approaching a Stopped Car  Own Car  Stopped Car  safeDistance(pos,vel,stoppedCarPos,B)  It is, if we also assume the system is safe initially: safeDistance(pos,vel,stoppedCarPos,B) → [ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel ≥ 0 & t ≤ T} }* ](pos <= stoppedCarPos)

  30. Approaching a Stopped Car Own Car Stopped Car safeDistance(pos,vel,stoppedCarPos,B) safeDistance(pos,vel,stoppedCarPos,B) → [ { {accel ∪ brake}; t:=0; {pos’=vel,vel’=accel,t’=1 & vel ≥ 0 & t ≤ T} }* ](pos <= stoppedCarPos)
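
The slides keep safeDistance abstract. One standard instantiation in this setting (an assumption here, since the formula is not spelled out on the slide) is that the remaining distance to the stopped car covers the braking distance at braking rate B, as in this small Python sketch:

    def safe_distance(pos, vel, stopped_car_pos, B):
        """Candidate meaning of safeDistance(pos, vel, stoppedCarPos, B): the car can
        still stop before the obstacle when braking at rate B > 0. This is an
        illustrative guess; the proved condition must also account for the control
        cycle time T if acceleration is allowed before braking."""
        return pos + vel * vel / (2 * B) <= stopped_car_pos

    # At 20 m/s with braking rate 5 m/s^2 the braking distance is 40 m:
    print(safe_distance(pos=0.0, vel=20.0, stopped_car_pos=45.0, B=5.0))  # True
    print(safe_distance(pos=0.0, vel=20.0, stopped_car_pos=35.0, B=5.0))  # False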

  31. The Fundamental Question  Proofs give strong mathematical evidence of safety. Why would our program not work if we have a proof?

  32. The Fundamental Question  Why would our program not work if we have a proof? 1. Was the proof correct?

  33. The Fundamental Question  Why would our program not work if we have a proof? 1. Was the proof correct? 2. Was the model accurate enough? [Figure: model ≠ reality]

  34. The Fundamental Question  Why would our program not work if we have a proof?
      1. Was the proof correct? KeYmaera X
      2. Was the model accurate enough?
      [Figure: KeYmaera X architecture: ODE & controls tooling and clever Bellerophon proof programs all reduce to the small set of KeYmaera X axioms.
       DI axiom: [{x'=f&Q}]P ↔ ([?Q]P ← (Q → [{x'=f&Q}](P)'))
       dI tactic example: [v'=r_p*v^2-g, t'=1] v ≥ v_0 - g*t, with side derivation [v':=r_p*v^2-g][t':=1] v' ≥ -g*t', i.e. r_p*v^2-g ≥ -g, which follows from H → r_p ≥ 0, where H = (r_p ≥ 0 & r_a ≥ 0 & g > 0 & ...); qed]

  35. The Fundamental Question  Why would our program not work if we have a proof?
      1. Was the proof correct? KeYmaera X
      2. Was the model accurate enough? Safe RL
      [Figure: the same KeYmaera X architecture diagram as on the previous slide: ODE & controls tooling and Bellerophon proof programs reducing to the KeYmaera X axioms, illustrated with the DI axiom and a dI tactic example]

  36. Part II: Justified Speculative Control  Safe reinforcement learning in partially modeled environments (AAAI 2018)  [Figure: model ≠ reality]

  37. Model-Based Verification  Accurate, analyzable models often exist!
      {
        {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};
        {pos’ = vel, vel’ = acc}
      }*

  38. Model-Based Verification  Accurate, analyzable models often exist!
      {
        {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};     (discrete control)
        {pos’ = vel, vel’ = acc}                           (continuous motion)
      }*

  39. Model-Based Verification  Accurate, analyzable models often exist!
      {
        {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};     (discrete, non-deterministic control)
        {pos’ = vel, vel’ = acc}                           (continuous motion)
      }*

  40. Model-Based Verification  Accurate, analyzable models often exist!
      init → [
        {
          {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn};
          {pos’ = vel, vel’ = acc}
        }*
      ] pos < stopSign
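
Part II pairs this verified, non-deterministic model with a learned policy: the learner proposes an action, a runtime monitor derived from the model's guards (?safeAccel, ?safeTurn) admits it only if the safety proof still applies, and otherwise a provably safe fallback (braking) is taken. The Python sketch below is a minimal illustration of that sandboxing idea under made-up dynamics, parameters, and function names; it is not the paper's implementation, and the ?safeTurn branch is elided.

    # Sandboxed ("justified speculative") action selection: admit the learned
    # policy's action only if a monitor derived from the verified guards accepts
    # it; otherwise fall back to the action proved safe (braking).
    # All names, parameters, and the 1-D dynamics are illustrative assumptions.

    def safe_accel(pos, vel, stop_sign, A, B, T):
        """?safeAccel: after one control cycle of accelerating at A, braking at B
        must still stop the car before the stop sign."""
        pos_next = pos + vel * T + 0.5 * A * T * T
        vel_next = vel + A * T
        return pos_next + vel_next ** 2 / (2 * B) <= stop_sign

    def sandboxed_action(learned_policy, pos, vel, stop_sign, A=2.0, B=4.0, T=0.5):
        proposed = learned_policy(pos, vel)              # e.g. argmax of a learned Q-table
        if proposed == "brake":
            return proposed                              # braking is always admitted
        if proposed == "accel" and safe_accel(pos, vel, stop_sign, A, B, T):
            return proposed                              # monitor accepts: follow the learner
        return "brake"                                   # otherwise: provably safe fallback

    # An aggressive toy policy that always wants to accelerate:
    always_accel = lambda pos, vel: "accel"
    print(sandboxed_action(always_accel, pos=0.0, vel=20.0, stop_sign=100.0))  # accel
    print(sandboxed_action(always_accel, pos=0.0, vel=20.0, stop_sign=45.0))   # brake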
