SLIDE 1 Safe Reinforcement Learning via Formal Methods
Nathan Fulton and André Platzer
Carnegie Mellon University
SLIDE 2 Safe Reinforcement Learning via Formal Methods
Nathan Fulton and André Platzer
Carnegie Mellon University
SLIDE 3
Safety-Critical Systems
"How can we provide people with cyber-physical systems they can bet their lives on?" - Jeannette Wing
SLIDE 4
Autonomous Safety-Critical Systems
How can we provide people with autonomous cyber-physical systems they can bet their lives on?
SLIDE 5
Model-Based Verification
φ
Reinforcement Learning
SLIDE 6 Model-Based Verification
pos < stopSign
Reinforcement Learning
SLIDE 7 Model-Based Verification
pos < stopSign
Reinforcement Learning
ctrl
SLIDE 8 Approach: prove that control software achieves a specification with respect to a model of the physical system.
Model-Based Verification
pos < stopSign
Reinforcement Learning
ctrl
SLIDE 9 Approach: prove that control software achieves a specification with respect to a model of the physical system.
Model-Based Verification
pos < stopSign
Reinforcement Learning
ctrl
SLIDE 10 Benefits:
- Strong safety guarantees
- Automated analysis
Model-Based Verification
φ
Reinforcement Learning
SLIDE 11 Benefits:
- Strong safety guarantees
- Automated analysis
Drawbacks:
- Control policies are typically
non-deterministic: they answer “what is safe”, not “what is useful”
Model-Based Verification
φ
Reinforcement Learning
SLIDE 12 Benefits:
- Strong safety guarantees
- Automated analysis
Drawbacks:
- Control policies are typically
non-deterministic: they answer “what is safe”, not “what is useful”
Model-Based Verification
φ
Reinforcement Learning
SLIDE 13 Benefits:
- Strong safety guarantees
- Automated analysis
Drawbacks:
- Control policies are typically
non-deterministic: they answer “what is safe”, not “what is useful”
Model-Based Verification
φ
Reinforcement Learning
Observe Act
SLIDE 14 Benefits:
- Strong safety guarantees
- Automated analysis
Drawbacks:
- Control policies are typically
non-deterministic: they answer “what is safe”, not “what is useful”
Model-Based Verification
φ
Reinforcement Learning
Observe Act
Benefits:
- No need for complete model
- Optimal (effective) policies
SLIDE 15 Benefits:
- Strong safety guarantees
- Automated analysis
Drawbacks:
- Control policies are typically
non-deterministic: they answer “what is safe”, not “what is useful”
Model-Based Verification
φ
Reinforcement Learning
Observe Act
Benefits:
- No need for complete model
- Optimal (effective) policies
Drawbacks:
- No strong safety guarantees
- Proofs are obtained and
checked by hand
- Formal proofs = decades-long
proof development
SLIDE 16 Benefits:
- Strong safety guarantees
- Computational aids (ATP)
Drawbacks:
- Control policies are typically
non-deterministic: they answer “what is safe”, not “what is useful”
Model-Based Verification
φ
Reinforcement Learning
Observe Act
Benefits:
- No need for complete model
- Optimal (effective) policies
Drawbacks:
- No strong safety guarantees
- Proofs are obtained and
checked by hand
- Formal proofs = decades-long
proof development
Goal: Provably correct reinforcement learning
SLIDE 17 Benefits:
- Strong safety guarantees
- Computational aids (ATP)
Drawbacks:
- Control policies are typically
non-deterministic: they answer “what is safe”, not “what is useful”
Model-Based Verification
φ
Reinforcement Learning
Observe Act
Benefits:
- No need for complete model
- Optimal (effective) policies
Drawbacks:
- No strong safety guarantees
- Proofs are obtained and
checked by hand
- Formal proofs = decades-long
proof development
Goal: Provably correct reinforcement learning
- 1. Learn Safety
- 2. Learn a Safe Policy
- 3. Justify claims of safety
SLIDE 18
Model-Based Verification
Accurate, analyzable models often exist!
{ {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*
SLIDE 19
Model-Based Verification
Accurate, analyzable models often exist!
{ {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*
discrete control
continuous motion
SLIDE 20
Model-Based Verification
Accurate, analyzable models often exist!
{ {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*
discrete, non-deterministic control
continuous motion
SLIDE 21
Model-Based Verification
Accurate, analyzable models often exist!
init → [{ {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*](pos < stopSign)
SLIDE 22
Model-Based Verification
Accurate, analyzable models often exist!
Formal verification gives strong safety guarantees
init → [{ {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*](pos < stopSign)
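For readers reconstructing the specification, here is the same dL formula typeset in LaTeX. Notation follows the slides: ∪ is nondeterministic choice, ?Q is a test, α* repeats α, and the box modality [α]P asserts that P holds after every run of α.

```latex
\[
\mathit{init} \rightarrow
\big[ \big\{ \{ ?\mathit{safeAccel};\ \mathit{accel}
  \;\cup\; \mathit{brake}
  \;\cup\; ?\mathit{safeTurn};\ \mathit{turn} \};\
  \{ \mathit{pos}' = \mathit{vel},\ \mathit{vel}' = \mathit{acc} \}
\big\}^{*} \big]\ \mathit{pos} < \mathit{stopSign}
\]
```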
SLIDE 23 Model-Based Verification
Accurate, analyzable models often exist!
Formal verification gives strong safety guarantees
=
- Computer-checked proofs of the safety specification.
SLIDE 24 Model-Based Verification
Accurate, analyzable models often exist!
Formal verification gives strong safety guarantees
=
- Computer-checked proofs of the safety specification
- Formal proofs mapping the model to runtime monitors
SLIDE 25
Model-Based Verification Isn’t Enough
Perfect, analyzable models don’t exist!
SLIDE 26
Model-Based Verification Isn’t Enough
Perfect, analyzable models don’t exist!
{ {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn}; {pos’ = vel, vel’ = acc} }*
How to implement?
Only accurate sometimes
SLIDE 27
Model-Based Verification Isn’t Enough
Perfect, analyzable models don’t exist!
{ {?safeAccel; accel ∪ brake ∪ ?safeTurn; turn}; {dx’=w*y, dy’=-w*x, ...} }*
How to implement?
Only accurate sometimes
SLIDE 28 Our Contribution
Justified Speculative Control is an approach toward provably safe reinforcement learning that:
- 1. learns to resolve non-determinism without
sacrificing formal safety results
SLIDE 29 Our Contribution
Justified Speculative Control is an approach toward provably safe reinforcement learning that:
- 1. learns to resolve non-determinism without
sacrificing formal safety results
- 2. allows and directs speculation whenever
model mismatches occur
SLIDE 30 Learning to Resolve Non-determinism
Observe & compute reward
Act
SLIDE 31 Learning to Resolve Non-determinism
Observe & compute reward
accel ∪ brake ∪ turn
SLIDE 32 Learning to Resolve Non-determinism
Observe & compute reward
{accel,brake,turn}
SLIDE 33 Learning to Resolve Non-determinism
⇨
Observe & compute reward
Policy
{accel,brake,turn}
SLIDE 34 Learning to Resolve Non-determinism
⇨
Observe & compute reward
(safe?) Policy
{accel,brake,turn}
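To make the learning loop on these slides concrete, here is a minimal tabular Q-learning sketch over the deck’s three-action set. The environment interface (`env.reset`, `env.step`) and all hyperparameters are illustrative assumptions, not the authors’ implementation.

```python
import random
from collections import defaultdict

ACTIONS = ["accel", "brake", "turn"]  # the action set from the slides

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Learn a deterministic policy by resolving the model's non-determinism."""
    Q = defaultdict(float)  # maps (state, action) to an estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Act: epsilon-greedy over the three slide actions.
            if random.random() < eps:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            # Observe & compute reward.
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The learned policy is then state ↦ argmaxₐ Q[(state, a)], which is exactly the “(safe?) Policy” the next slides interrogate.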
SLIDE 35 Learning to Safely Resolve Non-determinism
⇨
Observe & compute reward
(safe?) Policy
Safety Monitor
SLIDE 36 Learning to Safely Resolve Non-determinism
⇨
Observe & compute reward
(safe?) Policy
Safety Monitor
≠ “Trust Me”
SLIDE 37 Learning to Safely Resolve Non-determinism
⇨
Observe & compute reward
(safe?) Policy
φ
Use a theorem prover to prove: (init→[{{accel∪brake};ODEs}*](safe)) ↔ φ
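The “Safety Monitor” box can be read as a runtime guard: the prover-derived condition φ is checked before every action, so safety never reduces to “trust me”. A minimal sketch, where the hypothetical `controller_monitor` stands in for the formula φ exported from the proof.

```python
def monitored_action(state, proposed_action, controller_monitor, fallback="brake"):
    """Guard the learner's proposed action with the prover-derived condition phi.

    controller_monitor(state, action) -> bool is assumed to implement the
    condition phi that the theorem prover showed equivalent to the dL safety
    property; any action it rejects is replaced by a verified-safe fallback.
    """
    if controller_monitor(state, proposed_action):
        return proposed_action  # provably stays inside the model's safe envelope
    return fallback             # e.g. brake, the conservative default
```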
SLIDE 38 Learning to Safely Resolve Non-determinism
⇨
Observe & compute reward
(safe?) Policy
φ
Use a theorem prover to prove: (init→[{{accel∪brake};ODEs}*](safe)) ↔ φ
SLIDE 39 (safe?) Policy
Learning to Safely Resolve Non-determinism
⇨
Observe & compute reward
φ
Main Theorem: If the ODEs are accurate, then
- our formal proofs transfer from the
non-deterministic model to the learned (deterministic) policy
Use a theorem prover to prove: (init→[{{accel∪brake};ODEs}*](safe)) ↔ φ
SLIDE 40 (safe?) Policy
Learning to Safely Resolve Non-determinism
⇨
Observe & compute reward
φ
Main Theorem: If the ODEs are accurate, then
- our formal proofs transfer from the
non-deterministic model to the learned (deterministic) policy via the model monitor.
Use a theorem prover to prove: (init→[{{accel∪brake};ODEs}*](safe)) ↔ φ
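The “model monitor” in the theorem compares each observed transition against what the ODEs {pos’ = vel, vel’ = acc} predict. A rough sketch assuming piecewise-constant acceleration between samples; the state layout and tolerance `eps` are illustrative assumptions.

```python
def model_monitor(prev, curr, dt, eps=1e-2):
    """Check one observed transition against pos' = vel, vel' = acc.

    prev and curr are (pos, vel, acc) samples taken dt seconds apart; the
    monitor accepts when the observation matches the closed-form ODE
    solution up to the tolerance eps.
    """
    pos0, vel0, acc0 = prev
    pos1, vel1, _ = curr
    pred_pos = pos0 + vel0 * dt + 0.5 * acc0 * dt * dt  # integrate pos' = vel
    pred_vel = vel0 + acc0 * dt                          # integrate vel' = acc
    return abs(pos1 - pred_pos) <= eps and abs(vel1 - pred_vel) <= eps
```

While this check passes, the main theorem says the safety proof for the non-deterministic model carries over to the learned policy.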
SLIDE 41 What about the physical model?
⇨
Observe & compute reward
φ
Use a theorem prover to prove: (init→[{{accel∪brake};ODEs}*](safe)) ↔ φ
{pos’=vel,vel’=acc} ≠ (safe?) Policy
SLIDE 42 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
SLIDE 43 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
SLIDE 44 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
SLIDE 45 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
Model is inaccurate.
SLIDE 46 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
Model is inaccurate. Obstacle!
SLIDE 47 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Expected
Reality
SLIDE 48 Speculation is Justified
Observe & compute reward {brake, accel, turn}
Expected (safe)
Reality (crash!)
SLIDE 49 Leveraging Verification Results to Learn Better
Observe & compute reward {brake, accel, turn}
Use a real-valued version of the model monitor as a reward signal
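A sketch of this slide’s idea: turn the boolean model monitor into a real-valued margin and subtract it from the task reward, so deviation from the verified model is penalized and the learner is steered back toward modeled state space. The state layout and weighting are illustrative assumptions.

```python
def monitor_margin(prev, curr, dt):
    """Real-valued model monitor: 0.0 when reality matches the ODE
    prediction exactly, growing with the size of the deviation."""
    pos0, vel0, acc0 = prev
    pos1, vel1, _ = curr
    pred_pos = pos0 + vel0 * dt + 0.5 * acc0 * dt * dt
    pred_vel = vel0 + acc0 * dt
    return abs(pos1 - pred_pos) + abs(vel1 - pred_vel)

def shaped_reward(task_reward, prev, curr, dt, weight=1.0):
    """Augment the task reward with a push back toward state space
    where the safety proof applies."""
    return task_reward - weight * monitor_margin(prev, curr, dt)
```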
SLIDE 50 Conclusion
Justified Speculative Control provides the best of logic and learning:
⇨
Policy
φ
SLIDE 51 Conclusion
Justified Speculative Control provides the best of logic and learning:
- Formally model the control system (control + physics)
⇨
Policy
φ
SLIDE 52 Conclusion
Justified Speculative Control provides the best of logic and learning:
- Formally model the control system (control + physics)
- Learn how to resolve non-determinism in models.
⇨
Policy
φ
SLIDE 53 Conclusion
Justified Speculative Control provides the best of logic and learning:
- Formally model the control system (control + physics)
- Learn how to resolve non-determinism in models.
- Leverage theorem proving to transfer proofs to learned policies.
⇨
Policy
φ
SLIDE 54 Conclusion
Justified Speculative Control provides the best of logic and learning:
- Formally model the control system (control + physics)
- Learn how to resolve non-determinism in models.
- Leverage theorem proving to transfer proofs to learned policies.
- Unsafe speculation is justified when model deviates from reality
⇨
Policy
φ
SLIDE 55 Conclusion
Justified Speculative Control provides the best of logic and learning:
- Formally model the control system (control + physics)
- Learn how to resolve non-determinism in models
- Leverage theorem proving to transfer proofs to learned policies
- Unsafe speculation is justified when model deviates from reality,
but verification results can still be helpful!
⇨
Policy
φ
SLIDE 56 Conclusion
Justified Speculative Control provides the best of logic and learning:
- Formally model the control system (control + physics)
- Learn how to resolve non-determinism in models
- Leverage theorem proving to transfer proofs to learned policies
- Unsafe speculation is justified when model deviates from reality,
but verification results can still be helpful!
⇨
Policy
φ
SLIDE 57
SLIDE 58
SLIDE 59
SLIDE 60
Justified Speculative Control
≈
Learn over a constrained action space
≠
SLIDE 61
Justified Speculative Control
≈
Learn over a constrained action space
≠
SLIDE 62 Safe Reinforcement Learning?
⇨
Observe & compute reward
unverified Policy
Policy deviates from model:
- 1. Policy is deterministic, verification result is
set-valued.
{accel,brake,turn}
SLIDE 63 Some Actions Aren’t Always Safe
⇨
Observe & compute reward
unverified Policy
Policy deviates from model:
- 1. Policy is deterministic, verification result is
set-valued. {accel,brake,turn} ≠ ?safeAccel; accel ∪ brake
SLIDE 64 Some Actions Aren’t Always Safe
⇨
Observe & compute reward
unverified Policy
Policy deviates from model:
- 1. Policy is deterministic, verification result is
set-valued. {accel,brake,turn} ≠ ?safeAccel; accel ∪ brake
SLIDE 65 Safe Reinforcement Learning?
⇨
unverified Policy
Policy deviates from model:
- 1. Policy is deterministic, verification result is
set-valued.
Observe & compute reward
?safeAccel; accel ∪ brake ≠
SLIDE 66 Physical Models are Approximations
Policy deviates from model:
- 1. Policy is deterministic, verification result is
set-valued.
- 2. Environment may not be accurately modeled.
⇨
Observe & compute reward
unverified Policy
{accel,brake,turn}
≠ pos’=vel, vel’=acc
SLIDE 67
Safely resolving non-determinism
unverified Policy
?safeAccel; accel ∪ brake ≠
SLIDE 68 Sandboxing Reinforcement Learning
≈
“Accurate modulo determinism”
init → [{ {accel ∪ brake}; t:=0; continuousMotion }*](safe)
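A minimal sketch of the sandbox: during learning, the agent may only choose among actions the monitor admits in the current state, so exploration itself stays inside the verified envelope. `controller_monitor` and the brake fallback are assumptions carried over from the earlier sketches.

```python
import random

def sandboxed_action(state, Q, actions, controller_monitor, eps=0.1, fallback="brake"):
    """Epsilon-greedy action selection restricted to monitor-approved actions."""
    admissible = [a for a in actions if controller_monitor(state, a)]
    if not admissible:
        # In modeled states the proof guarantees some safe action exists;
        # braking is used here as the conservative default.
        admissible = [fallback]
    if random.random() < eps:
        return random.choice(admissible)                 # explore inside the sandbox
    return max(admissible, key=lambda a: Q[(state, a)])  # exploit inside the sandbox
```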
SLIDE 69
Sandboxing Reinforcement Learning
≈
Learn over a constrained action space
“Accurate modulo determinism”
SLIDE 70
Sandboxing Reinforcement Learning
≈
Learn over a constrained action space
“Accurate modulo determinism”
SLIDE 71 Sandboxing Reinforcement Learning
Theorem: If the physical model is accurate then verification results are preserved during learning and by learned policies.
⇨
Policy
Constrained Actions Observe & compute reward
SLIDE 72 Sandboxing Reinforcement Learning
Theorem: If the physical model is accurate then verification results are preserved during learning and by learned policies.
⇨
Observe & compute reward
Policy
Constrained Actions
init → [{ {accel ∪ brake}; t:=0; continuousMotion }*](safe)
SLIDE 73 Sandboxing Reinforcement Learning
Theorem: If the physical model is accurate then verification results are preserved during learning and by learned policies.
⇨
Observe & compute reward
Policy
Constrained Actions
init → [{ {accel ∪ brake}; t:=0; continuousMotion }*](safe)
SLIDE 74 Sandboxing Safe Reinforcement Learning
Theorem: If the physical model is accurate then verification results are preserved by learned policies.
⇨
Observe & compute reward
Policy
Constrained Actions
init → [{ {accel ∪ brake}; t:=0; continuousMotion }*](safe)
SLIDE 75 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
SLIDE 76 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
SLIDE 77 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
SLIDE 78 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
Model is inaccurate.
SLIDE 79 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Model is accurate.
Model is inaccurate. Obstacle!
SLIDE 80 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Expected
Reality
SLIDE 81 What About the Physical Model?
Observe & compute reward {brake, accel, turn}
Expected (safe)
Reality (crash!)
SLIDE 82
Justified Speculative Control
≈
Learn over a constrained action space
≠
SLIDE 83
Justified Speculative Control
≈
Learn over a constrained action space
≠
SLIDE 84 Justified Speculative Control
Some Questions:
- 1. How do we know when we’re in unmodeled state space?
- 2. What do we do when we are in unmodeled state space?
Learn over a constrained action space
Learn
SLIDE 85 Justified Speculative Control
Some Questions:
- 1. How do we know when we’re in unmodeled state space?
- 2. What do we do when we are in unmodeled state space?
Learn over a constrained action space
Learn
SLIDE 86 Justified Speculative Control
Theorem: Verification results are preserved outside of the red (unmodeled) region
☒ How do we know when we’re in unmodeled state space?
☐ What do we do when we are in unmodeled state space?
Learn over a constrained action space
Learn
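Putting the two questions together, justified speculative control can be sketched as: use the model monitor to detect unmodeled state space, learn over the constrained action space while the model holds, and speculate over the full action set only once the model demonstrably fails. All names below are carried over from the earlier hypothetical sketches.

```python
import random

def jsc_action(state, prev_obs, curr_obs, dt, Q, actions,
               controller_monitor, model_monitor, eps=0.1):
    """Constrained learning in modeled state space; justified speculation outside."""
    if model_monitor(prev_obs, curr_obs, dt):
        # Modeled state space: the proof's premise holds, so verification
        # results transfer and only monitor-approved actions are allowed.
        candidates = [a for a in actions if controller_monitor(state, a)] or ["brake"]
    else:
        # Unmodeled state space: the premise of the safety proof has failed,
        # so speculating over the full action set is justified.
        candidates = list(actions)
    if random.random() < eps:
        return random.choice(candidates)
    return max(candidates, key=lambda a: Q[(state, a)])
```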
SLIDE 87
What do we do in unmodeled state-space?
SLIDE 88
What do we do in unmodeled state-space?
SLIDE 89
What do we do in unmodeled state-space?
SLIDE 90
What do we do in unmodeled state-space?
Get from here...
SLIDE 91
What do we do in unmodeled state-space?
Get from here...
...to here
SLIDE 92 Leveraging Formal Methods during Learning
Leader
Own Car
SLIDE 93 Leveraging Formal Methods during Learning
Perturbation | “Don’t hit the leader” | “Get back to modeled state space”
5%           | 3                      | 2
25%          | 18                     | 16
50%          | 41                     | 24
Leader
Own Car
SLIDE 94 Conclusion
KeYmaera X + Justified Speculative Control:
- 1. Transfers formal verification results for
non-deterministic control policies to policies obtained via a generic reinforcement learning algorithm.
SLIDE 95 Conclusion
KeYmaera X + Justified Speculative Control:
- 1. Transfers formal verification results for
non-deterministic control policies to policies obtained via a generic reinforcement learning algorithm.
- 2. Leverages insights obtained during verification to direct
future learning.
≠
SLIDE 96 init → [{ {?safeAccel; accel ∪ brake}; t:=0; {pos’=vel,vel’=acc} }*](pos < stopSign)
Model-Based Verification
pos < stopSign
Reinforcement Learning
ctrl