Cautious Adaptation For RL in Safety-Critical Settings - PowerPoint PPT Presentation




SLIDE 1

Cautious Adaptation For RL in Safety-Critical Settings

International Conference on Machine Learning 2020

Jesse Zhang¹, Brian Cheung¹, Chelsea Finn², Sergey Levine¹, Dinesh Jayaraman³


SLIDE 2

Outline

  • Short overview (4 minutes)
  • In-depth talk (11 minutes)


SLIDE 3

Introduction

  • Real-world RL is hazardous in safety-critical settings
  • Hard to reset from real-life failures
  • How to adapt to unseen environments safely?


SLIDE 4

Motivation

  • How do humans adapt?


SLIDE 5

Motivation


  • Safety-Critical Adaptation (SCA):
    ○ Pretraining: sandbox environments
    ○ Adaptation: safety-critical target environment

SLIDE 6

Methodology

Transfer risk knowledge from prior experience

  • Safety-Critical Adaptation (SCA)
  • Cautious Adaptation in RL (CARL)


SLIDE 7

Cautious Adaptation in RL (CARL)

  • Approach (model-based):
    ○ Pretraining: probabilistic models capture state-transition uncertainty¹
    ○ Adaptation: utilize the uncertainty to safely adapt to a new environment (planning cost-function modification)

¹ PETS (Chua et al., 2018)

SLIDE 8

Environments Tested


  • Cartpole (varying pole lengths)
  • Duckietown¹ (varying car width)
  • Half Cheetah (varying disabled joint)

¹ Duckietown (Chevalier-Boisvert et al., 2018)

SLIDE 9

Results (Cartpole)


SLIDE 10

Results (Duckietown Driving)


SLIDE 11

Results (Half-Cheetah)


SLIDE 12

Short Summary


  • Capture environment risk with prior experience
    ○ Probabilistic dynamics models
  • Plan with risk in mind for safety-critical adaptation
SLIDE 13

Outline

  • Discussion of related works
  • Detailed discussion of CARL methodology
  • Further analysis of results

    ○ Comparison to other methods
    ○ Average reward, number of catastrophic events


SLIDE 14

Related Work

  • Risk-Averse RL
    ○ Conditional Value at Risk: Rockafellar et al. (2000); Morimura et al. (2010); Borkar & Jain (2010); Chow & Ghavamzadeh (2014); Tamar et al. (2015); Chow et al. (2015); Rajeswaran et al. (2016)
  • Model-Based RL for Safety
    ○ Explicit safety constraints: Fisac et al. (2017); Sadigh & Kapoor (2017); Berkenkamp et al. (2017); Ostafew et al. (2016); Hakobyan et al. (2019); Hanssen & Foss (2015); Hewing et al. (2019); Aswani et al. (2013)
  • Capturing Uncertainty
    ○ Meta-learning: Nagabandi et al. (2018); Sæmundsson et al. (2018); Finn et al. (2017)

slide-15
SLIDE 15

Model-based RL Preliminaries: PETS

  • Ensemble of probabilistic dynamics models
  • Trajectory sampling for candidate action selection
  • The action sequence with the highest action score is executed
  • Action score for an action sequence A: the mean over the P predicted trajectories, score(A) = (1/P) Σᵢ Rᵢ, where Rᵢ is the reward of the i'th trajectory
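The PETS scoring scheme above can be sketched with a toy example. Everything here (the one-dimensional dynamics, the reward, the ensemble "members" as scalar parameters) is a hypothetical stand-in for the paper's learned neural-network ensemble, kept only to show the sample-trajectories-then-average structure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a learned ensemble: each member predicts
# next_state = state + action * theta + noise, with a different theta.
ENSEMBLE_THETAS = [0.9, 1.0, 1.1]

def sample_trajectory(state, actions, theta):
    """Roll one candidate action sequence through one probabilistic model.
    The reward (negative distance from the origin) is an illustrative choice."""
    total_reward = 0.0
    for a in actions:
        state = state + a * theta + rng.normal(0.0, 0.01)
        total_reward += -abs(state)
    return total_reward

def action_score(state, actions, num_particles=5):
    """PETS-style action score: mean reward R_i over P sampled
    trajectories, each propagated by a randomly chosen ensemble member."""
    rewards = [sample_trajectory(state, actions, rng.choice(ENSEMBLE_THETAS))
               for _ in range(num_particles)]
    return float(np.mean(rewards))

# Random shooting: the candidate sequence with the highest action
# score is the one that gets executed.
candidates = [rng.uniform(-1.0, 1.0, size=5) for _ in range(50)]
best_sequence = max(candidates, key=lambda A: action_score(0.0, A))
```

In the actual method the candidate sequences come from CEM-style optimization rather than pure random shooting, and the models are retrained as new data arrives.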

SLIDE 16

CARL for Safety Critical Adaptation


  • PETS: ensemble captures stochasticity in a single environment
    ○ CARL: captures uncertainty induced by variations across environments
  • Pretraining: train PETS
    ○ Randomly sample a domain
    ○ Dynamics model captures uncertainty about state transitions, reward, and risk
  • Adaptation: unseen domain
    ○ Risk-averse action selection
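The pretrain-then-adapt recipe above might be organized as in the following skeleton. The `plan` callback and its `risk_averse` flag are hypothetical placeholders for PETS planning and CARL's risk-averse scoring; rollout collection and model fitting are elided:

```python
import random

def carl_protocol(source_domains, target_domain, plan,
                  num_pretrain_episodes=10, seed=0):
    """Skeleton of CARL's two phases. Pretraining: each episode samples a
    random sandbox domain, so the dynamics ensemble sees variation across
    environments, not just noise within one. Adaptation: act in the unseen
    target domain with risk-averse action selection."""
    rng = random.Random(seed)
    for _ in range(num_pretrain_episodes):
        domain = rng.choice(source_domains)
        plan(domain, risk_averse=False)               # standard PETS scoring
    return plan(target_domain, risk_averse=True)      # cautious adaptation
```

The key design point is that the risk-averse behavior is switched on only at adaptation time; pretraining uses ordinary PETS so the ensemble can explore the sandbox domains freely.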

SLIDE 17

Risk Averse Action Selection

Case 1: Low Reward Risk-Aversion, CARL (Reward)

  • Select actions that minimize worst-case outcomes
  • Action score: the average reward over the worst δ percentile of predicted trajectories
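A minimal sketch of this scoring rule; the function name and the percentile computation via `np.percentile` are my own choices, and the paper's exact estimator may differ:

```python
import numpy as np

def carl_reward_score(trajectory_rewards, delta=10.0):
    """CARL (Reward)-style action score: instead of averaging reward over
    all P predicted trajectories (as in PETS), average only over the worst
    delta-percentile, so action sequences whose bad-case outcomes are poor
    receive a low score even if their mean reward is high."""
    rewards = np.asarray(trajectory_rewards, dtype=float)
    cutoff = np.percentile(rewards, delta)   # boundary of the worst delta%
    return float(rewards[rewards <= cutoff].mean())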
SLIDE 18

Risk Averse Action Selection

Case 2: Catastrophic State Risk-Aversion, CARL (State)

  • Avoid catastrophic states directly
  • Build a state safety cost g(A) over the predicted trajectories
  • Maximize: score(A) = (mean predicted reward) − λ · g(A)
  • This is a Lagrangian relaxation of a constraint minimizing the probability of encountering states in a catastrophic set
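The score being maximized can be sketched as follows; the catastrophe predicate, the λ value, and the particular form of g(A) as a fraction of predicted states are illustrative assumptions rather than the paper's exact definitions:

```python
import numpy as np

def carl_state_score(trajectory_rewards, predicted_states,
                     is_catastrophic, lam=100.0):
    """CARL (State)-style action score: mean predicted reward minus a
    Lagrangian penalty lam * g(A), where g(A) estimates how often the
    predicted states fall inside the catastrophe set."""
    g = float(np.mean([is_catastrophic(s) for s in predicted_states]))
    return float(np.mean(trajectory_rewards)) - lam * g
```

With a large λ, even a single predicted catastrophic state dominates the score, so the planner prefers action sequences it believes are safe even at some cost in reward.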

SLIDE 19

Risk Averse Action Selection

Case 2: Catastrophic State Risk-Aversion


SLIDE 20

CARL System Overview


SLIDE 21

Environments Tested


  • Cartpole (varying pole lengths)
  • Duckietown¹ (varying car width)
  • Half Cheetah (varying disabled joint)

¹ Duckietown (Chevalier-Boisvert et al., 2018)

SLIDE 22

Experiment Setup

  • MB + Finetune: PETS, finetuned on the test environment
  • RARL: Robust Adversarial Reinforcement Learning¹
  • PPO-MAML: Model-Agnostic Meta-Learning²
  • CARL (Reward): reward-based CARL
  • CARL (State): state-based CARL

¹ (Pinto et al., 2017)
² (Finn et al., 2017)

SLIDE 23

[Results figure; legend includes CARL (State)]

SLIDE 24

[Results figure; legend includes CARL (State)]

SLIDE 25

[Results figure; legend includes CARL (State)]

SLIDE 26

Summary

  • Safety-Critical Adaptation (SCA)
    ○ Train on sandbox environments, adapt to safety-critical environments
  • CARL (Reward) and CARL (State)
    ○ Capture source-environment uncertainty, perform risk-averse planning

Thank you!