Cautious Adaptation For RL in Safety-Critical Settings

Aug 16, 2023

SLIDE 1

Cautious Adaptation For RL in Safety-Critical Settings

International Conference on Machine Learning 2020

Jesse Zhang¹, Brian Cheung¹, Chelsea Finn², Sergey Levine¹, Dinesh Jayaraman³

[Affiliation logos: 1, 2, 3]

SLIDE 2

Outline

Short overview (4 Minutes)
In-depth talk (11 Minutes)

SLIDE 3

Introduction

Real-world RL is hazardous in safety-critical settings
Hard to reset from real-life failures
How to adapt to unseen environments safely?

SLIDE 4

Motivation

How do humans adapt?

SLIDE 5

Motivation

Safety-Critical Adaptation (SCA):

○ Pretraining: Sandbox environments
○ Adaptation: Safety-critical target environment

SLIDE 6

Methodology

Transfer risk knowledge from prior experience

Safety-Critical Adaptation (SCA)
Cautious Adaptation in RL (CARL)

SLIDE 7

Cautious Adaptation in RL (CARL)

Approach (Model-Based):

○ Pretraining: probabilistic models capture state transition uncertainty¹
○ Adaptation: utilize uncertainty to safely adapt to new environment (planning cost function modification)

¹ PETS (Chua et al., 2018)

SLIDE 8

Environments Tested

Cartpole (varying pole lengths)
Duckietown¹ (varying car width)
Half Cheetah (varying disabled joint)

¹ Duckietown (Chevalier-Boisvert et al., 2018)

SLIDE 9

Results (Cartpole)

SLIDE 10

Results (Duckietown Driving)

SLIDE 11

Results (Half-Cheetah)

SLIDE 12

Short Summary

Capture environment risk with prior experience

○ Probabilistic dynamics models

Plan with risk in mind for safety-critical adaptation

SLIDE 13

Outline

Discussion of related works
Detailed discussion of CARL methodology
Further analysis of results

○ Comparison to other methods
○ Average reward, # of catastrophic events

SLIDE 14

Related Work

Risk-Averse RL

○ Conditional Value at Risk

Rockafellar et al. (2000); Morimura et al. (2010); Borkar & Jain (2010); Chow & Ghavamzadeh (2014); Tamar et al. (2015); Chow et al. (2015); Rajeswaran et al. (2016)

Model Based RL for Safety

○ Explicit safety constraints

Fisac et al. (2017); Sadigh & Kapoor (2017); Berkenkamp et al. (2017); Ostafew et al. (2016); Hakobyan et al. (2019); Hanssen & Foss (2015); Hewing et al. (2019); Aswani et al. (2013)

Capturing Uncertainty

○ Meta-learning

Nagabandi et al. (2018); Sæmundsson et al. (2018); Finn et al. (2017)

SLIDE 15

Model-based RL Preliminaries: PETS

Ensemble of Probabilistic Dynamics Models
Trajectory sampling for candidate action selection
Sequence with highest action score is executed

[Equation: action score, computed over predicted trajectories with actions A, from the reward for the i'th trajectory]
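The selection rule on this slide can be sketched in a few lines. This is a minimal illustration, not the PETS implementation: it assumes the action score is the plain average of predicted returns, it scores a fixed list of candidate sequences instead of running an optimizer, and the toy models, `reward_fn`, and all function names are hypothetical.

```python
import random

def action_score(rewards):
    """Average predicted return over the sampled trajectories (assumed form)."""
    return sum(rewards) / len(rewards)

def rollout_return(model, state, actions, reward_fn):
    """Roll one dynamics model forward and sum the predicted rewards."""
    total = 0.0
    for a in actions:
        state = model(state, a)
        total += reward_fn(state, a)
    return total

def select_actions(ensemble, state, candidates, reward_fn, n_traj=20, rng=None):
    """Return the candidate action sequence with the highest action score."""
    rng = rng or random.Random(0)
    best_seq, best_score = None, float("-inf")
    for actions in candidates:
        # Each sampled trajectory uses a randomly chosen ensemble member.
        returns = [
            rollout_return(rng.choice(ensemble), state, actions, reward_fn)
            for _ in range(n_traj)
        ]
        score = action_score(returns)
        if score > best_score:
            best_seq, best_score = actions, score
    return best_seq, best_score

# Toy 1-D system (hypothetical): members disagree on the actuator gain k.
ensemble = [lambda s, a, k=k: s + k * a for k in (0.8, 1.0, 1.2)]
reward_fn = lambda s, a: -abs(s - 1.0)  # reward for staying near s = 1
candidates = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
best_seq, best_score = select_actions(ensemble, 0.0, candidates, reward_fn)
```

In this toy setup the moderate sequence `(0.5, 0.5)` is selected, because it reaches the target without overshooting under every ensemble member.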

SLIDE 16

CARL for Safety Critical Adaptation

PETS: Ensemble captures stochasticity in single environment

○ CARL: Captures uncertainty induced by variations across environments

Pretraining: Train PETS

○ Randomly sample domain
○ Dynamics model captures uncertainty about state transitions, reward, and risk

Adaptation: Unseen domain

○ Risk averse action selection
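A toy sketch of the pretraining idea: sample a domain per episode, pool the transitions, and fit an ensemble of probabilistic models whose predictive spread reflects the variation across domains. Everything here is an illustrative assumption: the 1-D dynamics `s' = s + k * a`, the gain range, the Gaussian model of the gain, and the function names. The models in the talk are neural networks, not this closed-form fit.

```python
import random
import statistics

def collect_transitions(n_episodes, steps, rng):
    """Gather (s, a, s_next) tuples, drawing a new domain for every episode."""
    data = []
    for _ in range(n_episodes):
        k = rng.uniform(0.5, 1.5)  # hypothetical domain parameter
        s = 0.0
        for _ in range(steps):
            a = rng.choice([-1.0, 1.0])
            s_next = s + k * a
            data.append((s, a, s_next))
            s = s_next
    return data

def fit_member(data, rng):
    """Fit one member on a bootstrap resample: Gaussian over the gain."""
    boot = [rng.choice(data) for _ in data]
    gains = [(s_next - s) / a for s, a, s_next in boot]
    return statistics.mean(gains), statistics.stdev(gains)

def sample_next_state(member, s, a, rng):
    """Draw a next state; the spread covers the domains seen in pretraining."""
    mean, std = member
    return s + rng.gauss(mean, std) * a

rng = random.Random(0)
data = collect_transitions(n_episodes=50, steps=10, rng=rng)
ensemble = [fit_member(data, rng) for _ in range(5)]
```

Because the domains differ only in the gain, each member's standard deviation is driven by cross-domain variation, which is the uncertainty the planner later exploits in the unseen domain.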

SLIDE 17

Risk Averse Action Selection

Case 1: Low Reward Risk-Aversion, CARL (Reward)

Select actions that minimize worst-case outcomes
: worst 𝛿 percentile of predicted trajectories
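As a rough sketch of this case, the action score can be computed from only the worst fraction of predicted trajectory rewards. The function name, treating δ as a fraction in (0, 1], and averaging over the retained trajectories are assumptions made for illustration; the slide's exact normalization is not recoverable from the text.

```python
import math

def risk_averse_score(rewards, delta=0.5):
    """Mean reward over the worst `delta` fraction of predicted trajectories."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must be in (0, 1]")
    k = max(1, math.ceil(delta * len(rewards)))
    worst = sorted(rewards)[:k]  # lowest-reward trajectories only
    return sum(worst) / k

# Two hypothetical candidate action sequences with predicted rewards.
cautious = [4.0, 5.0, 5.0, 6.0]
risky = [-10.0, 10.0, 10.0, 12.0]
```

With `delta=1.0` the score reduces to the ordinary mean, under which `risky` looks better. With `delta=0.25` only the single worst trajectory counts, and `cautious` is preferred.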

SLIDE 18

Risk Averse Action Selection

Case 2: Catastrophic State Risk-Aversion, CARL (State)

Avoid catastrophic states directly
Build state safety cost, g(A)
Maximize:

Lagrangian relaxation of constraint minimizing probability of encountering states in a catastrophic set
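The objective itself did not survive in the slide text, so the following is only one plausible reading of the relaxation: mean predicted reward minus a multiplier times g(A). Here g(A) is taken to be the fraction of predicted trajectories that enter the catastrophic set; that definition, the multiplier `lam`, and the function names are assumptions.

```python
def state_safety_cost(trajectories, is_catastrophic):
    """g(A): fraction of predicted trajectories that hit a catastrophic state."""
    hits = sum(
        1 for traj in trajectories if any(is_catastrophic(s) for s in traj)
    )
    return hits / len(trajectories)

def carl_state_score(rewards, trajectories, is_catastrophic, lam=10.0):
    """Penalized score: mean predicted reward minus lam * g(A)."""
    mean_reward = sum(rewards) / len(rewards)
    return mean_reward - lam * state_safety_cost(trajectories, is_catastrophic)

# Hypothetical 1-D example: leaving the interval [-1, 1] is catastrophic.
is_catastrophic = lambda s: abs(s) > 1.0
trajectories = [
    [0.0, 0.5, 0.9],
    [0.0, 0.8, 1.2],
    [0.0, 0.2, 0.3],
    [0.0, 1.5, 0.1],
]
rewards = [1.0, 3.0, 1.0, 3.0]
score = carl_state_score(rewards, trajectories, is_catastrophic)
```

Half of the predicted trajectories leave the safe interval, so the penalty outweighs the reward and the planner would prefer a sequence that stays inside it.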

SLIDE 19

Risk Averse Action Selection

Case 2: Catastrophic State Risk-Aversion

SLIDE 20

CARL System Overview

SLIDE 21

Environments Tested

Cartpole (varying pole lengths)
Duckietown¹ (varying car width)
Half Cheetah (varying disabled joint)

¹ Duckietown (Chevalier-Boisvert et al., 2018)

SLIDE 22

Experiment Setup

MB + Finetune: PETS, finetune on test environment
RARL: Robust Adversarial Reinforcement Learning¹
PPO-MAML: Model-Agnostic Meta Learning²
CARL (Reward): Reward-based CARL
CARL (State): State-based CARL

¹ (Pinto et al., 2017)
² (Finn et al., 2017)

SLIDE 23

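[Results figure; only the label "(State)" survives]

SLIDE 24

[Results figure; only the label "(State)" survives]

SLIDE 25

[Results figure; only the label "(State)" survives]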

SLIDE 26

Summary

Safety-Critical Adaptation (SCA)

○ Train on sandbox environments, adapt to safety-critical environments

CARL (State) and CARL (Reward)

○ Capture source uncertainty, perform risk-averse planning

Thank you!