Class notes
1. Homework 5 due Tuesday, November 13th, 11:59pm

Real-World Robot Learning: Safety and Flexibility
CS294-112: Deep Reinforcement Learning
Gregory Kahn

Why should you care?
● Safety
● Flexibility

Outline
Topics
• Safety
• Flexibility
Algorithms
• Imitation learning
• Model-free
• Model-based
2 * 3 = 6 papers we’ll cover
By no means the best / only papers on these topics

Safety Flexibility Imitation learning Model-free Model-based

Goal
Learn a control policy that maps observations to controls
[Diagram: Observation → Policy → Control]

Assumption
● Able to generate good trajectories using an expert policy
Human expert
Trajectory optimization
- cost function
- optimization
- full state information only during training

Supervised Learning
[Diagram labels: Trajectory optimization; Gather expert trajectories → Supervised learning]
[Figure: training trajectory vs. learned policy trajectory]
Policy reaches states not in training set! [Ross et al 2010]
● Problem: training and test distributions differ

Dataset Aggregation (DAgger) [Ross et al 2011]
● Problem: training and test distributions differ
● Solution: execute policy during training
[Diagram labels: Gather expert trajectories; Supervised learning]
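A minimal sketch of the DAgger loop on a toy 1-D problem may help make "execute policy during training" concrete. Everything here is an illustrative assumption rather than material from the slides: the expert (`expert_action`), the dynamics (`step`), the linear policy, and the schedule for the mixing probability `beta`.

```python
import random

def expert_action(x):
    # Hypothetical expert: a proportional controller that drives the state to 0.
    return -0.5 * x

def step(x, u, rng):
    # Hypothetical 1-D dynamics with a little noise.
    return x + u + rng.gauss(0.0, 0.01)

def fit_linear_policy(dataset):
    # Least-squares fit of u = w * x to (state, expert action) pairs.
    sxx = sum(x * x for x, _ in dataset)
    sxu = sum(x * u for x, u in dataset)
    return sxu / sxx if sxx > 0 else 0.0

def dagger(n_iters=5, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset = []
    w = 0.0  # learned policy is u = w * x
    for i in range(n_iters):
        # Probability of executing the expert's action (assumed schedule).
        beta = 1.0 if i == 0 else 0.5 ** i
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            u_expert = expert_action(x)
            dataset.append((x, u_expert))  # every visited state gets an expert label
            u = u_expert if rng.random() < beta else w * x  # mix the actions
            x = step(x, u, rng)
        w = fit_linear_policy(dataset)  # supervised learning on the aggregate
    return w, dataset

w, dataset = dagger()
```

Because the learner's own actions are executed during data collection, the aggregated dataset covers states the learned policy actually reaches, which is the distribution mismatch the slide points to.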

Safety during training
● DAgger mixes the actions

Policy Learning using Adaptive Trajectory Optimization (PLATO)
● DAgger mixes the actions
● PLATO mixes the objectives
cost J → avoids high cost
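The contrast between the two bullets can be written out roughly as follows. This is a sketch in assumed notation, not notation recovered from the slide: $\pi^{*}$ is the expert, $\pi_{\theta}$ the learned policy, $\beta$ a mixing probability, $\lambda$ a trade-off weight, $x$ the full state, and $o$ the observation.

```latex
% DAgger: the sampling policy mixes the actions of expert and learner
\pi_{\text{mix}}(u \mid x) = \beta \, \pi^{*}(u \mid x) + (1 - \beta) \, \pi_{\theta}(u \mid o)

% PLATO: the sampling policy mixes the objectives
% (task cost J plus a penalty for deviating from the learner)
\pi_{\lambda} = \arg\min_{\pi} \; J(\pi)
  + \lambda \, D_{\mathrm{KL}}\!\left( \pi(u \mid x) \,\|\, \pi_{\theta}(u \mid o) \right)
```

Since the cost $J$ stays in the sampling policy's objective, that policy keeps avoiding high-cost states even while the learned policy is still poor.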

Algorithm comparisons
[Table comparing supervised learning, DAgger, and PLATO by: sampling policy, safe, similar training and test distributions]

Experiments: final neural network policies
[Panels: Canyon, Forest]

Experiments: metrics
[Panels: Canyon, Forest]

Experiments: metrics
[Panels: Forest, Canyon]

Goal
NOT SAFE

Shielding
● Pre-emptive shielding: like learning in a transformed MDP
● Post-posed shielding: shield can be used at test time
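The two shield variants can be sketched on a toy corridor. This is only an illustration under assumed names: the corridor, `UNSAFE_STATES`, and the known transition model `next_state` are hypothetical, and a real shield would be synthesized from a temporal-logic specification rather than written by hand.

```python
import random

ACTIONS = (-1, 0, 1)
UNSAFE_STATES = {0, 9}  # hypothetical "crash" cells at both ends of a corridor

def next_state(state, action):
    # Known (assumed) transition model that the shield reasons with.
    return state + action

def is_safe(state, action):
    return next_state(state, action) not in UNSAFE_STATES

def preemptive_shield(state):
    """Pre-emptive: give the agent only the safe actions to choose from."""
    return [a for a in ACTIONS if is_safe(state, a)]

def postposed_shield(state, proposed_action):
    """Post-posed: pass the agent's action through, overriding it only if unsafe."""
    if is_safe(state, proposed_action):
        return proposed_action
    return preemptive_shield(state)[0]  # any safe fallback action

def run_episode(policy, shield, steps=50, start=5):
    state = start
    visited = [state]
    for _ in range(steps):
        action = shield(state, policy(state))
        state = next_state(state, action)
        visited.append(state)
    return visited

rng = random.Random(0)

def random_policy(state):
    # Stand-in for an exploring RL agent.
    return rng.choice(ACTIONS)

visited = run_episode(random_policy, postposed_shield)
```

The post-posed shield wraps any policy, including an already trained one, so it can stay in place at test time.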

How to shield: linear temporal logic
● Encode safety with temporal logic
● Assumption: Known approximate/conservative transition dynamics

Experiments
Safety criteria
- Don’t crash

Experiments
Safety criteria
- Don’t run out of oxygen
- If enough oxygen, don’t surface w/o divers

Goal
[Figure: unknown environment]
How to do reinforcement learning without destroying the robot during training, using only onboard images

Approach
[Figure: unknown environment]
● Learn a collision prediction model
[Diagram: raw image + command velocities → neural network]

Collision prediction model

Model-based RL using collision prediction model
● Form speed-dependent, uncertainty-aware collision cost: encourage safe, low-speed collisions by reasoning about the model’s uncertainty
● Gather trajectories using MPC controller: may experience collisions
● Train uncertainty-aware collision prediction model on the data: deep neural network with uncertainty estimates from bootstrapping and dropout
● Robot increases speed as model becomes more confident

Collision cost
high speed, predict collision, large uncertainty → large cost
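One way to read the slide is as a cost that scales with speed and with a pessimistic estimate of the collision probability. The sketch below assumes the form speed × (mean + weight × standard deviation); that form, `risk_weight`, `progress_weight`, and `pick_speed` are illustrative assumptions, not the formula from the slide.

```python
import statistics

def collision_cost(speed, collision_probs, risk_weight=1.0):
    """Speed-dependent, uncertainty-aware cost (assumed form).

    collision_probs holds one predicted collision probability per model or
    dropout sample; their spread is used as the uncertainty estimate.
    """
    mean = statistics.fmean(collision_probs)
    std = statistics.pstdev(collision_probs)
    return speed * (mean + risk_weight * std)

def pick_speed(candidate_speeds, collision_probs, progress_weight=0.5, risk_weight=1.0):
    """Toy stand-in for the MPC step: trade off collision cost against progress."""
    def total_cost(speed):
        return collision_cost(speed, collision_probs, risk_weight) - progress_weight * speed
    return min(candidate_speeds, key=total_cost)

speeds = [0.5, 1.0, 2.0]
confident = [0.05, 0.04, 0.06]  # models agree that a collision is unlikely
uncertain = [0.1, 0.5, 0.9]     # models disagree strongly

fast = pick_speed(speeds, confident)
slow = pick_speed(speeds, uncertain)
```

With these made-up numbers the planner picks the highest speed when the predictions agree and the lowest when they disagree, which mirrors the robot speeding up as the model becomes more confident.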

Estimating neural network output uncertainty: Bootstrapping
Training time: Data → resample with replacement → D1, D2, D3 → train → M1, M2, M3
Test time: Input → M1, M2, M3
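The bootstrapping diagram can be sketched with a small ensemble. To keep the example dependency-free, each "model" is a least-squares line instead of a neural network; that substitution, the synthetic data, and the function names are assumptions made for illustration.

```python
import random
import statistics

def fit_line(data):
    # Least-squares fit of y = slope * x + intercept.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx

def train_bootstrap_ensemble(data, n_models, rng):
    # Training time: resample the data with replacement, train one model per resample.
    models = []
    for _ in range(n_models):
        resampled = [rng.choice(data) for _ in data]
        models.append(fit_line(resampled))
    return models

def predict_with_uncertainty(models, x):
    # Test time: query every model; the spread of their outputs is the uncertainty.
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.fmean(preds), statistics.pstdev(preds)

rng = random.Random(0)
xs = [i / 29 for i in range(30)]  # training inputs lie in [0, 1]
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
models = train_bootstrap_ensemble(data, n_models=5, rng=rng)

near_mean, near_std = predict_with_uncertainty(models, 0.5)    # inside the data
far_mean, far_std = predict_with_uncertainty(models, 100.0)    # far outside it
```

The ensemble members agree where training data exists and diverge away from it, so the disagreement serves as a rough uncertainty signal.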

Estimating neural network output uncertainty: Dropout
[Diagram: Training time (Data) and Test time (Input), each shown with several copies of the Model]
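A rough sketch of dropout-based uncertainty: keep dropout switched on at test time, run several stochastic forward passes, and look at the spread of the outputs. The tiny one-hidden-layer network and its fixed weights below are hypothetical and stand in for a trained model.

```python
import random
import statistics

def forward_with_dropout(x, w_hidden, w_out, drop_prob, rng):
    # One stochastic forward pass through a one-hidden-layer ReLU network.
    keep = 1.0 - drop_prob
    total = 0.0
    for w_h, w_o in zip(w_hidden, w_out):
        h = max(0.0, w_h * x)          # ReLU hidden unit
        if rng.random() < keep:        # unit survives this pass
            total += w_o * h / keep    # inverted-dropout rescaling
    return total

def mc_dropout_predict(x, w_hidden, w_out, drop_prob=0.5, n_samples=100, seed=0):
    # Test time: sample many dropout masks and summarize the outputs.
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, w_hidden, w_out, drop_prob, rng)
               for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pstdev(samples)

# Hypothetical fixed weights standing in for a trained network.
W_HIDDEN = [0.5, -1.0, 1.5, 2.0]
W_OUT = [1.0, 0.5, -0.5, 0.25]

mean_det, std_det = mc_dropout_predict(1.0, W_HIDDEN, W_OUT, drop_prob=0.0)
mean_mc, std_mc = mc_dropout_predict(1.0, W_HIDDEN, W_OUT, drop_prob=0.5)
```

With `drop_prob=0.0` every pass is identical and the spread is zero; with dropout active the spread is what gets treated as uncertainty.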

Preliminary real-world experiments
Not accounting for uncertainty (higher-speed collisions)

Preliminary real-world experiments
Accounting for uncertainty (lower-speed collisions)

Preliminary real-world experiments
Successful flight past obstacle

Safety takeaways
• Tradeoff between safety and exploration
• Safety guarantees require expert oversight or known environment + dynamics
• Uncertainty can play a key role

Goal
User-specified command

Approach
Option A: Input command
Option B: Branch using command
+ empirically better
- only works for discrete commands
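The two options differ only in where the command enters the policy. The sketch below uses a single linear layer per head to keep the contrast visible; the command names, feature values, and weights are all hypothetical.

```python
COMMANDS = ("follow", "left", "right", "straight")  # hypothetical discrete commands

def linear(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def policy_input_command(features, command, weights):
    """Option A: append a one-hot command to the features; one shared head."""
    one_hot = [1.0 if c == command else 0.0 for c in COMMANDS]
    return linear(weights, list(features) + one_hot)

def policy_branched(features, command, branch_weights):
    """Option B: the command selects which output head (branch) is evaluated."""
    return linear(branch_weights[command], features)

features = [0.2, -0.1]  # stand-in for image features

# Option A: 2 feature weights followed by 4 command weights.
weights_a = [1.0, 1.0, 0.0, -0.5, 0.5, 0.0]
steer_a = policy_input_command(features, "left", weights_a)

# Option B: one weight vector per command.
branch_weights = {
    "follow": [1.0, 1.0],
    "left": [1.0, -4.0],
    "right": [-1.0, 4.0],
    "straight": [0.0, 0.0],
}
steer_left = policy_branched(features, "left", branch_weights)
steer_right = policy_branched(features, "right", branch_weights)
```

Option B needs one branch per command, which is why it only applies when the command set is discrete.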

Approach
Important details
• Data augmentation
  - Contrast
  - Brightness
  - Tone
  - Gaussian blur
  - Salt-and-pepper noise
  - Region dropout
• Adding noise to expert
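Three of the listed augmentations are simple enough to sketch directly. The example assumes a grayscale image stored as a list of rows with pixel values in [0, 1]; the function names and parameters are illustrative and not taken from the slides.

```python
import random

def adjust_brightness(image, delta):
    # Shift every pixel by delta and clip to the valid range.
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in image]

def salt_and_pepper(image, prob, rng):
    # Replace each pixel with black or white with total probability prob.
    out = []
    for row in image:
        new_row = []
        for p in row:
            r = rng.random()
            if r < prob / 2:
                new_row.append(0.0)   # pepper
            elif r < prob:
                new_row.append(1.0)   # salt
            else:
                new_row.append(p)
        out.append(new_row)
    return out

def region_dropout(image, top, left, height, width):
    # Zero out a rectangular block of pixels.
    return [
        [0.0 if top <= i < top + height and left <= j < left + width else p
         for j, p in enumerate(row)]
        for i, row in enumerate(image)
    ]

rng = random.Random(0)
image = [[0.5 for _ in range(8)] for _ in range(6)]  # 6 x 8 gray image

brighter = adjust_brightness(image, 0.7)
noisy = salt_and_pepper(image, prob=0.2, rng=rng)
masked = region_dropout(image, top=1, left=2, height=2, width=3)
```

Each function returns a new image of the same size, so the expert's action label for the frame can be reused unchanged.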

[slides adapted from Tuomas Haarnoja]

Goal
Task 1: Reach
Task 2: Avoid
[Figure: space of trajectories, showing the reaching skill, the avoidance skill, and the reaching while avoiding skill]

Policy Composition
Task 1: Reach
Task 2: Avoid
Task 1+2: Reach and avoid
[Figure: space of trajectories, showing the reaching skill, the avoidance skill, and the reaching while avoiding skill]
Related to divergence between and
Reusability!
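A rough sketch of one way such a composition can work for energy-based (soft Q) policies: average the two tasks' Q-values and act with a softmax over the result. The averaging rule is stated here as an assumption about the method, and the action set and Q-values are made up for illustration.

```python
import math

def softmax_policy(q_values, temperature=1.0):
    # Policy proportional to exp(Q / temperature), computed stably.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def compose_q(q_task1, q_task2):
    # Assumed composition rule: average the constituent Q-functions.
    return [(a + b) / 2.0 for a, b in zip(q_task1, q_task2)]

ACTIONS = ("left", "straight", "right")
q_reach = [0.0, 2.0, 1.5]    # reaching prefers going straight
q_avoid = [1.0, -3.0, 1.0]   # an obstacle lies straight ahead

q_both = compose_q(q_reach, q_avoid)
policy_both = softmax_policy(q_both)
best_action = ACTIONS[policy_both.index(max(policy_both))]
```

Neither constituent policy is retrained here; the combined behavior comes from reusing the two existing Q-functions.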

[Panels: Task 1, Task 2, Task 1 + 2]

[Panels: Stacking policy, Avoidance policy, Combined policy]

Standard Reinforcement Learning
[Diagram: Train: three separate Data → Policy pairs; Test]
● Data inefficient
● Expert in the loop
● Inflexible

CAPs Approach
[Diagram: Train: Data and Detector (Event Cues) → CAPs; Test]
● Data efficient
● Detector in the loop
● Flexible

Detect, Predict, Control

Detect, Predict, Control
[Diagram: Detector → Event Cues]

Detect, Predict, Control

[Videos, CAPs: Drive at 7 m/s; Avoid collisions; Drive in either lane; Drive in right lane]

[Comparison: Collision Avoidance, CAPs vs. DQL]

[Video: Avoid collisions; Follow goal heading; Move towards doors]
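The task lists above suggest how flexibility arises: the predictor outputs event cues, and each task is just a different reward over those cues chosen at test time. The sketch below is a heavily simplified, single-step illustration; `predict_events` is a hypothetical stand-in for the learned action-conditioned predictor, and the reward terms and numbers are made up.

```python
def predict_events(action):
    """Hypothetical stand-in for a learned, action-conditioned event predictor."""
    speed, lane = action
    return {
        "collision": min(1.0, speed / 20.0),  # made-up: faster means riskier
        "speed": speed,
        "right_lane": 1.0 if lane == "right" else 0.0,
    }

# Each task term is a reward over predicted events, chosen by the user at test time.
def drive_at_7(events):
    return -abs(events["speed"] - 7.0)

def avoid_collisions(events):
    return -10.0 * events["collision"]

def drive_in_right_lane(events):
    return events["right_lane"]

def plan(candidate_actions, reward_terms):
    """Pick the candidate action whose predicted events score highest."""
    def score(action):
        events = predict_events(action)
        return sum(term(events) for term in reward_terms)
    return max(candidate_actions, key=score)

candidates = [(7.0, "left"), (7.0, "right"), (12.0, "right"), (3.0, "left")]

# Same predictor, two different tasks, no retraining.
best_right = plan(candidates, [drive_at_7, avoid_collisions, drive_in_right_lane])
best_any = plan(candidates, [drive_at_7, avoid_collisions])
```

Changing the task only changes the list of reward terms passed to `plan`; the predictor itself is untouched.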

Flexibility takeaways
• Carefully construct how your policy / model deals with goals
• Model-free methods require extra care to reuse
• Model-based methods are flexible by construction
