

slide-1
SLIDE 1

Safe model-based learning for robot control

Felix Berkenkamp, Andreas Krause, Angela P. Schoellig

@CDC Workshop on Learning for Control 16th December 2018

slide-2
SLIDE 2

The future of automation

2

Felix Berkenkamp

slide-3
SLIDE 3

The future of automation

3

Felix Berkenkamp

Large prior uncertainties, active decision making. Need safe and high-performance behavior.

slide-4
SLIDE 4

Control approach

4

Felix Berkenkamp

System model, system identification, controller design
Data collection in controlled environments
Robustness towards errors; safety constraints

slide-5
SLIDE 5

Two approaches

5

Felix Berkenkamp

Control (Systems)
+ Models  + Feedback  + Safety  + Worst-case
− Learning  − Data

Performance limited by system understanding

Systems must learn and adapt

slide-6
SLIDE 6

Reinforcement learning approach

6

Felix Berkenkamp

System model, system identification, controller design → data samples, controller optimization

(Figure: agent / environment loop with action, state, and reward.)

Collecting relevant data for the task (in controlled environments)

Performance typically in expectation
slide-7
SLIDE 7

Two approaches

7

Felix Berkenkamp

Control (Systems)
+ Models  + Feedback  + Safety  + Worst-case
− Learning  − Data
Performance limited by system understanding

Machine Learning (Data)
+ Learning  + Data collection  + Explore / exploit  + Average case
− Worst-case  − Safety
Safety limited by lack of system understanding

Systems must learn and adapt

→ Safety and data efficiency: model-based reinforcement learning

slide-8
SLIDE 8

Prerequisites for safe reinforcement learning

8

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-9
SLIDE 9

Overview

9

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-10
SLIDE 10

Learning a model

10

Dynamics

Felix Berkenkamp

Need to quantify the model error. The model error must decrease with data.

slide-15
SLIDE 15

Learning a model

10

Dynamics

Felix Berkenkamp

Need to quantify the model error. The model error must decrease with data (sub-Gaussian noise).

slide-16
SLIDE 16

Gaussian process

16

Felix Berkenkamp


slide-20
SLIDE 20

Gaussian process

16

Felix Berkenkamp

Theorem (informally): The model error is contained in the scaled Gaussian process confidence intervals with probability at least 1 − δ, jointly for all states, time steps, and actively selected measurements.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
  • N. Srinivas, A. Krause, S. Kakade, M. Seeger, ICML 2010
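To make the confidence-interval idea concrete, here is a minimal numpy sketch (my own illustration, not the talk's code): a Gaussian process with a squared-exponential kernel is fit to noisy observations of a toy 1-D function, and the scaled interval μ(x) ± β·σ(x) is checked empirically on a grid. The kernel, length-scale, noise level, and the constant β = 2 are illustrative assumptions; the theorem above prescribes a specific, slowly growing β_t.

```python
import numpy as np

def rbf(A, B, ell=0.5, sf=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X, y, Xs, noise=1e-2):
    """GP posterior mean and standard deviation at test inputs Xs."""
    K = rbf(X, X) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(Xs, X)
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf(Xs, Xs)) - (v**2).sum(axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Noisy observations of a toy 1-D function standing in for the dynamics.
rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)
X = rng.uniform(-1, 1, size=(15, 1))
y = f(X[:, 0]) + 1e-2 * rng.standard_normal(15)

# Scaled confidence interval mu +/- beta * sigma on a test grid.
Xs = np.linspace(-1, 1, 50)[:, None]
mu, std = gp_posterior(X, y, Xs)
beta = 2.0  # illustrative; the theory prescribes a slowly growing beta_t
inside = np.abs(f(Xs[:, 0]) - mu) <= beta * std
print(f"{inside.mean():.0%} of grid points inside the scaled interval")
```

Note how the posterior standard deviation shrinks near the data, which is exactly the "model error decreases with data" requirement from the earlier slides.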
slide-21
SLIDE 21

A Bayesian dynamics model

21

Dynamics

Felix Berkenkamp


slide-25
SLIDE 25

Overview

25

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-26
SLIDE 26

Safety definition

26

unsafe

Felix Berkenkamp

Robust, control-invariant set based on prior knowledge

slide-27
SLIDE 27

Safety for learned models

27

Felix Berkenkamp

Dynamics + Policy

→ Stability? Region of attraction?
slide-28
SLIDE 28

Lyapunov functions

28

[A.M. Lyapunov 1892]

Felix Berkenkamp
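As a concrete refresher of the idea on this slide: for a stable discrete-time linear system one can build V(x) = xᵀPx by solving the discrete Lyapunov equation AᵀPA − P = −Q, and then verify that V strictly decreases along trajectories. The system matrix below is a made-up example, and P is computed with the convergent series rather than a library solver, so the sketch is numpy-only.

```python
import numpy as np

# Stable discrete-time linear system x_{k+1} = A x_k (spectral radius < 1).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
Q = np.eye(2)

# Solve A^T P A - P = -Q via the convergent series P = sum_k (A^k)^T Q A^k.
P = np.zeros((2, 2))
M = np.eye(2)
for _ in range(300):
    P += M.T @ Q @ M
    M = A @ M

V = lambda x: x @ P @ x  # Lyapunov candidate V(x) = x^T P x

# V strictly decreases along every trajectory: V(Ax) - V(x) = -x^T Q x < 0.
x = np.array([1.0, -1.0])
values = [V(x)]
for _ in range(20):
    x = A @ x
    values.append(V(x))
print("V decreasing:", all(b < a for a, b in zip(values, values[1:])))
```

The sublevel sets of such a V are invariant, which is what makes it usable as a region-of-attraction certificate on the following slides.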

slide-29
SLIDE 29

Lyapunov functions

29

Felix Berkenkamp

slide-30
SLIDE 30

Learning Lyapunov functions

30

Felix Berkenkamp

Finding the right Lyapunov function is difficult!

The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems
  • S.M. Richards, F. Berkenkamp, A. Krause, CoRL 2018

  • Weights: positive definite
  • Nonlinearities: trivial nullspace
  • Classification problem

slide-31
SLIDE 31

Overview

31

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-32
SLIDE 32

Safety definition

32

unsafe

Felix Berkenkamp

Safe Model-based Reinforcement Learning with Stability Guarantees

  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

Initial safe policy


slide-37
SLIDE 37

Safety definition

32

unsafe

Felix Berkenkamp

Theorem (informally): Under suitable conditions, we can identify a (near-)maximal subset of X on which π is stable, while never leaving the safe set

Safe Model-based Reinforcement Learning with Stability Guarantees

  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

Initial safe policy
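The flavor of this safe-set computation can be sketched on a 1-D grid. All numbers below are made up, standing in for GP confidence bounds on the Lyapunov decrease: certify states where even the worst-case model strictly decreases V, then take the largest sublevel set that lies entirely inside the certified region.

```python
import numpy as np

# Discretized 1-D state space with Lyapunov function V(x) = x^2.
xs = np.linspace(-1, 1, 201)
V = xs**2

# Hypothetical confidence bound on the closed-loop decrease
# dV(x) = V(f(x, pi(x))) - V(x): true dynamics x+ = 0.8 x, with model
# uncertainty that is small where data was collected (|x| <= 0.6)
# and grows outside, mimicking a GP model.
dV_true = (0.8 * xs) ** 2 - V
uncertainty = 0.2 * xs**2 + np.maximum(np.abs(xs) - 0.6, 0.0)
dV_upper = dV_true + uncertainty  # worst case consistent with the model

# Certified: even the worst case decreases V by a margin.
certified = dV_upper <= -0.01 * V

# Safe set: largest sublevel set {V <= c} fully inside the certified region.
c = max((lv for lv in np.unique(V) if np.all(certified[V <= lv])), default=0.0)
safe = V <= c
print(f"certified safe interval: [{xs[safe].min():.2f}, {xs[safe].max():.2f}]")
```

Collecting data near the boundary shrinks the uncertainty there, enlarging the certified region on the next iteration, which is exactly the expansion the theorem formalizes.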

slide-38
SLIDE 38

Illustration of safe learning

38

Policy. Need to safely explore!

Felix Berkenkamp

Safe Model-based Reinforcement Learning with Stability Guarantees

  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017



slide-40
SLIDE 40

Model predictive control

40

Felix Berkenkamp

Makes decisions based on predictions about the future. Includes input and state constraints.
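A minimal receding-horizon sketch of the idea (my own illustration: an unconstrained Riccati core with a clipped input as a stand-in for the constrained QP that real MPC solves): re-plan from the current state at every step and apply only the first input.

```python
import numpy as np

# Double integrator: state (position, velocity), acceleration input.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R = np.eye(2), 0.1 * np.eye(1)
H = 20  # prediction horizon

def first_step_gain(A, B, Q, R, H):
    """Finite-horizon Riccati recursion; returns the first-step feedback gain.
    (Real MPC solves a constrained QP over the horizon instead.)"""
    P = Q
    for _ in range(H):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Receding horizon with feedback: re-plan from the measured state each step,
# apply only the first input, clipped onto the input constraint |u| <= 1.
u_max = 1.0
x = np.array([2.0, 0.0])
for _ in range(120):
    K = first_step_gain(A, B, Q, R, H)
    u = np.clip(-K @ x, -u_max, u_max)
    x = A @ x + B @ u
print("state after 12 s:", np.round(x, 3))
```

Re-solving from the measured state at every step is what gives MPC its feedback character, and it is the hook where the learned, uncertain dynamics model enters on the next slides.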

slide-41
SLIDE 41

Model predictive control on a robot

41

Felix Berkenkamp

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
  • C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

https://youtu.be/3xRNmNv5Efk

slide-42
SLIDE 42

Model predictive control

42

Felix Berkenkamp

Problem: True dynamics are unknown!

slide-43
SLIDE 43

Prediction under uncertainty

43

Felix Berkenkamp

The outer approximation contains the true dynamics for all time steps with probability at least 1 − δ

Learning-based Model Predictive Control for Safe Exploration
  • T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018
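A toy version of such an outer approximation (illustrative numbers, 1-D interval arithmetic rather than the paper's ellipsoidal sets): propagate a box through closed-loop dynamics x⁺ = a·x + g(x), where the learned model only provides a bound |g(x)| ≤ err(x).

```python
import numpy as np

# Closed-loop 1-D dynamics x+ = a*x + g(x), where the learned model only
# guarantees |g(x)| <= err(x) (a stand-in for a GP confidence bound).
a = 0.8
def err(lo, hi):
    return 0.05 + 0.05 * max(abs(lo), abs(hi))

def propagate(lo, hi, steps):
    """Interval outer approximation of the reachable set: every trajectory
    consistent with the error bound stays inside [lo, hi] at each step."""
    boxes = [(lo, hi)]
    for _ in range(steps):
        e = err(lo, hi)
        lo, hi = a * lo - e, a * hi + e  # sound since a > 0
        boxes.append((lo, hi))
    return boxes

boxes = propagate(0.9, 1.1, 10)
print("outer box after 10 steps:", tuple(round(b, 3) for b in boxes[-1]))
```

Because the interval is inflated by the model's confidence bound at every step, any trajectory of the true system stays inside the boxes with the probability attached to that bound; state constraints can then be enforced on the boxes instead of on the unknown trajectory.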

slide-44
SLIDE 44

Safe model-based learning framework

44

(Figure: exploration trajectory and safety trajectory share the same first step; unsafe region shown.)

Theorem (informally): Under suitable conditions, we can always guarantee that we are able to return to the safe set

Felix Berkenkamp

slide-45
SLIDE 45

Exploration via expected performance

45

Felix Berkenkamp

We design our cost functions to be helpful for optimization:
  • Driving too fast → slow down for safety
  • Faster driving after learning

Exploration objective, subject to safety constraints
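The constrained exploration objective can be sketched in a few lines (the arrays below are stand-ins, not a real model): pick the candidate with the highest predictive uncertainty among those certified safe.

```python
import numpy as np

# Candidate states the controller could steer the system to.
candidates = np.linspace(-1, 1, 101)

# Stand-ins: predictive uncertainty of the learned model (large away from
# the data, collected near x = 0) and a certified safe region.
pred_std = 0.05 + np.abs(candidates)
safe = np.abs(candidates) < 0.65

# Exploration objective: most informative candidate, subject to safety.
masked = np.where(safe, pred_std, -np.inf)
target = candidates[np.argmax(masked)]
print("next exploration target:", target)
```

The maximizer sits at the edge of the safe region, where uncertainty is highest; visiting it shrinks the model error there, which in turn can enlarge the safe region on the next iteration.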

slide-46
SLIDE 46

Example

46

Felix Berkenkamp

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
  • C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

https://youtu.be/3xRNmNv5Efk


slide-48
SLIDE 48

Summary

48

Felix Berkenkamp

Safe Model-based Reinforcement Learning

  • Understand model errors and learning dynamics: RKHS / Gaussian processes, reliable confidence intervals
  • Algorithm to safely acquire data and optimize task: model predictive control, uncertainty propagation, safe active learning
  • Define safety, analyze a model for safety: Lyapunov stability, stability of learned models

https://berkenkamp.me  www.dynsyslab.org  www.las.inf.ethz.ch