

slide-1
SLIDE 1

Safe model-based learning for robot control

Felix Berkenkamp, Andreas Krause, Angela P. Schoellig

@CDC Workshop on Learning for Control 16th December 2018

slide-2
SLIDE 2

The future of automation

2

Felix Berkenkamp

slide-3
SLIDE 3

The future of automation

3

Felix Berkenkamp

Large prior uncertainties, active decision making. Need safe and high-performance behavior.

slide-4
SLIDE 4

Control approach

4

Felix Berkenkamp

System model, system identification, controller design
Data collection in controlled environments
Robustness towards errors; safety constraints

slide-5
SLIDE 5

Two approaches

5

Felix Berkenkamp

Control (Systems)
+ Models  + Feedback  + Safety  + Worst-case
− Learning  − Data

Performance limited by system understanding

Systems must learn and adapt

slide-6
SLIDE 6

Reinforcement learning approach

6

Felix Berkenkamp

System model, system identification, controller design → data samples, controller optimization

(Figure: agent / environment loop with action, state, and reward.)

Collecting relevant data for the task (in controlled environments)

Performance typically in expectation
slide-7
SLIDE 7

Two approaches

7

Felix Berkenkamp

Control (Systems)
+ Models  + Feedback  + Safety  + Worst-case
− Learning  − Data
Performance limited by system understanding

Machine Learning (Data)
+ Learning  + Data collection  + Explore / exploit  + Average case
− Worst-case  − Safety
Safety limited by lack of system understanding

Systems must learn and adapt

→ Safety and data efficiency: model-based reinforcement learning

slide-8
SLIDE 8

Prerequisites for safe reinforcement learning

8

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-9
SLIDE 9

Overview

9

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-10
SLIDE 10

Learning a model

10

Dynamics

Felix Berkenkamp

Need to quantify the model error. The model error must decrease with data.

slide-15
SLIDE 15

Learning a model

10

Dynamics

Felix Berkenkamp

Need to quantify the model error. The model error must decrease with data (sub-Gaussian noise).

slide-16
SLIDE 16

Gaussian process

16

Felix Berkenkamp


slide-20
SLIDE 20

Gaussian process

16

Felix Berkenkamp

Theorem (informally): The model error is contained in the scaled Gaussian process confidence intervals with probability at least 1 − δ, jointly for all states, time steps, and actively selected measurements.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
  • N. Srinivas, A. Krause, S. Kakade, M. Seeger, ICML 2010
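To make the confidence-interval idea concrete, here is a minimal numpy sketch (my own illustration, not the talk's code): a Gaussian process with a squared-exponential kernel is fit to noisy observations of a toy 1-D function, and the scaled interval μ(x) ± β·σ(x) is checked empirically on a grid. The kernel, length-scale, noise level, and the constant β = 2 are illustrative assumptions; the theorem above prescribes a specific, slowly growing β_t.

```python
import numpy as np

def rbf(A, B, ell=0.5, sf=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X, y, Xs, noise=1e-2):
    """GP posterior mean and standard deviation at test inputs Xs."""
    K = rbf(X, X) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(Xs, X)
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf(Xs, Xs)) - (v**2).sum(axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Noisy observations of a toy 1-D function standing in for the dynamics.
rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)
X = rng.uniform(-1, 1, size=(15, 1))
y = f(X[:, 0]) + 1e-2 * rng.standard_normal(15)

# Scaled confidence interval mu +/- beta * sigma on a test grid.
Xs = np.linspace(-1, 1, 50)[:, None]
mu, std = gp_posterior(X, y, Xs)
beta = 2.0  # illustrative; the theory prescribes a slowly growing beta_t
inside = np.abs(f(Xs[:, 0]) - mu) <= beta * std
print(f"{inside.mean():.0%} of grid points inside the scaled interval")
```

Note how the posterior standard deviation shrinks near the data, which is exactly the "model error decreases with data" requirement from the earlier slides.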
slide-21
SLIDE 21

A Bayesian dynamics model

21

Dynamics

Felix Berkenkamp


slide-25
SLIDE 25

Overview

25

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-26
SLIDE 26

Safety definition

26

unsafe

Felix Berkenkamp

Robust, control-invariant set based on prior knowledge

slide-27
SLIDE 27

Safety for learned models

27

Felix Berkenkamp

Dynamics + Policy

→ Stability? Region of attraction?
slide-28
SLIDE 28

Lyapunov functions

28

[A.M. Lyapunov 1892]

Felix Berkenkamp
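As a concrete refresher of the idea on this slide: for a stable discrete-time linear system one can build V(x) = xᵀPx by solving the discrete Lyapunov equation AᵀPA − P = −Q, and then verify that V strictly decreases along trajectories. The system matrix below is a made-up example, and P is computed with the convergent series rather than a library solver, so the sketch is numpy-only.

```python
import numpy as np

# Stable discrete-time linear system x_{k+1} = A x_k (spectral radius < 1).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
Q = np.eye(2)

# Solve A^T P A - P = -Q via the convergent series P = sum_k (A^k)^T Q A^k.
P = np.zeros((2, 2))
M = np.eye(2)
for _ in range(300):
    P += M.T @ Q @ M
    M = A @ M

V = lambda x: x @ P @ x  # Lyapunov candidate V(x) = x^T P x

# V strictly decreases along every trajectory: V(Ax) - V(x) = -x^T Q x < 0.
x = np.array([1.0, -1.0])
values = [V(x)]
for _ in range(20):
    x = A @ x
    values.append(V(x))
print("V decreasing:", all(b < a for a, b in zip(values, values[1:])))
```

The sublevel sets of such a V are invariant, which is what makes it usable as a region-of-attraction certificate on the following slides.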

slide-29
SLIDE 29

Lyapunov functions

29

Felix Berkenkamp

slide-30
SLIDE 30

Learning Lyapunov functions

30

Felix Berkenkamp

Finding the right Lyapunov function is difficult!

The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems
  • S.M. Richards, F. Berkenkamp, A. Krause, CoRL 2018

  • Weights: positive definite
  • Nonlinearities: trivial nullspace
  • Classification problem

slide-31
SLIDE 31

Overview

31

Felix Berkenkamp

Safe Model-based Reinforcement Learning:
  • Understand model errors and learning dynamics
  • Algorithm to safely acquire data and optimize task
  • Define safety, analyze a model for safety

slide-32
SLIDE 32

Safety definition

32

unsafe

Felix Berkenkamp

Safe Model-based Reinforcement Learning with Stability Guarantees

  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

Initial safe policy


slide-37
SLIDE 37

Safety definition

32

unsafe

Felix Berkenkamp

Theorem (informally): Under suitable conditions, we can identify a (near-)maximal subset of X on which π is stable, while never leaving the safe set

Safe Model-based Reinforcement Learning with Stability Guarantees

  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

Initial safe policy
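The flavor of this safe-set computation can be sketched on a 1-D grid. All numbers below are made up, standing in for GP confidence bounds on the Lyapunov decrease: certify states where even the worst-case model strictly decreases V, then take the largest sublevel set that lies entirely inside the certified region.

```python
import numpy as np

# Discretized 1-D state space with Lyapunov function V(x) = x^2.
xs = np.linspace(-1, 1, 201)
V = xs**2

# Hypothetical confidence bound on the closed-loop decrease
# dV(x) = V(f(x, pi(x))) - V(x): true dynamics x+ = 0.8 x, with model
# uncertainty that is small where data was collected (|x| <= 0.6)
# and grows outside, mimicking a GP model.
dV_true = (0.8 * xs) ** 2 - V
uncertainty = 0.2 * xs**2 + np.maximum(np.abs(xs) - 0.6, 0.0)
dV_upper = dV_true + uncertainty  # worst case consistent with the model

# Certified: even the worst case decreases V by a margin.
certified = dV_upper <= -0.01 * V

# Safe set: largest sublevel set {V <= c} fully inside the certified region.
c = max((lv for lv in np.unique(V) if np.all(certified[V <= lv])), default=0.0)
safe = V <= c
print(f"certified safe interval: [{xs[safe].min():.2f}, {xs[safe].max():.2f}]")
```

Collecting data near the boundary shrinks the uncertainty there, enlarging the certified region on the next iteration, which is exactly the expansion the theorem formalizes.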

slide-38
SLIDE 38

Illustration of safe learning

38

Policy. Need to safely explore!

Felix Berkenkamp

Safe Model-based Reinforcement Learning with Stability Guarantees

  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017



slide-40
SLIDE 40

Model predictive control

40

Felix Berkenkamp

Makes decisions based on predictions about the future. Includes input and state constraints.
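A minimal receding-horizon sketch of the idea (my own illustration: an unconstrained Riccati core with a clipped input as a stand-in for the constrained QP that real MPC solves): re-plan from the current state at every step and apply only the first input.

```python
import numpy as np

# Double integrator: state (position, velocity), acceleration input.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R = np.eye(2), 0.1 * np.eye(1)
H = 20  # prediction horizon

def first_step_gain(A, B, Q, R, H):
    """Finite-horizon Riccati recursion; returns the first-step feedback gain.
    (Real MPC solves a constrained QP over the horizon instead.)"""
    P = Q
    for _ in range(H):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Receding horizon with feedback: re-plan from the measured state each step,
# apply only the first input, clipped onto the input constraint |u| <= 1.
u_max = 1.0
x = np.array([2.0, 0.0])
for _ in range(120):
    K = first_step_gain(A, B, Q, R, H)
    u = np.clip(-K @ x, -u_max, u_max)
    x = A @ x + B @ u
print("state after 12 s:", np.round(x, 3))
```

Re-solving from the measured state at every step is what gives MPC its feedback character, and it is the hook where the learned, uncertain dynamics model enters on the next slides.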

slide-41
SLIDE 41

Model predictive control on a robot

41

Felix Berkenkamp

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
  • C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

https://youtu.be/3xRNmNv5Efk

slide-42
SLIDE 42

Model predictive control

42

Felix Berkenkamp

Problem: True dynamics are unknown!

slide-43
SLIDE 43

Prediction under uncertainty

43

Felix Berkenkamp

The outer approximation contains the true dynamics for all time steps with probability at least 1 − δ

Learning-based Model Predictive Control for Safe Exploration
  • T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018
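A toy version of such an outer approximation (illustrative numbers, 1-D interval arithmetic rather than the paper's ellipsoidal sets): propagate a box through closed-loop dynamics x⁺ = a·x + g(x), where the learned model only provides a bound |g(x)| ≤ err(x).

```python
import numpy as np

# Closed-loop 1-D dynamics x+ = a*x + g(x), where the learned model only
# guarantees |g(x)| <= err(x) (a stand-in for a GP confidence bound).
a = 0.8
def err(lo, hi):
    return 0.05 + 0.05 * max(abs(lo), abs(hi))

def propagate(lo, hi, steps):
    """Interval outer approximation of the reachable set: every trajectory
    consistent with the error bound stays inside [lo, hi] at each step."""
    boxes = [(lo, hi)]
    for _ in range(steps):
        e = err(lo, hi)
        lo, hi = a * lo - e, a * hi + e  # sound since a > 0
        boxes.append((lo, hi))
    return boxes

boxes = propagate(0.9, 1.1, 10)
print("outer box after 10 steps:", tuple(round(b, 3) for b in boxes[-1]))
```

Because the interval is inflated by the model's confidence bound at every step, any trajectory of the true system stays inside the boxes with the probability attached to that bound; state constraints can then be enforced on the boxes instead of on the unknown trajectory.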

slide-44
SLIDE 44

Safe model-based learning framework

44

(Figure: exploration trajectory and safety trajectory share the same first step; unsafe region shown.)

Theorem (informally): Under suitable conditions, we can always guarantee that we are able to return to the safe set

Felix Berkenkamp

slide-45
SLIDE 45

Exploration via expected performance

45

Felix Berkenkamp

We design our cost functions to be helpful for optimization:
  • Driving too fast → slow down for safety
  • Faster driving after learning

Exploration objective, subject to safety constraints
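The constrained exploration objective can be sketched in a few lines (the arrays below are stand-ins, not a real model): pick the candidate with the highest predictive uncertainty among those certified safe.

```python
import numpy as np

# Candidate states the controller could steer the system to.
candidates = np.linspace(-1, 1, 101)

# Stand-ins: predictive uncertainty of the learned model (large away from
# the data, collected near x = 0) and a certified safe region.
pred_std = 0.05 + np.abs(candidates)
safe = np.abs(candidates) < 0.65

# Exploration objective: most informative candidate, subject to safety.
masked = np.where(safe, pred_std, -np.inf)
target = candidates[np.argmax(masked)]
print("next exploration target:", target)
```

The maximizer sits at the edge of the safe region, where uncertainty is highest; visiting it shrinks the model error there, which in turn can enlarge the safe region on the next iteration.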

slide-46
SLIDE 46

Example

46

Felix Berkenkamp

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
  • C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

https://youtu.be/3xRNmNv5Efk


slide-48
SLIDE 48

Summary

48

Felix Berkenkamp

Safe Model-based Reinforcement Learning

  • Understand model errors and learning dynamics: RKHS / Gaussian processes, reliable confidence intervals
  • Algorithm to safely acquire data and optimize task: model predictive control, uncertainty propagation, safe active learning
  • Define safety, analyze a model for safety: Lyapunov stability, stability of learned models

https://berkenkamp.me  www.dynsyslab.org  www.las.inf.ethz.ch