Safe model-based learning for robot control: Breaking your robot is only fun in simulation

SLIDE 1

Safe model-based learning for robot control

Felix Berkenkamp, Andreas Krause, Angela P. Schoellig

@LCCC Workshop on Learning and Adaptation for Sensorimotor Control – Lund University October 2018

Breaking your robot is only fun in simulation

SLIDE 2

The Promise of Robotics = Physical Interaction

Angela Schoellig

Virtual world of data & information.

SLIDE 3

The Promise of Robotics = Physical Interaction

Virtual world of data & information.

Figure labels: Virtual world, Real world

Exponential increase in complexity!

SLIDE 4

The Real World Is Complex | Robots Today… and Tomorrow

Dedicated Environments
  • Manually programmed.
  • Based on a-priori knowledge.
  • Robots are limited by our understanding of the system/environment.

Human-centered Environments
  • Unknown, unpredictable and changing.
  • Need safe and high-performance behavior.

Robots must safely learn and adapt

SLIDE 5

Characteristics of Robot Learning

  • Robots are feedback systems
  • Strict safety requirements
  • Resource constraints (data, payload, communication)

Figure labels: Agent, Environment, State, Action, Reward

Reinforcement Learning: An Introduction
  • R. Sutton, A.G. Barto, 1998

Results to date have been limited to learning single tasks, and demonstrated in simulation or lab settings.

NEXT CHALLENGE: realistic application scenarios (safety, data efficiency, online learning)

SLIDE 6

Work at the Dynamic Systems Lab (Prof. Schoellig)

Research Characteristics

Algorithms that run on real robots.

  • Data efficiency
  • Online adaptation and learning
  • Safety guarantees during learning in a closed-loop system

Approach

Machine Learning
Control theory = science of feedback (stability, performance, robustness)

SLIDE 7

Performance and Safety: Fast Swarm Flight

SLIDE 8

Safety: Off-Road Driving

SLIDE 9

Prerequisites for safe reinforcement learning

Felix Berkenkamp

Safe Model-based Reinforcement Learning
  • Understand model and learning dynamics
  • Algorithm to safely acquire data
  • Define safety, analyze a model for safety

SLIDE 10

Overview

Safe Model-based Reinforcement Learning
  • Understand model and learning dynamics
  • Algorithm to safely acquire data
  • Define safety, analyze a model for safety

SLIDE 11

Learning a model

Dynamics

  • Need to quantify model error
  • Model error must decrease with measurements

SLIDE 12

Gaussian process
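A minimal sketch of what the "Learning a model" requirements look like with a Gaussian process, assuming a zero-mean prior with a squared-exponential kernel. The function `true_dynamics`, the kernel, and all hyperparameters are illustrative assumptions, not taken from the talk. The posterior standard deviation plays the role of the quantified model error, and its maximum over the test inputs shrinks as measurements are added.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_std=0.05):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))  # clip tiny negative round-off

def true_dynamics(x):
    """Hypothetical unknown function, used only to generate measurements."""
    return np.sin(3.0 * x)

rng = np.random.default_rng(0)
x_test = np.linspace(-1.0, 1.0, 50)
max_std = []
for n in (2, 5, 20):  # increasing number of measurements
    x_train = np.linspace(-1.0, 1.0, n)
    y_train = true_dynamics(x_train) + 0.05 * rng.standard_normal(n)
    _, std = gp_posterior(x_train, y_train, x_test)
    max_std.append(std.max())

print(max_std)  # worst-case posterior standard deviation per data set size
```

The posterior variance of a GP depends only on where measurements were taken, not on their values, which is why the uncertainty can be reasoned about before data is collected.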


SLIDE 19

A Bayesian dynamics model

Dynamics

Online Learning of Linearly Parameterized Control Problems
  • Y. Abbasi-Yadkori, PhD thesis, 2012

On Kernelized Multi-armed Bandits
  • S.R. Chowdhury, A. Gopalan, ICML 2017

SLIDE 20

Samples from the Gaussian process prior

Plot axes: time, state

The transition dynamics are correlated!
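One way to see why the transition dynamics are correlated: when a trajectory is sampled from a GP prior, each new transition must be consistent with the function values already sampled at earlier states. The sketch below assumes a scalar state, dynamics of the form x_{t+1} = x_t + g(x_t) with g drawn from a zero-mean GP, and a squared-exponential kernel. These modelling choices and all names are illustrative assumptions, not the setup from the talk.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=0.25):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = np.asarray(a)[:, None] - np.asarray(b)[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def condition_on_history(visited, values, x, jitter=1e-6):
    """Mean and variance of g(x) given earlier samples g(visited) = values."""
    x = np.array([x])
    prior_var = rbf_kernel(x, x).item()
    if not visited:
        return 0.0, prior_var
    X = np.array(visited)
    K = rbf_kernel(X, X) + jitter * np.eye(len(X))
    k_x = rbf_kernel(X, x)[:, 0]
    w = np.linalg.solve(K, k_x)
    mean = float(w @ np.array(values))
    var = max(prior_var - float(w @ k_x), 0.0)
    return mean, var

def sample_rollout(x0, horizon, rng):
    """Sample one trajectory of x_{t+1} = x_t + g(x_t) with g ~ GP(0, k)."""
    states = [float(x0)]
    visited, values = [], []  # states where g was sampled, and the samples
    for _ in range(horizon):
        x = states[-1]
        mean, var = condition_on_history(visited, values, x)
        g = mean + np.sqrt(var) * rng.standard_normal()
        visited.append(x)
        values.append(g)
        states.append(x + g)
    return np.array(states)

# Three sampled trajectories from the same initial state.
trajectories = [sample_rollout(0.0, 20, np.random.default_rng(seed))
                for seed in range(3)]
```

Sampling each transition independently would ignore this conditioning and would not correspond to a single consistent dynamics function.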


SLIDE 23

Overview

Safe Model-based Reinforcement Learning
  • Understand model and learning dynamics
  • Algorithm to safely acquire data
  • Define safety, analyze a model for safety

SLIDE 24

Safety definition

unsafe

robust, control-invariant prior knowledge

SLIDE 25

Safety for learned models

Dynamics + Policy

Stability?

SLIDE 26

Lyapunov functions

[A.M. Lyapunov 1892]


SLIDE 28

Region of attraction

Figure labels: unsafe, Initial safe policy

Theorem (informally): Under suitable conditions can identify (near-)maximal subset of X on which π is stable, while never leaving the safe set

Safe Model-based Reinforcement Learning with Stability Guarantees
  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017
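A small illustration of how a Lyapunov function yields a region of attraction, assuming known scalar closed-loop dynamics and the candidate V(x) = x². Both `closed_loop` and `lyapunov` are hypothetical examples, not the system from the talk. The sketch checks the decrease condition V(f(x)) < V(x) on a grid and returns the largest level c such that every grid state with V(x) < c satisfies it. A grid check like this is only illustrative and is not a stability certificate by itself, and it ignores the model uncertainty that the cited work accounts for.

```python
import numpy as np

def closed_loop(x):
    """Hypothetical closed-loop dynamics x_{t+1} = f(x_t, pi(x_t))."""
    return 0.5 * x + 0.5 * x**3  # contracts for |x| < 1, expands for |x| > 1

def lyapunov(x):
    """Candidate Lyapunov function V(x) = x^2."""
    return x**2

def largest_safe_level(grid):
    """Largest c such that V decreases at every nonzero grid state with V(x) < c."""
    v = lyapunov(grid)
    decrease = lyapunov(closed_loop(grid)) < v
    # Level values of states (other than the origin) where the decrease fails.
    violating = v[~decrease & (grid != 0.0)]
    return violating.min() if violating.size else np.inf

grid = np.linspace(-2.0, 2.0, 401)
c = largest_safe_level(grid)
region = grid[lyapunov(grid) < c]  # grid estimate of the region of attraction
print(c, region.min(), region.max())
```

For this example the decrease condition holds exactly when |x| < 1, so the estimated level set should sit close to c = 1.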

SLIDE 29

Illustration of safe learning

Policy

Need to safely explore!

Safe Model-based Reinforcement Learning with Stability Guarantees
  • F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

SLIDE 31

Lyapunov function

Finding the right Lyapunov function is difficult!

The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamic Systems
  • S.M. Richards, F. Berkenkamp, A. Krause, CoRL 2018

  • Weights - positive-definite
  • Nonlinearities - trivial nullspace
  • Decision boundary
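A rough sketch of why the two structural properties listed above give a valid Lyapunov candidate, assuming a simplified construction: square layers with weights of the form GᵀG + εI (positive definite), tanh activations (zero only at zero), and V(x) = φ(x)ᵀφ(x). The layer sizes, ε, and the function names are assumptions for illustration, and the cited paper's architecture may differ in its details. With these choices φ(x) = 0 only at x = 0, so V(0) = 0 and V(x) > 0 elsewhere, whatever the values of G.

```python
import numpy as np

def make_layer(rng, dim, eps=1e-3):
    """Random symmetric positive-definite weight matrix G^T G + eps * I."""
    G = rng.standard_normal((dim, dim))
    return G.T @ G + eps * np.eye(dim)

def lyapunov_candidate(x, weights):
    """V(x) = phi(x)^T phi(x) for a network with positive-definite layers."""
    phi = np.asarray(x, dtype=float)
    for W in weights:
        # W has a trivial nullspace and tanh(z) = 0 only at z = 0,
        # so phi can only be zero if the input x is zero.
        phi = np.tanh(W @ phi)
    return float(phi @ phi)

rng = np.random.default_rng(0)
weights = [make_layer(rng, dim=2) for _ in range(2)]

print(lyapunov_candidate([0.0, 0.0], weights))   # zero at the origin
print(lyapunov_candidate([0.5, -1.0], weights))  # positive away from it
```

Positive definiteness holds by construction, so training only has to shape the level sets so that the decrease condition holds on as large a region as possible.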

SLIDE 32

Overview

Safe Model-based Reinforcement Learning
  • Understand model and learning dynamics
  • Algorithm to safely acquire data
  • Define safety, analyze a model for safety

SLIDE 33

Model predictive control

  • Makes decisions based on predictions about the future
  • Includes input / state constraints
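The two properties above can be sketched with a toy receding-horizon controller. The example assumes a known, unstable scalar system, a quadratic cost, and a brute-force search over a coarse grid of input sequences. All constants and names are hypothetical, and a real MPC implementation would use a numerical optimizer instead of enumeration. At each step the controller predicts the future for every candidate sequence, discards those that violate the state constraint, and applies only the first input of the best one.

```python
import itertools
import numpy as np

A, B = 1.2, 1.0          # hypothetical unstable system x+ = A x + B u
U_MAX, X_MAX = 1.0, 2.0  # input constraint |u| <= U_MAX, state constraint |x| <= X_MAX
INPUT_GRID = np.linspace(-U_MAX, U_MAX, 9).tolist()

def plan(x0, horizon=4):
    """First input of the cheapest constraint-satisfying input sequence."""
    best_cost, best_u0 = np.inf, None
    for seq in itertools.product(INPUT_GRID, repeat=horizon):
        x, cost, feasible = x0, 0.0, True
        for u in seq:
            x = A * x + B * u  # predict one step ahead
            if abs(x) > X_MAX:
                feasible = False
                break
            cost += x**2 + 0.1 * u**2
        if feasible and cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0  # None if no candidate sequence is feasible

def run_mpc(x0, steps=10):
    """Closed loop: re-plan at every step and apply only the first input."""
    xs, us = [x0], []
    for _ in range(steps):
        u = plan(xs[-1])
        us.append(u)
        xs.append(A * xs[-1] + B * u)
    return np.array(xs), np.array(us)

xs, us = run_mpc(1.5)
print(xs)
```

Re-planning at every step is what turns an open-loop prediction into feedback: the plan is corrected as soon as the measured state deviates from the prediction.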

SLIDE 34

Model predictive control on a robot

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
  • C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

Video at https://youtu.be/3xRNmNv5Efk

SLIDE 35

Model predictive control

Problem: True dynamics are unknown!

SLIDE 36

Forward-propagating uncertainty

Outer approximation contains true dynamics for all time steps with probability at least

Learning-based Model Predictive Control for Safe Exploration
  • T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018
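The idea of an outer approximation can be sketched with plain intervals, assuming a scalar system with a known nominal model and a constant bound on the model error. This is a deliberate simplification: it is not the construction from the cited paper, and in a GP-based setting the error bound would depend on the state through the posterior uncertainty. All constants and the function `true_step` are hypothetical. Each prediction step maps the current interval through the nominal model and inflates it by the error bound, so the true state stays inside as long as the bound holds.

```python
import numpy as np

A, B = 0.9, 0.5     # hypothetical nominal model x+ = A x + B u + g(x)
ERROR_BOUND = 0.05  # assumed bound |g(x)| <= ERROR_BOUND

def propagate(x0, inputs):
    """Lower and upper bounds on the state at every time step."""
    lo, hi = x0, x0
    bounds = [(lo, hi)]
    for u in inputs:
        # A > 0, so interval endpoints map to interval endpoints.
        lo, hi = A * lo + B * u - ERROR_BOUND, A * hi + B * u + ERROR_BOUND
        bounds.append((lo, hi))
    return np.array(bounds)

def true_step(x, u):
    """True dynamics, unknown to the planner; the error stays within the bound."""
    return A * x + B * u + 0.04 * np.sin(5.0 * x)

inputs = [0.2] * 15
bounds = propagate(1.0, inputs)

# Simulate the true system and check that it stays inside the intervals.
x, inside = 1.0, []
for t, u in enumerate(inputs):
    x = true_step(x, u)
    inside.append(bounds[t + 1, 0] - 1e-12 <= x <= bounds[t + 1, 1] + 1e-12)

widths = bounds[:, 1] - bounds[:, 0]
print(all(inside), widths[-1])
```

The interval width grows with the prediction horizon, which is the price of guaranteeing containment over all time steps and one reason long-horizon predictions become conservative.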

SLIDE 37

Safe model-based learning framework

Figure labels: unsafe, safety trajectory, exploration trajectory, first step same

Theorem (informally): Under suitable conditions can always guarantee that we are able to return to the safe set

SLIDE 38

Safe model-based learning framework

Figure labels: unsafe, safety trajectory, exploration trajectory, first step same

Exploration limited by size of the safe set!

SLIDE 39

How should we collect data for a control task?

SLIDE 40

Optimizing expected performance

We design our cost functions to be helpful for optimization

  • Driving too fast
  • Slow down for safety
  • Faster driving after learning

Exploration objective:

SLIDE 41

Example

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
  • C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

Video at https://youtu.be/3xRNmNv5Efk

SLIDE 42

Summary and Outlook

Safe Model-based Reinforcement Learning
  • Understand model and learning dynamics
  • Algorithm to safely acquire data
  • Define safety, analyze a model for safety

Gaussian processes
Lyapunov stability
Model predictive control

https://berkenkamp.me
www.dynsyslab.org

SLIDE 43

Thanks To…

My Team – Industrial Partners – Funding Agencies

My outstanding collaborators at U of T (Tim Barfoot) and ETH (Andreas Krause, Raffaello D’Andrea and the whole FMA team).

www.dynsyslab.org