Safe model-based learning for robot control



  1. Safe model-based learning for robot control: Breaking your robot is only fun in simulation. Felix Berkenkamp, Andreas Krause, Angela P. Schoellig @ LCCC Workshop on Learning and Adaptation for Sensorimotor Control – Lund University, October 2018

  2. The Promise of Robotics = Physical Interaction. Virtual world of data & information. Angela Schoellig

  3. The Promise of Robotics = Physical Interaction. Virtual world of data & information. Real world: exponential increase in complexity!

  4. The Real World Is Complex | Robots Today… and Tomorrow. Today, dedicated environments: manually programmed; based on a-priori knowledge; robots are limited by our understanding of the system/environment. Tomorrow, human-centered environments: unknown, unpredictable and changing; need safe and high-performance behavior; robots must safely learn and adapt.

  5. Characteristics of Robot Learning. Robots are feedback systems. Strict safety requirements. Resource constraints (data, payload, communication). [Diagram: agent-environment loop with action, state, and reward, from Reinforcement Learning: An Introduction, R. Sutton, A.G. Barto, 1998.] Results to date have been limited to learning single tasks, and demonstrated in simulation or lab settings. NEXT CHALLENGE: realistic application scenarios (safety, data efficiency, online learning).

  6. Work at the Dynamic Systems Lab (Prof. Schoellig). Approach: Control theory (= science of feedback: stability, performance, robustness) and Machine Learning. Research characteristics: algorithms that run on real robots. • Data efficiency • Online adaptation and learning • Safety guarantees during learning in a closed-loop system

  7. Performance and Safety: Fast Swarm Flight

  8. Safety: Off-Road Driving

  9. Prerequisites for safe reinforcement learning: understand model and learning dynamics; define safety, analyze a model for safety; algorithm to safely acquire data. Safe Model-based Reinforcement Learning. Felix Berkenkamp

  10. Overview: understand model and learning dynamics; define safety, analyze a model for safety; algorithm to safely acquire data. Safe Model-based Reinforcement Learning.

  11. Learning a model. Dynamics. Model error must decrease with measurements. Need to quantify model error.

  12-18. Gaussian process

  19. A Bayesian dynamics model. Dynamics. References: On Kernelized Multi-armed Bandits, S.R. Chowdhury, A. Gopalan, ICML 2017; Online Learning of Linearly Parameterized Control Problems, Y. Abbasi-Yadkori, PhD thesis 2012.
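
The two requirements from the model-learning slides (quantify the model error, and have it shrink with measurements) can be illustrated with plain Gaussian process regression. The following is a minimal sketch and not the authors' implementation: the scalar input, the squared-exponential kernel, the hyperparameters, and all function names are assumptions chosen for illustration.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two scalars (assumed kernel choice)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_star, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at the query point x_star."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    k_star = [rbf(x, x_star) for x in xs]
    alpha = solve_chol(L, ys)      # K^{-1} y
    v = solve_chol(L, k_star)      # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Hypothetical measurements of an unknown scalar function.
xs = [0.0, 1.0, 2.0]
ys = [math.sin(x) for x in xs]
mean, var = gp_posterior(xs, ys, 1.5)
print(f"posterior at 1.5: mean={mean:.3f}, variance={var:.3f}")
```

The posterior variance is the quantified model error: it is small near the measurements and returns to the prior variance far away from them, so adding a measurement near a query point reduces the uncertainty there.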

  20-22. Samples from the Gaussian process prior. [Plot: state over time.] The transition dynamics are correlated!

  23. Overview: understand model and learning dynamics; define safety, analyze a model for safety; algorithm to safely acquire data. Safe Model-based Reinforcement Learning.

  24. Safety definition: robust, control-invariant. [Figure labels: prior knowledge, unsafe.]

  25. Safety for learned models. Dynamics + Policy: Stability?

  26. Lyapunov functions [A.M. Lyapunov 1892]

  27. Lyapunov functions
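
The slide content on Lyapunov functions is graphical and did not survive in this transcript. As a reminder of the underlying idea, the sketch below checks the standard discrete-time decrease condition V(f(x)) < V(x) for x ≠ 0 on a set of sample points. The two example systems, the quadratic V, and the function names are assumptions for illustration, and a check on samples is evidence, not a proof.

```python
def V(x):
    """Candidate Lyapunov function: squared Euclidean norm (assumed choice)."""
    return x[0] ** 2 + x[1] ** 2

def f_stable(x):
    """Hypothetical closed-loop dynamics that contract toward the origin."""
    return (0.9 * x[0] + 0.1 * x[1], -0.1 * x[0] + 0.9 * x[1])

def f_unstable(x):
    """Hypothetical closed-loop dynamics that expand away from the origin."""
    return (1.1 * x[0], 1.1 * x[1])

def decreases_on_samples(f, V, samples):
    """True if V strictly decreases along one step of f at every nonzero sample."""
    return all(V(f(x)) < V(x) for x in samples if V(x) > 0.0)

# Uniform grid over [-1, 1]^2, used as the sample set.
samples = [(i / 10, j / 10) for i in range(-10, 11) for j in range(-10, 11)]

print(decreases_on_samples(f_stable, V, samples))
print(decreases_on_samples(f_unstable, V, samples))
```

For the first system one step scales V by 0.82 everywhere, so the condition holds at every nonzero sample; for the second it scales V by 1.21, so the check fails.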

  28. Region of attraction. [Figure labels: initial safe policy, unsafe.] Theorem (informally): Under suitable conditions can identify (near-)maximal subset of X on which π is stable, while never leaving the safe set. Reference: Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017.
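
A region of attraction can be estimated as the largest sublevel set of a Lyapunov function on which the decrease condition holds. The sketch below does this on a grid for a known one-dimensional system. It is a simplified illustration only: the cited work treats learned dynamics with uncertainty, which this sketch leaves out, and the example system, V, grid, and function names are assumptions.

```python
def f(x, dt=0.1):
    """Hypothetical closed-loop dynamics: Euler step of dx/dt = -x + x^3.

    The origin is stable, and x = +1 and x = -1 are unstable equilibria.
    """
    return x + dt * (-x + x ** 3)

def V(x):
    """Candidate Lyapunov function (assumed choice)."""
    return x * x

def largest_safe_level(f, V, grid):
    """Largest level c such that V decreases at every nonzero grid point with V(x) <= c."""
    points = sorted((x for x in grid if V(x) > 0.0), key=V)
    level = 0.0
    for x in points:
        if V(f(x)) >= V(x):   # decrease condition violated: stop growing the set
            break
        level = V(x)
    return level

grid = [i / 100 for i in range(-200, 201)]
c = largest_safe_level(f, V, grid)
print(f"estimated region of attraction: V(x) <= {c:.4f}")
```

Because the points are visited in order of increasing V, the returned level set contains only grid points at which V decreases, so trajectories starting inside it stay inside it. The estimate stops just short of the unstable equilibria at |x| = 1.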

  29. Illustration of safe learning. Need to safely explore! [Figure label: policy.] Reference: Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017.

  30. Illustration of safe learning. [Figure label: policy.] Reference: Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017.

  31. Lyapunov function. Finding the right Lyapunov function is difficult! Weights: positive-definite. Nonlinearities: trivial nullspace. [Figure label: decision boundary.] Reference: The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamic Systems, S.M. Richards, F. Berkenkamp, A. Krause, CoRL 2018.

  32. Overview: understand model and learning dynamics; define safety, analyze a model for safety; algorithm to safely acquire data. Safe Model-based Reinforcement Learning.

  33. Model predictive control. Makes decisions based on predictions about the future. Includes input / state constraints.
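
The two properties named on this slide, deciding from predictions and respecting input and state constraints, can be seen in a very small receding-horizon controller. This is a sketch under stated assumptions and not the controller used in the talk: the scalar integrator model, the three-valued input set, the quadratic cost, the state bound `x_max`, and the exhaustive search over input sequences are all illustrative choices.

```python
from itertools import product

def step(x, u, dt=0.5):
    """Hypothetical prediction model: a scalar integrator."""
    return x + dt * u

def mpc_action(x0, target, horizon=4, inputs=(-1.0, 0.0, 1.0), x_max=1.2):
    """Return the first input of the cheapest feasible input sequence.

    Input constraint: u is restricted to the finite set `inputs`.
    State constraint: every predicted state must satisfy x <= x_max.
    Falls back to u = 0.0 if no sequence is feasible.
    """
    best_cost, best_u0 = float("inf"), 0.0
    for seq in product(inputs, repeat=horizon):
        x, cost, feasible = x0, 0.0, True
        for u in seq:
            x = step(x, u)
            if x > x_max:
                feasible = False
                break
            cost += (x - target) ** 2
        if feasible and cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0

def run_closed_loop(x0=0.0, target=1.0, n_steps=6):
    """Re-plan at every step and apply only the first planned input."""
    xs, us = [x0], []
    for _ in range(n_steps):
        u = mpc_action(xs[-1], target)
        us.append(u)
        xs.append(step(xs[-1], u))
    return xs, us

xs, us = run_closed_loop()
print("states:", xs)
print("inputs:", us)
```

Exhaustive search only works for tiny problems like this one; practical MPC solves a constrained optimization problem at every step instead.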

  34. Model predictive control on a robot. Video at https://youtu.be/3xRNmNv5Efk. Reference: Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016.

  35. Model predictive control. Problem: True dynamics are unknown!

  36. Forward-propagating uncertainty. Outer approximation contains true dynamics for all time steps with probability at least [symbol missing in source]. Reference: Learning-based Model Predictive Control for Safe Exploration, T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018.
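
The idea of an outer approximation that contains the true trajectory at every time step can be sketched with interval arithmetic. This swaps in a simpler technique than the cited paper: a scalar system, a deterministic error bound on one model parameter, and intervals as the set representation are all assumptions made for illustration, as are the function names.

```python
def propagate_interval(lo, hi, u, a_lo, a_hi):
    """One-step outer bound for x_next = a * x + u with x in [lo, hi], a in [a_lo, a_hi].

    The product a * x is bilinear, so its extremes over the box lie at the corners.
    """
    corners = [a * x for a in (a_lo, a_hi) for x in (lo, hi)]
    return min(corners) + u, max(corners) + u

def reachable_tube(x0, inputs, a_hat, err):
    """Sequence of intervals that contain the state for any true a within a_hat +/- err."""
    lo = hi = x0
    tube = [(lo, hi)]
    for u in inputs:
        lo, hi = propagate_interval(lo, hi, u, a_hat - err, a_hat + err)
        tube.append((lo, hi))
    return tube

def simulate(x0, inputs, a_true):
    """Trajectory of the (normally unknown) true system."""
    xs = [x0]
    for u in inputs:
        xs.append(a_true * xs[-1] + u)
    return xs

inputs = [0.1, -0.2, 0.3, 0.0]                           # hypothetical planned inputs
tube = reachable_tube(1.0, inputs, a_hat=0.9, err=0.1)   # learned model with error bound
truth = simulate(1.0, inputs, a_true=0.95)               # true parameter inside the bound
for (lo, hi), x in zip(tube, truth):
    print(f"[{lo:.3f}, {hi:.3f}] contains {x:.3f}: {lo <= x <= hi}")
```

The intervals widen along the horizon because the model error compounds, which is one reason a controller planning with such a tube benefits from a model whose error shrinks with data.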

  37. Safe model-based learning framework. [Figure labels: exploration trajectory, safety trajectory, first step same, unsafe.] Theorem (informally): Under suitable conditions can always guarantee that we are able to return to the safe set.

  38. Safe model-based learning framework. [Figure labels: exploration trajectory, safety trajectory, first step same, unsafe.] Exploration limited by size of the safe set!

  39. How should we collect data for a control task?

  40. Optimizing expected performance. We design our cost functions to be helpful for optimization. Exploration objective. [Figure captions: driving too fast; slow down for safety; faster driving after learning.]

  41. Example. Video at https://youtu.be/3xRNmNv5Efk. Reference: Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016.

  42. Summary and Outlook. Understand model and learning dynamics: Gaussian processes. Define safety, analyze a model for safety: Lyapunov stability. Algorithm to safely acquire data: Model predictive control. Safe Model-based Reinforcement Learning. https://berkenkamp.me www.dynsyslab.org

  43. Thanks To… My Team – Industrial Partners – Funding Agencies. www.dynsyslab.org. My outstanding collaborators at U of T (Tim Barfoot) and ETH (Andreas Krause, Raffaello D’Andrea and the whole FMA team). Angela Schoellig
