Reinforcement Learning and Model Predictive Control

S. Gros, SARLEM, 8th of November, 2019

RL: optimizes a policy $\pi_\theta$ for a given cost using data
- Solid math for the tuning
- DNNs support approximations
- Safety? Explainability?

MPC: creates an “optimal” policy $\pi_\theta$ from a model, a cost and constraints
- Solid math... everywhere
- Safe & explainable
- The policy is only as good as the model

Why do we dislike RL? “Machine Learning is alchemy”, Ali Rahimi, 2018

How do we get a model?
- First principles: $x^{+} = f_\theta(x, u)$
- SYSID (fit to data)
- Robust MPC for uncertainties

Can we combine RL and MPC? Why? How?
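The SYSID route to the model $x^{+} = f_\theta(x, u)$ can be sketched as an ordinary least-squares fit on one-step transitions. This is a minimal example, assuming $f_\theta$ is linear, $f_\theta(x, u) = A x + B u$ with $\theta = (A, B)$. The "true" plant, the noise level and the helper name `fit_linear_model` are hypothetical and serve only for illustration.

```python
import numpy as np

def fit_linear_model(X, U, X_next):
    """Least-squares estimate of theta = (A, B) in x+ = A x + B u (hypothetical helper).

    X: (N, nx) states, U: (N, nu) inputs, X_next: (N, nx) successor states.
    """
    Z = np.hstack([X, U])                               # regressor rows [x_k, u_k]
    Theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)  # solves Z @ Theta ~= X_next
    nx = X.shape[1]
    A = Theta[:nx].T                                    # first nx rows hold A^T
    B = Theta[nx:].T                                    # remaining rows hold B^T
    return A, B

# Hypothetical "true" plant and noise level, used only to generate synthetic data.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.1]])
N = 200
X = rng.normal(size=(N, 2))
U = rng.normal(size=(N, 1))
X_next = X @ A_true.T + U @ B_true.T + 0.01 * rng.normal(size=(N, 2))

A_hat, B_hat = fit_linear_model(X, U, X_next)
print(np.round(A_hat, 3))
print(np.round(B_hat, 3))
```

In this sketch the data are independent random states and inputs, which keeps the regression well conditioned. With data from a single closed-loop trajectory, the input must be sufficiently exciting for $\theta$ to be identifiable.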

Can we combine RL and MPC? Why?
- RL: a big toolbox to “tune” the parameters $\theta$ from data (the tools need adaptation, though)
- MPC: a policy approximation $\pi_\theta$ that uses prior knowledge and ensures safety, comes with formal results, and is explainable

Some fun observations
- RL may “tune” the cost and constraints of the MPC (∼ learning a constraint tightening)
- RL may “tune” the SYSID model
- SYSID comes back in its “set-membership” version, with RL tuning the uncertainty set

Where is this going?

1. Safe Reinforcement Learning with stability guarantees using min-max Robust NMPC, S. Gros, M. Zanon, Transactions on Automatic Control (submitted)
2. Reinforcement Learning for mixed-integer problems with MPC-based function approximation, S. Gros, M. Zanon, IFAC 2020 (submitted)
3. Learning Real-Time Iteration NMPC, V. Kungurtsev, M. Zanon, S. Gros, IFAC 2020 (submitted)
4. Safe Reinforcement Learning via projection on a safe set: how to achieve optimality?, S. Gros, M. Zanon, IFAC 2020 (submitted)
5. Towards Safe Reinforcement Learning Using NMPC and Policy Gradients: Part I - Stochastic case, S. Gros, M. Zanon, Transactions on Automatic Control (submitted, arxiv.org/abs/1906.04057)
6. Towards Safe Reinforcement Learning Using NMPC and Policy Gradients: Part II - Deterministic Case, S. Gros, M. Zanon, Transactions on Automatic Control (submitted, arxiv.org/abs/1906.04034)
7. Safe Reinforcement Learning Using Robust MPC, S. Gros, M. Zanon, Transactions on Automatic Control, 2019 (submitted, arxiv.org/abs/1906.04005)
8. Practical Reinforcement Learning of Stabilizing Economic MPC, M. Zanon, S. Gros, A. Bemporad, European Control Conference 2019
9. Data-driven Economic NMPC using Reinforcement Learning, S. Gros, M. Zanon, Transactions on Automatic Control, 2019
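The combination idea in this talk treats the MPC as the parameterized policy $\pi_\theta$ and lets RL tune $\theta$ from closed-loop data, including the model parameters ("RL may tune the SYSID model"). The sketch below illustrates that loop under strong simplifying assumptions:

- The plant is a hypothetical scalar system $x^{+} = 1.2x + u$.
- The MPC is unconstrained LQ-MPC with horizon 10 whose prediction model is $x^{+} = \theta x + u$.
- A finite-difference search on the closed-loop cost stands in for RL. This is not the Q-learning or policy-gradient methods used in the works listed above.

Constraints, the main reason to use MPC, are omitted to keep the example short. The names `mpc_gain`, `closed_loop_cost` and `tune` are hypothetical.

```python
# Hypothetical scalar example: true plant x+ = A_TRUE*x + B*u,
# while the MPC predicts with the model x+ = theta*x + B*u.
Q, R, B = 1.0, 0.1, 1.0   # stage cost Q*x^2 + R*u^2 and input gain (assumed values)
A_TRUE = 1.2              # plant parameter unknown to the MPC (open-loop unstable)
HORIZON = 10              # MPC prediction horizon N (assumed)

def mpc_gain(theta, n=HORIZON):
    """First-stage feedback gain K_0 of an unconstrained LQ-MPC built on model theta."""
    p = Q  # terminal weight P_N (assumed equal to the stage weight Q)
    for _ in range(n - 1):  # Riccati recursion computing P_{N-1}, ..., P_1
        p = Q + theta**2 * p - (theta * B * p) ** 2 / (R + B**2 * p)
    return theta * B * p / (R + B**2 * p)  # MPC applies u_0 = -K_0 * x_0

def closed_loop_cost(theta, x0=1.0, steps=30):
    """Cost incurred when pi_theta runs on the true plant (what RL would measure)."""
    k, x, cost = mpc_gain(theta), x0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += Q * x**2 + R * u**2
        x = A_TRUE * x + B * u
    return cost

def tune(theta, lr=0.1, eps=1e-4, iters=200):
    """Finite-difference gradient descent on theta (a simple stand-in for RL)."""
    for _ in range(iters):
        grad = (closed_loop_cost(theta + eps) - closed_loop_cost(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta

theta0 = 0.8                 # poor prior model, e.g. from an inaccurate SYSID
theta_rl = tune(theta0)
J_before = closed_loop_cost(theta0)
J_after = closed_loop_cost(theta_rl)
print(f"theta: {theta0} -> {theta_rl:.3f}, closed-loop cost: {J_before:.3f} -> {J_after:.3f}")
```

In this idealized, noise-free and unconstrained setting, the model that minimizes the closed-loop cost lands close to the true plant parameter. In general, the model preferred by closed-loop performance and the one preferred by prediction-error fitting need not agree.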
