Model Learning for Long-term Safe Control in Changing Environments
Christopher D. McKinnon and Angela P. Schoellig
CPS V&V I&F Workshop, 2018

Platform and Operating Conditions

Control Approach: Stochastic MPC
• A predictive controller that assumes a probabilistic model for the robot dynamics
• The control problem is a constrained optimization problem
• Probabilistic chance constraints → deterministic constraints based on predicted uncertainty
→ Depends on an accurate model for robot dynamics!
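The step from chance constraints to deterministic constraints can be illustrated with a minimal sketch. It assumes a scalar, Gaussian prediction and a one-sided limit; the function names and the numbers in the usage note are hypothetical and are not taken from the talk.

```python
from statistics import NormalDist


def tightened_bound(limit: float, sigma: float, delta: float) -> float:
    """Deterministic bound on the predicted mean (assumes a Gaussian prediction).

    For x ~ N(mu, sigma^2), the chance constraint P(x <= limit) >= 1 - delta
    holds exactly when mu <= limit - z * sigma, where z is the (1 - delta)
    quantile of the standard normal distribution.
    """
    z = NormalDist().inv_cdf(1.0 - delta)
    return limit - z * sigma


def satisfies_chance_constraint(mu: float, sigma: float,
                                limit: float, delta: float) -> bool:
    """Check the chance constraint through its deterministic equivalent."""
    return mu <= tightened_bound(limit, sigma, delta)
```

For example, with a limit of 1.0, a predicted standard deviation of 0.1 and delta = 0.05, the mean must stay below roughly 0.84. A larger predicted uncertainty shrinks the allowed region, which is why the controller relies on an accurate model.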

What is the main challenge?
Dynamics can be affected by internal and external factors that are out of our control.
Examples: tyre tread, unexpected payload, surface type

Probabilistic Modeling for Robots in Changing Environments
Learning, Control
Single Model
• Provably Safe and Robust Learning-based MPC [Tomlin et al, 2013]
• Robust Constrained Learning-based Control [Ostafew et al, 2015]
• Information Theoretic MPC for Model-Based Reinforcement Learning [Theodorou et al, 2017]
Fixed Number of Models
• Robust Trajectory Planning for Autonomous Parafoils Under Wind Uncertainty [How et al, 2013]
• Bayesian Optimization with Automatic Prior Selection for Data-efficient Direct Policy Search [Mouret et al, 2017]
Unknown Number of Models
• Optimism-Driven Exploration [Abbeel et al, 2015]
• Nonparametric Bayesian Learning of Switching Linear Dynamical Systems. DP-GPMM [Willsky et al, 2017]
• Learning Multi-Modal Models for Robot Dynamics with a Mixture of Gaussian Process Experts [McKinnon et al, 2017]
• Experience Recommendation for Long Term Safe Learning-based Model Predictive Control in Changing Operating Conditions [McKinnon et al, 2018]
• Learn fast, forget slow: safe predictive control for systems with locally linear actuator dynamics performing repetitive tasks [McKinnon et al, 2019]
My work has focused on the case when the robot can encounter new operating conditions during its deployment.

We build on a GP-based Approach [Ostafew et al, 2015]
[Equation: model terms labelled deterministic mean; additive, Gaussian noise; Gaussian process; multi-modal Gaussian process with context/mode]
• This works well if g(x) is fixed.
• What if g(x) can change (e.g. it snows)?
C. Ostafew, A. P. Schoellig, and T. Barfoot. Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. Intl. Journal of Robotics Research (IJRR), 35(13):1547-1563, 2016

Repetitive Path Following
• At any point along the path, we only have to model the dynamics for the upcoming maneuver and not the entire state-action space. → We can use local models, which are more computationally efficient.
• It is easy to store data indexed by location along the path. → We get location-specific information for free.
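Storing data indexed by location along the path can be sketched as follows. This is an illustrative data structure only: the class names, the fields of `Experience`, and the use of distance along the path as the index are assumptions, not details given in the talk.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    s: float            # distance along the path [m] (assumed index)
    state: tuple        # robot state when the command was applied
    command: tuple      # command that was applied
    next_state: tuple   # state observed one time step later


class RunMemory:
    """Experiences from one run, kept sorted by distance along the path."""

    def __init__(self):
        self._positions = []
        self._experiences = []

    def add(self, exp):
        # Insert so that both lists stay sorted by path position.
        i = bisect_right(self._positions, exp.s)
        self._positions.insert(i, exp.s)
        self._experiences.insert(i, exp)

    def upcoming(self, s_now, horizon):
        """Return experiences with s in [s_now, s_now + horizon]."""
        lo = bisect_left(self._positions, s_now)
        hi = bisect_right(self._positions, s_now + horizon)
        return self._experiences[lo:hi]
```

A local model for the upcoming maneuver would then be fit only to `memory.upcoming(s_now, horizon)`, which keeps the data set small regardless of how long the full path is.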

Repetitive Path Following with Changing Dynamics

Our Approach: Experience Recommendation for the GP
• Each run, the robot stores a new set of experiences.
• Our goal is to choose useful experiences from past runs over the upcoming section of the path to construct a model for control.

Our Approach: Experience Recommendation for the GP
Fit a local GP to each run in memory. The GPs let us compare the dynamics in each run.
1. Which runs have data that is safe to use?
2. Which run is most similar?

1. Which Runs Have Data That is Safe to Use?
Safety check: are the ‘r’-𝜏 bounds reasonable?
Reject if: [Equation: rejection condition]
[Figure: run 1 and run 2 compared against recent data, with an outlier marked]
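One plausible form of such a safety check is sketched below. The rejection rule used here (reject a run when too large a fraction of recent measurements falls outside r predicted standard deviations of that run's model) is an assumption, as are the function name and the default thresholds; the exact condition on the slide was not captured.

```python
def run_is_safe(observed, predicted_mean, predicted_std,
                r=3.0, max_outlier_fraction=0.1):
    """Return True if a run's model explains the recent data well enough.

    observed, predicted_mean and predicted_std are equal-length sequences:
    recent measurements and the candidate run's predictions for them.
    The r and max_outlier_fraction defaults are illustrative assumptions.
    """
    if not observed:
        raise ValueError("need at least one recent measurement")
    outliers = sum(
        1
        for y, mean, std in zip(observed, predicted_mean, predicted_std)
        if abs(y - mean) > r * std
    )
    return outliers / len(observed) <= max_outlier_fraction
```

With four recent measurements of which one lies far outside the bounds, the outlier fraction is 0.25, so the run is rejected at the default threshold of 0.1.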

2. Which Run is Most Similar?
[Equation: run similarity measure, with terms labelled run prior and likelihood of recent data]
Likelihood is calculated using local GPs.
[Figure: run 1 and run 2 compared against recent data]
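The labels on this slide (run prior, likelihood of recent data) suggest a Bayesian comparison of runs. The sketch below assumes the similarity measure is the posterior probability of each run, proportional to prior times likelihood, with independent Gaussian predictions per measurement. Both assumptions and all names are illustrative, not the authors' stated formulation.

```python
import math


def gaussian_log_likelihood(observed, mean, std):
    """Log-likelihood of recent data under independent Gaussian predictions."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((y - m) / s) ** 2
        for y, m, s in zip(observed, mean, std)
    )


def run_posterior(log_likelihoods, priors):
    """Normalized posterior over runs: prior times likelihood.

    Works in log space and subtracts the maximum before exponentiating,
    so very small likelihoods do not underflow to zero for every run.
    Priors must be strictly positive.
    """
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

The run with the highest posterior would be taken as the most similar one, restricted to runs that passed the safety check.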

Constructing the Local GP for Control
• Now we have matched recent experience to the dynamics in a previous run
• Use data from that run to construct a GP for control

Experimental Setup
• Platform: Clearpath Grizzly
• Configurations: Nominal, Loaded, Altered
• Experiment: 30 runs; parking lot, ~42 m course; switch configuration every two runs
• Baseline: use experiences from the last run
[Figure: Clearpath Grizzly in the Loaded configuration]

Experimental Results: Long-term Autonomy in Changing Conditions
Compared the cost of traversing a path with the robot in three different configurations.
[Figure legend: Modes: Nominal, Loaded, Artificial Disturbance]
• The proposed method improves significantly after just one run in each mode
• The proposed method can search up to 300 previous runs for relevant experience
• This enables truly long-term safe learning.

Summary
Experience Recommendation for Long Term Safe Learning-based Model Predictive Control in Changing Operating Conditions [McKinnon et al, 2018]
Main Results:
• Model selection criteria linked to controller safety requirements
• Resorts to a conservative model when new dynamics are encountered
• Runs in a separate thread to the control loop
Main Limitations:
• It was hard to get the GP to model the dynamics in a wide range of operating conditions
• The model only improves after each run
• Proof by experiment…

To improve: new assumptions about robot dynamics
Learn ‘actuator dynamics’ rather than a general additive model error. For a unicycle-type robot like the Grizzly, this is:
[Equation]
We try to learn something simpler so we can do so more reliably.
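To make the idea of separating kinematics from actuator dynamics concrete, here is a minimal sketch. The kinematic part is the standard unicycle model with forward Euler integration. The actuator part, `first_order_actuator`, is a hypothetical first-order lag used only as a stand-in for whatever the learned model predicts; its form and gains are assumptions, not the equation from the slide.

```python
import math


def unicycle_step(state, accel, dt):
    """One forward-Euler step of a unicycle with velocity states.

    state = (x, y, theta, v, omega); accel = (v_dot, omega_dot).
    The accelerations are the quantities an actuator model would supply.
    """
    x, y, theta, v, omega = state
    v_dot, omega_dot = accel
    return (
        x + dt * v * math.cos(theta),
        y + dt * v * math.sin(theta),
        theta + dt * omega,
        v + dt * v_dot,
        omega + dt * omega_dot,
    )


def first_order_actuator(v, omega, v_cmd, omega_cmd,
                         gain_v=2.0, gain_omega=2.0):
    """Hypothetical actuator dynamics: velocities lag their commands."""
    return (gain_v * (v_cmd - v), gain_omega * (omega_cmd - omega))
```

In this split, the known kinematics never need to be learned; only the mapping from commands and current velocities to accelerations does, which is a much smaller learning problem than a general additive model error.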

New Learning Model: Weighted Bayesian Linear Regression
Assume a simpler form for the model with no hyperparameters.
We still want to learn from all past data, so include a weight for each experience.
After some math, we can estimate the distribution of w and 𝜏².
Compared to the previous approach:
1. Simpler model (vs. GP) → fitting the model is more reliable
2. Predicting acceleration instead of velocity → easier to model
3. wBLR scales better with # data pts → can leverage more data
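The "some math" is not shown on the slide. One common way to obtain such an estimate is the conjugate Normal-inverse-gamma update, with each data point's likelihood raised to the power of its weight. The sketch below uses that approach for a single scalar feature; the weak prior values, the names, and the reading of the second estimated quantity as a noise variance are all assumptions rather than the authors' formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posterior:
    w_mean: float       # posterior mean of the regression weight w
    w_precision: float  # precision of w, in units of 1 / noise variance
    a: float            # inverse-gamma shape for the noise variance
    b: float            # inverse-gamma scale for the noise variance

    def noise_variance_mean(self) -> float:
        """Posterior mean of the noise variance (defined for a > 1)."""
        return self.b / (self.a - 1.0)


def weighted_blr(xs, ys, weights, w0=0.0, precision0=1e-6, a0=2.0, b0=1e-3):
    """Weighted Bayesian linear regression y = w * x + noise, scalar x.

    Each weight acts as a fractional count for its data point: weight 0
    ignores the point, weight 1 counts it fully. The prior parameters
    (w0, precision0, a0, b0) are illustrative assumptions.
    """
    sxx = sum(l * x * x for l, x in zip(weights, xs))
    sxy = sum(l * x * y for l, x, y in zip(weights, xs, ys))
    syy = sum(l * y * y for l, y in zip(weights, ys))
    precision_n = precision0 + sxx
    w_n = (precision0 * w0 + sxy) / precision_n
    a_n = a0 + 0.5 * sum(weights)
    # Weighted residual term; clamped at zero to absorb rounding error.
    residual = syy + precision0 * w0 * w0 - precision_n * w_n * w_n
    b_n = b0 + 0.5 * max(residual, 0.0)
    return Posterior(w_n, precision_n, a_n, b_n)
```

The update only needs three weighted sums, so its cost grows linearly with the number of data points, whereas exact GP regression grows cubically. That is consistent with the slide's point that wBLR can leverage more data.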

Re-visit Model Learning
1. Use data from the live run → Fast adaptation to new conditions
2. Use data from previous runs to anticipate repetitive changes
** wBLR uses a weighted combination of all data instead of a subset like the GP

Illustrative Example with an Artificial Disturbance

The Effect of Each Component of the Algorithm
Fast adaptation works well but long-term learning anticipates repetitive changes in the dynamics. The combination achieves the best performance.

A More Challenging Example: GP vs. wBLR
[Figure: Model Performance Metrics]
The wBLR-based model degrades more ‘gracefully’ than the GP.

Summary
Learn Fast, Forget Slow: Safe Predictive Control for Systems with Locally Linear Actuator Dynamics Performing Repetitive Tasks [McKinnon et al, 2019, submitted]
Main Results:
• Adapt quickly to changes in dynamics
• Leverage past data to anticipate repetitive changes in the dynamics
• Provide an accurate estimate of model uncertainty
Main Limitations:
• We don’t really handle how ‘fast’ the dynamics can change → ??
• Only provide safety for the next ~3 seconds → Terminal safe set?
• All dynamics are currently lumped into one model → Scenario MPC?
• The controller is unaware of how motion affects localization (vision)

Progress
Thanks and hope to see you at the coffee break!
