Model Learning for Long-term Safe Control in Changing Environments (PowerPoint presentation)

SLIDE 1

CPS V&V I&F Workshop, 2018

Model Learning for Long-term Safe Control in Changing Environments

Christopher D. McKinnon and Angela P. Schoellig

SLIDE 2

Platform and Operating Conditions

Chris McKinnon

SLIDE 3

Control Approach: Stochastic MPC

  • The control problem is a constrained optimization problem.
  • A predictive controller that assumes a probabilistic model for the robot dynamics.
  • Probabilistic chance constraints → deterministic constraints based on the predicted uncertainty.
  • This depends on an accurate model of the robot dynamics!
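As a rough illustration of turning a chance constraint into a deterministic one, the sketch below assumes a scalar Gaussian prediction x ~ N(μ, σ²) and a one-sided limit x_max (for example, a hypothetical bound on lateral path-tracking error); the function names are hypothetical. Requiring P(x ≤ x_max) ≥ p is then equivalent to μ + z_p σ ≤ x_max, where z_p is the standard-normal p-quantile, so a larger predicted uncertainty tightens the constraint on the mean.

```python
from statistics import NormalDist


def tightened_limit(x_max: float, sigma: float, p: float) -> float:
    """Deterministic limit on the predicted mean that enforces
    P(x <= x_max) >= p when x ~ N(mu, sigma^2)."""
    z_p = NormalDist().inv_cdf(p)  # standard-normal p-quantile
    return x_max - z_p * sigma


def chance_constraint_ok(mu: float, sigma: float, x_max: float, p: float) -> bool:
    """True if the Gaussian prediction satisfies the chance constraint."""
    return mu <= tightened_limit(x_max, sigma, p)


if __name__ == "__main__":
    # A more uncertain prediction must keep its mean further from the limit.
    for sigma in (0.05, 0.1, 0.2):
        print(sigma, round(tightened_limit(1.0, sigma, 0.95), 3))
```

This is why the controller depends on an accurate model: an overconfident σ makes the tightened limit too loose.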

SLIDE 4

What is the main challenge?

Dynamics can be affected by internal and external factors that are out of our control, such as tyre tread, an unexpected payload, or the surface type.

SLIDE 5

Probabilistic Modeling for Robots in Changing Environments

[Figure: learning control approaches grouped by single model, fixed number of models, and unknown number of models]

  • Optimism-Driven Exploration [Abbeel et al, 2015]
  • Provably Safe and Robust Learning-based MPC [Tomlin et al, 2013]
  • Robust Constrained Learning-based Control [Ostafew et al, 2015]
  • Information Theoretic MPC for Model-Based Reinforcement Learning [Theodorou et al, 2017]
  • Robust Trajectory Planning for Autonomous Parafoils Under Wind Uncertainty [How et al, 2013]
  • Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search [Mouret et al, 2017]
  • Nonparametric Bayesian Learning of Switching Linear Dynamical Systems. DP-GPMM [Willsky et al, 2017]
  • Learning Multi-Modal Models for Robot Dynamics with a Mixture of Gaussian Process Experts [McKinnon et al, 2017]
  • Experience Recommendation for Long-Term Safe Learning-based Model Predictive Control in Changing Operating Conditions [McKinnon et al, 2018]
  • Learn Fast, Forget Slow: Safe Predictive Control for Systems with Locally Linear Actuator Dynamics Performing Repetitive Tasks [McKinnon et al, 2019]

My work focuses on the case where the robot can encounter new operating conditions during its deployment.

SLIDE 6

We build on a GP-based Approach [Ostafew et al, 2015]

[Figure: dynamics model combining a deterministic mean with a Gaussian process term g(x)]

  • This works well if g(x) is fixed.
  • What if g(x) can change (e.g. it snows)?
  • C. Ostafew, A. P. Schoellig, and T. Barfoot. Robust Constrained Learning-based NMPC Enabling Reliable Mobile Robot Path Tracking. Intl. Journal of Robotics Research (IJRR), 35(13):1547–1563, 2016.

[Figure: extended model with a context/mode variable: multi-modal Gaussian process with additive Gaussian noise]
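The additive structure (deterministic mean plus a learned GP term) can be sketched as follows. The simplifications are ours, not the slides': the GP models residuals x_{k+1} − f(x_k, u_k), it takes a single scalar input (here the command u), and the squared-exponential kernel with hypothetical fixed hyperparameters (`length_scale`, `signal_var`, `noise_var`) stands in for whatever was used in practice. A small pure-Python Cholesky keeps the example dependency-free.

```python
import math
from typing import Callable, List, Sequence, Tuple


def se_kernel(a: float, b: float, length_scale: float, signal_var: float) -> float:
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = K, for symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L y = b."""
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L: List[List[float]], y: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(inputs: Sequence[float], residuals: Sequence[float], query: float,
                 length_scale: float = 1.0, signal_var: float = 1.0,
                 noise_var: float = 1e-4) -> Tuple[float, float]:
    """Zero-mean GP posterior mean and variance of g(query) given residual data."""
    n = len(inputs)
    K = [[se_kernel(inputs[i], inputs[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, residuals))  # (K + noise I)^{-1} y
    k_star = [se_kernel(query, a, length_scale, signal_var) for a in inputs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # L^{-1} k_star
    var = se_kernel(query, query, length_scale, signal_var) - sum(vi * vi for vi in v)
    return mean, var


def predict_next_state(x: float, u: float, f_nominal: Callable[[float, float], float],
                       inputs: Sequence[float], residuals: Sequence[float]) -> Tuple[float, float]:
    """Mean f(x, u) + E[g(u)] and variance Var[g(u)]; using u as GP input is an assumption."""
    g_mean, g_var = gp_posterior(inputs, residuals, u)
    return f_nominal(x, u) + g_mean, g_var
```

Far from the stored data the GP falls back to its prior (zero correction, full prior variance), which is exactly the behaviour that becomes a problem when g(x) changes.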

SLIDE 7

Repetitive Path Following

  • At any point along the path, we only have to model the dynamics for the upcoming maneuver, not the entire state-action space. → We can use local models, which are more computationally efficient.
  • It is easy to store data indexed by location along the path. → We get location-specific information for free.
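Indexing stored data by location along the path could look like the sketch below. The class and method names, the use of arc length `s` as the index, and the fixed look-ahead window are hypothetical choices for illustration; the query returns, for every stored run, the experiences that fall inside the upcoming section of the path.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from typing import Dict, List, Tuple

# One experience: (arc length along the path [m], measured quantity); hypothetical format.
Experience = Tuple[float, float]


class PathExperienceMemory:
    """Stores experiences per run, kept sorted by arc length along a repeated path."""

    def __init__(self) -> None:
        self._runs: Dict[int, List[Experience]] = defaultdict(list)

    def add(self, run_id: int, s: float, value: float) -> None:
        insort(self._runs[run_id], (s, value))  # keep each run sorted by s

    def upcoming(self, s_now: float, horizon: float) -> Dict[int, List[Experience]]:
        """Experiences from every run with s in [s_now, s_now + horizon]."""
        out: Dict[int, List[Experience]] = {}
        for run_id, exps in self._runs.items():
            lo = bisect_left(exps, (s_now, float("-inf")))
            hi = bisect_right(exps, (s_now + horizon, float("inf")))
            if hi > lo:
                out[run_id] = exps[lo:hi]
        return out
```

Because each lookup is a binary search per run, retrieving the local data for the next maneuver stays cheap even as the number of stored runs grows.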

SLIDE 8

Repetitive Path Following with Changing Dynamics

SLIDE 9

Our Approach: Experience Recommendation for the GP

  • Each run, the robot stores a new set of experiences.
  • Our goal is to choose useful past experiences over the upcoming section of the path and use them to construct a model for control.

SLIDE 10

Our Approach: Experience Recommendation for the GP

[Figure: local GPs fitted to run 1 and run 2]

Fit a local GP to each run in memory. The GPs let us compare the dynamics in each run and answer two questions:

  • 1. Which runs have data that is safe to use?
  • 2. Which run is most similar?

SLIDE 11

[Figure: local GPs for run 1 and run 2 compared with recent data]

  • 1. Which Runs Have Data That is Safe to Use?

Safety check: are the r-𝜏 bounds reasonable?

Reject a run if the recent data is an outlier with respect to that run.
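One possible reading of this check, written as a sketch: we assume 𝜏 denotes the predictive standard deviation of a stored run's local GP at the recent inputs and r is a bound multiplier, and we reject the run when too many recent observations fall outside mean ± r𝜏. The function name and the `max_outlier_fraction` threshold are hypothetical, not taken from the slides.

```python
from typing import Sequence, Tuple


def run_is_safe(predictions: Sequence[Tuple[float, float]],
                observations: Sequence[float],
                r: float = 3.0,
                max_outlier_fraction: float = 0.05) -> bool:
    """Hypothetical check: keep a stored run only if most recent observations
    lie inside that run's mean +/- r * tau prediction bounds.

    predictions: (mean, tau) of the run's local GP at each recent input,
    where tau is taken to be the predictive standard deviation (an assumption).
    """
    outliers = sum(1 for (mean, tau), y in zip(predictions, observations)
                   if abs(y - mean) > r * tau)
    return outliers <= max_outlier_fraction * len(observations)
```

A run whose bounds are too narrow for what the robot is currently experiencing would make the controller overconfident, which is why it is rejected rather than merely down-ranked.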

SLIDE 12

[Figure: local GPs for run 1 and run 2 compared with recent data]

  • 2. Which Run is Most Similar?

Run similarity measure: run prior × likelihood of recent data.

The likelihood is calculated using the local GPs.
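The similarity measure can be sketched as a log-posterior score: the log prior of each run plus the log-likelihood of the recent data under that run's local GP predictions. For simplicity the sketch treats recent observations as independent given each GP's predictive mean and variance, which is an approximation we assume; the function names and the run labels used in testing are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def gaussian_log_pdf(y: float, mean: float, var: float) -> float:
    """Log density of y under N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def run_scores(priors: Dict[str, float],
               predictions: Dict[str, List[Tuple[float, float]]],
               recent: List[float]) -> Dict[str, float]:
    """log p(run) + sum_i log N(y_i | mean_i, var_i) for each stored run."""
    scores = {}
    for run, prior in priors.items():
        loglik = sum(gaussian_log_pdf(y, m, v)
                     for (m, v), y in zip(predictions[run], recent))
        scores[run] = math.log(prior) + loglik
    return scores


def most_similar_run(priors: Dict[str, float],
                     predictions: Dict[str, List[Tuple[float, float]]],
                     recent: List[float]) -> str:
    """Run with the highest prior-times-likelihood score."""
    scores = run_scores(priors, predictions, recent)
    return max(scores, key=scores.get)
```

Working in log space avoids underflow when the recent data window contains many points.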

SLIDE 13

Constructing the Local GP for Control


  • Now we have matched recent experience to the dynamics in a previous run
  • Use data from that run to construct a GP for control
SLIDE 14

Experimental Setup

  • Platform:
    • Clearpath Grizzly
  • Configurations:
    • Nominal, Loaded, Altered
  • Experiment:
    • 30 runs
    • Parking lot, ~42 m course
    • Switch configuration every two runs
  • Baseline:
    • Use experiences from the last run

Clearpath Grizzly in the Loaded configuration

SLIDE 15

Experimental Results: Long-term Autonomy in Changing Conditions

We compared the cost of traversing a path with the robot in three different configurations.

[Figure: results for the Nominal, Loaded, and Artificial Disturbance modes]

  • The proposed method improves significantly after just one run in each mode.
  • The proposed method can search up to 300 previous runs for relevant experience.
  • This enables truly long-term safe learning.

SLIDE 16

Summary


Main Results:

  • Model selection criteria linked to controller safety requirements
  • Resorts to a conservative model when new dynamics are encountered
  • Runs in a separate thread from the control loop
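Running the recommender in a separate thread means the control loop never waits for model selection. A minimal sketch of that pattern, assuming Python threads and hypothetical names (`ModelSlot`, `start_recommender`): the controller always reads the most recently published model, starting from a conservative one until a recommendation is ready.

```python
import threading
from typing import Callable, Generic, TypeVar

M = TypeVar("M")


class ModelSlot(Generic[M]):
    """Thread-safe holder for the most recently published model."""

    def __init__(self, initial: M) -> None:
        self._lock = threading.Lock()
        self._model = initial

    def publish(self, model: M) -> None:
        with self._lock:
            self._model = model

    def latest(self) -> M:
        with self._lock:
            return self._model


def start_recommender(slot: "ModelSlot[M]", build_model: Callable[[], M]) -> threading.Thread:
    """Build the recommended model off the control thread, then publish it."""
    worker = threading.Thread(target=lambda: slot.publish(build_model()), daemon=True)
    worker.start()
    return worker
```

Each control cycle would call `slot.latest()` and proceed with whatever model is current; the lock is held only for a pointer swap, so the control loop's timing is unaffected by how long the recommendation takes.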

Main Limitations:

  • It was hard to get the GP to model the dynamics in a wide range of operating conditions

  • The model only improves after each run
  • Proof by experiment…

Experience Recommendation for Long-Term Safe Learning-based Model Predictive Control in Changing Operating Conditions [McKinnon et al, 2018]

SLIDE 17

To Improve: New Assumptions About the Robot Dynamics

Learn ‘actuator dynamics’ rather than a general additive model error. For a unicycle-type robot like the Grizzly, this is:

We try to learn something simpler so that we can learn it more reliably.
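As an illustration only, one common way to split a unicycle model is into exactly known kinematics driven by the actual speeds v and ω, plus actuator dynamics h_v and h_ω that map commanded speeds to actual speeds; only the latter would be learned. The maps h_v and h_ω, the commanded speeds v^cmd and ω^cmd, the time step Δt, and the discrete-time form are our assumptions, not the slide's notation.

```latex
\begin{aligned}
x_{k+1} &= x_k + \Delta t\, v_k \cos\theta_k, \\
y_{k+1} &= y_k + \Delta t\, v_k \sin\theta_k, \\
\theta_{k+1} &= \theta_k + \Delta t\, \omega_k, \\
v_{k+1} &= h_v\big(v_k,\, v^{\mathrm{cmd}}_k\big), \qquad
\omega_{k+1} = h_\omega\big(\omega_k,\, \omega^{\mathrm{cmd}}_k\big).
\end{aligned}
```

Since the first three lines are known exactly, the learning problem shrinks from a correction on the full state to two low-dimensional maps.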

SLIDE 18

New Learning Model: Weighted Bayesian Linear Regression

  • Assume a simpler form for the model with no hyperparameters.
  • We still want to learn from all past data, so we include a weight for each experience.
  • After some math, we can estimate the distribution of w and 𝜏².

Compared to the previous approach:

  • 1. Simpler model (vs. GP) → fitting the model is more reliable
  • 2. Predicting acceleration instead of velocity → easier to model
  • 3. wBLR scales better with the number of data points → can leverage more data
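A minimal sketch of a weighted linear regression in this spirit, assuming a noninformative prior so that no tuning parameters appear and assuming 𝜏² denotes the noise variance: the estimate of w is the weighted least-squares solution, 𝜏² is estimated from the weighted residuals, and the covariance of w scales with it. The two-feature form (for example, a hypothetical feature vector [commanded minus actual speed, 1] predicting acceleration) and the function name are illustrative assumptions, not the paper's exact model.

```python
from typing import List, Sequence, Tuple

Matrix2 = List[List[float]]


def wblr_fit(features: Sequence[Tuple[float, float]], targets: Sequence[float],
             weights: Sequence[float]) -> Tuple[Tuple[float, float], Matrix2, float]:
    """Weighted linear regression y ~ w^T x with one weight per experience.

    Returns (w_hat, cov_w, tau2_hat): the weighted least-squares estimate,
    its covariance, and the noise-variance estimate from weighted residuals.
    """
    # Weighted sufficient statistics A = X^T Lambda X and b = X^T Lambda y.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x1, x2), y, lam in zip(features, targets, weights):
        a11 += lam * x1 * x1
        a12 += lam * x1 * x2
        a22 += lam * x2 * x2
        b1 += lam * x1 * y
        b2 += lam * x2 * y
    det = a11 * a22 - a12 * a12
    if det <= 0.0:
        raise ValueError("weighted features do not determine w")
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]  # A^{-1}
    w = (inv[0][0] * b1 + inv[0][1] * b2, inv[1][0] * b1 + inv[1][1] * b2)
    rss = sum(lam * (y - w[0] * x1 - w[1] * x2) ** 2
              for (x1, x2), y, lam in zip(features, targets, weights))
    dof = sum(weights) - 2.0  # effective sample size minus number of parameters
    if dof <= 0.0:
        raise ValueError("total weight too small to estimate the noise variance")
    tau2 = rss / dof
    cov = [[tau2 * inv[i][j] for j in range(2)] for i in range(2)]
    return w, cov, tau2
```

The fit only needs running sums, so adding another experience costs constant time regardless of how much data is stored, which is the scaling advantage over a GP.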
SLIDE 19

Re-visit Model Learning


  • 1. Use data from the live run → Fast adaptation to new conditions
  • 2. Use data from previous runs to anticipate repetitive changes

Note: wBLR uses a weighted combination of all data rather than a selected subset, as the GP approach does.
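To make "a weighted combination of all data" concrete, here is one hypothetical weighting rule in the spirit of "learn fast, forget slow": experiences from the live run decay quickly with age, so recent data dominate, while experiences from earlier runs are scaled down and decay slowly with the number of runs since they were recorded. The rule and its parameters (`live_half_life_s`, `run_half_life`, `past_scale`) are assumptions for illustration, not the scheme used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    run: int   # index of the run the experience was recorded in
    t: float   # time within that run [s]


def experience_weights(exps: List[Experience], current_run: int, t_now: float,
                       live_half_life_s: float = 2.0,
                       run_half_life: float = 20.0,
                       past_scale: float = 0.1) -> List[float]:
    """Hypothetical weights: fast forgetting within the live run, slow across runs."""
    weights = []
    for e in exps:
        if e.run == current_run:
            age = max(0.0, t_now - e.t)
            weights.append(0.5 ** (age / live_half_life_s))
        else:
            runs_ago = current_run - e.run
            weights.append(past_scale * 0.5 ** (runs_ago / run_half_life))
    return weights
```

These weights could be passed directly as the per-experience weights of a weighted regression, so fresh live data drive fast adaptation while older runs still contribute to anticipating repetitive changes.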

SLIDE 20

Illustrative Example with an Artificial Disturbance

SLIDE 21

The Effect of Each Component of the Algorithm

Fast adaptation works well, but long-term learning anticipates repetitive changes in the dynamics. The combination achieves the best performance.

SLIDE 22

A More Challenging Example: GP vs. wBLR

The wBLR-based model degrades more gracefully than the GP.

[Figure: model performance metrics]

SLIDE 23

Summary


Main Results:

  • Adapt quickly to changes in dynamics
  • Leverage past data to anticipate repetitive changes in the dynamics
  • Provide an accurate estimate of model uncertainty

Main Limitations:

  • We do not explicitly handle how fast the dynamics can change → open question
  • Only provide safety for the next ~3 seconds → Terminal safe set?
  • All dynamics are currently lumped into one model → Scenario MPC?
  • The controller is unaware of how motion affects localization (vision)

Learn Fast, Forget Slow: Safe Predictive Control for Systems with Locally Linear Actuator Dynamics Performing Repetitive Tasks [McKinnon et al, 2019, submitted]

SLIDE 24

Thanks and hope to see you at the coffee break!
