SLIDE 1

Advanced Model-Based Reinforcement Learning

CS 294-112: Deep Reinforcement Learning
Sergey Levine

SLIDE 2

Class Notes

  • 1. Homework 3 is extended by one week, to Wednesday after next
SLIDE 3

Today’s Lecture
  • 1. Managing overfitting in model-based RL
  • What’s the problem?
  • How do we represent uncertainty?
  • 2. Model-based RL with images
  • The POMDP model for model-based RL
  • Learning encodings
  • Learning dynamics-aware encoding
  • Goals:
  • Understand the issue with overfitting and uncertainty in model-based RL
  • Understand how the POMDP model fits with model-based RL
  • Understand recent research on model-based RL with complex observations


SLIDE 4

A performance gap in model-based RL

Nagabandi, Kahn, Fearing, Levine. ICRA 2018.

Pure model-based training reaches this point in about 10 minutes of real time; model-free training takes about 10 days…

SLIDE 5

Why the performance gap?

We need the model not to overfit where data is scarce… but still to have high capacity where there is plenty of data.

SLIDE 6

Why the performance gap?

In the model-based RL loop (replanning every N steps), it is very tempting for the planner to go wherever the model erroneously predicts high reward…

SLIDE 7

Remember from last time…

SLIDE 8

Remember from last time…

SLIDE 9

Why are GPs so popular for model-based RL?

The expected reward under a high-variance prediction is very low, even though the mean is the same!
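A quick numeric illustration of this point (a toy reward and Gaussian predictions of my own choosing, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(s):
    return np.exp(-s ** 2 / 0.1)  # toy reward, sharply peaked at s = 0

for std in (0.05, 1.0):
    samples = rng.normal(0.0, std, size=100_000)  # predicted next states
    print(f"std={std}: expected reward = {reward(samples).mean():.2f}")
# std=0.05 gives ≈ 0.98, std=1.0 gives ≈ 0.22: same mean, far lower expected reward
```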

SLIDE 10

Intuition behind uncertainty-aware RL

In the model-based RL loop (replanning every N steps):

  • Only take actions for which we think we’ll get high reward in expectation (w.r.t. uncertain dynamics)
  • This avoids “exploiting” the model
  • The model will then adapt and get better

SLIDE 11

There are a few caveats…

  • Need to explore to get better
  • Expected value is not the same as pessimistic value
  • Expected value is not the same as optimistic value
  • …but expected value is often a good start

SLIDE 12

How can we have uncertainty-aware models?

Idea 1: use output entropy. But what is the variance here, and why is this not enough?

There are two types of uncertainty:

  • aleatoric or statistical uncertainty: noise inherent in the data itself
  • epistemic or model uncertainty: “the model is certain about the data, but we are not certain about the model”
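A minimal sketch of how the two show up in an ensemble of probabilistic models, where each member predicts a mean and a variance for the next state (illustrative code, not the lecture's):

```python
import numpy as np

def split_uncertainty(means, variances):
    """means, variances: arrays of shape (n_models, state_dim)."""
    aleatoric = variances.mean(axis=0)  # average predicted output noise
    epistemic = means.var(axis=0)       # disagreement between the models
    return aleatoric, epistemic

# The models agree on the noise level but disagree on the mean, so
# epistemic uncertainty is large even though aleatoric stays small.
means = np.array([[0.0], [1.0], [-1.0]])
variances = np.array([[0.1], [0.1], [0.1]])
print(split_uncertainty(means, variances))  # (array([0.1]), array([≈0.667]))
```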

SLIDE 13

How can we have uncertainty-aware models?

Idea 2: estimate model uncertainty

“the model is certain about the data, but we are not certain about the model”

Instead of a point estimate of the parameters, estimate the posterior p(θ | D); the entropy of this posterior tells us the model uncertainty! To predict, average over models:

p(s_{t+1} | s_t, a_t) = ∫ p(s_{t+1} | s_t, a_t, θ) p(θ | D) dθ

SLIDE 14

Quick overview of Bayesian neural networks

A common approximation treats each weight as an independent Gaussian, q(θ_i) = N(μ_i, σ_i): μ_i is the expected weight, and σ_i is the uncertainty about that weight.

For more, see:

  • Blundell et al., Weight Uncertainty in Neural Networks
  • Gal et al., Concrete Dropout

We’ll learn more about variational inference later!
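A minimal sketch of the mean-field approximation above (shapes and initial values are illustrative): sample weights from independent Gaussians and average predictions over several samples.

```python
import torch

mu = torch.zeros(64, 32)                # expected weights μ_i (learned)
log_sigma = torch.full((64, 32), -1.0)  # per-weight log uncertainty log σ_i (learned)

def sample_weights():
    # reparameterized sample: θ = μ + σ·ε, ε ~ N(0, I)
    return mu + log_sigma.exp() * torch.randn_like(mu)

def predict(x, n_samples=10):
    # Monte Carlo estimate of the posterior predictive mean of a linear layer
    return torch.stack([x @ sample_weights().T for _ in range(n_samples)]).mean(0)

y = predict(torch.randn(8, 32))  # (8, 64), averaged over weight samples
```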

SLIDE 15

Bootstrap ensembles

Train multiple models and see if they agree!

How to train? Main idea: we need to generate “independent” datasets to get “independent” models.

SLIDE 16

Bootstrap ensembles in deep learning

  • This basically works
  • Very crude approximation, because the number of models is usually small (< 10)
  • Resampling with replacement is usually unnecessary, because SGD and random initialization usually make the models sufficiently independent
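A minimal training sketch under these observations (architecture and dimensions are illustrative): every member sees the same data, and independence comes from random initialization and reshuffling.

```python
import torch
import torch.nn as nn

def make_model(seed, obs_dim=4, act_dim=2):
    torch.manual_seed(seed)  # different random initialization per member
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, obs_dim))

def train(model, states, actions, next_states, epochs=50):
    opt = torch.optim.Adam(model.parameters())
    for _ in range(epochs):
        perm = torch.randperm(len(states))  # reshuffle the data each epoch
        pred = model(torch.cat([states[perm], actions[perm]], dim=-1))
        loss = ((pred - next_states[perm]) ** 2).mean()  # one-step prediction error
        opt.zero_grad(); loss.backward(); opt.step()

ensemble = [make_model(seed) for seed in range(5)]  # usually < 10 members
```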

SLIDE 17

How to plan with uncertainty

An ensemble gives us a distribution over deterministic models: to evaluate a candidate action sequence, roll it out under each sampled model, compute the reward, and average the results.
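A minimal sketch of this evaluation, assuming deterministic ensemble members (function names are illustrative):

```python
import numpy as np

def evaluate_plan(ensemble, reward_fn, s0, actions):
    returns = []
    for f in ensemble:               # one rollout per sampled model
        s, total = s0, 0.0
        for a in actions:
            total += reward_fn(s, a)
            s = f(s, a)              # deterministic next-state prediction
        returns.append(total)
    return np.mean(returns)          # expected return under model uncertainty
```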

SLIDE 18

Slightly more complex option: moment matching
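As I read it, the idea is to avoid tracking one separate trajectory per model: at every step, fit a Gaussian to the ensemble's next-state predictions and resample from it. A minimal illustrative sketch:

```python
import numpy as np

def moment_matched_step(ensemble, states, actions, rng):
    preds = np.stack([f(states, actions) for f in ensemble])  # (n_models, ...) predictions
    mu, sigma = preds.mean(axis=0), preds.std(axis=0)         # match first two moments
    return rng.normal(mu, sigma)  # next states drawn from the matched Gaussian
```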

SLIDE 19

Example: model-based RL with ensembles

Exceeds the performance of model-free RL after 40k steps (about 10 minutes of real time).

SLIDE 20

Further readings

  • Deisenroth et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search.

Recent papers:

  • Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models.
  • Feinberg et al. Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning.
  • Buckman et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion.

SLIDE 21

Break

SLIDE 22

Previously: model-free RL with images

This lecture: Can we use model-based methods with images?

slides from C. Finn

SLIDE 23

Recap: Model-based RL

What about POMDPs?

slides from C. Finn

SLIDE 24

Learning in Latent Space

Key idea: learn an embedding g(o_t), then learn in the latent space (model-based or model-free). What do we want g to be? It depends on the method; we’ll see.

slides from C. Finn

SLIDE 25

Learning in Latent Space

Key idea: learn an embedding g(o_t), then learn in the latent space (model-based or model-free). Example: controlling a slot car from images.

slides from C. Finn

SLIDE 26
  • 1. collect data with exploratory policy
  • 2. learn low-dimensional embedding of image (how?)
  • 3. run Q-learning with function approximation on the embedding

The embedding is low-dimensional and summarizes the image (a sketch of step 2 follows below).

slides from C. Finn
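A minimal sketch of step 2, assuming a standard convolutional autoencoder on 3×64×64 images (the slide doesn't prescribe this architecture): the reconstruction loss forces the low-dimensional latent to summarize the image, and the encoder output is then used as the state for Q-learning.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 14 * 14), nn.ReLU(),
            nn.Unflatten(1, (32, 14, 14)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2))

    def forward(self, obs):
        z = self.encoder(obs)      # low-dimensional embedding for Q-learning
        return self.decoder(z), z  # reconstruction and embedding
```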

SLIDE 27

Pros:

+ Learn visual skills very efficiently

Cons:

  • Autoencoder might not recover the right representation
  • Not necessarily suitable for model-based methods

  • 1. collect data with exploratory policy
  • 2. learn low-dimensional embedding of image (how?)
  • 3. run Q-learning with function approximation on the embedding

slides from C. Finn

SLIDE 28

Learning in Latent Space

Key idea: learn an embedding g(o_t), then learn in the latent space (model-based or model-free).

slides from C. Finn

SLIDE 29
  • 1. collect data with exploratory policy
  • 2. learn smooth, structured embedding of image
  • 3. learn local-linear model with embedding
  • 4. run iLQG with local models to learn to reach goal image

The embedding is smooth and structured (a sketch of step 3 follows below).

slides from C. Finn
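A minimal sketch of step 3 (the regression setup is illustrative): fit z_{t+1} ≈ A z_t + B u_t + c by least squares on nearby data, which gives exactly the local-linear form iLQG needs.

```python
import numpy as np

def fit_local_linear(z, u, z_next):
    """z: (T, dz) latents, u: (T, du) actions, z_next: (T, dz) next latents."""
    X = np.hstack([z, u, np.ones((len(z), 1))])     # regressors [z, u, 1]
    W, *_ = np.linalg.lstsq(X, z_next, rcond=None)  # shape (dz + du + 1, dz)
    dz, du = z.shape[1], u.shape[1]
    return W[:dz].T, W[dz:dz + du].T, W[-1]         # A, B, c
```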

SLIDE 30
  • 1. collect data with exploratory policy
  • 2. learn smooth, structured embedding of image
  • 3. learn local-linear model with embedding
  • 4. run iLQG with local models to learn to reach goal image

Because we aren’t using the true states, we need a reward defined on what we observe (see the sketch below).

slides from C. Finn
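One common choice (an assumption on my part, not spelled out on the slide) is to use the distance to the encoded goal image in the latent space:

```python
import numpy as np

def latent_reward(encode, obs, goal_obs):
    z, z_goal = encode(obs), encode(goal_obs)
    return -np.linalg.norm(z - z_goal)  # higher when closer to the goal image
```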

SLIDE 31

slides from C. Finn

SLIDE 32

Learning in Latent Space

Key idea: learn an embedding g(o_t), then learn in the latent space (model-based or model-free).

slides from C. Finn

SLIDE 33
  • 1. collect data
  • 2. learn embedding of image & dynamics model (jointly)
  • 3. run iLQG to learn to reach image of goal

The embedding is one that can be modeled (a sketch of the joint objective follows below).

slides from C. Finn
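A minimal sketch of the joint objective in step 2 (illustrative): a reconstruction term makes the embedding summarize the image, and a latent prediction term makes it one the dynamics model can fit.

```python
import torch

def joint_loss(encoder, decoder, dynamics, o_t, a_t, o_t1):
    z_t, z_t1 = encoder(o_t), encoder(o_t1)
    recon = ((decoder(z_t) - o_t) ** 2).mean()  # embedding summarizes the image
    pred = ((dynamics(torch.cat([z_t, a_t], dim=-1)) - z_t1) ** 2).mean()  # latent dynamics fit
    return recon + pred
```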

SLIDE 34

slides from C. Finn

SLIDE 35

Local models with images

SLIDE 36
SLIDE 37

Learn directly in observation space

Key idea: skip the learned embedding and predict future observations directly (video prediction); a planning sketch follows the references below.

  • Finn, Levine. Deep Visual Foresight for Planning Robot Motion. ICRA 2017.
  • Ebert, Finn, Lee, Levine. Self-Supervised Visual Planning with Temporal Skip Connections. CoRL 2017.
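A minimal sketch of planning with a learned video predictor (the random-shooting planner and pixel cost are illustrative simplifications of what these papers use):

```python
import numpy as np

def plan(predict_frames, goal_image, act_dim, horizon=10, n_candidates=100):
    rng = np.random.default_rng()
    candidates = rng.uniform(-1, 1, size=(n_candidates, horizon, act_dim))
    costs = [np.sum((predict_frames(a)[-1] - goal_image) ** 2)  # pixel distance at the horizon
             for a in candidates]
    return candidates[int(np.argmin(costs))]  # best action sequence found
```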

SLIDE 38

Use predictions to complete tasks

Designated pixel and goal pixel: the user designates a pixel in the image, and the model’s predictions are used to choose actions that move it to the goal pixel.

SLIDE 39

Task execution

SLIDE 40

Predict alternative quantities

If I take a set of actions:

  • What will health/damage/etc. be? (Dosovitskiy & Koltun ‘17)
  • Will I successfully grasp? (Pinto et al. ‘16)
  • Will I collide? (Kahn et al. ‘17)

Pros:

+ Only predict task-relevant quantities!

Cons:

  • Need to manually pick the quantities, and they must be directly observable (a sketch follows below)
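A minimal sketch for one such quantity, collision (architecture is illustrative): a network maps the current image and a candidate action sequence to a collision probability, trained with binary cross-entropy on observed outcomes.

```python
import torch
import torch.nn as nn

class CollisionPredictor(nn.Module):
    def __init__(self, act_dim, horizon):
        super().__init__()
        self.image_net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(64))
        self.head = nn.Sequential(
            nn.Linear(64 + act_dim * horizon, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image, actions):      # actions: (batch, horizon, act_dim)
        feats = self.image_net(image)
        x = torch.cat([feats, actions.flatten(1)], dim=-1)
        return torch.sigmoid(self.head(x))  # probability of collision
```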