Learning Latent Dynamics for Planning from Pixels (Danijar Hafner et al.): presentation slides



slide-1
SLIDE 1

Learning Latent Dynamics for Planning from Pixels

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
@danijarh
danijar.com/planet

slide-2
SLIDE 2

Planning with Learned Models

Agrawal et al., 2016; Finn & Levine, 2016; Ebert et al., 2018
Watter et al., 2015; Banijamali et al., 2017; Zhang et al., 2017

slide-3
SLIDE 3

Visual Control Tasks

Some model-free methods can solve these tasks but need up to 100,000 episodes

Task challenges: partially observable, contacts, many joints, sparse reward, balance

slide-4
SLIDE 4

We introduce PlaNet

1. Recipe for scalable model-based reinforcement learning
2. Efficient planning in latent space with large batch size
3. Reaches top performance using 200x fewer episodes

slide-8
SLIDE 8

Latent Dynamics Model

encode images, predict states, decode images, decode rewards
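
The four components named on this slide can be sketched as a small interface. This is a toy illustration, not the authors' model: the class name `ToyLatentModel`, the scalar latent state, and every formula inside the methods are assumptions chosen to keep the example runnable. It shows that, once an image is encoded, a return can be estimated from predicted states and decoded rewards alone, without decoding any images.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyLatentModel:
    """Hypothetical stand-in for a learned latent dynamics model."""

    scale: float = 0.5  # arbitrary placeholder, not a learned value

    def encode(self, image: List[float]) -> float:
        # Compress an image (flat list of pixels) into a scalar latent state.
        return self.scale * sum(image) / len(image)

    def predict(self, state: float, action: float) -> float:
        # Advance the latent state without looking at any image.
        return 0.9 * state + action

    def decode_image(self, state: float, size: int) -> List[float]:
        # Reconstruct pixels from the latent state.
        return [state / self.scale] * size

    def decode_reward(self, state: float) -> float:
        # Predict the reward from the latent state (toy: highest at state 1.0).
        return -abs(state - 1.0)


def imagine_return(model: ToyLatentModel, image: List[float], actions: List[float]) -> float:
    """Sum predicted rewards along a rollout that stays entirely in latent space."""
    state = model.encode(image)
    total = 0.0
    for action in actions:
        state = model.predict(state, action)
        total += model.decode_reward(state)
    return total
```

In this sketch `decode_image` is never called inside `imagine_return`; it is included only to mirror the four labels on the slide.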

slide-9
SLIDE 9

Recurrent State Space Model

[Figure: three model designs side by side: Recurrent Neural Network, State Space Model, Recurrent State Space Model. Legend: stochastic, deterministic. Node labels: h1, h2, h3; s1, s2, s3; z1, z2, z3.]
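
A minimal sketch of how a deterministic recurrent path and a stochastic latent path might be combined in a single transition. The slide only names the two kinds of state; the update rule used here (the deterministic state is computed from the previous states and action, and the stochastic state is sampled from a Gaussian conditioned on it), the scalar states, and all weights are assumptions for illustration.

```python
import math
import random


def rssm_step(h_prev, s_prev, action, rng):
    """One transition of a toy recurrent state space model with scalar states.

    All weights are arbitrary placeholders, not learned values.
    """
    # Deterministic path: a recurrent update that carries information forward.
    h = math.tanh(0.5 * h_prev + 0.3 * s_prev + 0.2 * action)
    # Stochastic path: sample the latent state from a Gaussian conditioned on h.
    mean = 0.8 * h
    std = 0.1
    s = rng.gauss(mean, std)
    return h, s


def rollout(actions, seed=0):
    """Unroll the toy model from zero states and return the (h, s) pairs."""
    rng = random.Random(seed)
    h, s = 0.0, 0.0
    trajectory = []
    for action in actions:
        h, s = rssm_step(h, s, action, rng)
        trajectory.append((h, s))
    return trajectory
```

With different seeds the first deterministic state is identical while the sampled stochastic state differs, which is the distinction the legend draws.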

slide-10
SLIDE 10

Unguided Video Predictions by Single Agent

5 frames context and 45 frames predicted

slide-11
SLIDE 11

Recovers the True Dynamics

Can predict simulator state from copy of model state

slide-12
SLIDE 12

Planning in Latent Space

slide-22
SLIDE 22

Cross Entropy Planner

1. Initialize a factorized Gaussian population distribution over action sequences
2. Sample 1000 candidate action sequences
3. Evaluate candidates in parallel using the model
4. Re-fit the population to the top 100 candidates
5. Repeat for 10 steps

Figure labels: Horizon, Candidates
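
The five planner steps on this slide can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: `toy_model_return` is a hypothetical stand-in for the learned model, actions are one-dimensional, and the default horizon of 12 is an assumption. The population sizes (1000 candidates, top 100, 10 iterations) follow the slide.

```python
import random
import statistics


def toy_model_return(state, actions):
    """Hypothetical stand-in for the learned model: predicted return of a plan.

    Toy dynamics: state += action. Toy reward per step: -(state - 1.0) ** 2.
    """
    total = 0.0
    for action in actions:
        state = state + action
        total += -(state - 1.0) ** 2
    return total


def cem_plan(state, horizon=12, candidates=1000, top_k=100, iterations=10, seed=0):
    rng = random.Random(seed)
    # 1. Factorized Gaussian over action sequences: one mean and stddev per time step.
    means = [0.0] * horizon
    stds = [1.0] * horizon
    for _ in range(iterations):  # 5. Repeat for a fixed number of steps.
        # 2. Sample candidate action sequences from the current population.
        population = [
            [rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(candidates)
        ]
        # 3. Evaluate every candidate with the model (sequentially in this sketch).
        ranked = sorted(
            population,
            key=lambda seq: toy_model_return(state, seq),
            reverse=True,
        )
        elite = ranked[:top_k]
        # 4. Re-fit the per-step mean and stddev to the top candidates.
        for t in range(horizon):
            column = [seq[t] for seq in elite]
            means[t] = statistics.fmean(column)
            stds[t] = statistics.pstdev(column)
    return means  # the mean action sequence of the final population
```

A call such as `cem_plan(0.0, horizon=5)` returns a plan whose predicted return is much higher than that of the all-zero plan. In a receding-horizon setup one would typically execute only the first action of the plan and then replan from the next state.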

slide-23
SLIDE 23

Comparison to Model-Free Agents

Training time: 1 day on a single GPU

slide-24
SLIDE 24

Comparison of Model Designs

slide-25
SLIDE 25

Comparison of Iterative Planning

slide-27
SLIDE 27

Some Additional Tasks

Minitaur: 400 episodes
Quadruped: 2000 episodes
In three dimensions

slide-28
SLIDE 28

Conclusions

1. PlaNet solves control tasks from images by efficient planning in the compact latent space of a learned model
2. Pure planning with learned dynamics is feasible for control tasks with image observations, contacts, and sparse rewards
3. Planning with learned models can reach the performance of top model-free algorithms in 200 times fewer episodes and the same training time

slide-29
SLIDE 29

Enabling More Model-Based RL Research

Explore dynamics without supervision
Distill the planner to save computation
Value function to extend planning horizon

With Jimmy Ba, Mohammad Norouzi, Timothy Lillicrap

slide-30
SLIDE 30

Learning Latent Dynamics for Planning from Pixels

Website with code, videos, blog post, animated paper: danijar.com/planet

Thank you

slide-31
SLIDE 31

Multi-Step Consistency in Latent Space

1. A perfect one-step model would give perfect multi-step predictions
2. Under limited capacity, one-step and multi-step solutions may not coincide
3. Encourage consistency between one-step and multi-step predictions in latent space
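
A small numeric illustration of the first point and of why multi-step behaviour matters. It does not implement the consistency objective itself. The scalar dynamics `0.9 * x` and the imperfect model coefficient `0.88` are made-up values: a perfect one-step model gives zero error at any horizon, while a small one-step error grows when the model is unrolled on its own predictions.

```python
def true_step(x):
    # Hypothetical ground-truth dynamics.
    return 0.9 * x


def model_step(x, coefficient):
    # Hypothetical learned one-step model with a single parameter.
    return coefficient * x


def rollout_error(steps, coefficient=0.88, x0=1.0):
    """Absolute gap between true and model states after `steps` open-loop steps."""
    x_true, x_model = x0, x0
    for _ in range(steps):
        x_true = true_step(x_true)
        x_model = model_step(x_model, coefficient)
    return abs(x_true - x_model)
```

Here `rollout_error(1)` is about 0.02, `rollout_error(5)` is larger, and `rollout_error(5, coefficient=0.9)` is exactly zero.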