Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
@danijarh danijar.com/planet
Planning with Learned Models
Watter et al., 2015; Banijamali et al., 2017; Zhang et al., 2017; Agrawal et al., 2016; Finn & Levine, 2016; Ebert et al., 2018
Visual Control Tasks
Some model-free methods can solve these tasks but need up to 100,000 episodes
Task challenges: partially observable, contacts, many joints, sparse reward, balance
We introduce PlaNet
1. Recipe for scalable model-based reinforcement learning
2. Efficient planning in latent space with large batch size
3. Reaches top performance using 200X fewer episodes
Latent Dynamics Model
1. encode images
2. predict states
3. decode images
4. decode rewards
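The four components above can be sketched as plain functions. This is a minimal toy sketch with made-up shapes and stand-in computations, not the paper's convolutional encoder/decoder networks:

```python
import numpy as np

rng = np.random.default_rng(0)

Z = 4  # latent state size (illustrative)

# Toy stand-ins for the four learned components of the model.
def encode(image):
    """Encode an image into a latent state (here: a crude projection)."""
    return image.reshape(-1)[:Z] * 0.1

def predict(state, action):
    """Predict the next latent state from state and action."""
    return np.tanh(state + 0.1 * action)

def decode_image(state):
    """Decode an image reconstruction from the latent state."""
    return np.tile(state, (2, 2))  # (2, 8) "image" placeholder

def decode_reward(state):
    """Decode the predicted reward from the latent state."""
    return float(np.sum(state))

state = encode(rng.normal(size=(8, 8)))
state = predict(state, action=np.ones(Z))
print(decode_image(state).shape)  # (2, 8)
```

Only the encoder touches images; planning then happens purely in the low-dimensional latent state, which is what makes it cheap.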
Recurrent State Space Model
[Diagram: a purely stochastic state space model (states s1, s2, s3), a purely deterministic recurrent neural network (states h1, h2, h3), and the recurrent state space model, which combines a deterministic path h1 → h2 → h3 with stochastic states z1, z2, z3]
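One RSSM transition can be sketched as follows. This is a toy sketch with randomly initialized weights standing in for learned parameters; the sizes and the linear/tanh layers are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: deterministic state, stochastic state, action.
H, Z, A = 8, 4, 2

W_h = rng.normal(0, 0.1, (H, H + Z + A))  # deterministic (RNN-like) path
W_z = rng.normal(0, 0.1, (2 * Z, H))      # Gaussian prior over stochastic state

def rssm_step(h, z, a):
    """One transition of a toy recurrent state space model."""
    # Deterministic path: h_t = f(h_{t-1}, z_{t-1}, a_{t-1})
    h_next = np.tanh(W_h @ np.concatenate([h, z, a]))
    # Stochastic path: z_t ~ N(mean(h_t), std(h_t))
    stats = W_z @ h_next
    mean, std = stats[:Z], np.exp(stats[Z:])  # exp keeps std positive
    z_next = mean + std * rng.normal(size=Z)
    return h_next, z_next

h, z = np.zeros(H), np.zeros(Z)
for _ in range(3):
    h, z = rssm_step(h, z, rng.normal(size=A))
print(h.shape, z.shape)  # (8,) (4,)
```

The deterministic path lets information persist reliably over many steps, while the stochastic path lets the model represent multiple possible futures.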
Unguided Video Predictions by Single Agent
5 frames of context, 45 frames predicted
Recovers the True Dynamics
Can predict the simulator state from a copy of the model state
Planning in Latent Space
Cross Entropy Planner
1. Initialize factorized Gaussian population distribution over action sequences
2. Sample 1000 candidate action sequences
3. Evaluate candidates in parallel using the model
4. Re-fit the population to the top 100 candidates
5. Repeat for 10 steps
[Diagram: candidate action sequences plotted over the planning horizon]
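The five steps above can be sketched as a small cross entropy method loop. This is a minimal sketch where a toy quadratic reward stands in for rolling out the learned latent model; the target value 0.5 and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

HORIZON, ACT_DIM = 12, 2
CANDIDATES, TOP_K, ITERATIONS = 1000, 100, 10

def evaluate(actions):
    """Toy stand-in for evaluating candidates with the latent model.

    actions: (CANDIDATES, HORIZON, ACT_DIM); rewards sequences
    close to an arbitrary target action of 0.5.
    """
    return -np.sum((actions - 0.5) ** 2, axis=(1, 2))

# 1. Initialize a factorized Gaussian over action sequences.
mean = np.zeros((HORIZON, ACT_DIM))
std = np.ones((HORIZON, ACT_DIM))

for _ in range(ITERATIONS):                       # 5. repeat for 10 steps
    # 2. Sample candidate action sequences.
    actions = mean + std * rng.normal(size=(CANDIDATES, HORIZON, ACT_DIM))
    # 3. Evaluate all candidates in parallel (one batched model rollout).
    returns = evaluate(actions)
    # 4. Re-fit the Gaussian to the top candidates.
    elite = actions[np.argsort(returns)[-TOP_K:]]
    mean, std = elite.mean(axis=0), elite.std(axis=0)

print(float(np.round(mean.mean(), 2)))  # converges toward the target, ~0.5
```

Because candidates are evaluated as one large batch in latent space, the planner never decodes images, which is what makes large population sizes affordable.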
Comparison to Model-Free Agents
Training time: 1 day on a single GPU
Comparison of Model Designs
Comparison of Iterative Planning
Some Additional Tasks
Minitaur: 400 episodes; Quadruped: 2000 episodes (in three dimensions)
Conclusions
1. PlaNet solves control tasks from images by efficient planning in the compact latent space of a learned model
2. Pure planning with learned dynamics is feasible for control tasks with image observations, contacts, and sparse rewards
3. Planning with learned models can reach the performance of top model-free algorithms in 200 times fewer episodes and the same training time
Enabling More Model-Based RL Research
Explore dynamics without supervision
Distill the planner to save computation
Value function to extend planning horizon
With Jimmy Ba, Mohammad Norouzi, Timothy Lillicrap
Learning Latent Dynamics for Planning from Pixels
Website with code, videos, blog post, animated paper: danijar.com/planet
Thank you
Multi-Step Consistency in Latent Space
1. A perfect one-step model would give perfect multi-step predictions
2. Under limited capacity, one-step and multi-step solutions may not coincide
3. Encourage consistency between one-step and multi-step predictions in latent space
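The idea can be illustrated with a toy latent model: apply the one-step transition repeatedly and penalize the gap between its multi-step predictions and the latents inferred from later observations. This is an illustrative sketch with a linear transition and random latents, not the paper's training objective:

```python
import numpy as np

rng = np.random.default_rng(0)

Z = 4
A_step = rng.normal(0, 0.3, (Z, Z))  # toy linear one-step latent transition

def predict(z, steps):
    """Multi-step prediction by composing the one-step model."""
    for _ in range(steps):
        z = A_step @ z
    return z

# Stand-ins for latent states inferred from observations along a trajectory.
latents = [rng.normal(size=Z) for _ in range(5)]

# Consistency loss: a d-step prediction from time t should match the
# latent inferred at time t + d, for every start time and distance.
loss = 0.0
for t, z_t in enumerate(latents):
    for d in range(1, len(latents) - t):
        loss += np.sum((predict(z_t, d) - latents[t + d]) ** 2)
print(loss > 0)  # True for random latents
```

Training on all start times and distances, rather than only one-step pairs, is what pushes the one-step and multi-step solutions toward each other.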