Learning Latent Dynamics for Planning from Pixels (Danijar Hafner et al.)


slide-1
SLIDE 1

Learning Latent Dynamics for Planning from Pixels

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

danijar.com/planet, @danijarh

slide-2
SLIDE 2

Planning with Learned Models

Agrawal et al., 2016; Finn & Levine, 2016; Ebert et al., 2018

Watter et al., 2015; Banijamali et al., 2017; Zhang et al., 2017

slide-3
SLIDE 3

Visual Control Tasks

Some model-free methods can solve these tasks but need up to 100,000 episodes.

  • partially observable
  • contacts
  • many joints
  • sparse reward
  • balance

slide-5
SLIDE 5

We introduce PlaNet

1. Recipe for scalable model-based reinforcement learning
2. Efficient planning in latent space with large batch size
3. Reaches top performance using 200× fewer episodes

slide-9
SLIDE 9

Latent Dynamics Model

  • encode images
  • predict states
  • decode images
  • decode rewards
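The four components named on this slide can be sketched as one small interface. This is a toy illustration only: the class `ToyLatentModel`, the helper `imagine`, the one-dimensional latent state, and every constant are hypothetical stand-ins, not the learned networks from the talk. The sketch shows that once an image is encoded, future rewards can be predicted by stepping the latent state forward without decoding any images.

```python
from dataclasses import dataclass


@dataclass
class ToyLatentModel:
    """Hypothetical stand-in for the four components on the slide."""

    decay: float = 0.9  # assumed coefficient of the toy latent transition

    def encode(self, image):
        # Encode an image (flat list of pixel values) into a 1-D latent
        # state; here simply the mean intensity.
        return sum(image) / len(image)

    def predict(self, state, action):
        # Predict the next latent state from the current state and action.
        return self.decay * state + action

    def decode_image(self, state, size=4):
        # Decode a latent state back into an image of constant intensity.
        return [state] * size

    def decode_reward(self, state):
        # Decode a reward; the toy reward peaks when the state equals 1.0.
        return -(state - 1.0) ** 2


def imagine(model, image, actions):
    """Encode one image, then roll forward in latent space only."""
    state = model.encode(image)
    rewards = []
    for action in actions:
        state = model.predict(state, action)
        rewards.append(model.decode_reward(state))
    return rewards
```

For example, `imagine(ToyLatentModel(), [0.0, 0.0, 0.0, 0.0], [0.5, 0.5])` returns two predicted rewards and never calls `decode_image`.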

slide-10
SLIDE 10

Recurrent State Space Model

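Figure: three latent dynamics models side by side, with states marked as stochastic or deterministic.

  • Recurrent Neural Network
  • State Space Model
  • Recurrent State Space Model

Node labels in the figure: h1, h2, h3; z1, z2, z3; s1, s2, s3.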
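As a rough illustration of how a recurrent state space model combines a deterministic and a stochastic path, the sketch below uses scalar states. It assumes `h` is the deterministic recurrent state and `z` the stochastic one (the slide text does not say which label is which). The functions `rssm_step` and `rollout`, the tanh update, and all weights are hypothetical constants chosen for illustration, not learned parameters.

```python
import math
import random


def rssm_step(h, z, action, rng):
    """One transition with a deterministic and a stochastic component."""
    # Deterministic path: recurrent update from the previous states and action.
    h_next = math.tanh(0.5 * h + 0.3 * z + 0.2 * action)
    # Stochastic path: sample from a Gaussian whose mean and standard
    # deviation both depend on the new deterministic state.
    mean = 0.8 * h_next
    std = 0.1 + 0.05 * abs(h_next)
    z_next = rng.gauss(mean, std)
    return h_next, z_next


def rollout(actions, seed=0):
    """Unroll the toy model from zero states; returns a list of (h, z)."""
    rng = random.Random(seed)
    h, z = 0.0, 0.0
    trajectory = []
    for action in actions:
        h, z = rssm_step(h, z, action, rng)
        trajectory.append((h, z))
    return trajectory
```

Calling `rollout([1.0, 1.0, 1.0])` with different seeds gives the same first `h` but different `z` samples, and those samples feed back into later `h` values.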

slide-11
SLIDE 11

Unguided Video Predictions by Single Agent

5 frames context and 45 frames predicted

slide-13
SLIDE 13

Planning in Latent Space
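The slide text does not spell out the planner, so the following is a minimal sketch that assumes a cross-entropy method (CEM) search over action sequences. The latent transition `predict`, the reward `reward`, and all hyperparameters are hypothetical toy choices. Candidate action sequences are sampled, scored by rolling them out in latent space, and the sampling distribution is refit to the best candidates.

```python
import random
import statistics


def predict(state, action):
    # Toy latent transition (assumed, not a learned model).
    return 0.9 * state + action


def reward(state):
    # Toy reward that peaks when the latent state equals 1.0.
    return -(state - 1.0) ** 2


def evaluate(state, actions):
    """Sum of predicted rewards along an imagined latent rollout."""
    total = 0.0
    for action in actions:
        state = predict(state, action)
        total += reward(state)
    return total


def plan(state, horizon=5, candidates=300, top_k=30, iterations=10, seed=0):
    """Return the mean action sequence found by the CEM search."""
    rng = random.Random(seed)
    means = [0.0] * horizon
    stds = [1.0] * horizon
    for _ in range(iterations):
        # Sample a batch of candidate action sequences.
        batch = [
            [rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(candidates)
        ]
        # Keep the sequences with the highest predicted return.
        batch.sort(key=lambda actions: evaluate(state, actions), reverse=True)
        elite = batch[:top_k]
        # Refit the per-step Gaussian to the elite set.
        means = [statistics.mean(step) for step in zip(*elite)]
        stds = [statistics.pstdev(step) + 1e-3 for step in zip(*elite)]
    return means
```

In a model-predictive control loop, only the first action of `plan(state)` would be executed before encoding the next observation and planning again.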

slide-19
SLIDE 19

Comparison to Model-Free Agents

Training time 1 day on a single GPU

slide-20
SLIDE 20

Enabling More Model-Based RL Research

  • Explore dynamics without supervision
  • Distill the planner to save computation
  • Value function to extend planning horizon

slide-21
SLIDE 21

Learning Latent Dynamics for Planning from Pixels

Website with code, videos, blog post, animated paper: danijar.com/planet
