Dream to Control: Learning Behaviors by Latent Imagination

Nov 15, 2022

SLIDE 1

Dream to Control

Learning Behaviors by Latent Imagination

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Google Brain, DeepMind

danijar.com/dreamer
@danijarh

SLIDE 2

We introduce Dreamer

1. Scalable reinforcement learning from pixels using a world model
2. Learn actor and value in imagination for long-sighted behaviors
3. Efficiently update actor by backprop through imagined sequences
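
The third point, updating the actor by backpropagating through imagined sequences, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the slides: a scalar state, a known dynamics function s' = s + a, a linear policy a = theta * s, and the reward r(s) = -s^2. Dreamer itself uses learned neural networks for these parts. The point is only that when the imagined rollout is differentiable, the gradient of the imagined return with respect to the policy parameter can be computed exactly by walking the sequence backwards.

```python
def rollout(theta, s0, horizon):
    """Imagine `horizon` steps under the toy model and return the visited states."""
    states = [s0]
    for _ in range(horizon):
        s = states[-1]
        a = theta * s          # hypothetical linear policy
        states.append(s + a)   # hypothetical differentiable dynamics
    return states


def imagined_return(theta, s0, horizon):
    """Sum of toy rewards r(s) = -s**2 over the imagined states s_1..s_H."""
    return sum(-(s ** 2) for s in rollout(theta, s0, horizon)[1:])


def return_gradient(theta, s0, horizon):
    """Reverse-mode gradient of the imagined return with respect to theta."""
    states = rollout(theta, s0, horizon)
    grad_theta = 0.0
    grad_next = 0.0  # d(return)/d(s_{t+1}), accumulated from the end of the sequence
    for t in reversed(range(horizon)):
        # s_{t+1} affects its own reward and, through the dynamics, all later states.
        grad_next = -2.0 * states[t + 1] + (1.0 + theta) * grad_next
        # s_{t+1} = (1 + theta) * s_t, so its direct derivative w.r.t. theta is s_t.
        grad_theta += grad_next * states[t]
    return grad_theta


def train_actor(theta=0.0, s0=1.0, horizon=5, lr=0.02, steps=200):
    """Gradient ascent on the imagined return."""
    for _ in range(steps):
        theta += lr * return_gradient(theta, s0, horizon)
    return theta


if __name__ == "__main__":
    print("learned theta:", train_actor())
```

In this toy problem the best policy drives the state to zero, which corresponds to theta = -1, and gradient ascent on the imagined return moves theta toward that value.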

SLIDE 5

Dreamer Agent Overview

SLIDE 12

World Model with Latent States

encode images
compute states
predict rewards
predict images

[Figure labels: actions a1, a2; predicted rewards r̂1, r̂2, r̂3]
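
The four steps on this slide can be sketched as a data-flow example. This is a minimal sketch under loud assumptions: the class `ToyWorldModel`, its random linear maps, the vector sizes, and the helper `filter_sequence` are all hypothetical and stand in for the learned networks, which the slide does not specify. No training is shown; in a trained model the parameters would be fit so that predicted rewards and images match the observed ones.

```python
import math
import random


class ToyWorldModel:
    """Hypothetical stand-in for the slide's four steps, using fixed random linear maps."""

    def __init__(self, image_size=4, state_size=3, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.state_size = state_size
        self.w_encode = matrix(state_size, image_size)          # image -> embedding
        self.w_state = matrix(state_size, 2 * state_size + 1)   # [state, embedding, action] -> state
        self.w_reward = matrix(1, state_size)                   # state -> reward
        self.w_decode = matrix(image_size, state_size)          # state -> image

    @staticmethod
    def _linear(weights, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in weights]

    def encode(self, image):
        return self._linear(self.w_encode, image)

    def next_state(self, state, embedding, action):
        pre = self._linear(self.w_state, state + embedding + [action])
        return [math.tanh(x) for x in pre]

    def predict_reward(self, state):
        return self._linear(self.w_reward, state)[0]

    def predict_image(self, state):
        return self._linear(self.w_decode, state)


def filter_sequence(model, images, actions):
    """Run encode -> compute state -> predict reward -> predict image along a sequence."""
    state = [0.0] * model.state_size
    prev_action = 0.0  # no action precedes the first image
    outputs = []
    for t, image in enumerate(images):
        state = model.next_state(state, model.encode(image), prev_action)
        outputs.append((state, model.predict_reward(state), model.predict_image(state)))
        prev_action = actions[t] if t < len(actions) else 0.0
    return outputs


if __name__ == "__main__":
    model = ToyWorldModel()
    images = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5], [0.3, 0.4, 0.5, 0.6]]
    for state, reward, image in filter_sequence(model, images, actions=[1.0, -1.0]):
        print(state, reward, image)
```

As on the slide, three images and two actions yield three latent states, each with its own reward and image prediction.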

SLIDE 13

Long-Term Video Prediction

SLIDE 21

Learning Behaviors by Latent Imagination

encode images
imagine ahead
predict rewards
predict values

[Figure labels: actions a1, a2; predicted rewards r̂2, r̂3; predicted values v̂2, v̂3]
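
Once rewards and values have been predicted along an imagined trajectory, they need to be combined into a learning target. The slide does not say how, so the following is a sketch of one standard choice, the TD(lambda)-style return recursion, offered as an assumption and not as the formula from the talk. The function name `lambda_returns` and the default `discount` and `lam` values are illustrative.

```python
def lambda_returns(rewards, values, discount=0.99, lam=0.95):
    """Mix predicted rewards and values into lambda-returns.

    rewards[t] is the predicted reward at imagined step t (length H).
    values[t] is the predicted value at imagined step t (length H + 1);
    the final entry bootstraps everything beyond the imagination horizon.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    returns = []
    next_return = values[-1]
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap (weight 1 - lam) with the longer return (weight lam).
        next_return = rewards[t] + discount * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns.append(next_return)
    returns.reverse()
    return returns


if __name__ == "__main__":
    print(lambda_returns([1.0, 1.0], [0.0, 2.0, 4.0], discount=0.5, lam=0.0))
```

With `lam=0.0` each target reduces to a one-step bootstrap from the next predicted value, and with `lam=1.0` it becomes the discounted sum of imagined rewards plus the bootstrapped final value. Intermediate settings trade off between relying on the value predictions and relying on the imagined rewards.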

SLIDE 23

Behaviors Learned by Dreamer

SLIDE 28

Large-Scale Evaluation for Control from Pixels

Model-based (28 hours of interaction): Dreamer (823), PlaNet (332)
Model-free (23 days of interaction): D4PG (786), A3C (243)
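
To put the two interaction budgets on the same scale, the arithmetic can be worked through directly. The numbers are the ones on the slide. The function name and the 24-hour conversion are the only additions.

```python
HOURS_PER_DAY = 24


def interaction_ratio(model_free_days=23, model_based_hours=28):
    """How many times more interaction the model-free budget is, measured in hours."""
    model_free_hours = model_free_days * HOURS_PER_DAY
    return model_free_hours / model_based_hours


if __name__ == "__main__":
    print(f"model-free budget: {23 * HOURS_PER_DAY} hours")
    print(f"ratio: {interaction_ratio():.1f}x")
    print(f"Dreamer minus D4PG score: {823 - 786}")
```

The model-free agents are given roughly 20 times more interaction, and Dreamer's reported score is still 37 points above D4PG's.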

SLIDE 29

Introducing Dreamer: Scalable Reinforcement Learning Using World Models

SLIDE 30

Dream to Control

Learning Behaviors by Latent Imagination

Blog post, code, videos, paper: danijar.com/dreamer