Dream to Control: Learning Behaviors by Latent Imagination




slide-1
SLIDE 1

Dream to Control

Learning Behaviors by Latent Imagination

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Google Brain, DeepMind

danijar.com/dreamer
@danijarh

slide-2
SLIDE 2

We introduce Dreamer

1. Scalable reinforcement learning from pixels using a world model
2. Learn actor and value in imagination for long-sighted behaviors
3. Efficiently update actor by backprop through imagined sequences
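The third point, updating the actor by backpropagating through imagined sequences, can be illustrated with a toy example. This is only a sketch under stated assumptions, not Dreamer's implementation: the scalar latent dynamics `s' = A*s + B*a`, the linear actor `a = theta * s`, the reward `-s^2`, and every name below are hypothetical. The gradient of the imagined return with respect to the actor parameter is computed by a hand-written reverse pass over the rollout.

```python
# Toy, hypothetical setup: NOT the Dreamer world model or actor.
A, B = 0.9, 0.5   # assumed latent transition: s_next = A * s + B * a
HORIZON = 5       # assumed imagination horizon

def imagined_return(theta, s0):
    """Sum of rewards r = -s^2 along a rollout with actor a = theta * s."""
    s, total = s0, 0.0
    for _ in range(HORIZON):
        a = theta * s
        s = A * s + B * a
        total += -s * s
    return total

def imagined_return_grad(theta, s0):
    """d(return)/d(theta) via a reverse pass through the imagined rollout."""
    k = A + B * theta                  # closed-loop factor: s_next = k * s
    states = [s0]
    for _ in range(HORIZON):           # forward pass: store imagined states
        states.append(k * states[-1])
    grad_theta, adjoint = 0.0, 0.0
    for t in reversed(range(HORIZON)): # backward pass over time
        # Sensitivity of the return to s_{t+1}: its own reward plus later steps.
        adjoint = -2.0 * states[t + 1] + k * adjoint
        # theta influences s_{t+1} through the action term B * theta * s_t.
        grad_theta += adjoint * B * states[t]
    return grad_theta
```

Comparing `imagined_return_grad` against a finite-difference estimate of `imagined_return` is a quick sanity check. In practice an automatic differentiation library would perform the reverse pass.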


slide-5
SLIDE 5

Dreamer Agent Overview


slide-8
SLIDE 8

World Model with Latent States

[Figure: observations o1, o2, o3 and actions a1, a2]

slide-9
SLIDE 9

World Model with Latent States

[Figure: observations o1, o2, o3 and actions a1, a2]

Steps: encode images

slide-10
SLIDE 10

World Model with Latent States

[Figure: observations o1, o2, o3 and actions a1, a2]

Steps: encode images, compute states

slide-11
SLIDE 11

World Model with Latent States

[Figure: observations o1, o2, o3; actions a1, a2; predicted rewards r̂1, r̂2, r̂3]

Steps: encode images, compute states, predict rewards

slide-12
SLIDE 12

World Model with Latent States

[Figure: observations o1, o2, o3; actions a1, a2; predicted rewards r̂1, r̂2, r̂3; predicted images ô1, ô2, ô3]

Steps: encode images, compute states, predict rewards, predict images
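The four steps on this slide can be sketched as a toy pipeline. Everything here is a hypothetical stand-in chosen for readability: the scalar latent state, the fixed mixing weights, and the function names are assumptions, whereas the actual model uses learned neural networks for each component.

```python
def encode(image):
    """Hypothetical encoder: summarise an image (list of pixels) by its mean."""
    return sum(image) / len(image)

def next_state(state, action, embedding):
    """Hypothetical latent update from previous state, previous action, and new embedding."""
    return 0.5 * state + 0.3 * action + 0.2 * embedding

def predict_reward(state):
    """Stand-in for a learned reward head."""
    return 2.0 * state

def predict_image(state, size):
    """Stand-in for a learned image decoder."""
    return [state] * size

def run_world_model(images, actions):
    """Process observations o1..oT with actions a1..a(T-1), as in the slide's diagram."""
    state, outputs = 0.0, []
    for t, image in enumerate(images):
        prev_action = actions[t - 1] if t > 0 else 0.0   # no action precedes o1
        state = next_state(state, prev_action, encode(image))           # compute states
        outputs.append((state,
                        predict_reward(state),                           # predict rewards
                        predict_image(state, len(image))))               # predict images
    return outputs
```

For example, `run_world_model([[1.0, 1.0], [0.0, 0.0], [1.0, 1.0]], [1.0, 0.0])` returns one `(state, reward, image)` triple per observation, mirroring the three time steps in the figure.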

slide-13
SLIDE 13

Long-Term Video Prediction


slide-15
SLIDE 15

Learning Behaviors by Latent Imagination


slide-18
SLIDE 18

Learning Behaviors by Latent Imagination

[Figure: observation o1]

Steps: encode images
slide-19
SLIDE 19

Learning Behaviors by Latent Imagination

[Figure: observation o1, followed by imagined actions a1, a2]

Steps: encode images, imagine ahead
slide-20
SLIDE 20

Learning Behaviors by Latent Imagination

[Figure: observation o1, followed by imagined actions a1, a2 and predicted rewards r̂2, r̂3]

Steps: encode images, imagine ahead, predict rewards
slide-21
SLIDE 21

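Learning Behaviors by Latent Imagination

[Figure: observation o1, followed by imagined actions a1, a2, predicted rewards r̂2, r̂3, and predicted values v̂2, v̂3]

Steps: encode images, imagine ahead, predict rewards, predict values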
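One way predicted rewards and predicted values along an imagined trajectory might be combined into a learning target is a bootstrapped return. The sketch below assumes a λ-return; the function name, the `gamma` and `lam` defaults, and the input layout are illustrative assumptions rather than details stated on the slide.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Bootstrapped targets for an imagined rollout.

    rewards[t] is the predicted reward at imagined step t.
    values[t] is the predicted value at step t; values has one extra
    final entry used to bootstrap beyond the imagination horizon.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs exactly one more entry than rewards")
    returns = [0.0] * len(rewards)
    next_return = values[-1]
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap (weight 1 - lam) with the longer return (weight lam).
        next_return = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_return)
        returns[t] = next_return
    return returns
```

With `lam=0` each target reduces to the one-step estimate `rewards[t] + gamma * values[t + 1]`; with `lam=1` it becomes the discounted sum of imagined rewards plus the final bootstrapped value.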


slide-23
SLIDE 23

Behaviors Learned by Dreamer

slide-24
SLIDE 24

Large-Scale Evaluation for Control from Pixels

Model-free: 23 days of interaction
Model-based: 28 hours of interaction

slide-25
SLIDE 25

Large-Scale Evaluation for Control from Pixels

Model-free (23 days of interaction): A3C (243)
Model-based: 28 hours of interaction

slide-26
SLIDE 26

Large-Scale Evaluation for Control from Pixels

Model-free (23 days of interaction): A3C (243)
Model-based (28 hours of interaction): PlaNet (332)

slide-27
SLIDE 27

Large-Scale Evaluation for Control from Pixels

Model-free (23 days of interaction): A3C (243)
Model-based (28 hours of interaction): Dreamer (823), PlaNet (332)

slide-28
SLIDE 28

Large-Scale Evaluation for Control from Pixels

Model-free (23 days of interaction): D4PG (786), A3C (243)
Model-based (28 hours of interaction): Dreamer (823), PlaNet (332)
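The gap between the two interaction budgets on this slide is easier to judge as a ratio. A short calculation, using only the figures shown (23 days and 28 hours) and hypothetical variable names:

```python
# Interaction budgets quoted on the slide.
model_free_hours = 23 * 24   # 23 days expressed in hours
model_based_hours = 28

ratio = model_free_hours / model_based_hours
print(f"Model-free budget: {model_free_hours} hours")
print(f"Model-based agents use about {ratio:.1f}x less interaction")
```

So the model-based agents are evaluated with roughly one twentieth of the environment interaction given to the model-free baselines.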

slide-29
SLIDE 29

Introducing Dreamer: Scalable Reinforcement Learning Using World Models

slide-30
SLIDE 30

Dream to Control

Learning Behaviors by Latent Imagination

Blog post, code, videos, paper: danijar.com/dreamer