Dream to Control
Learning Behaviors by Latent Imagination
danijar.com/dreamer Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi @danijarh
Google Brain DeepMind
We introduce Dreamer
1. Scalable reinforcement learning from pixels using a world model
2. Learn actor and value in imagination for long-sighted behaviors
3. Efficiently update actor by backprop through imagined sequences
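Point 3 can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional latent model with known linear dynamics, a one-parameter linear actor, and a reward equal to minus the squared state; none of these come from the slides, and `imagine` and `actor_gradient` are illustrative names. The gradient of the imagined return with respect to the actor parameter is computed by walking backward through the imagined sequence, which is what backprop does automatically on a real network.

```python
def imagine(theta, s0, horizon, a=0.9, b=0.5):
    """Roll a toy latent model forward under a linear actor.

    Assumed (hypothetical) setup: action u = theta * s,
    next state s' = a * s + b * u, reward r(s) = -s**2.
    Returns the imagined states [s0, ..., sH] and the summed reward.
    """
    states = [s0]
    for _ in range(horizon):
        s = states[-1]
        u = theta * s                  # actor picks an action from the state
        states.append(a * s + b * u)   # model predicts the next latent state
    imagined_return = sum(-(s ** 2) for s in states[1:])
    return states, imagined_return


def actor_gradient(theta, s0, horizon, a=0.9, b=0.5):
    """Gradient of the imagined return w.r.t. theta via a manual backward pass."""
    states, _ = imagine(theta, s0, horizon, a, b)
    grad_theta = 0.0
    grad_state = 0.0  # adjoint of the state currently being visited
    for t in range(horizon, 0, -1):
        grad_state += -2.0 * states[t]               # direct effect of s_t on its reward
        grad_theta += grad_state * b * states[t - 1]  # s_t depends on theta through the action
        grad_state *= a + b * theta                  # pass the adjoint back to s_{t-1}
    return grad_theta
```

A single gradient ascent step, `theta += lr * actor_gradient(theta, s0, horizon)`, should raise the imagined return when the step size is small enough.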
[Figure: world model diagram, built up in four steps: encode images, compute states (with actions a1, a2), predict rewards (r̂1, r̂2, r̂3), predict images.]
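The data flow named in the four steps (encode images, compute states, predict rewards, predict images) can be sketched with toy stand-ins. Everything below is an assumption made for illustration: `ToyWorldModel`, its methods, and the fixed arithmetic inside them are hypothetical placeholders for learned networks and are not the model used in Dreamer. Only the order of operations mirrors the diagram.

```python
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for the learned components; no training involved."""
    decay: float = 0.5

    def encode(self, image):
        # "encode images": summarize an image (a list of pixel values) by its mean.
        return sum(image) / len(image)

    def step(self, state, action, embedding):
        # "compute states": predict from the previous state and action,
        # then correct the prediction with the current image embedding.
        predicted = self.decay * state + action
        return 0.5 * (predicted + embedding)

    def reward(self, state):
        # "predict rewards": a toy reward head.
        return -abs(state)

    def decode(self, state, size):
        # "predict images": a toy decoder that paints every pixel with the state.
        return [state] * size


def observe(model, images, actions, initial_state=0.0):
    """Turn T images and T-1 actions into states, predicted rewards, and predicted images."""
    if len(actions) != len(images) - 1:
        raise ValueError("expected one action between each pair of images")
    states = []
    state = initial_state
    for t, image in enumerate(images):
        action = actions[t - 1] if t > 0 else 0.0  # no action precedes the first image
        state = model.step(state, action, model.encode(image))
        states.append(state)
    rewards = [model.reward(s) for s in states]
    recons = [model.decode(s, len(images[0])) for s in states]
    return states, rewards, recons
```

In a learned model, the reward and image predictions would be compared against the recorded rewards and images to produce a training signal; the toy version only shows where those predictions come from.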
[Figure: imagination diagram, built up in four steps: encode images, imagine ahead (with actions a1, a2), predict rewards (r̂2, r̂3), predict values (v̂2, v̂3).]
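One common way to combine predicted rewards and predicted values along an imagined trajectory is the λ-return recursion. The sketch below assumes that recursion; the slides do not state which estimator is used, and the function name and the `discount` and `lam` parameters are illustrative. Setting `lam=0` gives one-step bootstrapped targets, and `lam=1` gives the discounted reward sum bootstrapped only at the end of the imagined sequence.

```python
def lambda_returns(rewards, values, discount=0.99, lam=0.95):
    """Compute lambda-returns along one imagined trajectory.

    rewards[t] is the predicted reward for imagined step t (length H).
    values[t] is the predicted value of the state at step t (length H + 1);
    values[t + 1] is the value of the state reached after step t, and
    values[-1] bootstraps beyond the imagination horizon.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    returns = [0.0] * len(rewards)
    next_return = values[-1]  # beyond the horizon, fall back on the value estimate
    for t in reversed(range(len(rewards))):
        # Mix the one-step value estimate with the longer return, weighted by lam.
        next_return = rewards[t] + discount * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns
```

These returns can serve both as regression targets for the value predictions and as the quantity the actor tries to increase.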
Model-free (23 days of interaction): D4PG (786), A3C (243)
Model-based (28 hours of interaction): Dreamer (823), PlaNet (332)
Introducing Dreamer: Scalable Reinforcement Learning Using World Models
Blog post, code, videos, paper: danijar.com/dreamer