

SLIDE 1

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

Yilun Du1, Karthik Narasimhan2

1 MIT, 2 Princeton

SLIDE 2

Key Questions

  • Can we learn physics in a task-agnostic fashion?
  • Does it help the sample efficiency of RL?
  • Can we transfer the learned physics from one environment to another?

[Figure: consecutive video frames at times t and t+1]

SLIDE 3

Dynamics Model in RL

  • Frame prediction (Oh et al. (2015), Finn et al. (2016), Weber et al. (2017), …)
  • Action-conditional and not easily transferable across environments
  • Parameterized physics models (Cutler et al. (2014), Scholz et al. (2014), Zhu et al. (2018), …)
  • Requires manual specification
  • Our method: learn physics priors from task-independent data
  • Action-unconditional modeling of data
  • Local inductive biases in the architecture to reflect the local nature of physics
SLIDE 4

Overall Approach

  • Pre-train a frame predictor on physics videos
  • Initialize the dynamics model and use it to train a policy
  • Simultaneously fine-tune the dynamics model on the target environment
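The three steps above can be sketched as a training loop. Everything below is a toy stand-in, not the paper's method: the real predictor is SpatialNet and the policy is trained with an off-the-shelf RL algorithm, while the class and function names here are hypothetical.

```python
import numpy as np

class FramePredictor:
    """Stand-in dynamics model: predicts the next frame from the current one."""
    def __init__(self):
        self.weights = np.zeros(1)      # placeholder parameters

    def predict(self, frame):
        return frame + self.weights    # trivial "dynamics"

    def update(self, frame, next_frame, lr=0.1):
        # One correction step toward the observed transition.
        err = (next_frame - self.predict(frame)).mean()
        self.weights += lr * err

def pretrain(model, videos, epochs=1):
    """Stage 1: task-agnostic pre-training on physics videos (no actions)."""
    for _ in range(epochs):
        for frames in videos:
            for t in range(len(frames) - 1):
                model.update(frames[t], frames[t + 1])

def train_policy(model, env_transitions):
    """Stages 2-3: use the pre-trained model for the policy's input while
    fine-tuning the model on transitions from the target environment."""
    for frame, next_frame in env_transitions:
        _prediction = model.predict(frame)   # fed to the policy in the paper
        model.update(frame, next_frame)      # simultaneous fine-tuning
```

The point of the sketch is the structure: pre-training never sees actions or rewards, and fine-tuning happens online, alongside policy optimization.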

SLIDE 5

SpatialNet

  • Two key operations:
  • Isolation of the dynamics of each entity
  • Accurate modeling of dynamic interactions in the local space around each entity

[Figure: SpatialNet block mapping input z_t and spatial memory h_t to z_{t+1} and h_{t+1}, producing the future frame]

SLIDE 6

Spatial Memory

  • Use a 2D grid memory to locally store the dynamic state of each object
  • Use convolutions and residual connections to better model dynamics (instead of the additive updates in the ConvLSTM model (Xingjian et al., 2015))

[Figure: spatial memory update. Input z_t is gated into i_t and combined with state h_t via convolutions C_e, C_u, C_dyn, and C_d, yielding proposal u_t, output o_t, and new state h_{t+1}]
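The gating and residual update described above can be illustrated on a toy grid. This is only a sketch: the learned convolutions C_e, C_u, C_dyn, and C_d are replaced by scalar (1x1) weights so the example stays self-contained, and the function name is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_memory_step(h, z, w_gate=1.0, w_prop=1.0):
    """One SpatialNet-style step on a 2D grid memory.

    h, z : (H, W) arrays -- memory state h_t and encoded input z_t.
    w_gate, w_prop : scalar stand-ins for the learned convolutions.
    """
    i = sigmoid(w_gate * z)          # gated input i_t
    u = np.tanh(w_prop * (h + i))    # proposal state u_t
    return h + u                     # residual update -> h_{t+1}
```

The residual form means each grid cell only has to model the *change* in its local dynamic state per step, which is the contrast the slide draws with ConvLSTM's gated additive cell update.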

SLIDE 7

Spatial Memory

  • Use a 2D grid memory to locally store the dynamic state of each object
  • Use convolutions and residual connections to better model dynamics (instead of the additive updates in the ConvLSTM model (Xingjian et al., 2015))

[Figure: spatial memory states alongside the input frames over time]

SLIDE 8

Experimental Setup

  • PhysVideos: 625k frames of video containing moving objects of various shapes and sizes
  • PhysWorld: collection of 2D/3D physics-centric games
  • Atari: stochastic version with sticky actions
  • RL agent: predicted frames stacked with observation frames as joint input into a policy
  • Same prior for all tasks

[Figure: screenshots of the PhysWorld environments PhysGoal, PhysForage, PhysShooter, and Phys3D]
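The joint policy input described above is a channel-wise stack of observed and model-predicted frames. A minimal sketch (the function name, channel counts, and frame sizes are illustrative, not from the paper):

```python
import numpy as np

def policy_input(obs_frames, pred_frames):
    """Concatenate observed frames and model-predicted future frames along
    the channel axis to form the policy's joint input. Both: (C, H, W)."""
    return np.concatenate([obs_frames, pred_frames], axis=0)
```

For example, stacking 4 observed frames with 2 predicted frames of size 84x84 yields a (6, 84, 84) tensor for the policy network.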

SLIDE 9

Model Predictions

[Figure: pixel prediction accuracy comparison]

SLIDE 10

Predicting Physical Parameters

SLIDE 11

Policy Learning: PhysShooter

SLIDE 12

Policy Learning: Atari

SLIDE 13

Transfer Learning

Model Transfer > Model + Policy Transfer > No Transfer

SLIDE 14

Conclusion

  • Task-agnostic priors over models provide a potential solution for improving sample efficiency in RL
  • Being task-agnostic allows us to pre-train priors without access to the target task
  • Such priors also generalize well to a wide variety of tasks and show good transfer performance