Task-Agnostic Dynamics Priors for Deep Reinforcement Learning




  1. Task-Agnostic Dynamics Priors for Deep Reinforcement Learning. Yilun Du (MIT), Karthik Narasimhan (Princeton)

  2. Key Questions • Can we learn physics in a task-agnostic fashion? • Does it help the sample efficiency of RL? • Can we transfer the learned physics from one environment to another? [Figure: frames at t and t+1]

  3. Dynamics Models in RL • Frame prediction (Oh et al., 2015; Finn et al., 2016; Weber et al., 2017; …): action-conditional and not easily transferable across environments • Parameterized physics models (Cutler et al., 2014; Scholz et al., 2014; Zhu et al., 2018; …): require manual specification • Our method: learn physics priors from task-independent data • Action-unconditional modeling of data • Local inductive biases in the architecture to reflect the local nature of physics

  4. Overall Approach • Pre-train a frame predictor on physics videos • Initialize the dynamics model from it and use it to train a policy • Simultaneously fine-tune the dynamics model on the target environment
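The three steps on this slide can be sketched as a training loop. Everything below is a hedged illustration: the interfaces (`dynamics.update`, `policy.act`, etc.) are hypothetical placeholders, not the paper's code; the actual work uses SpatialNet as the dynamics model and a standard deep RL algorithm for the policy.

```python
import random

def pretrain_dynamics(dynamics, physics_videos, steps=1000):
    """Step 1: pre-train the frame predictor on task-agnostic physics
    videos, as (frame, next_frame) pairs with a supervised loss."""
    for _ in range(steps):
        frame, next_frame = random.choice(physics_videos)
        dynamics.update(frame, next_frame)  # next-frame prediction loss
    return dynamics

def train_policy(env, policy, dynamics, episodes=100):
    """Steps 2-3: train a policy that sees the model's predictions,
    while simultaneously fine-tuning the dynamics model on the
    target environment's own transitions."""
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            pred = dynamics.predict(obs)        # predicted future frame
            action = policy.act(obs, pred)      # policy gets obs + prediction
            next_obs, reward, done = env.step(action)
            policy.update(obs, action, reward)  # any RL update rule
            dynamics.update(obs, next_obs)      # simultaneous fine-tuning
            obs = next_obs
```

The key design point is that pre-training needs only passive physics video, so the prior is built once and reused across tasks.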

  5. SpatialNet • Two key operations: • Isolation of the dynamics of each entity • Accurate modeling of dynamic interactions in the local space around each entity [Figure: SpatialNet maps input frame z_t and spatial memory h_t to predicted future frame z_t+1 and updated memory h_t+1]

  6. Spatial Memory • Use a 2D grid memory to locally store the dynamic state of each object • Use convolutions and residual connections to better model dynamics (instead of the additive updates in the ConvLSTM model (Xingjian et al., 2015)) [Figure: memory update with convolutions C_e, C_dyn, C_u, C_d mapping input z_t and state h_t through gated input i_t and proposal state u_t to output o_t and new state h_t+1, trained against the ground-truth label]

  7. Spatial Memory [Figure: spatial memory state evolving across input frames]
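The convolutional, residual memory update on the slides above can be sketched in plain NumPy. This is a toy stand-in, not the paper's architecture: a single hand-rolled 3x3 convolution plays the role of the learned dynamics convolution (C_dyn on slide 6), and the gating is reduced to a simple sum.

```python
import numpy as np

def conv2d(x, kernel):
    """'Same'-padded 2D cross-correlation over an (H, W, C) grid,
    applying one spatial kernel to every channel."""
    kh, kw = kernel.shape
    pad = ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0))
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    H, W, _ = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.tensordot(
                xp[i:i + kh, j:j + kw], kernel, axes=([0, 1], [0, 1])
            )
    return out

def spatial_memory_step(h, z, k_dyn):
    """One memory update on an (H, W, C) grid: fold the encoded input
    into the state, then apply a local dynamics convolution with a
    residual connection (h + conv(h)) rather than ConvLSTM's additive
    gated update. All weights here are illustrative."""
    gated = h + z                     # toy stand-in for the gated input i_t
    proposal = conv2d(gated, k_dyn)   # local dynamics (C_dyn)
    return gated + proposal           # residual connection -> h_t+1
```

Because the kernel is spatially local, each object's state can only influence nearby grid cells in one step, which is the locality bias the architecture is meant to encode.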

  8. Experimental Setup • PhysVideos: 625k frames of video containing moving objects of various shapes and sizes • PhysWorld: a collection of 2D/3D physics-centric games (PhysGoal, PhysShooter, PhysForage, Phys3D) • Atari: stochastic versions with sticky actions • RL agent: predicted frames are stacked with observation frames as joint input to the policy • The same prior is used for all tasks
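The "predicted frames stacked with observation frames" input can be sketched as below. The function name and the channel-wise stacking layout are assumptions for illustration; the slide only states that predictions and observations form a joint input.

```python
import numpy as np

def policy_input(obs_frames, dynamics_predict, k=1):
    """Build the policy's joint input: the recent observed frames plus
    k future frames rolled out from the dynamics model, stacked along
    the channel axis. obs_frames: list of (H, W, C) arrays, newest last."""
    preds = []
    frame = obs_frames[-1]
    for _ in range(k):
        frame = dynamics_predict(frame)  # roll the dynamics model forward
        preds.append(frame)
    return np.concatenate(obs_frames + preds, axis=-1)
```

This keeps the policy architecture unchanged: it simply sees a wider channel stack, with the model's predictions treated as extra input channels.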

  9. Model Predictions [Figure: pixel prediction accuracy]

  10. Predicting Physical Parameters

  11. Policy Learning: PhysShooter

  12. Policy Learning: Atari

  13. Transfer Learning Model Transfer > Model + Policy Transfer > No Transfer

  14. Conclusion • Task-agnostic priors over models provide a potential solution for improving sample efficiency for RL • Being task-agnostic allows us to pre-train priors without access to the target task • Such priors also generalize well to a wide variety of tasks and show good transfer performance
