

SLIDE 1

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

Yilun Du1, Karthik Narasimhan2

1 MIT, 2 Princeton

SLIDE 2

Key Questions

  • Can we learn physics in a task-agnostic fashion?
  • Does it help the sample efficiency of RL?
  • Can we transfer the learned physics from one environment to another?

[Figure: consecutive video frames at times t and t+1]

SLIDE 3

Dynamics Model in RL

  • Frame prediction (Oh et al. (2015), Finn et al. (2016), Weber et al. (2017), …)
  • Action-conditional and not easily transferable across environments
  • Parameterized physics models (Cutler et al. (2014), Scholz et al. (2014), Zhu et al. (2018), …)
  • Requires manual specification
  • Our method: learn physics priors from task-independent data
  • Action-unconditional modeling of data
  • Local inductive biases in the architecture to reflect the local nature of physics
SLIDE 4

Overall Approach

  • Pre-train a frame predictor on physics videos
  • Initialize the dynamics model and use it to train a policy
  • Simultaneously fine-tune the dynamics model on the target environment
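The three steps above can be sketched as a training loop. Everything below is a toy stand-in, not the paper's method: the real predictor is SpatialNet and the policy is trained with an off-the-shelf RL algorithm, while the class and function names here are hypothetical.

```python
import numpy as np

class FramePredictor:
    """Stand-in dynamics model: predicts the next frame from the current one."""
    def __init__(self):
        self.weights = np.zeros(1)      # placeholder parameters

    def predict(self, frame):
        return frame + self.weights    # trivial "dynamics"

    def update(self, frame, next_frame, lr=0.1):
        # One correction step toward the observed transition.
        err = (next_frame - self.predict(frame)).mean()
        self.weights += lr * err

def pretrain(model, videos, epochs=1):
    """Stage 1: task-agnostic pre-training on physics videos (no actions)."""
    for _ in range(epochs):
        for frames in videos:
            for t in range(len(frames) - 1):
                model.update(frames[t], frames[t + 1])

def train_policy(model, env_transitions):
    """Stages 2-3: use the pre-trained model for the policy's input while
    fine-tuning the model on transitions from the target environment."""
    for frame, next_frame in env_transitions:
        _prediction = model.predict(frame)   # fed to the policy in the paper
        model.update(frame, next_frame)      # simultaneous fine-tuning
```

The point of the sketch is the structure: pre-training never sees actions or rewards, and fine-tuning happens online, alongside policy optimization.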

SLIDE 5

SpatialNet

  • Two key operations:
  • Isolation of the dynamics of each entity
  • Accurate modeling of dynamic interactions in the local space around each entity

[Figure: SpatialNet block mapping input z_t and spatial memory h_t to z_{t+1} and h_{t+1}, producing the future frame]

SLIDE 6

Spatial Memory

  • Use a 2D grid memory to locally store the dynamic state of each object
  • Use convolutions and residual connections to better model dynamics (instead of the additive updates in the ConvLSTM model (Xingjian et al., 2015))

[Figure: spatial memory update. Input z_t is gated into i_t and combined with state h_t via convolutions C_e, C_u, C_dyn, and C_d, yielding proposal u_t, output o_t, and new state h_{t+1}]
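The gating and residual update described above can be illustrated on a toy grid. This is only a sketch: the learned convolutions C_e, C_u, C_dyn, and C_d are replaced by scalar (1x1) weights so the example stays self-contained, and the function name is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_memory_step(h, z, w_gate=1.0, w_prop=1.0):
    """One SpatialNet-style step on a 2D grid memory.

    h, z : (H, W) arrays -- memory state h_t and encoded input z_t.
    w_gate, w_prop : scalar stand-ins for the learned convolutions.
    """
    i = sigmoid(w_gate * z)          # gated input i_t
    u = np.tanh(w_prop * (h + i))    # proposal state u_t
    return h + u                     # residual update -> h_{t+1}
```

The residual form means each grid cell only has to model the *change* in its local dynamic state per step, which is the contrast the slide draws with ConvLSTM's gated additive cell update.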

SLIDE 7

Spatial Memory

  • Use a 2D grid memory to locally store the dynamic state of each object
  • Use convolutions and residual connections to better model dynamics (instead of the additive updates in the ConvLSTM model (Xingjian et al., 2015))

[Figure: spatial memory states alongside the input frames over time]

SLIDE 8

Experimental Setup

  • PhysVideos: 625k frames of video containing moving objects of various shapes and sizes
  • PhysWorld: collection of 2D/3D physics-centric games
  • Atari: stochastic version with sticky actions
  • RL agent: predicted frames stacked with observation frames as joint input into a policy
  • Same prior for all tasks

[Figure: screenshots of the PhysWorld environments PhysGoal, PhysForage, PhysShooter, and Phys3D]
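The joint policy input described above is a channel-wise stack of observed and model-predicted frames. A minimal sketch (the function name, channel counts, and frame sizes are illustrative, not from the paper):

```python
import numpy as np

def policy_input(obs_frames, pred_frames):
    """Concatenate observed frames and model-predicted future frames along
    the channel axis to form the policy's joint input. Both: (C, H, W)."""
    return np.concatenate([obs_frames, pred_frames], axis=0)
```

For example, stacking 4 observed frames with 2 predicted frames of size 84x84 yields a (6, 84, 84) tensor for the policy network.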

SLIDE 9

Model Predictions

[Figure: pixel prediction accuracy comparison]

SLIDE 10

Predicting Physical Parameters

SLIDE 11

Policy Learning: PhysShooter

SLIDE 12

Policy Learning: Atari

SLIDE 13

Transfer Learning

Model Transfer > Model + Policy Transfer > No Transfer

SLIDE 14

Conclusion

  • Task-agnostic priors over models provide a potential solution for improving sample efficiency in RL
  • Being task-agnostic allows us to pre-train priors without access to the target task
  • Such priors also generalize well to a wide variety of tasks and show good transfer performance