Fast Adaptation via Policy-Dynamics Value Functions - PowerPoint PPT Presentation

SLIDE 1

Fast Adaptation via Policy-Dynamics Value Functions

Roberta Raileanu NYU Max Goldstein NYU Arthur Szlam FAIR Rob Fergus NYU ICML 2020

SLIDE 2

Dynamics Often Change in the Real World

SLIDE 3

How can agents rapidly adapt to changes in the environment’s dynamics?

Learn a General Value Function in the Space of Policies and Dynamics

SLIDE 4

Policy-Dynamics Value Function (PD-VF)

  • Value Function: total future reward for a fixed policy under fixed dynamics
  • Policy-Dynamics Value Function: total future reward as a function of the policy and the dynamics
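As a minimal sketch of the distinction (toy stand-in, not the paper's architecture): a standard value function scores a state for one fixed policy in one fixed environment, while a PD-VF additionally takes a policy embedding z_pi and a dynamics embedding z_d, so a single function covers a whole family of policies and transition dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

def pd_value(state, z_pi, z_d, W1, W2):
    # Tiny two-layer MLP on the concatenated [state, z_pi, z_d] vector.
    # Output: predicted total future reward of policy z_pi under dynamics z_d.
    x = np.concatenate([state, z_pi, z_d])
    h = np.maximum(0.0, W1 @ x)   # ReLU hidden layer
    return float(W2 @ h)          # scalar value estimate

# Dimensions and weights are arbitrary placeholders.
state_dim, emb_dim, hidden = 4, 8, 16
W1 = rng.normal(size=(hidden, state_dim + 2 * emb_dim))
W2 = rng.normal(size=(hidden,))
v = pd_value(rng.normal(size=state_dim),
             rng.normal(size=emb_dim),
             rng.normal(size=emb_dim), W1, W2)
```

Holding the weights fixed and varying only z_pi or z_d already gives value estimates for unseen policy-dynamics pairs, which is what makes the adaptation step below possible.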

SLIDE 5

Fast Adaptation to New Dynamics

  • Family of Environments: each environment has a different (unobserved) transition function
  • Train on a family of different but related dynamics
  • Test on new dynamics

SLIDE 6

Training Recipe

  1. Reinforcement Learning Phase
     • Train individual policies on each training environment
  2. Self-Supervised Learning Phase
     • Learn policy and dynamics embeddings from the collected trajectories
  3. Supervised Learning Phase
     • Learn a value function over this space of policies and environments
  4. Evaluation Phase
     • Infer the dynamics of a new environment from a few interaction steps
     • Find the policy that maximizes the learned value function
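The four phases can be strung together on toy data as follows. Everything here is a stand-in (random "trajectories", linear encoders, a least-squares value fit) chosen only to make the data flow between phases concrete, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_envs, emb_dim, feat_dim = 5, 3, 10

# Phase 1 (RL): pretend one policy was trained per training environment and
# collected one trajectory each; a trajectory is just a feature vector here.
trajectories = rng.normal(size=(n_envs, feat_dim))

# Phase 2 (self-supervised): embed each trajectory into a policy embedding
# and a dynamics embedding; stand-in: fixed random linear encoders.
enc_pi = rng.normal(size=(emb_dim, feat_dim))
enc_d = rng.normal(size=(emb_dim, feat_dim))
z_pi = trajectories @ enc_pi.T
z_d = trajectories @ enc_d.T

# Phase 3 (supervised): fit value = f(z_pi, z_d) to observed returns,
# here by least squares on the concatenated embeddings.
returns = rng.normal(size=n_envs)
X = np.concatenate([z_pi, z_d], axis=1)
w, *_ = np.linalg.lstsq(X, returns, rcond=None)

# Phase 4 (evaluation): embed a few steps from a new environment, then pick
# the training policy whose embedding maximizes the learned value.
z_d_new = rng.normal(size=feat_dim) @ enc_d.T
scores = np.concatenate([z_pi, np.tile(z_d_new, (n_envs, 1))], axis=1) @ w
best_policy = int(np.argmax(scores))
```

Note that adaptation in phase 4 involves no gradient updates: only a forward pass through the encoders and the value function.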
SLIDE 7

Learning Policy and Dynamics Embeddings

  • Learn Policy Embedding
  • Learn Dynamics Embedding
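One natural reading of these two objectives (an illustrative sketch, not necessarily the paper's exact losses): a good policy embedding should help predict the policy's actions from states, and a good dynamics embedding should help predict next states from state-action pairs. Below, both "decoders" are plain linear maps on one toy trajectory, and we only compute the two squared-error losses one would minimize.

```python
import numpy as np

rng = np.random.default_rng(1)
T, s_dim, a_dim, emb = 20, 3, 2, 4

# One toy trajectory: states, the policy's actions, and next states.
states = rng.normal(size=(T, s_dim))
actions = rng.normal(size=(T, a_dim))
next_states = rng.normal(size=(T, s_dim))

z_pi = rng.normal(size=emb)  # policy embedding (would come from an encoder)
z_d = rng.normal(size=emb)   # dynamics embedding (likewise)

Dec_pi = rng.normal(size=(a_dim, s_dim + emb))
Dec_d = rng.normal(size=(s_dim, s_dim + a_dim + emb))

# Policy decoder: predict a_t from (s_t, z_pi).
pred_a = np.concatenate([states, np.tile(z_pi, (T, 1))], axis=1) @ Dec_pi.T
loss_pi = float(np.mean((pred_a - actions) ** 2))

# Dynamics decoder: predict s_{t+1} from (s_t, a_t, z_d).
inp = np.concatenate([states, actions, np.tile(z_d, (T, 1))], axis=1)
loss_d = float(np.mean((inp @ Dec_d.T - next_states) ** 2))
```

Minimizing these losses over encoder and decoder weights forces the embeddings to carry exactly the information that distinguishes one policy (or one environment's dynamics) from another.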

SLIDE 8

Learning the Policy-Dynamics Value Function


SLIDE 9

Evaluation Phase

Optimal Policy Embedding (OPE)

Closed-form solution: the top singular vector of the SVD of A
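A hedged sketch of why the closed form holds: suppose the learned value of a unit-norm policy embedding z is the quadratic form V(z) = zᵀAz for a symmetric positive semi-definite A built from the inferred dynamics embedding (A is a random stand-in below). Then the maximizer over the unit sphere is A's top singular vector, which for such A coincides with its top eigenvector, and we can check numerically that no random unit vector beats it.

```python
import numpy as np

rng = np.random.default_rng(3)

B = rng.normal(size=(6, 6))
A = B @ B.T                  # symmetric PSD stand-in for the value matrix

U, S, Vt = np.linalg.svd(A)  # singular values sorted in descending order
z_star = U[:, 0]             # top singular vector = optimal policy embedding

def value(z):
    # Quadratic-form value of a policy embedding z.
    return float(z @ A @ z)

# Sanity check: the closed-form solution beats 200 random unit vectors.
best_random = max(value(z / np.linalg.norm(z))
                  for z in rng.normal(size=(200, 6)))
assert value(z_star) >= best_random
```

The attained value is the top singular value itself, so picking the optimal embedding at test time costs one SVD rather than any policy-gradient steps.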

SLIDE 10

Environments

  • Continuous Dynamics: Spaceship, Swimmer, Ant-Wind
  • Discrete Dynamics: Ant-Legs

SLIDE 11

Evaluation on Unseen Environments

SLIDE 12

Evaluation on Unseen Environments

SLIDE 13

Learned Embeddings

[Figure: Policy Embeddings (colored by policy) and Dynamics Embeddings (colored by dynamics)]

SLIDE 14

Takeaways

  • Learn a value function in a space of policies and dynamics
  • Infer the dynamics of a new environment from only a few interactions
  • Improved performance on unseen environments
  • No need for parameter updates, long rollouts, or dense rewards to adapt

SLIDE 15

Future Work

  • Reward function variation → condition W on a task embedding
  • Multi-agent settings → dynamics given by the others’ policies
  • Continual learning
  • Integrate prior knowledge / constraints
  • Estimate other metrics apart from reward
SLIDE 16

Thank you!