Learning a Prior over Intent via Meta-Inverse Reinforcement Learning
Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn (University of California, Berkeley)
Motivation: a well-specified reward function remains an important assumption for applying RL in practice
Meta Reward and Intention Learning (MANDRIL)
[Figure: example tasks in simulation and the real world]
Often easier to provide expert data and learn a reward function using inverse RL
Inverse RL frequently requires a lot of data to learn a generalizable reward
This is due in part to the fundamental ambiguity of reward learning
Goal: how can agents infer rewards from one or a few demonstrations?
Intuition: demonstrations from previous tasks induce a prior over the space of possible future tasks
Shared context → efficient adaptation
Meta-inverse reinforcement learning: using prior task information to accelerate inverse RL
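One way to make this concrete (the notation below is our gloss on the slide, not taken verbatim from it): learn an initialization θ of the reward parameters so that a few gradient steps on a new task's demonstrations yield a good task-specific reward,

```latex
\min_{\theta} \;\sum_{i=1}^{T}
  \mathcal{L}^{\mathrm{IRL}}\!\left(
    \theta \;-\; \alpha \,\nabla_{\theta}\,
    \mathcal{L}^{\mathrm{IRL}}\!\left(\theta;\, \mathcal{D}_i^{\mathrm{tr}}\right)
    ;\;\; \mathcal{D}_i^{\mathrm{val}} \right)
```

where \(\mathcal{L}^{\mathrm{IRL}}(\theta; \mathcal{D})\) is, for example, the MaxEnt IRL negative log-likelihood of demonstrations \(\mathcal{D}\) under reward parameters \(\theta\), and \(\alpha\) is the inner-loop step size.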
Our instantiation: (background) Model-agnostic meta-learning
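As background, the MAML recipe (adapt per task with an inner gradient step, then update the shared initialization from post-adaptation performance) can be sketched on a toy problem. Everything below (the linear model, the slopes, the learning rates, and the first-order approximation) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_grad(w, X, y):
    """Gradient of mean-squared error for a linear model y ≈ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One first-order MAML update: take an inner gradient step per task,
    then average the post-adaptation gradients on held-out data."""
    meta_grad = np.zeros_like(w)
    for X_tr, y_tr, X_val, y_val in tasks:
        w_task = w - inner_lr * mse_grad(w, X_tr, y_tr)   # inner adaptation
        meta_grad += mse_grad(w_task, X_val, y_val)       # first-order outer grad
    return w - outer_lr * meta_grad / len(tasks)

def make_task(slope):
    """A 1-D regression task; tasks share structure but differ in slope."""
    X = rng.normal(size=(16, 1))
    y = slope * X[:, 0]
    return X[:8], y[:8], X[8:], y[8:]

w = np.zeros(1)
tasks = [make_task(s) for s in (0.5, 1.0, 1.5)]
for _ in range(200):
    w = maml_step(w, tasks)
# w is now an initialization from which one inner gradient step
# adapts well to a new task drawn from the same family.
```

The point of the structure, mirrored in the talk: the outer loop optimizes not for performance of the initialization itself, but for performance *after* adaptation.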
Our approach: Meta reward and intention learning
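A drastically simplified sketch of the idea: meta-learn an initialization of reward parameters whose inner-loop update is a MaxEnt-IRL-style gradient (expert visitation minus model visitation). The toy "world" below (five states, a softmax visitation model, goal states 3 and 4 as the likely intents) is our invention for illustration, not the paper's SpriteWorld or SUNCG setup:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5  # toy world with one reward parameter theta[s] per state (hypothetical)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def irl_grad(theta, counts):
    """MaxEnt-style gradient: expert state visitation minus model visitation."""
    return counts / counts.sum() - softmax(theta)

def adapt(theta, counts, lr=1.0):
    """Inner loop: one gradient step fits the reward to a task's demos."""
    return theta + lr * irl_grad(theta, counts)

def sample_task():
    """Noisy demonstrations concentrated on a goal state; goals 3 and 4 are
    the likely intents, so demos across tasks induce a prior over them."""
    p = np.full(N, 0.05)
    p[rng.choice([3, 4])] = 0.80
    draw = lambda: np.bincount(rng.choice(N, size=20, p=p), minlength=N)
    return draw(), draw()  # (train demos, validation demos)

theta = np.zeros(N)
for _ in range(300):
    tr, val = sample_task()
    # First-order meta-update: improve the post-adaptation fit to held-out demos.
    theta = theta + 0.1 * irl_grad(adapt(theta, tr), val)
```

After meta-training, theta encodes the prior (mass concentrated on the plausible goals), and a single inner step on one new demonstration set disambiguates the specific intent, which is the talk's central claim in miniature.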
Domain 1: SpriteWorld environment
Each task is a specific landmark-navigation task
Each task exhibits the same terrain preferences
Evaluation time varies the position of the landmark and uses unseen sprites
[Figure: meta-training tasks vs. evaluation-time tasks]
Domain 2: First-person navigation (SUNCG)
Tasks require learning both navigation (NAV) and picking (PICK)
Tasks share a common theme but differ in visual layout and specific goal
[Figure: task illustration and agent view]
Results: With only a limited number of demonstrations, performance is significantly better
Results: Optimizing the initial weights consistently improves performance across tasks
Success rate improves significantly on both test and unseen house layouts, especially on the harder PICK task
The reward function can be adapted with a limited number of demonstrations
Thanks! Tuesday, Poster #222
Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn