
A Bayesian Approach to Generative Adversarial Imitation Learning (PowerPoint presentation)



  1. A Bayesian Approach to Generative Adversarial Imitation Learning. NeurIPS 2018. Presenter: Wonseok Jeon @ KAIST. Joint work with Seokin Seo @ KAIST and Kee-Eung Kim @ KAIST & PROWLER.io.

  2. Imitation Learning • A Markov decision process (MDP) without a cost function • A policy

  3. Imitation Learning • A Markov decision process (MDP) without a cost function • A policy • Instead of a cost, a set of expert demonstrations is given. • Learn a policy that mimics the expert well.
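The slide's formulas were lost in extraction; a standard statement of the setup (the symbols below are our assumption, not necessarily the slide's exact notation) is:

```latex
% Imitation-learning setup (standard notation; a reconstruction,
% not necessarily the slide's exact symbols)
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, \rho_0, \gamma)
  \quad \text{(MDP without a cost function)}, \qquad
\pi(a \mid s) \quad \text{(policy)}
\mathcal{D}_E = \{\tau_1, \ldots, \tau_N\}, \qquad
\tau_i = (s_0, a_0, s_1, a_1, \ldots) \sim \pi_E
  \quad \text{(expert demonstrations)}
\text{Goal:} \quad \text{find } \pi \text{ that imitates the expert policy }
  \pi_E \text{ using only } \mathcal{D}_E .
```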

  4. Generative Adversarial Imitation Learning (GAIL) • Use generative adversarial networks (GANs) for imitation learning: 1. Sample trajectories by using the current policy, alongside the expert demonstrations. 2. Train the discriminator. 3. Update the policy by using reinforcement learning (RL), e.g., TRPO or PPO.
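The three steps above can be sketched with a deliberately tiny, self-contained toy: a 1-D logistic-regression discriminator and a Gaussian "policy" whose mean plays the role of the policy parameters. Everything here (the 1-D feature space, the learning rates, and the plain gradient step standing in for TRPO/PPO) is an illustrative assumption, not the paper's actual setup:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D "state-action features": the expert's behaviour is centred
# at +1, the agent's initial behaviour at -1 (purely illustrative).
expert = [random.gauss(+1.0, 0.3) for _ in range(200)]

def sample_agent(mean, n=200):
    return [random.gauss(mean, 0.3) for _ in range(n)]

w, b = 0.0, 0.0        # discriminator parameters (logistic regression)
agent_mean = -1.0      # crude stand-in for the policy parameters
lr_disc, lr_pi = 0.5, 0.2

for _ in range(50):
    # 1. Sample "trajectories" from the current policy.
    agent = sample_agent(agent_mean)

    # 2. Train the discriminator: one logistic-regression gradient step,
    #    expert labelled 1, agent labelled 0.
    gw = gb = 0.0
    for x in expert:
        p = sigmoid(w * x + b)
        gw += (1.0 - p) * x
        gb += (1.0 - p)
    for x in agent:
        p = sigmoid(w * x + b)
        gw += -p * x
        gb += -p
    n = len(expert) + len(agent)
    w += lr_disc * gw / n
    b += lr_disc * gb / n

    # 3. "Policy update": descend the surrogate cost c(x) = -log D(x).
    #    For this 1-D Gaussian policy, dc/dmean is proportional to -w,
    #    so moving the mean along +w chases high D(x) (a plain gradient
    #    step standing in for TRPO/PPO).
    agent_mean += lr_pi * w

print(agent_mean)  # the mean should have drifted toward the expert's +1
```

The adversarial dynamic is visible even in this toy: the discriminator's weight pulls the agent toward regions it classifies as expert-like, and shrinks as the two distributions overlap.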

  5. Generative Adversarial Imitation Learning (GAIL) • GAIL requires model-free RL inner loops. (Cartoon: "I don't want to move a lot…") • Environment simulation is required. • Sample-efficiency issues: obtaining trajectory samples from the environment is often very costly, e.g., for physical robots in the real world.

  6. Generative Adversarial Imitation Learning (GAIL) • GAIL requires model-free RL inner loops, environment simulation, and many costly trajectory samples (previous slide). • Motivation • At each iteration, the discriminator is updated by using minibatches. • How about using Bayesian classification to train the discriminator? • Expected to yield a more refined cost function for imitation learning!

  7. Bayesian Framework for GAIL • Probabilistic model for trajectories • Each trajectory is a sequence of state-action pairs satisfying the Markov property.

  8. Bayesian Framework for GAIL • Probabilistic model for trajectories • Each trajectory is a sequence of state-action pairs satisfying the Markov property. • Two policies: the agent's policy and the expert's policy, each inducing its own trajectory distribution.
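In standard notation (an assumption, since the slide's formula did not survive extraction), the Markovian trajectory model referenced here reads:

```latex
% Trajectory distribution induced by a policy \pi (standard form; an assumption)
p(\tau \mid \pi)
  = \rho_0(s_0) \prod_{t=0}^{T-1} \pi(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t),
\qquad \tau = (s_0, a_0, s_1, a_1, \ldots, s_T)
```

The agent's policy $\pi_\theta$ and the expert's policy $\pi_E$ each plug into this same factorization, giving the two trajectory distributions the discriminator must tell apart.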

  9. Bayesian Framework for GAIL • Role of the discriminator • The discriminator models the probability that a given trajectory comes from the expert rather than the agent.

  10. Bayesian Framework for GAIL • Posterior distributions • Posterior for the discriminator (conditioned on perfect trajectory discrimination) • Posterior for the policy (conditioned on preventing perfect discrimination)

  11. Bayesian Framework for GAIL • Posterior distributions • Posterior for the discriminator (conditioned on perfect trajectory discrimination) • Posterior for the policy (conditioned on preventing perfect discrimination) • In contrast, GAIL uses maximum likelihood estimation (MLE) for both the policy and discriminator updates!
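The slide's equations did not survive extraction. A schematic reconstruction (not the paper's exact formulas; here $D_\phi(s,a)$ denotes the probability the discriminator assigns to "expert", and $\theta$, $\phi$ the policy and discriminator parameters) is:

```latex
% Schematic posteriors (a reconstruction; not the paper's exact equations)
% Discriminator posterior, conditioned on perfectly discriminating
% expert trajectories \tau_E from agent trajectories \tau_A:
p(\phi \mid \tau_A, \tau_E) \propto
  p(\phi) \prod_{(s,a) \in \tau_E} D_\phi(s,a)
          \prod_{(s,a) \in \tau_A} \bigl(1 - D_\phi(s,a)\bigr)
% Policy posterior, conditioned on the agent's trajectories being
% (mis)classified as the expert's, i.e. preventing perfect discrimination:
p(\theta \mid \tau_E, \phi) \propto
  p(\theta)\, \mathbb{E}_{\tau \sim \pi_\theta}
    \Bigl[\, \prod_{(s,a) \in \tau} D_\phi(s,a) \Bigr]
```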

  12. Bayesian GAIL: GAIL with Posterior-Predictive Cost • The objective: reinforcement learning with a posterior-predictive cost.
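One way to build intuition for a posterior-predictive cost is to average the GAIL surrogate cost -log D(s,a) over several plausible discriminators instead of plugging in a single MLE fit. The sketch below uses a bootstrap ensemble as a crude stand-in for posterior samples; the paper's actual Bayesian treatment differs, and all names and settings here are illustrative assumptions:

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D "state-action features" (illustrative, as before).
expert = [random.gauss(+1.0, 0.3) for _ in range(100)]
agent  = [random.gauss(-1.0, 0.3) for _ in range(100)]

def train_discriminator(ex, ag, steps=200, lr=0.5):
    """Logistic regression, expert = 1, agent = 0 (one MLE fit)."""
    w, b = 0.0, 0.0
    n = len(ex) + len(ag)
    for _ in range(steps):
        gw = gb = 0.0
        for x in ex:
            p = sigmoid(w * x + b)
            gw += (1.0 - p) * x
            gb += (1.0 - p)
        for x in ag:
            p = sigmoid(w * x + b)
            gw += -p * x
            gb += -p
        w += lr * gw / n
        b += lr * gb / n
    return w, b

# Crude stand-in for posterior samples: an ensemble of discriminators,
# each trained on a bootstrap resample of the data.
posterior_samples = [
    train_discriminator([random.choice(expert) for _ in expert],
                        [random.choice(agent) for _ in agent])
    for _ in range(5)
]

def posterior_predictive_cost(x):
    # Average the surrogate cost -log D(x) over the posterior samples,
    # instead of using a single point-estimate discriminator as in GAIL.
    return sum(-math.log(sigmoid(w * x + b))
               for w, b in posterior_samples) / len(posterior_samples)

# Expert-like features should be cheaper than agent-like ones.
print(posterior_predictive_cost(+1.0), posterior_predictive_cost(-1.0))
```

Averaging over discriminators smooths the cost in regions where the individual fits disagree, which is the intuition behind using the posterior predictive rather than a single minibatch-trained discriminator.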

  13. Bayesian GAIL: GAIL with Posterior-Predictive Cost • The objective: reinforcement learning with a posterior-predictive cost. • [Figure: learning curves for 5 MuJoCo tasks]

  14. Bayesian GAIL: GAIL with Posterior-Predictive Cost • For more information, please come to our poster session! Wed Dec 5th, 5-7 PM @ Room 210 & 230 AB, poster #129. Thanks!
