
A Bayesian Approach to Generative Adversarial Imitation Learning (PowerPoint presentation)



  1. A Bayesian Approach to Generative Adversarial Imitation Learning. NeurIPS 2018. Presenter: Wonseok Jeon @ KAIST. Joint work with Seokin Seo @ KAIST and Kee-Eung Kim @ KAIST & PROWLER.io.

  2. Imitation Learning • A Markov decision process (MDP) without a cost function • A policy

  3. Imitation Learning • A Markov decision process (MDP) without a cost function • A policy • Instead of a cost, a set of expert demonstrations is given. • Learn a policy that mimics the expert well.
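The slide's formulas were lost in extraction; a standard statement of the setup (the symbols below are our assumption, not necessarily the slide's exact notation) is:

```latex
% Imitation-learning setup (standard notation; a reconstruction,
% not necessarily the slide's exact symbols)
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, \rho_0, \gamma)
  \quad \text{(MDP without a cost function)}, \qquad
\pi(a \mid s) \quad \text{(policy)}
\mathcal{D}_E = \{\tau_1, \ldots, \tau_N\}, \qquad
\tau_i = (s_0, a_0, s_1, a_1, \ldots) \sim \pi_E
  \quad \text{(expert demonstrations)}
\text{Goal:} \quad \text{find } \pi \text{ that imitates the expert policy }
  \pi_E \text{ using only } \mathcal{D}_E .
```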

  4. Generative Adversarial Imitation Learning (GAIL) • Use generative adversarial networks (GANs) for imitation learning: 1. Sample trajectories by using the current policy, alongside the expert demonstrations. 2. Train the discriminator. 3. Update the policy by using reinforcement learning (RL), e.g., TRPO or PPO.
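The three steps above can be sketched with a deliberately tiny, self-contained toy: a 1-D logistic-regression discriminator and a Gaussian "policy" whose mean plays the role of the policy parameters. Everything here (the 1-D feature space, the learning rates, and the plain gradient step standing in for TRPO/PPO) is an illustrative assumption, not the paper's actual setup:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D "state-action features": the expert's behaviour is centred
# at +1, the agent's initial behaviour at -1 (purely illustrative).
expert = [random.gauss(+1.0, 0.3) for _ in range(200)]

def sample_agent(mean, n=200):
    return [random.gauss(mean, 0.3) for _ in range(n)]

w, b = 0.0, 0.0        # discriminator parameters (logistic regression)
agent_mean = -1.0      # crude stand-in for the policy parameters
lr_disc, lr_pi = 0.5, 0.2

for _ in range(50):
    # 1. Sample "trajectories" from the current policy.
    agent = sample_agent(agent_mean)

    # 2. Train the discriminator: one logistic-regression gradient step,
    #    expert labelled 1, agent labelled 0.
    gw = gb = 0.0
    for x in expert:
        p = sigmoid(w * x + b)
        gw += (1.0 - p) * x
        gb += (1.0 - p)
    for x in agent:
        p = sigmoid(w * x + b)
        gw += -p * x
        gb += -p
    n = len(expert) + len(agent)
    w += lr_disc * gw / n
    b += lr_disc * gb / n

    # 3. "Policy update": descend the surrogate cost c(x) = -log D(x).
    #    For this 1-D Gaussian policy, dc/dmean is proportional to -w,
    #    so moving the mean along +w chases high D(x) (a plain gradient
    #    step standing in for TRPO/PPO).
    agent_mean += lr_pi * w

print(agent_mean)  # the mean should have drifted toward the expert's +1
```

The adversarial dynamic is visible even in this toy: the discriminator's weight pulls the agent toward regions it classifies as expert-like, and shrinks as the two distributions overlap.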

  5. Generative Adversarial Imitation Learning (GAIL) • GAIL requires model-free RL inner loops. (Cartoon: "I don't want to move a lot…") • Environment simulation is required. • Sample-efficiency issues: obtaining trajectory samples from the environment is often very costly, e.g., for physical robots in the real world.

  6. Generative Adversarial Imitation Learning (GAIL) • GAIL requires model-free RL inner loops, environment simulation, and many costly trajectory samples (previous slide). • Motivation • At each iteration, the discriminator is updated by using minibatches. • How about using Bayesian classification to train the discriminator? • Expected to yield a more refined cost function for imitation learning!

  7. Bayesian Framework for GAIL • Probabilistic model for trajectories • Each trajectory is a sequence of state-action pairs satisfying the Markov property.

  8. Bayesian Framework for GAIL • Probabilistic model for trajectories • Each trajectory is a sequence of state-action pairs satisfying the Markov property. • Two policies: the agent's policy and the expert's policy, each inducing its own trajectory distribution.
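In standard notation (an assumption, since the slide's formula did not survive extraction), the Markovian trajectory model referenced here reads:

```latex
% Trajectory distribution induced by a policy \pi (standard form; an assumption)
p(\tau \mid \pi)
  = \rho_0(s_0) \prod_{t=0}^{T-1} \pi(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t),
\qquad \tau = (s_0, a_0, s_1, a_1, \ldots, s_T)
```

The agent's policy $\pi_\theta$ and the expert's policy $\pi_E$ each plug into this same factorization, giving the two trajectory distributions the discriminator must tell apart.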

  9. Bayesian Framework for GAIL • Role of the discriminator • The discriminator models the probability that a given trajectory comes from the expert rather than the agent.

  10. Bayesian Framework for GAIL • Posterior distributions • Posterior for the discriminator (conditioned on perfect trajectory discrimination) • Posterior for the policy (conditioned on preventing perfect discrimination)

  11. Bayesian Framework for GAIL • Posterior distributions • Posterior for the discriminator (conditioned on perfect trajectory discrimination) • Posterior for the policy (conditioned on preventing perfect discrimination) • In contrast, GAIL uses maximum likelihood estimation (MLE) for both the policy and discriminator updates!
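The slide's equations did not survive extraction. A schematic reconstruction (not the paper's exact formulas; here $D_\phi(s,a)$ denotes the probability the discriminator assigns to "expert", and $\theta$, $\phi$ the policy and discriminator parameters) is:

```latex
% Schematic posteriors (a reconstruction; not the paper's exact equations)
% Discriminator posterior, conditioned on perfectly discriminating
% expert trajectories \tau_E from agent trajectories \tau_A:
p(\phi \mid \tau_A, \tau_E) \propto
  p(\phi) \prod_{(s,a) \in \tau_E} D_\phi(s,a)
          \prod_{(s,a) \in \tau_A} \bigl(1 - D_\phi(s,a)\bigr)
% Policy posterior, conditioned on the agent's trajectories being
% (mis)classified as the expert's, i.e. preventing perfect discrimination:
p(\theta \mid \tau_E, \phi) \propto
  p(\theta)\, \mathbb{E}_{\tau \sim \pi_\theta}
    \Bigl[\, \prod_{(s,a) \in \tau} D_\phi(s,a) \Bigr]
```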

  12. Bayesian GAIL: GAIL with Posterior-Predictive Cost • The objective: reinforcement learning with a posterior-predictive cost.
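One way to build intuition for a posterior-predictive cost is to average the GAIL surrogate cost -log D(s,a) over several plausible discriminators instead of plugging in a single MLE fit. The sketch below uses a bootstrap ensemble as a crude stand-in for posterior samples; the paper's actual Bayesian treatment differs, and all names and settings here are illustrative assumptions:

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D "state-action features" (illustrative, as before).
expert = [random.gauss(+1.0, 0.3) for _ in range(100)]
agent  = [random.gauss(-1.0, 0.3) for _ in range(100)]

def train_discriminator(ex, ag, steps=200, lr=0.5):
    """Logistic regression, expert = 1, agent = 0 (one MLE fit)."""
    w, b = 0.0, 0.0
    n = len(ex) + len(ag)
    for _ in range(steps):
        gw = gb = 0.0
        for x in ex:
            p = sigmoid(w * x + b)
            gw += (1.0 - p) * x
            gb += (1.0 - p)
        for x in ag:
            p = sigmoid(w * x + b)
            gw += -p * x
            gb += -p
        w += lr * gw / n
        b += lr * gb / n
    return w, b

# Crude stand-in for posterior samples: an ensemble of discriminators,
# each trained on a bootstrap resample of the data.
posterior_samples = [
    train_discriminator([random.choice(expert) for _ in expert],
                        [random.choice(agent) for _ in agent])
    for _ in range(5)
]

def posterior_predictive_cost(x):
    # Average the surrogate cost -log D(x) over the posterior samples,
    # instead of using a single point-estimate discriminator as in GAIL.
    return sum(-math.log(sigmoid(w * x + b))
               for w, b in posterior_samples) / len(posterior_samples)

# Expert-like features should be cheaper than agent-like ones.
print(posterior_predictive_cost(+1.0), posterior_predictive_cost(-1.0))
```

Averaging over discriminators smooths the cost in regions where the individual fits disagree, which is the intuition behind using the posterior predictive rather than a single minibatch-trained discriminator.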

  13. Bayesian GAIL: GAIL with Posterior-Predictive Cost • The objective: reinforcement learning with a posterior-predictive cost. • [Figure: learning curves for 5 MuJoCo tasks]

  14. Bayesian GAIL: GAIL with Posterior-Predictive Cost • For more information, please come to our poster session! Wed Dec 5th, 5-7 PM @ Room 210 & 230 AB, poster #129. Thanks!
