
Imitating Latent Policies from Observation



  1. Imitating Latent Policies from Observation Ashley D. Edwards, Himanshu Sahni, Yannick Schroecker, Charles L. Isbell Georgia Institute of Technology

  2. Introduction
     • Imitation from Observation enables learning from state sequences
     • Typical approaches need extensive environment interactions
     • Humans can learn policies just by watching

  3. Approach
     Given: Sequence of noisy expert observations
     Assumption: Discrete actions with deterministic transitions
     • z is defined as a latent action that caused a transition to occur
     • z can imply a real action or some other type of transition
     • A latent policy is the probability of taking a latent action in some state
     [Figure labels: Action: Right, Action: Right; Z = 1, Z = 2]
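The definition of a latent policy on slide 3 can be illustrated numerically. This is a minimal sketch, not the authors' implementation: it assumes the policy is a softmax over per-state scores, and the name `latent_policy`, the number of latent actions, and the score values are all hypothetical.

```python
import numpy as np

def latent_policy(logits):
    """Turn unnormalized scores for each latent action z into probabilities pi(z | s)."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical scores for 3 latent actions in a single state (illustrative values).
logits = np.array([2.0, 0.5, -1.0])
pi = latent_policy(logits)

# The most likely latent action in this state.
z_star = int(np.argmax(pi))
```

Here `pi` is a distribution over latent actions rather than real environment actions; nothing in it says yet which real action, if any, a given z corresponds to.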

  4. Approach: ILPO
     1. Given sequence of observations, learn latent policy
     2. Use a few environment steps to align actions

  5. Approach: ILPO (same two steps as slide 4)
     [Figure: Latent policy network]
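Step 1 of ILPO, learning a latent policy from observations alone, can be sketched with two losses on a single observed transition. The slides do not spell out the training objective, so both losses below are assumptions: one scores only the best-matching per-latent-action prediction, the other scores the policy-weighted average prediction. The function names and the 1-D toy values are hypothetical.

```python
import numpy as np

def min_loss(preds, next_state):
    """Squared error of the closest per-latent-action prediction, and its index."""
    errors = np.sum((preds - next_state) ** 2, axis=1)  # one error per latent action
    z_best = int(np.argmin(errors))
    return float(errors[z_best]), z_best

def expected_loss(pi, preds, next_state):
    """Squared error of the policy-weighted average prediction."""
    expected_next = pi @ preds  # sum over z of pi(z | s) * preds[z]
    return float(np.sum((expected_next - next_state) ** 2))

# Hypothetical 1-D world: from state 3, latent action 0 predicts 2, latent action 1 predicts 4.
preds = np.array([[2.0], [4.0]])
next_state = np.array([4.0])   # the expert was observed moving to state 4
pi = np.array([0.1, 0.9])      # illustrative latent policy for this state

loss_min, z_best = min_loss(preds, next_state)
loss_exp = expected_loss(pi, preds, next_state)
```

Because transitions are assumed deterministic, penalizing only the closest prediction lets each latent action specialize in one kind of transition, while the weighted loss pushes `pi` toward the latent action the expert actually appears to take.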

  6. Approach: ILPO (same two steps as slide 4)
     [Figure (b): Action remapping network]
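Step 2 of ILPO, aligning latent actions with real ones using a few environment steps, can be illustrated with a count-based stand-in for the action remapping network. This is a simplification under stated assumptions, not the network from the slide: `fit_remapping` and `preds_fn` are hypothetical names, the world is a 1-D toy, and each environment transition is assigned to the latent action whose predicted next state is closest.

```python
import numpy as np

def fit_remapping(preds_fn, transitions, n_latent, n_actions):
    """Map each latent action to the real action it most often co-occurs with."""
    counts = np.zeros((n_latent, n_actions), dtype=int)
    for state, action, next_state in transitions:
        preds = preds_fn(state)  # shape (n_latent, state_dim)
        errors = np.sum((preds - next_state) ** 2, axis=1)
        z = int(np.argmin(errors))  # latent action that best explains the step
        counts[z, action] += 1
    return counts.argmax(axis=1)  # index z -> most frequent real action

def preds_fn(state):
    """Hypothetical learned predictions: latent 0 moves left, latent 1 moves right."""
    return np.array([state - 1.0, state + 1.0])

# A few environment steps as (state, real action, next state).
# Real action ids are deliberately ordered differently: 0 = right, 1 = left.
transitions = [
    (np.array([3.0]), 0, np.array([4.0])),
    (np.array([4.0]), 1, np.array([3.0])),
    (np.array([3.0]), 0, np.array([4.0])),
]

z_to_action = fit_remapping(preds_fn, transitions, n_latent=2, n_actions=2)
```

In this toy case the latent indices and the real action ids disagree, which is why some environment interaction is needed at all: the observations alone cannot reveal how latent actions are numbered in the real action space.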

  7. Experiments: Classic Control
     • Access to expert observations only
     • No reward function used in approach
     • Comparison to Behavioral Cloning from Observation [1]
     [1] Torabi, Faraz, Garrett Warnell, and Peter Stone. "Behavioral cloning from observation." Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, 2018.

  8. Experiments: CoinRun

  9. Experiments: CoinRun

  10. Thank You! Room: Pacific Ballroom at 6:30pm (Today)! Poster: #33
