
Per-Decision Option Discounting



  1. Per-Decision Option Discounting. Anna Harutyunyan, Peter Vrancx, Philippe Hamel, Ann Nowe, Doina Precup

  2. Motivation: Agents that reason over long temporal horizons

  3. Motivation: Agents that reason over long temporal horizons. Horizon depends on discount γ.

  5. Motivation: Agents that reason over long temporal horizons. Horizon depends on discount γ. Larger grid requires a larger γ.

  6. Motivation: Agents that reason over long temporal horizons. Horizon depends on discount γ. Larger grid requires a larger γ. Large γ-s are inefficient in practice :(

  7. Motivation: Agents that reason over long temporal horizons. Horizon depends on discount γ. Larger grid requires a larger γ. Temporal abstraction?

  8. Motivation: Agents that reason over long temporal horizons. Horizon depends on discount γ. Larger grid requires a larger γ. Temporal abstraction? Options still tied to γ!

  9. Motivation: Agents that reason over long temporal horizons. Horizon depends on discount γ. Larger grid requires a larger γ. Temporal abstraction? Options still tied to γ! Contribution: Generalize the options framework to let it extend the agent’s horizon.

  10. The Options Framework. Reward model: [equation not preserved]. Transition model: [equation not preserved].

  12. Options with Time Dilation. Reward model: [equation not preserved]. Transition model: [equation not preserved]. Annotations: (1) decouple, (2) per-decision.

  14. Options with Time Dilation. Reward model: [equation not preserved]. Transition model: [equation not preserved]. Annotations: (1) decouple, (2) per-decision. γ_p controls how much we care about option duration (pseudo-primitive when γ_p = 1).

  15. Options with Time Dilation. Reward model: [equation not preserved]. Transition model: [equation not preserved]. Annotations: (1) decouple, (2) per-decision. γ_p controls how much we care about option duration (pseudo-primitive when γ_p = 1). Key intuition: Insulate option time from global time.

  16. Primitive Timestep Invariance. Ours vs. Classical.

  17. Bias-Variance Tradeoff. Empirical error (Four Rooms). Analytical variance bound.

  18. Bias-Variance Tradeoff. Empirical error (Four Rooms). Analytical variance bound. Larger γ_p can induce less variance!

  19. Bias-Variance Tradeoff. Empirical error (Four Rooms). Analytical variance bound. Larger γ_p can induce less variance! Thanks! More at poster #114 :)
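
The motivation slides' point that the horizon depends on γ, and that classical options stay tied to γ, can be illustrated with a small sketch. It relies on two standard facts: a discount γ gives an effective horizon of roughly 1/(1 − γ), and a classical option lasting k steps discounts whatever follows by γ^k. The slide equations are not preserved in this transcript, so the decoupled form below (one decision-level factor `gamma_d` times `gamma_p` per additional inner step) is only an assumption chosen to be consistent with the "pseudo-primitive when γ_p = 1" remark. It is not the authors' definition, and all function and parameter names are hypothetical.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma) for a discount 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def classical_option_discount(gamma: float, duration: int) -> float:
    """Classical options: the value after a k-step option is scaled by gamma ** k."""
    if duration < 1:
        raise ValueError("an option lasts at least one primitive step")
    return gamma ** duration


def decoupled_option_discount(gamma_d: float, gamma_p: float, duration: int) -> float:
    """ASSUMED form (not from the slides): one decision-level factor gamma_d,
    then gamma_p for each additional primitive step inside the option."""
    if duration < 1:
        raise ValueError("an option lasts at least one primitive step")
    return gamma_d * gamma_p ** (duration - 1)


if __name__ == "__main__":
    print("gamma -> effective horizon")
    for gamma in (0.9, 0.99, 0.999):
        print(f"  {gamma} -> {effective_horizon(gamma):.0f}")

    print("duration: classical | decoupled (gamma_p=1) | decoupled (gamma_p=gamma)")
    gamma = 0.9
    for k in (1, 5, 20):
        classical = classical_option_discount(gamma, k)
        pseudo_primitive = decoupled_option_discount(gamma, 1.0, k)
        same_as_classical = decoupled_option_discount(gamma, gamma, k)
        print(f"  {k:>2}: {classical:.4f} | {pseudo_primitive:.4f} | {same_as_classical:.4f}")
```

Under this assumed form, setting `gamma_p = 1` makes the discount independent of option duration, so the option is discounted like a single primitive action, while setting `gamma_p` equal to `gamma_d` recovers the classical γ^k behaviour.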
