

  1. Towards TempoRL: Learning When to Act
     André Biedenkapp, Raghu Rajan, Frank Hutter & Marius Lindauer
     BIG@ICML 2020

  2. In a Nutshell
     1. We propose a proactive way of doing RL.
     2. We introduce skip connections into MDPs:
        ○ through action repetition
        ○ allowing faster propagation of rewards
     3. We propose a novel algorithm using skip connections:
        ○ learn what action to take & when to make new decisions
        ○ condition the "when" on the "what"
     4. We evaluate our approach with tabular Q-learning on small grid worlds.
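The skip connections in point 2 can be illustrated as a single aggregated n-step transition built by action repetition, so a distant reward reaches the decision state in one update. A minimal sketch (the `Corridor` environment and `skip_step` names are hypothetical, not from the paper's code):

```python
class Corridor:
    """Toy 1-D environment: walk right from state 0 to the goal at 5."""
    def __init__(self):
        self.s = 0

    def step(self, action):
        self.s = min(max(self.s + action, 0), 5)
        reward = 1.0 if self.s == 5 else 0.0
        return self.s, reward, self.s == 5


def skip_step(env, action, skip, gamma=0.99):
    """Repeat `action` for up to `skip` steps (skip >= 1) and return one
    aggregated transition, so the goal reward reaches the starting state
    in a single update instead of trickling back one step at a time."""
    total, discount, done = 0.0, 1.0, False
    for _ in range(skip):
        state, reward, done = env.step(action)
        total += discount * reward
        discount *= gamma
        if done:
            break
    return state, total, discount, done


state, ret, disc, done = skip_step(Corridor(), +1, skip=5)
print(state, done)  # prints "5 True": one skip transition covers the corridor
```

The returned discounted reward sum and remaining discount are exactly the pieces an n-step Q-learning target needs, which is what makes reward propagation faster than with 1-step transitions.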

  3. Motivation (figure: grid world with rewards r = 0 and r = 1)

  4. Motivation (figure: grid world with rewards r = 0 and r = -1)

  5. Optimal Policies

  6. Optimal Policies: When do we need to act? (# Steps: 16, # Decisions: 16)

  7. Optimal Policies: When do we need to act? (# Steps: 16, # Decisions: 5)

  8. Optimal Policies: When do we need to act? (# Steps: 16, # Decisions: 4)

  9. Optimal Policies: When do we need to act? (# Steps: 16, # Decisions: 3)

  10. Proactive Decision Making: reactive control needs 16 steps and 16 decisions; proactive control needs 16 steps but only 3 decisions (~80% fewer decision points)

  11. Skip MDPs

  12. Flat Hierarchy
      1. Use standard Q-learning to determine the behaviour (the action Q-function).
      2. Condition skips on the chosen action (the skip Q-function).
      3. Play the action for the next j steps.
      The skip Q-function can be learned using n-step updates.
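The flat hierarchy above can be sketched in tabular form: a behaviour table Q(s, a) learned with 1-step Q-learning, and a skip table Q(s, a, j) conditioned on the chosen action and learned with n-step updates. This is an illustrative sketch on a toy corridor MDP, not the authors' released code (that lives at https://github.com/automl/TabularTempoRL); all names and hyperparameters here are made up.

```python
import numpy as np

GOAL, MAX_SKIP = 5, 4           # goal state; a skip j repeats the action j+1 times
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1
ACTIONS = [-1, +1]              # move left / move right on a 1-D corridor
rng = np.random.default_rng(0)

Q_action = np.zeros((GOAL + 1, len(ACTIONS)))          # behaviour: Q(s, a)
Q_skip = np.zeros((GOAL + 1, len(ACTIONS), MAX_SKIP))  # skips: Q(s, a, j)

def step(s, a):
    s2 = min(max(s + ACTIONS[a], 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(q):
    """Argmax with random tie-breaking."""
    return int(rng.choice(np.flatnonzero(q == q.max())))

for episode in range(500):
    s, done, t = 0, False, 0
    while not done and t < 200:
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else greedy(Q_action[s])
        # Condition "when" (the skip j) on "what" (the chosen action a).
        j = rng.integers(MAX_SKIP) if rng.random() < EPS else greedy(Q_skip[s, a])
        s0, ret, disc = s, 0.0, 1.0
        for _ in range(j + 1):          # play the action for the next j+1 steps
            s2, r, done = step(s, a)
            # 1-step update of the behaviour Q at every state along the skip.
            Q_action[s, a] += ALPHA * (
                r + (0.0 if done else GAMMA * Q_action[s2].max()) - Q_action[s, a])
            ret += disc * r
            disc *= GAMMA
            s, t = s2, t + 1
            if done:
                break
        # n-step update of the skip Q with the accumulated discounted return.
        target = ret + (0.0 if done else disc * Q_action[s].max())
        Q_skip[s0, a, j] += ALPHA * (target - Q_skip[s0, a, j])

print(greedy(Q_action[0]))  # greedy behaviour at the start state
```

The key design points mirror the slide: the behaviour Q is updated at every state visited during a skip, while the skip Q receives one n-step target per decision, so rewards propagate over the whole skipped span at once.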

  13. Experimental Evaluation

  14. Experimental Evaluation

  15. Wrap-Up
      Code & data available: https://github.com/automl/TabularTempoRL
      Future work:
      - Use deep function approximation
      - Different exploration mechanisms for skip and behaviour policies
