

  1. Towards TempoRL: Learning When to Act
  André Biedenkapp, Raghu Rajan, Frank Hutter & Marius Lindauer
  BIG@ICML 2020

  2. In a Nutshell
  1. We propose a proactive way of doing RL.
  2. We introduce skip-connections into MDPs:
     ○ through action repetition
     ○ allows for faster propagation of rewards
  3. We propose a novel algorithm using skip-connections:
     ○ learn what action to take & when to make new decisions
     ○ condition the when on the what
  4. We evaluate our approach with tabular Q-learning on small grid worlds.

  3. Motivation [figure: grid world with rewards r = 0 and r = 1]

  4. Motivation [figure: grid world with rewards r = 0 and r = -1]

  5. Optimal Policies

  6. Optimal Policies: When do we need to act? # Steps: 16, # Decisions: 16

  7. Optimal Policies: When do we need to act? # Steps: 16, # Decisions: 5

  8. Optimal Policies: When do we need to act? # Steps: 16, # Decisions: 4

  9. Optimal Policies: When do we need to act? # Steps: 16, # Decisions: 3

  10. Proactive Decision Making
  Step-by-step: # Steps: 16, # Decisions: 16
  Proactive: # Steps: 16, # Decisions: 3 (~80% fewer decision points)

  11. Skip MDPs
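A skip in such a skip MDP can be read as one macro transition: the agent commits to a single action for several primitive steps and receives the accumulated discounted reward, so the reward of the whole skip reaches the decision state in one update. A minimal sketch of this idea (the `env_step` helper and the toy corridor below are illustrative assumptions, not code from the talk):

```python
def skip_step(env_step, state, action, skip, gamma=0.99):
    """Execute `action` for up to `skip` (>= 1) consecutive steps.

    Returns the final state, the discounted reward accumulated over
    the whole skip, the done flag, and the number of primitive steps
    actually taken, so the skip can be treated as a single n-step
    transition. `env_step(state, action) -> (next_state, reward, done)`
    is an assumed environment helper.
    """
    total_reward, discount = 0.0, 1.0
    for t in range(skip):
        state, reward, done = env_step(state, action)
        total_reward += discount * reward
        discount *= gamma
        if done:
            break
    return state, total_reward, done, t + 1

# Toy 1-D corridor: moving right (action 1) until position 5 gives
# reward 1 and ends the episode; every other step gives reward 0.
def corridor_step(pos, action):
    pos = min(pos + 1, 5) if action == 1 else max(pos - 1, 0)
    return pos, float(pos == 5), pos == 5

s, r, done, j = skip_step(corridor_step, 0, 1, skip=5, gamma=1.0)
# A single decision covers five primitive steps and reaches the goal.
```

With ordinary one-step transitions the same trajectory would need five separate decisions before the goal reward could start propagating back.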

  12. Flat Hierarchy
  1. Use standard Q-learning to determine the behaviour.
  2. Condition skips on the chosen action.
  3. Play the action for the chosen number of steps.
  The action Q-function learns what to do; the skip Q-function, which can be learned using n-step updates, learns when to make a new decision.
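The three steps above can be sketched as a minimal tabular learner. This is an illustrative reconstruction under assumed details (toy corridor environment, hyperparameters, epsilon-greedy with random tie-breaking); the authors' actual code is in the repository linked in the wrap-up:

```python
import random
from collections import defaultdict

GAMMA, ALPHA, EPS, MAX_SKIP = 0.99, 0.5, 0.1, 4
ACTIONS = (0, 1)                      # illustrative: left / right
q_action = defaultdict(float)         # Q(s, a): what to do
q_skip = defaultdict(float)           # Q(s, a, j): when to decide again

def eps_greedy(candidates, value):
    """Epsilon-greedy selection with random tie-breaking."""
    if random.random() < EPS:
        return random.choice(candidates)
    best = max(value(c) for c in candidates)
    return random.choice([c for c in candidates if value(c) == best])

def run_episode(env_step, start_state, max_decisions=50):
    s, done = start_state, False
    for _ in range(max_decisions):
        if done:
            break
        # 1. standard Q-learning determines the behaviour action
        a = eps_greedy(list(ACTIONS), lambda a_: q_action[(s, a_)])
        # 2. the skip length is conditioned on the chosen action
        j = eps_greedy(list(range(1, MAX_SKIP + 1)),
                       lambda j_: q_skip[(s, a, j_)])
        # 3. play the action for the next j steps
        trajectory, s_j, ret = [], s, 0.0
        for t in range(j):
            s_next, r, done = env_step(s_j, a)
            trajectory.append((s_j, r))
            ret += GAMMA ** t * r
            s_j = s_next
            if done:
                break
        n = len(trajectory)
        boot = 0.0 if done else max(q_action[(s_j, a_)] for a_ in ACTIONS)
        # n-step update of the skip Q-function: the whole skip return
        # propagates to the decision state in a single update
        target = ret + GAMMA ** n * boot
        q_skip[(s, a, n)] += ALPHA * (target - q_skip[(s, a, n)])
        # every primitive transition also updates the action Q-function
        for k, (s_k, r_k) in enumerate(trajectory):
            s_next_k = trajectory[k + 1][0] if k + 1 < n else s_j
            boot_k = 0.0 if (done and k == n - 1) else \
                max(q_action[(s_next_k, a_)] for a_ in ACTIONS)
            tgt = r_k + GAMMA * boot_k
            q_action[(s_k, a)] += ALPHA * (tgt - q_action[(s_k, a)])
        s = s_j

# Toy corridor: positions 0..5, goal at 5 (reward 1, episode ends).
def corridor_step(pos, action):
    pos = min(pos + 1, 5) if action == 1 else max(pos - 1, 0)
    return pos, float(pos == 5), pos == 5

random.seed(0)
for _ in range(200):
    run_episode(corridor_step, 0)
```

Note how the skip update bridges n primitive steps at once, while the per-step updates keep the behaviour Q-function consistent with the underlying MDP.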

  13. Experimental Evaluation

  14. Experimental Evaluation

  15. Wrap-Up
  Code & data available: https://github.com/automl/TabularTempoRL
  Future work:
  - Use deep function approximation
  - Different exploration mechanisms for skip and behaviour policies
