Towards TempoRL: Learning When to Act
André Biedenkapp, Raghu Rajan, Frank Hutter & Marius Lindauer (PowerPoint PPT Presentation)


SLIDE 1

Towards TempoRL Biedenkapp, Rajan, Hutter and Lindauer BIG@ICML 2020

Towards TempoRL: Learning When to Act

André Biedenkapp, Raghu Rajan, Frank Hutter & Marius Lindauer

SLIDE 2

In a Nutshell

1. We propose a proactive way of doing RL
2. We introduce skip-connections into MDPs
   ○ through action repetition
   ○ allows for faster propagation of rewards
3. We propose a novel algorithm using skip-connections
   ○ learn what action to take & when to make new decisions
   ○ condition when on what
4. We evaluate our approach with tabular Q-learning on small grid worlds
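The skip-connection idea can be written down compactly. The notation below is a sketch of my own, not taken from the slides: repeating action a for j steps bridges states s_t and s_{t+j} in the MDP, and the reward that flows across the skip is the discounted sum collected along the way.

```latex
% Sketch (my notation): repeating action a for j steps
% from s_t skips ahead to s_{t+j}, carrying the reward
\[
  r^{\mathrm{skip}}_{t:t+j} = \sum_{i=0}^{j-1} \gamma^{i}\, r_{t+i}
\]
% which yields an n-step-style target for the skip Q-function:
\[
  Q^{\mathrm{skip}}(s_t, a, j) \leftarrow r^{\mathrm{skip}}_{t:t+j}
    + \gamma^{j} \max_{a'} Q\bigl(s_{t+j}, a'\bigr)
\]
```

Because the whole j-step reward is credited in one update, reward information propagates back through the state space faster than with 1-step updates alone.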

SLIDE 3

Motivation

[Figure: grid world environment annotated with rewards r = 0 and r = 1]

SLIDE 4

Motivation

[Figure: grid world environment annotated with rewards r = 0 and r = -1]

SLIDE 5

Optimal Policies

SLIDE 6

Optimal Policies: When do we need to act?

# Steps: 16 # Decisions: 16

SLIDE 7

Optimal Policies: When do we need to act?

# Steps: 16 # Decisions: 5

SLIDE 8

Optimal Policies: When do we need to act?

# Steps: 16 # Decisions: 4

SLIDE 9

Optimal Policies: When do we need to act?

# Steps: 16 # Decisions: 3

SLIDE 10

Proactive Decision Making

Proactive: # Steps: 16, # Decisions: 3 (standard: # Steps: 16, # Decisions: 16)

~80% fewer decision points

SLIDE 11

Skip MDPs

SLIDE 12

Flat Hierarchy

1. Use standard Q-learning to determine the behaviour action
2. Condition skips on the chosen action
3. Play the action for the chosen number of steps

The action Q-function is learned as in standard Q-learning; the skip Q-function can be learned using n-step updates.
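The three steps above can be sketched in tabular form. This is a minimal illustrative implementation under my own assumptions (a toy 1-D corridor with reward 1 at the goal, skip lengths up to 4), not the paper's exact environment or hyperparameters:

```python
import random
from collections import defaultdict

random.seed(0)

# Toy corridor: states 0..9, reward 1.0 on reaching the goal state 9.
N_STATES, GOAL = 10, 9
ACTIONS = [-1, 1]                    # move left / right
MAX_SKIP = 4                         # maximum repetition length j
GAMMA, ALPHA, EPS = 0.99, 0.5, 0.1

Q = defaultdict(float)               # action Q-function:  Q[(s, a)]
skip_Q = defaultdict(float)          # skip Q-function:    skip_Q[(s, a, j)]

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def eps_greedy(table, keys):
    """keys: list of (choice, table_key) pairs."""
    if random.random() < EPS:
        return random.choice(keys)[0]
    return max(keys, key=lambda ck: table[ck[1]])[0]

for episode in range(500):
    s, done = 0, False
    while not done:
        # 1. pick the behaviour action with standard Q-learning
        a = eps_greedy(Q, [(b, (s, b)) for b in ACTIONS])
        # 2. pick the skip length conditioned on the chosen action
        j = eps_greedy(skip_Q, [(k, (s, a, k)) for k in range(1, MAX_SKIP + 1)])
        # 3. repeat action a for j steps; 1-step updates for Q along the way
        ret, s0, disc = 0.0, s, 1.0
        for _ in range(j):
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            ret += disc * r
            disc *= GAMMA
            s = s2
            if done:
                break
        # j-step update of the skip Q-function with the collected return
        boot = 0.0 if done else max(Q[(s, b)] for b in ACTIONS)
        skip_Q[(s0, a, j)] += ALPHA * (ret + disc * boot - skip_Q[(s0, a, j)])
```

Note that the action Q-function gets ordinary 1-step updates even inside a skip, while the skip Q-function receives the full discounted j-step return; this is how both tables can be trained from the same experience.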

SLIDE 13

Experimental Evaluation

SLIDE 14

Experimental Evaluation

SLIDE 15

Wrap-Up

Code & Data available:

https://github.com/automl/TabularTempoRL

Future work:

  • Use deep function approximation
  • Different exploration mechanisms for skip and behaviour policies