Per-Decision Option Discounting



SLIDE 1

Per-Decision Option Discounting

Anna Harutyunyan, Peter Vrancx, Philippe Hamel, Ann Nowe, Doina Precup

SLIDE 2

Motivation: Agents that reason over long temporal horizons

SLIDE 3

Motivation: Agents that reason over long temporal horizons

Horizon depends on discount γ

SLIDE 5

Motivation: Agents that reason over long temporal horizons

Horizon depends on discount γ
Larger grid requires a larger γ

SLIDE 6

Motivation: Agents that reason over long temporal horizons

Horizon depends on discount γ
Larger grid requires a larger γ
Large values of γ are inefficient in practice :(

SLIDE 7

Motivation: Agents that reason over long temporal horizons

Horizon depends on discount γ
Larger grid requires a larger γ
Temporal abstraction?

SLIDE 8

Motivation: Agents that reason over long temporal horizons

Horizon depends on discount γ
Larger grid requires a larger γ
Temporal abstraction? Options still tied to γ!

SLIDE 9

Motivation: Agents that reason over long temporal horizons

Horizon depends on discount γ
Larger grid requires a larger γ
Temporal abstraction? Options still tied to γ!

Contribution: Generalize the options framework to let it extend the agent’s horizon.
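The link between the discount and the horizon in the motivation slides can be made concrete with the common rule of thumb that rewards more than about 1/(1 - γ) steps away contribute little to the return. The sketch below is a minimal illustration under that assumption; the function names and the example horizons are hypothetical and do not come from the slides.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma) for a discount 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def discount_for_horizon(horizon: float) -> float:
    """Inverse of effective_horizon: the gamma whose rule-of-thumb horizon is `horizon`."""
    if horizon < 1.0:
        raise ValueError("horizon must be at least 1 step")
    return 1.0 - 1.0 / horizon


if __name__ == "__main__":
    # Hypothetical goal distances, standing in for grids of growing size.
    for steps in (10, 100, 1000):
        gamma = discount_for_horizon(steps)
        print(f"horizon {steps:>4} steps -> gamma = {gamma:.3f}")
```

Under this rule of thumb, every tenfold increase in the distance the agent must look ahead pushes γ one more "nine" toward 1, which is the regime the slides call inefficient in practice.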

SLIDE 10

The Options Framework

Reward model:
Transition model:
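The slide names the two option models, but the formulas themselves do not appear in the text. In the standard options framework of Sutton, Precup and Singh they are usually written as below. The notation is an assumption and may differ from the slide's: k is the random duration of option o, r_i is the reward at the i-th step, and p(s', k | s, o) is the probability that the option terminates in s' after k steps.

```latex
% Reward model: expected discounted reward accumulated while option o runs from s
R(s, o) = \mathbb{E}\left[ \sum_{i=1}^{k} \gamma^{i-1} r_i \,\middle|\, s, o \right]

% Transition model: discounted probability of terminating in s' after k steps
P(s' \mid s, o) = \sum_{k=1}^{\infty} \gamma^{k} \, p(s', k \mid s, o)
```

Both models share the same γ, and the transition model discounts by γ^k, so the weight of whatever follows an option still shrinks with every primitive step the option takes.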

SLIDE 12

Options with Time Dilation

Reward model:
Transition model:

(1) decouple (2) per-decision

SLIDE 14

Options with Time Dilation

Reward model:
Transition model:

(1) decouple (2) per-decision

γp controls how much we care about option duration (pseudo-primitive when γp = 1)

SLIDE 15

Options with Time Dilation

Reward model:
Transition model:

(1) decouple (2) per-decision

γp controls how much we care about option duration (pseudo-primitive when γp = 1)

Key intuition: Insulate option time from global time
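The time-dilated equations are not present in the slide text, so the following is only one plausible reading of "(1) decouple (2) per-decision", not the authors' definition. It assumes that an option lasting k steps discounts the value of its successor state by one factor of γ for the decision itself and by γp for each of the remaining k - 1 steps. The function names and the exact form γ · γp^(k-1) are assumptions made for illustration.

```python
def classical_discount(k: int, gamma: float) -> float:
    """Classical options: the successor value is discounted by gamma**k."""
    if k < 1:
        raise ValueError("an option lasts at least one step")
    return gamma ** k


def dilated_discount(k: int, gamma: float, gamma_p: float) -> float:
    """ASSUMED time-dilated form (not taken from the slides):
    one factor of gamma per decision, gamma_p for each further step."""
    if k < 1:
        raise ValueError("an option lasts at least one step")
    return gamma * gamma_p ** (k - 1)


if __name__ == "__main__":
    gamma = 0.9
    for k in (1, 5, 20):
        print(
            f"k={k:>2}  classical={classical_discount(k, gamma):.4f}  "
            f"gamma_p=gamma: {dilated_discount(k, gamma, gamma):.4f}  "
            f"gamma_p=1: {dilated_discount(k, gamma, 1.0):.4f}"
        )
```

Under this assumed form, setting γp = γ recovers classical discounting, while γp = 1 charges every option a single factor of γ regardless of its duration, like a primitive action. That is consistent with the slide's "pseudo-primitive" remark and with the idea of insulating option time from global time.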

SLIDE 16

Primitive Timestep Invariance

Ours Classical

SLIDE 17

Bias-Variance Tradeoff

Analytical variance bound
Empirical error (Four Rooms)

SLIDE 18

Bias-Variance Tradeoff

Analytical variance bound
Empirical error (Four Rooms)

Larger γp can induce less variance!

SLIDE 19

Bias-Variance Tradeoff

Analytical variance bound
Empirical error (Four Rooms)

Larger γp can induce less variance!

Thanks! More at poster #114 :)