Hindsight Credit Assignment. Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mo Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos. NeurIPS 2019.


slide-1
SLIDE 1

NeurIPS 2019

Hindsight Credit Assignment

Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mo Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos

slide-2
SLIDE 2

How did past actions influence future outcomes?

slide-3
SLIDE 3

RL relies on MDP structure, and takes time as main proxy for credit relevance

slide-5
SLIDE 5

RL relies on MDP structure, and takes time as main proxy for credit relevance

Time

slide-6
SLIDE 6

RL relies on MDP structure, and takes time as main proxy for credit relevance

Time

Credit

slide-7
SLIDE 7

RL relies on MDP structure, and takes time as main proxy for credit relevance

Time

Credit

Meeting at 4pm

slide-8
SLIDE 8

RL relies on MDP structure, and takes time as main proxy for credit relevance

Time

Credit

Lunch

slide-9
SLIDE 9

RL relies on MDP structure, and takes time as main proxy for credit relevance

Time

Credit

Umbrella

slide-10
SLIDE 10

RL relies on MDP structure, and takes time as main proxy for credit relevance

Time

Credit Credit
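To make "time as a proxy for credit" concrete, here is a minimal sketch, assuming a TD(λ)-style eligibility trace (an illustration chosen here, not something the slides specify). The weight an earlier action receives for a later reward depends only on how many steps separate them. The names `time_based_credit`, `gamma`, and `lam` are hypothetical.

```python
def time_based_credit(num_steps, gamma=0.99, lam=0.9):
    """Credit weight given to an action taken k steps before a reward.

    Assumes a TD(lambda)-style trace: the weight is (gamma * lam) ** k,
    so it depends only on the elapsed time k, not on what the action was.
    """
    return [(gamma * lam) ** k for k in range(num_steps)]


# Toy settings chosen so the weights are easy to read.
weights = time_based_credit(5, gamma=1.0, lam=0.5)
print(weights)
```

Under this scheme, whichever action is closest in time to the outcome gets the most credit, whether or not it had any bearing on that outcome.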

slide-11
SLIDE 11

Instead of only relying on MDP assumptions, let’s learn credit relevance explicitly!

slide-12
SLIDE 12

x

slide-13
SLIDE 13

future outcome

past action x

slide-15
SLIDE 15

How did past actions influence future outcomes?

future outcome

past action x

slide-16
SLIDE 16

future outcome

past action

How did past actions influence future outcomes?

x y

slide-20
SLIDE 20

future outcome

past action

How did past actions influence future outcomes?

x y z

slide-23
SLIDE 23

future outcome

past action

How did past actions influence future outcomes?

x y z State

slide-24
SLIDE 24

future outcome

past action

How did past actions influence future outcomes?

x y z State Return

slide-25
SLIDE 25

Hindsight Credit Assignment

How relevant was action a for reaching state X_k?

slide-26
SLIDE 26

Hindsight Credit Assignment

How relevant was action a for reaching state X_k?

How relevant was action a for achieving the return Z?
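One way to formalize the two questions is as conditional distributions over the first action, given what happened afterwards. The notation below is an assumption based on the accompanying NeurIPS 2019 paper, since the slides do not define their symbols: $h_k$, $h_z$, and the trajectory distribution $\mathcal{T}(x, \pi)$ (trajectories $\tau$ starting in state $x$ and following policy $\pi$) are not taken from the slides themselves.

```latex
% State-conditional: how likely was action a, given that state y was reached k steps later?
h_k(a \mid x, y) = \mathbb{P}_{\tau \sim \mathcal{T}(x, \pi)}\left(A_0 = a \mid X_k = y\right)

% Return-conditional: how likely was action a, given that the return was z?
h_z(a \mid x, z) = \mathbb{P}_{\tau \sim \mathcal{T}(x, \pi)}\left(A_0 = a \mid Z(\tau) = z\right)

% By Bayes' rule, the ratio to the policy compares the outcome's likelihood
% with and without committing to action a:
\frac{h_k(a \mid x, y)}{\pi(a \mid x)}
  = \frac{\mathbb{P}\left(X_k = y \mid X_0 = x, A_0 = a\right)}
         {\mathbb{P}\left(X_k = y \mid X_0 = x\right)}
```

A ratio of 1 means the action made the outcome neither more nor less likely, so it deserves no action-specific credit for that outcome.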

slide-28
SLIDE 28

Hindsight Credit Assignment

How relevant was action a for reaching state X_k?

How relevant was action a for achieving the return Z?

HCA algorithms: learn the hindsight distribution P and use it to better estimate value functions or policy gradients.
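As a rough illustration of how a hindsight distribution can feed a value estimate, here is a minimal sketch for a one-step toy problem with known probabilities. It computes the distribution exactly with Bayes' rule rather than learning it, and then reweights on-policy returns by the ratio of hindsight probability to policy probability. The toy tables and the function names `hindsight_distribution` and `q_from_hindsight` are invented for this example, and the estimators in the actual work may differ.

```python
def hindsight_distribution(policy, outcome_probs):
    """Return h[z][a] = P(action = a | outcome = z) via Bayes' rule.

    policy[a] is pi(a); outcome_probs[a][z] is P(outcome = z | action = a).
    Assumes every outcome has nonzero probability under the policy.
    """
    num_actions = len(policy)
    num_outcomes = len(outcome_probs[0])
    hindsight = []
    for z in range(num_outcomes):
        joint = [policy[a] * outcome_probs[a][z] for a in range(num_actions)]
        marginal = sum(joint)  # P(outcome = z) under the policy
        hindsight.append([j / marginal for j in joint])
    return hindsight


def q_from_hindsight(policy, outcome_probs, returns, action):
    """Value of `action`, computed from on-policy outcomes weighted by h / pi."""
    hindsight = hindsight_distribution(policy, outcome_probs)
    total = 0.0
    for z, ret in enumerate(returns):
        prob_z = sum(policy[a] * outcome_probs[a][z] for a in range(len(policy)))
        total += prob_z * (hindsight[z][action] / policy[action]) * ret
    return total


# Toy problem (hypothetical numbers): two actions, two outcomes.
policy = [0.5, 0.5]
outcome_probs = [[0.2, 0.8], [0.8, 0.2]]  # rows: actions, columns: outcomes
returns = [0.0, 1.0]                      # return attached to each outcome

h = hindsight_distribution(policy, outcome_probs)
q0 = q_from_hindsight(policy, outcome_probs, returns, action=0)
print(h, q0)
```

In this toy case the reweighted estimate for action 0 matches its directly computed expected return, because action 0 reaches the rewarding outcome with probability 0.8. The reweighting lets returns collected under the policy inform the value of an action even on samples where a different action was taken.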

slide-29
SLIDE 29

Experiments

slide-31
SLIDE 31

Thank you for your attention!

Poster #204 :)