Hierarchical Reinforcement Learning and Human Behavior Matthew - - PowerPoint PPT Presentation

SLIDE 1

Hierarchical Reinforcement Learning and Human Behavior

Matthew Botvinick

Princeton Neuroscience Institute and Department of Psychology, Princeton University

SLIDE 2

SLIDE 3

Knutson et al., NeuroReport, 2001; Schultz et al., Science, 1997

SLIDE 4

Gehring & Willoughby, Science, 2002; Matsumoto & Hikosaka, Nature, 2007

SLIDE 5

From Niv, Joel & Dayan, TICS, 2006 (artwork by B. Balleine); from Gläscher, Daw, Dayan & O’Doherty, 2010

SLIDE 6

SLIDE 7

The Curse of Dimensionality

SLIDE 8

The Blessing of Abstraction

SLIDE 9

Botvinick, Niv & Barto, Cognition, 2009

[Figure: state sequence s(t=1) to s(t=6)]

SLIDE 10

[Figure: grid-world task; cell labels S, P, G and W]

After Sutton, Precup & Singh, 1999; Botvinick, Niv & Barto, Cognition, 2009
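The slide credits Sutton, Precup & Singh (1999), whose options framework specifies a temporally extended action (an "option") by an initiation set, an internal policy and a termination condition. The Python sketch below illustrates that definition on a toy corridor; the corridor, the `Option` class and the `to_doorway` option are hypothetical and are not taken from the slides.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Option:
    """An option: where it can start, how it acts, and when it stops."""
    initiation_set: FrozenSet[int]       # states where the option may be invoked
    policy: Callable[[int], int]         # state -> primitive action
    termination: Callable[[int], float]  # state -> probability of stopping there


def run_option(option: Option, state: int, step: Callable[[int, int], int],
               rng: random.Random, max_steps: int = 100) -> Tuple[int, int]:
    """Execute an option until it terminates; return (final state, steps taken)."""
    if state not in option.initiation_set:
        raise ValueError(f"option cannot be initiated in state {state}")
    for t in range(1, max_steps + 1):
        state = step(state, option.policy(state))
        # Stop with the probability given by the termination condition
        if rng.random() < option.termination(state):
            return state, t
    return state, max_steps


def corridor_step(state: int, action: int) -> int:
    """Hypothetical corridor of states 0..10; action +1 moves right, -1 moves left."""
    return min(10, max(0, state + action))


# Hypothetical "walk to the doorway" option: available in states 0-4,
# always moves right, and terminates on reaching state 5.
to_doorway = Option(
    initiation_set=frozenset(range(5)),
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)

final_state, duration = run_option(to_doorway, 0, corridor_step, random.Random(0))
print(final_state, duration)  # 5 5
```

Because the example uses only termination probabilities of 0 and 1, the run is deterministic; intermediate probabilities would make the option's duration stochastic.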

SLIDE 11

Botvinick, Niv & Barto, Cognition, 2009

[Figure: state sequence s(t=1) to s(t=6)]

SLIDE 12
SLIDE 13

Botvinick & Weinstein, Trans. Royal Society, 2014

SLIDE 14

Humphreys & Forde, Cog. Neuropsych., 2001; Hamilton & Grafton, J Neurosci, 2006

SLIDE 15

[Figure, panels A-D: actor-critic architecture. Labels: state (s), action, Environment, reward R(s), Critic with value V(s), Actor with policy π(s), prediction error δ; putative neural correlates VS (critic), DLS (actor) and DA (δ), plus DLPFC, OFC and HT+; numbered marker 1]

Botvinick, Niv & Barto, Cognition, 2009
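The diagram's labels (state, action, R(s), V(s), π(s), δ) correspond to a standard actor-critic learner: the critic computes a temporal-difference prediction error δ = R + γV(s′) − V(s), and both the critic's values and the actor's action preferences are adjusted in proportion to δ. A minimal tabular sketch, assuming a hypothetical five-state chain, a softmax policy, and illustrative learning rates and discount; it covers only the flat architecture, not the hierarchical extension.

```python
import math
import random


def softmax_sample(prefs, rng):
    """Sample an action index with probability proportional to exp(preference)."""
    top = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(p - top) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]


def train_actor_critic(n_states=5, episodes=500, alpha_critic=0.1,
                       alpha_actor=0.1, gamma=0.9, seed=0):
    """Tabular actor-critic on a hypothetical chain task.

    States 0..n_states-1; action 0 moves left, action 1 moves right.
    Stepping right from the last state enters the terminal state
    (index n_states), which delivers reward R = 1 and ends the episode.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_states + 1)                 # critic: V(s); terminal value stays 0
    H = [[0.0, 0.0] for _ in range(n_states)]  # actor: preferences defining pi(s)
    for _ in range(episodes):
        s = 0
        while s < n_states:
            a = softmax_sample(H[s], rng)
            s_next = s + 1 if a == 1 else max(0, s - 1)
            r = 1.0 if s_next == n_states else 0.0
            delta = r + gamma * V[s_next] - V[s]  # TD prediction error
            V[s] += alpha_critic * delta          # critic update
            H[s][a] += alpha_actor * delta        # actor update for the chosen action
            s = s_next
    return V, H


V, H = train_actor_critic()
print("V(s):", [round(v, 2) for v in V[:-1]])
print("prefers right everywhere:", all(h[1] > h[0] for h in H))
```

The same δ drives both updates, which is the property that lets a single dopamine-like signal train both the critic and the actor in this architecture.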

SLIDE 16

From Curtis & D’Esposito, TICS, 2003

SLIDE 17

White & Wise, Exp Br Res, 1999

SLIDE 18

Miller & Cohen, Ann. Rev. Neurosci, 2001

SLIDE 19

From Badre, TICS, 2008

SLIDE 20

[Figure: actor-critic diagram as on slide 15, with numbered marker 2]

Botvinick, Niv & Barto, Cognition, 2009

SLIDE 21

O’Reilly & Frank, Neural Computation, 2006

SLIDE 22

O’Reilly & Frank, Neural Computation, 2006; Bonini et al., J. Neurosci., 2011

SLIDE 23

[Figure: actor-critic diagram as on slide 15, with numbered marker 3]

Botvinick, Niv & Barto, Cognition, 2009

SLIDE 24

Schoenbaum et al., J Neurosci, 1999

SLIDE 25

[Figure: actor-critic diagram as on slide 15, with numbered marker 4]

Botvinick, Niv & Barto, Cognition, 2009

SLIDE 26

[Figure: state sequence s(t=1) to s(t=6)]

SLIDE 27

Carlos Diuk

SLIDE 28

Diuk et al., J Neurosci, 2013

Carlos Diuk

SLIDE 29

[Figure: state sequence s(t=1) to s(t=6); prediction errors labeled “RPE” and “PPE”]
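In hierarchical RL, reaching a subgoal yields pseudo-reward that trains the option-level policy, so besides the ordinary reward prediction error (RPE) the learner can generate a pseudo-reward prediction error (PPE). The two dissociate when an event changes the distance to a subgoal without changing the distance to the final goal. The sketch below illustrates this with a hypothetical delivery scenario (a truck must collect a package, the subgoal, and bring it to a house, the goal); the coordinates, straight-line distances and the exponential discounting of value with distance are all assumptions made for illustration.

```python
import math


def discounted_value(distance, gamma=0.95, reward=1.0):
    """Assumed value function: a reward `distance` steps away, discounted by gamma."""
    return reward * gamma ** distance


def jump_prediction_errors(truck, pkg_before, pkg_after, house, gamma=0.95):
    """Prediction errors caused by an unexpected jump of the package (the subgoal).

    No reward is delivered and no step is taken at the moment of the jump,
    so each prediction error is simply the change in the relevant value.
    """
    # Top-level goal: the full route truck -> package -> house
    total_before = math.dist(truck, pkg_before) + math.dist(pkg_before, house)
    total_after = math.dist(truck, pkg_after) + math.dist(pkg_after, house)
    rpe = discounted_value(total_after, gamma) - discounted_value(total_before, gamma)
    # Subgoal: pseudo-reward is earned on reaching the package
    ppe = (discounted_value(math.dist(truck, pkg_after), gamma)
           - discounted_value(math.dist(truck, pkg_before), gamma))
    return rpe, ppe


# The package jumps from (3, 4) to (7, 4): the truck-to-package distance grows
# from 5 to about 8.06, but the total route length to the house is unchanged.
rpe, ppe = jump_prediction_errors(truck=(0, 0), pkg_before=(3, 4),
                                  pkg_after=(7, 4), house=(10, 0))
print(f"RPE = {rpe:.3f}, PPE = {ppe:.3f}")
```

A flat learner that tracks only the top-level goal registers no error for this jump, while a hierarchical learner registers a negative PPE because its subgoal has moved farther away.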

SLIDE 30

Ribas-Fernandes et al., Neuron, 2011

Jose Fernandes Alec Solway

SLIDE 31

[Figure: standard RL versus hierarchical RL, panels A-E, plotted over timesteps; prediction errors labeled RPE and PPE]

Ribas-Fernandes et al., Neuron, 2011

Jose Fernandes Alec Solway

SLIDE 32

From Yeung et al., 2005

Ribas-Fernandes et al., Neuron, 2011

Jose Fernandes Alec Solway

SLIDE 33

Ribas-Fernandes et al., Neuron, 2011

Jose Fernandes Alec Solway

SLIDE 34

Botvinick, Niv & Barto, Cognition, 2009

[Figure, panel A: log solution time across episodes; labels Model Evidence and Search Time]

SLIDE 35

The Burden of Abstraction

SLIDE 36
  • 1. What should be learned?
  • 2. Do people learn it?
  • 3. How?
SLIDE 37

Alec Solway Carlos Diuk

Solway et al., PLoS Comp. Biol., 2014

SLIDE 38

Alec Solway Carlos Diuk

Solway et al., PLoS Comp. Biol., 2014

SLIDE 39

Alec Solway Carlos Diuk

Solway et al., PLoS Comp. Biol., 2014

SLIDE 40

Alec Solway Carlos Diuk

$$\Pr(\text{data} \mid \text{model})$$

Solway et al., PLoS Comp. Biol., 2014

SLIDE 41

Model Evidence, Search Time, Codelength

$$\Pr(\text{data} \mid \text{model}) = \sum_{\pi \in \Pi} \Pr(\text{data} \mid \text{model}, \pi)\,\Pr(\pi \mid \text{model})$$

Solway et al., PLoS Comp. Biol., 2014
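The model evidence sums, over a discrete set of policies π, the probability of the data under each policy weighted by that policy's prior probability under the model. A minimal sketch, assuming the per-policy likelihoods and priors are already available as numbers (treating the summed variable as a policy is itself an assumption); the two toy models are hypothetical. Working in log space with log-sum-exp avoids underflow when many small probabilities are combined.

```python
import math


def log_model_evidence(log_likelihoods, log_priors):
    """log Pr(data | model) = log sum_pi Pr(data | model, pi) * Pr(pi | model).

    Both lists are indexed by pi; log-sum-exp keeps the sum numerically stable.
    """
    terms = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Two hypothetical models that both contain one policy explaining the data well
# (likelihood 0.8); the flexible model spreads its prior over eight policies.
simple = log_model_evidence([math.log(0.8), math.log(0.1)], [math.log(1 / 2)] * 2)
flexible = log_model_evidence([math.log(0.8)] + [math.log(0.1)] * 7,
                              [math.log(1 / 8)] * 8)
print(round(math.exp(simple), 4), round(math.exp(flexible), 4))  # 0.45 0.1875
```

The flexible model scores lower even though it contains the same well-fitting policy, because it must spread its prior over more alternatives; this is the Bayesian Occam's razor that model evidence expresses.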

SLIDE 42

Solway et al., PLoS Comp. Biol., 2014

SLIDE 43

Fortunato, Physics Reports, 2010

[Figure panels: Zachary’s karate club; Santa Fe Institute collaborations; Lusseau’s bottlenose dolphins]

SLIDE 44

Simsek, Wolfe & Barto, 2005
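Simsek, Wolfe & Barto (2005) identify candidate subgoals by partitioning the graph of state transitions and treating states on the boundary between densely connected regions as bottlenecks; community detection methods (Fortunato, 2010) address the same partitioning problem in general networks. The sketch below uses plain spectral bisection (splitting states by the sign of the Fiedler vector of the graph Laplacian), a standard partitioning technique that is not necessarily the exact procedure of that paper, on a hypothetical graph of two fully connected rooms joined by a single doorway.

```python
import numpy as np


def two_rooms_graph(room_size=5):
    """Adjacency matrix: two fully connected 'rooms' joined by one doorway edge."""
    n = 2 * room_size
    A = np.zeros((n, n))
    for start in (0, room_size):
        for i in range(start, start + room_size):
            for j in range(start, start + room_size):
                if i != j:
                    A[i, j] = 1.0
    # Doorway: link the last state of room 1 to the first state of room 2
    A[room_size - 1, room_size] = A[room_size, room_size - 1] = 1.0
    return A


def spectral_bisection(A):
    """Split states by the sign of the Fiedler vector of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0


def bottleneck_states(A, side):
    """States incident to an edge that crosses the partition."""
    n = len(A)
    return sorted({i for i in range(n) for j in range(n)
                   if A[i, j] and side[i] != side[j]})


A = two_rooms_graph()
side = spectral_bisection(A)
print(bottleneck_states(A, side))  # [4, 5]
```

The recovered bottlenecks are the two states on either side of the doorway, which is where an agent would place a subgoal for travelling between rooms.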

SLIDE 45
  • 1. What should be learned?
  • 2. Do people learn it?
  • 3. How?
SLIDE 46

Carlos Diuk Debbie Yee

Solway et al., PLoS Comp. Biol., 2014

SLIDE 47

Carlos Diuk Debbie Yee

Solway et al., PLoS Comp. Biol., 2014

SLIDE 48

Carlos Diuk Debbie Yee

Solway et al., PLoS Comp. Biol., 2014

SLIDE 49

[Figure: locations labeled S, G and B; label “Reject”]

Carlos Diuk Debbie Yee

Solway et al., PLoS Comp. Biol., 2014

SLIDE 50
  • 1. What should be learned?
  • 2. Do people learn it?
  • 3. How?
SLIDE 51

$$\Pr(\text{data} \mid \text{model}) = \sum_{\pi \in \Pi} \Pr(\text{data} \mid \text{model}, \pi)\,\Pr(\pi \mid \text{model})$$

SLIDE 52

Anna Schapiro

Schapiro et al., Nature Neurosci, 2013
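Schapiro et al. (2013) presented stimulus sequences generated by a random walk on a graph with community structure in which every node has the same number of neighbours, so transition probabilities alone do not mark the boundaries between communities. The sketch below builds a graph of that kind (three five-node communities arranged in a ring, every node of degree 4) and samples a walk; treat the exact wiring and the function names as illustrative assumptions to be checked against the paper.

```python
import random


def community_graph(n_communities=3, size=5):
    """Ring of communities: each is a clique minus the edge between its two
    boundary nodes, and each boundary node links to a neighbouring community.
    With the defaults every node has degree 4 (15 nodes, 30 edges)."""
    adj = {v: set() for v in range(n_communities * size)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for c in range(n_communities):
        nodes = list(range(c * size, (c + 1) * size))
        first, last = nodes[0], nodes[-1]  # boundary nodes of this community
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if {u, v} != {first, last}:
                    link(u, v)
        # Bridge: this community's last node -> next community's first node
        link(last, ((c + 1) % n_communities) * size)
    return adj


def random_walk(adj, length, seed=0):
    """Sequence of nodes visited by an unbiased random walk."""
    rng = random.Random(seed)
    node = rng.choice(sorted(adj))
    walk = [node]
    for _ in range(length - 1):
        node = rng.choice(sorted(adj[node]))  # each neighbour equally likely
        walk.append(node)
    return walk


adj = community_graph()
print(random_walk(adj, 20))
```

Because every node has four neighbours, each transition in the walk has probability 1/4; only the longer-range community structure distinguishes within-community from between-community transitions.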

SLIDE 53

Schapiro et al., Nature Neurosci, 2013

SLIDE 54

Schapiro et al., Nature Neurosci, 2013

SLIDE 55

Schapiro et al., Nature Neurosci, 2013

SLIDE 56

[Figure; axis label: Time]

Schapiro et al., Nature Neurosci, 2013

SLIDE 57

[Figure: probability of parse (cluster transition parse versus other parse) in Experiment 1 (all trials; Hamiltonian paths) and Experiment 2]

Schapiro et al., Nature Neurosci, 2013

SLIDE 58

Schapiro et al., Nature Neurosci, 2013

[Figure label: + HC]

SLIDE 59

[Figure: successor representation correlation and pattern correlation; label + HC]

Diuk et al., in prep.

Carlos Diuk
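The successor representation (Dayan, 1993) gives, for each starting state s, the expected discounted number of future visits to every state s′: M = sum over t of γ^t T^t = (I − γT)^(-1) for a transition matrix T. The sketch below computes it in closed form for a random walk on a hypothetical graph of two triangles joined by one edge, with an arbitrary discount γ = 0.9, and compares how similar the SR rows of two states are within and across clusters.

```python
import numpy as np


def successor_representation(T, gamma=0.9):
    """M = sum_t gamma^t T^t = (I - gamma T)^{-1} for a row-stochastic matrix T.

    M[s, s2] is the expected discounted number of visits to s2 starting from s
    (the visit at t = 0 included).
    """
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
T = A / A.sum(axis=1, keepdims=True)  # random-walk transition probabilities

M = successor_representation(T)
same_cluster = np.corrcoef(M[0], M[1])[0, 1]   # states in the same triangle
other_cluster = np.corrcoef(M[0], M[5])[0, 1]  # states in different triangles
print(round(same_cluster, 2), round(other_cluster, 2))
```

States that lead to similar futures get similar SR rows, so clusters in the transition graph appear as clusters in SR similarity without any explicit partitioning step.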

SLIDE 60

Schapiro et al., 2013; Rogers & McClelland, 2003

[Figure labels: Current Stimulus, Next Stimulus]

SLIDE 61

Model Evidence, Search Time, Codelength

$$\Pr(\text{data} \mid \text{model})$$

Solway et al., PLoS Comp. Biol., 2014

SLIDE 62
  • cf. Dayan, 1993
SLIDE 63

Rosvall & Bergstrom, PNAS, 2008

SLIDE 64

Mahadevan & Maggioni, 2005

SLIDE 65

Stachenfeld, Botvinick & Gershman, NIPS, 2014

SLIDE 66

Olshausen & Field, Nature, 1996

SLIDE 67

Botvinick & Plaut, Psych Review, 2004

SLIDE 68

Conclusions

  • The scaling problem in RL
  • Hierarchy can help
  • HRL in the brain
  • The need for good representations
  • Model-free versus model-based HRL
  • Task decomposition, bottlenecks, community detection
  • Prospective coding and structure discovery
  • Hierarchy as compression

SLIDE 69

Collaborators

  • Carlos Diuk (Facebook)
  • Jose Ribas-Fernandes (U. Victoria)
  • Anna Schapiro
  • Alec Solway (V. Tech / UCL)
  • Kim Stachenfeld
  • Ari Weinstein
  • Debbie Yee (Wash. U.)

Lab Contributors

  • Andy Barto (UMass)
  • Yael Niv (Princeton)
  • Tim Rogers (Wisconsin)
  • Nick Turk-Browne (Princeton)