Hierarchical Reinforcement Learning and Human Behavior
Matthew Botvinick Princeton Neuroscience Institute and Department of Psychology Princeton University
Knutson et al., NeuroReport, 2001 Schultz et al., Science, 1997
Gehring & Willoughby, Science, 2002 Matsumoto & Hikosaka, Nature, 2007
From Niv, Joel & Dayan, TICS, 2006 (artwork by B. Balleine)
From Gläscher, Daw, Dayan & O’Doherty, 2010
Botvinick, Niv & Barto, Cognition, 2009
[Figure: diagram with labels S, G, P and W]
After Sutton, Precup & Singh, 1999 Botvinick, Niv & Barto, Cognition, 2009
Botvinick & Weinstein, Trans. Royal Society, 2014
Humphreys & Forde, Cog. Neuropsych., 2001 Hamilton & Grafton, J Neurosci, 2006
[Figure: actor-critic architecture with proposed neural correlates. Labels: Environment, state (s), action, reward R(s), value V(s), policy π(s), prediction error δ, Actor, Critic; brain regions DLS, VS, DA, HT+, DLPFC+, OFC.]
Botvinick, Niv & Barto, Cognition, 2009
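The quantities labelled in the diagram above, R(s), V(s), π(s) and δ, can be tied together with a minimal sketch of one tabular actor-critic update. This is an illustration under stated assumptions, not the model from the cited paper: the two-state setup, the learning rates `alpha` and `beta`, the discount `gamma`, and the function names are all hypothetical, and the actor uses the simple preference update in which the chosen action's preference moves in proportion to δ.

```python
import math

def softmax(prefs):
    """Turn action preferences into action probabilities pi(s)."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_update(V, prefs, s, a, r, s_next,
                        alpha=0.1, beta=0.1, gamma=0.9):
    """Apply one temporal-difference update after observing (s, a, r, s_next).

    V     : list of state values maintained by the critic
    prefs : per-state lists of action preferences maintained by the actor
    """
    # Prediction error: reward plus discounted next value, minus current value.
    delta = r + gamma * V[s_next] - V[s]
    # Critic: move V(s) toward the better-informed estimate.
    V[s] += alpha * delta
    # Actor: strengthen (or weaken) the action that was just taken.
    prefs[s][a] += beta * delta
    return delta

# Hypothetical two-state, two-action example with all estimates starting at zero.
V = [0.0, 0.0]
prefs = [[0.0, 0.0], [0.0, 0.0]]
delta = actor_critic_update(V, prefs, s=0, a=1, r=1.0, s_next=1)
policy = softmax(prefs[0])
```

After this single rewarded step the prediction error is positive, so both V for state 0 and the preference for the chosen action increase, and the softmax policy shifts toward that action.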
From Curtis & D’Esposito, TICS, 2003
White & Wise, Exp Br Res, 1999
Miller & Cohen, Ann. Rev. Neurosci, 2001
From Badre, TICS, 2008
[Figure repeated: actor-critic architecture with proposed neural correlates]
Botvinick, Niv & Barto, Cognition, 2009
O’Reilly & Frank, Neural Computation, 2006 Bonini et al., J. Neurosci., 2011
[Figure repeated: actor-critic architecture with proposed neural correlates]
Botvinick, Niv & Barto, Cognition, 2009
Schoenbaum, et al. J Neurosci. 1999
[Figure repeated: actor-critic architecture with proposed neural correlates]
Botvinick, Niv & Barto, Cognition, 2009
Carlos Diuk
Diuk, et al., J Neurosci, 2013
Ribas-Fernandes et al., Neuron, 2011
Jose Fernandes Alec Solway
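[Figure: prediction-error signals over timesteps under standard RL and hierarchical RL. Event labels A to E; signals labelled RPE (reward prediction error) and PPE (pseudo-reward prediction error).]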
Ribas-Fernandes et al., Neuron, 2011
Jose Fernandes Alec Solway
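The contrast between RPE and PPE in the figure can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not the analysis from the cited study: it assumes values equal the negative remaining distance, no discounting, and no reward at the moment of the event, and the function names are hypothetical. It shows how an event that changes the distance to a subgoal but leaves the total distance to the final goal unchanged yields a pseudo-reward prediction error at the subtask level without any top-level reward prediction error.

```python
def prediction_error(reward, value_before, value_after, gamma=1.0):
    """Temporal-difference error: reward + gamma * V(after) - V(before)."""
    return reward + gamma * value_after - value_before

def jump_errors(d_subgoal_before, d_subgoal_after,
                d_total_before, d_total_after):
    """Return (rpe, ppe) for an unexpected change in remaining distances.

    Assumption: value is the negative remaining distance, at both levels.
    """
    # Top level: value depends on the total distance to the final goal.
    rpe = prediction_error(0.0, -d_total_before, -d_total_after)
    # Subtask level: value depends on the distance to the subgoal only.
    ppe = prediction_error(0.0, -d_subgoal_before, -d_subgoal_after)
    return rpe, ppe

# Hypothetical event: the subgoal moves farther away (4 -> 6 steps)
# while the total distance to the final goal stays at 10 steps.
rpe, ppe = jump_errors(4, 6, 10, 10)
```

Here `rpe` is zero and `ppe` is negative, which is the dissociation a hierarchical account predicts and a flat account does not.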
From Yeung, et al., 2005
Botvinick, Niv & Barto, Cognition, 2009
[Figure: panel A, log solution time by episode; further panels labelled Model Evidence and Search Time]
Alec Solway Carlos Diuk
Solway et al., PLoS Comp. Biol., 2014
Fortunato, Physics Reports, 2010
Zachary’s karate club; Santa Fe Institute collaborations; Lusseau’s bottlenose dolphins
Simsek, Wolfe & Barto, 2005
Carlos Diuk, Debbie Yee
Solway et al., PLoS Comp. Biol., 2014
[Figure: panel B, with labels S, G and Reject; axis ticks from 2200 to 2900]
Solway et al., PLoS Comp. Biol., 2014
Anna Schapiro
Schapiro et al., Nature Neurosci, 2013
[Figure: probability of parse (axis 0.1 to 0.4) for cluster-transition parses versus other parses, in Experiment 1 and Experiment 2; conditions labelled All trials and Hamiltonian paths]
Schapiro et al., Nature Neurosci, 2013
[Figure: Successor Representation Correlation (scale 0.0 to 1.0) and Pattern Correlation (scale 0.18 to 0.38)]
Diuk et al., in prep.
Carlos Diuk
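For readers unfamiliar with the successor representation named in the figure, the sketch below computes it for a tiny example. It is a generic illustration, not the analysis behind the slide: the two-state transition matrix, the discount `gamma`, and the function name are assumptions. The successor representation M gives the expected discounted number of future visits to each state from each starting state, and satisfies M = I + γ T M, which the code solves by repeated substitution.

```python
def successor_representation(T, gamma=0.5, iterations=200):
    """Approximate M = (I - gamma * T)^-1 by iterating M <- I + gamma * T * M.

    T is a row-stochastic transition matrix given as a list of lists.
    """
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iterations):
        M = [[identity[i][j]
              + gamma * sum(T[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M

# Hypothetical two-state world in which each state always leads to the other.
T = [[0.0, 1.0],
     [1.0, 0.0]]
M = successor_representation(T, gamma=0.5)
```

With these values each state's own entry converges to 4/3 and the other state's entry to 2/3, so states that tend to follow one another acquire similar rows, which is the kind of similarity a representational correlation analysis could pick up.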
Schapiro et al., 2013; Rogers & McClelland, 2003
[Figure labels: Current Stimulus, Next Stimulus]
Solway et al., PLoS Comp. Biol., 2014
Rosvall & Bergstrom, PNAS, 2008
Mahadevan & Maggioni, 2005
Stachenfeld, Botvinick & Gershman, NIPS, 2014
Olshausen & Field, Nature, 1996
Botvinick & Plaut, Psych Review, 2004
Codelength
Collaborators
Carlos Diuk (Facebook)
Jose Ribas-Fernandes (U. Victoria)
Anna Schapiro
Alec Solway (V. Tech / UCL)
Kim Stachenfeld
Ari Weinstein
Debbie Yee (Wash. U.)
Lab Contributors
Andy Barto (UMass)
Yael Niv (Princeton)
Tim Rogers (Wisconsin)
Nick Turk-Browne (Princeton)