Hierarchical Reinforcement Learning and Human Behavior

Matthew Botvinick
Princeton Neuroscience Institute and Department of Psychology
Princeton University


Knutson et al., NeuroReport, 2001; Schultz et al., Science, 1997

Matsumoto & Hikosaka, Nature, 2007; Gehring & Willoughby, Science, 2002

From Gläscher, Daw, Dayan & O’Doherty, 2010; from Niv, Joel & Dayan, TICS, 2006 (artwork by B. Balleine)

The Curse of Dimensionality

15.6% The Blessing of Abstraction

[Figure: state sequence s(t=1), s(t=2), ..., s(t=6)] Botvinick, Niv & Barto, Cognition, 2009

[Figure: grid with cells labeled W, S, P and G] After Sutton, Precup & Singh, 1999; Botvinick, Niv & Barto, Cognition, 2009

[Figure: state sequence s(t=1), s(t=2), ..., s(t=6)] Botvinick, Niv & Barto, Cognition, 2009

Botvinick & Weinstein, Trans. Royal Society, 2014

Hamilton & Grafton, J Neurosci, 2006; Humphreys & Forde, Cog. Neuropsych., 2001

[Figure, panels A to D: actor-critic diagrams. Labels in panels A and C: Actor, π(s), DLS; Critic, V(s), VS; R(s), HT+; δ, DA; state (s); action; Environment. Panels B and D add labels marked o, plus DLPFC and OFC. Callout 1.] Botvinick, Niv & Barto, Cognition, 2009
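The Actor, Critic, V(s), R(s) and δ labels on this slide match the standard temporal-difference actor-critic scheme. The following is a minimal tabular sketch of that generic scheme, not the model from Botvinick, Niv & Barto (2009). The function names, the three-state chain task and the hyperparameters (`alpha`, `beta`, `gamma`) are all illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(V, prefs, s, a, r, s_next, done,
                      alpha=0.1, beta=0.1, gamma=0.9):
    """Apply one TD update and return the prediction error delta.

    V     : list of state values (the critic)
    prefs : per-state lists of action preferences (the actor)
    """
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]          # prediction error
    V[s] += alpha * delta          # critic moves V(s) toward the target
    prefs[s][a] += beta * delta    # actor reinforces or weakens action a
    return delta


def run_chain(episodes=50, seed=0, max_steps=1000):
    """Hypothetical 3-state chain: action 1 moves right, action 0 stays.
    Reaching state 2 ends the episode with reward 1."""
    rng = random.Random(seed)
    V = [0.0, 0.0, 0.0]
    prefs = [[0.0, 0.0] for _ in range(3)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.choices([0, 1], weights=softmax(prefs[s]))[0]
            s_next = s + a
            done = s_next == 2
            r = 1.0 if done else 0.0
            actor_critic_step(V, prefs, s, a, r, s_next, done)
            if done:
                break
            s = s_next
    return V, prefs
```

Calling `run_chain()` returns the learned values and preferences. The same δ drives both updates, which is the property the diagram highlights: one prediction-error signal trains the critic and the actor.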

From Curtis & D’Esposito, TICS, 2003

White & Wise, Exp Br Res, 1999

Miller & Cohen, Ann. Rev. Neurosci, 2001

From Badre, TICS, 2008

[Figure, panels A to D: actor-critic diagrams. Labels in panels A and C: Actor, π(s), DLS; Critic, V(s), VS; R(s), HT+; δ, DA; state (s); action; Environment. Panels B and D add labels marked o, plus DLPFC and OFC. Callout 2.] Botvinick, Niv & Barto, Cognition, 2009

O’Reilly & Frank, Neural Computation, 2006

O’Reilly & Frank, Neural Computation, 2006; Bonini et al., J. Neurosci., 2011

[Figure, panels A to D: actor-critic diagrams. Labels in panels A and C: Actor, π(s), DLS; Critic, V(s), VS; R(s), HT+; δ, DA; state (s); action; Environment. Panels B and D add labels marked o, plus DLPFC and OFC. Callout 3.] Botvinick, Niv & Barto, Cognition, 2009

Schoenbaum et al., J Neurosci, 1999

[Figure, panels A to D: actor-critic diagrams. Labels in panels A and C: Actor, π(s), DLS; Critic, V(s), VS; R(s), HT+; δ, DA; state (s); action; Environment. Panels B and D add labels marked o, plus DLPFC and OFC. Callout 4.] Botvinick, Niv & Barto, Cognition, 2009

[Figure: state sequence s(t=1), s(t=2), ..., s(t=6)]

Carlos Diuk

Carlos Diuk. Diuk et al., J Neurosci, 2013

[Figure: state sequence s(t=1), s(t=2), ..., s(t=6), annotated with “RPE” and “PPE”]

Jose Fernandes, Alec Solway. Ribas-Fernandes et al., Neuron, 2011

Jose Fernandes, Alec Solway. [Figure: prediction-error time courses for Standard RL (RPE) and Hierarchical RL (RPE, PPE); x-axis: timestep, with points labeled A to E; y-axis: -1 to 1] Ribas-Fernandes et al., Neuron, 2011

Jose Fernandes, Alec Solway. [Figure from Yeung et al., 2005] Ribas-Fernandes et al., Neuron, 2011

Jose Fernandes, Alec Solway. Ribas-Fernandes et al., Neuron, 2011

[Figure, panel A: log solution time vs. episode (1 to 200)] Botvinick, Niv & Barto, Cognition, 2009

The Burden of Abstraction

1. What should be learned?
2. Do people learn it?
3. How?

Alec Solway, Carlos Diuk. Solway et al., PLoS Comp. Biol., 2014

Alec Solway, Carlos Diuk. $\Pr(\text{data} \mid \text{model}) =$ Solway et al., PLoS Comp. Biol., 2014

$\Pr(\text{data} \mid \text{model}) = \sum_{\theta \in \Theta} \Pr(\text{data} \mid \text{model}, \theta)\,\Pr(\theta \mid \text{model})$

Codelength, Search Time: $\Pr(\text{data} \mid \text{model})$ = Model Evidence. Solway et al., PLoS Comp. Biol., 2014
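The model-evidence expression on this slide marginalizes the likelihood of the data over a model's internal parameters. Here is a small sketch of that sum for a discrete parameter set, assuming a toy coin-flip likelihood; the helper names, the two candidate models and the data are hypothetical and are not taken from Solway et al. The last helper applies the usual Shannon relation between probability and code length (negative log2 probability, in bits), which is one standard way to read the slide's "Codelength" label.

```python
import math


def model_evidence(likelihood, prior, thetas, data):
    """Pr(data | model) = sum over theta of Pr(data | model, theta) * Pr(theta | model)."""
    return sum(likelihood(data, th) * prior(th) for th in thetas)


def coin_likelihood(data, theta):
    """Probability of a specific heads/tails sequence given bias theta (toy assumption)."""
    heads, tails = data
    return theta ** heads * (1.0 - theta) ** tails


def codelength_bits(evidence):
    """Shannon code length, in bits, of data with the given probability."""
    return -math.log2(evidence)


data = (3, 1)  # 3 heads, 1 tail

# Model 1: the coin is fair (a single parameter value).
fair = model_evidence(coin_likelihood, lambda th: 1.0, [0.5], data)

# Model 2: bias is one of three values, equally likely a priori.
biases = [0.25, 0.5, 0.75]
flexible = model_evidence(coin_likelihood, lambda th: 1.0 / len(biases), biases, data)

print(fair, codelength_bits(fair))
print(flexible, codelength_bits(flexible))
```

The flexible model spreads its prior over settings that explain this data poorly, so its evidence is slightly lower than the fair model's even though one of its settings fits better. That is the sense in which model evidence penalizes unnecessary flexibility.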

Solway et al., PLoS Comp. Biol., 2014

[Figure: example networks: Zachary’s karate club, Santa Fe Institute collaborations, Lusseau’s bottlenose dolphins] Fortunato, Physics Reports, 2010

Simsek, Wolfe & Barto, 2005

Carlos Diuk, Debbie Yee. Solway et al., PLoS Comp. Biol., 2014

Carlos Diuk, Debbie Yee. [Figure, panel B: labels S, G and “Reject”; axis range 2200 to 2900] Solway et al., PLoS Comp. Biol., 2014

$\Pr(\text{data} \mid \text{model}) = \sum_{\theta \in \Theta} \Pr(\text{data} \mid \text{model}, \theta)\,\Pr(\theta \mid \text{model})$

Anna Schapiro. Schapiro et al., Nature Neurosci, 2013

 1.00   0.66  -0.36
 0.66   1.00  -0.36
-0.36  -0.36   1.00

Schapiro et al., Nature Neurosci, 2013

Schapiro et al., Nature Neurosci, 2013

[Figure: axis label: Time] Schapiro et al., Nature Neurosci, 2013

[Figure: probability of parse (y-axis, 0 to 0.4) for cluster transition parse vs. other parse; panels: Experiment 1, Experiment 2; conditions: all trials, Hamiltonian paths] Schapiro et al., Nature Neurosci, 2013

+ HC Schapiro et al., Nature Neurosci, 2013

Carlos Diuk. [Figure: Successor Representation correlation (scale 0.0 to 1.0) and pattern correlation (scale 0.18 to 0.38); label: + HC] Diuk et al., in prep.
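For readers unfamiliar with the successor representation named on this slide (cf. Dayan, 1993): for a fixed transition matrix T and discount γ, entry M[i][j] is the expected discounted number of future visits to state j when starting in state i, that is, M = Σ_t γ^t T^t = (I − γT)^(-1). Below is a minimal sketch using the truncated series in pure Python. The three-state cycle and all function names are illustrative assumptions and do not reproduce the analysis in Diuk et al.

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def successor_representation(T, gamma, tol=1e-12):
    """Approximate M = sum_t gamma^t T^t for 0 <= gamma < 1.

    T[i][j] is the probability of moving from state i to state j.
    The series is truncated once gamma^t falls to tol or below.
    """
    n = len(T)
    M = [[0.0] * n for _ in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    weight = 1.0                                                           # gamma^0
    while weight > tol:
        for i in range(n):
            for j in range(n):
                M[i][j] += weight * term[i][j]
        term = matmul(term, T)
        weight *= gamma
    return M


# Hypothetical example: deterministic cycle 0 -> 1 -> 2 -> 0.
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
M = successor_representation(T, gamma=0.5)
for row in M:
    print([round(x, 4) for x in row])
```

In this example, states that are reached sooner from a given start receive larger entries, so rows of M encode predictive (temporal) proximity rather than stimulus similarity.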

[Figure labels: Current Stimulus, Next Stimulus] Schapiro et al., 2013; Rogers & McClelland, 2003

Codelength, Search Time: $\Pr(\text{data} \mid \text{model})$ = Model Evidence. Solway et al., PLoS Comp. Biol., 2014

cf. Dayan, 1993

Rosvall & Bergstrom, PNAS, 2008

Mahadevan & Maggioni, 2005

Stachenfeld, Botvinick & Gershman, NIPS, 2014

Olshausen & Field, Nature, 1996

Botvinick & Plaut, Psych Review, 2004

Conclusions
• The scaling problem in RL
• Hierarchy can help
• Model-free versus model-based HRL
• HRL in the brain
• The need for good representations
• Task decomposition, bottlenecks, community detection
• Prospective coding and structure discovery
• Hierarchy as compression

Collaborators
• Andy Barto (UMass)
• Yael Niv (Princeton)
• Tim Rogers (Wisconsin)
• Nick Turk-Browne (Princeton)

Lab Contributors
• Carlos Diuk (Facebook)
• Jose Ribas-Fernandes (U. Victoria)
• Anna Schapiro
• Alec Solway (V. Tech / UCL)
• Kim Stachenfeld
• Ari Weinstein
• Debbie Yee (Wash. U.)
