reinforcement learning in humans and animals nathaniel daw nyu - - PowerPoint PPT Presentation

SLIDE 1

reinforcement learning in humans and animals

nathaniel daw, nyu (neuroscience; psychology; neuroeconomics)
cognition centric roundtable, stevens, may 13 2011

SLIDE 2

collaborators

NYU: Aaron Bornstein, Sara Constantino, Nick Gustafson, Jian Li, Seth Madlon-Kay, Dylan Simon, Bijan Pesaran
Columbia: Daphna Shohamy, Elliott Wimmer
UCL: Peter Dayan, Ben Seymour, Ray Dolan
Berkeley: Bianca Wittmann
U Chicago: Jeff Beeler, Xiaoji Zhuang
Princeton: Yael Niv, Sam Gershman
Trinity: John O’Doherty
Tel Aviv: Tom Schonberg, Daphna Joel
Montreal: Aaron Courville
CMU: David Touretzky
Austin: Ross Otto

funding: NIMH, NIDA, NARSAD, McKnight Endowment, HFSP

SLIDE 3

question

longstanding question in psychology: what information is learned from reward

– law of effect (Thorndike): learn to repeat reinforced actions

  • dopamine

– cognitive maps (Tolman): learn “map” of task structure; evaluate new actions online

  • even rats can do this
SLIDE 4

new leverage on this problem

draw on computer science and economics for methods & frameworks

1. new computational & neural tools
   – examine learning via trial-by-trial adjustments in behavior and neural signals
2. new computational theories
   – algorithmic view
   – dopamine associated with “model-free” RL
   – “model-based” RL as an account of cognitive maps (Daw, Niv & Dayan 2005, 2006)

SLIDE 5

learned decision making in humans

[figure: bandit task; payoff probabilities (0.25 to 0.5) drifting slowly across trials (100 to 300)]

“bandit” tasks: Daw et al. 2006; Wittmann et al. 2008; Gershman et al. 2009; Schonberg et al. 2007, 2010; Gläscher et al. 2010; Li & Daw 2011

SLIDE 6

trial-by-trial analysis

experience (past choices & outcomes at trials t-1, t-2, t-3, t-4, …)
→ model (RL algorithm + probabilistic choice rule: experience → choices)
→ predicted values, prediction errors, etc.; predicted choice probabilities

fit to behavior: which model & parameters make observed choices most likely?
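The fitting procedure on this slide can be sketched in Python. Everything below is a hypothetical toy, not the analysis code from the talk: a two-armed bandit with made-up payoff probabilities, a simulated delta-rule/softmax learner with invented parameters, refit by maximum likelihood with scipy.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, choices, rewards, n_arms=2):
    """Negative log-likelihood of observed choices under a
    delta-rule (Q-learning) model with a softmax choice rule."""
    alpha, beta = params                                 # learning rate, inverse temperature
    Q = np.zeros(n_arms)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * Q) / np.exp(beta * Q).sum()    # softmax choice probabilities
        nll -= np.log(p[c])
        Q[c] += alpha * (r - Q[c])                       # prediction-error update
    return nll

# simulate a learner with hypothetical parameters on a 2-armed bandit
rng = np.random.default_rng(0)
true_alpha, true_beta = 0.3, 4.0
p_reward = [0.3, 0.7]                                    # made-up payoff probabilities
Q = np.zeros(2)
choices, rewards = [], []
for _ in range(200):
    p = np.exp(true_beta * Q) / np.exp(true_beta * Q).sum()
    c = int(rng.choice(2, p=p))
    r = float(rng.random() < p_reward[c])
    Q[c] += true_alpha * (r - Q[c])
    choices.append(c)
    rewards.append(r)

# which parameters make the observed choices most likely?
res = minimize(neg_log_lik, x0=[0.5, 1.0], args=(choices, rewards),
               bounds=[(1e-3, 1.0), (1e-3, 20.0)])
alpha_hat, beta_hat = res.x
```

Competing models (e.g. model-free vs. model-based learners) can then be compared by how likely they make the same observed choices.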
SLIDE 7

[figure]

SLIDE 8

[figure]
SLIDE 9

E[V(a)] = Σ_o P(o|a) V(o)

“model-free” vs. “model-based”
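The expectation on this slide can be written out directly. The actions, outcome states, transition probabilities, and values below are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
# model-based evaluation: E[V(a)] = sum over outcomes o of P(o|a) * V(o)
# transition model and state values are hypothetical placeholders
P = {
    "a1": {"sB": 0.7, "sC": 0.3},   # P(outcome state | action a1)
    "a2": {"sB": 0.3, "sC": 0.7},
}
V = {"sB": 0.6, "sC": 0.2}          # learned values of the outcome states

def expected_value(action):
    """Evaluate an action online from the learned 'map' of the task."""
    return sum(p * V[o] for o, p in P[action].items())

# arithmetic check: E[V(a1)] = 0.7*0.6 + 0.3*0.2 = 0.48
ev_a1 = expected_value("a1")
```

This is the Tolman-style computation: no cached action value is needed; the action is evaluated on the fly from the task structure.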

SLIDE 10

rat version

[figure (Holland, 2004): lever presses per minute (roughly 5 to 10) for valued vs. devalued outcomes, after moderate vs. extensive training]

two behavioral modes:
– devaluation-sensitive (“goal directed”)
– devaluation-insensitive (“habitual”)
neurally dissociable with lesions (Dickinson, Balleine, Killcross) → dual-systems view

(Balleine, Daw & O’Doherty, 2009)

SLIDE 11

task

[task diagram: two-stage choice task; transitions occur with probability 70%; outcome probabilities 26%, 57%, 41%, 28%, all slowly changing] (Daw, Gershman, Seymour, et al., Neuron 2011)

SLIDE 12

question

does choice behavior respect sequential structure?

SLIDE 13

idea

How does bottom-stage feedback affect top-stage choices?

Example: a rare (30%) transition at the top stage, followed by a win. Which top-stage action is now favored?

SLIDE 14

predictions

– direct reinforcement ignores the transition structure
– model-based planning respects the transition structure
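A one-trial-back caricature of the two predictions makes the sign pattern concrete. The base rate and effect size below are arbitrary illustrative numbers, not fitted values.

```python
def stay_prob(agent, rewarded, common, base=0.5, w=0.3):
    """Predicted probability of repeating the previous top-stage choice.
    'base' and 'w' are made-up illustrative numbers."""
    if agent == "model_free":
        # direct reinforcement: reward raises stay probability
        # regardless of how the outcome was reached
        effect = w if rewarded else -w
    elif agent == "model_based":
        # planning: a win after a RARE transition is credited via the
        # transition structure, so it favors the OTHER top-stage action
        sign = 1 if common else -1
        effect = sign * (w if rewarded else -w)
    else:
        raise ValueError(agent)
    return base + effect

# model-free: no reward x transition interaction
# model-based: rewarded-rare trials predict switching (stay prob below base)
```

The diagnostic signature is the interaction: the model-free learner stays after any win, while the model-based learner switches after a rare-transition win.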

SLIDE 15

data

[figure: stay probabilities under reinforcement vs. planning predictions]

17 subjects × 201 trials each

reward: p < 1e-8; reward × rare: p < 5e-5 (mixed-effects logit)

→ results reject pure reinforcement models
→ suggest a mixture of planning and reinforcement processes

(Daw, Gershman, Seymour, et al Neuron 2011)

SLIDE 16

dual task (Otto, Gershman, Markman)

[figure: single-task vs. dual-task conditions]

dual × reward: p < 5e-7; dual × reward × rare: p < .05

SLIDE 17

neural analysis

behavior incorporates model knowledge: not just TD
want to ask the same question neurally: can we dissociate multiple neural systems underlying behavior?

  • in particular, can we show subcortical systems are “dumb”?
SLIDE 18

dopamine & RL

(Schultz et al. 1997) (Daw et al. 2011)
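The Schultz et al. finding is usually summarized as: phasic dopamine resembles the temporal-difference prediction error, δ = r + γV(s′) − V(s). A minimal sketch of that quantity (the γ value is arbitrary):

```python
def td_error(r, v_next, v_curr, gamma=0.95):
    """TD prediction error: delta = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_curr

# unexpected reward -> positive error (a dopamine burst, on this account)
assert td_error(r=1.0, v_next=0.0, v_curr=0.0) == 1.0
# fully predicted reward -> zero error (no burst)
assert td_error(r=1.0, v_next=0.0, v_curr=1.0) == 0.0
```

In a model-free learner this same δ drives the value update; the question on the following slides is whether the measured striatal signal is really this quantity or one informed by the task model.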

SLIDE 19

fMRI analysis

TD error + β · (change due to forward planning) = net signal

hypothesis: striatal “error” signals are solely reinforcement driven

  • 1. generate candidate error signals assuming TD
  • 2. an additional regressor captures how this signal would be changed for errors computed relative to values from planning

→ estimate β (β = 0 if the hypothesis holds)
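The two-regressor logic can be sketched with synthetic data. The per-trial error series, the noise levels, and the true β below are all invented for illustration; in the study these roles are played by model-derived regressors and the BOLD signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 201

# hypothetical per-trial error signals (stand-ins, not real data):
delta_td = rng.normal(size=n_trials)                          # errors assuming pure TD
delta_mb = delta_td + rng.normal(scale=0.5, size=n_trials)    # planning-based errors

# regressors: candidate TD error, plus how the signal would change
# if errors were computed relative to values from planning
x1 = delta_td
x2 = delta_mb - delta_td

# synthetic "net signal" generated with a known mixing weight
beta_true = 0.6
signal = x1 + beta_true * x2 + rng.normal(scale=0.1, size=n_trials)

# estimate beta by least squares; beta = 0 would mean purely TD-driven
X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, signal, rcond=None)             # coef[1] estimates beta
```

With this construction, a nonzero estimate on the second regressor is evidence that the signal incorporates model knowledge beyond TD.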

SLIDE 20

fMRI analysis

TD error + β · (change due to forward planning) = net signal

→ contrary to theories: even striatal error signals incorporate knowledge of task structure

(p < .05, cluster-corrected) (Daw, Gershman, Seymour, et al., Neuron 2011)

SLIDE 21

variation across subjects

subjects differ in degree of model usage

TD error + β · (change due to planning) = net signal

compare behavioral & neural estimates of β

SLIDE 22

variation across subjects

subjects differ in degree of model usage

TD error + β · (change due to planning) = net signal

(p < .05, SVC)

SLIDE 23

average signal

right NAcc (nucleus accumbens), start of trial:

  • interaction not significant
  • but size of interaction covaries with behavioral model usage

(p=.02)

SLIDE 24

thoughts

can distinguish multiple learned representations in humans

  • neurally more intertwined than expected

related areas:
– self control (drugs, dieting, savings, etc.)
– learning in multiplayer interactions (games)
  • equilibrium vs. equilibration
  • do we learn about actions or about opponents?

SLIDE 25

p-beauty contest

  • fast equilibration with repeated play, even though most subjects are never reinforced

(Singaporean undergrads; Ho et al. 1998)

SLIDE 26

RPS (rock-paper-scissors)

  • do subjects learn by reinforcement?
  • best respond to reinforcement?
  • best respond to that?

(Hampton et al, 2008)

SLIDE 27

conclusions

  • 0. use of computational models to quantify phenomena & distinctions for neural study
  • 1. can leverage this to distinguish different sorts of learning, trial-by-trial
    – beginning to map neural substrates
  • 2. implications for self control, economic interactions