
SLIDE 1

Neurobiological Foundations of Reward and Risk

... and corresponding risk prediction errors

Peter Bossaerts


The Dopaminergic System

★Go back to Class 2:
- Single-unit recording of dopamine neurons
- fMRI analysis of reward (and risk)

★Remark: fMRI focuses on the projection areas of dopamine neurons.


2. Reward Prediction Errors and TD (Temporal Difference) Models

Dopamine neurons do NOT signal expected rewards but reward prediction errors! (Expected rewards/values are encoded in the ventromedial prefrontal cortex, among other areas.)


SLIDE 3

Prediction Error Learning

Simple “Rescorla-Wagner” learning rule. Notice the relation between math and emotions!


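Rescorla-Wagner rule: $F_{t+1} = F_t + \alpha \eta_t$, where $F_t$ is the prediction at time $t$, $\alpha$ is the learning rate, and $\eta_t$ is the prediction error (outcome minus prediction).
Prediction error +: elation; −: disappointment. Signaled by DOPAMINE.
(Bossaerts @ Claremont Athenaeum)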
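A minimal Python sketch of the rule, assuming one outcome per trial and a prediction error equal to the outcome minus the current forecast. The function name and the 70% reward probability in the usage example are illustrative choices, not part of the slides.

```python
import random


def rescorla_wagner(outcomes, alpha=0.1, f0=0.0):
    """Apply F_{t+1} = F_t + alpha * eta_t to a sequence of outcomes.

    Returns the forecast held before each outcome and the prediction error
    eta_t = outcome - forecast (positive: "elation", negative: "disappointment").
    """
    f = f0
    forecasts, errors = [], []
    for r in outcomes:
        eta = r - f
        forecasts.append(f)
        errors.append(eta)
        f += alpha * eta
    return forecasts, errors


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical cue that is rewarded on about 70% of trials.
    outcomes = [1.0 if rng.random() < 0.7 else 0.0 for _ in range(500)]
    forecasts, errors = rescorla_wagner(outcomes, alpha=0.05)
    print(f"forecast after 500 trials: {forecasts[-1]:.2f}")  # fluctuates around 0.7
```

A small alpha averages over many trials; a large alpha tracks recent outcomes more closely but is noisier.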

TD Models: Learning (To Do) The Right Thing Through Reinforcement (Prediction Errors)

TD models can learn to assign value (discounted future rewards) to complex signals. They derive from dynamic programming...
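As an illustration, here is a tabular TD(0) learner on a hypothetical chain of states in which only the final transition is rewarded. Each state's value converges to the reward discounted once per remaining step. The chain, gamma (discount factor) and alpha (learning rate) are assumptions chosen for the sketch.

```python
def td0_chain(n_states=5, reward=1.0, gamma=0.9, alpha=0.1, episodes=2000):
    """Tabular TD(0) on a deterministic chain 0 -> 1 -> ... -> n_states-1 -> end.

    The reward is received on the transition out of the last state.
    """
    V = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            last = s == n_states - 1
            r = reward if last else 0.0
            v_next = 0.0 if last else V[s + 1]   # terminal state has value 0
            delta = r + gamma * v_next - V[s]    # TD prediction error
            V[s] += alpha * delta
    return V


if __name__ == "__main__":
    V = td0_chain()
    print([round(v, 3) for v in V])  # approximately gamma**4, gamma**3, ..., 1
```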


SLIDE 4

Dynamic Programming

- Value function V
- States S (transiting to S’)
- Actions to be taken... while learning the value function (converges IF the exploration is right; see Watkins-Dayan 1992)
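To make the exploration caveat concrete, here is a minimal tabular Q-learning sketch (Q-learning is the algorithm whose convergence Watkins-Dayan 1992 analyze) on a hypothetical five-state corridor with a reward at the right end. The epsilon-greedy rule keeps trying both actions, which is the kind of exploration such convergence results require. The corridor, parameters and seed are illustrative assumptions.

```python
import random


def q_learning(n_states=5, gamma=0.9, alpha=0.1, epsilon=0.2, episodes=500, seed=0):
    """Tabular Q-learning on a corridor 0..n_states-1; reward 1 on reaching the right end.

    Actions: 0 = step left (or stay at 0), 1 = step right.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy exploration; ties are broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # The goal is terminal, so its future value is 0.
            future = 0.0 if s_next == goal else max(Q[s_next])
            Q[s][a] += alpha * (r + gamma * future - Q[s][a])
            s = s_next
    return Q


if __name__ == "__main__":
    Q = q_learning()
    print(["right" if q[1] > q[0] else "left" for q in Q[:-1]])
```

With epsilon = 0 and no random tie-breaking, a learner can lock onto whichever action first looked good and never learn the values of the others.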


Dorsal vs Ventral Striatum

Top: ventral striatum (A: Pavlovian; B: instrumental conditioning). Bottom: caudate (dorsal striatum). (From O’Doherty)



SLIDE 5

3.1. Pharmacological Evidence

- L-Dopa (green): dopamine precursor (enhances dopamine function)
- Haloperidol (red): dopamine antagonist
- Placebo: grey
... in a two-armed bandit task (from Pessiglione)
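A minimal sketch of the kind of learner often fitted to two-armed bandit data: Rescorla-Wagner updates of each arm's value plus a softmax choice rule. The reward probabilities, alpha (learning rate), beta (choice sensitivity) and seed are illustrative assumptions. The sketch does not model the drug manipulation or reproduce Pessiglione's fitted model.

```python
import math
import random


def run_bandit(p_reward=(0.2, 0.8), alpha=0.2, beta=5.0, trials=300, seed=1):
    """Two-armed bandit learner: Rescorla-Wagner values plus softmax choice."""
    rng = random.Random(seed)
    q = [0.0, 0.0]   # learned value of each arm
    choices = []
    for _ in range(trials):
        # Softmax over two options reduces to a logistic in the value difference.
        p_arm1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        a = 1 if rng.random() < p_arm1 else 0
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (r - q[a])   # only the chosen arm is updated
        choices.append(a)
    return q, choices


if __name__ == "__main__":
    q, choices = run_bandit()
    print([round(v, 2) for v in q], sum(choices[-100:]) / 100)
```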


(Curious Difference In Loss Learning)


OPPONENT PROCESS THEORY

- For best control, let two opponent forces balance each other (thumb + index)
- (a) Reward prediction errors in striatum, GAIN and (2nd row) LOSS conditions
- (b) Punishment prediction errors in insula, LOSS trials only
- Notice: opponent process in loss trials
- (Not unlike pain avoidance/relief: Seymour ea 2005)


INSULA “MANIPULATION” AFFECTS LOSS LEARNING

- Same task, but now with insula lesion patients (unpublished, Pessiglione)


INSULA ACTIVATION IN LOSS AVOIDANCE TASK PREDICTS SUCCESS IN (LOSS) BANDIT PROBLEM 1 MONTH LATER

- Samanez-Larkin ea, Psych Reviews


[Figure, panels a and b: insula peak voxel, z = 4.71, p < .000005, prep = .999; avoidance learning (% correct) vs. insula percentage signal change in younger and older adults, r = .45, p < .05, prep = .897]
Fig. 1. Task structure for a representative trial.

SLIDE 6

Important Remark

The idea that gains and losses are valued separately (as in Prospect Theory or Disappointment Aversion) squares well with these neurobiological foundations. Of course, it is not (yet) clear where the brain sets the reference point! (What counts as a loss?)
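Because the slides leave the reference point open, the sketch below makes it an explicit parameter of a prospect-theory value function. The functional form and default parameters (alpha = beta = 0.88, loss aversion lam = 2.25) are the Tversky-Kahneman (1992) median estimates. The function name is hypothetical.

```python
def pt_value(x, ref=0.0, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value of outcome x, coded relative to reference point ref.

    Gains: (x - ref)**alpha; losses: -lam * (ref - x)**beta (loss aversion lam > 1).
    """
    d = x - ref
    if d >= 0:
        return d ** alpha
    return -lam * (-d) ** beta


if __name__ == "__main__":
    # The same outcome of 10 is a gain or a loss depending on the reference point.
    for ref in (0.0, 10.0, 15.0):
        print(ref, round(pt_value(10.0, ref=ref), 2))
```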


3.2. Direct Dopamine Measurement

Day ea, Nat Neuro 2007: dopamine release in the nucleus accumbens of rats correlates with reward-predictive cues. Notice the learning effect!
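In TD models, a learning effect of this kind appears as a prediction error that migrates from the reward to the predictive cue over training. Below is a minimal within-trial TD(0) sketch. It assumes that cue onset itself is unpredictable (pre-cue value 0), that the reward arrives a fixed number of steps after the cue, and that alpha and gamma take illustrative values. It is not fitted to the Day ea data.

```python
def td_cue_reward(n_trials=200, n_steps=5, gamma=1.0, alpha=0.2, reward=1.0):
    """Within-trial TD(0): cue at step 0, reward at the end of step n_steps-1.

    Returns the prediction error at cue onset and at reward delivery, per trial.
    """
    V = [0.0] * n_steps   # value of each post-cue time step
    cue_pe, reward_pe = [], []
    for _ in range(n_trials):
        # The cue arrives unpredictably, so the pre-cue baseline value is 0.
        cue_pe.append(gamma * V[0] - 0.0)
        for t in range(n_steps):
            last = t == n_steps - 1
            r = reward if last else 0.0
            v_next = 0.0 if last else V[t + 1]
            delta = r + gamma * v_next - V[t]
            if last:
                reward_pe.append(delta)
            V[t] += alpha * delta
    return cue_pe, reward_pe


if __name__ == "__main__":
    cue_pe, reward_pe = td_cue_reward()
    for trial in (0, 10, 50, 199):
        print(trial, round(cue_pe[trial], 2), round(reward_pe[trial], 2))
```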


SLIDE 7

4. Risk: Variance

See Class 2 slides. (Can one use brain signals to predict choice? Yes!)


Using brain signals to predict choice

- VSt (ventral striatum): correlates with expected reward
- ACC (anterior cingulate cortex): correlates with (objective) risk
- IFG (inferior frontal gyrus): correlates with (subjective) risk

(Christopoulos ea, J Neuroscience 2009)

(ACC and IFG have opposite effects: opponent theory in biology)

fMRI brain signals predict risk taking
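One hedged way to read "brain signals predict choice": if one signal tracks expected reward and another tracks risk, a mean-variance model can combine them into a choice probability. The sketch below is a generic mean-variance value with a logistic choice rule. The weights risk_aversion and beta are made up, and this is not the model estimated by Christopoulos ea.

```python
import math


def p_gamble(expected_reward, variance, sure_amount, risk_aversion=0.5, beta=2.0):
    """Probability of taking a gamble instead of a sure amount.

    Hypothetical mean-variance value (expected reward minus a risk penalty)
    passed through a logistic choice rule.
    """
    value = expected_reward - risk_aversion * variance
    return 1.0 / (1.0 + math.exp(-beta * (value - sure_amount)))


if __name__ == "__main__":
    # Same expected reward, increasing risk: a risk-averse chooser gambles less.
    for var in (0.0, 1.0, 4.0):
        print(var, round(p_gamble(1.0, var, sure_amount=0.5), 3))
```

A negative risk_aversion turns the same model into a risk-seeking chooser.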

Getting At Causality...


Disrupting the process using transcranial magnetic stimulation (TMS)

- Disrupting the inferior frontal gyrus leads to reduced risk aversion (Knoch ea, J Neuroscience 2006)

SLIDE 8

5. Risk Prediction Errors

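Risk prediction error = SIZE of the reward prediction error minus its EXPECTED SIZE (the variance): $\xi_t = \eta_t^2 - h_t$, where $\eta_t$ is the reward prediction error and $h_t$ is its expected squared size. This is the driving term of a GARCH process; e.g., in GARCH(1,1): $h_{t+1} = \omega + a\,\eta_t^2 + b\,h_t = \omega + (a+b)\,h_t + a\,\xi_t$.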
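A minimal sketch of learning risk with the same delta-rule form as the Rescorla-Wagner update. The forecast is driven by the reward prediction error, and the risk estimate h is driven by the risk prediction error. The separate learning rates alpha and alpha_h, and the coin-flip gamble in the usage example, are assumptions.

```python
import random


def track_risk(outcomes, alpha=0.05, alpha_h=0.05):
    """Learn expected reward f and risk h (expected squared prediction error)."""
    f, h = 0.0, 0.0
    for r in outcomes:
        eta = r - f          # reward prediction error
        xi = eta ** 2 - h    # risk prediction error
        f += alpha * eta
        h += alpha_h * xi
    return f, h


if __name__ == "__main__":
    rng = random.Random(0)
    coin = [rng.choice((-1.0, 1.0)) for _ in range(5000)]  # hypothetical +/-1 gamble, variance 1
    f, h = track_risk(coin)
    print(round(f, 2), round(h, 2))  # f near 0, h near 1
```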


6. The Norepinephrine Story

See pupil dilation slide, Class 2...


SLIDE 9

Risk Prediction Errors
(Sophisticated, and only 1 step ahead)

EXAMPLE: BET = “SECOND CARD LOWER”; CARD 1 = 3; CARD 2 = 2
t0: Prediction (forecast) = 0
t1: Forecast based on card 1 = −5/9
t2: Outcome = +1
(t0,t1) (forecast) risk ~0.6
(t1,t2) (outcome − forecast) risk ~0.7
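The numbers can be checked with exact fractions. This sketch makes four assumptions: the deck has ten cards numbered 1 to 10, and card 2 is drawn from the nine that remain (this reproduces the slide's −5/9); a win pays +1 and a loss −1; and a risk prediction error is the squared reward prediction error minus its expected value.

```python
from fractions import Fraction

# Assumption: ten cards numbered 1..10; card 2 is drawn from the nine that remain.
DECK = range(1, 11)


def win_prob(card1):
    """P(second card lower than card1)."""
    remaining = [c for c in DECK if c != card1]
    return Fraction(sum(1 for c in remaining if c < card1), len(remaining))


def forecast(card1):
    """Expected payoff of the bet (+1 if it wins, -1 if it loses)."""
    return 2 * win_prob(card1) - 1


def outcome_risk(card1):
    """Variance of the +/-1 payoff given card1, i.e. E[(outcome - forecast)^2]."""
    p = win_prob(card1)
    return 4 * p * (1 - p)


f0 = Fraction(0)          # t0: win and loss equally likely
f1 = forecast(3)          # t1: card 1 = 3
outcome = 1               # t2: card 2 = 2 < 3, so the bet wins
eta1, eta2 = f1 - f0, outcome - f1   # reward prediction errors at t1 and t2

# Expected squared forecast error at t0, averaged over all possible first cards.
forecast_risk = sum((forecast(c) - f0) ** 2 for c in DECK) / len(DECK)
# Outcome risk averaged over all possible first cards (as seen from t0).
expected_outcome_risk = sum(outcome_risk(c) for c in DECK) / len(DECK)

xi1 = eta1 ** 2 - forecast_risk      # risk prediction error at t1
xi2 = eta2 ** 2 - outcome_risk(3)    # risk prediction error at t2

if __name__ == "__main__":
    print(f1, eta2, float(outcome_risk(3)), float(expected_outcome_risk), xi1, xi2)
```

outcome_risk(3) = 56/81 ≈ 0.69 agrees with the slide's ~0.7 for (t1,t2). It is less clear which quantity the slide's ~0.6 for (t0,t1) denotes. The outcome risk averaged over first cards is 16/27 ≈ 0.59, whereas the expected squared forecast error is 11/27 ≈ 0.41.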

Relation With Tracking “Changes” In The Environment?


… AND NOREPINEPHRINE

- Enhancing NE levels induces rats to abandon old “hypotheses” and find the newly optimal paths in a navigation task
- (Or are they more sensitive to signals of UNEXPECTED UNCERTAINTY, i.e., that something changed? Yu-Dayan, Neuron 2005)


SLIDE 10

7. Higher Moments


Skewness

- Skewness = one-sided outliers
- Positive & negative skewness: anterior insula (again)
- Positive skewness: also ventral striatum (cf. expected reward)!

(Wu ea, PLoS ONE 2011)
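A minimal sketch of how the first three moments of a discrete gamble are computed, with skewness taken as the third standardized moment. The two example gambles (a rare large gain and a rare large loss) are hypothetical.

```python
def moments(outcomes, probs):
    """Mean, variance and skewness (third standardized moment) of a discrete gamble."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    skew = sum(p * (x - mean) ** 3 for x, p in zip(outcomes, probs)) / var ** 1.5
    return mean, var, skew


if __name__ == "__main__":
    lottery = moments([10.0, 0.0], [0.1, 0.9])     # rare large gain: positive skew
    insurance = moments([-10.0, 0.0], [0.1, 0.9])  # rare large loss: negative skew
    print(lottery, insurance)
```

Both gambles have the same variance (9). Only the side on which the outlier sits differs, and that is what the sign of the skewness captures.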

Variance-Skewness

See MEG results, Class 2


SLIDE 11

8. Correlation

The Task


Results...

(Wunderlich, Symmonds, ea Neuron 2011)
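The slides do not spell out the task here. As a reminder of why correlation matters for risk, the sketch below computes the standard deviation of a two-asset mix from the usual formula Var(w1 X1 + w2 X2) = w1² σ1² + w2² σ2² + 2 w1 w2 ρ σ1 σ2. The function name and the example weights are hypothetical.

```python
import math


def portfolio_sd(w1, sd1, sd2, rho):
    """Standard deviation of w1*X1 + (1 - w1)*X2 given SDs and correlation rho."""
    w2 = 1.0 - w1
    var = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2.0 * w1 * w2 * rho * sd1 * sd2
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding error


if __name__ == "__main__":
    # Equal split between two assets of equal risk: negative correlation hedges.
    for rho in (1.0, 0.0, -1.0):
        print(rho, round(portfolio_sd(0.5, 1.0, 1.0, rho), 3))
```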


SLIDE 12

9. Processing Time

Computing “value” takes time because all components (reward, risk, ...) have to be evaluated and then combined. What happens if we constrain subjects to decide within 1 s, 3 s (and 5 s)?


Results


Prob(gamble) as a function of:

                   1 s                                  5 s
Expected reward    increases                            increases
Variance           decreases                            insensitive
Skew               decreases (risk loving for losses)   decreases (more!)
PRICE              increases                            insensitive