Reinforcement Learning in Psychology and Neuroscience (PowerPoint PPT Presentation)

SLIDE 1

Reinforcement Learning in Psychology and Neuroscience

with thanks to Elliot Ludvig, Princeton University

SLIDE 2

Psychology has identified two primitive kinds of learning

  • Classical Conditioning
  • Operant Conditioning (a.k.a. Instrumental learning)
  • Computational theory:

❖ Classical = Prediction: what is going to happen?

❖ Operant = Control: what to do to maximize reward?
SLIDE 3

Classical Conditioning

SLIDE 4

Pavlov

  • Russian physiologist
  • Interested in how learning happened in the brain
  • Conditional and Unconditional Stimuli
SLIDE 5

Rescorla-Wagner Model (1972)

  • Computational model of conditioning

❖ Widely cited and used

  • Learning as violation of expectations

❖ TD learning as extension of RW
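Learning as violation of expectations can be made concrete with the Rescorla-Wagner update, in which every cue present on a trial changes its associative strength in proportion to the difference between the reward asymptote and the summed prediction of all present cues. A minimal sketch, using the standard textbook notation (alpha, beta, lambda) but with illustrative parameter values and variable names of my own choosing:

```python
def rescorla_wagner(present_cues, V, alpha=0.1, beta=1.0, lam=1.0):
    """One conditioning trial: update associative strengths V in place.

    present_cues: names of the cues present on this trial
    V: dict mapping cue name -> associative strength
    lam: asymptote of learning (1.0 when the US is delivered, 0.0 when omitted)
    """
    total = sum(V.get(cue, 0.0) for cue in present_cues)  # summed prediction
    error = lam - total                                   # violated expectation
    for cue in present_cues:
        V[cue] = V.get(cue, 0.0) + alpha * beta * error
    return V

# Acquisition: repeated light -> food pairings drive V(light) toward 1.
V = {}
for _ in range(50):
    rescorla_wagner(["light"], V)
# Blocking: the pretrained light leaves little error to support
# learning about the newly added tone.
for _ in range(50):
    rescorla_wagner(["light", "tone"], V)
```

The second loop reproduces Kamin blocking, the key phenomenon that motivated the model: because the light already predicts the reward, the error term is near zero and the tone acquires almost no strength.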

SLIDE 6

Operant Learning

  • Operant Conditioning is all about choice, in 3 main ways:

❖ Deciding which response to make

❖ Deciding how much to respond

❖ Deciding when to respond

SLIDE 7

Thorndike’s Puzzle Box

SLIDE 8

Operant Chambers

SLIDE 9

Complex Cognition

SLIDE 10

Marr’s 3 Levels of Analysis

  • Computational

❖ What function is being fulfilled?

  • Algorithmic

❖ How is it accomplished?

  • Implementational

❖ What physical substrate is involved?

SLIDE 11
The Basic TD Model

  • Learn to predict the discounted sum of upcoming reward through TD with linear function approximation:

Vt = wtᵀxt = Σi=1..n wt(i)·xt(i)

  • The TD error is calculated as:

δt = rt+1 + γVt+1 − Vt
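The two equations above can be sketched as a short TD(0) prediction routine: the value is the dot product wᵀx, and each step's TD error δ = r + γV' − V drives a gradient-style weight update. The one-hot stimulus representation and the learning parameters here are illustrative assumptions, not part of the slides:

```python
import numpy as np

def td_linear(episodes, n_features, alpha=0.1, gamma=0.9):
    """TD(0) with linear function approximation.

    episodes: list of episodes, each a list of (feature_vector, reward) pairs,
    where the reward is the one received on leaving that state.
    The value after the final step of an episode is taken to be 0.
    """
    w = np.zeros(n_features)
    for episode in episodes:
        for t, (x, r) in enumerate(episode):
            v_t = w @ x                                     # Vt = w^T xt
            v_next = w @ episode[t + 1][0] if t + 1 < len(episode) else 0.0
            delta = r + gamma * v_next - v_t                # TD error
            w += alpha * delta * x                          # weight update
    return w

# Two-step chain: a cue state leads to a delay state, then a reward of 1.
x0, x1 = np.eye(2)
episodes = [[(x0, 0.0), (x1, 1.0)]] * 200
w = td_linear(episodes, n_features=2)
# The learned values approach gamma (for the cue) and 1 (for the delay state).
```

With one-hot features this reduces to tabular TD(0), which makes the learned weights easy to check against the discounted-return definition.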
SLIDE 12
SLIDE 13

TD(λ) algorithm/model/neuron

[Diagram: states give rise to features xi; the value of the state/action is the weighted sum Σi wi·xi; reward and value combine to form the TD error δ; each weight wi carries an eligibility trace ei with decay parameter λ and is updated in proportion to the TD error: ẇi ∝ δ·ei.]
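The diagram's update rule, ẇi ∝ δ·ei, can be sketched as TD(λ) with accumulating eligibility traces: each feature's trace decays by γλ per step and is bumped when the feature is active, so recently active features receive credit for later TD errors. Feature encoding and hyperparameters here are illustrative assumptions:

```python
import numpy as np

def td_lambda(episodes, n_features, alpha=0.1, gamma=0.9, lam=0.8):
    """TD(lambda) prediction with linear function approximation."""
    w = np.zeros(n_features)
    for episode in episodes:
        e = np.zeros(n_features)            # eligibility traces, reset each episode
        for t, (x, r) in enumerate(episode):
            v_t = w @ x
            v_next = w @ episode[t + 1][0] if t + 1 < len(episode) else 0.0
            delta = r + gamma * v_next - v_t   # TD error
            e = gamma * lam * e + x            # decay traces, mark active features
            w += alpha * delta * e             # w-dot proportional to delta * e
    return w

# Three-state chain ending in a reward of 1; true values are
# gamma^2, gamma, and 1 for the three states.
x = np.eye(3)
episodes = [[(x[0], 0.0), (x[1], 0.0), (x[2], 1.0)]] * 200
w = td_lambda(episodes, n_features=3)
```

With λ = 0 this collapses to the one-step TD model of the earlier slide; larger λ lets a reward's TD error reach back to states visited several steps earlier in a single episode.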

SLIDE 14

Brain reward systems

Honeybee brain: the VUM neuron (Hammer & Menzel)

What signal does this neuron carry?

SLIDE 15

Dopamine

  • Small-molecule neurotransmitter

❖ Diffuse projections from the midbrain throughout the brain

[Figure from Pinel (2000), p. 364]

Key Idea: Phasic change in baseline dopamine responding = reward prediction error

SLIDE 16

Wolfram Schultz et al.

Dopamine neurons signal the error/change in prediction of reward

TD error

SLIDE 17

[Figure: three panels, each showing Reward, Value, and TD error traces, for "Reward Unexpected", "Reward Expected" (cue followed by predicted reward), and "Reward Absent".]

Representation-independent predictions of TD errors:

TD errort = δt = rt+1 + γVt+1 − Vt
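The three cases in the figure fall directly out of the TD error equation, which can be checked by training the basic TD model on a cue-then-reward sequence and then probing it. The time-step representation (one feature per post-cue step) and the trial counts are illustrative assumptions:

```python
import numpy as np

def train(n_trials=500, T=5, alpha=0.1, gamma=1.0):
    """Learn values for a cue at t=0 followed by reward at t=T-1."""
    w = np.zeros(T)                       # one weight per post-cue time step
    for _ in range(n_trials):
        for t in range(T):
            r = 1.0 if t == T - 1 else 0.0         # reward on the final step
            v_t = w[t]
            v_next = w[t + 1] if t + 1 < T else 0.0
            w[t] += alpha * (r + gamma * v_next - v_t)
    return w

def td_errors(w, reward_delivered, gamma=1.0, T=5):
    """TD errors on a probe trial, starting from an unpredicted cue onset."""
    deltas = [gamma * w[0]]               # cue onset: value jumps from 0
    for t in range(T):
        r = 1.0 if (t == T - 1 and reward_delivered) else 0.0
        v_next = w[t + 1] if t + 1 < T else 0.0
        deltas.append(r + gamma * v_next - w[t])
    return deltas

w = train()
naive    = td_errors(np.zeros(5), reward_delivered=True)   # Reward Unexpected
expected = td_errors(w, reward_delivered=True)             # Reward Expected
absent   = td_errors(w, reward_delivered=False)            # Reward Absent
```

After training, the TD error burst moves from the reward to the cue (`naive` peaks at the end, `expected` at the start), and omitting the predicted reward produces a negative error at the expected reward time, matching the dopamine pause Schultz observed.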

SLIDE 18

The theory that dopamine = TD error is one of the most important interactions ever between artificial intelligence and neuroscience.