
SLIDE 1

Tom Mitchell, October 2018

10703 Deep Reinforcement Learning

Tom Mitchell October 29, 2018

Reinforcement Learning in Humans and Animals

Reading: Sutton & Barto, Chapter 15

SLIDE 2


Outline

RL in primates
RL in humans
Error signals and predictive coding

SLIDE 3


Reward-based learning in primates

SLIDE 4


Dopamine As Reward Signal

[Schultz et al., Science, 1997]


SLIDE 5


Dopamine As Reward Signal

[Schultz et al., Science, 1997]


SLIDE 6


Dopamine As Reward Signal

[Schultz et al., Science, 1997]


$\text{error} = r_t + \gamma V(s_{t+1}) - V(s_t)$
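A minimal sketch of how this error reproduces the pattern reported by Schultz et al. (1997): an error at reward time before learning, an error at cue onset after learning, and a negative error at the expected reward time when the reward is omitted. It assumes a simple tapped-delay-line state (state k = k time steps since the cue), a reward of 1 on the last step, and illustrative settings alpha = 0.1, gamma = 1; none of these details come from the slide, and the function name simulate_td is hypothetical.

```python
def simulate_td(n_trials=300, n_steps=5, alpha=0.1, gamma=1.0,
                omit_last_reward=False):
    """TD(0) on a cue -> reward trial with a tapped-delay-line state.

    State k means "k time steps since cue onset"; a reward of 1 arrives on the
    transition out of the last state. Returns (errors, V) for the final trial:
    errors[0] is the TD error at cue onset, errors[k + 1] the error on the
    transition out of state k.
    """
    V = [0.0] * (n_steps + 1)        # V[n_steps] = end of trial, fixed at 0
    errors = [0.0] * (n_steps + 1)
    for trial in range(n_trials):
        omit = omit_last_reward and trial == n_trials - 1
        # Assumption: the cue comes at an unpredictable time, so the pre-cue
        # background state predicts nothing (value 0, never learned).
        errors[0] = gamma * V[0] - 0.0
        for k in range(n_steps):
            r = 1.0 if (k == n_steps - 1 and not omit) else 0.0
            delta = r + gamma * V[k + 1] - V[k]  # error = r_t + gamma*V(s_t+1) - V(s_t)
            V[k] += alpha * delta                # revise the reward prediction
            errors[k + 1] = delta
    return errors, V


first, _ = simulate_td(n_trials=1)               # before learning
trained, _ = simulate_td()                       # after learning
omitted, _ = simulate_td(omit_last_reward=True)  # expected reward withheld
for name, errs in [("first", first), ("trained", trained), ("omitted", omitted)]:
    print(name, [round(e, 2) for e in errs])
```

With these settings the error sits at reward time on the first trial, moves to cue onset once the predictions are learned, and becomes negative at the expected reward time when the reward is withheld. The tapped delay line is only one of several state representations used in TD models of dopamine; it is chosen here because it is the simplest that gives each post-cue time step its own prediction.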

SLIDE 7


Reward-based learning in humans

SLIDE 8


RL Models for Human Learning

[Seymour et al., Nature 2004]

SLIDE 9


[Seymour et al., Nature 2004]

SLIDE 10


One Theory of RL in the Brain

Basal ganglia monitor events and predict future rewards.
When the prediction is revised upward (downward), this causes an increase (decrease) in the activity of midbrain dopaminergic neurons, influencing the anterior cingulate cortex (ACC).
This dopamine-based activation somehow results in revising the reward prediction function, possibly through direct influence on the basal ganglia and via the prefrontal cortex.

from [Nieuwenhuis et al.]
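To make the "revised upward (downward)" step concrete, here is a worked case under the assumption that the revision is measured by the TD error from the dopamine slides, error = r_t + γV(s_{t+1}) − V(s_t), at a moment when no reward is delivered (r_t = 0). The last line, with step size α, is the standard TD(0) update, shown only as one possible form of "revising the reward prediction function"; the slide does not specify the rule.

```latex
\begin{aligned}
r_t = 0 &\;\Rightarrow\; \text{error} = \gamma V(s_{t+1}) - V(s_t) \\
\gamma V(s_{t+1}) > V(s_t) &\;\Rightarrow\; \text{error} > 0 \quad \text{(prediction revised upward: dopamine increase)} \\
\gamma V(s_{t+1}) < V(s_t) &\;\Rightarrow\; \text{error} < 0 \quad \text{(prediction revised downward: dopamine decrease)} \\
V(s_t) &\;\leftarrow\; V(s_t) + \alpha \, \text{error}
\end{aligned}
```

The point of the first three lines is that, under this model, the dopamine signal tracks changes in predicted reward rather than reward itself, so a cue that raises the prediction produces a positive error even though nothing rewarding has happened yet.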

SLIDE 11


SLIDE 12


SLIDE 13


Neuron-Level Learning Mechanisms

Hebbian learning
– fire together, wire together

Spike Timing Dependent Plasticity (STDP)
– if the incoming (presynaptic) neuron fires before the outgoing (postsynaptic) neuron, strengthen the connection
– if the incoming neuron fires after the outgoing neuron, weaken the connection

Reward-modulated STDP
– less well understood
– in some neurons, STDP appears to occur only if neuromodulator (e.g., dopamine) activity follows the firing within up to 10 sec
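A minimal sketch of the three rules above, under assumed parameters that are not from the slides: a rate-based Hebbian product, the commonly used pair-based exponential STDP window (the amplitudes and 20 ms time constants are illustrative), and reward-modulated STDP modeled with an exponentially decaying eligibility trace. The trace time constant tau_e_s = 2 s is an assumption chosen so that less than 1% of the trace remains after 10 s. All function names are hypothetical.

```python
import math


def hebbian(pre_rate: float, post_rate: float, eta: float = 0.01) -> float:
    """'Fire together, wire together': weight change proportional to joint activity."""
    return eta * pre_rate * post_rate


def stdp(dt_ms: float, a_plus: float = 0.01, a_minus: float = 0.012,
         tau_plus_ms: float = 20.0, tau_minus_ms: float = 20.0) -> float:
    """Pair-based STDP window; dt_ms = t_post - t_pre in milliseconds.

    dt_ms > 0 (incoming neuron fires first) -> potentiation.
    dt_ms < 0 (incoming neuron fires after) -> depression.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0


def rstdp(dt_ms: float, dopamine_delay_s: float, dopamine: float = 1.0,
          tau_e_s: float = 2.0) -> float:
    """Reward-modulated STDP (assumed eligibility-trace form).

    The STDP term is stored in a trace that decays with time constant tau_e_s.
    It becomes a weight change only when dopamine arrives, scaled by how much
    of the trace is left at that moment.
    """
    if dopamine_delay_s < 0:
        return 0.0  # this sketch only handles dopamine that follows the spike pair
    eligibility = stdp(dt_ms) * math.exp(-dopamine_delay_s / tau_e_s)
    return dopamine * eligibility
```

For example, rstdp(10, 0.5) is a small potentiation, rstdp(10, 10.0) is under 1% of stdp(10), and rstdp(10, 0.5, dopamine=0.0) is zero: without the neuromodulator, the spike pairing leaves no lasting change. A smoothly decaying trace is used instead of a hard 10 s cutoff because it gives a graded dependence on delay; a hard gate would be an equally simple alternative reading of the slide.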

SLIDE 14


SLIDE 15


Summary: Temporal Difference ML Model Predicts Dopaminergic Neuron Activity during Learning

Evidence now exists of neural reward signals from:
– direct neural recordings in monkeys
– fMRI in humans (1 mm spatial resolution)
– EEG in humans (1-10 msec temporal resolution)

Dopaminergic responses encode the Temporal Difference error.
There are some differences, and efforts to refine the model:
– How and where is the value function encoded in the brain?
– Study timing (e.g., do the basal ganglia learn faster than the PFC?)
– What is the role of prior knowledge, rehearsal of experience, and multi-task learning?

SLIDE 16


Predictive Coding

SLIDE 17


[Rao & Ballard, Nature Neuroscience, 1999]

SLIDE 18


[Rao & Ballard, 1999]

SLIDE 19


SLIDE 20
