SLIDE 1

Tom Mitchell, October 2018

10703 Deep Reinforcement Learning

Tom Mitchell October 29, 2018

Reinforcement Learning in Humans and Animals

Reading: Barto & Sutton Chapter 15

SLIDE 2

Outline

  • RL in primates
  • RL in humans
  • Error signals and predictive coding

SLIDE 3

Reward based learning in primates

SLIDE 4

Dopamine As Reward Signal

[Schultz et al., Science, 1997]

SLIDE 6

Dopamine As Reward Signal

[Schultz et al., Science, 1997]

$\text{error} = r_t + \gamma V(s_{t+1}) - V(s_t)$
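The temporal-difference error on this slide is the reward received plus the discounted value of the next state, minus the value of the current state. A minimal sketch follows. The function name `td_error`, the discount `gamma=0.9`, and the numeric values are illustrative assumptions, not taken from the slides. The three cases (unpredicted reward, fully predicted reward, predicted reward omitted) are the ones commonly used to illustrate the Schultz et al. recordings.

```python
def td_error(reward, value_current, value_next, gamma=0.9):
    """TD error: r_t + gamma * V(s_{t+1}) - V(s_t).

    gamma=0.9 is an assumed discount factor, chosen only for illustration.
    """
    return reward + gamma * value_next - value_current


# Reward arrives but nothing was predicted: positive error.
unexpected = td_error(reward=1.0, value_current=0.0, value_next=0.0)

# Reward arrives exactly as predicted: zero error.
predicted = td_error(reward=1.0, value_current=1.0, value_next=0.0)

# Reward was predicted but does not arrive: negative error.
omitted = td_error(reward=0.0, value_current=1.0, value_next=0.0)

print(unexpected, predicted, omitted)
```

The sign of the error is what matters here: it is positive when the outcome is better than predicted and negative when it is worse.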

SLIDE 7

Reward based learning in humans

SLIDE 8

RL Models for Human Learning

[Seymour et al., Nature 2004]

SLIDE 9

[Seymour et al., Nature 2004]

SLIDE 10

One Theory of RL in the Brain

  • Basal ganglia monitor events, predict future rewards
  • When prediction is revised upward (downward), this causes an increase (decrease) in activity of midbrain dopaminergic neurons, influencing ACC
  • This dopamine-based activation somehow results in revising the reward prediction function, possibly through direct influence on basal ganglia, and via prefrontal cortex

from [Nieuwenhuis et al.]

SLIDE 13

Neuron Level Learning Mechanisms

  • Hebbian learning
      – fire together, wire together
  • Spike Timing Dependent Plasticity (STDP)
      – if incoming neuron fires before outgoing, then strengthen connection
      – if incoming neuron fires after outgoing, then weaken connection
  • Reward modulated STDP
      – less understood
      – in some neurons, it appears STDP occurs only if neuromodulator (e.g., dopamine) activity follows firing within a window of up to 10 sec
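The STDP and reward-modulated STDP rules above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a model from the lecture. The exponential timing kernel, the amplitudes `a_plus` and `a_minus`, the time constant `tau_ms`, and the hard 10-second gate are all hypothetical choices. Published models typically use a decaying eligibility trace rather than a hard cutoff.

```python
import math


def stdp_delta(dt_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Weight change for one spike pair, with dt_ms = t_outgoing - t_incoming.

    dt_ms > 0: incoming neuron fired first, so strengthen.
    dt_ms < 0: incoming neuron fired second, so weaken.
    Kernel shape and constants are assumed for illustration.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0


def reward_modulated_update(weight, dt_ms, dopamine_delay_s, window_s=10.0):
    """Apply the STDP change only if dopamine follows firing within window_s.

    dopamine_delay_s is the time from the spike pair to the dopamine signal;
    pass float("inf") when no dopamine arrives.
    """
    if 0.0 <= dopamine_delay_s <= window_s:
        return weight + stdp_delta(dt_ms)
    return weight


w = 0.5
w_rewarded = reward_modulated_update(w, dt_ms=10.0, dopamine_delay_s=2.0)
w_too_late = reward_modulated_update(w, dt_ms=10.0, dopamine_delay_s=15.0)
print(w_rewarded > w, w_too_late == w)
```

In this sketch the spike timing decides the direction of the change, while the dopamine signal decides whether the change happens at all.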

SLIDE 15

Summary: Temporal Difference ML Model Predicts Dopaminergic Neuron Activity during Learning

  • Evidence now of neural reward signals from
      – Direct neural recordings in monkeys
      – fMRI in humans (1 mm spatial resolution)
      – EEG in humans (1-10 msec temporal resolution)
  • Dopaminergic responses encode Temporal Difference error
  • Some differences, and efforts to refine the model
      – How/where is the value function encoded in the brain?
      – Study timing (e.g., do the basal ganglia learn faster than PFC?)
      – Role of prior knowledge, rehearsal of experience, multi-task learning?
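To make the claim that dopaminergic responses resemble a TD error more concrete, here is a small TD(0) simulation of a cue followed by a delayed reward. It is a sketch under assumptions that are not from the slides: one state per time step after cue onset, reward delivered at the last step, no discounting (`gamma=1.0`), and a pre-cue state whose value is held at 0 because the cue arrives at an unpredictable time. The names `run_trials`, `n_steps`, and `alpha` are hypothetical.

```python
def run_trials(n_trials, n_steps=5, gamma=1.0, alpha=0.1, reward=1.0):
    """Tabular TD(0) over repeated cue -> delay -> reward trials.

    Returns the learned values and, per trial, the TD error at cue onset
    and the TD error at reward delivery.
    """
    values = [0.0] * n_steps  # V for each time step after cue onset
    history = []
    for _ in range(n_trials):
        # Transition from the pre-cue state (value fixed at 0) into step 0.
        cue_error = gamma * values[0] - 0.0
        reward_error = 0.0
        for t in range(n_steps):
            is_last = t == n_steps - 1
            r = reward if is_last else 0.0
            v_next = 0.0 if is_last else values[t + 1]
            delta = r + gamma * v_next - values[t]
            values[t] += alpha * delta  # TD(0) value update
            if is_last:
                reward_error = delta
        history.append((cue_error, reward_error))
    return values, history


values, history = run_trials(500)
print("first trial (cue, reward):", history[0])
print("last trial  (cue, reward):", history[-1])
```

Early in training the error appears at the time of reward; after training it appears at cue onset and is close to zero at reward time, which is the qualitative pattern the TD account is meant to capture.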

SLIDE 16

Predictive Coding

SLIDE 17

[Rao & Ballard, Nature Neuroscience, 1999]

SLIDE 18

[Rao & Ballard, 1999]
