Reinforcement Learning in Psychology and Neuroscience
with thanks to Elliot Ludvig University of Warwick
[Diagram: Bidirectional Influences. Reinforcement Learning linked with Artificial Intelligence, Psychology, Control Theory, and Neuroscience.]
David Marr, 1972
– What is being computed?
– Why are these the right things to compute?
– How are these things computed?
– How is this implemented physically?
processes, analogous to our prediction and control.
completely different fields: psychology and neuroscience
features of biological learning
learning)
❖ Classical = Prediction
❖ Operant = Control
❖ Classical Conditioning concerns
❖ Contiguity = Co-occurrence
❖ Sufficient for association?
❖ Conditioned Taste Aversion
❖ Blocking
❖ Contingency Experiments
Phase 1: light comes to cause salivation. Phase 2: will sound come to cause salivation? No.
Learning about the sound in Phase 2 does not occur because it is blocked by the association formed in Phase 1.
❖ Widely cited and used
❖ As in linear supervised learning (LMS, p2)
❖ TD learning is a real-time extension
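The blocking result and the link to linear supervised learning can be illustrated with a minimal sketch of an error-driven LMS (delta-rule) update. The function names, the two-element stimulus encoding (light, sound), the reward of 1.0, the step size, and the trial counts are all illustrative assumptions, not values from the slides.

```python
import numpy as np

def lms_update(w, x, reward, alpha=0.1):
    """One LMS / delta-rule step: w <- w + alpha * (reward - w.x) * x."""
    error = reward - np.dot(w, x)  # prediction error on this trial
    return w + alpha * error * x

# Hypothetical stimulus encodings: index 0 = light, index 1 = sound.
LIGHT_ONLY = np.array([1.0, 0.0])
LIGHT_AND_SOUND = np.array([1.0, 1.0])

def run_blocking(n_trials=200, alpha=0.1):
    """Phase 1: light -> reward. Phase 2: light + sound -> reward."""
    w = np.zeros(2)
    for _ in range(n_trials):  # Phase 1: light alone predicts reward
        w = lms_update(w, LIGHT_ONLY, 1.0, alpha)
    for _ in range(n_trials):  # Phase 2: compound stimulus, same reward
        w = lms_update(w, LIGHT_AND_SOUND, 1.0, alpha)
    return w

def run_control(n_trials=200, alpha=0.1):
    """Control group: compound training only, with no Phase 1."""
    w = np.zeros(2)
    for _ in range(n_trials):
        w = lms_update(w, LIGHT_AND_SOUND, 1.0, alpha)
    return w

if __name__ == "__main__":
    print("blocking group weights [light, sound]:", run_blocking())
    print("control group weights  [light, sound]:", run_control())
```

In this sketch the sound weight stays near zero in the blocking group because the light already predicts the reward, so the error that drives learning in Phase 2 is close to zero. In the control group the two stimuli share the prediction.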
David Marr, 1972
– What is being computed?
– Why are these the right things to compute?
– How are these things computed?
– How is this implemented physically?
$\delta_t \doteq R_{t+1} + \gamma \hat{v}(S_{t+1}, \theta) - \hat{v}(S_t, \theta)$
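[Figure: TD model schematic. States map to features $x_i$; the value of a state is $\sum_i w_i \cdot x_i$; reward and value combine into the TD error $\delta$; weights change as $\Delta w_i \sim \delta \cdot e_i$, where $e_i$ is an eligibility trace with parameter $\lambda$.]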
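The TD error and the weight, feature, and eligibility-trace labels above can be put together in a minimal sketch of linear TD(λ) with accumulating traces. The weights are written `w` here, as in the diagram (the equation writes them as θ). The function name, the array layout, and the default values of `alpha`, `gamma`, and `lam` are assumptions made for illustration.

```python
import numpy as np

def td_lambda_episode(features, rewards, w, alpha=0.1, gamma=1.0, lam=0.9):
    """Run one episode of linear TD(lambda) with accumulating traces.

    features: array of shape (T + 1, n); row t is the feature vector x(S_t).
              The final (terminal) row should be all zeros.
    rewards:  array of shape (T,); rewards[t] holds R_{t+1}.
    w:        weight vector of shape (n,); the value estimate is w . x.
    Returns the updated weights and the TD error at each step.
    """
    e = np.zeros_like(w)  # eligibility trace, one entry per feature
    deltas = np.zeros(len(rewards))
    for t in range(len(rewards)):
        x, x_next = features[t], features[t + 1]
        # TD error: delta_t = R_{t+1} + gamma * v(S_{t+1}) - v(S_t)
        delta = rewards[t] + gamma * (w @ x_next) - (w @ x)
        e = gamma * lam * e + x      # decay old traces, add current features
        w = w + alpha * delta * e    # weight change proportional to delta * e
        deltas[t] = delta
    return w, deltas

if __name__ == "__main__":
    # Hypothetical 3-state chain with one-hot features and a reward at the end.
    feats = np.vstack([np.eye(3), np.zeros((1, 3))])
    rewards = np.array([0.0, 0.0, 1.0])
    w = np.zeros(3)
    for _ in range(500):
        w, deltas = td_lambda_episode(feats, rewards, w)
    print("learned values:", w)
    print("final TD errors:", deltas)
```

Setting `lam=0.0` reduces the trace to the current feature vector, which gives the one-step TD update.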
Hammer, Menzel
Honeybee Brain VUM Neuron
What signal does this neuron carry?
❖ Diffuse projections from mid-brain
[Figure labels: Old, Current, New; Error Calculation; Dopamine]
Schultz et al. (1997); Montague et al. (1996)
Wolfram Schultz et al.
Dopamine neurons signal the error/change in prediction of reward
[Figure: three panels, Reward Unexpected, Reward Expected, and Reward Absent, each showing the value and the TD error around the cue and the reward.]
$\delta_t = R_{t+1} + \gamma \hat{v}_{t+1} - \hat{v}_t$
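The three cases (reward unexpected, reward expected, reward absent) can be reproduced qualitatively with a small TD(0) simulation of this error. This is a sketch under stated assumptions: the trial timing, the one-feature-per-time-step representation that starts at cue onset, the step size, and all names are hypothetical choices, not details taken from the slides or the cited papers.

```python
import numpy as np

# Hypothetical trial timing, in discrete steps.
T = 10          # steps per trial
CUE = 3         # step at which the cue appears
REWARD_T = 7    # the reward R_{t+1} arrives on the transition out of this step

def features(t):
    """One-hot feature for each step from cue onset up to the reward step.

    Steps before the cue and after the reward have all-zero features,
    so their predicted value is always zero.
    """
    x = np.zeros(REWARD_T - CUE + 1)
    if CUE <= t <= REWARD_T:
        x[t - CUE] = 1.0
    return x

def run_trial(w, reward_present, alpha=0.2, gamma=1.0, learn=True):
    """Run one trial and return (weights, TD error at each step)."""
    w = w.copy()
    deltas = np.zeros(T)
    for t in range(T):
        r = 1.0 if (reward_present and t == REWARD_T) else 0.0
        x, x_next = features(t), features(t + 1)
        # delta_t = R_{t+1} + gamma * v_{t+1} - v_t
        deltas[t] = r + gamma * (w @ x_next) - (w @ x)
        if learn:
            w = w + alpha * deltas[t] * x
    return w, deltas

def simulate(n_training_trials=500):
    """Return the TD errors for the three cases shown in the figure."""
    naive = np.zeros(REWARD_T - CUE + 1)
    _, unexpected = run_trial(naive, reward_present=True, learn=False)
    trained = naive
    for _ in range(n_training_trials):  # cue is always followed by reward
        trained, _ = run_trial(trained, reward_present=True)
    _, expected = run_trial(trained, reward_present=True, learn=False)
    _, absent = run_trial(trained, reward_present=False, learn=False)
    return {"unexpected": unexpected, "expected": expected, "absent": absent}

if __name__ == "__main__":
    for name, deltas in simulate().items():
        print(name, np.round(deltas, 2))
```

In this sketch the untrained model shows a positive error at the reward. After training, the positive error appears on the transition into the cue and the error at the reward is near zero. When the reward is then omitted, the error at the expected reward time is negative.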
processes, analogous to our prediction and control.
completely different fields: psychology and neuroscience
features of biological learning
– includes learning and planning
– makes plain why value functions are so important
– makes plain why so many fields care about these algorithms
– all involve decision, goals, and time...