Reinforcement Learning: From neural processes modelling to Robotics applications
Mehdi Khamassi (CNRS, ISIR-UPMC, Paris)
30 January 2015 Michèle Sebag’s course @ Univ. Orsay
[Figure: Doya, 2000.]
TDRL Model
Sutton & Barto (1998), Reinforcement Learning: An Introduction.
The Actor learns to select actions that maximize reward. The Critic learns to predict reward (its value V). A reward prediction error constitutes the reinforcement signal.
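A minimal tabular sketch of this Actor-Critic loop (Python; the sizes, the softmax temperature β and all names are illustrative assumptions, not the slides' implementation):

```python
import numpy as np

n_states, n_actions = 5, 4
alpha, gamma = 0.1, 0.9                  # learning rate, discount factor
V = np.zeros(n_states)                   # Critic: reward predictions V(s)
H = np.zeros((n_states, n_actions))      # Actor: action preferences

def softmax_policy(s, beta=2.0):
    # The Actor selects actions stochastically from its preferences
    p = np.exp(beta * H[s])
    p /= p.sum()
    return np.random.choice(n_actions, p=p)

def td_step(s, a, r, s_next):
    # Reward prediction error = the reinforcement signal
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta                # Critic learns to predict reward
    H[s, a] += alpha * delta             # Actor reinforces better-than-expected actions
    return delta
```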
[Figure: a chain of states 1-5; actions move the agent along the chain toward a final reward, which generates a reinforcement signal.]
Value estimation (“reward prediction”): Rescorla and Wagner (1972).
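As a sketch, the Rescorla-Wagner delta rule nudges the prediction toward the delivered reward on every trial (numbers assumed for illustration):

```python
alpha = 0.1          # learning rate
V = 0.0              # reward prediction for the stimulus
for trial in range(50):
    r = 1.0                    # reward actually delivered on this trial
    V += alpha * (r - V)       # the prediction error (r - V) drives learning
print(round(V, 3))   # approaches 1.0: the reward becomes fully predicted
```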
Value estimation (“reward prediction”) over sequences of states: Sutton and Barto (1998).
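In standard notation, consistent with the worked example that follows (α the learning rate, γ the discount factor), the temporal-difference update is:

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t), \qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```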
Worked example with learning rate α = 0.9 and discount factor γ = 0.9:
Before the reward is first reached, nothing changes:
δ = 0 + 0.9 × 0 − 0 = 0;   V ← 0 + 0.9 × 0 = 0
On first reaching the reward:
δ = 1 + 0 − 0 = 1;   V ← 0 + 0.9 × 1 = 0.9
[Figure: grid world; color indicates value.]
On the next episode, the value propagates one state back:
δ = 0 + 0.9 × 0.9 − 0 = 0.81;   V ← 0 + 0.9 × 0.81 ≈ 0.72
Closer to the reward, the prediction sharpens:
δ = 1 + 0 − 0.9 = 0.1;   V ← 0.9 + 0.9 × 0.1 = 0.99
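A short simulation reproducing these numbers, assuming a 5-state chain with the reward at the far end (the chain layout is my reading of the figure, not stated explicitly in the slides):

```python
import numpy as np

alpha, gamma = 0.9, 0.9
V = np.zeros(6)                       # V[1..4] are learned; V[5] is terminal, stays 0
for episode in range(3):
    for s in range(1, 5):             # deterministic moves s -> s+1 along the chain
        r = 1.0 if s + 1 == 5 else 0.0
        delta = r + gamma * V[s + 1] - V[s]
        V[s] += alpha * delta
print(np.round(V[1:5], 3))
# episode 1: only V(s4) changes (delta = 1 at the reward), giving V(s4) = 0.9
# episode 2: V(s3) = 0.729 (the slides' ~0.72) and V(s4) = 0.99
```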
In practice the learning rate is usually small for stability, e.g. α = 0.1 (discount factor γ = 0.9).
[Figure: response of a dopaminergic neuron.]
Q-table of action values:

state | a1: North | a2: South | a3: East | a4: West
s1    | 0.92      | 0.10      | 0.35     | 0.05
s2    | 0.25      | 0.52      | 0.43     | 0.37
s3    | 0.78      | 0.90      | 1.00     | 0.81
s4    | 0.00      | 1.00      | 0.90     | 0.90
…     | …         | …         | …        | …
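Action selection over such a Q-table is typically done by softmax; a minimal sketch (the inverse temperature β is an assumed value):

```python
import numpy as np

Q = np.array([[0.92, 0.10, 0.35, 0.05],    # s1
              [0.25, 0.52, 0.43, 0.37],    # s2
              [0.78, 0.90, 1.00, 0.81],    # s3
              [0.00, 1.00, 0.90, 0.90]])   # s4; columns: North, South, East, West

def choose_action(s, beta=3.0):
    # Higher beta -> greedier choices; lower beta -> more exploration
    p = np.exp(beta * Q[s])
    p /= p.sum()
    return np.random.choice(Q.shape[1], p=p)
```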
ACTOR-CRITIC vs SARSA vs Q-LEARNING:
- Actor-Critic: state-dependent reward prediction error (independent of the action), also used to update the Actor.
- SARSA: reward prediction error dependent on the action chosen to be performed next (on-policy).
- Q-learning: reward prediction error dependent on the best action (off-policy).
The corresponding TD errors are sketched below.
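The three prediction errors side by side, as a sketch (tabular V and Q assumed):

```python
import numpy as np

gamma = 0.9
V = np.zeros(5)          # Critic's state values
Q = np.zeros((5, 4))     # state-action values

def delta_actor_critic(s, r, s_next):
    # State-dependent RPE, independent of the action; also trains the Actor
    return r + gamma * V[s_next] - V[s]

def delta_sarsa(s, a, r, s_next, a_next):
    # Depends on the action actually chosen next (on-policy)
    return r + gamma * Q[s_next, a_next] - Q[s, a]

def delta_q_learning(s, a, r, s_next):
    # Depends on the best available action (off-policy)
    return r + gamma * Q[s_next].max() - Q[s, a]
```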
Dopamine
Taken from Bernard Balleine’s lecture at Okinawa Computational Neuroscience Course (2005).
Analogy with dopaminergic neurons’ activity (Schultz et al., 1993; Houk et al., 1995; Schultz et al., 1997):
[Figure: dopamine firing aligned to stimulus (S) and reward (R): before learning the neuron fires at the unexpected reward (+1); after learning the response transfers to the predictive stimulus; when a predicted reward is omitted, firing dips at the expected reward time, just like the TD reinforcement signal.]
TD models of dopamine: Barto (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002). See Joel et al. (2002) for a review.
[Figure: model of the dopaminergic neuron, after Houk et al. (1995).]
Stimulus representation, also called a tapped-delay line or temporal-order input, e.g. [0 0 1 0 0 0 0]. Montague et al. (1996); Suri & Schultz (2001); Daw (2003); Bertin et al. (2007).
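A sketch of TD learning over such a tapped-delay-line representation, after Montague et al. (1996); cue and reward timings are assumed for illustration:

```python
import numpy as np

T = 7                        # delay taps; x(t) = [0 0 1 0 0 0 0] two steps after the cue
w = np.zeros(T)              # value weights: V(t) = w . x(t)
alpha, gamma = 0.1, 1.0

def x(t):
    v = np.zeros(T)
    if 0 <= t < T:
        v[t] = 1.0           # one unit per elapsed time step since cue onset
    return v

for trial in range(300):     # cue at t = 0, reward at t = 5
    for t in range(T - 1):
        r = 1.0 if t + 1 == 5 else 0.0
        delta = r + gamma * w @ x(t + 1) - w @ x(t)   # TD error ~ dopamine signal
        w += alpha * delta * x(t)
# After learning, w[0..4] -> 1: the reward is predicted from the cue onward,
# the within-trial TD error vanishes, and the response transfers to cue onset.
```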
Niv et al. (2006), commentary about the results presented in Morris et al. (2006) Nat Neurosci.
TD Applications

Model-based analysis of brain data. Sequence of observed trials: Left (Reward); Left (Nothing); Right (Nothing); Left (Reward); …
An RL model is run on the same trial sequence; its trial-by-trial prediction error is then correlated with brain responses recorded in the fMRI scanner.
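A sketch of this pipeline: run a simple RL model over the observed choices and rewards and keep the trial-by-trial prediction error as a parametric regressor (the learning rate is an assumed value):

```python
alpha = 0.2
trials = [("Left", 1.0), ("Left", 0.0), ("Right", 0.0), ("Left", 1.0)]  # observed sequence
Q = {"Left": 0.0, "Right": 0.0}
rpe = []                              # prediction-error regressor for the fMRI analysis
for choice, reward in trials:
    delta = reward - Q[choice]        # reward prediction error on this trial
    rpe.append(delta)
    Q[choice] += alpha * delta
# rpe is then convolved with the hemodynamic response and regressed against BOLD
```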
Fitting TD-learning models to the behavior of the animal: Bellot, Sigaud & Khamassi (2012), SAB conference. [Figure: model comparison showing low vs high fitting error.]
Khamassi et al. (2013) Prog Brain Res; Khamassi et al. (in revision)
Multiple regression analysis with bootstrap, using the model variables Q (action value), δ (prediction error), and β* (exploration level) as regressors.
[Figure: neural-network architecture mapping sensory input to actions 1-5, trained by reward.] Khamassi et al. (2005) Adaptive Behavior; Khamassi et al. (2006) Lecture Notes in Computer Science.
A multi-module Actor-Critic neural network, coordinated by a self-organizing map.
Compared coordination schemes: hand-tuned, autonomous, random.
Autonomous coordination, two methods:
- test each module’s capacity for state prediction;
- within a particular subpart of the maze, make only the module with the most accurate reward prediction responsible for learning in that task subset (Baldassarre, 2002; Doya et al., 2002).
(Average performance during the second half of the experiment): 94; 3,500; 404; 30,000.
Model-based RL
Recall the model-free baseline: learning rate α = 0.1, discount factor γ = 0.9.
Method in Artificial Intelligence: off-line Dyna-Q learning (Sutton & Barto, 1998).
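A minimal Dyna-Q sketch (names and the number of planning steps are assumptions): the agent learns a world model from real transitions, then replays remembered transitions off-line to propagate values faster:

```python
import random

alpha, gamma, n_planning = 0.1, 0.9, 10
Q = {}        # (state, action) -> value
model = {}    # (state, action) -> (reward, next_state): the learned world model

def q(s, a):
    return Q.get((s, a), 0.0)

def dyna_q_step(s, a, r, s_next, actions):
    # 1) direct Q-learning from the real experience
    Q[(s, a)] = q(s, a) + alpha * (r + gamma * max(q(s_next, b) for b in actions) - q(s, a))
    # 2) update the world model
    model[(s, a)] = (r, s_next)
    # 3) off-line planning: replay random remembered transitions
    for _ in range(n_planning):
        (ps, pa), (pr, pn) = random.choice(list(model.items()))
        Q[(ps, pa)] = q(ps, pa) + alpha * (pr + gamma * max(q(pn, b) for b in actions) - q(ps, pa))
```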
[Figure: planning through the world model; candidate branches evaluated as maxQ = 0.3, maxQ = 0.9, maxQ = 0.7.]
Nakazawa, McHugh, Wilson, Tonegawa (2004) Nature Reviews Neuroscience
“Ripple” events: irregular, high-frequency oscillations in the hippocampus during slow-wave sleep and quiet rest.
Girardeau G, Benchenane K, Wiener SI, Buzsáki G, Zugaro MB (2009) Nature Neuroscience.
Johnson & Redish (2007) J Neurosci
Koos, Mouret & Doncieux (2012) IEEE Trans Evolutionary Comput.
Koos, Cully & Mouret (2013) Int J Robot Res.
Outline: 1. Model-based RL; 2. Meta-Learning.
Daw, Niv & Dayan (2005, Nat Neurosci): a model-based system and a model-free system in a Skinner box (instrumental conditioning).
Reward devaluation experiments (Yin et al., 2004; Balleine, 2005; Yin & Knowlton, 2006): after moderate training, behaviour is goal-directed and adapts when the reward is devalued; after overtraining, it becomes habitual and persists.
After a change in the reward (R), the goal-directed (model-based) system is fast to update, while the habitual (model-free) system is slow to update. Control switches with experience [reducing the computational load] (Daw et al., 2005, Nat Neurosci).
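A toy sketch of this uncertainty-based arbitration; the uncertainty estimates themselves are assumed to be computed elsewhere (e.g. from Bayesian value posteriors, as in Daw et al., 2005):

```python
def arbitrate(value_mb, uncertainty_mb, value_mf, uncertainty_mf):
    # Early in training the model-free estimate is highly uncertain, so the
    # goal-directed (model-based) system controls behaviour; with experience
    # the model-free uncertainty drops and control becomes habitual,
    # which also reduces the computational cost of planning.
    if uncertainty_mb < uncertainty_mf:
        return value_mb    # goal-directed control
    return value_mf        # habitual control
```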
Khamassi & Humphries (2012) Frontiers in Behavioral Neuroscience
Model-based navigation vs model-free navigation (Benoît Girard, 2010 UPMC lecture).
Martinet et al. (2011) model applied to the Tolman maze
Devan and White (1999); Packard and Knowlton (2002). [Figure: maze rotated 180° (compass N, S, E, W); behaviour of lesioned rats, including rats with a lesion of the dorsal striatum, relative to the previous platform location.]
Combining a model-free system (basal ganglia) with a model-based system (hippocampal place cells). Work by Laurent Dollé: Dollé et al. (2008, 2010, submitted).
Model: Dollé et al. (2010); task of Pearce et al. (1998).
Caluwaerts et al. (2012) Bioinspiration & Biomimetics.
Task: clean the table.
Current state: an a priori given action plan (right image).
Goal: autonomous learning by the robot.
Meta-Learning

Flagel et al. (2011). “A selective role for dopamine in stimulus-reward learning”. Nature, 469:53-57.
Fast Scan Cyclic Voltammetry (FSCV) in the ventral striatum.
Systemic injection of flupentixol prior to each session.
Lesaint, Sigaud, Flagel, Robinson, Khamassi (2014) PLOS Computational Biology.
McClure et al. (2003); Humphries et al. (2012); Schultz et al. (1997).
Outline: 1. Model-based RL; 2. Meta-Learning.
Doya (2002): meta-learning of the reinforcement learning parameters, e.g. the learning rate α, the exploration parameter (inverse temperature) β, and the discount factor γ.
[Figure: performance around a condition change.]
Khamassi et al. (2011) Front in Neurorobotics; Khamassi et al. (2013) Prog Brain Res
Reproduction of the global properties of monkey behaviour (Khamassi et al., 2011, Frontiers in Neurorobotics).
Multiple regression analysis with bootstrap, using the model variables Q (action value), δ (prediction error), and β* (exploration level) as regressors.
Khamassi et al. (2013) Prog Brain Res; Khamassi et al. (2014) Cerebral Cortex
[Figure: human-robot game setup: go signal, error and reward feedback, wooden board, choice, human’s hands, cheating trials.]
Meta-learning rule: meta-value(i) ← meta-value(i) + α′ · Δ[average reward], with exploration triggered when the meta-value crosses a threshold.
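A sketch of this rule (α′ and the threshold are assumed values; "module" stands for whatever strategy the meta-value evaluates):

```python
alpha_meta = 0.1         # meta learning rate (alpha')
threshold = 0.0
meta_value = [0.0, 0.0]  # one meta-value per module/strategy

def update_meta(i, delta_average_reward):
    # The meta-value tracks changes in the average reward yielded by module i
    meta_value[i] += alpha_meta * delta_average_reward
    # A drop below threshold (e.g. after a condition change) triggers exploration
    return meta_value[i] < threshold
```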
- ACC is in an appropriate position to evaluate feedback.
- ACC-LPFC interactions could regulate exploration.
- Such modulation could be subserved via …
- Such a pluridisciplinary approach can contribute both to neuroscience and to robotics.
Can meta-learning principles be useful for the design of autonomous robots?
Recurrent neural networks applied to robotics: Mayer et al. (IROS 2006).
- RL with self-modifying policies (actions that can edit the policy itself).
- Success-story criterion (a time-varying set V of past self-modifications, each kept only as long as reward intake keeps accelerating; Schmidhuber et al., 1997).
Learning maps of task-relevant motor behaviors under constraints. How can these primitive constrained motor behaviors be reused for more complex tasks? Stollenga et al. (IROS 2013).
Conclusions:
- Dopamine neurons encode a reward prediction error.
- Model-based analysis is now central in the neuroscience of decision-making.
- Reinforcement Learning models need to be refined to account for: multiple parallel decision systems; off-line learning during sleep; meta-learning (ACC-DLPFC interactions).
- These model improvements can produce testable predictions.
The Reinforcement Learning framework provides tools both for modelling neural processes and for building adaptive robots. It can also help explain neural activity in the brain. Such a pluridisciplinary approach can contribute both to neuroscience and to robotics.
Acknowledgements: Nassim Aklil, Jean Bellot, Ken Caluwaerts, Raja Chatila, Laurent Dollé, Benoît Girard, Florian Lesaint, Olivier Sigaud, Guillaume Viejo; Rachid Alami, Aurélie Clodic; Mark D. Humphries.
Funding: FP6 IST 027189 European project; Learning under Uncertainty Project; ROBOERGOSUM Project; HABOT Project (Emergence(s) Program).