Learning Flexible Goal-Directed Behavior
Christian Balkenius
Lund University Cognitive Science
Wednesday, April 18, 12
(1874-1949)
(1886-1959)
Mackintosh, 1983
Stimulus-Response Stimulus-Approach
Balkenius, Dacke, Balkenius, 2010
[Figure, panels A-D: turning velocity, lateral velocity, and forward velocity, each plotted as a function; lateral zone and frontal zone; labels V, O, M]
0, 50, 0° 60, 50, 0°
0, 0, 0° 60, 0, 0°
0, -50, 0° 60, -50, 0°
Balkenius, Robotics and Autonomous Systems 1998
Figure 3.2 Sensitivity to translation when all contribute to the average (plotted quantity: corr)
Stimulus-Approach
Balkenius & Kopp, 1997
F1 F2 F3
$S(x, y) = G(x, y) * \sum_{m} \theta_m F_m(x, y)$
Balkenius, Åström, Eriksson, 2004
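The formula on this slide combines the feature maps F1, F2, F3 into a single map S. A minimal sketch of that computation follows, assuming that G is a Gaussian smoothing kernel, that the θ values are scalar gains on each feature map, and that the operator between G and the sum is a 2-D convolution. The slide does not state any of these; the function names and the 3-sigma kernel radius are hypothetical choices made only for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel with a radius of 3 sigma, normalized to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def saliency(feature_maps, theta, sigma=1.0):
    """S = G * sum_m theta_m F_m, reading * as a 2-D convolution (assumption)."""
    feature_maps = np.asarray(feature_maps, dtype=float)  # shape (M, H, W)
    theta = np.asarray(theta, dtype=float)                # shape (M,)
    # Weighted sum of the feature maps F_m with gains theta_m.
    combined = np.tensordot(theta, feature_maps, axes=1)  # shape (H, W)
    # A 2-D Gaussian is separable: smooth along rows, then along columns.
    k = gaussian_kernel(sigma)
    smooth = lambda v: np.convolve(v, k, mode="same")
    rows = np.apply_along_axis(smooth, 1, combined)
    return np.apply_along_axis(smooth, 0, rows)

# Three toy feature maps; the third has zero gain and is ignored.
F = np.zeros((3, 16, 16))
F[0, 5, 7] = 1.0
F[1, 10, 3] = 1.0
F[2, 12, 12] = 1.0
S = saliency(F, theta=[2.0, 0.5, 0.0])
peak = tuple(int(i) for i in np.unravel_index(np.argmax(S), S.shape))
```

Here the most salient location is the one whose feature map has the largest gain, so changing θ shifts which feature wins without changing the maps themselves.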
Stimulus-Approach Stimulus-Response
[Diagram: a DELAY LINE of three delay elements supplies p(t), p(t-1), p(t-2), p(t-3) to a Linear Predictor, which outputs p(t+n)]
Prediction confidence sets the gain of the controller
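The delay-line predictor and the confidence-controlled gain can be sketched as below. This is an illustration under stated assumptions, not the model from the talk: the weights are trained with normalized LMS, confidence is taken as exp(-running mean absolute error), and the controller is a simple proportional one. The class and function names, the learning rate, and the error smoothing constant are all hypothetical.

```python
import math
from collections import deque
import numpy as np

class DelayLinePredictor:
    """Predicts p(t+n) as a linear function of p(t), p(t-1), ... from a delay line."""

    def __init__(self, taps=4, horizon=2, rate=0.5):
        assert horizon >= 1
        self.taps, self.horizon, self.rate = taps, horizon, rate
        self.w = np.zeros(taps)                    # predictor weights
        self.line = deque(maxlen=taps + horizon)   # the delay line
        self.error = 1.0                           # running mean |error|, starts pessimistic

    def step(self, p):
        """Feed the current sample p(t); return (prediction of p(t+n), confidence)."""
        self.line.append(p)
        samples = list(self.line)
        if len(samples) == self.taps + self.horizon:
            # Taps that were available n steps ago, newest first; their target is p(t).
            past = np.array(samples[:self.taps][::-1])
            e = p - self.w @ past
            # Normalized LMS update (assumed learning rule).
            self.w += self.rate * e * past / (past @ past + 1e-9)
            self.error = 0.9 * self.error + 0.1 * abs(e)
        if len(samples) < self.taps:
            return p, 0.0                          # not enough history yet
        recent = np.array(samples[-self.taps:][::-1])
        prediction = float(self.w @ recent)
        confidence = math.exp(-self.error)         # assumed confidence measure in (0, 1]
        return prediction, confidence

def pursuit_command(prediction, gaze, confidence):
    """Proportional controller whose gain is set by the prediction confidence."""
    return confidence * (prediction - gaze)

# Track a sinusoidal target; the prediction looks 2 steps ahead.
predictor = DelayLinePredictor(taps=4, horizon=2)
for t in range(1000):
    prediction, confidence = predictor.step(math.sin(0.3 * t))
command = pursuit_command(prediction, gaze=0.0, confidence=confidence)
```

While the predictor is still learning, the confidence is low and the controller barely moves; once the target motion becomes predictable, the gain rises and the command follows the predicted position.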
From catch-up saccades to smooth pursuit between 0 and 4 months of age
Data from von Hofsten & Rosander, 1997
Simulation from Balkenius & Johansson, Epigenetic Robotics, 2005
[Figure, panels A and B: tuning curves over the coded dimension; detector responses and population responses to stimuli S1 and S2; detectors D1, D2, D3]
the system learns associations between retinal positions and joint angles
not driven by error between hand position and target
population coding supports interpolation and some extrapolation
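The three points above can be illustrated with a small associative learner. This is a sketch under assumptions not stated on the slides: one retinal dimension, one joint angle, Gaussian tuning curves, and an association stored as the response-weighted average of the joint angles seen together with each detector's activity. No error between hand position and target is used at any point. The names, units, and the linear retina-to-angle relation in the example are hypothetical.

```python
import numpy as np

# Preferred retinal positions of the detectors (hypothetical units).
CENTERS = np.linspace(0.0, 10.0, 11)
SIGMA = 1.0

def population_response(x):
    """Responses of all detectors to a stimulus at retinal position x."""
    return np.exp(-(x - CENTERS) ** 2 / (2.0 * SIGMA ** 2))

def learn(samples):
    """Associate each detector with the joint angles observed while it was active."""
    num = np.zeros_like(CENTERS)
    den = np.zeros_like(CENTERS)
    for x, angle in samples:
        r = population_response(x)
        num += r * angle   # co-occurrence of detector activity and joint angle
        den += r
    return num / np.maximum(den, 1e-12)

def recall(weights, x):
    """Read out a joint angle as the response-weighted average of the associations."""
    r = population_response(x)
    return float(r @ weights / r.sum())

# Observe hand positions at even retinal positions only (angle = 2x + 1 here).
training = [(x, 2.0 * x + 1.0) for x in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0)]
weights = learn(training)
angle_at_5 = recall(weights, 5.0)   # a position never seen during learning
```

Because neighboring detectors overlap, a stimulus at an unseen position activates detectors trained on both sides of it, and the readout falls between their associated angles.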
0 min
500 ms sensory-motor delay
2 min
5 min
Stimulus-Approach Stimulus-Response
Contextual Inhibition
Ashe, et al. (1993). Exp Brain Res 95:118-130 (cropped excerpt; panels labeled movement 1 and movement 2)

[Plots: neuronal population vectors against TIME, with C and E marked on the time axis]

Neuronal population vectors are plotted every 10 ms vs time. C, onset of the delay; E, end of the waiting period. The filled circle on the abscissa indicates the time after the beginning of the delay (130 ms) at which the population vector reached statistical significance.

… delay; E, end of waiting period.

… the waiting period. The population vector reached statistical significance (P<0.05, modified Rayleigh test, see Materials and methods) 130 ms after the beginning of the …

… vector was pointing in the leftward direction, similar to the direction of the final part of the movement in the memorized trajectory.

Directional engagement of cells during the waiting period. The analyses above dealt with the neuronal population … folding during the waiting period can be gained by analyzing the directional properties of cells engaged during that period. Given that directionally tuned cells were preferentially engaged during the waiting period (Table 2; see above), we analyzed the distributions of the directional influences exerted by the cells that changed activity during each of the three 200-ms epochs of the waiting period (see Materials and methods): if the cell activity increased, the cell was taken to exert a unit-length directional influence in its preferred direction; if the activity decreased, the opposite direction was taken. Frequency distributions of these directions were then constructed and plotted. The following can be seen in Fig. 9. First, the directional influences of cells recruited in each of the three epochs are widely distributed. Second, the distribution of the directional influences of cells recruited during the first 200 ms of the waiting period is skewed towards a leftward direction; indeed, the mean direction (Mardia 1972) of this distribution is at 186.5° and it is statistically significant (mean resultant 0.379, n=22, P<0.05, Rayleigh test). Third, there is a clockwise shift in the directional influences of cells recruited during the second 200 ms of the waiting period; the mean direction is now at 116.8° (length of mean resultant 0.475, n=27, P<0.01, Rayleigh test). Finally, there is a further clockwise shift in the directional influences of cells recruited during the last 200 ms of the waiting period, but this is not statistically …

…tions of all these cells are combined to yield the neuronal population vector (see above); but this analysis showed (a) that the directional contributions by single cells were distributed and not restricted to a narrow set of directions and (b) that there was a shifting directional engagement of cells, from the leftward (←) to the upward direction (↑).

Location of recordings. The recording sites for both animals were in the crown and the exposed part of the precentral gyrus (Brodmann's area 4; Fig. 10).

Human studies

The mean (± SD) of the immediate premovement time in the memorized movement trials was 204 ± 38 ms. Con…

[Figure caption, panels A, B, C:] …tional influences of single cells during the first three successive 200-ms epochs of the waiting peri… C 400-600 ms. Bars are plotted in the middle of 10° directional bins. The length of a bar indicates the percentage of cells making directional contributions within a particular bin. The center circle represents 0 and the outer circle 5% change.
[Diagram labels: excitation, suppression, inhibition; GENERALIZATION, SPECIALIZATION; SPECIFIC]
[Architecture diagram. Modules: ACTOR, CRITIC, PUNISH, RL-CORE, SELECT, MERGE, INV, D, WORLD. Signals: actor target, critic target, potential actions, selected action, collision, location. Delays: Δ = 0, Δ = 1, Δ = 2]
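$Q(c, s, a_j) = \sum_{i=1}^{n} s_i w_{ij} I_{ij}, \qquad I_{ij} = \prod_{k=1}^{p} (1 - c_k u_{ijk}).$

$u_{ijk}^{(t+1)} = u_{ijk}^{(t)} - \beta \, (1 - u_{ijk}^{(t)}) \, \frac{s_i a_j c_k}{|s|} \, w_{ij} \, \Delta Q_t$ when $\Delta Q_t < 0$,

$w_{ij}^{(t+1)} = w_{ij}^{(t)} + \alpha \, \frac{s_i a_j}{|s|} \, \Delta Q_t$ when $\Delta Q_t > 0$.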
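The update rules on this slide can be sketched in a few lines. The sketch rests on one reading of the slide: excitatory weights w grow when ΔQ is positive, contextual inhibition u grows when ΔQ is negative, w multiplies ΔQ in the inhibition update, and |s| is the L1 norm of the state vector. The function names, the default values of α and β, and the convention that ΔQ is supplied by the caller are assumptions made for illustration.

```python
import numpy as np

def q_values(c, s, w, u):
    """Q(c, s, a_j) = sum_i s_i w_ij I_ij, with I_ij = prod_k (1 - c_k u_ijk)."""
    I = np.prod(1.0 - c[None, None, :] * u, axis=2)   # shape (n, m)
    return (s[:, None] * w * I).sum(axis=0)           # one value per action j

def update(c, s, a, w, u, dq, alpha=0.2, beta=0.2):
    """Apply one learning step and return the new (w, u)."""
    norm = np.abs(s).sum()          # |s|, taken as the L1 norm (assumption)
    sa = np.outer(s, a) / norm      # s_i a_j / |s|
    if dq > 0:
        # Better than expected: strengthen the context-independent weights.
        w = w + alpha * sa * dq
    elif dq < 0:
        # Worse than expected: increase inhibition in the current context only.
        u = u - beta * (1.0 - u) * sa[:, :, None] * c[None, None, :] * w[:, :, None] * dq
    return w, u

n_states, n_actions, n_contexts = 2, 2, 2
w = np.zeros((n_states, n_actions))
u = np.zeros((n_states, n_actions, n_contexts))
s = np.array([1.0, 0.0])        # current state
a = np.array([1.0, 0.0])        # selected action (one-hot)
c1 = np.array([1.0, 0.0])       # current context
c2 = np.array([0.0, 1.0])       # a different context

w, u = update(c1, s, a, w, u, dq=+1.0)   # positive surprise: w increases
w, u = update(c1, s, a, w, u, dq=-1.0)   # negative surprise: u increases in c1
q_in_c1 = q_values(c1, s, w, u)[0]
q_in_c2 = q_values(c2, s, w, u)[0]
```

After the negative surprise, the action's value is lowered only in the context where it occurred; in the other context the learned value is unchanged, which is the contextual specialization the talk describes.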
[Plots: Maze, Accumulated Extra Steps]
[Plots: Q-Learning with a regular linear network compared with Context-Q; series labeled 1st, 2nd, and 3rd; maze layouts marked S and G]
Winberg, 2005
‘stimulus generalization’
contextual specialization
progress separate from state that controls action
learns to avoid doing bad things