
SLIDE 1

Learning Flexible Goal-Directed Behavior

Christian Balkenius

Lund University Cognitive Science

Wednesday, April 18, 12

SLIDE 2


SLIDE 3


SLIDE 4


SLIDE 5


SLIDE 6

Stimulus-Approach Stimulus-Response Contextual Inhibition


SLIDE 7

E. L. Thorndike

(1874-1949)

E. C. Tolman

(1886-1959)


SLIDE 8

E. L. Thorndike

(1874-1949)

E. C. Tolman

(1886-1959)

Reactive Behavior


SLIDE 9

E. L. Thorndike

(1874-1949)

E. C. Tolman

(1886-1959)

Reactive Behavior Purposive Behavior


SLIDE 10

A A

Mackintosh, 1983


SLIDE 11

A A

Mackintosh, 1983


SLIDE 12

A A

Stimulus-Response Stimulus-Approach

Mackintosh, 1983


SLIDE 13


SLIDE 14

B

Balkenius, Dacke, Balkenius, 2010

Figure panels A-D (labels V, O, M; lateral zone and frontal zone marked):

Turning velocity as function of angle to target

Lateral velocity as function of angle to target

Forward velocity as function of distance to target

Forward velocity as function of angle to target

SLIDE 15

60, 50, 0°

0, 50, 0° 60, 50, 0°

60, 0, -0°

0, 0, 0° 60, 0, 0°

60,-50, 0°

0, -50, 0° 60, -50, 0°

Balkenius, Robotics and Autonomous Systems 1998


SLIDE 16

60, 50, 0°

0, 50, 0° 60, 50, 0°

60, 0, -0°

0, 0, 0° 60, 0, 0°

60,-50, 0°

0, -50, 0° 60, -50, 0°

Balkenius, Robotics and Autonomous Systems 1998

Figure 3.2 Sensitivity to translation when all contribute to the average (axis label: corr)

SLIDE 17

60, 50, 0°

0, 50, 0° 60, 50, 0°

60, 0, -0°

0, 0, 0° 60, 0, 0°

60,-50, 0°

0, -50, 0° 60, -50, 0°

Balkenius, Robotics and Autonomous Systems 1998

Figure 3.2 Sensitivity to translation when all contribute to the average (axis label: corr)

SLIDE 18

60, 50, 0°

0, 50, 0° 60, 50, 0°

60, 0, -0°

0, 0, 0° 60, 0, 0°

60,-50, 0°

0, -50, 0° 60, -50, 0°

Balkenius, Robotics and Autonomous Systems 1998

Figure 3.2 Sensitivity to translation when all contribute to the average value (axis label: corr)

SLIDE 19

3 2 1

Stimulus-Approach


SLIDE 20

3 2 1

Stimulus-Approach


SLIDE 21

3 2 1

Stimulus-Approach


SLIDE 22

3 2 1

Stimulus-Approach


SLIDE 23

3 2 1

Stimulus-Approach


SLIDE 24

3 2 1

Stimulus-Approach


SLIDE 25

3 2 1

Stimulus-Approach


SLIDE 26

eye regions

Switching Control

Balkenius & Kopp, 1997


SLIDE 27

eye regions

Switching Control

Orienting

Balkenius & Kopp, 1997

SLIDE 28

eye regions

Switching Control

Orienting

saccades

Balkenius & Kopp, 1997

SLIDE 29

eye regions

Switching Control

Orienting

saccades fixation / smooth pursuit

Balkenius & Kopp, 1997

SLIDE 30


SLIDE 31


SLIDE 32

Smooth Pursuit Eye-Movements


SLIDE 33

Balkenius & Johansson, 2005

SLIDE 34

F0

F1 F2 F3

Balkenius, Åström, Eriksson, 2004

Orienting


SLIDE 35

F0

F1 F2 F3

$$S(x, y) = G(x, y) * \sum_m \theta_m F_m(x, y)$$

Balkenius, Åström, Eriksson, 2004

Orienting
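
The formula on this slide can be read as a saliency map: a weighted sum of feature maps F_m, smoothed by a kernel G. The sketch below is one way to compute it, assuming that G is a Gaussian, that the operator is a convolution with zero padding, and that orienting targets the maximum of S. The function names and parameters (`radius`, `sigma`) are illustrative and do not come from the slides.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Convolve each row with the kernel; pixels outside the image count as zero."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = x + k - radius
                if 0 <= xx < width:
                    acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def saliency(feature_maps, thetas, radius=2, sigma=1.0):
    """S = G * sum_m theta_m F_m, with G applied as a separable Gaussian."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    combined = [[sum(t * f[y][x] for t, f in zip(thetas, feature_maps))
                 for x in range(width)] for y in range(height)]
    kernel = gaussian_kernel(radius, sigma)
    blurred = convolve_rows(combined, kernel)            # smooth along x
    return transpose(convolve_rows(transpose(blurred), kernel))  # then along y


def most_salient(s_map):
    """Return (x, y) of the saliency maximum (assumed orienting target)."""
    best = max((v, x, y) for y, row in enumerate(s_map) for x, v in enumerate(row))
    return best[1], best[2]
```

Changing the weights theta_m shifts which feature map wins the competition, so the same image can yield different orienting targets.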


SLIDE 36

Balkenius, 2000

Saccades stimulus-response

SLIDE 37

Balkenius, 2000

Saccades stimulus-response

SLIDE 38

3 2 1 3 2 1

Stimulus-Approach


SLIDE 39

3 2 1 3 2 1

Stimulus-Approach


SLIDE 40

3 2 1 3 2 1

Stimulus-Approach


SLIDE 41

3 2 1 3 2 1

Stimulus-Approach


SLIDE 42

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response


SLIDE 43

Balkenius & Johansson, 2005

Smooth Pursuit stimulus-approach

SLIDE 44

Balkenius & Johansson, 2005

Smooth Pursuit stimulus-approach

delay delay delay

DELAY LINE

p(t) p(t+n) p(t-1) p(t-2) p(t-3)

Linear Predictor


SLIDE 45

Balkenius & Johansson, 2005

Smooth Pursuit stimulus-approach

delay delay delay

DELAY LINE

p(t) p(t+n) p(t-1) p(t-2) p(t-3)

Linear Predictor

Prediction confidence sets the gain of the controller
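
The delay line and linear predictor on this slide can be sketched as follows. The slide only names the components, so the details here are assumptions: the weights are trained with normalized LMS, confidence is derived from a running average of the absolute prediction error, and the controller is a simple proportional one. The names `DelayLinePredictor` and `pursuit_command` are hypothetical.

```python
from collections import deque


class DelayLinePredictor:
    """Predict p(t+n) as a weighted sum of p(t), p(t-1), ... from a delay line."""

    def __init__(self, taps=4, horizon=2, rate=0.5):
        self.taps = taps
        self.horizon = horizon      # n, the number of steps to predict ahead
        self.rate = rate            # normalized LMS step size (assumed rule)
        self.weights = [0.0] * taps
        # Most recent taps + horizon samples, newest first.
        self.history = deque(maxlen=taps + horizon)
        self.error = 1.0            # running average of absolute prediction error

    def step(self, p):
        """Feed the current position p(t); return (predicted p(t+n), confidence)."""
        self.history.appendleft(p)
        h = list(self.history)
        if len(h) == self.taps + self.horizon:
            # The delay-line contents from n steps ago should have predicted p(t).
            past = h[self.horizon:]
            err = p - sum(w * x for w, x in zip(self.weights, past))
            norm = sum(x * x for x in past) + 1e-9
            self.weights = [w + self.rate * err * x / norm
                            for w, x in zip(self.weights, past)]
            self.error = 0.9 * self.error + 0.1 * abs(err)
        # Pad with zeros until the delay line has filled up.
        current = h[:self.taps] + [0.0] * (self.taps - len(h))
        prediction = sum(w * x for w, x in zip(self.weights, current))
        confidence = 1.0 / (1.0 + self.error)   # assumed mapping from error to confidence
        return prediction, confidence


def pursuit_command(prediction, eye_position, confidence):
    """Proportional controller whose gain is the prediction confidence."""
    return confidence * (prediction - eye_position)
```

With low confidence the command stays small, so the eye falls behind and a catch-up saccade would be needed; as the predictor improves, the gain rises and pursuit becomes smooth.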


SLIDE 46

The Development of Smooth Pursuit

Gradual development

from catch-up saccades to smooth pursuit from 0 to 4 months

Data from von Hofsten & Rosander, 1997

Simulation from Balkenius & Johansson, Epigenetic Robotics, 2005

SLIDE 47

The Development of Smooth Pursuit

Gradual development

from catch-up saccades to smooth pursuit from 0 to 4 months

Data from von Hofsten & Rosander, 1997

Simulation from Balkenius & Johansson, Epigenetic Robotics, 2005

SLIDE 48

Learning to Reach

[Figure, panels A and B: detector response plotted against the coded dimension, with a single tuning curve and a set of tuning curves for detectors D1-D3, and the detector and population responses to stimuli S1 and S2]

SLIDE 49

Learning to Reach

the system learns associations between retinal positions and joint angles

[Figure, panels A and B: detector response plotted against the coded dimension, with a single tuning curve and a set of tuning curves for detectors D1-D3, and the detector and population responses to stimuli S1 and S2]

SLIDE 50

Learning to Reach

the system learns associations between retinal positions and joint angles

not driven by error between hand position and target

[Figure, panels A and B: detector response plotted against the coded dimension, with a single tuning curve and a set of tuning curves for detectors D1-D3, and the detector and population responses to stimuli S1 and S2]

SLIDE 51

Learning to Reach

the system learns associations between retinal positions and joint angles

not driven by error between hand position and target

population coding supports interpolation and some extrapolation

[Figure, panels A and B: detector response plotted against the coded dimension, with a single tuning curve and a set of tuning curves for detectors D1-D3, and the detector and population responses to stimuli S1 and S2]
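
A minimal sketch of how population coding can interpolate between learned reaches, under stated assumptions: a one-dimensional retinal position, Gaussian tuning curves, and an activity-weighted average as the association rule. None of these choices is given on the slides, and the class name `PopulationMap` is hypothetical. Learning uses only co-occurring retinal positions and joint angles, not a hand-target error.

```python
import math


class PopulationMap:
    """Associate retinal positions with joint angles through Gaussian-tuned detectors."""

    def __init__(self, centers, width):
        self.centers = list(centers)    # preferred position of each detector
        self.width = width              # width of every tuning curve
        self.weighted_sum = [0.0] * len(self.centers)  # activity-weighted sum of angles
        self.activity_sum = [0.0] * len(self.centers)  # accumulated activity per detector

    def responses(self, position):
        """Population response: one Gaussian-tuned activity per detector."""
        return [math.exp(-((position - c) ** 2) / (2.0 * self.width ** 2))
                for c in self.centers]

    def observe(self, position, joint_angle):
        """Store an association; no error between hand and target is computed."""
        for i, r in enumerate(self.responses(position)):
            self.weighted_sum[i] += r * joint_angle
            self.activity_sum[i] += r

    def recall(self, position):
        """Decode a joint angle as the response-weighted mean of stored angles."""
        num = 0.0
        den = 0.0
        for r, ws, a in zip(self.responses(position), self.weighted_sum, self.activity_sum):
            if a > 0.0:
                num += r * ws / a
                den += r
        return num / den
```

Training on only two positions, for example `observe(0.0, 0.0)` and `observe(1.0, 1.0)`, still yields graded answers in between, because neighboring detectors share activity.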


SLIDE 52

Learning to Reach

sensory prediction + inverse model


SLIDE 53

Learning to Reach


SLIDE 54

Learning to Reach

Ongoing interaction

SLIDE 55

0 min

Fishing

500 ms sensory-motor delay


SLIDE 56

Fishing

2 min


SLIDE 57

Fishing

5 min


SLIDE 58

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2 1


SLIDE 59

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2 1


SLIDE 60

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2 1


SLIDE 61

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2 1


SLIDE 62

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2 1


SLIDE 63

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2 1


SLIDE 64

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2 1


SLIDE 65

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2

Contextual Inhibition

1


SLIDE 66

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2

Contextual Inhibition

1


SLIDE 67

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2

Contextual Inhibition

1


SLIDE 68

3 2 1 3 2 1

Stimulus-Approach Stimulus-Response

3 2

Contextual Inhibition

1


SLIDE 69

Fig. 7. Neuronal population vectors are plotted every 10 ms vs time. C, onset of the delay; E, end of the waiting period. The filled circle on the abscissa indicates the time after the beginning of the delay (130 ms) at which the population vector reached statistical significance.

Fig. 8. The length of the population vector vs time. C, onset of delay; E, end of waiting period.

Fig. 8, showing that the signal increased gradually during the waiting period. The population vector reached statistical significance (P<0.05, modified Rayleigh test, see Materials and methods) 130 ms after the beginning of the delay. Figure 7 shows that at this point the population vector was pointing in the leftward direction, similar to the direction of the final part of the movement in the memorized trajectory.

Directional engagement of cells during the waiting period. The analyses above dealt with the neuronal population vector. However, a different insight into the process unfolding during the waiting period can be gained by analyzing the directional properties of cells engaged during that period. Given that directionally tuned cells were preferentially engaged during the waiting period (Table 2; see above), we analyzed the distributions of the directional influences exerted by the cells that changed activity during each of the three 200-ms epochs of the waiting period (see Materials and methods): if the cell activity increased, the cell was taken to exert a unit-length directional influence in its preferred direction; if the activity decreased, the opposite direction was taken. Frequency distributions of these directions were then constructed and plotted. The following can be seen in Fig. 9. First, the directional influences of cells recruited in each of the three epochs are widely distributed. Second, the distribution of the directional influences of cells recruited during the first 200 ms of the waiting period is skewed towards a leftward direction; indeed, the mean direction (Mardia 1972) of this distribution is at 186.5° and it is statistically significant (mean resultant 0.379, n=22, P<0.05, Rayleigh test). Third, there is a clockwise shift in the directional influences of cells recruited during the second 200 ms of the waiting period; the mean direction is now at 116.8° (length of mean resultant 0.475, n=27, P<0.01, Rayleigh test). Finally, there is a further clockwise shift in the directional influences of cells recruited during the last 200 ms of the waiting period, but this is not statistically significant. Of course, the ongoing weighted contributions of all these cells are combined to yield the neuronal population vector (see above); but this analysis showed (a) that the directional contributions by single cells were distributed and not restricted to a narrow set of directions and (b) that there was a shifting directional engagement of cells, from the leftward (←) to the upward direction (↑).

Location of recordings. The recording sites for both animals were in the crown and the exposed part of the precentral gyrus (Brodmann's area 4; Fig. 10).

Human studies

The mean (± SD) of the immediate premovement time in the memorized movement trials was 204 ± 38 ms. Con-

Fig. 9A-C. Polar plots of directional influences of single cells during the first three successive 200-ms epochs of the waiting period. A 0-200 ms; B 200-400 ms; C 400-600 ms. Bars are plotted in the middle of 10° directional bins. The length of a bar indicates the percentage of cells making directional contributions within a particular bin. The center circle represents 0 and the outer circle 5% change.

Ashe, et al. (1993). Exp Brain Res 95:118-130

movement 1 movement 2

SLIDE 70

Phase 1   CXA : CS + US
Phase 2   CXA : CS
Test A    CXA : CS → no-CR
Test B    CXB : CS → CR

Extinction does not transfer to a new context (Bouton 1991, 1992)

Context Effects


SLIDE 71

Contextual Inhibition


SLIDE 72

Three Learning Conditions

Rew   better than expected   maximal generalization
Rew   worse than expected    contextual exception
Pun   bad                    minimal generalization

SLIDE 73

S a

excitation


SLIDE 74

S a C

suppression excitation


SLIDE 75

S a C

suppression excitation inhibition


SLIDE 76

S a C

suppression excitation inhibition

Rew Rew Pun


SLIDE 77

S a C

suppression excitation GENERALIZATION inhibition

Rew Rew Pun


SLIDE 78

S a C

suppression excitation GENERALIZATION SPECIALIZATION inhibition

Rew Rew Pun


SLIDE 79

S a C

suppression excitation GENERALIZATION SPECIALIZATION inhibition

Rew Rew Pun

SPECIFIC


SLIDE 80

State & Action Evaluation Sensory Coding Action Selection


SLIDE 81

ACTOR CRITIC PUNISH RL-CORE

Σ Σ

actor target critic target potential actions selected action

Sensory Coding Action Selection


SLIDE 82

ACTOR CRITIC PUNISH RL-CORE

Σ Σ

actor target critic target potential actions selected action

Δ = 0 Δ = 1 Δ = 2

Sensory Coding Action Selection


SLIDE 83

ACTOR CRITIC PUNISH RL-CORE SELECT MERGE INV D

Σ Σ

actor target critic target selected action potential actions selected action

obstacles

collision

location

location

Δ = 0 Δ = 1 Δ = 2

WORLD


SLIDE 84


SLIDE 85

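$$Q(c, s, a_j) = \sum_{i=0}^{n} s_i w_{ij} I_{ij}, \qquad I_{ij} = \prod_{k=0}^{p} (1 - c_k u_{ijk}).$$

$$u_{ijk}^{(t+1)} = u_{ijk}^{(t)} - \beta \left(1 - u_{ijk}^{(t)}\right) \frac{s_i a_j c_k}{|s|} w_{ij} \Delta Q_t \quad \text{if } \Delta Q_t < 0,$$

$$w_{ij}^{(t+1)} = w_{ij}^{(t)} + \alpha \frac{s_i a_j}{|s|} \Delta Q_t \quad \text{if } \Delta Q_t > 0.$$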

Learning Algorithm
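
A minimal sketch of the update rules on this slide, under stated assumptions: Q is read as a sum over state elements i, I_ij as a product over context elements k, |s| as the sum of absolute values of s, and the action vector a as one-hot, so that a_j = 1 for the chosen action. The error ΔQ is taken as reward minus the current Q (a one-step target), which is a simplification. The class `ContextQ` and the demo function are illustrative; the demo follows the Phase 1 / Phase 2 / Test A / Test B design from the Context Effects slide.

```python
class ContextQ:
    """Q(c, s, a_j) = sum_i s_i * w_ij * I_ij, with I_ij = prod_k (1 - c_k * u_ijk)."""

    def __init__(self, n_states, n_actions, n_contexts, alpha=0.5, beta=0.5):
        self.alpha = alpha   # learning rate for excitatory weights w
        self.beta = beta     # learning rate for contextual inhibition u
        self.w = [[0.0] * n_actions for _ in range(n_states)]
        self.u = [[[0.0] * n_contexts for _ in range(n_actions)]
                  for _ in range(n_states)]

    def inhibition(self, c, i, j):
        """I_ij: stays at 1 unless an active context element inhibits (i, j)."""
        prod = 1.0
        for k, ck in enumerate(c):
            prod *= 1.0 - ck * self.u[i][j][k]
        return prod

    def q(self, c, s, j):
        return sum(si * self.w[i][j] * self.inhibition(c, i, j)
                   for i, si in enumerate(s))

    def update(self, c, s, j, delta_q):
        """Apply one learning step for the chosen action j (a_j = 1)."""
        norm = sum(abs(si) for si in s) or 1.0   # |s|, assumed to be the L1 norm
        for i, si in enumerate(s):
            if delta_q > 0:
                # Better than expected: strengthen the context-free association.
                self.w[i][j] += self.alpha * si * delta_q / norm
            elif delta_q < 0:
                # Worse than expected: learn a context-specific exception.
                for k, ck in enumerate(c):
                    u = self.u[i][j][k]
                    self.u[i][j][k] = u - self.beta * (1.0 - u) * si * ck \
                        * self.w[i][j] * delta_q / norm


def run_extinction_demo(trials=20):
    """Acquisition then extinction in context A; return Q in contexts A and B."""
    model = ContextQ(n_states=1, n_actions=1, n_contexts=2)
    stimulus, ctx_a, ctx_b = [1.0], [1.0, 0.0], [0.0, 1.0]
    for reward in [1.0] * trials + [0.0] * trials:
        model.update(ctx_a, stimulus, 0, reward - model.q(ctx_a, stimulus, 0))
    return model.q(ctx_a, stimulus, 0), model.q(ctx_b, stimulus, 0)
```

In this sketch extinction leaves w untouched and only raises u for context A, so the response is suppressed in A but reappears in B, which is the pattern described for extinction not transferring to a new context.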


SLIDE 86

Maze Accumulated Extra Steps


SLIDE 87

Maze Accumulated Extra Steps


SLIDE 88


A More Complex Example


SLIDE 89

Q-Learning with a regular linear network

Context-Q

[Figure: one plot per method, each divided into 1st, 2nd, and 3rd parts; mazes marked S and G]

Winberg, 2005

Context Prevents Catastrophic Forgetting

SLIDE 90

Four Algorithms

Q            Q(s, a)                      ‘stimulus generalization’
ContextQ     Q(c, s, a)                   contextual specialization
ContextAC    Q(c, s, a)  V(s)             progress separate from state that controls action
ContextACP   Q(c, s, a)  V(s)  P(s, a)    learns to avoid doing bad things

SLIDE 91

Four Algorithms

Q            Q(s, a)                      ‘stimulus generalization’
ContextQ     Q(c, s, a)                   contextual specialization
ContextAC    Q(c, s, a)  V(s)             progress separate from state that controls action
ContextACP   Q(c, s, a)  V(s)  P(s, a)    learns to avoid doing bad things

general less general


SLIDE 92

Stimulus-Approach Stimulus-Response Contextual Inhibition


SLIDE 93
