Pre-Wiring & Pre-Training: What does a neural network need to learn truly general identity rules? - PowerPoint PPT Presentation


SLIDE 1

Pre-Wiring & Pre-Training:

What does a neural network need to learn truly general identity rules?

Raquel G. Alhama, Willem Zuidema

de li de ji we ji...

SLIDE 2

The Empirical Data: Do infants generalize identity rules?

[Marcus et al. 1999]

PARTICIPANTS: 7-month-old infants

SLIDE 3

The Empirical Data: Do infants generalize identity rules?

[Marcus et al. 1999]

PARTICIPANTS: 7-month-old infants
FAMILIARIZATION:
  ABA: “wi-je-wi le-di-le ji-li-ji …”
  ABB: “wi-je-je le-di-di ji-li-li ...”

SLIDE 4

The Empirical Data: Do infants generalize identity rules?

[Marcus et al. 1999]

PARTICIPANTS: 7-month-old infants
FAMILIARIZATION:
  ABA: “wi-je-wi le-di-le ji-li-ji …”
  ABB: “wi-je-je le-di-di ji-li-li ...”
TEST: “ba-po-ba ko-ga-ga ba-po-po …” (ABA, ABB, ABB)
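A minimal Python sketch of this design (not the authors' code; the helper name make_triples and the exact syllable lists beyond those shown above are our assumptions) that generates ABA/ABB familiarization triples from the familiar syllables and test triples from the novel ones:

```python
# Generate ABA/ABB triples in the style of Marcus et al. (1999).
import itertools

FAMILIARIZATION_A = ["wi", "le", "ji"]       # A syllables from the slides
FAMILIARIZATION_B = ["je", "di", "li"]       # B syllables from the slides
TEST_A, TEST_B = ["ba", "ko"], ["po", "ga"]  # novel test syllables

def make_triples(a_syllables, b_syllables, grammar):
    """Build all triples for one grammar: 'ABA' -> a-b-a, 'ABB' -> a-b-b."""
    triples = []
    for a, b in itertools.product(a_syllables, b_syllables):
        triples.append((a, b, a) if grammar == "ABA" else (a, b, b))
    return triples

fam_aba = make_triples(FAMILIARIZATION_A, FAMILIARIZATION_B, "ABA")
test_abb = make_triples(TEST_A, TEST_B, "ABB")
print(fam_aba[:3])   # [('wi', 'je', 'wi'), ('wi', 'di', 'wi'), ('wi', 'li', 'wi')]
print(test_abb[:2])  # [('ba', 'po', 'po'), ('ba', 'ga', 'ga')]
```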

SLIDE 5

The Empirical Data: Do infants generalize identity rules?

[Marcus et al. 1999]

SLIDE 6

The Empirical Data: Do infants generalize identity rules?

RESULT: Differential attention between grammars

[Figure: test items ba-po-ba and ko-ga-ko pattern as A-B-A, like familiarization item wi-je-wi; ba-po-po and ko-ga-ga pattern as A-B-B, like wi-je-je]

SLIDE 7

Modelling the Results

• Symbolic Cognition

  “XYZ: X is the same as Z”

[Marcus et al. 1999]

SLIDE 8

Modelling the Results

• Symbolic Cognition

  “XYZ: X is the same as Z”

• Simple Recurrent Network (SRN)
  – Trained to predict next syllable
  – Fails to predict novel (test) items

[Evaluation: % correct in predicting the third syllable]
[Marcus et al. 1999]
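A minimal numpy sketch of such an SRN (a sketch under assumed sizes and initialization scales, not the original simulations; training by backpropagation through time is omitted for brevity):

```python
# Sketch of an Elman-style Simple Recurrent Network that reads a triple
# one syllable at a time and predicts the next syllable at every step.
# Sizes and weight scales are assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
n_syll, n_hidden = 12, 20                      # hypothetical vocab / hidden size
Wxh = rng.normal(0, 0.1, (n_hidden, n_syll))   # input -> hidden
Whh = rng.normal(0, 0.1, (n_hidden, n_hidden)) # hidden -> hidden (recurrence)
Why = rng.normal(0, 0.1, (n_syll, n_hidden))   # hidden -> output

def srn_predict(seq_onehot):
    """Run the SRN over one-hot syllables; return the softmax distribution
    over the next syllable after each step."""
    h = np.zeros(n_hidden)
    outs = []
    for x in seq_onehot:
        h = np.tanh(Wxh @ x + Whh @ h)         # Elman recurrence
        logits = Why @ h
        outs.append(np.exp(logits) / np.exp(logits).sum())
    return outs
```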

SLIDE 9

Modelling the Results

• Symbolic Cognition

  “XYZ: X is the same as Z”

• Simple Recurrent Network (SRN)
  – Trained to predict next syllable
  – Fails to predict novel (test) items

[Evaluation: % correct in predicting the third syllable]
[Marcus et al. 1999]

A generalizing solution is in the hypothesis space of the SRN – why doesn't it find it?
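The evaluation named on the slide, as a sketch (the one-hot encode function is an assumed helper, e.g. mapping each syllable to a basis vector):

```python
# % correct in predicting the third syllable of held-out test triples:
# feed the first two syllables, check the argmax of the final prediction.
def third_syllable_accuracy(predict_fn, triples, encode):
    correct = 0
    for a, b, c in triples:
        outs = predict_fn([encode(a), encode(b)])
        correct += outs[-1].argmax() == encode(c).argmax()
    return correct / len(triples)

# e.g. third_syllable_accuracy(srn_predict, test_abb, encode)
```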

SLIDE 10

Simulations with a Simple Recurrent Network

Proportion of statistically significant responses to different grammar conditions, out of 400 runs of the model (with different parameter settings)
SLIDE 12

What is missing in the SRN simulations?

  • The SRN was simulated as a tabula rasa

– It starts learning from a random state

SLIDE 13

What is missing in the SRN simulations?

  • The SRN was simulated as a tabula rasa

– It starts learning from a random state

  • Pre-Wiring:

What would be a more cognitively plausible initial state?

  • Pre-Training:

What is the role of prior experience?

SLIDE 14

Implementation: the Echo State Network

ESN, Jaeger (2001)

• Same hypothesis space as SRN
• Reservoir Computing approach:
  – Only the weights in the output layer are trained (generally with Ridge Regression, but we use Gradient Descent)
  – The weights in the reservoir are randomly initialized (with spectral radius < 1)
– How can we pre-wire it for this task?
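A minimal ESN sketch in numpy (sizes, input scaling, and the exact 0.9 spectral radius are assumptions; as on the slide, only the output weights are trained, here by gradient descent on a squared error):

```python
# Echo State Network sketch: fixed random reservoir, trained readout only.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res, n_out = 12, 100, 12               # hypothetical sizes

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))   # fixed random input weights
W_res = rng.normal(0, 1.0, (n_res, n_res))     # fixed random reservoir
W_res *= 0.9 / np.abs(np.linalg.eigvals(W_res)).max()  # spectral radius < 1
W_out = np.zeros((n_out, n_res))               # the only trained weights

def esn_states(seq):
    """Drive the reservoir with a sequence; return the state after each input."""
    h, states = np.zeros(n_res), []
    for x in seq:
        h = np.tanh(W_res @ h + W_in @ x)
        states.append(h)
    return states

def sgd_step(seq, target, lr=0.01):
    """One gradient-descent update of W_out for predicting `target` at the end."""
    global W_out
    h = esn_states(seq)[-1]
    err = W_out @ h - target                   # gradient of 0.5 * ||err||^2
    W_out -= lr * np.outer(err, h)
```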

SLIDE 15

Pre-Wiring: Delay Line Memory

• DELAY LINE MEMORY: mechanism to preserve the input by propagating it in a path with a delay
• Implementation:
  – “Feed-Forward” structure in the reservoir
  – Strict or approximated copy

[Figure: the input copied along the delay line at t=0, t=1, t=2]
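One way to realize this pre-wiring, as a sketch (the block layout and the noise parameter for the approximate copy are our assumptions): divide the reservoir into blocks and let each block copy the previous one, so the inputs from t, t-1, t-2, ... remain readable.

```python
import numpy as np

def delay_line_reservoir(n_in, n_delays, noise=0.0, rng=np.random.default_rng(2)):
    """Feed-forward reservoir: block k receives a copy of block k-1, so the
    reservoir holds the last n_delays inputs. noise=0 gives a strict copy;
    noise>0 an approximated one."""
    n_res = n_in * n_delays
    W = np.zeros((n_res, n_res))
    for k in range(1, n_delays):
        block = np.eye(n_in)
        if noise > 0:
            block = block + rng.normal(0, noise, (n_in, n_in))
        W[k * n_in:(k + 1) * n_in, (k - 1) * n_in:k * n_in] = block
    return W

# Replace the random W_res of the ESN sketch above with this structure
# (in this layout W_in would project the input only into the first block):
# W_res = delay_line_reservoir(n_in=12, n_delays=3)   # holds t, t-1, t-2
```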


SLIDE 20

Pre-Wiring: Delay Line Memory

Does the model learn the generalized solution?

SLIDE 21

Simulations with the Delay Line

[Charts: results for the “Extended” and the “Original” conditions]

SLIDE 22

Pre-Training

• There are many solutions that fit the training data (non-generalizing solutions)
• Where does the pressure to find a general solution come from?
  – Hypothesis: prior experience with environmental data may have created a domain-general bias for abstract solutions

→ PRE-TRAINING: Incremental Novelty Exposure

SLIDE 23

Pre-Training: Incremental Novelty Exposure

[Scheme] TRAINING: vocabulary A1-A4, B1-B4; triples of the form Ai-Bj-Ai. TEST: novel vocabulary C1-C4, D1-D4; triples Ci-Dj-Ci.

SLIDE 24

Pre-Training: Incremental Novelty Exposure

[Scheme, next phase: A1/B1 are dropped, A5/B5 added] TRAINING: vocabulary A2-A5, B2-B5; triples Ai-Bj-Ai. TEST: unchanged, novel Ci-Dj-Ci.

SLIDE 25

Pre-Training: Incremental Novelty Exposure

[Scheme, continued: A2/B2 are dropped, A6/B6 added, and so on] TEST: unchanged, novel Ci-Dj-Ci.

SLIDE 26

Pre-Training: Incremental Novelty Exposure

[Scheme, final phase k: Ak-4/Bk-4 dropped, Ak/Bk added] TRAINING: vocabulary Ak-3 to Ak, Bk-3 to Bk; triples Ai-Bj-Ai. TEST: novel vocabulary C1-C4, D1-D4; triples Ci-Dj-Ci.
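A sketch of this regime as a training loop (make_triples comes from the earlier sketch; model.train_on, model.evaluate, and the window size of 4 are hypothetical placeholders for whatever model and trainer are used):

```python
def incremental_novelty_exposure(model, n_phases, window=4):
    """Each phase drops the oldest A/B syllables and adds new ones, so the
    model repeatedly encounters novel vocabulary during training."""
    for phase in range(n_phases):
        a_vocab = [f"A{i}" for i in range(phase + 1, phase + 1 + window)]
        b_vocab = [f"B{i}" for i in range(phase + 1, phase + 1 + window)]
        model.train_on(make_triples(a_vocab, b_vocab, "ABA"))  # hypothetical
    # Test on fully novel vocabulary (Ci-Dj-Ci triples):
    c_vocab = [f"C{i}" for i in range(1, 5)]
    d_vocab = [f"D{i}" for i in range(1, 5)]
    return model.evaluate(make_triples(c_vocab, d_vocab, "ABA"))  # hypothetical
```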

SLIDE 27

Simulations with Incremental Novelty Exposure

Huge increase in % of correct predictions!

[Charts: accuracy compared to “Random” baselines]

SLIDE 28

Conclusions

• Finally, simulations of a recurrent network successfully solving the task of Marcus et al. 1999
• This simple learning problem might hold lessons for more complex architectures solving more complex tasks
• Crucial for success are:
  (i) Pre-Wiring with a structure that improves memory (cf. LSTM, Memory Networks)
  (ii) Pre-Training with training regimes that favour generalization (cf. Dropout)

Contact: rgalhama@uva.nl https://staff.fnwi.uva.nl/r.garridoalhama/

SLIDE 29

Effect of the Delay Line

[Charts: results without vs. with the Delay Line]