  1. de li de ji we ji... Pre-Wiring & Pre-Training: What does a neural network need to learn truly general identity rules? Raquel G. Alhama, Willem Zuidema

  2. The Empirical Data: Do infants generalize identity rules? PARTICIPANTS: 7-month-old infants [Marcus et al. 1999]

  3. The Empirical Data: Do infants generalize identity rules? PARTICIPANTS: 7-month-old infants. FAMILIARIZATION with one of two grammars: ABA: “wi-je-wi le-di-le ji-li-ji …” or ABB: “wi-je-je le-di-di ji-li-li …” (e.g. “ji-li-ji” = ABA, “ji-li-li” = ABB). [Marcus et al. 1999]

  4. The Empirical Data: Do infants generalize identity rules? PARTICIPANTS: 7-month-old infants. FAMILIARIZATION: ABA: “wi-je-wi le-di-le ji-li-ji …” or ABB: “wi-je-je le-di-di ji-li-li …”. TEST (novel syllables, both grammars): “ba-po-ba” (ABA), “ko-ga-ga” (ABB), “ba-po-po” (ABB), … [Marcus et al. 1999]
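
For concreteness, the stimulus design can be sketched in a few lines of Python (the syllables follow the slides; the function and variable names are illustrative, not from the original):

```python
# Syllable inventories as on the slides: familiarization and test sets are disjoint.
FAMILIAR_A = ["wi", "le", "ji"]
FAMILIAR_B = ["je", "di", "li"]
TEST_A = ["ba", "ko"]
TEST_B = ["po", "ga"]

def make_triples(a_syllables, b_syllables, grammar):
    """Build every A-B-A or A-B-B triple from two syllable sets."""
    third = (lambda a, b: a) if grammar == "ABA" else (lambda a, b: b)
    return [(a, b, third(a, b)) for a in a_syllables for b in b_syllables]

familiarization = make_triples(FAMILIAR_A, FAMILIAR_B, "ABA")
test_consistent = make_triples(TEST_A, TEST_B, "ABA")    # same rule, novel syllables
test_inconsistent = make_triples(TEST_A, TEST_B, "ABB")  # the other rule
print(familiarization[:3])  # [('wi', 'je', 'wi'), ('wi', 'di', 'wi'), ('wi', 'li', 'wi')]
```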

  5. The Empirical Data: Do infants generalize identity rules? [Marcus et al. 1999]

  6. The Empirical Data: Do infants generalize identity rules? RESULT: Differential attention between grammars. The novel test items instantiate the familiar patterns: ba-po-ba = wi-je-wi = A B A, while ba-po-po = wi-je-je = A B B; hence bapoba ≠ bapopo and kogako ≠ kogaga under the learned rule.

  7. Modelling the Results ● Symbolic Cognition: X Y Z, “XYZ: X is the same as Z” [Marcus et al. 1999]

  8. Modelling the Results ● Symbolic Cognition: X Y Z, “XYZ: X is the same as Z” ● Simple Recurrent Network (SRN): trained to predict the next syllable; fails to predict novel (test) items [Evaluation: % correct in predicting the third syllable] [Marcus et al. 1999]

  9. Modelling the Results ● Symbolic Cognition: X Y Z, “XYZ: X is the same as Z” ● Simple Recurrent Network (SRN): trained to predict the next syllable; fails to predict novel (test) items [Evaluation: % correct in predicting the third syllable] A generalizing solution is in the hypothesis space of the SRN, so why doesn't it find it? [Marcus et al. 1999]
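
To make the setup concrete, here is a minimal sketch of an Elman-style SRN and the third-syllable evaluation, assuming one-hot syllable inputs; the weights below are random stand-ins, since the slides do not specify the training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
syllables = ["wi", "je", "le", "di", "ji", "li", "ba", "po", "ko", "ga"]
idx = {s: i for i, s in enumerate(syllables)}
V, H = len(syllables), 20   # vocabulary size, hidden units

def one_hot(s):
    v = np.zeros(V)
    v[idx[s]] = 1.0
    return v

# Elman SRN: h_t = tanh(W_xh x_t + W_hh h_{t-1}); the output W_hy h_t
# is read as a prediction of the NEXT syllable.
W_xh, W_hh, W_hy = (rng.normal(0, 0.5, shape)
                    for shape in [(H, V), (H, H), (V, H)])

def predict_third(triple):
    """Feed syllables 1 and 2; return the model's guess for syllable 3."""
    h = np.zeros(H)
    for s in triple[:2]:
        h = np.tanh(W_xh @ one_hot(s) + W_hh @ h)
    return syllables[int(np.argmax(W_hy @ h))]

def percent_correct(triples):
    """Evaluation as on the slide: % of items whose third syllable is predicted."""
    return 100.0 * sum(predict_third(t) == t[2] for t in triples) / len(triples)
```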

  10.–11. Simulations with a Simple Recurrent Network: proportion of statistically significant responses to different grammar conditions, out of 400 runs of the model (with different parameter settings).

  12. What is missing in the SRN simulations? ● The SRN was simulated as a tabula rasa – It starts learning from a random state

  13. What is missing in the SRN simulations? ● The SRN was simulated as a tabula rasa – It starts learning from a random state ● Pre-Wiring: What would be a more cognitively plausible initial state? ● Pre-Training: What is the role of prior experience?

  14. Implementation: the Echo State Network ● Same hypothesis space as the SRN ● Reservoir Computing approach: only the weights in the output layer are trained (generally with Ridge Regression, but we use Gradient Descent) ● The weights in the reservoir are randomly initialized (with spectral radius < 1) [ESN: Jaeger (2001)] – How can we pre-wire it for this task?
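
A minimal NumPy sketch of such an ESN, assuming one-hot syllable inputs (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, V = 100, 10                    # reservoir units, one-hot syllable size

# Fixed random reservoir, rescaled to spectral radius 0.9 (< 1, echo state property).
W = rng.normal(0, 1, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.normal(0, 1, (N, V))   # fixed random input weights
W_out = np.zeros((V, N))          # the ONLY trained weights (the readout)

def step(h, x):
    """One reservoir update for input vector x."""
    return np.tanh(W @ h + W_in @ x)

def sgd_readout_step(h, target, lr=0.01):
    """Gradient-descent update of the readout on squared error
    (the slides use GD where ESNs usually use ridge regression)."""
    global W_out
    W_out -= lr * np.outer(W_out @ h - target, h)
```

Training only W_out keeps the recurrent part fixed, which is what makes pre-wiring the reservoir (next slide) a meaningful manipulation.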

  15. Pre-Wiring: Delay Line Memory ● DELAY LINE MEMORY: a mechanism that preserves the input by propagating it along a path with a delay (t=0, t=1, t=2, …) ● Implementation: – “Feed-Forward” structure in the reservoir – Strict or approximated copy
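
A sketch of how a strict delay line can be pre-wired into the reservoir weights, assuming the reservoir is partitioned into D slots of the input size (linear updates below keep the copy exact for illustration; inside a tanh reservoir the copy is only approximate):

```python
import numpy as np

V, D = 10, 3                      # input size, delay-line depth
N = V * D                         # reservoir holds D time-shifted slots

# Strict copy: identity blocks on the sub-diagonal, so slot d's contents
# move to slot d+1 at every step (input at t, t-1, t-2, ...).
W = np.zeros((N, N))
for d in range(D - 1):
    W[(d + 1) * V:(d + 2) * V, d * V:(d + 1) * V] = np.eye(V)

W_in = np.zeros((N, V))
W_in[:V, :] = np.eye(V)           # the input is written into slot 0

# An "approximated copy" would perturb these identity blocks with small
# random noise instead of copying exactly.
h = np.zeros(N)
for x in np.eye(V)[:3]:           # feed three distinct one-hot inputs
    h = W @ h + W_in @ x
print(h.reshape(D, V))            # row d now holds the input from d steps ago
```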

  16.–19. Pre-Wiring: Delay Line Memory [figure slides: step-by-step illustration of the delay line]

  20. Pre-Wiring: Delay Line Memory – Does the model learn the generalized solution?

  21. Simulations with the Delay Line [figure panels: Original and Extended]

  22. Pre-Training ● There are many solutions that fit the training data (non-generalizing solutions) ● Where does the pressure to find a general solution come from? – Hypothesis: prior experience with environmental data may have created a domain-general bias for abstract solutions → PRE-TRAINING: Incremental Novelty Exposure

  23.–26. Pre-Training: Incremental Novelty Exposure. TRAINING: A_i B_j A_i triples over a gradually shifting vocabulary: first A1–A4 with B1–B4, then A2–A5 with B2–B5, and so on up to A(k-3)–Ak with B(k-3)–Bk (at each phase the oldest syllables drop out and novel ones enter). TEST (fixed throughout): C_i D_j C_i triples over entirely novel syllables C1–C4 and D1–D4.
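
A sketch of the regime, with numbered placeholder syllables (A1…Ak, B1…Bk, C1…C4, D1…D4) standing in for real syllables and an assumed window of four:

```python
def training_phases(k, window=4):
    """Yield successive training vocabularies: A1-A4/B1-B4, then A2-A5/B2-B5, ..."""
    for start in range(1, k - window + 2):
        a_set = [f"A{i}" for i in range(start, start + window)]
        b_set = [f"B{i}" for i in range(start, start + window)]
        yield [(a, b, a) for a in a_set for b in b_set]   # all A_i B_j A_i triples

# The test set is fixed and fully novel: all C_i D_j C_i triples.
test = [(f"C{i}", f"D{j}", f"C{i}") for i in range(1, 5) for j in range(1, 5)]

for phase, triples in enumerate(training_phases(k=8), start=1):
    print(f"phase {phase}: {triples[0]} ... ({len(triples)} triples)")
```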

  27. Simulations with Incremental Novelty Exposure [figure panels labelled “Random”]: a huge increase in % of correct predictions!

  28. Conclusions ● Finally, simulations of a recurrent network successfully solving the task of Marcus et al. (1999) ● This simple learning problem might hold lessons for more complex architectures solving more complex tasks ● Crucial for success are: (i) Pre-Wiring with a structure that improves memory (cf. LSTM, Memory Networks) (ii) Pre-Training with training regimes that favour generalization (cf. Dropout) ● Contact: rgalhama@uva.nl, https://staff.fnwi.uva.nl/r.garridoalhama/

  29. Effect of the Delay Line [figure panels: Without Delay Line / With Delay Line]
