
A Cognitively Plausible Adaptive Neural Language Model
Marten van Schijndel and Tal Linzen
May 12, 2018
Department of Cognitive Science, Johns Hopkins University

Humans adapt to linguistic context
Subjects learn to expect vocabulary items and syntactic structures (Otten & Van Berkum, 2008; Fine et al., 2013)
RRC (reduced relative clause): The soldiers warned about the dangers conducted the raid
P(RRC) = 0.008 (typical) → 0.50 (Fine et al.)
By the end of the experiment, subjects expected RRCs more than at the beginning

Adaptation studied in NLP
• Domain adaptation (Kuhn & de Mori, 1990; McClosky, 2010): News Model → Biomedical Text
• Handling unknown words (Grave et al., 2015): Learn new words from context
• Style adaptation (Jaech & Ostendorf, 2017): Lawyer A → Lawyer B
But can we model human adaptation?

Our proposed model
LSTM language model (gives the probability of the next word in a sequence)
Base model: trained on Wikipedia (90M words) (Gulordava et al., 2018)
Adaptation algorithm:
1. Test on a sentence
2. Update weights based on that sentence
3. Repeat on remaining sentences
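The test-then-update loop can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the authors' implementation: a toy bigram softmax model (the hypothetical `TinyBigramLM`) stands in for the LSTM, the learning rate is arbitrary, and the example sentence is made up. The point is only the ordering: each sentence is scored with the current weights before the model updates on it.

```python
import math
from collections import defaultdict


class TinyBigramLM:
    """Toy stand-in for the LSTM: one logit per (previous word, next word) pair."""

    def __init__(self, vocab):
        self.vocab = list(vocab)
        self.logits = defaultdict(float)  # all zeros, so predictions start uniform

    def probs(self, prev):
        """Softmax over the vocabulary given the previous word."""
        scores = [self.logits[(prev, w)] for w in self.vocab]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return {w: e / total for w, e in zip(self.vocab, exps)}

    def surprisals(self, sentence):
        """Negative log probability (nats) of each word given the word before it."""
        out, prev = [], "<s>"
        for word in sentence:
            out.append(-math.log(self.probs(prev)[word]))
            prev = word
        return out

    def update(self, sentence, lr):
        """One gradient step on the sentence's cross-entropy loss."""
        grads, prev = defaultdict(float), "<s>"
        for word in sentence:
            p = self.probs(prev)
            for v in self.vocab:
                grads[(prev, v)] += p[v] - (1.0 if v == word else 0.0)
            prev = word
        for key, g in grads.items():
            self.logits[key] -= lr * g


def adapt(model, sentences, lr=1.0):
    """Score each sentence with the current weights, then update on it."""
    scores = []
    for sentence in sentences:                     # step 3: repeat on the rest
        scores.append(model.surprisals(sentence))  # step 1: test
        model.update(sentence, lr)                 # step 2: update
    return scores


vocab = ["the", "boy", "threw", "ball", "dog"]
sentence = ["the", "boy", "threw", "the", "ball"]
first, second = adapt(TinyBigramLM(vocab), [sentence, sentence])
print(sum(first), sum(second))  # the repeated sentence is less surprising
```

Because scoring happens before the update, the surprisal recorded for a sentence never reflects training on that same sentence, which is what makes the adapted scores a fair test.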

Experiment 1: Does adaptation improve prediction accuracy?

Accuracy Evaluation Measure: Perplexity
Perplexity: How much probability mass is assigned to wrong words? How surprised is the model by the data? (Lower is better)
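As a concrete reading of the measure, perplexity is commonly computed as the exponentiated mean negative log probability that the model assigned to the words that actually occurred. The sketch below assumes that standard definition; the probabilities are hypothetical and serve only to show the direction of the effect.

```python
import math


def perplexity(word_probs):
    """Exponentiated mean negative log probability of the observed words."""
    nll = [-math.log(p) for p in word_probs]
    return math.exp(sum(nll) / len(nll))


# Hypothetical probabilities a model assigned to four observed words.
uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])
confident = perplexity([0.9, 0.8, 0.9, 0.7])
print(uniform_guess, confident)
```

A model that spreads its probability evenly over four candidates has a perplexity of 4. Putting more mass on the words that occur lowers the value, which is why lower is better.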

Accuracy Evaluation Data
Test data: Natural Stories Corpus (Futrell et al., 2017)
• 10 texts (485 sentences)
• 7 Fairy Tales
• 3 Documentaries

Accuracy Results
[Figure: bar chart of perplexity (y-axis, 0 to 180) comparing Wikipedia and Wikipedia+Adaptation. Panels: Full Corpus (Natural Stories) and Separate Story Types (Fairy Tales, Documentaries).]

Experiment 2: Are adaptive expectations human-like?

Psycholinguistic Evaluation Measure: Surprisal
Reading times can be predicted with surprisal (Smith and Levy, 2013)
$\mathrm{Surprisal}(w_i) = -\log P(w_i \mid w_{1..i-1})$
[Figure: per-word surprisal (y-axis, 0 to 14) for "The little girl bitten by the dog ..."]
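The surprisal formula is a one-line computation once the model's conditional probability for a word is available. The sketch below assumes base-2 logarithms (bits), which the slide does not specify, and uses hypothetical probabilities.

```python
import math


def surprisal(prob, base=2.0):
    """Surprisal of a word whose conditional probability is `prob`."""
    return -math.log(prob) / math.log(base)


# Hypothetical conditional probabilities for two words in context.
expected_word = surprisal(0.5)
unexpected_word = surprisal(0.001)
print(expected_word, unexpected_word)
```

Halving a word's probability adds one bit of surprisal, so rare continuations produce the tall bars in a per-word surprisal plot.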

Psycholinguistic Evaluation Data: Reading Times
Test data: Natural Stories Corpus (Futrell et al., 2017)
Also contains self-paced reading times! (N = 181)
[Animation: the sentence "The boy threw the dog a ball." is revealed one word at a time while all other words stay masked.]

Psycholinguistic Evaluation
Fixed effects of linear mixed regression
                          β̂        σ̂      t-value
Sentence position        0.3592   0.5284    0.680
Word length              6.3828   1.0034    6.361  ***
Non-adaptive surprisal   8.4480   0.6294   13.422  ***
Non-adaptive surprisal is a good predictor of reading times

Psycholinguistic Evaluation
Fixed effects of linear mixed regression
                          β̂        σ̂      t-value
Sentence position        0.2903   0.5310    0.547
Word length              6.4266   1.0035    6.404  ***
Non-adaptive surprisal  -0.8873   0.6754   -1.314
Adaptive surprisal       8.7714   0.6764   12.968  ***
Adaptive surprisal is a better predictor of reading times
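Each t-value in a fixed-effects table is the coefficient estimate divided by its standard error. The short check below recomputes the t-values for the regression that includes adaptive surprisal. The numbers are taken from the slide, but the pairing of numbers to rows is a reconstruction from the flattened text, so treat the dictionary as an assumption.

```python
# (estimate, standard error, reported t-value) for each fixed effect.
# Row assignment is reconstructed from the slide text, not confirmed.
adaptive_model = {
    "Sentence position": (0.2903, 0.5310, 0.547),
    "Word length": (6.4266, 1.0035, 6.404),
    "Non-adaptive surprisal": (-0.8873, 0.6754, -1.314),
    "Adaptive surprisal": (8.7714, 0.6764, 12.968),
}


def t_value(estimate, std_error):
    """t statistic of a fixed effect: estimate over standard error."""
    return estimate / std_error


recomputed = {name: t_value(b, se) for name, (b, se, _) in adaptive_model.items()}
for name, (_, _, reported) in adaptive_model.items():
    print(f"{name:24s} recomputed {recomputed[name]:7.3f}  reported {reported:7.3f}")
```

Once adaptive surprisal is in the model, the non-adaptive estimate shrinks toward zero and changes sign, which is the pattern the slide summarizes as adaptive surprisal being the better predictor.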

Experiment 3: Does the model adapt to vocabulary, syntax, or both?

Generated 200 dative sentence pairs
Prepositional Object (PO): The boy threw the ball to the dog.
Double Object (DO): The boy threw the dog the ball.
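One way such pairs could be produced is by filling two templates with the same content words. The slides do not say how the 200 pairs were generated, so the function `dative_pair` and the small lexicon below are hypothetical; they only show that a PO and a DO sentence can share all of their vocabulary while differing in structure.

```python
def dative_pair(agent, verb, theme, recipient):
    """Return (PO, DO) sentences built from the same four content words."""
    po = f"The {agent} {verb} the {theme} to the {recipient}."
    do = f"The {agent} {verb} the {recipient} the {theme}."
    return po, do


# Hypothetical lexicon reusing words that appear on the slides.
items = [
    ("boy", "threw", "ball", "dog"),
    ("captain", "mailed", "letter", "student"),
]
pairs = [dative_pair(*item) for item in items]
for po, do in pairs:
    print(po)
    print(do)
```

Holding vocabulary constant within a pair is what lets an evaluation separate lexical adaptation from syntactic adaptation.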

Dative evaluation paradigm

Model adapts to vocabulary and syntax
[Figure: bar chart of perplexity (y-axis, 0 to 600) comparing Wikipedia and Wikipedia+Adaptation after adapting to DO sentences (The boy threw the dog a ball). Test conditions: PO (The boy threw a ball to the dog) and DO (The captain mailed the student a letter).]

Our adaptive language model makes
• more accurate predictions
• more human-like predictions
than a non-adaptive language model.
• Adaptation is driven by both vocabulary and syntax

Future directions:
• How sensitive are RT results to learning rate?
• Reproduce psycholinguistic adaptation results
• Compare adaptation mechanisms using human behavioral data

Thanks!

Model adapts to vocabulary and syntax
[Figure: bar chart of perplexity (y-axis, 0 to 600) comparing Base and Adapted models. Panel "DO Adapted": PO (Vocab), DO (Syntax). Panel "PO Adapted": DO (Vocab), PO (Syntax).]
