SLIDE 1

Language is not language processing

Marten van Schijndel May 2020

Department of Linguistics, Cornell University

SLIDE 2

CL/NLP often aim to create models of language comprehension (NLI, parsing, information extraction, etc.).
Often, language models are trained on large amounts of text.
These are then the starting point for more complex models, or they are used for cognitive modeling.

SLIDE 3

Two potential problems

Model biases may not align with human comprehension biases
→ Models may not learn human comprehension during training

All language data comes from production, not comprehension (though annotations provide comprehension cues)
→ The comprehension signal may not be present in the produced data

In this talk, I explore these two possible problems with our current modeling paradigm.

SLIDE 4

Overview

Part 0: Background
Part 1: Magnitude probing
Part 2: World knowledge probing
Part 3: Production / comprehension mismatch

SLIDE 5

Part 0: Background

Neural networks have proven especially successful at finding linguistically accurate language processing solutions.

SLIDE 6

NNs are often trained on a word prediction task

SLIDE 8

Why word prediction?

We can measure how unexpected a word is with surprisal:

Surprisal(w_i) = −log P(w_i | w_1..i−1)   (1)

Shannon, 1948, Bell Systems Technical Journal; Hale, 2001, Proc. North American Assoc. Comp. Ling.; Levy, 2008, Cognition
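To make equation (1) concrete, here is a minimal Python sketch. The toy distribution is a placeholder for whatever supplies P(w_i | w_1..i−1); in the experiments below that role is played by an LSTM language model.

```python
import math

def next_word_probs(prefix):
    """Toy stand-in for a language model's next-word distribution.
    In practice this would come from an LSTM LM conditioned on the prefix."""
    if prefix == ("the", "horse"):
        return {"ran": 0.30, "galloped": 0.10, "raced": 0.02, "fell": 0.01}
    return {}

def surprisal(word, prefix):
    """Surprisal(w_i) = -log2 P(w_i | w_1..i-1), in bits."""
    p = next_word_probs(prefix).get(word, 1e-10)  # floor avoids log(0)
    return -math.log2(p)

# Rare continuations carry high surprisal:
print(surprisal("raced", ("the", "horse")))  # ~5.6 bits
print(surprisal("ran", ("the", "horse")))    # ~1.7 bits
```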

SLIDE 9

Why word prediction?

Surprisal indicates what the model finds unexpected/unnatural, which can then be mapped onto human behavioral and neural measurements:

  • acceptability/grammaticality
  • reading/reaction times
  • neural activation

  • So. Many. People.
SLIDE 10

This is kind of crazy!

We know frequency/predictability affect human language processing.
However, many plausible explanations of human responses involve experience beyond language statistics.
E.g., can language models learn intention from text alone? There may be some weak signal, but ...

SLIDE 11

Part 1: Magnitude probing

van Schijndel & Linzen, 2018, Proc. CogSci; van Schijndel & Linzen, in prep

SLIDE 12

Humans experience a visceral response upon encountering garden path constructions.
NNs model average statistics and therefore average frequency responses. Garden path responses exist in the tail.

SLIDE 13

They exist in the tail because:

1 the statistics are in the tail (predictability)

OR

2 the response is unusual (reanalysis)

SLIDE 14

The horse raced past the barn fell.

Bever, 1970, Cognition and the Development of Language

SLIDE 15

The horse that was raced past the barn fell.

Bever, 1970, Cognition and the Development of Language

SLIDE 16

[Figure: two parse trees for "The horse raced past the barn ...": the main-verb analysis (V raced) vs. the reduced-relative analysis (VBN raced)]

Bever, 1970, Cognition and the Development of Language

SLIDE 17

While accounts of human responses are framed in terms of explicit syntactic frequencies, RNNs can predict garden path responses without explicit syntactic training.

van Schijndel & Linzen, 2018, Proc. CogSci; Futrell et al., 2019, Proc. NAACL; Frank & Hoeks, 2019, Proc. CogSci

SLIDE 18

Do RNNs process garden paths similarly to humans?
Look beyond garden path existence to garden path magnitude.

SLIDE 19

Models

WikiRNN: Gulordava et al. (2018) LSTM
  Data: Wikipedia (80M words)
SoapRNN: 2-layer LSTM (same training parameters as above)
  Data: Corpus of American Soap Operas (80M words; Davies, 2011)

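As a rough sketch of this model class (not the authors' exact code; the sizes are illustrative), a 2-layer LSTM language model in PyTorch, with per-word surprisal read off its softmax:

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """2-layer LSTM language model of the general kind used here."""
    def __init__(self, vocab_size=50_000, emb_dim=650, hidden_dim=650):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        output, hidden = self.lstm(self.embed(token_ids), hidden)
        return self.out(output), hidden  # next-word logits per position

# Per-word surprisal (nats) for a dummy token sequence:
model = LSTMLanguageModel()
ids = torch.randint(0, 50_000, (1, 6))        # stand-in for a real sentence
logits, _ = model(ids[:, :-1])                # predict word i+1 from prefix
log_probs = torch.log_softmax(logits, dim=-1)
surprisals = -log_probs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
```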

SLIDE 20

Three garden paths

NP/S: The woman saw (that) the doctor wore a hat.
NP/Z: When the woman visited(,) her nephew laughed loudly.
MV/RR: The horse (which was) raced past the barn fell.

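A sketch of how such stimuli can be scored (the `surprisal_fn` argument is an assumed wrapper around a trained LM, e.g. one built from the previous sketch): the garden path effect is the extra surprisal the ambiguous variant incurs at the disambiguating word.

```python
# (ambiguous, unambiguous, disambiguating word) for each construction
STIMULI = {
    "NP/S":  ("The woman saw the doctor wore a hat.",
              "The woman saw that the doctor wore a hat.", "wore"),
    "NP/Z":  ("When the woman visited her nephew laughed loudly.",
              "When the woman visited, her nephew laughed loudly.", "laughed"),
    "MV/RR": ("The horse raced past the barn fell.",
              "The horse which was raced past the barn fell.", "fell"),
}

def garden_path_effect(surprisal_fn, ambiguous, unambiguous, critical):
    """Surprisal at the disambiguating word in the ambiguous variant,
    minus its surprisal in the unambiguous variant."""
    return surprisal_fn(ambiguous, critical) - surprisal_fn(unambiguous, critical)

# for name, (amb, unamb, crit) in STIMULI.items():
#     print(name, garden_path_effect(lm_surprisal, amb, unamb, crit))
```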

SLIDE 21

Surprisal-to-ms conversion

RT(w_t) = α · S(w_t)   (2)

Smith & Levy, 2013, Cognition

SLIDE 22

Probability-to-ms Conversion

RT(w_i) = δ_0 · S(w_i) + δ_−1 · S(w_i−1) + δ_−2 · S(w_i−2) + δ_−3 · S(w_i−3)   (3)

Smith & Levy, 2013, Cognition
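Equation (3) transcribed into Python (a sketch; the surprisal values below are invented for illustration, while the δ values plugged in are Smith & Levy's estimates from the "Learned Surp-to-RT mapping" slide):

```python
def predicted_rt(deltas, surprisals, i):
    """RT(w_i) = d0*S(w_i) + d-1*S(w_i-1) + d-2*S(w_i-2) + d-3*S(w_i-3).
    deltas = (d0, d_minus1, d_minus2, d_minus3), in ms per bit of surprisal."""
    return sum(d * surprisals[i - k] for k, d in enumerate(deltas))

# Invented surprisals (bits) for "... the barn fell"; the disambiguating
# word sits at index 3, and the earlier words feed in as spillover.
surps = [2.0, 1.5, 3.0, 12.0]
print(predicted_rt((0.53, 1.53, 0.92, 0.84), surps, 3))  # surprisal-driven RT component, ~14.0
```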

SLIDE 23

Deriving the original mapping

Probabilities

  • Kneser-Ney trigram probabilities
  • Estimated from British National Corpus (100M words)

Reading Time Data (self-paced reading; ignoring eye-tracking)

  • Brown corpus
  • 35 participants
  • 5000 words / participant

Generalized Additive Mixed Model

  • mgcv package
  • Factors: text position, word length × log-frequency, participant

Smith & Levy, 2013, Cognition

SLIDE 24

Deriving the new mapping

Probabilities

  • LSTM LM probabilities
  • Estimated from Wikipedia/Soaps (80M words)

Reading Time Data (self-paced reading)

  • 80 simple sentences (fillers)
  • 224 participants
  • 1000 words / participant

Linear Mixed Model

  • lme4 package
  • Factors: text position, word length × log-frequency, participant, entropy, entropy reduction

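The slides describe an lme4 fit in R; a roughly analogous sketch in Python with statsmodels (the data file and column names are hypothetical) would be:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per word per participant (hypothetical file and column names):
# rt, surprisal, text_pos, word_len, log_freq, participant
df = pd.read_csv("spr_fillers.csv")

# Lagged surprisal columns implement equation (3)'s spillover terms
for k in (1, 2, 3):
    df[f"surp_{k}"] = df.groupby("participant")["surprisal"].shift(k)

# Random intercepts by participant; word length x log-frequency interaction
model = smf.mixedlm(
    "rt ~ surprisal + surp_1 + surp_2 + surp_3 + text_pos + word_len * log_freq",
    data=df.dropna(), groups="participant")
print(model.fit().summary())
```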

SLIDE 25

Learned Surp-to-RT mapping

Smith & Levy, 2013:                    δ0 = 0.53    δ−1 = 1.53   δ−2 = 0.92   δ−3 = 0.84
WikiRNN (using Prasad & Linzen, 2019): (δ0 = 0.04)  δ−1 = 1.10   δ−2 = 0.37   δ−3 = 0.39
SoapRNN (using Prasad & Linzen, 2019): (δ0 = −0.04) δ−1 = 0.83   δ−2 = 0.91   δ−3 = 0.44

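One coarse way to compare these mappings (a sketch, not an analysis from the talk): sum each δ vector, giving the total ms contributed by one bit of surprisal once spillover has washed through. The RNN-derived mappings sum to noticeably less than Smith & Levy's, anticipating the underestimation discussed below.

```python
MAPPINGS = {
    "Smith & Levy 2013": (0.53, 1.53, 0.92, 0.84),
    "WikiRNN":           (0.04, 1.10, 0.37, 0.39),
    "SoapRNN":           (-0.04, 0.83, 0.91, 0.44),
}

for name, deltas in MAPPINGS.items():
    # total slowdown (ms) per bit of surprisal, summed over the word
    # itself and the three spillover positions
    print(f"{name}: {sum(deltas):.2f} ms per bit")
```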

SLIDE 26

RNN garden path prediction

[Figure: garden path effect as difference in reading times (ms) for (a) NP/S, (b) NP/Z, (c) MV/RR; series: WikiRNN surprisal, SoapRNN surprisal, humans]

SLIDE 27

Instead of region response, examine word-by-word response

SLIDE 28

Word-by-word garden path prediction

[Figure: word-by-word difference in reading times (ms) by region position (1, 2) for (a) NP/S, (b) NP/Z, (c) MV/RR; series: WikiRNN, SoapRNN, humans]

SLIDE 29

Do RNNs garden path in a reasonable way?

SLIDE 30

Parts-of-speech predictions

SLIDE 31

Conclusion

  • Conversion rates are relatively similar, but all underestimate the human effect
  • Suggests human processing involves mechanisms outside occurrence statistics

(We will come back to this in Part 3)

SLIDE 32

But how well can human responses be explained by text statistics?
We know that RNNs track syntactic and semantic statistics. What about event representations?

SLIDE 33

Part 2: World knowledge probing

Davis & van Schijndel, 2020, Proc. CogSci; Davis & van Schijndel, 2020, Proc. CUNY

SLIDE 34

(1) a. Context: Several horses were being raced.
    b. Target: The horse raced past the barn fell.

Knowledge of the situation mitigates the garden path.

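In model terms, the mitigation can be measured as the drop in the target's total surprisal when the supporting context precedes it. A minimal sketch, assuming a `sentence_surprisal` helper that sums a trained LM's per-word surprisals over a string:

```python
def context_mitigation(sentence_surprisal, context, target):
    """How much does the context reduce the target's total surprisal?
    By the chain rule, subtracting the context's own surprisal leaves
    the surprisal of the target words given the context."""
    without_ctx = sentence_surprisal(target)
    with_ctx = (sentence_surprisal(context + " " + target)
                - sentence_surprisal(context))
    return without_ctx - with_ctx

# context_mitigation(lm_total_surprisal,
#                    "Several horses were being raced.",
#                    "The horse raced past the barn fell.")
```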

SLIDE 35

Context: One knight exists

Spivey-Knowlton et al., 1993, Canadian Journal of Experimental Psychology

SLIDE 37

Context: Two knights exist

Spivey-Knowlton et al., 1993, Canadian Journal of Experimental Psychology

SLIDE 39

(2) a. Context
       (i) 1NP: A knight and his squire were attacking a dragon. With its breath of fire, the dragon killed the knight but not the squire.
       (ii) 2NP: Two knights were attacking a dragon. With its breath of fire, the dragon killed one of the knights but not the other.
    b. Target
       (i) Reduced: The knight killed by the dragon fell to the ground with a thud.
       (ii) Unreduced: The knight who was killed by the dragon fell to the ground with a thud.

Spivey-Knowlton et al., 1993, Canadian Journal of Experimental Psychology

SLIDE 40
  • Models: 5 LSTMs trained with shuffled context and 5 similar models trained with intact context, each with different random seeds on 80M tokens of Wikipedia
  • Test data: Spivey-Knowlton et al. (1993); Trueswell & Tanenhaus (1991)
  • Measure: we sum the surprisal of verb+by (see the sketch below)

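A sketch of the verb+by measure, assuming a `word_surprisal(prefix_words, word)` helper like the one sketched earlier:

```python
def region_surprisal(word_surprisal, prefix, region):
    """Sum per-word surprisals over a multi-word critical region,
    here the disambiguating participle plus 'by'."""
    words = prefix.split()
    total = 0.0
    for w in region.split():
        total += word_surprisal(tuple(words), w)
        words.append(w)
    return total

# e.g. region_surprisal(lm_surprisal, one_np_context + " The knight", "killed by")
```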

SLIDE 41

All models predict garden path effect

SLIDE 42

Reference mitigates garden path

SLIDE 43

In humans, temporal context also mitigates garden paths

Trueswell & Tanenhaus, 1991, Language and Cognitive Processes

SLIDE 44

(3) a. Context
       (i) Past: Several students were sitting together taking an exam in a large lecture hall earlier today. A proctor noticed one of the students cheating.
       (ii) Future: Several students will be sitting together taking an exam in a large lecture hall later today. A proctor will notice one of the students cheating.
    b. Target
       (i) Reduced: The student spotted by the proctor received/will receive a warning.
       (ii) Unreduced: The student who was spotted by the proctor received/will receive a warning.

Trueswell & Tanenhaus, 1991, Language and Cognitive Processes

SLIDE 45

Temporal context mitigates garden path

SLIDE 46

RNNs predict larger temporal mitigation

SLIDE 47

Conclusions

  • Models learn tense information robustly
  • Referential context and definiteness are less robust
  • RNNs learn enough about discourse to mitigate garden paths (only when trained with intact discourse)
  • Event knowledge is encoded in text. Understandable since we talk about the world, but still crazy

SLIDE 48

The problem with garden paths: the human response correlates with the occurrence statistics.
Is there a case where the learned occurrence statistics don't reflect the observed response?

SLIDE 49

Part 3: Production / comprehension mismatch

Davis & van Schijndel, 2020, Proc. ACL

SLIDE 50

RNNs have an observed recency bias.
Idea: Maybe that prevents them from learning known human biases. Recency confounds attachment height.

Ravfogel et al., 2019, Proc. NAACL

SLIDE 51

Relative clause attachment

(4) a. Andrew had dinner yesterday with the nephew of the teachers that was divorced.
    b. Andrew had dinner yesterday with the nephews of the teacher that was divorced.

Fernández, 2003, Bilingual Sentence Processing
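One common way to probe this with an LM (a sketch, not necessarily the paper's exact procedure): compare the model's probability for the agreement form that forces HIGH vs. LOW attachment after the ambiguous prefix.

```python
import math

def attachment_preference(next_word_probs, prefix, high_verb, low_verb):
    """Positive: the model prefers the verb form agreeing with the HIGH
    (first) noun. Negative: it prefers LOW (most recent) attachment."""
    probs = next_word_probs(prefix)
    return math.log(probs[high_verb]) - math.log(probs[low_verb])

# prefix = "Andrew had dinner yesterday with the nephew of the teachers that"
# "was" agrees with "nephew" (HIGH); "were" agrees with "teachers" (LOW):
# attachment_preference(lm_probs, prefix, "was", "were")
```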

SLIDE 52

Humans attach LOW/LOCAL/RECENT

Fernández, 2003, Bilingual Sentence Processing

SLIDE 53
  • Models: 5 Gulordava et al. (2018) LSTMs trained with different random seeds
  • Test data: Fernández (2003); Carreiras & Clifton Jr. (1993); POS templates

SLIDE 54

English models attach LOW/LOCAL/RECENT

SLIDE 55

English attachment is unusual

[Table: Local vs. Non-local attachment preferences across 20 languages: Afrikaans, Arabic, B. Portuguese, Croatian, Danish, Dutch, English, French, German, Greek, Italian, Japanese, Norwegian, Persian, Polish, Romanian, Russian, Spanish, Swedish, Thai]

Brysbaert & Mitchell, 1996/2008, Quarterly Journal of Experimental Psychology Section A

SLIDE 56
  • Models: 5 Gulordava et al. (2018) LSTMs trained with different random seeds on 80M tokens of Spanish Wikipedia
  • Test data: Fernández (2003); Carreiras & Clifton Jr. (1993); POS templates

SLIDE 57

Spanish models attach LOW/LOCAL/RECENT

SLIDE 58

Maybe the recency bias prevents them from learning HIGH attachment?
Experiment: Manipulate the attachment preference in a synthetic training corpus.

SLIDE 59

Synthetic training corpus

(5) a. D N (P D N) (Aux) V (D N) (P D N)
    b. D N Aux V D N 'of' D N 'that' 'was/were' V

(6) a. The nephew near the children was seen by the players next to the lawyer.
    b. The gymnast has met the hostage of the women that was eating.

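A sketch of how sentences matching template (5b) can be generated (the word lists are placeholders, not the paper's lexicon); number agreement on 'was/were' forces HIGH or LOW attachment unambiguously:

```python
import random

SG_NOUNS = ["nephew", "gymnast", "hostage", "teacher"]
PL_NOUNS = ["women", "children", "players", "lawyers"]
PARTICIPLES = ["seen", "met", "eating", "divorced"]

def rc_sentence(high_attach: bool) -> str:
    """Template (5b): D N Aux V D N 'of' D N 'that' 'was/were' V.
    A singular first noun and plural second noun make number agreement
    on the auxiliary disambiguate the attachment site."""
    subj, n1 = random.sample(SG_NOUNS, 2)
    n2 = random.choice(PL_NOUNS)
    aux = "was" if high_attach else "were"
    return (f"the {subj} has met the {n1} of the {n2} "
            f"that {aux} {random.choice(PARTICIPLES)}")

print(rc_sentence(high_attach=True))
# e.g. "the gymnast has met the hostage of the women that was eating"
```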

SLIDE 60
  • Models: 5 2-layer unidirectional LSTMs trained with different random seeds
  • Training data: Synthetic corpus
  • Test data: 300 ambiguous synthetic RCs

SLIDE 61

HIGH attachment is easy to learn!

1. Training: All RCs attach HIGH unambiguously; vary the number of RCs
   Result: 20/120k produces a HIGH bias at test
2. Training: 10% of the data has unambiguous RCs; vary the HIGH proportion
   Result: ≥ 50% HIGH produces a HIGH bias at test

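A sketch of how the two manipulations can be realized, reusing `rc_sentence` and the word lists from the previous sketch (the filler template and the function's interface are my assumptions):

```python
import random

def filler_sentence() -> str:
    # Placeholder RC-free filler loosely matching template (5a)
    return f"the {random.choice(SG_NOUNS)} has met the {random.choice(PL_NOUNS)}"

def build_corpus(n_total: int, rc_fraction: float, high_fraction: float):
    """Make rc_fraction of the sentences unambiguous RCs, of which
    high_fraction attach HIGH (the rest attach LOW); pad with fillers."""
    n_rc = int(n_total * rc_fraction)
    n_high = int(n_rc * high_fraction)
    corpus = [rc_sentence(True) for _ in range(n_high)]
    corpus += [rc_sentence(False) for _ in range(n_rc - n_high)]
    corpus += [filler_sentence() for _ in range(n_total - n_rc)]
    random.shuffle(corpus)
    return corpus

# Experiment 2: 10% unambiguous RCs, varying the HIGH share
# train_lm(build_corpus(120_000, rc_fraction=0.10, high_fraction=0.50))
```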

SLIDE 62

If HIGH is easy to learn, why don’t the Spanish models learn it?

SLIDE 63

What proportion of Spanish data is HIGH?

  • Wikipedia: LOW is 69% more common
  • Newswire (AnCora; UD): LOW is 21% more common

Note that it's still possible they contain a HIGH bias, just not in RCs.

Scheepers, 2003, Cognition
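Counts like these can be approximated from a UD treebank; a crude sketch with the `conllu` package (the attachment heuristic is my approximation, not the paper's extraction procedure):

```python
from collections import Counter
from conllu import parse_incr

def rc_attachment_counts(conllu_path):
    """Crude heuristic over a UD treebank: an 'acl' relative clause whose
    head noun is itself an 'nmod' dependent counts as LOW (attached to
    the more recent noun); one whose head noun has an intervening 'nmod'
    dependent counts as HIGH. Other configurations are ignored."""
    counts = Counter()
    with open(conllu_path, encoding="utf-8") as f:
        for sent in parse_incr(f):
            toks = [t for t in sent if isinstance(t["id"], int)]
            by_id = {t["id"]: t for t in toks}
            for tok in toks:
                if not str(tok["deprel"]).startswith("acl"):
                    continue
                head = by_id.get(tok["head"])
                if head is None:
                    continue
                if head["deprel"] == "nmod":
                    counts["LOW"] += 1
                elif any(t["head"] == head["id"] and t["deprel"] == "nmod"
                         and head["id"] < t["id"] < tok["id"] for t in toks):
                    counts["HIGH"] += 1
    return counts

# rc_attachment_counts("es_ancora-ud-train.conllu")  # e.g. the AnCora treebank
```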

SLIDE 64

Conclusion

  • Supports the idea that production and comprehension have different distributions (e.g., Kehler & Rohde, 2015, 2018)
  • RNNs won't learn human comprehension from text alone
  • Provides an explanation for why increasing training data ceases to help
  • Provides an explanation for why training on cognitive signals improves model accuracy (Klerke et al., 2016; Barrett et al., 2018)

SLIDE 65

Thanks!

Presentations at CUNY 2020, CogSci 2020, and ACL 2020!

CUNY 2020: Recurrent neural networks use discourse context in human-like garden path alleviation
CogSci 2020: Interaction with context during recurrent neural network sentence processing
ACL 2020: Recurrent neural network language models always learn English-like relative clause attachment
