1. Language is not language processing
Marten van Schijndel, May 2020
Department of Linguistics, Cornell University

2. CL/NLP often aim to create models of language comprehension (NLI, parsing, information extraction, etc.)
• Often, language models are trained on large amounts of text
• These models are the starting point for more complex models
• Or they are used for cognitive modeling

3. Two potential problems
• Model biases may not align with human comprehension biases
→ Models may not learn human comprehension during training
• All language data comes from production, not comprehension (though annotations provide comprehension cues)
→ Comprehension signal may not be present in the produced data
In this talk, I explore these two possible problems with our current modeling paradigm.

4. Overview
• Part 0: Background
• Part 1: Magnitude probing
• Part 2: World knowledge probing
• Part 3: Production / comprehension mismatch

5. Part 0: Background
Neural networks have proven especially successful at finding linguistically accurate language processing solutions.

6. NNs are often trained on a word prediction task

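To make the word-prediction task concrete, here is a minimal PyTorch sketch of an LSTM language model trained on next-word prediction. It is an illustration only: the toy vocabulary, model sizes, and training loop are stand-ins, not the actual training setup of the models discussed later in the talk.

```python
import torch
import torch.nn as nn

# Toy corpus; the models in this talk train on ~80M words.
vocab = ["<eos>", "the", "horse", "raced", "past", "barn", "fell"]
stoi = {w: i for i, w in enumerate(vocab)}
corpus = "the horse raced past the barn fell <eos>".split()
ids = torch.tensor([stoi[w] for w in corpus])

class LSTMLM(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h)  # logits over the next word at each position

model = LSTMLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    logits = model(ids[:-1].unsqueeze(0))       # input: w_1 .. w_{n-1}
    loss = loss_fn(logits.squeeze(0), ids[1:])  # target: w_2 .. w_n
    opt.zero_grad(); loss.backward(); opt.step()
```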

8. Why word prediction? We can measure how unexpected a word is with surprisal:
(1) Surprisal(w_i) = −log P(w_i | w_{1..i−1})
Shannon, 1948, Bell Systems Technical Journal; Hale, 2001, Proc. North American Assoc. Comp. Ling.; Levy, 2008, Cognition
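As a worked illustration of equation (1), the sketch below computes surprisal from a made-up conditional probability table; a real language model would supply P(w_i | w_{1..i−1}) from its softmax over the full prefix.

```python
import math

# Invented conditional probabilities for illustration; a trained LM would
# produce P(w_i | w_1..i-1) from its softmax at position i.
p_next = {
    "the horse": {"raced": 0.02, "ran": 0.10, "was": 0.30},
}

def surprisal(prefix, word):
    """Surprisal(w_i) = -log2 P(w_i | prefix), in bits."""
    return -math.log2(p_next[prefix][word])

print(surprisal("the horse", "raced"))  # rare continuation -> high surprisal
print(surprisal("the horse", "was"))    # common continuation -> low surprisal
```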

9. Why word prediction? Surprisal indicates what the model finds unexpected/unnatural, which can then be mapped onto human behavioral and neural measurements (So. Many. People.):
• acceptability/grammaticality
• reading/reaction times
• neural activation

10. This is kind of crazy! We know frequency/predictability affect human language processing. However, many plausible explanations of human responses involve experience beyond language statistics. E.g., can language models learn intention from text alone? There may be some weak signal, but ...

11. Part 1: Magnitude probing
van Schijndel & Linzen, 2018, Proc. CogSci; van Schijndel & Linzen, in prep

12. Humans experience a visceral response upon encountering garden path constructions. NNs model average statistics and therefore average frequency responses. Garden path responses exist in the tail.

13. They exist in the tail because:
1. the statistics are in the tail (predictability), OR
2. the response is unusual (reanalysis)

14. The horse raced past the barn fell.
Bever, 1970, Cognition and the Development of Language

15. The horse that was raced past the barn fell.
Bever, 1970, Cognition and the Development of Language

16. [Figure: two parse trees for "The horse raced past the barn fell": the correct parse, with "raced past the barn" as a reduced relative clause modifying "horse" and "fell" as the main verb, vs. the initially preferred parse with "raced" as the main verb.]
Bever, 1970, Cognition and the Development of Language

17. While human responses are framed in terms of explicit syntactic frequencies, RNNs can predict garden path responses without explicit syntactic training.
van Schijndel & Linzen, 2018, Proc. CogSci; Futrell et al., 2019, Proc. NAACL; Frank & Hoeks, 2019, Proc. CogSci

18. Do RNNs process garden paths similarly to humans? Look beyond garden path existence to garden path magnitude.

19. Models
• WikiRNN: Gulordava et al. (2018) LSTM. Data: Wikipedia (80M words)
• SoapRNN: 2-layer LSTM (same training parameters as above). Data: Corpus of American Soap Operas (80M words; Davies, 2011)

20. Three garden paths (the parenthesized material disambiguates):
NP/S: The woman saw (that) the doctor wore a hat.
NP/Z: When the woman visited(,) her nephew laughed loudly.
MV/RR: The horse (which was) raced past the barn fell.

21. Surprisal-to-ms conversion
(2) RT(w_t) = α · S(w_t)
Smith & Levy, 2013, Cognition

22. Probability-to-ms conversion
(3) RT(w_i) = δ_0 S(w_i) + δ_{-1} S(w_{i−1}) + δ_{-2} S(w_{i−2}) + δ_{-3} S(w_{i−3})
Smith & Levy, 2013, Cognition
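Read concretely, equation (3) says each word's reading time is a weighted sum of its own surprisal and the surprisals of the previous three words (spillover). A small Python sketch with placeholder coefficients, not the fitted ones reported later:

```python
def predict_rt(surprisals, deltas=(0.5, 1.0, 0.9, 0.8)):
    """Map per-word surprisals to predicted RTs with 3 words of spillover.

    Implements RT(w_i) = sum_k delta_{-k} * S(w_{i-k}) for k = 0..3.
    The delta values here are placeholders, not fitted coefficients.
    """
    return [
        sum(d * surprisals[i - k] for k, d in enumerate(deltas) if i - k >= 0)
        for i in range(len(surprisals))
    ]

print(predict_rt([3.2, 7.1, 2.4, 9.8]))
```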

23. Deriving the original mapping (Smith & Levy, 2013, Cognition)
Probabilities:
• Kneser-Ney trigram probabilities
• Estimated from British National Corpus (100M words)
Reading Time Data (SPR; ignoring ET):
• Brown corpus
• 35 participants
• 5000 words / participant
Generalized Additive Mixed Model:
• mgcv package
• Factors: text position, word length × log-frequency, participant

24. Deriving the new mapping
Probabilities:
• LSTM LM probabilities
• Estimated from Wikipedia/Soaps (80M words)
Reading Time Data (SPR):
• 80 simple sentences (fillers)
• 224 participants
• 1000 words / participant
Linear Mixed Model (with entropy, entropy reduction):
• lme4 package
• Factors: text position, word length × log-frequency, participant
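The shape of this fit can be shown with a plain least-squares sketch: regress simulated reading times on current and spillover surprisal and recover the deltas. This simplifies away the random effects and extra factors (the actual analyses used mgcv / lme4 mixed models), and the data here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
s = rng.gamma(shape=2.0, scale=3.0, size=n)   # simulated per-word surprisals
rt = 200 + 0.5 * s                            # simulated RTs with known deltas
for lag, d in [(1, 1.0), (2, 0.9), (3, 0.8)]:
    rt[lag:] += d * s[:-lag]                  # spillover from earlier words
rt += rng.normal(scale=5.0, size=n)           # noise

# Design matrix: intercept, S(w_i), S(w_i-1), S(w_i-2), S(w_i-3)
X = np.column_stack([np.ones(n - 3), s[3:], s[2:-1], s[1:-2], s[:-3]])
coef, *_ = np.linalg.lstsq(X, rt[3:], rcond=None)
print(coef[1:])  # recovered delta_0 .. delta_-3 (approx. 0.5, 1.0, 0.9, 0.8)
```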

25. Learned Surp-to-RT mapping
Smith & Levy, 2013:                    δ_0 = 0.53     δ_{-1} = 1.53  δ_{-2} = 0.92  δ_{-3} = 0.84
WikiRNN using Prasad & Linzen, 2019:   (δ_0 = 0.04)   δ_{-1} = 1.10  δ_{-2} = 0.37  δ_{-3} = 0.39
SoapRNN using Prasad & Linzen, 2019:   (δ_0 = −0.04)  δ_{-1} = 0.83  δ_{-2} = 0.91  δ_{-3} = 0.44

26. RNN garden path prediction
[Figure: difference in reading times (ms) from WikiRNN surprisal, SoapRNN surprisal, and humans, in panels (a) NP/S, (b) NP/Z, (c) MV/RR.]

27. Instead of region response, examine word-by-word response

28. Word-by-word garden path prediction
[Figure: difference in reading times (ms) by region position (0, 1, 2) for WikiRNN, SoapRNN, and humans, in panels (a) NP/S, (b) NP/Z, (c) MV/RR.]

29. Do RNNs garden path in a reasonable way?

30. Parts-of-speech predictions

31. Conclusion
• Conversion rates are relatively similar, but all underestimate the human effect
• Suggests human processing involves mechanisms outside occurrence statistics
(We will come back to this in Part 3)

32. But how well can human responses be explained by text statistics? We know that RNNs track syntactic and semantic statistics. What about event representations?

33. Part 2: World knowledge probing
Davis & van Schijndel, 2020, Proc. CogSci; Davis & van Schijndel, 2020, Proc. CUNY

34. (1) a. Context - Several horses were being raced.
b. Target - The horse raced past the barn fell.
Knowledge of the situation mitigates the garden path.

35. Context: One knight exists
Spivey-Knowlton et al., 1993, Canadian Journal of Experimental Psychology


37. Context: Two knights exist
Spivey-Knowlton et al., 1993, Canadian Journal of Experimental Psychology


39. (2) a. Context
(i) 1NP - A knight and his squire were attacking a dragon. With its breath of fire, the dragon killed the knight but not the squire.
(ii) 2NP - Two knights were attacking a dragon. With its breath of fire, the dragon killed one of the knights but not the other.
b. Target
(i) Reduced - The knight killed by the dragon fell to the ground with a thud.
(ii) Unreduced - The knight who was killed by the dragon fell to the ground with a thud.
Spivey-Knowlton et al., 1993, Canadian Journal of Experimental Psychology

40. Setup
• Models: 5 LSTMs with shuffled context, and 5 similar models but with intact context, trained with different random seeds on 80M words of Wikipedia
• Test data: Spivey-Knowlton et al. (1993); Trueswell & Tanenhaus (1991)
• We sum the surprisal of verb+by (the disambiguating region); see the sketch below
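To unpack "sum the surprisal of verb+by": the garden path measure here is the total surprisal over the disambiguating region. A hedged sketch with invented numbers; a real run would take per-token surprisals from the LSTMs above.

```python
def region_surprisal(surprisals, start, end):
    """Sum per-token surprisal over positions start..end-1 (the verb+by region)."""
    return sum(surprisals[start:end])

tokens = "the knight killed by the dragon fell".split()
surps = [1.2, 6.5, 8.9, 4.1, 1.0, 3.3, 7.7]  # invented per-token surprisals
v = tokens.index("killed")
print(region_surprisal(surps, v, v + 2))  # surprisal of "killed by": 8.9 + 4.1
```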

41. All models predict garden path effect

42. Reference mitigates garden path

43. In humans, temporal context also mitigates garden paths
Trueswell & Tanenhaus, 1991, Language and Cognitive Processes
