A Cognitively Plausible Adaptive Neural Language Model



slide-1
SLIDE 1

A Cognitively Plausible Adaptive Neural Language Model

Marten van Schijndel and Tal Linzen May 12, 2018

Department of Cognitive Science, Johns Hopkins University

van Schijndel and Linzen May 12, 2018 1 / 21

slide-2
SLIDE 2

Humans adapt to linguistic context

Subjects learn to expect vocabulary items and syntactic structures (Otten & Van Berkum, 2008; Fine et al., 2013)

RRC (reduced relative clause): The soldiers warned about the dangers conducted the raid

P(RRC) = 0.008 (typical) → 0.50 (Fine et al.)

By the end of the experiment, subjects expected RRCs more than at the beginning

slide-6
SLIDE 6

Adaptation studied in NLP

  • Domain adaptation (Kuhn & de Mori, 1990; McClosky, 2010)
    News Model → Biomedical Text
  • Handling unknown words (Grave et al., 2015)
    Learn new words from context
  • Style adaptation (Jaech & Ostendorf, 2017)
    Lawyer A → Lawyer B

But can we model human adaptation?


slide-8
SLIDE 8

Our proposed model

LSTM language model (gives the probability of the next word in a sequence)

Base model: trained on Wikipedia (90M words) (Gulordava et al., 2018)

Adaptation algorithm:

  1. Test on a sentence
  2. Update weights based on that sentence
  3. Repeat on remaining sentences
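The test-then-update loop can be sketched in a few lines. This is a toy illustration only: a small bigram softmax model stands in for the LSTM, and the class name `ToyBigramLM`, the function `adapt`, the vocabulary, and the learning rate of 0.5 are all assumptions made for the example, not details from the talk.

```python
import math


class ToyBigramLM:
    """Toy stand-in for the LSTM: one row of logits per previous word."""

    def __init__(self, vocab):
        self.vocab = list(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        n = len(self.vocab)
        self.logits = [[0.0] * n for _ in range(n)]

    def probs(self, prev):
        """Softmax over the next word, given the previous word."""
        row = self.logits[self.index[prev]]
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        return [e / total for e in exps]

    def surprisals(self, sentence):
        """Surprisal (in nats) of each word given the word before it."""
        return [
            -math.log(self.probs(prev)[self.index[word]])
            for prev, word in zip(sentence, sentence[1:])
        ]

    def update(self, sentence, lr):
        """One gradient step on the sentence's summed cross-entropy loss."""
        # Collect all gradients first so the step uses pre-update weights.
        grads = [
            (self.index[prev], self.index[word], self.probs(prev))
            for prev, word in zip(sentence, sentence[1:])
        ]
        for i, j, p in grads:
            for k in range(len(p)):
                target = 1.0 if k == j else 0.0
                self.logits[i][k] -= lr * (p[k] - target)


def adapt(model, sentences, lr=0.5):
    """Test on each sentence, then update on it, then move to the next."""
    results = []
    for sentence in sentences:
        results.append(model.surprisals(sentence))  # 1. test
        model.update(sentence, lr)                  # 2. update weights
    return results                                  # 3. repeated for all sentences


# Hypothetical vocabulary and input: the same sentence seen three times.
vocab = ["<s>", "the", "boy", "dog", "threw", "ball", "a"]
sents = [["<s>", "the", "boy", "threw", "the", "dog", "a", "ball"]] * 3
model = ToyBigramLM(vocab)
per_sentence = adapt(model, sents)
totals = [sum(s) for s in per_sentence]
```

Because each sentence is scored before the model updates on it, the reported surprisals never benefit from having seen that sentence; in this toy run the total surprisal drops on each repetition.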


slide-10
SLIDE 10

Experiment 1: Does adaptation improve prediction accuracy?


slide-11
SLIDE 11

Accuracy Evaluation Measure: Perplexity

Perplexity: How much probability mass is assigned to wrong words? How surprised is the model by the data? (Lower is better)
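Perplexity is conventionally computed as the exponential of the mean negative log-probability that the model assigns to the words that actually occurred. A minimal sketch, assuming the per-word probabilities are already available; the function name and the example probabilities are made up for illustration.

```python
import math


def perplexity(word_probs):
    """exp of the mean negative log-probability of the observed words."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    mean_nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(mean_nll)


# A model guessing uniformly among 4 words has perplexity of about 4.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that puts most of its mass on the observed words scores lower.
confident = perplexity([0.9, 0.8, 0.95])
```

A perplexity of k can be read as the model being, on average, as uncertain as if it were choosing uniformly among k words.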


slide-12
SLIDE 12

Accuracy Evaluation Data

Test data: Natural Stories Corpus (Futrell et al., 2017)

  • 10 texts (485 sentences)
  • 7 Fairy Tales
  • 3 Documentaries


slide-13
SLIDE 13

Accuracy Results

[Figure: bar chart of perplexity (y-axis, 20 to 180). Groups: Natural Stories (Full Corpus); Fairy Tales and Documentaries (Separate Story Types). Bars: Wikipedia vs. Wikipedia+Adaptation.]

slide-14
SLIDE 14

Experiment 2: Are adaptive expectations human-like?


slide-15
SLIDE 15

Psycholinguistic Evaluation Measure: Surprisal

Reading times can be predicted with surprisal (Smith and Levy, 2013)

$\mathrm{Surprisal}(w_i) = -\log P(w_i \mid w_1, \ldots, w_{i-1})$

[Figure: per-word surprisal (y-axis, 2 to 14) for "The little girl bitten by the dog ..."]
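The surprisal formula maps each word's conditional probability to a cost that grows as the word becomes less expected. A small sketch, assuming base-2 logarithms (bits), which the slide does not specify; the conditional probabilities below are invented for illustration and are not model outputs.

```python
import math


def surprisal(prob, base=2.0):
    """Surprisal of a word with conditional probability `prob` (bits by default)."""
    return -math.log(prob) / math.log(base)


# Hypothetical values of P(w_i | w_1 .. w_{i-1}), made up for this example.
sentence = ["The", "little", "girl", "bitten", "by", "the", "dog"]
cond_probs = [0.10, 0.02, 0.05, 0.0005, 0.30, 0.40, 0.05]

per_word = [(w, surprisal(p)) for w, p in zip(sentence, cond_probs)]

# The least expected word carries the highest surprisal.
hardest = max(per_word, key=lambda pair: pair[1])[0]
```

With these invented numbers the peak falls on "bitten", the word that signals the reduced relative clause.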


slide-17
SLIDE 17

Psycholinguistic Evaluation Data: Reading Times

Test data: Natural Stories Corpus (Futrell et al., 2017)

Also contains self-paced reading times! (N = 181)

Self-paced reading display (one word visible at a time, the rest masked):

–––––––––––––––––––––––––––––
The –––––––––––––––––––––––––
––– boy –––––––––––––––––––––
––––––– threw –––––––––––––––
––––––––––––– the –––––––––––
––––––––––––––––– dog –––––––
––––––––––––––––––––– a –––––
––––––––––––––––––––––– ball.

slide-25
SLIDE 25

Psycholinguistic Evaluation

Non-adaptive surprisal is a good predictor of reading times

                            β̂        σ̂      t-value
Sentence position         0.3592   0.5284    0.680
Word length               6.3828   1.0034    6.361 ***
Non-adaptive surprisal    8.4480   0.6294   13.422 ***

Fixed effects of linear mixed regression

slide-26
SLIDE 26

Psycholinguistic Evaluation

Adaptive surprisal is a better predictor of reading times

                            β̂        σ̂      t-value
Sentence position         0.2903   0.5310    0.547
Word length               6.4266   1.0035    6.404 ***
Non-adaptive surprisal   -0.8873   0.6754   -1.314
Adaptive surprisal        8.7714   0.6764   12.968 ***

Fixed effects of linear mixed regression

slide-27
SLIDE 27

Experiment 3: Does the model adapt to vocabulary, syntax, or both?


slide-28
SLIDE 28

Generated 200 dative sentence pairs

Prepositional Object (PO): The boy threw the ball to the dog.

Double Object (DO): The boy threw the dog the ball.
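One plausible way to produce such pairs is to fill a PO template and a DO template with the same words. The sketch below is an assumption about how this could be done, not the authors' generation procedure; the function name `dative_pair` and the word lists are hypothetical.

```python
from itertools import product


def dative_pair(agent, verb, theme, recipient):
    """Return the (PO, DO) variants of one dative event."""
    po = f"The {agent} {verb} the {theme} to the {recipient}."
    do = f"The {agent} {verb} the {recipient} the {theme}."
    return po, do


# Hypothetical word lists; a real stimulus set would need plausibility checks.
agents = ["boy", "captain"]
verbs = ["threw", "mailed"]
themes = ["ball", "letter"]
recipients = ["dog", "student"]

# Every combination of the four slots yields one PO/DO pair.
pairs = [
    dative_pair(a, v, t, r)
    for a, v, t, r in product(agents, verbs, themes, recipients)
]
```

Each pair shares its vocabulary and differs only in syntax, which is what lets the two kinds of adaptation be separated. A full cross product also yields implausible items (for example, mailing a ball to a dog), so a real stimulus set would likely be filtered or hand-built.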


slide-29
SLIDE 29

Dative evaluation paradigm


slide-30
SLIDE 30

Model adapts to vocabulary and syntax

[Figure: bar chart of perplexity (y-axis, 100 to 600) for a model adapted to DO sentences (The boy threw the dog a ball), tested on PO (The boy threw a ball to the dog) and DO (The captain mailed the student a letter). Bars: Wikipedia vs. Wikipedia+Adaptation.]

slide-31
SLIDE 31

Our adaptive language model makes

  • More accurate predictions
  • More human-like predictions

than a non-adaptive language model.

  • Adaptation driven by both vocabulary and syntax


slide-32
SLIDE 32

Future directions:

  • How sensitive are RT results to learning rate?
  • Reproduce psycholinguistic adaptation results
  • Compare adaptation mechanisms using human behavioral data


slide-33
SLIDE 33

Thanks!


slide-34
SLIDE 34

Model adapts to vocabulary and syntax

[Figure: bar chart of perplexity (y-axis, 100 to 600). Conditions: PO (Vocab), DO (Syntax), DO (Vocab), PO (Syntax). Panels: PO Adapted, DO Adapted. Bars: Base vs. Adapted.]