A Neural Model of Adaptation in Reading (Marten van Schijndel and Tal Linzen)

SLIDE 1

A Neural Model of Adaptation in Reading

Marten van Schijndel and Tal Linzen November 4, 2018

Department of Cognitive Science, Johns Hopkins University

van Schijndel and Linzen November 4, 2018 1 / 24

SLIDE 2

Thought experiment

…cassowary …
…cassowary? …
…cassowary …
…cassowary! …

You are now less surprised when this person says ‘Cassowary’.

SLIDE 8

A psycholinguistic example of adaptation (Fine & Jaeger, 2016)

The soldiers warned about the dangers conducted the raid.
Unreduced: The soldiers (who were) warned about the dangers …

By the end of the experiment, subjects expected reduced relative clauses (RRCs) more than at the beginning.

  • Humans adapt to syntactic structures

SLIDE 20

Adaptation is studied in NLP

  • Domain adaptation (Kuhn & de Mori, 1990; McClosky, 2010): News Model → Biomedical Text
  • Handling unknown words (Grave et al., 2015): Learn new words from context
  • Style adaptation (Jaech & Ostendorf, 2017): Lawyer A → Lawyer B

But can we model human adaptation?

SLIDE 22

Our proposed model

LSTM language model (gives the probability of the next word in a sequence)

Base model: trained on Wikipedia (90M words) (Gulordava et al., 2018)

Adaptation algorithm:

1. Test on a sentence
2. Update weights based on that sentence
3. Repeat on remaining sentences
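
The test-then-update loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides use an LSTM, whereas the sketch swaps in a tiny softmax bigram model so that it runs with the standard library alone. The names `ToyBigramLM`, `adapt`, and the learning rate `lr` are hypothetical and do not come from the talk.

```python
import math


class ToyBigramLM:
    """Tiny softmax bigram model: a stand-in for the LSTM, not the authors' model."""

    def __init__(self, vocab):
        # "<s>" is an assumed sentence-start token
        self.vocab = sorted(set(vocab) | {"<s>"})
        # All-zero logits give a uniform next-word distribution
        self.logits = {p: {w: 0.0 for w in self.vocab} for p in self.vocab}

    def probs(self, prev):
        row = self.logits[prev]
        m = max(row.values())  # subtract the max for numerical stability
        exps = {w: math.exp(v - m) for w, v in row.items()}
        z = sum(exps.values())
        return {w: e / z for w, e in exps.items()}

    def surprisal(self, sentence):
        """Total surprisal (bits) of a tokenized sentence."""
        total, prev = 0.0, "<s>"
        for w in sentence:
            total += -math.log2(self.probs(prev)[w])
            prev = w
        return total

    def update(self, sentence, lr):
        """One SGD step per word on the cross-entropy loss."""
        prev = "<s>"
        for w in sentence:
            p = self.probs(prev)
            for v in self.vocab:
                grad = p[v] - (1.0 if v == w else 0.0)
                self.logits[prev][v] -= lr * grad
            prev = w


def adapt(model, sentences, lr=1.0):
    """Score each sentence before the model has been updated on it."""
    scores = []
    for s in sentences:                    # step 3: repeat on remaining sentences
        scores.append(model.surprisal(s))  # step 1: test on the sentence
        model.update(s, lr)                # step 2: update weights on that sentence
    return scores


sentences = [["the", "cassowary", "ran"]] * 4
model = ToyBigramLM({w for s in sentences for w in s})
scores = adapt(model, sentences)  # surprisal shrinks with each repetition
```

Because each sentence is scored before the update that uses it, the scores never benefit from seeing the test sentence in advance, which is what makes the procedure usable as an evaluation.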

SLIDE 27

Experiment 1 (standard): Does adaptation improve model accuracy?

SLIDE 28

Accuracy Evaluation Data

Test data: Natural Stories Corpus (Futrell et al., 2017)

  • 10 texts (485 sentences)
      • 7 Fairy Tales
      • 3 Documentaries

SLIDE 29

Accuracy Results

Figure: Perplexity of the Wikipedia and Wikipedia+Adaptation models on Natural Stories (full corpus) and on Fairy Tales and Documentaries (separate story types).

SLIDE 30

Experiment 2: Evaluate model against human adaptation

SLIDE 31

Evaluation Measure: Surprisal

Reading times can be predicted with surprisal (Smith and Levy, 2013):

$$\mathrm{Surprisal}(w_i) = -\log P(w_i \mid w_1, \ldots, w_{i-1})$$
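
As a small worked example of the surprisal definition, the sketch below computes per-word surprisal and the corresponding perplexity. The base-2 logarithm is an assumption, chosen because the later figure axes report surprisal in bits. The probabilities are made-up values, and `surprisal_bits` and `perplexity` are hypothetical helper names.

```python
import math


def surprisal_bits(prob):
    """Surprisal of one word, in bits, given its conditional probability."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def perplexity(probs):
    """Perplexity as 2 raised to the mean per-word surprisal in bits."""
    bits = [surprisal_bits(p) for p in probs]
    return 2.0 ** (sum(bits) / len(bits))


# Hypothetical values of P(w_i | w_1..i-1) for a four-word sentence
word_probs = [0.5, 0.25, 0.125, 0.125]
per_word = [surprisal_bits(p) for p in word_probs]  # 1, 2, 3 and 3 bits
ppl = perplexity(word_probs)                        # 2 ** 2.25
```

Less probable words get higher surprisal, so a model that adapts and assigns more probability to the words it sees will show both lower surprisal and lower perplexity.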

SLIDE 32

Evaluation Data: Reading Times

  • Timeline of adaptation is similar to human adaptation
  • Adaptive surprisal predicts reading times better than non-adaptive surprisal

SLIDE 34

Evaluation: Reading Times

The soldiers (who were) warned about the dangers conducted the raid.

Figure 1: Human reading times. x-axis: item order (#RCs seen); y-axis: length- and order-corrected log RTs (ms).

Figure 2: Adaptive model surprisal. x-axis: item order (#RCs seen); y-axis: order-corrected surprisal (bits).

SLIDE 35

Evaluation: Reading Times

Reading time predictions with adaptation

Figure: Effect of each predictor (sentence position, word length, surprisal, adaptive surprisal) on reading time, in ms per predictor unit.

SLIDE 36

Experiment 3: How sensitive is adaptation to different signals? Vocabulary? Syntax?

SLIDE 37

Generated 200 dative sentence pairs

Prepositional Object (PO): The boy threw the ball to the dog.
Double Object (DO): The boy threw the dog the ball.
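
One way such PO/DO pairs could be generated is from a shared template. The slides do not say how the 200 pairs were built, so this is a minimal sketch under assumptions: the word lists and the helper `dative_pair` are hypothetical, and not every combination they produce is plausible English.

```python
from itertools import product


def dative_pair(agent, verb, theme, recipient):
    """Return the (PO, DO) versions of the same dative event."""
    po = f"The {agent} {verb} the {theme} to the {recipient}."
    do = f"The {agent} {verb} the {recipient} the {theme}."
    return po, do


# Hypothetical word lists, for illustration only
agents = ["boy", "captain"]
verbs = ["threw", "mailed"]
themes = ["ball", "letter"]
recipients = ["dog", "student"]

# Every combination of the four slots yields one PO/DO pair
pairs = [
    dative_pair(a, v, t, r)
    for a, v, t, r in product(agents, verbs, themes, recipients)
]
```

Keeping the two members of a pair lexically identical means any difference in model behavior between them can be attributed to word order rather than vocabulary.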

SLIDE 38

Dative evaluation paradigm

SLIDE 39

Model adapts to vocabulary and syntax

Figure: Perplexity of the Wikipedia and Wikipedia+Adaptation models on three test sets: DO Adapted (The boy threw the dog a ball), PO (The boy threw a ball to the dog), and DO (The captain mailed the student a letter).

SLIDE 44

Conclusion

  • Proposed a simple adaptation mechanism which
      • Is more accurate than a non-adaptive model
      • Makes more human-like predictions than a non-adaptive model
      • Is not dependent on learning rate (see paper)
      • Does not seem to suffer from catastrophic forgetting (see paper)
  • Proposed new ways of evaluating adaptation:
      • Human adaptive behavior
      • Psycholinguistic experiments to probe signal sensitivity: adaptation is sensitive to both vocabulary and syntax

SLIDE 45

Thanks!

SLIDE 46

Catastrophic Forgetting Test

MultiNLI has 10 domains (Williams et al., 2018). Split each domain into training and testing sets (1000 sentences each).

1. Adapt to a training domain
2. Adapt to a second training domain

Does the model forget the first adaptive training domain?
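
The protocol can be sketched as three perplexity measurements on the first domain's held-out set. This is a rough illustration only: `ToyUnigramLM` is a hypothetical softmax unigram model standing in for the adaptive LSTM, the two "domains" are made-up sentences, and the numbers it produces say nothing about how the real model behaves.

```python
import math


class ToyUnigramLM:
    """Softmax unigram model: a hypothetical stand-in for the adaptive LSTM."""

    def __init__(self, vocab):
        # All-zero logits give a uniform distribution over the vocabulary
        self.logits = {w: 0.0 for w in sorted(vocab)}

    def probs(self):
        m = max(self.logits.values())
        exps = {w: math.exp(v - m) for w, v in self.logits.items()}
        z = sum(exps.values())
        return {w: e / z for w, e in exps.items()}

    def perplexity(self, sentences):
        p = self.probs()
        bits = [-math.log2(p[w]) for s in sentences for w in s]
        return 2.0 ** (sum(bits) / len(bits))

    def adapt(self, sentences, lr=0.5):
        # One SGD step on the cross-entropy loss per word
        for s in sentences:
            for w in s:
                p = self.probs()
                for v in self.logits:
                    self.logits[v] -= lr * (p[v] - (1.0 if v == w else 0.0))


def forgetting_test(model, g1_train, g1_test, g2_train):
    """Perplexity on G1's held-out set at the three stages of the test."""
    initial = model.perplexity(g1_test)   # (a) before adaptation
    model.adapt(g1_train)
    adapted = model.perplexity(g1_test)   # (b) after adapting to G1
    model.adapt(g2_train)
    post = model.perplexity(g1_test)      # (c) after adapting to G1, then G2
    return initial, adapted, post


# Made-up two-sentence "domains"
g1_train = [["court", "ruled"], ["court", "ruled"]]
g1_test = [["court", "ruled"]]
g2_train = [["phone", "rang"], ["phone", "rang"]]
vocab = {w for s in g1_train + g1_test + g2_train for w in s}
initial, adapted, post = forgetting_test(
    ToyUnigramLM(vocab), g1_train, g1_test, g2_train
)
```

Catastrophic forgetting would show up as the third value climbing back toward the first; a model that retains the first domain keeps the third value close to the second.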

SLIDE 49

Catastrophic Forgetting Test

Figure 3: Perplexity on the held-out set of G1 (a) before adaptation (initial), (b) after adaptation to G1 (adaptive), (c) after adapting to G1 then adapting to G2 (post-adaptation).

SLIDE 50

Model adapts to vocabulary and syntax

Figure 4: Learning rate influence over syntactic and lexical adaptation. x-axis: learning rate (initial, 0.002, 0.02, 0.2, 2, 20, 200); y-axis: perplexity; series: DO Adapted, Shared Vocab (PO Test), Shared Syntax (DO Test).

SLIDE 51

Psycholinguistic Evaluation

Fixed effects of linear mixed regression

Without adaptive surprisal:

| Predictor | β̂ | σ̂ | t |
|---|---|---|---|
| Sentence position | 0.55 | 0.53 | 1.03 |
| Word length | 7.29 | 1.00 | 7.26 |
| Non-adaptive surprisal | 6.64 | 0.68 | 9.79 |

With adaptive surprisal:

| Predictor | β̂ | σ̂ | t |
|---|---|---|---|
| Sentence position | 0.29 | 0.53 | 0.55 |
| Word length | 6.42 | 1.00 | 6.40 |
| Non-adaptive surprisal | −0.89 | 0.68 | −1.31 |
| Adaptive surprisal | 8.45 | 0.63 | 13.42 |

SLIDE 53

Qualitative Adaptation Timeline

The soldiers warned about the dangers conducted the raid.

Figure 5: Human reading times for ambiguous and unambiguous sentences. x-axis: item order (#RCs seen); y-axis: length- and order-corrected log RTs (ms).

Figure 6: Adaptive model surprisal for ambiguous and unambiguous sentences. x-axis: item order (#RCs seen); y-axis: order-corrected surprisal (bits).
