SLIDE 1

Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets

Nelson F. Liu, Roy Schwartz, Noah A. Smith
UWNLP
NAACL 2019, June 4, 2019

SLIDE 2

Two Key Ingredients of NLP Systems

[Diagram: the Training Dataset and the Model Architecture combine into an NLP System]

SLIDE 3

Why Might NLP Systems Fail?

[Diagram: Training Dataset, Model Architecture, NLP System]

SLIDE 4

Dataset Weaknesses

[Diagram: Training Dataset, Model Architecture, NLP System]

SLIDE 5

Model Weaknesses

[Diagram: Training Dataset, Model Architecture, NLP System]

SLIDE 6

Challenge Datasets Break Models

SLIDE 7

Challenge Datasets Break Models

SLIDE 8

Challenge Datasets Break Models

SLIDE 9

NLP Systems Are Brittle

SLIDE 10

NLP Systems Are Brittle

SLIDE 11

Inoculation by Fine-Tuning

SLIDE 12

Inoculation by Fine-Tuning

SLIDE 13

Inoculation by Fine-Tuning

SLIDE 14

Inoculation

SLIDE 15

Inoculate Models to Better Understand Why They Fail
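
The slides name the method without spelling out the procedure. A minimal sketch, assuming the usual setup: fine-tune the original model on a small sample of challenge examples, then re-evaluate it on both the original test set and a held-out portion of the challenge dataset, repeating for several sample sizes. `inoculation_curve`, `fine_tune`, `evaluate`, and the sample sizes are hypothetical; `fine_tune` must return a new model so that every sample size starts from the same original model. The toy model (a lookup from an example's first character to a label) exists only to make the sketch runnable.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

M = TypeVar("M")  # model type
E = TypeVar("E")  # example type


def inoculation_curve(
    model: M,
    challenge_train: Sequence[E],
    original_test: Sequence[E],
    challenge_test: Sequence[E],
    fine_tune: Callable[[M, Sequence[E]], M],
    evaluate: Callable[[M, Sequence[E]], float],
    sample_sizes: Sequence[int],
    seed: int = 0,
) -> List[Tuple[int, float, float]]:
    """For each k, fine-tune on k challenge examples and score both test sets.

    Returns (k, original-test score, challenge-test score) triples.
    """
    rng = random.Random(seed)
    curve = []
    for k in sample_sizes:
        sample = rng.sample(list(challenge_train), k)
        inoculated = fine_tune(model, sample)  # starts from the original model each time
        curve.append(
            (k, evaluate(inoculated, original_test), evaluate(inoculated, challenge_test))
        )
    return curve


# --- Hypothetical toy stand-ins, only to make the sketch executable ---
ToyModel = Dict[str, str]     # first character of the text -> predicted label
ToyExample = Tuple[str, str]  # (text, gold label)


def toy_fine_tune(model: ToyModel, sample: Sequence[ToyExample]) -> ToyModel:
    updated = dict(model)  # copy, so the original model is never modified
    for text, label in sample:
        updated[text[0]] = label
    return updated


def toy_evaluate(model: ToyModel, examples: Sequence[ToyExample]) -> float:
    correct = sum(model.get(text[0], "neutral") == label for text, label in examples)
    return correct / len(examples)


base_model: ToyModel = {"o": "entailment"}
original_test = [("o1", "entailment"), ("o2", "entailment")]
challenge_train = [(f"c{i}", "contradiction") for i in range(10)]
challenge_test = [("c-held-out", "contradiction")]

curve = inoculation_curve(base_model, challenge_train, original_test, challenge_test,
                          toy_fine_tune, toy_evaluate, sample_sizes=[0, 2])
print(curve)  # [(0, 1.0, 0.0), (2, 1.0, 1.0)]
```

In the toy run the challenge score climbs from 0.0 to 1.0 after inoculation while the original score stays at 1.0; under the outcomes on the following slides, that pattern would point to a dataset weakness.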

SLIDE 16

Three Clear Outcomes of Interest

[Figure: Challenge Evaluation, Inoculation, Outcome: ?]

SLIDE 17

(1) Dataset Weakness

[Figure: Challenge Evaluation, Inoculation, Outcome: Dataset Weakness]

SLIDE 18

(2) Model Weakness

[Figure: Challenge Evaluation, Inoculation, Outcome: Model Weakness]

SLIDE 19

(3) Predictive Artifacts / Other

[Figure: Challenge Evaluation, Inoculation, Outcome: Predictive Artifacts / Other]

SLIDE 20

Three Clear Outcomes of Interest

[Figure: Challenge Evaluation, Inoculation, Outcomes: Dataset Weakness, Model Weakness, Predictive Artifacts / Other]
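
The three outcomes can be read as a decision rule over scores measured before and after inoculation. A rough sketch, under assumptions the slides do not fix: `Scores`, `diagnose`, and the tolerance `tol` for "no meaningful change" are hypothetical, partial recovery is lumped into "Other", and the demo scores are made up for illustration, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    original: float   # score on the original test set
    challenge: float  # score on the held-out challenge test set


def diagnose(before: Scores, after: Scores, tol: float = 0.02) -> str:
    """Map before/after-inoculation scores to one of the three outcomes.

    `tol` is a hypothetical threshold; the slides do not specify one.
    """
    if before.original - after.original > tol:
        # Inoculation hurt performance on the original test set.
        return "Predictive Artifacts / Other"
    if after.original - after.challenge <= tol:
        # The challenge gap closed while original performance held.
        return "Dataset Weakness"
    if after.challenge - before.challenge <= tol:
        # Fine-tuning on challenge data barely moved the challenge score.
        return "Model Weakness"
    # Partial recovery: not one of the clean cases (an assumption).
    return "Predictive Artifacts / Other"


before = Scores(original=0.85, challenge=0.60)  # made-up numbers
print(diagnose(before, Scores(original=0.85, challenge=0.84)))  # Dataset Weakness
print(diagnose(before, Scores(original=0.85, challenge=0.61)))  # Model Weakness
print(diagnose(before, Scores(original=0.70, challenge=0.80)))  # Predictive Artifacts / Other
```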

SLIDE 21

Case Studies

  • Inoculating natural language inference (NLI) models
  • Inoculating SQuAD reading comprehension models

SLIDE 22

Natural Language Inference (NLI) [Dagan et al., 2004]

Premise: "I have done what you asked."
Hypothesis: "I have disobeyed your orders."
Labels: Entailment, Contradiction, Neutral

Example from MultiNLI [Williams et al., 2018]

SLIDE 23

Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]

Premise: "I have done what you asked."
Hypothesis: "I have disobeyed your orders."

SLIDE 24

Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]

Premise: "I have done what you asked."
Hypothesis: "I have disobeyed your orders."

Word Overlap Challenge Dataset
Premise: "I have done what you asked."
Hypothesis: "I have disobeyed your orders and true is true."
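
The word-overlap perturbation on this slide appends the tautology "and true is true" to the hypothesis; since a tautology adds no information, the gold label is unchanged. A minimal sketch that reproduces the slide's example. The function name and the rule of keeping final punctuation at the very end are assumptions inferred from this one example, not Naik and Ravichander et al.'s code.

```python
def add_word_overlap_distractor(hypothesis: str, tautology: str = "and true is true") -> str:
    """Append a tautology to an NLI hypothesis; the gold label stays the same."""
    body = hypothesis.rstrip()
    end = ""
    if body and body[-1] in ".!?":
        body, end = body[:-1], body[-1]  # move final punctuation after the tautology
    return f"{body} {tautology}{end}"


premise = "I have done what you asked."
hypothesis = "I have disobeyed your orders."
challenge_pair = (premise, add_word_overlap_distractor(hypothesis))
print(challenge_pair[1])  # I have disobeyed your orders and true is true.
```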

SLIDE 25

Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]

Premise: "I have done what you asked."
Hypothesis: "I have disobeyed your orders."

Word Overlap Challenge Dataset
Premise: "I have done what you asked."
Hypothesis: "I have disobeyed your orders and true is true."

Spelling Errors Challenge Dataset
Premise: "I have done what you asked."
Hypothesis: "I have disobeyed your ordets."
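
The slide's misspelling "ordets" replaces the second "r" of "orders" with "t", its right-hand neighbor on a QWERTY keyboard. A sketch of one way to generate such typos, assuming the gold label is kept: the same-row neighbor rule, the uniform random choices, and all function names are assumptions and may differ from the procedure Naik and Ravichander et al. actually used.

```python
import random
from typing import List

# Letter rows of a QWERTY keyboard; only same-row left/right neighbors are used.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def row_neighbors(ch: str) -> List[str]:
    """Keys directly left and right of `ch` on its QWERTY row ([] if absent)."""
    for row in QWERTY_ROWS:
        i = row.find(ch)
        if i != -1:
            return [row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)]
    return []


def keyboard_typo(word: str, rng: random.Random) -> str:
    """Replace one letter of `word` with a neighboring key (hypothetical rule)."""
    positions = [i for i, ch in enumerate(word) if row_neighbors(ch.lower())]
    if not positions:
        return word  # digits or punctuation only: leave the word alone
    i = rng.choice(positions)
    return word[:i] + rng.choice(row_neighbors(word[i].lower())) + word[i + 1:]


def add_spelling_error(hypothesis: str, rng: random.Random) -> str:
    """Misspell one randomly chosen word; the gold label is assumed unchanged."""
    words = hypothesis.split(" ")
    i = rng.randrange(len(words))
    words[i] = keyboard_typo(words[i], rng)
    return " ".join(words)


# The slide's example is one such substitution: "r" -> "t" at index 4 of "orders".
print("orders"[:4] + "t" + "orders"[5:])  # ordets
print(add_spelling_error("I have disobeyed your orders.", random.Random(13)))
```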

SLIDE 26

Small Perturbations Break NLI Models

  • Word Overlap: 12.6% drop (absolute)
  • Spelling Errors: 4.8% drop (absolute)

SLIDE 27

Inoculating NLI models

[Figure panels: Word Overlap, Spelling Errors]

SLIDE 28

Inoculating NLI models

[Figure panels: Word Overlap, Spelling Errors; outcome labels: Dataset Weakness, Model Weakness]

SLIDE 29

More Examples in the Paper!

[Figure: outcome labels Dataset Weakness (×2), Model Weakness (×2), Predictive Artifacts / Other]

SLIDE 30

SQuAD [Rajpurkar et al., 2016]

Question: "The number of new Huguenot colonists declined after what year?"
Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined…"
Correct Answer: "1700"

Example from Robin Jia

SLIDE 31

Adversarial SQuAD [Jia and Liang, 2017]

Question: "The number of new Huguenot colonists declined after what year?"
Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined. The number of old Acadian colonists declined after the year of 1675."
Correct Answer: "1700"

Example from Robin Jia
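
In the adversarial example, a sentence that resembles the question but does not answer it is appended to the passage, and the correct answer stays "1700". A minimal sketch of that construction step, assuming the distractor sentence is already written; Jia and Liang generate distractors with their own procedure, which is not reproduced here, and `QAExample` and `add_distractor` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QAExample:
    question: str
    passage: str
    answer: str  # gold answer, a span of the passage


def add_distractor(example: QAExample, distractor: str) -> QAExample:
    """Append a distractor sentence to the passage; the gold answer is unchanged."""
    attacked = replace(example, passage=f"{example.passage} {distractor}")
    if attacked.answer not in attacked.passage:
        raise ValueError("gold answer is not a span of the passage")
    return attacked


original = QAExample(
    question="The number of new Huguenot colonists declined after what year?",
    # "…" marks text elided on the slide.
    passage=("The largest portion of the Huguenots to settle in the Cape arrived "
             "between 1688 and 1689…but quite a few arrived as late as 1700; "
             "thereafter, the numbers declined."),
    answer="1700",
)
adversarial = add_distractor(
    original, "The number of old Acadian colonists declined after the year of 1675.")
print(adversarial.answer)  # 1700
```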

SLIDE 32

Small Perturbations Break SQuAD Models

  • 24.5 F1 drop (absolute)
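
The 24.5 here is in F1 points. SQuAD's F1 compares the tokens of a predicted answer with those of each gold answer, keeps the best match, averages over questions, and is reported on a 0 to 100 scale. A sketch modeled on the official SQuAD v1.1 evaluation script, written from memory of that script rather than copied from it, so treat small details as approximate:

```python
import re
import string
from collections import Counter
from typing import List

PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between one prediction and one gold answer (0 to 1)."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def max_f1(prediction: str, ground_truths: List[str]) -> float:
    """Score a prediction against its best-matching gold answer."""
    return max(f1_score(prediction, gt) for gt in ground_truths)


print(f1_score("1700", "1700"))           # 1.0
print(f1_score("1675", "1700"))           # 0.0
print(f1_score("the year 1700", "1700"))  # 0.6666666666666666
```

Under an attack like the one on the previous slide, a model that answers "1675" instead of "1700" scores 0 F1 on that question.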

SLIDE 33

Inoculating SQuAD models

SLIDE 34

Inoculating SQuAD models

[Figure: Outcome: Predictive Artifacts / Other]

SLIDE 35

Takeaways

  • Inoculation by Fine-Tuning helps us understand why our models fail.
  • While all challenge datasets break our models, they stress them in different ways.
  • There are potentially many situations where inoculation can help clarify model results when transferring to other datasets.

[Figure labels: Dataset Weakness, Model Weakness, Predictive Artifacts / Other]

SLIDE 36

Takeaways

Thank You! Questions?

SLIDE 37

Limitations of Inoculation by Fine-Tuning

  • Requires a somewhat balanced label distribution in the challenge dataset.
    • Otherwise, the fine-tuned model will always predict the majority label.
  • This method is not a silver bullet!
  • It is a first step toward disentangling failures of {original / challenge} datasets and models.
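
The first limitation can be checked before inoculating: if one label dominates the challenge dataset, a fine-tuned model can score well by always predicting it. A minimal check, assuming labels are plain strings; the function names and the 0.6 cutoff in `is_too_skewed` are hypothetical, since the slide only asks for a "somewhat balanced" distribution.

```python
from collections import Counter
from typing import Dict, Sequence


def label_distribution(labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of examples carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def majority_share(labels: Sequence[str]) -> float:
    """Accuracy a constant majority-label predictor would reach."""
    return max(label_distribution(labels).values())


def is_too_skewed(labels: Sequence[str], max_share: float = 0.6) -> bool:
    # max_share is a hypothetical cutoff, not a value from the slides.
    return majority_share(labels) > max_share


challenge_labels = ["contradiction"] * 9 + ["entailment"]
print(label_distribution(challenge_labels))  # {'contradiction': 0.9, 'entailment': 0.1}
print(is_too_skewed(challenge_labels))       # True
```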

SLIDE 38

SLIDE 39

Inoculating Multiple SQuAD Reading Comprehension Models

SLIDE 40

Inoculating Multiple NLI Models Against Word Overlap Adversary

SLIDE 41

Inoculating Multiple NLI Models Against Spelling Errors