SLIDE 1

Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets

Nelson F. Liu, Roy Schwartz, Noah A. Smith

UWNLP, NAACL 2019, June 4, 2019

SLIDE 2

Training Dataset Model Architecture

Two Key Ingredients of NLP Systems

NLP System

😋

SLIDE 3

Training Dataset Model Architecture

Why Might NLP Systems Fail?

NLP System

🤓

SLIDE 4

Training Dataset Model Architecture

Dataset Weaknesses

NLP System

🤓

SLIDE 5

Training Dataset Model Architecture

Model Weaknesses

NLP System

🤓

SLIDE 6

Challenge Datasets Break Models

SLIDE 7

Challenge Datasets Break Models

SLIDE 8

Challenge Datasets Break Models

SLIDE 9

NLP Systems Are Brittle

SLIDE 10

NLP Systems Are Brittle

SLIDE 11

Inoculation by Fine-Tuning

SLIDE 12

Inoculation by Fine-Tuning

SLIDE 13

Inoculation by Fine-Tuning

SLIDE 14

Inoculation

SLIDE 15

Inoculate Models to Better Understand Why They Fail
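The inoculation idea named on this slide can be sketched as a small loop: take an already-trained model, fine-tune it on a few challenge examples, and score it on both the original and the challenge test sets. This is a minimal sketch, not the authors' implementation. The names `inoculate`, `toy_fine_tune`, `toy_accuracy`, and `InoculationPoint` are hypothetical, the "model" is a toy lookup table, and restarting from the original model for every fine-tuning size is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (input text, label); a toy stand-in for real data
Model = Dict[str, int]     # toy "model": a lookup table from input to label


@dataclass
class InoculationPoint:
    num_examples: int       # challenge examples used for fine-tuning
    original_score: float   # score on the original test set
    challenge_score: float  # score on the challenge test set


def inoculate(
    model: Model,
    fine_tune: Callable[[Model, Sequence[Example]], Model],
    evaluate: Callable[[Model, Sequence[Example]], float],
    challenge_train: Sequence[Example],
    original_test: Sequence[Example],
    challenge_test: Sequence[Example],
    sizes: Sequence[int],
) -> List[InoculationPoint]:
    """Fine-tune on the first n challenge examples for each n in sizes."""
    curve = []
    for n in sizes:
        # Assumption: every run restarts from the same original model.
        tuned = fine_tune(model, challenge_train[:n])
        curve.append(
            InoculationPoint(
                num_examples=n,
                original_score=evaluate(tuned, original_test),
                challenge_score=evaluate(tuned, challenge_test),
            )
        )
    return curve


def toy_fine_tune(model: Model, examples: Sequence[Example]) -> Model:
    """Hypothetical fine-tuning step: memorize the examples in a copy."""
    updated = dict(model)
    updated.update(examples)
    return updated


def toy_accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of examples whose stored label matches the gold label."""
    return sum(model.get(x) == y for x, y in data) / len(data)


base_model = {"a": 0, "b": 1}
curve = inoculate(
    base_model,
    toy_fine_tune,
    toy_accuracy,
    challenge_train=[("c", 1), ("d", 0)],
    original_test=[("a", 0), ("b", 1)],
    challenge_test=[("c", 1), ("e", 0)],
    sizes=[0, 2],
)
```

With a real system, `fine_tune` and `evaluate` would wrap the actual training and evaluation code. Comparing the two scores across fine-tuning sizes is what separates the outcomes.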

SLIDE 16

Three Clear Outcomes of Interest

Challenge Evaluation Outcome Inoculation

?

SLIDE 17

(1) Dataset Weakness

Challenge Evaluation Outcome Inoculation Dataset Weakness

SLIDE 18

(2) Model Weakness

Model Weakness Challenge Evaluation Outcome Inoculation

SLIDE 19

(3) Predictive Artifacts / Other

Predictive Artifacts / Other Challenge Evaluation Outcome Inoculation

SLIDE 20

Three Clear Outcomes of Interest

Dataset Weakness Model Weakness

Predictive Artifacts / Other

Challenge Evaluation Outcome Inoculation
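One way to read the three outcomes is as a decision rule over scores measured before and after inoculation. The sketch below is a simplification under stated assumptions: the function name `classify_outcome` and the tolerance `tol` are hypothetical, and the rule assumes that a drop on the original test set signals "predictive artifacts / other", a closed gap signals a dataset weakness, and a persistent gap signals a model weakness.

```python
def classify_outcome(
    original_before: float,
    challenge_before: float,
    original_after: float,
    challenge_after: float,
    tol: float = 0.03,
) -> str:
    """Map before/after scores to one of the three outcome labels.

    All scores are on the same scale (for example accuracy in [0, 1]).
    `tol` is a hypothetical tolerance for treating two scores as equal.
    `challenge_before` is kept for symmetry; this simple rule does not use it.
    """
    # Fine-tuning on challenge data hurt performance on the original data.
    if original_after < original_before - tol:
        return "predictive artifacts / other"
    # Original performance held and the challenge gap has closed.
    if original_after - challenge_after <= tol:
        return "dataset weakness"
    # Original performance held but the challenge gap persists.
    return "model weakness"


print(classify_outcome(0.90, 0.60, 0.90, 0.88))
print(classify_outcome(0.90, 0.60, 0.90, 0.62))
print(classify_outcome(0.90, 0.60, 0.70, 0.85))
```

In practice the decision is made by looking at the full curves rather than at a single threshold.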

SLIDE 21

Case Studies

Inoculating natural language inference (NLI) models

Inoculating SQuAD reading comprehension models

SLIDE 22

Natural Language Inference (NLI)

Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."

Entailment Contradiction Neutral

[Dagan et al., 2004] Example from MultiNLI [Williams et al., 2018]

SLIDE 23

Two NLI Challenge Datasets

[Naik and Ravichander et al., 2018]

Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."

SLIDE 24

Two NLI Challenge Datasets

Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders and true is true."

Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."

Word Overlap Challenge Dataset

[Naik and Ravichander et al., 2018]

SLIDE 25

Two NLI Challenge Datasets

Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders and true is true."

Premise: "I have done what you asked." Hypothesis: "I have disobeyed your ordets."

Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."

Word Overlap Challenge Dataset Spelling Errors Challenge Dataset

[Naik and Ravichander et al., 2018]
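The two perturbations shown on this slide can be reproduced with a few lines of string handling. This sketch only recreates the slide's two examples; the helper names `add_tautology` and `substitute_char` are hypothetical, and the actual challenge datasets were built with their own construction procedures, which are not described here.

```python
def add_tautology(hypothesis: str, tautology: str = "and true is true") -> str:
    """Append a tautology to a hypothesis, keeping a final period at the end."""
    stripped = hypothesis.rstrip()
    if stripped.endswith("."):
        return f"{stripped[:-1]} {tautology}."
    return f"{stripped} {tautology}"


def substitute_char(text: str, index: int, replacement: str) -> str:
    """Replace the single character at `index` to simulate a spelling error."""
    return text[:index] + replacement + text[index + 1:]


hypothesis = "I have disobeyed your orders."

# Word overlap example from the slide.
word_overlap = add_tautology(hypothesis)

# Spelling error example from the slide: the second "r" in "orders" becomes "t".
position = hypothesis.index("orders") + 4
spelling_error = substitute_char(hypothesis, position, "t")

print(word_overlap)
print(spelling_error)
```

Both perturbations leave the premise and the gold label untouched, so any change in the model's prediction comes from the edited hypothesis alone.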

SLIDE 26

Small Perturbations Break NLI Models

Word Overlap Spelling Errors

12.6% (absolute)

4.8% (absolute)

SLIDE 27

Inoculating NLI models

Word Overlap Spelling Errors

SLIDE 28

Inoculating NLI models

Word Overlap Spelling Errors

Dataset Weakness Model Weakness

SLIDE 29

More Examples in the Paper!

Dataset Weakness Dataset Weakness Model Weakness Model Weakness

Predictive Artifacts / Other

SLIDE 30

SQuAD

[Rajpurkar et al., 2016]

Example from Robin Jia

Question: "The number of new Huguenot colonists declined after what year?" Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined…" Correct Answer: "1700"

SLIDE 31

Adversarial SQuAD

Question: "The number of new Huguenot colonists declined after what year?" Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined. The number of old Acadian colonists declined after the year of 1675." Correct Answer: "1700"

[Jia and Liang, 2017]

Example from Robin Jia

SLIDE 32

Small Perturbations Break SQuAD Models

24.5 F1 (absolute)

SLIDE 33

Inoculating SQuAD models

SLIDE 34

Predictive Artifacts / Other

Inoculating SQuAD models

SLIDE 35

Takeaways

Inoculation by Fine-Tuning helps us understand why our models fail.

While all challenge datasets break our models, they stress them in different ways.

Dataset Weakness Model Weakness

Predictive Artifacts / Other

Potentially many situations where inoculation can help clarify model results when transferring to other datasets.

SLIDE 36

Takeaways

Thank You! Questions?

SLIDE 37

Limitations of Inoculation by Fine-Tuning

Requires a somewhat balanced label distribution in the challenge dataset.

Otherwise, the fine-tuned model will always predict the majority label.

This method is not a silver bullet!

First step toward disentangling failures of {original / challenge} datasets and models.
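The label-balance requirement can be checked before inoculating by measuring how much of the challenge set a majority-label predictor would get right. This is a minimal sketch: the function names and the `max_majority` threshold of 0.6 are hypothetical choices, not values from the talk.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_label_fraction(labels: Sequence[Hashable]) -> float:
    """Accuracy a model would reach by always predicting the majority label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def is_roughly_balanced(labels: Sequence[Hashable], max_majority: float = 0.6) -> bool:
    """Hypothetical check: the majority label covers at most `max_majority`."""
    return majority_label_fraction(labels) <= max_majority


skewed = ["contradiction"] * 9 + ["entailment"]
balanced = ["entailment", "contradiction", "neutral"] * 4

print(majority_label_fraction(skewed), is_roughly_balanced(skewed))
print(majority_label_fraction(balanced), is_roughly_balanced(balanced))
```

If the majority fraction is high, a gain on the challenge set after fine-tuning may only reflect the model learning the label prior.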

SLIDE 38

SLIDE 39

Inoculating Multiple SQuAD Reading Comprehension Models

SLIDE 40

Inoculating Multiple NLI Models Against Word Overlap Adversary

SLIDE 41

Inoculating Multiple NLI Models Against Spelling Errors
