
Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets

Nelson F. Liu, Roy Schwartz, Noah A. Smith. NAACL 2019, June 4, 2019. UWNLP.


  1. Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets. Nelson F. Liu, Roy Schwartz, Noah A. Smith. NAACL 2019, June 4, 2019. UWNLP.

  2. Two Key Ingredients of NLP Systems [Diagram: Training Dataset, Model Architecture, NLP System]

  3. Why Might NLP Systems Fail? [Diagram: Training Dataset, Model Architecture, NLP System]

  4. Dataset Weaknesses [Diagram: Training Dataset, Model Architecture, NLP System]

  5. Model Weaknesses [Diagram: Training Dataset, Model Architecture, NLP System]

  6. Challenge Datasets Break Models

  7. Challenge Datasets Break Models

  8. Challenge Datasets Break Models

  9. NLP Systems Are Brittle

  10. NLP Systems Are Brittle

  11. Inoculation by Fine-Tuning

  12. Inoculation by Fine-Tuning

  13. Inoculation by Fine-Tuning

  14. Inoculation

  15. Inoculate Models to Better Understand Why They Fail

  16. Three Clear Outcomes of Interest [Diagram: Challenge Evaluation, Inoculation, Outcome: ?]

  17. (1) Dataset Weakness [Diagram: Challenge Evaluation, Inoculation, Outcome: Dataset Weakness]

  18. (2) Model Weakness [Diagram: Challenge Evaluation, Inoculation, Outcome: Model Weakness]

  19. (3) Predictive Artifacts / Other [Diagram: Challenge Evaluation, Inoculation, Outcome: Predictive Artifacts / Other]

  20. Three Clear Outcomes of Interest [Diagram: Challenge Evaluation, Inoculation, Outcome: Dataset Weakness, Model Weakness, or Predictive Artifacts / Other]

  21. Case Studies • Inoculating natural language inference (NLI) models • Inoculating SQuAD reading comprehension models

  22. Natural Language Inference (NLI) [Dagan et al., 2004]. Example from MultiNLI [Williams et al., 2018]. Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders." Labels: Entailment, Neutral, Contradiction

  23. Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]. Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."

  24. Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]. Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders." Word Overlap Challenge Dataset: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders and true is true."

  25. Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]. Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders." Word Overlap Challenge Dataset: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders and true is true." Spelling Errors Challenge Dataset: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your ordets."

  26. Small Perturbations Break NLI Models [Figure panels: Word Overlap, Spelling Errors. Reported drops: -12.6% and -4.8% (absolute)]

  27. Inoculating NLI models [Figure panels: Word Overlap, Spelling Errors]

  28. Inoculating NLI models [Figure panels: Word Overlap, Spelling Errors. Outcome labels: Model Weakness, Dataset Weakness]

  29. More Examples in the Paper! [Figure labels: Dataset Weakness, Model Weakness, Predictive Artifacts / Other, Dataset Weakness, Model Weakness]

  30. SQuAD [Rajpurkar et al., 2016]. Example from Robin Jia. Question: "The number of new Huguenot colonists declined after what year?" Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined…" Correct Answer: "1700"

  31. Adversarial SQuAD [Jia and Liang, 2017]. Example from Robin Jia. Question: "The number of new Huguenot colonists declined after what year?" Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined. The number of old Acadian colonists declined after the year of 1675." Correct Answer: "1700"

  32. Small Perturbations Break SQuAD Models: -24.5 F1 (absolute)

  33. Inoculating SQuAD models

  34. Inoculating SQuAD models: Predictive Artifacts / Other

  35. Takeaways • Inoculation by Fine-Tuning helps us understand why our models fail. • While all challenge datasets break our models, they stress them in different ways: Dataset Weakness, Model Weakness, Predictive Artifacts / Other. • Potentially many situations where inoculation can help clarify model results when transferring to other datasets.

  36. Thank You! Questions?

  37. Limitations of Inoculation by Fine-Tuning • Requires a somewhat balanced label distribution in the challenge dataset; otherwise, the fine-tuned model will always predict the majority label. • This method is not a silver bullet! • First step toward disentangling failures of {original / challenge} datasets and models.

  38. [No text]

  39. Inoculating Multiple SQuAD Reading Comprehension Models

  40. Inoculating Multiple NLI Models Against Word Overlap Adversary

  41. Inoculating Multiple NLI Models Against Spelling Errors
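The three outcomes on slides 16 to 20 can be read as a decision over scores measured before and after inoculation. The sketch below is one possible reading, not the authors' code. It assumes that a closed gap between original and challenge performance points to a dataset weakness, a persistent gap points to a model weakness, and a drop on the original test set points to predictive artifacts or other differences. The `Scores` class, the `inoculation_outcome` function, the `tolerance` threshold, and all numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Score (accuracy or F1, in points) on the original and challenge test sets."""
    original: float
    challenge: float


def inoculation_outcome(before: Scores, after: Scores, tolerance: float = 1.0) -> str:
    """Map before/after scores to one of the three outcome names from the slides.

    `tolerance` is a hypothetical threshold (in score points), not a value
    taken from the presentation.
    """
    # How much the original test score fell after fine-tuning on challenge data.
    original_drop = before.original - after.original
    # Remaining gap between original and challenge performance after fine-tuning.
    gap_after = after.original - after.challenge

    if original_drop > tolerance:
        # Adapting to the challenge data hurt performance on the original data.
        return "predictive artifacts / other"
    if gap_after <= tolerance:
        # A few challenge examples were enough to close the gap.
        return "dataset weakness"
    # The challenge set stays hard even after exposure.
    return "model weakness"
```

With made-up scores, `inoculation_outcome(Scores(80.0, 55.0), Scores(79.5, 79.0))` returns `"dataset weakness"`, because the original score barely moves and the gap closes.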

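Slide 37 notes that the method needs a somewhat balanced label distribution in the challenge dataset. A quick way to check this before inoculating is to compute the accuracy a majority-label predictor would get. This is a minimal sketch; the function names and the `max_majority_share` cutoff of 0.6 are illustrative assumptions, not values from the presentation.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def is_roughly_balanced(labels: Sequence[str], max_majority_share: float = 0.6) -> bool:
    """Return True if the majority label's share is at most the cutoff.

    The cutoff is a hypothetical choice; pick one that suits the task.
    """
    return majority_baseline_accuracy(labels) <= max_majority_share
```

If `is_roughly_balanced` returns False, a fine-tuned model could score well on the challenge set just by predicting the majority label, which would make the inoculation result hard to interpret.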