

  1. Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference. Yonatan Belinkov*, Adam Poliak*, Stuart Shieber, Benjamin Van Durme, Alexander Rush. July 29, 2019, ACL, Florence

  2. NLU as Relationship Identification
Natural language inference (entailment)
Premise: A woman is running in the park with her dog
Hypothesis: A woman is sleeping
Relation: entailment, neutral, contradiction
[Sources: Hill+ ‘16, Zhang+ ‘16]


  4. NLU as Relationship Identification
Natural language inference (entailment)
Premise: A woman is running in the park with her dog
Hypothesis: A woman is sleeping
Relation: entailment, neutral, contradiction
Reading comprehension
“No,” he replied, “except that he seems in a great hurry.” “That’s just it,” Jimmy returned promptly. “Did you ever see him hurry unless he was frightened?” Peter confessed that he never had.
Q: “Well, he isn’t now, yet just look at him go”
A: Do, case, confessed, frightened, mean, replied, returned, said, see, thought
[Sources: Hill+ ‘16, Zhang+ ‘16]

  5. NLU as Relationship Identification
Natural language inference (entailment)
Premise: A woman is running in the park with her dog
Hypothesis: A woman is sleeping
Relation: entailment, neutral, contradiction
Reading comprehension
“No,” he replied, “except that he seems in a great hurry.” “That’s just it,” Jimmy returned promptly. “Did you ever see him hurry unless he was frightened?” Peter confessed that he never had.
Q: “Well, he isn’t now, yet just look at him go”
A: Do, case, confessed, frightened, mean, replied, returned, said, see, thought
Visual question answering
Q: Is the girl walking the bike? A: Yes, No
[Sources: Hill+ ‘16, Zhang+ ‘16]

  6. NLU as Relationship Identification
Natural language inference (entailment)
Premise: A woman is running in the park with her dog
Hypothesis: A woman is sleeping
Relation: entailment, neutral, contradiction
Reading comprehension
“No,” he replied, “except that he seems in a great hurry.” “That’s just it,” Jimmy returned promptly. “Did you ever see him hurry unless he was frightened?” Peter confessed that he never had.
Q: “Well, he isn’t now, yet just look at him go”
A: Do, case, confessed, frightened, mean, replied, returned, said, see, thought
Visual question answering
Q: Is the girl walking the bike? A: Yes, No
Assumption: Identifying the relationship requires deep language understanding
[Sources: Hill+ ‘16, Zhang+ ‘16]

  7. One-Sided Biases
• Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)

  8. One-Sided Biases
• Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)
Hypothesis: A woman is sleeping

  9. One-Sided Biases
• Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)
Premise:
Hypothesis: A woman is sleeping

  10. One-Sided Biases
• Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)
Premise:
Hypothesis: A woman is sleeping
entailment, neutral, contradiction


  12. One-Sided Biases
• Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)
[Bar chart: accuracy (%) on SNLI and Multi-NLI for a majority-class baseline and a hypothesis-only InferSent model]


  14. One-Sided Biases
• Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)
• Reading comprehension (Kaushik & Lipton ‘18)
• Visual question answering (Zhang+ ’16; Kafle & Kanan ’16; Goyal+ ’17; Agarwal+ ’17; inter alia)
• Story cloze completion (Schwartz+ ‘17; Cai+ ’17)
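The hypothesis-only finding above can be reproduced in miniature. A minimal sketch, assuming a hypothetical toy dataset and a deliberately naive word-count classifier (not the InferSent models from the cited papers): give-away words in the hypothesis alone are enough to predict a label without ever reading a premise.

```python
# Toy illustration of a hypothesis-only baseline: annotation artifacts
# (e.g., negation words signaling contradiction) leak the label into the
# hypothesis, so the premise can be ignored entirely.
from collections import Counter, defaultdict

# Hypothetical labeled pairs; the premise is never used below.
train = [
    ("A woman is running in the park", "A woman is sleeping",     "contradiction"),
    ("A man plays guitar on stage",    "Nobody is playing music", "contradiction"),
    ("A dog chases a ball",            "An animal is outside",    "entailment"),
    ("Kids swim in a pool",            "Children are in water",   "entailment"),
    ("A chef cooks pasta",             "The chef is competing",   "neutral"),
    ("Two people ride bikes",          "They are racing friends", "neutral"),
]

# Count word/label co-occurrences over hypotheses only.
word_label = defaultdict(Counter)
for _premise, hypothesis, label in train:
    for word in hypothesis.lower().split():
        word_label[word][label] += 1

def hypothesis_only_predict(hypothesis):
    """Score labels by summing per-word counts; the premise is ignored."""
    scores = Counter()
    for word in hypothesis.lower().split():
        scores.update(word_label[word])
    return scores.most_common(1)[0][0]

# Give-away words ("sleeping", "nobody") predict contradiction
# from the hypothesis alone.
print(hypothesis_only_predict("A man is sleeping"))  # → contradiction
```

The classifier here is intentionally crude; the point is only that surface statistics of the hypothesis correlate with the label, which is exactly the artifact the cited papers measure.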

  15. Problem: One-sided biases mean that models may not learn the true relationship between premise and hypothesis

  16. Strategies for dealing with dataset bias
• Construct new datasets (Sharma+ ‘18)
  o $$$
  o Other bias

  17. Strategies for dealing with dataset bias
• Construct new datasets (Sharma+ ‘18)
  o $$$
  o Other bias
• Filter “easy” examples (Gururangan+ ‘18)
  o Hard to scale
  o May still have biases (see SWAG → BERT → HellaSWAG)

  18. Strategies for dealing with dataset bias
• Construct new datasets (Sharma+ ‘18)
  o $$$
  o Other bias
• Filter “easy” examples (Gururangan+ ‘18)
  o Hard to scale
  o May still have biases (see SWAG → BERT → HellaSWAG)
• Forgo datasets with known biases
  o Not all bias is bad
  o Biased datasets may have other useful information

  19. Our approach: Design models that facilitate learning less biased representations

  20. A Generative Perspective
● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)

  21. A Generative Perspective
● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)
g – classifier; f_P, f_H – encoders
[Diagram: encoders f_P and f_H read P and H; classifier g combines their outputs]
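The encoder/classifier decomposition on this slide can be sketched in a few lines. This is a toy illustration with bag-of-words encoders and hand-set weights over a hypothetical five-word vocabulary, not the paper's trained model; every name and value below is illustrative.

```python
# Discriminative NLI setup: f_P, f_H encode each sentence to a vector,
# and classifier g maps the pair to a distribution over three labels.
import math

VOCAB = ["woman", "running", "park", "dog", "sleeping"]

def encode(sentence):
    """f_P / f_H: a bag-of-words indicator vector over a tiny vocabulary."""
    words = sentence.lower().split()
    return [1.0 if v in words else 0.0 for v in VOCAB]

# g: one linear layer over the concatenation [f_P(P); f_H(H)], then softmax.
# One weight row per label: (entailment, neutral, contradiction).
WEIGHTS = [
    [0.5, 0.1, 0.1, 0.1, -0.5,  0.5, 0.0, 0.0, 0.0, -0.5],  # entailment
    [0.1, 0.1, 0.1, 0.1,  0.0,  0.1, 0.1, 0.1, 0.1,  0.0],  # neutral
    [0.2, 0.4, 0.1, 0.1,  0.5, -0.2, 0.0, 0.0, 0.0,  0.9],  # contradiction
]

def g(p_vec, h_vec):
    """Classifier: softmax(W [p; h])."""
    x = p_vec + h_vec  # list concatenation = vector concatenation here
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]
    z = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = g(encode("A woman is running in the park with her dog"),
          encode("A woman is sleeping"))
print(probs)  # a distribution over (entailment, neutral, contradiction)
```

Nothing here is trained; the sketch only makes the dataflow of p_θ(y | P, H) concrete, and also shows why the model *can* ignore P: g is free to put all its weight on the f_H half of the input.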

  22. A Generative Perspective
● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)
● Our key idea: If we generate the premise, it cannot be ignored
● We will maximize the likelihood of generating the premise p(P | y, H)

  23. A Generative Perspective
● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)
● Our key idea: If we generate the premise, it cannot be ignored
● We will maximize the likelihood of generating the premise p(P | y, H)
Hypothesis: A woman is sleeping
Relation: contradiction
Premise: A woman is running in the park with her dog
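A model trained to generate the premise can still produce a label at test time. A short derivation via Bayes' rule, assuming some label prior p(y | H) is available (e.g., uniform, or from a separate model):

```latex
% Bayes' rule turns the premise generator into a classifier:
p(y \mid P, H) = \frac{p(P \mid y, H)\, p(y \mid H)}{\sum_{y'} p(P \mid y', H)\, p(y' \mid H)}
% so the predicted label is
\hat{y} = \arg\max_{y} \; p(P \mid y, H)\, p(y \mid H).
```

Because every candidate label y must explain the *entire* premise P, the premise cannot be ignored the way it can under the discriminative objective.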

  24. A Generative Perspective
● Unfortunately, text generation is hard!
Hypothesis: A woman is sleeping
Relation: contradiction
Premise: A woman is running in the park with her dog

  25. A Generative Perspective
● Unfortunately, text generation is hard!
Hypothesis: A woman is sleeping
Relation: contradiction
Premise: A woman is running in the park with her dog
Premise: A woman sings a song while playing piano
Premise: This woman is laughing at her baby
…
