Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference

Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference. Yonatan Belinkov*, Adam Poliak*, Stuart Shieber, Benjamin Van Durme, Alexander Rush. July 29, 2019, ACL, Florence.


slide-1
SLIDE 1

Don’t Take the Premise for Granted:

Mitigating Artifacts in Natural Language Inference

Yonatan Belinkov*, Adam Poliak*, Stuart Shieber, Benjamin Van Durme, Alexander Rush

July 29, 2019 ACL, Florence

slide-6
SLIDE 6

NLU as Relationship Identification

Natural language inference (entailment)

Premise: A woman is running in the park with her dog
Hypothesis: A woman is sleeping
Relation: entailment, neutral, contradiction

Reading comprehension

“No,” he replied, “except that he seems in a great hurry.” “That’s just it,” Jimmy returned promptly. “Did you ever see him hurry unless he was frightened?” Peter confessed that he never had.
Q: “Well, he isn’t now, yet just look at him go”
A: Do, case, confessed, frightened, mean, replied, returned, said, see, thought

Visual question answering

Q: Is the girl walking the bike?
A: Yes, No

[Sources: Hill+ ‘16, Zhang+ ‘16]

Assumption: Identifying the relationship requires deep language understanding

slide-11
SLIDE 11

One-Sided Biases

  • Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)

Hypothesis: A woman is sleeping
Premise: (not shown to the model)
Relation: entailment, neutral, contradiction

slide-13
SLIDE 13

One-Sided Biases

  • Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)

[Figure: bar chart comparing Majority, Hypothesis-Only, and InferSent on SNLI and Multi-NLI; axis ticks 20 to 100]

slide-14
SLIDE 14

One-Sided Biases

  • Hypothesis-only NLI (Poliak+ ‘18; Gururangan+ ’18; Tsuchiya ‘18)
  • Reading comprehension (Kaushik & Lipton ‘18)
  • Visual question answering (Zhang+ ’16; Kafle & Kanan ’16; Goyal+ ’17; Agarwal+ ’17; inter alia)

  • Story cloze completion (Schwartz+ ‘17, Cai+ ’17)
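
A hypothesis-only baseline can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy sentences are hypothetical, and a small naive Bayes word counter stands in for whatever model the cited papers actually used. The point is only that a classifier that never sees a premise can still beat the majority baseline when hypothesis words correlate with labels.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (hypothesis, label) pairs. Premises are left out on purpose.
TRAIN = [
    ("a woman is sleeping", "contradiction"),
    ("nobody is dancing", "contradiction"),
    ("a man is not eating", "contradiction"),
    ("a person is outdoors", "entailment"),
    ("a person is moving", "entailment"),
    ("people are outdoors", "entailment"),
    ("a woman is waiting for a friend", "neutral"),
    ("a man is winning a competition", "neutral"),
    ("a dog is waiting for food", "neutral"),
]
TEST = [
    ("a man is sleeping", "contradiction"),
    ("people are moving", "entailment"),
    ("a woman is waiting for a friend", "neutral"),
]


def train_naive_bayes(examples):
    """Count words per label; these counts are the whole model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for hypothesis, label in examples:
        label_counts[label] += 1
        for word in hypothesis.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(hypothesis, model):
    """Return the label with the highest add-one-smoothed log score."""
    word_counts, label_counts, vocab = model
    n_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_examples)
        for word in hypothesis.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predictions, examples):
    gold = [label for _, label in examples]
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


MODEL = train_naive_bayes(TRAIN)
majority_label = Counter(label for _, label in TRAIN).most_common(1)[0][0]
hyp_acc = accuracy([predict(h, MODEL) for h, _ in TEST], TEST)
maj_acc = accuracy([majority_label] * len(TEST), TEST)
print(f"majority: {maj_acc:.2f}  hypothesis-only: {hyp_acc:.2f}")
```

On real NLI data the gap is a property of the dataset, not of this sketch; the toy sentences were written so that the artifact is easy to see.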
slide-15
SLIDE 15

Problem:

One-sided biases mean that models may not learn the true relationship between premise and hypothesis

slide-18
SLIDE 18

Strategies for dealing with dataset bias

  • Construct new datasets (Sharma+ ‘18)
      ○ Expensive ($$$)
      ○ New datasets may carry other biases
  • Filter “easy” examples (Gururangan+ ‘18)
      ○ Hard to scale
      ○ May still have biases (see SWAG → BERT → HellaSwag)
  • Forgo datasets with known biases
      ○ Not all bias is bad
      ○ Biased datasets may have other useful information

slide-19
SLIDE 19

Our approach:

Design models that facilitate learning less biased representations

slide-21
SLIDE 21

A Generative Perspective

  • Typical NLI models maximize the discriminative likelihood

$$p_\theta(y \mid P, H)$$

[Diagram: inputs P and H are encoded by the encoders f_P and f_H; the classifier g predicts the label from the two encodings]

slide-23
SLIDE 23

A Generative Perspective

  • Typical NLI models maximize the discriminative likelihood $p_\theta(y \mid P, H)$
  • Our key idea: If we generate the premise, it cannot be ignored
  • We will maximize the likelihood of generating the premise, $p(P \mid y, H)$

Hypothesis: A woman is sleeping
Relation: contradiction
Premise: A woman is running in the park with her dog

slide-25
SLIDE 25

A Generative Perspective

  • Unfortunately, text generation is hard!

Hypothesis: A woman is sleeping
Relation: contradiction
Premise: A woman is running in the park with her dog
Premise: A woman sings a song while playing piano
Premise: This woman is laughing at her baby

slide-29
SLIDE 29

A Generative Perspective

  • Unfortunately, text generation is hard!
  • Instead, rewrite as follows

$$\log p(P \mid y, H) = \log \frac{p_\theta(y \mid P, H)\, p(P \mid H)}{p(y \mid H)}$$

  • Assume $p(P \mid H)$ is constant
  • We have

$$\log p_\theta(y \mid P, H) - \log p(y \mid H)$$

The first term is the usual discriminative likelihood. The second term, $p(y \mid H)$, is the one we need to estimate.
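
The rewrite on this slide is Bayes' rule applied with $H$ held fixed. A worked version of the steps, using only the symbols already on the slides (the "const." term is the $\log p(P \mid H)$ that the slide assumes constant):

```latex
\begin{align*}
p(P \mid y, H)
  &= \frac{p_\theta(y \mid P, H)\, p(P \mid H)}{p(y \mid H)}
  && \text{(Bayes' rule, conditioning on } H\text{)} \\
\log p(P \mid y, H)
  &= \log p_\theta(y \mid P, H) + \log p(P \mid H) - \log p(y \mid H) \\
  &= \log p_\theta(y \mid P, H) - \log p(y \mid H) + \text{const.}
  && \text{(assuming } p(P \mid H) \text{ is constant)}
\end{align*}
```

Maximizing the left-hand side is therefore the same as maximizing the discriminative term while penalizing how well the label can be predicted from the hypothesis alone.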

slide-30
SLIDE 30

Method 1: Auxiliary Hypothesis Classifier

  • Learn a new estimator $p_{\phi,\theta}(y \mid H)$
      ○ Share the hypothesis encoder
      ○ Learn an additional classification layer
      ○ Multi-task objective function

$$\max_\theta L_1(\theta) = \log p_\theta(y \mid P, H) - \alpha \log p_{\phi,\theta}(y \mid H)$$
$$\max_\phi L_2(\phi) = \beta \log p_{\phi,\theta}(y \mid H)$$

slide-32
SLIDE 32

Method 1: Auxiliary Hypothesis Classifier

  • Learn a new estimator $p_{\phi,\theta}(y \mid H)$

$$\max_\theta L_1(\theta) = \log p_\theta(y \mid P, H) - \alpha \log p_{\phi,\theta}(y \mid H)$$
$$\max_\phi L_2(\phi) = \beta \log p_{\phi,\theta}(y \mid H)$$

[Diagram: inputs P and H; encoders f_P and f_H; two classifiers g; parameters labeled θ and φ; a gradient reversal step leading to the hypothesis-only estimate $p_{\phi,\theta}(y \mid H)$]
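
The two objectives and the gradient reversal step can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: `method1_objectives` and `GradientReversal` are hypothetical names, the logits are made-up numbers, and the reversal layer is the generic construction (identity on the forward pass, negated and scaled gradient on the backward pass).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def method1_objectives(main_logits, hyp_only_logits, gold, alpha, beta):
    """Per-example values of L1 (for theta) and L2 (for phi) from the slide."""
    log_p_main = log_softmax(main_logits)[gold]     # log p_theta(y | P, H)
    log_p_hyp = log_softmax(hyp_only_logits)[gold]  # log p_{phi,theta}(y | H)
    l1 = log_p_main - alpha * log_p_hyp
    l2 = beta * log_p_hyp
    return l1, l2


class GradientReversal:
    """Identity on the forward pass; flips and scales gradients on the way back."""

    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.scale * g for g in grad_output]


# Hypothetical logits for one example whose gold label has index 0.
l1, l2 = method1_objectives([2.0, 0.0, 0.0], [0.0, 0.0, 0.0], gold=0, alpha=0.5, beta=1.0)
reversed_grad = GradientReversal(scale=0.5).backward([1.0, -2.0])
print(l1, l2, reversed_grad)
```

Placing the reversal between the hypothesis encoder and the auxiliary classifier lets a single backward pass train the classifier to predict the label from H while pushing the encoder in the opposite direction.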

slide-35
SLIDE 35

Method 2: Negative Sampling

  • Sample alternative premises
      ○ Bound $-\log p(y \mid H)$ using Jensen’s inequality
      ○ Approximate the expectation with uniform samples $P'$
      ○ Multi-task objective function

$$-\log p(y \mid H) = -\log \sum_{P'} p(P' \mid H)\, p(y \mid P', H) = -\log \mathbb{E}_{P'}\, p(y \mid P', H) \le -\mathbb{E}_{P'} \log p(y \mid P', H)$$

(log is concave, so $\log \mathbb{E}[X] \ge \mathbb{E}[\log X]$; the sampled term therefore bounds $-\log p(y \mid H)$ from above.)

$$\max_\theta L_1(\theta) = (1 - \alpha) \log p_\theta(y \mid P, H) - \alpha \log p_{\phi,\theta}(y \mid P', H)$$
$$\max_\phi L_2(\phi) = \beta \log p_{\phi,\theta}(y \mid P', H)$$

slide-36
SLIDE 36

Method 2: Negative Sampling

  • Sample alternative premises

$$\max_\theta L_1(\theta) = (1 - \alpha) \log p_\theta(y \mid P, H) - \alpha \log p_{\phi,\theta}(y \mid P', H)$$
$$\max_\phi L_2(\phi) = \beta \log p_{\phi,\theta}(y \mid P', H)$$

[Diagram: inputs P′ (random premise) and H; encoders f_P and f_H; classifier g; parameters labeled θ and φ; a gradient reversal step]
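
The sampling step can be sketched as follows. The helper names are hypothetical, the premises and log-probabilities are made-up values, and excluding an example's own premise is an assumption of this sketch rather than something the slides specify.

```python
import random


def sample_random_premises(premises, rng):
    """For each example, draw a premise uniformly from the other examples.

    Requires at least two premises. Skipping the example's own premise is an
    assumption made here for illustration.
    """
    n = len(premises)
    sampled = []
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # shift past index i so the true premise is never chosen
        sampled.append(premises[j])
    return sampled


def method2_objectives(log_p_true, log_p_random, alpha, beta):
    """Per-example values of L1 (for theta) and L2 (for phi) from the slide.

    log_p_true   stands for log p_theta(y | P, H)
    log_p_random stands for log p_{phi,theta}(y | P', H)
    """
    l1 = (1 - alpha) * log_p_true - alpha * log_p_random
    l2 = beta * log_p_random
    return l1, l2


premises = [
    "A woman is running in the park with her dog",
    "A woman sings a song while playing piano",
    "This woman is laughing at her baby",
]
rng = random.Random(0)
random_premises = sample_random_premises(premises, rng)

# Hypothetical log-probabilities of the gold label for one example.
l1, l2 = method2_objectives(log_p_true=-0.1, log_p_random=-1.2, alpha=0.5, beta=1.0)
print(random_premises, l1, l2)
```

Drawing P′ from the same batch keeps the sketch cheap; any other uniform sampler over the training premises would fit the description on the slide equally well.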

slide-38
SLIDE 38

What is this good for?

Are less biased models more transferable?

slide-41
SLIDE 41

A Toy Example

[Figure: two panels, “Synthetic dataset (unbiased)” and “Synthetic dataset (biased)”]

slide-42
SLIDE 42

Models transfer well on synthetic data

Method 1: Auxiliary Hypothesis Classifier

slide-43
SLIDE 43

Models transfer well on synthetic data

Method 2: Negative Sampling

slide-44
SLIDE 44

Do the models transfer well on standard NLI datasets?
slide-45
SLIDE 45

Degradation in domain

[Figure: bar chart of accuracy (axis ticks 20 to 100) on SNLI Test and SNLI Hard for Baseline, Auxiliary Hyp. Classifier, and Negative Sampling]

slide-46
SLIDE 46

Transfer to other datasets

[Figure: Baseline vs. Method 1 (Auxiliary Hypothesis Classifier)]

Improvements in 9/11 datasets

slide-47
SLIDE 47

Transfer to other datasets

[Figure: Baseline vs. Method 2 (Negative Sampling)]

Less consistent improvements
When it works, it works well

slide-51
SLIDE 51

Analysis

Q: Does it matter what kind of bias we have?
A: Yes! Different biases than training data →
      ○ Usually, more improvement from our methods
      ○ But not always

Q: Do stronger hyper-parameters help?
A: More emphasis on the auxiliary objective →
      ○ More transferability, but worse in-domain performance

Q: What if we get a bit of out-of-domain training data?
A: Pre-training with our methods still helps
      ○ Especially with datasets with different biases

slide-53
SLIDE 53

More Analysis

Q: Are biases really removed from the hidden representations?
A: Some, but not all
  • See our recent work: On Adversarial Removal of Hypothesis-only Bias in NLI, *SEM 2019

Q: Does this approach work for other tasks?
A: Seems to work for VQA (Ramakrishnan+ ‘18)
A: But there are shortcomings
  • See our recent work: Adversarial Regularization for VQA: Strengths, Shortcomings, and Side Effects, SiVL 2019

slide-55
SLIDE 55

Contributions

  • Our approach may aid with one-sided biases in NLI and other tasks
      ○ Reduces the amount of bias
      ○ Improves transferability
  • Our analysis shows that the methods should be handled with care
      ○ Not all bias may be removed
      ○ Some other information may also be removed
      ○ The goal matters: bias may sometimes be helpful

Acknowledgements: