On Adversarial Removal of Hypothesis-only Bias in Natural Language - - PowerPoint PPT Presentation



SLIDE 1

On Adversarial Removal of Hypothesis-only Bias in Natural Language Inference

Yonatan Belinkov*, Adam Poliak*, Benjamin Van Durme, Stuart Shieber, Alexander Rush

*SEM, Minneapolis, MN June 7, 2019

SLIDE 2

Co-Authors

Yonatan Belinkov, Adam Poliak, Benjamin Van Durme, Stuart Shieber, Alexander Rush

SLIDE 3

Background

SLIDE 4

Natural Language Inference

Premise: The brown cat ran
Hypothesis: The animal moved

SLIDE 5

Natural Language Inference

Premise: The brown cat ran
Hypothesis: The animal moved
Label: entailment / neutral / contradiction

SLIDE 9

*SEM 2018

SLIDE 10

Hypothesis Only NLI

SLIDE 11

Hypothesis Only NLI

Hypothesis: A woman is sleeping
Premise:
Label: entailment / neutral / contradiction
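A hypothesis-only model is the NLI architecture with the premise removed: the classifier predicts a label from the encoded hypothesis alone. A minimal sketch, assuming toy mean-pooled word vectors and a linear softmax classifier as stand-ins for the learned encoder and classifier (all names and dimensions here are illustrative, not the paper's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_LABELS = 8, 3  # toy embedding size; entailment / neutral / contradiction

def encode(word_vectors):
    """Toy encoder: mean over word vectors (a stand-in for a learned encoder)."""
    return word_vectors.mean(axis=0)

def hypothesis_only_classifier(h_enc, W, b):
    """Predict a label distribution from the hypothesis encoding alone --
    the premise is never consulted."""
    z = h_enc @ W + b
    e = np.exp(z - z.max())  # stable softmax
    return e / e.sum()

hypothesis = rng.normal(size=(4, DIM))  # e.g. "A woman is sleeping"
W, b = rng.normal(size=(DIM, N_LABELS)), np.zeros(N_LABELS)
probs = hypothesis_only_classifier(encode(hypothesis), W, b)
```

That such a premise-blind model beats the majority-class baseline on SNLI is the evidence for hypothesis-only bias.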

SLIDE 15

SNLI Results

SLIDE 16

A woman is sleeping

SLIDE 17

Hypothesis: A woman is sleeping
Premises:
  • A woman sings a song while playing piano
  • This woman is laughing at her baby shower
  • A woman with glasses is playing jenga

SLIDE 21

Why is she sleeping?

SLIDE 22

Studies eliciting norming data are prone to repeated responses across subjects

(see McRae et al. (2005) and the discussion in §2 of Zhang et al. (2017)’s Ordinal Common-sense Inference)

SLIDE 23

Problem:

Hypothesis-only biases mean that models may not learn the true relationship between premise and hypothesis

SLIDE 24

How to handle such biases?

SLIDE 25

Strategies for dealing with dataset biases

  • Construct new datasets (Sharma et al. 2018)
    ○ $$$
    ○ More bias
  • Filter “easy” examples (Gururangan et al. 2018)
    ○ Hard to scale
    ○ May still have biases (see SWAG → BERT → HellaSWAG)
  • Forgo datasets with known biases
    ○ Not all bias is bad
    ○ Biased datasets may have other useful information

SLIDE 28

Our solution:

Design architectures that facilitate learning less biased representations

SLIDE 29

Adversarial Learning to the Rescue

SLIDE 30

NLI Model Components

f – encoder (encodes the premise p and hypothesis h)
g – classifier (predicts the label from the encodings)

SLIDE 31

Baseline NLI Model

g(f(p), f(h)) → label
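The baseline composes the two components: classifier g over the encodings of premise p and hypothesis h. A minimal numpy sketch, with mean-pooled toy embeddings and a linear softmax classifier standing in for the actual learned encoder and classifier (all names and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_LABELS = 8, 3  # toy embedding size; entailment / neutral / contradiction

def f(word_vectors):
    """Encoder f: mean over word vectors (a stand-in for a learned encoder)."""
    return word_vectors.mean(axis=0)

def g(p_enc, h_enc, W, b):
    """Classifier g: softmax over the concatenated premise/hypothesis encodings."""
    z = np.concatenate([p_enc, h_enc]) @ W + b
    e = np.exp(z - z.max())  # stable softmax
    return e / e.sum()

# Toy inputs: random vectors standing in for word embeddings
premise = rng.normal(size=(4, DIM))     # e.g. "The brown cat ran"
hypothesis = rng.normal(size=(3, DIM))  # e.g. "The animal moved"
W, b = rng.normal(size=(2 * DIM, N_LABELS)), np.zeros(N_LABELS)

label_probs = g(f(premise), f(hypothesis), W, b)  # distribution over 3 labels
```

The two adversarial methods that follow both modify how gradients flow into the hypothesis side of this architecture.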

SLIDE 32

Method 1 – Adversarial Hypothesis-Only Classifier

Reverse gradients: penalize the hypothesis encoder if the hypothesis-only classifier does well
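Gradient reversal is the identity in the forward pass; in the backward pass, the adversary's gradient is negated (and scaled by a strength λ) before it reaches the hypothesis encoder, so the encoder is updated to make hypothesis-only prediction harder. A sketch of just the reversal step, with illustrative values for λ and the vectors:

```python
import numpy as np

LAMBDA = 1.0  # reversal strength; a hyperparameter, value here is illustrative

def grad_reverse_forward(x):
    """Forward pass: identity -- the adversarial classifier sees the
    hypothesis encoding unchanged."""
    return x

def grad_reverse_backward(adv_grad, lam=LAMBDA):
    """Backward pass: negate (and scale) the adversary's gradient before it
    reaches the hypothesis encoder, pushing the encoder to hurt the adversary."""
    return -lam * adv_grad

h_enc = np.array([0.5, -1.0, 2.0])     # toy hypothesis encoding
adv_grad = np.array([0.1, 0.2, -0.3])  # d(adversary loss)/d(h_enc)

forwarded = grad_reverse_forward(h_enc)
enc_grad = grad_reverse_backward(adv_grad)  # what the encoder actually receives
```

In an autograd framework this would be implemented as a custom backward rule; the numpy version just makes the sign flip explicit.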

SLIDE 35

Method 2 – Adversarial Training Examples

Perturb training examples:
  • Randomly swap premises (use p’ in place of p)
  • Reverse gradients into the hypothesis encoder
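The first step pairs a hypothesis with a premise p’ drawn from a different example; on these perturbed examples, gradients into the hypothesis encoder are reversed. A sketch of just the premise-swapping step, on hypothetical toy data (the function name and the fraction are illustrative, not the paper's exact procedure):

```python
import random

def swap_premises(examples, frac=0.5, seed=0):
    """Replace the premise of a random fraction of (premise, hypothesis, label)
    examples with a premise p' donated by another selected example.
    Assumes at least two examples are available to swap among."""
    rng = random.Random(seed)
    n = max(2, int(len(examples) * frac))
    idx = rng.sample(range(len(examples)), n)
    rotated = idx[1:] + idx[:1]  # each chosen example receives another's premise
    perturbed = list(examples)
    for i, j in zip(idx, rotated):
        perturbed[i] = (examples[j][0],) + examples[i][1:]  # p' replaces p
    return perturbed

data = [
    ("The brown cat ran", "The animal moved", "entailment"),
    ("A woman sings a song while playing piano", "A woman is sleeping", "contradiction"),
    ("A man reads a book", "A person is reading", "entailment"),
    ("Kids play soccer", "Children are outside", "neutral"),
]
perturbed = swap_premises(data, frac=0.5)
```

Hypotheses and labels are left untouched; only the premise side is perturbed, which is what lets the reversed gradients target hypothesis-side shortcuts.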

SLIDE 37

Results & Analysis

SLIDE 38

What happens to model performance?

SLIDE 39

Degradation in domain

SLIDE 41

Are biases removed?

SLIDE 42

Hidden biases - Adversarial Classifier

SLIDE 45

Hidden biases - Adversarial Data

SLIDE 47

What happens to specific biases?

SLIDE 48

Indicator Words

Poliak et al. (*SEM 2018); Gururangan et al. (NAACL 2018)

SLIDE 49

Decrease in correlation with contradiction

Relative improvement when predicting contradiction

SLIDE 50

What is this good for?

SLIDE 51

Are less biased models more transferable?

SLIDE 52

ACL 2019

SLIDE 53

Method 1 – Adversarial Hypothesis-Only Classifier

SLIDE 54

Method 2 – Adversarial Training Examples
SLIDE 55

Conclusions

  • Adversarial learning may help combat hypothesis-side

biases in NLI

  • Applicable to other tasks with one-sided biases: reading

comprehension, visual question answering, etc.

SLIDE 56

SiVL 2019

SLIDE 57

Conclusions

  • Adversarial learning may help combat hypothesis-side biases in NLI
  • Applicable to other tasks with one-sided biases
  • May reduce the amount of bias and improve transferability
  • But the methods should be handled with care:
    ○ Not all bias may be removed
    ○ The goal matters: some bias may be helpful in certain scenarios
  • Acknowledgements