Semantically Equivalent Adversarial Rules for Debugging NLP Models


SLIDE 1

Semantically Equivalent Adversarial Rules for Debugging NLP Models

Marco Tulio Ribeiro, Carlos Guestrin,

Sameer Singh (UC Irvine)

SLIDE 2

NLP / ML models are getting smarter: VQA

What type of road sign is shown? > STOP.

Visual7W [Zhu et al 2016]

SLIDE 3

NLP / ML models are getting smarter: Machine Comprehension (SQuAD)

The biggest city on the river Rhine is Cologne, Germany, with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi).

How long is the Rhine? > 1,230 km

Question: are they prone to oversensitivity?

BiDAF [Seo et al 2017]

SLIDE 4

Oversensitivity in images

Adversarial examples are indistinguishable from the original to humans… but unlikely to occur in the real world (except in deliberate attacks)

“panda” (57.7% confidence) → “gibbon” (99.3% confidence)

SLIDE 5

Adversarial examples

Find closest example with different prediction
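
As an equation, this is the standard adversarial-example formulation (a sketch: d is some distance in input space, f is the black-box model, x the original input):

```latex
% the closest input whose prediction differs from the original
x' = \arg\min_{\tilde{x}\,:\,f(\tilde{x}) \neq f(x)} d(x, \tilde{x})
```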

SLIDE 6

What about text?

Original: What type of road sign is shown? > STOP.
Typo version: What type of road sign is sho wn?

Perceptible by humans, unlikely in the real world

SLIDE 7

What about text?

What type of road sign is shown? > STOP.

A single word changes too much!

SLIDE 8

Semantics matter

What type of road sign is shown? > STOP.
Which type of road sign is shown? > Do not Enter.

Bug, and likely in the real world

SLIDE 9

Semantics matter

The biggest city on the river Rhine is Cologne, Germany, with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi).

How long is the Rhine? > 1,230 km
How long is the Rhine? > More than 1,050,000

Not all changes are the same: semantics matter

SLIDE 10

Adversarial Rules

Find rule that generates many adversaries

SLIDE 11

Generalizing adversaries

What type of road sign is shown? > STOP.
Which type of road sign is shown? > Do not Enter.

Flips 3.9% of examples

Rule: (What NOUN → Which NOUN)

SLIDE 12

Semantics matter

What color is the sky? > Blue.
Which color is the sky? > Gray.

Flips 3.9% of examples

Rule: (What NOUN → Which NOUN)

SLIDE 13

Semantics matter

The biggest city on the river Rhine is Cologne, Germany, with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi).

How long is the Rhine? > 1,230 km
How long is the Rhine? > More than 1,050,000

Flips 3% of examples

Rule ? ??

SLIDE 14

Semantics matter

Detailed investigation of chum salmon, Oncorhynchus keta, showed that these fish digest ctenophores 20 times as fast as an equal weight of shrimps.

What is the oncorhynchus also called? > chum salmon
What is the oncorhynchus also called? > Oncorhynchus keta

Flips 3% of examples

Rule ? ??

SLIDE 15

Adversarial Rules

Rules are global and actionable, more interesting than individual adversaries

SLIDE 16

Semantically Equivalent Adversary (SEA)

SLIDE 17

Ingredients

  • A semantic score function
  • A black box model

SEA: a semantically equivalent input with a different prediction
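
A sketch of how the ingredients combine (notation assumed here: S is the semantic score, τ an equivalence threshold, f the black-box model):

```latex
% x' is a semantically equivalent adversary (SEA) for x
\mathit{SemEq}(x, x') = \mathbb{1}\!\left[S(x, x') \geq \tau\right],
\qquad
\mathit{SEA}(x, x') = \mathbb{1}\!\left[\mathit{SemEq}(x, x') \wedge f(x) \neq f(x')\right]
```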

SLIDE 18

Revisiting adversaries

Find closest example with different prediction

SLIDE 19

Semantic Similarity: Paraphrasing

[Diagram: sentence x goes through translators (en → pt, en → fr) to Portuguese and French translations, then through back-translators (pt → en, fr → en) to English paraphrase candidates]

“Good movie!” → “Bom filme!” / “Bon film!” → scored candidates: Good movie (0.35), Good film (0.34), Great movie (0.1), …, Movie good (0.001)

Language model comes for free

[Mallinson et al, 2017]
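
A minimal sketch of this scoring idea, assuming hypothetical translators / back_translators wrappers (the real system uses neural MT models over several pivot languages and aggregates decoder probabilities):

```python
from collections import defaultdict

def paraphrase_scores(sentence, translators, back_translators, beam=10):
    """Score English paraphrase candidates of `sentence` via back-translation.

    translators / back_translators are hypothetical MT wrappers:
      translators[pivot].translate(text) -> text in the pivot language
      back_translators[pivot].candidates(text, beam) -> [(english_text, prob)]
    """
    scores = defaultdict(float)
    for pivot in translators:                      # e.g. "pt", "fr"
        pivot_text = translators[pivot].translate(sentence)
        for candidate, prob in back_translators[pivot].candidates(pivot_text, beam):
            scores[candidate] += prob / len(translators)   # average over pivots
    return dict(scores)                            # e.g. {"Good movie": 0.35, ...}
```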

SLIDE 20

Finding an adversary

What color is the tray? > Pink

Paraphrase candidates:
  • What colour is the tray? > Green
  • Which color is the tray? > Green
  • What color is it? > Green
  • What color is tray? > Pink
  • How color is the tray? > Green
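
A sketch of the adversary search over such candidates; predict stands for the black-box model, and scores_fn is any text → scores function, e.g. the hypothetical paraphrase_scores helper above with the translators bound in:

```python
def find_sea(x, predict, scores_fn, threshold=0.8):
    """Return the best semantically equivalent adversary for x, or None.

    predict: the black-box model, text -> label
    scores_fn: text -> {paraphrase: semantic score}
    """
    original = predict(x)                     # e.g. "Pink"
    best = None
    for candidate, score in scores_fn(x).items():
        if score < threshold:                 # not semantically equivalent enough
            continue
        if predict(candidate) == original:    # prediction did not flip
            continue
        if best is None or score > best[1]:
            best = (candidate, score)
    return best                               # (adversary, score) or None
```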

SLIDE 21

Semantically Equivalent Adversarial Rules (SEARs)

SLIDE 22

From SEAs to Rules

Find SEAs → Propose Candidate Rules → Select Small Rule Set

SLIDE 23

Proposing Candidate Rules

SEA: What type of road sign is shown? → Which type of road sign is shown?

Candidate Rules (generalizing via exact match, context, and POS tags):
  • (What → Which)
  • (What type → Which type)
  • (What NOUN → Which NOUN)
  • (WP type → Which type)
  • (WP NOUN → Which NOUN)
  • …

Applying (What → Which) to other questions:
  • What → Which type of road sign is shown?
  • What → Which is the person looking at?
  • What → Which was I thinking?

Must not change semantics
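
A toy sketch of rule proposal from a single SEA pair, assuming a hypothetical pos_tag helper (e.g. a spaCy tagger) and a one-token difference between the two questions:

```python
def propose_rules(orig_tokens, pert_tokens, pos_tag):
    """Propose rewrite rules at increasing generality from one SEA pair."""
    tags = pos_tag(orig_tokens)                       # e.g. ["WP", "NOUN", ...]
    diffs = [i for i, (a, b) in enumerate(zip(orig_tokens, pert_tokens)) if a != b]
    if len(diffs) != 1:                               # toy version: single-token edits only
        return []
    i = diffs[0]
    src, dst = orig_tokens[i], pert_tokens[i]
    rules = [(src, dst)]                              # exact match: (What -> Which)
    if i + 1 < len(orig_tokens):
        nxt, nxt_tag = orig_tokens[i + 1], tags[i + 1]
        rules += [
            (f"{src} {nxt}", f"{dst} {nxt}"),         # context: (What type -> Which type)
            (f"{src} {nxt_tag}", f"{dst} {nxt_tag}"), # (What NOUN -> Which NOUN)
            (f"{tags[i]} {nxt}", f"{dst} {nxt}"),     # (WP type -> Which type)
            (f"{tags[i]} {nxt_tag}", f"{dst} {nxt_tag}"),  # (WP NOUN -> Which NOUN)
        ]
    return rules
```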

SLIDE 24

From SEAs to Rules

Find SEAs → Propose Candidate Rules → Select Small Rule Set

SLIDE 25

Semantically Equivalent Adversarial Rules (SEARs)

Selection criteria:
  • High Adversary Count: induces many flipped predictions
  • Non-Redundancy: flips predictions not already flipped by other rules

Selected Rules:
  • (What NOUN → Which NOUN)
  • (What type → Which type)
  • (color → colour)
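
A sketch of a greedy selection pass for these two criteria; flipped_by_rule is a hypothetical precomputed map from each candidate rule to the set of validation instances it flips:

```python
def select_rules(candidates, flipped_by_rule, budget=10):
    """Greedily pick rules that flip many predictions not already flipped."""
    selected, covered = [], set()
    remaining = list(candidates)
    for _ in range(budget):
        best = max(remaining, key=lambda r: len(flipped_by_rule[r] - covered),
                   default=None)
        if best is None or not (flipped_by_rule[best] - covered):
            break                                  # no rule adds new flips
        selected.append(best)
        covered |= flipped_by_rule[best]           # non-redundancy: count new flips only
        remaining.remove(best)
    return selected
```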

SLIDE 26

Examples: VQA

Visual7W-Telling [Zhu et al 2016]

SLIDE 27

Examples: Machine Comprehension

BiDAF [Seo et al 2017]

SLIDE 28

Examples: Movie Review Sentiment Analysis

FastText [Joulin et al 2016]

SLIDE 29

Experiments

SLIDE 30
1. SEAs vs Humans

SLIDE 31

Setup

Conditions: Humans alone; top-scored SEA; SEA (top 5) + Human

Evaluate adversaries for semantic equivalence

SLIDE 32

How often can SEAs be produced?

[Bar charts: % of instances with a semantically equivalent adversary, for Human, SEA, and Human + SEA, on Visual Question Answering (33.6, 36, 45) and Sentiment Analysis (25.3, 26, 33)]

SEAs find equivalent adversaries as often as Humans; SEAs + Humans do better than Humans alone.

SLIDE 33

Humans produce different adversaries:

Humans did not produce these: But they did produce these:

  • They are so easy to love…
  • What kind of meat is on the boy’s plate?
  • How many suitcases?
  • Also great directing and photography

SLIDE 34
2. SEARs vs Experts

SLIDE 35

Part 1: experts come up with rules

Objective: uncover as many mistakes as possible using only good (semantics-preserving) rules

SLIDE 36

Part 2: experts evaluate our SEARs

Experts only accept good rules

SLIDE 37

Results

[Bar charts. Left, % of correct predictions flipped: SEARs outperform Experts (Visual QA: 10.9 vs 3.3; Sentiment: 14.2 vs 3.0). Right, time in minutes: evaluating SEARs is faster than finding rules (Visual QA: 5.4 vs 12.9; Sentiment: 10.1 vs 16.9)]

SLIDE 38
3. Fixing bugs

SLIDE 39

Closing the loop

Accepted rules: (color → colour), (WP VBZ → WP’s), …

Filter out bad rules → Augment training data → Retrain model
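
A sketch of the augmentation step, with hypothetical apply_rule and retrain helpers; rewritten inputs keep their original labels, since accepted rules preserve semantics:

```python
def augment_with_sears(train_pairs, accepted_rules, apply_rule):
    """Add rule-rewritten copies of training examples (labels unchanged)."""
    augmented = list(train_pairs)                   # train_pairs: [(text, label)]
    for rule in accepted_rules:                     # e.g. ("color", "colour")
        for text, label in train_pairs:
            rewritten = apply_rule(rule, text)      # None if the rule does not match
            if rewritten and rewritten != text:
                augmented.append((rewritten, label))
    return augmented

# model = retrain(augment_with_sears(train_pairs, accepted_rules, apply_rule))
```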

SLIDE 40

Results

[Bar chart: % of flips due to bugs, Original vs Augmented. Visual QA: 12.6 → 3.4; Sentiment: 12.6 → 1.4]

Fix bugs, no loss in accuracy

SLIDE 41

Conclusion

Semantics matter

Models are prone to these bugs; SEAs and SEARs help find and fix them.

SLIDE 42

Semantically Equivalent Adversarial Rules for Debugging NLP Models

Marco Tulio Ribeiro, Carlos Guestrin,

Sameer Singh (UC Irvine)

SLIDE 43

Semantic scoring is still a research problem…

Also: inaccurate for long texts

SLIDE 44

Problem: not comparable across instances

Raw paraphrase scores vs. scores normalized by the identity paraphrase:

  “Good movie”:  Good movie 0.35 → 1;  Good film 0.34 → 0.97;  Great movie 0.1 → 0.29;  …
  “good”:  good 0.7 → 1;  great 0.2 → 0.29;  excellent 0.05 → 0.07;  …
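
A sketch of the normalization these columns suggest: divide each candidate's score by the score of reproducing the input itself, so that scores become comparable across instances (paraphrase_scores as in the earlier sketch, bound to a single text → scores function):

```python
def normalized_scores(sentence, paraphrase_scores):
    """Rescale paraphrase scores by the identity paraphrase's score."""
    scores = paraphrase_scores(sentence)
    identity = scores.get(sentence)
    if not identity:                     # identity candidate missing or zero
        return scores
    # "Good film": 0.34 / 0.35 ≈ 0.97;  "great": 0.2 / 0.7 ≈ 0.29
    return {c: min(1.0, s / identity) for c, s in scores.items()}
```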

SLIDE 45

Examples: VQA

SLIDE 46

Examples: Movie Review Sentiment Analysis

FastText [Joulin et al 2016]
