On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models - PowerPoint PPT Presentation


SLIDE 1

On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models

Paul Michel, Xian Li, Graham Neubig, Juan Pino

SLIDE 2

Adversarial Attacks/Perturbations

  • Apply a small (indistinguishable) perturbation to the input that elicits large changes in the output

SLIDE 3-6

Adversarial Attacks/Perturbations

  • Apply a small (indistinguishable) perturbation to the input that elicits large changes in the output

[Figure from Goodfellow et al. (2014)]

SLIDE 7-8

Indistinguishable Perturbations

  • Small perturbations are well defined in vision
    ○ Small ℓ2 distance ≈ indistinguishable to the human eye
  • What about text?

[Figure: images separated by a small ℓ2 distance]

SLIDE 9-15

Not all Text Perturbations are Equal

He's very friendly →
  • He's pretty friendly [Similar meaning] ✔
  • He's very annoying [Different meaning] ❌
  • He's She friendly [Nonsensical] ❌
  • He's very freindly [Typo] ✔

⇒ Can't expect the model to produce the same output!

✔ This paper: Why and How you should evaluate adversarial perturbations

SLIDE 16

A Framework for Evaluating Adversarial Attacks

SLIDE 17-23

Problem Definition

Original: Ils le réinvestissent directement en engageant plus de procès.
Adv. src: Ilss le réinvestissent dierctement en engagaent plus de procès.
Base output: They direct it directly by engaging more cases.
Adv. output: .. de plus.
Reference: They plow it right back into filing more troll lawsuits.

  • Attack: perturb the original source into the adversarial source
  • Evaluate: the outputs against the reference, and evaluate the source-side perturbation too!

SLIDE 24

Source Side Evaluation

  • Evaluate meaning preservation on the source side
  • i.e., use a similarity metric such that

    sim(He's pretty friendly, He's very friendly)
      > sim(He's very annoying, He's very friendly)
    sim(He's pretty friendly, He's very friendly)
      > sim(He's She friendly, He's very friendly)

[...]

SLIDE 25-31

Target Side Evaluation

  • Given a similarity metric on the target side
  • Evaluate relative meaning destruction on the target side
SLIDE 32-35

Successful Adversarial Attacks

  • Ensure that: target meaning destruction > source meaning destruction
  • i.e., destroy the meaning on the target side more than on the source side
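The criterion above can be written out explicitly. This is a sketch in notation assumed here for illustration (the slides' own symbols were lost in extraction): s_src and s_tgt are the source- and target-side similarity metrics, x and x̂ the original and adversarial sources, y and ŷ their outputs, and y_ref the reference.

```latex
% Notation assumed for illustration, not taken verbatim from the slides.
% Source-side meaning destruction:
1 - s_{\mathrm{src}}(x, \hat{x})
% Target-side (relative) meaning destruction:
d_{\mathrm{tgt}} =
  \frac{s_{\mathrm{tgt}}(y, y_{\mathrm{ref}}) - s_{\mathrm{tgt}}(\hat{y}, y_{\mathrm{ref}})}
       {s_{\mathrm{tgt}}(y, y_{\mathrm{ref}})}
% A successful attack hurts the target side more than the source side:
d_{\mathrm{tgt}} > 1 - s_{\mathrm{src}}(x, \hat{x})
```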
SLIDE 36

Which similarity metric to use?

  • Human evaluation
    ○ 6-point scale, details in paper

"How would you rate the similarity between the meaning of these two sentences?"
0. The meaning is completely different or one of the sentences is meaningless
1. The topic is the same but the meaning is different
2. Some key information is different
3. The key information is the same but the details differ
4. Meaning is essentially the same but some expressions are unnatural
5. Meaning is essentially equal and the two sentences are well-formed
SLIDE 37-39

Which similarity metric to use?

  • Human evaluation
    ○ 6-point scale, details in paper
  • BLEU [Papineni et al., 2002]
    ○ Geometric mean of n-gram precisions + length penalty
  • METEOR [Banerjee and Lavie, 2005]
    ○ Word matching taking into account stemming, synonyms, paraphrases...
  • chrF [Popović, 2015]
    ○ Character n-gram F-score
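Since chrF is central to the rest of the talk, here is a minimal sketch of a sentence-level character n-gram F-score in Python. It is simplified and unsmoothed; real chrF implementations (e.g. in sacrebleu) handle smoothing and whitespace more carefully.

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams; spaces are stripped first, as chrF commonly does.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF [Popović, 2015]: an F-score over
    character n-gram precision and recall, averaged over n = 1..max_n,
    with recall weighted by beta (beta=2 favors recall)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue  # sentence too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

Identical sentences score 1.0, disjoint ones 0.0, and small perturbations land in between, which is what makes chrF usable as a similarity metric here.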

SLIDE 40

Experimental Setting

SLIDE 41

Data and Models

  • Data
    ○ IWSLT 2016 dataset
    ○ {Czech, German, French} → English
  • Models
    ○ LSTM-based model
    ○ Transformer-based model
    ○ Both word- and subword-based models

SLIDE 42-47

Gradient Based Adversarial Attacks on Text

  • Idea: Back-propagate through the model to score possible substitutions

[Diagram: the encoder reads "Le gros chien .", the decoder produces "The big dog . <eos>"; an adversarial loss is back-propagated to the source embeddings to score candidate substitutions such as ", / chat / petit / un ..."]

SLIDE 48

Constrained Adversarial Attacks

SLIDE 49

Constrained Adversarial Attacks: kNN

  • Only replace words with their 10 nearest neighbors in embedding space
  • Example from our fr→en Transformer source embeddings:
    ○ grand (tall SING+MASC)
      ■ grands (tall PL+MASC)
      ■ grande (tall SING+FEM)
      ■ grandes (tall PL+FEM)
      ■ gros (fat SING+MASC)
      ■ grosse (fat SING+FEM)
    ○ math (math)
      ■ maths (maths)
      ■ mathématique (mathematic)
      ■ mathématiques (mathematics)
      ■ objective (objective [ADJ] SING+FEM)
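The kNN constraint itself is easy to sketch: the allowed substitutions for a word are its k nearest neighbors in the model's embedding space. A minimal NumPy version, using cosine similarity (function and argument names are illustrative, not the paper's code):

```python
import numpy as np

def knn_candidates(word_id, embeddings, k=10):
    """Return the k nearest neighbors of a word by cosine similarity,
    i.e. the only substitutions the kNN-constrained attack may use.
    embeddings: (vocab_size, dim) matrix of word embeddings."""
    # Normalize rows so dot products become cosine similarities.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e[word_id]        # cosine similarity to every word
    sims[word_id] = -np.inf      # exclude the word itself
    return np.argsort(-sims)[:k]  # indices of the k most similar words
```

Restricting the attack to this candidate set is what keeps substitutions like grand → grande (rather than grand → arbitrary word) and hence preserves more of the source meaning.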

SLIDE 50

Constrained Adversarial Attacks: CharSwap

  • Only swap word-internal characters to get OOVs
    ○ grand → grnad
    ○ adversarial → advresarial
    ○ [...]
  • If that's impossible, repeat the last character
    ○ he → heeeeeee

⇒ Realistic typos
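A minimal CharSwap-style perturbation can be sketched as follows. The choice of which adjacent internal pair to swap is an assumption here; the slide only specifies that word-internal characters are swapped to produce an OOV, with last-character repetition as the fallback for short words.

```python
import random

def charswap(word, rng=random):
    """Sketch of a CharSwap perturbation: swap two adjacent
    word-internal characters (never the first or last character),
    e.g. grand -> grnad. Words too short to have two internal
    characters fall back to repeating the last one (he -> heeeeeee)."""
    if len(word) >= 4:
        i = rng.randrange(1, len(word) - 2)  # position of the left char
        chars = list(word)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return "".join(chars)
    return word + word[-1] * 6  # fallback: repeat the last character
```

Because the swapped word is (almost always) out-of-vocabulary, this attacks exactly the model's handling of unseen tokens while staying a plausible human typo.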

SLIDE 51

Constrained Adversarial Attacks

SLIDE 52-54

Choosing a Similarity Metric

  • Human vs. automatic (Pearson r):
    ○ Humans score original/adversarial input
    ○ Humans score original/adversarial output
    ○ Compare scores to the automatic metrics with Pearson correlation
  • chrF correlates better
    ⇒ source-side similarity := chrF
    ⇒ target-side destruction := RDchrF (Relative Decrease in chrF)
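Putting the pieces together, here is a sketch of RDchrF and the success check (function names are mine; scores are on the 0-100 chrF scale used on the example slides, where source chrF = 80.89 with target RDchrF = 84.06 counts as a success):

```python
def rdchrf(base_chrf, adv_chrf):
    """Relative Decrease in chrF on the target side, in percent:
    the fraction of the base output's chrF (against the reference)
    that the adversarial output loses."""
    if base_chrf == 0:
        return 0.0
    return 100.0 * (base_chrf - adv_chrf) / base_chrf

def is_successful(source_chrf, target_rdchrf):
    """Success criterion from the earlier slides: the attack must
    destroy more meaning on the target side than on the source side,
    i.e. target_rdchrf > 100 - source_chrf (on the 0-100 scale)."""
    return target_rdchrf > 100.0 - source_chrf
```

On the later example slides this gives the expected verdicts: (80.89, 84.06) is successful, while (54.46, 0.00) is not.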

SLIDE 55

Effect of Constraints on Evaluation

[Plot: attacks compared along "Better source preservation" and "Better target destruction" axes]
SLIDE 56

Effect of Constraints on Adversarial Training

SLIDE 57-58

Effect of Constraints on Adversarial Training

  • Adversarial training ≈ training with adversarial examples
    ○ 𝛽 = 0: standard training
    ○ 𝛽 = 1: training only on adversarial examples
  • Training with Unconstrained attacks vs. CharSwap attacks
  • Evaluate on:
    ○ Robustness to CharSwap attacks
    ○ Accuracy on non-adversarial data

[Diagram: training mixes standard input and adversarial input according to 𝛽]
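One simple way to realize the 𝛽 interpolation is to replace each training source by its adversarially perturbed version with probability 𝛽. This is a sketch under that assumption (the paper may mix standard and adversarial losses differently); `perturb` stands for any attack, e.g. CharSwap.

```python
import random

def make_training_batch(pairs, perturb, beta, rng=random):
    """Sketch of adversarial training: each (source, target) pair keeps
    its original source with probability 1 - beta, and uses the
    adversarially perturbed source with probability beta.
    beta = 0 -> standard training; beta = 1 -> only adversarial inputs."""
    batch = []
    for src, tgt in pairs:
        if rng.random() < beta:
            src = perturb(src)  # attack the source side only
        batch.append((src, tgt))
    return batch
```

The target side is left untouched: the model is trained to produce the correct translation despite the perturbed input.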

SLIDE 59-62

Effect of Constraints on Adversarial Training: Adversarial Robustness

  • Robustness to CharSwap attacks on the validation set

[Plot: lower is better]

  • Adversarial training ⇒ better robustness
SLIDE 63-66

Effect of Constraints on Adversarial Training: Accuracy on Non-Adversarial Input

  • Target chrF on the original test set

[Plot: higher is better]

  • Unconstrained attacks ⇒ hurt accuracy
SLIDE 67-69

Takeaway

  • When doing adversarial attacks
    ○ Evaluate meaning preservation on the source side
  • When doing adversarial training
    ○ Consider adding constraints to your attacks
  • Not only true for seq2seq!
    ○ Easily transposed to classification, etc.
    ○ Just adapt the similarity metrics accordingly

SLIDE 70

TEAPOT

  • Tool implementing our evaluation framework
  • pip install teapot-nlp
  • github.com/pmichel31415/teapot-nlp
SLIDE 71

Questions

SLIDE 72

Gradient Based Adversarial Attacks on Text

  • Idea: Word substitution ⟺ adding a word vector difference
  • Use the 1st-order approximation to maximize the loss
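This first-order trick can be sketched with NumPy: substituting w' for w shifts the input embedding by (e_w' − e_w), so to first order the adversarial loss changes by ∇L · (e_w' − e_w), and the attack keeps the candidate maximizing that estimate. Function and argument names are illustrative.

```python
import numpy as np

def best_substitution(word_id, loss_grad, embeddings, candidates):
    """First-order substitution scoring: swapping word w for w' shifts
    the input embedding by (e_w' - e_w), so the loss changes by
    approximately loss_grad . (e_w' - e_w). Return the candidate that
    maximizes this estimated loss increase.
    loss_grad: gradient of the adversarial loss w.r.t. the word's
    embedding, shape (dim,); embeddings: (vocab_size, dim)."""
    deltas = embeddings[candidates] - embeddings[word_id]  # (k, dim)
    scores = deltas @ loss_grad                            # (k,)
    return candidates[int(np.argmax(scores))]
```

One backward pass thus scores every candidate substitution at once, instead of re-running the model per candidate; the `candidates` list is where the kNN or CharSwap constraints plug in.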
SLIDE 73

Human Evaluation: the Gold Standard

"How would you rate the similarity between the meaning of these two sentences?"
0. The meaning is completely different or one of the sentences is meaningless
1. The topic is the same but the meaning is different
2. Some key information is different
3. The key information is the same but the details differ
4. Meaning is essentially the same but some expressions are unnatural
5. Meaning is essentially equal and the two sentences are well-formed

Check for semantic similarity and fluency

SLIDE 74

Example of a Successful Attack
(source chrF = 80.89, target RDchrF = 84.06)

Original: Ils le réinvestissent directement en engageant plus de procès.
Adv. src: Ilss le réinvestissent dierctement en engagaent plus de procès.
Ref.: They plow it right back into filing more troll lawsuits.
Base output: They direct it directly by engaging more cases.
Adv. output: .. de plus.

SLIDE 75

Example of an Unsuccessful Attack
(source chrF = 54.46, target RDchrF = 0.00)

Original: C'était en Juillet 1969.
Adv. src: C'étiat en Jiullet 1969.
Ref.: This is from July, 1969.
Base output: This was in July 1969.
Adv. output: This is. in 1969.