Adversarial Training and Robustness for Multiple Perturbations
Florian Tramèr & Dan Boneh
NeurIPS 2019, Poster #87
Adversarial examples
[Figure: an image classified as "tabby cat" (88% confidence) is adversarially perturbed into one classified as "guacamole" (99% confidence)]
Szegedy et al., 2014; Goodfellow et al., 2015; Athalye, 2017
- ML models learn very different features than humans
- This is a safety concern for deployed ML models
- Classification in adversarial settings is hard
Adversarial training
Szegedy et al., 2014; Madry et al., 2017
1. Choose a set of perturbations S, e.g., noise of small ℓ∞ norm: S = {δ : ‖δ‖∞ ≤ ε}
2. For each example (x, y), find a worst-case adversarial example x̂ = x + δ*, where δ* = argmax_{δ ∈ S} L(f(x + δ), y)
3. Train the model on (x̂, y)
4. Repeat until convergence
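The inner maximization in the steps above is typically approximated with projected gradient descent (PGD, Madry et al., 2017). A minimal sketch, using a toy linear model with a hand-derived gradient in place of a neural network (`pgd_linf` and all names here are illustrative, not from the paper):

```python
import numpy as np

def pgd_linf(grad_fn, x, eps=0.3, alpha=0.05, steps=10):
    """Maximize the loss within the l_inf ball {delta : ||delta||_inf <= eps}
    by repeated signed-gradient ascent steps followed by projection."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)               # gradient of the loss w.r.t. the input
        delta = delta + alpha * np.sign(g)   # ascent step on the loss
        delta = np.clip(delta, -eps, eps)    # project back onto the l_inf ball
    return x + delta

# Toy stand-in for the model: logistic loss of a linear classifier
# f(x) = w.x on a positive example (y = +1).
w = np.array([1.0, -2.0, 0.5])
y = 1.0
loss = lambda x: np.log1p(np.exp(-y * w.dot(x)))
grad = lambda x: -y * w / (1.0 + np.exp(y * w.dot(x)))  # d loss / d x

x = np.array([0.2, -0.1, 0.4])
x_adv = pgd_linf(grad, x, eps=0.3)
assert np.max(np.abs(x_adv - x)) <= 0.3 + 1e-9  # stays inside the ball
assert loss(x_adv) > loss(x)                    # loss increased
```

In full adversarial training, `x_adv` (step 2) replaces `x` in the gradient update for the model's weights (step 3).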
How well does it work?
[Bar chart: accuracy of a CIFAR10 model adversarially trained with ℓ∞ noise: 96% with no noise, 70% under ℓ∞ noise, 16% under ℓ1 noise, 9% under rotation]
Engstrom et al., 2017; Sharma & Chen, 2018
Adversarial Training and Robustness for Multiple Perturbations
How to prevent other adversarial examples?
Adversarial Training and Robustness for Multiple Perturbations
How to prevent other adversarial examples?
S1 = {δ: ❘❘δ❘❘∞ ≤ ε∞} S2 = {δ: ❘❘δ❘❘1 ≤ ε1} S3 = {𝜀:«small rotation»}
- Adversary can
choose a perturbation type for each input
- Pick worst-case adversarial example from S
- Train the model on that example
Adversarial Training and Robustness for Multiple Perturbations
How to prevent other adversarial examples?
S1 = {δ: ❘❘δ❘❘∞ ≤ ε∞} S2 = {δ: ❘❘δ❘❘1 ≤ ε1} S3 = {𝜀:«small rotation»}
- S = S1 ⋃ S2 ⋃ S3
Adversary can choose a perturbation type for each input
Adversarial Training and Robustness for Multiple Perturbations
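Training against the union S = S1 ⋃ S2 ⋃ S3 amounts to running one attack per perturbation set and keeping whichever candidate hurts the model most. A minimal sketch, again on a toy linear model with deliberately simple one-shot attacks (`worst_of_attacks` and both attack functions are illustrative names, not the paper's implementation):

```python
import numpy as np

def worst_of_attacks(loss, attacks, x):
    """'Max over perturbation sets' strategy: run one attack per set
    S1, S2, ... and keep the candidate with the highest loss."""
    candidates = [attack(x) for attack in attacks]
    return max(candidates, key=loss)

# Toy model: logistic loss of a linear classifier on a positive example.
w = np.array([1.0, -2.0, 0.5])
loss = lambda x: np.log1p(np.exp(-w.dot(x)))

def linf_attack(x, eps=0.3):
    # For a linear model, the worst l_inf perturbation is a signed corner.
    return x + eps * np.sign(-w)

def l1_attack(x, eps=1.0):
    # Worst l_1 perturbation: spend the whole budget on one coordinate.
    i = np.argmax(np.abs(w))
    d = np.zeros_like(x)
    d[i] = -np.sign(w[i]) * eps
    return x + d

x = np.array([0.2, -0.1, 0.4])
x_adv = worst_of_attacks(loss, [linf_attack, l1_attack], x)
assert loss(x_adv) >= loss(x)  # the worst case is at least as bad as the clean point
```

The model would then be trained on `x_adv`, exactly as in single-perturbation adversarial training.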
Does this work?
- A robustness tradeoff is provably inherent in some classification tasks: increased robustness to one type of noise ⇒ decreased robustness to another
- Empirically validated on CIFAR10 & MNIST
- MNIST: for ℓ∞, ℓ1 and ℓ2 noise together, only ≈50% accuracy (training runs into gradient masking)
What if we combine perturbations?
[Figure: a natural image, a rotated version, a version with ℓ∞ noise, and a version with ½ rotation + ½ ℓ∞ noise]
[Bar chart: accuracy 96% with no noise, 70% under one noise type, 65% under one of two noise types, 55% under a mixture of two noise types]
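The "½ rotation + ½ ℓ∞ noise" mixture in the figure can be sketched as an affine split of the two budgets: rotate by half the allowed angle, then add noise of half the allowed ℓ∞ norm. A minimal, hand-rolled sketch (nearest-neighbor rotation and random signed noise stand in for an optimized attack; all names are illustrative):

```python
import numpy as np

def rotate(img, angle_deg):
    """Nearest-neighbor rotation about the image center (a stand-in for the
    'small rotation' perturbation; real pipelines would interpolate)."""
    h, w = img.shape
    t = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    src_y = np.cos(t) * (ys - cy) + np.sin(t) * (xs - cx) + cy
    src_x = -np.sin(t) * (ys - cy) + np.cos(t) * (xs - cx) + cx
    sy = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    sx = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    return img[sy, sx]

def half_and_half(img, max_angle, eps_inf, rng):
    """Affine combination: half the rotation budget plus half the l_inf budget."""
    out = rotate(img, 0.5 * max_angle)
    noise = 0.5 * eps_inf * np.sign(rng.standard_normal(img.shape))
    return np.clip(out + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((8, 8))              # toy 8x8 grayscale image in [0, 1]
adv = half_and_half(img, max_angle=30.0, eps_inf=0.3, rng=rng)
assert adv.shape == img.shape
assert adv.min() >= 0.0 and adv.max() <= 1.0
```

Such mixtures stay inside an affine combination of the two perturbation sets, yet the slide's numbers show they degrade accuracy below either attack alone.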
Conclusion
Adversarial training for multiple perturbation sets works, but...
- Significant loss in robustness
- Weak robustness to affine combinations of perturbations
Open questions:
- Train a single MNIST model with high robustness to any ℓp noise
- Better scaling of multi-perturbation adversarial training
- Which perturbations do we care about?
https://arxiv.org/abs/1904.13000
Poster #87