

slide-1
SLIDE 1

Certified Robustness to Adversarial Examples with Differential Privacy

Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana
Columbia University

Code: https://github.com/columbia/pixeldp
 Contact: mathias@cs.columbia.edu

slide-2
SLIDE 2

Deep Learning

  • Deep Neural Networks (DNNs) deliver remarkable performance on many tasks.
  • DNNs are increasingly deployed, including in attack-prone contexts:

Taylor Swift Said to Use Facial Recognition to Identify Stalkers

By Sopan Deb, Natasha Singer - Dec. 13, 2018

slide-3
SLIDE 3

Example

[Figure: a DNN pipeline: input x → layer 1 → layer 2 → layer 3 → softmax, producing scores 0.1, 0.2, 0.1, 0.6 over the classes ticket 1, ticket 2, ticket 3, no ticket.]

slide-4
SLIDE 4

Example

[Figure: the same DNN followed by argmax: input x → layers → softmax scores 0.1, 0.2, 0.1, 0.6 → argmax → "no ticket".]

But DNNs are vulnerable to adversarial example attacks.

slide-5
SLIDE 5

Example

[Figure: adding a small adversarial perturbation to input x changes the softmax scores from 0.1, 0.2, 0.1, 0.6 to 0.1, 0.7, 0.1, 0.1, so argmax predicts "ticket 2" instead of "no ticket".]

But DNNs are vulnerable to adversarial example attacks.

slide-6
SLIDE 6

Accuracy under attack

Inception-v3 DNN on the ImageNet dataset.

[Figure: top-1 accuracy vs. size of attack α (2-norm); accuracy drops quickly as the attack grows. Example adversarial images: at ||α||₂ = 0.52 the label changes between "teddy bear" and "giant panda"; at ||α||₂ = 1.06 it becomes "teapot".]

slide-7
SLIDE 7
Best-effort approaches

  • 1. Evaluate accuracy under attack:
    • Launch an attack on examples in a test set.
    • Compute accuracy on the attacked examples (see the sketch after this list).
  • 2. Improve accuracy under attack:
    • Many approaches: e.g. train on adversarial examples (e.g. Goodfellow+ '15; Papernot+ '16; Buckman+ '18; Guo+ '18).

Problem: both steps are attack-specific, leading to an arms race that attackers are winning (e.g. Carlini-Wagner '17; Athalye+ '18).
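For concreteness, a minimal sketch of step 1 in Python; `model_predict` and `attack` are hypothetical callables standing in for a trained classifier and an attack-specific adversarial-example generator, not anything from PixelDP itself.

```python
def accuracy_under_attack(model_predict, attack, test_set):
    """Accuracy measured on attacked test examples.

    model_predict(x) -> predicted label; attack(model_predict, x, y) -> x_adv.
    Both are assumed, illustrative callables.
    """
    correct = 0
    for x, y in test_set:
        x_adv = attack(model_predict, x, y)   # attack-specific step
        correct += int(model_predict(x_adv) == y)
    return correct / len(test_set)
```

The arms-race problem is visible in this sketch: the number it returns is only meaningful for the specific `attack` that was run.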

slide-8
SLIDE 8
Key questions

  • Guaranteed accuracy: what is my minimum accuracy under any attack?
  • Prediction robustness: given a prediction, can any attack change it?

slide-9
SLIDE 9
Key questions

  • A few recent approaches with provable guarantees (e.g. Wong-Kolter '18; Raghunathan+ '18; Wang+ '18).
  • Poor scalability in terms of:
    • Input dimension (e.g. number of pixels).
    • DNN size.
    • Size of training data.


slide-10
SLIDE 10
Key questions

  • My defense, PixelDP, gives answers for norm-bounded attacks.
  • Key idea: a novel use of differential privacy theory at prediction time.
  • The most scalable approach: first provable guarantees for large models on ImageNet!


slide-11
SLIDE 11

PixelDP outline

Motivation Design Evaluation

11

slide-12
SLIDE 12

Key idea

  • Problem: small input perturbations create large score changes.

[Figure: adding a small perturbation to input x changes the softmax scores from 0.1, 0.6, 0.1, 0.2 to 0.1, 0.7, 0.1, 0.1, so argmax now predicts "ticket 2".]

slide-13
SLIDE 13

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: the same perturbed-input pipeline: input x plus a small perturbation → layers → softmax → argmax.]

slide-14
SLIDE 14
Differential Privacy

  • Differential Privacy (DP): a technique to randomize a computation over a database, such that changing one data point can only lead to bounded changes in the distribution over possible outputs.
  • For an (ε, δ)-DP randomized computation A_f, any two databases d, d' differing in one data point, and any set of outputs S:

    P(A_f(d) ∈ S) ≤ e^ε · P(A_f(d') ∈ S) + δ

  • We prove the Expected Output Stability Bound: for any (ε, δ)-DP mechanism with bounded outputs in [0, 1],

    E(A_f(d)) ≤ e^ε · E(A_f(d')) + δ
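The Expected Output Stability Bound follows from the DP definition in one step; here is a sketch of the standard argument for an output bounded in [0, 1]:

```latex
% For a random variable X with values in [0, 1], E[X] = \int_0^1 P(X > t)\,dt.
\mathbb{E}[A_f(d)]
  = \int_0^1 P\big(A_f(d) > t\big)\,dt
  \le \int_0^1 \Big( e^{\varepsilon}\, P\big(A_f(d') > t\big) + \delta \Big)\,dt
  = e^{\varepsilon}\, \mathbb{E}[A_f(d')] + \delta .
```

This is the property PixelDP uses at prediction time: if each softmax score is a DP function of the input, its expectation can only move by a bounded factor under a bounded change of the input.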

slide-15
SLIDE 15

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: the prediction pipeline to be made DP ("Make prediction DP"): input x → layers → softmax scores 0.1, 0.2, 0.1, 0.6 → argmax → "no ticket".]

slide-16
SLIDE 16

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: the DP prediction pipeline ("Make prediction DP"): input x → layers → softmax scores 0.1, 0.2, 0.1, 0.6 → argmax, with stability bounds drawn on the class-score bar chart.]

slide-17
SLIDE 17

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: same as the previous slide (animation step).]

slide-18
SLIDE 18

PixelDP architecture

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
slide-19
SLIDE 19

PixelDP architecture

[Figure: input x → layer 1 → noise layer (noise added to the activations) → layer 2 → layer 3 → softmax → scores 0.2, 0.1, 0.1, 0.6.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.

slide-20
SLIDE 20

PixelDP architecture

[Figure: the same pipeline; everything from the noise layer through the softmax is bracketed as (ε, δ)-DP with respect to the input x.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.

slide-21
SLIDE 21

PixelDP architecture

  • Resilience to post-processing: any computation on the output of an (ε, δ)-DP mechanism is still (ε, δ)-DP.

[Figure: in the pipeline, the layers after the noise layer are post-processing, so the final softmax scores remain (ε, δ)-DP.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
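To make step 1 concrete, here is a minimal NumPy sketch of a Gaussian noise layer. The noise scale uses the standard Gaussian-mechanism calibration (valid for ε < 1); the sensitivity and attack-bound values are illustrative assumptions, not the actual PixelDP implementation (see the repo for that).

```python
import numpy as np

def gaussian_sigma(l2_sensitivity, attack_bound, eps, delta):
    """Standard Gaussian-mechanism noise scale for (eps, delta)-DP against
    input changes of 2-norm up to attack_bound (classic analysis needs eps < 1)."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity * attack_bound / eps

def noise_layer(pre_noise_activations, sigma, rng):
    """Add i.i.d. Gaussian noise to the activations of the layer just before it."""
    return pre_noise_activations + rng.normal(0.0, sigma, size=pre_noise_activations.shape)

# Illustrative numbers only.
rng = np.random.default_rng(0)
sigma = gaussian_sigma(l2_sensitivity=1.0, attack_bound=0.1, eps=0.5, delta=1e-5)
noisy = noise_layer(rng.standard_normal((1, 128)), sigma, rng)   # fake activations
```

The scale grows with the pre-noise sensitivity and the attack bound and shrinks as ε grows; every layer after this one is post-processing, so the scores inherit the guarantee.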

slide-22
SLIDE 22

PixelDP architecture

  • Compute the empirical mean scores with a standard Monte Carlo estimate.

[Figure: the DP DNN is run multiple times on the same input x; each run draws fresh noise and yields slightly different softmax scores (e.g. 0.1, 0.2, 0.1, 0.6 vs. 0.2, 0.1, 0.1, 0.6), which are averaged.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
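Step 2, sketched in NumPy; `dp_forward` is a hypothetical callable for one stochastic forward pass through the DP DNN (fresh noise on every call).

```python
import numpy as np

def mc_mean_scores(dp_forward, x, n_draws, rng):
    """Empirical mean of the DP DNN's softmax scores over fresh noise draws.

    dp_forward(x, rng) is assumed to return one (n_classes,) score vector,
    drawing new noise in the noise layer on every call.
    """
    scores = np.stack([dp_forward(x, rng) for _ in range(n_draws)])  # (n_draws, n_classes)
    return scores.mean(axis=0)
```

More draws tighten the estimate (and the η-confidence intervals of the next step) at the cost of more forward passes.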
slide-23
SLIDE 23

PixelDP architecture

[Figure: the empirical mean scores come with η-confidence intervals; combined with the stability bounds, they bound each class's expected score under attack (bar chart over the classes stalker 1, stalker 2, stalker 3, harmless).]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
slide-24
SLIDE 24

PixelDP architecture

[Figure: same as the previous slide (animation step): η-confidence intervals and stability bounds on the estimated scores.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
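Putting steps 2 and 3 together: a prediction can be certified robust when the winning class's expected score still dominates every other class's after the stability bound is applied to both sides. Below is a NumPy sketch; the confidence bounds use Hoeffding's inequality with a union bound over classes (one way to obtain η-confidence intervals), and the e^(2ε) and (1+e^ε)δ factors come from applying the Expected Output Stability Bound to the winner and to each runner-up. Function names are illustrative, not the PixelDP API.

```python
import numpy as np

def score_bounds(mean_scores, n_draws, eta, n_classes):
    """Simultaneous eta-confidence bounds on the expected scores in [0, 1]
    (Hoeffding inequality plus a union bound over classes)."""
    half_width = np.sqrt(np.log(2.0 * n_classes / (1.0 - eta)) / (2.0 * n_draws))
    lower = np.clip(mean_scores - half_width, 0.0, 1.0)
    upper = np.clip(mean_scores + half_width, 0.0, 1.0)
    return lower, upper

def is_certified_robust(lower, upper, eps, delta):
    """Sufficient robustness condition: apply E[A_k(x)] <= e^eps * E[A_k(x')] + delta
    to the winning class and to each runner-up, which yields the e^(2*eps) and
    (1 + e^eps)*delta factors below."""
    k = int(np.argmax(lower))
    runner_up = np.delete(upper, k).max()
    return bool(lower[k] > np.exp(2.0 * eps) * runner_up + (1.0 + np.exp(eps)) * delta)
```

When the check fails, the prediction is still returned; it simply is not certified (and, in a deployment like the ticket example, could be handled by a fallback).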
slide-25
SLIDE 25
Further challenges

  • Train the DP DNN with noise.
  • Control pre-noise sensitivity during training.
  • Support various attack norms (L0, L1, L2, L∞).
  • Scale to large DNNs and datasets.
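The noise scale above is calibrated to the pre-noise sensitivity, which is why the second bullet matters: if that sensitivity drifts during training, the DP guarantee drifts with it. Here is a minimal NumPy sketch of one way to pin it down, under the assumption that the noise layer sits right after a single linear layer: keep that layer's L2→L2 operator norm (largest singular value) at most 1.

```python
import numpy as np

def spectral_norm(W, n_iter=50, rng=None):
    """Largest singular value of W, estimated by power iteration."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ (W @ v))

def normalize_pre_noise_weights(W):
    """Project the pre-noise layer onto the unit spectral-norm ball so its
    L2 -> L2 sensitivity stays at most 1 as the weights change during training."""
    s = spectral_norm(W)
    return W / s if s > 1.0 else W
```

Convolutional pre-noise layers need an analogous operator-norm bound; running such a projection periodically during training keeps the DP guarantee computed from a fixed sensitivity valid.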

slide-26
SLIDE 26

Scaling to Inception on ImageNet

  • Large dataset: image resolution is 300x300x3.
  • Large model (Inception-v3):
    • 48 layers deep.
    • 23 million parameters.
    • Released pre-trained by Google on ImageNet.

slide-27
SLIDE 27

Scaling to Inception on ImageNet

[Figure: the PixelDP auto-encoder: input x passes through a small auto-encoder containing a noise layer, producing a reconstructed (DP) version of x.]

slide-28
SLIDE 28

Scaling to Inception on ImageNet

[Figure: the frozen, unmodified Inception-v3 network is stacked on top of the PixelDP auto-encoder; since DP is resilient to post-processing, the whole stack inherits the auto-encoder's (ε, δ)-DP guarantee.]
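A minimal sketch of this composition; `dp_autoencoder` and `inception_forward` are hypothetical callables standing in for the trained DP auto-encoder and the frozen, pre-trained classifier. The point is that nothing downstream of the noise layer needs certification-specific changes: the same Monte Carlo estimation and robustness check from the earlier slides apply unchanged.

```python
import numpy as np

def dp_stack_mean_scores(dp_autoencoder, inception_forward, x, n_draws, rng):
    """Mean scores of a frozen classifier applied to a DP auto-encoder's output.

    Because DP is preserved under post-processing, these scores are
    (eps, delta)-DP in x, so the usual PixelDP certification applies.
    """
    scores = np.stack([
        inception_forward(dp_autoencoder(x, rng))  # fresh noise each draw
        for _ in range(n_draws)
    ])
    return scores.mean(axis=0)
```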

slide-29
SLIDE 29

PixelDP Outline

Motivation Design Evaluation

29

slide-30
SLIDE 30

Evaluation

  • 1. Guaranteed accuracy on large DNNs/datasets.
  • 2. Are robust predictions harder to attack in practice?
  • 3. Comparison with other defenses against state-of-the-art attacks.
slide-31
SLIDE 31

Methodology

Five datasets:

  Dataset     Image size   Number of classes
  ImageNet    299x299x3    1000
  CIFAR-100   32x32x3      100
  CIFAR-10    32x32x3      10
  SVHN        32x32x3      10
  MNIST       28x28x1      10

Three models:

  Model         Number of layers   Number of parameters
  Inception-v3  48                 23M
  Wide ResNet   28                 36M
  CNN           3                  3M

Attack methodology:
  • State-of-the-art attack [Carlini and Wagner, S&P '17].
  • Strengthened against our defense by averaging gradients over multiple noise draws.

Metrics:
  • Guaranteed accuracy.
  • Accuracy under attack.
slide-32
SLIDE 32

Guaranteed accuracy on ImageNet with Inception-v3

Meaningful guaranteed accuracy for ImageNet!

  Model             Accuracy (%)   Guaranteed accuracy (%) at attack size 0.05 / 0.1 / 0.2
  Baseline          78             -
  PixelDP: L=0.25   68             63 / - / -
  PixelDP: L=0.75   58             53 / 49 / 40

(Going down the PixelDP rows means more DP noise.)

slide-33
SLIDE 33

Accuracy on robust predictions

What if we only act on robust predictions? (e.g. if not robust, check the ticket)

Dataset: CIFAR-10.

[Figure: top-1 accuracy vs. attack size (2-norm): Baseline, and the precision and recall of robust predictions at robustness threshold 0.05.]

slide-34
SLIDE 34

Accuracy on robust predictions

If we increase the robustness threshold: better accuracy, fewer predictions.

Dataset: CIFAR-10. Comparison: Madry+ '17.

[Figure: top-1 accuracy vs. attack size (2-norm): Baseline, Madry+ '17, and the precision and recall of robust predictions at threshold 0.1.]

slide-35
SLIDE 35

Comparison with other provable defenses

PixelDP scales to larger models, yielding better accuracy and robustness.

Dataset: SVHN. Comparison: Wong-Kolter '18.

[Figure: top-1 accuracy vs. attack size (2-norm): ResNet with PixelDP (L = 0.1) vs. the CNN of Wong-Kolter '18.]

slide-36
SLIDE 36

PixelDP summary

  • PixelDP is the first defense that:
    • Gives attack-independent guarantees against norm-bounded adversarial attacks.
    • And scales to the largest models and datasets.
  • Already extended by others!
    • Improve the bounds at a given noise level (Li+ '18; Cohen+ '19).
    • Use other noise distributions (Pinot+ '19).
    • Adapt optimization (Rakin+ '18).
slide-37
SLIDE 37

37

slide-38
SLIDE 38

Appendix

38

slide-39
SLIDE 39

Comparison with best-effort techniques

PixelDP is empirically competitive with the state-of-the-art best-effort defense.

Dataset: CIFAR-10. Comparison: best-effort defense by Madry+ '17.

[Figure: top-1 accuracy vs. attack size (2-norm): Baseline, PixelDP (L = 0.1), and Madry+ '17.]

slide-40
SLIDE 40

Related work

Best effort:

  + Scale:
    • Run a best-effort attack per gradient step [Goodfellow+ '15, Madry+ '17].
    • Preprocess inputs [Buckman+ '18, Guo+ '18].
    • Train a second model based on the first one [Papernot+ '16].
  + Flexible:
    • Support most architectures.
  - No robustness guarantees:
    • Often broken soon after release [Athalye+ '18].

Certified:

  + Provable guarantees:
    • Per prediction [Wong-Kolter '18, Wong+ '18, Raghunathan+ '18, Wang+ '18].
    • In expectation [Sinha+ '17].
  - Hard to scale:
    • Requires orders of magnitude more computation [Wong-Kolter '18, Wong+ '18, Wang+ '18].
    • Support only 1 hidden layer [Raghunathan+ '18].
  - Often not flexible:
    • No ReLU, MaxPool, or accuracy guarantees [Sinha+ '17].
    • Only ReLU, no BatchNorm [Wong-Kolter '18].

PixelDP is the first certified defense that achieves provable guarantees of robustness, scales, and is broadly applicable to arbitrary networks.

slide-41
SLIDE 41

Results - CIFAR-10

[Figure: accuracy under attack results on CIFAR-10.]
slide-42
SLIDE 42

Results - SVHN

[Figure: accuracy under attack results on SVHN.]
slide-43
SLIDE 43

Certification on ImageNet/Inception-v3

[Figure: certified accuracy vs. attack size for the Baseline and PixelDP with L = 0.1, 0.3, and 1.0.]
slide-44
SLIDE 44

Certification on CIFAR-10

[Figure: certified accuracy vs. attack size for the Baseline and PixelDP with L = 0.1 and 0.3.]
slide-45
SLIDE 45

Comparison with Best Effort Techniques

[Figure: "giant panda" / "teddy bear" adversarial examples at ||α||₂ = 0.52 (undefended model) and at ||α||₂ = 3.41.]

slide-46
SLIDE 46

Full references

  • [Goodfellow+ '15] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ICLR 2015.
  • [Papernot+ '16] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. S&P 2016.
  • [Buckman+ '18] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. ICLR 2018.
  • [Guo+ '18] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. ICLR 2018.
  • [Madry+ '17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv 2017.

46

slide-47
SLIDE 47

Full references

  • [Carlini-Wagner '17] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. S&P 2017.
  • [Athalye+ '18] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML 2018.
  • [Wong-Kolter '18] E. Wong and Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML 2018.
  • [Raghunathan+ '18] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. arXiv 2018.
  • [Wang+ '18] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Efficient formal safety analysis of neural networks. NeurIPS 2018.
  • [Li+ '18] B. Li, C. Chen, W. Wang, and L. Carin. Second-order adversarial attack and certifiable robustness. arXiv 2018.

47

slide-48
SLIDE 48

Full references

  • [Rakin+ '18] A. S. Rakin, Z. He, and D. Fan. Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. arXiv 2018.
  • [Cohen+ '19] J. Cohen, E. Rosenfeld, and Z. Kolter. Certified adversarial robustness via randomized smoothing. arXiv 2019.
  • [Pinot+ '19] R. Pinot, L. Meunier, A. Araujo, H. Kashima, F. Yger, C. Gouy-Pailler, and J. Atif. Theoretical evidence for adversarial robustness through randomization: the case of the Exponential family. arXiv 2019.

48