

slide-1
SLIDE 1

Certified Robustness to Adversarial Examples with Differential Privacy

Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana
Columbia University

Code: https://github.com/columbia/pixeldp
 Contact: mathias@cs.columbia.edu

slide-2
SLIDE 2

Deep Learning

  • Deep Neural Networks (DNNs) deliver remarkable performance on many tasks.
  • DNNs are increasingly deployed, including in attack-prone contexts:

Taylor Swift Said to Use Facial Recognition to Identify Stalkers

By Sopan Deb, Natasha Singer - Dec. 13, 2018

slide-3
SLIDE 3

Example

[Figure: a DNN pipeline: input x → layer 1 → layer 2 → layer 3 → softmax, producing scores 0.1, 0.2, 0.1, 0.6 over the classes ticket 1, ticket 2, ticket 3, no ticket.]

slide-4
SLIDE 4

Example

[Figure: the same DNN followed by argmax: input x → layers → softmax scores 0.1, 0.2, 0.1, 0.6 → argmax → "no ticket".]

But DNNs are vulnerable to adversarial example attacks.

slide-5
SLIDE 5

Example

[Figure: adding a small adversarial perturbation to input x changes the softmax scores from 0.1, 0.2, 0.1, 0.6 to 0.1, 0.7, 0.1, 0.1, so argmax predicts "ticket 2" instead of "no ticket".]

But DNNs are vulnerable to adversarial example attacks.

slide-6
SLIDE 6

Accuracy under attack

Inception-v3 DNN on the ImageNet dataset.

[Figure: top-1 accuracy vs. size of attack α (2-norm); accuracy drops quickly as the attack grows. Example adversarial images: at ||α||₂ = 0.52 the label changes between "teddy bear" and "giant panda"; at ||α||₂ = 1.06 it becomes "teapot".]

slide-7
SLIDE 7
Best-effort approaches

  • 1. Evaluate accuracy under attack:
    • Launch an attack on examples in a test set.
    • Compute accuracy on the attacked examples (see the sketch after this list).
  • 2. Improve accuracy under attack:
    • Many approaches: e.g. train on adversarial examples (e.g. Goodfellow+ '15; Papernot+ '16; Buckman+ '18; Guo+ '18).

Problem: both steps are attack-specific, leading to an arms race that attackers are winning (e.g. Carlini-Wagner '17; Athalye+ '18).
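For concreteness, a minimal sketch of step 1 in Python; `model_predict` and `attack` are hypothetical callables standing in for a trained classifier and an attack-specific adversarial-example generator, not anything from PixelDP itself.

```python
def accuracy_under_attack(model_predict, attack, test_set):
    """Accuracy measured on attacked test examples.

    model_predict(x) -> predicted label; attack(model_predict, x, y) -> x_adv.
    Both are assumed, illustrative callables.
    """
    correct = 0
    for x, y in test_set:
        x_adv = attack(model_predict, x, y)   # attack-specific step
        correct += int(model_predict(x_adv) == y)
    return correct / len(test_set)
```

The arms-race problem is visible in this sketch: the number it returns is only meaningful for the specific `attack` that was run.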

slide-8
SLIDE 8
Key questions

  • Guaranteed accuracy: what is my minimum accuracy under any attack?
  • Prediction robustness: given a prediction, can any attack change it?

slide-9
SLIDE 9
Key questions

  • A few recent approaches with provable guarantees (e.g. Wong-Kolter '18; Raghunathan+ '18; Wang+ '18).
  • Poor scalability in terms of:
    • Input dimension (e.g. number of pixels).
    • DNN size.
    • Size of training data.


slide-10
SLIDE 10
Key questions

  • My defense, PixelDP, gives answers for norm-bounded attacks.
  • Key idea: a novel use of differential privacy theory at prediction time.
  • The most scalable approach: first provable guarantees for large models on ImageNet!


slide-11
SLIDE 11

PixelDP outline

Motivation Design Evaluation

11

slide-12
SLIDE 12

Key idea

  • Problem: small input perturbations create large score changes.

[Figure: adding a small perturbation to input x changes the softmax scores from 0.1, 0.6, 0.1, 0.2 to 0.1, 0.7, 0.1, 0.1, so argmax now predicts "ticket 2".]

slide-13
SLIDE 13

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: the same perturbed-input pipeline: input x plus a small perturbation → layers → softmax → argmax.]

slide-14
SLIDE 14
Differential Privacy

  • Differential Privacy (DP): a technique to randomize a computation over a database, such that changing one data point can only lead to bounded changes in the distribution over possible outputs.
  • For an (ε, δ)-DP randomized computation A_f, any two databases d, d' differing in one data point, and any set of outputs S:

    P(A_f(d) ∈ S) ≤ e^ε · P(A_f(d') ∈ S) + δ

  • We prove the Expected Output Stability Bound: for any (ε, δ)-DP mechanism with bounded outputs in [0, 1],

    E(A_f(d)) ≤ e^ε · E(A_f(d')) + δ
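The Expected Output Stability Bound follows from the DP definition in one step; here is a sketch of the standard argument for an output bounded in [0, 1]:

```latex
% For a random variable X with values in [0, 1], E[X] = \int_0^1 P(X > t)\,dt.
\mathbb{E}[A_f(d)]
  = \int_0^1 P\big(A_f(d) > t\big)\,dt
  \le \int_0^1 \Big( e^{\varepsilon}\, P\big(A_f(d') > t\big) + \delta \Big)\,dt
  = e^{\varepsilon}\, \mathbb{E}[A_f(d')] + \delta .
```

This is the property PixelDP uses at prediction time: if each softmax score is a DP function of the input, its expectation can only move by a bounded factor under a bounded change of the input.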

slide-15
SLIDE 15

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: the prediction pipeline to be made DP ("Make prediction DP"): input x → layers → softmax scores 0.1, 0.2, 0.1, 0.6 → argmax → "no ticket".]

slide-16
SLIDE 16

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: the DP prediction pipeline ("Make prediction DP"): input x → layers → softmax scores 0.1, 0.2, 0.1, 0.6 → argmax, with stability bounds drawn on the class-score bar chart.]

slide-17
SLIDE 17

Key idea

  • Problem: small input perturbations create large score changes.
  • Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: same as the previous slide (animation step).]

slide-18
SLIDE 18

PixelDP architecture

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
slide-19
SLIDE 19

PixelDP architecture

[Figure: input x → layer 1 → noise layer (noise added to the activations) → layer 2 → layer 3 → softmax → scores 0.2, 0.1, 0.1, 0.6.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.

slide-20
SLIDE 20

PixelDP architecture

[Figure: the same pipeline; everything from the noise layer through the softmax is bracketed as (ε, δ)-DP with respect to the input x.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.

slide-21
SLIDE 21

PixelDP architecture

  • Resilience to post-processing: any computation on the output of an (ε, δ)-DP mechanism is still (ε, δ)-DP.

[Figure: in the pipeline, the layers after the noise layer are post-processing, so the final softmax scores remain (ε, δ)-DP.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
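To make step 1 concrete, here is a minimal NumPy sketch of a Gaussian noise layer. The noise scale uses the standard Gaussian-mechanism calibration (valid for ε < 1); the sensitivity and attack-bound values are illustrative assumptions, not the actual PixelDP implementation (see the repo for that).

```python
import numpy as np

def gaussian_sigma(l2_sensitivity, attack_bound, eps, delta):
    """Standard Gaussian-mechanism noise scale for (eps, delta)-DP against
    input changes of 2-norm up to attack_bound (classic analysis needs eps < 1)."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity * attack_bound / eps

def noise_layer(pre_noise_activations, sigma, rng):
    """Add i.i.d. Gaussian noise to the activations of the layer just before it."""
    return pre_noise_activations + rng.normal(0.0, sigma, size=pre_noise_activations.shape)

# Illustrative numbers only.
rng = np.random.default_rng(0)
sigma = gaussian_sigma(l2_sensitivity=1.0, attack_bound=0.1, eps=0.5, delta=1e-5)
noisy = noise_layer(rng.standard_normal((1, 128)), sigma, rng)   # fake activations
```

The scale grows with the pre-noise sensitivity and the attack bound and shrinks as ε grows; every layer after this one is post-processing, so the scores inherit the guarantee.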

slide-22
SLIDE 22

PixelDP architecture

  • Compute the empirical mean scores with a standard Monte Carlo estimate.

[Figure: the DP DNN is run multiple times on the same input x; each run draws fresh noise and yields slightly different softmax scores (e.g. 0.1, 0.2, 0.1, 0.6 vs. 0.2, 0.1, 0.1, 0.6), which are averaged.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
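Step 2, sketched in NumPy; `dp_forward` is a hypothetical callable for one stochastic forward pass through the DP DNN (fresh noise on every call).

```python
import numpy as np

def mc_mean_scores(dp_forward, x, n_draws, rng):
    """Empirical mean of the DP DNN's softmax scores over fresh noise draws.

    dp_forward(x, rng) is assumed to return one (n_classes,) score vector,
    drawing new noise in the noise layer on every call.
    """
    scores = np.stack([dp_forward(x, rng) for _ in range(n_draws)])  # (n_draws, n_classes)
    return scores.mean(axis=0)
```

More draws tighten the estimate (and the η-confidence intervals of the next step) at the cost of more forward passes.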
slide-23
SLIDE 23

PixelDP architecture

[Figure: the empirical mean scores come with η-confidence intervals; combined with the stability bounds, they bound each class's expected score under attack (bar chart over the classes stalker 1, stalker 2, stalker 3, harmless).]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
slide-24
SLIDE 24

PixelDP architecture

[Figure: same as the previous slide (animation step): η-confidence intervals and stability bounds on the estimated scores.]

  • 1. Add a new noise layer to make the DNN DP.
  • 2. Estimate the DP DNN's mean scores.
  • 3. Add the estimation error in the stability bounds.
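Putting steps 2 and 3 together: a prediction can be certified robust when the winning class's expected score still dominates every other class's after the stability bound is applied to both sides. Below is a NumPy sketch; the confidence bounds use Hoeffding's inequality with a union bound over classes (one way to obtain η-confidence intervals), and the e^(2ε) and (1+e^ε)δ factors come from applying the Expected Output Stability Bound to the winner and to each runner-up. Function names are illustrative, not the PixelDP API.

```python
import numpy as np

def score_bounds(mean_scores, n_draws, eta, n_classes):
    """Simultaneous eta-confidence bounds on the expected scores in [0, 1]
    (Hoeffding inequality plus a union bound over classes)."""
    half_width = np.sqrt(np.log(2.0 * n_classes / (1.0 - eta)) / (2.0 * n_draws))
    lower = np.clip(mean_scores - half_width, 0.0, 1.0)
    upper = np.clip(mean_scores + half_width, 0.0, 1.0)
    return lower, upper

def is_certified_robust(lower, upper, eps, delta):
    """Sufficient robustness condition: apply E[A_k(x)] <= e^eps * E[A_k(x')] + delta
    to the winning class and to each runner-up, which yields the e^(2*eps) and
    (1 + e^eps)*delta factors below."""
    k = int(np.argmax(lower))
    runner_up = np.delete(upper, k).max()
    return bool(lower[k] > np.exp(2.0 * eps) * runner_up + (1.0 + np.exp(eps)) * delta)
```

When the check fails, the prediction is still returned; it simply is not certified (and, in a deployment like the ticket example, could be handled by a fallback).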
slide-25
SLIDE 25
Further challenges

  • Train the DP DNN with noise.
  • Control pre-noise sensitivity during training.
  • Support various attack norms (L0, L1, L2, L∞).
  • Scale to large DNNs and datasets.
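The noise scale above is calibrated to the pre-noise sensitivity, which is why the second bullet matters: if that sensitivity drifts during training, the DP guarantee drifts with it. Here is a minimal NumPy sketch of one way to pin it down, under the assumption that the noise layer sits right after a single linear layer: keep that layer's L2→L2 operator norm (largest singular value) at most 1.

```python
import numpy as np

def spectral_norm(W, n_iter=50, rng=None):
    """Largest singular value of W, estimated by power iteration."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ (W @ v))

def normalize_pre_noise_weights(W):
    """Project the pre-noise layer onto the unit spectral-norm ball so its
    L2 -> L2 sensitivity stays at most 1 as the weights change during training."""
    s = spectral_norm(W)
    return W / s if s > 1.0 else W
```

Convolutional pre-noise layers need an analogous operator-norm bound; running such a projection periodically during training keeps the DP guarantee computed from a fixed sensitivity valid.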

slide-26
SLIDE 26

Scaling to Inception on ImageNet

  • Large dataset: image resolution is 300x300x3.
  • Large model (Inception-v3):
    • 48 layers deep.
    • 23 million parameters.
    • Released pre-trained by Google on ImageNet.

slide-27
SLIDE 27

Scaling to Inception on ImageNet

[Figure: the PixelDP auto-encoder: input x passes through a small auto-encoder containing a noise layer, producing a reconstructed (DP) version of x.]

slide-28
SLIDE 28

Scaling to Inception on ImageNet

[Figure: the frozen, unmodified Inception-v3 network is stacked on top of the PixelDP auto-encoder; since DP is resilient to post-processing, the whole stack inherits the auto-encoder's (ε, δ)-DP guarantee.]
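A minimal sketch of this composition; `dp_autoencoder` and `inception_forward` are hypothetical callables standing in for the trained DP auto-encoder and the frozen, pre-trained classifier. The point is that nothing downstream of the noise layer needs certification-specific changes: the same Monte Carlo estimation and robustness check from the earlier slides apply unchanged.

```python
import numpy as np

def dp_stack_mean_scores(dp_autoencoder, inception_forward, x, n_draws, rng):
    """Mean scores of a frozen classifier applied to a DP auto-encoder's output.

    Because DP is preserved under post-processing, these scores are
    (eps, delta)-DP in x, so the usual PixelDP certification applies.
    """
    scores = np.stack([
        inception_forward(dp_autoencoder(x, rng))  # fresh noise each draw
        for _ in range(n_draws)
    ])
    return scores.mean(axis=0)
```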

slide-29
SLIDE 29

PixelDP Outline

Motivation Design Evaluation

29

slide-30
SLIDE 30

Evaluation

  • 1. Guaranteed accuracy on large DNNs/datasets.
  • 2. Are robust predictions harder to attack in practice?
  • 3. Comparison with other defenses against state-of-the-art attacks.
slide-31
SLIDE 31

Methodology

Five datasets:

  Dataset     Image size   Number of classes
  ImageNet    299x299x3    1000
  CIFAR-100   32x32x3      100
  CIFAR-10    32x32x3      10
  SVHN        32x32x3      10
  MNIST       28x28x1      10

Three models:

  Model         Number of layers   Number of parameters
  Inception-v3  48                 23M
  Wide ResNet   28                 36M
  CNN           3                  3M

Attack methodology:
  • State-of-the-art attack [Carlini and Wagner, S&P '17].
  • Strengthened against our defense by averaging gradients over multiple noise draws.

Metrics:
  • Guaranteed accuracy.
  • Accuracy under attack.
slide-32
SLIDE 32

Guaranteed accuracy on ImageNet with Inception-v3

Meaningful guaranteed accuracy for ImageNet!

  Model             Accuracy (%)   Guaranteed accuracy (%) at attack size 0.05 / 0.1 / 0.2
  Baseline          78             -
  PixelDP: L=0.25   68             63 / - / -
  PixelDP: L=0.75   58             53 / 49 / 40

(Going down the PixelDP rows means more DP noise.)

slide-33
SLIDE 33

Accuracy on robust predictions

What if we only act on robust predictions? (e.g. if not robust, check the ticket)

Dataset: CIFAR-10.

[Figure: top-1 accuracy vs. attack size (2-norm): Baseline, and the precision and recall of robust predictions at robustness threshold 0.05.]

slide-34
SLIDE 34

Accuracy on robust predictions

If we increase the robustness threshold: better accuracy, fewer predictions.

Dataset: CIFAR-10. Comparison: Madry+ '17.

[Figure: top-1 accuracy vs. attack size (2-norm): Baseline, Madry+ '17, and the precision and recall of robust predictions at threshold 0.1.]

slide-35
SLIDE 35

Comparison with other provable defenses

PixelDP scales to larger models, yielding better accuracy and robustness.

Dataset: SVHN. Comparison: Wong-Kolter '18.

[Figure: top-1 accuracy vs. attack size (2-norm): ResNet with PixelDP (L = 0.1) vs. the CNN of Wong-Kolter '18.]

slide-36
SLIDE 36

PixelDP summary

  • PixelDP is the first defense that:
    • Gives attack-independent guarantees against norm-bounded adversarial attacks.
    • And scales to the largest models and datasets.
  • Already extended by others!
    • Improve the bounds at a given noise level (Li+ '18; Cohen+ '19).
    • Use other noise distributions (Pinot+ '19).
    • Adapt optimization (Rakin+ '18).
slide-37
SLIDE 37

37

slide-38
SLIDE 38

Appendix

38

slide-39
SLIDE 39

Comparison with best-effort techniques

PixelDP is empirically competitive with the state-of-the-art best-effort defense.

Dataset: CIFAR-10. Comparison: best-effort defense by Madry+ '17.

[Figure: top-1 accuracy vs. attack size (2-norm): Baseline, PixelDP (L = 0.1), and Madry+ '17.]

slide-40
SLIDE 40

Related work

Best effort:

  + Scale:
    • Run a best-effort attack per gradient step [Goodfellow+ '15, Madry+ '17].
    • Preprocess inputs [Buckman+ '18, Guo+ '18].
    • Train a second model based on the first one [Papernot+ '16].
  + Flexible:
    • Support most architectures.
  - No robustness guarantees:
    • Often broken soon after release [Athalye+ '18].

Certified:

  + Provable guarantees:
    • Per prediction [Wong-Kolter '18, Wong+ '18, Raghunathan+ '18, Wang+ '18].
    • In expectation [Sinha+ '17].
  - Hard to scale:
    • Requires orders of magnitude more computation [Wong-Kolter '18, Wong+ '18, Wang+ '18].
    • Support only 1 hidden layer [Raghunathan+ '18].
  - Often not flexible:
    • No ReLU, MaxPool, or accuracy guarantees [Sinha+ '17].
    • Only ReLU, no BatchNorm [Wong-Kolter '18].

PixelDP is the first certified defense that achieves provable guarantees of robustness, scales, and is broadly applicable to arbitrary networks.

slide-41
SLIDE 41

Results - CIFAR-10

[Figure: accuracy under attack results on CIFAR-10.]
slide-42
SLIDE 42

Results - SVHN

[Figure: accuracy under attack results on SVHN.]
slide-43
SLIDE 43

Certification on ImageNet/Inception-v3

[Figure: certified accuracy vs. attack size for the Baseline and PixelDP with L = 0.1, 0.3, and 1.0.]
slide-44
SLIDE 44

Certification on CIFAR-10

[Figure: certified accuracy vs. attack size for the Baseline and PixelDP with L = 0.1 and 0.3.]
slide-45
SLIDE 45

Comparison with Best Effort Techniques

[Figure: "giant panda" / "teddy bear" adversarial examples at ||α||₂ = 0.52 (undefended model) and at ||α||₂ = 3.41.]

slide-46
SLIDE 46

Full references

  • [Goodfellow+ '15] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ICLR 2015.
  • [Papernot+ '16] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. S&P 2016.
  • [Buckman+ '18] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. ICLR 2018.
  • [Guo+ '18] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. ICLR 2018.
  • [Madry+ '17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv 2017.

46

slide-47
SLIDE 47

Full references

  • [Carlini-Wagner '17] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. S&P 2017.
  • [Athalye+ '18] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML 2018.
  • [Wong-Kolter '18] E. Wong and Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML 2018.
  • [Raghunathan+ '18] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. arXiv 2018.
  • [Wang+ '18] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Efficient formal safety analysis of neural networks. NeurIPS 2018.
  • [Li+ '18] B. Li, C. Chen, W. Wang, and L. Carin. Second-order adversarial attack and certifiable robustness. arXiv 2018.

47

slide-48
SLIDE 48

Full references

  • [Rakin+ '18] A. S. Rakin, Z. He, and D. Fan. Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. arXiv 2018.
  • [Cohen+ '19] J. Cohen, E. Rosenfeld, and Z. Kolter. Certified adversarial robustness via randomized smoothing. arXiv 2019.
  • [Pinot+ '19] R. Pinot, L. Meunier, A. Araujo, H. Kashima, F. Yger, C. Gouy-Pailler, and J. Atif. Theoretical evidence for adversarial robustness through randomization: the case of the Exponential family. arXiv 2019.

48