Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks



SLIDE 1

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami

May 24th, 2016 @ 37th IEEE Symposium on Security and Privacy

@NicolasPapernot

SLIDE 2


[Figure: deep neural network. An input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), and an output layer (N components); neurons are joined by weighted links (each weight is a parameter that is part of θO). Output probabilities: p0=0.01, p1=0.93, p8=0.02, pN=0.01.]

SLIDE 3


[Figure: deep neural network. An input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), and an output layer (N components); neurons are joined by weighted links (each weight is a parameter that is part of θO). Output probabilities: p0=0.01, p1=0.02, p8=0.89, pN=0.01.]

SLIDE 4

SLIDE 5

Deep Learning for Classification

SLIDE 6

[Figure: deep neural network. An input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), and an output layer (N components); neurons are joined by weighted links (each weight is a parameter that is part of θO).]

SLIDE 7

[Figure: deep neural network diagram; same labels as slide 6.]

SLIDE 8

[Figure: deep neural network diagram; same labels as slide 6.]

SLIDE 9

[Figure: deep neural network diagram; same labels as slide 6.]

SLIDE 10

[Figure: deep neural network diagram; same labels as slide 6.]

SLIDE 11

[Figure: deep neural network diagram; same labels as slide 6.]

SLIDE 12

[Figure: deep neural network diagram; same labels as slide 6.]

SLIDE 13

[Figure: deep neural network. An input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), and an output layer (N components); neurons are joined by weighted links (each weight is a parameter that is part of θO). Output probabilities: p0=0.01, p1=0.93, p8=0.02, pN=0.01.]

SLIDE 14

[Figure: speech recognition pipeline. Representations: Audio, Frame, State, Phoneme, Word, Sentence, Meaning. Components: Feature Extraction, Acoustic Model, Decision Trees, Lexicon, Language Model, NLP.]

Source: Tara N. Sainath, Google @ ICML DL Workshop 2015

SLIDE 15

Adversarial Samples

SLIDE 16

CIFAR10 Dataset

[Figure: sample images labeled bird, airplane, truck, automobile, bird.]

[Figure: grid with axes "Input class" (0 to 9) and "Output classification" (0 to 9).]

SLIDE 17

Adversarial strategy

SLIDE 18

Defending against Adversarial Perturbations

SLIDE 19

DNN Robustness

SLIDE 20

Defense Design

Low impact on the architecture
Maintain accuracy
Robust in space relatively close to the legitimate distribution

Maintain speed of network

SLIDE 21

Softmax Layer and Probabilities
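
The slide body did not survive extraction. As a minimal sketch, assuming the usual temperature-scaled softmax F_i(X) = exp(z_i(X)/T) / Σ_l exp(z_l(X)/T), where z(X) are the logits of the last hidden layer, the probability vector can be computed as below. The function name and the example logits are illustrative and do not come from the slides.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits at the given temperature (T > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-class problem (illustrative values only).
example_logits = [4.0, 1.0, 0.0]
probs_t1 = softmax(example_logits, temperature=1.0)
probs_t20 = softmax(example_logits, temperature=20.0)
```

A higher temperature flattens the distribution toward uniform while leaving the most likely class unchanged, which is what makes the resulting "soft" probabilities usable as training labels.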

SLIDE 22

Defensive Distillation

SLIDE 23

Defensive Distillation

SLIDE 24

Defensive Distillation

SLIDE 25

Defensive Distillation

SLIDE 26

Defensive Distillation

SLIDE 27

Defensive Distillation

SLIDE 28

Defensive Distillation

SLIDE 29

Defensive Distillation

Set temperature T=1 for predictions
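
The build-up slides for this procedure lost their figures, so the following is only a sketch of the general recipe under stated assumptions: train a first model at temperature T on hard labels, relabel the training data with its soft probabilities at T, train a second model of the same architecture on those soft labels at T, then predict at T=1. A tiny linear softmax classifier stands in for the deep network, and the toy data, helper names, and hyperparameters are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    m = max(z / temperature for z in logits)
    exps = [math.exp(z / temperature - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_of(weights, x):
    # Linear stand-in "network": one weight row per class, bias stored last.
    return [sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1] for row in weights]

def train(inputs, targets, n_classes, temperature, epochs=200, lr=0.5):
    """SGD on cross-entropy between targets and softmax(z / T)."""
    n_features = len(inputs[0])
    weights = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            p = softmax(logits_of(weights, x), temperature)
            for k in range(n_classes):
                # d(loss)/d(z_k) = (p_k - y_k) / T when the targets sum to 1.
                g = (p[k] - y[k]) / temperature
                for j in range(n_features):
                    weights[k][j] -= lr * g * x[j]
                weights[k][-1] -= lr * g
    return weights

def defensive_distillation(inputs, hard_labels, n_classes, temperature):
    # Step 1: train the initial model on hard (one-hot) labels at temperature T.
    initial = train(inputs, hard_labels, n_classes, temperature)
    # Step 2: relabel the training set with the initial model's soft outputs at T.
    soft_labels = [softmax(logits_of(initial, x), temperature) for x in inputs]
    # Step 3: train a second model of the same architecture on the soft labels at T.
    return train(inputs, soft_labels, n_classes, temperature)

def predict(weights, x):
    # At prediction time the temperature is set back to T=1.
    probs = softmax(logits_of(weights, x), temperature=1.0)
    return probs.index(max(probs))

# Toy, linearly separable 2-class data (made up for illustration).
toy_inputs = [[-1.0, -0.8], [-0.9, -1.1], [1.0, 0.9], [0.8, 1.1]]
toy_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
distilled = defensive_distillation(toy_inputs, toy_labels, n_classes=2, temperature=5.0)
predictions = [predict(distilled, x) for x in toy_inputs]
```

The sketch only illustrates the training flow; it says nothing about robustness, which the talk evaluates on MNIST and CIFAR10 networks.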

SLIDE 30

Intuition behind Defensive Distillation

Constraining Training
Reducing Jacobian Amplitudes

[Equation annotations: "0 if i not correct class" and "never equal to 0"]
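
The equations on this slide were lost in extraction. As a rough illustration of the "reducing Jacobian amplitudes" idea, the sketch below looks only at the softmax layer: for F = softmax(z / T), the derivative with respect to the logits is dF_i/dz_j = F_i (δ_ij − F_j) / T, so its entries shrink as T grows. This covers the softmax layer alone, not the full network Jacobian with respect to the inputs, and the function names and logits are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    m = max(z / temperature for z in logits)
    exps = [math.exp(z / temperature - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(logits, temperature=1.0):
    """Matrix J with J[i][j] = dF_i/dz_j = F_i * (delta_ij - F_j) / T."""
    probs = softmax(logits, temperature)
    n = len(probs)
    return [
        [probs[i] * ((1.0 if i == j else 0.0) - probs[j]) / temperature
         for j in range(n)]
        for i in range(n)
    ]

def max_amplitude(matrix):
    # Largest absolute entry of the Jacobian.
    return max(abs(v) for row in matrix for v in row)

# Hypothetical logits (illustrative values only).
example_logits = [2.0, 1.0, 0.0]
jac_t1 = softmax_jacobian(example_logits, temperature=1.0)
jac_t20 = softmax_jacobian(example_logits, temperature=20.0)
amp_t1 = max_amplitude(jac_t1)
amp_t20 = max_amplitude(jac_t20)
```

Each row of the Jacobian sums to zero because the probabilities always sum to one; the comparison of `amp_t1` and `amp_t20` shows the amplitude shrinking at the higher temperature for these logits.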

SLIDE 31

Validation

SLIDE 32

Experimental Setup

SLIDE 33

[Figure: Adversarial Sample Success Rate (axis ticks 10 to 100) vs. Distillation Temperature (axis ticks 1, 10, 100). Series: Adversarial Samples Success Rate (MNIST), Adversarial Samples Baseline Rate (MNIST), Adversarial Samples Success Rate (CIFAR10), Adversarial Samples Baseline Rate (CIFAR10).]

SLIDE 34

Impact on accuracy

SLIDE 35

Impact on Jacobian Amplitude

SLIDE 36

Estimation of Robustness

SLIDE 37

Conclusions

SLIDE 38

Takeaways

Distillation significantly reduces attack success
Yields model smoothness
Easy implementation, low overhead
Acceptable impact on accuracy

SLIDE 39

Questions?

@NicolasPapernot nicolas@papernot.fr https://www.papernot.fr