Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples



SLIDE 1

Obfuscated Gradients Give a False Sense of Security:
Circumventing Defenses to Adversarial Examples

Anish Athalye*1, Nicholas Carlini*2, and David Wagner3
1 Massachusetts Institute of Technology
2 University of California, Berkeley (now Google Brain)
3 University of California, Berkeley
SLIDE 2

Or,

Advice on performing adversarial example defense evaluations

SLIDE 3
SLIDE 4

Adversarial Examples

Definition 1: Inputs specifically crafted to fool a neural network.
(Correct definition. Hard to formalize.)

Definition 2: Given an input x, find an input x' that is misclassified such that |x - x'| < 𝜁.
(Not complete. Easy to formalize.)
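
One way to write Definition 2 out in full (a sketch in LaTeX; the classifier f and the true label y are implicit on the slide, and the norm is left generic since the slide does not fix one):

    % Definition 2: given a classifier f and an input x with true label y,
    % find a nearby input x' that is misclassified.
    \[
      f(x') \neq y
      \quad\text{and}\quad
      \lVert x - x' \rVert < \zeta
    \]
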
SLIDE 5

[Diagram: adversarial examples under Definition 1 vs. Definition 2]
SLIDE 6

13 total defense papers at ICLR'18
9 are white-box, non-certified
6 of these are broken (~0% accuracy)
1 of these is partially broken

SLIDE 7

~50% of our paper is our attacks

SLIDE 8

~50% of our paper is our attacks

This talk is about the other 50%.

SLIDE 9
SLIDE 10

This Talk:

How should we evaluate adversarial example defenses?

SLIDE 11
SLIDE 12
  • 1. A precise threat model
  • 2. A clear defense proposal
  • 3. A thorough evaluation
SLIDE 13
SLIDE 14

  • 1. Threat Model

A threat model is a formal statement defining when a system is intended to be secure.
SLIDE 15
  • 1. Threat Model
What dataset is considered?
Adversarial example definition?
What does the attacker know? (model architecture? parameters? training data? randomness?)
If black-box: are queries allowed?
SLIDE 16 – SLIDE 18

[Diagram build: the "Threat Model" carves out a region inside the space of "All Possible Adversaries"]
SLIDE 19

Good Threat Model:
"Robust when L2 distortion is less than 5, given the attacker has white-box knowledge"
Claim: 90% accuracy on ImageNet

SLIDE 20
SLIDE 21
  • 2. Defense Proposal

Precise proposal of one specific defense
(with code and models available)
SLIDE 22
SLIDE 23

  • 3. Defense Evaluation

A defense evaluation has one purpose, to answer:

"Is the defense secure under the threat model?"
SLIDE 24

  • 3. Defense Evaluation

loss, acc = model.evaluate(Xtest, Ytest)

is no longer sufficient.
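
What replaces it is accuracy measured on adversarially perturbed inputs. A minimal sketch, using a plain l-infinity PGD loop in NumPy; `predict(x)` (predicted labels) and `loss_grad(x, y)` (gradient of the loss with respect to the input) are hypothetical stand-ins for whatever framework the defense actually uses:

    import numpy as np

    def pgd_linf(x, y, loss_grad, eps=0.03, step=0.007, iters=100):
        """Projected gradient descent inside the l-infinity ball of radius eps."""
        x_adv = x + np.random.uniform(-eps, eps, size=x.shape)   # random start
        for _ in range(iters):
            g = loss_grad(x_adv, y)                    # gradient of loss w.r.t. the input
            x_adv = x_adv + step * np.sign(g)          # ascend the loss
            x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
            x_adv = np.clip(x_adv, 0.0, 1.0)           # keep pixels in [0, 1]
        return x_adv

    def adversarial_accuracy(x_test, y_test, predict, loss_grad, eps=0.03):
        """The number that replaces clean test accuracy: accuracy under attack."""
        x_adv = pgd_linf(x_test, y_test, loss_grad, eps=eps)
        return np.mean(predict(x_adv) == y_test)
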
SLIDE 25

  • 3. Defense Evaluation

This step is why security is hard
SLIDE 26

Serious effort to evaluate
By space, most papers are ½ evaluation

SLIDE 27

Going through the motions is insufficient to evaluate a defense to adversarial examples

SLIDE 28

The purpose of a defense evaluation is NOT to show the defense is RIGHT

SLIDE 29

The purpose of a defense evaluation is to FAIL to show the defense is WRONG

SLIDE 30
SLIDE 31

Everything the following papers do is standard practice.
Actionable advice requires specific, concrete examples.

SLIDE 32

Perform an adaptive attack
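
One concrete shape an adaptive attack takes in this paper is BPDA: if the defense prepends a non-differentiable transformation g(x) ≈ x to the classifier, run the forward pass through the real g but treat it as the identity on the backward pass. A sketch; `defense_preprocess` (the defense's g) and `loss_grad` (input gradient of the undefended classifier's loss) are hypothetical stand-ins:

    import numpy as np

    def bpda_step(x_adv, y, defense_preprocess, loss_grad, step=0.007):
        """One BPDA gradient step: forward through the defense, backward as if
        the defense were the identity function."""
        g_x = defense_preprocess(x_adv)   # non-differentiable g(x), with g(x) ~= x
        grad = loss_grad(g_x, y)          # gradient taken at g(x), used as if it were at x
        return np.clip(x_adv + step * np.sign(grad), 0.0, 1.0)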

SLIDE 33

A "hold out" set is not an adaptive attack

SLIDE 34

Stop using FGSM (exclusively)

SLIDE 35

Use more than 100 (or 1000?) iterations of gradient descent

SLIDE 36

Iterative attacks should always do better than single-step attacks.
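
A sketch of the sanity check behind the last three slides, reusing the pgd_linf sketch from above: FGSM is a single signed-gradient step of size eps, so a properly configured iterative attack should never leave the model with higher accuracy than FGSM does.

    import numpy as np

    def fgsm(x, y, loss_grad, eps=0.03):
        """Single-step attack: one signed-gradient step of size eps."""
        return np.clip(x + eps * np.sign(loss_grad(x, y)), 0.0, 1.0)

    def check_iterative_beats_single_step(x, y, predict, loss_grad, eps=0.03):
        acc_fgsm = np.mean(predict(fgsm(x, y, loss_grad, eps)) == y)
        acc_pgd = np.mean(predict(pgd_linf(x, y, loss_grad, eps=eps, iters=1000)) == y)
        if acc_pgd > acc_fgsm:
            # The iterative attack left the model *more* accurate than the
            # single-step one: suspect a misconfigured attack or obfuscated gradients.
            print("warning: PGD did worse than FGSM")
        return acc_fgsm, acc_pgd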

SLIDE 37 – SLIDE 39

Unbounded optimization attacks should eventually reach 0% accuracy
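
A sketch of the corresponding sanity check: keep loosening the distortion bound and re-run the attack; with pixels in [0, 1], eps = 1.0 lets the attacker replace the image entirely, so if accuracy has not reached roughly 0% by then, the attack (not the defense) is the likely problem. `attack` and `predict` are the same kind of hypothetical callables as above:

    import numpy as np

    def unbounded_sanity_check(x_test, y_test, predict, attack,
                               eps_schedule=(0.03, 0.1, 0.3, 1.0)):
        """Accuracy should fall to ~0% once the perturbation is effectively unbounded."""
        for eps in eps_schedule:
            acc = np.mean(predict(attack(x_test, y_test, eps=eps)) == y_test)
            print("eps=%.2f  adversarial accuracy=%.1f%%" % (eps, 100 * acc))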

SLIDE 40 – SLIDE 41

Model accuracy should be monotonically decreasing as the distortion bound increases

SLIDE 42
SLIDE 43

SLIDE 44

Evaluate against the worst attack
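
"Worst" is taken per example, not per attack: an input only counts as correctly classified if the model survives every attack on it. A sketch, again with hypothetical `predict` and attack callables:

    import numpy as np

    def worst_case_accuracy(x_test, y_test, predict, attacks):
        """Accuracy against the per-example worst case over a list of attacks."""
        survived = np.ones(len(x_test), dtype=bool)
        for attack in attacks:                       # e.g. [pgd, bpda, spsa_attack, ...]
            x_adv = attack(x_test, y_test)
            survived &= (predict(x_adv) == y_test)   # must survive *every* attack
        return survived.mean()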

SLIDE 45

Plot accuracy vs distortion
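
A minimal sketch of the recommended plot, with the same hypothetical `attack(x, y, eps=...)` and `predict` callables; it also makes the two previous checks visible at a glance (accuracy should fall monotonically and hit 0% once eps is large enough):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_accuracy_vs_distortion(x_test, y_test, predict, attack,
                                    eps_values=np.linspace(0.0, 0.1, 11)):
        accs = [np.mean(predict(attack(x_test, y_test, eps=e) if e > 0 else x_test) == y_test)
                for e in eps_values]
        plt.plot(eps_values, accs, marker="o")
        plt.xlabel("distortion bound (eps)")
        plt.ylabel("accuracy under attack")
        plt.ylim(0, 1)
        plt.show()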

SLIDE 46

Verify enough iterations of gradient descent
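
One way to check, as a sketch: re-run the attack with increasing iteration budgets and confirm that accuracy under attack has stopped dropping; if raising the budget still helps the attacker, the earlier numbers were not converged. `attack` is again a hypothetical callable, here taking an `iters` argument:

    import numpy as np

    def check_enough_iterations(x_test, y_test, predict, attack,
                                budgets=(10, 100, 1000, 10000)):
        """Accuracy under attack should plateau once the attack has converged."""
        prev = None
        for iters in budgets:
            acc = np.mean(predict(attack(x_test, y_test, iters=iters)) == y_test)
            print("%6d iterations -> accuracy %.1f%%" % (iters, 100 * acc))
            if prev is not None and acc < prev - 0.01:
                print("   not converged yet: more iterations still help the attack")
            prev = acc
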
SLIDE 47

Try gradient-free attack algorithms
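
Gradient-free attacks need only loss values, not gradients, which makes them a useful cross-check when gradients may be obfuscated. A minimal sketch of an SPSA-style gradient estimate for a single example, assuming a hypothetical scalar `loss(x, y)`:

    import numpy as np

    def spsa_gradient(x, y, loss, delta=0.01, samples=128):
        """Estimate the input gradient of loss(x, y) from loss values alone,
        by averaging finite differences along random +/-1 directions."""
        grad = np.zeros(x.shape)
        for _ in range(samples):
            v = np.random.choice([-1.0, 1.0], size=x.shape)
            grad += (loss(x + delta * v, y) - loss(x - delta * v, y)) / (2 * delta) * v
        return grad / samples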

SLIDE 48
SLIDE 49

Conclusion

The hardest part of a defense is the evaluation

SLIDE 50

Anish: aathalye@mit.edu
Me: nicholas@carlini.com

Please do reach out to us if you have any evaluation questions

Thank You