SLIDE 1

Obfuscated Gradients Give a False Sense of Security:

Circumventing Defenses to Adversarial Examples

Anish Athalye*1, Nicholas Carlini*2, and David Wagner3

1 Massachusetts Institute of Technology; 2 University of California, Berkeley (now Google Brain); 3 University of California, Berkeley

SLIDE 2

How and Why

SLIDE 3

Act I

Background: Adversarial Examples for Neural Networks

SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8
SLIDE 9

Why should we care about adversarial examples?

• Make ML robust
• Make ML better

SLIDE 10
SLIDE 11

13 total defense papers at ICLR'18
• 9 are white-box, non-certified
• 6 of these are broken (~0% accuracy)
• 1 of these is partially broken

SLIDE 12
SLIDE 13

How did we evade them? Why were we able to evade them?

SLIDE 14
SLIDE 15

Act II

HOW: Our Attacks

SLIDE 16

How do we generate adversarial examples?

SLIDE 17

MAXIMIZE the neural network loss on the given input

SUCH THAT the perturbation is less than a given threshold
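To make this concrete: a minimal sketch of projected gradient descent (PGD), one standard way to solve this maximization under an L∞ bound. This is illustrative, not the exact attack from the paper; the PyTorch model and the eps, alpha, and steps values are placeholder assumptions.

import torch

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=40):
    # Maximize the network's loss on (x, y) while keeping the
    # perturbation inside an L-infinity ball of radius eps.
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # step up the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # stay a valid image
        x_adv = x_adv.detach()
    return x_adv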

SLIDE 18

Why can we generate adversarial examples (with gradient descent)?

SLIDE 19

[Figure: classifier decision regions labeled Dog, Truck, Airplane]

SLIDE 20


SLIDE 21
SLIDE 22

We find that 7 of 9 ICLR defenses rely on the same artifact: obfuscated gradients
SLIDE 23
SLIDE 24
SLIDE 25
SLIDE 26

"Fixing" Gradient Descent

[0.1, 0.3, 0.0, 0.2, 0.4]
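The paper's main tool for this is BPDA (Backward Pass Differentiable Approximation): run the defense's non-differentiable transform g on the forward pass, but substitute a differentiable approximation of g, often the identity when g(x) ≈ x, on the backward pass. A minimal PyTorch sketch assuming the identity approximation; BPDAIdentity and defense_transform are illustrative names, not code from the paper.

import torch

class BPDAIdentity(torch.autograd.Function):
    # Forward: apply the (possibly non-differentiable) transform g.
    # Backward: pretend g was the identity, passing gradients straight through.
    @staticmethod
    def forward(ctx, x, g):
        return g(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # gradient w.r.t. x; no gradient for g

# Usage inside an attack loop, for a defended model f(g(x)):
#   logits = model(BPDAIdentity.apply(x_adv, defense_transform))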

SLIDE 27
SLIDE 28

Act III

WHY: Evaluation Methodology

SLIDE 29

Serious effort to evaluate: by space, most papers are ½ evaluation

SLIDE 30

What went wrong then?

SLIDE 31

loss, acc = model.evaluate(x_test, y_test)

is no longer sufficient.
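A security evaluation instead needs accuracy against an attack aimed at the defense itself. A minimal sketch, assuming PyTorch tensors; robust_accuracy and the attack signature are illustrative names (the attack argument could be the PGD sketch above, tuned adaptively to the defense).

def robust_accuracy(model, loader, attack):
    # Accuracy on adversarially perturbed inputs, not on the clean test set.
    correct, total = 0, 0
    for x, y in loader:
        x_adv = attack(model, x, y)           # attack the defended model directly
        pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.shape[0]
    return correct / total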

SLIDE 32

There is no single test set for security

SLIDE 33

The only thing that matters is robustness against an adversary targeting the defense.

SLIDE 34
SLIDE 35

The purpose of a defense evaluation is NOT to show the defense is RIGHT

SLIDE 36

The purpose of a defense evaluation is to FAIL to show the defense is WRONG

SLIDE 37
SLIDE 38
SLIDE 39

Act IV

Making & Measuring Progress

SLIDE 40

Strive for simplicity over complexity
SLIDE 41
SLIDE 42
SLIDE 43
SLIDE 44
SLIDE 45

What metric should we optimize?

SLIDE 46

Threat Model: the set of assumptions we place on the adversary

SLIDE 47

In the context of adversarial examples:

1. Perturbation Bounds & Measure
2. Model Access & Knowledge
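For concreteness, a sketch of one threat model that is common in this literature: the white-box L∞ model with ε = 8/255 used on CIFAR in Madry et al. 2018. The dictionary layout is purely illustrative.

threat_model = {
    "perturbation_measure": "L_infinity",       # 1. how perturbation size is measured
    "perturbation_bound": 8 / 255,              # 1. the allowed threshold
    "model_access": "white-box",                # 2. attacker sees architecture and weights
    "attacker_knowledge": "knows the defense",  # 2. the attacker has read the paper
}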
SLIDE 48

The threat model MUST assume the attacker has read the paper and knows the defender is using those techniques to defend.

SLIDE 49

Metrics for Success

• Accuracy under existing threat models
• More permissive threat models

SLIDE 50

"making the attacker think more" 
 is not (usually) progress The threat model doesn't limit the attacker's approach

SLIDE 51
SLIDE 52

Act V

Conclusion

SLIDE 53

A paper can only do so much in an evaluation.

SLIDE 54

A paper can only do so much in an evaluation. We need more re-evaluation papers.

SLIDE 55

"Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break."

- Bruce Schneier

So you want to build a defense?

SLIDE 56

So you want to build a defense?

As a corollary: learn to break defenses before you try to build them. If you can't break the state of the art, you are unlikely to be able to build on it.

SLIDE 57

Challenging Suggestions

• Defense-GAN on MNIST: we were able to break it only partially. Samangouei et al. 2018 ("Defense-GAN...")
• "Strong" Adversarial Training on CIFAR: we were not able to break it at all. Madry et al. 2018 ("Towards Deep...")

SLIDE 58

Email us: Anish: aathalye@mit.edu, Me: nicholas@carlini.com
Visit our poster (Today, #110) & our originally scheduled talk (Tomorrow, A7 @ 2:50)

Track progress: robust-ml.org
Source code: git.io/obfuscated-gradients

SLIDE 59
SLIDE 60
SLIDE 61

Did we get it right?

1. We reproduced the original claims against the (weak) attacks initially attempted.
2. We showed the papers' authors our results.
3. It's possible we didn't. But our code is public:


https://github.com/anishathalye/obfuscated-gradients

SLIDE 62

Isn't this just gradient masking?

The short answer: no. If it were, we wouldn't have seen 7 of 9 ICLR defenses relying on it.

SLIDE 63

X defense has multiple parts, but you only broke each part separately.

True. Usually, an ensemble of several weak defenses is not an effective defense strategy, unless there is an argument that they cover each other's weaknesses.

He et al. "Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong". WOOT'17.

SLIDE 64

Did you try X with adversarial training?

Not usually. In some cases the combination is worse than adversarial training alone.

SLIDE 65

Specific advice for performing evaluations

  • Carlini et al. 2017 @ S&P ("Towards Evaluating ...")
  • Athalye et al. 2018 @ ICML ("Obfuscated ...")
  • Madry et al. 2018 @ ICLR ("Towards Deep...")
  • Uesato et al. 2018 @ ICML ("Adversarial Risk...")

Details in our originally-scheduled talk, Tomorrow @ 2:50 in A7

SLIDE 66

There is a true notion of robustness, for a computationally unbounded adversary. We are forced to approximate this.

Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, Pushmeet Kohli. ICML 2018.
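In standard notation (the symbols below are the usual ones, not taken from the slides), this true notion is the adversarial risk, written here for an L∞ threat model with 0-1 loss:

\[
R_{\mathrm{adv}}(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\|\delta\|_{\infty}\le\epsilon} \mathbf{1}\{f(x+\delta)\ne y\}\Big]
\]

Any concrete attack exhibits only one δ per input, so it can only lower-bound this quantity; equivalently, the robust accuracy reported against any particular attack is an upper bound on true robust accuracy.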

SLIDE 67