SLIDE 1

Adversarial Examples are Not Easily Detected: Bypassing Ten Detection Methods

Nicholas Carlini, David Wagner University of California, Berkeley

SLIDE 2

SLIDE 3

Background

SLIDE 4

Neural Networks

  • I assume knowledge of neural networks ...
  • This talk: neural networks for classification
  • Specifically image-based classification
SLIDE 5

Background: Adversarial Examples

  • Given an input X classified as label T ...
  • ... it is easy to find an X′ close to X
  • ... so that F(X′) != T
SLIDE 6

Constructing Adversarial Examples

  • Formulation: given input x, find x′ where


minimize d(x,x′) + L(x′)
 such that x′ is "valid"


  • Where L(x′) is a loss function minimized when F(x′) != T and maximized when F(x′) = T

  • Solve via gradient descent (sketched below)
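
A minimal sketch of that optimization in PyTorch (my reconstruction, not the talk's code). It assumes a differentiable classifier model, an input image x scaled to [0, 1] with integer label t, and a trade-off constant c; the margin-style loss below is one common choice with the property described above.

    import torch

    def attack(model, x, t, c=1.0, steps=1000, lr=0.01):
        # Find x' near x with F(x') != t by minimizing d(x, x') + c * L(x')
        x_adv = x.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([x_adv], lr=lr)
        for _ in range(steps):
            logits = model(x_adv.unsqueeze(0)).squeeze(0)
            # L(x'): positive while F(x') = t, zero once the top class changes
            others = torch.cat([logits[:t], logits[t + 1:]])
            loss = torch.clamp(logits[t] - others.max(), min=0)
            dist = torch.norm(x_adv - x)          # d(x, x'): L2 distance
            opt.zero_grad()
            (dist + c * loss).backward()
            opt.step()
            x_adv.data.clamp_(0, 1)               # keep x' a "valid" image
        return x_adv.detach()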
SLIDE 7

Normal vs. Adversarial: MNIST

[Figure: normal and adversarial MNIST digits; labels shown: 7, 9, 8, 8]

SLIDE 8

Normal vs. Adversarial: CIFAR-10

[Figure: a normal Truck and an adversarial version labeled Airplane]

SLIDE 9

This is decidedly bad

SLIDE 10

But also: ripe opportunity for research!

SLIDE 11

  • Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification. Xiaoyu Cao, Neil Zhenqiang Gong
  • APE-GAN: Adversarial Perturbation Elimination with GAN. Shiwei Shen, Guoqing Jin, Ke Gao, Yongdong Zhang
  • A Learning Approach to Secure Learning. Linh Nguyen, Arunesh Sinha
  • EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples. Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh
  • Ensemble Methods as a Defense to Adversarial Perturbations Against Deep Neural Networks. Thilo Strauss, Markus Hanselmann, Andrej Junginger, Holger Ulmer
  • MagNet: a Two-Pronged Defense against Adversarial Examples. Dongyu Meng, Hao Chen
  • CuRTAIL: ChaRacterizing and Thwarting AdversarIal deep Learning. Bita Darvish Rouhani, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar
  • Efficient Defenses Against Adversarial Attacks. Valentina Zantedeschi, Maria-Irina Nicolae, Ambrish Rawat
  • Learning Adversary-Resistant Deep Neural Networks. Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G. Ororbia II, Xinyu Xing, Xue Liu, C. Lee Giles
  • SafetyNet: Detecting and Rejecting Adversarial Examples Robustly. Jiajun Lu, Theerasit Issaranon, David Forsyth
  • Enhancing Robustness of Machine Learning Systems via Data Transformations. Arjun Nitin Bhagoji, Daniel Cullina, Chawin Sitawarin, Prateek Mittal
  • Towards Deep Learning Models Resistant to Adversarial Attacks. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
  • Towards Robust Deep Neural Networks with BANG. Andras Rozsa, Manuel Gunther, Terrance E. Boult
  • Deep Variational Information Bottleneck. Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
  • NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles. Jiajun Lu, Hussein Sibai, Evan Fabry, David Forsyth

SLIDE 12

Research Question: Which of these defenses are robust?

SLIDE 13

SLIDE 14

Focus of this talk: detection schemes

SLIDE 15

Normal Classifier

[Diagram: a clean image of a 7 goes through the classifier and is labeled 7]

SLIDE 16

Normal Classifier

[Diagram: an adversarial image goes through the classifier and is mislabeled 8]

SLIDE 17

Detector & Classifier

[Diagram: a clean input passes the detector and the classifier labels it 7]

SLIDE 18

Detector & Classifier

[Diagram: an adversarial input is flagged by the detector and rejected before classification]
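
In code, this pipeline is just a guard in front of the classifier. A minimal sketch, assuming detector returns a score that the input is adversarial and classifier returns a label (the names and threshold are illustrative):

    def guarded_classify(x, detector, classifier, threshold=0.5):
        # The detector screens the input; only unflagged inputs are classified
        if detector(x) > threshold:   # score above threshold => "adversarial"
            return None               # reject the input
        return classifier(x)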

SLIDE 19

This Talk:

  • 1. How to evaluate a defense
  • 2. Comment on explored directions
SLIDE 20

SLIDE 21

Defense #1: PCA-based detection

Dan Hendrycks and Kevin Gimpel. 2017. Early Methods for Detecting Adversarial Images. In International Conference on Learning Representations (Workshop Track).
SLIDE 22

PCA-based detection

  • Hypothesis: Adversarial examples rely on later principal components
  • ... and valid images don't ...
  • ... so let's detect use of the later components (sketched below)
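
A minimal sketch of such a detector (a reconstruction under assumptions, not the authors' code): fit PCA on flattened training images, then flag inputs that put an unusually large fraction of their energy into the trailing components. The names train_x, k, and threshold are illustrative.

    import numpy as np

    def fit_pca(train_x):
        # Principal components via SVD of the centered training data
        mean = train_x.mean(axis=0)
        _, _, vt = np.linalg.svd(train_x - mean, full_matrices=False)
        return mean, vt               # vt: components as rows, leading first

    def tail_energy(x, mean, vt, k=100):
        # Fraction of the input's energy in the last k components
        coeffs = vt @ (x - mean)
        return np.sum(coeffs[-k:] ** 2) / np.sum(coeffs ** 2)

    def looks_adversarial(x, mean, vt, threshold, k=100):
        return tail_energy(x, mean, vt, k) > threshold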
SLIDE 23

[Figure: Normal | Adversarial]

SLIDE 24

It works!

SLIDE 25

SLIDE 26

SLIDE 27

Attack:

Only modify regions of the image that are also used in normal images.
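
One way to implement that constraint (an assumed detail, consistent with the slide): mask the perturbation so it only touches pixels that normal images actually use, then rerun the same gradient-descent attack. The mask is assumed given.

    import torch

    def masked_attack(model, x, t, mask, c=1.0, steps=1000, lr=0.01):
        # mask: 1 where normal images have content, 0 elsewhere
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            x_adv = (x + delta * mask).clamp(0, 1)  # perturb masked pixels only
            logits = model(x_adv.unsqueeze(0)).squeeze(0)
            others = torch.cat([logits[:t], logits[t + 1:]])
            loss = torch.clamp(logits[t] - others.max(), min=0)
            opt.zero_grad()
            (torch.norm(delta * mask) + c * loss).backward()
            opt.step()
        return (x + delta.detach() * mask).clamp(0, 1)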

SLIDE 28

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 29

Lesson 1: Separate the artifacts of one attack from the intrinsic properties of adversarial examples

SLIDE 30

Lesson 2: MNIST is insufficient; CIFAR is better

SLIDE 31

SLIDE 32

Defense #2: Additional Neural Network Detection

Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. 2017. On Detecting Adversarial Perturbations. In International Conference on Learning Representations.

SLIDE 33

Normal Training

[Diagram: the classifier F is trained on labeled images, e.g. (image, 7), (image, 3)]

SLIDE 34

Adversarial Training

[Diagram: attack F on the labeled training images to generate an adversarial version of each]

SLIDE 35

Adversarial Training

[Diagram: the detector G is trained to separate clean inputs from adversarial ones (y/n labels)]

SLIDE 36

Sounds great.

SLIDE 37

Sounds great.

But we already know it's easy to fool neural networks ...

SLIDE 38

... so just construct adversarial examples to


  • 1. be misclassified
  • 2. not be detected
SLIDE 39

Breaking Adversarial Training

  • minimize d(x,x′) + L(x′)
    such that x′ is "valid"

  • Old: L(x′) measures loss of classifier on x′
SLIDE 40

Breaking Adversarial Training

  • minimize d(x,x′) + L(x′) + M(x′)
    such that x′ is "valid"

  • Old: L(x′) measures loss of classifier on x′
  • New: M(x′) measures loss of detector on x′ (see the sketch below)
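
A sketch of the combined objective (an assumed form, reusing the margin loss from the earlier attack sketch; model, detector, c1, and c2 are illustrative, and the detector is assumed to output a single logit where positive means "adversarial"):

    import torch
    import torch.nn.functional as nnf

    def combined_loss(model, detector, x_adv, t, c1=1.0, c2=1.0):
        logits = model(x_adv.unsqueeze(0)).squeeze(0)
        others = torch.cat([logits[:t], logits[t + 1:]])
        L = torch.clamp(logits[t] - others.max(), min=0)  # fool the classifier
        # M(x'): smooth penalty that vanishes when the detector is fooled
        M = nnf.softplus(detector(x_adv.unsqueeze(0))).squeeze()
        return c1 * L + c2 * M

Descending on d(x, x') + combined_loss(...) then produces examples that are both misclassified and undetected.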

SLIDE 41

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 42

Lesson 3: Minimize over (compute gradients through) the full defense

SLIDE 43

SLIDE 44

Defense #3: Network Randomization

Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. 2017. Detecting Adversarial Samples from Artifacts.

SLIDE 45

Randomized Classifier

[Diagram: a clean 7 is labeled 7 on every randomized run of the classifier]

SLIDE 46

Randomized Classifier

[Diagram: an adversarial input receives inconsistent labels (3, 2, 6, 7) across randomized runs]
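
The detection idea these diagrams suggest can be sketched as follows (a simplification under assumptions: the network is made stochastic, e.g. by leaving dropout on at test time; n_runs and min_agreement are illustrative):

    import torch

    def looks_adversarial(model, x, n_runs=20, min_agreement=0.9):
        model.train()                  # keep the randomness (dropout) active
        with torch.no_grad():
            preds = [model(x.unsqueeze(0)).argmax().item()
                     for _ in range(n_runs)]
        top = max(preds.count(p) for p in set(preds))
        return top / n_runs < min_agreement   # inconsistent labels => flag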

SLIDE 47

Breaking Randomization

  • minimize d(x,x′) + L(x′)
    such that x′ is "valid"

  • Old: L(x′) measures loss of network on x′
SLIDE 48

Breaking Randomization

  • minimize d(x,x′) + E[L(x′)]
    such that x′ is "valid"

  • Old: L(x′) measures loss of network on x′
  • New: E[L(x′)] is the expected loss of the network on x′ (sketched below)
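
A sketch of estimating that expectation (assumed approach: a Monte Carlo average over stochastic forward passes; n_samples is illustrative):

    import torch

    def expected_loss(model, x_adv, t, n_samples=10):
        model.train()                  # keep the network's randomness active
        total = 0.0
        for _ in range(n_samples):
            logits = model(x_adv.unsqueeze(0)).squeeze(0)
            others = torch.cat([logits[:t], logits[t + 1:]])
            total = total + torch.clamp(logits[t] - others.max(), min=0)
        return total / n_samples       # Monte Carlo estimate of E[L(x')]

Gradients flow through every sampled pass, so gradient descent on this average fools the classifier in expectation over its randomness.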

SLIDE 49

SLIDE 50

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 51

[Figure: Original | Adversarial (unsecured) | Adversarial (with detector)]

SLIDE 52

SLIDE 53

SLIDE 54

SLIDE 55

SLIDE 56
Evaluation Lessons

  • 1. Don't evaluate only on MNIST
  • 2. Minimize over the full defense
  • 3. Use a strong iterative attack
  • 4. Release your source code!

https://nicholas.carlini.com/nn_breaking_detection

SLIDE 57