Tutorial on Adversarial Machine Learning with CleverHans (PowerPoint Presentation)


SLIDE 1

Tutorial on Adversarial Machine Learning with CleverHans

Nicholas Carlini

University of California, Berkeley

Nicolas Papernot

Pennsylvania State University

Did you git clone https://github.com/carlini/odsc_adversarial_nn ?

November 2017 - ODSC

@NicolasPapernot

SLIDE 2

Getting set up

If you have not already:

git clone https://github.com/carlini/odsc_adversarial_nn
cd odsc_adversarial_nn
python test_install.py

SLIDE 3

Why neural networks?


SLIDE 4

Classification with neural networks

A machine learning classifier f(x, θ) maps an input x to one class among a predefined set. For a digit image, the output is a vector of class probabilities:

[0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]
= [p(0|x,θ), p(1|x,θ), p(2|x,θ), ..., p(7|x,θ), p(8|x,θ), p(9|x,θ)]
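To make this concrete, here is a minimal sketch of such a classifier in TensorFlow 1.x (the style used later in this deck). The linear, 784-input MNIST-style model and all variable names are illustrative assumptions, not the model from the slides:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 input images
W = tf.Variable(tf.zeros([784, 10]))          # parameters theta: weights
b = tf.Variable(tf.zeros([10]))               # parameters theta: biases
logits = tf.matmul(x, W) + b
probs = tf.nn.softmax(logits)                 # [p(0|x,theta), ..., p(9|x,theta)]
prediction = tf.argmax(probs, axis=1)         # map the input to one class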

SLIDE 5

SLIDE 6

SLIDE 7

SLIDE 8

SLIDE 9

SLIDE 10

SLIDE 11

Machine Learning Classifier

Training labels, one one-hot vector per training image:
[0 1 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 1 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]

Learning: find internal classifier parameters θ that minimize a cost/loss function (a measure of model error).
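A minimal sketch of this learning step, continuing the hypothetical model above; cross-entropy is assumed as the loss, and the learning rate is illustrative:

y = tf.placeholder(tf.float32, [None, 10])    # one-hot training labels
# loss: small when the model assigns high probability to the correct class
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
# learning: adjust theta (W, b) to minimize the loss over the training data
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)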

SLIDE 12

On many tasks, neural networks give better results than any other approach. But there's a catch...

SLIDE 13

Adversarial examples

[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples

SLIDE 14

SLIDE 15

Crafting adversarial examples: fast gradient sign method


[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples

During training, the classifier uses a loss function to minimize model prediction error. After training, the attacker uses the same loss function to maximize model prediction error:

1. Compute the gradient of the loss with respect to the input of the model.
2. Take the sign of the gradient and multiply it by a small constant ε (the perturbation magnitude).
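A minimal sketch of those two steps, assuming the hypothetical `loss` and input placeholder `x` from the earlier sketches; eps is the perturbation magnitude:

eps = 0.25                                  # perturbation magnitude (illustrative)
grad = tf.gradients(loss, x)[0]             # 1. gradient of the loss w.r.t. the input
x_adv = x + eps * tf.sign(grad)             # 2. sign of the gradient, scaled by eps
x_adv = tf.clip_by_value(x_adv, 0., 1.)     # keep pixels in the valid [0, 1] range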

SLIDE 16

Transferability: adversarial examples crafted for one model are often also misclassified by other models trained for the same task.

SLIDE 17

Not specific to neural networks:

  • Logistic regression
  • SVMs
  • Nearest neighbors
  • Decision trees

SLIDE 18

Machine Learning with TensorFlow

import tensorflow as tf

sess = tf.Session()
five = tf.constant(5)
six = tf.constant(6)
sess.run(five + six)  # 11


SLIDE 19

Machine Learning with TensorFlow

import tensorflow as tf

sess = tf.Session()
five = tf.constant(5.0)                 # float constant, to match the placeholder's dtype
number = tf.placeholder(tf.float32, [])
added = five + number
sess.run(added, {number: 6})  # 11.0
sess.run(added, {number: 8})  # 13.0


SLIDE 20

Machine Learning with TensorFlow

import tensorflow as tf

sess = tf.Session()                     # session to evaluate the graph
number = tf.placeholder(tf.float32, [])
squared = number * number
derivative = tf.gradients(squared, [number])[0]   # d(number^2)/d(number) = 2 * number
sess.run(derivative, {number: 5})  # 10.0


SLIDE 21

Classifying ImageNet with the Inception Model [Hands On]

SLIDE 22

Attacking ImageNet

SLIDE 23

SLIDE 24

Growing community: 1.3K+ stars, 300+ forks, 40+ contributors.

SLIDE 25

Attacking the Inception Model for ImageNet [Hands On]

python attack.py
Replace panda.png with adversarial_panda.png
python classify.py

Things to try:
1. Replace the given image of a panda with your own image
2. Change the target label that the adversarial example should be classified as

SLIDE 26

Adversarial Training

[Figure: the model is trained on clean examples of the digits 7 and 2]

SLIDE 27

Adversarial Training

[Figure: the trained model is attacked to craft adversarial versions of the digits 7 and 2]

SLIDE 28

Adversarial Training

[Figure: the model is retrained on both the clean and the adversarial digits 7 and 2, with correct labels]

SLIDE 29

Adversarial training

Intuition: inject adversarial examples during training, with correct labels.
Goal: improve model generalization outside of the training manifold.

[Figure by Ian Goodfellow; x-axis: training time (epochs)]

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

Efficient Adversarial Training through Loss Modification

J̃(θ, x, y) = α · J(θ, x, y) + (1 − α) · J(θ, x + ε · sign(∇ₓ J(θ, x, y)), y)

The first term is small when the prediction is correct on the legitimate input.

SLIDE 34

Efficient Adversarial Training through Loss Modification

J̃(θ, x, y) = α · J(θ, x, y) + (1 − α) · J(θ, x + ε · sign(∇ₓ J(θ, x, y)), y)

The first term is small when the prediction is correct on the legitimate input; the second term is small when the prediction is correct on the adversarial input.
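A minimal sketch of this modified loss, continuing the hypothetical TensorFlow sketches above (the model_logits helper and alpha = 0.5, the weighting used in [GSS15], are assumptions):

def model_logits(inp):
    return tf.matmul(inp, W) + b              # hypothetical linear model from earlier

def xent(inp, labels):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=labels, logits=model_logits(inp)))

clean_loss = xent(x, y)                       # small when correct on legitimate input
grad = tf.gradients(clean_loss, x)[0]
x_fgsm = tf.stop_gradient(x + eps * tf.sign(grad))  # FGSM example, held fixed in backprop
adv_loss = xent(x_fgsm, y)                    # small when correct on adversarial input
alpha = 0.5                                   # weighting between the two terms
total_loss = alpha * clean_loss + (1 - alpha) * adv_loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(total_loss)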

SLIDE 35

Adversarial Training Demo

SLIDE 36

Attacking remotely hosted black-box models

(1) The adversary queries the remote ML system for labels ("0", "1", "4", ...) on inputs of its choice.

SLIDE 37

Attacking remotely hosted black-box models

(2) The adversary uses this labeled data ("0", "1", "4", ...) to train a local substitute for the remote system.

SLIDE 38

Attacking remotely hosted black-box models

(3) The adversary selects new synthetic inputs to query the remote ML system ("0", "2", "9", ...), based on how sensitive the local substitute's output surface is to input variations.

SLIDE 39

Attacking remotely hosted black-box models

(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system (e.g., as a "yield sign") because of transferability.
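A highly simplified sketch of steps (1)-(4) as one loop. Everything here is hypothetical scaffolding: query_remote stands in for the remote system's label API, and substitute for any local model exposing fit and input_gradients; the real attack (Papernot et al.) uses Jacobian-based dataset augmentation, which this only approximates:

import numpy as np

def substitute_attack(x_seed, query_remote, substitute, n_rounds=5, lam=0.1):
    x = x_seed
    for _ in range(n_rounds):
        y = query_remote(x)                       # (1) query remote system for labels
        substitute.fit(x, y)                      # (2) train the local substitute
        grads = substitute.input_gradients(x, y)  # hypothetical helper
        # (3) synthesize new queries in directions the substitute is sensitive to
        x = np.concatenate([x, x + lam * np.sign(grads)])
    return substitute                             # (4) craft adversarial examples on it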

SLIDE 40

Attacking with transferability

The same step (4) works against defenses: the adversary crafts adversarial examples on an undefended model, and they are misclassified by the defended model (e.g., as a "yield sign") because of transferability.

SLIDE 41

Attacking Adversarial Training with Transferability Demo

SLIDE 42

How to test your model for adversarial examples?

White-box attacks:

  • One shot: FastGradientMethod
  • Iterative/optimization-based: BasicIterativeMethod, CarliniWagnerL2

Transferability attacks:

  • Transfer from an undefended model
  • Transfer from a defended model
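These attack names are CleverHans classes. A minimal sketch of running one of them, using the v2-era CleverHans API this tutorial was written against; it assumes model is already wrapped as a cleverhans Model, and that sess and x are the session and input placeholder from the earlier sketches:

from cleverhans.attacks import FastGradientMethod

fgsm = FastGradientMethod(model, sess=sess)
adv_x = fgsm.generate(x, eps=0.3, clip_min=0., clip_max=1.)
# evaluate your classifier on adv_x instead of x to measure robustness;
# swap in BasicIterativeMethod or CarliniWagnerL2 for the stronger attacks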

SLIDE 43

Defenses

Adversarial training:

  • Original variant
  • Ensemble adversarial training
  • Madry et al.

Reduce dimensionality of input space:

  • Binarization of the inputs
  • Thermometer encoding


SLIDE 44

Adversarial examples represent worst-case distribution drifts

[DDS04] Dalvi et al. Adversarial Classification (KDD)

SLIDE 45

Adversarial examples are a tangible instance of hypothetical AI safety problems

Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg

SLIDE 46

How to reach out to us?

Nicholas Carlini: nicholas@carlini.com
Nicolas Papernot: nicolas@papernot.fr