SLIDE 1

by learning the Distributions of Adversarial Examples

Boqing Gong

Joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang Published in ICML 2019

SLIDE 2

Intriguing properties of deep neural networks (DNNs)

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.

SLIDE 3

Intriguing properties of deep neural networks (DNNs)

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.

SLIDE 4

Projected gradient descent (PGD) attack

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
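A minimal sketch of the L-infinity PGD attack from the citation above, written in PyTorch. The model interface, epsilon, step size, and step count are illustrative assumptions, not values from the slides.

```python
# Hedged sketch of an L-infinity PGD attack (Madry et al., 2017).
# `model`, `eps`, `alpha`, and `steps` are illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return an adversarial example within an L-infinity ball of radius eps around x."""
    # Random start inside the eps-ball (as in Madry et al.)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Take a signed gradient ascent step, then project back into the eps-ball
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```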

SLIDE 5

Intriguing results (1)

~100% attack success rates on CIFAR10 & ImageNet


SLIDE 6

Intriguing results (2)

SLIDE 7

Intriguing results (2)

Adversarial examples transfer across different DNNs

E.g., from AlexNet to InceptionV3

SLIDE 8

Intriguing results (3)

A universal adversarial perturbation

Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

SLIDE 9

In a nutshell, white-box adversarial attacks can

Fool DNNs on almost all test examples
→ Most data points lie near the classification boundaries.

Fool different DNNs with the same adversarial examples
→ The classification boundaries of various DNNs are close.

Fool different DNNs with a single universal perturbation
→ We can turn most examples adversarial by moving them along the same direction by the same amount.

SLIDE 10

However, white-box adversarial attacks cannot

Apply to most real-world scenarios
Work when the network architecture is unknown
Work when the weights are unknown
Work when querying the network is prohibitive (e.g., too costly)

SLIDE 11

Black-box attacks

[Figure: querying the black-box model with a panda image returns only output probabilities: Panda: 0.88493, Indri: 0.00878, Red Panda: 0.00317]

Substitute attack (Papernot et al., 2017)
Decision-based (Brendel et al., 2017)
Boundary-tracing (Cheng et al., 2019)
Zeroth-order (Chen et al., 2017)
Natural evolution strategies (Ilyas et al., 2018)

SLIDE 12

Existing black-box attacks search for a single adversarial perturbation (for an input) and suffer from bad local optima, non-smooth optimization, the curse of dimensionality, defense-specific gradient estimation, etc.

Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.

SLIDE 13

Our work

Learns the distribution of adversarial examples (for any input)

SLIDE 14

Our work

Learns the distribution of adversarial examples (for an input)

Reduces the “attack dimension” → fewer queries into the network

Smooths the optimization → higher attack success rates

Characterizes the risk of the input example → new defense methods

SLIDE 15

Our work

Learns the distribution of adversarial examples (for an input)

SLIDE 16

Our work

Learns the distribution of adversarial examples (for an input); a sample from the distribution fools the DNN with high probability.
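One way to write that idea as an objective, a sketch using notation that is not on the slides: $\pi_\theta$ is the learned distribution of adversarial examples around the input $x$, and $\ell$ is an attack loss that is small when a sample $x'$ is misclassified by the DNN $f$.

```latex
\min_{\theta} \; J(\theta) \;=\; \mathbb{E}_{x' \sim \pi_\theta(\cdot \mid x)} \big[\, \ell\big(f(x'),\, y\big) \,\big]
```

Driving $J(\theta)$ down concentrates probability mass on perturbations that fool the DNN, so a random sample from $\pi_\theta$ succeeds with high probability.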

SLIDE 17

Which family of distributions?

SLIDES 18–24

Natural evolution strategies (NES)

Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR.

SLIDE 25

Black-box: the NES gradient estimate requires only queries to the network's outputs, not access to its architecture, weights, or gradients.
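A rough NumPy sketch of an NES-style search in this black-box setting (Wierstra et al., 2014; Ilyas et al., 2018): it adapts the mean of a Gaussian search distribution over eps-bounded perturbations using only loss queries. The function and hyperparameter names are illustrative assumptions; the method in this talk learns a richer distribution in a lower-dimensional transformed space, which is what the earlier "attack dimension" slide refers to.

```python
# Hedged sketch: NES gradient estimation for a black-box attack.
# `loss_fn(image)` is assumed to query the target model and return an attack
# loss that is small when the image is misclassified; eps, sigma, lr, the
# population size, and the iteration count are illustrative choices.
import numpy as np

def nes_attack(loss_fn, x, eps=0.03, sigma=0.1, lr=0.02, pop=50, iters=300):
    mu = np.zeros_like(x)  # mean of the Gaussian search distribution
    for _ in range(iters):
        noise = np.random.randn(pop, *x.shape)
        # Query the model on samples drawn around mu, projected to the eps-ball
        losses = np.array([
            loss_fn(np.clip(x + np.clip(mu + sigma * n, -eps, eps), 0.0, 1.0))
            for n in noise
        ])
        # Normalize the "fitness" values and form the NES gradient estimate
        z = (losses - losses.mean()) / (losses.std() + 1e-8)
        grad = (z[:, None] * noise.reshape(pop, -1)).mean(axis=0).reshape(x.shape) / sigma
        mu -= lr * grad  # descend the estimated gradient of the expected loss
    return np.clip(x + np.clip(mu, -eps, eps), 0.0, 1.0)
```

Because the search updates the distribution's parameters with forward queries only, it smooths the optimization landscape and sidesteps the gradient-masking issues that defeat many gradient-based attacks.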

SLIDE 26

Experiment setup

Attack 13 defended DNNs & 2 vanilla DNNs

Consider both ℓ∞ and ℓ2 perturbations

Examine all test examples of CIFAR10 & 1,000 of ImageNet, excluding those misclassified by the targeted DNN

Evaluate by attack success rates

SLIDE 27

Attack success rates, ImageNet

SLIDE 28

Attack success rates, CIFAR10

SLIDE 29

Attack success rate vs. optimization steps

SLIDE 30

Transferability of the adversarial examples

SLIDE 31

A universally effective defense technique?

Adversarial training / defensive learning: train the DNN weights on adversarial examples generated on the fly by the PGD attack.
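The slide's labels appear to correspond to the saddle-point formulation of adversarial training from Madry et al. (2017), cited earlier: the inner maximization is approximated by the PGD attack, and the outer minimization updates the DNN weights $\theta$.

```latex
\min_{\theta} \; \mathbb{E}_{(x,\, y) \sim \mathcal{D}} \Big[ \max_{\|\delta\|_{\infty} \le \epsilon} L\big(f_\theta(x + \delta),\, y\big) \Big]
```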

SLIDE 32

In a nutshell, our method

Is a powerful black-box attack, on par with or better than white-box attacks

Is universal: it defeated various defenses with the same algorithm

Characterizes the distributions of adversarial examples

Reduces the “attack dimension”

Speeds up defensive learning (ongoing work)

SLIDE 33

Physical adversarial attack

Boqing Gong

Joint work with Yang Zhang, Hassan Foroosh, & Philip David
Published in ICLR 2019

SLIDE 34

Recall the following result

A universal adversarial perturbation

Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

SLIDE 35

Physical attack: universal perturbation → 2D mask

Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). Robust physical-world attacks on deep learning models. CVPR.

SLIDE 36

Physical attack: 2D mask → 3D camouflage

Goal: gradient descent w.r.t. the camouflage c to minimize the vehicle's detection scores under all feasible locations. Problem: the objective is non-differentiable.

SLIDE 37

Physical attack: 2D mask → 3D camouflage

Repeat until done:
1. Camouflage a vehicle
2. Drive it around and take many pictures of it
3. Detect it with Faster-RCNN & save the detection scores

→ Dataset: {(camouflage, vehicle, background, detection score)}

SLIDE 38

Physical attack: 2D mask → 3D camouflage

Fit a DNN to predict any camouflage's corresponding detection scores.

SLIDE 39

Physical attack: 2D mask → 3D camouflage

Gradient descent w.r.t. the camouflage c to minimize the vehicle's detection scores under all feasible locations: the real pipeline is non-differentiable, but it is approximated by a DNN, which is differentiable.
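A minimal PyTorch sketch of that idea. The surrogate architecture, the 16x16 camouflage resolution, and the training details are illustrative assumptions; the real method also conditions on the vehicle and background and interleaves data collection with refitting.

```python
# Hedged sketch: approximate the non-differentiable (render + detect) pipeline
# with a small surrogate DNN, then optimize the camouflage through it.
# The camouflage resolution and the surrogate architecture are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

surrogate = nn.Sequential(  # maps a camouflage pattern to a predicted detection score
    nn.Flatten(), nn.Linear(3 * 16 * 16, 256), nn.ReLU(), nn.Linear(256, 1))

def fit_surrogate(camouflages, scores, epochs=200, lr=1e-3):
    """camouflages: (N, 3, 16, 16) tensor; scores: (N, 1) detection scores from Faster-RCNN."""
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.mse_loss(surrogate(camouflages), scores).backward()
        opt.step()

def optimize_camouflage(steps=200, lr=0.05):
    c = torch.rand(1, 3, 16, 16, requires_grad=True)  # camouflage texture to learn
    opt = torch.optim.Adam([c], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        surrogate(c.clamp(0, 1)).mean().backward()  # minimize the predicted detection score
        opt.step()
    return c.detach().clamp(0, 1)
```

In the full loop described on the previous slides, this optimization step would alternate with repainting the vehicle, collecting new detection scores, and refitting the surrogate.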

SLIDE 40

SLIDE 41

Why do we care?

SLIDE 42

Observation, re-observation, & future work

Defended DNNs are still vulnerable to transfer attacks (though only to a moderate degree)

Adversarial examples from black-box attacks are less transferable than those from white-box attacks

All future work on defenses will adopt adversarial training

Adversarial training will become faster (we are working on it)

We should certify DNNs' expected robustness

SLIDE 43

New works to watch

Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293.

Explaining adversarial examples: Ilyas et al. (2019). Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175.

Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877; Shafahi et al. (2019). Adversarial Training for Free! arXiv:1904.12843.

Certifying DNNs' expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR; Cohen et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. arXiv:1902.02918.