Feature Denoising for Improving Adversarial Robustness
Cihang Xie, Johns Hopkins University
- Background
- Towards Robust Adversarial Defense
Deep Networks
Label: King Penguin
Deep networks are GOOD
Deep Networks
Label: King Penguin
Label: Chihuahua
Deep networks are FRAGILE to small & carefully crafted perturbations
We call such images
Adversarial Examples
Adversarial Examples can exist across Different Tasks
semantic segmentation, pose estimation, text classification
[1] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. "Adversarial Examples for Semantic Segmentation and Object Detection." In ICCV, 2017.
[2] Moustapha Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. "Houdini: Fooling Deep Structured Prediction Models." In NeurIPS, 2018.
[3] Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. "HotFlip: White-Box Adversarial Examples for Text Classification." In ACL, 2018.
Adversarial Examples can be created by means other than Adding Perturbation
[4] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. "Spatially Transformed Adversarial Examples." In ICLR, 2018.
[5] Jianyu Wang, Zhishuai Zhang, Cihang Xie, et al. "Visual Concepts and Compositional Voting." In Annals of Mathematical Sciences and Applications, 2018.
[Figure: object detection outputs on clean vs. adversarial images, with per-object confidence scores]
Adversarial Examples can exist in The Physical World
[6] Lifeng Huang, Chengying Gao, Yuyin Zhou, Changqing Zou, Cihang Xie, Alan Yuille, and Ning Liu. "UPC: Learning Universal Physical Camouflage Attacks on Object Detectors." arXiv, 2019.
Generating Adversarial Examples is SIMPLE:
- Standard training: minimize loss(f(x), y_true; θ), i.e., minimize the loss function w.r.t. the network parameters θ
- Attack: maximize loss(f(x + r), y_true; θ), i.e., maximize the loss function w.r.t. the adversarial perturbation r
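To make this concrete, here is a minimal PGD-style attack sketch in PyTorch (my own illustration rather than the talk's exact setup; the model interface, epsilon, step size, and iteration count are all assumptions):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y_true, epsilon=8/255, step_size=2/255, num_steps=10):
    """Maximize loss(f(x + r), y_true; theta) w.r.t. the perturbation r,
    keeping r inside an L-infinity ball of radius epsilon (assumed budget)."""
    r = torch.empty_like(x).uniform_(-epsilon, epsilon)  # random start
    for _ in range(num_steps):
        r.requires_grad_(True)
        loss = F.cross_entropy(model(x + r), y_true)
        grad, = torch.autograd.grad(loss, r)
        with torch.no_grad():
            r = r + step_size * grad.sign()   # gradient ascent on the loss
            r = r.clamp(-epsilon, epsilon)    # project back into the epsilon-ball
            r = (x + r).clamp(0, 1) - x       # keep pixels in the valid range
    return (x + r).detach()
```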
- Background
- Towards Robust Adversarial Defense
Deep Networks
Label: King Penguin
[Figure: feature maps at successive network layers for a clean vs. an adversarial image]
Observation: Adversarial perturbations are SMALL in the pixel space
Observation: Adversarial perturbations are BIG in the feature space
We should DENOISE these feature maps
Our Solution: Denoising at the feature level
Traditional Image Denoising Operations:
Local filters (predefine a local region Ω_i for each pixel i):
- Bilateral filter: y_i = (1 / C(x)) · Σ_{∀j ∈ Ω_i} f(x_i, x_j) · x_j
- Median filter: y_i = median{x_j : ∀j ∈ Ω_i}
- Mean filter: y_i = (1 / |Ω_i|) · Σ_{∀j ∈ Ω_i} x_j

Non-local filters (the local region Ω_i is the whole image I):
- Non-local means: y_i = (1 / C(x)) · Σ_{∀j ∈ I} f(x_i, x_j) · x_j
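The non-local means form above carries over directly from pixels to CNN feature maps. Below is a minimal sketch of a non-local denoising operation with softmax weighting over dot-product affinities (in the style of non-local neural networks); the tensor layout and the omission of 1x1 embeddings are simplifying assumptions:

```python
import torch

def nonlocal_denoise(x):
    """y_i = (1 / C(x)) * sum_{j in I} f(x_i, x_j) * x_j, where
    f(x_i, x_j) = exp(x_i . x_j) and C(x) is the softmax normalizer."""
    n, c, h, w = x.shape
    feats = x.view(n, c, h * w)                         # column i holds x_i
    affinity = torch.bmm(feats.transpose(1, 2), feats)  # dot products x_i . x_j
    weights = torch.softmax(affinity, dim=-1)           # f(x_i, x_j) / C(x)
    y = torch.bmm(feats, weights.transpose(1, 2))       # weighted sum over all j
    return y.view(n, c, h, w)
```

Note the (h·w) × (h·w) affinity matrix: this is why non-local filtering is typically applied to low-resolution feature maps rather than full-resolution images.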
Denoising Block Design
input → denoising operation → 1×1 conv → output (plus a residual connection from input to output)
Denoising operations may lose information, so we add a residual connection to balance the tradeoff between removing noise and retaining the original signal.
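A minimal PyTorch sketch of this block, assuming a generic denoising operation such as the nonlocal_denoise sketch above (channel counts and wiring details are assumptions):

```python
import torch.nn as nn

class DenoisingBlock(nn.Module):
    """denoising operation -> 1x1 conv -> residual add, per the design above."""
    def __init__(self, channels, denoise_op):
        super().__init__()
        self.denoise_op = denoise_op   # e.g. the nonlocal_denoise sketch
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # the residual connection balances removing noise vs. retaining signal
        return x + self.conv(self.denoise_op(x))
```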
Training Strategy: Adversarial Training
- Core Idea: train with adversarial examples
min_θ max_r loss(f(x + r), y_true; θ)
max step: generate the adversarial perturbation r
min step: optimize the network parameters θ
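A minimal sketch of one adversarial-training step, reusing the hypothetical pgd_attack from the earlier sketch as the inner max step:

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y_true):
    x_adv = pgd_attack(model, x, y_true)           # max step: craft perturbation r
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y_true)   # loss on adversarial examples
    loss.backward()                                # min step: update parameters theta
    optimizer.step()
    return loss.item()
```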
Two Ways for Evaluating Robustness
Defending Against White-box Attacks
- Attackers know everything about the model
- Directly maximize loss(f(x + r), y_true; θ)
Defending Against Blind Attacks
- Attackers know nothing about the model
- Attackers generate adversarial examples using substitute networks (relying on transferability)
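A minimal sketch of evaluating a blind (transfer) attack under these definitions: adversarial examples are crafted on a substitute network the attacker controls, then tested on the unseen target model (the pgd_attack helper and all names are my assumptions):

```python
def blind_attack_accuracy(target_model, substitute_model, x, y_true):
    # the attacker never queries the target; success relies on transferability
    x_adv = pgd_attack(substitute_model, x, y_true)
    preds = target_model(x_adv).argmax(dim=1)
    return (preds == y_true).float().mean().item()
```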
Defending Against White-box Attacks
- Evaluation against adversarial attackers with up to 2000 attack iterations (more attack iterations indicate stronger attacks)
Defending Against White-box Attacks – Part I
[Figure: accuracy (%) vs. attack iterations (10 up to 2000). Under the 2000-iteration PGD attack, ALP (Inception-v3) falls to 27.9%, while ours (R-152 baseline) holds 41.7%.]
A successful adversarial training can give us a STRONG baseline
Defending Against White-box Attacks – Part I
[Figure: accuracy (%) vs. attack iterations (10 up to 2000). Under the 2000-iteration PGD attack: ours (R-152 denoise) 45.5%, ours (R-152 baseline) 41.7%, ALP (Inception-v3) 27.9%.]
Feature Denoising can give us additional benefits
Defending Against White-box Attacks – Part II
[Figure: accuracy (%) vs. attack iterations (10 up to 100). Curves: ResNet-152 baseline; +4 bottleneck (ResNet-164); +4 denoise variants: null (1x1 only), 3x3 mean, 3x3 median, bilateral (dot product), bilateral (Gaussian), non-local (dot product), non-local (Gaussian).]
All denoising operations can help
Defending Against White-box Attacks – Part III
[Figure: accuracy (%) vs. attack iterations. Annotated accuracies: ResNet-152 baseline 52.5% / 41.7%, ResNet-152 denoise 55.7% / 45.5%, ResNet-638 57.3% / 46.1%.]
Feature Denoising is nearly as powerful as adding ~500 additional layers
Defending Against White-box Attacks – Part III
[Figure: as above, adding ResNet-638 denoise* at 61.3% / 49.9%.]
Feature Denoising can still provide benefits for the VERY deep ResNet-638
Defending Against Blind Attacks
- Offline evaluation against the 5 BEST attackers from the NeurIPS Adversarial Competition 2017
- Online competition against 48 UNKNOWN attackers in CAAD 2018
CAAD 2018 “all or nothing” criterion: an image is considered correctly classified only if the model correctly classifies all adversarial versions of this image created by all attackers
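A small sketch of computing this criterion, assuming one batch of adversarial versions per attacker, all derived from the same clean images (function and argument names are hypothetical):

```python
import torch

def all_or_nothing_accuracy(model, adversarial_versions, labels):
    """Count an image as correct only if every attacker's version is classified correctly."""
    correct = torch.ones_like(labels, dtype=torch.bool)
    with torch.no_grad():
        for x_adv in adversarial_versions:   # one batch per attacker
            preds = model(x_adv).argmax(dim=1)
            correct &= (preds == labels)     # must survive every attacker
    return correct.float().mean().item()
```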
Defending Against Blind Attacks --- CAAD 2017 Offline Evaluation
Defending Against Blind Attacks --- CAAD 2018 Online Competition
[Figure: accuracy (%) of the top-5 CAAD 2018 entries: 1st 50.6, 2nd 40.8, 3rd 8.6, 4th 3.6, 5th 0.6]
Visualization
[Figure: feature maps of Adversarial Examples before and after the Denoising Operations]