Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang
Understanding Adversarial Samples
[Figure: a legitimate input next to its C&W2 attack counterpart. Labels: Legitimate input; C&W2 attack; Model: A.J. Buckley; Human: Isla Fisher; Pixel-wise differences (×50).]
perceptible attributes?
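The "pixel-wise differences (×50)" panel can be illustrated with a minimal sketch. It assumes the ×50 label means that the per-pixel difference between the legitimate input and the attacked image is multiplied by 50 so that it becomes visible. The function name, the flat list of grayscale intensities, and the clipping to 255 are illustrative assumptions, not details from the slides.

```python
def amplified_difference(legit, adversarial, factor=50, max_value=255):
    """Per-pixel absolute difference, scaled by `factor` and clipped to max_value.

    `legit` and `adversarial` are hypothetical flat lists of grayscale
    intensities in [0, max_value]; real face images would be 2-D or 3-D arrays.
    """
    if len(legit) != len(adversarial):
        raise ValueError("images must have the same number of pixels")
    return [min(max_value, abs(a - b) * factor) for a, b in zip(legit, adversarial)]


# Toy data: the two "images" differ by at most 2 intensity levels.
legit = [120, 64, 200, 33]
adversarial = [121, 64, 198, 33]
print(amplified_difference(legit, adversarial))  # [50, 0, 100, 0]
```

Differences of one or two intensity levels are invisible in the raw images but stand out clearly once amplified.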
Input
1 Landmark generation
2 Attribute annotation: ✓ Left eye ✓ Right eye ✓ Nose ✓ Mouth ✓ …
3 Attribute witness extraction
4 Attribute-steered model ⊖ Original model
5 Consistency
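The final consistency step can be sketched as follows, assuming that "consistency" means comparing the prediction of the original model with that of the attribute-steered model and flagging the input when they disagree. Both toy classifiers and all names below are hypothetical stand-ins, not the networks used in the presentation.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], str]


def is_adversarial(original: Classifier, steered: Classifier,
                   x: Sequence[float]) -> bool:
    """Flag x when the two models disagree on its label (assumed criterion)."""
    return original(x) != steered(x)


def toy_original(x: Sequence[float]) -> str:
    # Hypothetical original model: uses every feature.
    return "A" if sum(x) > 1.0 else "B"


def toy_steered(x: Sequence[float]) -> str:
    # Hypothetical attribute-steered model: relies only on the first two
    # features, standing in for neurons tied to human-perceptible attributes.
    return "A" if x[0] + x[1] > 1.0 else "B"


benign = [0.8, 0.7, 0.1]       # both models answer "A"
suspicious = [0.2, 0.1, 0.9]   # only the non-attribute feature pushes toward "A"

print(is_adversarial(toy_original, toy_steered, benign))      # False
print(is_adversarial(toy_original, toy_steered, suspicious))  # True
```

The second input changes the original model's decision through a feature the steered model ignores, so the two predictions diverge and the input is flagged.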
[Figure: attribute witness extraction. Panels: Input, Model.]
A Attribute substitution
B Attribute preservation
C Feature variants (⊖ after attribute substitution)
D Feature invariants (⊖ after attribute preservation)
E Attribute witnesses
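One way to read this diagram is sketched below, under stated assumptions: ⊖ is taken to be a per-neuron activation difference, "feature variants" are neurons whose activation changes when the attribute is substituted, "feature invariants" are neurons whose activation stays the same when the attribute is preserved, and the witnesses are the neurons in both sets. The threshold, the function names, and the activation values are illustrative only.

```python
def changed_neurons(base, other, threshold):
    """Indices of neurons whose activation differs by more than `threshold`."""
    return {i for i, (a, b) in enumerate(zip(base, other)) if abs(a - b) > threshold}


def attribute_witnesses(base, substituted, preserved, threshold=0.1):
    """Neurons that react to attribute substitution but not to preservation.

    base:        activations for the original input
    substituted: activations after replacing the attribute region
    preserved:   activations after keeping the attribute and changing the rest
    """
    variants = changed_neurons(base, substituted, threshold)
    invariants = set(range(len(base))) - changed_neurons(base, preserved, threshold)
    return variants & invariants  # assumed combination rule: intersection


# Toy activations for one layer with four neurons.
base = [0.5, 0.9, 0.1, 0.4]
substituted = [0.5, 0.2, 0.8, 0.0]
preserved = [0.5, 0.9, 0.6, 0.4]

print(sorted(attribute_witnesses(base, substituted, preserved)))  # [1, 3]
```

Neuron 2 changes in both experiments, so it is not specific to the attribute and is excluded; neuron 0 never changes and is excluded as well.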
neurons in each layer
positives on benign inputs
accuracy with 23.3% false positives for face recognition systems
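For reference, the two quantities named in these fragments can be computed as in the sketch below. It assumes the usual definitions: the false-positive rate is the fraction of benign inputs flagged as adversarial, and detection accuracy is the fraction of adversarial inputs flagged. The flag lists are made-up toy data and do not reproduce any reported result.

```python
def flagged_fraction(flags):
    """Fraction of inputs that a detector flagged as adversarial."""
    if not flags:
        raise ValueError("need at least one input")
    return sum(flags) / len(flags)


# Hypothetical detector outputs (True means "flagged as adversarial").
benign_flags = [False, False, True, False]
adversarial_flags = [True, True, True, False]

false_positive_rate = flagged_fraction(benign_flags)
detection_accuracy = flagged_fraction(adversarial_flags)

print(false_positive_rate)  # 0.25
print(detection_accuracy)   # 0.75
```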