Benchmarking Adversarial Robustness on Image Classification
Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Zihao Xiao, Hang Su, Jun Zhu
Dept. of Comp. Sci. and Tech., BNRist Center, Institute for AI, THBI Lab, Tsinghua University, Beijing, China
An adversarial example is crafted by adding a small perturbation that is visually indistinguishable from the corresponding normal example, yet causes the target model to misclassify it.
[Figure: example images and model predictions — Alps: 94.39%, Dog: 99.99%, Puffer: 97.99%, Crab: 100.00%. Figure from Dong et al. (2018).]
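As a concrete illustration of how such a perturbation can be crafted, here is a minimal PyTorch sketch of the fast gradient sign method (FGSM) of Goodfellow et al. (2014). The function name, the [0, 1] pixel range, and the choice of eps are illustrative assumptions, not the setup used in this work.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step L-inf attack (Goodfellow et al., 2014): move each pixel
    by eps in the direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # The perturbation is bounded by eps per pixel and typically
    # imperceptible, yet can flip the model's prediction.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # stay in the valid pixel range
```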
Attacks:
- One-step attacks [Goodfellow et al., 2014]
- Iterative attacks [Kurakin et al., 2016] (see the sketch after this list)
- Optimization-based attacks [Carlini and Wagner, 2017]
- Adaptive attacks [Athalye et al., 2018]

Defenses:
- Adversarial training with FGSM [Kurakin et al., 2015]
- Defensive distillation [Papernot et al., 2016]
- Randomization, denoising [Xie et al., 2018; Liao et al., 2018]
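To make the one-step vs. iterative distinction concrete, below is a minimal sketch of a basic iterative attack in the spirit of Kurakin et al. (2016): it repeats a small gradient-sign step and projects back into the eps-ball around the original input. The step size, iteration count, and function name are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def bim(model, x, y, eps, steps=10):
    """Iterative L-inf attack (Kurakin et al., 2016): repeat small
    gradient-sign steps, projecting back into the eps-ball each time."""
    alpha = eps / steps  # illustrative step size, an assumption
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project onto the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```

Iterating the step is what makes these attacks stronger than one-step FGSM under the same budget, which is one driver of the arms race described below.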
There is an “arms race” between attacks and defenses, making it hard to understand their effects.
- Threat Models: we define complete threat models
- Attacks: we adopt 15 attacks
- Defenses: we adopt 16 defenses
- Evaluation Metrics: robustness curves (a sketch of computing one follows this list)
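One common way to realize such a metric is an accuracy vs. perturbation budget curve: attack the same inputs under a growing eps and record how much accuracy survives. The sketch below is hypothetical; accuracy_vs_budget and its signature are invented for illustration, and attack can be any function with the interface of the bim() sketch above. This is not the paper's evaluation code.

```python
import torch

@torch.no_grad()
def accuracy(model, x, y):
    """Fraction of inputs classified correctly."""
    return (model(x).argmax(dim=1) == y).float().mean().item()

def accuracy_vs_budget(model, x, y, attack, budgets):
    """Hypothetical robustness curve: re-attack the same batch under an
    increasing perturbation budget and record accuracy at each point."""
    curve = []
    for eps in budgets:
        x_adv = attack(model, x, y, eps)  # e.g. the bim() sketch above
        curve.append((eps, accuracy(model, x_adv, y)))
    return curve
```

Sweeping the budget from 0 upward traces how quickly a defense degrades, which gives a fairer comparison than reporting accuracy at a single fixed eps.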
Feature highlights: