Benchmarking Adversarial Robustness on Image Classification


SLIDE 1

Benchmarking Adversarial Robustness on Image Classification

Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Zihao Xiao, Hang Su, Jun Zhu

Dept. of Comp. Sci. and Tech., BNRist Center, Institute for AI, THBI Lab, Tsinghua University, Beijing, 100084, China

Contact: dyp17@mails.tsinghua.edu.cn; fqa19@mails.tsinghua.edu.cn

SLIDE 2

Adversarial Examples

An adversarial example is crafted by adding a small perturbation to a normal example. It is visually indistinguishable from the normal one, yet it is misclassified by the target model.
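As a rough illustration of this definition, the sketch below applies a one-step sign-gradient perturbation (in the spirit of the one-step attacks cited on this slide) to a toy logistic model. The model, its weights, the input, and the budget `eps` are all made-up assumptions chosen so the effect is visible; on real images the budget would be far smaller relative to the pixel range.

```python
import math

def predict_prob(w, b, x):
    """Probability of class 1 under a toy logistic model (a hypothetical
    stand-in for the target model)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def one_step_attack(w, b, x, y, eps):
    """Move every input coordinate by eps in the direction of the sign of
    the loss gradient, so the perturbation has L-infinity norm eps."""
    p = predict_prob(w, b, x)
    # For logistic regression with cross-entropy loss, dLoss/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Illustrative values only (assumptions, not taken from the slides).
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.3], 1

x_adv = one_step_attack(w, b, x, y, eps=0.2)
clean_is_class_1 = predict_prob(w, b, x) > 0.5     # prediction on the normal input
adv_is_class_1 = predict_prob(w, b, x_adv) > 0.5   # prediction on the perturbed input
linf_distance = max(abs(a - c) for a, c in zip(x, x_adv))
print(clean_is_class_1, adv_is_class_1, linf_distance)
```

The normal input is classified as class 1, while the perturbed input, which differs by at most `eps` per coordinate, falls on the other side of the decision boundary.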

[Figure: example images with predicted labels and confidences: Alps 94.39%, Dog 99.99%, Puffer 97.99%, Crab 100.00%. Figure from Dong et al. (2018).]

Attacks:

  • One-step attacks [Goodfellow et al., 2014]
  • Iterative attacks [Kurakin et al., 2016]
  • Optimization-based attacks [Carlini and Wagner, 2017]
  • Adaptive attacks [Athalye et al., 2018]

Defenses:

  • Adversarial training with FGSM [Kurakin et al., 2015]
  • Defensive distillation [Papernot et al., 2016]
  • Randomization, denoising [Xie et al., 2018; Liao et al., 2018]

There is an “arms race” between attacks and defenses, making it hard to understand their effects.

SLIDE 3

Robustness Benchmark

  • Threat Models: we define complete threat models
  • Attacks: we adopt 15 attacks
  • Defenses: we adopt 16 defenses on CIFAR-10 and ImageNet
  • Evaluation Metrics:
      • Accuracy (attack success rate) vs. perturbation budget curves
      • Accuracy (attack success rate) vs. attack strength curves
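To make the first metric concrete, here is a minimal sketch of how an accuracy vs. perturbation budget curve could be computed. Everything in it is an assumption for illustration: the two-feature linear classifier, the four data points, and the closed-form ℓ∞ attack (which is exact only for linear models) are hypothetical stand-ins, not the benchmark's models, data, or attacks.

```python
def sign(v):
    return (v > 0) - (v < 0)

# Hypothetical linear classifier: predict class 1 when W.x + B > 0.
W, B = [1.0, -1.0], 0.0

def predict(x):
    return 1 if sum(w * xi for w, xi in zip(W, x)) + B > 0 else 0

def linf_attack(x, y, eps):
    """Worst-case L-infinity perturbation for a linear model: shift every
    coordinate by eps in the direction that lowers the true-class score."""
    direction = -1 if y == 1 else 1
    return [xi + direction * eps * sign(w) for xi, w in zip(x, W)]

def robust_accuracy(predict_fn, attack_fn, data, eps):
    """Fraction of examples still classified correctly after the attack."""
    correct = sum(predict_fn(attack_fn(x, y, eps)) == y for x, y in data)
    return correct / len(data)

def accuracy_vs_budget(predict_fn, attack_fn, data, budgets):
    """One (budget, accuracy) point per perturbation budget."""
    return [(eps, robust_accuracy(predict_fn, attack_fn, data, eps))
            for eps in budgets]

# Toy labelled data (made up for this sketch).
data = [([0.9, 0.1], 1), ([0.6, 0.3], 1), ([0.2, 0.7], 0), ([0.45, 0.55], 0)]
curve = accuracy_vs_budget(predict, linf_attack, data, [0.0, 0.1, 0.2, 0.3, 0.5])
for eps, acc in curve:
    print(f"eps={eps:.1f}  accuracy={acc:.2f}")
```

The second curve could presumably be obtained with the same loop by fixing the budget and sweeping a measure of attack strength, such as the number of attack iterations, instead of `eps`.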

SLIDE 4

Evaluation Results on CIFAR-10

ℓ∞ norm; untargeted attacks; white-box; accuracy curves

SLIDE 5

Platform: RealSafe

  • We developed a new platform for adversarial machine learning research called RealSafe, which focuses on benchmarking adversarial robustness on image classification correctly and efficiently.

Feature highlights:

  • Modular implementation consisting of attacks, models, defenses, datasets, and evaluations.
  • Supports TensorFlow and PyTorch models through the same interface.
  • Supports 11 attacks and many of the defenses benchmarked in this work.
  • Provides ready-to-use pre-trained baseline models (8 on ImageNet and 8 on CIFAR-10).
  • Provides efficient, easy-to-use tools for benchmarking models with the 2 robustness curves.
  • Available at https://github.com/thu-ml/realsafe (Scan the QR code for this URL).
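
The idea of putting models from different frameworks behind one interface can be sketched with a small adapter. This is a generic illustration only: `Classifier`, `WrappedModel`, `backend_a`, and `backend_b` are hypothetical names invented here, and none of them is RealSafe's actual API.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence

class Classifier(ABC):
    """Hypothetical common interface that evaluation code depends on."""
    @abstractmethod
    def predict(self, x: Sequence[float]) -> int:
        ...

class WrappedModel(Classifier):
    """Adapter that hides a backend-specific forward function behind predict()."""
    def __init__(self, forward: Callable[[Sequence[float]], List[float]]):
        self._forward = forward

    def predict(self, x: Sequence[float]) -> int:
        scores = self._forward(x)
        # Return the index of the highest class score.
        return max(range(len(scores)), key=scores.__getitem__)

# Two stand-in "backends". They are plain functions here; in practice each
# would wrap the forward pass of a model from a different framework.
def backend_a(x):
    return [sum(x), 1.0]

def backend_b(x):
    return [1.0, sum(x)]

def accuracy(model: Classifier, data) -> float:
    """Evaluation code sees only the Classifier interface."""
    return sum(model.predict(x) == y for x, y in data) / len(data)

data = [([0.9, 0.8], 0), ([0.1, 0.2], 1)]
acc_a = accuracy(WrappedModel(backend_a), data)
acc_b = accuracy(WrappedModel(backend_b), data)
print(acc_a, acc_b)
```

Because `accuracy` depends only on `predict`, the same evaluation routine runs unchanged against either backend.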
SLIDE 6

Thanks