Developments in Adversarial Machine Learning
Florian Tramèr, September 19th, 2019
Based on joint work with Jens Behrmann, Dan Boneh, Nicholas Carlini, Edward Chou, Pascal Dupré, Jörn-Henrik Jacobsen, Nicolas Papernot, Giancarlo Pellegrino, …
Adversarial (Examples in) ML
- N. Carlini, “Recent Advances in Adversarial Machine Learning”, ScAINet 2019
[Timeline: GANs vs Adversarial Examples, both introduced around 2013–2014; 1000+ papers, then 10000+ papers by 2017–2019. "Maybe we need to write 10x more papers"]
Adversarial Examples
How?
- Training ⟹ "tweak model parameters such that f([image of cat]) = cat"
- Attacking ⟹ "tweak input pixels such that f([image of cat]) = guacamole"
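To make the contrast concrete, here is a minimal sketch of the two gradient computations, using a toy linear model and random data (all names and values are illustrative, not from the talk):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(784, 10)        # toy MNIST-sized classifier
x = torch.rand(8, 784)                  # batch of "images"
y = torch.randint(0, 10, (8,))          # labels
lr, eps = 0.1, 0.1

# Training: gradient *descent* on the parameters, so that f(x) = y.
loss = F.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, list(model.parameters()))
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p -= lr * g

# Attacking: gradient *ascent* on the pixels, so that f(x) ≠ y (FGSM-style).
x_attack = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_attack), y)
(g,) = torch.autograd.grad(loss, x_attack)
x_adv = (x_attack + eps * g.sign()).clamp(0, 1).detach()
```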
Why?
- Concentration of measure in high dimensions?
[Gilmer et al., 2018, Mahloujifar et al., 2018, Fawzi et al., 2018, Ford et al., 2019]
- Well-generalizing "superficial" statistics?
[Jo & Bengio 2017, Ilyas et al., 2019, Gilmer & Hendrycks 2019]
[Image: an imperceptible perturbation turns "88% Tabby Cat" into "99% Guacamole"] (Szegedy et al., 2014; Goodfellow et al., 2015; Athalye, 2017)
Defenses
- A bunch of failed ones...
- Adversarial Training [Szegedy et al., 2014, Goodfellow et al., 2015, Madry et al., 2018]
⟹ For each training input (x, y), find the worst-case adversarial input
x' = argmax_{x' ∈ S(x)} Loss(f(x'), y)
where S(x) is a set of allowable perturbations of x, e.g., {x' : || x - x' ||∞ ≤ ε} (i.e., worst-case data augmentation)
⟹ Train the model on (x', y) (a PGD sketch follows below)
- Certified Defenses [Raghunathan et al., 2018, Wong & Kolter 2018]
⟹ Certificate of provable robustness for each point
⟹ Empirically weaker than adversarial training
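To make the inner maximization concrete, here is a minimal sketch of PGD-based adversarial training, assuming a PyTorch image classifier; the names and budgets (pgd_attack, eps, alpha, steps) are illustrative choices, not details from the talk:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.05, steps=10):
    """Approximate argmax_{x' in S(x)} Loss(f(x'), y) for the L-infinity ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()        # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                  # keep pixels valid
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    x_adv = pgd_attack(model, x, y)                # inner maximization: find x'
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)        # outer minimization on (x', y)
    loss.backward()
    optimizer.step()
    return loss.item()
```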
Lp robustness: An Over-studied Toy Problem?
Neural networks aren’t robust. Consider this simple “expectimax Lp” game:
- 1. Sample random input from test set
- 2. Adversary perturbs the point within a small Lp ball
- 3. Defender classifies perturbed point
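As a sketch, the game amounts to the following evaluation loop (assuming an attack function such as the PGD sketch above; robust accuracy is the defender's score):

```python
def robust_accuracy(model, test_loader, attack):
    """Play the 'expectimax Lp' game over a test set."""
    correct, total = 0, 0
    for x, y in test_loader:           # 1. sample random inputs from the test set
        x_adv = attack(model, x, y)    # 2. adversary perturbs within a small Lp ball
        pred = model(x_adv).argmax(1)  # 3. defender classifies the perturbed point
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```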
2015 → 2019, and 1000+ papers later: this was just a toy threat model ... Solving this won't magically make ML more "secure"
Ian Goodfellow, “The case for dynamic defenses against adversarial examples”, SafeML ICLR Workshop, 2019
Limitations of the “expectimax Lp” Game
- 1. Sample random input from test set
- What if the model has 99% accuracy and the adversary always picks from the 1%? (test-set attack, [Gilmer et al., 2018])
- 2. Adversary perturbs point within Lp ball
- Why limit to one Lp ball?
- How do we choose the “right” Lp ball?
- Why “imperceptible” perturbations?
- 3. Defender classifies perturbed point
- Can the defender abstain? (attack detection)
- Can the defender adapt?
Ian Goodfellow, “The case for dynamic defenses against adversarial examples”, SafeML ICLR Workshop, 2019
A real-world example of the “expectimax Lp” threat model: Perceptual Ad-blocking
- Ad-blocker’s goal: classify images as ads
- Attacker goals:
- Perturb ads to evade detection (False Negative)
- Perturb benign content to detect ad-blocker (False Positive)
1. Can the attacker run a “test-set attack”?
- No! (or ad designers have to create lots of random ads...)
2. Should attacks be imperceptible?
- Yes! The attack should not affect the website user
- Still, many choices other than Lp balls
3. Is detecting attacks enough?
- No! Attackers can exploit FPs and FNs
Tramèr et al., "AdVersarial: Perceptual Ad Blocking meets Adversarial Machine Learning", CCS 2019
Limitations of the “expectimax Lp” Game
- 1. Sample random input from test set
- 2. Adversary perturbs point within Lp ball
- Why limit to one Lp ball?
- How do we choose the “right” Lp ball?
- Why “imperceptible” perturbations?
- 3. Defender classifies perturbed point
- Can the defender abstain? (attack detection)
Robustness for Multiple Perturbations
Do defenses (e.g., adversarial training) generalize across perturbation types?
[Bar chart: clean accuracy and robust accuracy on L∞, L1, and RT (rotation-translation) perturbations, for standard training and for adversarial training against L∞, L1, and RT respectively]
MNIST: Robustness to one perturbation type ≠ robustness to all.
Robustness to one type can increase vulnerability to others.
Tramèr & Boneh, "Adversarial Training and Robustness for Multiple Perturbations", NeurIPS 2019
The multi-perturbation robustness trade-off
If there exist models with high robust accuracy for perturbation sets S1, S2, …, Sn, does there exist a single model robust to perturbations from S1 ∪ S2 ∪ … ∪ Sn?
Answer: in general, NO! There exist "mutually exclusive perturbations" (MEPs)
(robustness to S1 implies vulnerability to S2 and vice-versa)
Formally, we show that for a simple Gaussian binary classification task:
- L1 and L∞ perturbations are MEPs
- L∞ and spatial perturbations are MEPs
[Figure: a 2D illustration (axes x1, x2): a classifier robust to S1 is not robust to S2, a classifier robust to S2 is not robust to S1, and a third classifier is vulnerable to both S1 and S2]
Tramèr & Boneh, "Adversarial Training and Robustness for Multiple Perturbations", NeurIPS 2019
Empirical Evaluation
Can we train models to be robust to multiple perturbation types simultaneously?
Adversarial training for multiple perturbations:
⟹ For each training input (x, y), find the worst-case adversarial input
x' = argmax_{x' ∈ S1(x) ∪ … ∪ Sn(x)} Loss(f(x'), y)
⟹ "Black-box" approach:
max_{x' ∈ S1(x) ∪ … ∪ Sn(x)} Loss(f(x'), y) = max_{i=1,…,n} max_{x' ∈ Si(x)} Loss(f(x'), y)
⟹ Use an existing attack tailored to each Si; scales linearly in the number of perturbation sets (see the sketch below)
Tramèr & Boneh, "Adversarial Training and Robustness for Multiple Perturbations", NeurIPS 2019
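A minimal sketch of this "black-box" max strategy, assuming one attack function per perturbation set (e.g., the PGD sketch above for L∞, plus an L1 attack); the names and tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def worst_case_over_union(model, x, y, attacks):
    """attacks: one attack function per perturbation set S_i.
    Returns, per example, the candidate x' with the highest loss."""
    best_adv = x.clone()
    best_loss = torch.full((x.shape[0],), -float("inf"), device=x.device)
    for attack in attacks:                          # scales linearly in n
        x_adv = attack(model, x, y)                 # worst case within S_i(x)
        with torch.no_grad():
            loss = F.cross_entropy(model(x_adv), y, reduction="none")
        better = loss > best_loss                   # per-example comparison
        best_adv[better] = x_adv[better]
        best_loss = torch.max(loss, best_loss)
    return best_adv
```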
Results
[Plots: robust accuracy over training. MNIST: Adv∞, Adv1, Adv2, and Advmax, tested on ℓ∞, ℓ1, ℓ2, and on all. CIFAR10: Adv∞, Adv1, and Advmax, tested on ℓ∞, ℓ1, and on both]
Compared to the robust accuracy when training and evaluating on a single perturbation type, training and evaluating on all types loses ~5% accuracy in one setting and ~20% in the other.
Tramèr & Boneh, "Adversarial Training and Robustness for Multiple Perturbations", NeurIPS 2019
Affine adversaries
Instead of picking perturbations from S1 ∪ S2, why not combine them?
- E.g., small L1 noise + small L∞ noise
- or small rotation/translation + small L∞ noise
An affine adversary picks a perturbation from γ·S1 + (1 − γ)·S2, for γ ∈ [0, 1] (a search sketch follows below)
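A hedged sketch of such an affine adversary via sequential composition, assuming budget-parameterized attack functions l1_attack and linf_attack (illustrative names, not from the paper); it grid-searches the mixing weight γ:

```python
import torch.nn.functional as F

def affine_attack(model, x, y, l1_attack, linf_attack, eps1, eps_inf,
                  gammas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Split the perturbation budget: gamma of the L1 budget, (1 - gamma)
    of the L-infinity budget, and keep the gamma with the highest loss."""
    best_adv, best_loss = x, -float("inf")
    for g in gammas:
        x_adv = l1_attack(model, x, y, eps=g * eps1)                  # L1 step
        x_adv = linf_attack(model, x_adv, y, eps=(1 - g) * eps_inf)   # L-inf step
        loss = F.cross_entropy(model(x_adv), y).item()
        if loss > best_loss:
            best_adv, best_loss = x_adv, loss
    return best_adv
```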
[Bar chart: RT and L∞ attacks on CIFAR10, sweeping the affine weight over 1.0, 0.75, 0.5, 0.25, 0.0: clean accuracy 96%, accuracy on RT 83%, accuracy on L∞ 71%, accuracy on the union 66%, accuracy against the affine adversary 56%]
The affine adversary causes an extra loss of ~10% accuracy beyond the union.
Tramèr & Boneh, "Adversarial Training and Robustness for Multiple Perturbations", NeurIPS 2019
Limitations of the “expectimax Lp” Game
- 1. Sample random input from test set
- 2. Adversary perturbs point within Lp ball
- Why limit to one Lp ball?
- How do we choose the “right” Lp ball?
- Why “imperceptible” perturbations?
- 3. Defender classifies perturbed point
- Can the defender abstain? (attack detection)
Invariance Adversarial Examples
Let’s look at MNIST again:
(Simple dataset, centered and scaled, non-trivial robustness is achievable)
Models have been trained to “extreme” levels of robustness
(E.g., robust to L1 noise > 30 or L∞ noise = 0.4)
⟹ Some of these defenses are certified!
Jacobsen et al., “Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness”, 2019
[Figure: MNIST images x ∈ [0, 1]^784: a natural digit next to L1-perturbed and L∞-perturbed versions whose perturbations stay within the "robust" norm bounds, yet change the digit a human sees]
For such examples, humans agree more often with an undefended model than with an overly robust model.
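To illustrate the flavor of such invariance examples (a simplification, not the construction from Jacobsen et al.): within an L∞ ball of radius 0.4, an MNIST digit can move most of the way toward an image of a different class. A minimal sketch, assuming two [0, 1]-valued digit tensors:

```python
def toward_other_class(x_src, x_target, eps=0.4):
    """Return the point inside the L-infinity eps-ball around x_src that is
    closest to x_target (an image of a different class). A model certified
    as invariant on this ball must give it the same label as x_src, even if
    a human would now read it as x_target's class."""
    delta = (x_target - x_src).clamp(-eps, eps)   # project the difference
    return (x_src + delta).clamp(0.0, 1.0)
```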
Limitations of the “expectimax Lp” Game
- 1. Sample random input from test set
- 2. Adversary perturbs point within Lp ball
- Why limit to one Lp ball?
- How do we choose the “right” Lp ball?
- Why “imperceptible” perturbations?
- 3. Defender classifies perturbed point
- Can the defender abstain? (attack detection)
New Ideas for Defenses
What would a realistic attack on a cyber-physical image classifier look like?
- 1. Attack has to be physically realizable
⟹ Robustness to physical changes (lighting, pose, etc.)
- 2. Some degree of “universality”
Example: Adversarial patch
[Brown et al., 2018]
Can we detect such attacks?
Observation: To be robust to physical transforms, the attack has to be very "salient"
⟹ Use model interpretability to extract salient regions
Problem: this might also extract "real" objects
⟹ Add the extracted region(s) onto some test images and check how often this "hijacks" the true prediction (see the sketch below)
Chou et al. “SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems”, 2018
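A rough sketch of this paste-and-test check, assuming a saliency extractor salient_mask (e.g., Grad-CAM thresholded to a binary mask); all names are illustrative, and this is not the exact SentiNet pipeline:

```python
import torch

def hijack_rate(model, x_suspect, benign_batch, salient_mask):
    """x_suspect: (C, H, W); benign_batch: (B, C, H, W).
    Returns the fraction of benign images whose prediction is 'hijacked'
    when the suspect's salient region is pasted onto them."""
    mask = salient_mask(model, x_suspect)             # (1, H, W), 1 = salient
    suspect_pred = model(x_suspect.unsqueeze(0)).argmax(1)
    patched = benign_batch * (1 - mask) + x_suspect * mask
    preds = model(patched).argmax(1)
    return (preds == suspect_pred).float().mean().item()

# A high hijack rate on held-out images flags a localized universal attack
# (e.g., an adversarial patch) rather than a benign salient object.
```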
Does it work?
It seems so...
- Generating a patch that avoids detection harms the patch's universality
- Also works for some forms of “trojaning” attacks
- But:
- Very narrow threat model
- Somewhat complex system, so it's hard to say if we've thought of all attacks
Chou et al. “SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems”, 2018
Conclusions
The “expectimax Lp” game has proven more challenging than expected
- We shouldn’t forget that this is a “toy” problem
- Solving it doesn’t get us secure ML (in most settings)
- Current defenses break down as soon as one of the game's assumptions is invalidated
- E.g., robustness to more than one perturbation type
- Over-optimizing a standard benchmark can be harmful
- E.g., invariance adversarial examples
- Thinking about real cyber-physical attacker constraints might lead to interesting defense ideas