

  1. Toward Adversarial Robustness by Diversity in an Ensemble of Specialized Deep Neural Networks, presented at Canadian AI 2020. Mahdieh Abbasi (1), Arezoo Rajabi (2), Christian Gagné (1,3), and Rakesh B. Bobba (2). 1: IID, Université Laval, Québec, Canada; 2: Oregon State University, Corvallis, USA; 3: Mila, Canada CIFAR AI Chair

  2. Adversarial attacks aim at fooling a model through imperceptible modifications. An ensemble of specialists is trained on diverse subsets of classes. Samples with low ensemble classification certainty are rejected. Legitimate sample or adversarial instance? 2

  3. Adversarial example: Adding the right imperceptible perturbation ε to a clean sample x creates an adversarial example x' = x + ε such that a NN is fooled into misclassifying it confidently: P(C = panda | x) > 0.9, yet P(C = gibbon | x') > 0.9 3

  4. Examples of attacks
     ! Fast Gradient Sign (FGS): gradient ascent to maximize the loss of the neural network (NN). Fast, one step; non-optimal adversaries.
     ! Targeted FGS: gradient descent to minimize the loss of the NN toward a target class. Fast, one step; non-optimal adversaries.
     ! DeepFool: iteratively project the sample onto the nearest decision boundary. Moderate speed, iterative; better adversaries.
     ! Carlini & Wagner (CW): directly optimize an objective function. Slow, iterative; optimal adversaries. 4
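To make the simplest entry above concrete, here is a minimal one-step FGS sketch. It assumes a PyTorch classifier and inputs scaled to [0, 1]; the function name and the epsilon value are illustrative choices, not the authors' code.

```python
import torch
import torch.nn.functional as F

def fgs_attack(model, x, y, epsilon=0.1):
    """One-step Fast Gradient Sign attack: perturb x in the direction
    (sign of the gradient) that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)      # loss of the NN on the true labels
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()  # gradient ascent on the loss
    return x_adv.clamp(0.0, 1.0).detach()        # keep pixels in a valid range
```

The targeted variant from the table would instead subtract epsilon times the gradient sign of the loss computed against the chosen target class (gradient descent toward that class).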

  5. Attack models. Black-box attack: the attacker does not know anything about the victim classifier; another NN is used as a proxy for the victim classifier. Gray-box attack: the attacker knows the defense mechanism but not the victim's specific model parameters. White-box attack: the attacker knows the defense mechanism and all the victim's model parameters 5

  6. The goal: Without any specific adversarial training, detect adversarial instances by calibrating the model's predictive confidence: reduce predictive confidence on adversarial examples while keeping it high on clean samples. How: by leveraging diversity in an ensemble of specialists 6

  7. Ensemble of specialists: Build an ensemble of specialists, each trained on a different subset of classes. ! Train several specialist neural networks on subsets of classes. ! A generalist network is also trained on all the classes. Example for a 3-class classification problem: one generalist NN over the 3 classes, and two specialist NNs over 2 classes each 7
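As a rough illustration of how each member could be trained on its own subset of classes, here is a small PyTorch-style sketch that restricts a labelled dataset to a class subset and remaps the labels. The helper name, the dataset interface (items as (tensor, int) pairs), and the batch size are assumptions for illustration only, not the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader, Subset

def make_specialist_loader(dataset, class_subset, batch_size=128):
    """Restrict a labelled dataset to a subset of classes, remapping labels
    to 0..len(class_subset)-1 so a specialist head can be trained on it."""
    class_to_idx = {c: i for i, c in enumerate(class_subset)}
    indices = [i for i, (_, y) in enumerate(dataset) if int(y) in class_to_idx]
    subset = Subset(dataset, indices)

    def collate(batch):
        xs, ys = zip(*batch)
        xs = torch.stack(xs)                                  # assumes tensor inputs
        ys = torch.tensor([class_to_idx[int(y)] for y in ys]) # remapped labels
        return xs, ys

    return DataLoader(subset, batch_size=batch_size, shuffle=True, collate_fn=collate)
```

The generalist member would simply use the full dataset with all classes.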

  8. Schematic explanation (1/2): A black-box adversarial example that fools the generalist classifier can be classified as different classes by the specialists, creating diversity (entropy) in their predictions and leading to a low-confidence prediction 8

  9. Schematic explanation (2/2): Generating high-confidence white-box adversaries becomes harder, since the specialists are fooled toward distinct classes according to their class subsets 9

  10. Our approach: 1. Creation of the ensemble of specialists. 2. Voting mechanism: merging the members' predictions to compute the ensemble decision. 3. Rejection of samples whose predictive confidence is below a threshold τ 10

  11. Ensemble creation: 1) Build a fooling matrix from FGS adversaries (true classes as rows, fooling classes as columns). 2) For each class, define two class subsets: the most likely fooling classes and the remaining classes. ! Train an ensemble of 20 specialists + 1 generalist 11
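The sketch below shows one way the two class subsets per class could be derived from the fooling matrix. The "above the row mean" criterion for picking the most likely fooling classes is a heuristic assumption for illustration, not necessarily the exact rule used by the authors.

```python
import numpy as np

def build_class_subsets(fooling_matrix):
    """Derive 2K specialist class subsets from a K x K fooling matrix, where
    fooling_matrix[i, j] counts FGS adversaries crafted from true class i that
    get misclassified as class j (assumed interpretation).

    For each class i:
      - subset A: class i plus the classes it is most often fooled toward
                  (here: counts above the row mean, a heuristic choice);
      - subset B: class i plus all remaining classes.
    """
    K = fooling_matrix.shape[0]
    subsets = []
    for i in range(K):
        row = fooling_matrix[i].astype(float)
        row[i] = 0.0                                   # ignore self-counts
        high = {j for j in range(K) if row[j] > row.mean()}
        subsets.append({i} | high)                     # most likely fooling classes
        subsets.append({i} | (set(range(K)) - high))   # remaining classes
    subsets.append(set(range(K)))                      # the generalist sees all classes
    return subsets
```

For a 10-class problem such as MNIST or CIFAR-10 this yields 2 x 10 = 20 specialist subsets plus the full class set for the generalist, matching the 21-member ensemble above.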

  12. Voting mechanism. Principle: a sample should be voted for by all the models relevant to its predicted class. Agreement: all relevant models vote for a given class; only their prediction confidences are averaged as the output. Disagreement: no class receives the votes of all its relevant models; the prediction confidences of all members are averaged as the output 12
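A minimal sketch of this voting mechanism, assuming each member outputs a probability vector over the full class set (zero outside its own subset); the function signature and data layout are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def ensemble_predict(member_probs, member_subsets, num_classes, tau=0.5):
    """member_probs   : list of M arrays, each a probability vector over num_classes.
       member_subsets : list of M sets of class indices each member was trained on.
       Returns (predicted_class, confidence, rejected)."""
    votes = [int(np.argmax(p)) for p in member_probs]

    for c in range(num_classes):
        relevant = [i for i, s in enumerate(member_subsets) if c in s]
        # Agreement: every model relevant to class c votes for c;
        # only their confidences are averaged.
        if relevant and all(votes[i] == c for i in relevant):
            probs = np.mean([member_probs[i] for i in relevant], axis=0)
            conf = float(probs[c])
            return c, conf, conf < tau

    # Disagreement: the confidences of all members are averaged.
    probs = np.mean(member_probs, axis=0)
    c = int(np.argmax(probs))
    conf = float(probs[c])
    return c, conf, conf < tau
```

Because the generalist is relevant to every class, at most one class can obtain full agreement, so the loop is well defined; the rejection step of slide 10 then compares the returned confidence to τ.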

  13. Confidence upper bound: In the presence of disagreement, the ensemble predictive confidence is upper bounded (M is the ensemble size): h̄(x) ≤ 0.5 + 1/(2M). Based on this corollary, we set a fixed global threshold (τ = 0.5) for rejecting adversaries 13
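As a quick numeric check of how this corollary supports the threshold choice, assume the configuration suggested on slide 11 (20 specialists plus one generalist, so M = 21; the value of M is an inference from that slide):

```latex
% Worked instance of the disagreement bound, assuming M = 21 members
% (20 specialists + 1 generalist for a 10-class problem, cf. slide 11).
\bar{h}(x) \;\le\; \frac{1}{2} + \frac{1}{2M}
          \;=\; \frac{1}{2} + \frac{1}{42}
          \;\approx\; 0.524
```

Under disagreement the ensemble confidence therefore never rises much above 0.5, which is what makes the fixed global threshold τ = 0.5 effective for rejection.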

  14. Evaluation metrics. Risk rate on clean samples (E_D): ratio of clean samples that are either correctly classified but rejected (confidence below τ) or misclassified but not rejected (confidence above τ). Risk rate on adversaries (E_A): percentage of misclassified adversaries that are not rejected (confidence above τ) 14
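A small NumPy sketch of how these two risk rates could be computed, assuming arrays of true labels, ensemble predictions, ensemble confidences, and a boolean flag marking adversarial inputs; this interface is an illustrative assumption.

```python
import numpy as np

def risk_rates(y_true, y_pred, confidence, is_adversarial, tau=0.5):
    """E_D: clean samples correctly classified but rejected, or misclassified
            but not rejected.
       E_A: misclassified adversaries that are not rejected."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rejected = np.asarray(confidence) < tau
    correct = y_pred == y_true
    adv = np.asarray(is_adversarial, dtype=bool)

    e_d = np.mean(((correct & rejected) | (~correct & ~rejected))[~adv])
    e_a = np.mean((~correct & ~rejected)[adv])
    return e_d, e_a
```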

  15. Experiments: MNIST black-box attacks. [Figure: predictive confidence on clean samples vs. FGS adversaries] 15

  16. Experiments: CIFAR-10 black-box attacks. [Figure: predictive confidence on clean samples vs. FGS adversaries] 16

  17. Experiments: White-box attacks ! Generating adversaries specifically targeted at our ensemble model, at a pure ensemble, or at a vanilla CNN ! A lower success rate is better: it reflects the difficulty of generating adversaries against the model 17

  18. Experiments: Gray-box attack ! Generate 100 CW adversarial examples using another ensemble of specialists ! 74% of them have confidence lower than 0.5 (rejected) ! The remaining 26% have confidence higher than 0.5 (not rejected) ! Some non-rejected adversaries are hard to recognize even for a human observer 18

  19. Our contributions: A method for building an ensemble of diverse specialists, along with a simple and computationally efficient voting mechanism, to calibrate the predictive confidence for distinguishing clean samples from adversarial examples. Detection of adversaries using a provable fixed global threshold on the predictive confidence 19

  20. Thanks for your attention! Mahdieh Abbasi, PhD student, mahdieh.abbasi.1@ulaval.ca. Christian Gagné, professor, christian.gagne@gel.ulaval.ca, https://vision.gel.ulaval.ca/~cgagne, https://iid.ulaval.ca. Arezoo Rajabi, PhD student, rajabia@oregonstate.edu. Rakesh B. Bobba, professor, rakesh.bobba@oregonstate.edu, https://eecs.oregonstate.edu/people/bobba-rakesh 20
