SLIDE 1

Toward Adversarial Robustness by Diversity in an Ensemble of Specialized Deep Neural Networks

presented at Canadian AI 2020

Mahdieh Abbasi 1, Arezoo Rajabi 2, Christian Gagné 1,3, and Rakesh B. Bobba 2

  • 1. IID, Université Laval, Québec, Canada
  • 2. Oregon State University, Corvallis, USA
  • 3. Mila, Canada CIFAR AI Chair

SLIDE 2

[Figure: a legitimate sample plus an imperceptible perturbation r yields an adversarial instance]

  • Adversarial attacks aim at fooling a model through imperceptible modifications
  • Ensemble of specialists trained on diverse subsets of classes
  • Reject samples with low ensemble classification certainty

SLIDE 3

Adversarial example

Adding the right imperceptible perturbation ε to a clean sample x creates an adversarial example x′ = x + ε, such that the NN is fooled into confidently misclassifying it!

P(C = panda | x) > 0.9   →   P(C = gibbon | x′) > 0.9

SLIDE 4

Examples of attacks

Fast Gradient Sign (FGS)
  • Algorithm: gradient ascent – maximize the loss of the neural network (NN)
  • Properties: fast, one step; non-optimal adversaries

Targeted FGS
  • Algorithm: gradient descent – minimize the loss of the NN toward a target class
  • Properties: fast, one step; non-optimal adversaries

DeepFool
  • Algorithm: project the sample onto the nearest decision boundary
  • Properties: moderate (iterative); better adversaries

Carlini & Wagner (CW)
  • Algorithm: directly optimize an objective function
  • Properties: slow (iterative); optimal adversaries
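A minimal sketch of the one-step FGS update in PyTorch; `model`, `x`, `y`, and the budget `eps` are assumed placeholders, not anything specified on the slides:

```python
import torch
import torch.nn.functional as F

def fgs_attack(model, x, y, eps=0.1):
    """Untargeted Fast Gradient Sign: one gradient-ascent step on the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # loss w.r.t. the true labels y
    loss.backward()
    # Move every pixel one eps-sized step in the direction that increases the loss
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()     # stay in the valid pixel range
```

The targeted variant would instead take a gradient-descent step toward a chosen target class, i.e. subtract `eps * x_adv.grad.sign()` with the target label in place of `y`.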

SLIDE 5

Attack models

Black-box attack
  • Attacker does not know anything about the victim classifier
  • Another NN is used as a proxy for the victim classifier

Gray-box attack
  • Attacker knows the defense mechanism but not the victim's specific model parameters

White-box attack
  • Attacker knows the defense mechanism and all the victim's model parameters

SLIDE 6

The goal

Without any specific adversarial training, detect adversarial instances by calibrating the model's predictive confidence

Reduce predictive confidence over adversarial examples while keeping it high for clean samples

How: leveraging diversity in an ensemble of specialists
SLIDE 7

Ensemble of specialists

[Figure: a generalist NN over 3 classes and two specialist NNs over 2 classes each]

Build an ensemble of specialists, each trained on a different set of classes
  • Train several specialist neural networks, each on a subset of the classes
  • A generalist network is also trained on all the classes
  • Example: a 3-class classification problem (sketched below)
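A self-contained sketch of that 3-class example in PyTorch; the toy data, architecture, and training loop are illustrative assumptions, not the authors' setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(600, 10)               # toy features
y = torch.randint(0, 3, (600,))        # toy labels in {0, 1, 2}

# One generalist over all classes plus two 2-class specialists
subsets = [(0, 1, 2), (0, 1), (1, 2)]
ensemble = []
for classes in subsets:
    mask = torch.isin(y, torch.tensor(classes))
    Xs, ys = X[mask], y[mask]
    # Re-index the labels to 0..len(classes)-1 for this member's softmax
    remap = {c: i for i, c in enumerate(classes)}
    ys = torch.tensor([remap[int(c)] for c in ys])
    net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, len(classes)))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(200):               # brief full-batch training
        opt.zero_grad()
        F.cross_entropy(net(Xs), ys).backward()
        opt.step()
    ensemble.append((net, classes))    # keep each member's class subset alongside it
```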

SLIDE 8

Schematic explanation (1/2)

A black-box attack that fools the generalist classifier (left) can be classified as different classes by the specialists, creating diversity (entropy) in their predictions → a low-confidence prediction

SLIDE 9

Schematic explanation (2/2)

Generating high-confidence white-box adversaries becomes harder, since the different class subsets push the specialists to be fooled toward distinct classes

SLIDE 10

Our approach

1. Create the ensemble of specialists
2. Voting mechanism: merge the members' predictions to compute the ensemble decision
3. Reject a sample when its predictive confidence < τ

SLIDE 11

Ensemble creation

[Figure: fooling matrix, true classes × fooling classes]

1. Build a fooling matrix from FGS adversaries
2. For each class, define two class subsets: its most likely fooling classes and the remaining classes

Resulting ensemble: 20 specialists + 1 generalist
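A sketch of these two steps in NumPy; `model_predict`, `adv_by_class`, and the cut-off `thresh` are assumed placeholders (the slides do not specify how "most likely" is thresholded):

```python
import numpy as np

def fooling_matrix(model_predict, adv_by_class, num_classes):
    """C[i, j] = fraction of FGS adversaries crafted from class i
    that the model classifies as class j."""
    C = np.zeros((num_classes, num_classes))
    for i, advs in adv_by_class.items():   # advs: FGS adversaries built from class i
        preds = model_predict(advs)        # assumed to return predicted class indices
        for j in range(num_classes):
            C[i, j] = np.mean(preds == j)
    return C

def class_subsets(C, thresh=0.1):
    """For each class i, two subsets: class i with its most likely
    fooling classes, and the remaining classes (which also contain i)."""
    K = C.shape[0]
    subsets = []
    for i in range(K):
        likely = {j for j in range(K) if j != i and C[i, j] >= thresh}
        subsets.append(likely | {i})            # subset 1: i + likely fooling classes
        subsets.append(set(range(K)) - likely)  # subset 2: the remaining classes
    return subsets  # 2K subsets -> 2K specialists, plus the generalist
```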

SLIDE 12

Voting mechanism

Principle: a sample should be classified by all the models relevant to its given class

Agreement: all models relevant to a given class vote for it; only their prediction confidences are averaged as the output

Disagreement: no class receives the votes of all its relevant models; the prediction confidences of all members are averaged as the output
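A NumPy sketch of this voting rule together with the rejection step from the approach slide; it assumes each member's softmax output has already been expanded to the full class set (zeros outside its subset), which is an implementation choice, not something the slides prescribe:

```python
import numpy as np

def ensemble_confidence(probs, subsets, num_classes):
    """probs: list of M per-member probability vectors over all classes,
    subsets: subsets[m] = set of classes member m was trained on."""
    M = len(probs)
    votes = [int(np.argmax(p)) for p in probs]
    for k in range(num_classes):
        relevant = [m for m in range(M) if k in subsets[m]]
        # Agreement: every model relevant to class k votes for k
        if relevant and all(votes[m] == k for m in relevant):
            return np.mean([probs[m] for m in relevant], axis=0)
    # Disagreement: average all M members (confidence provably bounded)
    return np.mean(probs, axis=0)

def classify_or_reject(h_bar, tau=0.5):
    """Reject when the ensemble's top confidence falls below tau."""
    k = int(np.argmax(h_bar))
    return (k, float(h_bar[k])) if h_bar[k] >= tau else ("reject", float(h_bar[k]))
```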

SLIDE 13

Confidence upper bound

In the presence of disagreement, the ensemble predictive confidence is upper bounded (M is the ensemble size):

h̄(x) ≤ 0.5 + 1/(2M)

Based on this corollary, we set a fixed global threshold (τ = 0.5) for rejecting adversaries
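As a worked instance (assuming the 21-member ensemble above, i.e. 20 specialists plus the generalist), the bound evaluates to just above one half:

```latex
\bar{h}(x) \;\le\; \frac{1}{2} + \frac{1}{2M}
\;\xrightarrow{\,M = 21\,}\;
\bar{h}(x) \;\le\; \frac{1}{2} + \frac{1}{42} \approx 0.524
```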

SLIDE 14

Evaluation metrics

Risk rate for clean samples (E_D): rate of clean samples that are either correctly classified but rejected (confidence below τ) or misclassified but not rejected (confidence above τ)

Risk rate for adversaries (E_A): percentage of misclassified adversaries that are not rejected (confidence above τ)
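A NumPy sketch of both risk rates; the array names and shapes are assumed for illustration:

```python
import numpy as np

def risk_rates(conf, pred, true, is_adv, tau=0.5):
    """conf: ensemble confidence per sample, pred/true: class labels,
    is_adv: boolean mask marking adversarial samples."""
    rejected = conf < tau
    correct = pred == true
    # E_D over clean samples: correct-but-rejected or wrong-but-accepted
    e_d = np.mean(((correct & rejected) | (~correct & ~rejected))[~is_adv])
    # E_A over adversaries: misclassified and not rejected
    e_a = np.mean((~correct & ~rejected)[is_adv])
    return e_d, e_a
```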

SLIDE 15

Experiments

MNIST black-box attacks

[Figure: results on clean samples and FGS adversaries]

SLIDE 16

Experiments

CIFAR-10 black-box attacks

[Figure: results on clean samples and FGS adversaries]

SLIDE 17

Experiments

White-box attacks

  • Generating adversaries specifically targeted at our ensemble model, a pure ensemble, or a vanilla CNN
  • A lower success rate is better: it shows that generating adversaries is harder

SLIDE 18

Experiments

Gray-box attack

  • Generate 100 CW adversarial examples using another ensemble of specialists
  • 74% of them have confidence lower than 0.5 (rejected)
  • The remaining 26% have confidence higher than 0.5 (not rejected)

Some non-rejected adversaries are hard to recognize even for a human observer

SLIDE 19

Our contributions

A method for building an ensemble of diverse specialists, along with a simple and computationally efficient voting mechanism, to calibrate the predictive confidence for distinguishing clean and adversarial examples

Detecting adversaries using a provable fixed global threshold on the predictive confidence
SLIDE 20

!"#$%&'()*'your #++,$+-)$'.

Mahdieh Abbasi, PhD student mahdieh.abbasi.1@ulaval.ca Arezoo Rajabi, PhD student rajabia@oregonstate.edu Rakesh B. Booba, professor rakesh.bobba@oregonstate.edu !""#$%&&''($)*+',*-$"."')'/0&#'*#1'&2*22.3+.4'$! Christian Gagné, professor christian.gagne@gel.ulaval.ca https://vision.gel.ulaval.ca/~cgagne https://iid.ulaval.ca