Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning (PowerPoint PPT Presentation)


SLIDE 1

Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

Battista Biggio

Pattern Recognition and Applications Lab University of Cagliari, Italy

Winter School on Quantitative Systems Biology: Learning and Artificial Intelligence, Nov. 15-16, Trieste, Italy. * Slides from this talk are inspired by the tutorial I prepared with Fabio Roli on this topic: https://www.pluribus-one.it/sec-ml/wild-patterns/

SLIDE 2

93

Countering Evasion Attacks

What is the rule? The rule is protect yourself at all times (from the movie “Million dollar baby”, 2004)

SLIDE 3

Security Measures against Evasion Attacks

  • 1. Reduce sensitivity to input changes with robust optimization
    – Adversarial training / regularization
  • 2. Introduce rejection / detection of adversarial examples

94

$$\min_{w} \; \sum_i \; \max_{\|\delta_i\| \leq \varepsilon} \; \ell\big(y_i, f_w(x_i + \delta_i)\big) \qquad \text{(bounded perturbation!)}$$

(Figure: decision regions of SVM-RBF with rejection, higher rejection rate, vs. SVM-RBF with no reject option.)

SLIDE 4

95

Countering Evasion: Reducing Sensitivity to Input Changes with Robust Optimization

SLIDE 5

  • Robust optimization (a.k.a. adversarial training)
  • Robustness and regularization (Xu et al., JMLR 2009)
    – under linearity of ℓ and f(x), regularization is equivalent to robust optimization

Reducing Input Sensitivity via Robust Optimization

$$\min_{w} \; \max_{\|\delta_i\| \leq \varepsilon} \; \sum_i \ell\big(y_i, f_w(x_i + \delta_i)\big) \qquad \text{(bounded perturbation!)}$$

$$\min_{w} \; \sum_i \ell\big(y_i, f_w(x_i)\big) + \varepsilon \, \|\nabla_x f_w\|_q$$

where $\|\nabla_x f_w\|_q$ is the dual norm of the perturbation; for a linear $f$, $\|\nabla_x f_w\|_q = \|w\|_q$.
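Below is a minimal sketch of how the inner maximization can be approximated in practice with projected gradient descent (PGD) adversarial training. It is an illustrative sketch under simplifying assumptions (PyTorch, L-infinity-bounded perturbations, cross-entropy loss); the model, data and hyper-parameters are hypothetical placeholders, not part of the original slides.

```python
import torch
import torch.nn.functional as F

def pgd_perturbation(model, x, y, eps=0.3, alpha=0.05, steps=10):
    """Approximate the inner max over ||delta||_inf <= eps with iterative gradient ascent."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascend on the loss
            delta.clamp_(-eps, eps)              # project back onto the eps-ball
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y, eps=0.3):
    """One outer step of: min_w sum_i max_{||delta_i|| <= eps} loss(y_i, f_w(x_i + delta_i))."""
    delta = pgd_perturbation(model, x, y, eps=eps)
    optimizer.zero_grad()                        # clear gradients accumulated while crafting delta
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```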

SLIDE 6

Experiments on Android Malware

  • Infinity-norm regularization is the optimal regularizer against sparse evasion attacks
    – Sparse evasion attacks penalize the ℓ1 norm of the perturbation, promoting the manipulation of only a few features

Results on Adversarial Android Malware

[Demontis, Biggio et al., Yes, ML Can Be More Secure!..., IEEE TDSC 2017]

Why? Because it bounds the maximum absolute weight value! (Figure: absolute weight values $|w|$ in descending order.)

$$\min_{w, b} \; \|w\|_\infty + C \sum_i \max\big(0, 1 - y_i f(x_i)\big), \qquad \|w\|_\infty = \max_{i=1,\dots,d} |w_i|$$
Sec-SVM

97
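Below is a minimal sketch of an infinity-norm regularized linear SVM in the spirit of Sec-SVM, trained by plain subgradient descent. It is an illustrative re-implementation under simplifying assumptions (NumPy, fixed learning rate), not the authors' code; all names and hyper-parameters are hypothetical.

```python
import numpy as np

def train_linf_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on  ||w||_inf + C * sum_i max(0, 1 - y_i (w.x_i + b)).

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                                        # margin violations
        grad_w = -C * (y[viol][:, None] * X[viol]).sum(axis=0)    # hinge-loss subgradient
        grad_b = -C * y[viol].sum()
        j = np.argmax(np.abs(w))                                  # ||w||_inf acts on the largest |w_j|
        grad_w[j] += np.sign(w[j])
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```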

SLIDE 7

Adversarial Training and Regularization

  • Adversarial training can also be seen as a form of regularization, which penalizes the (dual) norm of the input gradients, $\|\nabla_x \ell\|_q$

  • Known as double backprop or gradient/Jacobian regularization

– see, e.g., Simon-Gabriel et al., Adversarial vulnerability of neural networks increases with input dimension, ArXiv 2018; and Lyu et al., A unified gradient regularization family for adversarial examples, ICDM 2015.

98

(Figure: prediction function g(x) around a point x, before and after adversarial training.) Take-home message: the net effect of these techniques is to make the prediction function of the classifier smoother.
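As a complement, here is a minimal sketch of input-gradient (double-backpropagation) regularization, which adds a penalty on $\|\nabla_x \ell\|$ to the training loss. It is an illustrative sketch assuming PyTorch; the penalty weight and model are hypothetical.

```python
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lam=0.1):
    """Cross-entropy plus a penalty on the norm of the input gradient of the loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # double backprop: differentiate the loss w.r.t. the input and keep the graph
    grad_x = torch.autograd.grad(loss, x, create_graph=True)[0]
    penalty = grad_x.flatten(1).norm(p=2, dim=1).mean()
    return loss + lam * penalty
```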
SLIDE 8

Ineffective Defenses: Obfuscated Gradients

  • Work by Carlini & Wagner (IEEE S&P ’17) and Athalye et al. (ICML ’18) has shown that
    – some recently-proposed defenses rely on obfuscated / masked gradients, and
    – they can be circumvented

99

g(") "’ " Obfuscated gradients do not allow the correct execution of gradient-based attacks... " g(") "’ ... but substitute models and/or smoothing can correctly reveal meaningful input gradients!

SLIDE 9

100

Countering Evasion: Detecting & Rejecting Adversarial Examples

SLIDE 10

Detecting & Rejecting Adversarial Examples

  • Adversarial examples tend to occur in blind spots

– Regions far from training data that are anyway assigned to ‘legitimate’ classes

101

(Figure, left: blind-spot evasion, not even required to mimic the target class. Right: rejection of adversarial examples through enclosing of the legitimate classes.)
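A minimal sketch of distance-based rejection in representation space follows: inputs whose (deep) features fall too far from the training data of the predicted class are rejected. The feature extractor, threshold, and helper names are illustrative assumptions, not the method from the cited papers.

```python
import numpy as np

def fit_class_centroids(train_feats, train_labels):
    """One centroid per class, computed from training-set feature representations."""
    return {c: train_feats[train_labels == c].mean(axis=0)
            for c in np.unique(train_labels)}

def predict_with_reject(feat, predicted_class, centroids, threshold):
    """Reject samples lying too far from the centroid of their predicted class."""
    dist = np.linalg.norm(feat - centroids[predicted_class])
    return "reject" if dist > threshold else predicted_class
```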

SLIDE 11

Detecting & Rejecting Adversarial Examples

(Figure: results as a function of the input perturbation, measured as Euclidean distance.)

102

[Melis, Biggio et al., Is Deep Learning Safe for Robot Vision? ICCVW ViPAR 2017]

SLIDE 12

[S. Sabour et al., ICLR 2016]

Why Is Rejection (in Representation Space) Not Enough?

103

SLIDE 13

Why Is Rejection (in Representation Space) Not Enough?

Slide credit: David Evans, DLS 2018 - https://www.cs.virginia.edu/~evans/talks/dls2018/

104

SLIDE 14

105

Adversarial Examples against Machine Learning Web Demo

https://sec-ml.pluribus-one.it/demo

SLIDE 15

106

Poisoning Machine Learning

SLIDE 16

Poisoning Machine Learning

107

(Diagram: training data with labels → pre-processing and feature extraction (x1, x2, ..., xd) → classifier learning.)

Example spam email: "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year ..."

Feature vector x (SPAM): start 1, bang 1, portfolio 1, winner 1, year 1, ...
Learned weights w: start +2, bang +1, portfolio +1, winner +1, year +1, ...

The classifier generalizes well on test data.
SLIDE 17

Poisoning Machine Learning

108

(Diagram: corrupted training data → pre-processing and feature extraction (x1, x2, ..., xd) → classifier learning is compromised...)

Example poisoned spam email: "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year ... university campus ..."

Feature vector x (SPAM): start 1, bang 1, portfolio 1, winner 1, year 1, ..., university 1, campus 1
Learned weights w: start +2, bang +1, portfolio +1, winner +1, year +1, ..., university +1, campus +1

Poisoning data is injected into the training set ... to maximize error on test data.

SLIDE 18

  • Goal: to maximize classification error
  • Knowledge: perfect / white-box attack
  • Capability: injecting poisoning samples into TR
  • Strategy: find an optimal attack point xc in TR that maximizes classification error

Poisoning Attacks against Machine Learning

(Figure: decision boundary and classification error, 0.039 vs. 0.022, for two locations of the attack point xc; the right panel shows the classification error as a function of xc.)

[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]

109

SLIDE 19

Poisoning is a Bilevel Optimization Problem

  • Attacker’s objective

– to maximize generalization error on untainted data, w.r.t. poisoning point xc

  • Poisoning problem against (linear) SVMs:

– The loss is estimated on validation data (with no attack points!), while the algorithm is trained on surrogate data (including the attack point)

[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012] [Xiao, Biggio, Roli et al., Is feature selection secure against training data poisoning? ICML, 2015] [Munoz-Gonzalez, Biggio, Roli et al., Towards poisoning of deep learning..., AISec 2017]

$$\max_{x_c} \; L\big(D_{\mathrm{val}}, w^\ast\big) \qquad \text{s.t.} \quad w^\ast = \arg\min_{w} \; \mathcal{L}\big(D_{\mathrm{tr}} \cup \{(x_c, y_c)\}, \, w\big)$$

For the (linear) SVM:

$$\max_{x_c} \; \sum_{j=1}^{m} \max\big(0, \, 1 - y_j \, f^\ast(x_j)\big)$$

$$\text{s.t.} \quad (w^\ast, b^\ast) = \arg\min_{w, b} \; \tfrac{1}{2} w^\top w + C \sum_{i=1}^{n} \max\big(0, \, 1 - y_i f(x_i)\big) + C \max\big(0, \, 1 - y_c f(x_c)\big)$$

110
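A minimal sketch of the outer gradient-ascent loop on the poisoning point is shown below. The gradient of the validation loss with respect to x_c (obtained, e.g., by retraining and differentiating through the KKT conditions of the inner problem) is abstracted behind a hypothetical helper function; this is an illustrative sketch, not the attack code from the cited papers.

```python
import numpy as np

def optimize_poisoning_point(x_c0, y_c, grad_val_loss_wrt_xc, lr=0.1, steps=100,
                             box=(0.0, 1.0)):
    """Gradient ascent on the validation loss L(D_val, w*) w.r.t. the poisoning point x_c.

    grad_val_loss_wrt_xc(x_c, y_c) is assumed to retrain the classifier on
    D_tr U {(x_c, y_c)} and return dL(D_val, w*)/dx_c (e.g., via the KKT conditions).
    """
    x_c = np.array(x_c0, dtype=float)
    for _ in range(steps):
        g = grad_val_loss_wrt_xc(x_c, y_c)
        x_c = np.clip(x_c + lr * g, *box)   # ascend and keep the point in the feasible box
    return x_c
```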

SLIDE 20


Gradient-based Poisoning Attacks

  • The gradient is not easy to compute
    – the training point affects the classification function
  • Trick:
    – replace the inner learning problem with its equilibrium (KKT) conditions
    – this enables computing the gradient in closed form

  • Example for (kernelized) SVM

– similar derivation for Ridge, LASSO, Logistic Regression, etc.

111

(Figure: gradient-based trajectory of the poisoning point from its initialization $x_c^{(0)}$ to the final $x_c$.)

[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012] [Xiao, Biggio, Roli et al., Is feature selection secure against training data poisoning? ICML, 2015]

SLIDE 21

Experiments on MNIST digits

Single-point attack

  • Linear SVM; 784 features; TR: 100; VAL: 500; TS: about 2000

– ‘0’ is the malicious (attacking) class – ‘4’ is the legitimate (attacked) one

(Figure: the initial poisoning digit $x_c^{(0)}$ and the optimized poisoning digit $x_c$.)

112

[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]

SLIDE 22

Experiments on MNIST digits

Multiple-point attack

  • Linear SVM; 784 features; TR: 100; VAL: 500; TS: about 2000

– ‘0’ is the malicious (attacking) class – ‘4’ is the legitimate (attacked) one

113

[Biggio, Nelson, Laskov. Poisoning attacks against SVMs. ICML, 2012]

SLIDE 23

How about Poisoning Deep Nets?

  • The ICML 2017 Best Paper by Koh et al., “Understanding Black-box Predictions via Influence Functions”, has derived adversarial training examples against a DNN
    – they were constructed by attacking only the last layer (a KKT-based attack against logistic regression) and assuming the rest of the network to be “frozen”

114

SLIDE 24

Towards Poisoning Deep Neural Networks

  • Solving the poisoning problem without exploiting KKT conditions (back-gradient)

– Muñoz-González, Biggio, Roli et al., AISec 2017 https://arxiv.org/abs/1708.08689

115

SLIDE 25

116

Countering Poisoning Attacks

What is the rule? The rule is protect yourself at all times (from the movie “Million dollar baby”, 2004)

SLIDE 26

Security Measures against Poisoning

  • Rationale: poisoning injects outlying training samples
  • Two main strategies for countering this threat

1. Data sanitization: remove poisoning samples from training data

  • Bagging for fighting poisoning attacks
  • Reject-On-Negative-Impact (RONI) defense (see the sketch after this list)

2. Robust Learning: learning algorithms that are robust in the presence of poisoning samples
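A minimal sketch of the RONI idea: each candidate training point is kept only if adding it does not degrade accuracy on a held-out validation set. The train/evaluate helper functions are hypothetical placeholders and the tolerance is illustrative.

```python
import numpy as np

def roni_filter(candidates, base_train, val_set, train_fn, acc_fn, tol=0.0):
    """Keep a candidate training point only if it does not reduce validation accuracy."""
    X_tr, y_tr = base_train
    base_acc = acc_fn(train_fn(X_tr, y_tr), val_set)
    kept = []
    for x, y in candidates:
        model = train_fn(np.vstack([X_tr, x]), np.append(y_tr, y))
        if acc_fn(model, val_set) >= base_acc - tol:   # negligible negative impact
            kept.append((x, y))
    return kept
```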


117

SLIDE 27

Robust Regression with TRIM

  • TRIM learns the model by retaining only training points with the smallest residuals

$$\arg\min_{w, b, I} \; L(w, b, I) = \frac{1}{|I|} \sum_{i \in I} \big(f(x_i) - y_i\big)^2 + \lambda \, \Omega(w)$$

$$n = (1 + \alpha) N, \qquad I \subset \{1, \dots, n\}, \qquad |I| = N$$

[Jagielski, Biggio et al., IEEE Symp. Security and Privacy, 2018]

118
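A minimal sketch of the TRIM alternating scheme: fit the regression model on the current subset, then re-select the N points with the smallest residuals. This is an illustrative re-implementation using ridge regression, not the authors' code; all names and parameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def trim_regression(X, y, n_keep, lam=1.0, iters=20, seed=0):
    """Alternate between fitting on the subset I and keeping the n_keep smallest residuals."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), n_keep, replace=False)      # initial subset I
    model = Ridge(alpha=lam)
    for _ in range(iters):
        model.fit(X[idx], y[idx])
        residuals = (model.predict(X) - y) ** 2
        idx = np.argsort(residuals)[:n_keep]             # re-select subset I
    return model, idx
```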

SLIDE 28

Experiments with TRIM (Loan Dataset)

  • TRIM MSE is within 1% of original model MSE

(Figure: MSE of existing methods, our defense (TRIM), and no defense; a lower MSE under poisoning means a better defense.)

119

[Jagielski, Biggio et al., IEEE Symp. Security and Privacy, 2018]

SLIDE 29

120

Other Attacks against ML

SLIDE 30

Attacks against Machine Learning

[Biggio & Roli, Wild Patterns, 2018, https://arxiv.org/abs/1712.03141]

Attacker's goal:
  • Integrity: misclassifications that do not compromise normal system operation
  • Availability: misclassifications that compromise normal system operation
  • Privacy / Confidentiality: querying strategies that reveal confidential information on the learning model or its users

Attacker's capability vs. goal:
  • Test data, integrity: evasion (a.k.a. adversarial examples)
  • Test data, privacy: model extraction / stealing; model inversion (hill-climbing); membership inference attacks
  • Training data, integrity: poisoning (to allow subsequent intrusions), e.g., backdoors or neural network trojans
  • Training data, availability: poisoning (to maximize classification error)

Attacker's knowledge:
  • perfect-knowledge (PK) white-box attacks
  • limited-knowledge (LK) black-box attacks (transferability with surrogate/substitute learning models)

121

SLIDE 31

Model Inversion Attacks

Privacy Attacks

  • Goal: to extract users’ sensitive information

(e.g., face templates stored during user enrollment)

– Fredrikson, Jha, Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. ACM CCS, 2015

  • Also known as hill-climbing attacks in the biometric community

– Adler. Vulnerabilities in biometric encryption systems. 5th Int’l Conf. AVBPA, 2005 – Galbally, McCool, Fierrez, Marcel, Ortega-Garcia. On the vulnerability of face verification systems to hill-climbing attacks. Patt. Rec., 2010

  • How: by repeatedly querying the target system and adjusting the input sample to maximize its output score (e.g., a measure of the similarity of the input sample with the user templates)

122

(Figure: reconstructed image vs. training image.)
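A minimal sketch of the hill-climbing loop described above: repeatedly query the target model and keep a random modification only when it increases the returned confidence score. The black-box query function, step size, and query budget are illustrative assumptions.

```python
import numpy as np

def hill_climb(query_score, shape, steps=5000, sigma=0.05, seed=0):
    """Reconstruct an input that maximizes the score returned by a black-box model."""
    rng = np.random.default_rng(seed)
    x = rng.random(shape)                   # random starting image in [0, 1]
    best = query_score(x)
    for _ in range(steps):
        candidate = np.clip(x + sigma * rng.standard_normal(shape), 0.0, 1.0)
        score = query_score(candidate)
        if score > best:                    # keep the change only if the score improves
            x, best = candidate, score
    return x, best
```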

SLIDE 32

Membership Inference Attacks

Privacy Attacks (Shokri et al., IEEE Symp. SP 2017)

  • Goal: to identify whether an input sample is part of the training set used to learn a deep neural network, based on the observed prediction scores for each class

123
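A minimal sketch of a simple confidence-thresholding baseline for membership inference (a strong simplification of the shadow-model approach of Shokri et al.); the threshold and the shape of the model outputs are illustrative assumptions.

```python
import numpy as np

def membership_inference(pred_probs, threshold=0.9):
    """Guess 'member' when the model's top predicted probability is suspiciously high.

    pred_probs: (n, num_classes) softmax outputs observed from the target model.
    Returns a boolean array: True = predicted to belong to the training set.
    """
    return pred_probs.max(axis=1) >= threshold
```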

SLIDE 33

(Figure: poisoned training data and a backdoored stop sign labeled as speed limit.)

Backdoor Attacks

Poisoning Integrity Attacks

124

Backdoor / poisoning integrity attacks place mislabeled training points in a region of the feature space far from the rest of the training data. The learning algorithm labels such a region as desired, allowing for subsequent intrusions / misclassifications at test time. (Companion figure: training data with no poisoning.)

  • T. Gu, B. Dolan-Gavitt, and S. Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. In NIPS Workshop on Machine Learning and Computer Security, 2017.
  • X. Chen, C. Liu, B. Li, K. Lu, and D. Song. Targeted backdoor attacks on deep learning systems using data poisoning. ArXiv e-prints, 2017.
  • M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proc. ACM Symp. Information, Computer and Comm. Sec., ASIACCS ’06, pages 16–25, New York, NY, USA, 2006. ACM.
  • M. Barreno, B. Nelson, A. Joseph, and J. Tygar. The security of machine learning. Machine Learning, 81:121–148, 2010.
  • B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In J. Langford and J. Pineau, editors, 29th Int’l Conf. on Machine Learning, pages 1807–1814. Omnipress, 2012.
  • B. Biggio, G. Fumera, and F. Roli. Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering, 26(4):984–996, April 2014.
  • H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli. Is feature selection secure against training data poisoning? In F. Bach and D. Blei, editors, JMLR W&CP - Proc. 32nd Int’l Conf. Mach. Learning (ICML), volume 37, pages 1689–1698, 2015.
  • L. Munoz-Gonzalez, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli. Towards poisoning of deep learning algorithms with back-gradient optimization. In 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17, pp. 27–38, 2017. ACM.
  • B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. ArXiv e-prints, 2018.
  • M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 39th IEEE Symp. on Security and Privacy, 2018.

(The references above cover attacks referred to as ‘backdoor’ and as ‘poisoning integrity’.)

SLIDE 34

125

Are Adversarial Examples a Real Security Threat?

SLIDE 35

The World Is Not Digital…

  • … Previous cases of adversarial examples have a common characteristic: the adversary is able to precisely control the digital representation of the input to the machine learning tools…

126

[M. Sharif et al., ACM CCS 2016]

(Figure: school bus x + adversarial noise r, classified as ostrich, Struthio camelus.)

SLIDE 36

127

Do Adversarial Examples Exist in the Physical World?

SLIDE 37

  • Do adversarial images fool deep networks even when they operate in the physical world, for example when images are taken with a cell-phone camera?
    – Alexey Kurakin et al. (2016, 2017) explored the possibility of creating adversarial images for machine learning systems which operate in the physical world. They used images taken with a cell-phone camera as input to an Inception v3 image classification neural network
    – They showed that, in such a set-up, a significant fraction of adversarial images crafted using the original network are misclassified even when fed to the classifier through the camera

[Alexey Kurakin et al., ICLR 2017]

Adversarial Images in the Physical World

128

SLIDE 38

[M. Sharif et al., ACM CCS 2016]

The adversarial perturbation is applied only to the eyeglasses image region

Adversarial Glasses

129

SLIDE 39

Should We Be Worried?

130

SLIDE 40

No, We Should Not…

131

In this paper, we show experiments that suggest that a trained neural network classifies most of the pictures taken from different distances and angles of a perturbed image correctly. We believe this is because the adversarial property of the perturbation is sensitive to the scale at which the perturbed picture is viewed, so (for example) an autonomous car will misclassify a stop sign only from a small range of distances. [arXiv:1707.03501; CVPR 2017]

SLIDE 41

Yes, We Should...

132

[https://blog.openai.com/robust-adversarial-inputs/]

SLIDE 42

Yes, We Should...

133

[Athalye et al., Synthesizing robust adversarial examples. ICLR, 2018]

SLIDE 43

Yes, We Should...

134

SLIDE 44

Yes, We Should...

135

SLIDE 45

  • Adversarial examples can exist in the physical world: we can fabricate concrete adversarial objects (glasses, road signs, etc.)

  • But the effectiveness of attacks carried out by adversarial objects is still to be investigated with large-scale experiments in realistic security scenarios

  • Gilmer et al. (2018) have recently discussed the realism of the security threat posed by adversarial examples, pointing out that it should be carefully investigated
    – Are indistinguishable adversarial examples a real security threat?
    – For which real security scenarios are adversarial examples the best attack vector, better than attacking components outside the machine learning component?
    – …

136

Is This a Real Security Threat?

[Justin Gilmer et al., Motivating the Rules of the Game for Adversarial Example Research, https://arxiv.org/abs/1807.06732]

SLIDE 46

137

Are Indistinguishable Perturbations a Real Security Threat?

SLIDE 47

$f(x + r) \neq y$

The adversarial image x + r is visually hard to distinguish from x

…There is a torrent of work that views increased robustness to restricted perturbations as making these models more secure. While not all of this work requires completely indistinguishable modifications, many of the papers focus on specifically small modifications, and the language in many suggests or implies that the degree of perceptibility of the perturbations is an important aspect of their security risk…

Indistinguishable Adversarial Examples

138

[Justin Gilmer et al., Motivating the Rules of the Game for Adversarial Example Research, arXiv 2018]

SLIDE 48

  • The attacker can benefit from a minimal perturbation of a legitimate input; e.g., she could use the attack for a longer period of time before it is detected

  • But is minimal perturbation a necessary constraint for the attacker?

Indistinguishable Adversarial Examples

139

SLIDE 49

  • Is minimal perturbation a necessary constraint for the attacker?

Indistinguishable Adversarial Examples

140

SLIDE 50

Indistinguishable Adversarial Examples

141

[Flickr user “faungg”. Stop? Accessed: 2018-7-18. Aug. 2014. https://www.flickr.com/photos/44534236@N00/15536855528/]

  • Is minimal perturbation a necessary constraint for the attacker?
SLIDE 51

Attacks with Content Preservation

142

There are well known security applications where minimal perturbations and indistinguishability of adversarial inputs are not required at all…

SLIDE 52

…At the time of writing, we were unable to find a compelling example that required indistinguishability… To have the largest impact, we should both recast future adversarial example research as a contribution to core machine learning and develop new abstractions that capture realistic threat models.

143

Are Indistinguishable Perturbations a Real Security Threat?

[Justin Gilmer et al., Motivating the Rules of the Game for Adversarial Example Research, arXiv 2018]

SLIDE 53

To Conclude…

This is a recent research field…

144

Dagstuhl Perspectives Workshop on “Machine Learning in Computer Security” Schloss Dagstuhl, Germany, Sept. 9th-14th, 2012

SLIDE 54

Timeline of Learning Security

145

(Timeline figure: Adversarial ML → Security of DNNs.)

2004-2005: pioneering work. Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005
2006-2010: Barreno, Nelson, Rubinstein, Joseph, Tygar, The Security of Machine Learning
2013: Srndic & Laskov, NDSS, claim nonlinear classifiers are secure
2013-2014: Biggio et al., ECML, IEEE TKDE, high-confidence and black-box evasion attacks to show the vulnerability of nonlinear classifiers
2014: Srndic & Laskov, IEEE S&P, show the vulnerability of nonlinear classifiers with our ECML ’13 gradient-based attack
2014: Szegedy et al., ICLR, adversarial examples vs. DL
2016: Papernot et al., IEEE S&P, evasion attacks / adversarial examples
2017: Papernot et al., ASIACCS, black-box evasion attacks
2017: Carlini & Wagner, IEEE S&P, high-confidence evasion attacks
2017: Grosse et al., ESORICS, application to Android malware
2017: Demontis et al., IEEE TDSC, secure learning for Android malware

SLIDE 55

Timeline of Learning Security

146

(This slide repeats the timeline from the previous slide.)

SLIDE 56

Timeline of Learning Security

Adversarial ML

2004-2005: pioneering work. Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005. Main contributions:
  • minimum-distance evasion of linear classifiers
  • notion of adversary-aware classifiers

2006: Globerson & Roweis, ICML; 2009: Kolcz et al., CEAS; 2010: Biggio et al., IJMLC. Main contributions:
  • evasion attacks against linear classifiers in spam filtering

2006-2010: Barreno, Nelson, Rubinstein, Joseph, Tygar, The Security of Machine Learning (and references therein). Main contributions:
  • first consolidated view of the adversarial ML problem
  • attack taxonomy
  • exemplary attacks against some learning algorithms

2013: Srndic & Laskov, NDSS. Main contributions:
  • evasion of linear PDF malware detectors
  • claims nonlinear classifiers can be more secure

2013: Biggio et al., ECML-PKDD. Demonstrated the vulnerability of nonlinear algorithms to gradient-based evasion attacks, also under limited knowledge. Main contributions:
  • 1. gradient-based adversarial perturbations (against SVMs and neural nets)
  • 2. projected gradient descent / iterative attack (also on discrete features from malware data); transfer attack with surrogate/substitute model
  • 3. maximum-confidence evasion (rather than minimum-distance evasion)

2014: Biggio et al., IEEE TKDE. Main contributions:
  • framework for security evaluation of learning algorithms
  • attacker's model in terms of goal, knowledge, capability

2014: Srndic & Laskov, IEEE S&P. Used Biggio et al.'s ECML-PKDD ’13 gradient-based evasion attack to demonstrate the vulnerability of nonlinear PDF malware detectors

2014: Szegedy et al., ICLR. Independent discovery of (gradient-based) minimum-distance adversarial examples against deep nets; earlier implementation of adversarial training

Security of DNNs

2015: Goodfellow et al., ICLR. Maximin formulation of adversarial training, with adversarial examples generated iteratively in the inner loop

2016: Papernot et al., IEEE S&P. Framework for security evaluation of deep nets

2016: Papernot et al., Euro S&P. Distillation defense (gradient masking)

2016: Kurakin et al. Basic iterative attack with projected gradient to generate adversarial examples

2017: Papernot et al., ASIACCS. Black-box evasion attacks with substitute models (breaks distillation with transfer attacks on a smoother surrogate classifier)

2017: Carlini & Wagner, IEEE S&P. Breaks distillation again with maximum-confidence evasion attacks (rather than using minimum-distance adversarial examples)

2017: Grosse et al., ESORICS. Adversarial examples for malware detection

2017: Demontis et al., IEEE TDSC. Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection. Main contributions:
  • secure SVM against adversarial examples in malware detection

2018: Madry et al., ICLR. Improves the basic iterative attack from Kurakin et al. by adding noise before running the attack; first successful use of adversarial training to generalize across many attack algorithms

Legend: work on security evaluation of learning algorithms; work on evasion attacks (a.k.a. adversarial examples); pioneering work on adversarial machine learning; ... in malware detection (PDF / Android)

Biggio and Roli, Wild Patterns: Ten Years After The Rise of Adversarial Machine Learning, Pattern Recognition, 2018

147

SLIDE 57

Black Swans to the Fore

148

[Szegedy et al., Intriguing properties of neural networks, 2014]

After this “black swan”, the issue of the security of DNNs came to the fore… and not only in specialized scientific journals…

SLIDE 58

The Safety Issue to the Fore…

149

The black box of AI

  • D. Castelvecchi, Nature, Vol. 538, p. 20, Oct. 2016

Machine learning is becoming ubiquitous in basic research as well as in industry. But for scientists to trust it, they first need to understand what the machines are doing. Ellie Dobson, director of data science at the big-data firm Arundo Analytics in Oslo: if something were to go wrong as a result of setting the UK interest rates, she says, “the Bank of England can’t say, ‘the black box made me do it’”.

SLIDE 59

Why So Much Interest?

150

Before the deep net “revolution”, people were not surprised when machine learning was wrong; they were rather amazed when it worked well… Now that it seems to work for real applications, people are disappointed, and worried, by errors that humans would not make…

SLIDE 60

Errors of Humans and Machines…

151

Machine learning decisions are affected by several sources of bias that cause “strange” errors. But we should keep in mind that humans are biased too…

SLIDE 61

The Bat and the Ball Problem

152

A bat and a ball together cost $1.10. The bat costs $1.00 more than the ball. How much does the ball cost? Please give me the first answer that comes to your mind!

SLIDE 62

The Bat and the Ball Problem

153

The exact solution is $0.05 (5 cents). The wrong solution ($0.10) is due to attribute substitution, a psychological process thought to underlie a number of cognitive biases. It occurs when an individual has to make a judgment (of a target attribute) that is computationally complex, and instead substitutes a more easily calculated heuristic attribute.

$$\begin{cases} \text{bat} + \text{ball} = \$1.10 \\ \text{bat} = \text{ball} + \$1.00 \end{cases}$$
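Substituting the second equation into the first gives the worked solution (consistent with the $0.05 answer stated above):

$$(\text{ball} + 1.00) + \text{ball} = 1.10 \;\Rightarrow\; 2\,\text{ball} = 0.10 \;\Rightarrow\; \text{ball} = \$0.05, \quad \text{bat} = \$1.05$$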

SLIDE 63

Trust in Humans or Machines?

Algorithms are biased, but humans are as well… When should you trust humans, and when should you trust algorithms?

154

SLIDE 64

Learning Comes at a Price!

155

The introduction of novel learning functionalities increases the attack surface of computer systems and produces new vulnerabilities. The safety of machine learning will be more and more important in future computer systems, as will accountability, transparency, and the protection of fundamental human values and rights.

SLIDE 65

Thanks for Listening!

Any questions?

156

Engineering isn't about perfect solutions; it's about doing the best you can with limited resources (Randy Pausch, 1960-2008)