On evasion attacks against machine learning in practical settings - PowerPoint PPT Presentation


SLIDE 1

On evasion attacks against machine learning in practical settings

Lujo Bauer
Professor, Electrical & Computer Engineering + Computer Science
Director, Cyber Autonomy Research Center
Collaborators: Mahmood Sharif, Sruti Bhagavatula, Mike Reiter (UNC), …

SLIDE 2

Machine Learning Is Ubiquitous

  • Cancer diagnosis
  • Predicting weather
  • Self-driving cars
  • Surveillance and access control

SLIDE 3

What Do You See?

[Figure: a deep neural network* classifies three images as Lion (p=0.99), Race car (p=0.74), and Traffic light (p=0.99)]

*CNN-F, proposed by Chatfield et al., “Return of the Devil”, BMVC ‘14
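
To make the setup concrete, here is a minimal Python sketch of top-1 classification with a pretrained CNN. Torchvision does not ship CNN-F, so VGG16 stands in purely for illustration, and "lion.jpg" is a hypothetical input file.

import torch
from torchvision import models, transforms
from PIL import Image

weights = models.VGG16_Weights.IMAGENET1K_V1
model = models.vgg16(weights=weights).eval()
labels = weights.meta["categories"]

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("lion.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)
p, cls = probs.max(dim=1)
print(labels[cls.item()], f"(p={p.item():.2f})")   # e.g., lion (p=0.99)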

SLIDE 4

What Do You See Now?

[Figure: the same DNN classifies slightly perturbed versions of the images as Pelican (p=0.85), Speedboat (p=0.92), and Jeans (p=0.89)]

*The attacks were generated following the method proposed by Szegedy et al.
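
A rough sketch of how such perturbations can be computed. Szegedy et al. used box-constrained L-BFGS; the simplified version below instead takes iterative signed-gradient steps toward a chosen target class under an L-infinity budget, assuming inputs in [0, 1].

import torch
import torch.nn.functional as F

def targeted_attack(model, x, target, eps=0.03, alpha=0.005, steps=40):
    """Return x' with ||x' - x||_inf <= eps that the model labels `target`."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)   # low loss = classified as target
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()   # step toward the target class
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # stay within the perturbation budget
        x_adv = x_adv.clamp(0.0, 1.0)                  # stay a valid image
    return x_adv.detach()

# hypothetical usage: x_adv = targeted_attack(model, img, torch.tensor([target_class_id]))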

SLIDE 5

The Difference

[Figure: the difference between each adversarial image and its original is a barely perceptible perturbation, shown amplified for visibility]
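
For illustration, a tiny sketch of how the perturbation might be computed and amplified for display; the amplification factor is arbitrary and purely cosmetic.

import numpy as np

def amplified_difference(original, adversarial, factor=10):
    """original, adversarial: float arrays in [0, 1]. Returns a displayable image."""
    diff = adversarial - original        # per-pixel perturbation, close to zero everywhere
    amplified = 0.5 + factor * diff      # center at mid-gray and scale up for visibility
    return np.clip(amplified, 0.0, 1.0)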

SLIDE 6

Is This an Attack?

[Figure: the same original images, amplified perturbations, and perturbed images as on the previous slide]

SLIDE 7

Can an Attacker Fool ML Classifiers?

  • What is the attack scenario?  Fooling face recognition (e.g., for surveillance, access control)
  • Does the scenario have constraints?
  • On how the attacker can manipulate the input?  Can change physical objects, but only in a limited way; can’t control camera position or lighting
  • On what the changed input can look like?  The defender / beholder doesn’t notice the attack (to be measured by user study)

[Sharif, Bhagavatula, Bauer, Reiter CCS ’16, arXiv ’17, TOPS ’19]

SLIDE 8

Step #1: Generate Realistic Eyeglasses

[Figure: a generator maps random codes in [0..1] to eyeglass images; a discriminator, trained on real eyeglasses, labels images real / fake]
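
A minimal PyTorch sketch of this step, assuming placeholder generator/discriminator architectures, toy image sizes, and a hypothetical real_glasses_loader of real eyeglass images; the architectures in the paper differ.

import torch
import torch.nn as nn

latent_dim = 25
img_dim = 3 * 64 * 176                 # toy size for a flattened eyeglasses image

G = nn.Sequential(                     # generator: random code -> eyeglasses image
    nn.Linear(latent_dim, 512), nn.ReLU(),
    nn.Linear(512, img_dim), nn.Tanh(),
)
D = nn.Sequential(                     # discriminator: image -> real/fake logit
    nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1),
)

bce = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

for real in real_glasses_loader:           # hypothetical loader of real eyeglass images
    real = real.view(real.size(0), -1)     # flattened to (batch, img_dim)
    b = real.size(0)
    z = torch.rand(b, latent_dim)          # codes drawn uniformly from [0, 1]
    fake = G(z)

    # Discriminator step: push real images toward "real" (1), generated ones toward "fake" (0)
    d_loss = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: make the discriminator label generated eyeglasses as real
    g_loss = bce(D(fake), torch.ones(b, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()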

SLIDE 9

Step #2: Generate Realistic Adversarial Eyeglasses

[Figure: the generator and discriminator from Step #1, now used to produce adversarial eyeglasses]

SLIDE 10

Step #2: Generate Realistic Adversarial Eyeglasses

[Figure: eyeglasses produced by the generator are overlaid on a face image and fed to a face recognizer, which outputs an identity: Russell Crowe / Owen Wilson / Lujo Bauer / …]
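
A hedged sketch of what one generator update in this step might look like: the eyeglasses must keep fooling the discriminator (realism) while, overlaid on the attacker's photos, steering the face recognizer toward a target identity. overlay_glasses, attacker_faces, and the weight kappa are hypothetical placeholders, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def adversarial_generator_step(G, D, face_recognizer, attacker_faces,
                               target_class, opt_G, kappa=1.0,
                               batch=32, latent_dim=25):
    z = torch.rand(batch, latent_dim)
    glasses = G(z)

    # (a) realism: the discriminator should still consider the eyeglasses real
    realism_loss = F.binary_cross_entropy_with_logits(
        D(glasses), torch.ones(batch, 1))

    # (b) attack: faces wearing the eyeglasses should be recognized as the target
    faces = overlay_glasses(attacker_faces, glasses)    # hypothetical helper
    logits = face_recognizer(faces)
    target = torch.full((faces.size(0),), target_class)
    attack_loss = F.cross_entropy(logits, target)
    # (for dodging, one would instead maximize the loss on the attacker's true class)

    loss = realism_loss + kappa * attack_loss
    opt_G.zero_grad(); loss.backward(); opt_G.step()
    return loss.item()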

SLIDE 11

Ariel

SLIDE 12

Are Adversarial Eyeglasses Inconspicuous?

[Figure: user-study participants judge many pairs of eyeglasses as real or fake]

SLIDE 13

Are Adversarial Eyeglasses Inconspicuous?

[Chart: fraction of time eyeglasses were selected as real, for real eyeglasses, adversarial eyeglasses (physically realized), and adversarial eyeglasses (digital)]

The most realistic 10% of physically realized eyeglasses are more realistic than the average real eyeglasses.

SLIDE 14

Can an Attacker Fool ML Classifiers? (Attempt #2)

  • What is the attack scenario?  Fooling face recognition (e.g., for surveillance, access control)
  • Does the scenario have constraints?
  • On how the attacker can manipulate the input?  Can change physical objects in a limited way; can’t control camera position or lighting
  • On what the changed input can look like?  The defender / beholder doesn’t notice the attack (to be measured by user study)

SLIDE 15

Considering Camera Position, Lighting

  • Used an algorithm to measure head pose (pitch, roll, yaw)
  • Mixed-effects logistic regression (see the sketch below):
  • Each 1° of yaw = 0.94x attack success rate
  • Each 1° of pitch = 0.94x (VGG) or 1.12x (OpenFace) attack success rate
  • Varied luminance (added a 150W incandescent light at 45°; 5 luminance levels):
  • Not included in training → 50% degradation in attack success
  • Included in training → no degradation in attack success
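
For intuition, a sketch of where a per-degree multiplier comes from: the exponentiated coefficient of a logistic regression gives the per-degree change in the odds of attack success. The paper uses a mixed-effects model; plain statsmodels logistic regression is a simplified stand-in here, and yaw, pitch, and success are hypothetical arrays.

import numpy as np
import statsmodels.api as sm

# yaw, pitch: head-pose angles in degrees; success: 1 if the attack worked, else 0
X = sm.add_constant(np.column_stack([yaw, pitch]))
result = sm.Logit(success, X).fit()

# exponentiated coefficients = per-degree multipliers on the odds of attack success
print(np.exp(result.params))    # e.g., roughly 0.94 per degree of yaw
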
SLIDE 16

What If Defenses Are in Place?

  • Already: augmentation to make face recognition more robust to eyeglasses
  • New: train an attack detector (Metzen et al. 2017)
  • Detector achieves 100% recall and 100% precision
  • Attack must now fool both the original DNN and the detector (see the sketch below)

Result (digital environment): attack success unchanged, with minor impact on conspicuousness
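
A sketch of how the attack objective might be extended to also evade the detector; recognizer, detector, and the weight lam are placeholders, and the detector is assumed to output a logit where positive means "adversarial".

import torch
import torch.nn.functional as F

def joint_attack_loss(recognizer, detector, x_adv, target_class, lam=1.0):
    # fool the face recognizer into the target class...
    cls_loss = F.cross_entropy(
        recognizer(x_adv),
        torch.full((x_adv.size(0),), target_class))
    # ...while keeping the detector's "is adversarial" logit low
    det_logit = detector(x_adv)
    det_loss = F.binary_cross_entropy_with_logits(
        det_logit, torch.zeros_like(det_logit))
    return cls_loss + lam * det_loss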

SLIDE 17

Can an Attacker Fool ML Classifiers? (Attempt #2)

  • What is the attack scenario?  Fooling face recognition (e.g., for surveillance, access control)
  • Does the scenario have constraints?
  • On how the attacker can manipulate the input?  Can change physical objects in a limited way; can’t control camera position or lighting
  • On what the changed input can look like?  The defender / beholder doesn’t notice the attack (to be measured by user study)

SLIDE 18

Other Attack Scenarios?

Dodging: can one pair of eyeglasses serve many attackers? Change to the training process: instead of training with multiple images of one user, train with multiple images of many users (a sketch follows below). Then create multiple pairs of eyeglasses and test them on a large population.
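
A rough sketch of one training step for such "universal" dodging eyeglasses, under the assumption that a single generated pair is overlaid on images of many users and optimized to push each away from their true identity; overlay_glasses and the per-user batches are hypothetical, and the realism term from earlier is omitted for brevity.

import torch
import torch.nn.functional as F

def universal_dodging_step(G, face_recognizer, user_batches, opt_G, latent_dim=25):
    z = torch.rand(1, latent_dim)
    glasses = G(z)                           # one shared pair of eyeglasses
    loss = torch.zeros(())
    for faces, labels in user_batches:       # images and true identities of many users
        logits = face_recognizer(overlay_glasses(faces, glasses))   # hypothetical helper
        loss = loss - F.cross_entropy(logits, labels)   # dodging: move away from true identity
    opt_G.zero_grad(); loss.backward(); opt_G.step()
    return loss.item()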

SLIDE 19

Other Attack Scenarios?

Dodging: can one pair of eyeglasses serve many attackers?

[Chart: success rate (VGG143) vs. number of subjects trained on, for different numbers of eyeglasses used for dodging]

With 1 pair of eyeglasses, 50+% of the population avoids recognition; with 5 pairs of eyeglasses, 85+% of the population avoids recognition.

SLIDE 20

Other Attack Scenarios?

Stop sign → speed limit sign [Eykholt et al., arXiv ‘18]

SLIDE 21

Other Attack Scenarios?

  • Stop sign → speed limit sign [Eykholt et al., arXiv ’18]
  • Hidden voice commands [Carlini et al., ’16-’19]: noise → “OK, Google, browse to evil dot com”
  • Malware classification [Suciu et al., arXiv ’18]: malware → “benign”

SLIDE 22

Can an attacker fool ML classifiers?

Face recognition

Attacker goal: evade surveillance, fool access-control mechanism
Input: image of a face
Constraints:
  • Can’t precisely control camera angle, lighting, pose, …
  • Attack must be inconspicuous

Malware detection

Attacker goal: bypass the malware detection system
Input: malware binary
Constraints:
  • Must remain functional malware
  • Changes to the binary must not be easy to remove

Very different constraints! The attack method does not carry over.

SLIDE 23

Hypothetical attack on malware detection

[Figure: a malware-detection DNN classifies the original binary as Malware (p=0.99) and a modified binary as Benign (p=0.99)]

  • 1. Must be functional malware
  • 2. Changes to binary must not be easy to remove
SLIDE 24

Attack building block: Binary diversification

  • Originally proposed to mitigate return-oriented programming [3,4]
  • Uses transformations that preserve functionality:

1. Substitution of equivalent instructions
2. Reordering of instructions
3. Register-preservation (push and pop) randomization
4. Reassignment of registers
5. Displacement of code to a new section
6. Addition of semantic nops

These fall into two families: in-place randomization (IPR) and displacement (Disp).

[3] Koo and Polychronakis, “Juggling the Gadgets.” AsiaCCS ’16.
[4] Pappas et al., “Smashing the Gadgets.” IEEE S&P ’12.

SLIDE 25

Example: Reordering instructions*

Original code:

mov eax, [ecx+0x10]
push ebx
mov ebx, [ecx+0xc]
cmp eax, ebx
mov [ecx+0x8], eax
jle 0x5c

Reordered code (one ordering permitted by the dependency graph):

push ebx
mov ebx, [ecx+0xc]
mov eax, [ecx+0x10]
mov [ecx+0x8], eax
cmp eax, ebx
jle 0x5c

*Example by Pappas et al.
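
A toy Python sketch of reordering under a dependency graph: any topological order of the graph is a valid reordering, so picking among the "ready" instructions at random yields a functionally equivalent permutation. The instruction list and dependency edges below are hand-written for this example; a real implementation derives them from register/memory def-use analysis of disassembled code.

import random

instructions = [
    "mov eax, [ecx+0x10]",   # 0
    "push ebx",              # 1
    "mov ebx, [ecx+0xc]",    # 2  (must follow 1: push saves ebx's old value)
    "cmp eax, ebx",          # 3  (needs 0 and 2)
    "mov [ecx+0x8], eax",    # 4  (needs 0)
    "jle 0x5c",              # 5  (needs flags from 3; kept last in the block)
]
# deps[i] = indices of instructions that must come before instruction i
deps = {0: [], 1: [], 2: [1], 3: [0, 2], 4: [0], 5: [3, 4]}

def random_topological_order(deps):
    remaining = dict(deps)
    order = []
    while remaining:
        ready = [i for i, pre in remaining.items()
                 if all(p not in remaining for p in pre)]
        pick = random.choice(ready)      # any "ready" instruction is a valid next choice
        order.append(pick)
        del remaining[pick]
    return order

for i in random_topological_order(deps):
    print(instructions[i])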

SLIDE 26

Transforming malware to evade detection

Input: malicious binary x (classified as malicious)
Desired output: malicious binary x’ that is misclassified as benign (by the detector / AV)

For each function h in binary x (a sketch of this loop follows below):
  • 1. Pick a transformation
  • 2. Apply the transformation to function h to create binary x’
  • 3. If x’ is “more benign” than x, continue with x’; otherwise revert to x
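
A minimal sketch of this greedy loop. disassemble_functions, random_transformation, apply_to_function, and malice_score (the detector's maliciousness estimate for a binary) are hypothetical helpers, not an actual API.

def evade(binary, detector, max_passes=100):
    x = binary
    for _ in range(max_passes):
        for func in disassemble_functions(x):            # hypothetical helper
            t = random_transformation()                   # 1. pick a transformation
            candidate = apply_to_function(x, func, t)     # 2. apply it to this function
            # 3. keep the change only if the binary now looks "more benign"
            if malice_score(detector, candidate) < malice_score(detector, x):
                x = candidate
        if malice_score(detector, x) < 0.5:               # detector says "benign": done
            return x
    return x
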
SLIDE 27

Transforming malware to evade detection

Experiment: 100 malicious binaries, 3 malware detectors (80-92% TPR)

Success rate (success = malicious binary classified as benign):

[Chart: % of binaries misclassified by Avast, Endgame, and MalConv, for Random, IPR+Disp-5, and Kreuk-5 attacks] Transformed malicious binaries are classified as benign ~100% of the time.

Success rate against 68 commercial antiviruses (black-box): up to ~50% of AVs classify the transformed malicious binary as benign.

SLIDE 28

Can an attacker fool ML classifiers? Yes

Face recognition

Attacker goal: evade surveillance, fool access-control mechanism
Input: image of a face
Constraints:
  • Can’t precisely control camera angle, lighting, pose, …
  • Attack must be inconspicuous

Malware detection

Attacker goal: bypass the malware detection system
Input: malware binary
Constraints:
  • Must remain functional malware
  • Changes to the binary must not be easy to remove

SLIDE 29

Some directions for defenses

  • Know when not to deploy ML algorithms
  • “Explainable AI” – help the defender understand the algorithm’s decision

Image courtesy of Matt Fredrikson

SLIDE 30

Some directions for defenses

  • Know when not to deploy ML algorithms
  • “Explainable” AI – help the defender understand the algorithm’s decision
  • Harder to apply when the input data isn’t easily interpretable by humans
  • “Provably robust / verified” ML – but slow, and works in only a few cases
  • Test-time inputs similar to training-time inputs should be classified the same
  • … but similarity metrics for vision don’t capture semantic attacks
  • … and in some domains similarity isn’t important for successful attacks
  • Ensembles, gradient obfuscation, … – help, but only up to a point
SLIDE 31

Fooling ML Classifiers: Summary

  • “Attacks” may not be meaningful until we fix context
  • E.g., for face recognition:
  • Attacker: physically realized (i.e., constrained) attack
  • Defender / observer: attack isn’t noticed as such
  • Even in a practical (constrained) context, real attacks exist
  • Relatively robust, inconspicuous; high success rates
  • Hard-to-formalize constraints can be captured by a DNN
  • We need better definitions for similarity and correctness

Lujo Bauer lbauer@cmu.edu