SLIDE 1

Recent Trends in Adversarial Machine Learning

Thanks to Ian Goodfellow, Somesh Jha, Patrick McDaniel, and Nicolas Papernot for some slides.

December 4, 2018
Berkay Celik

SLIDE 2

How it works … training …

Learning: find a classifier function that minimizes a cost/loss (≈ model error).

Training Data → Learning Algorithm → Model

(Deep Learning, Decision Trees, others …)
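A minimal sketch of this training loop in Python: gradient descent on a loss over the training data. The linear model and squared loss here are illustrative assumptions, not from the slides.

```python
import numpy as np

# "Learning" = finding parameters theta that minimize a loss over the
# training data. Toy linear model + mean squared error for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # training inputs
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta                      # training labels

theta = np.zeros(3)                     # model parameters
lr = 0.1                                # learning rate
for _ in range(200):
    grad = 2 * X.T @ (X @ theta - y) / len(X)  # gradient of the mean squared error
    theta -= lr * grad                  # gradient descent step

print(theta.round(2))                   # close to true_theta after training
```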

SLIDE 3

How it works … run-time …

Inference time: which "class" is most like the input sample?

Input sample → Machine Learning Classifier → [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]
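A minimal sketch of inference: the classifier emits one probability per class, and the prediction is the most likely class. The probability vector is the one shown on the slide.

```python
import numpy as np

# At inference time the classifier outputs one probability per class;
# the predicted class is simply the most likely one.
probs = np.array([0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01])
predicted_class = int(np.argmax(probs))
print(predicted_class, probs[predicted_class])  # -> 1 0.84
```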

SLIDE 4

An Example …

[Figure: a feedforward neural network. The input layer has M components; hidden layers (e.g., convolutional, rectified linear, …) lead to an output layer with N components. Each weighted link between neurons is a parameter in θ, and the output assigns a probability to each class, e.g., p0 = 0.01, p1 = 0.93, p8 = 0.02, …, pN = 0.01.]
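A minimal sketch of the network in the figure: M inputs, one rectified-linear hidden layer, and N softmax outputs. The layer sizes and random weights are illustrative assumptions; together they play the role of the parameters θ.

```python
import numpy as np

# M-input, N-output network with one hidden ReLU layer.
M, H, N = 4, 8, 10
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(M, H)), np.zeros(H)   # input -> hidden weights
W2, b2 = rng.normal(size=(H, N)), np.zeros(N)   # hidden -> output weights

def forward(x):
    h = np.maximum(0, x @ W1 + b1)              # rectified linear hidden layer
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                      # softmax: one probability per class

probs = forward(rng.normal(size=M))
print(probs.round(2))                           # N class probabilities summing to 1
```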

SLIDE 5

I.I.D. Machine Learning

I: Independent I: Identically D: Distributed

All train and test examples are drawn independently from the same distribution.

SLIDE 6

ML reached “human-level performance”

  • in many IID tasks circa 2013

…solving CAPTCHAs and reading addresses…

…recognizing objects and faces…

(Szegedy et al., 2014) (Goodfellow et al., 2013) (Taigman et al., 2013) (Goodfellow et al., 2013)

SLIDE 7

Caveats to “human-level” benchmarks

  • Humans are not very good at some parts of the benchmark
  • The test data is not very diverse; ML models are fooled by natural but unusual data

SLIDE 8

Security Requires Moving Beyond I.I.D.

  • Not identical: attackers can use unusual inputs
  • Not independent: attacker can repeatedly send a single mistake (a "test set attack")

(Eykholt et al., 2017)

SLIDE 9

Good models make surprising mistakes in non-IID setting

[Figure: schoolbus + perturbation (rescaled for visualization) = classified as "ostrich"]

"Adversarial examples" (Szegedy et al., 2013)

SLIDE 10

Adversarial Examples

SLIDE 11

Attacks on the machine learning pipeline

Training data → Learning algorithm → Learned parameters → Test input → Test output

Attacks along this pipeline: training set poisoning, model theft, adversarial examples, recovery of sensitive training data

SLIDE 12

Definition

“Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake”

(Goodfellow et al., 2017)

SLIDE 13

Threat Model

SLIDE 14

Fifty Shades of Gray Box Attacks

  • Does the attacker go first, and the defender reacts?
  • This is easy: just train on the attacks, or design some preprocessing to remove them
  • If the defender goes first:
  • Does the attacker have full knowledge? This is "white box"
  • Limited knowledge: "black box"
  • Does the attacker know the task the model is solving (input space, output space, defender cost)?
  • Does the attacker know the machine learning algorithm being used?
  • Details of the algorithm? (Neural net architecture, etc.)
  • Learned parameters of the model?
  • Can the attacker send "probes" to see how the defender processes different test inputs?
  • Does the attacker observe just the output class? Or also the probabilities?
SLIDE 15

Roadmap

  • WHITE-BOX ATTACKS
  • BLACK-BOX ATTACKS
  • TRANSFERABILITY
  • DEFENSE TECHNIQUES
SLIDE 16

White Box Attacks

SLIDE 17

FGSM (Misclassification)
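FGSM (the Fast Gradient Sign Method of Goodfellow et al., 2015) takes one step of size ε in the direction of the sign of the loss gradient with respect to the input: x_adv = x + ε · sign(∇x J(θ, x, y)). A minimal sketch, assuming a hypothetical grad_loss(x) helper that returns the model's input-gradient:

```python
import numpy as np

def fgsm(x, grad_loss, eps=0.1):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x J(theta, x, y)).

    `grad_loss(x)` is a hypothetical helper assumed to return the gradient
    of the model's loss with respect to the input x.
    """
    x_adv = x + eps * np.sign(grad_loss(x))
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in a valid [0, 1] range
```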

SLIDE 18

Intuition

SLIDE 19

JSMA (targeted)
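JSMA (the Jacobian-based Saliency Map Attack of Papernot et al., 2016) is a targeted attack: it computes a saliency map from the model's Jacobian and greedily perturbs the input features most likely to push the prediction toward the target class. A minimal sketch of the saliency computation, assuming the Jacobian is already available as a (classes × features) array:

```python
import numpy as np

def jsma_saliency(jacobian, target):
    """Saliency map used by JSMA, assuming `jacobian[c, i]` holds the
    model's dF_c/dx_i for class c and input feature i.

    A feature is salient for the target class when increasing it raises
    the target score (positive gradient) while lowering the combined
    score of all other classes (negative gradient sum).
    """
    target_grad = jacobian[target]                    # dF_target / dx_i
    others_grad = jacobian.sum(axis=0) - target_grad  # sum of dF_j / dx_i, j != target
    return np.where(
        (target_grad > 0) & (others_grad < 0),
        target_grad * np.abs(others_grad),
        0.0,
    )  # the attack perturbs the highest-scoring features, then repeats
```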

SLIDE 20

Carlini-Wagner (CW) (targeted)
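The Carlini-Wagner attack frames adversarial example generation as optimization: minimize ‖δ‖₂ + c · f(x + δ), where f is a margin loss on the logits that becomes negative once the target class wins. A minimal sketch of that margin loss, assuming 1-D logits; the full attack optimizes δ against this objective (e.g., with Adam):

```python
import numpy as np

def cw_margin_loss(logits, target, kappa=0.0):
    """Carlini-Wagner margin objective on the logits Z:
    f(x') = max(max_{i != target} Z_i - Z_target, -kappa).
    It is <= -kappa exactly when the target class wins by margin kappa;
    the full attack minimizes ||delta||_2 + c * f(x + delta) over delta.
    """
    other_best = np.delete(logits, target).max()
    return max(other_best - logits[target], -kappa)
```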

SLIDE 21

Success of an adversarial image

Experiments excluding MNIST 1s, many of which look like 7s

          Diff                          Pair
    L0     L1     L2     L∞       L0     L1     L2     L∞
    63     91     110    121      35.0   19.9   21.7   34.0
    4.86   3.21   2.83   .76      3.82   1.0    .996   1.0

SLIDE 22

Black-box Attacks

SLIDE 23

Black-box Attacks

SLIDE 24

Transferability

SLIDE 25

Roadmap

  • WHITE-BOX ATTACKS
  • BLACK-BOX ATTACKS
  • TRANSFERABILITY
  • DEFENSE TECHNIQUES
SLIDE 26

Pipeline of Defense Failures

  • No effect on advx
  • Reduces advx, but reduces clean accuracy too much
  • Does not affect adaptive attacker
  • Does not generalize over attack algos
  • Seems to generalize, but it's an illusion
  • Does not generalize over threat models

SLIDE 27

Pipeline of Defense Failures

Dropout at Train Time

SLIDE 28

Pipeline of Defense Failures

Weight Decay

SLIDE 29

Pipeline of Defense Failures

Cropping / fovea mechanisms

[Figure: original image vs. foveal crop]

SLIDE 30

Pipeline of Defense Failures

Adversarial Training with a Weak Attack

SLIDE 31

Pipeline of Defense Failures

Defensive Distillation

SLIDE 32

Pipeline of Defense Failures

Adversarial Training with a Strong Attack
Current Certified / Provable Defenses
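Adversarial training with a strong attack augments each training step with adversarial examples generated against the current model. A minimal self-contained sketch on a toy logistic-regression model with an FGSM inner attack; the data, model, and step sizes are illustrative assumptions, not from the slides:

```python
import numpy as np

# Toy 2-D, two-class data (illustrative assumption, not from the slides).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + np.array([1.5, 1.5]) * rng.integers(0, 2, 200)[:, None]
y = (X.sum(axis=1) > 1.5).astype(float)

w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.1
for _ in range(300):
    # Inner attack: FGSM against the *current* model parameters.
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad_x = (p - y)[:, None] * w            # d loss / d input for logistic loss
    X_adv = X + eps * np.sign(grad_x)
    # Outer step: train on the adversarial examples.
    p_adv = 1 / (1 + np.exp(-(X_adv @ w + b)))
    w -= lr * X_adv.T @ (p_adv - y) / len(y)
    b -= lr * (p_adv - y).mean()

print(w, b)  # parameters trained to resist the inner attack
```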

SLIDE 33

What's the next defense?

SLIDE 34

Future Directions

  • Common goal (AML and ML): just make the model better
  • They still share this goal
  • It is now clear security research must have some independent goals. For two models with the same error volume, for reasons of security we prefer:
  • The model with lower confidence on its mistakes
  • The model whose mistakes are harder to find
SLIDE 35

THANKS!

December 4, 2018 Berkay Celik

@ZBerkayCelik https://beerkay.github.io