Recent Trends in Adversarial Machine Learning
Thanks to Ian Goodfellow, Somesh Jha, Patrick McDaniel, and Nicolas Papernot for some slides December 4, 2018 Berkay Celik
How it works: training
Learning: find a classifier function (e.g., a deep network or a decision tree) that minimizes a cost/loss (~model error) over the training data.
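This training loop can be sketched minimally in numpy. The data and the logistic-regression "model" below are purely illustrative: gradient descent repeatedly adjusts the parameters to reduce the loss.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy training data (illustrative): two 1-D blobs, one per class
X = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = 0.0, 0.0   # model parameters (the theta the learning algorithm fits)
lr = 0.1          # learning rate
for _ in range(200):
    p = 1 / (1 + np.exp(-(w * X + b)))  # predicted P(class = 1)
    # gradient of the average logistic loss w.r.t. each parameter
    w -= lr * np.mean((p - y) * X)
    b -= lr * np.mean(p - y)
```

After the loop, `w` and `b` define the learned classifier — training is exactly this search for parameters that minimize the loss.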
Inference time: the trained classifier decides which "class" is most like the input sample, outputting one probability per class, e.g. [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01].
[Figure: a deep neural network — an input layer of M components, hidden layers (e.g., convolutional, rectified linear, …), and an output layer of N class probabilities (p0 = 0.01, p1 = 0.93, p8 = 0.02, pN = 0.01); each weighted link between neurons is a parameter in θ.]
I.I.D.: Independent and Identically Distributed — all train and test examples are drawn independently from the same distribution.
Deep learning now succeeds at solving CAPTCHAs and reading addresses (Goodfellow et al., 2013), and at recognizing objects (Szegedy et al., 2014) and faces (Taigman et al., 2013).
Caveats: humans are not very good at some parts of the benchmark, and the test data is not very diverse — models can still be fooled by natural but unusual data (a "test set attack").
Physical-world attacks: perturbed stop signs fool classifiers (Eykholt et al., 2017).
School bus + perturbation (rescaled for visualization) = "ostrich" — "adversarial examples" (Szegedy et al., 2013).
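One standard way such perturbations are crafted is the fast gradient sign method (FGSM) of Goodfellow et al. A minimal numpy sketch against a toy linear classifier — the model, weights, and input here are all illustrative stand-ins:

```python
import numpy as np

def fgsm(x, grad, eps):
    """One FGSM step: move each pixel by eps in the direction that
    increases the loss, then clip back to the valid pixel range."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

rng = np.random.default_rng(1)
w = rng.normal(size=64)   # toy linear "classifier": class 1 iff w @ x > 0
x = rng.random(64)        # a "clean" input with pixels in [0, 1]
# for this linear model, the loss on true class 1 grows as w @ x shrinks,
# so the loss gradient w.r.t. the input is proportional to -w
x_adv = fgsm(x, -w, eps=0.25)
```

Each pixel moves by at most eps, so the perturbation is small in the L∞ sense, yet the model's score for the true class drops.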
Each stage of the ML pipeline is an attack surface: the training data (training-set poisoning, recovery of sensitive training data), the learning algorithm and learned parameters (model theft), and the test input/output (adversarial examples).
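As one concrete instance of this attack surface, training-set poisoning can be sketched with a toy nearest-centroid classifier (all data illustrative): flipping a few training labels drags the learned parameters, and with them the decision boundary.

```python
import numpy as np

# tiny 1-D training set: class 0 near -2, class 1 near +2 (illustrative)
X = np.array([-2.5, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 2.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def fit_centroids(X, y):
    # "learning" here is just averaging each class's examples
    return X[y == 0].mean(), X[y == 1].mean()

def predict(x, c0, c1):
    # assign the input to the nearer centroid
    return int(abs(x - c1) < abs(x - c0))

c0, c1 = fit_centroids(X, y)           # clean model: centroids -1.75, 1.75

y_poisoned = y.copy()
y_poisoned[0:2] = 1                    # attacker flips two class-0 labels
p0, p1 = fit_centroids(X, y_poisoned)  # class-1 centroid dragged toward -2
```

The poisoned model now assigns inputs near the clean boundary (e.g. x = -0.2) to class 1, even though the clean model does not.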
(Goodfellow et al 2017)
Experiments excluding MNIST 1s, many of which look like 7s:

|      | Diff |      |      |      | Pair |      |      |      |
|      | L0   | L1   | L2   | L∞   | L0   | L1   | L2   | L∞   |
|      | 63   | 91   | 110  | 121  | 35.0 | 19.9 | 21.7 | 34.0 |
|      | 4.86 | 3.21 | 2.83 | .76  | 3.82 | 1.0  | .996 | 1.0  |
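The L0, L1, L2, and L∞ norms in the table measure different notions of perturbation size: pixels changed, total change, Euclidean size, and largest single-pixel change. A small numpy sketch with hypothetical pixel values:

```python
import numpy as np

x = np.array([0.0, 0.2, 0.5, 1.0])       # clean input (hypothetical pixels)
x_adv = np.array([0.0, 0.2, 0.9, 0.7])   # perturbed input
d = x_adv - x                            # the perturbation

l0 = int(np.count_nonzero(d))            # L0: number of pixels changed
l1 = float(np.abs(d).sum())              # L1: total absolute change
l2 = float(np.sqrt((d ** 2).sum()))      # L2: Euclidean size
linf = float(np.abs(d).max())            # L∞: largest per-pixel change
```

Here 2 pixels change, by 0.4 and 0.3, so L0 = 2, L1 = 0.7, L2 = 0.5, L∞ = 0.4; an attacker bounded in one norm can look large in another.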
Why do most defenses fail? Each proposed defense tends to fall into one of these failure modes:
- No effect on adversarial examples (advx)
- Reduces advx, but reduces clean accuracy too much
- Does not affect an adaptive attacker
- Does not generalize over attack algorithms
- Seems to generalize, but it's an illusion
- Does not generalize over threat models

Defenses judged against this taxonomy include: dropout at train time, weight decay, cropping / foveal mechanisms, adversarial training with a weak attack, defensive distillation, adversarial training with a strong attack, and current certified / provable defenses.
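Adversarial training, the last family above, can be sketched as follows. Everything here is illustrative — toy 2-D data, a logistic-regression model, and an FGSM-style inner attack standing in for a stronger iterative one: each step crafts adversarial versions of the training inputs against the current parameters and trains on those instead of the clean inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 2-D binary task (illustrative data)
X = np.concatenate([rng.normal(-1.5, 1, (200, 2)),
                    rng.normal(1.5, 1, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = np.zeros(2), 0.0
lr, eps = 0.1, 0.3   # learning rate; L-infinity attack budget
for _ in range(100):
    # inner attack: FGSM-style step against the current parameters
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad_x = (p - y)[:, None] * w        # d(logistic loss)/d(input), per example
    X_adv = X + eps * np.sign(grad_x)
    # outer step: descend the loss on the adversarial inputs
    p_adv = 1 / (1 + np.exp(-(X_adv @ w + b)))
    w -= lr * ((p_adv - y)[:, None] * X_adv).mean(axis=0)
    b -= lr * (p_adv - y).mean()
```

The resulting model keeps high clean accuracy while also classifying the eps-perturbed inputs correctly — which is exactly the robustness goal, though only against the threat model the inner attack covers.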
Accuracy and robustness are independent goals: for two models with the same error volume, for reasons of security we prefer the one whose errors are harder for an adversary to find.
@ZBerkayCelik https://beerkay.github.io