SLIDE 1 ADVERSARIAL EXAMPLES
(In 15 minutes or less)
Neill Patterson, MScAC
SLIDE 2
PART I - BASIC CONCEPTS
SLIDE 3
WE TRAIN MODELS BY TAKING GRADIENTS W.R.T. THE WEIGHTS: w ← w − η∇_w J
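Sketched in code (illustrative only; the toy quadratic cost and learning rate are invented for this note, not taken from the deck):

```python
import numpy as np

def gd_step(w, grad_J, lr=0.1):
    """One gradient-descent step on the weights: w <- w - lr * dJ/dw."""
    return w - lr * grad_J(w)

# Toy cost J(w) = (w - 3)^2, with gradient dJ/dw = 2 * (w - 3)
grad = lambda w: 2 * (w - 3)
w = np.array([0.0])
for _ in range(100):
    w = gd_step(w, grad)
# w has converged close to the minimizer w = 3
```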
SLIDE 4 “Panda”
Change weights via gradient descent
SLIDE 5
WE’RE GOING TO TAKE GRADIENTS W.R.T. THE PIXELS INSTEAD: x ← x ± η∇_x J
SLIDE 7 “Panda” → “Vulture”
Change pixels via gradient descent
SLIDE 8
KEY IDEA: ADD SMALL, WORST- CASE PIXEL DISTORTION TO CAUSE MISCLASSIFICATIONS
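The same idea as a toy sketch (a single logistic unit with made-up weights stands in for a trained classifier; only the pixels x are updated, the weights W stay frozen):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, x, y):
    # Cross-entropy of one logistic unit; W is frozen, we only attack x.
    p = sigmoid(W @ x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def loss_grad_x(W, x, y):
    # dJ/dx for the logistic unit: (p - y) * W
    return (sigmoid(W @ x) - y) * W

rng = np.random.default_rng(0)
W = rng.normal(size=4)   # stand-in for trained weights
x = rng.normal(size=4)   # stand-in for the input image ("panda")
y = 1.0                  # its correct label

x_adv = x.copy()
for _ in range(50):
    # Gradient ASCENT on the pixels: nudge x so the loss goes UP.
    x_adv = x_adv + 0.05 * loss_grad_x(W, x_adv, y)
# loss(W, x_adv, y) is now larger than loss(W, x, y)
```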
SLIDE 9
[Figure: panda image + barely visible adversarial perturbation = visually identical image]
“Panda” (58% confidence) → “Gibbon” (99% confidence)
SLIDE 10
THINK OF ADVERSARIAL EXAMPLES AS WORST-CASE DOPPELGÄNGERS
SLIDE 11
DEMO
SLIDE 12
Sanja Fidler → “Fiddler Crab”
SLIDE 13
PART II - HARNESSING ADVERSARIAL EXAMPLES
SLIDE 14 KEY IDEA: MAKE TRAINING MORE DIFFICULT TO GET STRONGER MODELS
(DROPOUT, RANDOM NOISE, ETC)
SLIDE 15
TRAIN WITH ADVERSARIAL EXAMPLES FOR BETTER GENERALIZATION
SLIDE 16
THE FAST GRADIENT SIGN METHOD OF IAN GOODFELLOW
SLIDE 17
QUICKLY GENERATING ADVERSARIAL EXAMPLES
SLIDE 18
WHAT DIRECTION SHOULD YOU MOVE TOWARDS?
SLIDE 19
INSTEAD OF MOVING TOWARDS A SPECIFIC TYPE OF ERROR, MOVE AWAY FROM THE CORRECT LABEL
SLIDE 20 “Panda”
“Vulture” “House” “Truck”
SLIDE 21
HOW BIG A STEP SHOULD YOU TAKE IF YOU WANT IMPERCEPTIBLE DISTORTION?
SLIDE 22 PIXELS ARE STORED AS SIGNED 8-BIT INTEGERS. ADD JUST LESS THAN 1 BIT OF DISTORTION TO EACH PIXEL: 0.007 < 1/2⁷ ≈ 0.008
SLIDE 23
WE WANT PRECISELY THIS AMOUNT OF DISTORTION, SO NO MATTER HOW SMALL (OR BIG) THE GRADIENT IS, JUST TAKE ITS SIGN AND MULTIPLY BY 0.007
x ← x + 0.007 × sign(∇_x J)
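As code (a minimal sketch: `grad_x` is assumed to be the loss gradient w.r.t. the input, computed by whatever framework you use; pixels are assumed scaled to [−1, 1]; ε = 0.007, roughly one bit of an 8-bit encoding, follows Goodfellow's paper):

```python
import numpy as np

def fgsm(x, grad_x, eps=0.007):
    """Fast gradient sign method: x + eps * sign(dJ/dx), clipped back to
    the valid pixel range. Every pixel moves by exactly eps (or less, at
    the clipping boundary), no matter how large or small the gradient is."""
    return np.clip(x + eps * np.sign(grad_x), -1.0, 1.0)

x = np.array([0.5, -0.2, 0.999])
g = np.array([1e-9, -42.0, 3.0])  # tiny or huge gradients: same step size
x_adv = fgsm(x, g)                # [0.507, -0.207, 1.0]
```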
SLIDE 24
INCORPORATING ADVERSARIAL EXAMPLES INTO YOUR COST FUNCTION
SLIDE 25
GENERATE ADVERSARIAL EXAMPLES AT EACH ITERATION OF TRAINING, BUT WE DON’T WANT TO KEEP THEM AROUND IN MEMORY FOREVER
SLIDE 26
INSTEAD, MODIFY THE COST FUNCTION TO BE A COMBINATION OF ORIGINAL AND ADVERSARIAL INPUTS
SLIDE 27
New cost function (θ: parameters, x: inputs, y: labels)
J̃(θ, x, y) =
SLIDE 28
J̃(θ, x, y) = J(θ, x, y) + …
              (old cost function)
SLIDE 29
J̃(θ, x, y) = J(θ, x, y) + J(θ, x + ε·sign(∇_x J), y)
              (old cost function)   (adversarial example)
SLIDE 30
J̃(θ, x, y) = α·J(θ, x, y) + (1 − α)·J(θ, x + ε·sign(∇_x J), y)
α and (1 − α): mixing coefficients
SLIDE 31
J̃(θ, x, y) = α·J(θ, x, y) + (1 − α)·J(θ, x + ε·sign(∇_x J), y)
“Train with a mix of original and adversarial examples”
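A sketch of that mixed objective (the callables `J` and `grad_x`, and the toy quadratic loss below, are stand-ins for a real model's loss and input gradient, invented for this note):

```python
import numpy as np

def mixed_cost(J, grad_x, theta, x, y, eps=0.007, alpha=0.5):
    """J~(theta, x, y) = alpha * J(theta, x, y)
                       + (1 - alpha) * J(theta, x + eps * sign(dJ/dx), y).
    The adversarial example is generated fresh from the current theta,
    never stored between iterations."""
    x_adv = x + eps * np.sign(grad_x(theta, x, y))
    return alpha * J(theta, x, y) + (1 - alpha) * J(theta, x_adv, y)

# Toy loss J = sum((theta * x - y)^2) and its gradient w.r.t. x
J = lambda theta, x, y: np.sum((theta * x - y) ** 2)
grad_x = lambda theta, x, y: 2 * theta * (theta * x - y)

theta = np.array([2.0])
x, y = np.array([1.0]), np.array([0.0])
cost = mixed_cost(J, grad_x, theta, x, y)
```

Because the adversarial term is recomputed every step, memory use stays constant no matter how long you train.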
SLIDE 32
NOW DO S.G.D. ON THIS NEW COST FUNCTION, BY TAKING GRADIENTS W.R.T. THE WEIGHTS: w ← w − η∇_w J̃
SLIDE 33
PART III - MISCELLANEOUS TIPS FOR TRAINING
SLIDE 34 YOU NEED MORE MODEL CAPACITY
(ADVERSARIAL EXAMPLES DO NOT LIE ON THE MANIFOLD OF REALISTIC IMAGES)
SLIDE 35
FOR EARLY STOPPING, BASE YOUR DECISION ON THE VALIDATION ERROR OF ADVERSARIAL EXAMPLES ONLY
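One way to sketch that rule (hypothetical helper; the patience threshold is made up for illustration): track the validation error measured on adversarial examples only, and stop once it has gone several epochs without a new best.

```python
def should_stop(adv_val_errors, patience=3):
    """Stop when the validation error on ADVERSARIAL examples hasn't hit
    a new best for `patience` consecutive epochs."""
    best_epoch = adv_val_errors.index(min(adv_val_errors))
    return len(adv_val_errors) - 1 - best_epoch >= patience

# Adversarial validation error, one entry per epoch:
print(should_stop([0.50, 0.31, 0.32, 0.33, 0.35]))  # True: 3 epochs without improvement
print(should_stop([0.50, 0.31, 0.29]))              # False: still improving
```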
SLIDE 36
RESULTS
SLIDE 37
BETTER GENERALIZATION ABOVE AND BEYOND DROPOUT
0.94% test error (dropout alone, MNIST) → 0.84% test error (dropout + adversarial training)
SLIDE 39 RESISTANCE TO ADVERSARIAL EXAMPLES
Without adversarial training: 89.4% error on adversarial examples (97.6% average confidence on the misclassified inputs)
With adversarial training: 17.9% error
SLIDE 40
MATHEMATICAL PROPERTIES OF ADVERSARIAL EXAMPLES
SLIDE 41
MATHEMATICAL PROPERTIES OF ADVERSARIAL EXAMPLES
(Ain’t nobody got time for that)
SLIDE 42
THANK YOU FOR YOUR TIME!