ADVERSARIAL EXAMPLES (In 15 minutes or less) Neill Patterson, MscAC - - PowerPoint PPT Presentation



slide-1
SLIDE 1

ADVERSARIAL EXAMPLES

(In 15 minutes or less)

Neill Patterson, MscAC

slide-2
SLIDE 2

PART I - BASIC CONCEPTS

slide-3
SLIDE 3

WE TRAIN MODELS BY TAKING GRADIENTS W.R.T. WEIGHTS

$w \leftarrow w - \eta \nabla_w J$

slide-4
SLIDE 4

“Panda”

Change weights via gradient descent
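The weight update above can be sketched in a few lines of plain Python. This is only a toy illustration, not the model from the talk: it assumes a logistic-regression classifier with cross-entropy cost J, and the two-pixel "image", its label, the starting weights and the learning rate `eta` are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Cross-entropy cost J for one example; y is 0 or 1."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_w(w, x, y):
    """Gradient of J with respect to the weights: (p - y) * x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def gradient_step(w, x, y, eta):
    """One update w <- w - eta * grad_w J. The input x is left untouched."""
    g = grad_w(w, x, y)
    return [wi - eta * gi for wi, gi in zip(w, g)]

w = [0.0, 0.0]           # hypothetical starting weights
x, y = [1.0, 2.0], 1     # hypothetical 2-pixel "image" with label 1
loss_before = loss(w, x, y)
for _ in range(20):
    w = gradient_step(w, x, y, eta=0.1)
loss_after = loss(w, x, y)   # lower than loss_before
```

Only the weights change here; the image stays fixed. The rest of the talk swaps those two roles.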

slide-5
SLIDE 5

WE’RE GOING TO TAKE GRADIENTS W.R.T. PIXELS INSTEAD

$x \leftarrow x \pm \eta \nabla_x J$

slide-6
SLIDE 6

WE ARE GOING TO TAKE GRADIENTS W.R.T. PIXELS INSTEAD

$x \leftarrow x \pm \eta \nabla_x J$

slide-7
SLIDE 7

“Panda” “Vulture”

Change pixels via gradient descent
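As a rough sketch of the "Panda" to "Vulture" idea, the example below holds the weights fixed and updates the pixels instead. It assumes a toy logistic-regression model in which class 1 stands for "panda" and class 0 for "vulture"; the weights, pixel values, step size `eta` and iteration count are hypothetical. The minus sign is used because we descend the cost of the target label; a plus sign on the cost of the true label would push away from it instead. Pixel values are not clipped to a valid range here, which a real attack would need to handle.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_panda(w, x):
    """Model's probability that x is a "panda" (class 1 in this toy setup)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def grad_x(w, x, y):
    """Gradient of the cross-entropy cost J with respect to the pixels: (p - y) * w."""
    p = p_panda(w, x)
    return [(p - y) * wi for wi in w]

w = [2.0, -1.0, 0.5]     # hypothetical trained weights, held fixed
x = [0.8, 0.1, 0.4]      # hypothetical 3-pixel image the model calls "panda"
target = 0               # label we want the model to output ("vulture")
eta = 0.1                # hypothetical step size

x_adv = list(x)
for _ in range(50):
    g = grad_x(w, x_adv, target)
    # x <- x - eta * grad_x J(target): descend the cost of the target label
    x_adv = [xi - eta * gi for xi, gi in zip(x_adv, g)]

p_before = p_panda(w, x)       # above 0.5: classified as "panda"
p_after = p_panda(w, x_adv)    # below 0.5: now classified as "vulture"
```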

slide-8
SLIDE 8

KEY IDEA: ADD SMALL, WORST-CASE PIXEL DISTORTION TO CAUSE MISCLASSIFICATIONS

slide-9
SLIDE 9

“Panda” (58% confidence) + distortion = “Gibbon” (99% confidence)

slide-10
SLIDE 10

THINK OF ADVERSARIAL EXAMPLES AS WORST-CASE DOPPELGÄNGERS

slide-11
SLIDE 11

DEMO

slide-12
SLIDE 12

Sanja Fidler Fiddler Crab

slide-13
SLIDE 13

PART II - HARNESSING ADVERSARIAL EXAMPLES

slide-14
SLIDE 14

KEY IDEA: MAKE TRAINING MORE DIFFICULT TO GET STRONGER MODELS

(DROPOUT, RANDOM NOISE, ETC)

slide-15
SLIDE 15

TRAIN WITH ADVERSARIAL EXAMPLES FOR BETTER GENERALIZATION

slide-16
SLIDE 16

THE FAST GRADIENT SIGN METHOD OF IAN GOODFELLOW

slide-17
SLIDE 17

QUICKLY GENERATING ADVERSARIAL EXAMPLES

slide-18
SLIDE 18

IN WHICH DIRECTION SHOULD YOU MOVE?

slide-19
SLIDE 19

INSTEAD OF MOVING TOWARDS A SPECIFIC TYPE OF ERROR, MOVE AWAY FROM THE CORRECT LABEL

slide-20
SLIDE 20

“Panda”

“Vulture” “House” “Truck”

slide-21
SLIDE 21

HOW BIG A STEP SHOULD YOU TAKE IF YOU WANT IMPERCEPTIBLE DISTORTION?

slide-22
SLIDE 22

PIXELS ARE STORED AS SIGNED 8-BIT INTEGERS. ADD JUST LESS THAN 1 BIT OF DISTORTION TO EACH PIXEL

$0.007 < \frac{1}{2^7} \approx 0.008$

slide-23
SLIDE 23

WE WANT PRECISELY THIS AMOUNT OF DISTORTION, SO NO MATTER HOW SMALL (OR BIG) THE GRADIENT, JUST TAKE THE SIGN OF IT AND MULTIPLY BY 0.007

$x + 0.007 \times \mathrm{sign}(\nabla_x J)$
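A minimal sketch of the fast gradient sign step follows, again assuming a toy logistic-regression model rather than the network used in the talk. The weights, pixels and label are hypothetical, and the step size `eps` is passed as a parameter. The example value is chosen to sit just under one step of a signed 8-bit pixel scaled to the range -1 to 1, that is 1 / 2**7 ≈ 0.0078. Because the gradient is taken for the cost of the correct label, the step moves away from that label rather than towards any particular wrong one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Cross-entropy cost J for one example; y is 0 or 1."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sign(v):
    """Return -1, 0 or +1."""
    return (v > 0) - (v < 0)

def fgsm(w, x, y, eps):
    """Return x + eps * sign(grad_x J) for the toy logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    grad = [(p - y) * wi for wi in w]          # grad_x J for this model
    # Only the sign is used, so every pixel moves by exactly eps
    # no matter how small or large the gradient entry is.
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w = [2.0, -1.0, 0.5]     # hypothetical trained weights
x = [0.8, 0.1, 0.4]      # hypothetical pixels
y = 1                    # correct label
eps = 0.007              # just under 1 / 2**7

x_adv = fgsm(w, x, y, eps)
loss_clean = loss(w, x, y)
loss_adv = loss(w, x_adv, y)   # higher than loss_clean
```

One forward and one backward pass are enough, which is why the method is quick compared with running many small gradient steps on the pixels.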

slide-24
SLIDE 24

INCORPORATING ADVERSARIAL EXAMPLES INTO YOUR COST FUNCTION

slide-25
SLIDE 25

WE GENERATE ADVERSARIAL EXAMPLES AT EACH ITERATION OF TRAINING, BUT WE DON’T WANT TO KEEP THEM AROUND IN MEMORY FOREVER

slide-26
SLIDE 26

INSTEAD, MODIFY THE COST FUNCTION TO BE A COMBINATION OF ORIGINAL AND ADVERSARIAL INPUTS

slide-27
SLIDE 27

New cost function: $\tilde{J}(\theta, x, y)$

$\theta$: parameters, $x$: inputs, $y$: labels

slide-28
SLIDE 28

$\tilde{J}(\theta, x, y) = \underbrace{J(\theta, x, y)}_{\text{old cost function}} + \cdots$

slide-29
SLIDE 29

$\tilde{J}(\theta, x, y) = \underbrace{J(\theta, x, y)}_{\text{old cost function}} + J(\theta, \underbrace{x + \epsilon\,\mathrm{sign}(\nabla_x J)}_{\text{adversarial example}}, y)$

slide-30
SLIDE 30

$\tilde{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha) J(\theta, x + \epsilon\,\mathrm{sign}(\nabla_x J), y)$

$\alpha$, $(1 - \alpha)$: mixing components

slide-31
SLIDE 31

$\tilde{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha) J(\theta, x + \epsilon\,\mathrm{sign}(\nabla_x J), y)$

“Train with a mix of original and adversarial examples”
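The mixed cost can be sketched as a single function that builds the adversarial input on the fly and blends the two cost terms. As before, this assumes a toy logistic-regression model; the function name `mixed_cost`, the weights, the pixels and the values of `eps` and `alpha` are illustrative choices, not values taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Old cost function J: cross-entropy for one example, y is 0 or 1."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, x, y, eps):
    """Adversarial example x + eps * sign(grad_x J) for the toy model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def mixed_cost(w, x, y, eps, alpha):
    """New cost: alpha * J(w, x, y) + (1 - alpha) * J(w, x_adv, y)."""
    x_adv = fgsm(w, x, y, eps)   # built when needed, not stored
    return alpha * loss(w, x, y) + (1 - alpha) * loss(w, x_adv, y)

w = [2.0, -1.0, 0.5]     # hypothetical weights
x, y = [0.8, 0.1, 0.4], 1

clean_only = mixed_cost(w, x, y, eps=0.1, alpha=1.0)   # reduces to the old cost
half_half = mixed_cost(w, x, y, eps=0.1, alpha=0.5)    # equal mix of both terms
```

Setting `alpha` to 1 recovers ordinary training, and lowering it puts more weight on the adversarial term.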

slide-32
SLIDE 32

NOW DO S.G.D. ON THIS NEW COST FUNCTION, BY TAKING GRADIENTS W.R.T. WEIGHTS

$w \leftarrow w - \eta \nabla_w \tilde{J}$
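Putting the pieces together, a training loop on the mixed cost might look like the sketch below. It assumes a toy logistic-regression model and a hypothetical four-example dataset, and it visits the examples in a fixed order so the run is reproducible, where real S.G.D. would shuffle them. The adversarial input is treated as a constant when the weight gradient is taken; for this linear model that matches the full gradient, but for a deep network it is a simplifying assumption. The values of `eta`, `eps` and `alpha` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(w, x, y):
    p = sigmoid(dot(w, x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, x, y, eps):
    p = sigmoid(dot(w, x))
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def mean_mixed_cost(w, data, eps, alpha):
    """Average of alpha * J(x) + (1 - alpha) * J(x_adv) over the dataset."""
    total = 0.0
    for x, y in data:
        total += alpha * loss(w, x, y) + (1 - alpha) * loss(w, fgsm(w, x, y, eps), y)
    return total / len(data)

def sgd_step(w, x, y, eta, eps, alpha):
    """One update w <- w - eta * grad_w of the mixed cost for a single example."""
    x_adv = fgsm(w, x, y, eps)       # regenerated each step, never kept in memory
    p, p_adv = sigmoid(dot(w, x)), sigmoid(dot(w, x_adv))
    grad = [alpha * (p - y) * xi + (1 - alpha) * (p_adv - y) * xa
            for xi, xa in zip(x, x_adv)]
    return [wi - eta * gi for wi, gi in zip(w, grad)]

# Hypothetical, linearly separable toy data: (pixels, label)
data = [([1.0, 0.5], 1), ([-1.0, -0.3], 0), ([0.8, 1.0], 1), ([-0.6, -1.0], 0)]
w = [0.0, 0.0]
cost_before = mean_mixed_cost(w, data, eps=0.1, alpha=0.5)
for _ in range(100):                 # 100 passes over the data
    for x, y in data:
        w = sgd_step(w, x, y, eta=0.5, eps=0.1, alpha=0.5)
cost_after = mean_mixed_cost(w, data, eps=0.1, alpha=0.5)   # lower than cost_before
```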

slide-33
SLIDE 33

PART III - MISCELLANEOUS TIPS FOR TRAINING

slide-34
SLIDE 34

YOU NEED MORE MODEL CAPACITY

(ADVERSARIAL EXAMPLES DO NOT LIE ON THE MANIFOLD OF REALISTIC IMAGES)

slide-35
SLIDE 35

FOR EARLY STOPPING, BASE YOUR DECISION ON THE VALIDATION ERROR OF ADVERSARIAL EXAMPLES ONLY
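One way to apply this tip is sketched below: the stopping rule looks only at the validation error measured on adversarial examples and ignores the clean validation error. The function name `early_stopping_epoch`, the `patience` parameter and the error values are hypothetical and serve only to show the mechanics.

```python
def early_stopping_epoch(adv_val_errors, patience):
    """Return the epoch whose weights should be kept.

    adv_val_errors[i] is the validation error on adversarial examples
    after epoch i. Training stops once that error has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(adv_val_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break                     # no improvement for `patience` epochs
    return best_epoch

# Hypothetical adversarial validation errors, one per epoch
adv_errors = [0.90, 0.60, 0.40, 0.35, 0.37, 0.36, 0.38, 0.20]
chosen = early_stopping_epoch(adv_errors, patience=3)
```

With these numbers the rule settles on epoch 3 and stops at epoch 6, so the later value at epoch 7 is never reached. That is the usual trade-off of a patience-based rule.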

slide-36
SLIDE 36

RESULTS

slide-37
SLIDE 37

BETTER GENERALIZATION ABOVE AND BEYOND DROPOUT

0.94% error → 0.84% error (MNIST)

slide-39
SLIDE 39

RESISTANCE TO ADVERSARIAL EXAMPLES

Error rate on adversarial examples: 89.4% (with 97.6% confidence) → 17.9% after adversarial training

slide-40
SLIDE 40

MATHEMATICAL PROPERTIES OF ADVERSARIAL EXAMPLES

slide-41
SLIDE 41

MATHEMATICAL PROPERTIES OF ADVERSARIAL EXAMPLES

(Ain’t nobody got time for that)

slide-42
SLIDE 42

THANK YOU FOR YOUR TIME!