Gradient Masking in Machine Learning
Nicolas Papernot
Pennsylvania State University
ARO Workshop on Adversarial Machine Learning, Stanford, September 2017
@NicolasPapernot
Thank you to our collaborators:
Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), (Penn State), Yan Duan (OpenAI), Ian Goodfellow (Google), Matt Fredrikson (CMU), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google), Praveen Manoharan (CISPA), Patrick McDaniel (Penn State), Arunesh Sinha (U of Michigan), Ananthram Swami (US ARL), Florian Tramèr (Stanford), Michael Wellman (U of Michigan)
Standard training objective: the loss is small when the prediction is correct on a legitimate input.
Adversarial training objective: the loss is small when the prediction is correct on a legitimate input and when it is correct on an adversarial input.
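A minimal sketch of this combined objective, assuming a PyTorch classifier and FGSM as the generator of adversarial inputs (the model, epsilon, weighting, and the [0, 1] input range below are illustrative assumptions, not details from the talk):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: one gradient-sign step of size eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # Assumes inputs live in [0, 1].
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def adversarial_training_loss(model, x, y, eps=0.3, alpha=0.5):
    """Weighted sum of the loss on legitimate and adversarial inputs."""
    x_adv = fgsm(model, x, y, eps)                # adversarial version of the batch
    loss_clean = F.cross_entropy(model(x), y)     # small when correct on legitimate input
    loss_adv = F.cross_entropy(model(x_adv), y)   # small when correct on adversarial input
    return alpha * loss_clean + (1 - alpha) * loss_adv
```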
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses (illustration adapted from slides by Florian Tramèr)
[Illustration: a legitimate input with two perturbation directions, the direction of the adversarially trained model's gradient and the direction of another model's gradient; adversarial and non-adversarial examples are marked along these directions.]
Threat model: white-box adversary.
Attack (R+FGSM): (1) take a random step of norm alpha; (2) take an FGSM step of norm eps - alpha.
Threat model: black-box adversary.
Attack: (1) learn a substitute for the defended model; (2) find an adversarial direction using the substitute.
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses
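A sketch of the R+FGSM white-box attack described above, in PyTorch (the [0, 1] input range and the parameter values are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def r_fgsm(model, x, y, eps=0.3, alpha=0.15):
    """R+FGSM: a random step of norm alpha, then an FGSM step of norm eps - alpha."""
    # (1) Random step: move to a random nearby point to escape the masked gradient.
    x_rand = (x + alpha * torch.randn_like(x).sign()).clamp(0, 1)
    x_rand = x_rand.detach().requires_grad_(True)
    # (2) FGSM step with the remaining budget, using the gradient at the random point.
    loss = F.cross_entropy(model(x_rand), y)
    grad, = torch.autograd.grad(loss, x_rand)
    return (x_rand + (eps - alpha) * grad.sign()).clamp(0, 1).detach()
```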
Papernot et al. Practical Black-Box Attacks against Machine Learning
Papernot et al. Towards the Science of Security and Privacy in Machine Learning
[Illustration: a black-box ML system and a local substitute, with example predictions "no truck sign" and "STOP sign".]
(1) The adversary queries the remote ML system with synthetic inputs and uses the returned labels to learn a local substitute.
Papernot et al. Practical Black-Box Attacks against Machine Learning
[Illustration: the adversarial example crafted on the local substitute is sent to the black-box ML system, which labels it "yield sign".]
(2) The adversary uses the local substitute to craft adversarial examples.
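A condensed sketch of the two-step black-box attack, assuming a `query_remote` oracle for the black-box ML system (returning class indices) and a local `substitute` network, both placeholders; the full procedure in Papernot et al. additionally grows the synthetic dataset with Jacobian-based augmentation:

```python
import torch
import torch.nn.functional as F

def train_substitute(query_remote, substitute, x_synthetic, epochs=10, lr=1e-3):
    """Step (1): label synthetic inputs with the remote system, fit a local substitute."""
    y_remote = query_remote(x_synthetic)   # labels returned by the black-box ML system
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(x_synthetic), y_remote)
        loss.backward()
        opt.step()
    return substitute

def craft_transfer_examples(substitute, x, y, eps=0.3):
    """Step (2): craft adversarial examples on the substitute; they transfer to the remote model."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(substitute(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()
```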
Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Tramèr et al. The Space of Transferable Adversarial Examples
On average, 44 orthogonal adversarial directions can be found for an input, and about 25 of them transfer to another model.
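One way to measure such transfer, assuming two independently trained PyTorch classifiers `model_a` and `model_b` and FGSM perturbations (the names and epsilon are illustrative):

```python
import torch
import torch.nn.functional as F

def transfer_rate(model_a, model_b, x, y, eps=0.3):
    """Fraction of adversarial examples crafted on model_a that also fool model_b."""
    x_req = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model_a(x_req), y)
    grad, = torch.autograd.grad(loss, x_req)
    x_adv = (x_req + eps * grad.sign()).clamp(0, 1).detach()
    fools_a = model_a(x_adv).argmax(dim=1) != y   # adversarial for the source model
    fools_b = model_b(x_adv).argmax(dim=1) != y   # also fools the target model
    transferred = (fools_a & fools_b).float().sum()
    return (transferred / fools_a.float().sum().clamp(min=1)).item()
```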
Adversarial training loss: small when the prediction is correct on legitimate input, and small when the prediction is correct on adversarial input. The catch: the defended model's own gradient is not adversarial; it is masked, so adversarial examples computed from it during training no longer point in truly adversarial directions.
Intuition: present adversarial gradients from multiple models during training.
[Illustration: during training, adversarial examples are generated from the adversarial gradients of model A and of additional models B, C, and D in turn; at inference time, only model A is used.]
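A sketch of one training step in this spirit, assuming model A is the model being trained and `static_models` holds the pre-trained models B, C, and D (the sampling scheme and hyperparameters are illustrative; Tramèr et al. give the full ensemble adversarial training procedure):

```python
import random
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One gradient-sign step on the given model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def ensemble_adversarial_step(model_a, static_models, opt, x, y, eps=0.3):
    """Train model A on adversarial examples whose gradients come from several models."""
    # Pick the source of the adversarial gradient: model A itself or a pre-trained model B/C/D.
    source = random.choice([model_a] + list(static_models))
    x_adv = fgsm(source, x, y, eps)
    opt.zero_grad()
    loss = 0.5 * F.cross_entropy(model_a(x), y) + 0.5 * F.cross_entropy(model_a(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```

Only model A is kept for inference; the other models serve solely as sources of adversarial gradients during training.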
[Results: evaluation using models from a holdout set.]
[DDS04] Dalvi et al. Adversarial Classification (KDD 2004)
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
This research was funded by:
Thank you for listening! Get involved at:
nicolas@papernot.fr
www.cleverhans.io @NicolasPapernot