SLIDE 1


Limits on Robustness to Adversarial Examples

Elvis Dohmatob

Criteo AI Lab

October 2, 2019

SLIDE 2


Table of contents

1. Preliminaries on adversarial robustness
2. Classifier-dependent lower bounds
3. Universal lower bounds

SLIDE 3


Preliminaries on adversarial robustness

SLIDE 4


Definition of adversarial attacks

A classifier is trained and deployed (e.g. the computer vision system on a self-driving car). At test / inference time, an attacker may submit queries to the classifier: they sample a real point x with true label k (e.g. “pig”) and modify it, x → x_adv, according to a prescribed threat model. The goal of the attacker is to make the classifier label x_adv as some label ≠ k (e.g. “airliner”).

SLIDE 5


The flying pig!

(Picture courtesy of https://gradientscience.org/intro_adversarial/)

x → x_adv := x + noise, with ‖noise‖ ≤ ε = 0.005 (in the example above). Fast Gradient Sign Method: noise = ε · sign(∇_x loss(h(x), y)).

SLIDE 6


FGSM for generating adversarial examples [Goodfellow ’14]

x → x_adv := clip(x + ε · sign(∇_x loss(h(x), y)))

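To make the update concrete, here is a minimal NumPy sketch of FGSM on a toy logistic-regression classifier (the model, data, ε and clipping range below are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression classifier h(x) = sign(w.x) with logistic loss.
w = rng.normal(size=100)
x = rng.normal(size=100)   # a "real sample point"
y = 1.0                    # its true label, in {-1, +1}

def loss_grad_x(w, x, y):
    """Gradient w.r.t. the input x of the logistic loss log(1 + exp(-y * w.x))."""
    return (-y / (1.0 + np.exp(y * w.dot(x)))) * w

eps = 0.005
x_adv = x + eps * np.sign(loss_grad_x(w, x, y))  # the FGSM step
x_adv = np.clip(x_adv, x.min(), x.max())         # 'clip' keeps x_adv in a valid range

print("margin before attack:", y * w.dot(x))
print("margin after attack: ", y * w.dot(x_adv))
```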

SLIDE 7


Adversarial attacks and defenses, an arms race!

Image courtesy of [Goldstein ’19; Shafahi ’19]

SLIDE 8


Classifier-dependent lower bounds

SLIDE 9


Problem setup

A classifier is simply a Borel-measurable mapping h : X → Y from feature space X (with metric d) to label space Y := {1, . . . , K}. The classifier is trained and deployed (e.g. the computer vision system on a self-driving car). At test / inference time, an attacker may submit queries to the classifier by sampling a real point x ∈ X with true label k ∈ Y and modifying it, x → x_adv, according to a prescribed threat model: for example, modifying a few pixels on a road traffic sign [Su et al. ’17], or modifying pixel intensities by a limited amount determined by a prescribed tolerance level [Tsipras ’18], etc.

SLIDE 11


Problem setup: notations

Standard accuracy: acc(h|k) := 1 − err(h|k), where err(h|k) := P_{X|k}(h(X) ≠ k) is the error of h on class k. Small acc(h|k) ⟹ h is inaccurate on class k.

Adversarial robustness accuracy: acc_ε(h|k) := 1 − err_ε(h|k), where err_ε(h|k) := P_{X|k}(∃x′ ∈ Ball(X; ε) s.t. h(x′) ≠ k) is the adversarial robustness error of h on class k. Small acc_ε(h|k) ⟹ h is vulnerable to attacks on class k.

Distance to the error set: d(h|k) := E_{P_{X|k}}[d(X, B(h, k))] denotes the average distance of a sample point with true label k from the error set B(h, k) := {x ∈ X | h(x) ≠ k} of samples assigned another label. Small d(h|k) ⟹ h is vulnerable to attacks on class k.

SLIDE 14


A motivating example (from [Tsipras ’18])

Consider the following classification problem:

- Prediction target: Y ∼ Bern(1/2, {±1}), based on p ≥ 2 explanatory variables X := (X_1, X_2, . . . , X_p), given by:
- Robust feature: X_1 | Y = +Y w.p. 70% and −Y w.p. 30%.
- Non-robust features: X_j | Y ∼ N(ηY, 1) for j = 2, . . . , p, where η ∼ p^{−1/2} is a fixed scalar which controls the difficulty.

The linear classifier h_lin(x) ≡ sign(w^T x) with w = (0, 1/(p − 1), . . . , 1/(p − 1)) solves the problem with near-perfect accuracy, yet once we allow ℓ∞-perturbations of maximum size ε ≥ 2η, its adversarial robustness accuracy is essentially zero! (See the simulation sketch below.)

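A sanity-check simulation of this example (a sketch of my own; n, p, ε and η below are illustrative choices): standard accuracy lands near Φ(2) ≈ 0.98, while the worst-case ℓ∞ attack, Δx_1 = 0 and Δx_j = −εy, drives the adversarial accuracy to ≈ 0, matching the proof on the next slides.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, eps = 500, 20_000, 0.25
eta = 2.0 / np.sqrt(p - 1)   # difficulty parameter, eta ~ p^(-1/2)

# Sample the model: Y uniform on {-1, +1}; X_1 robust, X_2..X_p non-robust.
y = rng.choice([-1.0, 1.0], size=n)
x = rng.normal(loc=eta * y[:, None], scale=1.0, size=(n, p))
x[:, 0] = y * np.where(rng.random(n) < 0.7, 1.0, -1.0)  # robust feature X_1

w = np.r_[0.0, np.full(p - 1, 1.0 / (p - 1))]  # h_lin(x) = sign(w.x)
acc = np.mean(np.sign(x @ w) == y)

# Worst-case l_inf attack of size eps: shift every non-robust feature by -eps*y.
x_adv = x.copy()
x_adv[:, 1:] -= eps * y[:, None]
acc_adv = np.mean(np.sign(x_adv @ w) == y)

print(f"standard accuracy ~ {acc:.3f}, adversarial accuracy ~ {acc_adv:.4f}")
```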

SLIDE 18


Proof.

The standard accuracy of the classifier writes

acc(h_lin) := P_{(X,Y)}(h_lin(X) = Y) = P(Y w^T X ≥ 0) = P_Y((Y/(p − 1)) Σ_{j≥2} N(ηY, 1) ≥ 0)
= P(N(η, 1/(p − 1)) ≥ 0) = P(N(0, 1/(p − 1)) ≥ −η) = P(N(0, 1/(p − 1)) ≤ η) ≥ 1 − e^{−(p−1)η²/2},

which is ≥ 1 − δ if η ≥ √(2 log(1/δ)/(p − 1)). ⟹ h_lin is quasi-perfect!

SLIDE 20


Proof.

The adversarial robustness accuracy of h_lin writes

acc_ε(h_lin) := P_{(X,Y)}(h_lin(X + Δx) = Y ∀ ‖Δx‖_∞ ≤ ε)
= P_{(X,Y)}(inf_{‖Δx‖_∞ ≤ ε} Y w^T(X + Δx) ≥ 0)
= P_{(X,Y)}(Y w^T X − sup_{‖Δx‖_∞ ≤ ε} (−Y w^T Δx) ≥ 0)
= P_{(X,Y)}(Y w^T X − ε ‖w‖_1 ≥ 0)
= P_{(X,Y)}(Y w^T X − ε ≥ 0)
= P(N(0, 1/(p − 1)) ≥ ε − η) ≤ e^{−(p−1)(ε−η)²/2}.

Thus acc_ε(h_lin) ≤ δ for ε ≥ η + √(2 log(1/δ)/(p − 1)). That is, the adversarial accuracy of h_lin is close to zero!

SLIDE 26


What could be going on? [Intuition from Tsipras and co.]

Recall: prediction target Y ∼ Bern(1/2, {±1}); robust feature X_1 | Y = +Y w.p. 70% and −Y w.p. 30%; non-robust features X_j | Y ∼ N(ηY, 1) for j = 2, . . . , p. BTW, we note that an optimal adversarial attack is obtained by taking Δx_1 = 0 and Δx_j = −εy for all j = 2, . . . , p.

Basic intuition: in standard training, all correlation is good correlation. If we want robustness, we must avoid weakly correlated features ⟹ learn causal features?

SLIDE 30


BTW, humans are not “perfect”

SLIDE 31


Talagrand transportation-cost inequality

The T2(c) property: given c ≥ 0, a distribution µ on X is said to satisfy T2(c) if for every distribution ν on X with ν ≪ µ, one has

W_2(ν, µ) ≤ √(2c · kl(ν ‖ µ)), (1)

where kl(ν ‖ µ) := ∫_X log(dν/dµ) dν is the entropy of ν relative to µ.

This generalizes the well-known Pinsker inequality for the total-variation distance between probability measures (take 2c = 1/2). Unlike Pinsker’s inequality, which holds unconditionally, the inequality T2(c) is a privilege enjoyed only by special classes of reference distributions µ.

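A quick numerical illustration (my own sketch, not from the slides): for one-dimensional Gaussians both sides of T2(1) are available in closed form, W_2(N(m, s²), N(0, 1))² = m² + (s − 1)² and kl(N(m, s²) ‖ N(0, 1)) = (s² + m² − 1 − log s²)/2, so the inequality can be checked directly.

```python
import numpy as np

def w2_gauss(m, s):
    """W2 distance between N(m, s^2) and the standard Gaussian N(0, 1)."""
    return np.sqrt(m**2 + (s - 1.0)**2)

def kl_gauss(m, s):
    """KL divergence kl(N(m, s^2) || N(0, 1))."""
    return 0.5 * (s**2 + m**2 - 1.0 - np.log(s**2))

rng = np.random.default_rng(0)
for m, s in rng.uniform([-3.0, 0.1], [3.0, 3.0], size=(5, 2)):
    lhs, rhs = w2_gauss(m, s), np.sqrt(2.0 * kl_gauss(m, s))  # T2(1): lhs <= rhs
    print(f"m={m:+.2f} s={s:.2f}  W2={lhs:.3f}  sqrt(2 kl)={rhs:.3f}  ok={lhs <= rhs + 1e-12}")
```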

SLIDE 34


BLOWUP / aka concentration of measure

The BLOWUP(c) property: µ is said to satisfy BLOWUP(c) if for every Borel B ⊆ X with µ(B) > 0 and for every ε ≥ √(2c log(1/µ(B))), it holds that

µ(B^ε) ≥ 1 − e^{−(1/(2c)) (ε − √(2c log(1/µ(B))))²}. (2)

It is a classical result that the Gaussian distribution on R^p satisfies BLOWUP(1) and T2(1), a phenomenon known as Gaussian isoperimetry. These results date back to the works of Borell, Lévy, Talagrand and Marton (see the textbook [Boucheron ’13]).

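For the standard Gaussian, half-spaces make the blowup explicit: with B = {x : x_1 ≤ q} one has µ(B) = Φ(q) and µ(B^ε) = Φ(q + ε) for the ℓ2 metric. The sketch below (my own illustration; c and q are arbitrary choices) compares this exact value with the BLOWUP(1) bound (2).

```python
import numpy as np
from scipy.stats import norm

c, q = 1.0, -1.0                      # BLOWUP constant; B = {x : x_1 <= q}
mu_B = norm.cdf(q)                    # Gaussian mass of the half-space B
eps0 = np.sqrt(2 * c * np.log(1 / mu_B))

for eps in np.linspace(eps0, eps0 + 3, 7):
    exact = norm.cdf(q + eps)         # mu(B^eps), exact for half-spaces
    bound = 1 - np.exp(-(eps - eps0) ** 2 / (2 * c))
    print(f"eps={eps:.2f}  mu(B^eps)={exact:.4f}  blowup bound={bound:.4f}")
```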

SLIDE 37


Marton’s Blowup lemma

Lemma (Marton’s BLOWUP Lemma). On a metric space, it holds that T2(c) ⊆ BLOWUP(c).

Proof. Fact: kl(µ|_B ‖ µ) = log(1/µ(B)), where µ|_B(A) := µ(A ∩ B)/µ(B). Every point of B lies at distance ≥ ε from X \ B^ε, so any transport plan between µ|_B and µ|_{X\B^ε} must move mass a distance of at least ε. Thus

ε ≤ W_2(µ|_B, µ|_{X\B^ε}) ≤ W_2(µ|_B, µ) + W_2(µ|_{X\B^ε}, µ)
  ≤ √(2c · kl(µ|_B ‖ µ)) + √(2c · kl(µ|_{X\B^ε} ‖ µ))
  = √(2c log(1/µ(B))) + √(2c log(1/(1 − µ(B^ε)))).

Rearranging the above inequality gives √(2c log(1/(1 − µ(B^ε)))) ≥ (ε − √(2c log(1/µ(B))))_+, and the result follows after squaring and exponentiating.

SLIDE 44


Adversarial attacks are a ‘butterfly effect’ on the data manifold

Error set: B(h, k) := {x ∈ X | h(x) ≠ k}, where h is the classifier. Neighborhood of the error set: B(h, k)^ε := {x ∈ X | d(x, B(h, k)) ≤ ε}.

[Figure: the error set B(h, k) and its ε-blowup B(h, k)^ε]

err(h|k) := P_{X|k}(B(h, k)) > 0 if h is not perfect on class k. The consequence is that acc_ε(h|k) ↘ 0 exponentially fast as a function of ε. Thus adversarial robustness is impossible in general! Manuscript: https://arxiv.org/pdf/1810.04065.pdf

SLIDE 45


Strong No Free Lunch Theorem

Theorem (Strong “No Free Lunch” [Dohmatob ’18]). Suppose that the conditional distribution P_{X|k} has the T2(σ_k²) property. Given a classifier h : X → Y such that err(h|k) > 0, define ε(h|k) := σ_k √(2 log(1/err(h|k))). Then we have the following bounds:

(A) Adversarial robustness accuracy: if ε ≥ ε(h|k), then

acc_ε(h|k) ≤ e^{−(1/(2σ_k²)) (ε − ε(h|k))²}. (3)

(B) Average distance to the error set:

d(h|k) ≤ σ_k (√(2 log(1/err(h|k))) + √(π/2)). (4)

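A one-dimensional sanity check of bound (3) (my own sketch, not from the slides): take P_{X|k} = N(0, 1), which satisfies T2(1) so σ_k = 1, and an error set that is a half-line B(h, k) = {x ≥ q}. Then err(h|k) = 1 − Φ(q) and the exact robust accuracy acc_ε(h|k) = Φ(q − ε) are both in closed form, and the bound must dominate the exact curve.

```python
import numpy as np
from scipy.stats import norm

q = 2.0                                  # error set B(h, k) = {x >= q}
err = 1 - norm.cdf(q)                    # err(h|k) > 0
eps0 = np.sqrt(2 * np.log(1 / err))      # eps(h|k), with sigma_k = 1

for eps in np.linspace(eps0, eps0 + 2, 5):
    acc_exact = norm.cdf(q - eps)                 # exact robust accuracy
    acc_bound = np.exp(-(eps - eps0) ** 2 / 2)    # bound (3)
    print(f"eps={eps:.2f}  exact={acc_exact:.4f}  bound={acc_bound:.4f}")
```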

SLIDE 46


Proof

Use Marton’s Lemma, T2(σ_k²) ⊆ BLOWUP(σ_k²), with B := B(h, k) := {x ∈ X | h(x) ≠ k} and µ := P_{X|k}.

SLIDE 47


Corollary (Strong “No Free Lunch” Theorem on flat space). Let 1 ≤ q ≤ ∞ and define ε_q(h|k) := ε(h|k) p^{1/q − 1/2}. If, in addition to the assumptions of the Strong No Free Lunch Theorem, the feature space is flat, i.e. Ric_X ≡ 0, then for the ℓ_q threat model we have the following bounds:

(A1) Adversarial robustness accuracy: if ε ≥ ε_q(h|k), then

acc_ε(h|k) ≤ e^{−(p^{1−2/q}/(2σ_k²)) (ε − ε_q(h|k))²}. (5)

(A2) Average distance to the error set:

d(h|k) ≤ σ_k p^{1/q − 1/2} (√(2 log(1/err(h|k))) + √(π/2)). (6)

Note that the case q = 1 is a proxy for “few-pixel” attack models [Su et al. ’17].

SLIDE 48


Strong No Free Lunch Theorem

Corollary (Strong NFLT for ℓ∞ attacks [Dohmatob ’18]). In particular, for the ℓ∞ threat model, we have the following bounds:

(B1) Adversarial robustness accuracy: if ε ≥ ε(h|k)/√p, then

acc_ε(h|k) ≤ e^{−(p/(2σ_k²)) (ε − ε(h|k)/√p)²}. (7)

(B2) Average distance to the error set:

d(h|k) ≤ (σ_k/√p) (√(2 log(1/err(h|k))) + √(π/2)). (8)

SLIDE 49


Special cases of our results

- Log-concave distributions dP_{X|k} ∝ e^{−v_k(x)} dx satisfying the Bakry–Émery curvature condition Hess_x(v_k) + Ric_x(X) ⪰ (1/σ_k²) I_p, e.g. the multivariate Gaussian (considered in [Tsipras ’18, Fawzi ’18]).
- Perturbed log-concave distributions (via the Holley–Stroock theorem).
- The uniform measure on compact Riemannian manifolds of positive Ricci curvature, e.g. spheres (considered in [Gilmer ’18]), tori, or any compact Lie group.
- The pushforward, via a Lipschitz function f, of a distribution in T2(σ_k²): indeed, take σ̃_k = ‖f‖_Lip σ_k.
- etc.

SLIDE 50


Worked example: Adversarial spheres [Gilmer ’18]

Y ∼ Bern(1/2, {±1}) and X|k ∼ uniform(S^p_{R_k}), where R_+ > R_− > 0. The sphere S^p_{R_k} is a compact Riemannian manifold with constant Ricci curvature (p − 1)R_k^{−2}. Thus P_{X|k} satisfies T2(R_k²/(p − 1)).

∴ E_{X|k}[d_geo(X, B(h, k))] ≤ (R_k/√(p − 1)) (√(2 log(1/err(h|k))) + √(π/2)) ∼ (R_k/√p) Φ^{−1}(acc(h|k)).

This is the same bound obtained manually in [Gilmer ’18].

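A quick numeric comparison of the two right-hand sides (my own sketch; R and p are illustrative choices): both expressions shrink like 1/√p and agree up to modest constants, which is the sense of the “∼” above.

```python
import numpy as np
from scipy.stats import norm

R, p = 1.0, 784                 # sphere radius and dimension (illustrative)
sigma = R / np.sqrt(p - 1)      # T2 constant of uniform(S^p_R)

for acc in [0.9, 0.99, 0.999]:
    err = 1.0 - acc
    ours = sigma * (np.sqrt(2 * np.log(1 / err)) + np.sqrt(np.pi / 2))
    gilmer = (R / np.sqrt(p)) * norm.ppf(acc)   # the [Gilmer '18] expression
    print(f"acc={acc}:  T2 bound={ours:.4f}   (R/sqrt(p)) Phi^-1(acc)={gilmer:.4f}")
```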

SLIDE 54


Some empirical confirmation

A phase transition occurs, as predicted by our theorems.

SLIDE 55


Key papers

- [Tsipras ’18] There is no free lunch in adversarial robustness
- [Gilmer ’18] Adversarial spheres
- [Fawzi ’18] Adversarial vulnerability for any classifier
- [Athalye ’18] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
- [Dohmatob ’19] Generalized No Free Lunch Theorem for Adversarial Robustness
- [Shafahi ’19] Are adversarial examples inevitable?

SLIDE 56


Universal lower bounds

SLIDE 57


Overview

The previous, classifier-dependent bounds make very strong assumptions on both the data and the classifier (e.g. the theory fails for perfect classifiers). It would be nice to have universal bounds which depend only on the geometry of the class-conditional distributions P_+ and P_−. This is very recent work, started by [Bhagoji ’19] (to appear in NeurIPS!). My own work builds on [Bhagoji ’19] and is still largely ongoing (AISTATS ???). Reference: [Bhagoji ’19] Lower Bounds on Adversarial Robustness from Optimal Transport.

SLIDE 58


Abstract view of adversarial attacks

The feature space X is an abstract measure space, and the target space is {±1} (binary classification); e.g. X = (R^p, Borel σ-algebra). Let P be an unknown probability distribution on the product space X × {±1}. A classifier is any measurable function h : X → {±1}. An attack model A is the prescription of a closed neighborhood A_x for each point x of X, e.g. A_x = Ball_{ℓ∞}(x; ε); the case A_x = {x} ∀x ∈ X corresponds to the attackless model. A type-A attack is any measurable function a : X × {±1} → X such that a(x, y) ∈ A_x ∀(x, y) ∈ X × {±1}; with abuse of notation, we will also write a ∈ A. E.g. a(x, y) := x − yz for some fixed z ∈ Ball_{ℓ∞}(0; ε).

SLIDE 59


Abstract view of adversarial attacks

The robustness error of h to type-A attacks is

err_A(h) := P_{(x,y)∼P}(∃x′ ∈ A_x s.t. h(x′) ≠ y). (9)

The Bayes-optimal robustness error for type-A attacks is

err*_A := inf_h err_A(h). (10)

SLIDE 60


The flying pig example again!

(Picture courtesy of https://gradientscience.org/intro_adversarial/)

x → x_adv := x + noise, with ‖noise‖ ≤ ε = 0.005 (in the example above). Here X = R^{#pixels} and A_x = Ball_{ℓ∞}(x; 0.005).

SLIDE 61


Adversarial attacks as optimal transport [Bhagoji ’19]

Given a classifier h, consider the derived classifier h̃ : X → {±1, ⊥}:

h̃(x) := y if ∃y ∈ {±1} s.t. h(x′) = y ∀x′ ∈ A_x, and h̃(x) := ⊥ else. (11)

Define the transport ground cost

c_A(x, x′) := 1 if A_x ∩ A_{x′} = ∅, and 0 else,

and note that ∀x, x′ ∈ X one has ✶{h̃(x) = −1} + ✶{h̃(x′) = +1} ≤ c_A(x, x′) + 1 (indeed, if A_x ∩ A_{x′} ≠ ∅, then h cannot be constantly −1 on A_x and constantly +1 on A_{x′}), i.e. f_h(x) − g_h(x′) ≤ c_A(x, x′), with f_h := ✶{h̃(·) = −1} and g_h := 1 − ✶{h̃(·) = +1}.

SLIDE 64


Adversarial attacks as optimal transport [Bhagoji ’19]

Recall the ground cost c_A(x, x′) := ✶{A_x ∩ A_{x′} = ∅}. Since f_h(x) − g_h(x′) ≤ c_A(x, x′) ∀x, x′ ∈ X, the pair (f_h, g_h) is a pair of Kantorovich potentials for optimal transport with ground cost c_A. Therefore

OT_{c_A}(P_−, P_+) := sup_{K-potentials (φ, ψ)} E_{P_−}[φ(x)] − E_{P_+}[ψ(x′)]
≥ sup_h E_{P_−}[f_h(x)] − E_{P_+}[g_h(x′)]
= sup_h [2(1 − err_A(h)) − 1] = 1 − 2 err*_A.
Elvis Dohmatob Limits on Robustness to Adversarial Examples – slide 34 / 41

slide-65
SLIDE 65

Preliminaries on adversarial robustness Classifier-dependent lower bounds Universal lower bounds Link between adversarial examples and optimal transport Adversarially robust learning via adversarially augmented data

Adversarial attacks as optimal transport [Bhagoji ’19]

cA(x, x′) =

  • 1,

if Ax ∩ Ax′ = ∅, 0, else, fh(x) − gh(x′) ≤ cA(x, x′) ∀x, x′ ∈ X, and so (fh, gh) is a pair of Kantorovich potentials for OT with ground-cost cA. ∴ OTcA(P−, P+) := sup

K−potentials φ,ψ

EP−[φ(x)] − EP+[ψ(x)] ≥ sup

h

EP−[gh(x)] − EP+[fh(x)]

Elvis Dohmatob Limits on Robustness to Adversarial Examples – slide 34 / 41

slide-66
SLIDE 66

Preliminaries on adversarial robustness Classifier-dependent lower bounds Universal lower bounds Link between adversarial examples and optimal transport Adversarially robust learning via adversarially augmented data

Adversarial attacks as optimal transport [Bhagoji ’19]

cA(x, x′) =

  • 1,

if Ax ∩ Ax′ = ∅, 0, else, fh(x) − gh(x′) ≤ cA(x, x′) ∀x, x′ ∈ X, and so (fh, gh) is a pair of Kantorovich potentials for OT with ground-cost cA. ∴ OTcA(P−, P+) := sup

K−potentials φ,ψ

EP−[φ(x)] − EP+[ψ(x)] ≥ sup

h

EP−[gh(x)] − EP+[fh(x)] = sup

h

2(1 − errA(h)) − 1 = 1 − err∗

A

Elvis Dohmatob Limits on Robustness to Adversarial Examples – slide 34 / 41

slide-67
SLIDE 67

Preliminaries on adversarial robustness Classifier-dependent lower bounds Universal lower bounds Link between adversarial examples and optimal transport Adversarially robust learning via adversarially augmented data

Adversarial attacks as optimal transport [Bhagoji ’19]

cA(x, x′) =

  • 1,

if Ax ∩ Ax′ = ∅, 0, else, fh(x) − gh(x′) ≤ cA(x, x′) ∀x, x′ ∈ X, and so (fh, gh) is a pair of Kantorovich potentials for OT with ground-cost cA. ∴ OTcA(P−, P+) := sup

K−potentials φ,ψ

EP−[φ(x)] − EP+[ψ(x)] ≥ sup

h

EP−[gh(x)] − EP+[fh(x)] = sup

h

2(1 − errA(h)) − 1 = 1 − err∗

A

Elvis Dohmatob Limits on Robustness to Adversarial Examples – slide 34 / 41

slide-68
SLIDE 68


Universal lower bound on adversarial robustness error

Theorem ([Bhagoji ’19]). Given an attack model A, let OT_A(P_+, P_−) be the optimal transport distance between the positive and negative class-conditional distributions of the samples, with ground cost c_A(x, x′) := ✶{A_x ∩ A_{x′} = ∅}. Then we have the following lower bound on the classification error against type-A attacks:

err*_A ≥ (1/2)(1 − OT_A(P_+, P_−)). (12)

In particular, for the attackless case where A_x = {x} ∀x ∈ X, one has c_A(x, x′) = ✶{x ≠ x′}, and so OT_A(P_+, P_−) = TV(P_+, P_−). The theorem then reduces to the following well-known result:

err* ≥ (1/2)(1 − TV(P_+, P_−)). (13)

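For empirical samples the bound is directly computable. Below is a sketch of my own (not from the slides) that takes A_x = Ball_{ℓ∞}(x; ε), so that c_A(x, x′) = ✶{‖x − x′‖_∞ > 2ε}, solves the Kantorovich LP with scipy, and prints the resulting lower bound (1 − OT_A)/2 on err*_A for two Gaussian sample clouds.

```python
import numpy as np
from scipy.optimize import linprog

def ot_cost(Xm, Xp, eps):
    """Kantorovich OT between two empirical clouds under the 0/1 ground cost
    c_A(x, x') = 1{||x - x'||_inf > 2*eps}  (l_inf balls of radius eps)."""
    n, m = len(Xm), len(Xp)
    C = (np.abs(Xm[:, None, :] - Xp[None, :, :]).max(-1) > 2 * eps).astype(float)
    # Equality constraints on the transport plan: row sums 1/n, column sums 1/m.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.r_[np.full(n, 1.0 / n), np.full(m, 1.0 / m)]
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

rng = np.random.default_rng(0)
Xm = rng.normal(-1.0, 1.0, size=(40, 5))   # samples from P_-
Xp = rng.normal(+1.0, 1.0, size=(40, 5))   # samples from P_+
for eps in [0.0, 0.25, 0.5]:
    ot = ot_cost(Xm, Xp, eps)
    print(f"eps={eps}: OT_A={ot:.3f}, lower bound on err*_A={(1 - ot) / 2:.3f}")
```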

SLIDE 69


Total-Variational reformulation of the bounds

Theorem ([Dohmatob ’20?] / ongoing work). Let A be an attack model, and for a ∈ A define a_±(x) :≡ a(x, ±1). Define Ω := {(x, x′) ∈ X² | A_x ∩ A_{x′} ≠ ∅},

TV_A(P_−, P_+) := inf_{a∈A} TV(a_−#P_−, a_+#P_+), and
T̄V_A(P_−, P_+) := inf_{γ_1, γ_2} TV(proj_2#γ_1, proj_1#γ_2), (14)

where the second inf is taken over all distributions γ_1, γ_2 on X² which are concentrated on Ω and satisfy proj_1#γ_1 = P_− and proj_2#γ_2 = P_+. Then

OT_A(P_−, P_+) = T̄V_A(P_−, P_+) ≤ TV_A(P_−, P_+), (15)

with equality if P_− and P_+ have densities w.r.t. the Lebesgue measure. The above bound suggests that rather than doing adversarial training, we would rather do normal training on adversarially augmented data!

SLIDE 70


Worked example: hierarchical Gaussian classification

(Example from [Schmidt ’18].) µ ∼ N(0, I_p), Y ∼ Bern({±1}), X | (Y = y) ∼ N(yµ, σ² I_p). Consider the ℓ∞ attack model A given by A_x = Ball_{ℓ∞}(x; ε). Given n samples S_n = {(x_1, y_1), . . . , (x_n, y_n)} from this model, how small can the robust error of a classifier be? More precisely, let us bound

E_{µ∼N(0,I_p)} inf_{ĥ} E_{S_n∼P^n} E_{ĥ_n∼ĥ(S_n)} [err_A(ĥ_n; µ)], (16)

where err_A(ĥ_n; µ) is the adversarial robust error of ĥ_n, defined by

err_A(ĥ_n; µ) := P_{y∼Bern({±1}), x∼N(yµ, σ²I_p)}(∃x′ ∈ A_x s.t. ĥ_n(x′) ≠ y).

SLIDE 71


Worked example: hierarchical Gaussian classification

Recall err_A(ĥ_n; µ) := P_{y∼Bern({±1}), x∼N(yµ, σ²I_p)}(∃x′ ∈ A_x s.t. ĥ_n(x′) ≠ y). The posterior distribution of the model parameter is N(µ̂_n, σ̂_n² I_p), with σ̂_n² = σ²/(σ² + n) and µ̂_n = (n/(σ² + n)) x̄, where x̄ := (1/n) Σ_{i=1}^n x_i. Therefore

err_A(ĥ_n) = inf_h err_A(h; µ̂_n, σ̂_n²) ≥ . . .
≥ E_µ E_{S_n} (1/2)(1 − inf_{‖z‖_∞ ≤ ε} TV(N(−µ̂_n + z, σ̂_n² I_p), N(µ̂_n − z, σ̂_n² I_p)))
≥ E_µ E_{S_n} Φ(−(√p/σ̂_n)(‖µ̂_n‖_∞ − ε)_+)
≥ P(‖µ̂_n‖_∞ ≤ ε) · Φ(0) ≈ (1/2) P_{u∼N(0,I_p)}(√(n/(n + σ²)) ‖u‖_∞ ≤ ε),

where the last step uses that, marginally over µ and S_n, µ̂_n ∼ √(n/(n + σ²)) u with u ∼ N(0, I_p).

Limits on Robustness to Adversarial Examples – slide 38 / 41

slide-72
SLIDE 72


Worked example: hierarchical Gaussian classification

Thus, if n ≤ ε²σ²/(8 log p), then √(n/(σ² + n)) ≤ ε/(2 √(2 log p)), and so

err*_{A,n} ≥ (1/2) P_{u∼N(0,I_p)}(√(n/(n + σ²)) ‖u‖_∞ ≤ ε)
≥ (1/2) P_{u∼N(0,I_p)}(‖u‖_∞ ≤ 2 √(2 log p))
≥ (1/2)(1 − 1/p) ≈ 1/2 in high dimensions!

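A Monte Carlo check of this conclusion (my own sketch; p, σ, ε and the number of trials are illustrative choices): for n below the threshold ε²σ²/(8 log p), the printed lower bound should sit near 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)
p, sigma, eps = 10_000, 100.0, 0.5
n_max = int(eps**2 * sigma**2 / (8 * np.log(p)))   # sample-size threshold
print("threshold: n <=", n_max)

for n in [n_max // 4, n_max]:
    shrink = np.sqrt(n / (n + sigma**2))
    u = rng.normal(size=(500, p))    # 500 Monte Carlo draws of u ~ N(0, I_p)
    prob = np.mean(shrink * np.abs(u).max(axis=1) <= eps)
    print(f"n={n}: lower bound on err*_A ~ {0.5 * prob:.3f}")
```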

SLIDE 73


Main references

- [Dohmatob ’19] Generalized No Free Lunch Theorem for Adversarial Robustness
- [Bhagoji ’19] Lower Bounds on Adversarial Robustness from Optimal Transport
- [Tsipras ’18] There is no free lunch in adversarial robustness
- [Gilmer ’18] Adversarial spheres
- [Goodfellow ’14] Explaining and harnessing adversarial examples
- [Su ’17] One pixel attack for fooling deep neural networks
- [Fawzi ’18] Adversarial vulnerability for any classifier
- [Athalye ’18] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

SLIDE 74


Questions?
