SLIDE 1

On the Hardness of Robust Classification

  • P. Gourdeau, V. Kanade, M. Kwiatkowska and J. Worrell

University of Oxford

Pascale Gourdeau (University of Oxford) On the Hardness of Robust Classification 1 / 22

SLIDE 2

Overview

A computational and information-theoretic study of the hardness of robust learning.

Setting: binary classification tasks on input space X = {0, 1}^n in the presence of an adversary.

E.g.: distinguishing between handwritten 0’s and 1’s:
{((0, 1, . . . , 1), 0), ((1, 1, . . . , 1), 1), . . . , ((0, 1, . . . , 0), 0)}

SLIDE 3

Overview

Today’s talk:
  • A comparison of different notions of robust risk,
  • A result on the impossibility of sample-efficient distribution-free robust learning,
  • Robustness thresholds to robustly learn monotone conjunctions under log-Lipschitz distributions,
  • A simple proof of the computational hardness of robust learning.

SLIDE 4

Machine Learning Classification Tasks

Big picture:
  • Data is drawn i.i.d. from an unknown distribution and labelled by some target concept.
  • We focus on the realizable setting, as opposed to the agnostic setting.
  • A learning algorithm A has sample complexity m if, when given a sample S of size at least m, A outputs a hypothesis that has low error with high probability over the draw of S.

SLIDE 5

Robust Classification Tasks

Goal: learn a function that will be robust (with high probability) against an adversary who can perturb the test data.

Question: how do we define a misclassification?

SLIDE 6

Adversarial Examples

General idea: an adversarial example is constructed from a natural example drawn from a distribution D by adding a perturbation.

Notation:
  • c: target concept
  • h: hypothesis
  • ρ: robustness parameter (the adversary’s perturbation budget)

SLIDE 7

Robust Risk Definitions

c: target concept, h: hypothesis, ρ: robustness parameter (the adversary’s perturbation budget).

Robust risks:
  • Constant-in-the-ball: the probability that an adversary can perturb a point x drawn from D to a point z within budget ρ so that h on z differs from c on x:

    R^C_ρ(h, c) = P_{x∼D}(∃z ∈ B_ρ(x) . c(x) ≠ h(z)).

  • Exact-in-the-ball: the probability that an adversary can perturb a point x drawn from D to a point z within budget ρ so that c and h disagree on z:

    R^E_ρ(h, c) = P_{x∼D}(∃z ∈ B_ρ(x) . c(z) ≠ h(z)).

Pascale Gourdeau (University of Oxford) On the Hardness of Robust Classification 7 / 22
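Both risks can be computed exactly by brute force on a small hypercube. The sketch below is illustrative only (the target, hypothesis, and parameter choices are ours, not from the talk): it evaluates both definitions under the uniform distribution with Hamming-ball perturbations for n = 3.

```python
from itertools import product

def hamming_ball(x, rho):
    """Yield all points of {0,1}^n within Hamming distance rho of x."""
    for z in product((0, 1), repeat=len(x)):
        if sum(a != b for a, b in zip(x, z)) <= rho:
            yield z

def constant_in_ball_risk(h, c, rho, n):
    """R^C_rho(h, c) under the uniform distribution on {0,1}^n: the fraction
    of points x whose ball contains some z with h(z) != c(x)."""
    pts = list(product((0, 1), repeat=n))
    bad = sum(any(h(z) != c(x) for z in hamming_ball(x, rho)) for x in pts)
    return bad / len(pts)

def exact_in_ball_risk(h, c, rho, n):
    """R^E_rho(h, c): the fraction of points x whose ball contains some z
    with h(z) != c(z)."""
    pts = list(product((0, 1), repeat=n))
    bad = sum(any(h(z) != c(z) for z in hamming_ball(x, rho)) for x in pts)
    return bad / len(pts)

# Hypothetical target and hypothesis on n = 3: two monotone conjunctions.
n, rho = 3, 1
c = lambda x: x[0] & x[1]          # c = x1 AND x2
h = lambda x: x[0] & x[1] & x[2]   # h = x1 AND x2 AND x3
print(constant_in_ball_risk(h, c, rho, n))   # 0.5
print(exact_in_ball_risk(h, c, rho, n))      # 0.5
print(constant_in_ball_risk(c, c, rho, n))   # positive even though h = c
```

The last line previews one of the drawbacks discussed on the next slides: the constant-in-the-ball risk of the target against itself can be positive, since points near the decision boundary can always be pushed across it.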

SLIDE 8

Comparing Robust Risk Functions

In general, the constant-in-the-ball and the exact-in-the-ball risk functions are not comparable; all three of the following cases can occur:
  (a) R^E_ρ > 0 and R^C_ρ = 0,
  (b) R^E_ρ = 0 and R^C_ρ > 0,
  (c) R^E_ρ > 0 and R^C_ρ > 0.

SLIDE 9

Choosing a Robust Risk Function

R^C_ρ pros and cons:
  • simple: we only need to know x’s correct label to evaluate its loss,
  • can have positive risk even when h = c,
  • some concept classes are inherently not robust w.r.t. this definition,
  • as ρ → n, it requires the learned function to be constant.

R^E_ρ pros and cons:
  • requires knowledge of c outside the sampled points, e.g. through membership queries,
  • R^E_ρ(c, c) = 0.

In our view, the adversary’s power lies in creating perturbations on which c and h disagree, so we choose R^E_ρ despite its drawbacks.

SLIDE 10

Efficient Robust Learning

A efficiently ρ-robustly learns a concept class C with respect to a distribution class D if: there exists a polynomial sample complexity function poly such that for any input dimension n, any target concept c ∈ C, any distribution D ∈ D, and any accuracy and confidence parameters ε, δ > 0, when A is given access to a sample S ∼ D^m, where m ≥ poly(1/ε, 1/δ, n), A outputs h : {0, 1}^n → {0, 1} such that

  P_{S∼D^m}( R^E_{ρ(n)}(h, c) < ε ) > 1 − δ.

Notes:
  • We require polynomial sample complexity.
  • It might make more sense to require only finite sample complexity in other contexts, e.g. R^n.

SLIDE 11

No Distribution-Free Robust Learning

Theorem. C is efficiently distribution-free robustly learnable iff it is trivial.

Proof idea:
  • If C is non-trivial, we can find concepts c1, c2 and a point x such that c1(x) ≠ c2(x).
  • Construct a distribution under which c1 and c2 will likely agree on a sample of size polynomial in n, but R^E_ρ(c1, c2) = Ω(1).
  • Draw c ∼ Unif({c1, c2}) before labelling the sample. Then any hypothesis the learner outputs fails to be robust against c with positive probability.

SLIDE 12

“Nice” Distributions

Idea: we need distributional assumptions to have efficient robust learning.

Log-Lipschitz distributions: D is α-log-Lipschitz if the logarithm of its density function is log(α)-Lipschitz w.r.t. the Hamming distance. E.g., if

  x1 = (0, . . . , 1, 1, 1, . . . , 0) and x2 = (0, . . . , 1, 0, 1, . . . , 0)

differ in a single coordinate, then D(x1)/D(x2) ≤ α.

Intuition: input points that are close to each other cannot have vastly different probability masses.

Examples: the uniform distribution, product distributions where the mean of each variable is bounded away from 0 and 1, etc.

Pascale Gourdeau (University of Oxford) On the Hardness of Robust Classification 12 / 22
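The definition can be checked by brute force for a small product distribution. This sketch (helper names and parameters are ours, not from the talk) finds the smallest α for which the distribution is α-log-Lipschitz by maximizing the density ratio over pairs of Hamming neighbours:

```python
from itertools import product
from math import prod

def density(x, means):
    """Density of the product distribution on {0,1}^n with P(x_i = 1) = means[i]."""
    return prod(m if b else 1.0 - m for b, m in zip(x, means))

def log_lipschitz_constant(means):
    """Smallest alpha making the product distribution alpha-log-Lipschitz:
    the largest density ratio over pairs of Hamming neighbours."""
    n = len(means)
    worst = 1.0
    for x in product((0, 1), repeat=n):
        for i in range(n):                      # flip one coordinate at a time
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            worst = max(worst, density(x, means) / density(y, means))
    return worst

# Illustrative example: coordinate means bounded in [1/4, 3/4] give alpha = 3,
# since flipping one bit changes the density by at most a factor of 3.
print(log_lipschitz_constant([0.25, 0.5, 0.75]))   # 3.0
print(log_lipschitz_constant([0.5, 0.5, 0.5]))     # 1.0 (uniform distribution)
```

This makes the "bounded means" example concrete: the closer a coordinate mean gets to 0 or 1, the larger α must be.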

SLIDE 13

Monotone Conjunctions

Efficient distribution-free robust learning is not possible in general, but what happens when we restrict the class of distributions?

We look at MON-CONJ, the class of monotone conjunctions, e.g. h(x) = x1 ∧ x3 ∧ x5.

Theorem. The threshold to robustly learn MON-CONJ under log-Lipschitz distributions is ρ(n) = O(log n).

SLIDE 14

Monotone Conjunctions

Theorem. The threshold to robustly learn MON-CONJ under log-Lipschitz distributions is ρ(n) = O(log n).

To show that MON-CONJ is not efficiently robustly learnable for ρ(n) = ω(log n), we show that, under the uniform distribution:
  • choosing monotone conjunctions c1 and c2 that are long enough, and
  • choosing the input dimension n large enough,
  • a sample of size polynomial in n will, with fixed probability, look as if it were labelled by a constant function.
  • Again, choose the target at random between c1 and c2 before labelling.

Pascale Gourdeau (University of Oxford) On the Hardness of Robust Classification 14 / 22
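The "sample looks constant" step can be illustrated numerically. Under the uniform distribution a conjunction of length l is satisfied with probability 2^-l, so for a long target a polynomial-size sample is all-negative with overwhelming probability (the parameters below are illustrative, not from the talk):

```python
import random

random.seed(0)  # deterministic illustration

# A target conjunction of length l is satisfied by a uniform point with
# probability 2^-l, so for l = n = 50 a sample of 10,000 points is
# overwhelmingly likely to contain only negative examples.
n, m = 50, 10_000
c = lambda x: int(all(x))  # the monotone conjunction x1 AND ... AND xn
sample = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(m)]
labels = [c(x) for x in sample]
print(sum(labels))  # no positive example: c is indistinguishable from any
                    # other long conjunction that agrees with it on the sample
```

Since the learner sees only negative labels, any two sufficiently long conjunctions are consistent with the sample, and a random choice between them defeats the learner in the robust sense.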

SLIDE 15

Robust Learnability for Logarithmically-Bounded Adversary

Theorem. The algorithm to PAC-learn MON-CONJ is an efficient ρ-robust learning algorithm for log-Lipschitz distributions when ρ = O(log n).

Algorithm: start with h(x) = ∧_{i∈[n]} x_i. For each positive example x, remove every i with x_i = 0 from the index set.

Example (input space X = {0, 1}^5, target x1 ∧ x3 ∧ x5):
  • Initially, h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5.
  • Example ((1, 1, 1, 0, 1), 1): remove x4, so h = x1 ∧ x2 ∧ x3 ∧ x5.
  • Example ((0, 0, 1, 1, 1), 0): negative example, no change.
  • Example ((1, 0, 1, 1, 1), 1): remove x2, so h = x1 ∧ x3 ∧ x5.

Pascale Gourdeau (University of Oxford) On the Hardness of Robust Classification 15 / 22
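The elimination algorithm above is short enough to state in full. This sketch replays the trace from the slides, with variables 0-indexed rather than 1-indexed:

```python
def learn_monotone_conjunction(n, sample):
    """Standard PAC learner for monotone conjunctions: start from the
    conjunction of all n variables and delete every index that is 0 in
    some positive example."""
    index_set = set(range(n))
    for x, label in sample:
        if label == 1:
            index_set -= {i for i in range(n) if x[i] == 0}
    return sorted(index_set)

# The trace from the slides: target x1 AND x3 AND x5 on X = {0,1}^5.
sample = [
    ((1, 1, 1, 0, 1), 1),  # positive: drops index 3 (x4)
    ((0, 0, 1, 1, 1), 0),  # negative: no change
    ((1, 0, 1, 1, 1), 1),  # positive: drops index 1 (x2)
]
print(learn_monotone_conjunction(5, sample))  # [0, 2, 4], i.e. x1 AND x3 AND x5
```

The returned hypothesis only ever contains the target's variables plus extras that no positive example has eliminated yet, which is what the two-case proof on the next slide exploits.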

SLIDE 16

Robust Learnability for Logarithmically-Bounded Adversary

Theorem. The algorithm to PAC-learn MON-CONJ is an efficient ρ-robust learning algorithm for log-Lipschitz distributions when ρ = O(log n).

Proof idea, in two cases:
  • If the target conjunction is short enough, we have learned it exactly, and hence robustly.
  • If the target conjunction is long enough, we can use concentration bounds to show that the adversary is unlikely to be able to cause a label change.

SLIDE 17

Computational Hardness of Robust Learning

Previous computational hardness results for robust learning used:
  • another learning model (statistical query) [Bubeck et al., 2018],
  • cryptographic assumptions [Degwekar and Vaikuntanathan, 2019].

Our proof is quite simple, and relies only on the existence of a hard problem on the boolean hypercube in the PAC-learning framework: a hard PAC-learning problem (C, D, X) is transformed into a hard robust learning problem (C′, D′, X′).

SLIDE 18

Take Away

  • The definitions and models come from previous work in adversarial machine learning theory.
  • At first glance, they seem in many ways natural and reasonable.
  • Their inadequacies surface when viewed through the lens of computational learning theory.
  • It may only be possible to solve “easy” robust learning problems, and only under strong distributional assumptions.
  • Other learning models may help, e.g. ones where the learner has access to membership queries.

SLIDE 19

Current and Future Work

Generalize the robustness threshold to other concept classes:
  • majority functions,
  • linear threshold functions,
  • etc.

More powerful learning models (e.g., with membership queries).

SLIDE 20

References I

Sébastien Bubeck, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204, 2018.

Akshay Degwekar and Vinod Vaikuntanathan. Computational limitations in robust classification and win-win results. arXiv preprint arXiv:1902.01086, 2019.