SLIDE 1

Making Generalization Robust

Katrina Ligett, HUJI & Caltech

joint with Rachel Cummings, Kobbi Nissim, Aaron Roth, and Steven Wu

SLIDE 2

A model for science…

SLIDE 4

[Figure: Learning Alg → Hypothesis]

  • domain: contains all possible examples
  • hypothesis: X → {0,1} labels examples
  • learning alg samples labeled examples, returns hypothesis

SLIDE 5

[Figure: Learning Alg → Hypothesis]

The goal of science: Find hypothesis that has low true error on the distribution D: err(h) = Pr_{x~D}[h(x) ≠ h*(x)]
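
To make this concrete, here is a small illustrative sketch (the threshold functions and the choice D = Uniform[0,1] are my own toy example, not from the talk): the true error of a hypothesis is the probability that it disagrees with h* on a fresh draw from D.

    # Toy setup (illustrative, not from the slides): X = [0,1], D = Uniform[0,1],
    # and h*, h are threshold functions.
    import random

    def h_star(x):            # the unknown true labeling h*
        return 1 if x > 0.5 else 0

    def h(x):                 # a candidate hypothesis
        return 1 if x > 0.55 else 0

    # err(h) = Pr_{x~D}[h(x) != h*(x)]; the two thresholds disagree exactly on
    # x in (0.5, 0.55], so the true error is 0.05. A Monte Carlo check:
    random.seed(0)
    draws = (random.random() for _ in range(200_000))
    estimate = sum(h(x) != h_star(x) for x in draws) / 200_000
    print(f"estimated err(h): {estimate:.3f}")   # close to 0.05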

SLIDE 6

Why does science work?

SLIDE 8

[Figure: Learning Alg → Hypothesis]

The goal of science: Find hypothesis that has low true error on the distribution D: err(h) = Pr_{x~D}[h(x) ≠ h*(x)]

Idea: find hypothesis that has low empirical error on S, plus guarantee that findings on the sample generalize to D

SLIDE 9

[Figure: Learning Alg → Hypothesis]

Empirical error: err_S(h) = (1/n) ∑_{x ∈ S} 1[h(x) ≠ h*(x)]

Generalization: output h s.t. Pr[ |err_S(h) − err(h)| ≤ α ] ≥ 1 − β
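
Continuing the same toy example (again my own illustration, not the talk's): err_S(h) is the disagreement rate on the sample, and the generalization requirement asks that, over the random draw of S, the gap |err_S(h) − err(h)| exceed α with probability at most β.

    # Empirical error on a sample S, and an estimate of how often the
    # generalization event |err_S(h) - err(h)| <= alpha holds (same toy
    # thresholds as before, so err(h) = 0.05 exactly).
    import random

    def h_star(x): return 1 if x > 0.5 else 0
    def h(x):      return 1 if x > 0.55 else 0

    def empirical_error(hyp, S):
        return sum(hyp(x) != h_star(x) for x in S) / len(S)

    random.seed(0)
    n, alpha, trials, true_err = 1000, 0.02, 2000, 0.05
    hits = sum(
        abs(empirical_error(h, [random.random() for _ in range(n)]) - true_err) <= alpha
        for _ in range(trials)
    )
    print(f"Pr[|err_S(h) - err(h)| <= {alpha}] is about {hits / trials:.3f}")
    # For one fixed h this is plain concentration (Hoeffding); the hard part is
    # getting it for the hypothesis the learning algorithm actually outputs.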

SLIDE 10

taken from Understanding Machine Learning, Shai Shalev-Shwartz and Shai Ben-David

SLIDE 11

Problem solved!

SLIDE 12

Science doesn’t happen in a vacuum.

Problem solved?

SLIDE 13

One thing that can go wrong: post-processing

SLIDE 14
  • Learning an SVM: Output encodes Support Vectors (sample points)
  • This output could be post-processed to obtain a non-generalizing hypothesis: “10% of all data points are x_k”
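
As a hedged illustration with scikit-learn (my own example, not code from the talk): the fitted SVM object stores actual training rows as its support vectors, so anyone who post-processes the published model can read sample points straight out of it.

    # The trained SVM's support vectors are verbatim training examples,
    # so the model's output "encodes" sample points.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    model = SVC(kernel="linear").fit(X, y)

    print(model.support_vectors_[:3])   # rows copied from the training set
    print(X[model.support_][:3])        # the same rows, by construction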

SLIDE 15

“Oh, man. Our approach on this Kaggle competition really failed on the test data. Oh well, let’s try again.”

“Did you see that paper published by the Smith lab? Yeah, I bet they’d see an even bigger effect if they accounted for sunspots! The journal requires open access to the data—let’s try it and see!”

SLIDE 16

A second big problem: adaptive composition

[Figure: an analyst adaptively issues queries q1, q2, … against the dataset and receives answers a1, a2, …]

SLIDE 18

A second big problem: adaptive composition

Adaptive composition can cause overfitting! Generalization guarantees don’t “add up”


SLIDE 19

A second big problem: adaptive composition

Adaptive composition can cause overfitting! Generalization guarantees don’t “add up”


  • Pick parameters; fit model
SLIDE 20

A second big problem: adaptive composition

Adaptive composition can cause overfitting! Generalization guarantees don’t “add up”


  • Pick parameters; fit model
  • ML competitions
SLIDE 21

A second big problem: adaptive composition

Adaptive composition can cause overfitting! Generalization guarantees don’t “add up”


  • Pick parameters; fit model
  • ML competitions
  • Scientific fields that share one dataset
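
A minimal sketch of the failure mode (my own toy demonstration, not from the talk): the labels below are pure coin flips, yet an analyst who first queries the sample to select “promising” features and then fits to the same sample sees high in-sample accuracy that evaporates on fresh data.

    # Adaptive reuse of one sample: select features that look correlated with
    # the labels on S, then evaluate on S. The labels are independent of the
    # features, so any apparent accuracy above 1/2 is pure overfitting.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 1000
    X = rng.choice([-1, 1], size=(n, d))          # features: independent coin flips
    y = rng.choice([-1, 1], size=n)               # labels: independent of X

    corr = X.T @ y / n                            # round 1: one query per feature
    selected = corr > 0                           # keep "positively correlated" features
    predict = lambda A: np.sign(A[:, selected] @ np.sign(corr[selected]))

    print("in-sample accuracy:   ", np.mean(predict(X) == y))    # well above 0.5
    X2 = rng.choice([-1, 1], size=(n, d))
    y2 = rng.choice([-1, 1], size=n)
    print("fresh-sample accuracy:", np.mean(predict(X2) == y2))  # about 0.5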
SLIDE 22

Some basic questions

  • Is it possible to get good learning algorithms that are also robust to post-processing? To adaptive composition?
  • How to construct them? Existing algorithms? How much extra data do they need?
  • Accuracy + generalization + post-processing-robustness = ?
  • Accuracy + generalization + adaptive composition = ?
  • What composes with what? How well (how quickly does generalization degrade)? Why?

SLIDE 23

Notice: generalization doesn’t require correct hypotheses, just that they perform the same on the sample as on the distribution. Generalization alone is easy. What’s interesting: generalization + accuracy.

SLIDE 24
  • Robust generalization: “no adversary can use output to find a hypothesis that overfits”
  • information-theoretic (could also think computational)

Generalization + post-processing robustness

SLIDE 25

Robust Generalization

SLIDE 26

Robust Generalization

  • Robust to post-processing
  • Somewhat robust to adaptive composition (more on this later)
SLIDE 27

Do Robustly-Generalizing Algs Exist?

SLIDE 28

Do Robustly-Generalizing Algs Exist? Yes!

SLIDE 30

Do Robustly-Generalizing Algs Exist? Yes!

  • This paper: Compression Schemes → Robust Generalization

SLIDE 31

Do Robustly-Generalizing Algs Exist? Yes!

  • This paper: Compression Schemes → Robust Generalization
  • [DFHPRR15a]: Bounded description length → Robust Generalization

SLIDE 32

Do Robustly-Generalizing Algs Exist? Yes!

  • This paper: Compression Schemes → Robust Generalization
  • [DFHPRR15a]: Bounded description length → Robust Generalization
  • [BNSSSU16]: Differential privacy → Robust Generalization

SLIDE 34

Compression schemes

[Figure: compression scheme → Hypothesis]
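
One standard formalization (I am assuming this is the notion behind the figure) is a Littlestone-Warmuth compression scheme: the learner keeps only a small subsample, and a fixed reconstruction function rebuilds the hypothesis from those few points. A toy sketch for one-dimensional thresholds in the realizable case, where keeping a single point suffices:

    # Hypothetical size-1 compression scheme for threshold functions
    # (realizable case): keep the largest point labeled 0; reconstruct the
    # threshold hypothesis from that single kept point.
    import random

    def compress(sample):
        negatives = [pt for pt in sample if pt[1] == 0]
        return [max(negatives)] if negatives else []

    def reconstruct(kept):
        boundary = kept[0][0] if kept else float("-inf")
        return lambda x: 1 if x > boundary else 0

    random.seed(1)
    t = 0.3                                              # true threshold h*
    S = [(x, 1 if x > t else 0) for x in (random.random() for _ in range(50))]

    h = reconstruct(compress(S))
    print("sample errors:", sum(h(x) != y for x, y in S))   # 0: consistent with S

The key structural point is that the output depends on the sample only through the few retained points, which is what the compression-to-robust-generalization reduction exploits.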

SLIDE 36

Robust Generalization via compression

[Figure: algorithm A]

SLIDE 37

What Can be Learned under RG?

Theorem (informal; thanks to Shay Moran): the sample complexity of robustly generalizing learning is the same, up to log factors, as the sample complexity of PAC learning

SLIDE 38

Do Robustly-Generalizing Algs Exist? Yes!

  • This paper: Compression Schemes → Robust Generalization
  • [DFHPRR15a]: Bounded description length → Robust Generalization
  • [BNSSSU16]: Differential privacy → Robust Generalization

SLIDE 40

Differential Privacy [DMNS ‘06]

SLIDE 41

Differential Privacy [DMNS ‘06]

  • Robust to post-processing [DMNS ‘06] and adaptive composition [DRV ‘10]
  • Necessarily randomized output
  • No mention of how samples are drawn!
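
For concreteness, a standard example of a differentially private mechanism (my own sketch; the talk does not show this code): the Laplace mechanism answers a counting query by adding noise scaled to 1/ε, and its guarantee survives arbitrary post-processing and degrades gracefully under adaptive composition.

    # Laplace mechanism for a counting query (sensitivity 1): an epsilon-DP
    # answer to "how many sample points satisfy the predicate?"
    import math
    import random

    def laplace_noise(scale):
        # Inverse-CDF sampling from Laplace(0, scale).
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    def private_count(sample, predicate, epsilon):
        true_count = sum(1 for x in sample if predicate(x))
        return true_count + laplace_noise(1.0 / epsilon)

    random.seed(0)
    S = [random.random() for _ in range(1000)]
    print(private_count(S, lambda x: x > 0.5, epsilon=0.5))
    # Post-processing the noisy answer keeps the same epsilon guarantee;
    # answering k adaptively chosen queries composes to (k * epsilon)-DP.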
SLIDE 42

Does DP = RG?

SLIDE 45

Does DP = RG?

No “quick fix” to make RG learner satisfy DP

SLIDE 46
Notions of generalization

  • Robust generalization: “no adversary can use output to find a hypothesis that overfits”
  • Differential privacy [DMNS ‘06]: “similar samples should have the same output”
  • Perfect generalization: “output reveals nothing about the sample”

SLIDE 47

Perfect Generalization

SLIDE 48
PG as a privacy notion

  • Differential privacy gives privacy to the individual: changing one entry in the database shouldn’t change the output too much
  • Perfect generalization gives privacy to the data provider (e.g. school, hospital): changing the entire sample to something “typical” shouldn’t change the output too much

SLIDE 49

Exponential Mechanism [MT07]
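
The mechanism itself is only pictured on the slide; as a hedged sketch of the standard construction from [MT07] (the dataset, candidates, and score function below are illustrative): an output is sampled with probability proportional to exp(ε·score/(2·Δ)), where Δ is the sensitivity of the score.

    # Exponential mechanism: sample a candidate output with probability
    # proportional to exp(epsilon * score / (2 * sensitivity)).
    import math
    import random

    def exponential_mechanism(sample, candidates, score, sensitivity, epsilon):
        weights = [math.exp(epsilon * score(sample, c) / (2.0 * sensitivity))
                   for c in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]

    # Illustrative use: privately choose the most common value in a dataset.
    data = ["a", "b", "a", "a", "c", "b", "a"]
    count = lambda d, c: sum(1 for x in d if x == c)      # sensitivity 1
    random.seed(0)
    print(exponential_mechanism(data, ["a", "b", "c"], count, 1, 1.0))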

SLIDE 50

DP implies PG with worse parameters

SLIDE 51

PG implies DP…sort of

SLIDE 53

PG implies DP…sort of

Problems that are solvable under PG are also solvable under DP

SLIDE 54
Notions of generalization

  • Robust generalization: “no adversary can use output to find a hypothesis that overfits”
  • Differential privacy [DMNS ‘06]: “similar samples should have the same output”
  • Perfect generalization: “output reveals nothing about the sample”

SLIDE 55

Some basic questions

  • Is it possible to get good learning algorithms that are also robust to post-processing? To adaptive composition?
  • How to construct them? Existing algorithms? How much extra data do they need?
  • Accuracy + generalization + post-processing-robustness = ?
  • Accuracy + generalization + adaptive composition = ?
  • What composes with what? How well (how quickly does generalization degrade)? Why?

SLIDE 56

Making Generalization Robust

Katrina Ligett (katrina.ligett@mail.huji.ac.il), HUJI & Caltech

joint with Rachel Cummings, Kobbi Nissim, Aaron Roth, and Steven Wu