VC GENERALIZATION BOUND
Matthieu Bloch, March 12, 2020


SLIDE 1

VC GENERALIZATION BOUND
Matthieu Bloch, March 12, 2020

SLIDE 2

LOGISTICS (AND BABY PICTURE)

Problem Set 4: assigned very soon, but no work expected during Spring break.
Project proposal: deadline extended to March 18, 11:59pm (hard deadline for everyone).

SLIDE 3

RECAP: DICHOTOMIES AND GROWTH FUNCTION

  • Definition. (Dichotomy)

For a dataset D ≜ {x_i}_{i=1}^N and a set of hypotheses H, the set of dichotomies generated by H on D is
H({x_i}_{i=1}^N) ≜ {(h(x_1), …, h(x_N)) : h ∈ H}.
By definition |H({x_i}_{i=1}^N)| ≤ 2^N, and in general |H({x_i}_{i=1}^N)| ≪ |H|.

  • Definition. (Growth function)

For a set of hypotheses H, the growth function of H is
m_H(N) ≜ max over {x_i}_{i=1}^N of |H({x_i}_{i=1}^N)|.
The growth function does not depend on the datapoints, and it is bounded: m_H(N) ≤ 2^N.
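The definitions above can be made concrete in code. A minimal sketch, assuming for illustration a 1-D "positive ray" class h_t(x) = sign(x − t) rather than the hypotheses on the slides: even though there are infinitely many thresholds t, they generate at most N + 1 dichotomies on N points.

```python
# Hypothetical example class: 1-D "positive ray" classifiers h_t(x) = sign(x - t).

def dichotomies(points, hypotheses):
    """Distinct label vectors (dichotomies) the hypotheses generate on the points."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def ray_hypotheses(points):
    """One representative threshold per achievable dichotomy: below all points,
    between each consecutive pair of points, and above all points."""
    xs = sorted(points)
    ts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    return [lambda x, t=t: 1 if x >= t else -1 for t in ts]

points = [0.3, 1.7, 2.2, 4.0]                    # N = 4
D = dichotomies(points, ray_hypotheses(points))
print(len(D))   # 5 = N + 1 dichotomies, far fewer than 2^N = 16
```

For this class the growth function is m_H(N) = N + 1, illustrating a set of dichotomies that is both finite and much smaller than 2^N.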

SLIDE 4

RECAP: BREAK POINT

Linear classifiers in the plane:
H ≜ {h : R² → {±1} : x ↦ sgn(w⊺x + b) | w ∈ R², b ∈ R}, with m_H(3) = 8 and m_H(4) = 14 < 2⁴.

  • Definition. (Shattering)

If H can generate all 2^N dichotomies on {x_i}_{i=1}^N, we say that H shatters {x_i}_{i=1}^N.

  • Definition. (Break point)

If no data set of size k can be shattered by H, then k is a break point for H. The break point for linear classifiers is 4.

  • Proposition.

If there exists any break point for H, then m_H(N) is polynomial in N. If there is no break point for H, then m_H(N) = 2^N.
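As a quick empirical check of m_H(3) = 8 and m_H(4) = 14 for linear classifiers in the plane, one can sample random (w, b) and count the distinct dichotomies realized. The point sets and sampling scheme below are illustrative choices; random sampling can in principle miss a rarely realized dichotomy, though it is reliable for these configurations.

```python
import random

def linear_dichotomies(points, trials=50000, seed=0):
    """Estimate the dichotomies realized by 2-D linear classifiers
    h(x) = sgn(w.x + b) by sampling random Gaussian (w, b)."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        w1, w2, b = (rng.gauss(0, 1) for _ in range(3))
        seen.add(tuple(1 if w1 * x + w2 * y + b >= 0 else -1 for x, y in points))
    return seen

three = [(0, 0), (1, 0), (0, 1)]            # general position: shattered
four = [(0, 0), (1, 0), (0, 1), (1, 1)]     # the two XOR labelings are unrealizable
print(len(linear_dichotomies(three)))       # 8 = 2^3
print(len(linear_dichotomies(four)))        # 14 < 2^4
```

The two missing dichotomies on four points are exactly the XOR patterns, where the diagonally opposite corners share a label; no half-plane can realize them.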

SLIDE 5

(figure only; no text extracted)

SLIDE 6

VC GENERALIZATION BOUND

Consider our learning problem introduced earlier in the semester.

  • Proposition. (VC bound)

P( sup_{h∈H} |R(h) − R̂_N(h)| > ϵ ) ≤ 4 m_H(2N) e^{−ϵ²N/8}

Compare this with our previous generalization bound, which assumed |H| < ∞:
P( max_{h∈H} |R(h) − R̂_N(h)| > ϵ ) ≤ 2|H| e^{−2Nϵ²}.
We replace the max by a sup and |H| by m_H(2N): we can now handle infinite hypothesis classes!

Equivalently, with probability at least 1 − δ,
R(h*) ≤ R̂_N(h*) + √( (8/N) (log m_H(2N) + log(4/δ)) ).

The key insight behind the proof is how to relate sup_{h∈H} to max_{h∈H′} with H′ ⊂ H and |H′| < ∞. The approach was developed by Vapnik and Chervonenkis in 1971.
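To get a feel for the numbers, the deviation term of the VC bound can be evaluated directly. The sketch below bounds m_H(2N) with the standard Sauer-style bound for a class with break point k, m_H(N) ≤ Σ_{i<k} C(N, i) (the slides state only that m_H(N) is then polynomial in N); the choices of N and δ are illustrative.

```python
import math

def growth_bound(N, k):
    """Sauer-style bound on the growth function for a class with break point k."""
    return sum(math.comb(N, i) for i in range(k))

def vc_gap(N, k, delta):
    """Deviation term sqrt((8/N)(log m_H(2N) + log(4/delta))),
    with m_H(2N) replaced by its polynomial bound."""
    return math.sqrt((8 / N) * (math.log(growth_bound(2 * N, k)) + math.log(4 / delta)))

# Linear classifiers in the plane have break point k = 4.
for N in (100, 1000, 10000):
    print(N, round(vc_gap(N, k=4, delta=0.05), 3))
# The bound is vacuous for small N but shrinks like sqrt(log(N)/N).
```

Because the growth function is polynomial, log m_H(2N) grows only logarithmically in N, which is what makes the gap vanish as N grows.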

SLIDE 7

KEY INSIGHTS OF VC BOUND

The growth function m_H plays a role: there may be infinitely many h ∈ H, but they generate a finite number of unique dichotomies. Since R̂_N(h) only depends on the dichotomy h induces, the set {R̂_N(h) : h ∈ H} is finite. Unfortunately, R(h) still potentially takes infinitely many different values.

Key insight: use a second "ghost" dataset of size N with empirical risk R̂′_N(h), and hope that we can squeeze R(h) between R̂′_N(h) and R̂_N(h). We will try to relate
P( |R(h) − R̂_N(h)| > ϵ )
to
P( |R̂′_N(h) − R̂_N(h)| > ϵ′ )
with ϵ′ = f(ϵ).
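The fact that the empirical risk depends on h only through the dichotomy it induces can be checked directly. A minimal sketch, again using hypothetical 1-D threshold classifiers: two distinct hypotheses that label the sample identically must have the same empirical risk.

```python
def emp_risk(h, xs, ys):
    """Empirical risk: fraction of sample points the hypothesis mislabels."""
    return sum(h(x) != y for x, y in zip(xs, ys)) / len(xs)

xs = [0.3, 1.7, 2.2, 4.0]
ys = [-1, 1, -1, 1]                      # arbitrary target labels

h1 = lambda x: 1 if x >= 1.0 else -1     # two distinct hypotheses ...
h2 = lambda x: 1 if x >= 1.5 else -1     # ... inducing the same dichotomy

assert [h1(x) for x in xs] == [h2(x) for x in xs]
print(emp_risk(h1, xs, ys), emp_risk(h2, xs, ys))   # identical: 0.25 0.25
```

This is why {R̂_N(h) : h ∈ H} has at most m_H(N) elements even when H is infinite.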

SLIDE 8

INTUITION

Assume that X and X′ are i.i.d. random variables with a symmetric distribution around their mean μ. Let A ≜ {|X − μ| > ϵ} and B ≜ {|X − X′| > ϵ}.

  • Lemma. (Symmetric bound)

P(A) ≤ 2 P(B).

With X ≜ R̂_N(h) and X′ ≜ R̂′_N(h), if R̂_N(h) and R̂′_N(h) had symmetric distributions, we would obtain
P( |R(h) − R̂_N(h)| > ϵ ) ≤ 2 P( |R̂_N(h) − R̂′_N(h)| > ϵ ).
This is not quite true, but close.
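A Monte Carlo sketch of the ghost-sample idea for a single fixed hypothesis. The setup is hypothetical (each example is an error with probability R = 0.3, so R̂_N is a binomial average); it illustrates that the observable ghost deviation has a tail comparable to the unobservable true deviation.

```python
import random

random.seed(1)
N, R, eps, trials = 50, 0.3, 0.1, 20000

def emp_risk():
    """Empirical risk of one i.i.d. sample of size N (true risk is R)."""
    return sum(random.random() < R for _ in range(N)) / N

true_dev = ghost_dev = 0
for _ in range(trials):
    r1, r2 = emp_risk(), emp_risk()      # R_hat_N and the ghost R_hat'_N
    true_dev += abs(r1 - R) > eps        # unobservable: requires knowing R
    ghost_dev += abs(r1 - r2) > eps      # observable: two samples suffice
print(true_dev / trials, ghost_dev / trials)
```

Empirically the true-deviation frequency stays within twice the ghost-deviation frequency, as the symmetric bound suggests, even though the binomial distribution here is only approximately symmetric.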

SLIDE 9

(figure only; no text extracted)

SLIDE 10

PROOF OF VC BOUND

  • Lemma. If N ≥ 4 ϵ⁻² ln 2,

P( sup_{h∈H} |R(h) − R̂_N(h)| > ϵ ) ≤ 2 P( sup_{h∈H} |R̂′_N(h) − R̂_N(h)| > ϵ/2 ).
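The slides state this symmetrization lemma without proof. The following sketch, filled in here from Hoeffding's inequality for bounded losses, shows where the condition N ≥ 4ϵ⁻² ln 2 comes from:

```latex
% Symmetrization sketch. Condition on the first sample and let $h^*$ attain
% $\sup_{h \in \mathcal{H}} |R(h) - \hat{R}_N(h)| > \epsilon$.
\begin{align*}
  &\text{Hoeffding on the ghost sample: } &
  \mathbb{P}\bigl(|\hat{R}'_N(h^*) - R(h^*)| > \tfrac{\epsilon}{2}\bigr)
    &\le 2 e^{-N\epsilon^2/2} \le \tfrac{1}{2}
    \quad \text{when } N \ge 4\epsilon^{-2}\ln 2, \\
  &\text{and on the complementary event: } &
  |\hat{R}'_N(h^*) - \hat{R}_N(h^*)|
    &\ge |R(h^*) - \hat{R}_N(h^*)| - |\hat{R}'_N(h^*) - R(h^*)|
     > \epsilon - \tfrac{\epsilon}{2} = \tfrac{\epsilon}{2}. \\
  &\text{Hence } &
  \mathbb{P}\Bigl(\sup_{h} |R(h) - \hat{R}_N(h)| > \epsilon\Bigr)
    &\le 2\,\mathbb{P}\Bigl(\sup_{h} |\hat{R}'_N(h) - \hat{R}_N(h)| > \tfrac{\epsilon}{2}\Bigr).
\end{align*}
```

Replacing the unobservable R(h) by the ghost empirical risk R̂′_N(h) is what lets the finite set of dichotomies on the combined 2N points, counted by m_H(2N), enter the final bound.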

SLIDE 11

(figure only; no text extracted)