VC GENERALIZATION BOUND
Matthieu Bloch
March 12, 2020

LOGISTICS (AND BABY PICTURE)
- Problem Set 4: assigned very soon, but no work expected during Spring break
- Project proposal: deadline extended to March 18, 11:59pm (hard deadline for everyone)
DICHOTOMIES AND THE GROWTH FUNCTION

- For a dataset \(\mathcal{D} \triangleq \{x_i\}_{i=1}^{N}\) and a set of hypotheses \(\mathcal{H}\), the set of dichotomies generated by \(\mathcal{H}\) is
\[
\mathcal{H}(\{x_i\}_{i=1}^{N}) \triangleq \big\{ \{h(x_i)\}_{i=1}^{N} : h \in \mathcal{H} \big\}.
\]
By definition \(|\mathcal{H}(\{x_i\}_{i=1}^{N})| \leq 2^N\), and in general \(|\mathcal{H}(\{x_i\}_{i=1}^{N})| \ll |\mathcal{H}|\).
- For a set of hypotheses \(\mathcal{H}\), the growth function of \(\mathcal{H}\) is
\[
m_{\mathcal{H}}(N) \triangleq \max_{\{x_i\}_{i=1}^{N}} \big| \mathcal{H}(\{x_i\}_{i=1}^{N}) \big|.
\]
The growth function does not depend on the datapoints, and it is bounded: \(m_{\mathcal{H}}(N) \leq 2^N\).
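A small simulation makes these definitions concrete. The sketch below is my own illustration (not part of the lecture, and the helper name `sample_dichotomies` is mine): it samples random 2-D linear classifiers and counts the distinct dichotomies they generate on a fixed point set.

```python
import random

def sample_dichotomies(points, n_classifiers=50000, seed=0):
    """Estimate the set of dichotomies that 2-D linear classifiers
    h(x) = sgn(w . x + b) generate on a fixed set of points,
    by sampling (w, b) at random and recording the label patterns."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_classifiers):
        w = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        b = rng.uniform(-3, 3)
        labels = tuple(
            1 if w[0] * x + w[1] * y + b > 0 else -1 for (x, y) in points
        )
        seen.add(labels)
    return seen

# Three points in general position: all 2^3 = 8 dichotomies appear.
three = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(len(sample_dichotomies(three)))  # expect 8

# Four points on a square: only 14 < 2^4 dichotomies are achievable,
# since neither diagonal ("XOR") split is linearly separable.
four = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(len(sample_dichotomies(four)))   # expect 14
```

Random sampling only lower-bounds the dichotomy count, but with this many classifiers it reliably finds every achievable pattern on such small point sets.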
SHATTERING AND BREAK POINTS

- Linear classifiers:
\[
\mathcal{H} \triangleq \big\{ h : \mathbb{R}^2 \to \{\pm 1\} : x \mapsto \mathrm{sgn}(w^{\intercal} x + b) \,\big|\, w \in \mathbb{R}^2, b \in \mathbb{R} \big\},
\qquad m_{\mathcal{H}}(3) = 8, \quad m_{\mathcal{H}}(4) = 14 < 2^4.
\]
- If \(\mathcal{H}\) can generate all dichotomies on \(\{x_i\}_{i=1}^{N}\), we say that \(\mathcal{H}\) shatters \(\{x_i\}_{i=1}^{N}\).
- If no data set of size \(k\) can be shattered by \(\mathcal{H}\), then \(k\) is a break point for \(\mathcal{H}\). The break point for linear classifiers is \(k = 4\).
- Proposition. If there exists any break point for \(\mathcal{H}\), then \(m_{\mathcal{H}}(N)\) is polynomial in \(N\). If there is no break point for \(\mathcal{H}\), then \(m_{\mathcal{H}}(N) = 2^N\).
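The "polynomial in \(N\)" claim is usually made concrete via the Sauer–Shelah bound, which the slide does not state explicitly: if \(k\) is a break point for \(\mathcal{H}\), then \(m_{\mathcal{H}}(N) \leq \sum_{i=0}^{k-1} \binom{N}{i}\), a polynomial of degree \(k-1\). A minimal sketch (the function name is mine):

```python
from math import comb

def growth_bound(N, k):
    """Sauer-Shelah bound: if k is a break point for H, then
    m_H(N) <= sum_{i=0}^{k-1} C(N, i), polynomial of degree k-1 in N."""
    return sum(comb(N, i) for i in range(k))

# Linear classifiers in R^2 have break point k = 4:
print(growth_bound(3, 4))    # 8  = 2^3: no constraint below the break point
print(growth_bound(4, 4))    # 15 >= m_H(4) = 14, and already < 2^4 = 16
print(growth_bound(100, 4))  # 166751, vastly smaller than 2^100
```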
THE VC GENERALIZATION BOUND

- Consider our learning problem introduced earlier in the semester.
- Proposition (VC bound).
\[
\mathbb{P}\left( \sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_N(h) \big| > \epsilon \right) \leq 4\, m_{\mathcal{H}}(2N)\, e^{-\frac{1}{8} N \epsilon^2}.
\]
- Compare this with our previous generalization bound, which assumed \(|\mathcal{H}| < \infty\):
\[
\mathbb{P}\left( \max_{h \in \mathcal{H}} \big| R(h) - \hat{R}_N(h) \big| > \epsilon \right) \leq 2\, |\mathcal{H}|\, e^{-2 N \epsilon^2}.
\]
- We replace the \(\max\) by a \(\sup\) and \(|\mathcal{H}|\) by \(m_{\mathcal{H}}(2N)\). We can now handle infinite hypothesis classes!
- With probability at least \(1 - \delta\),
\[
R(h^*) \leq \hat{R}_N(h^*) + \sqrt{\frac{8}{N} \left( \log m_{\mathcal{H}}(2N) + \log \frac{4}{\delta} \right)}.
\]
- The key insight behind the proof is how to relate \(\sup_{h \in \mathcal{H}}\) to \(\max_{h \in \mathcal{H}'}\) with \(\mathcal{H}' \subset \mathcal{H}\) and \(|\mathcal{H}'| < \infty\).
- The approach was developed by Vapnik and Chervonenkis in 1971.
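To get a feel for how the confidence-interval width shrinks with \(N\), it can be evaluated numerically. This is my own sketch (the helper `vc_bound` is hypothetical, and it upper-bounds \(m_{\mathcal{H}}(2N)\) with the Sauer-style polynomial for an assumed break point \(k\), a choice the slide does not make):

```python
from math import comb, log, sqrt

def vc_bound(N, delta, k):
    """Width of the VC confidence interval,
    sqrt((8/N) * (log m_H(2N) + log(4/delta))),
    with m_H(2N) upper-bounded by sum_{i=0}^{k-1} C(2N, i)."""
    m = sum(comb(2 * N, i) for i in range(k))
    return sqrt(8.0 / N * (log(m) + log(4.0 / delta)))

# Break point k = 4 (2-D linear classifiers), 95% confidence:
for N in (100, 1000, 10000):
    print(N, vc_bound(N, delta=0.05, k=4))
```

Because \(m_{\mathcal{H}}(2N)\) grows only polynomially, \(\log m_{\mathcal{H}}(2N)/N \to 0\) and the width goes to zero; VC bounds are famously loose for small \(N\), as the printed values suggest.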
TOWARDS A PROOF

- The growth function \(m_{\mathcal{H}}\) plays a role: there may be infinitely many \(h \in \mathcal{H}\), but they generate a finite number of unique dichotomies. Hence, \(\{\hat{R}_N(h) : h \in \mathcal{H}\}\) is finite.
- Unfortunately, \(R(h)\) still potentially takes infinitely many different values.
- Key insight: use a second ghost dataset of size \(N\) with empirical risk \(\hat{R}'_N(h)\).
- Hope that we can squeeze \(\hat{R}'_N(h)\) between \(R(h)\) and \(\hat{R}_N(h)\).
- We will try to relate \(\mathbb{P}\big( |R(h) - \hat{R}_N(h)| > \epsilon \big)\) to \(\mathbb{P}\big( |\hat{R}'_N(h) - \hat{R}_N(h)| > \epsilon' \big)\) with \(\epsilon' = f(\epsilon)\).
SYMMETRIZATION

- Assume that \(X\), \(X'\) are i.i.d. random variables with a symmetric distribution around their mean \(\mu\).
- Let \(A \triangleq \{|X - \mu| > \epsilon\}\) and \(B \triangleq \{|X - X'| > \epsilon\}\).
- Lemma (symmetric bound). \(\mathbb{P}(A) \leq 2\, \mathbb{P}(B)\).
- Let \(X \triangleq \hat{R}_N(h)\) and \(X' \triangleq \hat{R}'_N(h)\). If they had symmetric distributions, we would obtain
\[
\mathbb{P}\big( |R(h) - \hat{R}_N(h)| > \epsilon \big) \leq 2\, \mathbb{P}\big( |\hat{R}_N(h) - \hat{R}'_N(h)| > \epsilon \big).
\]
- This is not quite true, but it is close.
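The lemma has a one-line proof under the stated symmetry and independence assumptions; here is a sketch (my reconstruction, the lecture may argue it differently). If \(|X - \mu| > \epsilon\) and \(X' - \mu\) has the opposite sign to \(X - \mu\) (or is zero), then \(|X - X'| = |X - \mu| + |X' - \mu| \geq |X - \mu| > \epsilon\), so the event \(B\) occurs. By symmetry, \(X' - \mu\) takes each sign with probability at least \(1/2\), and \(X'\) is independent of \(X\), hence

```latex
\[
\mathbb{P}(B)
\;\geq\; \mathbb{P}\big( A \cap \{ (X - \mu)(X' - \mu) \leq 0 \} \big)
\;=\; \mathbb{P}(A)\, \mathbb{P}\big( (X - \mu)(X' - \mu) \leq 0 \,\big|\, A \big)
\;\geq\; \tfrac{1}{2}\, \mathbb{P}(A).
\]
```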
SYMMETRIZATION LEMMA

- Lemma. If \(N \geq 4\, \epsilon^{-2} \ln 2\), then
\[
\mathbb{P}\left( \sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_N(h) \big| > \epsilon \right)
\leq 2\, \mathbb{P}\left( \sup_{h \in \mathcal{H}} \big| \hat{R}'_N(h) - \hat{R}_N(h) \big| > \frac{\epsilon}{2} \right).
\]
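A seeded Monte Carlo experiment (my own illustration; the parameters are arbitrary) can sanity-check the lemma for a single hypothesis whose errors are i.i.d. Bernoulli, so that \(\hat{R}_N\) and \(\hat{R}'_N\) are averages over a real and a ghost sample:

```python
import random

def symmetrization_check(N=400, eps=0.1, trials=2000, seed=1):
    """Monte Carlo comparison of P(|R - R_hat| > eps) against
    2 * P(|R_hat' - R_hat| > eps/2) for i.i.d. Bernoulli(mu) errors.
    N = 400 satisfies the lemma's condition N >= 4 eps^-2 ln 2 ~ 278."""
    rng = random.Random(seed)
    mu = 0.5  # true risk R(h); a modeling choice for the experiment
    lhs_hits = rhs_hits = 0
    for _ in range(trials):
        r_hat = sum(rng.random() < mu for _ in range(N)) / N    # empirical risk
        r_ghost = sum(rng.random() < mu for _ in range(N)) / N  # ghost empirical risk
        lhs_hits += abs(r_hat - mu) > eps
        rhs_hits += abs(r_ghost - r_hat) > eps / 2
    return lhs_hits / trials, 2 * rhs_hits / trials

lhs, rhs = symmetrization_check()
print(lhs, rhs)  # the left-hand side should be much smaller than the right
```

The experiment only illustrates the single-hypothesis inequality; the whole point of the lemma is that it also survives the supremum over \(h \in \mathcal{H}\), which is what lets the growth function replace \(|\mathcal{H}|\).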