 
VC GENERALIZATION BOUND
Matthieu Bloch
March 12, 2020
LOGISTICS (AND BABY PICTURE)
Problem Set 4: assigned very soon, but no work expected during Spring break.
Project proposal: deadline extended to March 18, 11:59pm (hard deadline for everyone).
RECAP: DICHOTOMIES AND GROWTH FUNCTION
Definition (Dichotomy). For a dataset $\mathcal{D} \triangleq \{x_i\}_{i=1}^N$ and a set of hypotheses $\mathcal{H}$, the set of dichotomies generated by $\mathcal{H}$ on $\mathcal{D}$ is
$\mathcal{H}(\{x_i\}_{i=1}^N) \triangleq \{\{h(x_i)\}_{i=1}^N : h \in \mathcal{H}\}$
By definition, $|\mathcal{H}(\{x_i\}_{i=1}^N)| \leq 2^N$, and in general $|\mathcal{H}(\{x_i\}_{i=1}^N)| \ll |\mathcal{H}|$.
Definition (Growth function). For a set of hypotheses $\mathcal{H}$, the growth function of $\mathcal{H}$ is
$m_\mathcal{H}(N) \triangleq \max_{\{x_i\}_{i=1}^N} |\mathcal{H}(\{x_i\}_{i=1}^N)|$
The growth function does not depend on the datapoints $\{x_i\}_{i=1}^N$.
The growth function is bounded: $m_\mathcal{H}(N) \leq 2^N$.
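To make the gap between $|\mathcal{H}|$ and $|\mathcal{H}(\{x_i\}_{i=1}^N)|$ concrete, here is a minimal sketch using positive rays on the real line, $h_a(x) = \operatorname{sgn}(x - a)$. This class is our illustrative choice (it is not on the slide), as is the convention that $h_a(x) = +1$ when $x \geq a$. There are infinitely many thresholds $a$, yet on $N$ distinct points they produce only $N + 1$ dichotomies.

```python
def positive_ray_dichotomies(points):
    """Dichotomies of positive rays h_a(x) = sgn(x - a) on distinct real `points`.

    Convention (our assumption): h_a(x) = +1 when x >= a, else -1.
    Although there are infinitely many thresholds a, only N + 1 of them
    are needed: one below all points, one between each sorted neighbour
    pair, and one above all points.
    """
    xs = sorted(points)
    thresholds = [xs[0] - 1.0]
    thresholds += [(u + v) / 2 for u, v in zip(xs, xs[1:])]
    thresholds.append(xs[-1] + 1.0)
    # Labels are listed in the original order of `points`.
    return {tuple(1 if x >= a else -1 for x in points) for a in thresholds}


def dichotomies_on_grid(points, grid):
    """Dichotomies produced by the thresholds in `grid` (a finite stand-in for all a)."""
    return {tuple(1 if x >= a else -1 for x in points) for a in grid}


pts = [0.3, 1.7, -2.0, 0.9]
reps = positive_ray_dichotomies(pts)
grid = [-3 + 6 * i / 10000 for i in range(10001)]
print(len(reps))                               # 5 = N + 1, far below 2**4 = 16
print(dichotomies_on_grid(pts, grid) <= reps)  # True: a dense grid finds nothing new
```

Because the count does not depend on where the distinct points sit, this also gives the growth function of positive rays, $m_\mathcal{H}(N) = N + 1$.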
RECAP: BREAK POINT
Linear classifiers: $\mathcal{H} \triangleq \{h : \mathbb{R}^2 \to \{\pm 1\} : x \mapsto \operatorname{sgn}(w^\intercal x + b) \mid w \in \mathbb{R}^2, b \in \mathbb{R}\}$
$m_\mathcal{H}(3) = 8$ and $m_\mathcal{H}(4) = 14 < 2^4$
Definition (Shattering). If $\mathcal{H}$ can generate all dichotomies on $\{x_i\}_{i=1}^N$, we say that $\mathcal{H}$ shatters $\{x_i\}_{i=1}^N$.
Definition (Break point). If no data set of size $k$ can be shattered by $\mathcal{H}$, then $k$ is a break point for $\mathcal{H}$.
The smallest break point for linear classifiers in $\mathbb{R}^2$ is 4.
Proposition. If there exists any break point for $\mathcal{H}$, then $m_\mathcal{H}(N)$ is polynomial in $N$.
If there is no break point for $\mathcal{H}$, then $m_\mathcal{H}(N) = 2^N$.
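The counts $m_\mathcal{H}(3) = 8$ and $m_\mathcal{H}(4) = 14$ can be checked on specific point sets by enumerating all $2^N$ labelings and testing each for linear separability. The sketch below uses a perceptron as the separability test, which is our choice and not from the slide: it always halts on a separable labeling, and we treat non-convergence after `max_epochs` passes as "not separable", which is a heuristic in general but safe for the small integer point sets used here. Counting on one configuration only gives a lower bound on $m_\mathcal{H}(N)$; the claim that no set of 4 points is shattered is the slide's, not something this loop proves.

```python
from itertools import product


def separable(points, labels, max_epochs=1000):
    """Perceptron check for strict linear separability of `labels` on 2-D `points`.

    Returns True once (w, b) satisfies y * (w . x + b) > 0 for every point.
    Returns False if no such (w, b) is found within `max_epochs` passes
    (a heuristic; non-convergence is only evidence of non-separability).
    """
    w0, w1, b = 0, 0, 0
    for _ in range(max_epochs):
        mistakes = 0
        for (x0, x1), y in zip(points, labels):
            if y * (w0 * x0 + w1 * x1 + b) <= 0:
                # Standard perceptron update on the augmented vector (x0, x1, 1).
                w0 += y * x0
                w1 += y * x1
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def count_dichotomies(points):
    """Number of the 2^N labelings of `points` realizable by sgn(w . x + b)."""
    return sum(separable(points, labels)
               for labels in product((-1, 1), repeat=len(points)))


print(count_dichotomies([(0, 0), (1, 0), (0, 1)]))          # 8 = 2**3: shattered
print(count_dichotomies([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 14 < 2**4: XOR labelings fail
print(count_dichotomies([(0, 0), (1, 0), (2, 0)]))          # 6: collinear points do worse
```

The last line shows why the growth function takes a maximum over datasets: three collinear points yield only 6 dichotomies, while a non-degenerate triangle yields all 8.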
VC GENERALIZATION BOUND
Consider our learning problem introduced earlier in the semester.
Proposition (VC bound).
$\mathbb{P}\left(\sup_{h \in \mathcal{H}} \left|R(h) - \hat{R}_N(h)\right| > \epsilon\right) \leq 4 m_\mathcal{H}(2N) e^{-\frac{1}{8}\epsilon^2 N}$
Compare this with our previous generalization bound, which assumed $|\mathcal{H}| < \infty$:
$\mathbb{P}\left(\max_{h \in \mathcal{H}} \left|R(h) - \hat{R}_N(h)\right| > \epsilon\right) \leq 2|\mathcal{H}| e^{-2\epsilon^2 N}$
We replace the $\max$ by $\sup$ and $|\mathcal{H}|$ by $m_\mathcal{H}(2N)$. We can now handle infinite hypothesis classes!
With probability at least $1 - \delta$,
$R(h^*) \leq \hat{R}_N(h^*) + \sqrt{\frac{8}{N}\left(\log m_\mathcal{H}(2N) + \log\frac{4}{\delta}\right)}$
The key insight behind the proof is how to relate $\sup_{h \in \mathcal{H}}$ to $\max_{h \in \mathcal{H}'}$ with $\mathcal{H}' \subset \mathcal{H}$ and $|\mathcal{H}'| < \infty$.
This approach was developed by Vapnik and Chervonenkis in 1971.
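The confidence statement follows from setting the right-hand side of the VC bound equal to $\delta$ and solving for $\epsilon$; the same inversion of $2|\mathcal{H}| e^{-2\epsilon^2 N} = \delta$ gives $\epsilon = \sqrt{\log(2|\mathcal{H}|/\delta)/(2N)}$ for the finite-class bound. A minimal sketch of both inversions follows. The growth function passed in is a parameter; `positive_rays_growth` ($m_\mathcal{H}(N) = N + 1$) is a hypothetical example class, not one used on the slide.

```python
import math


def vc_epsilon(N, delta, growth):
    """Deviation eps solving 4 * m_H(2N) * exp(-N * eps**2 / 8) = delta."""
    return math.sqrt(8.0 / N * (math.log(growth(2 * N)) + math.log(4.0 / delta)))


def finite_class_epsilon(N, delta, H_size):
    """Deviation eps solving 2 * |H| * exp(-2 * N * eps**2) = delta."""
    return math.sqrt(math.log(2.0 * H_size / delta) / (2.0 * N))


def positive_rays_growth(N):
    """Hypothetical example class: positive rays on the real line, m_H(N) = N + 1."""
    return N + 1


# The VC deviation shrinks roughly like sqrt(log N / N) for a polynomial growth function.
for N in (100, 1000, 10000, 100000):
    print(N, round(vc_epsilon(N, 0.05, positive_rays_growth), 4))
```

Plugging the returned $\epsilon$ back into $4 m_\mathcal{H}(2N) e^{-N\epsilon^2/8}$ recovers $\delta$, which is a quick sanity check on the algebra. A polynomial $m_\mathcal{H}$ is what makes this width go to zero; with $m_\mathcal{H}(2N) = 2^{2N}$ the $\log m_\mathcal{H}(2N)/N$ term would not vanish.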
KEY INSIGHTS OF VC BOUND
The growth function $m_\mathcal{H}$ plays a key role.
There may be infinitely many $h \in \mathcal{H}$, but on a given dataset they generate only a finite number of unique dichotomies.
Hence, $\{\hat{R}_N(h) : h \in \mathcal{H}\}$ is finite.
Unfortunately, $R(h)$ still potentially takes infinitely many different values.
Key insight: use a second ghost dataset of size $N$ with empirical risk $\hat{R}'_N(h)$.
Hope that we can squeeze $R(h)$ between $\hat{R}_N(h)$ and $\hat{R}'_N(h)$.
We will try to relate $\mathbb{P}\left(\left|R(h) - \hat{R}_N(h)\right| > \epsilon\right)$ to $\mathbb{P}\left(\left|\hat{R}'_N(h) - \hat{R}_N(h)\right| > \epsilon'\right)$ with $\epsilon' = f(\epsilon)$.
$\mathbb{P}\left(\left|\hat{R}'_N(h) - \hat{R}_N(h)\right| > \epsilon\right)$ only depends on the finite number of unique dichotomies.
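The ghost-sample idea can be illustrated numerically: once the original and ghost datasets are fixed, both empirical risks depend on $h$ only through its dichotomy on the $2N$ pooled points, so the supremum over an infinite class collapses to a maximum over finitely many representatives. The sketch below assumes positive rays again and a made-up data model (`sample`: uniform inputs, a threshold target at 0.25, 10% label flips); both are hypothetical and chosen only for illustration. A dense grid of thresholds stands in for "infinitely many $h$".

```python
import random


def empirical_risk(a, xs, ys):
    """0-1 empirical risk of the positive ray h_a(x) = +1 if x >= a else -1."""
    return sum((1 if x >= a else -1) != y for x, y in zip(xs, ys)) / len(xs)


def sample(n, rng):
    """Hypothetical data model: x ~ Uniform[-1, 1], y = sgn(x - 0.25), 10% label flips."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [(1 if x >= 0.25 else -1) * (-1 if rng.random() < 0.1 else 1) for x in xs]
    return xs, ys


def gap(a, data, ghost):
    """|R'_N(h_a) - R_N(h_a)|: ghost-sample risk versus original-sample risk."""
    return abs(empirical_risk(a, *ghost) - empirical_risk(a, *data))


rng = random.Random(0)
N = 50
data = sample(N, rng)   # original dataset
ghost = sample(N, rng)  # ghost dataset of the same size N

# One representative threshold per dichotomy of the 2N pooled points (2N + 1 in total).
pooled = sorted(data[0] + ghost[0])
reps = [pooled[0] - 1] + [(u + v) / 2 for u, v in zip(pooled, pooled[1:])] + [pooled[-1] + 1]

# A dense grid stands in for the infinitely many thresholds in H.
grid = [-1.5 + 3 * i / 5000 for i in range(5001)]

sup_finite = max(gap(a, data, ghost) for a in reps)
sup_grid = max(gap(a, data, ghost) for a in grid)
distinct_pairs = {(empirical_risk(a, *data), empirical_risk(a, *ghost)) for a in grid}

print(len(reps), sup_finite, sup_grid, len(distinct_pairs))
```

Whatever the seed, `sup_grid` never exceeds `sup_finite`, and the grid yields at most $2N + 1$ distinct pairs $(\hat{R}_N, \hat{R}'_N)$: the $2N + 1$ here plays the role of $m_\mathcal{H}(2N)$ for this class.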
INTUITION
Assume that $X$ and $X'$ are i.i.d. random variables with a distribution symmetric around their mean $\mu$.
Let $A \triangleq \{|X - \mu| > \epsilon\}$.
Let $B \triangleq \{|X - X'| > \epsilon\}$.
Lemma (Symmetric bound). $\mathbb{P}(A) \leq 2\,\mathbb{P}(B)$
If $X \triangleq \hat{R}_N(h)$ and $X' \triangleq \hat{R}'_N(h)$ had symmetric distributions (around $\mu = R(h)$), we would obtain
$\mathbb{P}\left(\left|R(h) - \hat{R}_N(h)\right| > \epsilon\right) \leq 2\,\mathbb{P}\left(\left|\hat{R}'_N(h) - \hat{R}_N(h)\right| > \epsilon\right)$
Not quite true, but close.
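One case where the symmetry assumption does hold exactly: if $R(h) = 1/2$, the number of errors on $N$ i.i.d. samples is $\text{Binomial}(N, 1/2)$, so $X = \hat{R}_N(h)$ is symmetric around $\mu = 1/2$. The sketch below computes both sides of the lemma exactly with rational arithmetic for that case; the choice of $R(h) = 1/2$ and the helper names are ours, for illustration only.

```python
from fractions import Fraction
from math import comb


def binom_pmf(N):
    """Exact pmf of S ~ Binomial(N, 1/2), as Fractions."""
    return [Fraction(comb(N, k), 2 ** N) for k in range(N + 1)]


def symmetric_lemma_sides(N, eps):
    """Return (P(A), P(B)) for X = S/N and X' = S'/N, with S, S' i.i.d. Binomial(N, 1/2).

    A = {|X - mu| > eps} with mu = 1/2, and B = {|X - X'| > eps}.
    X is symmetric around mu, so the lemma predicts P(A) <= 2 P(B).
    """
    pmf = binom_pmf(N)
    mu = Fraction(1, 2)
    p_a = sum(p for k, p in enumerate(pmf) if abs(Fraction(k, N) - mu) > eps)
    # Independence of X and X': the joint pmf is the product of the marginals.
    p_b = sum(p * q
              for k, p in enumerate(pmf)
              for j, q in enumerate(pmf)
              if abs(Fraction(k - j, N)) > eps)
    return p_a, p_b


p_a, p_b = symmetric_lemma_sides(20, Fraction(1, 10))
print(float(p_a), float(2 * p_b))
```

Using `Fraction` keeps the events $|X - \mu| > \epsilon$ exact, so ties at the boundary are not decided by floating-point rounding. For $R(h) \neq 1/2$ the binomial is skewed, which is one way to see why the slide calls the symmetric argument "not quite true, but close".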
PROOF OF VC BOUND
Lemma. If $N \geq 4\epsilon^{-2}\ln 2$, then
$\mathbb{P}\left(\sup_{h \in \mathcal{H}} \left|R(h) - \hat{R}_N(h)\right| > \epsilon\right) \leq 2\,\mathbb{P}\left(\sup_{h \in \mathcal{H}} \left|\hat{R}'_N(h) - \hat{R}_N(h)\right| > \frac{\epsilon}{2}\right)$
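The slide states the condition without its proof. Our reading of where it comes from (an assumption on our part): applying Hoeffding's inequality to the ghost sample with deviation $\epsilon/2$ gives $\mathbb{P}(|R(h) - \hat{R}'_N(h)| > \epsilon/2) \leq 2e^{-2N(\epsilon/2)^2} = 2e^{-N\epsilon^2/2}$, and this is at most $1/2$ exactly when $N \geq 4\epsilon^{-2}\ln 2$. That "probability at least $1/2$" is what produces the factor 2 in the lemma. A small sketch of the arithmetic, with hypothetical helper names:

```python
import math


def min_samples(eps):
    """Smallest integer N with N >= 4 ln(2) / eps^2 (the lemma's condition)."""
    return math.ceil(4 * math.log(2) / eps ** 2)


def ghost_hoeffding_bound(N, eps):
    """Hoeffding bound 2 exp(-2 N (eps/2)^2) on P(|R(h) - R'_N(h)| > eps/2)."""
    return 2 * math.exp(-2 * N * (eps / 2) ** 2)


for eps in (0.2, 0.1, 0.05):
    N = min_samples(eps)
    # At N the bound is at most 1/2; one sample fewer and it exceeds 1/2.
    print(eps, N, ghost_hoeffding_bound(N, eps), ghost_hoeffding_bound(N - 1, eps))
```

For example, $\epsilon = 0.1$ requires $N \geq 278$.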