VC GENERALIZATION BOUND
Matthieu Bloch
March 12, 2020

LOGISTICS (AND BABY PICTURE)
- Problem Set 4: assigned very soon, but no work expected during Spring break
- Project proposal: deadline extended to March 18, 11:59pm (hard deadline for everyone)
DICHOTOMIES AND THE GROWTH FUNCTION

- For a dataset \(\mathcal{D} \triangleq \{x_i\}_{i=1}^{N}\) and a set of hypotheses \(\mathcal{H}\), the set of dichotomies generated by \(\mathcal{H}\) is
\[
\mathcal{H}(\{x_i\}_{i=1}^{N}) \triangleq \big\{ \{h(x_i)\}_{i=1}^{N} : h \in \mathcal{H} \big\}.
\]
By definition \(|\mathcal{H}(\{x_i\}_{i=1}^{N})| \leq 2^N\), and in general \(|\mathcal{H}(\{x_i\}_{i=1}^{N})| \ll |\mathcal{H}|\).
- For a set of hypotheses \(\mathcal{H}\), the growth function of \(\mathcal{H}\) is
\[
m_{\mathcal{H}}(N) \triangleq \max_{\{x_i\}_{i=1}^{N}} \big| \mathcal{H}(\{x_i\}_{i=1}^{N}) \big|.
\]
The growth function does not depend on the datapoints, and it is bounded: \(m_{\mathcal{H}}(N) \leq 2^N\).
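A small simulation makes these definitions concrete. The sketch below is my own illustration (not part of the lecture, and the helper name `sample_dichotomies` is mine): it samples random 2-D linear classifiers and counts the distinct dichotomies they generate on a fixed point set.

```python
import random

def sample_dichotomies(points, n_classifiers=50000, seed=0):
    """Estimate the set of dichotomies that 2-D linear classifiers
    h(x) = sgn(w . x + b) generate on a fixed set of points,
    by sampling (w, b) at random and recording the label patterns."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_classifiers):
        w = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        b = rng.uniform(-3, 3)
        labels = tuple(
            1 if w[0] * x + w[1] * y + b > 0 else -1 for (x, y) in points
        )
        seen.add(labels)
    return seen

# Three points in general position: all 2^3 = 8 dichotomies appear.
three = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(len(sample_dichotomies(three)))  # expect 8

# Four points on a square: only 14 < 2^4 dichotomies are achievable,
# since neither diagonal ("XOR") split is linearly separable.
four = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(len(sample_dichotomies(four)))   # expect 14
```

Random sampling only lower-bounds the dichotomy count, but with this many classifiers it reliably finds every achievable pattern on such small point sets.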
SHATTERING AND BREAK POINTS

- Linear classifiers:
\[
\mathcal{H} \triangleq \big\{ h : \mathbb{R}^2 \to \{\pm 1\} : x \mapsto \mathrm{sgn}(w^{\intercal} x + b) \,\big|\, w \in \mathbb{R}^2, b \in \mathbb{R} \big\},
\qquad m_{\mathcal{H}}(3) = 8, \quad m_{\mathcal{H}}(4) = 14 < 2^4.
\]
- If \(\mathcal{H}\) can generate all dichotomies on \(\{x_i\}_{i=1}^{N}\), we say that \(\mathcal{H}\) shatters \(\{x_i\}_{i=1}^{N}\).
- If no data set of size \(k\) can be shattered by \(\mathcal{H}\), then \(k\) is a break point for \(\mathcal{H}\). The break point for linear classifiers is \(k = 4\).
- Proposition. If there exists any break point for \(\mathcal{H}\), then \(m_{\mathcal{H}}(N)\) is polynomial in \(N\). If there is no break point for \(\mathcal{H}\), then \(m_{\mathcal{H}}(N) = 2^N\).
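The "polynomial in \(N\)" claim is usually made concrete via the Sauer–Shelah bound, which the slide does not state explicitly: if \(k\) is a break point for \(\mathcal{H}\), then \(m_{\mathcal{H}}(N) \leq \sum_{i=0}^{k-1} \binom{N}{i}\), a polynomial of degree \(k-1\). A minimal sketch (the function name is mine):

```python
from math import comb

def growth_bound(N, k):
    """Sauer-Shelah bound: if k is a break point for H, then
    m_H(N) <= sum_{i=0}^{k-1} C(N, i), polynomial of degree k-1 in N."""
    return sum(comb(N, i) for i in range(k))

# Linear classifiers in R^2 have break point k = 4:
print(growth_bound(3, 4))    # 8  = 2^3: no constraint below the break point
print(growth_bound(4, 4))    # 15 >= m_H(4) = 14, and already < 2^4 = 16
print(growth_bound(100, 4))  # 166751, vastly smaller than 2^100
```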
THE VC GENERALIZATION BOUND

- Consider our learning problem introduced earlier in the semester.
- Proposition (VC bound).
\[
\mathbb{P}\left( \sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_N(h) \big| > \epsilon \right) \leq 4\, m_{\mathcal{H}}(2N)\, e^{-\frac{1}{8} N \epsilon^2}.
\]
- Compare this with our previous generalization bound, which assumed \(|\mathcal{H}| < \infty\):
\[
\mathbb{P}\left( \max_{h \in \mathcal{H}} \big| R(h) - \hat{R}_N(h) \big| > \epsilon \right) \leq 2\, |\mathcal{H}|\, e^{-2 N \epsilon^2}.
\]
- We replace the \(\max\) by a \(\sup\) and \(|\mathcal{H}|\) by \(m_{\mathcal{H}}(2N)\). We can now handle infinite hypothesis classes!
- With probability at least \(1 - \delta\),
\[
R(h^*) \leq \hat{R}_N(h^*) + \sqrt{\frac{8}{N} \left( \log m_{\mathcal{H}}(2N) + \log \frac{4}{\delta} \right)}.
\]
- The key insight behind the proof is how to relate \(\sup_{h \in \mathcal{H}}\) to \(\max_{h \in \mathcal{H}'}\) with \(\mathcal{H}' \subset \mathcal{H}\) and \(|\mathcal{H}'| < \infty\).
- The approach was developed by Vapnik and Chervonenkis in 1971.
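To get a feel for how the confidence-interval width shrinks with \(N\), it can be evaluated numerically. This is my own sketch (the helper `vc_bound` is hypothetical, and it upper-bounds \(m_{\mathcal{H}}(2N)\) with the Sauer-style polynomial for an assumed break point \(k\), a choice the slide does not make):

```python
from math import comb, log, sqrt

def vc_bound(N, delta, k):
    """Width of the VC confidence interval,
    sqrt((8/N) * (log m_H(2N) + log(4/delta))),
    with m_H(2N) upper-bounded by sum_{i=0}^{k-1} C(2N, i)."""
    m = sum(comb(2 * N, i) for i in range(k))
    return sqrt(8.0 / N * (log(m) + log(4.0 / delta)))

# Break point k = 4 (2-D linear classifiers), 95% confidence:
for N in (100, 1000, 10000):
    print(N, vc_bound(N, delta=0.05, k=4))
```

Because \(m_{\mathcal{H}}(2N)\) grows only polynomially, \(\log m_{\mathcal{H}}(2N)/N \to 0\) and the width goes to zero; VC bounds are famously loose for small \(N\), as the printed values suggest.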
TOWARDS A PROOF

- The growth function \(m_{\mathcal{H}}\) plays a role: there may be infinitely many \(h \in \mathcal{H}\), but they generate a finite number of unique dichotomies. Hence, \(\{\hat{R}_N(h) : h \in \mathcal{H}\}\) is finite.
- Unfortunately, \(R(h)\) still potentially takes infinitely many different values.
- Key insight: use a second ghost dataset of size \(N\) with empirical risk \(\hat{R}'_N(h)\).
- Hope that we can squeeze \(\hat{R}'_N(h)\) between \(R(h)\) and \(\hat{R}_N(h)\).
- We will try to relate \(\mathbb{P}\big( |R(h) - \hat{R}_N(h)| > \epsilon \big)\) to \(\mathbb{P}\big( |\hat{R}'_N(h) - \hat{R}_N(h)| > \epsilon' \big)\) with \(\epsilon' = f(\epsilon)\).
SYMMETRIZATION

- Assume that \(X\), \(X'\) are i.i.d. random variables with a symmetric distribution around their mean \(\mu\).
- Let \(A \triangleq \{|X - \mu| > \epsilon\}\) and \(B \triangleq \{|X - X'| > \epsilon\}\).
- Lemma (symmetric bound). \(\mathbb{P}(A) \leq 2\, \mathbb{P}(B)\).
- Let \(X \triangleq \hat{R}_N(h)\) and \(X' \triangleq \hat{R}'_N(h)\). If they had symmetric distributions, we would obtain
\[
\mathbb{P}\big( |R(h) - \hat{R}_N(h)| > \epsilon \big) \leq 2\, \mathbb{P}\big( |\hat{R}_N(h) - \hat{R}'_N(h)| > \epsilon \big).
\]
- This is not quite true, but it is close.
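The lemma has a one-line proof under the stated symmetry and independence assumptions; here is a sketch (my reconstruction, the lecture may argue it differently). If \(|X - \mu| > \epsilon\) and \(X' - \mu\) has the opposite sign to \(X - \mu\) (or is zero), then \(|X - X'| = |X - \mu| + |X' - \mu| \geq |X - \mu| > \epsilon\), so the event \(B\) occurs. By symmetry, \(X' - \mu\) takes each sign with probability at least \(1/2\), and \(X'\) is independent of \(X\), hence

```latex
\[
\mathbb{P}(B)
\;\geq\; \mathbb{P}\big( A \cap \{ (X - \mu)(X' - \mu) \leq 0 \} \big)
\;=\; \mathbb{P}(A)\, \mathbb{P}\big( (X - \mu)(X' - \mu) \leq 0 \,\big|\, A \big)
\;\geq\; \tfrac{1}{2}\, \mathbb{P}(A).
\]
```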
SYMMETRIZATION LEMMA

- Lemma. If \(N \geq 4\, \epsilon^{-2} \ln 2\), then
\[
\mathbb{P}\left( \sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_N(h) \big| > \epsilon \right)
\leq 2\, \mathbb{P}\left( \sup_{h \in \mathcal{H}} \big| \hat{R}'_N(h) - \hat{R}_N(h) \big| > \frac{\epsilon}{2} \right).
\]
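A seeded Monte Carlo experiment (my own illustration; the parameters are arbitrary) can sanity-check the lemma for a single hypothesis whose errors are i.i.d. Bernoulli, so that \(\hat{R}_N\) and \(\hat{R}'_N\) are averages over a real and a ghost sample:

```python
import random

def symmetrization_check(N=400, eps=0.1, trials=2000, seed=1):
    """Monte Carlo comparison of P(|R - R_hat| > eps) against
    2 * P(|R_hat' - R_hat| > eps/2) for i.i.d. Bernoulli(mu) errors.
    N = 400 satisfies the lemma's condition N >= 4 eps^-2 ln 2 ~ 278."""
    rng = random.Random(seed)
    mu = 0.5  # true risk R(h); a modeling choice for the experiment
    lhs_hits = rhs_hits = 0
    for _ in range(trials):
        r_hat = sum(rng.random() < mu for _ in range(N)) / N    # empirical risk
        r_ghost = sum(rng.random() < mu for _ in range(N)) / N  # ghost empirical risk
        lhs_hits += abs(r_hat - mu) > eps
        rhs_hits += abs(r_ghost - r_hat) > eps / 2
    return lhs_hits / trials, 2 * rhs_hits / trials

lhs, rhs = symmetrization_check()
print(lhs, rhs)  # the left-hand side should be much smaller than the right
```

The experiment only illustrates the single-hypothesis inequality; the whole point of the lemma is that it also survives the supremum over \(h \in \mathcal{H}\), which is what lets the growth function replace \(|\mathcal{H}|\).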