
ECE 6254 - Spring 2020 - Lecture 16 v1.0 - revised March 24, 2020

Dichotomies and Growth Function

Matthieu R. Bloch

1 Motivation

For a hypothesis set H with |H| = M and h* ≜ argmin_{h ∈ H} R̂_N(h), we have shown earlier that ∀ϵ > 0

$$\mathbb{P}\left(\left|\hat{R}_N(h^*) - R(h^*)\right| \geqslant \epsilon\right) \leqslant 2M \exp(-2N\epsilon^2). \tag{1}$$

In particular, the factor M is the result of the union bound, which we used to show that for ϵ > 0

$$\mathbb{P}\left(\left|\hat{R}_N(h^*) - R(h^*)\right| \geqslant \epsilon\right) \leqslant \mathbb{P}\left(\max_{h \in \mathcal{H}} \left|\hat{R}_N(h) - R(h)\right| \geqslant \epsilon\right) \tag{2}$$

$$\leqslant \sum_{j=1}^{M} \mathbb{P}\left(\left|\hat{R}_N(h_j) - R(h_j)\right| \geqslant \epsilon\right). \tag{3}$$

The second inequality is tight when the events E_j ≜ {|R̂_N(h_j) − R(h_j)| ⩾ ϵ} are disjoint, but this is rarely the case in our classification setup. This is illustrated in Fig. 1 below, where the two classifiers shown are distinct but have exactly the same empirical risk on the training set.
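To get a sense of the numbers in (1), the snippet below evaluates the right-hand side 2M exp(−2Nϵ²) for a few illustrative values of M, N, and ϵ; the specific values are arbitrary choices for the demo, not taken from the lecture.

```python
import math

# Evaluate the generalization bound (1), 2M exp(-2N eps^2), for a hypothetical
# setting with M hypotheses and N training samples (illustrative values only).
M, N = 1000, 1000
for eps in (0.05, 0.1, 0.2):
    bound = 2 * M * math.exp(-2 * N * eps**2)
    print(f"eps = {eps:.2f}: 2M exp(-2N eps^2) = {bound:.3e}")
```

For ϵ = 0.05 the bound exceeds 1 and is vacuous, while it becomes very small for larger ϵ; this sensitivity to the factor M is precisely why finding a better substitute for |H| is worth pursuing.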

Figure 1: Two distinct classifiers h_1 and h_2 with the same empirical risk

This observation suggests that our bound might be extremely loose and that |H| may not necessarily be the right measure of the richness of the hypothesis set H. Most of our work in the next few lectures will be devoted to finding a suitable replacement for |H|, which will enable us to prove a generalization bound even in settings for which |H| = ∞, as is the case for linear classifiers.

2 Dichotomy and growth function

Motivated by the situation in Fig. 1, where many classifiers have the same empirical risk, we will attempt to assess the number of hypotheses that lead to distinct labelings for a given dataset. Intuitively, we are hoping that the number of distinct labelings is a quantity that better captures the richness of the hypothesis class H. Formally, we introduce the notion of dichotomy.


Definition 2.1 (Dichotomy). For a dataset D ≜ {x_i}_{i=1}^N and a set of hypotheses H, the set of dichotomies generated by H on D is the set of labelings that can be generated by classifiers in H on the dataset, i.e.,

$$\mathcal{H}(\{x_i\}_{i=1}^N) \triangleq \left\{ \{h(x_i)\}_{i=1}^N : h \in \mathcal{H} \right\}. \tag{4}$$

Note that many sets {h(x_i)}_{i=1}^N for distinct h are actually identical because the labelings induced on the dataset are identical. By definition, for our binary labeling problem, |H({x_i}_{i=1}^N)| ⩽ 2^N, and in general |H({x_i}_{i=1}^N)| ≪ |H|. Unfortunately, |H({x_i}_{i=1}^N)| is not a particularly useful quantity because it is not only potentially difficult to compute but also dependent on a specific dataset. This motivates the definition of the growth function as follows.
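To make Definition 2.1 concrete, here is a minimal sketch that computes the set of dichotomies as the set of distinct label tuples induced on a dataset. The dataset and the finite grid of thresholds (standing in for the positive rays of Example 2.3 below) are arbitrary choices for the demo.

```python
def dichotomies(hypotheses, xs):
    """Return the set of dichotomies {(h(x_1), ..., h(x_N)) : h in H} on the dataset xs."""
    return {tuple(h(x) for x in xs) for h in hypotheses}

# Demo: many distinct hypotheses, few distinct labelings (illustrative values).
xs = [0.5, 1.3, 2.7, 4.0]
thresholds = [i / 10 - 1.0 for i in range(61)]            # a in {-1.0, -0.9, ..., 5.0}
rays = [lambda x, a=a: 1 if x > a else -1 for a in thresholds]
print(len(rays), "hypotheses induce only", len(dichotomies(rays, xs)), "dichotomies")  # 61 vs. 5
```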

Definition 2.2 (Growth function). For a set of hypotheses H, the growth function of H is

$$m_{\mathcal{H}}(N) \triangleq \max_{\{x_i\}_{i=1}^N} \left| \mathcal{H}(\{x_i\}_{i=1}^N) \right|. \tag{5}$$

Note that the growth function depends on the number of datapoints N but not on the exact datapoints {x_i}_{i=1}^N. The growth function measures the maximum number of dichotomies that H can generate over all possible datasets, and by definition, it still holds that m_H(N) ⩽ 2^N.
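A brute-force reading of (5): when the hypothesis class and the candidate datapoints are both restricted to finite sets, the maximum over datasets can be computed exactly by enumerating every N-point subset. The sketch below does this under those assumptions; the helper name growth_function and the grids are invented for the demo, and for infinite domains such an enumeration only lower-bounds the true maximum.

```python
from itertools import combinations

def growth_function(hypotheses, domain, N):
    """m_H(N) for a finite class over a finite domain: maximize the number of
    dichotomies over all N-point subsets of `domain` (cf. equation (5))."""
    return max(
        len({tuple(h(x) for x in xs) for h in hypotheses})
        for xs in combinations(domain, N)
    )

# Demo with threshold classifiers on a small grid (arbitrary demo values):
domain = [i / 2 for i in range(9)]                                   # 0.0, 0.5, ..., 4.0
rays = [lambda x, a=a: 1 if x > a else -1 for a in [j / 4 - 0.5 for j in range(21)]]
print([growth_function(rays, domain, N) for N in range(1, 5)])       # -> [2, 3, 4, 5]
```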

Example 2.3 (Positive rays). Consider a binary classification problem in ℝ with the set of positive rays

$$\mathcal{H} \triangleq \left\{ h_a : \mathbb{R} \to \{\pm 1\} : x \mapsto \operatorname{sign}(x - a) \,\middle|\, a \in \mathbb{R} \right\}. \tag{6}$$

As illustrated below, the threshold a defines a classifier such that all points to the left are assigned label −1 while all points to the right are assigned label +1.

[Figure: datapoints x_1, x_2, …, x_{N−1}, x_N on the real line with threshold a; h(x) = −1 to the left of a and h(x) = +1 to the right.]

Although |H| = ∞, the number of dichotomies is still finite, and one can actually compute the growth function exactly. In general, this is challenging because we need to identify the worst-case dataset that generates the highest number of dichotomies; here, this is only tractable because the situation is simple. Without loss of generality, we can assume that all N points {x_i}_{i=1}^N are distinct. Let us introduce x_0 ≜ −∞ and x_{N+1} ≜ ∞. For any i ⩾ 0, all classifiers h_a with x_i ⩽ a < x_{i+1} induce the same labeling. Consequently, the number of distinct labelings is at most N + 1 and m_H(N) = N + 1. Interestingly, the growth function grows polynomially in N, which is much slower than the exponential growth 2^N allowed by the upper bound; the short sketch after this paragraph checks the count numerically.
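A quick numerical check of Example 2.3, assuming the convention h_a(x) = +1 iff x > a: since only the position of a relative to the sorted points matters, one threshold per gap (plus one on each side) is enough to realize every achievable labeling. The helper name is hypothetical.

```python
def num_ray_dichotomies(xs):
    """Count the dichotomies of positive rays h_a(x) = sign(x - a) on the dataset xs."""
    pts = sorted(set(xs))
    thresholds = [pts[0] - 1.0]                                # a left of every point
    thresholds += [(u + v) / 2 for u, v in zip(pts, pts[1:])]  # a between consecutive points
    thresholds += [pts[-1] + 1.0]                              # a right of every point
    return len({tuple(1 if x > a else -1 for x in pts) for a in thresholds})

for N in range(1, 8):
    assert num_ray_dichotomies(range(N)) == N + 1              # m_H(N) = N + 1
print("m_H(N) = N + 1 verified for N = 1, ..., 7")
```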

Example 2.4 (Positive intervals). Consider a binary classification problem in ℝ with the set of positive intervals

$$\mathcal{H} \triangleq \left\{ h_{a,b} : \mathbb{R} \to \{\pm 1\} : x \mapsto \mathbb{1}\{x \in [a;b]\} - \mathbb{1}\{x \notin [a;b]\} \,\middle|\, a < b \in \mathbb{R} \right\}. \tag{7}$$

As illustrated below, the thresholds a < b define a classifier such that all points within [a; b] are assigned label +1 while all points outside are assigned label −1.

[Figure: datapoints x_1, x_2, …, x_{N−1}, x_N on the real line with interval endpoints a < b; h(x) = +1 inside [a; b] and h(x) = −1 on either side.]

Again, this is a situation for which we can compute the growth function exactly. Without loss of generality, we assume that all N datapoints are distinct and we introduce x_0 ≜ −∞ and x_{N+1} ≜ ∞. We need to be a bit more careful when counting dichotomies:


  • If x_0 < a < b < x_1, all classifiers h_{a,b} induce the all-(−1) labeling;
  • for any 0 ⩽ i < j ⩽ N, all classifiers h_{a,b} such that x_i < a ⩽ x_{i+1} ⩽ x_j ⩽ b < x_{j+1} induce the same labeling, namely the one assigning +1 exactly to x_{i+1}, …, x_j;
  • for any 0 ⩽ i ⩽ N, all classifiers h_{a,b} such that x_i < a < b < x_{i+1} again induce the all-(−1) labeling.

Consequently, the number of dichotomies is

$$m_{\mathcal{H}}(N) = 1 + \binom{N+1}{2} = \frac{N^2}{2} + \frac{N}{2} + 1,$$

which again grows polynomially in N. A numerical check is sketched below.
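As with positive rays, this count is easy to verify by brute force, assuming the convention h_{a,b}(x) = +1 iff x ∈ [a; b]: one candidate endpoint per gap between consecutive sorted points (plus the two outer gaps) realizes every achievable labeling. The helper name is hypothetical.

```python
from itertools import combinations

def num_interval_dichotomies(xs):
    """Count the dichotomies of positive intervals h_{a,b}(x) = +1 iff x in [a; b]."""
    pts = sorted(set(xs))
    cands = [pts[0] - 1.0]                                 # endpoint left of every point
    cands += [(u + v) / 2 for u, v in zip(pts, pts[1:])]   # endpoints between points
    cands += [pts[-1] + 1.0]                               # endpoint right of every point
    labelings = {tuple(1 if a <= x <= b else -1 for x in pts)
                 for a, b in combinations(cands, 2)}       # a < b in distinct gaps
    labelings.add((-1,) * len(pts))                        # a < b within a single gap
    return len(labelings)

for N in range(1, 8):
    assert num_interval_dichotomies(range(N)) == N * (N + 1) // 2 + 1
print("m_H(N) = N^2/2 + N/2 + 1 verified for N = 1, ..., 7")
```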