NOTES ON THE VAPNIK-CHERVONENKIS THEOREM: BACKGROUND AND PROOF

ROLAND WALKER

1. Introduction

Vladimir Vapnik and Alexey Chervonenkis proved their eponymous theorem in 1968. The original Russian proof was published in 1971 and then translated to English by B. Seckler later that year. The English translation was most recently reprinted in 2015 [4].

These notes, which provide a relatively self-contained proof of the VC Theorem, assume the reader has some comfort with the basics of real analysis (e.g., Chapters 1 and 2 of [2]) but little or no background in probability theory.

In addition to the original paper, we used Chapter 7 and Appendix B of [3] as a reference for the proof of the VC theorem and Appendix A of [1] as a reference for the proof of Chernoff's theorem.

2. Products of σ-algebras

Let $I$ be a nonempty set, and let $(X_i, \mathcal{A}_i)_{i \in I}$ be a family of measurable spaces (i.e., each $X_i$ is a nonempty set and each $\mathcal{A}_i$ is a σ-algebra on $X_i$).

Definition 2.1. The product $\bigotimes_{i \in I} \mathcal{A}_i$ is the σ-algebra on $\prod_{i \in I} X_i$ given by
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left(\left\{ \pi_i^{-1}(A_i) : i \in I,\ A_i \in \mathcal{A}_i \right\}\right).
\]
Moreover, if $I = \{0, \dots, n-1\}$ for some $n \ge 2$, we often write $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$ for $\bigotimes_{i \in I} \mathcal{A}_i$ just as we often write $X_0 \times \cdots \times X_{n-1}$ for $\prod_{i \in I} X_i$.

Lemma 2.2. If $I$ is countable, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left(\left\{ \prod_{i \in I} A_i : A_i \in \mathcal{A}_i \right\}\right).
\]

Proof. A σ-algebra is closed under taking countable intersections. □

Lemma 2.3. If $(\mathcal{E}_i)_{i \in I}$ is such that each $\mathcal{A}_i = \sigma(\mathcal{E}_i)$, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left(\left\{ \pi_i^{-1}(E_i) : i \in I,\ E_i \in \mathcal{E}_i \right\}\right).
\]
If, in addition, $I$ is countable, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \sigma\left(\left\{ \prod_{i \in I} E_i : E_i \in \mathcal{E}_i \right\}\right).
\]
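As a sanity check on Definition 2.1 and Lemma 2.2, the following Python sketch builds the product σ-algebra on a small finite example in two ways: from preimages under the coordinate projections and from rectangles. The spaces `X0`, `X1`, the σ-algebras `A0`, `A1`, and the helper `generate_sigma_algebra` are hypothetical choices made for illustration; they do not come from the notes. On a finite universe, closure under complement and pairwise union is enough to obtain the generated σ-algebra.

```python
from itertools import product


def generate_sigma_algebra(universe, generators):
    """Smallest family containing `generators` that is closed under
    complement and union; on a finite universe this is sigma(generators)."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            for b in list(family):
                for candidate in (universe - a, a | b):
                    if candidate not in family:
                        family.add(candidate)
                        changed = True
    return family


# Hypothetical example spaces (assumed for illustration only).
X0 = {0, 1}
X1 = {"a", "b", "c"}
A0 = [set(), {0}, {1}, {0, 1}]                    # power set of X0
A1 = [set(), {"a"}, {"b", "c"}, {"a", "b", "c"}]  # a coarser sigma-algebra on X1

omega = set(product(X0, X1))

# Generators from Definition 2.1: preimages of measurable sets under projections.
cylinders = [{(x0, x1) for (x0, x1) in omega if x0 in A} for A in A0]
cylinders += [{(x0, x1) for (x0, x1) in omega if x1 in B} for B in A1]

# Generators from Lemma 2.2: rectangles A x B.
rectangles = [set(product(A, B)) for A in A0 for B in A1]

from_cylinders = generate_sigma_algebra(omega, cylinders)
from_rectangles = generate_sigma_algebra(omega, rectangles)

print(from_cylinders == from_rectangles)  # True
print(len(from_cylinders))                # 16
```

The product σ-algebra here has four atoms ($\{0\} \times \{a\}$, $\{0\} \times \{b, c\}$, $\{1\} \times \{a\}$, $\{1\} \times \{b, c\}$), hence $2^4 = 16$ members, and both generating families produce it.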

Lemma 2.4. If $I = J \sqcup K$, with both $J$ and $K$ nonempty, then
\[
  \bigotimes_{i \in I} \mathcal{A}_i = \left( \bigotimes_{j \in J} \mathcal{A}_j \right) \otimes \left( \bigotimes_{k \in K} \mathcal{A}_k \right). \tag{2.1}
\]

Proof. By Lemma 2.3, the right-hand side of (2.1) is the σ-algebra generated by sets of the form
\[
  \pi_j^{-1}(A_j) \cap \pi_k^{-1}(A_k)
\]
where $j \in J$, $k \in K$, $A_j \in \mathcal{A}_j$, and $A_k \in \mathcal{A}_k$. □

Corollary 2.5. For finite products, the operator $\otimes$ is associative.

3. Product Measures

Let $n \ge 2$, and let $(X_i, \mathcal{A}_i, \mu_i)_{i<n}$ be a family of measure spaces; i.e., each $\mu_i : \mathcal{A}_i \to [0, \infty]$ is a measure (see [2, p. 24]) on the measurable space $(X_i, \mathcal{A}_i)$. Let $\mathcal{R}$ denote the collection of rectangular sets in $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$; i.e.,
\[
  \mathcal{R} = \{ A_0 \times \cdots \times A_{n-1} : A_i \in \mathcal{A}_i \}.
\]
It follows that $\mathcal{R}$ is an elementary family (see [2, p. 23]), so the set
\[
  \mathcal{F} = \left\{ \bigsqcup_{j<m} R_j : 1 \le m < \omega,\ R_j \in \mathcal{R} \right\}
\]
consisting of all finite disjoint unions of rectangles is an algebra [2, Proposition 1.7].

Let $\rho : \mathcal{R} \to [0, \infty]$ be defined by
\[
  A_0 \times \cdots \times A_{n-1} \mapsto \mu_0(A_0) \cdots \mu_{n-1}(A_{n-1}).
\]

Claim 3.1. Suppose $(S_j)_{j<\omega} \subseteq \mathcal{R}$ is a family of pairwise disjoint rectangles and $R = \bigsqcup_{j<\omega} S_j$. If $R \in \mathcal{R}$, then $\rho(R) = \sum_{j<\omega} \rho(S_j)$.

Proof. Suppose $R = A_0 \times \cdots \times A_{n-1}$ and each $S_j = B^j_0 \times \cdots \times B^j_{n-1}$ with each $A_i$ and $B^j_i$ in $\mathcal{A}_i$. Since
\begin{align*}
  1_{A_0}(x_0) \cdots 1_{A_{n-1}}(x_{n-1})
    &= 1_{A_0 \times \cdots \times A_{n-1}}(x_0, \dots, x_{n-1}) \\
    &= \sum_{j<\omega} 1_{B^j_0 \times \cdots \times B^j_{n-1}}(x_0, \dots, x_{n-1}) \\
    &= \sum_{j<\omega} 1_{B^j_0}(x_0) \cdots 1_{B^j_{n-1}}(x_{n-1})
\end{align*}
for all $(x_0, \dots, x_{n-1}) \in X_0 \times \cdots \times X_{n-1}$, [2, Theorem 2.15] asserts that
\begin{align*}
  \mu_0(A_0) \cdots \mu_{n-1}(A_{n-1})
    &= \int_{X_{n-1}} \cdots \int_{X_0} 1_{A_0}(x_0) \cdots 1_{A_{n-1}}(x_{n-1}) \, d\mu_0(x_0) \cdots d\mu_{n-1}(x_{n-1}) \\
    &= \int_{X_{n-1}} \cdots \int_{X_0} \sum_{j<\omega} 1_{B^j_0}(x_0) \cdots 1_{B^j_{n-1}}(x_{n-1}) \, d\mu_0(x_0) \cdots d\mu_{n-1}(x_{n-1}) \\
    &= \sum_{j<\omega} \mu_0(B^j_0) \cdots \mu_{n-1}(B^j_{n-1}).
\end{align*}
□
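Claim 3.1 can be checked by hand on a small finite example. The sketch below assumes two hypothetical finite measures `mu0` and `mu1`, specified by point masses that are not taken from the notes, splits a rectangle into three pairwise disjoint rectangles, and compares $\rho$ of the whole with the sum of $\rho$ over the pieces. Exact rational arithmetic avoids floating-point noise.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite measures, given by point masses (assumed for illustration).
mu0 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
mu1 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}


def measure(mu, subset):
    """Measure of a finite set as the sum of its point masses."""
    return sum((mu[x] for x in subset), Fraction(0))


def rho(rectangle):
    """rho(A0 x A1) = mu0(A0) * mu1(A1) for a rectangle given as a pair (A0, A1)."""
    a0, a1 = rectangle
    return measure(mu0, a0) * measure(mu1, a1)


def points(rectangle):
    """The rectangle as a set of points of X0 x X1."""
    return set(product(*rectangle))


R = ({0, 1, 2}, {"a", "b"})
pieces = [({0}, {"a", "b"}), ({1, 2}, {"a"}), ({1, 2}, {"b"})]

# The pieces are pairwise disjoint and their union is R.
covered = [p for piece in pieces for p in points(piece)]
is_partition = len(covered) == len(set(covered)) and set(covered) == points(R)

print(is_partition)                          # True
print(rho(R), sum(rho(s) for s in pieces))   # 1 1
```

Here the three pieces contribute $1/2$, $1/8$, and $3/8$, which sum to $\rho(R) = 1$.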

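Let $\nu : \mathcal{F} \to [0, \infty]$ be defined by
\[
  \nu\left( \bigsqcup_{j<m} R_j \right) = \sum_{j<m} \rho(R_j).
\]
In order to show that $\nu$ is well-defined, suppose that $\bigsqcup_{j<m} R_j$ and $\bigsqcup_{k<m} S_k$ describe the same set in $\mathcal{F}$. For all $j, k < m$, suppose $R_j = A^j_0 \times \cdots \times A^j_{n-1}$ and $S_k = B^k_0 \times \cdots \times B^k_{n-1}$ with each $A^j_i$ and $B^k_i$ in $\mathcal{A}_i$. By Claim 3.1, we have
\begin{align*}
  \nu\left( \bigsqcup_{j<m} R_j \right)
    &= \sum_{j<m} \mu_0\left(A^j_0\right) \cdots \mu_{n-1}\left(A^j_{n-1}\right) \\
    &= \sum_{j,k<m} \mu_0\left(A^j_0 \cap B^k_0\right) \cdots \mu_{n-1}\left(A^j_{n-1} \cap B^k_{n-1}\right) \\
    &= \sum_{k<m} \mu_0\left(B^k_0\right) \cdots \mu_{n-1}\left(B^k_{n-1}\right) \\
    &= \nu\left( \bigsqcup_{k<m} S_k \right).
\end{align*}

Next, we show that $\nu$ is a premeasure on $\mathcal{F}$ (see [2, p. 30]). Let $\bigsqcup_{j<m} R_j \in \mathcal{F}$, and let $\left( \bigsqcup_{k<m_\ell} S^\ell_k \right)_{\ell<\omega} \subseteq \mathcal{F}$ be pairwise disjoint. Suppose
\[
  \bigsqcup_{j<m} R_j = \bigsqcup_{\ell<\omega} \bigsqcup_{k<m_\ell} S^\ell_k.
\]
By Claim 3.1, it follows that
\begin{align*}
  \nu\left( \bigsqcup_{j<m} R_j \right)
    &= \sum_{j<m} \rho(R_j) \\
    &= \sum_{j<m} \sum_{\ell<\omega} \sum_{k<m_\ell} \rho\left(R_j \cap S^\ell_k\right) \\
    &= \sum_{\ell<\omega} \sum_{k<m_\ell} \sum_{j<m} \rho\left(R_j \cap S^\ell_k\right) \\
    &= \sum_{\ell<\omega} \sum_{k<m_\ell} \rho\left(S^\ell_k\right) \\
    &= \sum_{\ell<\omega} \nu\left( \bigsqcup_{k<m_\ell} S^\ell_k \right).
\end{align*}

Let $\nu^*$ be the outer measure associated with $\nu$; i.e., $\nu^* : \mathcal{P}(X_0 \times \cdots \times X_{n-1}) \to [0, \infty]$ where
\[
  \nu^*(A) = \inf\left\{ \sum_{j<\omega} \nu(F_j) : F_j \in \mathcal{F},\ A \subseteq \bigcup_{j<\omega} F_j \right\}.
\]

Definition 3.2. The product measure $\mu_0 \times \cdots \times \mu_{n-1}$ is the restriction of $\nu^*$ to $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$.

By [2, Proposition 1.13], this product is indeed a measure which extends $\rho$. If, in addition, each $\mu_i$ is σ-finite, then [2, Proposition 1.14] implies that the product is the unique measure extending $\rho$ to $\mathcal{A}_0 \otimes \cdots \otimes \mathcal{A}_{n-1}$.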
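In the finite case the construction above collapses to something concrete. If each $X_i$ is finite and each $\mathcal{A}_i$ is the full power set, every subset of the product is a finite disjoint union of singleton rectangles, so the product measure of a set is the sum of $\rho$ over its points. The Python sketch below, with hypothetical point masses `mu0` and `mu1` that are assumptions rather than data from the notes, checks that this candidate agrees with $\rho$ on every rectangle and then evaluates it on a set that is not a rectangle.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite measures on {0, 1}, given by point masses (assumed).
mu0 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
mu1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}


def subsets(xs):
    """All subsets of a finite collection, as tuples."""
    xs = list(xs)
    return [c for r in range(len(xs) + 1) for c in combinations(xs, r)]


def measure(mu, subset):
    """Measure of a finite set as the sum of its point masses."""
    return sum((mu[x] for x in subset), Fraction(0))


def product_measure(event):
    """Sum of rho over the singleton rectangles {x0} x {x1} contained in `event`."""
    return sum((mu0[x0] * mu1[x1] for (x0, x1) in event), Fraction(0))


# The candidate agrees with rho on every rectangle A x B.
extends_rho = all(
    product_measure(set(product(A, B))) == measure(mu0, A) * measure(mu1, B)
    for A in subsets(mu0)
    for B in subsets(mu1)
)

# The diagonal is measurable but is not a rectangle.
diagonal = {(0, 0), (1, 1)}

print(extends_rho)                 # True
print(product_measure(diagonal))   # 7/12
```

Since both measures are finite, hence σ-finite, the uniqueness statement following Definition 3.2 says this pointwise sum is the only measure on the product that extends $\rho$.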

Lemma 3.3. If each $\mu_i$ is σ-finite, then the product $\mu_0 \times \cdots \times \mu_{n-1}$ is associative.

Proof. Suppose $I \sqcup J = \{0, \dots, n-1\}$ where both $I$ and $J$ are nonempty. Let $\mu_I = \prod_{i \in I} \mu_i$ and $\mu_J = \prod_{j \in J} \mu_j$. It follows that $(\mu_I \times \mu_J) \restriction \mathcal{R} = \rho$. □

4. Pushforwards

Suppose $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ are measurable spaces and $f : X \to Y$ is an $(\mathcal{A}, \mathcal{B})$-measurable function.

Definition 4.1. If $\mu : \mathcal{A} \to [0, \infty]$ is a measure, then we call $\mu \circ f^{-1} : \mathcal{B} \to [0, \infty]$ its pushforward by $f$.

Claim 4.2. The pushforward $\mu \circ f^{-1}$ is a measure.

Proof. Notice that $\mu \circ f^{-1}(\emptyset) = \mu(\emptyset) = 0$. Suppose $(B_i : i < \omega) \subseteq \mathcal{B}$ is pairwise disjoint. It follows that $(f^{-1}(B_i) : i < \omega) \subseteq \mathcal{A}$ is also pairwise disjoint, so
\[
  \mu \circ f^{-1}\left( \bigsqcup_{i<\omega} B_i \right) = \mu\left( \bigsqcup_{i<\omega} f^{-1}(B_i) \right) = \sum_{i<\omega} \mu \circ f^{-1}(B_i).
\]
□

5. Probability Spaces

Definition 5.1. A probability space is a measure space $(\Omega, \mathcal{A}, P)$ with $P(\Omega) = 1$.

Definition 5.2. If $(\Omega, \mathcal{A}, P)$ is a probability space, then the $P$-measurable sets (i.e., the elements of $\mathcal{A}$) are called events.

6. Random Elements and Variables

Let $(\Omega, \mathcal{A}, P)$ be a probability space.

Definition 6.1. A random element of a measurable space $(\Psi, \mathcal{B})$ is an $(\mathcal{A}, \mathcal{B})$-measurable function $X : \Omega \to \Psi$. Furthermore, if $\Psi = \mathbb{R}$ and $\mathcal{B} = \mathcal{B}(\mathbb{R})$, then we call $X$ a random variable.

When describing events using preimages of random elements, we often use $[X \in B]$ for $\{\omega \in \Omega : X(\omega) \in B\}$, $[X > r]$ for $\{\omega \in \Omega : X(\omega) > r\}$, etc. This abbreviation practice is common in the literature of probability theory. As an aid to the reader, we set off such abbreviations with square brackets rather than braces.

Definition 6.2. We say that random elements $X_0, \dots, X_{n-1}$ of measurable spaces $(\Psi_0, \mathcal{B}_0), \dots, (\Psi_{n-1}, \mathcal{B}_{n-1})$, respectively, are mutually independent iff for all $(B_0, \dots, B_{n-1}) \in \mathcal{B}_0 \times \cdots \times \mathcal{B}_{n-1}$, we have
\[
  P[X_0 \in B_0, \dots, X_{n-1} \in B_{n-1}] = P[X_0 \in B_0] \cdots P[X_{n-1} \in B_{n-1}].
\]

Definition 6.3. If $X$ is a random element of $(\Psi, \mathcal{B})$, then the probability distribution of $X$ is the pushforward $P \circ X^{-1} : \mathcal{B} \to [0, 1]$.
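Definitions 4.1, 6.2, and 6.3 can be illustrated on a finite probability space. The sketch below assumes a hypothetical space of two fair dice with the uniform measure (an example chosen here, not one used in the notes). It computes the distribution of the sum as a pushforward and tests mutual independence by brute force over all pairs of measurable sets, exactly as Definition 6.2 prescribes. The helper names `pushforward` and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical probability space: two fair dice, uniform measure (assumed).
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def prob(event):
    """P of an event given as an iterable of outcomes."""
    return sum((P[w] for w in event), Fraction(0))


def powerset(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def pushforward(f):
    """Distribution P o f^{-1}, stored by its values on singletons."""
    dist = {}
    for w in omega:
        dist[f(w)] = dist.get(f(w), Fraction(0)) + P[w]
    return dist


def independent(f, g):
    """Check P[f in B0, g in B1] = P[f in B0] * P[g in B1] for all B0, B1."""
    for b0 in powerset({f(w) for w in omega}):
        for b1 in powerset({g(w) for w in omega}):
            joint = prob(w for w in omega if f(w) in b0 and g(w) in b1)
            left = prob(w for w in omega if f(w) in b0)
            right = prob(w for w in omega if g(w) in b1)
            if joint != left * right:
                return False
    return True


def first(w):
    return w[0]


def second(w):
    return w[1]


def total(w):
    return w[0] + w[1]


dist_total = pushforward(total)

print(dist_total[7])               # 1/6
print(sum(dist_total.values()))    # 1
print(independent(first, second))  # True
print(independent(total, first))   # False
```

The last line reflects, for instance, that $P[\text{total} = 2, \text{first} = 1] = 1/36$ while $P[\text{total} = 2] \cdot P[\text{first} = 1] = 1/216$.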

