Vapnik-Chervonenkis Density in Model Theory Matthias Aschenbrenner - PowerPoint PPT Presentation

Vapnik-Chervonenkis Density in Model Theory
Matthias Aschenbrenner
University of California, Los Angeles
(joint with A. Dolich, D. Haskell, D. Macpherson, and S. Starchenko)

Outline
• VC dimension and VC density
• VC duality
• The model-theoretic context
• Uniform bounds on VC density

VC dimension and VC density
Let $(X, \mathcal{S})$ be a set system:
• $X$ is a set (the base set), most of the time assumed infinite;
• $\mathcal{S}$ is a collection of subsets of $X$.
We sometimes also speak of a set system $\mathcal{S}$ on $X$. Given $A \subseteq X$, we let $\mathcal{S} \cap A := \{ S \cap A : S \in \mathcal{S} \}$ and call $(A, \mathcal{S} \cap A)$ the set system on $A$ induced by $\mathcal{S}$. We say $A$ is shattered by $\mathcal{S}$ if $\mathcal{S} \cap A = 2^A$.
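A minimal Python sketch of these definitions, assuming the base set is finite and the set system is given as an explicit list of sets. The helper names `trace` and `is_shattered` are illustrative and do not come from the talk. The example family consists of the initial and final segments of $\{0, \dots, 5\}$, i.e. the traces on this set of the unbounded intervals of $\mathbb{R}$.

```python
def trace(family, A):
    """Return S ∩ A = {S ∩ A : S in family} as a set of frozensets."""
    A = frozenset(A)
    return {frozenset(S) & A for S in family}


def is_shattered(A, family):
    """A is shattered by the family iff S ∩ A is the full power set 2^A."""
    return len(trace(family, A)) == 2 ** len(frozenset(A))


# Hypothetical example: initial and final segments of X = {0, ..., 5},
# i.e. the traces on X of the unbounded intervals of R.
family = [set(range(k)) for k in range(7)] + [set(range(k, 6)) for k in range(7)]

print(is_shattered({1, 4}, family))     # True
print(is_shattered({1, 2, 4}, family))  # False: {1, 4} is not cut out
```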

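VC dimension and VC density
If $\mathcal{S} \neq \emptyset$, then we define the VC dimension of $\mathcal{S}$, denoted by $\operatorname{VC}(\mathcal{S})$, as the supremum (in $\mathbb{N} \cup \{\infty\}$) of the sizes of all finite subsets of $X$ shattered by $\mathcal{S}$. We also decree $\operatorname{VC}(\emptyset) := -\infty$.
Examples
1. $X = \mathbb{R}$, $\mathcal{S}$ = all unbounded intervals. Then $\operatorname{VC}(\mathcal{S}) = 2$.
2. $X = \mathbb{R}^2$, $\mathcal{S}$ = all half spaces. Then $\operatorname{VC}(\mathcal{S}) = 3$.
[Figure: two point configurations in the plane, labelled "One point in the convex hull of the others" and "No point in the convex hull of the others".]
3. Let $\mathcal{S}$ = half spaces in $\mathbb{R}^d$. Then $\operatorname{VC}(\mathcal{S}) = d + 1$. (The inequality $\leq$ follows from Radon's Lemma.)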
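For a finite base set the supremum in the definition is a maximum, so it can be found by brute force. The sketch below is exponential in $|X|$ and therefore only meant for toy inputs; it assumes a finite $X$ and an explicit family. On the same hypothetical family of interval traces it returns 2, in line with Example 1. Subsets of shattered sets are shattered, so the search can stop at the first size that has no shattered subset.

```python
from itertools import combinations


def vc_dimension(X, family):
    """Brute-force VC dimension of a set system on a finite base set X.

    Returns float("-inf") for the empty family, following VC(∅) := -∞.
    """
    family = [frozenset(S) for S in family]
    if not family:
        return float("-inf")
    best = 0  # the empty set is shattered by any non-empty family
    for k in range(1, len(X) + 1):
        if any(len({S & frozenset(A) for S in family}) == 2 ** k
               for A in combinations(X, k)):
            best = k
        else:
            # Subsets of shattered sets are shattered, so no larger k can work.
            break
    return best


X = list(range(6))
family = [set(range(k)) for k in range(7)] + [set(range(k, 6)) for k in range(7)]
print(vc_dimension(X, family))  # 2
```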

VC dimension and VC density
Examples (continued)
4. $X = \mathbb{R}^2$, $\mathcal{S}$ = all convex polygons. Then $\operatorname{VC}(\mathcal{S}) = \infty$. (But $\operatorname{VC}(\{\text{convex } n\text{-gons in } \mathbb{R}^2\}) = 2n + 1$.)
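$\operatorname{VC}(\mathcal{S}) = \infty$ in Example 4 follows from the standard observation that points on a circle are in convex position: the convex hull of any subset contains none of the remaining points. The sketch below checks this numerically for $N$ equally spaced points and all subsets of size at least 3. Smaller subsets can be cut out by thin polygons near them, and the code does not check that case. The cross-product point-in-polygon test and the choice of $N$ are assumptions made for the illustration.

```python
import math
from itertools import combinations


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive iff o, a, b turn left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_convex_polygon(p, poly):
    """Closed-polygon membership test; poly is listed counterclockwise."""
    n = len(poly)
    return all(cross(poly[i], poly[(i + 1) % n], p) >= 0 for i in range(n))


def circle_points_cut_out(N):
    """For N equally spaced points on the unit circle, check that no point
    lies in the convex hull of any subset of size >= 3 not containing it."""
    pts = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
           for k in range(N)]
    for r in range(3, N + 1):
        for idx in combinations(range(N), r):
            poly = [pts[i] for i in idx]  # increasing angle, so counterclockwise
            if any(in_convex_polygon(pts[j], poly)
                   for j in range(N) if j not in idx):
                return False
    return True


print(circle_points_cut_out(8))  # True
```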

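VC dimension and VC density
The function
$$\pi_{\mathcal{S}} \colon \mathbb{N} \to \mathbb{N}, \qquad n \mapsto \pi_{\mathcal{S}}(n) := \max\left\{ |\mathcal{S} \cap A| : A \in \binom{X}{n} \right\},$$
where $\binom{X}{n}$ denotes the set of $n$-element subsets of $X$, is called the shatter function of $\mathcal{S}$. Then
$$\operatorname{VC}(\mathcal{S}) = \sup\{ n : \pi_{\mathcal{S}}(n) = 2^n \}.$$
One says that $\mathcal{S}$ is a VC class if $\operatorname{VC}(\mathcal{S}) < \infty$. The notion of VC dimension was introduced by Vladimir Vapnik and Alexey Chervonenkis in the early 1970s, in the context of computational learning theory.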
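On a finite base set the shatter function can be tabulated directly from the definition. The Python sketch below brute-forces all $n$-element subsets, and its function names are hypothetical. For the illustrative family of initial and final segments of $\{0, \dots, 7\}$ (the traces of unbounded intervals), it gives $\pi_{\mathcal{S}}(n) = 2n$ for $n \geq 1$. The largest $n$ with $\pi_{\mathcal{S}}(n) = 2^n$ is 2, matching Example 1.

```python
from itertools import combinations


def shatter_function(X, family, n):
    """pi_S(n) = max |S ∩ A| over all n-element subsets A of the finite set X (n <= |X|)."""
    family = [frozenset(S) for S in family]
    return max(len({S & frozenset(A) for S in family})
               for A in combinations(X, n))


X = list(range(8))
family = [set(range(k)) for k in range(9)] + [set(range(k, 8)) for k in range(9)]

print([shatter_function(X, family, n) for n in range(1, 6)])  # [2, 4, 6, 8, 10]
vc = max(n for n in range(len(X) + 1) if shatter_function(X, family, n) == 2 ** n)
print(vc)  # 2
```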

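VC dimension and VC density
A surprising dichotomy holds for $\pi_{\mathcal{S}}$:
Lemma (Sauer-Shelah) If $\operatorname{VC}(\mathcal{S}) = d < \infty$ (so $\pi_{\mathcal{S}}(n) < 2^n$ for $n > d$), then
$$\pi_{\mathcal{S}}(n) \leq \binom{n}{\leq d} := \binom{n}{0} + \cdots + \binom{n}{d} \quad \text{for every } n.$$
An illuminating proof of this lemma is due to Frankl. It is enough to show (and then apply to the induced set systems $\mathcal{S} \cap A$ with $|A| = n$) that if $\mathcal{S}$ is a set system on a finite set $X$, then
$$\mathcal{S} \cap B \neq 2^B \text{ for all } B \in \binom{X}{d+1} \implies |\mathcal{S}| \leq \binom{|X|}{\leq d}.$$
This claim is trivially true if $\mathcal{S}$ is assumed to be an ideal (i.e., closed under taking subsets): then every member of $\mathcal{S}$ has at most $d$ elements, since any $(d+1)$-element subset of a larger member would be shattered. One then shows that there exists an ideal $\mathcal{T}$ on $X$ with $|\mathcal{S}| = |\mathcal{T}|$ and $|\mathcal{S} \cap B| \geq |\mathcal{T} \cap B|$ for all $B$. In particular $\mathcal{T}$ shatters no $(d+1)$-element set either, so $|\mathcal{S}| = |\mathcal{T}| \leq \binom{|X|}{\leq d}$.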
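The talk does not say how the ideal $\mathcal{T}$ is built. One standard way is repeated down-shifting (compression): for a point $x$, replace each $F \in \mathcal{S}$ containing $x$ by $F \setminus \{x\}$, unless that set already lies in $\mathcal{S}$. The sketch below assumes this construction. On a random family over a 6-element set, it checks that the result is an ideal of the same size whose traces never exceed those of $\mathcal{S}$, and that the Sauer-Shelah bound holds.

```python
import random
from itertools import combinations
from math import comb


def shift_down(family, x):
    """Down-shift at x: F -> F - {x} if x in F and F - {x} is not in the family."""
    return {F - {x} if x in F and (F - {x}) not in family else F for F in family}


def compress_to_ideal(family, X):
    """Apply down-shifts at every point until nothing changes; the result is an ideal."""
    fam = {frozenset(F) for F in family}
    while True:
        new = fam
        for x in X:
            new = shift_down(new, x)
        if new == fam:
            return fam
        fam = new


def trace_size(family, B):
    B = frozenset(B)
    return len({F & B for F in family})


def is_ideal(family):
    return all(F - {x} in family for F in family for x in F)


random.seed(0)
X = list(range(6))
S = {frozenset(x for x in X if random.random() < 0.5) for _ in range(20)}
T = compress_to_ideal(S, X)

all_B = [B for k in range(len(X) + 1) for B in combinations(X, k)]
d = max(len(B) for B in all_B if trace_size(S, B) == 2 ** len(B))  # VC dimension of S

print(len(S) == len(T), is_ideal(T))                              # True True
print(all(trace_size(T, B) <= trace_size(S, B) for B in all_B))  # True
print(len(S) <= sum(comb(len(X), i) for i in range(d + 1)))      # True
```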

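VC dimension and VC density
The Sauer-Shelah dichotomy: either
• $\pi_{\mathcal{S}}(n) = 2^n$ for every $n$ (if $\mathcal{S}$ is not a VC class), or
• $\pi_{\mathcal{S}}(n) = O(n^d)$ where $d = \operatorname{VC}(\mathcal{S}) < \infty$.
One may now define the VC density of $\mathcal{S}$ as
$$\operatorname{vc}(\mathcal{S}) := \begin{cases} \inf\{ r \in \mathbb{R}_{>0} : \pi_{\mathcal{S}}(n) = O(n^r) \} & \text{if } \operatorname{VC}(\mathcal{S}) < \infty, \\ \infty & \text{otherwise} \end{cases} \;=\; \limsup_{n \to \infty} \frac{\log \pi_{\mathcal{S}}(n)}{\log n} \in \mathbb{R}_{\geq 0} \cup \{\infty\}.$$
We also define $\operatorname{vc}(\emptyset) := -\infty$.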
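Because $\operatorname{vc}$ is a $\limsup$, finitely many values of $\pi_{\mathcal{S}}$ never determine it. The ratio $\log \pi_{\mathcal{S}}(n) / \log n$ still shows the trend. In the sketch below, the shatter function $2n$ of the unbounded intervals of $\mathbb{R}$ (VC dimension 2) gives ratios that decrease slowly towards 1. The Sauer-Shelah bound $\binom{n}{\leq 2}$ gives ratios approaching 2. The function names are illustrative.

```python
import math


def sauer_shelah_bound(n, d):
    """binom(n, <= d) = C(n, 0) + ... + C(n, d)."""
    return sum(math.comb(n, i) for i in range(d + 1))


def log_ratio(pi, n):
    """log pi(n) / log n, whose lim sup as n -> infinity is the VC density."""
    return math.log(pi(n)) / math.log(n)


def pi_intervals(n):
    """Shatter function of the unbounded intervals of R (n >= 1)."""
    return 2 * n


for n in (10, 10**3, 10**6):
    print(n, round(log_ratio(pi_intervals, n), 3),
          round(log_ratio(lambda m: sauer_shelah_bound(m, 2), n), 3))
```

The intervals illustrate that $\operatorname{vc}(\mathcal{S}) = 1$ can be strictly smaller than $\operatorname{VC}(\mathcal{S}) = 2$.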

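VC dimension and VC density
Examples
1. $\mathcal{S} = \binom{X}{\leq d}$ (the subsets of $X$ with at most $d$ elements). Then $\operatorname{VC}(\mathcal{S}) = \operatorname{vc}(\mathcal{S}) = d$; in fact $\pi_{\mathcal{S}}(n) = \binom{n}{\leq d}$.
2. $\mathcal{S}$ = half spaces in $\mathbb{R}^d$. Then $\operatorname{VC}(\mathcal{S}) = d + 1$ [as seen above] and $\operatorname{vc}(\mathcal{S}) = d$.
Some basic properties
• $\operatorname{vc}(\mathcal{S}) \leq \operatorname{VC}(\mathcal{S})$, and if one is finite then so is the other;
• $\operatorname{VC}(\mathcal{S}) = 0 \iff |\mathcal{S}| = 1$;
• $\mathcal{S}$ is finite $\iff \operatorname{vc}(\mathcal{S}) = 0 \iff \operatorname{vc}(\mathcal{S}) < 1$;
• $\mathcal{S} = \mathcal{S}_1 \cup \mathcal{S}_2 \implies \operatorname{vc}(\mathcal{S}) = \max\{ \operatorname{vc}(\mathcal{S}_1), \operatorname{vc}(\mathcal{S}_2) \}$.
(So $\operatorname{vc}(\mathcal{S})$ doesn't change if we alter finitely many sets of $\mathcal{S}$.)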
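Example 1 can be verified mechanically on a small base set. The sketch assumes $X = \{0, \dots, 6\}$ and $d = 2$, which are arbitrary choices. The family of all subsets of size at most $d$ is invariant under permutations of $X$, so it suffices to compute the trace on one $n$-element subset for each $n$.

```python
from itertools import combinations
from math import comb


def subsets_up_to(X, d):
    """The family binom(X, <= d) of all subsets of X with at most d elements."""
    return [frozenset(c) for k in range(d + 1) for c in combinations(X, k)]


def trace_size(family, A):
    A = frozenset(A)
    return len({S & A for S in family})


X, d = range(7), 2
family = subsets_up_to(X, d)

# pi_S(n) equals binom(n, <= d) for every n <= |X|.
print(all(trace_size(family, range(n)) == sum(comb(n, i) for i in range(d + 1))
          for n in range(len(X) + 1)))  # True
# A d-element set is shattered, a (d+1)-element set is not, so VC(S) = d.
print(trace_size(family, range(d)) == 2 ** d,
      trace_size(family, range(d + 1)) < 2 ** (d + 1))  # True True
```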

VC dimension and VC density
VC density is often the right measure for the combinatorial complexity of a set system. For example, it is related to packing numbers and entropy.
Definition Let $(S, d)$ be a bounded pseudo-metric space, and $\varepsilon > 0$.
1. $D \subseteq S$ is an $\varepsilon$-packing if $d(a, b) > \varepsilon$ for all $a \neq b$ in $D$;
2. the $\varepsilon$-packing number of $(S, d)$ is $D(S, d; \varepsilon) := \max\{ |D| : D \subseteq S \text{ is a finite } \varepsilon\text{-packing} \}$;
3. the entropic dimension of $(S, d)$ is $\dim(S, d) := \inf\{ s \in \mathbb{R}_{>0} : \exists C > 0\ \forall \varepsilon > 0 : D(S, d; \varepsilon) \leq C \varepsilon^{-s} \}$.
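Exhaustive search computes the packing number of a finite pseudo-metric space, which makes the definition concrete. The sketch below is exponential in $|S|$ and only meant for toy inputs. The example space, the integers $0, \dots, 10$ with $d(a, b) = |a - b|$, is a hypothetical choice.

```python
from itertools import combinations


def is_packing(D, dist, eps):
    """D is an eps-packing if dist(a, b) > eps for all distinct a, b in D."""
    return all(dist(a, b) > eps for a, b in combinations(D, 2))


def packing_number(S, dist, eps):
    """D(S, d; eps) for a finite S, trying candidate sizes from largest down."""
    S = list(S)
    for k in range(len(S), 0, -1):
        if any(is_packing(D, dist, eps) for D in combinations(S, k)):
            return k
    return 0


def dist(a, b):
    return abs(a - b)


print(packing_number(range(11), dist, 2.5))  # 4, e.g. {0, 3, 6, 9}
print(packing_number(range(11), dist, 0.5))  # 11
print(packing_number(range(11), dist, 10))   # 1: no pair is at distance > 10
```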

VC dimension and VC density
If $(X, \mathcal{A}, \mu)$ is a probability space, then we equip $\mathcal{A}$ with the (bounded) pseudo-metric $d_\mu(A, B) := \mu(A \,\triangle\, B)$.
Theorem (Dudley, Assouad) $\operatorname{vc}(\mathcal{S}) = \sup_\mu \dim(\mathcal{S}, d_\mu)$, where the supremum ranges over all probability measures $\mu$ on $X$ making all sets in $\mathcal{S}$ measurable.
There is a refinement of the inequality $\operatorname{vc}(\mathcal{S}) \geq \dim(\mathcal{S}, d_\mu)$ for $\mu$ concentrated uniformly on a finite set (Haussler, Wernisch): $D(\mathcal{S}, d_\mu; \varepsilon) \leq C \varepsilon^{-\varrho}$ for all $\varepsilon > 0$, where $C$ only depends on $\pi_{\mathcal{S}}$ (not on $(X, \mathcal{S})$, $\mu$, ...).
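When $\mu$ is the uniform probability measure on a finite set, as in the Haussler-Wernisch refinement, $d_\mu(A, B)$ is simply the fraction of points on which $A$ and $B$ disagree. The sketch below assumes that setting. The base set, the family of initial segments (traces of half lines $(-\infty, t)$) and the greedy packing routine are illustrative choices. A greedy packing is only maximal, so in general it gives a lower bound for $D(\mathcal{S}, d_\mu; \varepsilon)$.

```python
from fractions import Fraction


def d_mu(A, B, X):
    """mu(A symmetric-difference B) for mu uniform on the finite set X."""
    X = frozenset(X)
    return Fraction(len((frozenset(A) ^ frozenset(B)) & X), len(X))


def greedy_packing(family, dist, eps):
    """Greedily collect members pairwise more than eps apart (a maximal packing)."""
    packing = []
    for S in family:
        if all(dist(S, P) > eps for P in packing):
            packing.append(S)
    return packing


X = range(20)
half_lines = [frozenset(range(k)) for k in range(21)]  # traces of (-inf, t) on X
pack = greedy_packing(half_lines, lambda A, B: d_mu(A, B, X), Fraction(1, 4))
print(d_mu({0, 1}, {1, 2}, range(4)))  # 1/2
print(len(pack))  # 4
```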

VC duality
Let $X$ be a set (possibly finite). Given $A_1, \dots, A_n \subseteq X$, denote by $S(A_1, \dots, A_n)$ the set of atoms of the Boolean subalgebra of $2^X$ generated by $A_1, \dots, A_n$: those subsets of $X$ of the form
$$\bigcap_{i \in I} A_i \cap \bigcap_{i \in [n] \setminus I} (X \setminus A_i) \quad \text{where } I \subseteq [n] = \{1, \dots, n\},$$
which are non-empty. Suppose now that $\mathcal{S}$ is a set system on $X$. We define
$$\pi^*_{\mathcal{S}} \colon \mathbb{N} \to \mathbb{N}, \qquad n \mapsto \pi^*_{\mathcal{S}}(n) := \max\{ |S(A_1, \dots, A_n)| : A_1, \dots, A_n \in \mathcal{S} \}.$$
We say that $\mathcal{S}$ is independent (in $X$) if $\pi^*_{\mathcal{S}}(n) = 2^n$ for every $n$, and dependent (in $X$) otherwise.
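Two points of $X$ lie in the same atom exactly when they belong to the same sets $A_i$. The non-empty atoms therefore correspond to the membership patterns that actually occur. The Python sketch below uses this to compute $S(A_1, \dots, A_n)$ and a brute-force $\pi^*_{\mathcal{S}}$ for a finite base set. The example family (initial and final segments of $\{0, \dots, 5\}$) is hypothetical, and it is dependent since $\pi^*_{\mathcal{S}}(2) = 3 < 4$.

```python
from itertools import combinations_with_replacement


def atoms(X, sets):
    """Non-empty atoms of the Boolean algebra on X generated by the given sets."""
    sets = [frozenset(A) for A in sets]
    classes = {}
    for x in X:
        pattern = tuple(x in A for A in sets)  # which A_i contain x
        classes.setdefault(pattern, set()).add(x)
    return [frozenset(c) for c in classes.values()]


def dual_shatter_function(X, family, n):
    """pi*_S(n): the maximum number of atoms generated by n members of the family."""
    return max(len(atoms(X, choice))
               for choice in combinations_with_replacement(family, n))


X = range(6)
family = [frozenset(range(k)) for k in range(7)] + [frozenset(range(k, 6)) for k in range(7)]
print(sorted(map(sorted, atoms(X, [range(2), range(4, 6)]))))  # [[0, 1], [2, 3], [4, 5]]
print([dual_shatter_function(X, family, n) for n in range(1, 4)])  # [2, 3, 4]
```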

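VC duality
Example ($X = \mathbb{R}^2$, $\mathcal{S}$ = half planes in $\mathbb{R}^2$): $\pi^*_{\mathcal{S}}(n)$ = maximum number of regions into which $n$ half planes partition the plane. Adding one half plane to $n - 1$ given half planes divides at most $n$ of the existing regions into 2 pieces (its boundary line meets the $n - 1$ existing boundary lines in at most $n - 1$ points, so it crosses at most $n$ regions). So $\pi^*_{\mathcal{S}}(n) \leq 1 + (1 + 2 + \cdots + n) = 1 + n(n+1)/2$, i.e. $\pi^*_{\mathcal{S}}(n) = O(n^2)$.
The function $\pi^*_{\mathcal{S}}$ is called the dual shatter function of $\mathcal{S}$, since (for infinite $\mathcal{S}$) one has $\pi^*_{\mathcal{S}} = \pi_{\mathcal{S}^*}$ for a certain set system $\mathcal{S}^*$ on $X^* = \mathcal{S}$, called the dual of $\mathcal{S}$.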
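The counting argument gives the recurrence $r(0) = 1$, $r(n) = r(n-1) + n$, hence $r(n) = 1 + n(n+1)/2$. The sketch below evaluates this bound. It then compares the bound with the number of distinct membership patterns seen at random sample points for random half planes $a x + b y \leq c$. The sampling part is only a sanity check under arbitrary assumptions (seed, box, number of samples). Every observed pattern is a non-empty region, so the sampled count can undercount but never exceed the bound.

```python
import random


def region_bound(n):
    """r(0) = 1 and r(n) = r(n - 1) + n, i.e. 1 + n(n + 1)/2 = O(n^2)."""
    return 1 + n * (n + 1) // 2


def sampled_regions(halfplanes, samples):
    """Number of distinct membership patterns among the sample points."""
    return len({tuple(a * x + b * y <= c for (a, b, c) in halfplanes)
                for (x, y) in samples})


random.seed(1)
n = 6
halfplanes = [(random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-0.5, 0.5))
              for _ in range(n)]
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(20000)]
print(sampled_regions(halfplanes, samples), "<=", region_bound(n))
print([region_bound(k) for k in range(5)])  # [1, 2, 4, 7, 11]
```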

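VC duality
Let $X$, $Y$ be infinite sets, $\Phi \subseteq X \times Y$ a binary relation. Put
$$\mathcal{S}_\Phi := \{ \Phi_y : y \in Y \} \subseteq 2^X \quad \text{where } \Phi_y := \{ x \in X : (x, y) \in \Phi \},$$
and
$$\pi_\Phi := \pi_{\mathcal{S}_\Phi}, \quad \pi^*_\Phi := \pi^*_{\mathcal{S}_\Phi}, \quad \operatorname{VC}(\Phi) := \operatorname{VC}(\mathcal{S}_\Phi), \quad \operatorname{vc}(\Phi) := \operatorname{vc}(\mathcal{S}_\Phi).$$
We also write $\Phi^* := \{ (y, x) \in Y \times X : (x, y) \in \Phi \} \subseteq Y \times X$. In this way we obtain two set systems: $(X, \mathcal{S}_\Phi)$ and $(Y, \mathcal{S}_{\Phi^*})$. Given a finite set $A \subseteq X$ we have a bijection
$$\mathcal{S}_\Phi \cap A \to S(\Phi^*_x : x \in A), \qquad A' \mapsto \bigcap_{x \in A'} \Phi^*_x \cap \bigcap_{x \in A \setminus A'} (Y \setminus \Phi^*_x).$$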
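The bijection can be sanity-checked on a small finite relation, since it does not use that $X$ and $Y$ are infinite. The sketch below assumes a random $\Phi \subseteq X \times Y$ with $|X| = 5$ and $|Y| = 12$. For every $A \subseteq X$, it verifies that the traces $\Phi_y \cap A$ and the non-empty atoms generated by the sets $\Phi^*_x$ ($x \in A$) are equinumerous: both are indexed by the patterns $(x \in \Phi_y)_{x \in A}$ realized by some $y$.

```python
import random
from itertools import combinations

random.seed(2)
X, Y = range(5), range(12)
Phi = {(x, y) for x in X for y in Y if random.random() < 0.5}


def fiber(y):
    """Phi_y = {x in X : (x, y) in Phi}."""
    return frozenset(x for x in X if (x, y) in Phi)


def dual_fiber(x):
    """Phi*_x = {y in Y : (x, y) in Phi}."""
    return frozenset(y for y in Y if (x, y) in Phi)


def trace_count(A):
    """|S_Phi ∩ A|."""
    return len({fiber(y) & frozenset(A) for y in Y})


def atom_count(A):
    """Number of non-empty atoms of the Boolean algebra on Y generated by Phi*_x, x in A."""
    return len({tuple(y in dual_fiber(x) for x in A) for y in Y})


all_A = [A for k in range(len(X) + 1) for A in combinations(X, k)]
print(all(trace_count(A) == atom_count(A) for A in all_A))  # True
```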

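VC duality
Hence $\pi_\Phi = \pi^*_{\Phi^*}$ and $\pi_{\Phi^*} = \pi^*_\Phi$, and thus
• $\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is dependent,
• $\mathcal{S}_{\Phi^*}$ is a VC class $\iff$ $\mathcal{S}_\Phi$ is dependent.
Moreover (first noticed by Assouad): $\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is a VC class.
Exploiting this VC duality one easily shows:
$$\operatorname{vc}(\neg\Phi) = \operatorname{vc}(\Phi), \qquad \operatorname{vc}(\Phi \cup \Psi) \leq \operatorname{vc}(\Phi) + \operatorname{vc}(\Psi), \qquad \operatorname{vc}(\Phi \cap \Psi) \leq \operatorname{vc}(\Phi) + \operatorname{vc}(\Psi).$$
VC does not satisfy similar subadditivity properties.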
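The same phenomena can also be checked directly on finite relations, using traces rather than duality. Since $(\neg\Phi)_y \cap A = A \setminus (\Phi_y \cap A)$, the relations $\neg\Phi$ and $\Phi$ have the same shatter function. A trace of $\Phi \cup \Psi$ (or $\Phi \cap \Psi$) on $A$ is determined by a pair of traces, so $\pi_{\Phi \cup \Psi} \leq \pi_\Phi \cdot \pi_\Psi$, and polynomial exponents add. The sketch below checks these relations on random finite relations. The sizes and seed are arbitrary assumptions, and $\neg\Phi$ is taken to be the complement of $\Phi$ in $X \times Y$.

```python
import random
from itertools import combinations


def shatter(rel, X, Y, n):
    """pi(n) for the set system {rel_y : y in Y} on the finite set X."""
    fibers = [frozenset(x for x in X if (x, y) in rel) for y in Y]
    return max(len({F & frozenset(A) for F in fibers}) for A in combinations(X, n))


random.seed(3)
X, Y = range(6), range(30)
full = {(x, y) for x in X for y in Y}
Phi = {p for p in full if random.random() < 0.5}
Psi = {p for p in full if random.random() < 0.5}

for n in range(len(X) + 1):
    p, q = shatter(Phi, X, Y, n), shatter(Psi, X, Y, n)
    print(n,
          shatter(full - Phi, X, Y, n) == p,     # complement: same shatter function
          shatter(Phi | Psi, X, Y, n) <= p * q,  # union
          shatter(Phi & Psi, X, Y, n) <= p * q)  # intersection
```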

The model-theoretic context
We fix:
• $L$: a first-order language,
• $x = (x_1, \dots, x_m)$: object variables,
• $y = (y_1, \dots, y_n)$: parameter variables,
• $\varphi(x; y)$: a partitioned $L$-formula,
• $M$: an infinite $L$-structure, and
• $T$: a complete $L$-theory without finite models.
The set system (on $M^m$) associated with $\varphi$ in $M$:
$$\mathcal{S}^M_\varphi := \{ \varphi^M(M^m; b) : b \in M^n \}.$$
If $M \equiv N$, then $\pi_{\mathcal{S}^M_\varphi} = \pi_{\mathcal{S}^N_\varphi}$. So, picking $M \models T$ arbitrary, set
$$\pi_\varphi := \pi_{\mathcal{S}^M_\varphi}, \qquad \operatorname{VC}(\varphi) := \operatorname{VC}(\mathcal{S}^M_\varphi), \qquad \operatorname{vc}(\varphi) := \operatorname{vc}(\mathcal{S}^M_\varphi).$$
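As a hypothetical illustration (not an example from the talk), take $L = \{<\}$, $M = (\mathbb{Q}, <)$ and $\varphi(x; y_1, y_2) := y_1 < x \wedge x < y_2$, so $\mathcal{S}^M_\varphi$ consists of the open intervals. Whether $b$ puts a point of a finite $A$ into the trace depends only on where $b_1, b_2$ sit relative to $A$, so it suffices to try one rational in each gap. All $n$-element subsets of $\mathbb{Q}$ have the same order type, so one $A$ per $n$ computes $\pi_\varphi(n)$. The sketch below finds $\pi_\varphi(n) = 1 + n(n+1)/2$, so $\operatorname{VC}(\varphi) = \operatorname{vc}(\varphi) = 2$.

```python
from fractions import Fraction
from itertools import product


def phi(x, b):
    """phi(x; y1, y2) := y1 < x < y2, evaluated in (Q, <)."""
    y1, y2 = b
    return y1 < x < y2


def gap_representatives(A):
    """One rational below A, one in each gap between consecutive elements, one above A."""
    a = sorted(A)
    return ([a[0] - 1]
            + [Fraction(a[i] + a[i + 1], 2) for i in range(len(a) - 1)]
            + [a[-1] + 1])


def trace_count(A):
    """|S_phi ∩ A|, trying one parameter value per gap for each of y1, y2."""
    reps = gap_representatives(A)
    return len({frozenset(x for x in A if phi(x, b)) for b in product(reps, repeat=2)})


print([trace_count(range(n)) for n in range(1, 6)])  # [2, 4, 7, 11, 16]
```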

The model-theoretic context
The dual of $\varphi(x; y)$ is $\varphi^*(y; x) := \varphi(x; y)$. Put $\operatorname{VC}^*(\varphi) := \operatorname{VC}(\varphi^*)$, $\operatorname{vc}^*(\varphi) := \operatorname{vc}(\varphi^*)$. We have $\pi^*_\varphi = \pi_{\varphi^*}$, hence $\operatorname{VC}^*(\varphi)$ and $\operatorname{vc}^*(\varphi)$ can be computed using the dual shatter function of $\varphi$.
If $\operatorname{VC}(\varphi) < \infty$ then we say that $\varphi$ is dependent in $T$. The theory $T$ does not have the independence property (is NIP) if every partitioned $L$-formula is dependent in $T$. An important theorem of Shelah (given other proofs by Laskowski and others) says that for $T$ to be NIP it is enough for every $L$-formula $\varphi(x; y)$ with $|x| = 1$ to be dependent. Many (but not all) well-behaved theories arising naturally in model theory are NIP.

The model-theoretic context
Some questions about vc in model theory
1. Possible values of $\operatorname{vc}(\varphi)$. There exists a formula $\varphi(x; y)$ in $L_{\text{rings}}$ with $|y| = 4$ such that
$$\operatorname{vc}_{\mathrm{ACF}_0}(\varphi) = \tfrac{4}{3}, \qquad \operatorname{vc}_{\mathrm{ACF}_p}(\varphi) = \tfrac{3}{2} \text{ for } p > 0.$$
We do not know an example of a formula $\varphi$ in a NIP theory with $\operatorname{vc}(\varphi) \notin \mathbb{Q}$.
2. Growth of $\pi_\varphi$. There is an example of an $\omega$-stable $T$ and an $L$-formula $\varphi(x; y)$ with $|y| = 2$ and $\pi_\varphi(n) = \tfrac{1}{2} n \log n \, (1 + o(1))$.
3. Uniform bounds on $\operatorname{vc}(\varphi)$. The topic of the rest of the talk.

Uniform bounds on VC density
Two extrinsic reasons why it should be interesting to obtain bounds on $\operatorname{vc}(\varphi)$ in terms of $|y|$ = number of free parameters:
1. Connections to strengthenings of the NIP concept: if $\operatorname{vc}(\varphi) < 2$ for each $\varphi(x; y)$ with $|y| = 1$ then $T$ is dp-minimal;
2. uniform bounds on VC density often "explain" why certain well-known bounds on the complexity of geometric arrangements, used in computational geometry, are polynomial in the number of objects involved.
Example ($L$ = language of rings, $K \models \mathrm{ACF}$) Choose $\varphi(x; y)$ so that $\mathcal{S}^K_\varphi$ is the collection of all zero sets (in $K^m$) of polynomials in $m$ indeterminates with coefficients in $K$ having degree at most $d$. Hence $\pi^*_\varphi(t)$ is the maximum number of non-empty atoms of the Boolean algebra generated by $t$ such hypersurfaces. Then $\pi^*_\varphi(t) = \pi_{\varphi^*}(t) = O(t^m)$.
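The case $m = 1$ of this example can be checked by hand. Over an algebraically closed $K$, the zero set of a non-zero polynomial of degree at most $d$ in one variable is a set of at most $d$ points. Every such set arises, from $\prod_i (x - a_i)$, or from a non-zero constant for $\emptyset$. The zero polynomial contributes all of $K$, which never increases the number of atoms. So $t$ zero sets generate at most $td + 1$ non-empty atoms: at most one per point of their union, plus the infinite remainder of $K$. This is consistent with $O(t^m)$ for $m = 1$. The sketch below checks the bound on random root sets; modelling zero sets directly as finite sets of integers is an assumption of the illustration.

```python
import random


def atom_count(zero_sets):
    """Non-empty atoms generated by finite subsets of an infinite field K.

    Points of the union give their membership patterns; the all-False pattern
    is always realized because K minus a finite set is non-empty.
    """
    union = set().union(*zero_sets)
    patterns = {tuple(p in Z for Z in zero_sets) for p in union}
    patterns.add(tuple(False for _ in zero_sets))
    return len(patterns)


random.seed(4)
d = 3
for t in range(1, 9):
    # Hypothetical zero sets: random root sets of size <= d from a small pool,
    # so that different polynomials share roots.
    zero_sets = [frozenset(random.sample(range(10), random.randint(0, d)))
                 for _ in range(t)]
    print(t, atom_count(zero_sets), "<=", t * d + 1)
```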
