SLIDE 1
Vapnik-Chervonenkis Density in Model Theory
Matthias Aschenbrenner University of California, Los Angeles
(joint with A. Dolich, D. Haskell, D. Macpherson, and S. Starchenko)
SLIDE 2 Outline
- VC dimension and VC density
- VC duality
- The model-theoretic context
- Uniform bounds on VC density
SLIDE 3 VC dimension and VC density
Let (X, S) be a set system:
- X is a set (the base set), most of the time assumed infinite;
- S is a collection of subsets of X.
We sometimes also speak of a set system S on X. Given A ⊆ X, we let S ∩ A := {S ∩ A : S ∈ S} and call (A, S ∩ A) the set system on A induced by S. We say A is shattered by S if S ∩ A = 2^A.
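For finite toy systems the definitions can be checked mechanically. The following is a minimal Python sketch; the helper names `trace` and `is_shattered` and the example family of initial segments are illustrative assumptions, not part of the talk.

```python
def trace(sets, A):
    """Return S ∩ A = {S ∩ A : S in sets} as a set of frozensets."""
    A = frozenset(A)
    return {frozenset(S) & A for S in sets}

def is_shattered(sets, A):
    """A is shattered iff every subset of A occurs as a trace, i.e. |S ∩ A| = 2^|A|."""
    return len(trace(sets, A)) == 2 ** len(set(A))

# Toy system on X = {0, 1, 2, 3}: the initial segments {0, ..., k-1}, k = 0..4.
segments = [set(range(k)) for k in range(5)]
print(is_shattered(segments, {1}))      # True: both {} and {1} occur as traces
print(is_shattered(segments, {1, 2}))   # False: {2} alone is never a trace
```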
SLIDE 4 VC dimension and VC density
If S ≠ ∅, then we define the VC dimension of S, denoted by VC(S), as the supremum (in N ∪ {∞}) of the sizes of all finite subsets of X shattered by S. We also decree VC(∅) := −∞.
Examples
1 X = R, S = all unbounded intervals. Then VC(S) = 2.
2 X = R^2, S = all half spaces. Then VC(S) = 3.
[Figure: two point configurations in the plane, labeled "One point in the convex hull" and "No point in the convex hull".]
3 Let S = half spaces in R^d. Then VC(S) = d + 1.
(The inequality ≤ follows from Radon’s Lemma.)
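On a finite base set the VC dimension can be found by brute force. The sketch below does this for Example 1, with R replaced by the finite sample X = {0, ..., 5}. This replacement is an assumption made to keep the search finite, and in general a finite sample only gives a lower bound. The function name `vc_dimension` is illustrative.

```python
from itertools import combinations

def vc_dimension(sets, X):
    """Largest size of a subset of the finite set X shattered by `sets` (-inf if empty)."""
    sets = [frozenset(S) for S in sets]
    if not sets:
        return float("-inf")
    best = 0  # the empty set is shattered by any non-empty system
    for k in range(1, len(X) + 1):
        if any(len({S & frozenset(A) for S in sets}) == 2 ** k
               for A in combinations(X, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so no larger set works
    return best

X = list(range(6))
# Traces of the unbounded intervals on X: left rays {x < c} and right rays {x >= c}.
rays = ([{x for x in X if x < c} for c in range(7)]
        + [{x for x in X if x >= c} for c in range(7)])
print(vc_dimension(rays, X))  # 2
```

Two points a < b are shattered by rays, but for three points the middle one can never be picked out alone, which is why the search stops at 2.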
SLIDE 5
VC dimension and VC density
Examples (continued)
4 X = R^2, S = all convex polygons. Then VC(S) = ∞.
(But VC({convex n-gons in R^2}) = 2n + 1.)
SLIDE 6 VC dimension and VC density
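The function
$n \mapsto \pi_S(n) := \max\{ |S \cap A| : A \in \binom{X}{n} \}$
(where $\binom{X}{n}$ denotes the set of n-element subsets of X) is called the shatter function of S. Then
$\mathrm{VC}(S) = \sup\{ n : \pi_S(n) = 2^n \}$.
One says that S is a VC class if VC(S) < ∞. The notion of VC dimension was introduced by Vladimir Vapnik and Alexey Chervonenkis in the early 1970s, in the context of computational learning theory.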
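As a small illustration, the shatter function of a finite system can be tabulated directly from its definition. The sketch below uses the traces of unbounded intervals on the sample X = {0, ..., 5}; the sample and the name `shatter_function` are assumptions made for illustration.

```python
from itertools import combinations

def shatter_function(sets, X, n):
    """pi_S(n): the largest number of distinct traces on an n-element subset of X."""
    sets = [frozenset(S) for S in sets]
    return max(len({S & frozenset(A) for S in sets}) for A in combinations(X, n))

X = list(range(6))
rays = ([{x for x in X if x < c} for c in range(7)]
        + [{x for x in X if x >= c} for c in range(7)])
values = [shatter_function(rays, X, n) for n in range(5)]
print(values)                                               # [1, 2, 4, 6, 8]
print(max(n for n, v in enumerate(values) if v == 2 ** n))  # 2
```

The last line recovers the VC dimension as the largest n with shatter value 2^n, here within the tabulated range.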
SLIDE 7 VC dimension and VC density
A surprising dichotomy holds for π_S:
Lemma (Sauer-Shelah)
If VC(S) = d < ∞ (so π_S(n) < 2^n for n > d) then
$\pi_S(n) \le \binom{n}{\le d} := \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{d}$.
An illuminating proof of this lemma is due to Frankl: it is enough to show that if S is a set system on a finite set X, then
$S \cap B \ne 2^B$ for all $B \in \binom{X}{d+1}$ $\Longrightarrow$ $|S| \le \binom{|X|}{\le d}$.
This claim is trivially true if S is assumed to be an ideal (i.e., closed under taking subsets). One then shows that there exists an ideal T on X with |S| = |T| and |S ∩ B| ≥ |T ∩ B| for all B.
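One standard way to produce such an ideal T is repeated "down-shifting". The talk does not spell out the construction, so the sketch below is an assumption about the method, and the names `down_shift`, `is_ideal` and `trace_size` are illustrative. It checks on a tiny family that the result is an ideal of the same size whose traces are never larger.

```python
from itertools import combinations

def down_shift(family, X):
    """Repeatedly replace S by S - {x} whenever that smaller set is not yet present."""
    fam = {frozenset(S) for S in family}
    changed = True
    while changed:
        changed = False
        for x in X:
            new = set()
            for S in fam:
                T = S - {x}
                if x in S and T not in fam:
                    new.add(T)       # shift S down; the map is injective, so size is kept
                    changed = True
                else:
                    new.add(S)
            fam = new
    return fam

def is_ideal(fam):
    """Closed under removing single elements, hence under taking subsets."""
    return all(S - {x} in fam for S in fam for x in S)

def trace_size(fam, B):
    return len({frozenset(S) & frozenset(B) for S in fam})

X = [0, 1, 2]
F = [{0, 1}, {1, 2}, {2}, {0, 1, 2}]
T = down_shift(F, X)
print(sorted(map(sorted, T)))        # [[], [0], [1], [2]]
print(len(T) == len(F), is_ideal(T)) # True True
print(all(trace_size(T, B) <= trace_size(F, B)
          for k in range(len(X) + 1) for B in combinations(X, k)))  # True
```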
SLIDE 8 VC dimension and VC density
The Sauer-Shelah dichotomy
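Either
- π_S(n) = 2^n for every n (if S is not a VC class),
or
- π_S(n) = O(n^d) where d = VC(S) < ∞.
One may now define the VC density of S as
$\mathrm{vc}(S) = \begin{cases} \inf\{ r \in \mathbb{R}^{>0} : \pi_S(n) = O(n^r) \} & \text{if } \mathrm{VC}(S) < \infty, \\ \infty & \text{otherwise} \end{cases}$
$= \limsup_{n \to \infty} \frac{\log \pi_S(n)}{\log n} \in \mathbb{R}^{\ge 0} \cup \{\infty\}$.
We also define vc(∅) := −∞.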
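A short worked example, using the system of unbounded intervals of R from Example 1 on Slide 4, shows that VC density can be strictly smaller than VC dimension:

```latex
% S = all unbounded intervals of R, and x_1 < ... < x_n are n >= 1 points.
% The traces on {x_1, ..., x_n} are its n + 1 initial segments and its
% n + 1 final segments; the empty set and the full set are counted twice.
\pi_S(n) = (n + 1) + (n + 1) - 2 = 2n \quad (n \ge 1),
\qquad
\mathrm{vc}(S) = \limsup_{n \to \infty} \frac{\log(2n)}{\log n} = 1 < 2 = \mathrm{VC}(S).
```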
SLIDE 9 VC dimension and VC density
Examples
1 $S = \binom{X}{d}$. Then VC(S) = vc(S) = d; in fact $\pi_S(n) = \binom{n}{\le d}$.
2 S = half spaces in R^d. Then VC(S) = d + 1 [as seen above] and vc(S) = d.
Some basic properties
- vc(S) ≤ VC(S), and if one is finite then so is the other;
- VC(S) = 0 ⟺ |S| = 1;
- for S ≠ ∅: S is finite ⟺ vc(S) = 0 ⟺ vc(S) < 1;
- S = S1 ∪ S2 ⇒ vc(S) = max{vc(S1), vc(S2)}.
(So vc(S) doesn’t change if we alter finitely many sets of S.)
SLIDE 10
VC dimension and VC density
VC density is often the right measure for the combinatorial complexity of a set system. For example, it is related to packing numbers and entropy.
Definition
Let (S, d) be a bounded pseudo-metric space, and ε > 0.
1 D ⊆ S is an ε-packing if d(a, b) > ε for all a ≠ b in D;
2 the ε-packing number of (S, d) is
D(S, d; ε) := max{|D| : D ⊆ S is a finite ε-packing};
3 the entropic dimension of (S, d) is
dim(S, d) := inf{s ∈ R>0 : ∃C > 0 : ∀ε > 0 : D(S, d; ε) ≤ Cε^{−s}}.
SLIDE 11
VC dimension and VC density
If (X, A, µ) is a probability space, then we equip A with the (bounded) pseudo-metric d_µ(A, B) := µ(A △ B).
Theorem (Dudley, Assouad)
$\mathrm{vc}(S) = \sup_{\mu} \dim(S, d_\mu)$,
where the supremum ranges over all probability measures µ on X making all sets in S measurable. There is a refinement of the inequality vc(S) ≥ dim(S, d_µ) for µ concentrated uniformly on a finite set (Haussler, Wernisch): D(S, d_µ; ε) ≤ Cε^{−ρ} for all ε > 0, where C only depends on π_S (not on (X, S), µ, . . . ).
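For µ uniform on a finite set, packing numbers of a tiny family can be computed by exhaustive search. The following sketch does this; the function names and the example family of initial segments are assumptions for illustration, and the search is exponential, so it is only meant for toy inputs.

```python
from itertools import combinations

def d_uniform(A, B, X):
    """d_mu(A, B) = mu(A symmetric-difference B), mu uniform on the finite set X."""
    return len(set(A) ^ set(B)) / len(X)

def packing_number(sets, X, eps):
    """Largest size of an eps-packing, found by trying all subfamilies."""
    sets = list({frozenset(S) for S in sets})
    for k in range(len(sets), 0, -1):
        for D in combinations(sets, k):
            if all(d_uniform(a, b, X) > eps for a, b in combinations(D, 2)):
                return k
    return 0

X = range(8)
segments = [set(range(k)) for k in range(9)]   # initial segments {0, ..., k-1}
print(packing_number(segments, X, 0.25))    # 3
print(packing_number(segments, X, 0.125))   # 5
```

Two initial segments of lengths i and j are at distance |i − j|/8, so halving ε roughly doubles the packing number, matching growth of order ε^{−1} for this family.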
SLIDE 12 VC duality
Let X be a set (possibly finite). Given A_1, …, A_n ⊆ X, denote by S(A_1, …, A_n) the set of atoms of the Boolean subalgebra of 2^X generated by A_1, …, A_n: those subsets of X of the form
$\bigcap_{i \in I} A_i \cap \bigcap_{i \in [n] \setminus I} (X \setminus A_i)$, where $I \subseteq [n] = \{1, \dots, n\}$,
which are non-empty. Suppose now that S is a set system on X. We define
$n \mapsto \pi^*_S(n) := \max\{ |S(A_1, \dots, A_n)| : A_1, \dots, A_n \in S \} : \mathbb{N} \to \mathbb{N}$.
We say that S is independent (in X) if π*_S(n) = 2^n for every n, and dependent (in X) otherwise.
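On a finite base set, the atoms and the dual shatter function can be computed by grouping points according to which of the chosen sets contain them. This is a minimal sketch; the names `atoms` and `dual_shatter` and the chain of initial segments are illustrative assumptions.

```python
from itertools import combinations

def atoms(X, sets):
    """Non-empty atoms of the Boolean algebra generated by `sets` inside X.
    Two points lie in the same atom iff they belong to exactly the same members of `sets`."""
    classes = {}
    for x in X:
        key = tuple(x in A for A in sets)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def dual_shatter(X, family, n):
    """pi*_S(n) for a finite family with at least n members.
    Repeating a set never creates new atoms, so choices of distinct sets suffice."""
    return max(len(atoms(X, choice)) for choice in combinations(family, n))

X = range(6)
segments = [set(range(c)) for c in range(1, 6)]   # {0}, {0,1}, ..., {0,...,4}
print([dual_shatter(X, segments, n) for n in range(4)])   # [1, 2, 3, 4]
```

Since the value at n = 2 is 3 < 4, this toy family is dependent.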
SLIDE 13 VC duality
Example (X = R^2, S = half planes in R^2)
π*_S(n) = maximum number of regions into which n half planes partition the plane. Adding one half plane to n − 1 given half planes divides at most n of the existing regions into 2 pieces. So π*_S(n) = O(n^2).
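Spelled out, the counting argument is a one-line recurrence, where r(n) is shorthand (introduced here only for this computation) for the dual shatter function of half planes:

```latex
% r(n) := \pi^*_S(n) for half planes in R^2; r(0) = 1 (the whole plane).
r(n) \le r(n-1) + n
\quad\Longrightarrow\quad
r(n) \le 1 + \sum_{k=1}^{n} k = 1 + \frac{n(n+1)}{2} = O(n^2).
```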
The function π*_S is called the dual shatter function of S, since (for infinite S) one has π*_S = π_{S*} for a certain set system S* on X* = S, called the dual of S.
SLIDE 14 VC duality
Let X, Y be infinite sets, Φ ⊆ X × Y a binary relation. Put
S_Φ := {Φ_y : y ∈ Y} ⊆ 2^X where Φ_y := {x ∈ X : (x, y) ∈ Φ},
and
π_Φ := π_{S_Φ}, π*_Φ := π*_{S_Φ}, VC(Φ) := VC(S_Φ), vc(Φ) := vc(S_Φ).
We also write
Φ* := {(y, x) ∈ Y × X : (x, y) ∈ Φ} ⊆ Y × X.
In this way we obtain two set systems: (X, S_Φ) and (Y, S_{Φ*}). Given a finite set A ⊆ X we have a bijection
$A' \mapsto \bigcap_{x \in A'} \Phi^*_x \cap \bigcap_{x \in A \setminus A'} (Y \setminus \Phi^*_x) : S_\Phi \cap A \to S(\Phi^*_x : x \in A)$.
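The bijection can be checked on a finite toy relation. In the sketch below the relation x < y on {0, ..., 4} and all function names are assumptions chosen for illustration. Each trace A' on A is sent to the set of y lying in Φ*_x for x in A' and outside it for the remaining x in A.

```python
from itertools import combinations

def traces(Phi, Y, A):
    """S_Phi ∩ A: all sets {x in A : (x, y) in Phi} for y in Y."""
    return {frozenset(x for x in A if (x, y) in Phi) for y in Y}

def atom(Phi, Y, A, A1):
    """The y lying in Phi*_x for every x in A1 and in no Phi*_x for x in A - A1."""
    return frozenset(y for y in Y
                     if all((x, y) in Phi for x in A1)
                     and all((x, y) not in Phi for x in set(A) - set(A1)))

def nonempty_atoms(Phi, Y, A):
    """S(Phi*_x : x in A), found by trying every subset A1 of A."""
    A = list(A)
    candidates = (atom(Phi, Y, A, A1)
                  for k in range(len(A) + 1) for A1 in combinations(A, k))
    return {a for a in candidates if a}

X = Y = range(5)
Phi = {(x, y) for x in X for y in Y if x < y}
A = [0, 2, 3]
image = {atom(Phi, Y, A, A1) for A1 in traces(Phi, Y, A)}
print(image == nonempty_atoms(Phi, Y, A))     # True: the map is onto the atoms
print(len(image) == len(traces(Phi, Y, A)))   # True: and it is injective
```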
SLIDE 15
VC duality
Hence π_Φ = π*_{Φ*} and π_{Φ*} = π*_Φ, and thus
- S_Φ is a VC class ⟺ S_{Φ*} is dependent,
- S_{Φ*} is a VC class ⟺ S_Φ is dependent.
Moreover (first noticed by Assouad): S_Φ is a VC class ⟺ S_{Φ*} is a VC class.
Exploiting this VC duality one easily shows:
- vc(¬Φ) = vc(Φ),
- vc(Φ ∪ Ψ) ≤ vc(Φ) + vc(Ψ),
- vc(Φ ∩ Ψ) ≤ vc(Φ) + vc(Ψ).
VC does not satisfy similar subadditivity properties.
SLIDE 16 The model-theoretic context
We fix:
- L: a first-order language,
- x = (x_1, …, x_m): object variables,
- y = (y_1, …, y_n): parameter variables,
- ϕ(x; y): a partitioned L-formula,
- M: an infinite L-structure, and
- T: a complete L-theory without finite models.
The set system (on M^m) associated with ϕ in M:
S^M_ϕ := {ϕ^M(M^m; b) : b ∈ M^n}.
If M ≡ N, then π_{S^M_ϕ} = π_{S^N_ϕ}. So, picking M ⊨ T arbitrary, set
π_ϕ := π_{S^M_ϕ}, VC(ϕ) := VC(S^M_ϕ), vc(ϕ) := vc(S^M_ϕ).
SLIDE 17
The model-theoretic context
The dual of ϕ(x; y) is ϕ*(y; x) := ϕ(x; y). Put VC*(ϕ) := VC(ϕ*), vc*(ϕ) := vc(ϕ*). We have π*_ϕ = π_{ϕ*}, hence VC*(ϕ) and vc*(ϕ) can be computed using the dual shatter function of ϕ. If VC(ϕ) < ∞ then we say that ϕ is dependent in T. The theory T does not have the independence property (is NIP) if every partitioned L-formula is dependent in T. An important theorem of Shelah (given other proofs by Laskowski and others) says that for T to be NIP it is enough for every L-formula ϕ(x; y) with |x| = 1 to be dependent. Many (but not all) well-behaved theories arising naturally in model theory are NIP.
SLIDE 18
The model-theoretic context
Some questions about vc in model theory
1 Possible values of vc(ϕ). There exists a formula ϕ(x; y) in L_rings with |y| = 4 such that vc^{ACF_0}(ϕ) = 4/3 and vc^{ACF_p}(ϕ) = 3/2 for p > 0. We do not know an example of a formula ϕ in a NIP theory with vc(ϕ) ∉ Q.
2 Growth of π_ϕ. There is an example of an ω-stable T and an L-formula ϕ(x; y) with |y| = 2 and π_ϕ(n) = (1/2) n log n (1 + o(1)).
3 Uniform bounds on vc(ϕ). The topic of the rest of the talk.
SLIDE 19 Uniform bounds on VC density
Two extrinsic reasons why it should be interesting to obtain bounds on vc(ϕ) in terms of |y| = number of free parameters:
1 Connections to strengthenings of the NIP concept: if
vc(ϕ) < 2 for each ϕ(x; y) with |y| = 1 then T is dp-minimal;
2 uniform bounds on VC density often “explain” why certain
well-known bounds on the complexity of geometric arrangements, used in computational geometry, are polynomial in the number of objects involved.
Example (L = language of rings, K ⊨ ACF)
Choose ϕ(x; y) so that S^K_ϕ is the collection of all zero sets (in K^m) of polynomials in m indeterminates with coefficients in K having degree at most d. Hence π*_ϕ(t) is the maximum number of non-empty Boolean combinations of t such hypersurfaces. Then π*_ϕ(t) = π_{ϕ*}(t) = O(t^m).
SLIDE 20 Uniform bounds on VC density
The VC density of T is the function vc = vc^T : N → R≥0 ∪ {∞},
vc(m) := sup{vc(ϕ) : ϕ(x; y) is an L-formula with |y| = m}
= sup{vc*(ϕ) : ϕ(x; y) is an L-formula with |x| = m}.
It is sometimes convenient to deal with finitely many formulas:
- ∆(x; y): a finite non-empty set of partitioned L-formulas;
- S^∆(B): the set of complete ∆(x; B)-types in M (B ⊆ M^{|y|}).
If T is NIP then we set
vc*(∆) := inf{r ∈ R>0 : |S^∆(B)| = O(|B|^r) for all finite B ⊆ M^{|y|}}.
Then we have (using coding tricks)
vc^T(m) = sup{vc*(∆) : ∆ = ∆(x; y) finite, |x| = m}.
SLIDE 21 Uniform bounds on VC density
Definition (adapted from Guingona)
∆ has uniform definability of types over finite sets (UDTFS) in M if there are finitely many families of L-formulas
F_i = (ϕ_i(y; y_1, …, y_m))_{ϕ∈∆} (i ∈ I)
such that for every finite B ⊆ M^{|y|} and q ∈ S^∆(B) there are b_1, …, b_m ∈ B, i ∈ I such that F_i(y; b_1, …, b_m) defines q. If we don’t care about the number of extra parameters m, then we can always achieve |I| = 1, and we can always reduce to |∆| = 1. (At least for |B| ≥ 2.) On the other hand: if ∆ has a uniform definition F = (F_i)_{i∈I} for ∆-types with m parameters, then |S^∆(B)| ≤ |I| · |B|^m for every finite B.
SLIDE 22
Uniform bounds on VC density
Definition
M has the VC m property if any ∆(x; y) with |x| = 1 has a uniform definition of ∆-types over finite sets with m parameters.
Theorem
Suppose that M has the VC m property. Then every ∆(x; y) has a uniform definition of ∆-types over finite sets in M with m|x| parameters. Hence vc*(∆) ≤ m|x|, so n ≤ vc^T(n) ≤ m · n for every n. In particular, if m = 1 then vc^T(n) = n for every n. We now give some examples of theories with the VC m property.
SLIDE 23
Uniform bounds on VC density
Theorem
Suppose M is an expansion of a linearly ordered set. If T is weakly o-minimal, then T has the VC 1 property.
(Generalizes earlier results due to Karpinski-Macintyre and Wilkie.)
Sketch of the proof.
Let M ⊨ T and ∆(x; y) be a finite non-empty set of L-formulas with |x| = 1. We let ϕ range over ∆ and b over M^{|y|}. If for each ϕ and b, the set ϕ(M; b) is an initial segment of M, then clearly ∆ has UDTFS with a single parameter. In general, there is some N such that for each ϕ and b, ϕ(M; b) has at most N convex components, and hence is a Boolean combination of 2N initial segments of M (uniformly in b). Forming Boolean combinations preserves UDTFS.
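The base case of the sketch can be illustrated in the simplest setting, which is assumed here and not taken from the talk: the structure (Q, <) with the single formula ϕ(x; y) := x < y, so that each ϕ(M; b) is an initial segment. The two defining schemes `F_gt` and `F_ge` named in the comments are hypothetical, as is the helper `phi_types`.

```python
from fractions import Fraction

def phi_types(B, xs):
    """phi-types over B realized by the points xs, for phi(x; y) := x < y.
    A type is recorded as the set {b in B : x < b}."""
    return {frozenset(b for b in B if x < b) for x in xs}

B = [Fraction(1), Fraction(3), Fraction(4), Fraction(7)]
# Sample points of Q: the elements of B, one point in each gap, one beyond each end.
s = sorted(B)
xs = s + [(a + b) / 2 for a, b in zip(s, s[1:])] + [s[0] - 1, s[-1] + 1]
types = phi_types(B, xs)

# Two hypothetical defining schemes, each with a single parameter b from B:
#   F_gt(y; b): y > b      and      F_ge(y; b): y >= b
defined = ({frozenset(y for y in B if y > b) for b in B}
           | {frozenset(y for y in B if y >= b) for b in B})

print(len(types))                 # 5, i.e. |B| + 1
print(types <= defined)           # True: each type is defined by one of the schemes
print(len(types) <= 2 * len(B))   # True: the bound |I| * |B|^m with |I| = 2, m = 1
```

The second scheme is only needed for the type of a point below all of B, which is why the count is |B| + 1 rather than |B|.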
SLIDE 24
Uniform bounds on VC density
Interesting classes of NIP theories are provided by various types of valued fields. A non-trivial elaboration of our methods yields the following (probably non-optimal) result:
Theorem
Suppose M = Q_p is the field of p-adic numbers, construed as a first-order structure in the language of rings. Then T has the VC 2 property; moreover, vc^T(m) ≤ 2m − 1 for every m. The same result holds for the subanalytic expansions of Q_p considered by Denef and van den Dries. Key tools are cell decomposition and the existence of definable Skolem functions. We also have results stating that certain stable theories T have VC density vc^T growing linearly, not obtained via the VC m property.
SLIDE 25
Uniform bounds on VC density
Theorem
Let A be an infinite abelian group, T = Th(A). The following are equivalent:
1 vc^T(1) < ∞;
2 there is some d such that vc^T(m) ≤ dm for every m;
3 there are only finitely many p such that A[p] or A/pA is infinite, and for all p there are only finitely many n such that U(p, n; A) = |(p^n A)[p]/(p^{n+1} A)[p]| is infinite.
The proof involves some combinatorics with distributive lattices. As an upshot of the proof we are able to determine the complete theories of all dp-minimal abelian groups.
SLIDE 26
Uniform bounds on VC density
A general theorem is:
Theorem
Suppose T does not have the finite cover property and has finite U-rank. Then vc^T(m) ≤ m U(T) for every m.
Cases where the theorem applies:
1 T is countable, totally transcendental, and ℵ0-categorical.
2 T is totally transcendental and unidimensional; e.g., T is countable and ℵ1-categorical, or T is strongly minimal.
3 T is an expansion of the theory of groups with MR(T) < ω.
SLIDE 27
Uniform bounds on VC density
There are many open questions in this subject. Let me finish with a particularly attractive one:
Open question
If vc^T(1) < ∞, is vc^T(m) < ∞ for every m? If true, we would have a “VC density version” of Laskowski’s Theorem (= the constructive version of Shelah’s theorem that T is NIP if VC(ϕ) < ∞ for all ϕ(x; y) with |x| = 1).