Vapnik-Chervonenkis Density in Model Theory, Matthias Aschenbrenner (PowerPoint presentation)

SLIDE 1

Vapnik-Chervonenkis Density in Model Theory

Matthias Aschenbrenner University of California, Los Angeles

(joint with A. Dolich, D. Haskell, D. Macpherson, and S. Starchenko)

SLIDE 2

Outline

  • VC dimension and VC density
  • VC duality
  • The model-theoretic context
  • Uniform bounds on VC density
SLIDE 3

VC dimension and VC density

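Let $(X, \mathcal{S})$ be a set system:

  • $X$ is a set (the base set), most of the time assumed infinite;
  • $\mathcal{S}$ is a collection of subsets of $X$.

We sometimes also speak of a set system $\mathcal{S}$ on $X$. Given $A \subseteq X$, we let $\mathcal{S} \cap A := \{S \cap A : S \in \mathcal{S}\}$ and call $(A, \mathcal{S} \cap A)$ the set system on $A$ induced by $\mathcal{S}$. We say $A$ is shattered by $\mathcal{S}$ if $\mathcal{S} \cap A = 2^A$, i.e., every subset of $A$ is cut out by some member of $\mathcal{S}$.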

SLIDE 4

VC dimension and VC density

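If $\mathcal{S} \neq \emptyset$, then we define the VC dimension of $\mathcal{S}$, denoted by $\mathrm{VC}(\mathcal{S})$, as the supremum (in $\mathbb{N} \cup \{\infty\}$) of the sizes of all finite subsets of $X$ shattered by $\mathcal{S}$. We also decree $\mathrm{VC}(\emptyset) := -\infty$.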

Examples

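1. $X = \mathbb{R}$, $\mathcal{S}$ = all unbounded intervals. Then $\mathrm{VC}(\mathcal{S}) = 2$. (Two points are shattered, but for three points $p < q < r$ no unbounded interval contains $p$ and $r$ without $q$.)
2. $X = \mathbb{R}^2$, $\mathcal{S}$ = all half-planes. Then $\mathrm{VC}(\mathcal{S}) = 3$.

[Figure: two configurations of four points in the plane, "one point in the convex hull of the others" and "no point in the convex hull of the others"; in neither case do half-planes shatter the four points.]

3. Let $\mathcal{S}$ = all half-spaces in $\mathbb{R}^d$. Then $\mathrm{VC}(\mathcal{S}) = d + 1$.

(The inequality $\mathrm{VC}(\mathcal{S}) \leq d + 1$ follows from Radon's Lemma.)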
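Radon's Lemma says that any $d + 2$ points in $\mathbb{R}^d$ split into two parts $I$, $J$ whose convex hulls share a point $r$. A half-space and its complement are both convex, so a half-space containing the points of $I$ but none of $J$ would have to both contain and miss $r$. Hence no $(d+2)$-element set is shattered. The sketch below is our own illustration for $d = 2$, with hypothetical helper names. It assumes integer or rational coordinates and four points not all on one line. It finds an affine dependence $\sum_i \lambda_i p_i = 0$, $\sum_i \lambda_i = 0$ from $3 \times 3$ minors and splits the points by the sign of $\lambda_i$.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def radon_partition(points):
    """Radon partition of four points in the plane (hypothetical helper).

    Returns (I, J, r): index lists I, J partitioning range(4) and a point r
    lying in both conv{p_i : i in I} and conv{p_j : j in J}.
    Assumes integer or Fraction coordinates, not all four points collinear.
    """
    # Columns of the 3x4 matrix M whose rows are (x_i), (y_i) and (1, 1, 1, 1).
    cols = [(x, y, 1) for x, y in points]
    lam = []
    for j in range(4):
        # Signed 3x3 minor obtained by deleting column j. The vector lam of
        # these cofactors satisfies M lam = 0, i.e. sum(lam_i * p_i) = 0 and
        # sum(lam_i) = 0: an affine dependence among the four points.
        rest = [c for k, c in enumerate(cols) if k != j]
        rows = [[c[r] for c in rest] for r in range(3)]
        lam.append((-1) ** j * det3(rows))
    if all(v == 0 for v in lam):
        raise ValueError("collinear points are not handled by this sketch")
    I = [i for i in range(4) if lam[i] > 0]
    J = [i for i in range(4) if lam[i] <= 0]
    total = sum(lam[i] for i in I)  # equals -sum(lam[j] for j in J) and is > 0
    # Convex combination of the points in I; the same point arises from J.
    r = tuple(Fraction(sum(lam[i] * points[i][k] for i in I), total) for k in range(2))
    return I, J, r


I, J, r = radon_partition([(0, 0), (1, 0), (1, 1), (0, 1)])
print(I, J, r)  # [0, 2] [1, 3] (Fraction(1, 2), Fraction(1, 2))
```

For the unit square, the two parts are the diagonals and $r$ is the centre. This matches the "no point in the convex hull of the others" case of the figure.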

SLIDE 5

VC dimension and VC density

Examples (continued)

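4. $X = \mathbb{R}^2$, $\mathcal{S}$ = all convex polygons. Then $\mathrm{VC}(\mathcal{S}) = \infty$: any finite set $A$ of points on a circle is shattered, since for each $B \subseteq A$ some convex polygon contains $\operatorname{conv}(B)$ and misses $A \setminus B$.

(But $\mathrm{VC}(\{\text{convex } n\text{-gons in } \mathbb{R}^2\}) = 2n + 1$.)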

SLIDE 6

VC dimension and VC density

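The function
$$\pi_{\mathcal{S}}(n) := \max\left\{ |\mathcal{S} \cap A| : A \in \binom{X}{n} \right\}, \qquad \pi_{\mathcal{S}} : \mathbb{N} \to \mathbb{N},$$
where $\binom{X}{n}$ denotes the set of $n$-element subsets of $X$, is called the shatter function of $\mathcal{S}$. Then $\mathrm{VC}(\mathcal{S}) = \sup\{ n : \pi_{\mathcal{S}}(n) = 2^n \}$. One says that $\mathcal{S}$ is a VC class if $\mathrm{VC}(\mathcal{S}) < \infty$. The notion of VC dimension was introduced by Vladimir Vapnik and Alexey Chervonenkis in the early 1970s, in the context of computational learning theory.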
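On a finite base set these definitions can be checked by brute force. The sketch below uses hypothetical helper names and is exponential in the size of the base, so it is only meant for toy examples. It computes traces $\mathcal{S} \cap A$, the shatter function and the VC dimension. It then applies them to example 1 above, with the unbounded intervals restricted to six points of $\mathbb{R}$.

```python
from itertools import combinations


def trace(system, subset):
    """The induced system S ∩ A = {S ∩ A : S in system}, as a set of frozensets."""
    a = frozenset(subset)
    return {frozenset(s) & a for s in system}


def shatter_function(system, base, n):
    """pi_S(n): the maximum of |S ∩ A| over all n-element subsets A of base."""
    return max(len(trace(system, a)) for a in combinations(base, n))


def vc_dimension(system, base):
    """Largest n such that some n-element subset of base is shattered."""
    if not system:
        return float("-inf")  # the convention VC(∅) = -∞
    d = 0
    for n in range(len(base) + 1):
        if shatter_function(system, base, n) < 2 ** n:
            break  # subsets of shattered sets are shattered, so we can stop here
        d = n
    return d


# Unbounded intervals traced on six points of R: the rays (-inf, a] and [a, +inf).
base = list(range(6))
rays = ([{x for x in base if x <= a} for a in range(-1, 6)]
        + [{x for x in base if x >= a} for a in range(0, 7)])
print([shatter_function(rays, base, n) for n in range(7)])  # [1, 2, 4, 6, 8, 10, 12]
print(vc_dimension(rays, base))  # 2
```

Because the base is a finite sample, this computes the shatter function of the induced system on that sample. It agrees with $\pi_{\mathcal{S}}(n)$ only when the sample is rich enough, as it is here for $n \leq 6$.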

SLIDE 7

VC dimension and VC density

A surprising dichotomy holds for πS:

Lemma (Sauer-Shelah)

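If $\mathrm{VC}(\mathcal{S}) = d < \infty$ (so $\pi_{\mathcal{S}}(n) < 2^n$ for $n > d$), then
$$\pi_{\mathcal{S}}(n) \leq \binom{n}{\leq d} := \binom{n}{0} + \cdots + \binom{n}{d} \quad \text{for every } n.$$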
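A short numerical sketch of the lemma follows. The helper name `binom_le` is hypothetical. It tabulates $\binom{n}{\leq d}$ against $2^n$: the two agree for $n \leq d$, after which the bound grows only polynomially. For comparison, the unbounded intervals of example 1 ($d = 2$) have $\pi_{\mathcal{S}}(n) = 2n$ for $n \geq 1$. Their traces on $n$ points are the $n + 1$ initial and $n + 1$ final segments, and the empty set and the whole set appear in both lists. This stays below the bound, as the lemma requires.

```python
from math import comb


def binom_le(n, d):
    """The Sauer-Shelah bound C(n, 0) + C(n, 1) + ... + C(n, d)."""
    return sum(comb(n, i) for i in range(d + 1))


d = 2
for n in range(11):
    bound = binom_le(n, d)
    rays = 2 * n if n >= 1 else 1  # shatter function of unbounded intervals in R
    assert rays <= bound <= 2 ** n
    print(n, rays, bound, 2 ** n)
```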

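An illuminating proof of this lemma is due to Frankl: it is enough to show that if $\mathcal{S}$ is a set system on a finite set $X$, then
$$\mathcal{S} \cap B \neq 2^B \text{ for all } B \in \binom{X}{d+1} \quad \Longrightarrow \quad |\mathcal{S}| \leq \binom{|X|}{\leq d}.$$

This claim is trivially true if $\mathcal{S}$ is assumed to be an ideal (i.e., closed under taking subsets): an ideal shatters every set it contains, so each of its members has at most $d$ elements. One then shows that there exists an ideal $\mathcal{T}$ on $X$ with $|\mathcal{S}| = |\mathcal{T}|$ and $|\mathcal{S} \cap B| \geq |\mathcal{T} \cap B|$ for all $B \subseteq X$. Then $\mathcal{T}$ shatters no $(d+1)$-element set either, and $|\mathcal{S}| = |\mathcal{T}| \leq \binom{|X|}{\leq d}$.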
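Frankl's ideal $\mathcal{T}$ can be produced by down-shifting. In the sketch below, the names `down_shift`, `compress` and `trace_size` are our own hypothetical helpers. The shift $T_x$ replaces a set $G$ by $G \setminus \{x\}$ whenever $x \in G$ and $G \setminus \{x\}$ is not already in the family. Each shift is injective, so the size of the family is preserved, and it never increases $|\mathcal{S} \cap B|$. Iterating until nothing changes leaves a family closed under removing single elements, i.e., an ideal. The example runs this on the initial and final segments of $\{0, 1, 2, 3\}$.

```python
from itertools import combinations


def down_shift(family, x):
    """Frankl's shift T_x: replace G by G - {x} if x in G and G - {x} is not in the family."""
    return {g - {x} if x in g and (g - {x}) not in family else g for g in family}


def compress(family, ground):
    """Apply T_x for every x in ground, repeatedly, until nothing changes."""
    fam = {frozenset(g) for g in family}
    while True:
        new = fam
        for x in ground:
            new = down_shift(new, x)
        if new == fam:  # no shift moved any set: fam is closed under subsets
            return fam
        fam = new


def trace_size(family, b):
    """|F ∩ B| for a family F of frozensets and a subset B of the ground set."""
    b = frozenset(b)
    return len({g & b for g in family})


ground = range(4)
# Initial and final segments of {0, 1, 2, 3}: the traces of unbounded intervals.
S = {frozenset(range(k)) for k in range(5)} | {frozenset(range(k, 4)) for k in range(5)}
T = compress(S, ground)
subsets = [b for r in range(5) for b in combinations(ground, r)]
print(len(S), len(T))  # 8 8
print(all(trace_size(T, b) <= trace_size(S, b) for b in subsets))  # True
```

Termination follows because every effective shift strictly lowers the total number of elements in the family.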

SLIDE 8

VC dimension and VC density

The Sauer-Shelah dichotomy

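Either

  • $\pi_{\mathcal{S}}(n) = 2^n$ for every $n$ (if $\mathcal{S}$ is not a VC class), or
  • $\pi_{\mathcal{S}}(n) = O(n^d)$, where $d = \mathrm{VC}(\mathcal{S}) < \infty$.

One may now define the VC density of $\mathcal{S}$ as
$$\mathrm{vc}(\mathcal{S}) = \begin{cases} \inf\{r \in \mathbb{R}^{>0} : \pi_{\mathcal{S}}(n) = O(n^r)\} & \text{if } \mathrm{VC}(\mathcal{S}) < \infty, \\ \infty & \text{otherwise} \end{cases} \;=\; \limsup_{n \to \infty} \frac{\log \pi_{\mathcal{S}}(n)}{\log n} \in \mathbb{R}^{\geq 0} \cup \{\infty\}.$$
We also define $\mathrm{vc}(\emptyset) := -\infty$.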
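The $\limsup$ formula suggests a crude numerical illustration: evaluate $\log \pi_{\mathcal{S}}(n) / \log n$ for growing $n$. The sketch below uses two shatter functions from these slides (function names are hypothetical). The first is $2n$ for unbounded intervals, with density $1$. The second is $\binom{n}{\leq 3}$ for $\mathcal{S} = \binom{X}{3}$, with density $3$. Convergence is slow, since a constant factor $C$ contributes $\log C / \log n$. So this only illustrates the definition and is not a way to compute $\mathrm{vc}$.

```python
import math


def density_ratio(pi, n):
    """log pi(n) / log n; its lim sup as n -> infinity is the VC density vc(S)."""
    return math.log(pi(n)) / math.log(n)


def pi_rays(n):
    """Shatter function of the unbounded intervals in R (n >= 1); vc = 1."""
    return 2 * n


def pi_le_3(n):
    """C(n, <= 3), the shatter function of the 3-element subsets of an infinite set; vc = 3."""
    return sum(math.comb(n, i) for i in range(4))


for n in (10, 10**3, 10**6, 10**9):
    print(n, round(density_ratio(pi_rays, n), 3), round(density_ratio(pi_le_3, n), 3))
```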

SLIDE 9

VC dimension and VC density

Examples

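1. $\mathcal{S} = \binom{X}{d}$ (the $d$-element subsets of $X$). Then $\mathrm{VC}(\mathcal{S}) = \mathrm{vc}(\mathcal{S}) = d$; in fact $\pi_{\mathcal{S}}(n) = \binom{n}{\leq d}$.

2. $\mathcal{S}$ = half-spaces in $\mathbb{R}^d$. Then $\mathrm{VC}(\mathcal{S}) = d + 1$ [as seen above] and $\mathrm{vc}(\mathcal{S}) = d$.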

Some basic properties

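  • $\mathrm{vc}(\mathcal{S}) \leq \mathrm{VC}(\mathcal{S})$, and if one is finite then so is the other;
  • $\mathrm{VC}(\mathcal{S}) = 0 \iff |\mathcal{S}| = 1$;
  • for $\mathcal{S} \neq \emptyset$: $\mathcal{S}$ is finite $\iff \mathrm{vc}(\mathcal{S}) = 0 \iff \mathrm{vc}(\mathcal{S}) < 1$;
  • $\mathcal{S} = \mathcal{S}_1 \cup \mathcal{S}_2 \Rightarrow \mathrm{vc}(\mathcal{S}) = \max\{\mathrm{vc}(\mathcal{S}_1), \mathrm{vc}(\mathcal{S}_2)\}$.

(So $\mathrm{vc}(\mathcal{S})$ doesn't change if we alter finitely many sets of $\mathcal{S}$.)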

SLIDE 10

VC dimension and VC density

VC density is often the right measure for the combinatorial complexity of a set system. For example, it is related to packing numbers and entropy.

Definition

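Let $(S, d)$ be a bounded pseudo-metric space, and $\varepsilon > 0$.

1. $D \subseteq S$ is an $\varepsilon$-packing if $d(a, b) > \varepsilon$ for all $a \neq b$ in $D$;
2. the $\varepsilon$-packing number of $(S, d)$ is
$$D(S, d; \varepsilon) := \max\{|D| : D \subseteq S \text{ is a finite } \varepsilon\text{-packing}\};$$
3. the entropic dimension of $(S, d)$ is
$$\dim(S, d) := \inf\{s \in \mathbb{R}^{>0} : \exists C > 0\ \forall \varepsilon > 0 : D(S, d; \varepsilon) \leq C \varepsilon^{-s}\}.$$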
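Computing $D(S, d; \varepsilon)$ exactly is in general a maximum-independent-set problem. A greedy pass, however, always yields a maximal $\varepsilon$-packing and hence a lower bound. The sketch below is our own illustration with a hypothetical function name. It uses the pseudo-metric $d(A, B) = |A \triangle B| / N$ induced by the uniform probability measure on an $N$-point set, applied to the initial segments of $\{0, \ldots, N-1\}$. For this chain the greedy packing is in fact a largest one, and its size grows roughly like $\varepsilon^{-1}$, in line with entropic dimension $1$.

```python
from fractions import Fraction


def greedy_packing(sets, eps, n_points):
    """Greedily build a maximal eps-packing for d(A, B) = |A ^ B| / n_points.

    A maximal packing need not be a largest one, so len(result) is in general
    only a lower bound for the packing number D(S, d; eps).
    """
    packing = []
    for s in sets:
        if all(Fraction(len(s ^ t), n_points) > eps for t in packing):
            packing.append(s)
    return packing


N = 200
initial_segments = [frozenset(range(k)) for k in range(N + 1)]
for eps in (Fraction(1, 10), Fraction(1, 20), Fraction(1, 50)):
    size = len(greedy_packing(initial_segments, eps, N))
    print(eps, size, float(eps * size))
```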

SLIDE 11

VC dimension and VC density

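If $(X, \mathcal{A}, \mu)$ is a probability space, then we equip $\mathcal{A}$ with the (bounded) pseudo-metric $d_\mu(A, B) := \mu(A \triangle B)$.

Theorem (Dudley, Assouad)

$$\mathrm{vc}(\mathcal{S}) = \sup_\mu \dim(\mathcal{S}, d_\mu),$$

where the supremum ranges over all probability measures $\mu$ on $X$ making all sets in $\mathcal{S}$ measurable. There is a refinement of the inequality $\mathrm{vc}(\mathcal{S}) \geq \dim(\mathcal{S}, d_\mu)$ for $\mu$ concentrated uniformly on a finite set (Haussler, Wernisch): $D(\mathcal{S}, d_\mu; \varepsilon) \leq C \varepsilon^{-\varrho}$ for all $\varepsilon > 0$, where $\pi_{\mathcal{S}}(n) = O(n^{\varrho})$ and $C$ only depends on $\pi_{\mathcal{S}}$ (not on $(X, \mathcal{S})$, $\mu$, ...).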

SLIDE 12

VC duality

Let $X$ be a set (possibly finite). Given $A_1, \ldots, A_n \subseteq X$, denote by $S(A_1, \ldots, A_n)$ the set of atoms of the Boolean subalgebra of $2^X$ generated by $A_1, \ldots, A_n$. These are the non-empty subsets of $X$ of the form
$$\bigcap_{i \in I} A_i \cap \bigcap_{i \in [n] \setminus I} (X \setminus A_i), \qquad I \subseteq [n] = \{1, \ldots, n\}.$$
Suppose now that $\mathcal{S}$ is a set system on $X$. We define
$$\pi^*_{\mathcal{S}}(n) := \max\{|S(A_1, \ldots, A_n)| : A_1, \ldots, A_n \in \mathcal{S}\}, \qquad \pi^*_{\mathcal{S}} : \mathbb{N} \to \mathbb{N}.$$
We say that $\mathcal{S}$ is independent (in $X$) if $\pi^*_{\mathcal{S}}(n) = 2^n$ for every $n$, and dependent (in $X$) otherwise.
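On a finite base set, the atoms generated by $A_1, \ldots, A_n$ correspond to the distinct membership patterns $(x \in A_1, \ldots, x \in A_n)$ realized by points $x \in X$. This gives a direct way to compute $|S(A_1, \ldots, A_n)|$ and hence $\pi^*_{\mathcal{S}}$. The brute-force sketch below uses hypothetical names and an example family of our own choosing: closed intervals in $\{0, \ldots, 9\}$. Here $n \geq 1$ intervals generate at most $2n$ atoms, far below $2^n$, so this system is dependent.

```python
from itertools import combinations


def atoms(sets, base):
    """Non-empty atoms generated by `sets`, keyed by the membership pattern of their points."""
    cells = {}
    for x in base:
        pattern = tuple(x in a for a in sets)
        cells.setdefault(pattern, set()).add(x)
    return cells


def dual_shatter(system, base, n):
    """pi*_S(n): the largest number of atoms generated by n members of a finite system."""
    return max(len(atoms(choice, base)) for choice in combinations(system, n))


base = range(10)
# All closed integer intervals [a, b] inside {0, ..., 9}.
intervals = [frozenset(range(a, b + 1)) for a in range(10) for b in range(a, 10)]
print([dual_shatter(intervals, base, n) for n in range(4)])  # [1, 2, 4, 6]
```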

SLIDE 13

VC duality

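Example ($X = \mathbb{R}^2$, $\mathcal{S}$ = half-planes in $\mathbb{R}^2$)

$\pi^*_{\mathcal{S}}(n)$ = maximum number of regions into which $n$ half-planes partition the plane. Add one half-plane to $n - 1$ given half-planes. Its boundary line meets the other $n - 1$ boundary lines in at most $n - 1$ points, so it divides at most $n$ of the existing regions into 2 pieces. So $\pi^*_{\mathcal{S}}(n) = O(n^2)$.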
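Unrolling this counting argument gives an explicit bound. Write $r(n)$ for the maximum number of regions; this notation is ours and is introduced only for the derivation:

```latex
\begin{aligned}
r(0) &= 1, \qquad r(n) \le r(n-1) + n \quad (n \ge 1), \\
r(n) &\le 1 + \sum_{k=1}^{n} k \;=\; 1 + \frac{n(n+1)}{2} \;=\; \binom{n}{0} + \binom{n}{1} + \binom{n}{2}.
\end{aligned}
```

Hence $\pi^*_{\mathcal{S}}(n) \le \binom{n}{\le 2}$. Equality holds when the $n$ boundary lines are in general position (no two parallel, no three through one point), since each new line then crosses exactly $n$ regions.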

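The function $\pi^*_{\mathcal{S}}$ is called the dual shatter function of $\mathcal{S}$, since (for infinite $\mathcal{S}$) one has $\pi^*_{\mathcal{S}} = \pi_{\mathcal{S}^*}$ for a certain set system $\mathcal{S}^*$ on $X^* = \mathcal{S}$, called the dual of $\mathcal{S}$.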

SLIDE 14

VC duality

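Let $X, Y$ be infinite sets and $\Phi \subseteq X \times Y$ a binary relation. Put $\mathcal{S}_\Phi := \{\Phi_y : y \in Y\} \subseteq 2^X$, where $\Phi_y := \{x \in X : (x, y) \in \Phi\}$, and
$$\pi_\Phi := \pi_{\mathcal{S}_\Phi}, \quad \pi^*_\Phi := \pi^*_{\mathcal{S}_\Phi}, \quad \mathrm{VC}(\Phi) := \mathrm{VC}(\mathcal{S}_\Phi), \quad \mathrm{vc}(\Phi) := \mathrm{vc}(\mathcal{S}_\Phi).$$
We also write
$$\Phi^* := \{(y, x) \in Y \times X : (x, y) \in \Phi\} \subseteq Y \times X.$$
In this way we obtain two set systems: $(X, \mathcal{S}_\Phi)$ and $(Y, \mathcal{S}_{\Phi^*})$. Given a finite set $A \subseteq X$ we have a bijection
$$A' \mapsto \bigcap_{x \in A'} \Phi^*_x \cap \bigcap_{x \in A \setminus A'} (Y \setminus \Phi^*_x) \;:\; \mathcal{S}_\Phi \cap A \to S(\Phi^*_x : x \in A).$$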
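The bijection can be checked mechanically on a finite relation. For each $y$, the trace $\Phi_y \cap A$ is the set of $x \in A$ with $(x, y) \in \Phi$, and $y$ lies in the atom indexed by exactly that set. The sketch below uses hypothetical helper names and a randomly generated relation on small finite sets, an assumption for illustration only. The slide takes $X$ and $Y$ infinite, but the counting does not use this.

```python
import random
from itertools import combinations


def primal_trace_count(phi, ys, a):
    """|S_Phi ∩ A|: the number of distinct sets Phi_y ∩ A, for y in ys."""
    return len({frozenset(x for x in a if (x, y) in phi) for y in ys})


def dual_atom_count(phi, ys, a):
    """|S(Phi*_x : x in A)|: atoms of Y generated by the sets Phi*_x, x in A.

    The atom containing y is determined by the pattern ((x, y) in phi for x in A).
    """
    return len({tuple((x, y) in phi for x in a) for y in ys})


rng = random.Random(0)  # fixed seed, so the random relation is reproducible
xs, ys = range(6), range(8)
phi = {(x, y) for x in xs for y in ys if rng.random() < 0.5}
agree = all(
    primal_trace_count(phi, ys, a) == dual_atom_count(phi, ys, a)
    for r in range(len(xs) + 1)
    for a in combinations(xs, r)
)
print(agree)  # True
```

Taking the maximum over all $n$-element $A$ on both sides gives $\pi_\Phi(n) = \pi^*_{\Phi^*}(n)$.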

SLIDE 15

VC duality

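Hence $\pi_\Phi = \pi^*_{\Phi^*}$ and $\pi_{\Phi^*} = \pi^*_\Phi$, and thus

  • $\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is dependent,
  • $\mathcal{S}_{\Phi^*}$ is a VC class $\iff$ $\mathcal{S}_\Phi$ is dependent.

Moreover (first noticed by Assouad): $\mathcal{S}_\Phi$ is a VC class $\iff$ $\mathcal{S}_{\Phi^*}$ is a VC class. Exploiting this VC duality one easily shows:
$$\mathrm{vc}(\neg\Phi) = \mathrm{vc}(\Phi), \quad \mathrm{vc}(\Phi \cup \Psi) \leq \mathrm{vc}(\Phi) + \mathrm{vc}(\Psi), \quad \mathrm{vc}(\Phi \cap \Psi) \leq \mathrm{vc}(\Phi) + \mathrm{vc}(\Psi).$$
VC does not satisfy similar subadditivity properties.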

SLIDE 16

The model-theoretic context

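We fix:

  • $L$: a first-order language,
  • $x = (x_1, \ldots, x_m)$: object variables,
  • $y = (y_1, \ldots, y_n)$: parameter variables,
  • $\varphi(x; y)$: a partitioned $L$-formula,
  • $M$: an infinite $L$-structure, and
  • $T$: a complete $L$-theory without finite models.

The set system (on $M^m$) associated with $\varphi$ in $M$ is
$$\mathcal{S}^M_\varphi := \{\varphi^M(M^m; b) : b \in M^n\}.$$
If $M \equiv N$, then $\pi_{\mathcal{S}^M_\varphi} = \pi_{\mathcal{S}^N_\varphi}$. So, picking $M \models T$ arbitrary, set
$$\pi_\varphi := \pi_{\mathcal{S}^M_\varphi}, \qquad \mathrm{VC}(\varphi) := \mathrm{VC}(\mathcal{S}^M_\varphi), \qquad \mathrm{vc}(\varphi) := \mathrm{vc}(\mathcal{S}^M_\varphi).$$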

SLIDE 17

The model-theoretic context

The dual of $\varphi(x; y)$ is $\varphi^*(y; x) := \varphi(x; y)$. Put $\mathrm{VC}^*(\varphi) := \mathrm{VC}(\varphi^*)$ and $\mathrm{vc}^*(\varphi) := \mathrm{vc}(\varphi^*)$. We have $\pi^*_\varphi = \pi_{\varphi^*}$, hence $\mathrm{VC}^*(\varphi)$ and $\mathrm{vc}^*(\varphi)$ can be computed using the dual shatter function of $\varphi$. If $\mathrm{VC}(\varphi) < \infty$, then we say that $\varphi$ is dependent in $T$. The theory $T$ does not have the independence property (is NIP) if every partitioned $L$-formula is dependent in $T$. An important theorem of Shelah (given other proofs by Laskowski and others) says that for $T$ to be NIP it is enough for every $L$-formula $\varphi(x; y)$ with $|x| = 1$ to be dependent. Many (but not all) well-behaved theories arising naturally in model theory are NIP.

SLIDE 18

The model-theoretic context

Some questions about vc in model theory

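1. Possible values of $\mathrm{vc}(\varphi)$. There exists a formula $\varphi(x; y)$ in $L_{\mathrm{rings}}$ with $|y| = 4$ such that $\mathrm{vc}_{\mathrm{ACF}_0}(\varphi) = 4/3$ and $\mathrm{vc}_{\mathrm{ACF}_p}(\varphi) = 3/2$ for $p > 0$.

We do not know an example of a formula $\varphi$ in a NIP theory with $\mathrm{vc}(\varphi) \notin \mathbb{Q}$.

2. Growth of $\pi_\varphi$. There is an example of an $\omega$-stable $T$ and an $L$-formula $\varphi(x; y)$ with $|y| = 2$ and $\pi_\varphi(n) = \frac{1}{2} n \log n \,(1 + o(1))$.
3. Uniform bounds on $\mathrm{vc}(\varphi)$. The topic of the rest of the talk.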

SLIDE 19

Uniform bounds on VC density

Two extrinsic reasons why it should be interesting to obtain bounds on $\mathrm{vc}(\varphi)$ in terms of $|y|$ = number of free parameters:

1. Connections to strengthenings of the NIP concept: if $\mathrm{vc}(\varphi) < 2$ for each $\varphi(x; y)$ with $|y| = 1$, then $T$ is dp-minimal;
2. uniform bounds on VC density often “explain” why certain well-known bounds on the complexity of geometric arrangements, used in computational geometry, are polynomial in the number of objects involved.

Example ($L$ = language of rings, $K \models \mathrm{ACF}$)

Choose $\varphi(x; y)$ so that $\mathcal{S}^K_\varphi$ is the collection of all zero sets (in $K^m$) of polynomials in $m$ indeterminates with coefficients in $K$ having degree at most $d$. Hence $\pi^*_\varphi(t)$ is the maximum number of non-empty Boolean combinations of $t$ such hypersurfaces. Then
$$\pi^*_\varphi(t) = \pi_{\varphi^*}(t) = O(t^m).$$

SLIDE 20

Uniform bounds on VC density

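The VC density of $T$ is the function $\mathrm{vc} = \mathrm{vc}_T : \mathbb{N} \to \mathbb{R}^{\geq 0} \cup \{\infty\}$,
$$\mathrm{vc}(m) := \sup\{\mathrm{vc}(\varphi) : \varphi(x; y) \text{ is an } L\text{-formula with } |y| = m\} = \sup\{\mathrm{vc}^*(\varphi) : \varphi(x; y) \text{ is an } L\text{-formula with } |x| = m\}.$$

It is sometimes convenient to deal with finitely many formulas:

  • $\Delta(x; y)$: a finite non-empty set of partitioned $L$-formulas;
  • $S_\Delta(B)$: the set of complete $\Delta(x; B)$-types in $M$ ($B \subseteq M^{|y|}$).

If $T$ is NIP, then we set
$$\mathrm{vc}^*(\Delta) := \inf\{r \in \mathbb{R}^{>0} : |S_\Delta(B)| = O(|B|^r) \text{ for all finite } B \subseteq M^{|y|}\}.$$
Then we have (using coding tricks)
$$\mathrm{vc}_T(m) = \sup\{\mathrm{vc}^*(\Delta) : \Delta = \Delta(x; y) \text{ finite}, |x| = m\}.$$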
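As a toy instance of $|S_\Delta(B)| = O(|B|^r)$, take $M = (\mathbb{Q}, <)$ and $\Delta(x; y) = \{x < y, x = y\}$ with $|x| = |y| = 1$. This example is our own choice, not from the slides. Over a finite non-empty $B$, a complete $\Delta$-type just records where $x$ sits relative to the points of $B$. Since the order is dense, every such type is realized by a point of $B$, a midpoint between neighbours, or a point beyond either end. Counting the realized patterns gives $2|B| + 1$ types, consistent with $\mathrm{vc}^*(\Delta) = 1$. Function names are hypothetical.

```python
from fractions import Fraction


def delta_type(a, bs):
    """The Delta-type of a over B, as the truth values of a < b and a = b for b in B."""
    return tuple((a < b, a == b) for b in bs)


def count_types(bs):
    """Count complete Delta-types over a finite non-empty B ⊆ Q via witnesses realizing them all."""
    bs = sorted(set(bs))
    witnesses = bs + [bs[0] - 1, bs[-1] + 1]                     # points of B and beyond the ends
    witnesses += [Fraction(p + q, 2) for p, q in zip(bs, bs[1:])]  # midpoints of neighbours
    return len({delta_type(a, bs) for a in witnesses})


for k in range(1, 6):
    print(k, count_types([Fraction(i * i, 3) for i in range(k)]))  # 2k + 1 types
```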
SLIDE 21

Uniform bounds on VC density

Definition (adapted from Guingona)

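$\Delta$ has uniform definability of types over finite sets (UDTFS) in $M$ if there are finitely many families of $L$-formulas
$$\mathcal{F}_i = \big(\varphi_i(y; y_1, \ldots, y_m)\big)_{\varphi \in \Delta} \qquad (i \in I)$$
such that for every finite $B \subseteq M^{|y|}$ and $q \in S_\Delta(B)$ there are $b_1, \ldots, b_m \in B$ and $i \in I$ such that $\mathcal{F}_i(y; b_1, \ldots, b_m)$ defines $q$. That is, for all $b \in B$ and $\varphi \in \Delta$: $\varphi(x; b) \in q \iff M \models \varphi_i(b; b_1, \ldots, b_m)$. If we don't care about the number of extra parameters $m$, then we can always achieve $|I| = 1$, and we can always reduce to $|\Delta| = 1$ (at least for $|B| \geq 2$). On the other hand, if $\Delta$ has a uniform definition $\mathcal{F} = (\mathcal{F}_i)_{i \in I}$ for $\Delta$-types with $m$ parameters, then $|S_\Delta(B)| \leq |I| \cdot |B|^m$ for every finite $B$.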
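For $\Delta(x; y) = \{x < y\}$ in a dense linear order, a uniform definition with one parameter and two families can be written down directly; this is an illustrative choice of ours. Suppose the type $q$ of $a$ over $B$ contains some formula $a < b$. Let $b_1$ be the least such $b$; then $q$ is defined by $\mathcal{F}_1(y; b_1) = (b_1 \leq y)$. Otherwise $q$ is defined by the parameter-free $\mathcal{F}_2 = (y \neq y)$. The sketch below uses hypothetical function names and checks this definition against brute force on sample points. The bound $|S_\Delta(B)| \leq |I| \cdot |B|^m = 2|B|$ then applies, with $|I| = 2$ and $m = 1$.

```python
from fractions import Fraction


def true_type(a, bs):
    """The {x < y}-type of a over B, given as the set of b in B with a < b."""
    return frozenset(b for b in bs if a < b)


def uniform_definition(a, bs):
    """Return (i, params): which family F_i defines the type of a, and its parameters."""
    above = [b for b in bs if a < b]
    if above:
        return 1, (min(above),)  # F_1(y; b1): b1 <= y
    return 2, ()                 # F_2(y): y != y, defining the empty type


def evaluate(i, params, bs):
    """Apply the chosen family with its parameters to every y in B."""
    if i == 1:
        (b1,) = params
        return frozenset(b for b in bs if b1 <= b)
    return frozenset()


bs = [Fraction(n, 7) for n in (-3, 0, 2, 5, 11)]
samples = [Fraction(n, 14) for n in range(-10, 30)]
print(all(evaluate(*uniform_definition(a, bs), bs) == true_type(a, bs) for a in samples))
print(len({true_type(a, bs) for a in samples}), "<=", 2 * len(bs))
```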

SLIDE 22

Uniform bounds on VC density

Definition

$M$ has the VC $m$ property if any $\Delta(x; y)$ with $|x| = 1$ has a uniform definition of $\Delta$-types over finite sets with $m$ parameters.

Theorem

Suppose that $M$ has the VC $m$ property. Then every $\Delta(x; y)$ has a uniform definition of $\Delta$-types over finite sets in $M$ with $m|x|$ parameters. Hence $\mathrm{vc}^*(\Delta) \leq m|x|$, so $n \leq \mathrm{vc}_T(n) \leq m \cdot n$ for every $n$. In particular, if $m = 1$, then $\mathrm{vc}_T(n) = n$ for every $n$. We now give some examples of theories with the VC $m$ property.

SLIDE 23

Uniform bounds on VC density

Theorem

Suppose M is an expansion of a linearly ordered set. If T is weakly o-minimal, then T has the VC 1 property.

(Generalizes earlier results due to Karpinski-Macintyre and Wilkie.)

Sketch of the proof.

Let $M \models T$ and let $\Delta(x; y)$ be a finite non-empty set of $L$-formulas with $|x| = 1$. We let $\varphi$ range over $\Delta$ and $b$ over $M^{|y|}$. If for each $\varphi$ and $b$ the set $\varphi(M; b)$ is an initial segment of $M$, then clearly $\Delta$ has UDTFS with a single parameter. In general, there is some $N$ such that for each $\varphi$ and $b$, $\varphi(M; b)$ has at most $N$ convex components, and hence is a Boolean combination of $2N$ initial segments of $M$ (uniformly in $b$). Forming Boolean combinations preserves UDTFS.

SLIDE 24

Uniform bounds on VC density

Interesting classes of NIP theories are provided by various types of valued fields. A non-trivial elaboration of our methods yields the following (probably non-optimal) result:

Theorem

Suppose $M = \mathbb{Q}_p$ is the field of $p$-adic numbers, construed as a first-order structure in the language of rings. Then $T$ has the VC 2 property; moreover, $\mathrm{vc}_T(m) \leq 2m - 1$ for every $m$. The same result holds for the subanalytic expansions of $\mathbb{Q}_p$ considered by Denef and van den Dries. Key tools are cell decomposition and the existence of definable Skolem functions. We also have results stating that certain stable theories $T$ have VC density $\mathrm{vc}_T$ growing linearly, not obtained via the VC $m$ property.

SLIDE 25

Uniform bounds on VC density

Theorem

Let $A$ be an infinite abelian group and $T = \mathrm{Th}(A)$. T.f.a.e.:

1. $\mathrm{vc}_T(1) < \infty$;
2. there is some $d$ such that $\mathrm{vc}_T(m) \leq dm$ for every $m$;
3. there are only finitely many $p$ such that $A[p]$ or $A/pA$ is infinite, and for all $p$ there are only finitely many $n$ such that $U(p, n; A) = |(p^n A)[p] / (p^{n+1} A)[p]|$ is infinite.

The proof involves some combinatorics with distributive lattices. As an upshot of the proof we are able to determine the complete theories of all dp-minimal abelian groups.

SLIDE 26

Uniform bounds on VC density

A general theorem is:

Theorem

Suppose $T$ does not have the finite cover property and has finite U-rank. Then $\mathrm{vc}_T(m) \leq m \cdot U(T)$ for every $m$. Cases where the theorem applies:

1. $T$ is countable, totally transcendental, and $\aleph_0$-categorical.
2. $T$ is totally transcendental and unidimensional; e.g., $T$ is countable and $\aleph_1$-categorical, or $T$ is strongly minimal.
3. $T$ is an expansion of the theory of groups with $\mathrm{MR}(T) < \omega$.

SLIDE 27

Uniform bounds on VC density

There are many open questions in this subject. Let me finish with a particularly attractive one:

Open question

If $\mathrm{vc}_T(1) < \infty$, is $\mathrm{vc}_T(m) < \infty$ for every $m$? If true, we would have a “VC density version” of Laskowski’s Theorem (= the constructive version of Shelah’s theorem that $T$ is NIP if $\mathrm{VC}(\varphi) < \infty$ for all $\varphi(x; y)$ with $|x| = 1$).