
Automatic Structures and Rank

Cornell REU Group 2 Summer 2009

1 Important Questions

  1. Is ω^ω word automatic in the presence of oracles?
  2. Is ω^{ω^k} rank-k tree presentable?
  3. Can ω^ω be embedded (not first order) into any word automatic structure?

2 Definitions

A deterministic finite word automaton on a finite alphabet Σ is defined by a 4-tuple M = (Q, q_0, δ, F), where Q is the finite set of states, q_0 ∈ Q is the starting state, δ : Q × Σ → Q is the state transition function, and F ⊂ Q is the set of accepting states. Given an input string v_0 v_1 ... v_{n−1} ∈ Σ*, M visits the sequence of states q_0, q_1, ..., q_n, where q_{i+1} = δ(q_i, v_i). We say that M accepts the input if and only if q_n ∈ F.

If δ is a relation, δ ⊂ Q × Σ × Q, we call the automaton nondeterministic. A nondeterministic finite word automaton M accepts an input string v_0 v_1 ... v_{n−1} if and only if there exists a sequence of states q_0, q_1, ..., q_n such that (q_i, v_i, q_{i+1}) ∈ δ for all 0 ≤ i < n and q_n ∈ F.

A language is a subset of Σ*. We call a language L regular if there exists a finite automaton M such that M accepts w ∈ Σ* if and only if w ∈ L.

A structure is an n-tuple A = (A, R_1, ..., R_{n−1}), where A is a set, called the domain, and R_1 through R_{n−1} are relations on A.

Given n strings u_1, ..., u_n ∈ Σ*, we form their convolution ⊗(u_1, ..., u_n) ∈ [(Σ ∪ {□})^n]* by first padding each string with blank symbols (□) until they are the same length, then taking the string of n-tuples ((u_1)_1, ..., (u_n)_1)((u_1)_2, ..., (u_n)_2) ..., which we interpret as a string over a finite alphabet of size (|Σ| + 1)^n.

An n-ary relation R ⊂ (Σ*)^n over a regular domain A ⊂ Σ* is called regular if ⊗R ⊂ [(Σ ∪ {□})^n]* is regular, where ⊗R = {⊗(u_1, ..., u_n) | (u_1, ..., u_n) ∈ R}.

A structure A is finite word automatic if its domain A ⊂ Σ* is a regular set of strings over a finite alphabet, and each R_i is regular.
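The definitions above can be made concrete with a small sketch. The names `BLANK`, `convolve`, `dfa_accepts` and `LEQ_DELTA` are illustrative choices, not notation from these notes, and the example relation (length comparison on unary words) is only an assumed toy case of a regular relation.

```python
from itertools import zip_longest

BLANK = "#"  # stand-in for the blank padding symbol (our choice of glyph)

def convolve(*words):
    """Pad the words with BLANK to a common length and zip them into tuples."""
    return list(zip_longest(*words, fillvalue=BLANK))

def dfa_accepts(delta, start, finals, letters):
    """Run a deterministic automaton given as a dict (state, letter) -> state."""
    state = start
    for letter in letters:
        state = delta[(state, letter)]
    return state in finals

# Toy regular relation on unary words over {"1"}: |u| <= |v|.
# The automaton reads pairs; it fails as soon as v has ended while u continues.
PAIRS = [("1", "1"), (BLANK, "1"), ("1", BLANK)]
LEQ_DELTA = {("ok", p): "ok" for p in PAIRS}
LEQ_DELTA[("ok", ("1", BLANK))] = "bad"
LEQ_DELTA.update({("bad", p): "bad" for p in PAIRS})

def shorter_or_equal(u, v):
    """Decide |u| <= |v| by running the automaton on the convolution of u and v."""
    return dfa_accepts(LEQ_DELTA, "ok", {"ok"}, convolve(u, v))
```

For instance, `convolve("ab", "a")` gives the two pairs `("a", "a")` and `("b", "#")`, and `shorter_or_equal("11", "111")` holds while `shorter_or_equal("111", "1")` does not.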


Two structures with the same signature are isomorphic if there is a bijection between their domains preserving relations. If a structure S is isomorphic to an automatic structure A, then S is called automatically presentable. We say a structure admits a rank-n finite word presentation if it has an automatic presentation in which the domain is a subset of 1∗2∗ . . . n∗. For convenience, we define [n] = {0, 1, . . . , n}.

3 Results

Theorem 3.1. ω^{n+1} is the smallest ordinal which is not rank-n presentable.

Theorem 3.2. The rational numbers (Q, ≤) have no finite-rank automatic presentation.

Theorem 3.3. ω^{n+1} is not rank-n presentable, even with an oracle over an arbitrary alphabet.

4 Important lemmas

Lemma 4.1. Compression Lemma: Suppose that a structure S is isomorphic to a structure A = (A, R_1, ..., R_n), such that A is automatic and A ⊂ 1* ... i^{<m} ... n* (that is, the symbol i appears fewer than m times in every word of the domain). Then S admits an automatic presentation of rank n − 1.

Proof. (Sketch) We will exhibit an isomorphism from A to another automatic structure B with domain B ⊂ 1* ... (i−1)*(i+1)* ... n* of rank n − 1. There are two cases, i < n and i = n (which is simpler).

Encode the domain as follows. Repeat each symbol in a block of length m each time it appears. Suppose i appeared j < m times. Transform the j blocks of i to blocks of i + 1, and then add j many n's to the end of the word.

There are finite automata that recognize the domain and relations in the original encoding, and we will transform them to recognize the new encoding. Let R be an n-ary relation, n ≥ 1 (the domain is also a unary relation). The idea is as follows. The machine will work on blocks of letters. At the start, it will guess how many i blocks were encoded as i + 1 in each string. Then when it starts reading the i + 1 blocks in each component, it will treat the first few as i blocks. Then, when it reaches the end of each component in the string, it will compare the guess to the actual length mod m, rejecting if it was wrong. Details upon request.

Suppose the machine used an oracle input. Then we can repeat each symbol in the oracle input m times as well, and it will be available when it is needed. So if S is isomorphic to a structure A = (A, R_1, ..., R_n), such that A is automatic with oracle o and A ⊂ 1* ... i^{<m} ... n*, then S admits an automatic presentation of rank n − 1, with some other oracle o′.
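As a minimal sketch of the re-encoding in the proof, under our reading of it and only for the case i < n, a word 1^{a_1} ... n^{a_n} can be handled as its list of exponents. The function names `compress` and `decompress` are hypothetical. The sketch only shows that the new code avoids the symbol i and can be decoded; it does not build the transformed automata.

```python
def compress(counts, i, m):
    """Re-encode 1^{a_1} ... n^{a_n}, given as counts [a_1, ..., a_n] with a_i < m.

    Returns the exponent list of the new word, in which symbol i never occurs.
    Assumes 1 <= i < n (the case i = n is not covered by this sketch).
    """
    n = len(counts)
    j = counts[i - 1]
    assert 1 <= i < n and j < m
    new = [m * a for a in counts]  # every symbol is repeated in a block of length m
    new[i] += new[i - 1]           # the j blocks of i become blocks of i + 1
    new[i - 1] = 0
    new[n - 1] += j                # j extra n's, so the total length is j mod m
    return new

def decompress(new, i, m):
    """Invert compress: recover [a_1, ..., a_n] from the new exponent list."""
    n = len(new)
    j = sum(new) % m               # the length mod m records how many i's there were
    counts = list(new)
    counts[n - 1] -= j
    counts = [c // m for c in counts]
    counts[i - 1] = j              # the first j blocks of i + 1 were really i's
    counts[i] -= j
    return counts
```

For example, `compress([2, 1, 3], i=2, m=2)` yields `[4, 0, 9]`, that is, the word 1^4 3^9, and `decompress([4, 0, 9], 2, 2)` recovers `[2, 1, 3]`.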

Lemma 4.2. Limited Memory Lemma: Suppose (A, R) is an automatic structure, with R of arity n ≥ 2. Suppose the automaton recognizing R has only k states. Let S and L be two subsets of A, with M = max_{s∈S} |s| < min_{l∈L} |l|, and all strings in L agreeing on the first M characters. For any m < n, the set of relations {R_l(s) = R(s, l) : l ∈ L^{n−m}} can only partition the set of m-tuples {s ∈ S^m} into at most k sets. This holds even if R is allowed access to an oracle over any finite alphabet.

Proof. Consider the computation of the automaton recognizing R, on some string ⊗(o, s, l). After reading the first M symbols, the automaton is in one of k states. This state depends only on s, and not on l, as all choices for l agree on the first M symbols. Partition the tuples s into X_1, ..., X_k, according to which state the automaton is in at that point. If s_1 and s_2 are in the same piece of the partition, then for each l, the computations on ⊗(o, s_1, l) and ⊗(o, s_2, l) must agree on every state after the Mth, because from that point on the s components contribute only blanks. So either both computations accept, or both reject, for each l.
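The lemma can be checked experimentally in its simplest instance: a binary relation (n = 2, m = 1), no oracle, and a unary alphabet. The sketch below is an illustration under those assumptions, not part of the proof. It draws an arbitrary k-state automaton on pairs and counts how many distinct "rows" l ↦ R(s, l) the short strings s produce. The helper names and the `#` blank are our own.

```python
import random
from itertools import zip_longest

BLANK = "#"  # stand-in for the blank padding symbol

def run(delta, start, letters):
    """Return the state reached by a deterministic automaton on the given letters."""
    state = start
    for letter in letters:
        state = delta[(state, letter)]
    return state

def distinct_rows(k, M, extra, seed):
    """Count distinct rows (R(s, l) for l in L) over all s in S, for a random k-state DFA."""
    rng = random.Random(seed)
    letters = [("1", "1"), ("1", BLANK), (BLANK, "1")]
    delta = {(q, a): rng.randrange(k) for q in range(k) for a in letters}
    finals = {q for q in range(k) if rng.random() < 0.5}
    S = ["1" * a for a in range(M + 1)]                 # all lengths at most M
    L = ["1" * b for b in range(M + 1, M + 1 + extra)]  # longer than M, same first M symbols
    rows = set()
    for s in S:
        row = tuple(
            run(delta, 0, zip_longest(s, l, fillvalue=BLANK)) in finals for l in L
        )
        rows.add(row)
    return len(rows)
```

Although S contains M + 1 strings, `distinct_rows(k, M, extra, seed)` never exceeds k, whatever automaton the seed produces, which is exactly what the lemma predicts.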

Lemma 4.3. Fix n. The number of ways to apportion i into n pieces, with order mattering, is at least i^n/n!, and at most (i + 1)^n. Each partition is determined exactly by a way of choosing n positions out of a string of length i + n, and letting the blocks in between be the apportionment.

\[
p_n(i) = \binom{n+i}{n} = \frac{(i+n)\cdots(i+1)}{n!} = \Theta(i^n),
\qquad
\frac{1}{n!}\, i^n \le p_n(i) \le (i+1)^n
\]
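The bounds are easy to sanity-check numerically. The sketch below assumes the reading used later in these notes, where p_n(i) counts the strings of length at most i in 1*2* ... n*, and compares the binomial formula against a brute-force count of exponent tuples. The function names are ours.

```python
from itertools import product
from math import comb, factorial

def p(n, i):
    """Closed form p_n(i) = C(n + i, n)."""
    return comb(n + i, n)

def count_words(n, i):
    """Brute force: exponent tuples (a_1, ..., a_n) with a_1 + ... + a_n <= i,
    i.e. words 1^{a_1} ... n^{a_n} of length at most i."""
    return sum(1 for a in product(range(i + 1), repeat=n) if sum(a) <= i)

def bounds_hold(n, i):
    """Check i^n / n! <= p_n(i) <= (i + 1)^n using integer arithmetic only."""
    return i ** n <= factorial(n) * p(n, i) and p(n, i) <= (i + 1) ** n
```

For small n and i the two counts agree and `bounds_hold` is true, as the lemma states.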

5 Ordinals

It is clear that the structure (ω^n, ≤) has a rank-n presentation. Code the number ω^{n−1} x_1 + ··· + x_n as 1^{x_1} ... n^{x_n}. Then we can recognize a ≤ b automatically. We say a ≤ b if they have the same code, or the code for a has a higher number in the first place where they differ (with blank considered as highest).

We can also produce a rank-n presentation for the structure (ω^n · m, ≤), for any natural number m. Code the number ω^n x_0 + ω^{n−1} x_1 + ··· + x_n, with x_0 < m, as 1^{m x_1 + x_0} ... n^{m x_n}. We say a ≤ b if either (1) their lengths mod m are different, and the remainder when |a| is divided by m is smaller than when |b| is divided by m, or (2) the lengths mod m are equal, and either the codes are the same or the code for a has a higher number in the first place where they differ (with blank considered as highest). These are all regular conditions, so the relation ≤ is regular.
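The coding of ω^n can be tried out directly. The sketch below is illustrative only: the names `code` and `leq` and the padding character `~` are our choices, and it assumes n ≤ 9 so that each symbol is a single digit. It compares codes by the rule above and checks the result against the usual lexicographic order on coefficient tuples.

```python
BLANK = "~"  # padding symbol, chosen because it sorts above every digit

def code(xs):
    """Code omega^(n-1)*x_1 + ... + x_n, given as (x_1, ..., x_n), as 1^{x_1} ... n^{x_n}."""
    return "".join(str(i + 1) * x for i, x in enumerate(xs))

def leq(a, b):
    """a <= b iff the codes are equal, or a has the higher symbol at the first
    position where they differ (blank counts as highest)."""
    width = max(len(a), len(b))
    a, b = a.ljust(width, BLANK), b.ljust(width, BLANK)
    for ca, cb in zip(a, b):
        if ca != cb:
            return ca > cb
    return True
```

For example, ω + 2 is coded as `"122"` and ω·2 as `"11"`; the codes first differ in the second position, where `"122"` has the higher symbol, so `leq("122", "11")` holds.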

5.1 ω^2 has no rank-1 presentation

However, the structure (ω^2, ≤) does not have a rank-1 [unary] presentation. Suppose it did. Then there is a regular set D ⊂ 1*, and a regular relation ≤ on this set inducing the order type ω^2.

The relation R(x, y, z) ⇔ x ≤ y ∧ y < z is first order definable in this structure. So there is a finite automaton with pumping length k which recognizes this language. Consider the set Y_0 = {y : R(0, y, ω)}. This is an infinite set, and the codes for 0 and ω are each only finitely long. So there is some y ∈ Y_0 whose code is at least k longer than the code of either 0 or ω. The convolved string ⊗(0, y, ω) can be pumped within the last k symbols, so we see that for some partition y = y_0 u y_1, with |u y_1| < k, all strings y_0 u^j y_1 must be in Y_0. As D has at most 1 string of any given length, we see that

\[
\lim_{n\to\infty} \frac{|Y_0^{\le n}|}{|D^{\le n}|} \ge \frac{1}{|u|} \ge \frac{1}{k},
\]

where X^{≤n} denotes the set of strings in X of length at most n.

However, we can also define the set Y_ω = {y : R(ω, y, ω·2)}. This set is disjoint from Y_0, and by an identical argument,

\[
\lim_{n\to\infty} \frac{|Y_\omega^{\le n}|}{|D^{\le n}|} \ge \frac{1}{k}.
\]

We can do this all the way until Y_{ω·k}, at which point we discover that D does not have enough short strings to make this all true: the k + 1 pairwise disjoint sets Y_0, ..., Y_{ω·k} would each make up at least a 1/k fraction of the short strings of D. So (ω^2, ≤) does not have a rank-1 presentation.

5.2 ω^{n+1} has no rank-n presentation

We show this by induction on n. The case n = 1 is established. Suppose the structure (ω^{n+1}, ≤) has a rank-n presentation. Then there is a regular set D ⊂ 1*2* ··· n*, and a regular relation ≤ on this set inducing the order type ω^{n+1}.

The relation R(x, y, z) ⇔ x ≤ y ∧ y < z is first order definable in this structure. So there is a finite automaton with pumping length k which recognizes this language. Consider the set Y_0 = {y : R(0, y, ω^n)}. This is an infinite set, in fact a copy of the order ω^n, and the codes for 0 and ω^n are each only finitely long.

Suppose that for some L, all strings in Y_0 have at most L of some symbol. Then for some symbol, the subset of strings with at most L of that symbol must also have the order type ω^n (because it is a limit ordinal). Then by the compression lemma, we have a rank n − 1 presentation of ω^n. By inductive assumption, this is impossible, so for any L there is a string y ∈ Y_0 with more than L of each symbol.

Take such a string y, with L at least k larger than the length of either of the codes for 0 or ω^n. The convolved string ⊗(0, y, ω^n) can be pumped within the stretch of any character, after the codes for 0 and ω^n end. So we see that for some partition y = y_0 s_1 y_1 s_2 ··· y_n, with |s_i| < k and s_i containing only the symbol i, all strings y_0 s_1^{α_0} y_1 s_2^{α_1} ··· y_n must be in Y_0.

If we wish to pump the string y up by at most km symbols, we can pump the different s_i up a total of at least m times. So if |y| = M, then there are at least p_n(m) ≥ m^n/n! strings in Y_0 of length at most M + km. As D has at most p_n(M + km) ≤ (M + km + 1)^n strings of length ≤ M + km, we see that

\[
\lim_{m\to\infty} \frac{|Y_0^{\le M+km}|}{|D^{\le M+km}|}
\ge \lim_{m\to\infty} \frac{m^n/n!}{(M+km+1)^n}
= \frac{1}{n!\,k^n}.
\]

We now do the same for the disjoint sets Y_{ω^n j} = {y : R(ω^n j, y, ω^n (j + 1))}, for j up to (n! + 1)k^n, and have the same contradiction.
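For readers who want the limit spelled out, the following short derivation (our own expansion, using only the quantities M, k, m and n defined above, with N standing for the number of disjoint sets considered) shows where the bound 1/(n! k^n) comes from and why more than n! k^n disjoint sets cannot fit inside D.

```latex
\[
\frac{m^n/n!}{(M+km+1)^n}
  = \frac{1}{n!}\left(\frac{m}{M+km+1}\right)^{n}
  = \frac{1}{n!}\left(\frac{1}{k+\frac{M+1}{m}}\right)^{n}
  \longrightarrow \frac{1}{n!\,k^{n}}
  \quad (m \to \infty).
\]
% If N pairwise disjoint subsets of D each have limiting density
% at least 1/(n! k^n) inside D, their densities sum to at most 1:
\[
1 \;\ge\; \sum_{j=0}^{N-1} \frac{1}{n!\,k^{n}} \;=\; \frac{N}{n!\,k^{n}},
\qquad \text{which fails as soon as } N > n!\,k^{n}.
\]
```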

6 Rationals

We wish to show that (Q, ≤) does not have a rank-n presentation, for any n. We will show this by induction on n. (Q, ≤) cannot have a rank-0 automatic presentation, because there is only one word over 0 symbols. Now suppose that (Q, ≤) does not have a rank-(n−1) presentation. We will show it does not have a rank-n automatic presentation.

Suppose it did, and construct the automaton recognizing the relation R(x, y, z) ⇔ x < y ∧ y < z. Let k be the number of states it has. Consider the disjoint sets Q_j = {y : R(j, y, j + 1)}, for j ∈ [n!k^n] = {0, 1, ..., n!·k^n}. Each Q_j is a dense linear order without endpoints, an isomorphic copy of Q. Each of these is also a regular set. By the compression lemma and the inductive hypothesis, for every L ∈ N each Q_j contains a string with at least L of each of the n symbols; take y_j ∈ Q_j to be such a string for L = k + max_j |j|, where |j| is the length of the code for j. Let M be the maximum of the lengths of the codes for the representatives y_j, j ∈ [n!k^n]. So by the same argument used for ω^n,

\[
\lim_{m\to\infty} \frac{|Q_j^{\le M+km}|}{|D^{\le M+km}|}
\ge \lim_{m\to\infty} \frac{m^n/n!}{(M+km+1)^n}
= \frac{1}{n!\,k^n}.
\]

However, we define n!k^n + 1 of these disjoint sets, and for m large enough, we do not have enough total strings of length ≤ m.

7 Oracles and Ordinals

This work sort of subsumes the earlier section on Ordinals. The technique is slightly stronger, and is the first obvious example of the limited memory lemma.


7.1 ω^2

We wish to show that ω^2 is not rank-1 presentable, even with an oracle over an arbitrary alphabet. Suppose we have a presentation (ω^2, ≤) ≅ (D, L), where D and L are recognized by finite automata with access to a fixed oracle string o. Suppose that F, the machine recognizing L, has k states.

Consider the sets W_j = {ω·j + c : c ∈ N}, for j ∈ {0, 1, ..., 2k}. Each is infinite and regular (with an oracle), as it is first order definable with parameters as {y : ω·j ≤ y ∧ y < ω·(j + 1)}. Take a set X of representatives x_j of W_j, j even. Let M be the maximum length of the encodings of any of these strings. Take a second set Y of representatives y_j of W_j, for j odd, each of whose codes is at least of length M.

Consider the computation of F on the string ⊗(o, x, y), where x ∈ X, y ∈ Y. Look at the state of the finite automaton after M steps. Independent of which x and y were chosen, the second coordinate will read blanks from this point on, while the third still reads 1s. The state of the automaton is one of {q_j : j ∈ {1, ..., k}}, independent of y. Partition X into X_1, ..., X_k, according to which state it put the machine into at this point. For each y, two x's in the same piece of the partition must yield the same outcome, as the second coordinate will read blanks regardless of the x chosen. There are comparisons of the x_j's with the y_j's which distinguish any two x_j's: if i < j are both even, then x_i ≤ y_{i+1} holds but x_j ≤ y_{i+1} does not. So all k + 1 of the x_j's must be in separate pieces of the partition. This is a contradiction.

7.2 ω^n

Sketch: When we go to ω^n, we find representatives of {ω^{n−1}·j + c} with at least k 0s (using the same induction as before; we need to go through a little bit of machinery to get the compression lemma with oracles), and then pump all these up to get long representatives, all starting with many 0s. Then the limited memory lemma applies again.

8 Free Term Algebra

Consider the free term algebra generated by the constant c and a function f(x, y), with the relation R(t_1, t_2, t_3) ⇔ t_3 = f(t_1, t_2). We know by the growth argument this isn't finite word automatically presentable, and this was an attempt to use the limited memory lemma to show that oracles don't help. But there is one part that fails.

Suppose (D, R) were an automatic presentation of this structure, over an alphabet Σ. Let k be the number of states in the automaton recognizing R. For any n, define D^{≤n} to be the set of strings in D of length at most n. Take n ≫ k, and let S = D^{≤n}. Then consider the set T = {t_3 : R(s_1, s_2, t_3); s_1, s_2 ∈ S}. Because every term t_3 ≠ c has a unique decomposition as f(t_1, t_2), we have |T| = |S|^2. Now take just the strings in T of length > n, i.e. T′ = T \ S, with |T′| ≥ |S|^2 − |S|.


For any a ∈ Σ^n, define T_a = {az ∈ T′}. At most |S|·|Σ|^k of these are nonempty: given b ∈ T_a, we can pump b down to length ≤ n, while preserving the first n − k symbols. So there must be a string s ∈ S corresponding to each nonempty T_a, and such a string corresponds to at most |Σ|^k such sets. (This part doesn't immediately relativize.) So there is some T_a of size at least (|S|^2 − |S|)/(|S|·|Σ|^k) = (|S| − 1)/|Σ|^k > k. Let this set be L. Then by the limited memory lemma (nonoracle version), the relations R_l(t_1, t_2) ⇔ R(t_1, t_2, l) partition S^2 into at most k equivalence classes. But in fact each of the > k relations in that set selects out a single pair (t_1, t_2), a contradiction.