Pheno Technology Carl Pollard Department of Linguistics Ohio State - - PowerPoint PPT Presentation

SLIDE 1

Pheno Technology

Carl Pollard

Department of Linguistics Ohio State University

February 14, 2012

Carl Pollard Pheno Technology

SLIDE 2

Beyond Strings

We can’t keep pretending that all there is to pheno is strings and functions over strings. Often we need to ask: strings of what? Syllables? Prosodic words? Intonation phrases? And it’s not enough just to stick things together; often we need to know ‘how tightly’ or by ‘what flavor of glue’ things are stuck together. Also there is the issue of non-determinism: sometimes there is some freedom of variation in how things are ordered. We need to develop some technology for talking about such things within the higher-order pheno theory. First we review how strings are talked about in set theory.

SLIDE 3

Review of Standard Notation

For any sets A and B, we write A^B for the set of functions from B to A. We write ω for the set of natural numbers. Each natural number n is the same as the set of natural numbers less than n. The members of A^n are called A-strings of length n. (We drop the A-prefix when we know which set we’re talking about.) The unique member of A^0, called the null A-string, is written ε_A (or just ε). For n > 0, the string that maps each i < n to a_i is usually written a_0 ... a_{n−1}. So the notation a is ambiguous between a member of A and the string of length 1 that maps 0 to a.

SLIDE 4

The Monoid of A-Strings

A monoid is an algebra with:

  • a binary operation which is associative, and
  • a distinguished member which is a two-sided identity for the binary operation.

For any set A, the set of all A-strings is A∗ =def ⋃_{i∈ω} A^i.

A∗ forms a monoid with

  • ⌢ (concatenation) as the associative operation
  • ε_A (the null A-string) as the identity for ⌢.

Here if f ∈ A^m and g ∈ A^n, then f ⌢ g ∈ A^{m+n} is given by

  • (f ⌢ g)(i) = f(i) for all i < m; and
  • (f ⌢ g)(m + i) = g(i) for all i < n.
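
The two clauses defining ⌢ can be checked on small examples. The following Python sketch is an illustration and not part of the original slides: it assumes an A-string of length n can be modelled as a tuple (indexing plays the role of function application), and the names `concat` and `EPSILON` are our own.

```python
def concat(f, g):
    """Concatenate the A-strings f (length m) and g (length n)."""
    m, n = len(f), len(g)
    # (f ⌢ g)(i) = f(i) for i < m, and (f ⌢ g)(m + i) = g(i) for i < n.
    return tuple(f[i] if i < m else g[i - m] for i in range(m + n))

EPSILON = ()  # the null string, the unique member of A^0

f, g, h = ("a", "b"), ("c",), ("d", "e")

# The monoid laws, checked on these particular strings only.
assoc_holds = concat(concat(f, g), h) == concat(f, concat(g, h))
identity_holds = concat(f, EPSILON) == f == concat(EPSILON, f)
```

A check on three strings is of course not a proof of associativity; it only shows that the definition behaves as expected on this instance.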

SLIDE 5

More Notation

Since concatenation is associative, we can just write f ⌢ g ⌢ h instead of (f ⌢ g) ⌢ h or f ⌢ (g ⌢ h). If f = a_0 ... a_{n−1} and g = b_0 ... b_{m−1}, then f ⌢ g = a_0 ... a_{n−1} b_0 ... b_{m−1}. Usually concatenation is expressed without the “⌢”, by mere juxtaposition; e.g. fg for f ⌢ g. This can be confusing because it conflicts with the a_0 ... a_{n−1} notation. For example, if a, b, c, d, e ∈ A and f = bcd, then the notation afe means the string of length 5 abcde, but if f ∈ A, then afe means a string of length 3. Also, if f and g are A-strings, then fg could mean either their concatenation, which is an A-string, or else an A∗-string of length 2. It will be important for us to avoid such confusions.

SLIDE 6

A-Languages

For any set A, an A-language is a set of A-strings, i.e. a subset of A∗. Thought of as an A-language, the empty set ∅ is written 0_A. The singleton A-language whose only member is the null A-string ε is written 1_A. For any a ∈ A, a is the singleton A-language whose only member is the string of length one a. For any two A-languages L and M, the language concatenation of L and M, written L • M, is the set of all strings of the form u ⌢ v where u ∈ L and v ∈ M.
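
Language concatenation can be tried out on finite languages. The sketch below is an illustration only: it assumes strings are Python tuples and languages are frozensets of tuples, and the names `lang_concat`, `ZERO`, and `ONE` are hypothetical stand-ins for •, 0_A, and 1_A.

```python
def lang_concat(L, M):
    """L • M: every u ⌢ v with u in L and v in M."""
    return frozenset(u + v for u in L for v in M)

ZERO = frozenset()        # 0_A, the empty language
ONE = frozenset({()})     # 1_A, whose only member is the null string

L = frozenset({("a",), ("a", "b")})
M = frozenset({("b",), ()})

LM = lang_concat(L, M)    # the string ("a", "b") arises twice but is stored once
```

In this model `ONE` behaves as the identity for `lang_concat`, while concatenating with `ZERO` always yields `ZERO`, since there is no string to pick from the empty language.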

SLIDE 7

The Ordered Monoid of A-Languages

An ordered monoid is a monoid with an order ≤, such that the associative operation ◦ is monotonic, i.e. if a ≤ b and c ≤ d, then a ◦ c ≤ b ◦ d. For any set A, the set of A-languages ℘(A∗) forms an ordered monoid with

  • A-languages (i.e. sets of A-strings) as the elements
  • subset inclusion as the order
  • language concatenation • as the associative operation
  • 1_A (= {ε_A}) as the identity for •.

SLIDE 8

Residuals

Two other important operations on the set ℘(A∗) of A-languages are the residuals of •, defined as follows: for any two A-languages L and M,

  • the right residual of L by M, written L/M, is the set of all strings u such that u ⌢ v ∈ L for every v ∈ M.
  • the left residual of L by M, written M\L, is the set of all strings u such that v ⌢ u ∈ L for every v ∈ M.

(With the addition of these operations, ℘(A∗) becomes a kind of ordered algebra called a residuated monoid.)
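
For finite languages the residuals can be computed directly. The following Python sketch is an illustration under stated assumptions: strings are tuples, languages are finite sets of tuples, and M is non-empty (when M is empty, every string satisfies the condition vacuously, so the residual is all of A∗ and cannot be listed). The function names are our own.

```python
def right_residual(L, M):
    """L/M: all u such that u ⌢ v is in L for every v in M (M non-empty)."""
    if not M:
        raise ValueError("residual by the empty language is all of A*")
    # Any such u is a prefix of some member of L, so only prefixes are candidates.
    prefixes = {w[:i] for w in L for i in range(len(w) + 1)}
    return frozenset(u for u in prefixes if all(u + v in L for v in M))

def left_residual(M, L):
    """The left residual of L by M: all u such that v ⌢ u is in L for every v in M."""
    if not M:
        raise ValueError("residual by the empty language is all of A*")
    # Symmetrically, only suffixes of members of L are candidates.
    suffixes = {w[i:] for w in L for i in range(len(w) + 1)}
    return frozenset(u for u in suffixes if all(v + u in L for v in M))

L = {("a", "b"), ("a", "c"), ("b",)}
right = right_residual(L, {("b",), ("c",)})
left = left_residual({("a",)}, L)
```

Here `right` contains only the string a, because a is the only prefix that can be completed by both b and c to a member of L; `left` contains b and c, the strings that a can precede.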

SLIDE 9

Kleene Closure

For any A-language L, the Kleene closure of L, written kl(L), is the A-language defined as follows:

  1. (base clause) ε ∈ kl(L)
  2. (recursion clause) if u ∈ L and v ∈ kl(L), then uv ∈ kl(L)
  3. nothing else is in kl(L).

Intuitively: the members of kl(L) are the strings formed by concatenating zero or more strings of L.
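
Since kl(L) is infinite whenever L contains a non-null string, it can only be enumerated up to a length bound. The Python sketch below is an illustration, not part of the slides: it assumes L is a finite set of tuples, and `kleene_upto` and its `max_len` parameter are hypothetical names.

```python
def kleene_upto(L, max_len):
    """The members of kl(L) whose length is at most max_len."""
    result = {()}        # base clause: ε is in kl(L)
    frontier = {()}      # strings added in the previous round
    while frontier:
        # Recursion clause: u in L and v in kl(L) give uv in kl(L).
        new = {u + v for u in L for v in frontier if len(u + v) <= max_len}
        new -= result    # "nothing else": keep only genuinely new strings
        result |= new
        frontier = new
    return frozenset(result)

example = kleene_upto({("a", "b"), ("c",)}, 3)
```

The bound loses nothing below it: every suffix of a concatenation u_1 ... u_k is itself no longer than the whole string, so each short member of kl(L) is reached through short intermediate strings.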

SLIDE 10

Positive Kleene Closure

For any A-language L, the positive Kleene closure of L, written kl+(L), is the A-language defined as follows:

  1. (base clause) if u ∈ L, then u ∈ kl+(L)
  2. (recursion clause) if u ∈ L and v ∈ kl+(L), then uv ∈ kl+(L)
  3. nothing else is in kl+(L).

Intuitively: the members of kl+(L) are the strings formed by concatenating one or more strings of L.

SLIDE 11

Strings in the Pheno Theory (1/3)

The way we will handle strings in the pheno theory is influenced by the way they are handled in typed functional programming languages, which (like HOL) are based on typed lambda calculus. One basic idea is that there can be strings of anything. We express this by replacing the string type s with the unary type constructor Str, which denotes a function from types to types. That is: for each type A, Str(A) (often written Str_A) is the type of A-strings. We introduce a type p of ‘basic phenogrammatical units’, which for the time being can be thought of as something like phonological (or prosodic) words. We now revive the notation s as an abbreviation for Str_p.

SLIDE 12

Strings in the Pheno Theory (2/3)

For each type A:

  • We introduce a constant e_A of type Str_A for the null A-string.
  • We introduce a constant toS_A : A → Str_A. Intuitively, for any x of type A, (toS x) plays the same role that would be played in the set-theoretic approach by the length-one string that maps 0 to x.
  • We introduce a constant ·_A (written infix) of type Str_A → Str_A → Str_A for concatenation of A-strings.

We usually drop the subscript A when it is clear from context what kinds of strings we are talking about. For n > 0, we write a_0 ... a_n as an abbreviation for (toS a_0) · ... · (toS a_n).

SLIDE 13

Strings in the Pheno Theory (3/3)

The most obvious way to characterize the behavior of concatenation is by adding the axioms (with s, t, u variables of type Str_A):

  ⊢ ∀s.s · e = s
  ⊢ ∀s.e · s = s
  ⊢ ∀stu.(s · t) · u = s · (t · u)

SLIDE 14

Representing the Natural Numbers

Often it’s useful to be able to identify a position in a string or to know the length of a string. We can represent the natural numbers as the type Str_T, which we abbreviate as n. We represent 0 as e_T. We define the successor function suc : n → n by

  suc =def λn.(toS ∗) · n

Then we write 0, 1, 2, 3, etc. as abbreviations for e_T, toS ∗, ∗∗, ∗∗∗, etc. If necessary we can define the usual arithmetic functions (addition, multiplication, exponentiation) by mimicking the way they are recursively defined in set theory.
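
The encoding of numbers as strings of ∗ can be mimicked as follows. This Python sketch is an illustration only. It assumes that T is a type with the single member ∗ (the slides do not spell this out here), models strings as tuples, and uses hypothetical names (`STAR`, `to_s`, `suc`, `add`, `numeral`).

```python
STAR = "*"   # stands in for ∗, assumed to be the only member of T
E = ()       # e_T, the null T-string, representing 0

def to_s(x):
    """toS: the length-one string containing x."""
    return (x,)

def suc(n):
    """suc = λn.(toS ∗) · n"""
    return to_s(STAR) + n

def add(m, n):
    """Addition by recursion on m: 0 + n = n and (suc m') + n = suc (m' + n)."""
    return n if m == E else suc(add(m[1:], n))

def numeral(k):
    """Build the representation of the Python integer k by applying suc k times."""
    n = E
    for _ in range(k):
        n = suc(n)
    return n

five = add(numeral(2), numeral(3))
```

In this encoding addition coincides with concatenation, so `add(m, n)` and `m + n` on tuples give the same result; the recursive definition is kept to mirror the set-theoretic recursion mentioned above.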

SLIDE 15

Coproduct (Disjoint Union) Types (1/2)

In set theory, the disjoint union of two sets A and B is the union of ‘copies’ A′ and B′ of A and B respectively, where each a ∈ A corresponds to ⟨0, a⟩ in A′ and each b ∈ B corresponds to ⟨1, b⟩ in B′. The HOL analog of disjoint union is the coproduct type constructor ∨. Thus for any two types A and B, there is a type A ∨ B. A and B are called the cofactors of A ∨ B, just as they are called the factors of A ∧ B.

SLIDE 16

Coproduct (Disjoint Union) Types (2/2)

There are term constructors i_{A,B} and j_{A,B}, called (canonical) injections, of types A → (A ∨ B) and B → (A ∨ B) respectively. (Compare these with the projections π_{A,B} and π′_{A,B} of types (A ∧ B) → A and (A ∧ B) → B respectively.)

If f : A → C and g : B → C, we write [f, g] : (A ∨ B) → C for the function which is ‘defined by cases’, i.e. if z of type A ∨ B is ix, then ([f, g] z) = (f x), and if z is jy, then ([f, g] z) = (g y).

Intuitively, ix is the ‘same thing’ as x, but thought of as a member of A ∨ B instead of as a member of A, and so often we just write x instead of ix when no confusion can arise (and likewise y instead of jy).
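
Injections and definition by cases have a direct counterpart in programming languages with tagged unions. The Python sketch below is an illustration, not part of the slides: the classes `Inl` and `Inr` play the roles of the injections i and j, and `case(f, g)` plays the role of [f, g]; all three names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Inl:
    """The left injection: Inl(x) corresponds to ix."""
    value: Any

@dataclass(frozen=True)
class Inr:
    """The right injection: Inr(y) corresponds to jy."""
    value: Any

def case(f, g):
    """[f, g]: apply f to left-injected values and g to right-injected ones."""
    def by_cases(z):
        if isinstance(z, Inl):
            return f(z.value)
        if isinstance(z, Inr):
            return g(z.value)
        raise TypeError("expected an Inl or Inr value")
    return by_cases

# A function on (int ∨ str): increment a number, or measure a string.
size = case(lambda n: n + 1, lambda s: len(s))
```

Unlike the slides' convention of writing x for ix, the code keeps the tags explicit. That is what makes `Inl(0)` and `Inr(0)` distinct values, just as ⟨0, a⟩ and ⟨1, b⟩ are distinct in the set-theoretic disjoint union.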

SLIDE 17

Null and Non-Null String Types (1/2)

For each type A, we think of the type Str_A as the coproduct of the null string type Nst_A and the non-null string type Nns_A:

  Str_A = Nst_A ∨ Nns_A

Then we adjust the type of e_A from Str_A to Nst_A. We introduce a constant cns : A → Str_A → Nns_A. Intuitively, (cns x s) will represent the result of sticking x onto the front of the string s. We introduce the constants fst_A : Nns_A → A (‘first’) and rst_A : Nns_A → Str_A (‘rest’). cns, fst, and rst are related by the axiom (here s is of type Nns_A):

  ⊢ ∀s.s = (cns (fst s) (rst s))

SLIDE 18

Null and Non-Null String Types (2/2)

We can define toS : A → Nns_A as

  toS =def λx.(cns x e)

Concatenation is related to cns by the axiom:

  ⊢ ∀xst.(cns x s) · t = (cns x (s · t))

We define the length function len_A : Str_A → n by the axioms:

  ⊢ (len e) = 0
  ⊢ ∀xs.(len (cns x s)) = (suc (len s))
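
The null/non-null split and the axioms relating cns, fst, rst, concatenation, and len can be mirrored by a cons-list data structure. The Python sketch below is an illustration under stated assumptions: `Nil` and `Cons` are hypothetical classes standing in for Nst_A and Nns_A, the clause for concatenating onto the null string comes from the identity axiom ⊢ ∀s.e · s = s, and `length` returns a Python integer rather than a string of type n.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Nil:
    """The null string e, the sole inhabitant of Nst_A."""

@dataclass(frozen=True)
class Cons:
    """(cns x s), an inhabitant of Nns_A, with fields for fst and rst."""
    fst: Any
    rst: Any   # a Nil or a Cons

E = Nil()

def to_s(x):
    """toS = λx.(cns x e)"""
    return Cons(x, E)

def concat(s, t):
    if isinstance(s, Nil):
        return t                             # e · t = t
    return Cons(s.fst, concat(s.rst, t))     # (cns x s) · t = cns x (s · t)

def length(s):
    # (len e) = 0 and (len (cns x s)) = suc (len s)
    return 0 if isinstance(s, Nil) else 1 + length(s.rst)

ab = concat(to_s("a"), to_s("b"))
```

Because `fst` and `rst` are fields of `Cons` only, asking for the first element of the null string is an attribute error, which is the runtime shadow of fst having type Nns_A → A rather than Str_A → A.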
