Proceedings of the 35th Annual ACL and 8th European ACL, 313-320, Madrid, July 1997. [With minor corrections.]

Efficient Generation in Primitive Optimality Theory

Jason Eisner
Dept. of Computer and Information Science
Univ. of Pennsylvania
200 S. 33rd St., Philadelphia, PA 19104-6389, USA
jeisner@linc.cis.upenn.edu

Abstract

This paper introduces primitive Optimality Theory (OTP), a linguistically motivated formalization of OT. OTP specifies the class of autosegmental representations, the universal generator Gen, and the two simple families of permissible constraints. In contrast to less restricted theories using Generalized Alignment, OTP's optimal surface forms can be generated with finite-state methods adapted from (Ellison, 1994). Unfortunately these methods take time exponential in the size of the grammar. Indeed the generation problem is shown NP-hard in this sense. However, techniques are discussed for making Ellison's approach fast in the typical case, including a simple trick that alone provides a 100-fold speedup on a grammar fragment of moderate size. One avenue for future improvements is a new finite-state notion, "factored automata," where regular languages are represented compactly via formal intersections A_1 ∩ A_2 ∩ ... ∩ A_k of FSAs.

1 Why formalize OT?

Phonology has recently undergone a paradigm shift. Since the seminal work of (Prince & Smolensky, 1993), phonologists have published literally hundreds of analyses in the new constraint-based framework of Optimality Theory, or OT. Old-style derivational analyses have all but vanished from the linguistics conferences.

The price of this creative ferment has been a certain lack of rigor. The claim for OT as Universal Grammar is not substantive or falsifiable without formal definitions of the putative Universal Grammar objects Repns, Con, and Gen (see below). Formalizing OT is necessary not only to flesh it out as a linguistic theory, but also for the sake of computational phonology. Without knowing what classes of constraints may appear in grammars, we can say only so much about the properties of the system, or about algorithms for generation, comprehension, and learning.

The central claim of OT is that the phonology of any language can be naturally described as successive filtering. In OT, a phonological grammar for a language consists of a vector C_1, C_2, ..., C_n of soft constraints drawn from a universal fixed set Con. Each constraint in the vector is a function that scores possible output representations (surface forms):

(1) C_i : Repns -> {0, 1, 2, ...}   (C_i in Con)

If C_i(R) = 0, the output representation R is said to satisfy the ith constraint of the language. Otherwise it is said to violate that constraint, where the value of C_i(R) specifies the degree of violation. Each constraint yields a filter that permits only minimal violation of the constraint:

(2) Filter_i(Set) = {R in Set : C_i(R) is minimal}

Given an underlying phonological input, its set of legal surface forms under the grammar (typically of size 1) is just

(3) Filter_n(... Filter_2(Filter_1(Gen(input))))

where the function Gen is fixed across languages and Gen(input) ⊆ Repns is a potentially infinite set of candidate surface forms.

In practice, each surface form in Gen(input) must contain a silent copy of input, so the constraints can score it on how closely its pronounced material matches input. The constraints also score other criteria, such as how easy the material is to pronounce. If C_1 in a given language is violated by just the forms with coda consonants, then Filter_1(Gen(input)) includes only coda-free candidates, regardless of their other demerits, such as discrepancies from input or unusual syllable structure. The remaining constraints are satisfied only as well as they can be given this set of survivors. Thus, when it is impossible to satisfy all constraints at once, successive filtering means early constraints take priority.

Questions under the new paradigm include these:

- Generation. How to implement the input-output mapping in (3)? A brute-force approach fails to terminate if Gen produces infinitely many candidates. Speakers must solve this problem. So must linguists, if they are to know what their proposed grammars predict.
- Comprehension. How to invert the input-output mapping in (3)? Hearers must solve this.
- Learning. How to induce a lexicon and a phonology like (1) for a particular language, given the kind of evidence available to child language learners?

None of these questions is well-posed without restrictions on Gen and Con. In the absence of such restrictions, computational linguists have assumed convenient ones.
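The successive filtering of (1)-(3) can be made concrete on a finite candidate set. The sketch below uses hypothetical toy constraints (strings as candidates, violation counts as scores); real Gen(input) may of course be infinite, which is the whole difficulty addressed later.

```python
# Successive filtering, eqs. (1)-(3), over a finite candidate set.
# A constraint maps a candidate string to its violation count.

def filter_min(candidates, constraint):
    """Filter_i(Set) = {R in Set : C_i(R) is minimal}   -- eq. (2)"""
    best = min(constraint(r) for r in candidates)
    return [r for r in candidates if constraint(r) == best]

def successive_filter(candidates, constraints):
    """Filter_n(... Filter_1(candidates))   -- eq. (3), Gen already applied."""
    for c in constraints:
        candidates = filter_min(candidates, c)
    return candidates

# Hypothetical toy grammar: NoCoda outranks a Parse-like deletion penalty,
# so filtering keeps the coda-free yet least-truncated form.
no_coda = lambda r: sum(1 for syll in r.split(".") if syll[-1] in "ptk")
parse   = lambda r: 6 - len(r)   # toy: shorter forms "deleted" more material

print(successive_filter(["pat.ka", "pa.ka", "pa"], [no_coda, parse]))
# -> ['pa.ka']
```

Note how ranking emerges from the order of filtering alone: "pat.ka" is removed by the first filter even though it would win on the second.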
Ellison (1994) solves the generation problem where Gen produces a regular set of strings and Con admits all finite-state transducers that can map a string to a number in unary notation. (Thus C_i(R) = 4 if the C_i transducer outputs the string 1111 on input R.) Tesar (1995, 1996) extends this result to the case where Gen(input) is the set of parse trees for input under some context-free grammar (CFG).[1] Tesar's constraints are functions on parse trees such that C_i([A [B_1 ...] [B_2 ...]]) can be computed from A, B_1, B_2, C_i(B_1), and C_i(B_2). The optimal tree can then be found with a standard dynamic-programming chart parser for weighted CFGs.

It is an important question whether these formalisms are useful in practice. On the one hand, are they expressive enough to describe real languages? On the other, are they restrictive enough to admit good comprehension and unsupervised-learning algorithms?

The present paper sketches primitive Optimality Theory (OTP), a new formalization of OT that is explicitly proposed as a linguistic hypothesis. Representations are autosegmental, Gen is trivial, and only certain simple and phonologically local constraints are allowed. I then show the following:

1. Good news: Generation in OTP can be solved attractively with finite-state methods. The solution is given in some detail.
2. Good news: OTP usefully restricts the space of grammars to be learned. (In particular, Generalized Alignment is outside the scope of finite-state or indeed context-free methods.)
3. Bad news: While OTP generation is close to linear in the size of the input form, it is NP-hard in the size of the grammar, which for human languages is likely to be quite large.
4. Good news: Ellison's algorithm can be improved so that its exponential blowup is often avoided.

[Footnote 1: This extension is useful for OT syntax but may have little application to phonology, since the context-free case reduces to the regular case (i.e., Ellison) unless the CFG contains recursive productions.]

2 Primitive Optimality Theory

Primitive Optimality Theory, or OTP, is a formalization of OT featuring a homogeneous output representation, extremely local constraints, and a simple, unrestricted Gen. Linguistic arguments for OTP's constraints and representations are given in (Eisner, 1997), whereas the present description focuses on its formal properties and suitability for computational work. An axiomatic treatment is omitted for reasons of space. Despite its simplicity, OTP appears capable of capturing virtually all analyses found in the (phonological) OT literature.

2.1 Repns: Representations in OTP

To represent [mp], OTP uses not the autosegmental representation in (4a) (Goldsmith, 1976; Goldsmith, 1990) but rather the simplified autosegmental representation in (4b), which has no association lines. Similarly (5a) is replaced by (5b). The central representational notion is that of a constituent timeline: an infinitely divisible line along which constituents are laid out. Every constituent has width and edges.

(4) a. [diagram: classical autosegmental representation of [mp], with voi and nas autosegments linked by association lines to two C slots that share a lab autosegment]
    b. [timeline diagram: the same material laid out as intervals voi[ ]voi, nas[ ]nas, C[ C[ ]C ]C, lab[ ]lab along the timeline, with no association lines]

For phonetic interpretation: ]voi says to end voicing (laryngeal vibration). At the same instant, ]nas says to end nasality (raise velum).

(5) a. [diagram: classical representation of CVCV, with two syllables linked by association lines to C and V slots]
    b. [timeline diagram: intervals sigma[ ]sigma sigma[ ]sigma, C[ ]C C[ ]C, V[ ]V V[ ]V]

A timeline can carry the full panoply of phonological and morphological constituents: anything that phonological constraints might have to refer to. Thus, a timeline bears not only autosegmental features like nasal gestures [nas] and prosodic constituents such as syllables [sigma], but also stress marks [x], feature domains such as [ATRdom] (Cole & Kisseberth, 1994) and morphemes such as [Stem]. All these constituents are formally identical: each marks an interval on the timeline. Let Tiers denote the fixed finite set of constituent types, {nas, sigma, x, ATRdom, Stem, ...}.

It is always possible to recover the old representation (4a) from the new one (4b), under the convention that two constituents on the timeline are linked if their interiors overlap (Bird & Ellison, 1994). The interior of a constituent is the open interval that excludes its edges. Thus, lab is linked to both consonants C in (4b), but the two consonants are not linked to each other, because their interiors do not overlap.

By eliminating explicit association lines, OTP eliminates the need for faithfulness constraints on them, or for well-formedness constraints against gapping or crossing of associations. In addition, OTP can refer naturally to the edges of syllables (or morphemes). Such edges are tricky to define in (5a), because a syllable's features are scattered across multiple tiers and perhaps shared with adjacent syllables.

In diagrams of timelines, such as (4b) and (5b), the intent is that only horizontal order matters. Horizontal spacing and vertical order are irrelevant. Thus, a timeline may be represented as a finite collection S of labeled edge brackets, equipped with ordering relations that indicate which brackets precede each other or fall in the same place. Valid timelines (those in Repns) also require that edge brackets come in matching pairs, that constituents have positive width, and that constituents of the same type do not overlap (i.e., two constituents on the same tier may not be linked).

2.2 Gen: Input and output in OTP

OT's principle of Containment (Prince & Smolensky, 1993) says that each of the potential outputs in Repns includes a silent copy of the input, so that constraints evaluating it can consider the goodness of match between input and output. Accordingly, OTP represents both input and output constituents on the constituent timeline, but on different tiers. Thus surface nasal autosegments are bracketed with nas[ and ]nas, while underlying nasal autosegments are bracketed with _nas_[ and ]_nas_. (The underlining, rendered here with underscores, is a notational convention to denote input material.) No connection is required between [nas] and [_nas_] except as enforced by constraints that prefer [nas] and [_nas_] or their edges to overlap in some way. (6) shows a candidate in which underlying [nas] has surfaced "in place" but with rightward spreading.

(6) [timeline diagram: a surface nas[ ]nas interval whose left edge coincides with that of the underlying _nas_[ ]_nas_ interval but whose right edge extends beyond it]

Here the left edges and interiors overlap, but the right edges fail to. Such overlap of interiors may be regarded as featural Input-Output Correspondence in the sense of (McCarthy & Prince, 1995).

The lexicon and morphology supply to Gen an underspecified timeline: a partially ordered collection of input edges. The use of a partial ordering allows the lexicon and morphology to supply floating tones, floating morphemes and templatic morphemes. Given such an underspecified timeline as lexical input, Gen outputs the set of all fully specified timelines that are consistent with it. No new input constituents may be added. In essence, Gen generates every way of refining the partial order of input constituents into a total order and decorating it freely with output constituents. Conditions such as the prosodic hierarchy (Selkirk, 1980) are enforced by universally high-ranked constraints, not by Gen.[2]

2.3 Con: The primitive constraints

Having described the representations used, it is now possible to describe the constraints that evaluate them. OTP claims that Con is restricted to the following two families of primitive constraints:

(7) alpha -> beta ("implication"): "Each alpha temporally overlaps some beta."
    Scoring: Constraint(R) = number of alpha's in R that do not overlap any beta.

(8) alpha _|_ beta ("clash"): "Each alpha temporally overlaps no beta."
    Scoring: Constraint(R) = number of (alpha, beta) pairs in R such that the alpha overlaps the beta.

That is, alpha -> beta says that alpha's attract beta's, while alpha _|_ beta says that alpha's repel beta's.
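These two scoring functions can be sketched directly, assuming a simplified timeline where every event is an interval and overlap means open-interval intersection (a hypothetical simplification: real OTP events may also be zero-width edges).

```python
# Scoring sketch for the two primitive constraint families (7)-(8).
# An event is a hypothetical (label, start, end) triple on the timeline.

def overlaps(a, b):
    """Interiors (open intervals) of two events intersect."""
    return max(a[1], b[1]) < min(a[2], b[2])

def implication(alphas, betas):
    """alpha -> beta, eq. (7): count alphas overlapping no beta."""
    return sum(1 for a in alphas if not any(overlaps(a, b) for b in betas))

def clash(alphas, betas):
    """alpha _|_ beta, eq. (8): count overlapping (alpha, beta) pairs."""
    return sum(1 for a in alphas for b in betas if overlaps(a, b))

nas = [("nas", 0, 2), ("nas", 5, 6)]   # two nasal gestures
voi = [("voi", 1, 3)]                  # one voicing gesture
print(implication(nas, voi))  # 1: the second nas overlaps no voi (violates NasVoi-style rule)
print(clash(nas, voi))        # 1: the first nas does overlap a voi
```

The same `overlaps` test also implements the linking convention of section 2.1: two constituents count as linked exactly when this predicate holds of their interiors.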
These are simple and arguably natural constraints; no others are used.

In each primitive constraint, alpha and beta each specify a phonological event. An event is defined to be either a type of labeled edge, written e.g. sigma[, or the interior (excluding edges) of a type of labeled constituent, written e.g. sigma. To express some constraints that appear in real phonologies, it is also necessary to allow alpha and beta to be non-empty conjunctions and disjunctions of events. However, it appears possible to limit these cases to the forms in (9)-(10). Note that other forms, such as those in (11), can be decomposed into a sequence of two or more constraints.[3]

[Footnote 2: The formalism is complicated slightly by the possibility of deleting segments (syncope) or inserting segments (epenthesis), as illustrated by the candidates below.
(i) Syncope (CVC -> CC): the V is crushed to zero width so the C's can be adjacent. [timeline diagram: the underlying V reduced to a single zero-width | mark between two adjacent surface C constituents]
(ii) Epenthesis (CC -> CVC): the C's are pushed apart. [timeline diagram: a surface V[ ]V interval inserted between the two underlying C intervals]
In order to allow adjacency of the surface consonants in (i), as expected by assimilation processes (and encouraged by a high-ranked constraint), note that the underlying vowel must be allowed to have zero width, an option available to input but not output constituents. The input representation must specify only that _V_[ weakly precedes ]_V_, not that it strictly precedes it. Similarly, to allow (ii), the input representation must specify only that ]_C1_ weakly precedes _C2_[, not that the two coincide.]

[Footnote 3: Such a sequence does alter the meaning slightly. To get the exact original meaning, we would have to decompose into so-called "unranked" constraints, whereby C_i(R) is defined as C_{i1}(R) + C_{i2}(R). But such ties undermine OT's idea of strict ranking: they confer the power to minimize linear functions such as (C_1 + C_1 + C_1 + C_2 + C_3 + C_3)(R) = 3C_1(R) + C_2(R) + 2C_3(R). For this reason, OTP currently disallows unranked constraints; I know of no linguistic data that crucially require them.]

(9) (alpha_1 and alpha_2 and ...) -> (beta_1 or beta_2 or ...)
    Scoring: Constraint(R) = number of vectors of events <A_1, A_2, ...> of types alpha_1, alpha_2, ... respectively that all overlap on the timeline and whose intersection does not overlap any event of type beta_1 or beta_2 or ....

(10) (alpha_1 and alpha_2 and ...) _|_ (beta_1 and beta_2 and ...)
     Scoring: Constraint(R) = number of vectors of events <A_1, A_2, ..., B_1, B_2, ...> of types alpha_1, alpha_2, ..., beta_1, beta_2, ... respectively that all overlap on the timeline. (Could also be notated: alpha_1 _|_ alpha_2 _|_ ... _|_ beta_1 _|_ beta_2 _|_ ....)

(11) alpha -> (beta_1 and beta_2)   [cf. alpha -> beta_1, alpha -> beta_2]
     (alpha_1 or alpha_2) -> beta   [cf. alpha_1 -> beta, alpha_2 -> beta]

The unifying theme is that each primitive constraint counts the number of times a candidate gets into some bad local configuration. This is an interval on the timeline throughout which certain events (one or more specified edges or interiors) are all present and certain other events (zero or more specified edges or interiors) are all absent.

Several examples of phonologically plausible constraints, with monikers and descriptions, are given below. (Eisner, 1997) shows how to rewrite hundreds of constraints from the literature in the primitive constraint notation, and discusses the problematic case of reduplication. (Eisner, in press) gives a detailed stress typology using only primitive constraints; in particular, non-local constraints such as FtBin, FootForm, and Generalized Alignment (McCarthy & Prince, 1993) are eliminated.

(12) a. Onset: sigma[ -> C[  "Every syllable starts with a consonant."
     b. NonFinality: ]Word _|_ ]F  "The end of a word may not be footed."
     c. FtSyl: F[ -> sigma[, ]F -> ]sigma  "Feet start and end on syllable boundaries."
     d. PackFeet: ]F -> F[  "Each foot is followed immediately by another foot; i.e., minimize the number of gaps between feet. Note that the final foot, if any, will always violate this constraint."
     e. NoClash: ]x _|_ x[  "Two stress marks may not be adjacent."
     f. ProgressiveVoicing: ]voi _|_ C[  "If the segment preceding a consonant is voiced, voicing may not stop prior to the consonant but must be spread onto it."
     g. NasVoi: nas -> voi  "Every nasal gesture must be at least partly voiced."
     h. FullNasVoi: nas _|_ voi[, nas _|_ ]voi  "A nasal gesture may not be only partly voiced."
     i. Max(voi) or Parse(voi): _voi_ -> voi  "Underlying voicing features surface."
     j. Dep(voi) or Fill(voi): voi -> _voi_  "Voicing features appear on the surface only if they are also underlying."
     k. NoSpreadRight(voi): _voi_ _|_ ]voi  "Underlying voicing may not spread rightward as in (6)."
     l. NonDegenerate: F -> mu[  "Every foot must cross at least one mora boundary mu[."
     m. TautomorphemicFoot: F _|_ Morph[  "No foot may cross a morpheme boundary."

3 Finite-state generation in OTP

3.1 A simple generation algorithm

Recall that the generation problem is to find the output set S_n, where

(13) a. S_0 = Gen(input) ⊆ Repns
     b. S_{i+1} = Filter_{i+1}(S_i) ⊆ S_i

Since in OTP the input is a partial order of edge brackets, and S_n is a set of one or more total orders (timelines), a natural approach is to successively refine a partial order. This has merit. However, not every S_i can be represented as a single partial order, so the approach is quickly complicated by the need to encode disjunction.

A simpler approach is to represent S_i (as well as input and Repns) as a finite-state automaton (FSA), denoting a regular set of strings that encode timelines. The idea is essentially due to (Ellison, 1994), and can be boiled down to two lines:

(14) Ellison's algorithm (variant).
     S_0 = input ∩ Repns   (= all conceivable outputs containing input)
     S_{i+1} = BestPaths(S_i ∩ C_{i+1})

Each constraint C_i must be formulated as an edge-weighted FSA that scores candidates: C_i accepts any string R, on a single path of weight C_i(R).[4] BestPaths is Dijkstra's "single-source shortest paths" algorithm, a dynamic-programming algorithm that prunes away all but the minimum-weight paths in an automaton, leaving an unweighted automaton. OTP is simple enough that it can be described in this way.[5] The next section gives a nice encoding.

[Footnote 4: Weighted versions of the state-labeled finite automata of (Bird & Ellison, 1994) could be used instead.]

[Footnote 5: The converse is also true, as was shown at the talk accompanying this paper: given enough abstract tiers, OTP can simulate any finite-state OT grammar.]
3.2 OTP with automata

We may encode each timeline as a string over an enormous alphabet Sigma. If |Tiers| = k, then each symbol in Sigma is a k-tuple, whose components describe what is happening on the various tiers at a given moment. The components are drawn from a smaller alphabet {[, ], |, +, -}. Thus at any time, the ith tier may be beginning or ending a constituent ([, ]) or both at once (|), or it may be in a steady state in the interior or exterior of a constituent (+, -).

At a minimum, the string must record all moments where there is an edge on some tier. If all tiers are in a steady state, the string need not use any symbols to say so. Thus the string encoding is not unique.

(15) gives an expression for all strings that correctly describe the single tier shown. (16) describes a two-tier timeline consistent with (15). Note that the brackets on the two tiers are ordered with respect to each other. Timelines like these could be assembled morphologically from one or more lexical entries (Bird & Ellison, 1994), or produced in the course of algorithm (14).

(15) [single-tier diagram: two abutting x constituents]  =>  -*[+*|+*]-*

(16) [two-tier diagram consistent with (15), adding a y tier]  =>
     <-,->* <[,-> <+,->* <+,[> <+,+>* <|,+> <+,+>* <+,]> <+,->* <+,[> <+,+>* <],]>

We store timeline expressions like (16) as deterministic FSAs. To reduce the size of these automata, it is convenient to label arcs not with individual elements of Sigma (which is huge) but with subsets of Sigma, denoted by predicates. We use conjunctive predicates where each conjunct lists the allowed symbols on a given tier:

(17) +F, ]sigma, [|+-voi   (arc label w/ 3 conjuncts)

The arc label in (17) is said to mention the tiers F, sigma, voi in Tiers. Such a predicate allows any symbol on the tiers it does not mention.

The input FSA constrains only the input tiers. In (14) we intersect it with Repns, which constrains only the output tiers. Repns is defined as the intersection of many automata exactly like (18), called tier rules, which ensure that brackets are properly paired on a given tier such as F (foot).

(18) [two-state deterministic automaton for tier F: an "exterior" start and final state with self-loop -F, an arc [F into an "interior" state with self-loop |+F, and an arc ]F back to the exterior state]
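A tier rule like (18) can be sketched as a two-state automaton over one tier's component symbols. The transition table below is my transcription of the diagram (the state names "out"/"in" are assumed, not the paper's): it accepts exactly the strings in which that tier's brackets are properly paired.

```python
# Tier rule (18) as a two-state deterministic automaton over one tier's
# component alphabet {'[', ']', '|', '+', '-'}.

TIER_RULE = {                                  # state -> {symbol -> next state}
    "out": {"-": "out", "[": "in"},            # exterior: stay out, or open
    "in":  {"+": "in", "|": "in", "]": "out"}, # interior: continue, close-and-open, or close
}

def tier_ok(symbols, state="out"):
    """Accept iff brackets on this tier are properly paired."""
    for s in symbols:
        state = TIER_RULE[state].get(s)
        if state is None:                      # no arc: reject
            return False
    return state == "out"                      # every constituent closed

print(tier_ok("-[+|+]-"))   # True : matches the pattern -*[+*|+*]-* of (15)
print(tier_ok("-]+["))      # False: unmatched brackets
```

Intersecting one such machine per tier yields Repns, as described above; each machine ignores the other k-1 components of the tuple symbols.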
Like the tier rules, the constraint automata C_i are small and deterministic and can be built automatically. Every edge has weight 0 or 1. With some care it is possible to draw each C_i with two or fewer states, and with a number of arcs proportional to the number of tiers mentioned by the constraint.

Keeping the constraints small is important for efficiency, since real languages have many constraints that must be intersected. Let us do the hardest case first. An implication constraint has the general form (9). Suppose that all the alpha_i are interiors, not edges. Then the constraint targets intervals of the form alpha = alpha_1 ∩ alpha_2 ∩ .... Each time such an interval ends without any beta_j having occurred during it, one violation is counted:

(19) [two-state automaton schema; weight-1 arcs are shown in bold, others are weight-0: the left state loops on "(other)" and has a bold "alpha ends" self-loop that fires when an alpha ends with no beta seen; an arc "beta during alpha" leads to the right state, which loops on "(other)" and returns to the left state on a non-bold "alpha ends" arc]

A candidate that does see a beta_j during an alpha can go and rest in the right-hand state for the duration of the alpha.

Let us fill in the details of (19). How do we detect the end of an alpha? Because one or more of the alpha_i end (], |), while all the alpha_i either end or continue (+), we know we are leaving an alpha.[6] Thus:

(20) [the schema of (19) with explicit predicates: the outside self-loop accepts Sigma - (in or end all alpha_i); the arc entering the alpha-tracking state and its self-loop accept (in all alpha_i) - (some beta_j); the bold violation arc accepts (in or end all alpha_i) - (in all alpha_i); the beta-seen state is entered on (in all alpha_i) & (some beta_j), loops on (in all alpha_i), and is left on (in or end all alpha_i) - (in all alpha_i)]

An unusually complex example is shown in (21). Note that to preserve the form of the predicates in (17) and keep the automaton deterministic, we need to split some of the arcs above into multiple arcs. Each beta_j gets its own arc, and we must also expand set differences into multiple arcs, using the scheme W - (x & y & z) = W & not(x & y & z) = (W & not x) or (W & x & not y) or (W & x & y & not z).

[Footnote 6: It is important to take ], not +, as our indication that we have been inside a constituent. This means that the timeline <[,-> <+,->* <+,[> <+,+>* <],+> <-,+>* <-,]> cannot avoid violating a clash constraint simply by instantiating the <+,+>* part as the empty string. Furthermore, the ] convention means that a zero-width input constituent (more precisely, a sequence of zero-width constituents, represented as a single | symbol) will often act as if it has an interior. Thus if V syncopates as in footnote 2, it still violates the parse constraint _V_ -> V. This is an explicit property of OTP: otherwise, nothing that failed to parse would ever violate Parse, because it would be gone! On the other hand, ] does not have this special role on the right-hand side of ->, which does not quantify universally over an interval. The consequence for zero-width constituents is that even if a zero-width _V_ overlaps (at the edge, say) with a surface V, the latter cannot claim on this basis alone to satisfy Fill: V -> _V_. This too seems like the right move linguistically, although further study is needed.]
(21) (p and q) -> (b or c[)
     [automaton diagram for this constraint, with arcs split per the scheme above; arc predicates include ]|+p [-q, [-p +p, ]|q ]|p ]|+q, +p +q []|-b ]+-c, +p +q +b, +p +q []|-b [|c, and +p +q]

How about other cases? If the antecedent of an implication is not an interval, then the constraint needs only one state, to penalize moments when the antecedent holds and the consequent does not. Finally, a clash constraint alpha_1 _|_ alpha_2 _|_ ... is identical to the implication constraint (alpha_1 and alpha_2 and ...) -> false. Clash FSAs are therefore just degenerate versions of implication FSAs, where the arcs looking for beta_j do not exist because they would accept no symbol. (22) shows the constraints (p and ]q) -> b and p _|_ q.

(22) [automaton diagrams for the two constraints, with arc predicates including []|-b +p [+-q, []|-b []|-p, +b []|-b +p ]|q, ]|+p [-q, [-p +p, ]|q ]|p ]|+q, and +p +q]

4 Computational requirements

4.1 Generalized Alignment is not finite-state

Ellison's method can succeed only on a restricted formalism such as OTP, which does not admit such constraints as the popular Generalized Alignment (GA) family of (McCarthy & Prince, 1993). A typical GA constraint is Align(F, L, Word, L), which sums the number of syllables between each left foot edge F[ and the left edge of the prosodic word. Minimizing this sum achieves a kind of left-to-right iterative footing. OTP argues that such non-local, arithmetic constraints can generally be eliminated in favor of simpler mechanisms (Eisner, in press).

Ellison's method cannot directly express the above GA constraint, even outside OTP, because it cannot compute a quadratic function 0 + 2 + 4 + ... on a string like [sigma sigma]F [sigma sigma]F [sigma sigma]F ....
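Concretely (a hypothetical tally, not the paper's code): with m two-syllable feet packed left to right, the foot starting at syllable offset 2i is charged 2i syllables, so the total is 0 + 2 + ... + 2(m-1) = m(m-1), quadratic in the length of the string.

```python
# The total that Align(F, L, Word, L) assigns to m two-syllable feet
# packed left to right: each left foot edge F[ is charged the number of
# syllables separating it from the left edge of the word.

def align_cost(m):
    return sum(2 * i for i in range(m))   # foot i starts after 2i syllables

print([align_cost(m) for m in (1, 2, 3, 4)])  # [0, 2, 6, 12] = m*(m-1)
```

No fixed weighted FSA can assign these totals to such strings, for the reason stated next.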
P ath w eigh ts in an FSA cannot b e more than linear
  • n
string length. P erhaps the ltering
  • p
eration
  • f
an y GA con- strain t can b e sim ulated with a system
  • f
nite- state constrain ts? No: GA is simply to
  • p
  • w
erful. The pro
  • f
is suppressed here for reasons
  • f
space, but it relies
  • n
a form
  • f
the pumping lemma for w eigh ted FSAs. The k ey insigh t is that among can- didates with a xed n um b er
  • f
syllables and a single (oating) tone, Align( ; L; H ; L) prefers candidates where the tone do c ks at the c enter. A similar argu- men t for w eigh ted CF Gs (using two tones) sho ws this constrain t to b e to
  • hard
ev en for (T esar, 1996). 4.2 Generation is NP-hard ev en in OTP When algorithm (14) is implemen ted literally and with mo derate care, using an
  • ptimizing
C compiler
  • n
a 167MHz UltraSP AR C, it tak es fully 3.5 min utes (real time) to disco v er a stress pattern for the syl- lable sequence ^^^|^^| |^^^. 7 The automata b ecome impractically h uge due to in tersections. Muc h
  • f
the explosion in this case is in tro duced at the start and can b e a v
  • ided.
Because Repns has 2 jTiers j = 512 states, S , S 1 , and S 2 eac h ha v e ab
  • ut
5000 states and 500,000 to 775,000 arcs. Thereafter the S_i automata become smaller, thanks to the pruning performed at each step by BestPaths. This repeated pruning is already an improvement over Ellison's original algorithm (which saves pruning till the end, and so continues to grow exponentially with every new constraint). If we modify (14) further, so that each tier rule from Repns is intersected with the candidate set only when its tier is first mentioned by a constraint, then the automata are pruned back as quickly as they grow. They have about 10 times fewer states and 100 times fewer arcs, and the generation time drops to 2.2 seconds.[7]

This is a key practical trick. But neither it nor any other trick can help for all grammars, for in the worst case, the OTP generation problem is NP-hard in the number of tiers used by the grammar. The locality of constraints does not save us here. Many NP-complete problems, such as graph coloring or bin packing, attempt to minimize some global count subject to numerous local restrictions. In the case of OTP generation, the global count to minimize is the degree of violation of C_i, and the local restrictions are imposed by C_1, C_2, ..., C_{i-1}.

Proof of NP-hardness (by polytime reduction from Hamilton Path). Given G = <V(G), E(G)>, an n-vertex directed graph, put Tiers = V(G) ∪ {Stem, S}. Consider the following vector of O(n^2) primitive constraints (ordered as shown):

(23) a. ∀v ∈ V(G): v[ → S[            ["vertices are S's"]
     b. ∀v ∈ V(G): ]v → ]S            ["vertices are S's"]
     c. ∀v ∈ V(G): Stem → v           ["at least one"]
     d. ∀u,v ∈ V(G): u ⊥ v            ["disjoint"]
     e. Stem ⊥ S                       ["no extra copies"]
     f. ∀u,v ∈ V(G) s.t. uv ∉ E(G): ]u ⊥ v[
     g. ]S → S[                        ["single path"]

[7] The grammar is taken from the OTP stress typology proposed by (Eisner, in press). It has tier rules for 9 tiers, and then spends 26 constraints on obvious universal properties of moras and syllables, followed by 6 constraints for universal properties of feet and stress marks, and finally 6 substantive constraints that can be freely reranked to yield different stress systems, such as left-to-right iambs with iambic lengthening.
Suppose the input is simply [Stem]. Filtering Gen(input) through constraints (23a-e), we are left with just those candidates where Stem bears n disjoint constituents of type S, each coextensive with a constituent bearing a different label v ∈ V(G). (These candidates satisfy (23a-d) but violate (23e) n times.) (23f) says that a chain of abutting constituents [u][v][w]... is allowed only if it corresponds to a path in G. Finally, (23g) forces the grammar to minimize the number of such chains. If the minimum is 1 (i.e., an arbitrarily selected output candidate violates (23g) only once), then G has a Hamilton path.

When confronted with this pathological case, the finite-state methods respond essentially by enumerating all possible permutations of V(G) (though with sharing of prefixes). The machine state stores, among other things, the subset of V(G) that has already been seen; so there are at least 2^|Tiers| states.

It must be emphasized that if the grammar is fixed in advance, algorithm (14) is close to linear in the size of the input form: it is dominated by a constant number of calls to Dijkstra's BestPaths method, each taking time O(|input arcs| log |input states|). There are nonetheless three reasons why the above result is important. (a) It raises the practical specter of huge constant factors (> 2^40) for real grammars. Even if a fixed grammar can somehow be compiled into a fast form for use with many inputs, the compilation itself will have to deal with this constant factor. (b) The result has the interesting implication that candidate sets can arise that cannot be concisely represented with FSAs. For if all S_i were polynomial-sized in (14), the algorithm would run in polynomial time. (c) Finally, the grammar is not fixed in all circumstances: both linguists and children crucially experiment with different theories.

4.3 Work in progress: Factored automata

The previous section gave a useful trick for speeding up Ellison's algorithm in the typical case. We are currently experimenting with additional improvements along the same lines, which attempt to defer intersection by keeping tiers separate as long as possible. The idea is to represent the candidate set S_i not as a large unweighted FSA, but rather as a collection A of preferably small unweighted FSAs, called factors, each of which mentions as few tiers as possible. This collection, called a factored automaton, serves as a compact representation of ∩A. It usually has far fewer states than ∩A would if the intersection were carried out. For instance, the natural factors of S_0 are input and all the tier rules (see 18). This requires only O(|Tiers| + |input|) states, not O(2^|Tiers| · |input|).

Using factored automata helps Ellison's algorithm (14) in several ways:
• The candidate sets S_i tend to be represented more compactly.

• In (14), the constraint C_{i+1} needs to be intersected with only certain factors of S_i.

• Sometimes C_{i+1} does not need to be intersected with the input, because they do not mention any of the same tiers. Then step i+1 can be performed in time independent of input length.

Example: input = ^^^|^^||^^^, which is a 43-state automaton, and C_1 is F → x, which says that every foot bears a stress mark. Then to find S_1 = BestPaths(S_0 ∩ C_1), we need only consider S_0's tier rules for F and x, which require well-formed feet and well-formed stress marks, and combine them with C_1 to get a new factor that requires stressed feet. No other factors need be involved.

The key operation in (14) is to find BestPaths(A ∩ C), where A is an unweighted factored automaton representing a candidate set, and C is an ordinary weighted FSA (a constraint). This is the best intersection problem. For concreteness, let us suppose that C encodes F → x, a two-state constraint.

A naive idea is simply to add F → x to A as a new factor. However, this ignores the BestPaths step: we wish to keep just the best paths in F → x that are compatible with A. Such paths might be long and include unrolled cycles in F → x. For example, a weight-1 path would describe a chain of optimal stressed feet interrupted by a single unstressed one where A happens to block stress.

A corrected variant is to put I = ∩A and run BestPaths on I ∩ C. Let the pruned result be B. We could add B directly back to A as a new factor, but it is large. We would rather add a smaller factor B' that has the same effect, in that I ∩ B' = I ∩ B. (B' will look something like the original C, but with some paths missing, some states split, and some cycles unrolled.) Observe that each state of B has the form <i, c> for some i ∈ I and c ∈ C. We form B' from B by "re-merging" states <i, c> and <i', c> where possible, using an approach similar to DFA minimization.
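The best-intersection primitive can be illustrated in miniature. The sketch below computes the weight of the cheapest accepting path in A ∩ C by running Dijkstra's algorithm over a lazily built product automaton, which is the quantity BestPaths optimizes before pruning. The automaton encoding and the example machines are hypothetical, not the paper's implementation.

```python
import heapq

def best_path_weight(a, c):
    """Cheapest accepting path in the intersection A ∩ C, via Dijkstra
    on the lazily built product automaton. Each automaton is a triple
    (initial, finals, arcs) with arcs = {state: [(symbol, weight, next)]};
    an unweighted candidate automaton A simply carries all-zero weights."""
    (a0, af, aarcs), (c0, cf, carcs) = a, c
    dist = {(a0, c0): 0}
    heap = [(0, (a0, c0))]
    while heap:
        d, (p, q) = heapq.heappop(heap)
        if d > dist[(p, q)]:
            continue                      # stale queue entry
        if p in af and q in cf:
            return d                      # first accepting pop is optimal
        for sa, wa, pn in aarcs.get(p, []):
            for sc, wc, qn in carcs.get(q, []):
                if sa == sc:              # product arcs must agree on the symbol
                    nd = d + wa + wc
                    if nd < dist.get((pn, qn), float("inf")):
                        dist[(pn, qn)] = nd
                        heapq.heappush(heap, (nd, (pn, qn)))
    return None                           # A ∩ C is empty

# A: accepts 3-foot strings over {"f" (stressed), "u" (unstressed)}
# but blocks stress in the second position.
A = (0, {3}, {0: [("f", 0, 1), ("u", 0, 1)],
              1: [("u", 0, 2)],
              2: [("f", 0, 3), ("u", 0, 3)]})
# C: "every foot bears a stress mark" -- weight 1 per unstressed foot.
C = (0, {0}, {0: [("f", 0, 0), ("u", 1, 0)]})
best = best_path_weight(A, C)             # path f·u·f: exactly one violation
```

Pruning B back to the arcs that lie on such minimum-weight paths, and then re-merging product states as described above, is the part the full algorithm adds on top of this core.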
Of course, this variant is not very efficient, because it requires us to find and use I = ∩A. What we really want is to follow the above idea but use a smaller I, one that considers just the relevant factors in A. We need not consider factors that will not affect the choice of paths in C above. Various approaches are possible for choosing such an I. The following technique is completely general, though it may or may not be practical.

Observe that for BestPaths to do the correct thing, I needs to reflect the sum total of A's constraints on F and x, the tiers that C mentions. More formally, we want I to be the projection of the candidate set ∩A onto just the F and x tiers. Unfortunately, these constraints are not just reflected in the factors mentioning F or x, since the allowed configurations of F and x may be mediated through additional factors. As an example, there may be a factor mentioning F and σ, some of whose paths are incompatible with the input factor, because the latter allows σ only in certain places or because it only allows paths of length 14.
length 14. 1. Num b er the tiers suc h that F and x are n um- b ered 0, and all
  • ther
tiers ha v e distinct p
  • sitiv
e n um b ers. 2. P artition the factors
  • f
A in to lists L ; L 1 ; L 2 ; : : : L k , according to the highest-n um b ered tier they men tion. (An y factor that men tions no tiers at all go es
  • n
to L .) 3. If k = 0, then return \L k as
  • ur
desired I . 4. Otherwise, \L k exhausts tier k 's abilit y to me- diate relations among the factors. Mo dify the arc lab els
  • f
\L k so that they no longer restrict (men tion) k . Then add a determinized, mini- mized v ersion
  • f
the result to to L j , where j is the highest-n um b ered tier it no w men tions. 5. Decremen t k and return to step 3. If \A has k factors, this tec hnique m ust p er- form k
  • 1
in tersections, just as if w e had put I = \A . Ho w ev er, it in tersp erses the in tersections with determinization and minim izatio n
  • p
erations, so that the automata b eing in tersected tend not to b e large. In the b est case, w e will ha v e k
  • 1
in tersection-determinization-minim i zations that cost O (1) apiece, rather than k
  • 1
intersection-determinization-minimizations that cost O(1) apiece, rather than k − 1 intersections that cost up to O(2^k) apiece.

5 Conclusions

Primitive Optimality Theory, or OTP, is an attempt to produce a simple, rigorous, constraint-based model of phonology that is closely fitted to the needs of working linguists. I believe it is worth study both as a hypothesis about Universal Grammar and as a formal object. The present paper introduces the OTP formalization to the computational linguistics community. We have seen two formal results of interest, both having to do with generation of surface forms:
• OTP's generative power is low: finite-state optimization. In particular, it is more constrained than theories using Generalized Alignment. This is good news for comprehension and learning.

• OTP's computational complexity, for generation, is nonetheless high: NP-hard in the size of the grammar. This is mildly unfortunate for OTP and for the OT approach in general. It remains true that for a fixed grammar, the time to do generation is close to linear in the size of the input (Ellison, 1994), which is heartening if we intend to optimize long utterances with respect to a fixed phonology.

Finally, we have considered the prospect of building a practical tool to generate optimal outputs from OT theories. We saw above how to set up the representations and constraints efficiently using deterministic finite-state automata, and how to remedy some hidden inefficiencies in the seminal work of (Ellison, 1994), achieving at least a 100-fold observed speedup. Delayed intersection and aggressive pruning prove to be important. Aggressive minimization and a more compact, "factored" representation of automata may also turn out to help.

References

Bird, Steven, & T. Mark Ellison. 1994. One-Level Phonology: Autosegmental representations and rules as finite automata. Computational Linguistics 20:55-90.

Cole, Jennifer, & Charles Kisseberth. 1994. An optimal domains theory of harmony. Studies in the Linguistic Sciences 24:2.

Eisner, Jason. In press. Decomposing FootForm: Primitive constraints in OT. Proceedings of SCIL 8, NYU. Published by MIT Working Papers. (Available at http://ruccs.rutgers.edu/roa.html.)

Eisner, Jason. What constraints should OT allow? Handout for talk at LSA, Chicago. (Available at http://ruccs.rutgers.edu/roa.html.)

Ellison, T. Mark. 1994. Phonological derivation in optimality theory. COLING '94, 1007-1013.

Goldsmith, John. 1976. Autosegmental phonology. Cambridge, Mass.: MIT Ph.D. dissertation. Published 1979 by New York: Garland Press.

Goldsmith, John. 1990. Autosegmental and metrical phonology. Oxford: Blackwell Publishers.

McCarthy, John, & Alan Prince. 1993. Generalized alignment. Yearbook of Morphology, ed. Geert Booij & Jaap van Marle, pp. 79-153. Kluwer.

McCarthy, John, & Alan Prince. 1995. Faithfulness and reduplicative identity. In Jill Beckman et al., eds., Papers in Optimality Theory. UMass, Amherst: GLSA. 259-384.

Prince, Alan, & Paul Smolensky. 1993. Optimality theory: Constraint interaction in generative grammar. Technical Reports of the Rutgers University Center for Cognitive Science.

Selkirk, Elisabeth. 1980. Prosodic domains in phonology: Sanskrit revisited. In Mark Aronoff and Mary-Louise Kean, eds., Juncture, pp. 107-129. Anma Libri, Saratoga, CA.

Tesar, Bruce. 1995. Computational Optimality Theory. Ph.D. dissertation, U. of Colorado, Boulder.

Tesar, Bruce. 1996. Computing optimal descriptions for Optimality Theory: Grammars with context-free position structures. Proceedings of the 34th Annual Meeting of the ACL.