Proc. of the 37th ACL (Assoc. for Computational Linguistics) (1999)

Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars

Jason Eisner
Dept. of Computer & Information Science
University of Pennsylvania
200 South 33rd Street, Philadelphia, PA 19104 USA
jeisner@linc.cis.upenn.edu

Giorgio Satta
Dip. di Elettronica e Informatica
Università di Padova
via Gradenigo 6/A, 35131 Padova, Italy
satta@dei.unipd.it

Abstract

Several recent stochastic parsers use bilexical grammars, where each word type idiosyncratically prefers particular complements with particular head words. We present O(n^4) parsing algorithms for two bilexical formalisms, improving the prior upper bounds of O(n^5). For a common special case that was known to allow O(n^3) parsing (Eisner, 1997), we present an O(n^3) algorithm with an improved grammar constant.

1 Introduction

Lexicalized grammar formalisms are
of both theoretical and practical interest to the computational linguistics community. Such formalisms specify syntactic facts about each word of the language: in particular, the type of arguments that the word can or must take. Early mechanisms of this sort included categorial grammar (Bar-Hillel, 1953) and subcategorization frames (Chomsky, 1965). Other lexicalized formalisms include (Schabes et al., 1988; Mel'čuk, 1988; Pollard and Sag, 1994).

Besides the possible arguments of a word, a natural-language grammar does well to specify possible head words for those arguments. "Convene" requires an NP object, but some NPs are more semantically or lexically appropriate here than others, and the appropriateness depends largely on the NP's head (e.g., "meeting"). We use the general term bilexical for a grammar that records such facts. A bilexical grammar makes many stipulations about the compatibility of particular pairs of words in particular roles. The acceptability of "Nora convened the party" then depends on the grammar writer's assessment of whether parties can be convened.

[* The authors were supported respectively under ARPA Grant N6600194-C-6043 "Human Language Technology" and Ministero dell'Università e della Ricerca Scientifica e Tecnologica project "Methodologies and Tools of High Performance Systems for Multimedia Applications."]

Several recent real-world parsers have improved state-of-the-art parsing accuracy by relying on probabilistic or weighted versions of bilexical grammars (Alshawi, 1996; Eisner, 1996; Charniak, 1997; Collins, 1997). The rationale is that soft selectional restrictions play a crucial role in disambiguation.[1]

The chart parsing algorithms used by most of the above authors run in time O(n^5), because bilexical grammars are enormous (the part of the grammar relevant to a length-n input has size O(n^2) in practice). Heavy probabilistic pruning is therefore needed to get acceptable runtimes. But in this paper we show that the complexity is not so bad after all:
• For bilexicalized context-free grammars, O(n^4) is possible.
• The O(n^4) result also holds for head automaton grammars.
• For a very common special case of these grammars where an O(n^3) algorithm was previously known (Eisner, 1997), the grammar constant can be reduced without harming the O(n^3) property.

Our algorithmic technique throughout is to propose new kinds of subderivations that are not constituents. We use dynamic programming to assemble such subderivations into a full parse.

2 Notation for context-free grammars

The reader is assumed to be familiar with context-free grammars. Our notation follows (Harrison, 1978; Hopcroft and Ullman, 1979).

[1] Other relevant parsers simultaneously consider two or more words that are not necessarily in a dependency relationship (Lafferty et al., 1992; Magerman, 1995; Collins and Brooks, 1995; Chelba and Jelinek, 1998).

A context-free grammar (CFG) is a tuple G = (V_N, V_T, P, S), where V_N and V_T are finite, disjoint sets
of nonterminal and terminal symbols, respectively, and S ∈ V_N is the start symbol. Set P is a finite set of productions having the form A → α, where A ∈ V_N, α ∈ (V_N ∪ V_T)*. If every production in P has the form A → B C or A → a, for A, B, C ∈ V_N, a ∈ V_T, then the grammar is said to be in Chomsky Normal Form (CNF).[2] Every language that can be generated by a CFG can also be generated by a CFG in CNF.

In this paper we adopt the following conventions: a, b, c, d denote symbols in V_T; w, x, y denote strings in V_T*; and α, β, ... denote strings in (V_N ∪ V_T)*. The input to the parser will be a CFG G together with a string of terminal symbols to be parsed, w = d_1 d_2 ... d_n. Also h, i, j, k denote positive integers, which are assumed to be ≤ n when we are treating them as indices into w. We write w_{i,j} for the input substring d_i ... d_j (and put w_{i,j} = ε for i > j).

A "derives" relation, written ⇒, is associated with a CFG as usual. We also use the reflexive and transitive closure of ⇒, written ⇒*, and define L(G) accordingly. We write a derivation step with an underbrace marking the rewritten substring, for a derivation in which only that substring is rewritten.

3 Bilexical context-free grammars

We introduce next a grammar formalism that captures lexical dependencies among pairs
of words in V_T. This formalism closely resembles stochastic grammatical formalisms that are used in several existing natural language processing systems (see §1). We will specify a non-stochastic version, noting that probabilities or other weights may be attached to the rewrite rules exactly as in stochastic CFG (Gonzales and Thomason, 1978; Wetherell, 1980). (See §4 for brief discussion.)

Suppose G = (V_N, V_T, P, T[$]) is a CFG in CNF.[3] We say that G is bilexical iff there exists a set of "delexicalized nonterminals" V_D such that V_N = {A[a] : A ∈ V_D, a ∈ V_T} and every production in P has one of the following forms:

  A[a] → B[b] C[a]   (1)
  A[a] → C[a] B[b]   (2)
  A[a] → a           (3)

[2] Production S → ε is also allowed in a CNF grammar if S never appears on the right side of any production. However, S → ε is not allowed in our bilexical CFGs.
[3] We have a more general definition that drops the restriction to CNF, but do not give it here.

Thus every nonterminal is lexicalized at some terminal a. A constituent of nonterminal type A[a] is said to have terminal symbol a as its lexical head, "inherited" from the constituent's head child in the parse tree (e.g., C[a]). Notice that the start symbol is necessarily a lexicalized nonterminal, T[$]. Hence $ appears in every string of L(G); it is usually convenient to define G so that the language of interest is actually {x : x$ ∈ L(G)}.

Such a grammar can encode lexically specific preferences. For example, P might contain the productions

  VP[solve] → V[solve] NP[puzzles]
  NP[puzzles] → DET[two] N[puzzles]
  V[solve] → solve
  N[puzzles] → puzzles
  DET[two] → two

in order to allow the derivation VP[solve] ⇒* solve two puzzles, but meanwhile omit the similar productions

  VP[eat] → V[eat] NP[puzzles]
  VP[solve] → V[solve] NP[goat]
  VP[sleep] → V[sleep] NP[goat]
  NP[goat] → DET[two] N[goat]

since puzzles are not edible, a goat is not solvable, "sleep" is intransitive, and "goat" cannot take plural determiners. (A stochastic version of the grammar could implement "soft preferences" by allowing the rules in the second group but assigning them various low probabilities.)

The cost of this expressiveness is a very large grammar. Standard context-free parsing algorithms are inefficient in such a case. The CKY algorithm (Younger, 1967; Aho and Ullman, 1972) is time O(n^3 · |P|), where in the worst case |P| = |V_N|^3 (one ignores unary productions). For a bilexical grammar, the worst case is |P| = |V_D|^3 · |V_T|^2, which is large for a large vocabulary V_T. We may improve the analysis somewhat by observing that when parsing d_1 ... d_n, the CKY algorithm only considers nonterminals of the form A[d_i]; by restricting to the relevant productions we obtain O(n^3 · |V_D|^3 · min(n, |V_T|)^2).
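The example grammar just given is small enough to run. A minimal sketch (the Python encoding and the recursive recognizer are ours; the productions are the ones listed above):

```python
# The example bilexical productions, in CNF. Nonterminals are pairs (A, a):
# delexicalized symbol A lexicalized at head word a.
binary = {
    (('VP', 'solve'), (('V', 'solve'), ('NP', 'puzzles'))),
    (('NP', 'puzzles'), (('DET', 'two'), ('N', 'puzzles'))),
}
unary = {
    ('V', 'solve'): 'solve',
    ('N', 'puzzles'): 'puzzles',
    ('DET', 'two'): 'two',
}

def derives(sym, words):
    """True iff the lexicalized nonterminal sym derives exactly this word list."""
    if len(words) == 1:
        return unary.get(sym) == words[0]
    for lhs, (b, c) in binary:
        if lhs != sym:
            continue
        for split in range(1, len(words)):
            if derives(b, words[:split]) and derives(c, words[split:]):
                return True
    return False
```

Here derives(('VP', 'solve'), ['solve', 'two', 'puzzles']) succeeds, while the "goat" variants fail simply because their productions are absent.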
We observe that in practical applications we always have n ≪ |V_T|. Let us then restrict our analysis to the (infinite) set of input instances of the parsing problem that satisfy the relation n < |V_T|. With this assumption, the asymptotic time complexity of the CKY algorithm becomes O(n^5 · |V_D|^3). In other words, it is a factor of n^2 slower than a comparable non-lexicalized CFG.

4 Bilexical CFG in time O(n^4)

In this section we give a recognition algorithm for bilexical CNF context-free grammars, which runs in time O(n^4 · max(p, |V_D|^2)) = O(n^4 · |V_D|^3). Here p is the maximum number of productions sharing the same pair of terminal symbols (e.g., the pair (b, a) in production (1)). The new algorithm is asymptotically more efficient than the CKY algorithm when restricted to input instances satisfying the relation n < |V_T|.

Where CKY recognizes only constituent substrings of the input, the new algorithm can recognize three types of subderivations, shown and described in Figure 1(a). A declarative specification of the algorithm is given in Figure 1(b). The derivability conditions of (a) are guaranteed by (b), by induction, and the correctness of the acceptance condition (see caption) follows.

This declarative specification, like CKY, may be implemented by bottom-up dynamic programming. We sketch one such method. For each possible item, as shown in (a), we maintain a bit (indexed by the parameters of the item) that records whether the item has been derived yet. All these bits are initially zero. The algorithm makes a single pass through the possible items, setting the bit for each if it can be derived using any rule in (b) from items whose bits are already set. At the end of this pass it is straightforward to test whether to accept w (see caption). The pass considers the items in increasing order of width, where the width of an item in (a) is defined as max{h, i, j} - min{h, i, j}. Among items of the same width, those of type △ should be considered last.

The algorithm requires space proportional to the number of possible items, which is at most n^3 |V_D|^2. Each of the five rule templates can instantiate its free variables in at most n^4 p or (for Complete rules) n^4 |V_D|^2 different ways, each of which is tested once and in constant time; so the runtime is O(n^4 max(p, |V_D|^2)).

By comparison, the CKY algorithm uses only the first type of item, and relies on rules whose inputs are pairs of constituents [△, B, i, h, j] and [△, C, j+1, h′, k]. Such rules can be instantiated in O(n^5) different ways for a fixed grammar, yielding O(n^5) time complexity. The new algorithm saves a factor of n by combining those two constituents in two steps, one of which is insensitive to k and abstracts over its possible values, the other of which is insensitive to h and abstracts over its possible values.

It is straightforward to turn the new O(n^4) recognition algorithm into a parser for stochastic bilexical CFGs (or other weighted bilexical CFGs). In a stochastic CFG, each nonterminal A[a] is accompanied by a probability distribution over productions of the form A[a] → β. A parse is just a derivation (proof tree) of [△, T, 1, h, n], and its probability, like that of any derivation we find, is defined as the product of the probabilities of all productions used to condition inference rules in the proof tree. The highest-probability derivation for any item can be reconstructed recursively at the end of the parse, provided that each item maintains not only a bit indicating whether it can be derived, but also the probability and instantiated root rule of its highest-probability derivation tree.

5 A more efficient variant

We now give a variant of the algorithm of §4; the variant has the same asymptotic complexity but will often be faster in practice.

Notice that the Attach-Left rule of Figure 1(b) tries to combine the nonterminal label B[d_h] of a previously derived constituent with every possible nonterminal label of the form C[d_{h′}]. The improved version, shown in Figure 2, restricts C[d_{h′}] to be the label of a previously derived adjacent constituent. This improves speed if there are not many such constituents and we can enumerate them in O(1) time apiece (using a sparse parse table to store the derived items).

It is necessary to use an agenda data structure (Kay, 1986) when implementing the declarative algorithm of Figure 2. Deriving narrower items before wider ones as before will not work here, because the rule Halve derives narrow items from wide ones.
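The five deduction rules of Figure 1 can be read directly as code. Below is a minimal sketch that closes the rule set naively to a fixpoint rather than making the single width-ordered pass described above (so it is correct but, as written, does not attain the O(n^4) bound); the toy grammar and its tuple encodings are ours, with $ as the distinguished start word:

```python
# Toy bilexical CNF grammar.
UNARY = {('V', 'solve'), ('N', 'puzzles'), ('D', '$')}   # A[a] -> a
# (A, B, b, C, a): A[a] -> B[b] C[a]  (head word a on the right)
LEFT  = {('T', 'VP', 'solve', 'D', '$')}
# (A, C, a, B, b): A[a] -> C[a] B[b]  (head word a on the left)
RIGHT = {('VP', 'V', 'solve', 'N', 'puzzles')}

def recognize(words, start='T'):
    n = len(words)
    tri, ltrap, rtrap = set(), set(), set()       # the three item types of Fig. 1
    for h, a in enumerate(words, 1):              # Start rule
        tri |= {(A, h, h, h) for (A, x) in UNARY if x == a}
    changed = True
    while changed:
        changed = False
        new_t, new_l, new_r = set(tri), set(ltrap), set(rtrap)
        for (B, i, hp, j) in tri:
            # Attach-Left: constituent B[d_hp] starts a right trapezoid
            for (A, B2, b, C, a) in LEFT:
                if B2 == B and b == words[hp - 1]:
                    for h in range(j + 1, n + 1):
                        if words[h - 1] == a:
                            new_r.add((A, i, j, C, h))
            # Attach-Right: constituent B[d_hp] starts a left trapezoid
            for (A, C, a, B2, b) in RIGHT:
                if B2 == B and b == words[hp - 1]:
                    for h in range(1, i):
                        if words[h - 1] == a:
                            new_l.add((A, i, j, C, h))
        for (A, i, j, C, h) in rtrap:             # Complete-Right
            for (C2, i2, h2, k) in tri:
                if C2 == C and i2 == j + 1 and h2 == h:
                    new_t.add((A, i, h, k))
        for (A, j, k, C, h) in ltrap:             # Complete-Left
            for (C2, i, h2, j2) in tri:
                if C2 == C and h2 == h and j2 == j - 1:
                    new_t.add((A, i, h, k))
        if (new_t, new_l, new_r) != (tri, ltrap, rtrap):
            tri, ltrap, rtrap = new_t, new_l, new_r
            changed = True
    return any((start, 1, h, n) in tri and words[h - 1] == '$'
               for h in range(1, n + 1))
```

On this grammar, "solve puzzles $" is accepted via one left and one right trapezoid, while "puzzles solve $" derives no trapezoid at all.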
Figure 1: An O(n^4) recognition algorithm for CNF bilexical CFG.

(a) Types of items in the parse table (chart). The first is syntactic sugar for the tuple [△, A, i, h, j], and so on. The stated conditions assume that d_1, ..., d_n are all distinct.

• Triangle [△, A, i, h, j] (i ≤ h ≤ j, A ∈ V_D): derived iff A[d_h] ⇒* w_{i,j}.
• Right trapezoid [◺, A, i, j, C, h] (i ≤ j < h; A, C ∈ V_D): derived iff A[d_h] ⇒ B[d_{h′}] C[d_h] ⇒* w_{i,j} C[d_h] for some B, h′.
• Left trapezoid [◿, A, i, j, C, h] (h < i ≤ j; A, C ∈ V_D): derived iff A[d_h] ⇒ C[d_h] B[d_{h′}] ⇒* C[d_h] w_{i,j} for some B, h′.

(b) Inference rules, written antecedents ⊢ consequent, with any conditioning production on the right. The algorithm derives the consequent if the antecedents have already been derived and any condition is met. It accepts input w just if item [△, T, 1, h, n] is derived for some h such that d_h = $.

  Start:           ⊢ [△, A, h, h, h]                              if A[d_h] → d_h
  Attach-Left:     [△, B, i, h, j] ⊢ [◺, A, i, j, C, h′]          if A[d_{h′}] → B[d_h] C[d_{h′}], j < h′
  Complete-Right:  [◺, A, i, j, C, h], [△, C, j+1, h, k] ⊢ [△, A, i, h, k]
  Attach-Right:    [△, B, i, h, j] ⊢ [◿, A, i, j, C, h′]          if A[d_{h′}] → C[d_{h′}] B[d_h], h′ < i
  Complete-Left:   [△, C, i, h, j-1], [◿, A, j, k, C, h] ⊢ [△, A, i, h, k]

Figure 2: A more efficient variant of the O(n^4) algorithm in Figure 1, in the same format.

(a) The triangle item as in Figure 1, plus:

• Left half-triangle [◣, A, i, h] (i ≤ h, A ∈ V_D): derived iff A[d_h] ⇒* w_{i,j} for some j ≥ h.
• Right half-triangle [◢, A, h, j] (h ≤ j, A ∈ V_D): derived iff A[d_h] ⇒* w_{i,j} for some i ≤ h.
• Right trapezoid [◺, A, i, j, C, h] (i ≤ j < h; A, C ∈ V_D): derived iff A[d_h] ⇒ B[d_{h′}] C[d_h] ⇒* w_{i,j} C[d_h] ⇒* w_{i,k} for some B, h′, k.
• Left trapezoid [◿, A, i, j, C, h] (h < i ≤ j; A, C ∈ V_D): derived iff A[d_h] ⇒ C[d_h] B[d_{h′}] ⇒* C[d_h] w_{i,j} ⇒* w_{k,j} for some B, h′, k.

(b) As in Figure 1(b) above, but add Halve and change Attach-Left and Attach-Right as shown.

  Halve:         [△, A, i, h, j] ⊢ [◣, A, i, h] and [◢, A, h, j]
  Attach-Left:   [△, B, i, h, j], [◣, C, j+1, h′] ⊢ [◺, A, i, j, C, h′]    if A[d_{h′}] → B[d_h] C[d_{h′}]
  Attach-Right:  [◢, C, h′, j-1], [△, B, j, h, k] ⊢ [◿, A, j, k, C, h′]    if A[d_{h′}] → C[d_{h′}] B[d_h]
6 Multiple word senses

Rather than parsing an input string directly, it is often desirable to parse another string related by a (possibly stochastic) transduction. Let T be a finite-state transducer that maps a morpheme sequence w ∈ V_T* to its orthographic realization, a grapheme sequence w̄. T may realize arbitrary morphological processes, including affixation, local clitic movement, deletion of phonological nulls, forbidden or dispreferred k-grams, typographical errors, and mapping of multiple senses onto the same grapheme. Given grammar G and an input w̄, we ask whether w̄ ∈ T(L(G)). We have extended all the algorithms in this paper to this case: the items simply keep track of the transducer state as well. Due to space constraints, we sketch only the special case of multiple senses.

Suppose that the input is w̄ = d̄_1 ... d̄_n, and each d̄_i has up to g possible senses. Each item now needs to track its head's sense along with its head's position in w̄. Wherever an item formerly recorded a head position h (similarly h′), it must now record a pair (h, d_h), where d_h ∈ V_T is a specific sense of d̄_h. No rule in Figures 1-2 (or Figure 3 below) will mention more than two such pairs. So the time complexity increases by a factor of O(g^2).

7 Head automaton grammars in time O(n^4)

In this section we show that a length-n string generated by a head automaton grammar (Alshawi, 1996) can be parsed in time O(n^4). We do this by providing a translation from head automaton grammars to bilexical CFGs.[4] This result improves on the head-automaton parsing algorithm given by Alshawi, which is analogous to the CKY algorithm on bilexical CFGs and is likewise O(n^5) in practice (see §3).

[4] Translation in the other direction is possible if the HAG formalism is extended to allow multiple senses per word (see §6). This makes the formalisms equivalent.

A head automaton grammar (HAG) is a function H : a ↦ H_a that defines a head automaton (HA) for each element of its (finite) domain. Let V_T = domain(H) and D = {→, ←}. A special symbol $ ∈ V_T plays the role of start symbol. For each a ∈ V_T, H_a is a tuple (Q_a, V_T, δ_a, I_a, F_a), where

• Q_a is a finite set of states;
• I_a, F_a ⊆ Q_a are sets of initial and final states, respectively;
• δ_a is a transition function mapping Q_a × V_T × D to 2^{Q_a}, the power set of Q_a.

A single head automaton is an acceptor for a language of string pairs ⟨z_l, z_r⟩ ∈ V_T* × V_T*. Informally, if b is the leftmost symbol of z_r and q′ ∈ δ_a(q, b, →), then H_a can move from state q to state q′, matching symbol b and removing it from the left end of z_r. Symmetrically, if b is the rightmost symbol of z_l and q′ ∈ δ_a(q, b, ←), then from q H_a can move to q′, matching symbol b and removing it from the right end of z_l.[5]

More formally, we associate with the head automaton H_a a "derives" relation ⊢_a, defined as a binary relation on Q_a × V_T* × V_T*. For every q ∈ Q_a, x, y ∈ V_T*, b ∈ V_T, d ∈ D, and q′ ∈ δ_a(q, b, d), we specify that

  (q, xb, y) ⊢_a (q′, x, y) if d = ←;
  (q, x, by) ⊢_a (q′, x, y) if d = →.

The reflexive and transitive closure of ⊢_a is written ⊢_a*. The language generated by H_a is the set

  L(H_a) = {⟨z_l, z_r⟩ | (q, z_l, z_r) ⊢_a* (r, ε, ε), q ∈ I_a, r ∈ F_a}.

We may now define the language generated by the entire grammar H. To generate, we expand the start word $ ∈ V_T into x$y for some ⟨x, y⟩ ∈ L(H_$), and then recursively expand the words in strings x and y. More formally, given H, we simultaneously define L_a for all a ∈ V_T to be minimal such that if ⟨x, y⟩ ∈ L(H_a), x′ ∈ L_x, and y′ ∈ L_y, then x′ a y′ ∈ L_a, where L_{a_1 ... a_k} stands for the concatenation language L_{a_1} ... L_{a_k}. Then H generates language L_$.

We next present a simple construction that transforms a HAG H into a bilexical CFG G generating the same language. The construction also preserves derivation ambiguity. This means that for each string w, there is a linear-time 1-to-1 mapping between (appropriately defined) canonical derivations of w by H and canonical derivations of w by G.

[5] Alshawi (1996) describes HAs as accepting (or equivalently, generating) z_l and z_r from the outside in. To make Figure 3 easier to follow, we have defined HAs as accepting symbols in the opposite order, from the inside out. This amounts to the same thing if transitions are reversed, I_a is exchanged with F_a, and any transition probabilities are replaced by those of the reversed Markov chain.
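The "derives" relation just defined can be turned into a tiny acceptor. A minimal sketch, assuming a dict-based transition table; the toy automaton itself is invented for illustration:

```python
# A toy head automaton H_a in the paper's notation (Q_a, V_T, delta_a, I_a, F_a).
# delta maps (state, symbol, direction) -> set of successor states; this toy
# automaton accepts exactly the pairs <z_l, z_r> with z_l in c* and z_r in b*.
Q = {0, 1}
delta = {
    (0, 'b', '->'): {0},   # consume b from the left end of z_r
    (0, 'c', '<-'): {1},   # consume c from the right end of z_l
    (1, 'c', '<-'): {1},
}
I, F = {0}, {0, 1}

def ha_derives(q, zl, zr):
    """True iff (q, zl, zr) |-_a* (r, eps, eps) for some final state r."""
    if not zl and not zr:
        return q in F
    if zl:  # a '<-' transition eats the rightmost symbol of z_l
        for r in delta.get((q, zl[-1], '<-'), ()):
            if ha_derives(r, zl[:-1], zr):
                return True
    if zr:  # a '->' transition eats the leftmost symbol of z_r
        for r in delta.get((q, zr[0], '->'), ()):
            if ha_derives(r, zl, zr[1:]):
                return True
    return False

def accepts(zl, zr):
    """<z_l, z_r> is in L(H_a) iff some initial state derives (eps, eps)."""
    return any(ha_derives(q, zl, zr) for q in I)
```

For example, accepts(('c', 'c'), ('b',)) holds, while accepts((), ('c',)) does not.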
We adopt the notation above for H and the components of its head automata. Let V_D be an arbitrary set of size t = max{|Q_a| : a ∈ V_T}, and for each a, define an arbitrary injection f_a : Q_a → V_D. We define G = (V_N, V_T, P, T[$]), where

(i) V_N = {A[a] : A ∈ V_D, a ∈ V_T}, in the usual manner for bilexical CFG;

(ii) P is the set of all productions having one of the following forms, where a, b ∈ V_T:

  A[a] → B[b] C[a]  where A = f_a(r), B = f_b(q), C = f_a(q′) for some q ∈ I_b, q′ ∈ Q_a, r ∈ δ_a(q′, b, ←)
  A[a] → C[a] B[b]  where A = f_a(r), B = f_b(q), C = f_a(q′) for some q ∈ I_b, q′ ∈ Q_a, r ∈ δ_a(q′, b, →)
  A[a] → a          where A = f_a(q) for some q ∈ F_a

(iii) T = f_$(q), where we assume WLOG that I_$ is a singleton set {q}.

We omit the formal proof that G and H admit isomorphic derivations and hence generate the same languages, observing only that if ⟨x, y⟩ = ⟨b_1 b_2 ... b_j, b_{j+1} ... b_k⟩ ∈ L(H_a) (a condition used in defining L_a above), then A[a] ⇒* B_1[b_1] ... B_j[b_j] a B_{j+1}[b_{j+1}] ... B_k[b_k], for any A, B_1, ..., B_k that map to initial states in H_a, H_{b_1}, ..., H_{b_k} respectively.

In general, G has p = O(|V_D|^3) = O(t^3). The construction therefore implies that we can parse a length-n sentence under H in time O(n^4 t^3). If the HAs in H happen to be deterministic, then in each binary production given by (ii) above, symbol A is fully determined by a, b, and C. In this case p = O(t^2), so the parser will operate in time O(n^4 t^2).

We note that this construction can be straightforwardly extended to convert stochastic HAGs as in (Alshawi, 1996) into stochastic CFGs. Probabilities that H_a assigns to state q's various transition and halt actions are copied onto the corresponding productions of G with left-hand side A[a], where A = f_a(q).

8 Split head automaton grammars in time O(n^3)

For many bilexical CFGs or HAGs of practical significance, just as for the bilexical version of link grammars (Lafferty et al., 1992), it is possible to parse length-n inputs even faster, in time O(n^3) (Eisner, 1997). In this section we describe and discuss this special case, and give a new O(n^3) algorithm that has a smaller grammar constant than previously reported.
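The HAG-to-CFG construction of §7 also translates directly into code. A minimal sketch, assuming a dict-based transition table and taking the injection f_a to be the pairing of a state with its word (these encodings are ours, not the paper's):

```python
# Compile one head automaton H_a = (Q, V_T, delta, I_a, F_a) into the bilexical
# productions of forms (1)-(3) given by part (ii) of the construction.
# Nonterminals are pairs (state, word), i.e. f_a(q) is taken to be q itself.
def ha_to_productions(a, Q, delta, I, F, lexicon):
    """a: the head word; I: dict word -> initial states (so I[b] is I_b);
    lexicon: the terminal vocabulary V_T."""
    prods = []
    for q in F:                                   # A[a] -> a, for A = f_a(q), q final
        prods.append(((q, a), (a,)))
    for b in lexicon:
        for q_b in I.get(b, set()):               # q in I_b
            for qp in Q:                          # q' in Q_a
                for r in delta.get((qp, b, '<-'), ()):
                    # left dependent b:  A[a] -> B[b] C[a]
                    prods.append(((r, a), ((q_b, b), (qp, a))))
                for r in delta.get((qp, b, '->'), ()):
                    # right dependent b: A[a] -> C[a] B[b]
                    prods.append(((r, a), ((qp, a), (q_b, b))))
    return prods
```

For a deterministic H_a, each (q′, b, direction) yields at most one r, which is the p = O(t^2) case noted above.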
A head automaton H_a is called split if it has no states that can be entered on a ← transition and exited on a → transition. Such an automaton can accept ⟨x, y⟩ only by reading all of y, immediately after which it is said to be in a flip state, and then reading all of x. Formally, a flip state is one that allows entry on a → transition and that either allows exit on a ← transition or is a final state.

We are concerned here with head automaton grammars H such that every H_a is split. These correspond to bilexical CFGs in which any derivation A[a] ⇒* xay has the form A[a] ⇒* xB[a] ⇒* xay. That is, a word's left dependents are more oblique than its right dependents and c-command them.

Such grammars are broadly applicable. Even if H_a is not split, there usually exists a split head automaton H′_a recognizing the same language. H′_a exists iff {x#y : ⟨x, y⟩ ∈ L(H_a)} is regular (where # ∉ V_T). In particular, H′_a must exist unless H_a has a cycle that includes both ← and → transitions. Such cycles would be necessary for H_a itself to accept a formal language such as {⟨b^n, c^n⟩ : n ≥ 0}, where word a takes 2n dependents, but we know of no natural-language motivation for ever using them in a HAG.

One more definition will help us bound the complexity. A split head automaton H_a is said to be g-split if its set of flip states, denoted Q̄_a ⊆ Q_a, has size ≤ g. The languages that can be recognized by g-split HAs are those that can be written as a union of g products L_i × R_i, where the L_i and R_i are regular languages over V_T. Eisner (1997) actually defined (g-split) bilexical grammars in terms of the latter property.[6]

[6] That paper associated a product language L_i × R_i, or equivalently a 1-split HA, with each of g senses of a word (see §6). One could do the same without penalty in our present approach: confining to 1-split automata would remove the g^2 complexity factor, and then allowing g
We now present our result: Figure 3 specifies an O(n^3 g^2 t^2) recognition algorithm for a head automaton grammar H in which every H_a is g-split. For deterministic automata, the runtime is O(n^3 g^2 t), a considerable improvement on the O(n^3 g^3 t^2) result of (Eisner, 1997), which also assumes deterministic automata. As in §4, a simple bottom-up implementation will suffice. For a practical speedup, add the right half-constituent item (s; h, j) as an antecedent to the Mid rule (and fill in the parse table from right to left).

Like our previous algorithms, this one takes two steps (Attach, Complete) to attach a child constituent to a parent constituent. But instead of full constituents (strings x d_i y ∈ L_{d_i}), it uses only half-constituents like x d_i and d_i y. Where CKY combines the constituents spanning w_{i,j} (head h) and w_{j+1,k} (head h′), we save two degrees of freedom i, k (so improving O(n^5) to O(n^3)) and combine only the halves spanning w_{h,j} and w_{j+1,h′}. The other halves of these constituents can be attached later, because to find an accepting path for ⟨z_l, z_r⟩ in a split head automaton, one can separately find the half-path before the flip state (which accepts z_r) and the half-path after the flip state (which accepts z_l). These two half-paths can subsequently be joined into an accepting path if they have the same flip state s, i.e., one path starts where the other ends. Annotating our left half-constituents with s makes this check possible.
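The split and flip-state definitions above are easy to check mechanically over a transition table. A minimal sketch (the dict encoding of δ_a is ours):

```python
# delta: dict mapping (state, symbol, direction) -> set of successor states,
# with direction '<-' (left transition) or '->' (right transition).

def is_split(delta):
    """A head automaton is split iff no state can be entered on a '<-'
    transition and exited on a '->' transition."""
    entered_on_left = {q for (_, _, d), succs in delta.items()
                       for q in succs if d == '<-'}
    exited_on_right = {q for (q, _, d) in delta if d == '->'}
    return not (entered_on_left & exited_on_right)

def flip_states(delta, finals):
    """Flip states: states that allow entry on a '->' transition and either
    allow exit on a '<-' transition or are final. The automaton is g-split
    iff this set has size <= g."""
    entered_on_right = {q for (_, _, d), succs in delta.items()
                        for q in succs if d == '->'}
    exits_left_or_final = {q for (q, _, d) in delta if d == '<-'} | set(finals)
    return entered_on_right & exits_left_or_final
```

Running flip_states over each H_a gives the grammar's g directly, which in turn bounds the grammar constant of the O(n^3) algorithm.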
9 Final remarks

We have formally described, and given faster parsing algorithms for, three practical grammatical rewriting systems that capture dependencies between pairs of words. All three systems admit naive O(n^5) algorithms. We give the first O(n^4) results for the natural formalism of bilexical context-free grammar, and for Alshawi's (1996) head automaton grammars. For the usual case, split head automaton grammars or equivalent bilexical CFGs, we replace the O(n^3) algorithm of (Eisner, 1997) by one with a smaller grammar constant. Note that, e.g., all three models in (Collins, 1997) are susceptible to the O(n^3) method (cf. Collins's O(n^5)).

Our dynamic programming techniques for cheaply attaching head information to derivations can also be exploited in parsing formalisms other than rewriting systems. The authors have developed an O(n^7)-time parsing algorithm for bilexicalized tree adjoining grammars (Schabes, 1992), improving the naive O(n^8) method.

The results mentioned in §6 are related to the closure property of CFGs under generalized sequential machine mapping (Hopcroft and Ullman, 1979). This property also holds for our class of bilexical CFGs.

[6, continued] senses would restore the g^2 factor. Indeed, this approach gives added flexibility: a word's sense, unlike its choice of flip state, is visible to the HA that reads it.

References

A. V. Aho and J. D. Ullman. 1972. The Theory of Parsing, Translation and Compiling, volume 1. Prentice-Hall, Englewood Cliffs, NJ.
H. Alshawi. 1996. Head automata and bilingual tiling: Translation with minimal representations. In Proc. of ACL, pages 167-176, Santa Cruz, CA.
Y. Bar-Hillel. 1953. A quasi-arithmetical notation for syntactic description. Language, 29:47-58.
E. Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proc. of the 14th AAAI, Menlo Park.
C. Chelba and F. Jelinek. 1998. Exploiting syntactic structure for language modeling. In Proc. of COLING-ACL.
N. Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.
M. Collins and J. Brooks. 1995. Prepositional phrase attachment through a backed-off model. In Proc. of the Third Workshop on Very Large Corpora, Cambridge, MA.
M. Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proc. of the 35th ACL and 8th European ACL, Madrid, July.
J. Eisner. 1996. An empirical comparison of probability models for dependency grammar. Technical Report IRCS-96-11, IRCS, Univ. of Pennsylvania.
J. Eisner. 1997. Bilexical grammars and a cubic-time probabilistic parser. In Proceedings of the 4th Int. Workshop on Parsing Technologies, MIT, Cambridge, MA, September.
R. C. Gonzales and M. G. Thomason. 1978. Syntactic Pattern Recognition. Addison-Wesley, Reading, MA.
M. A. Harrison. 1978. Introduction to Formal Language Theory. Addison-Wesley, Reading, MA.
J. E. Hopcroft and J. D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Reading, MA.
Figure 3: An O(n^3) recognition algorithm for split head automaton grammars. The format is as in Figure 1, except that (c) gives the acceptance condition. The following notation indicates that a head automaton can consume a string x from its left or right input: a : q =x⇒ q′ means that (q, ε, x) ⊢_a* (q′, ε, ε), and a : I =x⇒ q′ means this is true for some q ∈ I_a. Similarly, a : q′ ⇐x= q means that (q, x, ε) ⊢_a* (q′, ε, ε), and a : F ⇐x= q means this is true for some q′ ∈ F_a. The special symbol F also appears as a literal in some items, and effectively means "an unspecified final state."

(a) Item types:

• RightHalf(q; h, j) (h ≤ j, q ∈ Q_{d_h}): derived iff d_h : I =x⇒ q, where w_{h+1,j} ∈ L_x.
• LeftHalf(q; i, h; s) (i ≤ h, q ∈ Q_{d_h} ∪ {F}, s ∈ Q̄_{d_h}): derived iff d_h : q ⇐x= s, where w_{i,h-1} ∈ L_x.
• RightTrap(q; h, h′; s) (h < h′, q ∈ Q_{d_h}, s ∈ Q̄_{d_{h′}}): derived iff d_h : I =x d_{h′}⇒ q and d_{h′} : F ⇐y= s, where w_{h+1,h′-1} ∈ L_{xy}.
• LeftTrap(q; h′, h; s′, s) (h′ < h, q ∈ Q_{d_h}, s′ ∈ Q̄_{d_{h′}}, s ∈ Q̄_{d_h}): derived iff d_{h′} : I =x⇒ s′ and d_h : q ⇐ d_{h′} y = s, where w_{h′+1,h-1} ∈ L_{xy}.

(b) Inference rules:

  Start:           ⊢ RightHalf(q; h, h)                                      if q ∈ I_{d_h}
  Mid:             ⊢ LeftHalf(s; h, h; s)                                    if s ∈ Q̄_{d_h}
  Finish:          LeftHalf(q; i, h; s) ⊢ LeftHalf(F; i, h; s)               if q ∈ F_{d_h}
  Attach-Right:    RightHalf(q; h, i-1), LeftHalf(F; i, h′; s) ⊢ RightTrap(r; h, h′; s)    if r ∈ δ_{d_h}(q, d_{h′}, →)
  Complete-Right:  RightTrap(q; h, h′; s), RightHalf(s; h′, i) ⊢ RightHalf(q; h, i)
  Attach-Left:     RightHalf(s′; h′, i), LeftHalf(q; i+1, h; s) ⊢ LeftTrap(r; h′, h; s′, s)    if s′ ∈ Q̄_{d_{h′}}, r ∈ δ_{d_h}(q, d_{h′}, ←)
  Complete-Left:   LeftHalf(F; i, h′; s′), LeftTrap(q; h′, h; s′, s) ⊢ LeftHalf(q; i, h; s)

(c) Accept input w just if LeftHalf(F; 1, h; s) and RightHalf(s; h, n) are derived for some h, s such that d_h = $.

M. Kay. 1986. Algorithm schemata and data structures in syntactic processing. In K. Sparck Jones, B. J. Grosz, and B. L. Webber, editors, Natural Language Processing, pages 35-70. Kaufmann, Los Altos, CA.
J. Lafferty, D. Sleator, and D. Temperley. 1992. Grammatical trigrams: A probabilistic model of link grammar. In Proc. of the AAAI Conf. on Probabilistic Approaches to Nat. Lang., October.
D. Magerman. 1995. Statistical decision-tree models for parsing. In Proceedings of the 33rd ACL.
I. Mel'čuk. 1988. Dependency Syntax: Theory and Practice. State University of New York Press.
C. Pollard and I. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press.
Y. Schabes, A. Abeillé, and A. Joshi. 1988. Parsing strategies with 'lexicalized' grammars: Application to Tree Adjoining Grammars. In Proceedings of COLING-88, Budapest, August.
Yves Schabes. 1992. Stochastic lexicalized tree-adjoining grammars. In Proc. of the 14th COLING, pages 426-432, Nantes, France, August.
C. S. Wetherell. 1980. Probabilistic languages: A review and some open questions. Computing Surveys, 12(4):361-379.
D. H. Younger. 1967. Recognition and parsing of context-free languages in time n^3. Information and Control, 10(2):189-208, February.