Proceedings of National Conference on Artificial Intelligence (AAAI-92), San Jose, 1992, pp. 322-328.

A Probabilistic Parser Applied to Software Testing Documents

Mark A. Jones
AT&T Bell Laboratories
600 Mountain Avenue, Rm. 2B-435
Murray Hill, NJ 07974-0636
jones@research.att.com

Jason M. Eisner
Emmanuel College, Cambridge
Cambridge CB2 3AP England
jme14@phoenix.cambridge.ac.uk

Abstract

We describe an approach to training a statistical parser from a bracketed corpus, and demonstrate its use in a software testing application that translates English specifications into an automated testing language. A grammar is not explicitly specified; the rules and contextual probabilities of occurrence are automatically generated from the corpus. The parser is extremely successful at producing and identifying the correct parse, and nearly deterministic in the number of parses that it produces. To compensate for undertraining, the parser also uses general linguistic subtheories which aid in guessing some types of novel structures.

Introduction

In constrained domains, natural language processing can often provide leverage. In software testing at AT&T, for example, 20,000 English test cases prescribe the behavior of a telephone switching system. A test case consists of about a dozen sentences describing the goal of the test, the actions to perform, and the conditions to verify. Figure 1 shows part of a simple test case. Current practice is to execute the tests by hand, or else hand-translate them into a low-level, executable language for automatic testing. Coding the tests in the executable language is tedious and error-prone, and the English versions must be maintained anyway for readability. We have constructed a system called KITSS (Knowledge-Based Interactive Test Script System), which can be viewed as a system for machine-assisted translation from English to code. Both the English test cases and the executable target language are part of a pre-existing testing environment that KITSS must fit into.

GOAL: Activate CFA [call forwarding] using CFA Activation Access Code.
ACTION: Set station B2 without redirect notification. Station B2 goes offhook and dials CFA Activation Access Code.
VERIFY: Station B2 receives the second dial tone.
ACTION: Station B2 dials the extension of station B3.
VERIFY: Station B2 receives confirmation tone. The status lamp associated with the CFA button at B2 is lit.
VERIFY: ...

Figure 1: An Example Test Case

The basic structure of the system is given in Figure 2. English test cases undergo a series of translation steps, some of which are interactively guided by a tester. The completeness and interaction analyzer is the pragmatic component that understands the basic axioms and conventions of telephony. Its task is to flesh out the test description provided by the English sentences. This is challenging because the sentences omit many implicit conditions and actions. In addition, some sentences ("Make B1 busy") require the analyzer to create simple plans. The analyzer produces a formal description of the test, which the back-end translator then renders as executable code. A more complete description of the goals of the system, its architecture and the software testing problem can be found in [Nonnenmann and Eddy 1992].

[Figure 2: KITSS Architecture. English test cases pass through the NL processor to the completeness & interaction analyzer, which interacts with the user, and then to the translator, which emits executable test scripts; all components consult a shared domain model.]

This paper discusses the natural language processor or linguistic component, which must extract at least the surface content of a highly referential, naturally occurring text. The sentences vary in length, ranging from short sentences such as "Station B3 goes onhook" to 50-word sentences containing parentheticals, subordinate clauses, and conjunction. The principal leverage is that the discourse is reasonably well focused: a large, but finite, number of telephonic concepts enter into a finite set of relationships.

Natural Language Processing in KITSS

The KITSS linguistic component uses three types of knowledge to translate English sentences quickly and accurately into a logical form:
1. syntactic: empirical statistics about common constructions
2. semantic: empirical statistics about common concepts
3. referential: expert knowledge about the logical representation of concepts

Figure 3 illustrates the syntactic, semantic, and logical representations computed for one analysis of the sentence "Place a call from station B1 to station B2." We will not say much here about the referential knowledge that finally rewrites the surface semantic representation as temporal logic. KITSS currently uses a hand-coded production system that includes linguistic rules (e.g., active-passive and conjunction), discourse rules (e.g., definite reference), and domain-specific canonicalization rules.

String:    Place a call from station B1 to station B2.

Syntax:    (SP (S (VP (VP (VP (VB "Place")
                           (NP (AT "a") (NN "call")))
                       (PP (IN "from")
                           (NP (NN "station") (NPR "B1"))))
                   (PP (IN "to")
                       (NP (NN "station") (NPR "B2")))))
               (\. "."))

Semantics: (PLACE (:OBJECT (CALL (:NUMBER SING) (:REF A)))
                  (:FROM (STATION (:NUMBER SING) (:NAME "B1")))
                  (:TO (STATION (:NUMBER SING) (:NAME "B2")))
                  (:MOOD DECL) ...)

Logic:     ((OCCURS (PLACES-CALL B1 B2 CALL-812)))

Figure 3: Representations Formed in Processing a Sentence
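For concreteness, the three levels of Figure 3 can also be pictured as nested data. The following minimal Python sketch mirrors the figure's field names; the encoding itself is purely illustrative (the deployed system is written in Lisp, and this layout is ours):

# Illustrative encoding of the Figure 3 representations; field names
# follow the figure, the data layout is our own.
syntax = ("SP",
          ("S",
           ("VP",
            ("VP",
             ("VP", ("VB", "Place"), ("NP", ("AT", "a"), ("NN", "call"))),
             ("PP", ("IN", "from"), ("NP", ("NN", "station"), ("NPR", "B1")))),
            ("PP", ("IN", "to"), ("NP", ("NN", "station"), ("NPR", "B2"))))),
          (".", "."))

semantics = {
    "head":    "PLACE",
    ":OBJECT": {"head": "CALL",    ":NUMBER": "SING", ":REF": "A"},
    ":FROM":   {"head": "STATION", ":NUMBER": "SING", ":NAME": "B1"},
    ":TO":     {"head": "STATION", ":NUMBER": "SING", ":NAME": "B2"},
    ":MOOD":   "DECL",
}

logic = ("OCCURS", ("PLACES-CALL", "B1", "B2", "CALL-812"))

print(semantics[":OBJECT"]["head"])   # CALL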
An undirected parser would generate many alternative (incorrect) hypotheses regarding the structure and interpretation of the sentence in Figure 3. It might try attaching the prepositional phrases to the noun phrase "a call," or treating "to station B2" as an infinitive phrase. In designing a parsing technique for the KITSS system, we wanted to exploit the statistical regularities that make one interpretation far likelier than others. In line with our earlier success on statistical error-correction for optical character recognizers (OCR devices) [Jones et al 1991], we sought ways to "bootstrap" the acquisition of statistical domain knowledge, in this case knowledge about the likelihood of syntactic and semantic substructures in the test case sentences.

Note that initially we may have only a corpus of raw sentences, not a corpus of their target syntactic and semantic structures. While it is impractical to hand-analyze a large portion of the corpus, it is possible to do a relatively small number of sentences by hand, to get started, and then to use the parser itself as a tool to suggest analyses (or partial analyses) for further sentences. A similar approach to training is found in [Simmons 1991]. We will assume below that we have access to a training set of syntactic and semantic structures.[1]

[1] In KITSS, only the syntactic bracketing is ever fully manual. The system automatically constructs a semantics for each training example from its syntax, using a set of translation rules. Most of these rules are inferred from a default theory of syntactic-semantic type correspondences.
Issues in Probabilistic Parsing

To generalize from the training corpus to new sentences, we will need to induce a good statistical model of the language. But the statistical distributions in a natural language reflect a great many factors, some of them at odds with each other in unexpected ways. Chomsky's famous sentence, "Colorless green ideas sleep furiously," is syntactically quite reasonable (but semantic nonsense) and, for historical reasons, quite common in conference papers. Or consider the classic illustration of attachment ambiguity: "I saw a man in the park with a telescope." One interpretation of this sentence holds that I used a telescope to see a man. To judge the relative likelihood, we may want to know how often telescopes are used for "seeing" (vs. "sawing"); how often a verb takes two prepositional phrases; who is most likely to have a telescope (me, the man, or the park); and so on. Thus many features of a sentence may be significant. Within a restricted domain such as KITSS, the distributions are further shaped by the domain subject matter and by stylistic conventions. Sentences such as "Station B3 goes onhook" may be rare in the newspaper but common in the KITSS application. For stations to "go" is a test script idiom.

We want our statistics to capture more than the syntactic correlations. Our strategy is to build up rich interpretations of the sentences as we are parsing them. We take care to interpret every subtree that we generate. Thus, when we are deciding whether to combine two subtrees later on, we will know what the subtrees "mean." Furthermore, we will have semantic readings for other, possibly relevant portions of the sentence. The semantic information helps to expose deep similarities and deep differences among sentences. Two trees that are semantically similar are likely to combine in similar ways. With semantic interpretations, we directly represent the fact that in one hypothesis the telescope is used for seeing. This fact is obscured in the corresponding syntactic tree, and even more so in the original sentence, where "saw" and "telescope" appear far apart.
Formally, let Ω be a given space of possible interpretations. We model a phrase structure rule, L_k → R_1 R_2 ... R_m, as an m-ary function taking values in Ω. R_1 ... R_m give type restrictions on the m arguments. The function describes how to build an interpretation of type L from m contiguous substring interpretations of types R_1 ... R_m. The rule number k distinguishes rules that have the same domain and range, but differ functionally (e.g., in the case of a noun with two meanings); the chart maintains distinct hypotheses for the alternative interpretations.
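As a minimal sketch of this formulation, a rule L_k → R_1 ... R_m can be modeled as a typed function from child interpretations into Ω. The Python encoding below is ours, not the paper's; in particular, the slot filling performed by build_vp is only an illustration of how the semantic half might be composed:

from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Interpretation:
    category: str    # syntactic type (an L or R_i)
    tree: tuple      # syntactic half: the parse tree
    frame: dict      # semantic half: head, slots, and fillers

@dataclass(frozen=True)
class Rule:
    lhs: str                  # L
    rhs: Tuple[str, ...]      # R_1 ... R_m, the type restrictions
    k: int                    # distinguishes rules with the same domain/range
    build: Callable[..., Interpretation]   # the m-ary function into Omega

def build_vp(v: Interpretation, np: Interpretation) -> Interpretation:
    frame = dict(v.frame)
    frame[":OBJECT"] = np.frame            # illustrative slot filling
    return Interpretation("VP", ("VP", v.tree, np.tree), frame)

VP_RULE = Rule("VP", ("V", "NP"), k=1, build=build_vp)

v = Interpretation("V", ("V", "Place"), {"head": "v-PLACE"})
np = Interpretation("NP", ("NP", ("Det", "a"), ("N", "call")), {"head": "n-CALL"})
print(VP_RULE.build(v, np).frame)   # v-PLACE frame with an :OBJECT filler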
In practice, our Ω consists of joint syntactic-semantic interpretations. The syntactic half of an interpretation is simply the parse tree. The semantic half is built compositionally from lexically-derived heads, slots and fillers in a standard frame language, as illustrated in Figure 3.

Experiments confirm the value of this approach for statistical parsing. When we run our parser with semantics turned off, its syntactic accuracy rate drops from 99% to 66%, and it runs far more slowly.
The KITSS Algorithm

The KITSS parsing algorithm (given as Algorithm 1 in Appendix A) is a variant of tabular or chart parsing methods for context-free languages [Cocke and Schwartz 1970, Earley 1970, Graham et al 1980]. It scans the sentence from left to right, assembling possible partial interpretations of the sentence; but it continually discards interpretations that are statistically unlikely.

The grammar rules and statistics are generated automatically by training on a bracketed corpus. The grammar is taken to be the smallest set of symbols and rules needed to write down all the parse trees in the corpus. The statistics are context-sensitive; they concern the frequencies with which the interpreted subtrees co-occur. Incremental training is permitted. The model is that the system considers a new sample sentence, updates its database, and throws the sentence away.

A grammar is given by G = (V, Σ, P, S), where V is the vocabulary of all symbols, Σ is the set of terminal symbols, P is the set of rules, and S is the start symbol. The start symbol is restricted to be non-recursive. A distinguished start symbol (e.g., ROOT) can be added to the grammar if necessary. For an input sentence w = a_1 a_2 ... a_{|w|} (a_i ∈ Σ), let w_{i,j} denote the substring a_{i+1} ... a_j. For example, w_{0,3} denotes the first three words.
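The following small Python sketch shows this setup for the grammar fragment of Figure 4; the concrete representation choices (tuples and sets) are ours:

# G = (V, Sigma, P, S) for the Figure 4 fragment.
P = {
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
    ("V",  ("Place",)),
    ("Det", ("a",)),
    ("N",  ("call",)),
}
Sigma = {"Place", "a", "call"}          # terminal symbols
V = {lhs for lhs, _ in P} | Sigma       # vocabulary of all symbols
S = "VP"                                # start symbol (non-recursive here)

def substring(w, i, j):
    """w_{i,j} = a_{i+1} ... a_j (a 0-based Python slice)."""
    return w[i:j]

w = ["Place", "a", "call"]
assert substring(w, 0, 3) == ["Place", "a", "call"]   # the first three words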
The algorithm operates bottom-up from left to right in the input string. At each point j in the input string, the algorithm constructs hypotheses about the immediately preceding substrings w_{i,j}. A complete hypothesis is a parse tree for some substring of the sentence; we write it as [L r_1 r_2 ... r_m], where L → R_1 R_2 ... R_m is a rule in P and each subtree r_i is itself a complete hypothesis with root R_i. An incomplete hypothesis is similar, except it is missing one or more of its rightmost branches. We write it as [L r_1 r_2 ... r_q +], 0 ≤ q < m. We use the notation h_{i,j} to refer to a hypothesis that dominates the string w_{i,j}. If a hypothesis h_{i,j} is judged to be likely, it is added to a set t_{i,j} in a (|w|+1) × (|w|+1) chart t.

"Empty" hypotheses, which are created directly from the grammar, have the form [L +]. "Input" hypotheses just assert the existence of a_i and are complete; these are usually assigned probability 1, but normalized sets of input hypotheses could be used in noisy recognition environments such as speech or OCR. Longer hypotheses are created by the ⊗ operator, which attaches a new child to a tree. The ⊗ product of two hypotheses is the smallest set respecting the condition that whenever h_{i,j} = [L r_1 ... r_q +], h_{j,k} = r_{q+1}, and (L → R_1 ... R_q R_{q+1} ... R_m) ∈ P, then

  [L r_1 ... r_q r_{q+1} +] ∈ (h_{i,j} ⊗ h_{j,k})   if q + 1 < m
  [L r_1 ... r_m] ∈ (h_{i,j} ⊗ h_{j,k})             if q + 1 = m

Note that ⊗ returns a set of 0, 1, or 2 hypotheses. The first argument of ⊗ is ordinarily an incomplete hypothesis, while the second is a complete hypothesis immediately to its right. Otherwise ⊗ returns the empty set. The ⊗ operator can easily be extended to act on sets and charts:

  Q ⊗ R := ∪ {h ⊗ h' | h ∈ Q, h' ∈ R}
  t ⊗ R := (∪_{i,j} t_{i,j}) ⊗ R

The algorithm returns the set of complete hypotheses in t_{0,|w|} whose roots are S, the start symbol. Each of these parses has an associated probability.
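To make the ⊗ product concrete, here is a hedged Python sketch using the rule set of Figure 4 (string spans are omitted for brevity; the full algorithm tracks them in the chart). The tuple encoding is ours; because such a hypothesis does not commit to one particular rule, attaching a child can close one rule while extending a longer one, which is one way a product of two hypotheses arises:

# A hypothesis [L r_1 ... r_q +] is (L, children, complete); an input
# hypothesis for a word a is (a, (), True).
P = {("VP", ("V", "NP")), ("NP", ("Det", "N")),
     ("V", ("Place",)), ("Det", ("a",)), ("N", ("call",))}

def root(h):
    return h[0]

def product(h1, h2):
    """h1 (x) h2: attach the complete hypothesis h2 as the next child of
    the incomplete hypothesis h1; returns a set of 0, 1, or 2 hypotheses."""
    L, kids, complete1 = h1
    if complete1 or not h2[2]:
        return set()                  # otherwise (x) returns the empty set
    prefix = tuple(root(c) for c in kids) + (root(h2),)
    out = set()
    for lhs, rhs in P:
        if lhs == L and rhs[:len(prefix)] == prefix:
            out.add((L, kids + (h2,), len(rhs) == len(prefix)))
    return out

# The "empty" hypothesis [VP +] absorbs a complete V hypothesis:
v = ("V", (("Place", (), True),), True)
print(product(("VP", (), False), v))  # {[VP v +]}, still incomplete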
V = {VP, V, "Place", NP, Det, "a", N, "call", ...}
Σ = {"Place", "a", "call", ...}
P = {VP → V NP, NP → Det N, V → "Place", Det → "a", N → "call", ...}
S = VP

          VP
         /  \
        V    NP
        |   /  \
        |  Det  N
        |   |   |
   "Place" "a" "call"

Figure 4: A Parse Tree for w = "Place" "a" "call"

During the parsing process, a left-context probability or LCP, Pr(h_{i,j} | w_{0,j}), is used to prune sets of competing hypotheses. Pruning severity depends on the beam width, θ ≤ 1. A beam width of 10^{-2} keeps only those alternative hypotheses that are judged at least 1% as likely as the leading contender in the set. The correct parse can survive only if all of its constituent hypotheses meet this criterion; thus θ operationally determines the set of garden path sentences for the parser. If any correct hypothesis is pruned, then the correct parse will not be found (indeed, perhaps no parse will be found). This can happen in garden path sentences. It may also happen if the statistical database provides an inadequate or incomplete model of the language.
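The beam test itself is a one-line comparison against the best competitor. A compact sketch, with invented scores:

# Keep a hypothesis only if its LCP is at least theta times the maximum
# LCP in its competing set. Scores below are invented for illustration.
def prune(scored, theta=1e-2):
    if not scored:
        return {}
    threshold = theta * max(scored.values())
    return {h: p for h, p in scored.items() if p >= threshold}

kept = prune({"np": 0.60, "pp-attach": 0.30, "inf-phrase": 0.004})
print(sorted(kept))   # the 0.004 hypothesis is dropped (under 1% of 0.60)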
Probability Calculations

The parsing algorithm keeps or discards a hypothesis according to the left-context probability Pr(h_{i,j} | w_{0,j}). The more accurate this value, the better we will do at pruning the search space. How can we compute it without assuming context-freeness? We are able to decompose the probability into a product of corpus statistics (which we look up in a fixed hash table) and the LCPs of other hypotheses (which we computed earlier in the parse). Space prevents us from giving the formal derivation. Instead we will work through part of an example.

Figure 4 gives a small grammar fragment, with a possible parse tree for a short sentence. For convenience, we will name various trees and subtrees as follows:

  place = "Place"_{0,1}
  a     = "a"_{1,2}
  call  = "call"_{2,3}
  v     = [V "Place"]_{0,1}
  det   = [Det "a"]_{1,2}
  n     = [N "call"]_{2,3}
  np_1  = [NP [Det "a"] +]_{1,2}
  np    = [NP [Det "a"] [N "call"]]_{1,3}
  vp_1  = [VP [V "Place"] +]_{0,1}
  vp    = [VP [V "Place"] [NP [Det "a"] [N "call"]]]_{0,3}

These trees correspond to the hypotheses of the previous section. Note carefully that, for example, the tree vp ∈ vp_1 ⊗ np is correct if and only if the trees vp_1 and np are also correct. We will use this fact soon.
Left-Context Probabilities

We begin with some remarks about Pr(np | w_{0,3}), the LCP that np is the correct interpretation of w_{1,3}. This probability depends on the first word of the sentence, w_{0,1}, and in particular on the interpretation of w_{0,1}. (For example: if the statistics suggest that "Place" is a noun rather than a verb, the np hypothesis may be unlikely.) The correct computation is

  Pr(np | w_{0,3}) = Pr(vp_1 & np | w_{0,3}) + Pr(X & np | w_{0,3}) + Pr(Y & np | w_{0,3}) + ...    (1)

where vp_1, X, Y, ... are a set of (mutually exclusive) possible explanations for "Place." The summands in equation 1 are typical terms in our derivation. They are LCPs for chains of one or more contiguous hypotheses.

Now let us skip ahead to the end of the sentence, when the parser has finished building the complete tree vp. We decompose this tree's LCP as follows:

  Pr(vp | w_{0,3}) = Pr(vp & vp_1 & np | w_{0,3})    (2)
                   = Pr(vp | vp_1 & np & w_{0,3}) · Pr(vp_1 & np | w_{0,3})

The first factor is the likelihood that vp_1 and np, if they are in fact correct, will combine to make the bigger tree vp ∈ vp_1 ⊗ np. We approximate it empirically, as discussed in the next section.

As for the second factor, the parser has already found it! It appeared as one of the summands in (1), which the parser used to find the LCP of np. It decomposes as

  Pr(vp_1 & np | w_{0,3}) = Pr(vp_1 & np & np_1 & n | w_{0,3})    (3)
                          = Pr(np | vp_1 & np_1 & n & w_{0,3}) · Pr(vp_1 & np_1 & n | w_{0,3})

The situation is exactly as before. We estimate the first factor empirically, and we have already found the second as

  Pr(vp_1 & np_1 & n | w_{0,3}) = Pr(vp_1 & np_1 & n & call | w_{0,3})    (4)
                                = Pr(n | vp_1 & np_1 & call & w_{0,3}) · Pr(vp_1 & np_1 & call | w_{0,3})
At this point the recursion bottoms out, since call cannot be decomposed further. To find the second factor we invoke Bayes' theorem:

  Pr(vp_1 & np_1 & call | w_{0,3})    (5)
    = Pr(vp_1 & np_1 & call | w_{0,2} & call)
    = Pr(vp_1 & np_1 | w_{0,2} & call)
    = Pr(call | vp_1 & np_1 & w_{0,2}) · Pr(vp_1 & np_1 | w_{0,2})
      / Σ_X [ Pr(call | X & w_{0,2}) · Pr(X | w_{0,2}) ]

The sum in the denominator is over all chains X, including vp_1 & np_1, that compete with each other to explain the input w_{0,2} = "Place a". Note that for each X, the LCP of X & call will have the same denominator.

In both the numerator and the denominator, the first factor is again estimated from corpus statistics. And again, the second factor has already been computed. For example, Pr(vp_1 & np_1 | w_{0,2}) is a summand in the LCP for np_1. Note that thanks to the Bayesian procedure, this is indeed a left-context probability: it does not depend on the word "call," which falls to the right of np_1.
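To see the arithmetic of the Bayes step in equation (5), here is a toy computation; every number below is invented for illustration and none comes from the KITSS corpus:

# Chains competing to explain w_{0,2} = "Place a", with their LCPs:
lcp = {"vp1 & np1": 0.70, "X": 0.25, "Y": 0.05}     # Pr(chain | w_{0,2})
# Invented corpus estimates of Pr("call" | chain & w_{0,2}):
p_call = {"vp1 & np1": 0.30, "X": 0.02, "Y": 0.01}

numerator = p_call["vp1 & np1"] * lcp["vp1 & np1"]
denominator = sum(p_call[x] * lcp[x] for x in lcp)
print(numerator / denominator)   # Pr(vp1 & np1 | w_{0,2} & call) ~ 0.974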
Corpus Statistics

The recursive LCP computation does nothing but multiply together some empirical numbers. Where do these numbers come from? How does one estimate a value like Pr(vp | vp_1 & np & w_{0,3})?

The condition w_{0,3} is redundant, since the words also appear as the leaves of the chain vp_1 & np. So the expression simplifies to Pr(vp | vp_1 & np). This is the probability that, if vp_1 and np are correct in an arbitrary sentence, vp is also correct. (Consistent alternatives to vp = [VP v np] might include [VP v np pp] and [VP v [NP np pp]], where pp is some prepositional phrase.) In theory, one could find this value directly from the bracketed corpus:

(a) In the 3 bracketed training sentences (say) where
    • A = [VP [V "Place"] +]_{0,1} appears
    • B = [NP [Det "a"] [N "call"]]_{1,3} appears
    in what fraction does [VP [V "Place"] [NP [Det "a"] [N "call"]]]_{0,3} appear?

However, such a question is too specific to be practical: 3 sentences is uncomfortably close to 0. To ensure that our samples are large enough, we broaden our question. We might ask instead:

(b) Among the 250 training sentences with
    • A = [VP [V ...] +]_{i,j}
    • B = [NP ...]_{j,k} (some i < j < k)
    in what fraction is B the second child of A?

Alternatively, we might take a more semantic approach and ask

(c) Among the 20 training sentences with
    • A = a subtree from i to j
    • B = a subtree from j to k
    • A has semantic interpretation A' = (v-PLACE ...)
    • B has semantic interpretation B' = (n-CALL ...)
    in what fraction does B' fill the OBJECT role of A'?

Questions (b) and (c) both consider more sentences than (a). (b) considers a wide selection of sentences like "The operator activates CFA ...". (c) confines itself to sentences like "The operator places two priority calls ...". As a practical matter, an estimate of Pr(vp | vp_1 & np) should probably consider some syntactic and some semantic information about vp_1 and np. Our current approach is essentially to combine questions (b) and (c). That is, we focus our attention on corpus sentences that satisfy the conditions of both questions simultaneously.
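One way to realize this combination is to tally, for each adjacent pair of subtrees in the bracketed corpus, a key that joins the syntactic conditioning of question (b) with the semantic conditioning of question (c). The event encoding below is our own sketch, not the system's actual tables:

from collections import Counter

joins, attaches = Counter(), Counter()

def observe(parent_cat, child_cat, parent_head, child_head, child_attached):
    """Tally one adjacent (A, B) subtree pair from a bracketed sentence."""
    key = (parent_cat, child_cat, parent_head, child_head)   # (b) + (c)
    joins[key] += 1
    if child_attached:               # B actually became the next child of A
        attaches[key] += 1

def estimate(parent_cat, child_cat, parent_head, child_head):
    key = (parent_cat, child_cat, parent_head, child_head)
    return attaches[key] / joins[key] if joins[key] else None

observe("VP", "NP", "v-PLACE", "n-CALL", True)    # invented training event
print(estimate("VP", "NP", "v-PLACE", "n-CALL"))  # 1.0 on this tiny sample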
Limiting Chain Length

The previous sections refer to many long chains of hypotheses:

  Pr(vp_1 & np_1 & call | w_{0,3})
  Pr(n | vp_1 & np_1 & w_{0,3})

In point of fact, every chain we have built extends back to the start of the sentence. But this is unacceptable: it means that parsing time and space grow exponentially with the length of the sentence.

Research in probabilistic parsing often avoids this problem by assuming a stochastic context-free language (SCFG). In this case,

  Pr(vp_1 & np_1 & call | w_{0,3}) = Pr(vp_1 | w_{0,1}) · Pr(np_1 | w_{1,2}) · Pr(call | w_{2,3})

and Pr(n | vp_1 & np_1 & w_{0,3}) = Pr(n | w_{2,3}). This assumption would greatly condense our computation and our statistical database. Unfortunately, for natural languages SCFGs are a poor model. Following the derivation S ⇒* John solved the N, the probability of applying rule {N → fish} is approximately zero, though it may be quite likely in other contexts.

But one may limit the length of chains less drastically, by making weaker independence assumptions. With appropriate modifications to the formulas, it is possible to arrange that chain length never exceeds some constant L ≥ 1. Setting L = 1 yields the context-free case.

Our parser refines this idea one step further: within the bound L, chain length varies dynamically. For instance, suppose L = 3. In the parse tree [VP [V "Place"] [NP [Det "a"] [Adj "priority"] [N "call"]]], we do compute an LCP for the entire chain vp_1 & np_2 & n, but the chain vp_1 & np_1 & adj is considered "too long." Why? The parser sees that adj will be buried two levels deep in both the syntactic and semantic trees for vp. It concludes that Pr(adj | vp_1 & np_1) ≈ Pr(adj | np_1), and uses this heuristic assumption to save work in several places.

In general we may speak of high-focus and low-focus parsing. The focus achieved during a parse is defined as the ratio of useful work to total work. Thus a high-focus parser is one that prunes aggressively: it does not allow incorrect hypotheses to proliferate. A completely deterministic parser would have 100% focus. Non-probabilistic parsers typically have low focus when applied to natural languages.

Our parsing algorithm allows us to choose between high- and low-focus strategies. To achieve high focus, we need a high value of θ (aggressive pruning) and accurate probabilities. This will prevent a combinatorial explosion. To the extent that accurate probabilities require long chains and complicated formulas, the useful work of the parser will take more time. On the other hand, a greater proportion of the work will be useful. In practice, we find that even L = 2 yields good focus in the KITSS domain. Although the L = 2 probabilities undoubtedly sacrifice some degree of accuracy, they still allow us to prune most incorrect hypotheses. Related issues of modularity and local nondeterminism are also discussed in the psycholinguistic literature (e.g., [Tanenhaus et al 1985]).
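Operationally, the bound can be pictured as truncating each left-context chain to its most recent links before any statistics are looked up; this minimal sketch fixes L and omits the dynamic-depth refinement described above:

def truncate_chain(chain, L=2):
    """Condition on at most the last L hypotheses of a left-context chain."""
    return chain[-L:]

print(truncate_chain(["vp1", "np1"], L=1))   # ['np1']: Pr(adj | vp1 & np1) ~ Pr(adj | np1)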
Incomplete Knowledge

Natural language systems tend to be plagued by the potential open-endedness of the knowledge required for understanding. Incompleteness is a problem even for fully hand-coded systems. The correlate for statistical schemes is the problem of undertraining. In our task, for example, we do not have a large sample of syntactic and semantic bracketings. To compensate, the parser utilizes hand-coded knowledge as well as statistical knowledge. The hand-coded knowledge expresses general knowledge about linguistic subtheories such as part of speech rules or coordination.

As an example, consider the problem of assigning parts of speech to novel words. Several sources of knowledge may suggest the correct part of speech class, e.g., the left-context of the novel word, the relative openness or closedness of the classes, morphological evidence from affixation, and orthographic conventions such as capitalization. The parser combines the various forms of evidence (using a simplified representation of the probability density functions) to assign a priori probabilities to novel part of speech rules. Using this technique, the parser takes a novel sentence such as "XX YY goes ZZ," and derives syntactic and semantic representations analogous to "Station B1 goes offhook."

The current system invokes these ad hoc knowledge sources when it fails to find suitable alternatives using its empirical knowledge. Clearly, there could be a more continuous strategy. Eventually, we would like to see very general linguistic principles (e.g., metarules and X-bar theory [Jackendoff 1977]) and powerful informants such as world knowledge provide a priori biases that would allow the parser to guess genuinely novel structures. Ultimately, this knowledge may serve as an inductive bias for a completely bootstrapping, learning version of the system.
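A hedged sketch of this kind of evidence combination, assuming a naive-Bayes-style product of per-source likelihoods (the paper does not spell out its combination rule, and every table below is invented):

def pos_prior(likelihoods, classes=("NN", "VB", "NPR")):
    """Multiply per-source likelihoods Pr(evidence | class), renormalize."""
    score = {c: 1.0 for c in classes}
    for table in likelihoods.values():
        for c in classes:
            score[c] *= table.get(c, 1e-6)
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}

evidence = {
    "capitalized": {"NPR": 0.80, "NN": 0.10, "VB": 0.05},  # orthography
    "no-affix":    {"NPR": 0.50, "NN": 0.30, "VB": 0.20},  # morphology
    "left-ctx":    {"NPR": 0.60, "NN": 0.30, "VB": 0.10},  # preceding tags
}
print(pos_prior(evidence))   # a priori probabilities for the novel word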
Results and Summary

The KITSS system is implemented in Franz Common Lisp running on Sun workstations. The system as a whole has now produced code for more than 70 complete test cases containing hundreds of sentences. The parsing component was trained and evaluated on a bracketed corpus of 429 sentences from 40 test cases. The bracketings use part of speech tags from the Brown Corpus [Francis and Kucera 1982] and traditional phrase structure labels (S, NP, etc.). Because adjunction rules such as VP → VP PP are common, the trees tend to be deep (9 levels on average). The bracketed corpus contains 308 distinct lexical items which participate in 355 part of speech rules. There are 55 distinct nonterminal labels, including 35 parts of speech. There are a total of 13,262 constituent decisions represented in the corpus.

We have studied the accuracy of the parser in using its statistics to reparse the training sentences correctly. Generally, we run the experiments using an adaptive beam schedule; progressively wider beam widths are tried (e.g., 10^{-1}, 10^{-2}, and 10^{-3}) until a parse is obtained. (By comparison, approximately 85% more hypotheses are generated if the system is run with a fixed beam of 10^{-3}.) For 62% of the sentences some parse was first found at θ = 10^{-1}, for 32% at 10^{-2}, and for 6% at 10^{-3}. For only one sentence was no parse found within θ = 10^{-3}. For the other 428 sentences, the parser always recovered the correct parse, and correctly identified it in 423 cases. In the 428 top-ranked parse trees, 99.92% of the bracketing decisions were correct (less than one error in 1000). Significantly, the parser produced only 1.02 parses per sentence. Of the hypotheses that survived pruning and were added to the chart, 68% actually appeared as parts of the final parse.
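The adaptive schedule itself is a simple retry loop; parse() below is a stand-in for Algorithm 1:

def parse_adaptive(sentence, parse, schedule=(1e-1, 1e-2, 1e-3)):
    """Retry with progressively wider beams until some parse is found."""
    for theta in schedule:
        parses = parse(sentence, theta)
        if parses:
            return parses, theta
    return [], None

def toy_parse(sentence, theta):
    return ["parse"] if theta <= 1e-2 else []   # hypothetical stand-in

print(parse_adaptive("Station B2 goes offhook .", toy_parse))   # found at 1e-2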
We have also done a limited number of randomized split-corpus studies to evaluate the parser's generalization ability. After several hundred sentences in the telephony domain, however, the vocabulary continues to grow and new structural rules are also occasionally needed. We have conducted a small-scale test that ensured that the test sentences were novel but grammatical under the known rules. With 211 sentences for training and 147 for testing, parses were found for 77% of the test sentences; of these, the top-ranked parse was correct 90% of the time, and 99.3% of the bracketing decisions were correct. Using 256 sentences for training and 102 sentences for testing, the parser performed perfectly on the test set.

In conclusion, we believe that the KITSS application demonstrates that it is possible to create a robust natural language processing system that utilizes both distributional knowledge and general linguistic knowledge.

Acknowledgements

Other major contributors to the KITSS system include Van Kelly, Uwe Nonnenmann, Bob Hall, John Eddy, and Lori Alperin Resnick.
Appendix A: The KITSS Parsing Algorithm

Algorithm 1. PARSE(w):
  (* create a (|w|+1) × (|w|+1) chart t = (t_{i,j}) *)
  t_{0,0} := {[S +]};
  for j := 1 to |w| do
    R_1 := t_{j-1,j} := {a_j};
    R_2 := ∅;
    while R_1 ≠ ∅ do
      R := PRUNE((t ⊗ R_1) ∪ (PREDICT(R_1) ⊗ R_1) ∪ R_2);
      R_1 := R_2 := ∅;
      for h_{i,j} ∈ R do
        t_{i,j} := t_{i,j} ∪ {h_{i,j}};
        if h_{i,j} is complete then R_1 := R_1 ∪ {h_{i,j}}
        else R_2 := R_2 ∪ {h_{i,j}}
      endfor
    endwhile
  endfor;
  return {all complete S-hypotheses in t_{0,|w|}}

Subroutine PREDICT(R):
  return {[A +] | A → B_1 ... B_m is in P, some complete B_1-hypothesis is in R, and A ≠ S}

Subroutine PRUNE(R):
  (* only likely rules are kept *)
  R' := ∅;
  threshold := θ · max_{h_{i,j} ∈ R} Pr(h_{i,j} | w_{0,j});
  for h_{i,j} in R do
    if Pr(h_{i,j} | w_{0,j}) ≥ threshold then R' := R' ∪ {h_{i,j}}
  endfor;
  return R'
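For readers who want to execute the algorithm, here is a hedged Python rendering of Algorithm 1 over the Figure 4 fragment. It is not the original Lisp: score() returns 1.0 everywhere as a stand-in for the left-context probabilities of the paper, and a chart-membership test guards the agenda against reprocessing:

THETA = 1e-2                                   # beam width
P = [("VP", ("V", "NP")), ("NP", ("Det", "N")),
     ("V", ("Place",)), ("Det", ("a",)), ("N", ("call",))]
S = "VP"

def root(h): return h[0]
def complete(h): return h[2]

def product(h1, s1, h2, s2):
    """h1 (x) h2: attach the complete hypothesis h2, spanning s2, as the
    next child of the incomplete, adjacent hypothesis h1, spanning s1."""
    out = set()
    if complete(h1) or not complete(h2) or s1[1] != s2[0]:
        return out
    prefix = tuple(root(c) for c in h1[1]) + (root(h2),)
    for lhs, rhs in P:
        if lhs == root(h1) and rhs[:len(prefix)] == prefix:
            out.add(((root(h1), h1[1] + (h2,), len(rhs) == len(prefix)),
                     (s1[0], s2[1])))
    return out

def predict(r1):
    """Empty hypotheses [A +] (A != S) whose first child just completed."""
    return {((lhs, (), False), (s[0], s[0]))
            for h, s in r1 for lhs, rhs in P
            if rhs[0] == root(h) and lhs != S}

def score(h, s):
    return 1.0                                 # stand-in for the LCP

def prune(cands):
    if not cands:
        return set()
    best = max(score(h, s) for h, s in cands)
    return {(h, s) for h, s in cands if score(h, s) >= THETA * best}

def parse(w):
    chart = {((S, (), False), (0, 0))}         # t_{0,0} := {[S +]}
    for j in range(1, len(w) + 1):
        r1 = {((w[j - 1], (), True), (j - 1, j))}   # input hypothesis a_j
        chart |= r1
        r2 = set()
        while r1:
            cands = set(r2)                    # (t (x) R1) U (PREDICT(R1) (x) R1) U R2
            for h2, s2 in r1:
                for h1, s1 in chart | predict(r1):
                    cands |= product(h1, s1, h2, s2)
            r1, r2 = set(), set()
            for h, s in prune(cands):
                if (h, s) not in chart:
                    chart.add((h, s))
                    (r1 if complete(h) else r2).add((h, s))
    return {(h, s) for h, s in chart
            if root(h) == S and complete(h) and s == (0, len(w))}

parses = parse(["Place", "a", "call"])
print(len(parses))                             # 1: the Figure 4 tree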
  • gr
amming L anguages and Their Com- pilers. New Y
  • rk:
Couran t Institute
  • f
Mathemati- cal Sciences, New Y
  • rk
Univ ersit y . [Earley 1970] Earley , J. 1970. An Ecien t Con text- F ree P arsing Algorithm. Communic ations
  • f
the A CM 13(2): 94-102. [F rancis and Kucera 1982] F rancis, W. and Kucera, H. 1982. F r e quency A nalysis
  • f
English Usage. Boston: Hough ton Miin. [Graham et al 1980] Graham, S.L., Harrison, M.A. and Ruzzo, W.L. 1980. An Impro v ed Con text-F ree Recognizer. A CM T r ansactions
  • n
Pr
  • gr
amming L anguages and Systems 2(3):415-463. [Jac k endo 1977] Jac k endo, R. 1977.
  • X
Syntax: A Study
  • f
Phr ase Structur e. Cam bridge, MA.: MIT Press. [Jones et al 1991] Jones, M.A., Story , G.A., and Bal- lard, B.W. 1991. Using Multiple Kno wledge Sources in a Ba y esian OCR P
  • st-Pro
cessor. In First In- ternational Confer enc e
  • n
Do cument A nalysis and R etrieval, 925-933. St. Malo, F rance: AF CET{ IRISA/INRIA. [Nonnenmann and Eddy 1992] Nonnenmann, U., and Eddy J.K. 1992. KITSS
  • A
F unctional Soft w are T esting System Using a Hybrid Domain Mo del. In Pr
  • c.
  • f
8th IEEE Confer enc e
  • n
A rticial Intel li- genc e Applic ations. Mon terey , CA: IEEE. [Simmo ns 1991] Simmo ns, R. and Y u, Y. 1991. The Acquisition and Application
  • f
Con text Sensitiv e Grammar for English. In Pr
  • c.
  • f
the 29th A nnual Me eting
  • f
the Asso ciation for Computational Lin- guistics, 122-129. Berk eley , California: Asso ciation for Computational Linguistics. [T anenhaus et al 1985] T anenhaus, M.K., Carlson, G.N. and Seiden b erg, M.S. 1985. Do Listeners Com- pute Linguistic Represen tations? In Natur al L an- guage Parsing (eds. D. R. Do wt y , L. Karttunen and A.M. Zwic ky), 359-408. Cam bridge Univ ersit y Press: Cam bridge, England.