Proceedings of National Conference on Artificial Intelligence (AAAI-92), San Jose, 1992, pp. 322-328.

A Probabilistic Parser Applied to Software Testing Documents

Mark A. Jones
AT&T Bell Laboratories
600 Mountain Avenue, Rm. 2B-435
Murray Hill, NJ 07974-0636
jones@research.att.com

Jason M. Eisner
Emmanuel College, Cambridge
Cambridge CB2 3AP England
jme14@phoenix.cambridge.ac.uk

Abstract

We describe an approach to training a statistical parser from a bracketed corpus, and demonstrate its use in a software testing application that translates English specifications into an automated testing language. A grammar is not explicitly specified; the rules and contextual probabilities of occurrence are automatically generated from the corpus. The parser is extremely successful at producing and identifying the correct parse, and nearly deterministic in the number of parses that it produces. To compensate for undertraining, the parser also uses general linguistic subtheories which aid in guessing some types of novel structures.

Introduction

In constrained domains, natural language processing can often provide leverage. In software testing at AT&T, for example, 20,000 English test cases prescribe the behavior of a telephone switching system. A test case consists of about a dozen sentences describing the goal of the test, the actions to perform, and the conditions to verify. Figure 1 shows part of a simple test case. Current practice is to execute the tests by hand, or else hand-translate them into a low-level, executable language for automatic testing. Coding the tests in the executable language is tedious and error-prone, and the English versions must be maintained anyway for readability. We have constructed a system called KITSS (Knowledge-Based Interactive Test Script System), which can be viewed as a system for machine-assisted translation from English to code. Both the English test cases and the executable target language are part of a pre-existing testing environment that KITSS must fit into.

GOAL: Activate CFA [call forwarding] using CFA Activation Access Code.
ACTION: Set station B2 without redirect notification. Station B2 goes offhook and dials CFA Activation Access Code.
VERIFY: Station B2 receives the second dial tone.
ACTION: Station B2 dials the extension of station B3.
VERIFY: Station B2 receives confirmation tone. The status lamp associated with the CFA button at B2 is lit.
VERIFY: ...

Figure 1: An Example Test Case

The basic structure of the system is given in Figure 2. English test cases undergo a series of translation steps, some of which are interactively guided by a tester. The completeness and interaction analyzer is the pragmatic component that understands the basic axioms and conventions of telephony. Its task is to flesh out the test description provided by the English sentences. This is challenging because the sentences omit many implicit conditions and actions. In addition, some sentences ("Make B1 busy") require the analyzer to create simple plans. The analyzer produces a formal description of the test, which the back-end translator then renders as executable code. A more complete description of the goals of the system, its architecture and the software testing problem can be found in [Nonnenmann and Eddy 1992].

[Figure 2: KITSS Architecture. English test cases pass through the NL processor to the completeness & interaction analyzer, which interacts with the user, and then to the translator, which emits executable test scripts; all components consult a shared domain model.]

This paper discusses the natural language processor or linguistic component, which must extract at least the surface content of a highly referential, naturally occurring text. The sentences vary in length, ranging from short sentences such as "Station B3 goes onhook" to 50-word sentences containing parentheticals, subordinate clauses, and conjunction. The principal leverage is that the discourse is reasonably well focused: a large, but finite, number of telephonic concepts enter into a finite set of relationships.

Natural Language Processing in KITSS

The KITSS linguistic component uses three types of knowledge to translate English sentences quickly and accurately into a logical form:
1. syntactic: empirical statistics about common constructions
2. semantic: empirical statistics about common concepts
3. referential: expert knowledge about the logical representation of concepts

Figure 3 illustrates the syntactic, semantic, and logical representations computed for one analysis of the sentence "Place a call from station B1 to station B2." We will not say much here about the referential knowledge that finally rewrites the surface semantic representation as temporal logic. KITSS currently uses a hand-coded production system that includes linguistic rules (e.g., active-passive and conjunction), discourse rules (e.g., definite reference), and domain-specific canonicalization rules.

String:    Place a call from station B1 to station B2.

Syntax:    (SP (S (VP (VP (VP (VB "Place")
                           (NP (AT "a") (NN "call")))
                       (PP (IN "from")
                           (NP (NN "station") (NPR "B1"))))
                   (PP (IN "to")
                       (NP (NN "station") (NPR "B2")))))
               (\. "."))

Semantics: (PLACE (:OBJECT (CALL (:NUMBER SING) (:REF A)))
                  (:FROM (STATION (:NUMBER SING) (:NAME "B1")))
                  (:TO (STATION (:NUMBER SING) (:NAME "B2")))
                  (:MOOD DECL) ...)

Logic:     ((OCCURS (PLACES-CALL B1 B2 CALL-812)))

Figure 3: Representations Formed in Processing a Sentence
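For concreteness, the three levels of Figure 3 can also be pictured as nested data. The following minimal Python sketch mirrors the figure's field names; the encoding itself is purely illustrative (the deployed system is written in Lisp, and this layout is ours):

# Illustrative encoding of the Figure 3 representations; field names
# follow the figure, the data layout is our own.
syntax = ("SP",
          ("S",
           ("VP",
            ("VP",
             ("VP", ("VB", "Place"), ("NP", ("AT", "a"), ("NN", "call"))),
             ("PP", ("IN", "from"), ("NP", ("NN", "station"), ("NPR", "B1")))),
            ("PP", ("IN", "to"), ("NP", ("NN", "station"), ("NPR", "B2"))))),
          (".", "."))

semantics = {
    "head":    "PLACE",
    ":OBJECT": {"head": "CALL",    ":NUMBER": "SING", ":REF": "A"},
    ":FROM":   {"head": "STATION", ":NUMBER": "SING", ":NAME": "B1"},
    ":TO":     {"head": "STATION", ":NUMBER": "SING", ":NAME": "B2"},
    ":MOOD":   "DECL",
}

logic = ("OCCURS", ("PLACES-CALL", "B1", "B2", "CALL-812"))

print(semantics[":OBJECT"]["head"])   # CALL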
An undirected parser would generate many alternative (incorrect) hypotheses regarding the structure and interpretation of the sentence in Figure 3. It might try attaching the prepositional phrases to the noun phrase "a call," or treating "to station B2" as an infinitive phrase. In designing a parsing technique for the KITSS system, we wanted to exploit the statistical regularities that make one interpretation far likelier than others. In line with our earlier success on statistical error-correction for optical character recognizers (OCR devices) [Jones et al 1991], we sought ways to "bootstrap" the acquisition of statistical domain knowledge, in this case knowledge about the likelihood of syntactic and semantic substructures in the test case sentences.

Note that initially we may have only a corpus of raw sentences, not a corpus of their target syntactic and semantic structures. While it is impractical to hand-analyze a large portion of the corpus, it is possible to do a relatively small number of sentences by hand, to get started, and then to use the parser itself as a tool to suggest analyses (or partial analyses) for further sentences. A similar approach to training is found in [Simmons 1991]. We will assume below that we have access to a training set of syntactic and semantic structures.[1]

[1] In KITSS, only the syntactic bracketing is ever fully manual. The system automatically constructs a semantics for each training example from its syntax, using a set of translation rules. Most of these rules are inferred from a default theory of syntactic-semantic type correspondences.
Issues in Probabilistic Parsing

To generalize from the training corpus to new sentences, we will need to induce a good statistical model of the language. But the statistical distributions in a natural language reflect a great many factors, some of them at odds with each other in unexpected ways. Chomsky's famous sentence, "Colorless green ideas sleep furiously," is syntactically quite reasonable (but semantic nonsense) and, for historical reasons, quite common in conference papers. Or consider the classic illustration of attachment ambiguity: "I saw a man in the park with a telescope." One interpretation of this sentence holds that I used a telescope to see a man. To judge the relative likelihood, we may want to know how often telescopes are used for "seeing" (vs. "sawing"); how often a verb takes two prepositional phrases; who is most likely to have a telescope (me, the man, or the park); and so on. Thus many features of a sentence may be significant. Within a restricted domain such as KITSS, the distributions are further shaped by the domain subject matter and by stylistic conventions. Sentences such as "Station B3 goes onhook" may be rare in the newspaper but common in the KITSS application. For stations to "go" is a test script idiom.

We want our statistics to capture more than the syntactic correlations. Our strategy is to build up rich interpretations of the sentences as we are parsing them. We take care to interpret every subtree that we generate. Thus, when we are deciding whether to combine two subtrees later on, we will know what the subtrees "mean." Furthermore, we will have semantic readings for other, possibly relevant portions of the sentence. The semantic information helps to expose deep similarities and deep differences among sentences. Two trees that are semantically similar are likely to combine in similar ways. With semantic interpretations, we directly represent the fact that in one hypothesis the telescope is used for seeing. This fact is obscured in the corresponding syntactic tree, and even more so in the original sentence, where "saw" and "telescope" appear far apart.
Formally, let Ω be a given space of possible interpretations. We model a phrase structure rule, L_k → R_1 R_2 ... R_m, as an m-ary function taking values in Ω. R_1 ... R_m give type restrictions on the m arguments. The function describes how to build an interpretation of type L from m contiguous substring interpretations of types R_1 ... R_m. The rule number k distinguishes rules that have the same domain and range, but differ functionally (e.g., in the case of a noun with two meanings); the chart maintains distinct hypotheses for the alternative interpretations.
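As a minimal sketch of this formulation, a rule L_k → R_1 ... R_m can be modeled as a typed function from child interpretations into Ω. The Python encoding below is ours, not the paper's; in particular, the slot filling performed by build_vp is only an illustration of how the semantic half might be composed:

from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Interpretation:
    category: str    # syntactic type (an L or R_i)
    tree: tuple      # syntactic half: the parse tree
    frame: dict      # semantic half: head, slots, and fillers

@dataclass(frozen=True)
class Rule:
    lhs: str                  # L
    rhs: Tuple[str, ...]      # R_1 ... R_m, the type restrictions
    k: int                    # distinguishes rules with the same domain/range
    build: Callable[..., Interpretation]   # the m-ary function into Omega

def build_vp(v: Interpretation, np: Interpretation) -> Interpretation:
    frame = dict(v.frame)
    frame[":OBJECT"] = np.frame            # illustrative slot filling
    return Interpretation("VP", ("VP", v.tree, np.tree), frame)

VP_RULE = Rule("VP", ("V", "NP"), k=1, build=build_vp)

v = Interpretation("V", ("V", "Place"), {"head": "v-PLACE"})
np = Interpretation("NP", ("NP", ("Det", "a"), ("N", "call")), {"head": "n-CALL"})
print(VP_RULE.build(v, np).frame)   # v-PLACE frame with an :OBJECT filler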
In practice, our Ω consists of joint syntactic-semantic interpretations. The syntactic half of an interpretation is simply the parse tree. The semantic half is built compositionally from lexically-derived heads, slots and fillers in a standard frame language, as illustrated in Figure 3.

Experiments confirm the value of this approach for statistical parsing. When we run our parser with semantics turned off, its syntactic accuracy rate drops from 99% to 66%, and it runs far more slowly.
The KITSS Algorithm

The KITSS parsing algorithm (given as Algorithm 1 in Appendix A) is a variant of tabular or chart parsing methods for context-free languages [Cocke and Schwartz 1970, Earley 1970, Graham et al 1980]. It scans the sentence from left to right, assembling possible partial interpretations of the sentence; but it continually discards interpretations that are statistically unlikely.

The grammar rules and statistics are generated automatically by training on a bracketed corpus. The grammar is taken to be the smallest set of symbols and rules needed to write down all the parse trees in the corpus. The statistics are context-sensitive; they concern the frequencies with which the interpreted subtrees co-occur. Incremental training is permitted. The model is that the system considers a new sample sentence, updates its database, and throws the sentence away.

A grammar is given by G = (V, Σ, P, S), where V is the vocabulary of all symbols, Σ is the set of terminal symbols, P is the set of rules, and S is the start symbol. The start symbol is restricted to be non-recursive. A distinguished start symbol (e.g., ROOT) can be added to the grammar if necessary. For an input sentence w = a_1 a_2 ... a_{|w|} (a_i ∈ Σ), let w_{i,j} denote the substring a_{i+1} ... a_j. For example, w_{0,3} denotes the first three words.
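The following small Python sketch shows this setup for the grammar fragment of Figure 4; the concrete representation choices (tuples and sets) are ours:

# G = (V, Sigma, P, S) for the Figure 4 fragment.
P = {
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
    ("V",  ("Place",)),
    ("Det", ("a",)),
    ("N",  ("call",)),
}
Sigma = {"Place", "a", "call"}          # terminal symbols
V = {lhs for lhs, _ in P} | Sigma       # vocabulary of all symbols
S = "VP"                                # start symbol (non-recursive here)

def substring(w, i, j):
    """w_{i,j} = a_{i+1} ... a_j (a 0-based Python slice)."""
    return w[i:j]

w = ["Place", "a", "call"]
assert substring(w, 0, 3) == ["Place", "a", "call"]   # the first three words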
The algorithm operates bottom-up from left to right in the input string. At each point j in the input string, the algorithm constructs hypotheses about the immediately preceding substrings w_{i,j}. A complete hypothesis is a parse tree for some substring of the sentence; we write it as [L r_1 r_2 ... r_m], where L → R_1 R_2 ... R_m is a rule in P and each subtree r_i is itself a complete hypothesis with root R_i. An incomplete hypothesis is similar, except it is missing one or more of its rightmost branches. We write it as [L r_1 r_2 ... r_q +], 0 ≤ q < m. We use the notation h_{i,j} to refer to a hypothesis that dominates the string w_{i,j}. If a hypothesis h_{i,j} is judged to be likely, it is added to a set t_{i,j} in a (|w|+1) × (|w|+1) chart t.

"Empty" hypotheses, which are created directly from the grammar, have the form [L +]. "Input" hypotheses just assert the existence of a_i and are complete; these are usually assigned probability 1, but normalized sets of input hypotheses could be used in noisy recognition environments such as speech or OCR. Longer hypotheses are created by the ⊗ operator, which attaches a new child to a tree. The ⊗ product of two hypotheses is the smallest set respecting the condition that whenever h_{i,j} = [L r_1 ... r_q +], h_{j,k} = r_{q+1}, and (L → R_1 ... R_q R_{q+1} ... R_m) ∈ P, then

  [L r_1 ... r_q r_{q+1} +] ∈ (h_{i,j} ⊗ h_{j,k})   if q + 1 < m
  [L r_1 ... r_m] ∈ (h_{i,j} ⊗ h_{j,k})             if q + 1 = m

Note that ⊗ returns a set of 0, 1, or 2 hypotheses. The first argument of ⊗ is ordinarily an incomplete hypothesis, while the second is a complete hypothesis immediately to its right. Otherwise ⊗ returns the empty set. The ⊗ operator can easily be extended to act on sets and charts:

  Q ⊗ R := ∪ {h ⊗ h' | h ∈ Q, h' ∈ R}
  t ⊗ R := (∪_{i,j} t_{i,j}) ⊗ R

The algorithm returns the set of complete hypotheses in t_{0,|w|} whose roots are S, the start symbol. Each of these parses has an associated probability.
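To make the ⊗ product concrete, here is a hedged Python sketch using the rule set of Figure 4 (string spans are omitted for brevity; the full algorithm tracks them in the chart). The tuple encoding is ours; because such a hypothesis does not commit to one particular rule, attaching a child can close one rule while extending a longer one, which is one way a product of two hypotheses arises:

# A hypothesis [L r_1 ... r_q +] is (L, children, complete); an input
# hypothesis for a word a is (a, (), True).
P = {("VP", ("V", "NP")), ("NP", ("Det", "N")),
     ("V", ("Place",)), ("Det", ("a",)), ("N", ("call",))}

def root(h):
    return h[0]

def product(h1, h2):
    """h1 (x) h2: attach the complete hypothesis h2 as the next child of
    the incomplete hypothesis h1; returns a set of 0, 1, or 2 hypotheses."""
    L, kids, complete1 = h1
    if complete1 or not h2[2]:
        return set()                  # otherwise (x) returns the empty set
    prefix = tuple(root(c) for c in kids) + (root(h2),)
    out = set()
    for lhs, rhs in P:
        if lhs == L and rhs[:len(prefix)] == prefix:
            out.add((L, kids + (h2,), len(rhs) == len(prefix)))
    return out

# The "empty" hypothesis [VP +] absorbs a complete V hypothesis:
v = ("V", (("Place", (), True),), True)
print(product(("VP", (), False), v))  # {[VP v +]}, still incomplete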
V = {VP, V, "Place", NP, Det, "a", N, "call", ...}
Σ = {"Place", "a", "call", ...}
P = {VP → V NP, NP → Det N, V → "Place", Det → "a", N → "call", ...}
S = VP

          VP
         /  \
        V    NP
        |   /  \
        |  Det  N
        |   |   |
   "Place" "a" "call"

Figure 4: A Parse Tree for w = "Place" "a" "call"

During the parsing process, a left-context probability or LCP, Pr(h_{i,j} | w_{0,j}), is used to prune sets of competing hypotheses. Pruning severity depends on the beam width, θ ≤ 1. A beam width of 10^{-2} keeps only those alternative hypotheses that are judged at least 1% as likely as the leading contender in the set. The correct parse can survive only if all of its constituent hypotheses meet this criterion; thus θ operationally determines the set of garden path sentences for the parser. If any correct hypothesis is pruned, then the correct parse will not be found (indeed, perhaps no parse will be found). This can happen in garden path sentences. It may also happen if the statistical database provides an inadequate or incomplete model of the language.
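The beam test itself is a one-line comparison against the best competitor. A compact sketch, with invented scores:

# Keep a hypothesis only if its LCP is at least theta times the maximum
# LCP in its competing set. Scores below are invented for illustration.
def prune(scored, theta=1e-2):
    if not scored:
        return {}
    threshold = theta * max(scored.values())
    return {h: p for h, p in scored.items() if p >= threshold}

kept = prune({"np": 0.60, "pp-attach": 0.30, "inf-phrase": 0.004})
print(sorted(kept))   # the 0.004 hypothesis is dropped (under 1% of 0.60)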
Probability Calculations

The parsing algorithm keeps or discards a hypothesis according to the left-context probability Pr(h_{i,j} | w_{0,j}). The more accurate this value, the better we will do at pruning the search space. How can we compute it without assuming context-freeness? We are able to decompose the probability into a product of corpus statistics (which we look up in a fixed hash table) and the LCPs of other hypotheses (which we computed earlier in the parse). Space prevents us from giving the formal derivation. Instead we will work through part of an example.

Figure 4 gives a small grammar fragment, with a possible parse tree for a short sentence. For convenience, we will name various trees and subtrees as follows:

  place = "Place"_{0,1}
  a     = "a"_{1,2}
  call  = "call"_{2,3}
  v     = [V "Place"]_{0,1}
  det   = [Det "a"]_{1,2}
  n     = [N "call"]_{2,3}
  np_1  = [NP [Det "a"] +]_{1,2}
  np    = [NP [Det "a"] [N "call"]]_{1,3}
  vp_1  = [VP [V "Place"] +]_{0,1}
  vp    = [VP [V "Place"] [NP [Det "a"] [N "call"]]]_{0,3}

These trees correspond to the hypotheses of the previous section. Note carefully that, for example, the tree vp ∈ vp_1 ⊗ np is correct if and only if the trees vp_1 and np are also correct. We will use this fact soon.
Left-Context Probabilities

We begin with some remarks about Pr(np | w_{0,3}), the LCP that np is the correct interpretation of w_{1,3}. This probability depends on the first word of the sentence, w_{0,1}, and in particular on the interpretation of w_{0,1}. (For example: if the statistics suggest that "Place" is a noun rather than a verb, the np hypothesis may be unlikely.) The correct computation is

  Pr(np | w_{0,3}) = Pr(vp_1 & np | w_{0,3}) + Pr(X & np | w_{0,3}) + Pr(Y & np | w_{0,3}) + ...    (1)

where vp_1, X, Y, ... are a set of (mutually exclusive) possible explanations for "Place." The summands in equation 1 are typical terms in our derivation. They are LCPs for chains of one or more contiguous hypotheses.

Now let us skip ahead to the end of the sentence, when the parser has finished building the complete tree vp. We decompose this tree's LCP as follows:

  Pr(vp | w_{0,3}) = Pr(vp & vp_1 & np | w_{0,3})    (2)
                   = Pr(vp | vp_1 & np & w_{0,3}) · Pr(vp_1 & np | w_{0,3})

The first factor is the likelihood that vp_1 and np, if they are in fact correct, will combine to make the bigger tree vp ∈ vp_1 ⊗ np. We approximate it empirically, as discussed in the next section.

As for the second factor, the parser has already found it! It appeared as one of the summands in (1), which the parser used to find the LCP of np. It decomposes as

  Pr(vp_1 & np | w_{0,3}) = Pr(vp_1 & np & np_1 & n | w_{0,3})    (3)
                          = Pr(np | vp_1 & np_1 & n & w_{0,3}) · Pr(vp_1 & np_1 & n | w_{0,3})

The situation is exactly as before. We estimate the first factor empirically, and we have already found the second as

  Pr(vp_1 & np_1 & n | w_{0,3}) = Pr(vp_1 & np_1 & n & call | w_{0,3})    (4)
                                = Pr(n | vp_1 & np_1 & call & w_{0,3}) · Pr(vp_1 & np_1 & call | w_{0,3})
At this point the recursion bottoms out, since call cannot be decomposed further. To find the second factor we invoke Bayes' theorem:

  Pr(vp_1 & np_1 & call | w_{0,3})    (5)
    = Pr(vp_1 & np_1 & call | w_{0,2} & call)
    = Pr(vp_1 & np_1 | w_{0,2} & call)
    = Pr(call | vp_1 & np_1 & w_{0,2}) · Pr(vp_1 & np_1 | w_{0,2})
      / Σ_X [ Pr(call | X & w_{0,2}) · Pr(X | w_{0,2}) ]

The sum in the denominator is over all chains X, including vp_1 & np_1, that compete with each other to explain the input w_{0,2} = "Place a". Note that for each X, the LCP of X & call will have the same denominator.

In both the numerator and the denominator, the first factor is again estimated from corpus statistics. And again, the second factor has already been computed. For example, Pr(vp_1 & np_1 | w_{0,2}) is a summand in the LCP for np_1. Note that thanks to the Bayesian procedure, this is indeed a left-context probability: it does not depend on the word "call," which falls to the right of np_1.
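To see the arithmetic of the Bayes step in equation (5), here is a toy computation; every number below is invented for illustration and none comes from the KITSS corpus:

# Chains competing to explain w_{0,2} = "Place a", with their LCPs:
lcp = {"vp1 & np1": 0.70, "X": 0.25, "Y": 0.05}     # Pr(chain | w_{0,2})
# Invented corpus estimates of Pr("call" | chain & w_{0,2}):
p_call = {"vp1 & np1": 0.30, "X": 0.02, "Y": 0.01}

numerator = p_call["vp1 & np1"] * lcp["vp1 & np1"]
denominator = sum(p_call[x] * lcp[x] for x in lcp)
print(numerator / denominator)   # Pr(vp1 & np1 | w_{0,2} & call) ~ 0.974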
Corpus Statistics

The recursive LCP computation does nothing but multiply together some empirical numbers. Where do these numbers come from? How does one estimate a value like Pr(vp | vp_1 & np & w_{0,3})?

The condition w_{0,3} is redundant, since the words also appear as the leaves of the chain vp_1 & np. So the expression simplifies to Pr(vp | vp_1 & np). This is the probability that, if vp_1 and np are correct in an arbitrary sentence, vp is also correct. (Consistent alternatives to vp = [VP v np] might include [VP v np pp] and [VP v [NP np pp]], where pp is some prepositional phrase.) In theory, one could find this value directly from the bracketed corpus:

(a) In the 3 bracketed training sentences (say) where
    • A = [VP [V "Place"] +]_{0,1} appears
    • B = [NP [Det "a"] [N "call"]]_{1,3} appears
    in what fraction does [VP [V "Place"] [NP [Det "a"] [N "call"]]]_{0,3} appear?

However, such a question is too specific to be practical: 3 sentences is uncomfortably close to 0. To ensure that our samples are large enough, we broaden our question. We might ask instead:

(b) Among the 250 training sentences with
    • A = [VP [V ...] +]_{i,j}
    • B = [NP ...]_{j,k} (some i < j < k)
    in what fraction is B the second child of A?

Alternatively, we might take a more semantic approach and ask

(c) Among the 20 training sentences with
    • A = a subtree from i to j
    • B = a subtree from j to k
    • A has semantic interpretation A' = (v-PLACE ...)
    • B has semantic interpretation B' = (n-CALL ...)
    in what fraction does B' fill the OBJECT role of A'?

Questions (b) and (c) both consider more sentences than (a). (b) considers a wide selection of sentences like "The operator activates CFA ...". (c) confines itself to sentences like "The operator places two priority calls ...". As a practical matter, an estimate of Pr(vp | vp_1 & np) should probably consider some syntactic and some semantic information about vp_1 and np. Our current approach is essentially to combine questions (b) and (c). That is, we focus our attention on corpus sentences that satisfy the conditions of both questions simultaneously.
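One way to realize this combination is to tally, for each adjacent pair of subtrees in the bracketed corpus, a key that joins the syntactic conditioning of question (b) with the semantic conditioning of question (c). The event encoding below is our own sketch, not the system's actual tables:

from collections import Counter

joins, attaches = Counter(), Counter()

def observe(parent_cat, child_cat, parent_head, child_head, child_attached):
    """Tally one adjacent (A, B) subtree pair from a bracketed sentence."""
    key = (parent_cat, child_cat, parent_head, child_head)   # (b) + (c)
    joins[key] += 1
    if child_attached:               # B actually became the next child of A
        attaches[key] += 1

def estimate(parent_cat, child_cat, parent_head, child_head):
    key = (parent_cat, child_cat, parent_head, child_head)
    return attaches[key] / joins[key] if joins[key] else None

observe("VP", "NP", "v-PLACE", "n-CALL", True)    # invented training event
print(estimate("VP", "NP", "v-PLACE", "n-CALL"))  # 1.0 on this tiny sample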
Limiting Chain Length

The previous sections refer to many long chains of hypotheses:

  Pr(vp_1 & np_1 & call | w_{0,3})
  Pr(n | vp_1 & np_1 & w_{0,3})

In point of fact, every chain we have built extends back to the start of the sentence. But this is unacceptable: it means that parsing time and space grow exponentially with the length of the sentence.

Research in probabilistic parsing often avoids this problem by assuming a stochastic context-free language (SCFG). In this case,

  Pr(vp_1 & np_1 & call | w_{0,3}) = Pr(vp_1 | w_{0,1}) · Pr(np_1 | w_{1,2}) · Pr(call | w_{2,3})

and Pr(n | vp_1 & np_1 & w_{0,3}) = Pr(n | w_{2,3}). This assumption would greatly condense our computation and our statistical database. Unfortunately, for natural languages SCFGs are a poor model. Following the derivation S ⇒* John solved the N, the probability of applying rule {N → fish} is approximately zero, though it may be quite likely in other contexts.

But one may limit the length of chains less drastically, by making weaker independence assumptions. With appropriate modifications to the formulas, it is possible to arrange that chain length never exceeds some constant L ≥ 1. Setting L = 1 yields the context-free case.

Our parser refines this idea one step further: within the bound L, chain length varies dynamically. For instance, suppose L = 3. In the parse tree [VP [V "Place"] [NP [Det "a"] [Adj "priority"] [N "call"]]], we do compute an LCP for the entire chain vp_1 & np_2 & n, but the chain vp_1 & np_1 & adj is considered "too long." Why? The parser sees that adj will be buried two levels deep in both the syntactic and semantic trees for vp. It concludes that Pr(adj | vp_1 & np_1) ≈ Pr(adj | np_1), and uses this heuristic assumption to save work in several places.

In general we may speak of high-focus and low-focus parsing. The focus achieved during a parse is defined as the ratio of useful work to total work. Thus a high-focus parser is one that prunes aggressively: it does not allow incorrect hypotheses to proliferate. A completely deterministic parser would have 100% focus. Non-probabilistic parsers typically have low focus when applied to natural languages.

Our parsing algorithm allows us to choose between high- and low-focus strategies. To achieve high focus, we need a high value of θ (aggressive pruning) and accurate probabilities. This will prevent a combinatorial explosion. To the extent that accurate probabilities require long chains and complicated formulas, the useful work of the parser will take more time. On the other hand, a greater proportion of the work will be useful. In practice, we find that even L = 2 yields good focus in the KITSS domain. Although the L = 2 probabilities undoubtedly sacrifice some degree of accuracy, they still allow us to prune most incorrect hypotheses. Related issues of modularity and local nondeterminism are also discussed in the psycholinguistic literature (e.g., [Tanenhaus et al 1985]).
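Operationally, the bound can be pictured as truncating each left-context chain to its most recent links before any statistics are looked up; this minimal sketch fixes L and omits the dynamic-depth refinement described above:

def truncate_chain(chain, L=2):
    """Condition on at most the last L hypotheses of a left-context chain."""
    return chain[-L:]

print(truncate_chain(["vp1", "np1"], L=1))   # ['np1']: Pr(adj | vp1 & np1) ~ Pr(adj | np1)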
Incomplete Knowledge

Natural language systems tend to be plagued by the potential open-endedness of the knowledge required for understanding. Incompleteness is a problem even for fully hand-coded systems. The correlate for statistical schemes is the problem of undertraining. In our task, for example, we do not have a large sample of syntactic and semantic bracketings. To compensate, the parser utilizes hand-coded knowledge as well as statistical knowledge. The hand-coded knowledge expresses general knowledge about linguistic subtheories such as part of speech rules or coordination.

As an example, consider the problem of assigning parts of speech to novel words. Several sources of knowledge may suggest the correct part of speech class, e.g., the left-context of the novel word, the relative openness or closedness of the classes, morphological evidence from affixation, and orthographic conventions such as capitalization. The parser combines the various forms of evidence (using a simplified representation of the probability density functions) to assign a priori probabilities to novel part of speech rules. Using this technique, the parser takes a novel sentence such as "XX YY goes ZZ," and derives syntactic and semantic representations analogous to "Station B1 goes offhook."

The current system invokes these ad hoc knowledge sources when it fails to find suitable alternatives using its empirical knowledge. Clearly, there could be a more continuous strategy. Eventually, we would like to see very general linguistic principles (e.g., metarules and X-bar theory [Jackendoff 1977]) and powerful informants such as world knowledge provide a priori biases that would allow the parser to guess genuinely novel structures. Ultimately, this knowledge may serve as an inductive bias for a completely bootstrapping, learning version of the system.
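A hedged sketch of this kind of evidence combination, assuming a naive-Bayes-style product of per-source likelihoods (the paper does not spell out its combination rule, and every table below is invented):

def pos_prior(likelihoods, classes=("NN", "VB", "NPR")):
    """Multiply per-source likelihoods Pr(evidence | class), renormalize."""
    score = {c: 1.0 for c in classes}
    for table in likelihoods.values():
        for c in classes:
            score[c] *= table.get(c, 1e-6)
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}

evidence = {
    "capitalized": {"NPR": 0.80, "NN": 0.10, "VB": 0.05},  # orthography
    "no-affix":    {"NPR": 0.50, "NN": 0.30, "VB": 0.20},  # morphology
    "left-ctx":    {"NPR": 0.60, "NN": 0.30, "VB": 0.10},  # preceding tags
}
print(pos_prior(evidence))   # a priori probabilities for the novel word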
Results and Summary

The KITSS system is implemented in Franz Common Lisp running on Sun workstations. The system as a whole has now produced code for more than 70 complete test cases containing hundreds of sentences. The parsing component was trained and evaluated on a bracketed corpus of 429 sentences from 40 test cases. The bracketings use part of speech tags from the Brown Corpus [Francis and Kucera 1982] and traditional phrase structure labels (S, NP, etc.). Because adjunction rules such as VP → VP PP are common, the trees tend to be deep (9 levels on average). The bracketed corpus contains 308 distinct lexical items which participate in 355 part of speech rules. There are 55 distinct nonterminal labels, including 35 parts of speech. There are a total of 13,262 constituent decisions represented in the corpus.

We have studied the accuracy of the parser in using its statistics to reparse the training sentences correctly. Generally, we run the experiments using an adaptive beam schedule; progressively wider beam widths are tried (e.g., 10^{-1}, 10^{-2}, and 10^{-3}) until a parse is obtained. (By comparison, approximately 85% more hypotheses are generated if the system is run with a fixed beam of 10^{-3}.) For 62% of the sentences some parse was first found at θ = 10^{-1}, for 32% at 10^{-2}, and for 6% at 10^{-3}. For only one sentence was no parse found within θ = 10^{-3}. For the other 428 sentences, the parser always recovered the correct parse, and correctly identified it in 423 cases. In the 428 top-ranked parse trees, 99.92% of the bracketing decisions were correct (less than one error in 1000). Significantly, the parser produced only 1.02 parses per sentence. Of the hypotheses that survived pruning and were added to the chart, 68% actually appeared as parts of the final parse.
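The adaptive schedule itself is a simple retry loop; parse() below is a stand-in for Algorithm 1:

def parse_adaptive(sentence, parse, schedule=(1e-1, 1e-2, 1e-3)):
    """Retry with progressively wider beams until some parse is found."""
    for theta in schedule:
        parses = parse(sentence, theta)
        if parses:
            return parses, theta
    return [], None

def toy_parse(sentence, theta):
    return ["parse"] if theta <= 1e-2 else []   # hypothetical stand-in

print(parse_adaptive("Station B2 goes offhook .", toy_parse))   # found at 1e-2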
We have also done a limited number of randomized split-corpus studies to evaluate the parser's generalization ability. After several hundred sentences in the telephony domain, however, the vocabulary continues to grow and new structural rules are also occasionally needed. We have conducted a small-scale test that ensured that the test sentences were novel but grammatical under the known rules. With 211 sentences for training and 147 for testing, parses were found for 77% of the test sentences; of these, the top-ranked parse was correct 90% of the time, and 99.3% of the bracketing decisions were correct. Using 256 sentences for training and 102 sentences for testing, the parser performed perfectly on the test set.

In conclusion, we believe that the KITSS application demonstrates that it is possible to create a robust natural language processing system that utilizes both distributional knowledge and general linguistic knowledge.

Acknowledgements

Other major contributors to the KITSS system include Van Kelly, Uwe Nonnenmann, Bob Hall, John Eddy, and Lori Alperin Resnick.
Appendix A: The KITSS Parsing Algorithm

Algorithm 1. PARSE(w):
  (* create a (|w|+1) × (|w|+1) chart t = (t_{i,j}) *)
  t_{0,0} := {[S +]};
  for j := 1 to |w| do
    R_1 := t_{j-1,j} := {a_j};
    R_2 := ∅;
    while R_1 ≠ ∅ do
      R := PRUNE((t ⊗ R_1) ∪ (PREDICT(R_1) ⊗ R_1) ∪ R_2);
      R_1 := R_2 := ∅;
      for h_{i,j} ∈ R do
        t_{i,j} := t_{i,j} ∪ {h_{i,j}};
        if h_{i,j} is complete then R_1 := R_1 ∪ {h_{i,j}}
        else R_2 := R_2 ∪ {h_{i,j}}
      endfor
    endwhile
  endfor;
  return {all complete S-hypotheses in t_{0,|w|}}

Subroutine PREDICT(R):
  return {[A +] | A → B_1 ... B_m is in P, some complete B_1-hypothesis is in R, and A ≠ S}

Subroutine PRUNE(R):
  (* only likely rules are kept *)
  R' := ∅;
  threshold := θ · max_{h_{i,j} ∈ R} Pr(h_{i,j} | w_{0,j});
  for h_{i,j} in R do
    if Pr(h_{i,j} | w_{0,j}) ≥ threshold then R' := R' ∪ {h_{i,j}}
  endfor;
  return R'
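For readers who want to execute the algorithm, here is a hedged Python rendering of Algorithm 1 over the Figure 4 fragment. It is not the original Lisp: score() returns 1.0 everywhere as a stand-in for the left-context probabilities of the paper, and a chart-membership test guards the agenda against reprocessing:

THETA = 1e-2                                   # beam width
P = [("VP", ("V", "NP")), ("NP", ("Det", "N")),
     ("V", ("Place",)), ("Det", ("a",)), ("N", ("call",))]
S = "VP"

def root(h): return h[0]
def complete(h): return h[2]

def product(h1, s1, h2, s2):
    """h1 (x) h2: attach the complete hypothesis h2, spanning s2, as the
    next child of the incomplete, adjacent hypothesis h1, spanning s1."""
    out = set()
    if complete(h1) or not complete(h2) or s1[1] != s2[0]:
        return out
    prefix = tuple(root(c) for c in h1[1]) + (root(h2),)
    for lhs, rhs in P:
        if lhs == root(h1) and rhs[:len(prefix)] == prefix:
            out.add(((root(h1), h1[1] + (h2,), len(rhs) == len(prefix)),
                     (s1[0], s2[1])))
    return out

def predict(r1):
    """Empty hypotheses [A +] (A != S) whose first child just completed."""
    return {((lhs, (), False), (s[0], s[0]))
            for h, s in r1 for lhs, rhs in P
            if rhs[0] == root(h) and lhs != S}

def score(h, s):
    return 1.0                                 # stand-in for the LCP

def prune(cands):
    if not cands:
        return set()
    best = max(score(h, s) for h, s in cands)
    return {(h, s) for h, s in cands if score(h, s) >= THETA * best}

def parse(w):
    chart = {((S, (), False), (0, 0))}         # t_{0,0} := {[S +]}
    for j in range(1, len(w) + 1):
        r1 = {((w[j - 1], (), True), (j - 1, j))}   # input hypothesis a_j
        chart |= r1
        r2 = set()
        while r1:
            cands = set(r2)                    # (t (x) R1) U (PREDICT(R1) (x) R1) U R2
            for h2, s2 in r1:
                for h1, s1 in chart | predict(r1):
                    cands |= product(h1, s1, h2, s2)
            r1, r2 = set(), set()
            for h, s in prune(cands):
                if (h, s) not in chart:
                    chart.add((h, s))
                    (r1 if complete(h) else r2).add((h, s))
    return {(h, s) for h, s in chart
            if root(h) == S and complete(h) and s == (0, len(w))}

parses = parse(["Place", "a", "call"])
print(len(parses))                             # 1: the Figure 4 tree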
  • gr
amming L anguages and Their Com- pilers. New Y
  • rk:
Couran t Institute
  • f
Mathemati- cal Sciences, New Y
  • rk
Univ ersit y . [Earley 1970] Earley , J. 1970. An Ecien t Con text- F ree P arsing Algorithm. Communic ations
  • f
the A CM 13(2): 94-102. [F rancis and Kucera 1982] F rancis, W. and Kucera, H. 1982. F r e quency A nalysis
  • f
English Usage. Boston: Hough ton Miin. [Graham et al 1980] Graham, S.L., Harrison, M.A. and Ruzzo, W.L. 1980. An Impro v ed Con text-F ree Recognizer. A CM T r ansactions
  • n
Pr
  • gr
amming L anguages and Systems 2(3):415-463. [Jac k endo 1977] Jac k endo, R. 1977.
  • X
Syntax: A Study
  • f
Phr ase Structur e. Cam bridge, MA.: MIT Press. [Jones et al 1991] Jones, M.A., Story , G.A., and Bal- lard, B.W. 1991. Using Multiple Kno wledge Sources in a Ba y esian OCR P
  • st-Pro
cessor. In First In- ternational Confer enc e
  • n
Do cument A nalysis and R etrieval, 925-933. St. Malo, F rance: AF CET{ IRISA/INRIA. [Nonnenmann and Eddy 1992] Nonnenmann, U., and Eddy J.K. 1992. KITSS
  • A
F unctional Soft w are T esting System Using a Hybrid Domain Mo del. In Pr
  • c.
  • f
8th IEEE Confer enc e
  • n
A rticial Intel li- genc e Applic ations. Mon terey , CA: IEEE. [Simmo ns 1991] Simmo ns, R. and Y u, Y. 1991. The Acquisition and Application
  • f
Con text Sensitiv e Grammar for English. In Pr
  • c.
  • f
the 29th A nnual Me eting
  • f
the Asso ciation for Computational Lin- guistics, 122-129. Berk eley , California: Asso ciation for Computational Linguistics. [T anenhaus et al 1985] T anenhaus, M.K., Carlson, G.N. and Seiden b erg, M.S. 1985. Do Listeners Com- pute Linguistic Represen tations? In Natur al L an- guage Parsing (eds. D. R. Do wt y , L. Karttunen and A.M. Zwic ky), 359-408. Cam bridge Univ ersit y Press: Cam bridge, England.