Appears in: Carl Weir (ed.), Statistical ly -Ba se d - PDF document

Appears in: Carl Weir (ed.), Statistical ly -Ba se d Natural Language Processing Technique s: Papers from the 1992 Workshop, pp. 20-27. Technical Report W-92-01, AAAI Press, Menlo Park, 1992. A Probabilistic P arser and Its Application Mark A. Jones Jason M. Eisner A T&T Bell Lab oratories Emman uel College, Cam bridge 600 Moun tain Av en ue, Rm. 2B-435 Cam bridge CB2 3AP England Murra y Hill, NJ 07974{063 6 jme14@pho enix.cam bri dge.ac. uk jones@researc h.att.com Abstract out of earlier w ork [Jones et al 1991 ] on correcting the output of optical c haracter recognition (OCR) systems. W e describ e a general approac h to the probabilis- W e w ere amazed at ho w m uc h correction w as p ossible tic parsing of con text-free grammars. The metho d using only lo w-lev el statistical kno wledge ab out En- in tegrates con text-sensitiv e statistical kno wledge glish (e.g., the frequency of digrams lik e \pa") and of v arious t yp es (e.g., syn tactic and seman tic) and ab out common OCR mistak es (e.g., rep orting \c" for can b e trained incremen tally from a brac k eted cor- \e"). As man y as 90% of incorrect w ords could b e �xed pus. W e in tro duce a v arian t of the GHR con text- within the telephon y sublanguage domain, and 70{80% free recognition algorithm, and explain ho w to for broader samples of English. Naturally w e w on- adapt it for e�cien t probabilistic parsing. In split- dered whether more sophisticated uses of statistical corpus testing on a real-w orld corpus of sen tences kno wledge could aid in suc h tasks as the one describ ed from soft w are testing do cumen ts, with 20 p ossible ab o v e. The recen t literature also re�ects an increas- parses for a sen tence of a v erage length, the sys- ing in terest in statistical training metho ds for man y tem �nds and iden ti�es the correct parse in 96% NL tasks, including parsing [Jelinek and La�ert y 1991 , of the sen tences for whic h it �nds an y parse, while Magerman and Marcus 1991 , Bobro w 1991 , pro ducing only 1.03 parses p er sen tence for those Magerman and W eir 1992 , Blac k, Jelinek, et al 1992 ], sen tences. Signi�can tly , this success rate w ould b e part of sp eec h tagging [Ch urc h 1988 ], and corp ora only 79% without the seman tic statistics. alignmen t [Dagan et al 1991 , Gale and Ch urc h 1991 ]. Simply stated, w e seek to build a parser that can construct accurate syn tactic and seman tic analyses for In tro duction the sen tences of a giv en language. The parser should In constrained domains, natural language pro cessing kno w little or nothing ab out the target language, sa v e can often pro vide lev erage. A t A T&T, for instance, what it can disco v er statistically from a represen ta- NL tec hnology can p oten tially help automate man y tiv e corpus of analyzed sen tences. When only unan- asp ects of soft w are dev elopmen t. A t ypical example alyzed sen tences are a v ailable, a practical approac h o ccurs in the soft w are testing area. Here 250,000 En- is to parse a small set of sen tences b y hand, to get glish sen tences sp ecify the op erational tests for a tele- started, and then to use the parser itself as a to ol to phone switc hing system. The c hallenge is to to ex- suggest analyses (or partial analyses) for further sen- tract at least the surface con ten t of this highly ref- tences. A similar \b o otstrapping" approac h is found eren tial, naturally o ccurring text, as a �rst step in in [Simmo ns 1990 ]. The precise grammatical theory automating the largely man ual testing pro cess. The w e use to hand-analyze sen tences should not b e cru- sen tences v ary in length and complexit y , ranging from cial, so long as it is applied consisten tly and is not short sen tences suc h as \Station B3 go es onho ok" to 50 unduly large. w ord sen tences con taining paren theticals, sub ordinate clauses, and conjunction. F ortunately the discourse P arsing Algorithms is reasonably w ell fo cused: a large but �nite n um b er of telephonic concepts en ter in to a limited set of logi- F ollo wing [Graham et al 1980 ], w e adopt the follo wing cal relationships. Suc h fo cus is c haracteristic of man y notation. An arbitrary con text-free grammar is giv en sublanguages with practical imp ortance (e.g., medical b y = ( V � ; ), where is the v o cabulary of all G ; P ; S V records). sym b ols, � is the set of terminal sym b ols, P is the W e desire to press forw ard to NL tec hniques that set of rewrite rules, and is the start sym b ol. F or S are robust, that do not need complete grammars in ad- an input sen tence = , let denote the w a a : : : a w 1 2 n i;j v ance, and that can b e trained from existing corp ora of substring a : : : a and w = w denote the pre�x i +1 j i 0 ;i sample sen tences. Our approac h to this problem grew of length i . W e use Greek letters ( �; � ; : : : ) to denote

Appears in: Carl Weir (ed.), Statistical ly -Ba se d - PDF document

Appears in: Carl Weir (ed.), Statistical ly -Ba se d Natural Language Processing Technique s: Papers from the 1992 Workshop, pp. 20-27. Technical Report W-92-01, AAAI Press, Menlo Park, 1992. A Probabilistic P arser

Wallace Weir Fish Rescue Facility Issue Wallace Weir Fish Rescue Facility Issue Wallace Weir

URBAN WATERWAYS RESTORATION STUDY South Platte River Harvard Gulch Weir Gulch South

Weir Ready: Public Education Campaign Rationale We believe that many people still don't know

Kirkham Weir & Sluice Future and Preferred Options June 2016 Claire Barrett-Mold Future of

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

Introduction to Natural Deduction Carl Pollard Department of Linguistics Ohio State University

Statistical graphics with Statistical graphics with ggplot2 ggplot2 Programming for Statistical

Statistical Natural Language Processing Statistical models: learning, inference, estimation,

Natural & Cultural Scottish Natural Heritage Heritage Fund Natural & Cultural

Natural Refrigerants Natural Refrigerants Natural Refrigerants Natural Refrigerants Safe

Women Inventors and their Contributions Presented by Carl Giordano, Esq. Law Office of Carl

HFA is either trademark or registered trademark of Carl Zeiss Meditec, Inc HFA

FY15 Results 30 th November 2015 Overview Carl Leaver FY15 Results Paul Bowtell FY15

Unboxing the Arqus Magnus Berlander Carl Stranne Presenters Magnus Berlander Carl Stranne CTO

AGES OF EQUIPMENT Carl Kirstein Exxaro Resources Ltd carl.kirstein@exxaro.com What Ill Cover

Carl D. Perkins Carl D. Perkins Spring Informational Meeting Spring Informational Meeting for

Proceedi ngs of National Conference on Artificia l Intellig enc e (AAAI-92 ), San

Response Manager Data Management Region 6 Louisiana Flood Response Response Manager EPA

Seventeen And Eighteen The New Woman & Phoebes Place 1 Shall the sisters pray and

Gods Character Science has made remarkable advancements Jules Verne 1865 Popular Science

High-speed cryptography, Cryptographers part 3: more cryptosystems Working

One of Those Constructions that Really Needs a Proper Analysis Doug Arnold & Christopher

T he book of Jeremiah opens with The Last Kings of Judah a divine call that is issued to a

5 B2B Social Media Career Killers and how to overcome them Eddie Smith Chris Baggott Chief

Sambuz

Useful Links

Newsletter

Mail Us

Appears in: Carl Weir (ed.), Statistical ly -Ba se d - PDF document

Appears in: Carl Weir (ed.), Statistical ly -Ba se d Natural Language Processing Technique s: Papers from the 1992 Workshop, pp. 20-27. Technical Report W-92-01, AAAI Press, Menlo Park, 1992. A Probabilistic P arser

Wallace Weir Fish Rescue Facility Issue Wallace Weir Fish Rescue Facility Issue Wallace Weir

URBAN WATERWAYS RESTORATION STUDY South Platte River Harvard Gulch Weir Gulch South

Weir Ready: Public Education Campaign Rationale We believe that many people still don't know

Kirkham Weir &amp; Sluice Future and Preferred Options June 2016 Claire Barrett-Mold Future of

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

Introduction to Natural Deduction Carl Pollard Department of Linguistics Ohio State University

Statistical graphics with Statistical graphics with ggplot2 ggplot2 Programming for Statistical

Statistical Natural Language Processing Statistical models: learning, inference, estimation,

Natural &amp; Cultural Scottish Natural Heritage Heritage Fund Natural &amp; Cultural

Natural Refrigerants Natural Refrigerants Natural Refrigerants Natural Refrigerants Safe

Women Inventors and their Contributions Presented by Carl Giordano, Esq. Law Office of Carl

HFA is either trademark or registered trademark of Carl Zeiss Meditec, Inc HFA

FY15 Results 30 th November 2015 Overview Carl Leaver FY15 Results Paul Bowtell FY15

Unboxing the Arqus Magnus Berlander Carl Stranne Presenters Magnus Berlander Carl Stranne CTO

AGES OF EQUIPMENT Carl Kirstein Exxaro Resources Ltd carl.kirstein@exxaro.com What Ill Cover

Carl D. Perkins Carl D. Perkins Spring Informational Meeting Spring Informational Meeting for

Proceedi ngs of National Conference on Artificia l Intellig enc e (AAAI-92 ), San

Response Manager Data Management Region 6 Louisiana Flood Response Response Manager EPA

Seventeen And Eighteen The New Woman &amp; Phoebes Place 1 Shall the sisters pray and

Gods Character Science has made remarkable advancements Jules Verne 1865 Popular Science

High-speed cryptography, Cryptographers part 3: more cryptosystems Working

One of Those Constructions that Really Needs a Proper Analysis Doug Arnold &amp; Christopher

T he book of Jeremiah opens with The Last Kings of Judah a divine call that is issued to a

5 B2B Social Media Career Killers and how to overcome them Eddie Smith Chris Baggott Chief

Sambuz

Useful Links

Newsletter

Mail Us

Kirkham Weir & Sluice Future and Preferred Options June 2016 Claire Barrett-Mold Future of

Natural & Cultural Scottish Natural Heritage Heritage Fund Natural & Cultural

Seventeen And Eighteen The New Woman & Phoebes Place 1 Shall the sisters pray and

One of Those Constructions that Really Needs a Proper Analysis Doug Arnold & Christopher