 
Hybrid NLP
OUTLINE
• Problems of Deep and Shallow Processing
• Hybrid Architectures
• An Advanced Platform for Hybrid NLP: Deep Thought
• Applications for Hybrid Processing
• Conclusion and Outlook
LTII – SS 2008
DEEP & SHALLOW PROCESSING
• Deep methods for morphological, syntactic and semantic processing exploit our knowledge about the structure of human language,
• as opposed to shallow methods such as pattern-matching grammars and n-gram language models.
• Deep methods are needed for getting at the meaning of language input.
• Shallow methods perform a partial or heavily underspecified analysis that is sufficient for certain applications.
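To make the "shallow" end of this contrast concrete, here is a minimal sketch of a bigram language model, one of the n-gram models named above. The function names and the two training sentences are illustrative assumptions, not part of the original slides. The model only counts adjacent word pairs and builds no structure or meaning representation.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count unigram contexts and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that can act as a context
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

# Hypothetical toy corpus
uni, bi = train_bigrams(["sue gave paul a penny", "sue gave paul a book"])
print(bigram_prob(uni, bi, "a", "penny"))
```

Such a model can rank word sequences, but it cannot say who gave what to whom. That is the kind of information a deep analysis is meant to deliver.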
∃x[(old'(penny'))(x) ∧ Past(give'(sue', paul', x))]
[Figure: phrase-structure trees for the two sentences below; node labels S, S/NP, VP, NP, V, Det, N, A]
Sue gab Paul einen alten Pfennig.
Sue gave Paul an old penny.
APPLICATIONS
• Machine Translation, e.g. Systran, Logos, METAL-Comprendium, IBM PT
• Access to Databases, e.g. Core Language Engine
ONCE UPON A TIME
• Broad industrial research in deep parsing
  - Xerox: LFG
  - Siemens: LFG
  - IBM Germany: HPSG
  - Hewlett Packard: GPSG and HPSG
  - IBM USA: PLNLP and Slot Grammar
• Very large projects
  - EUROTRA
  - LILOG
  - LS-GRAM
GRAMMAR FRAMEWORKS
• Head-Driven Phrase Structure Grammar (HPSG)
• Lexical Functional Grammar (LFG)
• Tree-Adjoining Grammar (TAG)
• Categorial Grammar (CG)
• Dependency Grammar (DG)
• Government and Binding (GB) / Minimalist Program
HPSG
• Head-Driven Phrase Structure Grammar by Pollard and Sag
• Uniform formalism: typed feature structures
• High degree of lexicalization: very few PS-rules, rich lexicon structure
• Ontological structure: multiple-inheritance type hierarchy
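The two technical ingredients named above, typed feature structures and a multiple-inheritance type hierarchy, can be sketched in a few lines. This is a toy illustration under stated assumptions. The type names, the features AGR and TENSE, and the dict-based encoding are hypothetical. Reentrancy (structure sharing) and appropriateness conditions, which real HPSG systems need, are left out.

```python
import copy

# Hypothetical toy hierarchy: each type maps to its immediate supertypes.
PARENTS = {
    "top": [],
    "sign": ["top"], "word": ["sign"], "phrase": ["sign"],
    "verbal": ["top"],
    "verb-word": ["word", "verbal"],   # multiple inheritance
    "agr": ["top"], "sg": ["agr"], "pl": ["agr"],
    "tense": ["top"], "past": ["tense"],
}

def supertypes(t):
    """Return t together with all of its transitive supertypes."""
    seen, stack = {t}, [t]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def unify_types(a, b):
    """Most general common subtype of a and b, or None if there is no unique one."""
    common = [t for t in PARENTS if a in supertypes(t) and b in supertypes(t)]
    best = [t for t in common if all(t in supertypes(u) for u in common)]
    return best[0] if len(best) == 1 else None

def unify(fs1, fs2):
    """Unify two feature structures encoded as nested dicts with a "TYPE" key."""
    t = unify_types(fs1["TYPE"], fs2["TYPE"])
    if t is None:
        return None                      # type clash
    result = {"TYPE": t}
    for feat in (fs1.keys() | fs2.keys()) - {"TYPE"}:
        if feat in fs1 and feat in fs2:
            sub = unify(fs1[feat], fs2[feat])
            if sub is None:
                return None              # clash inside a shared feature
            result[feat] = sub
        else:
            result[feat] = copy.deepcopy(fs1[feat] if feat in fs1 else fs2[feat])
    return result

a = {"TYPE": "word", "AGR": {"TYPE": "sg"}}
b = {"TYPE": "verbal", "AGR": {"TYPE": "agr"}, "TENSE": {"TYPE": "past"}}
print(unify(a, b))
```

Unifying `a` and `b` succeeds because `verb-word` inherits from both `word` and `verbal`. Two structures whose AGR values are `sg` and `pl` fail to unify, since those types have no common subtype in this hierarchy.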
Problems with Deep Analysis
• Coverage (Development Time)
• Robustness (Coping with Out-of-Grammar Input)
• Efficiency (Runtime and Space Efficiency)
• Specificity (Selection among Readings)
Problems with Shallow Analysis
• Accuracy: problems with embeddings, grammatical control, anaphora, and modal as well as negative contexts. Examples:
  - According to SVP Raul Lopez, Slator expected him to be appointed CEO of Crawford Inc. at the upcoming shareholders' meeting.
  - After the retirement of Peter Smith, Mary Hopp was introduced by VP Brown as the new director of the marketing division.
  - After every former US-based vice president except Lisa Ronell served as Chairman of the Board, the shareholders for the first time appointed a non-US Chairperson.
REAL GRAMMARS
• LinGO - English Resource Grammar
  - 8,000 types
  - 100,000 lines of code
  - average feature structure > 300 nodes
• German grammar of equal size
• Japanese and Norwegian grammars are getting close
International Collaboration
• Tokyo: Tsujii Lab at the University of Tokyo
  - Tsujii, Torisawa, Ninomiya, Taura, Yoshida, Mitsuishi, ...
• Stanford: HPSG Group at CSLI
  - Sag, Flickinger, Copestake, Malouf, Carroll (Brighton), ...
• Saarbrücken: LT Lab at DFKI and Dept. of CL
  - Oepen, Callmeier, Krieger, Kiefer, Ciortuz, Müller, ...
THE EVALUATION SETUP
[Figure: evaluation setup; the only surviving label is tsdb]
RESULTS
• All participating systems have benefitted from the systematic comparative evaluation.
• Currently the fastest system is the runtime parser PET by Ulrich Callmeier (Saarbrücken).
• But the other parsers also improved drastically, e.g.:
  - LKB (Stanford, Cambridge)
  - LILFES (Tokyo)
  - PAGE (Saarbrücken)
RESULTS
• HPSG parsing is now 2000 times faster than before.
• Normal-length sentences parse in 0.1 - 1.0 seconds.
• Steady increase in hardware efficiency will also help.
REFERENCES
• D. Flickinger, S. Oepen, H. Uszkoreit, and J. Tsujii (eds.). 2000. Special Issue on Efficient Processing with HPSG: Methods, Systems, Evaluation. Journal of Natural Language Engineering 6(1). Cambridge University Press, Cambridge.
• A. Copestake. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford.
• S. Oepen, D. Flickinger, J. Tsujii, and H. Uszkoreit. 2002. Collaborative Language Engineering. A Case Study in Efficient Grammar-based Processing. CSLI Publications, Stanford.
THE CORE MACHINERY
[Figure: architecture diagram. Recoverable labels: PET (runtime parser); LKB (development platform), shown with each of the English, German and Japanese grammars; Application; tsdb; Open Source; Public Domain]
HOWEVER
• Back to the problems of
  - robustness
  - coverage
  - specificity
ASSUMPTIONS
• Information extraction is not an alternative to deep processing but a continuum between classification and "full" semantic analysis.
• Information Extraction via Text Enrichment
• We can detect topics, names, binary relations, complex relations, answers, etc.
• Question: At what point is deep processing needed?
APPROACH
• Lack of robustness and coverage remains a serious problem for deep processing.
• So we need to find applications where deep processing can improve detection without spoiling the performance.
• Example: relation extraction.
• Let deep processing assist shallow methods.
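One way to read "let deep processing assist shallow methods" for relation extraction is sketched below. A shallow pattern proposes candidate relations, and a deep analysis may veto a candidate but never blocks extraction when it has no analysis. Everything here is a hypothetical illustration: the regular expression, the relation name `appointed_as`, and above all `toy_deep_check`, which merely stands in for a real deep parser using a keyword list and a length limit.

```python
import re
from typing import Callable, Optional

Relation = tuple[str, str, str]  # (person, relation, role)

# Shallow step: one hand-written surface pattern (hypothetical).
PATTERN = re.compile(r"(?P<person>[A-Z]\w+ [A-Z]\w+) was appointed (?P<role>CEO|Chairman)")

def shallow_extract(sentence: str) -> list[Relation]:
    return [(m["person"], "appointed_as", m["role"]) for m in PATTERN.finditer(sentence)]

def toy_deep_check(sentence: str, candidate: Relation) -> Optional[bool]:
    """Stand-in for a deep parser (assumption, not a real analysis).

    Returns None when no analysis is available (simulated here by a length
    limit), False when the relation sits in a non-factual context, else True.
    """
    tokens = sentence.rstrip(".").split()
    if len(tokens) > 12:
        return None  # simulated out-of-grammar input
    non_factual = {"doubt", "doubts", "deny", "denies", "not", "never"}
    return not any(tok.lower() in non_factual for tok in tokens)

def hybrid_extract(sentence: str,
                   deep_check: Callable[[str, Relation], Optional[bool]]) -> list[Relation]:
    """Keep shallow candidates unless the deep analysis explicitly rejects them."""
    return [c for c in shallow_extract(sentence) if deep_check(sentence, c) is not False]

print(hybrid_extract("Analysts doubt that Raul Lopez was appointed CEO.", toy_deep_check))
```

The design choice is that a missing deep analysis never reduces recall. When the deep component returns `None`, the shallow result passes through unchanged, so robustness stays at the level of the shallow method while accuracy can improve wherever a deep analysis exists.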
LT METHODS
[Figure: grid of methods with the axes discrete / non-discrete / hybrid and shallow / deep. Entry shown: HMM-based POS tagger]
LT METHODS
[Figure: grid of methods with the axes discrete / non-discrete / hybrid and shallow / deep. Entry shown: HPSG parser with MRS]
LT METHODS
[Figure: grid of methods with the axes discrete / non-discrete / hybrid and shallow / deep. Entry shown: PCF parser]
LT METHODS
[Figure: grid of methods with the axes discrete / non-discrete / hybrid and shallow / deep. Entry shown: syntactic LFG parser with ME selection]
COMBINATION OF METHODS
[Figure: grid of methods with the axes discrete / non-discrete / hybrid and shallow / deep]