
XLE: Grammar Development Platform, Parser/Generator (COLING 2004 Tutorial)



  1. XLE: Grammar Development Platform, Parser/Generator
     Miriam Butt (Universität Konstanz)
     Tracy Holloway King (PARC)
     COLING 2004 Tutorial
     COLING 2004: XLE tutorial

     Tutorial Outline
     • What is a deep grammar and why would you want one?
     • XLE: A First Walkthrough
     • Robustness techniques
     • Generation
     • Disambiguation
     • Applications:
       – Machine Translation
       – Sentence Condensation
       – Computer Assisted Language Learning (CALL)
       – Knowledge Representation

     Applications of Language Engineering
     [Figure: applications plotted by Functionality (Low to High) and Domain Coverage (Narrow to Broad). Labels recoverable from the slide: Post-Search Sifting, Google, AltaVista, AskJeeves, Autonomous Knowledge Filtering, Document Base Management, Good Translation, Restricted Dialogue, Useful Summary, Knowledge Fusion, Manually-tagged Keyword Search, Natural Dialogue, Microsoft Paperclip]

     Deep grammars
     • Provide detailed syntactic/semantic analyses
       – HPSG (LinGO, Matrix), LFG (ParGram)
       – Grammatical functions, tense, number, etc.
           Mary wants to leave.
             subj(want~1,Mary~3)
             comp(want~1,leave~2)
             subj(leave~2,Mary~3)
             tense(leave~2,present)
     • Usually manually constructed
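The analysis of "Mary wants to leave." on the Deep grammars slide can be read as a small set of relation facts. The sketch below, in Python, stores those four facts as tuples and looks one up. The tuple layout and the helper name `values` are assumptions made for illustration and are not part of XLE.

```python
# Facts copied from the slide; the (relation, head, dependent) layout is an assumption.
FACTS = [
    ("subj", "want~1", "Mary~3"),
    ("comp", "want~1", "leave~2"),
    ("subj", "leave~2", "Mary~3"),
    ("tense", "leave~2", "present"),
]


def values(relation, head, facts=FACTS):
    """Return every dependent recorded for the given relation and head."""
    return [dep for rel, h, dep in facts if rel == relation and h == head]


# Mary is the subject of both verbs, which a surface tree alone would not show.
print(values("subj", "leave~2"))  # ['Mary~3']
```

The point of the example is that the embedded verb's subject is stated explicitly, so an application can query it directly.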

  2. Why would you want one?
     • Meaning sensitive applications
       – overkill for many NLP applications
     • Applications which use shallow methods for English may not be able to use them for "free" word order languages
       – can read many functions off of trees in English
         » subj: NP sister to VP
         » obj: first NP sister to V
       – need other information in German, Japanese, etc.

     Deep analysis matters… if you care about the answer
     Example: A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.
     Question: Who flew to Chicago?
     Candidate answers:
       – shallow but wrong:
         » division (closest noun)
         » head (next closest)
         » V.P. Philips (next)
       – deep and right:
         » delegation (furthest away but Subject of flew)

     Why don't people use them?
     • Time consuming and expensive to write
       – shallow parsers can be induced automatically from a training set
     • Brittle
       – shallow parsers produce something for everything
     • Ambiguous
       – shallow parsers rank the outputs
     • Slow
       – shallow parsers are very fast (real time)
     • Other gating items for applications that need deep grammars

     Why should one pay attention now?
     New Generation of Large-Scale Grammars:
     • Robustness:
       – Integrated Chunk Parsers
       – Bad input always results in some (possibly good) output
     • Ambiguity:
       – Integration of stochastic methods
       – Optimality Theory used to rank/pick alternatives
     • Speed: comparable to shallow parsers
     • Accuracy and information content:
       – far beyond the capabilities of shallow parsers.
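The "Who flew to Chicago?" example contrasts a nearest-noun heuristic with a lookup of the verb's subject. A minimal Python sketch of that contrast follows. The candidate list, the tokenization and the single hand-written `subj` fact are assumptions standing in for real tagger and parser output.

```python
# Tokenized example sentence from the slide (punctuation split off by hand).
SENTENCE = (
    "A delegation led by Vice President Philips , head of the "
    "chemical division , flew to Chicago a week after the incident ."
).split()

# Candidate answers named on the slide.
CANDIDATES = ["delegation", "Philips", "head", "division"]

# Hand-written stand-in for the subject relation a deep parser would return.
DEEP_FACTS = [("subj", "flew", "delegation")]


def shallow_answer(tokens, candidates, verb):
    """Pick the candidate noun closest to the left of the verb."""
    verb_pos = tokens.index(verb)
    before = [c for c in candidates if tokens.index(c) < verb_pos]
    return max(before, key=tokens.index)


def deep_answer(facts, verb):
    """Pick the dependent of the verb's subj relation, if one is recorded."""
    for rel, head, dep in facts:
        if rel == "subj" and head == verb:
            return dep
    return None


print(shallow_answer(SENTENCE, CANDIDATES, "flew"))  # division
print(deep_answer(DEEP_FACTS, "flew"))               # delegation
```

The heuristic fails here because three nouns intervene between the subject and its verb; the relation lookup is unaffected by distance.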

  3. XLE at PARC
     • Platform for Developing Large-Scale LFG Grammars
     • LFG (Lexical-Functional Grammar)
       – Invented in the 1980s (Joan Bresnan and Ronald Kaplan)
       – Theoretically stable, with a solid implementation
     • XLE is implemented in C, used with emacs, tcl/tk
     • XLE includes a parser, generator and transfer component.

     Basic LFG
     • Constituent-Structure: tree
     • Functional-Structure: Attribute Value Matrix (universal)

     C-structure for "they appear":
         S
           NP
             PRON  they
           VP
             V  appear

     F-structure for "they appear":
         [ PRED   'appear<SUBJ>'
           TENSE  pres
           SUBJ   [ PRED  'pro'
                    PERS  3
                    NUM   pl ] ]

     Grammar components
     • Configuration: links components
     • Annotated phrase structure rules
     • Lexicon
     • Templates
     • Other possible components
       – Finite State (FST) morphology
       – disambiguation feature file

     Basic configuration file
         TOY ENGLISH CONFIG (1.0)
           ROOTCAT S.
           FILES .
           LEXENTRIES (TOY ENGLISH).
           RULES (TOY ENGLISH).
           TEMPLATES (TOY ENGLISH).
           GOVERNABLERELATIONS SUBJ OBJ OBJ2 OBL COMP XCOMP.
           SEMANTICFUNCTIONS ADJUNCT TOPIC.
           NONDISTRIBUTIVES NUM PERS.
           EPSILON e.
           OPTIMALITYORDER NOGOOD.
         ----
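An f-structure is an attribute-value matrix, so the one for "they appear" can be modelled as a nested dictionary. The Python sketch below does that and follows a path of attributes. The dictionary encoding and the helper `get_path` are illustrative assumptions, not XLE's internal representation.

```python
# F-structure for "they appear" from the Basic LFG slide, as a nested dict.
F_STRUCTURE = {
    "PRED": "appear<SUBJ>",
    "TENSE": "pres",
    "SUBJ": {"PRED": "pro", "PERS": "3", "NUM": "pl"},
}


def get_path(fs, *attrs):
    """Follow a sequence of attributes; return None if any step is missing."""
    for attr in attrs:
        if not isinstance(fs, dict) or attr not in fs:
            return None
        fs = fs[attr]
    return fs


print(get_path(F_STRUCTURE, "SUBJ", "NUM"))  # pl
```

A path such as SUBJ NUM corresponds to what the grammar notation writes as (^ SUBJ NUM).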

  4. Grammar sections
     • Rules, templates, lexicons
     • Each has:
       – version ID
       – component ID
       – XLE version number (1.0)
       – terminated by four dashes ----
     • Example
         STANDARD ENGLISH RULES (1.0)
         ----

     Syntactic rules
     • Annotated phrase structure rules
         Category --> Cat1: Schemata1;
                      Cat2: Schemata2;
                      Cat3: Schemata3.

         S --> NP: (^ SUBJ)=!
                   (! CASE)=NOM;
               VP: ^=!.

     Another sample rule
         "indicate comments"
         VP --> V: ^=!; "head"
                (NP: (^ OBJ)=! "() = optionality"
                     (! CASE)=ACC)
                PP*: ! $ (^ ADJUNCT). "$ = set"
     VP consists of:
       a head verb
       an optional object
       zero or more PP adjuncts

     Lexicon
     • Basic form for lexical entries:
         word Category1 Morphcode1 Schemata1;
              Category2 Morphcode2 Schemata2.

         walk V * (^ PRED)='WALK<(^ SUBJ)>';
              N * (^ PRED) = 'A-WALK' .
         girl N * (^ PRED) = 'A-GIRL'.
         kick V * { (^ PRED)='KICK<(^ SUBJ)(^ OBJ)>'
                   |(^ PRED)='KICK<(^ SUBJ)>'}.
         the  D * (^ DEF)=+.
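To make the annotations on the S rule concrete, the Python sketch below treats f-structures as nested dictionaries and applies the three schemata by unification: (! CASE)=NOM constrains the NP, (^ SUBJ)=! places the NP under SUBJ, and ^=! merges the VP into the mother. The functions `unify` and `apply_s_rule` are hypothetical names. The sketch ignores details such as the unique instantiation of PRED values and is not a description of how XLE solves constraints.

```python
def unify(a, b):
    """Unify two f-structures (nested dicts with atomic string values).

    Returns a new structure, or None when two atomic values clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            if key in out:
                merged = unify(out[key], value)
                if merged is None:
                    return None
                out[key] = merged
            else:
                out[key] = value
        return out
    return a if a == b else None


def apply_s_rule(np_fs, vp_fs):
    """S --> NP: (^ SUBJ)=! (! CASE)=NOM; VP: ^=!."""
    np_fs = unify(np_fs, {"CASE": "NOM"})   # (! CASE)=NOM
    if np_fs is None:
        return None
    return unify({"SUBJ": np_fs}, vp_fs)    # (^ SUBJ)=! and ^=!


they = {"PRED": "pro", "PERS": "3", "NUM": "pl"}
appear = {"PRED": "appear<SUBJ>", "TENSE": "pres"}
print(apply_s_rule(they, appear)["SUBJ"]["CASE"])  # NOM
```

An NP that already carries a conflicting CASE value makes the rule fail, which is how the annotation rules out an accusative subject.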

  5. Templates
     • Express generalizations
       – in the lexicon
       – in the grammar
       – within the template space

     No Template
         girl N * (^ PRED)='GIRL'
                  { (^ NUM)=SG
                    (^ DEF)
                   |(^ NUM)=PL}.

     With Template
         TEMPLATE:
         CN = { (^ NUM)=SG
                (^ DEF)
               |(^ NUM)=PL}.

         girl N * (^ PRED)='GIRL' @CN.
         boy  N * (^ PRED)='BOY' @CN.

     Template example cont.
     • Parameterize template to pass in values
         CN(P) = (^ PRED)='P'
                 { (^ NUM)=SG
                   (^ DEF)
                  |(^ NUM)=PL}.

         girl N * @(CN GIRL).
         boy  N * @(CN BOY).
     • Template can call other templates
         INTRANS(P) = (^ PRED)='P<(^ SUBJ)>'.
         TRANS(P) = (^ PRED)='P<(^ SUBJ)(^ OBJ)>'.
         OPT-TRANS(P) = { @(INTRANS P) | @(TRANS P) }.

     Parsing a string
     • create-parser demo-eng.lfg
     • parse "the girl walks"
     Walkthrough Demo

     Outline: Robustness
     Dealing with brittleness
     • Missing vocabulary
       – you can't list all the proper names in the world
     • Missing constructions
       – there are many constructions theoretical linguistics rarely considers (e.g. dates, company names)
     • Ungrammatical input
       – real world text is not always perfect
       – sometimes it is really horrendous
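Parameterized templates behave much like functions that return schemata, and a template that calls other templates is function composition. The Python sketch below mirrors CN, INTRANS, TRANS and OPT-TRANS under one assumed encoding: each template returns a list of alternatives (the disjuncts), and each alternative is a list of schema strings. This illustrates the idea only and is not XLE's template expansion.

```python
def cn(p):
    """CN(P): a common noun is either singular (with DEF present) or plural."""
    pred = f"(^ PRED)='{p}'"
    return [[pred, "(^ NUM)=SG", "(^ DEF)"], [pred, "(^ NUM)=PL"]]


def intrans(p):
    """INTRANS(P): a single alternative with a subject-only PRED."""
    return [[f"(^ PRED)='{p}<(^ SUBJ)>'"]]


def trans(p):
    """TRANS(P): a single alternative with subject and object."""
    return [[f"(^ PRED)='{p}<(^ SUBJ)(^ OBJ)>'"]]


def opt_trans(p):
    """OPT-TRANS(P) = { @(INTRANS P) | @(TRANS P) }: the union of both."""
    return intrans(p) + trans(p)


for alternative in opt_trans("KICK"):
    print(alternative)
```

Calling `opt_trans("KICK")` yields the same two PRED alternatives that the lexicon slide spells out by hand for kick.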

  6. Dealing with Missing Vocabulary
     • Build vocabulary based on the input of shallow methods
       – fast
       – extensive
       – accurate
     • Finite-state morphologies
         falls -> fall +Noun +Pl
                  fall +Verb +Pres +3sg
     • Build lexical entry on-the-fly from the morphological information

     Building lexical entries
     • Lexical entries
         -unknown N     XLE @(COMMON-NOUN %stem).
         +Noun    N-SFX XLE @(PERS 3).
         +Pl      N-NUM XLE @(NUM pl).
     • Rule
         Noun -> N N-SFX N-NUM.
     • Structure
         [ PRED  'fall'
           NTYPE common
           PERS  3
           NUM   pl ]

     Guessing words
     • Use FST guesser if the morphology doesn't know the word
       – Capitalized words can be proper nouns
           Saakashvili -> Saakashvili +Noun +Proper +Guessed
       – -ed words can be past tense verbs or adjectives
           fumped -> fump +Verb +Past +Guessed
                     fumped +Adj +Deverbal +Guessed

     Using the lexicons
     • Rank the lexical lookup
       1. overt entry in lexicon
       2. entry built from information from morphology
       3. entry built from information from guesser
          » quality will depend on language type
     • Use the most reliable information
     • Fall back only as necessary
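The ranked lookup on the Using the lexicons slide (overt entry, then morphology, then guesser) can be sketched as a simple fall-back chain. In the Python sketch below, the two dictionaries and the guessing rules are toy assumptions that reproduce only the examples on the slides. A real system would call a finite-state morphology and guesser instead.

```python
# Toy stand-ins (assumptions): a hand-written lexicon and a morphology table.
LEXICON = {"girl": ["girl N * (^ PRED) = 'A-GIRL'."]}
MORPHOLOGY = {"falls": ["fall +Noun +Pl", "fall +Verb +Pres +3sg"]}


def guess(word):
    """Guess analyses from surface shape, following the two slide examples."""
    analyses = []
    if word[:1].isupper():
        analyses.append(f"{word} +Noun +Proper +Guessed")
    if word.endswith("ed"):
        analyses.append(f"{word[:-2]} +Verb +Past +Guessed")
        analyses.append(f"{word} +Adj +Deverbal +Guessed")
    return analyses


def lookup(word):
    """Return (source, analyses), using the most reliable source available."""
    if word in LEXICON:
        return "lexicon", LEXICON[word]
    if word in MORPHOLOGY:
        return "morphology", MORPHOLOGY[word]
    return "guesser", guess(word)


for word in ["girl", "falls", "fumped", "Saakashvili"]:
    print(word, lookup(word)[0])
```

Because each step returns as soon as it succeeds, the guesser is consulted only when both the lexicon and the morphology have nothing for the word.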

  7. Missing constructions
     • Even large hand-written grammars are not complete
       – new constructions, especially with new corpora
       – unusual constructions
     • Generally longer sentences fail
     Solution: Fragment and Chunk Parsing
     • Build up as much as you can; stitch together the pieces

     Grammar engineering approach
     • First try to get a complete parse
     • If fail, build up chunks that get complete parses
     • Have a fall-back for things without even chunk parses
     • Link these chunks and fall-backs together in a single structure

     Fragment Chunks: Sample output
     • the the dog appears.
     • Split into:
       – "token" the
       – sentence "the dog appears"
       – ignore the period

     F-structure
     [Figure: f-structure of the fragment parse]
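The fall-back strategy (complete parse first, then chunks, then single tokens) can be illustrated on the sample input "the the dog appears." with the Python sketch below. The set `FULL_PARSES` is a toy stand-in for the grammar, and the greedy left-to-right longest-match search is an assumption chosen for brevity. XLE's own fragment mechanism is not described in the slide text and may work differently.

```python
# Toy stand-in for the grammar (assumption): token spans that parse as a sentence.
FULL_PARSES = {("the", "dog", "appears")}
PUNCTUATION = {".", ",", "!", "?"}


def parses(tokens):
    """Pretend grammar check: True if the span gets a complete parse."""
    return tuple(tokens) in FULL_PARSES


def fragment_parse(tokens):
    """Return a list of (label, tokens) pieces covering the input."""
    tokens = [t for t in tokens if t not in PUNCTUATION]  # ignore the period
    if parses(tokens):
        return [("S", tokens)]                            # complete parse
    fragments, i = [], 0
    while i < len(tokens):
        # Try the longest chunk starting at i that gets a complete parse.
        for j in range(len(tokens), i, -1):
            if parses(tokens[i:j]):
                fragments.append(("S", tokens[i:j]))
                i = j
                break
        else:
            # Fall-back: keep the word as an unanalysed token.
            fragments.append(("TOKEN", [tokens[i]]))
            i += 1
    return fragments


print(fragment_parse("the the dog appears .".split()))
# [('TOKEN', ['the']), ('S', ['the', 'dog', 'appears'])]
```

The output matches the split on the slide: the stray first "the" becomes a token, and the remainder is parsed as a sentence.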
