Machine Translation: Classical and Statistical Approaches - PowerPoint PPT Presentation



  1. Machine Translation: Classical and Statistical Approaches
     Session 3: Constraint-based Transfer
     Jonas Kuhn
     Universität des Saarlandes, Saarbrücken
     The University of Texas at Austin
     jonask@coli.uni-sb.de
     DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005

     Session 3: Constraint-based Transfer
     - Lab1: Syntactic Transfer (a Prolog implementation)
     - Background on Lexical-Functional Grammar (LFG)
     - Constraint-based Transfer
       - Kaplan et al. 1989: Translation by Structural Correspondence
     Jonas Kuhn: MT 2

     Lab1: Syntactic Transfer
     - A Prolog implementation
       - Analysis
       - Transfer
       - Generation
     - Each part: processing engine + declarative rules
     - Variants of this lab exercise depending on Prolog background:
       - Use the given Prolog implementation of the processing systems and concentrate on specifying rules
       - Use the given parser and generator, but specify the Prolog predicate for transfer
       - Implement all parts from scratch (not recommended due to time constraints)

     Lab1: Suggested exercises
     1. Specify simple rules required for some of the English ↔ Spanish divergence examples from the Trujillo chapter
     2. Change the lexicon: distinguish a surface form and an underlying citation form
        - The transfer rules can be expressed more generally
        - The language-specific grammars should constrain possible combinations of morphological variants
        - You can try to add certain ideas from interlingua-based translation (language-independent citation forms, e.g., for definite articles)
     3. Add a new language pair (e.g., English ↔ German)

  2. Some additional background on the lab
     - The given parser and generator do not use the built-in DCG rule format, but a slightly different format
       - The parse-tree building argument is automatically added in a compilation step
       - Each rule has an argument for the language it belongs to (this could be exploited for language-independent rules, using "_")
       - Each category is represented as a Prolog list: the first element is the main category label, the remaining elements can be used for linguistic features

         rule(en, [np,Agr] ---> [ [det,Agr], [n1,Agr] ] ).
         rule(en, [n1,Agr] ---> [ [n,Agr] ] ).
         rule(en, [n1,Agr] ---> [ [adj,Agr], [n,Agr] ] ).
         word(en, [det,_]/the ---> [the]).
         word(en, [n, agr(sg)]/soup ---> [soup]).
         word(en, [adj, agr(sg)]/delicious ---> [delicious]).

     Given code for the lab
     - The Prolog predicates for processing are defined in a file engine.pl, separate from the grammar rules
     - The engine.pl file is consulted automatically when the grammar file is loaded
     - This is triggered by the following line at the beginning of the grammar file:

         :- ensure_loaded('engine.pl').

     Given code for the lab
     - When the grammar rules are consulted, they are automatically compiled into a special internal format
     - This is triggered by the following line at the end of the grammar file:

         :- compile_grammars.

     - You can look at the effect of the compilation on tree representations by parsing a string (using the given parse predicates)

     Given code for the lab
     - Parsing predicate:

         parse(Lang,StartSymbol,String,Tree)

     - Example:

         ?- parse(en, [np|_]/_, [the,soup], T).

     - Short parse predicate:

         p(String)

       - The language and start symbol are "guessed"
       - The resulting tree is printed using the pretty_print predicate
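The rule and lexicon format quoted above can be tried out without engine.pl. What follows is a minimal sketch, assuming SWI-Prolog, of a top-down parser that reads rule/2 and word/2 facts written in the slide's notation. The operator priority chosen for --->, the predicate names sketch_parse/4, parse_cat/5 and parse_daughters/5, and the tree terms t(Cat, Daughters) are all assumptions made for illustration. The lab's actual engine compiles the rules into an internal format that the slides do not show, so this is not that implementation.

```prolog
% Hypothetical stand-in for the lab's engine.pl (not shown in the slides).
:- use_module(library(lists)).

% Assumed operator declaration so that the slide's ---> notation can be read.
:- op(700, xfx, --->).

% Grammar and lexicon exactly as quoted on the slide.
rule(en, [np,Agr] ---> [ [det,Agr], [n1,Agr] ]).
rule(en, [n1,Agr] ---> [ [n,Agr] ]).
rule(en, [n1,Agr] ---> [ [adj,Agr], [n,Agr] ]).

word(en, [det,_]/the ---> [the]).
word(en, [n, agr(sg)]/soup ---> [soup]).
word(en, [adj, agr(sg)]/delicious ---> [delicious]).

% sketch_parse(+Lang, ?Cat, +Words, -Tree): parse the whole word list as Cat.
sketch_parse(Lang, Cat, Words, Tree) :-
    parse_cat(Lang, Cat, Words, [], Tree).

% A category is realised either by a lexical entry (leaf Cat/Citation) ...
parse_cat(Lang, Cat, S0, S, Cat/Cite) :-
    word(Lang, Cat/Cite ---> Form),
    append(Form, S, S0).
% ... or by a phrase-structure rule (inner node t(Cat, Daughters)).
parse_cat(Lang, Cat, S0, S, t(Cat, Kids)) :-
    rule(Lang, Cat ---> Daughters),
    parse_daughters(Lang, Daughters, S0, S, Kids).

% Parse the right-hand side categories left to right.
parse_daughters(_, [], S, S, []).
parse_daughters(Lang, [D|Ds], S0, S, [T|Ts]) :-
    parse_cat(Lang, D, S0, S1, T),
    parse_daughters(Lang, Ds, S1, S, Ts).
```

Under these assumptions, a query such as `?- sketch_parse(en, [np|_], [the,soup], T).` mirrors the slide's parse/4 example. Because Agr is one shared variable in each rule, the agr(sg) value from the noun's lexical entry also ends up on the determiner and on the np node. A naive top-down strategy like this one would loop on left-recursive rules, which the sample grammar does not contain.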

  3. Given code for the lab
     - Generation predicate:

         generate(Lang,Tree,String)

     - Example:

         ?- parse(en, [np|_]/_, [the,soup], T),
            generate(en,T,X).

     - Short generation predicate:

         g(Tree)

       - Again, the language is "guessed"
       - The resulting string is printed

     Given code for the lab
     - Transfer predicate:

         translate(SourceLang,TargetLang,SourceString,TargetString)

     - The translation predicate
       - calls the parsing predicate (guessing the start symbol)
       - prints the resulting source tree
       - recursively traverses the source tree, applying transformation rules
       - prints the resulting (underspecified) target tree
       - calls the generation predicate
       - returns the resulting string as TargetString
     - Example:

         ?- translate(en,es,[the,delicious,soup],S).

     - Short translation predicate:

         t(SourceString)

     Transformation rules
     - The rules will be applied top-down in the order they are specified
     - A cut ("!") is used to exclude backtracking
     - The more specific rules should go to the top!

     Lexical-Functional Grammar
     - Grammar formalism originally proposed by Joan Bresnan and Ron Kaplan in the early 1980s
     - Non-derivational
     - Lexicalist
     - Parallel levels of representation
       - C(onstituent)-structure [Phrase structure tree repr.]
       - F(unctional)-structure [Complex feature structure repr.]
       - (Lexicosemantic) A(rgument)-structure
       - Semantic Structure
     - Grammar specification as constraint schemata describing projection functions across the levels of representation
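The "Transformation rules" slide above says that rules are tried top-down in the order written, that a cut blocks backtracking, and that specific rules must come first. The following is a minimal sketch of that control regime, assuming SWI-Prolog. The tree terms t(Cat, Daughters) and Cat/Citation, the predicates transfer/2, transfer_all/2 and leaves/2, and the bilingual lexicon bilex/2 are hypothetical; the lab's real rule format is not shown in the slides.

```prolog
% Hypothetical transfer sketch; not the lab's actual rule format.
:- use_module(library(lists)).

% Assumed bilingual lexicon: bilex(EnglishCitation, SpanishCitation).
bilex(the, el).
bilex(soup, sopa).
bilex(delicious, delicioso).

% transfer(+SourceTree, -TargetTree)
% Specific rule first: English [adj, n] becomes Spanish [n, adj].
transfer(t([n1|_], [Adj, N]), t([n1|_], [N1, Adj1])) :-
    Adj = [adj|_]/_, !,
    transfer(N, N1),
    transfer(Adj, Adj1).
% General rule: keep the category label and transfer the daughters in order.
transfer(t([Label|_], Kids), t([Label|_], Kids1)) :- !,
    transfer_all(Kids, Kids1).
% Lexical rule: swap the citation form and leave the features open,
% which is what makes the target tree underspecified.
transfer([Label|_]/Src, [Label|_]/Tgt) :-
    bilex(Src, Tgt), !.

transfer_all([], []).
transfer_all([K|Ks], [K1|K1s]) :-
    transfer(K, K1),
    transfer_all(Ks, K1s).

% leaves(+Tree, -Citations): citation forms of the leaves, left to right.
leaves(_/Cite, [Cite]).
leaves(t(_, Kids), Cites) :-
    leaves_all(Kids, Cites).

leaves_all([], []).
leaves_all([K|Ks], Cites) :-
    leaves(K, C1),
    leaves_all(Ks, C2),
    append(C1, C2, Cites).
```

If the general rule were listed before the adjective-noun rule, its cut would commit to the order-preserving translation and the reordering rule would never fire. That is the point of the slide's advice to put specific rules at the top. In this sketch the target leaves carry citation forms only (el, sopa, delicioso); choosing the agreeing surface forms (la sopa deliciosa) would be left to the target-language grammar during generation.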

  4. Lexical-Functional Grammar
     - Constraints describing the relation between various levels of representation
     [Figure: a C-Structure tree and the F-Structure it projects to, linked by the constraint (φ(n1) SUBJ) = φ(n2)]

     Lexical-Functional Grammar
     - Computational perspective:
       - One of the best high-level grammar development systems (with a very fast parser/generator) is based on an LFG implementation
         - XLE, developed over c. 15 years by Ron Kaplan, John Maxwell and colleagues at the Palo Alto Research Center (formerly Xerox PARC)
       - Broad-coverage LFG grammars for a growing number of languages have been developed in the Parallel Grammar development project (ParGram)
         - English, German, French, Japanese, Norwegian, Korean, Arabic, …

     Lexical-Functional Grammar
     - F-Structure as a syntactic representation abstracting away from most language-specific aspects of realization
     - Constraints on F-Structure are specified as annotations in (C-Structure) rewrite rules

         S →  NP            VP
              (↑ SUBJ)=↓    ↑=↓

       Read: (1) the f-structure projected from the NP node (↓) is the same as the f-structure embedded under the feature SUBJ in the f-structure projected from the S node (↑); (2) the f-structure projected from the VP node (↓) is identical to the f-structure projected from the S node (↑)

     LFG-based Machine Translation
     - Additional "tau projection" to describe the cross-linguistic relation
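As a worked instance of the reading given for the annotated S rule above, suppose the S, NP and VP nodes of one local tree are called n_1, n_2 and n_3. The name n_3 is an assumption; the slides only mention n_1 and n_2. Replacing ↑ by the f-structure of the mother node and ↓ by the f-structure of the annotated daughter turns the two annotations into equations over the φ projection:

```latex
% Requires amsmath. n_1 = S node, n_2 = NP node, n_3 = VP node (n_3 is an assumed name).
\begin{align*}
  (\uparrow\ \mathrm{SUBJ}) = \downarrow \ \text{on NP}
    &\quad\Longrightarrow\quad (\varphi(n_1)\ \mathrm{SUBJ}) = \varphi(n_2) \\
  \uparrow = \downarrow \ \text{on VP}
    &\quad\Longrightarrow\quad \varphi(n_1) = \varphi(n_3)
\end{align*}
```

The first equation is the constraint shown with the C-Structure/F-Structure figure. The second says that S and VP project to one and the same f-structure, so any information contributed inside the VP lands in the f-structure that also holds SUBJ.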

  5. Tau projection: structural divergence
     - Example
       - German: Der Student beantwortet die Frage
       - French: L'étudiant répond à la question

     Tau projection: structural divergence

         S  →  NP            VP
               (↑ SUBJ)=↓    ↑=↓

         VP →  V             NP
               ↑=↓           (↑ OBJ)=↓

     [Figure: c-structure for "der Student beantwortet die Frage" (S over NP and VP; each NP over DET and N; VP over V and NP), related by φ to an f-structure ↑ with SUBJ = (↑ SUBJ) and OBJ = (↑ OBJ)]
     - Source language information from the verb's lexical entry: (shown as a figure on the slide)

     Tau projection: structural divergence
     [Figure: τ projection from the source f-structure ↑ of "beantwortet" to the target f-structure τ↑, with (τ↑ SUBJ) = τ(↑ SUBJ) and (τ↑ AOBJ OBJ) = τ(↑ OBJ)]
     - Information from the transfer lexicon entry for the verb beantworten: (shown as a figure on the slide)

     Tau projection: head switching
     - Example:
       - English: The baby just fell
       - French: Le bébé vient de tomber
     - Assuming that at f-structure adverbs like just introduce a grammatical function that embeds the f-structure of the rest of the sentence
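The structural divergence in the beantworten example above can be mimicked procedurally. The sketch below, assuming SWI-Prolog, treats τ as a predicate tau/2 over f-structures written as lists of Feature-Value pairs. The verb-specific clause encodes the two correspondences recoverable from the slide, (τ↑ SUBJ) = τ(↑ SUBJ) and (τ↑ AOBJ OBJ) = τ(↑ OBJ). The list encoding, the predicate names tau/2 and pred_tr/2, the spec-def feature and the unaccented atom repondre are assumptions for illustration. LFG itself states these correspondences declaratively as constraints, not as an ordered rewriting procedure.

```prolog
% Hypothetical encoding: an f-structure is a list of Feature-Value pairs
% whose first pair is pred-Atom. Not taken from the slides.

% Assumed transfer lexicon for simple predicates: pred_tr(German, French).
pred_tr(student, etudiant).
pred_tr(frage, question).

% tau(+SourceFS, -TargetFS)
% Verb-specific entry, listed first: the German OBJ ends up under AOBJ OBJ.
tau([pred-beantworten, subj-S, obj-O],
    [pred-repondre, subj-S1, aobj-[obj-O1]]) :- !,
    tau(S, S1),
    tau(O, O1).
% Default: translate the PRED and copy the remaining features unchanged.
tau([pred-P|Rest], [pred-P1|Rest]) :-
    pred_tr(P, P1).
```

For example, `?- tau([pred-beantworten, subj-[pred-student, spec-def], obj-[pred-frage, spec-def]], T).` yields a target f-structure in which the translated object sits one level deeper, under aobj, which corresponds to the prepositional object in "répond à la question".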

  6. Tau projection: head switching
     [Figure: c-structure for "the baby just fell" (S over NP, ADV and VP; NP over DET and N), related by φ to its f-structure]

     Tau projection: head switching
     [Figure: the same c-structure and φ projection]
     - Transfer entry for adverb: (shown as a figure on the slide)

     Tau projection: head switching
     [Figure: τ projection from the source f-structure ↑ to the target f-structure τ↑]
     - Transfer entry for adverb: (shown as a figure on the slide)

     Issues in the tau projection approach
     - The Kaplan et al. 1989 paper is the beginning of a debate:
       - Kaplan et al. 1989 (original proposal)
       - Sadler/Thompson 1991 (criticism: problem with head-switching in embedded contexts)
       - Kaplan/Wedekind 1993 (reply: extension of the LFG formalism by the restriction operator to deal with mismatches between syntactic and semantic heads)
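Head switching for "The baby just fell" can be sketched in the same procedural style, assuming SWI-Prolog. The slides show the adverb's transfer entry only as a figure, so the details here are assumptions: following the general shape of the analysis in Kaplan et al. 1989, the adverb's f-structure is taken to embed the rest of the sentence under a feature called arg, and its French counterpart venir is taken to embed the translated material under xcomp. The predicate names tau/2, tau_feats/2 and pred_tr/2, the list encoding and the unaccented atom bebe are likewise hypothetical.

```prolog
% Hypothetical encoding: an f-structure is a list of Feature-Value pairs
% whose first pair is pred-Atom, and every other value is an f-structure.

% Assumed transfer lexicon for simple predicates: pred_tr(English, French).
pred_tr(baby, bebe).
pred_tr(fall, tomber).

% tau(+SourceFS, -TargetFS)
% Head switching, listed first: the outer adverb f-structure maps to venir,
% and what was embedded under arg is re-embedded under xcomp.
tau([pred-just, arg-A], [pred-venir, xcomp-A1]) :- !,
    tau(A, A1).
% Default: translate the PRED and recurse into the remaining features.
tau([pred-P|Rest], [pred-P1|Rest1]) :-
    pred_tr(P, P1),
    tau_feats(Rest, Rest1).

tau_feats([], []).
tau_feats([F-V|Fs], [F-V1|F1s]) :-
    tau(V, V1),
    tau_feats(Fs, F1s).
```

With the source f-structure `[pred-just, arg-[pred-fall, subj-[pred-baby]]]`, the sketch produces a target in which venir is the outer predicate and tomber is embedded, matching "Le bébé vient de tomber". It leaves out the subject of venir and does not address the embedded-context problem raised by Sadler/Thompson 1991.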

  7. LFG-based Machine Translation
     - Conclusion:
       - Flexible approach taking advantage of linguistic generalizations at various levels
       - Modularization of transfer information (separate from language-specific grammatical information) is somewhat problematic
