
Machine Translation: Classical and Statistical Approaches, Week 1: "Classical" Approaches (PowerPoint presentation)



  1. Machine Translation: Classical and Statistical Approaches
     Session 1: Overview
     Jonas Kuhn
     Universität des Saarlandes, Saarbrücken / The University of Texas at Austin
     jonask@coli.uni-sb.de
     DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005

     Session 1: Overview
     - Course Overview
     - Machine Translation: Some History
     - Major architectures/paradigms in "classical" Machine Translation
     - Translation challenges: a classification

     Course Overview (1)
     - Week 1: "Classical" approaches
       - History & Overview
       - Transfer-based translation
         - Syntax-based transfer [Trujillo 1999]
         - Transfer as LFG projection [Kaplan et al. 1999]
       - Interlingua-based translation [Dorr 1994]
       - Term-rewriting transfer [Emele/Dorna 1998]

     Course Overview (2)
     - Week 2: Data-driven, statistical approaches
       - The noisy channel model [Brown et al. 1990, Knight 1999]
       - Language modeling
       - Translation modeling
         - Word alignment
         - Phrase alignment [Koehn et al. 2003]
       - Decoding [Koehn 1994]
       - Other uses of word alignments [Yarowsky et al. 2001]

  2. MT: Some History
     - Translation
       - c. 2000 BC (Old Babylonian period): bilingual Sumerian-Akkadian text fragments
       - China, 9th century BC: references to translators and interpreters
       - c. mid-8th century BC: reference to interpreting in the Old Testament (Genesis 42:23)
       - 240 BC: Livius Andronicus translates the Odyssey from Greek into Latin
       - 196 BC: Rosetta Stone carved (three scripts: Egyptian hieroglyphs, Demotic, Greek; discovered in 1799 and deciphered by Jean-François Champollion in 1822)

     MT: Some History
     - 1947: Memo by Warren Weaver (Rockefeller Foundation): "I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text." (compare the use of computers for cryptography in World War II)
     - 1954: first prototype of a Russian-English MT system (GAT system, Peter Toma; Georgetown University, Washington D.C.)
     - 1961, UT Austin: Linguistic Research Center (led by Winfred Lehmann)
       - fundamental research and development of METAL, a bidirectional English-German transfer system
       - initially funded by the US Air Force Rome Air Development Center; funded by Siemens since 1978
       - first commercial METAL system appeared in 1989

     MT: Some History
     - 1966: the ALPAC Report (Automatic Language Processing Advisory Committee, commissioned by the US National Academy of Sciences)
       - no shortage of human translators; no immediate prospect of MT producing useful translations of general scientific texts
       - funding for MT was virtually stopped (especially in the USA)
     - Groups continuing to work on MT in the 1970s:
       - TAUM group in Montreal: METEO system (used for translating weather forecasts since 1977)
       - groups in the USSR
       - GETA group in Grenoble, France
       - SUSY group in Saarbrücken, Germany
       - Peter Toma, working on Systran (in various organizations)
         - Systran is now (2005) available for 36 language pairs; http://www.systransoft.com/
         - underlying technology of Babel Fish Translation (by AltaVista)

  3. MT: Some History
     - 1976: the Commission of the European Communities installs the English-French version of Systran
       - commissions further language pairs of Systran
     - 1982-1993: Eurotra, a large-scale MT project funded by the European Communities
     - 1993-2000: Verbmobil, a large-scale speech-to-speech translation project funded by the German ministry for research
     - late 1980s-1990s: Candide project at IBM Watson Research Center, pioneering work in Statistical Machine Translation
       - basis for all ongoing work in Statistical MT
       - Example: the "Surprise Language Project" by DARPA, with 1 month to develop an MT system for a given language (June 2003: Hindi; 11 research institutions participated)

     Architectures and Paradigms in MT
     Classification following Dorr/Jordan/Benoit (1999): A Survey of Current Paradigms in Machine Translation. In: Zelkowitz, Marvin (ed.) Advances in Computers 49, 1-68. Academic Press, London.
     - MT Architectures
       - Direct translation
       - Transfer-based translation
       - Interlingua-based translation
     - MT Paradigms
       - Linguistic-based paradigms: Constraint-based MT, Knowledge-based MT, Lexical-based MT, Rule-based MT, Principle-based MT, Shake-and-Bake MT
       - Non-linguistic-based paradigms: Statistical-based MT, Example-based MT, Dialogue-based MT
       - Hybrid paradigms

     Transfer vs. Interlingua
     - Some slides taken from Arturo Trujillo (author of "Translation Engines", 1999, Springer)

     The Vauquois Triangle
     [Figure: the Vauquois triangle]

  4. Transfer vs. Interlingua
     - Transfer: Contrasts are fundamental to translation. Statements in one theory (source language, SL) are mapped into statements in another theory (target language, TL).
     - Interlingua: Meanings are language independent and can be encoded. They are extracted from SL sentences and rendered as TL sentences.

     Multilinguality – Transfer
     [Figure: transfer modules linking English, German, French, Catalan, Spanish and Japanese]

     Transfer vs. Interlingua
     Transfer:
     + Easier to implement
     + Good for mono- or bi-directional systems
     + Humans work on 2 languages at a time
     - Modifications affect several transfer modules
     - Inefficient for multilinguality
     Interlingua:
     + Eliminates redundancy
     + Highly modular
     + Simplifies addition of languages
     - Different linguists may disagree on the representation of meaning
     - Difficult to ensure that the TL generator can produce a sentence from the SL representation

     Multilinguality – Interlingua
     [Figure: English, German, French, Catalan, Spanish and Japanese connected through a central interlingua]
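The multilinguality trade-off can be made concrete by counting language-pair-specific components. The sketch below assumes the usual simplification: a transfer system needs one directed transfer module per ordered language pair, while an interlingua system needs one analysis module (SL to interlingua) and one generation module (interlingua to TL) per language. The function names are illustrative only.

```python
def transfer_modules(n: int) -> int:
    """Directed transfer modules for translating between every ordered pair of n languages."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """One analysis plus one generation module per language."""
    return 2 * n


for n in (2, 6):
    print(f"{n} languages: {transfer_modules(n)} transfer vs. "
          f"{interlingua_modules(n)} interlingua modules")
# 2 languages: 2 transfer vs. 4 interlingua modules
# 6 languages: 30 transfer vs. 12 interlingua modules
```

With the six languages of the figures, transfer needs 30 modules against 12 for an interlingua; for a single language pair, transfer needs fewer, which fits the "good for mono- or bi-directional systems" point. The count deliberately ignores the per-language analysis and generation components that transfer systems also contain.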

  5. Classifying translation challenges
     - Translation divergence: meaning is conveyed by the translation, although the syntactic structure and the semantic distribution of meaning components differ between the two languages
     - Translation mismatch: difference in information content between source and target sentence
       - Example (from Dorr 1994): translation of "fish" into Spanish: pez (alive), pescado (food)

     Types of divergence
     - Thematic divergence
     - Head-switching divergence
     - Structural divergence
     - Categorial divergence
     - Lexical gap (conflational divergence)
     - Divergence in lexicalization (lexical divergence)
     - Collocational divergence
     - Multi-lexeme and idiomatic divergence

     Types of divergence
     - Thematic divergence
       En: You like her
       Sp: Ella te gusta (Lit: She you-ACC pleases)
     - Head-switching divergence
       En: The baby just ate
       Sp: El bebé acaba de comer (Lit: The baby finishes of to-eat)
     - Structural divergence
       En: Luisa entered the house
       Sp: Luisa entró a la casa (Lit: Luisa entered to the house)

     Types of divergence
     - Categorial divergence
       En: a little bread
       Sp: un poco de pan (Lit: a bit of bread)
     - Lexical gap (conflational divergence)
       En: Camillo got up early
       Sp: Camillo madrugó
       En: I stabbed Juan
       Sp: Yo le di puñaladas a Juan (Lit: I gave knife-wounds to Juan)
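In a transfer system, each such divergence has to be handled by an explicit rule. The following Python sketch is a minimal illustration under simplifying assumptions: the `Clause` record and both functions are hypothetical and not taken from any system discussed in the course. It shows the argument swap behind the thematic divergence (like/gustar) and the extra feature that the fish mismatch requires from analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical, highly simplified predicate-argument structure."""
    pred: str
    subj: str
    obj: str


def transfer_like(clause: Clause) -> Clause:
    """Thematic divergence: English 'like' (subject = experiencer) becomes
    Spanish 'gustar' (subject = thing liked, dative object = experiencer)."""
    if clause.pred != "like":
        raise ValueError(f"no transfer rule for {clause.pred!r}")
    return Clause(pred="gustar", subj=clause.obj, obj=clause.subj)


def transfer_fish(alive: bool) -> str:
    """Translation mismatch: Spanish lexically encodes a distinction that English
    leaves implicit, so transfer needs an extra feature supplied by analysis."""
    return "pez" if alive else "pescado"


print(transfer_like(Clause("like", "you", "her")))
# Clause(pred='gustar', subj='her', obj='you')
print(transfer_fish(alive=True), transfer_fish(alive=False))
# pez pescado
```

Note that the `alive` feature has to come from somewhere: if the source analysis cannot determine it, no transfer rule can pick the right Spanish word, which is what distinguishes a mismatch from a divergence.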

  6. Types of divergence
     - Divergence in lexicalization (lexical divergence)
       En: Susan swam across the channel
       Sp: Susan cruzó el canal nadando (Lit: Susan crossed the channel swimming)
     - Collocational divergence
       En: Jan made a decision
       Sp: Jan tomó/*hizo una decisión (Lit: Jan took/*made a decision)
     - Multi-lexeme and idiomatic divergence
       En: Socrates kicked the bucket
       Sp: Socrates estiró la pata (Lit: Socrates stretched the leg)
       En: Frank is as tall as Orlaith
       Sp: Frank es tan alto como Orlaith (Lit: Frank is so tall like Orlaith)

     Other translation challenges
     - Ambiguity: a language understanding problem (compare Dorr et al. 1999)
       - Syntactic ambiguity
         I saw the man on the hill with the telescope
         Resolution may not be necessary, since the ambiguity may carry over to the target language
       - Lexical ambiguity
         En: book → Sp: libro / reservar
       - Semantic ambiguity
         - Homography
           En: ball → Sp: pelota (spherical object) / baile (formal dance)
         - Polysemy
           En: kill → Sp: matar (kill a man) / acabar (kill a process)

     Other translation challenges
     - Ambiguity (compare Dorr et al. 1999)
       - Complex semantic ambiguity
         - Homography
           En: The box was in the pen
           Sp: La caja estaba en el corral / *la pluma (corral: enclosure; pluma: writing pen)
         - Metonymy
           En: While driving, John swerved and hit a tree
           Sp: Mientras que John estaba manejando, se desvió y golpeó con un árbol ('While John was driving, (itself) swerved and hit with a tree')

     Other translation challenges
     - Ambiguity (compare Dorr et al. 1999)
       - Contextual ambiguity
         En: The computer outputs the data; it is fast
         Sp: La computadora imprime los datos; es rápida (es: singular)
         En: The computer outputs the data; it is stored in ascii
         Sp: La computadora imprime los datos; están almacenados en ascii (están: plural)
       - Complex contextual ambiguity
         En: John hit the dog with a stick
         Sp: John golpeó el perro con el palo / que tenía el palo (hit … with the stick / (the dog) that had a stick)
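The collocational divergence (En: make a decision, Sp: tomar una decisión, not *hacer una decisión) shows why word-by-word lookup fails. A minimal Python sketch, assuming a hypothetical toy lexicon with verb-noun collocation entries that take priority over single-word entries; it ignores inflection (verbs stay in the infinitive) and hard-codes the feminine article "una", which happens to fit "decisión".

```python
# Hypothetical toy lexicon: single-word entries.
WORD_LEXICON = {"make": "hacer", "decision": "decisión"}

# Hypothetical collocation table: the Spanish support verb depends on the noun.
COLLOCATIONS = {("make", "decision"): "tomar"}


def word_by_word(verb: str, noun: str) -> str:
    """Translate each word independently (yields the ungrammatical *hacer una decisión)."""
    return f"{WORD_LEXICON[verb]} una {WORD_LEXICON[noun]}"


def with_collocations(verb: str, noun: str) -> str:
    """Look up the verb-noun pair first; fall back to the single-word entry."""
    es_verb = COLLOCATIONS.get((verb, noun), WORD_LEXICON[verb])
    return f"{es_verb} una {WORD_LEXICON[noun]}"


print(word_by_word("make", "decision"))       # hacer una decisión
print(with_collocations("make", "decision"))  # tomar una decisión
```

Keying the entry on the verb-noun pair keeps the general translation of "make" intact while overriding it only in the collocation.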
