Statistical Machine Translation: Rapid Development with Limited Resources



1. Statistical Machine Translation: Rapid Development with Limited Resources

George Foster, Simona Gandrabur, Philippe Langlais, Pierre Plamondon, Graham Russell and Michel Simard
RALI-DIRO, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal (Québec) Canada, H3C 3J7
www-rali.iro.umontreal.ca
MT-Summit IX, New Orleans

2. Motivation

What progress can a small team of developers expect to achieve in creating a statistical MT system for an unfamiliar language, using only data and technology readily available in-house, or at short notice from external sources?

• Work conducted within the NIST 2003 MT evaluation task: http://www.nist.gov/speech/tests/mt/
• Chinese-to-English task
• Computing resources: Pentium 4-class PCs with a maximum of 1 GB RAM

3. We had a plan...

A rescoring approach built on top of a roughly state-of-the-art translation model such as IBM Model 4:
• Extensively used in automatic speech recognition on n-best lists or word graphs (Ortmanns et al., 1997; Rose and Riccardi, 1999)
• More recently proposed for use in SMT (Och and Ney, 2002; Soricut et al., 2002; Ueffing et al., 2002)
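None of the rescoring machinery is spelled out on the slide; as a rough illustration of the general technique proposed in the papers cited above, here is a minimal Python sketch of log-linear rescoring of an n-best list. The feature functions, weights and data are toy placeholders of our own, not the features actually used.

    def rescore_nbest(nbest, features, weights):
        """Rerank an n-best list of (source, candidate) pairs.

        features -- functions f(source, candidate) -> log score
        weights  -- one weight per feature (tuned on held-out data)
        """
        def combined(pair):
            src, cand = pair
            # Log-linear model: weighted sum of feature log scores.
            return sum(w * f(src, cand) for w, f in zip(weights, features))
        return sorted(nbest, key=combined, reverse=True)

    # Toy features (names and values are ours): a length-match score
    # and a stand-in for a language-model log probability.
    length_match = lambda s, c: -abs(len(c.split()) - len(s.split()))
    toy_lm = lambda s, c: -0.1 * len(c.split())

    nbest = [("le chat noir", "black cat"), ("le chat noir", "the black cat")]
    ranked = rescore_nbest(nbest, [length_match, toy_lm], [0.5, 1.0])
    print(ranked[0][1])  # -> "the black cat"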

4. Step 1 – Preparing with Canadian Hansards (Alegrías)

• Install necessary packages
• Train translation and language models (a sketch of the underlying idea follows this list)
• Write IBM model 2 and 4 decoders
→ IBM4 models trained with GIZA++ and mkcls (www-i6.informatik.rwth-aachen.de/Colleagues/och)
→ Language model and IBM model 2 trained with in-house packages
→ Multiple search strategies (Nießen et al., 1998; Germann et al., 2001)
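GIZA++ trains the full cascade of IBM models with far more machinery than fits on a slide; purely to illustrate what "training a translation model" means, here is a minimal EM sketch for IBM Model 1, the simplest member of the family. This is a toy under our own simplifications, not a description of GIZA++ internals.

    from collections import defaultdict

    def model1_em(bitext, iterations=5):
        """Toy EM estimation of IBM Model 1 lexical probabilities t(f|e).

        bitext -- list of (foreign_tokens, english_tokens) pairs; real
        training adds NULL words, Models 2-4 and serious engineering.
        """
        t = defaultdict(lambda: 1.0)           # uniform-ish initialization
        for _ in range(iterations):
            count = defaultdict(float)         # expected pair counts
            total = defaultdict(float)         # expected counts per e
            for fr, en in bitext:
                for f in fr:
                    z = sum(t[(f, e)] for e in en)   # alignment mass for f
                    for e in en:
                        c = t[(f, e)] / z            # E-step
                        count[(f, e)] += c
                        total[e] += c
            for (f, e), c in count.items():          # M-step
                t[(f, e)] = c / total[e]
        return dict(t)

    probs = model1_em([("la maison".split(), "the house".split()),
                       ("la fleur".split(), "the flower".split())])
    print(probs[("maison", "house")])  # grows toward 1 over iterations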

5. Step 1 – 3-4 weeks later... (Seguiriyas)

We've got it! Our first English-French GIZA++ model was ready to use.
• Establishing the limits of the package (maximum input size, etc.)
→ 2-3 days of computation to train on a corpus of 1 million sentence pairs
→ Can't train with more data (memory problems)
• Running mkcls
→ Around 10 hours of computation to cluster a vocabulary into 50 classes
• Writing wrappers for the model data structures

6. Step 2 – Corpus Preprocessing: SMT is not exactly language-blind...

The Linguistic Data Consortium (LDC) distributed the training data for the NIST task (at least partially): http://www.ldc.upenn.edu/
→ A surprising variety of formats (sounds nice but is not)
→ Word boundaries inserted by means of a revised version of the mansegment program supplied by the LDC
→ Doubt: is our sentence aligner supposed to work for Chinese/English corpora?
One person-month of effort for a judicious mixture of automatic and semi-automatic approaches.
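The slide only records the doubt about the sentence aligner; for context, length-based aligners in the tradition of Gale and Church (1993) score candidate sentence pairs roughly as in the sketch below, and the worry is precisely whether length-ratio parameters calibrated on European language pairs carry over to Chinese/English. Function and parameter names are ours.

    import math

    def length_match_cost(len_src, len_tgt, mean_ratio=1.0, variance=6.8):
        """Gale & Church style cost for pairing two sentences by length.

        mean_ratio, variance -- expected target/source character-length
        ratio and its variance; the classic values were calibrated on
        European language pairs, hence the doubt for Chinese/English.
        """
        expected = len_src * mean_ratio
        delta = (len_tgt - expected) / math.sqrt(len_src * variance)
        # Lower is better: -log of a normal density, up to a constant.
        return delta * delta / 2.0

    print(length_match_cost(30, 33))  # similar lengths: low cost
    print(length_match_cost(30, 90))  # very different lengths: high cost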

7. Step 2 – Corpus Preprocessing: Take one

• For the NIST exercise, only pre-aligned texts were used.
→ Some regions acknowledged by the supplier to be potentially unreliable were omitted.
• Instead of recompiling GIZA++ to accommodate sentences longer than 40 words, we devised a knowledge-poor splitter relying heavily on punctuation (a sketch follows this list).
→ In cases where no suitable punctuation existed, sentences were split at an arbitrary token boundary.
→ Mostly a mix of UN and Hansard data was used to train the language and translation models.
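The splitter itself is not shown anywhere in the presentation; below is a minimal sketch of the behaviour described above, assuming the 40-word limit mentioned on the slide and a simple punctuation set, both of which are placeholders.

    def split_long_sentence(tokens, max_len=40,
                            punct=frozenset(",;:.!?")):
        """Split a token list into chunks of at most max_len tokens.

        Cuts just after the last punctuation token inside the window
        when one exists; otherwise cuts at the window boundary (an
        arbitrary token boundary, as described above).
        """
        chunks = []
        while len(tokens) > max_len:
            window = tokens[:max_len]
            cut = max((i + 1 for i, tok in enumerate(window) if tok in punct),
                      default=max_len)
            chunks.append(tokens[:cut])
            tokens = tokens[cut:]
        if tokens:
            chunks.append(tokens)
        return chunks

    print(split_long_sentence("a b c , d e".split(), max_len=4))
    # -> [['a', 'b', 'c', ','], ['d', 'e']]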

8. Step 3 – Decoders: The joy of diversity

Three different decoders, all previously described in the statistical MT literature, were implemented. It sounds odd to do that under time pressure, but we found possible advantages:
• Detection of certain bugs (actually useful)
• Competition between coders: "mine is better than yours" (that worked too)
• Could be fruitful in a rescoring strategy (details later)
Detail: explicit enumeration of the candidate translations (n-best lists)

9. Step 3 – Decoders: Greedy decoder (Germann et al., 2001)

• ISI ReWrite Decoder available (at least for Canadian residents) at: http://www.isi.edu/licensed-sw/rewrite-decoder
• Requires the language model to be trained with the CMU-Cambridge Statistical Language Modeling Toolkit (Clarkson and Rosenfeld, 1997)
→ We found it easier to rewrite the ReWrite Decoder
Hypotheses generated by the hill-climbing search were collected into an n-best list (a sketch of such a loop follows).
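The real search is the one described by Germann et al. (2001); the sketch below is only a generic hill-climbing loop with the twist mentioned here, namely that every hypothesis visited on the way up is kept for the n-best list. The neighbourhood and scoring functions are placeholders.

    import heapq

    def greedy_decode(initial, neighbours, score, nbest_size=100):
        """Hill-climb from an initial hypothesis, keeping all visits.

        neighbours -- returns candidate rewrites of a hypothesis
        score      -- model log probability (higher = better);
                      hypotheses must be hashable for deduplication
        """
        current = initial
        seen = [(score(initial), initial)]
        while True:
            cands = [(score(h), h) for h in neighbours(current)]
            seen.extend(cands)
            if not cands:
                break
            best_score, best = max(cands)
            if best_score <= score(current):
                break                          # local optimum reached
            current = best
        top = heapq.nlargest(nbest_size, set(seen))
        return current, [h for _, h in top]

    # Toy usage: a rewrite appends a letter, the score is the length.
    best, nbest = greedy_decode(
        "a", neighbours=lambda h: [h + "a"] if len(h) < 5 else [], score=len)
    print(best)  # -> "aaaaa"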

10. Step 3 – Decoders: Inverted Alignment Decoder (Nießen et al., 1998)

Shame on us: we also tested the performance of a DP decoder designed for IBM model 2.

for all target positions i = 1, 2, ..., I_max do
    prune(i - 1)
    for all live hypotheses h_i do
        for all words w in the active vocabulary do
            for all fertilities f in {1, 2, 3} do
                for all uncovered source positions j, ..., j + f - 1 do
                    consider h', the extension of h_i with w (at target position i) aligned with j, ..., j + f - 1
                    if score(h') > Score(i, j, f, c) then
                        keep h' and record back-track information

Best live hypotheses are kept in an n-best list.

11. Step 3 – Decoders: Stack-based Decoder (FST)

• Loop until time limit reached (see the sketch after this list):
  – pop best node from the stack
  – if final, add hypothesis to the n-best list
  – else:
    ∗ expand "exhaustively"
    ∗ add resulting hypotheses to graph and stack
• Main properties:
  – all nodes retained in graph
  – fast output of initial hypotheses, with successive refinement
  – precise time control
  – no heuristic function on suffixes
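The FST decoder was written in-house and only its control loop is on the slide; here is a minimal anytime best-first sketch in that spirit. The graph bookkeeping is elided, and the expansion, completion and scoring functions are placeholders.

    import heapq, time

    def stack_decode(initial, expand, is_final, score,
                     time_limit=180.0, nbest_size=100):
        """Anytime best-first search over partial hypotheses.

        expand   -- returns successor partial hypotheses
        is_final -- True once the whole source sentence is covered
        score    -- log probability so far (higher = better)
        """
        tick = 0                                   # heap tie-breaker
        stack = [(-score(initial), tick, initial)]
        nbest = []
        deadline = time.monotonic() + time_limit   # precise time control
        while stack and time.monotonic() < deadline:
            neg, _, hyp = heapq.heappop(stack)     # pop best node
            if is_final(hyp):
                nbest.append((-neg, hyp))          # complete hypothesis
                if len(nbest) >= nbest_size:
                    break
                continue
            for succ in expand(hyp):               # "exhaustive" expansion
                tick += 1
                heapq.heappush(stack, (-score(succ), tick, succ))
        return sorted(nbest, key=lambda x: x[0], reverse=True)

    # Toy usage: enumerate all two-letter strings over {a, b}.
    out = stack_decode("", lambda h: [h + c for c in "ab"] if len(h) < 2 else [],
                       lambda h: len(h) == 2, lambda h: -len(h), time_limit=1.0)
    print([h for _, h in out])  # -> ['aa', 'ab', 'ba', 'bb']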

12. Step 3 – Decoders: Stack-based Decoder (FST)

• Graph properties:
  – 30M nodes in ≈ 1 GB
  – nodes retain trigram state and source alignments
  – retroactive score correction
• Prefix heuristics:
  – multiple stacks to correct for prefix-score bias, keyed on (see the sketch after this list):
    ∗ number of source and target words
    ∗ unigram log probability
  – pop depends on stack and gain over parent
• Timing: max 3 minutes per source sentence; more time gives better model scores but worse NIST scores
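The exact stack layout is not given beyond the bullets above; one plausible reading of "multiple stacks to correct for prefix-score bias" is to bin hypotheses by how much source and target they already cover, so that only comparable prefixes compete for pops. A hypothetical sketch:

    import heapq
    from collections import defaultdict

    stacks = defaultdict(list)    # one priority stack per length bin
    ticks = defaultdict(int)      # per-bin heap tie-breakers

    def push(hyp, num_src_covered, num_tgt_words, score):
        """File a hypothesis under its (source, target) length bin."""
        key = (num_src_covered, num_tgt_words)
        ticks[key] += 1
        heapq.heappush(stacks[key], (-score, ticks[key], hyp))

    push("the", 1, 1, -2.3)
    push("the cat", 2, 2, -5.1)
    # The short, better-scoring prefix lives in a different bin, so it
    # cannot starve the longer hypothesis of expansions.
    print(sorted(stacks))  # -> [(1, 1), (2, 2)]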

13. Step 3 – Decoders: The cost of diversity

Approximate number of person-weeks of development:

decoder   coding   tuning
greedy       2        1
FST          3        3
IBM2         2        3
total        7        7

But it was worth it! (the price of a decent IBM-4 stack-based decoder)

14. Step 3 – Decoders: Finally

• Two decoders for IBM model 4, one for IBM model 2
• Existential questions: "is it good or bad?", "why?", etc.
• Tuning the compromise between speed and quality is difficult
• Incremental improvements
→ We compared the decoders by translating 100 sentences (of at most 20 words):
• greedy (results within 10 minutes or so)
• fst (results within half an hour or so)
• ibm2-fast (results within a few seconds)
• ibm2-slow (results within half an hour or so)

15. The bigger the better? Main factor is decoder type

                                1-best                      100-best
corpus    decoder      wer    nist     nist%       wer    nist     nist%
hansard   greedy     68.93  2.41448    24.20     61.71  3.68806    37.00
          ibm2-fast  65.87  3.22954    32.30     59.22  4.42125    44.20
          ibm2-slow  63.85  3.85769    38.50     53.03  5.28764    52.80
          fst        62.86  4.19043    41.90     55.24  5.10464    51.00
un        greedy     70.35  2.76181    26.10     62.97  3.98415    37.70
          ibm2-fast  69.80  3.19254    30.20     63.04  4.38660    41.50
          ibm2-slow  68.77  4.39036    41.50     58.65  5.77882    54.60
          fst        65.57  4.56739    43.20     57.18  5.80536    54.90
sinorama  greedy     86.89  0.79860     7.80     82.16  1.37465    13.40
          ibm2-fast  87.55  1.09399    10.30     82.45  1.68875    15.80
          ibm2-slow  87.56  1.46096    13.70     81.55  2.44893    23.00
          fst        88.97  1.72001    16.10     85.40  2.35273    22.00
xinhua    greedy     89.64  1.30970    12.70     85.10  2.00496    19.40
          ibm2-fast  91.09  1.08899    10.30     85.90  1.86932    17.70
          ibm2-slow  89.13  1.34132    12.70     83.86  2.29718    21.80
          fst        90.82  1.08510    10.30     87.98  1.56167    14.80

16. The bigger the better? Search space is also important

(Same results table as on slide 15.)
