Improving SMT by Using Parallel Data of a Closely Related Language
Petra Galuščáková and Ondřej Bojar, presented by Mark Fishel
Institute of Formal and Applied Linguistics, Charles University in Prague
{galuscakova,bojar}@ufal.mff.cuni.cz
– Česílko 2.0 supports more language pairs but performs worse for cs→sk.
– e.g. it does not change word order during the translation.
– as the baseline direct en→sk translation,
– for the various configurations of pivoting.
– Moses is trained and tuned on the English–Czech corpus,
– The resulting model is used for English→Czech translation, the
– The Czech part of the English–Czech corpus is automatically
– Moses is trained and tuned on this synthetic parallel corpus and the
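The two pivoting routes above can be sketched as plain function composition. This is a minimal sketch under stated assumptions: `Translator`, `word_for_word`, `cascade`, and `synthetic_corpus` are hypothetical names, the toy lexicons stand in for the real Moses and Česílko systems (which are not invoked), and the cascade assumes the Czech output of the en→cs model is then passed to a cs→sk step.

```python
from typing import Callable, Iterable

# Hypothetical type: any function mapping one sentence to another.
# In the described setup these would be real systems (Moses for en→cs,
# Česílko for cs→sk); none of them are called here.
Translator = Callable[[str], str]


def word_for_word(table: dict[str, str]) -> Translator:
    """Toy stand-in for an MT system: monotone token lookup, unknown tokens kept."""
    return lambda sentence: " ".join(table.get(tok, tok) for tok in sentence.split())


def cascade(first: Translator, second: Translator) -> Translator:
    """Pivoting at translation time: run the input through first, then second."""
    return lambda sentence: second(first(sentence))


def synthetic_corpus(
    en_cs_pairs: Iterable[tuple[str, str]], cs_to_sk: Translator
) -> list[tuple[str, str]]:
    """Pivoting at training time: translate the Czech side of an en–cs corpus,
    giving synthetic en–sk pairs on which an en→sk model could be trained."""
    return [(en, cs_to_sk(cs)) for en, cs in en_cs_pairs]


# Toy lexicons (illustrative only, not real system output).
EN_CS = {"dog": "pes", "and": "a", "cat": "kočka"}
CS_SK = {"pes": "pes", "a": "a", "kočka": "mačka"}

en_to_sk = cascade(word_for_word(EN_CS), word_for_word(CS_SK))
print(en_to_sk("dog and cat"))  # pes a mačka
print(synthetic_corpus([("dog and cat", "pes a kočka")], word_for_word(CS_SK)))
```

The difference between the routes is where the cs→sk step sits: the cascade applies it to every test sentence, while the synthetic corpus applies it once to the training data and leaves a single direct en→sk model at test time.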
– The Slovak version was created by translating from Czech.
– The English version comes from various source languages.
– Česílko preserves the word order,
– The translators may have pursued the same approach because
– BLEU may thus give high credit to matching n-grams
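Why preserved word order can inflate BLEU is easiest to see in its clipped n-gram precision. The sketch below implements only that component (no brevity penalty, no geometric mean over n = 1..4, a single reference); `ngram_counts` and `clipped_precision` are hypothetical helper names, and the Slovak sentences are invented, with the reference assumed to follow the Czech source order.

```python
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis: str, reference: str, n: int) -> float:
    """BLEU-style modified n-gram precision against a single reference:
    each hypothesis n-gram is credited at most as often as it occurs in the reference."""
    hyp = ngram_counts(hypothesis.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


# Invented example: both orders are acceptable Slovak, but only the
# monotone one matches the (assumed Czech-ordered) reference.
reference = "včera sme videli nový film"
monotone = "včera sme videli nový film"
reordered = "nový film sme videli včera"

for n in (1, 2, 3):
    print(n, clipped_precision(monotone, reference, n), clipped_precision(reordered, reference, n))
# 1 1.0 1.0
# 2 1.0 0.5
# 3 1.0 0.0
```

Both hypotheses use exactly the same words, so unigram precision cannot tell them apart; only the higher-order n-grams reward the output whose word order coincides with the reference.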