Machine Translation 2 Wikipedia Machine translation, often - - PowerPoint PPT Presentation

machine translation
SMART_READER_LITE
LIVE PREVIEW

Machine Translation 2 Wikipedia Machine translation, often - - PowerPoint PPT Presentation

Machine Translation 2 Wikipedia Machine translation, often referred to by the acronym MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to


slide-1
SLIDE 1

An Introduction to Machine Translation

Marcello Federico FBK, Trento - Italy 2016

  • M. Federico

MT 2016 1

Outline

  • Introduction
  • Applications
  • Approaches
  • Brief history
  • Evaluation
  • Examples
  • Text genres
  • Conclusions

References:

  • P. Koehn, Statistical Machine Translation, Cambridge University Press, 2009.
  • A. Lopez, Statistical Machine Translation, ACM Computing Surveys, vol. 40, number 3, 2008.
  • D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice Hall, 2009.
  • C. Manning and H. Sch¨

utze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

  • M. Federico

MT 2016 2

Machine Translation

Wikipedia Machine translation, often referred to by the acronym MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. Personal Definition MT generally investigates the automatic translation of ”standard” language that can be systematically observed in ordinary communication – e.g. conversations, news, speeches, business letters, user manuals, etc. –. MT is generally not concerned with literature genres, nor creative and sophisticated use of language. For several reasons, such kind of language is simply out of the scope of MT.

1For a very interesting introduction to issues related to the translation of literature work see Umberto Eco,

”Experiences in Translation”, U. Toronto Press, 2001.

  • M. Federico

MT 2016 3

Introduction to MT

  • M. Federico

MT 2016

slide-2
SLIDE 2

4

Introduction to MT

Why is machine translation so important?1

  • Information society and production of multilingual content

7 billion people - 193 countries - over 150 official languages

  • Globalization and demand for translation services:

1,000 global companies operating in at least 160 countries

  • Size of worldwide translation market:

37 billion $ per year ≈ 100 million $ per day

  • Size of translation industry:

30,000 translation companies, 250,000 translators

  • MT can improve productivity of human translators:

integration of MT with human translation (post-editing)

  • MT can supply cheap gist translation

competitive quality-cost-speed trade-off

1Sources: Common Sense Advisory, TAUS

  • M. Federico

MT 2016 5

Introduction to MT

Why is machine translation so difficult? High quality human translation implies:

  • deep and rich understanding of source language and text
  • sophisticated and creative command of target language

Nowadays, feasible goals for machine translation are tasks were:

  • even approximate translation are helpful (gist translation)
  • professional translators can take advantage of it (computer assisted translation)
  • linguistic domain is very focused and limited (apps for travelers)

In general, difficulty of translating depends on how similar the target and source languages are in their vocabulary, grammar, and conceptual structure.

  • M. Federico

MT 2016 6

Applications of MT

Gist translation for social media

  • M. Federico

MT 2016 7

Applications of MT

Carrier 12:00 PM Carrier 12:00 PM

Speech translation Apps

  • M. Federico

MT 2016

slide-3
SLIDE 3

8

Applications of MT

Integration in computer assisted translation

  • M. Federico

MT 2016 9

Differences and Similarities of Languages

  • Universal communicative role of language

– names for people, words for talking about women, men, children – every language seems to have nouns and verbs

  • Differences/similarities across large classes of languages:

– Morphology: one vs. many morphemes per words, agglutination vs. fusion – Syntax: Subj-Verb-Obj structure (E) vs. SOV (J) vs. VSO (Irish) – Semantics: mapping of semantic roles and meaning of words e.g. direction/manner of motion indicated by verb/satellite in the bottle floated out (E) → la botella sali´

  • flotando (S)
  • Lexical divergence between languages:

– Semantical: there is no corresponding word with the same meaning wall (E) → Wand/Mauer (G, inside/outside) – Syntactical: a word is better translated into another part-of-speech she likes to sing (E,v) → sie singt gerne (D,adv)

  • Cultural Differences: philosophical argument=is translation possible at all?
  • M. Federico

MT 2016 10

Lexical Divergence

English brother Japanese

  • tooto (younger)
  • niisan (older)

English is Japanese isu (subj animate) aru (subj not animate) English know French conna^ ıtre (be acquainted with) savoir (know a proposition) English they French ils (masculine) elles (feminine) German Berg English hill mountain

  • some languages make distinctions that other languages don’t
  • difficulty to translate from less specific into more specific information
  • ?? do language differences enforce different conceptual structures ??
  • ?? do people who speak different languages think differently ??2

2 Watch talk by Lera Boroditsky (U. Stanford), ”How Language Shapes Thought”, fora.tv.

  • M. Federico

MT 2016 11

Approaches to MT

  • M. Federico

MT 2016

slide-4
SLIDE 4

12

Approaches to MT

Rough classification according to employed linguistic representations:

  • Direct model: translate and re-order single words or n-grams

– basically, no linguistic representation is used

  • Transfer model: use explicit knowledge about language differences

– analyze lexical and syntactic structure of source sentence – transfer structures from source to target language – generate corresponding sentence in the target language

  • Interlingua model: extract the meaning and express it in the target language

– analyze lexical, syntactical and semantical structure of source sentence – interpret the meaning into a canonical interlingua – generate the target sentence from the interlingua Notice: required knowledge for the interlingua approach grows linearly with number of languages, rather than to the square.

  • M. Federico

MT 2016 13

Approaches to MT

Interlingua Direct Target String String Source Transfer A n a l y s i s G e n e r a t i

  • n

Syntax Semantics Semantics Syntax

  • M. Federico

MT 2016 14

Approaches to MT

How is knowledge and linguistic information acquired by the system?

  • Hand-crafted:

knowledge for analysis, transfer, generation, meaning representation, or direct translation is manually developed – most of commercial MT systems fall into this category – requires lots of human labor and expertise – includes: rule-based MT

  • Machine-learned: representations are implemented by mathematical models

learnable from data, e.g. parallel corpora of human translations – much less human effort is needed – requires huge amounts of data, the more, the better! – includes: statistical MT and neural MT

  • M. Federico

MT 2016 15

Approaches to MT

  • Transfer
  • Interlingua
  • Example-based
  • Statistical Word-based
  • Statistical Phrase-based
  • Statistical Tree-based
  • Statistical Hierarchical phrase-based
  • Neural
  • M. Federico

MT 2016

slide-5
SLIDE 5

16

Transfer Approach

context-free grammar

NP → DT NPB NPB → JJ NN NPB → NN · · · DT → the JJ → north NN → wind · · ·

Synchronous context-free grammar

NP → DT1 NPB2 / DT1 NPB2 NPB → JJ1 NN2 / NN2 JJ1 NPB → NN / NN · · · DT → the / il JJ → north / settentrionale NN → wind / vento · · · NP NPB DT the JJ NN north wind NP NPB DT il NN vento settentrionale JJ settentrionale

  • M. Federico

MT 2016 16

Transfer Approach

context-free grammar

NP → DT NPB NPB → JJ NN NPB → NN · · · DT → the JJ → north NN → wind · · ·

synchronous context-free grammar

NP → DT1 NPB2 / DT1 NPB2 NPB → JJ1 NN2 / NN2 JJ1 NPB → NN / NN · · · DT → the / il JJ → north / settentrionale NN → wind / vento · · · NP NPB DT the JJ NN north wind NP NPB DT il NN vento settentrionale JJ settentrionale

1This is a toy example. Working approaches use a very large set of probabilistic and lexicalized rules.

  • M. Federico

MT 2016 17

Interlingua Approach

  • Applied to linguistic domains with a limited number of relations and concepts

– tourist information, hotel booking, flight reservation, ...

  • Semantics of a sentence can be expressed with predicate argument structure

– I need a twin bed room reservation for tomorrow – book-room(date=tomorrow,type=single)

  • Interlingua language has to be designed carefully (by hand)

– for some application formalism similar to SQL language

  • Processing steps in IBMT:

– extract content from source sentence – map content into SQL like IL format

  • generate translation from IL format
  • M. Federico

MT 2016 18

Interlingua Approach

  • S3 : I’m arriving on june sixth
  • I: give-information+temporal+arrival (who=I, time=(june, md6))
  • T: my arrival time is sixth of june
  • S: no that’s not necessary
  • I: negate
  • T: no
  • S: and i was wondering what you have in the way of rooms available during

that time

  • I: request-information+availability+room (room-type=question)
  • T: what kind of rooms are available?

3S: speech (English), I: Interlingua, T: translation (English)

  • M. Federico

MT 2016

slide-6
SLIDE 6

19

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given a parallel corpus of translation examples

Italian German sono possibili deboli nevicate leichte Schneef¨ alle sind m¨

  • glich

sono possibili alcuni rovesci ein paar Regenschauer sind m¨

  • glich

le deboli precipitazioni cesseranno die leichte Niederschl¨ age klingen ab si verificheranno deboli precipitazioni leichte Niederschl¨ age werden einsetzen.

  • Learn Translation patterns

sono possibili X X sind m¨

  • glich

deboli precipitazioni leichte Niederschl¨ age

  • M. Federico

MT 2016 20

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given a parallel corpus of translation examples

Italian German sono possibili deboli nevicate leichte Schneef¨ alle sind m¨

  • glich

sono possibili alcuni rovesci ein paar Regenschauer sind m¨

  • glich

le deboli precipitazioni cesseranno die leichte Niederschl¨ age klingen ab si verificheranno deboli precipitazioni leichte Niederschl¨ age werden einsetzen.

  • Learn Translation patterns

Italian German sono possibili X − → X sind m¨

  • glich

deboli precipitazioni leichte Niederschl¨ age

  • M. Federico

MT 2016 21

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given a parallel corpus of translation examples

Italian German sono possibili deboli nevicate leichte Schneef¨ alle sind m¨

  • glich

sono possibili alcuni rovesci ein paar Regenschauer sind m¨

  • glich

le deboli precipitazioni cesseranno die leichte Niederschl¨ age klingen ab si verificheranno deboli precipitazioni leichte Niederschl¨ age werden einsetzen.

  • Learn Translation patterns

Italian German sono possibili X − → X sind m¨

  • glich

deboli precipitazioni − → leichte Niederschl¨ age

  • M. Federico

MT 2016 22

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given a parallel corpus of translation examples

Italian German sono possibili deboli nevicate leichte Schneef¨ alle sind m¨

  • glich

sono possibili alcuni rovesci ein paar Regenschauer sind m¨

  • glich

le deboli precipitazioni cesseranno die leichte Niederschl¨ age klingen ab si verificheranno deboli precipitazioni leichte Niederschl¨ age werden einsetzen.

  • Learn Translation patterns

Italian German sono possibili X − → X sind m¨

  • glich

deboli precipitazioni − → leichte Niederschl¨ age

  • M. Federico

MT 2016

slide-7
SLIDE 7

23

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given translation patterns learned from data

Italian German sono possibili X − → X sind m¨

  • glich

deboli precipitazioni − → leichte Niederschl¨ age

  • Translate a (possibly new) source sentence

Italian German sono possibili deboli precipitazioni

  • M. Federico

MT 2016 24

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given translation patterns learned from data

Italian German sono possibili X − → X sind m¨

  • glich

sono precipitazioni − → leichte Niederschl¨ age

  • Translate a (possibly new) source sentence

Italian German sono possibili deboli precipitazioni

  • M. Federico

MT 2016 25

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given translation patterns learned from data

Italian German sono possibili X − → X sind m¨

  • glich

deboli precipitazioni − → leichte Niederschl¨ age

  • Translate a (possibly new) source sentence

Italian German sono possibili deboli precipitazioni − → leichte Niederschl¨ age sind m¨

  • glich
  • M. Federico

MT 2016 26

Example-Based Approach

  • Assumption: people translate by analogy

– Decompose a sentence into phrases – Translate phrases by analogy to previous translations – Properly compose translation fragments into one long sentence

  • Given translation patterns learned from data

Italian German sono possibili X − → X sind m¨

  • glich

deboli precipitazioni − → leichte Niederschl¨ age

  • Translate a (possibly new) source sentence

Italian German sono possibili deboli precipitazioni − → leichte Niederschl¨ age sind m¨

  • glich
  • M. Federico

MT 2016

slide-8
SLIDE 8

27

MT and Cryptography

Warren Weaver (1894-1978) (Memorandum, July 1949)

  • M. Federico

MT 2016 28

MT and Cryptography

  • The Rosetta Stone was found

in Egypt in 1799

  • It

contains the translation

  • f Egyptian hieroglyphs into

Demotic and Greek

  • Thanks to it expert were able

to decipher the language of hieroglyphs

  • Couldn’t a machine just learn

in the same way by analysing many examples

  • f

human translation?

  • M. Federico

MT 2016 29

Statistical MT

  • parallel texts
  • rientale

vento freddo un soffierà domani di serata dalla blow will wind chilly eastern an evening tomorrow since eastern Alps the affects breeze cool an un Alpi le interessa est da freddo vento

  • M. Federico

MT 2016 29

Statistical MT

  • parallel texts and word alignments
  • rientale

vento freddo un soffierà domani di serata dalla blow will wind chilly eastern an evening tomorrow since un Alpi le interessa est da freddo vento eastern Alps the affects breeze cool an

  • M. Federico

MT 2016

slide-9
SLIDE 9

29

Statistical MT

  • parallel texts and word alignments
  • rientale

vento freddo un soffierà domani di serata dalla blow will wind chilly eastern an evening tomorrow since un Alpi le interessa est da freddo vento eastern Alps the affects breeze cool an

  • word translation probabilities

15 0.15 chill ... ... ... probs 0.43 0.10 0.28 28 cool cold 43 10 chilly counts translations of freddo

  • M. Federico

MT 2016 29

Statistical MT

  • parallel texts and word alignments
  • rientale

vento freddo un soffierà domani di serata dalla blow will wind chilly eastern an evening tomorrow since un Alpi le interessa est da freddo vento eastern Alps the affects breeze cool an

  • word translation probabilities

15 0.15 chill ... ... ... probs 0.43 0.10 0.28 28 cool cold 43 10 chilly counts translations of freddo ... ... ... 59 0.59 wind probs 0.26 26 breeze counts translations of vento

  • M. Federico

MT 2016 29

Statistical MT

  • parallel texts and word alignments
  • rientale

vento freddo un soffierà domani di serata dalla blow will wind chilly eastern an evening tomorrow since un Alpi le interessa est da freddo vento eastern Alps the affects breeze cool an

  • word translation probabilities

15 0.15 chill ... ... ... probs 0.43 0.10 0.28 28 cool cold 43 10 chilly counts translations of freddo ... ... ... 59 0.59 wind probs 0.26 26 breeze counts translations of vento

  • word concatenation probabilities

5 0.05 eastern cool ... eastern ... ... probs 0.12 0.10 0.07 7 eastern breeze eastern wind 12 10 eastern chilly counts bigrams with eastern

  • M. Federico

MT 2016 30

Statistical MT

  • given word translations and word concatenation probabilities

15 0.15 chill ... ... ... probs 0.43 0.10 0.28 28 cool cold 43 10 chilly counts translations of freddo ... ... ... 59 0.59 wind probs 0.26 26 breeze counts translations of vento 5 0.05 eastern cool ... eastern ... ... probs 0.12 0.10 0.07 7 eastern breeze eastern wind 12 10 eastern chilly counts bigrams with eastern

  • generate possible translations of the source sentence

un freddo vento da est

  • M. Federico

MT 2016

slide-10
SLIDE 10

30

Statistical MT

  • given word translations and word concatenation probabilities

15 0.15 chill ... ... ... probs 0.43 0.10 0.28 28 cool cold 43 10 chilly counts translations of freddo ... ... ... 59 0.59 wind probs 0.26 26 breeze counts translations of vento 5 0.05 eastern cool ... eastern ... ... probs 0.12 0.10 0.07 7 eastern breeze eastern wind 12 10 eastern chilly counts bigrams with eastern

  • generate possible translations of the source sentence and score them

0.10 an eastern chilly wind 0.09 a eastern cool wind ... ... an eastern chilly breeze 0.05 a cold eastern wind 0.12 0.08 un freddo vento da est a cool eastern breeze

  • M. Federico

MT 2016 30

Statistical MT

  • given word translations and word concatenation probabilities

15 0.15 chill ... ... ... probs 0.43 0.10 0.28 28 cool cold 43 10 chilly counts translations of freddo ... ... ... 59 0.59 wind probs 0.26 26 breeze counts translations of vento 5 0.05 eastern cool ... eastern ... ... probs 0.12 0.10 0.07 7 eastern breeze eastern wind 12 10 eastern chilly counts bigrams with eastern

  • generate possible translations of the source sentence and score them

0.10 an eastern chilly wind 0.09 a eastern cool wind ... ... an eastern chilly breeze 0.05 a cold eastern wind 0.12 0.08 un freddo vento da est a cool eastern breeze

  • return the best scoring translation
  • M. Federico

MT 2016 31

Phrase-based Statistical MT

  • Given automatic alignment of words in parallel texts:
  • rientale

vento freddo un soffierà domani di serata dalla blow will wind chilly eastern an evening tomorrow since

  • Aligned words blocks, or phrases, are detected:
  • Phrase pairs are collected and stored with their probabilities

dalla serata di domani - since tomorrow morning un freddo vento occidentale - an eastern chilly wind will blow - soffier` a serata di domani - tomorrow evening dalla - since di domani - tomorrow ...

  • ...
  • M. Federico

MT 2016 32

Phrase-based Statistical MT

Then, we use a search algorithm to find the optimal translation with phrase-pairs How decoding works

  • translation: segment input, translate and re-arrange phrases
  • steps: select a source segment, translate and attach to target
  • M. Federico

MT 2016

slide-11
SLIDE 11

32

Statistical MT

Then, we use a search algorithm to find the optimal translation with phrase-pairs How decoding works

  • translation: segment input, translate and re-arrange phrases
  • steps: select a source segment, translate and attach to target
  • M. Federico

MT 2016 32

Phrase-based Statistical MT

Then, we use a search algorithm to find the optimal translation with phrase-pairs How decoding works

  • translation: segment input, translate and re-arrange phrases
  • steps: select a source segment, translate and attach to target
  • M. Federico

MT 2016 32

Phrase-based Statistical MT

Then, we use a search algorithm to find the optimal translation with phrase-pairs How decoding works

  • translation: segment input, translate and re-arrange phrases
  • steps: select a source segment, translate and attach to target
  • M. Federico

MT 2016 32

Phrase-based Statistical MT

Then, we use a search algorithm to find the optimal translation with phrase-pairs How decoding works

  • translation: segment input, translate and re-arrange phrases
  • steps: select a source segment, translate and attach to target
  • scores: linear combination of feature functions
  • features: phrase pairs, target n-grams, relative phrase movement , ...
  • decoder: efficient algorithm to compute (sub-)optimal solutions
  • features and combination weights are machine learnable from parallel data
  • M. Federico

MT 2016

slide-12
SLIDE 12

33

Hierarchical Phrase-Based Statistical MT

  • First tree-to-tree approach outperforming phrase-based statistical MT
  • n large scale evaluations involving very distant languages
  • Discontinuous phrases, i.e. phrases with gaps
  • Long-range reordering rules
  • Formalized as synchronous context-free grammars (transfer approach?)
  • Not based on syntactic rules: just two non-terminal symbols!
  • The model is fully machine learnable!
  • Example. Chinese-English: original, transliteration, glosses, and translation
  • M. Federico

MT 2016 34

HPBSMT: Motivations

  • Example. Typical Phrase-Based Chinese-English Translation:

Let us model some syntactic differences with simple phrase-rules:

  • Chinese VPs follow PPs / English VPs precede PPs

yu X1 you X2 / have X2 with X1

  • Chinese NPs follow RCs / English NPs precede RCs

X1 de X2 / the X2 that X1

  • translation of zhiyi construct in English word order

X1 zhiyi / one of X1 Our rules use one non-terminal X and indices to mark multiple occurrences. These rules can be inferred automatically from word-aligned parallel data.

  • M. Federico

MT 2016 35

HPBSMT: Example of Rules

S → X1 / X1 (1) S → S1 X2 / S1 X2 (2) X → yu X1 you X2 / have X2 with X1 (3) X → X1 de X2 / the X2 that X1 (4) X → X1 zhiyi / one of X1 (5) X → Aozhou / Australia (6) X → Beihan / N. Korea (7) X → she / is (8) X → bangjiao / dipl.rels. (9) X → shaoshu guojia / few countries (10)

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016

slide-13
SLIDE 13

36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016

slide-14
SLIDE 14

36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016

slide-15
SLIDE 15

36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016 36

  • M. Federico

MT 2016 37

Neural Machine Translation

  • Models the translation process as a one to many map φ : f −

→ e

  • The function φ is implemented through a layered architecture
  • Each layer contains nodes of the same type
  • Each node computes some function on the output of the previous layer

Figure 1: Neural network node

  • M. Federico

MT 2016

slide-16
SLIDE 16

38

Neural Machine Translation

  • The input layer reads the input data
  • The output layer computes the values of the map φ
  • Intermediate layers are called hidden layers
  • The hidden layer compresses the input (Encoder)
  • The output layer generates the output (Decoder)

Figure 2: Feed-forward Neural Network

  • M. Federico

MT 2016 39

Neural Machine Translation

  • MT input and output do not have fixed lenght!
  • A better idea is to use Recurrent Neural Networks (RNNs)
  • RNNs read one word at the time and generate a state
  • RNNs are good at memorizing and generating sentences
  • RNNs are trained to reproduce the target from the source
  • Two RNNs are combined:

– Encoder memorizes the source sequence – Decoder generates the target sequence In the following we see a ”simplified” RNN architecture for NMT4

4 Figures from ”Introduction to Neural Machine Translation with GPUs” by Kyunghyun Cho, 2015.

  • M. Federico

MT 2016 40

Neural Machine Translation

Figure 3: Encoder RNN for MT and its time unfolded representation.

  • M. Federico

MT 2016 41

Neural Machine Translation

Figure 4: Time Unfolded Encoder RNN for MT

  • M. Federico

MT 2016

slide-17
SLIDE 17

42

Neural Machine Translation

Figure 5: Time Unfolded Decoder RNN for MT

  • M. Federico

MT 2016 43

Neural Machine Translation

  • In 2010, networks with many layers (deep learning) boosted ASR
  • In 2015, RNN outperformed SMT on some large scale tasks
  • > 20 years empirical experience

– new types of nodes, e.g. LSTM – new models, e.g. Attention Model – more powerful computers, e.g. GPUs

  • Represents another paradigm shift:

– Rule-based MT is rule engineering from knowledge – Statistical MT is feature engineering from data – Neural MT reduces to network design

  • Pros: fast to test, strong generalisation capabilities
  • Cons: slow to train, weak at memorizing training data
  • M. Federico

MT 2016 44

A Brief History of Machine Translation

  • M. Federico

MT 2016 45

A Brief History of Machine Translation

before 1900 various suggestions about “mechanic” translation 1933 French Patent by George Artsouni: storage device on paper tape to find translations of words Russian Patent by Petr Petrovich Troyanskii: lexical-syntactic transfer (base-forms+syntactic functions) 1949 memorandum by Warren Weaver (and Andrew D. Booth): cryptography methods, statistical methods, Shannon’s theory 1951 First research position on MT at MIT 1954 rule-based MT project by Georgetown U. + IBM: public demo Russian to English (Vocab: 250 words, Grammar: 6 rules) 1955

  • U. Leningrad: interlingua as artificial language

4A rich source of historical information about MT is in John Hutchins’ website http://www.hutchinsweb.me.uk.

  • M. Federico

MT 2016

slide-18
SLIDE 18

46

A Brief History of Machine Translation

1933: translating machine by Artsouni

  • M. Federico

MT 2016 47

A Brief History of Machine Translation

1954: computer IBM 701 used for the first MT demo.

  • M. Federico

MT 2016 48

A Brief History of Machine Translation

1956-1966 large scale funding in US: high expectation & disillusion 1957 Peter Toma starts building Systran 1958

  • U. Washington, IBM : word-for-word approach

Russian-English system for US Air Force (up to 1970) 1960 RAND corp. rough translation with statistical approach 1961

  • U. Georgetown (+ P. Toma) Russian to English demo

rule based (more levels of analysis) around 1960 MIT and U. Texas work on syntactic transfer approach 1967 ALPAC report: US funding drastically reduced for 10 years 1970-1981

  • U. Montreal, TAUM project: rule-based, logic-programming

success with weather forecasts, failure with aviation manuals 1960-1971

  • U. Texas and U. Grenoble work on interlingua approach, logic

1975 interlingua looses interest

  • M. Federico

MT 2016 49

A Brief History of Machine Translation

1980 - Rule based transfer and new interlingua approaches based on linguistic theories, logic programming, AI 1990 - Rule based MT dominance is broken Statistical alignment models for French-English (IBM) Example-based translation (Sato and Nagao, Japan) 1990 - Speech translation projects: limited domains ATR, Kyoto: automatic telephony research CSTAR consortium (US, Europe, Asia) Verbmobil project (Germany) 2000 - Unrestricted Language Translation Automatic evaluation metrics for MT (IBM) TIDES/GALE (US): written/spoken news Chi/Ara to Eng TC-STAR (EU): news Chi to Eng speeches Spa-Eng

  • M. Federico

MT 2016

slide-19
SLIDE 19

50

A Brief History of Machine Translation

2005 - Open source for MT Toolkits: Moses, Hiero, Cdec, SRILM, Irstlm, KENLM ... Resources: Europarl, UN, French-English 109 corpus 2010 - MT for post-editing Tasks: online learning, QE, APE Projects: MateCat, CasmaCat, TransCenter, LILT 2015 - Neural MT Breakthroughs: Enc-Dec RNN, LSTM units, attention model Toolkits: Theano, Tensorflow, ...

  • M. Federico

MT 2016 51

Machine Translation Evaluation

  • M. Federico

MT 2016 52

Machine Translation Evaluation

Experimental research in HLT generally follows this development cycle:

Insight Model Decide Results Benchmark Discard Improve Analyse Implement Evaluate Exploit

Evaluation bottleneck MT developers need to monitor the effect of changes to their systems in order to weed out bad ideas from good ideas!

  • M. Federico

MT 2016 53

Machine Translation Evaluation

How do we evaluate the output of a MT system?

  • Human MT evaluation

– criteria: adequacy, fluency, ranking, post-edit effort – pros: very accurate, high quality – cons: expensive, slow, difficult, subjective, difficult to reproduce

  • Automatic MT evaluation

– criteria: similarity with respect to one or more human translations – pros: cheap, quick, correlates with human judgments, good sensitivity – cons: difficult to interpret MT systems can be tuned to optimize automatic metrics!

  • M. Federico

MT 2016

slide-20
SLIDE 20

54

Machine Translation Evaluation

Compare MT output against one or more human translations (references):

  • Word alignment methods

– WER: minimal number of word insertion, deletions, and substitutions – TER: extends edit distance with n-gram shifts

  • N-gram matching methods

– BLEU: matching n-grams – NIST: extends BLEU by weighting n-gram with informativeness – GTM: F-score of matching n-grams rewarding longer matches. – METEOR: weighted F-score of 1-gram matches with synonymy matching. – ... – ....

  • M. Federico

MT 2016 55

WER: Word Error Rate

Compute alignment between hypothesis (H) and reference (R) that minimizes editing operations (word insertions, deletions, and substitutions) to fix H. H: it is a guide to action which ensures that the military ... R: it is a guide to action that ensures that the military ... .... always obeys the

  • commands of the party

.... will forever heed party commands -

  • To fix H we need to apply 4 substitutions + 1 insertion + 3 deletions = 8

We compute the ratio between the number of operations on the length of R WER = 8 16 = 0.50

  • M. Federico

MT 2016 56

TER: Translation Error Rate

  • H: this week the Saudis denied information published in the New

York Times

  • R:

Saudi Arabia denied this week information published in the American New York Times

  • “this week” is shifted
  • “Saudi Arabia” in the REF appears as “the Saudis” in the HYP
  • “American” appears only in the REF

In this case, the number of edits is 4 (1 shift, 2 substitutions, and 1 deletion): TER% = 4 13 × 100 = 30.8% Minimizing edits is computationally hard, hence TER values are suboptimal.

  • M. Federico

MT 2016 57

BLEU

H: it is a guide to action which ensures that the military always

  • beys the commands of the party

R1: it is a guide to action that ensures that the military will forever heed party commands R2: it is the guiding principle which guarantees the military forces always being under the command of the party 1-grams matches 15/18 = 0.83 2-grams matches: 10/17=0.59 3-grams matches: 7/16=0.44 4-gram ....

  • BLEU averages precision scores of 1-grams, 2-grams, ..., 5-grams
  • Zero matches → BLEU=0, 100% matches on all n-grams → BLEU=1
  • Length penalty applied if length of hypothesis is shorter than closest reference
  • M. Federico

MT 2016

slide-21
SLIDE 21

58

Examples

  • M. Federico

MT 2016 59

Example 1: Arabic English

Human Dubai 2 - 7 ( AFP ) - The Secretary-General of the United Nations Kofi Annan said he would donate the international Zayed Prize for the Environment , which he received on Monday night in Dubai worth 500000 dollars , to setup a foundation for agriculture and educating girls in Africa . Machine Dubai 2-7 (AFP) - United Nations Secretary-General Kofi Annan said that the award will Zayed International Environment, which received Monday evening in Dubai worth 500,000 dollars to establish an institution for agriculture and education of girls in the African continent.

  • M. Federico

MT 2016 60

Example 1: Arabic English

Human Dubai 2 - 7 ( AFP ) - The Secretary-General of the United Nations Kofi Annan said he would donate the international Zayed Prize for the Environment , which he received on Monday night in Dubai worth 500000 dollars , to setup a foundation for agriculture and educating girls in Africa . Machine Dubai 2-7 (AFP) - United Nations Secretary-General Kofi Annan said that the award will Zayed International Environment, which ... he ... received ... on... Monday evening in Dubai worth 500,000 dollars ... , will be donated ... to establish an institution for agriculture and education

  • f girls in the African continent.

Looks useful for post-editing!

  • M. Federico

MT 2016 61

Example 2: Arabic English

Human New York ( The United Nations ) 2 - 8 ( AFP ) - United Nations Secretary General Kofi Annan expressed his concern today , Tuesday , about the wave of targeted liquidations being carried out by Israel in Gaza and the West Bank , and he also condemned the rocket attacks targeting the Hebrew State , according to his spokesman . Machine New York (United Nations) 2-8 (AFP) - United Nations Secretary General Kofi Annan expressed concern today, Tuesday, the wave of qualifiers quality by Israel in Gaza and the West Bank, also condemned the missile attacks against the Jewish state, his spokesman said.

  • M. Federico

MT 2016

slide-22
SLIDE 22

62

Example 2: Arabic English

Human New York ( The United Nations ) 2 - 8 ( AFP ) - United Nations Secretary General Kofi Annan expressed his concern today , Tuesday , about the wave of targeted liquidations being carried out by Israel in Gaza and the West Bank , and he also condemned the rocket attacks targeting the Hebrew State , according to his spokesman . Machine New York (United Nations) 2-8 (AFP) - United Nations Secretary General Kofi Annan expressed concern today, Tuesday, ... about ... the wave of qualifiers quality targeted liquidations by Israel in Gaza and the West Bank, ... and he ... also condemned the missile attacks against the Jewish state, his spokesman said. Looks also useful for post-editing!

  • M. Federico

MT 2016 63

Example 3: Chinese English

Human Today was the Catholic Church’s annual ”Life Day”. Pope Benedict XVI delivered a speech in St. Peter’s Basilica, in which he criticized that the hedonism of wealthy society impairs the Christian value system of respect for life, and he strongly condemned abortion and euthanasia. Machine Today is the ”life” of the Catholic Church once a year, when 16 of the pope delivered a speech in St. Peter’s cathedral, criticized the joy of an affluent society, undermine the values of the Christian faith to respect life, and strongly condemned euthanasia and abortion.

  • M. Federico

MT 2016 64

Example 3: Chinese English

Human (?) Today was the Catholic Church’s annual ” Life Day ”. Pope Benedict XVI delivered a speech in St. Peter’s Basilica, in which he criticized that the hedonism of ...our... wealthy society ...which... impairs the Christian value system of respect for life, and he strongly condemned abortion and euthanasia. Machine Today is the ”life ..day...” of the Catholic Church once a year, when 16 of the pope delivered a speech in St. Peter’s cathedral, ...he... criticized the joy of an affluent society, ... that... undermines the values of the Christian faith to respect life, and strongly condemned euthanasia and abortion. Difficult to make out the meaning of this!

  • M. Federico

MT 2016 65

Example 4: Chinese English

Human The Pope told thousands of believers making the pilgrimage to St . Peter’s Basilica , ” Life is often glorified during times of happiness , but no longer respected during times

  • f sickness and trouble or when it is impaired . ”

Machine The pope told thousands who came to St. Peter’s church followers, ”when the joys of life were often, but sick or disabled, will no longer be respected.”

  • M. Federico

MT 2016

slide-23
SLIDE 23

66

Example 4: Chinese English

Human The Pope told thousands of believers making the pilgrimage to St . Peter’s Basilica , ” Life is often glorified during times of happiness , but no longer respected during times

  • f sickness and trouble or when it is impaired . ”

Machine The pope told thousands ... of followers... who came to St. Peter’s church followers, ”when the-re is joys of life were..was.. often ..glorified.., but ...when... sick or disabled, will..it is.. no longer be respected.” Slightly better but still hard to grasp!

  • M. Federico

MT 2016 67

MT of Text Genres

There is very much content to be translated in the world and not all of it is actually expressed with high quality, creative and sophisticated language. Text or speech genres can be characterized by:

  • Purpose: e.g., informative, persuasive, instructive
  • Type: e.g. narrative, argumentative, descriptive, expository
  • Register: e.g. formal, casual, intimate, ...
  • Style: e.g. dialogic, descriptive, grammatical choices, sentence length, ...

Different genres present different translation difficulties, e.g.:

  • Novels, speeches, critical reviews: style, rhetorical figures, idioms, ...
  • News stories, technical documentation: names, terminology, ...

Remark: MT is so far better addressing genres using simple linguistic structures and words with a literal although technical meaning.

  • M. Federico

MT 2016 68

MT of Text Genres

Example from The New York Times, Critics’s Notebook, 2011: A string of tedious shows can turn the intrepid theater goer into a couch potato.

  • Genre: critical review
  • Purpose: persuasive
  • Type: argumentative
  • Register: formal
  • Style: use of humorous, idioms, rhetorical figures (hyperbole)

[Exercise 2. Try to translated this sentence with an on-line translation systems and comment the results.]

  • M. Federico

MT 2016 68

MT of Text Genres

Example from The New York Times, Critics’s Notebook, 2011: A string of tedious shows can turn the intrepid theater goer into a couch potato.

  • Genre: critical review
  • Purpose: persuasive
  • Type: argumentative
  • Register: formal
  • Style: use of humorous, idioms, rhetorical figures (hyperbole)

MT output by Google Translate: Una serie di spettacoli noiosi pu`

  • trasformare il frequentatore di teatro

intrepido in un teledipendente. MT output correctly reflects the structure and meaning, but part of the writing style and consequent effect are lost. Rendering the original effect likely requires language and world knowledge of a native speaker.

  • M. Federico

MT 2016

slide-24
SLIDE 24

69

MT of Text Genres

Example from IBM’s online technical documentation, 2013: Similarly, each message displayed in the interface includes a link to the help for that message.

  • Genre: technical documentation
  • Purpose: informative
  • Type: expository
  • Register: formal
  • Style: use literal meaning of words though technical, neutral text, low ambiguity

(polysemy, coreference, ...) [Exercise 4. Try to translated this sentence with an on-line translation systems and comment the results.]

  • M. Federico

MT 2016 69

Translation of Text Genres

Example from IBM’s online technical documentation, 2013: Similarly, each message displayed in the interface includes a link to the help for that message.

  • Genre: technical documentation
  • Purpose: informative
  • Type: expository
  • Register: formal
  • Style: use literal meaning of words though technical, neutral text, low ambiguity

(polysemy, coreference, ...) MT output by Google Translate: Allo stesso modo, ogni messaggio visualizzato nell’interfaccia include un collegamento alla Guida per quel messaggio. MT output correctly reflects the grammatical structure, meaning and style.

  • M. Federico

MT 2016 70

Conclusions

  • MT is a very competitive technology

– statistical and machine learning methods are dominant – several commercial MT systems: Google, Microsoft, IBM, LW, ...

  • Evaluation campaigns are organized every year:

– NIST: news texts - Chi/Ara to Eng (2002-) – IWSLT: lectures/speeches - Asian-EU languages (2004-) – WMT: news texts - many EU languages (2005-)

  • Best performing MT systems use either:

– brute force direct translation exploiting large amounts of data – phrase-based, tree-based models, neural networks

  • Automatic evaluation has boosted research in MT:

– model training can directly optimize the evaluation metric

  • Still hard problems:

– morphologically reach languages (data sparseness) – distant language pairs (word re-ordering) – long dependencies (context)

  • M. Federico

MT 2016