Outline Multi-Engine Machine Translation 1 Alignment Search Space - - PowerPoint PPT Presentation

SLIDE 1

Multi-Engine Machine Translation Model Combination Other Combination Approaches

Outline

1 Multi-Engine Machine Translation
  • Alignment
  • Search Space
  • Features
  • Match
2 Model Combination
3 Other Combination Approaches

Kenneth Heafield System Combination

SLIDE 2

Pipeline (diagram): Input → Translate ×3 (individual systems) → Align (METEOR) → Decode (this work: MEMT) → Output

SLIDE 3

Arabic-English Example Combination

System 1: So even if that was meaningful , it is because you were late
System 2: Even if feasible , it is because you have been delayed
Combined (from Systems 1 and 2): Even if feasible , it is because you were late
Reference (for comparison): And even if that was useful , it was because you were late

SLIDE 4

Sentence Pair Alignment

  • Match surface, stems, WordNet synsets, and automatic paraphrases
  • Minimize crossing alignments

Example sentence pair:
Twice that produced by nuclear plants
Double that that produce nuclear power stations

Lavie and Agarwal, METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments, WMT 2007.

SLIDE 5

Overall Alignment: Urdu-English Example

1 Russian President Putin Mir it for a big success .
2 The Russian president the result of a big victory for Putin .

SLIDE 6

Overall Alignment: Urdu-English Example

1 Russian President Putin Mir it for a big success .
2 The Russian president the result of a big victory for Putin .

1 Russian President Putin Mir it for a big success .
3 For the result Russian President Mir Putin is a great success .

2 The Russian president the result of a big victory for Putin .
3 For the result Russian President Mir Putin is a great success .

SLIDE 7

Search Space

Algorithm
  • Start at the beginning of each sentence
  • Branch by appending the first unused word from a system

Example
System 1: Now can know why .
System 2: Now we can now know why .

Partial Hypothesis: (empty)
Next word candidates: Now (System 1), Now (System 2)

SLIDE 8

Search Space

Algorithm
  • Start at the beginning of each sentence
  • Branch by appending the first unused word from a system
  • Use the appended word and those aligned with it

Example
System 1: Now can know why .
System 2: Now we can now know why .

Partial Hypothesis: Now
Next word candidates: can (System 1), we (System 2)

SLIDE 9

Search Space

Algorithm
  • Start at the beginning of each sentence
  • Branch by appending the first unused word from a system
  • Use the appended word and those aligned with it
  • Loop until all hypotheses reach end of sentence

Example
System 1: Now can know why .
System 2: Now we can now know why .

Partial Hypothesis: Now we
Next word candidates: can (System 1), can (System 2)

SLIDE 10

Search Space

Algorithm
  • Start at the beginning of each sentence
  • Branch by appending the first unused word from a system
  • Use the appended word and those aligned with it
  • Loop until all hypotheses reach end of sentence

Example
System 1: Now can know why .
System 2: Now we can now know why .

Partial Hypothesis: Now we can
Next word candidates: know (System 1), now (System 2)
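The branching steps in this example can be traced with a minimal Python sketch. It is an illustration under stated assumptions, not the MEMT decoder: the alignment links in LINKS are assumed (identical words only), the names candidates, extend, and surface are hypothetical, and scoring, pruning, and the end-of-sentence condition are left out.

```python
from typing import Dict, List, Set, Tuple

Word = Tuple[int, int]  # (system index, position in that system's sentence)

SYSTEMS: List[List[str]] = [
    "Now can know why .".split(),
    "Now we can now know why .".split(),
]

# Assumed alignment links between identical words; a real aligner supplies these.
LINKS: List[Tuple[Word, Word]] = [
    ((0, 0), (1, 0)),  # Now - Now
    ((0, 1), (1, 2)),  # can - can
    ((0, 2), (1, 4)),  # know - know
    ((0, 3), (1, 5)),  # why - why
    ((0, 4), (1, 6)),  # . - .
]

# Symmetric lookup: word -> set of words aligned with it.
ALIGNED: Dict[Word, Set[Word]] = {}
for a, b in LINKS:
    ALIGNED.setdefault(a, set()).add(b)
    ALIGNED.setdefault(b, set()).add(a)


def candidates(used: Set[Word]) -> List[Word]:
    """Return the first unused word of each system, in system order."""
    found = []
    for system, sentence in enumerate(SYSTEMS):
        for position in range(len(sentence)):
            if (system, position) not in used:
                found.append((system, position))
                break
    return found


def extend(hypothesis: List[str], used: Set[Word], word: Word):
    """Append a word and mark it, plus everything aligned with it, as used."""
    system, position = word
    new_used = used | {word} | ALIGNED.get(word, set())
    return hypothesis + [SYSTEMS[system][position]], new_used


def surface(words: List[Word]) -> List[str]:
    """Map (system, position) pairs back to their word strings."""
    return [SYSTEMS[s][p] for s, p in words]


hyp, used = extend([], set(), (0, 0))   # append "Now" from System 1
print(hyp, surface(candidates(used)))   # candidates: can, we
hyp, used = extend(hyp, used, (1, 1))   # append "we" from System 2
print(hyp, surface(candidates(used)))   # candidates: can, can
hyp, used = extend(hyp, used, (0, 1))   # append "can" from System 1
print(hyp, surface(candidates(used)))   # candidates: know, now
```

Appending "we" leaves System 1's pointer on "can" because "we" has no aligned partner, while appending "can" advances both systems at once.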

SLIDE 11

Outline

1 Multi-Engine Machine Translation
  • Alignment
  • Search Space
  • Features
  • Match
2 Model Combination
3 Other Combination Approaches

SLIDE 12

Features

Length
  • Length of hypothesis
Language Model
  • Model: log probability from an ARPA language model
  • OOV: count of words not found in the model
Match
  • Count of n-grams matching each system

SLIDE 13

Feature Rationale

Length
  • Length of hypothesis
  • Rationale: compensate for length’s impact on other features
Language Model
  • Model: log probability from an ARPA language model
  • OOV: count of words not found in the model
  • Rationale: fluent output with tuned OOV penalty
Match
  • Count of n-grams matching each system
  • Rationale: agreement with translation systems

SLIDE 14

Match Features

System 1: Supported Proposal of France
System 2: Support for the Proposal of France
Hypothesis: Support for Proposal of France

Count     Unigram  Bigram  Trigram  Quadgram
System 1  4        2       1
System 2  5        3       1
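The counts in this table can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the EQUIVALENTS mapping is a hypothetical stand-in for the aligner (it treats "Supported" and "Support" as a match), and the function names are illustrative rather than taken from MEMT. The quadgram column, blank on the slide, comes out as 0 here.

```python
from typing import Dict, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def normalize(sentence: str, equivalents: Dict[str, str]) -> List[str]:
    """Lowercase tokens and map aligned variants onto one representative."""
    return [equivalents.get(t.lower(), t.lower()) for t in sentence.split()]


def match_counts(hypothesis: str, system: str,
                 equivalents: Dict[str, str], max_n: int = 4) -> List[int]:
    """Count hypothesis n-grams (n = 1..max_n) that also occur in the system output."""
    hyp = normalize(hypothesis, equivalents)
    ref = normalize(system, equivalents)
    counts = []
    for n in range(1, max_n + 1):
        system_ngrams = set(ngrams(ref, n))
        counts.append(sum(1 for g in ngrams(hyp, n) if g in system_ngrams))
    return counts


# Hypothetical stand-in for an alignment: treat these two forms as matching.
EQUIVALENTS = {"supported": "support"}

HYPOTHESIS = "Support for Proposal of France"
SYSTEM_1 = "Supported Proposal of France"
SYSTEM_2 = "Support for the Proposal of France"

print(match_counts(HYPOTHESIS, SYSTEM_1, EQUIVALENTS))  # [4, 2, 1, 0]
print(match_counts(HYPOTHESIS, SYSTEM_2, EQUIVALENTS))  # [5, 3, 1, 0]
```

With an empty EQUIVALENTS mapping, System 1's unigram count drops to 3, which shows that the table's 4 relies on the approximate match between "Supported" and "Support".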

SLIDE 15

What’s in a match?

Exact matches
  • Lexical choice
  • Choosing between aligned alternatives
Approximate matches
  • Vote to include/exclude text
  • Word order
Answer
  • Use both types of features
  • Exact matches effectively get a tunable bonus

SLIDE 16

Pipeline (diagram): Input → three individual systems, each producing a Hypergraph → Select (Model Combination) → Output

SLIDE 17

Model Combination is Hypothesis Selection

The Search Space
  • Union of search spaces from each system
  • Combined sentence must be in one system’s hypergraph
Formally
  • Every system outputs a hypergraph
  • Phrasal lattice is just a special-case hypergraph
  • Add a root node and an edge to each system root

SLIDE 18

Model Combination is Hypothesis Selection

The Search Space
  • Union of search spaces from each system
  • Combined sentence must be in one system’s hypergraph
Formally
  • Every system outputs a hypergraph
  • Phrasal lattice is just a special-case hypergraph
  • Add a root node and an edge to each system root
Source Alignment
  • Hypergraphs retain alignment to source
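The "add a root node and an edge to each system root" step can be sketched in Python as follows. The Hypergraph class, the combine function, and the node-naming scheme are all hypothetical, and the toy hypergraphs hold a single derivation each; real decoders store weights and source alignments on the edges, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A hyperedge is (label, tail nodes); terminal edges have no tails.
Hyperedge = Tuple[str, Tuple[str, ...]]


@dataclass
class Hypergraph:
    root: str
    # Maps each node to the hyperedges that derive it.
    incoming: Dict[str, List[Hyperedge]] = field(default_factory=dict)


def combine(systems: List[Hypergraph], new_root: str = "ROOT") -> Hypergraph:
    """Union the hypergraphs under a fresh root with one edge per system root."""
    combined = Hypergraph(root=new_root, incoming={new_root: []})
    for i, hg in enumerate(systems):
        prefix = f"sys{i}/"  # keep node names from different systems apart
        for head, edges in hg.incoming.items():
            combined.incoming[prefix + head] = [
                (label, tuple(prefix + tail for tail in tails))
                for label, tails in edges
            ]
        # The new edge records which system the derivation below it came from.
        combined.incoming[new_root].append((f"system-{i}", (prefix + hg.root,)))
    return combined


# Two toy single-derivation hypergraphs (hypothetical data).
first = Hypergraph(root="S", incoming={"S": [("you were late", ())]})
second = Hypergraph(root="S", incoming={"S": [("you have been delayed", ())]})
merged = combine([first, second])
print(merged.incoming["ROOT"])
```

Because every derivation of the new root passes through exactly one of the added edges, any sentence selected from the combined structure lies in a single system's hypergraph.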

SLIDE 19

Features

Length
  • Length of hypothesis
Model score
  • Score given by the underlying system
System indicator
  • Each system has a feature: 1 if derived from that system, 0 otherwise
N-gram support
  • Support from each system for n-grams

SLIDE 20

N-gram support

Posterior of n-gram
  • What fraction of system i’s translations include “crack rocks”?

Formally
$$v_i^n(g) = \mathbb{E}_{P_i(d \mid f)}\left[ h(d, g) \right]$$

  • $v_i^n(g)$: System i’s vote for n-gram g
  • $P_i(d \mid f)$: Probability of a derivation d in hypergraph f from system i
  • $h(d, g)$: 1 if the derivation d contains n-gram g; 0 otherwise
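As a rough illustration of this expectation, the Python sketch below computes it over an explicit weighted list of derivations. This is an assumption made for brevity: a full hypergraph would need dynamic programming over its edges rather than enumeration, and the sentences, weights, and the name ngram_support are invented for the example.

```python
from typing import List, Tuple


def contains(tokens: List[str], gram: Tuple[str, ...]) -> bool:
    """h(d, g): True if the token sequence contains the n-gram."""
    n = len(gram)
    return any(tuple(tokens[i:i + n]) == gram for i in range(len(tokens) - n + 1))


def ngram_support(derivations: List[Tuple[str, float]],
                  gram: Tuple[str, ...]) -> float:
    """Expected value of h(d, g) under the normalized derivation weights."""
    total = sum(weight for _, weight in derivations)
    mass = sum(weight for sentence, weight in derivations
               if contains(sentence.split(), gram))
    return mass / total


# Hypothetical derivations of one system with their probabilities.
SYSTEM_I = [
    ("we crack rocks", 0.5),
    ("we break rocks", 0.25),
    ("they crack rocks here", 0.25),
]

print(ngram_support(SYSTEM_I, ("crack", "rocks")))  # 0.75
```

Here two of the three derivations contain "crack rocks" and together carry 0.75 of the probability mass, so the system's vote for that bigram is 0.75.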

SLIDE 21

Performance

                 ar-en  zh-en
Best individual  43.9   28.4
Combined         45.3   29.0

Table: Performance (BLEU) on NIST 2008 task using three systems

SLIDE 22

Diagram: Input → Translate ×3 → Combine ×6 → Output

SLIDE 23

Serial System Combination

Input → Translate → Post-edit → Output

SLIDE 24

Input Comparison

Approach                   Input to System Combination
MBR                        N-best list
Hyposel                    N-best list
Confusion Networks         N-best list
MEMT                       1-best
Model Combination          Hypergraph
Serial System Combination  Single output

SLIDE 25

Results Into English

Czech-English:   memt 1.3, upv 0.4, rwth 0.6, bbn 1.6, jhu • 0.2
German-English:  memt 1.8, upv 0.8, rwth 1.6, koc • 0.6, bbn 2.0, jhu 0.8, hypo 0.9
Spanish-English: memt 0.7, upv 0.1, bbn 1.0, jhu • 0.3
French-English:  memt 0.2, upv • 0.2, rwth 0.4, bbn 0.4, jhu • 0.3, dcu 0.9, hypo • 0.0, lium • 0.4

SLIDE 26

Results From English

English-Czech:   memt 0.4, upv 0.9, rwth 0.9, koc 0.0, dcu 2.2
English-German:  memt 0.9, upv 0.4, rwth 0.4, koc 0.3
English-Spanish: memt 1.4, upv 0.4, rwth 0.7, koc 0.0
English-French:  memt 1.2, upv 1.0, rwth 1.0, koc 0.8
