Alignment in Machine Translation
CMSC 723 / LING 723 / INST 725 MARINE CARPUAT
marine@cs.umd.edu
Centauri/Arcturan [Knight, 1997]
Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
1a. ok-voon ororok sprok .
Translate: Clients do not sell pharmaceuticals in Europe.
"When I look at an article in Russian, I say to myself: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode."
– Warren Weaver
More about the IBM story: 20 years of bitext workshop
Word-based translation models decompose translation into two independent subproblems
– Language modeling
– Translation modeling / Alignment
These models were introduced in the early 90s at IBM
Each French word is aligned to exactly one English word e
– Including NULL
The alignment is a vector a
– length of a = length of sentence f
– a_i = j if French position i is aligned to English position j
Word alignments are useful for projecting word translations through alignment links
Notation
– f is a French sentence with m words
– e is an English sentence with l words
Each French word can participate in an alignment link with any of the (l+1) English words (including NULL)
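The alignment representation above can be made concrete with a tiny sketch. The convention assumed here (one common choice) puts the NULL word at English position 0, so the alignment vector has one entry per French word:

```python
e = ["NULL", "green", "house"]   # English sentence; position 0 is NULL
f = ["casa", "verde"]            # French (here: Spanish) sentence, m = 2 words

# a[i] = j means French word f[i] is aligned to English word e[j];
# j = 0 would mean f[i] is aligned to NULL.  len(a) == len(f).
a = [2, 1]                       # casa -> house, verde -> green

aligned = [(f[i], e[a[i]]) for i in range(len(f))]
print(aligned)                   # [('casa', 'house'), ('verde', 'green')]
```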
IBM Model 1: generative story
Given
– an English sentence of length l
– a length m
For each French word position i = 1..m
– Pick an English source index j
– Choose a translation f_i
Key assumptions
– Alignment is based on word positions, not word identities
– Alignment probabilities are UNIFORM
– Words are translated independently
IBM Model 1 parameters
– Word translation probability table t(f|e)
– for all words in the French & English vocabularies
IBM Model 2: generative story
Given
– an English sentence of length l
– a length m
For each French word position i = 1..m
– Pick an English source index j
– Choose a translation f_i
IBM Model 2 removes the assumption that q is uniform
– q(j|i,l,m) is now a table
– not uniform as in IBM Model 1
How many parameters are there?
Inference
– Given: a sentence pair (e,f) and an alignment model with parameters t(f|e) and q(j|i,l,m)
– What is the most probable alignment a?
Parameter Estimation
– Given: training data (lots of sentence pairs) and a model definition
– How do we estimate the parameters t(f|e) and q(j|i,l,m)?
Exercise: given
– Model parameter tables for t and q
– A sentence pair
How do we compute P(f,a|e)?
– Hint: recall the independence assumptions!
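Thanks to the independence assumptions, P(f,a|e) is just a product over French positions. A minimal sketch (the toy probability table below is made up for illustration):

```python
def p_f_a_given_e(f, e, a, t, q=None):
    """P(f, a | e) = product over French positions i of
    q(a_i | i, l, m) * t(f_i | e_{a_i}), where e[0] is the NULL word.
    With q=None we use IBM Model 1's uniform alignment probability
    1 / (l + 1)."""
    l, m = len(e) - 1, len(f)
    p = 1.0
    for i, j in enumerate(a):
        align_p = 1.0 / (l + 1) if q is None else q[(j, i, l, m)]
        p *= align_p * t[(f[i], e[j])]
    return p

# toy translation table (invented numbers, not estimated from data)
t = {("casa", "house"): 0.5, ("verde", "green"): 0.5}
p = p_f_a_given_e(["casa", "verde"], ["NULL", "green", "house"], [2, 1], t)
print(p)  # (1/3 * 0.5) * (1/3 * 0.5) = 0.25/9
```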
Inference
– Given a sentence pair (e,f), what is the most probable alignment a?
Parameter Estimation
– How do we learn the parameters t(f|e) and q(j|i,l,m) from data?
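For the inference question, the independence assumptions make the argmax over alignments decompose: the best English index can be picked for each French position separately. A sketch, assuming e[0] is the NULL word and t is a dictionary keyed by (french_word, english_word):

```python
def best_alignment(f, e, t, q=None):
    """Most probable alignment under IBM Models 1/2.  Because French
    positions are generated independently, argmax_a P(f, a | e)
    decomposes into an independent argmax over j for each position i."""
    l, m = len(e) - 1, len(f)       # e[0] is NULL
    a = []
    for i in range(m):
        def score(j):
            align_p = 1.0 / (l + 1) if q is None else q[(j, i, l, m)]
            return align_p * t.get((f[i], e[j]), 0.0)
        a.append(max(range(l + 1), key=score))
    return a

# toy table again (invented numbers for illustration)
t = {("casa", "house"): 0.9, ("verde", "green"): 0.8}
a = best_alignment(["casa", "verde"], ["NULL", "green", "house"], t)
print(a)  # [2, 1]: casa -> house, verde -> green
```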
– Parallel corpus gives us (e,f) pairs only, a is hidden
Chicken-and-egg problem
– if we knew the alignments, we could estimate t and q, given (e,a,f)
– if we knew t and q, we could compute p(f,a|e)
EM algorithm
– E-step: given the current parameters, estimate the hidden alignment variables
– M-step: given the estimated hidden variables, update the parameters
Use “Soft” values instead of binary counts
– Compute expected counts for parameters (t) based on summing over hidden variable
– Compute the maximum likelihood estimate of t from the expected counts
Toy bitext
– green house ↔ casa verde
– the house ↔ la casa
For the rest of this talk, French = Spanish
(a) compute probability of each alignment p(a|f,e)
Note: we’re making simplifying assumptions in this example
– every French and English word is aligned to something
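The EM procedure above can be run on the toy bitext. A minimal IBM Model 1 sketch, assuming a NULL word on the English side and uniform initialization of t(f|e) (not an efficient implementation):

```python
from collections import defaultdict

def ibm1_em(bitext, iterations=10):
    """EM for IBM Model 1 translation probabilities t(f|e).
    bitext: list of (french_words, english_words) pairs."""
    f_vocab = {w for fs, _ in bitext for w in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform init
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in bitext:
            es = ["NULL"] + es
            for fw in fs:
                # E-step: posterior p(a_i = j | f, e), a "soft" count
                z = sum(t[(fw, ew)] for ew in es)
                for ew in es:
                    p = t[(fw, ew)] / z
                    count[(fw, ew)] += p
                    total[ew] += p
        # M-step: maximum likelihood estimate from expected counts
        for (fw, ew) in count:
            t[(fw, ew)] = count[(fw, ew)] / total[ew]
    return t

bitext = [(["casa", "verde"], ["green", "house"]),
          (["la", "casa"], ["the", "house"])]
t = ibm1_em(bitext)
# t[("casa", "house")] should come to dominate the other options
```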
– EM guarantees that the likelihood of the training data does not decrease across iterations
– Initialization matters
Strong independence assumptions
– Results in linguistically naïve models
– But allows efficient parameter estimation and inference
Alignments are hidden variables
– unlike words, which are observed
– they require unsupervised learning (EM algorithm)
Phrase-based translation models (instead of IBM word-based models)
Distortion: d = start position of f_i − end position of f_(i−1) − 1
– the probability of two consecutive English phrases being separated by a particular span in French
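The distortion distance above can be sketched concretely. Scoring it as alpha^|d| is one common parameterization; the value alpha=0.5 here is an illustrative assumption, not the only choice:

```python
def distortion_score(start_i, end_prev, alpha=0.5):
    """d = (start of French phrase translated i-th)
         - (end of French phrase translated (i-1)-th) - 1,
    so d = 0 for strictly monotone phrase order.  Penalize larger
    jumps exponentially: alpha ** |d| (an assumed parameterization)."""
    d = start_i - end_prev - 1
    return alpha ** abs(d)

print(distortion_score(3, 2))   # monotone: d = 0, score 1.0
print(distortion_score(5, 2))   # jump of 2: score 0.25
```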
Get high confidence alignment links by intersecting IBM word alignments from both directions
The IBM models are directional: a single model represents P(Spanish|English), so alignments obtained from the two training directions differ
Improve recall by adding some links from the union of alignments
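The intersect-then-grow idea can be sketched as follows. This is a simplified version of the grow-style symmetrization heuristics, not the exact algorithm used in standard toolkits; alignments are represented as sets of (french_position, english_position) links:

```python
def symmetrize(fwd, rev):
    """Start from the high-precision intersection of the two directional
    alignments, then add union links that touch a still-uncovered word,
    recovering recall.  A simplified sketch of grow-style heuristics."""
    inter, union = fwd & rev, fwd | rev
    links = set(inter)
    for (i, j) in sorted(union - inter):
        covered_i = any(i == i2 for (i2, _) in links)
        covered_j = any(j == j2 for (_, j2) in links)
        if not (covered_i and covered_j):
            links.add((i, j))
    return links

# toy directional alignments (illustrative, not from a real aligner)
result = symmetrize({(0, 0), (1, 1)}, {(0, 0), (2, 1)})
print(sorted(result))  # [(0, 0), (1, 1), (2, 1)]
```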
Extract phrases that are consistent with word alignment
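The consistency condition can be stated as a small check: a phrase pair is consistent with the word alignment iff no link crosses the phrase boundary, and at least one link falls inside the pair. A sketch with inclusive spans:

```python
def consistent(links, f_span, e_span):
    """True iff the phrase pair (f_span, e_span) is consistent with the
    alignment: every link touching the French span lands inside the
    English span and vice versa, and at least one link is inside."""
    (f1, f2), (e1, e2) = f_span, e_span   # inclusive position ranges
    inside = False
    for (i, j) in links:
        f_in, e_in = f1 <= i <= f2, e1 <= j <= e2
        if f_in != e_in:        # a link crosses the boundary
            return False
        inside = inside or (f_in and e_in)
    return inside

# casa verde <-> green house: casa(0)-house(1), verde(1)-green(0)
links = {(0, 1), (1, 0)}
print(consistent(links, (0, 1), (0, 1)))  # True: whole pair is a phrase
print(consistent(links, (0, 0), (0, 0)))  # False: casa-house link crosses
```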
Summary
– Word-based translation models decompose translation into two independent subproblems: language modeling and translation modeling / alignment
– They make strong independence assumptions, yielding linguistically naïve models, but allow efficient parameter estimation and inference
– Alignments are hidden variables, unlike words, which are observed, and require unsupervised learning (EM algorithm)
– Word alignments remain a key building block for more complex translation models
– E.g., phrase-based machine translation