Alignment in Machine Translation
CMSC 723 / LING 723 / INST 725 MARINE CARPUAT
marine@cs.umd.edu
Figures credit: Matt Post
Centauri/Arcturan [Knight, 1997]
enemok .
sprok .
Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
Your assignment, put these words in order: { jjat, arrat, mat, bat, oloat, at-yurp }
enfadados .
Translate: Clients do not sell pharmaceuticals in Europe.
More about the IBM story: 20 years of bitext workshop
independent subproblems
in early 90s at IBM
English words
Alignment is based on word positions, not word identities
Alignment probabilities are UNIFORM
Words are translated independently
vocab
Remove assumption that q is uniform
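As a hedged summary of what these assumptions amount to, the model can be written in the t(f|e) and q(j|i,l,m) notation of the IBM1 notes linked at the end of these slides. Here l and m are the English and foreign sentence lengths, a_i is the English position aligned to foreign word f_i, and position 0 is assumed to be a NULL word. The first line is the general form with a learned q; the second line is the special case in which q is uniform.

```latex
% General form (q is a learned distortion table):
p(f_1 \dots f_m,\; a_1 \dots a_m \mid e_1 \dots e_l,\; m)
  = \prod_{i=1}^{m} q(a_i \mid i, l, m)\; t(f_i \mid e_{a_i})

% Uniform-alignment special case (assumes a NULL word at position 0):
q(j \mid i, l, m) = \frac{1}{l + 1}
\quad\Longrightarrow\quad
p(f_1 \dots f_m,\; a_1 \dots a_m \mid e_1 \dots e_l,\; m)
  = \frac{1}{(l + 1)^{m}} \prod_{i=1}^{m} t(f_i \mid e_{a_i})
```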
there?
Inference
parameters t(f|e) and q(j|i,l,m)
alignment a?
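A minimal sketch of this inference step, assuming the model factorizes over foreign positions so that each a_i can be chosen independently as the j maximizing q(j|i,l,m) * t(f_i|e_j). The function name `best_alignment`, the dictionary layout, and all parameter values are hypothetical and chosen only for illustration.

```python
def best_alignment(f_words, e_words, t, q):
    """Return [a_1, ..., a_m] with a_i = argmax_j q(j|i,l,m) * t(f_i|e_j).

    Assumption: e_words[0] is the NULL word, so l = len(e_words) - 1.
    t maps (f, e) -> probability; q maps (j, i, l, m) -> probability.
    """
    l, m = len(e_words) - 1, len(f_words)
    alignment = []
    for i, f in enumerate(f_words, start=1):
        # Score every candidate English position j = 0..l for this f_i.
        scores = [
            q.get((j, i, l, m), 0.0) * t.get((f, e_words[j]), 0.0)
            for j in range(l + 1)
        ]
        alignment.append(max(range(l + 1), key=scores.__getitem__))
    return alignment


# Hypothetical parameter values, not taken from the slides.
e = ["NULL", "the", "house"]
f = ["la", "casa"]
t = {
    ("la", "NULL"): 0.2, ("la", "the"): 0.7, ("la", "house"): 0.1,
    ("casa", "NULL"): 0.1, ("casa", "the"): 0.1, ("casa", "house"): 0.8,
}
# Uniform q over the l + 1 = 3 English positions.
q = {(j, i, 2, 2): 1.0 / 3 for j in range(3) for i in (1, 2)}

print(best_alignment(f, e, t, q))  # [1, 2]
```

With uniform q the choice depends only on t, so "la" aligns to "the" (position 1) and "casa" to "house" (position 2).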
Parameter Estimation
pairs)
t(f|e) and q(j|i,l,m)?
Use “Soft” values instead of binary counts
variable
Sentence pair 1: green house / casa verde
Sentence pair 2: the house / la casa
In this example:
Source language F = Spanish
Target language E = English
(a) compute probability of each alignment p(a,f|e)
Note: we’re making simplifying assumptions in this example
French and English word is aligned to something
iterations
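The EM loop on the toy corpus can be sketched as follows. This is only an illustration under stated assumptions: no NULL word, only one-to-one alignments are enumerated (so every word on both sides is aligned to something), t(f|e) starts uniform, and the function name `em_one_to_one` is hypothetical. It is not the full IBM Model 1 E-step, which also allows several foreign words to align to the same English word.

```python
from collections import defaultdict
from itertools import permutations

# Toy bitext: (English sentence e, Spanish sentence f).
corpus = [
    ("green house".split(), "casa verde".split()),
    ("the house".split(), "la casa".split()),
]


def em_one_to_one(corpus, iterations=10):
    """Estimate t(f|e) with EM, enumerating one-to-one alignments only."""
    e_vocab = {e for es, _ in corpus for e in es}
    f_vocab = {f for _, fs in corpus for f in fs}
    # Uniform initialization of t(f|e).
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

    for _ in range(iterations):
        counts = defaultdict(float)  # fractional count of (f, e) links
        totals = defaultdict(float)  # fractional count of e being linked
        for es, fs in corpus:
            # (a) Score each alignment: a[i] is the English position of fs[i].
            alignments = list(permutations(range(len(es))))
            scores = []
            for a in alignments:
                p = 1.0
                for i, j in enumerate(a):
                    p *= t[(fs[i], es[j])]
                scores.append(p)
            z = sum(scores)
            # (b) Normalize to p(a|e,f); (c) collect "soft" counts.
            for a, p in zip(alignments, scores):
                w = p / z
                for i, j in enumerate(a):
                    counts[(fs[i], es[j])] += w
                    totals[es[j]] += w
        # (d) Renormalize the soft counts into new t(f|e) values.
        t = {(f, e): counts[(f, e)] / totals[e] for (f, e) in t}
    return t


t = em_one_to_one(corpus, iterations=10)
for (f, e), p in sorted(t.items()):
    print(f"t({f}|{e}) = {p:.3f}")
```

Because "casa" co-occurs with "house" in both sentence pairs, its soft count grows at each iteration, which in turn pushes "verde" toward "green" and "la" toward "the".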
IBM1 http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/ibm12.pdf