
Machine Translation: Word Alignment Problem

Marcello Federico FBK, Trento - Italy 2013

M. Federico, MT 2013

Outline

  • Word alignments
  • Word alignment models
  • Alignment search
  • Alignment estimation
  • EM algorithm


Example of Parallel Corpus

Darum liegt die Verantwortung für das Erreichen des Effizienzzieles und der damit einhergehenden CO2-Reduzierung bei der Gemeinschaft, die nämlich dann tätig wird, wenn das Ziel besser durch gemeinschaftliche Massnahmen erreicht werden kann. Und genaugenommen steht hier die Glaubwürdigkeit der EU auf dem Spiel.

That is why the responsibility for achieving the efficiency target and at the same time reducing CO2 lies with the Community, which in fact takes action when an objective can be achieved more effectively by Community measures. Strictly speaking, it is the credibility of the EU that is at stake here.

Notice the different positions of the corresponding verb groups: MT has to take word re-ordering into account!



Word Alignments

  • Let us consider possible alignments a between words in f and e.
  • Typically, alignments are restricted to maps between positions of f and positions of e.
  • Some source words might not be aligned (i.e., virtually aligned with NULL).
  • These and even more general alignments are machine learnable.
  • Notice also that alignments induce word re-ordering.

[Figure: word alignment between the Italian sentence "dalla serata di domani soffierà un freddo vento orientale" and the English sentence "since tomorrow evening an eastern chilly wind will blow", with unaligned words linked to NULL.]
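The map view of an alignment can be sketched in a few lines; the specific links below are illustrative stand-ins, not read off the slide's figure.

```python
# Sketch: a word alignment as a map from source positions to target positions,
# with 0 standing for the NULL word.  The link values are illustrative.
f = ["dalla", "serata", "di", "domani", "soffierà", "un", "freddo", "vento", "orientale"]
e = ["since", "tomorrow", "evening", "an", "eastern", "chilly", "wind", "will", "blow"]

# a[j] = i means: source word f[j-1] is aligned to target word e[i-1];
# a[j] = 0 means: f[j-1] is (virtually) aligned to NULL.
a = {1: 1, 2: 3, 3: 0, 4: 2, 5: 9, 6: 4, 7: 6, 8: 7, 9: 5}

links = [(f[j - 1], e[i - 1] if i > 0 else "NULL") for j, i in sorted(a.items())]
print(links)
```

Note that the map goes from source positions to target positions, so a target word (here "will") may stay uncovered while every source word gets exactly one link.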


Word Alignment: Matrix Representation

[Figure: alignment matrix. Rows are the English words since, tomorrow, evening, an, eastern, chilly, wind, will, blow (positions 1-9, plus a NULL row at position 0); columns are the Italian words dalla serata di domani soffierà un freddo vento orientale (positions 1-9). Each alignment link appears as a point in the corresponding cell.]

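The matrix representation is just a 0/1 grid built from the set of links; a minimal sketch (with illustrative links) might look like this:

```python
# Sketch: build an alignment matrix from a set of links.
# Rows = target (English) positions, columns = source (Italian) positions.
def alignment_matrix(links, n_rows, n_cols):
    """links: set of (target_pos, source_pos) pairs, 1-based."""
    m = [[0] * n_cols for _ in range(n_rows)]
    for i, j in links:
        m[i - 1][j - 1] = 1
    return m

# illustrative link set for a 9-word sentence pair
links = {(1, 1), (2, 4), (3, 2), (4, 6), (5, 9), (6, 7), (7, 8), (9, 5)}
m = alignment_matrix(links, 9, 9)

# when the alignment is a map from source to target positions,
# each column holds at most one point
assert all(sum(col) <= 1 for col in zip(*m))
```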

Word Alignment: Direct Alignment

A : {1, . . . , m} → {1, . . . , l}

[Figure: direct alignment matrix. Rows are the English words and, the, program, has, been, implemented (positions 1-6); columns are the Italian words il programma è stato messo in pratica (positions 1-7).]

We allow only one link (point) in each column. Some columns may be empty.


Word Alignment: Inverted Alignment

A : {1, . . . , l} → {1, . . . , m}

[Figure: inverted alignment matrix. Rows are the English words the, territory, of, the, aboriginal, people (positions 1-6); columns are the Italian words il territorio degli autoctoni (positions 1-4).]

You can get a direct alignment by swapping source and target sentence.
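The direct/inverted pair can be sketched as two (partial) maps; the link values below are illustrative. Swapping the two sentences swaps the roles of keys and values:

```python
# Sketch: a direct alignment as a partial map source_pos -> target_pos
# (illustrative links; positions 7 and 5 both map to target position 6).
direct = {1: 2, 2: 3, 5: 6, 7: 6}

# Swapping source and target inverts the map.  A many-to-one direct
# alignment cannot be inverted into a plain map, so we collect
# target_pos -> set of source positions instead.
inverted = {}
for j, i in direct.items():
    inverted.setdefault(i, set()).add(j)
print(inverted)   # {2: {1}, 3: {2}, 6: {5, 7}}
```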


Word Alignment Models

  • In order to find automatic methods to learn word alignments from data, we use mathematical models that "explain" how translations are generated.
  • The way models explain translations may appear very naïve, if not silly! Indeed they are very simplistic ...
  • However, simple explanations often work better than complex ones!
  • We need to be a little bit formal here, just to give names to the ingredients we will use in our recipes to learn word alignments:
    – English sentence e is a sequence of l words
    – French sentence f is a sequence of m words
    – Word alignment a is a map from the m positions of f to the l positions of e
  • We will have to relax a bit our conception of sentence: it is just a sequence of words, which might or might not make sense at all ...


Word Alignment Models

There are five models, of increasing complexity, that explain how a translation f and an alignment a can be generated from a sentence e.

Alignment Model: Pr(a, f | e)

Complexity refers to the number of parameters that define the model! We start from the simplest model, called Model 1!


Model 1

Alignment Model: Pr(a, f | e)

Model 1 generates the translation and the alignment as follows:

  1. guess the length m of f on the basis of the length l of e
  2. for each position j in f repeat the following two steps:
     (a) randomly pick a corresponding position i in e
     (b) generate word j of f by picking a translation of word i in e

Step 1 is executed by using a translation length predictor.
Step 2(a) is performed by throwing a die with l faces (indeed, l + 1 faces if we want to include the NULL word).
Step 2(b) is carried out by using a word translation table.
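The generative story above can be sketched in a few lines; the length predictor and the translation table below are made-up toy stand-ins, not the model's real parameters.

```python
import random

# Toy sketch of Model 1's generative story.
e = ["the", "program", "has", "been", "implemented"]

l = len(e)
m = 7   # step 1: in a real model, m is guessed from l by a length predictor

# ingredient of step 2(b): a toy word translation table p(f_word | e_word)
table = {
    "the": {"il": 1.0},
    "program": {"programma": 1.0},
    "has": {"è": 1.0},
    "been": {"stato": 1.0},
    "implemented": {"messo": 0.4, "in": 0.3, "pratica": 0.3},
}

random.seed(0)
f, a = [], []
for j in range(m):
    i = random.randrange(l)                      # step 2(a): die with l faces
    words, probs = zip(*table[e[i]].items())
    f.append(random.choices(words, probs)[0])    # step 2(b): pick a translation
    a.append(i + 1)
print(list(zip(f, a)))
```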


Model 1: Generative Process

[Diagram of the generative process:
    e = the1 program2 has3 been4 implemented5 (length l = 5)
    length: m = 7
    alignment (positions picked randomly): a = (3, 5, 4, 5, 5, 1, 2)
    translation (words chosen through a probability table):
    f = è1 stato2 messo3 in4 pratica5 il6 programma7
    i.e. è ← has3, stato ← been4, messo ← implemented5, in ← implemented5, pratica ← implemented5, il ← the1, programma ← program2]

MODEL 1 ONLY RELIES ON WORD-TO-WORD TRANSLATION PROBABILITIES!


Model 1

Alignment Model: Pr(a, f | e)

Let us see how we can implement Model 1, and at what complexity:

  1. Length predictor of the translation: this is not difficult to build; we can for instance look at many English-French translations and study how the sentence lengths are related (few parameters).
  2. Die with l faces: very simple to simulate on a computer (no parameters).
  3. Translation table of words: this is the tricky part. We need a big table that tells us, for each French word f and each English word e, whether e is a good or a bad translation of f (fair amount of parameters).


Model 1: Translation Table

Assume very simple German and English languages: just 4 words each.

[Figure: a 4 x 4 translation table linking the German words das, ein, Buch, Haus to the English words the, a, book, house; the probability mass of each German word concentrates on its matching English word (e.g. 0.85 for das-the, 0.8 for ein-a, 0.92 for Buch-book and Haus-house), with small values on the other pairs.]

Model 1 needs a 4 x 4 table:

  • each row shows the English translations of one German word
  • each row contains probabilities summing up to one

Of course, the majority of cells should ideally be equal to zero. Learning Model 1 basically means filling the table with good values ....


Model 1: Learning

Let us assume that we have a parallel corpus with alignments:

[Figure: two word-aligned Italian-English sentence pairs; the word freddo occurs twice, aligned once to chilly and once to cool.]

We can estimate translation probabilities by counting aligned word pairs. The maximum likelihood estimate for a discrete distribution is:

    p(e | f) = count(e, f) / Σ_e' count(e', f) = count(e, f) / count(f)

For the word pair chilly-freddo we count how often they are aligned together:

    p(chilly | freddo) = count(chilly, freddo) / count(freddo) = 1 / 2 = 0.5

We end up with reliable probabilities by using a very large parallel corpus!
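The counting estimate can be sketched directly; the list of aligned pairs below is a tiny illustrative stand-in for a real aligned corpus, chosen so that the freddo counts match the example:

```python
from collections import Counter

# Sketch: maximum-likelihood translation probabilities from aligned word pairs.
aligned_pairs = [("chilly", "freddo"), ("cool", "freddo"),
                 ("wind", "vento"), ("breeze", "vento")]

pair_count = Counter(aligned_pairs)
f_count = Counter(f for _, f in aligned_pairs)

def p(e, f):
    # p(e | f) = count(e, f) / count(f)
    return pair_count[(e, f)] / f_count[f]

print(p("chilly", "freddo"))   # 0.5, as in the example
```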


Model 1: Best Alignment Search

Let us assume that we have translation probabilities p(f | e). Given a parallel corpus without alignments:

[Figure: the same two Italian-English sentence pairs, now without alignment links.]

We can compute the most probable alignment of each sentence pair as follows: for each word f in the sentence f we pick the most probable word e in the sentence e according to the available probabilities.

[Exercise 2. Given the following translation probabilities for the word freddo, what alignments will be generated for this word?

            cold   chilly   cool   wind
    freddo  0.4    0.3      0.2    0.1 ]
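This greedy search can be sketched as an argmax per source word; the probability table below is a made-up illustration (note how freddo falls back to chilly when cold is absent from the sentence, in the spirit of the exercise):

```python
# Sketch: greedy best-alignment search.  For each source word f_j we pick the
# target position i maximizing its translation probability with e_i.
def best_alignment(f_words, e_words, p):
    return [max(range(len(e_words)),
                key=lambda i: p.get((fw, e_words[i]), 0.0)) + 1
            for fw in f_words]

# illustrative probabilities (pairs missing from the dict count as 0)
p = {("freddo", "cold"): 0.4, ("freddo", "chilly"): 0.3,
     ("vento", "wind"): 0.6, ("vento", "breeze"): 0.2}
f_words = ["freddo", "vento"]
e_words = ["a", "chilly", "wind"]
print(best_alignment(f_words, e_words, p))   # [2, 3]: freddo->chilly, vento->wind
```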


Estimation of Word Alignment Models

How do we train alignment models from parallel data?

  • data + word alignments ⇒ model parameters
  • data + model parameters ⇒ word alignments

Idea to solve this chicken & egg problem:

[Diagram: starting from a bilingual corpus and initial parameters, alternately improve the word alignments and re-estimate the parameters; loop until convergence.]


Model 1: Estimation

Let's go back to our simplified English and German languages. Ingredients of the Expectation Maximization algorithm:

  • Initial parameters: translation table with uniform probabilities
  • Bilingual corpus: collection of human translations

Bilingual corpus:
    the house - das Haus
    the book - das Buch
    a book - ein Buch

Probability table:
            das    ein    Haus   Buch
    the     0.25   0.25   0.25   0.25
    a       0.25   0.25   0.25   0.25
    house   0.25   0.25   0.25   0.25
    book    0.25   0.25   0.25   0.25

Let us now see how to improve our probabilities ....


Model 1: Estimation

We start from the first sentence pair of the bilingual corpus, the house - das Haus, and apply the following two steps:

  1. We weight each word co-occurrence with its probability in the table:

        co(the, das) = 1 x 0.25      co(the, Haus) = 1 x 0.25
        co(house, das) = 1 x 0.25    co(house, Haus) = 1 x 0.25

  2. We transform the weighted co-occurrences into conditional probabilities:

        Pr(das | the) = 0.25 / (0.25 + 0.25) = 0.50      Pr(Haus | the) = 0.25 / (0.25 + 0.25) = 0.50
        Pr(das | house) = 0.25 / (0.25 + 0.25) = 0.50    Pr(Haus | house) = 0.25 / (0.25 + 0.25) = 0.50

Notice: in this sentence "the" can only be linked either to "das" or to "Haus". We apply the same steps to all the sentence pairs of the bilingual corpus.


Model 1: Estimation

  • the house - das Haus

        the:    co(das) = 0.25   co(Haus) = 0.25      pr(das) = 0.50   pr(Haus) = 0.50
        house:  co(das) = 0.25   co(Haus) = 0.25      pr(das) = 0.50   pr(Haus) = 0.50

  • the book - das Buch

        the:    co(das) = 0.25   co(Buch) = 0.25      pr(das) = 0.50   pr(Buch) = 0.50
        book:   co(das) = 0.25   co(Buch) = 0.25      pr(das) = 0.50   pr(Buch) = 0.50

  • a book - ein Buch

        a:      co(ein) = 0.25   co(Buch) = 0.25      pr(ein) = 0.50   pr(Buch) = 0.50
        book:   co(ein) = 0.25   co(Buch) = 0.25      pr(ein) = 0.50   pr(Buch) = 0.50


Model 1: Estimation

We sum up all sentence-level probabilities in a co-occurrence table ...

Bilingual corpus: the house - das Haus; the book - das Buch; a book - ein Buch

Co-occurrence table:
            das    ein    Haus   Buch   total
    the     1.0           0.50   0.50   2
    a              0.5           0.5    1
    house   0.5           0.5           1
    book    0.5    0.5           1.0    2

Finally, we compute updated word translation probabilities from the counts.

Probability table:
            das    ein    Haus   Buch
    the     0.50          0.25   0.25
    a              0.5           0.5
    house   0.5           0.5
    book    0.25   0.25          0.50

Let us start a second iteration ....


Model 1: Estimation

  • the house - das Haus

        the:    co(das) = 0.50   co(Haus) = 0.25      pr(das) = 0.67   pr(Haus) = 0.33
        house:  co(das) = 0.50   co(Haus) = 0.50      pr(das) = 0.50   pr(Haus) = 0.50

  • the book - das Buch

        the:    co(das) = 0.50   co(Buch) = 0.25      pr(das) = 0.67   pr(Buch) = 0.33
        book:   co(das) = 0.25   co(Buch) = 0.50      pr(das) = 0.33   pr(Buch) = 0.67

  • a book - ein Buch

        a:      co(ein) = 0.50   co(Buch) = 0.50      pr(ein) = 0.50   pr(Buch) = 0.50
        book:   co(ein) = 0.25   co(Buch) = 0.50      pr(ein) = 0.33   pr(Buch) = 0.67


Model 1: Estimation

Again, we sum all probabilities in a co-occurrence table:

            das    ein    Haus   Buch   total
    the     1.34          0.33   0.33   2
    a              0.5           0.5    1
    house   0.5           0.5           1
    book    0.33   0.33          1.34   2

and compute updated word translation probabilities from the counts:

            das     ein     Haus    Buch
    the     0.67            0.165   0.165
    a               0.5             0.5
    house   0.5             0.5
    book    0.165   0.165           0.67

We iterate this procedure several times, until the probabilities reach stable values.


Model 1: Estimation

Iteration 3:
            das    ein    Haus   Buch
    the     0.8           0.1    0.1
    a              0.5           0.5
    house   0.5           0.5
    book    0.1    0.1           0.8

    ....

Iteration 12:
            das    ein    Haus   Buch
    the     1.0
    a              0.5           0.5
    house   0.5           0.5
    book                         1.0

  • This procedure is called the Expectation Maximization algorithm
  • Here, EM could only learn the translations of "the" and "book"!
  • We need more data to learn more translations and ... better models, too!
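The whole walkthrough can be reproduced in a few lines. Note this follows the slides' simplified update (normalizing t over the German words of each sentence, for every English word, then renormalizing the accumulated counts), which is a didactic variant rather than the textbook Model 1 E-step:

```python
from collections import defaultdict

# Sketch of the slides' iterative estimation on the toy corpus.
corpus = [("the house", "das Haus"),
          ("the book", "das Buch"),
          ("a book", "ein Buch")]
corpus = [(e.split(), g.split()) for e, g in corpus]

# initial parameters: uniform translation table t[e][g]
g_vocab = {g for _, gs in corpus for g in gs}
e_vocab = {e for es, _ in corpus for e in es}
t = {e: {g: 1.0 / len(g_vocab) for g in g_vocab} for e in e_vocab}

for _ in range(12):
    count = defaultdict(lambda: defaultdict(float))
    for e_sent, g_sent in corpus:
        for e in e_sent:
            z = sum(t[e][g] for g in g_sent)     # per-sentence normalizer
            for g in g_sent:
                count[e][g] += t[e][g] / z       # weighted co-occurrence
    # turn the accumulated counts back into row-normalized probabilities
    t = {e: {g: c / sum(row.values()) for g, c in row.items()}
         for e, row in count.items()}

print(round(t["the"]["das"], 2), round(t["book"]["Buch"], 2))   # 1.0 1.0
```

As on the slides, "the"-das and "book"-Buch converge to probability 1, while "house" and "a" stay stuck at 0.5/0.5 because nothing in this tiny corpus can disambiguate them.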

Example of Alignment with Model 1

[Figure: Model 1 alignment matrix between the German words ah, ja, ',', dann, geht, das, wohl, nicht, mehr, '.' (plus NULL) and the English sentence "well , then , I guess , that will not work anymore ."]

Problem:
  – three source words (,) are mapped to the same target word!
  – in fact, source words are aligned independently of each other


Example: alignment with fertility models

[Figure: alignment matrix for the same sentence pair, now produced with a fertility model; the three commas are no longer all linked to the same word.]

Fertility models:
  – explicitly consider the number of words covered by each English word
  – e.g. if the comma has fertility 1, then only one source word can be aligned to it


Model 3

Alignment Model: Pr(a, f | e)

Model 3 generates the translation and the alignment as follows:

  1. for each word i of e, generate a fertility value φi
  2. for each word i of e, apply the following steps:
     (a) generate φi translations of word i
     (b) pick one position for each of the φi words

Step 1 implicitly defines the length m of the translation.
Steps 1-2 all rely on specific probability tables.
This model is significantly more complex than Model 1!
Estimation of Model 3 follows the principle used for Model 1; it's just more tricky!
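The two steps above can be sketched as follows; the fertility values, translation tablets, and the use of a uniformly random permutation are toy stand-ins for Model 3's actual probability tables:

```python
import random

# Toy sketch of Model 3's generative story.
e = ["the", "program", "has", "been", "implemented"]
fertility = {"the": 1, "program": 1, "has": 1, "been": 1, "implemented": 3}
tablet = {"the": ["il"], "program": ["programma"], "has": ["è"],
          "been": ["stato"], "implemented": ["messo", "in", "pratica"]}

# step 1 implicitly defines m; step 2(a) emits phi_i translations per word
m = sum(fertility[w] for w in e)
words = [t for w in e for t in tablet[w][: fertility[w]]]

# step 2(b): pick one position for each generated word (here, a random permutation)
random.seed(1)
positions = random.sample(range(m), m)
f = [None] * m
for w, pos in zip(words, positions):
    f[pos] = w
print(f)
```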


Model 3: Generative Process

[Diagram of the generative process:
    e = null0 the1 program2 has3 been4 implemented5
    fertility: the, program, has, been each have fertility 1; implemented has fertility 3
    tablet: il, programma, è, stato, messo, in, pratica
    permutation: il→6, programma→7, è→1, stato→2, messo→3, in→4, pratica→5
    f = è1 stato2 messo3 in4 pratica5 il6 programma7]


Combinations of Word Alignments

Given parallel sentences we can train an alignment model and then align them. We have different options:

  • direct alignment: we learn alignments from source to target
  • inverted alignment: we learn alignments from target to source

We can get better alignments by combining direct and inverted alignments:

  • union: greedy collection of alignment points, higher coverage
  • intersection: selective collection, higher precision
  • grow-diagonal: takes the best of the two

Properties:

  • direct/inverted alignments are maps between two sets of positions
  • the union alignment is a many-to-many partial alignment
  • the intersection is a 1-1 partial alignment
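The three combinations can be sketched over sets of links. The grow-diagonal below is a simplified version (it only checks diagonal adjacency, whereas full grow-diag-final also tracks covered rows/columns), and the link sets are illustrative:

```python
# Sketch: combining direct and inverted alignments.  Links are
# (source_pos, target_pos) pairs; the example links are illustrative.
direct   = {(1, 1), (2, 2), (3, 2)}   # learned source -> target
inverted = {(1, 1), (2, 2), (2, 3)}   # learned target -> source, swapped back

union = direct | inverted             # higher coverage, many-to-many
inter = direct & inverted             # higher precision, 1-1

def grow_diag(direct, inverted):
    """Simplified grow-diagonal: start from the intersection and greedily
    add union links adjacent (incl. diagonally) to an existing link."""
    alignment = direct & inverted
    candidates = (direct | inverted) - alignment
    added = True
    while added:
        added = False
        for (i, j) in sorted(candidates - alignment):
            if any((i + di, j + dj) in alignment
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)):
                alignment.add((i, j))
                added = True
    return alignment

print(sorted(grow_diag(direct, inverted)))
```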

Union and Intersection Alignments

[Figure: a direct alignment and an inverted alignment over the same sentence pair, together with their union and their intersection.]


Grow Diagonal Word Alignment

[Figure: starting from the intersection of the direct and inverted alignments, the grow-diagonal heuristic progressively adds alignment points in the diagonal neighbourhood of existing ones.]


How to measure quality of word alignments

[Figure: a manual reference alignment with sure and possible links, an automatic alignment, and their matches.]

    AER = 1 − ( #(A ∩ S) + #(A ∩ P) ) / ( #(A) + #(S) ) = 1 − (3 + 2) / (4 + 3) ≈ 0.29

where S are the sure manual links, P the possible manual links, and A the automatic links.

AER = Alignment Error Rate
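The metric can be sketched directly over link sets; the sets below are illustrative (following the usual convention that the possible links include the sure ones):

```python
# Sketch: Alignment Error Rate from sure (S), possible (P) and automatic (A)
# link sets.  The example sets are illustrative.
def aer(A, S, P):
    return 1 - (len(A & S) + len(A & P)) / (len(A) + len(S))

S = {(1, 1), (2, 2)}                   # sure manual links
P = S | {(3, 3)}                       # possible links include the sure ones
A = {(1, 1), (2, 2), (3, 3), (4, 5)}   # automatic links
print(round(aer(A, S, P), 2))   # 0.17
```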


Use of word alignments

Bilingual concordance:

[Screenshot: a bilingual concordancer over "Alice in Wonderland" (source: EN-English, target: ZH-Chinese), searching for the string "rabbit". The matching English passages (e.g. "She felt very sleepy, when suddenly a White rabbit with pink eyes ran close by her.") are shown next to their Chinese translations; the word alignment is used to locate the corresponding words on the Chinese side.]

Last words on word alignments

Given a parallel corpus we can automatically learn alignments to:

  • discover interesting lexical relationships
  • generate a probabilistic translation lexicon
  • extract phrase-pairs

Alignments have limitations in terms of the allowed word mappings. Better alignments can be obtained by:

  • estimating alignments from source to target and vice versa
  • computing a suitable combination of the two alignments