4CSLL5 IBM Translation Models

Martin Emms

October 29, 2020

Outline

Parameter learning (efficient)
  How to sum alignments efficiently
  Efficient EM via p((j, i) ∈ a|o, s)

Avoiding Exponential Cost

but what about Exponential cost?

◮ the learnability of translation probabilities in an unsupervised fashion from just a corpus of pairs is a remarkable thing

◮ however, as we have formulated it, each possible alignment has to be considered in turn, and each contributes increments to expected counts

◮ it was already noted that the number of possible alignments is (ℓs + 1)^ℓo – ie. exponential in the length of o. For ℓs + 1 = ℓo = 10, this is 10^10, or 10,000 million

◮ so unless a way can be found to make the EM process on this model much more efficient, its learnability in principle would just be an interesting curiosity

◮ it turns out that by studying a little more closely the formula where alignments are summed over, and doing some conversions of 'sums-of-products' to 'products-of-sums', it is indeed possible to make the EM process on this model much more efficient.
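The alignment count is easy to check by enumerating the alignment functions directly; a minimal Python sketch (the function name is mine, not from the lecture):

```python
from itertools import product

def all_alignments(l_s, l_o):
    """All alignment functions a : {1..l_o} -> {0..l_s}
    (position 0 being NULL), as tuples (a(1), ..., a(l_o))."""
    return list(product(range(l_s + 1), repeat=l_o))

# there are (l_s + 1) ** l_o alignments, e.g. 3 ** 2 = 9 for l_s = 2, l_o = 2
print(len(all_alignments(2, 2)))   # 9
# for l_s + 1 = l_o = 10 this would be 10 ** 10 -- far too many to enumerate
```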

Summing over alignments
Looking at the brute-force EM algorithm, we need to calculate p(a|o, s) – call this γd(a). It is fairly easy to see that this is²

    γd(a) = ∏_j t(oj|s_a(j))  /  [ Σ_{a′} ∏_j t(oj|s_a′(j)) ]      (11)

◮ The numerator is a product.
◮ It turns out the denominator can also be turned into a product of sums.

²in the formula for p(o, a, ℓo, s) everything except the translation probs is going to cancel out when you take ratios . . .

Summing over alignments contd
each j can be aligned to any i between 0 and I, hence

    Σ_a ∏_j t(oj|s_a(j))  =  Σ_{a(1)=0..I} . . . Σ_{a(J)=0..I}  ∏_{j=1..J} t(oj|s_a(j))
                          =  Σ_{a(1)=0..I} . . . Σ_{a(J)=0..I}  [ t(o1|s_a(1)) . . . t(oJ|s_a(J)) ]

each Σ_{a(j)=0..I} affects just one t(oj|s_a(j)) term, and this means we can use a sum-of-products to product-of-sums conversion, hence

                          =  ∏_{j=1..J} [ Σ_{a(j)=0..I} t(oj|s_a(j)) ]
                          =  ∏_{j=1..J} [ Σ_{i=0..I} t(oj|si) ]

Pause: did you believe that?

the key step above was a conversion from a sum-of-products to a product-of-sums. For the case of o and s having length 2, one can relatively easily verify it by brute force:

    Σ_{a(1)=0..2} Σ_{a(2)=0..2} ∏_{j=1..2} t(oj|s_a(j))
      =   t(o1|s0) t(o2|s0) + t(o1|s0) t(o2|s1) + t(o1|s0) t(o2|s2)
        + t(o1|s1) t(o2|s0) + t(o1|s1) t(o2|s1) + t(o1|s1) t(o2|s2)
        + t(o1|s2) t(o2|s0) + t(o1|s2) t(o2|s1) + t(o1|s2) t(o2|s2)
      =   t(o1|s0) [t(o2|s0) + t(o2|s1) + t(o2|s2)]
        + t(o1|s1) [t(o2|s0) + t(o2|s1) + t(o2|s2)]
        + t(o1|s2) [t(o2|s0) + t(o2|s1) + t(o2|s2)]
      =   [t(o1|s0) + t(o1|s1) + t(o1|s2)] [t(o2|s0) + t(o2|s1) + t(o2|s2)]
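The same check can be run numerically for any small lengths; a Python sketch with made-up random values standing in for the t(oj|si) (all names here are mine):

```python
import math
import random
from itertools import product

random.seed(0)
I, J = 2, 3  # source positions 0..I, observed positions 1..J

# made-up stand-ins for t(o_j | s_i): t[j][i]
t = [[random.random() for _ in range(I + 1)] for _ in range(J)]

# brute force: sum over all (I+1)^J alignments of the product over j
brute = sum(math.prod(t[j][a[j]] for j in range(J))
            for a in product(range(I + 1), repeat=J))

# product-of-sums: prod_j [ sum_i t(o_j | s_i) ]
factored = math.prod(sum(t[j]) for j in range(J))

assert abs(brute - factored) < 1e-9
```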

Making p(a|o, s) into a product

Armed with this, we can rewrite (11), the formula for γd(a), as

    γd(a) = ∏_{j=1..J} t(oj|s_a(j))  /  ∏_{j=1..J} [ Σ_{i=0..I} t(oj|si) ]

and this is just one big product

    γd(a) = ∏_{j=1..J} [ t(oj|s_a(j)) / Σ_{i=0..I} t(oj|si) ]

◮ each term in this product can be seen as the probability of a particular alignment step (j, i), given o, s, and it makes sense for the overall alignment probability to be a product of the individual steps. If we use the notation γd(j, i) for this probability of a single alignment step, we get

    γd(a) = ∏_{j=1..J} γd(j, a(j))
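This factorisation of γd(a) can be confirmed numerically on a small case; a sketch with made-up random t values (names are mine, not from the lecture):

```python
import math
import random
from itertools import product

random.seed(1)
I, J = 2, 2
t = [[random.random() for _ in range(I + 1)] for _ in range(J)]  # stand-in t(o_j|s_i)

def gamma_a(a):
    """p(a|o,s) by explicit normalisation over all alignments, as in (11)."""
    num = math.prod(t[j][a[j]] for j in range(J))
    den = sum(math.prod(t[j][b[j]] for j in range(J))
              for b in product(range(I + 1), repeat=J))
    return num / den

def gamma_ji(j, i):
    """single alignment-step probability: t(o_j|s_i) / sum_i' t(o_j|s_i')."""
    return t[j][i] / sum(t[j])

# gamma_d(a) equals the product of the per-position step probabilities
for a in product(range(I + 1), repeat=J):
    assert abs(gamma_a(a) - math.prod(gamma_ji(j, a[j]) for j in range(J))) < 1e-12
```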

We have for γd(j, i)

    γd(j, i) = t(oj|si) / Σ_{i′=0..I} t(oj|si′)      (12)

◮ crucially the cost of calculating γd(j, i) is trivial – it is linear in the length of s
◮ The efficient version of EM rests on seeing that once p((j, i)|o, s) is worked out for each j, i, the desired expected (o, s) counts can be worked out from them

Efficient EM via p((j, i) ∈ a|o, s)

recall the [E] step of the brute-force algorithm (if o, s are the dth pair):

    for each pair (o, s)
      for each a
        calculate p(a|o, s)
        for each j ∈ 1 : ℓo
          #(oj, s_a(j)) += p(a|o, s)    // pseudo counts of (o, s) word pairs in virtual data

◮ consider a particular (j, i). As you go through all possible a for o, s, each time the alignment a contains this pairing you make the increment γd(a). . . . the aim is an algorithm which works out quickly, for each j, i, what the sum of these increments will be, ie.³

    Σ_{a | (j,i) ∈ a} γd(a)      (13)

³The notation Σ_{a | (j,i) ∈ a}() means 'sum over only those a that have (j, i) ∈ a'

Summing over the alignments gives just γd(j, i)

for o position j, the s position i is fixed. Every other o position j′ can be aligned to any i between 0 and I, hence

    Σ_{a | (j,i) ∈ a} γd(a)
      = Σ_{a(1)=0..I} . . . Σ_{a(j−1)=0..I} Σ_{a(j+1)=0..I} . . . Σ_{a(J)=0..I} [ γd(j, i) ∏_{j′≠j} γd(j′, a(j′)) ]

we can pull out γd(j, i) and again do a sum-of-products to product-of-sums conversion with the rest, hence

      = γd(j, i) ∏_{j′≠j} [ Σ_{a(j′)=0..I} γd(j′, a(j′)) ]

each sum runs over every possible alignment destination for j′ and so each one sums to one, so you get just

      = γd(j, i)

Efficient EM algorithm for IBM Model 1 training

    initialise tr(o|s) uniformly
    repeat [E] followed by [M] till convergence

    [E]  for each o ∈ Vo
           for each s ∈ Vs ∪ {NULL}
             #(o, s) = 0
         for each pair o, s
           for each j ∈ 1 : ℓo
             for each i ∈ 0 : ℓs
               #(oj, si) += p((j, i)|o, s)     (using (12))

    [M]  for each s ∈ Vs ∪ {NULL}
           for each o ∈ Vo
             tr(o|s) = #(o, s) / Σ_{o′} #(o′, s)
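A minimal Python sketch of this E/M loop (NULL omitted for brevity, as in the worked examples that follow; function and variable names are mine):

```python
from collections import defaultdict

def ibm1_em(pairs, iterations):
    """Run `iterations` rounds of the efficient E/M loop.
    `pairs` is a list of (source_words, observed_words) tuples.
    NULL is omitted here to keep the sketch short."""
    o_vocab = {o for _, o_words in pairs for o in o_words}
    uniform = 1.0 / len(o_vocab)
    t = defaultdict(lambda: uniform)          # t(o|s), uniformly initialised
    for _ in range(iterations):
        counts = defaultdict(float)           # expected #(o, s)
        for s_words, o_words in pairs:
            for o in o_words:
                # denominator of (12): computed once per observed word
                denom = sum(t[(o, s)] for s in s_words)
                for s in s_words:
                    counts[(o, s)] += t[(o, s)] / denom   # += gamma_d(j, i)
        # [M] step: tr(o|s) = #(o, s) / sum_o' #(o', s)
        totals = defaultdict(float)
        for (o, s), c in counts.items():
            totals[s] += c
        t = defaultdict(float,
                        {(o, s): c / totals[s] for (o, s), c in counts.items()})
    return t

# the Example One corpus below (la maison / the house, la fleur / the flower)
corpus = [(["la", "maison"], ["the", "house"]),
          (["la", "fleur"], ["the", "flower"])]
print(round(ibm1_em(corpus, 1)[("the", "la")], 4))   # 0.5
print(round(ibm1_em(corpus, 2)[("the", "la")], 4))   # 0.6
```

These two values match the Example One evolution table (0.33 → 0.5 → 0.6).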
Further details

from the above outline to real code is a fairly short distance:

1. the formula for p((j, i)|o, s) is t(oj|si) / Σ_{i′=0..I} t(oj|si′), and the denominator stays the same as i is varied, so this denominator should be calculated once for each j

2. likewise in the M step, in #(o, s) / Σ_{o′} #(o′, s) the denominator stays the same as o is varied, so this denominator should be calculated once for each s
Example One

Assuming a corpus of 2 pairs:

    s1  la maison      o1  the house
    s2  la fleur       o2  the flower

initialising all tr(o|s) uniformly to 1/3, the evolution of tr(o|s) at each iteration looks like this:

    Obs     Src     1     2     3     4     5     . . .   final
    the     la      0.33  0.5   0.6   0.69  0.77  0.84    1.00
    house   la      0.33  0.25  0.2   0.15  0.11  0.081   0.00
    flower  la      0.33  0.25  0.2   0.15  0.11  0.081   0.00
    the     maison  0.33  0.5   0.43  0.36  0.3   0.24    0.00
    house   maison  0.33  0.5   0.57  0.64  0.7   0.76    1.00
    flower  maison  0.33  0.00  0.00  0.00  0.00  0.00    0.00
    the     fleur   0.33  0.5   0.43  0.36  0.3   0.24    0.00
    house   fleur   0.33  0.00  0.00  0.00  0.00  0.00    0.00
    flower  fleur   0.33  0.5   0.57  0.64  0.7   0.76    1.00

Example Two (Koehn p92)

assuming a corpus of 3 pairs:

    s1  das Haus      o1  the house
    s2  das Buch      o2  the book
    s3  ein Buch      o3  a book

initialising all t(o|s) uniformly to 0.25, the evolution of t(o|s) at each iteration is:

    Obs    Src   1     2     3       4       . . .   final
    the    das   0.25  0.5   0.6364  0.7479  . . .   1
    book   das   0.25  0.25  0.1818  0.1208  . . .
    house  das   0.25  0.25  0.1818  0.1313  . . .
    the    buch  0.25  0.25  0.1818  0.1208  . . .
    book   buch  0.25  0.5   0.6364  0.7479  . . .   1
    a      buch  0.25  0.25  0.1818  0.1313  . . .
    book   ein   0.25  0.5   0.4286  0.3466  . . .
    a      ein   0.25  0.5   0.5714  0.6534  . . .   1
    the    haus  0.25  0.5   0.4286  0.3466  . . .
    house  haus  0.25  0.5   0.5714  0.6534  . . .   1