SLIDE 1

Lexical Translation Models 1

January 24, 2013

Thursday, January 24, 13

SLIDE 2

Lexical Translation

  • How do we translate a word? Look it up in the dictionary:

    Haus : house, home, shell, household

  • Multiple translations
  • Different word senses, different registers, different inflections (?)
  • house, home are common
  • shell is specialized (the Haus of a snail is a shell)

SLIDE 3

How common is each?

  Translation   Count
  house          5000
  home           2000
  shell           100
  household        80

SLIDE 4

MLE

  p̂_MLE(e | Haus) =  0.696  if e = house
                     0.279  if e = home
                     0.014  if e = shell
                     0.011  if e = household
                     0      otherwise
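
The MLE here is just relative frequency. A minimal sketch in Python, using the counts from the "How common is each?" slide:

```python
# MLE translation probabilities for "Haus", estimated as relative
# frequencies of the observed translation counts.
counts = {"house": 5000, "home": 2000, "shell": 100, "household": 80}
total = sum(counts.values())

# p_mle[e] approximates p(e | Haus); any unseen translation gets probability 0.
p_mle = {e: c / total for e, c in counts.items()}

print({e: round(p, 3) for e, p in p_mle.items()})
# {'house': 0.696, 'home': 0.279, 'shell': 0.014, 'household': 0.011}
```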

SLIDE 5

Lexical Translation

  • Goal: a model p(e | f, m), where e and f are complete English and Foreign sentences:

    e = ⟨e_1, e_2, . . . , e_m⟩    f = ⟨f_1, f_2, . . . , f_n⟩

  • Lexical translation makes the following assumptions:
  • Each word e_i in e is generated from exactly one word in f
  • Thus, we have an alignment a that indicates which word e_i "came from"; specifically, it came from f_{a_i}
  • Given the alignments a, translation decisions are conditionally independent of each other and depend only on the aligned source word f_{a_i}


SLIDE 9

Lexical Translation

  • Putting our assumptions together, we have:

    p(e | f, m) = Σ_{a ∈ [0,n]^m}  p(a | f, m)  ×  Π_{i=1}^{m} p(e_i | f_{a_i})
                                   (Alignment)     (Translation | Alignment)

SLIDE 10

Lexical Translation

  p(e_i | f_{a_i})

  p(house | Haus)
  p(shell | Haus)
  p(declaration | Unabhaengigkeitserklaerung)

  Remember bigram models...


SLIDE 16

Alignment

  p(a | f, m)

  Most of the action for the first 10 years of MT was here. Words weren't the problem; word order was hard.

SLIDE 17

Alignment

  • Alignments can be visualized by drawing links between the two sentences, and they are represented as vectors of positions:

    a = (1, 2, 3, 4)ᵀ

SLIDE 18

Reordering

  • Words may be reordered during translation.

    a = (3, 4, 2, 1)ᵀ

SLIDE 19

Word Dropping

  • A source word may not be translated at all.

    a = (2, 3, 4)ᵀ

SLIDE 20

Word Insertion

  • Words may be inserted during translation: the source language just does not have an equivalent. But the inserted word must be explained somehow, so we typically assume every source sentence contains a NULL token.

    a = (1, 2, 3, 0, 4)ᵀ

SLIDE 21

One-to-many Translation

  • A source word may translate into more than one target word.

    a = (1, 2, 3, 4, 4)ᵀ

SLIDE 22

Many-to-one Translation

  • More than one source word may translate as a unit; this cannot be represented in lexical translation.

    das Haus brach zusammen
     1    2    3      4

    the house collapsed
     1    2       3

    a = ???
    a = (1, 2, (3, 4)ᵀ)ᵀ ?

SLIDE 24

IBM Model 1

  • Simplest possible lexical translation model
  • Additional assumptions:
  • The m alignment decisions are independent
  • The alignment distribution for each a_i is uniform over all source words and NULL

  for each i ∈ [1, 2, . . . , m]:
      a_i ∼ Uniform(0, 1, 2, . . . , n)
      e_i ∼ Categorical(θ_{f_{a_i}})
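
The generative story can be written directly as sampling code. This is an illustrative sketch, not from the slides; the translation table `t` and its values are hypothetical stand-ins for the Categorical parameters θ.

```python
import random

# Hypothetical translation table: t[f][e] plays the role of Categorical(theta_f).
t = {
    "NULL": {"oh": 1.0},
    "das":  {"the": 0.8, "that": 0.2},
    "Haus": {"house": 0.7, "home": 0.3},
}

def sample_translation(f, m, t):
    """Sample e, a from Model 1: a_i ~ Uniform over NULL and the n source
    words, then e_i ~ Categorical(theta_{f_{a_i}})."""
    src = ["NULL"] + f
    n = len(f)
    e, a = [], []
    for _ in range(m):
        a_i = random.randint(0, n)                 # a_i ~ Uniform(0, 1, ..., n)
        dist = t[src[a_i]]                         # theta_{f_{a_i}}
        e_i = random.choices(list(dist), weights=list(dist.values()))[0]
        e.append(e_i)
        a.append(a_i)
    return e, a

e, a = sample_translation(["das", "Haus"], 2, t)
```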

SLIDE 25

IBM Model 1

  for each i ∈ [1, 2, . . . , m]:
      a_i ∼ Uniform(0, 1, 2, . . . , n)
      e_i ∼ Categorical(θ_{f_{a_i}})

  p(e_i, a_i | f, m) = 1/(1 + n) · p(e_i | f_{a_i})

  p(e, a | f, m) = Π_{i=1}^{m} p(e_i, a_i | f, m)
                 = Π_{i=1}^{m} 1/(1 + n) · p(e_i | f_{a_i})
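
Under these assumptions the joint probability is a simple product over target positions. A sketch, with a hypothetical translation table `t` whose values are made up for illustration:

```python
# Hypothetical translation table t[f][e] standing in for p(e | f).
t = {
    "NULL":  {"the": 0.1},
    "das":   {"the": 0.7, "that": 0.15},
    "Haus":  {"house": 0.7, "home": 0.28},
    "ist":   {"is": 0.9},
    "klein": {"small": 0.5, "little": 0.4},
}

def joint_prob(e, f, a, t):
    """p(e, a | f, m) = prod_i 1/(1+n) * p(e_i | f_{a_i}).
    Alignment position 0 refers to the NULL token."""
    n = len(f)
    src = ["NULL"] + f
    p = 1.0
    for e_i, a_i in zip(e, a):
        p *= (1.0 / (1 + n)) * t[src[a_i]].get(e_i, 0.0)
    return p

f = ["das", "Haus", "ist", "klein"]
prob = joint_prob(["the", "house", "is", "small"], f, [1, 2, 3, 4], t)
# prob == (1/5)**4 * 0.7 * 0.7 * 0.9 * 0.5
```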

SLIDE 30

Marginal probability

  p(e_i, a_i | f, m) = 1/(1 + n) · p(e_i | f_{a_i})

  p(e_i | f, m) = Σ_{a_i=0}^{n} 1/(1 + n) · p(e_i | f_{a_i})

  Recall our independence assumptions: all alignment decisions are independent of each other, and given the alignments all translation decisions are independent of each other, so all translated words are independent of each other:

  p(a, b, c, d) = p(a) p(b) p(c) p(d)

  p(e | f, m) = Π_{i=1}^{m} p(e_i | f, m)
              = Π_{i=1}^{m} Σ_{a_i=0}^{n} 1/(1 + n) · p(e_i | f_{a_i})
              = 1/(1 + n)^m · Π_{i=1}^{m} Σ_{a_i=0}^{n} p(e_i | f_{a_i})
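
Because the sum distributes over the product, the marginal costs O(m · n) rather than enumerating all (1 + n)^m alignments. A sketch, again with a hypothetical table `t`:

```python
# Hypothetical translation table t[f][e] standing in for p(e | f).
t = {
    "NULL":  {"the": 0.1},
    "das":   {"the": 0.7},
    "Haus":  {"house": 0.7, "home": 0.28},
    "ist":   {"is": 0.9},
    "klein": {"small": 0.5},
}

def marginal_prob(e, f, t):
    """p(e | f, m) = prod_i  (1/(1+n)) * sum_{a_i=0}^{n} p(e_i | f_{a_i})."""
    src = ["NULL"] + f
    n = len(f)
    p = 1.0
    for e_i in e:
        p *= sum(t[f_j].get(e_i, 0.0) for f_j in src) / (1 + n)
    return p

p = marginal_prob(["the", "house", "is", "small"],
                  ["das", "Haus", "ist", "klein"], t)
# p == (0.8/5) * (0.7/5) * (0.9/5) * (0.5/5)
```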

SLIDE 37

Example

  Start with a foreign sentence and a target length, then generate each target word from its aligned source word (or NULL):

    NULL das Haus ist klein
     0    1    2   3    4

    the house is small


SLIDE 56

Finding the Viterbi Alignment

  a* = argmax_{a ∈ [0,1,...,n]^m} p(a | e, f)

     = argmax_{a ∈ [0,1,...,n]^m} p(e, a | f) / Σ_{a′} p(e, a′ | f)

     = argmax_{a ∈ [0,1,...,n]^m} p(e, a | f)

  a*_i = argmax_{a_i ∈ [0,...,n]} 1/(1 + n) · p(e_i | f_{a_i})
       = argmax_{a_i ∈ [0,...,n]} p(e_i | f_{a_i})
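
Since the alignment decisions decouple, the Viterbi alignment can be found independently per target word. A sketch (the table `t` is hypothetical, as above):

```python
# Hypothetical translation table t[f][e] standing in for p(e | f).
t = {
    "NULL":  {"the": 0.1},
    "das":   {"the": 0.7},
    "Haus":  {"house": 0.7, "home": 0.28},
    "ist":   {"is": 0.9},
    "klein": {"small": 0.5, "little": 0.3},
}

def viterbi_alignment(e, f, t):
    """a*_i = argmax_{a_i in 0..n} p(e_i | f_{a_i}); the constant 1/(1+n)
    factor drops out of the argmax."""
    src = ["NULL"] + f
    return [max(range(len(src)), key=lambda j: t[src[j]].get(e_i, 0.0))
            for e_i in e]

a = viterbi_alignment(["the", "home", "is", "little"],
                      ["das", "Haus", "ist", "klein"], t)
# a == [1, 2, 3, 4]
```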

SLIDE 57

Finding the Viterbi Alignment

    NULL das Haus ist klein
     0    1    2   3    4

    the home is little


SLIDE 85

Learning Lexical Translation Models

  • How do we learn the parameters p(e | f)?
  • "Chicken and egg" problem
  • If we had the alignments, we could estimate the parameters (MLE)
  • If we had the parameters, we could find the most likely alignments

SLIDE 86

EM Algorithm

  • Pick some random (or uniform) parameters
  • Repeat until you get bored (~5 iterations for lexical translation models):
  • Using your current parameters, compute "expected" alignments p(a_i | e, f) for every target word token in the training data (on board)
  • Keep track of the expected number of times f translates into e throughout the whole corpus
  • Keep track of the expected number of times that f is used as the source of any translation
  • Use these expected counts as if they were "real" counts in the standard MLE equation
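
The loop above can be sketched in a few lines. This is an illustrative implementation of the slide's recipe (the function name `em_model1` and the toy corpus are my own), not a reference implementation:

```python
from collections import defaultdict

def em_model1(corpus, iterations=5):
    """EM for IBM Model 1. corpus: list of (e_sentence, f_sentence) pairs,
    each a list of words; NULL is prepended to every source sentence."""
    # Initialize t(e | f) uniformly over the target vocabulary.
    e_vocab = {w for e, f in corpus for w in e}
    init = 1.0 / len(e_vocab)
    t = defaultdict(lambda: defaultdict(float))
    for e, f in corpus:
        for f_j in ["NULL"] + f:
            for e_i in e:
                t[f_j][e_i] = init

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))  # expected c(e, f)
        total = defaultdict(float)                       # expected c(f)
        # E-step: expected alignments p(a_i | e, f) for every target token.
        for e, f in corpus:
            src = ["NULL"] + f
            for e_i in e:
                z = sum(t[f_j][e_i] for f_j in src)
                for f_j in src:
                    delta = t[f_j][e_i] / z
                    count[f_j][e_i] += delta
                    total[f_j] += delta
        # M-step: treat expected counts as "real" counts in the MLE equation.
        for f_j in count:
            for e_i in count[f_j]:
                t[f_j][e_i] = count[f_j][e_i] / total[f_j]
    return t

# Toy corpus: EM works out that "das" means "the", not "house" or "book".
corpus = [(["the", "house"], ["das", "Haus"]),
          (["the", "book"], ["das", "Buch"])]
t = em_model1(corpus)
```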

SLIDE 87

EM for Model 1

SLIDE 92

Convergence

SLIDE 93

Evaluation

  • Since we have a probabilistic model, we can evaluate perplexity.

    PPL = 2^( −(1 / Σ_{(e,f)∈D} |e|) · log₂ Π_{(e,f)∈D} p(e | f) )

                      Iter 1   Iter 2   Iter 3   Iter 4   ...   Iter ∞
    −log likelihood    7.66     7.21     6.84     ...             6
    perplexity         2.42     2.30     2.21     ...             2
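
The perplexity formula can be computed directly from per-sentence model probabilities. A sketch with a dummy model (the constant sentence probability is made up purely to make the arithmetic checkable):

```python
import math

def perplexity(dataset, model_prob):
    """PPL = 2^( -(1 / sum |e|) * sum_{(e,f) in D} log2 p(e | f) )."""
    total_words = sum(len(e) for e, f in dataset)
    total_log2 = sum(math.log2(model_prob(e, f)) for e, f in dataset)
    return 2 ** (-total_log2 / total_words)

# Dummy model: probability 0.25 for the single two-word sentence,
# i.e. 0.5 per word, so per-word perplexity should be 2.
dataset = [(["the", "house"], ["das", "Haus"])]
ppl = perplexity(dataset, lambda e, f: 0.25)
# ppl == 2.0
```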

SLIDE 94

Alignment Error Rate

  P : Possible links
  S : Sure links

  Precision(A, P) = |P ∩ A| / |A|

  Recall(A, S) = |S ∩ A| / |S|

  AER(A, P, S) = 1 − (|S ∩ A| + |P ∩ A|) / (|S| + |A|)
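
With alignments represented as sets of (source position, target position) links, all three quantities are set intersections. A sketch (the example links are made up; the convention that sure links S are a subset of possible links P follows the slide):

```python
def precision(A, P):
    return len(set(P) & set(A)) / len(A)

def recall(A, S):
    return len(set(S) & set(A)) / len(S)

def aer(A, P, S):
    """AER(A, P, S) = 1 - (|S ∩ A| + |P ∩ A|) / (|S| + |A|)."""
    A, P, S = set(A), set(P), set(S)
    return 1.0 - (len(S & A) + len(P & A)) / (len(S) + len(A))

S = {(1, 1), (2, 2)}   # sure links
P = S | {(3, 3)}       # possible links include all sure links
A = {(1, 1), (3, 3)}   # hypothesized alignment
# precision(A, P) == 1.0, recall(A, S) == 0.5, aer(A, P, S) == 0.25
```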

SLIDE 102

Announcements

  • First language-in-10 presentations start next week
  • Tuesday, Jan 29: David - Latin
  • Thursday, Jan 31: Weston - Mandarin
  • HW 1 is now available (due Feb. 12)
