Lexical Translation Models 1
January 24, 2013
Thursday, January 24, 13
Lexical Translation
How do we translate a word? Look it up in the dictionary.
Haus: house, home, shell, household
Multiple translations: different word senses, different inflections (?)
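Translation   Count
house          5000
home           2000
shell           100
household        80

$$\hat{p}_{\mathrm{MLE}}(e \mid \text{Haus}) = \begin{cases} 0.696 & \text{if } e = \text{house} \\ 0.279 & \text{if } e = \text{home} \\ 0.014 & \text{if } e = \text{shell} \\ 0.011 & \text{if } e = \text{household} \end{cases}$$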
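The estimate above is a relative frequency: each count divided by the total count for Haus (7180). A minimal sketch of that arithmetic follows; the function name `mle_translation_probs` is made up for illustration and is not from the slides.

```python
def mle_translation_probs(counts):
    """Relative-frequency (maximum likelihood) estimate of p(e | f)
    from the counts of each translation e observed for one source word f."""
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}


# Counts for "Haus" taken from the table on the slide.
haus_counts = {"house": 5000, "home": 2000, "shell": 100, "household": 80}
p_haus = mle_translation_probs(haus_counts)

for e, p in p_haus.items():
    print(f"p({e} | Haus) = {p:.3f}")
```

Rounded to three places, the values match the slide (0.696, 0.279, 0.014, 0.011).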
We want a model $p(e \mid f, m)$, where $e = \langle e_1, e_2, \ldots, e_m \rangle$ is the English sentence and $f = \langle f_1, f_2, \ldots, f_n \rangle$ is the foreign sentence.
Each word $e_i$ in $e$ is generated from exactly one word in $f$. The alignment $a_i$ indicates which word $e_i$ "came from"; specifically, it came from $f_{a_i}$.
Given the alignments $a$, translation decisions are conditionally independent of each other and depend only on the aligned source word $f_{a_i}$.
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \underbrace{p(a \mid f, m)}_{\text{Alignment}} \times \underbrace{\prod_{i=1}^{m} p(e_i \mid f_{a_i})}_{\text{Translation} \mid \text{Alignment}}$$
p(house | Haus)
p(shell | Haus)
p(declaration | Unabhängigkeitserklärung)
Remember bigram models...
Most of the action for the first 10 years
word order was hard.
Alignments can be visualized as links between two sentences, and they are represented as vectors of positions:
$a = (1, 2, 3, 4)^\top$

Words may be reordered during translation.
$a = (3, 4, 2, 1)^\top$

$a = (2, 3, 4)^\top$

English "just" does not have an equivalent. But it must be explained: we typically assume every source sentence contains a NULL token.
$a = (1, 2, 3, 0, 4)^\top$

A source word may translate into more than one target word.
$a = (1, 2, 3, 4, 4)^\top$

More than one source word may not translate as a unit in lexical translation.
das(1) Haus(2) brach(3) zusammen(4)
the(1) house(2) collapsed(3)
$a = ???$
$a = (1, 2, (3, 4)^\top)^\top$ ?
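To make the vector notation concrete, here is a small sketch that reads an alignment vector back into word pairs. The helper `aligned_pairs` is a hypothetical name, and the code assumes position 0 stands for the NULL token, as on the slide above.

```python
def aligned_pairs(source, target, a):
    """Pair each target word with the source word it is aligned to.

    source: source words WITHOUT NULL (position 0 is reserved for NULL)
    target: target words
    a:      alignment vector; a[i] in 0..n is the source position
            that target word i+1 "came from"
    """
    if len(a) != len(target):
        raise ValueError("alignment needs exactly one entry per target word")
    padded = ["NULL"] + list(source)
    return [(e, padded[j]) for e, j in zip(target, a)]


source = "das Haus brach zusammen".split()
target = "the house collapsed".split()

# Each target word gets exactly one source position, so "collapsed"
# can be linked to "brach" or to "zusammen", but not to both.
pairs = aligned_pairs(source, target, [1, 2, 3])
print(pairs)
```

Because the vector stores a single integer per target word, the many-to-one case has no valid encoding, which is the point of the `a = ???` slide.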
for each $i \in [1, 2, \ldots, m]$:
    $a_i \sim \mathrm{Uniform}(0, 1, 2, \ldots, n)$
    $e_i \sim \mathrm{Categorical}(\theta_{f_{a_i}})$

$$p(e, a \mid f, m) = \prod_{i=1}^{m} \frac{1}{1+n} \, p(e_i \mid f_{a_i})$$

$$p(e_i, a_i \mid f, m) = \frac{1}{1+n} \, p(e_i \mid f_{a_i})$$

$$p(e, a \mid f, m) = \prod_{i=1}^{m} p(e_i, a_i \mid f, m)$$
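The joint probability above can be evaluated directly once a translation table is available. The sketch below assumes the table is a dictionary `t` keyed by `(e_word, f_word)`; the probabilities in it are invented for illustration and do not come from the slides.

```python
def joint_prob(e, f, a, t):
    """p(e, a | f, m) = prod_i 1/(1+n) * p(e_i | f_{a_i}).

    f excludes NULL; alignment position 0 refers to NULL.
    t maps (e_word, f_word) to p(e_word | f_word); missing pairs count as 0.
    """
    padded = ["NULL"] + list(f)
    n = len(f)
    prob = 1.0
    for e_i, a_i in zip(e, a):
        prob *= (1.0 / (1 + n)) * t.get((e_i, padded[a_i]), 0.0)
    return prob


# Hypothetical translation probabilities, for illustration only.
t = {
    ("the", "NULL"): 0.2, ("the", "das"): 0.5, ("the", "Haus"): 0.1,
    ("house", "das"): 0.05, ("house", "Haus"): 0.8,
}
f = ["das", "Haus"]
e = ["the", "house"]

p_monotone = joint_prob(e, f, [1, 2], t)   # the<-das, house<-Haus
p_null = joint_prob(e, f, [0, 2], t)       # the<-NULL, house<-Haus
print(p_monotone, p_null)
```

With n = 2, each alignment choice contributes a factor of 1/3, so the first value is (0.5/3)(0.8/3) and the second is (0.2/3)(0.8/3).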
$$p(e_i, a_i \mid f, m) = \frac{1}{1+n} \, p(e_i \mid f_{a_i})$$

$$p(e_i \mid f, m) = \sum_{a_i=0}^{n} \frac{1}{1+n} \, p(e_i \mid f_{a_i})$$

Recall our independence assumption: all alignment decisions are independent of each other, and given alignments all translation decisions are independent of each other, so all translation decisions are independent of each other.

$$p(a, b, c, d) = p(a)\,p(b)\,p(c)\,p(d)$$

$$p(e \mid f, m) = \prod_{i=1}^{m} p(e_i \mid f, m) = \prod_{i=1}^{m} \sum_{a_i=0}^{n} \frac{1}{1+n} \, p(e_i \mid f_{a_i}) = \frac{1}{(1+n)^m} \prod_{i=1}^{m} \sum_{a_i=0}^{n} p(e_i \mid f_{a_i})$$
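The practical payoff of this derivation is that the sum over all (1+n)^m alignments collapses into a product of m small sums. The sketch below checks this numerically by comparing the factored form against a brute-force enumeration. The table `t` holds made-up probabilities, and both function names are illustrative.

```python
from itertools import product


def marginal_prob(e, f, t):
    """Factored form: prod_i sum_{a_i} 1/(1+n) * p(e_i | f_{a_i})."""
    padded = ["NULL"] + list(f)
    prob = 1.0
    for e_i in e:
        prob *= sum(t.get((e_i, f_j), 0.0) for f_j in padded) / len(padded)
    return prob


def brute_force_prob(e, f, t):
    """Explicit sum of p(e, a | f, m) over every alignment vector a."""
    padded = ["NULL"] + list(f)
    total = 0.0
    for a in product(range(len(padded)), repeat=len(e)):
        p = 1.0
        for e_i, a_i in zip(e, a):
            p *= t.get((e_i, padded[a_i]), 0.0) / len(padded)
        total += p
    return total


# Hypothetical translation probabilities, for illustration only.
t = {
    ("the", "NULL"): 0.2, ("the", "das"): 0.5, ("the", "Haus"): 0.1,
    ("house", "das"): 0.05, ("house", "Haus"): 0.8,
}
f = ["das", "Haus"]
e = ["the", "house"]

fast = marginal_prob(e, f, t)
slow = brute_force_prob(e, f, t)
print(fast, slow)
```

The two values agree up to floating-point error, but the factored form needs m(1+n) table lookups where the enumeration needs m(1+n)^m.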
Start with a foreign sentence and a target length.
NULL das Haus ist klein (source positions 1 to 4, plus NULL)
the house is small (target positions 1 to 4)
[Figure: the target sentence is built up one word at a time over successive slides, with each target word linked to a source position.]
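$$a^* = \arg\max_{a \in [0,1,\ldots,n]^m} p(a \mid e, f) = \arg\max_{a \in [0,1,\ldots,n]^m} \frac{p(e, a \mid f)}{\sum_{a'} p(e, a' \mid f)} = \arg\max_{a \in [0,1,\ldots,n]^m} p(e, a \mid f)$$

$$a_i^* = \arg\max_{a_i = 0}^{n} \frac{1}{1+n} \, p(e_i \mid f_{a_i}) = \arg\max_{a_i = 0}^{n} p(e_i \mid f_{a_i})$$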
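Since the best alignment decomposes per target word, it can be found with an independent argmax for each position. A minimal sketch follows, assuming a translation table `t` keyed by `(e_word, f_word)`. The probabilities are invented for illustration, except p(home | Haus) = 0.279, which reuses the earlier estimate.

```python
def best_alignment(e, f, t):
    """For each target word pick the source position (0 = NULL) that
    maximises p(e_i | f_{a_i}); ties go to the lowest position."""
    padded = ["NULL"] + list(f)
    return [
        max(range(len(padded)), key=lambda j: t.get((e_i, padded[j]), 0.0))
        for e_i in e
    ]


# Hypothetical translation probabilities, for illustration only.
t = {
    ("the", "NULL"): 0.1, ("the", "das"): 0.7,
    ("home", "Haus"): 0.279,
    ("is", "ist"): 0.8,
    ("little", "Haus"): 0.01, ("little", "klein"): 0.4,
    ("just", "NULL"): 0.3,
}
f = "das Haus ist klein".split()
e = "the home is little".split()

a_star = best_alignment(e, f, t)
print(a_star)
```

The constant factor 1/(1+n) is dropped in the code for the same reason it drops out of the argmax on the slide.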
NULL das Haus ist klein (source positions 1 to 4, plus NULL)
the home is little (target positions 1 to 4)
[Figure: alignment links between the two sentences, shown over successive slides.]
How do we learn the parameters p(e | f)?
If we had the alignments, we could estimate the parameters (MLE).
If we had the parameters, we could find the most likely alignments.
models)
alignments for every target word token in the training data
throughout the whole corpus
the source of any translation
standard MLE equation
p(ai | e, f)
(on board)
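The slides leave the derivation to the board. One common way to realise the loop sketched in these bullets is expectation-maximisation (EM), named here as an assumption rather than quoted from the slides. Under the uniform alignment model, the posterior is p(a_i = j | e, f) = p(e_i | f_j) / Σ_j' p(e_i | f_j'). The sketch below uses a made-up three-sentence toy corpus and a hypothetical function name `train_model1`.

```python
from collections import defaultdict


def train_model1(corpus, iterations=5):
    """EM sketch for the lexical translation table.

    corpus: list of (e_words, f_words) sentence pairs (f without NULL).
    Returns t with t[(e, f)] approximating p(e | f).
    """
    corpus = [(list(e), ["NULL"] + list(f)) for e, f in corpus]
    e_vocab = {w for e, _ in corpus for w in e}
    # Uniform initialisation over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected times f translates into e
        total = defaultdict(float)  # expected times f is the source of anything
        for e, f in corpus:
            for e_i in e:
                z = sum(t[(e_i, f_j)] for f_j in f)
                for f_j in f:
                    post = t[(e_i, f_j)] / z  # p(a_i = j | e, f)
                    count[(e_i, f_j)] += post
                    total[f_j] += post
        # Standard MLE equation, applied to expected counts.
        t = defaultdict(
            float,
            {(e_w, f_w): c / total[f_w] for (e_w, f_w), c in count.items()},
        )
    return t


# Toy corpus, invented for illustration.
corpus = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
t = train_model1(corpus, iterations=10)
print(round(t[("the", "das")], 3), round(t[("house", "das")], 3))
```

On this toy data the table shifts probability mass towards pairs that co-occur consistently, such as (the, das) and (book, Buch), even though no alignments were ever observed.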
evaluate perplexity.
$$\mathrm{PPL} = 2^{-\frac{1}{\sum_{(e,f) \in D} |e|} \log \prod_{(e,f) \in D} p(e \mid f)}$$
Iter 1 Iter 2 Iter 3 Iter 4 ... Iter ∞
perplexity
7.21 6.84 ...
2.30 2.21 ... 2
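As a rough illustration of the perplexity formula, the sketch below scores a corpus with the factored likelihood from earlier. It assumes base-2 logarithms, to match the base of the exponent. The toy corpus and the uniform table are invented for the example, and the function names are illustrative.

```python
import math


def sentence_log2_prob(e, f, t):
    """log2 p(e | f, m) using the factored form of the likelihood.
    Raises ValueError if some target word has zero probability."""
    padded = ["NULL"] + list(f)
    return sum(
        math.log2(sum(t.get((e_i, f_j), 0.0) for f_j in padded) / len(padded))
        for e_i in e
    )


def perplexity(corpus, t):
    """2 ** (-(1 / total target words) * log2 of the corpus likelihood)."""
    log_prob = sum(sentence_log2_prob(e, f, t) for e, f in corpus)
    n_words = sum(len(e) for e, _ in corpus)
    return 2 ** (-log_prob / n_words)


# Toy corpus and a uniform table, invented for illustration.
corpus = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
e_vocab = sorted({w for e, _ in corpus for w in e})
f_vocab = sorted({w for _, f in corpus for w in f} | {"NULL"})
uniform_t = {(e_w, f_w): 1.0 / len(e_vocab) for e_w in e_vocab for f_w in f_vocab}

ppl = perplexity(corpus, uniform_t)
print(ppl)
```

With a uniform table over a four-word target vocabulary, every target word has probability 1/4, so the perplexity is 4. A trained table should score lower on the same data.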
P: Possible links
S: Sure links

$$\mathrm{Precision}(A, P) = \frac{|P \cap A|}{|A|}$$

$$\mathrm{Recall}(A, S) = \frac{|S \cap A|}{|S|}$$

$$\mathrm{AER}(A, P, S) = 1 - \frac{|S \cap A| + |P \cap A|}{|S| + |A|}$$
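The three measures are plain set arithmetic over alignment links. The sketch below represents a link as a `(target_position, source_position)` tuple and assumes the sure links are a subset of the possible links. That is the usual annotation convention, but it is an assumption here, and the example link sets are invented.

```python
def alignment_metrics(A, S, P):
    """Precision, recall and alignment error rate for predicted links A
    against sure links S and possible links P (assumed to contain S)."""
    A, S, P = set(A), set(S), set(P)
    precision = len(P & A) / len(A)
    recall = len(S & A) / len(S)
    aer = 1 - (len(S & A) + len(P & A)) / (len(S) + len(A))
    return precision, recall, aer


# Invented link sets, for illustration only.
S = {(1, 1), (2, 2)}
P = S | {(3, 3), (3, 4)}
A = {(1, 1), (2, 2), (3, 3), (4, 4)}

precision, recall, aer = alignment_metrics(A, S, P)
print(precision, recall, aer)
```

Here three of the four predicted links are possible (precision 0.75), both sure links are found (recall 1.0), and AER = 1 − (2 + 3)/(2 + 4) = 1/6.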