Decoding and Inference with Syntactic Translation Models
Machine Translation, Lecture 15
Instructor: Chris Callison-Burch
TAs: Mitchell Stern, Justin Chiu
Website: mt-class.org/penn
[Figure: the input "jon-ga ringo-o tabeta" is analysed step by step. "jon-ga" is labelled NP, and "ringo-o tabeta" is covered by a node whose two children are reordered (2 1).]
jon-ga ringo-o tabeta : John ate an apple
polynomial space
CKY or variants on the CKY algorithm)
complexity substantially!
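Weighted inference rule (antecedents A and B, consequent C, side condition \varphi):
$$\frac{A : u \qquad B : v}{C : w}\;\varphi$$
CKY inference rule:
$$\frac{[X, i, k] : u \qquad [Y, k, j] : v}{[Z, i, j] : u \times v \times w}\;(Z \xrightarrow{w} X\,Y) \in G$$
CKY goal:
$$[S, 0, n]$$
CKY axioms:
$$\frac{}{[X, i-1, i] : w}\;(X \xrightarrow{w} f_i) \in G$$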
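The CKY items and inference rule can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: it assumes the Viterbi (max, ×) semiring, and the toy grammar, its weights, and every word of the input except "duck" are invented so that the chart contains the same item labels and spans as the chart on the slides.

```python
from collections import defaultdict

# Toy grammar and weights, invented for illustration only.
LEXICAL = {  # word -> list of (X, w), encoding the rule X -> word with weight w
    "i": [("PRP", 1.0)],
    "saw": [("V", 1.0)],
    "her": [("PRP", 1.0)],
    "duck": [("NN", 0.5), ("V", 0.5)],
}
BINARY = [  # (Z, X, Y, w) encodes the rule Z -> X Y with weight w
    ("NP", "PRP", "NN", 1.0),
    ("SBAR", "PRP", "V", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "V", "SBAR", 0.4),
    ("S", "PRP", "VP", 1.0),
]


def weighted_cky(words, lexical, binary, goal="S"):
    """Return the best weight of the goal item [goal, 0, n] (0.0 if underivable)."""
    n = len(words)
    chart = defaultdict(float)  # (X, i, j) -> best weight found so far
    # Axioms: [X, i-1, i] : w for every rule X -> f_i in the grammar.
    for i, word in enumerate(words, start=1):
        for X, w in lexical.get(word, []):
            chart[X, i - 1, i] = max(chart[X, i - 1, i], w)
    # Inference rule: combine [X, i, k] : u and [Y, k, j] : v into
    # [Z, i, j] : u * v * w, visiting spans in order of increasing width.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for Z, X, Y, w in binary:
                    u = chart.get((X, i, k), 0.0)
                    v = chart.get((Y, k, j), 0.0)
                    if u > 0.0 and v > 0.0:
                        chart[Z, i, j] = max(chart[Z, i, j], u * v * w)
    return chart.get((goal, 0, n), 0.0)


best = weighted_cky("i saw her duck".split(), LEXICAL, BINARY)
print(best)
```

With these made-up weights the reading that uses VP -> V NP wins, since 1.0 × 0.5 × 0.6 is larger than 1.0 × 0.5 × 0.4. Replacing `max` with addition would give the total weight of all derivations instead of the best one.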
[Figure: CKY chart filled in step by step for a four-word input ending in "duck". Items in the final chart: PRP,0,1; V,1,2; PRP,2,3; NN,3,4; V,3,4; NP,2,4; SBAR,2,4; VP,1,4; S,0,4.]
arity of the edge is the number of tails)
at nodes (cf. semiring parsing)
sites (non-terminals)
source and target languages
arity of the edge and must be the same in both languages
substitution sites
symbols
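la lectura de ayer : yesterday 's reading
[Figure: translation hypergraph with nodes S, X, X. The X nodes carry "la lectura : reading" and "ayer : yesterday". Two "de" edges lead to S: "de : 's", which swaps its children (1 2 → 2 1), and "de : from", which keeps their order (1 2 → 1 2).]
la lectura de ayer : reading from yesterday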
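The two competing "de" edges can be illustrated with a small Python sketch. It is only an assumption about how one might encode the target side of a synchronous rule (the names `DE_RULES`, `CHILDREN`, and `realize` are hypothetical); integers stand for the substitution sites of the source side "X1 de X2".

```python
# Target sides of the two synchronous rules for "de"; integers are 1-based
# indices of the substitution sites on the source side "X1 de X2".
DE_RULES = {
    "de : 's": (2, "'s", 1),      # children swapped: 1 2 -> 2 1
    "de : from": (1, "from", 2),  # children kept in order: 1 2 -> 1 2
}
# Translations of the two tail nodes: la lectura : reading, ayer : yesterday.
CHILDREN = ["reading", "yesterday"]


def realize(target_side, children):
    """Fill each substitution site with the translation of the matching child."""
    return " ".join(
        children[sym - 1] if isinstance(sym, int) else sym for sym in target_side
    )


translations = {name: realize(rhs, CHILDREN) for name, rhs in DE_RULES.items()}
for name, text in translations.items():
    print(f"{name:10s} -> {text}")
```

Both rules consume the same two tail nodes, which is why they appear as two hyperedges into the same head node rather than as separate trees.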
derivations passing through each edge/node
$O(|E| + |V|)$, $O(|E| + |V|)$, $O(|E| + |D_{\max}|\, k \log k)$
$|E| \in O(n^3 |G|^3)$
$|V| \in O(n^2 |G|)$
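A linear-time pass over an acyclic hypergraph can be sketched as follows. This is a hedged illustration, assuming the nodes are supplied in topological order (tails before heads) and that edge weights are non-negative; the node names and the weights 0.7 and 0.3 are invented. Each edge and each node is visited once, which is where a bound of the form O(|E| + |V|) comes from when edge arity is bounded.

```python
def best_weights(nodes, incoming, plus=max):
    """Compute one value per node by a single pass in topological order.

    nodes:    node ids, ordered so that every tail precedes its head
    incoming: dict head -> list of (edge_weight, tails); tails == () for axioms
    plus:     how alternative edges into the same node are merged
              (max gives Viterbi scores, addition gives inside scores)
    """
    best = {}
    for v in nodes:
        total = 0.0
        for weight, tails in incoming[v]:
            score = weight
            for t in tails:  # multiply in the values of the tail nodes
                score *= best[t]
            total = plus(total, score)
        best[v] = total
    return best


# Hypothetical hypergraph: two X nodes joined into S by two alternative edges.
NODES = ["X_lectura", "X_ayer", "S"]
INCOMING = {
    "X_lectura": [(1.0, ())],
    "X_ayer": [(1.0, ())],
    "S": [(0.7, ("X_lectura", "X_ayer")), (0.3, ("X_lectura", "X_ayer"))],
}

viterbi = best_weights(NODES, INCOMING)
inside = best_weights(NODES, INCOMING, plus=lambda a, b: a + b)
print(viterbi["S"], inside["S"])
```

Only the `plus` operation changes between the two calls, which is the practical point of phrasing these algorithms over a semiring.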
pattern
there a good translation that doesn’t violate the SCFG matching property)?
string translation model?
model) top-down matching algorithm
the translation forest
et al. (EMNLP, 2010)
Rules:
S(x1:NP x2:VP) → x1 x2
VP(x1:NP x2:V) → x2 x1
tabeta → ate
ringo-o → an apple
jon-ga → John
Input: jon-ga ringo-o tabeta
[Figure: the rules are matched step by step against the source tree S(NP VP(NP V)). The S rule keeps its children in order (1 2), with "John" under the NP; the VP rule swaps its children (2 1), so that "ate" comes before "an apple".]
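The rule set above can be applied to a source parse tree with a short recursive function. This is a sketch under stated assumptions: the tree is given as nested tuples, each tree rule is stored as a child permutation keyed by the node label and its child labels, and every rule has exactly one match (a real decoder would build a forest of alternatives). The names `REORDER`, `LEXICON`, and `translate` are hypothetical.

```python
# Tree rules as permutations of the children, keyed by (label, child labels).
REORDER = {
    ("S", ("NP", "VP")): (1, 2),  # S(x1:NP x2:VP) -> x1 x2
    ("VP", ("NP", "V")): (2, 1),  # VP(x1:NP x2:V) -> x2 x1
}
# Lexical rules.
LEXICON = {"jon-ga": "John", "ringo-o": "an apple", "tabeta": "ate"}

# Assumed source parse of "jon-ga ringo-o tabeta".
TREE = ("S", ("NP", "jon-ga"), ("VP", ("NP", "ringo-o"), ("V", "tabeta")))


def translate(tree):
    """Translate a source tree top-down by matching one rule at each node."""
    label, *children = tree
    # Preterminal node: a single string child, translated by a lexical rule.
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[children[0]]
    # Internal node: look up the rule by label and child labels, then reorder.
    order = REORDER[label, tuple(child[0] for child in children)]
    parts = [translate(child) for child in children]
    return " ".join(parts[i - 1] for i in order)


print(translate(TREE))
```

The VP rule produces "ate an apple" from the source order "ringo-o tabeta", and the S rule then places "John" in front of it.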
[Figure: the translation hypergraph with nodes S, X, X, repeated over several animation steps. Edges: "la lectura : reading", "ayer : yesterday", "de : 's" (1 2 → 2 1), and "de : from" (1 2 → 1 2).]
[Figure: an edge "de : saw" combines two X nodes in order (1 2). Across the animation steps the child translations shown are "John" and "Mary", then "I think I" and "Mary", then "I think I" and "somebody". Later steps add an S edge that wraps the result in <s> and </s>, and an X edge that prefixes "fortunately,".]
the local (conditional) probability of a word in context
along edges.
“enough”.
probabilities of the nth, (n+1)th, ... words can be computed.
probability of any word
certainty because the context is not known
Split nodes by the (n-1) words on both sides of the convergent edges.
nodes)
tail node corresponding to the substitution variable
edges from v so that they also proceed from vqe
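Why it is enough to split nodes by their boundary words can be seen in a bigram (n = 2) sketch. This is an illustration only: the item layout `(first_word, last_word, score)`, the function `combine`, and the probabilities in `BIGRAM` are all invented. With a bigram model, the n-1 = 1 word on each edge of a partial translation is all the language model needs when two pieces are joined.

```python
# Invented bigram probabilities p(word | previous word).
BIGRAM = {
    ("the", "gray"): 0.1,
    ("gray", "stain"): 0.5,
    ("the", "stain"): 0.2,
}


def combine(left, right, bigram):
    """Concatenate two partial translations and score the junction.

    Each item is (first_word, last_word, score). Only the last word of the
    left item and the first word of the right item are needed to compute the
    new language model factor, so interior words can be forgotten.
    """
    l_first, l_last, l_score = left
    r_first, r_last, r_score = right
    junction = bigram[l_last, r_first]  # p(r_first | l_last)
    return (l_first, r_last, l_score * r_score * junction)


def word(w):
    """A one-word item: the word is both its first and its last word."""
    return (w, w, 1.0)


# "the gray stain" is scored as p(gray|the) * p(stain|gray).
the_gray_stain = combine(combine(word("the"), word("gray"), BIGRAM), word("stain"), BIGRAM)
# "the stain" is scored as p(stain|the).
the_stain = combine(word("the"), word("stain"), BIGRAM)
print(the_gray_stain, the_stain)
```

Both results end up with the same boundary words ("the", "stain"), so under a bigram model they can share one split node even though their scores differ.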
[Figure: language model integration, shown over many animation steps. An X node for "la mancha" has the candidate translations "the stain", "the gray stain", "the man", and "the husband"; the weights shown are 0.1 0.2 0.7 0.4 0.6 0.6 0.4. Two "de" edges ("de : from", order 1 2; "de : 's", order 2 1) combine the X nodes. Language model factors shown along the way: p(mancha|la); p(stain|the); p(gray|the) × p(stain|gray). In the final steps the X node is split into separate nodes labelled "la mancha", "the stain", "the man", and "the husband".]
impractical
algorithm
MT
decoder, or as a second “rescoring” pass
example, all nodes corresponding to a certain source span)
translation hypergraph may not have cycles