Faster Decoding for Phrases and Syntax
Kenneth Heafield
Translation is Expensive
“speed-up in tuning time but affects the performance”; “18 days using 12 cores” [Williams et al., WMT 2014]
“Time-sensitive BLEU score” [Chung and Galley, 2012]
“Due to time constraints, this procedure was not used” [Servan et al., WMT 2012]
Introduction Problem Cube Pruning Incremental Conclusion
“LM queries often account for more than 50% of the CPU”
[Green et al., WMT 2014]
1 Faster queries (KenLM)
2 More effective queries
Le garçon a vu l’homme avec un télescope
Parse: S:S X:NP X:VP X:VP X:PP X:V X:NP
Le garçon a vu l’homme avec un télescope
Le garçon → The boy | A boy
a vu → seen | saw | view
l’homme → man | the man | some men
avec un télescope → with the telescope | to an telescope | with a telescope
X:VP → X:V X:NP
X:V “a vu”: seen, saw, view
X:NP “l’homme”: man, the man, some men
X:VP → X:V X:NP
X:V “a vu”: seen, saw, view
X:NP “l’homme”: man, the man, some men
X:VP “a vu l’homme”: seen man, seen the man, seen some men, saw man, saw the man, saw some men, view man, view the man, view some men
Each hypothesis carries a score:
X:V “a vu”: seen, saw, view
X:NP “l’homme”: man, the man, some men
X:VP “a vu l’homme”: seen man, seen the man, seen some men, saw man, saw the man, saw some men, view man, view the man, view some men
X:V “a vu”: seen, saw, view
X:NP “l’homme”: man, the man, some men
X:VP “a vu l’homme”, sorted by score: saw the man, seen the man, saw man, saw some men, view man, seen man, view the man, seen some men, view some men
Hypotheses are built by string concatenation, and the language model probability changes when this is done:

p(saw the man) = p(saw) p(the man) × [p(the | saw) p(man | saw the)] / [p(the) p(man | the)]
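As a worked check of this correction, with toy log probabilities (the numbers below are invented for illustration, not from any real model): correcting the boundary-crossing terms of "the man" after concatenation gives the same score as scoring "saw the man" directly.

```python
import math

# Toy log probabilities, made up for illustration only.
logp = {
    ("saw",): -0.8,
    ("the",): -0.3,
    ("the", "saw"): -0.5,          # log p(the | saw)
    ("man", "the"): -0.4,          # log p(man | the)
    ("man", "saw the"): -0.6,      # log p(man | saw the)
}

# Score of "the man" in isolation: log p(the) + log p(man | the)
score_the_man = logp[("the",)] + logp[("man", "the")]

# Concatenating "saw" + "the man": subtract the stale estimates for the
# boundary-crossing words and add the context-aware ones.
score_concat = (logp[("saw",)] + score_the_man
                - logp[("the",)] + logp[("the", "saw")]
                - logp[("man", "the")] + logp[("man", "saw the")])

# Scoring the whole string directly:
direct = logp[("saw",)] + logp[("the", "saw")] + logp[("man", "saw the")]

assert math.isclose(score_concat, direct)
```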
Log probability is part of the score ⇒ scores do not sum ⇒ local decisions may not be globally optimal ⇒ search is hard.
          man 3.6   the man 4.3   some men 6.3
seen 3.8  8.8       7.6           9.5
saw  4.0  8.3       6.9           8.5
view 4.0  8.5       8.9           10.8

[Lowerre, 1976; Chiang, 2005]
Queue: seen man 3.8+3.6=7.4 [Chiang, 2007]
Scored: seen man → 8.8. Queue: saw man 4.0+3.6=7.6, seen the man 3.8+4.3=8.1 [Chiang, 2007]
Scored: seen man → 8.8, saw man → 8.3. Queue: view man 4.0+3.6=7.6, seen the man 3.8+4.3=8.1, saw the man 4.0+4.3=8.3 [Chiang, 2007]
Scored: seen man → 8.8, saw man → 8.3, view man → 8.5. Queue: seen the man 3.8+4.3=8.1, saw the man 4.0+4.3=8.3, view the man 4.0+4.3=8.3 [Chiang, 2007]
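The queue steps above can be sketched with a binary heap. The part scores are the ones from the example grid; `joint_score` here is just the sum of part scores, which is the estimate the queue sorts on, standing in for a real model score of the concatenation.

```python
import heapq

# Part scores from the example (negative log probabilities; lower is better),
# each list sorted best-first.
left = [("seen", 3.8), ("saw", 4.0), ("view", 4.0)]
right = [("man", 3.6), ("the man", 4.3), ("some men", 6.3)]

def joint_score(l, r):
    # Stand-in for the real model score of the concatenation; here simply
    # the sum of the part scores.
    return l[1] + r[1]

def cube_prune(left, right, k):
    """Pop the k best combinations without scoring all |left| x |right|."""
    seen = {(0, 0)}
    queue = [(joint_score(left[0], right[0]), 0, 0)]
    out = []
    while queue and len(out) < k:
        score, i, j = heapq.heappop(queue)
        out.append((left[i][0] + " " + right[j][0], score))
        # Push the two neighbouring grid cells, if not already queued.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(queue, (joint_score(left[ni], right[nj]), ni, nj))
    return out

# First four pops: seen man, saw man, view man, seen the man
top4 = cube_prune(left, right, 4)
```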
Beam Search: make every dish; keep the best k, throw the rest out.
Cube pruning: combine the best ingredients; only make k dishes.
Left strings: is a, are a
Right strings: countries that, countries which, country
Combinations: is a countries that, are a countries that, are a countries which, . . .
Beam Search: make every dish; keep the best k, throw the rest out.
Cube pruning: combine the best ingredients; only make k dishes.
Coarse-to-Fine: make small portions, taste, and order the best ones.
Decode multiple times, adding detail each time:
Increased LM order, words instead of classes
Detect and prune “a countries” with a bigram LM.
[Zhang et al., 2008; Petrov et al., 2008]
Requires tuning each pruning pass. Operates in lock step.
Competing translations have words in common: is a, are a.
Words at the boundary matter most: a + country, a + countries.
Beam Search: make every dish; keep the best k, throw the rest out.
Cube pruning: combine the best ingredients; only make k dishes.
Coarse-to-Fine: make small portions, taste, and order the best ones.
Incremental: taste during cooking; share ingredients.
1 Left-to-right phrase-based: one side
2 Bottom-up syntax: both sides
Plain text:
The United Kingdom is a + . . .
Scotland and Wales are a + . . .

Tree (strings grouped by shared suffix, read toward ε):
The United Kingdom → is → a → ε
Scotland and Wales → are → a → ε
Plain text:
. . . + countries that
. . . + countries which
. . . + country

Tree (strings grouped by shared prefix, from ε):
ε → country
ε → countries → that
ε → countries → which
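A minimal sketch of such a prefix trie in Python; the dict-of-dicts representation is my own illustrative choice, not the decoder's actual data structure.

```python
def build_trie(phrases):
    """Group token sequences by shared prefix: each node maps a word to a child."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
        node[None] = True  # epsilon marker: a complete phrase ends here
    return root

trie = build_trie(["countries that", "countries which", "country"])
# The three phrases share the root; "countries that" and "countries which"
# additionally share the node for "countries".
```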
Suffix trie: ε ← a ← {is ← The United Kingdom, are ← Scotland and Wales}
Prefix trie: ε → {country, countries → {that, which}}
Does the model like “a + countries”?
Does the model like “a + countries”?
Yes: try more detail. No: consider alternatives.
Formally: best-first search with a priority queue.
Splitting “a + ε”:
Best child: “a + countries”
Other children: “a + country”
Score(a) = max{Score(is a), Score(are a)}
Score(a + countries) < Score(a) + Score(countries)
Formally: p(countries | a) replaces p(countries).
Populate the queue with ε + ε.
Loop until k complete options have been found: split the top-scoring option.
Build a tree from the k complete options.
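The loop above can be sketched as best-first search over a trie of grouped translations. The phrase scores below are invented for illustration; the bound of a group is the best score beneath it, which mirrors Score(a) = max{Score(is a), Score(are a)} above, except written with min because these scores are costs.

```python
import heapq
from itertools import count

# Hypothetical costs for competing target strings (lower is better);
# the numbers are made up for illustration.
phrases = {"country": 5.1, "countries that": 4.6, "countries which": 4.9}

def build(phrases):
    """Group the strings into a prefix trie; a leaf (key None) holds the cost."""
    root = {}
    for text, score in phrases.items():
        node = root
        for word in text.split():
            node = node.setdefault(word, {})
        node[None] = score
    return root

def bound(node):
    """Optimistic bound of a trie node: the best cost of any phrase below it."""
    return min(v if k is None else bound(v) for k, v in node.items())

def best_first(root, k):
    """Pop the entry with the best bound; split it into its children."""
    tie = count()  # tiebreaker so the heap never compares the payloads
    queue = [(bound(root), next(tie), "", root)]
    results = []
    while queue and len(results) < k:
        b, _, prefix, node = heapq.heappop(queue)
        if node is None:                      # a fully specified phrase
            results.append((b, prefix.strip()))
            continue
        for word, child in node.items():      # split: push each child
            if word is None:
                heapq.heappush(queue, (child, next(tie), prefix, None))
            else:
                heapq.heappush(queue,
                               (bound(child), next(tie), prefix + " " + word, child))
    return results

results = best_first(build(phrases), 3)
```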
Translations are assembled from left to right. Partial translations often share suffixes. Phrases often share prefixes. Test suffixes and prefixes before full combinations.
Task: Chinese–English
Source: Stanford
Model: phrase-based
Software: my own decoder, mtplz, versus Moses
[Plot: average model score versus CPU seconds/sentence, comparing mtplz with Incremental against Moses with Cube Pruning]
[Plot: uncased BLEU versus CPU seconds/sentence, comparing mtplz with Incremental against Moses with Cube Pruning]
The language model cares most about adjacent words. Test them first.
1 Left-to-right phrase-based: one side
2 Bottom-up syntax: both sides
is a X:NP1 </s>
is a X:NP1 that
How do we find the best value to substitute? Manage words on both sides.
Competing hypotheses, each with a left state and a right state:
countries that maintain diplomatic relations with North Korea .
countries that have an embassy in DPR Korea .
country that maintains some diplomatic ties in North Korea .
nations which has some diplomatic ties with DPR Korea .
country that maintains some diplomatic ties with DPR Korea .
Left state ⋄ right state, where ⋄ stands for words the language model does not care about:
(countries that ⋄ with North Korea .)
(nations which has ⋄ with DPR Korea .)
(countries that have ⋄ DPR Korea .)
(country ⋄ in North Korea .)
(country ⋄ with DPR Korea .)
(ε ⋄ ε)
  (country ⋄ Korea .)
    (country ⋄ with DPR Korea .)
    (country ⋄ in North Korea .)
  (nations which has ⋄ with DPR Korea .)
  (countries that ⋄ Korea .)
    (countries that have ⋄ DPR Korea .)
    (countries that ⋄ with North Korea .)
Rules: is a X:NP1 </s> and X:V1 the X:N2
With undetermined states substituted for the non-terminals:
is a X:NP1 </s> becomes is a (ε ⋄ ε) </s>
X:V1 the X:N2 becomes (ε ⋄ ε) the (ε ⋄ ε)
Does the LM like “is a (countries that ⋄ Korea .) </s>”?
Yes: try more detail. No: consider alternatives.
Formally: a priority queue containing breadcrumbs.
(ε ⋄ ε)
  (country ⋄ Korea .)
    (country ⋄ with DPR Korea .)
    (country ⋄ in North Korea .)
  (nations which has ⋄ with DPR Korea .)
  (countries that ⋄ Korea .)
    (countries that have ⋄ DPR Korea .)
    (countries that ⋄ with North Korea .)
(ε ⋄ ε)[1+]: the children except the zeroth.
Splitting “is a (ε ⋄ ε) </s>”:
Zeroth child: “is a (countries that ⋄ Korea .) </s>”
Other children: “is a (ε ⋄ ε)[1+] </s>” (children except the zeroth)
A priority queue contains competing entries:
is a (countries that ⋄ Korea .) </s>
(ε ⋄ ε) the (ε ⋄ ε)
is a (ε ⋄ ε)[1+] </s>
The algorithm pops the top entry, splits a non-terminal, and pushes the results.
Populate the queue with rules like “is a (ε ⋄ ε) </s>”.
Loop until k complete options have been found: split the top-scoring option, leaving a breadcrumb.
Build a tree from the k complete options.
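The split-and-breadcrumb step can be sketched for a rule with one non-terminal. The rule string, state strings, and scores below are invented for illustration; a real decoder would also adjust the score when a state becomes concrete (the boundary words meet the rule's words), which this sketch omits.

```python
import heapq
from itertools import count

# Hypothetical candidate states for the non-terminal (cost, state),
# sorted best-first; lower cost is better.
states = [(1.2, "(countries that ⋄ Korea .)"),
          (1.5, "(country ⋄ Korea .)"),
          (2.0, "(nations which has ⋄ with DPR Korea .)")]

def k_best(rule, states, k):
    """Enumerate k instantiations of `rule` best-first.

    A queue entry is either concrete (one state filled in) or a breadcrumb
    standing for states[i:], whose bound is the cost of states[i]."""
    tie = count()  # tiebreaker so the heap never compares the payloads
    queue = [(states[0][0], next(tie), 0, None)]  # breadcrumb over states[0:]
    out = []
    while queue and len(out) < k:
        bound, _, i, concrete = heapq.heappop(queue)
        if concrete is not None:
            out.append((bound, rule.replace("X", concrete)))
            continue
        # Zeroth child: make states[i] concrete.
        heapq.heappush(queue, (states[i][0], next(tie), i, states[i][1]))
        # Other children: a [1+] breadcrumb for states[i+1:].
        if i + 1 < len(states):
            heapq.heappush(queue, (states[i + 1][0], next(tie), i + 1, None))
    return out

results = k_best("is a X </s>", states, 2)
```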
Same as phrase-based, just concatenate on left and right.
Task: WMT 2011 German–English
Model: hierarchical
Decoder: Moses
[Plot: average model score versus CPU seconds/sentence, comparing Incremental, cube pruning, and additive cube pruning]
[Plot: uncased BLEU versus CPU seconds/sentence, comparing Incremental, cube pruning, and additive cube pruning]
[Bar chart: speed ratio across systems: Hiero zh-en, Hiero en-de, Hiero de-en, Hiero de-en cdec, Syntax en-de, Syntax de-en, Tree-to-tree fr-en cdec]
[Bar chart: speed ratio across the same systems, split by beam size: beam ≥ 20 versus beam < 20]
A series of coarse-to-fine estimates. Continually taste the dish and adjust.
Search limits what translation can do.
Long-distance models like gender and number are harder.
Open the black box.
Language models can produce intermediate scores.