SLIDE 1 A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit
Yin-Wen Chang¹ (joint work with Michael Collins¹,²)
¹Google, New York; ²Columbia University
July 31, 2017
SLIDE 2 Introduction
Background:
◮ Phrase-based decoding without further constraints is NP-hard
◮ Proof: reduction from the travelling salesman problem (TSP) [Knight (1999)]
◮ Hard distortion limit is commonly imposed in PBMT systems
Question:
◮ Is phrase-based decoding with a fixed distortion limit NP-hard?
SLIDE 3
Introduction
A related problem: bandwidth-limited TSP
[Figure: cities 1, 2, . . . , i, . . . , j, . . . ; the tour may move directly between cities i and j only if |i − j| ≤ d]
This work: a new decoding algorithm
◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side
◮ Run time: O(n · d! · l · h^{d+1})
n: source sentence length; d: distortion limit
SLIDE 4
Overview of the proposed decoding algorithm
Source: das(1) muss(2) unsere(3) sorge(4) gleichermaßen(5) sein(6)
Final derivation: (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
Start: π1 = ε, π2 = ε
◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side
SLIDE 5
Overview of the proposed decoding algorithm
π1 ← π1 · (1, 2, this must): π1 = (1, 2, this must), π2 = ε
SLIDE 6
Overview of the proposed decoding algorithm
π2 ← π2 · (3, 4, our concern): π1 = (1, 2, this must), π2 = (3, 4, our concern)
SLIDE 7
Overview of the proposed decoding algorithm
π1 ← π1 · (5, 5, also): π1 = (1, 2, this must)(5, 5, also), π2 = (3, 4, our concern)
SLIDE 8
Overview of the proposed decoding algorithm
π1 ← π1 · (6, 6, be) · π2: π1 = (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern), π2 = ε
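The tape updates in the walkthrough can be replayed with a short sketch. It is an illustration, not the authors' implementation. A phrase is represented as a hypothetical `(start, end, english)` tuple and each tape as a Python list of phrases. The helper names `replay_example` and `target_string` are made up for this sketch.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, english), 1-based inclusive source span.
Phrase = Tuple[int, int, str]
Tape = List[Phrase]


def target_string(tape: Tape) -> str:
    """Read off the English words of a tape in target order."""
    return " ".join(english for _, _, english in tape)


def replay_example() -> List[Tape]:
    """Replay the tape updates from the walkthrough; source phrases arrive left to right."""
    pi1: Tape = []  # π1 = ε
    pi2: Tape = []  # π2 = ε
    pi1 = pi1 + [(1, 2, "this must")]    # π1 ← π1 · (1, 2, this must)
    pi2 = pi2 + [(3, 4, "our concern")]  # π2 ← π2 · (3, 4, our concern)
    pi1 = pi1 + [(5, 5, "also")]         # π1 ← π1 · (5, 5, also)
    pi1 = pi1 + [(6, 6, "be")] + pi2     # π1 ← π1 · (6, 6, be) · π2
    pi2 = []                             # π2 has been absorbed into π1
    return [pi1, pi2]


if __name__ == "__main__":
    final_pi1, final_pi2 = replay_example()
    print(final_pi1)
    print(target_string(final_pi1))
```

The source side is consumed strictly in order (1–2, 3–4, 5, 6). The target order is fixed only when the tapes are joined at the end.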
SLIDE 9
Outline
◮ Introduction to the phrase-based decoding problem
◮ Target-side left-to-right: the usual decoding algorithm
◮ Source-side left-to-right: the proposed algorithm
◮ Time complexity of the proposed algorithm
◮ Conclusion and future work
SLIDE 10 Phrase-based decoding problem
[Figure: the phrases das muss | unsere sorge | gleichermaßen | sein are translated as this must | our concern | also | be, then reordered into "this must also be our concern"]
◮ Segment the German sentence into non-overlapping phrases
◮ Find an English translation for each German phrase
◮ Reorder the English phrases to get a better English sentence
Derivation: a complete translation together with its phrase mappings
Sub-derivation: a partial translation
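The three conditions above can be checked mechanically. This minimal sketch assumes that a derivation is a list of hypothetical `(start, end, english)` tuples in target order. It tests that the phrases cover every source position exactly once and reads off the English sentence. The function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, english), 1-based inclusive source span.
Phrase = Tuple[int, int, str]


def is_valid_derivation(phrases: List[Phrase], n: int) -> bool:
    """True if the phrases cover source positions 1..n exactly once, without overlap."""
    covered = [False] * (n + 1)  # index 0 is unused
    for start, end, _ in phrases:
        if not (1 <= start <= end <= n):
            return False
        for k in range(start, end + 1):
            if covered[k]:
                return False  # overlaps an earlier phrase
            covered[k] = True
    return all(covered[1:])


def translation(phrases: List[Phrase]) -> str:
    """The English sentence: phrase translations concatenated in derivation order."""
    return " ".join(english for _, _, english in phrases)
```

For the running example, `[(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]` is a valid derivation of the 6-word sentence. Dropping the last phrase leaves a sub-derivation.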
SLIDE 15 Score a derivation
Source: das(1) muss(2) unsere(3) sorge(4) gleichermaßen(5) sein(6)
Translation: this must also be our concern
◮ Phrase translation score: score(das muss, this must) + · · ·
◮ Language model score: score(<s> this must also be our concern </s>) = score(this | <s>) + score(must | this) + · · ·
◮ Reordering score: η · |2 + 1 − 5| + · · · (the term shown is for the jump from the end of "das muss" at position 2 to the start of "gleichermaßen" at position 5)
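The three score components add up as sketched below. The inputs are hypothetical: a phrase-score table keyed by the phrase tuple, a bigram table with a made-up fallback value for unseen bigrams, and a weight `eta`. The reordering term follows the slide's |end + 1 − start| form and is applied to consecutive phrases only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (start, end, english), 1-based inclusive source span.
Phrase = Tuple[int, int, str]


def score_derivation(
    phrases: List[Phrase],
    phrase_score: Dict[Phrase, float],
    bigram_score: Dict[Tuple[str, str], float],
    eta: float,
    unseen_bigram: float = -10.0,  # hypothetical fallback for bigrams missing from the table
) -> float:
    """Phrase score + bigram LM score + eta * distortion, summed over the derivation."""
    # Phrase translation score: one term per phrase used.
    total = sum(phrase_score[p] for p in phrases)

    # Language model score over "<s> w1 ... wm </s>", one term per bigram.
    words = ["<s>"] + " ".join(e for _, _, e in phrases).split() + ["</s>"]
    for prev, cur in zip(words, words[1:]):
        total += bigram_score.get((prev, cur), unseen_bigram)

    # Reordering score: eta * |end of previous phrase + 1 - start of next phrase|.
    for (_, prev_end, _), (next_start, _, _) in zip(phrases, phrases[1:]):
        total += eta * abs(prev_end + 1 - next_start)
    return total
```

Take toy values of −1.0 per phrase, −2.0 per bigram and η = −0.5. The example derivation then scores −4 − 14 − 0.5 · (2 + 0 + 4) = −21.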
SLIDE 19 Fixed distortion limit: distortion distance ≤ d
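Source: das(1) muss(2) unsere(3) sorge(4) gleichermaßen(5) sein(6)
Translation: this must also be our concern
◮ Distortion distance: |(end of previous phrase) + 1 − (start of next phrase)|; for "das muss" → "gleichermaßen": |2 + 1 − 5| = 2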
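A minimal check of the hard limit, assuming the same hypothetical `(start, end, english)` phrase tuples. It computes the distortion distance for every pair of consecutive phrases and requires each one to be at most `d`. Any additional constraint on where the first phrase may start is not modelled here.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, english), 1-based inclusive source span.
Phrase = Tuple[int, int, str]


def distortion_distances(phrases: List[Phrase]) -> List[int]:
    """|end of previous phrase + 1 - start of next phrase| for each consecutive pair."""
    return [abs(prev[1] + 1 - nxt[0]) for prev, nxt in zip(phrases, phrases[1:])]


def satisfies_distortion_limit(phrases: List[Phrase], d: int) -> bool:
    """True if every jump between consecutive phrases is within the limit d."""
    return all(dist <= d for dist in distortion_distances(phrases))
```

For the running example the distances are 2, 0 and 4. The jump back from "sein" (position 6) to "unsere sorge" (position 3) costs |6 + 1 − 3| = 4, so this derivation is allowed only when d ≥ 4.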
SLIDE 20
Target-side left-to-right: the usual decoding algorithm
Source: das(1) muss(2) unsere(3) sorge(4) gleichermaßen(5) sein(6)
The English translation is built from left to right, one phrase at a time:
◮ this must
◮ this must also
◮ this must also be
◮ this must also be our concern
SLIDE 25
Target-side left-to-right: dynamic programming algorithm
Source: das(1) muss(2) unsere(3) sorge(4) gleichermaßen(5) sein(6)
Sub-derivation: (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
DP state after each phrase (last target word, end position of the last phrase, bit-string of translated source words):
◮ this must → (must, 2, 110000)
◮ this must also → (also, 5, 110010)
◮ this must also be → (be, 6, 110011)
◮ this must also be our concern → (concern, 4, 111111)
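The target-side DP states listed above can be recomputed from the sub-derivation. This sketch assumes a bigram language model, so one word of target history suffices. It uses the same hypothetical `(start, end, english)` phrase tuples. A state is (last target word, end of last phrase, coverage bit-string).

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, english), 1-based inclusive source span.
Phrase = Tuple[int, int, str]
State = Tuple[str, int, str]  # (last target word, end of last phrase, coverage bits)


def target_side_states(phrases: List[Phrase], n: int) -> List[State]:
    """DP state after each phrase when the English string is built left to right."""
    covered = [0] * n
    states: List[State] = []
    for start, end, english in phrases:
        for k in range(start - 1, end):  # mark source words start..end (1-based) as translated
            covered[k] = 1
        last_word = english.split()[-1]  # history needed by a bigram language model
        states.append((last_word, end, "".join(map(str, covered))))
    return states
```

Applied to `[(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]` with n = 6, it reproduces the four states on the slide.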
SLIDE 30
Source-side left-to-right: the proposed algorithm
Source: das(1) muss(2) unsere(3) sorge(4) gleichermaßen(5) sein(6)
Source phrases are processed in order, and each translation is placed on a target-side tape:
◮ das muss → this must
◮ unsere sorge → our concern (on a separate tape)
◮ gleichermaßen → this must also
◮ sein → this must also be our concern (the two tapes are joined)
SLIDE 35
Source-side left-to-right: dynamic programming state
Source: das(1) muss(2) unsere(3) sorge(4) gleichermaßen(5) sein(6)
Sub-derivation: (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
Example: after j = 4 the tapes are π1 = (1, 2, this must) and π2 = (3, 4, our concern)
DP state after each source phrase (one signature σi per tape πi):
◮ j = 2: σ1 = (1, this, 2, must)
◮ j = 4: σ1 = (1, this, 2, must), σ2 = (3, our, 4, concern)
◮ j = 5: σ1 = (1, this, 5, also), σ2 = (3, our, 4, concern)
◮ j = 6: σ1 = (1, this, 4, concern)
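Each tape can be compressed to its signature σ = (s, ws, t, wt). Here s is the source start of the tape's first phrase and ws its first English word; t is the source end of its last phrase and wt its last English word. In this sketch the first word is kept on the assumption of a bigram LM, since it is needed to score the join when the tape is later concatenated or prepended to. The function names and tuple layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, english), 1-based inclusive source span.
Phrase = Tuple[int, int, str]
Signature = Tuple[int, str, int, str]  # σ = (s, ws, t, wt)


def signature(tape: List[Phrase]) -> Signature:
    """Keep only a tape's boundary information: first/last source index and target word."""
    first_start, _, first_english = tape[0]
    _, last_end, last_english = tape[-1]
    return (first_start, first_english.split()[0], last_end, last_english.split()[-1])


def source_side_state(j: int, tapes: List[List[Phrase]]) -> Tuple[int, List[Signature]]:
    """DP state (j, σ1, ..., σr) for a sub-derivation with non-empty tapes π1, ..., πr."""
    return (j, [signature(t) for t in tapes if t])
```

For j = 5 with π1 = (1, 2, this must)(5, 5, also) and π2 = (3, 4, our concern), this gives σ1 = (1, this, 5, also) and σ2 = (3, our, 4, concern), as on the slide.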
SLIDE 41 The number of DP states (fixed distortion limit d)
State: j, {σ1, σ2, . . . , σr}, where r is the number of “tapes”
◮ j ∈ {1, . . . , n}, where n is the source sentence length → O(n)
◮ σ = (s(σ), ws(σ), t(σ), wt(σ)), e.g. σ = (1, this, 5, also) → O(g(d) · h^{d+1})
◮ s, t: source word indices
◮ ws, wt: translated target words
[Figure: source positions 1, 2, 3, 4, . . . , j − d, . . . , j, j + 1, . . . , marking the translated source words, where the next phrase starts, and where s and t can occur]
◮ s(σ1) = 1
◮ s(σi) ∈ {j − d + 2, . . . , j} for all i ∈ {2, . . . , r}
◮ t(σi) ∈ {j − d, . . . , j} for all i ∈ {1, . . . , r}
◮ r is bounded by d + 1
Number of states: O(n · g(d) · h^{d+1})
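The constraints on this slide can be written as a membership test for a state (j, σ1, ..., σr). This is a sketch using hypothetical signature tuples `(s, ws, t, wt)`. It checks only the bounds listed here, not whether the state is actually reachable by some sub-derivation.

```python
from typing import List, Tuple

Signature = Tuple[int, str, int, str]  # σ = (s, ws, t, wt), hypothetical layout


def state_satisfies_bounds(j: int, sigmas: List[Signature], d: int) -> bool:
    """Check the slide's constraints on a DP state (j, σ1, ..., σr) for distortion limit d."""
    r = len(sigmas)
    if r == 0 or r > d + 1:  # at least one tape; r is bounded by d + 1
        return False
    if sigmas[0][0] != 1:  # s(σ1) = 1
        return False
    for s, _, _, _ in sigmas[1:]:  # s(σi) in {j - d + 2, ..., j} for i >= 2
        if not (j - d + 2 <= s <= j):
            return False
    for _, _, t, _ in sigmas:  # t(σi) in {j - d, ..., j} for every i
        if not (j - d <= t <= j):
            return False
    return True
```

All source-side states of the running example pass with d = 4. The j = 5 state fails with d = 3, because s(σ2) = 3 falls below j − d + 2 = 4. This is consistent with the distortion of 4 that the example derivation needs.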
SLIDE 56 Extend a sub-derivation by four operations
Current sub-derivation: j, π1, π2, . . . , πr. Consider a new phrase p starting at source position j + 1 → O(l) choices
◮ New segment: πr+1 = p
◮ Append: πi = πi, p
◮ Prepend: πi = p, πi
◮ Concatenate: πi = πi, p, πi′
→ O(r²) = O(d²) ways to apply them
Example (j = 4, π1 = (1, 2, this must), π2 = (3, 4, our concern), p = (5, 5, also)):
◮ New segment: π3 = (5, 5, also)
◮ Append: π1 = (1, 2, this must)(5, 5, also)
◮ Append: π2 = (3, 4, our concern)(5, 5, also)
◮ Prepend: π2 = (5, 5, also)(3, 4, our concern)
◮ Concatenate: π1 = (1, 2, this must)(5, 5, also)(3, 4, our concern)
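The four operations can be enumerated directly. This sketch assumes that π1 must keep starting at source position 1, as the state constraint s(σ1) = 1 suggests. Under that assumption p is never prepended to π1, and π1 never appears as the right-hand tape of a concatenation, which matches the five alternatives in the example. Distortion-limit checks are left out. The names `successors` and `Tapes` are hypothetical.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]           # (start, end, english), hypothetical layout
Tapes = Tuple[Tuple[Phrase, ...], ...]  # (π1, ..., πr), each tape a tuple of phrases


def successors(tapes: Tapes, p: Phrase) -> List[Tapes]:
    """Tape tuples reachable by adding phrase p with one of the four operations.

    Assumes π1 (index 0) must keep starting at source position 1; no distortion checks.
    """
    r = len(tapes)
    out: List[Tapes] = [tapes + ((p,),)]  # new segment: π_{r+1} = p
    for i in range(r):  # append: πi = πi, p
        out.append(tapes[:i] + (tapes[i] + (p,),) + tapes[i + 1:])
    for i in range(1, r):  # prepend: πi = p, πi (never to π1)
        out.append(tapes[:i] + ((p,) + tapes[i],) + tapes[i + 1:])
    for i in range(r):  # concatenate: πi = πi, p, πi' and drop πi'
        for i2 in range(1, r):
            if i2 == i:
                continue
            merged = list(tapes)
            merged[i] = tapes[i] + (p,) + tapes[i2]
            del merged[i2]
            out.append(tuple(merged))
    return out
```

With r tapes this yields 1 + r + (r − 1) + (r − 1)² successors, which is O(r²) = O(d²) since r ≤ d + 1. For the example (r = 2) that is exactly 5.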
SLIDE 66
Bound on running time: O(n · d! · l · h^{d+1})
# DP states: O(n · g(d) · h^{d+1}); # transitions per state: O(d² · l)
◮ n: source sentence length
◮ d: distortion limit
◮ l: bound on the number of phrases starting at any position
◮ h: bound on the maximum number of target translations for any source word
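Multiplying the two counts on this slide gives the total work. The stated bound O(n · d! · l · h^{d+1}) reads as folding the purely d-dependent factor d² · g(d) into d!. That reading is an inference from the slide, since g(d) is not spelled out here.

```latex
\underbrace{O\!\left(n \cdot g(d) \cdot h^{d+1}\right)}_{\#\text{ DP states}}
\times
\underbrace{O\!\left(d^{2} \cdot l\right)}_{\#\text{ transitions per state}}
= O\!\left(n \cdot d^{2} g(d) \cdot l \cdot h^{d+1}\right)
```

For a fixed d, every factor except n, l and h is a constant. This is why the algorithm is polynomial in the sentence length.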
SLIDE 67
Summary
Problem: Phrase-based decoding with a fixed distortion limit
◮ A new decoding algorithm with O(n · d! · l · h^{d+1}) running time
◮ Operates from left to right on the source side
◮ Maintains multiple “tapes” on the target side
SLIDE 68
Follow-up paper in EMNLP discussing experimental results
To appear in EMNLP 2017: “Source-side left-to-right or target-side left-to-right? An empirical comparison of two phrase-based decoding algorithms”
◮ Beam search with a trigram language model
◮ Constraints on the number of “tapes”
◮ Achieves efficiency and accuracy similar to Moses
SLIDE 69
Future work
Finite state transducer (FST) formulation
[Figure: example FST transition from state (j = 4; σ1 = (1, this, 2, must); σ2 = (3, our, 4, concern)) to state (j = 5; σ1 = (1, this, 5, also); σ2 = (3, our, 4, concern)), labeled "1 · (5, 5, also), 2"]
Neural machine translation
◮ An NMT system using this kind of approach?
◮ Replace the attention model by absorbing source words strictly left to right?