

slide-1
SLIDE 1

A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit

Yin-Wen Chang 1 (Joint work with Michael Collins 1,2)

1Google, New York 2Columbia University

July 31, 2017

slide-2
SLIDE 2

Introduction

Background:

◮ Phrase-based decoding without further constraints is NP-hard
◮ Proof: reduction from the travelling salesman problem (TSP) [Knight (1999)]
◮ Hard distortion limit is commonly imposed in PBMT systems

Question:

◮ Is phrase-based decoding with a fixed distortion limit NP-hard or not?
slide-3
SLIDE 3

Introduction

A related problem: bandwidth-limited TSP

[Figure: cities 1, 2, . . . , i, . . . , j on a line, with the constraint |i − j| ≤ d]

This work: a new decoding algorithm

◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side
◮ Run time: O(n · d! · l · h^{d+1})

n: source sentence length
d: distortion limit

slide-4
SLIDE 4

Overview of the proposed decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

π1 = ε
π2 = ε

◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side

slide-5
SLIDE 5

Overview of the proposed decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

π1 ← π1 · (1, 2, this must)

π1 = (1, 2, this must)
π2 = ε

◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side

slide-6
SLIDE 6

Overview of the proposed decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

π2 ← π2 · (3, 4, our concern)

π1 = (1, 2, this must)
π2 = (3, 4, our concern)

◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side

slide-7
SLIDE 7

Overview of the proposed decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

π1 ← π1 · (5, 5, also)

π1 = (1, 2, this must)(5, 5, also)
π2 = (3, 4, our concern)

◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side

slide-8
SLIDE 8

Overview of the proposed decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

π1 ← π1 · (6, 6, be) · π2

π1 = (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
π2 = ε

◮ Process the source words from left to right
◮ Maintain multiple “tapes” on the target side
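
The four tape updates in this overview can be replayed in a few lines. This is only a sketch of the running example, not the decoder itself: phrases are written as (s, t, e) tuples as on the slides, and the function names `target_string` and `run_overview_example` are hypothetical.

```python
def target_string(tape):
    """Join the target sides of the (s, t, e) phrases on one tape."""
    return " ".join(e for _, _, e in tape)


def run_overview_example():
    """Replay the example: source phrases are consumed strictly left to right."""
    pi1, pi2 = [], []                     # both tapes start empty (ε)
    pi1 = pi1 + [(1, 2, "this must")]     # π1 ← π1 · (1, 2, this must)
    pi2 = pi2 + [(3, 4, "our concern")]   # π2 ← π2 · (3, 4, our concern)
    pi1 = pi1 + [(5, 5, "also")]          # π1 ← π1 · (5, 5, also)
    pi1 = pi1 + [(6, 6, "be")] + pi2      # π1 ← π1 · (6, 6, be) · π2
    pi2 = []                              # π2 has been absorbed into π1
    return pi1, pi2


pi1, pi2 = run_overview_example()
print(target_string(pi1))
```

Note that the source spans are visited in the order 1–2, 3–4, 5, 6, while the final tape holds them in target order 1–2, 5, 6, 3–4.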

slide-9
SLIDE 9

Outline

◮ Introduction of the phrase-based decoding problem
◮ Target-side left-to-right: the usual decoding algorithm
◮ Source-side left-to-right: the proposed algorithm
◮ Time complexity of the proposed algorithm
◮ Conclusion and future work

slide-10
SLIDE 10

Phrase-based decoding problem

das muss unsere sorge gleichermaßen sein
this must our concern also be

Derivation: complete translation with phrase mappings
Sub-derivation: partial translation

slide-11
SLIDE 11

Phrase-based decoding problem

das muss unsere sorge gleichermaßen sein
this must our concern also be

◮ Segment the German sentence into non-overlapping phrases

Derivation: complete translation with phrase mappings
Sub-derivation: partial translation

slide-12
SLIDE 12

Phrase-based decoding problem

das muss unsere sorge gleichermaßen sein
this must our concern also be

◮ Segment the German sentence into non-overlapping phrases
◮ Find an English translation for each German phrase

Derivation: complete translation with phrase mappings
Sub-derivation: partial translation

slide-13
SLIDE 13

Phrase-based decoding problem

das muss unsere sorge gleichermaßen sein
this must our concern also be
this must also be our concern

◮ Segment the German sentence into non-overlapping phrases
◮ Find an English translation for each German phrase
◮ Reorder the English phrases to get a better English sentence

Derivation: complete translation with phrase mappings
Sub-derivation: partial translation
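
As a small illustration of the derivation / sub-derivation distinction, the sketch below represents a derivation as a list of (s, t, e) phrases in target order and checks that every source word is covered exactly once. The helper names `is_derivation` and `translation` are assumptions made for this example.

```python
def is_derivation(phrases, n):
    """True if the (s, t, e) phrases cover source words 1..n exactly once."""
    covered = []
    for s, t, _ in phrases:
        if not (1 <= s <= t <= n):
            return False
        covered.extend(range(s, t + 1))
    return sorted(covered) == list(range(1, n + 1))


def translation(phrases):
    """Target string obtained by reading the phrases in order."""
    return " ".join(e for _, _, e in phrases)


monotone = [(1, 2, "this must"), (3, 4, "our concern"), (5, 5, "also"), (6, 6, "be")]
reordered = [(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]
print(translation(monotone))
print(translation(reordered))
```

A proper prefix such as `reordered[:2]` fails the check, which is what makes it a sub-derivation rather than a derivation.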

slide-15
SLIDE 15

Score a derivation

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein
this must also be our concern

slide-16
SLIDE 16

Score a derivation

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein
this must also be our concern

◮ Phrase translation score: score(das muss, this must) + · · ·

slide-17
SLIDE 17

Score a derivation

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein
this must also be our concern

◮ Phrase translation score: score(das muss, this must) + · · ·
◮ Language model score:
  score(<s> this must also be our concern </s>)
    = score(this | <s>) + score(must | this) + · · ·

slide-18
SLIDE 18

Score a derivation

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein
this must also be our concern

◮ Phrase translation score: score(das muss, this must) + · · ·
◮ Language model score:
  score(<s> this must also be our concern </s>)
    = score(this | <s>) + score(must | this) + · · ·

◮ Reordering score: η · |2 + 1 − 5|
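
The three score components can be combined as in the sketch below. It assumes a bigram language model, as in the example above, and assumes that the first phrase is compared against source position 1. The scoring functions passed in, and the constant values used in the usage line, are placeholders rather than real model scores.

```python
SOURCE = "das muss unsere sorge gleichermaßen sein".split()
DERIVATION = [(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]


def score_derivation(phrases, source, phrase_score, bigram_score, eta):
    """Sum of phrase translation, bigram language model, and reordering scores."""
    total = 0.0
    prev_end = 0                      # assumption: first phrase is measured from position 1
    words = ["<s>"]
    for s, t, e in phrases:
        src = " ".join(source[s - 1:t])
        total += phrase_score(src, e)             # e.g. score(das muss, this must)
        total += eta * abs(prev_end + 1 - s)      # e.g. η · |2 + 1 − 5|
        prev_end = t
        words.extend(e.split())
    words.append("</s>")
    for prev, cur in zip(words, words[1:]):
        total += bigram_score(cur, prev)          # e.g. score(must | this)
    return total


# Placeholder scores: every phrase pair scores -1.0, every bigram -0.5, η = -1.0.
total = score_derivation(DERIVATION, SOURCE,
                         lambda f, e: -1.0, lambda w, prev: -0.5, eta=-1.0)
print(total)
```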

slide-19
SLIDE 19

Fixed distortion limit: distortion distance ≤ d

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein
this must also be our concern

◮ Distortion distance: |2 + 1 − 5| = 2
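
A minimal sketch of the distortion check, assuming the distance between consecutive phrases is |t(previous) + 1 − s(next)| as in the example above. The function names are hypothetical.

```python
def distortion_distances(phrases):
    """|t(prev) + 1 - s(next)| for each pair of consecutive (s, t, e) phrases."""
    return [abs(t_prev + 1 - s_next)
            for (_, t_prev, _), (s_next, _, _) in zip(phrases, phrases[1:])]


def within_limit(phrases, d):
    """True if every distortion distance is at most the limit d."""
    return all(dist <= d for dist in distortion_distances(phrases))


derivation = [(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]
print(distortion_distances(derivation))
```

For this derivation the jump from "this must" to "also" has distance 2, and the jump back from "be" to "our concern" has distance |6 + 1 − 3| = 4, so the derivation is admissible only for d ≥ 4.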

slide-20
SLIDE 20

Target-side left-to-right: the usual decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must

slide-21
SLIDE 21

Target-side left-to-right: the usual decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must

slide-22
SLIDE 22

Target-side left-to-right: the usual decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also

slide-23
SLIDE 23

Target-side left-to-right: the usual decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also be

slide-24
SLIDE 24

Target-side left-to-right: the usual decoding algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also be our concern

slide-25
SLIDE 25

Target-side left-to-right: dynamic programming algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must

Sub-derivation: (1, 2, this must)
DP state:

slide-26
SLIDE 26

Target-side left-to-right: dynamic programming algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must

Sub-derivation: (1, 2, this must)
DP state: (must, 2, 110000)

slide-27
SLIDE 27

Target-side left-to-right: dynamic programming algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also

Sub-derivation: (1, 2, this must)(5, 5, also)
DP state: (also, 5, 110010)

slide-28
SLIDE 28

Target-side left-to-right: dynamic programming algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also be

Sub-derivation: (1, 2, this must)(5, 5, also)(6, 6, be)
DP state: (be, 6, 110011)

slide-29
SLIDE 29

Target-side left-to-right: dynamic programming algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also be our concern

Sub-derivation: (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
DP state: (concern, 4, 111111)
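
The target-side DP states shown on these slides can be computed from a sub-derivation as sketched below. The sketch assumes a bigram language model, so only the last target word is kept, and the function name `target_side_state` is hypothetical.

```python
def target_side_state(sub_derivation, n):
    """(last target word, end position of last phrase, coverage bit-string)."""
    covered = ["0"] * n
    for s, t, _ in sub_derivation:
        for i in range(s, t + 1):
            covered[i - 1] = "1"      # bit i is 1 once source word i is translated
    _, t_last, e_last = sub_derivation[-1]
    return (e_last.split()[-1], t_last, "".join(covered))


derivation = [(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]
for k in range(1, len(derivation) + 1):
    print(target_side_state(derivation[:k], 6))
```

The coverage bit-string is the component that can take exponentially many values when no distortion limit is imposed.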

slide-30
SLIDE 30

Source-side left-to-right: the proposed algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must

slide-31
SLIDE 31

Source-side left-to-right: the proposed algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must

slide-32
SLIDE 32

Source-side left-to-right: the proposed algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must
our concern

slide-33
SLIDE 33

Source-side left-to-right: the proposed algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also
our concern

slide-34
SLIDE 34

Source-side left-to-right: the proposed algorithm

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also be our concern

slide-35
SLIDE 35

Source-side left-to-right: dynamic programming state

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

Sub-derivation:
π1 = (1, 2, this must)
π2 = (3, 4, our concern)
DP state: j = 4, σ1 = (1, this, 2, must), σ2 = (3, our, 4, concern)

slide-36
SLIDE 36

Source-side left-to-right: dynamic programming state

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must

Sub-derivation:
π1 = (1, 2, this must)
DP state: j = 2, σ1 = (1, this, 2, must)

slide-38
SLIDE 38

Source-side left-to-right: dynamic programming state

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must
our concern

Sub-derivation:
π1 = (1, 2, this must)
π2 = (3, 4, our concern)
DP state: j = 4, σ1 = (1, this, 2, must), σ2 = (3, our, 4, concern)

slide-39
SLIDE 39

Source-side left-to-right: dynamic programming state

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also
our concern

Sub-derivation:
π1 = (1, 2, this must)(5, 5, also)
π2 = (3, 4, our concern)
DP state: j = 5, σ1 = (1, this, 5, also), σ2 = (3, our, 4, concern)

slide-40
SLIDE 40

Source-side left-to-right: dynamic programming state

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

this must also be our concern

Sub-derivation:
π1 = (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
DP state: j = 6, σ1 = (1, this, 4, concern)
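
The source-side DP state keeps, for each tape, only its two ends. The sketch below derives the state from a list of tapes, again assuming a bigram language model so that one boundary word per end suffices. The names `signature` and `source_side_state` are hypothetical.

```python
def signature(tape):
    """σ = (s, w_s, t, w_t): start index and first word, end index and last word."""
    s, _, e_first = tape[0]
    _, t, e_last = tape[-1]
    return (s, e_first.split()[0], t, e_last.split()[-1])


def source_side_state(tapes):
    """(j, [σ1, ..., σr]) where j is the rightmost source position processed."""
    j = max(t for tape in tapes for _, t, _ in tape)
    return (j, [signature(tape) for tape in tapes])


tapes = [[(1, 2, "this must"), (5, 5, "also")], [(3, 4, "our concern")]]
print(source_side_state(tapes))
```

Because the source is consumed left to right, j alone records which source words are covered; no coverage bit-string is needed.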

slide-41
SLIDE 41

The number of DP states (fixed distortion limit d)

State:

  • j, {σ1, σ2 . . . σr}
  • r: number of “tapes”

◮ j ∈ {1, . . . , n}

n: source sentence length → O(n)

slide-43
SLIDE 43

The number of DP states (fixed distortion limit d)

State:

  • j, {σ1, σ2 . . . σr}
  • r: number of “tapes”

◮ j ∈ {1, . . . , n}

n: source sentence length → O(n)

◮ σ = (s, w_s, t, w_t)

Ex: σ = (1, this, 5, also)
σ = (s(σ), w_s(σ), t(σ), w_t(σ)) → O(g(d) · h^{d+1})

slide-44
SLIDE 44

The number of DP states (fixed distortion limit d)

State:

  • j, {σ1, σ2 . . . σr}
  • r: number of “tapes”

◮ j ∈ {1, . . . , n}

n: source sentence length → O(n)

◮ σ = (s, w_s, t, w_t)

Ex: σ = (1, this, 5, also)
σ = (s(σ), w_s(σ), t(σ), w_t(σ)) → O(g(d) · h^{d+1})

◮ s, t: source word indices

slide-45
SLIDE 45

The number of DP states (fixed distortion limit d)

State:

  • j, {σ1, σ2 . . . σr}
  • r: number of “tapes”

◮ j ∈ {1, . . . , n}

n: source sentence length → O(n)

◮ σ = (s, w_s, t, w_t)

Ex: σ = (1, this, 5, also)
σ = (s(σ), w_s(σ), t(σ), w_t(σ)) → O(g(d) · h^{d+1})

◮ s, t: source word indices
◮ w_s, w_t: translated target words

slide-46
SLIDE 46

The number of DP states (fixed distortion limit d)

State:

  • j, {σ1, σ2 . . . σr}
  • r: number of “tapes”

◮ j ∈ {1, . . . , n}

n: source sentence length → O(n)

◮ σ = (s, w_s, t, w_t)

Ex: σ = (1, this, 5, also)
σ = (s(σ), w_s(σ), t(σ), w_t(σ)) → O(g(d) · h^{d+1})

[Figure: source positions 1 2 3 4 . . . j − d . . . j j + 1 . . . with the labels “Translated source words”, “Next phrase starts at”, and “s, t can only occur here”]

Number of states: O(n · g(d) · h^{d+1})

slide-52
SLIDE 52

The number of DP states (fixed distortion limit d)

State:

  • j, {σ1, σ2 . . . σr}
  • r: number of “tapes”

◮ j ∈ {1, . . . , n}

n: source sentence length → O(n)

◮ σ = (s, w_s, t, w_t)

Ex: σ = (1, this, 5, also)
σ = (s(σ), w_s(σ), t(σ), w_t(σ)) → O(g(d) · h^{d+1})

[Figure: source positions 1 2 3 4 . . . j − d . . . j j + 1 . . . with the labels “Translated source words”, “Next phrase starts at”, and “s, t can only occur here”]

◮ s(σ1) = 1
◮ s(σi) ∈ {j − d + 2, . . . , j}  ∀i ∈ {2, . . . , r}
◮ t(σi) ∈ {j − d, . . . , j}  ∀i ∈ {1, . . . , r}

Number of states: O(n · g(d) · h^{d+1})

slide-53
SLIDE 53

The number of DP states (fixed distortion limit d)

State:

  • j, {σ1, σ2 . . . σr}
  • r: number of “tapes”

◮ j ∈ {1, . . . , n}

n: source sentence length → O(n)

◮ σ = (s, w_s, t, w_t)

Ex: σ = (1, this, 5, also)
σ = (s(σ), w_s(σ), t(σ), w_t(σ)) → O(g(d) · h^{d+1})

[Figure: source positions 1 2 3 4 . . . j − d . . . j j + 1 . . . with the labels “Translated source words”, “Next phrase starts at”, and “s, t can only occur here”]

◮ s(σ1) = 1
◮ s(σi) ∈ {j − d + 2, . . . , j}  ∀i ∈ {2, . . . , r}
◮ t(σi) ∈ {j − d, . . . , j}  ∀i ∈ {1, . . . , r}
◮ r is bounded by d + 1.

Number of states: O(n · g(d) · h^{d+1})
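
The constraints listed above can be read as a validity test on a candidate state. The sketch below applies them literally; it is an illustration of the bullet points, not the authors' implementation, and the function name `satisfies_constraints` is hypothetical.

```python
def satisfies_constraints(j, sigmas, d):
    """Check the slide's constraints on a state (j, [σ1, ..., σr]).

    Each σ is a tuple (s, w_s, t, w_t); d is the distortion limit.
    """
    if len(sigmas) > d + 1:                      # r is bounded by d + 1
        return False
    if sigmas and sigmas[0][0] != 1:             # s(σ1) = 1
        return False
    for s, _, _, _ in sigmas[1:]:                # s(σi) ∈ {j − d + 2, ..., j} for i ≥ 2
        if not (j - d + 2 <= s <= j):
            return False
    # t(σi) ∈ {j − d, ..., j} for every tape
    return all(j - d <= t <= j for _, _, t, _ in sigmas)


state = [(1, "this", 2, "must"), (3, "our", 4, "concern")]
print(satisfies_constraints(4, state, d=4))
```

Since every index other than s(σ1) lies within a window of width about d below j, the number of index combinations depends only on d, which is where the g(d) factor in the state count comes from.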

slide-56
SLIDE 56

Extend a sub-derivation by four operations

Current sub-derivation: j, π1, π2, . . . , πr
Consider a new phrase p starting at source position j + 1 → O(l)

◮ New segment: πr+1 ← p
◮ Append: πi ← πi · p
◮ Prepend: πi ← p · πi
◮ Concatenate: πi ← πi · p · πi′
→ O(r^2) = O(d^2)

1   2    3      4     5             6
das muss unsere sorge gleichermaßen sein

Example with p = (5, 5, also), starting from π1 = (1, 2, this must) and π2 = (3, 4, our concern):
New segment: π3 = (5, 5, also)
Append: π1 = (1, 2, this must)(5, 5, also), or π2 = (3, 4, our concern)(5, 5, also)
Prepend: π2 = (5, 5, also)(3, 4, our concern)
Concatenate: π1 = (1, 2, this must)(5, 5, also)(3, 4, our concern)
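
The four operations can be enumerated as below. This sketch generates every candidate without filtering: a real decoder would also discard candidates that violate the distortion limit or the requirement that π1 starts at source position 1, and would score each one. The function name `extend` and the operation labels are hypothetical.

```python
def extend(tapes, p):
    """All tape configurations reachable by adding phrase p with one operation."""
    r = len(tapes)
    results = [("new", tapes + [[p]])]                  # π_{r+1} ← p
    for i in range(r):
        before, after = tapes[:i], tapes[i + 1:]
        results.append(("append", before + [tapes[i] + [p]] + after))    # πi ← πi · p
        results.append(("prepend", before + [[p] + tapes[i]] + after))   # πi ← p · πi
        for k in range(r):
            if k == i:
                continue
            merged = tapes[i] + [p] + tapes[k]          # πi ← πi · p · πi′
            rest = [merged if m == i else tape
                    for m, tape in enumerate(tapes) if m != k]
            results.append(("concatenate", rest))
    return results


tapes = [[(1, 2, "this must")], [(3, 4, "our concern")]]
p = (5, 5, "also")
for name, new_tapes in extend(tapes, p):
    print(name, new_tapes)
```

With r tapes there are 1 + 2r + r(r − 1) candidates per phrase, which matches the O(r^2) = O(d^2) count on the slide.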

slide-66
SLIDE 66

Bound on running time O(n · d! · l · h^{d+1})

# DP states: O(n · g(d) · h^{d+1})
# transitions: O(d^2 · l)

◮ n: source sentence length
◮ d: distortion limit
◮ l: bound on the number of phrases starting at any position
◮ h: bound on the maximum number of target translations for any source word
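
To get a feel for how the bound behaves, the expression inside the O(·) can be evaluated for some parameter values. The numbers below are arbitrary illustrations, and since big-O notation hides constant factors, the result is not an operation count.

```python
import math


def bound_value(n, d, l, h):
    """Value of the expression n · d! · l · h^(d+1) from the running-time bound."""
    return n * math.factorial(d) * l * h ** (d + 1)


# Doubling n doubles the value: for fixed d, l, h the bound is linear in n.
print(bound_value(6, 2, 3, 2), bound_value(12, 2, 3, 2))
# Raising d by one multiplies the value by (d + 1) · h.
print(bound_value(6, 3, 3, 2))
```

The growth in d is what the "fixed distortion limit" in the title refers to: the bound is polynomial in n only when d is treated as a constant.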

slide-67
SLIDE 67

Summary

Problem: Phrase-based decoding with a fixed distortion limit

◮ A new decoding algorithm with O(n · d! · l · h^{d+1}) time
◮ Operate from left to right on the source side
◮ Maintain multiple “tapes” on the target side

slide-68
SLIDE 68

Follow-up paper in EMNLP discussing experimental results

To appear in EMNLP 2017: “Source-side left-to-right or target-side left-to-right? An empirical comparison of two phrase-based decoding algorithms”

◮ Beam search with a trigram language model
◮ Constraints on the number of “tapes”
◮ Achieves efficiency and accuracy similar to Moses

slide-69
SLIDE 69

Future work

Finite state transducer (FST) formulation

[Figure: FST transition from the state (j = 4, σ1 = (1, this, 2, must), σ2 = (3, our, 4, concern)) to the state (j = 5, σ1 = (1, this, 5, also), σ2 = (3, our, 4, concern)), with arc label “1 · (5, 5, also), 2”]

Neural machine translation

◮ An NMT system using this kind of approach?
◮ Replace the attention model by absorbing source words strictly left-to-right?