

slide-1
SLIDE 1

PhD Thesis: Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation

Arianna Bisazza

Advisor: Marcello Federico
Fondazione Bruno Kessler / Università di Trento

slide-2
SLIDE 2

2

PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Arianna Bisazza – PhD Thesis – 19 April 2013

slide-3
SLIDE 3

3

PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Freedom of movement must be encouraged

[Diagram labels: LM scores, TM scores, ReoM scores]

slide-4
SLIDE 4

4

PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Freedom of movement must be encouraged while ensuring that career paths

[Diagram labels: LM scores, TM scores, ReoM scores]

slide-5
SLIDE 5

5

PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Freedom of movement must be encouraged while ensuring that career paths

[Diagram labels: LM scores, TM scores, ReoM scores]

slide-6
SLIDE 6

6

Reordering Models

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Many solutions have been proposed, with different reordering classes, features, training modes, etc.:

Tillmann 04, Zens & Ney 06, Al-Onaizan & Papineni 06, Galley & Manning 08, Green & al. 10, Feng & al. 10, …

slide-7
SLIDE 7

7

Reordering Models

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Many solutions have been proposed, with different reordering classes, features, training modes, etc.:

Tillmann 04, Zens & Ney 06, Al-Onaizan & Papineni 06, Galley & Manning 08, Green & al. 10, Feng & al. 10, …

No matter what reordering model is used, the permutation search space must be limited!
➞ The power of all reordering models is bound to the reordering constraints in use

slide-8
SLIDE 8

8

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

ReoM scores


slide-9
SLIDE 9

9

Reordering Constraints

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

#perm = |w|! = 11! ≈ 40,000,000

slide-10
SLIDE 10

10

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Source-to-Source distortion

#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y-x-1|

       w0  w1  w2  w3  w4  w5  w6  w7  w8  w9 w10
<s>     0   1   2   3   4   5   6   7   8   9  10
w0      -   0   1   2   3   4   5   6   7   8   9
w1      2   -   0   1   2   3   4   5   6   7   8
w2      3   2   -   0   1   2   3   4   5   6   7
w3      4   3   2   -   0   1   2   3   4   5   6
w4      5   4   3   2   -   0   1   2   3   4   5
w5      6   5   4   3   2   -   0   1   2   3   4
w6      7   6   5   4   3   2   -   0   1   2   3
w7      8   7   6   5   4   3   2   -   0   1   2
w8      9   8   7   6   5   4   3   2   -   0   1
w9     10   9   8   7   6   5   4   3   2   -   0
w10    11  10   9   8   7   6   5   4   3   2   -

Reordering Constraints


slide-11
SLIDE 11

11

Source-to-Source distortion

#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y-x-1|
DL=3 ➞ #perm ≈ 7,000

DL: distortion limit
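The numbers on this slide can be explored with a small sketch. It assumes the simplest reading of the constraint, namely that every single jump must satisfy D(wx,wy) ≤ DL, with <s> treated as position -1. The function names are illustrative, and a real decoder may apply the limit differently, so the counts are not expected to match the slide's ≈7,000 exactly.

```python
from itertools import permutations
from math import factorial


def distortion(x: int, y: int) -> int:
    """Source-to-source distortion D(w_x, w_y) = |y - x - 1|."""
    return abs(y - x - 1)


def count_permutations(n: int, dl: int) -> int:
    """Count the orderings of n source positions in which no jump exceeds dl.

    <s> is treated as position -1, so starting at w0 costs 0.
    Brute force over all n! orderings: only practical for small n.
    """
    count = 0
    for perm in permutations(range(n)):
        prev = -1
        for pos in perm:
            if distortion(prev, pos) > dl:
                break
            prev = pos
        else:
            count += 1  # every jump stayed within the limit
    return count


print(factorial(11))             # 39916800, the "≈40,000,000" for 11 words
print(count_permutations(6, 0))  # 1: only the monotone order survives
print(count_permutations(6, 6))  # 720 = 6!: the limit no longer excludes anything
```

Raising `dl` between these two extremes shows how quickly the search space grows with the distortion limit.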

Reordering Constraints


E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

slide-12
SLIDE 12

12

The problem with DL…

Arabic-English

[Figure: two Arabic (AR) / English (EN) example sentence pairs and the source-to-source distortion matrix]

slide-13
SLIDE 13

13

The problem with DL…

German-English

[Figure: two German (DE) / English (EN) example sentence pairs and the source-to-source distortion matrix]

slide-14
SLIDE 14

14

Source-to-Source distortion

#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y-x-1|
DL=3 ➞ #perm ≈ 7,000

Current solution: increasing the distortion limit!

slide-15
SLIDE 15

15

Source-to-Source distortion

#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y-x-1|
DL=3 ➞ #perm ≈ 7,000
DL=7 ➞ #perm ≈ 7,000,000

Current solution: increasing the distortion limit!

Coarse reordering space definition:
➞ slower decoding
➞ worse translations

slide-16
SLIDE 16

16

Observations

  • Word reordering is difficult!
  • The existing word reordering models are not perfect, but they are expected to guide search over huge search spaces
  • One way to go: design a perfect model
    • problem: many have already tried and failed
  • Our way: simplify the task for the existing reordering models

slide-17
SLIDE 17

17

Working hypotheses

  • A better definition of the reordering search space (i.e. constraints) can simplify the task of the reordering model
  • (Shallow) linguistic knowledge can help us to refine the reordering search space for a given language pair

slide-18
SLIDE 18

18

Outline

  • The problem
  • The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
  • Comparative evaluation & conclusions


slide-19
SLIDE 19

19

Outline

  • The problem
  • The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
  • Comparative evaluation & conclusions

Bisazza and Federico, Chunk-based Verb Reordering in VSO Sentences for Arabic-English, WMT 2010
Bisazza, Pighin, Federico, Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation, MT Journal 2012

slide-20
SLIDE 20

20

Source-to-Source distortion

#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y-x-1|
DL=3 ➞ #perm ≈ 7,000
DL=7 ➞ #perm ≈ 7,000,000

Idea: keep a low distortion limit and modify the input to allow only specific long reorderings

slide-21
SLIDE 21

21

Reordering patterns in Arabic-English

Example of VSO sentences: the Arabic verb comes earlier than in the English order

Typical PSMT outputs:

*The Moroccan monarch King Mohamed VI __ his support to…
*He renewed the Moroccan monarch King Mohamed VI his support to…

slide-22
SLIDE 22

22

Working hypothesis

Uneven distribution of long and short-range word movements:

  • few long: verb-subject-object sentences
    ➞ We try to model them explicitly!
  • many short: adjective-noun, head-initial genitive constructions (idafa)
    ➞ We assume they are well handled in standard PSMT

slide-23
SLIDE 23

23

Chunk-based fuzzy reordering rules

Shallow syntax chunking:

  • cheaper and easier than deep parsing
  • constrains reorderings in a softer way

Fuzzy (non-deterministic) reordering rules:

  • generate N permutations for each matching sequence
  • final reordering decision is taken during translation, guided by all SMT models (ReoM, LM...)

Few rules per language pair, to capture only long reorderings

slide-24
SLIDE 24

24

Chunk-based fuzzy reordering rules

  • Move verb chunk ahead by 1 to N chunks
  • Move verb chunk and following chunk ahead by 1 to N chunks

[Diagram: chunk sequences of the form … CH(*) CH(V) CH(*) CH(*) … illustrating the two rules]

slide-25
SLIDE 25

25

Chunk-based verb reordering in parallel data

The optimal reordering is the one that minimizes total distortion

slide-26
SLIDE 26

26

Chunk-based verb reordering in test data

[Lattice figure. Legend: move verb chunk; move verb chunk and following chunk; verb chunk; other chunks]

slide-27
SLIDE 27

27

Experiments

  • Task: NIST MT09 (news translation)
  • Systems based on Moses, include lexicalized phrase reordering models [Tillmann 04; Koehn & al 05]
  • Non-monotonic lattice decoding [Dyer & al 08]
  • Evaluation by
    • BLEU [Papineni & al 01] for lexical match & local order
    • KRS [Birch & al 10] for global order
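To give an intuition of what a "global order" metric measures, here is a minimal sketch of a Kendall's-tau-based permutation score. It assumes the square-root-normalised Kendall's tau distance and leaves out everything else the actual KRS of [Birch & al 10] may involve (for instance word alignment handling and a brevity penalty), so it should be read as an illustration only. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_score(permutation: list[int]) -> float:
    """Score in [0, 1] for how well a permutation preserves the reference order.

    permutation[i] is the reference position of the i-th output word.
    1.0 means identical order, 0.0 means fully inverted order.
    """
    n = len(permutation)
    if n < 2:
        return 1.0
    # A pair is discordant when its two elements appear in swapped order.
    discordant = sum(
        1 for i, j in combinations(range(n), 2) if permutation[i] > permutation[j]
    )
    total_pairs = n * (n - 1) // 2
    return 1.0 - sqrt(discordant / total_pairs)


print(kendall_score([0, 1, 2, 3]))  # monotone order
print(kendall_score([1, 0, 2, 3]))  # one local swap
print(kendall_score([3, 2, 1, 0]))  # fully inverted
```

Because every word pair counts, a single long-range displacement (such as a misplaced verb) affects many pairs at once, which is why a score of this kind is sensitive to global order.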

slide-28
SLIDE 28

28

Arabic-English:

Translation Quality

Test set: eval09-nw
Lattices always used with pre-ordered training
Oracle: test pre-ordered looking at reference
(more details on lattice pruning in the thesis)

+0.5 BLEU +0.4 KRS

slide-29
SLIDE 29

29

Arabic-English:

Translation Quality
Translation Time

Test set: eval09-nw
Lattices always used with pre-ordered training
Oracle: test pre-ordered looking at reference
(more details on lattice pruning in the thesis)

  • 0.1 BLEU
  • 0.3 KRS

Pruning Decoding

slide-30
SLIDE 30

30

Lessons learned

  • limiting long reordering to a few chunks only
  • use lattice to represent extra reordering
  • decoding slowdown

Can we do better?

Observation: lattice topology basically distorts word-to-word distances, i.e. during decoding some distant positions become closer

Can we achieve the same effect more directly?

slide-31
SLIDE 31

31

Outline

  • The problem
  • The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
  • Comparative evaluation & conclusions


Bisazza and Federico, Modified Distortion Matrices for Phrase-Based Statistical Machine Translation, ACL 2012

slide-32
SLIDE 32

32

Source-to-Source distortion

#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y-x-1|
DL=3 ➞ #perm ≈ 7,000
DL=7 ➞ #perm ≈ 7,000,000

slide-33
SLIDE 33

33

Idea: modify the distortion matrix for each test sentence!

Source-to-Source distortion

#perm = |w|! ≈ 40,000,000
D(wx,wy) = |y-x-1|
DL=3 ➞ #perm ≈ 7,000
DL=7 ➞ #perm ≈ 7,000,000
DL=3 & modif(D) ➞ #perm ≈ 20,000

Modified entries in the example matrix: D(w1,w7) = D(w1,w8) = 0; D(w2,w7) = D(w2,w8) = 0; D(w5,w10) = 0; D(w9,w3) = D(w9,w4) = 2. All other entries keep their standard value |y-x-1|.

Refined reordering search space

slide-34
SLIDE 34

34

Arabic-English

“Move verb chunk (and following chunk) to the right by 1 to N chunks”

Chunk-based fuzzy reordering rules

CC1 VC2 PC3 NC4 PC5 Pct6

w‐ $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .

and took part in the march dozens of militants from the Brigades
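The rule quoted on this slide can be sketched as a small permutation generator. This is only an illustration under stated assumptions: the function name is hypothetical, the punctuation chunk is assumed to stay outside the matching sequence, and N is passed in as `max_shift`.

```python
def verb_reorderings(chunks: list[str], verb_index: int, max_shift: int) -> list[list[str]]:
    """Move the verb chunk, or the verb chunk plus the following chunk,
    to the right by 1..max_shift chunks and return every resulting sequence."""
    results = []
    for block_len in (1, 2):  # verb alone, then verb + following chunk
        block = chunks[verb_index:verb_index + block_len]
        if len(block) < block_len:
            continue  # no following chunk available
        rest = chunks[:verb_index] + chunks[verb_index + block_len:]
        for shift in range(1, max_shift + 1):
            insert_at = verb_index + shift
            if insert_at > len(rest):
                break  # cannot move past the end of the sequence
            results.append(rest[:insert_at] + block + rest[insert_at:])
    return results


# Chunk sequence of the example sentence, without the final punctuation chunk.
for sequence in verb_reorderings(["CC1", "VC2", "PC3", "NC4", "PC5"], 1, 3):
    print(" ".join(sequence))
```

For this example the sketch yields five permutations, three from moving the verb chunk alone and two from moving it together with the following chunk. The final choice among them is left to the decoder, as the "fuzzy" rules intend.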


slide-35
SLIDE 35

35

Arabic-English

Chunk-based fuzzy reordering rules

“Move verb chunk (and following chunk) to the right by 1 to N chunks”

[Diagram: the chunk sequence CC1 VC2 PC3 NC4 PC5 Pct6 and three permutations generated by the rule]

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .

and took part in the march dozens of militants from the Brigades

slide-36
SLIDE 36

36

Arabic-English

Chunk-based fuzzy reordering rules

“Move verb chunk (and following chunk) to the right by 1 to N chunks”

[Diagram: the chunk sequence CC1 VC2 PC3 NC4 PC5 Pct6 and five permutations generated by the rule]

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .

and took part in the march dozens of militants from the Brigades

slide-37
SLIDE 37

37

Chunk-based fuzzy reordering rules

[Diagram: the chunk sequence CC1 VC2 PC3 NC4 PC5 Pct6 and five permutations generated by the rule]

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .

and took part in the march dozens of militants from the Brigades

Reordering selection

Reordered source LM scores: 0.9 0.4 0.1 0.1 0.7

slide-38
SLIDE 38

38

Chunk-based fuzzy reordering rules

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .

and took part in the march dozens of militants from the Brigades

Reordering selection

Reordered source LM scores, sorted: 0.9 0.7 0.4 0.1 0.1

Reorderings to include in the distortion matrix:

[Diagram: the selected chunk permutations]
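The selection step amounts to ranking the candidate permutations by their reordered source LM score and keeping the best few. A minimal sketch follows; the candidate labels are placeholders (the transcript does not say which score belongs to which permutation), and the cut-off of 3 is taken from the "3-best reorderings per rule-matching sequence" setting mentioned in the experiments.

```python
def select_n_best(candidates: list[tuple[str, float]], n: int = 3) -> list[str]:
    """Return the labels of the n candidates with the highest score."""
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:n]]


# Scores as listed on the slide; labels a..e are hypothetical stand-ins.
scored = [("a", 0.9), ("b", 0.4), ("c", 0.1), ("d", 0.1), ("e", 0.7)]
print(select_n_best(scored))
```

Only the selected permutations are then encoded in the distortion matrix; the low-scoring ones never reach the decoder.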

slide-39
SLIDE 39

39

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

            w0  w1  w2  w3  w4  w5  w6  w7  w8
      <s>    0   1   2   3   4   5   6   7   8
CC1   w0     -   0   1   2   3   4   5   6   7
VC2   w1     2   -   0   1   2   3   4   5   6
PC3   w2     3   2   -   0   1   2   3   4   5
      w3     4   3   2   -   0   1   2   3   4
NC4   w4     5   4   3   2   -   0   1   2   3
      w5     6   5   4   3   2   -   0   1   2
PC5   w6     7   6   5   4   3   2   -   0   1
      w7     8   7   6   5   4   3   2   -   0
Pct6  w8     9   8   7   6   5   4   3   2   -

Reorderings to include in the distortion matrix:

[Diagram: the two selected chunk permutations]

slide-40
SLIDE 40

40

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

Entries modified so far: D(w0,w2) = D(w0,w3) = 0 (from CC1 to the words of PC3). All other entries keep their standard value |y-x-1|.

[Diagram: reorderings to include in the distortion matrix]

slide-41
SLIDE 41

41

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

Entries modified so far: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2 (from the end of PC3 back to VC2). All other entries keep their standard value |y-x-1|.

[Diagram: reorderings to include in the distortion matrix]

slide-42
SLIDE 42

42

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

Entries modified so far: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0 (from VC2 to the words of NC4). All other entries keep their standard value |y-x-1|.

[Diagram: reorderings to include in the distortion matrix]

slide-43
SLIDE 43

43

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

Entries modified so far: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0; D(w0,w4) = D(w0,w5) = 0 (from CC1 to the words of NC4). All other entries keep their standard value |y-x-1|.

[Diagram: reorderings to include in the distortion matrix]

slide-44
SLIDE 44

44

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

Entries modified so far: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0; D(w0,w4) = D(w0,w5) = 0; D(w6,w1) = D(w7,w1) = 2 (from the words of PC5 back to VC2). All other entries keep their standard value |y-x-1|.

[Diagram: reorderings to include in the distortion matrix]

slide-45
SLIDE 45

45

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

Entries modified so far: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0; D(w0,w4) = D(w0,w5) = 0; D(w6,w1) = D(w7,w1) = 2. The entries D(w2,w8) and D(w3,w8) (from the words of PC3 to Pct6) are modified as well. All other entries keep their standard value |y-x-1|.

[Diagram: reorderings to include in the distortion matrix]

slide-47
SLIDE 47

47

Modifying the distortion matrix

Chunks: CC1 VC2 PC3 NC4 PC5 Pct6

Final modified entries: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0; D(w0,w4) = D(w0,w5) = 0; D(w6,w1) = D(w7,w1) = 2; D(w2,w8) and D(w3,w8) are modified as well. All other entries keep their standard value |y-x-1|.

Decoder input: “ w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b . ” with its modified distortion matrix
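The matrix updates walked through on the preceding slides can be approximated in code. The rule below is inferred from the example values in this transcript, not taken from the thesis: a forward jump between two chunks that are consecutive in a selected reordering gets cost 0, and a backward jump gets cost 2. The two reorderings used in the example are likewise inferred from which entries change. All names are illustrative.

```python
def base_matrix(n: int) -> list[list[int]]:
    """Standard distortion matrix D[x][y] = |y - x - 1| for n words (no <s> row)."""
    return [[abs(y - x - 1) for y in range(n)] for x in range(n)]


def add_shortcuts(matrix, spans, reordering, forward_cost=0, backward_cost=2):
    """Lower the cost of the jumps needed to follow `reordering`.

    spans maps each chunk label to its (first, last) word index.
    Costs are only ever lowered, never raised.
    """
    for prev_chunk, next_chunk in zip(reordering, reordering[1:]):
        prev_first, prev_last = spans[prev_chunk]
        next_first, next_last = spans[next_chunk]
        if next_first == prev_last + 1:
            continue  # already adjacent in the source: cost is 0 anyway
        cost = forward_cost if next_first > prev_last else backward_cost
        for x in range(prev_first, prev_last + 1):
            for y in range(next_first, next_last + 1):
                matrix[x][y] = min(matrix[x][y], cost)
    return matrix


spans = {"CC1": (0, 0), "VC2": (1, 1), "PC3": (2, 3),
         "NC4": (4, 5), "PC5": (6, 7), "Pct6": (8, 8)}
matrix = base_matrix(9)
add_shortcuts(matrix, spans, ["CC1", "PC3", "VC2", "NC4", "PC5", "Pct6"])
add_shortcuts(matrix, spans, ["CC1", "NC4", "PC5", "VC2", "PC3", "Pct6"])
print(matrix[0][2], matrix[3][1], matrix[1][4], matrix[7][1])
```

With a low distortion limit such as DL=3, the lowered entries make exactly these long jumps reachable while all other long jumps stay excluded.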

slide-48
SLIDE 48

48

Experiments

  • Tasks: NIST MT09 for Ar-En, WMT10 for De-En
  • Systems based on Moses, include state-of-the-art hierarchical lexicalized reordering models [Tillmann 04; Koehn & al 05; Galley & Manning 08]
  • Baseline Distortion Limits: 5 in Ar-En, 10 in De-En
  • Evaluation by:
    • BLEU for lexical match & local order
    • KRS for global order

slide-49
SLIDE 49

49

Arabic-English:

Test set: eval09-nw
Distortion modified with 3-best reorderings per rule-matching sequence

[Charts: Translation Quality and Translation Time]

+0.9 BLEU +0.6 KRS

slide-50
SLIDE 50

50

German-English:

Test set: newstest10
Distortion modified with 3-best reorderings per rule-matching sequence

[Charts: Translation Quality and Translation Time]

+0.5 BLEU +0.7 KRS

slide-51
SLIDE 51

51

Lessons learned

  • modified distortion matrices improve reordering without decoding overhead
  • language-specific reordering rules are still needed

Can we learn everything from the data?

slide-52
SLIDE 52

52

Outline

  • The problem
  • The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
  • Comparative evaluation & conclusions


Bisazza and Federico, Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation, Transactions of ACL 2013 (accepted with minor revisions)

slide-53
SLIDE 53

53

A fully data-driven approach

  • Train a binary classifier to learn if an input word wy is to be translated right after another wx
    ➞ Word-after-Word (WaW) reordering model

“... anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet ”

[Example: for the current word, one candidate next word is labeled "yes" and the other five "no"]

  • No rules required, all is learnt from parallel data
  • Approach is easily portable to new language pairs with similar reordering characteristics

slide-54
SLIDE 54

54

Decoder-integration

  • [usual approach] additional feature function
  • [novel approach] dynamically prune the reordering space:
    ➞ use model score to decide (early) if a given reordering path is promising enough to be further explored

slide-55
SLIDE 55

55

Early reordering pruning

Test time: run classifier for each input sentence

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

slide-56
SLIDE 56

56

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

[Figure: matrix of WaW scores for the word pairs of the example sentence]

slide-57
SLIDE 57

57

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

[Figure: matrix of WaW scores for the word pairs of the example sentence]

slide-58
SLIDE 58

58

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

[Figure: matrix of WaW scores for the word pairs of the example sentence]

slide-59
SLIDE 59

59

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion
For example after “Die”…

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

[Figure: matrix of WaW scores for the word pairs of the example sentence]

slide-61
SLIDE 61

61

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion
For example after “Die”…
… after “Staat”…

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

[Figure: matrix of WaW scores for the word pairs of the example sentence]

slide-63
SLIDE 63

63

Decoder-integration

How to reduce early pruning errors? ➞ always allow short jumps!

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

[Figure: matrix of WaW scores for the word pairs of the example sentence]

slide-64
SLIDE 64

64

Decoder-integration

How to reduce early pruning errors? ➞ always allow short jumps!

Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .

[Figure: matrix of WaW scores for the word pairs of the example sentence, divided into three zones: off limits, prunable zone, non-prunable zone]
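The three zones can be sketched as a filter applied before each hypothesis expansion. This is a simplified illustration: the function and parameter names are hypothetical, and pruning by a fixed score threshold is an assumption, since the slides do not state the exact pruning criterion.

```python
def allowed_next_positions(last, uncovered, waw, dl, safe_width, threshold):
    """Source positions that may be translated right after position `last`.

    waw[x][y] is the classifier score for translating w_y right after w_x.
    Jumps with distortion above dl are off limits, jumps within safe_width
    are always kept, and the remaining jumps survive only with a high score.
    """
    allowed = []
    for y in sorted(uncovered):
        d = abs(y - last - 1)
        if d > dl:
            continue  # off limits
        if d <= safe_width or waw[last][y] >= threshold:
            allowed.append(y)  # non-prunable zone, or promising long jump
    return allowed


# Toy scores for a 6-word sentence; only the row for w0 matters here.
waw = [[0.0] * 6 for _ in range(6)]
waw[0] = [0.0, 0.9, 0.1, 0.1, 0.8, 0.7]
print(allowed_next_positions(0, {1, 2, 3, 4, 5}, waw,
                             dl=3, safe_width=1, threshold=0.5))
```

In this toy case positions 1 and 2 are kept as short jumps, position 3 is pruned for its low score, position 4 survives as a promising long jump, and position 5 lies beyond the distortion limit.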

slide-65
SLIDE 65

65

Experiments

  • Same tasks
  • Similar baselines, but with early distortion cost [Moore & Quirk 07]
  • Baseline Distortion Limit: 8
  • Evaluation by:
    • BLEU, KRS
    • KRS-V: weighted KRS, only sensitive to verbs

slide-66
SLIDE 66

66

Arabic-English:

Translation Quality

[Chart: translation quality]

+0.3 BLEU +0.8 KRS-V

Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

slide-67
SLIDE 67

67

Arabic-English:

Translation Quality
Translation Time

[Charts: translation quality and translation time]

+0.6 BLEU +1.2 KRS-V

Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

slide-68
SLIDE 68

68

German-English:

Translation Quality

[Chart: translation quality]

+0.2 BLEU +0.7 KRS-V

Test set: newstest10
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

slide-69
SLIDE 69

69

German-English:

Translation Quality
Translation Time

[Charts: translation quality and translation time]

+1.3 BLEU +4.0 KRS-V

Test set: newstest10
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

slide-70
SLIDE 70

Outline

  • The problem
  • The solutions:
      • verb reordering lattices
      • modified distortion matrices
      • dynamically pruning the reordering space
  • Comparative evaluation & conclusions

slide-71
SLIDE 71

Experiments

  • Same PSMT baselines
  • Best enhanced PSMT systems:
      • Ar-En: WaW model & early reordering pruning
      • De-En: reordering lattices pruned with reordered source LM
  • Hierarchical phrase-based system:
      • default configuration (max span for rule extraction: 10 words)
      • max span for decoding: 10 or 20
  • Evaluation by:
      • BLEU, KRS
      • KRS-V: weighted KRS, only sensitive to verbs

slide-72
SLIDE 72

Arabic-English: Translation Quality, Translation Time

Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

slide-73
SLIDE 73

German-English: Translation Quality, Translation Time

Test set: newstest10
Lattices pruned with reordered source LM
(more metrics and test sets in the thesis)

slide-74
SLIDE 74

Arabic-English examples (1)

slide-75
SLIDE 75

Arabic-English examples (1)

slide-76
SLIDE 76

Arabic-English examples (2)

slide-77
SLIDE 77

Arabic-English examples (2)

slide-78
SLIDE 78

German-English examples (1)

slide-79
SLIDE 79

German-English examples (1)

slide-80
SLIDE 80

German-English examples (2)

slide-81
SLIDE 81

German-English examples (2)

slide-82
SLIDE 82

Conclusions

  • Our techniques advance the state of the art in reordering modeling within the PSMT framework:
      • they capture long-range reordering patterns without sacrificing decoding efficiency
      • they demonstrate the importance of refining the reordering search space
  • Positive results on large-scale news translation tasks in two difficult language pairs:
      • significant gains in reordering-specific metrics, while generic scores are preserved or increased
      • our best PSMT systems compare favorably with a strong tree-based approach (HSMT), both in quality and efficiency

slide-83
SLIDE 83

Future Directions

  • Improve the proposed methods by:
      • refining chunk-based reordering rules with POS or lexical clues
      • increasing accuracy of WaW model with new features
      • combining different reordering scores for early pruning
  • Evaluate on language pairs with similar reordering characteristics
  • Analyze the effect of improved long reordering on post-editing effort by human translators
  • Address the problem of reordering search space definition in HSMT, possibly with analogous strategies

slide-84
SLIDE 84

Related publications

  • A. Bisazza, M. Federico, “Chunk-based Verb Reordering in VSO Sentences for Arabic-English”, WMT 2010.
  • C. Hardmeier, A. Bisazza, M. Federico, “Word Lattices for Morphological Reduction and Chunk-based Reordering”, WMT 2010.
  • A. Bisazza, D. Pighin, M. Federico, “Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation”, MT Journal, Special Issue on MT for Arabic, 2012.
  • A. Bisazza, M. Federico, “Modified Distortion Matrices for Phrase-Based Statistical Machine Translation”, ACL 2012.
  • A. Bisazza, M. Federico, “Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation”, Transactions of the ACL 2013 (accepted with minor revisions).

slide-85
SLIDE 85

[Closing slide: a distortion matrix over source positions <s>, w0 … w10 whose cells spell out "THANKS FOR YOUR ATTENTION!"]
