
slide-1
SLIDE 1

Source‐side Dependency Tree Reordering Models with Subtree Movements and Constraints

Nguyen Bach, Qin Gao and Stephan Vogel

Carnegie Mellon University

1

slide-2
SLIDE 2

Overview

  • We introduce source‐side dependency tree reordering models
  • Inspired by lexicalized reordering models (Koehn et al., 2005), hierarchical dependency translation (Shen et al., 2008) and cohesive decoding (Cherry, 2008)
  • We model reordering events of phrases associated with source‐side dependency trees
  • Inside/Outside subtree movements efficiently capture the statistical distribution of subtree‐to‐subtree transitions in the training data
  • Subtree movements are used directly at decoding time, alongside cohesive constraints, to guide the search process
  • Improvements are shown on English‐Spanish and English‐Iraqi tasks

2

slide-3
SLIDE 3

Outline

  • Background & Motivations
  • Source‐side dependency tree reordering

models

– Modeling – Training – Decoding

  • Experiments & Analysis
  • Conclusions

3

slide-4
SLIDE 4

Background of Reordering Models

4

[Figure: three families of reordering models]
  • Explicitly model phrase reordering distances
  • Put syntactic analysis of the target language into both modeling and decoding
  • Use source language syntax

slide-5
SLIDE 5

5

[Figure: taxonomy of reordering models]
  • Explicitly model phrase reordering distances
    – Distance‐based (Och, 2002; Koehn et al., 2003)
    – Lexicalized phrase (Tillmann, 2004; Koehn et al., 2005; Al‐Onaizan and Papineni, 2006)
    – Hierarchical phrase (Galley and Manning, 2008)
    – MaxEnt classifier (Zens and Ney, 2006; Xiong et al., 2006; Chang et al., 2009)
  • Put syntactic analysis of the target language into both modeling and decoding
    – Direct modeling of target‐language constituent movement in either constituency trees (Yamada and Knight, 2001; Galley et al., 2006; Zollmann et al., 2008) or dependency trees (Quirk et al., 2005)
    – Hierarchical phrase‐based (Chiang, 2005; Shen et al., 2008)
  • Use source language syntax
    – Preprocessing with syntactic reordering rules (Xia and McCord, 2004; Collins et al., 2005; Rottmann and Vogel, 2007; Wang et al., 2007; Xu et al., 2009)
    – Syntactic analysis providing multiple source‐sentence reordering options through word lattices (Zhang et al., 2007; Li et al., 2007; Elming, 2008)

slide-6
SLIDE 6

6

[Same taxonomy figure as the previous slide]

Source‐side Dependency Tree Reordering Models with Subtree Movements and Constraints

slide-7
SLIDE 7

What are the differences?

  • Instead of using flat word structures to extract reordering events, utilize source‐side dependency structures
    – Provides more linguistic cues for reordering events
  • Instead of using pre‐defined reordering patterns, learn reordering feature distributions from training data
    – Captures reordering events from real data
  • Instead of preprocessing the data, discriminatively train the reordering model via MERT
    – Tighter integration with the decoder

7

slide-8
SLIDE 8

Cohesive Decoding

  • Cohesive decoding (Cherry, 2008; Bach et al., 2009) enforces the cohesion constraint:
    – When the decoder begins translating any part of a source subtree, it must cover all words under that subtree before it can translate anything outside it.
  • Source‐side dependency tree reordering models
    – Efficiently capture the statistical distribution of subtree‐to‐subtree transitions in the training data.
    – Are used directly at decoding time to guide the search process.
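The cohesion constraint above can be sketched as a coverage check over the source dependency tree. This is a minimal illustration under our own conventions, not the decoder's actual implementation: `parents` is a parent‐index array for the dependency tree (root has parent −1), `covered` is the set of already‐translated source indices, and `span` is an inclusive source index range; all names are ours.

```python
def ancestors(parents, i):
    # all ancestor indices of word i in the dependency tree
    out = set()
    while parents[i] != -1:
        i = parents[i]
        out.add(i)
    return out

def subtree_words(parents, root):
    # all word indices in the subtree rooted at `root`
    return {i for i in range(len(parents))
            if i == root or root in ancestors(parents, i)}

def violates_cohesion(parents, covered, span):
    """True if translating `span` next would move outside a subtree
    that has begun translation (open) without completing it."""
    new_words = set(range(span[0], span[1] + 1))
    new_covered = covered | new_words
    for root in range(len(parents)):
        words = subtree_words(parents, root)
        started = bool(words & covered)       # subtree already opened
        complete = words <= new_covered       # would be completed now
        inside = new_words <= words           # span stays inside it
        if started and not complete and not inside:
            return True
    return False
```

For a tree where word 2 dominates words 3 and 4, covering word 3 first forces the decoder to stay inside that subtree until it is complete.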

8

slide-9
SLIDE 9

Outline

  • Background of Reordering Models
  • Source‐side dependency tree reordering

models

– Modeling – Training – Decoding

  • Experiments & Analysis
  • Conclusions

9

slide-10
SLIDE 10

Lexicalized Reordering Models (Tillmann, 2004; Koehn et al., 2005; Al‐Onaizan and Papineni, 2006)

10

p(O | e, f) = ∏_{i=1}^{n} p(o_i | e_i, f_{a_i}, a_{i−1}, a_i)

where f is the input sentence; e = (e_1, ..., e_n) are the target‐language phrases; a = (a_1, ..., a_n) are the phrase alignments; f_{a_i} is the source phrase whose translation is e_i, as defined by alignment a_i; each orientation o_i in the phrase sequence takes one of 3 possible values: M (monotone), S (swap), D (discontinuous).
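The three orientation values can be read off the source spans of consecutive phrases: a minimal sketch, with spans as inclusive (start, end) source index pairs and the function name ours.

```python
def orientation(prev_span, cur_span):
    """Lexicalized reordering orientation of the current source phrase
    relative to the previous one: M (monotone), S (swap), or
    D (discontinuous)."""
    if cur_span[0] == prev_span[1] + 1:
        return "M"   # current phrase directly follows the previous one
    if cur_span[1] == prev_span[0] - 1:
        return "S"   # current phrase directly precedes it (swapped)
    return "D"       # anything else: discontinuous
```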

slide-11
SLIDE 11

11

[Empty phrase alignment grid: source positions 1–16 vs. target positions 1–16]

slide-12
SLIDE 12

12

[Alignment grid: target phrases covered so far — 1: "Por lo tanto", 2: "quisiera"]

slide-13
SLIDE 13

13

[Alignment grid: "Por lo tanto quisiera pedirle"; orientations so far: Discontinuous]

slide-14
SLIDE 14

14

[Alignment grid: "Por lo tanto quisiera pedirle nuevamente"; orientations so far: Discontinuous, Swap]

slide-15
SLIDE 15

15

[Alignment grid: "Por lo tanto quisiera pedirle nuevamente que se encargue de que"; orientations so far: Discontinuous, Swap, Discontinuous]

slide-16
SLIDE 16

16

[Alignment grid: full target "Por lo tanto quisiera pedirle nuevamente que se encargue de que podamos ver también un canal neerlandés"; orientations: Discontinuous, Swap, Discontinuous, Monotone]

slide-17
SLIDE 17

Pros & Cons of Lexicalized Reordering Models

  • Pros
    – Intuitively model flat word movements
    – Well‐defined for the phrase‐based framework
  • Cons
    – No linguistic structure
    – Need the alignment matrix to determine movements

17

slide-18
SLIDE 18

Completed/Open subtrees

18

[Dependency tree over words a–g, with the subtree rooted at "b" highlighted]

A completed subtree: when all words under a node have been translated, we call the subtree rooted at that node a completed subtree.

slide-19
SLIDE 19

Completed/Open subtrees

19

[Dependency tree over words a–g, with the subtree rooted at "b" highlighted]

An open subtree: a subtree that has begun translation but is not yet complete.

slide-20
SLIDE 20

Inside/Outside subtree movements

20

[Dependency tree over words a–g]

Inside: "c" is moving inside the subtree rooted at "b".

A structure is moving inside a subtree if it helps the subtree become completed or less open.

slide-21
SLIDE 21

Inside/Outside subtree movements

21

[Dependency tree over words a–g]

Outside: "d e" is moving outside the subtree rooted at "b".

A structure is moving outside a subtree if it leaves the subtree open.
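The completed/open and inside/outside notions reduce to simple set checks when subtrees, coverage, and phrases are all represented as sets of source word indices; a sketch under that assumption, with names of our own choosing.

```python
def status(subtree, covered):
    """A subtree is completed when all of its words are translated,
    open when only some of them are, untouched otherwise."""
    if subtree <= covered:
        return "completed"
    if subtree & covered:
        return "open"
    return "untouched"

def movement(subtree, span):
    """A phrase `span` moves Inside ('I') a subtree if it helps the
    subtree become completed or less open, Outside ('O') if it leaves
    the subtree open."""
    return "I" if span <= subtree else "O"
```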
slide-22
SLIDE 22

Source‐side Dependency Tree (SDT) Reordering Models

22

p(D | e, f) = ∏_{i=1}^{n} p(d_i | e_i, f_{a_i}, a_{i−1}, a_i, s_{i−1}, s_i)

where f is the input sentence; e = (e_1, ..., e_n) are the target‐language phrases; a = (a_1, ..., a_n) are the phrase alignments; f_{a_i} is the source phrase whose translation is e_i, as defined by alignment a_i; s_{i−1} and s_i are the dependency structures of the source phrases f_{a_{i−1}} and f_{a_i} over the source dependency tree; each d_i ∈ {I, O}; D represents the sequence of syntactic phrase movements.

slide-23
SLIDE 23

23

[Word alignment grid between the English source sentence (words: I, would, therefore, ask, you, once, more, to, ensure, that, we, get, a, Dutch, channel, as, well) and the Spanish target "Por lo tanto quisiera pedirle nuevamente que se encargue de que podamos ver también un canal neerlandés"]

slide-24
SLIDE 24

24

[Same alignment grid; translation of the first phrase pair begins]

slide-25
SLIDE 25

25

[Same alignment grid; labels so far: Discontinuous / Inside]
slide-26
SLIDE 26

26

[Same alignment grid; labels so far: Discontinuous / Inside, Swap / Outside]

slide-27
SLIDE 27

27

[Same alignment grid; labels so far: Discontinuous / Inside, Swap / Outside, Discontinuous / Inside]

slide-28
SLIDE 28

28

[Same alignment grid; labels so far: Discontinuous / Inside, Swap / Outside, Discontinuous / Inside, Monotone / Inside]

slide-29
SLIDE 29

29

[Same alignment grid, fully covered]

Source‐side Dependency Tree R.M.: Inside  Outside  Inside  Inside  Inside
Lexicalized R.M.: Discontinuous  Swap  Discontinuous  Monotone

slide-30
SLIDE 30

Extended Source‐side Dependency Tree (SDT) Reordering Models

30

p(D | e, f) = ∏_{i=1}^{n} p((o_d)_i | e_i, f_{a_i}, a_{i−1}, a_i, s_{i−1}, s_i)

where f is the input sentence; e = (e_1, ..., e_n) are the target‐language phrases; a = (a_1, ..., a_n) are the phrase alignments; f_{a_i} is the source phrase whose translation is e_i, as defined by alignment a_i; s_{i−1} and s_i are the dependency structures of the source phrases f_{a_{i−1}} and f_{a_i} over the source dependency tree; each (o_d)_i ∈ {M_I, S_I, D_I, M_O, S_O, D_O}; D represents the sequence of syntactic phrase movements.
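The six (o_d) classes combine the lexicalized orientation with the inside/outside test; a sketch with our own names, spans as inclusive (start, end) source index pairs and the currently relevant subtree as a set of source indices.

```python
def sdt_label(prev_span, cur_span, subtree):
    """Six-class SDT label (o_d)_i: lexicalized orientation M/S/D
    combined with Inside/Outside subtree movement, giving one of
    {M_I, S_I, D_I, M_O, S_O, D_O}."""
    if cur_span[0] == prev_span[1] + 1:
        o = "M"   # monotone: adjacent, same order
    elif cur_span[1] == prev_span[0] - 1:
        o = "S"   # swap: adjacent, reversed order
    else:
        o = "D"   # discontinuous
    inside = set(range(cur_span[0], cur_span[1] + 1)) <= subtree
    return o + ("_I" if inside else "_O")
```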

slide-31
SLIDE 31

Extended Source‐side Dependency Tree (SDT) Reordering Models

31

(Same model as the previous slide.) For the running example, the six‐class labels combine the Inside/Outside movements with the lexicalized orientations:

D_I            S_O       D_I            M_I
Inside         Outside   Inside         Inside
Discontinuous  Swap      Discontinuous  Monotone

slide-32
SLIDE 32

Training

  • Obtain dependency parse of the source side
  • Given a sentence pair and the source side

dependency tree

– Phrase extraction: also extract source dependency structures of phrase pairs
– Identify Inside/Outside movements using the Interruption Check algorithm (Bach et al., 2009)

32

slide-33
SLIDE 33

Training

33

p_DO((o_d)_j | e_k, f_k, a_i) = (count((o_d)_j, e_k, f_k) + γ) / (Σ_{j'} count((o_d)_{j'}, e_k, f_k) + γ)

p_DOD((o_d)_j | d_j, e_k, f_k, a_i) = (count((o_d)_j, e_k, f_k) + γ) / (Σ_{j': d_{j'} = d_j} count((o_d)_{j'}, e_k, f_k) + γ)

p_DOO((o_d)_j | o_j, e_k, f_k, a_i) = (count((o_d)_j, e_k, f_k) + γ) / (Σ_{j': o_{j'} = o_j} count((o_d)_{j'}, e_k, f_k) + γ)

DO: a joint probability of subtree movements and lexicalized orientations
DOD: conditioned on subtree movements
DOO: conditioned on lexicalized orientations
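The three estimations can be sketched from raw (orientation, movement) counts for one phrase pair. This is an illustration, not the paper's exact implementation: the per‐class additive smoothing in the denominators (γ per class) is our assumption, as the slide only shows a single γ added to the numerator and denominator; all names are ours.

```python
ORIENTS, MOVES = ["M", "S", "D"], ["I", "O"]

def estimate(counts, gamma=0.5):
    """Smoothed DO / DOD / DOO estimates for one phrase pair, given
    `counts`: a dict mapping (orientation, movement) to an observed
    count. Returns three distributions over the six classes."""
    total = sum(counts.values())
    p_do, p_dod, p_doo = {}, {}, {}
    for o in ORIENTS:
        for m in MOVES:
            c = counts.get((o, m), 0)
            # DO: joint distribution over all six classes
            p_do[(o, m)] = (c + gamma) / (total + 6 * gamma)
            # DOD: conditioned on the subtree movement m
            n_m = sum(counts.get((o2, m), 0) for o2 in ORIENTS)
            p_dod[(o, m)] = (c + gamma) / (n_m + 3 * gamma)
            # DOO: conditioned on the lexicalized orientation o
            n_o = sum(counts.get((o, m2), 0) for m2 in MOVES)
            p_doo[(o, m)] = (c + gamma) / (n_o + 2 * gamma)
    return p_do, p_dod, p_doo
```

With this smoothing, each distribution normalizes over its own conditioning context: p_DO over all six classes, p_DOD over orientations within a fixed movement, and p_DOO over movements within a fixed orientation.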

slide-34
SLIDE 34

Decoding

  • Without cohesive constraints
    – No information about the source dependency tree is available during decoding
    – Consider both subtree movements and add them to the translation model costs
  • With cohesive constraints
    – The source dependency tree is available during decoding
    – Only consider either the inside or the outside movement, depending on the output of the interruption check algorithm

34

slide-35
SLIDE 35

Outline

  • Background of Reordering Models
  • Source‐side dependency tree reordering

models

– Modeling – Training – Decoding

  • Experiments & Analysis
  • Conclusions

35

slide-36
SLIDE 36

Experimental Setup

  • Baseline: a phrase‐based MT system with a lexicalized reordering model
  • Coh: using cohesive constraints
  • DO / DOD / DOO: using the source‐side dependency tree (SDT) reordering model with different parameter estimations
  • DO+Coh / DOD+Coh / DOO+Coh: decoding with both the SDT reordering model and cohesive constraints

36

slide-37
SLIDE 37

English‐Spanish (Europarl)

  • Source‐side dependency tree reordering models and cohesive constraints obtained improvements over the lexicalized reordering models.

37

[BLEU bar charts — English‐Spanish nc‐test2007 (y‐axis 32.6–33.8) and English‐Spanish news‐test2008 (y‐axis 19.6–20.8)]

slide-38
SLIDE 38

English‐Iraqi (TransTac)

38

  • Decoding with both source‐side dependency tree reordering models and cohesive constraints often obtains the best performance.

[BLEU bar charts — English‐Iraqi june2008 (y‐axis 25–25.7) and English‐Iraqi nov2008 (y‐axis 17.6–19.2)]

slide-39
SLIDE 39

Where are improvements coming from?

39

slide-40
SLIDE 40

Test set breakdown

  • Divide the test sets into three portions based on sentence‐level TER of the baseline system
  • μ and σ are the mean and standard deviation of the whole test set
  • Head, Tail and Mid are the sentences whose scores are lower than μ − σ/2, higher than μ + σ/2, and in between, respectively
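The breakdown above can be sketched directly. One assumption worth flagging: the slide does not say whether σ is the population or sample standard deviation; the sketch uses the population form, and all names are ours.

```python
import statistics

def breakdown(ter_scores):
    """Split sentence indices into Head / Mid / Tail portions by
    sentence-level TER: Head below mu - sigma/2, Tail above
    mu + sigma/2, Mid the rest."""
    mu = statistics.mean(ter_scores)
    sigma = statistics.pstdev(ter_scores)  # population std, assumed
    head = [i for i, t in enumerate(ter_scores) if t < mu - sigma / 2]
    tail = [i for i, t in enumerate(ter_scores) if t > mu + sigma / 2]
    mid = [i for i in range(len(ter_scores))
           if i not in head and i not in tail]
    return head, mid, tail
```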

40

slide-41
SLIDE 41

41

[Four bar charts of per‐portion BLEU differences (tail / mid / head) — English‐Spanish nc‐test2007 and news‐test2008, English‐Iraqi june‐2008 and nov‐2008]

          june‐08   nov‐08   nc‐test2007   news‐test2008
Head        7.92     6.27       20.39          13.07
Mid        12.31    11.09       28.07          22.78
Tail       13.91    14.08       35.29          25.33

slide-42
SLIDE 42

What is the most significant effect the source‐tree reordering models contribute?

42

slide-43
SLIDE 43

Numbers of Reorderings

           nc‐test2007   news‐test2008   june‐2008   nov‐2008
Baseline          1507            1684          39         24
Coh               2045            2903          46         21
DO                2189            2113          97         58
DO+Coh            1929            1900         155         88
DOD               1735            2592         123         60
DOD+Coh           2070            2021         148         90
DOO               1735            1785         164         49
DOO+Coh           1818            1959         247         66

43

  • More reorderings can be generated without losing performance.
  • The source‐tree reordering models provide a more discriminative mechanism to estimate reordering events.
  • Reordering is more language‐specific than general translation models, and the conditions for a reordering event to happen vary among languages.

slide-44
SLIDE 44

Outline

  • Background & Motivations
  • Source‐side dependency tree reordering

models

– Modeling – Training – Decoding

  • Experiments & Analysis
  • Conclusions

44

slide-45
SLIDE 45

Conclusions & Future Work

  • Conclusions
    – Source‐side dependency tree reordering models are helpful
      • Model reordering events with Inside/Outside subtree movements
    – Their effectiveness was shown in comparison with a strong reordering model
    – Obtained improvements on 2 language pairs, covering training corpus sizes ranging from 500K up to 1.3M sentence pairs
  • Future work
    – A hierarchical source‐side dependency reordering model, extending Galley & Manning (2008)
    – Packed‐forest dependency tree reordering models

45

slide-46
SLIDE 46

Back up

46

slide-47
SLIDE 47

47

[Recap figures: dependency trees over words a–g]

A completed subtree · An open subtree

Outside: "d e" is moving outside the subtree rooted at "b"
Inside: "c" is moving inside the subtree rooted at "b"

slide-48
SLIDE 48

48

[Four dependency trees over words a–g, illustrating Outside, Inside, Inside and Outside movements]

slide-49
SLIDE 49

What do you mean by introducing Inside/Outside notions?

  • The movement of a structure inside or outside a source subtree can be viewed as the decoder moving from the previous source state to the current source state.
  • We track the subtree‐to‐subtree transitions observed in the source side of word‐aligned training data.

49

slide-50
SLIDE 50

50

Phrase pair                  Lexicalized   Source‐tree
ask you # pedirle            dis  swap     D_I *
ask you # pedirle            mono mono     M_I
ask you # pedirle            mono mono     M_O
once more # nuevamente       swap dis      S_O *
once more # nuevamente       dis  swap     D_O
once more # nuevamente que   swap dis      S_O

Inside and outside probabilities for the phrase pair "ask you"–"pedirle" according to the three parameter estimation methods:

       M_I    S_I    D_I    M_O    S_O    D_O
DO    0.691  0.003  0.142  0.119  0.009  0.038
DOD   0.827  0.003  0.17   0.719  0.053  0.228
DOO   0.854  0.25   0.79   0.146  0.75   0.21

slide-51
SLIDE 51

Distributions of Reordering Events

51

[Bar charts of reordering event distributions over {M_I, S_I, D_I, M_O, S_O, D_O} — En‐Es (y‐axis 0.05–0.4) and En‐Ir (y‐axis 0.1–0.7)]

Monotone & inside (M_I) movements are observed more often than the other categories.
