Cohesive Constraints in a Beam Search Phrase-based Decoder
Nguyen Bach, Stephan Vogel Carnegie Mellon University Colin Cherry Microsoft Research
Overview
Apply cohesive constraints during the decoding process to consider the source …
– Significant improvements on English-Spanish
– Stable improvements on other pairs
Example (English->French)
Source: the presidential election of the united states begins tomorrow
Translation: la élection présidentielle commence demain des États Unis
Figure: source dependency tree, phrases numbered 1 2 3
Figure: the same English->French example and source dependency tree, phrases numbered 1 3 2
Figure: the same example under a standard phrase-based decoder (phrase order 1 2 3) and under cohesive decoding (phrase order 1 3 2)
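The difference between the two orders can be illustrated with a small interruption check. This is a minimal Python sketch, not the authors' decoder: it assumes the source dependency tree is a list of head indices (-1 for the root), uses an attachment of the example sentence that we reconstructed for illustration, and the function names are hypothetical. When the decoder moves from the previous phrase to the next one, it climbs from the left- and right-most words of the previous phrase to the highest node n whose subtree T(n) contains no word of the next phrase; if T(n) is still not fully translated once the next phrase is added, the move interrupts T(n).

```python
from typing import List, Optional, Set

# Assumed parse (reconstructed for illustration, not taken from the slides) of
# "the presidential election of the united states begins tomorrow".
# HEADS[i] is the index of word i's head; -1 marks the root ("begins").
WORDS = ["the", "presidential", "election", "of", "the",
         "united", "states", "begins", "tomorrow"]
HEADS = [2, 2, 7, 2, 6, 6, 3, -1, 7]


def subtree(heads: List[int], n: int) -> Set[int]:
    """Return T(n): node n plus every word it dominates."""
    nodes = {n}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in nodes and i not in nodes:
                nodes.add(i)
                changed = True
    return nodes


def interrupted_subtree(heads: List[int], prev_phrase: Set[int],
                        next_phrase: Set[int],
                        covered: Set[int]) -> Optional[Set[int]]:
    """Hypothetical interruption check for moving from prev_phrase to next_phrase.

    Phrases are sets of source positions; covered already includes
    prev_phrase. Returns the still-untranslated words of the interrupted
    subtree T(n), or None if the move is cohesive.
    """
    after = covered | next_phrase
    for f in (min(prev_phrase), max(prev_phrase)):
        n = None
        node = f
        # Climb to the highest ancestor whose subtree contains no word of next_phrase.
        while node != -1 and not (subtree(heads, node) & next_phrase):
            n = node
            node = heads[node]
        if n is not None:
            missing = subtree(heads, n) - after
            if missing:
                return missing
    return None


A, B, C = {0, 1, 2}, {7, 8}, {3, 4, 5, 6}
missing = interrupted_subtree(HEADS, A, B, A)
print([WORDS[i] for i in sorted(missing)])      # ['of', 'the', 'united', 'states']
print(interrupted_subtree(HEADS, A, C, A))      # None
print(interrupted_subtree(HEADS, C, B, A | C))  # None
```

Translating "the presidential election" (A) and then jumping to "begins tomorrow" (B) leaves "of the united states" behind, splitting the subject subtree; translating C before B passes the check.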
Figure: the example decoded step by step (steps 1 to 4), source "the presidential election of the united states begins tomorrow", target "la élection présidentielle commence demain des États Unis"
Figure: the example decoded in 5 steps; at the last step, Interruption Check: NO, Exhaustive Interruption Check: YES
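The gap between the two checks can be reproduced with a sketch that applies the same test to every earlier phrase instead of only the most recent one. This is a minimal illustration under stated assumptions: the head array is our reconstructed parse of the example sentence, the helper names are hypothetical, and the phrase order below was constructed to show the effect (it is not necessarily the order drawn in the figure).

```python
from typing import List, Optional, Set

# Assumed parse (reconstructed for illustration, not taken from the slides) of
# "the presidential election of the united states begins tomorrow";
# HEADS[i] is the head of word i and -1 marks the root ("begins").
HEADS = [2, 2, 7, 2, 6, 6, 3, -1, 7]


def subtree(heads: List[int], n: int) -> Set[int]:
    """Return T(n): node n plus every word it dominates."""
    nodes = {n}
    while True:
        new = {i for i, h in enumerate(heads) if h in nodes} - nodes
        if not new:
            return nodes
        nodes |= new


def check(heads: List[int], prev_phrase: Set[int], next_phrase: Set[int],
          covered: Set[int]) -> Optional[Set[int]]:
    """Interruption check against a single earlier phrase (hypothetical helper)."""
    after = covered | next_phrase
    for f in (min(prev_phrase), max(prev_phrase)):
        n, node = None, f
        # Highest ancestor of f whose subtree avoids next_phrase.
        while node != -1 and not (subtree(heads, node) & next_phrase):
            n, node = node, heads[node]
        if n is not None and subtree(heads, n) - after:
            return subtree(heads, n) - after
    return None


def exhaustive_check(heads: List[int], history: List[Set[int]],
                     next_phrase: Set[int]) -> Optional[Set[int]]:
    """Run the check against every previously translated phrase, not just the last."""
    covered = set().union(*history)
    for prev in history:
        missing = check(heads, prev, next_phrase, covered)
        if missing:
            return missing
    return None


history = [{0, 1, 2}, {8}]  # "the presidential election", then "tomorrow"
nxt = {7}                   # "begins"
covered = set().union(*history)
print(check(HEADS, history[-1], nxt, covered))         # None (last phrase only)
print(sorted(exhaustive_check(HEADS, history, nxt)))   # [3, 4, 5, 6]
```

Looking only at "tomorrow", moving to "begins" looks harmless, but the subtree opened by "the presidential election" is still incomplete, which only the exhaustive variant notices.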
Figure: source dependency tree of the example with POS tags (the/DT presidential/JJ election/NN ... the/DT united/VBN states/NNS begins/VBZ tomorrow/NN) and dependency labels (SBJ, OBJ, NMOD, PMOD)
– Binary event: violation / no violation
– Interruption Count: number of untranslated words in the interrupted subtree
– Verb Count: number of untranslated verbs in the interrupted subtree
– Noun Count: number of untranslated nouns in the interrupted subtree
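As a minimal sketch, the four penalties listed above can be computed from the untranslated words of an interrupted subtree T(n). The POS tags follow the slide; the tag for "of" (IN), the function name, and identifying verbs and nouns by the Penn Treebank prefixes VB and NN are assumptions of this illustration. How the decoder weights these values is not shown here.

```python
from typing import Dict, List, Set

# POS tags of "the presidential election of the united states begins tomorrow"
# as on the slide; the tag for "of" (IN) is assumed.
TAGS = ["DT", "JJ", "NN", "IN", "DT", "VBN", "NNS", "VBZ", "NN"]


def violation_penalties(missing: Set[int], tags: List[str]) -> Dict[str, int]:
    """Hypothetical penalty features for one interruption.

    missing: untranslated source positions of the interrupted subtree T(n)
    (empty if there is no violation). Verbs and nouns are recognized by
    their Penn Treebank tag prefixes VB and NN.
    """
    return {
        "binary": int(bool(missing)),
        "interruption_count": len(missing),
        "verb_count": sum(tags[i].startswith("VB") for i in missing),
        "noun_count": sum(tags[i].startswith("NN") for i in missing),
    }


# "of the united states" left untranslated:
print(violation_penalties({3, 4, 5, 6}, TAGS))
# {'binary': 1, 'interruption_count': 4, 'verb_count': 1, 'noun_count': 1}
```

Because "united" is tagged VBN on the slide, it counts as a verb under this prefix rule.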
Variants of the cohesive constraint (rows: how to detect the largest subtree T(n); columns: how to penalize a cohesion violation)

Detect T(n) from      | Binary                        | Number of untranslated words  | Linguistic features
The previous phrase   | Interruption Check            | Interruption Count            | Rich Interruption Constraints
All previous phrases  | Exhaustive Interruption Check | Exhaustive Interruption Count | N/A
Figure: BLEU scores on English-Iraqi (TransTac June08) and English-Spanish (Europarl nctest2007)
Cohesive constraints yielded improvements over the standard phrase-based decoder.
– M1: 19.41%
– M2: 86.21%
– M2 is better than M1
Figure: BLEU scores of M1 and M2
Cohesive constraints yielded improvements even in a large-scale system with strong reordering models.
Figure: BLEU scores on GALE Dev07-NW and GALE Dev07-WB
– Cohesive constraints are helpful
– They remained effective when combined with a strong reordering model
– Improvements were obtained on 3 language pairs, with training corpora ranging from 500K to 11M sentence pairs
– A source-side dependency reordering model: learn reordering events of phrases from source subtree movements
– A hierarchical source-side dependency reordering model: extend Galley & Manning (2008)