

slide-1
SLIDE 1

Improving Bilingual Sub-sentential Alignment by Sampling-based Transpotting

Li Gong, Aurélien Max, François Yvon

LIMSI-CNRS & Université Paris-Sud, Orsay, France

slide-2
SLIDE 2

Method Experimental Results Conclusion and future work

Context of this work

Building SMT systems, step 1: align the parallel corpus

[Figure: example word alignment between "a troupe of actors in costumes ... in ..." and "une troupe de comédiens déguisés dans ... ..."]

  • the parallel corpus can be huge
  • we don’t use/need everything
  • we may regularly receive new data

Our method for parallel corpus alignment:

  • is very simple to describe and implement
  • processes each sentence pair independently
  • uses new data transparently (plug-and-play)

2 / 26

slide-4
SLIDE 4

Outline

1 Method
    Sampling-based transpotting
    Sub-sentential alignment extraction

2 Experimental Results
    Basic alignment task
    Incremental alignment task

3 Conclusion and future work

3 / 26

slide-7
SLIDE 7

Sampling-based transpotting

1 Given a source-target sentence pair, extract an association table:

   one diet coke , please . ↔ un coca zéro , s’il vous plaît .

2 Draw a random sub-corpus from the parallel corpus and compute an occurrence profile for each word

3 Increment the count of each contiguous phrase pair whose words share the same profile

4 Repeat steps 2 to 3 N times, so as to obtain an association table for the given sentence pair

6 / 26

slide-8
SLIDE 8

Sampling-based transpotting (example)

Sentence pair: one diet coke , please . ↔ un coca zéro , s’il vous plaît .

Sampled sub-corpus:

    English                     French
  1 one coffee , please .      un café , s’il vous plaît .
  2 the coffee is not bad .    ce café est correct .
  3 yes , one tea .            oui , un thé .

Occurrence profiles over this sub-corpus:

  one [1, 0, 1]   diet [0, 0, 0]   coke [0, 0, 0]   , [1, 0, 1]   please [1, 0, 0]   . [1, 1, 1]
  un  [1, 0, 1]   coca [0, 0, 0]   ...

6 / 26

slide-15
SLIDE 15

Sampling-based transpotting (grouping by profile)

Words sharing the same distribution profile are grouped, and the count of each resulting phrase pair is incremented:

  one , ↔ un ,               [1, 0, 1]
  diet coke ↔ coca zéro      [0, 0, 0]
  please ↔ s’il vous plaît   [1, 0, 0]
  . ↔ .                      [1, 1, 1]

  #(diet coke ↔ coca zéro) += 1
  #(please ↔ s’il vous plaît) += 1
  #(. ↔ .) += 1

6 / 26

slide-18
SLIDE 18

Sampling-based transpotting (resulting association table)

After N repetitions, the association table for the sentence pair contains, for example:

  source phrase    target phrase     count
  one              un                830
  coke             coca              680
  diet coke        coca zéro         260
  one diet coke    un coca zéro      30
  ,                ,                 900
  please           s’il vous plaît   160
  .                .                 980

6 / 26
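The four steps above can be sketched in a few lines of Python. This is a minimal toy reconstruction, not the authors' code: the names are my own, profiles are binary occurrence vectors over the k sampled sentence pairs, and only maximal contiguous runs of same-profile words are paired (the deck also groups non-adjacent same-profile words such as "one , ↔ un ,").

```python
import random
from collections import Counter, defaultdict

def transpotting_counts(pair, corpus, n_samples=1000, k=3, rng=None):
    """Sampling-based transpotting (sketch): repeatedly draw a random
    sub-corpus, compute occurrence profiles, and count phrase pairs
    whose words share the same profile on both sides."""
    rng = rng or random.Random(0)
    src, tgt = pair                      # token lists of the pair to align
    counts = Counter()
    for _ in range(n_samples):
        sample = rng.sample(corpus, k)   # random sub-corpus of k sentence pairs

        def profile(word, side):
            # in which of the sampled sentences does the word occur?
            return tuple(int(word in s[side]) for s in sample)

        def runs(tokens, side):
            # maximal contiguous runs of tokens sharing one profile
            out, i = defaultdict(list), 0
            while i < len(tokens):
                p, j = profile(tokens[i], side), i
                while j + 1 < len(tokens) and profile(tokens[j + 1], side) == p:
                    j += 1
                out[p].append(tuple(tokens[i:j + 1]))
                i = j + 1
            return out

        src_runs, tgt_runs = runs(src, 0), runs(tgt, 1)
        for p, src_phrases in src_runs.items():
            for sp in src_phrases:
                for tp in tgt_runs.get(p, []):
                    counts[(sp, tp)] += 1
    return counts
```

On the deck's toy corpus, pairs like "diet coke ↔ coca zéro" and ". ↔ ." are counted at every iteration, because their profiles match under any sampled sub-corpus.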

slide-19
SLIDE 19

Outline

1 Method
    Sampling-based transpotting
    Sub-sentential alignment extraction

2 Experimental Results
    Basic alignment task
    Incremental alignment task

3 Conclusion and future work

7 / 26

slide-20
SLIDE 20

Sub-sentential alignment: algorithm illustration

Association scores w(s, t) for "one diet coke , please ." ↔ "un coca zéro , s’il vous plaît ." (ε: negligible score):

          un     coca   zéro   ,      s'il   vous   plaît  .
  one     0.846  ε      ε      ε      ε      ε      ε      ε
  diet    ε      0.310  0.382  ε      ε      ε      ε      ε
  coke    ε      0.738  0.132  ε      ε      ε      ε      ε
  ,       ε      ε      ε      0.624  ε      ε      ε      0.248
  please  ε      ε      ε      ε      0.132  0.108  0.628  ε
  .       ε      ε      ε      0.102  ε      ε      ε      0.873

The matrix is segmented recursively; each binary split applies either the straight rule or the inversion rule over spans (i, j, k, l, m, n).

[Diagram: straight rule and inversion rule splits]

8 / 26

slide-31
SLIDE 31

Sub-sentential alignment: details

1 Association score w(s, t) between source and target words:

   w(s, t) = p(s | t) · p(t | s)

2 Segmentation criterion:

   cut(X, Y) = cut(X̄, Ȳ) = W(X, Ȳ) + W(X̄, Y)

   where:
   • (X, Y) ∈ {A, Ā} × {B, B̄}
   • W(X, Y) = Σ_{s∈X, t∈Y} w(s, t)

   [Diagram: the score matrix over source words s1..sI and target words t1..tJ, split at source position x and target position y into blocks W(A, B), W(A, B̄), W(Ā, B), W(Ā, B̄).]

9 / 26
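These two formulas are enough to sketch the recursive segmentation as code. The snippet below is an illustrative reconstruction, not the authors' implementation: for each block it greedily picks the split (x, y) minimizing the raw cut W(X, Ȳ) + W(X̄, Y) under the straight or the inversion rule (the actual criterion may be normalized), with ε entries of the score matrix replaced by 0.

```python
import numpy as np

def segment(w, rows=None, cols=None, out=None):
    """Recursive binary segmentation of the association-score matrix w
    (rows: source words, cols: target words).  Blocks that can no longer
    be split become aligned phrase pairs."""
    rows = list(range(w.shape[0])) if rows is None else rows
    cols = list(range(w.shape[1])) if cols is None else cols
    out = [] if out is None else out
    if len(rows) == 1 or len(cols) == 1:
        out.append((tuple(rows), tuple(cols)))
        return out
    sub = w[np.ix_(rows, cols)]
    best = None                                   # (cost, x, y, rule)
    for x in range(1, len(rows)):
        for y in range(1, len(cols)):
            # straight rule: A ↔ B and Ā ↔ B̄; cut = cross-block mass
            straight = sub[:x, y:].sum() + sub[x:, :y].sum()
            # inversion rule: A ↔ B̄ and Ā ↔ B
            inverted = sub[:x, :y].sum() + sub[x:, y:].sum()
            for rule, cost in (("S", straight), ("I", inverted)):
                if best is None or cost < best[0]:
                    best = (cost, x, y, rule)
    _, x, y, rule = best
    if rule == "S":
        segment(w, rows[:x], cols[:y], out)
        segment(w, rows[x:], cols[y:], out)
    else:
        segment(w, rows[:x], cols[y:], out)
        segment(w, rows[x:], cols[:y], out)
    return out
```

On the score matrix of the algorithm-illustration slide, this recovers the alignment shown in the deck, e.g. "please ↔ s'il vous plaît" and an inverted block pairing "diet" with "zéro" and "coke" with "coca".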
slide-32
SLIDE 32

Outline

1 Method
    Sampling-based transpotting
    Sub-sentential alignment extraction

2 Experimental Results
    Basic alignment task
    Incremental alignment task

3 Conclusion and future work

10 / 26

slide-34
SLIDE 34

Basic alignment task: systems

[Diagram: the parallel corpus is aligned with giza++ on one side and with sba on the other; each alignment feeds a Moses system, tuned with MERT on the dev set and evaluated on the test set.]

Baseline: giza++ with default settings
Our method: sba, drawing 1,000 sub-corpora per sentence pair

12 / 26

slide-37
SLIDE 37

Basic alignment task: data

  • Language pairs:
    • English to French (1 reference translation)
    • French to English (7 reference translations)
    • Chinese to English (7 reference translations)

  • Development and test sets (from BTEC):

  Corpus   #lines  #token_en        #token_fr  #token_zh
  devel03  506     4,098 (16 refs)  4,220      3,435
  test09   469     3,928 (7 refs)   4,023      3,031

  • Training data:

  Corpus  #lines  #token_en  #token_fr  #token_zh
  BTEC    20K     182K       207K       —
  HIT     62K     600K       690K       590K

13 / 26

slide-38
SLIDE 38

Basic alignment task: results

  • English → French (1 reference):

              BTEC (in-domain)                       HIT (out-of-domain)
              BLEU   Oracle-BLEU  TER    # entries   BLEU   Oracle-BLEU  TER    # entries
    giza++    45.68  76.26        37.03  360K        39.65  68.20        44.50  1,217K
    sba       47.81  77.78        36.60  315K        39.70  68.45        43.56  921K

  • French → English (7 references):

              BTEC (in-domain)                       HIT (out-of-domain)
              BLEU   Oracle-BLEU  TER    # entries   BLEU   Oracle-BLEU  TER    # entries
    giza++    59.50  77.23        24.59  360K        45.52  68.58        33.99  1,224K
    sba       59.92  77.50        24.22  315K        45.34  69.59        33.79  937K

  • Chinese → English (7 references):

              HIT (out-of-domain)
              BLEU   Oracle-BLEU  TER    # entries
    giza++    27.88  51.69        50.76  1,139K
    sba       27.85  53.05        50.93  655K

14 / 26

slide-43
SLIDE 43

Outline

1 Method
    Sampling-based transpotting
    Sub-sentential alignment extraction

2 Experimental Results
    Basic alignment task
    Incremental alignment task

3 Conclusion and future work

15 / 26

slide-44
SLIDE 44

Incremental alignment task: system & data

[Diagram: a baseline system is built from the HIT corpus; for each input text, sentence pairs are selected from the big WMT parallel corpus, aligned with giza++ or sba, and turned into a supplementary phrase table used alongside the main one (Multi-PT Moses).]

Data selection: select sentence pairs that contain at least one occurrence of a word which appears in the input text and is out-of-vocabulary (OOV) in the baseline system.

Supp PT: only contains entries for OOV words
Baseline system: giza++/Moses on the HIT corpus (French → English with 7 references)

  Corpus  #lines   #token_en  #token_fr
  HIT     62K      600K       690K
  WMT     11,745K  317,688K   383,076K
  supp    3.3K     111K       121K

16 / 26
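The data-selection step described above is simple enough to sketch directly. A minimal sketch, assuming tokenized input and corpora; the function and variable names are my own, not from the paper:

```python
def select_for_oov(input_tokens, big_corpus, baseline_vocab):
    """Select sentence pairs from the big parallel corpus that contain at
    least one word occurring in the input text but out-of-vocabulary
    (OOV) for the baseline system."""
    oov = {w for w in input_tokens if w not in baseline_vocab}
    # keep a pair as soon as its source side shares a word with the OOV set
    return [(src, tgt) for src, tgt in big_corpus if oov & set(src)]
```

The selected pairs are then aligned (with giza++ or sba) and packed into the supplementary phrase table.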

slide-49
SLIDE 49

Incremental alignment task: results

  main PT     supp PT      # words  # entries  BLEU   ∆-BLEU  TER
  (62K HIT)   (3.3K supp)
  giza++      none         —        —          45.52  —       33.99
  giza++      forced       59       1,993      47.94  +2.42   34.62
  giza++      concat       60       1,190      48.69  +3.17   33.09
  giza++      sba          64       681        49.83  +4.31   30.61
  giza++      concat++     62       1,218      50.23  +4.71   29.81
  sba         none         —        —          45.34  −0.18   33.79
  sba         sba          64       681        50.45  +4.93   29.94

  none: baseline system
  forced: forced alignment, trained on HIT
  concat: giza++ alignment learnt on the concatenation of HIT and supp
  sba: our sampling-based alignment method
  concat++: giza++ alignment learnt on the WMT corpus

17 / 26

slide-56
SLIDE 56


Outline

1. Method
   - Sampling-based transpotting
   - Sub-sentential alignment extraction

2. Experimental Results
   - Basic alignment task
   - Incremental alignment task

3. Conclusion and future work

18 / 26

slide-57
SLIDE 57


Conclusion

  • alignment time can be controlled
  • only useful sentence pairs need to be aligned
  • integrating new data is plug-and-play

Previous work

  • sampling-based transpotting (Anymalign) (Lardilleux and Lepage, 2008)
  • Inversion Transduction Grammars (Wu, 1997)
  • binary recursive bi-sentence segmentation (Lardilleux, Yvon and Lepage, 2012)

19 / 26

slide-58
SLIDE 58


Conclusion and future work

Hypothesis: sba performs better on rare words.

[Figure: word-alignment matrices between "a troupe of actors in costumes ... in ..." and "une troupe de comédiens déguisés dans ... ...", comparing the Forced, Concat, and SBA alignments]

With this framework, we can:

  • perform the alignment process and phrase table construction on a per-need basis
  • work at the level of tera-scale translation using huge quantities of unaligned parallel corpora
  • perform domain adaptation by careful example selection

20 / 26

slide-59
SLIDE 59

Thank you !

slide-60
SLIDE 60


Sub-sentential alignment : details

1. Association score w(s,t) between source word s and target word t:

   w(s,t) = p(s|t) × p(t|s)

2. Segmentation criterion:

   cut(X,Y) = cut(X̄,Ȳ) = W(X,Ȳ) + W(X̄,Y)

   where:

   • (X,Y) ∈ {A, Ā} × {B, B̄}
   • W(X,Y) = Σ_{s∈X, t∈Y} w(s,t)

   [Table: splitting the sentence pair at (s_x, t_y) gives blocks A = s_1…s_{x−1}, Ā = s_x…s_I on the source side and B = t_1…t_{y−1}, B̄ = t_y…t_J on the target side, with the four block masses W(A,B), W(A,B̄), W(Ā,B), W(Ā,B̄)]

   • To avoid unbalanced segmentations, we use a normalized variant instead:

   Ncut(X,Y) = cut(X,Y) / (cut(X,Y) + 2×W(X,Y)) + cut(X̄,Ȳ) / (cut(X̄,Ȳ) + 2×W(X̄,Ȳ))

22 / 26
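The Ncut computation above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the association scores in `w` are invented toy values, and `W`/`ncut` are hypothetical helper names.

```python
from itertools import product

# Toy association scores w(s, t) = p(s|t) * p(t|s) -- invented values
# for illustration only, not estimated from a real corpus.
w = {
    ("a", "une"): 0.9,
    ("troupe", "troupe"): 0.8,
    ("a", "troupe"): 0.05,
    ("troupe", "une"): 0.05,
}

def W(X, Y):
    """Total association mass between source segment X and target segment Y."""
    return sum(w.get((s, t), 0.0) for s, t in product(X, Y))

def ncut(A, A_bar, B, B_bar):
    """Normalized cut of the sentence pair into blocks (A, B) and (A_bar, B_bar)."""
    cut = W(A, B_bar) + W(A_bar, B)          # mass crossing the segmentation
    d1 = cut + 2 * W(A, B)
    d2 = cut + 2 * W(A_bar, B_bar)
    return (cut / d1 if d1 else 0.0) + (cut / d2 if d2 else 0.0)

# Splitting "a troupe" / "une troupe" between the two words keeps the
# strongly associated pairs together, so the normalized cut stays small.
print(round(ncut(["a"], ["troupe"], ["une"], ["troupe"]), 3))  # prints 0.111
```

The normalization penalizes splits that isolate a block with little internal mass, which is exactly the unbalanced-segmentation problem mentioned on the slide.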

slide-61
SLIDE 61


Sub-sentential alignment

1. A greedy strategy finds the best segmentation point and its direction (direct or swap).

2. The recursive procedure ends when the source or the target segment contains only one word.

procedure align(S, T):
    if length(S) = 1 or length(T) = 1:
        link each word of S to each word of T
        stop procedure
    minNcut = 2
    (X, Y) = (S, T)
    for each (i, j) ∈ {2…I} × {2…J}:
        if Ncut(A, B) < minNcut:
            minNcut = Ncut(A, B)
            (X, Y) = (A, B)
        if Ncut(A, B̄) < minNcut:
            minNcut = Ncut(A, B̄)
            (X, Y) = (A, B̄)
    align(X, Y)
    align(X̄, Ȳ)

23 / 26
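The recursive procedure above can be rendered as runnable Python. Again a sketch rather than the authors' code: the scores are toy values, the splits are 0-based (A = S[:i], B = T[:j]), and the helper names are hypothetical.

```python
from itertools import product

# Toy association scores w(s, t) -- invented for illustration only.
w = {
    ("a", "une"): 0.9, ("troupe", "troupe"): 0.8,
    ("a", "troupe"): 0.05, ("troupe", "une"): 0.05,
}

def W(X, Y):
    """Total association mass between segments X and Y."""
    return sum(w.get((s, t), 0.0) for s, t in product(X, Y))

def ncut(A, A_bar, B, B_bar):
    """Normalized cut for pairing (A, B) and (A_bar, B_bar)."""
    cut = W(A, B_bar) + W(A_bar, B)
    d1, d2 = cut + 2 * W(A, B), cut + 2 * W(A_bar, B_bar)
    return (cut / d1 if d1 else 0.0) + (cut / d2 if d2 else 0.0)

def align(S, T, links):
    """Recursive bisection: try every split point in both directions."""
    if len(S) == 1 or len(T) == 1:
        links.extend((s, t) for s in S for t in T)
        return
    min_ncut, best = 2.0, ((S[:1], T[:1]), (S[1:], T[1:]))
    for i in range(1, len(S)):       # i = 2..I in the slide's 1-based notation
        for j in range(1, len(T)):
            A, A_bar, B, B_bar = S[:i], S[i:], T[:j], T[j:]
            direct = ncut(A, A_bar, B, B_bar)   # keep (A, B) together
            if direct < min_ncut:
                min_ncut, best = direct, ((A, B), (A_bar, B_bar))
            swap = ncut(A, A_bar, B_bar, B)     # keep (A, B_bar) together
            if swap < min_ncut:
                min_ncut, best = swap, ((A, B_bar), (A_bar, B))
    (X, Y), (X_bar, Y_bar) = best
    align(X, Y, links)
    align(X_bar, Y_bar, links)

links = []
align(["a", "troupe"], ["une", "troupe"], links)
print(links)  # [('a', 'une'), ('troupe', 'troupe')]
```

With the toy scores, the direct split between the two words wins, so each word is linked to its strong translation.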
slide-64
SLIDE 64


All results

English→French (1 reference)

| Alignment | Corpus   | BLEU  | oracle-BLEU | TER   | # entries |
|-----------|----------|-------|-------------|-------|-----------|
| giza++    | BTEC     | 45.68 | 76.26       | 37.03 | 360K      |
| giza++    | HIT      | 39.65 | 68.20       | 44.50 | 1,217K    |
| giza++    | BTEC+HIT | 47.97 | 83.62       | 35.45 | 1,546K    |
| sba       | BTEC     | 47.81 | 77.78       | 36.60 | 315K      |
| sba       | HIT      | 39.70 | 68.45       | 43.56 | 921K      |
| sba       | BTEC+HIT | 47.55 | 84.40       | 37.22 | 1,241K    |

French→English (7 references)

| Alignment | Corpus   | BLEU  | oracle-BLEU | TER   | # entries |
|-----------|----------|-------|-------------|-------|-----------|
| giza++    | BTEC     | 59.50 | 77.23       | 24.59 | 360K      |
| giza++    | HIT      | 45.52 | 68.58       | 33.99 | 1,224K    |
| giza++    | BTEC+HIT | 63.69 | 84.00       | 21.95 | 1,551K    |
| sba       | BTEC     | 59.92 | 77.50       | 24.22 | 315K      |
| sba       | HIT      | 45.34 | 69.59       | 33.79 | 937K      |
| sba       | BTEC+HIT | 64.44 | 83.57       | 22.31 | 1,241K    |

Chinese→English (7 references)

| Alignment | BLEU  | oracle-BLEU | TER   | # entries |
|-----------|-------|-------------|-------|-----------|
| giza++    | 27.88 | 51.69       | 50.76 | 1,139K    |
| sba       | 27.85 | 53.05       | 50.93 | 655K      |

24 / 26
slide-65
SLIDE 65


Incremental alignment task

Phrase tables: main trained on 62K HIT, supplementary on 3.3K supp; 1g–4g are n-gram precisions.

| Main   | Supp.    | # words | # entries | BLEU  | 1g   | 2g   | 3g   | 4g   | TER   |
|--------|----------|---------|-----------|-------|------|------|------|------|-------|
| giza++ | none     | –       | –         | 45.52 | 76.5 | 52.2 | 37.8 | 27.1 | 33.99 |
| giza++ | forced   | 59      | 1,993     | 47.94 | 76.8 | 55.4 | 41.0 | 29.2 | 34.62 |
| giza++ | concat   | 60      | 1,190     | 48.69 | 78.4 | 56.1 | 41.4 | 29.8 | 33.09 |
| giza++ | sba      | 64      | 681       | 49.83 | 80.9 | 57.3 | 42.0 | 30.5 | 30.61 |
| giza++ | concat++ | 62      | 1,218     | 50.23 | 81.5 | 57.8 | 42.6 | 31.1 | 29.81 |
| sba    | none     | –       | –         | 45.34 | 77.0 | 52.1 | 37.4 | 26.9 | 33.79 |
| sba    | sba      | 64      | 681       | 50.45 | 81.8 | 58.3 | 42.5 | 30.9 | 29.94 |

none: baseline system; forced: forced alignment, trained on HIT; concat: giza++ alignment learnt on the concatenation of HIT and supp; sba: our sampling-based alignment method; concat++: giza++ alignment learnt on the WMT corpus

25 / 26


slide-71
SLIDE 71


Hypothesis: SBA performs better on rare words.

[Figure: word-alignment matrices between "a troupe of actors in costumes ... in ..." and "une troupe de comédiens déguisés dans ... ...", comparing the Forced, Concat, and SBA alignments]

26 / 26