Improving Bilingual Sub-sentential Alignment by Sampling-based Transpotting
Li Gong, Aurélien Max, François Yvon
LIMSI-CNRS & Université Paris-Sud Orsay, France
Method Experimental Results Conclusion and future work
Context
[Figure: word-alignment matrix between the English "a troupe ... actors in costumes ... in ..." and the French "... une troupe de comédiens déguisés dans ..."]
1 Method
2 Experimental Results
3 Conclusion and future work
1 Given a source-target sentence pair, extract an association
2 Draw a random sub-corpus from the parallel corpus and
3 Increment the count for each contiguous phrase pair
4 Repeat steps 2 to 3 N times, so as to obtain an association table
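The sampling loop in steps 2 to 4 can be sketched in Python. The surviving text does not say what is computed on each sub-corpus, so this sketch makes a loud assumption: words of the input pair that occur in exactly the same sentences of the sub-corpus are grouped together, and a group is counted when its source words and its target words are each contiguous in the input pair. The names `sample_associations`, `is_contiguous`, `n_samples` and `sample_size` are hypothetical, and the toy corpus is invented for illustration.

```python
import random
from collections import Counter

def is_contiguous(positions):
    """True if the (sorted) positions form one unbroken, non-empty run."""
    return bool(positions) and positions[-1] - positions[0] + 1 == len(positions)

def sample_associations(src, tgt, corpus, n_samples=100, sample_size=2, seed=0):
    """Count contiguous phrase pairs of (src, tgt) over random sub-corpora.

    ASSUMPTION: grouping words by identical occurrence profile stands in
    for the part of step 2 that is cut off in the slides.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):                                    # step 4
        sub = rng.sample(corpus, min(sample_size, len(corpus)))   # step 2
        sub.append((src, tgt))  # the input pair is always part of the sample
        groups = {}
        for side, sentence in ((0, src), (1, tgt)):
            for pos, word in enumerate(sentence):
                # Profile: which sub-corpus sentences contain this word.
                profile = frozenset(
                    k for k, pair in enumerate(sub) if word in pair[side]
                )
                groups.setdefault(profile, ([], []))[side].append(pos)
        for src_pos, tgt_pos in groups.values():                  # step 3
            if is_contiguous(src_pos) and is_contiguous(tgt_pos):
                key = (tuple(src[i] for i in src_pos),
                       tuple(tgt[j] for j in tgt_pos))
                counts[key] += 1
    return counts

# Toy parallel corpus, invented for illustration only.
corpus = [
    (["the", "cat"], ["le", "chat"]),
    (["the", "dog"], ["le", "chien"]),
]
table = sample_associations(["the", "cat"], ["le", "chat"], corpus, n_samples=10)
for (s_phrase, t_phrase), count in table.items():
    print(" ".join(s_phrase), "|||", " ".join(t_phrase), "|||", count)
```

Fixing the seed keeps the counts reproducible. The resulting table of counts is what a later step would turn into association scores.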
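1 Association score w(s,t) between source and target words :
2 Segmentation criterion :
   W(X,Y) = \sum_{s \in X, t \in Y} w(s,t)
   Block decomposition of the association matrix for source words s1 ... sI and target words t1 ... tJ:
                      B = t1 ... ty−1    B̄ = ty ... tJ
   A = s1 ... sx−1    W(A,B)             W(A,B̄)
   Ā = sx ... sI      W(Ā,B)             W(Ā,B̄)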
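As a small worked example of the block weights, the sketch below computes W(X,Y) and a normalized-cut score for one split point. The Ncut formula itself is not legible in the surviving text. The code assumes the usual bipartite normalized-cut form, cut(A,B) = W(A,B̄) + W(Ā,B) and Ncut(A,B) = cut/(cut + 2W(A,B)) + cut/(cut + 2W(Ā,B̄)), which ranges from 0 to 2. The function names, the 0-based split indices and the toy matrix are hypothetical.

```python
def block_sum(w, rows, cols):
    """W(X, Y): sum of w[s][t] over source indices in rows, target indices in cols."""
    return sum(w[s][t] for s in rows for t in cols)

def ncut(w, x, y):
    """Normalized cut for the split before source index x and target index y.

    Indices are 0-based: A = rows [0, x), A-bar = rows [x, I),
    B = cols [0, y), B-bar = cols [y, J).
    ASSUMPTION: standard normalized-cut formula, not taken from the slides.
    """
    n_src, n_tgt = len(w), len(w[0])
    A, A_bar = range(x), range(x, n_src)
    B, B_bar = range(y), range(y, n_tgt)
    cut = block_sum(w, A, B_bar) + block_sum(w, A_bar, B)
    score = 0.0
    for inside in (block_sum(w, A, B), block_sum(w, A_bar, B_bar)):
        total = cut + 2 * inside
        # A block with no association mass at all gets the worst value, 1.0
        # (a choice made for this sketch).
        score += cut / total if total > 0 else 1.0
    return score

# Toy 2x2 association matrix, invented for illustration.
w = [[0.9, 0.1],
     [0.2, 0.8]]
score = ncut(w, 1, 1)
print(round(score, 4))
```

A low score means most of the association mass lies inside the two diagonal blocks, so the split separates the sentence pair cleanly.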
[Figure: experimental pipeline with components: Parallel Corpus, giza++, sba, giza++ alignment, sba alignment, Moses (one system per alignment), MERT, Dev/Test]
English→French (1 reference)
        BTEC (in-domain)                       HIT (out-of-domain)
        BLEU   (unlabeled)  TER    # entries   BLEU   (unlabeled)  TER    # entries
giza++  45.68  76.26        37.03  360K        39.65  68.20        44.50  1,217K
sba     47.81  77.78        36.60  315K        39.70  68.45        43.56  921K

French→English (7 references)
        BTEC (in-domain)                       HIT (out-of-domain)
        BLEU   (unlabeled)  TER    # entries   BLEU   (unlabeled)  TER    # entries
giza++  59.50  77.23        24.59  360K        45.52  68.58        33.99  1,224K
sba     59.92  77.50        24.22  315K        45.34  69.59        33.79  937K

Chinese→English (7 references)
        HIT (out-of-domain)
        BLEU   (unlabeled)  TER    # entries
giza++  27.88  51.69        50.76  1,139K
sba     27.85  53.05        50.93  655K
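The "# entries" columns can be compared with a short calculation. The Python sketch below computes how much smaller the sba table is than the giza++ table in each setting. It assumes that "K" means thousands and that the three tables are, in order, English→French, French→English and Chinese→English, as in the full results table of the deck. The dictionary keys and the name `reduction_percent` are hypothetical.

```python
# (giza++ entries, sba entries), copied from the "# entries" columns.
entries = {
    "en-fr BTEC": (360_000, 315_000),
    "en-fr HIT": (1_217_000, 921_000),
    "fr-en BTEC": (360_000, 315_000),
    "fr-en HIT": (1_224_000, 937_000),
    "zh-en HIT": (1_139_000, 655_000),
}

def reduction_percent(giza, sba):
    """Relative size reduction of the sba table with respect to giza++, in percent."""
    return 100.0 * (giza - sba) / giza

reductions = {
    name: round(reduction_percent(giza, sba), 1)
    for name, (giza, sba) in entries.items()
}
for name, value in reductions.items():
    print(f"{name}: {value}% fewer entries with sba")
```

In every setting listed, the sba table has fewer entries than the giza++ table, while the BLEU scores in the same rows stay within about two points of each other.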
[Figure: data-selection pipeline with components: WMT Big Parallel Corpus, Dev/Test, Data selection, Selected Sentence Pairs, Alignment, Supp PT, Multi-PT Moses, Baseline System + giza++/sba supp]
[Figure: three alignment matrices for the same sentence pair (English "a troupe ... actors in costumes ... in ...", French "... une troupe de comédiens déguisés dans ..."), labeled Forced, Concat and SBA]
1 The greedy strategy is used to
2 The recursive procedure ends
procedure align(S,T) :
    if length(S) = 1 or length(T) = 1 :
        link each word of S to each word of T
        stop procedure
    minNcut = 2
    (X,Y) = (S,T)
    for each (i, j) ∈ {2...I}×{2...J} :
        if Ncut(A,B) < minNcut :
            minNcut = Ncut(A,B)
            (X,Y) = (A,B)
        if Ncut(A,B̄) < minNcut :
            minNcut = Ncut(A,B̄)
            (X,Y) = (A,B̄)
    align(X,Y)
    align(X̄,Ȳ)
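The recursive procedure can be turned into a runnable Python sketch. Two things here are assumptions rather than content from the slides. First, the Ncut score uses the usual normalized-cut form, cut/(cut + 2·inside) summed over both parts. Second, a fallback links every remaining word pair when no split scores below 2, since the pseudocode as written would otherwise recurse on (S,T) forever. The names `block_sum`, `ncut`, `align`, `links` and the toy matrix are hypothetical.

```python
def block_sum(w, rows, cols):
    """Sum of association scores w[s][t] over the given index lists."""
    return sum(w[s][t] for s in rows for t in cols)

def ncut(w, X, Y, X_bar, Y_bar):
    """Normalized cut separating block (X, Y) from block (X_bar, Y_bar).

    ASSUMPTION: standard normalized-cut formula, not taken from the slides.
    """
    cut = block_sum(w, X, Y_bar) + block_sum(w, X_bar, Y)
    score = 0.0
    for inside in (block_sum(w, X, Y), block_sum(w, X_bar, Y_bar)):
        total = cut + 2 * inside
        score += cut / total if total > 0 else 1.0
    return score

def align(w, S, T, links):
    """Recursively split source indices S and target indices T, adding to links."""
    if len(S) == 1 or len(T) == 1:
        links.update((s, t) for s in S for t in T)
        return
    best, choice = 2.0, None
    for i in range(1, len(S)):
        for j in range(1, len(T)):
            A, A_bar = S[:i], S[i:]
            B, B_bar = T[:j], T[j:]
            # Try the straight split (A,B) and the inverted split (A,B-bar).
            for X, Y, X_bar, Y_bar in ((A, B, A_bar, B_bar), (A, B_bar, A_bar, B)):
                score = ncut(w, X, Y, X_bar, Y_bar)
                if score < best:
                    best, choice = score, (X, Y, X_bar, Y_bar)
    if choice is None:
        # Fallback added for this sketch: no useful split, link everything.
        links.update((s, t) for s in S for t in T)
        return
    X, Y, X_bar, Y_bar = choice
    align(w, X, Y, links)
    align(w, X_bar, Y_bar, links)

# Toy association matrix: source word 0 goes with target 0, 1 with 2, 2 with 1.
w = [[0.9, 0.0, 0.1],
     [0.0, 0.1, 0.8],
     [0.1, 0.9, 0.0]]
links = set()
align(w, [0, 1, 2], [0, 1, 2], links)
print(sorted(links))  # [(0, 0), (1, 2), (2, 1)]
```

On this toy matrix the first split is straight and isolates the pair (0, 0). The second split is inverted, which is how the procedure handles the swapped word order.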
        BTEC                                   HIT                                    BTEC+HIT
        BLEU   (unlabeled)  TER    # entries   BLEU   (unlabeled)  TER    # entries   BLEU   (unlabeled)  TER    # entries
English→French (1 reference)
giza++  45.68  76.26        37.03  360K        39.65  68.20        44.50  1,217K      47.97  83.62        35.45  1,546K
sba     47.81  77.78        36.60  315K        39.70  68.45        43.56  921K        47.55  84.40        37.22  1,241K
French→English (7 references)
giza++  59.50  77.23        24.59  360K        45.52  68.58        33.99  1,224K      63.69  84.00        21.95  1,551K
sba     59.92  77.50        24.22  315K        45.34  69.59        33.79  937K        64.44  83.57        22.31  1,241K
Chinese→English (7 references)
giza++  -      -            -      -           27.88  51.69        50.76  1,139K      -      -            -      -
sba     -      -            -      -           27.85  53.05        50.93  655K        -      -            -      -