11,001 NEW FEATURES FOR STATISTICAL MACHINE TRANSLATION
David Chiang Kevin Knight Wei Wang
MOTIVATION
Maria no dió una bofetada a la bruja verde
[Figure: English syntax tree for the translation "Maria did not slap the green witch": S → NP(NNP Maria) VP, with VP → VBD(did) RB(not) VP and VP → VB(slap) NP(DT the, JJ green, NN witch). Later builds of the slide replace VB "slap" with VBD "slapped", giving "Maria did not slapped the green witch".]
System   Training   Features   BLEU
Hiero    MERT             11   36.1
Hiero    MIRA         10,990   37.6
Syntax   MERT             25   39.5
Syntax   MIRA            283   40.6
GALE 2008 Chinese-English data
Features Experiments
[Figure: candidate translations plotted by BLEU against model score; successive builds mark the margin and the loss used in MIRA training.]
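The margin and loss on this slide correspond to a large-margin update. As a minimal sketch, assuming sparse features stored as Python dicts and a single (better, worse) pair of candidate translations, a one-constraint MIRA (passive-aggressive) step asks the model score of the better candidate to exceed that of the worse one by at least their loss, and makes the smallest weight change, capped by a step size `C`, that achieves it. The names `mira_update`, `dot`, and `C`, and the restriction to one constraint, are illustrative assumptions; the systems in this talk may optimize over several candidates at once.

```python
from typing import Dict

Features = Dict[str, float]  # sparse feature vector: name -> value


def dot(w: Features, h: Features) -> float:
    """Model score of feature vector h under weights w."""
    return sum(w.get(name, 0.0) * value for name, value in h.items())


def mira_update(w: Features, h_better: Features, h_worse: Features,
                loss: float, C: float = 0.01) -> Features:
    """One single-constraint MIRA step (hypothetical helper).

    Constraint: score(better) - score(worse) >= loss.
    """
    # delta = h(better) - h(worse)
    delta = dict(h_better)
    for name, value in h_worse.items():
        delta[name] = delta.get(name, 0.0) - value

    margin = dot(w, delta)          # current margin between the two candidates
    violation = loss - margin       # how far the constraint is from being satisfied
    norm_sq = sum(v * v for v in delta.values())
    if violation <= 0.0 or norm_sq == 0.0:
        return dict(w)              # constraint already satisfied: no change

    alpha = min(C, violation / norm_sq)  # smallest step that satisfies it, capped at C
    new_w = dict(w)
    for name, value in delta.items():
        new_w[name] = new_w.get(name, 0.0) + alpha * value
    return new_w


w = mira_update({}, {"a": 1.0}, {"b": 1.0}, loss=0.5, C=1.0)
print(w)  # {'a': 0.25, 'b': -0.25}
```

With `C` large enough, the step lands exactly on the constraint: after the update above, the margin between the two candidates equals the loss (0.5). A small `C` keeps each update conservative when a single example is noisy.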
[Figure: parallel training timeline in which processors decode sentences, weight updates are applied, and the updated weights are broadcast to the processors. Hiero: n = 20; Syntax: n = 73.]
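The decode/update/broadcast schedule can be approximated by a simplified synchronous sketch: a pool of workers decodes a batch of sentences against one shared copy of the weights, a single thread applies all resulting updates, and the new weights are handed to every worker for the next batch. The function `train_parallel`, the batch-at-a-time synchronization, and the `decode`/`update` callables (with toy stand-ins) are assumptions for illustration; the actual training setup may overlap decoding and updating differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# What a worker returns after decoding one sentence: (h_better, h_worse, loss)
Example = Tuple[Features, Features, float]


def train_parallel(sentences: List[str],
                   decode: Callable[[str, Features], Example],
                   update: Callable[[Features, Features, Features, float], Features],
                   n_workers: int = 4,
                   epochs: int = 1) -> Features:
    weights: Features = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(epochs):
            for start in range(0, len(sentences), n_workers):
                batch = sentences[start:start + n_workers]
                # "Broadcast": every worker in this batch decodes with the same weights.
                snapshot = dict(weights)
                results = list(pool.map(lambda s: decode(s, snapshot), batch))
                # A single thread applies the updates, in sentence order.
                for h_better, h_worse, loss in results:
                    weights = update(weights, h_better, h_worse, loss)
    return weights


def toy_decode(sentence: str, weights: Features) -> Example:
    # Stand-in for a real decoder: always prefers feature "good" over "bad".
    return {"good": 1.0}, {"bad": 1.0}, 1.0


def toy_update(w: Features, h_better: Features, h_worse: Features,
               loss: float) -> Features:
    # Stand-in for a real update rule (e.g. a MIRA step): add h_better - h_worse.
    new_w = dict(w)
    for name, value in h_better.items():
        new_w[name] = new_w.get(name, 0.0) + value
    for name, value in h_worse.items():
        new_w[name] = new_w.get(name, 0.0) - value
    return new_w


print(train_parallel(["s1", "s2", "s3", "s4", "s5"], toy_decode, toy_update, n_workers=2))
# {'good': 5.0, 'bad': -5.0}
```

Threads stand in for the separate processors in the figure; in CPython they would not speed up a pure-Python decoder, so a real setup would use processes or separate machines.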
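[Figure: a rule that translates 晚上 NP1 左右 (roughly "evening NP1 around") into an English PP built from IN "from", IN "around", NP1, and "p.m."; the rule's training count is 1, so it fires the feature count=1.]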
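A count feature of this kind needs only the number of times a rule was seen in training. A minimal sketch, assuming the rule's training count is known and that separate features fire only for a few small counts; the helper name `discount_features` and the cutoff `max_count=3` are hypothetical choices, not taken from the slides.

```python
from typing import Dict


def discount_features(rule_count: int, max_count: int = 3) -> Dict[str, float]:
    """Fire count=c for a rule seen c times in training, for small c only.

    max_count is a hypothetical cutoff; rules seen more often fire nothing.
    """
    if 1 <= rule_count <= max_count:
        return {f"count={rule_count}": 1.0}
    return {}


print(discount_features(1))    # {'count=1': 1.0}
print(discount_features(250))  # {}
```

Because each count gets its own feature, training can learn a separate weight for rules extracted only once, twice, and so on.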
[Figure: two derivations of "UN inspectors (were) expelled by NK": one uses a rule VP → VBD(were) VP that inserts "were", firing the feature insert-were; the other has no "were".]
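An insertion feature can be read off a rule's English side: any English word with no aligned source word is an insertion. A minimal sketch, assuming each rule comes with the set of English positions that are aligned to Chinese words, and restricting features to a hypothetical whitelist of common inserted words (`COMMON_INSERTIONS` and `insertion_features` are illustrative names):

```python
from typing import Dict, List, Set

# Hypothetical whitelist: only frequent inserted words get their own feature.
COMMON_INSERTIONS = {"the", "is", "are", "was", "were", "of"}


def insertion_features(target_words: List[str], aligned: Set[int],
                       vocab: Set[str] = COMMON_INSERTIONS) -> Dict[str, float]:
    """Fire insert-w for each unaligned English word w that is in vocab."""
    feats: Dict[str, float] = {}
    for i, word in enumerate(target_words):
        if i not in aligned and word in vocab:
            name = f"insert-{word}"
            feats[name] = feats.get(name, 0.0) + 1.0
    return feats


# "were" (position 0) has no Chinese counterpart; "expelled" (position 1) does.
print(insertion_features(["were", "expelled"], aligned={1}))  # {'insert-were': 1.0}
```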
[Figure: English output trees whose fragments include "edo", "thinking", "the best-selling book", a PP formed from VBN "published" and NP "his autobiography", and "art for the generation in mind"; two rewrites are marked as firing the feature bad-rewrite.]
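A bad-rewrite feature can be computed by walking the English tree a derivation builds and checking each node's single-level rewrite (parent label plus child labels) against a list of suspicious rewrites. The slide only shows the feature name bad-rewrite; the list `BAD_REWRITES`, the nested-tuple tree encoding, and the helper name below are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is a word (str) or a tuple (label, child1, child2, ...).
Tree = Union[str, tuple]

# Hypothetical list of suspicious single-level rewrites: (parent, child labels).
BAD_REWRITES: Set[Tuple[str, Tuple[str, ...]]] = {("PP", ("VBN", "NP"))}


def bad_rewrite_features(tree: Tree) -> Dict[str, float]:
    count = 0

    def visit(t: Tree) -> None:
        nonlocal count
        if isinstance(t, str):
            return  # leaf word
        label, *children = t
        child_labels = tuple(c[0] for c in children if not isinstance(c, str))
        if (label, child_labels) in BAD_REWRITES:
            count += 1
        for c in children:
            visit(c)

    visit(tree)
    return {"bad-rewrite": float(count)} if count else {}


pp = ("PP", ("VBN", "published"), ("NP", ("PRP$", "his"), ("NN", "autobiography")))
print(bad_rewrite_features(pp))  # {'bad-rewrite': 1.0}
```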
[Figure: two output trees for "Yoon said": in one, VP → VBD(said) "," introduces a comma node, firing the feature node=,; in the other, VP → VBD(said) S.]
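A node-count feature counts the labeled nodes in the English tree fragments the decoder builds, so that training can adjust how often each category, such as ",", appears in the output. A minimal sketch, assuming trees encoded as nested tuples (a hypothetical representation) and a hypothetical helper `node_count_features`:

```python
from collections import Counter
from typing import Dict, Union

Tree = Union[str, tuple]  # a word, or (label, child1, child2, ...)


def node_count_features(tree: Tree) -> Dict[str, float]:
    """Fire node=X once per node labeled X in an English tree fragment."""
    counts: Counter = Counter()

    def visit(t: Tree) -> None:
        if isinstance(t, str):
            return  # words are leaves, not labeled nodes
        label, *children = t
        counts[f"node={label}"] += 1
        for c in children:
            visit(c)

    visit(tree)
    return {name: float(n) for name, n in counts.items()}


vp = ("VP", ("VBD", "said"), (",", ","))
print(node_count_features(vp))  # {'node=VP': 1.0, 'node=VBD': 1.0, 'node=,': 1.0}
```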
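[Figure: two derivations for 第一个 站 出来 (gloss: "first come out"; intended sense "the first to stand up"). One builds the PP "from the first leg" using a rule rooted at IN, firing root=IN; the other builds "the first to stand" using a rule rooted at VP, firing root=VP.]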
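The root=X features depend only on the label at the top of each rule's English side, so they can be computed rule by rule without extra decoder state. A minimal sketch, assuming each rule's English side is a nested tuple with variables as plain strings such as "x0" (a hypothetical encoding; `rule_root_features` is an illustrative name):

```python
from collections import Counter
from typing import Dict, List, Union

Tree = Union[str, tuple]  # a word or variable, or (label, child1, child2, ...)


def rule_root_features(rules: List[Tree]) -> Dict[str, float]:
    """Fire root=X once for every rule in a derivation whose English side is rooted at X."""
    counts = Counter(f"root={rule[0]}" for rule in rules if not isinstance(rule, str))
    return {name: float(n) for name, n in counts.items()}


# English sides of two rules that meet at an IN node (hypothetical derivation).
derivation = [("PP", ("IN", "x0"), ("NP", "x1")), ("IN", "from")]
print(rule_root_features(derivation))  # {'root=PP': 1.0, 'root=IN': 1.0}
```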
Marton & Resnik 2008; Chiang et al 2008
这 是 一个 值得 关注 和 研究 的 新 动向 .
Gloss: this is a merit attention and study new trend
[Figure: a rule yielding "new trends in the study" crosses the VP constituent 值得 关注 和 研究 and fires cross-VP; a rule yielding "meriting attention and study" matches that VP and fires match-VP.]
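The match/cross distinction reduces to comparing a rule's source span with the spans of source-side parse constituents. A minimal sketch, assuming half-open word spans [start, end) and constituents given as (label, start, end) triples; the helper `constraint_features` is hypothetical, and the VP span in the usage example is an assumed parse, not taken from a real parser.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]              # half-open [start, end) over source words
Constituent = Tuple[str, int, int]  # (label, start, end)


def constraint_features(span: Span, constituents: List[Constituent]) -> Dict[str, float]:
    """match-X if the span equals a constituent X; cross-X if it straddles X's boundary."""
    i, j = span
    feats: Dict[str, float] = {}
    for label, a, b in constituents:
        if (i, j) == (a, b):
            name = f"match-{label}"
        elif i < a < j < b or a < i < b < j:
            name = f"cross-{label}"  # overlaps X without nesting inside or around it
        else:
            continue                 # disjoint or properly nested: no feature
        feats[name] = feats.get(name, 0.0) + 1.0
    return feats


words = "这 是 一个 值得 关注 和 研究 的 新 动向 .".split()
vp = ("VP", 3, 7)  # assumed VP over 值得 关注 和 研究
print(" ".join(words[3:7]))                 # 值得 关注 和 研究
print(constraint_features((3, 7), [vp]))   # {'match-VP': 1.0}
print(constraint_features((5, 10), [vp]))  # {'cross-VP': 1.0}
```

Spans nested inside a constituent (or containing it) fire nothing, so the features only reward or penalize rules whose source side lines up with, or cuts across, a syntactic boundary.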
Chiang et al 2008
挪威 恢复 在 斯里兰卡 的 和平 斡旋
Gloss: Norway restore in Sri Lanka peace mediation
Translations: "to restore peace in Sri Lanka , the Norwegian mediation" and "Norway restoring peace mediation in Sri Lanka"
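The slide does not spell out how the distortion features are defined. As a hedged illustration of the general idea only, the sketch below fires a hypothetical feature for each nonterminal of a hierarchical rule, conjoining whether the rule reorders its nonterminals with a bucketed length of the source span that nonterminal covers. The bucket boundaries, feature names, and helpers `length_bucket` and `distortion_features` are assumptions, not the definition from Chiang et al 2008.

```python
from typing import Dict, List, Tuple


def length_bucket(n: int) -> str:
    # Hypothetical buckets for source span lengths.
    if n <= 5:
        return str(n)
    return "6-10" if n <= 10 else "11+"


def distortion_features(nt_spans: List[Tuple[int, int]],
                        target_order: List[int]) -> Dict[str, float]:
    """nt_spans: source spans [start, end) of the rule's nonterminals, in source order.
    target_order: source-order indices of the nonterminals as they appear on the target side."""
    reordered = target_order != sorted(target_order)
    feats: Dict[str, float] = {}
    for start, end in nt_spans:
        name = f"reorder={'yes' if reordered else 'no'} & len={length_bucket(end - start)}"
        feats[name] = feats.get(name, 0.0) + 1.0
    return feats


# A hypothetical rule that swaps its two nonterminals, X1 over 3 words and X2 over 2 words.
print(distortion_features([(0, 3), (4, 6)], [1, 0]))
# {'reorder=yes & len=3': 1.0, 'reorder=yes & len=2': 1.0}
```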
Word context features (cf. work on WSD in MT: Chan et al. 2007, Carpuat & Wu 2007)
Source side: just the 100 most frequent types
Features conjoin (fi, fi-1, e) and (fi, fi+1, e): a source word fi, its left or right neighbor, and its English translation e
他 说 , 由于 没有 配音 , 他 不得不
Gloss: he said because no voice he had to
With fi=, & fi-1=说 & e=, : "since there is no voice , he said , he had to"
With fi=, & fi-1=说 & e=that : "he said that because of the lack of voice , he had to"
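The feature templates above can be instantiated directly from a source sentence. A minimal sketch, assuming the word alignment inside a rule tells us which English word e the source word fi is translated as; the helpers `frequent_types` and `word_context_features` and the sentence-boundary symbols `<s>`/`</s>` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def frequent_types(corpus: Iterable[List[str]], k: int = 100) -> Set[str]:
    """The k most frequent source word types (the slide uses k = 100)."""
    counts = Counter(word for sentence in corpus for word in sentence)
    return {word for word, _ in counts.most_common(k)}


def word_context_features(source: List[str], i: int, e: str,
                          frequent: Set[str]) -> Dict[str, float]:
    """Features conjoining source word fi, one neighbor, and its translation e."""
    f = source[i]
    if f not in frequent:
        return {}
    prev = source[i - 1] if i > 0 else "<s>"                # hypothetical boundary symbol
    nxt = source[i + 1] if i + 1 < len(source) else "</s>"  # hypothetical boundary symbol
    return {
        f"fi={f} & fi-1={prev} & e={e}": 1.0,
        f"fi={f} & fi+1={nxt} & e={e}": 1.0,
    }


src = "他 说 , 由于 没有 配音 , 他 不得不".split()
freq = frequent_types([src])
print(word_context_features(src, 2, "that", freq))
# {'fi=, & fi-1=说 & e=that': 1.0, 'fi=, & fi+1=由于 & e=that': 1.0}
```

Restricting fi to frequent types keeps the number of distinct features manageable while still covering words like "," whose translation depends heavily on context.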
                 Hiero   Syntax
Parallel data     260M      65M
Language model      2G       1G
MERT/MIRA          58k      58k
Test               57k      57k
GALE 2008 Chinese-English data
Training   Features                            #   BLEU
MERT       baseline                           11   36.1
MIRA       +source-side syntax +distortion    56   36.9
           +discount                          61   37.3
           +word context                  10,990   37.6
Hiero, Chinese-English
Training   Features                                #   BLEU
MERT       baseline                               25   39.5
MIRA       baseline                               25   39.8
           +rule overlap                         132   39.9
           +node count                           136   40.0
           +discount +bad rewrite +insertion     283   40.6
Syntax, Chinese-English