MaxForce: Max-Violation Perceptron and Forced Decoding for Scalable MT Training
Heng Yu (Chinese Acad. of Sciences)
Liang Huang (CUNY)
Haitao Mi (IBM T. J. Watson)
Kai Zhao (CUNY)
dev set (~1k sents); test set (~1k sents)
MERT (Och ’02): dense features
MIRA (Watanabe+ ’07) (Chiang+ ’08-’12); PRO (Hopkins+May ’11); Regression (Bazrafshan+ ’12): pseudo sparse features
HOLS (Flanigan+ ’13) (sparse features as
(Liang et al 2006)
x: 那 人 咬 了 狗 (gloss: that / man / bite / [perfective] / dog)
y: the man bit the dog
same x, wrong y: the dog bit the man
Source: 布什 与 沙龙 举行 了 会谈 (Bushi yu Shalong juxing le huitan; "Bush held talks with Sharon")
Candidate phrase translations: Bush, Bushi; with; Sharon; with Sharon; held; held talks; meetings
Each decoder state keeps a coverage vector over the six source words (_ _ _ _ _ _ = nothing translated yet).
Forced decoding
Source: Bushi yu Shalong juxing le huitan; reference: Bush held talks with Sharon
[Diagram: decoder states, each with a coverage vector and a partial output: "Bush", "... talks", "... talk", "... meeting"]
Forced decoding keeps only the derivations that produce the reference exactly ("... talk" and "... meeting" cannot); together they form the gold derivation lattice for "Bush held talks with Sharon".
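A minimal Python sketch of the consistency test behind forced decoding, assuming left-to-right phrase-based decoding: a partial derivation can still yield the reference only if its output so far is a prefix of the reference. The `Hyp` record and the helper names are hypothetical, chosen for illustration, and are not taken from the MaxForce implementation.

```python
from typing import NamedTuple, Sequence, Tuple


class Hyp(NamedTuple):
    """Hypothetical decoder state: covered source words and the output so far."""
    coverage: Tuple[int, ...]  # 1 = source word already translated
    output: Tuple[str, ...]    # target words produced left to right


def is_good(h: Hyp, reference: Sequence[str]) -> bool:
    # A left-to-right partial output can still reach the reference
    # only if it is a prefix of the reference.
    return tuple(reference[: len(h.output)]) == h.output


def is_gold(h: Hyp, reference: Sequence[str]) -> bool:
    # A complete derivation is gold if it covers every source word
    # and produces the reference exactly.
    return all(h.coverage) and h.output == tuple(reference)


def forced_filter(beam, reference):
    # Forced decoding: keep only states still consistent with the reference.
    return [h for h in beam if is_good(h, reference)]


if __name__ == "__main__":
    ref = "Bush held talks with Sharon".split()
    cov = (1, 0, 0, 1, 1, 1)  # 布什 and 举行 了 会谈 done; 与 沙龙 still open
    beam = [Hyp(cov, ("Bush", "held", "talks")),
            Hyp(cov, ("Bush", "held", "talk")),
            Hyp(cov, ("Bush", "held", "meeting"))]
    for h in forced_filter(beam, ref):
        print(" ".join(h.output))
```

Only the state ending in "talks" survives; the "talk" and "meeting" states are dropped because they already diverge from the reference.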
Long-distance reordering
Source: 玻利维亚 恢复 民主 政治 以来 首次 全国 大选 联合国 派遣 50名 观察员 监督
(gloss: Bolivia / restore / democratic politics / since / first / nationwide / general election / U.N. / send / 50 / observers / monitor)
Reference: U.N. sent 50 … to monitor the 1st election since Bolivia restored democracy
Figure annotations: 5, 3, 3, 4
[Figure: ratio of complete coverage vs. sentence length; curves: distortion-unlimited, distortion limit 6, 4, 2, 0]
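The coverage curves above can be made concrete with a small forced-decoding search. This sketch (not the paper's decoder) asks whether any sequence of phrases covers every source word and produces exactly the reference under a given distortion limit; the "complete coverage" ratio is presumably the fraction of sentences for which such a derivation exists. Assumptions: phrases are hypothetical `(i, j, target)` triples for source span `[i, j)`, every target side is non-empty, and distortion is measured Moses-style as `|i - previous_end|`.

```python
from functools import lru_cache


def reachable(src_len, reference, phrases, dist_limit=None):
    """True if some derivation covers all source words and outputs exactly `reference`."""
    ref = tuple(reference)
    phrases = [(i, j, tuple(t)) for i, j, t in phrases]  # assumes non-empty targets
    full = (1 << src_len) - 1  # bit k set = source word k covered

    @lru_cache(maxsize=None)
    def search(cov, last_end, pos):
        if pos == len(ref):
            return cov == full  # reference produced; every source word must be covered
        for i, j, tgt in phrases:
            span = ((1 << j) - 1) ^ ((1 << i) - 1)  # bits i .. j-1
            if cov & span:
                continue  # overlaps already-translated words
            if dist_limit is not None and abs(i - last_end) > dist_limit:
                continue  # jump exceeds the distortion limit
            if ref[pos:pos + len(tgt)] != tgt:
                continue  # output would leave the reference
            if search(cov | span, j, pos + len(tgt)):
                return True
        return False

    return search(0, 0, 0)


if __name__ == "__main__":
    ref = "Bush held talks with Sharon".split()
    # Toy phrase pairs for 布什 与 沙龙 举行 了 会谈 (source positions 0..5).
    phrases = [(0, 1, ["Bush"]), (1, 2, ["with"]), (2, 3, ["Sharon"]),
               (1, 3, ["with", "Sharon"]), (3, 6, ["held", "talks"])]
    for limit in (None, 6, 4, 2, 0):
        print(limit, reachable(6, ref, phrases, limit))
```

With this toy phrase list, the only derivation translates 举行 了 会谈 before 与 沙龙, which needs a backward jump of five words, so the reference is reachable with no limit or a limit of 6 but not with limits 4, 2, or 0.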
[Figure: average number of derivations vs. sentence length; curves: dist-6, dist-4, dist-2, dist-0]
binary classification (constant # of classes): x → exact inference with w → z ∈ {y=-1, y=+1}; update weights if y ≠ z
structured classification (exponential # of classes): x = 那 人 咬 了 狗, y = the man bit the dog; x → exact inference with w → z; update weights if y ≠ z
with exponentially many classes, exact inference gives way to inexact inference
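A toy structured perceptron following the slide: decode x by exact inference (here, brute-force enumeration of a tiny candidate list) and update w whenever the prediction z differs from the gold y. The feature map (source/target word pairs plus target bigrams) and the candidate list are assumptions made for illustration; with exponentially many translations, enumeration like this is exactly what becomes infeasible.

```python
from collections import Counter


def features(x, y):
    # Hypothetical feature map: source/target word pairs plus target bigrams,
    # so that word order in y matters.
    phi = Counter()
    for s in x:
        for t in y:
            phi[("pair", s, t)] += 1
    padded = ("<s>",) + tuple(y) + ("</s>",)
    for a, b in zip(padded, padded[1:]):
        phi[("bigram", a, b)] += 1
    return phi


def score(w, phi):
    return sum(w.get(f, 0.0) * v for f, v in phi.items())


def decode(w, x, candidates):
    # "Exact inference": score every candidate and return the argmax.
    return max(candidates(x), key=lambda c: score(w, features(x, c)))


def perceptron_epoch(w, data, candidates):
    for x, y in data:
        z = decode(w, x, candidates)
        if z != y:  # update weights if y != z: w += phi(x, y) - phi(x, z)
            for f, v in features(x, y).items():
                w[f] = w.get(f, 0.0) + v
            for f, v in features(x, z).items():
                w[f] = w.get(f, 0.0) - v
    return w


if __name__ == "__main__":
    x = tuple("那人咬了狗")
    gold = tuple("the man bit the dog".split())
    wrong = tuple("the dog bit the man".split())
    cands = lambda _x: [wrong, gold]
    w = perceptron_epoch({}, [(x, gold)], cands)
    print(" ".join(decode(w, x, cands)))
```

Both candidates use the same words, so the word-pair features cancel in the update; only the bigrams `man→bit` and `dog→</s>` (rewarded) and `dog→bit` and `man→</s>` (penalized) change, which is enough to rank the correct translation first.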
Gold derivation lattice (Bush held talks with Sharon) vs. real decoding (beam search)
[Diagram: coverage vectors of the gold derivations and of the beam-search states]
Should fix search errors here!
standard update at the end of the sentence (no guarantee!)
[Diagram: correct vs. incorrect prefixes under the model; the correct sequence falls off the beam (pruned)]
early update: update where the correct sequence falls off; violation guaranteed: the incorrect prefix scores higher up to this point
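A sketch of early update over a toy left-to-right search: each step expands and prunes the beam, and as soon as the gold prefix is pruned, the weights are updated against the best item in the beam and decoding stops. The prefix features, the vocabulary, and the `early_update_step` helper are hypothetical. The violation guarantee holds because the gold prefix was in the previous beam, so it was among the expanded candidates and ranked no higher than `beam[0]`.

```python
from collections import Counter


def phi(prefix):
    # Hypothetical prefix features: (position, token) indicators and token bigrams.
    f = Counter()
    prev = "<s>"
    for i, tok in enumerate(prefix):
        f[("pos", i, tok)] += 1
        f[("bigram", prev, tok)] += 1
        prev = tok
    return f


def score(w, prefix):
    return sum(w.get(k, 0.0) * v for k, v in phi(prefix).items())


def update(w, good, bad):
    # Perceptron update: w += phi(good) - phi(bad).
    for k, v in phi(good).items():
        w[k] = w.get(k, 0.0) + v
    for k, v in phi(bad).items():
        w[k] = w.get(k, 0.0) - v


def early_update_step(w, gold, vocab, beam_size):
    """Beam-decode one example (vocab must contain the gold tokens).
    Returns the step at which w was updated, or None if no update was needed."""
    gold = tuple(gold)
    beam = [()]
    for i in range(len(gold)):
        expanded = [p + (tok,) for p in beam for tok in vocab]
        expanded.sort(key=lambda p: score(w, p), reverse=True)  # stable sort
        beam = expanded[:beam_size]
        if gold[: i + 1] not in beam:
            # Gold prefix was expanded but pruned: beam[0] scores at least as
            # high, so this is a guaranteed violation. Update and stop decoding.
            update(w, gold[: i + 1], beam[0])
            return i
    if beam[0] != gold:
        # Gold survived to the end but is not top-ranked: a valid full update.
        update(w, gold, beam[0])
        return len(gold) - 1
    return None


if __name__ == "__main__":
    w = {}
    for epoch in range(2):
        print(epoch, early_update_step(w, ("a", "b", "a"), ("a", "b"), beam_size=1))
```

With beam size 1, the first pass prunes the gold prefix `a b` at step 1 and updates there; on the second pass the gold sequence stays on top and no update is made.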
Early update with the gold derivation lattice (Bush held talks with Sharon): once all correct derivations fall off the beam, do an early update (violation guaranteed: the incorrect prefix scores higher up to this point) and stop decoding.
Early update vs. max-violation (… et al 2012)
[Figure: update strategies along the search for model w; top line: best in the beam; bottom line: worst in the beam]
early: update at step $i$, where the correct sequence falls off the beam: $d^+_i$ vs. $d^-_i$
max-violation: update at the step of biggest violation, $i^* = \arg\max_i \mathbf{w} \cdot (\Phi(d^-_i) - \Phi(d^+_i))$: $d^+_{i^*}$ vs. $d^-_{i^*}$
latest: the last valid update
full (standard): at the end of the sentence, $d^-_{|x|}$ vs. $d^+_{|x|}$ or $d^y_{|x|}$ (std, local); this standard update is invalid!
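The strategies in the figure differ only in which step they update at. Given, for each step i, the score of the best correct prefix ($d^+_i$) and of the best incorrect prefix in the beam ($d^-_i$), the choice can be sketched as below. The function names and the example scores are made up for illustration; an update counts as valid only if the incorrect prefix scores at least as high as the correct one.

```python
def violations(good, bad):
    # good[i] = w . Phi(d+_i): best correct prefix at step i
    # bad[i]  = w . Phi(d-_i): best incorrect prefix in the beam at step i
    return [b - g for g, b in zip(good, bad)]


def early_step(fell_off):
    # Early update: the first step at which the correct prefix was pruned.
    return next((i for i, off in enumerate(fell_off) if off), None)


def max_violation_step(good, bad):
    # Max-violation: the step where the incorrect prefix leads by the most.
    v = violations(good, bad)
    i_star = max(range(len(v)), key=v.__getitem__)
    return i_star if v[i_star] >= 0 else None


def latest_step(good, bad):
    # Latest: the last step that still gives a valid update.
    valid = [i for i, vi in enumerate(violations(good, bad)) if vi >= 0]
    return valid[-1] if valid else None


def standard_is_valid(good, bad):
    # Standard (full) update compares only the complete derivations.
    return bad[-1] - good[-1] >= 0


if __name__ == "__main__":
    good = [10, 15, 12, 24]
    bad = [8, 19, 26, 23]
    fell_off = [False, True, True, True]
    print("early:", early_step(fell_off))
    print("max-violation:", max_violation_step(good, bad))
    print("latest:", latest_step(good, bad))
    print("standard valid:", standard_is_valid(good, bad))
```

In this made-up run the correct prefix falls off at step 1 (early update), the violation peaks at step 2 (max-violation, which here is also the latest valid update), and the full-sentence standard update is invalid because the final best-in-beam derivation scores below the correct one.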
(Collins, 2002)
(Zettlemoyer and Collins, 2005; Sun et al., 2009)
(Collins & Roark, 2004; Huang et al 2012)
(Yu et al 2013)
[Diagram: partial derivations for 布什 与 沙龙 (Bush ... with Sharon)]
Scale / Language
Small: 10x dev
Large: 120x dev
Sp-En: 31x dev; reachable ratio 55% (sent.), 43.9% (word)
[Figure: BLEU vs. number of iterations; curves: MaxForce, MERT, early, local, standard]
[Figure: ratio of invalid updates vs. beam size; standard perceptron, +non-local feature]
This explains why Liang et al ’06 failed: std ~ “bold” updating; local ~ “local” updating.
[Figure: BLEU vs. time; curves: MERT, PRO-dense, minibatch (24-core), minibatch (6-core), minibatch (1 core), single processor]
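The minibatch curves point to a parallel variant of the perceptron. One common scheme, assumed here rather than taken from the slides, decodes every example in a minibatch against the same frozen weights (so the examples can run on different cores) and then applies the combined update once. This sketch averages the per-example deltas, and `compute_update` is a hypothetical callback standing in for decoding plus an early or max-violation update.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def minibatch_train(w, data, compute_update, batch_size, workers=4):
    """One pass over data. compute_update(frozen_w, example) -> dict of feature
    deltas, or {} when no update is needed (hypothetical callback)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            frozen = dict(w)  # every example in this batch sees the same weights
            deltas = list(pool.map(lambda ex: compute_update(frozen, ex), batch))
            total = Counter()
            for d in deltas:
                total.update(d)  # Counter.update adds values: deltas are summed
            for f, v in total.items():
                w[f] = w.get(f, 0.0) + v / len(batch)  # apply the averaged update once
    return w


if __name__ == "__main__":
    # Toy callback: request an update only while feature "f" is still zero.
    upd = lambda w, ex: {} if w.get("f", 0.0) else {"f": 1.0}
    print(minibatch_train({}, [1, 2, 3, 4], upd, batch_size=2, workers=2))
```

Because the second example of a batch does not see the first example's update, both ask for the same change and the averaged update is applied once. Python threads share the GIL, so a real speedup would need processes or native code; the thread pool here only illustrates the data flow.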
[Figure: BLEU vs. number of iterations; curves: MERT, dense, +ruleid, +word-edges, +non-local]
dense: 11 features
+ruleid (0.1% of features): +0.9 BLEU
+word-edges (99.6% of features): +2.3 BLEU
+non-local (0.3% of features): +0.7 BLEU
[Figure: BLEU vs. number of iterations; curves: MaxForce, MERT, PRO-dense, PRO-medium, PRO-large]
algorithm  training data  #feat.  dev BLEU  test BLEU
MERT       dev set        11      25.5      22.5
MERT       dev set        11      25.4      22.5
PRO        dev set        11      25.6      22.6
PRO        dev set        3k      26.3      23.0
PRO        dev set        36k     17.7      14.3
MaxForce   train set      23M     27.8      24.5
MaxForce vs. MERT: +2.3 (dev), +2.0 (test)
Sp-En (reachable ratio: 55% of sentences, 43.9% of words)
system  algorithm  #feat.  dev BLEU  test BLEU
Moses   MERT       11      27.4      24.4
Cubit   MaxForce   21M     28.7      25.5
MaxForce vs. MERT: +1.3 (dev), +1.1 (test)
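The +2.3/+2.0 and +1.3/+1.1 annotations are BLEU differences (dev, test) between the MaxForce rows and the MERT baselines (25.5/22.5 and 27.4/24.4). A quick arithmetic check, rounding to one decimal as in the tables:

```python
def bleu_gain(baseline, system):
    """(dev, test) BLEU differences, rounded to one decimal place."""
    return tuple(round(s - b, 1) for b, s in zip(baseline, system))


comparisons = {
    "MERT (11 feats) -> MaxForce (23M feats)": ((25.5, 22.5), (27.8, 24.5)),
    "Sp-En: Moses MERT -> Cubit MaxForce (21M feats)": ((27.4, 24.4), (28.7, 25.5)),
}

if __name__ == "__main__":
    for name, (base, new) in comparisons.items():
        print(name, bleu_gain(base, new))
```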
(Collins, 2002)
(Zettlemoyer and Collins, 2005; Sun et al., 2009)
(Collins & Roark, 2004; Huang et al 2012)
(Yu et al 2013)
replacing EM for partially-
Max-violation