What can Statistical Machine Translation teach Neural Machine Translation about Structured Prediction?
Graham Neubig
@ ICLR Workshop on Deep Reinforcement Learning Meets Structured Prediction 5/6/2019
Types of Prediction
I hate this movie → positive / negative (binary classification)
I hate this movie → very good / good / neutral / bad / very bad (multi-class classification)
I hate this movie → PRP VBP DT NN (structured prediction: part-of-speech tagging)
I hate this movie → kono eiga ga kirai (structured prediction: translation)
... Neubig & Watanabe, Computational Linguistics (2016)
Minimum Error Rate Training in Statistical Machine Translation (Och 2004)
[Figure: translation of "kono eiga ga kirai" built from the English words "movie", "this", "I", and "hate"]
Tune the weights of the model to directly optimize translation accuracy [Och 2004].
$$\log P(Y \mid X) = \sum_i \lambda_i \phi_i(X, Y) - \log Z$$
where $Z$ sums $\exp \sum_i \lambda_i \phi_i(X, Y')$ over all outputs $Y'$.
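A minimal Python sketch of the log-linear model above, assuming the candidate outputs can be enumerated (for example, an n-best list) so that $Z$ is computed over that list only. The function name and the toy weights and features are hypothetical. It computes $\log P(Y \mid X)$ as the weighted feature score minus $\log Z$.

```python
import math
from typing import List, Sequence


def log_linear_logprobs(weights: Sequence[float],
                        feature_vectors: List[Sequence[float]]) -> List[float]:
    """Return log P(Y | X) for each candidate Y with features phi(X, Y).

    Assumes the candidates are an enumerable list, so Z is a sum over them.
    """
    # Linear score: sum_i lambda_i * phi_i(X, Y)
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in feature_vectors]
    # log Z via log-sum-exp, shifted by the max score for numerical stability
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


# Hypothetical example: two candidates, two features
logprobs = log_linear_logprobs([1.0, -0.5], [(1.0, 0.0), (0.0, 2.0)])
print([round(math.exp(lp), 3) for lp in logprobs])  # prints [0.881, 0.119]
```

Because $\log Z$ is subtracted from every candidate equally, it never changes which candidate scores highest; decoding only needs the unnormalized score $\sum_i \lambda_i \phi_i(X, Y)$.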
[Figure: encoder-decoder model. The encoder reads "kono eiga ga kirai"; the decoder ("dec" at each step) generates "I hate this movie </s>".]
Standard NMT training maximizes likelihood (not accuracy!).
[Figure: the decoder as a sequence of classifiers. Starting from the encoder, each step classifies the next word of "I hate this movie </s>", with the previous word fed back as input.]

$$P(E \mid F) = \prod_{t=1}^{T} P(e_t \mid F, e_1, \ldots, e_{t-1})$$

Training maximizes the probability of each word in the reference given the previous words:

$$\ell(E \mid F) = -\log P(E \mid F) = -\sum_{t=1}^{T} \log P(e_t \mid F, e_1, \ldots, e_{t-1})$$

But at test time, we may make mistakes that propagate.
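The loss above can be sketched in Python under teacher forcing, i.e. assuming the decoder was fed the reference prefix $e_1, \ldots, e_{t-1}$ at every step. The per-step distributions are hypothetical dictionaries standing in for a real decoder's softmax outputs, and the function name is illustrative.

```python
import math
from typing import Dict, List, Sequence


def sequence_nll(step_distributions: List[Dict[str, float]],
                 reference: Sequence[str]) -> float:
    """-log P(E | F) = -sum_t log P(e_t | F, e_1, ..., e_{t-1}).

    step_distributions[t] maps each word to P(word | F, e_1..e_{t-1}),
    computed with the reference prefix as decoder input (teacher forcing).
    """
    if len(step_distributions) != len(reference):
        raise ValueError("need one distribution per reference word")
    return -sum(math.log(dist[word])
                for dist, word in zip(step_distributions, reference))


# Hypothetical decoder outputs for the reference "I hate </s>"
dists = [
    {"I": 0.5, "you": 0.5},
    {"hate": 0.25, "like": 0.75},
    {"</s>": 0.9, "this": 0.1},
]
print(sequence_nll(dists, ["I", "hate", "</s>"]))
```

Each term only ever conditions on the reference prefix, never on the model's own earlier predictions; that is the mismatch with test-time decoding.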
[Figure: at test time the decoder is fed its own predictions; an early mistake is fed back and the output degenerates into "I I I I".]

The model is never exposed to its own mistakes during training, and cannot deal with them at test time (exposure bias). This can cause phenomena such as repeating.
Here "accuracy" means the metric used to evaluate translations, e.g. BLEU or METEOR.
Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016 (Neubig 16)

[Figure: two charts comparing MLE, MLE+Length, and MinRisk training; one panel shows BLEU, the other the length ratio.]

Minimum risk training fixes the length problems, and does much better than heuristics.
$$\hat{E} = \operatorname*{argmax}_{\tilde{E}} P(\tilde{E} \mid F)$$

$$\operatorname{error}(E, \hat{E}) = 1 - \operatorname{BLEU}(E, \hat{E})$$

Because $\hat{E}$ changes discretely as the model parameters change, this error is piecewise constant and not conducive to gradient-based optimization. MERT instead minimizes it directly with a line search (over an n-best list for every hypothesis).
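To make $\operatorname{error}(E, \hat{E}) = 1 - \operatorname{BLEU}(E, \hat{E})$ concrete, here is a rough sentence-level BLEU in Python. BLEU is normally computed over a whole corpus; this sentence-level version, with add-one smoothing of the 2- to 4-gram precisions, is one common variant chosen here as an assumption, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: Sequence[str], hypothesis: Sequence[str],
                  max_n: int = 4) -> float:
    """Sentence-level BLEU with add-one smoothing for n > 1 (assumed variant)."""
    if not hypothesis:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp = ngrams(hypothesis, n)
        ref = ngrams(reference, n)
        # Clipped matches: each n-gram counts at most as often as in the reference
        matches = sum(min(c, ref[g]) for g, c in hyp.items())
        total = sum(hyp.values())
        if n == 1:
            if matches == 0:
                return 0.0
            log_prec += math.log(matches / total)
        else:
            log_prec += math.log((matches + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(log_prec / max_n)


def error(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """error(E, E^) = 1 - BLEU(E, E^)."""
    return 1.0 - sentence_bleu(reference, hypothesis)


ref = "I hate this movie".split()
print(error(ref, "I hate this movie".split()))
print(sentence_bleu(ref, "I hate this".split()))
```

The second hypothesis matches every n-gram it contains, so only the brevity penalty lowers its score.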
[Figure: MERT line search on a toy example, panels (a) to (d). Inputs F1 and F2 each have three n-best candidates (E1,1 to E1,3 and E2,1 to E2,3) with features φ1, φ2, φ3 and an error value. Starting from λ1=-1, λ2=1, λ3=0 and searching along d1=0, d2=0, d3=1, the panels show the F1 candidates, F2 candidates, F1 error, F2 error, and total error along the search direction; the chosen step α ← 1.25 gives λ1=-1, λ2=1, λ3=1.25.]
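A brute-force Python sketch of the line search in the figure. Along the direction $d$, each candidate's score $(\lambda + \alpha d) \cdot \phi$ is a line $a + b\alpha$ in the step size $\alpha$, so each input's argmax can only change where two lines cross. Evaluating the total error once inside every interval between neighbouring crossing points therefore finds the exact minimum on the given n-best lists. Och's algorithm computes the upper envelope of the lines more efficiently; the pairwise version, the data layout, and the toy n-best lists here are assumptions for illustration.

```python
import itertools
from typing import List, Sequence, Tuple

# One n-best list: (feature vector phi, error) for each candidate
NBest = List[Tuple[Sequence[float], float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def total_error(nbests: List[NBest], weights: Sequence[float]) -> float:
    """Sum over inputs of the error of the highest-scoring candidate."""
    total = 0.0
    for nbest in nbests:
        _, err = max(nbest, key=lambda cand: dot(weights, cand[0]))
        total += err
    return total


def line_search(nbests: List[NBest], lam: Sequence[float],
                d: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, error) minimizing total error of weights lam + alpha * d."""
    crossings = set()
    for nbest in nbests:
        # Score of each candidate along the search line: a + b * alpha
        lines = [(dot(lam, phi), dot(d, phi)) for phi, _ in nbest]
        for (a1, b1), (a2, b2) in itertools.combinations(lines, 2):
            if b1 != b2:
                crossings.add((a2 - a1) / (b1 - b2))
    points = sorted(crossings)
    if points:
        # One probe per interval: before, between, and after the crossings
        probes = ([points[0] - 1.0]
                  + [(x + y) / 2 for x, y in zip(points, points[1:])]
                  + [points[-1] + 1.0])
    else:
        probes = [0.0]

    def error_at(alpha: float) -> float:
        return total_error(nbests, [l + alpha * x for l, x in zip(lam, d)])

    best = min(probes, key=error_at)
    return best, error_at(best)


# Hypothetical n-best lists for two inputs F1 and F2
nbests = [
    [((0.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 1.0), 0.0)],
    [((1.0, 0.0, 3.0), 0.5), ((0.0, 0.0, 0.0), 0.2)],
]
alpha, err = line_search(nbests, lam=[-1.0, 1.0, 0.0], d=[0.0, 0.0, 1.0])
print(alpha, err)
```

In this toy case the best step falls between the two crossing points, at α = 1/6: F1 has already switched to its zero-error candidate, but F2 has not yet switched to its worse one.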
Minimum Risk Annealing for Training Log-Linear Models (Smith and Eisner 2006); Minimum risk training for neural machine translation (Shen et al. 2015)

$$\operatorname{risk}(F, E, \theta) = \sum_{\tilde{E}} P(\tilde{E} \mid F; \theta)\, \operatorname{error}(E, \tilde{E}).$$
The risk is a smooth function of θ -> differentiable!
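A small Python check of the differentiability claim, assuming the candidate set is small enough to enumerate and that $P(\tilde{E} \mid F; \theta)$ is a softmax over per-candidate scores $s$ (a hypothetical stand-in; in NMT these probabilities come from the token-level product). For $\operatorname{risk} = \sum_k p_k\, \mathrm{err}_k$ with $p = \mathrm{softmax}(s)$, the gradient is $\partial\,\operatorname{risk} / \partial s_j = p_j(\mathrm{err}_j - \operatorname{risk})$, and the code compares it against central finite differences.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def risk_and_grad(scores: Sequence[float],
                  errors: Sequence[float]) -> Tuple[float, List[float]]:
    """Expected error under softmax(scores) and its gradient w.r.t. scores."""
    p = softmax(scores)
    risk = sum(pk * ek for pk, ek in zip(p, errors))
    # d risk / d s_j = p_j * (err_j - risk)
    grad = [pj * (ej - risk) for pj, ej in zip(p, errors)]
    return risk, grad


# Hypothetical scores and errors (e.g. 1 - BLEU) for three candidates
scores = [0.5, -0.2, 1.0]
errors = [0.3, 1.0, 0.0]
risk, grad = risk_and_grad(scores, errors)

# Central finite differences should agree with the analytic gradient
eps = 1e-6
for j in range(len(scores)):
    up = list(scores)
    up[j] += eps
    down = list(scores)
    down[j] -= eps
    numeric = (risk_and_grad(up, errors)[0] - risk_and_grad(down, errors)[0]) / (2 * eps)
    print(j, round(grad[j], 6), round(numeric, 6))
```

The gradient pushes probability toward candidates whose error is below the current risk, which the argmax-based error cannot do.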
The sum over all possible outputs is intractable, so use n-best search to obtain a set $S$ of hypotheses and calculate risk over that:

$$\operatorname{risk}(F, E, S) = \sum_{\tilde{E} \in S} \frac{P(\tilde{E} \mid F)}{Z}\, \operatorname{error}(E, \tilde{E}), \qquad Z = \sum_{\tilde{E} \in S} P(\tilde{E} \mid F)$$
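Over an n-best list, the risk only needs the model's log-probabilities for the hypotheses in $S$. This Python sketch renormalizes them by $Z$, shifting by the maximum log-probability for numerical stability; the function name and the inputs are hypothetical.

```python
import math
from typing import Sequence


def nbest_risk(logprobs: Sequence[float], errors: Sequence[float]) -> float:
    """risk(F, E, S) = sum_{E~ in S} P(E~|F) / Z * error(E, E~).

    logprobs[k] = log P(E~_k | F) for the k-th hypothesis in S;
    Z = sum_{E~ in S} P(E~ | F), so the weights sum to one over S.
    """
    m = max(logprobs)
    # Subtracting the max rescales every P(E~|F) by the same factor,
    # which cancels in P / Z but avoids underflow for long outputs.
    unnorm = [math.exp(lp - m) for lp in logprobs]
    z = sum(unnorm)
    return sum(u / z * e for u, e in zip(unnorm, errors))


# Hypothetical 2-best list: log-probabilities and 1 - BLEU errors
print(nbest_risk([math.log(0.3), math.log(0.1)], [0.0, 1.0]))
```

The probabilities in the example sum to 0.4, not 1; after dividing by $Z$ the two hypotheses get weights 0.75 and 0.25, so the risk is 0.25.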
Minimizing risk
$$\ell_{\text{reinforce}}(X, Y) = -R(\hat{Y}, Y)\,\log P(\hat{Y} \mid X)$$
The reward R(Ŷ, Y) acts as a weight on the log likelihood of the sampled output Ŷ.
Minimum risk training for neural machine translation (Shen et al. 2015)
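To see how the reward acts as a weight, here is a toy Python sketch (not the Shen et al. implementation) in which the "model" is a single softmax over a handful of complete candidate outputs, so log P(Ŷ | X) is just a log-softmax over hypothetical logits. For this toy model the gradient of ℓ_reinforce with respect to logit k is −R(Ŷ, Y)(1[k = Ŷ] − p_k). All function names and reward values are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_loss_and_grad(logits, sample, reward):
    """l_reinforce = -R(Y_hat, Y) * log P(Y_hat | X) for a toy softmax over whole outputs.

    Returns the loss and its gradient with respect to each logit:
    d loss / d logit_k = -R * (1[k == sample] - p_k).
    """
    probs = softmax(logits)
    loss = -reward * math.log(probs[sample])
    grad = [-reward * ((1.0 if k == sample else 0.0) - p) for k, p in enumerate(probs)]
    return loss, grad


def reinforce_step(logits, rewards, lr, rng):
    """Sample one output from the model, then take an SGD step on l_reinforce."""
    probs = softmax(logits)
    sample = rng.choices(range(len(logits)), weights=probs)[0]
    _, grad = reinforce_loss_and_grad(logits, sample, rewards[sample])
    return [x - lr * g for x, g in zip(logits, grad)]


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
rewards = [0.2, 0.9, 0.4]  # hypothetical R(Y_hat, Y) for three candidate outputs
for _ in range(500):
    logits = reinforce_step(logits, rewards, lr=0.5, rng=rng)
```

Note that with all rewards positive, every sampled output has its probability pushed up, only by different amounts; this is the issue the baseline slides that follow address.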
Chances are, this is you 😕
(not latent variables or standard RL settings)
MLE to the full objective
Baselines: the reward we expect for a particular sentence

                 "This is an easy sentence"   "Buffalo Buffalo Buffalo"
Reward           0.8                          0.3
Baseline         0.95                         0.1
R - B            -0.15                        0.2

R - B reflects when we did better or worse than expected:
$$\ell_{\text{baseline}}(X) = -\big(R(\hat{Y}, Y) - B(\hat{Y})\big)\,\log P(\hat{Y} \mid X)$$
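A small Python sketch of the baseline loss, applied to the two sentences in the table. The `RunningBaseline` class is a hypothetical stand-in: the per-sentence baselines in the table (0.95 vs. 0.1) suggest a predictor of expected reward conditioned on the input, whereas an exponential moving average of rewards is just one simple, input-independent choice.

```python
import math


def baseline_loss(reward, baseline, log_prob):
    """l_baseline = -(R(Y_hat, Y) - B(Y_hat)) * log P(Y_hat | X)."""
    return -(reward - baseline) * log_prob


class RunningBaseline:
    """Hypothetical baseline: exponential moving average of observed rewards."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.value = 0.0

    def update(self, reward):
        self.value = self.decay * self.value + (1.0 - self.decay) * reward
        return self.value


# (reward, baseline) pairs from the table
examples = {
    "This is an easy sentence": (0.8, 0.95),
    "Buffalo Buffalo Buffalo": (0.3, 0.1),
}
for sentence, (r, b) in examples.items():
    # d loss / d log P = -(R - B); when R - B < 0 the gradient is positive,
    # so gradient descent lowers log P of this sample
    direction = "lower" if r - b < 0 else "raise"
    print(f"{sentence}: R - B = {r - b:+.2f}, update will {direction} log P")
```

The easy sentence gets R - B = -0.15, so its probability is pushed down despite a reward of 0.8, while "Buffalo Buffalo Buffalo" gets +0.20 and is pushed up.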
Can sample many different examples before performing an update
done before an update to stabilize
them when we update parameters (experience replay, Lin 1993)
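Experience replay can be sketched as a bounded buffer of past (source, sampled output, reward) tuples from which each update draws a batch mixing old and new experience. The class below is a hypothetical minimal version in Python, not taken from any particular toolkit; the capacity and tuple layout are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical minimal experience-replay buffer of (source, sample, reward) tuples."""

    def __init__(self, capacity):
        # a deque with maxlen silently drops the oldest entry once full
        self.buffer = deque(maxlen=capacity)

    def add(self, source, sample, reward):
        self.buffer.append((source, sample, reward))

    def sample(self, batch_size, rng=random):
        # draw without replacement; a batch can mix old and recent samples
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.add(f"src{i}", f"hyp{i}", reward=i / 10)
batch = buffer.sample(2, rng=random.Random(0))
```

After five additions to a buffer of capacity 3, only the three most recent tuples remain available for replay.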
Sharpening or smoothing the distribution with a temperature τ
$$\text{risk}(F, E, \theta, \tau, S) = \sum_{\tilde{E} \in S} \frac{P(\tilde{E} \mid F; \theta)^{1/\tau}}{Z}\,\text{error}(E, \tilde{E}), \qquad Z = \sum_{\tilde{E} \in S} P(\tilde{E} \mid F; \theta)^{1/\tau}$$
(accounts for unsampled hypotheses that should be in the denominator)
[Figure: example distributions at τ = 1, τ = 0.5, τ = 0.25, τ = 0.05]
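A Python sketch of the tempered risk, reusing the renormalization over S: the log-probabilities are divided by τ before normalizing, so τ = 1 gives back the plain renormalized risk, τ < 1 concentrates the weight on the top hypothesis, and τ > 1 flattens it. The function names and the toy n-best list are illustrative assumptions.

```python
import math


def tempered_weights(log_probs, tau):
    """P(E~ | F; theta)^(1/tau), renormalized over the sampled set S."""
    scaled = [lp / tau for lp in log_probs]
    m = max(scaled)  # subtract max before exp for numerical stability
    unnorm = [math.exp(s - m) for s in scaled]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def tempered_risk(log_probs, errors, tau):
    weights = tempered_weights(log_probs, tau)
    return sum(w * e for w, e in zip(weights, errors))


log_probs = [math.log(0.5), math.log(0.3), math.log(0.2)]
errors = [0.1, 0.5, 0.9]
for tau in (1.0, 0.5, 0.25, 0.05):
    weights = tempered_weights(log_probs, tau)
    print(tau, [round(w, 3) for w in weights], round(tempered_risk(log_probs, errors, tau), 3))
```

As τ shrinks toward 0.05, nearly all weight lands on the top hypothesis and the risk approaches that hypothesis's error (0.1 here).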
                       NMT+MinRisk        PBMT+MERT
Model                  NMT                PBMT
Optimized Parameters   Millions           5-30 Log-linear Weights (others MLE)
Objective              Risk               Error
Metric Granularity     Sentence Level     Corpus Level
n-best Lists           Re-generated       Accumulated
model?
Freezing Subnetworks to Analyze Domain Adaptation in
models as a linear combination of a few hyper-parameters?
Contextualized Parameter Generation for Universal NMT. Platanios et al. 2018.
$$W = \sum_i \alpha_i W_i$$
want to do in the first place?
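As a sketch of the question above: if a weight matrix is written as W = Σ_i α_i W_i with the basis matrices W_i held fixed, only the few coefficients α_i remain to be tuned, which is the MERT-like setting the slide asks about. The Python below (plain nested lists, hypothetical helper name `combine`) only builds W from given α_i and W_i; how the W_i are obtained is left open, as on the slide.

```python
def combine(alphas, bases):
    """Return W = sum_i alphas[i] * bases[i] for equally sized matrices (nested lists)."""
    if len(alphas) != len(bases):
        raise ValueError("need one coefficient per basis matrix")
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(a * w[r][c] for a, w in zip(alphas, bases)) for c in range(cols)]
        for r in range(rows)
    ]


identity = [[1, 0], [0, 1]]
swap = [[0, 1], [1, 0]]
W = combine([0.5, 2.0], [identity, swap])  # two tunable numbers define all of W
```

Here W is [[0.5, 2.0], [2.0, 0.5]]; tuning would search over the two α values rather than over every entry of W.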
move towards a peakier distribution?
Minimum risk annealing for training log-linear models. Smith and Eisner 2006.
[Figure: training progression, with the distribution becoming peakier as τ goes from 1 to 0.5, 0.25, and 0.05]
average
Optimizing for sentence-level BLEU+1 yields short translations. Nakov et al. 2012.
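A rough Python sketch of sentence-level BLEU+1, assuming the common variant that adds 1 to both the matched and total n-gram counts for n ≥ 2 only (implementations differ in these details, so treat this as illustrative). In this variant a two-word hypothesis gets "perfect" smoothed 3- and 4-gram precision for free, leaving only the brevity penalty to hold it back, which illustrates how the smoothing can favor short outputs.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_plus_one(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing on n-gram precisions for n >= 2."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # clipped matches: a hypothesis n-gram counts at most as often as in the reference
        match = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        add = 1 if n > 1 else 0  # smoothing applied only for n >= 2 (assumed variant)
        if match + add == 0:
            return 0.0  # no unigram matches, or an empty hypothesis
        log_prec += math.log((match + add) / (total + add))
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_prec / max_n)


ref = "the cat sat on the mat".split()
full = bleu_plus_one(ref, ref)
short = bleu_plus_one("the cat".split(), ref)
```

The exact match scores 1.0, and "the cat" scores e^-2 ≈ 0.135: every smoothed precision is 1, so the brevity penalty alone determines its score.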
statistics to approximate corpus BLEU?
Online large-margin training of syntactic and structural translation features. Chiang et
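One way to use accumulated statistics, loosely following the pseudo-document idea associated with Chiang et al.: keep exponentially decayed BLEU sufficient statistics from earlier sentences and score each new hypothesis by the BLEU of that pseudo-document with the hypothesis added. Because corpus BLEU sums statistics instead of averaging per-sentence scores, a short hypothesis with no 4-gram match is no longer scored as zero. The Python below is a hypothetical sketch; the decay of 0.9, the unsmoothed BLEU, and the update rule are assumptions, not a reproduction of the paper.

```python
import math
from collections import Counter


def ngram_stats(hyp, ref, max_n=4):
    """Sufficient statistics: [hyp_len, ref_len, match_1, total_1, ..., match_n, total_n]."""
    stats = [len(hyp), len(ref)]
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        stats += [sum(min(c, r[g]) for g, c in h.items()), max(len(hyp) - n + 1, 0)]
    return stats


def bleu_from_stats(stats, max_n=4):
    """Unsmoothed BLEU computed from summed sufficient statistics."""
    hyp_len, ref_len = stats[0], stats[1]
    if hyp_len == 0:
        return 0.0
    log_prec = 0.0
    for n in range(max_n):
        match, total = stats[2 + 2 * n], stats[3 + 2 * n]
        if match == 0 or total == 0:
            return 0.0
        log_prec += math.log(match / total)
    brevity = min(1.0, math.exp(1 - ref_len / hyp_len))
    return brevity * math.exp(log_prec / max_n)


class PseudoCorpus:
    """Hypothetical decayed pseudo-document of BLEU statistics (decay value assumed)."""

    def __init__(self, max_n=4, decay=0.9):
        self.max_n = max_n
        self.decay = decay
        self.stats = [0.0] * (2 + 2 * max_n)

    def score(self, hyp, ref):
        # BLEU of the pseudo-document with this hypothesis's statistics added
        cand = ngram_stats(hyp, ref, self.max_n)
        return bleu_from_stats([a + b for a, b in zip(self.stats, cand)], self.max_n)

    def update(self, hyp, ref):
        # add the chosen hypothesis's statistics, then decay everything
        cand = ngram_stats(hyp, ref, self.max_n)
        self.stats = [self.decay * (a + b) for a, b in zip(self.stats, cand)]


ref = "the cat sat on the mat".split()
pc = PseudoCorpus()
pc.update(ref, ref)
short_score = pc.score(["the", "cat"], ref)
```

On its own "the cat" has no 3-gram, so its sentence-level BLEU is 0; scored inside the pseudo-document it receives a nonzero score that reflects mainly its effect on overall length.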
lists across epochs:
Epoch 1: n-best 1
Epoch 2: n-best 1 + new n-best 2
Epoch 3: n-best 1 + n-best 2 + new n-best 3
parameters, it still has good hypotheses from which to recover.
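A hypothetical Python sketch of this accumulation for one source sentence: each epoch's freshly decoded n-best list is merged into a pool that keeps every earlier hypothesis, with duplicates dropped by output string. Keying on the string is an assumption; MERT implementations also keep each hypothesis's feature vector so it can be rescored under new weights.

```python
def accumulate_nbest(pool, new_nbest):
    """Merge a newly decoded n-best list into the accumulated pool for one sentence.

    Earlier entries are kept even if the current parameters would no longer
    produce them, so a bad update still leaves good hypotheses to recover from.
    """
    seen = set(pool)
    merged = list(pool)
    for hyp in new_nbest:
        if hyp not in seen:
            seen.add(hyp)
            merged.append(hyp)
    return merged


# Simulated decoding results for one source sentence over three epochs
epochs = [
    ["i hate this movie", "i hate the movie"],
    ["i hate the movie", "this movie i hate"],
    ["movie this i hate", "i hate this movie"],
]
pool = []
for nbest in epochs:
    pool = accumulate_nbest(pool, nbest)
```

After three epochs the pool holds four distinct hypotheses, including "i hate this movie" from epoch 1 even though it also reappears later.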
experience replay in RL:
Self-improving reactive agents based on reinforcement learning, planning and teaching. Lin 1992.
Optimization for Statistical Machine Translation, a Survey (Neubig and Watanabe 2016)