Rethinking the Generation Orders of Sequence
jcykcai
Why left-to-right?
Humans do it.
But humans also do this: first generate some abstract of what to say, then serialize it.
The Importance of Generation Order in Language Modeling
Nicolas Ford, Daniel Duckworth, Mohammad Norouzi, George E. Dahl
Google Brain
{nicf,duckworthd,mnorouzi,gdahl}@google.com
EMNLP18
tokens
special placeholders
[Table 1 body: one row per example sentence, with columns sentence | common first | rare first | function first | content first. Example sentences include:
  the team announced thursday that the 6-foot-1 , [UNK] starter will remain in detroit through the 2013 season . [EOS]
  scotland ’s next game is a friendly against the czech republic at hampden on 3 march . [EOS]]
Table 1: Some example sentences from the dataset and their corresponding templates. The placeholder token is
decoder
p(y) = p_1(y^{(1)}) \, p_2(y^{(2)} \mid y^{(1)})
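The two-pass factorization p(y) = p_1(y^{(1)}) p_2(y^{(2)} | y^{(1)}) can be illustrated with a small sketch. It splits a sentence into a first-pass template and the second-pass fill-ins, then merges them back. The placeholder symbol `__`, the helper names, and the tiny "common word" vocabulary are assumptions made for this illustration, not details taken from the slides.

```python
from typing import List, Set, Tuple

PLACEHOLDER = "__"  # hypothetical placeholder symbol chosen for this sketch


def split_two_pass(tokens: List[str], first_pass_vocab: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (template y1, fill-ins y2) for a given first-pass vocabulary."""
    # First pass keeps tokens from the chosen vocabulary and masks the rest.
    template = [tok if tok in first_pass_vocab else PLACEHOLDER for tok in tokens]
    # Second pass supplies the masked tokens, in left-to-right order.
    fill_ins = [tok for tok in tokens if tok not in first_pass_vocab]
    return template, fill_ins


def merge_two_pass(template: List[str], fill_ins: List[str]) -> List[str]:
    """Rebuild the sentence by filling placeholders from left to right."""
    remaining = iter(fill_ins)
    return [next(remaining) if tok == PLACEHOLDER else tok for tok in template]


# "Common first" with an illustrative (assumed) set of common words.
sentence = "scotland 's next game is a friendly".split()
common_words = {"'s", "is", "a"}
template, fill_ins = split_two_pass(sentence, common_words)
print(template)
print(fill_ins)
```

Swapping `common_words` for a set of rare, function, or content words gives the other template strategies under the same split and merge logic.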
Model               Train    Validation   Test
                    39.925   45.377       45.196
rare first          38.283   43.293       43.077
content first       38.321   42.564       42.394
common first        36.525   41.018       40.895
function first      36.126   40.246       40.085
baseline            38.668   41.888       41.721
enhanced baseline   35.945   39.845       39.726
https://arxiv.org/pdf/1902.01370.pdf https://arxiv.org/pdf/1902.02192.pdf https://arxiv.org/pdf/1902.03249.pdf
Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit
ICML19
ŷ_t) = InsertionTransformer(x, ŷ_t). As an example, suppose our current hypothesis can
Serial generation:
t   Canvas                                    Insertion
0   []                                        (ate, 0)
1   [ate]                                     (together, 1)
2   [ate, together]                           (friends, 0)
3   [friends, ate, together]                  (three, 0)
4   [three, friends, ate, together]           (lunch, 3)
5   [three, friends, ate, lunch, together]    (⟨EOS⟩, 5)

Parallel generation:
t   Canvas                                    Insertions
0   []                                        (ate, 0)
1   [ate]                                     (friends, 0), (together, 1)
2   [friends, ate, together]                  (three, 0), (lunch, 2)
3   [three, friends, ate, lunch, together]    (⟨EOS⟩, 5)

Figure 1. Examples demonstrating how the clause “three friends ate lunch together” can be generated using our insertion framework. On the left, a serial generation process is used in which one insertion is performed at a time. On the right, a parallel generation process is used with multiple insertions being allowed per time step. Our model can either be trained to follow specific orderings or to maximize entropy
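The canvas updates in Figure 1 can be replayed with a short sketch. It assumes each insertion is a (token, slot) pair whose slot indexes the canvas as it stood before the time step, and that an ⟨EOS⟩ insertion ends decoding. The function names and the `<EOS>` string are illustrative choices, not part of the paper's code.

```python
from typing import List, Tuple

Insertion = Tuple[str, int]  # (token, slot index in the canvas before this step)
EOS = "<EOS>"  # stand-in string for the end-of-sequence token


def apply_insertions(canvas: List[str], insertions: List[Insertion]) -> List[str]:
    """Apply all insertions of one time step to a copy of the canvas."""
    new_canvas = list(canvas)
    # Go from the rightmost slot to the leftmost so that earlier slot
    # indices are not shifted by insertions made in the same step.
    for token, slot in sorted(insertions, key=lambda ins: ins[1], reverse=True):
        new_canvas.insert(slot, token)
    return new_canvas


def decode(steps: List[List[Insertion]]) -> List[str]:
    """Replay a list of time steps, stopping at the first EOS insertion."""
    canvas: List[str] = []
    for insertions in steps:
        if any(token == EOS for token, _ in insertions):
            break
        canvas = apply_insertions(canvas, insertions)
    return canvas


serial_steps = [[("ate", 0)], [("together", 1)], [("friends", 0)],
                [("three", 0)], [("lunch", 3)], [(EOS, 5)]]
parallel_steps = [[("ate", 0)], [("friends", 0), ("together", 1)],
                  [("three", 0), ("lunch", 2)], [(EOS, 5)]]

print(decode(serial_steps))
print(decode(parallel_steps))
```

Both schedules end on the same canvas; the parallel one simply reaches it in fewer time steps.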
All scores are BLEU, with the +EOS variant in parentheses.
Loss                     Termination   BLEU (+EOS)     +Distillation   +Distillation, +Parallel
Left-to-Right            Sequence      20.92 (20.92)   23.29 (23.36)
                         Slot          20.35 (21.39)   24.49 (25.55)   25.33 (25.70)
Binary Tree (τ = 1.0)    Slot          21.02 (22.37)   24.36 (25.43)   25.43 (25.76)
Binary Tree (τ = 2.0)    Slot          20.52 (21.95)   24.59 (25.80)   25.33 (25.80)
Uniform                  Sequence      19.34 (22.64)   22.75 (25.45)
                         Slot          18.26 (22.16)   22.39 (25.58)   24.31 (24.91)
circumvented by making multiple updates to the hypothesis at once.
Model                                        BLEU    Iterations
Autoregressive Left-to-Right
  Transformer (Vaswani et al., 2017)         27.3    n
Semi-Autoregressive Left-to-Right
  SAT (Wang et al., 2018)                    24.83   n/6
  Blockwise Parallel (Stern et al., 2018)    27.40   ≈ n/5
Non-Autoregressive
  NAT (Gu et al., 2018)                      17.69   1
  Iterative Refinement (Lee et al., 2018)    21.61   10
Our Approach (Greedy)
  Insertion Transformer + Left-to-Right      23.94   n
  Insertion Transformer + Binary Tree        27.29   n
  Insertion Transformer + Uniform            27.12   n
Our Approach (Parallel)
  Insertion Transformer + Binary Tree        27.41   ≈ log2 n
  Insertion Transformer + Uniform            26.72   ≈ log2 n
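The ≈ log2 n iteration count for parallel binary-tree decoding can be made concrete with an idealized sketch. It assumes every round emits exactly the middle token of each still-empty span, which is a deterministic simplification for illustration and not the trained model's behavior. The function name is hypothetical.

```python
from typing import List, Tuple


def binary_tree_rounds(tokens: List[str]) -> List[List[str]]:
    """Group tokens by the parallel round in which an ideal
    balanced-binary-tree decoder would insert them."""
    rounds: List[List[str]] = []
    # Each span is a half-open index range [lo, hi) that is still empty.
    spans: List[Tuple[int, int]] = [(0, len(tokens))] if tokens else []
    while spans:
        emitted: List[str] = []
        next_spans: List[Tuple[int, int]] = []
        for lo, hi in spans:
            mid = (lo + hi) // 2  # middle token of the span
            emitted.append(tokens[mid])
            if lo < mid:
                next_spans.append((lo, mid))
            if mid + 1 < hi:
                next_spans.append((mid + 1, hi))
        rounds.append(emitted)
        spans = next_spans
    return rounds


clause = "three friends ate lunch together".split()
for step, emitted in enumerate(binary_tree_rounds(clause)):
    print(step, emitted)
```

Under this assumption, a sequence of n tokens needs floor(log2 n) + 1 content rounds, for example 10 rounds for 1000 tokens, compared with n steps for one-token-at-a-time decoding.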
position after each insertion
Non-Monotonic Sequential Text Generation
Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho
[Figure 1: binary tree over the words “are”, “how”, “?”, “you” with five <end> leaves; the nine nodes carry two numberings, 1 2 3 4 5 6 7 8 9 and 4 1 8 3 2 5 6 7 9.]
ICML19
generating words to its left and words to its right.
Figure 1. A sequence, “how are you ?”, generated by the proposed
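A minimal sketch of how such a tree turns into a sentence: tokens are generated node by node in level order, every word node gets a left and a right child, and an <end> token closes a branch; reading the finished tree in order gives the output. The exact tree shape used here (root "are", children "how" and "?", with "you" as the left child of "?") is an assumption chosen to reproduce “how are you ?”, and the class and function names are illustrative.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

END = "<end>"


@dataclass
class Node:
    token: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(level_order_tokens: List[str]) -> Node:
    """Rebuild the binary tree from tokens emitted in level order."""
    tokens = iter(level_order_tokens)
    root = Node(next(tokens))
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.token == END:
            continue  # <end> nodes are leaves and have no children
        node.left = Node(next(tokens))
        node.right = Node(next(tokens))
        queue.extend([node.left, node.right])
    return root


def in_order(node: Optional[Node]) -> List[str]:
    """Read the sentence off the tree, skipping <end> leaves."""
    if node is None or node.token == END:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)


generated = ["are", "how", "?", END, END, "you", END, END, END]
print(" ".join(in_order(build_tree(generated))))
```

The nine generated tokens correspond to the nine nodes of the figure: four words and five <end> leaves.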
preferences
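\pi^*_{\text{coaching}}(a \mid s) \propto \pi^*_{\text{uniform}}(a \mid s) \, \pi(a \mid s)
\pi^*_{\text{annealed}}(a \mid s) = \beta \, \pi^*_{\text{uniform}}(a \mid s) + (1 - \beta) \, \pi^*_{\text{coaching}}(a \mid s)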
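The three oracle policies can be computed for a single state as follows. This is a sketch under stated assumptions: distributions are plain dictionaries from action to probability, the uniform oracle spreads mass evenly over a given set of valid actions, and the example numbers are invented for illustration.

```python
from typing import Dict, Iterable

Dist = Dict[str, float]


def normalize(weights: Dist) -> Dist:
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("cannot normalize: total weight is zero")
    return {action: w / total for action, w in weights.items()}


def uniform_oracle(valid_actions: Iterable[str]) -> Dist:
    """Equal probability for every valid action in this state."""
    actions = list(valid_actions)
    return {action: 1.0 / len(actions) for action in actions}


def coaching_oracle(uniform: Dist, policy: Dist) -> Dist:
    """pi*_coaching(a|s) proportional to pi*_uniform(a|s) * pi(a|s)."""
    return normalize({a: p * policy.get(a, 0.0) for a, p in uniform.items()})


def annealed_oracle(uniform: Dist, coaching: Dist, beta: float) -> Dist:
    """pi*_annealed = beta * pi*_uniform + (1 - beta) * pi*_coaching."""
    return {a: beta * uniform[a] + (1.0 - beta) * coaching.get(a, 0.0) for a in uniform}


# Invented example: two valid words, and a learned policy that prefers "how".
uniform = uniform_oracle(["how", "you"])
policy = {"how": 0.6, "you": 0.2, "are": 0.2}
coaching = coaching_oracle(uniform, policy)
annealed = annealed_oracle(uniform, coaching, beta=0.5)
print(coaching)
print(annealed)
```

The coaching oracle never puts mass on an invalid action ("are" here), but among valid actions it leans toward what the current policy already prefers; beta interpolates between that and the uniform oracle.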
nodes in a level-order traversal.
Oracle       %Novel   %Unique   Avg. Tokens   Avg. Span   BLEU
left-right   17.8     97.0      11.9          1.0         47.0
uniform      98.3     99.9      13.0          1.43        40.0
annealed     93.1     98.2      10.6          1.31        56.2
Validation   97.0     100       12.1
policies trained on Persona-Chat. A sample is novel when it is not in the training set. Percent unique is the cardinality of the set of sampled sentences divided by the number of sampled sentences.
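The two sample statistics defined above are easy to state in code. This sketch follows the caption's definition of percent unique directly; treating percent novel as the share of samples absent from the training set is an assumption about the denominator. The sample strings are invented.

```python
from typing import List, Set


def percent_unique(samples: List[str]) -> float:
    """Cardinality of the set of samples divided by the number of samples."""
    return 100.0 * len(set(samples)) / len(samples)


def percent_novel(samples: List[str], training_set: Set[str]) -> float:
    """Share of samples that do not appear in the training set (assumed definition)."""
    novel = sum(1 for sample in samples if sample not in training_set)
    return 100.0 * novel / len(samples)


samples = ["how are you ?", "how are you ?", "i like tea .", "hello there !"]
training_set = {"how are you ?"}
print(percent_unique(samples))
print(percent_novel(samples, training_set))
```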
                 Validation                               Test
Oracle           BLEU (BP)      Meteor   YiSi    Ribes    BLEU (BP)      Meteor   YiSi    Ribes
left-right       32.30 (0.95)   31.96    69.41   84.80    28.00 (1.00)   30.10    65.22   82.29
uniform          24.50 (0.84)   27.98    66.40   82.66    21.40 (0.86)   26.40    62.41   80.00
annealed         26.80 (0.88)   29.67    67.88   83.61    23.30 (0.91)   27.96    63.38   80.91
+tree-encoding   28.00 (0.86)   30.15    68.43   84.36    24.30 (0.91)   28.59    63.87   81.64
+<end>-tuning    29.10 (0.99)   31.00    68.81   83.51    24.60 (1.00)   29.30    64.18   80.53
nodes are lined up following the in-order traversal.
Insertion-based Decoding with Automatically Inferred Generation Order
Jiatao Gu†, Qi Liu† and Kyunghyun Cho†‡
†Facebook AI Research  ‡New York University, CIFAR Azrieli Global Scholar
†{jgu, qiliu, kyunghyuncho}@fb.com
r^t_{i,j} =
\begin{cases}
-1 & z^t_j > z^t_i \quad \text{(left)} \\
0 & z^t_j = z^t_i \quad \text{(middle)} \\
1 & z^t_j < z^t_i \quad \text{(right)}
\end{cases}
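A small sketch of the relative-position definition: given the absolute position z_i of every token generated so far, each pair (i, j) is mapped to one of three values. It assumes the three cases take the values -1, 0 and +1, and the absolute positions in the example are hypothetical, chosen only to show the shape of the result.

```python
from typing import List


def relative_position(z_i: int, z_j: int) -> int:
    """r_{i,j}: -1 if z_j > z_i, 0 if z_j == z_i, +1 if z_j < z_i."""
    if z_j > z_i:
        return -1
    if z_j == z_i:
        return 0
    return 1


def relative_matrix(z: List[int]) -> List[List[int]]:
    """Collect r_{i,j} for all pairs of generated tokens into a matrix."""
    return [[relative_position(z_i, z_j) for z_j in z] for z_i in z]


# Tokens in generation order with assumed absolute positions in the sentence.
tokens = ["<S>", "</S>", "dream", "I", "a"]
z = [0, 4, 3, 1, 2]
R = relative_matrix(z)
for token, row in zip(tokens, R):
    print(f"{token:>6}", row)
```

Because swapping i and j flips the sign, the matrix is antisymmetric, so adding a new token only requires one new column of comparisons and its negated copy as the new row. The absolute positions themselves never need to be stored or shifted after an insertion.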
R^{t+1} =
\begin{bmatrix}
 & & & r^{t+1}_{t+1,0} \\
 & R^t & & \vdots \\
 & & & r^{t+1}_{t+1,t} \\
-r^{t+1}_{t+1,0} & \cdots & -r^{t+1}_{t+1,t} & 0
\end{bmatrix}
\quad (5)

[Figure, panels (a) and (b). Labels: generated tokens “<S> </S> dream I a”; Relative Positions; Transformer-Decoder; Causal Self-attention; Update; R = key for insert at right; L = key for insert at left; h_t; Position Prediction; Word Prediction.]
\begin{aligned}
\mathcal{L}_{\text{ELBO}} &= \mathbb{E}_{\pi \sim q} \log p_\theta(y_\pi \mid x) + \mathcal{H}(q) \\
&= \mathbb{E}_{r_{2:T+1} \sim q} \Bigg( \sum_{t=1}^{T+1} \underbrace{\log p_\theta(y_{t+1} \mid y_{0:t}, r_{0:t}, x_{1:T'})}_{\text{Word Prediction Loss}} + \sum_{t=1}^{T} \underbrace{\log p_\theta(r_{t+1} \mid y_{0:t+1}, r_{0:t}, x_{1:T'})}_{\text{Position Prediction Loss}} \Bigg) + \mathcal{H}(q) \quad (10)
\end{aligned}
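\mathcal{L}_{\text{SAO}} = \frac{1}{B} \sum_{\pi \in \mathcal{B}} \log p_\theta(y_\pi \mid x)

where we assume

q(\pi \mid x, y) =
\begin{cases}
1/B & \pi \in \mathcal{B} \\
0 & \text{otherwise}
\end{cases}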
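The SAO objective is an average of log-likelihoods over a set of B generation orders. The sketch below assumes that the set consists of the B highest-scoring orders among some scored candidates; how the candidates are found is outside this example. The order names and log-probabilities are invented.

```python
import heapq
from typing import Dict


def sao_objective(order_log_probs: Dict[str, float], beam_size: int) -> float:
    """Average log p(y_pi | x) over the beam_size best-scoring orders.

    This corresponds to a distribution q that puts weight 1/B on each
    order in the selected set and zero weight on every other order.
    """
    if beam_size <= 0 or not order_log_probs:
        raise ValueError("need at least one order and a positive beam size")
    best = heapq.nlargest(beam_size, order_log_probs.values())
    return sum(best) / len(best)


# Invented log-likelihoods of one target sentence under three orders.
candidates = {"order_a": -2.0, "order_b": -4.0, "order_c": -9.0}
print(sao_objective(candidates, beam_size=2))
```

With a beam of one, the objective reduces to the log-likelihood of the single best order the model currently finds.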
Pre-defined Order     Description
Left-to-right (L2R)   Generate words from left to right. (Wu et al., 2018)
Right-to-left (R2L)   Generate words from right to left. (Wu et al., 2018)
Odd-Even (ODD)        Generate words at odd positions from left to right, then generate even positions. (Ford et al., 2018)
Balanced-tree (BLT)   Generate words with a top-down left-to-right order from a balanced binary tree. (Stern et al., 2019)
Syntax-tree (SYN)     Generate words with a top-down left-to-right order from the dependency tree. (Wang et al., 2018b)
Common-First (CF)     Generate all common words first from left to right, and then generate the others. (Ford et al., 2018)
Rare-First (RF)       Generate all rare words first from left to right, and then generate the remaining. (Ford et al., 2018)
Random (RND)          Generate words in a random order shuffled every time the example was loaded.
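Several of these pre-defined orders depend only on the sentence length and can be written as permutations of 0-based word indices. This is a sketch: the function names are illustrative, "odd positions" is read as 1-based (so index 0 comes first), and the balanced tree splits each span at its midpoint, which is one possible reading of "balanced". SYN, CF and RF are omitted because they need a parser or word frequencies.

```python
import random
from collections import deque
from typing import List, Optional


def l2r(n: int) -> List[int]:
    return list(range(n))


def r2l(n: int) -> List[int]:
    return list(range(n - 1, -1, -1))


def odd_even(n: int) -> List[int]:
    # 1-based odd positions are the 0-based even indices.
    return list(range(0, n, 2)) + list(range(1, n, 2))


def balanced_tree(n: int) -> List[int]:
    """Top-down, left-to-right (level-order) visit of a midpoint-split tree."""
    order: List[int] = []
    queue = deque([(0, n)])  # half-open spans [lo, hi)
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        order.append(mid)
        queue.append((lo, mid))
        queue.append((mid + 1, hi))
    return order


def random_order(n: int, seed: Optional[int] = None) -> List[int]:
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


for name, fn in [("L2R", l2r), ("R2L", r2l), ("ODD", odd_even), ("BLT", balanced_tree)]:
    print(name, fn(5))
```

Each function returns the order in which positions are generated; an insertion-based decoder trained on that order would place word `order[k]` at step k.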
        WMT16 Ro→En                      WMT18 En→Tr                      KFTT En→Ja
Model   BLEU   Ribes  Meteor TER         BLEU   Ribes  Meteor TER         BLEU   Ribes  Meteor TER
RND     20.20  79.35  41.00  63.20       03.04  55.45  19.12  90.60       17.09  70.89  35.24  70.11
L2R     31.82  83.37  52.19  50.62       14.85  69.20  33.90  71.56       30.87  77.72  48.57  59.92
R2L     31.62  83.18  52.09  50.20       14.38  68.87  33.33  71.91       30.44  77.95  47.91  61.09
ODD     30.11  83.09  50.68  50.79       13.64  68.85  32.48  72.84       28.59  77.01  46.28  60.12
BLT     24.38  81.70  45.67  55.38       08.72  65.70  27.40  77.76       21.50  73.97  40.23  64.39
SYN     29.62  82.65  50.25  52.14       –                                –
CF      30.25  83.22  50.71  50.72       12.04  67.61  31.18  74.75       28.91  77.06  46.46  61.56
RF      30.23  83.29  50.72  51.73       12.10  67.44  30.72  73.40       27.35  76.40  45.15  62.14
SAO     32.47  84.10  53.00  49.02       15.18  70.06  34.60  71.56       31.91  77.56  49.66  59.80
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang∗1, Zihang Dai∗12, Yiming Yang1, Jaime Carbonell1, Ruslan Salakhutdinov1, Quoc V. Le2
1Carnegie Mellon University, 2Google Brain
{zhiliny,dzihang,yiming,jgc,rsalakhu}@cs.cmu.edu, qvl@google.com
???
very natural.
#   Model                  RACE    SQuAD2.0 F1   SQuAD2.0 EM   MNLI m/mm     SST-2
1   BERT-Base              64.3    76.30         73.66         84.34/84.65   92.78
2   DAE + Transformer-XL   65.03   79.56         76.80         84.88/84.45   92.60
3   XLNet-Base (K = 7)     66.05   81.33         78.46         85.84/85.43   92.66
4   XLNet-Base (K = 6)     66.66   80.98         78.18         85.63/85.12   93.35
5                          65.55   80.15         77.27         85.32/85.05   92.78
6                          65.95   80.61         77.91         85.49/85.02   93.12
7                          66.34   80.65         77.87         85.31/84.99   92.66
8   + next-sent pred       66.76   79.83         76.94         85.32/85.09   92.89
Table 6: Ablation study. The results of BERT on RACE are taken from [39]. We run BERT on the other datasets