SLIDE 1

Rethinking the Generation Orders of Sequence

jcykcai

SLIDE 2

Why left-to-right?

  • Humans do it
  • But humans also plan before speaking:
  • First generate an abstract of what to say
  • Then serialize it into words
SLIDE 3

The Importance of Generation Order in Language Modeling

Nicolas Ford∗, Daniel Duckworth, Mohammad Norouzi, George E. Dahl

Google Brain, {nicf,duckworthd,mnorouzi,gdahl}@google.com

EMNLP18

SLIDE 4

Goal

  • Better generation order?
  • Wait! Does it really matter?
SLIDE 5

Framework

  • Two-pass language models
  • Vocabulary partition: first-pass tokens and second-pass tokens
  • Y = Y^1 + Y^2
  • Y^1 (template): consists only of first-pass tokens and special placeholders
  • Y^2: the remaining second-pass tokens, which fill the placeholders
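The split into Y^1 and Y^2 can be sketched in a few lines of Python. This is a minimal sketch, assuming whitespace tokenization, a caller-supplied first-pass vocabulary and a hypothetical placeholder symbol `_`; `make_template` and `fill_template` are illustrative names, not code from the paper. It also checks that filling the placeholders left to right with Y^2 recovers the sentence, i.e. Y = Y^1 + Y^2.

```python
PLACEHOLDER = "_"  # hypothetical placeholder symbol; assumed never to be a real token

def make_template(tokens, first_pass_vocab):
    """Split a sentence into a template Y^1 and second-pass tokens Y^2.

    Y^1 keeps first-pass tokens and replaces every other token with a
    placeholder; Y^2 lists the replaced tokens in left-to-right order.
    """
    template, second_pass = [], []
    for tok in tokens:
        if tok in first_pass_vocab:
            template.append(tok)
        else:
            template.append(PLACEHOLDER)
            second_pass.append(tok)
    return template, second_pass

def fill_template(template, second_pass):
    """Recombine Y^1 and Y^2 by filling placeholders left to right."""
    remaining = iter(second_pass)
    return [next(remaining) if tok == PLACEHOLDER else tok for tok in template]

sentence = "the team announced thursday that the starter will remain".split()
function_words = {"the", "that", "will"}  # toy first-pass vocabulary
y1, y2 = make_template(sentence, function_words)
print(" ".join(y1))  # the _ _ _ that the _ will _
print(y2)            # ['team', 'announced', 'thursday', 'starter', 'remain']
assert fill_template(y1, y2) == sentence
```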
SLIDE 6

Order Variants

Variants compared: common first, rare first, function first, content first, odd first

Table 1: Some example sentences from the dataset and their corresponding templates. Example sentences:

  • ” all you need to do if you want the nation ’s press camped on your doorstep is to say you once had a [UNK] in 1947 , ” he noted memorably in his diary . [EOS]
  • the team announced thursday that the 6-foot-1 , [UNK] starter will remain in detroit through the 2013 season . [EOS]
  • scotland ’s next game is a friendly against the czech republic at hampden on 3 march . [EOS]
  • of course , millions of additional homeowners did make a big mistake : they took advantage of ” liar loans ” and other [UNK] deals to buy homes they couldn ’t afford . [EOS]

First-pass tokens of each template for the second sentence (placeholders not shown):

  • common first: the that the , [UNK] will in the . [EOS]
  • rare first: team announced thursday 6-foot-1 starter remain detroit through 2013 season [EOS]
  • function first: the that the , will in through the . [EOS]
  • content first: team announced thursday 6-foot-1 [UNK] starter remain detroit 2013 season [EOS]
  • odd first: the team announced the 6-foot-1 will remain through the 2013 . [EOS]
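A frequency-based reading of common-first and rare-first, as a sketch: it assumes the first-pass vocabulary is the k most frequent tokens (common-first) or every other token (rare-first). The cut-off `k`, the three-sentence toy corpus, the placeholder `_` and the helper names are hypothetical, and the paper's partitions may be built differently; function-first and content-first would need a word-class list rather than counts.

```python
from collections import Counter

PLACEHOLDER = "_"  # hypothetical placeholder symbol

def first_pass_vocab(corpus, k, order):
    """First-pass vocabulary for a frequency-based order (hypothetical helper).

    common_first: the k most frequent tokens go into the template.
    rare_first:   every token outside the k most frequent goes into it.
    """
    counts = Counter(tok for sent in corpus for tok in sent)
    common = {tok for tok, _ in counts.most_common(k)}
    if order == "common_first":
        return common
    if order == "rare_first":
        return set(counts) - common
    raise ValueError(f"unknown order: {order}")

def template(sentence, vocab):
    """Keep first-pass tokens, replace the rest with placeholders."""
    return [tok if tok in vocab else PLACEHOLDER for tok in sentence]

corpus = [s.split() for s in [
    "the team announced the deal",
    "the game is a friendly",
    "the team is in the league",
]]
sent = corpus[0]
print(template(sent, first_pass_vocab(corpus, 3, "common_first")))  # ['the', 'team', '_', 'the', '_']
print(template(sent, first_pass_vocab(corpus, 3, "rare_first")))    # ['_', '_', 'announced', '_', 'deal']
```

The two templates are complementary: every position is a placeholder in exactly one of them.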

SLIDE 7

Language Models

  • The total probability of a sentence y is

$p(y) = p_1(y^{(1)})\, p_2(y^{(2)} \mid y^{(1)})$

  • The template y^(1) is a deterministic function of y, so no sum over templates is needed
  • Architecture: template decoder + template encoder + second-pass decoder
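Because the template is computed from y, the two passes simply add in log space. A toy check with made-up per-token probabilities; normalizing perplexity by the number of tokens in y (here 4, two of which are placeholders in the template) is an assumption about how the reported PPL is computed, and the function names are hypothetical.

```python
import math

def two_pass_log_prob(template_token_probs, second_pass_token_probs):
    """log p(y) = log p1(y^(1)) + log p2(y^(2) | y^(1))."""
    log_p1 = sum(math.log(p) for p in template_token_probs)
    log_p2 = sum(math.log(p) for p in second_pass_token_probs)
    return log_p1 + log_p2

def perplexity(total_log_prob, num_tokens):
    """PPL = exp(-log p(y) / N)."""
    return math.exp(-total_log_prob / num_tokens)

# Made-up probabilities: 4 template positions (2 are placeholders),
# then 2 second-pass tokens that fill those placeholders.
p1 = [0.5, 0.25, 0.5, 0.5]
p2 = [0.5, 0.25]
log_p = two_pass_log_prob(p1, p2)          # = 8 * log(1/2)
print(round(perplexity(log_p, num_tokens=4), 3))  # 4.0
```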

SLIDE 8

Experiments

  • PPL on LM1B
  • Content-dependent generation orders do have a large effect on model quality
  • Function-first is the best two-pass order (common-first is second); both beat the baseline, but not the enhanced baseline
  • It is easier to decide the syntactic structure first
  • Delay the rare tokens

Perplexity (lower is better):

Model              Train   Validation  Test
odd first          39.925  45.377      45.196
rare first         38.283  43.293      43.077
content first      38.321  42.564      42.394
common first       36.525  41.018      40.895
function first     36.126  40.246      40.085
baseline           38.668  41.888      41.721
enhanced baseline  35.945  39.845      39.726

SLIDE 9

Recent Advances

  • https://arxiv.org/pdf/1902.01370.pdf
  • https://arxiv.org/pdf/1902.02192.pdf
  • https://arxiv.org/pdf/1902.03249.pdf

SLIDE 10

Insertion Transformer: Flexible Sequence Generation via Insertion Operations

Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

ICML19

SLIDE 11

Model

  • Architecture: a Transformer whose decoder uses full self-attention
  • Slot representations: one for each slot (gap) between adjacent tokens of the current hypothesis, including both ends
  • Content-location distribution: what to insert (content c) and where to insert it (location l)

$p(c, l \mid x, \hat{y}_t) = \mathrm{InsertionTransformer}(x, \hat{y}_t)$
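A sketch of the content-location distribution as one softmax over every (slot, token) pair. It assumes a [num_slots, vocab_size] logit array is already available; random numbers stand in for the decoder's slot logits, and the function names are hypothetical. The paper may parameterize the distribution differently, so treat this as one plausible reading.

```python
import numpy as np

def joint_insertion_distribution(logits):
    """Softmax over all (slot, token) pairs of a [num_slots, vocab_size] array.

    The result plays the role of p(c, l | x, y_hat_t): one probability for
    every content c (column) and location l (row), summing to 1.
    """
    flat = logits.reshape(-1)
    flat = flat - flat.max()          # subtract the max for numerical stability
    exp = np.exp(flat)
    return (exp / exp.sum()).reshape(logits.shape)

def best_insertion(probs, vocab):
    """Greedy choice: the single most likely (content, location) pair."""
    slot, token_id = np.unravel_index(np.argmax(probs), probs.shape)
    return vocab[token_id], int(slot)

vocab = ["three", "friends", "ate", "lunch", "together", "<EOS>"]
canvas = ["ate"]                      # a canvas of n tokens has n + 1 slots
rng = np.random.default_rng(0)
# Hypothetical stand-in for the decoder's per-slot logits.
logits = rng.normal(size=(len(canvas) + 1, len(vocab)))
probs = joint_insertion_distribution(logits)
print(best_insertion(probs, vocab))
```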

SLIDE 12

Termination

  • Termination conditions
  • Sequence finalization
  • Slot finalization (enables parallel inference)
SLIDE 13

Insertion Transformer: Flexible Sequence Generation via Insertion Operations

Serial generation:

t  Canvas                                   Insertion
0  []                                       (ate, 0)
1  [ate]                                    (together, 1)
2  [ate, together]                          (friends, 0)
3  [friends, ate, together]                 (three, 0)
4  [three, friends, ate, together]          (lunch, 3)
5  [three, friends, ate, lunch, together]   (⟨EOS⟩, 5)

Parallel generation:

t  Canvas                                   Insertions
0  []                                       (ate, 0)
1  [ate]                                    (friends, 0), (together, 1)
2  [friends, ate, together]                 (three, 0), (lunch, 2)
3  [three, friends, ate, lunch, together]   (⟨EOS⟩, 5)

Figure 1. Examples demonstrating how the clause “three friends ate lunch together” can be generated using our insertion framework. In the serial generation process, one insertion is performed at a time. In the parallel generation process, multiple insertions are allowed per time step. Our model can either be trained to follow specific orderings or to maximize entropy over all valid actions. Some options permit highly efficient parallel decoding, as shown in our experiments.
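The two example runs in the figure can be replayed with a few lines of Python, which makes the slot-indexing convention explicit: slot i is the gap before canvas[i], slot len(canvas) is the end, and every insertion in a parallel step refers to the canvas as it was before that step. The function names are hypothetical.

```python
def apply_insertions(canvas, insertions):
    """Apply the (token, slot) insertions of one time step.

    Slots index the canvas *before* this step, so insert from the rightmost
    slot to the leftmost; that keeps the smaller slot indices valid.
    """
    canvas = list(canvas)
    for token, slot in sorted(insertions, key=lambda ins: ins[1], reverse=True):
        canvas.insert(slot, token)
    return canvas

def replay(steps, eos="<EOS>"):
    canvas = []
    for insertions in steps:
        if any(token == eos for token, _ in insertions):
            break  # the figure's last step emits <EOS>, which ends generation
        canvas = apply_insertions(canvas, insertions)
    return canvas

serial = [[("ate", 0)], [("together", 1)], [("friends", 0)],
          [("three", 0)], [("lunch", 3)], [("<EOS>", 5)]]
parallel = [[("ate", 0)], [("friends", 0), ("together", 1)],
            [("three", 0), ("lunch", 2)], [("<EOS>", 5)]]
print(replay(serial))    # ['three', 'friends', 'ate', 'lunch', 'together']
print(replay(parallel))  # ['three', 'friends', 'ate', 'lunch', 'together']
```

Both runs yield the same clause; the parallel one needs 3 content steps instead of 5.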
SLIDE 14

Training

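  • Form of a single training instance: sample a generation step, i.e. a partial sentence, and train on the insertions that can follow it
  • Loss variants (which missing tokens each slot is trained to insert):
  • Left-to-right
  • Balanced binary tree
  • Uniform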
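A sketch of how the three loss variants could weight the tokens still missing from one slot. As we read the Insertion Transformer paper, the balanced-binary-tree loss softmax-weights each missing token by its negative distance to the middle of the span, divided by a temperature τ (the τ values in the results slide); here left-to-right puts all weight on the leftmost missing token, which is a simplification. Function and argument names are hypothetical.

```python
import math

def slot_target_weights(span_len, order, tau=1.0):
    """Weights over the span_len tokens still missing from one slot (hypothetical helper).

    binary_tree:   w(i) ∝ exp(-|center - i| / tau); small tau concentrates on
                   the middle token, large tau approaches uniform.
    uniform:       every missing token gets the same weight.
    left_to_right: all weight on the leftmost missing token (simplified).
    """
    if span_len == 0:
        return []  # empty slot: with slot finalization the target is an end-of-slot token
    if order == "uniform":
        return [1.0 / span_len] * span_len
    if order == "left_to_right":
        return [1.0] + [0.0] * (span_len - 1)
    if order == "binary_tree":
        center = (span_len - 1) / 2
        scores = [math.exp(-abs(center - i) / tau) for i in range(span_len)]
        total = sum(scores)
        return [s / total for s in scores]
    raise ValueError(f"unknown order: {order}")

for tau in (0.5, 1.0, 2.0):
    print(tau, [round(w, 3) for w in slot_target_weights(5, "binary_tree", tau)])
```

A slot's loss would then be the weighted negative log-likelihood, -Σ_i w(i) log p(token_i, slot).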
SLIDE 15

Results

Columns: (1) BLEU (+EOS); (2) BLEU (+EOS), +Distillation; (3) BLEU (+EOS), +Distillation, +Parallel

Loss                   Termination  (1)            (2)            (3)
Left-to-Right          Sequence     20.92 (20.92)  23.29 (23.36)  –
Binary Tree (τ = 0.5)  Slot         20.35 (21.39)  24.49 (25.55)  25.33 (25.70)
Binary Tree (τ = 1.0)  Slot         21.02 (22.37)  24.36 (25.43)  25.43 (25.76)
Binary Tree (τ = 2.0)  Slot         20.52 (21.95)  24.59 (25.80)  25.33 (25.80)
Uniform                Sequence     19.34 (22.64)  22.75 (25.45)  –
Uniform                Slot         18.26 (22.16)  22.39 (25.58)  24.31 (24.91)

  • +Parallel is even better: BLEU (outside parentheses) rises for every slot-finalization model
  • Possible reason: greedy search may suffer from issues related to local search that are circumvented by making multiple updates to the hypothesis at once

SLIDE 16

Results

Model                                      BLEU   Iterations
Autoregressive Left-to-Right
  Transformer (Vaswani et al., 2017)       27.3   n
Semi-Autoregressive Left-to-Right
  SAT (Wang et al., 2018)                  24.83  n/6
  Blockwise Parallel (Stern et al., 2018)  27.40  ≈ n/5
Non-Autoregressive
  NAT (Gu et al., 2018)                    17.69  1
  Iterative Refinement (Lee et al., 2018)  21.61  10
Our Approach (Greedy)
  Insertion Transformer + Left-to-Right    23.94  n
  Insertion Transformer + Binary Tree      27.29  n
  Insertion Transformer + Uniform          27.12  n
Our Approach (Parallel)
  Insertion Transformer + Binary Tree      27.41  ≈ log2 n
  Insertion Transformer + Uniform          26.72  ≈ log2 n

  • Comparable performance: parallel Binary Tree (27.41) vs. the autoregressive Transformer (27.3)
  • Fewer generation iterations (≈ log2 n instead of n) => faster decoding?
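The ≈ log2 n iteration count can be checked with a small simulation. It assumes an idealized balanced binary tree in which every non-empty slot inserts the exact middle token of its missing span at each step (a trained model may deviate), and it does not count the final termination step. For n = 5 it reproduces the three content steps of the parallel example in Figure 1: ate; friends and together; three and lunch. The function name is hypothetical.

```python
import math

def balanced_tree_parallel_steps(n):
    """Parallel steps needed when each non-empty slot inserts its middle token."""
    spans = [(0, n)] if n > 0 else []   # half-open spans of still-missing tokens
    steps = 0
    while spans:
        steps += 1
        next_spans = []
        for lo, hi in spans:
            mid = (lo + hi) // 2         # the token inserted into this slot now
            for sub_lo, sub_hi in ((lo, mid), (mid + 1, hi)):
                if sub_lo < sub_hi:      # keep only slots that still miss tokens
                    next_spans.append((sub_lo, sub_hi))
        spans = next_spans
    return steps

for n in (5, 10, 100, 1000):
    print(n, balanced_tree_parallel_steps(n), math.ceil(math.log2(n + 1)))
```

Each step at least halves the longest missing span, so the count equals ceil(log2(n + 1)).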
SLIDE 17

Limitations

  • Must recompute the decoder hidden states for every position after each insertion, because the decoder's self-attention is not causal
  • Autoregressive vs. non-autoregressive
  • Expressive power vs. parallel decoding
SLIDE 18

Non-Monotonic Sequential Text Generation

Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho

(Figure: the sequence “how are you ?” generated as a binary tree; <end> marks an empty subtree.)

ICML19

SLIDE 19

Goal

  • Learn a good generation order without:
  • specifying an order in advance
  • additional annotation
SLIDE 20

Formulation

  • Generate a word at an arbitrary position, then recursively generate the words to its left and the words to its right.

Figure 1: the sequence “how are you ?” generated as a binary tree; <end> marks an empty subtree.

SLIDE 21

Formulation

  • Generation proceeds in level-order traversal (green numbers in Figure 1)
  • The output is read off by an in-order traversal (blue numbers in Figure 1)

Figure 1: the sequence “how are you ?” generated as a binary tree; <end> marks an empty subtree.
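The two traversals can be sketched as follows. This assumes, as in the figure, that every word node has exactly two children and that <end> marks an empty subtree. The level-order list for “how are you ?” is our reading of the figure's tree (root “are”, children “how” and “?”, with “you” as the left child of “?”), and the helper name is hypothetical.

```python
from collections import deque

END = "<end>"

def inorder_from_level_order(tokens):
    """Rebuild a binary tree from its level-order tokens, then read it in order."""
    it = iter(tokens)
    root = {"tok": next(it), "left": None, "right": None}
    queue = deque([root])
    while queue:                      # level order: parents before children
        node = queue.popleft()
        if node["tok"] == END:
            continue                  # <end> has no children
        for side in ("left", "right"):
            child = {"tok": next(it), "left": None, "right": None}
            node[side] = child
            queue.append(child)

    out = []
    def visit(node):                  # in order: left subtree, node, right subtree
        if node is None or node["tok"] == END:
            return
        visit(node["left"])
        out.append(node["tok"])
        visit(node["right"])
    visit(root)
    return out

level = ["are", "how", "?", END, END, "you", END, END, END]
print(" ".join(inorder_from_level_order(level)))  # how are you ?
```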

SLIDE 22

Imitation Learning

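  • Learn a generation policy that mimics the actions of an oracle generation policy
  • Oracle policies
  • Uniform oracle: chooses uniformly among the words still to be placed, splitting the remaining span like a random pivot in quick-sort
  • Coaching oracle: reinforces the policy’s own preferences
  • Annealed coaching oracle: a mixture of the two

$\pi^*_{\text{coaching}}(a \mid s) \propto \pi^*_{\text{uniform}}(a \mid s)\, \pi(a \mid s)$

$\pi^*_{\text{annealed}}(a \mid s) = \beta\, \pi^*_{\text{uniform}}(a \mid s) + (1 - \beta)\, \pi^*_{\text{coaching}}(a \mid s)$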
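A sketch of the three oracles at one tree node, where the valid actions are the distinct words still to be placed in the current span. The toy policy probabilities, treating repeated words as one action, and the fallback to uniform when the policy puts no mass on any valid action are our assumptions; β would be annealed over training. Function names are hypothetical.

```python
def uniform_oracle(valid_actions):
    """pi*_uniform: every valid action (word still to be placed) is equally likely."""
    return {a: 1.0 / len(valid_actions) for a in valid_actions}

def coaching_oracle(valid_actions, policy):
    """pi*_coaching(a|s) ∝ pi*_uniform(a|s) * pi(a|s): the learner's own
    preferences, restricted to and renormalized over the valid actions."""
    uniform = uniform_oracle(valid_actions)
    scores = {a: uniform[a] * policy.get(a, 0.0) for a in valid_actions}
    total = sum(scores.values())
    if total == 0.0:  # assumption: fall back to uniform if pi ignores all valid actions
        return uniform
    return {a: s / total for a, s in scores.items()}

def annealed_oracle(valid_actions, policy, beta):
    """beta * uniform + (1 - beta) * coaching."""
    uniform = uniform_oracle(valid_actions)
    coaching = coaching_oracle(valid_actions, policy)
    return {a: beta * uniform[a] + (1 - beta) * coaching[a] for a in valid_actions}

# Root of "how are you ?": any of the four words may be generated first.
valid = ["how", "are", "you", "?"]
policy = {"are": 0.6, "how": 0.2, "you": 0.1, "?": 0.05, "hello": 0.05}  # made-up pi(a|s)
for beta in (1.0, 0.5, 0.0):
    dist = annealed_oracle(valid, policy, beta)
    print(beta, {a: round(p, 3) for a, p in dist.items()})
```

With β = 1 the oracle is purely uniform (exploration); as β drops toward 0 it follows the policy's own preferred order.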

SLIDE 23

Imitation Learning

  • Annealed coaching oracle
  • The uniform (random) oracle encourages exploration
  • Coaching reinforces the policy’s own preferences, which leads to a specific generation order
  • A special case for comparison: the deterministic left-to-right oracle (standard order)
SLIDE 24

Policy Networks

  • A partial binary tree is treated as a flat sequence of nodes in level-order traversal
  • So the policy is essentially still a sequence model
  • A Transformer or an LSTM can be applied
SLIDE 25

Experiments

  • Language modeling on the Persona-Chat dataset

Oracle      %Novel  %Unique  Avg. Tokens  Avg. Span  BLEU
left-right  17.8    97.0     11.9         1.0        47.0
uniform     98.3    99.9     13.0         1.43       40.0
annealed    93.1    98.2     10.6         1.31       56.2
Validation  97.0    100      12.1         –          –

Table 1. Statistics computed over 10,000 sampled sentences (in-order traversals of sampled trees with <end> tokens removed) for policies trained on Persona-Chat. A sample is novel when it is not in the training set. Percent unique is the cardinality of the set of sampled sentences divided by the number of sampled sentences.
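The two sample statistics in the caption are simple to compute. A sketch with a made-up training set and samples; the function name is hypothetical, and novelty is counted per sample, so a repeated novel sentence counts twice.

```python
def novelty_and_uniqueness(samples, training_set):
    """Return (%Novel, %Unique) as defined in the table caption."""
    n = len(samples)
    novel = sum(1 for s in samples if s not in training_set)  # not seen in training
    unique = len(set(samples))                                # distinct samples
    return 100.0 * novel / n, 100.0 * unique / n

train = {"how are you ?", "i like dogs ."}
samples = ["how are you ?", "i like cats .", "i like cats .", "what is up ?"]
print(novelty_and_uniqueness(samples, train))  # (75.0, 75.0)
```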

SLIDE 26

Experiments

  • POS analysis at different levels of the generated trees
  • Punctuation is generated first => an easy-first strategy
  • Pronouns come before nouns and verbs => resembles a dependency tree
SLIDE 27

Experiments

  • Machine translation

Oracle          Validation                            Test
                BLEU (BP)     Meteor  YiSi   Ribes    BLEU (BP)     Meteor  YiSi   Ribes
left-right      32.30 (0.95)  31.96   69.41  84.80    28.00 (1.00)  30.10   65.22  82.29
uniform         24.50 (0.84)  27.98   66.40  82.66    21.40 (0.86)  26.40   62.41  80.00
annealed        26.80 (0.88)  29.67   67.88  83.61    23.30 (0.91)  27.96   63.38  80.91
+tree-encoding  28.00 (0.86)  30.15   68.43  84.36    24.30 (0.91)  28.59   63.87  81.64
+<end>-tuning   29.10 (0.99)  31.00   68.81  83.51    24.60 (1.00)  29.30   64.18  80.53

(BP: BLEU brevity penalty)

  • BLEU rewards exact n-gram matches up to 4-grams, so it is sensitive to word order
  • The other three measures are less sensitive to exact word order and focus more on semantics
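To see why BLEU is sensitive to word order, here is a minimal sentence-level BLEU (single reference, no smoothing). Real evaluations use corpus-level BLEU from a standard toolkit, so scores will differ; this only illustrates the mechanism. A reordering of the reference keeps every unigram but loses all 4-gram matches. The function names are hypothetical.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU: brevity penalty (BP) times the
    geometric mean of clipped n-gram precisions for n = 1..max_n."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngram_counts(hypothesis, n), ngram_counts(reference, n)
        matches = sum(min(count, ref[g]) for g, count in hyp.items())  # clipped counts
        if matches == 0:
            return 0.0  # one zero precision makes unsmoothed BLEU zero
        log_precisions.append(math.log(matches / sum(hyp.values())))
    c, r = len(hypothesis), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat".split()
shuffled = "on the mat the cat sat".split()           # same words, different order
print(sentence_bleu(reference, reference))            # 1.0
print(sentence_bleu(shuffled, reference))             # 0.0 (no 4-gram matches)
print(sentence_bleu(shuffled, reference, max_n=1))    # 1.0 (all unigrams match)
```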
SLIDE 28

Limitations

  • Binary trees only (extension: N-ary trees)
  • Can only produce a subset of all possible generation orders
  • Projective generation: no two edges cross when the nodes are lined up following the in-order traversal

SLIDE 29

Insertion-based Decoding with automatically Inferred Generation Order

Jiatao Gu†, Qi Liu† and Kyunghyun Cho†‡

†Facebook AI Research, ‡New York University, CIFAR Azrieli Global Scholar

{jgu, qiliu, kyunghyuncho}@fb.com

SLIDE 30

Goal

  • How can we decode a sequence in its best order?
SLIDE 31

Model Design

  • Insertion-based (again)
  • Joint prediction of position and token
  • The problem with absolute positions
  • They change over decoding time (recomputing is costly!)
SLIDE 32

Relative Positions

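$r^t_{i,j} = \begin{cases} -1 & z^t_j > z^t_i \quad \text{(left)} \\ 0 & z^t_j = z^t_i \quad \text{(middle)} \\ 1 & z^t_j < z^t_i \quad \text{(right)} \end{cases}$

where z^t_i is the absolute position of the i-th generated word y_i at step t.

$R^{t+1} = \begin{bmatrix} R^t & \begin{matrix} -r^{t+1}_{t+1,0} \\ \vdots \\ -r^{t+1}_{t+1,t} \end{matrix} \\ \begin{matrix} r^{t+1}_{t+1,0} & \cdots & r^{t+1}_{t+1,t} \end{matrix} & 0 \end{bmatrix} \qquad (5)$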
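A sketch of the relative-position bookkeeping, assuming a new word is always placed immediately left or right of an existing “parent” word. The new row copies the parent's row except at the parent itself, the new column is its negation, and no existing entry of R^t changes, which is what avoids the recomputation problem of absolute positions. The decoding trace for “<s> I have a dream </s>” and all helper names are hypothetical; the model's position predictor is not shown.

```python
def relative_matrix(positions):
    """R[i][j] = -1 if word i is left of word j, 0 if i == j, +1 if right of it.

    positions[i] is the absolute position z_i of the i-th generated word.
    """
    def r(i, j):
        if positions[j] > positions[i]:
            return -1
        if positions[j] < positions[i]:
            return 1
        return 0
    n = len(positions)
    return [[r(i, j) for j in range(n)] for i in range(n)]

def insert_word(R, parent, side):
    """Add a word placed immediately left (side=-1) or right (side=+1) of word
    `parent`; returns R^{t+1}. Existing entries are copied unchanged."""
    new_row = list(R[parent])  # same relation to every other word as the parent...
    new_row[parent] = side     # ...except to the parent itself
    grown = [row + [-new_row[i]] for i, row in enumerate(R)]  # new column = -new row
    grown.append(new_row + [0])                                # r_{t+1,t+1} = 0
    return grown

def read_off(R, words):
    """Left-to-right order: a word's rank is how many words it is right of."""
    rank = [sum(1 for x in row if x == 1) for row in R]
    return [w for _, w in sorted(zip(rank, words))]

# Hypothetical decoding trace: (word, parent index, side).
words = ["<s>", "</s>"]
R = relative_matrix([0, 1])
for word, parent, side in [("dream", 0, +1), ("I", 2, -1), ("a", 2, -1), ("have", 3, +1)]:
    R = insert_word(R, parent, side)
    words.append(word)
print(" ".join(read_off(R, words)))  # <s> I have a dream </s>
```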

SLIDE 33

Decoding

(Figure: decoding with relative positions. A Transformer decoder reads the partial output, with tokens <S>, </S>, dream, I and a, together with its relative-position matrix, and its states h_t are updated with causal self-attention. This is followed by position prediction, where keys R and L stand for “insert at right” and “insert at left” of an existing word, and by word prediction.)
SLIDE 34

Learning

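$\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{\pi \sim q} \log p_\theta(y_\pi \mid x) + \mathcal{H}(q)$

$\quad = \mathbb{E}_{r_{2:T+1} \sim q} \Big( \underbrace{\sum_{t=1}^{T+1} \log p_\theta(y_{t+1} \mid y_{0:t}, r_{0:t}, x_{1:T'})}_{\text{Word Prediction Loss}} + \underbrace{\sum_{t=1}^{T} \log p_\theta(r_{t+1} \mid y_{0:t+1}, r_{0:t}, x_{1:T'})}_{\text{Position Prediction Loss}} \Big) + \mathcal{H}(q) \qquad (10)$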

  • Maximize the evidence lower bound (ELBO)
  • q(π | x, y): approximate posterior distribution over generation orders
SLIDE 35

Searched Adaptive Order (SAO)

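$\mathcal{L}_{\text{SAO}} = \frac{1}{B} \sum_{\pi \in \mathcal{B}} \log p_\theta(y_\pi \mid x)$

where we assume

$q(\pi \mid x, y) = \begin{cases} 1/B & \pi \in \mathcal{B} \\ 0 & \text{otherwise.} \end{cases}$

  • q(π | x, y) is approximated by beam search: 𝓑 is the set of B orders found by the beam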
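With q uniform over the beam, E_q[log p] is exactly L_SAO and H(q) = log B is a constant, so for a fixed beam, maximizing the ELBO over θ reduces to maximizing L_SAO. The sketch below runs a beam search over generation orders of n positions with a hypothetical per-step scorer standing in for log p_θ (the real model scores words and positions with its decoder); names and the scorer are illustrative only.

```python
import math

def beam_search_orders(n, step_log_prob, beam_size):
    """Beam search over generation orders (permutations of n positions).

    step_log_prob(order_so_far, next_pos) is a stand-in for log p_theta.
    Returns the beam_size best complete orders with their total log-probs.
    """
    beam = [((), 0.0)]
    for _ in range(n):
        candidates = []
        for order, score in beam:
            for pos in range(n):
                if pos not in order:
                    candidates.append((order + (pos,), score + step_log_prob(order, pos)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
    return beam

def sao_loss(beam):
    """L_SAO = (1/B) * sum over orders in the beam of log p(y_pi | x)."""
    return sum(score for _, score in beam) / len(beam)

def toy_step_log_prob(order, pos):
    # Hypothetical scorer: each already-generated position to the right of
    # `pos` costs 0.5 nats, so left-to-right is the single best order.
    return -0.5 * sum(1 for p in order if p > pos)

beam = beam_search_orders(4, toy_step_log_prob, beam_size=3)
for order, score in beam:
    print(order, score)
print("L_SAO:", sao_loss(beam), "H(q) = log B:", math.log(len(beam)))
```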

SLIDE 36

Experiments

Pre-defined orders:

  • Left-to-right (L2R): generate words from left to right (Wu et al., 2018)
  • Right-to-left (R2L): generate words from right to left (Wu et al., 2018)
  • Odd-Even (ODD): generate words at odd positions from left to right, then generate even positions (Ford et al., 2018)
  • Balanced-tree (BLT): generate words with a top-down left-to-right order from a balanced binary tree (Stern et al., 2019)
  • Syntax-tree (SYN): generate words with a top-down left-to-right order from the dependency tree (Wang et al., 2018b)
  • Common-First (CF): generate all common words first from left to right, then generate the others (Ford et al., 2018)
  • Rare-First (RF): generate all rare words first from left to right, then generate the remaining ones (Ford et al., 2018)
  • Random (RND): generate words in a random order, shuffled every time the example is loaded

Each cell group: BLEU, Ribes, Meteor, TER (TER: lower is better)

Model  WMT16 Ro→En                 WMT18 En→Tr                 KFTT En→Ja
RND    20.20  79.35  41.00  63.20  3.04   55.45  19.12  90.60  17.09  70.89  35.24  70.11
L2R    31.82  83.37  52.19  50.62  14.85  69.20  33.90  71.56  30.87  77.72  48.57  59.92
R2L    31.62  83.18  52.09  50.20  14.38  68.87  33.33  71.91  30.44  77.95  47.91  61.09
ODD    30.11  83.09  50.68  50.79  13.64  68.85  32.48  72.84  28.59  77.01  46.28  60.12
BLT    24.38  81.70  45.67  55.38  8.72   65.70  27.40  77.76  21.50  73.97  40.23  64.39
SYN    29.62  82.65  50.25  52.14  –                           –
CF     30.25  83.22  50.71  50.72  12.04  67.61  31.18  74.75  28.91  77.06  46.46  61.56
RF     30.23  83.29  50.72  51.73  12.10  67.44  30.72  73.40  27.35  76.40  45.15  62.14
SAO    32.47  84.10  53.00  49.02  15.18  70.06  34.60  71.56  31.91  77.56  49.66  59.80

SLIDE 37

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang∗1, Zihang Dai∗12, Yiming Yang1, Jaime Carbonell1, Ruslan Salakhutdinov1, Quoc V. Le2

1Carnegie Mellon University, 2Google Brain

{zhiliny,dzihang,yiming,jgc,rsalakhu}@cs.cmu.edu, qvl@google.com

SLIDE 38

XLNet: Generalized Autoregressive Pretraining for Language Understanding


???

SLIDE 39

BERT

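  • Motivation of BERT: utilize bidirectional context
  • Solution of BERT: denoising auto-encoder (predict masked tokens)
  • Problems of BERT:
  • Pretrain-finetune discrepancy (the [MASK] symbol never appears during fine-tuning)
  • Independence assumption: masked tokens are predicted independently of each other (non-autoregressive)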
SLIDE 40

XLNet

  • Left-to-right? No
  • Right-to-left? No
  • Both? No
  • All possible factorization orders (permutations)
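A sketch of how one factorization order becomes an attention mask. As we understand the XLNet paper, tokens keep their original positions and the sampled order is realized only through the mask; the strict mask below matches what the query stream uses (the content stream also lets a token attend to itself), and the two-stream machinery is not shown. The function name is hypothetical.

```python
import numpy as np

def permutation_attention_mask(order):
    """mask[i, j] is True when token i may attend to token j.

    `order` is a factorization order: a permutation of positions 0..n-1.
    Token i sees exactly the tokens predicted before it in that order, so
    p(x) is still factorized autoregressively, just not left to right.
    """
    order = np.asarray(order)
    rank = np.empty(len(order), dtype=int)
    rank[order] = np.arange(len(order))   # rank[pos] = step at which pos is predicted
    return rank[None, :] < rank[:, None]  # mask[i, j] = rank[j] < rank[i]

mask = permutation_attention_mask([2, 0, 3, 1])
print(mask.astype(int))
```

With the identity order [0, 1, 2, 3] the mask reduces to the usual strictly causal (left-to-right) mask.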
SLIDE 41

Benefits

  • Still an autoregressive model
  • Learns to utilize bidirectional context
  • No data corruption, so no pretrain-finetune discrepancy
  • No independence assumption, so more expressive
SLIDE 42

Lesson

  • Given the aforementioned papers, the idea of XLNet seems very natural.
  • It is not hard to make BIG NEWS if we:
  • Always think of fundamental problems
  • Read some good papers
  • Have TPUs
SLIDE 43

Other Techniques

  • Transformer-XL
  • Partial prediction: only predict the last tokens in a factorization order
  • Span-based prediction: mask a consecutive span
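Partial prediction can be sketched as cutting a factorization order so that only about 1/K of its tokens, the last ones, become prediction targets (K = 7 and K = 6 in the ablation table). Choosing the cut point c by rounding n/K, and the sample permutation, are our assumptions; the function name is hypothetical.

```python
def partial_prediction_targets(order, K):
    """Split a factorization order z into context z[:c] and targets z[c:].

    c is chosen so that |z| / (|z| - c) ≈ K, i.e. only about 1/K of the
    tokens (the last ones in the order) are predicted; at least one is.
    """
    n = len(order)
    num_targets = max(1, round(n / K))  # assumption: round n / K
    c = n - num_targets
    return list(order[:c]), list(order[c:])

order = [5, 2, 9, 0, 7, 1, 8, 3, 6, 4, 11, 10, 13, 12]  # a made-up permutation of 14 positions
context, targets = partial_prediction_targets(order, K=7)
print(targets)  # [13, 12]: only the last 14 / 7 = 2 tokens are predicted
```

Predicting only the last tokens means each target has a long context in the sampled order, which makes the prediction better conditioned.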
SLIDE 44

Ablation Study

#  Model                 RACE   SQuAD2.0 F1  SQuAD2.0 EM  MNLI m/mm    SST-2
1  BERT-Base             64.3   76.30        73.66        84.34/84.65  92.78
2  DAE + Transformer-XL  65.03  79.56        76.80        84.88/84.45  92.60
3  XLNet-Base (K = 7)    66.05  81.33        78.46        85.84/85.43  92.66
4  XLNet-Base (K = 6)    66.66  80.98        78.18        85.63/85.12  93.35
5  - memory              65.55  80.15        77.27        85.32/85.05  92.78
6  - span-based pred     65.95  80.61        77.91        85.49/85.02  93.12
7  - bidirectional data  66.34  80.65        77.87        85.31/84.99  92.66
8  + next-sent pred      66.76  79.83        76.94        85.32/85.09  92.89

Table 6: Ablation study. The results of BERT on RACE are taken from [39]. We run BERT on the other datasets.

  • The permutation LM objective is superior: XLNet-Base (rows 3 and 4) beats DAE + Transformer-XL (row 2) on every column
  • Transformer-XL memory, span-based prediction and bidirectional data also matter: rows 5 to 7, which remove them, score lower than row 4 on every column
SLIDE 45

Discussions

  • Why token-by-token?
  • Can we do deletion and substitution?