Neural Text Generation from Structured Data with Application to the Biography Domain
Rémi Lebret, David Grangier, Michael Auli
From Structured Data to Sentences
User-friendly access to structured data:
- Question answering
- Virtual assistant
- Profile summary
Machines like to read structured data; people don’t.
Cloudy, with temperatures between 10 and 20 degrees. South wind around 20 mph.
Give me the flights leaving Denver August ninth coming back to Boston before 4pm.
PROS | CONS
Natural language | Repetitive
No training | Scales poorly
Small datasets with limited vocabularies
- 700K biographies
- 400K-word vocabulary
Conditioning on tables (fields + values)
Copy actions
Neural language model for constrained sentence generation
Success in:
- Caption generation (Vinyals et al., 2015)
- Machine translation (Bahdanau et al., 2014)
- Modeling conversations and dialogues (Shang et al., 2015)
Table descriptors:
- Name of the field
- Position from the start
- Position from the end
Local conditioning → already generated fields
Global conditioning → fields and values
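The three table descriptors can be illustrated with a minimal Python sketch. The `table_descriptors` helper, the field names `name` and `birthdate`, whitespace tokenization, and 1-based positions are all assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict


def table_descriptors(table):
    """Map each table token to its (field, pos_from_start, pos_from_end) triples.

    Hypothetical helper: positions are 1-based, which is an assumption.
    """
    descriptors = defaultdict(list)
    for field, value in table.items():
        tokens = value.split()  # assumption: whitespace tokenization
        n = len(tokens)
        for i, token in enumerate(tokens):
            # i + 1 counts from the start of the field, n - i from its end
            descriptors[token].append((field, i + 1, n - i))
    return dict(descriptors)


# Toy infobox; the field names are illustrative only
table = {"name": "john doe", "birthdate": "18 april 1352"}
desc = table_descriptors(table)
print(desc["doe"])    # [('name', 2, 1)]
print(desc["april"])  # [('birthdate', 2, 2)]
```

A token that occurs in several fields simply collects one triple per occurrence.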
john doe ( 18 april 1352 ) is a
Embeddings-based model
Aggregating embeddings → component-wise max
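Component-wise max aggregation can be sketched in a few lines of plain Python. The sketch assumes that a token matching several table descriptors contributes one embedding vector per descriptor; the 3-dimensional vectors and the descriptor keys are made up for illustration.

```python
def componentwise_max(vectors):
    """Aggregate equal-length vectors by taking the max of each dimension."""
    return [max(column) for column in zip(*vectors)]


# Hypothetical embeddings of two descriptors attached to the same token
descriptor_embeddings = {
    ("name", 1): [0.2, -0.5, 0.1],
    ("spouse", 2): [-0.3, 0.4, 0.0],
}
aggregated = componentwise_max(list(descriptor_embeddings.values()))
print(aggregated)  # [0.2, 0.4, 0.1]
```

The result has the same dimension however many descriptors are attached, so the network input keeps a fixed size.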
Input: $x = (c_t, z_{c_t}, g_f, g_w)$
Embedding: $\psi_\alpha(x) = [\psi(c_t); \psi(z_{c_t}); \psi(g_f); \psi(g_w)]$
Non-linear transformation: $h(x)$
Final score: $\phi_\alpha(x, w) = \phi_\alpha^{W}(x, w) + \phi_\alpha^{Q}(x, w)$
Softmax function: $\log P(w \mid x) = \phi_\alpha(x, w) - \log \sum_{w'} \exp \phi_\alpha(x, w')$
Training: maximize the likelihood of the training text: $L_\alpha(s) = \sum_{t=1}^{T} \log P(w_t \mid c_t, z_{c_t}, g_f, g_w)$
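The scoring step, where a final score is the sum of two components and is normalized with a softmax, can be sketched as follows. This is not the authors' implementation: the names `word_scores` and `copy_scores`, the toy score values, and the convention that a candidate missing from one component contributes 0 are assumptions for illustration.

```python
import math


def final_scores(word_scores, copy_scores):
    """Sum the two score components over the union of candidates.

    Assumption: a candidate absent from one component contributes 0 there.
    """
    candidates = set(word_scores) | set(copy_scores)
    return {w: word_scores.get(w, 0.0) + copy_scores.get(w, 0.0)
            for w in candidates}


def log_softmax(scores):
    """Return log P(w | x) = score(w) - log sum_w' exp(score(w'))."""
    m = max(scores.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {w: s - log_z for w, s in scores.items()}


# Toy scores: vocabulary words plus tokens that can be copied from the table
word_scores = {"is": 1.0, "a": 0.5, "doe": 0.2}
copy_scores = {"doe": 1.5, "1352": 0.7}
log_probs = log_softmax(final_scores(word_scores, copy_scores))
best = max(log_probs, key=log_probs.get)
print(best)
```

Training then sums these log-probabilities of the reference words over the positions of a sentence and maximizes that sum.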
- Infobox
- Introduction section (only the 1st sentence is used for generation)
Available at
https://rlebret.github.io/wikipedia-biography-dataset/
without copy actions with copy actions
Continuing an incomplete field
Handling transitions between fields
[Figure: BLEU and generation time in ms for Table NLM at different beam sizes; annotations: "200 ms", "thanks to GPU"]
MODEL | GENERATED SENTENCE
Template KN | frederick parker-rhodes ( born november 21 , 1914 – march 2 , 1987 ) was an english cricketer .
Table NLM + Local (field, start) | frederick parker-rhodes ( 21 november 1914 – 2 march 1987 ) was an australian rules footballer who played with carlton in the victorian football league ( vfl ) during the XXXXs and XXXXs .
+ Global (field) | frederick parker-rhodes ( 21 november 1914 – 2 march 1987 ) was an english mycology and plant pathology , mathematics at the university of uk .
+ Global (field, word) | frederick parker-rhodes ( 21 november 1914 – 2 march 1987 ) was a british computer scientist , best known for his contributions to computational linguistics .
- Copying facts from the table.
- Understanding the type of fields.
- Understanding the relation between record tokens and table tokens.
- Network with low capacity → fast generation.
https://rlebret.github.io/wikipedia-biography-dataset/