Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers


  1. Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers. Georgios Spithourakis, Steffen Petersen, Sebastian Riedel. Machine Reading Group.

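  2. Numeracy [Figure: a cloud of tokens mixing words (the, cat, mat, dog, fox, brown, sat, jumped, sleeping), number words (one, two, three, four), numerals (2000, 3.14…, -1, 0.001, 2018, 1.73, 5/8, 2/3, 1+2i, π, i) and number sets (ℕ, ℤ, ℚ, ℝ, ℂ), with the labels ‘words’, ‘numerals’ and ‘numbers’]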

  3. Literate Language Models: $P_{LM}(\text{text})$ reflects how plausible a text is (semantically, grammatically, etc.). Examples: ‘A apple eats I’, ‘I eats an apple’, ‘An apple eats me’, ‘I eat an apple’.

  4. Numerate Language Models: $P_{LM}(\text{text})$ next to $P(\text{height})$ [Figure: distribution over heights, x-axis from 1.4 to 2.2]. Examples: ‘John is 0 m tall’, ‘John is 1.7 m tall’, ‘John is 2 m tall’, ‘John is 999 m tall’.

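  5. Numeracy Matters. Example sentences, each shown next to alternative values for its number: ‘Unemployment of the US is 5 %’; ‘Patient’s temperature is 36.6 degrees’; ‘Our model is 10 times better than the baseline’. Alternative values on the slide: 0.5, 0.0, 50, 23.2, 500, 41.9, 98.6, 0, 100, 1000.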

  6. Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?

  8. A Neural Language Model [Diagram: the input word $w_{t-1}$ is embedded as $e_t$; an RNN moves from hidden state $h_{t-1}$ to $h_t$; the output is $p(w_t \mid h_t)$]

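  9. A Neural Language Model: $p(w_t \mid h_t) = \mathrm{softmax}(w_t)$ over a vocabulary $V$ [Diagram: input $w_{t-1}$, embedding $e_t$, RNN states $h_{t-1}$ and $h_t$, output softmax]. Word entries: the, cat, mat, sat, …, UNK; rare words such as petrichor, unothrorgaphy and Spithourakis fall under UNK. Numeral entries: 1, 2, 1.7, 2018, …, UNKNUM; numerals such as 2.1, 0.731, 9,846,321 and 2018.3 fall under UNKNUM.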
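
The output layer on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the vocabulary is the toy one from the slide, the regular expression for what counts as a numeral is an assumption, and `to_vocab` and `token_prob` are hypothetical helper names.

```python
import math
import re

# Toy vocabulary taken from the slide; a real model would have thousands of entries.
VOCAB = ["the", "cat", "mat", "sat", "UNK", "1", "2", "1.7", "2018", "UNKNUM"]

# Assumed numeral pattern: optional sign, digits with optional commas, optional decimals.
NUMERAL_RE = re.compile(r"-?\d[\d,]*(\.\d+)?")

def to_vocab(token):
    """Map a token to itself if known, else to UNKNUM (numerals) or UNK (words)."""
    if token in VOCAB:
        return token
    return "UNKNUM" if NUMERAL_RE.fullmatch(token) else "UNK"

def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def token_prob(token, scores):
    """p(w_t | h_t) for a token, given one score per vocabulary entry."""
    probs = softmax(scores)
    return probs[VOCAB.index(to_vocab(token))]
```

With this setup `token_prob("2.1", scores)` and `token_prob("0.731", scores)` are always identical, because both numerals collapse onto the single UNKNUM entry.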

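  10. Evaluation: Adjusted Perplexity. Perplexity, for ‘John is 2.1 m tall’, scores the numeral through its class: $p(2.1) = p(\text{UNKNUM})$. BUT the class also holds 0.731, 9,846,321, 2018.3, …, so $p(\text{UNKNUM}) = p(2.1) + p(0.731) + p(9{,}846{,}321) + p(2018.3) + \dots$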

  11. Evaluation: Adjusted Perplexity. For ‘John is 2.1 m tall’, perplexity uses $p(2.1) = p(\text{UNKNUM})$. Adjusted Perplexity [Ueberla, 1994] uses $p(2.1) = \dfrac{p(\text{UNKNUM})}{|\{w \in \text{UNKNUM}\}|}$, with the members of UNKNUM (2.1, 0.731, 9,846,321, 2018.3, …) counted from test data. A.k.a. Unknown-Penalised Perplexity [Ahn et al., 2016].
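
The adjustment can be sketched as follows. This is an illustration under stated assumptions: every out-of-vocabulary token in the toy input is a numeral, the UNKNUM class size is the number of distinct out-of-vocabulary numerals in the test tokens, and `adjusted_perplexity` with its argument layout (one probability dict per position) is a hypothetical interface.

```python
import math

def adjusted_perplexity(tokens, step_probs, vocab):
    """Perplexity where an OOV numeral gets p(UNKNUM) divided by the class size.

    tokens:     test tokens, in order
    step_probs: one dict per position, mapping vocabulary entries to probabilities
    vocab:      set of in-vocabulary entries (must include "UNKNUM")
    """
    # Assumption: the class size is the number of distinct OOV numerals in the test data.
    oov_types = {t for t in tokens if t not in vocab}
    total_log_prob = 0.0
    for token, probs in zip(tokens, step_probs):
        if token in vocab:
            p = probs[token]
        else:
            p = probs["UNKNUM"] / len(oov_types)
        total_log_prob += math.log(p)
    return math.exp(-total_log_prob / len(tokens))
```

For two test tokens `["2.1", "0.731"]`, each predicted with `p(UNKNUM) = 0.5`, plain perplexity would be 2 while the adjusted value is 4, since each numeral only receives half of the class mass.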

  12. Datasets. Clinical Dataset: 16,015 clinical patient reports. Source: London Chest Hospital. Scientific Dataset: 20,962 paragraphs from scientific papers. Source: ARXIV. [Chart: share of tokens, words 96% vs. numerals 4%]

  13. Results: Adjusted Perplexity (lower is better). Scientific: all tokens 80.62, words 51.83, numerals 3,505,856.25. Clinical: all tokens 8.91, words 5.99, numerals 58,443.72.

  15. Results: Adjusted Perplexity (lower is better). Scientific: all tokens 80.62, words 51.83, numerals 3,505,856.25. Clinical: all tokens 8.91, words 5.99, numerals 58,443.72. [Diagram: softmax, assumptions vs. reality (?). Assumptions: a PMF over numerals from small to large, plus UNKNUM. Reality (?): a PDF over numbers from small to large.]

  16. Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?

  17. Strategy: Softmax & Hierarchical Softmax [Diagram: left, a single softmax over $h_t$ and a vocabulary $V$ that mixes words (the, cat, mat, sat, UNK) and numerals (1, 2, 1.7, 2018, UNKNUM). Right, a hierarchical softmax: one softmax over the token type (word or numeral) given $h_t$ and one softmax over the words; the numeral branch $p(\text{numeral} \mid h_t)$ is left open.]

  18. Strategy: Softmax & Hierarchical Softmax [Diagram: as before, with the numeral branch filled in by a softmax over the numerals (1, 2, 1.7, 2018, UNKNUM).]

  19. Strategy: Softmax & Hierarchical Softmax [Diagram: as before.] Options for $p(\text{numeral} \mid h_t)$: h-softmax, digit-by-digit, from PDF, etc.
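
The two-level factorisation in the diagram, $p(\text{token} \mid h_t) = p(\text{type} \mid h_t)\, p(\text{token} \mid \text{type}, h_t)$, can be sketched as below. The scores stand in for whatever the network would compute from $h_t$; the function name `h_softmax` and the dict-based interface are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Softmax over a dict of name -> score, returning name -> probability."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def h_softmax(type_scores, word_scores, numeral_scores):
    """Joint distribution over all tokens from a type softmax and two branch softmaxes."""
    p_type = softmax(type_scores)        # keys: "word", "numeral"
    p_word = softmax(word_scores)        # p(token | type = word)
    p_numeral = softmax(numeral_scores)  # p(token | type = numeral)
    joint = {w: p_type["word"] * p for w, p in p_word.items()}
    joint.update({n: p_type["numeral"] * p for n, p in p_numeral.items()})
    return joint

# Toy example with the entries shown on the slides and all scores set to zero.
probs = h_softmax(
    {"word": 0.0, "numeral": 0.0},
    {w: 0.0 for w in ["the", "cat", "mat", "sat", "UNK"]},
    {n: 0.0 for n in ["1", "2", "1.7", "2018", "UNKNUM"]},
)
```

Because the numeral branch is a separate distribution, it can be swapped for another model of $p(\text{numeral} \mid h_t)$ without touching the word branch.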

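  20. Strategy: Digit-by-Digit Composition: $p(2.1) = p(2)\, p(. \mid 2)\, p(1 \mid 2.)\, p(\text{EOS} \mid 2.1)$ [Diagram: a d-RNN started from $h_t$ reads SOS 2 . 1 and emits 2 . 1 EOS.]

  21. Strategy: Digit-by-Digit Composition: $p(2.1) = p(2)\, p(. \mid 2)\, p(1 \mid 2.)\, p(\text{EOS} \mid 2.1)$ [Diagram: the same d-RNN, annotated with UNKNUM and the nearby numerals 1.94, 1.95, 1.96, 1.97, 1.98, 1.99, 2.00, 2.01, 2.02.]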
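
The chain-rule product on these slides can be sketched with a stand-in character model. The uniform `uniform_char_model` below is only a placeholder for the d-RNN, which would condition on $h_t$ and the digits read so far; both function names are hypothetical.

```python
SYMBOLS = list("0123456789") + [".", "EOS"]

def uniform_char_model(prefix):
    """Placeholder for the d-RNN: next-symbol distribution given the prefix read so far."""
    return {s: 1.0 / len(SYMBOLS) for s in SYMBOLS}

def numeral_prob(numeral, char_model=uniform_char_model):
    """p(numeral) as a product of next-symbol probabilities, ending with EOS."""
    prob = 1.0
    prefix = ""
    for symbol in list(numeral) + ["EOS"]:
        prob *= char_model(prefix)[symbol]
        if symbol != "EOS":
            prefix += symbol
    return prob

p = numeral_prob("2.1")  # p(2) * p(. | 2) * p(1 | 2.) * p(EOS | 2.1)
```

Any numeral that can be spelled with these symbols receives a non-zero probability, so no UNKNUM entry is needed in this branch.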

  22. Strategy: from continuous PDF: $p(\text{numeral} = 2.1) = p_{PMF}(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$ [Figure: a PDF over numbers, x-axis from 1 to 3.5, y-axis from 0 to 2.5.]

  23. Strategy: from continuous PDF: $p(\text{numeral} = 2.1) = p_{PMF}(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$ [Figure: the same PDF.] MoG: frozen $\mu$s and $\sigma$s, with a softmax over the components given $h_t$. Precision: $p(\text{precision}) = p_{RNN}(d\,d\,d\,d\ \text{EOS})$.
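
The interval probability can be worked through for a mixture of Gaussians using the normal CDF. This is a sketch under assumptions: the half-width of the interval is taken as $0.5 \times 10^{-\text{precision}}$ (which gives 2.05 to 2.15 for 2.1 at precision 1), the precision probability is passed in as a plain number rather than computed by an RNN, and all function names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mog_interval_prob(value, precision, weights, mus, sigmas):
    """Mass a mixture of Gaussians puts on the numbers that round to `value`."""
    eps = 0.5 * 10 ** (-precision)  # e.g. 0.05 for one decimal digit
    lo, hi = value - eps, value + eps
    return sum(
        w * (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma))
        for w, mu, sigma in zip(weights, mus, sigmas)
    )

def numeral_prob(value, precision, p_precision, weights, mus, sigmas):
    """p(numeral) = p(interval | precision) * p(precision)."""
    return mog_interval_prob(value, precision, weights, mus, sigmas) * p_precision

# One component centred on 2.1 with unit variance, and p(precision = 1) set to 1.
p = numeral_prob(2.1, 1, 1.0, weights=[1.0], mus=[2.1], sigmas=[1.0])
```

Nearby numerals such as 2.0 and 2.2 receive similar probabilities under this scheme, which a categorical softmax over numerals does not guarantee.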

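  24. Overview of Strategies. softmax and h-softmax: a softmax over numerals (2, 1.7, 2018, UNKNUM). d-RNN: reads <SOS> 2 . 1 and emits 2 . 1 <EOS>. MoG: from a PDF.

  25. Overview of Strategies. softmax and h-softmax: a softmax over numerals (2, 1.7, 2018, UNKNUM). d-RNN: reads <SOS> 2 . 1 and emits 2 . 1 <EOS>. MoG: from a PDF. combination: a softmax over strategies, given $h_t$, chooses among them.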
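
One way to read the combination is as a mixture: a softmax over strategies weights the probability each strategy assigns to the numeral. The sketch below assumes exactly that reading; the strategy scores and per-strategy probabilities are made-up inputs and `combination_prob` is a hypothetical name.

```python
import math

def softmax(scores):
    """Softmax over a dict of name -> score, returning name -> probability."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def combination_prob(strategy_scores, strategy_probs):
    """Mixture: sum over strategies of p(strategy | h_t) * p_strategy(numeral | h_t)."""
    p_strategy = softmax(strategy_scores)
    return sum(p_strategy[s] * strategy_probs[s] for s in strategy_scores)

# Illustrative inputs only: equal strategy scores and made-up numeral probabilities.
p = combination_prob(
    {"h-softmax": 0.0, "d-RNN": 0.0, "MoG": 0.0},
    {"h-softmax": 0.3, "d-RNN": 0.0, "MoG": 0.6},
)
```

Since the strategy weights depend on the context, the model can lean on a different strategy for years, identifiers or real-valued measurements.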

  26. Results: Language Modelling (1). Clinical, adjusted perplexity (lower is better). Methods shown: softmax, h-softmax, d-RNN, MoG, combination. Bar labels as listed on the slide: all tokens 8.91, 6.05, 5.88, 5.88, 5.82; words 5.99, 4.96, 4.95, 4.99, 4.96; numerals 58,443.72, 495.95, 263.22, 226.46, 197.59.

  27. Results: Language Modelling (2). Scientific, adjusted perplexity (lower is better). Methods shown: softmax, h-softmax, d-RNN, MoG, combination. Bar labels as listed on the slide: all tokens 80.62, 54.8, 54.37, 53.7, 53.03; words 51.83, 49.81, 48.89, 48.97, 48.25; numerals 3,505,856.25, 683.16, 550.98, 519.8, 520.95.

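  28. Results: Number Prediction. The predicted numeral is read as a number (‘2.1’ → 2.1) and scored with $MAPE = \dfrac{|\text{prediction} - \text{target}|}{|\text{target}|} \times 100\%$ (lower is better). Clinical: models shown are mean, median, softmax, h-softmax, d-RNN, MoG, combination; bar labels as listed on the slide: 2353.11, 747, 622, 552, 514, 426, 348.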
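
A short sketch of the metric: numerals are converted to numbers and the absolute percentage errors are averaged. The function name `mape` and the choice to pass numerals as strings are assumptions for illustration.

```python
def mape(predicted_numerals, target_numerals):
    """Mean absolute percentage error between predicted and target numerals."""
    errors = []
    for predicted, target in zip(predicted_numerals, target_numerals):
        p, t = float(predicted), float(target)  # numeral -> number, e.g. '2.1' -> 2.1
        errors.append(abs(p - t) / abs(t))      # undefined when the target is 0
    return 100.0 * sum(errors) / len(errors)

score = mape(["1", "3"], ["2", "2"])  # both predictions are off by 50%
```

The error is relative to the target, so predicting 999 for a target of 2.1 is penalised far more than predicting 2.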

  29. Results: Number Prediction (MAPE, lower is better). Scientific: bar labels as listed on the slide: 1e23, 8039, 2333, 1947, 1652, 1287, 590; x-axis labels recovered: mean, softmax, d-RNN, combination. Clinical: models shown are mean, median, softmax, h-softmax, d-RNN, MoG, combination; bar labels as listed on the slide: 2353.11, 747, 622, 552, 514, 426, 348.

  30. Softmax versus Hierarchical Softmax [Figure: two matrices of cosine similarities between the numerals 1, 2, 3, 4, …, 100, 101, …, 2012, 2013, one for softmax and one for h-softmax.]
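
A cosine-similarity matrix like the ones in this figure can be computed as sketched below. The embedding vectors are made up for illustration; in the talk they would come from the trained models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(embeddings):
    """All pairwise cosine similarities, as a dict keyed by (numeral, numeral)."""
    return {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a in embeddings
        for b in embeddings
    }

# Hypothetical 2-d embeddings for three numerals.
toy_embeddings = {"1": [1.0, 0.0], "2": [2.0, 0.0], "2013": [0.0, 1.0]}
sims = similarity_matrix(toy_embeddings)
```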

  31. Analysis: d-RNN and Benford’s Law [Figure: cosine similarities in the d-RNN between the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ‘.’ and EOS.]

  32. Analysis: d-RNN and Benford’s Law [Figure: the same cosine similarities, next to bar charts of the 1st-digit and 4th-digit distributions over 0 to 9 (y-axis from 0 to 30) for the Clinical and Scientific datasets, comparing d-RNN with Benford.]
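
For reference, Benford’s law gives the first significant digit $d$ a probability of $\log_{10}(1 + 1/d)$. The sketch below computes that distribution and an observed first-digit distribution to compare it with; the helper names and the toy numerals are assumptions, not data from the talk.

```python
import math
from collections import Counter

def benford_first_digit():
    """Benford's law for the first significant digit: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}

def first_digit(numeral):
    """First non-zero digit of a numeral string, or None if there is none."""
    for ch in numeral:
        if ch in "123456789":
            return int(ch)
    return None

def observed_first_digits(numerals):
    """Relative frequency of each first significant digit in a list of numerals."""
    digits = [d for d in map(first_digit, numerals) if d is not None]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

expected = benford_first_digit()
observed = observed_first_digits(["1.7", "12", "2018", "0.15"])  # toy numerals
```

Under Benford’s law a leading 1 is expected about 30% of the time and a leading 9 under 5% of the time.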

  33. Analysis: Model Predictions [Figure: predictions of h-softmax, d-RNN and MoG for the blank in ‘… ejective fraction : ____ % ...’]

  34. Analysis: Strategy Selection [Diagram: a softmax over strategies, given $h_t$, selects h-softmax, d-RNN or MoG.] h-softmax: small integers, percentiles, years (‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’). d-RNN: 2-digit integers, some ids (‘measured 32 x 31 mm’, ‘NGC 6334 stars’). MoG: reals, some ids (‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’).

  35. Conclusion (1) Are existing LMs numerate? [Diagram: a softmax over $h_t$ with the entries the, cat, mat, UNK, 2, 3.14, 2018, UNKNUM.]

  36. Conclusion (1) Are existing LMs numerate? [Diagram: the same softmax; for ‘John’s height is ___’ the candidates shown include 0, 1, 2, 3, 25, 50, 999, 3.14, 2018 and UNKNUM.]

  37. Conclusion (2) How to improve the numeracy of LMs? [Diagram: combination, a softmax over strategies given $h_t$ that selects among h-softmax, d-RNN and MoG.]

  38. Conclusion (2) How to improve the numeracy of LMs? [Diagram: the same combination; for ‘John’s height is ___’ the candidates shown are 2.1, 1.8, 1.73 and 2.]

  39. Thank you!
