Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers

Georgios Spithourakis, Steffen Petersen, Sebastian Riedel. Machine Reading Group

Numeracy [Figure: word cloud mixing words (the, cat, mat, dog, fox, brown, sat, jumped, sleeping), numerals (one, two, three, four, 2, 7, 2000, 2018, 3.14, 0.001, 1.73, 5/8, 2/3, 1+2i, π) and number sets (ℕ, ℤ, ℚ, ℝ, ℂ)]

Literate Language Models: $P_{LM}(\text{text})$ scores how plausible a text is (semantically, grammatically, etc.). Examples: ‘A apple eats I’, ‘I eats an apple’, ‘An apple eats me’, ‘I eat an apple’.

Numerate Language Models: alongside $P_{LM}(\text{text})$, a distribution over numbers such as $P(\text{height})$ [Plot: height axis from 1.4 to 2.2]. Examples: ‘John is 0 m tall’, ‘John is 1.7 m tall’, ‘John is 2 m tall’, ‘John is 999 m tall’.

Numeracy Matters. ‘Unemployment of the US is 5 %’ (other values shown: 0.0, 0.5, 23.2, 50, 500). ‘Patient’s temperature is 36.6 degrees’ (other values shown: 0, 41.9, 98.6). ‘Our model is 10 times better than the baseline’ (other values shown: 100, 1000).

Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?

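A Neural Language Model. Input: embedding $e_t$ of the previous token $w_{t-1}$. RNN: hidden state $h_{t-1} \rightarrow h_t$. Output: $p(w_t \mid h_t)$.

A Neural Language Model. Output: $p(w_t \mid h_t) = \mathrm{softmax}(w_t)$ over the vocabulary $V$, which holds words (the, cat, mat, sat, …), UNK, numerals (1, 2, 1.7, 2018, …) and UNKNUM. Out-of-vocabulary words (petrichor, unothrorgaphy, Spithourakis) map to UNK; out-of-vocabulary numerals (2.1, 0.731, 9,846,321, 2018.3) map to UNKNUM. RNN: $h_{t-1} \rightarrow h_t$. Input: $e_t$, $w_{t-1}$.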

Evaluation: Adjusted Perplexity. Perplexity: in ‘John is 2.1 m tall’, $p(2.1) = p(\text{UNKNUM})$, BUT UNKNUM also stands for 0.731, 9,846,321, 2018.3, …

Evaluation: Adjusted Perplexity. Perplexity: $p(2.1) = p(\text{UNKNUM})$, BUT UNKNUM also stands for 0.731, 9,846,321, 2018.3, … Adjusted Perplexity [Ueberla, 1994], a.k.a. Unknown-Penalised Perplexity [Ahn et al., 2016]: $p(2.1) = \frac{p(\text{UNKNUM})}{|\{w \in \text{UNKNUM}\}|}$, where the set of UNKNUM numerals is taken from the test data.
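The adjustment on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function names, the toy tokens and the toy probabilities are all assumptions, and every out-of-vocabulary token is treated as UNKNUM for simplicity.

```python
import math

def perplexity(probs):
    """Standard perplexity from per-token probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def adjusted_perplexity(tokens, probs, vocab):
    """Divide the probability of each out-of-vocabulary token by the number
    of distinct out-of-vocabulary types seen in the test data."""
    oov_types = {tok for tok in tokens if tok not in vocab}
    adjusted = [
        p if tok in vocab else p / len(oov_types)
        for tok, p in zip(tokens, probs)
    ]
    return perplexity(adjusted)

# Toy test data (assumed): "2.1" and "0.731" are both scored as UNKNUM.
tokens = ["2", "2.1", "0.731", "2"]
probs = [0.5, 0.2, 0.2, 0.5]
vocab = {"2"}

plain = perplexity(probs)
adjusted = adjusted_perplexity(tokens, probs, vocab)
print(plain, adjusted)
```

The adjusted value is always at least as large as the plain one, which is why a model that pushes many numerals into UNKNUM looks worse under this metric.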

Datasets. Clinical Dataset: 16,015 clinical patient reports. Source: London Chest Hospital. Scientific Dataset: 20,962 paragraphs from scientific papers. Source: ARXIV. [Pie chart: 96% words, 4% numerals]

Results: Adjusted Perplexity (lower is better)
Dataset      All tokens   Words   Numerals
Scientific   80.62        51.83   3,505,856.25
Clinical     8.91         5.99    58,443.72

Results: Adjusted Perplexity (lower is better)
Dataset      All tokens   Words   Numerals
Scientific   80.62        51.83   3,505,856.25
Clinical     8.91         5.99    58,443.72
[Diagram: softmax, assumptions vs. reality (?): a PMF over numerals with an UNKNUM bucket vs. a PDF over numbers; labels: large, small]

Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?

Strategy: Softmax & Hierarchical Softmax. Softmax: one softmax over $h_t$ covering all of $V$ (the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM). Hierarchical softmax: a softmax over the token type $s$ (word or numeral), then a softmax over words (the, cat, mat, sat, UNK) or $p(\text{numeral} \mid h_t)$.

Strategy: Softmax & Hierarchical Softmax. In the hierarchical softmax, the numeral branch is itself a softmax over numerals (2, 1, 1.7, 2018, UNKNUM).

Strategy: Softmax & Hierarchical Softmax. $p(\text{numeral} \mid h_t)$ can come from: • h-softmax • digit-by-digit • from PDF • etc.
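The two-level factorisation can be illustrated with a small, dependency-free sketch. The vocabularies are the ones shown on the slides; the logits, the function names and the assumption that tokens are already mapped to UNK or UNKNUM are illustrative choices, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

WORDS = ["the", "cat", "mat", "sat", "UNK"]
NUMERALS = ["2", "1", "1.7", "2018", "UNKNUM"]

def h_softmax_prob(token, type_logits, word_logits, numeral_logits):
    """p(token | h_t) = p(type | h_t) * p(token | type, h_t).
    type_logits has two entries: [word, numeral]."""
    p_word, p_numeral = softmax(type_logits)
    if token in WORDS:
        return p_word * softmax(word_logits)[WORDS.index(token)]
    return p_numeral * softmax(numeral_logits)[NUMERALS.index(token)]

# With all-zero logits (assumed), each type gets 0.5 and each token 1/5 of it.
zeros2, zeros5 = [0.0, 0.0], [0.0] * 5
print(h_softmax_prob("cat", zeros2, zeros5, zeros5))
print(h_softmax_prob("2018", zeros2, zeros5, zeros5))
```

Because the numeral branch is a separate distribution, it can be swapped for another model of numerals without touching the word branch.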

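Strategy: Digit-by-Digit Composition. $p(2.1) = p(2)\, p(. \mid 2)\, p(1 \mid 2.)\, p(\text{EOS} \mid 2.1)$. d-RNN: initialised from $h_t$; reads SOS, 2, ., 1 and emits 2, ., 1, EOS.

Strategy: Digit-by-Digit Composition. $p(2.1) = p(2)\, p(. \mid 2)\, p(1 \mid 2.)\, p(\text{EOS} \mid 2.1)$. d-RNN: initialised from $h_t$; reads SOS, 2, ., 1 and emits 2, ., 1, EOS. [Diagram: numerals otherwise grouped under UNKNUM, e.g. 1.94, 1.95, 1.96, 1.97, 1.98, 1.99, 2.00, 2.01, 2.02]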
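The chain-rule factorisation can be written out as a short sketch. The d-RNN is replaced here by a placeholder that returns a uniform distribution over the twelve symbols (digits, ‘.’, EOS); that placeholder, and the names `uniform_next` and `numeral_log_prob`, are assumptions made purely for illustration.

```python
import math

SYMBOLS = list("0123456789.") + ["EOS"]

def uniform_next(prefix):
    """Stand-in for the d-RNN: ignores the prefix and returns a uniform
    distribution over the next symbol."""
    return {s: 1.0 / len(SYMBOLS) for s in SYMBOLS}

def numeral_log_prob(numeral, next_symbol_probs=uniform_next):
    """log p(numeral) = sum of log p(symbol | previous symbols),
    including the final EOS symbol."""
    prefix = []
    log_p = 0.0
    for symbol in list(numeral) + ["EOS"]:
        log_p += math.log(next_symbol_probs(tuple(prefix))[symbol])
        prefix.append(symbol)
    return log_p

# p(2.1) = p(2) p(.|2) p(1|2.) p(EOS|2.1): four factors of 1/12 here.
print(math.exp(numeral_log_prob("2.1")))
```

Any string of digits can be scored this way, so no numeral has to fall back to UNKNUM.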

Strategy: from continuous PDF. $p(\text{numeral} = 2.1) = p_{\mathrm{PMF}}(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$ [Plot: PDF over numbers, x-axis from 1 to 3.5, y-axis from 0 to 2.5]

Strategy: from continuous PDF. $p(\text{numeral} = 2.1) = p_{\mathrm{PMF}}(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$. MoG (mixture of Gaussians): a softmax over $h_t$ selects the component; the $\mu$s and $\sigma$s are frozen. Precision: $p(\text{precision}) = p_{\mathrm{RNN}}(d\,d\,d\,d\,\text{EOS})$. [Plot: PDF over numbers, x-axis from 1 to 3.5, y-axis from 0 to 2.5]
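A rough sketch of how a numeral's probability could be read off a mixture of Gaussians: integrate the density over the rounding interval implied by the numeral's precision, then multiply by the probability of that precision. The mixture parameters, the `precision_probs` table and the function names are hypothetical; on the slide the precision comes from an RNN, which is not modelled here.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mog_numeral_prob(numeral, weights, mus, sigmas, precision_probs):
    """p(numeral) = P(lo < number < hi | precision) * p(precision),
    where [lo, hi] is the rounding interval around the numeral."""
    precision = len(numeral.split(".")[1]) if "." in numeral else 0
    half_width = 0.5 * 10.0 ** (-precision)
    value = float(numeral)
    lo, hi = value - half_width, value + half_width
    interval_mass = sum(
        w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
        for w, m, s in zip(weights, mus, sigmas)
    )
    return interval_mass * precision_probs[precision]

# One assumed component centred on 2.1 with sigma 0.05, so the interval
# [2.05, 2.15] spans one standard deviation on each side.
p = mog_numeral_prob("2.1", [1.0], [2.1], [0.05], {1: 0.5})
print(p)
```

Nearby numerals receive similar probabilities because they integrate neighbouring slices of the same smooth density.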

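Overview of Strategies. softmax and h-softmax: a distribution over in-vocabulary numerals (2, 1.7, 2018, UNKNUM). d-RNN: reads <SOS> 2 . 1 and emits 2 . 1 <EOS>. MoG: continuous PDF.

Overview of Strategies. softmax and h-softmax: a distribution over in-vocabulary numerals (2, 1.7, 2018, UNKNUM). d-RNN: reads <SOS> 2 . 1 and emits 2 . 1 <EOS>. MoG: continuous PDF. combination: a softmax over $h_t$ chooses among the strategies.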
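One plausible reading of the combination slide is a mixture: a softmax over strategies gives weights, and the numeral's probability is the weighted sum of the per-strategy probabilities. The sketch below assumes that reading; the logits and per-strategy probabilities are made-up numbers.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combination_prob(strategy_logits, strategy_probs):
    """Mixture over strategies: sum_s p(s | h_t) * p(numeral | h_t, s)."""
    weights = softmax(strategy_logits)
    return sum(w * p for w, p in zip(weights, strategy_probs))

# Assumed order of strategies: h-softmax, d-RNN, MoG.
equal = combination_prob([0.0, 0.0, 0.0], [0.1, 0.2, 0.3])
skewed = combination_prob([0.0, 0.0, 5.0], [0.1, 0.2, 0.3])
print(equal, skewed)
```

With equal logits the result is the plain average; raising one logit moves the result toward that strategy's probability.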

Results: Language Modelling (1). Clinical, Adjusted Perplexity (lower is better)
Model         All tokens   Words   Numerals
softmax       8.91         5.99    58,443.72
h-softmax     6.05         4.96    495.95
d-RNN         5.88         4.95    263.22
MoG           5.88         4.99    226.46
combination   5.82         4.96    197.59

Results: Language Modelling (2). Scientific, Adjusted Perplexity (lower is better)
Model         All tokens   Words   Numerals
softmax       80.62        51.83   3,505,856.25
h-softmax     54.8         49.81   550.98
d-RNN         53.7         48.89   519.8
MoG           54.37        48.97   683.16
combination   53.03        48.25   520.95

Results: Number Prediction. A numeral (‘2.1’) is mapped to a number (2.1). $\mathrm{MAPE} = \frac{|\text{prediction} - \text{target}|}{\text{target}} \times 100\%$ (lower is better). Clinical: bars for mean, median, softmax, h-softmax, d-RNN, MoG, combination; values shown, in descending order: 2353.11, 747, 622, 552, 514, 426, 348.

Results: Number Prediction (lower is better). Scientific: same baselines and models; values shown, in descending order: 1e23, 8039, 2333, 1947, 1652, 1287, 590. Clinical: values shown, in descending order: 2353.11, 747, 622, 552, 514, 426, 348.
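For reference, the MAPE metric named on the number-prediction slides can be computed as below. This is a minimal sketch with made-up predictions and targets; it assumes the standard definition (mean of absolute percentage errors) and that no target is zero.

```python
def mape(predictions, targets):
    """Mean absolute percentage error, in percent. Targets must be non-zero."""
    errors = [abs(p - t) / abs(t) for p, t in zip(predictions, targets)]
    return 100.0 * sum(errors) / len(errors)

# Numerals are strings in text; convert them to numbers before scoring.
targets = [float(s) for s in ["2.0", "4.0"]]
predictions = [1.0, 5.0]
score = mape(predictions, targets)
print(score)  # errors are 50% and 25%, so the mean is 37.5
```

Because each error is divided by its target, a single prediction that is off by orders of magnitude can dominate the mean, which is consistent with the very large values reported for the baselines.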

Softmax versus Hierarchical Softmax [Figure: cosine similarity matrices over the numerals 1, 2, 3, 4, …, 100, 101, …, 2012, 2013, one for softmax and one for h-softmax]

Analysis: d-RNN and Benford’s Law [Figure: cosine similarity matrix over the d-RNN symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ‘.’, EOS]

Analysis: d-RNN and Benford’s Law [Figure: cosine similarity matrix over the d-RNN symbols 0 to 9, ‘.’, EOS. Bar charts: distribution of the 1st digit and of the 4th digit over 0 to 9, d-RNN vs. Benford, for the Clinical and Scientific datasets; y-axis from 0 to 30]
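The Benford reference distribution used for comparison can be computed directly. The first-digit law is $P(d) = \log_{10}(1 + 1/d)$; for later positions the sketch uses the standard generalisation that sums over all possible leading digits. Whether the slides used exactly this generalisation for the 4th digit is an assumption.

```python
import math

def benford_digit_prob(d, n):
    """Probability that the n-th significant digit (n >= 1) equals d
    under Benford's law."""
    if n == 1:
        # The leading significant digit is never 0.
        return 0.0 if d == 0 else math.log10(1.0 + 1.0 / d)
    # Sum over every possible block k of the first n-1 digits.
    start, stop = 10 ** (n - 2), 10 ** (n - 1)
    return sum(math.log10(1.0 + 1.0 / (10 * k + d)) for k in range(start, stop))

first = [benford_digit_prob(d, 1) for d in range(10)]
fourth = [benford_digit_prob(d, 4) for d in range(10)]
print([round(100 * p, 1) for p in first])
print([round(100 * p, 1) for p in fourth])
```

The first-digit distribution is strongly skewed toward 1, while the fourth-digit distribution is close to uniform, so a flat 4th-digit histogram is still consistent with the law.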

Analysis: Model Predictions [Figure: predictions of h-softmax, d-RNN and MoG for the blank in ‘… ejective fraction : ____ % ...’]

Analysis: Strategy Selection. The softmax over strategies (given $h_t$) selects:
• h-softmax for small integers, percentiles, years: ‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’
• d-RNN for 2-digit integers, some ids: ‘measured 32 x 31 mm’, ‘NGC 6334 stars’
• MoG for reals, some ids: ‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’

Conclusion (1): Are existing LMs numerate? [Diagram: softmax over $h_t$ with vocabulary the, cat, mat, UNK, 2, 3.14, 2018, UNKNUM]

Conclusion (1): Are existing LMs numerate? For ‘John’s height is ___ ’ the softmax offers candidates such as 0, 1, 2, 3, 25, 50, 999, 3.14, 2018, UNKNUM.

Conclusion (2): How to improve the numeracy of LMs? combination: a softmax over $h_t$ chooses among the strategies h-softmax, d-RNN and MoG.

Conclusion (2): How to improve the numeracy of LMs? For ‘John’s height is ___ ’ the combination (softmax over the strategies h-softmax, d-RNN, MoG) offers candidates such as 2.1, 1.8, 1.73, 2.

Thank you!
