Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
Georgios Spithourakis, Steffen Petersen, Sebastian Riedel
Machine Reading Group
Numeracy
[Figure: words (the, cat, mat, sat, dog, fox, brown, jumped, sleeping, two, three, four) versus numerals (2, 4, 7, 9, 2018, 2000, 3.14, 1.73, 0.001, 2/3, 5/8, −1, 1+2i, ...) and the numbers they denote, drawn from ℕ, ℤ, ℚ, ℝ and ℂ (e.g. π, i)]
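Literate Language Models
A literate LM assigns higher probability $p_{\mathrm{LM}}(\text{text})$ to more plausible text (semantically, grammatically, etc.): ‘I eat an apple’ should score higher than ‘An apple eats me’, ‘I eats an apple’ or ‘A apple eats I’.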
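Numerate Language Models
A numerate LM should likewise assign higher $p_{\mathrm{LM}}(\text{text})$ to plausible quantities: ‘John is 1.7 m tall’ or ‘John is 2 m tall’ should score higher than ‘John is 0 m tall’ or ‘John is 999 m tall’.
[Figure: $p(\text{height})$ over heights from 1.4 m to 2.2 m]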
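Numeracy Matters
Changing a single numeral can change the meaning of a sentence:
‘Unemployment of the US is 5 %’ (versus 0.5, 50 or 500 %)
‘Our model is 10 times better than the baseline’ (versus 100 or 1000 times)
‘Patient’s temperature is 36.6 degrees’ (versus 0.0, 23.2, 41.9 or 98.6 degrees)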
Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?
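A Neural Language Model
An RNN reads the previous token $w_{t-1}$ (as the input embedding $e_t$), updates its hidden state from $h_{t-1}$ to $h_t$, and outputs $p(w_t \mid h_t)$ with a softmax over the vocabulary $V$.
$V$ is closed: it holds frequent words (the, cat, mat, sat, ...) and frequent numerals (2, 1, 1.7, 2018, ...). Out-of-vocabulary words (petrichor, unothrorgaphy, Spithourakis, ...) are mapped to UNK, and out-of-vocabulary numerals (2.1, 0.731, 9,846,321, 2018.3, ...) are mapped to UNKNUM.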
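The closed-vocabulary mapping described on this slide can be sketched as follows. This is a minimal illustration under assumptions: the numeral regex `NUMERAL_RE`, the helpers `is_numeral` and `to_vocab`, and the toy vocabulary `V` are hypothetical, not the authors' preprocessing.

```python
import math
import re

# Hypothetical numeral pattern (an assumption, not the authors' tokeniser):
# optional sign, digits with optional thousands commas, optional decimal part.
NUMERAL_RE = re.compile(r"^[+-]?\d[\d,]*(\.\d+)?$")


def is_numeral(token):
    return NUMERAL_RE.match(token) is not None


def to_vocab(token, vocab):
    """Keep in-vocabulary tokens; map OOV numerals to UNKNUM and OOV words to UNK."""
    if token in vocab:
        return token
    return "UNKNUM" if is_numeral(token) else "UNK"


def softmax(logits):
    """Numerically stable softmax, as used for p(w_t | h_t) over V."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy closed vocabulary (hypothetical).
V = {"the", "cat", "mat", "sat", "UNK", "2", "1", "1.7", "2018", "UNKNUM"}

if __name__ == "__main__":
    print([to_vocab(t, V) for t in "the cat is 2.1 m tall".split()])
    # → ['the', 'cat', 'UNK', 'UNKNUM', 'UNK', 'UNK']
```

Every out-of-vocabulary numeral, whether 2.1 or 9,846,321, collapses into the single entry UNKNUM, which is what the evaluation below has to correct for.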
Evaluation: Adjusted Perplexity
Standard perplexity rewards UNKNUM. In ‘John is 2.1 m tall’, the numeral 2.1 is out of vocabulary, so
$p(2.1) = p(\mathrm{UNKNUM})$, BUT $p(\mathrm{UNKNUM}) = p(2.1) + p(0.731) + p(9{,}846{,}321) + \cdots$
i.e. the probability mass of every out-of-vocabulary numeral is credited to 2.1.
Adjusted Perplexity [Ueberla, 1994], a.k.a. Unknown-Penalised Perplexity [Ahn et al., 2016], divides that mass among the distinct out-of-vocabulary numerals in the test data:
$p(2.1) = \dfrac{p(\mathrm{UNKNUM})}{|\{w \in \mathrm{UNKNUM} \text{ from test data}\}|}$
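A minimal sketch of adjusted perplexity, assuming the model's probability for each test token is already known (p(UNKNUM) for out-of-vocabulary numerals, p(UNK) for out-of-vocabulary words). The function names and the numeral regex are hypothetical, and applying the same division to UNK is an assumption made here for symmetry.

```python
import math
import re

NUMERAL_RE = re.compile(r"^[+-]?\d[\d,]*(\.\d+)?$")  # assumed numeral pattern


def is_numeral(token):
    return NUMERAL_RE.match(token) is not None


def perplexity(probs):
    """Standard perplexity: exp of the mean negative log-probability."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def adjusted_perplexity(tokens, probs, vocab):
    """tokens: raw test tokens; probs[i]: probability the model gave to the
    vocabulary entry of tokens[i] (p(UNKNUM) or p(UNK) for OOV tokens)."""
    # Distinct OOV types in the test data, counted separately for numerals and words.
    oov_numerals = {t for t in tokens if t not in vocab and is_numeral(t)}
    oov_words = {t for t in tokens if t not in vocab and not is_numeral(t)}
    adjusted = []
    for tok, p in zip(tokens, probs):
        if tok in vocab:
            adjusted.append(p)
        elif is_numeral(tok):
            adjusted.append(p / len(oov_numerals))  # share p(UNKNUM)
        else:
            adjusted.append(p / len(oov_words))  # share p(UNK)
    return perplexity(adjusted)


tokens = ["the", "2.1", "0.731"]
probs = [0.5, 0.5, 0.5]
print(perplexity(probs))                            # ≈ 2
print(adjusted_perplexity(tokens, probs, {"the"}))  # 32 ** (1/3) ≈ 3.17
```

With two distinct OOV numerals in the test data, each receives half of p(UNKNUM), so the adjusted score is worse than the plain one instead of being flattered by UNKNUM.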
Datasets
Clinical Dataset: 16,015 clinical patient reports (source: London Chest Hospital)
Scientific Dataset: 20,962 paragraphs from scientific papers (source: arXiv)
Token share: 96% words, 4% numerals
Results: Adjusted Perplexity
Softmax baseline (lower is better):
Dataset    | All tokens | Words | Numerals
Clinical   | 8.91       | 5.99  | 58,443.72
Scientific | 80.62      | 51.83 | 3,505,856.25
[Figure: softmax assumptions versus reality (?): the softmax treats numerals as a PMF over a closed vocabulary plus UNKNUM, whereas numbers may be better described by a PDF covering both small and large values]
Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?
Strategy: Softmax & Hierarchical Softmax
Softmax: one softmax over the whole vocabulary $V$ = {the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM, ...}, computed from $h_t$.
Hierarchical softmax (h-softmax): a first softmax over the token types {word, numeral}, followed by a softmax over words (the, cat, mat, sat, UNK, ...) or a softmax over numerals (2, 1, 1.7, 2018, UNKNUM, ...):
$p(w_t \mid h_t) = p(\text{type} \mid h_t)\, p(w_t \mid \text{type}, h_t)$, e.g. $p(1.7 \mid h_t) = p(\text{numeral} \mid h_t)\, p(1.7 \mid \text{numeral}, h_t)$
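The factorisation behind the hierarchical softmax can be sketched as below. The score dictionaries are hypothetical stand-ins for logits computed from h_t, and the sketch assumes the word and numeral vocabularies are disjoint.

```python
import math


def softmax(scores):
    """Stable softmax over a dict of label -> score."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def h_softmax_prob(token, type_scores, word_scores, numeral_scores):
    """p(w | h) = p(type | h) * p(w | type, h).
    The three score dicts stand in for logits computed from h_t (hypothetical)."""
    p_type = softmax(type_scores)
    if token in numeral_scores:
        return p_type["numeral"] * softmax(numeral_scores)[token]
    return p_type["word"] * softmax(word_scores)[token]


type_scores = {"word": 0.0, "numeral": 0.0}
word_scores = {"the": 1.0, "cat": 0.5, "mat": 0.2, "sat": 0.1, "UNK": -1.0}
numeral_scores = {"2": 0.0, "1": 0.0, "1.7": 0.0, "2018": 0.0, "UNKNUM": 0.0}

print(h_softmax_prob("1.7", type_scores, word_scores, numeral_scores))  # 0.5 * 0.2 = 0.1
```

Because numerals get their own softmax, their probabilities no longer compete directly with word probabilities; the type-level softmax decides how much mass goes to numerals as a whole.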
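Strategy: Digit-by-Digit Composition
A character-level RNN (d-RNN), conditioned on $h_t$, generates a numeral one character at a time, from SOS through 2, . and 1 to EOS:
$p(2.1) = p(2)\, p(. \mid 2)\, p(1 \mid 2.)\, p(\mathrm{EOS} \mid 2.1)$
Unlike the softmax, which collapses out-of-vocabulary numerals such as 1.94, 1.95, ..., 2.02 into UNKNUM, the d-RNN can assign each of them its own probability.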
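The digit-by-digit chain rule can be sketched as follows. `toy_model` is a hypothetical stand-in for a trained d-RNN (uniform over the allowed symbols, with at most one decimal point); only the chain-rule decomposition reflects the strategy described above.

```python
import math

# Output symbols of a digit-level model: digits, the decimal point, and EOS.
SYMBOLS = list("0123456789.") + ["<EOS>"]


def numeral_log_prob(numeral, cond_prob):
    """Chain rule over characters:
    log p(numeral) = sum_i log p(c_i | c_<i) + log p(<EOS> | numeral).
    cond_prob(prefix, symbol) plays the role of the d-RNN conditioned on h_t."""
    log_p = 0.0
    prefix = ""
    for symbol in list(numeral) + ["<EOS>"]:
        p = cond_prob(prefix, symbol)
        if p == 0.0:
            return float("-inf")  # impossible under the model
        log_p += math.log(p)
        if symbol != "<EOS>":
            prefix += symbol
    return log_p


def toy_model(prefix, symbol):
    """Hypothetical d-RNN stand-in: forbids a second '.', otherwise uniform."""
    allowed = [s for s in SYMBOLS if not (s == "." and "." in prefix)]
    return 1.0 / len(allowed) if symbol in allowed else 0.0


# Every numeral near 2 gets a probability of its own, none of them UNKNUM.
for s in ["1.98", "1.99", "2.00", "2.01", "2.02"]:
    print(s, math.exp(numeral_log_prob(s, toy_model)))
```

For ‘2.1’ the toy model gives (1/12)·(1/12)·(1/11)·(1/11) = 1/17424: twelve symbols are allowed before the decimal point and eleven after it.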
Strategy: from continuous PDF
[Figure: a continuous PDF over numbers (axis 0.5 to 2.5), discretised at precision 1 into bins of width 0.1 (1, 1.1, ..., 3.5)]
A numeral is scored by the probability mass that a continuous PDF assigns to the interval of numbers that round to it, times the probability of its precision (number of decimal places):
$p(\text{numeral} = 2.1) = P(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$
The PDF is a mixture of Gaussians (MoG): a softmax over $h_t$ weights the mixture components, while the components' means and standard deviations (μs and σs) are frozen. The precision is scored by an RNN, $p(\text{precision}) = p_{\mathrm{RNN}}(\mathrm{dddd})$, whose output sequence ends with EOS.
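A sketch of scoring a numeral from a mixture-of-Gaussians PDF, assuming the mixture weights are already computed (in the model they would come from a softmax over h_t) and that p(precision) is given as a function; here it is a hypothetical lookup table rather than the RNN described above. Numerals are assumed to be plain decimals without thousands separators.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mog_mass(lo, hi, weights, mus, sigmas):
    """P(lo < number < hi) under a mixture of Gaussians."""
    return sum(
        w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
        for w, m, s in zip(weights, mus, sigmas)
    )


def precision(numeral):
    """Number of decimal places, e.g. '2.1' -> 1, '2018' -> 0."""
    return len(numeral.split(".")[1]) if "." in numeral else 0


def mog_numeral_prob(numeral, weights, mus, sigmas, p_precision):
    """p(numeral) = P(x - h < number < x + h | precision r) * p(r), h = 0.5 * 10**-r."""
    r = precision(numeral)
    x = float(numeral)
    half_width = 0.5 * 10 ** (-r)
    return mog_mass(x - half_width, x + half_width, weights, mus, sigmas) * p_precision(r)


# Hypothetical parameters: frozen means and standard deviations,
# weights that would come from a softmax over h_t.
mus, sigmas = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
weights = [0.2, 0.5, 0.3]


def p_precision(r):
    """Hypothetical p(precision) lookup, standing in for the precision RNN."""
    return {0: 0.3, 1: 0.5, 2: 0.2}.get(r, 0.0)


print(mog_numeral_prob("2.1", weights, mus, sigmas, p_precision))
```

Since adjacent bins at the same precision tile the real line, their masses sum to 1, so the PDF induces a proper distribution over numerals of each precision.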
Overview of Strategies
Three strategies predict numerals from the hidden state $h_t$:
h-softmax: a softmax over in-vocabulary numerals (2, 1.7, 2018, ..., UNKNUM)
d-RNN: digit-by-digit composition (<SOS> 2 . 1 <EOS>)
MoG: a continuous PDF (mixture of Gaussians)
A combination model places a softmax over these strategies, also computed from $h_t$.
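Assuming the combination mixes the strategies' predictions weighted by the strategy softmax, it can be sketched as below; the strategy scorers and scores are hypothetical constants.

```python
import math


def softmax(scores):
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def combined_prob(numeral, strategy_scores, strategies):
    """p(numeral | h) = sum_s p(s | h) * p_s(numeral | h), a mixture over strategies.
    strategy_scores stand in for logits from h_t; strategies map name -> scorer."""
    p_strategy = softmax(strategy_scores)
    return sum(p_strategy[name] * scorer(numeral) for name, scorer in strategies.items())


def selected_strategy(strategy_scores):
    """The strategy with the highest p(s | h), useful for inspecting selection."""
    p = softmax(strategy_scores)
    return max(p, key=p.get)


# Hypothetical per-strategy probabilities for one numeral.
strategies = {
    "h-softmax": lambda s: 0.1,
    "d-RNN": lambda s: 0.2,
    "MoG": lambda s: 0.4,
}
scores = {"h-softmax": math.log(2.0), "d-RNN": 0.0, "MoG": 0.0}

print(combined_prob("2.1", scores, strategies))  # 0.5*0.1 + 0.25*0.2 + 0.25*0.4 ≈ 0.2
print(selected_strategy(scores))  # h-softmax
```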
Results: Language Modelling (1)
Clinical, adjusted perplexity (lower is better)
Model       | All tokens | Words | Numerals
softmax     | 8.91       | 5.99  | 58,443.72
h-softmax   | 6.05       | 4.96  | 495.95
d-RNN       | 5.88       | 4.95  | 263.22
MoG         | 5.88       | 4.99  | 226.46
combination | 5.82       | 4.96  | 197.59
Results: Language Modelling (2)
Scientific, adjusted perplexity (lower is better)
Model       | All tokens | Words | Numerals
softmax     | 80.62      | 51.83 | 3,505,856.25
h-softmax   | 54.80      | 49.81 | 550.98
d-RNN       | 53.70      | 48.89 | 519.80
MoG         | 54.37      | 48.97 | 683.16
combination | 53.03      | 48.25 | 520.95
Results: Number Prediction
Task: predict the number (2.1) denoted by a numeral (‘2.1’). Metric: mean absolute percentage error over the test numerals (lower is better):
$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|\text{prediction}_i - \text{target}_i|}{|\text{target}_i|} \times 100\%$
[Figure: MAPE on Clinical and Scientific for the mean and median baselines and for softmax, h-softmax, d-RNN, MoG and combination]
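A sketch of MAPE together with constant mean and median baselines. The toy training and test numbers are invented for illustration; they show how a single large training value (100.0) hurts the mean baseline far more than the median one.

```python
import statistics


def mape(predictions, targets):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in zip(predictions, targets)) / len(targets)


def constant_baseline(train_numbers, test_size, statistic):
    """Predict the same value (e.g. training mean or median) for every test numeral."""
    return [statistic(train_numbers)] * test_size


train = [1.0, 2.0, 3.0, 100.0]
test = [2.0, 4.0]
print(mape(constant_baseline(train, len(test), statistics.median), test))  # 31.25
print(mape(constant_baseline(train, len(test), statistics.mean), test))    # 893.75
```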
Softmax versus Hierarchical Softmax
[Figure: cosine similarities between the embeddings of the in-vocabulary numerals 1, 2, 3, 4, ..., 100, 101, ..., 2012, 2013, for softmax and for h-softmax]
Analysis: d-RNN and Benford’s Law
[Figure: cosine similarities between the d-RNN embeddings of the symbols 0 to 9, ‘.’ and EOS]
[Figure: distribution (%) of the 1st and 4th digits under the d-RNN and under Benford’s law, on Scientific and Clinical]
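Benford’s law gives the expected frequency of leading digits, P(d) = log10(1 + 1/d) for d = 1..9, and a generalised form for later digits, which become nearly uniform by the 4th digit. A sketch for comparing a model's digit statistics against it; the function names are hypothetical and the empirical helper only looks at the first non-zero digit of each numeral string.

```python
import math
from collections import Counter


def benford_first_digit():
    """Benford's law for the leading significant digit: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_nth_digit(n):
    """Generalised Benford probabilities for the n-th significant digit (n >= 2):
    P(d) = sum over j in [10^(n-2), 10^(n-1)) of log10(1 + 1/(10j + d))."""
    if n == 1:
        return benford_first_digit()
    js = range(10 ** (n - 2), 10 ** (n - 1))
    return {d: sum(math.log10(1 + 1 / (10 * j + d)) for j in js) for d in range(10)}


def first_digit_distribution(numerals):
    """Empirical distribution of the first non-zero digit of each numeral string."""
    firsts = []
    for s in numerals:
        for c in s:
            if c in "123456789":
                firsts.append(int(c))
                break
    counts = Counter(firsts)
    return {d: counts[d] / len(firsts) for d in range(1, 10)}


print(round(benford_first_digit()[1], 3))  # 0.301
print(first_digit_distribution(["0.731", "2018", "1.7", "12"]))
```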
Analysis: Model Predictions
[Figure: predictions of MoG, d-RNN and h-softmax for the blank in ‘… ejective fraction : ____ % ...’]
Analysis: Strategy Selection
The combination model’s softmax over $h_t$ selects a strategy (h-softmax, d-RNN or MoG) for each numeral, in examples such as ‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’, ‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’, ‘measured 32 x 31 mm’ and ‘NGC 6334 stars’.
h-softmax: small integers, percentiles, years
d-RNN: 2-digit integers, some ids
MoG: reals, some ids
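Conclusion (1)
Are existing LMs numerate?
A softmax LM treats numerals (2, 3.14, 2018, UNKNUM) as ordinary vocabulary entries alongside words (the, cat, mat, UNK), all scored from $h_t$. For ‘John’s height is ___’, its candidates include 999, 50, 25, 2018, 3.14, UNKNUM, 1, 2 and 3. Its adjusted perplexity on numerals (58,443.72 on Clinical, 3,505,856.25 on Scientific) is far higher than on words (5.99 and 51.83).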
Conclusion (2)
How to improve the numeracy of LMs?
Model numerals with dedicated strategies (h-softmax, d-RNN, MoG) and combine them with a softmax over $h_t$.
[Figure: candidate completions of ‘John’s height is ___’ (999, 50, 25, 2018, 1, 2, 3) alongside the number sets ℕ, ℤ, ℚ, ℝ, ℂ and UNKNUM]