Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers (PowerPoint presentation)

SLIDE 1

Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers

Georgios Spithourakis, Steffen Petersen, Sebastian Riedel

Machine Reading Group

SLIDE 2

Numeracy

[Figure: words (the, cat, sat, mat, dog, fox, brown, jumped, sleeping, two, three, four) alongside numbers and numerals (1, 2, 1.73, 3.14, 0.001, 2/3, 5/8, 2000, 2018, 1+2i, −1, π) and the number sets ℕ, ℝ, ℂ]

SLIDE 3

Literate Language Models

A literate LM assigns a probability $P_{\text{LM}}(\text{text})$ that reflects how plausible (semantically, grammatically, etc.) a sentence is:
‘I eat an apple’, ‘An apple eats me’, ‘I eats an apple’, ‘A apple eats I’

SLIDE 4

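Numerate Language Models

A numerate LM should likewise assign $P_{\text{LM}}(\text{text})$ according to numerical plausibility:
‘John is 0 m tall’, ‘John is 1.7 m tall’, ‘John is 2 m tall’, ‘John is 999 m tall’

[Figure: $P(\text{height})$ over heights from 1.4 to 2.2 m]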

SLIDE 5

Numeracy Matters

‘Unemployment of the US is 5 %’ (alternatives: 0.5, 50, 500)
‘Our model is 10 times better than the baseline’ (alternatives: 100, 1000)
‘Patient’s temperature is 36.6 degrees’ (alternatives: 0.0, 23.2, 41.9, 98.6)

SLIDE 6

Q1: Are existing LMs numerate?
Q2: How to improve the numeracy of LMs?

SLIDE 8

A Neural Language Model

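[Diagram: an RNN reads the previous token $w_{t-1}$ (input embedding $e_t$), updates its hidden state from $h_{t-1}$ to $h_t$, and outputs $p(w_t \mid h_t)$]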

SLIDE 9

A Neural Language Model

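[Diagram: as above, with the output computed as $p(w_t \mid h_t) = \text{softmax}(W h_t)$]

The softmax ranges over a closed vocabulary V of words and numerals (the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM, …). Out-of-vocabulary numerals such as 2.1, 0.731, 9,846,321 and 2018.3 all share the single UNKNUM entry, just as rare words such as petrichor, unothrorgaphy and Spithourakis share UNK.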
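A minimal Python sketch of this output layer, assuming a toy vocabulary and random weights (`VOCAB`, `to_token`, `is_numeral` and the hidden size are hypothetical, not from the talk). It shows the consequence described above: every out-of-vocabulary numeral is routed to the one UNKNUM entry, so 2.1 and 9,846,321 receive identical scores.

```python
import math
import random

# Hypothetical toy vocabulary (examples from the slide): words, a few
# in-vocabulary numerals, and the two unknown classes UNK and UNKNUM.
VOCAB = ["the", "cat", "mat", "sat", "UNK", "2", "1", "1.7", "2018", "UNKNUM"]


def is_numeral(token: str) -> bool:
    """Crude test: anything float() accepts once commas are removed.

    float() also accepts strings such as "inf" and "nan"; a real tokeniser would be stricter.
    """
    try:
        float(token.replace(",", ""))
        return True
    except ValueError:
        return False


def to_token(raw: str) -> str:
    """Map a raw token to its vocabulary entry, falling back to UNKNUM or UNK."""
    if raw in VOCAB:
        return raw
    return "UNKNUM" if is_numeral(raw) else "UNK"


def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_distribution(W, h):
    """p(w_t | h_t) = softmax(W h_t), where W has one row per vocabulary entry."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    return dict(zip(VOCAB, softmax(logits)))


random.seed(0)
dim = 4  # hypothetical hidden size
W = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in VOCAB]
h_t = [random.gauss(0.0, 1.0) for _ in range(dim)]
p = output_distribution(W, h_t)

# 2.1 and 9,846,321 both map to UNKNUM, so they get exactly the same score.
print(p[to_token("2.1")] == p[to_token("9,846,321")])
```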

SLIDE 10

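Evaluation: Adjusted Perplexity

Perplexity: in ‘John is 2.1 m tall’ the out-of-vocabulary numeral is scored as $p(2.1) = p(\text{UNKNUM})$, BUT
$p(\text{UNKNUM}) = p(2.1) + p(0.731) + p(9{,}846{,}321) + \cdots$
(UNKNUM covers 2.1, 0.731, 9,846,321, 2018.3, …), so standard perplexity credits a single rare numeral with the probability mass of all of them.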

SLIDE 11

Evaluation: Adjusted Perplexity

Perplexity: $p(2.1) = p(\text{UNKNUM})$, BUT $p(\text{UNKNUM}) = p(2.1) + p(0.731) + p(9{,}846{,}321) + \cdots$

Adjusted Perplexity [Ueberla, 1994], a.k.a. Unknown-Penalised Perplexity [Ahn et al., 2016], spreads the UNKNUM mass uniformly over the distinct unknown numerals in the test data:
$p(2.1) = \frac{p(\text{UNKNUM})}{|\{x \in \text{UNKNUM from test data}\}|}$
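A sketch of the adjusted perplexity computation, assuming the model's per-position distributions are already available as dictionaries (the function and argument names are hypothetical). Unknown words and unknown numerals are penalised separately, each class sharing its mass among the distinct unknown types found in the test data.

```python
import math


def is_numeral(token: str) -> bool:
    """Crude numeral test (hypothetical helper): float() accepts it once commas are removed."""
    try:
        float(token.replace(",", ""))
        return True
    except ValueError:
        return False


def adjusted_perplexity(tokens, dists, vocab):
    """Adjusted (unknown-penalised) perplexity.

    tokens: raw test tokens.
    dists:  one dict per position mapping vocabulary entries to model probabilities.
    vocab:  the in-vocabulary types (must include "UNK" and "UNKNUM").
    """
    # Distinct out-of-vocabulary types in the test data, per unknown class.
    unk_words = {t for t in tokens if t not in vocab and not is_numeral(t)}
    unk_nums = {t for t in tokens if t not in vocab and is_numeral(t)}

    log_sum = 0.0
    for tok, dist in zip(tokens, dists):
        if tok in vocab:
            prob = dist[tok]
        elif is_numeral(tok):
            # Share the UNKNUM mass uniformly among the unknown numerals seen in test.
            prob = dist["UNKNUM"] / len(unk_nums)
        else:
            prob = dist["UNK"] / len(unk_words)
        log_sum += math.log(prob)
    return math.exp(-log_sum / len(tokens))


vocab = {"is", "UNK", "UNKNUM"}
tokens = ["2.1", "0.731", "is"]
dists = [{"is": 0.5, "UNK": 0.25, "UNKNUM": 0.25}] * len(tokens)
print(adjusted_perplexity(tokens, dists, vocab))  # 128 ** (1/3), about 5.04
```

In this toy case the adjustment raises perplexity from about 3.17 to about 5.04, because the two unknown numerals now have to split p(UNKNUM) between them.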

SLIDE 12

Datasets

Clinical Dataset: 16,015 clinical patient reports (source: London Chest Hospital)
Scientific Dataset: 20,962 paragraphs from scientific papers (source: arXiv)

[Chart: 96% of tokens are words, 4% are numerals]

SLIDE 13

Results: Adjusted Perplexity

Softmax baseline, adjusted perplexity (lower is better):
Clinical: all tokens 8.91, words 5.99, numerals 58,443.72
Scientific: all tokens 80.62, words 51.83, numerals 3,505,856.25

SLIDE 15

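Results: Adjusted Perplexity

Softmax baseline, adjusted perplexity (lower is better):
Clinical: all tokens 8.91, words 5.99, numerals 58,443.72
Scientific: all tokens 80.62, words 51.83, numerals 3,505,856.25

[Figure: assumptions vs. reality (?) for numerals: the softmax treats them as a PMF over a closed vocabulary plus UNKNUM, whereas numbers may be better described by a PDF; axes run from small to large]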

SLIDE 16

Q1: Are existing LMs numerate?
Q2: How to improve the numeracy of LMs?

SLIDE 17

Strategy: Softmax & Hierarchical Softmax

[Diagram: from $h_t$, a softmax over types (word, numeral), followed by one softmax over words (the, cat, mat, sat, UNK) and one over numerals (2, 1, 1.7, 2018, UNKNUM); the numeral branch is $p(\text{numeral} \mid h_t)$]

SLIDE 18

Strategy: Softmax & Hierarchical Softmax

[Diagram: the flat softmax over the full vocabulary V (the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM) compared with the hierarchical version: a softmax over types (word, numeral) from $h_t$, then separate softmaxes over words and over numerals]

SLIDE 19

Strategy: Softmax & Hierarchical Softmax

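[Diagram: hierarchical softmax from $h_t$ over types (word, numeral), then over words (the, cat, mat, sat, UNK); the numeral branch $p(\text{numeral} \mid h_t)$ can be realised as:]

  • h-softmax (a softmax over numerals: 2, 1, 1.7, 2018, UNKNUM)
  • digit-by-digit
  • from PDF
  • etc.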
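A minimal sketch of the two-level (hierarchical) softmax, assuming the three logit vectors come from separate linear projections of $h_t$ (not shown; all names are hypothetical). The probability of a token factorises as p(type | h_t) × p(token | type, h_t).

```python
import math


def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy classes taken from the slide.
WORDS = ["the", "cat", "mat", "sat", "UNK"]
NUMERALS = ["2", "1", "1.7", "2018", "UNKNUM"]


def h_softmax(type_logits, word_logits, numeral_logits):
    """Two-level softmax: p(token | h_t) = p(type | h_t) * p(token | type, h_t).

    Each logit vector stands in for a separate linear projection of h_t (assumed).
    """
    p_word, p_numeral = softmax(type_logits)  # type_logits = [word, numeral]
    dist = {w: p_word * q for w, q in zip(WORDS, softmax(word_logits))}
    dist.update({n: p_numeral * q for n, q in zip(NUMERALS, softmax(numeral_logits))})
    return dist


dist = h_softmax([0.0, 0.0], [1.0, 0.5, 0.0, 0.0, -1.0], [0.0] * 5)
print(sum(dist[n] for n in NUMERALS))  # p(numeral | h_t): 0.5 up to rounding
```

Because the numeral branch is a self-contained distribution, it can be swapped for a digit-by-digit or PDF-based model without touching the word branch.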

SLIDE 20

Strategy: Digit-by-Digit Composition

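[Diagram: a digit-level RNN (d-RNN), conditioned on $h_t$, spells the numeral one symbol at a time: <SOS> 2 . 1 <EOS>]

$p(2.1) = p(2)\, p(.\mid 2)\, p(1 \mid 2.)\, p(\text{EOS} \mid 2.1)$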

SLIDE 21

Strategy: Digit-by-Digit Composition

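[Diagram: a digit-level RNN (d-RNN), conditioned on $h_t$, spells the numeral one symbol at a time: <SOS> 2 . 1 <EOS>]

$p(2.1) = p(2)\, p(.\mid 2)\, p(1 \mid 2.)\, p(\text{EOS} \mid 2.1)$

[Figure: the d-RNN assigns separate probabilities to nearby numerals (1.94, 1.95, …, 2.01, 2.02) instead of a single UNKNUM]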
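The factorisation $p(2.1) = p(2)\,p(.\mid 2)\,p(1\mid 2.)\,p(\text{EOS}\mid 2.1)$ can be sketched as below. The `next_symbol_probs` callable stands in for the d-RNN (an RNN over the prefix, conditioned on $h_t$); the uniform model used here is purely a hypothetical placeholder so the arithmetic is easy to check.

```python
import math

EOS = "<EOS>"
SYMBOLS = list("0123456789.") + [EOS]  # 12 output symbols


def numeral_log_prob(numeral, next_symbol_probs):
    """log p(numeral) = sum_i log p(c_i | c_<i) + log p(EOS | numeral).

    next_symbol_probs(prefix) returns a dict over SYMBOLS; in the d-RNN this would be
    an RNN over the prefix conditioned on h_t (assumed), here it is any callable.
    """
    logp = 0.0
    prefix = ""
    for symbol in list(numeral) + [EOS]:
        logp += math.log(next_symbol_probs(prefix)[symbol])
        prefix += symbol
    return logp


def uniform_model(prefix):
    """Hypothetical stand-in model: every next symbol is equally likely."""
    return {s: 1.0 / len(SYMBOLS) for s in SYMBOLS}


# p(2.1) = p(2) p(.|2) p(1|2.) p(EOS|2.1) = (1/12) ** 4 under the uniform stand-in.
print(math.exp(numeral_log_prob("2.1", uniform_model)))
```

Since every numeral is spelled out from a fixed symbol set, none has to fall back to UNKNUM; longer numerals simply receive less probability.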

SLIDE 22

Strategy: from continuous PDF

[Figure: a probability density function (PDF) over numbers from 1 to 3.5]

$p(\text{numeral} = 2.1) = P(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$

SLIDE 23

Strategy: from continuous PDF

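[Figure: a probability density function (PDF) over numbers from 1 to 3.5]

$p(\text{numeral} = 2.1) = P(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$

[Diagram: from $h_t$, a softmax over the components of a mixture of Gaussians (MoG) with frozen means $\mu$ and standard deviations $\sigma$; the precision is modelled separately by an RNN, $p(\text{precision}) = p_{\text{RNN}}(\text{dddd})$, ending with EOS]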
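A sketch of scoring a numeral with a mixture of Gaussians: the interval probability comes from Gaussian CDFs (written with `math.erf`) and is multiplied by the probability of the numeral's precision. In the talk the mixture weights come from a softmax over $h_t$ with frozen $\mu$s and $\sigma$s; here they are passed in directly, and `precision_probs` is a hypothetical stand-in for the precision model.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mog_interval_prob(lo, hi, weights, mus, sigmas):
    """P(lo < number < hi) under a mixture of Gaussians."""
    return sum(
        w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
        for w, m, s in zip(weights, mus, sigmas)
    )


def numeral_prob(numeral, weights, mus, sigmas, precision_probs):
    """p(numeral) = P(value - half_unit < number < value + half_unit | precision) * p(precision).

    precision = digits after the decimal point, so '2.1' has precision 1 and the
    interval (2.05, 2.15). precision_probs is a hypothetical stand-in model.
    """
    precision = len(numeral.split(".")[1]) if "." in numeral else 0
    value = float(numeral)
    half_unit = 0.5 * 10 ** (-precision)
    p_interval = mog_interval_prob(value - half_unit, value + half_unit, weights, mus, sigmas)
    return p_interval * precision_probs.get(precision, 0.0)


# One hypothetical component N(2.0, 0.1^2) and a toy precision distribution.
p = numeral_prob("2.1", [1.0], [2.0], [0.1], {0: 0.3, 1: 0.5, 2: 0.2})
print(round(p, 4))  # 0.1209, i.e. (Phi(1.5) - Phi(0.5)) * 0.5
```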

SLIDE 24

Overview of Strategies

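[Diagram: the numeral strategies side by side, all driven by the same hidden state: the softmax and h-softmax over numerals (2, 1.7, 2018, UNKNUM), the d-RNN (<SOS> 2 . 1 <EOS>), and the MoG (a PDF over numbers)]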

SLIDE 25

Overview of Strategies

[Diagram: as above, with a further softmax from $h_t$ over strategies (h-softmax, d-RNN, MoG); mixing the strategies this way gives the combination model]
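A minimal sketch of the combination, assuming each strategy is exposed as a function returning its probability for a numeral and that a softmax over $h_t$ (represented here by `strategy_logits`) weights the strategies. The three toy strategies are hypothetical placeholders.

```python
import math


def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_numeral_prob(numeral, strategy_logits, strategies):
    """p(numeral | h_t) = sum_s p(s | h_t) * p_s(numeral | h_t).

    strategy_logits: one logit per strategy (stand-in for a projection of h_t).
    strategies: callables returning each strategy's probability for the numeral.
    """
    weights = softmax(strategy_logits)
    return sum(w * strategy(numeral) for w, strategy in zip(weights, strategies))


# Hypothetical stand-ins for the three strategies' scores.
toy_strategies = [
    lambda n: {"2": 0.5, "2018": 0.3}.get(n, 0.0),  # h-softmax: closed set (UNKNUM ignored here)
    lambda n: 0.01,                                 # d-RNN
    lambda n: 0.12,                                 # MoG
]
print(combined_numeral_prob("2.1", [0.0, 0.0, 0.0], toy_strategies))  # (0.0 + 0.01 + 0.12) / 3
```

Because the weights depend on the context, the mixture can favour different strategies for different kinds of numerals.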

SLIDE 26

Results: Language Modelling (1)

Clinical, adjusted perplexity (lower is better):
Model | softmax | h-softmax | d-RNN | MoG | combination
All tokens | 8.91 | 6.05 | 5.88 | 5.88 | 5.82
Words | 5.99 | 4.96 | 4.95 | 4.99 | 4.96
Numerals | 58,443.72 | 495.95 | 263.22 | 226.46 | 197.59

SLIDE 27

Results: Language Modelling (2)

Scientific, adjusted perplexity (lower is better):
Model | softmax | h-softmax | d-RNN | MoG | combination
All tokens | 80.62 | 54.8 | 53.7 | 54.37 | 53.03
Words | 51.83 | 49.81 | 48.89 | 48.97 | 48.25
Numerals | 3,505,856.25 | 550.98 | 519.8 | 683.16 | 520.95

SLIDE 28

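Results: Number Prediction

Task: predict the number denoted by a numeral (the numeral ‘2.1’ denotes the number 2.1), scored by mean absolute percentage error:
$\text{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|\text{prediction}_i - \text{target}_i|}{|\text{target}_i|} \times 100\%$

[Chart: Clinical MAPE (lower is better) for the mean and median baselines and the softmax, h-softmax, d-RNN, MoG and combination models; bar values 426, 622, 747, 514, 348, 552, with one bar beyond the axis at 2353.11]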
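A sketch of the metric and of constant-prediction baselines. That the mean and median baselines are computed over training numbers is an assumption made here for illustration; the data below is hypothetical.

```python
import statistics


def mape(predictions, targets):
    """Mean absolute percentage error in percent (targets must be non-zero)."""
    errors = [abs(p - t) / abs(t) for p, t in zip(predictions, targets)]
    return 100.0 * sum(errors) / len(errors)


def constant_baseline_mape(train_values, test_targets, stat):
    """MAPE of always predicting one summary statistic of the training numbers."""
    guess = stat(train_values)
    return mape([guess] * len(test_targets), test_targets)


# Hypothetical data: one outlier (999) in training drags the mean, not the median.
train = [1.7, 1.8, 2.0, 999.0]
test = [2.1, 1.73]
print(constant_baseline_mape(train, test, statistics.median))  # guesses 1.9
print(constant_baseline_mape(train, test, statistics.mean))    # guesses 251.125
```

A single large value among the training numbers is enough to make the mean a poor guess, while the median barely moves.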

SLIDE 29

Results: Number Prediction

[Chart: MAPE (lower is better) for the mean and median baselines and the softmax, h-softmax, d-RNN, MoG and combination models.
Clinical: bar values 426, 622, 747, 514, 348, 552, with one bar beyond the axis at 2353.11.
Scientific: bar values 1947, 1652, 1287, 590, 2333, with bars beyond the axis at 8039 and 1e23.]

SLIDE 30

Softmax versus Hierarchical Softmax

[Figure: cosine similarities between numeral embeddings (1, 2, 3, 4, …, 100, 101, …, 2012, 2013), under the softmax and under the h-softmax]

SLIDE 31

Analysis: d-RNN and Benford’s Law

[Figure: cosine similarities between the d-RNN embeddings of the symbols 0–9, the decimal point and EOS]

SLIDE 32

Analysis: d-RNN and Benford’s Law

[Figure: cosine similarities between the d-RNN embeddings of 0–9, the decimal point and EOS; histograms of the 1st and 4th digits under the d-RNN versus Benford’s law, for the Scientific and Clinical datasets]
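For reference, Benford’s law gives the first-digit probability $\log_{10}(1 + 1/d)$, and its generalised form gives the distribution of later digits. The sketch below computes both (function names are hypothetical), showing that the 1st digit is strongly skewed while the 4th is already close to uniform, which is the pattern the d-RNN histograms are compared against.

```python
import math


def benford_first_digit(d):
    """P(first significant digit = d) = log10(1 + 1/d), for d in 1..9."""
    return math.log10(1 + 1 / d)


def benford_nth_digit(n, d):
    """P(n-th significant digit = d) for n >= 2 and d in 0..9 (generalised Benford's law).

    Sums log10(1 + 1/(10k + d)) over all (n-1)-digit prefixes k.
    """
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(10 ** (n - 2), 10 ** (n - 1)))


first = [benford_first_digit(d) for d in range(1, 10)]
fourth = [benford_nth_digit(4, d) for d in range(10)]
print([round(q, 3) for q in first])   # strongly skewed towards 1
print([round(q, 3) for q in fourth])  # close to uniform, about 0.1 each
```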

SLIDE 33

Analysis: Model Predictions

[Figure: predictions of MoG, d-RNN and h-softmax for the blank in ‘… ejective fraction : ____ % ...’]

SLIDE 34

Analysis: Strategy Selection

[Diagram: a softmax from $h_t$ over strategies (h-softmax, d-RNN, MoG), with example contexts: ‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’, ‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’, ‘measured 32 x 31 mm’, ‘NGC 6334 stars’]

h-softmax: small integers, percentiles, years
d-RNN: 2-digit integers, some ids
MoG: reals, some ids

SLIDE 35

Conclusion (1)

Are existing LMs numerate?

[Diagram: a single softmax from $h_t$ over a vocabulary that mixes words and numerals (the, cat, mat, UNK, 2, 3.14, 2018, UNKNUM)]

SLIDE 36

Conclusion (1)

Are existing LMs numerate?

[Diagram: a single softmax from $h_t$ over words and numerals; for ‘John’s height is ___’ it offers candidates such as 999, 50, 25, 2018, 3.14, UNKNUM, 1, 2, 3]

SLIDE 37

Conclusion (2)

How to improve the numeracy of LMs?

[Diagram: a softmax from $h_t$ over strategies (h-softmax, d-RNN, MoG), forming the combination model]

SLIDE 38

Conclusion (2)

How to improve the numeracy of LMs?

[Diagram: a softmax from $h_t$ over strategies (h-softmax, d-RNN, MoG); for ‘John’s height is ___’ the combination model offers plausible heights such as 1.73, 1.8, 2 and 2.1]

SLIDE 39

Thank you!
