SLIDE 1

Language Modelling Makes Sense

Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation

Daniel Loureiro, Alípio Jorge. ACL 2019, Florence, 31 July 2019

SLIDE 2

Sense Embeddings

Exploiting the latest Neural Language Models (NLMs) for sense-level representation learning.

  • Beat SOTA for English Word Sense Disambiguation (WSD).
  • Full WordNet in NLM-space (+100K common sense concepts).
  • Concept-level analysis of NLMs.

Introduction Related Work Our Approach Performance Applications Conclusions

SLIDE 4

Related Work

SLIDE 5

Related Work

Bag-of-Features Classifiers (SVM): [Zhong and Ng (2010)] [Iacobacci et al. (2016)]

Deep Sequence Classifiers (BiLSTM): [Raganato et al. (2017)] [Luo et al. (2018a)] [Luo et al. (2018b)] [Vial et al. (2018)]

Sense-level Representations (k-NN over NLM reprs.): [Melamud et al. (2016)] [Yuan et al. (2016)] [Peters et al. (2018)]

SLIDE 7

Bag-of-Features Classifiers

It Makes Sense (IMS) [Zhong and Ng (2010)] :

  • POS tags, surrounding words, local collocations.
  • SVM for each word type in training.
  • Fallback: Most Frequent Sense (MFS).
  • Improved with word embedding features. [Iacobacci et al. (2016)]
  • Still competitive (!)
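The feature set listed above can be sketched as follows; this is an illustrative reconstruction, not the original IMS code, and the feature names and window size are my own choices:

```python
# Toy IMS-style bag-of-features extraction for one target word in context.
# Illustrative only: real IMS feeds such features to one SVM per word type,
# falling back on the Most Frequent Sense for unseen words.

def extract_features(tokens, pos_tags, target_idx, window=3):
    """Build IMS-style features for the word at target_idx."""
    feats = {}
    n = len(tokens)
    # 1) POS tags of the target and its neighbours
    for off in range(-window, window + 1):
        i = target_idx + off
        if 0 <= i < n:
            feats[f"pos[{off}]"] = pos_tags[i]
    # 2) Surrounding words (bag of words in the window)
    for off in range(-window, window + 1):
        i = target_idx + off
        if off != 0 and 0 <= i < n:
            feats[f"word={tokens[i].lower()}"] = 1
    # 3) Local collocations (ordered n-grams around the target)
    if target_idx >= 1:
        feats["colloc[-1,0]"] = f"{tokens[target_idx-1]}_{tokens[target_idx]}"
    if target_idx + 1 < n:
        feats["colloc[0,+1]"] = f"{tokens[target_idx]}_{tokens[target_idx+1]}"
    return feats

tokens = ["She", "left", "her", "glasses", "in", "the", "cupboard"]
pos = ["PRP", "VBD", "PRP$", "NNS", "IN", "DT", "NN"]
features = extract_features(tokens, pos, target_idx=3)
```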

SLIDE 8

Bi-directional LSTMs (BiLSTMs):

  • Better with:
  • Attention (as everything else).
  • Auxiliary losses. (POS, lemmas, lexnames) [Raganato et al. (2017)]
  • Glosses, via co-attention mechanisms. [Luo et al. (2018)]
  • Still must fallback on MFS.
  • Not that much better than bag-of-features…

Deep Sequence Classifiers

[Raganato et al. (2017)]

SLIDE 9

Contextual k-NN


Matching Contextual Word Embeddings:

  • Produce Sense Embeddings from NLMs (averaging).
  • Sense embs. can be compared with contextual embs.
  • Disambiguation = Nearest Neighbour search (1-NN).
  • Sense embs. limited to annotations. MFS required.
  • Promising, but early attempts.

[Ruder (2018)]
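The 1-NN disambiguation step described above can be sketched as a cosine nearest-neighbour search; the 3-d vectors stand in for real NLM embeddings and are made up for illustration:

```python
# Toy 1-NN disambiguation: match a contextual embedding against precomputed
# sense embeddings by cosine similarity and return the nearest sensekey.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def disambiguate(context_emb, sense_embs):
    """Return the sensekey whose embedding is nearest (1-NN) to context_emb."""
    return max(sense_embs, key=lambda sk: cosine(context_emb, sense_embs[sk]))

sense_embs = {
    "glass%1:27:00::":          [0.9, 0.1, 0.0],  # the material
    "drinking_glass%1:06:00::": [0.1, 0.9, 0.1],  # the container
}
ctx = [0.2, 0.8, 0.0]  # contextual embedding of "glass" in a drinking context
best = disambiguate(ctx, sense_embs)
```

Senses without annotated examples have no embedding to match, which is why these early attempts still require the MFS fallback.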

SLIDE 10

Our Approach

SLIDE 11

Our Approach

  • Expand the k-NN approach to full-coverage of WordNet.
  • Matching senses becomes trivial, no MFS fallbacks needed.
  • Full-set of sense embeddings in NLM-space is useful beyond WSD.

SLIDE 16

Challenges

SLIDE 20

Challenges

  • Overcome very limited sense annotations (only 16% of senses covered).
  • Infer missing senses correctly so that task performance improves.
  • Rely only on sense embeddings, no lemma or POS features.

  • Bootstrap: Annotated Dataset
  • Propagate: WordNet Ontology
  • Enrich: WordNet Glosses
  • Reinforce: Morphological Embeddings

SLIDE 22

Bootstrapping Sense Embeddings

Can your insurance company aid you in reducing administrative costs ?

insurance_company%1:14:00:: aid%2:41:00:: reduce%2:30:00:: administrative%3:01:00:: cost%1:21:00::

Would it be feasible to limit the menu in order to reduce feeding costs ?

cost%1:21:00:: feasible%5:00:00:possible:00 limit%2:30:00:: menu%1:10:00:: reduce%2:30:00:: feeding%1:04:01::

SLIDE 25

Bootstrapping Sense Embeddings

Sentence 1 (d_1): insurance_company%1:14:00:: aid%2:41:00:: reduce%2:30:00:: administrative%3:01:00:: cost%1:21:00::

Sentence 2 (d_2): cost%1:21:00:: feasible%5:00:00:possible:00 limit%2:30:00:: menu%1:10:00:: reduce%2:30:00:: feeding%1:04:01::

SLIDE 26

Bootstrapping Sense Embeddings

reduce%2:30:00:: (d_1, d_2)   cost%1:21:00:: (d_1, d_2)

SLIDE 28

Bootstrapping Sense Embeddings

w_reduce%2:30:00:: = ( d_1 + d_2 + … + d_n ) / n

w_cost%1:21:00:: = ( d_1 + d_2 + … + d_n ) / n


Outcome: 33,360 sense embeddings (16% coverage)
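The averaging above can be sketched as follows, with toy 3-d vectors standing in for real 1024-d NLM contextual embeddings:

```python
# Toy bootstrapping step: each sensekey embedding is the average of the
# contextual embeddings of its annotated occurrences (d_1, ..., d_n).
from collections import defaultdict

def bootstrap(annotated):
    """annotated: list of (sensekey, contextual_embedding) pairs."""
    sums, counts = {}, defaultdict(int)
    for sk, d in annotated:
        sums[sk] = d if sk not in sums else [a + b for a, b in zip(sums[sk], d)]
        counts[sk] += 1
    return {sk: [x / counts[sk] for x in sums[sk]] for sk in sums}

annotated = [
    ("reduce%2:30:00::", [1.0, 0.0, 2.0]),  # d_1, from sentence 1
    ("reduce%2:30:00::", [3.0, 2.0, 0.0]),  # d_2, from sentence 2
    ("cost%1:21:00::",   [0.0, 4.0, 2.0]),  # d_1
]
sense_embs = bootstrap(annotated)
```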

SLIDE 29

Propagating Sense Embeddings

WordNet’s units, synsets, represent concepts at different levels.

SLIDE 31

Propagating Sense Embeddings

kid%1:18:00:: (sensekey) → child.n.01 (synset) → juvenile.n.01 (hypernym synset) → noun.person (lexname)

SLIDE 34

Propagating Sense Embeddings


hamburger%1:13:01::, burger%1:13:00:: → burger.n.02
hotdog%1:18:00:: → hotdog.n.01
potato_chip%1:13:00:: → chips.n.04
wrap%1:13:00:: → wrap.n.02
sandwich%1:13:00:: → sandwich.n.01
(all synsets under lexname noun.food)

Retrieve Synsets, Relations and Categories

SLIDE 35

Propagating Sense Embeddings


1st stage: Synset Embeddings

SLIDE 36

Propagating Sense Embeddings


2nd Stage: Hypernym Embeddings (ind. Synsets)

SLIDE 37

Propagating Sense Embeddings


3rd Stage: Lexname Embeddings
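The three propagation stages can be sketched as follows; the WordNet structure is hard-coded toy data, and the fallback order (own sense, then synset, then hypernym, then lexname) mirrors the stages above:

```python
# Toy propagation of sense embeddings through WordNet levels.

def average(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def propagate(sense_embs, synset_senses, synset_hypernym, synset_lexname):
    # 1st stage: synset embeddings from the sense embeddings available
    synset_embs = {
        syn: average([sense_embs[s] for s in senses if s in sense_embs])
        for syn, senses in synset_senses.items()
        if any(s in sense_embs for s in senses)
    }
    # 2nd stage: hypernym embeddings from the synsets below them
    hyper_groups = {}
    for syn, emb in synset_embs.items():
        hyper_groups.setdefault(synset_hypernym[syn], []).append(emb)
    hyper_embs = {h: average(v) for h, v in hyper_groups.items()}
    # 3rd stage: lexname embeddings from all synsets in the category
    lex_groups = {}
    for syn, emb in synset_embs.items():
        lex_groups.setdefault(synset_lexname[syn], []).append(emb)
    lex_embs = {lx: average(v) for lx, v in lex_groups.items()}
    # fill every sense: own embedding, else synset, else hypernym, else lexname
    full = {}
    for syn, senses in synset_senses.items():
        for s in senses:
            if s in sense_embs:
                full[s] = sense_embs[s]
            elif syn in synset_embs:
                full[s] = synset_embs[syn]
            elif synset_hypernym[syn] in hyper_embs:
                full[s] = hyper_embs[synset_hypernym[syn]]
            else:
                full[s] = lex_embs[synset_lexname[syn]]
    return full

synset_senses = {
    "burger.n.02": ["hamburger%1:13:01::", "burger%1:13:00::"],
    "wrap.n.02":   ["wrap%1:13:00::"],
}
synset_hypernym = {"burger.n.02": "sandwich.n.01", "wrap.n.02": "sandwich.n.01"}
synset_lexname = {"burger.n.02": "noun.food", "wrap.n.02": "noun.food"}
sense_embs = {"hamburger%1:13:01::": [2.0, 0.0]}  # only one annotated sense
full = propagate(sense_embs, synset_senses, synset_hypernym, synset_lexname)
```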

SLIDE 38

Propagating Sense Embeddings

But propagated embeddings cannot differentiate senses that fall back on the same synset, hypernym, or lexname…

SLIDE 40

Enriching Sense Embeddings


Leverage Synset Definitions and Lemmas for Differentiation

sandwich:%1:13:00:: (sandwich.n.01)

Definition: two (or more) slices of bread with a filling between them Lemmas: sandwich

wrap:%1:13:00:: (wrap.n.02)

Definition: a sandwich in which the filling is rolled up in a soft tortilla Lemmas: wrap, tortilla

SLIDE 41

Enriching Sense Embeddings


Compose a new context

sandwich:%1:13:00:: (sandwich.n.01)

sandwich - two (or more) slices of bread with a filling between them

wrap:%1:13:00:: (wrap.n.02)

wrap, tortilla - a sandwich in which the filling is rolled up in a soft tortilla

SLIDE 42

Enriching Sense Embeddings


Make the context specific to sensekey (repeat lemma)

sandwich:%1:13:00::

sandwich - sandwich - two (or more) slices of bread with a filling between them

wrap%1:13:00::

wrap - wrap, tortilla - a sandwich in which the filling is rolled up in a soft tortilla

SLIDE 44

Enriching Sense Embeddings


Obtain contextual embeddings for every token

sandwich:%1:13:00::

sandwich - sandwich - two (or more) slices of bread with a filling between them

wrap%1:13:00::

wrap – wrap, tortilla - a sandwich in which the filling is rolled up in a soft tortilla

SLIDE 45

Enriching Sense Embeddings


Sentence Embedding from avg. of Contextual Embeddings

sandwich:%1:13:00::

sandwich - sandwich - two (or more) slices of bread with a filling between them

wrap%1:13:00::

wrap - wrap - a sandwich in which the filling is rolled up in a soft tortilla

w_e = average of the token contextual embeddings (e = 1024)
SLIDE 46

Enriching Sense Embeddings


Merge Sentence Embedding with previous Sense Embedding

sandwich:%1:13:00::

sandwich - sandwich - two (or more) slices of bread with a filling between them

wrap%1:13:00::

wrap - wrap - a sandwich in which the filling is rolled up in a soft tortilla

w_t = [ previous sense embedding ; w_e ]  for sandwich%1:13:00:: and wrap%1:13:00::

SLIDE 47

Enriching Sense Embeddings

Merged embedding dimensionality: e = 1024 + 1024 = 2048
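The enrichment steps above (compose a sense-specific context, embed it, merge by concatenation) can be sketched as follows; `embed_sentence` is a toy stand-in for averaging NLM token embeddings, and all names are illustrative:

```python
# Toy gloss enrichment: build "lemma - lemmas - definition", embed it, and
# concatenate with the previous sense embedding (doubling its dimensionality).

def compose_context(sense_lemma, synset_lemmas, definition):
    # target lemma repeated first to make the context sensekey-specific
    return f"{sense_lemma} - {', '.join(synset_lemmas)} - {definition}"

def embed_sentence(text, dim=4):
    # toy stand-in for a sentence embedding (average of token embeddings)
    toks = text.split()
    vecs = [[(len(t) % 5) / 4.0] * dim for t in toks]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def enrich(prev_emb, sense_lemma, synset_lemmas, definition):
    ctx = compose_context(sense_lemma, synset_lemmas, definition)
    return prev_emb + embed_sentence(ctx)  # concatenation doubles e

definition = "a sandwich in which the filling is rolled up in a soft tortilla"
ctx = compose_context("wrap", ["wrap", "tortilla"], definition)
emb = enrich([0.1] * 4, "wrap", ["wrap", "tortilla"], definition)
```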

SLIDE 48

Reinforcing Sense Embeddings


Contextual Embeddings aren’t good at preserving morphological relatedness

SLIDE 49

Reinforcing Sense Embeddings


Retrieve char-ngram embeddings (static) for lemmas

w_m = static char-ngram embedding of the lemma, for sandwich%1:13:00:: and wrap%1:13:00::
SLIDE 50

Reinforcing Sense Embeddings


Merge with previous sense embeddings

w_t ← [ w_t ; w_m ]  for sandwich%1:13:00:: and wrap%1:13:00::

SLIDE 51

Reinforcing Sense Embeddings

Final sense embedding dimensionality: e = 2048 + 300 = 2348
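The reinforcement step is another concatenation; a minimal sketch, assuming a fastText-style 300-d static lemma embedding (the dimensions follow the slides: 2048 + 300 = 2348):

```python
# Toy reinforcement: append a static char-ngram lemma embedding so that
# morphologically related lemmas stay close in the final sense space.

def reinforce(sense_emb, lemma_emb):
    return sense_emb + lemma_emb  # list concatenation

w_t = [0.0] * 2048  # enriched sense embedding
w_m = [0.0] * 300   # static char-ngram embedding of the lemma
w_full = reinforce(w_t, w_m)
```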

SLIDE 52

Matching Sense Embeddings


The glasses are in the cupboard.

SLIDE 55

Matching Sense Embeddings


The glasses are in the cupboard.

Query vector w_u for "glasses" is built from its contextual embedding d (plus the static lemma embedding) and matched by 1-NN against the candidate sense embeddings [ w_e ; w_m ; w_t ] of: spectacles%1:06:00::, glass%1:27:00::, drinking_glass%1:06:00::
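The matching step can be sketched as follows; the layout of the query vector (reusing the contextual embedding to fill the sense-level slots) and all vectors are illustrative toy values:

```python
# Toy full-coverage matching: assemble a query vector that mirrors the layout
# of the sense embeddings, then disambiguate by 1-NN cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def build_query(contextual, static):
    # [gloss slot ; char-ngram slot ; annotation slot] ~ [d ; w_m ; d]
    return contextual + static + contextual

def match(query, candidates):
    return max(candidates, key=lambda sk: cosine(query, candidates[sk]))

d = [0.9, 0.2]   # contextual embedding of "glasses"
w_static = [0.5]  # static embedding of the lemma "glasses"
candidates = {
    "spectacles%1:06:00::":     [0.9, 0.1, 0.6, 0.9, 0.1],
    "glass%1:27:00::":          [0.0, 0.9, 0.2, 0.0, 0.9],
    "drinking_glass%1:06:00::": [0.4, 0.6, 0.5, 0.4, 0.6],
}
query = build_query(d, w_static)
best = match(query, candidates)
```

Because every WordNet sense now has an embedding, the candidate set never comes up empty and no MFS fallback is needed.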

SLIDE 56

WSD Results

SLIDE 57

WSD Results

Standard English WSD Evaluation: F1 on the ALL set of the WSD Evaluation Framework (Raganato et al. 2017).

(chart, F1 axis 60-80) Systems compared: MFS baseline; IMS (Zhong and Ng, 2010); IMS + Emb. (Iacobacci et al. 2016); BiLSTM (Raganato et al. 2017); BiLSTM VR (Vial et al. 2018); context2vec (Melamud et al. 2016); ELMo k-NN (Peters et al. 2018); BERT k-NN (adapted from Peters et al.); LMMS-BERT (ours).

SLIDE 58

WSD Results


Uninformed Sense Matching (matching +200K)

Same standard but without filtering candidates by lemmas or POS

(chart, F1 axis 10-80, comparing LMMS 1024, LMMS 2048 and LMMS 2348)
SLIDE 59

Applying Sense Embeddings

SLIDE 60

World Knowledge in NLMs

What’s BERT thinking about when he reads?

SLIDE 61

World Knowledge in NLMs


[E1] played [E2] in [E3]

SLIDE 62

Checking for Biases in NLMs

Putting BERT on the spot

SLIDE 63

Checking for Biases in NLMs


bias(s) = sim( w_man.n.01 , w_s ) - sim( w_woman.n.01 , w_s )
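This bias probe can be sketched directly, taking sim to be cosine similarity; the 2-d vectors are toy values, not real LMMS embeddings:

```python
# Toy gender-bias probe: bias(s) = sim(w_man, w_s) - sim(w_woman, w_s).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def gender_bias(w_s, w_man, w_woman):
    return cosine(w_man, w_s) - cosine(w_woman, w_s)

w_man, w_woman = [1.0, 0.0], [0.0, 1.0]
bias_neutral = gender_bias([1.0, 1.0], w_man, w_woman)  # equidistant sense
bias_skewed = gender_bias([1.0, 0.2], w_man, w_woman)   # closer to man.n.01
```

A positive score means the sense embedding sits closer to man.n.01 than to woman.n.01, and vice versa.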

SLIDE 64

Conclusion

  • Powerful NLMs allow a simple k-NN to perform really well for WSD.
  • NLMs are improving very rapidly; progress in WSD should follow.
  • Sense embeddings from NLMs are useful not only for WSD, but also for NLM inspection and other probing or downstream tasks.

SLIDE 65

Future Work

  • Pipeline improvements: better NLMs, sentence embeddings, char embeddings, use of WordNet, etc.
  • Multilingual Sense Embeddings.
  • Semi-supervised Refinement.
  • Formalize inspection (probing task), other applications.

SLIDE 66

Thanks


Code and Sense Embeddings: github.com/danlou/LMMS

@danielbloureiro dloureiro@fc.up.pt