SLIDE 1

Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders

Terra Blevins and Luke Zettlemoyer

SLIDE 4

Context: "The plant sprouted a new leaf."
Target word: plant
Candidate senses:
  • (n) (botany) a living organism...
  • (n) buildings for carrying on industrial labor
  • (v) to put or set (a seed or plant) into the ground
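
The candidate senses and their glosses come from the WordNet sense inventory. A quick way to reproduce this kind of candidate list, assuming NLTK and its WordNet data are installed (the slides do not specify any tooling), is:

```python
# Print every WordNet sense of "plant" with its part of speech and gloss.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("plant"):
    print(f"({synset.pos()}) {synset.definition()}")

# The output includes the three glosses shown on the slide, e.g.:
#   (n) buildings for carrying on industrial labor
#   (n) (botany) a living organism lacking the power of locomotion
#   (v) put or set (seeds, seedlings, or plants) into the ground
```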

SLIDE 8

Data Sparsity in WSD

  • Senses have a Zipfian distribution in natural language
  • Data imbalance leads to worse performance on uncommon senses
  • We propose an approach to improve performance on rare senses with pretrained models and glosses

[Figure: EWISE results, showing a 62.3 F1 point gap between performance on common and uncommon senses]

Kilgarriff (2004), How dominant is the commonest sense of a word? Kumar et al. (2019), Zero-shot Word Sense Disambiguation using Sense Definition Embeddings.
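
As a back-of-the-envelope illustration of how a Zipfian sense distribution concentrates training data on the top sense (the sense count and exponent below are arbitrary choices for illustration, not estimates from SemCor):

```python
# With 5 senses whose frequencies follow a Zipf law with exponent 1, the most
# frequent sense alone covers about 44% of mentions and the rarest under 9%,
# so a supervised WSD model sees very few examples of the tail senses.
n_senses = 5
weights = [1.0 / rank for rank in range(1, n_senses + 1)]
total = sum(weights)
for rank, w in enumerate(weights, start=1):
    print(f"sense {rank}: {w / total:.1%} of mentions")
```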

SLIDE 11

Incorporating Glosses into WSD Models

  • Lexical overlap between the context and the gloss is a successful knowledge-based approach (Lesk, 1986); a minimal sketch follows this list
  • Neural models integrate glosses by:

○ Adding glosses as additional inputs into the WSD model (Luo et al., 2018a,b)

○ Mapping encoded gloss representations onto graph embeddings to be used as labels for a WSD model (Kumar et al., 2019)
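
A minimal sketch of the overlap heuristic behind Lesk-style approaches (a simplified re-implementation for illustration, not the original algorithm; the sense IDs are made up):

```python
# Pick the sense whose gloss shares the most tokens with the context.
def simplified_lesk(context_tokens, candidate_glosses):
    """candidate_glosses: dict mapping a sense id to its gloss string."""
    context = {tok.lower() for tok in context_tokens}
    best_sense, best_overlap = None, -1
    for sense, gloss in candidate_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

glosses = {
    "plant%factory": "buildings for carrying on industrial labor",
    "plant%organism": "a living organism lacking the power of locomotion",
    "plant%sow": "put or set a seed or plant into the ground",
}
# Raw overlap is brittle: on this sentence it favors the "sow" sense
# (its gloss shares "the", "a", and "plant"), not the organism sense.
print(simplified_lesk("The plant sprouted a new leaf".split(), glosses))
```

This brittleness is part of the motivation for learning gloss representations inside neural models, as in the approaches above.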

SLIDE 13

Pretrained Models for WSD

  • Simple probing classifiers on frozen pretrained representations have been found to perform better than models without pretraining
  • GlossBERT finetunes BERT on WSD with glosses by setting it up as a sentence-pair classification task (sketched below)

Hadiwinoto et al. (2019), Improved word sense disambiguation using pretrained contextualized representations. Huang et al. (2019), GlossBERT: BERT for word sense disambiguation with gloss knowledge.
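
A rough sketch of that sentence-pair framing, assuming the HuggingFace transformers tokenizer; the sense IDs and gold label are illustrative, and the strongest GlossBERT variant additionally highlights the target word, which this sketch omits:

```python
# Every (context, candidate gloss) pair becomes one binary classification
# example: label 1 for the gold sense's gloss, 0 for all other candidates.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

context = "The plant sprouted a new leaf."
candidate_glosses = {
    "plant%organism": "a living organism lacking the power of locomotion",
    "plant%factory": "buildings for carrying on industrial labor",
    "plant%sow": "put or set a seed or plant into the ground",
}
gold_sense = "plant%organism"

pairs = []
for sense, gloss in candidate_glosses.items():
    encoded = tokenizer(context, gloss, truncation=True)  # [CLS] context [SEP] gloss [SEP]
    pairs.append((encoded, 1 if sense == gold_sense else 0))
# A standard BERT sequence-pair classifier is finetuned on these pairs; at
# test time the candidate gloss with the highest positive score is predicted.
```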

SLIDE 16

Our Approach: Gloss Informed Bi-encoder

  • Two encoders independently encode the context and the gloss, aligning the target word embedding to the correct sense embedding (a minimal code sketch follows this list)
  • The encoders are initialized with BERT and trained end-to-end, without external knowledge
  • The bi-encoder is more computationally efficient than a cross-encoder
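
A minimal sketch of the bi-encoder scoring with PyTorch and HuggingFace transformers; the subword pooling, batching, and training loop are simplified and do not necessarily match the released implementation (linked on the final slide):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
context_encoder = AutoModel.from_pretrained("bert-base-uncased")  # encodes sentences
gloss_encoder = AutoModel.from_pretrained("bert-base-uncased")    # encodes sense glosses

@torch.no_grad()
def score_senses(context, target_word, glosses):
    """Dot-product score between the target word's contextual embedding
    and the [CLS] embedding of each candidate gloss."""
    ctx = tokenizer(context, return_tensors="pt")
    ctx_out = context_encoder(**ctx).last_hidden_state[0]          # (seq_len, hidden)
    # Locate the target word's first subword in the context (simplified pooling).
    target_id = tokenizer(target_word, add_special_tokens=False)["input_ids"][0]
    position = (ctx["input_ids"][0] == target_id).nonzero()[0, 0]
    target_vec = ctx_out[position]                                  # (hidden,)

    gls = tokenizer(glosses, return_tensors="pt", padding=True)
    gloss_vecs = gloss_encoder(**gls).last_hidden_state[:, 0]       # (n_senses, hidden)
    return gloss_vecs @ target_vec                                  # one score per candidate

scores = score_senses(
    "The plant sprouted a new leaf.",
    "plant",
    ["a living organism lacking the power of locomotion",
     "buildings for carrying on industrial labor",
     "put or set a seed or plant into the ground"],
)
print(scores.softmax(dim=0))  # distribution over candidate senses (untrained here)
```

During training the scores over a word's candidate senses are normalized with a softmax and both encoders are finetuned end-to-end against the gold sense; because any gloss can be encoded at test time, the same scoring applies to senses never observed during training.
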
SLIDE 21

Baselines and Prior Work

Model                    Glosses?  Pretraining?  Source
HCAN                     ✓                       Luo et al., 2018a
EWISE                    ✓                       Kumar et al., 2019
BERT Probe                         ✓             Ours
GLU                                ✓             Hadiwinoto et al., 2019
LMMS                     ✓         ✓             Loureiro and Jorge, 2019
SVC                                ✓             Vial et al., 2019
GlossBERT                ✓         ✓             Huang et al., 2019
Bi-encoder Model (BEM)   ✓         ✓             Ours

SLIDE 28

Overall WSD Performance

[Bar chart: overall F1 scores. MFS baseline: 65.5; prior systems and baselines: 71.1, 71.8, 73.7, 74.1, 75.4, 75.6, 77.0; BEM: 79.0]

SLIDE 32

Performance by Sense Frequency

[Bar chart: F1 on the most frequent senses (MFS): 94.9, 93.5, 94.1; F1 on less frequent senses (LFS): 37.0, 31.2, 52.6]

BEM gains come almost entirely from LFS

SLIDE 33

Zero-shot Evaluation

  • The BEM can represent new, unseen senses with the gloss encoder and can encode unseen words with the context encoder (see the sketch after this list)
  • The probe baseline relies on a WordNet back-off, predicting the most common sense of unseen words as indicated in WordNet
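
For instance, reusing the score_senses sketch from the approach section, an unseen word can be scored against its WordNet glosses on the fly (the word "bass" here is an arbitrary illustration, and without finetuning the prediction is not meaningful):

```python
from nltk.corpus import wordnet as wn

context = "He caught a huge bass near the dock."
target = "bass"
senses = wn.synsets(target)  # candidate senses, even if never seen in training
scores = score_senses(context, target, [s.definition() for s in senses])
best = senses[scores.argmax().item()]
print(best.name(), "-", best.definition())
```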

SLIDE 35

Zero-shot Evaluation

[Bar chart: F1 on zero-shot words – 84.9, 91.0, 91.2; F1 on zero-shot senses – 53.6, 68.9]

SLIDE 38

Few-shot Learning of WSD

Train the BEM (and the frozen probe baseline) on a subset of SemCor containing (up to) k examples of each sense; a sketch of building such a subset follows.

The BEM at k=5 reaches performance similar to the baseline trained on the full dataset
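
A sketch of building such a subset; the example format (context, target position, sense id) is an assumption for illustration and is not taken from the released data loaders:

```python
import random
from collections import defaultdict

def kshot_subset(examples, k, seed=0):
    """Keep at most k training examples per sense.

    examples: iterable of (context, target_position, sense_id) tuples.
    """
    rng = random.Random(seed)
    by_sense = defaultdict(list)
    for ex in examples:
        by_sense[ex[2]].append(ex)
    subset = []
    for sense_examples in by_sense.values():
        rng.shuffle(sense_examples)
        subset.extend(sense_examples[:k])  # up to k examples of this sense
    return subset
```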

SLIDE 41

Takeaways

  • The BEM improves over the BERT probe baseline and prior approaches to using (1) sense definitions and (2) pretrained models for WSD

  • Gains stem from better performance on less common and unseen senses

Questions?

https://github.com/facebookresearch/wsd-biencoders
blvns@cs.washington.edu