Noun Sense Induction and Disambiguation using Graph-Based - PowerPoint PPT Presentation

KONVENS 2016, The 13-th Conference on Natural Language Processing 21 September, 2016, Bochum, Germany Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics Alexander Panchenko , Johannes Simon, Martin Riedl and Chris Biemann Technische Universität Darmstadt, LT Group, Computer Science Department, Germany September 19, 2016 | 1

Summary ▶ An approach to word sense induction and disambiguation . ▶ The method is unsupervised and knowledge-free . ▶ Sense induction by clustering of word similarity networks ▶ Feature aggregation w.r.t. the induced inventory. ▶ Comparable to the state-of-the-art unsupervised WSD (SemEval’13 participants and various sense embeddings). ▶ Open source implementation: github.com/tudarmstadt-lt/JoSimText September 19, 2016 | 2

Motivation for Unsupervised Knowledge-Free Word Sense Disambiguation ▶ A word sense disambiguation (WSD) system: ▶ Input : word and its context. ▶ Output : a sense of this word. ▶ Surveys : Agirre and Edmonds (2007) and Navigli (2009). ▶ Knowledge-based approaches that rely on hand-crafted resources, such as WordNet. ▶ Supervised approaches learn from hand-labeled training data, such as SemCor. ▶ Problem 1: hand-crafted lexical resources and training data are expensive to create, often inconsistent and domain-dependent. ▶ Problem 2: These methods assume a fixed sense inventory: ▶ senses emerge and disappear over time. ▶ different applications require different granularities the sense inventory. ▶ An alternative route is the unsupervised knowledge-free approach . ▶ learn an interpretable sense inventory ▶ learn a disambiguation model September 19, 2016 | 3

Contribution ▶ The contribution is a framework that relies on induced inventories as a pivot for learning contextual feature representations and disambiguation. ▶ We rely on the JoBimText framework and distributional semantics (Biemann and Riedl, 2013) adding a word sense disambiguation functionality on top of it. ▶ The advantage of our method, compared to prior art, is that it can integrate several types of context features in an unsupervised way . ▶ The method achieves state-of-the-art results in unsupervised WSD. September 19, 2016 | 4

Method: Data-Driven Noun Sense Modelling 1. Computation of a distributional thesaurus ▶ using distributional semantics 2. Word sense induction ▶ using ego-network clustering of related words 3. Building a disambiguation model of the induced senses ▶ by feature aggregation w.r.t. the induced sense inventory September 19, 2016 | 5

Method: Distributional Thesaurus of Nouns using the JoBimText framework ▶ A distributional thesaurus (DT) is a graph of word similarities, such as “(Python, Java, 0.781)”. ▶ We used the JoBimText framework (Biemann and Riedl, 2013): ▶ efficient computation of nearest neighbours for all words ▶ providing state-of-the-art performance (Riedl, 2016) ▶ For each noun in the corpus get 200 most similar nouns September 19, 2016 | 6

Method: Distributional Thesaurus of Nouns using the JoBimText framework (cont.) ▶ For each noun in the corpus get l = 200 most similar nouns: 1. Extract word , feature and word-feature frequencies. ▶ Using dependency-based features , such as amod( • , grilled) or prep_for( • , dinner) using the Malt parser (Nivre et al., 2007) ▶ Collapsing of dependencies in the same way as the Stanford dependencies. 2. Discard rare words , features and word-features ( t < 3). 3. Normalize word-feature scores using the Local Mutual Information (LMI): ∑ f ij i , j f ij LMI ( i , j ) = f ij · PMI ( i , j ) = f ij · log f i ∗ · f ∗ j 4. Ranking word features by LMI. 5. Prune all, but p = 1000 most significant features per word. 6. Word similarities are computed as a number of common features for two words: sim ( t i , t j ) = | k : f ik > 0 ∧ f jk > 0 | 7. Return l = 200 most related words per word. September 19, 2016 | 7

Method: Noun Sense Induction via Ego-Network Clustering ▶ The "furniture" and the "data" sense clusters of the word "table". ▶ Graph clustering using the Chinese Whispers algorithm (Biemann, 2006). September 19, 2016 | 8

Method: Noun Sense Induction via Ego-Network Clustering (cont.) ▶ Process one word per iteration ▶ Construct an ego-network of the word: ▶ use dependency-based distributional word similarities ▶ the ego-network size ( N ): the number of related words ▶ the ego-network connectivity ( n ): how strongly the neighbours are related; this parameter controls granularity of sense inventory. ▶ Graph clustering using the Chinese Whispers algorithm . September 19, 2016 | 9

Method: Disambiguation of Induced Noun Senses ▶ Learning a disambiguation model P ( s i | C ) for each of the induced senses s i ∈ S of the target word w in context C = { c 1 , ..., c m } . ▶ We use the Naïve Bayes model : P ( s i ) ∏ | C | j =1 P ( c j | s i ) P ( s i | C ) = , P ( c 1 , ..., c m ) ▶ The best sense given the context C : | C | ∏ s ∗ P ( c j | s i ). i = arg max P ( s i ) s i ∈ S j =1 September 19, 2016 | 10

Method: Disambiguation of Induced Noun Senses (cont.) ▶ The prior probability of each sense is computed based on the largest cluster heuristic: | s i | P ( s i ) = s i ∈ S | s i | . ∑ ▶ Extract sense representations by aggregation of features from all words of the cluster s i . ▶ Probability of the feature c j given the sense s i : | s i | P ( c j | s i ) = 1 − α f ( w k , c j ) ∑ λ k + α , Λ i f ( w k ) k ▶ To normalize the score we divide it by the sum of all the weights Λ i = ∑ | s i | λ k : k ▶ α is a small number, e.g. 10 − 5 , added for smoothing. September 19, 2016 | 11

Method: Disambiguation of Induced Noun Senses (cont.) ▶ To calculate a WSD model we need to extract from corpus: 1. the distributional thesaurus; 2. sense clusters; 3. word-feature frequencies. ▶ Sense representations are obtained by “averaging” of feature representations of words in the sense clusters. September 19, 2016 | 12

Feature Extraction: Single Models ▶ The method requires sparse word-feature counts f ( w k , c j ). ▶ We demonstrate the approach on the four following types of features: 1. Features based on sense clusters : Cluster ▶ Features : words from the induced sense clusters; ▶ Weights : similarity scores. 2. Dependency features : Deptarget, Depall ▶ Features : syntactic dependencies attached to the word, e.g. “subj( • ,type)” or “amod(digital, • )” ▶ Weights : LMI scores of the scores. 3. Dependency word features : Depword ▶ Features : words extracted from all syntactic dependencies attached to a target word. For instance, the feature “subj( • ,write)” would result in the feature “write”. ▶ Weights : LMI scores. 4. Trigram features : Trigramtarget, Trigramall ▶ Features : pairs of left and right words around the target word, e.g. “typing_ • _or” and “digital_ • _.”. ▶ Weights : LMI scores. September 19, 2016 | 13

Feature Combination: Combined Models ▶ Feature-level Combination of Features ▶ Union context features of different types, such as dependencies and trigrams. ▶ “Stack” feature spaces. ▶ Meta-level Combination of Features 1. Independent sense classifications by single models 2. Aggregation of predictions with: ▶ Majority selects the sense s i selected by the largest number of single models. ▶ Ranks . First, results of single model classification are ranked by their confidence ˆ P ( s i | C ): the most suitable sense to the context obtains rank one and so on. Finally, we assign the sense with the least sum of ranks. ▶ Sum . This strategy assigns the sense with the largest sum of classification i ˆ P ( s i | C i confidences i.e., ∑ k ), where i is the number of the single model. September 19, 2016 | 14

Corpora used for experiments # Tokens Size Text Type 1.863 · 10 9 Wikipedia 11.79 Gb encyclopaedic 1.980 · 10 9 ukWaC 12.05 Gb Web pages Table: Corpora used for training our models. September 19, 2016 | 15

Results: Evaluation on the “Python-Ruby- Jaguar” (PRJ) dataset: 3 words, 60 contexts, 2 senses per word ▶ A simple dataset: 60 contexts, 2 homonyms per word. ▶ The models based on the meta-combinations are not shown for brevity as they did not improve performance of the presented models in terms of F-score. September 19, 2016 | 16

Results: Evaluation on the TWSI dataset: 1012 nouns, 145140 contexts, 2.33 senses per word September 19, 2016 | 17

Results: the TWSI dataset: effect of the corpus choice on the WSD performance ▶ 10 best models according to the F-score on the TWSI dataset ▶ Trained on Wikipedia and ukWaC corpora September 19, 2016 | 18

Results: Evaluation on the SemEval 2013 Task 13 dataset: 20 nouns, 1848 contexts September 19, 2016 | 19

Conclusion ▶ An approach to word sense induction and disambiguation . ▶ The method is unsupervised and knowledge-free . ▶ Sense induction by clustering of word similarity networks ▶ Feature aggregation w.r.t. the induced inventory. ▶ Comparable to the state-of-the-art unsupervised WSD (SemEval’13 participants and various sense embeddings). ▶ Open source implementation: github.com/tudarmstadt-lt/JoSimText September 19, 2016 | 20

Thank you! September 19, 2016 | 21

Noun Sense Induction and Disambiguation using Graph-Based - PowerPoint PPT Presentation

KONVENS 2016, The 13-th Conference on Natural Language Processing 21 September, 2016, Bochum, Germany Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics Alexander Panchenko , Johannes Simon, Martin Riedl and Chris

Word Sense Word Sense Word Sense Disambiguation Disambiguation Disambiguation Presented by

Word Sense Disambiguation Word Sense Disambiguation (WSD) Given A

Data-driven sense induction for disambiguation and lexical selection in translation Marianna

Word Sense Disambiguation WORD SENSE DISAMBIGUATION Homonymy and Polysemy As we have seen,

Word Meaning & Word Sense Disambiguation CMSC 723 / LING 723 / INST 725 M ARINE C ARPUAT

Southern Pinghua and its Noun Southern Pinghua and its Noun Southern Pinghua and its Noun

basket by b farias from the Noun Project light bulb by Andrew Doane from the Noun Project baby by

WSD Word Sense Disambiguation: Determine from context (or otherwise) what Word Sense

Induction Stepwise induction (for T PA , T cons ) Complete induction (for T PA , T cons )

STEP ONE ADMIT THERES AN ISSUE National Identity noun noun: national identity ; plural noun:

Similarity-based Word Sense Disambiguation Yael Karov Shimon Edelman Weizmann Institute MIT

Induction and recursion Chapter 5 Chapter Summary Mathematical Induction Strong Induction

Word Sense Disambiguation Unsupervised WSD Modern WSD L645 / B659 (Some material from Jurafsky

Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation Eneko Agirre

Common Proper Collective Abstract joy Banquo FINISHED? Friday Macbeth anger dagger Can

What Is an Expanded Noun Phrase? An expanded noun phrase gives much more detail than a simple

Exploiting Syntax in Sentiment Polarity Classification Wolfgang Seeker joint work with Adam

Annotating Corpora for Linguistics from text to knowledge Eckhard Bick University of Southern

Performance analysis : Hands-on time Wall/CPU parallel context gprof flat

Software Product Line Engineering Processes, Business, Technology, Architecture and

Module 1: Introduction Deriving Business Information Deriving meaningful information from

Mainly nuts and bolts and how they could fit together. 1 We will focus on charged particle

A Tour of Market Imperfections (Welch, Chapter 11) Ivo Welch Opinions and Disagreements

Knowledge Elicitation Exercise COMP34512 Sebastian Brandt brandt@cs.man.ac.uk Wednesday, 5

Sambuz

Useful Links

Newsletter

Mail Us

Noun Sense Induction and Disambiguation using Graph-Based - PowerPoint PPT Presentation

KONVENS 2016, The 13-th Conference on Natural Language Processing 21 September, 2016, Bochum, Germany Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics Alexander Panchenko , Johannes Simon, Martin Riedl and Chris

Word Sense Word Sense Word Sense Disambiguation Disambiguation Disambiguation Presented by

Word Sense Disambiguation Word Sense Disambiguation (WSD) Given A

Data-driven sense induction for disambiguation and lexical selection in translation Marianna

Word Sense Disambiguation WORD SENSE DISAMBIGUATION Homonymy and Polysemy As we have seen,

Word Meaning &amp; Word Sense Disambiguation CMSC 723 / LING 723 / INST 725 M ARINE C ARPUAT

Southern Pinghua and its Noun Southern Pinghua and its Noun Southern Pinghua and its Noun

basket by b farias from the Noun Project light bulb by Andrew Doane from the Noun Project baby by

WSD Word Sense Disambiguation: Determine from context (or otherwise) what Word Sense

Induction Stepwise induction (for T PA , T cons ) Complete induction (for T PA , T cons )

STEP ONE ADMIT THERES AN ISSUE National Identity noun noun: national identity ; plural noun:

Similarity-based Word Sense Disambiguation Yael Karov Shimon Edelman Weizmann Institute MIT

Induction and recursion Chapter 5 Chapter Summary Mathematical Induction Strong Induction

Word Sense Disambiguation Unsupervised WSD Modern WSD L645 / B659 (Some material from Jurafsky

Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation Eneko Agirre

Common Proper Collective Abstract joy Banquo FINISHED? Friday Macbeth anger dagger Can

What Is an Expanded Noun Phrase? An expanded noun phrase gives much more detail than a simple

Exploiting Syntax in Sentiment Polarity Classification Wolfgang Seeker joint work with Adam

Annotating Corpora for Linguistics from text to knowledge Eckhard Bick University of Southern

Performance analysis : Hands-on time Wall/CPU parallel context gprof flat

Software Product Line Engineering Processes, Business, Technology, Architecture and

Module 1: Introduction Deriving Business Information Deriving meaningful information from

Mainly nuts and bolts and how they could fit together. 1 We will focus on charged particle

A Tour of Market Imperfections (Welch, Chapter 11) Ivo Welch Opinions and Disagreements

Knowledge Elicitation Exercise COMP34512 Sebastian Brandt brandt@cs.man.ac.uk Wednesday, 5

Sambuz

Useful Links

Newsletter

Mail Us

Word Meaning & Word Sense Disambiguation CMSC 723 / LING 723 / INST 725 M ARINE C ARPUAT