Natural Language Processing: Word Sense Disambiguation
Roman Kern <rkern@tugraz.at>
Institute for Interactive Systems and Data Science
2020-05-28
Outline
1. Introduction
2. General Observations
3. Approaches
4. Evaluation, Applications, Tools
Introduction
Ambiguous Words
Motivational Example
- Given a single (written) word, e.g., paper
- Depending on the context, the word might have different meanings, e.g., newspaper or writing material
- In short: words are ambiguous
Motivational Example
Sense | Sentence using that sense
Substance | That statue is made out of paper
Sheets of material | He needs some paper to draw on
Material with writing | Hand her that paper to read
Meaning of the writing | Did you understand that paper?
Oral presentation | I want to go hear his paper
News source | I read the paper every morning
Newspaper company | The paper might go out of business
Company representative | The paper called about doing an interview with you
Editorial policies | The paper is very pro-Illinois
Class report | I have to go turn in my paper
Wall covering | She got the most beautiful paper for her bedroom walls
Gift wrap | He tore open the paper to get at the present
Commercial paper | The paper on that silver mine is worth 10¢ on the dollar

Klein, D. and Murphy, G. 2002. Paper has been my ruin: Conceptual relations of polysemous senses. Journal of Memory and Language 47(4), 548–570. DOI: https://doi.org/10.1016/S0749-596X(02)00020-7
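The sense distinctions above come from a psycholinguistic study. As an illustrative aside that is not part of the original slides, a lexical resource such as WordNet draws similar, though not identical, distinctions; the following minimal sketch (assuming the nltk package and its wordnet data are installed) lists the noun senses WordNet records for paper.

```python
# Illustrative sketch (assumption): requires `pip install nltk` and nltk.download("wordnet").
from nltk.corpus import wordnet as wn

# List every noun sense (synset) WordNet records for "paper", with its gloss.
for synset in wn.synsets("paper", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())
```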
Recommended Literature
WSD Book: http://www.wsdbook.org
Papers:
- Navigli, R. 2009. Word Sense Disambiguation: A Survey. ACM Computing Surveys 41(2), Article 10. DOI: https://doi.org/10.1145/1459352.1459355
- Iacobacci, I., Pilehvar, M.T. and Navigli, R. 2016. Embeddings for Word Sense Disambiguation: An Evaluation Study. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Long Papers, 897–907. DOI: https://doi.org/10.18653/v1/p16-1085
Polysemy and Homonymy
- Sense vs. meaning, e.g., table (to put stuff on) and table (like in a spreadsheet): different meanings, but shared etymology → polysemous word
- Different interpretations of a homonym are referred to as meanings, resulting in different lexical entries
- Those of a polysemous word are referred to as senses
- → the ambiguity of words lies on a spectrum
- Homograph: different words, same spelling
- Homophone: different words, same sound
- Contronym (auto-antonym): ambiguous word with contradictory senses (e.g., dust)
How Senses Evolve
Example: iron
- Material: Much of the Erzberg consists of iron.
- Product: Voestalpine produces high-quality iron out of iron.
- Object: The electric clothing iron might not even be made out of iron.
The evolution of senses often follows a similar path (material → product).
Related Tasks
- Word Sense Disambiguation: the task of identifying, for a single instance (e.g., a word in a sentence), its correct sense; typically given a list of possible senses (closed class)
- Word Sense Induction and Disambiguation: the task of identifying the different senses of a word; typically without a pre-defined set of senses
Note: Related tasks are cross-lingual WSD, multi-lingual WSD, and entity disambiguation (where typically named entities are ambiguous, e.g., Aberdeen)
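To make the disambiguation task concrete, here is a minimal sketch of one classic dictionary-based method, the simplified Lesk algorithm as implemented in NLTK. It is only an illustration under the assumption that nltk and its wordnet data are available, not the approach prescribed by these slides.

```python
# Minimal WSD sketch (assumption): requires `pip install nltk` and nltk.download("wordnet").
from nltk.wsd import lesk

# The simplified Lesk algorithm picks the WordNet sense whose gloss
# overlaps most with the words in the surrounding context.
sentence = "I read the paper every morning before work"
sense = lesk(sentence.lower().split(), "paper", pos="n")
if sense is not None:
    print(sense.name(), "-", sense.definition())
```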
History
- WSD was first formulated in the late 1940s
- Weaver (1949): “[...] central word in question but also say N words on either side, then, [...] one can unambiguously decide the meaning”
- Zipf (1949): more frequent words have more senses than less frequent words, in a power-law relationship
- Acknowledgement of the hardness of the problem (1950s/60s)
- Bar-Hillel (1960): “no existing or imaginable program will enable an electronic computer to determine that the word pen is used in its ‘enclosure’ sense”
History
- WSD held back by the knowledge acquisition bottleneck (1970s): knowledge sources need to be hand-crafted
- Turning point for WSD in the 1980s: usage of large machine-readable resources, like the “Oxford Advanced Learner’s Dictionary of Current English” and “Roget’s International Thesaurus”
- Dictionary-based approaches to WSD
- Downside: not robust due to lack of coverage
History
- “Statistical revolution” in the 1980s/90s: application of statistical and machine learning approaches to WSD
- Evaluation initiatives emerged in the late 1990s / early 2000s, needed to be able to compare approaches
- Most prominently, the Senseval (and later SemEval) series
History
- “Deep learning revolution” in recent years
- Adoption of word embeddings; contextual word embeddings also imply (to a certain degree) WSD
- Utilisation of various neural network architectures for the task, e.g., LSTMs, CNNs
- End-to-end learning, i.e., WSD is also implicitly taken care of [1]
[1] Raganato, A., Delli Bovi, C. and Navigli, R. 2017. Neural Sequence Learning Models for Word Sense Disambiguation. Proceedings of EMNLP 2017, 1156–1167. DOI: https://doi.org/10.18653/v1/d17-1120
General Observations
... and starting points for solutions
How Hard is it?
Human performance
- Humans need just 2 words of context (on either side) to infer the sense, equivalent to having the whole sentence (Kaplan 1950)
- Caveat: even humans agree only to a certain degree (values as low as 85% have been reported)
Machine performance
- WSD has been considered to be AI-complete [1], since it requires knowledge (of the world)
[1] Mallery, J.C. 1988. Thinking About Foreign Policy: Finding an Appropriate Role for Artificial Intelligence Computers. Ph.D. dissertation, MIT Political Science Department, Cambridge, MA.
Language Specific
- Ambiguity prevails in many human languages and is not limited to the senses of words
- English: the 121 most frequent English nouns have on average 7.8 meanings each [1]
- Senses are not aligned between languages
- Senses also depend on the domain
- Senses come and go (diachronic)
[1] Ng, Hwee Tou and Hian Beng Lee. 1996. Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, California, 40–47.
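As a rough, hedged illustration of this kind of count (not a reproduction of the cited study, which used a different sense inventory and corpus), the sketch below estimates the average number of WordNet noun senses for the most frequent nouns in the Brown corpus, assuming nltk plus the brown and wordnet data are installed.

```python
# Rough sketch (assumption): requires `pip install nltk`,
# nltk.download("brown") and nltk.download("wordnet").
from collections import Counter
from nltk.corpus import brown, wordnet as wn

# Count noun occurrences via the Brown corpus part-of-speech tags.
noun_freq = Counter(
    w.lower() for w, tag in brown.tagged_words() if tag.startswith("NN") and w.isalpha()
)

# Average number of WordNet noun senses over the 121 most frequent nouns.
top_nouns = [w for w, _ in noun_freq.most_common(121)]
sense_counts = [len(wn.synsets(w, pos=wn.NOUN)) for w in top_nouns]
print("Average number of WordNet noun senses:", sum(sense_counts) / len(sense_counts))
```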
Cross-Lingual WSD
Apidianaki, M., Ljubešić, N. and Fišer, D. 2013. Cross-lingual WSD for Translation Extraction from Comparable Corpora. Proceedings of the Sixth Workshop on Building and Using Comparable Corpora.
Important Hypotheses
Semantics of words
- Distributional Hypothesis: first described by Harris in 1954; it states that words which tend to occur in similar contexts are semantically related. Firth described this intuition as “a word is characterised by the company it keeps”.
- Strong Contextual Hypothesis: proposed by Miller and Charles in 1991; it says that the more similar the contexts of two words are, the more semantically related the words are.
Note: Linguists also use the term context to refer to situational or social context (pragmatics).
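The following toy sketch, which is purely illustrative and assumes nothing beyond the Python standard library (corpus and target words are made up), hints at how the Strong Contextual Hypothesis can be operationalised: build bag-of-words context vectors for two words and compare them with cosine similarity.

```python
# Toy sketch of the Strong Contextual Hypothesis: words whose contexts are more
# similar are assumed to be more semantically related.
from collections import Counter
import math

corpus = [
    "i read the paper every morning",
    "she read the newspaper on the train",
    "the statue is made out of stone",
    "he read a newspaper after breakfast",
]

def context_vector(target, sentences, window=2):
    """Bag-of-words counts of tokens within +/- window positions of the target."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == target:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

paper = context_vector("paper", corpus)
newspaper = context_vector("newspaper", corpus)
stone = context_vector("stone", corpus)
print("paper vs. newspaper:", cosine(paper, newspaper))  # more similar contexts
print("paper vs. stone:", cosine(paper, stone))          # less similar contexts
```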