Looking at Word Meaning An interactive visualization of Semantic - PowerPoint PPT Presentation

Overview SVS types&tokens Data Visualization Conclusion References
Looking at Word Meaning
An interactive visualization of Semantic Vector Spaces for Dutch synsets
Kris Heylen, Dirk Speelman & Dirk Geeraerts
KULeuven Quantitative Lexicology and Variational Linguistics

Purpose of the talk
• Peek inside the black box of Vector Space Models of lexical semantics
  • through an interactive visualization of word uses
• Allow Computational Linguists to do a direct, intrinsic evaluation of their models and the semantics they capture
• Provide Lexicologists and Lexicographers with an explorative tool for analyzing word meaning in large corpora

Overview
1. Semantic Vector Spaces as models of word meaning
2. Type vs token-level vector spaces
3. Case study: Data and set-up
4. Visualization
5. Conclusion and future work

Semantic Vector Spaces as models of word meaning
Semantic Vector Spaces in Computational Linguistics
• standard technique in statistical NLP for the large-scale automatic modeling of (lexical) semantics
• also known as Vector Space Models, Distributional Semantic Models, Word Spaces, ... (see Turney & Pantel (2010) for an overview)
• intuitive rationale, but largely a black-box statistical technique
Linguistic origin: Distributional Hypothesis
• "You shall know a word by the company it keeps" (Firth, 1957)
• a word's meaning can be induced from its co-occurring words
• words appearing in similar contexts will have similar meanings

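Semantic Vector Spaces as models of word meaning
Practical
Which two words out of a set of three have the same meaning? ongeval, koffie, accident
Occurrences in context from a corpus
• Op de Brusselse ring deed zich een ongeval met een vrachtwagen voor (An "ongeval" involving a truck occurred on the Brussels ring road)
• 's Morgens drinkt hij een kop koffie met melk en suiker (In the morning he drinks a cup of "koffie" with milk and sugar)
• 2 bestuurders raakten gekwetst bij een ongeval met een vrachtwagen (2 drivers were injured in an "ongeval" with a truck)
• in de avondspits veroorzaakte een accident een kilometerslange file (during the evening rush hour an "accident" caused a traffic jam several kilometres long)
• als vieruurtje serveert het hotel koffie en gebak voor de gasten (as an afternoon snack the hotel serves "koffie" and cake to the guests)
• de auto was betrokken in een accident met een dodelijke afloop (the car was involved in an "accident" with a fatal outcome)
• Met winterbanden is het risico op een ongeval bij vriesweer veel kleiner (With winter tyres the risk of an "ongeval" in freezing weather is much smaller)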

Semantic Vector Spaces as models of word meaning
word by context co-occurrence matrix

            c1    c2    c3    c4    c5    c6    c7    c8
ongeval    120   424   388    82   270    11     3     1
accident   154   401   376    99   305    20     1     5
koffie       5     8    18     4     1    72   102    93

(c1 to c8 stand for the eight context words; their rotated column labels did not survive extraction.)

Semantic Vector Spaces as models of word meaning
word by word similarity matrix

            ongeval   accident   koffie
ongeval     1         .91        .08
accident    .91       1          .17
koffie      .08       .17        1
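The step from the co-occurrence matrix to the similarity matrix can be sketched as follows. This is a minimal illustration that assumes plain cosine similarity over the raw counts; the slides do not say which weighting or similarity measure produced the values shown, so the numbers printed here will not match them exactly. The function names are illustrative only.

```python
import math

# Raw co-occurrence counts copied from the word-by-context matrix slide.
counts = {
    "ongeval":  [120, 424, 388, 82, 270, 11, 3, 1],
    "accident": [154, 401, 376, 99, 305, 20, 1, 5],
    "koffie":   [5, 8, 18, 4, 1, 72, 102, 93],
}

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarities as a nested dict: sims[w1][w2]."""
    words = list(vectors)
    return {w1: {w2: cosine(vectors[w1], vectors[w2]) for w2 in words}
            for w1 in words}

sims = similarity_matrix(counts)
for word, row in sims.items():
    print(f"{word:<10}" + " ".join(f"{row[other]:.2f}" for other in sims))
```

Even without any weighting, ongeval and accident come out far closer to each other than either does to koffie, which is the pattern the similarity matrix illustrates.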

Vector Space Models of lexical semantics
Many different parameter settings
• context definition (document, window, dependency relations)
• weighting and similarity measures (PMI, cosine, Jaccard, ...)
• dimensionality reduction (SVD, LDA, NNMF, RI, ...)
• type vs token level; words vs relations
Wide variety of applications
• Psycholinguistic modeling of semantic memory
• Thesaurus extraction (WordNet)
• Lexical entailment, Query expansion
• Word sense disambiguation/induction
• Lexical variation between language varieties
• Historical studies of change in word meaning
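Of the weighting schemes listed under the parameter settings, PMI (pointwise mutual information) can be sketched on the ongeval/accident/koffie counts from the co-occurrence matrix slide. The sketch below uses the positive variant (negative values clipped to zero), which is a common choice but an assumption here; the slides only say "PMI". Probabilities are estimated from the three rows alone, so the values are purely illustrative.

```python
import math

# Raw co-occurrence counts copied from the word-by-context matrix slide.
counts = {
    "ongeval":  [120, 424, 388, 82, 270, 11, 3, 1],
    "accident": [154, 401, 376, 99, 305, 20, 1, 5],
    "koffie":   [5, 8, 18, 4, 1, 72, 102, 93],
}

def ppmi(table):
    """Positive PMI: max(0, log2(p(w, c) / (p(w) * p(c)))) for every cell."""
    total = sum(sum(row) for row in table.values())
    word_totals = {w: sum(row) for w, row in table.items()}
    n_contexts = len(next(iter(table.values())))
    context_totals = [sum(row[j] for row in table.values())
                      for j in range(n_contexts)]
    weighted = {}
    for word, row in table.items():
        weighted[word] = []
        for j, n in enumerate(row):
            if n == 0:
                # PMI is undefined for unseen pairs; treat them as 0.
                weighted[word].append(0.0)
                continue
            pmi = math.log2(n * total / (word_totals[word] * context_totals[j]))
            weighted[word].append(max(pmi, 0.0))
    return weighted

weights = ppmi(counts)
for word, row in weights.items():
    print(f"{word:<10}" + " ".join(f"{x:5.2f}" for x in row))
```

The weighting keeps only the contexts that co-occur with a word more often than chance would predict, so the last three columns dominate the koffie vector and drop to zero for ongeval.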

Vector Space Models of lexical semantics
Unclear relation between parameters and semantics
• Which semantic structure do SVS models capture and how?
• Task-based evaluations only assess a priori relations
• actual lexical-semantic structure is richer (Geeraerts, 2010)
• Appeal for an intrinsic evaluation (Baroni & Lenci, 2011)
SVSs have found little application in Linguistics proper
• Theoretical linguistics is becoming more data-driven
• Lexicologists (and lexicographers) try to describe semantic structure based on a large number of corpus occurrences
• SVSs can provide such a (preliminary) structure, but it needs to be accessible for linguists

Vector Space Models of lexical semantics
Potential win-win solution for both problems
⇒ An intuitive visualization of the SVS output matrix
Benefits:
• For Computational Linguistics: making SVSs accessible for an evaluation by lexical semantic experts that goes beyond the pre-defined semantic relations of task-based evaluation
• For Lexicology: a tool for exploring and analyzing word meaning in large amounts of corpus data that, unlike traditional concordances, have some preliminary structure

Type vs token-level SVS
SVSs can model lexical semantics on two levels:
1. the type level: aggregating over all occurrences of a word, giving a representation of a word's general semantics (e.g. Thesaurus extraction)
2. the token level: representing the semantics of each individual occurrence of a word (e.g. WSD)
Lexicological studies typically take a set of types and analyze how they 'carve up' semantic space by looking at their tokens.
We use a type-level SVS for finding synsets and a token-level space for modeling the tokens within each synset.

Type vs token-level vector spaces
Token vector approach of Schütze (1998):
Token vector = average of the context words' type vectors
While walking to work, the teacher saw a barking dog chasing a cat

            c1     c2     c3     c4     c5     c6     c7     c8
walk        4.7    2.3    2.4    0.2    1.9    0.1    0      0
work        1.2    4.9    3.2    0      0.1    2.3    0.1    0
teacher     0.3    1.3    0.8    0      1.2    4.3    0.5    0.1
see         0.2    0.4    1.2    0.7    0.9    0.8    0.7    0.1
bark        0.3    0.2    1.9    1.8    2.1    1.8    0.7    2.1
chase       2.8    1      2.1    3.1    2.2    1.1    0.9    0.8
cat         1.1    0.9    2.3    1.9    3.9    0.5    2.8    4.6
AVERAGE     1.51   1.57   1.99   1.10   1.76   1.56   0.81   1.10

(c1 to c8 stand for the eight dimensions of the type vectors; their rotated column labels did not survive extraction.)
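The averaging step can be sketched directly on the type vectors from the table. The token being represented is presumably "dog", the one content word of the example sentence that has no row of its own; that reading, like the function name, is an assumption rather than something the slide states.

```python
# Type vectors of the context words, copied from the table above.
# The eight dimensions are identified by position only.
type_vectors = {
    "walk":    [4.7, 2.3, 2.4, 0.2, 1.9, 0.1, 0.0, 0.0],
    "work":    [1.2, 4.9, 3.2, 0.0, 0.1, 2.3, 0.1, 0.0],
    "teacher": [0.3, 1.3, 0.8, 0.0, 1.2, 4.3, 0.5, 0.1],
    "see":     [0.2, 0.4, 1.2, 0.7, 0.9, 0.8, 0.7, 0.1],
    "bark":    [0.3, 0.2, 1.9, 1.8, 2.1, 1.8, 0.7, 2.1],
    "chase":   [2.8, 1.0, 2.1, 3.1, 2.2, 1.1, 0.9, 0.8],
    "cat":     [1.1, 0.9, 2.3, 1.9, 3.9, 0.5, 2.8, 4.6],
}

def average_token_vector(context_words, vectors):
    """Unweighted mean of the context words' type vectors, per dimension."""
    rows = [vectors[word] for word in context_words]
    return [sum(column) / len(rows) for column in zip(*rows)]

token = average_token_vector(list(type_vectors), type_vectors)
print([round(x, 2) for x in token])
```

Rounded to two decimals, the result agrees with the AVERAGE row of the table.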

Type vs token-level vector spaces
Our modified approach:
Token vector = weighted average of the context words' type vectors, with the PMI values between the target type and each context word as weights

          WEIGHT   c1     c2     c3     c4     c5     c6     c7     c8
walk      1.1      4.7    2.3    2.4    0.2    1.9    0.1    0      0
work      0.2      1.2    4.9    3.2    0      0.1    2.3    0.1    0
see       0.1      0.2    0.4    1.2    0.7    0.9    0.8    0.7    0.1
bark      3.1      0.3    0.2    1.9    1.8    2.1    1.8    0.7    2.1
chase     2.7      2.8    1      2.1    3.1    2.2    1.1    0.9    0.8
cat       2.1      1.1    0.9    2.3    1.9    3.9    0.5    2.8    4.6
w.Av.              1.73   0.95   2.11   1.94   2.44   1.14   1.13   1.95

(c1 to c8 stand for the eight dimensions of the type vectors; their rotated column labels did not survive extraction.)
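The weighted variant can be sketched in the same way. One caveat: the weighted table lists no "teacher" row, yet its w.Av. values are reproduced to two decimals only if that context word is included with its type vector from the unweighted table and a weight of 0.1. That weight is inferred from the arithmetic and is not a figure taken from the slide; all other weights and vectors are copied from the table. The function name is illustrative.

```python
# (weight, type vector) per context word, copied from the table above.
context = {
    "walk":    (1.1, [4.7, 2.3, 2.4, 0.2, 1.9, 0.1, 0.0, 0.0]),
    "work":    (0.2, [1.2, 4.9, 3.2, 0.0, 0.1, 2.3, 0.1, 0.0]),
    # ASSUMPTION: row absent from the weighted table; the vector comes from
    # the unweighted table and the weight 0.1 is inferred, not quoted.
    "teacher": (0.1, [0.3, 1.3, 0.8, 0.0, 1.2, 4.3, 0.5, 0.1]),
    "see":     (0.1, [0.2, 0.4, 1.2, 0.7, 0.9, 0.8, 0.7, 0.1]),
    "bark":    (3.1, [0.3, 0.2, 1.9, 1.8, 2.1, 1.8, 0.7, 2.1]),
    "chase":   (2.7, [2.8, 1.0, 2.1, 3.1, 2.2, 1.1, 0.9, 0.8]),
    "cat":     (2.1, [1.1, 0.9, 2.3, 1.9, 3.9, 0.5, 2.8, 4.6]),
}

def weighted_token_vector(weighted_rows):
    """Weighted mean of type vectors: sum(w * v) / sum(w), per dimension."""
    total_weight = sum(weight for weight, _ in weighted_rows)
    n_dims = len(weighted_rows[0][1])
    return [
        sum(weight * vector[j] for weight, vector in weighted_rows) / total_weight
        for j in range(n_dims)
    ]

weighted_token = weighted_token_vector(list(context.values()))
print([round(x, 2) for x in weighted_token])
```

Compared with the unweighted average, the context words with high weights (bark, chase, cat) dominate the token vector, while weakly associated words such as work and see contribute very little.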

Case study: Data and set-up
Corpus
• Dutch newspaper materials from 1999 to 2005
• stratified for Netherlandic (500M) and Belgian Dutch (1.3G)
• automatically lemmatized, POS tagged and parsed with Alpino (van Noord, 2006)
Dutch synsets
• 218 synsets containing 476 nouns (Ruette et al., 2012)
• dependency-based type-level SVS (Padó & Lapata, 2007)
• clustered with Clustering by Committee (Pantel & Lin, 2002)

Case study: Data and set-up

Concept            Nouns in synset
Infringement       inbreuk, overtreding
Genocide           volkerenmoord, genocide
Poll               peiling, opiniepeiling, rondvraag
Marihuana          cannabis, marihuana
Coup               staatsgreep, coup
Meningitis         hersenvliesontsteking, meningitis
Demonstrator       demonstrant, betoger
Airport            vliegveld, luchthaven
Collision          aanrijding, botsing
Computer screen    computerscherm, beeldscherm, monitor

Table: Dutch synsets (sample)
