Lexical Semantics: Similarity Measures and Clustering Friday Monday - PowerPoint PPT Presentation

Beyond Dead Parrots Automatically constricted clusters of semantically similar words (Charniak, 1997): Lexical Semantics: Similarity Measures and Clustering Friday Monday Thursday Wednesday Tuesday Saturday Sunday People guys folks fellows CEOs commies blocks water gas cola liquid acid carbon steam shale that the theat head body hands eyes voice arm seat eye hair mouth Today: Semantic Similarity State-of-the-art Methods Closest words for ? anthropology 0.275881, sociology 0.247909, comparative lit- This parrot is no more! erature 0.245912, computer science 0.220663, political sci- It has ceased to be! ence 0.219948, zoology 0.210283, biochemistry 0.197723, It’s expired and gone to meet its maker! mechanical engineering 0.191549, biology 0.189167, crim- inology 0.178423, social science 0.176762, psychology This is a late parrot! 0.171797, astronomy 0.16531, neuroscience 0.163764, psy- This. . . is an EX-PARROT! chiatry 0.163098, geology 0.158567, archaeology 0.157911, mathematics 0.157138

Motivation Learning Similarity from Corpora Smoothing for statistical language models • You shall know a word by the company it keeps (Firth 1957) • Two alternative guesses of speech recognizer: What is tizguino? (Nida, 1975) For breakfast, she ate durian. For breakfast, she ate Dorian. A bottle of tizguino is on the table. • Our corpus contains neither “ate durian” nor “ate Tizguino makes you drunk. Dorian” We make tizguino out of corn. • But, our corpus contains “ate orange”, “ate banana” Motivation Learning Similarity from Corpora Aid for Question-Answering and Information Retrieval CAT • Task: “Find documents about women astronauts” cute smart dirty • Problem: some documents use paraphrase of DOG astronaut In the history of Soviet/Russian space exploration, there cute smart dirty have only been three Russian women cosmonauts: PIG Valentina Tereshkova, Svetlana Savitskaya, and Elena Kondakova. cute smart dirty

Outline Example 1: Next Word Representation Brown et al. (1992) • Vector-space representation and similarity computation • C ( x ) denotes the vector of properties of x (“context” – Similarity-based Methods for LM of x) • Hierarchical clustering • Assume alphabet of size K : w 1 , . . . , w K – Name Tagging with Word Clusters • C ( w ) = � #( w 1 ) , #( w 2 ) , . . . , #( w K ) � , where #( w i ) is the number of times w i followed w in the corpus • Computing semantic similarity using WordNet Learning Similarity from Corpora Example 2: Syntax-Based Representation • The vector C ( n ) for a noun n is the distribution of verbs for which it served as direct object • Select important distributional properties of a word • Assume (verb) alphabet of size K : v 1 , . . . , v K • Create a vector of length n for each word to be • C ( n ) = � P ( v 1 | n ) , P ( v 2 | n ) , . . . , P ( v K | n ) � , where classified P ( v i | n ) is the probability that v is a verb for which n • Viewing the n -dimensional vector as a point in an serves as a direct object n -dimensional space, cluster points that are near • Representation can be expanded to account for one another additional syntactic relations (subject, object, indirect-object)

Vector Space Model Similarity Measure: Cosine � n i =1 x i y i x ∗ � � y Cosine cos ( � x, � y ) = y | = | � x || � �� n i =1 x 2 �� n i =1 y 2 cosmonaut astronaut moon car truck Each word is represented as a vector � x = ( x 1 , x 2 , . . . , x n ) Soviet 1 0 0 1 1 American 0 1 0 1 1 man woman grape spacewalking 1 1 0 0 0 orange red 0 0 0 1 1 apple full 0 0 1 0 0 old 0 0 0 1 1 1 ∗ 0+0 ∗ 1+1 ∗ 1+0 ∗ 0+0 ∗ 0+0 ∗ 0 cos ( cosm, astr ) = � 12+02+12+02+02+02 � 02+12+12+02+02+02 Similarity Measure: Euclidean Outline �� n Euclidean | � x, � y | = | � x − � y | = i =1 ( x i − y i ) 2 cosmonaut astronaut moon car truck • Vector-space representation and similarity Soviet 1 0 0 1 1 computation American 0 1 0 1 1 – Similarity-based Methods for LM spacewalking 1 1 0 0 0 • Hierarchical clustering red 0 0 0 1 1 – Name Tagging with Word Clusters full 0 0 1 0 0 • Computing semantic similarity using WordNet old 0 0 0 1 1 euclidian ( cosm, astr ) = � (1 − 0)2 + (0 − 1)2 + (1 − 1)2 + (0 − 0)2 + (0 − 0)2 + (0 − 0)2

Discounting Smoothing for Language Modeling • Task: estimate the probability of unseen word pairs � • Possible approaches: P d ( w 2 | w 1 ) c ( w 1 , w 2 ) > 0 ˆ P ( w 2 | w 1 ) = – Katz back-off scheme — utilize unigram α ( w 1 ) P r ( w 2 | w 1 ) otherwise estimates P d Good-Turing discounted estimate – Class-based methods — utilize average α ( w 1 ) normalization factor co-occurrence probabilities of the classes to the model for probability redistribution among unseen words P r which the two words belong – Similarity-based methods Similarity-based Methods for LM Combining Evidence (Dagan, Lee & Pereira, 1997) • Idea: Assumption: if word w ′ 1 is “similar” to word w 1 , 1. combine estimates for the words most similar to a then w ′ word w 1 can yield information about the probability 2. weight the evidence provided by word w ′ by a of unseen word pairs involving w 1 function of its similarity to w S ( w 1 ) — the set of words most similar to w 1 • Implementation: W ( w 1 , w ′ 1 ) — similarity function – a scheme for deciding which word pairs require a W ( w 1 ,w ′ 1 ) P ( w 2 | w ′ P sim ( w 2 | w 1 ) = � 1 ) w ′ 1 ∈ S ( w 1 ) N ( w 1 ) similarity-based estimate 1 ∈ S ( w 1 ) W ( w 1 , w ′ N ( w 1 ) = � 1 ) w ′ – a method for combining information from similar words – a function measuring similarity between words

Combining Evidence (cont.) Other Probabilistic Dissimilarity Measures • Information Radius: How to define S ( w 1 ) ? Possible options: IRad( p, q ) = D ( p || p + q ) + D ( q || p + q ) • S ( w 1 ) = V 2 2 – Symmetric • S ( w 1 ) : the closest k or fewer words w ′ 1 such that – Well-defined if either q i > 0 or p i > 0 dissimilarity between w 1 and w ′ 1 is less than a threshold value t • L 1 norm: � L 1 ( p, q ) = | p i − q i | Redistribution model: i – Symmetric P r ( w 2 | w 1 ) = P sim ( w 2 | w 1 ) – Well-defined for arbitrary p and q Kullback Leibler Divergence Evaluation Task: Word Disambiguation • Definition: The KL Divergence D ( p || q ) measures how • Task: Given a noun and two verbs, decide which much information is lost if we assume distribution q when verb is more likely to have this noun as a direct the true distribution is p object p i log p i � D ( p || q ) = P ( plans | make ) vs. P ( plans | take ) q i i P ( action | make ) vs. P ( action | take ) • Properties: • Construction of candidate verb pairs: – Non-negative – generate verb-noun pairs on the test set – D ( p || q ) = 0 iff p = q – select pairs of verbs with similar frequency – Not symmetric and doesn’t satisfy triangle inequality – remove all the pairs seen in the training set – If q i = 0 and p i > 0 , then D ( p || q ) gets infinite value

Evaluation Setup Automatic Thesaurus Construction • Performance metric http://www.cs.ualberta.ca/˜lindek/demos/depsimdoc.htm (# of incorrect choices) + (# of ties) / 2 N Closest words for president N is the size of the test corpus • Data: leader 0.264431, minister 0.251936, vice president 0.238359, – 44m words of 1998 AP newswire Clinton 0.238222, chairman 0.207511, government 0.206842, Governor 0.193404, official 0.191428, Premier 0.177853, – select 1000 most frequent nouns and their Yeltsin 0.173577, member 0.173468, foreign minister corresponding verbs 0.171829, Mayor 0.168488, head of state 0.167166, chief – Training: 587833 pairs, Testing: 17152 pairs 0.164998, Ambassador 0.162118, Speaker 0.161698, General • Baseline: Maximum Likelihood Estimator 0.159422, secretary 0.156158, chief executive 0.15158 – Error rate: 0.5 Performance of Similarity-Based Methods Problems with Corpus-based Similarity Methods Error rate Katz 0.51 • Low-frequency words skew the results MLE 0.50 – “breast-undergoing”, “childhood-phychosis”, RandMLE 0.47 “outflow-infundibulum” L 1 MLE 0.27 • Semantic similarity does not imply synonymy IRadMLE 0.26 – “large-small”, “heavy-light”, “shallow-coastal” • RandMLE — Randomized combination of weights • Distributional information may not be sufficient for true semantic grouping • L 1 MLE — Similarity function based on L 1 • IRadMLE — Similarity function based on IRad

Outline Bottom-Up Hierarchical Clustering Given: a set X = { x 1 , . . . , x n } of objects a similarity function sim for i := 1 to n do • Vector-space representation and similarity c i := x i computation C := { c 1 , . . . , c n } – Similarity-based Methods for LM j := n + 1 while | C | > 1 • Hierarchical clustering ( c n 1 , c n 2 ) := argmax ( c u ,c v ) ∈ C × C sim ( c u , c v ) – Name Tagging with Word Clusters c j := c n 1 ∪ c n 2 • Computing semantic similarity using WordNet C := ( C − { c n 1 , c n 2 } ) ∪ { c j } j := j + 1 Hierarchical Clustering Agglomerative Clustering B E D C Greedy, bottom-up version: A 0.1 0.2 0.2 0.8 • Initialization: Create a separate cluster for each object B 0.1 0.1 0.2 • Each iteration: Find two most similar clusters and merge C 0.7 them 0.0 • Termination: All the objects are in the same cluster D 0.6 C D E A B

Lexical Semantics: Similarity Measures and Clustering Friday Monday - PowerPoint PPT Presentation

Beyond Dead Parrots Automatically constricted clusters of semantically similar words (Charniak, 1997): Lexical Semantics: Similarity Measures and Clustering Friday Monday Thursday Wednesday Tuesday Saturday Sunday People guys folks fellows

LEXICAL SEMANTICS LEXICAL SEMANTICS CS 224N 2011 Gerald Penn Slides largely adapted from

Heterogeneous Lexical Resources MultiJEDI ERC 259234 Lexical Resource Lexical Resource Lexical

Semantic Similarity MultiJEDI ERC 259234 Semantic Similarity Semantic Similarity Mostly

Some Clustering Methods on Some Clustering Methods on Some Clustering Methods on Dissimilarity

LEXICAL TYPOLOGY Peter Koch (Part I) Koch, Lexical typology, 2010-8-24 A. General introduction

Compilers Lexical Analysis Alex Aiken Lexical Analysis 1. Lexical Analysis 2. Parsing 3.

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Semantics and Pragmatics of NLP Lexical Semantics: Polysemy Alex Lascarides School of

Lexical Semantics (Following slides are modified from Prof. Claire Cardies slides.)

Semantics 1 / 21 Outline What is semantics? Denotational semantics Semantics of naming What

Lexical Semantics Martin Rajman & Jean-Cdric Chappelier Overview Basic concepts

Align, Disambiguate, and Walk A Unified Approach for Measuring Semantic Similarity Semantic

Similarity and clustering Dr. Ahmed Rafea Outline Motivation Clustering: An Overview

Evolutionary Clustering Presenter: Lei Tang Evolutionary Clustering Evolutionary Clustering

Clustering A Categorization of Major Clustering Methods Partitioning Methods

Authors BioNB 4240 Discussion: Aug. 31, 2011 Carl D Hopkins Carl D. Hopkins Review of Jon H Kaas

Reading derived words by Italian children with and without dyslexia: The effect of root length

How to Manage Employee Leave Requests and Control Absence Abuse May 2018 Jaime Lizotte Shanna

Autism Autism Autism Autism Affects 1 in 166 children More common in boys than girls

Attendance Matters: How Expanded Learning Opportunities Keep Kids In School October 25, 2016

How to Stop Excessive Absenteeism from Undermining Your Business June 2019 Presented by:

Using Attendance Data for Decisionmaking: S trategies for S tate and Local Education Agencies

Group 2: Absenteeism,Withdrawal Behaviors Chi Chuong, Lea Finato, Jordan Mosset, Stephanie

Lexical Semantics: Similarity Measures and Clustering Friday Monday - PowerPoint PPT Presentation

Beyond Dead Parrots Automatically constricted clusters of semantically similar words (Charniak, 1997): Lexical Semantics: Similarity Measures and Clustering Friday Monday Thursday Wednesday Tuesday Saturday Sunday People guys folks fellows

LEXICAL SEMANTICS LEXICAL SEMANTICS CS 224N 2011 Gerald Penn Slides largely adapted from

Heterogeneous Lexical Resources MultiJEDI ERC 259234 Lexical Resource Lexical Resource Lexical

Semantic Similarity MultiJEDI ERC 259234 Semantic Similarity Semantic Similarity Mostly

Some Clustering Methods on Some Clustering Methods on Some Clustering Methods on Dissimilarity

LEXICAL TYPOLOGY Peter Koch (Part I) Koch, Lexical typology, 2010-8-24 A. General introduction

Compilers Lexical Analysis Alex Aiken Lexical Analysis 1. Lexical Analysis 2. Parsing 3.

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Semantics and Pragmatics of NLP Lexical Semantics: Polysemy Alex Lascarides School of

Lexical Semantics (Following slides are modified from Prof. Claire Cardies slides.)

Semantics 1 / 21 Outline What is semantics? Denotational semantics Semantics of naming What

Lexical Semantics Martin Rajman &amp; Jean-Cdric Chappelier Overview Basic concepts

Align, Disambiguate, and Walk A Unified Approach for Measuring Semantic Similarity Semantic

Similarity and clustering Dr. Ahmed Rafea Outline Motivation Clustering: An Overview

Evolutionary Clustering Presenter: Lei Tang Evolutionary Clustering Evolutionary Clustering

Clustering A Categorization of Major Clustering Methods Partitioning Methods

Authors BioNB 4240 Discussion: Aug. 31, 2011 Carl D Hopkins Carl D. Hopkins Review of Jon H Kaas

Reading derived words by Italian children with and without dyslexia: The effect of root length

How to Manage Employee Leave Requests and Control Absence Abuse May 2018 Jaime Lizotte Shanna

Autism Autism Autism Autism Affects 1 in 166 children More common in boys than girls

Attendance Matters: How Expanded Learning Opportunities Keep Kids In School October 25, 2016

How to Stop Excessive Absenteeism from Undermining Your Business June 2019 Presented by:

Using Attendance Data for Decisionmaking: S trategies for S tate and Local Education Agencies

Group 2: Absenteeism,Withdrawal Behaviors Chi Chuong, Lea Finato, Jordan Mosset, Stephanie

Lexical Semantics Martin Rajman & Jean-Cdric Chappelier Overview Basic concepts