Predicting the relevance of distributional semantic similarity with - PowerPoint PPT Presentation

Predicting the relevance of distributional semantic similarity with contextual information
Philippe Muller, Cécile Fabre, Clémentine Adam
IRIT & CLLE, University of Toulouse
June 23rd, 2014
Supported by the Asfalda project (ANR-12-CORD-023)
Muller, Fabre, Adam · Contextual semantic similarity · June 23rd, 2014 · 1 / 15

The big picture
• distributional similarity hypothesis: lexical items that share usage contexts are semantically related
• the relations exhibited are either classical ones (synonymy, hypernymy) or indicative of topical cohesion
• but:
  • polysemy and context: relatedness depends on the context of use
  • validation is not obvious
Our contribution:
• context helps validate lexical associations
• the relevance of relatedness in a given context can be predicted
• contextual features play a role, not just the strength of a priori similarity

The distributional hypothesis
Similar contexts (graphical or syntactic):
Pair | Contexts | Relation
immortal, eternal | soul, love | synonymy
win, lose | game, time, bet | antonymy
apple, fruit | _ salad, harvest _ | hyponymy
literature, science | professor_of, teach_obj | ?
bottle, die | throw, noise | ?
Semantic relatedness of words is difficult to assess when the words are presented out of context.

Overview
• judging relatedness: context improves reliability
  • out-of-context human annotation
  • in-context human annotation
• experiment: predicting contextual similarity
  • data
  • setup / methods
  • results

Judging relatedness
• intrinsic relatedness: judging pairs without any context, e.g. are the items root and insect semantically related?
• contextual relatedness: judging pairs in a text where they appear together, i.e. is the relation between the items relevant for textual cohesion?
  While they are insectivores, hedgehogs are in practice omnivores. They can eat insects, roots, melon or squash.
We must check whether:
• context biases judgments towards relatedness
• annotators can agree on the relevance of item pairs

Judging relatedness
Setup:
• example texts from Wikipedia
• distributional similarity database: French distributional neighbours from http://redac.univ-tlse2.fr/applications/vdw.html, built from a 250M-word Wikipedia dump with Lin's similarity measure (Lin, 1998)
• linguist annotators are given 100 randomly selected pairs, with additional constraints:
  • out of context: pairs were selected with a minimum similarity score of 0.2 (the top 14% of all lexical pairs)
  • in context: no threshold on the similarity score, but the two items must appear in the same paragraph somewhere in the corpus
• two of the annotators (now "experts") then annotated a larger sample (≈ 2000 pairs)
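To illustrate the Lin (1998) measure behind the neighbour database, here is a minimal sketch. It assumes each word is described by (word, context) co-occurrence pairs weighted with positive pointwise mutual information; the context definition and weighting actually used for the vdw database may differ, and the toy pairs (borrowed from the distributional hypothesis slide) are invented for illustration.

```python
import math
from collections import Counter


def ppmi_weights(pairs):
    """Positive PMI weight I(w, f) for each observed (word, context) pair (assumed weighting)."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(f for _, f in pairs)
    total = len(pairs)
    weights = {}
    for (w, f), n in pair_counts.items():
        pmi = math.log(n * total / (word_counts[w] * ctx_counts[f]))
        if pmi > 0:  # Lin's measure only sums over positively associated features
            weights.setdefault(w, {})[f] = pmi
    return weights


def lin_similarity(weights, w1, w2):
    """Lin (1998): weight of the shared features over the total weight of both feature sets."""
    f1 = weights.get(w1, {})
    f2 = weights.get(w2, {})
    shared = f1.keys() & f2.keys()
    numerator = sum(f1[f] + f2[f] for f in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0


# Hypothetical toy co-occurrence pairs, not taken from the real corpus.
pairs = [
    ("immortal", "soul"), ("immortal", "love"),
    ("eternal", "soul"), ("eternal", "love"),
    ("win", "game"), ("lose", "game"),
]
weights = ppmi_weights(pairs)
print(lin_similarity(weights, "immortal", "eternal"))
print(lin_similarity(weights, "immortal", "win"))
```

Scores fall in [0, 1], which is what makes a fixed cut-off such as the 0.2 threshold above meaningful across pairs.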

Example: impala
[...] The impala's belly, like its lips and tail, is white. One should also mention their black lines, unique to each individual, at the tips of the ears (oreilles), on the back of the tail (queue) and on the forehead. These black lines are very useful to impalas, since they are signs that allow them to recognise one another. They also have scent-secreting glands on the hind legs and on the forehead. These scents also allow individuals to recognise one another. It also has black pads at the back of its legs (pattes). Male and female impalas have a different morphology. Indeed, a male can easily be distinguished by its S-shaped horns (cornes), which measure 40 to 90 cm in length. Impalas live in savannas where grass (herbe), short or medium, is abundant. Although they appreciate the proximity of a water source, it is generally not essential to impalas, since they can make do with the water contained in the grass (herbe) they eat. Their environment is relatively flat and consists only of grasses (herbes), bushes and a few trees. [...]
Target item: corne (horn), shown in blue. Candidates pending: yellow words (oreille/queue, ear/tail). Relevant: green words (pattes, legs). Not relevant: red words (herbe, grass).

Inter-annotator agreement
N_i: linguists; 2 × 100 pairs.
Annotators | Non-contextual agreement rate | Non-contextual kappa | Contextual agreement rate | Contextual kappa
N1+N2 | 77% | 0.52 | 91% | 0.66
N1+N3 | 70% | 0.36 | 92% | 0.69
N2+N3 | 79% | 0.50 | 92% | 0.69
Average | 75.3% | 0.46 | 91.7% | 0.68
Experts | NA | NA | 90.8% | 0.80
Experts: N1+N2, 2000 pairs.
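The slide does not say which kappa statistic was used. Assuming Cohen's kappa for a pair of annotators, the sketch below computes both the agreement rate and kappa from two label sequences. Kappa discounts the agreement expected by chance given each annotator's label distribution, so a high raw agreement rate can still yield a moderate kappa. The labels are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Observed agreement rate and Cohen's kappa (assumed variant) for two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators independently pick the same label.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined when chance agreement is total
    return observed, (observed - expected) / (1 - expected)


# Hypothetical judgments on four pairs.
a = ["relevant", "relevant", "not relevant", "not relevant"]
b = ["relevant", "not relevant", "not relevant", "not relevant"]
print(agreement_and_kappa(a, b))
```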

Predicting relatedness
• Data: the 2000 annotated pairs of lexical items appearing in a common paragraph (labelled either "relevant" or "not relevant")
• An imbalanced classification problem: only 11% of pairs are relevant
• Features:
  • a group of corpus frequencies of the target lexical items
  • a group of distributional association measures
  • a group of contextual information
• Baseline: Lin's score alone, with a relevance threshold determined on a sample of the instances
• Classifiers: Random Forest and Naive Bayes
• Class imbalance addressed with resampling (SMOTE) and cost-aware learning (MetaCost)
• Evaluation: 10-fold cross-validation
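SMOTE rebalances the data by creating synthetic minority-class examples (here, "relevant" pairs) through interpolation between a minority example and one of its nearest minority-class neighbours in feature space. The sketch below is a simplified pure-Python version under stated assumptions: numeric feature vectors, Euclidean distance, and base examples drawn at random rather than cycled through as in the original algorithm. The function name and parameters are illustrative, not those of the toolkit used in the talk.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch: interpolate between minority samples and their nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # assumption: random base sample
        base = minority[i]
        # k nearest neighbours of the base sample among the other minority samples
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(base, minority[j]))[:k]
        neighbour = minority[rng.choice(others)]
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-feature vectors for "relevant" pairs.
relevant = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(relevant, n_synthetic=6, k=2)
print(len(new_points))
```

When combined with cross-validation, resampling is normally applied to the training folds only, so that synthetic examples never appear in a test fold.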

Features: corpus
For each pair of lexical items (a, b) ("neighbours"), considering corpus frequencies:
Feature | Description
freq_min | min(freq_a, freq_b)
freq_max | max(freq_a, freq_b)
freq_× | log(freq_a × freq_b)
mi | $\mathrm{mi} = \log \frac{P(a, b)}{P(a) \cdot P(b)}$
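A minimal sketch of the corpus feature group for one pair (a, b). It assumes P(a), P(b) and P(a, b) are estimated as relative frequencies over the same total count N and that a co-occurrence count for the pair is available. The slide specifies neither the co-occurrence unit nor the logarithm base, so both are assumptions here (natural log).

```python
import math


def corpus_features(freq_a, freq_b, cooc_ab, total):
    """Corpus feature group for a pair (a, b); cooc_ab must be > 0 for mi to be defined."""
    p_a, p_b = freq_a / total, freq_b / total
    p_ab = cooc_ab / total  # assumed joint estimate from a co-occurrence count
    return {
        "freq_min": min(freq_a, freq_b),
        "freq_max": max(freq_a, freq_b),
        "freq_x": math.log(freq_a * freq_b),
        "mi": math.log(p_ab / (p_a * p_b)),  # pointwise mutual information
    }


# Hypothetical counts: a seen 100 times, b 50 times, together 10 times, N = 10,000.
print(corpus_features(freq_a=100, freq_b=50, cooc_ab=10, total=10_000))
```

With these counts, mi = log 20: the pair co-occurs 20 times more often than independence would predict.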

Features: distributional similarity
Given similarity scores, we can use the rankings of items similar to another one, and the productivity of items (the number of times they appear as similar to another item).
Feature | Description
lin | Lin's score
rank_min | min(rank_{a−b}, rank_{b−a})
rank_max | max(rank_{a−b}, rank_{b−a})
rank_× | log(rank_{a−b} × rank_{b−a})
prod_min | min(prod_a, prod_b)
prod_max | max(prod_a, prod_b)
prod_× | log(prod_a × prod_b)
cats | POS pair of the two neighbours (e.g. NN, AN, ...)
predarg | predicate or argument
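The rank and productivity features can be derived from the neighbour database alone. The sketch assumes the database is a hypothetical dict mapping each word to its neighbours sorted by decreasing Lin score, reads rank_{a−b} as the 1-based position of b in a's list, and counts productivity as the number of neighbour lists an item appears in. These readings of the slide's definitions are assumptions.

```python
import math


def rank(neighbours, a, b):
    """1-based position of b in a's similarity-sorted neighbour list (None if absent)."""
    try:
        return neighbours[a].index(b) + 1
    except (KeyError, ValueError):
        return None


def productivity(neighbours, item):
    """Number of neighbour lists in which the item appears (assumed definition)."""
    return sum(item in lst for lst in neighbours.values())


def similarity_features(neighbours, a, b):
    r_ab, r_ba = rank(neighbours, a, b), rank(neighbours, b, a)
    if r_ab is None or r_ba is None:
        raise ValueError("a and b must appear in each other's neighbour lists")
    p_a, p_b = productivity(neighbours, a), productivity(neighbours, b)
    return {
        "rank_min": min(r_ab, r_ba),
        "rank_max": max(r_ab, r_ba),
        "rank_x": math.log(r_ab * r_ba),
        "prod_min": min(p_a, p_b),
        "prod_max": max(p_a, p_b),
        "prod_x": math.log(p_a * p_b),
    }


# Hypothetical neighbour lists, sorted by decreasing similarity.
neighbours = {
    "horn": ["ear", "tail", "leg"],
    "leg": ["tail", "horn"],
    "tail": ["leg", "ear", "horn"],
    "ear": ["tail"],
}
print(similarity_features(neighbours, "horn", "leg"))
```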

Features: context
Given the set of occurrences of items a and b in the same text, use their frequencies in this text, the distances between occurrences, and related items (productivity within the text, connected components).
Feature | Description
freqtxt_min | min(freqtxt_a, freqtxt_b)
freqtxt_max | max(freqtxt_a, freqtxt_b)
freqtxt_× | log(freqtxt_a × freqtxt_b)
tf·ipf | tf·ipf(neighbour_a) × tf·ipf(neighbour_b)
copr_ph, copr_para | co-presence in a sentence, in a paragraph
sd, gd, ad | smallest, greatest and average distance between neighbour_a and neighbour_b
prodtxt_min, prodtxt_max | min(prod_a, prod_b), max(prod_a, prod_b)
prodtxt_× | log(prod_a × prod_b)
cc | a and b belong to the same lexical connected component
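Two of the contextual features can be sketched directly on a tokenized text: the distance features sd, gd and ad, and the connected-component feature cc. The sketch assumes distances are counted in tokens over every pair of occurrences of a and b, and that the lexical graph of a text links distributional neighbours that both occur in it. The token list and edges below are hypothetical.

```python
from collections import defaultdict, deque


def distance_features(tokens, a, b):
    """Smallest, greatest and average token distance over all occurrence pairs of a and b."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    dists = [abs(i - j) for i in pos_a for j in pos_b]
    if not dists:
        raise ValueError("both items must occur in the text")
    return {"sd": min(dists), "gd": max(dists), "ad": sum(dists) / len(dists)}


def same_component(edges, a, b):
    """cc feature: breadth-first search over the (assumed) in-text neighbour graph."""
    graph = defaultdict(set)
    for x, y in edges:
        graph[x].add(y)
        graph[y].add(x)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


tokens = "the male has horns and the ears have lines and the horns".split()
edges = [("horns", "ears"), ("ears", "tail"), ("legs", "grass")]
print(distance_features(tokens, "horns", "ears"))
print(same_component(edges, "horns", "tail"), same_component(edges, "horns", "grass"))
```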

Results
Method | Precision | Recall | F-score (± CI)
Baseline (Lin threshold) | 24.0 | 24.0 | 24.0
RF | 68.1 | 24.2 | 35.7 ± 3.4
NB | 34.8 | 51.3 | 41.5 ± 2.6
RF+resampling | 56.6 | 32.0 | 40.9 ± 3.3
NB+resampling | 32.8 | 54.0 | 40.7 ± 2.5
RF+cost aware learning | 40.4 | 54.3 | 46.3 ± 2.7
NB+cost aware learning | 27.3 | 61.5 | 37.8 ± 2.2
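The F-score column is the harmonic mean of precision and recall, F = 2PR / (P + R). Recomputing it from the reported precision and recall is a quick consistency check on the table; differences of about 0.1 point (for instance for NB+resampling) may come from rounding or from how scores were aggregated over folds.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) in %, as listed in the results table.
reported = {
    "Baseline (Lin threshold)": (24.0, 24.0, 24.0),
    "RF": (68.1, 24.2, 35.7),
    "NB": (34.8, 51.3, 41.5),
    "RF+resampling": (56.6, 32.0, 40.9),
    "NB+resampling": (32.8, 54.0, 40.7),
    "RF+cost aware learning": (40.4, 54.3, 46.3),
    "NB+cost aware learning": (27.3, 61.5, 37.8),
}
for method, (p, r, f) in reported.items():
    print(f"{method}: reported F = {f}, recomputed F = {f_score(p, r):.1f}")
```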

Feature impact
Features | Prec. | Recall | F-score
all | 40.4 | 54.3 | 46.3
all − corpus feat. | 37.4 | 52.8 | 43.8
all − similarity feat. | 36.1 | 49.5 | 41.8
all − contextual feat. | 36.5 | 54.8 | 43.8
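Read as differences from the full feature set, the ablation shows that removing the similarity features costs the most F-score (4.5 points), while removing either the corpus or the contextual features costs 2.5 points. The short computation below derives these drops from the table.

```python
# F-scores (%) from the feature-impact table.
ablation = {
    "all": 46.3,
    "all - corpus feat.": 43.8,
    "all - similarity feat.": 41.8,
    "all - contextual feat.": 43.8,
}


def f_drops(scores, reference="all"):
    """F-score lost when each feature group is removed, relative to the reference run."""
    base = scores[reference]
    return {name: round(base - f, 1) for name, f in scores.items() if name != reference}


for name, drop in f_drops(ablation).items():
    print(f"{name}: -{drop} F-score points")
```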

Take-away
• distributional similarity hypothesis: lexical items that share usage contexts are semantically related
• the relations exhibited are either classical ones (synonymy, hypernymy) or indicative of topical cohesion
• but:
  • polysemy and context: relatedness depends on the context of use
  • validation is not obvious
Our contribution:
• context helps validate lexical associations
• the relevance of relatedness in a given context can be predicted
• contextual features play a role, not just the strength of a priori similarity
