SLIDE 1 Cross-Level Semantic Similarity
MultiJEDI ERC 259234
SemEval-2014 Task 3
SLIDE 2-3
Semantic Similarity
Prior work has mostly focused on comparing lexical items of the same type (e.g., word with word, sentence with sentence)
SLIDE 4
Semantic Similarity
What if we have different types of inputs?
SLIDE 5-6 CLSS: Cross-Level Semantic Similarity
- A new type of similarity task
SLIDE 7-10 CLSS: Comparison Types
- Paragraph to Sentence
- Sentence to Phrase
- Phrase to Word
- Word to Sense
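To make the four comparison types concrete, here is a minimal sketch of how scored pairs could be represented; every text, sense identifier, and gold score below is invented for illustration and is not actual task data:

# Hypothetical examples only; texts, sense key, and scores are invented.
pairs = [
    {"type": "paragraph2sentence",
     "item1": "A multi-sentence paragraph describing a coastal storm ...",
     "item2": "A storm hit the coast last night.",
     "gold": 3.0},
    {"type": "sentence2phrase",
     "item1": "The committee rejected the proposal outright.",
     "item2": "turn down a plan",
     "gold": 3.5},
    {"type": "phrase2word",
     "item1": "kick the bucket",
     "item2": "die",
     "gold": 4.0},
    {"type": "word2sense",
     "item1": "bank",
     "item2": "bank#n#1",  # WordNet-style sense identifier, for illustration
     "gold": 2.0},
]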
SLIDE 11 Task Data
Training set: 500 pairs per comparison type
Test set: 500 pairs per comparison type
4000 pairs in total
SLIDE 12 Task Data
A wide range of domains and text styles
SLIDE 13-16 Word to Sense
word-to-sense pairs
SLIDE 17
Rating Scale
Pairs are rated on a 0-4 scale: 4 very similar, 3 somewhat similar, 2 somewhat related but not similar, 1 slightly related, 0 unrelated
SLIDE 18-27 Crafting an idealized similarity distribution
Figure: each text on the larger side (e.g., a paragraph) is paired with smaller-side texts written to target specific rating levels (labeled 2, 4, 1, 3 in the example), so that all rating levels are represented
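Reading the figure that way, the construction protocol can be sketched as follows; this is an assumption based on the slide, not a quote of the authors' exact procedure:

def smaller_side_targets(larger_item, levels=(1, 2, 3, 4)):
    # For one larger-side text, request one smaller-side text aimed at each
    # rating level, so every level is covered and the resulting gold
    # distribution is near-uniform by construction.
    return [(larger_item, level) for level in levels]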
SLIDE 28 Test and Training data IAA
Chart: Krippendorff's α per comparison type (Paragraph-Sentence, Sentence-Phrase, Phrase-Word, Word-Sense), shown for Training (all), Training (unadjudicated), Test (all), and Test (unadjudicated)
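For reference, a minimal sketch of interval-scale Krippendorff's α, the agreement measure in the chart, assuming a raters-by-items matrix with None for missing ratings; this is a generic textbook implementation, not the task's evaluation code:

import itertools

def krippendorff_alpha_interval(ratings):
    # ratings[r][i]: rater r's score for item i, or None if missing.
    # Keep only items rated by at least two raters (pairable values).
    units = []
    for i in range(len(ratings[0])):
        vals = [row[i] for row in ratings if row[i] is not None]
        if len(vals) >= 2:
            units.append(vals)
    n = sum(len(u) for u in units)  # number of pairable values

    # Observed disagreement: squared differences within each unit.
    d_o = sum(
        2 * sum((a - b) ** 2 for a, b in itertools.combinations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: squared differences across all pairable values.
    all_vals = [v for u in units for v in u]
    d_e = 2 * sum((a - b) ** 2
                  for a, b in itertools.combinations(all_vals, 2)) / (n * (n - 1))

    if d_e == 0:        # degenerate case: all observed values identical
        return 1.0
    return 1.0 - d_o / d_e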
SLIDE 29
The annotation procedure produces a balanced rating distribution
SLIDE 30-31 Experimental Setup
Example items: "The brown fox was quick", "The quick brown fox", "The brown foxes were quick"
Baselines: Longest Common Substring (LCS) and Greedy String Tiling (GST)
Evaluation Measure: Pearson correlation with the gold ratings (Spearman's rank correlation also reported)
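A minimal sketch of the LCS baseline idea on the example sentences above; the character-level dynamic program is standard, but normalizing by the longer string's length is an assumption here, not necessarily the task's exact formula:

def lcs_similarity(text1, text2):
    # Longest common substring via dynamic programming over characters.
    s, t = text1.lower(), text2.lower()
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    longer = max(len(s), len(t))
    return best / longer if longer else 0.0

# e.g. lcs_similarity("The brown fox was quick", "The quick brown fox")
# shares the substring " brown fox" and scores ~0.43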
SLIDE 32
Number of participants
Chart: number of participating systems per comparison type (Paragraph-Sentence, Sentence-Phrase, Phrase-Word, Word-Sense)
SLIDE 33-34 Top 5 Systems and Baselines
Chart: the top five systems (Meerkat Mafia pw*, SimCompass run1, ECNU run1, UNAL-NLP run2, SemantiKLUE run1), the GST and LCS baselines, and Gold, for each comparison type (paragraph-sentence, sentence-phrase, phrase-word, word-sense)
SLIDE 35-37 Where do the baselines stand?
Chart: the same top five systems and the GST and LCS baselines, per comparison type
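The rankings behind these charts correlate each system's scores with the gold ratings; a minimal sketch of Pearson's r, the task's official measure:

import math

def pearson(system_scores, gold_ratings):
    # Pearson correlation between a system's scores and the gold ratings.
    n = len(system_scores)
    mx = sum(system_scores) / n
    my = sum(gold_ratings) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(system_scores, gold_ratings))
    sx = math.sqrt(sum((x - mx) ** 2 for x in system_scores))
    sy = math.sqrt(sum((y - my) ** 2 for y in gold_ratings))
    return cov / (sx * sy)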
SLIDE 38-40
Correlation per genre
Chart: per-genre correlation for paragraph-to-sentence pairs
SLIDE 41-42
Correlation per genre
Chart: per-genre correlation for phrase-to-word pairs
SLIDE 43
What makes the task difficult?
SLIDE 44
Handling OOV words and novel usages
SLIDE 45
Dealing with social media text
SLIDE 46 CLSS: Cross-Level Semantic Similarity
- Similarity between different types of lexical items
- High-quality dataset: 4000 pairs across four comparison types
- 38 systems from 19 teams
SLIDE 47 Thank you!
David Jurgens Mohammad Taher Pilehvar Roberto Navigli
MultiJEDI ERC 259234