Roberto Navigli The MultiJEDI ERC Project: Multilingual Joint Word Sense Disambiguation
5 July 2016 – META-FORUM 2016 http://lcl.uniroma1.it
Simone Ponzetto, Tiziano Flati, Andrea Moro, Daniele Vannella, Taher Pilehvar, Francesco Cecconi, Federico Scozzafava, Alessandro Raganato, Ignacio Iacobacci, José Camacho Collados, Claudio Delli Bovi
11.07.16 The MultiJEDI ERC Project Roberto Navigli
You may say I'm a dreamer, but I am not the only one. I hope someday you'll join us. And the world will be as one!
A 5-year ERC Starting Grant (2011-2016)
http://multijedi.org
[Navigli & Ponzetto, ACL 2010; Pilehvar & Navigli, ACL 2014]
The resource diaspora
Multilingual Joint Word Sense Disambiguation (MultiJEDI)
Key Objective 1: create knowledge for all languages
WordNet, MultiWordNet, WOLF, MCR, GermaNet, BalkaNet
Merging entries from different resources into BabelNet
images, etc. from each of the merged resources
WordNet
What is BabelNet?
– WordNet: the most popular computational lexicon of English
– Open Multilingual WordNet: a collection of open wordnets
– WoNeF: a French WordNet
– Wikipedia: the largest collaborative encyclopedia
– Wikidata: the largest collaborative knowledge base
– Wiktionary: the largest collaborative dictionary
– OmegaWiki: a medium-size collaborative multilingual dictionary
– GeoNames: a worldwide geographical database
– Microsoft Terminology: a computer science thesaurus
– High-quality automatic sense-based translations
Why do we need BabelNet?
languages
– 6M concepts and 7.7M named entities
– 119M word senses
– 378M semantic relations (27 relations per concept on avg.)
– 11M images associated with concepts
– 41M textual definitions
– 2M concepts with domains associated
encyclopedic knowledge is semantically interconnected
with labeled relations, pictures, multilingual synsets
both concepts and named entities (Wikipedia Bitaxonomy)
– Ferrari Testarossa is-a sports car
– BabelNet is-a semantic network & encyclopedic dictionary
endpoint (2 billion triples); downloadable indices for research purposes
The core of the Linguistic Linked Open Data cloud!
What can we do with BabelNet?
WordNet-Wikipedia mapping accuracy
– On the 6000 lowest-confidence mappings
– Note: this concerns only 50k synsets in the intersection
Creating Datasets with BabelNet: all in one!
WordNet, Wikipedia, OmegaWiki, Open Multilingual WordNet, Wikidata and Wiktionary
Key fact!
BabelNet
[Moro, Raganato & Navigli, TACL 2014]
Motivation (1): hungry computers
Multilingual Joint Word Sense Disambiguation (MultiJEDI)
Key Objective 2: use all languages to disambiguate one
So what?
Word Sense Disambiguation (common nouns, verbs, adjectives) and Entity Linking together
Step 4: Select the most reliable meanings
“Thomas and Mario are strikers playing in Munich”
– Thomas: Thomas (novel), Seth Thomas, Thomas Müller
– Mario: Mario Gómez, Mario (Album), Mario (Character)
– strikers: Striker (Movie), Striker (Video Game), striker (Sport)
– Munich: Munich (City), FC Bayern Munich, Munich (Song)
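The selection step for the example sentence can be illustrated with a toy sketch. It is not Babelfy's actual algorithm: it assumes a small, hand-written set of semantic links between candidate meanings (invented here purely for illustration, not taken from BabelNet) and, for each mention, keeps the candidate that is linked to the most candidates of the other mentions.

```python
# Toy sketch of "select the most reliable meanings" for the example sentence.
# ASSUMPTION: the links below are hand-written for illustration only; they are
# not taken from BabelNet, and this is not Babelfy's actual algorithm.

CANDIDATES = {
    "Thomas": ["Thomas (novel)", "Seth Thomas", "Thomas Müller"],
    "Mario": ["Mario Gómez", "Mario (Album)", "Mario (Character)"],
    "strikers": ["Striker (Movie)", "Striker (Video Game)", "striker (Sport)"],
    "Munich": ["Munich (City)", "FC Bayern Munich", "Munich (Song)"],
}

# Undirected links between candidate meanings (hypothetical).
LINKS = {
    frozenset(pair)
    for pair in [
        ("Thomas Müller", "striker (Sport)"),
        ("Thomas Müller", "FC Bayern Munich"),
        ("Thomas Müller", "Mario Gómez"),
        ("Mario Gómez", "striker (Sport)"),
        ("Mario Gómez", "FC Bayern Munich"),
        ("striker (Sport)", "FC Bayern Munich"),
        ("Mario (Character)", "Striker (Video Game)"),
    ]
}


def score(candidate, mention, candidates, links):
    """Count links from `candidate` to candidates of the *other* mentions."""
    return sum(
        1
        for other_mention, others in candidates.items()
        if other_mention != mention
        for other in others
        if frozenset((candidate, other)) in links
    )


def disambiguate(candidates, links):
    """For each mention, keep the candidate with the highest link count."""
    return {
        mention: max(options, key=lambda c: score(c, mention, candidates, links))
        for mention, options in candidates.items()
    }


if __name__ == "__main__":
    for mention, meaning in disambiguate(CANDIDATES, LINKS).items():
        print(f"{mention} -> {meaning}")
```

With these assumed links, the football-related meanings reinforce one another, so they win over the isolated novel, album, movie and song readings.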
Experimental Results: Fine-grained (Multilingual) Disambiguation
Senseval-3, SemEval-2007 task 17, SemEval-2013 task 12
Experimental Results: KORE50, AIDA-CoNLL
Babelfy "understands" 'the mouse ate the cheese'!
WSD and Entity Linking together win!
The Crazy Polyglot!
Live demo (2) – Crazy polyglot!
EN In today's knowledge and information society
FR le paysage lexicographique est plus hétérogène que jamais.
IT Possono le risorse stand-alone competere
ES con múltiples funciones, portale lexicográficas multilingüe y servicios web,
ZH Web服,定制的喜好和个人用的个人料?
BabelNet 3.6 is now a knowledge base!
(superset of DBpedia) + relations extracted with Open Information Extraction techniques
SENSE AND CONCEPT REPRESENTATIONS
[Iacobacci et al., ACL 2015; Camacho-Collados et al., NAACL+ACL 2015]
Latent representation of word senses: SensEmbed
Iacobacci, Pilehvar, Navigli (ACL 2015)
Problem: word representations cannot capture polysemy
Our solution: distinct representation for each word’s meaning
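A minimal sketch of why one vector per meaning helps with polysemy. Everything in it is an assumption made for illustration: the 3-dimensional vectors are invented, the sense labels such as `bank#finance` are hypothetical, and comparing two words through their most similar pair of senses is just one simple strategy, not necessarily the one used in the project.

```python
import math

# ASSUMPTION: toy 3-dimensional sense vectors invented for illustration;
# real sense embeddings are learned from sense-annotated text.
SENSE_VECTORS = {
    "bank": {
        "bank#finance": [0.9, 0.1, 0.0],
        "bank#river": [0.0, 0.2, 0.9],
    },
    "money": {"money#currency": [0.8, 0.3, 0.1]},
    "shore": {"shore#land": [0.1, 0.1, 0.95]},
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_senses(word1, word2, vectors=SENSE_VECTORS):
    """Return (similarity, sense1, sense2) for the most similar sense pair."""
    return max(
        (cosine(v1, v2), s1, s2)
        for s1, v1 in vectors[word1].items()
        for s2, v2 in vectors[word2].items()
    )


if __name__ == "__main__":
    for other in ("money", "shore"):
        similarity, sense1, sense2 = closest_senses("bank", other)
        print(f"bank / {other}: {sense1} ~ {sense2} ({similarity:.2f})")
```

With a single vector for "bank", the financial and river readings would be blended; with one vector per sense, "money" matches the financial sense and "shore" the river sense.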
Embeddings + Semantic Knowledge = SensEmbed
with Babelfy with high precision, low recall
Explicit representation of concepts: NASARI
Camacho Collados, Pilehvar and Navigli NAACL 2015 + ACL 2015 + Artificial Intelligence Journal 2016
Motivation
NASARI: human-interpretable semantic vectors
– words
– Babel synsets (concepts and named entities)
semantic alignments and comparison of text
Semantic similarity: results
vectors:
Cross-lingual Word similarity: Results
Spearman (ρ) and Pearson (r) correlation performance of different systems on multilingual editions of the RG-65 dataset.
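For readers unfamiliar with the two measures, here is a minimal, dependency-free sketch of how Pearson (r) and Spearman (ρ) can be computed between human similarity ratings and a system's scores. The five ratings and scores below are hypothetical and are not RG-65 data; the function names are likewise chosen for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance normalised by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# ASSUMPTION: hypothetical gold ratings and system scores for five word pairs.
gold = [3.9, 3.1, 2.0, 1.2, 0.4]
system = [0.95, 0.60, 0.70, 0.30, 0.05]

if __name__ == "__main__":
    print(f"Pearson r  = {pearson(gold, system):.3f}")
    print(f"Spearman ρ = {spearman(gold, system):.3f}")
```

Pearson rewards a linear relationship between scores and ratings, while Spearman only checks that the system orders the word pairs the way the annotators did.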
Babelscape: bringing our multilingual technologies to the market
– BabelNet live
– Increase coverage
market
sustainable
for research purposes
Summarizing…
+ latent and explicit representations of meanings
+ sustainability plan for improving our systems over time
Thanks or…
(grazie)
Roberto Navigli
Linguistic Computing Laboratory http://lcl.uniroma1.it
@RNavigli