Named Entity WordNet

*Antonio Toral, ^Rafael Muñoz, *Monica Monachini
*Istituto di Linguistica Computazionale (Pisa, Italy)
^University of Alicante (Spain)
LREC 2008, O12 - Named Entity Recognition, Marrakech, 2008-05-28

Outline
- Intro
  - Named Entities (NEs)
  - Language Resources (LRs)
  - Why NEs in LRs?
  - How to enrich LRs with NEs?
- Named Entity WordNet
  - Mapping & Disambiguation
  - Article extraction
  - NE identification
  - NE repository
- Conclusions & Future

NEs
- Usually refer to:
  - Proper nouns: names of people, locations, organizations, ...
  - Numerical expressions: time, amounts, ...
- Important for NLP tasks
  - NEs make up about 10% of text and carry important semantic info
- Different sets of NE categories
  - CoNLL: flat, 4 types (per, org, loc, misc)
  - Sekine: hierarchy, 100+ subtypes

LRs
- Manually created by expert lexicographers
- Broad-coverage resources
  - Common nouns, adjectives, verbs, adverbs
  - Rich semantic info (relations, roles, etc.)
  - WordNet: 100k+ word senses
- LRs lack info about NEs
  - “building a proper noun ontology is more difficult than building a common noun ontology as the set of proper nouns grows more rapidly” (Mann, 2002)

Why NEs in LRs?
- Stored knowledge can be applied to NLP tasks, e.g. Question Answering (QA)
- Question (CLEF 2006): Who is Vigdis Finnbogadottir?
- QA system: linguistic analysis of text [S. Ferrandez et al. 06]
  - Text: “[...] presidents: Vigdis Finnbogadottir (Iceland), [...]”
  - Solution (wrong): Iceland
- Possible related knowledge in an LR
  - “Vigdis Finnbogadottir” instance_of: “president”, “icelandic”, “female head of state”
- An LR can be useful within QA, for example to:
  - Find answers
  - Validate answers

How to enrich LRs with NEs?
- NEs should be acquired and introduced automatically
- Ideal source
  - Up-to-date
  - High coverage
  - Allows a good-quality extraction
- Wikipedia
  - Dynamic source
  - Huge amount of NEs
  - Some degree of structure

Named Entity WordNet
- Automatically extend WordNet with NEs extracted from Wikipedia
[Figure: pipeline diagram. Inputs: Wikipedia categories, Wikipedia articles, WordNet nouns. Stages: Mapping & Disambiguation, Article extraction, NE identification. Output: NE repository.]

Mapping
- Map lemmas
  - WordNet: noun classes (instantiated)
  - Wikipedia: categories
- Results

  Wikipedia dump date    200704    200711    200801
  Total synsets             893 (same for all dumps)
  Mapped synsets            513       536       541
  % mapped               57.44%    60.02%    60.58%

- Analysis (non mapped)
  - 75%: no matching category but matching article
  - 13%: no matching category nor matching article
  - 10%: matching category but PoS error
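The coverage row of the results can be re-derived from the mapped and total counts. The sketch below is only an arithmetic check, not the authors' code; the function name `coverage_percent` is hypothetical, and it assumes the slide truncates (rather than rounds) to two decimals, which is what reproduces the 57.44% figure.

```python
def coverage_percent(mapped: int, total: int) -> str:
    """Return mapped/total as a percentage, truncated to two decimals."""
    # Integer arithmetic avoids floating-point rounding surprises.
    hundredths = mapped * 10000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


TOTAL_SYNSETS = 893  # instantiated noun synsets considered (from the slide)
MAPPED_BY_DUMP = {"200704": 513, "200711": 536, "200801": 541}

for dump, mapped in MAPPED_BY_DUMP.items():
    print(dump, coverage_percent(mapped, TOTAL_SYNSETS))
```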

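Disambiguation
- Maps WordNet polysemous nouns to Wikipedia categories
- Method: intersection of instances
  - WordNet (WN) noun "obelisk" has two senses: Obelisk1 (stone pillar) and Obelisk2 (character used in printing)
  - Wikipedia (WK) category "Obelisks" is the mapping candidate
  - Obelisk1 has_instance Washington Monument; Obelisk2 has no instance (-)
  - Category Obelisks contains Washington Monument
  - The instance sets intersect only for Obelisk1, so the category maps to that sense
- Results (262 words): 100% precision, 39% recall
- Analysis of non-disambiguated words
  - 78%: no common instance found
  - 22%: no sense corresponds to category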
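The intersection-of-instances idea can be sketched as follows. This is a minimal illustration on toy data, not the authors' implementation: the function `disambiguate`, the name normalisation, and the choice to return `None` when zero or several senses share instances with the category are all assumptions.

```python
from typing import Dict, Iterable, Optional, Set


def _normalise(name: str) -> str:
    """Make WordNet-style and Wikipedia-style names comparable (assumed scheme)."""
    return name.strip().lower().replace("_", " ")


def disambiguate(sense_instances: Dict[str, Set[str]],
                 category_articles: Iterable[str]) -> Optional[str]:
    """Pick the sense whose WordNet instances overlap the category's articles.

    Returns None when no sense, or more than one sense, shares an instance
    with the category (assumption: ambiguous cases are left unmapped).
    """
    articles = {_normalise(a) for a in category_articles}
    matching = [
        sense
        for sense, instances in sense_instances.items()
        if articles & {_normalise(i) for i in instances}
    ]
    return matching[0] if len(matching) == 1 else None


# Toy data mirroring the slide's example.
senses = {
    "obelisk1 (stone pillar)": {"Washington_Monument"},
    "obelisk2 (character used in printing)": set(),
}
obelisks_category = ["Washington Monument", "Luxor Obelisk"]
print(disambiguate(senses, obelisks_category))
```

Requiring exactly one matching sense trades recall for precision, which is in line with the high-precision, low-recall figures reported on the slide.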

Article extraction
- For each category mapped (and its hyponyms*) fetch:
  - Titles
  - Abstracts
  - Variants
- *Hyponym identification (subcategories)
  - ^ category (“ by “ | “ of “ | “ in “ | “ stubs$”), e.g. Obelisks in Argentina
  - ^ (JJ|JJR|NN|NP)+ (CC (JJ|JJR|NN|NP)+)* “ “ category$, e.g. Ancient obelisks
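The two subcategory patterns above can be approximated in code. The sketch below is an illustration under assumptions: the function names are hypothetical, matching is case-insensitive, and the second pattern expects the candidate name to arrive already PoS-tagged as (token, tag) pairs from some upstream tagger, which is not shown.

```python
import re
from typing import List, Tuple


def is_suffix_subcategory(category: str, candidate: str) -> bool:
    """Pattern 1: category followed by ' by ', ' of ', ' in ' or ending in ' stubs'."""
    pattern = r"^" + re.escape(category) + r"( by | of | in | stubs$)"
    return re.search(pattern, candidate, flags=re.IGNORECASE) is not None


# One or more modifier tags, optionally followed by CC-joined modifier groups.
_MODIFIER = r"(JJR?|NN|NP)"
_GROUP = _MODIFIER + r"( " + _MODIFIER + r")*"
_PREFIX_TAGS = re.compile(_GROUP + r"( CC " + _GROUP + r")*")


def is_prefix_subcategory(category: str,
                          tagged_candidate: List[Tuple[str, str]]) -> bool:
    """Pattern 2: modifiers (JJ, JJR, NN, NP, joined by CC) followed by the category."""
    cat_tokens = category.lower().split()
    n = len(cat_tokens)
    words = [word.lower() for word, _ in tagged_candidate]
    if len(words) <= n or words[-n:] != cat_tokens:
        return False
    prefix_tags = " ".join(tag for _, tag in tagged_candidate[:-n])
    return _PREFIX_TAGS.fullmatch(prefix_tags) is not None


print(is_suffix_subcategory("Obelisks", "Obelisks in Argentina"))
print(is_prefix_subcategory("Obelisks", [("Ancient", "JJ"), ("obelisks", "NNS")]))
```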

NE identification
- An extracted article might be a NE or a common noun
- Look for occurrences of its title in its body text and check capitalisation (Bunescu & Pasca 2006)
- Not only in the English Wikipedia, but in 10 Wikipedias for languages that follow these capitalisation norms
  - More text to search for occurrences, so results are more representative
  - Language independent: whatever the language, we obtain the equivalent article in these languages
- Results
  - Only English: F 78.06%, P 73.91%, R 87.93%
  - 10 languages: F 82.26%, P 79.69%, R 87.93%
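A capitalisation check pooled over several language editions could look like the sketch below. It is only an illustration: the function names, the 0.75 threshold, the rule of ignoring sentence-initial occurrences, and the toy texts are assumptions rather than details taken from the slides.

```python
import re
from typing import Dict, Optional, Tuple


def count_occurrences(title: str, text: str) -> Tuple[int, int]:
    """Return (capitalised, total) counts of mid-sentence occurrences of title."""
    pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
    capitalised = total = 0
    for match in pattern.finditer(text):
        before = text[:match.start()].rstrip()
        if not before or before[-1] in ".!?":
            continue  # sentence-initial: capitalisation carries no signal
        total += 1
        if match.group()[0].isupper():
            capitalised += 1
    return capitalised, total


def is_named_entity(titles: Dict[str, str], texts: Dict[str, str],
                    threshold: float = 0.75) -> Optional[bool]:
    """Pool counts over all languages; None if the title never occurs."""
    capitalised = total = 0
    for lang, text in texts.items():
        c, t = count_occurrences(titles[lang], text)
        capitalised += c
        total += t
    if total == 0:
        return None
    return capitalised / total >= threshold


titles = {"en": "Obelisk", "es": "Obelisco"}
texts = {
    "en": "An obelisk is a tall pillar. Every obelisk tapers.",
    "es": "Un obelisco es un monumento.",
}
print(is_named_entity(titles, texts))
```

Pooling the counts before thresholding is one way to let additional languages enlarge the evidence, as the slide suggests; voting per language would be an alternative design.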

Extracted NEs
- General: 310,742 NEs, 452,017 variants, 381,043 instance relations
- Detailed (per lexicographic file)

  Lex file         NEs        Example
  act              4,214      Project_Pluto instanceOf project0_4
  artifact         23,878     Akinada_Bridge instanceOf suspension_bridge0_6
  communication    1,973      Flower_of_Scotland instanceOf national_anthem0_10
  event            58         Sino-Soviet_split instanceOf schism0_11
  group            1,216      Medici instanceOf family0_14
  location         43,582     Incense_Route instanceOf trade_route0_15
  object           28,180     Pyxis instanceOf constellation0_17
  person           277,941    Vladimir_Kotelnikov instanceOf electrical_engineer0_18

NE repository
- Elements: NEs, classes, relations, variants, definitions
- LMF compliant (ISO standard for lexicons)
- Independent from specific LRs
- Web test & download
  - dlsi.ua.es/~atoral/#Resources
  - www2.ilc.cnr.it/ne-repository

Conclusions & Future
- High-quality, large NE extension of WordNet
  - +310k NEs (it had 7k), +380k relations
  - Standard-compliant output
- Future
  - Apply to other LRs for different languages
    - Empirically demonstrate generality of the approach
    - Derive a multilingual NE repository
  - Exploit Textual Entailment to disambiguate mapping

End
Thanks for your attention! Questions?
