Type inference through the analysis of Wikipedia links



  1. Type inference through the analysis of Wikipedia links
     Andrea Giovanni Nuzzolese (nuzzoles@cs.unibo.it), Aldo Gangemi (aldo.gangemi@cnr.it), Valentina Presutti (valentina.presutti@cnr.it), Paolo Ciancarini (ciancarini@cs.unibo.it)
     stlab.istc.cnr.it
     16 April 2012 - Lyon, France - LDOW 2012

  2. Outline
     • Motivations
     • Materials
     • Applied methods
     • Results
     • Conclusions

  3. Motivations
     ✦ Only a subset of the DBpedia resources is typed with the DBpedia ontology (DBPO).
     ✦ The typing procedure is top-down.
     ✦ Is the DBPO complete with respect to the DBpedia domain?
     ✦ How good and homogeneous is the granularity of DBPO types?
     Resources used in wikilinks relations: 15,944,381
     Resources having a DBPO type: 1,518,697

  4. Materials
     DBpedia 3.6
     Dataset                                     # of triples
     wikilink triples                             107,892,317
     infobox mapping-based “data” triples           9,357,273
     rdfs:label triples                             7,972,225
     rdf:type triples                               6,173,940
     infobox mapping-based “object” triples         4,251,239
     Wikilink triples with typed subject/object: 16,745,830
     DBpedia ontology: 272 classes

  5. What we did
     • Wikilinks of a DBpedia resource convey knowledge that can be used for classifying it.
     • Classification methods
       ✦ Inductive learning: k-Nearest Neighbor algorithm
       ✦ Abductive classification based on EKPs [1] and homotypes used as background knowledge
     • The methods were performed on:
       ✦ Resources having a DBPO type: 1,518,697
       ✦ Resources used in wikilinks relations: 15,944,381
       ✦ Sample of untyped resources: 1,000
     [1] A. G. Nuzzolese, A. Gangemi, V. Presutti, and P. Ciancarini. Encyclopedic Knowledge Patterns from Wikipedia Links. In L. Aroyo, N. Noy, and C. Welty, editors, Proceedings of the 10th International Semantic Web Conference (ISWC2011), pages 520-536. Springer, 2011.

  6. Inductive classification
     • We designed two inductive classification experiments based on the k-NN algorithm
       ✦ on 272 features, i.e., all the classes in the DBPO
       ✦ on 27 features, i.e., the top-level classes in the DBPO hierarchy
     • For each experiment we built a labeled feature space model as training set by using a randomly sampled 20% of typed resources
       ✦ the algorithms were tested on the remaining 80% of typed resources
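The experimental setup on this slide can be illustrated with a minimal k-NN sketch. The slide does not state the value of k or the distance measure, so the Hamming distance, `k=3`, the fixed random seed, and the toy rows below are all assumptions made for illustration; only the 20%/80% split comes from the slide.

```python
import random
from collections import Counter

def hamming(a, b):
    """Count the positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, vector, k=3):
    """Return the majority class among the k training rows nearest to vector.

    train is a list of (feature_vector, class_label) pairs.
    The distance measure and k are assumptions, not taken from the slides.
    """
    nearest = sorted(train, key=lambda row: hamming(row[0], vector))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def split_20_80(rows, seed=0):
    """Randomly sample 20% of rows for training; the other 80% are for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) // 5
    return shuffled[:cut], shuffled[cut:]

# Toy training rows (hypothetical). Features: [Company, City, Magazine, Scientist]
train = [
    ([1, 1, 1, 0], "Person"),
    ([1, 1, 0, 0], "Person"),
    ([0, 0, 0, 1], "Scientist"),
    ([0, 1, 0, 1], "Scientist"),
]
print(knn_predict(train, [1, 0, 1, 0], k=3))
```

In the real experiments each vector would have 272 or 27 positions, one per DBPO class used as a feature.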

  7. Building the training set for the k-Nearest Neighbor algorithm
     [Diagram] dbpedia:Steve_Jobs dbpo:wikiPageWikiLink dbpedia:Apple_Inc., dbpedia:NeXT, dbpedia:Forbes, dbpedia:Cupertino,_California
     [Table] Columns: Mammal, Scientist, Company, Drug, City, Magazine, Class. Row: dbpedia:Steve_Jobs (values not yet filled in)

  8. Building the training set for the k-Nearest Neighbor algorithm
     [Diagram] As in the previous slide, with rdf:type added: dbpedia:Apple_Inc. and dbpedia:NeXT are typed dbpo:Organisation, dbpedia:Forbes is typed dbpo:Magazine, dbpedia:Cupertino,_California is typed dbpo:City, and dbpedia:Steve_Jobs is typed dbpo:Person
     [Table] Columns: Mammal, Scientist, Company, Drug, City, Magazine, Class. Row: dbpedia:Steve_Jobs, Class = dbpo:Person

  9. Building the training set for the k-Nearest Neighbor algorithm
     [Diagram] dbpedia:Steve_Jobs kp:linksTo dbpo:Organisation, dbpo:Magazine, dbpo:City (derived from dbpo:wikiPageWikiLink and rdf:type)
                          Mammal  Scientist  Company  Drug  City  Magazine  Class
     dbpedia:Steve_Jobs   0       0          1        0     1     1         dbpo:Person

  10. Building the training set for the k-Nearest Neighbor algorithm
     [Diagram] dbpedia:Steve_Jobs kp:linksTo dbpo:Organisation, dbpo:Magazine, dbpo:City (derived from dbpo:wikiPageWikiLink and rdf:type)
                          Mammal  Scientist  Company  Drug  City  Magazine  Class
     dbpedia:Steve_Jobs   0       0          1        0     1     1         dbpo:Person
     ...                  ...     ...        ...      ...   ...   ...       ...

  11. Building the training set for the k-Nearest Neighbor algorithm
     [Diagram] dbpedia:Steve_Jobs kp:linksTo dbpo:Organisation, dbpo:Magazine, dbpo:City (derived from dbpo:wikiPageWikiLink and rdf:type)
     ✦ Precision using all DBPO types as features: 31.65%
     ✦ Precision using the top-level of DBPO as features: 40.27%
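The feature construction walked through in the slides above can be sketched as follows: a resource gets a 1 for every class that appears among the types of the resources it links to. The function name, the dictionary layout, and the toy data are hypothetical; in particular, the toy types are written with the table's column names (for example "Company") so that the output matches the six columns shown on the slides.

```python
# The six feature columns shown on the slides (the real experiments used 272 or 27).
FEATURES = ["Mammal", "Scientist", "Company", "Drug", "City", "Magazine"]

def feature_vector(resource, wikilinks, types, features=FEATURES):
    """Build a binary vector marking which classes the resource links to.

    wikilinks maps a resource to the resources it links to;
    types maps a resource to its class (untyped resources are absent).
    """
    linked_types = {types[t] for t in wikilinks.get(resource, []) if t in types}
    return [1 if f in linked_types else 0 for f in features]

# Toy data modelled on the Steve Jobs example (illustrative, not real DBpedia output).
wikilinks = {
    "Steve_Jobs": ["Apple_Inc.", "NeXT", "Forbes", "Cupertino,_California"],
}
types = {
    "Apple_Inc.": "Company",
    "NeXT": "Company",
    "Forbes": "Magazine",
    "Cupertino,_California": "City",
    "Steve_Jobs": "Person",
}

# One labeled training row: (feature vector, class of the resource itself).
row = (feature_vector("Steve_Jobs", wikilinks, types), types["Steve_Jobs"])
print(row)  # ([0, 0, 1, 0, 1, 1], 'Person')
```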

  12. Abductive classification with EKPs
     • EKPs
       ✦ An EKP of a certain entity type is a small vocabulary that captures the core types used for describing that entity type, as it emerges from the Wikipedia crowds
     Visit aemoo.org for an exploratory tool based on EKPs.

  13. How can we infer the type of “Galileo Galilei”?
     http://www.aemoo.org

  14. How can we infer the type of “Galileo Galilei”?
     We know its path types.
     http://www.aemoo.org

  15. We have 231 EKPs.
     We compare the path types involving “Galileo Galilei” as subject with the EKPs in order to identify the most similar one, which is the “Scientist” EKP.
     http://www.aemoo.org

  16. The inferred type for the resource “Galileo Galilei” is the class “Scientist”.
     http://www.aemoo.org
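The matching step in the Galileo example can be sketched as picking the EKP whose vocabulary overlaps most with the types an entity links to. The slides do not say how similarity is computed, so the Jaccard coefficient used here is an assumption, and the EKP contents and linked types below are made-up toy values rather than real EKPs.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar_ekp(linked_types, ekps):
    """Return the name of the EKP most similar to the entity's linked types.

    ekps maps an entity type to the set of types in its EKP.
    Jaccard similarity is an assumed stand-in for the authors' measure.
    """
    return max(ekps, key=lambda name: jaccard(linked_types, ekps[name]))

# Hypothetical EKPs (the real ones are 231 patterns extracted from Wikipedia links).
ekps = {
    "Scientist": {"Scientist", "University", "City", "Award"},
    "Company": {"Company", "Person", "City", "Software"},
    "Magazine": {"Magazine", "Company", "Person"},
}

# Types of the resources that the untyped entity links to (toy values).
galileo_linked_types = {"Scientist", "University", "City", "Planet"}
print(most_similar_ekp(galileo_linked_types, ekps))  # Scientist
```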

  17. Distinctive weakness of some EKPs
     ✦ The distinctive weakness seems due to wide overlaps among some EKPs
     ✦ Systematic ambiguity of the 4 largest classes
     ✦ Precision and recall on all DBPO types: both 44.4%
     ✦ Precision and recall on the top-level of DBPO hierarchy: 36.5% and 79.5%

  18. Homotype-based abductive classification
     • Homotypes are wikilinks that have the same type on both the subject and the object of the triple
     [Diagram] dbpedia:Plato dbpo:wikiPageWikiLink dbpedia:Immanuel_Kant; both resources have rdf:type dbpo:Philosopher
     • We have observed that the homotype is usually the most frequent wikilink type (or among the top 3)
     • Given an untyped entity, we hypothesize that the most frequent type involved in its ingoing/outgoing wikilinks identifies its homotype, and hence indicates its type
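The homotype hypothesis on this slide can be sketched as a frequency count over the types found at the other end of an entity's ingoing and outgoing wikilinks. The function name, the data layout, and the toy links are hypothetical; only the idea of taking the most frequent linked type comes from the slide.

```python
from collections import Counter

def homotype_guess(resource, links, types):
    """Guess a type for resource as the most frequent type among its wikilinks.

    links is a list of (subject, object) wikilink pairs;
    types maps typed resources to their class.
    Returns None when no linked resource is typed.
    """
    counts = Counter()
    for subject, obj in links:
        if subject == resource and obj in types:
            counts[types[obj]] += 1        # outgoing link
        elif obj == resource and subject in types:
            counts[types[subject]] += 1    # ingoing link
    return counts.most_common(1)[0][0] if counts else None

# Toy data (illustrative): Plato is treated as untyped here.
links = [
    ("Plato", "Aristotle"),
    ("Immanuel_Kant", "Plato"),
    ("Socrates", "Plato"),
    ("Plato", "Athens"),
]
types = {
    "Aristotle": "Philosopher",
    "Immanuel_Kant": "Philosopher",
    "Socrates": "Philosopher",
    "Athens": "City",
}
print(homotype_guess("Plato", links, types))  # Philosopher
```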

  19. Homotype-based abductive classification

  20. Homotype-based abductive classification

  21. Results on classifying already typed resources

  22. Results on untyped resources
     • Results on a sample of 1,000 untyped resources are much less satisfactory
     [Charts: With EKPs; With Homotypes]

  23. Why? [1]
     • Typed entities: 2:3 typed wikilinks ratio
     • Untyped entities: 1:3 typed wikilinks ratio
     • Link structure for untyped entities is not rich enough
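One plausible reading of the ratios above is the share of an entity's wikilinks whose other end is a typed resource. Under that assumption, which the slide does not spell out, the measure could be computed as in this sketch; the function name and toy data are hypothetical.

```python
def typed_link_ratio(resource, links, types):
    """Fraction of the resource's wikilinks whose other end has a type.

    links is a list of (subject, object) pairs; types maps typed resources
    to their class. Returns 0.0 for a resource with no wikilinks.
    """
    neighbours = [obj if subject == resource else subject
                  for subject, obj in links
                  if resource in (subject, obj)]
    if not neighbours:
        return 0.0
    return sum(n in types for n in neighbours) / len(neighbours)

# Toy data: two of the three resources linked with "A" are typed.
links = [("A", "B"), ("C", "A"), ("A", "D")]
types = {"B": "City", "C": "Person"}
print(typed_link_ratio("A", links, types))
```

A ratio near 2/3 would correspond to the typed-entity case on the slide, and a ratio near 1/3 to the untyped-entity case.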

  24. Why? [2]
     • DBPO does not provide a complete set of classes for correctly typing DBpedia resources
     dbpedia:List_of_FIFA_World_Cup_finals    Collection
     dbpedia:Computer_Science                 ScientificDiscipline
     dbpedia:Counterattack                    Plan
     dbpedia:Eros(concept)                    Concept
     dbpedia:Gentlemen’s_agreement            Agreement

  25. Conclusions
     • We have investigated different approaches for typing DBpedia resources based on the data set of wikilinks
     • Results are acceptable on the test set, but extensive untypedness in outgoing links and poor DBPO coverage severely compromise automatic typing for untyped resources
     • We have analyzed possible causes deriving from some bias in DBpedia

  26. Future work
     • YAGO could be helpful, but
       ✦ there is a lack of mapping between YAGO and DBPO
       ✦ it has larger coverage and only partially overlaps with DBPO
       ✦ the granularity of its categories is finer, and not easily reusable, because the top level is very large

  27. Thank you
     Andrea Nuzzolese - STLab, ISTC-CNR & Dipartimento di Scienze dell’Informazione, University of Bologna, Italy
