DBpedia: Extraction of Knowledge from Wikipedia (Sebastian Hellmann)


  1. Creating Knowledge out of Interlinked Data DBpedia Extraction of Knowledge from Wikipedia Sebastian Hellmann AKSW, Universität Leipzig DBpedia is a community project, please see http://dbpedia.org for a full list of contributors LOD2 Presentation . 02.09.2010 . Page http://lod2.eu

  2. DBpedia • DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. • DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. [Figure labels: Semi-Structured Wiki Syntax; Structured Information] http://lod2.eu KAIST – LOD2 16.8.2011

  3. DBpedia - Overview • Description of DBpedia • Data Set • DBpedia Software • LOD Cloud • Collaborative Ontology Engineering • DBpedia-Live • Internationalization

  4. Structure in Wikipedia • Title • Abstract • Infoboxes • Geo-coordinates • Categories • Images • Links (to other language versions, to other Wikipedia pages, to the Web) • Redirects • Disambiguations

  11. Infobox Templates
      Wikitext syntax:
        {{Infobox Korean settlement
        | title = Busan Metropolitan City
        | img = Busan.jpg
        | imgcaption = A view of the [[Geumjeong]] district in Busan
        | hangul = 부산 광역시
        ...
        | area_km2 = 763.46
        | pop = 3635389
        | popyear = 2006
        | mayor = Hur Nam-sik
        | divs = 15 wards (Gu), 1 county (Gun)
        | region = [[Yeongnam]]
        | dialect = [[Gyeongsang]]
        }}
      RDF representation:
        dbp:Busan dbp:title "Busan Metropolitan City"
        dbp:Busan dbp:hangul "부산 광역시"@Hang
        dbp:Busan dbp:area_km2 "763.46"^^xsd:float
        dbp:Busan dbp:pop "3635389"^^xsd:int
        dbp:Busan dbp:region dbp:Yeongnam
        dbp:Busan dbp:dialect dbp:Gyeongsang
        ...
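
The step from infobox wikitext to RDF on the Busan slide can be illustrated with a minimal Python sketch. It is not the DBpedia extraction framework: the function names (`parse_infobox`, `to_object`, `infobox_to_triples`) are hypothetical, it assumes one `| key = value` pair per line and no nested templates, and the datatype guessing (integers, floats, whole-value wiki links) is a simplification chosen for this example.

```python
import re

# A value that consists of exactly one wiki link, e.g. [[Yeongnam]] or [[A|label]].
LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")

def parse_infobox(wikitext):
    """Collect `| key = value` pairs, assuming one pair per line."""
    pairs = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            pairs[key.strip()] = value.strip()
    return pairs

def to_object(value):
    """Guess an RDF object: resource for links, typed literal for numbers."""
    match = LINK.match(value)
    if match:
        return "dbp:" + match.group(1).strip().replace(" ", "_")
    if re.fullmatch(r"-?\d+", value):
        return f'"{value}"^^xsd:int'
    if re.fullmatch(r"-?\d+\.\d+", value):
        return f'"{value}"^^xsd:float'
    return f'"{value}"'  # plain literal for everything else

def infobox_to_triples(subject, wikitext):
    """Return (subject, predicate, object) tuples using the slide's dbp: prefix."""
    return [(f"dbp:{subject}", f"dbp:{key}", to_object(value))
            for key, value in parse_infobox(wikitext).items()]

BUSAN = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""

triples = infobox_to_triples("Busan", BUSAN)
for triple in triples:
    print(" ".join(triple))
```

Values that merely contain a link, such as the image caption on the slide, stay plain literals in this sketch because the link pattern is anchored to the whole value.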

  12. DBpedia – Data Set Simple questions that are hard to answer: • What do Innsbruck and Leipzig have in common? • Who are the mayors of central European towns at an elevation above 1000 m? • Which soccer players played as goalkeeper for a club whose stadium has more than 40,000 seats and were born in a country with more than 10 million inhabitants? DBpedia can answer these questions and provides a public SPARQL endpoint for development (hosted on a Virtuoso server).
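
As a rough illustration of how the second question might be posed to a SPARQL endpoint, the Python sketch below only builds the query text and a GET URL; it sends nothing. Several things are assumptions and not taken from the slides: the endpoint URL `http://dbpedia.org/sparql`, the class and property names `dbo:Town`, `dbo:elevation` and `dbo:mayor`, and the Virtuoso-style `format` parameter. The "central European" restriction is left out for brevity.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; the slides only say that a public endpoint exists.
ENDPOINT = "http://dbpedia.org/sparql"

def build_query(min_elevation_m, limit=20):
    """Towns above a given elevation together with their mayors.

    The dbo: names are assumptions about the ontology, used for illustration.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?town ?mayor ?elevation WHERE {{
  ?town a dbo:Town ;
        dbo:elevation ?elevation ;
        dbo:mayor ?mayor .
  FILTER (?elevation > {min_elevation_m})
}}
LIMIT {limit}
""".strip()

def request_url(query, endpoint=ENDPOINT):
    """Encode the query as a SPARQL protocol GET request URL."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

query = build_query(1000)
url = request_url(query)
print(query)
print(url)
```

Keeping query construction separate from the HTTP call makes it easy to inspect the query before running it against a shared public server.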

  13. DBpedia – Data Set • “A little Semantics goes a long way” - Jim Hendler http://tinyurl.com/2uhuow9

  14. DBpedia - Overview

  15. DBpedia – Data Set

  16. DBpedia – Data Set Wikipedia page: http://en.wikipedia.org/wiki/Daejeon DBpedia resource: http://dbpedia.org/resource/Daejeon • stable IDs • useful data (population, pictures ...)
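
The pairing of the two Daejeon URLs suggests a simple correspondence between an English Wikipedia article URL and a DBpedia resource URI. A minimal Python sketch, assuming the article title is carried over unchanged (the function name `wikipedia_to_dbpedia` is hypothetical, and redirects or other language editions are not handled):

```python
from urllib.parse import urlparse

def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    parsed = urlparse(url)
    prefix = "/wiki/"
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith(prefix):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    # The article title is reused as the local name of the resource.
    return "http://dbpedia.org/resource/" + parsed.path[len(prefix):]

resource = wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Daejeon")
print(resource)
```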

  17. DBpedia – Software DIEF - DBpedia Information Extraction Framework

  18. DBpedia – Software DIEF - DBpedia Information Extraction Framework • Hosted on SourceForge • More than 30 developers • Written in Scala • Can potentially be adapted to other MediaWiki sites (currently Wiktionary)

  19. DIEF

  24. DBpedia – Collaborative Ontology Engineering • “A little Semantics goes a long way” - Jim Hendler

  25. DBpedia – Collaborative Ontology Engineering • “A little Semantics goes a long way” - Jim Hendler • More semantics goes a longer way... • Schema and mappings are created by a community to improve data quality • This improves the precision and recall of queries

  26. A closer look at infoboxes

  29. A closer look at infoboxes • Björk (Musician): Occupation = Musician, Actor; Born = 21.12.1965, Reykjavík • Brown (Prime Minister): office = Prime Minister of the UK; birth_date = 20.4.1951; birth_place = Govan • Romero (Actor): occupation = Actor, Editor; birthdate = 4.2.1940; birthplace = New York

  32. DBpedia – Collaborative Ontology Engineering • Correct Semantics: • Combine what belongs together (birth_place, birthplace) • Separate what is different (bornIn, birthplace) • Mappings Wiki • Mapping Rules • http://mappings.dbpedia.org/ • Everybody can contribute • About 120 editors
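
The idea of combining what belongs together can be sketched as a small lookup table that folds differently spelled infobox keys onto one property. This Python example is an illustration only and does not reproduce the rule syntax of the mappings wiki; the table `PROPERTY_MAPPING`, the function `normalize` and the choice of `dbo:` target names are assumptions made for the example, with the input values taken from the Brown and Romero infoboxes shown earlier.

```python
# Hypothetical mapping table: several raw infobox keys share one target property.
PROPERTY_MAPPING = {
    "birth_place": "dbo:birthPlace",
    "birthplace": "dbo:birthPlace",
    "birth_date": "dbo:birthDate",
    "birthdate": "dbo:birthDate",
    "occupation": "dbo:occupation",
}

def normalize(infobox):
    """Rename mapped keys to their target property; drop keys without a mapping."""
    result = {}
    for key, value in infobox.items():
        prop = PROPERTY_MAPPING.get(key.strip().lower())
        if prop is not None:
            result[prop] = value
    return result

brown = normalize({
    "office": "Prime Minister of the UK",
    "birth_date": "20.4.1951",
    "birth_place": "Govan",
})
romero = normalize({
    "occupation": "Actor, Editor",
    "birthdate": "4.2.1940",
    "birthplace": "New York",
})
print(brown)
print(romero)
```

After normalization a single query on one birth place property finds both persons. A combined field such as Björk's `Born`, which holds a date and a place, would need its own splitting rule and is left out of this sketch.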

  33. DBpedia – Live

  34. DBpedia – Live • DBpedia dumps were created based on Wikipedia dumps • About 100,000 – 150,000 page edits on Wikipedia per day • Page edits are pulled, transformed into RDF and loaded into a triple store • A 5-minute delay increases performance by 15%
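
The pull, transform and load cycle for page edits can be pictured with a toy in-memory store. The slides do not describe how the live system applies an edit, so this Python sketch rests on one assumption: all triples extracted from a page are replaced when that page is edited. The class `LiveStore` and its methods are hypothetical stand-ins for a real triple store.

```python
from collections import defaultdict

class LiveStore:
    """In-memory stand-in for a triple store, indexed by source page."""

    def __init__(self):
        self._by_page = defaultdict(set)

    def apply_edit(self, page, new_triples):
        """Replace the triples of one page; return (added, removed) sets."""
        old = self._by_page[page]
        new = set(new_triples)
        added, removed = new - old, old - new
        self._by_page[page] = new
        return added, removed

    def triples(self):
        """All triples currently held, across every page."""
        return set().union(*self._by_page.values())

store = LiveStore()
first = ("dbp:Busan", "dbp:pop", "3635389")
second = ("dbp:Busan", "dbp:pop", "3600000")  # made-up value for a later edit

added_1, removed_1 = store.apply_edit("Busan", [first])
added_2, removed_2 = store.apply_edit("Busan", [second])
print(store.triples())
```

Computing the difference between old and new triples keeps each update proportional to the size of the edit and not to the size of the whole data set.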

  35. DBpedia – Live

  36. DBpedia – Live • SPARQL Endpoint: http://live.dbpedia.org/sparql • Documentation: http://wiki.dbpedia.org/DBpediaLive • Statistics: http://live.dbpedia.org/LiveStats/

  37. DBpedia – Internationalization

  38. DBpedia – Internationalization • DBpedia Internationalization Committee founded: http://wiki.dbpedia.org/Internationalization • Available DBpedias: • Korean, Greek, German, Polish, Russian, Dutch • Mappings available for more than 12 languages

  39. DBpedia – Internationalization

  41. Thank you for your attention! DBpedia is a community project, please see http://dbpedia.org for a full list of contributors.
