DBpedia Extraction of Knowledge from Wikipedia - Sebastian Hellmann


Creating Knowledge out of Interlinked Data. DBpedia Extraction of Knowledge from Wikipedia. Sebastian Hellmann, AKSW, Universität Leipzig. DBpedia is a community project, please see http://dbpedia.org for a full list of contributors. LOD2.


SLIDE 1

Creating Knowledge out of Interlinked Data

LOD2 Presentation . 02.09.2010 . Page http://lod2.eu

AKSW, Universität Leipzig

Sebastian Hellmann

DBpedia Extraction of Knowledge from Wikipedia

DBpedia is a community project, please see http://dbpedia.org for a full list of contributors

SLIDE 2

KAIST – LOD2 16.8.2011 http://lod2.eu

DBpedia

  • DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

  • DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.

[Figure: semi-structured wiki syntax → structured information]

SLIDE 3

DBpedia - Overview

  • Description DBpedia
  • Data Set
  • DBpedia Software
  • LOD Cloud
  • Collaborative Ontology Engineering
  • DBpedia-Live
  • Internationalization

SLIDE 4

Structure in Wikipedia

  • Title
  • Abstract
  • Infoboxes
  • Geo-coordinates
  • Categories
  • Images
  • Links (to other language versions, to other Wikipedia pages, to the Web)
  • Redirects
  • Disambiguations

SLIDE 11

Infobox Templates

Wikitext syntax:

    {{Infobox Korean settlement
    | title = Busan Metropolitan City
    | img = Busan.jpg
    | imgcaption = A view of the [[Geumjeong]] district in Busan
    | hangul = 부산 광역시
    ...
    | area_km2 = 763.46
    | pop = 3635389
    | popyear = 2006
    | mayor = Hur Nam-sik
    | divs = 15 wards (Gu), 1 county (Gun)
    | region = [[Yeongnam]]
    | dialect = [[Gyeongsang]]
    }}

RDF representation:

    dbp:Busan dbp:title "Busan Metropolitan City"
    dbp:Busan dbp:hangul "부산 광역시"@Hang
    dbp:Busan dbp:area_km2 "763.46"^^xsd:float
    dbp:Busan dbp:pop "3635389"^^xsd:int
    dbp:Busan dbp:region dbp:Yeongnam
    dbp:Busan dbp:dialect dbp:Gyeongsang
    ...
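
The step from infobox wikitext to RDF statements can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not the actual DBpedia extraction code: the function names, the regular expressions and the typing rules (integers become xsd:int, decimals become xsd:float, a value that is a single wiki link becomes a resource) are assumptions made for this example.

```python
import re

# One "| key = value" parameter per line (an assumption of this sketch;
# real infoboxes can be formatted far less regularly).
PARAM = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)
# A value that consists of exactly one wiki link, e.g. [[Yeongnam]].
LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def to_object(value):
    """Guess an RDF object for a raw infobox value."""
    link = LINK.match(value)
    if link:
        return "dbp:" + link.group(1).strip().replace(" ", "_")
    if re.fullmatch(r"-?\d+", value):
        return f'"{value}"^^xsd:int'
    if re.fullmatch(r"-?\d+\.\d+", value):
        return f'"{value}"^^xsd:float'
    return f'"{value}"'


def infobox_to_triples(subject, wikitext):
    """Return (subject, predicate, object) tuples for every non-empty parameter."""
    return [
        (subject, "dbp:" + key, to_object(value))
        for key, value in PARAM.findall(wikitext)
        if value
    ]


SAMPLE = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""

for triple in infobox_to_triples("dbp:Busan", SAMPLE):
    print(*triple)
```

Values that mix text and links (such as the image caption) are kept as plain strings here; handling them properly would need a real wikitext parser.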

SLIDE 12

DBpedia – Data Set

Simple questions that are hard to answer:

  • What do Innsbruck and Leipzig have in common?
  • Who are the mayors of central European towns at an elevation of more than 1000 m?
  • Which soccer players played as goalkeeper for a club that has a stadium with more than 40,000 seats and were born in a country with more than 10 million inhabitants?

DBpedia can answer these questions and provides a public SPARQL endpoint for development (hosted on a Virtuoso server).
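
As a rough illustration of how such a question turns into a request against a SPARQL endpoint, the sketch below builds the request URL for a simplified version of the "mayors of elevated towns" question. Several things here are assumptions and not taken from the slides: the endpoint URL http://dbpedia.org/sparql, the class and property names (dbo:Town, dbo:elevation, dbo:leaderName), and the `format` parameter, which is a Virtuoso convention. The query also leaves out the "central European" restriction. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; the slide only states that a public endpoint exists.
ENDPOINT = "http://dbpedia.org/sparql"

# Class and property names are illustrative assumptions about the vocabulary.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?town ?mayor WHERE {
  ?town a dbo:Town ;
        dbo:elevation ?height ;
        dbo:leaderName ?mayor .
  FILTER (?height > 1000)
}
LIMIT 20
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET URL (SPARQL protocol `query` parameter)."""
    params = {
        "query": query.strip(),
        # Result format hint; assumed to be understood by the Virtuoso server.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


print(build_request_url(QUERY))
```

The printed URL can be pasted into a browser or fetched with any HTTP client to try the query.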

SLIDE 13

DBpedia – Data Set

  • “A little Semantics goes a long way” - Jim Hendler

http://tinyurl.com/2uhuow9

SLIDE 14

DBpedia - Overview

SLIDE 15

DBpedia – Data Set

SLIDE 16

DBpedia – Data Set

http://en.wikipedia.org/wiki/Daejeon → http://dbpedia.org/resource/Daejeon

  • stable IDs
  • useful data (population, pictures ...)
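
The correspondence between the two URLs on this slide can be written down as a small helper. This is a sketch under the assumption that the resource name is simply the article title taken from the `/wiki/` path of an English Wikipedia URL; the function name `wikipedia_to_dbpedia` is made up for this example.

```python
from urllib.parse import urlparse

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    # The article title (with underscores) is reused unchanged as the identifier.
    return DBPEDIA_RESOURCE + path[len(prefix):]


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Daejeon"))
```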

SLIDE 17

DBpedia – Software

DIEF - DBpedia Information Extraction Framework

SLIDE 18

DBpedia – Software

DIEF - DBpedia Information Extraction Framework

  • Hosted on Sourceforge
  • More than 30 developers
  • Written in Scala
  • Can potentially be adapted to other MediaWikis (currently Wiktionary)

SLIDE 19

DIEF

SLIDE 24


DBpedia – Collaborative Ontology Engineering

  • “A little Semantics goes a long way” - Jim Hendler

SLIDE 25

DBpedia – Collaborative Ontology Engineering

  • “A little Semantics goes a long way” - Jim Hendler
  • More Semantics go a longer way...
  • The schema and the mappings are created by a community to improve data quality.
  • This improves the precision and recall of queries.

SLIDE 26

A closer look at infoboxes

SLIDE 29


Björk (Musician)
    Occupation = Musician, Actor
    Born = 21.12.1965, Reykjavík

Brown (Prime Minister)
    office = Prime Minister of the UK
    birth_date = 20.4.1951
    birth_place = Govan

Romero (Actor)
    occupation = Actor, Editor
    birthdate = 4.2.1940
    birthplace = New York

SLIDE 32


DBpedia – Collaborative Ontology Engineering

  • Correct Semantics:
      • Combine what belongs together (birth_place, birthplace)
      • Separate what is different (bornIn, birthplace)
  • Mappings Wiki
      • Mapping Rules
      • http://mappings.dbpedia.org/
      • Everybody can contribute
      • About 120 editors
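
The idea of "combining what belongs together" can be illustrated with a tiny mapping table that sends differently spelled infobox keys to one shared property. This is only a sketch: the target names `birthPlace` and `birthDate`, the table itself and the function `apply_mappings` are assumptions for this example and are not the rule syntax used on the Mappings Wiki.

```python
# Hypothetical mapping rules: several raw infobox keys share one target property.
MAPPINGS = {
    "birth_place": "birthPlace",
    "birthplace": "birthPlace",
    "birth_date": "birthDate",
    "birthdate": "birthDate",
}


def apply_mappings(infobox, mappings=MAPPINGS):
    """Split an infobox into mapped properties and keys without a mapping rule."""
    mapped, unmapped = {}, {}
    for key, value in infobox.items():
        target = mappings.get(key.strip().lower())
        if target is None:
            unmapped[key] = value  # no rule yet: keep the raw key
        else:
            mapped[target] = value
    return mapped, unmapped


brown = {"office": "Prime Minister of the UK",
         "birth_date": "20.4.1951", "birth_place": "Govan"}
romero = {"occupation": "Actor, Editor",
          "birthdate": "4.2.1940", "birthplace": "New York"}

for person in (brown, romero):
    print(apply_mappings(person))
```

After mapping, both records expose the same `birthPlace` and `birthDate` keys, so a single query pattern covers both infobox templates.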

SLIDE 33

DBpedia – Live

SLIDE 34

DBpedia – Live

  • DBpedia dumps were created based on Wikipedia dumps
  • About 100,000 – 150,000 page edits on Wikipedia per day
  • Page edits are pulled, transformed into RDF and loaded into a triple store
  • A 5-minute delay increases performance by 15%
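
The slide does not say why a delay helps. One plausible reading, stated here purely as an assumption, is that a page edited several times within a short window then only needs to be extracted once. The sketch below shows such a delayed queue; the class `DelayedEditQueue` and its methods are hypothetical and do not describe the actual DBpedia Live implementation. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class DelayedEditQueue:
    """Hold page edits for a fixed delay and collapse repeated edits per page."""

    def __init__(self, delay_seconds=300):
        self.delay = delay_seconds
        self.pending = {}  # page title -> time of the first pending edit

    def add_edit(self, title, now):
        # A second edit to a page that is already waiting does not add work.
        self.pending.setdefault(title, now)

    def due_pages(self, now):
        """Return and remove all pages whose first pending edit is old enough."""
        due = [t for t, first in self.pending.items() if now - first >= self.delay]
        for title in due:
            del self.pending[title]
        return due


queue = DelayedEditQueue(delay_seconds=300)
queue.add_edit("Busan", now=0)
queue.add_edit("Busan", now=100)    # collapsed into the pending entry
queue.add_edit("Daejeon", now=200)
print(queue.due_pages(now=300))     # pages to re-extract and load into the triple store
```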

SLIDE 35

DBpedia – Live

SLIDE 36

DBpedia – Live

  • SPARQL Endpoint: http://live.dbpedia.org/sparql
  • Documentation: http://wiki.dbpedia.org/DBpediaLive
  • Statistics: http://live.dbpedia.org/LiveStats/

SLIDE 37

DBpedia – Internationalization

SLIDE 38

DBpedia – Internationalization

  • DBpedia Internationalization Committee founded: http://wiki.dbpedia.org/Internationalization
  • Available DBpedias: Korean, Greek, German, Polish, Russian, Dutch
  • Mappings available for over 12 languages

SLIDE 39

DBpedia – Internationalization

  • DBpedia Internationalization

SLIDE 41

Thank you for your attention!

DBpedia is a community project, please see http://dbpedia.org for a full list of contributors.