DBpedia: Glue for all Wikipedias and a Use Case for Multilingualism

SLIDE 1

DBpedia: Glue for all Wikipedias and a Use Case for Multilingualism

Marco Fossati, Mariano Rico, Martin Brümmer

SLIDE 2

DBpedia: Extracting knowledge from Wikipedia

Martin Brümmer, bruemmer@informatik.uni-leipzig.de

SLIDE 3

Why? Turn documents into data, so that they can be used and queried at a fine-grained level.

How? Map Wikipedia data to Linked Data.

Result: Multilingual data with a common structure.
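The mapping step can be made concrete with a minimal Python sketch that turns one infobox into N-Triples. It relies on a hand-written mapping table. The template names, property names and resource namespaces are illustrative assumptions, not DBpedia's actual mappings, and the real extraction framework also parses values, datatypes and units. The sketch shows what "a common structure" means in practice: an English and an Italian template property both end up as the same ontology property.

```python
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLOGY = "http://dbpedia.org/ontology/"

# Illustrative mappings (assumed for this sketch, not taken from the DBpedia mappings wiki).
# (infobox template, template property) -> ontology property
PROPERTY_MAPPING: Dict[Tuple[str, str], str] = {
    ("Infobox person", "birth_place"): "birthPlace",  # English Wikipedia template
    ("Bio", "LuogoNascita"): "birthPlace",            # Italian Wikipedia template
}
# infobox template -> ontology class
CLASS_MAPPING: Dict[str, str] = {"Infobox person": "Person", "Bio": "Person"}


def resource_uri(base: str, title: str) -> str:
    """Build a resource URI from a page title (spaces become underscores)."""
    return base + title.strip().replace(" ", "_")


def map_infobox(base: str, page: str, template: str, fields: Dict[str, str]) -> List[str]:
    """Return N-Triples lines for the mapped parts of one infobox."""
    subject = f"<{resource_uri(base, page)}>"
    triples: List[str] = []
    cls = CLASS_MAPPING.get(template)
    if cls is not None:
        triples.append(f"{subject} <{RDF_TYPE}> <{ONTOLOGY}{cls}> .")
    for key, value in fields.items():
        prop = PROPERTY_MAPPING.get((template, key))
        if prop is None:
            continue  # unmapped template properties are skipped
        # Simplification: every mapped value is treated as a link to another page
        triples.append(f"{subject} <{ONTOLOGY}{prop}> <{resource_uri(base, value)}> .")
    return triples


# Made-up infoboxes for the same person from two language editions
en_triples = map_infobox("http://dbpedia.org/resource/", "Ada Lovelace",
                         "Infobox person", {"birth_place": "London", "image": "Ada.jpg"})
it_triples = map_infobox("http://it.dbpedia.org/resource/", "Ada Lovelace",
                         "Bio", {"LuogoNascita": "Londra"})
for line in en_triples + it_triples:
    print(line)
```

Both outputs use the same `birthPlace` property in this sketch. A single query can therefore address data that came from differently named templates in different languages.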

SLIDE 4

Multilingual community
  • Guarantees data quality and coverage beyond language borders
  • Organized in chapters: 14 language communities maintain their own language DBpedias
  • Supported by the DBpedia Association, which opens the research project to long-term sponsoring

SLIDE 5

The center of the LOD (Linked Open Data) cloud

SLIDE 6

Extracting multilingual knowledge

Internationalization: English mapping (dbpedia.org)

SLIDE 7

Extracting multilingual knowledge

Internationalization

SLIDE 8

Extracting multilingual knowledge

Internationalization

SLIDE 9

Extracting multilingual knowledge

Internationalization

SLIDE 10

Extracting multilingual knowledge

Internationalization: mapped by the chapters ($lang.dbpedia.org)
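The `$lang` placeholder can be read as a simple naming rule. The sketch below assumes, following this slide and the English one, that the English edition lives at dbpedia.org and every chapter at `<lang>.dbpedia.org`. The `/resource/` and `/sparql` paths and the helper names are illustrative assumptions.

```python
def chapter_host(lang: str) -> str:
    """Host name of the DBpedia edition for a language code.

    Assumed pattern (from the slides): English at dbpedia.org,
    every other chapter at <lang>.dbpedia.org.
    """
    code = lang.strip().lower()
    return "dbpedia.org" if code == "en" else f"{code}.dbpedia.org"


def resource_base(lang: str) -> str:
    """Hypothetical helper: namespace for the resources of one chapter."""
    return f"http://{chapter_host(lang)}/resource/"


def sparql_endpoint(lang: str) -> str:
    """Hypothetical helper: assumes each chapter serves SPARQL under /sparql."""
    return f"http://{chapter_host(lang)}/sparql"


for code in ["en", "es", "it"]:
    print(code, "->", resource_base(code), sparql_endpoint(code))
```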

SLIDE 11

USE CASES

DBpedia internationalization

SLIDE 12
Abbrev. base

Industrial use case

SLIDE 13

Why? Help with text segmentation, in the form of exceptions to segmentation rules.

What? A multilingual knowledge base of abbreviations.

How? Extract words that look like sentence boundaries and model them via Lemon.
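The idea fits in a few lines of Python. A heuristic collects tokens that end in a period but are followed by a lowercase word: they look like sentence boundaries, but probably are not. A rule-based segmenter then treats every entry of the resulting abbreviation base as an exception to "break after a period". The regular expressions, the lowercase heuristic and the function names are assumptions for illustration. The actual knowledge base is multilingual and modelled with Lemon, which this sketch ignores.

```python
import re
from typing import List, Set


def abbreviation_candidates(text: str) -> Set[str]:
    """Assumed heuristic: a token that ends in '.' but is followed by a
    lowercase word looks like a sentence boundary without being one."""
    return {m.group(1) for m in re.finditer(r"(\S+\.)\s+[a-z]", text)}


def segment(text: str, abbreviations: Set[str]) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace, unless the token
    before the break is a known abbreviation (an exception to the rule)."""
    sentences: List[str] = []
    start = 0
    for m in re.finditer(r"[.!?]\s+", text):
        end = m.start() + 1  # just after the punctuation mark
        last_token = text[start:end].split()[-1]
        if last_token in abbreviations:
            continue  # exception: no sentence break after an abbreviation
        sentences.append(text[start:end].strip())
        start = m.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


abbrevs = abbreviation_candidates("It costs approx. ten euros. See also e.g. the index.")
sentences = segment("Prices rose approx. five percent. See e.g. the annex. Done!", abbrevs)
print(abbrevs)
print(sentences)
```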

SLIDE 14

THE ITALIAN JOB

DBPEDIA INTERNATIONALIZATION

Marco Fossati, fossati@fbk.eu

SLIDE 15

BUILDING HUGE GAZETTEERS

INDUSTRIAL USE CASE

SLIDE 16

WHY? Natural language understanding.

WHAT? A language-specific, domain-specific linguistic resource.

HOW? The simplest query.
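One plausible reading of "the simplest query" is to ask for every label of one ontology class in one language. The sketch below builds such a SPARQL query, reads the answer in the standard SPARQL JSON results format and uses it as a gazetteer. The class, language, function names and sample results are assumptions. No endpoint is contacted; the made-up sample stands in for a real response.

```python
import json
from typing import Set


def gazetteer_query(ontology_class: str, lang: str) -> str:
    """Build a simple SPARQL query: all labels of one ontology class in one language."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?label WHERE {\n"
        f"  ?entity a dbo:{ontology_class} ;\n"
        "          rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def gazetteer_from_results(results_json: str) -> Set[str]:
    """Collect the ?label values from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return {row["label"]["value"] for row in data["results"]["bindings"]}


def find_entries(text: str, gazetteer: Set[str]) -> Set[str]:
    """Naive lookup: gazetteer entries that occur verbatim in the text."""
    return {name for name in gazetteer if name in text}


# A tiny, made-up results document in the standard SPARQL JSON format
sample = json.dumps({
    "head": {"vars": ["label"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "xml:lang": "it", "value": "Roma"}},
        {"label": {"type": "literal", "xml:lang": "it", "value": "Trento"}},
    ]},
})

query = gazetteer_query("Place", "it")
places = gazetteer_from_results(sample)
print(query)
print(sorted(find_entries("Parto da Trento e arrivo a Roma.", places)))
```

Plain substring matching also fires inside longer names (for example "Roma" inside "Romagna"). A real pipeline would match on token boundaries.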

SLIDE 17

THE OPEN DATA LANDSCAPE

USERS

SLIDE 18

OPENCOESIONE.GOV.IT

Open government
SLIDE 19

FLORENCE NATIONAL LIBRARY

digital libraries

SLIDE 20

INFOGRAPHICS

data-driven journalism

SLIDE 21

STUDENTS LEARN HOW TO TRANSLATE A CULTURE

THE FIRST ITALIAN DBPEDIA MAPPING SPRINT

SLIDE 22

WHY? High-quality, multilingual data.

WHAT? Mapping Italian data to the DBpedia ontology.

HOW? A hackathon in a high school.

SLIDE 23

THE SPANISH APARTMENT

AND NOW …

Marco Fossati, fossati@fbk.eu

SLIDE 24

THE SPANISH JOB

DBPEDIA I18N

Mariano.Rico@upm.es

SLIDE 25

THE SPANISH JOB: MEXICAN, ARGENTINIAN, COLOMBIAN, … (UP TO 22)

DBPEDIA I18N

Mariano.Rico@upm.es

SLIDE 26

WIKIPEDIA LANGUAGES

Ranking by number of articles (as of 29 Jan. 2014):

  1. English (4.4M)
  2. German (1.7M)
  3. French (1.5M)
  4. Italian (1.1M), Russian (1.1M), Spanish (1.1M), Polish (1.1M)
  5. Japanese (0.9M)
  6. Portuguese (0.8M)
  7. Chinese (0.8M)

SLIDE 27

MAPPING RACE 2011

esDBpedia hackathon (Nov. 2011): 15 people, 4 hours

101 classes mapped

80% of instances mapped
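Suppose "instances mapped" means the share of pages whose infobox template has a mapping. The sketch below computes that ratio and counts how many templates must be mapped, most-used first, to reach a target share. This shows why a few hours of mapping can already cover most instances. The template names and counts are hypothetical, not the hackathon's real data.

```python
from typing import Dict, Set


def instance_coverage(template_usage: Dict[str, int], mapped: Set[str]) -> float:
    """Share of infobox instances whose template already has a mapping."""
    total = sum(template_usage.values())
    if total == 0:
        return 0.0
    covered = sum(n for template, n in template_usage.items() if template in mapped)
    return covered / total


def templates_needed(template_usage: Dict[str, int], target: float) -> int:
    """How many templates must be mapped, most-used first, to reach `target` coverage."""
    total = sum(template_usage.values())
    if total == 0:
        return 0
    covered = 0
    for k, n in enumerate(sorted(template_usage.values(), reverse=True), start=1):
        covered += n
        if covered / total >= target:
            return k
    return len(template_usage)  # target not reachable (e.g. target > 1)


# Hypothetical usage counts (pages per infobox template), not real esDBpedia figures
usage = {"person": 600, "place": 250, "film": 100, "other": 50}
print(instance_coverage(usage, {"person", "place"}))
print(templates_needed(usage, 0.8))
```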

SLIDE 28

ENGLISH HUMANS

ESDBPEDIA: THE WEBSITE

SLIDE 29

SPANISH HUMANS

ESDBPEDIA: THE WEBSITE

SLIDE 30

LOCATIONS

ESDBPEDIA: THE WEBSITE

SLIDE 31

LOCATIONS

ESDBPEDIA: THE WEBSITE

  • Users with an English browser: 16% (2,091 of 12,686)
  • Users with a Spanish browser: 78% (10,048 of 12,686)
  • Users with a browser in neither Spanish nor English: 5%

SLIDE 32

SPARQL QUERIES

ESDBPEDIA: THE SPARQL ENDPOINT

Up to 350,000 SPARQL queries per day

22M SPARQL queries from 2,200 IPs

SLIDE 33

SPARQL QUERIES

ESDBPEDIA: THE SPARQL ENDPOINT

22M SPARQL queries from 2,200 IPs

  • IPs with more than 10³ requests: 60
  • IPs with between 10 and 10³ requests: 440
  • IPs with fewer than 10 requests: 1,700
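The three groups above can be recomputed from endpoint logs by counting requests per client IP. The sketch assumes access logs in the Common Log Format, where the client address is the first field. It uses the slide's thresholds and places exactly 10 and exactly 10³ requests in the middle bucket, which is also an assumption. The function names and log lines are synthetic.

```python
from collections import Counter
from typing import Dict, Iterable


def client_ip(log_line: str) -> str:
    """In the Common Log Format the client address is the first field."""
    return log_line.split(maxsplit=1)[0]


def activity_buckets(log_lines: Iterable[str]) -> Dict[str, int]:
    """Count requests per IP, then count how many IPs fall in each bucket."""
    per_ip = Counter(client_ip(line) for line in log_lines if line.strip())
    buckets = {"> 1000": 0, "10-1000": 0, "< 10": 0}
    for requests in per_ip.values():
        if requests > 1000:
            buckets["> 1000"] += 1
        elif requests >= 10:
            buckets["10-1000"] += 1
        else:
            buckets["< 10"] += 1
    return buckets


# Synthetic log: one very busy client, one moderate, one occasional
template = '{} - - [01/Mar/2012:10:00:00 +0100] "GET /sparql?query=... HTTP/1.1" 200 512'
log = ([template.format("192.0.2.1")] * 1500
       + [template.format("192.0.2.2")] * 50
       + [template.format("192.0.2.3")] * 3)
print(activity_buckets(log))
```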

SLIDE 34

NOISE GENERATORS

ESDBPEDIA: THE SPARQL ENDPOINT

22M SPARQL queries from 2,200 IPs (9 months of queries, 2012)

SLIDE 35

LESSONS LEARNT

Lesson 1

Take care of IP monsters
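The slides do not say how esDBpedia handles IP monsters. One common mitigation is a per-IP token bucket; this is an assumption, not the authors' setup. Each client may send a short burst of queries and is then limited to a steady rate. The class name and parameters are hypothetical, and a fake clock keeps the demo deterministic.

```python
import time
from typing import Callable, Dict


class PerIpRateLimiter:
    """Token bucket per client IP: a burst of up to `capacity` requests,
    refilled at `rate` requests per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens: Dict[str, float] = {}
        self.last_seen: Dict[str, float] = {}

    def allow(self, ip: str) -> bool:
        """Return True if this IP may send a query now, consuming one token."""
        now = self.clock()
        tokens = self.tokens.get(ip, self.capacity)  # new IPs start with a full bucket
        elapsed = now - self.last_seen.get(ip, now)
        tokens = min(self.capacity, tokens + elapsed * self.rate)  # refill
        self.last_seen[ip] = now
        if tokens >= 1:
            self.tokens[ip] = tokens - 1
            return True
        self.tokens[ip] = tokens
        return False


# Deterministic demo with a fake clock instead of real time
fake_now = [0.0]
limiter = PerIpRateLimiter(rate=1.0, capacity=3, clock=lambda: fake_now[0])
burst = [limiter.allow("192.0.2.7") for _ in range(5)]
fake_now[0] = 2.0
after_wait = limiter.allow("192.0.2.7")
other_client = limiter.allow("198.51.100.4")
print(burst, after_wait, other_client)
```

The per-IP dictionaries grow with the number of clients. A production version would evict idle entries.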

SLIDE 36

LESSONS LEARNT

Lesson 2

Take care of NOISE GENERATORS

SLIDE 37

http://dbpedia.org

Mariano.Rico@upm.es

Thanks for your attention!

fossati@fbk.eu, bruemmer@informatik.uni-leipzig.de