SLIDE 1

YAGO - A Core of Semantic Knowledge 1 Fabian M. Suchanek

YAGO: Yet Another Great Ontology

Fabian M. Suchanek

(joint work with Gjergji Kasneci, Mauro Sozio and Gerhard Weikum)

(Max-Planck-Institute for Informatics, Saarbrücken/Germany)

SLIDE 2

Overview

  • Motivation: Why would anybody need Ontologies?
  • Building a Core Ontology: YAGO
  • Extending the Core Ontology: SOFIE

SLIDE 3

The Search for Excellent Scientists

Max-Planck Institute DFKI

SLIDE 4

scientist musician prize

The Search for Excellent Scientists

Invisible Gorilla steals the Nobel Prize

...The gorilla, plus dropped food and country music, were honored... newscientist.org/article/invisibleGorilla.htm Cached Similar pages

SLIDE 5

scientist who are musicians and won a prize

Invisible Gorilla steals the Nobel Prize

...The gorilla, plus dropped food and country music, were honored... newscientist.org/article/invisibleGorilla.htm Cached Similar pages

The Search for Excellent Scientists

SLIDE 6

Please give me IMMEDIATELY the scientists who are...

Invisible Gorilla steals the Nobel Prize

...The gorilla, plus dropped food and country music, were honored... newscientist.org/article/invisibleGorilla.htm Cached Similar pages

The Search for Excellent Scientists

SLIDE 7

Solution: An Ontology

[Figure: ontology graph. Nodes: person, scientist, musician. Edge labels: is a, gotPrize.]

SLIDE 8

Solution: An Ontology

[Figure: ontology graph with four kinds of elements. Words: "Sam Smart", "Dr. Smart". Individuals. Classes: person, entity. Relations: means, is a, subclass, born (1980).]
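The graph on this slide can be pictured as a set of subject, relation, object triples. The following is only a minimal sketch in Python: the identifier `Sam_Smart`, the relation spellings and the helper functions `objects` and `classes_of` are assumptions made for illustration, not YAGO's actual storage format.

```python
# Illustrative only: a tiny ontology as (subject, relation, object) triples.
# Sam_Smart and the relation spellings are assumptions for this sketch.
FACTS = {
    ('"Sam Smart"', "means", "Sam_Smart"),   # words refer to individuals
    ('"Dr. Smart"', "means", "Sam_Smart"),
    ("Sam_Smart", "is_a", "person"),         # individual -> class
    ("person", "subclass", "entity"),        # class -> superclass
    ("Sam_Smart", "born", "1980"),           # relation to a literal
}

def objects(subject, relation):
    """Return all objects o such that (subject, relation, o) is a fact."""
    return {o for (s, r, o) in FACTS if s == subject and r == relation}

def classes_of(individual):
    """Return the direct classes of an individual plus all their superclasses."""
    found = set()
    todo = list(objects(individual, "is_a"))
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(objects(cls, "subclass"))
    return found

print(classes_of("Sam_Smart"))
```

Following the subclass edges is what lets a query for "person" or "entity" also find Sam Smart, even though only one "is a" fact is stored for him.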

SLIDE 9

Where do we get the ontology from?

Previous Approaches:

  • Assemble the ontology manually (WordNet, SUMO, Cyc, GeneOntology)
    Problem: Usually low coverage (MPI is in none of these)
  • Extract the ontology from corpora (e.g. the Web) (Text2Onto, KnowItAll, Espresso, Snowball, LEILA, TextRunner)
    Problems:
      1. Usually low accuracy (50%-92%)
      2. Non-canonicity
  • Use community work (Semantic Wikipedia, Freebase)
    Problem: We don't know yet whether it takes off

recoverWithout(most_people, medication)
areUnder(0%, the_age_of_18)
support(these_findings, the_notion)

SLIDE 10

Overview

  • Motivation: Why would anybody need Ontologies?
  • Building a Core Ontology: YAGO
  • Extending the Core Ontology: SOFIE

SLIDE 11

YAGO Construction: Infoboxes

[Mock Wikipedia article: filler body text ("don't read this! Better listen to the talk!")]

Infobox:
Name: Sam Smart
Born in: Berlin
...

  • Exploit infoboxes: Smart, S bornIn Berlin
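The infobox step can be sketched as reading "attribute: value" lines and translating known attributes into relations. This is a minimal illustration only: `ATTRIBUTE_TO_RELATION` and `infobox_facts` are hypothetical names, and the plain-text infobox format is a simplifying assumption rather than real Wikipedia markup.

```python
# Illustrative only: turn infobox "attribute: value" lines into triples.
# ATTRIBUTE_TO_RELATION is a hypothetical hand-written mapping.
ATTRIBUTE_TO_RELATION = {"born in": "bornIn"}

def infobox_facts(infobox_text):
    """Return (entity, relation, value) triples from a simple infobox."""
    pairs = {}
    for line in infobox_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            pairs[key.strip().lower()] = value.strip()
    entity = pairs.get("name")
    if entity is None:
        return []  # without a name there is no subject for the facts
    return [
        (entity, ATTRIBUTE_TO_RELATION[key], value)
        for key, value in pairs.items()
        if key in ATTRIBUTE_TO_RELATION
    ]

facts = infobox_facts("Name: Sam Smart\nBorn in: Berlin")
print(facts)  # [('Sam Smart', 'bornIn', 'Berlin')]
```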

SLIDE 12

YAGO Construction: Categories

[Mock Wikipedia article: filler body text]

Categories: 1980_births

  • Exploit infoboxes: Smart, S bornIn Berlin
  • Exploit relational categories: Smart, S born 1980

SLIDE 13

YAGO Construction: Categories

[Mock Wikipedia article: filler body text]

Categories: German_scientists

  • Exploit infoboxes: Smart, S bornIn Berlin
  • Exploit relational categories: Smart, S born 1980
  • Exploit conceptual categories: Smart, S is a GermanScientist

SLIDE 14

YAGO Construction: Categories

[Mock Wikipedia article: filler body text]

Categories: Physics

  • Exploit infoboxes: Smart, S bornIn Berlin
  • Exploit relational categories: Smart, S born 1980
  • Exploit conceptual categories: Smart, S is a GermanScientist
  • Avoid thematic categories (not: Smart, S is a Physics)
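The three kinds of categories on these slides can be told apart with a few simple tests. The sketch below is an assumption-laden illustration: the regular expression, the hand-made `THEMATIC` set and the naive plural check are stand-ins chosen for this example, not the rules YAGO actually applies.

```python
import re

# Illustrative only. The regex, the tiny thematic list and the plural test
# are assumptions for this sketch, not YAGO's actual rules.
RELATIONAL = [(re.compile(r"^(\d{4})_births$"), "born")]
THEMATIC = {"Physics"}

def category_fact(entity, category):
    """Map one Wikipedia category of an entity to a triple, or None."""
    for pattern, relation in RELATIONAL:
        match = pattern.match(category)
        if match:
            return (entity, relation, match.group(1))  # relational category
    if category in THEMATIC:
        return None  # thematic category: yields no fact
    if category.endswith("s"):
        # Conceptual category with a plural head noun, e.g. German_scientists.
        words = category[:-1].split("_")
        class_name = "".join(w.capitalize() for w in words)
        return (entity, "is_a", class_name)
    return None

for cat in ["1980_births", "German_scientists", "Physics"]:
    print(cat, "->", category_fact("Sam_Smart", cat))
```

Note that "Physics" also ends in "s", so the plural test alone would wrongly produce a class; that is why the thematic check comes first in this sketch.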

SLIDE 15

YAGO Construction: Upper Model

[Figure: edges born → 1980 and is a → German Scientist; the link from German Scientist up to person and entity is marked "?"]

SLIDE 16

YAGO Construction: Upper Model

[Figure: edges born → 1980 and is a → German Scientist; candidate classes above German Scientist: People_by_occupation, Business, Social_group, marked "?"]

SLIDE 17

YAGO Construction: Upper Model

[Figure: edges born → 1980 and is a → German Scientist (from Wikipedia); German Scientist subclass Scientist subclass Person (from WordNet); the words "scientist" and "S. Smart" are attached by means edges.]
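The idea of hanging Wikipedia-derived classes below WordNet classes can be sketched as follows. This is a toy illustration only: `WORDNET_SUBCLASS` is a hand-made stand-in for the WordNet hierarchy, and matching on the last CamelCase word is an assumption made for this example.

```python
import re

# Illustrative only: a hand-made stand-in for the WordNet class hierarchy.
WORDNET_SUBCLASS = {"Scientist": "Person", "Person": "Entity"}

def link_to_wordnet(wiki_class):
    """Attach a CamelCase Wikipedia class below the WordNet class named by its last word."""
    words = re.findall(r"[A-Z][a-z]*", wiki_class)
    if not words:
        return None
    head = words[-1]
    if head in WORDNET_SUBCLASS:
        return (wiki_class, "subclass", head)
    return None

def upper_chain(wiki_class):
    """Return the chain of classes from wiki_class up to the root."""
    chain = [wiki_class]
    link = link_to_wordnet(wiki_class)
    current = link[2] if link else None
    while current is not None:
        chain.append(current)
        current = WORDNET_SUBCLASS.get(current)
    return chain

print(upper_chain("GermanScientist"))
# ['GermanScientist', 'Scientist', 'Person', 'Entity']
```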

SLIDE 18

YAGO: Relations

is a, familyName, givenName, bornOnDate, diedOnDate, bornIn, diedIn, locatedIn, establishedOnDate, isMarriedTo, hasPopulation, hasHeight, hasWeight, hasInflation, actedIn, ...

90 relations

Manual evaluation: 95% correct

SLIDE 19

YAGO: Size*

Ontology           Size
KnowItAll        30,000
SUMO             60,000
WordNet         200,000
OpenCyc         300,000
Cyc           3,000,000
Yago         19,000,000

* Publicly available ontologies with a quality guarantee. Size is not correlated with usefulness.

SLIDE 20

YAGO Model: Why binary is not enough

[Figure: the "is a" edge to scientist carries two further edges of its own: since → 1998 and source → Wikipedia]

#1 (Sam, is_a, scientist)
#2 (#1, since, 1998)
#3 (#1, source, Wikipedia)
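The point of this slide is that a fact can itself be the subject of another fact once it has an identifier. A minimal sketch in Python, assuming facts are kept in a dictionary keyed by identifier (the `annotations` helper is a hypothetical name for this example):

```python
# Illustrative only: facts keyed by identifier, so that a fact can be an
# argument of another fact. The helper name is an assumption.
facts = {
    "#1": ("Sam", "is_a", "scientist"),
    "#2": ("#1", "since", "1998"),
    "#3": ("#1", "source", "Wikipedia"),
}

def annotations(fact_id):
    """Return {relation: value} for all facts whose subject is fact_id."""
    return {rel: obj for (subj, rel, obj) in facts.values() if subj == fact_id}

print(facts["#1"], annotations("#1"))
```

With plain binary facts, "since 1998" would have to attach to Sam or to scientist, and neither is what is meant; attaching it to #1 says the statement itself holds since 1998.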

SLIDE 21

YAGO Model: Formal view

A YAGO ontology over
  • a set of relations $\mathcal{R}$
  • a set of common entities $\mathcal{C}$
  • a set of fact identifiers $\mathcal{I}$
is a function $y : \mathcal{I} \to (\mathcal{I} \cup \mathcal{C} \cup \mathcal{R}) \times \mathcal{R} \times (\mathcal{I} \cup \mathcal{C} \cup \mathcal{R})$

We can talk about
  • facts: (#1, source, Wikipedia)
  • additional arguments: (#1, since, 1998)
  • relations: (time, hasRange, time_interval)

#1 (Sam, is_a, scientist)
#2 (#1, since, 1998)
#3 (#1, source, Wikipedia)

Still: Decidable Consistency

SLIDE 22

A Hitchhiker's Guide to Ontology

[Figure: YAGO in the middle, connected to other ontology projects]

Projects:
  • DBpedia (HU Berlin)
  • Cyc (commercial)
  • Freebase (community)
  • UMBEL (commercial)
  • Linking Open Data (HU Berlin, U Leipzig, OLS Inc.)
  • SUMO (research project)
  • Semantic Wikipedia (U Karlsruhe)

Edge labels: YAGO forms taxonomic backbone; YAGO is part of the project by its Web service; YAGO contributes the entities; Planned; YAGO will be included; YAGO and SUMO have been merged

[Elsevier 2008]

SLIDE 23

Extending the Ontology

  • Dr. Smart was born in 1980.

[Figure: Dr. Smart → 1980, edge labeled "was born in"]

Our first approach: LEILA - Combining Linguistic and Statistical Analysis [SIGKDD 2006]
Worked well, but was slow.

SLIDE 24

Extending the Ontology

  • Dr. Smart was born in 1980.

[Figure: Dr. Smart → 1980, edge labeled "was born in", shown next to the relation signature bornInYear(Person, Year)]

SLIDE 25

Extending the Ontology

  • Dr. Smart was born in 1980.

[Figure: Dr. Smart → 1980, edge labeled bornInYear]

  • 1. Mapping patterns to relations

SLIDE 26

Extending the Ontology

  • Dr. Smart was born in 1980.

[Figure: edges bornInYear → 1980 and diedInYear → 1776]

  • 1. Mapping patterns to relations
  • 2. Disambiguating entity names

SLIDE 27

Extending the Ontology

  • Dr. Smart was born in 1980.

[Figure: edge bornInYear → 1980]

  • 1. Mapping patterns to relations
  • 2. Disambiguating entity names
  • 3. Performing logical reasoning

SLIDE 28

SOFIE: A Unifying Framework

  • 1. Mapping patterns to relations
  • 2. Disambiguating entity names
  • 3. Performing logical reasoning

New!

"Elvis was born in 1937." + bornInYear(Elvis, 1937)
=> "X was born in Y" is a good pattern for bornInYear

SLIDE 29

SOFIE: A Unifying Framework

  • 1. Mapping patterns to relations
  • 2. Disambiguating entity names
  • 3. Performing logical reasoning

New!

"Dr. Smart was born in 1980." + "X was born in Y" is a good pattern for bornInYear
=> bornInYear(Dr. Smart, 1980)
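The two directions shown on the last two slides (a known fact makes a pattern good; a good pattern yields a new fact) can be sketched as below. This is a toy illustration, not SOFIE's implementation: plain string replacement and regular expressions stand in for its pattern handling, and the function names are hypothetical.

```python
import re

# Illustrative only: learn a textual pattern from a known fact, then apply it.
known = {("Elvis", "bornInYear", "1937")}

def learn_patterns(sentence, facts):
    """Replace the arguments of each known fact found in the sentence by X and Y."""
    learned = {}
    for x, relation, y in facts:
        if x in sentence and y in sentence:
            pattern = sentence.replace(x, "X").replace(y, "Y")
            learned[pattern] = relation
    return learned

def apply_patterns(sentence, patterns):
    """Return (x, relation, y) candidates for a sentence that matches a pattern."""
    found = []
    for pattern, relation in patterns.items():
        # Escape the literal text, then turn the placeholders into groups.
        regex = re.escape(pattern).replace("X", "(.+)").replace("Y", "(.+)")
        match = re.fullmatch(regex, sentence)
        if match:
            found.append((match.group(1), relation, match.group(2)))
    return found

patterns = learn_patterns("Elvis was born in 1937.", known)
print(patterns)  # {'X was born in Y.': 'bornInYear'}
print(apply_patterns("Dr. Smart was born in 1980.", patterns))
# [('Dr. Smart', 'bornInYear', '1980')]
```

The result still contains the name "Dr. Smart" rather than an entity; resolving it is the disambiguation step.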

SLIDE 30

„...The world as such, I would like to say – even though some will contradict – is not as it seems. As Dr. Smart pointed out in his ground-breaking paper „The world according to Smart“, the world rather seems not what it seems...“

SOFIE: A Unifying Framework

  • 1. Mapping patterns to relations
  • 2. Disambiguating entity names
  • 3. Performing logical reasoning

New!

r(x,y) /\ occurs(p,x,y) => isGoodPattern(p,r)
isGoodPattern(p,r) /\ occurs(p,x',y') => r(x',y')

disambiguate("Dr. Smart", Sam_Smart)[0.8]
disambiguate("Dr. Smart", Lisa_Smart)[0.2]

SLIDE 31

SOFIE: A Unifying Framework

  • 1. Mapping patterns to relations
  • 2. Disambiguating entity names
  • 3. Performing logical reasoning

New!

r(x,y) /\ occurs(p,x,y) => isGoodPattern(p,r)
isGoodPattern(p,r) /\ occurs(p,x',y') => r(x',y')
bornInYear(x,b) /\ diedInYear(x,d) => b<d
disambiguate("Dr. Smart", Sam_Smart)[0.8]

It's all just logical formulae with weights.
Find truth values for the literals so that a maximal number of formulae is happy! [PhD]
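"Find truth values so that as many weighted formulae as possible are satisfied" is a weighted MAX SAT problem. The sketch below solves a tiny instance by trying every assignment. Everything in it is an assumption for illustration: the variable names, the weights 10.0 and 1.0, and the constraint that a name refers to only one entity are invented, and exhaustive search is a stand-in that only works for a handful of literals.

```python
from itertools import product

# Illustrative only: exhaustive weighted MAX SAT over three literals.
# Variables, weights and clauses are made up for this sketch.
VARIABLES = ["isSam", "isLisa", "samBorn1980"]

# Each clause is (weight, function from an assignment dict to bool).
CLAUSES = [
    (0.8, lambda a: a["isSam"]),                            # disambiguate(..., Sam_Smart)[0.8]
    (0.2, lambda a: a["isLisa"]),                           # disambiguate(..., Lisa_Smart)[0.2]
    (10.0, lambda a: not (a["isSam"] and a["isLisa"])),     # assumed: one entity per name
    (1.0, lambda a: (not a["isSam"]) or a["samBorn1980"]),  # isSam => samBorn1980
]

def best_assignment():
    """Return the assignment with the highest total weight of satisfied clauses."""
    best, best_weight = None, -1.0
    for values in product([False, True], repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        weight = sum(w for w, clause in CLAUSES if clause(assignment))
        if weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight

best, weight = best_assignment()
print(best)
```

In this instance the winning assignment reads "Dr. Smart" as Sam_Smart and accepts the birth year for him, because that satisfies the heavier disambiguation clause without violating the hard constraint.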

SLIDE 32

SOFIE: A Unifying Framework

New!

r(x,y) /\ occurs(p,x,y) => isGoodPattern(p,r)
isGoodPattern(p,r) /\ occurs(p,x',y') => r(x',y')
bornInYear(x,b) /\ diedInYear(x,d) => b<d
disambiguate("Dr. Smart", Sam_Smart)[0.8]
bornInYear(Elvis,1937) (YAGO)

It's all just logical formulae with weights.
Find truth values for the literals so that a maximal number of formulae is happy! [PhD]

SLIDE 33

SOFIE: A Unifying Framework

Weighted MAX SAT Problem: /\ r(a,b) => s(x,y)

Algorithm: Functional MAX SAT

FOR i=1 TO 42 ... NEXT i

  • Polynomial time
  • Approximation Guarantee

[Figure: Sam_Smart → 1980]

SLIDE 34

SOFIE: A Unifying Framework

SOFIE's precision, including disambiguation. Precision values on 3700 biography documents downloaded from the Web.

[Chart: precision per relation. Relations: politicianOf, bornIn, bornOnDate, diedOnDate. Values shown: (≈90%), 95%, 87%, 87%, 98%]

SLIDE 35

The Excellent Scientist

Which scientist was also a musician and has won a prize?

[Figure: query graph with edges is a → scientist, is a → musician, and hasWonPrize]

We're not there yet...
...but YAGO can already help us with the original question:

SLIDE 36

The Excellent Scientist

Which scientist was also a musician and has won a prize?

We're not there yet...
...but YAGO can already help us with the original question:

X isa scientist
X isa musician
X hasWonPrize Y

(DEMO)
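The three query lines amount to a conjunctive query over triples: find every X that satisfies all three conditions together. A minimal sketch over a toy triple set follows; the individuals `Person_A`, `Person_B` and `Prize_1` are invented placeholders rather than YAGO data, and the function name is hypothetical.

```python
# Illustrative only: answer the slide's query over a toy triple set.
# The individuals and the prize are invented placeholders, not YAGO data.
TRIPLES = {
    ("Person_A", "isa", "scientist"),
    ("Person_A", "isa", "musician"),
    ("Person_A", "hasWonPrize", "Prize_1"),
    ("Person_B", "isa", "scientist"),
    ("Person_B", "hasWonPrize", "Prize_1"),
}

def musician_scientists_with_prize(triples):
    """Return sorted (X, Y) with X isa scientist, X isa musician, X hasWonPrize Y."""
    scientists = {s for (s, r, o) in triples if r == "isa" and o == "scientist"}
    musicians = {s for (s, r, o) in triples if r == "isa" and o == "musician"}
    return sorted(
        (s, o) for (s, r, o) in triples
        if r == "hasWonPrize" and s in scientists and s in musicians
    )

print(musician_scientists_with_prize(TRIPLES))  # [('Person_A', 'Prize_1')]
```

Unlike the keyword search from the opening slides, the shared variable X forces all three conditions to hold for the same individual.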

SLIDE 37

Conclusion

  • YAGO is a large ontology
  • Ontological knowledge can help in many applications
  • SOFIE uses logical reasoning to extend YAGO
  • They do exist

http://mpii.de/yago

SLIDE 38

References

[SIGKDD 2006] Fabian M. Suchanek, Georgiana Ifrim and Gerhard Weikum: "Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents". International Conference on Knowledge Discovery and Data Mining (SIGKDD 2006)

[WWW 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum: "YAGO - A Core of Semantic Knowledge". International World Wide Web conference (WWW 2007)

[Elsevier 2008] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum: "YAGO - A Large Ontology from Wikipedia and WordNet". Elsevier Journal of Web Semantics 2008

[PhD] Fabian M. Suchanek: "Automated Construction and Growth of a Large Ontology". PhD thesis, see http://mpii.de/~suchanek

The SOFIE part is also published as Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum: "SOFIE – A Self-Organizing Framework for Information Extraction". Technical report, see http://mpii.de/~suchanek. Submitted to the International World Wide Web conference (WWW 2009)