YAGO: Yet Another Great Ontology
Fabian M. Suchanek (joint work with Gjergji Kasneci, Mauro Sozio and Gerhard Weikum)
(Max-Planck-Institute for Informatics, Saarbrücken/Germany)
Fabian M. Suchanek, YAGO - A Core of Semantic Knowledge
Overview
- Motivation: Why would anybody need ontologies?
- Building a Core Ontology: YAGO
- Extending the Core Ontology: SOFIE
The Search for Excellent Scientists
Max-Planck Institute DFKI
The Search for Excellent Scientists
Query: scientist musician prize
Result: Invisible Gorilla steals the Nobel Prize
...The gorilla, plus dropped food and country music, were honored... newscientist.org/article/invisibleGorilla.htm
The Search for Excellent Scientists
Query: scientists who are musicians and won a prize
Result: Invisible Gorilla steals the Nobel Prize
...The gorilla, plus dropped food and country music, were honored... newscientist.org/article/invisibleGorilla.htm
The Search for Excellent Scientists
Query: Please give me IMMEDIATELY the scientists who are...
Result: Invisible Gorilla steals the Nobel Prize
...The gorilla, plus dropped food and country music, were honored... newscientist.org/article/invisibleGorilla.htm
Solution: An Ontology
[Diagram: a graph with the edges "gotPrize" and "is a" over the nodes musician, scientist, and person]
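Solution: An Ontology
[Diagram: the words "Sam Smart" and "Dr. Smart" are each linked by "means" to an individual; the individual is linked by "born" to 1980 and by "is a" to the class person; person is linked by "subclass" to entity. Legend: Words, Individuals, Classes, Relations]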
Where do we get the ontology from?
Previous Approaches:
- Assemble the ontology manually (WordNet, SUMO, Cyc, GeneOntology)
  Problem: Usually low coverage (MPI is in none of these)
- Extract the ontology from corpora, e.g. the Web (Text2Onto, KnowItAll, Espresso, Snowball, LEILA, TextRunner)
  Problems:
  1. Usually low accuracy (50%-92%)
  2. Non-canonicity, e.g. recoverWithout(most_people, medication), areUnder(0%, the_age_of_18), support(these_findings, the_notion)
- Use community work (Semantic Wikipedia, Freebase)
  Problem: We don't know yet whether it takes off
Overview
- Motivation: Why would anybody need ontologies?
- Building a Core Ontology: YAGO
- Extending the Core Ontology: SOFIE
YAGO Construction: Infoboxes
[Article text, shown as filler on the slide: blah blah blub Elvis (don't read this! Better listen to the talk!) ... and so on]
Infobox:
Name: Sam Smart
Born in: Berlin
...
Exploit infoboxes: Smart, S bornIn Berlin
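The infobox step can be sketched roughly as follows. This is a minimal illustration, not YAGO's extraction code: the mapping table and the function names are assumptions, and only the "Born in" to bornIn pairing is taken from the slide.

```python
# Hypothetical mapping from infobox attribute names to relation names.
# Only "Born in" -> bornIn appears on the slide; the table itself is an assumption.
ATTRIBUTE_TO_RELATION = {
    "Born in": "bornIn",
}


def parse_infobox(lines):
    """Turn 'Key: value' lines into a dict, skipping lines without a colon."""
    box = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            box[key.strip()] = value.strip()
    return box


def infobox_to_facts(box):
    """Emit (subject, relation, object) triples for attributes with a known mapping."""
    subject = box.get("Name")
    if subject is None:
        return []
    return [
        (subject, ATTRIBUTE_TO_RELATION[key], value)
        for key, value in box.items()
        if key in ATTRIBUTE_TO_RELATION
    ]


facts = infobox_to_facts(parse_infobox(["Name: Sam Smart", "Born in: Berlin"]))
print(facts)
```

Attributes without an entry in the table are ignored, so the sketch only produces facts for relations that have been mapped by hand.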
YAGO Construction: Categories
Categories: 1980_births
Exploit infoboxes: Smart, S bornIn Berlin
Exploit relational categories: Smart, S born 1980
YAGO Construction: Categories
Categories: German_scientists
Exploit infoboxes: Smart, S bornIn Berlin
Exploit relational categories: Smart, S born 1980
Exploit conceptual categories: Smart, S is a GermanScientist
YAGO Construction: Categories
Categories: Physics
Exploit infoboxes: Smart, S bornIn Berlin
Exploit relational categories: Smart, S born 1980
Exploit conceptual categories: Smart, S is a GermanScientist
Avoid thematic categories: "Smart, S is a Physics" must not be derived
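The three kinds of categories could be told apart along the following lines. This is a rough sketch under stated assumptions: the pattern table, the function name, and the plural-head-word test for conceptual categories are illustrative stand-ins and are not taken from the slides.

```python
import re

# Hypothetical pattern table; only the "<year>_births" example comes from the slides.
RELATIONAL_PATTERNS = [
    (re.compile(r"^(\d{4})_births$"), "born"),
]


def classify_category(category):
    """Return ('relational', relation, value), ('conceptual', name), or ('thematic',)."""
    for pattern, relation in RELATIONAL_PATTERNS:
        match = pattern.match(category)
        if match:
            return ("relational", relation, match.group(1))
    head = category.split("_")[-1]
    # Assumption: a plural head word signals a class of individuals.
    # The suffix test is a crude stand-in for real linguistic analysis.
    if head.endswith("s") and not head.endswith(("ss", "ics")):
        return ("conceptual", category)
    return ("thematic",)


for name in ["1980_births", "German_scientists", "Physics"]:
    print(name, classify_category(name))
```

The suffix test is deliberately naive; it only needs to separate the three example categories from the slides.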
YAGO Construction: Upper Model
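[Diagram: the individual is linked by "born" to 1980 and by "is a" to German Scientist. How German Scientist connects to person and entity is still open (?)]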
YAGO Construction: Upper Model
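[Diagram: the individual is linked by "born" to 1980 and by "is a" to German Scientist. Candidate superclasses shown with a question mark: People_by_occupation, Business, Social_group]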
YAGO Construction: Upper Model
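[Diagram: the individual is linked by "born" to 1980 and by "is a" to German Scientist. German Scientist is a subclass of Scientist, and Scientist is a subclass of Person. The word "scientist" means the class Scientist; the word "S. Smart" means the individual. Sources shown: WordNet and Wikipedia]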
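One way to picture the upper model is to hang each conceptual category below an existing class, chosen by the category's head word. The sketch below is an assumption-laden toy: the dictionary stands in for WordNet and contains only the classes named on the slides, and the singularization rule is deliberately naive.

```python
# Toy stand-in for the WordNet class hierarchy (child -> parent); not real WordNet data.
SUBCLASS_OF = {"scientist": "person", "person": "entity"}
KNOWN_CLASSES = set(SUBCLASS_OF) | set(SUBCLASS_OF.values())


def singular(word):
    """Crude singularization: strip one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


def attach(category):
    """Return a (category, 'subclass', class) triple, or None if the head word is unknown."""
    head = singular(category.split("_")[-1].lower())
    if head in KNOWN_CLASSES:
        return (category, "subclass", head)
    return None


def superclasses(cls):
    """Follow subclass links upward and return all ancestors, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


print(attach("German_scientists"))
print(superclasses("scientist"))
```

A category whose head word is not in the toy hierarchy, such as People_by_occupation, is simply left unattached here.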
YAGO: Relations
is a, familyName, givenName, bornOnDate, diedOnDate, bornIn, diedIn, locatedIn, establishedOnDate, isMarriedTo, hasPopulation, hasHeight, hasWeight, hasInflation, actedIn, ... (90 relations)
Manual evaluation: 95% correct
YAGO: Size*
KnowItAll: 30,000
SUMO: 60,000
WordNet: 200,000
OpenCyc: 300,000
Cyc: 3,000,000
YAGO: 19,000,000
* Publicly available ontologies with a quality guarantee. Size is not correlated with usefulness.
YAGO Model: Why binary is not enough
[Diagram: the fact "Sam is a scientist" is annotated with "since 1998" and "source Wikipedia"]
#1 (Sam, is_a, scientist)
#2 (#1, since, 1998)
#3 (#1, source, Wikipedia)
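The idea of giving every fact an identifier, so that other facts can talk about it, can be sketched as a small store. The function names are illustrative assumptions; the three facts are the ones from the slide.

```python
# Maps a fact identifier such as "#1" to a (subject, relation, object) triple.
facts = {}


def add_fact(subject, relation, obj):
    """Store a triple under a fresh identifier and return that identifier."""
    fact_id = f"#{len(facts) + 1}"
    facts[fact_id] = (subject, relation, obj)
    return fact_id


def about(fact_id):
    """Return the (relation, object) pairs of all facts whose subject is fact_id."""
    return [(r, o) for (s, r, o) in facts.values() if s == fact_id]


f1 = add_fact("Sam", "is_a", "scientist")
add_fact(f1, "since", "1998")       # a fact about fact #1
add_fact(f1, "source", "Wikipedia")  # another fact about fact #1

print(facts)
print(about(f1))
```

Because the subject of a fact may itself be a fact identifier, time and provenance can be attached without leaving the binary triple format.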
YAGO Model: Formal view
A YAGO ontology over
- a set of relations $R$
- a set of common entities $C$
- a set of fact identifiers $I$
is a function
$$y : I \to (I \cup C \cup R) \times R \times (I \cup C \cup R)$$
We can talk about
- facts: (#1, source, Wikipedia)
- additional arguments: (#1, since, 1998)
- relations: (time, hasRange, time_interval)
#1 (Sam, is_a, scientist)
#2 (#1, since, 1998)
#3 (#1, source, Wikipedia)
Still: decidable consistency
A Hitchhiker's Guide to Ontology
[Diagram: YAGO at the center, connected to the projects below]
Projects: DBpedia (HU Berlin), Cyc (commercial), Freebase (community), UMBEL (commercial), Linking Open Data (HU Berlin, U Leipzig, OLS Inc.), SUMO (research project), Semantic Wikipedia (U Karlsruhe)
Connections shown:
- YAGO forms taxonomic backbone
- YAGO is part of the project by its Web service
- YAGO contributes the entities
- Planned
- YAGO will be included
- YAGO and SUMO have been merged
[Elsevier 2008]
Extending the Ontology
Dr. Smart was born in 1980.
[Diagram: Dr. Smart --was born in--> 1980]
Our first approach: LEILA - Combining Linguistic and Statistical Analysis [SIGKDD 2006]
Worked well, but was slow.
Extending the Ontology
Dr. Smart was born in 1980.
[Diagram: Dr. Smart --was born in--> 1980]
bornInYear(Person, Year)
Extending the Ontology
Dr. Smart was born in 1980.
[Diagram: Dr. Smart --bornInYear--> 1980]
1. Mapping patterns to relations
Extending the Ontology
Dr. Smart was born in 1980.
[Diagram labels: bornInYear 1980; diedInYear 1776]
1. Mapping patterns to relations
2. Disambiguating entity names
Extending the Ontology
Dr. Smart was born in 1980.
[Diagram label: bornInYear 1980]
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
SOFIE: A Unifying Framework
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
New!
"Elvis was born in 1937." + [known fact: bornInYear 1937] => "X was born in Y" is a good pattern for bornInYear
SOFIE: A Unifying Framework
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
New!
"Dr. Smart was born in 1980." + "X was born in Y" is a good pattern for bornInYear => [new fact: bornInYear 1980]
SOFIE: A Unifying Framework
"...The world as such, I would like to say, even though some will contradict, is not as it seems. As Dr. Smart pointed out in his ground-breaking paper 'The world according to Smart', the world rather seems not what it seems..."
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
New!
r(x,y) /\ occurs(p,x,y) => isGoodPattern(p,r)
isGoodPattern(p,r) /\ occurs(p,x',y') => r(x',y')
disambiguate("Dr. Smart", Sam_Smart) [0.8]
disambiguate("Dr. Smart", Lisa_Smart) [0.2]
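The two pattern rules can be read as "learn patterns from known facts, then apply them to new text". The sketch below applies them as hard rules on toy data, which is a simplification: in SOFIE they are weighted formulae solved jointly with disambiguation. The function names and the assumption that "Dr. Smart" has already been resolved to Sam_Smart are illustrative.

```python
# Known facts as (relation, x, y); bornInYear(Elvis, 1937) is the example from the slides.
known = {("bornInYear", "Elvis", "1937")}

# Pattern occurrences as (pattern, x, y). Entity names are assumed to be
# already disambiguated, which skips SOFIE's disambiguation step.
occurrences = [
    ("X was born in Y", "Elvis", "1937"),
    ("X was born in Y", "Sam_Smart", "1980"),
]


def good_patterns(known, occurrences):
    """r(x,y) /\\ occurs(p,x,y) => isGoodPattern(p,r)"""
    return {
        (p, r)
        for (p, x, y) in occurrences
        for (r, kx, ky) in known
        if (x, y) == (kx, ky)
    }


def apply_patterns(patterns, occurrences):
    """isGoodPattern(p,r) /\\ occurs(p,x',y') => r(x',y')"""
    return {
        (r, x, y)
        for (p, x, y) in occurrences
        for (gp, r) in patterns
        if p == gp
    }


patterns = good_patterns(known, occurrences)
derived = apply_patterns(patterns, occurrences) - known
print(patterns)
print(derived)
```

Treating the rules as hard means a single coincidental match would promote a bad pattern; the weights in the slides are what guard against that.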
SOFIE: A Unifying Framework
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
New!
r(x,y) /\ occurs(p,x,y) => isGoodPattern(p,r)
isGoodPattern(p,r) /\ occurs(p,x',y') => r(x',y')
bornInYear(x,b) /\ diedInYear(x,d) => b<d
disambiguate("Dr. Smart", Sam_Smart) [0.8]
It's all just logical formulae with weights.
Find truth values for the literals so that a maximal number of formulae is happy! [PhD]
SOFIE: A Unifying Framework
New!
r(x,y) /\ occurs(p,x,y) => isGoodPattern(p,r)
isGoodPattern(p,r) /\ occurs(p,x',y') => r(x',y')
bornInYear(x,b) /\ diedInYear(x,d) => b<d
disambiguate("Dr. Smart", Sam_Smart) [0.8]
bornInYear(Elvis,1937) (from YAGO)
It's all just logical formulae with weights.
Find truth values for the literals so that a maximal number of formulae is happy! [PhD]
SOFIE: A Unifying Framework
/\ r(a,b) => s(x,y)
Sam_Smart 1980
Weighted MAX SAT Problem
Algorithm: Functional MAX SAT
FOR i=1 TO 42 ... NEXT i
Polynomial-time approximation guarantee
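To make the Weighted MAX SAT formulation concrete, here is a brute-force solver over a tiny clause set. It is not the Functional MAX SAT algorithm from the slide: it enumerates all assignments and is exponential, whereas the slide claims a polynomial-time approximation. The variable names and the heavy "at most one entity per name" clause are assumptions; the weights 0.8 and 0.2 are the disambiguation weights from the slides.

```python
from itertools import product

# Each clause is (weight, literals); a literal is (variable, wanted_truth_value).
# A clause is satisfied if at least one of its literals matches the assignment.
clauses = [
    (0.8, [("sam", True)]),    # disambiguate("Dr. Smart", Sam_Smart) [0.8]
    (0.2, [("lisa", True)]),   # disambiguate("Dr. Smart", Lisa_Smart) [0.2]
    (10.0, [("sam", False), ("lisa", False)]),  # assumed: a name denotes at most one entity
]


def best_assignment(clauses):
    """Try every truth assignment and return the one with maximal satisfied weight."""
    variables = sorted({var for _, literals in clauses for var, _ in literals})
    best, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = sum(
            w
            for w, literals in clauses
            if any(assignment[var] == wanted for var, wanted in literals)
        )
        if weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight


assignment, weight = best_assignment(clauses)
print(assignment, weight)
```

With these weights the solver prefers Sam_Smart and rejects Lisa_Smart, because satisfying both disambiguation clauses would violate the heavy uniqueness clause.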
SOFIE: A Unifying Framework
Precision values on 3700 biography documents downloaded from the Web
Relations: politicianOf, bornIn, bornOnDate, diedOnDate
Precision values shown: 95%, 87%, 87%, 98%
SOFIE's precision, including disambiguation (≈90%)
The Excellent Scientist
Which scientist was also a musician and has won a prize?
[Diagram: a graph with the edges "hasWonPrize" and "is a" over the nodes musician and scientist]
We're not there yet...
...but YAGO can already help us with the original question:
The Excellent Scientist
Which scientist was also a musician and has won a prize?
We're not there yet...
...but YAGO can already help us with the original question:
X isa scientist
X isa musician
X hasWonPrize Y
(DEMO)
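The three query lines amount to a conjunctive query over triples. The sketch below runs it on invented toy data: the entities and the prize name are hypothetical placeholders, not facts from YAGO, and the function name is an assumption. It also ignores subclass reasoning, which a real ontology query would need.

```python
# Hypothetical toy triples; none of these are real YAGO facts.
triples = [
    ("Sam_Smart", "isa", "scientist"),
    ("Sam_Smart", "isa", "musician"),
    ("Sam_Smart", "hasWonPrize", "Some_Prize"),
    ("Lisa_Smart", "isa", "scientist"),
    ("Lisa_Smart", "hasWonPrize", "Other_Prize"),
]


def excellent_scientists(triples):
    """Answer: X isa scientist, X isa musician, X hasWonPrize Y."""
    scientists = {s for s, r, o in triples if r == "isa" and o == "scientist"}
    musicians = {s for s, r, o in triples if r == "isa" and o == "musician"}
    both = scientists & musicians
    return sorted((s, o) for s, r, o in triples if r == "hasWonPrize" and s in both)


print(excellent_scientists(triples))
```

Each binding of X must satisfy all three triple patterns at once, which is what keyword search in the opening slides could not express.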
Conclusion
- YAGO is a large ontology
- Ontological knowledge can help in many applications
- SOFIE uses logical reasoning to extend YAGO
- They do exist
http://mpii.de/yago
References
[SIGKDD 2006] Fabian M. Suchanek, Georgiana Ifrim and Gerhard Weikum. "Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents". International Conference on Knowledge Discovery and Data Mining (SIGKDD 2006)
[WWW 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. "YAGO - A Core of Semantic Knowledge". International World Wide Web conference (WWW 2007)
[Elsevier 2008] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. "YAGO - A Large Ontology from Wikipedia and WordNet". Elsevier Journal of Web Semantics 2008
[PhD] Fabian M. Suchanek. "Automated Construction and Growth of a Large Ontology". PhD thesis, see http://mpii.de/~suchanek
The SOFIE part is also published as: Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum. "SOFIE - A Self-Organizing Framework for Information Extraction". Technical report, see http://mpii.de/~suchanek. Submitted to the International World Wide Web conference (WWW 2009)