1
Ontology Learning: Framework, Techniques and a Software Environment
MEANING WS Presentation, San Sebastian
Alexander Maedche Forschungszentrum Informatik an der Universität Karlsruhe Forschungsbereich Wissensmanagement (WIM) http://www.fzi.de/wim
2
3
distributed and heterogeneous, stored in databases, semi-structured documents and free-text documents.
4
5
6
strict separation between schema and instance.
– elementary information container, contains ontology and instance data:
Description Logics paradigm, but executed using deductive database techniques.
7
[Figure: three layers. Linked documents. Linked and typed instances: URI-SHA, URI-STEFAND, URI-DAMLPROJ, relations WORKSIN and COOPERATE, label „DAML – Darpa Agent Markup Language“. Ontology: concepts TOP, PERSON, RESEARCHER, PROJECT (subClassOf); properties WORKSIN, COOPERATE (SYMMETRIC), NAME (domain, range).]
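The separation between the ontology layer and the instance layer named above can be sketched in a few lines of Python. This is only an illustration, not the KAON data model: the concrete subClassOf, domain and range assignments, the example facts and the helper names (`is_a`, `well_typed`, `with_symmetric`) are assumptions, since the slide residue lists the labels but not how they are connected.

```python
# Ontology (schema) layer. The concrete assignments are assumptions.
SUBCLASS_OF = {"PERSON": "TOP", "RESEARCHER": "PERSON", "PROJECT": "TOP"}
PROPERTIES = {
    "WORKSIN": {"domain": "RESEARCHER", "range": "PROJECT", "symmetric": False},
    "COOPERATE": {"domain": "RESEARCHER", "range": "RESEARCHER", "symmetric": True},
}

# Instance layer: typed instances and links between them (assumed facts).
INSTANCE_OF = {
    "URI-SHA": "RESEARCHER",
    "URI-STEFAND": "RESEARCHER",
    "URI-DAMLPROJ": "PROJECT",
}
FACTS = {
    ("URI-SHA", "COOPERATE", "URI-STEFAND"),
    ("URI-SHA", "WORKSIN", "URI-DAMLPROJ"),
    ("URI-STEFAND", "WORKSIN", "URI-DAMLPROJ"),
}


def is_a(concept, ancestor):
    """Follow subClassOf links upwards until `ancestor` is found or the root is passed."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def well_typed(fact):
    """Check a fact against the domain and range declared for its property."""
    subject, prop, obj = fact
    spec = PROPERTIES[prop]
    return (is_a(INSTANCE_OF[subject], spec["domain"])
            and is_a(INSTANCE_OF[obj], spec["range"]))


def with_symmetric(facts):
    """Add the inverse direction for every fact of a symmetric property."""
    derived = set(facts)
    for subject, prop, obj in facts:
        if PROPERTIES[prop]["symmetric"]:
            derived.add((obj, prop, subject))
    return derived


ALL_FACTS = with_symmetric(FACTS)
```

Keeping the schema and the facts in separate structures mirrors the strict schema/instance separation; the symmetric closure is the only inference shown here.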
8
the ontology.
specific model within the ontology (lexical OI-Model) and is considered as meta-information.
multilingual labels, synonyms, etc.
9
10
[Figure: ontology learning architecture. Inputs: Web documents, DTD, XML, legacy databases, existing ontologies (O1, O2, WordNet Ontology). Data Import & Processing: import existing, crawl corpus, import schema, import semi-structured schema. Components: NLP System (NLP components), Processed Data, Lexicon i, Algorithm Library, Result Set, KAON OIModeler GUI / Management Component, Ontology Engineering Comp., Presentation Component. Ontology Engineer. Domain Ontology.]
11
12
13
(*) based on: Katerina Frantzi, Sophia Ananiadou, Hideki Mima: Automatic recognition of multi-word terms: the C-value/NC-value method, Int J Digit Libr (2000) 3: 115-130
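As a rough illustration of the cited method, the C-value of a candidate term a with frequency f(a) is log2|a| · f(a) when a is not nested in a longer candidate, and log2|a| · (f(a) − average frequency of the longer candidates containing a) otherwise. The sketch below implements only this formula on a hand-made frequency table; the function names and the example counts are assumptions, and the linguistic filtering and the NC-value context step are left out.

```python
import math


def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside the longer `long`."""
    n = len(short)
    return len(long) > n and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def c_values(freq):
    """Compute C-values for candidate terms given as word tuples with frequencies."""
    scores = {}
    for term, f in freq.items():
        containers = [other for other in freq if is_nested(term, other)]
        if containers:
            # Discount occurrences that are explained by longer candidate terms.
            f = f - sum(freq[c] for c in containers) / len(containers)
        scores[term] = math.log2(len(term)) * f
    return scores


# Hypothetical candidate terms and counts, for illustration only.
FREQ = {
    ("soft", "contact", "lens"): 3,
    ("contact", "lens"): 5,
}
SCORES = c_values(FREQ)
```

Because log2(1) = 0, single-word candidates score zero under this formula, which is consistent with its use for multi-word terms.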
14
15
16
[Figure: concept taxonomy rooted at TOP with nodes x0 ... x10; labelled concepts: Accommodation, Hotel, Wellness Hotel, Area, Baltic Sea]
F(Wellness Hotel) = x4, F(Baltic Sea) = x9
Concept pair (linguistic transaction): (x4, x9), i.e. (F(Wellness Hotel), F(Baltic Sea))
Generalized association: (F(Accommodation) -> F(Area)) (with label: G(locatedin))
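The step from a concrete concept pair to a generalized association can be sketched as follows: every observed pair also counts for all pairs of its ancestors in the taxonomy, and support and confidence are then computed over these lifted counts. The taxonomy fragment, the extra concept `city`, the transactions and the function names are assumptions for illustration; pruning of rules that are subsumed by an ancestor rule is not shown.

```python
from collections import Counter

# Assumed taxonomy fragment (child -> parent); "TOP" is the root.
PARENT = {
    "wellness_hotel": "hotel",
    "hotel": "accommodation",
    "accommodation": "TOP",
    "baltic_sea": "area",
    "city": "area",
    "area": "TOP",
}


def with_ancestors(concept):
    """Return the concept followed by all of its ancestors below TOP."""
    chain = []
    while concept != "TOP":
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def count_generalized(pairs):
    """Count each transaction once for every generalized pair it supports."""
    pair_counts, left_counts = Counter(), Counter()
    for left, right in pairs:
        lefts, rights = with_ancestors(left), with_ancestors(right)
        left_counts.update(lefts)
        pair_counts.update((x, y) for x in lefts for y in rights)
    return pair_counts, left_counts


def support(pair_counts, n_transactions, rule):
    return pair_counts[rule] / n_transactions


def confidence(pair_counts, left_counts, rule):
    return pair_counts[rule] / left_counts[rule[0]]


# Hypothetical linguistic transactions (concept pairs found in text).
TRANSACTIONS = [
    ("wellness_hotel", "baltic_sea"),
    ("hotel", "baltic_sea"),
    ("hotel", "city"),
]
PAIR_COUNTS, LEFT_COUNTS = count_generalized(TRANSACTIONS)
```

With these assumed transactions, the general rule accommodation -> area is supported by every transaction, while the more specific hotel -> baltic_sea is supported by only two of the three.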
17
[Figure: precision/recall plot comparing the reference ontology O0-gold with OS1, OS2, OS3 and OS4; data points labelled 0-1, 1-3, 1-2, 3-4, 0-3, 0-4, 2-3]
18
19
20
21
22
tightly integrated into the ontology management architecture KAON.
cooperative modeling approach: everything can be done manually, but automatic methods are available as support.
23
multi-word term extraction.
24
different process of classifying extracted terms into the
25
extracting relations, including:
rules
based extraction
reuse
26
taxonomic relations associated verbs, supporting labeling
relations.
27
28
29
30
Bi-Section as a version of k-Means
cosine as similarity measure
bag of terms as document representation
(details on one of the next slides)
Java data set (documents about Java)
(*) work done by Andreas Hotho, University of Karlsruhe
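A minimal sketch of the clustering setup named above, assuming a simple variant of bisecting k-means: the largest cluster is repeatedly split in two by a 2-means step that uses cosine similarity on length-normalized term vectors. The seeding rule, the choice of which cluster to split, the toy vectors and all function names are assumptions and not the configuration used in the reported experiments.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so the dot product equals the cosine."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def centroid(vectors):
    dim = len(vectors[0])
    return normalize([sum(v[i] for v in vectors) for i in range(dim)])


def two_means(vectors, iterations=10):
    """Split vectors into two groups of indices (deterministic seeding)."""
    far = min(range(len(vectors)), key=lambda i: cosine(vectors[0], vectors[i]))
    centers = [vectors[0], vectors[far]]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for i, v in enumerate(vectors):
            nearer = 0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
            groups[nearer].append(i)
        if not groups[0] or not groups[1]:
            break
        centers = [centroid([vectors[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(docs, k):
    """Split the largest cluster in two until k clusters exist."""
    vectors = [normalize(d) for d in docs]
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)
            break
        left, right = two_means([vectors[i] for i in largest])
        if not left or not right:
            clusters.append(largest)
            break
        clusters.append([largest[i] for i in left])
        clusters.append([largest[i] for i in right])
    return clusters


# Toy bag-of-terms vectors (rows: documents, columns: term counts).
DOCS = [
    [3, 2, 0, 0, 0, 0],
    [2, 3, 0, 0, 0, 0],
    [0, 0, 3, 1, 0, 0],
    [0, 0, 1, 3, 0, 0],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 1, 3],
]
CLUSTERS = bisecting_kmeans(DOCS, 3)
```

Normalizing the vectors once up front keeps the similarity computation to a plain dot product; published variants usually try several random bisections per split and keep the best one.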
31
docid term1 term2 term3 ... doc1 1 doc2 2 3 1 doc3 10 doc4 2 23 ...
docid term1 term2 term3 ... concept1 concept2 concept3 .. doc1 1 1 1 doc2 2 3 1 2 1 doc3 10 10 doc4 2 23 2 23 ...
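The extension of the term vector by concept columns can be sketched as below: each document keeps its term counts and gains one additional feature per concept, whose value is the summed count of the terms mapped to that concept. The term-to-concept mapping, the `concept:` key prefix and the function name are assumptions for illustration; how terms are actually mapped to ontology concepts is not specified here.

```python
# Hypothetical mapping from terms to ontology concepts.
TERM_TO_CONCEPT = {
    "hotel": "ACCOMMODATION",
    "hostel": "ACCOMMODATION",
    "beach": "AREA",
}


def add_concepts(term_counts, term_to_concept):
    """Return the term vector extended by concept features."""
    enriched = dict(term_counts)
    for term, count in term_counts.items():
        concept = term_to_concept.get(term)
        if concept is not None:
            key = "concept:" + concept
            enriched[key] = enriched.get(key, 0) + count
    return enriched


DOC = {"hotel": 2, "hostel": 3, "price": 1}
ENRICHED = add_concepts(DOC, TERM_TO_CONCEPT)
```

Two documents that use different terms for the same concept then share a non-zero concept feature, which is what makes them more similar under a vector-space measure.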
32
33
(*) Work done within the EU-IST funded DOT.KOM project.
[Figure: Ontology Learning, Training/Testing, Integration Ontology-IE, Refinement/Evolution, Annotation, Ontology-based IE]
34
35
36
37
38
39
40
Forschungszentrum Informatik an der Universität Karlsruhe Research Group WIM http://www.fzi.de/wim
41
filter | Corpus1 „Human Language Technology“ | Corpus2 „EoI-Knowledge-Technologies“
[(NNS)(NN)]+ | 88% (*) (1230,233,27) (**) | 80% (1079,202,40)
(RB)*(JJ)*[(NN)(NNS)]+ | 86% (1362,362,47) | 76% (1243,361,85)
[(RB)(JJ)(NN)]*(IN)? [(RB)(JJ)(NN)]* [(NN)(NNS)] | 64% (1511,511,181) | 64% (1362,478,171)
(*) Precision
(**) (number of all extracted terms, number of multi-word terms, number of incorrectly extracted multi-word terms)
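A part-of-speech filter such as (RB)*(JJ)*[(NN)(NNS)]+ from the filters listed above can be applied to an already tagged sentence roughly as follows. The sketch maps each tag to a one-letter class so that an ordinary regular expression can run over the tag sequence; the tagged example sentence, the helper names and the reading of precision as (multi-word terms − incorrect ones) / multi-word terms are assumptions, although that reading reproduces the reported percentages when they are truncated.

```python
import re

# One-letter classes for the tags used by the filter; everything else is "x".
TAG_CLASS = {"RB": "R", "JJ": "J", "NN": "N", "NNS": "N"}
PATTERN = re.compile(r"R*J*N+")  # corresponds to (RB)*(JJ)*[(NN)(NNS)]+


def extract_terms(tagged):
    """Return multi-word terms whose tag sequence matches the filter."""
    classes = "".join(TAG_CLASS.get(tag, "x") for _, tag in tagged)
    terms = []
    for match in PATTERN.finditer(classes):
        if match.end() - match.start() >= 2:  # keep multi-word matches only
            words = [word for word, _ in tagged[match.start():match.end()]]
            terms.append(" ".join(words))
    return terms


def precision(multiword, incorrect):
    """Share of extracted multi-word terms that were judged correct."""
    return (multiword - incorrect) / multiword


# Hypothetical pre-tagged sentence (Penn Treebank style tags).
SENTENCE = [
    ("the", "DT"), ("very", "RB"), ("large", "JJ"), ("knowledge", "NN"),
    ("bases", "NNS"), ("support", "VBP"), ("ontology", "NN"), ("learning", "NN"),
]
TERMS = extract_terms(SENTENCE)
```

Since every token becomes exactly one character, match offsets in the class string are also token offsets, so no separate alignment step is needed.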
42
docid term1 term2 term3 ... doc1 1 doc2 2 3 1 doc3 10 doc4 2 23 ...
(term frequency - inverse document frequency)
|D|: number of documents in the collection
df(t): number of documents that contain term t
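The weighting can be sketched with the textbook form tfidf(d, t) = tf(d, t) · log(|D| / df(t)). This particular form is an assumption, because the formula itself did not survive on the slide and some variants dampen the term frequency with a logarithm; the function name and the toy documents are likewise illustrative.

```python
import math


def tfidf(docs):
    """Weight raw term counts by log(|D| / df(t)) for each document."""
    n_docs = len(docs)
    df = {}
    for counts in docs:
        for term in counts:
            df[term] = df.get(term, 0) + 1
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for counts in docs
    ]


# Toy term-count vectors, one dict per document.
COUNTS = [
    {"java": 3, "class": 1},
    {"java": 1, "coffee": 2},
    {"island": 4},
]
WEIGHTS = tfidf(COUNTS)
```

A term that occurs in every document gets weight zero under this form, so only terms that discriminate between documents contribute to the similarity.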