SLIDE 1
Georg Gottlob, Carsten Lutz
Knowledge Representation, Ontologies, and Semantic Web
SLIDE 2 KR + DB
- Ontology = schema? Ontology = constraints? Yes and no!
Knowledge Representation: build an ontology / knowledge base capturing general knowledge of the application domain.
Databases: build systems for managing / querying data from the application domain.
KR + DB: a tool for incomplete and heterogeneous data and for data integration.
SLIDE 3
Culture Shock
- Shocking languages with strange names and syntax: description logics
- Shocking data models: relations of arity at most two
Good news:
- common query languages (CQs, UCQs, RPQs, etc.)
- no objections against higher arity; DLs sometimes even have it
This part of the tutorial: a bit of DL history, an area overview, help with understanding.
SLIDE 4
Description Logic Jungle
[Figure: DL timeline, 1981 to 2009. KL-ONE (1981) and "deprecated stuff"; formalization: ALC family (1991); scalability/data: EL family and DL-Lite family (2005), annotated "no disjunction" and "no recursion"; semantic web / standardization: OWL2 DL, OWL2 EL, OWL2 QL (2009).]
SLIDE 5 ALC Family
ALC (attribute concept language with complement)
Operators available in ALC, with their FO reading:
  A          A(x)
  ¬C         ¬C(x)
  C ⊓ D      C(x) ∧ D(x)
  C ⊔ D      C(x) ∨ D(x)
  ∃r.C       ∃y (r(x, y) ∧ C(y))
  ∀r.C       ∀y (r(x, y) → C(y))
Ontology:
  C ⊑ D      ∀x (C(x) → D(x))
  C ≡ D      ∀x (C(x) ↔ D(x))
For example:
  Director ≡ Person ⊓ ∃directed.(Movie ⊔ TVseries)
  ForeignMovie ≡ ∀producedIn.¬US ⊓ ∃language.¬English
  Movie ⊑ Comedy ⊔ Drama ⊔ HorrorMovie
Modeling strongly concentrates on classes (= unary relations).
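A minimal Python sketch of the semantics above, assuming a hypothetical tuple encoding of concepts (a string for a concept name, ("not", C), ("and", C, D), ("or", C, D), ("some", r, C), ("all", r, C)) and a small hand-made interpretation; the names `Interpretation`, `ext` and `satisfies` are illustrative only. Each branch of `ext` follows the FO reading in the table, and C ⊑ D holds in an interpretation iff the extension of C is contained in that of D.

```python
class Interpretation:
    """Finite interpretation: a domain, concept extensions, role extensions."""

    def __init__(self, domain, concepts, roles):
        self.domain = set(domain)
        self.concepts = concepts  # concept name -> set of elements
        self.roles = roles        # role name -> set of (x, y) pairs

    def succ(self, r, x):
        return {y for (a, y) in self.roles.get(r, set()) if a == x}

    def ext(self, c):
        """Extension of c, computed along the FO reading of each constructor."""
        if isinstance(c, str):                       # A      ~ A(x)
            return set(self.concepts.get(c, set()))
        op = c[0]
        if op == "not":                              # ¬C     ~ ¬C(x)
            return self.domain - self.ext(c[1])
        if op == "and":                              # C ⊓ D  ~ C(x) ∧ D(x)
            return self.ext(c[1]) & self.ext(c[2])
        if op == "or":                               # C ⊔ D  ~ C(x) ∨ D(x)
            return self.ext(c[1]) | self.ext(c[2])
        if op in ("some", "all"):
            filler = self.ext(c[2])
            if op == "some":                         # ∃r.C ~ ∃y (r(x,y) ∧ C(y))
                return {x for x in self.domain if self.succ(c[1], x) & filler}
            return {x for x in self.domain if self.succ(c[1], x) <= filler}  # ∀r.C
        raise ValueError(f"unknown constructor {op!r}")

    def satisfies(self, c, d):
        """I satisfies C ⊑ D, i.e. ∀x (C(x) → D(x))."""
        return self.ext(c) <= self.ext(d)


I = Interpretation(
    domain={"ann", "bob", "m1"},
    concepts={"Person": {"ann", "bob"}, "Movie": {"m1"}, "Director": {"ann"}},
    roles={"directed": {("ann", "m1")}},
)
# Director ≡ Person ⊓ ∃directed.(Movie ⊔ TVseries)
rhs = ("and", "Person", ("some", "directed", ("or", "Movie", "TVseries")))
print(I.ext(rhs))                                                      # {'ann'}
print(I.satisfies("Director", rhs) and I.satisfies(rhs, "Director"))  # True
```

Note that ∀directed.Movie also holds at bob and m1, vacuously, because they have no directed-successors.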
SLIDE 6 ALC Family
Despite the focus on classes, DLs are viewed as fragments of FO, related e.g. to the guarded fragment.
Sloppily: ALC is the fragment of FO that “speaks about trees”.
Precise characterization:
- Theorem. An FO sentence is equivalent to an ALC ontology iff it is invariant under global bisimulation and disjoint union.
The DL kind of syntax is actually not that unusual: the same type is used in temporal logic, the mu-calculus, and PDL.
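To make the theorem more concrete, the sketch below computes the largest bisimulation between two finite interpretations as a greatest fixpoint (start with all pairs that agree on concept names, delete pairs that violate the forth or back condition) and then checks that it is global, i.e. every element on either side has a partner. The (domain, labels, edges) encoding and all function names are assumptions made for illustration.

```python
def successors(edges, r, x):
    return {y for (a, y) in edges.get(r, set()) if a == x}


def largest_bisimulation(I, J):
    """I, J are (domain, labels, edges): labels maps element -> set of concept
    names, edges maps role name -> set of (x, y) pairs."""
    dom_i, lab_i, edg_i = I
    dom_j, lab_j, edg_j = J
    roles = set(edg_i) | set(edg_j)
    # start with all pairs that agree on concept names ...
    Z = {(d, e) for d in dom_i for e in dom_j
         if lab_i.get(d, set()) == lab_j.get(e, set())}
    changed = True
    while changed:                       # ... and delete pairs until a fixpoint
        changed = False
        for d, e in list(Z):
            for r in roles:
                si, sj = successors(edg_i, r, d), successors(edg_j, r, e)
                forth = all(any((d2, e2) in Z for e2 in sj) for d2 in si)
                back = all(any((d2, e2) in Z for d2 in si) for e2 in sj)
                if not (forth and back):
                    Z.discard((d, e))
                    changed = True
                    break
    return Z


def globally_bisimilar(I, J):
    """Global: every element of I and every element of J has a partner."""
    Z = largest_bisimulation(I, J)
    return {d for d, _ in Z} == set(I[0]) and {e for _, e in Z} == set(J[0])


I = ({"a", "b"}, {"a": {"A"}, "b": {"A"}}, {"r": {("a", "b"), ("b", "a")}})
J = ({"c"}, {"c": {"A"}}, {"r": {("c", "c")}})
K = ({"c"}, {"c": {"A"}}, {})
print(globally_bisimilar(I, J))   # True
print(globally_bisimilar(I, K))   # False
```

I (a two-element r-cycle) and J (a single r-loop) are globally bisimilar, yet the FO sentence ∃x r(x, x) holds in J and not in I; by the theorem, no ALC ontology is equivalent to it.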
SLIDE 7
ALC Family
How does this give rise to an ALC family of logics? By adding modifiers, each named by a letter:
- speaking also about the inverse of relations: + I
- counting the number of successors of a node: + Q
- constants: + O
These are among the ~five most frequently used modifiers, plus several minor ones.
Of course, these additions might change the model theory (e.g. models are no longer purely trees).
Most DL people agree: DL names can be ugly! (Heard of ALCHQIOR+?)
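As a small illustration of + I and + Q, the sketch below evaluates a qualified number restriction (≥ n r.C) over a finite interpretation, where the role may also be an inverse r⁻. The encoding (("inv", r) for r⁻, roles as sets of pairs) and the function names are hypothetical.

```python
def successors(edges, role, x):
    """role-successors of x; role is a name "r" or its inverse ("inv", "r")."""
    if isinstance(role, tuple) and role[0] == "inv":
        return {a for (a, b) in edges.get(role[1], set()) if b == x}
    return {b for (a, b) in edges.get(role, set()) if a == x}


def at_least(n, role, filler_ext, domain, edges):
    """Extension of the qualified number restriction (≥ n role.C), given C's extension."""
    return {x for x in domain if len(successors(edges, role, x) & filler_ext) >= n}


domain = {"ann", "bob", "m1", "m2", "m3"}
edges = {"directed": {("ann", "m1"), ("ann", "m2"), ("bob", "m3")}}
movies = {"m1", "m2", "m3"}

# + Q: who directed at least two movies?   (≥ 2 directed.Movie)
print(at_least(2, "directed", movies, domain, edges))                   # {'ann'}
# + I: which elements have a director?     (≥ 1 directed⁻.⊤)
print(sorted(at_least(1, ("inv", "directed"), domain, domain, edges)))  # ['m1', 'm2', 'm3']
```

This also shows how the model theory shifts: ≥ 2 r.⊤ separates an element with two r-successors from one with a single r-successor, although both are bisimilar in the plain ALC sense.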
SLIDE 8
EL and DL-Lite Families
EL family (existential language):
- positive-existential-conjunctive fragment of ALC, e.g.:
  Scientist ⊓ ∃participatesIn.DagstuhlSeminar ⊑ ∃customerOf.TaxiCompany
- EL ontology ≈ monadic Datalog program + ∃ in rule heads, with tree-shaped rule bodies and no EDB/IDB separation
DL-Lite family:
- essentially: inclusion dependencies + projection + fundeps, e.g.:
  Movie ⊑ ∃hasDirector
  ∃hasISSN ⊑ SerialPublication
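The Datalog-like character of EL shows up in the classic completion (saturation) procedure for classifying a normalized EL ontology in polynomial time. The sketch below is one such procedure under a hypothetical tuple encoding of the four normal forms. The slide's Scientist axiom is normalized with fresh names X1, X2, while VisitingScientist and its two axioms are made up for the example.

```python
def el_complete(axioms, names):
    """Saturate a normalized EL ontology with completion rules.

    Normal forms (hypothetical tuple encoding):
      ("sub",  A, B)        A ⊑ B
      ("conj", A1, A2, B)   A1 ⊓ A2 ⊑ B
      ("ex_r", A, r, B)     A ⊑ ∃r.B
      ("ex_l", r, A, B)     ∃r.A ⊑ B
    Returns S (S[A] = names B with A ⊑ B entailed) and
    R (triples (A, r, B) with A ⊑ ∃r.B entailed).
    """
    S = {A: {A} for A in names}
    R = set()
    changed = True
    while changed:
        changed = False
        for A in names:
            for ax in axioms:
                kind = ax[0]
                if kind == "sub" and ax[1] in S[A] and ax[2] not in S[A]:
                    S[A].add(ax[2])
                    changed = True
                elif kind == "conj" and ax[1] in S[A] and ax[2] in S[A] and ax[3] not in S[A]:
                    S[A].add(ax[3])
                    changed = True
                elif kind == "ex_r" and ax[1] in S[A] and (A, ax[2], ax[3]) not in R:
                    R.add((A, ax[2], ax[3]))
                    changed = True
        for (A, r, B) in list(R):          # propagate ∃r.B' ⊑ C back along R
            for ax in axioms:
                if ax[0] == "ex_l" and ax[1] == r and ax[2] in S[B] and ax[3] not in S[A]:
                    S[A].add(ax[3])
                    changed = True
    return S, R


T = [
    ("sub", "VisitingScientist", "Scientist"),                           # hypothetical
    ("ex_r", "VisitingScientist", "participatesIn", "DagstuhlSeminar"),  # hypothetical
    # Scientist ⊓ ∃participatesIn.DagstuhlSeminar ⊑ ∃customerOf.TaxiCompany, normalized:
    ("ex_l", "participatesIn", "DagstuhlSeminar", "X1"),
    ("conj", "Scientist", "X1", "X2"),
    ("ex_r", "X2", "customerOf", "TaxiCompany"),
]
names = ["VisitingScientist", "Scientist", "DagstuhlSeminar", "TaxiCompany", "X1", "X2"]
S, R = el_complete(T, names)
print(("VisitingScientist", "customerOf", "TaxiCompany") in R)   # True
```

Every concept name that occurs in the axioms must be listed in `names`; ⊤ and ⊥ are left out to keep the sketch short.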
SLIDE 9
Ontologies vs Schemas
Ontologies are not quite like a schema in several ways:
- often supposed to be general purpose and universally useful
- result of an expensive modeling effort, often rather large
Two examples:
- schema.org: ontology for the web by Bing, Google, Yahoo!, Yandex; ~700 classes, ~1000 binary relations, ~30 contributors
- SNOMED CT: international standard for electronic health records; ~400,000 classes, ~36 binary relations, ~40 engineers
There are various ontology repositories containing hundreds of ontologies.
SLIDE 10
Basic DL Research
Reasoning helps to construct / maintain / verify the ontology:
- satisfiability: check consistency of concepts
- implication/subsumption: make ontology consequences explicit
No data, just an ontology.
Studied all the way from theory to systems, for logics ranging from PTime via ExpTime and beyond.
We are quite good at solving ExpTime-complete problems in practice, e.g. satisfiability in (extensions of) ALC (choice of logics helps!).
There is now serious tool support: editors (such as Protégé), reasoners (Konclude, ELK, many more).
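Satisfiability is classically decided with tableaux. The sketch below is a textbook-style tableau for ALC concept satisfiability with respect to the empty ontology (a PSpace-complete problem); it omits the blocking needed for general ontologies, where the problem becomes ExpTime-complete. Concepts use a hypothetical tuple encoding, and subsumption is reduced to unsatisfiability: C ⊑ D iff C ⊓ ¬D is unsatisfiable.

```python
def nnf(c):
    """Negation normal form: negation only in front of concept names."""
    if isinstance(c, str):
        return c
    op = c[0]
    if op == "not":
        d = c[1]
        if isinstance(d, str):
            return ("not", d)
        if d[0] == "not":
            return nnf(d[1])
        if d[0] == "and":
            return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
        if d[0] == "or":
            return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
        if d[0] == "some":
            return ("all", d[1], nnf(("not", d[2])))
        return ("some", d[1], nnf(("not", d[2])))        # ¬∀r.C = ∃r.¬C
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    return (op, c[1], nnf(c[2]))                         # some / all


def satisfiable(label):
    """Tableau for a set of NNF concepts at one node (no ontology axioms)."""
    label = set(label)
    todo = [c for c in label if isinstance(c, tuple) and c[0] == "and"]
    while todo:                                          # ⊓-rule
        c = todo.pop()
        for part in c[1:]:
            if part not in label:
                label.add(part)
                if isinstance(part, tuple) and part[0] == "and":
                    todo.append(part)
    for c in label:                                      # clash: A and ¬A
        if isinstance(c, tuple) and c[0] == "not" and c[1] in label:
            return False
    for c in label:                                      # ⊔-rule: branch, backtrack
        if isinstance(c, tuple) and c[0] == "or" and c[1] not in label and c[2] not in label:
            return satisfiable(label | {c[1]}) or satisfiable(label | {c[2]})
    for c in label:                                      # ∃-rule with ∀-propagation
        if isinstance(c, tuple) and c[0] == "some":
            succ = {c[2]} | {d[2] for d in label
                             if isinstance(d, tuple) and d[0] == "all" and d[1] == c[1]}
            if not satisfiable(succ):
                return False
    return True


def is_subsumed(c, d):
    """C ⊑ D holds iff C ⊓ ¬D is unsatisfiable."""
    return not satisfiable({nnf(("and", c, ("not", d)))})


print(is_subsumed(("some", "r", ("and", "A", "B")), ("some", "r", "A")))   # True
print(is_subsumed(("some", "r", "A"), ("some", "r", ("and", "A", "B"))))   # False
```

Each ∃r.C is expanded into its own successor, which is checked independently; this is why, without an ontology, the search never needs more than one branch of the tree in memory at a time.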
SLIDE 11
Ontology Reasoning
Some other “data-free” lines of research:
- learning/mining ontologies
- concept matching and unification
- summary / uniform interpolation
- non-monotonic ontologies
- concrete domains (= data values)
- probabilistic ontologies
- temporal ontologies
- ontology “diff” and debugging
- conservative extensions, modularity
- explanation systems & optimization
SLIDE 12 Sample: Conservative Extensions + Modularity
Let Σ be a signature (the subdomain of interest). O2 ⊇ O1 is a Σ-conservative extension (Σ-c.e.) of O1 if for every model I1 of O1, there is a model I2 of O2 with I1|Σ = I2|Σ.
Good for managing ontologies, e.g. modularity:
- M ⊆ O is a self-contained Σ-module if O is a Σ-c.e. of M.
- M ⊆ O is a depleting Σ-module if O \ M is a Σ-c.e. of ∅.
Bad news:
- Theorem. Conservative extensions are undecidable in ALC (and below).
Good news: there are very good replacements!
SLIDE 13
Sample: Conservative Extensions + Modularity
Overapproximation: deductive conservative extensions
- O2 ⊇ O1 is a Σ-conservative extension of O1 if O2 ⊨ C ⊑ D implies O1 ⊨ C ⊑ D whenever C, D use only symbols from Σ.
- Equivalent: for every model I1 of O1, there is a model I2 of O2 with I1|Σ = I2|Σ up to bisimulation.
- This recovers decidability, e.g. via automata: 2ExpTime-complete for ALC [GhilardiL__Wolter].
Underapproximation: ∅-conservative extensions
- O2 ⊇ O1 is a Σ-conservative extension of O1 if from every model I1 of O1, we get a model of O2 by making the non-Σ symbols empty [CuencaGrauHorrocksKazakovSattler].
- Can be reduced to satisfiability, thus ExpTime-complete for ALC.
- Can be syntactically (under)approximated, giving rise to polytime module extraction algorithms that work very well in practice.
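The syntactic approximation can be sketched with a simplified ⊥-locality check in the spirit of the locality conditions cited above: an axiom C ⊑ D is treated as local w.r.t. a signature S if C becomes syntactically ⊥, or D syntactically ⊤, once every symbol outside S is made empty. Non-local axioms are added to the module and their symbols to S until nothing changes. The encoding and function names are hypothetical, and the example axioms are simplified ⊑-versions of the ones from slide 5.

```python
def sig(c):
    """Concept and role names occurring in concept c (TOP/BOT excluded)."""
    if isinstance(c, str):
        return set() if c in ("TOP", "BOT") else {c}
    if c[0] == "not":
        return sig(c[1])
    if c[0] in ("and", "or"):
        return sig(c[1]) | sig(c[2])
    return {c[1]} | sig(c[2])  # ("some" | "all", role, filler)


def is_bot(c, S):
    """c is syntactically ⊥ once all symbols outside S are interpreted as empty."""
    if isinstance(c, str):
        return c == "BOT" or (c != "TOP" and c not in S)
    op = c[0]
    if op == "not":
        return is_top(c[1], S)
    if op == "and":
        return is_bot(c[1], S) or is_bot(c[2], S)
    if op == "or":
        return is_bot(c[1], S) and is_bot(c[2], S)
    if op == "some":
        return c[1] not in S or is_bot(c[2], S)
    return False  # ∀r.C is never syntactically ⊥


def is_top(c, S):
    """c is syntactically ⊤ once all symbols outside S are interpreted as empty."""
    if isinstance(c, str):
        return c == "TOP"
    op = c[0]
    if op == "not":
        return is_bot(c[1], S)
    if op == "and":
        return is_top(c[1], S) and is_top(c[2], S)
    if op == "or":
        return is_top(c[1], S) or is_top(c[2], S)
    if op == "all":
        return c[1] not in S or is_top(c[2], S)
    return False  # ∃r.C is never syntactically ⊤


def bot_module(ontology, sigma):
    """Collect every axiom (C, D), read C ⊑ D, that is not ⊥-local w.r.t. the growing signature."""
    module, S = [], set(sigma)
    changed = True
    while changed:
        changed = False
        for ax in ontology:
            c, d = ax
            if ax not in module and not (is_bot(c, S) or is_top(d, S)):
                module.append(ax)
                S |= sig(c) | sig(d)
                changed = True
    return module


O = [
    ("Director", ("and", "Person", ("some", "directed", ("or", "Movie", "TVseries")))),
    ("Movie", ("or", "Comedy", ("or", "Drama", "HorrorMovie"))),
    ("ForeignMovie", ("some", "language", ("not", "English"))),
]
print(len(bot_module(O, {"Director"})))   # 2: the ForeignMovie axiom is left out
```

Each round only inspects axioms syntactically, which keeps the extraction polynomial; the price is that the module may be larger than a minimal one.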
SLIDE 14
Adding Data
First seriously considered in [RoussetLevy96, CalvaneseEtAl98], now very mainstream.
The ontology is used at query time for inferencing, unlike constraints:
- open world / certain answer semantics
SLIDE 15
Adding Data
Essentially two kinds of scenarios:
Data Integration / Ontology-Based Data Access (DB'ish):
- ontology provides a class-centric global view / conceptual model
- mappings (typically GAV) connect the ontology with data sources
- ontologies typically application dependent and custom tailored
Web data / Semantic Web (AI'ish):
- very large scale, very incomplete
- sometimes the ontology is even given as part of the data
- ontologies tend to be general purpose and pragmatic
SLIDE 16
Implementation Approaches
Query rewriting to get rid of the ontology:
- target query languages include SQL (UCQs, non-recursive Datalog), Datalog, linear Datalog, monadic Datalog
Combined approach: materialization of ontology consequences in the data:
- often becomes infinite because of existential quantifiers
- a finite representation is used instead, which is unsound; soundness is regained by limited query rewriting
- implementations: Oracle Semantic Technologies, RDFox, Combo
- incremental maintenance related to FOIES / DynFO
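The following sketch illustrates such a finite but unsound representation for a normalized EL ontology: each right-hand side ∃r.B is witnessed by one shared null rather than a fresh null per element, so saturation terminates even for cyclic axioms. The encoding, the function names, and the axiom ∃directed.Movie ⊑ Filmmaker are hypothetical; this is a simplified illustration in the spirit of the combined approach, not the algorithm of any system listed above.

```python
def materialize(axioms, concept_facts, role_facts):
    """Saturate data under a normalized EL ontology (same tuple encoding as
    ("sub", A, B), ("conj", A1, A2, B), ("ex_r", A, r, B), ("ex_l", r, A, B)).
    Every right-hand side ∃r.B gets ONE shared witness "_:r.B"."""
    C, R = set(concept_facts), set(role_facts)   # (A, x) and (r, x, y)
    while True:
        new_c, new_r = set(), set()
        for ax in axioms:
            if ax[0] == "sub":                     # A ⊑ B
                new_c |= {(ax[2], x) for (A, x) in C if A == ax[1]}
            elif ax[0] == "conj":                  # A1 ⊓ A2 ⊑ B
                new_c |= {(ax[3], x) for (A, x) in C if A == ax[1] and (ax[2], x) in C}
            elif ax[0] == "ex_r":                  # A ⊑ ∃r.B
                null = f"_:{ax[2]}.{ax[3]}"
                for (A, x) in C:
                    if A == ax[1]:
                        new_r.add((ax[2], x, null))
                        new_c.add((ax[3], null))
            elif ax[0] == "ex_l":                  # ∃r.A ⊑ B
                new_c |= {(ax[3], x) for (r, x, y) in R if r == ax[1] and (ax[2], y) in C}
        if new_c <= C and new_r <= R:
            return C, R
        C |= new_c
        R |= new_r


def named(C, concept):
    """Answers to the atomic query concept(x), restricted to named individuals."""
    return {x for (A, x) in C if A == concept and not x.startswith("_:")}


T = [
    ("ex_r", "Movie", "hasDirector", "Director"),
    ("ex_r", "Director", "directed", "Movie"),
    ("ex_l", "directed", "Movie", "Filmmaker"),   # hypothetical axiom
]
C, R = materialize(T, {("Movie", "m1")}, {("directed", "ann", "m1")})
print(named(C, "Filmmaker"))   # {'ann'}
```

For atomic queries over named individuals this compact model is known to give the right answers in EL. The Boolean CQ ∃x∃y (hasDirector(x, y) ∧ directed(y, x)), however, matches the cycle between the two nulls although the ontology and data do not entail it; this is the kind of spurious match the limited query rewriting has to filter out.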
SLIDE 17
DL-Lite
In the DL-Lite family, FO-rewritings…
- always exist in (DL-Lite, UCQ) [CalvaneseEtAl]
- are small in practice when data comes from a classical DB [Calvanese, Rodriguez-Muro]
- can be superpolynomial in size unless NP ⊆ P/poly [ZakharyaschevEtAl]
- are polynomial in size under mild assumptions [GottlobSchwentick]
Implementations include Ontop, Clipper, Rapid, Requiem, Presto/Mastro.
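For atomic queries in DL-Lite, restricted here to positive inclusions B1 ⊑ B2 between basic concepts (A, ∃r, ∃r⁻) with no role inclusions or disjointness, the UCQ rewriting is just the set of basic concepts below the query concept. The sketch computes that backward closure and prints it as SQL over a hypothetical schema with one table per concept name (column ind) and one per role (columns subj, obj). Basic concepts are encoded as "A", ("exists", "r") and ("exists_inv", "r"); Journal ⊑ SerialPublication is a made-up axiom, the other two are from slide 8.

```python
def rewrite_aq(inclusions, query_concept):
    """UCQ rewriting of the atomic query query_concept(x): all basic concepts
    B with B ⊑* query_concept, found by backward closure over the inclusions."""
    result, frontier = {query_concept}, [query_concept]
    while frontier:
        goal = frontier.pop()
        for lhs, rhs in inclusions:
            if rhs == goal and lhs not in result:
                result.add(lhs)
                frontier.append(lhs)
    return result


def to_sql(basic_concepts):
    """One SELECT per disjunct, over a hypothetical schema:
    table <A>(ind) per concept name, table <r>(subj, obj) per role name."""
    parts = []
    for b in sorted(basic_concepts, key=repr):
        if isinstance(b, str):                 # A
            parts.append(f"SELECT ind FROM {b}")
        elif b[0] == "exists":                 # ∃r: subjects of r
            parts.append(f"SELECT subj AS ind FROM {b[1]}")
        else:                                  # ("exists_inv", r) = ∃r⁻: objects of r
            parts.append(f"SELECT obj AS ind FROM {b[1]}")
    return "\nUNION\n".join(parts)


TBOX = [
    ("Movie", ("exists", "hasDirector")),           # Movie ⊑ ∃hasDirector
    (("exists", "hasISSN"), "SerialPublication"),   # ∃hasISSN ⊑ SerialPublication
    ("Journal", "SerialPublication"),               # hypothetical
]
print(to_sql(rewrite_aq(TBOX, "SerialPublication")))
```

Chains through unqualified existentials are covered too: if a query asked for ∃hasDirector, the closure would pick up Movie via Movie ⊑ ∃hasDirector.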
SLIDE 18
EL
In the EL family:
- FO-rewritings are not guaranteed to exist because of recursion: ∃r.A ⊑ A + query A(x) = reachability of an A-point along an r-path
- existence of FO-rewritings is related to monadic Datalog boundedness, but often simpler (PSpace, ExpTime, 2ExpTime depending on the setup) [BienvenuHansenL__Wolter]
- FO-rewritings exist in almost all cases and can be computed efficiently (as non-recursive Datalog programs): Grind system [HansenL__Wolter]
- the combined approach is always applicable, with polynomial query rewriting [L__TomanWolter]
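The recursion example can be made tangible: under ∃r.A ⊑ A, answering A(x) is a least fixpoint (the Datalog rule A(x) :- r(x, y), A(y)), and the number of rounds grows with the length of the r-path, while any fixed UCQ only looks a bounded number of r-steps ahead. The sketch below contrasts the two on a chain; function names are hypothetical, and this illustrates the intuition rather than proving non-rewritability.

```python
def answer_recursive(a_facts, r_edges):
    """Certain answers to A(x) under ∃r.A ⊑ A, i.e. the least fixpoint of the
    Datalog rule A(x) :- r(x, y), A(y). Also returns the number of rounds."""
    answers, rounds = set(a_facts), 0
    while True:
        new = {x for (x, y) in r_edges if y in answers} - answers
        if not new:
            return answers, rounds
        answers |= new
        rounds += 1


def answer_ucq(a_facts, r_edges, k):
    """The UCQ A(x) ∨ ∃y1 (r(x,y1) ∧ A(y1)) ∨ ... with at most k r-steps."""
    answers, layer = set(a_facts), set(a_facts)
    for _ in range(k):
        layer = {x for (x, y) in r_edges if y in layer}
        answers |= layer
    return answers


n = 5
chain = {(i, i + 1) for i in range(n)}         # r-path 0 -> 1 -> ... -> 5, A holds at 5
print(answer_recursive({n}, chain))             # ({0, 1, 2, 3, 4, 5}, 5)
print(0 in answer_ucq({n}, chain, n - 1))       # False: the path is one step too long
```

Whatever depth k a UCQ has, a chain of length k + 1 defeats it in the same way.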
SLIDE 19
ALC
Emerging picture is very interesting:
- (ALC, AQ) = coCSP
- (ALC, UCQ) = coMMSNP = monadic disjunctive Datalog
- (GF, UCQ) = coGMSNP = frontier-guarded disjunctive Datalog
[BienvenuTenCateL__Wolter, FeierKuusistoL__]
Connection to CSP, also related to natural questions such as:
- How does the expressive power of OMQs relate to traditional QLs?
- Can we classify the complexity of all OMQs in, e.g., (ALC, UCQ)?
Can also be used to clarify the complexity of FO- and Datalog-rewritability: NExpTime for (ALC, AQ), 2NExpTime for (ALC, UCQ) [BienvenuTenCateL__Wolter].
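A standard example of the CSP connection: the ALC ontology ⊤ ⊑ R ⊔ G ⊔ B together with X ⊓ ∃e.X ⊑ ⊥ for X ∈ {R, G, B} is consistent with data e(a, b) exactly when the data graph is 3-colorable, so asking for inconsistency is the complement of CSP(K3). The brute-force check below only illustrates this correspondence; the function name is hypothetical and the search is exponential by design.

```python
from itertools import product


def consistent(nodes, edges):
    """Is the data {e(a, b)} consistent with ⊤ ⊑ R ⊔ G ⊔ B and
    X ⊓ ∃e.X ⊑ ⊥ for X in {R, G, B}? Tries every way of picking one color
    per node, i.e. searches for a homomorphism into the triangle K3."""
    nodes = list(nodes)
    for coloring in product("RGB", repeat=len(nodes)):
        col = dict(zip(nodes, coloring))
        if all(col[a] != col[b] for (a, b) in edges):
            return True
    return False


triangle = [(1, 2), (2, 3), (3, 1)]
k4 = [(a, b) for a in range(1, 5) for b in range(a + 1, 5)]
print(consistent({1, 2, 3}, triangle))   # True
print(consistent(range(1, 5), k4))       # False: K4 is not 3-colorable
```

Picking one color per node suffices: any model gives each node at least one color, and the ⊥-axioms forbid two adjacent nodes from sharing the picked one.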
SLIDE 20
Other Things People are Interested in
- OMQ emptiness and containment
- OMQ expressive power
- consistent query answering
- uncertain / probabilistic databases
- updates
- partial closed world assumption
- dynamic & temporal aspects
- privacy / confidentiality
- supporting data analytics
- explanation systems & optimization
DL is a rather active subarea of KR: annual workshop with ~80 participants, next year the 30th edition.
SLIDE 21
Higher Arity Relations
Why again binary? A mix of syntax and the desired universality of relations.
Sometimes we don’t need more: RDF!? DLs as a language for graph DBs; more from [CalvaneseOrtizSimkus].
Possible but not so elegant workaround: mappings / reification.
Native solutions:
- n-ary language DLR, already in [LenzeriniEtAl98]
- more concise proposal DLFU1 [Kuusisto16]:
  C ::= A | ¬C | (C1 ⊓ C2) | ∃R.(C1, ..., Cn)
  R ::= S | ε | ¬R | (R1 ⊓ R2) | σR
  where n is the arity of R and σ is a surjection [n] → [m]