SLIDE 1
The Quest for Probabilistic Description Logics: A Personal - - PowerPoint PPT Presentation
The Quest for Probabilistic Description Logics: A Personal Perspective
Carsten Lutz, University of Bremen
SLIDE 2
DLs and Probabilities: a Match Made in Heaven
Good reasons to extend Description Logics with probabilistic aspects: Representing
SLIDE 3
DLs and Probabilities: a Match Made in Hell
SO MANY possible combinations and setups:
- Which formalism to combine with? Bayes Net? Markov Net? Markov Logic? Lexicographic Entailment? Max Entropy?
- How to model probabilistic independence?
- Apply probabilities to TBoxes? Concepts? Roles? Data items?
- One distribution? Only constraints on distributions? Which distributions?
- How to handle non-monotonic aspects? …
The resulting formalisms are VERY VERY different and hard to compare. Although sometimes claimed, there is no “one-size-fits-all” solution. Moreover: intractability comes VERY VERY quickly.
SLIDE 4
From Hell to Heaven?
Personal opinion: research on probabilistic DLs should start from a concrete application task and develop a dedicated logic for it.
In this talk:
- Representing Uncertain Aspects of Domain Concepts
- Ontology-Mediated Querying of Uncertain Data
- From Statistical to Subjective Probabilities
SLIDE 5
DLs: a short Reminder
ALC family (including SHIQ, OWL2 DL, and others): considered “expressive” in the area, has all Boolean operators
  Director ≡ Person ⊓ ∃directed.(Movie ⊔ TVseries)
  ForeignMovie ≡ ∀producedIn.¬US ⊓ ∃language.¬English
EL family (including Horn-SHIQ, OWL2 EL, and others): positive, conjunctive, existential; has many tractable members
  Director ≡ Person ⊓ ∃directed.Movie
DL-Lite family (including OWL2 QL): simple DB constraints: inclusion dependencies + projection + functional dependencies
  Movie ⊑ ∃hasDirector
  ∃hasISSN ⊑ SerialPublication
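To make the set semantics behind these examples concrete, here is a minimal Python sketch that computes the extension of an ALC concept in a finite interpretation. The tuple encoding of concepts and the toy interpretation (individuals ann, bob, carl, m1, s1) are assumptions made up for illustration.

```python
from typing import Dict, Set, Tuple

# Illustrative encoding: ("name", A) | ("not", C) | ("and", C, D) | ("or", C, D)
#                        | ("exists", r, C) | ("forall", r, C)
Concept = Tuple

def ext(c: Concept, dom: Set[str], conc: Dict[str, Set[str]],
        role: Dict[str, Set[Tuple[str, str]]]) -> Set[str]:
    """Extension of concept c in the finite interpretation (dom, conc, role)."""
    op = c[0]
    if op == "name":
        return conc.get(c[1], set())
    if op == "not":
        return dom - ext(c[1], dom, conc, role)
    if op in ("and", "or"):
        left, right = ext(c[1], dom, conc, role), ext(c[2], dom, conc, role)
        return left & right if op == "and" else left | right
    if op in ("exists", "forall"):
        target = ext(c[2], dom, conc, role)
        # r-successors of every element
        succ = {d: {e for (x, e) in role.get(c[1], set()) if x == d} for d in dom}
        if op == "exists":
            return {d for d in dom if succ[d] & target}
        return {d for d in dom if succ[d] <= target}
    raise ValueError(f"unknown constructor {op!r}")

# Toy interpretation (hypothetical data).
DOM = {"ann", "bob", "carl", "m1", "s1"}
CONC = {"Person": {"ann", "bob", "carl"}, "Movie": {"m1"}, "TVseries": {"s1"}}
ROLE = {"directed": {("ann", "m1"), ("bob", "s1")}}

# ALC: Director ≡ Person ⊓ ∃directed.(Movie ⊔ TVseries)
DIRECTOR_ALC = ("and", ("name", "Person"),
                ("exists", "directed", ("or", ("name", "Movie"), ("name", "TVseries"))))
# EL: Director ≡ Person ⊓ ∃directed.Movie
DIRECTOR_EL = ("and", ("name", "Person"), ("exists", "directed", ("name", "Movie")))

print(sorted(ext(DIRECTOR_ALC, DOM, CONC, ROLE)))  # ['ann', 'bob']
print(sorted(ext(DIRECTOR_EL, DOM, CONC, ROLE)))   # ['ann']
```

Note that a concept such as ∀directed.Movie also holds for carl, m1 and s1, which have no directed-successors at all.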
SLIDE 6
Part 1: Representing Uncertain Aspects of Domain Concepts
SLIDE 7
Uncertain Concepts in Ontologies
Adequate modelling of domain concepts may involve uncertain aspects. Some examples from medical ontologies:
- Probable tubo-ovarian abscess
- Natural death with probable cause suspected
- Probable Diagnosis, Probably Present
- Basal Cell Tumor, Uncertain whether Benign or Malignant
Aim: design an ontology language for capturing uncertain aspects of concepts.
SLIDE 8
ProbFO
Halpern, Bacchus, et al. [1990]: Probabilistic First-Order Logic (ProbFO), “Type 1” (statistical probabilities):
FO + terms ||φ(x, y)||_x + function symbols +, ×, 0, 1 + relations =, >
||φ(x, y)||_x: probability that a randomly chosen x satisfies φ(x, y)
Models: a single FO-structure + a probability distribution over the domain!
Suitable for the representation of statistical probabilities, e.g. ||Hepatitis(x) ∧ Jaundice(x)||_x = 0.8 · ||Jaundice(x)||_x, but the interaction between probabilities and logic is limited.
Validity is $\Pi^2_1$-hard.
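In a Type 1 model there is one structure and a distribution over its domain. The sketch below evaluates ||φ(x)||_x on a finite domain; the uniform default distribution and the population of ten patients are illustrative assumptions, chosen so that the Hepatitis/Jaundice statement above holds.

```python
from fractions import Fraction

def stat_prob(domain, phi, weight=None):
    """||phi(x)||_x: probability that an x drawn from `weight` satisfies phi.
    `weight` maps elements to probabilities; uniform if omitted (an assumption)."""
    if weight is None:
        weight = {d: Fraction(1, len(domain)) for d in domain}
    return sum((weight[d] for d in domain if phi(d)), Fraction(0))

# One FO structure (hypothetical data): ten patients 0..9.
PATIENTS = list(range(10))
JAUNDICE = {0, 1, 2, 3, 4}
HEPATITIS = {0, 1, 2, 3, 7}

lhs = stat_prob(PATIENTS, lambda x: x in HEPATITIS and x in JAUNDICE)
rhs = Fraction(4, 5) * stat_prob(PATIENTS, lambda x: x in JAUNDICE)
print(lhs, rhs, lhs == rhs)  # 2/5 2/5 True
```

Since the model is a single structure, a closed statement about one individual, such as Hepatitis(eric), is simply true or false in it; intermediate degrees of belief need the Type 2 semantics.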
SLIDE 9
ProbFO
Halpern, Bacchus, et al. [1990]: Probabilistic First-Order Logic (ProbFO), “Type 2” (degrees of belief):
FO + terms p(φ(x)) + function symbols +, ×, 0, 1 + relations =, >
p(φ(x)): degree of belief that x satisfies φ
Models: a probability distribution over FO structures (possible worlds)!
Suitable for the representation of degrees of belief, e.g. p(Hepatitis(eric)) > 0.8; more interesting interaction between probabilities and logic.
Validity is $\Pi^2_1$-hard.
Probable tubo-ovarian abscess, probable diagnosis, etc.: Type 2!
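A Type 2 model is instead a distribution over possible worlds. The sketch below represents each world by the set of ground facts true in it (an encoding chosen for illustration; the three worlds and their weights are made up) and computes p(φ) as the total weight of the worlds satisfying φ.

```python
from fractions import Fraction

# Hypothetical possible worlds: (probability, ground facts true in that world).
WORLDS = [
    (Fraction(3, 5), {"Hepatitis(eric)", "Jaundice(eric)"}),
    (Fraction(1, 4), {"Hepatitis(eric)"}),
    (Fraction(3, 20), {"Jaundice(eric)"}),
]

def degree_of_belief(worlds, holds):
    """p(phi): total probability of the worlds in which phi holds."""
    if sum(p for p, _ in worlds) != 1:
        raise ValueError("world weights must sum to 1")
    return sum((p for p, facts in worlds if holds(facts)), Fraction(0))

p = degree_of_belief(WORLDS, lambda facts: "Hepatitis(eric)" in facts)
print(p, p > Fraction(4, 5))  # 17/20 True
```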
SLIDE 10
ProbDL
[L__SchröderKR10,Gutierrez-BasultoJungL__SchröderAAAI11] Extends classical DLs with:
- Probabilistic concepts: P∼n C with ∼ ∈ {<, ≤, =, ≥, >}, e.g. ProbableTuboOvarianAbscess ≡ P≥0.9 TuboOvarianAbscess
- Probabilistic roles: P∼n r with ∼ ∈ {<, ≤, =, ≥, >}, e.g. RiskPatient ≡ Patient ⊓ ∃P=1hadContactWith.Infected
- Linear/polynomial concept inequalities, e.g. ProbableTOA ≡ P(TOA) > c · P(¬TOA), and independence constraints, e.g. indep(AB0, Male)
Probabilistic DLs as fragments of Type 2 Probabilistic FO
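Following the Type 2 reading, a probabilistic interpretation can be pictured as several classical interpretations (worlds) over one shared domain plus a distribution over them; P∼n C then collects the elements whose probability of being in C satisfies ∼ n. The sketch assumes this shared-domain reading and uses made-up patients p1, p2 and weights.

```python
from fractions import Fraction
import operator

CMP = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def prob_concept(worlds, domain, concept, cmp, n):
    """Extension of P_{cmp n} concept, where `worlds` is a list of
    (mu, {concept_name: extension}) pairs over the shared `domain`."""
    result = set()
    for d in domain:
        # probability that d belongs to the concept, summed over all worlds
        p_d = sum((mu for mu, interp in worlds if d in interp.get(concept, set())),
                  Fraction(0))
        if CMP[cmp](p_d, n):
            result.add(d)
    return result

# Hypothetical probabilistic interpretation: two worlds over {p1, p2}.
DOMAIN = {"p1", "p2"}
WORLDS = [
    (Fraction(23, 25), {"TuboOvarianAbscess": {"p1"}}),
    (Fraction(2, 25), {"TuboOvarianAbscess": {"p1", "p2"}}),
]

# ProbableTuboOvarianAbscess ≡ P≥0.9 TuboOvarianAbscess
print(prob_concept(WORLDS, DOMAIN, "TuboOvarianAbscess", ">=", Fraction(9, 10)))  # {'p1'}
```

Because the probability is computed across all worlds, the extension of P∼n C is the same in every world, which is what allows it to be nested inside classical constructors; probabilistic roles apply the same idea to pairs of elements.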
SLIDE 11
Some Results
Probabilistic concepts only: ExpTime-complete in ALC et al., thus essentially for free.
Probabilistic concepts and roles: undecidable with linear concept inequalities or independence constraints.
Both ALC and EL: decidability open.
2ExpTime-c. / PSpace-c. when restricted to {0, 1} (in ALC / EL); still holds with polynomial concept inequalities + indep constraints.
Still ExpTime-c. in EL even with a single operator P∼p C.
In PTime with probabilities {>0, =1} (possibility / necessity), and with probabilities {>0, =1, >p} when restricted to classical TBoxes.
SLIDE 12
Some More Results
Monodic fragments of ProbFO: apply the probability operator only to formulas with ≤ 1 free variable.
Then validity is recursively enumerable, in contrast to full ProbFO.
Decidable if the FO part is restricted to a decidable FO fragment and quantification over probability values is disallowed (under mild assumptions) [JungL__GoncharovSchröderICALP14].
E.g., for the guarded fragment, the complexity is between 2ExpTime-c. and NExpTime-c.
SLIDE 13
Open Question
Prob-DL has limitations regarding independence:
- independence only between different properties of the same object, e.g. indep(AB0, Male), but we cannot say, e.g., that a person being male is independent of any other person being male
- all independences must be explicitly declared; this is often infeasible, so default independence assumptions are needed
Intuitively, this results in overcautious reasoning. How can it be overcome?
SLIDE 14
Part 2: Ontology-Mediated Querying of Uncertain Data
SLIDE 15
Uncertain Data, Certain Domain Knowledge
Two recent developments in databases:
- Ontology-Mediated Querying: more complete answers to queries over incomplete data
- Probabilistic Databases: data annotated with probabilities, and answers to queries too
The combination of the two is very natural:
ABox: (SoccerClub(FCBarca), 0.9), (playsFor(messi, FCBarca), 0.6), (Player(messi), 0.8)
Probability of messi being an answer to ∃y (playsFor(x, y) ∧ SoccerClub(y)): 0.54 (all independent!)
SLIDE 16
Uncertain Data, Certain Domain Knowledge
Two recent developments in databases:
- Ontology-Mediated Querying: more complete answers to queries over incomplete data
- Probabilistic Databases: data annotated with probabilities, and answers to queries too
The combination of the two is very natural:
ABox: (SoccerClub(FCBarca), 0.9), (playsFor(messi, FCBarca), 0.6), (Player(messi), 0.8)
TBox: Player ⊑ ∃playsFor, ∃playsFor⁻ ⊑ SoccerClub
Probability of messi being an answer to ∃y (playsFor(x, y) ∧ SoccerClub(y)): 0.908
SLIDE 17
Tuple-Independent Databases
Tuple-independent database: a set of data items, each associated with a probability, such as (Player(messi), 0.8); all data items are considered to be independent probabilistic events.
One possible world for each set S of data items, with probability
$$p(S) = \prod_{t \in S} p(t) \times \prod_{t \notin S} \bigl(1 - p(t)\bigr)$$
This is the most inexpressive probabilistic data model: it cannot assign a probability to a group of data items, and it does not separate data items from probabilistic events.
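The possible-world semantics can be checked by brute force on the three-item ABox from the messi example: enumerate all 2^3 sets S, weight each by the product formula, and add up the weights of the worlds where the query holds. This is exponential and meant only as an illustration; the tuple encoding and helper names are assumptions, and no TBox is used.

```python
from itertools import product
from math import isclose

# Tuple-independent ABox from the example: tuple -> probability.
ABOX = {
    ("SoccerClub", "FCBarca"): 0.9,
    ("playsFor", "messi", "FCBarca"): 0.6,
    ("Player", "messi"): 0.8,
}

def worlds(db):
    """Yield every possible world S (a subset of the tuples) together with p(S)."""
    items = list(db)
    for bits in product([False, True], repeat=len(items)):
        s = frozenset(t for t, b in zip(items, bits) if b)
        p = 1.0
        for t, b in zip(items, bits):
            p *= db[t] if b else 1 - db[t]   # p(t) if t in S, else 1 - p(t)
        yield s, p

def query_prob(db, holds):
    """Total probability of the worlds in which the Boolean query holds."""
    return sum(p for s, p in worlds(db) if holds(s))

def q_messi(s):
    """∃y playsFor(messi, y) ∧ SoccerClub(y), evaluated on the data alone (no TBox)."""
    return any(t[0] == "playsFor" and t[1] == "messi" and ("SoccerClub", t[2]) in s
               for t in s)

print(round(query_prob(ABOX, q_messi), 6))  # 0.54
```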
SLIDE 18
Probabilistic DBs: Data Complexity
Tuple-independent databases: for answering UCQs (unions of conjunctive queries), there is a precise characterisation of the data complexity and a dichotomy between PTime and #P-hard.
Implemented (research) systems: MayBMS, Trio, MystiQ, ProbDB [DalviSchnaitterSuciuPODS10]
SLIDE 19
Probabilistic DBs: Dichotomy
There is a set of five inference rules that can be applied to a (Boolean) query to compute in polytime the probability that it is true, e.g.:
- Independent and/or: if q1, q2 share no symbols, then p(q1 ∧ q2) = p(q1) × p(q2) and p(q1 ∨ q2) = 1 − (1 − p(q1)) × (1 − p(q2))
- Independent projection: if x is a “separator variable”, then $p(\exists x\, q) = 1 - \prod_{a \in \mathrm{dom}} \bigl(1 - p(q[a/x])\bigr)$
- Inclusion/exclusion, simplest form: p(q1 ∧ q2) = p(q1) + p(q2) − p(q1 ∨ q2)
Rule application can fail. Then the query can be shown #P-hard by reduction from #SAT for monotone bipartite DNF formulas $\bigvee_j (x_{i_j} \wedge y_{k_j})$.
A paradigmatic #P-hard query: ∃x∃y A(x) ∧ R(x, y) ∧ B(y)
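For a query where the rules do apply, such as ∃x∃y A(x) ∧ R(x, y), the computation can be sketched as follows: x occurs in every atom and acts as a separator variable, A(a) and ∃y R(a, y) use disjoint tuples (independent and), and ∃y R(a, y) is again an independent projection. The sketch compares this lifted computation with brute-force enumeration of possible worlds; the tables A and R and their probabilities are made up. For the #P-hard query above, neither x nor y occurs in every atom, so independent projection is not available at the top level.

```python
from itertools import product
from math import isclose, prod

# Hypothetical tuple-independent tables A(x) and R(x, y) with probabilities.
A = {"a1": 0.5, "a2": 0.7}
R = {("a1", "b1"): 0.4, ("a1", "b2"): 0.3, ("a2", "b1"): 0.9}

def lifted():
    """p(∃x∃y A(x) ∧ R(x, y)) via the rules, with x as separator variable."""
    none_holds = 1.0
    for a in A:  # independent projection over x (constants outside A contribute 1)
        # independent projection over y: p(∃y R(a, y)) = 1 - prod(1 - p(R(a, b)))
        p_r = 1 - prod(1 - p for (x, _), p in R.items() if x == a)
        # independent and: A(a) and ∃y R(a, y) share no tuples
        none_holds *= 1 - A[a] * p_r
    return 1 - none_holds

def brute_force():
    """Sum p(S) over all possible worlds S that satisfy the query."""
    tuples = [("A", a) for a in A] + [("R", x, y) for (x, y) in R]
    probs = [A[t[1]] if t[0] == "A" else R[t[1:]] for t in tuples]
    total = 0.0
    for bits in product([False, True], repeat=len(tuples)):
        world = {t for t, b in zip(tuples, bits) if b}
        if any(("A", t[1]) in world for t in world if t[0] == "R"):
            total += prod(p if b else 1 - p for p, b in zip(probs, bits))
    return total

print(round(lifted(), 6), round(brute_force(), 6))  # 0.7373 0.7373
```

The lifted version touches every tuple once, while brute force enumerates 2^5 worlds here and grows exponentially with the database.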
SLIDE 20
pOMQ: Abstract Dichotomy
Ontology-Mediated Querying with DL-Lite TBoxes via Query Rewriting: the query and the TBox are rewritten into an FO-query, and the data in the ABox is stored in an SQL database.
The rewritten query is a UCQ equivalent to the original query, and thus preserves probabilities.
=> probabilistic DB systems can be utilised for ontology-mediated querying
Theorem (Dichotomy): For every CQ q and DL-Lite TBox T, computing answer probabilities for q w.r.t. T is either in PTime or #P-hard.
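The rewriting step can be illustrated in a deliberately small setting: only atomic inclusions of the forms A ⊑ B and ∃r ⊑ B, and an atomic query B(x). Backward chaining yields the disjuncts of the rewritten UCQ, and since each disjunct is witnessed by single, pairwise distinct tuples, the answer probability follows from independent-or. TBOX, ABOX and all helper names are hypothetical; rewriting general CQs must also handle existentially quantified variables and inverse roles, which this sketch omits.

```python
from math import isclose, prod

# Hypothetical DL-Lite TBox; ("exists", r) stands for the basic concept ∃r.
TBOX = [("Striker", "Player"), ("Goalkeeper", "Player"),
        (("exists", "playsFor"), "Player")]

def rewrite(concept, tbox):
    """All basic concepts that entail `concept` w.r.t. atomic inclusions.
    Each one yields a disjunct of the rewritten UCQ for the query concept(x)."""
    ucq, frontier = {concept}, [concept]
    while frontier:
        c = frontier.pop()
        for lhs, rhs in tbox:
            if rhs == c and lhs not in ucq:
                ucq.add(lhs)
                frontier.append(lhs)
    return ucq

def witnesses(t, ind, ucq):
    """Does ABox tuple t make some disjunct of the UCQ true for individual ind?"""
    if t[1] != ind:
        return False
    if len(t) == 2:                    # concept assertion A(ind)
        return t[0] in ucq
    return ("exists", t[0]) in ucq     # role assertion r(ind, y) witnesses ∃y r(x, y)

def answer_prob(ind, ucq, abox):
    """Independent-or over the distinct witnessing tuples of a tuple-independent ABox."""
    return 1 - prod(1 - p for t, p in abox.items() if witnesses(t, ind, ucq))

ABOX = {("Player", "messi"): 0.2, ("Striker", "messi"): 0.5,
        ("playsFor", "messi", "FCBarca"): 0.6, ("Striker", "ronaldo"): 0.9}
UCQ = rewrite("Player", TBOX)
print(sorted(map(str, UCQ)))
print(round(answer_prob("messi", UCQ, ABOX), 6))  # 0.84
```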
SLIDE 21
pOMQ: Concrete Dichotomy
[Figure: example queries labelled PTime, PTime, PTime, #P, #P, PTime, shown without a TBox (∅) and with the TBox ∃s ⊑ ∃r]
Simple tree query: a query in which there is a variable that occurs in every atom.
Observation: if the (minimized) query is not a simple tree query, then it is #P-hard.
Essentially, a simple tree query is #P-hard if it contains [a pattern of two r-atoms, shown in a figure] or is implied, modulo T, by a (minimal) query containing this pattern [JungL__ISWC2012].
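The syntactic condition in the definition of simple tree queries is easy to test. The sketch below checks whether some variable occurs in every atom; it does not minimize the query first, which the observation presupposes, and its atom encoding (variables as strings starting with "?") is an assumption.

```python
def shared_variables(atoms):
    """Variables that occur in every atom of a CQ. Atoms are tuples
    (predicate, arg1, ...); variables are strings starting with '?'."""
    var_sets = [{a for a in atom[1:] if a.startswith("?")} for atom in atoms]
    return set.intersection(*var_sets) if var_sets else set()

def is_simple_tree_query(atoms):
    return bool(shared_variables(atoms))

SAFE = [("A", "?x"), ("R", "?x", "?y")]
HARD = [("A", "?x"), ("R", "?x", "?y"), ("B", "?y")]  # the paradigmatic #P-hard query
print(is_simple_tree_query(SAFE), is_simple_tree_query(HARD))  # True False
```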
SLIDE 22
Beyond FO-Rewritability
For most DLs except DL-Lite, FO-rewritability is not guaranteed. What to expect regarding the complexity of non-FO-rewritable queries?
Theorem [JungL__ISWC12]: If a (rooted) CQ q is not FO-rewritable relative to an ELI-TBox T, then q is #P-hard w.r.t. T.
By reduction from #SAT restricted to monotone bipartite DNF.
Thus, FO-rewritability is a complete tool for proving PTime results!!
SLIDE 23
Monte Carlo Approximation
Definition: A fully polynomial randomized approximation scheme (FPRAS) for a CQ q and TBox T is a randomized polytime algorithm with
Input: probabilistic ABox A + error bound ε > 0
Output: approximated answer probability x such that
$$\Pr\left(\frac{|p(\mathcal{A}, \mathcal{T} \models q) - x|}{p(\mathcal{A}, \mathcal{T} \models q)} \le \varepsilon\right) \ge \frac{3}{4}$$
- In DL-Lite, there is an FPRAS for every CQ and TBox (follows from [KarpLuby83,DalviSuciuVLDB07]).
- In ELI, the question is related to FPRASes for network reliability problems.
- ALC: coNP data complexity implies no FPRAS (unless RP = NP).
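The positive DL-Lite result is credited to [KarpLuby83,DalviSuciuVLDB07]. As an illustration of the Karp-Luby idea only, here is a sketch of its estimator for the probability of a monotone DNF over independent tuples, such as the lineage of a UCQ on a tuple-independent ABox. The tuples and probabilities are made up, and the sample count is fixed by hand; an actual FPRAS would choose it as a function of the number of clauses and ε, which this sketch does not do.

```python
import random
from itertools import product
from math import prod

def karp_luby(clauses, prob, samples, rng):
    """Karp-Luby estimate of P(C_1 or ... or C_m) for a monotone DNF whose
    clauses are sets of independent tuples that must all be present."""
    weights = [prod(prob[t] for t in c) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample a world conditioned on clause i: its tuples are forced present.
        world = {t for t in prob if t in clauses[i] or rng.random() < prob[t]}
        # Count the sample only if i is the first clause the world satisfies.
        first = next(j for j, c in enumerate(clauses) if c <= world)
        hits += first == i
    return total * hits / samples

def exact(clauses, prob):
    """Brute-force probability over all possible worlds (for comparison only)."""
    items = list(prob)
    result = 0.0
    for bits in product([False, True], repeat=len(items)):
        world = {t for t, b in zip(items, bits) if b}
        if any(c <= world for c in clauses):
            result += prod(prob[t] if b else 1 - prob[t] for t, b in zip(items, bits))
    return result

# Hypothetical lineage of ∃x∃y A(x) ∧ R(x, y) ∧ B(y) over a small database.
PROB = {"A(a1)": 0.5, "A(a2)": 0.6, "R(a1,b1)": 0.7, "R(a2,b1)": 0.4,
        "R(a2,b2)": 0.8, "B(b1)": 0.9, "B(b2)": 0.3}
CLAUSES = [frozenset({"A(a1)", "R(a1,b1)", "B(b1)"}),
           frozenset({"A(a2)", "R(a2,b1)", "B(b1)"}),
           frozenset({"A(a2)", "R(a2,b2)", "B(b2)"})]

estimate = karp_luby(CLAUSES, PROB, samples=20000, rng=random.Random(0))
print(round(estimate, 3), round(exact(CLAUSES, PROB), 3))
```

Each sample succeeds with probability P(C_1 ∨ … ∨ C_m) / Σ_i P(C_i), which is at least 1/m; this is why a modest number of samples suffices even when the target probability is tiny, unlike naive sampling of whole worlds.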
SLIDE 24
Some Questions
- Probabilities in both TBox and data (+ tuple independence!): semantics? computational properties?
- Since most queries are #P-hard: how to deal with them?
- Are there questions for logicians (beyond FPRASes)?
SLIDE 25
Part 3: From Statistical to Subjective Probabilities
SLIDE 26
Motivation
Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis|Jaundice|| = .8
ABox: Jaundice(eric)
Conclusion: p(Hepatitis(eric)) = .8
Brave reasoning!
SLIDE 27
Motivation
Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis|Jaundice|| = .8, ||Hepatitis|Jaundice ∧ Fever|| = .95
ABox: Jaundice(eric), Fever(eric)
Conclusion: p(Hepatitis(eric)) = .95
Reference class reasoning, non-monotonic!
SLIDE 28
Motivation
Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis|Jaundice|| = .8, ||Hepatitis|Jaundice ∧ Fever|| = .95
ABox: Jaundice(eric), Fever(eric), Tall(eric)
Conclusion: p(Hepatitis(eric)) = .95
Ignore irrelevant information! What is irrelevant?
SLIDE 29
Motivation
Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis|Jaundice|| = .8, ||Hepatitis|Jaundice ∧ Fever|| = .95
ABox: Jaundice(eric), Fever(eric), Tall(eric)
SLIDE 30
Motivation
Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis|Jaundice|| = .8, ||Hepatitis|Fever|| = .1
ABox: Jaundice(eric), Fever(eric), Tall(eric)
Conclusion: p(Hepatitis(eric)) = some value slightly > .8
Combination of evidence (assumed independent)
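The first three conclusions follow a simple direct-inference rule: use the statistic of the most specific reference class that eric is known to belong to. A sketch of that rule follows. It is a hypothetical helper rather than a semantics, it treats every fact that no statistic mentions (such as Tall) as irrelevant, and for the last example it returns None, because choosing a single class cannot combine evidence.

```python
def most_specific_reference_class(stats, facts):
    """Naive direct inference. `stats` maps frozenset(conditions) to
    ||Hepatitis | conditions||; `facts` are the unary facts known about eric.
    Returns the statistic of the unique most specific applicable class, or None
    if no class applies or several incomparable classes disagree."""
    applicable = [c for c in stats if c <= facts]
    maximal = [c for c in applicable if not any(c < d for d in applicable)]
    values = {stats[c] for c in maximal}
    return values.pop() if len(values) == 1 else None

J, F = frozenset({"Jaundice"}), frozenset({"Fever"})
print(most_specific_reference_class({J: .8}, {"Jaundice"}))                       # 0.8
print(most_specific_reference_class({J: .8, J | F: .95}, {"Jaundice", "Fever"}))  # 0.95
print(most_specific_reference_class({J: .8, J | F: .95},
                                    {"Jaundice", "Fever", "Tall"}))               # 0.95
print(most_specific_reference_class({J: .8, F: .1},
                                    {"Jaundice", "Fever", "Tall"}))               # None
```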
SLIDE 31
Random Worlds
[BacchusGroveHalpernKollerAIJ96]
KB contains both statistical and logical knowledge; α is a potential consequence, e.g. Jaundice(eric): what is its probability?
Degree of belief in α given KB, with fixed domain size N: consider all size-N models of KB and use the fraction that satisfy α (principle of indifference ≈ uniform distribution on the models of KB of size N; related to max entropy in the unary case).
Given that the domain size is unknown and supposed to be large: let N grow large and define the degree of belief to be the limit value.
Gets an impressive number of examples right; technically related to 0-1 laws.
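For a purely unary KB the random-worlds fraction can be computed by counting. The sketch below uses the KB {||Hepatitis|Jaundice|| ≈ .8, Jaundice(eric)} and asks for Hepatitis(eric). In [BacchusGroveHalpernKollerAIJ96] statistics only need to hold approximately, within a tolerance, and the degree of belief is obtained by letting N grow and then letting the tolerance shrink; the code merely evaluates finite cases. The two-predicate vocabulary and the chosen sizes and tolerances are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def random_worlds_belief(n, stat=Fraction(4, 5), tol=Fraction(1, 20)):
    """Fraction of the size-n models of
        KB = { ||Hepatitis | Jaundice|| ≈ stat (within tol), Jaundice(eric) }
    that satisfy Hepatitis(eric). The vocabulary has only the unary predicates
    Jaundice and Hepatitis plus the constant eric."""
    satisfying = total = 0
    for k in range(1, n + 1):          # k = |Jaundice|; eric is one of them
        for h in range(k + 1):         # h = |Jaundice ∩ Hepatitis|
            if abs(Fraction(h, k) - stat) > tol:
                continue
            # pick the other k-1 Jaundice elements; Hepatitis is free outside Jaundice
            base = comb(n - 1, k - 1) * 2 ** (n - k)
            total += base * comb(k, h)                   # which h of them have Hepatitis
            if h >= 1:
                satisfying += base * comb(k - 1, h - 1)  # ... with eric among them
    return Fraction(satisfying, total)

for n in (20, 100, 200):
    print(n, float(random_worlds_belief(n)))
print(float(random_worlds_belief(200, tol=Fraction(1, 100))))
```

Every admissible model class contributes the ratio h/k, so the result always lies within the tolerance window around .8. For larger N the value tends to be pulled towards the window edge closest to 1/2, since there are more models with a ratio nearer 1/2; shrinking the tolerance moves it towards .8, matching the conclusion p(Hepatitis(eric)) = .8 from the motivation.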
SLIDE 32
Random Worlds and DLs
It is possible and very tempting to apply random worlds to DL KBs:
- TBox is a set of concept inclusions plus statistics, e.g. Hepatitis ⊑ Disease and ||Hepatitis|Jaundice|| = .8
- ABox is a standard DL ABox
- Query: compute the degree of belief in a given ABox assertion
Problem: the main non-propositional features of DLs are ∃r.C and ∀r.C, but (for the empty KB) the limit probability of ∃r.C is always 1 and that of ∀r.C always 0, unless C is unsatisfiable or valid [HalpernKapronAPAL94]!
SLIDE 33
Random Worlds and DLs
TBox: empty; ABox: Patient(eric); Conclusion: p(Hepatitis(eric)) = .5
TBox: empty; ABox: Patient(eric); Conclusion: p(∃hasDisease.Hepatitis(eric)) = 1
Semantically, the problem is clear: for almost all binary relations, the principle of indifference on the level of tuples is clearly inappropriate.
The expected number of successors in a size-N structure is N/2.
In a DL context, the expected number of successors should be small, but this seems fundamentally incompatible with random worlds.
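Under the uniform distribution on all interpretations of a role r and a concept name C over a size-N domain (the empty KB), each element is independently an r-successor of eric with probability 1/2 and in C with probability 1/2. Hence p(∃r.C(eric)) = 1 − (3/4)^N, p(∀r.C(eric)) = (3/4)^N, and the expected number of r-successors is N/2. The sketch checks these closed forms by enumeration for N = 3; restricting the vocabulary to r, C and eric is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

def closed_form(n):
    """Empty KB, uniform over all interpretations of r and C on a size-n domain."""
    q = Fraction(3, 4) ** n
    p_exists = 1 - q      # fails iff no y has r(eric, y) and C(y); each y: 3/4
    p_forall = q          # holds iff no y has r(eric, y) and not C(y); each y: 3/4
    expected_successors = Fraction(n, 2)
    return p_exists, p_forall, expected_successors

def brute_force(n):
    """Enumerate every interpretation of r and C; eric denotes element 0."""
    dom = range(n)
    pairs = [(x, y) for x in dom for y in dom]
    eric = 0
    ex = fa = succ = count = 0
    for r_bits in product([0, 1], repeat=len(pairs)):
        r = {p for p, b in zip(pairs, r_bits) if b}
        succs = {y for (x, y) in r if x == eric}
        for c_bits in product([0, 1], repeat=n):
            c = {y for y, b in zip(dom, c_bits) if b}
            ex += any(y in c for y in succs)
            fa += all(y in c for y in succs)
            succ += len(succs)
            count += 1
    return Fraction(ex, count), Fraction(fa, count), Fraction(succ, count)

print(*closed_form(3))  # 37/64 27/64 3/2
print(*brute_force(3))  # 37/64 27/64 3/2
print(float(closed_form(50)[0]))
```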
SLIDE 34
Question
Can we find a reasonable and computationally feasible semantics for computing degrees of belief in DL knowledge bases? This is a challenge even for very simple DLs such as EL and DL-Lite.
Maybe use [KollerHalpernAAAI96], where uniform distributions are weakened (to distributions that are exchangeable + fully independent).
Another approach is [LukasiewiczAIJ07], but see [KlinovParsiaSattler09].
SLIDE 35
Many more Probabilistic DLs
[Heinsohn1994] [Jaeger1994] [KollerLevyPfeiffer1997] [Yelland2000] [d’AmatoFanizziLukasiewicz2008] [CeylanPenaloza2014] [MauaCozman2015] [RiguzziBellodiLammaZese2015] [PenalozaPotyka2016] [Lukasiewicz2008] [KlinovParsia2013] [GottlobLukasiewiczMartinezSimari2013] [Lukasiewicz2007]
SLIDE 36