The Quest for Probabilistic Description Logics: A Personal Perspective

SLIDE 1

The Quest for Probabilistic Description Logics: A Personal Perspective

Carsten Lutz, University of Bremen

SLIDE 2

DLs and Probabilities: a Match Made in Heaven

Good reasons to extend Description Logics with probabilistic aspects:
- Representing statistical domain knowledge, e.g. "80% of all patients with jaundice have hepatitis"
- Representing degrees of belief, e.g. "When somebody is a professor, my degree of belief that (s)he is intelligent is 90%"
- Representing uncertain aspects of domain concepts, e.g. Snomed CT: "Natural death with probable cause suspected"
- Reasoning about uncertain data, e.g. from web sources with different levels of trust

SLIDE 3

DLs and Probabilities: a Match Made in Hell

SO MANY possible combinations and setups:
- Which formalism to combine with? Bayes Net? Markov Net? Markov Logic? Lexicographic Entailment? Max Entropy?
- Apply probabilities to TBoxes? Concepts? Roles? Data items?
- One distribution? Only constraints on distributions? Which distributions?
- How to model probabilistic independence?
- How to handle non-monotonic aspects?
- …

The resulting formalisms are VERY VERY different and hard to compare. Although sometimes claimed, there is no “one-size-fits-all” solution.
Moreover: intractability comes VERY VERY quickly.

SLIDE 4

From Hell to Heaven?

Personal opinion: research on probabilistic DLs should start from a concrete application task and develop a dedicated logic for it.

In this talk:
- Representing Uncertain Aspects of Domain Concepts
- Ontology-Mediated Querying of Uncertain Data
- From Statistical to Subjective Probabilities

SLIDE 5

DLs: a Short Reminder

ALC family (including SHIQ, OWL2 DL, and others): considered "expressive" in the area, have all Boolean operators
  Director ≡ Person ⊓ ∃directed.(Movie ⊔ TVseries)
  ForeignMovie ≡ ∀producedIn.¬US ⊓ ∃language.¬English

EL family (including Horn-SHIQ, OWL2 EL, and others): positive, conjunctive, existential; has many tractable members
  Director ≡ Person ⊓ ∃directed.Movie

DL-Lite family (including OWL2 QL): simple DB constraints: inclusion dependencies + projection + fundeps
  Movie ⊑ ∃hasDirector
  ∃hasISSN ⊑ SerialPublication
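
The concept constructors above have a simple set-based semantics. The following minimal Python sketch evaluates ALC concepts on one finite interpretation. The tuple encoding, the `Interpretation` class and the individuals `spielberg`, `jaws` and `alice` are hypothetical illustrations, not the API of any DL reasoner; a reasoner reasons over all models of a TBox, whereas this only evaluates a single model.

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    """A finite DL interpretation (toy example; all names are hypothetical)."""
    domain: set
    concepts: dict = field(default_factory=dict)  # concept name -> set of elements
    roles: dict = field(default_factory=dict)     # role name -> set of (a, b) pairs

def extension(concept, i):
    """Set semantics of ALC concepts, encoded as nested tuples:
    ("atom", A), ("not", C), ("and", C, D), ("or", C, D),
    ("exists", r, C), ("forall", r, C)."""
    kind = concept[0]
    if kind == "atom":
        return set(i.concepts.get(concept[1], set()))
    if kind == "not":
        return set(i.domain) - extension(concept[1], i)
    if kind == "and":
        return extension(concept[1], i) & extension(concept[2], i)
    if kind == "or":
        return extension(concept[1], i) | extension(concept[2], i)
    if kind in ("exists", "forall"):
        pairs = i.roles.get(concept[1], set())
        filler = extension(concept[2], i)

        def successors(d):
            return {b for (a, b) in pairs if a == d}

        if kind == "exists":  # some successor lies in the filler
            return {d for d in i.domain if successors(d) & filler}
        return {d for d in i.domain if successors(d) <= filler}  # all successors do
    raise ValueError(f"unknown constructor: {kind!r}")

# Toy interpretation for Director ≡ Person ⊓ ∃directed.(Movie ⊔ TVseries)
i = Interpretation(
    domain={"spielberg", "jaws", "alice"},
    concepts={"Person": {"spielberg", "alice"}, "Movie": {"jaws"}, "TVseries": set()},
    roles={"directed": {("spielberg", "jaws")}},
)
director = ("and", ("atom", "Person"),
            ("exists", "directed", ("or", ("atom", "Movie"), ("atom", "TVseries"))))
print(sorted(extension(director, i)))  # ['spielberg']
```

Note that `alice` and `jaws` satisfy ∀directed.Movie vacuously, since they have no directed-successors.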

SLIDE 6

Part 1: Representing Uncertain Aspects of Domain Concepts

SLIDE 7

Uncertain Concepts in Ontologies

Adequate modelling of domain concepts may involve uncertain aspects. Some examples from medical ontologies:
- Probable tubo-ovarian abscess
- Natural death with probable cause suspected
- Probable Diagnosis, Probably Present
- Basal Cell Tumor, Uncertain whether Benign or Malignant

Aim: design an ontology language for capturing uncertain aspects of concepts.

SLIDE 8

ProbFO

Halpern, Bacchus, et al. [1990]: Probabilistic First-Order Logic (ProbFO)
“Type 1” (statistical probabilities):
- FO + terms ||φ(x, y)||_x + function symbols +, ×, 0, 1 + relations =, >
- ||φ(x, y)||_x: probability that a randomly chosen x satisfies φ(x, y)
- Models: a single FO structure + a probability distribution over the domain!
- Suitable for the representation of statistical probabilities, e.g. ||Hepatitis(x) ∧ Jaundice(x)||_x = 0.8 · ||Jaundice(x)||_x, but the interaction between probabilities and logic is limited
- Validity is $\Pi^2_1$-hard
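
To illustrate the Type 1 semantics, here is a minimal sketch assuming a finite domain and only unary predicates; `measure` and the ten-patient model are hypothetical. It checks the statistical statement above in a model where 4 of the 5 jaundice patients have hepatitis.

```python
from fractions import Fraction

def measure(formula, domain_prob):
    """||formula(x)||_x: total probability of the domain elements satisfying formula."""
    return sum((w for d, w in domain_prob.items() if formula(d)), Fraction(0))

# Hypothetical Type 1 model: one structure over ten patients, uniform distribution
domain_prob = {f"patient{k}": Fraction(1, 10) for k in range(10)}
jaundice = {f"patient{k}" for k in range(5)}
hepatitis = {f"patient{k}" for k in range(4)} | {"patient9"}

lhs = measure(lambda x: x in hepatitis and x in jaundice, domain_prob)
rhs = Fraction(4, 5) * measure(lambda x: x in jaundice, domain_prob)
print(lhs, rhs, lhs == rhs)  # 2/5 2/5 True
```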

SLIDE 9

ProbFO

Halpern, Bacchus, et al. [1990]: Probabilistic First-Order Logic (ProbFO)
“Type 2” (degrees of belief):
- FO + terms p(φ(x)) + function symbols +, ×, 0, 1 + relations =, >
- p(φ(x)): degree of belief that x satisfies φ
- Models: a probability distribution over FO structures (possible worlds)!
- Suitable for the representation of degrees of belief, e.g. p(Hepatitis(eric)) > 0.8; more interesting interaction between probabilities and logic
- Validity is $\Pi^2_1$-hard

Probable tubo-ovarian abscess, probable diagnosis, etc.: Type 2!
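
By contrast, a Type 2 model weights whole structures. The sketch below uses a hypothetical toy model (three possible worlds, individuals `eric` and `mary`, each world reduced to the extension of Hepatitis) and computes p(Hepatitis(eric)) as the total weight of the worlds in which the formula holds. Beliefs about different individuals (here `p_both`) interact through the shared worlds.

```python
from fractions import Fraction

# Hypothetical Type 2 model: a distribution over possible worlds; each world
# is reduced here to the extension of the predicate Hepatitis.
worlds = [
    (Fraction(6, 10), {"hepatitis": {"eric"}}),
    (Fraction(3, 10), {"hepatitis": {"eric", "mary"}}),
    (Fraction(1, 10), {"hepatitis": set()}),
]

def belief(holds, worlds):
    """p(phi): total probability of the worlds in which phi holds."""
    return sum((w for w, world in worlds if holds(world)), Fraction(0))

p_eric = belief(lambda world: "eric" in world["hepatitis"], worlds)
p_both = belief(lambda world: {"eric", "mary"} <= world["hepatitis"], worlds)
print(p_eric, p_eric > Fraction(4, 5), p_both)  # 9/10 True 3/10
```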

SLIDE 10

ProbDL

[L__SchröderKR10, Gutierrez-BasultoJungL__SchröderAAAI11] Extends classical DLs with:
- Probabilistic concepts: P∼n C with ∼ ∈ {<, ≤, =, ≥, >}, e.g.
  ProbableTuboOvarianAbscess ≡ P≥0.9 TuboOvarianAbscess
- Probabilistic roles: P∼n r with ∼ ∈ {<, ≤, =, ≥, >}, e.g.
  RiskPatient ≡ Patient ⊓ ∃P=1 hadContactWith.Infected
- Linear/polynomial concept inequalities, e.g. ProbableTOA ≡ P(TOA) > c · P(¬TOA)
- Independence constraints, e.g. indep(AB0, Male)

Probabilistic DLs as fragments of Type 2 Probabilistic FO
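
A minimal sketch of how P∼n C can be evaluated, assuming (as we read the Type 2 semantics) a finite set of worlds that share one domain, plus a probability distribution over the worlds. The encoding, `prob_concept` and the two-world model are hypothetical; the ASCII comparators stand in for ∼ ∈ {<, ≤, =, ≥, >}.

```python
import operator
from fractions import Fraction

COMPARE = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
           ">=": operator.ge, ">": operator.gt}

def prob_concept(domain, worlds, concept, cmp, n):
    """Extension of P~n C: the elements d such that the probability of the
    worlds in which d belongs to C satisfies ~ n. All worlds share the domain."""
    def prob(d):
        return sum((w for w, ext in worlds if d in ext[concept]), Fraction(0))
    return {d for d in domain if COMPARE[cmp](prob(d), n)}

# Hypothetical two-world model; TOA abbreviates TuboOvarianAbscess
domain = {"patient1", "patient2"}
worlds = [
    (Fraction(92, 100), {"TOA": {"patient1"}}),
    (Fraction(8, 100), {"TOA": {"patient1", "patient2"}}),
]
print(sorted(prob_concept(domain, worlds, "TOA", ">=", Fraction(9, 10))))  # ['patient1']
print(sorted(prob_concept(domain, worlds, "TOA", ">", 0)))  # ['patient1', 'patient2']
```

The second call corresponds to the "possibility" operator P>0, the restricted case mentioned among the results below.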

SLIDE 11

Some Results

Probabilistic concepts only:
- ExpTime-complete in ALC et al., thus essentially for free
- still holds with polynomial concept inequalities + independence constraints
- still ExpTime-c. in EL, even with a single operator P∼pC
- in PTime with probabilities {>0, =1} (possibility / necessity), and with probabilities {>0, =1, >p} when restricted to classical TBoxes

Probabilistic concepts and roles:
- undecidable with linear concept inequalities or independence constraints
- both ALC and EL: decidability open
- 2ExpTime-c. / PSpace-c. when restricted to {0, 1} (in ALC / EL)

SLIDE 12

Some More Results

Monodic fragments of ProbFO: apply the probability operator only to formulas with ≤ 1 free variable, and disallow quantification over probability values.
Then:
- Validity is recursively enumerable, in contrast to full ProbFO.
- Decidable if the FO part is restricted to a decidable FO fragment (under mild assumptions) [JungL__GoncharovSchröderICALP14].
- E.g., for the guarded fragment: complexity between 2ExpTime-c. and NExpTime-c.

SLIDE 13

Open Question

Prob-DL has limitations regarding independence:
- independence only between different properties of the same object, e.g. indep(AB0, Male)
- all independences must be explicitly declared; this is often infeasible, so default independence assumptions are needed
- but one cannot say, e.g., that a person being male is independent of any other person being male

Intuitively, this results in overcautious reasoning. How can it be overcome?

SLIDE 14

Part 2: Ontology-Mediated Querying of Uncertain Data

SLIDE 15

Uncertain Data, Certain Domain Knowledge

Two recent developments in databases:
- Ontology-Mediated Querying: more complete answers to queries over incomplete data
- Probabilistic Databases: data annotated with probabilities, answers to queries too

The combination of the two is very natural:
ABox: (SoccerClub(FCBarca), 0.9), (playsFor(messi, FCBarca), 0.6), (Player(messi), 0.8)
Probability of messi being an answer to ∃y (playsFor(x, y) ∧ SoccerClub(y)): 0.6 × 0.9 = 0.54 (all data items independent!)

SLIDE 16

Uncertain Data, Certain Domain Knowledge

Same setting, now with a TBox:
TBox: Player ⊑ ∃playsFor, ∃playsFor⁻ ⊑ SoccerClub
ABox: (SoccerClub(FCBarca), 0.9), (playsFor(messi, FCBarca), 0.6), (Player(messi), 0.8)
Probability of messi being an answer to ∃y (playsFor(x, y) ∧ SoccerClub(y)): 0.908

SLIDE 17

Tuple-Independent Databases

Tuple-independent database: a set of data items, each associated with a probability, such as (Player(messi), 0.8).
All data items are considered to be independent probabilistic events.
There is one possible world for each set S of data items, with probability

$$p(S) = \prod_{t \in S} p(t) \times \prod_{t \notin S} \bigl(1 - p(t)\bigr)$$

This is the most inexpressive probabilistic data model: it cannot assign a probability to a group of data items, and it does not separate data items from probabilistic events.
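
The possible-world semantics can be checked by brute force on the three data items of the Messi example (without TBox). This is a sketch with a hypothetical tuple encoding of data items; enumerating all 2^3 worlds confirms that the p(S) sum to 1 and that the query probability is 0.54. Enumeration is exponential, so it only serves tiny examples.

```python
from itertools import product
from math import prod

# The ABox of the Messi example, read as a tuple-independent database
abox = {
    ("SoccerClub", "FCBarca"): 0.9,
    ("playsFor", "messi", "FCBarca"): 0.6,
    ("Player", "messi"): 0.8,
}

def possible_worlds(db):
    """Yield (S, p(S)) for every subset S of the data items."""
    items = list(db)
    for bits in product([False, True], repeat=len(items)):
        world = {t for t, keep in zip(items, bits) if keep}
        yield world, prod(db[t] if t in world else 1 - db[t] for t in items)

def messi_answer(world):
    """Does x = messi satisfy ∃y (playsFor(x, y) ∧ SoccerClub(y)) in this world?"""
    return any(t[0] == "playsFor" and t[1] == "messi" and ("SoccerClub", t[2]) in world
               for t in world)

worlds = list(possible_worlds(abox))
total = sum(p for _, p in worlds)
answer = sum(p for w, p in worlds if messi_answer(w))
print(len(worlds), round(total, 6), round(answer, 6))  # 8 1.0 0.54
```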

SLIDE 18

Probabilistic DBs: Data Complexity

Tuple-independent databases: for answering UCQs (unions of conjunctive queries), there is a precise characterisation and a dichotomy between PTime and #P-hard.
Implemented (research) systems: MayBMS, Trio, MystiQ, ProbDB [DalviSchnaitterSuciuPODS10]

SLIDE 19

Probabilistic DBs: Dichotomy

There is a set of five inference rules that can be applied to a (Boolean) query to compute, in polytime, the probability that it is true, e.g.:

Independent and/or: if q1, q2 share no symbols, then
$$p(q_1 \wedge q_2) = p(q_1) \times p(q_2) \qquad p(q_1 \vee q_2) = 1 - \bigl(1 - p(q_1)\bigr) \times \bigl(1 - p(q_2)\bigr)$$

Independent projection: if x is a “separator variable”, then
$$p(\exists x\, q) = 1 - \prod_{a \in \mathrm{dom}} \bigl(1 - p(q[a/x])\bigr)$$

Inclusion/exclusion, simplest form:
$$p(q_1 \wedge q_2) = p(q_1) + p(q_2) - p(q_1 \vee q_2)$$

Rule application can fail. Then the query can be shown to be #P-hard, by reduction from #SAT for monotone bipartite DNF formulas $\bigvee_j (x_{i_j} \wedge y_{k_j})$.
A paradigmatic #P-hard query: ∃x∃y A(x) ∧ R(x, y) ∧ B(y)
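
The independent and/or and independent projection rules can be sanity-checked on a small example. The sketch below (hypothetical database and function names) evaluates the query ∃x∃y A(x) ∧ R(x, y), in which x occurs in every atom and thus acts as a separator variable, once with the rules and once by enumerating all possible worlds. Adding B(y) yields the #P-hard query above, for which this decomposition is not available.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent database: data item -> probability
db = {
    ("A", "a1"): 0.5, ("A", "a2"): 0.3,
    ("R", "a1", "b1"): 0.4, ("R", "a1", "b2"): 0.7, ("R", "a2", "b1"): 0.9,
}

def p_exists_r(db, a):
    """p(∃y R(a, y)) by independent projection (y is a separator variable)."""
    return 1 - prod(1 - p for t, p in db.items() if t[0] == "R" and t[1] == a)

def lifted(db):
    """p(∃x∃y A(x) ∧ R(x, y)): project on x, then use independent 'and'
    for A(a) and ∃y R(a, y), which share no relation symbols."""
    xs = {t[1] for t in db}
    return 1 - prod(1 - db.get(("A", a), 0.0) * p_exists_r(db, a) for a in xs)

def brute_force(db):
    """Sum p(S) over all possible worlds S in which the query holds."""
    items = list(db)
    total = 0.0
    for bits in product([False, True], repeat=len(items)):
        world = {t for t, keep in zip(items, bits) if keep}
        if any(t[0] == "R" and ("A", t[1]) in world for t in world):
            total += prod(db[t] if t in world else 1 - db[t] for t in items)
    return total

print(round(lifted(db), 6), round(brute_force(db), 6))  # 0.5693 0.5693
```

The lifted computation touches each data item once, while the brute-force check is exponential in the number of data items.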

SLIDE 20

pOMQ: Abstract Dichotomy

Ontology-Mediated Querying with DL-Lite TBoxes via Query Rewriting: the query and the TBox are rewritten into an FO query, which is evaluated over the ABox data stored in an SQL database.
The rewritten query is a UCQ equivalent to the original query, and thus preserves answer probabilities.

Theorem (Dichotomy): For every CQ q and DL-Lite TBox T, computing answer probabilities for q w.r.t. T is either in PTime or #P-hard.

=> probabilistic DB systems can be utilised for ontology-mediated querying

SLIDE 21

pOMQ: Concrete Dichotomy

[Figure: example queries over the roles r and s, each classified as PTime or #P-hard, for the empty TBox and for the TBox ∃s ⊑ ∃r]

Simple tree query: a query in which some variable occurs in every atom.
Observation: if the (minimized) query is not a simple tree query, then it is #P-hard.
Essentially, a simple tree query is #P-hard if it contains [figure: a pattern of two r-atoms] or is implied, modulo T, by a (minimal) query containing this pattern [JungL__ISWC2012].

SLIDE 22

Beyond FO-Rewritability

For most DLs except DL-Lite, FO-rewritability is not guaranteed. What to expect regarding the complexity of non-FO-rewritable queries?

Theorem [JungL__ISWC12]: If a (rooted) CQ q is not FO-rewritable relative to an ELI-TBox T, then q is #P-hard w.r.t. T (by reduction from #SAT restricted to monotone bipartite DNF).

Thus, FO-rewritability is a complete tool for proving PTime results!!

SLIDE 23

Monte Carlo Approximation

Definition: A fully polynomial randomized approximation scheme (FPRAS) for a CQ q and a TBox T is a randomized algorithm, running in time polynomial in |A| and 1/ε, with
- Input: a probabilistic ABox A and an error bound ε > 0
- Output: an approximated answer probability x such that

$$\Pr\left(\frac{\lvert p(\mathcal{A}, \mathcal{T} \models q) - x \rvert}{p(\mathcal{A}, \mathcal{T} \models q)} \le \varepsilon\right) \ge \frac{3}{4}$$

- In DL-Lite, there is an FPRAS for every CQ and TBox (follows from [KarpLuby83, DalviSuciuVLDB07]).
- In ELI, the question is related to FPRASes for network reliability problems.
- In ALC, coNP-hard data complexity implies that there is no FPRAS (unless RP = NP).
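
The cited Karp-Luby technique estimates the probability of a DNF formula over independent variables, which is the form that the lineage of a UCQ over a tuple-independent database takes. Below is a sketch of the Karp-Luby estimator applied to a hypothetical lineage of the #P-hard query ∃x∃y A(x) ∧ R(x, y) ∧ B(y). The sample count is fixed here rather than derived from ε, and the exact value is computed by brute force only for comparison.

```python
import random
from itertools import product
from math import prod

def exact_probability(clauses, p):
    """Brute-force probability of a monotone DNF over independent variables."""
    variables = sorted(set().union(*clauses))
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if any(all(world[v] for v in c) for c in clauses):
            total += prod(p[v] if world[v] else 1 - p[v] for v in variables)
    return total

def karp_luby(clauses, p, samples, rng):
    """Karp-Luby estimator: pick clause i with probability P(C_i)/U, sample a
    world conditioned on C_i being true, and count a hit if C_i is the first
    satisfied clause. U * hits / samples is an unbiased estimate."""
    weights = [prod(p[v] for v in c) for c in clauses]
    u = sum(weights)
    variables = sorted(set().union(*clauses))
    hits = 0
    for _ in range(samples):
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        world = {v: v in clauses[i] or rng.random() < p[v] for v in variables}
        first = next(j for j, c in enumerate(clauses) if all(world[v] for v in c))
        hits += first == i
    return u * hits / samples

# Hypothetical probabilistic data and the lineage of ∃x∃y A(x) ∧ R(x, y) ∧ B(y)
p = {"A(a1)": 0.5, "A(a2)": 0.6, "R(a1,b1)": 0.7, "R(a1,b2)": 0.4,
     "R(a2,b2)": 0.8, "B(b1)": 0.3, "B(b2)": 0.9}
lineage = [{"A(a1)", "R(a1,b1)", "B(b1)"},
           {"A(a1)", "R(a1,b2)", "B(b2)"},
           {"A(a2)", "R(a2,b2)", "B(b2)"}]
estimate = karp_luby(lineage, p, samples=20_000, rng=random.Random(0))
print(round(exact_probability(lineage, p), 4), round(estimate, 4))
```

Counting only "first satisfied clause" hits avoids double-counting worlds that satisfy several clauses, which is what keeps the estimate unbiased.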

SLIDE 24

Some Questions

- Probabilities in both TBox and data (+ tuple independence!): Semantics? Computational properties?
- Since most queries are #P-hard: how to deal with them?
- Are there questions for logicians (beyond FPRASes)?

SLIDE 25

Part 3: From Statistical to Subjective Probabilities

SLIDE 26

Motivation

Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis | Jaundice|| = .8
ABox: Jaundice(eric)
Conclusion: p(Hepatitis(eric)) = .8
Brave reasoning!

SLIDE 27

Motivation

Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis | Jaundice|| = .8, ||Hepatitis | Jaundice ∧ Fever|| = .95
ABox: Jaundice(eric), Fever(eric)
Conclusion: p(Hepatitis(eric)) = .95
Reference class reasoning, non-monotonic!

SLIDE 28

Motivation

Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis | Jaundice|| = .8, ||Hepatitis | Jaundice ∧ Fever|| = .95
ABox: Jaundice(eric), Fever(eric), Tall(eric)
Conclusion: p(Hepatitis(eric)) = .95
Ignore irrelevant information! What is irrelevant?
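
The reference-class behaviour in these examples can be mimicked by a deliberately naive rule: use the statistic whose condition set is the most specific one that applies to eric. This is a hypothetical sketch, not the random-worlds semantics; it reproduces the conclusions .8 and .95, ignores Tall(eric) because no statistic mentions it, and gives up (returns None) when two incomparable reference classes apply, which is the situation where evidence has to be combined.

```python
from fractions import Fraction

def most_specific_statistic(statistics, facts):
    """Naive reference-class rule (hypothetical): among the statistics
    ||Hepatitis | conds|| whose conditions all hold of the individual, use the
    one with a maximal condition set; return None if that is not unique."""
    applicable = {conds: value for conds, value in statistics.items() if conds <= facts}
    maximal = [c for c in applicable if not any(c < other for other in applicable)]
    return applicable[maximal[0]] if len(maximal) == 1 else None

J, JF = frozenset({"Jaundice"}), frozenset({"Jaundice", "Fever"})
stats = {J: Fraction(8, 10), JF: Fraction(95, 100)}
print(most_specific_statistic({J: Fraction(8, 10)}, {"Jaundice"}))     # 4/5
print(most_specific_statistic(stats, {"Jaundice", "Fever"}))          # 19/20
print(most_specific_statistic(stats, {"Jaundice", "Fever", "Tall"}))  # 19/20
incomparable = {J: Fraction(8, 10), frozenset({"Fever"}): Fraction(1, 10)}
print(most_specific_statistic(incomparable, {"Jaundice", "Fever", "Tall"}))  # None
```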

SLIDE 29

Motivation

Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis | Jaundice|| = .8, ||Hepatitis | Jaundice ∧ Fever|| = .95
ABox: Jaundice(eric), Fever(eric), Tall(eric)

SLIDE 30

Motivation

Joint reasoning about statistical and subjective probabilities is very useful:
TBox: ||Hepatitis | Jaundice|| = .8, ||Hepatitis | Fever|| = .1
ABox: Jaundice(eric), Fever(eric), Tall(eric)
Conclusion: p(Hepatitis(eric)) = some value slightly > .8
Combination of evidences (assumed independent)

SLIDE 31

Random Worlds

[BacchusGroveHalpernKollerAIJ96] Degree of belief in α given KB:
- KB contains both statistical and logical knowledge
- α is a potential consequence, e.g. Jaundice(eric); what is its probability?

With fixed domain size N: consider all models of KB of size N, and use the fraction that satisfy α
(≈ uniform distribution on the models of KB of size N; principle of indifference, related to max entropy in the unary case).
Given that the domain size is unknown and supposed to be large: let N grow large and define the degree of belief to be the limit value.
Gets an impressive number of examples right; technically related to 0-1 laws.
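
For unary vocabularies the fixed-N fraction can be counted combinatorially. The sketch below computes it for a KB in the style of the Motivation slides, ||Hepatitis | Jaundice|| ≈ .8 plus Jaundice(eric), with α = Hepatitis(eric). The function name and the reading of "≈" as "within a tolerance `tol`" are assumptions made for illustration. Every counted model has a Hepatitis proportion inside the tolerance band, so the fraction is too; it need not equal .8 exactly, which is why random worlds additionally takes a limit as the tolerance goes to 0.

```python
from fractions import Fraction
from math import comb

def direct_inference(n, target=Fraction(4, 5), tol=Fraction(1, 20)):
    """Fraction of size-n structures over Hepatitis, Jaundice (unary) and the
    constant eric that satisfy KB = {||Hepatitis | Jaundice|| ≈ target,
    Jaundice(eric)} and also Hepatitis(eric). Here '≈' means 'within tol'."""
    models = satisfying = 0
    for j in range(1, n + 1):          # |Jaundice| >= 1 because of Jaundice(eric)
        for h in range(j + 1):         # |Hepatitis ∩ Jaundice|
            if abs(Fraction(h, j) - target) > tol:
                continue               # statistical constraint violated
            # ways to pick Jaundice, Hepatitis inside it, and Hepatitis outside it
            structures = comb(n, j) * comb(j, h) * 2 ** (n - j)
            models += structures * j        # eric may denote any Jaundice element
            satisfying += structures * h    # ... h of which are Hepatitis
    if models == 0:
        raise ValueError("no structure of this size satisfies the KB")
    return Fraction(satisfying, models)

for n in (20, 100):
    print(n, float(direct_inference(n)))
```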

SLIDE 32

Random Worlds and DLs

It is possible and very tempting to apply random worlds to DL KBs:
- TBox: a set of concept inclusions plus statistics, e.g. Hepatitis ⊑ Disease, ||Hepatitis | Jaundice|| = .8
- ABox: a standard DL ABox
- Query: compute the degree of belief in a given ABox assertion

Problem: the main non-propositional features of DLs, ∃r.C and ∀r.C.
For the empty KB, the limit probability of ∃r.C is always 1 and that of ∀r.C always 0 (unless C is unsatisfiable or valid) [HalpernKapronAPAL94]!

SLIDE 33

Random Worlds and DLs

TBox: empty; ABox: Patient(eric); Conclusion: p(Hepatitis(eric)) = .5
TBox: empty; ABox: Patient(eric); Conclusion: p(∃hasDisease.Hepatitis(eric)) = 1

Semantically, the problem is clear: for almost all binary relations, the principle of indifference on the level of tuples is clearly inappropriate.
The expected number of successors in a structure of size N is N/2.
In a DL context, the expected number of successors should be small, but this seems fundamentally incompatible with random worlds.
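
The two conclusions can be reproduced for small N by enumerating all structures over Patient, Hepatitis, hasDisease and the constant eric. The function names below are hypothetical. The first degree of belief stays at 1/2, while that of ∃hasDisease.Hepatitis(eric) equals 1 − (3/4)^N: each of eric's N potential successors is, independently, both a hasDisease-successor and a Hepatitis with probability 1/4. It thus tends to 1, and ∀hasDisease.Hepatitis(eric), at (3/4)^N, tends to 0.

```python
from fractions import Fraction
from itertools import product

def hepatitis_eric(n, hepatitis, has_disease, eric):
    return bool(hepatitis >> eric & 1)

def exists_disease_hepatitis(n, hepatitis, has_disease, eric):
    # eric ∈ ∃hasDisease.Hepatitis
    return any((eric, d) in has_disease and hepatitis >> d & 1 for d in range(n))

def forall_disease_hepatitis(n, hepatitis, has_disease, eric):
    # eric ∈ ∀hasDisease.Hepatitis
    return all((eric, d) not in has_disease or hepatitis >> d & 1 for d in range(n))

def degree_of_belief(n, query):
    """Fraction of all structures with domain {0, ..., n-1} over Patient, Hepatitis
    (unary, as bitmasks), hasDisease (binary) and constant eric that satisfy the
    KB {Patient(eric)} and also the query."""
    pairs = [(a, b) for a in range(n) for b in range(n)]
    models = satisfying = 0
    for edges in range(2 ** len(pairs)):
        has_disease = {pairs[k] for k in range(len(pairs)) if edges >> k & 1}
        for patient, hepatitis, eric in product(range(2 ** n), range(2 ** n), range(n)):
            if not patient >> eric & 1:  # KB requires Patient(eric)
                continue
            models += 1
            satisfying += query(n, hepatitis, has_disease, eric)
    return Fraction(satisfying, models)

for n in (1, 2, 3):
    print(n, degree_of_belief(n, hepatitis_eric),
          degree_of_belief(n, exists_disease_hepatitis), 1 - Fraction(3, 4) ** n)
```

Brute force is only feasible for tiny N (2^(N²) relations), but the closed form makes the limit behaviour explicit.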

SLIDE 34

Question

Can we find a reasonable and computationally feasible semantics for computing degrees of belief in DL knowledge bases?
This is a challenge even for very simple DLs such as EL and DL-Lite.
Maybe use [KollerHalpernAAAI96], where uniform distributions are weakened (to distributions that are exchangeable + fully independent).
Another approach is [LukasiewiczAIJ07], but see [KlinovParsiaSattler09].

SLIDE 35

Many more Probabilistic DLs

[Heinsohn1994] [Jaeger1994] [KollerLevyPfeiffer1997] [Yelland2000] [d’AmatoFanizziLukasiewicz2008] [CeylanPenaloza2014] [MauaCozman2015] [RiguzziBellodiLammaZese2015] [PenalozaPotyka2016] [Lukasiewicz2008] [KlinovParsia2013] [GottlobLukasiewiczMartinezSimari2013] [Lukasiewicz2007]

SLIDE 36

Thank You!
