Reasoning about Disclosure in Data Integration in the Presence of Source Constraints
DIG Seminar 21/11/19
Michael Benedikt, Pierre Bourhis, Louis Jachiet, Michaël Thomazo
Louis JACHIET 1 / 32
Motivations
[Figure labels: Schema+Constraints, Secret, publication]
Safe publication? Secret leaked?
Example: Hospital setting
Entities: Patients, Doctors, Buildings, Specialties, Open hours
Relations: DocSpec, PatSpec, PatDoc, DocBldg, PatBldg, IsOpen
Example: database schema
Predicate        Meaning
IsOpen(b, t)     Building b is open on Date t
PatBldg(p, b)    Patient p is present in Building b
PatSpec(p, s)    Patient p was treated for Specialty s
PatDoc(p, d)     Patient p was treated by Doctor d
DocBldg(d, b)    Doctor d is associated with Building b
DocSpec(d, s)    Doctor d is associated with Specialty s
Example: views, constraints and secret
Views
  OpenHours(b, t) = IsOpen(b, t)
  VisitingHours(p, t) = PatBldg(p, b) ∧ IsOpen(b, t)
  DocList(d, s, b) = DocSpec(d, s) ∧ DocBldg(d, b)
Constraints
  PatDoc(p, d) → ∃s PatSpec(p, s) ∧ DocSpec(d, s)
  PatBldg(p, b) → ∃d PatDoc(p, d) ∧ DocBldg(d, b)
Secret
  ∃p, s PatSpec(p, s)?
Example: published data
OpenHours
  B1       Tuesday
  B2       Every day 10-17h
VisitingHours
  Charline Tuesday
DocList
  Alice    Cancer      B1
  Alice    Cancer      B2
  Bob      Radiology   B2
  Daniel   Cancer      B1
Constraints
  PatBldg(p, b) → ∃d PatDoc(p, d) ∧ DocBldg(d, b)
  PatDoc(p, d) → ∃s PatSpec(p, s) ∧ DocSpec(d, s)
Views
  OpenHours(b, t) = IsOpen(b, t)
  VisitingHours(p, t) = PatBldg(p, b) ∧ IsOpen(b, t)
  DocList(d, s, b) = DocSpec(d, s) ∧ DocBldg(d, b)
Formalism
Data is represented by databases: R(1, 17), R(2, 42), S(23, 45), ...
Mappings and secrets are conjunctive queries (CQ): V(x, z) := R(x, y) ∧ S(y, z)
Constraints are tuple-generating dependencies (TGD): R(x, y) → ∃z, S(y, z)
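The three ingredients of the formalism can be sketched in a few lines of Python. This is only an illustration: the encoding of facts as (relation, tuple) pairs, the function names, and the two extra S facts in `db2` are assumptions made for the example, not part of the talk.

```python
# Facts are (relation, tuple) pairs; this data comes from the slide.
db = {("R", (1, 17)), ("R", (2, 42)), ("S", (23, 45))}

def view_v(db):
    """Evaluate the CQ V(x, z) := R(x, y) AND S(y, z) with a nested-loop join."""
    return {
        (x, z)
        for rel_r, (x, y) in db if rel_r == "R"
        for rel_s, (y2, z) in db if rel_s == "S"
        if y == y2
    }

def satisfies_tgd(db):
    """Check the TGD R(x, y) -> exists z. S(y, z)."""
    s_first = {t[0] for rel, t in db if rel == "S"}
    return all(t[1] in s_first for rel, t in db if rel == "R")

# Hypothetical extra facts (not on the slide) that repair the TGD violation.
db2 = db | {("S", (17, 5)), ("S", (42, 7))}

print(view_v(db), satisfies_tgd(db))
print(view_v(db2), satisfies_tgd(db2))
```

On the slide's three facts the view is empty and the TGD is violated, because neither 17 nor 42 starts an S fact; adding the two hypothetical S facts satisfies the TGD and makes the view non-empty.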
Safe publication
[Figure labels: Secret, No secret]
View Problem
Given (schema, constraints C, views V, secret S, visible instance J), is there an instance I such that C(I), V(I) = J and ¬S(I)?
Schema Problem
Given (schema, constraints C, views V, secret S), is there, for every instance I, an instance I′ such that C(I′), V(I′) = V(I) and ¬S(I′)?
Which configurations are decidable/tractable for the schema problem?
Secrets: CQ
Views: CQ (ProjMap, AtomMap, GuardedMap, CQMap)
Constraints: let's talk about constraints...
Ontologies in a few words
An ontology represents entities and their relationships to each other.
Ontologies can be seen as a sort of expressive schema.
Ontologies make it possible to enrich data by inferring new facts from existing ones.
An example of ontology
In plain words: All cats are mammals. All mammals are animals.
With Tuple Generating Dependencies:
  CAT(x) → MAMMAL(x)
  MAMMAL(x) → ANIMAL(x)
Database: {CAT(a)} (a is a constant)
Query: Are there animals? (∃X, ANIMAL(X)?)
Answer: Yes: CAT(a) ⇒ MAMMAL(a) ⇒ ANIMAL(a)
More complex ontological rules
Foreign key constraint:
  SEMINAR(team, speaker, room, date) → ∃pers, RESERVED(room, date, pers)
Even more complex constraints:
  SEMINAR(team, speaker, room, date) → ∃pers, RESERVED(room, date, pers) ∧ MEMBER(pers, team)
  MEMBER(person, team) → ∃date, room, SEMINAR(team, person, room, date)
Tuple Generating Dependencies
∀X, Y ϕ(X, Y) ⇒ ∃Z ψ(Y, Z)
The universal quantifier ∀X, Y is often omitted: ϕ(X, Y) ⇒ ∃Z ψ(Y, Z)
Body: ϕ(X, Y)
Head: ∃Z ψ(Y, Z)
Frontier: Y, the variables shared by the body and the head
Open World Query Answering (OWQA)
Do we have, for all instances I: (F ⊆ I ∧ C(I)) ⇒ Q(I)?
OWQA(F, C, Q) asks whether Q is true in all completions of F that respect C.
The chase algorithms build a universal model.
OWQA(F, C, Q) ⇔ Chase(F, C) ⊨ Q
The Chase
Intuitively, the chase simply "applies" the constraints.
With:
  F = {CAT(a)} (a is a constant)
  C = {CAT(X) → MAMMAL(X), MAMMAL(X) → ANIMAL(X)}
We obtain:
  {CAT(a)}
  {CAT(a), MAMMAL(a)}
  {CAT(a), MAMMAL(a), ANIMAL(a)}
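The saturation behind this chase example can be sketched as a fixpoint loop. The sketch below is deliberately restricted to rules of the form P(x) → Q(x), with no existential variables, so it always terminates; the constant name "a" and the function name `chase_unary` are assumptions made for the illustration.

```python
def chase_unary(facts, rules):
    """Saturate a set of unary facts under rules of the form P(x) -> Q(x)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Iterate over a snapshot so the set can grow inside the loop.
            for pred, const in list(facts):
                if pred == body and (head, const) not in facts:
                    facts.add((head, const))
                    changed = True
    return facts

# The two rules of the cat ontology, encoded as (body predicate, head predicate).
rules = [("CAT", "MAMMAL"), ("MAMMAL", "ANIMAL")]
model = chase_unary({("CAT", "a")}, rules)

# OWQA for the Boolean query "exists X. ANIMAL(X)" is a lookup in the chase result.
has_animal = any(pred == "ANIMAL" for pred, _ in model)
print(sorted(model), has_animal)
```

Because these rules create no new values, the result is a finite universal model and the query can be answered by inspecting it directly.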
The Chase
With:
  C = {..., PARENT(X, Y) → PERSON(Y)}
We obtain:
  {..., PARENT(Y, Y′)}
  ...
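A chase that never stops can be illustrated with a bounded run. The rule set below is an assumption chosen for the illustration (only PARENT(X, Y) → PERSON(Y) is legible on the slide): PERSON(x) → ∃y PARENT(x, y) together with PARENT(x, y) → PERSON(y). The labelled nulls `_n1`, `_n2`, ... and the function name are also hypothetical.

```python
from itertools import count

def bounded_chase(facts, max_rounds):
    """Run at most max_rounds chase rounds; return (facts, reached_fixpoint)."""
    facts = set(facts)
    fresh = count(1)  # supplies fresh labelled nulls for the existential rule
    for _ in range(max_rounds):
        new = set()
        for fact in facts:
            if fact[0] == "PERSON":
                x = fact[1]
                # Assumed rule: PERSON(x) -> exists y. PARENT(x, y).
                # Fire it only if x has no parent yet (restricted chase).
                if not any(f[0] == "PARENT" and f[1] == x for f in facts):
                    new.add(("PARENT", x, f"_n{next(fresh)}"))
            elif fact[0] == "PARENT":
                # Rule from the slide: PARENT(x, y) -> PERSON(y).
                if ("PERSON", fact[2]) not in facts:
                    new.add(("PERSON", fact[2]))
        if not new:
            return facts, True
        facts |= new
    return facts, False

facts, done = bounded_chase({("PERSON", "a")}, max_rounds=6)
print(len(facts), done)
```

Every round adds exactly one fact, so raising `max_rounds` only produces a longer chain of parents and the fixpoint is never reached.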
The Chase
The chase model is not always finite.
When it is not finite, it sometimes has a regularity that allows for decidable OWQA.
In general, however, OWQA is undecidable...
Decidable Classes of TGD
A(x, Y) → ∃Z, B(x, Z)
A(X, Y) → ∃Z, B(X, Z)
A(x, x, y) → ∃z, B(x, y, y, z)
A(x, y, z) ∧ B(x) ∧ C(y, z) → ∃w, D(x, y, w)
A(w, y, y) ∧ B(x) ∧ C(y, z) → ∃u, D(x, y, u)
A(w, y, y) ∧ B(x) ∧ C(y, z) → ∃u, D(x, u)
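The syntactic classes that appear later in the complexity tables (LTGD, GTGD, FGTGD) can be tested mechanically. The sketch below uses the standard definitions: linear means a single body atom, guarded means some body atom contains every body variable, and frontier-guarded means some body atom contains every frontier variable. The slide text does not say which class each rule illustrates, so the encoding and the labels printed here are assumptions of this sketch.

```python
def variables(atoms):
    """Collect the variables of a list of atoms (predicate, argument tuple)."""
    return {v for _, args in atoms for v in args}

def classify(body, head):
    """Return the syntactic classes a TGD belongs to."""
    body_vars = variables(body)
    frontier = body_vars & variables(head)  # variables shared by body and head
    classes = []
    if len(body) == 1:
        classes.append("linear")
    if any(body_vars <= set(args) for _, args in body):
        classes.append("guarded")
    if any(frontier <= set(args) for _, args in body):
        classes.append("frontier-guarded")
    return classes

# Four of the rules listed on the slide.
t_single = ([("A", ("x", "x", "y"))], [("B", ("x", "y", "y", "z"))])
t_guard = ([("A", ("x", "y", "z")), ("B", ("x",)), ("C", ("y", "z"))],
           [("D", ("x", "y", "w"))])
t_none = ([("A", ("w", "y", "y")), ("B", ("x",)), ("C", ("y", "z"))],
          [("D", ("x", "y", "u"))])
t_front = ([("A", ("w", "y", "y")), ("B", ("x",)), ("C", ("y", "z"))],
           [("D", ("x", "u"))])

for name, tgd in [("t_single", t_single), ("t_guard", t_guard),
                  ("t_none", t_none), ("t_front", t_front)]:
    print(name, classify(*tgd))
```

Under these definitions the fifth rule on the slide (`t_none`) falls in none of the three classes, since no body atom contains both x and y, while the last rule (`t_front`) is frontier-guarded by B(x).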
Another approach: query rewriting
With:
  C = {CAT(X) → MAMMAL(X), MAMMAL(X) → ANIMAL(X)}
And the query ∃X, ANIMAL(X), we obtain:
  ∃X, ANIMAL(X) ∨ ∃X, MAMMAL(X) ∨ ∃X, CAT(X)
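Query rewriting moves the reasoning from the data to the query. The sketch below assumes the constraints are the two cat rules of the ontology example and handles only atomic Boolean queries over rules of the form P(x) → Q(x); the function names and the sample databases are hypothetical.

```python
def rewrite(query_pred, rules):
    """Collect every predicate P such that 'exists x. P(x)' entails the query."""
    preds = {query_pred}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Apply the rule backwards: if the head is a target, so is the body.
            if head in preds and body not in preds:
                preds.add(body)
                changed = True
    return preds

def answer(db, preds):
    """Evaluate the rewritten union directly on the raw data, without a chase."""
    return any(pred in preds for pred, _ in db)

rules = [("CAT", "MAMMAL"), ("MAMMAL", "ANIMAL")]
rewritten = rewrite("ANIMAL", rules)
print(sorted(rewritten))
print(answer({("CAT", "a")}, rewritten), answer({("PLANT", "b")}, rewritten))
```

The rewritten query is a union of three atomic queries, so it can be evaluated on the original database with no inference at query time.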
Back to our problems
View Problem
Given (schema, constraints C, views V, secret S, visible instance J), is there an instance I such that C(I), V(I) = J and ¬S(I)?
Schema Problem
Given (schema, constraints C, views V, secret S), is there, for every instance I, an instance I′ such that C(I′), V(I′) = V(I) and ¬S(I′)?
Solving the schema problem
The critical instance
The critical instance I_C contains one fact per relation, built from a single constant: C.
We write J_C = V(I_C) for its view image.
Reduction for the schema problem
SchemaProblem(C, V, S) reduces to ViewProblem(J_C, C, V, S)
From Querying Visible and Invisible Information. LICS 2016
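For the hospital schema, the critical instance and its view image can be computed directly. This is a sketch: the `SCHEMA` dictionary, the function names, and the encoding of the constant as the string "C" are assumptions, while the three views are the ones defined in the example.

```python
# Relation names and arities of the hospital schema from the example.
SCHEMA = {"IsOpen": 2, "PatBldg": 2, "PatSpec": 2,
          "PatDoc": 2, "DocBldg": 2, "DocSpec": 2}

def critical_instance(schema, const="C"):
    """One fact per relation, using a single constant everywhere."""
    return {(rel, (const,) * arity) for rel, arity in schema.items()}

def tuples_of(inst, name):
    """All tuples of relation `name` in the instance."""
    return {t for rel, t in inst if rel == name}

def view_image(inst):
    """Evaluate the three views of the hospital example on an instance."""
    is_open = tuples_of(inst, "IsOpen")
    visiting = {(p, t) for p, b in tuples_of(inst, "PatBldg")
                for b2, t in is_open if b == b2}
    doc_list = {(d, s, b) for d, s in tuples_of(inst, "DocSpec")
                for d2, b in tuples_of(inst, "DocBldg") if d == d2}
    return {"OpenHours": is_open, "VisitingHours": visiting, "DocList": doc_list}

crit = critical_instance(SCHEMA)
image = view_image(crit)
print(len(crit), image)
```

Every view is non-empty on the critical instance and contains exactly one tuple made of the constant C, which is why the active domain of the view image is {C}.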
Reduction to Open World Query Answering (OWQA)
Open World Query Answering
Do we have, for all instances I: (F ⊆ I ∧ C(I)) ⇒ Q(I)?
Encoding ViewProblem(J_C, C, V, S) as OWQA, where J_C is the view image of the critical instance
J_C ⊆ V(I)
But we also need to encode the backward constraints V(I) ⊆ J_C!
For this we use that adom(V(I)) = {C}
Lower bounds
Various reductions from:
Complexity results
Constraints \ Views   ProjMap    AtomMap    GuardedMap   CQMap
IncDep                PSpace     ExpTime    2ExpTime     2ExpTime
LTGD                  ExpTime    ExpTime    2ExpTime     2ExpTime
GTGD                  2ExpTime   2ExpTime   2ExpTime     2ExpTime
FGTGD                 2ExpTime   2ExpTime   2ExpTime     2ExpTime
Table 1: Complexity of disclosure
⇒ all bounds are tight!

Constraints \ Views   ProjMap    AtomMap    GuardedMap   CQMap
IncDep                NP         NP         ExpTime      2ExpTime
LTGD                  NP         NP         ExpTime      2ExpTime
GTGD                  ExpTime    ExpTime    ExpTime      2ExpTime
FGTGD                 2ExpTime   2ExpTime   2ExpTime     2ExpTime
Table 2: Complexity of disclosure in bounded arity
⇒ all bounds are tight!

In PTime:
Future Works
Thank you!
Questions?