Theory of Peer Data Management Sebastian Skritek Database and - - PowerPoint PPT Presentation



SLIDE 1

Theory of Peer Data Management

Sebastian Skritek

Database and Artificial Intelligence Group Vienna University of Technology

DEIS 2010

S.Skritek – Theory of PDM 1/54

SLIDE 3

1.Motivation 1.1. Motivation

Motivation

From Data Integration to Peer Data Integration

[Figure: databases DB1, DB2, DB3 integrated under a global schema G, next to DB1, DB2, DB3 as peers]

Extend semantics from data integration, BUT:

  • query answering may become undecidable
  • some tractable fragments are very restrictive
  • further undesired properties

⇒ several suggestions made for semantics of mappings

S.Skritek – Theory of PDM 2/54

SLIDE 4

1.Motivation 1.1. Motivation

Motivation

⇒ use “tools” from data exchange and data integration (but they are not completely satisfactory)

Additional problems (compared to DEI):

  • Modularity of peers
  • Inconsistencies, Updates
  • Trust

Peer Data Management covers a variety of scenarios

⇒ take a look at the theory behind some of these systems (formal semantics, decidability, complexity, . . . )

S.Skritek – Theory of PDM 3/54

SLIDE 5

1.Motivation 1.1. Motivation

Talk Outline

  • 1. Motivation
  • 2. Query Answering in Peer Data Management
  • 3. Materialization of Data in Peer Data Management
  • 4. Optimization of Query Reformulation
  • 5. Conclusion

S.Skritek – Theory of PDM 4/54

SLIDE 6

2.Query Answering in Peer Data Management 2.0.

Outline

  • 1. Motivation
  • 2. Query Answering in Peer Data Management

      2.1 General Framework
      2.2 PPL
      2.3 An Epistemic Logic Approach

  • 3. Materialization of Data in Peer Data Management
  • 4. Optimization of Query Reformulation
  • 5. Conclusion

S.Skritek – Theory of PDM 5/54

SLIDE 12

2.Query Answering in Peer Data Management 2.1. General Framework

A Framework for Peer Data Integration

[Figure: peers P1, . . . , P8 with peer schemas G1, . . . , G8; peers P1, . . . , P5 also have source schemas S1, . . . , S5; queries q1, q2 are posed to individual peers]

Each peer P = (G, S, L, M) consists of

  • a peer schema G
  • a (possibly empty) local/source schema S
  • a (possibly empty) set of local mappings L: {cq_S ⇝ cq_G}
  • a set of peer mappings M: {cq_P′ ⇝ cq_P}

Queries q are posed over the peer schema of a single peer

  • data remains in sources, queries (and results) are propagated

S.Skritek – Theory of PDM 6/54

SLIDE 13

2.Query Answering in Peer Data Management 2.2. PPL

PPL (Peer Programming Language)

[Halevy et al., VLDB J. 2005]

[Figure: peer P with peer schema G, source schema S, local mappings L and peer mappings M; P = (G, S, L, M)]

Definition

Local Mappings L:

  • $P : r \subseteq cq$ ($P : r = cq$)

Peer Mappings M:

  • $cq'_{P'} \subseteq cq_P$ ($cq'_{P'} = cq_P$): inclusion/equality mappings
  • $r_P(\bar{x})$ :- $cq_{P'}(\bar{x})$: definitional mappings

G, S:

  • relational schemas

(Note: mappings only between pairs of peers)

S.Skritek – Theory of PDM 7/54

SLIDE 14

2.Query Answering in Peer Data Management 2.2. PPL

PPL: Semantics

Definition (consistent data instance)

Let N be a PDMS, D an instance for S. Instance I for G is consistent with N and D if

  • for every m ∈ L: $r^D \subseteq cq^I$ (resp. $r^D = cq^I$)
  • for every m ∈ M either
      – $cq'^{I}_{P'} \subseteq cq^{I}_{P}$ (resp. $cq'^{I}_{P'} = cq^{I}_{P}$), or
      – $r(\bar{x})^I = body(m_1)^I \cup \dots \cup body(m_n)^I$, where $r = head(m)$ and $\{m \in M \mid head(m) = r\} = \{m_1, \dots, m_n\}$

certain answers to query $q(\bar{x})$: tuples $\bar{t}$ s.t. $\bar{t} \in q(\bar{x})^I$ for every consistent instance I
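The definition of certain answers can be illustrated by a small sketch. It assumes, purely for illustration, that the consistent instances are available as a finite list (in general there may be infinitely many); the names `certain_answers`, `instances` and the relation `R` are hypothetical and not part of PPL.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Row = Tuple[str, ...]
Instance = Dict[str, Set[Row]]          # relation name -> set of tuples
Query = Callable[[Instance], Set[Row]]  # a query maps an instance to its answers


def certain_answers(query: Query, instances: Iterable[Instance]) -> Set[Row]:
    """Intersect the query answers over all given consistent instances."""
    result: Optional[Set[Row]] = None
    for inst in instances:
        answers = query(inst)
        result = answers if result is None else result & answers
    # Simplification: with no consistent instance at all we return the empty set.
    return result if result is not None else set()


# Two hypothetical consistent instances that agree only on the tuple ("a",).
instances = [
    {"R": {("a",), ("b",)}},
    {"R": {("a",), ("c",)}},
]
q = lambda inst: inst["R"]
print(certain_answers(q, instances))  # {('a',)}
```

Only tuples returned in every instance survive the intersection, which is why `("b",)` and `("c",)` are dropped.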

S.Skritek – Theory of PDM 8/54

SLIDE 15

2.Query Answering in Peer Data Management 2.2. PPL

PPL: First Order Interpretation

PDMS P = {P1, . . . , Pn} (consider only inclusion storage descriptions)

→ Define semantics in terms of FO logic:

  • $\forall \bar{x}\, (r(\bar{x}) \rightarrow \exists \bar{z}\, \psi_G(\bar{x}, \bar{z}))$ (for each m ∈ L_i)
  • $\forall \bar{x}\, (\exists \bar{y}\, \varphi_{P_i}(\bar{x}, \bar{y}) \rightarrow \exists \bar{z}\, \psi_{P_j}(\bar{x}, \bar{z}))$ (for each m ∈ M)
  • allow only restricted inclusion peer mappings,
  • use disjunctive TGDs for definitional mappings

⇒ certain answers w.r.t. D: answer in every model I of P

FO theory $T_{P_i}$ for $P_i$, $T_P = \bigcup_{P_i \in P} T_{P_i}$ for P

Models for theories (given instances $D_i$ for $S_i$):

  • Model of $T_{P_i}$ ($T_P$) based on $D_i$ ($D = \bigcup_i D_i$): interpretation I of $T_{P_i}$ ($T_P$) s.t. $s^I = s^{D_i}$ (for each $s \in S_i$)
  • Model of P based on D: model of $T_P$ based on D and of mappings M

S.Skritek – Theory of PDM 9/54

SLIDE 16

2.Query Answering in Peer Data Management 2.2. PPL

PPL: Complexity

Theorem (Halevy et al., 2005)

Let N be a PDMS specified in PPL

  1. Finding all certain answers to a CQ q is undecidable
  2. If N contains only inclusion peer and storage descriptions and the peer mappings are acyclic ⇒ CQ answering in polynomial time (data complexity)

Proof (sketch).

(2) Encode query and mappings in a nonrecursive datalog program with Skolem terms ⇒ evaluation PTIME (data complexity).

S.Skritek – Theory of PDM 10/54

SLIDE 17

2.Query Answering in Peer Data Management 2.2. PPL

PPL: Complexity (contd.)

Finding all certain answers to a CQ q is undecidable

Proof (sketch).

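(1) Shown by reduction from the implication problem for FDs and IDs:

Given $\mathbf{R}$, $\Sigma$, $\varphi = R_i[A] \subseteq R_j[B]$ ⇒ $N = \{P_1\}$, with $P_1 = (\mathbf{R}, \{S/1\}, \{S \subseteq R_i[A]\}, M)$, and M:

  • for FD $R_i : \bar{A} \rightarrow B$: $\{(\bar{A}, B_1, B_2) \mid R_i(\bar{A}, B_1), R_i(\bar{A}, B_2)\} \subseteq \{(\bar{A}, B, B) \mid R_i(\bar{A}, B)\}$
  • for ID $R_i[\bar{A}] \subseteq R_j[\bar{B}]$: $R_i[\bar{A}] \subseteq R_j[\bar{B}]$

Then let $I = \{S(a)\}$, and $q : \{R_j[B]\}$. It holds that $\Sigma \models \varphi$ iff q returns a.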

S.Skritek – Theory of PDM 11/54

SLIDE 18

2.Query Answering in Peer Data Management 2.2. PPL

PPL: Complexity (contd.)

Consider the following restrictions:

  1. equality storage or peer mappings do not contain projection
  2. peer relations that appear in the head of a definitional mapping do not appear on the rhs of any other mapping

Theorem (Halevy et al., VLDB J. 2005)

All inclusion peer mappings acyclic, but equality peer mappings ⇒ CQ answering is (data complexity)

  • If (1) and (2) ⇒ in PTIME
  • If (1) but not (2) ⇒ coNP complete
  • If (2) but not (1) ⇒ coNP complete

S.Skritek – Theory of PDM 12/54

SLIDE 20

2.Query Answering in Peer Data Management 2.2. PPL

PPL: Query Answering

Consider again the mappings:

Peer Mappings M:

  • $cq'_{P'} = cq_P$ ⇒ $cq'_{P'} \subseteq cq_P$ and $cq_P \subseteq cq'_{P'}$
  • $cq'_{P'} \subseteq cq_P$ ⇒ $v(\bar{x}) \subseteq cq_P$ and $v(\bar{x})$ :- $cq'_{P'}$
  • $r_P(\bar{x})$ :- $cq_{P'}(\bar{x})$

⇒ pure LAV and GAV mappings

combine methods for answering queries in these settings:

  • unfolding
  • algorithms for answering queries using views

build a rule/goal tree, derive a UCQ over S from it ⇒ sound, and for polynomial cases also complete

S.Skritek – Theory of PDM 13/54

SLIDE 25

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Query Answering: First Order Reasoning

query answering: FO reasoning over P

[Figure: peers Citizen(P), Male(P), Female(P), TaxPayer(P) connected by mappings m1, . . . , m4. Starting from Citizen(alice), one case derives Female(alice), the other Male(alice); both cases lead to TaxPayer(alice)]

Mappings:

  • m1 : Citizen(x) :- Male(x)
  • m2 : Citizen(x) :- Female(x)
  • m3 : Male(x) ⊆ TaxPayer(x)
  • m4 : Female(x) ⊆ TaxPayer(x)

FO interpretation:

  • m1, m2 : Citizen(x) → Male(x) ∨ Female(x)
  • m3 : Male(x) → TaxPayer(x)
  • m4 : Female(x) → TaxPayer(x)

Example

Consider Query {x | TaxPayer(x)}
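The case analysis behind this example can be checked by brute force. The following is a minimal sketch, assuming the only source fact is Citizen(alice) and restricting attention to the single constant alice; the names `PREDICATES`, `is_model` and `certain` are illustrative only.

```python
from itertools import product

PREDICATES = ["Citizen", "Male", "Female", "TaxPayer"]


def is_model(i: dict) -> bool:
    """Check the FO reading of m1..m4 for the single individual alice."""
    return (
        i["Citizen"]                                             # assumed source fact
        and (not i["Citizen"] or i["Male"] or i["Female"])      # m1, m2
        and (not i["Male"] or i["TaxPayer"])                    # m3
        and (not i["Female"] or i["TaxPayer"])                  # m4
    )


# Enumerate all truth assignments and keep those that satisfy the theory.
candidates = [dict(zip(PREDICATES, bits)) for bits in product([False, True], repeat=4)]
models = [m for m in candidates if is_model(m)]

# A fact about alice is certain if it holds in every model.
certain = {p for p in PREDICATES if all(m[p] for m in models)}
print(sorted(certain))  # ['Citizen', 'TaxPayer']
```

Under the FO reading, TaxPayer(alice) is a certain answer although neither Male(alice) nor Female(alice) is certain: every model picks at least one of the two cases.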

S.Skritek – Theory of PDM 14/54

SLIDE 27

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Epistemic Logic

  • A modal logic used for modeling knowledge, certainty
  • Modal logic is used e.g. in multi-agent systems
  • More precisely: KT45 (or S5)

[Figure: worlds I1, I2, I3, I4]

Syntax: FOL, but also Kφ is an atom (if φ is a formula)

Semantics:

  • Often defined using Kripke structures (W, R, V)
  • Here: every world is accessible from every world
  • epistemic interpretation ε = (I, W), where W is a set of FO interpretations and I ∈ W
  • $a(\bar{x})$ satisfied in ε: by $\bar{t}$ s.t. $a(\bar{t})$ is true in I
  • $K\varphi(\bar{x})$ satisfied in ε: by $\bar{t}$ s.t. $\varphi(\bar{t})$ is satisfied in all ε′ = (J, W) with J ∈ W
  • epistemic model: φ is satisfied in every (J, W) (J ∈ W)
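A small sketch of this semantics for ground formulas may help. It assumes worlds are represented as finite sets of ground atoms and formulas as Python predicates over a world; the names `holds`, `knows` and the two sample worlds are hypothetical.

```python
from typing import Callable, FrozenSet, List

World = FrozenSet[str]             # a world is the set of ground atoms true in it
Formula = Callable[[World], bool]  # a ground formula is evaluated against one world


def holds(phi: Formula, world: World) -> bool:
    """Objective formula: evaluated in the current interpretation I only."""
    return phi(world)


def knows(phi: Formula, worlds: List[World]) -> bool:
    """K phi: phi must be satisfied in every world J in W."""
    return all(phi(w) for w in worlds)


W = [
    frozenset({"Citizen(alice)", "Male(alice)"}),
    frozenset({"Citizen(alice)", "Female(alice)"}),
]
I = W[0]

male = lambda w: "Male(alice)" in w
female = lambda w: "Female(alice)" in w
male_or_female = lambda w: male(w) or female(w)

print(holds(male, I))            # True: Male(alice) is true in I
print(knows(male, W))            # False: not true in every world
print(knows(male_or_female, W))  # True: the disjunction is known
```

The last two lines show the point of the K operator: a disjunction can be known without either disjunct being known.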

S.Skritek – Theory of PDM 15/54

SLIDE 30

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Modeling PDM [Calvanese et al., 2004]

[Figure: peers P1, . . . , P8 with peer schemas G1, . . . , G8 and source schemas S1, . . . , S5; queries q1, q2]

Peer schema: G may contain function-free FO formulas over $A_G$

Epistemic Theory:

$T_P$:

  • formulas in G
  • $\forall \bar{x}\, (\exists \bar{y}\, \varphi_S(\bar{x}, \bar{y}) \rightarrow \exists \bar{z}\, \psi_G(\bar{x}, \bar{z}))$ (for each m ∈ L)

$M_P$:

  • axioms $\forall \bar{x}\, (K(\exists \bar{y}\, \varphi(\bar{x}, \bar{y})) \rightarrow \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$ (for each m ∈ M)

Semantics:

Recall: FOL model of $T_P$ based on D

Epistemic model of P based on D: (I, W)

  • W: set of models of $T_P$ based on D
  • (I, W): epistemic model of $M_P$

Certain answers w.r.t. D: $q^I$ for all epistemic models (I, W)

S.Skritek – Theory of PDM 16/54

SLIDE 32

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Properties of Epistemic Logic Based Semantics

(denote certain answers w.r.t. source instance D as ans(q, P, D))

sound approximation of FOL: $ans_K(q, P, D) \subseteq ans_{fol}(q, P, D)$

Unique Maximal Epistemic Model for P: (I, W) s.t. there exists no model (J, W′) with W ⊂ W′

  • Unique, independent of I

⇒ $ans_K(q, P, D) = \{\bar{t} \mid \bar{t} \in q^I \text{ for each } I \in W\}$

FOE(P, D): minimal FO theory containing $T_P$, D, and

  • for each $cq' \leadsto cq$, if $FOE(P, D) \models \exists \bar{y}\, body_{cq'}(\bar{t}, \bar{y})$, then $\exists \bar{z}\, body_{cq}(\bar{t}, \bar{z}) \in FOE(P, D)$

Theorem (Calvanese et al., 2004)

The set of interpretations $\{I \mid I \models FOE(P, D)\}$ is the unique maximal epistemic model W for P based on D.

S.Skritek – Theory of PDM 17/54

SLIDE 34

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Intuition: Only exchange certain answers

Definition (τ(P))

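Given $P_i = (G, S, L, M)$, define $\tau(P_i) = (G, S', L', M)$ where

  1. $S' = S \cup \{r \mid cq' \leadsto cq \in M\}$
  2. $L' = L \cup \{\{\bar{x} \mid r(\bar{x})\} \leadsto cq \mid cq' \leadsto cq \in M\}$

[Figure: peer with schemas G and S and peer mappings cq′1 ⇝ cq1, cq′2 ⇝ cq2, together with new source relations r1, r2 and local mappings r1 → cq1, r2 → cq2]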
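The construction of τ(P) can be sketched as a plain data transformation. This is only an illustration under several assumptions: queries are treated as opaque strings, each peer mapping gets one fresh relation name r1, r2, . . . that is assumed not to clash with S, and the `Peer` class and `tau` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Mapping = Tuple[str, str]  # (cq', cq), read as cq' ~> cq; queries are opaque strings


@dataclass(frozen=True)
class Peer:
    G: FrozenSet[str]       # peer schema (relation names)
    S: FrozenSet[str]       # source schema (relation names)
    L: FrozenSet[Mapping]   # local mappings
    M: Tuple[Mapping, ...]  # peer mappings, ordered so fresh names are stable


def tau(peer: Peer) -> Peer:
    """Add one fresh source relation and one local mapping per peer mapping."""
    fresh = [f"r{k}" for k in range(1, len(peer.M) + 1)]
    # The fresh relation r stands for the query {x | r(x)} mapped to cq.
    new_local = {(r, cq) for r, (_cq_prime, cq) in zip(fresh, peer.M)}
    return Peer(
        G=peer.G,
        S=peer.S | frozenset(fresh),
        L=peer.L | frozenset(new_local),
        M=peer.M,  # kept as in the definition above
    )


p = Peer(
    G=frozenset({"g"}),
    S=frozenset({"s"}),
    L=frozenset({("s", "g")}),
    M=(("cq1'", "cq1"), ("cq2'", "cq2")),
)
t = tau(p)
print(sorted(t.S))  # ['r1', 'r2', 's']
```

The fresh relations are the places where certain answers received from other peers would be stored.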

S.Skritek – Theory of PDM 18/54

SLIDE 35

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Intuition: Only exchange certain answers (contd.)

[Figure: peer with schemas G and S, peer mappings cq′1 ⇝ cq1, cq′2 ⇝ cq2, relations r1, r2 with r1 → cq1, r2 → cq2, source instance D and query q]

Given P, $P_i = (G, S, L, M)$, D and query q over G:

Let $\bar{D}$ be source instance for $\tau(P_i)$ s.t.

  • $S^{\bar{D}} = S^D$ and $r^{\bar{D}} = ans(cq', P, D)$

We want $ans(q, P, D) = ans(q, \tau(P), \bar{D})$

provides: modularity and independence

S.Skritek – Theory of PDM 19/54

SLIDE 38

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Intuition: Only exchange certain answers (contd.)

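Recall intuition: $ans(q, P, D) = ans(q, \tau(P), \bar{D})$

for $cq' \leadsto cq \in M$:

  • $r \in S'$, $\{\bar{x} \mid r(\bar{x})\} \leadsto cq \in L'$
  • $r^{\bar{D}} = ans(cq', P, D)$

Further recall: $ans_K(cq', P, D) = \{\bar{t} \mid \bar{t} \in cq'^{I} \text{ for each } I \in W\}$

  • for W: maximal epistemic model

axiom $\forall \bar{x}\, (K(\exists \bar{y}\, body_{cq'}(\bar{x}, \bar{y})) \rightarrow \exists \bar{z}\, body_{cq}(\bar{x}, \bar{z}))$

Hence (informal)

  • $ans_K(cq', P, D) = \{\bar{t} \mid \text{for each } I \in W: \exists \bar{y}: body_{cq'}(\bar{t}, \bar{y}) \in I\}$
  • $K(\exists \bar{y}\, body_{cq'}(\bar{x}, \bar{y}))$ satisfied by tuples $\{\bar{t} \mid \text{in each } I \in W: \exists \bar{y}: body_{cq'}(\bar{t}, \bar{y}) \in I\}$

⇒ P “imports” the same tuples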

S.Skritek – Theory of PDM 20/54

SLIDE 41

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Query Answering

use this idea for query answering ⇒ always consider τ(P)

perfect reformulation

Given query q over $G_i$ ⇒ query $q_1$ over $S'_i$ s.t. for every instance $D_1$ for τ(P), $q_1^{D_1} = ans(q, \tau(P), D_1)$

(assume settings where perfect reformulation always exists)

Idea of the Algorithm

Compute a datalog program DP, containing

  • facts from S
  • rules encoding perfect reformulations to S′

Theorem (Calvanese et al., 2004)

  1. $Eval(head_q, DP)$ computes $ans_K(q, P, D)$
  2. Given P, q, $\bar{t}$, deciding $\bar{t} \in ans_K(q, P, D)$ is PTIME-complete (data complexity)

S.Skritek – Theory of PDM 21/54

SLIDE 42

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Query Answering: Algorithm

query answering algorithm

at P_i: peerQueryHandler(q, r_q)
(1)  DP_I = computePerfectRef(q, r_q, P_i); DP_E = ∅
(2)  for each r ∈ S′_i ∩ DP_I:
(3)    if r ∈ S (∗)
(3a)     DP_E = DP_E ∪ {r(t) | r(t) ∈ D}
       else (r ∈ S′ \ S)
(3b)     DP′ = P′.peerQueryHandler(Q(r), r)
         DP_I = DP_I ∪ DP′_I; DP_E = DP_E ∪ DP′_E
(4)  return DP

(∗) loop detection omitted

[Figure: peer with schemas G and S, peer mappings cq′1 ⇝ cq1, cq′2 ⇝ cq2, relations r1, r2 with r1 → cq1, r2 → cq2, source instance D and query q]

S.Skritek – Theory of PDM 22/54

SLIDE 43

2.Query Answering in Peer Data Management 2.3. An Epistemic Logic Approach

Further nice properties

Decidability depends only on local properties

  • under FOL: also constraints may be propagated by mappings
  • Epistemic Logic: provides complete modularity for peers

Mapping Composition

  • Semantics allows for (reasonable) mapping composition
  • Resulting systems are query equivalent

Inconsistency Handling

Consider two kinds of inconsistency:

  • local inconsistency, P2P inconsistency

Use nonmonotonic extension ($K45^A_n$), model $cq_i \leadsto cq_j$:

$\forall \bar{x}\, (\neg A_i \bot_i \wedge K_i(\exists \bar{y}\, body_{cq_i}(\bar{x}, \bar{y})) \wedge \neg A_j(\neg \exists \bar{z}\, body_{cq_j}(\bar{x}, \bar{z})) \rightarrow K_j(\exists \bar{z}\, body_{cq_j}(\bar{x}, \bar{z})))$

S.Skritek – Theory of PDM 23/54

SLIDE 44

3.Materialization of Data in Peer Data Management 3.0.

Outline

  • 1. Motivation
  • 2. Query Answering in Peer Data Management
  • 3. Materialization of Data in Peer Data Management

      3.1 Reconciling PDM and Data Exchange
      3.2 Active XML
      3.3 Orchestra

  • 4. Optimization of Query Reformulation
  • 5. Conclusion

S.Skritek – Theory of PDM 24/54

SLIDE 45

3.Materialization of Data in Peer Data Management 3.1. Reconciling PDM and Data Exchange

Idea

So far: Peer Data Integration

  • data remains local at peers
  • information needed for query answering is exchanged
  • mappings can be considered as “virtual”

Other possibility: Generalize Data Exchange

  • copy data between different peers
  • interpret mappings as constraints
  • materialize data to satisfy these constraints

→ Look at some approaches following this idea

S.Skritek – Theory of PDM 25/54

SLIDE 51

3.Materialization of Data in Peer Data Management 3.1. Reconciling PDM and Data Exchange

Reconciling Data Exchange and PDM

[De Giacomo et al., PODS 2007]

[Figure: peers P1, . . . , P5]

S = (P, CE, ME)

  • ME: TGDs between pairs of peers
  • CE: TGDs & EGDs over single peer

Semantics:

  • CE: FO semantics
  • ME: exchanges only certain answers

Universal S-solution

S.Skritek – Theory of PDM 26/54

SLIDE 53

3.Materialization of Data in Peer Data Management 3.1. Reconciling PDM and Data Exchange

Reconciling Data Exchange and PDM

[De Giacomo et al., PODS 2007]

[Figure: peers P1, . . . , P5]

S = (P, CE, ME, CI, MI)

  • ME, MI: TGDs between pairs of peers
  • CE, CI: TGDs & EGDs over single peer

Semantics:

  • CE, CI: FO semantics
  • ME, MI: certain answers
  • CI, MI precedence

Universal S-solution

S.Skritek – Theory of PDM 26/54

SLIDE 54

3.Materialization of Data in Peer Data Management 3.2. Active XML

Active XML

Active XML

S.Skritek – Theory of PDM 27/54

SLIDE 55

3.Materialization of Data in Peer Data Management 3.2. Active XML

Active XML

Recall: Active XML

[Figure: XML document]

Not considered yet

  • Formal semantics
      – service call
      – query answering
  • Complexity

[Abiteboul et al., PODS 2004] → consider only monotone Web Services

S.Skritek – Theory of PDM 28/54

SLIDE 58

3.Materialization of Data in Peer Data Management 3.2. Active XML

AXML Document

data nodes, function nodes

Definition (AXML document)

AXML document: pair (T, λ) where

  • T = (N, E): finite, unordered tree
      – $N \subset \mathcal{N}$: finite set of nodes
      – $E \subset N \times N$: directed edges
  • $\lambda: N \rightarrow \mathcal{L} \cup \mathcal{F} \cup \mathcal{V}$: function s.t.
      – $\lambda(n) \in \mathcal{V}$ only if n is a leaf node
      – for root n, $\lambda(n) \in \mathcal{V} \cup \mathcal{L}$

$\mathcal{D}$: document names, $\mathcal{N}$: nodes, $\mathcal{L}$: labels, $\mathcal{V}$: atomic values, $\mathcal{F}$: function names

function call: pass subtree as parameter; get forest as return value ⇒ append as siblings to call node

S.Skritek – Theory of PDM 29/54

SLIDE 60

3.Materialization of Data in Peer Data Management 3.2. Active XML

Reduced Documents

Definition

(T1, λ1) is subsumed by (T2, λ2), written (T1, λ1) ⊆ (T2, λ2), if there exists a mapping h: N1 → N2 s.t.:

  • h(root(T1)) = root(T2)
  • n1 child of n2 ⇒ h(n1) child of h(n2) (for all n1, n2 ∈ N1)
  • λ1(n) = λ2(h(n)) (for all n ∈ N1)

d1 ⊆ d2 and d2 ⊆ d1 ⇒ d1 ≡ d2

→ Document d is reduced if for no proper subtree d′ of d, d ≡ d′

Properties:

  • Each document has a unique reduced version
  • Decision and function problem solvable in PTIME
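Subsumption and reduction can be sketched for small trees as follows. The sketch assumes a tree is a pair of a label and a tuple of child trees, and that the mapping h need not be injective, as the definition suggests; `subsumed` and `reduce_tree` are illustrative names, and the naive recursion is not tuned for efficiency.

```python
from typing import Tuple

Tree = Tuple[str, tuple]  # (label, children), children are Trees; order is irrelevant


def subsumed(t1: Tree, t2: Tree) -> bool:
    """True if a label-preserving, child-preserving map sends root(t1) to root(t2)."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return False
    # Each child of t1 must be mapped into some child of t2.
    return all(any(subsumed(c1, c2) for c2 in kids2) for c1 in kids1)


def reduce_tree(t: Tree) -> Tree:
    """Drop every child subtree that is subsumed by one of its siblings."""
    label, kids = t
    kept = []
    for k in (reduce_tree(c) for c in kids):
        if any(subsumed(k, other) for other in kept):
            continue  # k adds nothing beyond an already kept sibling
        kept = [o for o in kept if not subsumed(o, k)]  # k makes these redundant
        kept.append(k)
    return (label, tuple(kept))


# a with children b and b(c): the bare b is subsumed by b(c).
d = ("a", (("b", ()), ("b", (("c", ()),))))
r = reduce_tree(d)
print(r)  # ('a', (('b', (('c', ()),)),))
```

The reduced tree `r` and the original `d` subsume each other, so they are equivalent in the sense defined above.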

S.Skritek – Theory of PDM 30/54

SLIDE 62

3.Materialization of Data in Peer Data Management 3.2. Active XML

Monotone AXML Systems

Definition (monotone AXML system)

monotone AXML system: S = (D, F, I)

  • finite sets $D \subset \mathcal{D}$, $F \subset \mathcal{F}$
  • mapping I: for d ∈ D, I(d) returns a document; for f ∈ F, I(f) returns a monotone service

(web) service s

  • defined w.r.t. set {d1, . . . , dn} of document names
  • given assignment θ of AXML documents to {d1, . . . , dn}, return forest of AXML documents
  • consider s as black box

monotone service

  • for all θ, θ′: (for all i: θ(di) ⊆ θ′(di)) ⇒ s(θ) ⊆ s(θ′)

S.Skritek – Theory of PDM 31/54

SLIDE 65

3.Materialization of Data in Peer Data Management 3.2. Active XML

Invocations of Services

Service invocation

  • given S, d ∈ D, v ∈ I(d), λ(v) = f
  • invoking f: call I(f) on θ: θ(di) = I(di), θ(input), θ(context)
  • append I(f)(θ) to parent of v, normalize afterward

Sequences of Invocations

  • $S \xrightarrow{v} S'$: $S' \not\equiv S$; S′ obtained from S by invoking function at node v
  • rewriting (possibly infinite): $S \xrightarrow{v_1} S_1 \xrightarrow{v_2} S_2 \rightarrow \dots \xrightarrow{v_n} S_n \dots$ (written $S \xrightarrow{*} S_n$)
  • system terminates in $S_n$: no $v_{n+1}$, $S_{n+1}$ s.t. $S_n \xrightarrow{v_{n+1}} S_{n+1}$

fair (infinite) sequence

  • for every $v_i \in S_i$: there exists a j > i s.t. either $S_j \xrightarrow{v_i} S_{j+1}$ or invoking $v_i$ has no effect on $S_j$

S.Skritek – Theory of PDM 32/54

SLIDE 67

3.Materialization of Data in Peer Data Management 3.2. Active XML

Semantics of monotone AXML systems

Definition (semantics of monotone AXML systems)

For a monotone AXML system S, its semantics [S] is defined as:

  • [S] = J if $S \xrightarrow{*} J$ and system terminates at J (J finite)
  • [S] = $\bigcup_i S_i$ for infinite fair rewriting $S \dots \rightarrow \dots \xrightarrow{v_i} S_i \dots$

Semantics is well defined (order of invocations does not matter)

  • $S \xrightarrow{*} \hat{S}$ and $S \xrightarrow{*} \bar{S}$: either $\bar{S} \subseteq S'$ ($\hat{S}$ terminates at S′), or $\bar{S} \subseteq S_i$ for some i ($\hat{S}$ not terminating)
  • one rewriting terminates at J ⇒ any rewriting terminates at J
  • one fair rewriting does not terminate ⇒ no rewriting terminates; any fair rewriting results in same infinite system
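The order independence can be illustrated with a strongly simplified model. The sketch below assumes documents are flat sets of items rather than trees, services are monotone functions on the whole state, and the item universe is finite so that rewriting terminates; `rewrite`, `calls` and the document names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

State = Dict[str, FrozenSet[str]]                      # document name -> items
Call = Tuple[str, Callable[[State], FrozenSet[str]]]   # (target document, service)


def rewrite(initial: State, calls: Sequence[Call], order: Sequence[int]) -> State:
    """Invoke the calls round-robin in the given order until nothing changes."""
    state = dict(initial)
    changed = True
    while changed:
        changed = False
        for idx in order:  # every call is tried in each round, so the run is fair
            target, service = calls[idx]
            merged = state[target] | service(state)  # results are only appended
            if merged != state[target]:
                state[target] = merged
                changed = True
    return state


# Two monotone services: copy d1 into d2, and copy d2 into d3.
calls = [
    ("d2", lambda s: s["d1"]),
    ("d3", lambda s: s["d2"]),
]
initial = {"d1": frozenset({"a"}), "d2": frozenset({"b"}), "d3": frozenset()}

result_a = rewrite(initial, calls, [0, 1])
result_b = rewrite(initial, calls, [1, 0])
print(result_a == result_b)  # True
```

Both invocation orders reach the same terminal state, matching the claim that the order of invocations does not matter for monotone systems.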

S.Skritek – Theory of PDM 33/54

SLIDE 69

3.Materialization of Data in Peer Data Management 3.2. Active XML

Positive Active XML

Also consider service implementations, defined as queries

Definition (Positive Query)

positive query q: r :- d1/p1, . . . , dn/pn, e1, . . . , em where

  • di: document names; r, pi: positive AXML tree patterns
  • each variable occurring in r also occurs in some pi
  • ej: inequalities x ≠ y between label, function, or value variables or constants (no tree variables)
  • no tree variable occurs twice in the body

simple query: no tree variables

AXML tree pattern: subtree of AXML document, some labels replaced by label variables

S.Skritek – Theory of PDM 34/54

SLIDE 73

3.Materialization of Data in Peer Data Management 3.2. Active XML

Query Semantics

Recall:

  • query q = r :- d1/p1, . . . , dn/pn, e1, . . . , em
  • monotone AXML system S = (D, F, I)

Snapshot Result q(S)

  • consider variable assignments µ (respecting typing) s.t. for each di/pi ∈ q: µ(pi) ⊆ I(di)
  • q(S): forest of all documents µ(r)
  • Properties:
      • monotone (i.e. S ⊆ S′ ⇒ q(S) ⊆ q(S′)) for positive queries (no inequalities of tree variables)
      • for positive queries: PTIME

Query Result [q](S)

  • [q](S) = q([S]) if S converges to a finite system [S]
  • [q](S) = ⋃i q(Si) otherwise, for an infinite fair rewriting S . . . Si . . .
  • for positive queries: result is independent of the rewriting sequence

S.Skritek – Theory of PDM 35/54

slide-76
SLIDE 76

3.Materialization of Data in Peer Data Management 3.2. Active XML

Positive Systems

service descriptions I(f) defined as positive queries

if all queries are simple → simple positive system

Semantics of positive systems

  • positive system S, function node v, λ(v) = f, I(f) = q
  • invoking f: evaluate q under θ
  • snapshot result q(S) is added as sibling of v

Complexity

Theorem (Abiteboul et al., PODS 2004)

Any Turing Machine can be simulated by a positive AXML system, with the input tape represented by an AXML tree.

⇒ it is undecidable whether a positive system terminates

S.Skritek – Theory of PDM 36/54

slide-78
SLIDE 78

3.Materialization of Data in Peer Data Management 3.2. Active XML

Restricted Systems

Try to find decidable systems

Acyclic Systems

dependency graph (V, E) of S = (D, F, I):

  • V: D ∪ F (document and function names)
  • E: edge (d, f) if f occurs in I(d); edge (f, d) (resp. (f, g)) if d (resp. g) occurs in I(f)

AXML system is acyclic if its dependency graph is acyclic

acyclic systems always terminate

Simple Positive Systems

Recall: simple queries: no tree variables

For every simple positive system S:

  • [S] is regular
  • a finite graph representation of [S] can be computed in EXPTIME
  • termination: decidable in EXPTIME, coNP-hard
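
The acyclicity test on the dependency graph can be sketched as a standard depth-first cycle check. This is only an illustration: the edge list is assumed to be given already as (source, target) pairs of document and function names, and the function name `is_acyclic` as well as the sample names `d1`, `d2`, `f`, `g` are hypothetical.

```python
def is_acyclic(edges):
    """Return True if the directed graph given by (src, dst) pairs has no cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:  # back edge: a cycle exists
                return False
            if colour[succ] == WHITE and not visit(succ):
                return False
        colour[node] = BLACK
        return True

    return all(visit(n) for n in graph if colour[n] == WHITE)


# Edge (d, f): f occurs in I(d); edge (f, d): d occurs in I(f).
acyclic_edges = [("d1", "f"), ("f", "d2")]
cyclic_edges = acyclic_edges + [("d2", "g"), ("g", "d1")]
acyclic_result = is_acyclic(acyclic_edges)
cyclic_result = is_acyclic(cyclic_edges)
```

In the second graph the path d1 → f → d2 → g → d1 closes a cycle, so such a system falls outside the acyclic class and termination is no longer guaranteed by this criterion.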

S.Skritek – Theory of PDM 37/54

slide-81
SLIDE 81

3.Materialization of Data in Peer Data Management 3.2. Active XML

Querying Positive Systems

Instead of materialization: just consider query answering

Definition (q-finite)

AXML system S is q-finite if [q](S) is finite

q: non-simple query

  • undecidable whether a positive system S is q-finite
  • acyclic systems are q-finite
  • simple positive systems: deciding q-finiteness is coNP-hard and in EXPTIME

q: simple query

  • result is always finite
  • BUT: for non-simple positive systems S: testing if [q](S) is nonempty is undecidable

S.Skritek – Theory of PDM 38/54

slide-85
SLIDE 85

3.Materialization of Data in Peer Data Management 3.2. Active XML

Lazy Query Evaluation

It might not be necessary to invoke a service when answering a query:

  • the service may be irrelevant for the answer
  • just return the call to the service in the answer (lazy evaluation)

Definition (possible answer)

AXML document α is a possible answer if [α] = [[q](I)]

⇒ does leaving the function nodes N unexpanded still give a possible answer? (q-unneeded)

Given positive AXML system S, q, N in S, t:

  • undecidable if: d is a possible answer to q; function nodes in N need not be expanded; no more functions need to be expanded
  • for simple systems: in NEXPTIME, coNP-hard

S.Skritek – Theory of PDM 39/54

slide-89
SLIDE 89

3.Materialization of Data in Peer Data Management 3.3. Orchestra

Updates in PDM

Updates in Peer Data Management

Updates

  • in PDI: no problem
  • in PDM: may lead to inconsistencies ⇒ problem

Other concerns

  • so far: “global” systems
  • Trust
  • Provenance information

Take a look at the Orchestra system

S.Skritek – Theory of PDM 40/54

slide-90
SLIDE 90

3.Materialization of Data in Peer Data Management 3.3. Orchestra

General Setting

[Figure: peers P1, P2, P3]

schema mappings:

  • (weakly acyclic) sets of TGDs

users work on their local copies; from time to time, they

  • publish their updates and
  • retrieve updates of other users

trust conditions on the mappings

⇒ need for provenance information

S.Skritek – Theory of PDM 41/54

slide-93
SLIDE 93

3.Materialization of Data in Peer Data Management 3.3. Orchestra

Update Propagation

[Figure: R(x̄), ∆R(d, x̄), ∆G(d, ȳ)]

User Actions:

  • Insert, Delete, Publish/Import

Maintain local edit log

Answers over local database

  • consistent with local edit log
  • for imported updates: certain answers

⇒ what data to materialize

inconsistent updates: reconciliation algorithm (Taylor, Ives; SIGMOD 2006)

  • resolve conflicts using priority mappings
  • user interaction if merging not possible

here: assume consistent updates; concentrate on what data to materialize

S.Skritek – Theory of PDM 42/54

slide-96
SLIDE 96

3.Materialization of Data in Peer Data Management 3.3. Orchestra

Semantics of Update Exchange

[Figure: R(x̄), ∆R(d, x̄), and the tables Ro, Ri, Rr, Rℓ]

Split every relation R:

  • Rℓ: local contributions table
  • Rr: rejections table
  • Ri: input table
  • Ro: output table

Translate mappings Σ → Σ′:

  • for each m ∈ M: replace R in lhs by Ro and in rhs by Ri
  • Ri(x̄) ∧ ¬Rr(x̄) → Ro(x̄)
  • Rℓ(x̄) → Ro(x̄)

S.Skritek – Theory of PDM 43/54

slide-99
SLIDE 99

3.Materialization of Data in Peer Data Management 3.3. Orchestra

Semantics of Update Exchange (contd.)

Recall Σ′:

  • Ri(x̄) ∧ ¬Rr(x̄) → Ro(x̄)
  • Rℓ(x̄) → Ro(x̄)
  • M′: weakly acyclic TGDs

Publish:

  • create new instance of Rr, Rℓ

Import:

  • recompute Ri, Ro (chase)

Definition (consistent system state)

A pair of instances (I, J), with I over schema Rℓ ∪ Rr and J over schema Ro ∪ Ri, is consistent if J = chaseΣ′(I)

computable in polynomial time (data complexity)
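
The rules of Σ′ can be read as a fixpoint computation over the input and output tables. The following is a minimal sketch under simplifying assumptions that are not taken from the talk: every mapping is a full TGD without existential variables (so no labelled nulls are needed), modelled as a triple (source relation, target relation, tuple transformer), and all names such as `consistent_state` are hypothetical.

```python
def consistent_state(local, rejected, mappings):
    """Compute input and output tables as the least fixpoint of
         src_o(x) -> dst_i(fn(x))      for each mapping (src, dst, fn)
         R_i(x) and not R_r(x) -> R_o(x)
         R_l(x) -> R_o(x)
    """
    rels = set(local) | set(rejected) | {r for m in mappings for r in m[:2]}
    inp = {r: set() for r in rels}
    out = {r: set(local.get(r, ())) for r in rels}  # R_l(x) -> R_o(x)
    changed = True
    while changed:
        changed = False
        for src, dst, fn in mappings:
            new_in = {fn(t) for t in out[src]} - inp[dst]
            if new_in:
                inp[dst] |= new_in
                changed = True
        for r in rels:
            new_out = (inp[r] - set(rejected.get(r, ()))) - out[r]
            if new_out:
                out[r] |= new_out
                changed = True
    return inp, out


def identity(t):
    return t


local = {"R": {(1, "x")}, "G": {(2, "y")}}
rejected = {"G": {(1, "x")}}  # the peer owning G rejects the tuple (1, "x")
mappings = [("R", "G", identity), ("G", "R", identity)]
inp, out = consistent_state(local, rejected, mappings)
```

In this example the rejected tuple still arrives in the input table of G but never reaches its output table, while the local tuple of G is propagated to R.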

S.Skritek – Theory of PDM 44/54

slide-106
SLIDE 106

3.Materialization of Data in Peer Data Management 3.3. Orchestra

Provenance

Need to track from where tuples are derived, and how

Provenance Token

  • base tuple: tuple id
  • derived tuple: polynomial
      • binary operators +, ·
      • unary function for each mapping

Also possible: define provenance via provenance graph (omitted)

Example (Provenance Tokens)

Relations R1, R2

Mappings

  • m1 : R1(A, B) → R2(A, B)
  • m2 : R2(A, B) ∧ R1(B, C) → R2(A, C)

Base tuples

  • r1 : R1(a, b), r2 : R1(b, c), r3 : R1(a, c), r4 : R1(c, d)

Provenance

  • Pv(R2(a, b)): m1(r1)
  • Pv(R2(a, c)): m1(r3) + m2(r2 · m1(r1))
  • Pv(R2(a, d)): m2(r4 · (m1(r3) + m2(r2 · m1(r1))))

Infinitely many or arbitrarily large derivations are possible, but they are finitely representable
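
The provenance expressions of the example can be reproduced with a small recursive sketch. It hard-codes the two mappings m1 and m2 from the example and builds expressions as strings; the function name `pv_r2` and the cut-off for cyclic derivations are assumptions made for illustration (the sample data contains no cycles, and the general case requires a finite representation instead of a cut-off).

```python
# Base tuples of R1, keyed by their tuple ids.
R1 = {"r1": ("a", "b"), "r2": ("b", "c"), "r3": ("a", "c"), "r4": ("c", "d")}


def pv_r2(x, z, visiting=frozenset()):
    """Provenance expression of R2(x, z) as a string, or None if not derivable."""
    if (x, z) in visiting:  # assumption: simply cut cyclic derivations
        return None
    visiting = visiting | {(x, z)}
    terms = []
    # m1 : R1(A, B) -> R2(A, B)
    for tid, tup in R1.items():
        if tup == (x, z):
            terms.append(f"m1({tid})")
    # m2 : R2(A, B) and R1(B, C) -> R2(A, C)
    for tid, (y, c) in R1.items():
        if c == z:
            sub = pv_r2(x, y, visiting)
            if sub is not None:
                # Parenthesise sums so that the product binds correctly.
                arg = f"({sub})" if "+" in sub else sub
                terms.append(f"m2({tid}·{arg})")
    # Alternative derivations are combined with +.
    return "+".join(terms) if terms else None


pv_ab = pv_r2("a", "b")
pv_ac = pv_r2("a", "c")
pv_ad = pv_r2("a", "d")
```

Alternative derivations of the same tuple are joined by +, and the tuples used jointly in one application of m2 are joined by ·, matching the three expressions listed in the example.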

S.Skritek – Theory of PDM 45/54

slide-110
SLIDE 110

3.Materialization of Data in Peer Data Management 3.3. Orchestra

Trust

Trust annotations T and D (reject D)

Trust Conditions

  • define trust conditions ρi for mappings mi
      • e.g. trivial conditions T, D
      • more elaborate conditions like T if xi > 4, D otherwise
  • assume every base tuple to be annotated with T or D
  • import data if ρi is satisfied and tuples are trusted

Evaluate (finite) provenance expressions

  • identify T, D with boolean true, false, and +, · with ∨, ∧
  • combine trust conditions on mappings by ∧ with arguments

⇒ consider finite provenance expression as boolean equation

Encode trust in Σ′

add table Rt; change internal mappings to

  • Rt(x̄) = trusted(Ri(x̄))
  • Rt(x̄) ∧ ¬Rr(x̄) → Ro(x̄)
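
Evaluating a finite provenance expression as a boolean formula can be sketched as follows. The nested-tuple encoding of expressions and the function name `trusted` are hypothetical, and the trust condition of a mapping is simplified to a single boolean per mapping, whereas the conditions described above may also depend on tuple values.

```python
def trusted(expr, base_trust, mapping_trust):
    """Evaluate a provenance expression: + is OR, · is AND, and a mapping
    application is the mapping's trust condition AND its argument."""
    kind = expr[0]
    if kind == "tok":  # base tuple, annotated with T (True) or D (False)
        return base_trust[expr[1]]
    if kind == "+":
        return (trusted(expr[1], base_trust, mapping_trust)
                or trusted(expr[2], base_trust, mapping_trust))
    if kind == "*":
        return (trusted(expr[1], base_trust, mapping_trust)
                and trusted(expr[2], base_trust, mapping_trust))
    if kind == "map":
        return mapping_trust[expr[1]] and trusted(expr[2], base_trust, mapping_trust)
    raise ValueError(f"unknown expression kind: {kind!r}")


# m1(r3) + m2(r2 · m1(r1))
expr = ("+",
        ("map", "m1", ("tok", "r3")),
        ("map", "m2", ("*", ("tok", "r2"), ("map", "m1", ("tok", "r1")))))
base_trust = {"r1": True, "r2": True, "r3": False}
all_mappings_trusted = trusted(expr, base_trust, {"m1": True, "m2": True})
m2_distrusted = trusted(expr, base_trust, {"m1": True, "m2": False})
```

With r3 distrusted, the tuple is still accepted through the second derivation as long as m2 is trusted; once m2 is distrusted as well, no trusted derivation remains and the tuple is rejected.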

S.Skritek – Theory of PDM 46/54

slide-111
SLIDE 111

4.Optimization of Query Reformulation 4.0.

Outline

  • 1. Motivation
  • 2. Query Answering in Peer Data Management
  • 3. Materialization of Data in Peer Data Management
  • 4. Optimization of Query Reformulation
  • 5. Conclusion

S.Skritek – Theory of PDM 47/54

slide-114
SLIDE 114

4.Optimization of Query Reformulation 4.0.

Query Reformulation in Peer Data Integration

consider again query answering for PPL

Query Reformulation Algorithm

  • combination of LAV and GAV mappings
  • for a query goal:
      • unfolding if part of a GAV mapping
      • rewriting if part of a LAV mapping
  • follow semantic paths through the system
  • create (special) rule-goal tree

⇒ prune the search tree

peers described by XML schemas

mappings described as queries in a subset of XQuery

S.Skritek – Theory of PDM 48/54

slide-118
SLIDE 118

4.Optimization of Query Reformulation 4.0.

Pruning the Search Tree

possibilities for optimization

Pruning reformulation goals ⇒ XML query containment

  • identify dead ends, redundancies

Minimizing reformulations ⇒ minimization of XML queries

  • identify redundant subexpressions

Pre-computation of semantic paths ⇒ mapping composition

  • a priori optimization

Order of expansions (search strategy)

Memoization

Find first reformulations quickly

S.Skritek – Theory of PDM 49/54

slide-119
SLIDE 119

5.Conclusion 5.0.

Outline

  • 1. Motivation
  • 2. Query Answering in Peer Data Management
  • 3. Materialization of Data in Peer Data Management
  • 4. Optimization of Query Reformulation
  • 5. Conclusion

S.Skritek – Theory of PDM 50/54

slide-120
SLIDE 120

5.Conclusion 5.1. Conclusion

Conclusion

Theory of Peer Data Management

considering PDM: interesting questions and results

Summary

  • Peer Data Integration
      • global FO theory or “modular” semantics
  • Data Exchange in Peer Data Management
      • exchange certain answers
      • AXML (service invocations, rewritings, query answering)
      • update exchange (including trust, provenance)

Further Results

  • Trust, Priorities, Preferences
  • (In)consistency handling
  • Updates
  • . . .

S.Skritek – Theory of PDM 51/54

slide-121
SLIDE 121

5.Conclusion 5.2. References

References I

  • S. Abiteboul, O. Benjelloun, and T. Milo. Positive Active XML. In PODS, pages 35–45, 2004.
  • D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Inconsistency tolerance in P2P data integration: An epistemic logic approach. Inf. Syst., 33(4-5):360–384, 2008.
  • D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Logical foundations of peer-to-peer data integration. In PODS, pages 241–251, 2004.
  • G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. On reconciling data exchange, data integration, and peer data management. In PODS, pages 133–142, 2007.
  • T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Update exchange with mappings and provenance. In VLDB, pages 675–686, 2007.
  • T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31–40, 2007.

S.Skritek – Theory of PDM 52/54

slide-122
SLIDE 122

5.Conclusion 5.2. References

References II

  • A. Y. Halevy, Z. G. Ives, D. Suciu, and I. Tatarinov. Schema mediation for large-scale semantic data sharing. VLDB J., 14(1):68–83, 2005.
  • I. Tatarinov and A. Y. Halevy. Efficient query reformulation in peer-data management systems. In SIGMOD Conference, pages 539–550. ACM, 2004.

S.Skritek – Theory of PDM 53/54

slide-123
SLIDE 123

5.Conclusion 5.3. Thanks!

Thank you!

S.Skritek – Theory of PDM 54/54