Integrity checking for combined databases
Davide Martinenghi

CoLogNET Workshop on Logic-Based Methods for Information Integration - 23 August 2003 Integrity checking for combined databases Davide Martinenghi Computer Science, building 42.1 Roskilde University Universitetsvej 1 P.O. Box 260 DK-4000


SLIDE 1

Integrity checking for combined databases

Davide Martinenghi
CoLogNET Workshop on Logic-Based Methods for Information Integration - 23 August 2003

Computer Science, building 42.1
Roskilde University
Universitetsvej 1
P.O. Box 260
DK-4000 Roskilde
Denmark
Phone: +45 4674 2000
Fax: +45 4674 3072
www.dat.ruc.dk

SLIDE 2

Content

  • Description of the problem
  • A simplification framework
  • GaV and LaV mappings
  • Application to data integration
  • Examples

SLIDE 3

Description of the problem

  • Integrity constraints (ICs) are properties of the DB that must always hold
  • Integrity must be checked wrt the ICs after every update (typically tested in an ad hoc way at the application level)
  • The same holds in a data integration system
  • Idea: generate specialized versions of the ICs to be executed automatically
  • For expected kinds of updates
  • Assuming that integrity holds before the update
  • Generalize this technique to data integration systems

SLIDE 4

A simplification framework

  • 1. Produce a weakest precondition After^U(ϕ) for the update U.
    Ex: ϕ = ← p(x), U = p(a) (the tuple to insert), After^U(ϕ) = ← (p(x) ∨ x = a)
  • 2. Use the fact that ϕ was known to hold before the update (conditional weakest precondition, CWP).
  • 3. Take the weakest CWP.

DEF: Simp^U(ϕ) = Weaken_ϕ(After^U(ϕ))

A condition about the updated state that can be checked in the present state
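
Step 1 can be illustrated with a minimal Python sketch, assuming relations are stored as plain sets and the update is a single insertion. The function names and the sample state are hypothetical and not part of the talk; the sketch only shows that After^U(ϕ), evaluated on the present state, agrees with ϕ evaluated on the updated state.

```python
def phi_holds(p):
    """phi = <- p(x): the denial holds exactly when relation p is empty."""
    return len(p) == 0


def after_insert_holds(p, a):
    """After^U(phi) = <- (p(x) ∨ x = a), evaluated on the present state p.

    The only values that can satisfy p(x) ∨ x = a are the members of p
    and the constant a, so it is enough to look for a witness among them.
    """
    candidates = set(p) | {a}
    return not any(x in p or x == a for x in candidates)


present = {"b"}             # hypothetical present state of p
updated = present | {"a"}   # state after U, built here only for comparison

# Both checks agree, but the second one never builds the updated state.
print(phi_holds(updated), after_insert_holds(present, "a"))
```

For this particular ϕ both checks always report a violation, since ← p(x) forbids any tuple in p, including the inserted one.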

SLIDE 5

A simplification framework - Example

ϕ = ← m(x, y) ∧ m(x, z) ∧ y ≠ z
(checked by posing it as a query against the DB and expecting an empty answer)

U = m(Bob, Alice)

Simp^U(ϕ) = ← m(Bob, y) ∧ y ≠ Alice
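
As a rough illustration of why the simplified check suffices, the sketch below compares the full constraint on the updated state with the simplified constraint on the present state. It assumes m is a set of pairs and that ϕ holds before the update; the function names and sample tuples are hypothetical.

```python
def phi_violated(m):
    """phi = <- m(x, y) ∧ m(x, z) ∧ y ≠ z, posed as a query on relation m."""
    return any(x1 == x2 and y != z for (x1, y) in m for (x2, z) in m)


def simp_violated(m, new_x, new_y):
    """Simp^U(phi) for U = m(new_x, new_y): <- m(new_x, y) ∧ y ≠ new_y."""
    return any(x == new_x and y != new_y for (x, y) in m)


consistent = {("Carl", "Dora")}    # hypothetical state satisfying phi
conflicting = {("Bob", "Carol")}   # hypothetical state satisfying phi

for state in (consistent, conflicting):
    update = ("Bob", "Alice")
    full = phi_violated(state | {update})       # check after the update
    simplified = simp_violated(state, *update)  # check before the update
    print(full, simplified)
```

The simplified query scans m once and mentions only the inserted tuple, whereas the full query joins m with itself.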

SLIDE 6

Mappings

  • Mapping = a way to associate n local DBs with a global DB
  • GaV (global-as-view) mapping = the global DB is expressed as a set of views over the local sources.
  • LaV (local-as-view) mapping = the local DBs are expressed as a set of views over the global DB.
  • We assume:
  • sound mappings (the views produce only, but not necessarily all, correct information)
  • no existential quantifiers in LaV mappings
  • ⇒ LaV mappings can be rewritten as GaV mappings without skolemization

SLIDE 7

Mappings - example

LaV mapping
L = { m1(x, y) → m(x, y) ∧ n(x, it),
      m2(x, y) → m(x, y) ∧ n(x, dk) }

GaV mapping
M_L = { m(x, y) ← m1(x, y),
        m(x, y) ← m2(x, y),
        n(x, y) ← m1(x, z) ∧ y = it,
        n(x, y) ← m2(x, z) ∧ y = dk }
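
To make the GaV form concrete, here is a small Python sketch that evaluates the four rules over two local relations. It assumes m1 and m2 are sets of pairs; the function name and the sample tuples are hypothetical. Because the mappings are only assumed sound, the result should be read as a subset of the global DB rather than all of it.

```python
def global_relations(m1, m2):
    """Evaluate the GaV mapping M_L over local relations m1 and m2."""
    # m(x, y) <- m1(x, y)   and   m(x, y) <- m2(x, y)
    m = set(m1) | set(m2)
    # n(x, y) <- m1(x, z) ∧ y = it   and   n(x, y) <- m2(x, z) ∧ y = dk
    n = {(x, "it") for (x, _z) in m1} | {(x, "dk") for (x, _z) in m2}
    return m, n


# Hypothetical source contents, for illustration only.
m1 = {("Bob", "Alice")}
m2 = {("Lars", "Mette")}

m, n = global_relations(m1, m2)
print(sorted(m))
print(sorted(n))
```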

SLIDE 8

Application to data integration

  • After^M(ϕ) is a weakest precondition (M is a GaV mapping)
  • Simp^M_Δ(ϕ) = Weaken_Δ(After^M(ϕ))

A condition about the global DB that can be checked on the local DBs

ϕ: conditions to check globally
Δ: conditions known to hold locally

SLIDE 9

Example 1

ϕ = ← m(x, y) ∧ m(x, z) ∧ y ≠ z
ϕ1 = ← m1(x, y) ∧ m1(x, z) ∧ y ≠ z
ϕ2 = ← m2(x, y) ∧ m2(x, z) ∧ y ≠ z

Check: { ← m1(x, y) ∧ m1(x, z) ∧ y ≠ z,
         ← m1(x, y) ∧ m2(x, z) ∧ y ≠ z,
         ← m2(x, y) ∧ m1(x, z) ∧ y ≠ z,
         ← m2(x, y) ∧ m2(x, z) ∧ y ≠ z } …

SLIDE 10

Example 1

ϕ = ← m(x, y) ∧ m(x, z) ∧ y ≠ z
ϕ1 = ← m1(x, y) ∧ m1(x, z) ∧ y ≠ z
ϕ2 = ← m2(x, y) ∧ m2(x, z) ∧ y ≠ z

Check: { ← m1(x, y) ∧ m1(x, z) ∧ y ≠ z,
         ← m1(x, y) ∧ m2(x, z) ∧ y ≠ z,
         ← m2(x, y) ∧ m1(x, z) ∧ y ≠ z,
         ← m2(x, y) ∧ m2(x, z) ∧ y ≠ z } = Simp^M_{ϕ1 ∧ ϕ2}(ϕ)
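
The unfolding behind this example can be sketched in Python as follows. This is a simplified illustration, not the talk's procedure: it assumes every GaV rule for m has a single body atom with the same arguments as its head, and it drops a denial only when it is textually identical to a known local constraint, whereas a real Weaken step would reason about entailment. All names are hypothetical.

```python
from itertools import product

# GaV rules for the global predicate m: m(x, y) <- m1(x, y), m(x, y) <- m2(x, y)
DEFINITIONS = {"m": ["m1", "m2"]}

# Global denial phi = <- m(x, y) ∧ m(x, z) ∧ y ≠ z
PHI_ATOMS = [("m", ("x", "y")), ("m", ("x", "z"))]
PHI_CONDITION = "y ≠ z"


def unfold(atoms, condition, definitions):
    """Replace each global atom by every local predicate that defines it."""
    choices = [definitions[pred] for pred, _args in atoms]
    denials = []
    for combo in product(*choices):
        body = [
            f"{local}({', '.join(args)})"
            for local, (_pred, args) in zip(combo, atoms)
        ]
        denials.append("← " + " ∧ ".join(body + [condition]))
    return denials


def remove_known(denials, known):
    """Drop denials that literally coincide with a constraint known to hold."""
    return [d for d in denials if d not in known]


known_locally = {
    "← m1(x, y) ∧ m1(x, z) ∧ y ≠ z",  # phi1
    "← m2(x, y) ∧ m2(x, z) ∧ y ≠ z",  # phi2
}

unfolded = unfold(PHI_ATOMS, PHI_CONDITION, DEFINITIONS)
remaining = remove_known(unfolded, known_locally)
for denial in remaining:
    print(denial)
```

Under these assumptions the unfolding produces the four denials listed on the slide, and the two that coincide with ϕ1 and ϕ2 need no rechecking, which leaves the two cross-source denials.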

SLIDE 11

Example 1

ϕ = ← m(x, y) ∧ m(x, z) ∧ y ≠ z
ϕ1 = ← m1(x, y) ∧ m1(x, z) ∧ y ≠ z
ϕ2 = ← m2(x, y) ∧ m2(x, z) ∧ y ≠ z

Check: { ← m1(x, y) ∧ m1(x, z) ∧ y ≠ z,
         ← m1(x, y) ∧ m2(x, z) ∧ y ≠ z,
         ← m2(x, y) ∧ m1(x, z) ∧ y ≠ z,
         ← m2(x, y) ∧ m2(x, z) ∧ y ≠ z } = Simp^M_{ϕ1 ∧ ϕ2 ∧ ϕ1,2}(ϕ)

If we knew ϕ1,2 = ← m1(x, y) ∧ m2(x, z)

SLIDE 12

Example 2

M = { f(i, t, r) ← m(i, t, y) ∧ r(i, r) }
ϕ1 = { ← m(i, t1, y1) ∧ m(i, t2, y2) ∧ t1 ≠ t2,
       ← m(i, t1, y1) ∧ m(i, t2, y2) ∧ y1 ≠ y2 }
ϕ1,2 = ← r(i, r) ∧ ¬m(i, t, y)
ϕ = { ← f(i, t1, r1) ∧ f(i, t2, r2) ∧ t1 ≠ t2,
      ← f(i, t1, r1) ∧ f(i, t2, r2) ∧ r1 ≠ r2 }

Global check: { ← m(i, t1, y1) ∧ r(i, r1) ∧ m(i, t2, y2) ∧ r(i, r2) ∧ t1 ≠ t2,
                ← m(i, t1, y1) ∧ r(i, r1) ∧ m(i, t2, y2) ∧ r(i, r2) ∧ r1 ≠ r2 }

SLIDE 13

Example 2

M = { f(i, t, r) ← m(i, t, y) ∧ r(i, r) }
ϕ1 = { ← m(i, t1, y1) ∧ m(i, t2, y2) ∧ t1 ≠ t2,
       ← m(i, t1, y1) ∧ m(i, t2, y2) ∧ y1 ≠ y2 }
ϕ1,2 = ← r(i, r) ∧ ¬m(i, t, y)
ϕ = { ← f(i, t1, r1) ∧ f(i, t2, r2) ∧ t1 ≠ t2,
      ← f(i, t1, r1) ∧ f(i, t2, r2) ∧ r1 ≠ r2 }

Global check: { ← m(i, t1, y1) ∧ r(i, r1) ∧ m(i, t2, y2) ∧ r(i, r2) ∧ t1 ≠ t2,
                ← m(i, t1, y1) ∧ r(i, r1) ∧ m(i, t2, y2) ∧ r(i, r2) ∧ r1 ≠ r2 }
  = Simp^M_{ϕ1 ∧ ϕ1,2}(ϕ)
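
The relation between ϕ and the global check can be illustrated with a short Python sketch: the unfolded denials, evaluated on the sources m and r alone, give the same verdict as ϕ evaluated on a materialized f. The sketch assumes relations are sets of tuples; the function names and sample data are hypothetical, and no further simplification under ϕ1 or ϕ1,2 is attempted.

```python
def global_f(m, r):
    """Materialize f via M: f(i, t, r) <- m(i, t, y) ∧ r(i, r)."""
    return {(i, t, rv) for (i, t, _y) in m for (j, rv) in r if i == j}


def phi_violated(f):
    """phi on the global relation f: t and r must be unique for each i."""
    return any(
        i1 == i2 and (t1 != t2 or r1 != r2)
        for (i1, t1, r1) in f
        for (i2, t2, r2) in f
    )


def global_check_violated(m, r):
    """The two unfolded denials, evaluated on the sources m and r only."""
    return any(
        i1 == i2 == i3 == i4 and (t1 != t2 or r1 != r2)
        for (i1, t1, _y1) in m
        for (i2, r1) in r
        for (i3, t2, _y2) in m
        for (i4, r2) in r
    )


# Hypothetical source contents, for illustration only.
m = {(1, "a", 10)}
r_ok = {(1, "x")}
r_bad = {(1, "x"), (1, "y")}

for r in (r_ok, r_bad):
    print(phi_violated(global_f(m, r)), global_check_violated(m, r))
```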

SLIDE 14

Summary

  • Express the data integration in terms of a GaV mapping
  • Reformulate the condition to check in terms of the sources by calculating a weakest precondition wrt the mapping
  • Remove from it all conditions known to hold locally (plus possible cross-conditions)

SLIDE 15

References

  • 1. H. Christiansen, D. Martinenghi. Simplification of integrity constraints for data integration. Submitted to FoIKS '04.
  • 2. H. Christiansen, D. Martinenghi. Simplification of database integrity constraints revisited: A transformational approach. Accepted for presentation at LOPSTR '03.

http://www.dat.ruc.dk/~dm/publications