SLIDE 1

Analyzing, Comparing and Debugging Schema Mappings

Emanuel Sallinger

Vienna University of Technology, Institute of Information Systems, Database and Artificial Intelligence Group

DEIS’10

11 November, 2010

Emanuel Sallinger DEIS’10 – 11 November, 2010 Page 1

SLIDE 7

Outline

Given a schema mapping M = (S, T, Σ):

– What does it do? (analyzing)
– Are there any errors in it? (debugging)
– Is there a better one ... (comparing/optimizing)
  – ... that is equivalent?
  – ... that is equivalent for specific purposes, e.g. data exchange?
– What about other comparison criteria?

SLIDE 9

Debugging with Routes

Source schema (Manhattan Credit): Cards, SuppCards
Target schema (Fargo Finance): Accounts, Clients

Dependencies Σ:

σ1: Cards(cn, l, s, n, m, sal, loc) → ∃A (Accounts(cn, l, s) ∧ Clients(s, m, m, sal, A))
σ2: SuppCards(an, s, n, a) → ∃M, I Clients(s, n, M, I, a)
σ3: Accounts(a, l, s) → ∃N, M, I, A Clients(s, N, M, I, A)
σ4: Clients(s, n, m, i, a) → ∃N, L Accounts(N, L, s)
σ5: Accounts(a, l, s) ∧ Accounts(a′, l′, s) → l = l′

SLIDE 11

Debugging with Routes

Source schema:

Cards(cardNo, limit, ssn, name, mName, salary, location)
SuppCards(accNo, ssn, name, address)

Target schema:

Accounts(accNo, limit, accHolder)
Clients(ssn, name, mName, income, address)

σ1: Cards(cn, l, s, n, m, sal, loc) → ∃A (Accounts(cn, l, s) ∧ Clients(s, m, m, sal, A))
σ2: SuppCards(an, s, n, a) → ∃M, I Clients(s, n, M, I, a)

SLIDE 13

Debugging with Routes

Source instance I:

s1: Cards(6689, 15K, 434, J.Long, Smith, 50K, Seattle)
s2: SuppCards(6689, 234, A.Long, California)

Solution J:

t1: Accounts(6689, 15K, 434)
t2: Accounts(N1, 50K, 234)
t3: Clients(434, Smith, Smith, 50K, A1)
t4: Clients(234, A.Long, M1, I1, California)

SLIDE 15

Debugging with Routes

({s1}, ∅) --[σ1, h]--> ({s1}, {t1, t3})

s1: Cards(6689, 15K, 434, J.Long, Smith, 50K, Seattle)
σ1: Cards(cn, l, s, n, m, sal, loc) → ∃A (Accounts(cn, l, s) ∧ Clients(s, m, m, sal, A))
h: {cn → 6689, l → 15K, s → 434, n → J.Long, m → Smith, sal → 50K, loc → Seattle, A → A1}
t1: Accounts(6689, 15K, 434)
t3: Clients(434, Smith, Smith, 50K, A1)

SLIDE 16

Debugging with Routes

Definition [CT06]

A satisfaction step is given as K1 --[σ, h]--> K2, where

– K1 is an instance such that K1 ⊆ K and K satisfies σ
– σ is a tgd ϕ(x̄) → ∃ȳ ψ(x̄, ȳ)
– h is a homomorphism from ϕ(x̄) ∧ ψ(x̄, ȳ) to K such that h is also a homomorphism from ϕ(x̄) to K1
– K2 is the result of satisfying σ on K1 with homomorphism h, where K2 = K1 ∪ h(ψ(x̄, ȳ))
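
The definition can be tried out on the step ({s1}, ∅) → ({s1}, {t1, t3}) from the example. The following is a minimal Python sketch, not the implementation from [CT06]: the names `instantiate` and `satisfaction_step` and the encoding of atoms as tuples `(relation, term, ...)` are assumptions made for illustration, K is taken to be {s1, t1, t3}, and the condition that K satisfies σ is assumed rather than checked.

```python
def instantiate(atoms, h):
    """Apply assignment h to atoms written as (relation, term, ...)."""
    return {(a[0],) + tuple(h.get(t, t) for t in a[1:]) for a in atoms}


def satisfaction_step(k1, k, phi, psi, h):
    """Return K2 = K1 ∪ h(ψ) after checking the homomorphism conditions.

    That K satisfies σ is assumed here, not verified.
    """
    if not k1 <= k:
        raise ValueError("K1 must be contained in K")
    if not instantiate(phi, h) <= k1:
        raise ValueError("h does not map the premise into K1")
    if not instantiate(phi + psi, h) <= k:
        raise ValueError("h does not map premise and conclusion into K")
    return k1 | instantiate(psi, h)


# Example data taken from the slides: s1, t1, t3, σ1 and h.
s1 = ("Cards", "6689", "15K", "434", "J.Long", "Smith", "50K", "Seattle")
t1 = ("Accounts", "6689", "15K", "434")
t3 = ("Clients", "434", "Smith", "Smith", "50K", "A1")
phi = [("Cards", "cn", "l", "s", "n", "m", "sal", "loc")]
psi = [("Accounts", "cn", "l", "s"), ("Clients", "s", "m", "m", "sal", "A")]
h = {"cn": "6689", "l": "15K", "s": "434", "n": "J.Long",
     "m": "Smith", "sal": "50K", "loc": "Seattle", "A": "A1"}

k2 = satisfaction_step({s1}, {s1, t1, t3}, phi, psi, h)
print(sorted(k2))
```

Starting from an empty K1 instead of {s1} is rejected, because h would no longer map the premise of σ1 into K1.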

SLIDE 19

Debugging with Routes

Source schema:

Cards(cardNo, limit, ssn, name, mName, salary, location)
SuppCards(accNo, ssn, name, address)

Target schema:

Accounts(accNo, limit, accHolder)
Clients(ssn, name, mName, income, address)

σ1: Cards(cn, l, s, n, m, sal, loc) → ∃A (Accounts(cn, l, s) ∧ Clients(s, m, m, sal, A))

SLIDE 20

Debugging with Routes

σ′1: Cards(cn, l, s, n, m, sal, loc) → (Accounts(cn, l, s) ∧ Clients(s, n, m, sal, loc))

SLIDE 23

Debugging with Routes

(I, ∅) --[σ2, h]--> (I, {t4}) --[σ4, h′]--> (I, {t4, t2})

s2: SuppCards(6689, 234, A.Long, California)
σ2: SuppCards(an, s, n, a) → ∃M, I Clients(s, n, M, I, a)
t4: Clients(234, A.Long, M1, I1, California)
σ4: Clients(s, n, m, i, a) → ∃N, L Accounts(N, L, s)
t2: Accounts(N1, 50K, 234)

SLIDE 24

Debugging with Routes

Definition [CT06]

A route for Js with M, I and J is a sequence of satisfaction steps

(I, ∅) --[σ1, h1]--> (I, J1) ... --[σn, hn]--> (I, Jn)

where

– J is a solution of I under M
– Ji ⊆ J and σi are from M
– Js ⊆ Jn
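
A route can be replayed step by step to check that it really produces the selected tuples Js. The sketch below is an illustration under assumptions: the helper names `instantiate` and `replay_route` are hypothetical, each step is encoded as a triple (ϕ, ψ, h), and the example replays the two-step route for Js = {t2} that uses σ2 and then the tgd Clients(s, n, m, i, a) → ∃N, L Accounts(N, L, s).

```python
def instantiate(atoms, h):
    """Apply assignment h to atoms written as (relation, variable, ...)."""
    return {(a[0],) + tuple(h[t] for t in a[1:]) for a in atoms}


def replay_route(source, solution, steps, selected):
    """Replay steps starting from (I, ∅) and return J_n.

    Each step is (phi, psi, h). Raises ValueError if a step is not a valid
    satisfaction step or if the selected tuples Js are not all produced.
    """
    produced = set()
    for phi, psi, h in steps:
        if not instantiate(phi, h) <= source | produced:
            raise ValueError("premise is not available yet")
        new_facts = instantiate(psi, h)
        if not new_facts <= solution:
            raise ValueError("step produces facts outside J")
        produced |= new_facts
    if not selected <= produced:
        raise ValueError("route does not produce all selected tuples")
    return produced


# Instances I and J from the slides.
s1 = ("Cards", "6689", "15K", "434", "J.Long", "Smith", "50K", "Seattle")
s2 = ("SuppCards", "6689", "234", "A.Long", "California")
t1 = ("Accounts", "6689", "15K", "434")
t2 = ("Accounts", "N1", "50K", "234")
t3 = ("Clients", "434", "Smith", "Smith", "50K", "A1")
t4 = ("Clients", "234", "A.Long", "M1", "I1", "California")

supp_to_clients = ([("SuppCards", "an", "s", "n", "a")],
                   [("Clients", "s", "n", "M", "I", "a")])
clients_to_accounts = ([("Clients", "s", "n", "m", "i", "a")],
                       [("Accounts", "N", "L", "s")])
h = {"an": "6689", "s": "234", "n": "A.Long", "a": "California",
     "M": "M1", "I": "I1"}
h_prime = {"s": "234", "n": "A.Long", "m": "M1", "i": "I1",
           "a": "California", "N": "N1", "L": "50K"}

route = [(*supp_to_clients, h), (*clients_to_accounts, h_prime)]
j_n = replay_route({s1, s2}, {t1, t2, t3, t4}, route, {t2})
print(sorted(j_n))
```

Replaying the same two steps in reverse order fails, since t4 has not been produced when the second tgd is applied first.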

SLIDE 28

Debugging with Routes

(I, ∅) --[σ2, h]--> (I, {t4}) --[σ4, h′]--> (I, {t4, t2})

s1: Cards(6689, 15K, 434, J.Long, Smith, 50K, Seattle)
s2: SuppCards(6689, 234, A.Long, California)
σ′2: Cards(cn, l, s1, n1, m, sal, loc) ∧ SuppCards(cn, s2, n2, a) → ∃M, I (Clients(s2, n2, M, I, a) ∧ Accounts(cn, l, s2))
t′2: Accounts(6689, 15K, 234)

SLIDE 29

Computing Routes

In general, a single route is not sufficient for analyzing and debugging.

SLIDE 36

Computing Routes

({s1}, ∅) --[σ1, h]--> ({s1}, {t1, t3})

s1: Cards(6689, 15K, 434, J.Long, Smith, 50K, Seattle)
σ1: Cards(cn, l, s, n, m, sal, loc) → ∃A (Accounts(cn, l, s) ∧ Clients(s, m, m, sal, A))
t1: Accounts(6689, 15K, 434)
t3: Clients(434, Smith, Smith, 50K, A1)

SLIDE 38

Computing Routes

1. Map an atom from ψ(x̄, ȳ) to t. Add it to h.

h so far: {cn → 6689, l → 15K, s → 434}

SLIDE 40

Computing Routes

2. Map ϕ(x̄)h to I/J. Add it to h.

h so far: {cn → 6689, l → 15K, s → 434, n → J.Long, m → Smith, sal → 50K, loc → Seattle}

SLIDE 42

Computing Routes

3. Map ψ(x̄, ȳ)h to J. Add it to h.

h so far: {cn → 6689, l → 15K, s → 434, n → J.Long, m → Smith, sal → 50K, loc → Seattle, A → A1}

SLIDE 43

Computing Routes

4. Return h.

h: {cn → 6689, l → 15K, s → 434, n → J.Long, m → Smith, sal → 50K, loc → Seattle, A → A1}

SLIDE 44

Computing Routes

Algorithm [CT06] (sketch)

FindHom(I, J, t, σ)

1. Map an atom from ψ(x̄, ȳ) to t. Add it to h.
2. Map ϕ(x̄)h to I/J (I for an s-t tgd, J for a target tgd). Add it to h.
3. Map ψ(x̄, ȳ)h to J. Add it to h.
4. Return h.
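
The four steps can be sketched in Python as a small backtracking search. This is an illustration under assumptions, not the algorithm as given in [CT06]: the function names are hypothetical, variables are written with a leading `?`, and the premise is matched against I ∪ J, which is harmless as long as source and target relation names are disjoint.

```python
def match_atom(atom, fact, h):
    """Extend h so that atom maps onto fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    h = dict(h)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):            # variable
            if h.setdefault(term, value) != value:
                return None
        elif term != value:                 # constant must agree
            return None
    return h


def extensions(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    for fact in facts:
        h2 = match_atom(atoms[0], fact, h)
        if h2 is not None:
            yield from extensions(atoms[1:], facts, h2)


def find_hom(source, target, t, phi, psi):
    """Return a homomorphism h witnessing t via the tgd phi → psi, or None."""
    for atom in psi:                                        # step 1
        h1 = match_atom(atom, t, {})
        if h1 is None:
            continue
        for h2 in extensions(phi, source | target, h1):     # step 2
            for h3 in extensions(psi, target, h2):          # step 3
                return h3                                   # step 4
    return None


# Instances and σ1 from the slides.
s1 = ("Cards", "6689", "15K", "434", "J.Long", "Smith", "50K", "Seattle")
s2 = ("SuppCards", "6689", "234", "A.Long", "California")
t1 = ("Accounts", "6689", "15K", "434")
t2 = ("Accounts", "N1", "50K", "234")
t3 = ("Clients", "434", "Smith", "Smith", "50K", "A1")
t4 = ("Clients", "234", "A.Long", "M1", "I1", "California")
phi = [("Cards", "?cn", "?l", "?s", "?n", "?m", "?sal", "?loc")]
psi = [("Accounts", "?cn", "?l", "?s"),
       ("Clients", "?s", "?m", "?m", "?sal", "?A")]

h = find_hom({s1, s2}, {t1, t2, t3, t4}, t1, phi, psi)
print(h)
```

For t = t1 the search reproduces the assignment built up on the preceding slides. For t = t4 it returns None, because σ1 cannot produce that tuple.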

SLIDE 46

Outline

Analyzing and Debugging

– Debugging with Routes
– Computing Routes

Optimizing with Logical Equivalence

Optimizing with Relaxed Notions of Equivalence

Comparing Schema Mappings

SLIDE 48

Comparing and Optimizing

Optimization

Finding a “better” schema mapping that is still “equivalent”:

M = (S, T, Σ) ≡log M′ = (S, T, Σ′)

Definition [FKNP08]

M and M′ are logically equivalent if, for every pair of instances (I, J), (I, J) ⊨ M ⇔ (I, J) ⊨ M′

SLIDE 54

Optimality Criteria

Source schema S: Lecture(title, year, prof)
Target schema T: Course(title, prof-area), Equal-Year(course1, course2)

Example (L, C and E abbreviate Lecture, Course and Equal-Year)

L(x1, x2, x3) → ∃y1, y2 C(y1, y2) ∧ C(x1, y2)
L(x1, x2, x3) ∧ L(x4, x5, x6) → ∃y2 C(x1, y2)

Optimality Criteria

– Minimize the number of atoms in each conclusion
– Minimize the number of existentially quantified variables
– Minimize the number of atoms in each antecedent
– Minimize the number of dependencies

SLIDE 56

Optimality Criteria

Source schema S: Lecture(title, year, prof)
Target schema T: Course(title, prof-area), Equal-Year(course1, course2)

Example

Consider the set Σ of s-t tgds:

L(x1, x2, x3) → ∃y C(x1, y)
L(x1, x2, x3) ∧ L(x4, x2, x5) → E(x1, x4)

Equivalent set of s-t tgds Σ′:

L(x1, x2, x3) ∧ L(x4, x2, x5) → ∃y C(x1, y) ∧ E(x1, x4)

SLIDE 59

Example (continued)

Consider the set Σ of s-t tgds:

L(x1, x2, x3) → ∃y C(x1, y)
L(x1, x2, x3) ∧ L(x4, x2, x5) → E(x1, x4)

Equivalent set of s-t tgds Σ′:

L(x1, x2, x3) ∧ L(x4, x2, x5) → ∃y C(x1, y) ∧ E(x1, x4)

Observation

Canonical universal solution:

– for Σ: one tuple in the C-relation per tuple in the L-relation
– for Σ′: in total, quadratically many tuples in the C-relation

Optimality Criteria

Splitting should be applied whenever possible.
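
The size difference can be reproduced with a naive chase. The sketch below rests on assumptions: the three lectures are made-up data, the helper names are hypothetical, and the canonical universal solution is taken to be the result of firing every tgd once per match of its premise with fresh labeled nulls (N1, N2, ...) for the existential variables.

```python
from itertools import count


def matches(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2 = dict(h)
        if all(h2.setdefault(var, val) == val
               for var, val in zip(first[1:], fact[1:])):
            yield from matches(rest, facts, h2)


def canonical_solution(source, tgds):
    """Naive chase for s-t tgds: fresh nulls for every match of a premise."""
    fresh = count(1)
    target = set()
    for phi, psi, exist_vars in tgds:
        for h in matches(phi, source, {}):
            h = dict(h, **{y: f"N{next(fresh)}" for y in exist_vars})
            target |= {(a[0],) + tuple(h[t] for t in a[1:]) for a in psi}
    return target


def count_relation(instance, name):
    return sum(1 for fact in instance if fact[0] == name)


# Hypothetical source instance: three lectures held in the same year.
lectures = {("L", "DB", "2010", "Ann"),
            ("L", "AI", "2010", "Bob"),
            ("L", "Logic", "2010", "Eve")}

same_year = [("L", "x1", "x2", "x3"), ("L", "x4", "x2", "x5")]
sigma = [([("L", "x1", "x2", "x3")], [("C", "x1", "y")], ["y"]),
         (same_year, [("E", "x1", "x4")], [])]
sigma_prime = [(same_year, [("C", "x1", "y"), ("E", "x1", "x4")], ["y"])]

c_sigma = count_relation(canonical_solution(lectures, sigma), "C")
c_sigma_prime = count_relation(canonical_solution(lectures, sigma_prime), "C")
print(c_sigma, c_sigma_prime)
```

With three lectures in the same year, Σ yields three C-tuples and Σ′ yields nine, one per pair of lectures, while the E-relation is the same in both cases.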

SLIDE 61

Optimality Criteria

Splitting: Splitting should be applied whenever possible.

Optimization goals:

– cardinality-minimality: the number of dependencies shall be minimal
– antecedent-minimality: the total size of the antecedents shall be minimal
– conclusion-minimality: the total size of the conclusions shall be minimal
– variable-minimality: the total number of existentially quantified variables shall be minimal

SLIDE 62

Optimizing Schema Mappings

Rewrite System for s-t tgds [GPS09]

1. Simplification of the conclusion (core computation)
2. Simplification of the antecedent (core computation)
3. Splitting
4. Deletion of an s-t tgd (implication test)
5. Simplification of the conclusion using other tgds (implication test)

SLIDE 64

Optimizing Schema Mappings

L(x1, x2, x3) → P(x1, y1, 3) ∧ R(y1, x2, 3) ∧ R(y1, x2, y2)
L(x1, x1, x1) → P(x1, y1, y2) ∧ Q(y2, y3, x1) ∧ R(y1, x1, y2)
L(x1, x2, x2) ∧ L(x1, x2, x3) → P(x1, y2, y1) ∧ Q(y1, y3, x2) ∧ Q(3, y3, x2) ∧ R(x2, y4, x3)
L(x1, x2, x2) ∧ L(x1, x2, x3) → R(x2, y4, x3)

Next: step 1, simplification of the conclusion (core computation)

SLIDE 68

Optimizing Schema Mappings

L(x1, x2, x3) → P(x1, y1, 3) ∧ R(y1, x2, 3)
L(x1, x1, x1) → P(x1, y1, y2) ∧ Q(y2, y3, x1) ∧ R(y1, x1, y2)
L(x1, x2, x2) ∧ L(x1, x2, x3) → P(x1, y2, y1) ∧ Q(y1, y3, x2) ∧ Q(3, y3, x2) ∧ R(x2, y4, x3)
L(x1, x2, x2) ∧ L(x1, x2, x3) → R(x2, y4, x3)

Next: step 3, splitting
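
Splitting can be sketched as grouping the conclusion atoms into components that are connected through shared existentially quantified variables. This reading of the rule, the function name `split_conclusion` and the tuple encoding are assumptions made for illustration. Applied to the conclusion of the third tgd above, the sketch separates R(x2, y4, x3) from the remaining three atoms.

```python
def split_conclusion(psi, exist_vars):
    """Group conclusion atoms by shared existential variables.

    Returns a list of atom lists. Each list becomes the conclusion of one
    tgd with the original antecedent.
    """
    components = []  # pairs (existential variables, atoms)
    for atom in psi:
        atom_vars = {t for t in atom[1:] if t in exist_vars}
        merged_vars, merged_atoms = set(atom_vars), [atom]
        rest = []
        for comp_vars, comp_atoms in components:
            if comp_vars & atom_vars:
                # The new atom links this component to the merged one.
                merged_vars |= comp_vars
                merged_atoms = comp_atoms + merged_atoms
            else:
                rest.append((comp_vars, comp_atoms))
        components = rest + [(merged_vars, merged_atoms)]
    return [atoms for _, atoms in components]


# Conclusion of the third tgd; y1..y4 are existentially quantified.
P = ("P", "x1", "y2", "y1")
Q1 = ("Q", "y1", "y3", "x2")
Q2 = ("Q", "3", "y3", "x2")
R = ("R", "x2", "y4", "x3")

parts = split_conclusion([P, Q1, Q2, R], {"y1", "y2", "y3", "y4"})
print(parts)
```

The second component consists of the single atom R(x2, y4, x3), which is exactly the conclusion of the last tgd in the list above.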

SLIDE 71

Optimizing Schema Mappings

L(x1, x2, x3) → P(x1, y1, 3) ∧ R(y1, x2, 3)
L(x1, x1, x1) → P(x1, y1, y2) ∧ Q(y2, y3, x1) ∧ R(y1, x1, y2)
L(x1, x2, x2) ∧ L(x1, x2, x3) → P(x1, y2, y1) ∧ Q(y1, y3, x2) ∧ Q(3, y3, x2)
L(x1, x2, x2) ∧ L(x1, x2, x3) → R(x2, y4, x3)

Next: step 2, simplification of the antecedent (core computation)

SLIDE 73

Optimizing Schema Mappings

L(x1, x2, x3) → P(x1, y1, 3) ∧ R(y1, x2, 3)
L(x1, x1, x1) → P(x1, y1, y2) ∧ Q(y2, y3, x1) ∧ R(y1, x1, y2)
L(x1, x2, x2) → P(x1, y2, y1) ∧ Q(y1, y3, x2) ∧ Q(3, y3, x2)
L(x1, x2, x2) ∧ L(x1, x2, x3) → R(x2, y4, x3)

Next: step 5, simplification of the conclusion using other tgds (implication test)

SLIDE 75

Optimizing Schema Mappings

L(x1, x2, x3) → P(x1, y1, 3) ∧ R(y1, x2, 3)
L(x1, x1, x1) → P(x1, y1, y2) ∧ Q(y2, y3, x1) ∧ R(y1, x1, y2)
L(x1, x2, x2) → Q(3, y3, x2)
L(x1, x2, x2) ∧ L(x1, x2, x3) → R(x2, y4, x3)

Next: step 4, deletion of an s-t tgd (implication test)

SLIDE 78

Optimizing Schema Mappings

L(x1, x2, x3) → P(x1, y1, 3) ∧ R(y1, x2, 3)
L(x1, x2, x2) → Q(3, y3, x2)
L(x1, x2, x2) ∧ L(x1, x2, x3) → R(x2, y4, x3)

Result of rewrite system

– Is, among all logically equivalent split-reduced mappings, cardinality-, antecedent-, conclusion- and variable-minimal
– Is a unique normal form

SLIDE 80

Outline

Analyzing and Debugging

– Debugging with Routes
– Computing Routes

Optimizing with Logical Equivalence

– Optimality Criteria
– Optimization and Normalization

Optimizing with Relaxed Notions of Equivalence

Comparing Schema Mappings

SLIDE 81

Equivalence

M = (S, T, Σ) ≡log M′ = (S, T, Σ′)

Definition [FKNP08]

M and M′ are logically equivalent if, for every source instance I, Sol(I, M) = Sol(I, M′)

SLIDE 84

Equivalence

Σ: S(x) → T(x), T′(x, y) → T′(y, x)
Σ′: S(x) → T(x)

M = (S, T, Σ) ≢log M′ = (S, T, Σ′)

Observation

If we are interested in typical data exchange, i.e. in the universal solutions, M′ is “just as good as” M and has smaller cardinality.

SLIDE 86

Relaxed Notions of Equivalence

DE equivalence

Data-exchange (DE) equivalence does not distinguish mappings which behave in the same way for data exchange.

M = (S, T, Σ) ≡DE M′ = (S, T, Σ′)

Definition [FKNP08]

M and M′ are data-exchange equivalent if, for every source instance I, UnivSol(I, M) = UnivSol(I, M′)

SLIDE 88

Relaxed Notions of Equivalence

CQ equivalence

Conjunctive-query (CQ) equivalence does not distinguish mappings which behave similarly for answering conjunctive queries.

M = (S, T, Σ) ≡CQ M′ = (S, T, Σ′)

Definition [FKNP08]

M and M′ are conjunctive-query equivalent if, for every source instance I and every CQ q, either Sol(I, M) = Sol(I, M′) = ∅ or cert(q, I, M) = cert(q, I, M′)
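
To make cert(q, I, M) concrete, the sketch below relies on a standard property of data exchange rather than on anything stated on the slide: for a conjunctive query, the certain answers can be obtained by evaluating q on a universal solution and discarding every answer that contains a labeled null. The function names, the `?` prefix for variables and the two-tuple universal solution (as it would arise from L(x1, x2, x3) → ∃y C(x1, y) for two made-up lectures) are assumptions for illustration.

```python
def matches(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2 = dict(h)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if term.startswith("?"):        # variable
                ok = h2.setdefault(term, value) == value
            else:                           # constant
                ok = term == value
            if not ok:
                break
        if ok:
            yield from matches(rest, facts, h2)


def certain_answers(head, body, universal_solution, nulls):
    """Evaluate the CQ head :- body and keep only null-free answers."""
    answers = set()
    for h in matches(body, universal_solution, {}):
        row = tuple(h[v] for v in head)
        if not any(value in nulls for value in row):
            answers.add(row)
    return answers


# Hypothetical universal solution with labeled nulls N1 and N2.
universal = {("C", "DB", "N1"), ("C", "AI", "N2")}
nulls = {"N1", "N2"}

titles = certain_answers(("?t",), [("C", "?t", "?a")], universal, nulls)
pairs = certain_answers(("?t", "?a"), [("C", "?t", "?a")], universal, nulls)
print(titles, pairs)
```

The course titles are certain answers, while the query that also returns the area has no certain answers, since the area is only known as a null.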

SLIDE 89

Relaxed Notions of Equivalence

M = (S, T, Σ) ≡CQ M′ = (S, T, Σ′)

Proposition [FKNP08]

Assume that the following holds for every source instance I:

Sol(I, M) ≠ ∅ ⇒ UnivSol(I, M) ≠ ∅

Then M and M′ are conjunctive-query equivalent if, for every source instance I, either Sol(I, M) = Sol(I, M′) = ∅ or core(I, M) = core(I, M′)

SLIDE 90

Hierarchy of Equivalences

Proposition [FKNP08]

Let M = (S, T, Σ) and M′ = (S, T, Σ′) be two schema mappings. Then

M ≡log M′ ⇒ M ≡DE M′ ⇒ M ≡CQ M′

[Figure: nested classes log, DE, CQ]

SLIDE 91

Hierarchy of Equivalences

But is this hierarchy of optimization potential proper, or does it collapse?

[Figure: the classes log, DE, CQ, shown twice]

This of course depends on the class of schema mappings.

SLIDE 94

Hierarchy of Equivalences

Σ = ∅
Σ′ = {T(x, y) → T(y, x)}

(s-t tgds and target tgds)

M ≡DE M′, but M ≢log M′

Observation

UnivSol(I, M) = UnivSol(I, M′) = {∅}; however, for any I, the instance J = {T(a, b)} is a solution under M but not under M′.

SLIDE 97

Hierarchy of Equivalences

Σ: S(x) → T(x, x), T(x, y) ∧ T(y, x) → T(x, x)
Σ′: S(x) → T(x, x), T(x, y) ∧ T(y, z) ∧ T(z, x) → T(x, x)

(s-t tgds and target tgds)

M ≡CQ M′, but M ≢DE M′

Observation

The following is a universal solution for I = {S(1)} under M, but not under M′ (x, y and z are labeled nulls):

J = {T(1, 1), T(x, y), T(y, z), T(z, x)}

J maps homomorphically into every solution under both M and M′, but J is not a solution for I under M′.
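
That J satisfies the dependencies of M but violates those of M′ can be checked mechanically. The sketch below is a brute-force satisfaction test under assumptions: the helper names are hypothetical, variables carry a leading `?`, the nulls are encoded as the plain strings "x", "y", "z", and both s-t and target tgds are evaluated over I ∪ J.

```python
def matches(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2 = dict(h)
        if all(h2.setdefault(var, val) == val
               for var, val in zip(first[1:], fact[1:])):
            yield from matches(rest, facts, h2)


def satisfies(instance, tgds):
    """True if every match of a premise extends to a match of its conclusion."""
    for phi, psi in tgds:
        for h in matches(phi, instance, {}):
            if next(matches(psi, instance, h), None) is None:
                return False
    return True


source = {("S", "1")}
target = {("T", "1", "1"), ("T", "x", "y"), ("T", "y", "z"), ("T", "z", "x")}

copy_rule = ([("S", "?a")], [("T", "?a", "?a")])
two_cycle = ([("T", "?a", "?b"), ("T", "?b", "?a")], [("T", "?a", "?a")])
three_cycle = ([("T", "?a", "?b"), ("T", "?b", "?c"), ("T", "?c", "?a")],
               [("T", "?a", "?a")])

under_m = satisfies(source | target, [copy_rule, two_cycle])
under_m_prime = satisfies(source | target, [copy_rule, three_cycle])
print(under_m, under_m_prime)
```

The three-cycle through the nulls x, y, z triggers the target tgd of M′, which would require T(x, x). The only two-cycle in J is T(1, 1), so the target tgd of M is satisfied.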

SLIDE 98

Hierarchy of Equivalences

s-t tgds (the hierarchy collapses):

– no additional optimization power
– all three equivalences decidable

s-t tgds and target tgds (the hierarchy is proper):

– additional optimization power
– DE- and CQ-equivalence undecidable

SLIDE 99

CQ-Equivalence to s-t tgds

Theorem [FKNP08]

If M is specified by full s-t tgds and full target tgds, then the following statements are equivalent:

– M has bounded parallel chase
– There is an M′ ≡CQ M specified by full s-t tgds
– There is an M′ ≡CQ M specified by s-t tgds
– There is an M′ ≡CQ M specified by an SO tgd

slide-100
SLIDE 100

CQ-Equivalence to s-t tgds

M = (S, T, Σ) ≡CQ M′ = (S, T, Σ′)

Σ (SO tgd):
∃f ( ∀x, y (S(x, y) → T(x, y)) ∧ ∀x (S(x, x) → T(x, f(x))) ∧ ∀x (S(x, x) ∧ x = f(x) → W(x)) )

slide-101
SLIDE 101

CQ-Equivalence to s-t tgds

Source schema S = {S}, target schema T = {T, W}

Σ:
– S(x, y) → T(x, y)
– S(x, x) → T(x, f(x))
– S(x, x) ∧ x = f(x) → W(x)

slide-103
SLIDE 103

CQ-Equivalence to s-t tgds

Source schema S = {S}, target schema T = {T, W}

Σ:
– S(x, y) → T(x, y)
– S(x, x) → T(x, f(x))

slide-105
SLIDE 105

CQ-Equivalence to s-t tgds

Source schema S = {S}, target schema T = {T, W}

Σ:
– S(x, y) → T(x, y)

slide-106
SLIDE 106

CQ-Equivalence to s-t tgds

M = (S, T, Σ) ≡CQ M′ = (S, T, Σ′)

Σ (SO tgd):
∃f ( ∀x, y (S(x, y) → T(x, y)) ∧ ∀x (S(x, x) → T(x, f(x))) ∧ ∀x (S(x, x) ∧ x = f(x) → W(x)) )

Σ′:
S(x, y) → T(x, y)
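
To see on one example why the two extra clauses of the SO tgd do not change certain answers to conjunctive queries, the sketch below chases a small source instance with both mappings and checks that the results are homomorphically equivalent. It illustrates the claim on a single instance and does not prove it. Assumptions: f(x) is represented as a Skolem term `("f", x)` that is never equal to a constant (so the clause with x = f(x) does not fire), and every function name is hypothetical.

```python
from itertools import product

def chase_so_tgd(S):
    """Chase with the SO tgd, treating f(x) as the Skolem term ("f", x)."""
    T = set(S)                                        # S(x, y) → T(x, y)
    T |= {(x, ("f", x)) for (x, y) in S if x == y}    # S(x, x) → T(x, f(x))
    # S(x, x) ∧ x = f(x) → W(x): a constant never equals its Skolem term.
    W = {x for (x, y) in S if x == y and x == ("f", x)}
    return T, W

def chase_st_tgd(S):
    """Chase with the single s-t tgd S(x, y) → T(x, y)."""
    return set(S), set()

def is_null(value):
    return isinstance(value, tuple)

def has_homomorphism(src, dst):
    """Brute-force search: map the nulls of src to values of dst, fixing constants."""
    nulls = sorted({v for t in src for v in t if is_null(v)}, key=repr)
    values = sorted({v for t in dst for v in t}, key=repr)
    for choice in product(values, repeat=len(nulls)):
        h = dict(zip(nulls, choice))
        if all(tuple(h.get(v, v) for v in t) in dst for t in src):
            return True
    return False

I = {(1, 1), (1, 2)}
T_so, W_so = chase_so_tgd(I)
T_st, W_st = chase_st_tgd(I)

equivalent_on_I = (
    W_so == W_st
    and has_homomorphism(T_so, T_st)
    and has_homomorphism(T_st, T_so)
)
```

On this instance the extra atom T(1, f(1)) folds onto T(1, 1), and W stays empty in both results.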

slide-107
SLIDE 107

CQ-Equivalence to s-t tgds

Some further results [FKNP08]

Characterization for CQ-equivalence to s-t tgds:
– full s-t tgds and full target tgds: bounded parallel chase
– SO tgds: bounded f-block size
– s-t tgds and target tgds: bounded core chase as well as bounded f-block size

slide-108
SLIDE 108

Outline

M = (S, T, Σ)

Analyzing and Debugging
– Debugging with Routes
– Computing Routes

Optimizing with Logical Equivalence
– Optimality Criteria
– Optimization and Normalization

Optimizing with Relaxed Notions of Equivalence
– Data-Exchange Equivalence
– Conjunctive-Query Equivalence

Comparing Schema Mappings

slide-111
SLIDE 111

Information Transfer

M = (S, T, Σ)   ≡CQ ≡DE ≡log   M′ = (S, T′, Σ′)

Σ: S(x, y) → T(x)
Σ′: S(x, y) → U(x, y)

slide-112
SLIDE 112

Information Transfer

M = (S, T, Σ) ⪯S M′ = (S, T′, Σ′)

Σ: S(x, y) → T(x)
Σ′: S(x, y) → U(x, y)

Observation

Intuitively, M′ transfers more information than M, since with U(x, y) → T(x) the information transferred by M can be obtained from the target of M′.

slide-113
SLIDE 113

Information Transfer

M = (S, T, Σ) ⪯S M′ = (S, T′, Σ′)

Definition [APRR10]

M ⪯S M′ if there exists a mapping N from T′ to T s.t. M = M′ ◦ N.

We say that M′ transfers as much source information as M.

Diagram: S –M′→ T′ –N→ T, with M from S to T.
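
For the example Σ: S(x, y) → T(x) and Σ′: S(x, y) → U(x, y), the definition can be tested by brute force once instances are restricted to a tiny domain. The sketch below makes that restriction (domain {0, 1}, an assumption made only to keep the mappings finite) and represents each mapping as a set of (source, target) pairs; all names are hypothetical.

```python
from itertools import chain, combinations, product

D = (0, 1)  # assumed finite domain

def subsets(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

BINARY = subsets(product(D, D))  # instances of S(x, y) and of U(x, y)
UNARY = subsets(D)               # instances of T(x)

# M:  S(x, y) → T(x)
M = {(I, J) for I in BINARY for J in UNARY if all(x in J for (x, y) in I)}
# M′: S(x, y) → U(x, y)
M_prime = {(I, K) for I in BINARY for K in BINARY if I <= K}
# N:  U(x, y) → T(x)
N = {(K, J) for K in BINARY for J in UNARY if all(x in J for (x, y) in K)}

def compose(A, B):
    """A ◦ B: first A, then B, joined on the intermediate instance."""
    return {(I, J) for (I, K) in A for (K2, J) in B if K == K2}

def solutions(mapping, I):
    return {J for (I2, J) in mapping if I2 == I}

factorizes = compose(M_prime, N) == M

# Two sources that M cannot tell apart but M′ can. If M′ were of the
# form M ◦ N′, equal M-solutions would force equal M′-solutions.
I1, I2 = frozenset({(0, 0)}), frozenset({(0, 1)})
same_under_M = solutions(M, I1) == solutions(M, I2)
same_under_M_prime = solutions(M_prime, I1) == solutions(M_prime, I2)
```

Within this finite setting, `factorizes` confirms M = M′ ◦ N, and the pair I1, I2 shows that the converse factorization is impossible.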

slide-114
SLIDE 114

Information Transfer

M = (S, T, Σ) ⪯T M′ = (S′, T, Σ′)

Definition [APRR10]

M ⪯T M′ if there exists a mapping N from S to S′ s.t. M = N ◦ M′.

We say that M′ covers as much target information as M.

Diagram: S –N→ S′ –M′→ T, with M from S to T.

slide-118
SLIDE 118

Redundancy

M = (S, T, Σ) ≡S M′ = (S, T′, Σ′)

Σ: S(x) → T(x) (target non-redundant)
Σ′: S(x) → U(x, x) (target redundant)

Definition [APRR10]

M is target redundant if there exists an instance J∗ of T s.t. M∗ = {(I, J) ∈ M | J ≠ J∗} satisfies M∗ ≡S M. Source redundancy is defined similarly.

Observation

Σ′ is target redundant, since a solution can contain an atom U(a, b) with a ≠ b.

But the schema mapping given as follows is target non-redundant:
– S(x) → U(x, x)
– U(x, y) → x = y
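
The difference between the two mappings shows up on a single target instance. This is a minimal sketch with a hypothetical encoding (S as a set of values, U as a set of pairs) and hypothetical function names; it checks solutions only and does not decide redundancy itself.

```python
def satisfies_tgd(S, U):
    """S(x) → U(x, x)"""
    return all((x, x) in U for x in S)

def satisfies_egd(U):
    """U(x, y) → x = y"""
    return all(x == y for (x, y) in U)

I = {1}
J = {(1, 1), (1, 2)}  # contains the off-diagonal atom U(1, 2)

# Solution under Σ′ alone: the extra atom is allowed.
solution_without_egd = satisfies_tgd(I, J)
# Solution once the egd is added: the extra atom is ruled out.
solution_with_egd = satisfies_tgd(I, J) and satisfies_egd(J)
```

Adding the egd removes exactly those solutions that carry atoms beyond the diagonal.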

slide-124
SLIDE 124

Comparing Schema Mappings

Extract operator

Given mapping M, create a new source schema S′ that captures exactly the information participating in M.

Diagram: S –M1→ S′ –M2→ T, with M from S to T.

M (from S to T):
– P(x, y) → ∃u T(x, u) ∧ U(x, x)
– P(x, y) ∧ R(y, z) → ∃v V(x, y, v)

M1 (from S to S′):
– P(x, y) → P1(x)
– P(x, y) ∧ R(y, z) → P2(x, y)

M2 (from S′ to T):
– P1(x) → ∃u T(x, u) ∧ U(x, x)
– P2(x, y) → ∃v V(x, y, v)
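
The example can be run on a concrete source instance to see that going through S′ yields the same target as applying M directly. This is only a sanity check on one instance, not a proof of M1 ◦ M2 = M. Assumptions: existential values are represented as labeled nulls named after the variables they depend on in the rule head (`("u", x)` and `("v", x, y)`), and all function names are hypothetical.

```python
def apply_M(P, R):
    """Direct mapping M from S = {P, R} to T = {T, U, V}."""
    T = {(x, ("u", x)) for (x, y) in P}
    U = {(x, x) for (x, y) in P}
    V = {(x, y, ("v", x, y)) for (x, y) in P for (y2, z) in R if y == y2}
    return T, U, V

def apply_M1(P, R):
    """M1 from S to the extracted schema S′ = {P1, P2}."""
    P1 = {x for (x, y) in P}
    P2 = {(x, y) for (x, y) in P for (y2, z) in R if y == y2}
    return P1, P2

def apply_M2(P1, P2):
    """M2 from S′ to T."""
    T = {(x, ("u", x)) for x in P1}
    U = {(x, x) for x in P1}
    V = {(x, y, ("v", x, y)) for (x, y) in P2}
    return T, U, V

P = {(1, 2), (1, 3), (4, 5)}
R = {(2, 9), (2, 8)}

direct = apply_M(P, R)
extracted = apply_M1(P, R)
via_extract = apply_M2(*extracted)
```

Note that S′ keeps only P1 = {1, 4} and P2 = {(1, 2)}: the values 3, 5, 8 and 9 never reach the target, so they are not part of the extracted source.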

slide-125
SLIDE 125

Comparing Schema Mappings

Extract operator

Given mapping M, create a new source schema S′ that captures exactly the information participating in M.

Diagram: S –M1→ S′ –M2→ T, with M from S to T.

Characterization [APRR10]

(M1, M2) is an extract of M iff
– M1 ◦ M2 = M
– M1 ≡S M and M1 is target non-redundant
– M2 ≡T M and M2 is source non-redundant

slide-126
SLIDE 126

Outline

M = (S, T, Σ)

Analyzing and Debugging
– Debugging with Routes
– Computing Routes

Optimizing with Logical Equivalence
– Optimality Criteria
– Optimization and Normalization

Optimizing with Relaxed Notions of Equivalence
– Data-Exchange Equivalence
– Conjunctive-Query Equivalence

Comparing Schema Mappings
– Information Transfer
– Redundancy

slide-128
SLIDE 128

Further Results

Analyzing and Debugging
– Computing a single route fast
– Application to XML-based settings

Optimizing with Logical Equivalence
– Extending normalization to target egds

Optimizing with Relaxed Notions of Equivalence
– Boundary between DE- and CQ-equivalence
– Full characterization of CQ-equivalence

Comparing Schema Mappings
– Characterization of the inverse operator, schema evolution

slide-129
SLIDE 129

Next Steps

– Support debugging with routes for target egds
– Extend optimization to target egds
– Normalize and optimize target tgds
– Create a full characterization for DE-equivalence
– Find useful decidable fragments for the relaxed notions
– Develop heuristic approaches to optimization
– Expand results beyond the relational setting

slide-130
SLIDE 130

References

Marcelo Arenas, Jorge Pérez, Juan L. Reutter, and Cristian Riveros. Foundations of schema mapping management. In Jan Paredaens and Dirk Van Gucht, editors, PODS, pages 227–238. ACM, 2010.

Laura Chiticariu and Wang Chiew Tan. Debugging schema mappings with routes. In Umeshwar Dayal, Kyu-Young Whang, David B. Lomet, Gustavo Alonso, Guy M. Lohman, Martin L. Kersten, Sang Kyun Cha, and Young-Kuk Kim, editors, VLDB, pages 79–90. ACM, 2006.

Ronald Fagin, Phokion G. Kolaitis, Alan Nash, and Lucian Popa. Towards a theory of schema-mapping optimization. In Maurizio Lenzerini and Domenico Lembo, editors, PODS, pages 33–42. ACM, 2008.

Georg Gottlob, Reinhard Pichler, and Vadim Savenkov. Normalization and optimization of schema mappings. PVLDB, 2(1):1102–1113, 2009.
