Designing Information-Preserving Mapping Schemes for XML
Denilson Barbosa, Juliana Freire, Alberto O. Mendelzon
VLDB 2005

Motivation
▪ An XML-to-relational mapping scheme consists of a procedure for shredding XML documents into relational databases, a procedure for publishing the databases back as documents, and constraints the databases must satisfy
▪ The focus to date has been mostly on the performance of queries (see, e.g., Krishnamurthy et al. [2003] for a survey) and updates (Tatarinov et al. [2001, 2002])
▪ We need to understand the properties of a mapping scheme (in any domain) to determine its suitability for a given application
  • Well studied for traditional data models (Hull [1986], Abiteboul and Hull [1988], Miller et al. [1993])
  • We are only starting in the XML context ([XSYM’04], Bohannon et al. [2005])

Information Preservation – Goals
Answering queries:
▪ Requires reconstructing every fragment of the document: losslessness [XSYM’04]
▪ Previous methods (possibly with simple extensions) suffice
Processing updates, preserving document validity:
▪ Requires that the resulting database “represents” a valid document and that every valid document can be represented by some database: validation [XSYM’04]
▪ Losslessness alone is not enough
▪ Problem: checking whether the update is permissible

Example
Consider the following DTD and a valid document:

DTD:
  mondial  ← cities, country*
  cities   ← city*
  city     ← name, (province | state), official+
  country  ← name, capital
  name     ← #PCDATA
  province ← #PCDATA
  state    ← #PCDATA
  official ← #PCDATA
  capital  ← #PCDATA

Document tree (node identifiers in parentheses):
  mondial (1)
    cities (2)
      city (3)
        name (4): Toronto (5)
        province (6): Ontario (7)
        official (8): David (9)
      city (10)
        name (11): Salt Lake City (12)
        state (13): Utah (14)
        official (15): Rocky (16)
        official (17): Sam (18)
    country (19)
      name (20): Brazil (21)
      capital (22): Brasilia (23)

Example – cont’d.
Consider this (lossless) mapping scheme:

Relational schema:
  city (cityId, name, ord, province, state)
  official (officialId, cityId, name, ord)
  country (countryId, name, capital, ord)

DTD:
  mondial  ← cities, country*
  cities   ← city*
  city     ← name, (province | state), official+
  country  ← name, capital
  name     ← #PCDATA
  province ← #PCDATA
  state    ← #PCDATA
  official ← #PCDATA
  capital  ← #PCDATA

Database instance:
  city (1, 'Toronto', 1, 'Ontario', NULL)
  city (4, 'Salt Lake City', 2, NULL, 'Utah')
  official (2, 1, 'David', 1)
  official (5, 4, 'Rocky', 1)
  official (6, 4, 'Sam', 2)
  country (7, 'Brazil', 'Brasilia', 1)

▪ Problems:
  • UPDATE city SET province='Utah' WHERE name='Salt Lake City'
    Legal SQL update (yet the city then has both a province and a state, which the DTD forbids)
  • update delete //city[name='Toronto']/official[last()]
    Cannot be checked statically (Toronto has a single official, so the deletion violates official+, but that depends on the data)
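
The first problem above can be reproduced with a minimal sketch in Python and SQLite. The table layout follows the city relation on the slide; the CHECK constraint is an assumption added here for illustration and is not part of the mapping scheme shown in the talk. It only shows how a relational constraint could reject the update that the plain schema accepts.

```python
import sqlite3

# The update from the slide; it is legal SQL against the plain schema.
UPDATE = "UPDATE city SET province='Utah' WHERE name='Salt Lake City'"

def make_db(with_check: bool) -> sqlite3.Connection:
    """Create an in-memory city table, optionally with a hypothetical CHECK
    constraint encoding the (province | state) choice of the DTD."""
    conn = sqlite3.connect(":memory:")
    # Assumption: exactly one of province/state must be non-NULL.
    check = ", CHECK ((province IS NULL) <> (state IS NULL))" if with_check else ""
    conn.execute(
        "CREATE TABLE city (cityId INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        f"ord INTEGER NOT NULL, province TEXT, state TEXT{check})"
    )
    conn.executemany(
        "INSERT INTO city VALUES (?, ?, ?, ?, ?)",
        [(1, "Toronto", 1, "Ontario", None), (4, "Salt Lake City", 2, None, "Utah")],
    )
    return conn

def try_update(conn: sqlite3.Connection) -> bool:
    """Return True if the DBMS accepts the update, False if a constraint rejects it."""
    try:
        conn.execute(UPDATE)
        return True
    except sqlite3.IntegrityError:
        return False

plain = make_db(with_check=False)
accepted_without_check = try_update(plain)   # the lossless-only schema accepts it
accepted_with_check = try_update(make_db(with_check=True))
```

A row-level CHECK covers the province/state choice only. A cardinality rule such as official+ spans several rows and would need a different kind of constraint.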

Checking for Permissible Updates
Using a mapping scheme that is only lossless:
▪ Publish the portions of the database affected by the update, and validate the result
  • Potentially expensive operation; large fragments of the document may have to be reconstructed
▪ Build an (incremental) validator into the DBMS
  • In-DBMS validation is expensive (Nicola and John [2003]) and incremental validation requires maintaining considerable auxiliary information ([ICDE’04], Balmin et al. [2004])
  • Requires a new component whose functionality overlaps with the DBMS constraint checking mechanism

Outline
1. Motivation
2. Information-Preserving Mapping Schemes
   ▪ Losslessness
   ▪ Validation
3. Designing Information-Preserving Mapping Schemes
4. LILO
   ▪ Mapping scheme transformations
5. Conclusion

Information-Preserving Mapping Schemes
▪ A mapping scheme is a triple µ = (σ, π, S)
  [Figure: document → (σ) → database → (π) → document]
▪ A class of mapping schemes is defined by the languages for writing σ, π, and the constraints in S.
▪ The XDS class of mapping schemes [XSYM’04]
  • Mapping language: XQuery augmented with mapping expressions
  • Relational constraints: boolean queries in Datalog¬
  • Publishing language: SilkRoute – XQuery over “canonical” XML views of the databases
  • Powerful by design

Information-Preserving Mapping Schemes
[Figure: two diagrams, captioned “lossless mapping scheme” and “lossless and validating mapping scheme”. Each shows σ taking equivalence classes of documents ([D], [D₁], [D₂]) in L(X) to database instances (I, [I₁], [I₂]) in R(S), and π taking instances back to documents (D′).]

Notation:
  𝒳: all XML documents
  L(X): all valid documents w.r.t. X
  R(S): all legal instances of S
  [·]: equivalence class

▪ µ = (σ, π, S) is lossless iff π(σ(·)) is the identity on equivalence classes of documents
▪ µ = (σ, π, S) is lossless and validating iff σ and π are bijective and σ = π⁻¹ (up to equivalence)
▪ µ = (σ, π, S) is lossless and validating iff X ≡ S

Losslessness and validation are undecidable for XDS mapping schemes [XSYM’04]
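
The losslessness condition can be illustrated on sample documents with a minimal sketch. Everything here is an assumption made for illustration: documents are modelled as nested (label, children) pairs, σ is a simple edge-style shredding, plain equality stands in for document equivalence, and the function names are hypothetical. Since losslessness is undecidable for XDS, a test over samples like this one can only refute the property, never establish it.

```python
from itertools import count

def sigma(doc):
    """Shred a (label, children) tree into edge rows (pid, eid, label)
    and value rows (eid, text). Ids are assigned in document order."""
    ids = count(1)
    edges, values = [], []

    def visit(node, pid):
        eid = next(ids)
        if isinstance(node, str):          # text node
            edges.append((pid, eid, "#text"))
            values.append((eid, node))
        else:
            label, children = node
            edges.append((pid, eid, label))
            for child in children:
                visit(child, eid)

    visit(doc, 0)                          # 0 is a virtual parent of the root
    return edges, values

def pi(db, order=lambda child: child[0]):
    """Publish a database back as a tree; `order` ranks sibling (eid, label) pairs.
    The default uses eid, which restores document order."""
    edges, values = db
    text = dict(values)
    children = {}
    for pid, eid, label in edges:
        children.setdefault(pid, []).append((eid, label))

    def build(eid, label):
        if label == "#text":
            return text[eid]
        kids = sorted(children.get(eid, []), key=order)
        return (label, [build(c, l) for c, l in kids])

    (root_id, root_label), = children[0]
    return build(root_id, root_label)

def lossless_on(samples, publish=pi):
    """Check that publish(sigma(d)) == d for every sample document d."""
    return all(publish(sigma(d)) == d for d in samples)

doc = ("country", [("name", ["Brazil"]), ("capital", ["Brasilia"])])
# A lossy publisher for contrast: it orders siblings by label, forgetting document order.
lossy_pi = lambda db: pi(db, order=lambda child: child[1])
```

Here `lossless_on([doc])` holds, while `lossless_on([doc], publish=lossy_pi)` fails because capital is published before name.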

Designing Mapping Schemes
[Figure: X and S are connected by σ and π; S, S₁, ..., Sₖ are connected by α₁, α₂, ..., αₖ in the forward direction and β₁, β₂, ..., βₖ in the backward direction; the corresponding mapping schemes are µ₀, µ₁, ..., µₖ]
▪ Goal: designing a mapping scheme µₖ = (σₖ, πₖ, Sₖ) that is both lossless and validating
▪ Framework for designing lossless and validating mapping schemes in XDS:
  • Start with µ₀ that is known to be lossless and validating
  • Apply equivalence-preserving transformations between µᵢ and µᵢ₊₁
  • In the paper: rewriting µ = (σ, π, S) in XDS and αᵢ, βᵢ in wrec-ILOG¬ into µ′ = (σ′, π′, S′) in XDS
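
One way to read the framework is that each step composes the current scheme with a pair of database-to-database mappings. The sketch below assumes σᵢ₊₁ = αᵢ₊₁ ∘ σᵢ and πᵢ₊₁ = πᵢ ∘ βᵢ₊₁, which is an interpretation of the diagram rather than a definition taken from the slides. The constraints S are left out, and the toy σ, π, α, β are hypothetical stand-ins.

```python
from typing import Any, Callable, NamedTuple

class MappingScheme(NamedTuple):
    sigma: Callable[[Any], Any]   # shredding: document -> database
    pi: Callable[[Any], Any]      # publishing: database -> document

def apply_transformation(mu: MappingScheme, alpha, beta) -> MappingScheme:
    """Derive the next scheme: shred with mu then map forward with alpha;
    map backward with beta then publish with mu (assumed composition)."""
    return MappingScheme(
        sigma=lambda doc: alpha(mu.sigma(doc)),
        pi=lambda db: mu.pi(beta(db)),
    )

# Toy stand-ins: a "document" is a space-separated string of words.
mu0 = MappingScheme(
    sigma=lambda doc: list(enumerate(doc.split())),              # rows (position, word)
    pi=lambda db: " ".join(word for _, word in sorted(db)),
)
alpha1 = lambda db: {pos: word for pos, word in db}              # rows -> dictionary
beta1 = lambda db: list(db.items())                              # dictionary -> rows

mu1 = apply_transformation(mu0, alpha1, beta1)
doc = "name capital"
round_trip = mu1.pi(mu1.sigma(doc))
```

If β₁(α₁(I)) returns I for every legal instance I, the round trip through µ₁ coincides with the round trip through µ₀, which is the intuition behind starting from a scheme already known to be lossless.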

LILO – Initial Mapping Scheme
Initial mapping scheme in LILO: Edge++ is both lossless and validating [XSYM’04]
▪ Relational schema:
  • Edge, FLC, ILS, Value: document structure and content
  • Type: element types
  • Transition: transition functions of all content models in the DTD
▪ Constraints:
  • Structural constraints ensure the database represents a well-formed XML document; e.g., the database encodes a tree, the ordering of siblings is consistent, etc.
  • Validating constraints ensure that the content of every element is valid; i.e., spells a word accepted by an appropriate DFA
▪ Each validation constraint is implemented by a recursive Datalog¬ program

LILO Transformations – Example
Goal: replace a validating constraint by equivalent constraints that are easier to enforce
Example: enforcing the rule country ← name, capital
▪ Initial Edge++ mapping (S₀):

  [Figure: document fragment country (19) with children name (20), containing the text Brazil (21), and capital (22), containing the text Brasilia (23)]

  Edge₀                   FLC₀                  Type₀
  pid  eid  label         pid  first  last      eid  type
  1    19   country       19   20     22        19   t₁
  19   20   name
  19   22   capital

  Transition₀
  type  from  label    to   acc
  t₁    q₀    name     q₁   no
  t₁    q₁    capital  q₂   yes

  Value₀                  ILS₀
  eid  value              left  right
  20   Brazil             20    22
  22   Brasilia

▪ Validation constraint: recursive Datalog¬ program
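
What the validation constraint checks can be sketched procedurally over the tables above. This is an illustration in Python, not the recursive Datalog¬ program from the paper. It assumes that FLC holds the first and last child of an element, that ILS links each child to its right sibling, that q₀ is the start state, and that the acc column says whether the target state is accepting. These readings come from the column names and are not confirmed by the slides.

```python
S0 = {
    "edge": [(1, 19, "country"), (19, 20, "name"), (19, 22, "capital")],
    "flc": {19: (20, 22)},                 # pid -> (first child, last child)
    "ils": {20: 22},                       # left sibling -> right sibling
    "type": {19: "t1"},
    "transition": {                        # (type, from, label) -> (to, accepting)
        ("t1", "q0", "name"): ("q1", False),
        ("t1", "q1", "capital"): ("q2", True),
    },
}

def content_is_valid(db, eid, start="q0"):
    """Run the DFA of eid's type over the labels of its children, left to right.
    Elements without an FLC row (no children) are not handled in this sketch."""
    label_of = {child: label for _, child, label in db["edge"]}
    child, last = db["flc"][eid]
    state = start
    while True:
        step = db["transition"].get((db["type"][eid], state, label_of[child]))
        if step is None:                   # no transition: content is rejected
            return False
        state, accepting = step
        if child == last:                  # consumed all children
            return accepting
        if child not in db["ils"]:         # broken sibling chain
            return False
        child = db["ils"][child]

# Two invalid variants for contrast.
swapped = dict(S0, edge=[(1, 19, "country"), (19, 20, "capital"), (19, 22, "name")])
name_only = dict(S0, edge=[(1, 19, "country"), (19, 20, "name")], flc={19: (20, 20)}, ils={})
```

The loop follows the sibling chain one step at a time, which is the part a Datalog formulation has to express through recursion.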

LILO Transformations – Example
Step 1: inline the name and capital elements

S₀:
  Edge₀                   FLC₀
  pid  eid  label         pid  first  last
  1    19   country       19   20     22
  19   20   name
  19   22   capital

  Value₀                  ILS₀
  eid  value              left  right
  20   Brazil             20    22
  22   Brasilia

S₁:
  Country₁                      Value₁
  country  name  capital        eid  value
  19       20    22             20   Brazil
                                22   Brasilia

Validation constraints:
▪ name and capital are unique in Country₁
▪ FKs: name and capital in Country₁ refer to value in Value₁

α₁ : R(S₀) → R(S₁)
  Diff(e) :- Edge₀(_, e, 'country')
  Diff(e) :- Edge₀(_, e, 'capital')
  Diff(e) :- Edge₀(_, c, 'country'), Edge₀(c, e, 'name')
  Country₁(e, n, c) :- Edge₀(e, n, 'name'), Edge₀(e, c, 'capital')
  Edge₁(e, c, l) :- Edge₀(e, c, l), ¬Diff(e)
  FLC₁(p, f, l) :- FLC₀(p, f, l), ¬Diff(p)
  ILS₁(l, r) :- ILS₀(l, r), ¬Diff(l)
  Value₁(e, v) :- Value₀(e, v)

β₁ : R(S₁) → R(S₀)
  Edge₀(e, c, l) :- Edge₁(e, c, l)
  Edge₀(e, c, l) :- Edge₁(e, _, 'country'), Country₁(c, _, _), l = 'country'
  Edge₀(e, c, l) :- Country₁(e, c, _), l = 'name'
  Edge₀(e, c, l) :- Country₁(e, _, c), l = 'capital'
  FLC₀(p, f, l) :- FLC₁(p, f, l)
  FLC₀(p, f, l) :- Country₁(p, f, l)
  ILS₀(l, r) :- ILS₁(l, r)
  ILS₀(l, r) :- Country₁(_, l, r)
  Value₀(e, v) :- Value₁(e, v)
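
The rules for α₁ and β₁ can be tried out on the example instance with a minimal sketch. It evaluates each rule with Python set comprehensions instead of a Datalog engine, leaves out the Type and Transition tables, and uses hypothetical function and key names, so it is an illustration of the round trip rather than the implementation from the paper.

```python
S0 = {
    "Edge": {(1, 19, "country"), (19, 20, "name"), (19, 22, "capital")},
    "FLC": {(19, 20, 22)},
    "ILS": {(20, 22)},
    "Value": {(20, "Brazil"), (22, "Brasilia")},
}

def alpha1(s0):
    """Forward mapping R(S0) -> R(S1): inline name and capital into Country."""
    edge0 = s0["Edge"]
    countries = {e for _, e, l in edge0 if l == "country"}
    # Diff: country and capital elements, plus name elements whose parent is a country.
    diff = {e for _, e, l in edge0 if l in ("country", "capital")}
    diff |= {e for c, e, l in edge0 if l == "name" and c in countries}
    country1 = {(e, n, c)
                for e, n, ln in edge0 if ln == "name"
                for e2, c, lc in edge0 if lc == "capital" and e2 == e}
    return {
        "Edge": {(e, c, l) for e, c, l in edge0 if e not in diff},
        "FLC": {(p, f, l) for p, f, l in s0["FLC"] if p not in diff},
        "ILS": {(l, r) for l, r in s0["ILS"] if l not in diff},
        "Value": set(s0["Value"]),
        "Country": country1,
    }

def beta1(s1):
    """Backward mapping R(S1) -> R(S0): rebuild the edge-style tables."""
    country1 = s1["Country"]
    edge0 = set(s1["Edge"])
    edge0 |= {(e, c, "country")
              for e, _, l in s1["Edge"] if l == "country"
              for c, _, _ in country1}
    edge0 |= {(e, n, "name") for e, n, _ in country1}
    edge0 |= {(e, c, "capital") for e, _, c in country1}
    return {
        "Edge": edge0,
        "FLC": set(s1["FLC"]) | set(country1),
        "ILS": set(s1["ILS"]) | {(n, c) for _, n, c in country1},
        "Value": set(s1["Value"]),
    }

S1 = alpha1(S0)
round_trip_ok = beta1(S1) == S0
```

On this instance, α₁ keeps only the edge from node 1 to the country element, moves the name and capital identifiers into the Country table, and β₁ restores the original tables exactly.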