
Theory of Peer Data Management
Sebastian Skritek, Database and Artificial Intelligence Group, Vienna University of Technology
DEIS 2010



  4. 2. Query Answering in Peer Data Management, 2.3. An Epistemic Logic Approach
     Query Answering: First Order Reasoning
     Query answering: FO reasoning over P.
     [Figure: peers Citizen(P), Male(P), Female(P) and TaxPayer(P) connected by the mappings m1 to m4; starting from Citizen(alice), both Male(alice) and Female(alice) lead to TaxPayer(alice)]
     Mappings and their first-order reading:
     • m1: Citizen(x) :- Male(x)
     • m2: Citizen(x) :- Female(x)
     • m1, m2 together: Citizen(x) → Male(x) ∨ Female(x)
     • m3: Male(x) ⊆ TaxPayer(x), read as Male(x) → TaxPayer(x)
     • m4: Female(x) ⊆ TaxPayer(x), read as Female(x) → TaxPayer(x)
     Example: consider the query { x | TaxPayer(x) }.
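The case analysis behind this example can be sketched in Python. This is an illustration only, not part of the original slides: it assumes the single source fact Citizen(alice), enumerates the ways in which the first-order reading of m1 and m2 can be satisfied, and intersects the query answers over all of them. The names `candidate_models` and `certain_answers` are hypothetical, and the enumeration covers only representative models, not all first-order models.

```python
from itertools import product

# FO reading of m1, m2: every citizen is Male, Female, or both.
GENDER_CHOICES = [("Male",), ("Female",), ("Male", "Female")]

def candidate_models(citizens):
    """Yield one set of facts per way of satisfying Citizen(x) -> Male(x) v Female(x)."""
    for choice in product(GENDER_CHOICES, repeat=len(citizens)):
        facts = set()
        for person, genders in zip(citizens, choice):
            facts.add(("Citizen", person))
            for gender in genders:
                facts.add((gender, person))
                # m3 and m4: Male(x) -> TaxPayer(x), Female(x) -> TaxPayer(x)
                facts.add(("TaxPayer", person))
        yield facts

def certain_answers(predicate, citizens):
    """Return the constants that satisfy `predicate` in every candidate model."""
    per_model = [{c for (p, c) in model if p == predicate}
                 for model in candidate_models(citizens)]
    return set.intersection(*per_model)

print(certain_answers("TaxPayer", ["alice"]))  # alice pays taxes in every case
print(certain_answers("Male", ["alice"]))      # not certain: alice may be Female
```

Reasoning by cases is what makes alice a certain answer here: no single model fixes her gender, but TaxPayer(alice) holds whichever case applies.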


  6. Epistemic Logic
     A modal logic used for modeling knowledge and certainty. Modal logic is used, e.g., in multi-agent systems. More precisely: KT45 (or S5).
     Syntax: FOL, but $K\varphi$ is also an atom (if $\varphi$ is a formula).
     Semantics:
     • Often defined using Kripke structures $(W, R, V)$
     • Here: every world is accessible from every world
     • Epistemic interpretation $\varepsilon = (I, W)$, where $W$ is a set of FO interpretations and $I \in W$
     [Figure: a set $W$ of interpretations $I_1, \dots, I_4$]
     • $a(\vec{x})$ is satisfied in $\varepsilon$ by $\vec{t}$ s.t. $a(\vec{t})$ is true in $I$
     • $K\varphi(\vec{x})$ is satisfied in $\varepsilon$ by $\vec{t}$ s.t. $\varphi(\vec{t})$ is satisfied in all $\varepsilon' = (J, W)$ with $J \in W$
     Epistemic model: $\varphi$ is satisfied in every $(J, W)$ with $J \in W$.
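The satisfaction conditions above can be tried out on ground formulas with a small Python sketch. It is a simplification under stated assumptions: worlds are plain sets of ground atoms, formulas are nested tuples, and quantifiers are left out. The tuple encoding and the function name `holds` are illustrative choices, not notation from the slides.

```python
def holds(formula, world, worlds):
    """Evaluate a ground formula in the epistemic interpretation (world, worlds)."""
    op = formula[0]
    if op == "atom":
        return formula[1] in world
    if op == "not":
        return not holds(formula[1], world, worlds)
    if op == "and":
        return holds(formula[1], world, worlds) and holds(formula[2], world, worlds)
    if op == "or":
        return holds(formula[1], world, worlds) or holds(formula[2], world, worlds)
    if op == "K":
        # K phi: phi must hold in every world of W, whatever the current world is.
        return all(holds(formula[1], other, worlds) for other in worlds)
    raise ValueError(f"unknown operator: {op}")

male = ("atom", "Male(alice)")
female = ("atom", "Female(alice)")
# Two possible worlds: one where alice is Male, one where she is Female.
worlds = [frozenset({"Male(alice)"}), frozenset({"Female(alice)"})]

knows_disjunction = holds(("K", ("or", male, female)), worlds[0], worlds)
knows_a_disjunct = holds(("or", ("K", male), ("K", female)), worlds[0], worlds)
print(knows_disjunction, knows_a_disjunct)
```

The disjunction is known, but neither disjunct is. This gap is why a mapping guarded by $K$ propagates less than its first-order counterpart.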


  9. Modeling PDM [Calvanese et al., 2004]
     [Figure: network of peers $P_1, \dots, P_8$, each with a peer schema $G_i$ and, for some peers, local sources $S_i$; queries $q_1$, $q_2$ are posed to peers]
     Peer schema: $G$ may contain function-free FO formulas over $A_G$.
     Epistemic theory:
     • $T_P$: the formulas in $G$, and $\forall \vec{x}\,\bigl(\exists \vec{y}\,\varphi_S(\vec{x},\vec{y}) \rightarrow \exists \vec{z}\,\psi_G(\vec{x},\vec{z})\bigr)$ for each $m \in L$
     • $M_P$: the axioms $\forall \vec{x}\,\bigl(K(\exists \vec{y}\,\varphi(\vec{x},\vec{y})) \rightarrow \exists \vec{z}\,\psi(\vec{x},\vec{z})\bigr)$ for each $m \in M$
     Semantics:
     • Recall: FOL model of $T_P$ based on $D$
     • Epistemic model of $P$ based on $D$: a pair $(I, W)$ where $W$ is a set of models of $T_P$ based on $D$ and $(I, W)$ is an epistemic model of $M_P$
     • Certain answers w.r.t. $D$: $\bigcap q^{I}$ over all epistemic models $(I, W)$


  11. Properties of Epistemic Logic Based Semantics
     (Certain answers w.r.t. a source instance $D$ are denoted $ans(q, P, D)$.)
     Sound approximation of FOL: $ans_K(q, P, D) \subseteq ans_{fol}(q, P, D)$
     Unique maximal epistemic model for $P$: $(I, W)$ s.t. there exists no model $(J, W')$ with $W \subset W'$
     • Unique, independent of $I$
     • $\Rightarrow ans_K(q, P, D) = \{ \vec{t} \mid \vec{t} \in q^{I} \text{ for each } I \in W \}$
     $FOE(P, D)$: the minimal FO theory containing $T_P$, $D$, and
     • for each $cq' \rightsquigarrow cq$: if $FOE(P, D) \models \exists \vec{y}\, body_{cq'}(\vec{t},\vec{y})$, then $\exists \vec{z}\, body_{cq}(\vec{t},\vec{z}) \in FOE(P, D)$
     Theorem (Calvanese et al., 2004): The set of interpretations $\{ I \mid I \models FOE(P, D) \}$ is the unique maximal epistemic model $W$ for $P$ based on $D$.


  13. Intuition: Only exchange certain answers
     Definition ($\tau(P)$): Given $P_i = (G, S, L, M)$, define $\tau(P_i) = (G, S', L', M)$ where
     1. $S' = S \cup \{ r \mid cq' \rightsquigarrow cq \in M \}$
     2. $L' = L \cup \{ \{ \vec{x} \mid r(\vec{x}) \} \rightsquigarrow cq \mid cq' \rightsquigarrow cq \in M \}$
     [Figure: peer with schema $G$ and sources $S$; the P2P mappings $cq'_1 \rightsquigarrow cq_1$ and $cq'_2 \rightsquigarrow cq_2$ are replaced by new source relations $r_1$, $r_2$ with local mappings $r_1 \rightarrow cq_1$, $r_2 \rightarrow cq_2$]
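The construction of $\tau(P_i)$ can be mirrored in a few lines of Python. This is a hedged sketch rather than the authors' formalization: queries are represented by opaque strings, the `Peer` dataclass and the naming scheme `r1`, `r2`, ... for the new source relations are assumptions made for illustration, and the new names are assumed not to clash with existing sources.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peer:
    schema: frozenset          # G
    sources: frozenset         # S
    local_mappings: frozenset  # L, as (source query, target query) pairs
    p2p_mappings: tuple        # M, as (cq_prime, cq) pairs in a fixed order

def tau(peer):
    """Add one new source relation and one local mapping per P2P mapping."""
    new_sources = set(peer.sources)
    new_local = set(peer.local_mappings)
    for index, (_cq_prime, cq) in enumerate(peer.p2p_mappings, start=1):
        r = f"r{index}"          # assumed to be a fresh relation name
        new_sources.add(r)       # S' = S with r added
        new_local.add((r, cq))   # L' = L with {x | r(x)} ~> cq added
    # M is kept unchanged, as in the definition above.
    return Peer(peer.schema, frozenset(new_sources),
                frozenset(new_local), peer.p2p_mappings)

peer = Peer(
    schema=frozenset({"G"}),
    sources=frozenset({"s"}),
    local_mappings=frozenset({("s", "cq0")}),
    p2p_mappings=(("cq1'", "cq1"), ("cq2'", "cq2")),
)
transformed = tau(peer)
print(sorted(transformed.sources))
```

Each new relation is meant to be filled with the certain answers to the corresponding `cq_prime`, so the transformed peer can be treated like a peer with only local sources.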

  14. Intuition: Only exchange certain answers (contd.)
     [Figure: the peer before and after the transformation; a query $q$ is posed over $G$, with source instances $D$ and $\bar{D}$, mappings $cq'_1 \rightsquigarrow cq_1$, $cq'_2 \rightsquigarrow cq_2$ and new sources $r_1 \rightarrow cq_1$, $r_2 \rightarrow cq_2$]
     Given $P$, $P_i = (G, S, L, M)$, $D$ and a query $q$ over $G$:
     • Let $\bar{D}$ be a source instance for $\tau(P_i)$ s.t. $S^{\bar{D}} = S^{D}$ and $r^{\bar{D}} = ans(cq', P, D)$
     • We want $ans(q, P, D) = ans(q, \tau(P), \bar{D})$
     This provides modularity and independence.


  17. Intuition: Only exchange certain answers (contd.)
     Recall the intuition: $ans(q, P, D) = ans(q, \tau(P), \bar{D})$. For $cq' \rightsquigarrow cq \in M$:
     • $r \in S'$, $\{ \vec{x} \mid r(\vec{x}) \} \rightsquigarrow cq \in L'$
     • $r^{\bar{D}} = ans(cq', P, D)$
     Further recall:
     • $ans_K(cq', P, D) = \{ \vec{t} \mid \vec{t} \in cq'^{I} \text{ for each } I \in W \}$ for $W$ the maximal epistemic model
     • the axiom $\forall \vec{x}\,\bigl(K(\exists \vec{y}\, body_{cq'}(\vec{x},\vec{y})) \rightarrow \exists \vec{z}\, body_{cq}(\vec{x},\vec{z})\bigr)$
     Hence (informally):
     • $ans_K(cq', P, D) = \{ \vec{t} \mid \text{for each } I \in W: \exists \vec{y}: body_{cq'}(\vec{t},\vec{y}) \in I \}$
     • $K(\exists \vec{y}\, body_{cq'}(\vec{x},\vec{y}))$ is satisfied by the tuples $\{ \vec{t} \mid \text{in each } I \in W: \exists \vec{y}: body_{cq'}(\vec{t},\vec{y}) \in I \}$
     $\Rightarrow$ $P$ “imports” the same tuples.


  20. Query Answering
     Use this idea for query answering $\Rightarrow$ always consider $\tau(P)$.
     Perfect reformulation: given a query $q$ over $G_i$, a query $q_1$ over $S'_i$ s.t. for every instance $D_1$ for $\tau(P)$, $q_1^{D_1} = ans(q, \tau(P), D_1)$ (assume settings where a perfect reformulation always exists).
     Idea of the algorithm: compute a datalog program $DP$ containing
     • facts from $S$
     • rules encoding perfect reformulations to $S'$
     Theorem (Calvanese et al., 2004):
     1. $Eval(head_q, DP)$ computes $ans_K(q, P, D)$
     2. Given $P$, $q$, $\vec{t}$, deciding $\vec{t} \in ans_K(q, P, D)$ is PTIME-complete (data complexity)
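The evaluation step can be illustrated with a naive bottom-up datalog evaluator in Python. This is a generic textbook-style sketch, not the algorithm of Calvanese et al.: the facts and rules below are invented for illustration and are not a perfect reformulation computed from an actual peer system. Atoms are `(predicate, arguments)` pairs, and strings starting with an upper-case letter are treated as variables.

```python
def is_var(term):
    """Variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, subst):
    """Extend `subst` so that `atom` equals `fact`; return None if impossible."""
    pred, args = atom
    fact_pred, fact_args = fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    subst = dict(subst)
    for term, value in zip(args, fact_args):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def evaluate(facts, rules):
    """Apply all rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            substitutions = [{}]
            for atom in body:
                extended = []
                for subst in substitutions:
                    for fact in known:
                        result = match(atom, fact, subst)
                        if result is not None:
                            extended.append(result)
                substitutions = extended
            head_pred, head_args = head
            for subst in substitutions:
                derived = (head_pred, tuple(subst.get(a, a) for a in head_args))
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known

# Hypothetical program: source facts plus rules standing in for reformulations.
facts = {("Male", ("bob",)), ("Female", ("alice",))}
rules = [
    (("TaxPayer", ("X",)), [("Male", ("X",))]),
    (("TaxPayer", ("X",)), [("Female", ("X",))]),
    (("q", ("X",)), [("TaxPayer", ("X",))]),
]
answers = {args for (pred, args) in evaluate(facts, rules) if pred == "q"}
print(sorted(answers))
```

Naive evaluation reaches the least fixpoint in polynomial time in the number of facts, which is consistent with the PTIME data-complexity bound stated in the theorem.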

  21. Query Answering: Algorithm
     Query answering algorithm at $P_i$: peerQueryHandler($q$, $r_q$)
     (1)  $DP_I$ = computePerfectRef($q$, $r_q$, $P_i$); $DP_E = \emptyset$
     (2)  for each $r \in S'_i \cap DP_I$:
     (3)    if $r \in S$ (∗)
     (3a)     $DP_E = DP_E \cup \{ r(\vec{t}) \mid r(\vec{t}) \in D \}$
            else ($r \in S' \setminus S$)
     (3b)     $DP'$ = $P'$.peerQueryHandler($Q(r)$, $r$)
              $DP_I = DP_I \cup DP'_I$; $DP_E = DP_E \cup DP'_E$
     (4)  return $DP$
     (∗) loop detection omitted
     [Figure: peer with schema $G$, sources $S$ and source instance $D$; query $q$ over $G$; mappings $cq'_1 \rightsquigarrow cq_1$, $cq'_2 \rightsquigarrow cq_2$ with new sources $r_1 \rightarrow cq_1$, $r_2 \rightarrow cq_2$]

  22. Further nice properties
     Decidability
     • depends only on local properties
     • under FOL: constraints may also be propagated by mappings
     • Epistemic Logic: provides complete modularity for peers
     Mapping Composition
     • The semantics allows for (reasonable) mapping composition
     • Resulting systems are query equivalent
     Inconsistency Handling
     • Consider two kinds of inconsistency: local inconsistency and P2P inconsistency
     • Use a nonmonotonic extension ($K45^{A}_{n}$) and model $cq_i \rightsquigarrow cq_j$ as:
       $\forall \vec{x}\,\bigl(\neg A_i \bot_i \wedge K_i(\exists \vec{y}\, body_{cq_i}(\vec{x},\vec{y})) \wedge \neg A_j(\neg \exists \vec{z}\, body_{cq_j}(\vec{x},\vec{z})) \rightarrow K_j(\exists \vec{z}\, body_{cq_j}(\vec{x},\vec{z}))\bigr)$

  23. 3. Materialization of Data in Peer Data Management, 3.0. Outline
     1. Motivation
     2. Query Answering in Peer Data Management
     3. Materialization of Data in Peer Data Management
        3.1 Reconciling PDM and Data Exchange
        3.2 Active XML
        3.3 Orchestra
     4. Optimization of Query Reformulation
     5. Conclusion

  24. 3. Materialization of Data in Peer Data Management, 3.1. Reconciling PDM and Data Exchange
     Idea
     So far: Peer Data Integration
     • data remains local at the peers
     • the information needed for query answering is exchanged
     • mappings can be considered as “virtual”
     Other possibility: generalize Data Exchange
     • copy data between different peers
     • interpret mappings as constraints
     • materialize data to satisfy these constraints
     → Look at some approaches following this idea


  30. Reconciling Data Exchange and PDM [De Giacomo et al., PODS 2007]
     [Figure: network of peers $P_1, \dots, P_5$]
     $S = \langle P, C_E, M_E \rangle$
     • $M_E$: TGDs between pairs of peers
     • $C_E$: TGDs & EGDs over a single peer
     Semantics:
     • $C_E$: FO semantics
     • $M_E$: exchanges only certain answers
     Universal $S$-solution


  32. Reconciling Data Exchange and PDM [De Giacomo et al., PODS 2007]
     [Figure: network of peers $P_1, \dots, P_5$]
     $S = \langle P, C_E, M_E, C_I, M_I \rangle$
     • $M_E$, $M_I$: TGDs between pairs of peers
     • $C_E$, $C_I$: TGDs & EGDs over a single peer
     Semantics:
     • $C_E$, $C_I$: FO semantics
     • $M_E$, $M_I$: certain answers
     • $C_I$, $M_I$: precedence
     Universal $S$-solution

  33. 3. Materialization of Data in Peer Data Management, 3.2. Active XML
     Active XML

  34. Active XML
     Recall: Active XML
     Not considered yet:
     • Formal semantics (service call, query answering)
     • Complexity
     [Abiteboul et al., PODS 2004] → consider only monotone Web Services


  37. AXML Document
     Notation: $\mathcal{D}$: document names, $\mathcal{N}$: nodes, $\mathcal{L}$: labels, $\mathcal{V}$: atomic values, $\mathcal{F}$: function names
     Definition (AXML document): an AXML document is a pair $(T, \lambda)$ where
     • $T = (N, E)$ is a finite, unordered tree, with $N \subset \mathcal{N}$ a finite set of nodes and $E \subset N \times N$ the directed edges
     • $\lambda: N \rightarrow \mathcal{L} \cup \mathcal{F} \cup \mathcal{V}$ is a function s.t. $\lambda(n) \in \mathcal{V}$ only if $n$ is a leaf node, and for the root $n$, $\lambda(n) \in \mathcal{V} \cup \mathcal{L}$
     Nodes are either data nodes or function nodes.
     Function call: pass a subtree as parameter; get a forest as return value $\Rightarrow$ append it as siblings to the call node.


  39. Reduced Documents
     Definition: $(T_1, \lambda_1)$ is subsumed by $(T_2, \lambda_2)$, written $(T_1, \lambda_1) \subseteq (T_2, \lambda_2)$, if there exists a mapping $h: N_1 \rightarrow N_2$ s.t.
     • $h(root(T_1)) = root(T_2)$
     • $n_1$ child of $n_2$ $\Rightarrow$ $h(n_1)$ child of $h(n_2)$ (for all $n_1, n_2 \in N_1$)
     • $\lambda_1(n) = \lambda_2(h(n))$ (for all $n \in N_1$)
     $d_1 \subseteq d_2$ and $d_2 \subseteq d_1$ $\Rightarrow$ $d_1 \equiv d_2$
     → A document $d$ is reduced if $d \equiv d'$ holds for no proper subtree $d'$ of $d$.
     Properties:
     • Each document has a unique reduced version
     • Decision and function problem solvable in PTIME
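A subsumption test along the lines of this definition can be sketched in Python. The representation is an assumption made for illustration: a tree is a pair `(label, children)` with `children` a list of trees, and the function names are hypothetical. Because $h$ need not be injective, it suffices to map every child of a node to some child of its image. The `reduce_tree` helper is one plausible way to compute a reduced version by dropping children that a sibling already subsumes; it is not taken from the slides.

```python
def subsumed(t1, t2):
    """True if there is a label-preserving homomorphism from t1 into t2."""
    label1, children1 = t1
    label2, children2 = t2
    if label1 != label2:
        return False
    # Every child of t1 must map into some child of t2 (not necessarily distinct).
    return all(any(subsumed(c1, c2) for c2 in children2) for c1 in children1)

def equivalent(t1, t2):
    return subsumed(t1, t2) and subsumed(t2, t1)

def reduce_tree(tree):
    """Drop every child that is subsumed by one of its siblings, bottom-up."""
    label, children = tree
    kept = []
    for child in (reduce_tree(c) for c in children):
        if any(subsumed(child, other) for other in kept):
            continue  # an already kept sibling carries at least this information
        kept = [other for other in kept if not subsumed(other, child)]
        kept.append(child)
    return (label, kept)

small = ("a", [("b", [])])
large = ("a", [("b", [("c", [])]), ("b", [])])
print(subsumed(small, large), subsumed(large, small))
print(reduce_tree(large))
```

Each pair of nodes (one from each tree) is compared at most once, since a pair is only reached through the pair of its parents, so the test runs in time polynomial in the sizes of the two trees.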


  41. Monotone AXML Systems
     Definition (monotone AXML system): a monotone AXML system is a triple $S = (D, F, I)$ with
     • finite sets $D \subset \mathcal{D}$, $F \subset \mathcal{F}$
     • a mapping $I$: for $d \in D$, $I(d)$ returns a document; for $f \in F$, $I(f)$ returns a monotone service
     (Web) service $s$:
     • defined w.r.t. a set $\{d_1, \dots, d_n\}$ of document names
     • given an assignment $\theta$ of AXML documents to $\{d_1, \dots, d_n\}$, returns a forest of AXML documents
     • consider $s$ as a black box
     Monotone service:
     • for all $\theta, \theta'$: if $\theta(d_i) \subseteq \theta'(d_i)$ for all $i$, then $s(\theta) \subseteq s(\theta')$


  44. Invocations of Services
     Service invocation:
     • given $S$, $d \in D$, $v \in I(d)$, $\lambda(v) = f$
     • invoking $f$: call $I(f)$ on $\theta$ with $\theta(d_i) = I(d_i)$, $\theta(input)$, $\theta(context)$
     • append $I(f)(\theta)$ to the parent of $v$, normalize afterward
     Sequences of invocations:
     • $S \xrightarrow{v} S'$: $S' \not\equiv S$, and $S'$ is obtained from $S$ by invoking the function at node $v$
     • rewriting (possibly infinite): $S \xrightarrow{v_1} S_1 \xrightarrow{v_2} S_2 \rightarrow \dots \xrightarrow{v_n} S_n \dots$ (written $S \xrightarrow{*} S_n$)
     • the system terminates in $S_n$: there are no $v_{n+1}$, $S_{n+1}$ s.t. $S_n \xrightarrow{v_{n+1}} S_{n+1}$
     Fair (infinite) sequence:
     • for every $v_i \in S_i$ there exists a $j > i$ s.t. either $S_j \xrightarrow{v_i} S_{j+1}$ or invoking $v_i$ has no effect on $S_j$


  46. Semantics of monotone AXML systems
     Definition (semantics of monotone AXML systems): for a monotone AXML system $S$, its semantics $[S]$ is defined as
     • $[S] = J$ if $S \xrightarrow{*} J$ and the system terminates at $J$ ($J$ finite)
     • $[S] = \bigcup_i S_i$ for an infinite fair rewriting $S \rightarrow \dots \xrightarrow{v_i} S_i \rightarrow \dots$
     The semantics is well defined (the order of invocations does not matter):
     • for rewritings $S \xrightarrow{*} \hat{S}$ and $S \xrightarrow{*} \bar{S}$: either $\bar{S} \subseteq S'$ ($\hat{S}$ terminates at $S'$), or $\bar{S} \subseteq S_i$ for some $i$ ($\hat{S}$ not terminating)
     • one rewriting terminates at $J$ $\Rightarrow$ any rewriting terminates at $J$
     • one fair rewriting does not terminate $\Rightarrow$ no rewriting terminates; any fair rewriting results in the same infinite system
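The order-independence claim can be observed on a deliberately simplified model in Python. The assumptions are strong and should be read as such: documents are flat sets of tuples instead of trees, a service is any monotone Python function from the current state to a set of results, and calls are invoked round-robin, which is one way to obtain a fair sequence. All names (`saturate`, `copy_edges`, `extend_paths`) are hypothetical.

```python
def saturate(documents, calls, order):
    """Invoke the calls round-robin in the given order until nothing changes."""
    state = {name: set(items) for name, items in documents.items()}
    changed = True
    while changed:
        changed = False
        for index in order:
            target, service = calls[index]
            result = service(state)
            if not result <= state[target]:
                state[target] |= result  # results are only ever added
                changed = True
    return state

def copy_edges(state):
    """Monotone service: every edge is a path."""
    return set(state["edges"])

def extend_paths(state):
    """Monotone service: extend known paths by one edge."""
    return {(a, d) for (a, b) in state["paths"]
            for (c, d) in state["edges"] if b == c}

documents = {"edges": {(1, 2), (2, 3), (3, 4)}, "paths": set()}
calls = [("paths", copy_edges), ("paths", extend_paths)]

first = saturate(documents, calls, order=[0, 1])
second = saturate(documents, calls, order=[1, 0])
print(first == second, sorted(first["paths"]))
```

Both invocation orders terminate in the same state because the services only add information and larger inputs never yield smaller outputs, which is the property the monotonicity requirement is there to guarantee.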


  48. Positive Active XML
     Also consider service implementations, defined as queries.
     Definition (positive query): a positive query $q$ has the form $r \;\text{:-}\; d_1/p_1, \dots, d_n/p_n, e_1, \dots, e_m$ where
     • $d_i$: document names; $r$, $p_i$: positive AXML tree patterns
     • each variable occurring in $r$ also occurs in some $p_i$
     • $e_j$: inequalities $x \neq y$ between label, function, or value variables or constants (no tree variables)
     • no tree variable occurs twice in the body
     Simple query: no tree variables.
     AXML tree pattern: subtree of an AXML document in which some labels are replaced by label variables.


  52. Query Semantics
     Recall:
     • query $q = r \;\text{:-}\; d_1/p_1, \dots, d_n/p_n, e_1, \dots, e_m$
     • monotone AXML system $S = (D, F, I)$
     Snapshot result $q(S)$:
     • consider variable assignments $\mu$ (respecting typing) s.t. for each $d_i/p_i \in q$: $\mu(p_i) \subseteq I(d_i)$
     • $q(S)$: forest of all documents $\mu(r)$
     • Properties: monotone (i.e. $S \subseteq S' \Rightarrow q(S) \subseteq q(S')$) for positive queries (no inequalities of tree variables); PTIME for positive queries
     Query result $[q](S)$:
     • $[q](S) = q([S])$ if $S$ converges to a finite system $[S]$
     • $[q](S) = \bigcup_i q(S_i)$ for an infinite fair rewriting $S \dots S_i \dots$ otherwise
     • for positive queries: the result is independent of the rewriting sequence


  55. Positive Systems
     • service descriptions $I(f)$ are defined as positive queries
     • if all queries are simple → simple positive system
     Semantics of positive systems:
     • positive system $S$, function node $v$, $\lambda(v) = f$, $I(f) = q$
     • invoking $f$: evaluate $q$ under $\theta$
     • the snapshot result $q(S)$ is added as sibling of $v$
     Complexity
     Theorem (Abiteboul et al., PODS 2004): Any Turing Machine can be simulated by a positive AXML system, with the input tape represented by an AXML tree.
     $\Rightarrow$ it is undecidable whether a positive system terminates


  57. Restricted Systems
     Try to find decidable systems.
     Acyclic Systems
     • dependency graph $(V, E)$ of $S = (D, F, I)$:
       $V$: $D \cup F$ (document and function names)
       $E$: edge $(d, f)$ if $f$ occurs in $I(d)$; edge $(f, d)$ (resp. $(f, g)$) if $d$ (resp. $g$) occurs in $I(f)$
     • an AXML system is acyclic if its dependency graph is acyclic
     • acyclic systems always terminate
     Simple Positive Systems
     • Recall: simple queries have no tree variables
     • For every simple positive system $S$:
       $[S]$ is regular
       a finite graph representation of $[S]$ can be computed in EXPTIME
       termination: decidable in EXPTIME, coNP-hard
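The acyclicity condition is an ordinary cycle check on the dependency graph, sketched below in Python. The input format is an assumption: `occurs_in[name]` lists the function and document names occurring in `I(name)`, which is exactly the edge relation described above. The function name `is_acyclic` is hypothetical.

```python
def is_acyclic(occurs_in):
    """Depth-first search for a cycle in the dependency graph.

    occurs_in[name] is the set of document and function names that occur
    in I(name); each such occurrence is an edge name -> occurring name.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for successor in occurs_in.get(node, ()):
            state = colour.get(successor, WHITE)
            if state == GREY:
                return False  # back edge: the current path closes a cycle
            if state == WHITE and not visit(successor):
                return False
        colour[node] = BLACK
        return True

    return all(colour.get(node, WHITE) != WHITE or visit(node)
               for node in list(occurs_in))

# Document d1 calls f, whose definition reads document d2: no cycle.
print(is_acyclic({"d1": {"f"}, "f": {"d2"}, "d2": set()}))
# Document d calls f, whose definition reads d again: a cycle.
print(is_acyclic({"d": {"f"}, "f": {"d"}}))
```

The check is linear in the size of the graph, so acyclicity is a cheap syntactic condition compared with the EXPTIME bounds for simple positive systems.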


  60. Querying Positive Systems
     Instead of materialization: just consider query answering.
     Definition ($q$-finite): an AXML system $S$ is $q$-finite if $[q](S)$ is finite.
     $q$: non-simple query
     • it is undecidable whether a positive system $S$ is $q$-finite
     • acyclic systems are $q$-finite
     • simple positive systems: deciding $q$-finiteness is coNP-hard and in EXPTIME
     $q$: simple query
     • the result is always finite
     • BUT: for non-simple positive systems $S$, testing whether $[q](S)$ is nonempty is undecidable


  64. Lazy Query Evaluation
     It might not be necessary to invoke a service when answering a query:
     • the call may be irrelevant for the answer
     • just return the call to the service in the answer (lazy evaluation)
     Definition (possible answer): an AXML document $\alpha$ is a possible answer if $[\alpha] = [[q](I)]$.
     $\Rightarrow$ Does not expanding the function nodes $N$ still give a possible answer? ($q$-unneeded)
     Given a positive AXML system $S$, $q$, $N$ in $S$, $t$:
     • undecidable whether: $d$ is a possible answer to $q$; the function nodes in $N$ need not be expanded; no more functions need to be expanded
     • for simple systems: in NEXPTIME, coNP-hard


  68. 3. Materialization of Data in Peer Data Management, 3.3. Orchestra
     Updates in Peer Data Management
     Updates
     • in PDI: no problem
     • in PDM: may lead to inconsistencies $\Rightarrow$ problem
     Other concerns
     • so far: “global” systems
     • Trust
     • Provenance information
     Take a look at the Orchestra system.

  69. General Setting
     [Figure: three peers $P_1$, $P_2$, $P_3$ connected by schema mappings]
     • schema mappings: (weakly acyclic) sets of TGDs
     • users work on their local copies
     • from time to time, they publish their updates and retrieve the updates of other users
     • trust conditions on the mappings $\Rightarrow$ need for provenance information

  72. 3. Materialization of Data in Peer Data Management 3.3. Orchestra Update Propagation. [Figure: relation R(x̄) with update logs ∆R(d, x̄) and ∆G(d, ȳ).] User actions: • Insert, Delete, Publish/Import. Maintain a local edit log. Answers over the local database: • consistent with the local edit log • for imported updates: certain answers ⇒ what data to materialize? Inconsistent updates: reconciliation algorithm (Taylor, Ives; SIGMOD 2006) • resolve conflicts using priority mappings • user interaction if merging is not possible. Here: assume consistent updates and concentrate on what data to materialize. S.Skritek – Theory of PDM 42/54
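
The user actions and local edit log above can be illustrated with a minimal sketch. It assumes the log is a list of (operation, tuple) pairs replayed in order; the function name `apply_edit_log` and the "+"/"-" encoding are hypothetical and are not taken from the slides or from Orchestra itself.

```python
from typing import Iterable, Set, Tuple

Row = Tuple[str, ...]
# Assumed encoding: ("+", row) is an insert, ("-", row) is a delete.
Edit = Tuple[str, Row]


def apply_edit_log(instance: Set[Row], log: Iterable[Edit]) -> Set[Row]:
    """Replay the edit log in order and return the resulting local instance."""
    result = set(instance)  # work on a copy; the input is left untouched
    for op, row in log:
        if op == "+":
            result.add(row)
        elif op == "-":
            result.discard(row)  # deleting an absent tuple is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


# Hypothetical local relation and edit log.
local = {("alice", "vienna")}
log = [("+", ("bob", "graz")), ("-", ("alice", "vienna")), ("+", ("alice", "linz"))]
state = apply_edit_log(local, log)
```

Replaying the log on a copy keeps the previously published state available, which is convenient if the edits are only made visible to other peers on Publish.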

  75. 3. Materialization of Data in Peer Data Management 3.3. Orchestra Semantics of Update Exchange. [Figure: relation R(x̄) with update log ∆R(d, x̄), split into R_i, R_r, R_ℓ, R_o.] Split every relation R: • R_ℓ: local contributions table • R_r: rejections table • R_i: input table • R_o: output table. Translate mappings Σ → Σ′: • for each m ∈ M: replace R in lhs by R_o and in rhs by R_i • R_i(x̄) ∧ ¬R_r(x̄) → R_o(x̄) • R_ℓ(x̄) → R_o(x̄). S.Skritek – Theory of PDM 43/54

  78. 3. Materialization of Data in Peer Data Management 3.3. Orchestra Semantics of Update Exchange (contd.). Recall Σ′: • R_i(x̄) ∧ ¬R_r(x̄) → R_o(x̄) • R_ℓ(x̄) → R_o(x̄) • M′: weakly acyclic TGDs. Publish: • create new instance of R_r, R_ℓ. Import: • recompute R_i, R_o (chase). Definition (consistent system state): an instance ⟨I, J⟩ over schema ⟨R̄_ℓ ∪ R̄_r, R̄_o ∪ R̄_i⟩ is consistent if J = chase_Σ′(I). Computable in polynomial time (data complexity). S.Skritek – Theory of PDM 44/54
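
The import step can be sketched as a fixpoint computation over the rules of Σ′. This is a simplified illustration, not Orchestra's implementation: it assumes every mapping reads a single relation and has no existential variables, so each mapping is modelled as a function on tuples. The names `update_exchange` and `Mapping`, and the example relations G, B, U, are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[str, ...]
# Simplified mapping (assumption): (source relation, target relation, tuple function).
# It reads the source's output table and writes the target's input table.
Mapping = Tuple[str, str, Callable[[Row], Row]]


def update_exchange(
    local: Dict[str, Set[Row]],
    rejected: Dict[str, Set[Row]],
    mappings: List[Mapping],
) -> Tuple[Dict[str, Set[Row]], Dict[str, Set[Row]]]:
    """Return (input tables, output tables) as the fixpoint of the rules in Σ′."""
    names = set(local) | set(rejected)
    for src, dst, _ in mappings:
        names |= {src, dst}
    inp: Dict[str, Set[Row]] = {r: set() for r in names}
    out: Dict[str, Set[Row]] = {r: set() for r in names}
    changed = True
    while changed:
        changed = False
        for r in names:
            # R_l(x) -> R_o(x)   and   R_i(x) AND NOT R_r(x) -> R_o(x)
            new_out = local.get(r, set()) | (inp[r] - rejected.get(r, set()))
            if not new_out <= out[r]:
                out[r] |= new_out
                changed = True
        for src, dst, fn in mappings:
            # Translated mapping: lhs uses R_o, rhs uses R_i.
            new_in = {fn(t) for t in out[src]}
            if not new_in <= inp[dst]:
                inp[dst] |= new_in
                changed = True
    return inp, out


# Hypothetical example: G is copied into B, and B is projected into U.
local = {"G": {("1", "a"), ("2", "b")}, "B": {("3", "c")}}
rejected = {"B": {("2", "b")}}
mappings = [("G", "B", lambda t: t), ("B", "U", lambda t: (t[1],))]
inp, out = update_exchange(local, rejected, mappings)
```

The rejected tuple ("2", "b") still arrives in the input table of B but is filtered out of its output table, so it is not propagated further to U. Because the rejections and local contributions are fixed during the computation, the negation is harmless and the loop only ever adds tuples.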

  79. 3. Materialization of Data in Peer Data Management 3.3. Orchestra Provenance. Need to track from where tuples are derived, and how. Provenance token: • base tuple: tuple id • derived tuple: polynomial, with binary operators +, · and a unary function for each mapping. S.Skritek – Theory of PDM 45/54
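
A provenance polynomial of this shape can be sketched as a small expression tree. The class names and the `trusted` function below are hypothetical. Evaluating the expression with Boolean "or" for +, "and" for ·, and a per-mapping trust check is one plausible way to apply trust conditions, assumed here for illustration rather than taken from the slides.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Token:
    """Provenance of a base tuple: its tuple id."""
    tid: str


@dataclass(frozen=True)
class Plus:
    """Alternative derivations of the same tuple (operator +)."""
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Times:
    """Joint use of several tuples in one derivation (operator ·)."""
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Map:
    """Unary function recording which mapping produced the tuple."""
    name: str
    arg: "Expr"


Expr = Union[Token, Plus, Times, Map]


def trusted(expr: Expr, tokens: Set[str], mappings: Set[str]) -> bool:
    """Evaluate the polynomial over Booleans: + is 'or', · is 'and'."""
    if isinstance(expr, Token):
        return expr.tid in tokens
    if isinstance(expr, Plus):
        return trusted(expr.left, tokens, mappings) or trusted(expr.right, tokens, mappings)
    if isinstance(expr, Times):
        return trusted(expr.left, tokens, mappings) and trusted(expr.right, tokens, mappings)
    if isinstance(expr, Map):
        return expr.name in mappings and trusted(expr.arg, tokens, mappings)
    raise TypeError(f"not a provenance expression: {expr!r}")


# m1(p1 · p2) + m2(p3): derived via m1 from p1 and p2 together, or via m2 from p3.
expr = Plus(Map("m1", Times(Token("p1"), Token("p2"))), Map("m2", Token("p3")))
```

With this reading, a derived tuple is accepted as soon as at least one of its derivations uses only trusted base tuples and trusted mappings.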
