Towards a Formal Model for View Maintenance in Data Warehouses



slide-1
SLIDE 1

Towards a Formal Model for View Maintenance in Data Warehouses

  • D. Agrawal, A. El Abbadi, A. Mostéfaoui, M. Raynal and M. Roy

Towards a Formal Model for View Maintenance in Data Warehouses – p.1/22

slide-2
SLIDE 2

Summary

  • The Data Warehouse Problem
      • Definitions
      • Existing protocols
  • A Formal Definition of the Problem
      • Formal Definition of Data Objects
      • Abstract Definition of View Management
  • The Protocol
      • A Virtual Topology
      • A Pipelining Technique


slide-3
SLIDE 3

The Data Warehouse Problem

  • A set of databases x1, x2, ⋯, xn
  • How to efficiently query a database aggregate?

[Figure: a query issued against the databases x1, ..., x5]


slide-4
SLIDE 4

The Data Warehouse Problem

  • A set of databases x1, x2, ⋯, xn
  • How to efficiently query a database aggregate?
  • By adding a Data Warehouse

[Figure: the query is issued against a Data Warehouse placed in front of the databases x1, ..., x5]


slide-5
SLIDE 5

Data Warehouse: Definition

  • The Data Warehouse maintains a DB summary: a Select-Project-Join (SPJ) expression

        $F(X_1, \cdots, X_n) = \Pi_A(\sigma_C(X_1 \bowtie \cdots \bowtie X_n))$

  • Data Warehouse (DWH) problem ≡ computation of a "simple" distributed function with changing Data Sources.
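
To make the SPJ expression concrete, here is a minimal Python sketch that evaluates a select-project-join over two tiny in-memory relations. The relation contents, the attribute names, and the helper names (`natural_join`, `select`, `project`, `spj_view`) are assumptions made for this illustration only; rows are kept with bag semantics rather than as sets.

```python
from functools import reduce

def natural_join(r, s):
    """Join two relations (lists of dicts) on the attribute names they share."""
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]

def select(r, cond):
    """Keep only the rows that satisfy the condition C."""
    return [row for row in r if cond(row)]

def project(r, attrs):
    """Keep only the attributes listed in A."""
    return [{k: row[k] for k in attrs} for row in r]

def spj_view(sources, cond, attrs):
    """Evaluate the projection of the selection of the join of all sources."""
    joined = reduce(natural_join, sources)
    return project(select(joined, cond), attrs)

# Hypothetical data sources X1 and X2.
orders = [{"cust": 1, "item": "pen"}, {"cust": 2, "item": "ink"}]
customers = [{"cust": 1, "city": "Rennes"}, {"cust": 2, "city": "Paris"}]

view = spj_view(
    [orders, customers],
    lambda row: row["city"] == "Rennes",
    ["item", "city"],
)
```

Recomputing `spj_view` from scratch after every source change is what the view maintenance protocol tries to avoid.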


slide-6
SLIDE 6

Extremal Solutions

  • The DWH maintains the total aggregation of all Data Sources:
      • costly in space
      • unnecessary network usage
  • The DWH stores no data and forwards queries to the Data Sources:
      • high latency
      • unnecessary network usage
      • the DWH is then just a proxy


slide-7
SLIDE 7

Proposed Solutions

  • The DWH maintains the SPJ expression F
  • Periodically, it calculates ∆F
  • Major problem: asynchrony of updates on Data Sources
      • error terms


slide-8
SLIDE 8

Major Difficulties

  • Asynchrony and distribution of the model:
      • consistency issues
      • performance issues: network usage, memory/disk usage on the dwh
  • Complexity of proposed protocols:
      • unproved algorithms
      • need for a formal definition of the problem


slide-9
SLIDE 9

Formal Definitions (data)

  • Data Objects, denoted xi:
      • a data manager is associated with each xi
      • xi can be updated and read using the query/update primitives
  • Timeline: the successive values of xi are denoted $(x_i^{[t]})_{t>0}$.


slide-10
SLIDE 10

Formal Definitions (operations)

Data Operations:

  • add/remove, denoted ⊕, for source updates: associative, commutative
  • a join operation, denoted ⊗: associative, commutative, distributive over ⊕
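
One possible concrete model of these operations, given here only as a sketch, represents a relation as a multiset of rows with signed multiplicities: ⊕ adds multiplicities (a negative count encodes a removal) and ⊗ is a natural join that multiplies them. This model, and the names `row`, `normalize`, `oplus` and `otimes`, are assumptions for illustration; the slides only state the algebraic properties.

```python
from collections import Counter

def row(**attrs):
    """A row is an order-independent set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def normalize(rel):
    """Drop rows whose multiplicity cancelled out to zero."""
    return Counter({r: m for r, m in rel.items() if m != 0})

def oplus(a, b):
    """Add/remove: sum the signed multiplicities of both relations."""
    out = Counter(a)
    for r, m in b.items():
        out[r] += m
    return normalize(out)

def otimes(a, b):
    """Join: merge rows that agree on shared attributes, multiplying counts."""
    out = Counter()
    for ra, ma in a.items():
        da = dict(ra)
        for rb, mb in b.items():
            db = dict(rb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out[frozenset({**da, **db}.items())] += ma * mb
    return normalize(out)

# Hypothetical relations and an update that inserts one row and removes another.
x = Counter({row(k=1, a="p"): 1, row(k=2, a="q"): 1})
y = Counter({row(k=1, b="u"): 1})
d = Counter({row(k=2, b="v"): 1, row(k=1, b="u"): -1})

lhs = otimes(x, oplus(y, d))             # join after applying the update
rhs = oplus(otimes(x, y), otimes(x, d))  # old join plus the incremental term
```

On this data `lhs` and `rhs` coincide, which is the distributivity property that incremental view maintenance relies on.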


slide-11
SLIDE 11

Formal Definitions (dwh)

  • the Data Warehouse calculates F such that F = x1 ⊗ x2 ⊗ ⋯ ⊗ xn
  • consistency is mandatory at any time
  • up-to-dateness is eventual, for performance reasons


slide-12
SLIDE 12

Abstract Def. of View Management

A View Management protocol should satisfy:

  • Validity: any query on the dwh returns an $f = x_1^{[t_1]} \otimes \cdots \otimes x_n^{[t_n]}$.
  • Order Consistency: if $q_1 = x_1^{[t_{11}]} \otimes \cdots \otimes x_n^{[t_{1n}]}$ (resp. $q_2 = x_1^{[t_{21}]} \otimes \cdots \otimes x_n^{[t_{2n}]}$) is the result of a query, and $q_1$ was issued before $q_2$, then $\forall i,\ t_{1i} \le t_{2i}$.
  • Up-to-Dateness: for any $t > 0$ and any $i \in [1..n]$, an infinite sequence of queries will return at least one $f = F(\cdots, x_i^{[t']}, \cdots)$ with $t' \ge t$.
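
The first two properties can be phrased as simple checks over recorded executions. The sketch below is an illustration under assumptions: data values are modelled as integers with ⊗ taken as multiplication, and the function names and the shape of the recorded data are hypothetical.

```python
from itertools import product
from math import prod

def is_valid(f, histories):
    """Validity: f equals the join of one historical version of each source.

    histories[i] lists the successive values taken by source i.
    """
    return any(prod(choice) == f for choice in product(*histories))

def order_consistent(version_vectors):
    """Order Consistency over query results listed in issue order.

    version_vectors[k][i] is the version of source i reflected by the k-th
    result. Checking consecutive pairs is enough because <= is transitive.
    """
    return all(
        earlier[i] <= later[i]
        for earlier, later in zip(version_vectors, version_vectors[1:])
        for i in range(len(earlier))
    )

# Hypothetical histories: source 1 went from 2 to 3, source 2 from 3 to 5.
histories = [[2, 3], [3, 5]]
```

Up-to-Dateness is a liveness property over infinite executions, so it cannot be checked on a finite trace in the same way.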


slide-13
SLIDE 13

The Protocol: a single update

Suppose that F = x1 ⊗ x2 ⊗ x3 ⊗ x4.

  • If x1 is updated to x1 ⊕ δ1, then the corresponding ∆F is:

        ∆F = δ1 ⊗ x2 ⊗ x3 ⊗ x4

  • x1's data manager sends δ1 to x2:
      • x2's data manager computes δ1 ⊗ x2 and sends the result to x3
      • x3's data manager computes δ1 ⊗ x2 ⊗ x3 and sends the result to x4
      • when x4's data manager computes δ1 ⊗ x2 ⊗ x3 ⊗ x4, it can send the result to the dwh
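
The relay of δ1 along the data managers can be sketched as follows. Purely as an assumption for illustration, data values are modelled as integers with ⊕ as addition and ⊗ as multiplication, which satisfy the required algebraic properties; `relay_delta` and the sample values are hypothetical.

```python
from math import prod

def relay_delta(delta, downstream_values):
    """Each downstream data manager joins the partial result with its own
    value and forwards it; the last one holds the complete delta of F."""
    partial = delta
    for value in downstream_values:
        partial = partial * value   # partial ⊗ x_k
    return partial

x = [2, 3, 5, 7]                    # x1..x4
f = prod(x)                         # F = x1 ⊗ x2 ⊗ x3 ⊗ x4
delta_1 = 4                         # x1 is updated to x1 ⊕ δ1

delta_f = relay_delta(delta_1, x[1:])   # δ1 ⊗ x2 ⊗ x3 ⊗ x4
x_new = [x[0] + delta_1] + x[1:]
```

Adding `delta_f` to the old view gives the same value as recomputing the view over `x_new`, without shipping any full source to the warehouse.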


slide-14
SLIDE 14

The Protocol: Concurrent Updates

Now, suppose that both x1 and x2 are updated:

    F′ = (x1 ⊕ δ1) ⊗ (x2 ⊕ δ2) ⊗ x3 ⊗ x4
    F′ = F ⊕ ∆F
    ∆F = (δ1 ⊗ x2 ⊗ x3 ⊗ x4) ⊕ (x1 ⊗ δ2 ⊗ x3 ⊗ x4) ⊕ (δ1 ⊗ δ2 ⊗ x3 ⊗ x4)

Complexity increases with concurrency. Two solutions:

  1. compute error terms
  2. order the updates
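
Both solutions can be checked numerically on a small instance. As an assumption for illustration only, values are integers with ⊕ as addition and ⊗ as multiplication; the sample values are hypothetical.

```python
from math import prod

x1, x2, x3, x4 = 2, 3, 5, 7
d1, d2 = 1, 2                       # concurrent updates δ1 on x1 and δ2 on x2

f_old = prod([x1, x2, x3, x4])
f_new = prod([x1 + d1, x2 + d2, x3, x4])

# Solution 1: treat the updates as simultaneous and include the cross
# (error) term δ1 ⊗ δ2 ⊗ x3 ⊗ x4.
delta_with_error_term = (
    d1 * x2 * x3 * x4
    + x1 * d2 * x3 * x4
    + d1 * d2 * x3 * x4
)

# Solution 2: order the updates. δ1 is applied against the old x2, then δ2
# is applied against the already-updated x1, so no cross term is needed.
step_1 = d1 * x2 * x3 * x4
step_2 = (x1 + d1) * d2 * x3 * x4
```

With n concurrent updates the first approach needs a term for every non-empty subset of updated sources, whereas ordering keeps one term per update.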


slide-15
SLIDE 15

The Protocol: a Virtual Topology

  • the star topology (center: dwh, edges: nodes) is seen as a ring
  • a token perpetually moves on the ring
  • it generates a natural order on updates


slide-16
SLIDE 16

The Protocol: Pipelining Updates

  • The token generates a global time (number of steps).
  • The sites maintain an additional variable: the difference ∆i between the current xi and the last committed xi.
  • Once an update has made a full rotation, it can be integrated into the data warehouse.
  • The token can contain up to n updates in the commitment phase.


slide-17
SLIDE 17

The Protocol: Code (1)

when the token arrives at xi with sequence number sn:

  1. let ∆F = token[i];
  2. if (∆F ≠ ⊥) then sn ← sn + 1; send incr (∆F, sn) to DWH endif;
  3. token[i] ← ∆i;
  4. ∀j ≠ i do token[j] ← (token[j] ⊗ (xi ⊖ ∆i)) enddo;
  5. ∆i ← ⊥;
  6. send token (token, sn) to the next data manager
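
The token handler can be simulated in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: values are integers (⊕ as +, ⊖ as -, ⊗ as ×), ⊥ is modelled by `None`, the ring is a plain loop over site indices, and the names `on_token`, `local_update` and `outbox` are hypothetical.

```python
from math import prod

BOTTOM = None  # stands for ⊥ (no pending update)

def on_token(i, x, pending, token, sn, outbox):
    """Handle the token arriving at data manager i and return the new sn."""
    delta_f = token[i]                        # step 1
    if delta_f is not BOTTOM:                 # step 2: full rotation completed
        sn += 1
        outbox.append((delta_f, sn))          # incr(ΔF, sn) for the DWH
    own = pending[i]
    committed = x[i] - (own if own is not BOTTOM else 0)   # x_i ⊖ Δ_i
    token[i] = own                            # step 3
    for j in range(len(token)):               # step 4
        if j != i and token[j] is not BOTTOM:
            token[j] *= committed
    pending[i] = BOTTOM                       # step 5
    return sn                                 # step 6: caller forwards the token

def local_update(i, delta, x, pending):
    """Apply an update at source i and accumulate it in the pending delta."""
    x[i] += delta
    pending[i] = delta if pending[i] is BOTTOM else pending[i] + delta

x = [2, 3, 5, 7]
pending = [BOTTOM] * 4
token = [BOTTOM] * 4
view = prod(x)                                # initial content of the DWH
outbox, sn = [], 0

local_update(0, 1, x, pending)
local_update(1, 2, x, pending)
for step in range(8):                         # two full rotations
    if step == 2:
        local_update(3, -4, x, pending)       # update while the token circulates
    sn = on_token(step % 4, x, pending, token, sn, outbox)

for delta_f, _ in outbox:                     # DWH applies increments in order
    view += delta_f
```

After two rotations every update has been delivered, and the view equals the join of the current source values. Each intermediate view (315, then 525) is also the join of values that the sources actually held at some point.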


slide-18
SLIDE 18

The Protocol: Code (2)

when update (δi) is received by xi:

  1. xi ← xi ⊕ δi;
  2. ∆i ← ∆i ⊕ δi

when incr (∆F, sn) is received by DWH:

  1. wait (next_sn = sn);
  2. f ← f ⊕ ∆F;
  3. next_sn ← next_sn + 1
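
The `wait (next_sn = sn)` step on the warehouse side can be modelled as a reordering buffer. The sketch below assumes integer values with ⊕ as addition, and assumes that the first `incr` message carries sn = 1 (the slides do not give the initial value); the class and attribute names are hypothetical.

```python
class Warehouse:
    """Applies incr(ΔF, sn) messages strictly in sequence-number order."""

    def __init__(self, initial_view):
        self.f = initial_view
        self.next_sn = 1      # assumption: sn starts at 0 at the sources
        self.waiting = {}     # sn -> ΔF for messages that arrived early

    def on_incr(self, delta_f, sn):
        self.waiting[sn] = delta_f
        # Apply every increment whose turn has come.
        while self.next_sn in self.waiting:
            self.f += self.waiting.pop(self.next_sn)   # f ← f ⊕ ΔF
            self.next_sn += 1

dwh = Warehouse(210)
dwh.on_incr(210, 2)           # arrives early: buffered, view unchanged
view_while_waiting = dwh.f
dwh.on_incr(105, 1)           # unblocks both increments
```

Buffering instead of blocking keeps the handler non-blocking while preserving the order imposed by the token.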


slide-19
SLIDE 19

The Protocol: Sketch for the Proof

Validity, Up-to-dateness and Order Consistency:

  • use a total order: the number of steps performed by the token
  • induction on the content of the token


slide-20
SLIDE 20

a Real Life Protocol

How to make a quiescent protocol?

  • when there is no update, the token is destroyed
  • when an update occurs, the data source sends a request to the data warehouse
  • if the token was destroyed, it is recreated


slide-21
SLIDE 21

a Real Life Protocol (2)

How to remove the ring assumption?

  • in a star network, each message goes to or comes from the dwh
  • the dwh incorporates updates and destroys/recreates the token when necessary


slide-22
SLIDE 22

Extension: Multi Term

  • Meta-datawarehouse: aggregation of multiple data warehouses
  • a data object may appear in several views computed in the data warehouses
  • synchronization problems, possible deadlocks

[Figure: data sources x1, ..., x5 feed DWH1 and DWH2, which feed a Meta-DWH holding x1x3x4 + x2x3x4x5]


slide-23
SLIDE 23

Conclusion

  • a formal definition of a database problem
  • an abstract protocol:
      • provable
      • can be adapted to fit real-life systems
      • efficient
