IRIT Lab., iiWAS'12 & MoMM'12
Evolution of Data Management Systems: from Uniprocessor to Large-scale Distributed Systems
- Prof. Abdelkader Hameurlain
<Hameurlain@irit.fr>
Pyramid Team, Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University
- Data Modelling & Semantics
- Query Processing & Optimization
- Concurrency Control (Transactions)
- Replication & Caching
- Cost Models
- Security and Reliability Issues
- Monitoring Services
- Resource Discovery
- Autonomic Data Management (self-tuning, self-repairing, …)
- …
File Organizations: <Sequential/Indexed> Organization; <Hashing/Relative> Organization
Access Methods (AM): Sequential AM; Key AM := <Indexed/Hashing> AM
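To illustrate the two families of access methods, the following Python sketch (an assumption, not taken from the slides) models a file as an in-memory list of (key, payload) records. `sequential_lookup` reads records in file order until it finds the key. The hypothetical `HashedFile` class represents a hashing-based key AM: it places each record in a bucket computed from its key, so a lookup reads only that one bucket.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout for this sketch: (key, payload).
Record = Tuple[int, str]


def sequential_lookup(records: List[Record], key: int) -> Tuple[Optional[Record], int]:
    """Sequential AM: read records in file order until the key is found.
    Returns the record (or None) and the number of records read."""
    reads = 0
    for rec in records:
        reads += 1
        if rec[0] == key:
            return rec, reads
    return None, reads


class HashedFile:
    """Key AM, hashing variant: each record is stored in the bucket chosen by
    hash(key) % n_buckets, so a lookup reads a single bucket."""

    def __init__(self, records: List[Record], n_buckets: int = 8) -> None:
        self.n_buckets = n_buckets
        self.buckets: List[List[Record]] = [[] for _ in range(n_buckets)]
        for rec in records:
            self.buckets[self._bucket(rec[0])].append(rec)

    def _bucket(self, key: int) -> int:
        return hash(key) % self.n_buckets

    def lookup(self, key: int) -> Tuple[Optional[Record], int]:
        """Read only the bucket the key hashes to; return the record and records read."""
        reads = 0
        for rec in self.buckets[self._bucket(key)]:
            reads += 1
            if rec[0] == key:
                return rec, reads
        return None, reads


if __name__ == "__main__":
    data = [(k, f"rec{k}") for k in range(1000)]
    print(sequential_lookup(data, 999))   # ((999, 'rec999'), 1000)
    print(HashedFile(data).lookup(999))   # ((999, 'rec999'), 125)
```

With 1,000 records, the sequential AM reads all of them to reach key 999, while the hashed file reads only the 125 records in a single bucket. An indexed AM reaches the same goal through an ordered index instead of a hash function.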
Data description must be repeated in each program.
Relationships between files are materialized as new files.
Structured Data: Data Model Definition
Stored Data on Disk: I/O Management
Shared Data: Concurrency Control (Transactions, …)
What is the Objective of a DM?
What is the Richness of a DM?
Relational Algebra RA [Codd 72]: Basic Operations & Additional Operations
Fundamental Characteristics of RA:
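Closure (Internal Law): Op_i(R_i [, R_j]) = Relation
Commutativity: R1 × R2 = R2 × R1; SJ = JS (a Selection and a Join commute when the selection predicate involves only one operand)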
Algebraic Language/Rel. Algeb. Expression: P(S(J(Emp, Dept,
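The characteristics above can be checked on a small Python sketch. It assumes set semantics, represents each tuple as a set of (attribute, value) pairs, and uses hypothetical schemas Emp(eno, ename, dno) and Dept(dno, dname). Every operator returns a relation (closure), so an expression of the form P(S(J(Emp, Dept))) nests directly.

```python
from typing import Callable, FrozenSet, Iterable, Mapping, Tuple

# A tuple is modelled as a set of (attribute, value) pairs and a relation as a
# set of such tuples, so attribute order plays no role (set semantics).
Row = FrozenSet[Tuple[str, object]]
Relation = FrozenSet[Row]


def rel(rows: Iterable[Mapping[str, object]]) -> Relation:
    """Build a relation from dictionaries."""
    return frozenset(frozenset(r.items()) for r in rows)


def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """S: keep the tuples satisfying the predicate."""
    return frozenset(t for t in r if pred(dict(t)))


def project(r: Relation, attrs: Iterable[str]) -> Relation:
    """P: keep only the listed attributes (duplicates vanish)."""
    keep = set(attrs)
    return frozenset(frozenset((a, v) for a, v in t if a in keep) for t in r)


def product(r: Relation, s: Relation) -> Relation:
    """Cartesian product; assumes r and s have disjoint attribute names."""
    return frozenset(t | u for t in r for u in s)


def join(r: Relation, s: Relation) -> Relation:
    """J: natural join on the attributes shared by r and s."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return frozenset(out)


# Hypothetical sample data; the schemas of Emp and Dept are assumptions.
EMP = rel([
    {"eno": 1, "ename": "Alice", "dno": 10},
    {"eno": 2, "ename": "Bob", "dno": 20},
    {"eno": 3, "ename": "Carol", "dno": 10},
])
DEPT = rel([{"dno": 10, "dname": "R&D"}, {"dno": 20, "dname": "Sales"}])

# P(S(J(Emp, Dept))): names of employees working in the R&D department.
query = project(select(join(EMP, DEPT), lambda t: t["dname"] == "R&D"), ["ename"])
```

Because tuples are unordered sets of attribute/value pairs, `product(R1, R2) == product(R2, R1)` and `join(EMP, DEPT) == join(DEPT, EMP)` both hold. A selection on `dno` gives the same result whether it is applied before or after the join, which is the SJ = JS property.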
Declarative Languages: SQL [Cham 76], QUEL [Sto 76], QBE [Zlo 77]
Logical Optimization: Rewriting of Algebraic Tree
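A classic rewriting rule in logical optimization pushes a selection below a join so that tuples are filtered early. The sketch below is a simplified illustration, not a full rewriter. It represents an algebraic tree with hypothetical dataclasses (`Rel`, `Select`, `Join`, `Project`), restricts predicates to the form attr = value, and applies only this one rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Rel:
    name: str
    attrs: FrozenSet[str]


@dataclass(frozen=True)
class Select:
    attr: str       # simplifying assumption: predicates have the form attr = value
    value: object
    child: "Node"


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Project:
    attrs: FrozenSet[str]
    child: "Node"


Node = Union[Rel, Select, Join, Project]


def attrs_of(n: Node) -> FrozenSet[str]:
    """Attributes produced by a subtree."""
    if isinstance(n, Rel):
        return n.attrs
    if isinstance(n, Select):
        return attrs_of(n.child)
    if isinstance(n, Join):
        return attrs_of(n.left) | attrs_of(n.right)
    return n.attrs  # Project


def push_selections(n: Node) -> Node:
    """Rewrite S(J(A, B)) into J(S(A), B) or J(A, S(B)) when the selection
    attribute belongs to only one join input; applied recursively."""
    if isinstance(n, Rel):
        return n
    if isinstance(n, Project):
        return Project(n.attrs, push_selections(n.child))
    if isinstance(n, Join):
        return Join(push_selections(n.left), push_selections(n.right))
    child = push_selections(n.child)  # n is a Select
    if isinstance(child, Join):
        in_left = n.attr in attrs_of(child.left)
        in_right = n.attr in attrs_of(child.right)
        if in_left and not in_right:
            return Join(push_selections(Select(n.attr, n.value, child.left)), child.right)
        if in_right and not in_left:
            return Join(child.left, push_selections(Select(n.attr, n.value, child.right)))
    return Select(n.attr, n.value, child)


if __name__ == "__main__":
    emp = Rel("Emp", frozenset({"eno", "ename", "dno"}))
    dept = Rel("Dept", frozenset({"dno", "dname"}))
    tree = Project(frozenset({"ename"}), Select("dname", "R&D", Join(emp, dept)))
    print(push_selections(tree))
```

With this rule alone, a selection on `dno`, an attribute of both inputs, stays above the join. Pushing it further (for example into both inputs of a natural join) would require an additional rule, which this sketch omits.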
Complex Queries: Number of Joins > 6?
Size of the Search Space [Tan 91]: Very Large (e.g. 2^(N-1))
Optimization Cost: can be very expensive
Optimal Execution Plan: not guaranteed
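To show why the search space grows so quickly, the sketch below counts candidate join trees using the standard combinatorial formulas. This is an illustration added here, not the figures from [Tan 91]. The counts treat every ordering of operands as distinct and do not exclude Cartesian products.

```python
from math import factorial


def left_deep_trees(n: int) -> int:
    """Left-deep join trees over n relations: one per ordering of the relations, n!."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Bushy join trees over n relations: (2(n-1))! / (n-1)!, i.e. the number of
    binary tree shapes with n leaves (a Catalan number) times the n! leaf orderings."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10):
        print(f"{n:>2} relations: {left_deep_trees(n):>10} left-deep, {bushy_trees(n):>14} bushy")
```

With 10 relations there are already 3,628,800 left-deep trees and 17,643,225,600 bushy trees. This is why optimizers for complex queries often fall back on heuristics or randomized search, at the price of losing the guarantee of an optimal plan.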
Methods (2)
Fragmentation of Relations: Horizontal, Vertical, Hybrid
Location Sites
Replication Sites
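The fragmentation kinds can be sketched in Python, assuming relations are lists of dictionaries with a key attribute (`eno` in the usage below, a hypothetical name). A horizontal fragment is a selection. A vertical fragment is a projection that keeps the key. Each kind is rebuilt by its inverse: union for horizontal fragments, join on the key for vertical ones. Allocation to location and replication sites is not modelled here.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def horizontal(rel: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Horizontal fragment: the tuples satisfying a predicate (a selection)."""
    return [r for r in rel if pred(r)]


def vertical(rel: List[Row], attrs: List[str], key: str) -> List[Row]:
    """Vertical fragment: a projection that always keeps the key attribute."""
    cols = [key] + [a for a in attrs if a != key]
    return [{a: r[a] for a in cols} for r in rel]


def rebuild_horizontal(*fragments: List[Row]) -> List[Row]:
    """Rebuild from horizontal fragments by union; assumes the fragments are
    disjoint and together cover the relation."""
    out: List[Row] = []
    for frag in fragments:
        out.extend(frag)
    return out


def rebuild_vertical(f1: List[Row], f2: List[Row], key: str) -> List[Row]:
    """Rebuild from two vertical fragments by joining them on the key."""
    by_key = {r[key]: r for r in f2}
    return [{**r, **by_key[r[key]]} for r in f1 if r[key] in by_key]
```

A hybrid fragment is a vertical fragment of a horizontal fragment, or the reverse, and is rebuilt by combining the two reconstruction steps.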
Direct Join: R (Site1) Join S (Site2); transfer the smaller relation
SemiJoin-based Join: = <Project; SemiJoin; Join>
Determining the optimal execution site for each local subquery
Scheduling of intersite operators minimizing a cost function
Physical Optimization (Uniprocessor Env.)
Parallelization (Parallel Env.)
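The join strategies and the choice of execution site can be compared using a deliberately simple cost model. As an assumption, cost here is the number of attribute values shipped between sites; real cost models also account for message setup and local processing. `best_site` is a hypothetical helper that places a subquery at the site minimizing the data that must be shipped to it.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def width(rel: List[Row]) -> int:
    """Number of attributes per tuple (0 for an empty relation)."""
    return len(rel[0]) if rel else 0


def direct_join_cost(r: List[Row], s: List[Row]) -> int:
    """Direct join: the smaller relation is shipped whole to the other site."""
    return min(len(r) * width(r), len(s) * width(s))


def semijoin_join(r: List[Row], s: List[Row], a: str) -> Tuple[List[Row], int]:
    """<Project; SemiJoin; Join> with R at site 1, S at site 2 and join attribute a.
    Returns the join result and the number of values shipped between sites."""
    keys = {t[a] for t in s}                    # Project S on a (site 2) ...
    shipped = len(keys)                         # ... and ship it to site 1
    reduced = [t for t in r if t[a] in keys]    # SemiJoin R with S (site 1) ...
    shipped += len(reduced) * width(r)          # ... and ship the reduced R to site 2
    result = [{**t, **u} for t in reduced for u in s if t[a] == u[a]]  # Join (site 2)
    return result, shipped


def best_site(sizes: Dict[str, int], location: Dict[str, str]) -> Tuple[str, Dict[str, int]]:
    """Pick the execution site of a subquery: each operand not stored at the
    candidate site must be shipped there, and the cheapest site wins."""
    sites = sorted(set(location.values()))
    costs = {site: sum(size for rel, size in sizes.items() if location[rel] != site)
             for site in sites}
    return min(costs, key=costs.get), costs
```

Take R as 1,000 two-attribute tuples at site 1 and S as 1,000 two-attribute tuples at site 2, with only 10 tuples of R matching. The direct join ships 2,000 values. The semijoin-based strategy ships 1,000 projected keys plus 10 reduced tuples, or 1,020 values. When S is small or the semijoin filters little, the direct join wins instead.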
Models ==> Pivot Model (e.g. Relational Model)
Semantic Conflicts (Integration of DB Schemas)
Servers (e.g. local DBMS, Processors, …)
Data Sources can be structured in different models (Autonomy!)
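[Figure: mediation example. Users (Investigation Responsible) express a query on the global schema; the query is reformulated over heterogeneous sources (Information System of an Insurance Comp.; DB: Toulouse Univ. Employees; RDBMS; Web Pages), and the result is returned to the users.]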
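The reformulation step can be sketched under one common mediation approach, global-as-view (GAV). This is an assumption; the slides do not say which mapping style is used. In GAV, each global relation is defined as the union of source relations, which wrappers expose in the pivot (relational) model. A query on the global schema is therefore reformulated into one subquery per source, and the partial results are unioned. The wrapper functions, the `Person` relation and its attributes are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Wrapper = Callable[[], List[Row]]


def wrap_rdbms() -> List[Row]:
    """Hypothetical wrapper over a relational source: already in the pivot model."""
    return [{"name": "Alice", "employer": "Toulouse Univ."}]


def wrap_web_pages() -> List[Row]:
    """Hypothetical wrapper over Web pages: extracted fields are renamed to
    the global schema's attributes (resolving a naming conflict)."""
    extracted = [{"fullName": "Bob", "org": "Insurance Comp."}]
    return [{"name": e["fullName"], "employer": e["org"]} for e in extracted]


# GAV mapping: the global relation Person is defined as the union of its sources.
GLOBAL_SCHEMA: Dict[str, List[Wrapper]] = {"Person": [wrap_rdbms, wrap_web_pages]}


def answer(relation: str, pred: Callable[[Row], bool]) -> List[Row]:
    """Reformulate a selection on a global relation into one subquery per
    source, apply it to each source through its wrapper, and union the results."""
    result: List[Row] = []
    for source in GLOBAL_SCHEMA[relation]:
        result.extend(row for row in source() if pred(row))
    return result
```

In this sketch the selection is applied to each wrapper's output. A mediator would normally try to push it down to sources that can evaluate it themselves, which reduces the data transferred.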
Methods (1)
Reducing Delays in Data Arrival Rates [Ams 98, Ive 99]
Reducing Network Traffic: Distant Interactions
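To illustrate the first objective, here is a generic symmetric (double pipelined) hash join in Python. It is a non-blocking operator often used when source data arrives late or at an irregular rate. It is offered as an illustration, not as the exact technique of [Ams 98] or [Ive 99]. The `arrivals` stream of ("R" or "S", tuple) pairs is a hypothetical stand-in for tuples arriving from two remote sources.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def symmetric_hash_join(arrivals: Iterable[Tuple[str, Row]], key: str) -> Iterator[Row]:
    """Join two inputs, "R" and "S", whose tuples arrive interleaved in any order.
    Each arriving tuple is inserted into its own side's hash table and probed
    against the other side's table, so matches are emitted immediately."""
    tables: Dict[str, DefaultDict[object, List[Row]]] = {
        "R": defaultdict(list),
        "S": defaultdict(list),
    }
    for side, row in arrivals:
        other = "S" if side == "R" else "R"
        tables[side][row[key]].append(row)
        for match in tables[other].get(row[key], []):
            yield {**match, **row}


if __name__ == "__main__":
    stream = [
        ("R", {"k": 1, "a": "r1"}),
        ("S", {"k": 1, "b": "s1"}),   # first result is produced here
        ("S", {"k": 2, "b": "s2"}),
        ("R", {"k": 2, "a": "r2"}),
        ("R", {"k": 1, "a": "r3"}),
    ]
    for out in symmetric_hash_join(stream, "k"):
        print(out)
```

Each input keeps its own hash table, so the first result is emitted as soon as a matching pair has arrived. A classic build/probe hash join must first read one input completely. The cost of this approach is that both tables stay in memory.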
+ Large-Scale Environment
- Resource Discovery & Selection
- Query Processing & Optimization
- Monitoring Services
- Replication & Caching
- Cost Models
- Autonomic Data Management (self-tuning, self-repairing, …)
- Security Issues
- …