1
Database Heterogeneity
Lecture 13
2
Outline
- Database Integration
- Wrappers
- Mediators
- Integration Conflicts
- Data Warehousing
Database Heterogeneity Lecture 13 1 Outline Database - - PDF document
Database Heterogeneity Lecture 13 1 Outline Database Integration Wrappers Mediators Integration Conflicts Data Warehousing 2 Motivation If we all use the same database, things are quite simple However,
1
2
3
4
Ebay DVD
IMDB amazon Oracle PointBase MySQL IBM DB2 movie DB
5
6
database
Book(ISBN, Title, Price, Author) Author(Name, ISBN) Livre(ISBN, Prix, Titre) Auteur(Prenom, Nom, ISBN) Book(ISBN, Title) Author(Name, ISBN)
7
sources at run-time
Use S1, S2 Select Titre From S1.Book, S2.Livre Where S1.Book.ISBN = S2.Livre.ISBN
8
9
– data access over the network – inconsistent replicated data
– solved by using Web access (over HTTP) – Web Services, Java RMI, … – publishing using JSP – JDBC to access remote databases – etc.
10
11
12
13
DB1 DB2 DB3 DB4 Oracle PointBase MySQL IBM DB2 wrapper wrapper wrapper wrapper mediator application 1 application 2 application 3
mediators integrate the data from the DBs wrappers convert to a common representation
14
request/query result/data
Compensation for missing processing capabilities Transformation
Communication interface Source data Metadata
integrity constraints
15
16
17
18
19
– Semantics are important – integration mapping schema integration – can be implemented, e.g., as database views
– only for virtual integration
20
A simple Example: Source A: R1(prof, course, university) Source B: R2(title, prof, course) Definition of the global, integrated schema: Global(prof, course, title, university) Source A defined as: Create view R1 as SELECT prof, course, university FROM Global Source B defined as: Create view R2 sa SELECT title, prof, course FROM Global
21
(wrapper)
22
– source schema – source database – source application – database administrator, developer, user
23
– e.g. related names – One of the important current research topics
– reachability by paths
– distribution of values
24
25
View 1 View 2 View 3 Integrated Schema
Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema
...
Relational. DBMS Objectorient. DBMS File System Web Server
26
View 1 View 2 View 3 Integrated Schema
Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema
...
Relational. DBMS Objectorient. DBMS File System Web Server
27
View 1 View 2 View 3 Integrated Schema
Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema
...
Relational. DBMS Objectorient. DBMS File System Web Server
28
View 1 View 2 View 3 Integrated Schema
Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema
...
Relational. DBMS Objectorient. DBMS File System Web Server
29
View 1 View 2 View 3 Integrated Schema
Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema
...
Relational. DBMS Objectorient. DBMS File System Web Server
30
31
32
33
34
35
36
– similar to naming conflicts at schema level – Resolution: mapping tables and functions – Similarity functions
– of corresponding data instances – Resolution: mapping tables and functions – Prefer data from more trusted data source
37
38
39
40
– One approach to data integration
– (expressed e.g., as SQL views but stored)
41
– Extracted – Filtered – Merged – Enriched by historical information – Stored in a central repository
42
Data Warehouse End users DW Software Technology
Operational databases or
sources RDBMS, flat files or other sources One central repository
43
Snapshots done at given intervals (mat. views) Always up-to-date Data freshness Infrequent access to large data volumes Tuning for many queries to small data volumes Tuning / data volume Access to large data volumes Queries return relatively small results Data access For management decision to maximise profit (analysis, forecasts, ad-doc reports) Daily business Main usage Rather vague Typically known Requirements Looks at several “views” at the same time For a particular application / purpose Purpose DW Operational DB
by Klemens Boehm
44
to correct errors in data and potentially remove duplicate entries
45
– Heterogeneous resources and formats
– Certain data modifications are required
splitting/merging tables/columns
– Different models to store data:
46
– E.g., market analysis of demand and supply
– Requires multi-dimensional data structures such as data cubes, etc.
47
48
Source http://projects.cs.dal.ca/panda/datacube.html Source: http://hubpages.com/hub/DataCube
49
50
51