



  1. The International Workshop on Advanced Information Systems for Enterprises, Constantine, April 19-20, 2008
     Nadir Salhi & Amel Benna, CERIST, University A/Mira Bejaia, Algeria
     Zaia Alimazighi, LSI, USTHB, Algiers, Algeria
     Bilal Amrouche & Ferhat Makhloufi, INI, Algeria
     Presented by Amel Benna

  2. Agenda
     1. Background
     2. Data Integration Issues
     3. Our Approach for Data Integration
     4. Architecture
        • Schema Description Base
        • Query process
        • Implementation
     5. Conclusion & Perspectives
     Constantine, April 19-20, 2008, IWAISE'08

  3. Background
     • Information systems are evolving. Today, the issue is too many databases and too much information in heterogeneous and distributed environments.
     • To be efficient, companies need to manage and integrate all information sources, taking semantics into account.

  4. Background
     [Figure: diverse data sources (Source 1, Source 2, ..., Source n, e.g. Oracle, DB2, SQL Server, ...), each with its own schema, different ways to query, and different ways to reply.]
     • How can data sources cooperate?
     • How to integrate new sources?
     • How to find data semantics?
     Data Integration System:
     • to provide uniform access to heterogeneous sources;
     • to join partial replies from heterogeneous sources.

  5. Data Integration Issues
     • Definition: "Data integration is the process by which several autonomous, distributed, and heterogeneous data sources are integrated as a unique source represented by a global schema."
     • Among the issues to be addressed: heterogeneity
       - Model level: RDBMS, OODBMS, XML, ...
       - Structure: e.g. DB1: Book(Title, Author, ...), DB2: Book(Title, ISBN, ...)
       - Semantics:
         - Names: e.g. the label "NAME" used for a book title, an author, ...
         - Scaling & precision conflicts: e.g. book price in DB1 in euros with VAT, in DB2 in $ without VAT.
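
The scaling conflict in the example above (DB1 prices in euros with VAT, DB2 prices in $ without VAT) could be handled by converting both to a common unit before comparing them. The following is a minimal Python sketch of that idea only; the VAT rate, the exchange rate, and the function names are illustrative assumptions and do not come from the presentation.

```python
from decimal import Decimal

# Hypothetical parameters, chosen only for illustration.
VAT_RATE = Decimal("0.20")     # assumed VAT rate applied to DB1 prices
USD_PER_EUR = Decimal("1.50")  # assumed exchange rate

CENT = Decimal("0.01")


def db1_to_common(price_eur_with_vat: Decimal) -> Decimal:
    """Convert a DB1 price (euros, VAT included) to euros without VAT."""
    return (price_eur_with_vat / (1 + VAT_RATE)).quantize(CENT)


def db2_to_common(price_usd_without_vat: Decimal) -> Decimal:
    """Convert a DB2 price (dollars, VAT excluded) to euros without VAT."""
    return (price_usd_without_vat / USD_PER_EUR).quantize(CENT)
```

Once both prices are expressed in the same unit, a condition such as `Book.price < 100` has the same meaning for every source.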

  6. Data Integration Issues
     Related research in semantic interoperability for databases is categorized as:
     1. Query-oriented (based on declarative languages or extended SQL)
        + Scalability
        - Manual resolution of semantic conflicts
     [Figure: the user queries Source 1, Source 2, ..., Source n through a multibase language.]

  7. Data Integration Issues
     Related research in semantic interoperability for databases is categorized as:
     1. Query-oriented (based on declarative languages or extended SQL)
     2. Mapping-based (mapping between global & local schemas)
        + Transparency & semantic conflicts resolved
        - Dependency on particular global schemas
        - Scalability
        - Complexity of building the global schema
     [Figure: the user queries a global schema; an integration layer links it to Source 1, Source 2, ..., Source n.]

  8. Data Integration Issues
     Related research in semantic interoperability for databases is categorized as:
     1. Query-oriented (based on declarative languages or extended SQL)
     2. Mapping-based (mapping between global & local schemas)
     3. Intermediary-based (Mediator-Wrapper)
     Mediator:
     • Integrates data from different representations (mapping using GAV or LAV)
     • Decomposes the query
     • Re-composes the replies
     Wrappers convert the query from the mediator and the reply from the source to a common representation.
     [Figure: User, Mediator with its global schema, Wrapper 1, Wrapper 2, ..., Wrapper n, each in front of Source 1, Source 2, ..., Source n.]
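
The division of roles between mediator and wrappers can be illustrated with a small Python sketch. It is not the system described in the talk: the class names, the `provides` table, and the use of plain concatenation as the re-composition step are assumptions made to keep the example short.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Wrapper:
    """Hides one source behind a common row format (a list of dicts)."""

    def __init__(self, fetch: Callable[[List[str]], List[Row]]):
        # `fetch` stands in for a real, source-specific query mechanism.
        self._fetch = fetch

    def query(self, attributes: List[str]) -> List[Row]:
        return self._fetch(attributes)


class Mediator:
    """Splits a request across wrappers and gathers their replies."""

    def __init__(self, wrappers: Dict[str, Wrapper], provides: Dict[str, List[str]]):
        self.wrappers = wrappers
        self.provides = provides  # source name -> attributes it can answer

    def ask(self, attributes: List[str]) -> List[Row]:
        replies: List[Row] = []
        for name, wrapper in self.wrappers.items():
            # Decomposition: send each source only what it can answer.
            wanted = [a for a in attributes if a in self.provides[name]]
            if wanted:
                # Re-composition is simplified here to concatenation.
                replies.extend(wrapper.query(wanted))
        return replies
```

The user only ever calls `Mediator.ask`; which source holds which attribute stays hidden behind the wrappers.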

  9. Our Approach for Data Integration
     1. An intermediary-based approach (Mediator-Wrapper).
     2. Use of a domain ontology to resolve semantic conflicts.
     3. A "Schema Description Base", which we defined to store and manage mappings between the ontology and the sources.
     4. A user query format based on ontology concepts and similar to SQL.
     5. Algorithms for localization of the sources, decomposition of the query, and re-composition of the replies.
     Focused on relational databases as data sources.

  10. Architecture
      [Figure: architecture in three levels (User Level, Mediator Level, Database Level), showing the request and the reply, the ontology, the Query Processing Module, the Schema Description Base, and the wrappers.]

  11. Architecture
      1. User Level
      • The user has an interface for writing requests using ontology concepts.
      • The ontology is described in the OWL language: concepts, properties, and relations.
      • The user's request is written in the format:
          SELECT [list of properties]
          FROM [list of concepts | relation between concepts]
          WHERE [list of conditions]
        E.g.: SELECT Book.ISBN, Author.Name FROM Book, Author, Write(Book, Author) WHERE Book.price<100
      [Figure: example domain ontology with the nodes Individual, Book, Student, Author, Write, ISBN, and Name; the legend distinguishes concepts, "is a" links, "has property" links, and relations.]
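
To make the request format concrete, the Python sketch below splits a request of this shape into its properties, concepts, relations, and conditions. It is an illustrative parser written for this transcript, not the one used in the prototype; in particular, treating any FROM item that contains parentheses as a relation, and splitting conditions on AND, are assumptions.

```python
import re
from typing import Dict, List


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_request(request: str) -> Dict[str, List[str]]:
    """Extract properties, concepts, relations, and conditions."""
    match = re.fullmatch(
        r"\s*SELECT\s+(.*?)\s+FROM\s+(.*?)(?:\s+WHERE\s+(.*?))?\s*",
        request,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("expected SELECT ... FROM ... [WHERE ...]")
    select_part, from_part, where_part = match.groups()
    from_items = split_top_level(from_part)
    conditions = (
        re.split(r"\s+AND\s+", where_part.strip(), flags=re.IGNORECASE)
        if where_part
        else []
    )
    return {
        "properties": split_top_level(select_part),
        # Assumption: a FROM item with parentheses names a relation.
        "concepts": [item for item in from_items if "(" not in item],
        "relations": [item for item in from_items if "(" in item],
        "conditions": conditions,
    }
```

Applied to the example request from the slide, `parse_request` separates `Book` and `Author` (concepts) from `Write(Book, Author)` (relation).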

  12. Architecture
      2. Mediator Level
      [Figure: mediator level, showing the ontology, the Query Processing Module, the Schema Description Base, and the wrappers.]

  13. Schema Description Base: Mapping Ontology - Source
      • The Schema Description Base is a database that stores mappings between the ontology and the sources.
      • In our case, the mapping is done manually by the DBA of each source.
      • Our mapping is based on the methodology for building an ontology from a relational database.
      • The mapping can be defined as follows:
        - Every attribute of a source schema can be associated with a property or with a concept.
        - Every foreign key can be associated with an ontology relation.
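
The two mapping rules above suggest two kinds of records: attribute-to-term mappings and foreign-key-to-relation mappings. The Python sketch below stores them in an in-memory SQLite database. The table layout, the column names, and the sample rows are hypothetical; the presentation does not show the actual schema of the Schema Description Base, and the prototype used PostgreSQL rather than SQLite.

```python
import sqlite3
from typing import List, Tuple


def build_schema_description_base() -> sqlite3.Connection:
    """Create an in-memory base holding hypothetical mappings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE attribute_mapping (
            source TEXT, table_name TEXT, column_name TEXT, ontology_term TEXT
        );
        CREATE TABLE relation_mapping (
            source TEXT, table_name TEXT, foreign_key TEXT, ontology_relation TEXT
        );
        """
    )
    # Rule 1: every attribute maps to a property or a concept.
    conn.executemany(
        "INSERT INTO attribute_mapping VALUES (?, ?, ?, ?)",
        [
            ("S1", "book", "isbn", "Book.ISBN"),
            ("S1", "book", "price", "Book.price"),
            ("S1", "author", "name", "Author.Name"),
            ("S3", "edition", "isbn", "Book.ISBN"),
        ],
    )
    # Rule 2: every foreign key maps to an ontology relation.
    conn.execute(
        "INSERT INTO relation_mapping VALUES (?, ?, ?, ?)",
        ("S1", "book", "author_id", "Write"),
    )
    conn.commit()
    return conn


def columns_for(conn: sqlite3.Connection, ontology_term: str) -> List[Tuple[str, str, str]]:
    """Return (source, table, column) triples mapped to an ontology term."""
    cursor = conn.execute(
        "SELECT source, table_name, column_name FROM attribute_mapping "
        "WHERE ontology_term = ? ORDER BY source, table_name",
        (ontology_term,),
    )
    return cursor.fetchall()
```

A lookup such as `columns_for(conn, "Book.ISBN")` then tells the mediator which sources, tables, and columns carry a given ontology property.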

  14. Schema Description Base

  15. Query Process

  16. Query Process
      1. Analysis of the global request:
         • Extracting the different components of the global request.
         • Finding equivalent elements in the sources.
      2. Localization of the sources:
         • Select from the Schema Description Base the sources that provide a partial or complete answer to the global request.
         A relevant source contains:
         • all the attributes equivalent to the elements of the global request;
         • partial properties of the global request that can be joined with other attributes of other sources;
         • some of the properties of the global request.
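
The three kinds of relevant source listed above can be illustrated with set operations. The Python sketch below is one possible reading, not the authors' algorithm: it assumes the Schema Description Base has already been reduced to a dictionary from source name to the ontology properties it provides, and it treats a partial source as joinable when it shares an attribute with another source that supplies a missing property.

```python
from typing import Dict, Set


def localize(requested: Set[str], provided_by: Dict[str, Set[str]]) -> Dict[str, str]:
    """Classify each relevant source as complete, partial-joinable, or partial."""
    relevance: Dict[str, str] = {}
    for source, provided in provided_by.items():
        covered = requested & provided
        if not covered:
            continue  # the source answers no part of the request
        if covered == requested:
            relevance[source] = "complete"
            continue
        missing = requested - covered
        joinable = any(
            other != source
            and provided & other_provided  # a shared attribute to join on
            and missing & other_provided   # the other source fills a gap
            for other, other_provided in provided_by.items()
        )
        relevance[source] = "partial-joinable" if joinable else "partial"
    return relevance
```

With the attribute sets used as examples later in the deck, a source holding (Book name, Book author, City) is complete for a request on those three properties, while sources holding (Book name, Book author, ISBN) and (ISBN, City, Edition) are partial and joinable through ISBN.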

  17. Query Process
      3. Decomposition and re-writing of the global request into sub-queries
         Q: Decomposition (e.g. Book name, Book author, City)
         • Sub-queries: Q1(S1), ..., Q5(S5), ..., Qn(Sn), sent to Source 1, ..., Source 5, ..., Source n.
         • Example sub-query contents: (Book name, Book author, ISBN) and (ISBN, City, Edition).
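
A possible shape for the decomposition step is sketched below in Python. It assumes each source is described by a table name and a dictionary from ontology property to local column, and that the join attributes are known in advance; these structures, the table and column names, and the generated SQL are illustrative assumptions, not the rewriting algorithm of the prototype.

```python
from typing import Dict, List, Tuple

# source name -> (table name, {ontology property: local column})
SourceMappings = Dict[str, Tuple[str, Dict[str, str]]]


def decompose(
    requested: List[str], join_keys: List[str], mappings: SourceMappings
) -> Dict[str, str]:
    """Build one SQL sub-query per source that covers part of the request."""
    sub_queries: Dict[str, str] = {}
    for source, (table, columns) in mappings.items():
        if not any(prop in columns for prop in requested):
            continue  # nothing to ask this source
        # Ask for the requested properties plus any join attribute it holds.
        wanted = [prop for prop in requested + join_keys if prop in columns]
        select_list = ", ".join(f'{columns[prop]} AS "{prop}"' for prop in wanted)
        sub_queries[source] = f"SELECT {select_list} FROM {table}"
    return sub_queries
```

Renaming every column to its ontology property in the sub-query means the replies already share a vocabulary when they come back to the mediator.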

  18. Query Process
      4. Execution of sub-queries
         • Each sub-query is run by its local DBMS.
         • The wrapper translates the replies generated by the DBMS into a common format for the mediator.
      5. Re-composition of the replies:
         R: Recomposition R5(S5) ∪ (R1(S1) ∩ R3(S3)) (e.g. Book name, Book authors, City)
         • R1(S1): e.g. Book name, Book authors, ISBN
         • R3(S3): e.g. ISBN, City, Edition
         • R5(S5): e.g. Book name, Book authors, City
         • ...
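
The re-composition expression R5(S5) ∪ (R1(S1) ∩ R3(S3)) can be tried out on small in-memory replies. The Python sketch below reads the ∩ of partial replies as a join on their shared attribute (ISBN in the slide's example), followed by a projection onto the requested properties and a union with the complete reply; that reading, the function names, and the sample rows are assumptions.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


def join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Combine rows from two replies that agree on `key`."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def project(rows: List[Row], attributes: List[str]) -> Set[Tuple[str, ...]]:
    """Keep only the requested attributes, removing duplicates."""
    return {tuple(row[a] for a in attributes) for row in rows}


def recompose(
    r1: List[Row], r3: List[Row], r5: List[Row], attributes: List[str], key: str = "ISBN"
) -> List[Tuple[str, ...]]:
    """Union of the complete reply with the join of the two partial replies."""
    partial = project(join(r1, r3, key), attributes)
    complete = project(r5, attributes)
    return sorted(complete | partial)
```

Because the union works on projected tuples, a book reported both by the complete source and by the joined partial sources appears only once in the final reply.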

  19. Implementation
      [Figure: implementation architecture.]
      • Application level: Tomcat application server, Java, Query Processing Module, OWL ontology (Jena API), Schema Description Base (PostgreSQL).
      • Wrapper: web service (AXIS2).
      • Databases: DB1, DB2, ..., DBn (PostgreSQL + MySQL).

  20. Conclusion & Perspectives
      • Our approach is based on:
        - An intermediary-based approach (Mediator-Wrapper).
        - A shared ontology that respects the autonomy of every relational source and resolves some semantic conflicts.
        - A newly defined concept of "Schema Description Base" to find relevant sources.
        - A user query format based on ontology concepts and similar to SQL.
        - Specific algorithms for the Query Processing Module.
      • Prototype implemented.

  21. Conclusion & Perspectives
      • In our solution, the mapping is done manually for every relational source.
      • Our future work is about:
        - Automating the management of mappings.
        - Defining other criteria for joining sources.
        - Optimizing the query process.

  22. Thank You
