framework

Framework Information Integration: Making databases from various places work as one - PDF document



  1. XML
     Semi-structured Data
     Extensible Markup Language
     Document Type Definitions

     Framework
     1. Information Integration: Making databases from various places work as one.
     2. Semi-structured Data: A new data model designed to cope with problems of information integration.
     3. XML: A standard language for describing semi-structured data schemas and representing data.

     1. Information Integration
     Generally, databases in an enterprise have:
     - Several underlying database management systems: Oracle, MS SQL Server, DB2, Informix, Sybase (SQL Server), MS Access, etc.
     - Several underlying database schemas. Information in an employee table can contain:
       - Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
       - Employee Name, SSN, DOB, title, degree, createTime, createBy
       - Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy

     2. Semi-structured Data
     - A new data model designed to cope with problems of information integration
     - Accommodates different DBMSs
     - Integrates different schemas

     The Information-Integration Problem
     - Major bottleneck in enterprise application integration
     - For example:
       - Hewlett Packard split into HP and Agilent
       - HP bought Compaq
     - Need to integrate data from different sources

     3. XML
     XML: A standard language for describing semi-structured data schemas and representing data.
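As a rough illustration of why the three employee-table variants above are awkward to integrate, the sketch below compares their column sets. The source names (`source_a`, `source_b`, `source_c`) and the helper functions are hypothetical; only the column lists come from the slides.

```python
# Column sets copied from the three employee-table variants on the slide.
# The source names are made up for this sketch.
schemas = {
    "source_a": {"EmployeeName", "SSN", "DOB", "title",
                 "hrsPerWeek", "modifiedTime", "modifiedBy"},
    "source_b": {"EmployeeName", "SSN", "DOB", "title",
                 "degree", "createTime", "createBy"},
    "source_c": {"EmployeeName", "SSN", "DOB", "title",
                 "salary", "modifiedTime", "modifiedBy",
                 "createTime", "createBy"},
}


def shared_columns(schemas):
    """Columns that every source agrees on."""
    return set.intersection(*schemas.values())


def extra_columns(schemas):
    """Per source, the columns that not every source has."""
    common = shared_columns(schemas)
    return {name: cols - common for name, cols in schemas.items()}


common = shared_columns(schemas)
extras = extra_columns(schemas)

print("shared:", sorted(common))
for name in sorted(extras):
    print(name, "only/partly shared:", sorted(extras[name]))
```

Only the shared columns fit a single fixed relational schema without null padding; the per-source extras are the part a semi-structured model is meant to absorb.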

  2. The Information-Integration Problem
     - Related data exists in many places and could, in principle, work together.
     - But different databases differ in:
       1. Model (relational, object-oriented?).
       2. Schema (normalized/unnormalized?).
       3. Terminology: are consultants employees? Retirees? Subcontractors?
       4. Conventions (meters versus feet?).

     Example
     - Consider merger of three stores in a mall
     - There is some overlap in the products sold but the databases are different

     Example
     - Every store has a database.
     - One may use a relational DBMS; another keeps the menu in an MS-Word document.
     - One stores the phones of distributors, another does not.
     - One distinguishes products in one department and another doesn’t.
     - One counts inventory by number of items, another by cases.

     Two Approaches to Integration
     1. Warehousing
        - Makes a copy of the data
        - More developed of the two
     2. Mediation
        - Creates a view of the data
        - Newer and less developed

     Warehousing
     - Make copies of the data sources at a central site and transform it to a common schema.
     - Reconstruct data daily/weekly
       - Do not try to keep it more up-to-date than that.
     - Pro:
       - Very well-developed, and several commercial tools are available
     - Con:
       - Data can be old since updates are expensive

     Mediation
     - Create a view of all sources, as if they were integrated.
     - Answer a view query by translating it to terminology of the sources and querying them.
     - Pro:
       - Current data
     - Con:
       - Can be slow
       - Availability of tools
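The contrast between warehousing and mediation can be sketched in a few lines. Everything here is an assumption made for illustration: the two store dictionaries, the wrapper functions, and the conversion factor `ITEMS_PER_CASE` are not from the slides. The sketch only shows the stale-copy versus live-query trade-off described above.

```python
ITEMS_PER_CASE = 12  # assumed conversion factor, not from the slides

# Two hypothetical store databases with different conventions.
store_a = {"Pepsi": {"items": 24}}                        # counts single items
store_b = {"Pepsi": {"cases": 3}, "Sobe": {"cases": 1}}   # counts cases


def wrap_a(db):
    """Wrapper for store A: already in the common schema (product -> items)."""
    return {name: row["items"] for name, row in db.items()}


def wrap_b(db):
    """Wrapper for store B: convert cases to items."""
    return {name: row["cases"] * ITEMS_PER_CASE for name, row in db.items()}


SOURCES = [(wrap_a, store_a), (wrap_b, store_b)]


def build_warehouse():
    """Warehousing: copy every source into one table in the common schema."""
    warehouse = {}
    for wrap, db in SOURCES:
        for name, items in wrap(db).items():
            warehouse[name] = warehouse.get(name, 0) + items
    return warehouse


def total_items(product):
    """Mediation: translate and query every source at question time."""
    return sum(wrap(db).get(product, 0) for wrap, db in SOURCES)


warehouse = build_warehouse()     # snapshot taken at load time
store_a["Pepsi"]["items"] = 30    # store A changes after the snapshot

stale = warehouse["Pepsi"]        # warehouse still holds the old total
fresh = total_items("Pepsi")      # mediator sees the current total
print(stale, fresh)
```

The warehouse answers from its copy without touching the sources, so it is fast but only as current as the last rebuild; the mediator pays the cost of visiting every source on each query.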

  3. Warehouse Diagram
     [Diagram. Labels: Warehouse; Wrapper, Wrapper; Source 1, Source 2.]

     A Mediator
     [Diagram. Labels: User query, Result; Mediator; Query/Result between the mediator and each Wrapper; Query/Result between each Wrapper and its source; Source 1, Source 2.]

     Semi-structured: Motivation
     - Most effective approach to Information Integration:
       - Semi-structured Data Model
       - or Semi-structured Objects

     Semi-structured: Motivation
     - Main limitation of Object-Oriented Models:
       - Object Models are Strongly Typed
       - Objects of a class have one structure only
     - Semi-structured approach solves this problem

     Semi-structured Data
     - Purpose:
       - Represent data from independent sources more flexibly than
         - either relational
         - or object-oriented models.

     Semi-structured Data
     - Each object has a class of its own, and its properties are defined by whatever labels are attached to that object
     - Properties mean attributes, relationships, methods, etc.
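The "one structure per class" limitation can be made concrete with a small sketch. The `Soda` class, the helper names, and the sample values are hypothetical; the point is only that a fixed class rejects objects with missing or extra properties, while a label-to-values mapping accepts whatever labels an object carries.

```python
from dataclasses import dataclass


@dataclass
class Soda:
    """Strongly typed: every Soda must have exactly these two fields."""
    name: str
    manf: str


def typed_or_none(**fields):
    """Try to build a Soda; return None if the fields do not fit the class."""
    try:
        return Soda(**fields)
    except TypeError:
        return None


# Semi-structured objects: each one is just its own set of labels.
# Values are lists so a label may occur any number of times.
# The sample contents are assumptions made for this sketch.
pepsi = {
    "name": ["Pepsi"],
    "manf": ["PepsiCo"],
    "prize": [{"year": [2003], "award": ["BestSeller"]}],
}
sobe = {"name": ["Sobe"]}


def labels(obj):
    """The labels attached to one semi-structured object."""
    return sorted(obj)


print(typed_or_none(name="Pepsi", manf="PepsiCo"))
print(typed_or_none(name="Sobe"))  # missing field -> None
print(typed_or_none(name="Pepsi", manf="PepsiCo", prize="BestSeller"))  # extra -> None
print(labels(pepsi), labels(sobe))
```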

  4. Semi-structured Data
     - Think of objects, but with the type of each object its own business, not that of its “class.”
     - Labels to indicate meaning of substructures.

     Semi-structured Graphs
     - Easy to think of semi-structured data as graphs
     - Nodes = objects.
     - Labels on arcs:
       - attributes leading to a leaf node
       - relationships leading to another node.

     Example: Data Graph
     [Figure: data graph. The root has two arcs labeled soda and one labeled rest. Other arc labels: manf, name, prize, year, award, sellsAt, addr. Leaf values: PepsiCo, Pepsi, Sobe, 2003, BestSeller, KFC, Main St.]
     - Root object represents the entire DB.
     - Notice a new kind of data.
     - The soda object for Pepsi (arc-in called soda; arc-out called name to Pepsi)
     - The restaurant object for KFC (arc-in called rest; arc-out labeled name to KFC)

     Semi-structured Graphs
     - Often look like trees, but are not.
     - Atomic values at leaf nodes
       - nodes with no arcs out.
     - Flexibility: no restriction on:
       - Labels out of a node.
       - Number of successors with a given label.

     XML
     - XML = Extensible Markup Language.
     - While HTML uses tags for formatting (e.g., “italic”), XML uses tags for semantics (e.g., “this is an address”).
     - Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.

     Well-Formed and Valid XML
     - Well-Formed XML allows you to invent your own tags.
       - Similar to labels in semi-structured data graph.
     - Valid XML involves a DTD (Document Type Definition), which
       - gives a grammar for the use of labels
       - limits the set of labels out of a node
       - limits the order and number of times a label occurs
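A data graph of this kind can be sketched as nodes with labeled arcs. The `Node` class and `in_degree` helper are hypothetical, and the exact wiring of the slide's figure is not fully recoverable from the text, so the arcs below (in particular which soda carries the `sellsAt` arc) are assumptions; the prize substructure is left out for the same reason.

```python
class Node:
    """An object in the data graph; leaves carry an atomic value."""

    def __init__(self, value=None):
        self.value = value   # atomic value for leaf nodes, else None
        self.arcs = []       # list of (label, target Node) pairs

    def add(self, label, target):
        """Attach a labeled arc to target and return the target."""
        self.arcs.append((label, target))
        return target

    def follow(self, label):
        """All successors reached by arcs with the given label."""
        return [target for arc_label, target in self.arcs if arc_label == label]

    def is_leaf(self):
        return not self.arcs


root = Node()                      # root object represents the entire DB
pepsi = root.add("soda", Node())
sobe = root.add("soda", Node())    # two successors with the same label
kfc = root.add("rest", Node())

pepsi.add("name", Node("Pepsi"))
pepsi.add("manf", Node("PepsiCo"))
sobe.add("name", Node("Sobe"))
kfc.add("name", Node("KFC"))
kfc.add("addr", Node("Main St"))
pepsi.add("sellsAt", kfc)          # assumed wiring: a relationship arc


def in_degree(start):
    """Count incoming arcs per node (keyed by id) reachable from start."""
    counts, seen, stack = {}, set(), [start]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        for _, target in node.arcs:
            counts[id(target)] = counts.get(id(target), 0) + 1
            stack.append(target)
    return counts


degrees = in_degree(root)
is_tree = all(count == 1 for count in degrees.values())
print("KFC has", degrees[id(kfc)], "incoming arcs; tree:", is_tree)
```

The restaurant node is reached both from the root and from a soda, which is why such graphs often look like trees but are not.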

  5. Well-Formed XML: Header
     - Start the document with a declaration, surrounded by <? … ?>.
     - Normal declaration for Well-Formed XML is:
       <?xml version="1.0" standalone="yes"?>
     - version indicates the XML version number
     - standalone="yes" means no DTD is provided.

     Well-Formed XML: Body
     - Body of document is a root tag surrounding nested tags.
     - Body can include:
       - several properly matching tags (as in HTML structure)
     - Root tag can
       - have a special meaning such as document type
       - or can be generic

     Tags
     - Tags, as in HTML, are normally matched pairs, as <BLAH> … </BLAH>.
     - Tags may be nested arbitrarily.
     - Some tags require no matching ender, such as <P> in HTML; XML permits these only in the self-closing form (e.g., <P/>).
       - however, we will not use these in examples

     Example: Well-Formed XML
       <?xml version="1.0" standalone="yes"?>
       <RESTS>
         <REST><NAME>Taco Bell</NAME>
           <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
           <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
         </REST>
         <REST> …
         </REST>
         …
       </RESTS>
     - Root tag RESTS surrounds the entire document
     - One of several nested REST tags representing information about a single REST
     - <NAME> tag specifies the REST name
     - <SODA> tags have the name and price of each soda nested in <NAME> and <PRICE> tags.

     XML and Semi-structured Data
     - Well-formed XML with nested tags is exactly the same idea as trees of semi-structured data.
     - Tags are the labels on edges
     - Nodes represent data between matching tags
     - Parent-child relationship is immediate nesting in XML

     XML and Semi-structured Data
     - Semi-structured approach allows for non-tree structures
     - We shall see that XML also enables non-tree structures, as does the semi-structured data model.
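The correspondence between nested tags and labeled tree edges can be checked with Python's standard `xml.etree.ElementTree` parser. This is a minimal sketch: the document is the Taco Bell fragment from the example, with the elided REST entries omitted and the declaration written in the lowercase form that XML parsers accept. The helper names `is_well_formed` and `leaf_paths` are made up for the sketch.

```python
import xml.etree.ElementTree as ET

# The first REST from the slide's example; elided entries are left out.
doc = """<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST><NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
</RESTS>"""


def is_well_formed(text):
    """True if the parser accepts the text, i.e. all tags match and nest."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def leaf_paths(elem, prefix=""):
    """Yield (path of tag labels, text) for every leaf element."""
    path = f"{prefix}/{elem.tag}"
    if len(elem) == 0:                    # no nested tags: a leaf node
        yield path, (elem.text or "").strip()
    for child in elem:                    # immediate nesting = parent-child
        yield from leaf_paths(child, path)


root = ET.fromstring(doc)
paths = list(leaf_paths(root))
for path, text in paths:
    print(path, "->", text)

rest = root.find("REST")
menu = {soda.findtext("NAME"): float(soda.findtext("PRICE"))
        for soda in rest.findall("SODA")}
print(rest.findtext("NAME"), menu)
```

Each printed path is the sequence of edge labels from the root to a leaf, and the text between matching tags is the atomic value at that leaf.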
