Plan 1. Information integration: important new - PDF document


  1. Plan
     1. Information integration: important new application that motivates what follows.
     2. Semistructured data: a new data model designed to cope with problems of information integration.
     3. XML: a new Web standard that is essentially semistructured data.
     4. XQUERY: an emerging standard query language for XML data.

  2. Information Integration
     Problem: related data exists in many places. The sources talk about the same things, but differ in model, schema, and conventions (e.g., terminology).
     Example
     In the real world, every bar has its own database.
     • Some may have relations like beer-price; others have an MS-Word file from which the menu is printed.
     • Some keep phones of manufacturers but not addresses.
     • Some distinguish beers and ales; others do not.

  3. Two approaches
     1. Warehousing: Make copies of the information from each data source at a central site.
        ✦ Reconstruct data daily/weekly/monthly, but do not try to keep it up-to-date.
     2. Mediation: Create a view of all information, but do not make copies.
        ✦ Answer queries by sending appropriate queries to sources.

  4. Warehousing
     [Figure: the user sends a query to the Warehouse and receives a result; the Warehouse is fed by a Combiner, which draws on two Wrappers, one over DB1 and one over DB2.]

  5. Mediation
     [Figure: a query goes to the Mediator and a result comes back; the Mediator exchanges queries and results with two Wrappers, and each Wrapper exchanges queries and results with its source, DB1 or DB2.]
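The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources `DB1` and `DB2`, their schemas, the second bar, and all function and class names are hypothetical and do not come from the slides. Each wrapper translates one source into a common schema; the warehouse copies everything on `refresh()`, while the mediator stores nothing and forwards each query.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]

# Two hypothetical sources that describe the same things with different
# schemas and conventions (tuples with dollar prices vs. dicts with cents).
DB1 = [("Joe's Bar", "Bud", 2.50), ("Joe's Bar", "Miller", 3.00)]
DB2 = [{"pub": "Sue's Bar", "drink": "Bud", "cost_cents": 275}]


def wrapper1(beer: Optional[str] = None) -> List[Row]:
    """Query DB1 and translate its tuples into the common schema."""
    return [{"bar": b, "beer": name, "price": p}
            for (b, name, p) in DB1 if beer is None or name == beer]


def wrapper2(beer: Optional[str] = None) -> List[Row]:
    """Query DB2 and translate its records into the common schema."""
    return [{"bar": d["pub"], "beer": d["drink"], "price": d["cost_cents"] / 100}
            for d in DB2 if beer is None or d["drink"] == beer]


WRAPPERS = [wrapper1, wrapper2]


class Warehouse:
    """Warehousing: keep a central copy, rebuilt only when refresh() runs."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def refresh(self) -> None:
        # The combiner step: pull everything from every wrapper.
        self.rows = [row for w in WRAPPERS for row in w()]

    def query(self, beer: str) -> List[Row]:
        return [r for r in self.rows if r["beer"] == beer]


class Mediator:
    """Mediation: store nothing; forward each query to the sources."""

    def query(self, beer: str) -> List[Row]:
        return [row for w in WRAPPERS for row in w(beer)]
```

If a source changes after `refresh()`, the warehouse keeps answering from its stale copy until the next rebuild, whereas the mediator sees the change on the next query. That is the trade-off the two slides describe.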

  6. Semistructured Data
     • A different kind of data model, more suited to information-integration applications than either relational or OO.
       ✦ Think of "objects," but with the type of an object its own business rather than the business of the class to which it belongs.
       ✦ Allows information from several sources, with related but different properties, to be fit together in one whole.
     • Major application: XML documents.

  7. Graph Representation of Semistructured Data
     • Nodes = objects.
     • Nodes connected in a general rooted graph structure.
     • Labels on arcs.
     • Atomic values on leaf nodes.
     • Big deal: no restriction on labels (roughly = attributes).
       ✦ Zero, one, or many children of a given label type are all OK.
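A minimal Python sketch of this graph model follows. The `Node` class and its method names are hypothetical, and the particular arcs built here (which beer has which `manf` or `prize`) are an assumption chosen for illustration, using labels and values from the bar and beer domain. The point it shows is that each node carries its own labeled arcs, so nothing forces two `beer` nodes to have the same properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    value: Optional[str] = None  # atomic value, set only on leaf nodes
    arcs: List[Tuple[str, "Node"]] = field(default_factory=list)  # (label, child)

    def add(self, label: str, child: "Node") -> "Node":
        """Attach child under the given arc label and return the child."""
        self.arcs.append((label, child))
        return child

    def children(self, label: str) -> List["Node"]:
        """All children reached by arcs with this label (zero, one, or many)."""
        return [child for (arc_label, child) in self.arcs if arc_label == label]


# Assumed structure: a root with two beer children that differ in shape.
root = Node()
bud = root.add("beer", Node())
bud.add("name", Node("Bud"))
bud.add("manf", Node("A.B."))

mlob = root.add("beer", Node())
mlob.add("name", Node("M'lob"))
prize = mlob.add("prize", Node())
prize.add("year", Node("1995"))
prize.add("award", Node("Gold"))
```

Here `root.children("beer")` has two entries, `bud.children("prize")` is empty, and `mlob.children("prize")` has one, which is the "zero, one, or many" flexibility in miniature.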

  8. Example
     [Figure: a rooted graph with arcs labeled bar, beer, manf, name, prize, year, award, servedAt, and addr, and leaf values Bud, A.B., M'lob, 1995, Gold, Joe's, and Maple.]

  9. XML (Extensible Markup Language)
     • HTML uses tags for formatting (e.g., "italic"). XML uses tags for semantics (e.g., "this is an address").
     • Two modes:
       1. Well-formed XML allows you to invent your own tags, much like labels in semistructured data.
       2. Valid XML involves a DTD (Document Type Definition) that tells the labels and gives a grammar for how they may be nested.

  10. Well-Formed XML
      1. Declaration = <? ... ?>.
         ✦ Normal declaration is <? XML VERSION = "1.0" STANDALONE = "yes" ?>
         ✦ "Standalone" means that there is no DTD specified.
      2. Root tag surrounds the entire balance of the document.
         ✦ <FOO> is balanced by </FOO>, as in HTML.
      3. Any balanced structure of tags OK.
         ✦ Option of tags that don't require balance, like <P> in HTML.

  11. Example
      <?XML VERSION = "1.0" STANDALONE = "yes"?>
      <BARS>
        <BAR><NAME>Joe's Bar</NAME>
          <BEER><NAME>Bud</NAME>
            <PRICE>2.50</PRICE></BEER>
          <BEER><NAME>Miller</NAME>
            <PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> ...
      </BARS>
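One way to see what "well-formed" means in practice is to hand a document to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper names are hypothetical. Two assumptions are worth flagging: the declaration is written in the lowercase form `<?xml version="1.0" standalone="yes"?>` that XML 1.0 specifies, and the elided second `<BAR> ...` is left out because an unbalanced tag would not parse.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text, i.e. every tag is balanced."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def beer_prices(text: str) -> Dict[str, float]:
    """Map each beer NAME to its PRICE, reading only the tag structure."""
    root = ET.fromstring(text)
    return {beer.findtext("NAME"): float(beer.findtext("PRICE"))
            for beer in root.iter("BEER")}
```

With this document, `beer_prices(DOC)` yields the two beers and their prices, while a fragment such as `<BARS><BAR></BARS>` is rejected because `<BAR>` is never closed.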

  12. Document Type Definitions (DTD)
      • Essentially a grammar describing the legal nesting of tags.
      • Intention is that DTD's will be standards for a domain, used by everyone preparing or using data in that domain.
        ✦ Example: a DTD for describing protein structure; a DTD for describing bar menus, etc.
      Gross Structure of a DTD
      <!DOCTYPE root tag [
        <!ELEMENT name (components)>
        more elements
      ]>

  13. Elements of a DTD
      An element is a name (its tag) and a parenthesized description of tags within an element.
      ✦ Special case: (#PCDATA) after an element name means it is text.
      Example
      <!DOCTYPE Bars [
        <!ELEMENT BARS (BAR*)>
        <!ELEMENT BAR (NAME, BEER+)>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT BEER (NAME, PRICE)>
        <!ELEMENT PRICE (#PCDATA)>
      ]>

  14. Components
      • Each element name is a tag.
      • Its components are the tags that appear nested within, in the order specified.
      • Multiplicity of a tag is controlled by:
        a) * = zero or more of.
        b) + = one or more of.
        c) ? = zero or one of.
      • In addition, | = "or."
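Because the component operators `*`, `+`, `?`, and `|` mean the same thing as in regular expressions, a content model can be checked by translating it into a regex over the sequence of child tags. The sketch below does that for the Bars element declarations. It is a toy under stated assumptions, not a DTD validator: the `DTD` dictionary, `model_to_regex`, and `conforms` are hypothetical names, and attributes, mixed content, and entity handling are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied from the Bars DTD example.
DTD = {
    "BARS": "(BAR*)",
    "BAR": "(NAME, BEER+)",
    "NAME": "(#PCDATA)",
    "BEER": "(NAME, PRICE)",
    "PRICE": "(#PCDATA)",
}


def model_to_regex(model):
    """Translate a content model into a regex over 'TAG TAG ... ' strings."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[()|*+?,]", model):
        if tok == ",":
            continue                      # sequence is plain concatenation
        elif tok in "()|*+?":
            parts.append(tok)             # same meaning as in a regex
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # tag name + space
    return re.compile("".join(parts))


def conforms(elem):
    """Check elem and all its descendants against the DTD dictionary."""
    model = DTD.get(elem.tag)
    if model is None:
        return False                      # undeclared element
    tags = [child.tag for child in elem]
    if model == "(#PCDATA)":
        ok = not tags                     # text only, no nested elements
    else:
        sequence = "".join(tag + " " for tag in tags)
        ok = model_to_regex(model).fullmatch(sequence) is not None
    return ok and all(conforms(child) for child in elem)
```

For example, a `BAR` with a `NAME` but no `BEER` fails the `(NAME, BEER+)` model, and a `BEER` whose `PRICE` comes before its `NAME` fails `(NAME, PRICE)` because order matters.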

  15. Using a DTD
      1. Set STANDALONE = "no".
      2. Either
         a) Include the DTD as a preamble, or
         b) Follow the XML tag by a DOCTYPE declaration with the root tag, the keyword SYSTEM, and a file where the DTD can be found.

  16. Example of (a)
      <?XML VERSION = "1.0" STANDALONE = "no"?>
      <!DOCTYPE Bars [
        <!ELEMENT BARS (BAR*)>
        <!ELEMENT BAR (NAME, BEER+)>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT BEER (NAME, PRICE)>
        <!ELEMENT PRICE (#PCDATA)>
      ]>
      <BARS>
        <BAR><NAME>Joe's Bar</NAME>
          <BEER><NAME>Bud</NAME>
            <PRICE>2.50</PRICE></BEER>
          <BEER><NAME>Miller</NAME>
            <PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> ...
      </BARS>

  17. Example of (b)
      Suppose our bars DTD is in file bar.dtd.
      <?XML VERSION = "1.0" STANDALONE = "no"?>
      <!DOCTYPE Bars SYSTEM "bar.dtd">
      <BARS>
        <BAR><NAME>Joe's Bar</NAME>
          <BEER><NAME>Bud</NAME>
            <PRICE>2.50</PRICE></BEER>
          <BEER><NAME>Miller</NAME>
            <PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> ...
      </BARS>

  18. Attribute Lists
      Opening tags can have "arguments" that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML.
      • Keyword !ATTLIST introduces a list of attributes and their data types.
      Example
      <!ELEMENT BAR (NAME, BEER*)>
      <!ATTLIST BAR type = "sushi"|"sports"|"other">
      • Bar objects can have a (bar) type, and the value of that type is limited to the three strings shown.
      • Example of use: <BAR type = "sushi"> . . . </BAR>
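The restriction on the `type` attribute can be mimicked with a small check in Python. This is a sketch, not a validator: `ALLOWED_BAR_TYPES` and `bar_type_ok` are hypothetical names, and the attribute is treated as optional, following the wording that bar objects "can have" a type. For reference, XML 1.0 as standardized writes an enumerated attribute as `<!ATTLIST BAR type (sushi | sports | other) #IMPLIED>`.

```python
import xml.etree.ElementTree as ET

# The three strings permitted for the type attribute of BAR.
ALLOWED_BAR_TYPES = {"sushi", "sports", "other"}


def bar_type_ok(bar: ET.Element) -> bool:
    """A BAR passes if it has no type attribute or one of the allowed values."""
    bar_type = bar.get("type")  # None when the attribute is absent
    return bar_type is None or bar_type in ALLOWED_BAR_TYPES


def invalid_bars(text: str) -> list:
    """Return the type values of every BAR that violates the restriction."""
    root = ET.fromstring(text)
    return [bar.get("type") for bar in root.iter("BAR") if not bar_type_ok(bar)]
```

A document containing `<BAR type="sushi">` and an untyped `<BAR>` passes, while `<BAR type="disco">` is reported, because "disco" is not among the three declared strings.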

  19. ID's and IDREF's
      These are pointers from one object to another, analogous to NAME = "foo" and HREF = "#foo" in HTML.
      • Allows the structure of an XML document to be a general graph, rather than just a tree.
      • An attribute of type ID can be used to give the object (string between opening and closing tags) a unique string identifier.
      • An attribute of type IDREF refers to some object by its identifier.
        ✦ Also IDREFS to allow multiple object references within one tag.

  20. Example
      Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.
      <!DOCTYPE Bars [
        <!ELEMENT BARS (BAR*, MANF*)>
        <!ELEMENT BAR (NAME, BEER+)>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT MANF (ADDR)>
        <!ATTLIST MANF (name ID)>
        <!ELEMENT ADDR (#PCDATA)>
        <!ELEMENT BEER (NAME, PRICE)>
        <!ATTLIST BEER (manf = IDREF)>
        <!ELEMENT PRICE (#PCDATA)>
      ]>
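Following an IDREF amounts to a dictionary lookup on the ID attribute. The Python sketch below builds a document instance that fits this DTD and resolves each beer's `manf` reference. The instance itself is hypothetical: the identifiers `ab` and `mc` and both addresses are made up for illustration, as are the function names. Standard parsers such as `xml.etree.ElementTree` do not resolve IDREFs on their own, so the lookup table is built by hand.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical instance: identifiers and addresses are illustrative only.
DOC = """<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER manf="ab"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER manf="mc"><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <MANF name="ab"><ADDR>1 Example Street</ADDR></MANF>
  <MANF name="mc"><ADDR>2 Sample Avenue</ADDR></MANF>
</BARS>"""


def index_by_id(root: ET.Element) -> Dict[str, ET.Element]:
    """Map each MANF's ID attribute (name) to its element."""
    return {manf.get("name"): manf for manf in root.iter("MANF")}


def manf_address(root: ET.Element, beer_name: str) -> Optional[str]:
    """Follow the manf IDREF of the named beer to the manufacturer's ADDR."""
    ids = index_by_id(root)
    for beer in root.iter("BEER"):
        if beer.findtext("NAME") == beer_name:
            target = ids.get(beer.get("manf"))  # None if the reference dangles
            return None if target is None else target.findtext("ADDR")
    return None
```

The references cut across the tree: both `BEER` elements sit inside a `BAR`, yet each points at a `MANF` elsewhere in the document, which is what turns the tree into a general graph.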

  21. XQUERY
      Emerging standard for querying XML documents. Basic form:
      FOR <variables ranging over sets of elements>
      WHERE <condition>
      RETURN <set of elements>;
      • Sets of elements described by paths, consisting of:
        1. URL, if necessary.
        2. Element names forming a path in the semistructured data graph, e.g., //BAR/NAME = "start at any BAR node and go to a NAME child."
        3. Ending condition of the form [<condition about subelements, attributes (preceded by @), and values>].
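Python's standard library has no XQuery engine, but `xml.etree.ElementTree` supports a small XPath subset that is enough to illustrate paths and ending conditions, and a list comprehension can stand in for the FOR / WHERE / RETURN shape. The sketch below is an analogy only, not XQuery. The second bar, the `type` attributes, and the function names are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical data: the second bar and the type attributes are made up.
DOC = """<BARS>
  <BAR type="sports"><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR type="sushi"><NAME>Sue's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.75</PRICE></BEER>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)

# Path //BAR/NAME: start at any BAR node and go to a NAME child.
bar_names = [n.text for n in root.findall(".//BAR/NAME")]

# Ending condition on an attribute (preceded by @).
sushi_names = [n.text for n in root.findall(".//BAR[@type='sushi']/NAME")]

# Ending condition on a subelement's value.
bud_prices = [p.text for p in root.findall(".//BEER[NAME='Bud']/PRICE")]


def cheap_beers(limit: float) -> List[Tuple[str, str]]:
    """Analogue of FOR bar, beer WHERE price < limit RETURN (bar, beer)."""
    return [(bar.findtext("NAME"), beer.findtext("NAME"))
            for bar in root.findall(".//BAR")            # FOR
            for beer in bar.findall("BEER")
            if float(beer.findtext("PRICE")) < limit]    # WHERE
```

Note that `.//BAR/NAME` returns only the names of bars, not of beers, because the path requires `NAME` to be a direct child of a `BAR` node.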
