Plan 1. Information integration: important new - - PDF document

slide-1
SLIDE 1 Plan
1. Information integration: important new application that motivates what follows.
2. Semistructured data: a new data model designed to cope with problems of information integration.
3. XML: a new Web standard that is essentially semistructured data.
4. XQUERY: an emerging standard query language for XML data.
slide-2
SLIDE 2 Information Integration
Problem: related data exists in many places. They talk about the same things, but differ in model, schema, conventions (e.g., terminology).
Example
In the real world, every bar has its own database.
  • Some may have relations like beer-price; others have an MS-Word file from which the menu is printed.
  • Some keep phones of manufacturers but not addresses.
  • Some distinguish beers and ales; others do not.
slide-3
SLIDE 3 Two approaches
1. Warehousing: Make copies of the information at each data source in one central place.
      • Reconstruct data daily/weekly/monthly, but do not try to keep it up-to-date.
2. Mediation: Create a view of all information, but do not make copies.
      • Answer queries by sending appropriate queries to sources.
slide-4
SLIDE 4 Warehousing
[Figure: the user sends a query to the Warehouse and receives a result. A Combiner fills the Warehouse from two Wrappers, one over DB1 and one over DB2.]
slide-5
SLIDE 5 Mediation
[Figure: the user sends a query to the Mediator and receives a result. The Mediator sends queries to two Wrappers; each Wrapper queries its source (DB1 or DB2) and passes the result back.]
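The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources, their schemas, and every function name below are invented, and a real mediator would push the query down to each source rather than filter wrapped rows as done here.

```python
# Two hypothetical sources with different schemas (invented for illustration).
DB1 = [{"bar": "Joe's Bar", "beer": "Bud", "price": 2.50}]
DB2 = [("Homma's", "Sapporo", "4.00")]  # tuples, with the price stored as text


def wrap_db1(rows):
    """Wrapper: translate DB1 rows into the common (bar, beer, price) form."""
    return [(r["bar"], r["beer"], float(r["price"])) for r in rows]


def wrap_db2(rows):
    """Wrapper: translate DB2 tuples into the common form."""
    return [(bar, beer, float(price)) for bar, beer, price in rows]


def build_warehouse():
    """Warehousing: the combiner copies wrapped data from every source."""
    return wrap_db1(DB1) + wrap_db2(DB2)


def mediate(beer):
    """Mediation: keep no copy; consult the sources at query time."""
    answers = []
    for wrapper, source in ((wrap_db1, DB1), (wrap_db2, DB2)):
        # Simplification: filter after wrapping instead of sending a subquery.
        answers += [t for t in wrapper(source) if t[1] == beer]
    return answers


warehouse = build_warehouse()  # snapshot taken before the update below
DB1.append({"bar": "Joe's Bar", "beer": "Miller", "price": 3.00})

stale = [t for t in warehouse if t[1] == "Miller"]  # warehouse predates the update
fresh = mediate("Miller")                           # mediator sees current data
print(stale, fresh)
```

The warehouse answers from its snapshot and so misses the new row until it is rebuilt, while the mediator pays the cost of contacting every source on each query.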
slide-6
SLIDE 6 Semistructured Data
  • A different kind of data model, more suited to information-integration applications than either relational or OO.
      • Think of "objects," but with the type of an object its own business rather than the business of the class to which it belongs.
      • Allows information from several sources, with related but different properties, to be fit together in one whole.
  • Major application: XML documents.
slide-7
SLIDE 7 Graph Representation of Semistructured Data
  • Nodes = objects.
  • Nodes connected in a general rooted graph structure.
  • Labels on arcs.
  • Atomic values on leaf nodes.
  • Big deal: no restriction on labels (roughly = attributes).
      • Zero, one, or many children of a given label type are all OK.
slide-8
SLIDE 8 Example
[Figure: a rooted graph. Arc labels: bar, beer (twice), name, addr, manf, servedAt, prize, year, award. Leaf values: Joe's, Maple, Bud, M'lob, A.B., 1995, Gold.]
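One way to hold such a graph in memory is a list of labeled arcs. The sketch below is an assumption-laden illustration: the node ids (bar1, b1, b2, p1) are invented, and the exact arcs are one plausible reading of the example figure, not a transcription of it.

```python
from collections import defaultdict

# Arcs are (parent, label, child). Strings that never appear as a parent
# are atomic leaf values. Node ids and arc placement are assumptions.
ARCS = [
    ("root", "bar", "bar1"),
    ("root", "beer", "b1"),
    ("root", "beer", "b2"),
    ("bar1", "name", "Joe's"),
    ("bar1", "addr", "Maple"),
    ("b1", "name", "Bud"),
    ("b1", "manf", "A.B."),
    ("b1", "servedAt", "bar1"),  # points at an existing node: a graph, not a tree
    ("b2", "name", "M'lob"),
    ("b2", "manf", "A.B."),
    ("b2", "prize", "p1"),
    ("p1", "year", "1995"),
    ("p1", "award", "Gold"),
]

children = defaultdict(list)
for parent, label, child in ARCS:
    children[(parent, label)].append(child)


def follow(node, label):
    """Return all children of `node` reached by arcs carrying `label`."""
    return children.get((node, label), [])


# Zero, one, or many children per label are all acceptable.
print(follow("root", "beer"))  # many
print(follow("b2", "manf"))    # one
print(follow("b1", "prize"))   # zero: no schema forces a prize to exist
```

Nothing declares in advance which labels a node may carry, which is the point of the model: b2 has a prize arc and b1 simply does not.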
slide-9
SLIDE 9 XML (Extensible Markup Language)
HTML uses tags for formatting (e.g., "italic"). XML uses tags for semantics (e.g., "this is an address").
  • Two modes:
1. Well-formed XML allows you to invent your own tags, much like labels in semistructured data.
2. Valid XML involves a DTD (Document Type Definition) that tells the labels and gives a grammar for how they may be nested.
slide-10
SLIDE 10 Well-Formed XML
1. Declaration = <? ... ?>.
      • Normal declaration is <?xml version="1.0" standalone="yes"?>
      • "Standalone" means that there is no DTD specified.
2. Root tag surrounds the entire balance of the document.
      • <FOO> is balanced by </FOO>, as in HTML.
3. Any balanced structure of tags OK.
      • Option of empty-element tags such as <FOO/>, which need no separate closing tag (the XML counterpart of an unbalanced <P> in HTML).
slide-11
SLIDE 11 Example
<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>
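Well-formedness is exactly what a non-validating parser checks. As a minimal sketch, Python's standard xml.etree.ElementTree can parse the example; the elided second BAR is left out, and the helper name is_well_formed is an invention for this illustration.

```python
import xml.etree.ElementTree as ET

# The slide's example without the elided second <BAR>.
DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)  # raises ET.ParseError if the text is not well-formed
prices = {
    beer.findtext("NAME"): float(beer.findtext("PRICE"))
    for beer in root.iter("BEER")
}
print(root.tag, prices)


def is_well_formed(text):
    """True if `text` parses: a single root element and balanced tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<FOO><P></FOO>"))   # unbalanced <P>
print(is_well_formed("<FOO><P/></FOO>"))  # empty-element tag
```

No DTD is consulted here, so any tag names are accepted as long as they nest properly.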
slide-12
SLIDE 12 Document Type Definitions (DTD)
Essentially a grammar describing the legal nesting of tags.
  • Intention is that DTD's will be standards for a domain, used by everyone preparing or using data in that domain.
      • Example: a DTD for describing protein structure; a DTD for describing bar menus, etc.
Gross Structure of a DTD
<!DOCTYPE root tag [
  <!ELEMENT name (components)>
  more elements
]>
slide-13
SLIDE 13 Elements of a DTD
An element is a name (its tag) and a parenthesized description of tags within an element.
      • Special case: (#PCDATA) after an element name means it is text.
Example
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
slide-14
SLIDE 14 Components
  • Each element name is a tag.
  • Its components are the tags that appear nested within, in the order specified.
  • Multiplicity of a tag is controlled by:
a) * = zero or more of.
b) + = one or more of.
c) ? = zero or one of.
  • In addition, | = "or."
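The multiplicity operators behave like their regular-expression namesakes, so a content model can be checked by translating it to a regex over the sequence of child tags. The sketch below assumes that reading; model_to_regex and conforms are hypothetical helpers, and the translation handles element content only, not (#PCDATA).

```python
import re


def model_to_regex(model):
    """Translate a DTD content model such as "(NAME, BEER+)" into a regex.

    Each tag name becomes a group matching the name plus one space. The
    operators * + ? | and the parentheses carry over unchanged; the comma
    (sequence) turns into plain concatenation.
    """
    def token(match):
        name = match.group(1)
        return f"(?:{name} )" if name else ""  # commas and blanks vanish

    return re.compile(re.sub(r"([A-Za-z_][\w.-]*)|[,\s]+", token, model))


def conforms(child_tags, model):
    """True if the ordered list of child tags fits the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return model_to_regex(model).fullmatch(sequence) is not None


print(conforms(["NAME", "BEER", "BEER"], "(NAME, BEER+)"))  # + allows several
print(conforms(["NAME"], "(NAME, BEER+)"))                  # + needs at least one
print(conforms([], "(BAR*)"))                               # * allows none
print(conforms(["NAME", "ALE"], "(NAME, (BEER | ALE)?)"))   # | picks one branch
```

The trailing space after each tag acts as a delimiter, so a tag such as BEERS is never mistaken for BEER.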
slide-15
SLIDE 15 Using a DTD
1. Set standalone = "no".
2. Either
a) Include the DTD as a preamble, or
b) Follow the XML declaration by a DOCTYPE declaration with the root tag, the keyword SYSTEM, and a file where the DTD can be found.
slide-16
SLIDE 16 Example of (a)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>
slide-17
SLIDE 17 Example of (b)
Suppose our bars DTD is in file bar.dtd.
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> ...
</BARS>
slide-18
SLIDE 18 Attribute Lists
Opening tags can have "arguments" that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML.
  • Keyword !ATTLIST introduces a list of attributes and their data types.
Example
<!ELEMENT BAR (NAME, BEER*)>
<!ATTLIST BAR type (sushi | sports | other) #IMPLIED>
  • Bar objects can have a (bar) type, and the value of that type is limited to the three strings shown.
  • Example of use: <BAR type = "sushi"> . . . </BAR>
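Python's standard library does not validate against a DTD, but the effect of the enumerated attribute type can be imitated by hand. In this sketch the function name bad_bar_types and the sample document are invented, and the attribute is assumed to be optional.

```python
import xml.etree.ElementTree as ET

ALLOWED_BAR_TYPES = {"sushi", "sports", "other"}  # the three strings in the ATTLIST


def bad_bar_types(xml_text):
    """Return the type values on BAR elements that are outside the allowed set."""
    root = ET.fromstring(xml_text)
    return [
        bar.get("type")
        for bar in root.iter("BAR")
        # Assumption: a BAR without a type attribute is acceptable.
        if bar.get("type") is not None and bar.get("type") not in ALLOWED_BAR_TYPES
    ]


doc = '<BARS><BAR type="sushi"/><BAR type="disco"/><BAR/></BARS>'
print(bad_bar_types(doc))
```

A validating parser would report the same violation directly from the DTD instead of from a hard-coded set.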
slide-19
SLIDE 19 ID's and IDREF's
These are pointers from one object to another, analogous to NAME = "foo" and HREF = "#foo" in HTML.
  • Allows the structure of an XML document to be a general graph, rather than just a tree.
  • An attribute of type ID can be used to give the object (string between opening and closing tags) a unique string identifier.
  • An attribute of type IDREF refers to some object by its identifier.
      • Also IDREFS to allow multiple object references within one tag.
slide-20
SLIDE 20 Example
Let us include in our BARS document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*, MANF*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT MANF (ADDR)>
  <!ATTLIST MANF name ID #REQUIRED>
  <!ELEMENT ADDR (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ATTLIST BEER manf IDREF #REQUIRED>
  <!ELEMENT PRICE (#PCDATA)>
]>
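Following an IDREF amounts to a lookup in an index keyed by ID. The sketch below shows that idea with Python's standard xml.etree.ElementTree, which does not read the DTD, so the roles of the name and manf attributes are hard-coded; the sample document, the identifier "ab", and the address are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Invented instance of the document type above.
DOC = """<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER manf="ab"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
  </BAR>
  <MANF name="ab"><ADDR>St. Louis</ADDR></MANF>
</BARS>"""

root = ET.fromstring(DOC)

# ID attribute (MANF name): build the identifier -> element index and
# reject duplicates, since an ID must be unique within the document.
by_id = {}
for manf in root.iter("MANF"):
    key = manf.get("name")
    if key in by_id:
        raise ValueError(f"duplicate ID {key!r}")
    by_id[key] = manf

# IDREF attribute (BEER manf): each lookup jumps across the tree.
# A KeyError here would signal a dangling reference.
links = {
    beer.findtext("NAME"): by_id[beer.get("manf")].findtext("ADDR")
    for beer in root.iter("BEER")
}
print(links)
```

The BEER element sits inside a BAR while its manufacturer is a sibling of that BAR, so the reference gives the document the graph shape that nesting alone cannot express.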
slide-21
SLIDE 21 XQUERY
Emerging standard for querying XML documents. Basic form:
FOR <variables ranging over sets of elements>
WHERE <condition>
RETURN <set of elements>;
  • Sets of elements described by paths, consisting of:
1. URL, if necessary.
2. Element names forming a path in the semistructured data graph, e.g., //BAR/NAME = "start at any BAR node and go to a NAME child."
3. Ending condition of the form [<condition about subelements, attributes (preceded by @), and values>].
slide-22
SLIDE 22 Example
The file http://www.stanford.edu/bars.xml:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR type = "sports">
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR type = "sushi">
    <NAME>Homma's</NAME>
    <BEER><NAME>Sapporo</NAME>
      <PRICE>4.00</PRICE></BEER>
  </BAR>
  ...
</BARS>
slide-23
SLIDE 23 XQUERY Query
Find the prices charged for Bud by sports bars that serve Miller.
FOR $ba IN document("http://www.stanford.edu/bars.xml")//BAR[@type = "sports"],
    $be IN $ba/BEER[NAME = "Bud"]
WHERE $ba/BEER[NAME = "Miller"]
RETURN $be/PRICE;
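The same selection can be imitated with the limited XPath subset in Python's standard xml.etree.ElementTree. This is an analogy rather than an XQuery engine: the document is inlined instead of fetched from the URL, only the two bars shown in the example file are included, and nested loops stand in for the FOR clause.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the two bars shown in the example file.
DOC = """<BARS>
  <BAR type="sports"><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR type="sushi"><NAME>Homma's</NAME>
    <BEER><NAME>Sapporo</NAME><PRICE>4.00</PRICE></BEER>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)
prices = []
for ba in root.findall(".//BAR[@type='sports']"):  # FOR $ba IN ...//BAR[@type = "sports"]
    for be in ba.findall("BEER[NAME='Bud']"):      # $be IN $ba/BEER[NAME = "Bud"]
        if ba.findall("BEER[NAME='Miller']"):      # WHERE $ba/BEER[NAME = "Miller"]
            prices.append(be.findtext("PRICE"))    # RETURN $be/PRICE
print(prices)
```

Each variable in the FOR clause becomes one loop, and the WHERE condition holds when the path it names selects at least one element.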