

SLIDE 1

XML and Web Data

SLIDE 2

CMPT 354: Database I -- XML 2

Data in HTML

  • HyperText Markup Language

– Different data elements are set out using tags

  • No schema?

– Based on the data itself, we can make a reasonable guess about the structure
– “Self-describing”

SLIDE 3

Object and Schema

SLIDE 4

Semi-structured Data

  • Object-like: it can be represented as a collection of objects
  • Schemaless: it is not guaranteed to conform to any type structure
  • Self-describing

– Often carries only the names of the attributes and has a lower degree of organization than the data in the database

  • Semi-structured data: data with the above characteristics

SLIDE 5

Schemaless But Self-Describing

(#12345,
  [ListName: "Students",
   Contents: {
     [Name: "John Doe", Id: "111111111", Address: [Number: 123, Street: "Main St"]],
     [Name: "Joe Public", Id: "666666666", Address: [Number: 666, Street: "Hollow Rd"]]
   }
  ]
)
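
As a rough illustration of "schemaless but self-describing", the object above can be transcribed into nested Python dicts and lists, and a structure can then be guessed from the attribute names alone. This is only a sketch: the names `students` and `guess_structure` are hypothetical, the object identifier #12345 is omitted, and the identifier attribute is spelled `Id` for both students as an assumption.

```python
# The object from the slide as nested dicts and lists (an illustrative transcription).
students = {
    "ListName": "Students",
    "Contents": [
        {"Name": "John Doe", "Id": "111111111",
         "Address": {"Number": 123, "Street": "Main St"}},
        {"Name": "Joe Public", "Id": "666666666",
         "Address": {"Number": 666, "Street": "Hollow Rd"}},
    ],
}


def guess_structure(value):
    """Infer a rough structure purely from the names and values in the data."""
    if isinstance(value, dict):
        return {name: guess_structure(v) for name, v in value.items()}
    if isinstance(value, list):
        # Describe a collection by the structure of its first member only.
        return [guess_structure(value[0])] if value else []
    return type(value).__name__


schema = guess_structure(students)
print(schema)
```

No schema was declared anywhere; the guess comes entirely from the data, and nothing guarantees that a third student would follow the same structure.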

SLIDE 6

XML

  • Extensible Markup Language

– A standard adopted in 1998 by the W3C (World Wide Web Consortium)

  • Optional mechanisms for specifying document structure

– DTD: the Document Type Definition Language, part of the XML standard
– XML Schema: a more recent specification built on top of XML

  • Query languages for XML

– XPath: lightweight
– XSLT: document transformation language
– XQuery: a full-blown language
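
To give a feel for how lightweight XPath is, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The sample document is an assumption modeled on the student list used elsewhere in these slides.

```python
import xml.etree.ElementTree as ET

# A small sample document (hypothetical, modeled on the student list).
doc = ET.fromstring("""
<PersonList Type="Student">
  <Contents>
    <Person><Name>John Doe</Name><Id>111111111</Id></Person>
    <Person><Name>Joe Public</Name><Id>666666666</Id></Person>
  </Contents>
</PersonList>
""")

# A path expression: every Name under Contents/Person.
names = [e.text for e in doc.findall("./Contents/Person/Name")]

# A predicate: the Name of the Person whose Id child has the given text.
match = doc.find(".//Person[Id='666666666']/Name")

print(names)
print(match.text)
```

XSLT and XQuery need dedicated processors and are not covered by this module.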

SLIDE 7

From HTML to XML

SLIDE 8

HTML and XML

  • HTML

– A fixed number of tags
– Each tag has its own well-defined meaning

  • E.g., <table> … </table>
  • XML: HTML-like language

– An arbitrary number of user-defined tags
– No a priori semantics
– Mainly for data exchange
– Display using stylesheet

SLIDE 9

Important Differences

  • An XML document may contain an arbitrary assortment of tags chosen by the document author

– The only valid tags in HTML are those sanctioned by the official specification of the language; other tags are ignored

  • Every opening tag must have a matching closing tag, and the tags must be properly nested

– E.g., <a><b></a></b> is not allowed
– Some HTML tags are not required to be closed, e.g., <p>

  • The document has a root element: the element that contains all other elements
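
The nesting and root-element rules can be checked mechanically by handing a string to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = is_well_formed("<a><b></b></a>")      # properly nested
crossed = is_well_formed("<a><b></a></b>")     # crossed tags from the slide
unclosed = is_well_formed("<p>unclosed")       # closing tag missing
two_roots = is_well_formed("<a/><b/>")         # no single root element

print(nested, crossed, unclosed, two_roots)
```

Only the first string is accepted; an HTML browser would typically tolerate the unclosed <p>, while an XML parser rejects it.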

SLIDE 10

Example

[Figure: an annotated XML document. Labels: mandatory statement, root element, XML elements, element names, element contents]

SLIDE 11

Hierarchical Structure

PersonList
  Student
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd
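
The hierarchy can be recovered from an XML document by walking the parsed element tree. The document text below is a reconstruction from the names and values on these slides, not the original example, and the helper `outline` is hypothetical. Attributes such as Type are not shown in the outline.

```python
import xml.etree.ElementTree as ET

# Reconstructed document (an assumption based on the slide contents).
xml_text = """<?xml version="1.0" ?>
<PersonList Type="Student">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>John Doe</Name>
      <Id>111111111</Id>
      <Address><Number>123</Number><Street>Main St</Street></Address>
    </Person>
    <Person>
      <Name>Joe Public</Name>
      <Id>666666666</Id>
      <Address><Number>666</Number><Street>Hollow Rd</Street></Address>
    </Person>
  </Contents>
</PersonList>"""


def outline(elem, depth=0):
    """Return one indented line per element, with its text content if any."""
    text = (elem.text or "").strip()
    label = f"{elem.tag}: {text}" if text else elem.tag
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(xml_text)
tree_lines = outline(root)
print("\n".join(tree_lines))
```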

SLIDE 12

Attributes

  • <PersonList Type="Student">

– Type is the name of an attribute that belongs to the element PersonList
– Student is the attribute value
– All attribute values must be quoted
– Text strings between tags do not need to be quoted

  • Empty element

– <Title Value="Student List"/>
– The element has one attribute and no content
– A shorthand for <Title Value="Student List"></Title>
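
A short sketch of how a parser exposes attributes, and of the empty-element shorthand, using Python's standard `xml.etree.ElementTree`. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<PersonList Type="Student"><Title Value="Student List"/></PersonList>'
)
list_type = root.get("Type")            # attribute value of the root element
title_value = root.find("Title").get("Value")

# The shorthand and the long form parse to equivalent elements.
short = ET.fromstring('<Title Value="Student List"/>')
long_form = ET.fromstring('<Title Value="Student List"></Title>')
same = (short.tag, short.attrib, short.text, len(short)) == (
    long_form.tag, long_form.attrib, long_form.text, len(long_form)
)

# An unquoted attribute value is rejected by the parser.
try:
    ET.fromstring("<PersonList Type=Student/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(list_type, title_value, same, unquoted_ok)
```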

SLIDE 13

Processing Instructions & Comments

  • Processing instructions

– <?xml version="1.0" ?>
– Contain anything the author might want to communicate to the XML processor, e.g., <?my-command go bring coffee?>
– Rarely used

  • Comments

– <!-- A comment -->
– Can occur anywhere except inside markup, i.e., between the symbols < and >
– An integral part of the document
– May be used by a receiver (e.g., a browser)

SLIDE 14

CDATA Construct

  • Includes strings of characters containing markup that might otherwise make the document ill formed

  • <![CDATA[ This is an example of markup in HTML: <b><i> Example </b></i>]]>
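
The effect of CDATA can be seen by parsing the same text with and without the wrapper. In this sketch the enclosing `Note` element is hypothetical, and the closing tags inside the example are written with forward slashes.

```python
import xml.etree.ElementTree as ET

wrapped = (
    "<Note><![CDATA[This is an example of markup in HTML: "
    "<b><i> Example </b></i>]]></Note>"
)
note = ET.fromstring(wrapped)
# The CDATA content arrives as plain character data; no b or i elements exist.
cdata_text = note.text
child_count = len(note)

# Without CDATA the crossed tags make the document ill formed.
bare = "<Note>This is an example of markup in HTML: <b><i> Example </b></i></Note>"
try:
    ET.fromstring(bare)
    bare_ok = True
except ET.ParseError:
    bare_ok = False

print(cdata_text)
print(child_count, bare_ok)
```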

SLIDE 15

XML Elements and Data Objects

  • XML allows mixed data/text structure
  • XML elements are ordered
  • XML has only one primitive type, string, and very weak facilities for specifying constraints

<Address>
  <Number> 123 </Number>
  <Street> Main St </Street>
</Address>

is different from

<Address>
  <Street> Main St </Street>
  <Number> 123 </Number>
</Address>

A legal XML document:

<Address>
  Sally lives on <Street> Main St </Street> house number <Number> 123 </Number> in the beautiful Anytown, Canada.
</Address>
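
A small sketch of what element order and mixed content look like to a program, using Python's standard `xml.etree.ElementTree`. Whitespace inside the tags is trimmed here for readability, which is an assumption about the intended data.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<Address><Number>123</Number><Street>Main St</Street></Address>")
b = ET.fromstring("<Address><Street>Main St</Street><Number>123</Number></Address>")

# Child elements are kept in document order, so the two differ.
order_a = [child.tag for child in a]
order_b = [child.tag for child in b]

mixed = ET.fromstring(
    "<Address>Sally lives on <Street>Main St</Street> house number "
    "<Number>123</Number> in the beautiful Anytown, Canada.</Address>"
)
# Text before the first child is in .text; text after a child is in its .tail.
lead_text = mixed.text
after_street = mixed.find("Street").tail
sentence = "".join(mixed.itertext())

print(order_a, order_b)
print(sentence)
```

Every value, including the house number, comes back as a string, in line with XML having a single primitive type.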

SLIDE 16

Use of Attributes

  • An element can have any number of user-defined attributes
  • What attributes can do can also be achieved with elements

– An attribute may occur only once within a tag, while subelements with the same tag may be repeated

  • Attributes introduce ambiguity as to whether to represent information as attributes or elements

– Attributes are sometimes convenient for representing data, but the same can be done with elements
– The use of attributes is expected to decline

<Address>
  <Number> 123 </Number>
  <Street> Main St </Street>
</Address>

<Address Number="123" Street="Main St"/>

SLIDE 17

Attributes in Markup

<Act Number="5">
  <Scene Number="1" Place="Mantua. A street">
    …
    <Apothecary Voice="scared">
      Such mortal drugs I have; but Mantua’s law
      Is death to any he that utters them.
    </Apothecary>
    <Romeo Voice="persistent">
      Art thou so bare and full of wretchedness,
      And fear’st to die? …
    </Romeo>
    …
  </Scene>
</Act>

SLIDE 18

Advantages of Attributes

  • Attributes in an element are not ordered

– <Address Number="123" Street="Main St"/>
– <Address Street="Main St" Number="123"/>

  • Attributes are more succinct
  • Attributes can be declared to have a unique value and can be used to enforce a limited kind of referential integrity

<Address>
  <Number> 123 </Number>
  <Street> Main St </Street>
</Address>
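
The first two points can be checked with a few lines of Python's standard `xml.etree.ElementTree`; this is a sketch with illustrative variable names.

```python
import xml.etree.ElementTree as ET

attr_form_1 = '<Address Number="123" Street="Main St"/>'
attr_form_2 = '<Address Street="Main St" Number="123"/>'
elem_form = "<Address><Number>123</Number><Street>Main St</Street></Address>"

first = ET.fromstring(attr_form_1)
second = ET.fromstring(attr_form_2)

# Attributes are exposed as a dict, and dict equality ignores order.
attrs_equal = first.attrib == second.attrib

# The attribute form carries the same two values in fewer characters.
more_succinct = len(attr_form_1) < len(elem_form)

print(attrs_equal, more_succinct)
```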

SLIDE 19

ID and IDREF – Cross-References

SLIDE 20

Well Formed XML Document

  • It has a root element
  • Every opening tag is followed by a matching closing tag, and the elements are properly nested inside each other
  • Any attribute can occur at most once in a given opening tag, its value must be provided, and this value must be quoted
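
The attribute rules in the last bullet can be exercised against a parser. This is a sketch using Python's standard `xml.etree.ElementTree`; the helper `parse_error` and the test strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message, or None if the text is well formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)


checks = {
    "ok": '<Address Number="123" Street="Main St"/>',
    "duplicate attribute": '<Address Number="123" Number="456"/>',
    "unquoted value": "<Address Number=123/>",
    "missing value": "<Address Number/>",
}

# True means the parser accepted the text as well formed.
results = {label: parse_error(text) is None for label, text in checks.items()}
for label, accepted in results.items():
    print(f"{label}: {'well formed' if accepted else 'rejected'}")
```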

SLIDE 21

Namespaces

  • A term (tag) might have different meanings in different contexts

– <Name><First>John</First> <Last>Doe</Last></Name>
– <Name>Simon Fraser University</Name>

  • Every XML tag must have two parts: namespace and local name

– General structure: namespace:local-name
– Namespace represented by a URI (uniform resource identifier)

  • An abstract identifier (a general unique string)
  • URL (uniform resource locator)


SLIDE 22

Example – Namespace

  • Namespaces are defined using the attribute xmlns

– All names beginning with xml should be considered reserved

  • Default namespace xmlns="…"

– Only one default namespace

  • Other namespace xmlns:toy="…"

– Prefixes (e.g., toy) must be distinct

<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>backpack</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
</item>
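
To see how a parser resolves these declarations, the example can be loaded with Python's standard `xml.etree.ElementTree`, which rewrites every tag as `{namespace-URI}local-name`. The prefixes `s` and `t` used for searching are arbitrary local choices, not part of the document.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>backpack</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
</item>
""")

# Unprefixed tags land in the default namespace, toy: tags in the toys one.
tags = [elem.tag for elem in doc.iter()]

# Searches map prefixes of our own choosing to the namespace URIs.
ns = {
    "s": "http://www.acmeinc.com/jp#supplies",
    "t": "http://www.acmeinc.com/jp#toys",
}
backpack = doc.find("s:name", ns).text
cyberpet = doc.find(".//t:name", ns).text

print(tags)
print(backpack, cyberpet)
```

The two `name` elements are distinct to the parser because their namespace parts differ, even though their local names are identical.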

SLIDE 23

Namespace Declarations

  • Namespace as prefix

– E.g., toy:item, toy:name
– Tags without prefix belong to the default namespace

  • Namespace declarations have scope

– Can be nested like a program block

SLIDE 24

Example – Scopes of Namespaces

<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>backpack</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
  <item xmlns="http://www.acmeinc.com/jp#supplies2"
        xmlns:toy="http://www.acmeinc.com/jp#toys2">
    <name>notebook</name>
    <toy:name>sticker</toy:name>
  </item>
</item>
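
The scoping can be made visible by asking a parser which namespace each leaf element ends up in. This is a sketch using Python's standard `xml.etree.ElementTree`; the helper `split_tag` is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>backpack</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
  <item xmlns="http://www.acmeinc.com/jp#supplies2"
        xmlns:toy="http://www.acmeinc.com/jp#toys2">
    <name>notebook</name>
    <toy:name>sticker</toy:name>
  </item>
</item>
""")


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


# Map the text of each leaf element to the namespace it was resolved in.
namespace_of = {
    elem.text.strip(): split_tag(elem.tag)[0]
    for elem in doc.iter()
    if len(elem) == 0 and elem.text and elem.text.strip()
}
for text, uri in namespace_of.items():
    print(f"{text}: {uri}")
```

The inner declarations override the outer ones only within the inner item element, much like variables redeclared in a nested program block.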

SLIDE 25

More About Namespace

  • The name of a namespace is just a string that happens to be a URL
  • It is not necessarily a real address that contains some kind of schema describing the corresponding set of names
  • Don’t be misled by the URL!


SLIDE 26

Summary

  • HTML and XML: differences and applications
  • Structure of XML

– Elements
– Attributes
– Well formed XML documents

  • Namespace


SLIDE 27

To-Do-List

  • Can every relational table be represented in XML? Can every XML document be represented in a relational table?
  • RSS is an application of XML. Try to understand the two RSS segments at http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html