
XML and Web Data



  1. XML and Web Data

  2. Data in HTML
     • HyperText Markup Language
       – Different data elements are set out using tags
     • No schema?
       – Based on the data itself, we can make a reasonable guess about the structure
       – “Self-describing”
     CMPT 354: Database I -- XML

  3. Object and Schema

  4. Semi-structured Data
     • Object-like: it can be represented as a collection of objects
     • Schemaless: it is not guaranteed to conform to any type structure
     • Self-describing
       – Often carries only the names of the attributes and has a lower degree of organization than the data in a database
     • Semi-structured data: data with the above characteristics

  5. Schemaless But Self-Describing
     (#12345,
       [ListName: "Students",
        Contents: {
          [Name: "John Doe", ID: "111111111", Address: [Number: 123, Street: "Main St"]],
          [Name: "Joe Public", Id: "666666666", Address: [Number: 666, Street: "Hollow Rd"]]
        }
       ]
     )
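The object above can be mimicked with nested Python dicts and lists to show what "self-describing" means in practice. This is only a sketch: the dict/list encoding and the helper name `describe` are assumptions made for illustration, not part of the slides. The structure is guessed from the data alone, and nothing stops the two records from using different attribute names (`ID` and `Id`, as on the slide).

```python
# The slide's object, encoded (by assumption) as nested dicts and lists.
students = {
    "ListName": "Students",
    "Contents": [
        {"Name": "John Doe", "ID": "111111111",
         "Address": {"Number": 123, "Street": "Main St"}},
        {"Name": "Joe Public", "Id": "666666666",
         "Address": {"Number": 666, "Street": "Hollow Rd"}},
    ],
}


def describe(value):
    """Guess a structure from the data itself: attribute names and value types."""
    if isinstance(value, dict):
        return {name: describe(child) for name, child in value.items()}
    if isinstance(value, list):
        return [describe(item) for item in value]
    return type(value).__name__


shape = describe(students)
# No schema forces the two records to agree on their attribute names.
first_names = set(shape["Contents"][0])
second_names = set(shape["Contents"][1])
print(first_names ^ second_names)  # attribute names that appear in only one record
```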

  6. XML
     • Extensible Markup Language
       – A standard adopted in 1998 by the W3C (World Wide Web Consortium)
     • Optional mechanisms for specifying document structure
       – DTD: the Document Type Definition language, part of the XML standard
       – XML Schema: a more recent specification built on top of XML
     • Query languages for XML
       – XPath: lightweight
       – XSLT: document transformation language
       – XQuery: a full-blown language
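To give a feel for why XPath is called lightweight, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample document and its element names are assumptions that echo the student list used elsewhere in these slides.

```python
import xml.etree.ElementTree as ET

# Assumed sample document, modeled on the student list in these slides.
doc = ET.fromstring(
    "<PersonList>"
    "<Person><Name>John Doe</Name><Id>111111111</Id></Person>"
    "<Person><Name>Joe Public</Name><Id>666666666</Id></Person>"
    "</PersonList>"
)

# Path expression: every Name element that is a child of a Person.
names = [name.text for name in doc.findall("./Person/Name")]

# Predicate: the Person whose Name child has exactly this text.
joe = doc.find("./Person[Name='Joe Public']")
joe_id = joe.find("Id").text

print(names, joe_id)
```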

  7. From HTML to XML

  8. HTML and XML
     • HTML
       – A fixed number of tags
       – Each tag has its own well-defined meaning
         • E.g., <table> … </table>
     • XML: HTML-like language
       – An arbitrary number of user-defined tags
       – No a priori semantics
       – Mainly for data exchange
       – Display using a stylesheet

  9. Important Differences
     • XML contains a large assortment of tags chosen by the document author
       – The only valid tags in HTML are those sanctioned by the official specification of the language; other tags are ignored
     • Every opening tag must have a matching closing tag, and the tags must be properly nested
       – E.g., <a><b></a></b> is not allowed
       – Some HTML tags are not required to be closed, e.g., <p>
     • The document has a root element
       – The element that contains all other elements
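The contrast in strictness can be observed with Python's standard library. This is a rough sketch: `html.parser.HTMLParser` is a tokenizer rather than a browser, so it only demonstrates that unclosed HTML tags raise no error, while the XML parser rejects the same input. The names `TagLogger`, `html_events`, and `xml_accepts` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagLogger(HTMLParser):
    """Records the start and end tags the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(text):
    parser = TagLogger()
    parser.feed(text)
    parser.close()
    return parser.events


def xml_accepts(text):
    """True if the XML parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


sloppy = "<p>one<p>two"             # <p> is never closed
html_result = html_events(sloppy)   # tags are reported, no error is raised
xml_result = xml_accepts(sloppy)    # the same text is rejected as XML
print(html_result, xml_result)
```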

  10. Example
      [Figure: annotated XML document; labels: mandatory statement, root element, XML elements, element names, element contents]

  11. Hierarchical Structure
      [Figure: tree rooted at PersonList, with nodes labeled Student, Title, and Contents, and two Person nodes]
      • Person – Name: John Doe; Id: 111111111; Address (Number: 123; Street: Main St)
      • Person – Name: Joe Public; Id: 666666666; Address (Number: 666; Street: Hollow Rd)

  12. Attributes
      • <PersonList Type="Student">
        – Type is the name of an attribute that belongs to the element PersonList
        – Student is the attribute value
        – All attribute values must be quoted
        – Text strings between tags do not need to be quoted
      • Empty element
        – <Title Value="Student List"/>
        – The element has one attribute and no content
        – A shorthand for <Title Value="Student List"></Title>
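A short sketch with Python's `xml.etree.ElementTree` illustrates these points: reading an attribute value, the equivalence of the empty-element shorthand and its long form, and the rejection of an unquoted attribute value. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<PersonList Type="Student">'
    '<Title Value="Student List"/>'
    '</PersonList>'
)
list_type = doc.get("Type")          # attribute value of the root element

# The empty-element shorthand and the long form parse to the same thing.
short_form = doc.find("Title")
long_form = ET.fromstring('<Title Value="Student List"></Title>')
same = (
    short_form.attrib == long_form.attrib
    and (short_form.text or "") == (long_form.text or "")
    and len(short_form) == len(long_form) == 0   # neither has child elements
)

# An unquoted attribute value is a parse error.
try:
    ET.fromstring("<PersonList Type=Student/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(list_type, same, unquoted_ok)
```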

  13. Processing Instructions & Comments
      • Processing instructions
        – <?xml version="1.0" ?>
        – Contain anything the author might want to communicate to the XML processor, e.g., <?my-command go bring coffee?>
        – Rarely used
      • Comments
        – <!-- A comment -->
        – Can occur anywhere except inside markup, i.e., between the symbols < and >
        – An integral part of the document
        – May be used by a receiver (e.g., a browser)

  14. CDATA Construct
      • Includes strings of characters containing markup that might otherwise make the document ill-formed
      • <![CDATA[ This is an example of markup in HTML: <b><i> Example </i></b>]]>
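The effect of a CDATA section can be checked with a small sketch: the parser hands the enclosed markup back as plain text and creates no child elements. The wrapping element name `Note` is an assumption introduced for the example.

```python
import xml.etree.ElementTree as ET

# "Note" is a hypothetical wrapper element; a CDATA section needs an enclosing element.
doc = ET.fromstring(
    "<Note><![CDATA[This is an example of markup in HTML: "
    "<b><i> Example </i></b>]]></Note>"
)

text = doc.text          # the markup survives as literal characters
child_count = len(doc)   # no <b> or <i> elements were created

# The same text could be written without CDATA by escaping each "<" and ">".
escaped = ET.fromstring("<Note>&lt;b&gt;</Note>").text

print(text)
```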

  15. XML Elements and Data Objects
      • XML allows mixed data/text structure
      • XML elements are ordered
      • XML has only one primitive type, string, and very weak facilities for specifying constraints
      <Address>
        <Number> 123 </Number>
        <Street> Main St </Street>
      </Address>
      is different from
      <Address>
        <Street> Main St </Street>
        <Number> 123 </Number>
      </Address>
      A legal XML document:
      <Address>
        Sally lives on <Street> Main St </Street>
        house number <Number> 123 </Number>
        in the beautiful Anytown, Canada.
      </Address>
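The three points on this slide can be observed with `xml.etree.ElementTree`. In this sketch, whitespace inside the tags is dropped for brevity, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and elements interleaved inside one element.
mixed = ET.fromstring(
    "<Address>Sally lives on <Street>Main St</Street> house number "
    "<Number>123</Number> in the beautiful Anytown, Canada.</Address>"
)
sentence = "".join(mixed.itertext())   # all text, in document order

# Elements are ordered: the two documents list their children differently.
a = ET.fromstring("<Address><Number>123</Number><Street>Main St</Street></Address>")
b = ET.fromstring("<Address><Street>Main St</Street><Number>123</Number></Address>")
order_a = [child.tag for child in a]
order_b = [child.tag for child in b]

# Only one primitive type: the house number arrives as a string.
number = a.find("Number").text

print(sentence)
print(order_a, order_b, type(number).__name__)
```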

  16. Use of Attributes
      • An element can have any number of user-defined attributes
      • What attributes can do can also be achieved with elements
        – An attribute may occur only once within a tag, while subelements with the same tag may be repeated
      • Attributes introduce ambiguity as to whether to represent information as attributes or elements
        – Sometimes convenient for representing data, but the same can be done with elements
        – The use of attributes is expected to decline
      With elements:
      <Address>
        <Number> 123 </Number>
        <Street> Main St </Street>
      </Address>
      With attributes:
      <Address Number="123" Street="Main St"/>

  17. Attributes in Markup
      <Act Number="5">
        <Scene Number="1" Place="Mantua. A street">
          …
          <Apothecary Voice="scared">
            Such mortal drugs I have; but Mantua’s law
            Is death to any he that utters them.
          </Apothecary>
          <Romeo Voice="persistent">
            Art thou so bare and full of wretchedness,
            And fear’st to die? …
          </Romeo>
          …
        </Scene>
      </Act>

  18. Advantages of Attributes
      • Attributes in an element are not ordered
        – <Address Number="123" Street="Main St"/>
        – <Address Street="Main St" Number="123"/>
      • Attributes are more succinct than the equivalent elements:
        <Address>
          <Number> 123 </Number>
          <Street> Main St </Street>
        </Address>
      • Attributes can be declared to have unique values and can be used to enforce a limited kind of referential integrity
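The first two advantages are easy to check with a sketch in Python. `ElementTree` exposes attributes as a dict, so their written order does not affect equality. The string comparison at the end simply counts characters.

```python
import xml.etree.ElementTree as ET

attr_form = '<Address Number="123" Street="Main St"/>'
swapped_form = '<Address Street="Main St" Number="123"/>'
elem_form = "<Address><Number>123</Number><Street>Main St</Street></Address>"

# Attribute order is not significant: both tags carry the same name/value pairs.
same_attributes = ET.fromstring(attr_form).attrib == ET.fromstring(swapped_form).attrib

# Succinctness: the attribute form needs fewer characters than the element form.
shorter = len(attr_form) < len(elem_form)

print(same_attributes, shorter)
```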

  19. ID and IDREF – Cross-References
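The idea behind ID and IDREF attributes can be sketched by hand: ID values must be unique, and every IDREF value must match some ID. In real XML these attribute types are declared in a DTD and checked by a validating parser; `ElementTree` does not validate, so the check below is manual. The document, the attribute names `StudId` and `Who`, and the function `check_references` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: StudId plays the role of an ID attribute, Who of an IDREF.
doc = ET.fromstring(
    '<Report>'
    '<Student StudId="s1"><Name>John Doe</Name></Student>'
    '<Student StudId="s2"><Name>Joe Public</Name></Student>'
    '<Enrolled Who="s1"/>'
    '<Enrolled Who="s3"/>'
    '</Report>'
)


def check_references(root, id_attr, ref_attr):
    """Return (duplicate ID values, reference values that match no ID)."""
    seen, duplicates = set(), set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    refs = {e.get(ref_attr) for e in root.iter() if e.get(ref_attr) is not None}
    return sorted(duplicates), sorted(refs - seen)


duplicates, dangling = check_references(doc, "StudId", "Who")
print(duplicates, dangling)
```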

  20. Well-Formed XML Document
      • It has a root element
      • Every opening tag is followed by a matching closing tag, and the elements are properly nested inside each other
      • Any attribute can occur at most once in a given opening tag, its value must be provided, and this value must be quoted
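Each rule can be tried against a non-validating parser, which accepts exactly the well-formed documents. The following is a minimal sketch; the helper name `is_well_formed` and the test strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


checks = {
    "ok": is_well_formed('<a x="1"><b/></a>'),
    "two roots": is_well_formed("<a/><b/>"),                 # no single root element
    "bad nesting": is_well_formed("<a><b></a></b>"),         # tags overlap
    "repeated attribute": is_well_formed('<a x="1" x="2"/>'),
    "missing value": is_well_formed("<a x/>"),
    "unquoted value": is_well_formed("<a x=1/>"),
}

for rule, accepted in checks.items():
    print(rule, accepted)
```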

  21. Namespaces
      • A term (tag) might have different meanings in different contexts
        – <Name><First>John</First> <Last>Doe</Last></Name>
        – <Name>Simon Fraser University</Name>
      • Every XML tag must have two parts: namespace and local name
        – General structure: namespace:local-name
        – Namespace represented by a URI (uniform resource identifier)
          • An abstract identifier (a general unique string)
          • URL (uniform resource locator)

  22. Example – Namespace
      • Namespaces are defined using the attribute xmlns
        – All names xml* should be considered reserved
      • Default namespace xmlns="…"
        – Only one default namespace
      • Other namespace xmlns:toy="…"
        – Prefixes (e.g., toy) must be distinct
      <item xmlns="http://www.acmeinc.com/jp#supplies"
            xmlns:toy="http://www.acmeinc.com/jp#toys">
        <name>backpack</name>
        <feature>
          <toy:item>
            <toy:name>cyberpet</toy:name>
          </toy:item>
        </feature>
      </item>

  23. Namespace Declarations
      • Namespace as prefix
        – E.g., toy:item, toy:name
        – Tags without a prefix belong to the default namespace
      • Namespace declarations have scope
        – Can be nested like a program block

  24. Example – Scopes of Namespaces
      <item xmlns="http://www.acmeinc.com/jp#supplies"
            xmlns:toy="http://www.acmeinc.com/jp#toys">
        <name>backpack</name>
        <feature>
          <toy:item>
            <toy:name>cyberpet</toy:name>
          </toy:item>
        </feature>
        <item xmlns="http://www.acmeinc.com/jp#supplies2"
              xmlns:toy="http://www.acmeinc.com/jp#toys2">
          <name>notebook</name>
          <toy:name>sticker</toy:name>
        </item>
      </item>
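Parsing the example shows how the scopes resolve. `xml.etree.ElementTree` rewrites every tag as `{namespace-URI}local-name`, so the same written name (`name` or `toy:name`) maps to a different URI inside the inner `item`. The dictionary `tags` is just a convenience for this sketch.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<item xmlns="http://www.acmeinc.com/jp#supplies" '
    'xmlns:toy="http://www.acmeinc.com/jp#toys">'
    '<name>backpack</name>'
    '<feature><toy:item><toy:name>cyberpet</toy:name></toy:item></feature>'
    '<item xmlns="http://www.acmeinc.com/jp#supplies2" '
    'xmlns:toy="http://www.acmeinc.com/jp#toys2">'
    '<name>notebook</name><toy:name>sticker</toy:name>'
    '</item>'
    '</item>'
)

# Map each element's text to its fully resolved tag, "{namespace-URI}local-name".
tags = {elem.text: elem.tag for elem in doc.iter() if elem.text}

for text, tag in tags.items():
    print(text, "->", tag)
```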

  25. More About Namespaces
      • The name of a namespace is just a string that happens to be a URL
      • It is not necessarily a real address that contains some kind of schema describing the corresponding set of names
      • Don’t be misled by the URL!

  26. Summary
      • HTML and XML: differences and applications
      • Structure of XML
        – Elements
        – Attributes
        – Well-formed XML documents
      • Namespaces

  27. To-Do List
      • Can every relational table be represented in XML? Can every XML document be represented in a relational table?
      • RSS is an application of XML. Try to understand the two RSS segments at http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
