SLIDE 1

DTD and XML Schema

SLIDE 2

CMPT 354: Database I -- DTD and XML Schema 2

XML

  • Extensible Markup Language

– A standard adopted in 1998 by the W3C (World Wide Web Consortium)

  • Optional mechanisms for specifying document structure

– DTD: the Document Type Definition language, part of the XML standard
– XML Schema: a more recent specification built on top of XML

  • Query languages for XML

– XPath: a lightweight language for addressing parts of a document
– XSLT: a document transformation language
– XQuery: a full-blown query language

SLIDE 3

Example

[Figure: an example XML document annotated with the labels "mandatory statement", "root element", "XML element", "element name", and "element content"]

SLIDE 4

Hierarchical Structure

[Figure: tree view of the example document; the root is labelled "Student" in the figure]

PersonList
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd

SLIDE 5

Document Type Definitions

  • A set of rules for structuring an XML document

– Specified as part of the document itself, or
– Referenced by a URL where the DTD can be found
– A document that conforms to its DTD is said to be valid

  • XML does not require a document to have a DTD, but every document must be well formed
  • A grammar that specifies a legal XML document, based on the tags used in the document and their attributes
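
The difference between "well formed" and "valid" can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree parser, which checks well-formedness only and does not validate against a DTD. The helper name is_well_formed and the sample strings are illustrative, not part of the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD validation is attempted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<Person><Name>John Doe</Name><Id>111111111</Id></Person>"
bad = "<Person><Name>John Doe</Id></Person>"  # mismatched closing tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that passes this check may still be invalid with respect to its DTD; checking validity requires a validating parser.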

SLIDE 6

Example – DTD

<!DOCTYPE PersonList [
  <!ELEMENT PersonList (Title, Contents)>
  <!ELEMENT Title EMPTY>
  <!ELEMENT Contents (Person*)>
  <!ELEMENT Person (Name, Id, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Address (Number, Street)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST PersonList Type CDATA #IMPLIED
                       Date CDATA #IMPLIED>
  <!ATTLIST Title Value CDATA #REQUIRED>
]>

SLIDE 7

DTD Components

  • Name (e.g., PersonList)

– Must coincide with the tag name of the root element of the document

  • One ELEMENT statement for each allowed tag, including the root tag
  • For each tag that can have attributes, the ATTLIST statement specifies the allowed attributes and their types


SLIDE 8

Specification

  • *: a subelement can appear zero or more times

– +: a subelement must appear one or more times

  • #PCDATA (parsed character data) and CDATA: character strings
  • #IMPLIED: an attribute is optional
  • ?: a subelement is optional

– <!ELEMENT Person (Name, Id, Address?)>

  • |: alternatives among subelements

– <!ELEMENT Name ((First, Last)|(Last, First))>
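
A DTD content model behaves much like a regular expression over child element names. The sketch below translates (Name, Id, Address?) by hand into a Python regex and checks the sequence of child tags against it. The helper children_match and the hand translation are assumptions made for illustration; a validating parser does this work for real documents.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of <!ELEMENT Person (Name, Id, Address?)>:
# each child tag is followed by one space, so "," becomes concatenation
# and "?" keeps its usual regex meaning.
PERSON_MODEL = re.compile(r"Name Id (Address )?")

def children_match(element: ET.Element, model: re.Pattern) -> bool:
    """Check the sequence of child tags against a content-model regex."""
    tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(tags) is not None

with_address = ET.fromstring(
    "<Person><Name>John Doe</Name><Id>111111111</Id>"
    "<Address><Number>123</Number><Street>Main St</Street></Address></Person>"
)
without_address = ET.fromstring(
    "<Person><Name>Joe Public</Name><Id>666666666</Id></Person>"
)
wrong_order = ET.fromstring(
    "<Person><Id>666666666</Id><Name>Joe Public</Name></Person>"
)

print(children_match(with_address, PERSON_MODEL))
print(children_match(without_address, PERSON_MODEL))
print(children_match(wrong_order, PERSON_MODEL))
```

The first two documents satisfy the model because Address is optional; the third fails because a DTD sequence fixes the order of subelements.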

SLIDE 9

Types for Attributes

  • CDATA: character strings
  • ID: values that must be unique within the document
  • IDREF: a reference to an ID value that appears in the document
  • IDREFS: a list of IDREF values
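
What ID, IDREF, and IDREFS promise can be sketched as a check over a parsed tree. The document below is hypothetical: the attribute Id plays the role of an ID, Advisor of an IDREF, and Friends of an IDREFS. ElementTree knows nothing about attribute types, so the roles are hard-coded here as an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: Id acts as an ID attribute, Advisor as an IDREF,
# and Friends as an IDREFS (a space-separated list of references).
doc = ET.fromstring("""
<People>
  <Person Id="p1" Friends="p2 p3"/>
  <Person Id="p2" Advisor="p1"/>
  <Person Id="p3" Advisor="p9"/>
</People>
""")

def check_references(root):
    """Return (duplicate ids, dangling references) found in the tree."""
    ids = [e.get("Id") for e in root.iter() if e.get("Id") is not None]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    known = set(ids)
    dangling = []
    for e in root.iter():
        refs = e.get("Friends", "").split()
        if e.get("Advisor") is not None:
            refs.append(e.get("Advisor"))
        dangling.extend(r for r in refs if r not in known)
    return duplicates, dangling

print(check_references(doc))
```

Here the reference to p9 is dangling, which a validating parser would report as a violation. Note that the check says nothing about what kind of element a reference points to, since all ID values in a document share one pool.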

SLIDE 10

DTD as Data Definition Language

  • DTDs have several limitations as a data definition language
  • Namespaces are not part of the native design
  • DTD syntax is quite different from XML
  • Very limited set of basic types
  • Limited ways to specify data consistency constraints

– No keys, weak referential integrity, no type references

  • No referential integrity for elements
  • Elements are always ordered
  • All element definitions are global

SLIDE 11

Why XML Schema?

  • Uses the same syntax as ordinary XML documents

– An alternative to DTD

  • Integrated with the namespace mechanism

– Different schemas can be imported from different namespaces and integrated into one schema

  • Provides a number of built-in types similar to those of SQL, e.g., string, integer, and time
  • Defines complex types from simpler ones
  • The same element name can be defined with different types depending on where the element is nested
  • Supports keys and referential integrity constraints
  • Makes it easy to specify documents where elements are unordered

SLIDE 12

Schema and Instance

  • Goal: describe the schema of XML documents using XML itself
  • An XML document D that conforms to a given schema (which is another XML document) is said to be schema valid

– D is called an instance of the schema

SLIDE 13

XML Schema and Namespaces

  • An XML schema document begins with a declaration of the namespaces to be used
  • http://www.w3.org/2001/XMLSchema – the namespace identifying the names of tags and attributes used in a schema (not in the instances)

– Describes the structural properties of documents in general, e.g., schema, attribute, element, …

  • http://www.w3.org/2001/XMLSchema-instance – another namespace used in conjunction with the one above

– Identifies a small number of special names that are defined in the XML Schema specification and are used in instance documents, e.g., schemaLocation

  • The target namespace – identifies the set of names defined by a particular schema document to be used in the instances
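
The role of these namespaces can be seen by parsing a tiny schema fragment. The sketch below assumes Python's standard xml.etree.ElementTree; the target namespace URL http://example.org/admin and the element name are made-up placeholders, not taken from the slides.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The target namespace URL below is a made-up placeholder.
schema_text = """
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.org/admin">
  <element name="Student" type="string"/>
</schema>
"""

root = ET.fromstring(schema_text)

# ElementTree expands each tag to {namespace}localname, which makes it
# visible that "schema" and "element" belong to the XML Schema namespace.
print(root.tag)
print(root.get("targetNamespace"))

# Names declared by the schema itself belong to the target namespace.
declared = [e.get("name") for e in root.findall(f"{{{XSD_NS}}}element")]
print(declared)
```

The tags schema and element come from the XML Schema namespace, while the declared name Student is what instance documents use under the target namespace.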

SLIDE 14

Schema and An Instance Document

SLIDE 15

Report Document

SLIDE 16

Primitive Types

  • DTD has very limited primitive types

– CDATA, ID, IDREF, IDREFS

  • Many useful primitive types in XML Schema

– decimal, integer, float, boolean, date, …

  • Derive new primitive types from the basic ones

– The mechanism is similar to the CREATE DOMAIN statement in SQL

SLIDE 17

Deriving Simple Types

  • IDREFS is not one of the primitive types

<simpleType name="myIdrefs">
  <list itemType="IDREF"/>
</simpleType>

  • Union of multiple types

Suppose local phone numbers are 7 digits long and long distance numbers are 10 digits long.

<simpleType name="phoneNumber">
  <union memberTypes="phone7digits phone10digits"/>
</simpleType>

SLIDE 18

Deriving Simple Types by Restriction

  • Constrain a basic type using one or more constraints from a fixed repertoire defined by the XML Schema specification

<simpleType name="phone7digits">
  <restriction base="integer">
    <minInclusive value="1000000"/>
    <maxInclusive value="9999999"/>
  </restriction>
</simpleType>
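
The meaning of a range restriction and of a union type can be mimicked in a few lines of Python. This is a rough sketch, not a schema validator: in_range stands in for minInclusive and maxInclusive on integer, and phone_number for the union phoneNumber. The bounds for phone10digits are an assumption, since the slides do not show its definition.

```python
def in_range(lo: int, hi: int):
    """Build a checker that mimics minInclusive/maxInclusive on integer."""
    def check(text: str) -> bool:
        try:
            value = int(text)
        except ValueError:
            return False
        return lo <= value <= hi
    return check

phone7digits = in_range(1000000, 9999999)
# Assumption: phone10digits is defined analogously; the source does not show it.
phone10digits = in_range(1000000000, 9999999999)

def phone_number(text: str) -> bool:
    """Union type: a value is accepted if any member type accepts it."""
    return phone7digits(text) or phone10digits(text)

print(phone7digits("5431234"))
print(phone7digits("911"))
print(phone_number("6045551234"))
print(phone_number("12345678"))
```

An 8-digit value is rejected by the union because neither member type accepts it.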

SLIDE 19

More Examples

  • Phone numbers in XXX-YYYY format

<simpleType name="phone7digitsAndDash">
  <restriction base="string">
    <pattern value="[0-9]{3}-[0-9]{4}"/>
  </restriction>
</simpleType>

  • More restrictions on basic string type

– <length value="7"/> – strings of length 7
– <minLength value="7"/> – strings of length >= 7
– <maxLength value="14"/> – strings of length <= 14
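
The pattern and length facets can be imitated with Python's re module and len. This is a sketch under the assumption that the pattern uses only syntax shared by XML Schema and Python regular expressions, which holds for [0-9]{3}-[0-9]{4} but not for every XML Schema pattern. The helper names are illustrative.

```python
import re

def matches_pattern(text: str, pattern: str) -> bool:
    """XML Schema patterns must match the whole value, hence fullmatch."""
    return re.fullmatch(pattern, text) is not None

def length_ok(text, length=None, min_length=None, max_length=None) -> bool:
    """Mimic the length, minLength, and maxLength facets on string."""
    n = len(text)
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True

PHONE = r"[0-9]{3}-[0-9]{4}"
print(matches_pattern("543-1234", PHONE))
print(matches_pattern("543-12345", PHONE))
print(length_ok("543-1234", min_length=7, max_length=14))
print(length_ok("543-1234", length=7))
```

The last check fails because the dash makes the string 8 characters long, a reminder that length facets count characters, not digits.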

SLIDE 20

Enumeration

  • Restrict the domain to a finite set
  • Can be applied to any base type

<simpleType name="emergencyNumbers">
  <restriction base="integer">
    <enumeration value="911"/>
    <enumeration value="333"/>
    <enumeration value="5431234"/>
  </restriction>
</simpleType>

SLIDE 21

More Examples on Simple Types

SLIDE 22

Complex Types

SLIDE 23

Basics of Complex Types

  • Tag complexType
  • Tag sequence: a list of elements that must occur in the given order
  • Using minOccurs and maxOccurs
  • Associating attributes with type
  • A complex type can be associated with an element

<element name="Student" type="adm:studentType"/>

SLIDE 24

Element without Content

  • Just associate attributes with types
  • Example

<complexType name="courseTakenType">
  <attribute name="CrsCode" type="adm:courseRef"/>
  <attribute name="Semester" type="string"/>
</complexType>

SLIDE 25

Compositors

  • Tags describing how elements can be combined into groups, e.g., sequence

– Required when a tag has complex content
– Required even if the type has only one child element!

  • Compositor all: allows elements to appear in any order

<complexType name="addressType">
  <all>
    <element name="StreetName" type="string"/>
    <element name="StreetNumber" type="string"/>
    <element name="city" type="string"/>
  </all>
</complexType>
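
The difference between the sequence and all compositors can be sketched as two checks over the child tags of an element. The element names come from addressType above; the sample Address instance, its values, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

REQUIRED = ["StreetName", "StreetNumber", "city"]

def matches_sequence(element) -> bool:
    """sequence: children must appear exactly in the declared order."""
    return [child.tag for child in element] == REQUIRED

def matches_all(element) -> bool:
    """all: each child appears exactly once, in any order."""
    return sorted(child.tag for child in element) == sorted(REQUIRED)

# Illustrative instance with the children in a different order.
address = ET.fromstring(
    "<Address><city>Burnaby</city><StreetNumber>123</StreetNumber>"
    "<StreetName>Main St</StreetName></Address>"
)
print(matches_sequence(address))
print(matches_all(address))
```

The instance is rejected under sequence but accepted under all, because only the order of the children differs from the declaration.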

SLIDE 26

Restrictions on Compositor All

  • All must appear directly below complexType

– Invalid, because all is nested inside sequence:

<complexType name="studentType2">
  <sequence>
    <all>
      <element name="First" type="string"/>
      <element name="Last" type="string"/>
    </all>
    <element name="Address" type="string"/>
  </sequence>
</complexType>

  • No element within all can be repeated

– Invalid, because Address may occur more than once:

<complexType name="studentType3">
  <all>
    <element name="First" type="string"/>
    <element name="Last" type="string"/>
    <element name="Address" type="string" minOccurs="1" maxOccurs="unbounded"/>
  </all>
</complexType>

SLIDE 27

Compositor Choice

<complexType name="addressType">
  <sequence>
    <choice>
      <element name="POBox" type="string"/>
      <sequence>
        <element name="Name" type="string"/>
        <element name="Number" type="string"/>
      </sequence>
    </choice>
    <element name="City" type="string"/>
  </sequence>
</complexType>
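
The addressType above reads as "either a POBox, or a Name followed by a Number, and then a City". A minimal sketch of that rule as a regex over child tags follows; the hand translation, the helper conforms, and the sample instances are assumptions for illustration, not output of a schema validator.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of sequence(choice(POBox, sequence(Name, Number)), City);
# each child tag is followed by one space.
ADDRESS_MODEL = re.compile(r"(POBox |Name Number )City ")

def conforms(xml_text: str) -> bool:
    """Check the child tags of the root element against ADDRESS_MODEL."""
    root = ET.fromstring(xml_text)
    tags = "".join(child.tag + " " for child in root)
    return ADDRESS_MODEL.fullmatch(tags) is not None

po_box = "<Address><POBox>42</POBox><City>Burnaby</City></Address>"
street = ("<Address><Name>Main St</Name><Number>123</Number>"
          "<City>Burnaby</City></Address>")
both = ("<Address><POBox>42</POBox><Name>Main St</Name>"
        "<Number>123</Number><City>Burnaby</City></Address>")

print(conforms(po_box))
print(conforms(street))
print(conforms(both))
```

The third instance is rejected because choice admits exactly one of its alternatives.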

SLIDE 28

Local Element Names

  • Two complex types can have elements that share the same name

– Names of students and names of courses
– Impossible in DTD, where all element declarations are global

SLIDE 29

Anonymous Types

  • Useful for types that might not be reused

<element name="Report">
  <complexType>
    <sequence>
      <element name="Students" type=…/>
      <element name="Classes" type=…/>
      <element name="Course" type=…/>
    </sequence>
  </complexType>
</element>

SLIDE 30

Keys

SLIDE 31

Foreign Key Constraints

SLIDE 32

Summary

  • DTD: a set of rules for structuring an XML document
  • XML Schema: a more sophisticated tool to specify structures of XML documents

– XML Schema is written in XML

  • Assignment 3