DTD and XML Schema
CMPT 354: Database I -- DTD and XML Schema 2
XML
- Extensible Markup Language
– A standard adopted in 1998 by the W3C (World Wide Web Consortium)
- Optional mechanisms for specifying document structure
– DTD: the Document Type Definition language, part of the XML standard
– XML Schema: a more recent specification built on top of XML
- Query languages for XML
– XPath: lightweight
– XSLT: document transformation language
– XQuery: a full-blown language
Example
[Figure: an XML document annotated with the labels: mandatory statement, root element, XML element, element name, element content]
Hierarchical Structure
PersonList [Student]
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd
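The hierarchy can also be written out as an XML document and walked programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The attribute `Type="Student"` and the `Title` value are assumptions made for illustration, since the slide shows only the tree.

```python
import xml.etree.ElementTree as ET

# Attribute values below are assumed for illustration only.
PERSON_LIST = """<?xml version="1.0"?>
<PersonList Type="Student">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>John Doe</Name>
      <Id>111111111</Id>
      <Address><Number>123</Number><Street>Main St</Street></Address>
    </Person>
    <Person>
      <Name>Joe Public</Name>
      <Id>666666666</Id>
      <Address><Number>666</Number><Street>Hollow Rd</Street></Address>
    </Person>
  </Contents>
</PersonList>"""


def outline(elem, depth=0):
    """Return one indented line per element, depth-first."""
    text = (elem.text or "").strip()
    label = f"{elem.tag}: {text}" if text else elem.tag
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(PERSON_LIST)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of nesting in the document becomes one level of indentation in the printed outline.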
Document Type Definitions
- A set of rules for structuring an XML document
– Specified as part of the document itself, or
– Referenced by a URL where the DTD can be found
– A document that conforms to its DTD is said to be valid
- XML does not require a document to have a DTD, but every document must be well formed
- A DTD is a grammar that specifies a legal XML document, based on the tags used in the document and their attributes
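Well-formedness can be checked without any DTD. The sketch below uses Python's standard `xml.etree.ElementTree` parser, which rejects documents that are not well formed but does not validate against a DTD. Checking validity would need a validating parser, which is outside this sketch. The helper name `is_well_formed` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as a single, properly nested XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


ok = is_well_formed("<Person><Name>John Doe</Name></Person>")
bad_nesting = is_well_formed("<Person><Name>John Doe</Person></Name>")
two_roots = is_well_formed("<Person/><Person/>")
print(ok, bad_nesting, two_roots)
```

The second document closes its tags in the wrong order and the third has two root elements, so neither is well formed.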
Example – DTD
<!DOCTYPE PersonList [
  <!ELEMENT PersonList (Title, Contents)>
  <!ELEMENT Title EMPTY>
  <!ELEMENT Contents (Person*)>
  <!ELEMENT Person (Name, Id, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Address (Number, Street)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST PersonList Type CDATA #IMPLIED
                       Date CDATA #IMPLIED>
  <!ATTLIST Title Value CDATA #REQUIRED>
]>
DTD Components
- Name (e.g., PersonList)
– Must coincide with the tag name of the root element of the document
- One ELEMENT statement for each allowed tag, including the root tag
- For each tag that can have attributes, the ATTLIST statement specifies the allowed attributes and their types
Specification
- *: a subelement can appear zero or more times
- +: a subelement must appear at least once
- #PCDATA (parsed character data) for element content, CDATA for attribute values: character strings
- #IMPLIED: an attribute is optional
- ?: a subelement is optional
– <!ELEMENT Person (Name, Id, Address?)>
- |: alternatives of subelements
– <!ELEMENT Name ((First, Last)|(Last, First))>
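The operators `*`, `+`, `?` and `|` mean the same in a DTD content model as in a regular expression over the sequence of child tags. The sketch below illustrates that analogy only and is not a DTD validator. The three content models are translated by hand into Python regular expressions, and the helper names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models. Each child tag is written as "Tag "
# (with a trailing space) so that tag names cannot run together.
CONTENT_MODELS = {
    "Contents": r"(Person )*",             # (Person*)
    "Person": r"Name Id (Address )?",      # (Name, Id, Address?)
    "Name": r"(First Last |Last First )",  # ((First, Last)|(Last, First))
}


def children_conform(elem):
    """Check one element's child sequence against its content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:  # no model recorded here: nothing to check
        return True
    child_tags = "".join(child.tag + " " for child in elem)
    return re.fullmatch(model, child_tags) is not None


def tree_conforms(root):
    """Check every element in the tree."""
    return all(children_conform(elem) for elem in root.iter())


good = ET.fromstring(
    "<Contents><Person><Name><Last>Doe</Last><First>John</First></Name>"
    "<Id>111111111</Id></Person></Contents>"
)
bad = ET.fromstring(
    "<Contents><Person><Id>111111111</Id>"
    "<Name><First>John</First><Last>Doe</Last></Name></Person></Contents>"
)
print(tree_conforms(good), tree_conforms(bad))
```

The second document fails because `Id` appears before `Name`, which the sequence `(Name, Id, Address?)` does not allow.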
Types for Attributes
- CDATA: character strings
- ID: values that are unique within the document
- IDREF: a reference to an ID value in the same document
- IDREFS: a list of IDREF values
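The constraints that ID, IDREF and IDREFS express can be sketched as a small check over a parsed document. This is an illustration, not a DTD processor: the attribute names (`StudId`, `CrsCode`, `Tutor`, `Takes`) and the helper `check_references` are hypothetical, and the caller states which attributes play which role instead of reading ATTLIST declarations.

```python
import xml.etree.ElementTree as ET


def check_references(root, id_attrs, idref_attrs, idrefs_attrs):
    """Return a list of ID/IDREF problems found in the document."""
    problems = []
    ids = set()
    # ID: every value must be unique across the whole document.
    for elem in root.iter():
        for name in id_attrs:
            value = elem.get(name)
            if value is None:
                continue
            if value in ids:
                problems.append(f"duplicate ID: {value}")
            ids.add(value)
    # IDREF / IDREFS: every referenced value must be some element's ID.
    for elem in root.iter():
        refs = [elem.get(n) for n in idref_attrs if elem.get(n) is not None]
        for name in idrefs_attrs:
            refs.extend((elem.get(name) or "").split())
        for ref in refs:
            if ref not in ids:
                problems.append(f"dangling reference: {ref}")
    return problems


doc = ET.fromstring(
    '<Report>'
    '<Course CrsCode="CS308"/><Course CrsCode="MAT123"/>'
    '<Student StudId="s111111111" Takes="CS308 MAT123"/>'
    '<Student StudId="s666666666" Takes="CS308 EE101" Tutor="s111111111"/>'
    '</Report>'
)
problems = check_references(doc, ("StudId", "CrsCode"), ("Tutor",), ("Takes",))
print(problems)
```

Note that the check cannot say what kind of element a reference points to: `Takes` could name a student just as well as a course.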
DTD as Data Definition Language
- There are some limitations
- Namespaces are not part of the original design
- DTD syntax is quite different from XML
- Very limited set of basic types
- Limited ways to specify data consistency constraints
– No keys, weak referential integrity, no type references
- No referential integrity for elements
- Elements are ordered; unordered content is hard to specify
- All element definitions are global
Why XML Schema?
- Use the same syntax as that used for ordinary XML documents
– An alternative to DTD
- Integrated with the namespace mechanism
– Different schemas can be imported from different namespaces and integrated into one schema
- Provide a number of built-in types similar to SQL, e.g., string, integer, and time
- Define complex types from simpler ones
- The same element name can be defined as different types depending on where the element is nested
- Support keys and referential integrity constraints
- Easy to specify documents where elements are unordered
Schema and Instance
- Goal: describe an XML schema using XML
- An XML document D that conforms to a given schema (which is another XML document) is said to be schema valid
– D is called an instance of the schema
XML Schema and Namespaces
- An XML schema document begins with a declaration of the namespaces to be used
- http://www.w3.org/2001/XMLSchema – the namespace identifying the names of tags and attributes used in a schema (not in the instances)
– Describes the structural properties of documents in general, e.g., schema, attribute, element, …
- http://www.w3.org/2001/XMLSchema-instance – another namespace used in conjunction with the above one
– Identifies a small number of special names that are defined in the XML Schema Specification and are used in the instance documents, e.g., schemaLocation
- The target namespace – identifies the set of names defined by a particular schema document to be used in the instances
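The three namespaces can be seen at work in a tiny schema and instance pair. This is a sketch under stated assumptions: the target namespace `http://example.org/adm` and the file name `student.xsd` are hypothetical, and Python's standard `xml.etree.ElementTree` is used only to show how the parser expands prefixed names. It does not validate the instance against the schema.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
TARGET = "http://example.org/adm"  # hypothetical target namespace

# Schema document: its tags (schema, element) come from the XSD namespace.
SCHEMA_DOC = f"""<schema xmlns="{XSD}" targetNamespace="{TARGET}">
  <element name="Student" type="string"/>
</schema>"""

# Instance document: its tag comes from the target namespace, and
# schemaLocation comes from the XMLSchema-instance namespace.
INSTANCE_DOC = f"""<adm:Student xmlns:adm="{TARGET}"
    xmlns:xsi="{XSI}"
    xsi:schemaLocation="{TARGET} student.xsd">John Doe</adm:Student>"""

schema = ET.fromstring(SCHEMA_DOC)
instance = ET.fromstring(INSTANCE_DOC)

# ElementTree writes a namespaced name as "{namespace}local-name".
location = instance.get(f"{{{XSI}}}schemaLocation")
print(schema.tag)
print(instance.tag)
print(location)
```

The `schemaLocation` value pairs a namespace with the place where a schema for it may be found.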
Schema and An Instance Document
Report Document
Primitive Types
- DTD has very limited primitive types
– CDATA, ID, IDREF, IDREFS
- Many useful primitive types in XML Schema
– decimal, integer, float, boolean, date, …
- Derive new primitive types from the basic ones
– The mechanism is similar to the CREATE DOMAIN statement in SQL
Deriving Simple Types
- IDREFS is not one of the primitive types; it can be derived as a list of IDREF
<simpleType name="myIdrefs">
  <list itemType="IDREF"/>
</simpleType>
- Union of multiple types
– Suppose local phone numbers are 7 digits long and long-distance numbers are 10 digits long
<simpleType name="phoneNumber">
  <union memberTypes="phone7digits phone10digits"/>
</simpleType>
Deriving Simple Types by Restriction
- Constrain a basic type using one or more constraints from a fixed repertoire defined by the XML Schema specification
<simpleType name="phone7digits">
  <restriction base="integer">
    <minInclusive value="1000000"/>
    <maxInclusive value="9999999"/>
  </restriction>
</simpleType>
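The value sets described by `phone7digits` and by the `phoneNumber` union can be mirrored as plain predicates. This sketch only illustrates what the facets accept and is not a schema processor. The source does not show a definition for `phone10digits`, so its range below is assumed by analogy, and the helper names are made up for illustration.

```python
import re

# Simplified lexical form of an integer: optional sign, then digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def in_integer_range(text, low, high):
    """Mirror minInclusive/maxInclusive on base type integer."""
    if _INTEGER.fullmatch(text) is None:
        return False
    return low <= int(text) <= high


def is_phone7digits(text):
    return in_integer_range(text, 1000000, 9999999)


def is_phone10digits(text):
    # Assumed by analogy; this type is not defined in the source.
    return in_integer_range(text, 1000000000, 9999999999)


def is_phone_number(text):
    """Mirror the union: a value is valid if any member type accepts it."""
    return is_phone7digits(text) or is_phone10digits(text)


print(is_phone_number("5431234"), is_phone_number("6045551234"))
print(is_phone_number("911"), is_phone_number("543-1234"))
```

Because the base type is integer, a value with a dash such as 543-1234 is rejected.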
More Examples
- Phone numbers in XXX-YYYY format
<simpleType name="phone7digitsAndDash">
  <restriction base="string">
    <pattern value="[0-9]{3}-[0-9]{4}"/>
  </restriction>
</simpleType>
- More restrictions on basic string type
– <length value="7"/> – strings of length 7
– <minLength value="7"/> – strings of length >= 7
– <maxLength value="14"/> – strings of length <= 14
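The pattern and length facets can be tried out directly. A minimal sketch, assuming Python's `re` module: an XML Schema pattern must match the whole value, which corresponds to `re.fullmatch`. The two regex dialects differ in general, but this particular pattern means the same in both. The helper names are made up for illustration.

```python
import re

PHONE_PATTERN = r"[0-9]{3}-[0-9]{4}"  # same pattern as phone7digitsAndDash


def matches_pattern(text, pattern):
    """A pattern facet must match the entire value, not a substring."""
    return re.fullmatch(pattern, text) is not None


def satisfies_length(text, length=None, min_length=None, max_length=None):
    """Mirror the length, minLength and maxLength facets on strings."""
    n = len(text)
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True


print(matches_pattern("543-1234", PHONE_PATTERN))
print(matches_pattern("5431234", PHONE_PATTERN))
print(satisfies_length("543-1234", min_length=7, max_length=14))
```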
Enumeration
- Restrict the domain to a finite set
- Can be applied to any base type
<simpleType name="emergencyNumbers">
  <restriction base="integer">
    <enumeration value="911"/>
    <enumeration value="333"/>
    <enumeration value="5431234"/>
  </restriction>
</simpleType>
More Examples on Simple Types
Complex Types
Basics of Complex Types
- Tag complexType
- Tag sequence: a list of elements that must occur in the given order
- Using minOccurs and maxOccurs
- Associating attributes with type
- A complex type can be associated with an element
<element name="Student" type="adm:studentType"/>
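How sequence, minOccurs and maxOccurs interact can be sketched with a small checker over child tags. This is an illustration under stated assumptions, not a schema processor. The definition of studentType is not shown in the source, so the child names `Name` and `CrsTaken` and their occurrence bounds are hypothetical. The greedy matching below is only adequate when neighbouring entries in the sequence have different tag names.

```python
import xml.etree.ElementTree as ET

UNBOUNDED = None  # stands in for maxOccurs="unbounded"

# Hypothetical sequence: exactly one Name, then any number of CrsTaken.
STUDENT_SEQUENCE = [
    ("Name", 1, 1),
    ("CrsTaken", 0, UNBOUNDED),
]


def matches_sequence(elem, particles):
    """Check child order and occurrence counts against a sequence."""
    tags = [child.tag for child in elem]
    pos = 0
    for tag, min_occurs, max_occurs in particles:
        count = 0
        while (pos < len(tags) and tags[pos] == tag
               and (max_occurs is UNBOUNDED or count < max_occurs)):
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    return pos == len(tags)  # no unexpected children left over


in_order = ET.fromstring(
    "<Student><Name>John Doe</Name><CrsTaken/><CrsTaken/></Student>")
out_of_order = ET.fromstring(
    "<Student><CrsTaken/><Name>John Doe</Name></Student>")
print(matches_sequence(in_order, STUDENT_SEQUENCE))
print(matches_sequence(out_of_order, STUDENT_SEQUENCE))
```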
Element without Content
- Just associate attributes with types
- Example
<complexType name="courseTakenType">
  <attribute name="CrsCode" type="adm:courseRef"/>
  <attribute name="Semester" type="string"/>
</complexType>
Compositors
- Tags describing how elements can be combined into groups, e.g., sequence
– Required when a tag has complex content
– Required even if the type has only one child element!
- Compositor all: allows elements to appear in any order
<complexType name="addressType">
  <all>
    <element name="StreetName" type="string"/>
    <element name="StreetNumber" type="string"/>
    <element name="city" type="string"/>
  </all>
</complexType>
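What the all compositor accepts for addressType can be sketched as a check on child tags: each declared element appears exactly once, in any order. This is only an illustration of the rule for the type shown above. The helper name `matches_all` is made up.

```python
import xml.etree.ElementTree as ET

ADDRESS_CHILDREN = {"StreetName", "StreetNumber", "city"}


def matches_all(elem, required):
    """Each required child appears exactly once; order is irrelevant."""
    tags = [child.tag for child in elem]
    return len(tags) == len(set(tags)) and set(tags) == set(required)


any_order = ET.fromstring(
    "<Address><city>Burnaby</city><StreetNumber>123</StreetNumber>"
    "<StreetName>Main St</StreetName></Address>")
missing = ET.fromstring(
    "<Address><city>Burnaby</city><StreetName>Main St</StreetName></Address>")
repeated = ET.fromstring(
    "<Address><city>A</city><city>B</city><StreetNumber>1</StreetNumber>"
    "<StreetName>Main St</StreetName></Address>")
print(matches_all(any_order, ADDRESS_CHILDREN))
print(matches_all(missing, ADDRESS_CHILDREN))
print(matches_all(repeated, ADDRESS_CHILDREN))
```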
Restrictions on Compositor All
- all must appear directly below complexType
– Not allowed: all nested inside sequence
<complexType name="studentType2">
  <sequence>
    <all>
      <element name="First" type="string"/>
      <element name="Last" type="string"/>
    </all>
    <element name="Address" type="string"/>
  </sequence>
</complexType>
- No element within all can be repeated
– Not allowed: an element inside all with maxOccurs greater than 1
<complexType name="studentType3">
  <all>
    <element name="First" type="string"/>
    <element name="Last" type="string"/>
    <element name="Address" type="string" minOccurs="1" maxOccurs="unbounded"/>
  </all>
</complexType>
Compositor Choice
<complexType name="addressType">
  <sequence>
    <choice>
      <element name="POBox" type="string"/>
      <sequence>
        <element name="Name" type="string"/>
        <element name="Number" type="string"/>
      </sequence>
    </choice>
    <element name="City" type="string"/>
  </sequence>
</complexType>
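This addressType accepts exactly two child sequences: a POBox followed by a City, or a Name and a Number followed by a City. A minimal sketch that enumerates both alternatives, which is feasible only because the choice is small and nothing repeats. The helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The two child sequences the choice-inside-sequence above allows.
ALLOWED_ADDRESS_SHAPES = (
    ["POBox", "City"],
    ["Name", "Number", "City"],
)


def matches_address_type(elem):
    """True if the children follow one of the two allowed shapes."""
    tags = [child.tag for child in elem]
    return tags in ALLOWED_ADDRESS_SHAPES


po_box = ET.fromstring("<Address><POBox>42</POBox><City>Burnaby</City></Address>")
street = ET.fromstring(
    "<Address><Name>Main St</Name><Number>123</Number><City>Burnaby</City></Address>")
both = ET.fromstring(
    "<Address><POBox>42</POBox><Name>Main St</Name><Number>123</Number>"
    "<City>Burnaby</City></Address>")
print(matches_address_type(po_box), matches_address_type(street),
      matches_address_type(both))
```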
Local Element Names
- Two complex types can have elements that share the same name
– Names of students and names of courses
– Impossible in DTD, where all element declarations are global
Anonymous Types
- Useful for types that might not be reused
<element name="Report">
  <complexType>
    <sequence>
      <element name="Students" type=…/>
      <element name="Classes" type=…/>
      <element name="Course" type=…/>
    </sequence>
  </complexType>
</element>
Keys
Foreign Key Constraints
Summary
- DTD: a set of rules for structuring an XML document
- XML Schema: a more sophisticated tool to specify structures of XML documents
– XML Schema is written in XML
- Assignment 3