SLIDE 1

XML-3 J. Teuhola 2013 37

3. Defining the document structure (DTD)

  • Declaration of application-specific names and structural constraints
  • A document is valid if it specifies a DTD, and if its contents conform to the DTD.
  • A validating parser does the checking; but: validation is not mandatory
  • Items not specified in the DTD are forbidden
  • A DTD does not specify: the root, the precise number of element instances, data formats (everything is a string; some restrictions on names), semantics (meaning)
  • Alternative to DTD: XML Schema (see later)

SLIDE 2

Example DTD: Course document

<!ELEMENT course (cname, teacher, semester, audience)>
<!ELEMENT cname (#PCDATA)>
<!ELEMENT teacher (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT audience (student*)>
<!ELEMENT student (#PCDATA)>

#PCDATA (parsed character data) may contain entity references like &amp; but not tags.
Note! The DTD syntax does not conform to the general XML syntax.

SLIDE 3

Example: test documents

Valid:

<course>
  <cname>XML</cname>
  <teacher>JT</teacher>
  <semester> Spring 2013 </semester>
  <audience>
    <student>NN</student>
  </audience>
</course>

Invalid:

<course>
  <cname>XML</cname>
  <teacher>JT</teacher>
  <student>NN</student>
  <extent>5 sp</extent>
</course>

Errors: 'semester' and 'audience' are missing; 'student' may appear only inside 'audience'; 'extent' is not defined in the DTD

SLIDE 4

Declaring the DTD

  • Position: in the document prolog (after the XML declaration, before the root)
  • Alternatives:

– External DTD file URI:
  <!DOCTYPE coursetype SYSTEM "http://...">
– External public DTD; unique and known application:
  <!DOCTYPE coursetype PUBLIC "ref" "backup">
  where backup is used if ref is not found.
– Internal; useful in the development phase:
  <!DOCTYPE coursetype [<!ELEMENT ... >]>
– Both (compatible internal and external subsets):
  <!DOCTYPE coursetype SYSTEM "http://..." [ ... ]>
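
As a minimal sketch of the internal alternative, the course DTD from the earlier example can be embedded directly in the prolog. The DOCTYPE name is assumed here to be course, because a validating parser requires the name after DOCTYPE to match the root element; the element values are taken from the earlier test document.

```xml
<?xml version="1.0" standalone="yes"?>
<!-- Internal DTD subset: the declarations sit between [ and ] in the prolog -->
<!DOCTYPE course [
  <!ELEMENT course   (cname, teacher, semester, audience)>
  <!ELEMENT cname    (#PCDATA)>
  <!ELEMENT teacher  (#PCDATA)>
  <!ELEMENT semester (#PCDATA)>
  <!ELEMENT audience (student*)>
  <!ELEMENT student  (#PCDATA)>
]>
<course>
  <cname>XML</cname>
  <teacher>JT</teacher>
  <semester>Spring 2013</semester>
  <audience>
    <student>NN</student>
  </audience>
</course>
```

Moving the six declarations to a separate file and replacing the bracketed part with SYSTEM and a URI would give the external alternative; standalone would then have to be "no".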

SLIDE 5

Declaring elements

  • <!ELEMENT name (content)>
    where the content can be:

– #PCDATA (parsed character data)
– a single child element
– a sequence (comma-separated ordered list)
– alternatives ('|'-separated list)

  • Repetition indicators (suffix symbol), applicable to elements and parenthesized expressions:

– ? = zero or one
– * = zero or more
– + = one or more

SLIDE 6

Declaring elements (cont.)

  • Examples:

<!ELEMENT audience (student*)>
<!ELEMENT day (sunday|monday|...)>
<!ELEMENT semester (year,(spring|fall))>
<!ELEMENT audience (#PCDATA|student)*>

  • Special cases:

– Empty element: <!ELEMENT name EMPTY>
  allows the elements <name /> and <name></name>
– Arbitrary contents: <!ELEMENT name ANY>
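
The content models above can be combined in one small DTD. The following is an illustrative sketch only: the elements note and br, the child elements year, spring and fall, and the sample text are assumptions made for the example, not part of the course material.

```xml
<?xml version="1.0"?>
<!DOCTYPE course [
  <!-- sequence with an optional last child -->
  <!ELEMENT course   (cname, semester, note?)>
  <!ELEMENT cname    (#PCDATA)>
  <!-- sequence containing a nested choice -->
  <!ELEMENT semester (year, (spring | fall))>
  <!ELEMENT year     (#PCDATA)>
  <!-- empty elements carry no content -->
  <!ELEMENT spring   EMPTY>
  <!ELEMENT fall     EMPTY>
  <!-- mixed content: #PCDATA comes first and the group ends in * -->
  <!ELEMENT note     (#PCDATA | br)*>
  <!ELEMENT br       EMPTY>
]>
<course>
  <cname>XML</cname>
  <semester><year>2013</year><spring/></semester>
  <note>First line<br/>second line</note>
</course>
```

Replacing <spring/> with both <spring/> and <fall/>, or moving <year> after the choice, would make the document invalid while leaving it well-formed.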

SLIDE 7

Declaring attributes

  • All possible attributes must be declared for each element type.
  • Syntax:

<!ATTLIST element
    attname1 type1 default1
    attname2 type2 default2
    ... >

  • Example:

<!ATTLIST course
    name CDATA #REQUIRED
    dept CDATA "CS-IT">

  • Attributes of one element may also be declared one by one in separate ATTLIST statements.
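
A sketch of the one-by-one form, using the same two attributes as the example above. The element declaration with #PCDATA content and the element text are assumptions made to keep the document short.

```xml
<?xml version="1.0"?>
<!DOCTYPE course [
  <!ELEMENT course (#PCDATA)>
  <!-- one ATTLIST statement per attribute -->
  <!ATTLIST course name CDATA #REQUIRED>
  <!ATTLIST course dept CDATA "CS-IT">
]>
<!-- dept is omitted here, so a parser that reads the DTD reports dept="CS-IT" -->
<course name="XML">Defining the document structure</course>
```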

SLIDE 8

Attribute types

CDATA        Character string where < and & must be escaped by &lt; and &amp; (possibly also &quot; and &apos;). Numeric data is also CDATA.
NMTOKEN      Name token; like an XML name but may also start with a digit or punctuation (e.g. '-' or '.')
NMTOKENS     Whitespace-separated list of name tokens
Enumeration  '|'-separated list of alternative values in parentheses; each value must be a name token (made of XML name characters)
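
The four types can be seen side by side in a short sketch. The attribute names title, code, tags and term and all values are hypothetical, chosen only to show which values each type accepts.

```xml
<?xml version="1.0"?>
<!DOCTYPE course [
  <!ELEMENT course EMPTY>
  <!ATTLIST course
    title CDATA            #REQUIRED
    code  NMTOKEN          #REQUIRED
    tags  NMTOKENS         #IMPLIED
    term  (spring | fall)  "spring">
]>
<!-- code may start with a digit because it is a name token, not a name;
     the & inside the CDATA value has to be escaped -->
<course title="XML &amp; DTDs" code="2013-XML" tags="markup dtd 101" term="fall"/>
```

A value such as code="2013 XML" would be rejected by a validating parser, since a single name token cannot contain whitespace.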

SLIDE 9

Attribute types (cont.)

  • ID      XML name which is unique among the ID attributes in the document. Only one ID attribute per element type is allowed. An ID value must be a valid XML name (a plain number is not!).
  • IDREF   XML name referring to an ID attribute. This enables relationships between elements (cf. foreign keys of relations; but: a validating parser only checks that the ID exists somewhere in the document, not which element type carries it). Needed for M:M relationships.
  • IDREFS  Whitespace-separated list of ID references.
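
A many-to-many relationship between students and courses could be sketched as follows. The element names university, student and course and the attribute names sid, cid and takes are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE university [
  <!ELEMENT university (student | course)*>
  <!ELEMENT student EMPTY>
  <!ELEMENT course  EMPTY>
  <!ATTLIST student sid   ID     #REQUIRED
                    takes IDREFS #IMPLIED>
  <!ATTLIST course  cid   ID     #REQUIRED>
]>
<university>
  <course  cid="c1"/>
  <course  cid="c2"/>
  <!-- each student lists the courses taken; one course can be listed by many students -->
  <student sid="s1" takes="c1 c2"/>
  <student sid="s2" takes="c2"/>
</university>
```

Note that takes="s1" would also be accepted, because the DTD cannot say that the referenced ID has to belong to a course element.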

SLIDE 10

Attribute types (cont.)

  • ENTITY    Name of an (unparsed) entity, defined elsewhere in the DTD.
  • ENTITIES  Whitespace-separated list of entity names
  • NOTATION  '|'-separated list (in parentheses) of alternative notation names, each declared by a NOTATION declaration in the DTD

A NOTATION is more flexible than an enumeration because the identifier attached to each notation name (e.g. a MIME type) is not restricted by the XML naming rules. Declaring a notation, e.g.

<!NOTATION gif SYSTEM "image/gif">
<!NOTATION tiff SYSTEM "image/tiff">
…
<!ATTLIST image type NOTATION (gif | tiff) #REQUIRED>

SLIDE 11

Attribute defaults

Alternatives:

#REQUIRED  Compulsory, no default value
#IMPLIED   Attribute value may be omitted; no default
#FIXED     Always the same value; may be omitted
Literal    Quoted default value
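
The four alternatives can be compared in a single declaration. This is only a sketch: the attributes room and univ and the value "UTU" are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE course [
  <!ELEMENT course EMPTY>
  <!ATTLIST course
    name CDATA #REQUIRED
    room CDATA #IMPLIED
    univ CDATA #FIXED "UTU"
    dept CDATA "CS-IT">
]>
<!-- only the required attribute is written out: univ and dept get their
     declared values from the DTD, room stays absent -->
<course name="XML"/>
```

Writing univ="XYZ" in the instance would be a validity error, while dept="Math" would simply override the default.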

SLIDE 12

Declaring entities

  • An entity is a name with a related replacement text
  • Predefined: &lt; &amp; &gt; &quot; &apos;
  • Example: <!ENTITY domain "it.utu.fi">
  • Reference: &domain;
  • Replacement may contain well-formed markup:

<!ENTITY address
  "<addr>
     <street>Joukahaisenkatu 3-5</street>
     <zip>20014</zip>
     <city>Turku</city>
   </addr>">

  • Replacement may contain entity references (but not loops).
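
The points above could be put together as in the following sketch. The root element contact, the entities mail and home, and the address "teacher@..." are hypothetical; only the domain entity comes from the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (email, addr)>
  <!ELEMENT email   (#PCDATA)>
  <!ELEMENT addr    (city)>
  <!ELEMENT city    (#PCDATA)>
  <!ENTITY domain "it.utu.fi">
  <!-- an entity may refer to another entity, as long as there is no cycle -->
  <!ENTITY mail   "teacher@&domain;">
  <!-- replacement text containing well-formed markup -->
  <!ENTITY home   "<addr><city>Turku</city></addr>">
]>
<contact>
  <email>&mail;</email>
  &home;
</contact>
```

After entity replacement the email element contains teacher@it.utu.fi, and the reference to home has turned into a complete addr element.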

SLIDE 13

External entities

  • Parsed external entity:

– Replacement in a file, e.g.
  <!ENTITY addr SYSTEM "/folder/addr.xml">
– Not allowed in attribute values
– After replacement the result must be well-formed
– An external entity must not have a prolog (e.g. DTD)

  • Unparsed external entity:

– Any data, e.g. digital image:
  <!ENTITY people SYSTEM "pic.jpg" NDATA jpeg>
– NDATA refers to an (application-specific) notation:
  <!NOTATION jpeg SYSTEM "image/jpeg">
– Usage as attribute value:
  <!ATTLIST course photo ENTITY #REQUIRED>
– Instance: <course photo="people">
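
Assembled into one document, the unparsed-entity fragments above might look like this. The element declaration with #PCDATA content and the element text are assumptions added to make the sketch complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE course [
  <!ELEMENT course (#PCDATA)>
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY people SYSTEM "pic.jpg" NDATA jpeg>
  <!ATTLIST course photo ENTITY #REQUIRED>
]>
<!-- the attribute value is the bare entity name, without & and ; -->
<course photo="people">XML</course>
```

The parser does not read pic.jpg itself; it only passes the entity's identifier and notation on to the application.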

SLIDE 14

Parameter entities

  • Used to name a repeating segment in the DTD
  • Syntax:

<!ENTITY % name "replacement">

  • Reference (to be replaced): %name;
  • Example:

<!ENTITY % employee "name, dept, bdate">
<!ELEMENT professor (%employee;)>
<!ELEMENT lecturer (%employee;)>
<!ELEMENT assistant (%employee;)>

  • Usually appears in external DTDs, but can be redefined in an internal DTD (if both exist); the replacement can itself be external:

<!ENTITY % name SYSTEM "http://...">
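
One reason parameter entities usually live in external DTDs is that, in the internal subset, a reference may only stand between declarations, not inside one. The sketch below therefore stores complete declarations in the entity. The entity name fields, the root element staff and the sample values are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!-- the replacement text holds two complete declarations -->
  <!ENTITY % fields "<!ELEMENT name (#PCDATA)> <!ELEMENT dept (#PCDATA)>">
  <!-- in the internal subset a reference may only appear between declarations -->
  %fields;
  <!ELEMENT staff     (professor | lecturer)*>
  <!ELEMENT professor (name, dept)>
  <!ELEMENT lecturer  (name, dept)>
]>
<staff>
  <professor><name>NN</name><dept>CS-IT</dept></professor>
</staff>
```

A declaration like (%employee;) in the slide example, where the reference sits inside a content model, is accepted only when the DTD is in an external file.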

SLIDE 15

Example DTD (in file 'letters.dtd')

<!ELEMENT letters (letter+)>
<!ELEMENT letter (topic*, text)>
<!ATTLIST letter
    num    ID         #REQUIRED
    from   CDATA      #FIXED "John Smith, IBM"
    to     CDATA      #REQUIRED
    date   CDATA      #REQUIRED
    secret (yes | no) "no">
<!ELEMENT topic EMPTY>
<!ATTLIST topic title CDATA #IMPLIED>
<!ELEMENT text ANY>
<!ENTITY signature "Cheers, John">

SLIDE 16

Example: valid document

<?xml version="1.0" standalone="no"?>
<!DOCTYPE letters SYSTEM "letters.dtd">
<letters>
  <letter num="A123" to="Bill" date="20.09" secret="yes">
    <topic title="Howdy" />
    <topic title="What's cooking?" />
    <text>Thanks for the party. &signature;</text>
  </letter>
  <letter num="A124" to="Jim" date="21.09">
    <topic title="Hi" />
    <text>See you again. &signature;</text>
  </letter>
</letters>

SLIDE 17

Problems with DTD

  • Does not itself use XML syntax; needs a different parser/editor/processor
  • No constraints on character data (e.g. no format, no regular expressions)
  • No strict data types (e.g. integer, float, boolean)
  • Restricting the number of repetitions is difficult
  • Namespaces are not interpreted; prefixes are just part of the names.
  • Definitions cannot depend on the context (DTD allows "too much")
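
The difficulty with repetition counts can be seen in a small sketch. Assuming a hypothetical rule that an audience must have between two and four students, the DTD has no way to state the numbers and the count has to be spelled out element by element.

```xml
<?xml version="1.0"?>
<!DOCTYPE audience [
  <!-- 2 to 4 students: two compulsory occurrences followed by two optional ones -->
  <!ELEMENT audience (student, student, student?, student?)>
  <!ELEMENT student  (#PCDATA)>
]>
<audience>
  <student>A</student>
  <student>B</student>
  <student>C</student>
</audience>
```

For larger bounds this quickly becomes impractical, and the student names still cannot be constrained beyond being character data.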

SLIDE 18

Problems with DTD (cont.)

  • Uniqueness scope of IDs cannot be restricted.
  • Referential integrity of IDREFS is weak: the element type of the target cannot be specified.
  • Limited modularity (using ENTITY definitions); another way to build from pieces: XInclude.
  • No defaults for elements (only for attributes)
  • No wildcards for elements/attributes (only ANY content possible for elements).

Some of these problems were solved in the XML Schema language (see later).