An Introduction to XML and Web Technologies

Schema Languages

Anders Møller & Michael I. Schwartzbach
© 2006 Addison-Wesley

Objectives

  • The purpose of using schemas
  • The schema languages DTD and XML Schema (and DSD2 and RELAX NG)
  • Regular expressions – a commonly used formalism in schema languages

Motivation

  • We have designed our Recipe Markup Language
  • ...but so far only informally described its syntax
  • How can we make tools that check that an XML document is a syntactically correct Recipe Markup Language document (and thus meaningful)?
  • Implementing a specialized validation tool for Recipe Markup Language is not the solution...

XML Languages

XML language:

a set of XML documents with some semantics

schema:

a formal definition of the syntax of an XML language

schema language:

a notation for writing schemas

Validation

[Figure: a schema processor takes an instance document and a schema as input and reports either "valid", producing a normalized instance document, or "invalid", producing an error message.]

Why use Schemas?

  • Formal but human-readable descriptions
  • Data validation can be performed with existing schema processors

General Requirements

  • Expressiveness
  • Efficiency
  • Comprehensibility

Regular Expressions

Commonly used in schema languages to describe sequences of characters or elements

Σ: an alphabet (typically Unicode characters or element names)

  • σ∈Σ matches the string σ
  • α? matches zero or one α
  • α* matches zero or more α’s
  • α+ matches one or more α’s
  • α β matches any concatenation of an α and a β
  • α | β matches the union of α and β

Examples

A regular expression describing integers:

0|-?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*

A regular expression describing the valid contents of table elements in XHTML:

caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )
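The two expressions above can be tried out with an ordinary regex engine. The following is a minimal sketch using Python's `re` module; the function names are hypothetical, and encoding a child sequence as space-terminated element names is an assumption made only for this illustration, not something schema processors are required to do.

```python
import re

# The integer expression from the slide, using character classes as shorthand
# for the spelled-out alternatives (1|2|...|9) and (0|1|...|9).
INTEGER = re.compile(r"0|-?[1-9][0-9]*")


def is_integer(text) -> bool:
    """True if the whole string matches the integer expression."""
    return INTEGER.fullmatch(text) is not None


# The XHTML table content model. Each element name is followed by a space so
# that "col" cannot be confused with a prefix of "colgroup".
TABLE_CONTENT = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)


def table_children_ok(children) -> bool:
    """True if the sequence of child element names matches the content model."""
    encoded = "".join(name + " " for name in children)
    return TABLE_CONTENT.fullmatch(encoded) is not None
```

For instance, `table_children_ok(["caption", "col", "col", "tbody"])` accepts the sequence, while `["thead"]` alone is rejected because at least one `tbody` or `tr` is required.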

DTD – Document Type Definition

  • Defined as a subset of the DTD formalism from SGML
  • Specified as an integral part of XML 1.0
  • A starting point for development of more expressive schema languages
  • Considers elements, attributes, and character data – processing instructions and comments are mostly ignored

Document Type Declarations

Associates a DTD schema with the instance document

  • <?xml version="1.1"?>
    <!DOCTYPE collection SYSTEM "http://www.brics.dk/ixwt/recipes.dtd">
    <collection>
      ...
    </collection>

  • <!DOCTYPE html
      PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

  • <!DOCTYPE collection [ ... ]>

Element Declarations

<!ELEMENT element-name content-model>

Content models:

  • EMPTY
  • ANY
  • mixed content: (#PCDATA|e1|e2|...|en)*
  • element content: regular expression over element names (concatenation is written with “,”)

Example:

<!ELEMENT table (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))>

Attribute-List Declarations

<!ATTLIST element-name attribute-definitions>

Each attribute definition consists of

  • an attribute name
  • an attribute type
  • a default declaration

Example:

<!ATTLIST input maxlength CDATA #IMPLIED
                tabindex CDATA #IMPLIED>

Attribute Types

  • CDATA: any value
  • enumeration: (s1|s2|...|sn)
  • ID: must have unique value
  • IDREF (/ IDREFS): must match some ID attribute(s)
  • ...

Examples:

<!ATTLIST p align (left|center|right|justify) #IMPLIED>
<!ATTLIST recipe id ID #IMPLIED>
<!ATTLIST related ref IDREF #IMPLIED>

Attribute Default Declarations

  • #REQUIRED
  • #IMPLIED (= optional)
  • "value" (= optional, but default provided)
  • #FIXED "value" (= if present, must have this value; the value is supplied when the attribute is absent)

Examples:

<!ATTLIST form
          action CDATA #REQUIRED
          onsubmit CDATA #IMPLIED
          method (get|post) "get"
          enctype CDATA "application/x-www-form-urlencoded">
<!ATTLIST html
          xmlns CDATA #FIXED "http://www.w3.org/1999/xhtml">
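How a processor might apply the declarations for form can be sketched as follows. This is only an illustration: the dictionary encoding of the declarations and the function name `apply_defaults` are assumptions introduced here, and the sketch covers only #REQUIRED, #IMPLIED, and plain default values.

```python
# Hypothetical encoding of the default declarations for <form> shown above:
# attribute name -> (kind, default value or None).
FORM_ATTRIBUTES = {
    "action": ("#REQUIRED", None),
    "onsubmit": ("#IMPLIED", None),
    "method": ("default", "get"),
    "enctype": ("default", "application/x-www-form-urlencoded"),
}


def apply_defaults(attributes, declarations):
    """Return a normalized copy of `attributes` with declared defaults filled in.

    Raises ValueError if a #REQUIRED attribute is missing.
    """
    normalized = dict(attributes)
    for name, (kind, default) in declarations.items():
        if kind == "#REQUIRED" and name not in attributes:
            raise ValueError("missing required attribute: " + name)
        if kind == "default":
            # Only inserted when the instance document omits the attribute.
            normalized.setdefault(name, default)
    return normalized
```

Calling `apply_defaults({"action": "/search"}, FORM_ATTRIBUTES)` adds `method` and `enctype` with their declared defaults and leaves `onsubmit` absent.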

Entity Declarations (1/3)

Internal entity declarations – a simple macro mechanism

Example:

  • Schema:

    <!ENTITY copyrightnotice "Copyright &#169; 2005 Widgets'R'Us.">

  • Input:

    A gadget has a medium size head and a big gizmo subwidget. &copyrightnotice;

  • Output:

    A gadget has a medium size head and a big gizmo subwidget. Copyright &#169; 2005 Widgets'R'Us.
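The macro behaviour can be observed with a non-validating parser as well. The sketch below assumes Python's standard `xml.etree.ElementTree`, which expands internal entities declared in the internal DTD subset; the wrapper element `gadget` and the function name are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The entity declaration from the slide, placed in an internal DTD subset.
# The root element name "gadget" is an assumption for this example.
DOCUMENT = """<!DOCTYPE gadget [
  <!ENTITY copyrightnotice "Copyright &#169; 2005 Widgets'R'Us.">
]>
<gadget>A gadget has a medium size head and a big gizmo subwidget. &copyrightnotice;</gadget>"""


def expanded_text(xml_source):
    """Parse the document and return the root element's text after expansion."""
    return ET.fromstring(xml_source).text
```

The returned text no longer contains the entity reference; the character reference `&#169;` appears as the copyright sign.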

Entity Declarations (2/3)

Internal parameter entity declarations – apply to the DTD, not the instance document

Example:

  • Schema:

    <!ENTITY % Shape "(rect|circle|poly|default)">
    <!ATTLIST area shape %Shape; "rect">

corresponds to

    <!ATTLIST area shape (rect|circle|poly|default) "rect">

Entity Declarations (3/3)

External parsed entity declarations – references to XML data in other files

Example:

  • <!ENTITY widgets SYSTEM "http://www.brics.dk/ixwt/widgets.xml">

External unparsed entity declarations – references to non-XML data

Example:

  • <!ENTITY widget-image SYSTEM "http://www.brics.dk/ixwt/widget.gif" NDATA gif>
  • <!NOTATION gif SYSTEM "http://www.iana.org/assignments/media-types/image/gif">
  • <!ATTLIST thing img ENTITY #REQUIRED>

not widely used!

Conditional Sections

Allow parts of schemas to be enabled/disabled by a switch

Example:

  • <![%person.simple; [
      <!ELEMENT person (firstname,lastname)>
    ]]>
    <![%person.full; [
      <!ELEMENT person (firstname,lastname,email+,phone?)>
      <!ELEMENT email (#PCDATA)>
      <!ELEMENT phone (#PCDATA)>
    ]]>
    <!ELEMENT firstname (#PCDATA)>
    <!ELEMENT lastname (#PCDATA)>

  • <!ENTITY % person.simple "INCLUDE">
    <!ENTITY % person.full "IGNORE">

Checking Validity with DTD

A DTD processor (also called a validating XML parser)

  • parses the input document (includes checking well-formedness)
  • checks the root element name
  • for each element, checks its contents and attributes
  • checks uniqueness and referential constraints (ID/IDREF(S) attributes)
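The last step, checking ID and IDREF attributes, can be sketched in a few lines. This is not how any particular DTD processor is implemented; it assumes the caller already knows which (element, attribute) pairs were declared as ID or IDREF, and the names `check_ids`, `ID_ATTRS`, and `IDREF_ATTRS` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed to be extracted from ATTLIST declarations such as
# <!ATTLIST recipe id ID #IMPLIED> and <!ATTLIST related ref IDREF #IMPLIED>.
ID_ATTRS = {("recipe", "id")}
IDREF_ATTRS = {("related", "ref")}


def check_ids(root, id_attrs, idref_attrs):
    """Return a list of error messages; an empty list means the constraints hold."""
    errors = []
    seen = set()
    # First pass: every ID value must be unique in the whole document.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in id_attrs:
                if value in seen:
                    errors.append("duplicate ID: " + value)
                seen.add(value)
    # Second pass: every IDREF value must match some ID value.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in idref_attrs and value not in seen:
                errors.append("IDREF matches no ID: " + value)
    return errors
```

Two passes are used because an IDREF may point forward to an ID that appears later in the document.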

RecipeML with DTD (1/2)

<!ELEMENT collection (description,recipe*)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT recipe (title,date,ingredient*,preparation,comment?,
                  nutrition,related*)>
<!ATTLIST recipe id ID #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT ingredient (ingredient*,preparation)?>
<!ATTLIST ingredient name CDATA #REQUIRED
                     amount CDATA #IMPLIED
                     unit CDATA #IMPLIED>

RecipeML with DTD (2/2)

<!ELEMENT preparation (step*)>
<!ELEMENT step (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition calories CDATA #REQUIRED
                    carbohydrates CDATA #REQUIRED
                    fat CDATA #REQUIRED
                    protein CDATA #REQUIRED
                    alcohol CDATA #IMPLIED>
<!ELEMENT related EMPTY>
<!ATTLIST related ref IDREF #REQUIRED>

Problems with the DTD description

  • calories should contain a non-negative number
  • protein should contain a value of the form N% where N is between 0 and 100
  • comment should be allowed to appear anywhere in the contents of recipe
  • unit should only be allowed in elements where amount is also present
  • nested ingredient elements should only be allowed when amount is absent

Our DTD schema permits too much in some cases and too little in others!

Limitations of DTD

1. Cannot constrain character data
2. Specification of attribute values is too limited
3. Element and attribute declarations are context insensitive
4. Character data cannot be combined with the regular expression content model
5. The content models lack an “interleaving” operator
6. The support for modularity, reuse, and evolution is too primitive
7. The normalization features lack content defaults and proper whitespace control
8. Structured embedded self-documentation is not possible
9. The ID/IDREF mechanism is too simple
10. It does not itself use an XML syntax
11. No support for namespaces

Requirements for XML Schema

  • W3C’s proposal for replacing DTD

Design principles:

  • More expressive than DTD
  • Use XML notation
  • Self-describing
  • Simplicity

Technical requirements:

  • Namespace support
  • User-defined datatypes
  • Inheritance (OO-like)
  • Evolution
  • Embedded documentation
  • ...

Types and Declarations

Simple type definition:

defines a family of Unicode text strings

Complex type definition:

defines a content and attribute model

Element declaration:

associates an element name with a simple or complex type

Attribute declaration:

associates an attribute name with a simple type

Example (1/3)

Instance document:

<b:card xmlns:b="http://businesscard.org">
  <b:name>John Doe</b:name>
  <b:title>CEO, Widget Inc.</b:title>
  <b:email>john.doe@widget.com</b:email>
  <b:phone>(202) 555-1414</b:phone>
  <b:logo b:uri="widget.gif"/>
</b:card>

Example (2/3)

Schema:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">
  <element name="card" type="b:card_type"/>
  <element name="name" type="string"/>
  <element name="title" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>
  <element name="logo" type="b:logo_type"/>
  <attribute name="uri" type="anyURI"/>

Example (3/3)

  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      <element ref="b:title"/>
      <element ref="b:email"/>
      <element ref="b:phone" minOccurs="0"/>
      <element ref="b:logo" minOccurs="0"/>
    </sequence>
  </complexType>
  <complexType name="logo_type">
    <attribute ref="b:uri" use="required"/>
  </complexType>
</schema>

Connecting Schemas and Instances

<b:card xmlns:b="http://businesscard.org"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://businesscard.org
                            business_card.xsd">
  <b:name>John Doe</b:name>
  <b:title>CEO, Widget Inc.</b:title>
  <b:email>john.doe@widget.com</b:email>
  <b:phone>(202) 555-1414</b:phone>
  <b:logo b:uri="widget.gif"/>
</b:card>

Element and Attribute Declarations

Examples:

  • <element name="serialnumber" type="nonNegativeInteger"/>
  • <attribute name="alcohol" type="r:percentage"/>

Simple Types (Datatypes) – Primitive

string          any Unicode string
boolean         true, false, 1, 0
decimal         3.1415
float           6.02214199E23
double          42E970
dateTime        2004-09-26T16:29:00-05:00
time            16:29:00-05:00
date            2004-09-26
hexBinary       48656c6c6f0a
base64Binary    SGVsbG8K
anyURI          http://www.brics.dk/ixwt/
QName           rcp:recipe, recipe
...

Derivation of Simple Types – Restriction

Constraining facets:

  • length
  • minLength
  • maxLength
  • pattern
  • enumeration
  • whiteSpace
  • maxInclusive
  • maxExclusive
  • minInclusive
  • minExclusive
  • totalDigits
  • fractionDigits

Examples

<simpleType name="score_from_0_to_100">
  <restriction base="integer">
    <minInclusive value="0"/>
    <maxInclusive value="100"/>
  </restriction>
</simpleType>

<simpleType name="percentage">
  <restriction base="string">
    <pattern value="([0-9]|[1-9][0-9]|100)%"/>
  </restriction>
</simpleType>

The value of the pattern facet is a regular expression.
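What these two restrictions accept can be approximated in a few lines. This is a rough sketch only: the function names are hypothetical, whitespace normalization is ignored, and the integer lexical form is reduced to an optional sign followed by digits. XML Schema patterns are implicitly anchored, which corresponds to `fullmatch` here.

```python
import re

# The pattern facet of the percentage type, matched against the whole value.
PERCENTAGE = re.compile(r"([0-9]|[1-9][0-9]|100)%")

# Simplified lexical form of the built-in integer type (assumption: no
# surrounding whitespace).
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_percentage(value) -> bool:
    return PERCENTAGE.fullmatch(value) is not None


def is_score_from_0_to_100(value) -> bool:
    """integer restricted by minInclusive=0 and maxInclusive=100."""
    if INTEGER.fullmatch(value) is None:
        return False
    return 0 <= int(value) <= 100
```

Note the difference between the two styles: the score type constrains the numeric value, while the percentage type constrains the string itself, so `05%` is rejected by the pattern even though it denotes a number in range.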

Simple Type Derivation – List

<simpleType name="integerList">
  <list itemType="integer"/>
</simpleType>

matches whitespace separated lists of integers

Simple Type Derivation – Union

<simpleType name="boolean_or_decimal">
  <union>
    <simpleType>
      <restriction base="boolean"/>
    </simpleType>
    <simpleType>
      <restriction base="decimal"/>
    </simpleType>
  </union>
</simpleType>
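List and union derivation can be mimicked as follows. The sketch assumes simplified lexical forms for integer and decimal and uses hypothetical function names; it illustrates the idea (split on whitespace for lists, try each member type for unions) rather than a conforming datatype implementation.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")
# Simplified decimal: optional sign, digits with an optional fraction part.
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def parse_integer_list(value):
    """List derivation: whitespace-separated items, each a valid integer."""
    items = value.split()
    for item in items:
        if INTEGER.fullmatch(item) is None:
            raise ValueError("not an integer: " + item)
    return [int(item) for item in items]


def is_boolean(value) -> bool:
    return value in {"true", "false", "1", "0"}


def is_decimal(value) -> bool:
    return DECIMAL.fullmatch(value) is not None


def is_boolean_or_decimal(value) -> bool:
    """Union derivation: the value is valid if any member type accepts it."""
    return is_boolean(value) or is_decimal(value)
```

For example, `parse_integer_list("1 -2  30")` yields `[1, -2, 30]`, and `is_boolean_or_decimal` accepts both `true` and `3.14` but not `yes`.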

Built-In Derived Simple Types

  • normalizedString
  • token
  • language
  • Name
  • NCName
  • ID
  • IDREF
  • integer
  • nonNegativeInteger
  • unsignedLong
  • long
  • int
  • short
  • byte
  • ...

Complex Types with Complex Contents

Content models as regular expressions:

  • Element reference: <element ref="name"/>
  • Concatenation: <sequence> ... </sequence>
  • Union: <choice> ... </choice>
  • All: <all> ... </all>
  • Element wildcard: <any namespace="..." processContents="..."/>

Attribute reference: <attribute ref="..."/>

Attribute wildcard: <anyAttribute namespace="..." processContents="..."/>

Cardinalities: minOccurs, maxOccurs, use

Mixed content: mixed="true"

Example

<element name="order" type="n:order_type"/>

<complexType name="order_type" mixed="true">
  <choice>
    <element ref="n:address"/>
    <sequence>
      <element ref="n:email" minOccurs="0" maxOccurs="unbounded"/>
      <element ref="n:phone"/>
    </sequence>
  </choice>
  <attribute ref="n:id" use="required"/>
</complexType>

Complex Types with Simple Content

<complexType name="category">
  <simpleContent>
    <extension base="integer">
      <attribute ref="r:class"/>
    </extension>
  </simpleContent>
</complexType>

<complexType name="extended_category">
  <simpleContent>
    <extension base="n:category">
      <attribute ref="r:kind"/>
    </extension>
  </simpleContent>
</complexType>

<complexType name="restricted_category">
  <simpleContent>
    <restriction base="n:category">
      <totalDigits value="3"/>
      <attribute ref="r:class" use="required"/>
    </restriction>
  </simpleContent>
</complexType>

Derivation with Complex Content

<complexType name="basic_card_type">
  <sequence>
    <element ref="b:name"/>
  </sequence>
</complexType>

<complexType name="extended_type">
  <complexContent>
    <extension base="b:basic_card_type">
      <sequence>
        <element ref="b:title"/>
        <element ref="b:email" minOccurs="0"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<complexType name="further_derived">
  <complexContent>
    <restriction base="b:extended_type">
      <sequence>
        <element ref="b:name"/>
        <element ref="b:title"/>
        <element ref="b:email"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>

Note: restriction is not the opposite of extension!

Global vs. Local Descriptions

Global (toplevel) style:

<element name="card" type="b:card_type"/>
<element name="name" type="string"/>
<complexType name="card_type">
  <sequence>
    <element ref="b:name"/>
    ...
  </sequence>
</complexType>

Local (inlined) style:

<element name="card">
  <complexType>
    <sequence>
      <element name="name" type="string"/>
      ...
    </sequence>
  </complexType>
</element>

Global vs. Local Descriptions

  • Local type definitions are anonymous
  • Local element/attribute declarations can be overloaded – a simple form of context sensitivity (particularly useful for attributes!)
  • Only globally declared elements can be starting points for validation (e.g. roots)
  • Local definitions permit an alternative namespace semantics (explained later...)

Requirements to Complex Types

  • Two element declarations that have the same name and appear in the same complex type must have identical types. For example, the following is not permitted:

    <complexType name="some_type">
      <choice>
        <element name="foo" type="string"/>
        <element name="foo" type="integer"/>
      </choice>
    </complexType>

  • This requirement makes efficient implementation easier
  • all can only contain element (e.g. not sequence!)
  • so we cannot use all to solve the problem with comment in RecipeML
  • ...

Namespaces

  • <schema targetNamespace="..." ...>
  • Prefixes are also used in certain attribute values!
  • Unqualified Locals:
      • if enabled, the name of a locally declared element or attribute in the instance document must have no namespace prefix (i.e. the empty namespace URI)
      • such an attribute or element “belongs to” the element declared in the surrounding global definition
      • always change the default behavior using elementFormDefault="qualified"

Derived Types and Subsumption

Assume that

  • T is some type
  • T- is derived from T by restriction
  • T+ is derived from T by extension

Subsumption: Whenever a T instance is required,

  • a T- instance may be used instead (trivial)
  • a T+ instance may be used instead – if the instance has xsi:type="T+" (with xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance")

Derivation, instantiation, and subsumption can be constrained using final, abstract, and block

Substitution Groups

  • Assume D is (in some number of steps) derived from B, ED is an element declaration of type D, and EB is an element declaration of type B
  • If ED is in the substitution group of EB then an ED element may be used whenever an EB is required
  • (This is subsumption based on element declarations, not on types)

Uniqueness, Keys, References

<element name="w:widget" xmlns:w="http://www.widget.org">
  <complexType> ... </complexType>
  <key name="my_widget_key">
    <selector xpath="w:components/w:part"/>
    <field xpath="@manufacturer"/>
    <field xpath="w:info/@productid"/>
  </key>
  <keyref name="annotation_references" refer="w:my_widget_key">
    <selector xpath=".//w:annotation"/>
    <field xpath="@manu"/>
    <field xpath="@prod"/>
  </keyref>
</element>

  • key: in every widget, each part must have unique (manufacturer, productid)
  • keyref: in every widget, for each annotation, (manu, prod) must match a my_widget_key
  • unique: as key, but fields may be absent
  • only a “downward” subset of XPath is used
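The meaning of the key and keyref constraints for widgets can be spelled out procedurally. The following sketch hard-codes the selectors and fields of my_widget_key and annotation_references instead of interpreting a schema; the function name `check_widget` and the shape of the error messages are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://www.widget.org"}


def check_widget(widget):
    """Check the key and keyref of one w:widget element; return error messages."""
    errors = []
    keys = set()
    # key: selector w:components/w:part, fields @manufacturer and w:info/@productid.
    for part in widget.findall("w:components/w:part", NS):
        info = part.find("w:info", NS)
        productid = info.get("productid") if info is not None else None
        key = (part.get("manufacturer"), productid)
        if None in key:
            errors.append("key field missing on a part")  # key fields must be present
            continue
        if key in keys:
            errors.append("duplicate key: %s/%s" % key)
        keys.add(key)
    # keyref: selector .//w:annotation, fields @manu and @prod.
    for annotation in widget.findall(".//w:annotation", NS):
        ref = (annotation.get("manu"), annotation.get("prod"))
        if None not in ref and ref not in keys:
            errors.append("keyref matches no key: %s/%s" % ref)
    return errors
```

The field pairs are compared as tuples, which mirrors the fact that a key with two fields is a composite key.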

Other Features in XML Schema

  • Groups
  • Nil values
  • Annotations
  • Defaults and whitespace
  • Modularization

– read the book chapter

RecipeML with XML Schema (1/5)

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:r="http://www.brics.dk/ixwt/recipes"
        targetNamespace="http://www.brics.dk/ixwt/recipes"
        elementFormDefault="qualified">

  <element name="collection">
    <complexType>
      <sequence>
        <element name="description" type="string"/>
        <element ref="r:recipe" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
    <unique name="recipe-id-uniqueness">
      <selector xpath=".//r:recipe"/>
      <field xpath="@id"/>
    </unique>
    <keyref name="recipe-references" refer="r:recipe-id-uniqueness">
      <selector xpath=".//r:related"/>
      <field xpath="@ref"/>
    </keyref>
  </element>

RecipeML with XML Schema (2/5)

  <element name="recipe">
    <complexType>
      <sequence>
        <element name="title" type="string"/>
        <element name="date" type="string"/>
        <element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
        <element ref="r:preparation"/>
        <element name="comment" type="string" minOccurs="0"/>
        <element ref="r:nutrition"/>
        <element ref="r:related" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="id" type="NMTOKEN"/>
    </complexType>
  </element>

RecipeML with XML Schema (3/5)

  <element name="ingredient">
    <complexType>
      <sequence minOccurs="0">
        <element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
        <element ref="r:preparation"/>
      </sequence>
      <attribute name="name" use="required"/>
      <attribute name="amount" use="optional">
        <simpleType>
          <union>
            <simpleType>
              <restriction base="r:nonNegativeDecimal"/>
            </simpleType>
            <simpleType>
              <restriction base="string">
                <enumeration value="*"/>
              </restriction>
            </simpleType>
          </union>
        </simpleType>
      </attribute>
      <attribute name="unit" use="optional"/>
    </complexType>
  </element>

RecipeML with XML Schema (4/5)

  <element name="preparation">
    <complexType>
      <sequence>
        <element name="step" type="string" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>

  <element name="nutrition">
    <complexType>
      <attribute name="calories" type="r:nonNegativeDecimal" use="required"/>
      <attribute name="protein" type="r:percentage" use="required"/>
      <attribute name="carbohydrates" type="r:percentage" use="required"/>
      <attribute name="fat" type="r:percentage" use="required"/>
      <attribute name="alcohol" type="r:percentage" use="optional"/>
    </complexType>
  </element>

  <element name="related">
    <complexType>
      <attribute name="ref" type="NMTOKEN" use="required"/>
    </complexType>
  </element>

RecipeML with XML Schema (5/5)

  <simpleType name="nonNegativeDecimal">
    <restriction base="decimal">
      <minInclusive value="0"/>
    </restriction>
  </simpleType>

  <simpleType name="percentage">
    <restriction base="string">
      <pattern value="([0-9]|[1-9][0-9]|100)%"/>
    </restriction>
  </simpleType>

</schema>

Problems with the XML Schema description

  • calories should contain a non-negative number (solved)
  • protein should contain a value of the form N% where N is between 0 and 100 (solved)
  • comment should be allowed to appear anywhere in the contents of recipe
  • unit should only be allowed in elements where amount is also present
  • nested ingredient elements should only be allowed when amount is absent

Even XML Schema has insufficient expressiveness!

Limitations of XML Schema

1. The details are extremely complicated (and the spec is unreadable)
2. Declarations are (mostly) context insensitive
3. It is impossible to write an XML Schema description of XML Schema
4. With mixed content, character data cannot be constrained
5. Unqualified local elements are bad practice
6. Cannot require specific root element
7. Element defaults cannot contain markup
8. The type system is overly complicated
9. xsi:type is problematic
10. Simple type definitions are inflexible

Strengths of XML Schema

  • Namespace support
  • Data types (built-in and derivation)
  • Modularization
  • Type derivation mechanism

Document Structure Description 2.0

– read the book chapter

RELAX NG

  • OASIS + ISO competitor to XML Schema
  • Validation only (no normalization)
  • Designed for simplicity and expressiveness, solid mathematical foundation

Processing Model

  • For a valid instance document, the root element must match a designated pattern
  • A pattern may match elements, attributes, or character data
  • Element patterns can contain sub-patterns that describe contents and attributes

Patterns – Regular Hedge Expressions

<element name="..."> ... </element>
<attribute name="..."> ... </attribute>
<text/>
<group> ... </group> (concatenation)
<optional> ... </optional>
<zeroOrMore> ... </zeroOrMore>
<oneOrMore> ... </oneOrMore>
<choice> ... </choice> (union)
<empty/>
<interleave> ... </interleave>
<mixed> ... </mixed>

Example

<element name="card">
  <element name="name"><text/></element>
  <element name="title"><text/></element>
  <element name="email"><text/></element>
  <optional>
    <element name="phone"><text/></element>
  </optional>
  <optional>
    <element name="logo">
      <attribute name="uri"><text/></attribute>
    </element>
  </optional>
</element>

Grammars

Pattern definitions and references allow description of recursive structures

<grammar ...>
  <start>
    ...
  </start>
  <define name="...">
    ...
  </define>
  ...
</grammar>

Other Features in RELAX NG

  • Name classes
  • Datatypes (based on XML Schema’s datatypes)
  • Modularization
  • An alternative compact, non-XML syntax

– read the book chapter

RecipeML with RELAX NG (1/5)

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.brics.dk/ixwt/recipes"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <start>
    <element name="collection">
      <element name="description"><text/></element>
      <zeroOrMore><ref name="element-recipe"/></zeroOrMore>
    </element>
  </start>

  <define name="element-recipe">
    <element name="recipe">
      <optional><attribute name="id">
        <data datatypeLibrary="http://relaxng.org/..." type="ID"/>
      </attribute></optional>

RecipeML with RELAX NG (2/5)

      <interleave>
        <group>
          <element name="title"><text/></element>
          <element name="date"><text/></element>
          <zeroOrMore><ref name="element-ingredient"/></zeroOrMore>
          <ref name="element-preparation"/>
          <element name="nutrition">
            <ref name="attributes-nutrition"/>
          </element>
          <zeroOrMore><ref name="element-related"/></zeroOrMore>
        </group>
        <optional><element name="comment"><text/></element></optional>
      </interleave>
    </element>
  </define>

RecipeML with RELAX NG (3/5)

  <define name="element-ingredient">
    <element name="ingredient">
      <attribute name="name"/>
      <choice>
        <group>
          <attribute name="amount">
            <choice><value>*</value><ref name="NUMBER"/></choice>
          </attribute>
          <optional><attribute name="unit"/></optional>
        </group>
        <group>
          <zeroOrMore><ref name="element-ingredient"/></zeroOrMore>
          <ref name="element-preparation"/>
        </group>
      </choice>
    </element>
  </define>
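The choice in the element-ingredient definition expresses two co-occurrence constraints that the DTD and XML Schema versions could not: unit requires amount, and nested ingredients require that amount is absent. A hand-written check of the same rule might look like the sketch below. It is a simplification under stated assumptions: element names are taken without namespaces, attribute values and text content are not checked, and the function name `ingredient_ok` is hypothetical.

```python
import xml.etree.ElementTree as ET


def ingredient_ok(elem) -> bool:
    """Check one ingredient element against the choice pattern."""
    if "name" not in elem.attrib:
        return False
    children = list(elem)
    if "amount" in elem.attrib:
        # First branch: amount, optionally unit, and no child elements.
        return not children
    # Second branch: neither amount nor unit; ingredient* followed by preparation.
    if "unit" in elem.attrib:
        return False
    if not children or children[-1].tag != "preparation":
        return False
    nested = children[:-1]
    return all(c.tag == "ingredient" and ingredient_ok(c) for c in nested)
```

The recursion in the last line corresponds to the recursive reference to element-ingredient inside its own definition.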

RecipeML with RELAX NG (4/5)

  <define name="element-preparation">
    <element name="preparation">
      <zeroOrMore><element name="step"><text/></element></zeroOrMore>
    </element>
  </define>

  <define name="attributes-nutrition">
    <attribute name="calories"><ref name="NUMBER"/></attribute>
    <attribute name="protein"><ref name="PERCENTAGE"/></attribute>
    <attribute name="carbohydrates"><ref name="PERCENTAGE"/></attribute>
    <attribute name="fat"><ref name="PERCENTAGE"/></attribute>
    <optional>
      <attribute name="alcohol"><ref name="PERCENTAGE"/></attribute>
    </optional>
  </define>

RecipeML with RELAX NG (5/5)

  <define name="element-related">
    <element name="related">
      <attribute name="ref">
        <data datatypeLibrary="http://relaxng.org/..." type="IDREF"/>
      </attribute>
    </element>
  </define>

  <define name="PERCENTAGE">
    <data type="string">
      <param name="pattern">([0-9]|[1-9][0-9]|100)%</param>
    </data>
  </define>

  <define name="NUMBER">
    <data type="decimal"><param name="minInclusive">0</param></data>
  </define>

</grammar>

Summary

schema: formal description of the syntax of an XML language

DTD: simple schema language

  • elements, attributes, entities, ...

XML Schema: more advanced schema language

  • element/attribute declarations
  • simple types, complex types, type derivations
  • global vs. local descriptions
  • ...

Essential Online Resources

  • http://www.w3.org/TR/xml11/
  • http://www.w3.org/TR/xmlschema-1/
  • http://www.w3.org/TR/xmlschema-2/