COMP60411 Semi-structured Data and the Web Datatypes Relax NG, XML - - PowerPoint PPT Presentation


COMP60411 Semi-structured Data and the Web Datatypes Relax NG, XML Schema, and Tree Grammars XSLT Bijan Parsia and Uli Sattler University of Manchester 1 Sunday, 21 October 2012 1 Datatypes and representations Or, are you my type? 2


SLIDE 1

COMP60411 Semi-structured Data and the Web Datatypes Relax NG, XML Schema, and Tree Grammars XSLT

Bijan Parsia and Uli Sattler

University of Manchester

Sunday, 21 October 2012

SLIDE 2

Datatypes and representations

Or, are you my type?

SLIDE 3

M3

  • Did you need for to write get-all-axioms.xquery?

– You did need disjunction in some form!

  • Consider predicate tests vs. functions returning nodes

– ssd:axiom(.) = true

  • How would we use this?

– ssd:axiom(.) as element()

  • How would we use this?
  • Types affect PSVI!

– Abstract types as well!
– Inheritance simulates disjunction! (In this case!)

  • Nodes contain their children

– Nodes/elements are not tags!

SLIDE 4

Some SE3 Questions

  • Which query is most robust to changes in the schema?
  • 1. /*/(equivalent|subsumes|...)
  • 2. /*/*[ssd:axiom(.)]
  • 3. /*/element(*,el:Axiom)
  • 4. They are equi-robust (and fragile)
  • 5. They are equi-robust (and robust)
  • Which query is most widely usable?
  • 1. /*/(equivalent|subsumes|...)
  • 2. /*/*[ssd:axiom(.)]
  • 3. /*/element(*,el:Axiom)
  • 4. They are equi-usable (and not widely usable)
  • 5. They are equi-usable (and widely usable)

CLICK!

SLIDE 5

Robustness as a value

  • Robustness in the face of change

– A measure of evolvability

  • If something changes, does our system break?
  • If our system breaks, do we know that it broke?

– Did it fail silently?

  • If it broke, can we fix it?
  • If we “fixed” it, can we tell?

– Will anything else break?

  • Given a prospective change, can we predict breakage?
  • Robustness is an organization-wide phenomenon

– Fragility in one area can be compensated for by another

  • E.g., by someone who never sleeps and knows the system

– Different sorts of fragility

  • With different probabilities and costs

SLIDE 6

Type Systems

  • What, in the most general sense, is a datatype?
  • 1. A set of (data) values
  • 2. A description of the arguments of a function
  • 3. Anything derived from xs:anyType
  • 4. An annotation of a variable
  • Anything naming or describing a set

– ...has an associated type!

  • Types are just sets (of “values”)
  • The “extensional” view

– But we may not be able to express this type » in certain ways

  • A Type System is a language for

– describing types (the “intensional” view)
– associating types with other linguistic entities

  • E.g., literals, variables, expressions, programs

CLICK!

SLIDE 7

A Typical Type System

  • Has a set of primitive or built-in or “basic” types

– Integers, strings, etc.
– Typically, lots of builtin support

  • Has a set of composite types

– Arrays, records, dictionaries, etc.
– Typically, there are constructors for composite types

  • So we can define an Array of Integers
  • Has a set of additional constructors

– To, for example, create other derived types
– E.g., Positive Integers

  • Has a syntax for associating types with variables

– And functions, etc.
– Type “Declarations”

  • A set of conditions for success or failure (Type Errors)

SLIDE 8

W3C XML Schema

  • Has/Is a type system

– Types are central, in fact
– Both

  • a structuring mechanism
  • a way of modifying the PSVI

– integers rather than strings
– elements have types as well as names

  • Large set of “simple” types

– Strings, integers, etc. in many flavors

  • Key composite type: “Complex”

– Essentially, element content models

  • But named
  • And composed

– Derivation by extension

– Other! (List, union, ...)

  • XML Schema (plus a little) is XQuery’s type system

SLIDE 9

A Brief Tour of Type Systems

  • Strong vs. Weak

– Type errors cause failure

  • Static vs. Dynamic

– Check at compile time or at run time

  • Explicit vs. Implicit Declarations

– Also known as Manifest vs. Latent
– Type inference vs. type checking

  • Nominal vs. Structural

– Type compatibility relies on features of the declaration

  • I declare two types, “miles” and “feet”, whose values are integers
  • 1 as miles != 1 as feet

– Type compatibility relies entirely on the structure of the values

  • 1 as miles == 1 as feet (1 is the same integer!)
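
The miles/feet contrast can be sketched in Python. This is only an illustration under stated assumptions: the classes `Miles` and `Feet` and the helper `structurally_equal` are hypothetical names introduced here, and Python's dataclass equality stands in for a nominal comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Miles:
    value: int


@dataclass(frozen=True)
class Feet:
    value: int


def structurally_equal(a, b):
    """Structural view: only the underlying integer values are compared."""
    return a.value == b.value


# Nominal view: dataclass equality also requires the same declared class.
nominal = Miles(1) == Feet(1)
# Structural view: 1 is the same integer, whatever the declaration says.
structural = structurally_equal(Miles(1), Feet(1))
print(nominal, structural)
```

Under the nominal reading the two values differ because their declarations differ; under the structural reading they coincide.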

SLIDE 10

Some questions

  • Java’s type system is primarily
  • 1. strong, manifest, and nominal
  • 2. strong, manifest, and structural
  • 3. strong, latent, and nominal
  • 4. weak, latent, and structural
  • 5. weak, manifest, and nominal
  • XQuery’s type system is primarily
  • 1. strong, manifest, and nominal
  • 2. strong, manifest, and structural
  • 3. strong, latent, and nominal
  • 4. weak, latent, and nominal
  • 5. weak, manifest, and structural

CLICK!

SLIDE 11

Some Expression Examples

  • Consider a simple expression

– if (true()) then 1+1 else "2"
– What is its type?
– if (true()) then 1+1 else "2" instance of xs:integer

  • Consider another

– if (false()) then 1+1 else "2"

– How about this?

– (if (false()) then 1+1 else "2") instance of xs:string

  • Finally

– if ($aBool) then 1+1 else "2"

– (Assume that $aBool is restricted to xs:boolean)

– (if ($aBool) then 1+1 else "2") instance of (xs:integer | xs:string) (not legal XQuery)

  • What’s the most restrictive type of each of these?

SLIDE 12

Mistyped

  • Obvious conflict

– "2" + 2
– Arithmetic operator is not defined for arguments of types (xs:integer, xs:string)

  • Slightly less obvious conflict

– (if (false()) then 1+1 else "2") + 2

  • Same as above

– (if (true()) then 1+1 else "2") + 2

  • This is fine!
  • Conflicts

– declare function ssd:test($x as xs:boolean) as xs:integer { if ($x) then 1+1 else "2" + 2 };
– declare function ssd:test($x as xs:boolean) as xs:integer { if ($x) then 1+1 else "2" };

My checker doesn’t flag this error
It does flag this one!

SLIDE 13

Simple Promotion

  • Explicit

– (1.0 + ("1" cast as xs:integer)) instance of xs:decimal

  • True!
  • Implicit

– ((1.0 treat as xs:decimal) + 125E2) instance of xs:double

  • Also true

– Same as: ((1.0 cast as xs:double) + 125E2) instance of xs:double

– Note that treat as and cast as are not the same

  • ("1.0" treat as xs:decimal)

– Required item type of value in 'treat as' expression is xs:decimal or subtypes; supplied value has item type xs:string

– (1 treat as xs:integer) vs. (1 treat as xs:decimal) » Fixes the static type

  • ("1.0" cast as xs:decimal)

– This results in 1

Implicit cast here!

SLIDE 14

Complex Casting

http://msdn.microsoft.com/en-us/library/ms191231.aspx

SLIDE 15

Getting to PSVI

  • Consider a very simple XQuery

– import schema default element namespace "…" at "el-typed.xsd";
  <instance-of>
    <constant name="sally"/>
    <atomic name="Person"/>
  </instance-of>/element(*, ClassExpression)

– No results!

  • Must validate!

– import schema default element namespace "..." at "el-typed.xsd";
  validate {
    <instance-of>
      <constant name="sally"/>
      <atomic name="Person"/>
    </instance-of>
  }/element(*, ClassExpression)

– Returns: <atomic xmlns="..." name="Person"/>
– validate generates a PSVI
– Constructors don’t validate!

  • Casting only works with atomics

SLIDE 16

Complex Typed Transform

  • Input and output all typed

import schema namespace el="http://owl.cs.manchester.ac.uk/2010/comp/ssd-60372/day2/el" at "el-typed.xsd";
import schema namespace owl="http://www.w3.org/2002/07/owl#" at "owl2-xml.xsd";
declare namespace ex="http://ex.org";

declare function ex:convertAxiom($ax as element(*, el:Axiom)) as element(*, owl:Axiom) {
  typeswitch ($ax)
    case schema-element(el:equivalent) return
      validate { <owl:EquivalentClasses>{ for $expr in $ax/* return ex:convertExpression($expr) }</owl:EquivalentClasses> }
    default return
      validate { <owl:EquivalentClasses><owl:Class IRI="http://BOGUS"/><owl:Class IRI="http://BOGUS"/></owl:EquivalentClasses> }
};

declare function ex:convertExpression($expr as element(*, el:ClassExpression)) as element(*, owl:ClassExpression) {
  if ($expr instance of element(el:atomic))
  then validate { <owl:Class IRI="{$expr/@name}"/> }
  else validate { <owl:Class IRI="http://BOGUS"/> }
  (:These would be easier if the elements were nilable:)
};

declare function ex:convert($ont as element(*, el:Ontology)) as element(owl:Ontology, owl:Ontology) {
  validate {
    <owl:Ontology>
      { for $e in $ont/element(*, el:Axiom) return ex:convertAxiom($e) }
    </owl:Ontology>
  }
};

ex:convert(validate { doc("el1.xml")/* })

SLIDE 17

Complex Typed Transform

  • Input and output all typed

<?xml version="1.0" encoding="UTF-8"?>
<owl:Ontology xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:EquivalentClasses>
    <owl:Class IRI="http://BOGUS"/>
    <owl:Class IRI="http://BOGUS"/>
  </owl:EquivalentClasses>
  <owl:EquivalentClasses>
    <owl:Class IRI="http://BOGUS"/>
    <owl:Class IRI="http://BOGUS"/>
  </owl:EquivalentClasses>
  <owl:EquivalentClasses>
    <owl:Class IRI="Person"/>
    <owl:Class IRI="http://BOGUS"/>
  </owl:EquivalentClasses>
  <owl:EquivalentClasses>
    <owl:Class IRI="http://BOGUS"/>
    <owl:Class IRI="http://BOGUS"/>
  </owl:EquivalentClasses>
</owl:Ontology>

The only “proper” value (IRI="Person")

SLIDE 18

Type Soundness

  • A (statically verified) type safe program

– has some guaranteed behavior

  • and thus can be transformed or optimized in aggressive ways

– may be more brittle

  • fails hard on invalid input
  • less input is valid

Type-inference rules are written in such a way that any value that can be returned by an expression is guaranteed to conform to the static type inferred for the expression. This property of a type system is called type soundness. A consequence of this property is that a query that raises no type errors during static analysis will also raise no type errors during execution on valid input data. The importance of type soundness depends somewhat on which errors are classified as "type errors," as we will see below.

http://www.informit.com/articles/article.aspx?p=100667&seqNum=6

SLIDE 19

Data Representations

  • Data and data structures have representations

– (More or less) Physical embodiments
– (Ultimately) Bits in a machine

  • The “same” data can have distinct representations

– 1 vs. “one”

  • The “same” data structure can have distinct representations

– At different levels of abstraction

  • One key distinction

– Internal (“in-memory”)
– External (“on disk”)

  • Generally:

– External representations are for exchange between (heterogeneous) systems

“Location” doesn’t really matter

SLIDE 20

A Java Example (1)

  • Consider a value of type int*

– 109987

  • We have several canonical external representations:

– Decimal: 109987
– Hexadecimal: 1ADA3 (0x1ADA3 in source code)
– Octal: 326643 (0326643 in source code)

  • We have one (canonical) internal representation:

– 32 bit, signed two’s complement

  • 11010110110100011

– (Each “digit” is a bit not a character)

– The representations are different

  • Decimal size in memory: Approx** 48 bytes
  • Internal rep: 4 bytes

*We consider only ints, i.e., 32 bit integers
** http://www.javaworld.com/javaworld/javatips/jw-javatip130.html?page=2
** See also: http://lingpipe-blog.com/2010/06/22/the-unbearable-heaviness-jav-strings/
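
The same value and its representations can be reproduced with a short sketch. Python is used here instead of Java purely for illustration, and the 4-byte, big-endian, signed layout is an assumption chosen to mirror a Java int.

```python
n = 109987

# External representations: the same value spelled in different bases.
decimal = str(n)
hexadecimal = format(n, "X")
octal = format(n, "o")
binary = format(n, "b")
print(decimal, hexadecimal, octal, binary)

# A 4-byte, signed, big-endian layout, comparable to a Java int.
raw = n.to_bytes(4, "big", signed=True)
print(raw.hex(), len(raw))

# Parsing any of the external forms gives back the same internal value.
assert int(decimal) == int(hexadecimal, 16) == int(octal, 8) == n
assert int.from_bytes(raw, "big", signed=True) == n
```

The decimal string needs six characters (plus whatever overhead the string object carries), while the fixed-width layout always needs four bytes.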

SLIDE 21

A Java Example (2)

  • We have APIs (the Integer class):

– Reading/Parsing/Deserializing/Unmarshalling
– Writing/Printing/Serializing/Marshalling
– ADT functions

  • +, -, /, *, <, >, etc.

– For examining and manipulating the internal rep

http://download.oracle.com/javase/6/docs/api/java/lang/Integer.html

SLIDE 22

JSON (1)

  • Javascript has a rich set of literals (ext. reps)

– Atomic (numbers, booleans, strings*)

  • 1, 2, true, “I’m a string”

– Composite

  • Arrays

– Ordered lists with random access
– [1, 2, “one”, “two”]

  • “Objects”

– Associative arrays/dictionary
– {“one”:1, “two”:2}

  • These can nest!

– [{“one”:1, “o1”:{“a1”: [1,2,3.0], “a2”:[]}}]

  • JSON == roughly this subset of Javascript

– The internal representation varies

  • In JS, 1 represents a 64 bit, IEEE floating point number
  • In Python’s json module, 1 represents an int (arbitrary precision in Python 3)

*Strings can be thought of as a composite, i.e., an array of characters, but not here.
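
How one external JSON text turns into system-specific internal values can be checked with Python's standard `json` module. This is a small sketch; the sample text is made up for illustration.

```python
import json

# One external text, parsed into Python's internal representations.
values = json.loads('[1, 1.0, "one", {"two": 2}]')
type_names = [type(v).__name__ for v in values]
print(type_names)

# Python ints are not limited to a fixed width, so large literals survive.
big = json.loads(str(2 ** 100))
print(big == 2 ** 100)
```

A JavaScript engine reading the same text would hold both `1` and `1.0` as the same floating point number, so the distinction between the first two items exists only on the Python side.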

SLIDE 23

JSON (2)

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>

http://www.json.org/example.html

Slightly different!

SLIDE 24

JSON (2.1)

{"menu": [{ "id": "file", "value": "File"},
  "popup": [
    "menuitem": {"value": "New", "onclick": "CreateNewDoc()"},
    "menuitem": {"value": "Open", "onclick": "OpenDoc()"},
    "menuitem": {"value": "Close", "onclick": "CloseDoc()"}
  ]
] }}

<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>

http://www.json.org/example.html

Needed to preserve order!

Still not right!

SLIDE 25

JSON (2.2)

{"menu": [{"id": "file", "value": "File"},
          [{"popup": [{},
                      [{"menuitem": [{"value": "New", "onclick": "CreateNewDoc()"}, []]},
                       {"menuitem": [{"value": "Open", "onclick": "OpenDoc()"}, []]},
                       {"menuitem": [{"value": "Close", "onclick": "CloseDoc()"}, []]}
                      ]
                     ]
           }
          ]
         ]
}

<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>

http://www.json.org/example.html

SLIDE 26

JSON (2.1) Recipe

  • Elements are mapped to “objects”

– With one pair

  • ElementName : contents
  • Contents are a list

– First item is an “object”, the attributes

  • Attributes are pairs of strings

– Second item is a list (of children)

  • Empty elements require an explicit empty list
  • No attributes requires an explicit empty object

Cumbersome!
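
The recipe can be sketched with Python's standard library. The function name `element_to_json` is hypothetical, and the sketch assumes element-only content: text nodes, comments, and namespaces are ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Map an element to {name: [attributes, children]} as in the recipe."""
    children = [element_to_json(child) for child in elem]
    # The attribute object and the child list are always present,
    # even when they are empty.
    return {elem.tag: [dict(elem.attrib), children]}


MENU_XML = """
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
"""

converted = element_to_json(ET.fromstring(MENU_XML))
print(json.dumps(converted, indent=2))
```

Document order survives because children are kept in a list rather than as keys of an object, which is exactly what makes the encoding cumbersome.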

SLIDE 27

JSON vs. XML (expressivity)

  • Every XML WF DOM can be faithfully represented as a JSON object
  • Every JSON object can be faithfully represented as an XML WF DOM
  • Every WXS PSVI can be faithfully represented as a JSON object
  • Every JSON object can be faithfully represented as a WXS PSVI

CLICK!

SLIDE 28

Conversion

  • We can go from external to internal (e2i)

– Parsing, reading, loading, de-serializing, unmarshalling

  • We can go from internal to external (i2e)

– Serializing, writing, printing, saving, marshalling
– Different systems may have different internals

  • At least in detail

– Different applications may behave differently

  • There and back again

– Roundtripping

  • Internal to external to internal (i2e2i)
  • External to internal to external (e2i2e)
  • Ideally preserves key properties

– Which?
– When is it OK not to preserve?
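
Both roundtrip directions can be tried with Python's `json` module. The helper names `e2i2e` and `i2e2i` are hypothetical, and the sample inputs are made up to show one lossless case and a few lossy ones.

```python
import json


def e2i2e(text):
    """External to internal to external: parse, then serialise again."""
    return json.dumps(json.loads(text))


def i2e2i(value):
    """Internal to external to internal: serialise, then parse again."""
    return json.loads(json.dumps(value))


print(e2i2e('[1, 2, "one"]'))      # spelling preserved
print(e2i2e('1e2'))                # same number, different spelling
print(e2i2e('{"a": 1, "a": 2}'))   # duplicate key: information is lost
print(i2e2i((1, 2)))               # a tuple comes back as a list
```

Whether the lossy cases matter depends on which properties an application treats as key: the numeric value survives in the second case, the lexical form does not.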

SLIDE 29

What is an XML “Document”?

  • Layers

– A series of octets
– A series of unicode characters
– A series of “events”

  • SAX perspective
  • E.g., Start/End tags
  • Events are tokens

– A tree structure

  • A DOM/Infoset

– A tree of a certain shape

  • A Validated Infoset

– An adorned tree of a certain shape

  • A PSVI wrt a WXS

Errors here mean no XML!
SAX ErrorHandler
Yay! XPath! XSLT! Etc.
Types in play

SLIDE 30

What is an XML “Document”?

  • Layers

– A series of octets
– A series of unicode characters
– A series of “events”

  • SAX perspective
  • E.g., Start/End tags
  • Events are tokens

– A tree structure

  • A DOM/Infoset

– A tree of a certain shape

  • A Validated Infoset

– An adorned tree of a certain shape

  • A PSVI wrt a WXS

validate erase

SLIDE 31

What is an XML “Document”?

  • Layers

– A series of octets
– A series of unicode characters
– A series of “events”

  • SAX perspective
  • E.g., Start/End tags
  • Events are tokens

– A tree structure

  • A DOM/Infoset

– A tree of a certain shape

  • A Validated Infoset

– An adorned tree of a certain shape

  • A PSVI wrt a WXS

“Same” inputs can have different “meanings”! (external validation)

SLIDE 32

What is an XML “Document”?

  • Layers

– A series of octets
– A series of unicode characters
– A series of “events”

  • SAX perspective
  • E.g., Start/End tags
  • Events are tokens

– A tree structure

  • A DOM/Infoset

– A tree of a certain shape

  • A Validated Infoset

– An adorned tree of a certain shape

  • A PSVI wrt a WXS

Generally looks like

<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
  <serialization method="xml" />
</configuration>

But can look otherwise!

element configuration { attribute edition {"ee"}, element serialization {attribute method {"xml"}}}

Same “meaning”, different spelling

SLIDE 33

What is an XML “Document”?

  • Layers

– A series of octets
– A series of unicode characters
– A series of “events”

  • SAX perspective
  • E.g., Start/End tags
  • Events are tokens

– A tree structure

  • A DOM/Infoset

– A tree of a certain shape

  • A Validated Infoset

– An adorned tree of a certain shape

  • A PSVI wrt a WXS

– A picture (or document, or action, or…)

  • Application meaning

Can have many...
...for “the same” meaning

SLIDE 34


The Essence of XML (with WXS)

  • Thesis:

– “XML is touted as an external format for representing data.”

  • Two properties

– Self-describing

  • Destroyed by external validation

– Round-tripping

  • Destroyed by defaults and union types

http://bit.ly/essenceOfXML2

SLIDE 35

The Essence of XML (with WXS)

  • Roundtripping issues

– Internal to external and back

  • Take an element, foo, with content {“one”, “2”, 3}
  • Its (simple) type is a list of union of integer and string
  • Serialise

– <foo>one 2 3</foo>

  • Parse and validate

– Content is {“one”, 2, 3}

– External to internal and back

  • “001” to 1 to “1”
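
The two roundtripping failures can be imitated in a few lines. This sketch only mimics validation against a list of union of integer and string; it assumes the integer member is tried first, and the function names `parse_item`, `parse_list`, and `serialise` are hypothetical.

```python
def parse_item(token):
    """Union of integer and string: try integer first, else keep the string."""
    try:
        return int(token)
    except ValueError:
        return token


def parse_list(text):
    """List type: whitespace-separated items, each validated on its own."""
    return [parse_item(token) for token in text.split()]


def serialise(items):
    return " ".join(str(item) for item in items)


# Internal to external and back: the string "2" returns as the integer 2.
original = ["one", "2", 3]
roundtripped = parse_list(serialise(original))
print(original, roundtripped)

# External to internal and back: leading zeros are not preserved.
print(serialise(parse_list("001")))
```

In both directions the roundtrip yields something different from what went in, which is the point the slide makes about union types.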

http://bit.ly/essenceOfXML2

SLIDE 36

The Essence of XML (with WXS)

  • Conclusion:

– “So the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.”

  • It’s not obvious

– That the issues are serious (enough)
– That the problem solved is all that easy
– That there aren’t other, worse issues

http://bit.ly/essenceOfXML2

SLIDE 37

S’more Tree Grammars

SLIDE 38

Tree Grammars: a reminder

  • Production rules

– are central to tree grammars
– reflect element declarations

  • ...to be read as follows

N → P (PA | (FEd,SEd*))

– for each w ∈ nodes(T) with children w1 w2 ... wn, there exists a rule X → a e ∈ P such that

  • r(w) = X,
  • T(w) = a, and
  • r(w1) r(w2) ... r(wn) matches e.

[Figure: a node w labelled P with r(w) = N and children w1, w2, ..., wn, where r(w1) = FEd and r(w2) = ... = r(wn) = SEd. Does the child sequence match? Then, for w1, w2, ...: check FEd → ? e1 and SEd → ? e2.]

SLIDE 39

Tree Grammars: 3 more things

★ A single-type grammar can have no more than one run on a tree.
★ A regular grammar can have more than one run on a tree.

  • BTW, w.l.o.g., we can assume that no two production rules have the same non-terminal on the left hand side and the same terminal. I.e., no N → P PA and N → P (Editor,Editor*). We can also rewrite those, e.g., to N → P (PA | (Editor,Editor*))

  • ...so, how did we get here? From DTDs and XML schemas!

SLIDE 40

Tree Grammars and DTDs

  • since DTDs don’t have “types”, just element names, they correspond to grammars of a peculiar, simple kind:

★ Tree grammars for DTDs are always local

...even if the DTD has a non-deterministic content model:
<!ELEMENT N1 (M|(M,M))> is not deterministic and thus illegal (but can be replaced with <!ELEMENT N1 (M,(M|ε))>)

<!ELEMENT T (N1,N2*)>
<!ELEMENT N1 (M|(M,M))>
<!ELEMENT N2 (#PCDATA)>
<!ELEMENT M (#PCDATA)>

F = (N, Σ, S, P) with
N = {T, N1, N2, M, pcdata}
Σ = {T, N1, N2, M, pcdata}
S = {T}
P = { T → T (N1,N2*),
      N1 → N1 (M|(M,M)),
      N2 → N2 pcdata,
      M → M pcdata,
      pcdata → pcdata ε }

[Figure: an example tree with nodes labelled T, N1, N2, M, and pcdata, annotated with node positions such as ε, 1, 0,0, 1,0, and 0,0,0]

SLIDE 41

Remember?!

  • in DTDs and in WXS, content models are further restricted (for compatibility with SGML)

– [DTD] deterministic (or 1-unambiguous), e.g., (M|(M,M)) is not deterministic, (M,(M|ε)) is; ((b, c) | (b, d)) is not deterministic, (b,(c|d)) is. From http://www.w3.org/TR/REC-xml/:

As noted in 3.2.1 Element Content, it is required that content models in element type declarations be deterministic. This requirement is for compatibility with SGML (which calls deterministic content models "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors.

More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman [Aho/Ullman]. In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error.

SLIDE 42

Tree Grammars and DTDs

  • so, DTDs are local (and thus single-type) because they don’t have types at all

– and not because their content model is deterministic!
– they are single-type even with non-deterministic content model

  • hence we could extend DTDs with types and still be single-type...provided we impose suitable restrictions

SLIDE 43

Tree Grammars and WXS

  • tree grammars also capture the basic, structural part of WXS:

✓ types (complex and anonymous)

  • model groups (we ignore them)
  • derivation by extension and restriction (we ignore them)
  • substitution groups (we ignore them)
  • integrity constraints like keys (must be ignored, don’t fit into tree grammars)
  • we only deal with simple XML schemas, but general approach works for more
  • to transform an XML schema S into a tree grammar G,
  • 1. we translate S into a generalized tree grammar
  • 2. then flatten the generalized tree grammar into a tree grammar G
  • this will be done such that T validates against S iff T is accepted by G.

SLIDE 44

Translating WXS into Tree Grammars

  • let S be a simple XML Schema

➡ for each top-level element in S of the form

– <xs:element name="mylist" type="BlistT"></xs:element>

  • add the following production rule to your grammar

– MYLIST → mylist BLIST^TYPE
– add MYLIST, BLIST^TYPE to non-terminals, add mylist to terminals

➡ for each top-level element in S of the form

– <xs:element name="mylist">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ename" type="CompT" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  • add the following production rules to your grammar

– MYLIST → mylist ENAME,ENAME*
– ENAME → ename COMP^TYPE
– add MYLIST, ENAME, COMP^TYPE to non-terminals, add mylist, ename to terminals

what is the default for minOccurs?

SLIDE 45

Translating WXS into Tree Grammars

➡ for each top-level complex type in S of the form

– <xs:complexType name="BlistT">
    <xs:sequence>
      <xs:element name="friend" type="PersonT" minOccurs="1" maxOccurs="2"/>
    </xs:sequence>
  </xs:complexType>

  • add the following production rules to your grammar

– BLIST^TYPE → (FRIEND | (FRIEND,FRIEND))   %% generalized rule: to be expanded!
– FRIEND → friend PERSON^TYPE
– add BLIST^TYPE, FRIEND, PERSON^TYPE to non-terminals, add friend to terminals
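
How minOccurs and maxOccurs turn into a content model can be sketched as follows. The function `occurrences` is a hypothetical helper, `None` stands in for maxOccurs="unbounded", and both attributes default to 1 as in XML Schema.

```python
def occurrences(name, min_occurs=1, max_occurs=1):
    """Content model for one element particle; max_occurs=None means unbounded."""
    if max_occurs is None:
        # min_occurs required copies followed by a starred copy.
        return ",".join([name] * min_occurs + [name + "*"])
    if max_occurs < min_occurs:
        raise ValueError("maxOccurs must not be smaller than minOccurs")
    options = []
    for count in range(min_occurs, max_occurs + 1):
        if count == 0:
            options.append("ε")
        elif count == 1:
            options.append(name)
        else:
            options.append("(" + ",".join([name] * count) + ")")
    if len(options) == 1:
        return options[0]
    return "(" + " | ".join(options) + ")"


print(occurrences("FRIEND", 1, 2))
print(occurrences("ENAME", max_occurs=None))
print(occurrences("N", 0, None))
```

With the defaults, a particle that only sets maxOccurs="unbounded" still requires one occurrence, which is why the earlier rule reads ENAME,ENAME* rather than ENAME*.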

SLIDE 46

Translating WXS into Tree Grammars

➡ for each top-level complex type in S of the form

  • <xs:complexType name="BBlistT">
      <xs:choice>
        <xs:sequence>
          <xs:element name="A" type="xs:string"/>
          <xs:element name="B" type="xs:string"/>
        </xs:sequence>
        <xs:sequence>
          <xs:element name="A" type="xs:string"/>
          <xs:element name="C" type="xs:string"/>
        </xs:sequence>
      </xs:choice>
    </xs:complexType>

  • add the following production rules to your grammar

– BBLIST^TYPE → (A,B) | (A,C)   %% generalized rule -- to be expanded!
– A → A STRING^TYPE
– B → B STRING^TYPE
– C → C STRING^TYPE
– add BBLIST^TYPE, A, B, C, STRING^TYPE to non-terminals, add A, B, C to terminals

%% UPA violation: Oxygen complains!

SLIDE 47

Translating WXS into Tree Grammars

  • Consider the following case:

<xs:complexType name="AT">
  <xs:sequence>
    <xs:element name="N" type="AlistT" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="BT">
  <xs:sequence>
    <xs:element name="N" type="BlistT" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

  • To handle cases like the one above we can’t always add rules

– AT^TYPE → N*, BT^TYPE → N*
– N → N ??LIST^TYPE

  • Instead, we translate these as

– AT^TYPE → N^AS^ALIST^TYPE*
– BT^TYPE → N^AS^BLIST^TYPE*
– N^AS^ALIST^TYPE → N ALIST^TYPE
– N^AS^BLIST^TYPE → N BLIST^TYPE

SLIDE 48

Translating WXS into Tree Grammars

Our translation yields almost a tree grammar:

  • it produces illegal rules of the form X → e, i.e., without a terminal

– e.g., BLIST^TYPE → (FRIEND | (FRIEND,FRIEND))

  • our grammar model doesn’t handle those (check definition of a run)

๏ hence we expand these illegal rules:

  • e.g., MYLIST → mylist BLIST^TYPE would be transformed into

– MYLIST → mylist (FRIEND | (FRIEND,FRIEND))

  • ...and if we had <xs:element name="yourlist" type="BlistT"/> then we would also have

– YOURLIST → yourlist BLIST^TYPE and thus

– YOURLIST → yourlist (FRIEND | (FRIEND,FRIEND))

pick illegal rule X → e:
– remove X → e from rule set
– replace all occurrences of X in rule set with e
until no illegal rules are left in rule set
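
The expansion loop can be sketched directly. The representation is an assumption made for illustration: a rule is a triple (left-hand side, terminal, content model), an illegal rule has `None` as its terminal, and the function name `expand` is hypothetical.

```python
import re


def expand(rules):
    """Remove illegal rules X -> e by substituting e for X everywhere."""
    rules = list(rules)
    while True:
        illegal = next((rule for rule in rules if rule[1] is None), None)
        if illegal is None:
            return rules
        name, _, body = illegal
        rules.remove(illegal)
        # Match the non-terminal only as a whole name ('^' is part of names).
        pattern = re.compile(r"(?<![\w^])" + re.escape(name) + r"(?![\w^])")
        rules = [
            (lhs, terminal, pattern.sub(lambda match: "(" + body + ")", content))
            for lhs, terminal, content in rules
        ]


generalized = [
    ("MYLIST", "mylist", "BLIST^TYPE"),
    ("BLIST^TYPE", None, "FRIEND | (FRIEND,FRIEND)"),
    ("FRIEND", "friend", "PERSON^TYPE"),
]
for lhs, terminal, content in expand(generalized):
    print(lhs, "→", terminal, content)
```

Each pass removes one illegal rule and adds none, so the loop stops after as many passes as there were illegal rules, including when the type definitions refer to each other cyclically.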

SLIDE 49

Translating WXS into Tree Grammars

  • Expanding illegal rules even works with cyclic type definitions - try

<xs:complexType name="NT">
  <xs:choice>
    <xs:element name="test2" type="AT"/>
    <xs:element name="EndElement" type="xs:string"/>
  </xs:choice>
</xs:complexType>
<xs:complexType name="AT">
  <xs:choice>
    <xs:element name="test1" type="NT"/>
    <xs:element name="EndElement" type="xs:string"/>
  </xs:choice>
</xs:complexType>

  • This gives you these rules, including 2 illegal rules

NT^TYPE → (TEST2 | ENDELEMENT)
TEST2 → test2 AT^TYPE
ENDELEMENT → EndElement STRING^TYPE
AT^TYPE → (TEST1 | ENDELEMENT)
TEST1 → test1 NT^TYPE
ENDELEMENT → EndElement STRING^TYPE

  • ...which can be expanded as follows:

TEST2 → test2 (TEST1 | ENDELEMENT)
ENDELEMENT → EndElement STRING^TYPE
TEST1 → test1 (TEST2 | ENDELEMENT)
ENDELEMENT → EndElement STRING^TYPE

SLIDE 50

WXS and Tree Grammars

  • So, to transform an XML schema S into a tree grammar G,
  • 1. we translate S into a generalized tree grammar G’
  • 2. then expand G’ into a tree grammar G

★ Then any tree T validates against S iff T is accepted by G.

  • So, what are the tree grammars we get as results?

– they are tree grammars
– are they single-type?
– are they local?

★ Tree grammars corresponding to WXS are not local.

  • E.g., consider

– N^AS^ALIST^TYPE → N ALIST^TYPE
– N^AS^BLIST^TYPE → N BLIST^TYPE
– ...N^AS^ALIST^TYPE and N^AS^BLIST^TYPE are competing!

[Figure: the grammar classes Loc, ST, and Reg]
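
Competition between non-terminals can be tested mechanically. The sketch below assumes the usual definitions: two non-terminals compete if their rules share a terminal, a grammar is local if nothing competes, and single-type if competing non-terminals never share a content model or the start set. The function names and the holder elements `A`/`a` and `B`/`b` are hypothetical, and the content of the two N rules is left empty for brevity.

```python
import re
from itertools import combinations


def names_in(model, nonterminals):
    """Non-terminal names that occur in a content model string."""
    tokens = set(re.findall(r"[^\s,|()*+?]+", model))
    return tokens & set(nonterminals)


def classify(rules, start):
    """rules maps each non-terminal to (terminal, content model)."""
    competing = [
        {x, y} for x, y in combinations(rules, 2) if rules[x][0] == rules[y][0]
    ]
    if not competing:
        return "local"
    groups = [names_in(model, rules) for _, model in rules.values()]
    groups.append(set(start))
    if any(pair <= group for pair in competing for group in groups):
        return "regular"
    return "single-type"


wxs_like = {
    "A": ("a", "N^AS^ALIST^TYPE*"),
    "B": ("b", "N^AS^BLIST^TYPE*"),
    "N^AS^ALIST^TYPE": ("N", ""),
    "N^AS^BLIST^TYPE": ("N", ""),
}
dtd_like = {
    "T": ("T", "N1,N2*"),
    "N1": ("N1", "M|(M,M)"),
    "N2": ("N2", "pcdata"),
    "M": ("M", "pcdata"),
    "pcdata": ("pcdata", ""),
}
print(classify(wxs_like, {"A", "B"}))
print(classify(dtd_like, {"T"}))
```

The two N rules compete, but they never appear side by side in one content model, so the parent element always determines which one applies.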

SLIDE 51

WXS and Tree Grammars

★ Tree grammars corresponding to WXS are single-type.

– This is ensured by the Unique Particle Attribution constraint in WXS.

  • Tree grammars corresponding to DTDs are local, ...hence

★ DTDs are less expressive than XML schemata.

  • That is, there are tree languages that we can describe in WXS, but not in DTDs, e.g.:

[Figure: the grammar classes Loc, ST, and Reg]

N = {Book, PA, Editor, A, Paper, F, L}
Σ = {B,N,A,P,C}
S = {Book, Paper}
P = { Book → B Editor|PA,
      Paper → P PA,
      Editor → N F,L,
      PA → N L,A,
      F → F ε,
      L → L ε,
      A → A ε }

[Figure: two example trees, one rooted at B and one rooted at P, each with an N child whose children sit at positions 0,0 and 0,1 (labels F, L and L, A)]
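
Acceptance by this grammar can be checked with a small bottom-up sketch. The encoding is an assumption made for illustration: a tree is a pair (label, children), each content model is written as a Python regular expression over space-separated non-terminal names, and the function names are hypothetical.

```python
import re
from itertools import product

# Non-terminal -> (terminal, content model as a regex over non-terminal names).
RULES = {
    "Book": ("B", r"Editor|PA"),
    "Paper": ("P", r"PA"),
    "Editor": ("N", r"F L"),
    "PA": ("N", r"L A"),
    "F": ("F", r""),
    "L": ("L", r""),
    "A": ("A", r""),
}
START = {"Book", "Paper"}


def candidates(tree):
    """Non-terminals that some run could assign to the root of the tree."""
    label, children = tree
    child_sets = [candidates(child) for child in children]
    found = set()
    for nonterminal, (terminal, model) in RULES.items():
        if terminal != label:
            continue
        # Try every way of choosing one candidate per child.
        for choice in product(*child_sets):
            if re.fullmatch(model, " ".join(choice)):
                found.add(nonterminal)
                break
    return found


def accepts(tree):
    return bool(candidates(tree) & START)


book = ("B", [("N", [("F", []), ("L", [])])])
paper = ("P", [("N", [("L", []), ("A", [])])])
bad_paper = ("P", [("N", [("F", []), ("L", [])])])
print(accepts(book), accepts(paper), accepts(bad_paper))
```

An N element with children F and L is fine under B but not under P, so what N may contain depends on where N occurs; a single DTD declaration for N cannot express that.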

SLIDE 52

Remember:

  • In XML Schema, the content model is constrained as well

– to make validation easier & for compatibility with SGML
– e.g., through the Unique Particle Attribution constraint:

A content model must be formed such that during validation of an element information item sequence, the particle component contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.

http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#cos-nonambig

Rephrasing: a content model M must be formed such that, during validation of an element E’s childnode sequence E1...Ek, we can, starting from i = 1 and increasing, associate each Ei with a single particle contained (possibly implicitly) in M without examining the content or attributes of Ei, and without any information about any Ej with j >i.

SLIDE 53

Content models & types in DTD & WXS

  • (we already know that) in WXS, we have a type hierarchy

– an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y

  • but you have to say so explicitly:

– we call this ‘named’ typing:

  • sub-types are declared (restriction or extension), and not inferred (by comparing structure)

– in DTDs, we don’t have types!

  • In order to prevent difficulties in WXS as caused by types, the Element Declarations Consistent constraint is imposed:

<xs:complexType>
  <xs:sequence>
    <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
    <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
  </xs:sequence>
</xs:complexType>

<person phone="2">
  <Name>Peter</Name>
  <DoB>1966-05-04</DoB>
</person>
<person xsi:type="LongPersonType" phone="5432">
  <Name>Paul</Name>
  <DoB>1967-05-04</DoB>
  <address>Manchester</address>
</person>

SLIDE 54

Outlook: next steps

(Figure: the grammar classes Loc, ST, Reg)

✓ we have now seen that

  • DTDs correspond to local grammars
  • WXS corresponds to single-type grammars

➡ DTDs are structurally weaker than WXS

  • RelaxNG: an even stronger schema language
  • RelaxNG corresponds to regular grammars

➡ WXS is structurally weaker than RelaxNG

  • we will also look into how computationally expensive validation is

– against DTD/local grammar
– against WXS/single-type grammar
– against RelaxNG/regular grammar

➡ ...all roughly the same!

slide-55
SLIDE 55

Relax NG, a very powerful schema language


slide-56
SLIDE 56

Relax NG: yet another schema language

  • Relax NG was designed to be a simpler schema language
  • (described in a readable on-line book by Eric Van der Vlist)
  • and allows us to describe (valid) XML documents in terms of their tree abstractions:

– no default attributes
– no entity declarations
– no key/uniqueness constraints
– minimal datatypes: only “token” and “string”, like DTDs (but with a mechanism to use XSD datatypes)

  • since it is so simple/flexible

– it’s (claimed to be) easy to use
– it doesn’t have complex constraints on the description of element content, like determinism/1-unambiguity
– it’s claimed to be reliable
– but you need other tools to do other things (like datatypes and attributes)

slide-57
SLIDE 57

Relax NG: another side of Determinism

  • remember that DTDs and WXS required their content models to be

– [DTD] deterministic (and thus look-ahead-free)
– [WXS] deterministic (EDC: every matching child node sequence matches in exactly one way only)
– [WXS] the UPA constraint expresses both of these, and further constraints besides

  • determinism & single-typeness have a reason:

– some tools annotate a (valid) document while parsing:

  • type information -- to be exploited, e.g., for concise queries (remember the assignment?)
  • default attribute values

– if your schema is not single-type, then

  • tools validating the same document against the same schema may construct different PSVIs
  • this can happen with different tools or with different runs of the same tool

slide-58
SLIDE 58

RelaxNG: another side of Validation

Reasons why one would want to validate an XML document:

  • ensure that the structure is ok
  • ensure that values in elements/attributes are of the correct type
  • generate a PSVI to work with
  • check constraints on co-occurrence of elements/how they are related
  • check other integrity constraints, e.g., a person’s age vs. their mother’s age
  • check constraints on elements/their values against external data

– postcode correctness
– VAT/tax/other numeric constraints
– spell checking

...only a few of these checks can be carried out by validating against schemas...

Relax NG was designed to

  • 1. validate structure and
  • 2. link to datatype validators to type check values of elements/attributes

slide-59
SLIDE 59

Relax NG: basic principles

  • Relax NG is based on patterns (similar to XPath expressions):

– a pattern is a description of a set of valid node sets
– we can view our example as different combinations of different parts, and design patterns for each
– enhanced flexibility

<?xml version="1.0" encoding="UTF-8"?>
<people>
  <person age="41">
    <name>
      <first>Harry</first>
      <last>Potter</last>
    </name>
    <address>4 Main Road </address>
    <project type="epsrc" id="1"> DeCompO </project>
    <project type="eu" id="3"> TONES </project>
  </person>
  <person>....
</people>

slide-60
SLIDE 60

Relax NG: good to know

Relax NG comes in 2 syntaxes

  • the compact syntax

– succinct
– human readable

  • the XML syntax

– verbose
– machine readable

✓ Trang converts between the two, phew! (and also into/from other schema languages)
✓ Trang can be used from Oxygen

Compact syntax:

grammar {
  start = element name {
    element first { text },
    element last { text } } }

XML syntax:

<grammar xmlns="http:..." xmlns:a="http:.." datatypeLibrary="http:...">
  <start>
    <element name="name">
      <element name="first"><text/></element>
      <element name="last"><text/></element>
    </element>
  </start>
</grammar>

slide-61
SLIDE 61

Relax NG - structure validation:

  • 3 kinds of patterns, for the 3 “central” nodes:

– text
– attribute
– element

  • these can be combined

– ordered groups
– unordered groups
– choices

  • we can constrain cardinalities of patterns
  • text nodes

– can be marked as “data” and linked

  • we can specify libraries of patterns

element name { element first { text }, element last { text }}

is a RelaxNG schema for (parts of) the people document shown on slide 59.

slide-62
SLIDE 62

Relax NG: ordered groups

  • we can name patterns
  • in strange “chains”
  • we can use ?, *, and + (use “?” if optional):

grammar {
  start = element people { people-content }
  people-content = element person { person-content }+
  person-content = attribute age { text },
                   element name { name-content },
                   element address { text }+,
                   element project { project-content }*
  name-content = element first { text },
                 element middle { text }?,
                 element last { text }
  project-content = attribute type { text },
                    attribute id { text },
                    text }

is a RelaxNG schema for the people document shown on slide 59.

slide-63
SLIDE 63

Relax NG: ordered groups in XML syntax (Trang knows…)

  • our schema in compact syntax: the grammar on slide 62
  • our schema in XML syntax:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="people">
      <ref name="people-content"/>
    </element>
  </start>
  <define name="people-content">
    <oneOrMore>
      <element name="person">
        <ref name="person-content"/>
      </element>
    </oneOrMore>
  </define>
  <define name="person-content">
    <attribute name="age"/>
    <element name="name">
      <ref name="name-content"/>
    </element>
    <oneOrMore>
      <element name="address"> <text/> </element>
    </oneOrMore>
    <zeroOrMore>
      <element name="project">
        <ref name="project-content"/>
      </element>
    </zeroOrMore>
  </define>
  <define name="name-content">
    <element name="first"> <text/> </element>
    <optional>
      <element name="middle"> <text/> </element>
      ...

use Trang to convert ⇆

slide-64
SLIDE 64

Relax NG: different styles

grammar {
  start = people-element
  people-element = element people { person-element+ }
  person-element = element person {
    attribute age { text },
    name-element,
    address-element+,
    project-element* }
  name-element = element name {
    element first { text },
    element middle { text }?,
    element last { text } }
  address-element = element address { text }
  project-element = element project {
    attribute type { text },
    attribute id { text },
    text } }

  • so far, we modelled ‘element centric’...we can model ‘content centric’:

grammar {
  start = element people { people-content }
  people-content = element person { person-content }+
  person-content = attribute age { text },
                   element name { name-content },
                   element address { text }+,
                   element project { project-content }*
  name-content = element first { text },
                 element middle { text }?,
                 element last { text }
  project-content = attribute type { text },
                    attribute id { text },
                    text }

slide-65
SLIDE 65

Relax NG - structure validation: ordered groups

  • we can combine patterns in fancy ways:

grammar {
  start = element people { people-content }
  people-content = element person { person-content }+
  person-content = HR-stuff, contact-stuff
  HR-stuff = attribute age { text }, project-content
  contact-stuff = attribute phone { text },
                  element name { name-content },
                  element address { text }
  name-content = element first { text },
                 element middle { text }?,
                 element last { text }
  project-content = element project {
                      attribute type { text },
                      attribute id { text },
                      text }+ }

(running example: the people document shown on slide 59)

slide-66
SLIDE 66

Relax NG: structure validation summary

  • Relax NG’s specification of structure differs from DTDs and XSD:

– grammar oriented
– 2 syntaxes with automatic translation
– flexible: we can gather different aspects of elements into different patterns
– unconstrained: no constraints regarding unambiguity/1-unambiguity/deterministic content models/Unique Particle Attribution/Element Declarations Consistent
– as in XSD, we have an “ALL” construct for unordered groups: “interleave”, written &

here, the patterns must appear in the specified order (except for attributes, which are allowed to appear in any order in the start tag):

element person {
  attribute age { text },
  attribute phone { text },
  name-element,
  address-element+,
  project-element* }

here, the patterns can appear in any order:

element person {
  attribute age { text }
  & attribute phone { text }
  & name-element
  & address-element+
  & project-element* }

slide-67
SLIDE 67

Translating Relax NG into tree grammars by example 1

  • ...let’s see one more

grammar {
  start = AddressBook
  AddressBook = element addressBook { Card* }
  Card = element card { Inline }
  Inline = Name, Email+
  Name = element name { text }
  Email = element email { text } }

Translate into G = (N, Σ, S, P) with

N = {AddressBook, Card, Inline, Name, Email, Pcdata}
Σ = {addressBook, card, name, email, pcdata}
S = {AddressBook}
P = { AddressBook → addressBook Card*,
      Card → card Inline,
      Inline → Name, Email+,
      Name → name Pcdata,
      Email → email Pcdata,
      Pcdata → pcdata ϵ }

  • “element y” ➟ y ∈ Σ ...possibly also an “uppercased copy” ➟ Y ∈ N
  • all other user-defined symbols X ➟ X ∈ N
  • ...translating the Relax NG rules is easy (depending on the Relax NG style)

slide-68
SLIDE 68

Translating Relax NG into tree grammars by example 2

grammar {
  start = p-el
  p-el = element people { per-el+ }
  per-el = element person { attribute age { text }, na-el, ad-el+, pro-el* }
  na-el = element name { element first { text }, element middle { text }?, element last { text } }
  ad-el = element address { text }
  pro-el = element project { attribute type { text }, attribute id { text }, text } }

Ignore! (the attribute patterns do not appear in the translation)

Translate into G = (N, Σ, S, P) with

N = {P-EL, PER-EL, NA-EL, AD-EL, PRO-EL, FIRST, MIDDLE, LAST, Pcdata}
Σ = {people, person, name, first, middle, last, address, project, pcdata}
S = {P-EL}
P = { P-EL → people PER-EL, PER-EL*,
      PER-EL → person NA-EL, AD-EL, AD-EL*, PRO-EL*,
      NA-EL → name FIRST, (MIDDLE|ϵ), LAST,
      FIRST → first Pcdata,
      MIDDLE → middle Pcdata,
      LAST → last Pcdata,
      AD-EL → address Pcdata,
      PRO-EL → project Pcdata,
      Pcdata → pcdata ϵ }

This Relax NG style makes translation of rules easy

slide-69
SLIDE 69

Translating Relax NG into tree grammars by example 3

grammar {
  start = element people { people-content }
  people-content = element person { person-content }+
  person-content = attribute age { text },
                   element name { name-content },
                   element address { text }+,
                   element project { project-content }*
  name-content = element first { text },
                 element middle { text }?,
                 element last { text }
  project-content = attribute type { text },
                    attribute id { text },
                    text }

Ignore! (the attribute patterns do not appear in the translation)

Translate into G = (N, Σ, S, P) with

N = {PEOPLE, P-C, PER-C, NA, NA-C, PERSON, PRO-C, ADR, PROJ, FIRST, MIDDLE, LAST, Pcdata}
Σ = {people, person, name, first, middle, last, address, project, pcdata}
S = {PEOPLE}
P = { PEOPLE → people P-C,
      P-C → PERSON, PERSON*,
      PERSON → person PER-C,
      PER-C → NA, ADR, ADR*, PROJ*,
      NA → name NA-C,
      ADR → address Pcdata,
      PROJ → project PRO-C,
      PRO-C → pcdata ϵ,
      NA-C → FIRST, (MIDDLE|ϵ), LAST,
      FIRST → first Pcdata,
      MIDDLE → middle Pcdata,
      LAST → last Pcdata,
      Pcdata → pcdata ϵ }

expand! (rules without a terminal, such as P-C → ... and PER-C → ..., need to be expanded)

This Relax NG style makes translation of rules less easy… and leads to generalized rules!

slide-70
SLIDE 70

Translating Relax NG into tree grammars by example 3

Two things we have already seen when translating WXS:

  • we might need to introduce “generalized” rules, which can and need to be expanded, as for WXS
  • we might have to “contextualise” names and types of elements: ...

... people-content = element person { person-content }+
..... person-content = attribute age { text },
                       element name {name-content},
                       element address { text }+,
                       element project {project-content}*

... PERSON → person PER-C,
    PER-C → NA, ADR, ADR*, PROJ*,
    NA → name NA-C,
    ADR → address Pcdata, ...

expand!

for each illegal rule X → e:

– remove X → e from the rule set
– replace all occurrences of X in the rule set with e
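The expansion step can be sketched in Python as follows. This is a sketch under assumptions, not the lecture's own code: content models are plain strings, normal rules are stored as `{N: (terminal, e)}`, generalized (illegal) rules as `{X: e}`, and the generalized rules are assumed not to refer to themselves (otherwise the loop would not terminate). The name `expand` is made up for this example.

```python
import re

# Non-terminal names as used on the slides, e.g. PER-C or NA^NA-C.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9^_-]*")


def expand(normal, generalized):
    """Replace every occurrence of a generalized rule's name by its body.

    normal:      {N: (terminal, content_model)}
    generalized: {X: content_model}   (rules without a terminal)
    """
    def substitute(e):
        while True:
            # Parenthesise the inserted body so operators keep their scope.
            new = NAME.sub(
                lambda m: "(" + generalized[m.group()] + ")"
                if m.group() in generalized else m.group(),
                e,
            )
            if new == e:          # nothing left to expand
                return e
            e = new

    return {n: (t, substitute(e)) for n, (t, e) in normal.items()}


normal = {"PEOPLE": ("people", "P-C"), "PERSON": ("person", "PER-C")}
generalized = {"P-C": "PERSON, PERSON*", "PER-C": "NA, ADR, ADR*, PROJ*"}
for n, (t, e) in expand(normal, generalized).items():
    print(n, "→", t, e)
```

The generalized rules themselves are simply not part of the result, which corresponds to removing X → e from the rule set.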

slide-71
SLIDE 71

Translating Relax NG into tree grammars by example 4

  • 2. we might have to “contextualise” names and types of elements, to handle schemas where the same element name is used in different contexts with different types:

... people-content = element person { person-content }+,
                     element friend {friend-content }+
..... person-content = attribute age { text },
                       element name {name-content}, ...
      friend-content = attribute age { text },
                       element name {friend-name-content}, ...

... P-C → PERSON, PERSON*, FRIEND, FRIEND*
    PERSON → person PER-C,
    FRIEND → friend FRIE-C,
    PER-C → NA^NA-C, ...
    FRIE-C → NA^FRIE-NA-C, ...
    NA^NA-C → name NA-C,
    NA^FRIE-NA-C → name FRIE-NA-C, ...

slide-72
SLIDE 72

Translating Relax NG into tree grammars

  • each Relax NG schema can be faithfully translated into a tree grammar:

– local? no: the example on the previous slide leads to competing non-terminals (NA^PER-C and NA^FRIE-C):

... NA^PER-C → name NA-C,
    NA^FRIE-C → name NA-C, ...

– single-type? no: see the example below, where NA^NA-C and NA^FO-NA-C compete and occur in the same RHS:

... person-content = attribute age { text },
                     element name {name-content} | element name {foreign-name-content}, ...

... PER-C → NA^NA-C | NA^FO-NA-C
    NA^NA-C → name NA-C,
    NA^FO-NA-C → name FO-NA-C, ...

– so is Relax NG as powerful as tree grammars?
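The two tests used above (do non-terminals compete, and do competing ones occur in the same RHS?) can be sketched in Python. The sketch assumes that generalized rules have already been expanded, so every rule is a triple (non-terminal, terminal, content model as a string); the function name `classify` and this representation are assumptions made for illustration.

```python
import re
from itertools import combinations

NAME = re.compile(r"[A-Za-z][A-Za-z0-9^_-]*")


def classify(rules, start):
    """Return "local", "single-type" or "regular" for a tree grammar.

    rules: list of (non_terminal, terminal, content_model) triples
    start: set of start symbols
    """
    terminals = {}
    for n, t, _ in rules:
        terminals.setdefault(n, set()).add(t)
    names = set(terminals)

    def compete(a, b):
        # Two different non-terminals compete if they share a terminal.
        return a != b and bool(terminals[a] & terminals[b])

    def has_competition(group):
        return any(compete(a, b) for a, b in combinations(sorted(group), 2))

    if not has_competition(names):
        return "local"
    # Single-type: competing non-terminals never share a content model,
    # and start symbols do not compete with each other.
    groups = [set(start) & names]
    groups += [set(NAME.findall(e)) & names for _, _, e in rules]
    if any(has_competition(g) for g in groups):
        return "regular"
    return "single-type"


print(classify([("PERSON", "person", "NA^NA-C | NA^FO-NA-C"),
                ("NA^NA-C", "name", "ϵ"),
                ("NA^FO-NA-C", "name", "ϵ")], {"PERSON"}))
```

Here NA^NA-C and NA^FO-NA-C both produce name and appear together in PERSON's content model, so the grammar is neither local nor single-type.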

slide-73
SLIDE 73

Relax NG schema is indeed as powerful as tree grammars

★ Every tree grammar can be faithfully translated into a Relax NG schema.

  • Proof (not too hard): given a tree grammar G = (N, Σ, S, P),
  • 1. translate each production rule N → t regexp in P into

N = element t { regexp }

(fortunately, the tree grammar regular expression syntax is very close to, and more strict than, Relax NG regular expression syntax)

  • 2. put the resulting statements into

grammar { start = N1 | ... | Nk ..... }

a grammar, where N1, ..., Nk are all start symbols, i.e., S = {N1, ..., Nk}

  • 3. call the resulting schema GS

★ Then T ∈ L(G) if and only if T validates against GS.
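The three proof steps can be written down as a small Python sketch that prints a schema in compact syntax. It is only an illustration under assumptions: each non-terminal has exactly one rule, non-terminal names are already valid Relax NG identifiers, and ϵ is rendered as Relax NG's `empty` pattern. The function name `to_relax_ng` is made up for this example.

```python
def to_relax_ng(rules, start):
    """Emit a Relax NG compact-syntax grammar for a tree grammar.

    rules: list of (non_terminal, terminal, regexp) triples
    start: set of start symbols
    """
    # Step 2: all start symbols become alternatives of "start".
    lines = ["grammar {", "  start = " + " | ".join(sorted(start))]
    for n, t, regexp in rules:
        # Step 1: N -> t regexp becomes N = element t { regexp }.
        body = regexp.replace("ϵ", "empty")
        lines.append("  %s = element %s { %s }" % (n, t, body))
    lines.append("}")
    # Step 3: the returned text is the schema GS.
    return "\n".join(lines)


print(to_relax_ng([("S", "a", "B, B*"),
                   ("B", "b", "(C, C) | C"),
                   ("C", "c", "ϵ | C")], {"S"}))
```

The example input is the small grammar with rules for a, b and c that serves as the running example for the validation algorithm later in these slides.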

slide-74
SLIDE 74

Tree Grammars and Schema Languages

(Figure: the grammar classes Loc, ST and Reg, labelled with DTD, WXS and Relax NG respectively)

with our knowledge

slide-75
SLIDE 75

Outlook: next steps

(Figure: the grammar classes Loc, ST, Reg)

✓ we have now seen that

  • DTDs correspond to local grammars
  • WXS corresponds to single-type grammars
  • RelaxNG corresponds to regular grammars

➡ DTDs are structurally weaker than WXS
➡ WXS is structurally weaker than RelaxNG

  • we will also look into how computationally expensive validation is

– against DTD/local grammar
– against WXS/single-type grammar
– against RelaxNG/regular grammar

➡ ...all roughly the same!

slide-76
SLIDE 76

How costly is validity testing? … Does it matter against which kind of schema? … Is Single-Type cheaper than general?

slide-77
SLIDE 77

Schema Languages and Tree Grammars

  • We will look at

– the problem of
– algorithms for

validating a document against a schema!

algorithm: input Tree T, Grammar G; output “yes”, if T ∈ L(G), “no”, otherwise

See the paper by Murata, Lee, Mani, Kawaguchi

slide-78
SLIDE 78
  • To design our “schema validator”,
  • 1. we start with the easy case: assume that G is local (this automatically gives us a validator for the structural aspect of DTDs)
  • 2. then expand the algorithm to single-type grammars (this automatically gives us a validator for the structural aspect of WXS)
  • 3. then expand to general tree grammars (...Relax NG)

ValAlgo: input Tree T, Grammar G; output “yes”, if T ∈ L(G), “no”, otherwise

– we also assume that we have a subroutine

MatchAlgo: input String w, regular expression e; output “yes”, if w ∈ L(e) (w matches e), “no”, otherwise

– to see how to build that one (it’s based on a translation of regular expressions into finite state machines (aka automata)),

  • remember your undergraduate studies (?)
  • read it up, e.g., in the textbook by Hopcroft, Ullman
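For experimenting, MatchAlgo can be approximated in Python with the standard `re` module instead of building an automaton by hand. This is a stand-in under assumptions, not the textbook construction: the content model e is a string in the slide notation (`,` for sequence, `|`, `*`, `+`, `?`, parentheses, ϵ), w is a list of non-terminal names, and the names `to_python_regex` and `match_algo` are made up for this sketch.

```python
import re

# One token is a non-terminal name, an operator, or the empty word.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9^_-]*|[()|*+?,ϵε]")


def to_python_regex(e):
    """Translate a content model into a Python regex over 'NAME ' tokens."""
    out = []
    for tok in TOKEN.findall(e):
        if tok == ",":
            continue                      # sequence is plain concatenation
        if tok in ("ϵ", "ε"):
            out.append("(?:)")            # matches the empty string
        elif tok in ("(", ")", "|", "*", "+", "?"):
            out.append(tok)
        else:
            # Each non-terminal is encoded as its name plus one space.
            out.append("(?:" + re.escape(tok) + " )")
    return "".join(out)


def match_algo(w, e):
    """Return True if the sequence of non-terminals w is in L(e)."""
    encoded = "".join(name + " " for name in w)
    return re.fullmatch(to_python_regex(e), encoded) is not None


print(match_algo(["B", "B"], "B,B*"))
print(match_algo(["C", "C", "C"], "(C,C)|C"))
```

The trailing space after every name keeps multi-letter non-terminals such as PER-EL from being confused with their prefixes.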

slide-79
SLIDE 79

ValAlgo: input XML doc/Tree T, local Grammar G; output “yes”, if T ∈ L(G), “no”, otherwise

(Figure: the grammar classes Loc, ST, Reg)

let’s start simple!

slide-80
SLIDE 80

General idea of algorithm

  • our algorithm visits a tree in a depth-first, left-to-right manner
  • whenever we visit a node on our way

– down, we push relevant information for this node on stacks
– up, we pop relevant information for this node from stacks

  • hence, whenever we are at a node n during this traversal, all relevant information regarding all ancestors of n is (in reverse order) on our stacks

Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
    if there is a production rule N → a e in P with a = E’s tag name
    then push N → a e onto R and push ϵ onto NT
    else report “not accepted” and stop
When an element E is visited on the way up,
    pop a rule N → a e out of R
    pop a string of non-terminals w out of NT
    if w matches e
    then pop a string w’ of non-terminals out of NT and push w’N onto NT
    else report “not accepted” and stop

slide-81
SLIDE 81

Input: DOM Tree for T, local tree grammar G = (N, Σ, S, P)
NT is a stack of strings of non-terminals (to store the NTs of child nodes)
R is a stack of production rules

Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
    if there is a production rule N → a e in P with a = E’s tag name (locality ⇒ unique)
    then push N → a e onto R (store the rule for E’s content in R)
         and push ϵ onto NT (start remembering E’s child nodes)
    else report “not accepted” and stop
When an element E is visited on the way up,
    pop a rule N → a e out of R (retrieve the rule for E’s content from R)
    pop a string of non-terminals w out of NT (retrieve E’s child nodes)
    if w matches e
    then pop a string w’ of non-terminals out of NT and push w’N onto NT (add E’s non-terminal to those of its preceding siblings)
    else report “not accepted” and stop
report “accepted” and stop

ValAlgo: input XML doc/Tree T, local Grammar G; output “yes”, if T ∈ L(G), “no”, otherwise

See the paper by Murata, Lee, Mani, Kawaguchi
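The algorithm above can be tried out with a short Python sketch built on the standard library's `xml.etree.ElementTree`. Several things here are assumptions of the sketch rather than part of the slides: non-terminals are single letters so that content models can be written directly as Python regexes, text nodes and attributes are ignored, the traversal is written recursively, and the root's non-terminal is additionally checked against the start symbols. The names `RULES`, `START` and `validate_local` are made up.

```python
import re
import xml.etree.ElementTree as ET

# Local grammar with S → a B,B*, B → b (C,C)|C, C → c ϵ|C, keyed by tag.
# Locality means each tag has at most one rule, so a dict lookup suffices.
# "B,B*" is written "BB*" and "ϵ|C" is written "(|C)" as Python regexes.
RULES = {
    "a": ("S", r"BB*"),
    "b": ("B", r"(CC)|C"),
    "c": ("C", r"(|C)"),
}
START = {"S"}


def validate_local(root, rules=RULES, start=START):
    R = []   # stack of production rules
    NT = []  # stack of strings of non-terminals

    def visit(elem):
        # On the way down: look up the unique rule for this tag.
        rule = rules.get(elem.tag)
        if rule is None:
            return False
        R.append(rule)
        NT.append("")              # start remembering the children
        for child in elem:
            if not visit(child):
                return False
        # On the way up: check the children against the content model.
        n, e = R.pop()
        w = NT.pop()
        if re.fullmatch(e, w) is None:
            return False
        if NT:
            NT.append(NT.pop() + n)    # push w'N
            return True
        return n in start              # root: start-symbol check (assumption)

    return visit(root)


doc = ET.fromstring("<a><b><c/><c/></b><b><c/></b></a>")
print(validate_local(doc))
```

The document in the last lines has the shape a( b( c, c ), b( c ) ), so the two b elements exercise both alternatives of the content model (C,C)|C.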

slide-82
SLIDE 82
  • Let’s see how the algorithm works:

– G = ({S,B,C},{a,b,c},{S},P) with P = { S → a B,B*, B → b (C,C)|C, C → c ϵ|C}
– T = a( b( c, c ), b( c ) )

  • Start: both stacks are empty.

R (stack of rules): empty
NT (stack of NT strings): empty

slide-83
SLIDE 83
  • Step: visit a on the way down; push S → a B,B* onto R and ϵ onto NT.

R (top first): S → a B,B*
NT (top first): ϵ

slide-84
SLIDE 84
  • Step: visit the first b on the way down; push B → b (C,C)|C onto R and ϵ onto NT.

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): ϵ ; ϵ

slide-85
SLIDE 85
  • Step: visit the first c on the way down; push C → c ϵ|C onto R and ϵ onto NT.

R (top first): C → c ϵ|C ; B → b (C,C)|C ; S → a B,B*
NT (top first): ϵ ; ϵ ; ϵ

slide-86
SLIDE 86
  • Step: visit the first c on the way up; pop C → c ϵ|C from R and w = ϵ from NT.

yes, ϵ ∈ L(ϵ|C)

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): ϵ ; ϵ

slide-87
SLIDE 87
  • Step: still at the first c on the way up; pop w’ = ϵ from NT and push w’C = C.

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): C ; ϵ

slide-88
SLIDE 88
  • Step: visit the second c on the way down; push C → c ϵ|C onto R and ϵ onto NT.

R (top first): C → c ϵ|C ; B → b (C,C)|C ; S → a B,B*
NT (top first): ϵ ; C ; ϵ

slide-89
SLIDE 89
  • Step: visit the second c on the way up; pop C → c ϵ|C from R and w = ϵ from NT.

yes, ϵ ∈ L(ϵ|C)

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): C ; ϵ

slide-90
SLIDE 90
  • Step: still at the second c on the way up; pop w’ = C from NT and push w’C = CC.

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): CC ; ϵ

slide-91
SLIDE 91
  • Step: visit the first b on the way up; pop B → b (C,C)|C from R and w = CC from NT.

yes, CC ∈ L((C,C)|C)

R (top first): S → a B,B*
NT (top first): ϵ

slide-92
SLIDE 92
  • Step: still at the first b on the way up; pop w’ = ϵ from NT and push w’B = B.

R (top first): S → a B,B*
NT (top first): B

slide-93
SLIDE 93
  • Step: visit the second b on the way down; push B → b (C,C)|C onto R and ϵ onto NT.

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): ϵ ; B

slide-94
SLIDE 94
  • Step: visit the third c on the way down; push C → c ϵ|C onto R and ϵ onto NT.

R (top first): C → c ϵ|C ; B → b (C,C)|C ; S → a B,B*
NT (top first): ϵ ; ϵ ; B

slide-95
SLIDE 95
  • Step: visit the third c on the way up; pop C → c ϵ|C from R and w = ϵ from NT.

yes, ϵ ∈ L(ϵ|C)

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): ϵ ; B

slide-96
SLIDE 96
  • Step: still at the third c on the way up; pop w’ = ϵ from NT and push w’C = C.

R (top first): B → b (C,C)|C ; S → a B,B*
NT (top first): C ; B

slide-97
SLIDE 97
  • Step: visit the second b on the way up; pop B → b (C,C)|C from R and w = C from NT.

yes, C ∈ L((C,C)|C)

R (top first): S → a B,B*
NT (top first): B

slide-98
SLIDE 98
  • Let’s see how the algorithm works:

– G = ({S,B,C},{a,b,c},{S},P) with P = { S → a B,B*, B → b (C,C)|C, C → c ϵ|C}

[Figure: tree T with nodes labelled a, c, c, b, c, b]

R: S → a B,B* (just popped: B → b (C,C)|C)
NT: BB

slide-99
SLIDE 99
  • Let’s see how the algorithm works:

– G = ({S,B,C},{a,b,c},{S},P) with P = { S → a B,B*, B → b (C,C)|C, C → c ϵ|C}

[Figure: tree T with nodes labelled a, c, c, b, c, b]

R: S → a B,B*
NT: BB

yes, BB ∈ L(B,B*)

slide-100
SLIDE 100
  • Let’s see how the algorithm works:

– G = ({S,B,C},{a,b,c},{S},P) with P = { S → a B,B*, B → b (C,C)|C, C → c ϵ|C}

[Figure: tree T with nodes labelled a, c, c, b, c, b]

“accepted” (“yes”), T ∈ L(G)

Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
    if there is a production rule N → a e in P with a = E’s tag name
    then push N → a e onto R and push ϵ onto NT
    else report “not accepted” and stop
When an element E is visited on the way up,
    pop a rule N → a e out of R
    pop a string of non-terminals w out of NT
    if w matches e
    then pop a string w’ of non-terminals out of NT and push w’N onto NT
    else report “not accepted” and stop
report “accepted” and stop

☜ Check slide 74

ValAlgo: input is an XML doc/tree T and a local grammar G; output is “yes” if T ∈ L(G), “no” otherwise

slide-101
SLIDE 101

Validating trees against tree grammars

  • want to implement this algorithm?
– walk the DOM tree in a depth-first, left-to-right way, or
– use a SAX parser and do it in a streaming fashion
  • ...and we can use this algorithm for general DTDs!
  • ...next week, we’ll see how this works for
– single-type tree grammars (and WXS)
    • rather straightforward because we still only have at most one run of our tree grammar on the input tree
    • remember: single-type means that no start symbols compete and no RHS of any rule contains competing non-terminals
    • so we won’t need to guess production rules, just check the content model of the predecessor node
– general tree grammars (and Relax NG)...
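The stack-based algorithm walked through on the previous slides can be sketched in Python in the streaming style suggested above. This is a minimal sketch, not a reference implementation. It assumes that every non-terminal is a single character, so that content models can be written as Python regular expressions (ϵ|C becomes C?). It uses `xml.etree.ElementTree.iterparse` in place of a SAX parser. The names `RULES`, `START`, and `validate`, and the shape of the example trees, are illustrative assumptions. The final check of the root's non-terminal against the start symbols is likewise an assumption about the acceptance step.

```python
import io
import re
import xml.etree.ElementTree as ET

# Production rules N -> a e of the example grammar, indexed by terminal a.
# Content models are regular expressions over one-character non-terminals
# (an encoding chosen for this sketch).
RULES = {
    "a": ("S", "BB*"),     # S -> a  B,B*
    "b": ("B", "(CC)|C"),  # B -> b  (C,C)|C
    "c": ("C", "C?"),      # C -> c  ϵ|C
}
START = {"S"}


def validate(xml_text, rules=RULES, start=START):
    """Return True if the document is accepted by the local tree grammar."""
    R = []   # stack of rules (non-terminal, content model)
    NT = []  # stack of strings of non-terminals
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":              # element visited on the way down
            if elem.tag not in rules:
                return False              # no rule N -> a e with a = tag name
            R.append(rules[elem.tag])
            NT.append("")                 # push ϵ
        else:                             # element visited on the way up
            nonterminal, content_model = R.pop()
            w = NT.pop()
            if re.fullmatch(content_model, w) is None:
                return False              # w does not match e
            if NT:
                NT.append(NT.pop() + nonterminal)  # replace w' by w'N
            elif nonterminal not in start:
                return False              # root must carry a start symbol
    return True
```

For instance, `validate("<a><b><c/><c/></b><b><c/></b></a>")` returns `True`, whereas `validate("<a><c/></a>")` returns `False`, because the string C does not match B,B*.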


slide-102
SLIDE 102

XSLT


slide-103
SLIDE 103

XSLT: general stuff

  • XSLT 1.0 has been a W3C standard since 1999
– see http://www.w3.org/TR/xslt
– makes heavy use of XPath 1.0
  • XSLT 2.0 has been a W3C standard since January 2007
– see http://www.w3.org/TR/xslt20
– makes heavy use of XPath 2.0
  • XSLT is a descendant of the style-sheet language XSL, but has become independent
  • XSLT is a Turing-complete, functional programming language, designed for the transformation of XML documents into Unicode streams, where transformation includes the
– selection of parts of the source document,
– their re-arrangement, and
– the derivation of new content

slide-104
SLIDE 104


XSLT: general stuff -- what is XSLT

XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary. XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL.

from: http://www.w3.org/TR/xslt


slide-105
SLIDE 105

XSL = XSLT ∪ XSL-FO

  • XSL, the eXtensible Stylesheet Language, consists of 2 parts:
– XSL Transformations (XSLT), which we will discuss here, and
– XSL Formatting Objects (XSL-FO), an XML-based formalism to describe the layout of a document
  • often, we
– first use XSLT to transform, filter & shape an XML document, and
– then use XSL-FO or CSS to specify its layout, e.g.,
    • margins, text boxes, padding, footers, etc.
– currently, XSL-FO is (in contrast to CSS) little used & little supported
– but XSL-FO is a bit more powerful than CSS…

slide-106
SLIDE 106

XSLT: stylesheet

  • an XSLT stylesheet is a
– well-formed XML document
– conforming to XML Namespaces
– which uses elements from the namespace http://www.w3.org/1999/XSL/Transform
– traditionally using “xsl” as a prefix for this namespace, as in
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>
– whose root element is xsl:transform or xsl:stylesheet
  • stylesheet is a synonym for transformation, both in documentation and in XML documents, i.e., a transformation can have root element
– xsl:transform or
– xsl:stylesheet

slide-107
SLIDE 107

Is XSLT Schema Aware?

  • Information from a schema can be used both
– statically: when the stylesheet is compiled, and
– dynamically: during evaluation of the stylesheet to transform a source document.
  • In a stylesheet (e.g., in XPath expressions and patterns), we may refer to named types from a schema (e.g., Person from <xs:complexType name="Person">)
  • The conformance rules for XSLT 2.0 distinguish between a
– basic XSLT processor and a
– schema-aware XSLT processor
– in <oXygen/>, you have both
  • Helpful: http://www.ibm.com/developerworks/xml/library/x-schemaxslt.html

slide-108
SLIDE 108

XSLT: stylesheet

  • a stylesheet describes/tells an XSLT processor how to transform a source tree into a result tree (or text)
  • via XML template rules which associate patterns with templates
  • which are then used by an XSLT processor as follows:
– match pattern against elements in source tree
– instantiate corresponding template to create parts of the result tree

slide-109
SLIDE 109

XSLT: stylesheet

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mine="...">
      top-level-elements
    </xsl:stylesheet>

Alternatively:

    <xsl:transform version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mine="...">
      top-level-elements
    </xsl:transform>

An xsl:stylesheet can have zero or more of each of the following elements in (almost) any order:

    xsl:import
    xsl:include
    xsl:strip-space
    xsl:preserve-space
    xsl:output
    xsl:key
    xsl:decimal-format
    xsl:namespace-alias
    xsl:attribute-set
    xsl:variable
    xsl:param
    xsl:template

later and in more detail

slide-110
SLIDE 110

XSLT: top-level elements

  • xsl:include: a multi-document or modularity mechanism
– results in the union of the documents
– origin of declarations has no effect on their priority
  • xsl:import: a slightly different multi-document or modularity mechanism
– like xsl:include, but local declarations are “stronger” than imported ones
– can only occur as child nodes of the root element & before other elements
  • xsl:strip-space and xsl:preserve-space
– to specify element names for which white space should be removed/preserved
– preserving white space is the default
– e.g.,
    <xsl:strip-space elements="year city country"/>
    <xsl:preserve-space elements="name title description"/>
  • xsl:key
– to declare a named key to be used in the style sheet with the key() function
– note: the key does not have to be unique!

slide-111
SLIDE 111


XSLT: top-level elements …more..

Very specialized top-level elements

  • xsl:decimal-format

– to specify how to convert numbers into strings

  • xsl:namespace-alias

– to specify namespace replacement

  • xsl:attribute-set

– to name sets of attributes to be used in output


slide-112
SLIDE 112

XSLT elements: template rule

  • (most important element!) a template rule is of the form
    <xsl:template match="expression" name="qname" priority="number" mode="qname">
      parameter-list
      template-def
    </xsl:template>
– the match expression is the pattern, template-def is the template
– name, priority, and mode are optional
  • parameter-list is a list of zero or more xsl:param elements
  • as expression, an XPath location path can be used
– with some restrictions, e.g., it must evaluate to a node set
– for XSLT 1.0, use XPath 1.0,
– for XSLT 2.0, use XPath 2.0
  • template-def is an XML fragment that makes use of other XSLT elements
– including instructions such as xsl:apply-templates or xsl:copy-of

slide-113
SLIDE 113

XSLT elements: template rules

    <xsl:template match=expression name=qname priority=number mode=qname>
      parameter-list
      template-def
    </xsl:template>

  • Example: when applied to “<emph>important</emph>”,
    <xsl:template match="emph">
      <fo:inline-sequence font-weight="bold">
        <xsl:apply-templates/>
      </fo:inline-sequence>
    </xsl:template>
yields
    <fo:inline-sequence font-weight="bold">
      important
    </fo:inline-sequence>
  • careful:
– there are various built-in template rules
– there is a default prioritisation on template rules
– it is the XSLT processor that fires the template rules
  • we will see later what elements we can use in template-def

slide-114
SLIDE 114

XSLT elements: processing model, sketched

  • an XSLT processor takes an XML document d with associated stylesheet s
  • processes the (XPath DM) tree (possibly the PSVI if schema-aware) corresponding to d
  • in a depth-first manner
– thus we always have a context node
  • applies those template rules to the context node that
– match the context node and
– have highest priority
  • thereby generating the result tree according to the template rules
  • the easiest way to generate output is to use literal result elements, such as those highlighted (blue and green on the slide) in the previous example:
    <xsl:template match="emph">
      <fo:inline-sequence font-weight="bold">
        <xsl:apply-templates/>
      </fo:inline-sequence>
    </xsl:template>

slide-115
SLIDE 115

XSLT elements: processing model by example

consider the following source tree:

[Figure: source tree with a root node above people; each person has an age attribute (e.g., age=41), a name with children first (Harry) and last (Potter), and an address (4 Main Road)]

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="my.xsl"?>
    <people>
      <person age="41">
        <name>
          <first>Harry</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
      </person>
      <person age="43">
        <name>
          <first>Tony</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
      </person>
    </people>

slide-116
SLIDE 116

XSLT elements: processing model by example

consider this source tree with the following XSLT stylesheet:

[Figure: source tree of the people document (root, people, person, age=41, name with first and last, address)]

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
    </xsl:stylesheet>

what does this seemingly empty (no template rules!) stylesheet produce?

slide-117
SLIDE 117

XSLT elements: processing model by example

(tricky!) the previous stylesheet was only seemingly empty because XSLT processors employ built-in template rules; thus templates are applied to all nodes (element, root, text,..):

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="*|/">
        <xsl:apply-templates/>
      </xsl:template>
      <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
      </xsl:template>
      <xsl:template match="processing-instruction()|comment()"/>
    </xsl:stylesheet>

(1) match="*|/": for all element & document nodes
(2) <xsl:apply-templates/>: don’t do anything but apply templates to all child nodes
(3) match="text()|@*": for all text and attribute nodes
(4) <xsl:value-of select="."/>: return their value
(5) match="processing-instruction()|comment()": ignore p-i & comments
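The effect of these built-in rules on the seemingly empty stylesheet can be imitated in a few lines of Python. This is a sketch under assumptions, not a description of how an XSLT processor is implemented. `apply_builtin_templates` is a hypothetical helper, `xml.etree.ElementTree` stands in for the XPath data model (its default parser already drops comments and processing instructions), and the input is a shortened, whitespace-free version of the people document.

```python
import xml.etree.ElementTree as ET


def apply_builtin_templates(elem):
    """Imitate the built-in template rules for an element node."""
    # Text nodes: output their value. ElementTree keeps the text before the
    # first child in .text and the text following a child in that child's .tail.
    parts = [elem.text or ""]
    for child in elem:
        # Element nodes: output nothing themselves, just recurse into children.
        parts.append(apply_builtin_templates(child))
        parts.append(child.tail or "")
    # Attributes are never visited, because only child nodes are processed.
    return "".join(parts)


doc = ET.fromstring(
    '<people><person age="41"><name><first>Harry</first><last>Potter</last></name>'
    '<address>4 Main Road</address></person></people>'
)
print(apply_builtin_templates(doc))
```

Only the text content survives in the output. The value of the age attribute does not appear, because the recursion only ever follows child nodes.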


slide-118
SLIDE 118

XSLT elements: processing model by example

Built-in template rules:

(b) <xsl:template match="*|/">
      <xsl:apply-templates select="node()"/>
    </xsl:template>

select="node()" is the default for “apply-templates”, and node() matches any node other than an attribute node and the root node

if you want your stylesheet to consider attribute nodes, you must override this default, e.g. like this:

(1) <xsl:template match="*|/">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:template>

If we use template rule (1), then it overrides built-in (b), hence rules are now applied to all nodes (element, root, text,..) including attribute nodes, but still excluding namespace nodes

slide-119
SLIDE 119

XSLT elements: processing model by example

what does this slightly more elaborate stylesheet yield?

[Figure: source tree of the people document (root, people, person, age=41, name with first and last, address)]

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="person">
        <xsl:text> Person found! </xsl:text>
      </xsl:template>
    </xsl:stylesheet>

Note: <xsl:text> is superfluous here, but helpful

slide-120
SLIDE 120

XSLT elements: processing model by example

we can make use of “functions” to retrieve the “value” of a node:

[Figure: source tree of the people document (root, people, person, age=41, name with first and last, address)]

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="person">
        Person found called:
        <xsl:value-of select="name"/>
      </xsl:template>
    </xsl:stylesheet>

slide-121
SLIDE 121

XSLT elements: processing model by example

we can conveniently copy a node and its complete sub-tree:

[Figure: source tree of the people document (root, people, person, age=41, name with first and last, address)]

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="people">
        <family>
          <xsl:copy-of select="child::*"/>
        </family>
      </xsl:template>
    </xsl:stylesheet>

alternatively, I could have used <xsl:copy-of select="*"/>

slide-122
SLIDE 122

XSLT elements: processing model by example

we can re-name elements and filter out data:

[Figure: source tree of the people document (root, people, person, age=41, name with first and last, address)]

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="person">
        <myFriend>
          <xsl:apply-templates select="@*|*|text()"/>
        </myFriend>
      </xsl:template>
      <xsl:template match="@*|text()|*">
        <xsl:copy>
          <xsl:apply-templates select="@*|text()|*"/>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="address"/>
    </xsl:stylesheet>
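To make the effect of this rename-and-filter stylesheet concrete, here is a rough Python analogue using `xml.etree.ElementTree`. It only sketches the outcome (person renamed to myFriend, address dropped, everything else copied), not how an XSLT processor evaluates template rules. The function name `rename_and_filter` and the shortened one-person input are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def rename_and_filter(people):
    """Return a copy of the tree with person renamed and address removed."""
    result = copy.deepcopy(people)            # leave the source tree untouched
    for person in list(result.iter("person")):
        for address in person.findall("address"):
            person.remove(address)            # mirrors <xsl:template match="address"/>
        person.tag = "myFriend"               # mirrors the literal <myFriend> element
    return result


source = ET.fromstring(
    '<people><person age="41"><name><first>Harry</first><last>Potter</last></name>'
    '<address>4 Main Road</address></person></people>'
)
print(ET.tostring(rename_and_filter(source), encoding="unicode"))
```

The age attribute is kept on the renamed element, corresponding to the @* part of the select expressions in the stylesheet.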


slide-123
SLIDE 123

XSLT elements: processing model by example

we can even apply several rules to the same elements using modes for rules:

[Figure: source tree of the people document (root, people, person, age=41, name with first and last, address)]

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs" version="2.0">
      <xsl:template match="/people">
        <html><body>
          <ol>
            <xsl:apply-templates select="person" mode="o"/>
          </ol>
          <xsl:apply-templates select="person" mode="f"/>
        </body></html>
      </xsl:template>
      <xsl:template match="person" mode="o">
        <li>
          <xsl:value-of select="name/first"/>
          <xsl:value-of select="name/last"/>
        </li>
      </xsl:template>
      <xsl:template match="person" mode="f">
        <p>
          Last name: <xsl:value-of select="name/last"/>
          Age: <xsl:value-of select="@age"/>
        </p>
      </xsl:template>
    </xsl:stylesheet>

slide-124
SLIDE 124

XSLT instructions: apply-templates

a statement A = <xsl:apply-templates select=location-path mode=mode-name>

  • can only have child nodes of the following two types, but any number of these:
– xsl:with-param to pass parameters into template rules
– xsl:sort to sort the children before processing (and thereby to sort the output)
  • location-path is a (restricted) XPath expression, evaluated from the current node to select a node set S
  • an XSLT processor applies all A-applicable template rules to all nodes in S
– in either document order or in the one given through xsl:sort children
  • a template rule <xsl:template match=m1 priority=p1 mode=m2> is A-applicable to a node n if
– n is in the node set selected by m1 (in addition to being in S)
– (in case a mode is used) m2 = mode-name
– and it has highest priority (incl. default, order, and explicit priorities)

slide-125
SLIDE 125

XSLT instructions: call-template

<xsl:call-template name = qname>

  • like xsl:apply-templates, but
– it requires the “name” attribute
– only the template rule called qname (i.e., with name=qname set) is applied
– if there is more than one template rule with the same name at the same import level, this will lead to an error

slide-126
SLIDE 126

XSLT instructions: value-of

<xsl:value-of select=expression/>

  • is one of the generating instructions provided by XSLT
  • it returns, for the first node selected through expression, the string value that corresponds to that node, where the string value of
– a text node is its text
– an attribute node is its value
– an element or root node is the concatenation of the string values of all its descendant text nodes
  • ...all this is a bit more tricky if you use schema-aware (SA) XSLT
– because then, we have more than “text” in text nodes, and need to take types into account...

slide-127
SLIDE 127

XSLT elements: generating instructions

  • literal result elements: a simple way to create new nodes, e.g., in
    <xsl:template match="person">
      <Employee>
        <xsl:apply-templates/>
      </Employee>
    </xsl:template>
  • <xsl:text>: to produce pure text (and raise an error if elements are produced), e.g., in
    <xsl:template match="person">
      <xsl:text> Person found! </xsl:text>
    </xsl:template>
  • <xsl:element name="qname">: to create a new element called qname in the result tree, whose content is given by the child nodes of that instruction, e.g., in
    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>

slide-128
SLIDE 128

XSLT elements: generating instructions

  • <xsl:attribute> to produce an attribute, e.g., in
    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:attribute name="alter">
          <xsl:value-of select="@age"/>
        </xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>
  • (already seen) <xsl:value-of select=expression/> returns, for each node selected through expression (in XSLT 1.0: only for the first such node), the string value that corresponds to that node, where the string value of a
– text node is its text
– attribute node is its value
– element or root node is the concatenation of the string values of all its descendant text nodes

slide-129
SLIDE 129

XSLT elements: generating instructions

  • <xsl:copy-of select=expression> produces a node set selected through expression. It can be used to reuse fragments of the source document. Careful:
– <xsl:value-of> converts fragments into a string before copying it into the result tree
– <xsl:copy-of> copies the complete fragment based on the (required) select attribute, without first converting the fragment into a string
– e.g.,
    <xsl:template match="people">
      <family><xsl:copy-of select="*"/></family>
    </xsl:template>
  • <xsl:copy use-attribute-sets="..."> simply copies the current node and then applies the template (in case it contains a template as child nodes)
– namespaces are included automatically in the copy
– attributes are not automatically included; they can be included via the “use-attribute-sets” attribute
– e.g.,
    <xsl:template match="people">
      <family>
        <xsl:for-each select="person">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:for-each>
      </family>
    </xsl:template>
  • <xsl:number> can be used to increase running numbers (beyond this class)

slide-130
SLIDE 130

XSLT elements: conditional processing

  • <xsl:if test="bool-exp"> does the obvious: if bool-exp evaluates to true, its child nodes are processed, otherwise nothing. E.g., in
    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:if test="@age > 0">
          <xsl:attribute name="alter">
            <xsl:value-of select="@age"/>
          </xsl:attribute>
        </xsl:if>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>
  • <xsl:choose> in combination with <xsl:when test="bool-exp"> and <xsl:otherwise> provides a useful means for case distinctions, e.g., in
    <xsl:choose>
      <xsl:when test="price &gt; 10">
        <td bgcolor="#ff00ff"><xsl:value-of select="artist"/></td>
      </xsl:when>
      <xsl:otherwise>
        <td><xsl:value-of select="artist"/></td>
      </xsl:otherwise>
    </xsl:choose>

slide-131
SLIDE 131

XSLT elements: conditional processing

  • <xsl:for-each select="expression"> can be used to process all nodes selected through expression, e.g., in
    <xsl:template match="/people">
      <table cellspacing="3" cellpadding="3" width="450" border="1">
        <tbody>
          <tr><td>First Name</td><td>Last Name</td><td>Age</td></tr>
          <xsl:for-each select="person">
            <xsl:sort select="name/last"/>
            <tr>
              <td><xsl:value-of select="name/first"/></td>
              <td><xsl:value-of select="name/last"/></td>
              <td><xsl:value-of select="@age"/></td>
            </tr>
          </xsl:for-each>
        </tbody>
      </table>
    </xsl:template>
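The combination of xsl:for-each and xsl:sort in this example can be mirrored in Python to see which rows come out in which order. This is a loose analogue under assumptions: `people_rows` is a hypothetical helper built on `xml.etree.ElementTree`, and the third person in the input (Ann Baker) is made up here so that the sort has a visible effect.

```python
import xml.etree.ElementTree as ET


def people_rows(people):
    """Return (first, last, age) for every person, sorted by last name."""
    # sorted() is stable, so persons with equal last names keep document order.
    persons = sorted(people.findall("person"),
                     key=lambda p: p.findtext("name/last", default=""))
    return [(p.findtext("name/first", default=""),
             p.findtext("name/last", default=""),
             p.get("age", ""))
            for p in persons]


people = ET.fromstring(
    '<people>'
    '<person age="41"><name><first>Harry</first><last>Potter</last></name></person>'
    '<person age="43"><name><first>Tony</first><last>Potter</last></name></person>'
    # hypothetical extra entry, not part of the slides' document
    '<person age="30"><name><first>Ann</first><last>Baker</last></name></person>'
    '</people>'
)
for row in people_rows(people):
    print(row)
```

Each tuple corresponds to one table row of the template: the two name values come from child elements, the age from the attribute.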


slide-132
SLIDE 132

XSLT…

  • many more things are provided by XSLT,
  • you are cordially invited to
– find out more about them
– experiment with schema awareness
    • see nice features and complications
– experiment with namespaces
– (and with SA and namespaces)
– gain your own experience using <oXygen/>
– have a look, e.g., at the influence of the order of template rules on the result!
– think about how one would compare XSLT and XQuery
    • their (dis)advantages
    • when would you use/recommend which?
    • do we need both?