Internal to External: Or, spill your guts (Monday, 29 October 2012)



slide-1
SLIDE 1

Internal to External

Or, spill your guts


Monday, 29 October 2012

slide-2
SLIDE 2

Self-description

  • As standard description

– A series of octets
– A series of Unicode characters
– A series of “events”

  • SAX perspective
  • E.g., Start/End tags
  • Events are tokens

– A tree structure

  • A DOM/Infoset

– A tree of a certain shape

  • A Validated Infoset

– An adorned tree of a certain shape

  • A PSVI wrt a WXS

– A picture (or document, or action, or…)

  • Application meaning

Well-formed
Only one way to parse it

Internal (DTD and doc are one)

External (Schema and doc are separate; out-of-band description)
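The "series of events" and "tree structure" views above can be compared on one small document. The following is a minimal sketch using Python's standard library; the class name `EventLogger`, the helper `sax_events`, and the sample string `DOC` are illustrative names, not part of the slides.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Sample document (same shape as Test.xml on the next slide).
DOC = '<a><b/><b c="bar"/></a>'


class EventLogger(xml.sax.ContentHandler):
    """Record start/end tag events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(text):
    """Return the event (token) view of an XML string."""
    handler = EventLogger()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


# Event view: a flat sequence of tokens.
print(sax_events(DOC))

# Tree view: the same document as a nested structure.
root = ET.fromstring(DOC)
print(root.tag, [child.tag for child in root])
```

The event view keeps nothing in memory beyond what the handler stores, while the tree view materialises the whole document.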

slide-3
SLIDE 3

Roundtripping Fail: Defaults

Test.xml
<a>
  <b/>
  <b c="bar"/>
</a>

sparse.dtd
<!ELEMENT a (b)+>
<!ELEMENT b EMPTY>
<!ATTLIST b c CDATA #IMPLIED>

full.dtd
<!ELEMENT a (b)+>
<!ELEMENT b EMPTY>
<!ATTLIST b c CDATA 'foo'>

Validate, then Query:
  • against sparse.dtd: count(//@c) = 1
  • against full.dtd: count(//@c) = 2

Validate, then Serialize:

Test-sparse.xml
<a>
  <b/>
  <b c="bar"/>
</a>

Test-full.xml
<a>
  <b c="foo"/>
  <b c="bar"/>
</a>

Can we think of Test-sparse and -full as “the same”?

Note: In oXygen, one needs to use internal validation.
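The effect of attribute defaults can be reproduced outside oXygen. This is a sketch only: it assumes the DTD is placed in the internal subset, and it relies on the expat parser behind Python's `xml.etree.ElementTree` reporting defaulted attributes (expat does not validate, it only applies the declarations). The names `BODY`, `SPARSE`, `FULL`, and `count_c` are illustrative.

```python
import xml.etree.ElementTree as ET

BODY = '<a><b/><b c="bar"/></a>'

# The two DTDs from the slide, written as internal subsets.
SPARSE = ('<!DOCTYPE a [<!ELEMENT a (b)+><!ELEMENT b EMPTY>'
          '<!ATTLIST b c CDATA #IMPLIED>]>')
FULL = ('<!DOCTYPE a [<!ELEMENT a (b)+><!ELEMENT b EMPTY>'
        '<!ATTLIST b c CDATA "foo">]>')


def count_c(doctype):
    """Rough analogue of count(//@c): elements that carry a c attribute."""
    root = ET.fromstring(doctype + BODY)
    return sum(1 for elem in root.iter() if "c" in elem.attrib)


print(count_c(SPARSE), count_c(FULL))
```

The same element content yields different query answers depending on which declarations the parser saw, which is the roundtripping problem the slide describes.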

slide-4
SLIDE 4

Not self-describing!

  • Under external validation
  • Not just legality, but content!

– The PSVIs have different information in them!


slide-5
SLIDE 5

Roundtripping “Success”: Types

Test.xml
<a>
  <b/>
  <b/>
</a>

bare.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="b" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="b"/>
</xs:schema>

typed.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a"/>
  <xs:complexType name="atype">
    <xs:sequence>
      <xs:element ref="b" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="b" type="btype"/>
  <xs:complexType name="btype"/>
</xs:schema>

Validate, then Query:
  • against bare.xsd: count(//b) = 2
  • against typed.xsd: count(//b) = 2

Note: In oXygen, one needs to use internal validation.
Note: WXS can do default attributes as well.

slide-6
SLIDE 6

Roundtripping “Success”: Types

Test.xml
<a>
  <b/>
  <b/>
</a>

bare.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="b" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="b"/>
</xs:schema>

typed.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a"/>
  <xs:complexType name="atype">
    <xs:sequence>
      <xs:element ref="b" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="b" type="btype"/>
  <xs:complexType name="btype"/>
</xs:schema>

Validate, then Query:
  • against bare.xsd: count(//b) = 2
  • against typed.xsd: count(//b) = 2

Validate, then Query2:
  • against bare.xsd: count(//element(*,btype)) = ?
  • against typed.xsd: count(//element(*,btype)) = 2

Note: In oXygen, one needs to use internal validation.
Note: WXS can do default attributes and elements as well.


slide-8
SLIDE 8

Roundtripping “Success”: Types

Test.xml
<a>
  <b/>
  <b/>
</a>

bare.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="b" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="b"/>
</xs:schema>

typed.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a"/>
  <xs:complexType name="atype">
    <xs:sequence>
      <xs:element ref="b" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="b" type="btype"/>
  <xs:complexType name="btype"/>
</xs:schema>

Validate, then Query:
  • against bare.xsd: count(//b) = 2
  • against typed.xsd: count(//b) = 2

Validate, then Query2:
  • against bare.xsd: count(//element(*,btype)) = ?
  • against typed.xsd: count(//element(*,btype)) = 2

Validate, then Serialize:

Test.xml
<a>
  <b/>
  <b />
</a>

Does external through internal succeed?
Does internal through external succeed?

Note: In oXygen, one needs to use internal validation.
Note: WXS can do default attributes as well.

slide-9
SLIDE 9

XSLT


slide-10
SLIDE 10

XSLT: general stuff

  • XSLT 1.0 has been a W3C standard since 1999

– see http://www.w3.org/TR/xslt
– makes heavy use of XPath 1.0

  • XSLT 2.0 has been a W3C standard since January 2007

– see http://www.w3.org/TR/xslt20
– makes heavy use of XPath 2.0

  • XSLT is a Turing-complete*, functional programming language, designed for the transformation of XML documents into XML documents (among other things), where transformation includes the

– selection of parts of the source document,
– their re-arrangement, and
– the derivation of new content

* A proof: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.8846&rep=rep1&type=pdf

slide-11
SLIDE 11

XSLT: stylesheet

  • an XSLT stylesheet is a

– well-formed (namespace-aware) XML document
– which uses elements from the namespace http://www.w3.org/1999/XSL/Transform
– traditionally using “xsl” as the prefix for this namespace, as in

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>

  • stylesheet is a synonym for transformation, both in documentation and in XML documents
  • a stylesheet is (at the core) a set of

– function definitions, and
– template rules

  • XSLT relies heavily on XPath (2.0)

– though it is strictly more expressive*

* The Complexity of XPath Query Evaluation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.7936

slide-12
SLIDE 12

But, wait XQuery!??!!

  • XQuery is a Turing-complete, functional programming language, based on XPath with functions which can transform XML documents....

[Diagram: overlap of XQuery and XSLT]

  • Shared: XPath (2.0), Various Control Structures, User Defined (Recursive) Functions
  • XQuery: FLWOR, SQLish Syntax
  • XSLT: Template Rules, XML Syntax
  • Both can be Schema Aware!

slide-13
SLIDE 13

Syntax!

  • XSLT is an XML format

– Thus “homoiconic”

  • XSLT code is a first class kind of data

– Indeed, an instance of THE main XSLT data structure (i.e., XML DOMs)

  • Very easy to write XSLT to manipulate XSLT
  • XSLT is a “2 syntax” language

– XPath is a key component
– XPath is not homoiconic, nor is it XML based

  • Thus the integration is a bit odd

Example from: http://www.xml.com/lpt/a/1549

slide-14
SLIDE 14

Benefits?

The fact that an XSLT stylesheet is a well-formed XML document has a number of advantages, however:

  • Stylesheets can be used as the input or output of a transformation (this is surprisingly common in practice)
  • XSLT can be embedded in other XML-based languages, and can in turn have other XML-based languages embedded within it. For example, this enables XSLT to support embedded schemas (a schema embedded within a stylesheet) in a way that XQuery can not. Similarly, XSLT can be easily embedded in pipeline processing languages such as Orbeon's XPL.
  • Because XSLT is XML, rather than merely mimicking XML, the same parser technology can be reused, the whole range of XML techniques can be used when writing stylesheets (for example, use of external entities and CDATA sections) and there are no surprises in store for a user who knows the rules of XML.

... Another benefit I have seen from using XML syntax is that it makes the grammar of XSLT much more easily extensible than that of XQuery. Because it tries to make do without any reserved words, and because it mixes a number of different syntactic styles, the grammar of XQuery is a delicate creature. Adding new features like a full-text search capability requires very careful analysis to ensure that no grammatical ambiguities are introduced. By contrast, it's very easy to extend XSLT with new instructions or new attributes, without any risk of ambiguities or backwards incompatibilities. This means that it's quite possible for such extensions to be implemented by vendors (or even by third parties) as well as by the XSL Working Group itself.

Comparing XSLT and XQuery
http://www.mscs.mu.edu/~praveen/Teaching/fa05/AdvDb/PaperTeams/XSLT2XQTeam/Comparing%20XSLT%20and%20XQuery.htm

slide-15
SLIDE 15

Syntax! (a fragment of XSLT)

<xsl:template match="pig-rescue" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <html>
    <head>
      <link rel="stylesheet" type="text/css" href="bdr.css" />
      <title>The Pigs</title>
    </head>
    <body>
      <div align="center">
        <h1>The Pigs At Belly Draggers Ranch</h1>
      </div>
      <ul>
        <xsl:apply-templates select="animal[position() mod $perPage = 1]" mode="indexList" />
      </ul>
    </body>
  </html>
</xsl:template>

Example from: http://www.xml.com/lpt/a/1549

Legend:
  • Literal Constructors (Data! XML!)
  • XSLT Syntax (Code! XML!)
  • XPath Syntax (Code! Not XML!)

slide-16
SLIDE 16

Syntax! (a fragment of XQuery)

let $doc := fn:input()/pig-rescue
return (
  <html>
    <head>
      <link rel="stylesheet" type="text/css" href="bdr.css" />
      <title>The Pigs</title>
    </head>
    <body>
      <div align="center">
        <h1>The Pigs At Belly Draggers Ranch</h1>
      </div>
      <ul>
        { local:make-name-list( $doc/animal ) }
      </ul>
    </body>
  </html>
)

Example from: http://www.xml.com/lpt/a/1549

Legend:
  • Literal Constructors (Data! XML!)
  • XQuery Syntax (Code! Not XML!)
  • XPath Syntax (Code! Not XML!)

slide-17
SLIDE 17

Verbosity? (Simple examples)

  • Goes both ways, depending...

XSLT:   <xsl:param name="perPage" select="'4'"/>
XQuery: declare variable $perPage as xs:integer := 4;

XSLT:   <xsl:value-of select="$filename"/>#a<xsl:value-of select="$start+position()-1"/>
XQuery: {$filename}#a{$start + $pos - 1}

slide-18
SLIDE 18

Is XSLT Schema Aware?

  • Information from a schema can be used both

– statically: when the stylesheet is compiled, and
– dynamically: during evaluation of the stylesheet to transform a source document.

  • In a stylesheet (e.g., in XPath expressions and patterns), we may refer to named types from a schema (e.g., Person from <xs:complexType name="Person">)
  • The conformance rules for XSLT 2.0 distinguish between a

– basic XSLT processor and a
– schema-aware XSLT processor
– in <oXygen>, you have both

  • Helpful: http://www.ibm.com/developerworks/xml/library/x-schemaxslt.html

slide-19
SLIDE 19

XSLT: stylesheet

  • a stylesheet describes/tells an XSLT processor how to transform a source tree into a result tree (or text)
  • via XML template rules which associate patterns with templates
  • which are then used by an XSLT processor as follows:

– match pattern against elements in source tree
– instantiate corresponding template to create parts of the result tree

slide-20
SLIDE 20

XSLT: stylesheet

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mine="...">
  top-level-elements
</xsl:stylesheet>

Alternatively:

<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mine="...">
  top-level-elements
</xsl:transform>

An xsl:stylesheet can have zero or more of each of the following elements in (almost) any order:

xsl:import, xsl:include, xsl:strip-space, xsl:preserve-space, xsl:output, xsl:key, xsl:decimal-format, xsl:namespace-alias, xsl:attribute-set, xsl:variable, xsl:param, xsl:template

later and in more detail

slide-21
SLIDE 21

XSLT elements: template rule

  • (most important element!) a template rule is of the form

<xsl:template match="expression" name="qname" priority="number" mode="qname">
  parameter-list
  template-def
</xsl:template>

Slide labels: “the pattern” (the match expression), “the template” (parameter-list and template-def), “optional”

  • parameter-list is a list of zero or more xsl:param elements
  • as expression, an XPath location path can be used

– with some restrictions, e.g., it must evaluate to a node set
– for XSLT 1.0, use XPath 1.0,
– for XSLT 2.0, use XPath 2.0

  • template-def is an XML document that makes use of other XSLT elements

– including instructions such as xsl:apply-templates or xsl:copy-of

slide-22
SLIDE 22

XSLT elements: template rules

<xsl:template match=expression name = qname priority = number mode = qname>
  parameter-list
  template-def
</xsl:template>

  • Example: when applied to “<emph>important</emph>”,

<xsl:template match="emph">
  <fo:inline-sequence font-weight="bold">
    <xsl:apply-templates/>
  </fo:inline-sequence>
</xsl:template>

yields

<fo:inline-sequence font-weight="bold">
  important
</fo:inline-sequence>

  • careful:

– there are various built-in template rules
– there is a default prioritisation on template rules
– it is the XSLT processor that fires the template rules

  • we will see later what elements we can use in template-def

slide-23
SLIDE 23

XSLT elements: processing model, sketched

  • an XSLT processor takes an XML document d with associated stylesheet s
  • processes the (XPath DM) tree (possibly PSVI if SA) corresponding to d
  • in a depth-first manner

– thus we always have a context node

  • applies those template rules to the context node that

– match the context node and
– have highest priority

  • thereby generating the result tree according to the template rules
  • the easiest way to generate output is to use literal elements, as the blue and green in the previous example:

<xsl:template match="emph">
  <fo:inline-sequence font-weight="bold">
    <xsl:apply-templates/>
  </fo:inline-sequence>
</xsl:template>

slide-24
SLIDE 24

XSLT elements: processing model by example

consider the following source tree:

[Tree diagram: root → people → person, person; the first person has age=41, name (first: Harry, last: Potter) and address (4 Main Road)]

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="my.xsl"?>
<people>
  <person age="41">
    <name>
      <first>Harry</first>
      <last>Potter</last>
    </name>
    <address>4 Main Road </address>
  </person>
  <person age="43">
    <name>
      <first>Tony</first>
      <last>Potter</last>
    </name>
    <address>4 Main Road </address>
  </person>
</people>

slide-25
SLIDE 25

XSLT elements: processing model by example

consider this source tree with the following XSLT stylesheet:

[Tree diagram: the people source tree from slide 24]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
</xsl:stylesheet>

what does this seemingly empty (no template rules!) stylesheet produce?

slide-26
SLIDE 26

XSLT elements: processing model by example

(tricky!) the previous stylesheet was only seemingly empty because XSLT processors employ built-in template rules; thus templates are applied to all nodes (element, root, text, ...) except attribute and namespace nodes:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>

(1) for all element & document nodes
(2) don’t do anything but apply templates to all child nodes
(3) for all text and attribute nodes
(4) return their value
(5) ignore p-i & comments
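The effect of the built-in rules can be imitated in a few lines. This is a toy sketch, not an XSLT processor: rules are looked up by element name only (no patterns, priorities, or modes), and the names `apply_templates`, `PEOPLE`, and the `rules` dictionary are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# A one-person version of the source tree, without whitespace between tags.
PEOPLE = ('<people><person age="41"><name><first>Harry</first>'
          '<last>Potter</last></name><address>4 Main Road</address>'
          '</person></people>')


def apply_templates(elem, rules):
    """Fire a user rule for elem if one exists, else the built-in behaviour.

    rules maps an element name to a function (elem, rules) -> str.
    Built-in behaviour: emit text nodes, recurse into child elements,
    and never visit attributes.
    """
    rule = rules.get(elem.tag)
    if rule is not None:
        return rule(elem, rules)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(apply_templates(child, rules))
        parts.append(child.tail or "")  # text that follows the child
    return "".join(parts)


root = ET.fromstring(PEOPLE)

# "Empty" stylesheet: only the text content survives, the age attribute does not.
print(apply_templates(root, {}))

# One user rule for person replaces the built-in behaviour for that subtree.
print(apply_templates(root, {"person": lambda elem, rules: "Person found!"}))
```

The second call mirrors what happens once a stylesheet supplies its own rule for person: the built-in recursion stops at that element unless the rule itself applies templates again.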

slide-27
SLIDE 27

XSLT elements: processing model by example

Built-in template rules:

(b)
<xsl:template match="*|/">
  <xsl:apply-templates select="node()"/>
</xsl:template>

select="node()" is the default for “apply-templates”, and node() matches all nodes except attribute nodes & the root node

if you want your stylesheet to consider attribute nodes, you must override this default, e.g. like this:

(1)
<xsl:template match="*|/">
  <xsl:apply-templates select="node()|@*"/>
</xsl:template>

If we use template rule (1), then it overrides built-in (b); hence rules are now applied to all nodes (element, root, text, ...) including attribute nodes, but still not to namespace nodes

slide-28
SLIDE 28

XSLT elements: processing model by example

what does this slightly more elaborate stylesheet yield?

[Tree diagram: the people source tree from slide 24]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    <xsl:text> Person found! </xsl:text>
  </xsl:template>
</xsl:stylesheet>

Note: <xsl:text> superfluous here, but helpful

slide-29
SLIDE 29

XSLT elements: processing model by example

we can make use of “functions” to retrieve the “value” of a node:

[Tree diagram: the people source tree from slide 24]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    Person found called: <xsl:value-of select="name"/>
  </xsl:template>
</xsl:stylesheet>

slide-30
SLIDE 30

XSLT elements: processing model by example

we can conveniently copy a node and its complete sub-tree:

[Tree diagram: the people source tree from slide 24]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="people">
    <family>
      <xsl:copy-of select="child::*"/>
    </family>
  </xsl:template>
</xsl:stylesheet>

alternatively, I could have used <xsl:copy-of select="*"/> or <xsl:copy-of select="person"/>

slide-31
SLIDE 31

The identity transform

  • XSLT Stylesheet that outputs the original document

– I.e., an identity function

  • f(x) = x

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT

slide-32
SLIDE 32

XSLT elements: processing model by example

we can re-name elements and filter out data:

[Tree diagram: the people source tree from slide 24]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="person">
    <myFriend>
      <xsl:apply-templates select="@*|*|text()"/>
    </myFriend>
  </xsl:template>
  <xsl:template match="@*|text()|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|text()|*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="address"/>
</xsl:stylesheet>

slide-33
SLIDE 33

XSLT elements: processing model by example

we can even apply several rules to the same elements using modes for rules:

[Tree diagram: the people source tree from slide 24]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" version="2.0">
  <xsl:template match="/people">
    <html><body>
      <ol>
        <xsl:apply-templates select="person" mode="o"/>
      </ol>
      <xsl:apply-templates select="person" mode="f"/>
    </body></html>
  </xsl:template>
  <xsl:template match="person" mode="o">
    <li>
      <xsl:value-of select="name/first"/>
      <xsl:value-of select="name/last"/>
    </li>
  </xsl:template>
  <xsl:template match="person" mode="f">
    <p>
      Last name: <xsl:value-of select="name/last"/>
      Age: <xsl:value-of select="@age"/>
    </p>
  </xsl:template>
</xsl:stylesheet>

slide-34
SLIDE 34

XSLT instructions: value-of

<xsl:value-of select=expression/>

  • is one of the generating instructions provided by XSLT
  • it returns, for the first node selected through expression, the string value that corresponds to that node, where the string value of

– a text node is its text
– an attribute node is its value
– an element or root node is the concatenation of the string values of all its descendant text nodes

  • ...all this is a bit more tricky if you use SA XSLT

– because then we have more than “text” in text nodes, and need to take types into account...

slide-35
SLIDE 35

XSLT elements: generating instructions

  • literal result elements: a simple way to create new nodes, e.g., in

<xsl:template match="person">
  <Employee>
    <xsl:apply-templates/>
  </Employee>
</xsl:template>

  • <xsl:text>: to produce pure text (and raise an error if elements are produced), e.g., in

<xsl:template match="person">
  <xsl:text> Person found! </xsl:text>
</xsl:template>

  • <xsl:element name="qname">: to create a new element called qname in the result tree, whose content is the child nodes of that instruction, e.g., in

<xsl:template match="person">
  <xsl:element name="Employee">
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

handy for producing elements with attributes and namespaces

slide-36
SLIDE 36

XSLT elements: generating instructions

  • <xsl:attribute> to produce an attribute, e.g., in

<xsl:template match="person">
  <xsl:element name="Employee">
    <xsl:attribute name="alter">
      <xsl:value-of select="@age"/>
    </xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

  • (already seen) <xsl:value-of select=expression/> returns, for each node selected through expression, the string value that corresponds to that node, where the string value of a

– text node is its text
– attribute node is its value
– element or root node is the concatenation of the string values of all its descendant text nodes

slide-37
SLIDE 37

XSLT elements: generating instructions

  • <xsl:copy-of select=expression> copies the nodes selected through expression. It can be used to reuse fragments of the source document. Careful:

– <xsl:value-of> converts fragments into a string before copying it into the result tree
– <xsl:copy-of> copies the complete fragment based on the (required) select attribute, without first converting the fragment into a string
– e.g.,

<xsl:template match="people">
  <family><xsl:copy-of select="*"/></family>
</xsl:template>

  • <xsl:copy use-attribute-sets="..."> simply copies the current node and then applies the template (in case it contains a template as child nodes)

– namespaces are included automatically in the copy
– attributes are not automatically included; they can be included via the “use-attribute-sets” attribute

<xsl:template match="people">
  <family>
    <xsl:for-each select="person">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:for-each>
  </family>
</xsl:template>

  • <xsl:number> can be used to produce running numbers (beyond this class)

slide-38
SLIDE 38

XSLT elements: More Control Structures

  • Conditionals (<http://www.w3.org/TR/xslt20/#conditionals>)

– <xsl:if>, <xsl:choose>
– “if” in XPath (2.0) expressions

  • Repetition (<http://www.w3.org/TR/xslt20/#for-each>)

– <xsl:for-each>
– “for” in XPath (2.0) expressions

  • Called templates (<http://www.w3.org/TR/xslt20/#named-templates>)

– You can name templates and then call them by name (with parameters)
– Interrupt or restart the template control flow

  • Functions! (<http://www.w3.org/TR/xslt20/#stylesheet-functions>)

– <xsl:function> defines them
– Use them like XQuery functions, in XPath expressions

slide-39
SLIDE 39

XSLT Function Example

<xsl:function name="str:reverse" as="xs:string">
  <xsl:param name="sentence" as="xs:string"/>
  <xsl:sequence select="if (contains($sentence, ' '))
                        then concat(str:reverse(substring-after($sentence, ' ')), ' ',
                                    substring-before($sentence, ' '))
                        else $sentence"/>
</xsl:function>

<xsl:template match="/">
  <output>
    <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
  </output>
</xsl:template>

http://www.w3.org/TR/xslt20/#stylesheet-functions

slide-40
SLIDE 40

XSLT Function Example

<xsl:function name="str:reverse" as="xs:string">
  <xsl:param name="sentence" as="xs:string"/>
  <xsl:choose>
    <xsl:when test="contains($sentence, ' ')">
      <xsl:sequence select="concat(str:reverse(substring-after($sentence, ' ')), ' ',
                                   substring-before($sentence, ' '))"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$sentence"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

http://www.w3.org/TR/xslt20/#stylesheet-functions

slide-41
SLIDE 41

(XQuery version)

declare namespace str="http://ex.org";
declare namespace xs="http://www.w3.org/2001/XMLSchema";

declare function str:reverse($sentence as xs:string) as xs:string {
  if (contains($sentence, ' '))
  then concat(str:reverse(substring-after($sentence, ' ')), ' ',
              substring-before($sentence, ' '))
  else $sentence
};

str:reverse('DOG BITES MAN')
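To follow the recursion in str:reverse, it may help to see the same logic in a general-purpose language. This is an illustrative transcription, not part of the slides; the name `reverse_words` is made up, and `str.partition` stands in for substring-before / substring-after.

```python
def reverse_words(sentence: str) -> str:
    """Reverse the order of space-separated words, as str:reverse does."""
    if " " in sentence:
        # head = substring-before(sentence, ' '), rest = substring-after(sentence, ' ')
        head, _, rest = sentence.partition(" ")
        return reverse_words(rest) + " " + head
    return sentence


print(reverse_words("DOG BITES MAN"))  # MAN BITES DOG
```

Each call peels off the first word and appends it after the reversed remainder, so the recursion depth equals the number of spaces.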

slide-42
SLIDE 42

Functions Compared

XSLT:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:str="http://example.com/namespace"
    version="2.0" exclude-result-prefixes="str">
  <xsl:function name="str:reverse" as="xs:string">
    <xsl:param name="sentence" as="xs:string"/>
    <xsl:sequence select="if (contains($sentence, ' '))
                          then concat(str:reverse(substring-after($sentence, ' ')), ' ',
                                      substring-before($sentence, ' '))
                          else $sentence"/>
  </xsl:function>
  <xsl:template match="/">
    <output>
      <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
    </output>
  </xsl:template>
</xsl:transform>

XQuery:

declare namespace str="http://ex.org";
declare namespace xs="http://www.w3.org/2001/XMLSchema";

declare function str:reverse($sentence as xs:string) as xs:string {
  if (contains($sentence, ' '))
  then concat(str:reverse(substring-after($sentence, ' ')), ' ',
              substring-before($sentence, ' '))
  else $sentence
};

str:reverse('DOG BITES MAN')

slide-43
SLIDE 43

Verbosity encore: Identity Transform

XQuery (explicit recursion!):

declare function local:copy($element as element()) {
  element {node-name($element)}
    {$element/@*,
     for $child in $element/node()
     return if ($child instance of element())
            then local:copy($child)
            else $child
    }
};

XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT

slide-44
SLIDE 44

Verbosity encore: Identity Transform

XQuery (Eek!):

declare function local:copy-filter-elements($element as element(),
    $element-name as xs:string*) as element() {
  element {node-name($element)}
    {$element/@*,
     for $child in $element/node()[not(name(.)=$element-name)]
     return if ($child instance of element())
            then local:copy-filter-elements($child, $element-name)
            else $child
    }
};

XSLT:

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- remove all social security numbers -->
<xsl:template match="PersonSSNID"/>

http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT
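The explicit-recursion style of the XQuery version carries over directly to a general-purpose language. The sketch below uses Python's `xml.etree.ElementTree`; the function name `copy_filter`, its `drop` parameter, and the sample document `SRC` are illustrative. One difference from the XSLT version is an artefact of ElementTree: text that directly follows a dropped element is stored on that element and is dropped with it.

```python
import xml.etree.ElementTree as ET

SRC = '<person><name>Harry</name><PersonSSNID>123</PersonSSNID></person>'


def copy_filter(elem, drop=()):
    """Recursively copy elem, skipping child elements whose tag is in drop.

    With an empty drop collection this behaves like an identity transform.
    """
    new = ET.Element(elem.tag, dict(elem.attrib))  # copy name and attributes
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        if child.tag not in drop:
            new.append(copy_filter(child, drop))
    return new


source = ET.fromstring(SRC)
print(ET.tostring(copy_filter(source)))
print(ET.tostring(copy_filter(source, {"PersonSSNID"})))
```

As in the XQuery function, the filter has to be threaded through every recursive call, whereas the XSLT version only adds one empty template rule.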

slide-45
SLIDE 45

XSLT ⇔ XQuery

  • Both Turing Complete

– Share XPath
– Share a ton of functions
– Theoretically, equivalent!

  • XQuery to XSLT

– Have to cope with FLWOR (use control structures!)

  • XSLT to XQuery

– Have to encode template rules! (lots of work!)
– See http://www.mscs.mu.edu/~praveen/Teaching/fa05/AdvDb/PaperTeams/XSLT2XQTeam/Presentation.ppt for one approach

Kay, Comparing XSLT and XQuery

slide-46
SLIDE 46

XSLT…

  • many more things are provided by XSLT,
  • you are cordially invited to

– find out more about them
– experiment with schema awareness (see nice features and complications)
– experiment with namespaces
– (and with SA and namespaces)
– get your own experiences using <oXygen/>
– have a look, e.g., at the influence of template rules’ order on the result!
– think about how one compares XSLT and XQuery

  • their (dis)advantages
  • when would you use/recommend which?
  • do we need both?

slide-47
SLIDE 47

Bit more tree grammar!


slide-48
SLIDE 48

Last week...

...we designed our first “schema validator” algorithm

  • for local tree grammars first
  • that can be implemented by

– walking the DOM tree in a depth-first, left-to-right way, or
– using a SAX parser to do it in a streaming fashion

  • thus uses memory space linear in the depth of the input tree
  • that uses stacks to keep track of

– each rule that a node’s validation needs to check against (stack R): written on the way down, checked on the way up
– the result of child nodes’ validations: which non-terminal symbols did they validate with?

[Diagram: ValAlgo takes a tree T and a grammar G and answers “yes” if T ∈ L(G), “no” otherwise]

local ⇒ unique!

slide-49
SLIDE 49

This week...

...we expand the algorithm

  • first to single-type

– this automatically gives us a validator for the structural aspects of WXS
– will be rather straightforward

  • then to general tree grammars

– this automatically gives us a validator for Relax NG schemas
– will be more tricky: we’ll still use stacks to keep track of

  • all rules that a node’s validation needs to check against (stack R): written on the way down, checked on the way up
  • the result of child nodes’ validations: which non-terminal symbols did they validate with?

[Diagram: ValAlgo takes a tree T and a grammar G and answers “yes” if T ∈ L(G), “no” otherwise]

slide-50
SLIDE 50

This week...

  • All three algorithms can be implemented by

– walking the DOM tree in a depth-first, left-to-right way, or
– using a SAX parser to do it in a streaming fashion

  • thus use memory space linear in the depth of the input tree

– which is quite impressive/surprising for general/Relax NG

✓ we have already seen that

  • DTDs ⤳ local grammars
  • WXS ⤳ single-type grammars
  • RelaxNG ⤳ regular grammars

➡ DTDs are structurally weaker than WXS
➡ WXS is structurally weaker than RelaxNG

[Diagram: ValAlgo takes a tree T and a grammar G and answers “yes” if T ∈ L(G), “no” otherwise]

[Diagram: nested classes Loc ⊂ ST ⊂ Reg]
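The streaming variant for the simplest case, local (DTD-like) grammars, can be sketched with a SAX handler and one stack. Everything here is an assumption made for illustration: content models are written as Python regular expressions over child tag names, each name followed by one space, and `LocalValidator`, `validate_stream`, `MODELS`, and `ROOTS` are made-up names. The sketch keeps one stack entry per open element, but each entry stores the list of child tags seen so far, so it is only linear in depth if sibling lists stay short; running the content model as an automaton would avoid that.

```python
import re
import xml.sax

# Local grammar for the DTD used earlier: a has one or more b, b is empty.
MODELS = {"a": r"(b )+", "b": r""}
ROOTS = {"a"}


class LocalValidator(xml.sax.ContentHandler):
    """Check every element's children against the content model of its tag."""

    def __init__(self, models, roots):
        super().__init__()
        self.models = models
        self.roots = roots
        self.children = [[]]  # stack: child tags collected per open element
        self.ok = True

    def startElement(self, name, attrs):
        is_root = len(self.children) == 1
        if name not in self.models or (is_root and name not in self.roots):
            self.ok = False
        self.children.append([])  # start remembering this element's children

    def endElement(self, name):
        kids = self.children.pop()
        model = self.models.get(name)
        word = "".join(kid + " " for kid in kids)
        if model is None or re.fullmatch(model, word) is None:
            self.ok = False
        self.children[-1].append(name)  # report this element to its parent


def validate_stream(text, models=MODELS, roots=ROOTS):
    handler = LocalValidator(models, roots)
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.ok


print(validate_stream("<a><b/><b/></a>"))  # True
print(validate_stream("<a/>"))             # False
```

Because the grammar is local, the tag name alone picks the rule, which is why no separate rule stack is needed in this variant.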

slide-51
SLIDE 51
  • ...if validation of more expressive schema languages is

equally cheap?!

  • Single-type-ness of schema language ensure uniqueness of PSVI

– because each accepted tree only has a unique run – thus answers to schema-aware queries are unambigiously determined (1 doc validating against 1 schema results in exactly 1 PSVI)

..so why restrict to single-type?

51

[Figure: schema-aware query processing. Labels: XML doc., Schema, Schema-aware parser, PSVI (tree adorned with default values & types), Schema-aware query processor, Query processor, Query, Answer]

Monday, 29 October 2012

slide-52
SLIDE 52
..so why restrict to single-type?

  • ...if validation of more expressive schema languages is equally cheap?!
  • Single-type-ness of a schema language ensures uniqueness of the PSVI

– because each accepted tree has exactly one run
– thus answers to schema-aware queries are unambiguously determined

52

Remember: G = (N, Σ, S, P) with
N = {S1, S2, B}
Σ = {a, b}
S = {S1, S2}
P = {S1 ⟶ a B, S2 ⟶ a B, B ⟶ b ϵ}

G is not single-type and has 2 runs on ☞ namely:

...what does a query for nodes/elements of ‘type’ S1 return?

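[Figure: the tree a(b) and its two runs: one types the root a with S1 and the child b with B, the other types the root a with S2 and the child b with B]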
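
The ambiguity can be reproduced with a small Python sketch that enumerates all runs of a grammar on a tree. Trees are encoded as `(tag, children)` and rules as `(non_terminal, terminal, content)` with `content` a finite set of tuples of non-terminals; this encoding and the function name `runs` are assumptions for illustration, not part of the lecture material.

```python
from itertools import product

def runs(tree, allowed, rules):
    """Return every run on `tree` whose root non-terminal is in `allowed`.

    A run is represented as (non_terminal, [runs of the children]).
    """
    tag, children = tree
    found = []
    for lhs, terminal, content in rules:
        if terminal != tag or lhs not in allowed:
            continue
        # Non-terminals that this rule allows for the children.
        child_types = {n for seq in content for n in seq}
        options = [runs(child, child_types, rules) for child in children]
        for combo in product(*options):
            # The children's non-terminals must spell a word of the content model.
            if tuple(run[0] for run in combo) in content:
                found.append((lhs, list(combo)))
    return found

# The grammar G from this slide and the tree a(b).
RULES = [("S1", "a", {("B",)}), ("S2", "a", {("B",)}), ("B", "b", {()})]
TREE = ("a", [("b", [])])

ALL_RUNS = runs(TREE, {"S1", "S2"}, RULES)
for run in ALL_RUNS:
    print(run)
```

Both runs agree on the child but disagree on the root, which is exactly why a query for nodes of ‘type’ S1 has no well-defined answer here.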

Monday, 29 October 2012

slide-53
SLIDE 53

how to validate documents against schemas (2) and (3)

53

Monday, 29 October 2012

slide-54
SLIDE 54

Input: DOM Tree for T, single-type tree grammar G = (N, Σ, S, P)
NT is a stack of strings of non-terminals
R is a stack of production rules
Traverse T in a depth-first, left-2-right manner

When an element E is visited on way down,
  if there is a production rule N → a e in P with a = E’s tag name and (E is root and N in S, or N occurs in RHS of topmost rule in R)   [single-type ⇒ unique rule!]
  then push N → a e onto R   [store rule for E’s content in R]
       and push ϵ onto NT   [start remembering E’s child nodes]
  else report “not accepted” and stop

When an element E is visited on way up,
  pop a rule N → a e out of R   [retrieve rule for E’s content in R]
  pop a string of non-terminals w out of NT   [retrieve E’s child nodes]
  if w matches e
  then pop a string w’ of non-terminals out of NT and push w’N onto NT   [add E’s non-terminal to those of its predecessor siblings]
  else report “not accepted” and stop

report “accepted” and stop

54

ValAlgo XML doc/Tree T single-type Grammar G “yes”, if T ∈ L(G) “no”, otherwise

See the paper by Murata, Lee, Mani, Kawaguchi

nothing changed
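
The single-type algorithm above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: trees are `(tag, children)` tuples, non-terminals are single letters, each content model is written as a Python regular expression over those letters, and the helper names `events` and `validate_single_type` are made up here. The `events` generator stands in for a SAX parser, and NT starts with one empty string so that the root’s result has somewhere to go.

```python
import re

def events(tree):
    """Yield SAX-like ("start", tag) / ("end", tag) events for a (tag, children) tree."""
    tag, children = tree
    yield ("start", tag)
    for child in children:
        yield from events(child)
    yield ("end", tag)

def validate_single_type(tree, start, rules):
    """Return True iff the tree is accepted; assumes the grammar is single-type."""
    R = []        # stack of production rules (N, a, e)
    NT = [""]     # stack of strings of non-terminals
    for kind, tag in events(tree):
        if kind == "start":
            # Root: N must be a start symbol; otherwise N must occur in the
            # right-hand side of the topmost rule in R.
            allowed = {c for c in R[-1][2] if c.isalpha()} if R else set(start)
            candidates = [r for r in rules if r[1] == tag and r[0] in allowed]
            if not candidates:
                return False
            R.append(candidates[0])   # single-type: at most one candidate
            NT.append("")             # start remembering this element's children
        else:
            n, _, e = R.pop()
            w = NT.pop()
            if not re.fullmatch(e, w):
                return False
            NT[-1] += n               # report N to the parent's string
    return True

# Rules in the spirit of the example grammar on the next slide
# (S -> a B,B*,D; B -> b (C,C)|C; C -> c eps|C; D -> c C,C,C).
RULES = [("S", "a", "BB*D"), ("B", "b", "CC|C"), ("C", "c", "C?"), ("D", "c", "CCC")]
c = ("c", [])
GOOD = ("a", [("b", [c, c]), ("b", [c]), ("c", [c, c, c])])   # assumed example tree
BAD = ("a", [("c", [c, c, c])])                               # no b child

print(validate_single_type(GOOD, {"S"}, RULES), validate_single_type(BAD, {"S"}, RULES))
```

Note how the c directly below a is typed D while every other c is typed C: the parent’s rule decides, so no guessing is needed.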

Monday, 29 October 2012

slide-55
SLIDE 55

When an element E is visited on way down,
  if there is a production rule N → a e in P with a = E’s tag name and (E is root and N in S, or N occurs in RHS of topmost rule in R)
  then push N → a e onto R and push ϵ onto NT
  else report “not accepted” and stop

55

ValAlgo XML doc/Tree T single-type Grammar G “yes”, if T ∈ L(G) “no”, otherwise

a c c b c b c c c c

  • Let’s see how algorithm works:

– G = ({S,B,C,D},{a,b,c},{S},P) with P = { S → a B,B*,D, B → b (C,C)|C, C → c ϵ|C, D → c C,C,C }
– ...in order to know which production rule N → c ... to choose for nodes labelled c, I need to check the rule for the predecessor and ensure that N occurs in the RHS chosen for them...

Monday, 29 October 2012

slide-56
SLIDE 56
  • Want to implement this algorithm? Again, as for local tree grammars,

– walk the DOM tree in a depth-first, left-2-right way, or
– use a SAX parser and do it in a streaming fashion

  • now, this was for single-type tree grammars; let’s see how this works for general tree grammars

– we can have competing non-terminal symbols in RHS of rules
– how do we know with which to continue?
– try/guess one and, if failed, backtrack?
– or by keeping track of all possibilities

  • and, as long as we have some, everything is fine..
  • which means we need some more stacks for keeping track...

56

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-57
SLIDE 57

57 store non-terminals from RHS of possibly applicable rules

we don’t know which to use!

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Input: DOM Tree for T, a tree grammar G = (N, Σ, S, P)
NT is a stack of strings of sets of non-terminals
R is a stack of sets of production rules
NS is a stack of sets of non-terminals, init with S
Traverse T in a depth-first, left-2-right manner

When an element E is visited on way down,
  set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS)
  if RS is non-empty
  then push RS onto R, push ϵ onto NT, push set of all non-terminals occurring in RHS of a rule in RS onto NS
  else report “not accepted” and stop

When an element E is visited on way up,
  pop a rule set RS = {Ni → a ei | i = 1..k} out of R

Monday, 29 October 2012

slide-58
SLIDE 58

Input: DOM Tree for T, a tree grammar G = (N, Σ, S, P)
NT is a stack of strings of sets of non-terminals
R is a stack of sets of production rules
NS is a stack of sets of non-terminals, init with S
Traverse T in a depth-first, left-2-right manner

When an element E is visited on way down,
  set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS)
  if RS is non-empty
  then push RS onto R, push ϵ onto NT, push set of all non-terminals occurring in RHS of a rule in RS onto NS
  else report “not accepted” and stop

When an element E is visited on way up,
  pop a rule set RS = {Ni → a ei | i = 1..k} out of R
  pop a string of sets of non-terminals W1...Wk out of NT
  set W to the set of those Ni such that there is a w1...wk with each wj from Wj that matches ei
  if W is non-empty
  then pop a string V1...Vm of sets of non-terminals out of NT, push V1...VmW onto NT, pop NS
  else report “not accepted” and stop

report “accepted” and stop

58 store non-terminals from RHS of possibly applicable rules

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise
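
A possible Python rendering of this general algorithm, with the three stacks R, NT and NS kept explicitly, is sketched below. The encoding is an assumption made for illustration: trees are `(tag, children)` tuples, non-terminals are single letters, content models are Python regular expressions over those letters, and `events` and `validate` are hypothetical names. The sketch also takes a shortcut: it checks the condition “there is a w1...wk with each wj from Wj that matches ei” by trying every combination, which is simple but can be exponential in the number of children; a real validator would run the content-model automaton on the sets instead.

```python
import re
from itertools import product

def events(tree):
    """Yield SAX-like ("start", tag) / ("end", tag) events for a (tag, children) tree."""
    tag, children = tree
    yield ("start", tag)
    for child in children:
        yield from events(child)
    yield ("end", tag)

def validate(tree, start, rules):
    """Return the set of non-terminals the root can be typed with (empty = rejected)."""
    R = []               # stack of sets of production rules
    NT = [[]]            # stack of strings of sets of non-terminals
    NS = [set(start)]    # stack of sets of non-terminals, init with S
    for kind, tag in events(tree):
        if kind == "start":
            rs = [(n, e) for n, a, e in rules if a == tag and n in NS[-1]]
            if not rs:
                return set()
            R.append(rs)
            NT.append([])
            NS.append({c for _, e in rs for c in e if c.isalpha()})
        else:
            rs, ws = R.pop(), NT.pop()
            NS.pop()
            # Keep each Ni whose content model matches some choice w1...wk.
            w = {n for n, e in rs
                 if any(re.fullmatch(e, "".join(word)) for word in product(*ws))}
            if not w:
                return set()
            NT[-1].append(sorted(w))
    return set(NT[0][0])

# Rules 1-3 of the walkthrough: A -> a B|eps, B -> a (C,C)|B, C -> a (A,A,A)|eps
RULES = [("A", "a", "B?"), ("B", "a", "CC|B"), ("C", "a", "(AAA)?")]
leaf = ("a", [])
TREE = ("a", [("a", [("a", [leaf, leaf, leaf]), leaf])])

print(validate(TREE, {"A", "B", "C"}, RULES))
```

On this tree the leaves are typed {A,C}, the node with three leaves {C}, its parent {B} and the root {A,B}, so the tree is accepted without any guessing or backtracking.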

Monday, 29 October 2012

slide-59
SLIDE 59
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

59

a a R NT


NS {A,B,C} a a a a a ➀ ➁ ➂

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-60
SLIDE 60
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

60

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} RS = {➀, ➁, ➂} NS {A,B,C} {A,B,C}

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-61
SLIDE 61
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

61

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} RS = {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C}

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-62
SLIDE 62
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

62

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} RS = {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} ϵ {A,B,C}

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-63
SLIDE 63
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

63

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} RS = {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} ϵ {A,B,C}

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-64
SLIDE 64
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

64

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} RS = {➀, ➁, ➂} ϵ = W1...Wk {A,B,C} W = {A,C}

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-65
SLIDE 65
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

65

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} {A,C} {A,B,C} RS = {➀, ➁, ➂} ϵ = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,C}

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-66
SLIDE 66
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

66

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} {A,C} {A,B,C} RS = {➀, ➁, ➂} {A,B,C} {➀, ➁, ➂} ϵ

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-67
SLIDE 67
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

67

a a R NT


NS {A,B,C} {A,B,C} a a a a a ➀ ➁ ➂ ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} {A,C} {A,B,C} {A,B,C} RS = {➀, ➁, ➂} ϵ = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,C}

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-68
SLIDE 68
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

68

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} {A,C},{A,C} {A,B,C}

RS = {➀, ➁, ➂} ϵ = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,C}

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-69
SLIDE 69
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

69

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} {A,C},{A,C} {A,B,C}

RS = {➀, ➁, ➂}

{➀, ➁, ➂} ϵ {A,B,C} ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-70
SLIDE 70
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

70

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} {A,C},{A,C} {A,B,C} {A,B,C}

RS = {➀, ➁, ➂} ϵ = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,C}

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-71
SLIDE 71
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

71

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {➀, ➁, ➂} {A,C},{A,C},{A,C} {A,B,C}

RS = {➀, ➁, ➂} ϵ = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,C}

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-72
SLIDE 72
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

72

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} ϵ {A,B,C} {A,B,C} RS = {➀, ➁, ➂} {A,C},{A,C},{A,C} = W1...W3

W = {C} (LHS of ➂)

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

(only AAA matches a RHS, namely of ➂)

Monday, 29 October 2012

slide-73
SLIDE 73
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

73

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} {C} {A,B,C} RS = {➀, ➁, ➂} ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise {A,C},{A,C},{A,C} = W1...W3

W = {C} (LHS of ➂)

(only AAA matches a RHS, namely of ➂)

Monday, 29 October 2012

slide-74
SLIDE 74
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

74

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} {C} {A,B,C}

RS = {➀, ➁, ➂}

{➀, ➁, ➂} ϵ {A,B,C}

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-75
SLIDE 75
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

75

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} {C} {A,B,C}

RS = {➀, ➁, ➂}

{A,B,C} ϵ = W1...Wk ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,C}

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-76
SLIDE 76
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

76

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {➀, ➁, ➂} {C},{A,C} {A,B,C}

RS = {➀, ➁, ➂} ϵ = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,C}

(ϵ only matches RHS of ➀ & ➂)

Monday, 29 October 2012

slide-77
SLIDE 77
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

77

a a

R NT


NS

a a a a a ➀ ➁ ➂

ϵ {➀, ➁, ➂} NS {A,B,C} {A,B,C} {A,B,C}

RS = {➀, ➁, ➂} {C},{A,C} = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {B}

(only CC matches RHS of ➁)

Monday, 29 October 2012

slide-78
SLIDE 78
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

78

a a

R NT


NS

a a a a a ➀ ➁ ➂

{B} {➀, ➁, ➂} NS {A,B,C} {A,B,C}

RS = {➀, ➁, ➂}

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {B}

(only CC matches RHS of ➁)

{C},{A,C} = W1...Wk

Monday, 29 October 2012

slide-79
SLIDE 79
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

79

a a

R NT


NS

a a a a a ➀ ➁ ➂

NS {A,B,C} {A,B,C}

RS = {➀, ➁, ➂} {B} = W1...Wk

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

W = {A,B}

(B matches RHS of ➀ & ➁)

Monday, 29 October 2012

slide-80
SLIDE 80
  • Let’s see how algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}

80

a a R

NT


NS

a a a a a ➀ ➁ ➂

NS {A,B,C} “accepted”/“yes”, T is accepted by G

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-81
SLIDE 81
  • Implementing this algorithm? Again, as for single-type tree grammars,

– walk the DOM tree in a depth-first, left-2-right way, or
– use a SAX parser and do it in a streaming fashion

  • Insights gained? Validating general tree grammars

– does not require guessing & backtracking
– can be implemented in a streaming way
– is a bit more tricky than validating single-type grammars,
– but not really more complex (in terms of time/space)

  • still only space linear in depth of input tree

– so, for validation purposes, restricting to single-type is not necessary

  • feel free to describe structure in a powerful way!

– but, for uniqueness of PSVI,

  • we need single-type

81

ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise

Monday, 29 October 2012

slide-82
SLIDE 82

Schema languages for different purposes...

  • testing/describing structural constraints

– do persons’ names have both a first and second name?

  • testing type constraints

– is age an integer? And DoB a date?

  • describing a handy PSVI

– adding default values or type information for easy/robust querying/manipulation

  • single-typedness useful for some, but not all purposes!
  • locality?
  • Your applications might use different schemas for different purposes
  • ...and there are purposes none of our schema languages can serve:

– in CW5, not all valid input documents were really grammars
– checking whether non-terminals are mentioned correctly is beyond XSD’s abilities... we need an even more powerful schema language!
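
As a rough illustration of the kind of check meant here, the following Python sketch reports non-terminals that are used but not declared. It assumes one possible reading of “mentioned correctly”, namely that every start symbol and every non-terminal in a rule must be listed in N; the encoding of rules as `(non_terminal, terminal, content)` with `content` a finite set of tuples, and the function name `undeclared_non_terminals`, are hypothetical.

```python
def undeclared_non_terminals(non_terminals, start, rules):
    """Return the non-terminals that are used somewhere but missing from N."""
    used = set(start)
    for lhs, _terminal, content in rules:
        used.add(lhs)
        used.update(n for seq in content for n in seq)
    return used - set(non_terminals)

# D is used in the content of S but neither declared nor defined.
RULES = [("S", "a", {("B", "D")}), ("B", "b", {()})]
print(undeclared_non_terminals({"S", "B"}, {"S"}, RULES))
```

A few lines of ordinary code suffice, which is the point: the constraint is easy to state but relates values in different parts of the document, so it falls outside what the structural schema can express.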

82

Monday, 29 October 2012

slide-83
SLIDE 83

Other interesting questions

...closely related to validation are

  • Schema emptiness:

– given a schema/grammar S, does there exist a document/tree d such that d is valid w.r.t. S – relevant as a basic consistency test for schemas

  • Schema containment:

– given schemas/grammars S1, S2, is S1 a specialization of S2? – i.e., is every document that is valid w.r.t. S1 also valid w.r.t. S2? – relevant to support tasks such as schema refinement:

  • if I say I want to refine S2,
  • then it would be nice if this intention could be verified later, to ensure that I did what I wanted

– also solves schema equivalence: see your coursework!

  • ...a lot of research in both areas
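
Schema emptiness can be sketched as a fixpoint computation. The Python below is a minimal illustration under assumptions that are not in the slides: the rule format is hypothetical, and content models are simplified to a list of alternatives, each a fixed sequence of non-terminals (no stars or nesting). A non-terminal is "productive" if some alternative uses only productive non-terminals; the schema is empty exactly when no start symbol is productive.

```python
def productive_nonterminals(rules):
    """Least fixpoint: the non-terminals that derive at least one finite tree."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for nonterminal, _label, alternatives in rules:
            if nonterminal in productive:
                continue
            # An empty alternative () is trivially satisfied (a leaf element).
            if any(all(n in productive for n in alt) for alt in alternatives):
                productive.add(nonterminal)
                changed = True
    return productive


def is_empty(rules, start):
    """True if no document at all is valid w.r.t. the grammar."""
    return not (productive_nonterminals(rules) & set(start))


# Hypothetical grammars in the simplified format (non-terminal, label, alternatives).
CONSISTENT = [("A", "a", [("B",)]), ("B", "b", [()])]
INCONSISTENT = [("A", "a", [("A",)])]  # every <a> must contain another <a>

print(is_empty(CONSISTENT, {"A"}), is_empty(INCONSISTENT, {"A"}))
```

The second grammar is the kind of slip a basic consistency test catches: every `a` requires another `a` inside it, so no finite document can ever be valid.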

83


slide-84
SLIDE 84

Bye for now! (I’ll be around) I have enjoyed working with you, and hope you learned loads and also enjoyed the experience!

84


slide-85
SLIDE 85

The Essence of Error

Or, so wrong it’s right

85


slide-86
SLIDE 86

How to cope?

  • With which task?

– Authoring, aggregating, querying…

  • Settle on a core representation of the model

– Perhaps the Atom DOM

  • Coerce/transform/extract other models

– To the representative one
– Or build software that mediates the difference

  • Hope that there aren’t too many
  • Advocate standards!

– Or make them
– “The nice thing about standards is that there are so many of them to choose from.”

  • Kent Pitman and others


slide-87
SLIDE 87

Postel’s Law

  • Liberality

– Many DOMs, all expressing the same thing
– Many surface syntaxes (perhaps) for each DOM

  • Conservativity

– What should we send?

  • It depends on the receiver!

– Minimal standards?

  • Well formed XML?
  • Valid according to a popular schema/format?
  • HTML?

Be liberal in what you accept, and conservative in what you send.


slide-88
SLIDE 88

88

Structure and Presentation

  • We’ve called this “DOM” and “Application” Layer

– A very common application layer is “rendering”

  • Text, images
  • Like, y’know, the web
  • Standard vs. default renderings
  • Goes back to SGML

<sentence style="slanted">This sentence is false.</sentence>
This sentence is false. (slanted: correct rendering)
This sentence is false. (plain: fallback!)

(Still see this in XSLT!)


slide-89
SLIDE 89

89

Why Separate them?

  • Presentation is more fluid than structure

– The "look" may need updating

  • Presentation needs may vary

– What works for 21" screens doesn't for mobile phones

  • (Or maybe not!)
  • Accessibility

– (content should be perceivable by everyone)

  • Programmatic processing needs


slide-90
SLIDE 90

90

Another digression: CSS

  • The style language for the Web

– Strong separation of presentation

  • CSS is

– not an XML/angle brackets format

  • Oh NOES! Not another one!

– annotative, not transformative

  • Well, sorta

– mostly “formats” nodes
– ubiquitous on the Web, esp. client side
– works with arbitrary XML

  • But most clients work with (X)HTML
  • See the excellent PrinceXML formatter


slide-91
SLIDE 91

91

Basic Component

  • Rules

– Which consist of

  • Selectors

– Like XPath expressions
– But only forward, with some syntactic sugar

  • Declaration blocks

– Sets of property/value pairs

div.title { text-align:center; font-size: 24; }


slide-92
SLIDE 92

92

<html><head><title>A bit of style</title></head>
<body>
<style type="text/css">
  .title { font-weight: bold }
  div.title { text-align: center; font-size: 24; }
  div.entry div.title { text-align: left; font-variant: normal }
  span.date { font-style: italic }
  span.date:after { content: " by" }
  div.content { font-style: italic }
  div.content i { font-style: normal; font-weight: bold }
  #one { color: red }
</style>
<div class="title">My Weblog</div>
<div class="entry">
  <div class="title">What I Did Today</div>
  <div class="byline">
    <span class="date">Feb. 09, 2009</span>
    <span class="author">Bijan Parsia</span>
  </div>
  <div class="content" id="one">
    <p>Taught a class and it went <i>very</i> well.</p>
  </div>
</div>
</body></html>

Try it in http://software.hixie.ch/utilities/js/live-dom-viewer/


slide-93
SLIDE 93

93

Media Types

  • Different sets of rules can be contextualized to media

– Screen, Print, Braille, Aural…

  • This is done with groupings called “@media rule”s

@media print { BODY { font-size: 10pt } }
@media screen { BODY { font-size: 12pt } }

Larger font size for screen


slide-94
SLIDE 94

94

Cascading

  • CSS Rules cascade

– That is, there is overriding (and non-overriding) inheritance

  • That is, rules combine in different ways

– http://www.w3.org/TR/CSS21/cascade.html#cascade

  • General principles

– Distance to the node is significant
– Precision of selectors is significant
– Order of appearance is significant
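
"Precision" and "order of appearance" can be made concrete with a rough Python sketch. It is an illustration only, not the full CSS 2.1 cascade: it assumes all rules come from one stylesheet and already match the same element, it ignores origin and `!important`, and it only understands type, `.class` and `#id` selectors with descendant combinators. The function names are made up for this example.

```python
import re


def specificity(selector):
    """Approximate specificity as (ids, classes, element names)."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    elements = len(re.findall(r"(?:^|\s)[A-Za-z][\w-]*", selector))
    return (ids, classes, elements)


def winning_value(matching_rules, prop):
    """Pick the value for prop: higher specificity wins, later rule breaks ties."""
    best = None
    for order, (selector, declarations) in enumerate(matching_rules):
        if prop in declarations:
            key = (specificity(selector), order)
            if best is None or key > best[0]:
                best = (key, declarations[prop])
    return None if best is None else best[1]


# Rules that all match the inner <div class="title"> of the weblog example.
RULES_FOR_INNER_TITLE = [
    (".title", {"font-weight": "bold"}),
    ("div.title", {"text-align": "center"}),
    ("div.entry div.title", {"text-align": "left", "font-variant": "normal"}),
]

print(winning_value(RULES_FOR_INNER_TITLE, "text-align"))
```

On the weblog example this explains why the entry title ends up left-aligned while still bold: the more precise selector overrides `text-align`, and nothing overrides `font-weight`.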


slide-95
SLIDE 95

95

Error Handling

  • XML has “draconian” error handling

– Well formedness error…BOOM

  • CSS has “forgiving” error handling

– “Rules for handling parsing errors”

http://www.w3.org/TR/CSS21/syndata.html#parsing-errors

  • That is, how to interpret illegal documents
  • Not reporting errors, but working around them

– E.g., “User agents must ignore a declaration with an unknown property.”

  • Replace: “h1 { color: red; rotation: 70minutes }”
  • With: “h1 { color: red }”
  • Study the error handling rules!
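
The "ignore unknown properties" rule can be sketched in a few lines of Python. This is a toy illustration, not a CSS parser: `KNOWN_PROPERTIES` is a small made-up subset, the function names are hypothetical, and values are not checked at all.

```python
# Illustrative subset only; a real user agent knows many more properties.
KNOWN_PROPERTIES = {"color", "font-size", "font-style", "font-weight", "text-align"}


def parse_declarations(block):
    """Keep declarations with a known property; silently skip the rest."""
    kept = {}
    for declaration in block.split(";"):
        if ":" not in declaration:
            continue  # malformed or empty declaration: ignore it
        name, value = declaration.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name in KNOWN_PROPERTIES and value:
            kept[name] = value
    return kept


def repair_rule(rule):
    """Rewrite 'selector { declarations }' with only the surviving declarations."""
    selector, _, rest = rule.partition("{")
    block = rest.rsplit("}", 1)[0]
    kept = parse_declarations(block)
    body = "; ".join(f"{name}: {value}" for name, value in kept.items())
    return f"{selector.strip()} {{ {body} }}"


print(repair_rule("h1 { color: red; rotation: 70minutes }"))
```

Note that nothing is reported to anyone: the unknown declaration just disappears, which is exactly the contrast with XML's behaviour.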


slide-96
SLIDE 96

96

CSS Robustness

  • Has to deal with Web conditions
  • 1. People borrowing
  • 2. People collaborating
  • 3. Different devices
  • 4. Different kinds of audiences (and authors)
  • 5. Maintainability
  • 6. Aesthetics
  • CSS is designed for this

– Cascading & Inheritance help with 1, 2, 5

  • And importing, of course

– @media rules help with 3-6
– Error handling helps with 1, 2, 4


slide-97
SLIDE 97

Errors!

  • One person’s error is another’s data
  • Errors may or may not be unusual
  • Errors are relative to a norm
  • Preventing errors

– Make errors hard or impossible to make

  • Make doing things hard or impossible

– Make doing the right thing easy and inevitable
– Make detecting errors easy
– Make correcting errors easy
– Correct errors
– Fail silently
– Fail randomly
– Fail differently (interop problem)

97


slide-98
SLIDE 98

(Perceived) Affordances

  • (Perceived) Affordance

– an available action that is salient to the actor

Donald Norman, The Design of Everyday Things



slide-100
SLIDE 100

Attractive Nuisances

  • A dominant or attractive affordance

– with a bad or wrong action
– In law, “a hazardous object or condition on the land that is likely to attract children who are unable to appreciate the risk posed by the object or condition” -- ye olde Wikipedia
– We can reformulate

  • “a hazardous or misleading language or UI feature that is likely to be misused by (even) an educated user”

  • Contrast with “merely” hard to use

– An attractive nuisance is easy to attempt, hard to use (correctly), and has bad (to catastrophic) effects


slide-101
SLIDE 101

Typical Schema Languages

  • Grammar based (and maybe type based)

– Recognize all or none

  • Though what the “all” is can be rather flexible

– Restrictive by default

  • Slogan: What is not permitted is forbidden

– Error detection and reporting

  • Is at the discretion of the system
  • “Not accepted” is the starting place
  • The point where an error is detected

– might not be the point where it occurred
– might not be the most helpful point to look at!

  • Programs!

– Null pointer deref
  » Is the right point the deref or the setting to null?
– Non-crashing errors


slide-102
SLIDE 102

The SSD Way

  • Explore before prescribe
  • Describe rather than define
  • Take what you can, when you can take it
  • Extra or missing stuff is (can be) OK

– Irregular structure!

  • Adhere to the task at hand
  • Adore Postel’s Law


slide-103
SLIDE 103

XML Error Handling

  • De facto XML motto

– Be strict about the well-formedness of what you accept, and strict in what you send
– Draconian error handling
– Severe consequences on the Web

  • And other places
  • Fail early and fail hard
  • What about higher levels?

– Validity and other analysis?
– Most schema languages are poor at error reporting

  • How about XQuery’s type error reporting?


slide-104
SLIDE 104

XML Error Handling

  • The spec:

– fatal error [Definition: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).]

  • What should an application do?

– To or for its users
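
What a fatal error looks like from the application side, as a small Python sketch using the standard library's `xml.etree.ElementTree`. The wrapper `parse_strict` is a made-up name; the point is only that the application gets either a tree or an error report, never a partial tree.

```python
import xml.etree.ElementTree as ET


def parse_strict(text):
    """Return (tree, None) on success, or (None, message) on a fatal error."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as error:
        # The parser stops normal processing; all we get is the report.
        return None, str(error)


tree, problem = parse_strict("<p>Can you spot the problem?")
print(tree, problem)
```

Deciding what to do with `problem` (show it, log it, attempt a repair, fall back to something else) is left entirely to the application, which is the question the slide is asking.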


slide-105
SLIDE 105

XPath for Validation

  • What XPath is “equivalent” to the declaration of <b>?

valid.xml: <a> <b/> <b/> <b/> </a>
simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY>
invalid.xml: <a> <b/> <b>Foo</b> <b><b/></b> </a>

– count(//b): valid.xml =3, invalid.xml =4
– count(//b/*): valid.xml =0, invalid.xml =1
– count(//b/text()): valid.xml =0, invalid.xml =1

<a> <b/> <b>Foo</b> </a>: count(//b/*) =0

<a> <b/> <b><b/></b> </a>: count(//b/text()) =0


slide-106
SLIDE 106

XPath for Validation

  • What XPath is “equivalent” to the declaration of <b>?

valid.xml: <a> <b/> <b/> <b/> </a>
simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY>
invalid.xml: <a> <b/> <b>Foo</b> <b><b/></b> </a>

– count(//b/(* | text())): valid.xml =0, invalid.xml =2

<a> <b/> <b>Foo</b> </a>: =1

<a> <b/> <b><b/></b> </a>: =1


slide-107
SLIDE 107

XPath for Validation

  • What XPath is “equivalent” to the declaration of <b>?

valid.xml: <a> <b/> <b/> <b/> </a>
simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY>
invalid.xml: <a> <b/> <b>Foo</b> <b><b/></b> </a>

if (count(//b/(* | text()))=0) then "valid" else "invalid"

– valid.xml =valid, invalid.xml =invalid

<a> <b/> <b>Foo</b> </a>
<a> <b/> <b><b/></b> </a>

Can even “find” the errors!
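
The same check can be sketched in Python with the standard library. ElementTree's XPath subset has no `text()` or union, so this sketch (with a made-up function name) tests children and text directly; instead of a count it returns the offending `<b>` elements, which is the "find the errors" part.

```python
import xml.etree.ElementTree as ET


def offending_b_elements(xml_text):
    """Return every <b> that has element children or text, i.e. is not EMPTY."""
    root = ET.fromstring(xml_text)
    bad = []
    for b in root.iter("b"):
        has_child = len(b) > 0
        # Text nodes inside b: text before the first child, or after any child.
        has_text = bool(b.text) or any(child.tail for child in b)
        if has_child or has_text:
            bad.append(b)
    return bad


VALID = "<a> <b/> <b/> <b/> </a>"
INVALID = "<a> <b/> <b>Foo</b> <b><b/></b> </a>"

print("valid" if not offending_b_elements(VALID) else "invalid")
print("valid" if not offending_b_elements(INVALID) else "invalid")
```

Like the XPath version, this only covers the declaration of `<b>`; the content model of `<a>` would need its own test.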



slide-109
SLIDE 109

XPath (etc) for Validation

  • We could have finer control

– Validate parts of a document
– A la wildcards

  • But with more control!
  • We could have greater expressivity

– Far reaching dependencies
– Computations

  • Essentially, code based validation!

– With XQuery and XSLT
– But still a leetle declarative

  • We always need it

The essence of Schematron


slide-110
SLIDE 110
  • A different sort of schema language

– Not grammar or object/type based
– Rule based
– Test oriented
– Complementary

  • Conceptually simple

– Patterns contain rules

  • Rules set a context and contain asserts and reports (A&Rs)
  • A&Rs contain

– Tests, which are XPath expressions, and
– Assertions, which are natural language descriptions

Schematron


slide-111
SLIDE 111

DTDx Schematron

  • “Only 1 Element declaration with a given name”

– (Ok, could handle this with Keys in XML Schema!)

<rule context="element">
  <let name="n" value="@name"/>
  <assert test="count(//element/name[text()=$n]) = 1">
    There can be only one element declaration with a given name.
  </assert>
</rule>

  • “Every element reference must have a corresponding element declaration”

<rule context="elementref">
  <let name="r" value="/ref/text()"/>
  <assert test="count(//element/name[text()=$r]) = 1">
    There must be an element declaration (with the right name) for elementref to refer to.
  </assert>
</rule>
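
To see the "context, test, message" shape without a Schematron processor, here is a rough Python analogue of the two rules. Everything about it is an assumption for illustration: the `<dtdx>` sample documents and their element layout are invented (the real DTDx format may differ), and `RULES`, `declared_names` and `fire_rules` are made-up names.

```python
import xml.etree.ElementTree as ET


def declared_names(root):
    """All declared element names in the (hypothetical) DTDx-like document."""
    return [name.text for name in root.findall(".//element/name")]


# Each rule: (context path, test on one context node, natural-language assertion).
RULES = [
    (".//element",
     lambda root, node: declared_names(root).count(node.findtext("name")) == 1,
     "There can be only one element declaration with a given name."),
    (".//elementref",
     lambda root, node: declared_names(root).count(node.findtext("ref")) == 1,
     "There must be an element declaration (with the right name) for elementref to refer to."),
]


def fire_rules(xml_text):
    """Fire every rule on every matching context node; collect failed assertions."""
    root = ET.fromstring(xml_text)
    return [message
            for context, test, message in RULES
            for node in root.findall(context)
            if not test(root, node)]


GOOD = ("<dtdx><element><name>a</name><elementref><ref>b</ref></elementref></element>"
        "<element><name>b</name></element></dtdx>")
BAD = "<dtdx><element><name>a</name><elementref><ref>c</ref></elementref></element></dtdx>"

print(fire_rules(GOOD))
print(fire_rules(BAD))
```

Anything the rules do not mention is simply accepted, which is the "what isn't forbidden is permitted" behaviour of rule-based validation.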


slide-112
SLIDE 112

Tip of the iceberg

  • Computations

– Using XPath functions and variables

  • Dynamic checks

– Can pull stuff from other files

  • Elaborate reports

– diagnostics have (value-ofed) expressions
– “Generate paths” to errors

  • Sound familiar?
  • General case

– Thin shim over XSLT
– Closer to “arbitrary code”


slide-113
SLIDE 113

Interesting Points

  • DTDx has a WXS

– Schematron doesn’t care
– Two phase validation

  • RELAX NG has a way of embedding
  • WXS 1.1 incorporating similar rules
  • Arbitrary XPath for context and test

– Plus variables!

  • What isn’t forbidden is permitted

– Unlike all the other schema languages!
– We’re not performing runs

  • We’re firing rules

– Somewhat easy to use

  • If you know XPath
  • If you don’t need coverage

– What about analysis?


slide-114
SLIDE 114

Schematron Presumes…

  • …well formed XML

– As do all XML schema languages

  • Works on the DOM!

– So can’t help with e.g., overlapping tags

  • Or tag soup in general
  • Namespace Analysis!?
  • …authorial repair

– At least, in the default case

  • Communicate errors to people
  • Thus, not the basis of a modern browser!

– Unlike CSS

  • Is this enough liberality?

– Or rather, does it support enough liberality?


slide-115
SLIDE 115

Take the following sample XHTML code:

<html>
  <head>
    <title>Hello!</title>
    <meta http-equiv="Content-Type" content="application/xhtml+xml" />
  </head>
  <body>
    <p>Hello to you!</p>
    <p>Can you spot the problem?
  </body>
</html>

Slide due to Iain Flynn


slide-116
SLIDE 116

HTML: XHTML:

Slide due to Iain Flynn


slide-117
SLIDE 117

Validation In The Wild

  • HTML

– 1%-5% of web pages are valid
– Validation is very weak!
– All sorts of breakage

  • E.g., overlapping tags
  • <b>hi <i>there</b>, my good friend</i>
  • Syndication Formats

– 10% of feeds are not well-formed
– Where do the problems come from?

  • Hand authoring
  • Generation bugs
  • String concat based generation
  • Composition from random sources


slide-118
SLIDE 118

More recently

In 2005, the developers of Google Reader (Google’s RSS and Atom feed parser) took a snapshot of the XML documents they parsed in one day.

  • Approximately 7% of these documents contained at least one well-formedness error.

  • Google Reader deals with millions of feeds per day.

– That’s a lot of broken documents

Source: http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html
Slide due to Iain Flynn


slide-119
SLIDE 119

Even More Recently!

The Quality of the XML Web [2011]


slide-120
SLIDE 120

[Chart legend: Encoding, Structure, Entity, Typo]

Slide due to Iain Flynn


slide-121
SLIDE 121

[Chart: errors by category (Encoding, Structure, Entity, Typo)]

Slide due to Iain Flynn


slide-122
SLIDE 122

A Thought Experiment

  • “Imagine...that all web browsers use strict XML parsers”
  • “...that you were using a publishing tool that [was strict]”

– “All of its default templates were valid XHTML.”
– “It incorporated a nifty layout editor to ensure that you couldn’t introduce any invalid XHTML...”

  • “You click ‘Publish’”

– “the page that you...validly authored is now not well-formed”

  • Problem: “a trackback with some illegal characters”

– “...your publishing tool had a bug”
– “The administration page itself tries to display the trackbacks you’ve received, and you get an XML processing error.”

http://diveintomark.org/archives/2004/01/14/thought_experiment


slide-123
SLIDE 123

Real Life


slide-124
SLIDE 124

Lesson #1

  • We are dealing with socio-political (and economic) phenomena

– Complex ones!
– Many players; many sorts of player
– Lots of historical specifics
– Lots of interaction effects

  • Human factors critical

– What do people do (and why?)
– How to influence them?
– Affordances and incentives
– Dealing with “bozos”

  • “There’s just no nice way to say this: Anyone who can’t make a syndication feed that’s well-formed XML is an incompetent fool.”


slide-125
SLIDE 125

3 Error Handling Styles

  • Draconian

– Fail hard and fast

  • Ignore errors

– CSS, DTD ATTLISTs, HTML

  • Hard coded DWIM repair

– HTML, HTML5

  • Ultimately, (some) errors are propagated

– The key is to fail correctly

  • In the right way, at the right time, for the right reason

– With the right message!

  • Better is to make errors unlikely!

Every set of bytes has a corresponding (determinate) DOM
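
A small Python contrast between the draconian and the forgiving style, using only the standard library. One caveat, stated loudly: `html.parser.HTMLParser` is a lenient tokenizer, not the HTML5 repair algorithm, so it shows "every input yields some event stream" rather than "every input yields a determinate DOM". The names `EventCollector`, `lenient_events` and `draconian_ok` are made up for this sketch.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class EventCollector(HTMLParser):
    """Records tag and text events without ever rejecting the input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def lenient_events(markup):
    """Forgiving style: always returns an event list, even for tag soup."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


def draconian_ok(markup):
    """Draconian style: one well-formedness error and the whole input fails."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


OVERLAP = "<b>hi <i>there</b>, my good friend</i>"
print(draconian_ok(OVERLAP))
print(lenient_events(OVERLAP))
```

The lenient side never complains about the overlapping tags; whether the consumer then does something sensible with those events is a separate question, and that is where hard-coded repair rules come in.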


slide-126
SLIDE 126

Wrap-up

Or, goodbyes and farewells

126


slide-127
SLIDE 127

Semi-structured Data

  • There’s a tension between

– flexibility and stability
– flexibility and efficiency
– expressivity and efficiency
– usability and flexibility
– usability and rigidity
– etc. etc. etc.

  • It is important to

– understand trade-offs
– cultivate judgement

  • Most things can be made to work

– there is no silver bullet

  • Most things can fail


slide-128
SLIDE 128

Last coursework

  • There’s the usual line up
  • COURSEWORK DEADLINE IS DIFFERENT

– Due MONDAY, NOV 5TH!!!
– At 9:00AM

  • Some “make up” work available

– Due after period 2
– So as not to conflict
– Practice some Java!


slide-129
SLIDE 129

The Exam

  • Electronic/Online

– Basically, an extended version of Qs and SEs

  • Revision session

– After break

  • Blackboard discussion area

– For revision


slide-130
SLIDE 130

Thanks for playing

  • Uli and I enjoyed working with you
  • There are many possible projects that would build on things you’ve learned; see me if you’re interested
