Internal to External
Or, spill your guts
Monday, 29 October 2012
– A series of octets – A series of unicode characters – A series of “events”
– A tree structure
– A tree of a certain shape
– An adorned tree of a certain shape
– A picture (or document, or action, or…)
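The first few levels can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree module; the sample document and variable names are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Level 1: a series of octets (the bytes stored on disk or sent over the wire).
octets = b'<a><b/><b c="bar"/></a>'

# Level 2: a series of Unicode characters (after decoding the octets).
characters = octets.decode("utf-8")

# Level 3: a series of "events" (what a streaming parser reports).
events = [(event, elem.tag)
          for event, elem in ET.iterparse(io.BytesIO(octets),
                                          events=("start", "end"))]

# Level 4: a tree structure (what a DOM-style parser builds).
tree = ET.fromstring(characters)

print(events)
print([child.attrib for child in tree])
```

The later levels (a tree of a certain shape, an adorned tree) need a schema and are not covered by this sketch.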
Well-formed
Only one way to parse it
Internal (DTD and doc are one)
External (Schema and doc are separate;
Test.xml:
<a> <b/> <b c="bar"/> </a>
sparse.dtd:
<!ELEMENT a (b)+>
<!ELEMENT b EMPTY>
<!ATTLIST b c CDATA #IMPLIED>
full.dtd:
<!ELEMENT a (b)+>
<!ELEMENT b EMPTY>
<!ATTLIST b c CDATA 'foo'>
Steps: Validate, Serialize, Query
Query after validating against full.dtd: count(//@c) = 2
Query after validating against sparse.dtd: count(//@c) = 1
Test-full.xml (serialized after validating against full.dtd):
<a> <b c="foo"/> <b c="bar"/> </a>
Test-sparse.xml (serialized after validating against sparse.dtd):
<a> <b/> <b c="bar"/> </a>
Can we think of Test-sparse and -full as “the same”?
Note: In oXygen, one needs to use internal validation.
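A minimal sketch of why the two counts differ. It assumes Python's standard xml.etree.ElementTree and a hypothetical helper, apply_attribute_defaults, that only mimics what a validating parser does with ATTLIST defaults; it is not a DTD validator:

```python
import xml.etree.ElementTree as ET

TEST_XML = '<a><b/><b c="bar"/></a>'

def apply_attribute_defaults(root, defaults):
    """Add each default attribute value where the attribute is absent.

    `defaults` maps (element name, attribute name) to a default value,
    a hypothetical stand-in for the ATTLIST declarations of a DTD.
    """
    for elem in root.iter():
        for (tag, attr), value in defaults.items():
            if elem.tag == tag and attr not in elem.attrib:
                elem.set(attr, value)
    return root

def count_c(root):
    """Rough equivalent of count(//@c): number of c attributes in the tree."""
    return sum(1 for elem in root.iter() if "c" in elem.attrib)

# sparse.dtd declares c as #IMPLIED, so nothing is added.
sparse = apply_attribute_defaults(ET.fromstring(TEST_XML), {})
# full.dtd declares the default 'foo' for b/@c.
full = apply_attribute_defaults(ET.fromstring(TEST_XML), {("b", "c"): "foo"})

print(count_c(sparse), count_c(full))
```

The same octets thus lead to different trees, depending on which DTD was used during parsing.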
– The PSVIs have different information in them!
Test.xml:
<a> <b/> <b/> </a>
bare.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="b" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="b"/>
</xs:schema>
typed.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a"/>
  <xs:complexType name="atype">
    <xs:sequence>
      <xs:element ref="b" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="b" type="btype"/>
  <xs:complexType name="btype"/>
</xs:schema>
Steps: Validate, Query2
Query after validating against either schema: count(//b) = 2
Query2 after validating against bare.xsd: count(//element(*,btype)) = ?
Query2 after validating against typed.xsd: count(//element(*,btype)) = 2
Note: In oXygen, one needs to use internal validation.
Note: WXS can do default attributes and elements as well.
Steps: Validate, Serialize, Query2
Test.xml after serialization:
<a> <b/> <b /> </a>
Does external through internal succeed? Does internal through external succeed?
XSLT 1.0
– see http://www.w3.org/TR/xslt
– makes heavy use of XPath 1.0
XSLT 2.0
– see http://www.w3.org/TR/xslt20
– makes heavy use of XPath 2.0
XSLT is a language for the transformation of XML documents into XML documents (among other things), where transformation includes the
– selection of parts of the source document,
– their re-arrangement, and
– the derivation of new content
* A proof: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.8846&rep=rep1&type=pdf
– well-formed (namespace aware) XML document
– which uses elements from the namespace http://www.w3.org/1999/XSL/Transform
– using traditionally “xsl” as a prefix for this namespace as in
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
XML documents
– function definitions, and – template rules
– though is strictly more expressive*
* The Complexity of XPath Query Evaluation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.7936
based on XPath with functions which can transform XML documents....
[Diagram: XQuery and XSLT compared. Shared: XPath (2.0), various control structures, user-defined (recursive) functions. XQuery: FLWOR, SQLish syntax. XSLT: template rules, XML syntax. Both can be schema aware!]
– Thus “homoiconic”
– Indeed, an instance of THE main XSLT data structure (i.e., XML DOMs)
– XPath is a key component – XPath is not homoiconic, nor is it XML based
Example from: http://www.xml.com/lpt/a/1549
The fact that an XSLT stylesheet is a well-formed XML document has a number of advantages, however:
common in practice)
based languages embedded within it. For example, this enables XSLT to support embedded schemas (a schema embedded within a stylesheet) in a way that XQuery can not. Similarly, XSLT can be easily embedded in pipeline processing languages such as Orbeon's XPL.
be reused, the whole range of XML techniques can be used when writing stylesheets (for example, use of external entities and CDATA sections) and there are no surprises in store for a user who knows the rules of XML. ... Another benefit I have seen from using XML syntax is that it makes the grammar of XSLT much more easily extensible than that of XQuery. Because it tries to make do without any reserved words, and because it mixes a number of different syntactic styles, the grammar of XQuery is a delicate creature. Adding new features like a full-text search capability requires very careful analysis to ensure that no grammatical ambiguities are introduced. By contrast, it's very easy to extend XSLT with new instructions or new attributes, without any risk of ambiguities or backwards incompatibilities. This means that it's quite possible for such extensions to be implemented by vendors (or even by third parties) as well as by the XSL Working Group itself.
Comparing XSLT and XQuery
http://www.mscs.mu.edu/~praveen/Teaching/fa05/AdvDb/PaperTeams/XSLT2XQTeam/Comparing%20XSLT%20and%20XQuery.htm
<xsl:template match="pig-rescue" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <html> <head> <link rel="stylesheet" type="text/css" href="bdr.css" /> <title>The Pigs</title> </head> <body> <div align="center"> <h1>The Pigs At Belly Draggers Ranch</h1> </div> <ul> <xsl:apply-templates select="animal[position() mod $perPage = 1]" mode="indexList" /> </ul> </body> </html> </xsl:template>
Example from: http://www.xml.com/lpt/a/1549
Literal Constructors (Data! XML!) XSLT Syntax (Code! XML!) XPath Syntax (Code! Not XML!)
let $doc := fn:input()/pig-rescue return ( <html> <head> <link rel="stylesheet" type="text/css" href="bdr.css" /> <title>The Pigs</title> </head> <body> <div align="center"> <h1>The Pigs At Belly Draggers Ranch</h1> </div> <ul> { local:make-name-list( $doc/animal ) } </ul> </body> </html> )
Example from: http://www.xml.com/lpt/a/1549
Literal Constructors (Data! XML!) XQuery Syntax (Code! Not XML!) XPath Syntax (Code! Not XML!)
XSLT: <xsl:param name="perPage" select="'4'"/>
XQuery: declare variable $perPage as xs:integer := 4;
XSLT: <xsl:value-of select="$filename"/>#a<xsl:value-of select="$start+position()-1"/>
XQuery: {$filename}#a{$start + $pos - 1}
– statically: when the stylesheet is compiled, and – dynamically: during evaluation of the stylesheet to transform a source document.
refer to named types from a schema (e.g., Person from
<xs:complexType name="Person">)
– basic XSLT processor and a – schema-aware XSLT processor – in <oXygen>, you have both
schemaxslt.html
An XSLT processor transforms a source tree into a result tree (or text) with template rules, each pairing a pattern with a template:
– match pattern against elements in source tree
– instantiate corresponding template to create parts of the result tree
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:mine="...">
top-level-elements
</xsl:stylesheet>
Alternatively:
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:mine="...">
top-level-elements
</xsl:transform>
An xsl:stylesheet can have zero or more of each of the following elements in (almost) any order:
xsl:import xsl:include xsl:strip-space xsl:preserve-space xsl:output xsl:key xsl:decimal-format xsl:namespace-alias xsl:attribute-set xsl:variable xsl:param xsl:template
later and in more detail
<xsl:template match="expression" name="qname" priority="number" mode="qname"> parameter-list template-def </xsl:template>
– with some restrictions, e.g., it must evaluate to a node set
– for XSLT 1.0, use XPath 1.0,
– for XSLT 2.0, use XPath 2.0,
– including instructions such as xsl:apply-templates or xsl:copy-of
the pattern the template
<xsl:template match=expression name = qname priority = number mode = qname> parameter-list template-def </xsl:template>
– are various built-in template rules
– is a default prioritisation on template rules
– is the XSLT processor that fires the template rules
<xsl:template match="emph"> <fo:inline-sequence font-weight="bold"> <xsl:apply-templates/> </fo:inline-sequence> </xsl:template>
yields
<fo:inline-sequence font-weight="bold"> important </fo:inline-sequence>
– thus we always have a context node
– match the context node and – have highest priority
as the blue and green in the previous example:
<xsl:template match="emph"> <fo:inline-sequence font-weight="bold"> <xsl:apply-templates/> </fo:inline-sequence> </xsl:template>
consider the following source tree:
[Tree diagram of the source document: root → people → two person nodes; the first person has age=41, name (first: Harry, last: Potter) and address (4 Main Road)]
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="my.xsl"?>
<people>
  <person age="41">
    <name>
      <first>Harry</first>
      <last>Potter</last>
    </name>
    <address>4 Main Road </address>
  </person>
  <person age="43">
    <name>
      <first>Tony</first>
      <last>Potter</last>
    </name>
    <address>4 Main Road </address>
  </person>
</people>
consider this source tree with the following XSLT stylesheet: what does this seemingly empty (no template rules!) stylesheet produce?
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> </xsl:stylesheet>
(tricky!) the previous stylesheet was only seemingly empty because XSLT processors employ built-in template rules: thus templates are applied to all nodes (element, root, text,..) except attribute and namespace nodes
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:template match="*|/"> <xsl:apply-templates/> </xsl:template> <xsl:template match="text()|@*"> <xsl:value-of select="."/> </xsl:template> <xsl:template match="processing-instruction()|comment()"/> </xsl:stylesheet>
(1) for all element & document nodes: (2) don’t do anything but apply templates to all child nodes
(3) for all text and attribute nodes: (4) return their value
(5) ignore p-i & comments
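The effect of the built-in rules can be imitated in a few lines. This is only a sketch in Python's standard xml.etree.ElementTree, not an XSLT processor: the functions process and apply_templates and the templates dictionary (element name mapped to a function) are hypothetical names, and the sample document is a compact variant of the people document.

```python
import xml.etree.ElementTree as ET

PEOPLE = """<people>
<person age="41"><name><first>Harry</first><last>Potter</last></name><address>4 Main Road</address></person>
<person age="43"><name><first>Tony</first><last>Potter</last></name><address>4 Main Road</address></person>
</people>"""

def apply_templates(elem, templates):
    """Process all child nodes (text and elements) in document order."""
    parts = [elem.text or ""]            # built-in rule for text: copy it
    for child in elem:
        parts.append(process(child, templates))
        parts.append(child.tail or "")   # text following the child element
    return "".join(parts)

def process(elem, templates):
    """Fire the user rule for this element if there is one, else the built-in."""
    rule = templates.get(elem.tag)
    if rule is not None:
        return rule(elem, templates)
    # Built-in rule for elements: do nothing but apply templates to children.
    # Attributes are never visited, so their values do not show up.
    return apply_templates(elem, templates)

root = ET.fromstring(PEOPLE)
print(process(root, {}))                                       # no user rules
print(process(root, {"person": lambda e, t: "Person found!"}))  # one user rule
```

With no user rules, the result is just the text content of the document; a single rule for person replaces everything below each person element.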
Built-in template rules:
(b) <xsl:template match="*|/"> <xsl:apply-templates select="node()"/> </xsl:template>
this is the default for “apply-templates”, and node() matches all nodes except attribute nodes & root node
(1) <xsl:template match="*|/"> <xsl:apply-templates select="node()|@*"/> </xsl:template>
if you want your stylesheet to consider attribute nodes, you must override this default, e.g. like this
If we use template rule (1), then it overrides built-in (b), hence now rules are applied to all nodes (element, root, text, ...) including attribute nodes but still excluding namespace nodes
(node() matches any node other than an attribute node and the root node)
what does this slightly more elaborate stylesheet yield? Note: <xsl:text> superfluous here, but helpful
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:template match= "person"> <xsl:text> Person found! </xsl:text> </xsl:template> </xsl:stylesheet>
we can make use of “functions” to retrieve the “value” of a node:
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:template match= "person"> Person found called: <xsl:value-of select="name"/> </xsl:template> </xsl:stylesheet>
we can conveniently copy a node and its complete sub-tree; alternatively, I could have used <xsl:copy-of select="*"/> or <xsl:copy-of select="person"/>
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:template match= "people"> <family> <xsl:copy-of select="child::*"/> </family> </xsl:template> </xsl:stylesheet>
– E.g., an identity function
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="@*|node()"> <xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy> </xsl:template> </xsl:stylesheet>
http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT
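For comparison, the same identity function written as an explicit recursion. This is a sketch using Python's standard xml.etree.ElementTree; copy_node is a hypothetical name, and the default parser drops comments and processing instructions, so only elements, attributes and text are copied:

```python
import xml.etree.ElementTree as ET

def copy_node(elem):
    """Return a deep copy of elem: same tag, attributes, text and children."""
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    clone.tail = elem.tail
    for child in elem:
        clone.append(copy_node(child))   # recurse, like apply-templates
    return clone

source = ET.fromstring(
    '<people><person age="41"><name>Harry</name> lives at '
    '<address>4 Main Road</address></person></people>')
clone = copy_node(source)
print(ET.tostring(clone, encoding="unicode"))
```

In the stylesheet the recursion is driven by the processor matching the single rule again for every node; here the function has to call itself.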
we can re-name elements and filter out data:
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match="person"> <myFriend> <xsl:apply-templates select="@*|*|text()"/> </myFriend> </xsl:template> <xsl:template match="@*|text()|*"> <xsl:copy> <xsl:apply-templates select="@*|text()|*"/> </xsl:copy> </xsl:template> <xsl:template match="address"/> </xsl:stylesheet>
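What the three rules do together (rename person, copy everything else, drop address) can be sketched as one recursive function. This assumes Python's standard xml.etree.ElementTree; rename_and_filter is a hypothetical name and the input is a whitespace-free variant of the people document:

```python
import xml.etree.ElementTree as ET

def rename_and_filter(elem):
    """Copy elem, renaming person to myFriend and dropping address elements."""
    tag = "myFriend" if elem.tag == "person" else elem.tag
    clone = ET.Element(tag, dict(elem.attrib))   # attributes are copied
    clone.text = elem.text
    clone.tail = elem.tail
    last = None
    for child in elem:
        if child.tag == "address":
            # Empty rule: emit nothing, but keep the text that followed it.
            if last is None:
                clone.text = (clone.text or "") + (child.tail or "")
            else:
                last.tail = (last.tail or "") + (child.tail or "")
            continue
        last = rename_and_filter(child)
        clone.append(last)
    return clone

source = ET.fromstring(
    '<people><person age="41"><name><first>Harry</first>'
    '<last>Potter</last></name><address>4 Main Road</address>'
    '</person></people>')
result = rename_and_filter(source)
print(ET.tostring(result, encoding="unicode"))
```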
we can even apply several rules to the same elements using modes for rules:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" version="2.0">
  <xsl:template match="/people">
    <html><body>
      <ol>
        <xsl:apply-templates select="person" mode="o"/>
      </ol>
      <xsl:apply-templates select="person" mode="f"/>
    </body></html>
  </xsl:template>
  <xsl:template match="person" mode="o">
    <li> <xsl:value-of select="name/first"/> <xsl:value-of select="name/last"/></li>
  </xsl:template>
  <xsl:template match="person" mode="f">
    <p> Last name: <xsl:value-of select="name/last"/> Age: <xsl:value-of select="@age"/> </p>
  </xsl:template>
</xsl:stylesheet>
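One way to think about modes: the rule to fire is looked up by the pair (pattern, mode) rather than by the pattern alone. A rough sketch in Python, with hypothetical function names and a RULES table keyed by (element name, mode); age is read as an attribute of person:

```python
import xml.etree.ElementTree as ET

def person_as_list_item(person):          # plays the role of mode "o"
    return "<li>{}{}</li>".format(person.findtext("name/first"),
                                  person.findtext("name/last"))

def person_as_paragraph(person):          # plays the role of mode "f"
    return "<p>Last name: {} Age: {}</p>".format(person.findtext("name/last"),
                                                 person.get("age"))

# Two rules for the same element, told apart by their mode.
RULES = {
    ("person", "o"): person_as_list_item,
    ("person", "f"): person_as_paragraph,
}

def apply_templates(nodes, mode):
    """Fire, for each node, the rule registered for its name and the mode."""
    return "".join(RULES[(node.tag, mode)](node) for node in nodes)

def people_to_html(people):
    persons = people.findall("person")
    return ("<html><body><ol>" + apply_templates(persons, "o") + "</ol>"
            + apply_templates(persons, "f") + "</body></html>")

people = ET.fromstring(
    '<people>'
    '<person age="41"><name><first>Harry</first><last>Potter</last></name></person>'
    '<person age="43"><name><first>Tony</first><last>Potter</last></name></person>'
    '</people>')
html = people_to_html(people)
print(html)
```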
<xsl:value-of select=expression/>
value that corresponds to that node, where the string value of
– a text node is its text
– an attribute node is its value
– an element or root node is the concatenation of the string values of all its descendant text nodes
– because then, we have more than “text” in text nodes, and need to take types into account...
<xsl:template match="person"> <Employee> <xsl:apply-templates/> </Employee> </xsl:template>
produced), e.g., in <xsl:template match="person"> <xsl:text> Person found! </xsl:text> </xsl:template>
in the result tree, with content the child nodes of that instruction, e.g. in <xsl:template match="person"> <xsl:element name="Employee"> <xsl:apply-templates/> </xsl:element> </xsl:template> handy for producing elements with attributes and namespaces
<xsl:template match="person"> <xsl:element name="Employee"> <xsl:attribute name="alter"> <xsl:value-of select="@age"/> </xsl:attribute> <xsl:apply-templates/> </xsl:element> </xsl:template>
selected through expression, the string value that corresponds to that node, where the string value of a
– text node is its text
– attribute node is its value
– element or root node is the concatenation of the string values of all its descendant text nodes
can be used to reuse fragments of the source document. Careful:
– <xsl:value-of> converts fragments into a string before copying it into the result tree
– <xsl:copy-of> copies the complete fragment based on the (required) select attribute, without first converting the fragment into a string
– e.g., <xsl:template match="people"> <family><xsl:copy-of select="*"/></family> </xsl:template>
template (in case it contains a template as child nodes)
– namespaces are included automatically in the copy
– attributes are not automatically included, they can be included via the “use-attribute-sets” attribute
running numbers -- beyond this class <xsl:template match="people"> <family> <xsl:for-each select="person"> <xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy> </xsl:for-each> </family> </xsl:template>
– <xsl:if>, <xsl:choose> – “if” in XPath (2.0) expressions
– <xsl:for-each> – “for” in XPath (2.0) expressions
– You can name templates and then call them by name
– Interrupt or restart the template control flow
– <xsl:function> defines – Use as XQuery functions in XPath expressions
<xsl:function name="str:reverse" as="xs:string"> <xsl:param name="sentence" as="xs:string"/> <xsl:sequence select="if (contains($sentence, ' ')) then concat(str:reverse(substring-after($sentence, ' ')), ' ', substring-before($sentence, ' ')) else $sentence"/> </xsl:function> <xsl:template match="/"> <output> <xsl:value-of select="str:reverse('DOG BITES MAN')"/> </output> </xsl:template>
http://www.w3.org/TR/xslt20/#stylesheet-functions
<xsl:function name="str:reverse" as="xs:string"> <xsl:param name="sentence" as="xs:string"/> <xsl:choose> <xsl:when test="contains($sentence, ' ')"> <xsl:sequence select="concat(str:reverse(substring-after($sentence, ' ')), ' ', substring-before($sentence, ' '))"/> </xsl:when> <xsl:otherwise> <xsl:sequence select="$sentence"/> </xsl:otherwise> </xsl:choose> </xsl:function>
http://www.w3.org/TR/xslt20/#stylesheet-functions
declare namespace str="http://ex.org"; declare namespace xs="http://www.w3.org/2001/XMLSchema"; declare function str:reverse($sentence as xs:string) as xs:string{ if (contains($sentence, ' ')) then concat(str:reverse(substring-after($sentence, ' ')), ' ', substring-before($sentence, ' ')) else $sentence }; str:reverse('DOG BITES MAN')
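The recursion is the same in any language. As a cross-check, here is a sketch of it in Python (reverse_words is a hypothetical name; str.partition plays the role of substring-before and substring-after):

```python
def reverse_words(sentence: str) -> str:
    """Reverse the order of the space-separated words in sentence."""
    if " " in sentence:
        # head is the text before the first space, rest the text after it.
        head, _, rest = sentence.partition(" ")
        return reverse_words(rest) + " " + head
    return sentence

print(reverse_words("DOG BITES MAN"))  # MAN BITES DOG
```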
XSLT:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:str="http://example.com/namespace"
    version="2.0" exclude-result-prefixes="str">
  <xsl:function name="str:reverse" as="xs:string">
    <xsl:param name="sentence" as="xs:string"/>
    <xsl:sequence select="if (contains($sentence, ' ')) then concat(str:reverse(substring-after($sentence, ' ')), ' ', substring-before($sentence, ' ')) else $sentence"/>
  </xsl:function>
  <xsl:template match="/">
    <output>
      <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
    </output>
  </xsl:template>
</xsl:transform>
XQuery:
declare namespace str="http://ex.org";
declare namespace xs="http://www.w3.org/2001/XMLSchema";
declare function str:reverse($sentence as xs:string) as xs:string {
  if (contains($sentence, ' '))
  then concat(str:reverse(substring-after($sentence, ' ')), ' ', substring-before($sentence, ' '))
  else $sentence
};
str:reverse('DOG BITES MAN')
declare function local:copy($element as element()) { element {node-name($element)} {$element/@*, for $child in $element/node() return if ($child instance of element()) then local:copy($child) else $child } };
http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="@*|node()"> <xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy> </xsl:template> </xsl:stylesheet>
Explicit recursion!
declare function local:copy-filter-elements($element as element(), $element-name as xs:string*) as element() { element {node-name($element) } { $element/@*, for $child in $element/node()[not(name(.)=$element-name)] return if ($child instance of element()) then local:copy-filter-elements($child,$element-name) else $child } };
http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT
<xsl:template match="@*|node()"> <xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy> </xsl:template> <!-- remove all social security numbers --> <xsl:template match="PersonSSNID"/>
Eek!
– Share XPath – Share ton of functions – Theoretically, equivalent!
– Have to cope with FLWOR
– Have to encode template rules!
XSLT2XQTeam/Presentation.ppt for one approach
Kay, Comparing XSLT and XQuery
– find more about them – experiment with schema awareness
– experiment with namespaces
– (and with SA and namespaces)
– get your own experiences using <oXygen/>
– have a look, e.g., at the influence of template rules’ order on the result!
– think about how one can compare XSLT and XQuery
...we have designed our first “schema validator” algorithm
– walking the DOM tree in a depth-first, left-2-right way, or – using a SAX parser to do it in a streaming fashion
– to keep track of
written on the way down, checked on the way up
with?
[Diagram: ValAlgo takes a tree T and a grammar G and answers “yes” if T ∈ L(G), “no” otherwise]
local⇒unique!
...we expand the algorithm
– this gives us automatically a validator for structural aspect of WXS – will be rather straightforward
– this gives us automatically a validator for Relax NG schemas – will be more tricky: we’ll still use stacks to keep track of
written on the way down, checked on the way up
with?
– walking the DOM tree in a depth-first, left-2-right way, or – using a SAX parser to do it in a streaming fashion
– which is quite impressive/surprising for general/Relax NG:
✓we have already seen that
➡ DTDs are structurally weaker than WXS ➡ WXS is structurally weaker than RelaxNG
[Diagram: nested classes of tree grammars, Loc ⊂ ST ⊂ Reg (local, single-type, regular)]
equally cheap?!
– because each accepted tree only has a unique run
– thus answers to schema-aware queries are unambiguously determined (1 doc validating against 1 schema results in exactly 1 PSVI)
[Diagram: a schema-aware parser takes the XML doc. and the schema and produces the PSVI (tree adorned with default values & types); a schema-aware query processor evaluates a query on the PSVI, a plain query processor evaluates a query on the XML doc.; each returns an answer]
Remember: G = (N, Σ, S, P) with N = {S1, S2, B}, Σ = {a, b}, S = {S1, S2}, P = {S1 ⟶ a B, S2 ⟶ a B, B ⟶ b ε}
G is not single-type and has 2 runs on the tree with root a and a single child b, namely:
– root a typed S1, child b typed B
– root a typed S2, child b typed B
...what does a query for nodes/elements of ‘type’ S1 return?
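The two runs can be found mechanically. The sketch below is an illustration only: it encodes the grammar in a hypothetical RULES table (non-terminal, element label, allowed sequences of child non-terminals), which works for this grammar because its content models are finite, and it uses Python's standard xml.etree.ElementTree for the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of P: (non-terminal, label, allowed child sequences).
RULES = [
    ("S1", "a", [("B",)]),   # S1 -> a B
    ("S2", "a", [("B",)]),   # S2 -> a B
    ("B", "b", [()]),        # B  -> b epsilon
]
START = {"S1", "S2"}

def possible_types(elem):
    """Return every non-terminal that can be assigned to elem in some run."""
    child_types = [possible_types(child) for child in elem]
    result = set()
    for nonterminal, label, sequences in RULES:
        if label != elem.tag:
            continue
        for seq in sequences:
            # seq fits if each child can take the non-terminal seq asks for.
            if len(seq) == len(child_types) and all(
                    n in types for n, types in zip(seq, child_types)):
                result.add(nonterminal)
    return result

root_types = possible_types(ET.fromstring("<a><b/></a>")) & START
print(sorted(root_types))  # ['S1', 'S2']
```

Two possible types for the root means two runs, so a query for elements of type S1 has no single well-defined answer.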
Input: DOM tree for T, single-type tree grammar G = (N, Σ, S, P)
NT is a stack of strings of non-terminals
R is a stack of production rules
Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
  if there is a production rule N → a e in P with a = E’s tag name and (E is root and N in S, or N occurs in RHS of topmost rule in R) [single-type ⇒ unique rule!]
  then push N → a e onto R [store rule for E’s content in R] and push ϵ onto NT [start remembering E’s child nodes]
  else report “not accepted” and stop
When an element E is visited on the way up,
  pop a rule N → a e out of R [retrieve rule for E’s content in R]
  pop a string of non-terminals w out of NT [retrieve E’s child nodes]
  if w matches e then pop a string w’ of non-terminals out of NT and push w’N onto NT [add E’s non-terminal N to the string of its predecessor siblings]
  else report “not accepted” and stop
report “accepted” and stop
[Diagram: ValAlgo takes an XML doc/tree T and a single-type grammar G and answers “yes” if T ∈ L(G), “no” otherwise]
See the paper by Murata, Lee, Mani, Kawaguchi
nothing changed
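A possible rendering of this algorithm in Python, as a sketch under stated assumptions rather than a reference implementation: non-terminals have one-letter names, each content model e is written as a Python regular expression over those letters, the RULES table is a hypothetical encoding of the example grammar S → a B,B*,D; B → b (C,C)|C; C → c ϵ|C; D → c C,C,C, and the streaming traversal uses xml.etree.ElementTree.iterparse.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding: non-terminal -> (element label, content model as a
# regular expression over one-letter non-terminal names).
RULES = {
    "S": ("a", "BB*D"),
    "B": ("b", "CC|C"),
    "C": ("c", "|C"),      # empty content or a single C
    "D": ("c", "CCC"),
}
START = {"S"}

def validate(xml_text, rules=RULES, start=START):
    """Return True if the document is accepted by the single-type grammar."""
    rule_stack = []    # R: one non-terminal (standing for its rule) per open element
    nt_stack = [""]    # NT: one string of child non-terminals per open element
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":                      # way down
            if rule_stack:
                model = rules[rule_stack[-1]][1]
                allowed = {ch for ch in model if ch in rules}
            else:
                allowed = start                   # E is the root
            candidates = [n for n in sorted(allowed) if rules[n][0] == elem.tag]
            if not candidates:
                return False
            rule_stack.append(candidates[0])      # single-type: at most one
            nt_stack.append("")                   # start remembering children
        else:                                     # way up
            nonterminal = rule_stack.pop()
            children = nt_stack.pop()
            if not re.fullmatch(rules[nonterminal][1], children):
                return False
            nt_stack[-1] += nonterminal           # report N to the parent
    return True

print(validate("<a><b><c/></b><c><c/><c/><c/></c></a>"))
print(validate("<a><c><c/><c/><c/></c></a>"))
```

Note how the label c is resolved to C inside b but to D directly below a: the topmost rule on R decides, which is exactly what single-type guarantees to be unique.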
[Tree diagram: example tree with nodes labelled a, b and c]
– G = ({S,B,C,D},{a,b,c},{S},P) with P = { S → a B,B*,D, B → b (C,C)|C, C → c ϵ|C, D → c C,C,C}
– ...in order to know which production rule N → c ... to choose for nodes labelled c, I need to check rule for predecessor and ensure that N
– walk the DOM tree in a depth-first, left-2-right way, or – use a SAX parser and do it in a streaming fashion
for general tree grammars
– we can have competing non-terminal symbols in RHS of rules – how do we know with which to continue? – try/guess one and, if failed, backtrack? – or by keeping track of all possibilities
[Diagram: ValAlgo takes an XML doc/tree T and any tree grammar G and answers “yes” if T ∈ L(G), “no” otherwise]
store non-terminals from RHS of possibly applicable rules
we don’t know which to use!
Input: DOM tree for T, a tree grammar G = (N, Σ, S, P)
NT is a stack of strings of sets of non-terminals
R is a stack of sets of production rules
NS is a stack of sets of non-terminals, initialised with S
Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
  set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS)
  if RS is non-empty then push RS onto R, push ϵ onto NT, push the set of all non-terminals occurring in RHS of a rule in RS onto NS
  else report “not accepted” and stop
When an element E is visited on the way up,
  pop a rule set RS = {Ni → a ei | i = 1..k} out of R
  pop a string of sets of non-terminals W1...Wm out of NT
  set W to the set of those Ni such that there is a w1...wm with each wj from Wj that matches ei
  if W is non-empty then pop a string V1...Vn of sets of non-terminals out of NT, push V1...VnW onto NT, pop NS
  else report “not accepted” and stop
report “accepted” and stop
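The set-based version can be sketched along the same lines. Again this is an illustration under assumptions: one-letter non-terminal names, content models written as Python regular expressions, a hypothetical RULES list encoding A → a B|ϵ, B → a (C,C)|B, C → a (A,A,A)|ϵ, and a brute-force search over all choices w1...wm (exponential in the worst case, fine for small examples).

```python
import io
import itertools
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding: (non-terminal, element label, content model regex).
RULES = [
    ("A", "a", "B?"),      # A -> a B | epsilon
    ("B", "a", "CC|B"),    # B -> a (C,C) | B
    ("C", "a", "AAA|"),    # C -> a (A,A,A) | epsilon
]
START = {"A", "B", "C"}

def validate(xml_text, rules=RULES, start=START):
    """Return True if some run of the tree grammar accepts the document."""
    names = {n for n, _, _ in rules}
    r_stack = []               # R: sets of possibly applicable rules
    nt_stack = [[]]            # NT: per open element, a list of sets
    ns_stack = [set(start)]    # NS: non-terminals allowed at this position
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":                               # way down
            rs = [r for r in rules
                  if r[1] == elem.tag and r[0] in ns_stack[-1]]
            if not rs:
                return False
            r_stack.append(rs)
            nt_stack.append([])
            ns_stack.append({ch for _, _, model in rs
                             for ch in model if ch in names})
        else:                                              # way up
            rs = r_stack.pop()
            child_sets = nt_stack.pop()
            # Keep N if some choice of one non-terminal per child matches e.
            w = {n for n, _, model in rs
                 if any(re.fullmatch(model, "".join(choice))
                        for choice in itertools.product(*child_sets))}
            if not w:
                return False
            nt_stack[-1].append(w)
            ns_stack.pop()
    return True

print(validate("<a><a/><a/></a>"))
print(validate("<a><a/></a>"))
```

Instead of guessing one rule and backtracking, every element carries the set of all non-terminals it could have; a production system would replace the brute-force product with an automaton over sets.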
– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C), C → a (A,A,A)|ϵ}
[Trace figure, initial state: tree of a-labelled nodes; rules numbered ➀, ➁, ➂; NS holds {A,B,C}]
– G = ({A,B,C},{a},{A,B,C},P) with P = { A → a B|ϵ, B → a (C,C) | B, C → a (A,A,A)|ϵ}
[Trace figure, first a visited on the way down: RS = {➀, ➁, ➂}; R holds {➀, ➁, ➂}; NT holds ϵ; NS holds {A,B,C}, {A,B,C}]
[Trace figure, second a visited on the way down: RS = {➀, ➁, ➂}; R holds two copies of {➀, ➁, ➂}; NT holds ϵ, ϵ; NS holds three copies of {A,B,C}]
ValAlgo XML doc/Tree T any tree Grammar G “yes”, if T ∈ L(G) “no”, otherwise
– Way down, 3rd element in document order (first child of the 2nd): RS = {➀, ➁, ➂}
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) (ϵ) (ϵ); NS = 4 × {A,B,C}
– Way down, 4th element in document order (first child of the 3rd, a leaf): RS = {➀, ➁, ➂}
  Stacks, bottom → top: R = 4 × {➀,➁,➂}; NT = (ϵ) (ϵ) (ϵ) (ϵ); NS = 5 × {A,B,C}
– Way up, 4th element: pop RS = {➀, ➁, ➂} out of R, pop ϵ = W1...Wk out of NT; W = {A,C} (ϵ only matches RHS of ➀ & ➂)
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) (ϵ) (ϵ); NS = 5 × {A,B,C}
– W = {A,C} is non-empty: append W to the string on top of NT, pop NS
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) (ϵ) ({A,C}); NS = 4 × {A,B,C}
– Way down, 5th element in document order (second child of the 3rd, a leaf): RS = {➀, ➁, ➂}
  Stacks, bottom → top: R = 4 × {➀,➁,➂}; NT = (ϵ) (ϵ) ({A,C}) (ϵ); NS = 5 × {A,B,C}
– Way up, 5th element: pop RS = {➀, ➁, ➂} out of R, pop ϵ = W1...Wk out of NT; W = {A,C} (ϵ only matches RHS of ➀ & ➂)
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) (ϵ) ({A,C}); NS = 5 × {A,B,C}
– W = {A,C} is non-empty: append W to the string on top of NT, pop NS
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) (ϵ) ({A,C},{A,C}); NS = 4 × {A,B,C}
– Way down, 6th element in document order (third child of the 3rd, a leaf): RS = {➀, ➁, ➂}
  Stacks, bottom → top: R = 4 × {➀,➁,➂}; NT = (ϵ) (ϵ) ({A,C},{A,C}) (ϵ); NS = 5 × {A,B,C}
– Way up, 6th element: pop RS = {➀, ➁, ➂} out of R, pop ϵ = W1...Wk out of NT; W = {A,C} (ϵ only matches RHS of ➀ & ➂)
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) (ϵ) ({A,C},{A,C}); NS = 5 × {A,B,C}
– W = {A,C} is non-empty: append W to the string on top of NT, pop NS
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) (ϵ) ({A,C},{A,C},{A,C}); NS = 4 × {A,B,C}
– Way up, 3rd element: pop RS = {➀, ➁, ➂} out of R, pop {A,C},{A,C},{A,C} = W1...W3 out of NT; W = {C} (LHS of ➂; only AAA matches a RHS, namely of ➂)
  Stacks, bottom → top: R = 2 × {➀,➁,➂}; NT = (ϵ) (ϵ); NS = 4 × {A,B,C}
– W = {C} is non-empty: append W to the string on top of NT, pop NS
  Stacks, bottom → top: R = 2 × {➀,➁,➂}; NT = (ϵ) ({C}); NS = 3 × {A,B,C}
– Way down, 7th element in document order (second child of the 2nd, a leaf): RS = {➀, ➁, ➂}
  Stacks, bottom → top: R = 3 × {➀,➁,➂}; NT = (ϵ) ({C}) (ϵ); NS = 4 × {A,B,C}
– Way up, 7th element: pop RS = {➀, ➁, ➂} out of R, pop ϵ = W1...Wk out of NT; W = {A,C} (ϵ only matches RHS of ➀ & ➂)
  Stacks, bottom → top: R = 2 × {➀,➁,➂}; NT = (ϵ) ({C}); NS = 4 × {A,B,C}
– W = {A,C} is non-empty: append W to the string on top of NT, pop NS
  Stacks, bottom → top: R = 2 × {➀,➁,➂}; NT = (ϵ) ({C},{A,C}); NS = 3 × {A,B,C}
– Way up, 2nd element: pop RS = {➀, ➁, ➂} out of R, pop {C},{A,C} = W1...Wk out of NT; W = {B} (only CC matches RHS of ➁)
  Stacks, bottom → top: R = 1 × {➀,➁,➂}; NT = (ϵ); NS = 3 × {A,B,C}
– W = {B} is non-empty: append W to the string on top of NT, pop NS
  Stacks, bottom → top: R = 1 × {➀,➁,➂}; NT = ({B}); NS = 2 × {A,B,C}
– Way up, 1st element (the root): pop RS = {➀, ➁, ➂} out of R, pop {B} = W1...Wk out of NT; W = {A,B} (B matches RHS of ➀ & ➁)
  Stacks, bottom → top: R = empty; NT = empty; NS = 2 × {A,B,C}
– W = {A,B} is non-empty: pop NS, leaving NS = {A,B,C}; R and NT are empty
  “accepted”/“yes”, T is accepted by G
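The walk-through above can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's reference implementation: content models are encoded as Python regular expressions over one-letter non-terminal names, `iterparse` stands in for a SAX parser, and the example tree `a(a(a(a,a,a),a))` is the shape the trace is consistent with. The names `RULES`, `START` and `accepts` are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET
from itertools import product

# Each rule is (non-terminal, tag, content model). The content model is a
# regular expression over one-letter non-terminal names (an assumed encoding).
RULES = [
    ("A", "a", r"B?"),       # rule 1: A -> a (B | eps)
    ("B", "a", r"CC|B"),     # rule 2: B -> a ((C,C) | B)
    ("C", "a", r"(AAA)?"),   # rule 3: C -> a ((A,A,A) | eps)
]
START = {"A", "B", "C"}


def accepts(xml_text, rules=RULES, start=START):
    """Return True if the document is accepted by the grammar."""
    R = []              # stack of candidate rule sets, one per open element
    NT = []             # stack of strings (lists) of non-terminal sets
    NS = [set(start)]   # stack of non-terminals allowed at the current position
    root_result = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":  # way down
            rs = [r for r in rules if r[1] == elem.tag and r[0] in NS[-1]]
            if not rs:
                return False
            R.append(rs)
            NT.append([])
            # all non-terminals occurring in the RHS of a rule in rs
            NS.append({ch for _, _, model in rs for ch in model if ch.isalpha()})
        else:  # way up
            rs = R.pop()
            child_sets = NT.pop()
            # keep N_i if some choice w_1...w_k (w_j from W_j) matches e_i;
            # brute force over all choices, exponential in the worst case
            w = {n for n, _, model in rs
                 if any(re.fullmatch(model, "".join(word))
                        for word in product(*child_sets))}
            if not w:
                return False
            NS.pop()
            if NT:
                NT[-1].append(w)   # append W to the parent's string
            else:
                root_result = w    # the root has been closed
    return bool(root_result & set(start))


print(accepts("<a><a><a><a/><a/><a/></a><a/></a></a>"))
```

Trying every combination of child non-terminals keeps the sketch short; a practical validator would run the content-model automata on sets of states instead.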
– walk the DOM tree in a depth-first, left-to-right way, or – use a SAX parser and do it in a streaming fashion
– does not require guessing & backtracking – can be implemented in a streaming way – is a bit more tricky than validating single-type grammars, – but not really more complex (in terms of time/space)
– so, for validation purposes, restriction to single-type grammars is not necessary
– but, for uniqueness of PSVI,
– do persons’ names have both a first and second name?
– is age an integer? And DoB a date?
– adding default values or type information for easy/robust querying/manipulation
– in CW5, not all valid input documents were really grammars – checking whether non-terminals are mentioned correctly is beyond XSD’s abilities...we need an even more powerful schema language!
...closely related to validation are
– given a schema/grammar S, does there exist a document/tree d such that d is valid w.r.t. S – relevant as a basic consistency test for schemas
– given schemas/grammars S1, S2, is S1 a specialization of S2? – i.e., is every document that is valid w.r.t. S1 also valid w.r.t. S2? – relevant to support tasks such as schema refinement:
what I wanted
– also solves schema equivalence: see your coursework!
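As a rough illustration of the satisfiability test, the sketch below computes which non-terminals can derive at least one finite tree. It assumes a simplified grammar encoding in which every content model is a finite list of alternative child sequences (no `*` or `+`), and it ignores tag names because they do not affect emptiness. The names `productive_nonterminals`, `satisfiable` and `GRAMMAR` are hypothetical.

```python
def productive_nonterminals(rules):
    """rules maps a non-terminal to a list of alternatives; each alternative
    is a tuple of non-terminals required as children, in order."""
    productive = set()
    changed = True
    while changed:  # fixpoint iteration
        changed = False
        for nt, alternatives in rules.items():
            if nt in productive:
                continue
            # nt is productive if all children of some alternative are productive
            if any(all(child in productive for child in alt) for alt in alternatives):
                productive.add(nt)
                changed = True
    return productive


def satisfiable(rules, start):
    """True if some start non-terminal derives at least one tree."""
    return bool(productive_nonterminals(rules) & set(start))


# The grammar G from the validation example, in the simplified encoding.
GRAMMAR = {
    "A": [("B",), ()],
    "B": [("C", "C"), ("B",)],
    "C": [("A", "A", "A"), ()],
}

print(satisfiable(GRAMMAR, {"A", "B", "C"}))
print(satisfiable({"X": [("X",)]}, {"X"}))
```

The second call shows an unsatisfiable schema: `X` always requires another `X` child, so no finite document exists.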
Or, so wrong it’s right
– Authoring, aggregating, querying…
– Perhaps the Atom DOM
– To the representative one – Or build software that mediates the difference
– Or make them – The nice thing about standards is that there are so many of them to choose from.
– Many DOMs, all expressing the same thing – Many surface syntaxes (perhaps) for each DOM
– What should we send?
– Minimal standards?
Be liberal in what you accept, and conservative in what you send.
– A very common application layer is “rendering”
<sentence style="slanted">This sentence is false.</sentence>
Correct rendering: This sentence is false. (shown slanted)
Fallback: This sentence is false. (shown unstyled)
(Still see this in XSLT!)
– The "look" may need updating
– What works for 21" screens doesn't for mobile phones
– (content should be perceivable by everyone)
– Strong separation of presentation
– not an XML/angle brackets format
– annotative, not transformative
– mostly “formats” nodes – ubiquitous on the Web, esp. client side – works with arbitrary XML
– Which consist of
– Like XPath expressions – But only forward, with some syntactic sugar
– Sets of property/value pairs
div.title { text-align:center; font-size: 24; }
<html><head><title>A bit of style</title></head>
<body><style type="text/css">
  .title { font-weight: bold }
  div.title { text-align:center; font-size: 24; }
  div.entry div.title { text-align: left; font-variant: normal}
  span.date {font-style: italic}
  span.date:after {content:" by"}
  div.content {font-style: italic}
  div.content i {font-style: normal;font-weight: bold}
  #one {color: red}</style>
<div class="title">My Weblog</div>
<div class="entry">
  <div class="title">What I Did Today</div>
  <div class="byline">
    <span class="date">Feb. 09, 2009</span>
    <span class="author">Bijan Parsia</span>
  </div>
  <div class="content" id="one">
    <p>Taught a class and it went <i>very</i> well.</p>
  </div>
</div>
</body></html>
Try it in http://software.hixie.ch/utilities/js/live-dom-viewer/
– Screen, Print, Braille, Aural…
@media print { BODY { font-size: 10pt } }
@media screen { BODY { font-size: 12pt } }
Larger font size for screen
– That is, there is overriding (and non-overriding) inheritance
– http://www.w3.org/TR/CSS21/cascade.html#cascade
– Distance to the node is significant – Precision of selectors is significant – Order of appearance is significant
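To make "precision of selectors" and "order of appearance" concrete, here is a small sketch that ranks competing rules. It is a simplification, not the full CSS 2.1 cascade: it only handles selectors built from element names, classes and ids (as in the weblog example), ignores pseudo-classes, pseudo-elements, origin and `!important`, and assumes every candidate already matches the node. The function names are hypothetical.

```python
import re


def specificity(selector):
    """Return (ids, classes, element names) for a simple CSS selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # an element name starts a compound selector: at the start or after a space
    elements = len(re.findall(r"(?:^|\s)[A-Za-z][\w-]*", selector))
    return (ids, classes, elements)


def winning_value(candidates):
    """candidates: (selector, value) pairs in source order, all assumed to
    match the same node and set the same property. Higher specificity wins;
    on a tie, the later rule wins."""
    index, (selector, value) = max(
        enumerate(candidates),
        key=lambda item: (specificity(item[1][0]), item[0]),
    )
    return value


# text-align for the entry title in the weblog example
print(winning_value([("div.title", "center"), ("div.entry div.title", "left")]))
```

Under these assumptions the entry title ends up left-aligned, because `div.entry div.title` is more specific than `div.title`.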
– Well formedness error…BOOM
– “Rules for handling parsing errors”
http://www.w3.org/TR/CSS21/syndata.html#parsing-errors
– E.g.,“User agents must ignore a declaration with an unknown property.”
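The "ignore what you do not understand" rule can be sketched as follows. This is an illustrative toy, not a CSS parser: it splits a declaration block naively on `;` and `:` (so it would mishandle strings containing those characters), and `KNOWN_PROPERTIES` is a hypothetical subset of properties chosen for the example.

```python
# Hypothetical subset of properties this toy "user agent" understands.
KNOWN_PROPERTIES = {"text-align", "font-size", "font-weight", "font-style", "color"}


def parse_declarations(block, known=KNOWN_PROPERTIES):
    """Parse 'name: value; name: value' and silently drop what is unknown."""
    declarations = {}
    for part in block.split(";"):
        if ":" not in part:
            continue  # empty or malformed declaration: skip it, keep going
        name, value = part.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name in known and value:
            declarations[name] = value  # later declarations override earlier ones
    return declarations


print(parse_declarations("text-align:center; font-sise: 24; color: red"))
```

The misspelt `font-sise` declaration is dropped while its neighbours survive, which is the opposite of XML's behaviour on a well-formedness error.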
– Cascading & Inheritance help with 1, 2, 5
– @media rules help with 3-6 – Error handling helps with 1, 2, 4
– Make errors hard or impossible to make
– Make doing the right thing easy and inevitable – Make detecting errors easy – Make correcting errors easy – Correct errors – Fail silently – Fail randomly – Fail differently (interop problem)
– an available action that is salient to the actor
Donald Norman, The Design of Everyday Things
– with a bad or wrong action – In law, “a hazardous object or condition on the land that is likely to attract children who are unable to appreciate the risk posed by the object or condition” -- ye olde Wikipedia – We can reformulate
misused by (even) an educated user”
– An attractive nuisance is easy to attempt, hard to use (correctly), and has bad (to catastrophic) effects
– Recognize all or none
– Restrictive by default
– Error detection and reporting
– might not be the point where it occurred – might not be the most helpful point to look at!
– Null pointer deref » Is the right point the deref or the setting to null? – Non-crashing errors
– Irregular structure!
– Be strict about the well formedness of what you accept, and strict in what you send – Draconian error handling – Severe consequences on the Web
– Validity and other analysis? – Most schema languages poor at error reporting
– fatal error [Definition: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).]
– To or for its users
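Draconian error handling is easy to observe with Python's standard XML parser, used here only as one example of a conforming processor. The helper name `parse_or_report` is hypothetical; `ET.fromstring` and `ET.ParseError` (with its `position` attribute) are part of the standard library.

```python
import xml.etree.ElementTree as ET


def parse_or_report(xml_text):
    """Return (root, None) for well-formed input, (None, message) otherwise."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # the parser stops at the first well-formedness error
        line, column = err.position
        return None, f"fatal error at line {line}, column {column}"
    return root, None


print(parse_or_report("<a><b/></a>"))
print(parse_or_report("<a><b></a>"))
```

In the second call the application gets no tree at all, only the error report: normal processing does not continue past the fatal error.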
<a> <b/> <b/> <b/> </a> valid.xml <!ELEMENT a (b)+> <!ELEMENT b EMPTY> simple.dtd <a> <b/> <b>Foo</b> <b><b/></b> </a> invalid.xml
count(//b) count(//b/*) count(//b/text()) =3 =3 =0 =1 =0 =1
<a> <b/> <b>Foo</b> </a>
=0
<a> <b/> <b><b/><b/> </a>
=0
<a> <b/> <b/> <b/> </a> valid.xml <!ELEMENT a (b)+> <!ELEMENT b EMPTY> simple.dtd <a> <b/> <b>Foo</b> <b><b/></b> </a> invalid.xml
=0 =2
<a> <b/> <b>Foo</b> </a>
=1
<a> <b/> <b><b/><b/> </a>
=1
<a> <b/> <b/> <b/> </a> valid.xml <!ELEMENT a (b)+> <!ELEMENT b EMPTY> simple.dtd <a> <b/> <b>Foo</b> <b><b/></b> </a> invalid.xml
=valid =invalid
<a> <b/> <b>Foo</b> </a> <a> <b/> <b><b/><b/> </a>
Can even “find” the errors!
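The idea of using a query as a validity check can be sketched in Python. Since simple.dtd declares `b` as EMPTY, any element child or text node under a `b` is a violation, so a count of zero means this constraint holds. The sketch uses ElementTree instead of an XPath engine and only checks this one constraint (it does not check `a`'s content model); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def content_under_b(xml_text):
    """Count element children and text nodes of all b elements, roughly
    count(//b/*) + count(//b/text())."""
    root = ET.fromstring(xml_text)
    count = 0
    for b in root.iter("b"):
        count += len(b)                               # element children
        count += 1 if b.text else 0                   # text before the first child
        count += sum(1 for child in b if child.tail)  # text after a child
    return count


VALID = "<a> <b/> <b/> <b/> </a>"
INVALID = "<a> <b/> <b>Foo</b> <b><b/></b> </a>"

print(content_under_b(VALID), content_under_b(INVALID))
```

A non-zero count signals a violation, and the offending `b` elements themselves are the locations of the errors.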
– Validate parts of a document – A la wildcards
– Far-reaching dependencies – Computations
– With XQuery and XSLT – But still a leetle declarative
The essence of Schematron
– Not grammar or object/type based – Rule based – Test oriented – Complementary
– Patterns contain rules
– Tests, which are XPath expressions, and – Assertions, which are natural language descriptions
– (Ok, could handle this with Keys in XML Schema!)
<rule context="element">
  <let name="n" value="@name"/>
  <assert test="count(//element/name[text()=$n]) = 1">
    There can be only one element declaration with a given name.
  </assert>
</rule>
declaration ”
<rule context="elementref">
  <let name="r" value="/ref/text()"/>
  <assert test="count(//element/name[text()=$r]) = 1">
    There must be an element declaration (with the right name) for elementref to refer to.
  </assert>
</rule>
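For readers without a Schematron processor at hand, the two rules can be approximated in plain Python. This is only a sketch of what the assertions test, not how Schematron is implemented (Schematron is typically compiled to XSLT). The document layout, with `<name>` and `<ref>` child elements under a `<grammar>` root, is an assumption made for illustration, as is the function name.

```python
import xml.etree.ElementTree as ET


def check_grammar(xml_text):
    """Return the messages of all failed assertions."""
    root = ET.fromstring(xml_text)
    declared = [e.findtext("name") for e in root.iter("element")]
    declared = [name for name in declared if name is not None]
    errors = []
    # rule 1: each declared name occurs exactly once
    for name in sorted(set(declared)):
        if declared.count(name) != 1:
            errors.append(f"There can be only one element declaration with a given name: {name}")
    # rule 2: each reference points at exactly one declaration
    for ref in root.iter("elementref"):
        target = ref.findtext("ref")
        if declared.count(target) != 1:
            errors.append(f"There must be an element declaration for elementref to refer to: {target}")
    return errors


GOOD = "<grammar><element><name>a</name></element><elementref><ref>a</ref></elementref></grammar>"
BAD = ("<grammar><element><name>a</name></element><element><name>a</name></element>"
       "<elementref><ref>b</ref></elementref></grammar>")

print(check_grammar(GOOD))
print(check_grammar(BAD))
```

As in Schematron, each failed test yields a natural-language message rather than a bare "invalid".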
–Using XPath functions and variables
–Can pull stuff from other file
–diagnostics has (value-ofed) expressions –“Generate paths” to errors
–Thin shim over XSLT –Closer to “arbitrary code”
– Schematron doesn’t care – Two phase validation
– Plus variables!
– Unlike all the other schema languages! – We’re not performing runs
– Somewhat easy to use
– What about analysis?
–As do all XML schema languages
–So can’t help with e.g., overlapping tags
–At least, in the default case
–Unlike CSS
–Or rather, does it support enough liberality?
03. <title>Hello!</title> 04. <meta http-equiv="Content-Type" content="application/xhtml+xml" />
07. <p>Hello to you!</p> 08. <p>Can you spot the problem?
Slide due to Iain Flynn
– 1%-5% of web pages are valid – Validation is very weak! – All sorts of breakage
– 10% feeds not well-formed – Where do the problems come from?
In 2005, the developers of Google Reader (Google’s RSS and Atom feed parser) took a snapshot of the XML documents they parsed in one day.
least one well-formedness error.
– That’s a lot of broken documents
Source: http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html Slide due to Iain Flynn
The Quality of the XML Web [2011]
[Chart: errors by category: Encoding, Structure, Entity, Typo]
Slide due to Iain Flynn
Slide due to Iain Flynn
– “All of its default templates were valid XHTML.” – “It incorporated a nifty layout editor to ensure that you couldn’t introduce any invalid XHTML...”
– “the page that you...validly authored is now not well-formed”
– “...your publishing tool had a bug” – “The administration page itself tries to display the trackbacks you’ve received, and you get an XML processing error.”
http://diveintomark.org/archives/2004/01/14/thought_experiment
– Complex ones! – Many players; many sorts of player – Lots of historical specifics – Lots of interaction effects
– What do people do (and why?) – How to influence them? – Affordances and incentives – Dealing with “bozos”
syndication feed that’s well-formed XML is an incompetent fool.”
– Fail hard and fast
– CSS, DTD ATTLISTs, HTML
– HTML, HTML5
– The key is to fail correctly
– With the right message!
Every set of bytes has a corresponding (determinate) DOM
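The forgiving style can be seen with Python's `html.parser`, which reports tags for markup that an XML parser would reject. Note the assumption: `html.parser` is lenient but does not implement the HTML5 parsing algorithm or build a DOM, so this only illustrates "never fail on input", not HTML5's determinate error recovery. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start and end tags in the order they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# the second paragraph is never closed, yet no exception is raised
print(tag_events("<p>Hello to you!</p><p>Can you spot the problem?"))
```

What the application does with the unclosed paragraph is then its own decision, which is exactly the interoperability risk that a specified recovery algorithm is meant to remove.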
Or, goodbyes and farewells
– flexibility and stability – flexibility and efficiency – expressivity and efficiency – usability and flexibility – usability and rigidity – etc. etc. etc.
– understand trade-offs – cultivate judgement
– there is no silver bullet
– Due MONDAY, NOV 5TH!!! – At 9:00AM
– Due after period 2 – So as not to conflict – Practice some Java!
– Basically, an extended version of Qs and SEs
– After break
– For revision
things you’ve learned; see me if you’re interested