COMP60411 Semi-structured Data and the Web: Datatypes, Relax NG, XML

Complex Typed Transform
• Input and output all typed

    <?xml version="1.0" encoding="UTF-8"?>
    <owl:Ontology xmlns:owl="http://www.w3.org/2002/07/owl#">
      <owl:EquivalentClasses>
        <owl:Class IRI="http://BOGUS"/>
        <owl:Class IRI="http://BOGUS"/>
      </owl:EquivalentClasses>
      <owl:EquivalentClasses>
        <owl:Class IRI="http://BOGUS"/>
        <owl:Class IRI="http://BOGUS"/>
      </owl:EquivalentClasses>
      <owl:EquivalentClasses>
        <owl:Class IRI="Person"/>
        <owl:Class IRI="http://BOGUS"/>
      </owl:EquivalentClasses>
      <owl:EquivalentClasses>
        <owl:Class IRI="http://BOGUS"/>
        <owl:Class IRI="http://BOGUS"/>
      </owl:EquivalentClasses>
    </owl:Ontology>

• IRI="Person" is the only “proper” value

Sunday, 21 October 2012

Type Soundness
Type-inference rules are written in such a way that any value that can be returned by an expression is guaranteed to conform to the static type inferred for the expression. This property of a type system is called type soundness. A consequence of this property is that a query that raises no type errors during static analysis will also raise no type errors during execution on valid input data. The importance of type soundness depends somewhat on which errors are classified as "type errors," as we will see below.
(http://www.informit.com/articles/article.aspx?p=100667&seqNum=6)
• A (statically verified) type safe program
  – has some guaranteed behavior
    • and thus can be transformed or optimized in aggressive ways
  – may be more brittle
    • fails hard on invalid input
    • less input is valid

Data Representations
• Data and data structures have representations
  – (More or less) Physical embodiments
  – (Ultimately) Bits in a machine
• The “same” data can have distinct representations
  – 1 vs. “one”
• The “same” data structure can have distinct representations
  – At different levels of abstraction
• One key distinction
  – Internal (“in-memory”)
  – External (“on disk”)
  – (“Location” doesn’t really matter)
• Generally:
  – External representations are for exchange between (heterogeneous) systems

A Java Example (1)
• Consider a value of type int*
  – 109987
• We have several canonical external representations:
  – Decimal: 109987
  – Hexadecimal: 1ADA3 (0x1ADA3 in source code)
  – Octal: 326643 (0326643 in source code)
• We have one (canonical) internal representation:
  – 32 bit, signed two’s complement
    • 11010110110100011
  – (Each “digit” is a bit, not a character)
  – The representations are different
    • Decimal size in memory: Approx** 48 bytes
    • Internal rep: 4 bytes
* We consider only ints, i.e., 32 bit integers
** http://www.javaworld.com/javaworld/javatips/jw-javatip130.html?page=2
** See also: http://lingpipe-blog.com/2010/06/22/the-unbearable-heaviness-jav-strings/
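The same distinction can be tried out quickly. The sketch below uses Python rather than Java, purely for brevity; the choice of `struct` with a big-endian signed 32-bit layout is an assumption made to mirror a Java int, not something the slide prescribes.

```python
import struct

value = 109987

# External representations: sequences of characters.
decimal = str(value)
hexadecimal = format(value, "X")
octal = format(value, "o")
binary = format(value, "b")

# Internal representation: 4 bytes, signed two's complement
# (big-endian byte order chosen here for illustration).
packed = struct.pack(">i", value)

# Parsing any of the external representations gives back the same value.
assert int(hexadecimal, 16) == int(octal, 8) == int(binary, 2) == value

print(decimal, hexadecimal, octal, binary, len(packed))
# prints: 109987 1ADA3 326643 11010110110100011 4
```

The character strings differ in length and alphabet, while the packed form always occupies four bytes.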

A Java Example (2)
• We have APIs (the Integer class):
  – Reading/Parsing/Deserializing/Unmarshalling
  – Writing/Printing/Serializing/Marshalling
  – ADT functions
    • +, -, /, *, <, >, etc.
  – For examining and manipulating the internal rep
http://download.oracle.com/javase/6/docs/api/java/lang/Integer.html

JSON (1)
• Javascript has a rich set of literals (ext. reps)
  – Atomic (numbers, booleans, strings*)
    • 1, 2, true, “I’m a string”
  – Composite
    • Arrays
      – Ordered lists with random access
      – [1, 2, “one”, “two”]
    • “Objects”
      – Associative arrays/dictionary
      – {“one”:1, “two”:2}
  • These can nest!
    – [{“one”:1, “o1”:{“a1”: [1,2,3.0], “a2”:[]}}]
• JSON == roughly this subset of Javascript
  – The internal representation varies
    • In JS, 1 represents a 64 bit, IEEE floating point number
    • In Python’s json module, 1 is parsed as an int (an arbitrary-precision integer in Python 3)
* Strings can be thought of as a composite, i.e., an array of characters, but not here.
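To see which internal representation one particular system picks, a nested literal like the one above can be parsed with Python's standard `json` module. This is only a sketch of one system's behaviour; other JSON libraries may choose differently.

```python
import json

text = '[{"one": 1, "o1": {"a1": [1, 2, 3.0], "a2": []}}]'

# External to internal: parse the character string into Python values.
data = json.loads(text)
first = data[0]

# Record which Python type each JSON value was mapped to.
kinds = {
    "one": type(first["one"]).__name__,
    "a1": [type(x).__name__ for x in first["o1"]["a1"]],
    "a2": type(first["o1"]["a2"]).__name__,
}
print(kinds)

# Internal to external and back again preserves the value here.
assert json.loads(json.dumps(data)) == data
```

Note that `1` and `3.0` end up with different Python types, although JavaScript would treat both as the same kind of number.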

JSON (2)

    {"menu": {
      "id": "file",
      "value": "File",
      "popup": {
        "menuitem": [
          {"value": "New", "onclick": "CreateNewDoc()"},
          {"value": "Open", "onclick": "OpenDoc()"},
          {"value": "Close", "onclick": "CloseDoc()"}
        ]
      }
    }}

Slightly different!

    <menu id="file" value="File">
      <popup>
        <menuitem value="New" onclick="CreateNewDoc()" />
        <menuitem value="Open" onclick="OpenDoc()" />
        <menuitem value="Close" onclick="CloseDoc()" />
      </popup>
    </menu>

http://www.json.org/example.html

JSON (2.1)
(Arrays are needed to preserve order!)

    {"menu": [{"id": "file", "value": "File"},
      "popup": [
        "menuitem": {"value": "New", "onclick": "CreateNewDoc()"},
        "menuitem": {"value": "Open", "onclick": "OpenDoc()"},
        "menuitem": {"value": "Close", "onclick": "CloseDoc()"}
      ]
    ]
    }}

Still not right!

    <menu id="file" value="File">
      <popup>
        <menuitem value="New" onclick="CreateNewDoc()" />
        <menuitem value="Open" onclick="OpenDoc()" />
        <menuitem value="Close" onclick="CloseDoc()" />
      </popup>
    </menu>

http://www.json.org/example.html

JSON (2.2)

    {"menu": [{"id": "file", "value": "File"},
      [{"popup": [{},
        [{"menuitem": [{"value": "New", "onclick": "CreateNewDoc()"}, []]},
         {"menuitem": [{"value": "Open", "onclick": "OpenDoc()"}, []]},
         {"menuitem": [{"value": "Close", "onclick": "CloseDoc()"}, []]}
        ]
      ]}]
    ]}

    <menu id="file" value="File">
      <popup>
        <menuitem value="New" onclick="CreateNewDoc()" />
        <menuitem value="Open" onclick="OpenDoc()" />
        <menuitem value="Close" onclick="CloseDoc()" />
      </popup>
    </menu>

http://www.json.org/example.html

JSON (2.1) Recipe
• Elements are mapped to “objects”
  – With one pair
    • ElementName : contents
• Contents are a list
  – First item is an “object”, the attributes
    • Attributes are pairs of strings
  – Second item is a list (of children)
• Empty elements require an explicit empty list
• No attributes requires an explicit empty object
Cumbersome!
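The recipe is mechanical enough to sketch in a few lines. The function below is a minimal illustration using Python's standard `xml.etree.ElementTree`; the name `element_to_json` is hypothetical, and the sketch assumes element-only content (text nodes are ignored, which the recipe does not cover either).

```python
import json
import xml.etree.ElementTree as ET

def element_to_json(elem):
    """Map an element to a one-pair object: name -> [attributes, children]."""
    children = [element_to_json(child) for child in elem]  # order is preserved
    return {elem.tag: [dict(elem.attrib), children]}

xml_text = """
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
"""

converted = element_to_json(ET.fromstring(xml_text))
print(json.dumps(converted, indent=1))
```

Empty elements come out with an explicit `[]` and attribute-free elements with an explicit `{}`, which is exactly the cumbersome part.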

JSON vs. XML (expressivity)
• Every XML WF DOM can be faithfully represented as a JSON object
• Every JSON object can be faithfully represented as an XML WF DOM
• Every WXS PSVI can be faithfully represented as a JSON object
• Every JSON object can be faithfully represented as a WXS PSVI

Conversion
• We can go from external to internal (e2i)
  – Parsing, reading, loading, de-serializing, unmarshalling
• We can go from internal to external (i2e)
  – Serializing, writing, printing, saving, marshalling
  – Different systems may have different internals
    • At least in detail
  – Different applications may behave differently
• There and back again
  – Roundtripping
    • Internal to external to internal (i2e2i)
    • External to internal to external (e2i2e)
  • Ideally preserves key properties
    – Which?
    – When is it OK not to preserve them?

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
    (Errors in these layers mean no XML! See the SAX ErrorHandler.)
  – A tree structure
    • A DOM/Infoset
    (Yay! XPath! XSLT! Etc.)
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
    (Types in play)
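The first four layers can be walked through with Python's standard library. This is a sketch only: the small document and the class name `EventCollector` are made up for illustration, and character events are deliberately ignored to keep the output short.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Layer 1: a series of octets.
octets = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<name><first>Harry</first><last>Potter</last></name>')

# Layer 2: a series of Unicode characters.
characters = octets.decode("utf-8")

# Layer 3: a series of events (the SAX perspective).
class EventCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

collector = EventCollector()
xml.sax.parseString(octets, collector)

# Layer 4: a tree structure.
tree = ET.fromstring(octets)

print(len(octets), len(characters))
print(collector.events)
print(tree.tag, [child.tag for child in tree])
```

The validated and adorned layers need a schema-aware processor and are not covered by this sketch.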

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
• Moving down the layers: validate
• Moving back up the layers: erase

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
• “Same” inputs can have different “meanings”! (external validation)

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
• Generally looks like

    <configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
      <serialization method="xml" />
    </configuration>

• But can look otherwise!

    element configuration {
      attribute edition {"ee"},
      element serialization {attribute method {"xml"}}}

• Same “meaning”, different spelling

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
  – A picture (or document, or action, or … )
    • Application meaning
• Can have many lower-layer forms for “the same” meaning

The Essence of XML (with WXS)
• Thesis:
  – “XML is touted as an external format for representing data.”
• Two properties
  – Self-describing
    • Destroyed by external validation
  – Round-tripping
    • Destroyed by defaults and union types
http://bit.ly/essenceOfXML2

The Essence of XML (with WXS)
• Roundtripping issues
  – Internal to external and back
    • Take an element, foo, with content {“one”, “2”, 3}
    • Its (simple) type is a list of union of integer and string
    • Serialise
      – <foo>one 2 3</foo>
    • Parse and validate
      – Content is {“one”, 2, 3} (the union tries integer first, so “2” comes back as an integer)
  – External to internal and back
    • “001” to 1 to “1”
http://bit.ly/essenceOfXML2
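Both roundtripping failures can be reproduced with a toy model. The sketch below is not a WXS validator: it only assumes a list type whose items are a union that tries integer before string, and the function names are hypothetical.

```python
def parse_item(token):
    """Union of integer and string: try integer first, fall back to string."""
    try:
        return int(token)
    except ValueError:
        return token

def serialise(items):
    """Internal to external: a whitespace-separated list."""
    return " ".join(str(item) for item in items)

def parse(text):
    """External to internal: split the list and type each item."""
    return [parse_item(token) for token in text.split()]

# Internal to external and back: the string "2" comes back as the integer 2.
internal = ["one", "2", 3]
external = serialise(internal)
reparsed = parse(external)
print(external, reparsed, reparsed == internal)

# External to internal and back: leading zeros are lost.
print(serialise(parse("001")))
```

In both directions the lexical form or the type of an item is lost, which is the property the thesis refers to.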

The Essence of XML (with WXS)
• Conclusion:
  – “So the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.”
• It’s not obvious
  – That the issues are serious (enough)
  – That the problem solved is all that easy
  – That there aren’t other, worse issues
http://bit.ly/essenceOfXML2

S’more Tree Grammars

Tree Grammars: a reminder
• Production rules
  – are central to tree grammars, e.g., N → P (PA | (FEd,SEd*))
  – reflect element declarations
• ...to be read as follows
  – for each w ∈ nodes(T) with children w1 w2 ... wn, there exists a rule X → a e ∈ P such that
    • r(w) = X,
    • T(w) = a, and
    • r(w1) r(w2) ... r(wn) matches e.
• [Figure: a node w labelled P with r(w) = N and children w1, w2, ..., wn, where r(w1) = FEd, r(w2) = SEd, ..., r(wn) = SEd; does the rule match? Then, for w1, w2, ...: check FEd → ? e1, SEd → ? e2]
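The run condition can be turned into a small bottom-up check. The sketch below is illustrative only and makes several assumptions: trees are `(label, children)` pairs, a rule X → a e is a triple `(X, a, e)`, and the content model e is written as a Python regular expression in which every non-terminal name is followed by a single space (so names must not contain regex metacharacters). It tries all combinations of child non-terminals, which is exponential in general.

```python
import re
from itertools import product

def nonterminals(tree, rules):
    """Return every X that some run r could assign to the root of tree."""
    label, children = tree
    child_sets = [nonterminals(child, rules) for child in children]
    result = set()
    for x, a, e in rules:
        if a != label:
            continue  # T(w) = a must hold
        for choice in product(*child_sets):
            word = "".join(name + " " for name in choice)
            if re.fullmatch(e, word):  # r(w1) ... r(wn) matches e
                result.add(x)
                break
    return result

def accepts(tree, rules, start):
    """A tree is accepted if some run labels its root with a start symbol."""
    return bool(nonterminals(tree, rules) & start)

# A small grammar in this encoding (names chosen for illustration).
rules = [
    ("T", "T", r"N1 (N2 )*"),
    ("N1", "N1", r"(M |M M )"),
    ("N2", "N2", r"pcdata "),
    ("M", "M", r"pcdata "),
    ("pcdata", "pcdata", r""),
]
leaf = ("pcdata", [])
good = ("T", [("N1", [("M", [leaf])]), ("N2", [leaf])])
bad = ("T", [("N2", [leaf])])

print(accepts(good, rules, {"T"}), accepts(bad, rules, {"T"}))
```

For single-type grammars each node has at most one candidate non-terminal, so the search over combinations collapses to a single choice.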

Tree Grammars: 3 more things
★ A single-type grammar can have no more than one run on a tree.
★ A regular grammar can have more than one run on a tree.
• BTW, w.l.o.g., we can assume that no two production rules have the same non-terminal on the left hand side and the same terminal. I.e., no N → P PA and N → P (Editor,Editor*). We can always rewrite those, e.g., to N → P (PA | (Editor,Editor*))
• ...so, how did we get here? From DTDs and XML schemas!

Tree Grammars and DTDs
• since DTDs don’t have “types”, just element names, they correspond to grammars of a peculiar, simple kind:

    <!ELEMENT T (N1,N2*)>
    <!ELEMENT N1 (M|(M,M))>
    <!ELEMENT N2 (#PCDATA)>
    <!ELEMENT M (#PCDATA)>

  G = (N, Σ, S, P) with
    N = {T, N1, N2, M, pcdata}
    Σ = {T, N1, N2, M, pcdata}
    S = {T}
    P = { T → T (N1,N2*),
          N1 → N1 (M|(M,M)),
          N2 → N2 pcdata,
          M → M pcdata,
          pcdata → pcdata ε }

  [Figure: example tree with root T (position ε) and children N1 (0) and N2 (1); N1 has child M (0,0), which has child pcdata (0,0,0); N2 has child pcdata (1,0)]
★ Tree grammars for DTDs are always local
  ...even if the DTD has a non-deterministic content model
  – <!ELEMENT N1 (M|(M,M))> is not deterministic and thus illegal (but can be replaced with <!ELEMENT N1 (M,(M|ε))>)

Remember?!
• in DTDs and in WXS, content models are further restricted (for compatibility with SGML)
  – [DTD] deterministic (or 1-unambiguous),
    e.g., (M|(M,M)) is not deterministic, (M,(M|ε)) is.
    e.g., ((b, c) | (b, d)) is not deterministic, b,(c|d) is.
From http://www.w3.org/TR/REC-xml/:
As noted in 3.2.1 Element Content, it is required that content models in element type declarations be deterministic. This requirement is for compatibility with SGML (which calls deterministic content models "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors. More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman [Aho/Ullman]. In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error.
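The follow-set test quoted above can be sketched directly. The code below is a minimal illustration, not the algorithm of any particular XML processor: content models are assumed to be nested tuples (`("sym", name)`, `("seq", a, b)`, `("alt", a, b)`, `("star", a)`, `("opt", a)`, `("eps",)`), and a model counts as deterministic when neither the first set nor any follow set contains two positions with the same element name.

```python
from itertools import count

def analyse(node, ids, symbols, follow):
    """Return (nullable, first, last) for node, filling symbols and follow."""
    kind = node[0]
    if kind == "sym":
        p = next(ids)               # a fresh position for this leaf
        symbols[p] = node[1]
        follow[p] = set()
        return False, {p}, {p}
    if kind == "eps":
        return True, set(), set()
    if kind in ("star", "opt"):
        _, first, last = analyse(node[1], ids, symbols, follow)
        if kind == "star":          # after the last position we may loop back
            for p in last:
                follow[p] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1], ids, symbols, follow)
    n2, f2, l2 = analyse(node[2], ids, symbols, follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                    # kind == "seq"
        follow[p] |= f2
    return n1 and n2, f1 | (f2 if n1 else set()), l2 | (l1 if n2 else set())

def clashes(positions, symbols):
    """True if two distinct positions carry the same element name."""
    names = [symbols[p] for p in positions]
    return len(names) != len(set(names))

def is_deterministic(model):
    symbols, follow = {}, {}
    _, first, _ = analyse(model, count(), symbols, follow)
    return not clashes(first, symbols) and not any(
        clashes(f, symbols) for f in follow.values())

M = ("sym", "M")
print(is_deterministic(("alt", M, ("seq", M, M))))   # (M|(M,M))
print(is_deterministic(("seq", M, ("opt", M))))      # (M,(M|ε))
```

The first model is rejected because both alternatives start with M; the rewritten model passes because the choice is postponed until after the first M.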

Tree Grammars and DTDs
• so, DTDs are local (and thus single-type) because they don’t have types at all
  – and not because their content model is deterministic!
  – they are single-type even with non-deterministic content model
• hence we could extend DTDs with types and still be single-type...provided we impose suitable restrictions

Tree Grammars and WXS
• tree grammars also capture the basic, structural part of WXS:
  ✓ types (complex and anonymous)
  ‣ model groups (we ignore them)
  ‣ derivation by extension and restriction (we ignore them)
  ‣ substitution groups (we ignore them)
  ‣ integrity constraints like keys (must be ignored, don’t fit into tree grammars)
• we only deal with simple XML schemas, but the general approach works for more
• to transform an XML schema S into a tree grammar G,
  1. we translate S into a generalized tree grammar
  2. then flatten the generalized tree grammar into a tree grammar G
• this will be done such that T validates against S iff T is accepted by G.

Translating WXS into Tree Grammars
• let S be a simple XML Schema
➡ for each top-level element in S of the form

    <xs:element name="mylist" type="BlistT"></xs:element>

  • add the following production rule to your grammar
    – MYLIST → mylist BLIST^TYPE
    – add MYLIST, BLIST^TYPE to non-terminals, add mylist to terminals
➡ for each top-level element in S of the form

    <xs:element name="mylist">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="ename" type="CompT" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

  • add the following production rules to your grammar
    – MYLIST → mylist ENAME,ENAME*    (what is the default for minOccurs?)
    – ENAME → ename COMP^TYPE
    – add MYLIST, ENAME, COMP^TYPE to non-terminals, add mylist, ename to terminals

Translating WXS into Tree Grammars
➡ for each top-level complex type in S of the form

    <xs:complexType name="BlistT">
      <xs:sequence>
        <xs:element name="friend" type="PersonT" minOccurs="1" maxOccurs="2"/>
      </xs:sequence>
    </xs:complexType>

  • add the following production rules to your grammar
    – BLIST^TYPE → (FRIEND | (FRIEND,FRIEND))    %% generalized rule: to be expanded!
    – FRIEND → friend PERSON^TYPE
    – add BLIST^TYPE, FRIEND, PERSON^TYPE to non-terminals, add friend to terminals
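The way minOccurs and maxOccurs turn into content-model expressions can be sketched as a helper. The function name `occurs` and its string output format are assumptions made for illustration; `None` stands for maxOccurs="unbounded", and both attributes default to 1 as in XML Schema.

```python
def occurs(symbol, min_occurs=1, max_occurs=1):
    """Spell out an occurrence range as a content-model expression."""
    if max_occurs is None:
        # min copies are required, followed by arbitrarily many more
        return ",".join([symbol] * min_occurs + [symbol + "*"])
    alternatives = []
    for k in range(min_occurs, max_occurs + 1):
        if k == 0:
            alternatives.append("ε")
        elif k == 1:
            alternatives.append(symbol)
        else:
            alternatives.append("(" + ",".join([symbol] * k) + ")")
    if len(alternatives) == 1:
        return alternatives[0]
    return "(" + " | ".join(alternatives) + ")"

print(occurs("ENAME", max_occurs=None))   # minOccurs defaults to 1
print(occurs("FRIEND", 1, 2))
print(occurs("N", 0, None))
```

Listing every alternative keeps the sketch simple; a real translation would likely factor common prefixes to keep the content model deterministic.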

Translating WXS into Tree Grammars
➡ for each top-level complex type in S of the form

    <xs:complexType name="BBlistT">
      <xs:choice>
        <xs:sequence>
          <xs:element name="A" type="xs:string"/>
          <xs:element name="B" type="xs:string"/>
        </xs:sequence>
        <xs:sequence>
          <xs:element name="A" type="xs:string"/>
          <xs:element name="C" type="xs:string"/>
        </xs:sequence>
      </xs:choice>
    </xs:complexType>

  %% UPA violation: Oxygen complains!
  • add the following production rules to your grammar
    – BBLIST^TYPE → (A,B) | (A,C)    %% generalized rule: to be expanded!
    – A → A STRING^TYPE
    – B → B STRING^TYPE
    – C → C STRING^TYPE
    – add BBLIST^TYPE, A, B, C, STRING^TYPE to non-terminals, add A, B, C to terminals

Translating WXS into Tree Grammars
• Consider the following case:

    <xs:complexType name="AT">
      <xs:sequence>
        <xs:element name="N" type="AlistT" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

    <xs:complexType name="BT">
      <xs:sequence>
        <xs:element name="N" type="BlistT" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

• To handle cases like the one above we can’t always add rules
  – AT^TYPE → N*, BT^TYPE → N*
  – N → N ??LIST^TYPE
• Instead, we translate these as
  – AT^TYPE → N^ALIST^TYPE*
  – BT^TYPE → N^BLIST^TYPE*
  – N^ALIST^TYPE → N ALIST^TYPE
  – N^BLIST^TYPE → N BLIST^TYPE

Translating WXS into Tree Grammars
• Our translation yields almost a tree grammar: it produces illegal rules of the form X → e, i.e., without a terminal
  – e.g., BLIST^TYPE → (FRIEND | (FRIEND,FRIEND))
• our grammar model doesn’t handle those (check the definition of a run)
๏ hence we expand these illegal rules: pick an illegal rule X → e:
  – remove X → e from the rule set
  – replace all occurrences of X in the rule set with e
  until no illegal rules are left in the rule set
• e.g., MYLIST → mylist BLIST^TYPE would be transformed into
  – MYLIST → mylist (FRIEND | (FRIEND,FRIEND))
• ...and if we had <xs:element name="yourlist" type="BlistT"/> then we would also have YOURLIST → yourlist BLIST^TYPE and thus
  – YOURLIST → yourlist (FRIEND | (FRIEND,FRIEND))

Translating WXS into Tree Grammars
• Expanding illegal rules even works with cyclic type definitions - try

    <xs:complexType name="NT">
      <xs:choice>
        <xs:element name="test2" type="AT"/>
        <xs:element name="EndElement" type="xs:string"/>
      </xs:choice>
    </xs:complexType>

    <xs:complexType name="AT">
      <xs:choice>
        <xs:element name="test1" type="NT"/>
        <xs:element name="EndElement" type="xs:string"/>
      </xs:choice>
    </xs:complexType>

• This gives you these rules, including 2 illegal rules
    NT^TYPE → (TEST2 | ENDELEMENT)
    AT^TYPE → (TEST1 | ENDELEMENT)
    TEST1 → test1 NT^TYPE
    TEST2 → test2 AT^TYPE
    ENDELEMENT → EndElement STRING^TYPE
• ...which can be expanded as follows:
    TEST1 → test1 (TEST2 | ENDELEMENT)
    TEST2 → test2 (TEST1 | ENDELEMENT)
    ENDELEMENT → EndElement STRING^TYPE
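The expansion step is a plain textual substitution and can be sketched as follows. The representation is an assumption made for illustration: legal rules are a dict from non-terminal to `(terminal, content)`, generalized (illegal) rules a dict from non-terminal to content, and content models are strings. The sketch also assumes that no generalized rule refers to itself, directly or via other generalized rules; otherwise the loop would not produce a finite grammar.

```python
import re

def substitute(content, name, replacement):
    """Replace whole occurrences of the non-terminal name by (replacement)."""
    pattern = r"(?<![\w^-])" + re.escape(name) + r"(?![\w^-])"
    return re.sub(pattern, lambda match: "(" + replacement + ")", content)

def expand(legal, generalized):
    """Eliminate generalized rules X -> e by substituting e for X everywhere."""
    legal = dict(legal)
    pending = dict(generalized)
    while pending:
        name, body = pending.popitem()       # pick an illegal rule, remove it
        legal = {x: (a, substitute(e, name, body))
                 for x, (a, e) in legal.items()}
        pending = {x: substitute(e, name, body)
                   for x, e in pending.items()}
    return legal

legal = {
    "TEST1": ("test1", "NT^TYPE"),
    "TEST2": ("test2", "AT^TYPE"),
    "ENDELEMENT": ("EndElement", "STRING^TYPE"),
}
generalized = {
    "NT^TYPE": "TEST2 | ENDELEMENT",
    "AT^TYPE": "TEST1 | ENDELEMENT",
}

for x, (a, e) in expand(legal, generalized).items():
    print(x, "→", a, e)
```

The cycle between the two types is harmless here because each generalized rule only mentions non-terminals that have a terminal of their own.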

WXS and Tree Grammars
• So, to transform an XML schema S into a tree grammar G,
  1. we translate S into a generalized tree grammar G’
  2. then expand G’ into a tree grammar G
★ Then any tree T validates against S iff T is accepted by G.
• So, what are the tree grammars we get as results?
  – they are tree grammars
  – are they single-type?
  – are they local?
  [Figure: grammar classes Loc, ST, Reg]
★ Tree grammars corresponding to WXS are not local.
• E.g., consider
  – N^ALIST^TYPE → N ALIST^TYPE
  – N^BLIST^TYPE → N BLIST^TYPE
  – ...N^ALIST^TYPE and N^BLIST^TYPE are competing!
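Whether a grammar is local or single-type can be checked by looking for competing non-terminals. The sketch below assumes the usual definitions: two distinct non-terminals compete if their rules share a terminal; a grammar is local if no non-terminals compete, and single-type if competing non-terminals never occur together in one content model or among the start symbols. The representation (a dict from non-terminal to its terminal and the set of non-terminals in its content model) and the small grammar are hypothetical.

```python
from itertools import combinations

def competing(x, y, rules):
    """Two distinct non-terminals compete if they share a terminal."""
    return x != y and rules[x][0] == rules[y][0]

def is_local(rules):
    return not any(competing(x, y, rules)
                   for x, y in combinations(rules, 2))

def is_single_type(rules, start):
    groups = [start] + [occurring for _, occurring in rules.values()]
    return not any(competing(x, y, rules)
                   for group in groups
                   for x, y in combinations(sorted(group), 2))

# Same element name n with two different types, used in different contexts.
rules = {
    "A": ("a", {"NA"}),
    "B": ("b", {"NB"}),
    "NA": ("n", set()),
    "NB": ("n", set()),
}
print(is_local(rules), is_single_type(rules, {"A", "B"}))

# Putting both competing non-terminals into one content model breaks it.
rules["A"] = ("a", {"NA", "NB"})
print(is_single_type(rules, {"A", "B"}))
```

The first grammar is single-type but not local; the modified one is neither.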

WXS and Tree Grammars
★ Tree grammars corresponding to WXS are single-type.
  – This is ensured by the Unique Particle Attribution constraint in WXS.
• Tree grammars corresponding to DTDs are local, ...hence
★ DTDs are less expressive than XML schemata.
  [Figure: grammar classes Loc, ST, Reg]
• That is, there are tree languages that we can describe in WXS, but not in DTDs, e.g.:
    N = {Book, PA, Editor, A, Paper, F, L}
    Σ = {B, P, N, F, L, A}
    S = {Book, Paper}
    P = { Book → B Editor|PA,
          Paper → P PA,
          Editor → N F,L,
          PA → N L,A,
          F → F ε,
          L → L ε,
          A → A ε }
  [Figure: two example trees. One has root B (position ε) with child N (0), whose children are F (0,0) and L (0,1); the other has root P (ε) with child N (0), whose children are L (0,0) and A (0,1)]

Remember:
• In XML Schema, the content model is constrained as well
  – to make validation easier & for compatibility with SGML
  – e.g., through the Unique Particle Attribution constraint:
    A content model must be formed such that during validation of an element information item sequence, the particle component contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.
    http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#cos-nonambig
• Rephrasing: a content model M must be formed such that, during validation of an element E’s child node sequence E1 ... Ek, we can, starting from i = 1 and increasing, associate each Ei with a single particle contained (possibly implicitly) in M without examining the content or attributes of Ei, and without any information about any Ej with j > i.

Content models & types in DTD & WXS
• (we already know that) in WXS, we have a type hierarchy
  – an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
    • but you have to say so explicitly:

    <person phone="2">
      <Name>Peter</Name>
      <DoB>1966-05-04</DoB></person>
    <person xsi:type="LongPersonType" phone="5432">
      <Name>Paul</Name>
      <DoB>1967-05-04</DoB>
      <address>Manchester</address></person>

  – we call this ‘named’ typing:
    • sub-types are declared (restriction or extension), and not inferred (by comparing structure)
  – in DTDs, we don’t have types!
• In order to prevent difficulties in WXS as caused by types, the Element Declarations Consistent constraint is imposed; it rules out, e.g.:

    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
        <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>

Outlook: next steps
✓ we have now seen that
  • DTDs correspond to local grammars
  • WXS corresponds to single-type grammars
  ➡ DTDs are structurally weaker than WXS
• RelaxNG: an even stronger schema language
  • RelaxNG corresponds to regular grammars
  ➡ WXS is structurally weaker than RelaxNG
• we will also look into how computationally expensive validation is
  • against DTD/local grammar
  • against WXS/single-type grammar
  • against RelaxNG/regular grammar
  ➡ ...all roughly the same!
[Figure: grammar classes Loc, ST, Reg]

Relax NG, a very powerful schema language

Relax NG: yet another schema language
• Relax NG was designed to be a simpler schema language
• (described in a readable on-line book by Eric Van der Vlist)
• and allows us to describe (valid) XML documents in terms of their tree abstractions:
  – no default attributes
  – no entity declarations
  – no key/uniqueness constraints
  – minimal datatypes: only “token” and “string”, like DTDs (but a mechanism to use XSD datatypes)
• since it is so simple/flexible
  – it’s (claimed to be) easy to use
  – it doesn’t have complex constraints on the description of element content like determinism/1-unambiguity
  – it’s claimed to be reliable
  – but you need other tools to do other things (like datatypes and attributes)

Relax NG: another side of Determinism
• remember that DTDs and WXS required their content models to be
  – [DTD] deterministic (and thus look-ahead-free)
  – [WXS] deterministic (EDC, every matching child node sequence matches in exactly one way only)
  – [WXS] the UPA constraint expresses both, and other constraints even more
• determinism & single-typeness have a reason:
  – some tools annotate a (valid) document while parsing:
    • type information, to be exploited, e.g., for concise queries (remember the assignment?)
    • default attribute values
  – if your schema is not single-type, then
    • tools validating the same document against the same schema may construct different PSVIs
    • this can happen with different tools or different runs of the same tool

RelaxNG: another side of Validation
Reasons why one would want to validate an XML document:
• ensure that structure is ok
• ensure that values in elements/attributes are of the correct type
• generate PSVI to work with
• check constraints on co-occurrence of elements/how they are related
• check other integrity constraints, e.g., a person’s age vs. their mother’s age
• check constraints on elements/their value against external data
  – postcode correctness
  – VAT/tax/other numeric constraints
  – spell checking
...only a few of these checks can be carried out by validating against schemas...
Relax NG was designed to
1. validate structure and
2. link to datatype validators to type check values of elements/attributes

Relax NG: basic principles
• Relax NG is based on patterns (similar to XPath expressions):
  – a pattern is a description of a set of valid node sets
  – we can view our example as different combinations of different parts, and design patterns for each
  – enhanced flexibility

    <?xml version="1.0" encoding="UTF-8"?>
    <people>
      <person age="41">
        <name>
          <first>Harry</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
        <project type="epsrc" id="1"> DeCompO </project>
        <project type="eu" id="3"> TONES </project>
      </person>
      <person>....
    </people>

Relax NG: good to know
Relax NG comes in 2 syntaxes
• the compact syntax
  – succinct
  – human readable

    grammar { start =
      element name {
        element first { text },
        element last { text }
    }}

• the XML syntax
  – verbose
  – machine readable

    <grammar
        xmlns="http:..."
        xmlns:a="http:.."
        datatypeLibrary="http:...">
      <start>
        <element name="name">
          <element name="first"><text/></element>
          <element name="last"><text/></element>
        </element>
      </start>
    </grammar>

✓ Trang converts between the two, phew! (and also into/from other schema languages)
✓ Trang can be used from Oxygen

Relax NG - structure validation:
• 3 kinds of patterns, for the 3 “central” nodes:
  – text
  – attribute
  – element
• these can be combined
  – ordered groups
  – unordered groups
  – choices
• we can constrain cardinalities of patterns
• text nodes
  – can be marked as “data” and linked
• we can specify libraries of patterns

    element name {
      element first { text },
      element last { text }}

is a RelaxNG schema for (parts of) this:

    <?xml version="1.0" encoding="UTF-8"?>
    <people>
      <person age="41">
        <name>
          <first>Harry</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
        <project type="epsrc" id="1"> DeCompO </project>
        <project type="eu" id="3"> TONES </project>
      </person>
      <person>....
    </people>

Relax NG: ordered groups
• we can name patterns
• in strange “chains”
• we can use ?, *, and + (use “?” if optional):

    grammar { start =
      element people {people-content}
      people-content =
        element person { person-content }+
      person-content = attribute age { text },
        element name {name-content},
        element address { text }+,
        element project {project-content}*
      name-content = element first { text },
        element middle { text }?,
        element last { text }
      project-content = attribute type { text },
        attribute id {text},
        text }

is a RelaxNG schema for this:

    <?xml version="1.0" encoding="UTF-8"?>
    <people>
      <person age="41">
        <name>
          <first>Harry</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
        <project type="epsrc" id="1"> DeCompO </project>
        <project type="eu" id="3"> TONES </project>
      </person>
      <person>....
    </people>

Relax NG: ordered groups in XML syntax (Trang knows … )
our schema in compact syntax:

    grammar { start =
      element people {people-content}
      people-content =
        element person { person-content }+
      person-content = attribute age { text },
        element name {name-content},
        element address { text }+,
        element project {project-content}*
      name-content = element first { text },
        element middle { text }?,
        element last { text }
      project-content = attribute type { text },
        attribute id {text},
        text }

⇆ use Trang to convert
our schema in XML syntax:

    <?xml version="1.0" encoding="UTF-8"?>
    <grammar xmlns="http://relaxng.org/ns/structure/1.0">
      <start>
        <element name="people">
          <ref name="people-content"/>
        </element>
      </start>
      <define name="people-content">
        <oneOrMore>
          <element name="person">
            <ref name="person-content"/>
          </element>
        </oneOrMore>
      </define>
      <define name="person-content">
        <attribute name="age"/>
        <element name="name">
          <ref name="name-content"/>
        </element>
        <oneOrMore>
          <element name="address">
            <text/>
          </element>
        </oneOrMore>
        <zeroOrMore>
          <element name="project">
            <ref name="project-content"/>
          </element>
        </zeroOrMore>
      </define>
      <define name="name-content">
        <element name="first">
          <text/>
        </element>
        <optional>
          <element name="middle">
            <text/>
          </element>
      ...

Relax NG: different styles
• so far, we modelled ‘element centric’...we can model ‘content centric’:

    grammar { start =
      element people {people-content}
      people-content =
        element person { person-content }+
      person-content = attribute age { text },
        element name {name-content},
        element address { text }+,
        element project {project-content}*
      name-content = element first { text },
        element middle { text }?,
        element last { text }
      project-content = attribute type { text },
        attribute id {text},
        text }

    grammar { start = people-element
      people-element = element people
        { person-element+ }
      person-element = element person {
        attribute age { text },
        name-element,
        address-element+,
        project-element*}
      name-element = element name {
        element first { text },
        element middle { text }?,
        element last { text } }
      address-element = element address { text }
      project-element = element project {
        attribute type { text },
        attribute id {text},
        text }}

Relax NG - structure validation: ordered groups
• we can combine patterns in fancy ways:

    grammar {start =
      element people {people-content}
      people-content =
        element person { person-content }+
      person-content = HR-stuff,
        contact-stuff
      HR-stuff = attribute age { text },
        project-content
      contact-stuff = attribute phone { text },
        element name {name-content},
        element address { text }
      name-content = element first { text },
        element middle { text }?,
        element last { text }
      project-content = element project {
        attribute type { text },
        attribute id {text},
        text }+}

    <?xml version="1.0" encoding="UTF-8"?>
    <people>
      <person age="41">
        <name>
          <first>Harry</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
        <project type="epsrc" id="1"> DeCompO </project>
        <project type="eu" id="3"> TONES </project>
      </person>
      <person>....
    </people>

Relax NG: structure validation summary
• Relax NG’s specification of structure differs from DTDs and XSD:
  – grammar oriented
  – 2 syntaxes with automatic translation
  – flexible: we can gather different aspects of elements into different patterns
  – unconstrained: no constraints regarding 1-unambiguity/deterministic content models/Unique Particle Attribution/Element Declarations Consistent
  – like for XSD, we have an “ALL” construct for unordered groups, “interleave” &:
• here, the patterns must appear in the specified order (except for attributes, which are allowed to appear in any order in the start tag):

    element person {
      attribute age { text},
      attribute phone { text},
      name-element,
      address-element+,
      project-element*}

• here, the patterns can appear in any order:

    element person {
      attribute age { text } &
      attribute phone { text} &
      name-element &
      address-element+ &
      project-element*}
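For the simple case shown here, where each interleaved pattern is a single named element with a cardinality, order-independence means that only the counts matter. The sketch below checks exactly that special case; the function name and the `(min, max)` encoding (with `None` for unbounded) are assumptions for illustration, and general Relax NG interleave over arbitrary patterns needs more than counting.

```python
from collections import Counter

def matches_interleave(children, spec):
    """Check child element names against an interleave of simple patterns.

    spec maps each allowed element name to (min, max); max None = unbounded.
    """
    counts = Counter(children)
    if set(counts) - set(spec):
        return False                      # an element that no pattern allows
    return all(low <= counts[name] and (high is None or counts[name] <= high)
               for name, (low, high) in spec.items())

# name-element & address-element+ & project-element*
person_spec = {"name": (1, 1), "address": (1, None), "project": (0, None)}

print(matches_interleave(["project", "address", "name", "address"], person_spec))
print(matches_interleave(["address", "project"], person_spec))
```

With the ordered group (the comma version), the first sequence would be rejected because the project element appears before the name.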

Translating Relax NG into tree grammars by example 1

    grammar {
      start = AddressBook
      AddressBook = element addressBook { Card* }
      Card = element card { Inline }
      Inline = Name, Email+
      Name = element name { text }
      Email = element email { text } }

Translate into G = (N, Σ, S, P) with
    N = {AddressBook, Card, Inline, Name, Email, Pcdata}
    Σ = {addressBook, card, name, email, pcdata}
    S = {AddressBook}
    P = { AddressBook → addressBook Card*,
          Card → card Inline,
          Inline → Name, Email+,
          Name → name Pcdata,
          Email → email Pcdata,
          Pcdata → pcdata ϵ }
• “element y” ➟ y ∈ Σ ...possibly also “uppercased copy” ➟ Y ∈ N
• all other user defined symbols X ➟ X ∈ N
• ...translating Relax NG rules is easy (depending on Relax NG style)
• ...let’s see one more

Translating Relax NG into tree grammars by example 2

    grammar { start = p-el
      p-el = element people { per-el+ }
      per-el = element person {
        attribute age { text },        (Ignore!)
        na-el,
        ad-el+,
        pro-el*}
      na-el = element name {
        element first { text },
        element middle { text }?,
        element last { text } }
      ad-el = element address { text }
      pro-el = element project {
        attribute type { text },       (Ignore!)
        attribute id {text},           (Ignore!)
        text }}

Translate into G = (N, Σ, S, P) with
    N = {P-EL, PER-EL, NA-EL, AD-EL, PRO-EL, FIRST, MIDDLE, LAST, Pcdata}
    Σ = {people, person, name, first, middle, last, address, project}
    S = {P-EL}
    P = { P-EL → people PER-EL, PER-EL*,
          PER-EL → person NA-EL, AD-EL, AD-EL*, PRO-EL*,
          NA-EL → name FIRST, (MIDDLE|ε), LAST,
          FIRST → first Pcdata,
          MIDDLE → middle Pcdata,
          LAST → last Pcdata,
          AD-EL → address Pcdata,
          PRO-EL → project Pcdata,
          Pcdata → pcdata ϵ }
This Relax NG style makes translation of rules easy

Translating Relax NG into tree grammars by example 3
• This Relax NG style makes translation of rules less easy … and leads to generalized rules!
• Relax NG schema (the attribute declarations are marked “Ignore!” on the slide):
    grammar {
      start = element people { people-content }
      people-content = element person { person-content }+
      person-content = attribute age { text },
        element name { name-content },
        element address { text }+,
        element project { project-content }*
      name-content = element first { text },
        element middle { text }?,
        element last { text }
      project-content = attribute type { text },
        attribute id { text },
        text }
• Translate into G = (N, Σ, S, P) with
    N = {PEOPLE, P-C, PER-C, NA, NA-C, PERSON, PRO-C, ADR, PROJ, FIRST, MIDDLE, LAST, Pcdata}
    Σ = {people, person, name, first, middle, last, address, project}
    S = {PEOPLE}
    P = {PEOPLE → people P-C,
         P-C → PERSON, PERSON*,            (expand!)
         PERSON → person PER-C,
         PER-C → NA, ADR, ADR*, PROJ*,     (expand!)
         NA → name NA-C,
         ADR → address Pcdata,
         PROJ → project PRO-C,
         PRO-C → pcdata ϵ,
         NA-C → FIRST, (MIDDLE|ϵ), LAST,
         FIRST → first Pcdata,
         MIDDLE → middle Pcdata,
         LAST → last Pcdata,
         Pcdata → pcdata ϵ}
(slide 69)

Translating Relax NG into tree grammars by example 3 (continued)
• Schema excerpt:
    people-content = element person { person-content }+
    person-content = attribute age { text },
      element name { name-content },
      element address { text }+,
      element project { project-content }*
• Grammar excerpt:
    PERSON → person PER-C,
    PER-C → NA, ADR, ADR*, PROJ*,     (expand!)
    NA → name NA-C,
    ADR → address Pcdata,
• Two things we have already seen when translating WXS:
1. we might need to introduce “generalized” rules, which can and need to be expanded, as for WXS: for each illegal rule X → e (a rule without a terminal):
– remove X → e from the rule set
– replace all occurrences of X in the rule set with e
2. we might have to “contextualise” names and types of elements: ...
(slide 70)
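The expansion step for generalized rules can be sketched in Python. This is only an illustration under stated assumptions: the dictionary encoding of rules, the function name `expand_generalized`, and the `NAME` pattern for non-terminal names are hypothetical and not part of the slides, and generalized rules are assumed to be non-recursive.

```python
import re

# Assumed shape of non-terminal names on the slides, e.g. P-C, NA-C, Pcdata.
NAME = re.compile(r"[A-Za-z][\w^-]*")

def expand_generalized(normal, generalized):
    """Inline every generalized rule X -> e into the normal rules.

    normal:      dict N -> (terminal, content model string)
    generalized: dict X -> content model string (rule without a terminal)
    Assumes generalized rules are not recursive; otherwise the loop never ends.
    """
    def substitute(expr):
        # Replace whole names only, and parenthesise the inlined expression.
        return NAME.sub(
            lambda m: "(" + generalized[m.group()] + ")"
            if m.group() in generalized else m.group(),
            expr,
        )

    result = dict(normal)
    changed = True
    while changed:  # repeat until no generalized name is left in any rule
        changed = False
        for n, (terminal, expr) in list(result.items()):
            new_expr = substitute(expr)
            if new_expr != expr:
                result[n] = (terminal, new_expr)
                changed = True
    return result

# A few rules taken from example 3.
normal = {
    "PEOPLE": ("people", "P-C"),
    "NA": ("name", "NA-C"),
    "FIRST": ("first", "Pcdata"),
}
generalized = {
    "P-C": "PERSON, PERSON*",
    "NA-C": "FIRST, (MIDDLE|ϵ), LAST",
}
expanded = expand_generalized(normal, generalized)
print(expanded["PEOPLE"])  # ('people', '(PERSON, PERSON*)')
print(expanded["NA"])      # ('name', '(FIRST, (MIDDLE|ϵ), LAST)')
```

The inlined expression is wrapped in parentheses so that operators applied to X in the original rule (such as X*) still apply to the whole of e.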

Translating Relax NG into tree grammars by example 4
2. we might have to “contextualise” names and types of elements, to handle schemas where the same element name is used in different contexts with different types:
• Schema excerpt:
    people-content = element person { person-content }+,
      element friend { friend-content }+
    person-content = attribute age { text },
      element name { name-content }, ...
    friend-content = attribute age { text },
      element name { friend-name-content }, ...
• Grammar excerpt:
    P-C → PERSON, PERSON*, FRIEND, FRIEND*
    PERSON → person PER-C,
    FRIEND → friend FRIE-C,
    PER-C → NA^NA-C, ...
    FRIE-C → NA^FRIE-NA-C, ...
    NA^NA-C → name NA-C,
    NA^FRIE-NA-C → name FRIE-NA-C,
    ...
(slide 71)

Translating Relax NG into tree grammars
• each Relax NG schema can be faithfully translated into a tree grammar:
– local? no: the example on the previous slide leads to competing non-terminals (NA^NA-C and NA^FRIE-NA-C), i.e., two non-terminals with rules for the same terminal:
    NA^NA-C → name NA-C,
    NA^FRIE-NA-C → name FRIE-NA-C,
– single-type? no: see the example below, where NA^NA-C and NA^FO-NA-C compete and occur in the same RHS:
    person-content = attribute age { text },
      (element name { name-content } | element name { foreign-name-content }), ...
    PER-C → NA^NA-C | NA^FO-NA-C
    NA^NA-C → name NA-C,
    NA^FO-NA-C → name FO-NA-C,
    ...
– so is Relax NG as powerful as tree grammars?
(slide 72)
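The local / single-type distinction can be checked mechanically. The following Python sketch is an illustration only: the triple encoding of rules, the function name `classify`, and the `NAME` pattern are assumptions made here, each non-terminal is assumed to have exactly one rule, and generalized rules are assumed to be already expanded. Two non-terminals compete if their rules share the same terminal; a grammar is local if no non-terminals compete, and single-type if competing non-terminals never occur together in one content model or among the start symbols.

```python
import re
from itertools import combinations

# Assumed shape of non-terminal names on the slides, e.g. NA^NA-C, PERSON.
NAME = re.compile(r"[A-Za-z][\w^-]*")

def classify(rules, start):
    """rules: list of (non-terminal, terminal, content model string);
    start: set of start non-terminals.
    Returns "local", "single-type", or "regular" (neither of the two)."""
    terminal_of = {n: t for n, t, _ in rules}

    def compete(x, y):
        # Competing: different non-terminals whose rules use the same terminal.
        return (x != y and x in terminal_of and y in terminal_of
                and terminal_of[x] == terminal_of[y])

    def has_competitors(names):
        return any(compete(x, y) for x, y in combinations(sorted(names), 2))

    if not has_competitors(terminal_of):
        return "local"
    # Contexts in which competition is forbidden for single-type grammars.
    contexts = [set(start)] + [set(NAME.findall(e)) for _, _, e in rules]
    if any(has_competitors(c) for c in contexts):
        return "regular"
    return "single-type"

friend_example = [
    ("PEOPLE", "people", "PERSON+, FRIEND+"),
    ("PERSON", "person", "NA^NA-C"),
    ("FRIEND", "friend", "NA^FRIE-NA-C"),
    ("NA^NA-C", "name", "Pcdata"),
    ("NA^FRIE-NA-C", "name", "Pcdata"),
]
foreign_example = [
    ("PERSON", "person", "NA^NA-C | NA^FO-NA-C"),
    ("NA^NA-C", "name", "Pcdata"),
    ("NA^FO-NA-C", "name", "Pcdata"),
]
print(classify(friend_example, {"PEOPLE"}))   # single-type
print(classify(foreign_example, {"PERSON"}))  # regular
```

The content models in the two example grammars are simplified to `Pcdata`; only the pattern of competing names matters for the classification.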

Relax NG is indeed as powerful as tree grammars
★ Every tree grammar can be faithfully translated into a Relax NG schema.
• Proof (not too hard): given a tree grammar G = (N, Σ, S, P),
1. translate each production rule N → t regexp in P into N = element t { regexp } (fortunately, the tree grammar regular expression syntax is very close to, and more strict than, the Relax NG regular expression syntax)
2. put the resulting statements into a grammar, where N_1, ..., N_k are all start symbols, i.e., S = {N_1, ..., N_k}:
    grammar { start = N_1 | ... | N_k ..... }
3. call the resulting schema G_S
★ Then T ∈ L(G) if and only if T validates against G_S.
(slide 73)
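The construction in the proof can be sketched as a small Python function that prints a schema in Relax NG compact syntax. The function name `to_relax_ng` and the rule encoding are hypothetical. The sketch assumes that non-terminal names are valid Relax NG identifiers (names such as NA^NA-C would need renaming) and maps ϵ to the Relax NG pattern `empty`; how text content is represented is left out.

```python
def to_relax_ng(rules, start):
    """rules: list of (non-terminal, terminal, regexp) for rules N -> t regexp;
    start: list of start non-terminals N_1, ..., N_k."""
    lines = ["grammar {", "  start = " + " | ".join(start)]
    for nonterminal, terminal, regexp in rules:
        # Step 1 of the proof: N -> t regexp becomes N = element t { regexp }.
        body = regexp.replace("ϵ", "empty")
        lines.append(f"  {nonterminal} = element {terminal} {{ {body} }}")
    lines.append("}")
    return "\n".join(lines)

# The small grammar G = ({S,B,C}, {a,b,c}, {S}, P) used on the validation slides.
rules = [
    ("S", "a", "B,B*"),
    ("B", "b", "(C,C)|C"),
    ("C", "c", "ϵ|C"),
]
schema = to_relax_ng(rules, ["S"])
print(schema)
```

For this grammar the sketch prints a `grammar { ... }` block with `start = S` and one `element` definition per production rule.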

Tree Grammars and Schema Languages with our knowledge
[Figure: diagram relating the grammar classes Loc, ST, and Reg to the schema languages DTD, WXS, and Relax NG]
(slide 74)

Outlook: next steps
[Figure: Loc / ST / Reg diagram]
✓ we have now seen that
• DTDs correspond to local grammars
• WXS correspond to single-type grammars
• RelaxNG corresponds to regular grammars
➡ DTDs are structurally weaker than WXS
➡ WXS are structurally weaker than RelaxNG
• we will also look into how computationally expensive validation is
– against DTD/local grammar
– against WXS/single-type grammar
– against RelaxNG/regular grammar
➡ ...all roughly the same!
(slide 75)

How costly is validity testing? … Does it matter against which kind of schema? … Is Single-Type cheaper than general? (slide 76)

Schema Languages and Tree Grammars
See the paper by Murata, Lee, Mani, Kawaguchi
• We will look at:
– the problem of validating a document against a schema!
– algorithms for this problem:
    input: Tree T, Grammar G
    output: “yes”, if T ∈ L(G); “no”, otherwise
(slide 77)

ValAlgo
    input: Tree T, Grammar G
    output: “yes”, if T ∈ L(G); “no”, otherwise
• To design our “schema validator”,
1. we start with the easy case: assume that G is local (this gives us automatically a validator for the structural aspect of DTDs)
2. then expand the algorithm to single-type (this gives us automatically a validator for the structural aspect of WXS)
3. then expand to general tree grammars (...Relax NG)
– we also assume that we have a subroutine MatchAlgo
    input: String w, regular expression e
    output: “yes”, if w ∈ L(e) (w matches e); “no”, otherwise
– to see how to build that one (it’s based on a translation of regular expressions into finite state machines (aka automata)),
• remember your undergraduate studies (?)
• read it up, e.g., in the textbook by Hopcroft, Ullman
(slide 78)
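A stand-in for the MatchAlgo subroutine can be sketched in Python. Note the swap: instead of building a finite state machine as the slide suggests, this sketch translates a content model into a Python `re` pattern and lets the `re` module do the matching. The function names `content_model_to_regex` and `match_algo`, the `TOKEN` pattern, and the trick of writing each non-terminal followed by a space are assumptions made for illustration.

```python
import re

# Tokens of a content model: non-terminal names, ϵ, and the operators ( ) , | * + ?
TOKEN = re.compile(r"[A-Za-z][\w^-]*|ϵ|[(),|*+?]")

def content_model_to_regex(model):
    """Translate a content model such as "B,B*" into a compiled Python regex
    over words in which every non-terminal is followed by one space."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue                      # sequence is plain concatenation
        if tok == "ϵ":
            parts.append("(?:)")          # matches the empty word
        elif tok == "(":
            parts.append("(?:")
        elif tok in (")", "|", "*", "+", "?"):
            parts.append(tok)
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def match_algo(w, model):
    """w: list of non-terminal names; returns True iff w is in L(model)."""
    word = "".join(name + " " for name in w)
    return content_model_to_regex(model).fullmatch(word) is not None

print(match_algo(["B", "B"], "B,B*"))          # True
print(match_algo([], "ϵ|C"))                   # True
print(match_algo(["C", "C", "C"], "(C,C)|C"))  # False
```

Appending a space to every name keeps multi-character non-terminals apart, so that, for example, the word [NA, ME] cannot be confused with [NAME].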

ValAlgo for local grammars
    input: XML doc/Tree T, local Grammar G
    output: “yes”, if T ∈ L(G); “no”, otherwise
• let’s start simple!
[Figure: Loc / ST / Reg diagram]
(slide 79)

General idea of algorithm
• our algorithm visits a tree in a depth-first, left-to-right manner
• whenever we visit a node on our way
– down, we push relevant information for this node on stacks
– up, we pop relevant information for this node from stacks
• hence, whenever we are at a node n during this traversal, all relevant information regarding all ancestors of n is (in reverse order) on our stacks
• the algorithm:
Traverse T in a depth-first, left-to-right manner
  When an element E is visited on way down,
    if there is a production rule N → a e in P with a = E’s tag name
    then push N → a e onto R and push ϵ onto NT
    else report “not accepted” and stop
  When an element E is visited on way up,
    pop a rule N → a e out of R
    pop a string of non-terminals w out of NT
    if w matches e
    then pop a string w’ of non-terminals out of NT and push w’N onto NT
    else report “not accepted” and stop
(slide 80)

ValAlgo for local grammars
See the paper by Murata, Lee, Mani, Kawaguchi
    input: XML doc/Tree T, local Grammar G
    output: “yes”, if T ∈ L(G); “no”, otherwise
Input: DOM Tree for T, local tree grammar G = (N, Σ, S, P)
NT is a stack of strings of non-terminals to store NTs of child nodes
R is a stack of production rules
Traverse T in a depth-first, left-to-right manner
  When an element E is visited on way down,
    if there is a production rule N → a e in P with a = E’s tag name   (locality ⇒ this rule is unique)
    then push N → a e onto R   (store rule for E’s content in R)
      and push ϵ onto NT   (start remembering E’s child nodes)
    else report “not accepted” and stop
  When an element E is visited on way up,
    pop a rule N → a e out of R   (retrieve rule for E’s content from R)
    pop a string of non-terminals w out of NT   (retrieve E’s child nodes)
    if w matches e
    then pop a string w’ of non-terminals out of NT and push w’N onto NT   (add E’s non-terminal to its predecessor siblings)
    else report “not accepted” and stop
report “accepted” and stop
(slide 81)
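The two-stack algorithm above can be sketched in Python. This is a minimal illustration, not the implementation from the paper by Murata, Lee, Mani, Kawaguchi. Several choices are assumptions made here: trees are nested `(tag, children)` tuples instead of a DOM, non-terminals are single letters so that a content model can be written directly as a Python regular expression, Python's `re` module stands in for MatchAlgo, and the final check that the root's non-terminal is a start symbol is added explicitly.

```python
import re

# Hypothetical encoding of the local grammar with rules
# S -> a B,B*   B -> b (C,C)|C   C -> c ϵ|C
# as: tag -> (non-terminal, content model as a Python regex over single letters).
RULES = {
    "a": ("S", "BB*"),
    "b": ("B", "CC|C"),
    "c": ("C", "C?"),
}
START = {"S"}

def validate_local(tree, rules, start):
    R = []   # stack of production rules
    NT = []  # stack of strings of non-terminals (children seen so far)

    def visit(node):
        tag, children = node
        # Way down: locality means at most one rule per tag name.
        if tag not in rules:
            return False
        R.append(rules[tag])
        NT.append("")
        for child in children:
            if not visit(child):
                return False
        # Way up: check the children against the content model.
        nonterminal, content_model = R.pop()
        w = NT.pop()
        if re.fullmatch(content_model, w) is None:
            return False
        # Append this node's non-terminal to its predecessor siblings
        # (for the root there are none, so it starts a new string).
        NT.append((NT.pop() if NT else "") + nonterminal)
        return True

    return visit(tree) and NT[0] in start

# Tree a(b(c, c), b(c))
tree = ("a", [("b", [("c", []), ("c", [])]), ("b", [("c", [])])])
print(validate_local(tree, RULES, START))        # True
print(validate_local(("a", []), RULES, START))   # False
```

Recursion replaces the explicit traversal events: the code before the loop over `children` runs on the way down, the code after it on the way up.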

ValAlgo for local grammars: let’s see how the algorithm works
    input: XML doc/Tree T, local Grammar G
    output: “yes”, if T ∈ L(G); “no”, otherwise
• G = ({S,B,C}, {a,b,c}, {S}, P) with P = { S → a B,B*, B → b (C,C)|C, C → c ϵ|C }
• T: root a with two b children; the first b has two c children, the second b has one c child
• the algorithm is the one given on slide 81; each step below shows the stack of rules R and the stack of NT strings NT after the step (bottom of stack first)
Slide 82, start: R = [ ], NT = [ ]
Slide 83, a on way down: R = [S → a B,B*], NT = [ϵ]
Slide 84, first b on way down: R = [S → a B,B*, B → b (C,C)|C], NT = [ϵ, ϵ]
Slide 85, first c on way down: R = [S → a B,B*, B → b (C,C)|C, C → c ϵ|C], NT = [ϵ, ϵ, ϵ]
Slide 86, first c on way up: pop C → c ϵ|C and w = ϵ; yes, ϵ ∈ L(ϵ|C); R = [S → a B,B*, B → b (C,C)|C], NT = [ϵ, ϵ]
Slide 87, push w’N = C: R = [S → a B,B*, B → b (C,C)|C], NT = [ϵ, C]
Slide 88, second c on way down: R = [S → a B,B*, B → b (C,C)|C, C → c ϵ|C], NT = [ϵ, C, ϵ]
Slide 89, second c on way up: pop C → c ϵ|C and w = ϵ; yes, ϵ ∈ L(ϵ|C); R = [S → a B,B*, B → b (C,C)|C], NT = [ϵ, C]
Slide 90, push w’N = CC: R = [S → a B,B*, B → b (C,C)|C], NT = [ϵ, CC]
Slide 91, first b on way up: pop B → b (C,C)|C and w = CC; yes, CC ∈ L((C,C)|C); R = [S → a B,B*], NT = [ϵ]
Slide 92, push w’N = B: R = [S → a B,B*], NT = [B]
Slide 93, second b on way down: R = [S → a B,B*, B → b (C,C)|C], NT = [B, ϵ]
Slide 94, third c on way down: R = [S → a B,B*, B → b (C,C)|C, C → c ϵ|C], NT = [B, ϵ, ϵ]
Slide 95, third c on way up: pop C → c ϵ|C and w = ϵ; yes, ϵ ∈ L(ϵ|C); R = [S → a B,B*, B → b (C,C)|C], NT = [B, ϵ]
Slide 96, push w’N = C: R = [S → a B,B*, B → b (C,C)|C], NT = [B, C]
Slide 97, second b on way up: pop B → b (C,C)|C and w = C; yes, C ∈ L((C,C)|C); R = [S → a B,B*], NT = [B]
Slide 98, push w’N = BB: R = [S → a B,B*], NT = [BB]
Slide 99, a on way up: pop S → a B,B* and w = BB; yes, BB ∈ L(B,B*); R = [ ], NT = [ ]
Slide 100, report “accepted” and stop: “accepted” (“yes”), T ∈ L(G) ☜ Check slide 74


COMP60411 Semi-structured Data and the Web Datatypes Relax NG, XML Schema, and Tree Grammars XSLT Bijan Parsia and Uli Sattler University of Manchester 1 Sunday, 21 October 2012 1 Datatypes and representations Or, are you my type? 2
