COMP6037 Semi-structured Data and the Web
Relax NG and Tree Grammars and XSLT 4.1

U. Sattler
University of Manchester

General Stuff

  • Read Blackboard’s Announcements
  • Read Blackboard’s Discussions
  • Forward your Blackboard email to your email account

Schema Languages and Tree Grammars

  • Last week, we saw how to translate schema languages into tree grammars. We saw that
    – each DTD can be faithfully translated into a local tree grammar, and therefore into a single-type one
        • hence each DTD corresponds to a single-type grammar
        • hence there is exactly 1 PSVI for each document that validates against a DTD
    – each XML schema can be faithfully translated into a single-type tree grammar,
        • hence there is exactly 1 PSVI for each document that validates against an XML schema
  • ...we also saw that parts of the UPA constraint help to generate the PSVI: do we need other parts?


Unique Particle Attribution Constraint: A content model must be formed such that

  • during validation of an element information item sequence (child node sequence, cns),
  • the particle component contained directly, indirectly or implicitly therein (in the content model)
  • with which to attempt to validate each item in the sequence (cns) in turn can be uniquely determined
  • without examining the content or attributes of that item (the element name suffices), and without any information about the items in the remainder of the sequence (no look-ahead into the rest of the cns).

[Slide callout: “is unnecessary”]


Schema Languages and Tree Grammars

  • Last week, we also learned about a third, flexible, liberal schema language, Relax NG
    – but we haven’t translated Relax NG schemas into tree grammars yet
    – so we don’t know whether being more liberal gives more than single-type
  • Also, the whole approach brings another opportunity:
    – we can investigate the problem of validating a document against a tree grammar
    – we can discuss algorithms for validating a document against a tree grammar
  • Oh, and we will learn about XSLT today as well

[Figure: an algorithm box. Input: tree T, grammar G. Output: “yes” if T ∈ L(G), “no” otherwise.]

See the paper by Murata, Lee, Mani, Kawaguchi

Translating Relax NG schema into tree grammars by example 1

    grammar {
      start = AddressBook
      AddressBook = element addressBook { Card* }
      Card = element card { Inline }
      Inline = Name, Email+
      Name = element name { text }
      Email = element email { text }
    }

Translate into G = (N, Σ, S, P) with

    N = {AddressBook, Card, Inline, Name, Email, Pcdata}
    Σ = {addressBook, card, name, email, pcdata}
    S = {AddressBook}
    P = { AddressBook → addressBook Card*,
          Card → card Inline,
          Inline → Name, Email+,
          Name → name Pcdata,
          Email → email Pcdata,
          Pcdata → pcdata ε }

  • “element y” ⇒ y ∈ Σ, ...possibly also an “uppercased copy” Y ∈ N
  • all other user-defined symbols X ⇒ X ∈ N
  • ...translating Relax NG rules is easy (depending on the Relax NG style)
  • ...let’s see one more

Translating Relax NG schema into tree grammars by example 2

    grammar {
      start = p-el
      p-el = element people { per-el+ }
      per-el = element person { attribute age { text }, na-el, ad-el+, pro-el* }
      na-el = element name { element first { text },
                             element middle { text }?,
                             element last { text } }
      ad-el = element address { text }
      pro-el = element project { attribute type { text }, attribute id { text }, text }
    }

Translate into G = (N, Σ, S, P) with

    N = {P-EL, PER-EL, NA-EL, AD-EL, PRO-EL, FIRST, MIDDLE, LAST, Pcdata}
    Σ = {people, person, name, first, middle, last, address, project}
    S = {P-EL}
    P = { P-EL → people PER-EL, PER-EL*,
          PER-EL → person NA-EL, AD-EL, AD-EL*, PRO-EL*,
          NA-EL → name FIRST, (MIDDLE|ε), LAST,
          FIRST → first Pcdata,
          MIDDLE → middle Pcdata,
          LAST → last Pcdata,
          AD-EL → address Pcdata,
          PRO-EL → project Pcdata,
          Pcdata → pcdata ε }

  • Ignore! Ignore! (slide callouts; the attribute declarations have no counterpart in G)
  • This Relax NG style makes translation of rules easy

Translating Relax NG schema into tree grammars by example 3

    grammar {
      start = element people { people-content }
      people-content = element person { person-content }+
      person-content = attribute age { text },
                       element name { name-content },
                       element address { text }+,
                       element project { project-content }*
      name-content = element first { text },
                     element middle { text }?,
                     element last { text }
      project-content = attribute type { text }, attribute id { text }, text
    }

Translate into G = (N, Σ, S, P) with

    N = {PEOPLE, P-C, PER-C, NA, NA-C, PERSON, PRO-C, ADR, PROJ, FIRST, MIDDLE, LAST, Pcdata}
    Σ = {people, person, name, first, middle, last, address, project}
    S = {PEOPLE}
    P = { PEOPLE → people P-C,
          P-C → PERSON, PERSON*,
          PERSON → person PER-C,
          PER-C → NA, ADR, ADR*, PROJ*,
          NA → name NA-C,
          ADR → address Pcdata,
          PROJ → project PRO-C,
          PRO-C → pcdata ε,
          NA-C → FIRST, (MIDDLE|ε), LAST,
          FIRST → first Pcdata,
          MIDDLE → middle Pcdata,
          LAST → last Pcdata,
          Pcdata → pcdata ε }

  • Ignore! Ignore! (slide callouts; the attribute declarations have no counterpart in G)
  • expand! expand! (slide callouts on rules without a terminal, such as P-C and PER-C)
  • This Relax NG style makes translation of rules less easy… and leads to generalized rules!


Translating Relax NG schema into tree grammars by example 3

Two things we have already seen when translating WXS:

  • we might need to introduce “generalized” rules, which can and need to be expanded, as for WXS:

        ...
        people-content = element person { person-content }+
        ...
        person-content = attribute age { text },
                         element name { name-content },
                         element address { text }+,
                         element project { project-content }*

        ...
        PERSON → person PER-C,
        PER-C → NA, ADR, ADR*, PROJ*,
        NA → name NA-C,
        ADR → address Pcdata,
        ...

    expand! For each illegal rule X → e:
        – remove X → e from the rule set
        – replace all occurrences of X in the rule set with e

  • we might have to “contextualise” names and types of elements: ...
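The expansion step can be sketched in Python. This is only an illustration under stated assumptions: rules are represented as hypothetical `(nonterminal, terminal, rhs)` triples with `terminal` set to `None` for an illegal (generalized) rule, right-hand sides are plain strings, and the generalized rules are not recursive. The name `expand_generalized` is made up for this sketch.

```python
import re

def expand_generalized(rules):
    """Expand rules of the form X -> e (no terminal) into the remaining rules.

    rules: list of (nonterminal, terminal_or_None, rhs_string) triples.
    Assumes the generalized rules do not refer to themselves.
    """
    rules = list(rules)
    while True:
        illegal = next((r for r in rules if r[1] is None), None)
        if illegal is None:
            return rules
        name, _, body = illegal
        # remove X -> e from the rule set
        rules.remove(illegal)
        # match X only as a whole symbol (names may contain '-')
        symbol = re.compile(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])")
        # replace all occurrences of X in the rule set with (e)
        rules = [(n, t, symbol.sub(lambda m: "(" + body + ")", rhs))
                 for n, t, rhs in rules]

fragment = [
    ("PERSON", "person", "PER-C"),
    ("PER-C", None, "NA, ADR, ADR*, PROJ*"),
    ("NA", "name", "NA-C"),
    ("NA-C", None, "FIRST, (MIDDLE|ε), LAST"),
]
expanded = expand_generalized(fragment)
```

Wrapping the substituted expression in parentheses keeps operator precedence intact when X occurs next to `*`, `|` or `,`.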

Translating Relax NG schema into tree grammars by example 4

  • 2. we might have to “contextualise” names and types of elements, to handle schemas where the same element name is used in different contexts with different types:

        ...
        people-content = element person { person-content }+,
                         element friend { friend-content }+
        ...
        person-content = attribute age { text },
                         element name { name-content }, ...
        friend-content = attribute age { text },
                         element name { friend-name-content }, ...
        ...

        ...
        P-C → PERSON, PERSON*, FRIEND, FRIEND*
        PERSON → person PER-C,
        FRIEND → friend FRIE-C,
        PER-C → NA^NA-C, ...
        FRIE-C → NA^FRIE-NA-C, ...
        NA^NA-C → name NA-C,
        NA^FRIE-NA-C → name FRIE-NA-C,
        ...

Translating Relax NG schema into tree grammars

  • each Relax NG schema can be faithfully translated into a tree grammar:
    – local? no: the example on the previous slide leads to competing non-terminals (NA^PER-C and NA^FRIE-C)

          ... NA^PER-C → name NA-C, NA^FRIE-C → name NA-C, ...

    – single-type? no: see the example below, where NA^NA-C and NA^FO-NA-C compete and occur in the same RHS

          ...
          person-content = attribute age { text },
                           element name { name-content } | element name { foreign-name-content }, ...
          ...
          PER-C → NA^NA-C | NA^FO-NA-C
          NA^NA-C → name NA-C,
          NA^FO-NA-C → name FO-NA-C,
          ...

    – so is Relax NG as powerful as tree grammars?

Relax NG schema is indeed as powerful as tree grammars

Every tree grammar can be faithfully translated into a Relax NG schema.

  • Proof (not too hard): given a tree grammar G = (N, Σ, S, P),
  • 1. translate each production rule N → t regexp in P into

        N = element t { regexp }

    (fortunately, the tree grammar regular expression syntax is very close to, and stricter than, Relax NG regular expression syntax)
  • 2. Put the resulting statements into a grammar

        grammar { start = N1 | ... | Nk  ..... }

    where N1, ..., Nk are all start symbols, i.e., S = {N1, ..., Nk}
  • 3. Call the resulting schema GS

Then T ∈ L(G) if and only if T validates against GS.
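The three proof steps can be sketched as a small Python function that prints Relax NG compact syntax. It is a minimal illustration, not a complete translator: the function name `to_relax_ng`, the triple representation of rules, and the choice to render an empty content model ε with Relax NG's `empty` pattern are all assumptions of this sketch.

```python
def to_relax_ng(rules, start):
    """Render a tree grammar as Relax NG compact-syntax text.

    rules: list of (nonterminal, terminal, regexp) triples, one per rule N -> t regexp
    start: list of start symbols N1, ..., Nk
    """
    # step 2: the start pattern is the choice of all start symbols
    lines = ["grammar {", "  start = " + " | ".join(start)]
    for nonterminal, terminal, regexp in rules:
        # Assumption: epsilon is written "ε" in the input and becomes `empty`
        body = regexp.replace("ε", "empty")
        # step 1: N -> t regexp  becomes  N = element t { regexp }
        lines.append(f"  {nonterminal} = element {terminal} {{ {body} }}")
    lines.append("}")
    return "\n".join(lines)

schema = to_relax_ng(
    [("S", "a", "B, B*"), ("B", "b", "(C, C) | C"), ("C", "c", "ε | C")],
    ["S"],
)
print(schema)
```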


Tree Grammars and Schema Languages for XML

  • Harvest Time!
  • but then, isn’t validation of an XML document against a Relax NG schema really complicated and complex (i.e., space and/or time consuming)?
  • perhaps it’s even undecidable or intractable?
  • ...we know: testing whether T validates against a Relax NG schema S is as difficult as testing whether T ∈ L(GS).
  • ...so, how difficult is it?

[Figure: the grammar classes Loc, ST and Reg, labelled with DTD, WXS and Relax NG respectively]

[Figure, captioned “with our knowledge”: ValAlgo takes a tree T and a grammar G and answers “yes” if T ∈ L(G), “no” otherwise]

  • To design our “tree recogniser”,
  • 1. we start with the easy case: assume that G is local (this automatically gives us a validator for the structural aspects of DTDs)
  • 2. then expand the algorithm to single-type grammars (this automatically gives us a validator for the structural aspects of WXS)
  • 3. then expand to general tree grammars (...Relax NG)

    – we also assume that we have a subroutine (MatchAlgo)
    – ...if time permits, we will see later how to build that one (it’s based on a translation of regular expressions into finite state machines, aka automata); otherwise
        • remember your undergraduate studies (?)
        • read it up, e.g., in the textbook by Hopcroft, Ullman

14

[Figure: ValAlgo. Input: tree T, grammar G. Output: “yes” if T ∈ L(G), “no” otherwise.]
[Figure: MatchAlgo. Input: string w, regular expression e. Output: “yes” if w ∈ L(e) (w matches e), “no” otherwise.]

ValAlgo for an XML doc/tree T and a local grammar G:

    Input: DOM tree for T, local tree grammar G = (N, Σ, S, P)
    NT is a stack of strings of non-terminals
    R is a stack of production rules

    Traverse T in a depth-first, left-to-right manner
    When an element E is visited on the way down,
        if there is a production rule N → a e in P with a = E’s tag name
        then push N → a e onto R and push ε onto NT
        else report “not accepted” and stop
    When an element E is visited on the way up,
        pop a rule N → a e out of R
        pop a string of non-terminals w out of NT
        if w matches e
        then pop a string w’ of non-terminals out of NT and push w’N onto NT
        else report “not accepted” and stop
    report “accepted” and stop

Notes on the individual steps:

  • NT: to store NTs of child nodes
  • “if there is a production rule”: locality ⇒ unique rule!
  • push onto R: store rule for E’s content in R
  • push ε onto NT: start remembering E’s child nodes
  • pop from R: retrieve rule for E’s content in R
  • pop w from NT: retrieve E’s child nodes
  • push w’N: add E’s terminal node to its predecessor siblings

See the paper by Murata, Lee, Mani, Kawaguchi

Stacks and tree traversal, observations

  • our algorithm visits a tree in a depth-first, left-to-right manner
  • whenever we visit a node on our way
    – down, we push relevant information for this node onto stacks
    – up, we pop relevant information for this node from stacks
  • hence, whenever we are at a node n during this traversal, all relevant information regarding all ancestors of n is (in reverse order) on our stacks


  • Let’s see how the algorithm works (slides 17 to 35 step through one run; the algorithm text repeated on every slide is the local-grammar ValAlgo):

    – G = ({S,B,C}, {a,b,c}, {S}, P) with P = { S → a B,B*,  B → b (C,C)|C,  C → c ε|C }
    – T is the tree a(b(c, c), b(c)): the root a has two b children, the first b has two c children, the second b has one c child

R is the stack of rules, NT the stack of NT strings. Stacks are listed top first; the empty strings ε pushed on the way down are written out.

     1. start: R and NT are empty
     2. down at a: push S → a B,B*
            R:  S → a B,B*
            NT: ε
     3. down at the first b: push B → b (C,C)|C
            R:  B → b (C,C)|C;  S → a B,B*
            NT: ε;  ε
     4. down at its first c: push C → c ε|C
            R:  C → c ε|C;  B → b (C,C)|C;  S → a B,B*
            NT: ε;  ε;  ε
     5. up at that c: pop C → c ε|C and w = ε; yes, ε ∈ L(ε|C); push C
            R:  B → b (C,C)|C;  S → a B,B*
            NT: C;  ε
     6. down at the second c: push C → c ε|C
            R:  C → c ε|C;  B → b (C,C)|C;  S → a B,B*
            NT: ε;  C;  ε
     7. up at that c: pop C → c ε|C and w = ε; yes, ε ∈ L(ε|C); w’ = C becomes CC
            R:  B → b (C,C)|C;  S → a B,B*
            NT: CC;  ε
     8. up at the first b: pop B → b (C,C)|C and w = CC; yes, CC ∈ L((C,C)|C); push B
            R:  S → a B,B*
            NT: B
     9. down at the second b: push B → b (C,C)|C
            R:  B → b (C,C)|C;  S → a B,B*
            NT: ε;  B
    10. down at its c: push C → c ε|C
            R:  C → c ε|C;  B → b (C,C)|C;  S → a B,B*
            NT: ε;  ε;  B
    11. up at that c: pop C → c ε|C and w = ε; yes, ε ∈ L(ε|C); push C
            R:  B → b (C,C)|C;  S → a B,B*
            NT: C;  B
    12. up at the second b: pop B → b (C,C)|C and w = C; yes, C ∈ L((C,C)|C); w’ = B becomes BB
            R:  S → a B,B*
            NT: BB
    13. up at a: pop S → a B,B* and w = BB; yes, BB ∈ L(B,B*)
            R and NT are empty
    14. report “accepted” (“yes”), T ∈ L(G)

☜ Check slide 14
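A minimal Python sketch of the local-grammar algorithm, run on the example grammar and tree from the walkthrough. Several things here are assumptions of the sketch rather than part of the slides: trees are `(tag, children)` tuples, non-terminals are single characters so that a content model can be written as a Python regular expression (`BB*` for B,B*), Python's `re.fullmatch` stands in for MatchAlgo, and the final check that the root's non-terminal is a start symbol is added explicitly.

```python
import re

def events(node):
    """Yield ("down", tag) and ("up", tag) in depth-first, left-to-right order."""
    tag, children = node
    yield ("down", tag)
    for child in children:
        yield from events(child)
    yield ("up", tag)

def validate_local(tree, rules, start):
    """rules maps a terminal a to its unique rule (N, e); start is the set S."""
    R, NT = [], []  # stack of rules, stack of strings of non-terminals
    for kind, tag in events(tree):
        if kind == "down":
            if tag not in rules:      # locality: at most one rule per tag
                return False
            R.append(rules[tag])      # store rule for E's content
            NT.append("")             # start remembering E's child nodes
        else:
            nonterminal, content = R.pop()
            w = NT.pop()
            if re.fullmatch(content, w) is None:
                return False
            if NT:
                NT.append(NT.pop() + nonterminal)
            elif nonterminal not in start:   # assumption: root must be in S
                return False
    return True

RULES = {"a": ("S", "BB*"), "b": ("B", "(CC)|C"), "c": ("C", "|C")}
TREE = ("a", [("b", [("c", []), ("c", [])]), ("b", [("c", [])])])
accepted = validate_local(TREE, RULES, {"S"})
```

Because the loop only consumes down/up events, the same code would work on events produced by a streaming parser instead of a tree walk.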

Validating trees against tree grammars

  • want to implement this algorithm?
    – walk the DOM tree in a depth-first, left-to-right way, or
    – use a SAX parser and do it in a streaming fashion
  • now, this was for local tree grammars, let’s see how this works for
    – single-type tree grammars
        • rather straightforward because we still only have at most one run of our tree grammar on the input tree
        • remember: single-type means that no start symbols compete and no RHS of any rule contains competing non-terminals
        • so we won’t need to guess production rules, just check the content model of the predecessor node
    – general tree grammars ...
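The distinction between local, single-type and general grammars can be checked mechanically. The sketch below assumes the usual definition that two different non-terminals compete when they have rules with the same terminal; the function name `classify`, the triple representation, and the restriction to single-character non-terminal names are assumptions made for illustration.

```python
from itertools import combinations

def classify(rules, start):
    """rules: list of (nonterminal, terminal, rhs) with rhs a string over
    single-character non-terminal names; start: iterable of start symbols."""
    terminals_of = {}
    for nonterminal, terminal, _ in rules:
        terminals_of.setdefault(nonterminal, set()).add(terminal)

    def compete(x, y):
        # different non-terminals whose rules share a terminal
        shared = terminals_of.get(x, set()) & terminals_of.get(y, set())
        return x != y and bool(shared)

    def has_competitors(symbols):
        return any(compete(x, y) for x, y in combinations(sorted(symbols), 2))

    if not has_competitors(terminals_of):
        return "local"
    # single-type: no competition among start symbols or within any one RHS
    groups = [set(start)]
    groups += [{n for n in terminals_of if n in rhs} for _, _, rhs in rules]
    if not any(has_competitors(group) for group in groups):
        return "single-type"
    return "regular"

LOCAL = [("S", "a", "BB*"), ("B", "b", "(CC)|C"), ("C", "c", "|C")]
SINGLE = LOCAL[:0] + [("S", "a", "BB*D"), ("B", "b", "(CC)|C"),
                      ("C", "c", "|C"), ("D", "c", "CCC")]
GENERAL = [("A", "a", "B|"), ("B", "a", "CC"), ("C", "a", "AAA|")]
```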


ValAlgo for an XML doc/tree T and a single-type grammar G (“yes” if T ∈ L(G), “no” otherwise):

    Input: DOM tree for T, single-type tree grammar G = (N, Σ, S, P)
    NT is a stack of strings of non-terminals
    R is a stack of production rules

    Traverse T in a depth-first, left-to-right manner
    When an element E is visited on the way down,
        if there is a production rule N → a e in P with a = E’s tag name
           and (E is root and N in S, or N occurs in RHS of topmost rule in R)
        then push N → a e onto R and push ε onto NT
        else report “not accepted” and stop
    When an element E is visited on the way up,
        pop a rule N → a e out of R
        pop a string of non-terminals w out of NT
        if w matches e
        then pop a string w’ of non-terminals out of NT and push w’N onto NT
        else report “not accepted” and stop
    report “accepted” and stop

Notes on the individual steps:

  • “if there is a production rule”: single-type ⇒ unique
  • push onto R: store rule for E’s content in R
  • push ε onto NT: start remembering E’s child nodes
  • pop from R: retrieve rule for E’s content in R
  • pop w from NT: retrieve E’s child nodes
  • push w’N: add E’s terminal node to its predecessor siblings

See the paper by Murata, Lee, Mani, Kawaguchi

  • Let’s see how the algorithm works:

    – G = ({S,B,C,D}, {a,b,c}, {S}, P) with P = { S → a B,B*,D,  B → b (C,C)|C,  C → c ε|C,  D → c C,C,C }
    – ...in order to know which production rule N → c ... to choose for nodes labelled c, I need to check the rule for the predecessor and ensure that N occurs in the RHS chosen for them...

    When an element E is visited on the way down,
        if there is a production rule N → a e in P with a = E’s tag name
           and (E is root and N in S, or N occurs in RHS of topmost rule in R)
        then push N → a e onto R and push ε onto NT
        else report “not accepted” and stop

[Figure: example tree; node labels as extracted: a c c d c b]
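The single-type variant can be sketched in Python with the example grammar above. As assumptions of this sketch: trees are `(tag, children)` tuples, non-terminals are single characters so each content model is a Python regular expression, `re.fullmatch` plays the role of MatchAlgo, and the input tree `GOOD` is a hypothetical example rather than the tree drawn on the slide.

```python
import re

def validate_single_type(tree, rules, start):
    """rules: list of (N, a, e) triples; start: set of start symbols."""
    R, NT = [], []  # stack of rules, stack of strings of non-terminals

    def down(tag):
        for rule in rules:
            n, a, _ = rule
            if a != tag:
                continue
            # root: N must be in S; otherwise N must occur in the RHS of the
            # topmost rule in R, i.e. the rule chosen for the parent
            if (not R and n in start) or (R and n in R[-1][2]):
                R.append(rule)
                NT.append("")
                return True
        return False

    def up():
        n, _, e = R.pop()
        w = NT.pop()
        if re.fullmatch(e, w) is None:
            return False
        if NT:
            NT.append(NT.pop() + n)
        return True

    def visit(node):
        tag, children = node
        return down(tag) and all(visit(c) for c in children) and up()

    return visit(tree)

RULES = [("S", "a", "BB*D"), ("B", "b", "(CC)|C"),
         ("C", "c", "|C"), ("D", "c", "CCC")]
LEAF = ("c", [])
GOOD = ("a", [("b", [LEAF]), ("c", [LEAF, LEAF, LEAF])])
BAD = ("a", [("b", [LEAF]), ("c", [LEAF])])
```

In `GOOD`, the c below b is typed C (only C occurs in B's content model), while the c directly below a is typed D (only D occurs in S's content model); this is the parent-rule check that the single-type restriction makes deterministic.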

  • want to implement this algorithm? Again, as for local tree grammars,
    – walk the DOM tree in a depth-first, left-to-right way, or
    – use a SAX parser and do it in a streaming fashion
  • now, this was for single-type tree grammars, let’s see how this works for
    – general tree grammars
        • we can have competing non-terminal symbols in the RHS of rules
        • how do we know with which to continue?
        • try/guess one and, if failed, backtrack?
        • or by keeping track of all possibilities
            – and, as long as we have some, everything is fine..
            – which means we need some more stacks for track keeping...

ValAlgo for an XML doc/tree T and any tree grammar G (“yes” if T ∈ L(G), “no” otherwise):

    Input: DOM tree for T, a tree grammar G = (N, Σ, S, P)
    NT is a stack of strings of sets of non-terminals
    R is a stack of sets of production rules
    NS is a stack of sets of non-terminals, initialised with S

    Traverse T in a depth-first, left-to-right manner
    When an element E is visited on the way down,
        set RS to the set of production rules N → a e in P
            with a = E’s tag name and (N occurs in topmost set of NS)
        if RS is non-empty
        then push RS onto R, push ε onto NT,
             push the set of all non-terminals occurring in the RHS of a rule in RS onto NS
        else report “not accepted” and stop
    When an element E is visited on the way up,
        pop a rule set RS = {Ni → a ei | i = 1..k} out of R
        pop a string of sets of non-terminals W1...Wn out of NT
        set W to the set of those Ni such that there is a w1...wn
            with each wj from Wj that matches ei
        if W is non-empty
        then pop a string V1...Vm of sets of non-terminals out of NT,
             push V1...VmW onto NT, pop NS
        else report “not accepted” and stop
    report “accepted” and stop

Notes on the individual steps:

  • RS: we don’t know which to use!
  • push onto NS: store non-terminals from RHS of possibly applicable rules
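A compact Python sketch of the set-based algorithm for general tree grammars. The representation is an assumption of this sketch: trees are `(tag, children)` tuples, non-terminals are single letters, content models are Python regular expressions, and the "there is a w1...wn" test is done by brute-force enumeration with `itertools.product`, which is exponential in the worst case and only meant to mirror the wording of the algorithm. The trees `GOOD` and `BAD` are hypothetical inputs.

```python
import re
from itertools import product

def validate_general(tree, rules, start):
    """rules: list of (N, a, e) triples; start: set of start symbols."""
    R, NT, NS = [], [], [set(start)]

    def visit(node):
        tag, children = node
        # way down: collect every rule that might apply here
        RS = [r for r in rules if r[1] == tag and r[0] in NS[-1]]
        if not RS:
            return False
        R.append(RS)
        NT.append([])  # string of sets of non-terminals, one set per child
        NS.append({ch for _, _, e in RS for ch in e if ch.isalpha()})
        if not all(visit(child) for child in children):
            return False
        # way up: keep the Ni whose content model matches some choice w1...wn
        RS, Ws = R.pop(), NT.pop()
        words = ["".join(choice) for choice in product(*Ws)]
        W = {n for n, _, e in RS if any(re.fullmatch(e, w) for w in words)}
        if not W:
            return False
        NS.pop()
        if NT:
            NT[-1].append(W)
        return True

    return visit(tree)

# A -> a B|ε,  B -> a (C,C),  C -> a (A,A,A)|ε
RULES = [("A", "a", "B|"), ("B", "a", "CC"), ("C", "a", "AAA|")]
LEAF = ("a", [])
GOOD = ("a", [("a", [LEAF, LEAF])])   # leaves: {A,C}; inner: {B}; root: {A}
BAD = ("a", [("a", [LEAF])])          # no rule accepts a single A or C child
```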
  • Let’s see how the algorithm works:

– G = ({A,B,C},{a},{A,B,C},P) with P = { (1) A → a (B | ε), (2) B → a (C,C), (3) C → a ((A,A,A) | ε) }
– T = a( a( a(a,a,a), a ) ), i.e., 7 elements, all with tag name a
– stacks: R (rule sets), NT (strings of sets of non-terminals), NS (sets of non-terminals)

When an element E is visited on way down,
  set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS)
  if RS is non-empty
    then push RS onto R, push ε onto NT, push set of all non-terminals occurring in RHS of a rule in RS to NS
    else report “not accepted” and stop
When an element E is visited on way up,
  pop a rule set RS = {Ni → a ei | i = 1..k} out of R
  pop a string of sets of non-terminals W1...Wk out of NT
  set W to the set of those Ni such that there is a w1...wk with each wj from Wj that matches ei
  if W is non-empty
    then pop a string V1...Vm of non-terminals out of NT, push V1...VmW onto NT, pop NS
    else report “not accepted” and stop
report “accepted” and stop

  • Trace (stacks listed bottom to top, rule sets written by rule number):

– start: NS = {A,B,C}
– root a, down: RS = {1,2,3}; R = {1,2,3}; NS = {A,B,C} {A,B,C}
– child of the root, down: RS = {1,2,3}; R = {1,2,3} {1,2,3}; NS = {A,B,C} {A,B,C} {A,B,C}
– first grandchild, down: RS = {1,2,3}; R and NS each grow by one more entry
– first leaf, down: RS = {1,2,3}; R now holds four rule sets, NS five sets
– first leaf, up: RS = {1,2,3}, W1...Wk = ε, so W = {A,C}; top of NT becomes {A,C}
– second leaf, down and up: again W = {A,C}; top of NT becomes {A,C},{A,C}
– third leaf, down and up: again W = {A,C}; top of NT becomes {A,C},{A,C},{A,C}
– first grandchild, up: W1...W3 = {A,C},{A,C},{A,C}, so W = {C}; top of NT becomes {C}
– second grandchild (a leaf), down and up: W = {A,C}; top of NT becomes {C},{A,C}
– child of the root, up: W1W2 = {C},{A,C}, so W = {B}; top of NT becomes {B}
– root, up: W1 = {B}, so W = {A}
– end of document: “accepted”/“yes”, T is accepted by G

  • want to implement this algorithm? Again, as for single-type tree grammars,

– walk the DOM tree in a depth-first, left-to-right way, or
– use a SAX parser and do it in a streaming fashion

  • Insights gained?
  • Validating general tree grammars

– does not require guessing & backtracking
– can be implemented in a streaming way
– is a bit more tricky than validating single-type grammars,
– but not really more complex (in terms of time/space)

  • so, for validation purposes, restriction to single-type is not necessary

– feel free to describe structure in a powerful way!

  • but, for uniqueness of PSVI,

– we need single-type
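
The streaming variant can be tried out with a short Python sketch. This is only an illustration under assumptions, not a reference implementation: the names RULES, START, matches and validate are made up here, the grammar is the one from the trace with the empty alternative read as ε, and content models are written as Python regular expressions over one-letter non-terminal names. The test for “some w1...wk matches ei” simply enumerates all choices, which is fine for toy examples but not how one would do it for large inputs. The start/end events of xml.etree.ElementTree.iterparse stand in for a SAX parser.

```python
import io
import re
from itertools import product
import xml.etree.ElementTree as ET

# Rules N -> a e, stored as (N, tag, content model as a regular expression).
RULES = [
    ("A", "a", "B?"),      # (1) A -> a (B | ε)
    ("B", "a", "CC"),      # (2) B -> a (C,C)
    ("C", "a", "(AAA)?"),  # (3) C -> a ((A,A,A) | ε)
]
START = {"A", "B", "C"}


def matches(model, sets):
    """True if some w1...wk, with each wj taken from sets[j], matches the model."""
    return any(re.fullmatch(model, "".join(w)) for w in product(*sets))


def validate(xml_text, rules=RULES, start=START):
    R = []             # stack of rule sets
    NT = [[]]          # stack of strings of non-terminal sets; [] plays the role of ε
    NS = [set(start)]  # stack of sets of admissible non-terminals
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":  # element visited on way down
            rs = [r for r in rules if r[1] == elem.tag and r[0] in NS[-1]]
            if not rs:
                return False
            R.append(rs)
            NT.append([])
            # All non-terminals occurring in the RHS of a rule in rs.
            NS.append({c for _, _, model in rs for c in model if c.isalpha()})
        else:  # element visited on way up
            rs = R.pop()
            children = NT.pop()
            W = {n for n, _, model in rs if matches(model, children)}
            if not W:
                return False
            NT[-1].append(W)  # pop V1...Vm, push V1...Vm W
            NS.pop()
    return True
```

For example, validate("<a><a><a><a/><a/><a/></a><a/></a></a>") runs the algorithm on the tree from the trace, and validate("<a><a/></a>") gives a rejected tree, since no content model accepts a single A or C child.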

64


slide-17
SLIDE 17

Assignments

  • in your assignments for next week,

– I give you a tree grammar G & a tree T, and
– ask you to run the right algorithm on them to check whether G accepts T
    (you need to pick the right one depending on the type of G)
– ask you to report stack content step by step
    (in csv for ease of preparation; sorry, couldn’t think of anything easier and don’t want to mark hand-drawn tables)

  • feel free to use numbering of rules as in the last example and
  • replace “FirstStackName” etc. with the names from the algorithms
  • ask if you don’t understand

65 66

XSLT: general stuff

  • XSLT 1.0 has been a W3C standard since 1999

– see http://www.w3.org/TR/xslt
– makes heavy use of XPath 1.0

  • XSLT 2.0 has been a W3C standard since January 2007

– see http://www.w3.org/TR/xslt20
– makes heavy use of XPath 2.0

  • is a descendant of the style-sheet language XSL, but has become independent
  • is a Turing-complete, functional programming language, designed for the transformation of XML documents into Unicode streams, where transformation includes the

– selection of parts of the source document,
– their re-arrangement, and
– the derivation of new content

67

XSLT: general stuff -- what is XSLT designed for?

“XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary. XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL.”

from: http://www.w3.org/TR/xslt

68

XSL = XSLT + XSL-FO

  • XSL, the eXtensible Stylesheet Language, consists of 2 parts:

– XSL Transformations, which we will discuss here, and
– XSL Formatting Objects, an XML-based formalism to describe the layout of a document

  • often, we

– first use XSLT to transform, filter & shape an XML document, and
– then use XSL-FO or CSS to specify its layout, e.g., margins, text boxes, padding, footers, etc.
– currently, XSL-FO is (in contrast to CSS) little used & little supported
– but XSL-FO is a bit more powerful than CSS…

slide-18
SLIDE 18

69

XSLT: stylesheet

  • an XSLT stylesheet is a

– well-formed XML document
– conforming to XML Namespaces
– which uses elements from the namespace http://www.w3.org/1999/XSL/Transform
– traditionally using “xsl” as a prefix for this namespace, as in
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>
– whose root element is xsl:transform or xsl:stylesheet

  • stylesheet is a synonym for transformation, both in documentation and in XML documents, i.e., a transformation can have root element

– xsl:transform or
– xsl:stylesheet

Is XSLT Schema Aware?

  • Information from a schema can be used both

– statically: when the stylesheet is compiled, and
– dynamically: during evaluation of the stylesheet to transform a source document.

  • In a stylesheet (e.g., in XPath expressions and patterns), we may refer to named types from a schema (e.g., Person from <xs:complexType name="Person">)
  • The conformance rules for XSLT 2.0 distinguish between a

– basic XSLT processor and a
– schema-aware XSLT processor
– in <oXygen>, you have both

  • Helpful: http://www.ibm.com/developerworks/xml/library/x-schemaxslt.html

70 71

XSLT: stylesheet

  • a stylesheet describes/tells an XSLT processor how to transform a source tree into a result tree (or text)
  • via XML template rules which associate patterns with templates
  • which are then used by an XSLT processor as follows:

– match pattern against elements in source tree
– instantiate corresponding template to create parts of the result tree

72

XSLT: stylesheet

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mine="...">
      top-level-elements
    </xsl:stylesheet>

Alternatively:

    <xsl:transform version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mine="...">
      top-level-elements
    </xsl:transform>

An xsl:stylesheet can have zero or more of each of the following elements in (almost) any order:

xsl:import, xsl:include, xsl:strip-space, xsl:preserve-space, xsl:output, xsl:key, xsl:decimal-format, xsl:namespace-alias, xsl:attribute-set, xsl:variable, xsl:param, xsl:template

(later and in more detail)

slide-19
SLIDE 19

73

XSLT: top-level elements

  • xsl:include: a multi-document or modularity mechanism

– results in the union of the documents
– origin of declarations has no effect on their priority

  • xsl:import: a slightly different multi-document or modularity mechanism

– like xsl:include, but local declarations are “stronger” than imported ones
– can only occur as child nodes of the root element & before other elements

  • xsl:strip-space and xsl:preserve-space

– to specify element names for which white space should be removed/preserved
– preserving white space is default
– e.g.,
    <xsl:strip-space elements="year city country"/>
    <xsl:preserve-space elements="name title description"/>

  • xsl:key

– to declare a named key to be used in the style sheet with the key() function
– note: the key does not have to be unique!
– e.g., declare
    <xsl:key name="mykey" match="USCit" use="@SSN"/>
  and use
    select="key('mykey','12345')"
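
To get a feeling for what xsl:key and key() provide, here is a rough Python analogy: an index from key values to lists of matching elements. It is only a sketch and not how an XSLT processor implements keys. The function build_key and the sample document (citizens, name, the three entries) are invented for illustration; only the names USCit, SSN, mykey and 12345 come from the slide. Note that one key value can map to several elements, which is the “does not have to be unique” point.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_key(root, match, use):
    """Index every element named `match` by the value of its `use` attribute."""
    index = defaultdict(list)
    for elem in root.iter(match):  # document order
        value = elem.get(use)
        if value is not None:
            index[value].append(elem)
    return index


# Hypothetical sample data: two USCit elements share the same SSN value.
doc = ET.fromstring(
    "<citizens>"
    '<USCit SSN="12345"><name>Ann</name></USCit>'
    '<USCit SSN="12345"><name>Bob</name></USCit>'
    '<USCit SSN="67890"><name>Cy</name></USCit>'
    "</citizens>"
)

mykey = build_key(doc, "USCit", "SSN")              # like the xsl:key declaration
hits = mykey.get("12345", [])                       # like key('mykey','12345')
names = [elem.findtext("name") for elem in hits]
```

Here names collects the name of every USCit element whose SSN attribute is 12345, in document order.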

74

XSLT: top-level elements (ctd.)

Very specialized top-level elements

  • xsl:decimal-format

– to specify how to convert numbers into strings

  • xsl:namespace-alias

– to specify namespace replacement

  • xsl:attribute-set

– to name sets of attributes to be used in output

75

XSLT elements: template rule

  • (most important element!) a template rule is of the form

    <xsl:template match="expression" name="qname" priority="number" mode="qname">
      parameter-list
      template-def
    </xsl:template>

– match holds the pattern, template-def is the template
– name, priority, and mode are optional

  • parameter-list is a list of zero or more xsl:param elements
  • as expression, an XPath location path can be used

– with some restrictions, e.g., it must evaluate to a node set
– for XSLT 1.0, use XPath 1.0,
– for XSLT 2.0, use XPath 2.0

  • template-def is an XML fragment that makes use of other XSLT elements

– including instructions such as xsl:apply-templates or xsl:copy-of

76

XSLT elements: template rules

    <xsl:template match="expression" name="qname" priority="number" mode="qname">
      parameter-list
      template-def
    </xsl:template>

  • Example: when applied to “<emph>important</emph>”,

    <xsl:template match="emph">
      <fo:inline-sequence font-weight="bold">
        <xsl:apply-templates/>
      </fo:inline-sequence>
    </xsl:template>

yields

    <fo:inline-sequence font-weight="bold">
      important
    </fo:inline-sequence>

  • careful:

– there are various built-in template rules
– there is a default prioritisation on template rules
– it is the XSLT processor that fires the template rules

  • we will see later what elements we can use in template-def

slide-20
SLIDE 20

77

XSLT elements: processing model, sketched

  • an XSLT processor takes an XML document d with associated stylesheet s
  • processes the (XPath DM) tree (possibly the PSVI, if the processor is schema-aware) corresponding to d
  • in a depth-first manner

– thus we always have a context node

  • applies those template rules to the context node that

– match the context node and
– have highest priority

  • thereby generating the result tree according to the template rules
  • the easiest way to generate output is to use literal elements, like fo:inline-sequence in the previous example:

    <xsl:template match="emph">
      <fo:inline-sequence font-weight="bold">
        <xsl:apply-templates/>
      </fo:inline-sequence>
    </xsl:template>

78

XSLT elements: processing model by example

consider the following source tree:

[Tree diagram: root, with the <?xml ...?> declaration and the element people; people has two person children; the first person has attribute age=41 and children name (first: Harry, last: Potter) and address (4 Main Road)]

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="my.xsl"?>
    <people>
      <person age="41">
        <name>
          <first>Harry</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
      </person>
      <person age="43">
        <name>
          <first>Tony</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
      </person>
    </people>

79

XSLT elements: processing model by example

consider this source tree with the following XSLT stylesheet:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
    </xsl:stylesheet>

what does this seemingly empty (no template rules!) stylesheet produce?

80

XSLT elements: processing model by example

(tricky!) the previous stylesheet was only seemingly empty because XSLT processors employ built-in template rules:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="*|/">
        <xsl:apply-templates/>
      </xsl:template>
      <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
      </xsl:template>
      <xsl:template match="processing-instruction()|comment()"/>
    </xsl:stylesheet>

(1) for all element & document nodes, (2) don’t do anything but apply templates to all child nodes
(3) for all text and attribute nodes, (4) return their value
(5) ignore p-i & comments

thus templates are applied to all nodes (element, root, text, ...) except attribute and namespace nodes, and their text content is output (via “value-of”)
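
The effect of the built-in rules can be imitated in a few lines of Python. This is a simplified sketch, not an XSLT processor: the names apply_templates and builtin_rule are made up, user “templates” are plain Python functions looked up by element name only (no patterns, priorities or modes), and the source document is a whitespace-free, one-person version of the running example. Comments and processing instructions are already dropped by the default ElementTree parser, which happens to agree with rule (5).

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates):
    """Use the user rule registered for node.tag, else fall back to the built-in rule."""
    rule = templates.get(node.tag, builtin_rule)
    return rule(node, templates)


def builtin_rule(node, templates):
    """Built-in behaviour: output text nodes and recurse into child elements.

    Attributes are never visited, mirroring the default select="node()".
    """
    parts = [node.text or ""]            # text node before the first child
    for child in node:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")   # text node following this child
    return "".join(parts)


SOURCE = (
    '<people><person age="41"><name><first>Harry</first><last>Potter</last>'
    "</name><address>4 Main Road</address></person></people>"
)
root = ET.fromstring(SOURCE)

# No user rules at all: only the text content survives, the age attribute does not.
no_rules = apply_templates(root, {})

# One user rule for person: it replaces the built-in rule for that element.
one_rule = apply_templates(root, {"person": lambda node, templates: "Person found!"})
```

With no user rules, no_rules is the concatenation of all text nodes; with the person rule, the whole person subtree is replaced by the rule’s output, because that rule does not call apply_templates on its children.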

slide-21
SLIDE 21

81

XSLT elements: processing model by example

Built-in template rules:

(b)
    <xsl:template match="*|/">
      <xsl:apply-templates select="node()"/>
    </xsl:template>

select="node()" is the default for “apply-templates”; it selects all child nodes, and since attribute nodes are not children, they are never visited
(as a pattern, node() matches any node other than an attribute node and the root node)

if you want your stylesheet to consider attribute nodes, you must override this default, e.g. like this:

(1)
    <xsl:template match="*|/">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:template>

If we use template rule (1), then it overrides built-in (b), hence now rules are applied to all nodes (element, root, text, ...) including attribute nodes, but still except namespace nodes

82

XSLT elements: processing model by example

what does this slightly more elaborate stylesheet yield?

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="person">
        <xsl:text> Person found! </xsl:text>
      </xsl:template>
    </xsl:stylesheet>

Note: <xsl:text> superfluous here, but helpful

83

XSLT elements: processing model by example

we can make use of “functions” to retrieve the “value” of a node:

[Figure: tree of the input document: root node, people, two person elements; one person shown with age=41, name (first: Harry, last: Potter) and address (4 Main Road)]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    Person found called: <xsl:value-of select="name"/>
  </xsl:template>
</xsl:stylesheet>

84

XSLT elements: processing model by example

we can conveniently copy a node and its complete sub-tree. Alternatively, I could have used <xsl:copy-of select="*"/> or <xsl:copy-of select="person"/>

[Figure: tree of the input document: root node, people, two person elements; one person shown with age=41, name (first: Harry, last: Potter) and address (4 Main Road)]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="people">
    <family>
      <xsl:copy-of select="child::*"/>
    </family>
  </xsl:template>
</xsl:stylesheet>


85

XSLT elements: processing model by example

we can re-name elements and filter out data:

[Figure: tree of the input document: root node, people, two person elements; one person shown with age=41, name (first: Harry, last: Potter) and address (4 Main Road)]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    <myFriend>
      <xsl:apply-templates select="@*|*|text()"/>
    </myFriend>
  </xsl:template>
  <xsl:template match="@*|text()|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|text()|*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="address"/>
</xsl:stylesheet>

86

XSLT elements: processing model by example

we can even apply several rules to the same elements using modes for rules:

[Figure: tree of the input document: root node, people, two person elements; one person shown with age=41, name (first: Harry, last: Potter) and address (4 Main Road)]

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" version="2.0">
  <xsl:template match="/people">
    <html><body>
      <ol>
        <xsl:apply-templates select="person" mode="o"/>
      </ol>
      <xsl:apply-templates select="person" mode="f"/>
    </body></html>
  </xsl:template>
  <xsl:template match="person" mode="o">
    <li>
      <xsl:value-of select="name/first"/>
      <xsl:value-of select="name/last"/>
    </li>
  </xsl:template>
  <xsl:template match="person" mode="f">
    <p>
      Last name: <xsl:value-of select="name/last"/>
      Age: <xsl:value-of select="age"/>
    </p>
  </xsl:template>
</xsl:stylesheet>

87

XSLT instructions: apply-templates

a statement A = <xsl:apply-templates select=location-path mode=mode-name>

  • can only have child nodes of the following two types, but any number of these:
    – xsl:with-param to pass parameters into template rules
    – xsl:sort to sort the selected nodes before processing (and thereby to sort the output)
  • location-path is a (restricted) XPath expression that evaluates to a node list; it is evaluated from the current node to select a node set S
  • an XSLT processor applies all A-applicable template rules to all nodes in S
    – in either document order or in the one given through xsl:sort children
  • a template rule <xsl:template match=m1 priority=p1 mode=m2> is A-applicable to a node n if
    – n matches the pattern m1 (in addition to being in S)
    – (in case that mode is used) m2 = mode-name
    – and it has highest priority (incl. default, order, and explicit priorities)
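The pieces above can be put together in a small sketch. The element names list, entry and ignored, the mode name short, and the parameter label are hypothetical and chosen only for illustration; the input is assumed to have the people/person/name/last shape of the running example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/people">
    <list>
      <!-- S = all person children of people, processed in order of last name -->
      <xsl:apply-templates select="person" mode="short">
        <xsl:sort select="name/last"/>
        <xsl:with-param name="label" select="'Friend'"/>
      </xsl:apply-templates>
    </list>
  </xsl:template>

  <!-- A-applicable: the pattern matches person and the mode is the requested one -->
  <xsl:template match="person" mode="short">
    <xsl:param name="label"/>
    <entry kind="{$label}">
      <xsl:value-of select="name/last"/>
    </entry>
  </xsl:template>

  <!-- Not A-applicable here: same pattern, but no mode -->
  <xsl:template match="person">
    <ignored/>
  </xsl:template>
</xsl:stylesheet>
```

Dropping the mode attribute from the apply-templates instruction would make the third rule the applicable one instead.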

88

XSLT instructions: call-template

<xsl:call-template name = qname>

  • like xsl:apply-templates, but
    – it requires the “name” attribute
    – only the template rule called qname (i.e., with name=qname set) is applied
    – if there is more than one template rule with the same name and the same import precedence, this will lead to an error
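A minimal sketch of a named template follows; the template name show-name and the parameter prefix are hypothetical. Note that xsl:call-template does not change the current node, so the paths inside the named template are still evaluated relative to the person element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    <!-- Invoke the rule by name rather than by pattern matching -->
    <xsl:call-template name="show-name">
      <xsl:with-param name="prefix" select="'Name: '"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Named template; the parameter defaults to the empty string -->
  <xsl:template name="show-name">
    <xsl:param name="prefix" select="''"/>
    <p>
      <xsl:value-of select="$prefix"/>
      <xsl:value-of select="name/first"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/last"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```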


89

XSLT instructions: value-of

<xsl:value-of select=expression/>

  • is one of the generating instructions provided by XSLT
  • it returns the string value of the node selected through expression (in XSLT 1.0, only the first selected node is used; in XSLT 2.0, the string values of all selected nodes are output, separated by a space), where the string value of
    – a text node is its text
    – an attribute node is its value
    – an element or root node is the concatenation of the string values of all its descendant text nodes
  • ...all this is a bit more tricky if you use schema-aware (SA) XSLT
    – because then, we have more than “text” in text nodes, and need to take types into account...
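The three kinds of string value can be seen side by side in a sketch like this one. The result element names summary, full, age and parts are hypothetical, and the input is assumed to have a name element with first and last children and an age attribute; the separator attribute is an XSLT 2.0 feature.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    <summary>
      <!-- Element node: concatenation of all its descendant text nodes -->
      <full><xsl:value-of select="name"/></full>
      <!-- Attribute node: its value -->
      <age><xsl:value-of select="@age"/></age>
      <!-- XSLT 2.0: several selected nodes, joined with an explicit separator -->
      <parts><xsl:value-of select="name/*" separator=", "/></parts>
    </summary>
  </xsl:template>
</xsl:stylesheet>
```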

90

XSLT elements: generating instructions

  • literal result elements: a simple way to create new nodes, e.g., in

    <xsl:template match="person">
      <Employee>
        <xsl:apply-templates/>
      </Employee>
    </xsl:template>

  • <xsl:text>: to produce pure text (and raise an error if elements are produced), e.g., in

    <xsl:template match="person">
      <xsl:text> Person found! </xsl:text>
    </xsl:template>

  • <xsl:element name="qname">: to create a new element called qname in the result tree, whose content is given by the child nodes of that instruction, e.g., in

    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>

    handy for producing elements with attributes and namespaces

91

XSLT elements: generating instructions

  • <xsl:attribute> to produce an attribute, e.g., in

    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:attribute name="alter">
          <xsl:value-of select="@age"/>
        </xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>

  • (already seen) <xsl:value-of select=expression/> returns, for each node selected through expression, the string value that corresponds to that node, where the string value of a
    – text node is its text
    – attribute node is its value
    – element or root node is the concatenation of the string values of all its descendant text nodes
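As a side note to the xsl:attribute example, the same attribute can also be written with an attribute value template on a literal result element. This is only a sketch of the shorthand, reusing the Employee and alter names from the slide; unlike xsl:attribute wrapped in a condition, it always creates the attribute, even when @age is absent (the value is then empty).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    <!-- The braces mark an attribute value template:
         alter receives the string value of @age. -->
    <Employee alter="{@age}">
      <xsl:apply-templates/>
    </Employee>
  </xsl:template>
</xsl:stylesheet>
```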

92

XSLT elements: generating instructions

  • <xsl:copy-of select=expression> copies the nodes selected through expression into the result tree. It can be used to reuse fragments of the source document. Careful:
    – <xsl:value-of> converts fragments into a string before copying it into the result tree
    – <xsl:copy-of> copies the complete fragment based on the (required) select attribute, without first converting the fragment into a string
    – e.g.,

      <xsl:template match="people">
        <family><xsl:copy-of select="*"/></family>
      </xsl:template>

  • <xsl:copy use-attribute-sets="..."> simply copies the current node and then applies the template (in case it contains a template as child nodes)
    – namespaces are included automatically in the copy
    – attributes are not automatically included; they can be included via the “use-attribute-sets” attribute
    – e.g.,

      <xsl:template match="people">
        <family>
          <xsl:for-each select="person">
            <xsl:copy>
              <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
          </xsl:for-each>
        </family>
      </xsl:template>

  • <xsl:number> can be used to generate running numbers (beyond this class)
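The difference between value-of and copy-of is easiest to see when both are applied to the same node. In this sketch the wrapper elements result, as-string and deep are hypothetical; as-string should end up with text only, while deep should contain a full copy of the name element including its children.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    <result>
      <!-- value-of: the name sub-tree is flattened into a string -->
      <as-string><xsl:value-of select="name"/></as-string>
      <!-- copy-of: the name element is copied with its complete sub-tree -->
      <deep><xsl:copy-of select="name"/></deep>
    </result>
  </xsl:template>
</xsl:stylesheet>
```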


93

XSLT elements: conditional processing

  • <xsl:if test="bool-exp"> does the obvious: if bool-exp evaluates to true, its child nodes are processed, otherwise nothing. E.g., in

    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:if test="@age &gt; 0">
          <xsl:attribute name="alter">
            <xsl:value-of select="@age"/>
          </xsl:attribute>
        </xsl:if>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>

  • <xsl:choose> in combination with <xsl:when test="bool-exp"> and <xsl:otherwise> provides a useful means for case distinctions, e.g., in

    <xsl:choose>
      <xsl:when test="price &gt; 10">
        <td bgcolor="#ff00ff"><xsl:value-of select="artist"/></td>
      </xsl:when>
      <xsl:otherwise>
        <td><xsl:value-of select="artist"/></td>
      </xsl:otherwise>
    </xsl:choose>
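xsl:choose may contain several xsl:when branches; the first one whose test is true is used, and xsl:otherwise is the fallback. A minimal sketch, with hypothetical result elements minor, senior and adult and hypothetical age thresholds:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="person">
    <xsl:choose>
      <!-- Branches are tested in order; only the first true one is processed -->
      <xsl:when test="@age &lt; 18"><minor/></xsl:when>
      <xsl:when test="@age &gt;= 65"><senior/></xsl:when>
      <!-- Used when no xsl:when test is true -->
      <xsl:otherwise><adult/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The less-than sign must be escaped as &lt; inside the test attribute, since a literal one is not allowed in XML attribute values.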

94

XSLT elements: conditional processing, etc.

  • <xsl:for-each select="expression"> can be used to process all nodes selected through expression, e.g., in

    <xsl:template match="/people">
      <table cellspacing="3" cellpadding="3" width="450" border="1">
        <tbody>
          <tr><td>First Name</td><td>Last Name</td><td>Age</td></tr>
          <xsl:for-each select="person">
            <xsl:sort select="name/last"/>
            <tr>
              <td><xsl:value-of select="name/first"/></td>
              <td><xsl:value-of select="name/last"/></td>
              <td><xsl:value-of select="@age"/></td>
            </tr>
          </xsl:for-each>
        </tbody>
      </table>
    </xsl:template>
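By default xsl:sort compares its keys as text. To order by a numeric key such as the age, the data-type attribute can be set, as in this sketch (the ul/li output is a hypothetical choice):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/people">
    <ul>
      <xsl:for-each select="person">
        <!-- Compare ages as numbers, oldest person first -->
        <xsl:sort select="@age" data-type="number" order="descending"/>
        <li>
          <xsl:value-of select="name/last"/>
          <xsl:text>: </xsl:text>
          <xsl:value-of select="@age"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```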

95

XSLT…

  • many more things are provided by XSLT
  • you are cordially invited to
    – find out more about them
    – experiment with schema awareness
      • see nice features and complications
    – experiment with namespaces
    – (and with SA and namespaces)
    – gain your own experience using <oXygen/>
    – have a look, e.g., at the influence of template rules’ order on the result!
    – think about how one could compare XSLT and XQuery
      • their (dis)advantages
      • when would you use/recommend which?
      • do we need both?