COMP60411 Semi-structured Data and the Web Validating Trees against Tree Grammars The Essence of XML and Errors
Bijan Parsia and Uli Sattler
University of Manchester
Saturday, 29 October 2011
Last week...
...we have designed our first “schema validator” algorithm
– walking the DOM tree in a depth-first, left-to-right way, or
– using a SAX parser to do it in a streaming fashion
– to keep track of
written on the way down, checked on the way up
with?
ValAlgo: input Tree T and Grammar G; output “yes” if T ∈ L(G), “no” otherwise
local ⇒ unique!
...we expand the algorithm
– this automatically gives us a validator for the structural aspects of WXS
– will be rather straightforward
– this automatically gives us a validator for Relax NG schemas
– will be more tricky: we’ll still use stacks to keep track of
written on the way down, checked on the way up
with?
– walking the DOM tree in a depth-first, left-to-right way, or
– using a SAX parser to do it in a streaming fashion
– which is quite impressive/surprising for general/Relax NG
ValAlgo: input Tree T and Grammar G; output “yes” if T ∈ L(G), “no” otherwise
ValAlgo: input XML doc/Tree T and single-type Grammar G; output “yes” if T ∈ L(G), “no” otherwise
See the paper by Murata, Lee, Mani, Kawaguchi

Input: DOM Tree for T, single-type tree grammar G = (N, Σ, S, P),
  NT is a stack of strings of non-terminals
  R is a stack of production rules
Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
  if there is a production rule N → a e in P with a = E’s tag name
     and (E is root and N in S, or N occurs in RHS of topmost rule in R)
  then push N → a e onto R   (store rule for E’s content in R)
       and push ϵ onto NT    (start remembering E’s child nodes)
  else report “not accepted” and stop
When an element E is visited on the way up,
  pop a rule N → a e out of R                 (retrieve rule for E’s content in R)
  pop a string of non-terminals w out of NT   (retrieve E’s child nodes)
  if w matches e
  then pop a string w’ of non-terminals out of NT
       and push w’N onto NT   (add E’s non-terminal to those of its predecessor siblings)
  else report “not accepted” and stop
report “accepted” and stop
single-type ⇒ unique rule!
nothing changed
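The single-type algorithm above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the rule encoding (a tuple of non-terminal, tag name, and a Python regular expression over one-letter non-terminals), the names `RULES`, `START`, `non_terminals` and `validate_single_type`, and the use of `ElementTree.iterparse` as the streaming parser are all choices made for illustration. The grammar is the single-type example with rules for S, B, C and D.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding (not from the slides): a rule N → a e is a tuple
# (N, a, e), where e is a Python regex over one-letter non-terminals.
RULES = [
    ("S", "a", "BB*D"),  # S → a B,B*,D
    ("B", "b", "CC|C"),  # B → b (C,C)|C
    ("C", "c", "C?"),    # C → c ϵ|C
    ("D", "c", "CCC"),   # D → c C,C,C
]
START = {"S"}


def non_terminals(content_model):
    """Non-terminals occurring in a content model (assumes one-letter names)."""
    return {ch for ch in content_model if ch.isalpha()}


def validate_single_type(xml_text, rules, start):
    R = []   # stack of production rules, one per open element
    NT = []  # stack of strings of non-terminals (children seen so far)
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":  # way down
            # Root: N must be a start symbol; otherwise N must occur in the
            # RHS of the topmost rule in R (the parent's rule).
            allowed = start if not R else non_terminals(R[-1][2])
            candidates = [r for r in rules
                          if r[1] == elem.tag and r[0] in allowed]
            if len(candidates) > 1:
                raise ValueError("grammar is not single-type")
            if not candidates:
                return False  # "not accepted"
            R.append(candidates[0])
            NT.append("")  # ϵ: no children seen yet
        else:  # way up
            name, _tag, model = R.pop()
            children = NT.pop()
            if re.fullmatch(model, children) is None:
                return False  # "not accepted"
            if NT:
                NT[-1] += name  # append N to the parent's string
    return True  # "accepted"
```

For instance, `validate_single_type("<a><b><c/></b><c><c/><c/><c/></c></a>", RULES, START)` picks D for the second child of the root because only B and D occur in the rule chosen for the root.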
[Figure: example tree with nodes labelled a, b, c]
ValAlgo: input XML doc/Tree T and single-type Grammar G; output “yes” if T ∈ L(G), “no” otherwise
– G = ({S,B,C,D},{a,b,c},{S},P) with P = { S → a B,B*,D, B → b (C,C)|C, C → c ϵ|C, D → c C,C,C }
– ...in order to know which production rule N → c ... to choose for nodes labelled c, I need to check rule for predecessor and ensure that N
– walk the DOM tree in a depth-first, left-to-right way, or
– use a SAX parser and do it in a streaming fashion
for
– general tree grammars
– and, as long as we have some, everything is fine..
– which means we need some more stacks for keeping track...
ValAlgo: input XML doc/Tree T and any tree Grammar G; output “yes” if T ∈ L(G), “no” otherwise
ValAlgo: input XML doc/Tree T and any tree Grammar G; output “yes” if T ∈ L(G), “no” otherwise
we don’t know which rule to use!

Input: DOM Tree for T, a tree grammar G = (N, Σ, S, P),
  NT is a stack of strings of sets of non-terminals
  R is a stack of sets of production rules
  NS is a stack of sets of non-terminals, init with S
Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
  set RS to the set of production rules N → a e in P with a = E’s tag name
     and (N occurs in topmost set of NS)
  if RS is non-empty
  then push RS onto R, push ϵ onto NT,
       push set of all non-terminals occurring in RHS of a rule in RS onto NS
       (store non-terminals from RHS of possibly applicable rules)
  else report “not accepted” and stop
When an element E is visited on the way up,
  pop a rule set RS = {Ni → a ei | i = 1..k} out of R
  pop a string of sets of non-terminals W1...Wk out of NT
  set W to the set of those Ni such that there is a w1...wk with each wj from Wj that matches ei
  if W is non-empty
  then pop a string V1...Vm of sets of non-terminals out of NT,
       push V1...VmW onto NT, pop NS
  else report “not accepted” and stop
report “accepted” and stop
Example run of ValAlgo on a general tree grammar
– G = ({A,B,C},{a},{A,B,C},P) with P = { ➀ A → a B|ϵ, ➁ B → a (C,C)|B, ➂ C → a (A,A,A)|ϵ }
– T has 7 nodes, all labelled a: the root n1 has one child n2; n2 has two children n3 and n7; n3 has three leaf children n4, n5, n6; n7 is a leaf
– Initially, NS contains {A,B,C}
– Every node is labelled a and the topmost set of NS is always {A,B,C}, so RS = {➀, ➁, ➂} on every way down

Step 1, n1 on the way down: push RS = {➀, ➁, ➂} onto R, push ϵ onto NT, push {A,B,C} onto NS
Step 2, n2 on the way down: push {➀, ➁, ➂} onto R, ϵ onto NT, {A,B,C} onto NS
Step 3, n3 on the way down: push {➀, ➁, ➂} onto R, ϵ onto NT, {A,B,C} onto NS
Step 4, n4 on the way down: push {➀, ➁, ➂} onto R, ϵ onto NT, {A,B,C} onto NS
Step 5, n4 on the way up: pop RS = {➀, ➁, ➂}, pop ϵ = W1...Wk, so W = {A,C}; top of NT becomes {A,C}; pop NS
Step 6, n5 on the way down, then up: ϵ = W1...Wk, W = {A,C}; top of NT becomes {A,C},{A,C}
Step 7, n6 on the way down, then up: ϵ = W1...Wk, W = {A,C}; top of NT becomes {A,C},{A,C},{A,C}
Step 8, n3 on the way up: pop RS = {➀, ➁, ➂}, pop {A,C},{A,C},{A,C} = W1...W3, so W = {C}; top of NT becomes {C}; pop NS
Step 9, n7 on the way down, then up: ϵ = W1...Wk, W = {A,C}; top of NT becomes {C},{A,C}
Step 10, n2 on the way up: pop RS = {➀, ➁, ➂}, pop {C},{A,C} = W1...Wk, so W = {B}; top of NT becomes {B}; pop NS
Step 11, n1 on the way up: pop RS = {➀, ➁, ➂}, pop {B} = W1...Wk, so W = {A,B}; pop NS, which leaves NS = {A,B,C}
Step 12: “accepted”/“yes”, T is accepted by G
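The general algorithm with the three stacks R, NT and NS can be sketched in Python as follows. This is an illustrative sketch under assumptions rather than the lecture's reference implementation: rules are encoded as tuples (non-terminal, tag name, Python regex over one-letter non-terminals), the names `RULES`, `START` and `validate` are made up here, and the check "there is a w1...wk with each wj from Wj that matches ei" is done by brute-force enumeration, which is exponential in the number of children and only meant to mirror the wording of the algorithm. The grammar is the three-rule example over the single tag a; the tree is a seven-node tree that the sketch accepts.

```python
import io
import re
from itertools import product
import xml.etree.ElementTree as ET

# Hypothetical encoding (not from the slides): rule N → a e as (N, a, e),
# where e is a Python regex over one-letter non-terminals.
RULES = [
    ("A", "a", "B?"),      # A → a B|ϵ
    ("B", "a", "CC|B"),    # B → a (C,C)|B
    ("C", "a", "(AAA)?"),  # C → a (A,A,A)|ϵ
]
START = {"A", "B", "C"}


def validate(xml_text, rules, start):
    R = []               # stack of sets (lists) of candidate rules
    NT = []              # stack of strings of sets of non-terminals
    NS = [set(start)]    # stack of sets of non-terminals, init with S
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":  # way down
            RS = [r for r in rules if r[1] == elem.tag and r[0] in NS[-1]]
            if not RS:
                return False  # "not accepted"
            R.append(RS)
            NT.append([])     # ϵ: no children seen yet
            # Non-terminals in the RHS of possibly applicable rules.
            NS.append({ch for (_n, _a, e) in RS for ch in e if ch.isalpha()})
        else:  # way up
            RS = R.pop()
            Ws = NT.pop()
            NS.pop()
            # Keep every Ni whose content model ei matches some choice
            # w1...wk with each wj taken from Wj (brute force).
            W = {n for (n, _a, e) in RS
                 if any(re.fullmatch(e, "".join(w)) for w in product(*Ws))}
            if not W:
                return False  # "not accepted"
            if NT:
                NT[-1].append(W)  # extend the parent's string by the set W
    return True  # "accepted"


TREE = "<a><a><a><a/><a/><a/></a><a/></a></a>"
print(validate(TREE, RULES, START))
```

Unlike the single-type case, no rule is committed to on the way down; the set W computed on the way up records every non-terminal that could still describe the subtree.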
– walk the DOM tree in a depth-first, left-to-right way, or
– use a SAX parser and do it in a streaming fashion
– still only space linear in depth of input tree
– feel free to describe structure in a powerful way!
– we need single-type
– testing structural
– testing type constraints
– describing a handy PSVI
– …
– single-typedness useful for some, but not all purposes!
– locality?
– in CW4, not all valid input documents were really grammars
– checking whether non-terminals are mentioned correctly is beyond XSD’s abilities...we need an even more powerful schema language!
...closely related to validation are
– given a schema/grammar S, does there exist a document/tree d such that d is valid w.r.t. S?
– relevant as a basic consistency test for schemas
– given schemas/grammars S1, S2, is S1 a specialization of S2?
– i.e., is every document that is valid w.r.t. S1 also valid w.r.t. S2?
– relevant to support tasks such as schema refinement:
what I wanted
– also solves schema equivalence: see your coursework!
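The first question above (does any valid document exist?) can be illustrated with a small fixpoint computation. The sketch below is an assumption-laden simplification, not something taken from the slides: content models are given as a finite list of alternative sequences of non-terminals (so a starred item would have to be unfolded by hand, and the empty tuple stands for ϵ), tag names are left out because they do not affect emptiness, and the name `is_nonempty` is hypothetical. A non-terminal is marked productive once one of its alternatives uses only productive non-terminals; the grammar has a valid tree exactly when some start symbol is productive.

```python
def is_nonempty(rules, start):
    """rules: dict mapping each non-terminal to a list of alternatives,
    each alternative being a tuple of non-terminals (empty tuple = ϵ)."""
    productive = set()
    changed = True
    while changed:  # iterate until no new productive non-terminal is found
        changed = False
        for name, alternatives in rules.items():
            if name in productive:
                continue
            if any(all(m in productive for m in alt) for alt in alternatives):
                productive.add(name)
                changed = True
    return bool(productive & set(start))


# A → a B|ϵ, B → a (C,C)|B, C → a (A,A,A)|ϵ (tags omitted)
example = {
    "A": [("B",), ()],
    "B": [("C", "C"), ("B",)],
    "C": [("A", "A", "A"), ()],
}
# X → x X has no finite tree: X always needs another X below it.
looping = {"X": [("X",)]}

print(is_nonempty(example, {"A", "B", "C"}))
print(is_nonempty(looping, {"X"}))
```

The containment question is harder and is not covered by this sketch.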
Or, spill your guts
(Pinkwashing
– Atomic (numbers, booleans, strings*)
– Composite
– Ordered lists with random access
– [1, 2, "one", "two"]
– Associative arrays/dictionary
– {"one":1, "two":2}
– [{"one":1, "o1":{"a1": [1,2,3.0], "a2":[]}}]
– The internal representation varies
*Strings can be thought of as a composite, i.e., an array of characters, but not here.
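As a small illustration of "the internal representation varies", here is how Python's standard `json` module represents the nested example from the list above: arrays become `list`, associative arrays become `dict`, and numbers become `int` or `float`. The variable names are chosen for this sketch only, and other languages will pick other internal types.

```python
import json

# The nested example: an array holding one object, which in turn holds
# a number and another object with two arrays.
text = '[{"one":1, "o1":{"a1": [1,2,3.0], "a2":[]}}]'
doc = json.loads(text)

first = doc[0]            # random access into the ordered list
inner = first["o1"]       # lookup by key in the associative array
numbers = inner["a1"]

print(type(doc).__name__, type(first).__name__)
print([type(n).__name__ for n in numbers])
```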
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>

http://www.json.org/example.html
Slightly different!
{"menu": [{ "id": "file", "value": "File"},
  "popup": [
    "menuitem": {"value": "New", "onclick": "CreateNewDoc()"},
    "menuitem": {"value": "Open", "onclick": "OpenDoc()"},
    "menuitem": {"value": "Close", "onclick": "CloseDoc()"}
  ]
]
}}

<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>

http://www.json.org/example.html
Needed to preserve
Still not right!
{"menu": [{"id": "file", "value": "File"},
  [{"popup": [{},
    [{"menuitem": [{"value": "New", "onclick": "CreateNewDoc()"}, []]},
     {"menuitem": [{"value": "Open", "onclick": "OpenDoc()"}, []]},
     {"menuitem": [{"value": "Close", "onclick": "CloseDoc()"}, []]}
    ]
  ]}]
]}

<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>

http://www.json.org/example.html
– With one pair
– First item is an “object”, the attributes
– Second item is a list (of children)
Cumbersome!
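The pair encoding described above (an object with one key, whose value is a two-item list of attributes and children) can be sketched as a short recursive function. This is one possible reading of the encoding, written with Python's `xml.etree.ElementTree`; the function name `to_pair_encoding` is made up for this sketch, and text content and namespaces are ignored because the menu example has neither.

```python
import json
import xml.etree.ElementTree as ET


def to_pair_encoding(elem):
    """Encode an element as {tag: [attributes, children]}.

    The first item of the pair is an object holding the attributes,
    the second is a list of the encoded child elements, in order.
    Text nodes are ignored in this sketch.
    """
    return {elem.tag: [dict(elem.attrib),
                       [to_pair_encoding(child) for child in elem]]}


MENU_XML = """
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
"""

encoded = to_pair_encoding(ET.fromstring(MENU_XML))
print(json.dumps(encoded, indent=2))
```

Because children are kept in a list, repeated `menuitem` elements and their order survive the translation, which the plain object encoding cannot guarantee.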
JSON object
XML WF DOM
JSON object
WXS PSVI CLICK!
– Roundtripping (both ways) should be exact – Same program should behave the same in similar conditions
– Roundtripping (both ways) should be exact – Same program should behave the same in similar conditions – Interop!
– Roundtripping should be reasonable – Analogous programs should behave analogously
– Weaker notion of interop
– A series of octets – A series of unicode characters – A series of “events”
– A tree structure
– A tree of a certain shape
– An adorned tree of a certain shape
Errors here mean no XML!
SAX ErrorHandler
Yay! XPath! XSLT! Etc.
Types in play
– A series of octets – A series of unicode characters – A series of “events”
– A tree structure
– A tree of a certain shape
– An adorned tree of a certain shape
validate erase
– A series of octets – A series of unicode characters – A series of “events”
– A tree structure
– A tree of a certain shape
– An adorned tree of a certain shape
“Same” inputs can have different “meanings”! (external validation)
– A series of octets – A series of unicode characters – A series of “events”
– A tree structure
– A tree of a certain shape
– An adorned tree of a certain shape
Generally looks like
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE"> <serialization method="xml" /> </configuration>
But can look otherwise!
element configuration { attribute edition {"EE"}, element serialization {attribute method {"xml"}}}
Same “meaning”, different spelling
– A series of octets – A series of unicode characters – A series of “events”
– A tree structure
– A tree of a certain shape
– An adorned tree of a certain shape
– A picture (or document, or action, or…)
Can have many... ..for “the same” meaning
– “XML is touted as an external format for representing data.”
– Self-describing
– Round-tripping
http://bit.ly/essenceOfXML2
– A series of octets – A series of unicode characters – A series of “events”
– A tree structure
– A tree of a certain shape
– An adorned tree of a certain shape
– A picture (or document, or action, or…)
Well-formed
Only one way to parse it
Internal (DTD and doc are one)
External (Schema and doc are separate;
Test.xml: <a> <b/> <b c="bar"/> </a>
sparse.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA #IMPLIED>
full.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA 'foo'>
Validate, serialize, query:
– With full.dtd: Test-full.xml is <a> <b c="foo"/> <b c="bar"/> </a>, so count(//@c) = 2
– With sparse.dtd: Test-sparse.xml is <a> <b/> <b c="bar"/> </a>, so count(//@c) = 1
Can we think of Test-sparse and Test-full as “the same”?
Note: In oXygen, one needs to use internal validation.
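The effect of the two DTDs can be reproduced with an internal DTD subset. This is a sketch under assumptions: it relies on the expat parser behind Python's xml.etree.ElementTree, which does not validate but is expected to apply attribute defaults declared in the internal subset. The helper count_c is made up for illustration and stands in for count(//@c).

```python
import xml.etree.ElementTree as ET

BODY = '<a> <b/> <b c="bar"/> </a>'
SPARSE = "<!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA #IMPLIED>"
FULL = "<!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA 'foo'>"

def count_c(dtd):
    """Count c attributes after parsing BODY with the given internal DTD (hypothetical helper)."""
    doc = "<!DOCTYPE a [" + dtd + "]>" + BODY
    root = ET.fromstring(doc)
    return sum(1 for element in root.iter() if "c" in element.attrib)

sparse_count = count_c(SPARSE)
full_count = count_c(FULL)
print("sparse.dtd:", sparse_count, "full.dtd:", full_count)
```

The document bytes after the DOCTYPE are identical in both runs; only the declarations differ, yet the query answers differ.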
– The PSVIs have different information in them!
<a> <b/> <b/> </a> Test.xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"> <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="b"/> </xs:schema>
bare.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a" type="atype"/> <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="b" type="btype"/> <xs:complexType name="btype"/> </xs:schema>
typed.xsd
count(//b) = 2 count(//b) = 2
Validate Query
Note: In oXygen, one needs to use internal validation. Note: WXS can do default attributes as well.
<a> <b/> <b/> </a> Test.xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"> <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="b"/> </xs:schema>
bare.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a" type="atype"/> <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="b" type="btype"/> <xs:complexType name="btype"/> </xs:schema>
typed.xsd
count(//b) = 2 count(//b) = 2
Validate Query2
Note: In oXygen, one needs to use internal validation. Note: WXS can do default attributes and elements as well.
count(//element(*,btype)) = ? count(//element(*,btype)) = 2
<a> <b/> <b/> </a> Test.xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"> <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="b"/> </xs:schema>
bare.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a" type="atype"/> <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="b" type="btype"/> <xs:complexType name="btype"/> </xs:schema>
typed.xsd
count(//b) = 2 count(//b) = 2
<a> <b/> <b /> </a> Test.xml
Validate Serialize Query2
Note: In oXygen, one needs to use internal validation. Note: WXS can do default attributes as well.
count(//element(*,btype)) = ? count(//element(*,btype)) = 2
Does external through internal succeed? Does internal through external succeed?
– Internal to external and back
– <foo>one 2 3</foo>
– Content is {“one”, 2, “3”} – Key type info LOST » Silently » With only 1 schema
– External to internal and back
– Whitespace and layout
http://bit.ly/essenceOfXML2
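The silent loss in the internal-to-external-and-back direction can be imitated in a few lines. This is only a sketch: it assumes the content of foo is typed as a list of (integer or string) and that the reader tries integer first; the function names are made up, and no schema processor is involved.

```python
def serialize(items):
    """Write an internal list value as space-separated element content."""
    return " ".join(str(item) for item in items)

def parse_union_list(text):
    """Read a list of (integer | string), trying integer first (assumed member order)."""
    result = []
    for token in text.split():
        try:
            result.append(int(token))
        except ValueError:
            result.append(token)
    return result

original = ["one", 2, "3"]
external = serialize(original)            # the content of <foo>
roundtripped = parse_union_list(external)
print(external, roundtripped, roundtripped == original)
```

The string "3" comes back as the integer 3, and nothing reports the change: one schema, no error, different value.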
– “So the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.”
– That the issues are serious (enough) – That the problem solved is all that easy – That there arenʼt other, worse issues
http://bit.ly/essenceOfXML2
Or, so wrong it’s right
– Authoring, aggregating, querying…
– Perhaps the Atom DOM
– To the representative one – Or build software that mediates the difference
– Or make them – The nice thing about standards is that there are so many of them to choose from.
– Many DOMs, all expressing the same thing – Many surface syntaxes (perhaps) for each DOM
– What should we send?
– Minimal standards?
Be liberal in what you accept, and conservative in what you send.
– A very common application layer is “rendering”
<sentence style="slanted">This sentence is false.</sentence>
This sentence is false. (Correct rendering: slanted)
This sentence is false. (Fallback!)
(Still see this in XSLT!)
– The "look" may need updating
– What works for 21" screens doesn't for mobile phones
– (content should be perceivable by everyone)
– Strong separation of presentation
– not an XML/angle brackets format
– annotative, not transformative
– mostly “formats” nodes – ubiquitous on the Web, esp. client side – works with arbitrary XML
– Which consist of
– Like XPath expressions – But only forward, with some syntactic sugar
– Sets of property/value pairs
div.title { text-align: center; font-size: 24px; }
<html><head><title>A bit of style</title></head>
<body><style type="text/css">
.title { font-weight: bold }
div.title { text-align: center; font-size: 24px; }
div.entry div.title { text-align: left; font-variant: normal }
span.date { font-style: italic }
span.date:after { content: " by" }
div.content { font-style: italic }
div.content i { font-style: normal; font-weight: bold }
#one { color: red }
</style>
<div class="title">My Weblog</div>
<div class="entry">
<div class="title">What I Did Today</div>
<div class="byline"> <span class="date">Feb. 09, 2009</span> <span class="author">Bijan Parsia</span> </div>
<div class="content" id="one"> <p>Taught a class and it went <i>very</i> well.</p> </div>
</div>
</body></html>
Try it in http://software.hixie.ch/utilities/js/live-dom-viewer/
– Screen, Print, Braille, Aural…
@media print { BODY { font-size: 10pt } } @media screen { BODY { font-size: 12pt } }
Larger font size for screen
– That is, there is overriding (and non-overriding) inheritance
– http://www.w3.org/TR/CSS21/cascade.html#cascade
– Distance to the node is significant – Precision of selectors is significant – Order of appearance is significant
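How "precision of selectors" and "order of appearance" interact can be sketched as follows. This is a simplification, not the CSS 2.1 cascade: it assumes all rules already match the node and come from one author stylesheet, it handles only type, class, and id selectors, and it does not model inheritance (the "distance to the node" part). The names specificity and winning_value are made up for illustration.

```python
import re

def specificity(selector):
    """Return (ids, classes, elements) for a simple selector such as 'div.entry div.title'."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    elements = len(re.findall(r"(?:^|\s)[a-zA-Z][\w-]*", selector))
    return (ids, classes, elements)

def winning_value(matching_rules, prop):
    """Pick the value for prop: higher specificity wins, later rule breaks ties."""
    best = None
    for order, (selector, declarations) in enumerate(matching_rules):
        if prop in declarations:
            key = (specificity(selector), order)
            if best is None or key > best[0]:
                best = (key, declarations[prop])
    return None if best is None else best[1]

# Rules that match the inner <div class="title"> of the weblog example.
rules = [
    (".title", {"font-weight": "bold"}),
    ("div.title", {"text-align": "center", "font-size": "24px"}),
    ("div.entry div.title", {"text-align": "left", "font-variant": "normal"}),
]
align = winning_value(rules, "text-align")
weight = winning_value(rules, "font-weight")
print(align, weight)
```

The entry title ends up left-aligned because the more specific selector wins, while it keeps the bold weight that only the least specific rule sets.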
– Well formedness error…BOOM
– “Rules for handling parsing errors”
http://www.w3.org/TR/CSS21/syndata.html#parsing-errors
– E.g.,“User agents must ignore a declaration with an unknown property.”
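The "ignore a declaration with an unknown property" rule can be illustrated with a toy declaration-block reader. This is a sketch only: KNOWN_PROPERTIES is a small made-up subset, values are not checked, and real CSS tokenization (strings, comments, nested blocks) is not handled.

```python
# Hypothetical subset of properties this toy user agent understands.
KNOWN_PROPERTIES = {"color", "font-size", "font-style", "font-weight", "text-align"}

def parse_declarations(block):
    """Keep well-formed declarations with known properties; silently drop the rest."""
    kept = {}
    for declaration in block.split(";"):
        if ":" not in declaration:
            continue  # malformed declaration: skip it, keep going
        name, value = declaration.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name in KNOWN_PROPERTIES and value:
            kept[name] = value
    return kept

styles = parse_declarations("color: red; colour: blue; font-size: 12pt; text-align")
print(styles)
```

The contrast with XML is the point: the misspelled colour costs one declaration, not the whole stylesheet.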
– Cascading & Inheritance help with 1, 2, 5
– @media rules help with 3-6 – Error handling helps with 1, 2, 4
– Make errors hard or impossible to make
– Make doing the right thing easy and inevitable – Make detecting errors easy – Make correcting errors easy – Correct errors – Fail silently – Fail randomly – Fail differently (interop problem)
– an available action that is salient to the actor
Donald Norman, The Design of Everyday Things
– with a bad or wrong action – In law, “a hazardous object or condition on the land that is likely to attract children who are unable to appreciate the risk posed by the object or condition” -- ye olde Wikipedia – We can reformulate
misused by (even) an educated user”
– An attractive nuisance is easy to attempt, hard to use (correctly), and has bad (to catastrophic) effects
– Recognize all or none
– Restrictive by default
– Error detection and reporting
– might not be the point where it occurred – might not be the most helpful point to look at!
– Null pointer deref » Is the right point the deref or the setting to null? – Non-crashing errors
– Irregular structure!
– Be strict about the well formedness of what you accept, and strict in what you send – Draconian error handling – Severe consequences on the Web
– Validity and other analysis? – Most schema languages poor at error reporting
– fatal error [Definition: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).]
– To or for its users
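What draconian handling looks like from the application side can be seen with Python's standard parser, used here only as one example of a conforming non-validating processor. The helper try_parse and the broken sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

def try_parse(text):
    """Return (tree, None) on success or (None, message) on a well-formedness error."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as error:
        return None, str(error)

tree, problem = try_parse("<p>Hello to you!</p>")
broken_tree, broken_problem = try_parse("<body><p>Can you spot the problem?</body>")
print(problem, "|", broken_problem)
```

For the second input the application receives an error report and no tree at all; there is no partial result to render.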
<a> <b/> <b/> <b/> </a> valid.xml <!ELEMENT a (b)+> <!ELEMENT b EMPTY> simple.dtd <a> <b/> <b>Foo</b> <b><b/></b> </a> invalid.xml
count(//b) count(//b/*) count(//b/text()) =3 =3 =0 =1 =0 =1
<a> <b/> <b>Foo</b> </a>
=0
<a> <b/> <b><b/></b> </a>
=0
<a> <b/> <b/> <b/> </a> valid.xml <!ELEMENT a (b)+> <!ELEMENT b EMPTY> simple.dtd <a> <b/> <b>Foo</b> <b><b/></b> </a> invalid.xml
=0 =2
<a> <b/> <b>Foo</b> </a>
=1
<a> <b/> <b><b/></b> </a>
=1
<a> <b/> <b/> <b/> </a> valid.xml <!ELEMENT a (b)+> <!ELEMENT b EMPTY> simple.dtd <a> <b/> <b>Foo</b> <b><b/></b> </a> invalid.xml
=valid =invalid
<a> <b/> <b>Foo</b> </a> <a> <b/> <b><b/></b> </a>
Can even “find” the errors!
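The idea of validating by querying can be tried out directly: for simple.dtd, b must be EMPTY, so both count(//b/*) and count(//b/text()) should be 0. The sketch below approximates those two XPath counts with ElementTree, which does not support them natively; the helper names are made up, and only the "b EMPTY" part of the DTD is checked.

```python
import xml.etree.ElementTree as ET

def text_nodes(element):
    """Text nodes that are direct children of element (its text plus the tails of its children)."""
    candidates = [element.text] + [child.tail for child in element]
    return [text for text in candidates if text]

def b_counts(xml_text):
    """Approximate (count(//b/*), count(//b/text())) for a document whose root is a."""
    root = ET.fromstring(xml_text)
    b_elements = root.findall(".//b")
    child_count = sum(len(b) for b in b_elements)
    text_count = sum(len(text_nodes(b)) for b in b_elements)
    return child_count, text_count

valid_counts = b_counts("<a> <b/> <b/> <b/> </a>")
invalid_counts = b_counts("<a> <b/> <b>Foo</b> <b><b/></b> </a>")
print(valid_counts, invalid_counts)
```

Because each query returns the offending nodes rather than a bare yes or no, replacing the counts with the node lists is enough to point at the errors.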
– Validate parts of a document – A la wildcards
– Far-reaching dependencies – Computations
– With XQuery and XSLT – But still a leetle declarative
The essence of Schematron
– Not grammar or object/type based – Rule based – Test oriented – Complementary
– Patterns contain rules
– Tests, which are XPath expressions, and – Assertions, which are natural language descriptions
– (Ok, could handle this with Keys in XML Schema!)
<rule context="element">
<let name="n" value="name/text()"/>
<assert test="count(//element/name[text()=$n]) = 1"> There can be only one element declaration with a given name. </assert>
</rule>
declaration ”
<rule context="elementref">
<let name="r" value="ref/text()"/>
<assert test="count(//element/name[text()=$r]) = 1"> There must be an element declaration (with the right name) for elementref to refer to. </assert>
</rule>
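The two rules amount to "for every context node, evaluate a test and report the assertion text if it fails". The sketch below hand-codes that loop in Python rather than running a Schematron processor. It assumes a made-up document shape in which each element has a name child and each elementref has a ref child; the function check_rules and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_rules(doc_text):
    """Return (context, value) for every context node whose assertion fails."""
    root = ET.fromstring(doc_text)
    declared = [name.text for name in root.findall(".//element/name")]
    failures = []
    for element in root.iter("element"):          # rule context="element"
        n = element.findtext("name")
        if declared.count(n) != 1:                # exactly one declaration per name
            failures.append(("element", n))
    for reference in root.iter("elementref"):     # rule context="elementref"
        r = reference.findtext("ref")
        if declared.count(r) != 1:                # the reference must resolve
            failures.append(("elementref", r))
    return failures

SAMPLE = """<schema>
  <element><name>a</name></element>
  <element><name>b</name></element>
  <elementref><ref>b</ref></elementref>
  <elementref><ref>c</ref></elementref>
</schema>"""

failures = check_rules(SAMPLE)
print(failures)
```

Both checks depend on nodes arbitrarily far from the context node, which is exactly what a grammar-based schema cannot express.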
–Relax NG schema –Schematron assertions –Custom code
–To break circles:
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
<ns prefix="h" uri="http://www.w3.org/1999/xhtml"/>
<pattern name='dfn cannot nest'>
<rule context="h:dfn">
<report test="ancestor::h:dfn"> The "dfn" element cannot contain any nested "dfn" elements.</report>
</rule>
</pattern>
<pattern name='noscript cannot nest'>
<rule context="h:noscript">
<report test="ancestor::h:noscript"> The "noscript" element cannot contain any nested "noscript" elements.</report>
</rule>
</pattern>
</schema>
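The report test ancestor::h:dfn asks, for each dfn, whether some enclosing element is also a dfn. As a rough illustration of what a processor has to compute, here is a hand-written walk over an ElementTree; it is a sketch, the function nested and the sample fragment are made up, and a real Schematron implementation would evaluate the XPath instead.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def nested(root, local_name):
    """Return the XHTML elements called local_name that have an ancestor of the same name."""
    tag = XHTML + local_name
    hits = []

    def walk(element, inside):
        is_target = element.tag == tag
        if is_target and inside:
            hits.append(element)      # the report would fire for this node
        for child in element:
            walk(child, inside or is_target)

    walk(root, False)
    return hits

SAMPLE = ('<p xmlns="http://www.w3.org/1999/xhtml">'
          '<dfn>outer <em><dfn>inner</dfn></em></dfn> <dfn>fine</dfn></p>')

offenders = nested(ET.fromstring(SAMPLE), "dfn")
print([element.text for element in offenders])
```

The nesting is caught at any depth, here through an intervening em, which a content-model rule on dfn alone would miss.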
–Using XPath functions and variables
–Can pull stuff from other file
–diagnostics has (value-ofed) expressions –“Generate paths” to errors
–Thin shim over XSLT –Closer to “arbitrary code”
– Schematron doesn’t care – Two phase validation
– Plus variables!
– Unlike all the other schema languages! – We’re not performing runs
– Somewhat easy to use
– What about analysis?
–As do all XML schema languages
–So can’t help with e.g., overlapping tags
–At least, in the default case
–Unlike CSS
–Or rather, does it support enough liberality?
03. <title>Hello!</title> 04. <meta http-equiv="Content-Type" content="application/xhtml+xml" />
07. <p>Hello to you!</p> 08. <p>Can you spot the problem?
91 Slide due to Iain Flynn
92 Slide due to Iain Flynn
– 1%-5% of web pages are valid – Validation is very weak! – All sorts of breakage
– 10% feeds not well-formed – Where do the problems come from?
In 2005, the developers of Google Reader (Google’s RSS and Atom feed parser) took a snapshot of the XML documents they parsed in one day.
least one well-formedness error.
– That’s a lot of broken documents
Source: http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html Slide due to Iain Flynn
[Chart of error types: Encoding, Structure, Entity, Typo]
Slide due to Iain Flynn
[Chart of error types with percentages; labels and values did not survive extraction]
Slide due to Iain Flynn
– “All of its default templates were valid XHTML.” – “It incorporated a nifty layout editor to ensure that you couldn’t introduce any invalid XHTML...”
– “the page that you...validly authored is now not well-formed”
– “...your publishing tool had a bug” – “The administration page itself tries to display the trackbacks you’ve received, and you get an XML processing error.”
http://diveintomark.org/archives/2004/01/14/thought_experiment
– Complex ones! – Many players; many sorts of player – Lots of historical specifics – Lots of interaction effects
– What do people do (and why?) – How to influence them? – Affordances and incentives – Dealing with “bozos”
“Anyone who can’t make a syndication feed that’s well-formed XML is an incompetent fool.”
– Fail hard and fast
– CSS, DTD ATTLISTs, HTML
– HTML, HTML5
– The key is to fail correctly
– With the right message!
Every set of bytes has a corresponding (determinate) DOM
Or, goodbyes and farewells
– flexibility and stability – flexibility and efficiency – expressivity and efficiency – usability and flexibility – usability and rigidity – etc. etc. etc.
– understand trade-offs – cultivate judgement
– there is no silver bullet
– CW5 is “easy” :)
– M5 is a bit of Schematron
– Due MONDAY, NOV 7TH!!! – At 9:00AM
– Due after period 2 – So as not to conflict – Practice some Java!
– Basically, an extended version of Qs and SEs
– After break
– For revision
things you’ve learned; see me if you’re interested