
COMP60411 Semi-structured Data and the Web Validating Trees against

COMP60411 Semi-structured Data and the Web Validating Trees against Tree Grammars The Essence of XML and Errors Bijan Parsia and Uli Sattler University of Manchester 1 Saturday, 29 October 2011 1 Last week... ...we have designed our first


  1. ValAlgo: given an XML doc/tree T and any tree grammar G, answer “yes” if T ∈ L(G) and “no” otherwise
     • Let’s see how the algorithm works:
       – G = ({A,B,C}, {a}, {A,B,C}, P) with
         P = { ➀ A → a (B | ϵ),
               ➁ B → a ((C,C) | B),
               ➂ C → a ((A,A,A) | ϵ) }
       – [Figure: input tree T with seven a nodes, drawn in four rows: a / a / a a / a a a]
     • When an element E is visited on the way down,
       – set RS to the set of production rules N → a e in P with a = E’s tag name and N occurring in the topmost set of NS
       – if RS is non-empty
         then push RS onto R, push ϵ onto NT, push the set of all non-terminals occurring in the RHS of a rule in RS onto NS
         else report “not accepted” and stop
     • When an element E is visited on the way up,
       – pop a rule set RS = { N_i → a e_i | i = 1..k } out of R
       – pop a string of sets of non-terminals W_1 ... W_k out of NT
       – set W to the set of those N_i such that there is a w_1 ... w_k with each w_j from W_j that matches e_i
       – if W is non-empty
         then pop a string V_1 ... V_m of non-terminals out of NT, push V_1 ... V_m W onto NT, pop NS
         else report “not accepted” and stop
     • report “accepted” and stop
     • Stacks R, NT, NS at this step (way down): RS = {➀,➁,➂} is pushed onto R, ϵ onto NT, {A,B,C} onto NS; R now holds two rule sets, each paired with ϵ on NT

  2. ValAlgo trace, continued (grammar G and algorithm as on the previous slide)
     • Way down: RS = {➀,➁,➂}; RS is pushed onto R, ϵ onto NT, {A,B,C} onto NS
     • R now holds three rule sets, each paired with ϵ on NT

  3. ValAlgo trace, continued
     • Way down: RS = {➀,➁,➂}; RS is pushed onto R, ϵ onto NT, {A,B,C} onto NS
     • R now holds four rule sets, each paired with ϵ on NT

  4. ValAlgo trace, continued
     • Way up from a leaf: pop RS = {➀,➁,➂} out of R and ϵ = W_1 ... W_k out of NT
     • W = {A,C}, since only ➀ and ➂ allow empty content
     • Three rule sets remain on R, each paired with ϵ on NT

  5. ValAlgo trace, continued
     • W = {A,C} is appended to the string on top of NT; NS is popped
     • NT, top first: {A,C} / ϵ / ϵ

  6. ValAlgo trace, continued
     • Way down: RS = {➀,➁,➂}; RS is pushed onto R, ϵ onto NT, {A,B,C} onto NS
     • NT, top first: ϵ / {A,C} / ϵ / ϵ

  7. ValAlgo trace, continued
     • Way up from a leaf: pop RS = {➀,➁,➂} out of R and ϵ = W_1 ... W_k out of NT
     • W = {A,C}
     • NT, top first: {A,C} / ϵ / ϵ

  8. ValAlgo trace, continued
     • W = {A,C} is appended to the string on top of NT; NS is popped
     • NT, top first: {A,C},{A,C} / ϵ / ϵ

  9. ValAlgo trace, continued
     • Way down: RS = {➀,➁,➂}; RS is pushed onto R, ϵ onto NT, {A,B,C} onto NS
     • NT, top first: ϵ / {A,C},{A,C} / ϵ / ϵ

  10. ValAlgo trace, continued
      • Way up from a leaf: pop RS = {➀,➁,➂} out of R and ϵ = W_1 ... W_k out of NT
      • W = {A,C}
      • NT, top first: {A,C},{A,C} / ϵ / ϵ

  11. ValAlgo trace, continued
      • W = {A,C} is appended to the string on top of NT; NS is popped
      • NT, top first: {A,C},{A,C},{A,C} / ϵ / ϵ

  12. ValAlgo trace, continued
      • Way up: pop RS = {➀,➁,➂} out of R and W_1 ... W_3 = {A,C},{A,C},{A,C} out of NT
      • W = {C}, since only ➂ matches a string of three children, via (A,A,A)
      • NT, top first: ϵ / ϵ

  13. ValAlgo trace, continued
      • W = {C} is appended to the string on top of NT; NS is popped
      • NT, top first: {C} / ϵ

  14. ValAlgo trace, continued
      • Way down: RS = {➀,➁,➂}; RS is pushed onto R, ϵ onto NT, {A,B,C} onto NS
      • NT, top first: ϵ / {C} / ϵ

  15. ValAlgo trace, continued
      • Way up from a leaf: pop RS = {➀,➁,➂} out of R and ϵ = W_1 ... W_k out of NT
      • W = {A,C}
      • NT, top first: {C} / ϵ

  16. ValAlgo trace, continued
      • W = {A,C} is appended to the string on top of NT; NS is popped
      • NT, top first: {C},{A,C} / ϵ

  17. ValAlgo trace, continued
      • Way up: pop RS = {➀,➁,➂} out of R and W_1 ... W_k = {C},{A,C} out of NT
      • W = {B}, since only ➁ matches, via (C,C)
      • NT, top first: ϵ

  18. ValAlgo trace, continued
      • W = {B} is appended to the string on top of NT; NS is popped
      • One rule set {➀,➁,➂} remains on R, paired with {B} on NT

  19. ValAlgo trace, continued
      • Way up: pop RS = {➀,➁,➂} out of R and W_1 ... W_k = {B} out of NT
      • W = {A,B}, since ➀ and ➁ both allow a single B child
      • R is now empty

  20. ValAlgo trace, finished
      • “accepted”/“yes”: T is accepted by G
      • Only the start set {A,B,C} remains on NS

  21. ValAlgo: given an XML doc/tree T and any tree grammar G, answer “yes” if T ∈ L(G) and “no” otherwise
      • Implementing this algorithm? Again, as for single-type tree grammars,
        – walk the DOM tree in a depth-first, left-to-right way, or
        – use a SAX parser and do it in a streaming fashion
      • Insights gained?
      • Validating general tree grammars
        – does not require guessing & backtracking
        – can be implemented in a streaming way
        – is a bit more tricky than validating single-type grammars,
        – but not really more complex (in terms of time/space)
          • still only space linear in the depth of the input tree
      • So, for validation purposes, the restriction to single-type grammars is not necessary
        – feel free to describe structure in a powerful way!
      • But, for uniqueness of the PSVI,
        – we need single-type
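The streaming variant mentioned above can be sketched with Python's standard SAX parser. This is a minimal sketch, not the course's reference implementation: the names `RULES`, `ValAlgo`, `NotAccepted` and `validate` are made up for illustration, content models are encoded as ordinary regular expressions over one-letter non-terminal names (an assumption), and the matching step simply enumerates all words w_1...w_k, which is exponential in the number of children and only acceptable for toy inputs.

```python
import itertools
import re
import xml.sax

# Grammar G from the trace: non-terminal -> (tag name, content model).
# Assumption: content models are regexes over one-letter non-terminal names.
RULES = {
    "A": ("a", r"B?"),     # rule 1: A -> a (B | eps)
    "B": ("a", r"CC|B"),   # rule 2: B -> a ((C,C) | B)
    "C": ("a", r"AAA|"),   # rule 3: C -> a ((A,A,A) | eps)
}
START = {"A", "B", "C"}


class NotAccepted(Exception):
    """Raised as soon as no rule can apply to the current element."""


class ValAlgo(xml.sax.ContentHandler):
    def __init__(self, rules, start):
        super().__init__()
        self.rules = rules
        self.R = []              # stack of rule sets
        self.NT = [[]]           # stack of strings (lists) of sets of non-terminals
        self.NS = [set(start)]   # stack of sets of admissible non-terminals

    def startElement(self, name, attrs):   # "way down"
        rs = {n for n, (tag, _) in self.rules.items()
              if tag == name and n in self.NS[-1]}
        if not rs:
            raise NotAccepted(name)
        self.R.append(rs)
        self.NT.append([])                 # eps: no children seen yet
        self.NS.append({c for n in rs for c in self.rules[n][1] if c in self.rules})

    def endElement(self, name):            # "way up"
        rs = self.R.pop()
        ws = self.NT.pop()
        # Keep N_i if some word w_1...w_k with w_j in W_j matches e_i.
        w = {n for n in rs
             if any(re.fullmatch(self.rules[n][1], "".join(word))
                    for word in itertools.product(*ws))}
        if not w:
            raise NotAccepted(name)
        self.NT[-1].append(w)              # extend the parent's string by W
        self.NS.pop()


def validate(xml_text, rules=RULES, start=START):
    """Return the set of non-terminals derived for the root, or None if rejected."""
    handler = ValAlgo(rules, start)
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except NotAccepted:
        return None
    return handler.NT[0][0]
```

Calling `validate("<a><a><a><a/><a/><a/></a><a/></a></a>")` on the seven-node tree from the trace yields the root set {A, B}, while a root with four leaf children is rejected. Only the three stacks are kept, so memory grows with the depth of the document, not its size.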

  22. From Tree Grammars to Schema Languages
      • Different schema languages for different purposes
        – testing structural constraints
          • do persons’ names have both a first and a second name?
        – testing type constraints
          • is age an integer? And DoB a date?
        – describing a handy PSVI
          • adding default values or type information for easy/robust querying/manipulation
        – …
        – single-typedness is useful for some, but not all purposes!
        – locality?
      • Your applications might use different schemas for different purposes
      • ...and there are purposes none of our schema languages can serve:
        – in CW4, not all valid input documents were really grammars
        – checking whether non-terminals are mentioned correctly is beyond XSD’s abilities... we need an even more powerful schema language!

  23. Other interesting questions
      ...closely related to validation are
      • Schema emptiness:
        – given a schema/grammar S, does there exist a document/tree d such that d is valid w.r.t. S?
        – relevant as a basic consistency test for schemas
      • Schema containment:
        – given schemas/grammars S1, S2, is S1 a specialization of S2?
        – i.e., is every document that is valid w.r.t. S1 also valid w.r.t. S2?
        – relevant to support tasks such as schema refinement:
          • if I say I want to refine S2,
          • then it would be nice if this intention could later be verified to ensure that I did what I wanted
        – also solves schema equivalence: see your coursework!
      • ...a lot of research in both areas
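For a feel of the emptiness question, here is a small sketch under a strong simplifying assumption: each content model is given as a finite list of alternative child sequences (no Kleene star), and tag names are left out because they do not affect emptiness. The function names `productive_nonterminals` and `is_empty` are illustrative only. A non-terminal is marked productive once one of its alternatives uses only productive non-terminals, and the grammar is empty when no start symbol is productive.

```python
def productive_nonterminals(rules):
    """rules maps a non-terminal to a list of alternatives (tuples of non-terminals)."""
    productive = set()
    changed = True
    while changed:                      # fixpoint iteration
        changed = False
        for nonterminal, alternatives in rules.items():
            if nonterminal in productive:
                continue
            # Productive if some alternative needs only productive children
            # (the empty alternative () qualifies immediately).
            if any(all(child in productive for child in alt) for alt in alternatives):
                productive.add(nonterminal)
                changed = True
    return productive


def is_empty(rules, start):
    """True if no tree at all is valid w.r.t. the grammar."""
    return not (productive_nonterminals(rules) & set(start))


# The grammar G from the validation trace, in this simplified encoding.
G = {
    "A": [("B",), ()],
    "B": [("C", "C"), ("B",)],
    "C": [("A", "A", "A"), ()],
}
# A grammar whose only rule needs itself as a child: it accepts no finite tree.
LOOP = {"X": [("X",)]}
```

With these definitions, `is_empty(G, {"A", "B", "C"})` is false and `is_empty(LOOP, {"X"})` is true. Containment is harder and is not attempted here.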

  24. Bye for now! (I’ll be around) I have enjoyed working with you, and hope you learned loads and also enjoyed the experience!

  25. Internal to External. Or, spill your guts

  26. What the...?!? (Pinkwashing obscured)

  27. JSON (1)
      • Javascript has a rich set of literals (external representations)
        – Atomic (numbers, booleans, strings*)
          • 1, 2, true, "I’m a string"
        – Composite
          • Arrays
            – Ordered lists with random access
            – [1, 2, "one", "two"]
          • “Objects”
            – Associative arrays/dictionaries
            – {"one":1, "two":2}
      • These can nest!
        – [{"one":1, "o1":{"a1": [1,2,3.0], "a2":[]}}]
      • JSON == roughly this subset of Javascript
        – The internal representation varies
          • In JS, 1 represents a 64 bit, IEEE floating point number
          • In Python’s json module, 1 is decoded as an integer, not a float (arbitrary precision in Python 3)
      *Strings can be thought of as a composite, i.e., an array of characters, but not here.
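A quick way to see the "internal representation varies" point is to decode the nested literal with Python's standard `json` module and inspect the resulting types. This sketch assumes Python 3; the variable names are illustrative.

```python
import json

# The nested example from the slide, with balanced brackets.
text = '[{"one": 1, "o1": {"a1": [1, 2, 3.0], "a2": []}}]'
doc = json.loads(text)

# JSON arrays become lists, JSON "objects" become dicts.
outer_type = type(doc).__name__
inner_type = type(doc[0]).__name__

# 1 and 2 are decoded as int, 3.0 as float: one JSON number type, two Python types.
number_types = [type(x).__name__ for x in doc[0]["o1"]["a1"]]

# Python 3 integers are arbitrary precision, so large literals survive exactly.
big = json.loads("12345678901234567890")
```

A Javascript engine reading the same text would store all three numbers as 64 bit floats, so the large literal would lose precision there.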

  28. JSON (2)
      {"menu": {
        "id": "file",
        "value": "File",
        "popup": {
          "menuitem": [
            {"value": "New", "onclick": "CreateNewDoc()"},
            {"value": "Open", "onclick": "OpenDoc()"},
            {"value": "Close", "onclick": "CloseDoc()"}
          ]
        }
      }}
      Slightly different!
      <menu id="file" value="File">
        <popup>
          <menuitem value="New" onclick="CreateNewDoc()" />
          <menuitem value="Open" onclick="OpenDoc()" />
          <menuitem value="Close" onclick="CloseDoc()" />
        </popup>
      </menu>
      http://www.json.org/example.html

  29. JSON (2.1)
      Arrays are needed to preserve order!
      {"menu": [{
        "id": "file",
        "value": "File"},
        "popup": [
          "menuitem": {"value": "New", "onclick": "CreateNewDoc()"},
          "menuitem": {"value": "Open", "onclick": "OpenDoc()"},
          "menuitem": {"value": "Close", "onclick": "CloseDoc()"}
        ]
      ]
      }}
      Still not right!
      <menu id="file" value="File">
        <popup>
          <menuitem value="New" onclick="CreateNewDoc()" />
          <menuitem value="Open" onclick="OpenDoc()" />
          <menuitem value="Close" onclick="CloseDoc()" />
        </popup>
      </menu>
      http://www.json.org/example.html

  30. JSON (2.2)
      {"menu": [{"id": "file", "value": "File"},
                [{"popup": [{},
                            [{"menuitem": [{"value": "New", "onclick": "CreateNewDoc()"}, []]},
                             {"menuitem": [{"value": "Open", "onclick": "OpenDoc()"}, []]},
                             {"menuitem": [{"value": "Close", "onclick": "CloseDoc()"}, []]}
                            ]
                           ]
                 }
                ]
               ]
      }
      <menu id="file" value="File">
        <popup>
          <menuitem value="New" onclick="CreateNewDoc()" />
          <menuitem value="Open" onclick="OpenDoc()" />
          <menuitem value="Close" onclick="CloseDoc()" />
        </popup>
      </menu>
      http://www.json.org/example.html

  31. JSON (2.1) Recipe
      • Elements are mapped to “objects”
        – With one pair
          • ElementName : contents
      • Contents are a list
        – First item is an “object”, the attributes
          • Attributes are pairs of strings
        – Second item is a list (of children)
      • Empty elements require an explicit empty list
      • No attributes requires an explicit empty object
      Cumbersome!
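The recipe is mechanical enough to sketch in a few lines with Python's standard `xml.etree.ElementTree`. The function name `element_to_json` is made up for illustration, and the sketch makes one loud assumption: text content and whitespace between elements are ignored, since the recipe only covers elements and attributes.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Map an element to {name: [attributes-object, list-of-children]}.

    Assumption: text nodes are dropped; the recipe only mentions
    elements and attributes.
    """
    attributes = dict(elem.attrib)                        # {} when there are none
    children = [element_to_json(child) for child in elem] # [] for empty elements
    return {elem.tag: [attributes, children]}


MENU_XML = """
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
"""

menu_json = element_to_json(ET.fromstring(MENU_XML))
menu_text = json.dumps(menu_json)
```

The explicit empty object for `popup` and the explicit empty child lists for each `menuitem` are what make the encoding uniform, and also what make it cumbersome to write by hand.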

  32. JSON vs. XML (expressivity)
      CLICK!
      • Every XML WF DOM can be faithfully represented as a JSON object
      • Every JSON object can be faithfully represented as an XML WF DOM
      • Every WXS PSVI can be faithfully represented as a JSON object
      • Every JSON object can be faithfully represented as a WXS PSVI

  33. Considerations
      • For “same system”
        – Roundtripping (both ways) should be exact
        – Same program should behave the same in similar conditions
      • For homogeneous, distinct systems
        – Roundtripping (both ways) should be exact
        – Same program should behave the same in similar conditions
        – Interop!
      • For heterogeneous systems
        – Roundtripping should be reasonable
        – Analogous programs should behave analogously
          • in analogous conditions
        – Weaker notion of interop

  34. What is an XML “Document”?
      • Layers
        – A series of octets
        – A series of unicode characters
        – A series of “events”
          • SAX perspective
          • E.g., Start/End tags
          • Events are tokens
          [Errors here mean no XML! SAX ErrorHandler]
        – A tree structure
          • A DOM/Infoset
          [Yay! XPath! XSLT! Etc.]
        – A tree of a certain shape
          • A Validated Infoset
        – An adorned tree of a certain shape
          • A PSVI wrt a WXS
          [Types in play]
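The first four layers can be walked through with Python's standard library. This is only an illustrative sketch: the sample document and the `EventLog` class are made up here, and the validated layers (Validated Infoset, PSVI) are not covered because the standard library has no schema validator.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Layer 1: a series of octets.
octets = b'<?xml version="1.0" encoding="UTF-8"?><a><b c="bar"/></a>'

# Layer 2: a series of unicode characters.
characters = octets.decode("utf-8")


# Layer 3: a series of events, as seen from the SAX perspective.
class EventLog(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


log = EventLog()
xml.sax.parseString(octets, log)

# Layer 4: a tree structure that can be navigated and queried.
tree = ET.fromstring(octets)
first_child_attribute = tree[0].get("c")
```

If the octets are not well-formed, `xml.sax.parseString` raises `xml.sax.SAXParseException` through the default error handler, which is the "errors here mean no XML" case.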

  35. What is an XML “Document”?
      • Layers (same list as on the previous slide), now with two arrows between them:
        – validate
        – erase

  36. What is an XML “Document”?
      • Layers (same list as on the previous slides), annotated at the Validated Infoset/PSVI layers:
        – “Same” inputs can have different “meanings”! (external validation)

  37. What is an XML “Document”?
      • Layers (same list as on the previous slides), annotated:
        – Generally looks like
            <configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
              <serialization method="xml" />
            </configuration>
        – But can look otherwise!
            element configuration {
              attribute edition {"ee"},
              element serialization {attribute method {"xml"}}}
        – Same “meaning”, different spelling

  38. What is an XML “Document”?
      • Layers (same list as on the previous slides), with one more layer at the bottom:
        – A picture (or document, or action, or …)
          • Application meaning
      • Can have many of the upper layers... for “the same” meaning

  39. The Essence of XML
      • Thesis:
        – “XML is touted as an external format for representing data.”
      • Two properties
        – Self-describing
          • Destroyed by external validation
        – Round-tripping
          • Destroyed by defaults and union types
      http://bit.ly/essenceOfXML2

  40. Self-description
      • As standard description
        – A series of octets
        – A series of unicode characters
        – A series of “events”
          • SAX perspective
          • E.g., Start/End tags
          • Events are tokens
          [Well-formed: only one way to parse it]
        – A tree structure
          • A DOM/Infoset
          [Internal: DTD and doc are one]
        – A tree of a certain shape
          • A Validated Infoset
        – An adorned tree of a certain shape
          • A PSVI wrt a WXS
          [External: schema and doc are separate; out-of-band description]
        – A picture (or document, or action, or …)
          • Application meaning

  41. Roundtripping Fail: Defaults • Test.xml: <a> <b/> <b c="bar"/> </a> • Validate → Query → Serialize • With sparse.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA #IMPLIED> – Query: count(//@c) = 1 – Test-sparse.xml: <a> <b/> <b c="bar"/> </a> • With full.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA 'foo'> – Query: count(//@c) = 2 – Test-full.xml: <a> <b c="foo"/> <b c="bar"/> </a> • Can we think of Test-sparse and -full as “the same”? • Note: In oXygen, one needs to use internal validation. 51 Saturday, 29 October 2011 51
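
The effect of the attribute default can be sketched with Python's expat binding, assuming the two DTDs are written as internal subsets (as the oXygen note suggests). Expat does not validate, but it does report attributes defaulted from ATTLIST declarations in the internal subset. The function name count_c_attributes is made up for this illustration.

```python
import xml.parsers.expat

BODY = '<a><b/><b c="bar"/></a>'
SPARSE = ("<!DOCTYPE a [<!ELEMENT a (b)+> <!ELEMENT b EMPTY> "
          "<!ATTLIST b c CDATA #IMPLIED>]>")
FULL = ("<!DOCTYPE a [<!ELEMENT a (b)+> <!ELEMENT b EMPTY> "
        "<!ATTLIST b c CDATA 'foo'>]>")


def count_c_attributes(document: str) -> int:
    """Rough analogue of count(//@c) over the parsed document."""
    count = 0

    def start(name, attrs):
        nonlocal count
        if "c" in attrs:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(document, True)
    return count


print(count_c_attributes(SPARSE + BODY))  # sparse DTD: only the written attribute
print(count_c_attributes(FULL + BODY))    # full DTD: the default is added
```

The element content is byte-for-byte the same in both runs; only the declarations differ, yet the query answer changes.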

  42. Not self-describing! • Under external validation • Not just legality, but content! – The PSVIs have different information in them! 52 Saturday, 29 October 2011 52

  43. Roundtripping “Success”: Types • Test.xml: <a> <b/> <b/> </a> • Validate against bare.xsd, then Query: count(//b) = 2 • bare.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"> <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="b"/> </xs:schema> • Validate against typed.xsd, then Query: count(//b) = 2 • typed.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"/> <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="b" type="btype"/> <xs:complexType name="btype"/> </xs:schema> • Note: WXS can do default attributes as well. • Note: In oXygen, one needs to use internal validation. 53 Saturday, 29 October 2011 53

  44. Roundtripping “Success”: Types • Test.xml: <a> <b/> <b/> </a> • Validate against bare.xsd, then Query2: count(//b) = 2, count(//element(*,btype)) = ? • bare.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"> <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="b"/> </xs:schema> • Validate against typed.xsd, then Query2: count(//b) = 2, count(//element(*,btype)) = 2 • typed.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"/> <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="b" type="btype"/> <xs:complexType name="btype"/> </xs:schema> • Note: WXS can do default attributes and elements as well. • Note: In oXygen, one needs to use internal validation. 54 Saturday, 29 October 2011 54

  45. Roundtripping “Success”: Types • Test.xml: <a> <b/> <b/> </a> • Validate against bare.xsd, then Query2: count(//b) = 2, count(//element(*,btype)) = ? • bare.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"> <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="b"/> </xs:schema> • Validate against typed.xsd, then Query2: count(//b) = 2, count(//element(*,btype)) = 2 • typed.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"/> <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="b" type="btype"/> <xs:complexType name="btype"/> </xs:schema> • Note: WXS can do default attributes as well. • Note: In oXygen, one needs to use internal validation. 55 Saturday, 29 October 2011 55

  46. Roundtripping “Success”: Types • Test.xml (input and serialized output; both are <a> with two empty b children, one copy spelling a child as <b />) • Validate → Query2 → Serialize • Validate against bare.xsd, then Query2: count(//b) = 2, count(//element(*,btype)) = ? • bare.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"> <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="b"/> </xs:schema> • Validate against typed.xsd, then Query2: count(//b) = 2, count(//element(*,btype)) = 2 • typed.xsd: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="a"/> <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="b" type="btype"/> <xs:complexType name="btype"/> </xs:schema> • Does external through internal succeed? • Does internal through external succeed? • Note: WXS can do default attributes as well. • Note: In oXygen, one needs to use internal validation. 56 Saturday, 29 October 2011 56

  47. More Roundtripping Fail • Type – Internal to external and back • Take an element, foo, with content {“one”, “2”, 3} • Its (simple) type is a list of union of integer and string • Serialize – <foo>one 2 3</foo> • Parse and validate – Content is {“one”, 2, 3} – Key type info LOST » Silently » With only 1 schema • Spelling – External to internal and back • “001” to 1 to “1” – Whitespace and layout http://bit.ly/essenceOfXML2 57 Saturday, 29 October 2011 57
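
Both failures can be imitated in a few lines of Python. This is only a sketch: the functions serialize and validate are made up here, and validate mimics a list of union(integer, string) by trying the integer member first and falling back to string.

```python
def serialize(items):
    """Internal to external: write the list as whitespace-separated text."""
    return " ".join(str(item) for item in items)


def validate(text):
    """External to internal: each token takes the first union member it fits."""
    typed = []
    for token in text.split():
        try:
            typed.append(int(token))   # integer member of the union
        except ValueError:
            typed.append(token)        # string member of the union
    return typed


original = ["one", "2", 3]
content = serialize(original)          # the text inside <foo>...</foo>
roundtripped = validate(content)

print(content)
print(roundtripped, roundtripped == original)

# Spelling: external "001" becomes internal 1, which serializes as "1".
print(serialize(validate("001")))
```

The string "2" comes back as an integer with no error raised anywhere, which is the "silently" in the slide.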

  48. The Essence of XML • Conclusion: – “So the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.” • It’s not obvious – That the issues are serious (enough) – That the problem solved is all that easy – That there aren’t other, worse issues http://bit.ly/essenceOfXML2 58 Saturday, 29 October 2011 58

  49. The Essence of Error Or, so wrong it’s right 59 Saturday, 29 October 2011 59

  50. How to cope? • With which task? – Authoring, aggregating, querying … • Settle on a core representation of the model – Perhaps the Atom DOM • Coerce/transform/extract other models – To the representative one – Or build software that mediates the difference • Hope that there aren’t too many • Advocate standards! – Or make them – The nice thing about standards is that there are so many of them to choose from. • Kent Pitman and others Saturday, 29 October 2011 60

  51. Postel’s Law Be liberal in what you accept, and conservative in what you send. • Liberality – Many DOMs, all expressing the same thing – Many surface syntaxes (perhaps) for each DOM • Conservativity – What should we send? • It depends on the receiver! – Minimal standards? • Well formed XML? • Valid according to a popular schema/format? • HTML? Saturday, 29 October 2011 61

  52. Structure and Presentation • We’ve called this “DOM” and “Application” Layer – A very common application layer is “rendering” • Text, images • Like, y’know, the web • Standard vs. default renderings • Goes back to SGML <sentence style="slanted">This sentence is false.</sentence> Correct rendering This sentence is false. Fallback! This sentence is false. (Still see this in XSLT!) 62 Saturday, 29 October 2011 62

  53. Why Separate them? • Presentation is more fluid than structure – The "look" may need updating • Presentation needs may vary – What works for 21" screens doesn't for mobile phones • (Or maybe not!) • Accessibility – (content should be perceivable by everyone) • Programmatic processing needs 63 Saturday, 29 October 2011 63

  54. Another digression: CSS • The style language for the Web – Strong separation of presentation • CSS is – not an XML/angle brackets format • Oh NOES! Not another one! – annotative, not transformative • Well, sorta – mostly “formats” nodes – ubiquitous on the Web, esp. client side – works with arbitrary XML • But most clients work with (X)HTML • See the excellent PrinceXML formatter 64 Saturday, 29 October 2011 64

  55. Basic Component • Rules – Which consist of • Selectors – Like XPath expressions – But only forward, with some syntactic sugar • Declaration blocks – Sets of property/value pairs div.title { text-align:center; font-size: 24; } 65 Saturday, 29 October 2011 65

  56. <html><head><title>A bit of style</title></head> <body><style type="text/css"> .title { font-weight: bold } div.title { text-align:center; font-size: 24; } div.entry div.title { text-align: left; font-variant: normal} span.date {font-style: italic} span.date:after {content:" by"} div.content {font-style: italic} div.content i {font-style: normal;font-weight: bold} #one {color: red}</style> <div class="title">My Weblog</div> <div class="entry"> <div class="title">What I Did Today</div> <div class="byline"> <span class="date">Feb. 09, 2009</span> <span class="author">Bijan Parsia</span> </div> <div class="content" id="one"> <p>Taught a class and it went <i>very</i> well.</p> </div> </div> </body></html> Try it in http://software.hixie.ch/utilities/js/live-dom-viewer/ 66 Saturday, 29 October 2011 66

  57. Media Types • Different sets of rules can be contextualized to media – Screen, Print, Braille, Aural … • This is done with groupings called “@media rule”s @media print { BODY { font-size: 10pt } } Larger font size for screen @media screen { BODY { font-size: 12pt } } 67 Saturday, 29 October 2011 67

  58. Cascading • CSS Rules cascade – That is, there is overriding (and non-overriding) inheritance • That is, rules combine in different ways – http://www.w3.org/TR/CSS21/cascade.html#cascade • General principles – Distance to the node is significant – Precision of selectors is significant – Order of appearance is significant 68 Saturday, 29 October 2011 68
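
Two of the principles, precision of selectors and order of appearance, can be sketched as a sort key. This is a simplification and not the CSS 2.1 algorithm: it ignores origin, !important, inline styles, pseudo-classes, attribute selectors and inheritance, and it assumes every candidate rule already matches the node and sets the same property. The function names are made up for this illustration.

```python
import re


def specificity(selector: str) -> tuple:
    """Count (ids, classes, element names) in a simple selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    types = len(re.findall(r"(?:^|\s)[A-Za-z][\w-]*", selector))
    return (ids, classes, types)


def winning_value(candidates):
    """candidates: (selector, value) pairs in source order.

    The most specific selector wins; on a tie, the later rule wins.
    """
    ranked = [(specificity(selector), order, value)
              for order, (selector, value) in enumerate(candidates)]
    return max(ranked)[2]


# text-align for a title div nested inside an entry div:
print(winning_value([("div.title", "center"),
                     ("div.entry div.title", "left")]))
```

Because the order index is part of the key, two rules with equal specificity are resolved in favour of the one that appears last.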

  59. Error Handling • XML has “draconian” error handling – Well formedness error … BOOM • CSS has “forgiving” error handling – “Rules for handling parsing errors” http://www.w3.org/TR/CSS21/syndata.html#parsing-errors • That is, how to interpret illegal documents • Not reporting errors, but working around them – E.g.,“User agents must ignore a declaration with an unknown property.” • Replace: “ h1 { color: red; rotation: 70minutes } ” • With: “ h1 { color: red } ” • Study the error handling rules! 69 Saturday, 29 October 2011 69
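
The "ignore a declaration with an unknown property" rule can be sketched as a filter over one rule. This is an illustration only: KNOWN_PROPERTIES is a tiny made-up stand-in for what a real user agent supports, and the string splitting is nowhere near a real CSS tokenizer.

```python
# Assumption: a stand-in for the properties a user agent knows about.
KNOWN_PROPERTIES = {"color", "font-size", "font-style", "font-weight", "text-align"}


def drop_unknown_declarations(rule: str) -> str:
    """Rewrite one 'selector { declarations }' rule, skipping unknown properties."""
    selector, _, rest = rule.partition("{")
    body = rest.rsplit("}", 1)[0]
    kept = []
    for declaration in body.split(";"):
        prop, colon, value = declaration.partition(":")
        if colon and prop.strip().lower() in KNOWN_PROPERTIES:
            kept.append(f"{prop.strip()}: {value.strip()}")
    return f"{selector.strip()} {{ {'; '.join(kept)} }}"


print(drop_unknown_declarations("h1 { color: red; rotation: 70minutes }"))
```

Nothing is reported to anyone; the bad declaration is worked around and the rest of the rule still applies.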

  60. CSS Robustness • Has to deal with Web conditions 1. People borrowing 2. People collaborating 3. Different devices 4. Different kinds of audiences (and authors) 5. Maintainability 6. Aesthetics • CSS is designed for this – Cascading & Inheritance help with 1, 2, 5 • And importing, of course – @media rules help with 3-6 – Error handling helps with 1, 2, 4 70 Saturday, 29 October 2011 70

  61. Errors! • One person’s error is another’s data • Errors may or may not be unusual • Errors are relative to a norm • Preventing errors – Make errors hard or impossible to make • Make doing things hard or impossible – Make doing the right thing easy and inevitable – Make detecting errors easy – Make correcting errors easy – Correct errors – Fail silently – Fail randomly – Fail differently (interop problem) 71 Saturday, 29 October 2011 71

  62. (Perceived) Affordances • (Perceived) Affordance – an available action that is salient to the actor Donald Norman, The Design of Everyday Things Saturday, 29 October 2011 72


  64. Attractive Nuisances • A dominant or attractive affordance – with a bad or wrong action – In law, “a hazardous object or condition on the land that is likely to attract children who are unable to appreciate the risk posed by the object or condition” -- ye olde Wikipedia – We can reformulate • “a hazardous or misleading language or UI feature that is likely to be misused by (even) an educated user” • Contrast with “merely” hard to use – An attractive nuisance is easy to attempt, hard to use (correctly), and has bad (to catastrophic) effects Saturday, 29 October 2011 74

  65. Typical Schema Languages • Grammar (and maybe type based) – Recognize all or none • Though what the “all” is can be rather flexible – Restrictive by default • Slogan: What is not permitted is forbidden – Error detection and reporting • Is at the discretion of the system • “Not accepted” is the starting place • The point where an error is detected – might not be the point where it occurred – might not be the most helpful point to look at! • Programs! – Null pointer deref » Is the right point the deref or the setting to null? – Non-crashing errors Saturday, 29 October 2011 75

  66. The SSD Way • Explore before prescribe • Describe rather than define • Take what you can, when you can take it • Extra or missing stuff is (can be) OK – Irregular structure! • Adhere to the task at hand • Adore Postel’s Law Saturday, 29 October 2011 76

  67. XML Error Handling • De facto XML motto – Be strict about the well formedness of what you accept, and strict in what you send – Draconian error handling – Severe consequences on the Web • And other places • Fail early and fail hard • What about higher levels? – Validity and other analysis? – Most schema languages poor at error reporting • How about XQuery’s type error reporting? Saturday, 29 October 2011 77

  68. XML Error Handling • The spec: – fatal error [Definition: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).] • What should an application do? – To or for its users Saturday, 29 October 2011 78
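
What the spec text means for an application can be seen with Python's SAX binding. In this sketch the handler receives the events that precede the error, then normal processing stops with an exception. The names Names and parse_strictly are made up for this illustration.

```python
import xml.sax


class Names(xml.sax.ContentHandler):
    """Collects element names as start-tag events arrive."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def parse_strictly(octets: bytes):
    """Return (names seen so far, error message or None)."""
    handler = Names()
    try:
        xml.sax.parseString(octets, handler)
    except xml.sax.SAXParseException as err:
        return handler.seen, err.getMessage()
    return handler.seen, None


print(parse_strictly(b"<a><b/></a>"))   # well formed: no error
print(parse_strictly(b"<a><b></a>"))    # fatal error at the mismatched end tag
```

Whether to show the partial result, discard it, or attempt repair is then the application's decision, which is the question the slide leaves open.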

  69. XPath for Validation • What XPath is “equivalent” to the declaration of <b>? <!ELEMENT a (b)+> <!ELEMENT b EMPTY> simple.dtd valid.xml <a> =3 =0 =0 <b/> <b/> <b/> count(//b) count(//b/*) count(//b/text()) </a> invalid.xml <a> <b/> =3 =1 =1 <b>Foo</b> <b><b/></b> </a> <a> <a> =0 <b/> <b/> =0 <b>Foo</b> <b><b/><b/> </a> </a> Saturday, 29 October 2011 79

  70. XPath for Validation • What XPath is “equivalent” to the declaration of <b>? • simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY> • Query: count(//b/(* | text())) • valid.xml: <a> <b/> <b/> <b/> </a> = 0 • invalid.xml: <a> <b/> <b>Foo</b> <b><b/></b> </a> = 2 • <a> <b/> <b>Foo</b> </a> = 1 • <a> <b/> <b><b/></b> </a> = 1 Saturday, 29 October 2011 80

  71. XPath for Validation • What XPath is “equivalent” to the declaration of <b>? • simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY> • Query: if (count(//b/(* | text()))=0) then “valid” else “invalid” • valid.xml: <a> <b/> <b/> <b/> </a> = valid • invalid.xml: <a> <b/> <b>Foo</b> <b><b/></b> </a> = invalid • Also shown: <a> <b/> <b>Foo</b> </a> and <a> <b/> <b><b/></b> </a> • Can even “find” the errors! Saturday, 29 October 2011 81
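
The same check can be sketched in Python. ElementTree's XPath subset has no text() or union, so this illustration walks the tree by hand; the function names are made up, and the sample documents are written without whitespace inside b.

```python
import xml.etree.ElementTree as ET


def content_nodes_in_b(document: str) -> int:
    """Hand-rolled analogue of count(//b/(* | text()))."""
    root = ET.fromstring(document)
    total = 0
    for b in root.iter("b"):
        total += len(b)                                # child elements of b
        if b.text:
            total += 1                                 # text before the first child
        total += sum(1 for child in b if child.tail)   # text after each child
    return total


def check(document: str) -> str:
    return "valid" if content_nodes_in_b(document) == 0 else "invalid"


valid_xml = "<a><b/><b/><b/></a>"
invalid_xml = "<a><b/><b>Foo</b><b><b/></b></a>"

print(content_nodes_in_b(valid_xml), check(valid_xml))
print(content_nodes_in_b(invalid_xml), check(invalid_xml))
```

Returning the offending b elements instead of a count would be the "find the errors" variant.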


  73. XPath (etc) for Validation • We could have finer control – Validate parts of a document – A la wildcards • But with more control! • We could have greater expressivity – Far reaching dependencies – Computations • Essentially, code based validation! – With XQuery and XSLT – But still a leetle declarative • We always need it The essence of Schematron Saturday, 29 October 2011 83

  74. Schematron • A different sort of schema language – Not grammar or object/type based – Rule based – Test oriented – Complementary • Conceptually simple – Patterns contain rules • Rules set a context and contain asserts and reports (A&Rs) • A&Rs contain – Tests, which are XPath expressions, and – Assertions, which are natural language descriptions Saturday, 29 October 2011 84

  75. DTDx Schematron • “Only 1 Element declaration with a given name” – (Ok, could handle this with Keys in XML Schema!) <rule context="element"> <let name="n" value="@name"/> <assert test="count(//element/name[text()=$n]) = 1"> There can be only one element declaration with a given name. </assert> </rule> • “Every element reference must have a corresponding element declaration ” <rule context="elementref"> <let name="r" value="/ref/text()"/> <assert test="count(//element/name[text()=$r]) = 1"> There must be an element declaration (with the right name) for elementref to refer to. </assert> </rule> Saturday, 29 October 2011 85
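
The two assertions can be imitated in Python over a hypothetical DTDx-like document. The shape used here (element and elementref elements carrying name and ref attributes) is an assumption made for this sketch and may not match the real DTDx format; the function names are made up as well.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical DTDx-like input: "a" is declared twice, "c" is never declared.
SAMPLE = """<dtd>
  <element name="a"><elementref ref="b"/></element>
  <element name="b"/>
  <element name="a"><elementref ref="c"/></element>
</dtd>"""


def duplicate_declarations(root):
    """Names declared more than once (first assertion fails for these)."""
    counts = Counter(e.get("name", "") for e in root.iter("element"))
    return sorted(name for name, n in counts.items() if n > 1)


def dangling_references(root):
    """References with no matching declaration (second assertion fails)."""
    declared = {e.get("name") for e in root.iter("element")}
    return sorted({r.get("ref", "") for r in root.iter("elementref")} - declared)


root = ET.fromstring(SAMPLE)
print(duplicate_declarations(root))
print(dangling_references(root))
```

Like the Schematron rules, each check is independent: a document can fail one, both, or neither, and nothing else about its structure is constrained.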

  76. From HTML5: Exclusions • HTML5 validator • http://hsivonen.iki.fi/thesis/ –Relax NG schema –Schematron assertions –Custom code • Often want contextual exclusions –To break circles: •Paragraphs contain footnotes •Footnotes contain paragraphs •Footnote paragraphs may not contain footnotes • Without exclusions, would need many paragraph productions Saturday, 29 October 2011 86

  77. Exclusions Examples <schema xmlns="http://purl.oclc.org/dsdl/schematron"> <ns prefix="h" uri="http://www.w3.org/1999/xhtml"/> <pattern name='dfn cannot nest'> <rule context="h:dfn"> <report test="ancestor::h:dfn"> The "dfn" element cannot contain any nested "dfn" elements.</report> </rule> </pattern> <pattern name='noscript cannot nest'> <rule context="h:noscript"> <report test="ancestor::h:noscript"> The "noscript" element cannot contain any nested "noscript" elements.</report> </rule> </pattern> </schema> Saturday, 29 October 2011 87
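
The report test ancestor::h:dfn can be sketched in Python. ElementTree keeps no parent pointers, so this illustration carries an "inside" flag down the tree instead of looking upward; the function name nested is made up.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def nested(root, local_name):
    """Return the elements named local_name that have an ancestor of the same name."""
    tag = XHTML + local_name
    hits = []

    def walk(node, inside):
        if node.tag == tag:
            if inside:
                hits.append(node)   # the Schematron report would fire here
            inside = True
        for child in node:
            walk(child, inside)

    walk(root, False)
    return hits


doc = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml">'
    "<dfn>outer <dfn>inner</dfn></dfn><dfn>sibling</dfn></p>"
)
print([element.text for element in nested(doc, "dfn")])
```

The same function covers the noscript rule by passing "noscript", which mirrors how the two patterns differ only in the element name.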

  78. Tip of the iceberg • Computations –Using XPath functions and variables • Dynamic checks –Can pull stuff from other file • Elaborate reports –diagnostics has ( value-of ed) expressions –“Generate paths” to errors •Sound familiar? • General case –Thin shim over XSLT –Closer to “arbitrary code” Saturday, 29 October 2011 88

  79. Interesting Points • DTDx has a WXS – Schematron doesn’t care – Two phase validation •RELAX NG has a way of embedding •WXS 1.1 incorporating similar rules • Arbitrary XPath for context and test – Plus variables! • What isn’t forbidden is permitted – Unlike all the other schema languages! – We’re not performing runs • We’re firing rules – Somewhat easy to use • If you know XPath • If you don’t need coverage – What about analysis? Saturday, 29 October 2011 89

  80. Schematron Presumes… • …well formed XML –As do all XML schema languages •Work on DOM! –So can’t help with e.g., overlapping tags •Or tag soup in general •Namespace Analysis!? • …authorial repair –At least, in the default case •Communicate errors to people •Thus, not the basis of a modern browser! –Unlike CSS • Is this enough liberality? –Or rather, does it support enough liberality? Saturday, 29 October 2011 90

  81. Take the following sample XHTML code: <html> <head> <title>Hello!</title> <meta http-equiv="Content-Type" content="application/xhtml+xml" /> </head> <body> <p>Hello to you!</p> <p>Can you spot the problem? </body> </html> 91 Slide due to Iain Flynn Saturday, 29 October 2011 91
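
Fed to an XML parser, the sample fails outright. A minimal sketch with Python's ElementTree, where the sample is laid out one tag per line so the reported line number is meaningful; the function name is made up.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
<head>
<title>Hello!</title>
<meta http-equiv="Content-Type" content="application/xhtml+xml" />
</head>
<body>
<p>Hello to you!</p>
<p>Can you spot the problem?
</body>
</html>"""


def well_formedness_error(text: str):
    """Return the (line, column) of the first fatal error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


print(well_formedness_error(SAMPLE))
```

The error is reported at the </body> tag, one line after the unclosed <p> that caused it: the point of detection is not the point where the mistake was made.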

  82. HTML: XHTML: 92 Slide due to Iain Flynn Saturday, 29 October 2011 92

  83. Validation In The Wild • HTML – 1%-5% of web pages are valid – Validation is very weak! – All sorts of breakage • E.g., overlapping tags • <b>hi <i>there</b>, my good friend</i> • Syndication Formats – 10% feeds not well-formed – Where do the problems come from? • Hand authoring • Generation bugs • String concat based generation • Composition from random sources Saturday, 29 October 2011 93
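
The overlapping-tags example shows the two attitudes side by side. In this sketch Python's html.parser (a tolerant tokenizer, not a tree builder, so it performs no repair) reports every tag, while the XML parser refuses the same input. The class name TagLog is made up.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

OVERLAP = "<b>hi <i>there</b>, my good friend</i>"


class TagLog(HTMLParser):
    """Records tags in the order they appear, with no nesting checks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


log = TagLog()
log.feed(OVERLAP)
log.close()
print(log.events)

try:
    ET.fromstring(OVERLAP)
    xml_result = "accepted"
except ET.ParseError:
    xml_result = "rejected"
print(xml_result)
```

A browser goes one step further than this tokenizer and builds some determinate tree out of the overlapping events.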

  84. More recently In 2005, the developers of Google Reader (Google’s RSS and Atom feed parser) took a snapshot of the XML documents they parsed in one day. • Approximately 7% of these documents contained at least one well-formedness error. • Google Reader deals with millions of feeds per day. – That’s a lot of broken documents Source: http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html Slide due to Iain Flynn Saturday, 29 October 2011 94

  85. Encoding Structure Entity Typo Text Slide due to Iain Flynn Saturday, 29 October 2011 95

  86. [Chart: labels unrecoverable from extraction] Slide due to Iain Flynn Saturday, 29 October 2011 96

  87. A Thought Experiment • “Imagine...that all web browsers use strict XML parsers” • “...that you were using a publishing tool that [was strict] – “All of its default templates were valid XHTML.” – “It incorporated a nifty layout editor to ensure that you couldn’t introduce any invalid XHTML...” • “You click ‘Publish’” – “the page that you...validly authored is now not well-formed” • Problem: “a trackback with some illegal characters” – “...your publishing tool had a bug” – “The administration page itself tries to display the trackbacks you’ve received, and you get an XML processing error.” http://diveintomark.org/archives/2004/01/14/thought_experiment Saturday, 29 October 2011 97

  88. Real Life Saturday, 29 October 2011 98

  89. Lesson #1 • We are dealing with socio-political (and economic) phenomena – Complex ones! – Many players; many sorts of player – Lots of historical specifics – Lots of interaction effects • Human factors critical – What do people do (and why?) – How to influence them? – Affordances and incentives – Dealing with “bozos” • “There’s just no nice way to say this: Anyone who can’t make a syndication feed that’s well-formed XML is an incompetent fool.” Saturday, 29 October 2011 99

  90. 3 Error Handling Styles • Draconian – Fail hard and fast Every set of bytes • Ignore errors has a corresponding – CSS, DTD ATTLISTs, HTML (determinate) DOM • Hard coded DWIM repair – HTML, HTML5 • Ultimately, (some) errors are propagated – The key is to fail correctly • In the right way, at the right time, for the right reason – With the right message! • Better is to make errors unlikely! Saturday, 29 October 2011 100
