COMP60411 Semi-structured Data and the Web Validating Trees against

ValAlgo: input an XML doc/tree T and any tree grammar G; output “yes” if T ∈ L(G), “no” otherwise
• Let’s see how the algorithm works:
  – G = ({A,B,C}, {a}, {A,B,C}, P) with
    P = { ➀ A → a (B | ϵ),
          ➁ B → a ((C,C) | B),
          ➂ C → a ((A,A,A) | ϵ) }
  – T = a(a(a(a,a,a), a)): the root a has one child a; that child has two children a; the first of them has three leaf children a, the second is a leaf
  – three stacks: R (rule sets), NT (strings of sets of non-terminals), NS (sets of non-terminals); NS starts out holding the start symbols {A,B,C}

When an element E is visited on the way down,
  set RS to the set of production rules N → a e in P with a = E’s tag name and N occurring in the topmost set of NS
  if RS is non-empty
  then push RS onto R, push ϵ onto NT, push the set of all non-terminals occurring in the RHS of a rule in RS onto NS
  else report “not accepted” and stop
When an element E is visited on the way up,
  pop a rule set RS = { N i → a e i | i = 1..k } out of R
  pop a string of sets of non-terminals W 1 ...W k out of NT
  set W to the set of those N i such that there is a w 1 ...w k, with each w j from W j, that matches e i
  if W is non-empty
  then pop a string V 1 ...V m of non-terminals out of NT, push V 1 ...V m W onto NT, pop NS
  else report “not accepted” and stop
When the whole tree has been visited, report “accepted” and stop

11 Saturday, 29 October 2011

• Run of the algorithm on T. For every open element, R holds { ➀ , ➁ , ➂ } and NS holds {A,B,C}; NT is shown bottom to top, one string per open element.
  1. down, root a: RS = { ➀ , ➁ , ➂ }; NT: ϵ
  2. down, its child a: RS = { ➀ , ➁ , ➂ }; NT: ϵ | ϵ
  3. down, first grandchild a: RS = { ➀ , ➁ , ➂ }; NT: ϵ | ϵ | ϵ
  4. down, first leaf a: RS = { ➀ , ➁ , ➂ }; NT: ϵ | ϵ | ϵ | ϵ
  5. up, first leaf: ϵ = W 1 ...W k, W = {A,C}; NT: ϵ | ϵ | {A,C}
  6. down and up, second leaf: W = {A,C}; NT: ϵ | ϵ | {A,C},{A,C}
  7. down and up, third leaf: W = {A,C}; NT: ϵ | ϵ | {A,C},{A,C},{A,C}
  8. up, first grandchild: {A,C},{A,C},{A,C} = W 1 ...W 3, W = {C}; NT: ϵ | {C}
  9. down and up, second grandchild (a leaf): ϵ = W 1 ...W k, W = {A,C}; NT: ϵ | {C},{A,C}
  10. up, child of the root: {C},{A,C} = W 1 ...W k, W = {B}; NT: {B}
  11. up, root: {B} = W 1 ...W k, W = {A,B}
  12. “accepted”/“yes”, T is accepted by G
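The walkthrough above can be replayed in a few lines of Python. This is only a minimal sketch under stated assumptions, not the course's reference implementation: content models are encoded as Python regexes over one-letter non-terminal names, the input is a list of start/end events, and the names `RULES`, `validate` and `events_of` are made up for this illustration.

```python
import re
from itertools import product

# Rules N -> a e as (N, tag, content model). Encoding content models as
# regexes over one-letter non-terminal names is an assumption of this sketch.
RULES = [
    ("A", "a", r"B?"),      # rule 1: A -> a (B | eps)
    ("B", "a", r"CC|B"),    # rule 2: B -> a ((C,C) | B)
    ("C", "a", r"(AAA)?"),  # rule 3: C -> a ((A,A,A) | eps)
]
START = {"A", "B", "C"}


def validate(events, rules=RULES, start=START):
    """events: ("start", tag) / ("end", tag) pairs in document order."""
    names = {n for n, _, _ in rules}
    R, NT, NS = [], [[]], [set(start)]
    for kind, tag in events:
        if kind == "start":  # way down
            rs = [r for r in rules if r[1] == tag and r[0] in NS[-1]]
            if not rs:
                return False
            R.append(rs)
            NT.append([])  # eps: no children seen yet
            NS.append({c for _, _, e in rs for c in e if c in names})
        else:  # way up
            rs, child_sets = R.pop(), NT.pop()
            NS.pop()
            # N is possible if some choice w1..wk (wj from Wj) matches its model
            w = {n for n, _, e in rs
                 if any(re.fullmatch(e, "".join(ws))
                        for ws in product(*child_sets))}
            if not w:
                return False
            NT[-1].append(w)
    return len(NT[0]) == 1  # exactly one root, typed with start symbols


def events_of(tree):
    """Flatten a (tag, [children]) tuple into start/end events."""
    tag, children = tree
    yield ("start", tag)
    for child in children:
        yield from events_of(child)
    yield ("end", tag)


leaf = ("a", [])
T = ("a", [("a", [("a", [leaf, leaf, leaf]), leaf])])
print(validate(events_of(T)))                   # True
print(validate(events_of(("a", [leaf] * 4))))   # False
```

Enumerating every choice w 1 ...w k with `product` keeps the sketch close to the wording of the slide, but it is exponential in the number of children; a real validator would instead run each content model's automaton over the sets W j.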

Implementing this algorithm?
• Again, as for single-type tree grammars,
  – walk the DOM tree in a depth-first, left-to-right way, or
  – use a SAX parser and do it in a streaming fashion
Insights gained?
• Validating general tree grammars
  – does not require guessing & backtracking
  – can be implemented in a streaming way
  – is a bit more tricky than validating single-type grammars,
  – but not really more complex (in terms of time/space)
    • still only space linear in the depth of the input tree
• so, for validation purposes, the restriction to single-type is not necessary
  – feel free to describe structure in a powerful way!
• but, for uniqueness of the PSVI,
  – we need single-type
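To make the streaming option concrete, here is a small sketch using Python's standard `xml.sax` module. It is an illustration only: the class name `DepthTracker` is made up, and the handler merely records events and the size of its stack, which is where a validator would keep its per-element state.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Keeps one stack entry per open element, as a streaming validator would."""

    def __init__(self):
        super().__init__()
        self.stack = []       # grows and shrinks with the current depth
        self.max_stack = 0    # largest stack size seen so far
        self.events = []      # stored here only to show the event order

    def startElement(self, name, attrs):
        # "way down": a validator would push its rule set here
        self.stack.append(name)
        self.max_stack = max(self.max_stack, len(self.stack))
        self.events.append(("start", name))

    def endElement(self, name):
        # "way up": a validator would pop and check the content model here
        self.stack.pop()
        self.events.append(("end", name))


handler = DepthTracker()
xml.sax.parseString(b"<a><a><a><a/><a/><a/></a><a/></a></a>", handler)
print(handler.max_stack)    # 4
print(len(handler.events))  # 14
```

The document has seven elements but the stack never holds more than four entries, one per level of nesting, which is the "space linear in depth" point made above.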

From Tree Grammars to Schema Languages
• Different schema languages for different purposes
  – testing structural constraints
    • do persons’ names have both a first and second name?
  – testing type constraints
    • is age an integer? And DoB a date?
  – describing a handy PSVI
    • adding default values or type information for easy/robust querying/manipulation
  – …
  – single-typedness is useful for some, but not all purposes!
  – locality?
• Your applications might use different schemas for different purposes
• ...and there are purposes none of our schema languages can serve:
  – in CW4, not all valid input documents were really grammars
  – checking whether non-terminals are mentioned correctly is beyond XSD’s abilities...we need an even more powerful schema language!

Other interesting questions
...closely related to validation are
• Schema emptiness:
  – given a schema/grammar S, does there exist a document/tree d such that d is valid w.r.t. S?
  – relevant as a basic consistency test for schemas
• Schema containment:
  – given schemas/grammars S1, S2, is S1 a specialization of S2?
  – i.e., is every document that is valid w.r.t. S1 also valid w.r.t. S2?
  – relevant to support tasks such as schema refinement:
    • if I say I want to refine S2,
    • then it would be nice if this intention could be verified later, to ensure that I did what I wanted
  – also solves schema equivalence: see your coursework!
• ...a lot of research in both areas
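For a feel of the emptiness question, here is a hedged sketch for a simplified grammar format: each content model is just a list of alternative sequences of non-terminals (no Kleene star), which is an assumption made to keep the example short. A non-terminal is "productive" if some alternative uses only productive non-terminals; the grammar is empty exactly when no start symbol is productive. The function names are hypothetical.

```python
def productive_nonterminals(rules):
    """rules: (N, tag, alternatives); each alternative is a tuple of non-terminals."""
    productive = set()
    changed = True
    while changed:  # fixpoint iteration: stop when no new non-terminal is added
        changed = False
        for n, _tag, alternatives in rules:
            if n not in productive and any(
                all(m in productive for m in alt) for alt in alternatives
            ):
                productive.add(n)
                changed = True
    return productive


def is_empty(rules, start):
    """True iff no tree is valid w.r.t. the grammar."""
    return not (productive_nonterminals(rules) & set(start))


# The example grammar from the validation walkthrough, in this simplified format.
RULES = [
    ("A", "a", [("B",), ()]),
    ("B", "a", [("C", "C"), ("B",)]),
    ("C", "a", [("A", "A", "A"), ()]),
]
# A grammar whose only rule can never bottom out.
LOOP = [("X", "x", [("X",)])]

print(is_empty(RULES, {"A", "B", "C"}))  # False
print(is_empty(LOOP, {"X"}))             # True
```

Containment is considerably harder than emptiness and is not attempted in this sketch.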

Bye for now! (I’ll be around) I have enjoyed working with you, and hope you learned loads and also enjoyed the experience!

Internal to External Or, spill your guts

What the...?!? (Pinkwashing obscured)

JSON (1)
• Javascript has a rich set of literals (ext. reps)
  – Atomic (numbers, booleans, strings*)
    • 1, 2, true, "I’m a string"
  – Composite
    • Arrays
      – Ordered lists with random access
      – [1, 2, "one", "two"]
    • “Objects”
      – Associative arrays/dictionary
      – {"one":1, "two":2}
  – These can nest!
    • [{"one":1, "o1":{"a1": [1,2,3.0], "a2":[]}}]
• JSON == roughly this subset of Javascript
  – The internal representation varies
    • In JS, 1 represents a 64 bit, IEEE floating point number
    • In Python’s json module, 1 is parsed as an int (a machine-word integer in Python 2, arbitrary precision in Python 3)
*Strings can be thought of as a composite, i.e., an array of characters, but not here.
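The point about internal representations can be checked directly with Python's standard `json` module. A small sketch (the variable name `doc` is just for illustration):

```python
import json

# Parse the nested example: external representation -> internal values.
doc = json.loads('[{"one": 1, "o1": {"a1": [1, 2, 3.0], "a2": []}}]')

print(type(doc).__name__)                    # list  (JSON array)
print(type(doc[0]).__name__)                 # dict  (JSON "object")
print(type(doc[0]["one"]).__name__)          # int   (1 is not a float here)
print(type(doc[0]["o1"]["a1"][2]).__name__)  # float (3.0 keeps its fraction)

# Serializing gives back an external representation of the same value.
print(json.dumps(doc))
```

In JavaScript the same text would yield a number for both `1` and `3.0`, with no int/float distinction; the external form is shared, the internal one is not.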

JSON (2)
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
Slightly different!
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
http://www.json.org/example.html

JSON (2.1)
Needed to preserve order!
{"menu": [{
    "id": "file",
    "value": "File"},
  "popup": [
    "menuitem": {"value": "New", "onclick": "CreateNewDoc()"},
    "menuitem": {"value": "Open", "onclick": "OpenDoc()"},
    "menuitem": {"value": "Close", "onclick": "CloseDoc()"}
  ]
]
}}
Still not right!
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
http://www.json.org/example.html

JSON (2.2)
{"menu": [{"id": "file", "value": "File"},
  [{"popup": [{},
    [{"menuitem": [{"value": "New", "onclick": "CreateNewDoc()"}, []]},
     {"menuitem": [{"value": "Open", "onclick": "OpenDoc()"}, []]},
     {"menuitem": [{"value": "Close", "onclick": "CloseDoc()"}, []]}
    ]
  ]}]
]}
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
http://www.json.org/example.html

JSON (2.1) Recipe
• Elements are mapped to “objects”
  – With one pair
    • ElementName : contents
  – Contents are a list
    • First item is an “object”, the attributes
      – Attributes are pairs of strings
    • Second item is a list (of children)
  – Empty elements require an explicit empty list
  – No attributes requires an explicit empty object
Cumbersome!
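The recipe is short enough to write down. The sketch below uses Python's standard `xml.etree.ElementTree`; the function names `element_to_json` and `json_to_element` are made up for this illustration, and it handles elements and attributes only (text content, comments and namespaces are ignored, which is an assumption of the sketch rather than part of the recipe).

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Element -> {name: [attributes object, list of children]}."""
    return {elem.tag: [dict(elem.attrib), [element_to_json(c) for c in elem]]}


def json_to_element(obj):
    """Inverse direction, for objects that follow the recipe."""
    ((tag, (attrs, children)),) = obj.items()
    elem = ET.Element(tag, attrs)
    elem.extend([json_to_element(c) for c in children])
    return elem


MENU = """
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
"""

data = element_to_json(ET.fromstring(MENU))
print(json.dumps(data, indent=1))
# Going to XML and back yields the same JSON value.
print(element_to_json(json_to_element(data)) == data)  # True
```

Note how `popup` gets an explicit empty attributes object and each `menuitem` an explicit empty child list, exactly the cumbersome part the slide complains about.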

JSON vs. XML (expressivity)
CLICK!
• Every XML WF DOM can be faithfully represented as a JSON object
• Every JSON object can be faithfully represented as an XML WF DOM
• Every WXS PSVI can be faithfully represented as a JSON object
• Every JSON object can be faithfully represented as a WXS PSVI

Considerations
• For “same system”
  – Roundtripping (both ways) should be exact
  – Same program should behave the same in similar conditions
• For homogeneous, distinct systems
  – Roundtripping (both ways) should be exact
  – Same program should behave the same in similar conditions
  – Interop!
• For heterogeneous systems
  – Roundtripping should be reasonable
  – Analogous programs should behave analogously
    • in analogous conditions
  – Weaker notion of interop

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
    [Errors here mean no XML! SAX ErrorHandler]
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure [Yay! XPath! XSLT! Etc.]
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape [Types in play]
    • A PSVI wrt a WXS

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
[Arrows between the layers: validate, erase]

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
[“Same” inputs can have different “meanings”! (external validation)]

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
Generally looks like
  <configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
    <serialization method="xml" />
  </configuration>
But can look otherwise!
  element configuration {
    attribute edition {"ee"},
    element serialization {attribute method {"xml"}}}
Same “meaning”, different spelling

What is an XML “Document”?
• Layers
  – A series of octets
  – A series of unicode characters
  – A series of “events”
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape
    • A PSVI wrt a WXS
  – A picture (or document, or action, or …)
    • Application meaning
[Can have many of the lower layers for “the same” meaning]

The Essence of XML
• Thesis:
  – “XML is touted as an external format for representing data.”
• Two properties
  – Self-describing
    • Destroyed by external validation
  – Round-tripping
    • Destroyed by defaults and union types
http://bit.ly/essenceOfXML2

Self-description
• A standard description
  – A series of octets
  – A series of unicode characters
  – A series of “events” [Well-formed: only one way to parse it]
    • SAX perspective
    • E.g., Start/End tags
    • Events are tokens
  – A tree structure [Internal: DTD and doc are one]
    • A DOM/Infoset
  – A tree of a certain shape
    • A Validated Infoset
  – An adorned tree of a certain shape [External: schema and doc are separate; out-of-band description]
    • A PSVI wrt a WXS
  – A picture (or document, or action, or …)
    • Application meaning

Roundtripping Fail: Defaults
sparse.dtd:
  <!ELEMENT a (b)+>
  <!ELEMENT b EMPTY>
  <!ATTLIST b c CDATA #IMPLIED>
full.dtd:
  <!ELEMENT a (b)+>
  <!ELEMENT b EMPTY>
  <!ATTLIST b c CDATA 'foo'>
Test.xml:
  <a> </a>
Pipeline: Validate, then Query, then Serialize
• With sparse.dtd: count(//@c) = 1, serialized as Test-sparse.xml (<a> </a>)
• With full.dtd: count(//@c) = 2, serialized as Test-full.xml (<a> </a>)
Can we think of Test-sparse and -full as “the same”?
Note: In oXygen, one needs to use internal validation.
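The effect of attribute defaults can be reproduced with Python's standard `xml.parsers.expat`, which reads an internal DTD subset and, by default, reports defaulted attributes alongside specified ones. This is only a sketch: the body of the test document is elided on the slide, so the `BODY` below (two `b` elements, one carrying `c`) is a hypothetical stand-in chosen to give the same counts, and `count_c_attributes` is a made-up helper.

```python
import xml.parsers.expat

# Hypothetical document body; the slide's Test.xml content is not recoverable.
BODY = '<a><b/><b c="bar"/></a>'

SPARSE = "<!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA #IMPLIED>"
FULL = "<!ELEMENT a (b)+> <!ELEMENT b EMPTY> <!ATTLIST b c CDATA 'foo'>"


def count_c_attributes(dtd):
    """Counts c attributes seen by the application, like count(//@c)."""
    count = 0

    def start(name, attrs):
        nonlocal count
        if "c" in attrs:  # attrs includes values defaulted from the DTD
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    # Internal subset: DTD and document travel together.
    parser.Parse(f"<!DOCTYPE a [{dtd}]>{BODY}", True)
    return count


print(count_c_attributes(SPARSE))  # 1
print(count_c_attributes(FULL))    # 2
```

The element content is identical in both runs; only the declarations differ, yet the application sees a different number of attributes.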

Not self-describing!
• Under external validation
• Not just legality, but content!
  – The PSVIs have different information in them!

Roundtripping “Success”: Types
• Test.xml: <a> </a>
• Pipeline: Validate, Query
• Against bare.xsd: count(//b) = 2
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="a">
      <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType>
    </xs:element>
    <xs:element name="b"/>
  </xs:schema>
• Against typed.xsd: count(//b) = 2
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="a"/>
    <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType>
    <xs:element name="b" type="btype"/>
    <xs:complexType name="btype"/>
  </xs:schema>
Note: WXS can do default attributes as well.
Note: In oXygen, one needs to use internal validation.

Roundtripping “Success”: Types
• Test.xml: <a> </a>
• Pipeline: Validate, Query2
• Against bare.xsd: count(//b) = 2, count(//element(*,btype)) = ?
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="a">
      <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType>
    </xs:element>
    <xs:element name="b"/>
  </xs:schema>
• Against typed.xsd: count(//b) = 2, count(//element(*,btype)) = 2
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="a"/>
    <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType>
    <xs:element name="b" type="btype"/>
    <xs:complexType name="btype"/>
  </xs:schema>
Note: WXS can do default attributes and elements as well.
Note: In oXygen, one needs to use internal validation.

Roundtripping “Success”: Types
• Test.xml: <a> </a>
• Pipeline: Validate, Query2, Serialize
• Against bare.xsd: count(//b) = 2, count(//element(*,btype)) = ?
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="a">
      <xs:complexType> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType>
    </xs:element>
    <xs:element name="b"/>
  </xs:schema>
• Against typed.xsd: count(//b) = 2, count(//element(*,btype)) = 2
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="a"/>
    <xs:complexType name="atype"> <xs:sequence> <xs:element ref="b" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType>
    <xs:element name="b" type="btype"/>
    <xs:complexType name="btype"/>
  </xs:schema>
• Serialized Test.xml: <a> </a>
• Does external through internal succeed?
• Does internal through external succeed?
Note: WXS can do default attributes as well.
Note: In oXygen, one needs to use internal validation.

More Roundtripping Fail
• Type
  – Internal to external and back
    • Take an element, foo, with content {“one”, “2”, 3}
    • Its (simple) type is a list of union of integer and string
    • Serialize
      – <foo>one 2 3</foo>
    • Parse and validate
      – Content is {“one”, 2, 3}
      – Key type info LOST
        » Silently
        » With only 1 schema
• Spelling
  – External to internal and back
    • “001” to 1 to “1”
  – Whitespace and layout
http://bit.ly/essenceOfXML2
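A rough sketch of the type round trip, assuming the union tries integer before string (as a validator working through the member types in order would). The helper names are hypothetical, and Python's int() is only an approximation of the xs:integer lexical space.

```python
from typing import List, Union

Item = Union[int, str]


def serialize(items: List[Item]) -> str:
    """Internal to external: a list value becomes whitespace-separated text."""
    return " ".join(str(item) for item in items)


def parse_token(token: str) -> Item:
    """Union of integer and string: try integer first, fall back to string."""
    try:
        return int(token)
    except ValueError:
        return token


def parse(text: str) -> List[Item]:
    """External to internal: split on whitespace and type each token."""
    return [parse_token(token) for token in text.split()]


original = ["one", "2", 3]
round_tripped = parse(serialize(original))
print(original, "->", serialize(original), "->", round_tripped)

# Spelling is lost in the other direction: "001" parses to 1, which prints as "1".
print(str(parse_token("001")))
```

Nothing in the serialized text records that the second item was a string, so the parse cannot recover it, and no error is raised along the way.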

The Essence of XML
• Conclusion:
  – “So the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.”
• It’s not obvious
  – That the issues are serious (enough)
  – That the problem solved is all that easy
  – That there aren’t other, worse issues
http://bit.ly/essenceOfXML2

The Essence of Error
Or, so wrong it’s right

How to cope?
• With which task?
  – Authoring, aggregating, querying …
• Settle on a core representation of the model
  – Perhaps the Atom DOM
• Coerce/transform/extract other models
  – To the representative one
  – Or build software that mediates the difference
    • Hope that there aren’t too many
• Advocate standards!
  – Or make them
  – The nice thing about standards is that there are so many of them to choose from.
    • Kent Pitman and others

Postel’s Law
Be liberal in what you accept, and conservative in what you send.
• Liberality
  – Many DOMs, all expressing the same thing
  – Many surface syntaxes (perhaps) for each DOM
• Conservativity
  – What should we send?
    • It depends on the receiver!
  – Minimal standards?
    • Well formed XML?
    • Valid according to a popular schema/format?
    • HTML?

Structure and Presentation
• We’ve called this “DOM” and “Application” Layer
  – A very common application layer is “rendering”
    • Text, images
    • Like, y’know, the web
• Standard vs. default renderings
• Goes back to SGML
<sentence style="slanted">This sentence is false.</sentence>
• Correct rendering (slanted): This sentence is false.
• Fallback (plain)! This sentence is false.
  – (Still see this in XSLT!)

Why Separate them?
• Presentation is more fluid than structure
  – The "look" may need updating
• Presentation needs may vary
  – What works for 21" screens doesn't for mobile phones
    • (Or maybe not!)
• Accessibility
  – (content should be perceivable by everyone)
• Programmatic processing needs

Another digression: CSS
• The style language for the Web
  – Strong separation of presentation
• CSS is
  – not an XML/angle brackets format
    • Oh NOES! Not another one!
  – annotative, not transformative
    • Well, sorta
  – mostly “formats” nodes
  – ubiquitous on the Web, esp. client side
  – works with arbitrary XML
    • But most clients work with (X)HTML
    • See the excellent PrinceXML formatter

Basic Component
• Rules
  – Which consist of
    • Selectors
      – Like XPath expressions
      – But only forward, with some syntactic sugar
    • Declaration blocks
      – Sets of property/value pairs
div.title { text-align: center; font-size: 24px; }

<html><head><title>A bit of style</title></head>
<body><style type="text/css">
.title { font-weight: bold }
div.title { text-align: center; font-size: 24px; }
div.entry div.title { text-align: left; font-variant: normal }
span.date { font-style: italic }
span.date:after { content: " by" }
div.content { font-style: italic }
div.content i { font-style: normal; font-weight: bold }
#one { color: red }
</style>
<div class="title">My Weblog</div>
<div class="entry">
  <div class="title">What I Did Today</div>
  <div class="byline"> Feb. 09, 2009 Bijan Parsia </div>
  <div class="content" id="one"> Taught a class and it went very well. </div>
</div>
</body></html>
Try it in http://software.hixie.ch/utilities/js/live-dom-viewer/

Media Types
• Different sets of rules can be contextualized to media
  – Screen, Print, Braille, Aural …
• This is done with groupings called “@media rule”s
@media print { BODY { font-size: 10pt } }
@media screen { BODY { font-size: 12pt } }
• Larger font size for screen

Cascading
• CSS Rules cascade
  – That is, there is overriding (and non-overriding) inheritance
    • That is, rules combine in different ways
  – http://www.w3.org/TR/CSS21/cascade.html#cascade
• General principles
  – Distance to the node is significant
  – Precision of selectors is significant
  – Order of appearance is significant
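Two of these principles (precision of selectors, then order of appearance) can be sketched for a small, hypothetical subset of selectors: ids, classes and element names only. This is a simplified illustration, not the full CSS 2.1 cascade (no origins, !important, pseudo-elements or inheritance), and the function names are made up for the example.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str, str]  # (selector, property, value)


def specificity(selector: str) -> Tuple[int, int, int]:
    """Count ids, classes and element names in a simple selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    types = len(re.findall(r"(?:^|\s)[a-zA-Z][\w-]*", selector))
    return (ids, classes, types)


def winning_value(matching_rules: List[Rule], prop: str) -> Optional[str]:
    """Among rules already known to match a node, pick the value for prop.

    Higher specificity wins; on a tie, the later rule wins.
    """
    best = None
    for order, (selector, name, value) in enumerate(matching_rules):
        if name != prop:
            continue
        key = (specificity(selector), order)
        if best is None or key > best[0]:
            best = (key, value)
    return best[1] if best else None


# Rules that would match a title div nested inside an entry div.
rules = [
    ("div.title", "text-align", "center"),
    ("div.entry div.title", "text-align", "left"),
    (".title", "font-weight", "bold"),
]
print(winning_value(rules, "text-align"))
```

Matching selectors against nodes is deliberately left out; the sketch only shows how competing declarations for the same property are ranked.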

Error Handling
• XML has “draconian” error handling
  – Well formedness error … BOOM
• CSS has “forgiving” error handling
  – “Rules for handling parsing errors” http://www.w3.org/TR/CSS21/syndata.html#parsing-errors
    • That is, how to interpret illegal documents
    • Not reporting errors, but working around them
  – E.g., “User agents must ignore a declaration with an unknown property.”
    • Replace: “h1 { color: red; rotation: 70minutes }”
    • With: “h1 { color: red }”
• Study the error handling rules!
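The “ignore a declaration with an unknown property” rule can be sketched as follows. The set of known properties is a small hypothetical subset, and splitting on ";" is a simplification that a real CSS parser would not make (strings and escapes can contain semicolons).

```python
# Hypothetical subset of properties this toy user agent understands.
KNOWN_PROPERTIES = {"color", "font-size", "font-style", "font-weight", "text-align"}


def drop_unknown_declarations(block: str) -> str:
    """Keep only declarations whose property name is known; skip the rest silently."""
    kept = []
    for declaration in block.split(";"):
        name, separator, value = declaration.partition(":")
        if separator and name.strip().lower() in KNOWN_PROPERTIES:
            kept.append(f"{name.strip()}: {value.strip()}")
    return "; ".join(kept)


print(drop_unknown_declarations("color: red; rotation: 70minutes"))
```

No error is reported for the dropped declaration; the rest of the rule still applies, which is the contrast with XML's draconian handling.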

CSS Robustness
• Has to deal with Web conditions
  1. People borrowing
  2. People collaborating
  3. Different devices
  4. Different kinds of audiences (and authors)
  5. Maintainability
  6. Aesthetics
• CSS is designed for this
  – Cascading & Inheritance help with 1, 2, 5
    • And importing, of course
  – @media rules help with 3-6
  – Error handling helps with 1, 2, 4

Errors!
• One person’s error is another’s data
• Errors may or may not be unusual
• Errors are relative to a norm
• Preventing errors
  – Make errors hard or impossible to make
    • Make doing things hard or impossible
  – Make doing the right thing easy and inevitable
  – Make detecting errors easy
  – Make correcting errors easy
  – Correct errors
  – Fail silently
  – Fail randomly
  – Fail differently (interop problem)

(Perceived) Affordances
• (Perceived) Affordance
  – an available action that is salient to the actor
Donald Norman, The Design of Everyday Things

Attractive Nuisances
• A dominant or attractive affordance
  – with a bad or wrong action
  – In law, “a hazardous object or condition on the land that is likely to attract children who are unable to appreciate the risk posed by the object or condition” -- ye olde Wikipedia
  – We can reformulate
    • “a hazardous or misleading language or UI feature that is likely to be misused by (even) an educated user”
• Contrast with “merely” hard to use
  – An attractive nuisance is easy to attempt, hard to use (correctly), and has bad (to catastrophic) effects

Typical Schema Languages
• Grammar (and maybe type based)
  – Recognize all or none
    • Though what the “all” is can be rather flexible
  – Restrictive by default
    • Slogan: What is not permitted is forbidden
  – Error detection and reporting
    • Is at the discretion of the system
    • “Not accepted” is the starting place
• The point where an error is detected
  – might not be the point where it occurred
  – might not be the most helpful point to look at!
• Programs!
  – Null pointer deref
    » Is the right point the deref or the setting to null?
  – Non-crashing errors

The SSD Way
• Explore before prescribe
• Describe rather than define
• Take what you can, when you can take it
• Extra or missing stuff is (can be) OK
  – Irregular structure!
• Adhere to the task at hand
• Adore Postel’s Law

XML Error Handling
• De facto XML motto
  – Be strict about the well formedness of what you accept, and strict in what you send
  – Draconian error handling
  – Severe consequences on the Web
    • And other places
• Fail early and fail hard
• What about higher levels?
  – Validity and other analysis?
  – Most schema languages poor at error reporting
    • How about XQuery’s type error reporting?

XML Error Handling
• The spec:
  – fatal error [Definition: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).]
• What should an application do?
  – To or for its users
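One possible answer to “what should an application do?”, sketched with Python's standard xml.etree.ElementTree: the processor stops at the first well-formedness error, and the application turns that into a message for its user instead of crashing. The function name and message wording are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def load_or_report(text: str) -> Tuple[Optional[ET.Element], Optional[str]]:
    """Parse text; on a fatal error return no tree plus a user-facing message."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as error:
        line, column = error.position  # where the parser gave up
        return None, f"not well-formed at line {line}, column {column}"


tree, message = load_or_report("<a><b></a>")
print(tree, message)

tree_ok, message_ok = load_or_report("<a><b/></a>")
print(tree_ok.tag, message_ok)
```

The parser never hands back a partial tree for the broken input; anything more forgiving (repair, fallback rendering) has to be built by the application on top of the raw text.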

XPath for Validation
• What XPath is “equivalent” to the declaration of ?
• simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY>
• Queries: count(//b), count(//b/*), count(//b/text())
• valid.xml <a> </a>: =3, =0, =0
• invalid.xml <a> Foo </a>: =3, =1, =1
• invalid.xml <a> <a> Foo </a> </a>: =0, =0

XPath for Validation
• What XPath is “equivalent” to the declaration of ?
• simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY>
• Query: count(//b/(* | text()))
• valid.xml <a> </a>: =0
• invalid.xml (three variants; element content not preserved): =2, =1, =1

XPath for Validation
• What XPath is “equivalent” to the declaration of ?
• simple.dtd: <!ELEMENT a (b)+> <!ELEMENT b EMPTY>
• Query: if (count(//b/(* | text()))=0) then "valid" else "invalid"
• valid.xml <a> </a>: =valid
• invalid.xml <a> Foo </a>: =invalid
• Can even “find” the errors!
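The same check can be sketched with Python's standard xml.etree.ElementTree, which has no text() node test, so child elements and text nodes are counted by hand. The sample documents are hypothetical, and, like the XPath above, the check covers only the “b is EMPTY” constraint, not “a contains one or more b”.

```python
import xml.etree.ElementTree as ET


def count_b_content(document: str) -> int:
    """Rough analogue of count(//b/(* | text())): anything found inside a b."""
    root = ET.fromstring(document)
    total = 0
    for b in root.iter("b"):
        total += len(b)                               # child elements of b
        total += 1 if b.text else 0                   # text before the first child
        total += sum(1 for child in b if child.tail)  # text after each child
    return total


def validate(document: str) -> str:
    """Report 'valid' when no b element has any content."""
    return "valid" if count_b_content(document) == 0 else "invalid"


# Hypothetical sample documents.
valid_doc = "<a><b/><b/><b/></a>"
invalid_doc = "<a><b/><b>Foo</b><b/></a>"
print(validate(valid_doc), validate(invalid_doc))
```

Returning the offending b elements instead of a count would give the “find the errors” behaviour mentioned above.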

XPath (etc) for Validation
• We could have finer control
  – Validate parts of a document
  – A la wildcards
    • But with more control!
• We could have greater expressivity
  – Far reaching dependencies
  – Computations
• Essentially, code based validation!
  – With XQuery and XSLT
  – But still a leetle declarative
• We always need it
• The essence of Schematron

Schematron
• A different sort of schema language
  – Not grammar or object/type based
  – Rule based
  – Test oriented
  – Complementary
• Conceptually simple
  – Patterns contain rules
    • Rules set a context and contain asserts and reports (A&Rs)
    • A&Rs contain
      – Tests, which are XPath expressions, and
      – Assertions, which are natural language descriptions

DTDx Schematron
• “Only 1 Element declaration with a given name”
  – (Ok, could handle this with Keys in XML Schema!)
<rule context="element">
  <let name="n" value="@name"/>
  <assert test="count(//element/name[text()=$n]) = 1"> There can be only one element declaration with a given name. </assert>
</rule>
• “Every element reference must have a corresponding element declaration”
<rule context="elementref">
  <let name="r" value="/ref/text()"/>
  <assert test="count(//element/name[text()=$r]) = 1"> There must be an element declaration (with the right name) for elementref to refer to. </assert>
</rule>

From HTML5: Exclusions
• HTML5 validator
• http://hsivonen.iki.fi/thesis/
  – Relax NG schema
  – Schematron assertions
  – Custom code
• Often want contextual exclusions
  – To break circles:
    • Paragraphs contain footnotes
    • Footnotes contain paragraphs
    • Footnote paragraphs may not contain footnotes
• Without exclusions, would need many paragraph productions

Exclusions Examples
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <ns prefix="h" uri="http://www.w3.org/1999/xhtml"/>
  <pattern name='dfn cannot nest'>
    <rule context="h:dfn">
      <report test="ancestor::h:dfn"> The "dfn" element cannot contain any nested "dfn" elements.</report>
    </rule>
  </pattern>
  <pattern name='noscript cannot nest'>
    <rule context="h:noscript">
      <report test="ancestor::h:noscript"> The "noscript" element cannot contain any nested "noscript" elements.</report>
    </rule>
  </pattern>
</schema>
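What such a report rule computes can be sketched in plain Python. The standard xml.etree.ElementTree has no ancestor axis, so the sketch walks the tree and remembers whether it is already inside the excluded element. The function name and sample documents are hypothetical; a real Schematron processor would evaluate the XPath test directly.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML = "{http://www.w3.org/1999/xhtml}"


def nested_reports(root: ET.Element, name: str) -> List[str]:
    """One report per `name` element that has a `name` ancestor."""
    reports: List[str] = []

    def walk(element: ET.Element, inside: bool) -> None:
        is_target = element.tag == XHTML + name
        if is_target and inside:  # the rule's test: ancestor::h:name
            reports.append(
                f'The "{name}" element cannot contain any nested "{name}" elements.'
            )
        for child in element:
            walk(child, inside or is_target)

    walk(root, False)
    return reports


# Hypothetical sample documents.
bad = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml"><dfn>a <dfn>b</dfn></dfn></p>'
)
good = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml"><dfn>a</dfn> <dfn>b</dfn></p>'
)
print(nested_reports(bad, "dfn"))
print(nested_reports(good, "dfn"))
```

Everything not matched by a rule is left alone, which is the “what isn't forbidden is permitted” behaviour of rule-based validation.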

Tip of the iceberg
• Computations
  – Using XPath functions and variables
• Dynamic checks
  – Can pull stuff from other file
• Elaborate reports
  – diagnostics has (value-of ed) expressions
  – “Generate paths” to errors
    • Sound familiar?
• General case
  – Thin shim over XSLT
  – Closer to “arbitrary code”

Interesting Points
• DTDx has a WXS
  – Schematron doesn’t care
  – Two phase validation
    • RELAX NG has a way of embedding
    • WXS 1.1 incorporating similar rules
• Arbitrary XPath for context and test
  – Plus variables!
• What isn’t forbidden is permitted
  – Unlike all the other schema languages!
  – We’re not performing runs
    • We’re firing rules
  – Somewhat easy to use
    • If you know XPath
    • If you don’t need coverage
  – What about analysis?

Schematron Presumes…
• …well formed XML
  – As do all XML schema languages
    • Work on DOM!
  – So can’t help with e.g., overlapping tags
    • Or tag soup in general
    • Namespace Analysis!?
• …authorial repair
  – At least, in the default case
    • Communicate errors to people
    • Thus, not the basis of a modern browser!
  – Unlike CSS
• Is this enough liberality?
  – Or rather, does it support enough liberality?

Take the following sample XHTML code:
<html>
  <head>
    <title>Hello!</title>
    <meta http-equiv="Content-Type" content="application/xhtml+xml" />
  </head>
  <body>
    Hello to you!
    Can you spot the problem?
  </body>
</html>
Slide due to Iain Flynn

HTML: XHTML: [images not preserved]
Slide due to Iain Flynn

Validation In The Wild
• HTML
  – 1%-5% of web pages are valid
  – Validation is very weak!
  – All sorts of breakage
    • E.g., overlapping tags
    • hi there, my good friend
• Syndication Formats
  – 10% feeds not well-formed
  – Where do the problems come from?
    • Hand authoring
    • Generation bugs
    • String concat based generation
    • Composition from random sources

More recently
In 2005, the developers of Google Reader (Google’s RSS and Atom feed parser) took a snapshot of the XML documents they parsed in one day.
• Approximately 7% of these documents contained at least one well-formedness error.
• Google Reader deals with millions of feeds per day.
  – That’s a lot of broken documents
Source: http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html
Slide due to Iain Flynn

[Figure not preserved. Surviving labels: Encoding, Structure, Entity, Typo, Text.]
Slide due to Iain Flynn

[Chart not preserved. Surviving category labels: Encoding, Entity, Structure, Typo; the percentages are not recoverable.]
Slide due to Iain Flynn

A Thought Experiment
• “Imagine...that all web browsers use strict XML parsers”
• “...that you were using a publishing tool that [was strict]”
  – “All of its default templates were valid XHTML.”
  – “It incorporated a nifty layout editor to ensure that you couldn’t introduce any invalid XHTML...”
• “You click ‘Publish’”
  – “the page that you...validly authored is now not well-formed”
• Problem: “a trackback with some illegal characters”
  – “...your publishing tool had a bug”
  – “The administration page itself tries to display the trackbacks you’ve received, and you get an XML processing error.”
http://diveintomark.org/archives/2004/01/14/thought_experiment

Real Life

Lesson #1
• We are dealing with socio-political (and economic) phenomena
  – Complex ones!
  – Many players; many sorts of player
  – Lots of historical specifics
  – Lots of interaction effects
• Human factors critical
  – What do people do (and why?)
  – How to influence them?
  – Affordances and incentives
  – Dealing with “bozos”
    • “There’s just no nice way to say this: Anyone who can’t make a syndication feed that’s well-formed XML is an incompetent fool.”

3 Error Handling Styles
• Draconian
  – Fail hard and fast
• Ignore errors
  – CSS, DTD ATTLISTs, HTML
• Hard coded DWIM repair
  – HTML, HTML5
• Every set of bytes has a corresponding (determinate) DOM
• Ultimately, (some) errors are propagated
  – The key is to fail correctly
    • In the right way, at the right time, for the right reason
  – With the right message!
• Better is to make errors unlikely!

COMP60411 Semi-structured Data and the Web Validating Trees against Tree Grammars The Essence of XML and Errors Bijan Parsia and Uli Sattler University of Manchester 1 Saturday, 29 October 2011 1 Last week... ...we have designed our first
