COMP6037 Semi-structured Data and the Web Uniqueness in Trees, Repercussion on interesting problems, and Graphs 5.2

Uli Sattler

University of Manchester

Clarification: a grammar, its language, and their types

We know
  • when a grammar is local: i.e., if none of its non-terminal symbols compete
  • given a grammar G, what the language (set of trees) L(G) of G is:
      L(G) := { t | t is a tree accepted by G }
  • what it means for a language (set of trees) L to be local: i.e., if we can find a local grammar G such that L = L(G)
  • hence, to find out whether L is local (and perhaps L is given through a grammar G, i.e., L = L(G)), you need to determine whether we can find/construct a local grammar F such that L = L(F)
  • ...the above works analogously if “local” is replaced with “single-type”



Clarification: a grammar, its language, and their types

  • Remember: we saw
    – G is not single-type
    – G’ is single-type: Author and BA still compete, but don’t occur together in a rule!
    – L(G’) = L(G)
  • hence L(G) is single-type!


G = (N, Σ, S, P) with
  N = {Book, Author, Editor, Affilia, Paper, F, L}
  Σ = {B, P, Name, F, L, A}
  S = {Book, Paper}
  P = { Book → B Editor|Author,
        Paper → P Author,
        Editor → Name F,L,
        Author → Name L,Affilia,
        F → F ε,
        L → L ε,
        Affilia → A ε }

G’ = (N’, Σ’, S’, P’) with
  N’ = {Book, Author, BA, Affilia, Paper, F, L}
  Σ’ = {B, P, Name, F, L, A}
  S’ = {Book, Paper}
  P’ = { Book → B BA,
         Paper → P Author,
         BA → Name (F,L)|(L,Affilia),
         Author → Name L,Affilia,
         F → F ε,
         L → L ε,
         Affilia → A ε }
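The notions “compete”, “local” and “single-type” can be checked mechanically. The following is a minimal sketch, assuming one production rule per non-terminal and content models written as plain strings; the dictionary encoding and the function names are illustrative choices, not part of the course material.

```python
import re
from itertools import combinations

# Each non-terminal maps to (terminal, content model over non-terminals).
# Assumption of this sketch: exactly one rule per non-terminal.
G = {
    "Book": ("B", "Editor|Author"),
    "Paper": ("P", "Author"),
    "Editor": ("Name", "F,L"),
    "Author": ("Name", "L,Affilia"),
    "F": ("F", ""),
    "L": ("L", ""),
    "Affilia": ("A", ""),
}
G_PRIME = {
    "Book": ("B", "BA"),
    "Paper": ("P", "Author"),
    "BA": ("Name", "(F,L)|(L,Affilia)"),
    "Author": ("Name", "L,Affilia"),
    "F": ("F", ""),
    "L": ("L", ""),
    "Affilia": ("A", ""),
}
START = {"Book", "Paper"}


def competing_pairs(rules):
    """Pairs of distinct non-terminals whose rules use the same terminal."""
    return {
        frozenset((x, y))
        for x, y in combinations(rules, 2)
        if rules[x][0] == rules[y][0]
    }


def is_local(rules):
    """Local: no two non-terminals compete at all."""
    return not competing_pairs(rules)


def is_single_type(rules, start):
    """Single-type: competing non-terminals never share a content model,
    and no two start symbols compete."""
    groups = [set(re.findall(r"\w+", model)) for _, model in rules.values()]
    groups.append(set(start))
    return not any(
        pair <= group for pair in competing_pairs(rules) for group in groups
    )
```

On the two grammars above, neither is local (Editor/Author, respectively BA/Author, compete), but only G’ passes the single-type check, because BA and Author never occur in the same rule.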

Things done so far

  • [structures] semi-structured data, XML, datamodels, trees
  • [description mechanisms] schema languages
    – of different styles, strengths, purposes
    – validation, validate-as, PSVIs
    – a useful abstraction: tree grammars
  • [‘difficult’ extensibility mechanism] namespaces, schemas
  • [interaction mechanisms] query languages, parsers,
    – possibly schema aware
    – namespace aware
  • error handling
  • [modelling] attributes vs elements, deep vs flat, ...
  • ...today:
    – we go back to [structures]: beyond trees, and
    – other ‘tasks’ around schemas
    – more modelling, human factors
    – exam preview


So far, there were trees everywhere

  • trees in semi-structured data

– apart from when object identifiers are used

  • parse trees from XML documents
  • DOM trees
  • infosets
  • XPath datamodel tree
  • trees that tree grammars run on

– and that Relax NG and Schematron work on

  • ...but is everything really a tree?

– e.g., you, your friends and family, and the relationships between them?


Trees and families: family trees!

  • Assume you want to work with/display/search/combine/... family trees
    – you are interested in genealogy
    – you work with a solicitor who handles inheritance cases
    – you study genetics
    – ....
  • easy:
    – information is patchy & varied, thus use XML
    – let’s build a DTD for this ...

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT family-tree (person | family)*>
<!ELEMENT person (name, birth?, death?, father?, mother?, note?)>
<!ELEMENT family (husband?, wife?, child*, marriage*, divorce*, note*)>
<!ELEMENT father (name, birth?, death?, father?, mother?, note?)>
<!ELEMENT mother (name, birth?, death?, father?, mother?, note?)>
...
<!ELEMENT name (firstname?, middle?, lastname)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT given (#PCDATA)>
....

(example taken from, and modified: http://penguin.dcs.bbk.ac.uk/academic/xml/family/index.php)

Trees and families: family trees!

  • in order to ensure
    – integrity: a person’s DoB should be the same regardless of where they occur in our tree
    – maintainability: when we change a person’s data (e.g., add DoD), we should only have to do it once
    we can make use of IDs & IDREFs:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % reference "person IDREF #REQUIRED">
<!ELEMENT family-tree (person | family)*>
<!ELEMENT person (name, birth?, death?, father?, mother?, note?)>
<!ELEMENT family (husband?, wife?, child*, marriage*, divorce*, note*)>
<!ELEMENT name (firstname?, middle?, lastname)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ATTLIST person
    id  ID      #REQUIRED
    sex (m | f) #IMPLIED>
<!ELEMENT father EMPTY>
<!ATTLIST father %reference;>
<!ELEMENT mother EMPTY>
<!ATTLIST mother %reference;>
<!ELEMENT wife EMPTY>
<!ATTLIST wife %reference;>
....

before:

<!ELEMENT father (name, birth?, death?, father?, mother?, note?)>
<!ELEMENT mother (name, birth?, death?, father?, mother?, note?)>

Trees and families: family trees!

  • things work nicely:
  • e.g., to retrieve all pairs of persons and their fathers, we can use a simple XQuery:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE family-tree SYSTEM "family.dtd">
<family-tree>
  <person id="p5" sex="m">
    <name>
      <firstname>Alfred Ernest</firstname>
      <lastname>Farmer</lastname>
    </name>
    <death>
      <place>Finsbury Park, London</place>
      <date>8 January, 1964</date>
    </death>
  </person>
  <person id="p6" sex="m">
    <name>
      <firstname>Ronald Alfred</firstname>
      <lastname>Farmer</lastname>
    </name>
    <birth>
      <place>London</place>
      <date>27 April, 1922</date>
    </birth>
    <death>
      <place>Hill House Nursing Home, Kenley, Surrey</place>
      <date>23 November, 2003</date>
    </death>
    <father person="p5"/>
  </person>
  <person id="p7" sex="f">
    <name>
      <firstname>Daisy May</firstname>
      <lastname>Farmer</lastname>
    </name>
    <death>

let $d := doc("family.xml")
for $p in $d//person
return
  <childAndParents>
    <child>{ $p/name }</child>
    { if ($p/father/@person != "")
      then <father>{ id($p/father/@person)/name }</father>
      else <fatherUnknown/> }
    { if ($p/mother/@person != "")
      then <mother>{ id($p/mother/@person)/name }</mother>
      else <motherUnknown/> }
  </childAndParents>

slide-3
SLIDE 3

Trees and families: family trees!

  • things work sort of nicely:
  • e.g., to retrieve fathers and the number of their children, we can still use an XQuery:
    ...we need “joins” on ID/IDREFs

let $d := doc("family.xml")
for $p1 in $d//person
let $n := fn:count($d//person/father[@person = $p1/@id])
where $n > 0
return
  <FathersAndNumberOfChildren>
    <father>{ $p1/name }</father>
    <NrOfChildren>{ $n }</NrOfChildren>
  </FathersAndNumberOfChildren>
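The same ID/IDREF “join” can be sketched outside XQuery. The snippet below is an illustration only: it uses Python's standard `xml.etree.ElementTree`, treats `id` and `person` as plain attributes (ElementTree does not read the DTD, so nothing is typed as ID/IDREF), and runs on a made-up three-person document rather than on the family.xml from the slides.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data in the shape of the family-tree DTD.
SAMPLE = """
<family-tree>
  <person id="a1" sex="m"><name><lastname>One</lastname></name></person>
  <person id="a2" sex="f"><name><lastname>Two</lastname></name>
    <father person="a1"/></person>
  <person id="a3" sex="m"><name><lastname>Three</lastname></name>
    <father person="a1"/></person>
</family-tree>
"""


def children_per_father(xml_text):
    """Map each referenced father id to the number of persons pointing at it."""
    root = ET.fromstring(xml_text)
    known_ids = {p.get("id") for p in root.iter("person")}
    # The "join": count father references, keep those that resolve to a person.
    counts = Counter(f.get("person") for f in root.iter("father"))
    return {pid: n for pid, n in counts.items() if pid in known_ids}
```

The explicit `known_ids` lookup is what `id(...)` does for us in XQuery; here we have to resolve the references ourselves.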

Trees and families: family trees!

  • things get a bit more tricky:
  • e.g., to retrieve all uncles and their nieces and nephews, we can still use an XQuery (the one below is incomplete!):
    ...we need a few “joins” on ID/IDREFs

let $d := doc("family.xml")
for $p1 in $d//person
for $p2 in $d//person
where $p1 != $p2
  and id(id($p2/father/@person)/father/@person) = id($p1/father/@person)
  and id($p2/father/@person) != $p1
return
  <UncleAndNiecesOrNephews>
    <uncle>{ $p1/name }</uncle>
    { if ($p2/@sex = "m")
      then <nephew>{ $p2/name }</nephew>
      else <niece>{ $p2/name }</niece> }
  </UncleAndNiecesOrNephews>

Trees and objects with identities

  • Remember in the first session, when we were talking about semi-structured data
    – we observed that, in the presence of object identifiers, the resulting data model isn’t tree-shaped
  • in DTDs, XSD, Relax NG, we can use ID/IDREFs as object identifiers
    – it’s supported by XPath, XQuery,...
    – it does the right thing
    – it’s conceptually slightly tricky: what is a node/child/... and what are objects with identifiers
  • in XSD, we have more powerful mechanisms...

Example:
{ persons:
  { person: &o1 { name: “John”, age: 47, relatives: { child: &o2, child: &o3 } },
    person: &o2 { name: “Mary”, age: 21, relatives: { father: &o1, sister: &o3 } },
    person: &o3 { name: “Paula”, age: 23, relatives: { father: &o1, sister: &o2 } } } }

XML schemas: uniqueness and key constraints

  • like a DB schema language, XML Schema allows us to express
    – key constraints: e.g., the combination of date-of-birth and place-of-birth is a key for persons
    – uniqueness constraints: e.g., each phone number should only be mentioned once in the list of contacts of a person
  • let’s see how this works in DTDs via ID/IDREFs:
    – they relate elements with elements, like pointers
    – IDs, IDREFs are attributes of elements, e.g.,
        <!ATTLIST myName identifier ID #REQUIRED>
        <!ATTLIST mybook owner IDREF #IMPLIED>
    – rather restrictive:
        1. only XML names can be used as IDs. Tricky if we want to transform a DB with its keys into an XML document. Tricky also because identity is string-identity...
        2. only one ID per element allowed!
        3. ID must be unique per document -- not per type
        4. we cannot build “composite keys”


XML Schema: uniqueness and key constraints

In XML Schema,

  • the idea of ID/IDREFs is extended
    – and ID/IDREFs are supported in slight disguise
  • we can have
    – keys of various types, e.g., integer or string
    – composite keys
    – context-sensitive keys
  • identity is based on “value identity”
    – e.g., for integers, 002 = 2
    – e.g., for strings, 002 != 2
    – compare this with IDs in DTDs!
  • keys are a bit more tricky (no pain, no gain) to declare and use
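The difference between value identity and string identity can be seen in a few lines. This is a toy sketch: `same_key` and its `key_type` parameter are hypothetical names, and only two types are distinguished.

```python
def same_key(a: str, b: str, key_type: str) -> bool:
    """Compare two lexical values the way a key of the given type would."""
    if key_type == "integer":
        # Value identity: the lexical forms 002 and 2 denote the same integer.
        return int(a) == int(b)
    # Strings (and DTD IDs) are compared character by character.
    return a == b
```

So two persons with myId values 002 and 2 would clash under an integer-typed key, but not under a string-typed key or a DTD ID.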


XML Schema: uniqueness

  • how can we make something a key or unique?
  • e.g., for our list of persons, myId
    – should identify a person
    – hence it must be unique
    – i.e., no two elements of PersonType have the same myId value
  • xs:key requires a myId value for all elements of PersonType
  • in case we cannot/do not want to guarantee this, use xs:unique
  • in general, xs:selector is a local XPath location path (some restrictions apply) that selects a set of elements E such that no 2 elements of E have the same value w.r.t. xs:field
    – hence all nodes in xs:field must have simple content

<xs:complexType name="PersonType">
  <xs:sequence>
    <xs:element name="fname" type="xs:string"/>
    <xs:element name="lname" type="xs:string"/>
  </xs:sequence>
  <xs:attribute name="myId" type="xs:int"/>
</xs:complexType>

<xs:complexType name="personList">
  <xs:sequence>
    <xs:element name="person" type="PersonType"
                minOccurs="1" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="myFriendList" type="personList">
  <xs:key name="myKey">
    <xs:selector xpath="person"/>
    <xs:field xpath="@myId"/>
  </xs:key>
</xs:element>

XML Schema: uniqueness

  • combined uniqueness? composite keys?
  • BTW, we can also use elements as keys
  • e.g., in our list of persons, DoBirth and PoBirth together shall be unique:
  • in general, we can have any number of xs:fields
  • and the uniqueness/key constraint imposes uniqueness of their combination

<xs:element name="myList" type="personList2">
  <xs:unique name="myKey2">
    <xs:selector xpath="persontype"/>
    <xs:field xpath="DoBirth"/>
    <xs:field xpath="PoBirth"/>
  </xs:unique>
</xs:element>

<xs:complexType name="persontype">
  <xs:sequence>
    <xs:element name="fname" type="xs:string"/>
    <xs:element name="lname" type="xs:string"/>
    <xs:element name="DoBirth" type="xs:date"/>
    <xs:element name="PoBirth" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="personList2">
  <xs:sequence>
    <xs:element name="persontype" type="persontype"
                minOccurs="1" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
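To see what such a composite uniqueness constraint demands of a document, here is a rough sketch of the check in Python. It is not how a schema validator is implemented: `check_unique` is a hypothetical helper, field values are compared as strings rather than as typed values (so xs:date value identity is not modelled), and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def check_unique(context, selector, fields):
    """Return the field-value combinations that occur more than once
    among the elements selected (by a simple path) below context."""
    seen, duplicates = set(), []
    for elem in context.findall(selector):
        values = tuple(elem.findtext(f) for f in fields)
        if None in values:
            # xs:unique only constrains elements for which all fields exist;
            # xs:key would report a missing field as an error instead.
            continue
        if values in seen:
            duplicates.append(values)
        seen.add(values)
    return duplicates


# Hypothetical instance document for the myList element.
DOC = ET.fromstring(
    "<myList>"
    "<persontype><fname>A</fname><lname>X</lname>"
    "<DoBirth>1990-01-01</DoBirth><PoBirth>Leeds</PoBirth></persontype>"
    "<persontype><fname>B</fname><lname>Y</lname>"
    "<DoBirth>1990-01-01</DoBirth><PoBirth>York</PoBirth></persontype>"
    "<persontype><fname>C</fname><lname>Z</lname>"
    "<DoBirth>1990-01-01</DoBirth><PoBirth>Leeds</PoBirth></persontype>"
    "</myList>"
)
```

Here the first and third entries agree on both DoBirth and PoBirth, so the combination is reported; sharing only the date of birth, as the second entry does, is fine.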


XML Schemas: keys and key references

  • alternatively to xs:key and xs:unique, we can use attributes of type xs:ID and then reference these from other elements
  • xs:ID is a subtype of NCName which we could further restrict
  • e.g., if we want to cross-reference in our friend list via “likes” and “marriedTo”, we use xs:ID as type for myId
  • as in DTDs,
    – values of ID type attributes must be unique across the document and all attributes of type ID
    – we can’t use other types such as xs:integer

<xs:complexType name="PersonType3">
  <xs:sequence>
    <xs:element name="fname" type="xs:string"/>
    <xs:element name="lname" type="xs:string"/>
  </xs:sequence>
  <xs:attribute name="myId" type="xs:ID"/>
  <xs:attribute name="marriedTo" type="xs:IDREF"/>
  <xs:attribute name="likes" type="xs:IDREFS"/>
  <xs:attribute name="knows" type="xs:IDREFS"/>
</xs:complexType>

<xs:complexType name="personList">
  <xs:sequence>
    <xs:element name="person" type="PersonType3"
                minOccurs="1" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="myFriendList" type="personList"/>


Do uniqueness/key constraints matter?

  • A little case study: schema emptiness
    A grammar G is empty if L(G) = ∅. A schema S is empty if the result G_S of its faithful translation into a grammar is empty.
  • E.g., each of these DTDs is empty:

      <!ELEMENT root (root+)>

      <!ELEMENT root (first, next+)>
      <!ELEMENT first (#PCDATA)>
      <!ELEMENT next (first, next+)>

  • do we care?
    – if your DTD/XSD/Relax NG schema is empty,
    – it can never have a document validate against it,
    – so it’s probably not what you had in mind when writing/buying it…
  • we will later see another reason

Schema Emptiness

  • Being able to detect schema emptiness would be helpful
    – not being empty is a minimal quality requirement of a schema
  • so, let’s design an algorithm TestEmpty that takes a schema/tree grammar G and reports “non-empty” if L(G) ≠ ∅ and “empty” if L(G) = ∅
  • let’s make our life easy:
    – start with local grammars (we know they correspond to the structural part of DTDs)

Input: a tree grammar G = (N, Σ, S, P)
       good, new-good: subsets of N
Init:  set good = { Ni | there is a production rule Ni → a e in P and ε ∈ L(e) }
           % start with good as those non-terminals that can label a leaf node
       set new-good = ∅
Repeat
       set new-good = { Ni ∈ N\good | there is a production rule Ni → a e in P
                                      and a string w ∈ good* with w ∈ L(e) }
           % compute those non-terminals not yet in good that can label a node whose
           % children are labelled with non-terminals from good only
       set good = good ∪ new-good
Until  new-good = ∅
If     good ∩ S ≠ ∅             % is any of the start symbols good?
then   report “non-empty”       % if yes, then G can run on some tree
else   report “empty”           % if no, then G can’t run (successfully) on any tree
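The fixpoint computation of TestEmpty can be written down directly. The sketch below makes some assumptions of its own: content models are encoded as nested tuples, #PCDATA is treated like the empty word, and the function names are made up. Starting the loop with an empty good set yields the Init step as its first iteration.

```python
# Content models as nested tuples (an encoding chosen for this sketch):
#   ("eps",)  ("sym", "X")  ("seq", e1, e2, ...)  ("alt", e1, e2, ...)
#   ("star", e)  ("plus", e)  ("opt", e)

def has_word_over(e, good):
    """True iff L(e) contains a word that uses only non-terminals in good."""
    kind = e[0]
    if kind == "eps":
        return True
    if kind == "sym":
        return e[1] in good
    if kind == "seq":
        return all(has_word_over(x, good) for x in e[1:])
    if kind == "alt":
        return any(has_word_over(x, good) for x in e[1:])
    if kind in ("star", "opt"):
        return True  # the empty word is always available
    if kind == "plus":
        return has_word_over(e[1], good)
    raise ValueError(f"unknown content model: {kind}")


def check_emptiness(rules, start):
    """rules: list of (non_terminal, terminal, content_model) triples."""
    good = set()
    while True:
        new_good = {
            n for n, _a, e in rules
            if n not in good and has_word_over(e, good)
        }
        if not new_good:
            break
        good |= new_good
    return "non-empty" if good & set(start) else "empty"


# <!ELEMENT root (root+)>
DTD1 = [("Root", "root", ("plus", ("sym", "Root")))]
# <!ELEMENT root (first, next+)> <!ELEMENT first (#PCDATA)>
# <!ELEMENT next (first, next+)>
DTD2 = [
    ("Root", "root", ("seq", ("sym", "First"), ("plus", ("sym", "Next")))),
    ("First", "first", ("eps",)),
    ("Next", "next", ("seq", ("sym", "First"), ("plus", ("sym", "Next")))),
]
```

Both example DTDs from the emptiness slide come out as “empty”; relaxing the last rule to `next (first, next*)` makes Next, and then Root, good.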


L(G) = ∅ if and only if TestEmpty(G) reports “empty”

  • how complex is the algorithm?
  • in terms of space:
    – it maintains G, good, new-good
    – thus linear (and thus polynomial) in size of G
  • in terms of time:
    – assume that computation of good, new-good is polynomial
    – verify that the loop is run at most |N| times
    – thus polynomial in size of G

TestEmpty is polynomial for DTDs/local grammars


Back to study: do ID/IDREFS matter for schema emptiness?

  • Now assume you want to design TestEmpty for DTDs with ID and IDREFs: it takes such a DTD and reports “non-empty” if L(G) ≠ ∅ and “empty” if L(G) = ∅

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT root (next,next)>
<!ELEMENT next (deepest)>
<!ATTLIST next myID ID #REQUIRED>
<!ELEMENT deepest (#PCDATA)>
<!ATTLIST deepest ref1 IDREF #FIXED "v1">
<!ATTLIST deepest ref2 IDREF #FIXED "v2">
<!ATTLIST deepest ref3 IDREF #FIXED "v3">

  • ID/IDREFs have impact on emptiness: the above DTD is empty! (a document has exactly two next elements and hence at most two ID values, but each deepest element carries three IDREFs with the distinct fixed values v1, v2, v3, each of which needs a matching ID)
  • In fact, deciding emptiness of DTDs with ID/IDREFs is considerably harder than without those...

Back to study: do ID/IDREFS matter for schema emptiness?

deciding emptiness of DTDs with ID/IDREFs is NP-hard.

  • Proof: we take a known NP-hard problem, satisfiability of Boolean formulae (in a special normal form called 3CNF), and build a “cheap”, size-preserving Translator
    – that takes a Boolean formula f and
    – transforms f into a DTD D(f) (with ID/IDREFs) such that
        L(D(f)) ≠ ∅ if and only if f is satisfiable
    – hence, by chaining Translator and TestEmpty, we have an algorithm to test satisfiability of Boolean formulae:
        Boolean formula f → Translator → DTD D(f) with ID/IDREFs → TestEmpty →
          “non-empty”, if L(D(f)) ≠ ∅, and thus f satisfiable
          “empty”, if L(D(f)) = ∅, and thus f unsatisfiable
    – since we know that
        • no algorithm for satisfiability of Boolean formulae runs in polynomial time (unless P = NP)
        • our Translator runs in polynomial time,
      TestEmpty must be super-polynomial
  • ...and thus intractable!

Back to study: do ID/IDREFS matter for schema emptiness?

  • Let’s build the Translator:
  • remember, it must be such that L(D(f)) ≠ ∅ if and only if f is satisfiable
  • rather than building it, let’s see an example f and D(f):

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT root (bindings, clauses)>
<!-- Here is the Henry Thompson and Richard Tobin's translation of
       ( v1 or not(v2) or v3) and
       ( not(v1) or v3 or v4) and
       ( not(v3) or not(v4) or v2) -->
<!ELEMENT bindings (v1, (v1y|v1n), v2, (v2y|v2n), v3, (v3y|v3n), v4, (v4y|v4n))>
<!ELEMENT clauses (( v1y | v2n | v3y ),
                   ( v1n | v3y | v4y ),
                   ( v3n | v4n | v2y ))>
<!ELEMENT v1 EMPTY>
<!ATTLIST v1 value ID #REQUIRED>
<!ELEMENT v2 EMPTY>
<!ATTLIST v2 value ID #REQUIRED>
<!ELEMENT v3 EMPTY>
<!ATTLIST v3 value ID #REQUIRED>
<!ELEMENT v4 EMPTY>
<!ATTLIST v4 value ID #REQUIRED>
<!ELEMENT v1y EMPTY>
<!ATTLIST v1y value IDREF #FIXED "v1true">
<!ELEMENT v1n EMPTY>
<!ATTLIST v1n value IDREF #FIXED "v1false">
<!ELEMENT v2y EMPTY>
<!ATTLIST v2y value IDREF #FIXED "v2true">
<!ELEMENT v2n EMPTY>
<!ATTLIST v2n value IDREF #FIXED "v2false">
<!ELEMENT v3y EMPTY>
<!ATTLIST v3y value IDREF #FIXED "v3true">
<!ELEMENT v3n EMPTY>
<!ATTLIST v3n value IDREF #FIXED "v3false">
<!ELEMENT v4y EMPTY>
<!ATTLIST v4y value IDREF #FIXED "v4true">
<!ELEMENT v4n EMPTY>
<!ATTLIST v4n value IDREF #FIXED "v4false">


  • a document that validates against D(f), and thus witnesses that f is satisfiable:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root SYSTEM "DTD-conformance-NPhard.dtd">
<root>
  <!-- Here is an assignment that makes
         ( v1 or not(v2) or v3) and
         ( not(v1) or v3 or v4) and
         ( not(v3) or not(v4) or v2)
       true -->
  <bindings>
    <v1 value="v1false"/><v1n/>
    <v2 value="v2false"/><v2n/>
    <v3 value="v3false"/><v3n/>
    <v4 value="v4false"/><v4n/>
  </bindings>
  <clauses>
    <v2n/>
    <v1n/>
    <v3n/>
  </clauses>
</root>
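For illustration, a Translator in the spirit of the example can be sketched in a few lines. This is a guess at the general pattern behind the single example D(f) shown on the slides, not the construction from the original proof; the function name and the integer encoding of literals (k for v_k, -k for not(v_k)) are assumptions.

```python
def cnf_to_dtd(num_vars, clauses):
    """Build a DTD D(f) for a CNF formula f.

    clauses: list of tuples of non-zero ints; k stands for v_k and
    -k for not(v_k). The output follows the shape of the example DTD.
    """
    def literal(k):
        return f"v{abs(k)}{'y' if k > 0 else 'n'}"

    lines = ["<!ELEMENT root (bindings, clauses)>"]
    # One binding per variable: its ID value fixes which of v_i y / v_i n
    # can be used with a resolvable IDREF.
    bindings = ", ".join(
        f"v{i}, (v{i}y|v{i}n)" for i in range(1, num_vars + 1)
    )
    lines.append(f"<!ELEMENT bindings ({bindings})>")
    # One choice group per clause: pick a literal that is true.
    body = ", ".join(
        "(" + " | ".join(literal(k) for k in clause) + ")"
        for clause in clauses
    )
    lines.append(f"<!ELEMENT clauses ({body})>")
    for i in range(1, num_vars + 1):
        lines.append(f"<!ELEMENT v{i} EMPTY>")
        lines.append(f"<!ATTLIST v{i} value ID #REQUIRED>")
        for suffix, word in (("y", "true"), ("n", "false")):
            lines.append(f"<!ELEMENT v{i}{suffix} EMPTY>")
            lines.append(
                f'<!ATTLIST v{i}{suffix} value IDREF #FIXED "v{i}{word}">'
            )
    return "\n".join(lines)
```

Calling `cnf_to_dtd(4, [(1, -2, 3), (-1, 3, 4), (-3, -4, 2)])` produces declarations equivalent to the example above. The output grows linearly with the formula, which is the “cheap, size-preserving” property the proof relies on.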


Back to study: do ID/IDREFS matter for schema emptiness?

  • So, we learned that testing emptiness of DTDs
    – without ID/IDREFs is ‘simple’: implementable in polynomial space and time
    – with ID/IDREFs is ‘hard’:
        • not implementable in polynomial time: it requires exponential time (unless P=NP) or a non-deterministic algorithm
        • but still requires only polynomial space
  • what without #FIXED ?
    – don’t know yet
  • what about key/uniqueness constraints in XSD?
    – even worse: undecidable (we skip proof)
    – i.e., no chance of building a TestEmpty that takes an XSD schema with key/uniqueness constraints and reports “non-empty” if L(G) ≠ ∅ and “empty” if L(G) = ∅

So ID/IDREFS matter for schema emptiness

  • Do we care? We should:
    1. it’s useful to know/check whether a schema is empty
    2. schema emptiness is the easy/a special case of schema containment

A schema S1 is contained in a schema S2 (written S1 ⊆ S2) if L(S1) ⊆ L(S2).

  • If S1 ⊆ S2, then every document that validates against S1 also validates against S2.
  • Interesting, e.g., if you have S and want to
    – refine it, to S2: make sure that S2 ⊆ S, or
    – generalize it, to S3: make sure that S ⊆ S3

Schema containment

A schema S1 is contained in a schema S2 (written S1 ⊆ S2) if L(S1) ⊆ L(S2).

  • Let’s see an example: S1 ⊆ S2

S1:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="A"/>
  <xs:complexType name="A">
    <xs:sequence>
      <xs:element name="Next" maxOccurs="7" minOccurs="1" type="C"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="C">
    <xs:restriction base="xs:string">
      <xs:maxLength value="7"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

S2:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="A"/>
  <xs:complexType name="A">
    <xs:sequence>
      <xs:element name="Next" maxOccurs="8" minOccurs="0" type="C"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="C">
    <xs:restriction base="xs:string">
      <xs:maxLength value="27"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
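For schemas of exactly this shape (a root holding between minOccurs and maxOccurs Next elements, each a string of bounded length), containment boils down to comparing three numbers. The sketch below covers only that special case; `ListSchema` and `contained` are illustrative names, and nothing here generalises to arbitrary XSD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListSchema:
    """root with min_occurs..max_occurs Next children,
    each a string of at most max_length characters."""
    min_occurs: int
    max_occurs: int
    max_length: int


def contained(s1, s2):
    """True iff every document accepted by s1 is accepted by s2."""
    if s1.min_occurs > s1.max_occurs:
        return True  # s1 accepts no document at all
    occurs_ok = (
        s2.min_occurs <= s1.min_occurs and s1.max_occurs <= s2.max_occurs
    )
    # The length bound only matters if s1 allows at least one Next element.
    length_ok = s1.max_occurs == 0 or s1.max_length <= s2.max_length
    return occurs_ok and length_ok


S1 = ListSchema(min_occurs=1, max_occurs=7, max_length=7)
S2 = ListSchema(min_occurs=0, max_occurs=8, max_length=27)
```

With these values, `contained(S1, S2)` holds, while `contained(S2, S1)` does not: S2 accepts, for instance, a root without any Next child.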

Schema containment

  • I claimed before that “Schema emptiness is the easy/a special case of schema containment”
  • why?
  • because S is empty if and only if S ⊆ <!ELEMENT root (root+)>, i.e., if S is contained in a schema we know to be empty
  • thus any schema containment tester can be used to test schema emptiness:
    – Test Containment takes schemas S1 and S2 and reports “contained”, if S1 ⊆ S2, and “not contained” otherwise
    – to test whether S is empty, run Test Containment on S and <!ELEMENT root (root+)>: “contained” means “empty”, “not contained” means “not empty”


Schema containment

  • So we now know that “Schema emptiness is the easy/a special case of schema containment”
  • hence schema containment is at least as complex as schema emptiness
    – i.e., possibly sensitive to ID/IDREFs, key constraints, etc
    – NP-hard with ID/IDREFs
    – undecidable with uniqueness/key constraints

One more: schema containment vs subsumption

  • Remember:
    A schema S1 is contained in a schema S2 (written S1 ⊆ S2) if L(S1) ⊆ L(S2).
  • ...what happens
    – in the presence of types
    – with validates as?

A schema S1 is subsumed by a schema S2 (written S1 ≤ S2) if
    1. L(S1) ⊆ L(S2), i.e., S1 ⊆ S2, and
    2. for all trees T ∈ L(S1), for all nodes n in T: if n validates as type X against S1, then n validates as type X against S2.

  • Hence if S1 ≤ S2 and T validates against S1, then validation of T against S1 produces the same PSVI as validation of T against S2

Schema subsumption: why do we care?

  • Useful for the same reasons as schema containment:
    – e.g., if you have S, and make use of PSVI, and want to
        • refine it, to S2: make sure that S2 ≤ S, or
        • generalize it, to S3: make sure that S ≤ S3
  • Aren’t schema containment and schema subsumption the same?
    – no: S1 ≤ S2 implies S1 ⊆ S2 (easy, by definition)
    – but S1 ⊆ S2 and not S1 ≤ S2 is possible, e.g.:

S1:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="A"/>
  <xs:complexType name="A">
    <xs:sequence>
      <xs:element name="Next" maxOccurs="7" minOccurs="1" type="C"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="C">
    <xs:restriction base="xs:string">
      <xs:maxLength value="7"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

S2:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="A1"/>
  <xs:complexType name="A1">
    <xs:sequence>
      <xs:element name="Next" maxOccurs="8" minOccurs="0" type="C1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="C1">
    <xs:restriction base="xs:string">
      <xs:maxLength value="27"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Summary

  • ID/IDREFS, uniqueness/key constraints
    – were introduced to XML schema languages to handle non-tree data
    – so that consistency/maintainability is made easier
    – (remember family tree)
  • works mostly nicely
    – but has serious consequences
    – e.g., on complexity of algorithms/services around schemas,
    – e.g., schema emptiness, containment, subsumption
  • ...how would we need to modify ValAlgo (input: XML doc/tree T and grammar G; output: “yes”, if T ∈ L(G), “no”, otherwise) to ensure ID/IDREFs, keys, etc?
    – at least we need to keep track of all values of attributes in T of ID/IDREF type
    – hence the algorithm
        • can no longer run in linear space w.r.t. depth of T but
        • requires at least linear space w.r.t. size of T
        • (remember the SAX/DOM tree memory requirement discussion?)
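The extra bookkeeping can be made concrete with a streaming pass over the document. The following sketch assumes we are told which (element, attribute) pairs are of ID or IDREF type, since Python's `xml.etree.ElementTree` does not take this from a DTD; the function name and parameters are hypothetical, and it checks only the ID/IDREF constraints, not the grammar.

```python
import io
import xml.etree.ElementTree as ET


def ids_consistent(xml_text, id_attrs, idref_attrs):
    """Stream over the document; True iff all ID values are distinct
    and every IDREF value matches some ID value.

    id_attrs / idref_attrs: sets of (element name, attribute name) pairs.
    """
    ids, refs = set(), set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "end":
            elem.clear()  # drop the element's content once it is finished
            continue
        for name, value in elem.attrib.items():
            if (elem.tag, name) in id_attrs:
                if value in ids:
                    return False  # duplicate ID
                ids.add(value)
            elif (elem.tag, name) in idref_attrs:
                refs.add(value)
    # References may point forwards, so they can only be resolved at the end.
    return refs <= ids
```

Even though elements are discarded as soon as they are closed, the two sets `ids` and `refs` can grow with the number of elements in T, which is the linear-in-size space requirement mentioned above.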


Graphs in Trees?

  • Trees are great, even for ‘graphs’:
    – the tree back-bone provides a context for the other edges
    – we have a root node
    – we know a node’s subtree and
    – its unique path to the root node
    – e.g., if I find <name>John Smith</name> in my XML tree, and
        • if its ancestor is a <person>, then...
        • if its ancestor is a <brewery>, then...
    – for OWL/XML, this allows us to use an ‘untyped’ syntax (remember you had, e.g., “ObjectSomeValuesFrom”, “DataSomeValuesFrom” and “ObjectProperty”, “DataProperty”, and everything needed to be declared)
    – from a node’s context, we can disambiguate

SubClassOf( a:Father SomeValuesFrom( a:P a:C ) )

<SubClassOf>
  <OWLClass URI="Father"/>
  <SomeValuesFrom>
    <Property URI="hasChild"/>
    <OWLClass URI="Human"/>
  </SomeValuesFrom>
</SubClassOf>

Graphs in Trees?

  • Trees are great, even for ‘graphs’:
    – the tree back-bone provides a context for the other edges
    – we know a node’s subtree and its path to the root node
    – e.g., we can ask for/copy/pass a node’s subtree and path-to-root node
    – this possibly gathers all relevant information for that node
    – think of XSLT’s <xsl:copy-of />
    – in a graph, there is no
        • root node
        • subtree
        • ancestors
    – in a graph, we have
        • paths and
        • need to make sure ourselves if they should not repeat nodes and/or arcs

Graphs

  • Still, various applications come with
    – a good notion of identity
        • e.g., NIN for UK residents
    – and strong requirements for
        • linking nodes in arbitrary ways
        • without use/necessity/love for tree back-bone
  • This motivated the design of graph based formalisms
    – e.g., RDF
    – an RDF graph is a set of triples <subject, predicate, object>
    – e.g., <Bijan, teaches-after, Uli>
  • more later today from Bijan
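A set of triples is easy to model directly. The sketch below represents the example graph as a Python set of tuples and adds a reachability function that, as noted for graphs in general, has to take care itself not to loop on cycles; it is an illustration only and does not use any RDF library.

```python
# The example graph: a set of (subject, predicate, object) triples.
GRAPH = {
    ("Bijan", "teaches-after", "Uli"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def reachable(graph, start):
    """All nodes reachable from start along arcs, never revisiting a node."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for s, _p, o in graph:
            if s == node and o not in seen:
                seen.add(o)  # the 'seen' set is what keeps cycles finite
                todo.append(o)
    return seen
```

Unlike in a tree, there is no root to start from and no guarantee of termination without the explicit `seen` set.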

[Figure: RDF graph with two nodes, Bijan and Uli, connected by an arc labelled teaches-after]

Thanks for your attention and hard work. I hope you learned a lot, got ideas for new and interesting concepts & questions, also saw a bit how ‘formal methods’ can be useful, saw how seemingly straightforward things (like trees) can become quite...tricky, and enjoyed the experience. I did.