The XML Typechecking Problem. Dan Suciu, University of Washington

The XML Typechecking Problem
Dan Suciu, University of Washington
Presented by T.J. Green*, University of Pennsylvania
February 19, 2004
* with LaTeX slides!

XML Data Model
Subset of the XQuery data model: XML documents are ordered trees with labels at nodes. More precisely, fix an alphabet Σ of tag names, attribute names, and atomic type names. Denote by T_Σ the set of ordered trees in which each node is labeled with an element of Σ.

XML Types
A type is a subset of T_Σ that is a regular tree language. Formally, a type is defined by a set of type identifiers T together with a map that associates to each identifier a regular expression over Σ × T.

XML Types - an example
Here is an example, using XQuery’s syntax.
    TYPE Catalog  = ELEMENT catalog(Products)
    TYPE Products = (ELEMENT product(Product))*
    TYPE Product  = (ATTRIBUTE name(STRING)?,
                     (ELEMENT mfr-price(INTEGER) | ELEMENT sale-price(INTEGER))*,
                     (ELEMENT color(STRING))*)

Expressiveness of type formalism
Of course, this formalism does not capture all the details of a real XML type system. But it is actually more powerful than XML Schema or DTDs in one respect.

Expressiveness of type formalism
Consider the set of pairs (σ, t) ∈ Σ × T that occur in the regular expression for some type identifier. XML Schema requires that σ be a key in this collection, that is, within one regular expression each tag σ determines its type t. DTD requires that σ be a key in the entire collection of pairs in all regular expressions.

Regular tree languages
Regular tree languages have been extensively studied for ranked trees (i.e., trees where the number of children of a node is fixed). But the XML data model is unranked.

Modified regular tree languages
Various equivalent modifications can handle this: extending tree automata to unranked trees; using specialized DTDs; mapping unranked trees into ranked binary trees; defining types as in XDuce or XQuery. Here we use the XQuery-style regular types.

Containment of regular tree languages
Key property of regular tree languages: given two types τ1, τ2, one can check whether τ1 ⊆ τ2. The complexity is high in general (EXPTIME-complete), but the check is in PTIME if τ2 corresponds to a deterministic tree automaton.

The Validation Problem
Given a tree t ∈ T_Σ and a type τ, decide whether t ∈ τ. But what if, instead of a document, we are given a program whose output is an XML document?
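A minimal sketch of validation for the Catalog type from the earlier example. It assumes a DTD-like simplification in which each label determines its content model, so validation can proceed top-down; the tuple encoding of trees, the `@name` label for the attribute, and the names `CONTENT`, `valid`, and `in_catalog_type` are all hypothetical choices made for illustration, with Python's `re` standing in for the regular expressions over child labels.

```python
import re

# Hypothetical encoding: a tree is (label, [children]); atomic values are
# plain Python str or int leaves. Attributes are modelled as children whose
# label starts with "@".
# Each label maps either to an atomic type name or to a regular expression
# over the space-terminated labels of its children.
CONTENT = {
    "catalog": r"(product )*",
    "product": r"(@name )?((mfr-price|sale-price) )*(color )*",
    "@name": "STRING",
    "mfr-price": "INTEGER",
    "sale-price": "INTEGER",
    "color": "STRING",
}
ATOMIC = {"STRING": str, "INTEGER": int}


def valid(tree):
    """Check one node against its content model, then recurse into children."""
    label, children = tree
    model = CONTENT.get(label)
    if model is None:
        return False  # unknown label
    if model in ATOMIC:
        # Atomic content: exactly one leaf value of the right Python type.
        return len(children) == 1 and type(children[0]) is ATOMIC[model]
    if not all(isinstance(c, tuple) for c in children):
        return False  # a bare value where element children were expected
    word = "".join(c[0] + " " for c in children)
    return re.fullmatch(model, word) is not None and all(valid(c) for c in children)


def in_catalog_type(tree):
    """t ∈ Catalog: the root must be a catalog element and every node must be valid."""
    return tree[0] == "catalog" and valid(tree)
```

For instance, a product listing its color before its mfr-price is rejected, because the word of child labels no longer matches the content model of product. Types in which one label can carry several different types need a bottom-up (automaton-style) check instead, which this sketch does not attempt.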

The Typechecking Problem
Given a program P defining a function P : D → T_Σ, where D is the program’s input domain, and a type τ ⊆ T_Σ, decide whether ∀x ∈ D, P(x) ∈ τ.

The Typechecking Problem
So the typechecker analyzes the program, decides whether all documents produced by the program are valid, and returns yes or no. If no, we would also like to know where in the program typechecking failed. (This may be hard, though.)

The Typechecking Problem
Typechecking may not even be possible, in which case we may need to settle for an incomplete typechecker, which may reject some programs that in fact do typecheck.

The Type Inference Problem
A kind of dual of the typechecking problem. Given a program P, compute the type P(D) = {P(x) | x ∈ D}. Again, perfect type inference may not be possible, and we may need to settle for incomplete type inference.

What kind of programs?
We consider two different kinds of programs, depending on the application.

Application 1 - XML Publishing
Here the XML document is a view over a relational database. The program’s domain is D = Inst(S), the set of all database instances of some schema S. S may contain key and foreign key constraints. P may perform only simple select-project-join queries on the database, nest the results, and add appropriate XML tags.

Application 1 - XML Publishing
Consider some database whose schema S is defined as follows.
    product(pid:STRING, name:STRING, mfrprice:INTEGER),
    colors(cid:STRING, pid:STRING, color:STRING),
    sale(sid:STRING, pid:STRING, price:INTEGER)
The first attribute of each relation is a key. Foreign key constraints are suggested by attribute names.

Application 1 - XML Publishing
Now, here is an example of an XQuery program that produces an XML view of this database.
    <catalog>
    { FOR $p in $db/product/tuple
      RETURN
        <product name = { data($p/name) }>
          <mfr-price> { data($p/mfrprice) } </mfr-price>
          { FOR $s in $db/sale/tuple
            WHERE $p/pid = $s/pid
            RETURN <sale-price> { data($s/price) } </sale-price> }
          { FOR $c in $db/colors/tuple
            WHERE $p/pid = $c/pid
            RETURN <color> { data($c/color) } </color> }
        </product> }
    </catalog>
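To see concretely what such a view computes, here is a rough Python transcription of the same select-project-join-and-nest logic. It is only a sketch: the dictionary-of-rows encoding of the database, the sample rows, and the function name `publish` are assumptions made for illustration, and the XML is built with the standard `xml.etree.ElementTree` module rather than by an XQuery engine.

```python
import xml.etree.ElementTree as ET


def publish(db):
    """Build the catalog view: one product element per product tuple,
    with its sale prices and colors nested inside (joined on pid)."""
    catalog = ET.Element("catalog")
    for p in db["product"]:
        product = ET.SubElement(catalog, "product", {"name": p["name"]})
        ET.SubElement(product, "mfr-price").text = str(p["mfrprice"])
        for s in db["sale"]:
            if s["pid"] == p["pid"]:
                ET.SubElement(product, "sale-price").text = str(s["price"])
        for c in db["colors"]:
            if c["pid"] == p["pid"]:
                ET.SubElement(product, "color").text = c["color"]
    return catalog


# Hypothetical sample instance of the schema S.
db = {
    "product": [{"pid": "p1", "name": "lamp", "mfrprice": 30}],
    "sale": [{"sid": "s1", "pid": "p1", "price": 25}],
    "colors": [
        {"cid": "c1", "pid": "p1", "color": "red"},
        {"cid": "c2", "pid": "p1", "color": "blue"},
    ],
}

print(ET.tostring(publish(db), encoding="unicode"))
```

Every product element produced this way has exactly one name attribute and one mfr-price child, followed by its sale-price children and then its color children.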

Application 2 - XML Transformations
The other class of applications we consider consists of those that require XML transformations. Here, the program’s input is an XML document; that is, the program’s domain D is either T_Σ or some XML type τ. The output is another XML document.

Application 2 - XML Transformations
We take as our programming language a restricted fragment of XSLT that includes:
• recursive templates
• modes
• apply-templates can be called along any XPath axis
• variables can be bound to nodes in the input tree, then passed as parameters
• an equality test can be performed between node IDs, but not between node values

Application 2 - XML Transformations
We can formalize this language in terms of k-pebble tree transducers. That formalism is beyond the scope of this talk.

Type Checking or Type Inference?
One way to perform typechecking is by using type inference: infer the output type τ1 of the program, then check that it is contained in the desired output type τ2, i.e. that τ1 ⊆ τ2. We’ll first consider type inference.

Type Inference
Consider the XQuery program shown a few slides back. We humans can infer its output type as
    TYPE T1 = ELEMENT catalog(T2)
    TYPE T2 = (ELEMENT product(T3))*
    TYPE T3 = ATTRIBUTE name(STRING),
              ELEMENT mfr-price(INTEGER),
              (ELEMENT sale-price(INTEGER))*,
              (ELEMENT color(STRING))*
How? The catalog tag at the root is obvious (T1). It has several product children (T2). Analyze the RETURN clause: product has exactly one name attribute, one mfr-price child, and several sale-price and color children (T3).

Type Inference
More programmatically, the general idea is that one infers the type of a RETURN expression from the types of its components. The XQuery formal semantics applies this to the entire language by providing type inference rules for each language construct. Type inference is used to perform typechecking in XQuery.

Type Inference
For the XML publishing application, we actually need an enhancement to make use of key and foreign key constraints in order to infer the correct output type. For example, knowing that pid is also a key for sale (each product has at most one sale price) narrows T3 by replacing (ELEMENT sale-price(INTEGER))* with (ELEMENT sale-price(INTEGER))?.

Limitations of Type Inference
Suppose the relational schema has a single table, R(x,y), and the XQuery program is:
    <result>
    { FOR $x in $db/R/tuple RETURN <a/>,
      FOR $x in $db/R/tuple RETURN <b/> }
    </result>

Limitations of Type Inference
XQuery infers its output type as
    TYPE T = ELEMENT result((ELEMENT a)*, (ELEMENT b)*)
but the real output type is
    P(D) = { ELEMENT result((ELEMENT a)^n, (ELEMENT b)^n) | n ≥ 0 }
since we have the same number of a’s and b’s. But obviously this is not a regular tree language, so we cannot hope to infer it, and must settle for T instead.
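The gap between the inferred type and the real output can be reproduced on a toy scale. The sketch below assumes a two-construct mini-language (element constructors and FOR loops over the single table R); the classes `Elem` and `For` and the functions `infer` and `run` are hypothetical, and the inference rules are a simplification in the spirit of compositional inference, not the rules of the XQuery formal semantics.

```python
from dataclasses import dataclass


@dataclass
class Elem:
    """<tag> { body } </tag>, where body is a sequence of sub-expressions."""
    tag: str
    body: tuple = ()


@dataclass
class For:
    """FOR $x in $db/R/tuple RETURN body (the body ignores $x here)."""
    body: object


def infer(expr):
    """Compositional inference: an element wraps the types of its parts,
    and a FOR loop stars the type of its body."""
    if isinstance(expr, Elem):
        inner = ", ".join(infer(e) for e in expr.body)
        return f"ELEMENT {expr.tag}({inner})" if inner else f"ELEMENT {expr.tag}"
    if isinstance(expr, For):
        return f"({infer(expr.body)})*"
    raise TypeError(f"unknown expression: {expr!r}")


def run(expr, rows):
    """Evaluate on an instance of R; returns a list of (tag, children) trees."""
    if isinstance(expr, Elem):
        children = []
        for e in expr.body:
            children.extend(run(e, rows))
        return [(expr.tag, children)]
    if isinstance(expr, For):
        out = []
        for _ in rows:  # one copy of the body per tuple of R
            out.extend(run(expr.body, rows))
        return out
    raise TypeError(f"unknown expression: {expr!r}")


program = Elem("result", (For(Elem("a")), For(Elem("b"))))
print(infer(program))  # ELEMENT result((ELEMENT a)*, (ELEMENT b)*)
```

Because each FOR loop is typed on its own, the rule for FOR cannot record that both loops range over the same table, so the equal counts visible in every output of `run` are lost in the inferred type.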

Limitations of Type Inference
But T is an ad hoc choice, and now we incorrectly fail to typecheck with respect to the output type
    T1 = ELEMENT result() | ELEMENT result(ELEMENT a, (ELEMENT a)*, ELEMENT b, (ELEMENT b)*)
In reality the program typechecks against this type, because T1 just rules out the cases of (0 a’s, 1+ b’s) and (1+ a’s, 0 b’s), which the program never produces. Yet the typechecker rejects it, because T ⊈ T1.
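The failed containment can be observed directly on the words of child labels under result. This sketch assumes the two content models are written as Python regular expressions over the one-letter labels a and b (`T` and `T1` below are those stand-ins), and it only enumerates words up to length 4, so it can exhibit counterexamples but does not decide containment in general.

```python
import re
from itertools import product

T = re.compile(r"a*b*")          # content model of the inferred type T
T1 = re.compile(r"(?:aa*bb*)?")  # content model of the desired type T1

# All words over {a, b} of length at most 4.
words = ["".join(w) for n in range(5) for w in product("ab", repeat=n)]

# Words allowed by T but not by T1: witnesses that T is not contained in T1.
witnesses = [w for w in words if T.fullmatch(w) and not T1.fullmatch(w)]
print(witnesses)

# Every actual output of the program has the form a^n b^n and lies in T1.
assert all(T1.fullmatch("a" * n + "b" * n) for n in range(10))
```

The witnesses are exactly the words made only of a’s or only of b’s, none of which the program can produce.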

Typechecking
Given these limitations, maybe we can do better by typechecking without type inference? Indeed, given certain restrictions on the programming language and output type, it is possible.

Typechecking for XML Publishing
Here is an algorithm for typechecking P against τ: enumerate all “small” input databases (up to a size which depends only on P and τ); run P on each; check that the output conforms to τ. Not the most efficient algorithm, but it works*!

Typechecking for XML Publishing
*Actually, two restrictions on the output type τ are required:
• τ must be a DTD type
• all regular expressions in τ must be “star-free”
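The enumerate-and-test idea can be sketched for the single-table schema R(x,y). Everything concrete here is an assumption: the size bound is passed in as a hypothetical `max_rows` parameter (the talk only says such a bound exists and depends on P and τ), the program is modelled as a Python function returning the child labels of result, the output type is a Python regular expression over those labels, and neither key constraints nor the two restrictions on τ are enforced.

```python
import re
from itertools import combinations, product


def instances(domain, max_rows):
    """All instances of a binary relation R(x, y) over `domain`
    containing at most `max_rows` tuples."""
    tuples = list(product(domain, repeat=2))
    for k in range(max_rows + 1):
        for rows in combinations(tuples, k):
            yield set(rows)


def find_counterexample(program, content_model, domain, max_rows):
    """Run the program on every small instance; return the first instance
    whose output violates the content model, or None if all conform."""
    for rows in instances(domain, max_rows):
        word = "".join(program(rows))  # labels are single letters here
        if re.fullmatch(content_model, word) is None:
            return rows
    return None


def p(rows):
    """The a/b program: one <a/> per tuple of R, then one <b/> per tuple."""
    return ["a"] * len(rows) + ["b"] * len(rows)


print(find_counterexample(p, r"(aa*bb*)?", "01", 3))  # None
print(find_counterexample(p, r"a*", "01", 3))         # some one-tuple instance
```

On this toy scale the program passes against the content model of T1, which type inference could not establish, while a wrong output type such as a* is refuted by an instance with a single tuple.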

Aside: star-free regular expressions
Star-free means no Kleene closure, but the complement operator, compl, and the empty set, ∅, may be used. This recovers some of the power of Kleene closure, and in fact can express all examples given so far in this talk. For example, if Σ = {a, b, c}, then compl(∅) denotes Σ*, and compl(Σ*.b.Σ* | Σ*.c.Σ*) denotes a*. But not all Kleene closure expressions can be expressed this way. An example that cannot: (a.a)*.
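The two identities above can be checked mechanically on bounded-length words. In this sketch a language is represented by the finite set of its words of length at most `N` over Σ = {a, b, c}; the names `compl`, `union`, and `concat` mirror the star-free operators, and the bound `N = 4` is an arbitrary choice, so this is a sanity check rather than a proof.

```python
from itertools import product

SIGMA = "abc"
N = 4  # only words of length <= N are tracked

# Every word over SIGMA of length at most N.
UNIVERSE = {"".join(w) for n in range(N + 1) for w in product(SIGMA, repeat=n)}


def compl(lang):
    """Complement relative to the bounded universe."""
    return UNIVERSE - lang


def union(l1, l2):
    return l1 | l2


def concat(*langs):
    """Concatenation, discarding words longer than N."""
    out = {""}
    for lang in langs:
        out = {u + v for u in out for v in lang if len(u + v) <= N}
    return out


EMPTY = set()
SIGMA_STAR = compl(EMPTY)  # compl(∅) denotes Σ*

# compl(Σ*.b.Σ* | Σ*.c.Σ*): the words containing neither b nor c.
only_a = compl(union(concat(SIGMA_STAR, {"b"}, SIGMA_STAR),
                     concat(SIGMA_STAR, {"c"}, SIGMA_STAR)))

print(sorted(only_a, key=len))
```

Truncating at length `N` is safe for concatenation, since any word of length at most `N` in a product is built from factors that are themselves no longer than `N`.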

Limitations of Typechecking
Unfortunately, the restrictions we have given are critical. Allowing output types that are not DTDs or increasing the expressive power of the language leads to undecidability.
