COMP6037 Semi-structured Data and the Web: XPath and XQuery, week 2

  1. COMP6037 Semi-structured Data and the Web: XPath and XQuery, week 2. Uli Sattler, University of Manchester

  2. Manipulation of XML documents
     • there are various standards, tools, APIs, and data models for XML, used to:
       – validate
       – parse
       – query
       – transform
         • into other XML documents
         • into other formats, e.g., HTML, Excel, relational tables
     • we continue with XPath:
       – navigating and querying through XML documents
       – used in XQuery and in XSLT

  3. Manipulation of XML documents
     • XPath: for navigating and querying through XML documents
     • XQuery
       – more expressive than XPath, uses XPath
       – for querying and data manipulation
       – Turing complete
       – designed to access large amounts of data and to interface with relational systems
     • XSLT
       – similar to XQuery in that it uses XPath, ....
       – designed for “styling”, together with XSL-FO or CSS
     • DOM and SAX
       – a collection of APIs for programmatic manipulation
       – includes data model and parser
       – to build your own applications

  4. XPath
     • designed to navigate to/select parts of a well-formed XML document
     • no transformational capabilities (unlike XQuery and XSLT)
     • is a W3C standard:
       – XPath 1.0 is a 1999 W3C standard
       – XPath 2.0 is a 2007 W3C standard that extends/is a superset of XPath 1.0
         • richer set of WXS datatypes, and support for type information from WXS validation
       – see http://www.w3.org/TR/xpath20
     • allows us to select/define parts of an XML document: lists of nodes
       (side note on the slide: difference list vs. set?)
     • uses path expressions
       – to navigate in XML documents
       – to select node-lists in an XML document
     • you have worked with path expressions in your 1st assignment: they look like the path expressions of a traditional computer file system
     • provides numerous built-in functions
       – e.g., for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc.
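
To make the file-system analogy concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. Note the assumptions: ElementTree implements only a limited subset of XPath 1.0 (nothing from XPath 2.0), and the library document with its element and attribute names is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<library>
  <shelf id="s1">
    <book lang="en"><title>First</title></book>
    <book lang="de"><title>Zweites</title></book>
  </shelf>
  <shelf id="s2">
    <book lang="en"><title>Third</title></book>
  </shelf>
</library>
"""

root = ET.fromstring(DOC)

# Child steps, much like directories in a file path: shelf/book/title
titles = [t.text for t in root.findall("shelf/book/title")]

# Descendant step (.//) plus a predicate on an attribute value
english = [b.find("title").text for b in root.findall(".//book[@lang='en']")]

# Predicate on the shelf, followed by child steps
second_shelf = [t.text for t in root.findall("shelf[@id='s2']/book/title")]

print(titles)
print(english)
print(second_shelf)
```

Each call returns the selected elements as a list in document order, which matches the idea of a path expression selecting a list of nodes.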

  5. XPath: Datamodel
     • remember how an XML document can be seen as a node-labelled tree
       – with element names as labels
     • XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax
       – but not on the DOM tree! XPath uses the XQuery/XPath Datamodel
     • there is a translation at http://www.w3.org/TR/xpath20/#datamodel
       – see XPath process model...

  7. Content models and types in DTD and WXS
     • in DTDs, we don’t really have types, only element names
     • in WXS, we have a type hierarchy
       – an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
       – we call this ‘named’ typing: sub-types are declared (restriction or extension), and not inferred (by comparing structure), e.g.,
         • Age and YoungAge are subtypes of integer,
         • but YoungAge is not a subtype of Age
         • however, ProperYoungAge is a subtype of Age

         <xs:simpleType name="Age">
           <xs:restriction base="xs:integer">
             <xs:minInclusive value="0"/>
             <xs:maxInclusive value="130"/>
           </xs:restriction></xs:simpleType>

         <xs:simpleType name="YoungAge">
           <xs:restriction base="xs:integer">
             <xs:minInclusive value="0"/>
             <xs:maxInclusive value="19"/>
           </xs:restriction></xs:simpleType>

         <xs:simpleType name="ProperYoungAge">
           <xs:restriction base="Age">
             <xs:minInclusive value="0"/>
             <xs:maxInclusive value="19"/>
           </xs:restriction></xs:simpleType>
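
The difference between declared and inferred subtyping can be sketched in a few lines of Python. This is only an illustration, not how a schema processor is implemented: the table DECLARED_BASE simply records the base attributes of the three simple types above, and is_subtype is a hypothetical helper that follows declared derivations without ever comparing value ranges.

```python
# Declared derivations (type name -> declared base type), copied from the
# three simple types on the slide.
DECLARED_BASE = {
    "Age": "xs:integer",
    "YoungAge": "xs:integer",
    "ProperYoungAge": "Age",
}

def is_subtype(sub, sup):
    """True if `sub` is `sup` itself or is derived from it via declared bases.

    The value ranges are never inspected: YoungAge (0..19) fits inside
    Age (0..130), yet it is not a subtype because no derivation is declared.
    """
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = DECLARED_BASE.get(current)  # None once we leave the table
    return False

print(is_subtype("ProperYoungAge", "Age"))
print(is_subtype("YoungAge", "Age"))
print(is_subtype("YoungAge", "xs:integer"))
```

YoungAge and ProperYoungAge accept exactly the same values, but only the latter is reported as a subtype of Age, because only it declares Age as its base.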

  8. Types in WXS
     • how do we determine the type of an element w.r.t. a WXS schema?
       1. determine the type hierarchy, i.e., all types and where they are derived from
          • if Y1, ..., Yk are all subtypes of X, then
              e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk)
            for e(T) the extension of type T, i.e., its instances
          • (reminder: an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y)
       2. for each element in the document, find its type (and supertypes)
          • difficult, e.g., if

            <xs:complexType>
              <xs:sequence>
                <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
                <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
              </xs:sequence>
            </xs:complexType>
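
As a small worked instance of e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk), the Python sketch below computes extensions over a toy hierarchy. Everything in it is hypothetical: the type names, the extra level Z below Y1, and the instance labels a to d are made up to show how the union propagates upwards.

```python
# Hypothetical hierarchy (type -> declared subtypes) and hypothetical
# instances attached directly to each type.
SUBTYPES = {"X": ["Y1", "Y2"], "Y1": ["Z"], "Y2": [], "Z": []}
DIRECT = {"X": {"a"}, "Y1": {"b"}, "Y2": {"c"}, "Z": {"d"}}

def extension(t):
    """e(t): the direct instances of t together with e(s) for every subtype s."""
    result = set(DIRECT[t])
    for sub in SUBTYPES[t]:
        result |= extension(sub)
    return result

print(sorted(extension("X")))
print(sorted(extension("Y1")))
```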

  9. Content models and types in DTD and WXS
     • in order to prevent difficulties in WXS as caused by

         <xs:complexType>
           <xs:sequence>
             <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
             <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
           </xs:sequence>
         </xs:complexType>

       WXS’s Element Declarations Consistent constraint is imposed (and also on the schema at top level):

         If the {particles} contains, either directly, indirectly (that is, within the {particles} of a contained model group, recursively) or implicitly two or more element declaration particles with the same {name} and {target namespace}, then all their type definitions must be the same top-level definition, that is, all of the following must be true:
         1 all their {type definition}s must have a non-absent {name}.
         2 all their {type definition}s must have the same {name}.
         3 all their {type definition}s must have the same {target namespace}.
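
The three conditions can be read as a simple check over the element declarations of a content model. The following Python sketch is an assumption-laden simplification: the function name edc_violations and the flat tuple format are invented here, None stands for an absent name or namespace, and the recursive collection of particles from nested model groups is assumed to have happened already.

```python
from collections import defaultdict

def edc_violations(particles):
    """Check a simplified form of Element Declarations Consistent.

    `particles` is an iterable of hypothetical tuples
    (element name, element namespace, type name, type namespace).
    Returns the (name, namespace) pairs that are declared more than once
    without agreeing on one named type.
    """
    declarations = defaultdict(list)
    for name, namespace, type_name, type_namespace in particles:
        declarations[(name, namespace)].append((type_name, type_namespace))

    violations = []
    for element, types in declarations.items():
        if len(types) < 2:
            continue  # the constraint only concerns repeated element names
        all_named = all(type_name is not None for type_name, _ in types)
        same_type = len(set(types)) == 1  # same type name and same namespace
        if not (all_named and same_type):
            violations.append(element)
    return violations

# The problematic content model from the slide: two 'person' declarations
# with different types.
bad = [("person", None, "NewPersonType", None),
       ("person", None, "OldPersonType", None)]
# A hypothetical consistent variant.
good = [("person", None, "PersonType", None),
        ("person", None, "PersonType", None),
        ("address", None, None, None)]

print(edc_violations(bad))
print(edc_violations(good))
```

A single declaration with an anonymous type (the hypothetical address element) is accepted, since the constraint only applies once a name occurs two or more times.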

  10. Determining types in DTD and WXS
      • [DTD] element name = type of that element
      • [WXS] as a consequence of the Element Declarations Consistent (EDC) constraint, we can determine all elements’ types in a top-down manner (this is done during validation and recorded in the PSVI):
        – start with n = root element node
        – from the element name e of n, determine the type t of n (if n is the root node, this is possible because a schema cannot contain two global components with the same name; otherwise, the EDC constraint ensures it)
          1. in the schema, find the model group G for t, and
          – for each element child node n’ of n with name e’, determine in G the type t’ of e’ and recurse into (1.)
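
The top-down procedure can be sketched as a short recursive function. This is a rough illustration under strong assumptions: the schema is reduced to two hypothetical lookup tables (GLOBAL_ELEMENTS and MODEL_GROUPS, with made-up type names), the document is assumed to be valid, and features such as xsi:type, wildcards, and substitution groups are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema summary: global element declarations, and for each
# complex type the element declarations of its model group
# (child element name -> type name).
GLOBAL_ELEMENTS = {"bookstore": "BookstoreType"}
MODEL_GROUPS = {
    "BookstoreType": {"book": "BookType"},
    "BookType": {"title": "xs:string", "year": "xs:gYear"},
}

def assign_types(node, type_name, path, result):
    """Record the type of `node`, then recurse into its element children."""
    result.append((path, type_name))
    group = MODEL_GROUPS.get(type_name, {})  # simple types have no element children
    for child in node:
        # Thanks to EDC, a child name maps to exactly one type within the group.
        assign_types(child, group[child.tag], path + "/" + child.tag, result)
    return result

root = ET.fromstring(
    "<bookstore><book><title>T</title><year>2005</year></book></bookstore>"
)
typed = assign_types(root, GLOBAL_ELEMENTS[root.tag], "/" + root.tag, [])
for path, type_name in typed:
    print(path, type_name)
```

The lookup group[child.tag] is only well defined because of the EDC constraint: without it, a name such as person could map to two different types inside the same model group.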

  11. XPath: Datamodel
      • the XPath DM uses the following concepts
      • nodes:
        – element
        – attribute
        – text
        – namespace
        – processing-instruction
        – comment
        – document (root)
      • atomic value:
        • behave like nodes without children or parents
        • is a value in the value space of a WXS atomic type, e.g., xsd:string
      • item: atomic values or nodes

        <?xml version="1.0" encoding="ISO-8859-1"?>
        <bookstore>
          <book>
            <title lang="en">Harry Potter</title>
            <author>J K. Rowling</author>
            <year>2005</year>
            <price>29.99</price>
          </book>
        </bookstore>

        (callouts on the slide mark an attribute node, an element node, and a text node in this document)
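
The node kinds in the bookstore document can be inspected with Python's xml.etree.ElementTree. One caveat, stated as an assumption of this sketch: ElementTree is not an implementation of the XPath data model. It exposes attributes and text as properties of an element rather than as separate nodes, so the code only shows where the corresponding information lives. Converting the price text to a float is a loose stand-in for obtaining an atomic value.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the slide, as bytes so that the encoding
# declaration is honoured by the parser.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)
title = root.find("book/title")

print(title.tag)          # name of an element node
print(title.get("lang"))  # value of its attribute node
print(title.text)         # content of its text node

# Loose analogue of an atomic value: the price text read as a number.
price = float(root.find("book/price").text)
print(price)
```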

  12. XPath Data Model

        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
        <?xml-stylesheet href="screen.css" type="text/css" media="screen"?>
        <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
          <head>
            <title>Virtual Library</title>
          </head>
          <body>
            <p>Moved to <a href="http://vlib.example.org/">vlib.example.org</a>.</p>
          </body>
        </html>

      From: http://xformsinstitute.com/essentials/browse/ch03s02.php

  13. Comparison XPath DM and DOM datamodel
      [Figure: DOM node boxes. Document: nodeType = DOCUMENT_NODE, nodeName = #document, nodeValue = (null). Element: nodeType = ELEMENT_NODE, nodeName = mytext, nodeValue = (null), with firstchild, lastchild, attributes.]
      • XPath DM and DOM DM are similar, but different
        – most importantly regarding names and values of nodes, but also structurally (see ★)
        – in XPath, only attributes, elements, processing instructions, and namespace nodes have names, of the form (local part, namespace URI)
        – whereas DOM uses pseudo-names like #document, #comment, #text
        – in XPath, the value of an element or root node is the concatenation of the values of all its text node descendants, not null as it is in DOM:
          • e.g., the XPath value of <a>A<b>B</b></a> is “AB”
      ★ XPath does not have separate nodes for CDATA sections (they are merged with their surrounding text), e.g.,
          <N>here is some text and <![CDATA[some CDATA < >]]></N>
        – XPath has no representation
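
Both differences can be observed with Python's standard library, as a minimal sketch. xml.dom.minidom stands in for the DOM side. For the XPath side, the helper string_value is a hypothetical approximation of the string-value of an element, built on ElementTree's itertext(); ElementTree itself is not an XPath data model implementation, but like XPath it keeps no separate CDATA nodes.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# DOM side: pseudo-name for the document node, null value for an element node.
dom = minidom.parseString("<a>A<b>B</b></a>")
print(dom.nodeName)
print(dom.documentElement.nodeValue)

def string_value(element):
    """Concatenate all descendant text, approximating the XPath value of an element."""
    return "".join(element.itertext())

# XPath-like side: the value of <a> is the text of all its descendants.
a = ET.fromstring("<a>A<b>B</b></a>")
print(string_value(a))

# CDATA content is merged with the surrounding text; no separate node survives.
n = ET.fromstring("<N>here is some text and <![CDATA[some CDATA < >]]></N>")
print(repr(n.text))
```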

  14. XPath: core terms -- relation between nodes
      • (since we view XML documents as trees) each node has at most one parent
        – each node but the root node has exactly one parent
        – the root node has no parent
      • each node has zero or more children
      • ancestor is the transitive closure of parent, i.e., a node’s parent, its parent, its parent, ...
      • descendant is the transitive closure of child, i.e., a node’s children, their children, their children, ...
      • when evaluating an XPath expression p, we assume that we know
        – which document and
        – which context we are evaluating p over
        – … we see later how they are chosen/given
      • an XPath expression evaluates to an item sequence,
        – an item is either a node (doc., element, attribute, ...) or an atomic value
        – document order is preserved among items
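
Ancestor and descendant as transitive closures can be sketched directly in Python. The assumptions here: the tiny document is hypothetical, only element nodes are modelled (ElementTree has no separate document node), and since ElementTree stores no parent pointers, a child-to-parent map is built first.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
root = ET.fromstring("<a><b><c/><d/></b><e/></a>")

# ElementTree keeps no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Parent, its parent, ... up to the root (transitive closure of parent)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node.tag)
    return result

def descendants(node):
    """Children, their children, ... in document order (transitive closure of child)."""
    return [n.tag for n in node.iter() if n is not node]

c = root.find("b/c")
print(ancestors(c))
print(descendants(root))
print(ancestors(root))
```

The root element has an empty ancestor list because it is the only node missing from the parent map, mirroring the statement that every node but the root has exactly one parent.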
