
COMP6037 Semi-structured Data and the Web: XPath and XQuery, week 2
Uli Sattler, University of Manchester

Manipulation of XML documents
• there are various standards, tools, APIs, data models for XML:
  – validate
  – parse
  – query
  – transform
    • into other XML documents
    • into other formats, e.g., HTML, Excel, relational tables
• we continue with XPath:
  – navigating and querying through XML documents
  – used in XQuery and in XSLT

Manipulation of XML documents
• XPath for navigating and querying through XML documents
• XQuery
  – more expressive than XPath, uses XPath
  – for querying and data manipulation
  – Turing complete
  – designed to access large amounts of data, to interface with relational systems
• XSLT
  – similar to XQuery in that it uses XPath, ...
  – designed for "styling", together with XSL-FO or CSS
• DOM and SAX
  – a collection of APIs for programmatic manipulation
  – includes data model and parser
  – to build your own applications

XPath
• designed to navigate to/select parts of a well-formed XML document
• no transformational capabilities (unlike XQuery and XSLT)
• is a W3C standard:
  – XPath 1.0 is a 1999 W3C standard
  – XPath 2.0 is a 2007 W3C standard that extends (is largely a superset of) XPath 1.0
    • richer set of WXS datatypes and support
      ➡ type information from WXS validation
    • difference: XPath 1.0 expressions return node-sets, XPath 2.0 expressions return sequences (ordered lists)
    • see http://www.w3.org/TR/xpath20
• allows us to select/define parts of an XML document: lists of nodes
• uses path expressions
  – to navigate in XML documents
  – to select node lists in an XML document
• you have worked with path expressions in your 1st assignment: they are like the path expressions of a traditional computer file system
• provides numerous built-in functions
  – e.g., for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc.
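To make the file-system analogy concrete, here is a small sketch using Python's standard-library xml.etree.ElementTree. Note the assumption: ElementTree implements only a limited subset of XPath (child steps, //, *, .., and simple predicates), and it has no separate document node, so paths are evaluated relative to the root element. The bookstore document follows the data-model example; the second book is a hypothetical addition so that some expressions select more than one node.

```python
import xml.etree.ElementTree as ET

# Small bookstore document; the second <book> is a hypothetical addition
# so that the path expressions below can select more than one node.
DOC = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="de">Der Hobbit</title><price>15.00</price></book>
</bookstore>"""

root = ET.fromstring(DOC)  # context node: the <bookstore> element

# Child steps, like a relative file-system path: book/title
titles = [t.text for t in root.findall("./book/title")]

# Descendant step (//) combined with an attribute-value predicate
english = [t.text for t in root.findall(".//title[@lang='en']")]

# Positional predicate: the price of the first book (positions start at 1)
first_price = root.find("./book[1]/price").text

print(titles)       # ['Harry Potter', 'Der Hobbit']
print(english)      # ['Harry Potter']
print(first_price)  # 29.99
```

Each findall call returns a list of element nodes in document order, which matches the slide's description of XPath selecting lists of nodes; full XPath (functions, all axes) needs a complete XPath engine instead.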

XPath: Datamodel
• remember how an XML document can be seen as a node-labelled tree
  – with element names as labels
• XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax
  – but not on the DOM tree! XPath uses the XQuery/XPath Data Model
  – there is a translation at http://www.w3.org/TR/xpath20/#datamodel
  – see the XPath processing model...

Content models and types in DTD and WXS
• in DTDs, we don’t really have types, only element names
• in WXS, we have a type hierarchy
  – an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
  – we call this ‘named’ typing: sub-types are declared (by restriction or extension), not inferred (by comparing structure), e.g.,
• Age and YoungAge are subtypes of integer
• but YoungAge is not a subtype of Age, even though its values (0 to 19) all lie within Age's range (0 to 130)
• however, ProperYoungAge is a subtype of Age

    <xs:simpleType name="Age">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="130"/>
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="YoungAge">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="19"/>
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="ProperYoungAge">
      <xs:restriction base="Age">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="19"/>
      </xs:restriction>
    </xs:simpleType>
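A minimal Python sketch of ‘named’ typing, assuming a hypothetical dictionary BASE that records each user-defined type's declared base type. Subtyping follows only these declared derivation links, so YoungAge is not a subtype of Age even though its value range is contained in Age's. The facet table and the helper names are illustrative only, not part of any XML Schema library.

```python
# Declared ("named") derivation: each user-defined simple type records the
# base it restricts, following the Age / YoungAge / ProperYoungAge example.
BASE = {
    "Age": "xs:integer",
    "YoungAge": "xs:integer",
    "ProperYoungAge": "Age",
}
# (minInclusive, maxInclusive) facets, kept only for comparison
FACETS = {
    "Age": (0, 130),
    "YoungAge": (0, 19),
    "ProperYoungAge": (0, 19),
}


def is_subtype(sub: str, sup: str) -> bool:
    """Follow declared base links only; value spaces are never compared."""
    t = sub
    while t is not None:
        if t == sup:
            return True
        # built-ins such as xs:integer have no entry here, which ends the walk
        t = BASE.get(t)
    return False


def value_space_included(sub: str, sup: str) -> bool:
    """What a structural (inferred) check might look at instead."""
    lo1, hi1 = FACETS[sub]
    lo2, hi2 = FACETS[sup]
    return lo2 <= lo1 and hi1 <= hi2


print(is_subtype("YoungAge", "xs:integer"))     # True
print(is_subtype("YoungAge", "Age"))            # False: not declared
print(value_space_included("YoungAge", "Age"))  # True, but irrelevant for WXS
print(is_subtype("ProperYoungAge", "Age"))      # True
```

The contrast between the two functions is the point of the slide: WXS answers subtype questions with something like is_subtype, never with value_space_included.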

Types in WXS
• how do we determine the type of an element w.r.t. a WXS schema?
  1. determine the type hierarchy, i.e., all types and what they are derived from
     • recall: an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
     • if Y1, ..., Yk are all subtypes of X, then
         e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk)
       where e(T) is the extension of type T, i.e., its instances
  2. for each element in the document, find its type (and supertypes)
     • difficult, e.g., if a single <person> child could match either of two declarations with different types:

    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
        <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
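Step 1 can be sketched in Python as follows. The type names (Person, YoungPerson, Student, Company) and the instance ids are hypothetical; BASE maps each type to the type it is derived from, and DIRECT holds the instances whose type is exactly T. The function extension computes e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk) over all direct and indirect subtypes Y1, ..., Yk of X.

```python
# Hypothetical type hierarchy: type -> the type it is derived from (None = root).
BASE = {
    "Person": None,
    "YoungPerson": "Person",
    "Student": "YoungPerson",
    "Company": None,
}
# Hypothetical direct instances: element ids whose declared type is exactly T.
DIRECT = {
    "Person": {"p1"},
    "YoungPerson": {"p2"},
    "Student": {"p3"},
    "Company": {"c1"},
}


def subtypes(x):
    """All types derived (directly or indirectly) from x."""
    result = set()
    for t in BASE:
        b = BASE[t]
        while b is not None:  # walk up the declared derivation chain of t
            if b == x:
                result.add(t)
                break
            b = BASE.get(b)
    return result


def extension(x):
    """e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk) for all subtypes Y1..Yk of X."""
    ext = set(DIRECT.get(x, set()))
    for y in subtypes(x):
        ext |= DIRECT.get(y, set())
    return ext


print(sorted(subtypes("Person")))      # ['Student', 'YoungPerson']
print(sorted(extension("Person")))     # ['p1', 'p2', 'p3']
print(sorted(extension("YoungPerson")))  # ['p2', 'p3']
print(sorted(extension("Company")))    # ['c1']
```

An instance of Student therefore also counts as an instance of YoungPerson and Person, which is exactly the substitutability rule recalled above.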

Content models and types in DTD and WXS
• In order to prevent difficulties in WXS such as those caused by

    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
        <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>

  WXS’s Element Declarations Consistent constraint is imposed (also on the schema at top level):

    If the {particles} contains, either directly, indirectly (that is, within the {particles} of a contained model group, recursively) or implicitly two or more element declaration particles with the same {name} and {target namespace}, then all their type definitions must be the same top-level definition, that is, all of the following must be true:
    1 all their {type definition}s must have a non-absent {name}.
    2 all their {type definition}s must have the same {name}.
    3 all their {type definition}s must have the same {target namespace}.
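A simplified Python check of clauses 1 to 3, assuming the content model has already been flattened into a hypothetical list of element declaration particles (direct, nested, or implicit). Each particle is (element name, target namespace, type), where the type is a (type name, type target namespace) pair, or None for an anonymous type definition. This is an illustration of the rule, not a schema processor.

```python
from typing import Dict, List, Optional, Tuple

# (element name, element target namespace, (type name, type namespace) or None)
Particle = Tuple[str, Optional[str], Optional[Tuple[str, Optional[str]]]]


def edc_violations(particles: List[Particle]) -> List[str]:
    """Return the element names that break Element Declarations Consistent."""
    seen: Dict[Tuple[str, Optional[str]], List] = {}
    for name, ns, typ in particles:
        seen.setdefault((name, ns), []).append(typ)
    bad = []
    for (name, _ns), types in seen.items():
        if len(types) < 2:
            continue  # the constraint only concerns repeated declarations
        # clause 1: every type definition must be named (None = anonymous);
        # clauses 2 and 3: same {name} and same {target namespace}
        if any(t is None for t in types) or len(set(types)) > 1:
            bad.append(name)
    return bad


# The sequence from the slide: two <person> particles with different types.
slide_example = [
    ("person", None, ("NewPersonType", None)),
    ("person", None, ("OldPersonType", None)),
]
# A consistent variant: both <person> particles use one named type.
consistent = [
    ("person", None, ("PersonType", None)),
    ("person", None, ("PersonType", None)),
    ("address", None, None),  # anonymous type is fine if the name occurs once
]
print(edc_violations(slide_example))  # ['person']
print(edc_violations(consistent))     # []
```

Here None stands for "no namespace" in the namespace positions; a real processor compares type definition components, but clauses 1 to 3 reduce that comparison to name and target namespace, which is what the sketch does.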

Determining types in DTD and WXS
• [DTD] element name = type of that element
• [WXS] as a consequence of the Element Declarations Consistent constraint, we can determine all elements’ types in a top-down manner (this is done during validation and recorded in the PSVI):
  – start with n = root element node
  – from the element name e of n, determine the type t of n (if n is the root node, this is possible because a schema cannot contain two global element declarations with the same name; otherwise the EDC constraint ensures this)
  1. in the schema, find the model group G for t, and
     – for each element child node n’ of n with name e’, determine in G the type t’ of e’, and recurse into (1.) with n’ and t’
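The top-down procedure can be sketched in Python with ElementTree, under strong simplifying assumptions: the schema is a hypothetical pair of dictionaries (global element declarations, and for each complex type its model group as "child name -> child type"), there are no namespaces, and attributes are ignored. The EDC constraint is what makes a single "child name -> type" mapping per model group well defined.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified schema (not a real WXS API):
# global element declarations (name -> type) and, for each complex type,
# the local declarations in its model group (child name -> child type).
GLOBAL_ELEMENTS = {"bookstore": "BookstoreType"}
MODEL_GROUPS = {
    "BookstoreType": {"book": "BookType"},
    "BookType": {"title": "xs:string", "author": "xs:string",
                 "year": "xs:gYear", "price": "xs:decimal"},
}


def assign_types(root):
    """Return (element name, type) pairs, assigned top down in document order."""
    result = []

    def visit(n, t):
        result.append((n.tag, t))
        group = MODEL_GROUPS.get(t, {})  # simple types have no model group
        for child in n:                  # element children n' of n
            # look up the type t' of child name e' in G, then recurse (step 1)
            visit(child, group[child.tag])  # KeyError: child not allowed here

    # the root's type comes from the unique global declaration of its name
    visit(root, GLOBAL_ELEMENTS[root.tag])
    return result


doc = ET.fromstring(
    "<bookstore><book><title lang='en'>Harry Potter</title>"
    "<author>J K. Rowling</author><year>2005</year>"
    "<price>29.99</price></book></bookstore>"
)
for name, typ in assign_types(doc):
    print(name, typ)
```

Each element's type is fixed by its parent's type and its own name alone, so one pass from the root suffices; no backtracking over alternative declarations is needed.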

XPath: Datamodel
• the XPath DM uses the following concepts
• nodes:
  – element
  – attribute
  – text
  – namespace
  – processing-instruction
  – comment
  – document (root)
• atomic value:
  – behaves like a node without children or parent
  – is a value in the value space of a WXS atomic type, e.g., xsd:string
• item: atomic values or nodes

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore>
      <book>
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
    </bookstore>

  [Figure: annotations pointing to an attribute node, an element node, and a text node in the example]
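As a rough illustration, the following Python sketch lists the element, attribute, and text nodes of the bookstore example using ElementTree. This is an approximation, not the XPath DM itself: ElementTree exposes only elements as objects, stores attributes in a dictionary and text in .text/.tail, and drops comments and processing instructions by default. Whitespace-only text (indentation) is skipped here, although the XPath DM would keep it as text nodes unless it is stripped.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

# Parse bytes so that the encoding declaration is honoured by the parser.
root = ET.fromstring(DOC.encode("iso-8859-1"))

elements, attributes = [], []
for el in root.iter():  # element nodes in document order (pre-order)
    elements.append(el.tag)
    attributes.extend(f"{k}={v}" for k, v in el.attrib.items())

# Text content in document order; whitespace-only text is skipped.
texts = [s.strip() for s in root.itertext() if s.strip()]

print(elements)    # ['bookstore', 'book', 'title', 'author', 'year', 'price']
print(attributes)  # ['lang=en']
print(texts)       # ['Harry Potter', 'J K. Rowling', '2005', '29.99']
```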

XPath Data Model

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <?xml-stylesheet href="screen.css" type="text/css" media="screen"?>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
      <head>
        <title>Virtual Library</title>
      </head>
      <body>
        <p>Moved to <a href="http://vlib.example.org/">vlib.example.org</a>.</p>
      </body>
    </html>

From: http://xformsinstitute.com/essentials/browse/ch03s02.php
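The names in this example are namespace-qualified, so in the data model the html element's name is the pair (local part "html", namespace URI http://www.w3.org/1999/xhtml). A small Python sketch with ElementTree shows this; ElementTree writes such names as "{URI}local" (Clark notation), and the helper expanded_name is a hypothetical convenience that splits them into the (local part, namespace URI) pairs used on these slides. The prefix x in the queries is chosen freely; only the URI matters.

```python
import xml.etree.ElementTree as ET

XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<?xml-stylesheet href="screen.css" type="text/css" media="screen"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.example.org/">vlib.example.org</a>.</p>
  </body>
</html>"""


def expanded_name(clark):
    """Split ElementTree's '{namespace URI}local' into (local part, URI)."""
    if clark.startswith("{"):
        uri, local = clark[1:].split("}", 1)
        return local, uri
    return clark, None  # name in no namespace


root = ET.fromstring(XHTML_DOC)  # the stylesheet PI is dropped by default
NS = {"x": "http://www.w3.org/1999/xhtml"}  # query prefix -> namespace URI

print(expanded_name(root.tag))
# ('html', 'http://www.w3.org/1999/xhtml')
print([expanded_name(k) for k in root.attrib])
# [('lang', 'http://www.w3.org/XML/1998/namespace')]
print(root.find("x:head/x:title", NS).text)   # Virtual Library
print(root.find(".//x:a", NS).get("href"))    # http://vlib.example.org/
```

Note that the xmlns declaration does not show up as an ordinary attribute, and xml:lang is placed in the reserved XML namespace.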

Comparison XPath DM and DOM datamodel
[Figure: DOM nodes, e.g., Document (nodeType = DOCUMENT_NODE, nodeName = #document, nodeValue = null) and Element (nodeType = ELEMENT_NODE, nodeName = mytext, nodeValue = null, with firstChild, lastChild, attributes)]
• XPath DM and DOM DM are similar, but different
  – most importantly regarding names and values of nodes, but also structurally (see ★)
  – in XPath, only attributes, elements, processing instructions, and namespace nodes have names, of the form (local part, namespace URI)
  – whereas DOM uses pseudo-names like #document, #comment, #text
  – in XPath, the value of an element or root node is the concatenation of the values of all its text node descendants, not null as it is in DOM:
    • e.g., the XPath value of <a>A<b>B</b></a> is “AB”
★ XPath does not have separate nodes for CDATA sections (they are merged with their surrounding text)
  – e.g., in <N>here is some text and <![CDATA[some CDATA < >]]></N>, XPath has no representation of the CDATA section
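The contrast can be observed in Python's standard library. The sketch below assumes that xml.dom.minidom is a fair stand-in for the DOM view (pseudo-names, null values), and uses ElementTree's itertext to compute an XPath-style string value; ElementTree is not the XPath DM, but like it, it merges CDATA content into the surrounding text. The function name string_value is our own label, not a library API.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SRC = "<a>A<b>B</b></a>"

# DOM view: pseudo-names and null (None) values for document and element nodes.
dom = minidom.parseString(SRC)
print(dom.nodeName, dom.nodeValue)                      # #document None
print(dom.documentElement.nodeValue)                    # None
print(dom.documentElement.firstChild.nodeName)          # #text


def string_value(elem):
    """XPath-style value: concatenation of all descendant text, in order."""
    return "".join(elem.itertext())


print(string_value(ET.fromstring(SRC)))                 # AB

# CDATA section: no separate node, its content joins the surrounding text.
n = ET.fromstring("<N>here is some text and <![CDATA[some CDATA < >]]></N>")
print(len(n), repr(n.text))  # 0 'here is some text and some CDATA < >'
```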

XPath: core terms -- relation between nodes
• (since we view XML documents as trees) each node has at most one parent
  – each node but the root node has exactly one parent
  – the root node has no parent
• each node has zero or more children
• ancestor is the transitive closure of parent, i.e., a node’s parent, its parent, its parent, ...
• descendant is the transitive closure of child, i.e., a node’s children, their children, their children, ...
• when evaluating an XPath expression p, we assume that we know
  – which document and
  – which context we are evaluating p over
  – ... we see later how they are chosen/given
• an XPath expression evaluates to an item sequence,
  – an item is either a node (document, element, attribute, ...) or an atomic value
  – document order is preserved among items
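The parent, ancestor, and descendant relations can be sketched in Python over an ElementTree. Assumptions: ElementTree elements do not store their parent, so a parent map is built once, and ElementTree has no document node, so the root element plays the role of the root here. The helpers ancestors and descendants are illustrative names, not library functions; ancestors are listed nearest first, following the slide's wording, whereas an XPath ancestor step would return them in document order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore><book><title>Harry Potter</title>"
    "<author>J K. Rowling</author></book></bookstore>"
)

# Every element except the root has exactly one parent.
parent = {child: p for p in root.iter() for child in p}


def ancestors(node):
    """Transitive closure of parent: parent, its parent, ... up to the root."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


def descendants(node):
    """Transitive closure of child, in document order (pre-order)."""
    return [d for d in node.iter() if d is not node]


title = root.find("book/title")
print([a.tag for a in ancestors(title)])   # ['book', 'bookstore']
print([d.tag for d in descendants(root)])  # ['book', 'title', 'author']
print(root in parent)                      # False: the root has no parent
```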
