COMP6037 Semi-structured Data and the Web XPath and XQuery, week 2 - - PowerPoint PPT Presentation
COMP6037 Semi-structured Data and the Web
XPath and XQuery, week 2
Uli Sattler
University of Manchester
Manipulation of XML documents
- there are various standards, tools, APIs, data models for XML:
– validate – parse – query – transform
- into other XML documents
- into other formats, e.g., html, excel, relational tables
- we continue with XPath:
– navigating and querying through XML documents – used in XQuery and in XSLT
Manipulation of XML documents
- XPath for navigating and querying through XML documents
- XQuery
– more expressive than XPath, uses XPath – for querying and data manipulation – Turing complete – designed to access large amounts of data, to interface with relational systems
- XSLT
– similar to XQuery in that it uses XPath, .... – designed for “styling”, together with XSL-FO or CSS
- DOM and SAX
– a collection of APIs for programmatic manipulation – includes data model and parser – to build your own applications
XPath
- designed to navigate to/select parts in a well-formed XML document
- no transformational capabilities (unlike XQuery and XSLT)
- is a W3C standard:
– XPath 1.0 is a 1999 W3C standard – XPath 2.0 is a 2007 W3C standard that extends/is a superset of XPath 1.0
- richer set of WXS datatypes and support
➡ type information from WXS validation – see http://www.w3.org/TR/xpath20
- allows us to select/define parts of an XML document: lists of nodes
- uses path expressions
– to navigate in XML documents – to select node-lists in an XML document
- you have worked with path expressions in your 1st assignment:
like the expressions in a traditional computer file system
- provides numerous built-in functions
– e.g., for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc. (Difference between a list and a set?)
XPath: Datamodel
- remember how an XML document can be seen as a node-labelled tree
– with element names as labels
- XPath operates on the abstract, logical structure of an XML document,
rather than its surface syntax - but not on DOM tree!
- XPath uses the XQuery/XPath Data Model
- there is a translation at http://www.w3.org/TR/xpath20/#datamodel
– see XPath process model...
Content models and types in DTD and WXS
- in DTDs, we don’t really have types, only element names
- in WXS, we have a type hierarchy
– an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y – we call this ‘named’ typing: sub-types are declared (restriction or extension), and not inferred (by comparing structure), e.g.,
- Age and YoungAge
are subtypes of integer,
- but YoungAge is not a
subtype of Age
- however, ProperYoungAge is
a subtype of Age
<xs:simpleType name="Age"> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="130"/> </xs:restriction></xs:simpleType> <xs:simpleType name="YoungAge"> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="19"/> </xs:restriction></xs:simpleType> <xs:simpleType name="ProperYoungAge"> <xs:restriction base="Age"> <xs:minInclusive value="0"/> <xs:maxInclusive value="19"/> </xs:restriction></xs:simpleType>
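‘Named’ typing can be sketched in a few lines of Python. The sketch assumes we record, for each simple type above, only the base type it is declared to restrict (the dictionaries `BASE` and `RANGE` are hand-copied stand-ins for the schema, not parsed from it). Subtyping follows the declared base chain, so YoungAge is not a subtype of Age even though its range [0, 19] lies inside Age's range [0, 130].

```python
# Declared base types of the three simple types above (hand-copied).
BASE = {
    "Age": "xs:integer",
    "YoungAge": "xs:integer",
    "ProperYoungAge": "Age",
}

# Value ranges from the facets, only to show that they play no role in subtyping.
RANGE = {"Age": (0, 130), "YoungAge": (0, 19), "ProperYoungAge": (0, 19)}


def is_subtype(t: str, super_t: str) -> bool:
    """True if t equals super_t or reaches it by following declared bases."""
    while t is not None:
        if t == super_t:
            return True
        t = BASE.get(t)  # None once we leave the user-defined types
    return False


def range_included(t: str, super_t: str) -> bool:
    """Structural comparison: is t's value range inside super_t's range?"""
    lo, hi = RANGE[t]
    slo, shi = RANGE[super_t]
    return slo <= lo and hi <= shi


print(is_subtype("YoungAge", "Age"))        # False: not declared
print(range_included("YoungAge", "Age"))    # True: structurally included
print(is_subtype("ProperYoungAge", "Age"))  # True: declared via base="Age"
```

A structural (‘inferred’) type system would use something like `range_included`; WXS only uses the declared chain.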
Types in WXS
- how do we determine a type of an element w.r.t. a WXS schema?
- 1. determine the type hierarchy, i.e., all types and where they are
derived from
- if Y1, ..., Yk are all subtypes of X, then
e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk) for e(T) the extension of type T, i.e., its instances
- 2. for each element in the document, find its type (and supertypes)
- this can be difficult, e.g., if a model group contains two declarations for the same element name:
<xs:complexType> <xs:sequence> <xs:element name="person" type= "NewPersonType" minOccurs="0" maxOccurs="1"/> <xs:element name="person" type= "OldPersonType" minOccurs="0" maxOccurs="1"/> </xs:sequence> </xs:complexType>
Content models and types in DTD and WXS
- to prevent difficulties such as the one caused by the model group below,
WXS imposes the Element Declarations Consistent (EDC) constraint (also on the schema at top level):
<xs:complexType> <xs:sequence> <xs:element name="person" type= "NewPersonType" minOccurs="0" maxOccurs="1"/> <xs:element name="person" type= "OldPersonType" minOccurs="0" maxOccurs="1"/> </xs:sequence> </xs:complexType>
If the {particles} contains, either directly, indirectly (that is, within the {particles} of a contained model group, recursively) or implicitly two or more element declaration particles with the same {name} and {target namespace}, then all their type definitions must be the same top-level definition, that is, all of the following must be true: 1 all their {type definition}s must have a non-absent {name}. 2 all their {type definition}s must have the same {name}. 3 all their {type definition}s must have the same {target namespace}.
Determining types in DTD and WXS
- [DTD] element name = type of that element
- [WXS] as a consequence of the Element Declarations Consistent constraint,
we can determine the types of all elements in a top-down manner (this is done during validation, and the result is recorded in the PSVI): – start with n = root element node – from the element name e of n, determine the type t of n (if n is the root node, this is possible since a schema cannot contain two global element declarations with the same name; otherwise, the EDC constraint ensures this)
- 1. in the schema, find the model group G for t, and
– for each element child node n’ of n with name e’, determine in G the type t’ of e’ and recurse into (1.) with n’ and t’
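The top-down procedure can be sketched in Python. This assumes a hypothetical, hand-written summary of a schema (`GLOBAL_ELEMENTS`, `MODEL_GROUPS`, and the type names are invented for illustration) and a document that is already valid. Because of EDC, each model group maps a child element name to exactly one type, so a plain dictionary lookup is enough.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema summary: global element declarations (name -> type)
GLOBAL_ELEMENTS = {"people": "PeopleType"}
# ... and, per complex type, its model group as child name -> type.
# A dict suffices because EDC forbids two declarations of the same
# child name with different types in one model group.
MODEL_GROUPS = {
    "PeopleType": {"person": "PersonType"},
    "PersonType": {"name": "xs:string", "age": "Age"},
}


def assign_types(root: ET.Element) -> list[tuple[str, str]]:
    """Return (path, type) pairs, computed top-down from the root element."""
    result = []

    def visit(node: ET.Element, path: str, t: str) -> None:
        result.append((path, t))
        group = MODEL_GROUPS.get(t, {})  # simple types have no model group
        for child in node:
            # assumes a valid document: every child name occurs in the group
            visit(child, f"{path}/{child.tag}", group[child.tag])

    visit(root, "/" + root.tag, GLOBAL_ELEMENTS[root.tag])
    return result


doc = ET.fromstring("<people><person><name>Alan</name><age>41</age></person></people>")
for path, t in assign_types(doc):
    print(path, t)
```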
XPath: Datamodel
- the XPath DM uses the following concepts
- nodes:
– element – attribute – text – namespace – processing-instruction – comment – document (root)
- atomic value:
- behaves like a node without children or parent
- is a value in the value space of a WXS atomic type,
e.g., xsd:string
- item: an atomic value or a node
<?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> </bookstore>
[Figure: the bookstore example with its attribute node, element nodes and text nodes labelled]
XPath Data Model
From: http://xformsinstitute.com/essentials/browse/ch03s02.php
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <?xml-stylesheet href="screen.css" type="text/css" media="screen"?> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <title>Virtual Library</title> </head> <body> <p>Moved to <a href="http://vlib.example.org/"> vlib.example.org</a>.</p> </body> </html>
Comparison XPath DM and DOM datamodel
- XPath DM and DOM DM are similar, but different
– most importantly regarding names and values of nodes but also structurally (see ★) – in XPath, only attributes, elements, processing instructions, and namespace nodes have names, of form (local part, namespace URI) – whereas DOM uses pseudo-names like #document, #comment, #text – In XPath, the value of an element or root node is the concatenation of the values of all its text node descendants, not null as it is in DOM:
- e.g., XPath value of <a>A<b>B</b></a> is “AB”
★ XPath does not have separate nodes for CDATA sections (they are merged with their surrounding text) – XPath has no representation
[Figure: DOM tree of a small document: a Document node (nodeType = DOCUMENT_NODE, nodeName = #document, nodeValue = null) whose child is an Element node (nodeType = ELEMENT_NODE, nodeName = mytext, nodeValue = null), with firstChild, lastChild and attributes links]
<N>here is some text and <![CDATA[some CDATA < >]]> </N>
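Python's `xml.etree.ElementTree` is not an implementation of the XPath data model, but it behaves like it on the two points above, which makes it handy for a quick check: the value of an element is the concatenation of its descendant text, and a CDATA section is merged into the surrounding text instead of becoming a node of its own.

```python
import xml.etree.ElementTree as ET


def string_value(elem: ET.Element) -> str:
    """Concatenate all descendant text, like the XPath value of an element."""
    return "".join(elem.itertext())


print(string_value(ET.fromstring("<a>A<b>B</b></a>")))  # AB

# The CDATA section does not become a separate node: its content is
# merged into the element's text.
n = ET.fromstring("<N>here is some text and <![CDATA[some CDATA < >]]> </N>")
print(repr(n.text))  # 'here is some text and some CDATA < > '
print(len(n))        # 0: no child node for the CDATA section
```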
XPath: core terms -- relation between nodes
- (since we view XML documents as trees) each node has at most one parent
– each node but the root node has exactly one parent – the root node has no parent
- each node has zero or more children
- ancestor is the transitive closure of parent,
i.e., a node’s parent, its parent, its parent, ...
- descendant is the transitive closure of child,
i.e., a node’s children, their children, their children, ...
- when evaluating an XPath expression p, we assume that we know
– which document and – which context we are evaluating p over – … we see later how they are chosen/given
- an XPath expression evaluates to an item sequence,
– an item is either a node (doc., element, attribute,...) or an atomic value – document order is preserved among items
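The ancestor and descendant relations are transitive closures of parent and child; a small Python sketch over ElementTree makes this concrete. One caveat: ElementTree has no document node above the root element, and its elements do not store their parent, so the sketch builds the parent relation first.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")

# Build the parent relation once: child -> parent.
parent = {child: p for p in doc.iter() for child in p}


def ancestors(node):
    """Transitive closure of parent: parent, grandparent, ... up to the root."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


def descendants(node):
    """Transitive closure of child: children, grandchildren, ..."""
    result = []
    for child in node:
        result.append(child)
        result.extend(descendants(child))
    return result


c = doc.find("b/c")
print([n.tag for n in ancestors(c)])      # ['b', 'a']
print([n.tag for n in descendants(doc)])  # ['b', 'c', 'd', 'e']
```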
XPath: core terms -- document order
- Within a tree, document order is
specified as follows: – The root node is the first node ➥ top-down – Every node occurs before all of its children and descendants ➥ top-down – Namespace nodes immediately follow the element node with which they are associated. The relative order of namespace nodes is stable but implementation-dependent. ➥ exception for NSs - they are first – Attribute nodes immediately follow the namespace nodes of the element node with which they are associated. The relative order of attribute nodes is stable but implementation-dependent. ➥ exception for attributes - they are second – The relative order of siblings is the order in which they occur in the children property of their parent node. ➥ left-before-right – Children and descendants occur before following siblings ➥ depth-first
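These rules amount to a depth-first, pre-order traversal in which each element's attributes come right after it. The Python sketch below lists elements and attributes in that order. It leaves out namespace and text nodes, and it uses one attribute per element because the relative order of attributes is implementation-dependent.

```python
import xml.etree.ElementTree as ET


def document_order(elem):
    """Yield a label for each element and attribute in document order."""
    yield elem.tag                     # a node comes before its descendants
    for name in elem.attrib:           # attributes right after their element
        yield "@" + name
    for child in elem:                 # siblings left to right, depth-first
        yield from document_order(child)


doc = ET.fromstring('<a x="1"><b y="2"><c/></b><d/></a>')
print(list(document_order(doc)))  # ['a', '@x', 'b', '@y', 'c', 'd']
```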
XPath: core terms -- selecting nodes
Location paths/Path expressions are used to select nodes
- they are based on (location) steps
- a step is of the form axis::test[predicate]... [predicate] where
– axis indicates the navigation direction in the tree – test determines what (nodes by kind or name) to select – zero or more predicates select a further subset
- axes include:
– self for the context node – child for all child nodes – descendant for all descendant nodes (but not attribute or NS nodes) – parent for the parent node (in case it exists) – ancestor for all ancestor nodes – attribute for the attribute nodes of the context node – etc.
- A path expression, when evaluated on an XML document, returns an item
sequence
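To show how a single step axis::test[predicate]...[predicate] is evaluated, here is a toy evaluator in Python over ElementTree. It is a sketch under several assumptions: only a few axes are supported, the node test is a name or `*`, and predicates are Python functions of (node, position, last) standing in for XPath predicates. Positions are counted per context node in axis order (nearest first on reverse axes), and the final result is put back into document order.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<book><chapter n='1'/><chapter n='2'><section/></chapter>"
    "<appendix/><chapter n='3'/></book>"
)
PARENT = {c: p for p in doc.iter() for c in p}
ORDER = {node: i for i, node in enumerate(doc.iter())}  # document order


def axis_nodes(node, axis):
    """Nodes on the given axis, in axis order (reverse axes: nearest first)."""
    if axis == "self":
        return [node]
    if axis == "child":
        return list(node)
    if axis == "descendant":
        return list(node.iter())[1:]  # iter() starts with node itself
    if axis == "parent":
        return [PARENT[node]] if node in PARENT else []
    if axis == "ancestor":
        result = []
        while node in PARENT:
            node = PARENT[node]
            result.append(node)
        return result
    raise ValueError(f"axis not supported in this sketch: {axis}")


def step(context, axis, test, *predicates):
    """Evaluate axis::test[p1]...[pm] for every node in the context sequence."""
    selected = set()
    for node in context:
        nodes = [n for n in axis_nodes(node, axis) if test in ("*", n.tag)]
        for pred in predicates:  # each predicate filters the previous result
            last = len(nodes)
            nodes = [n for pos, n in enumerate(nodes, 1) if pred(n, pos, last)]
        selected.update(nodes)
    return sorted(selected, key=ORDER.__getitem__)


# child::chapter[position() = 2]
print([c.get("n") for c in step([doc], "child", "chapter", lambda n, pos, last: pos == 2)])

# descendant::section/ancestor::*[position() = 1]  (reverse axis: nearest first)
sec = step([doc], "descendant", "section")
print([n.tag for n in step(sec, "ancestor", "*", lambda n, pos, last: pos == 1)])
```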
XPath: core terms -- axis
ForwardAxis ::= ("child" "::") | ("descendant" "::") | ("attribute" "::") | ("self" "::") | ("descendant-or-self" "::") | ("following-sibling" "::") | ("following" "::") | ("namespace" "::") ReverseAxis ::= ("parent" "::") | ("ancestor" "::") | ("preceding-sibling" "::") | ("preceding" "::") | ("ancestor-or-self" "::")
XPath: core terms -- selectors
Node Tests include:
- element or attribute names,
e.g., child::title selects the title element child nodes of the context node; attribute::height selects the height attribute node of the context node
- * - a wildcard for element or attribute names,
e.g., child::* selects all element child nodes of the context node; attribute::* selects all the attributes of the context node
- text() - to select text nodes
e.g., child::text() selects all text child nodes of the context node
- node() - to select all nodes, regardless of their type
e.g., child::node() selects all child nodes of the context node
- element(x,Y) - to select all element nodes with name x and type Y
e.g., child::element(person,ModernPersonType) and child::element(*,ModernPersonType)
- etc.
XPath: core terms -- predicates
- [pred1]...[predm] sub-selects those nodes that satisfy
pred1 and ... and predm
- they sub-select nodes according to their
– position, e.g., [position() = 1], [position() = last() -1], etc. – properties of their attributes or descendants, e.g., [attribute::type="warning"] selects those nodes that have a type attribute with value "warning" – and can be combined using “or”, e.g., [position() = 1 or position() = 2]
- Examples:
– child::chapter[attribute::type="warning"][position()=5] selects the fifth of those chapter children of the context node that have a type attribute with value "warning" – child::chapter[attribute::type="warning" or position()=5] selects all chapter child nodes of the context node that have a type attribute with value "warning", together with the 5th chapter child node – descendant::chapter[child::title="Introduction"] selects the chapter descendant nodes of the context node that have one or more title children whose value is “string-equal” to "Introduction"
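The difference between the first two examples is easy to check with plain Python lists. The list `chapters` is hypothetical: it holds the type attribute of each chapter child in document order (None means no type attribute), and the printed numbers are chapter positions.

```python
chapters = ["warning", None, "warning", "warning", None,
            "warning", "warning", None]

# child::chapter[attribute::type="warning"][position()=5]:
# the second predicate counts positions among the warnings only.
warnings = [i for i, t in enumerate(chapters, 1) if t == "warning"]
fifth_warning = warnings[4:5]  # at most one chapter
print(fifth_warning)           # [7]

# child::chapter[attribute::type="warning" or position()=5]:
# a single predicate; positions are counted among all chapter children.
either = [i for i, t in enumerate(chapters, 1) if t == "warning" or i == 5]
print(either)                  # [1, 3, 4, 5, 6, 7]
```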
XPath: core terms -- location path & path expression
- a location path
– is basically a sequence of location steps loc-step – can be local/relative: of the form (loc-step"/")* loc-step – can be global/absolute: of the form "/" or "/" (loc-step"/")* loc-step
- if we do not have a context node, the root node is the context node
- a path expression is a location path or a union of location paths
p1|..|pk
- e.g.,
– child::person[position()=1]/child::name/attribute::given is a local location path selecting the given attribute node of all name child nodes of the first person child node of the context node – /child::personlist/child::person[position()=1]/child::name/attribute::given is a global location path selecting the given attribute node of all name child nodes of the first person child node of the personlist root element – /child::doc/child::chapter | /child::doc/child::appendix is a path expression selecting all chapter and all appendix child nodes of the doc root element
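The union p1 | p2 merges the results of both paths, drops duplicates and returns the nodes in document order, not "all of p1, then all of p2". A Python sketch over ElementTree; the `union` helper and the example document are hypothetical, and the ElementTree paths are written relative to the root element `doc`.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc><chapter id='c1'/><appendix id='a1'/><chapter id='c2'/></doc>"
)


def union(*node_lists):
    """p1 | ... | pk: merge results, drop duplicates, restore document order."""
    order = {node: i for i, node in enumerate(doc.iter())}
    merged = {node for nodes in node_lists for node in nodes}
    return sorted(merged, key=order.__getitem__)


# /child::doc/child::chapter | /child::doc/child::appendix
result = union(doc.findall("chapter"), doc.findall("appendix"))
print([n.get("id") for n in result])  # ['c1', 'a1', 'c2']
```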
XPath: core terms -- abbreviated syntax
- this becomes rather unreadable, so the abbreviated syntax was
introduced:
– child:: can be omitted from a location step e.g., div/para is short for child::div/child::para – attribute:: can be abbreviated to @ e.g., para[@type="warn"] is short for child::para[attribute::type="warn"]; para[@type] selects all para element child nodes having an attribute type – // is short for /descendant-or-self::node()/ e.g., //para is short for /descendant-or-self::node()/child::para – . is short for self::node() e.g., .//para is short for self::node()/descendant-or-self::node()/child::para – .. is short for parent::node() e.g., ../title is short for parent::node()/child::title
Example
<?xml version="1.0"?><?xml-stylesheet... ?> <!DOCTYPE people [... ]> <people> <person born="1912" died="1954" id="p342"> <name> <first_name>Alan</first_name> <last_name>Turing</last_name> </name> <profession>computer scientist</profession> <profession>mathematician</profession> <profession>cryptographer</profession> <homepage myref="http://www.turing.org.uk/"/> </person> <person born="1918" died="1988" id="p4567"> <name> <first_name>Richard</first_name> <middle_initial>M</middle_initial> <last_name>Feynman</last_name> </name> <profession>physicist</profession> <hobby>Playing the bongoes</hobby> </person></people> from: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
- /people/person/name/first_name/text()
selects "Alan" and "Richard"
- //middle_initial/../first_name
selects <first_name>Richard</first_name>
- //person[profession="physicist"]
selects all person nodes with a profession child node with the value "physicist".
Unabbreviated:
/child::people/child::person/child::name/child::first_name/child::text()
/descendant-or-self::node()/child::middle_initial/parent::node()/child::first_name
/descendant-or-self::node()/child::person[child::profession="physicist"]
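The three abbreviated paths can be tried with the limited XPath subset of Python's `xml.etree.ElementTree`. Assumptions: the prolog (stylesheet PI and DOCTYPE) is dropped, paths are evaluated relative to the root element `people` (so `/people/person` becomes `person` and `//x` becomes `.//x`), and `text()` is replaced by reading `.text`, since ElementTree does not support it.

```python
import xml.etree.ElementTree as ET

people = ET.fromstring("""
<people>
  <person born="1912" died="1954" id="p342">
    <name><first_name>Alan</first_name><last_name>Turing</last_name></name>
    <profession>computer scientist</profession>
    <profession>mathematician</profession>
    <profession>cryptographer</profession>
    <homepage myref="http://www.turing.org.uk/"/>
  </person>
  <person born="1918" died="1988" id="p4567">
    <name><first_name>Richard</first_name><middle_initial>M</middle_initial>
          <last_name>Feynman</last_name></name>
    <profession>physicist</profession>
    <hobby>Playing the bongoes</hobby>
  </person>
</people>""")

# /people/person/name/first_name/text()
print([e.text for e in people.findall("person/name/first_name")])  # ['Alan', 'Richard']

# //middle_initial/../first_name
print([e.text for e in people.findall(".//middle_initial/../first_name")])  # ['Richard']

# //person[profession="physicist"]
print([p.get("id") for p in people.findall(".//person[profession='physicist']")])  # ['p4567']
```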
Example
<?xml version="1.0"?><?xml-stylesheet... ?> <!DOCTYPE people [... ]> <people> <person born="1912" died="1954" id="p342"> <name> <first_name>Alan</first_name> <last_name>Turing</last_name> </name> <profession>computer scientist</profession> <profession>mathematician</profession> <profession>cryptographer</profession> <homepage xlink:href="http://www.turing.org.uk/"/> </person> <person born="1918" died="1988" id="p4567"> <name> <first_name>Richard</first_name> <middle_initial>M</middle_initial> <last_name>Feynman</last_name> </name> <profession>physicist</profession> <hobby>Playing the bongoes</hobby> </person></people> from: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
- //name[first_name="Richard" or first_name="Dick"]
selects all name nodes with a first_name child with the value Richard or Dick
- /people/person[@born < 1950]/name[first_name = "Alan"]
selects all name child nodes –
of person nodes whose born attribute value is below 1950 and
– that have a first_name child node with the value Alan
Unabbreviated:
/descendant-or-self::node()/child::name[child::first_name="Richard" or child::first_name="Dick"]
/child::people/child::person[attribute::born < 1950]/child::name[child::first_name = "Alan"]
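ElementTree's path subset has neither `or` nor `<`, so these two queries are sketched with plain Python filters instead. The small document is a hypothetical reduction of the example, with a third person (born 1960) added to show the date filter at work. XPath's `first_name = "Alan"` is true if any first_name child matches, hence `any(...)`; the born attribute is converted to a number before comparing.

```python
import xml.etree.ElementTree as ET

people = ET.fromstring(
    '<people>'
    '<person born="1912"><name><first_name>Alan</first_name></name></person>'
    '<person born="1918"><name><first_name>Richard</first_name></name></person>'
    '<person born="1960"><name><first_name>Alan</first_name></name></person>'
    '</people>'
)

# //name[first_name="Richard" or first_name="Dick"]
names = [n for n in people.iter("name")
         if any(f.text in ("Richard", "Dick") for f in n.findall("first_name"))]

# /people/person[@born < 1950]/name[first_name = "Alan"]
alans = [n for p in people.findall("person") if int(p.get("born")) < 1950
         for n in p.findall("name")
         if any(f.text == "Alan" for f in n.findall("first_name"))]

print(len(names), len(alans))  # 1 1
```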
Using XPath: e.g. in XQuery
XQuery
- is a language for querying XML data
- is for XML what SQL is for databases
- like XSLT, it is built on/heavily uses XPath expressions
- a W3C standard since 2007, see http://www.w3.org/TR/xquery/
- is supported by major database engines (IBM, Oracle, Microsoft, etc.)
- has an expressivity comparable to XSLT's, but a different philosophy:
– typed (XSLT can be said to be “less strictly” typed) – functional
- like XSLT, it can be used, e.g., to
– extract information to use in a Web Service – generate summary reports – transform XML data to HTML – search Web documents for relevant information – ...and to answer queries
XQuery: some basics
- XQuery provides support for datatypes, i.e., we
– have variables and can – declare their type, yet the query processor may be strict: it need not attempt to convert a value to the declared type! – e.g., if I try to add a string to an integer, or bind an attribute node such as $p/@age to a variable declared as xs:integer without casting it with xs:integer(...), the query processor may stop with an error
- like XPath, XQuery is based on item sequences
– a sequence is a (poss. empty) list of items separated by comma – items are atomic values or nodes (no nesting for sequences!) – as usual, nodes are of one of 7 kinds: element, attribute, text, namespace, processing-instruction, comment, or document (root) – if $mySeq is a sequence, $mySeq[3] is its third item
- all variable names start with “$” as in $mySeq
- comments are between “(:” and “:)” as in “(: this is a comment:)”
- a central, SQL-like part are FLWOR expressions
W3C speak
XQuery: FLWOR expressions
- “FLWOR” is pronounced “flower”
- a FLWOR expression is built from 5 kinds of clauses:
– For e.g., for $x in doc("people.xml")/contactlist/person – Let e.g., let $i := 3 let $n := $x/name/firstname – Where e.g., where $x/@categ = "friend" – Order by e.g., order by $x/name/lastname ascending – Return e.g., return concat($x/name/lastname, ", ", $x/name/firstname)
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
...
F and L can appear any (!) number of times in any order. W and O are optional, but must appear in the order given. R always has to be there...depending on who you ask...
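Reading a FLWOR expression as nested loops can help. Below is a Python sketch of the example clauses above, with each clause marked in a comment. The three-person document is hypothetical (only John Doe comes from people.xml), and Python's sort stands in for order by.

```python
import xml.etree.ElementTree as ET

contactlist = ET.fromstring(
    '<contactlist>'
    '<person categ="friend" age="25"><name><lastname>Doe</lastname>'
    '<firstname>John</firstname></name></person>'
    '<person categ="friend" age="14"><name><lastname>Brown</lastname>'
    '<firstname>Millie</firstname></name></person>'
    '<person categ="work" age="40"><name><lastname>Smith</lastname>'
    '<firstname>Ann</firstname></name></person>'
    '</contactlist>'
)

rows = []
for x in contactlist.findall("person"):       # for $x in .../contactlist/person
    first = x.findtext("name/firstname")      # let $n := $x/name/firstname
    if x.get("categ") == "friend":            # where $x/@categ = "friend"
        rows.append((x.findtext("name/lastname"), first))
rows.sort(key=lambda r: r[0])                 # order by $x/name/lastname ascending
result = [f"{last}, {first}" for last, first in rows]  # return concat(...)
print(result)  # ['Brown, Millie', 'Doe, John']
```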
XQuery FLWOR expressions: for
- a for expression determines what to
iterate through
- is basically of the form
for variable (as datatype)? (at position)? in expression
- where expression is
– any XPath location path or – a FLWOR expression (nesting!) or – a logic expression (if-then-else, etc.), later more
- e.g., for $b in doc("people.xml")/contactlist/person[@categ = "friend"]
– query processor goes through the sequence of all (element) nodes selected by the XPath location path
- e.g., let $n := 3
for $b as element() at $p in doc("people.xml")/contactlist/person[@categ = "friend"] where $p = $n – query processor goes through the sequence of element nodes selected by the XPath location path, binding $p to the position (1, 2, 3, ...) of each node; the where clause keeps only the third one (obviously, we can do this much nicer...suggestions?)
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
...
XQuery FLWOR expressions: let
- a let expression binds a variable to a value
- is basically of the form
let variable (as datatype)? := expression
- where expression is
– any XPath location path or – a FLWOR expression or – a logic expression (if-then-else, etc.), later more
- e.g.,
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
...
for $b in doc("people.xml")/contactlist/person let $name as text() := if (xs:integer($b/@age) < xs:integer(16)) then ($b/name/firstname/text()) else ($b/name/lastname/text()) return $name for $b in doc("people.xml")/contactlist/person let $name as element() := $b/name/firstname return $name
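A point that is easy to miss is that for iterates over a sequence, while let binds the variable once, to the whole sequence. A Python analogy, assuming sequences are modelled as lists and element construction as string formatting (an illustration only, not how a query processor works); the queries in the comments are invented examples.

```python
items = [1, 2, 3]

# for $x in (1, 2, 3) return <a>{$x}</a>: one result per item
for_result = [f"<a>{x}</a>" for x in items]

# let $x := (1, 2, 3) return <a>{$x}</a>: $x is bound once, to the whole
# sequence; atomic values in element content are joined with single spaces
x = items
let_result = [f"<a>{' '.join(map(str, x))}</a>"]

print(for_result)  # ['<a>1</a>', '<a>2</a>', '<a>3</a>']
print(let_result)  # ['<a>1 2 3</a>']
```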
XQuery FLWOR expressions: for & let
- we can repeat and mix for and let expressions
- a FLWOR expression
– has at least one for or one let expression, – but can have any number of them in any order
- careful: the order plays a crucial role for their meaning: make sure to bind
variables to the right values before using them in a for expression, as in this example:
- more careful: in this example, is the ‘double’ really a ‘double’?
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
... let $doc := doc("people.xml") for $p in $doc/contactlist/person let $n := $p/name/lastname/text() let $a := $p/@age for $double in $doc/contactlist/person[@age = $a][name/lastname/text() = $n] return $double
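The question about the ‘double’ can be answered with a quick Python experiment: since $double ranges over all persons, every person also matches itself. The list `people` is hypothetical, and list indices stand in for node identity. In XQuery, such self-matches can be excluded by adding where not($double is $p).

```python
# Hypothetical contacts as (lastname, age) pairs; only Doe/25 appears twice.
people = [("Doe", 25), ("Smith", 40), ("Doe", 25)]

# Naive join, as in the query: every person also matches itself.
naive = [(i, j) for i, p in enumerate(people)
         for j, q in enumerate(people) if p == q]

# Excluding the person itself (identity, not equal values).
doubles = [(i, j) for i, p in enumerate(people)
           for j, q in enumerate(people) if p == q and i != j]

print(naive)    # [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]
print(doubles)  # [(0, 2), (2, 0)]
```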
XQuery FLWOR expressions: return
- a return expression determines the output
- is basically of the form
return expression
- where expression is one of the logical expressions to be defined later
- it returns elements as they are, i.e., with attributes and descendants
- e.g., the query below
returns <MyFriendList>JohnMillie...</MyFriendList>: adjacent text nodes are merged without spaces (returning data($b/name/firstname) instead gives "John Millie ...")
- careful: we need "{" and "}" to distinguish between text and instructions
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
... return <MyFriendList> { for $b in doc("people.xml")/contactlist/person[@categ="friend"] return $b/name/firstname/text() } </MyFriendList>
XQuery FLWOR expressions: return & logical expressions
- as mentioned before, in FLWOR expression,
we can make use of logical expressions including – if-then-else – some/every – Boolean expressions
- e.g.,
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
... let $doc := doc("people.xml") return <MyFriendList> { for $b in $doc/contactlist/person[@categ="friend"] return <friend> { (if (xs:integer($b/@age) < xs:integer(16)) then $b/name/firstname/text() else $b/name/lastname/text()) } </friend> } </MyFriendList>
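The same construction (a wrapper element, one friend element per person, and an if-then-else choosing the text) can be mimicked with ElementTree's element builders. This is a sketch, assuming the friends are given as hypothetical (firstname, lastname, age) triples rather than read from people.xml.

```python
import xml.etree.ElementTree as ET

friends = [("John", "Doe", 25), ("Millie", "Brown", 14)]

root = ET.Element("MyFriendList")              # <MyFriendList>{ ... }</MyFriendList>
for first, last, age in friends:               # for $b in ...[@categ="friend"]
    friend = ET.SubElement(root, "friend")     # <friend>{ ... }</friend>
    friend.text = first if age < 16 else last  # if (age < 16) then firstname else lastname

print(ET.tostring(root, encoding="unicode"))
# <MyFriendList><friend>Doe</friend><friend>Millie</friend></MyFriendList>
```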
XQuery: constructors
- as we have seen, we can use text in the return part
- to return a more complex XML document, we can make use of
constructors – e.g., direct element constructors as in the previous example – or direct element constructors with attributes
- we use “{“ and “}” to delimit expressions that are evaluated, e.g.,
- if we want to construct
elements with attributes, we can do this easily: e.g., return <friend phone="{ xs:string($p/phone) }">{ (if (...
let $doc := doc("contactlist-john-doe.xml") for $p in $doc/contactlist/person return <example> <p> Here is a query. </p> <eg> $p/name</eg> <p> Here is the result of the query. </p> <eg>{ $p/name }</eg> </example>
XQuery FLWOR expressions: where
- where is used to filter the bindings
generated by for and let
- like in SQL, we can use where for joins
of several trees or documents
- e.g.,
cities.xml <?xml version="1.0" encoding="UTF-8"?> <citylist> <city> <name>Manchester</name> <club>Manchester United</club> </city> <city> <name>Munich</name> <club>Die Loewen</club> </city> ...
for $p in doc("contactlist-john-doe.xml")/contactlist/person for $c in doc("cities.xml")/citylist/city where $p/city/text() = $c/name/text() return concat("Dear ", $p/name/firstname, ", do you like ", $c/club ,"? " )
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
...
XQuery FLWOR expressions: where
- a more realistic, SQL-like example
(from <oXygen/>):
products.xml <?xml version="1.0" encoding="UTF-8"?> <products> <product> <productId>1</productId> <productName>Wave Runner</productName> <productSpec>120 HP blaa</productSpec> </product> ... sales.xml <?xml version="1.0" encoding="UTF-8"?> <sales> <sale productId="1"> <mrq>180$</mrq> <ytd>18.87% up</ytd> <margin>5%</margin> </sale ...
<sales> { for $product in doc("products.xml")/products/product, $sale in doc("sales.xml")/sales/sale where $product/productId = $sale/@productId return <product id="{$product/productId}"> { $product/productName, $product/productSpec, $sale/mrq, $sale/ytd, $sale/margin } </product> } </sales>
XQuery FLWOR expressions: nesting
- like in SQL, we can nest expressions
- e.g., the previous example does not
work in case a city has several clubs:
cities.xml <?xml version="1.0" encoding="UTF-8"?> <citylist> <city> <name>Manchester</name> <club>Manchester United</club> <club>Manchester City</club> </city> <city> <name>Munich</name> <club>Die Loewen</club> <club>Bayern-Muenchen</club> </city> ...
<sales> {for $p in doc("contactlist-john-doe.xml")/contactlist/person for $c in doc("cities.xml")/citylist/city where $p/city = $c/name return (for $i in 1 to fn:count($c/club) return concat("Dear ", $p/name/firstname, ", do you like ", $c/club[$i], " ?"))} </sales>
people.xml
<?xml version="1.0" encoding="UTF-8"?> <contactlist> <person categ="friend" age="25"> <name> <lastname>Doe</lastname> <firstname>John</firstname> </name> <phone>0044 161 1234 5667</phone> <address> 123 Main Street</address> <city>Manchester</city> </person>
...
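The join and the nesting can be read as nested loops in Python: the where clause becomes the condition linking the two loops, and the inner for over the clubs produces one greeting per club instead of trying to concat a whole sequence of clubs. The dictionaries below are hypothetical stand-ins for people.xml and the two-club cities.xml.

```python
persons = [{"firstname": "John", "city": "Manchester"}]
cities = [
    {"name": "Manchester", "clubs": ["Manchester United", "Manchester City"]},
    {"name": "Munich", "clubs": ["Die Loewen", "Bayern-Muenchen"]},
]

greetings = [
    f"Dear {p['firstname']}, do you like {club} ?"
    for p in persons                 # for $p in .../person
    for c in cities                  # for $c in .../city
    if p["city"] == c["name"]        # where $p/city = $c/name
    for club in c["clubs"]           # nested: one greeting per club
]
for g in greetings:
    print(g)
# Dear John, do you like Manchester United ?
# Dear John, do you like Manchester City ?
```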
XQuery FLWOR expressions: order by
- order by allows us to order sequences before we return them
- we can combine several orderings into new ones lexicographically
- e.g., for $nr in 1 to 5
for $letter in ("a", "b", "c")
order by $nr descending, $letter descending
return concat($nr, $letter) yields 5c 5b 5a 4c 4b ....
- e.g., for $nr in 1 to 5
for $letter in ("a", "b", "c")
order by $letter descending, $nr descending
return concat($nr, $letter) yields 5c 4c 3c 2c 1c 5b...
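Lexicographic ordering corresponds to sorting by a tuple key, as this Python sketch of the two examples shows. Since both keys are descending here, a single `reverse=True` suffices; mixing ascending and descending keys would need a different key function or several stable sorts.

```python
from itertools import product

# All ($nr, $letter) bindings produced by the two for clauses.
pairs = list(product(range(1, 6), ["a", "b", "c"]))

# order by $nr descending, $letter descending
by_nr = [f"{n}{ch}" for n, ch in sorted(pairs, key=lambda p: (p[0], p[1]), reverse=True)]

# order by $letter descending, $nr descending
by_letter = [f"{n}{ch}" for n, ch in sorted(pairs, key=lambda p: (p[1], p[0]), reverse=True)]

print(by_nr[:5])      # ['5c', '5b', '5a', '4c', '4b']
print(by_letter[:6])  # ['5c', '4c', '3c', '2c', '1c', '5b']
```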
XQuery: grouping
- like SQL, XQuery provides aggregation functions
– max and min – average – count, etc
- like in SQL, when we want to use them, we need to group:
- but this comes naturally, e.g.,
for $an in fn:distinct-values(doc("orders.xml")/orderlist/order/artNr) let $arts := doc("orders.xml")/orderlist/order[artNr = $an] where fn:count($arts) >= 3 return <high-demand-item> <articleNr> { $an } </articleNr> <maxPrice> { fn:max($arts/price) } </maxPrice> <avgPrice> { fn:avg($arts/price) } </avgPrice> </high-demand-item>
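The grouping pattern (distinct keys, one let per group, a where on the group size, aggregates in the return) maps directly onto a Python loop. The `orders` list is a hypothetical flattening of orders.xml into (artNr, price) pairs. Note that the order of fn:distinct-values results is implementation-dependent; `dict.fromkeys` keeps first occurrences.

```python
from statistics import mean

orders = [("A1", 10.0), ("B7", 5.0), ("A1", 12.0), ("A1", 11.0), ("B7", 6.0)]

report = []
for an in dict.fromkeys(a for a, _ in orders):        # fn:distinct-values(...)
    prices = [p for a, p in orders if a == an]         # let $arts := ...[artNr = $an]
    if len(prices) >= 3:                               # where fn:count($arts) >= 3
        report.append((an, max(prices), mean(prices)))  # fn:max, fn:avg
print(report)  # [('A1', 12.0, 11.0)]
```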
XQuery: functions
- XQuery is more than FLWOR expression
- it provides more than 100 built-in functions; we have already seen
some, plus – e.g., <name>{upper-case($p/lastname)}</name> – e.g., let $nickname := (substring($p/firstname,1,4))
- it allows the user to define functions
- e.g.,
declare function prefix:function_name(($parameter as datatype)*) as returnDatatype { (: ...your function code here... :) };
declare function local:minPrice( $price as xs:decimal, $discount as xs:decimal ) as xs:decimal { let $disc := ($price * $discount) div 100 return ($price - $disc) };
to be used e.g., in <minPrice> { local:minPrice($book/price, $book/discount)} </minPrice>
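For comparison, the same function in Python. It uses `decimal.Decimal` because, like xs:decimal, it does exact decimal arithmetic, so prices such as 19.99 are not subject to binary floating-point rounding. The name `min_price` and the sample prices are illustrative.

```python
from decimal import Decimal


def min_price(price: Decimal, discount: Decimal) -> Decimal:
    """Price after a discount given in percent, like local:minPrice."""
    disc = (price * discount) / 100
    return price - disc


print(min_price(Decimal("20"), Decimal("10")) == Decimal("18"))          # True
print(min_price(Decimal("19.99"), Decimal("15")) == Decimal("16.9915"))  # True
```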
To summarize the departments from Manchester, use: local:summary(doc("acme_corp.xml")//employee[location = "Manchester"])
declare function local:summary($emps as element(employee)*) as element(dept)* { for $d in fn:distinct-values($emps/deptno) let $e := $emps[deptno = $d] return <dept> <deptno>{$d}</deptno> <headcount> {fn:count($e)} </headcount> <payroll> {fn:sum($e/salary)} </payroll> </dept> };
XQuery: functions, e.g., namespace analysis
- remember: XQuery is a Turing-complete programming language
- so, we should be able to do our namespace analysis in XQuery:
– how does XQuery treat namespaces? – how can we compare ‘prodigy’ of a namespace? – ...let’s see: for a start – get all namespaces and prefixes that are valid at a node (not necessarily defined there) ...and store them in a sequence
declare namespace bjp = 'http://ex.org/'; declare variable $d := doc('testsuper.xml'); declare function bjp:nsBindingsForNode($node) { for $prefix in in-scope-prefixes($node) for $ns in namespace-uri-for-prefix($prefix, $node)
order by $prefix ascending
return <nsb pre="{$prefix}" ns="{$ns}"/> };
XQuery: functions, e.g., namespace analysis
- how to test documents for “superconfusing”?
- remember: a superconfusing document contains at least
one node which has 2 distinct in-scope prefixes bound to the same
namespace
- so, check the namespace bindings for a single node
(using sequence from bjp:nsBindingsForNode):
declare function bjp:multiPrefixedNs($bindings){ for $b in $bindings for $b2 in $bindings where not($b/@pre = $b2/@pre) and ($b/@ns = $b2/@ns) return <multi>{$b} {$b2}</multi> };
XQuery: functions, e.g., namespace analysis
- next, we need to test all nodes in our documents for superconfusion:
- finally, we call our function -- in a way that cuts out the (otherwise far too
numerous) repetitions of our return string:
declare function bjp:isSuperConfusing(){ for $n in $d//* for $m in bjp:multiPrefixedNs(bjp:nsBindingsForNode($n)) return 'YES - itʼs superconfusing!' }; distinct-values(bjp:isSuperConfusing())
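The same test can be sketched in Python with `xml.etree.ElementTree.iterparse`, which reports namespace declarations as "start-ns" events. The sketch tracks the in-scope prefix bindings with a stack (an inner declaration overrides an outer one) and answers as soon as one element has two distinct prefixes bound to the same namespace. The default namespace counts as the prefix '', as in in-scope-prefixes; the implicit xml prefix is ignored because nothing else can be bound to its namespace.

```python
import io
import xml.etree.ElementTree as ET


def is_superconfusing(xml_text: str) -> bool:
    """True if some element has two distinct in-scope prefixes for one namespace."""
    scopes = [{}]   # stack of in-scope bindings: prefix -> namespace URI
    pending = {}    # declarations reported before the next element starts
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item        # prefix '' is the default namespace
            pending[prefix] = uri
        elif event == "start":
            scope = {**scopes[-1], **pending}  # inherit, then override
            pending = {}
            scopes.append(scope)
            uris = [u for u in scope.values() if u]  # ignore xmlns="" undeclarations
            if len(uris) != len(set(uris)):
                return True
        else:                         # "end": leave this element's scope
            scopes.pop()
    return False


print(is_superconfusing('<a xmlns:p="urn:x"><b xmlns:q="urn:x"/></a>'))  # True
print(is_superconfusing('<a xmlns:p="urn:x"><b xmlns:p="urn:y"/></a>'))  # False
```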
XQuery, schemas, and types
- if you query documents that are associated with a schema, you can exploit
schema-aware query answering: – [DTD] no types, but default values, e.g., the answer to this query may vary depending on the DTD
<!ELEMENT person (name,email*,url*,link?)> <!ATTLIST person id ID #REQUIRED> <!ATTLIST person isFriend (true|false) 'true'> for $m in doc('personal.xml')//*[@isFriend = 'true'] return $m/name/family/text()
XQuery, schemas, and types
- if you query documents that are associated with a schema, you can exploit
schema-aware query answering, e.g., with an XML Schema-aware processor like SAXON-SA: – careful if you use <oXygen>: it sometimes gets confused whether you use SAXON-B
or SAXON-SA
– [WXS] has default values, e.g., the answer to this query may vary depending on your schema
<?xml version="1.0" encoding="UTF-8"?> <uli:nlist xmlns:uli="www.uli.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="www.uli.org file:test4.xsd"> <uli:nEl>3</uli:nEl> <uli:nEl attr="4">4</uli:nEl> <uli:nEl>5</uli:nEl> </uli:nlist>
import schema namespace uli="www.uli.org" at "test4.xsd"; for $m in doc('Untitled7.xml')//uli:nEl return data($m/@attr)
<xs:element name="nlist"> <xs:complexType> <xs:sequence> <xs:element name="nEl" type="uliS:number" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:complexType name="number"> <xs:simpleContent> <xs:extension base="xs:integer"> <xs:attribute name="attr" default="15"/> </xs:extension> </xs:simpleContent> </xs:complexType>
XQuery, schemas, and types
- if you query documents that are associated with a schema, you can exploit
schema-aware query answering, e.g., with an XML Schema-aware processor like SAXON-SA: – [WXS] has types, e.g., the answer to this query may vary depending
on your schema
<?xml version="1.0" encoding="UTF-8"?> <uli:list xmlns:xsi= "http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="www.uli.org test4.xsd" xmlns:uli="www.uli.org"> <uli:friend>Paul</uli:friend> <uli:friend>Peter</uli:friend> <uli:friend>Mary</uli:friend> <uli:friend>Joanne</uli:friend> <uli:friend>Lucy</uli:friend> </uli:list> import schema namespace uli="www.uli.org" at "test4.xsd"; for $m in doc('Untitled5.xml')//element(*,uli:A) return $m/uli:friend/text()
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs= "http://www.w3.org/2001/XMLSchema" targetNamespace="www.uli.org" xmlns:uliS="www.uli.org" elementFormDefault="qualified"> <xs:element name="list" type="uliS:B"> </xs:element> <xs:complexType name="A"> <xs:sequence> <xs:element name="friend" type='xs:string' minOccurs = '3' maxOccurs ='5'/> </xs:sequence></xs:complexType> <xs:complexType name="B"> <xs:complexContent> <xs:restriction base="uliS:A"> <xs:sequence> <xs:element name="friend" type='xs:string' minOccurs = '4' maxOccurs ='5'/> </xs:sequence></xs:restriction> </xs:complexContent> </xs:complexType>
Namespace, schemas, and queries
- schemas and queries can be used together in a powerful way
– e.g., to retrieve values and default values – e.g., by exploiting the type hierarchy in queries: this can have various advantages:
- we can save big ‘unions’ of queries by querying for instances of
super types
- should we change our schema/want to work with documents with
new kinds of elements (see XML/OWL coursework), it may suffice to adapt the schema to new types; queries may remain unchanged!
- usage of namespace, schemas, and queries is a bit tricky:
– when to use/declare which namespace/prefix where – tool support required
- more in coursework and later
A Picture: how good is your schema for your data & queries?
- how difficult is it to write
queries for your questions – in the chosen query language and – against a chosen schema? – e.g., is it easier to write q21 and q22 than q11 and q12? Then Schema2 might be better suited… – other concerns: what happens when the data or the questions change?
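[Figure: the same data described by Schema1 and by Schema2; queries q11, q12 are written against Schema1 and q21, q22 against Schema2, yielding answers An1 and An2]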
Query Containment
- in general, query containment can be used for query optimisation:
– if I ask Q2 after Q1 against a DB B (and a schema S), and – if I know that all answers to Q2 are contained in the answers to Q1 (in all databases conforming to S), – then I can answer Q2 against the answers to Q1 instead of B… which is hopefully much smaller and therefore faster to answer
- for simple (!) SQL queries, query containment can be decided,
– i.e., there exists a decision procedure that takes Q1, Q2, and – determines whether, for all DBs B, ans(Q1,B) ⊆ ans(Q2,B)
- remember how we said that XQuery is Turing-complete?
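For a very restricted class of queries, containment is easy to decide. In this Python sketch, assumed for illustration only, a query is a conjunction of attribute = constant conditions on one table, stored as a dict, and every attribute is assumed to have at least two possible values. Under these assumptions, Q2 is contained in Q1 over every table exactly when Q2 imposes all of Q1's conditions, and then Q2 can be answered from Q1's cached answers.

```python
def contained_in(q2: dict, q1: dict) -> bool:
    """ans(q2) is a subset of ans(q1) on every table iff q2 has all of q1's conditions."""
    return all(q2.get(attr) == value for attr, value in q1.items())


def answer(query: dict, rows: list) -> list:
    return [r for r in rows if all(r.get(a) == v for a, v in query.items())]


table = [
    {"city": "Manchester", "categ": "friend"},
    {"city": "Manchester", "categ": "work"},
    {"city": "Munich", "categ": "friend"},
]
q1 = {"city": "Manchester"}
q2 = {"city": "Manchester", "categ": "friend"}

if contained_in(q2, q1):
    cached = answer(q1, table)                       # answers to Q1, already computed
    print(answer(q2, cached) == answer(q2, table))   # True
```

For richer query languages the question gets much harder, which is where the remark about Turing-completeness comes in.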