COMP6037 Semi-structured Data and the Web XPath and XQuery, week 2



SLIDE 1

COMP6037 Semi-structured Data and the Web
XPath and XQuery, week 2

Uli Sattler

University of Manchester

SLIDE 2

Manipulation of XML documents

  • there are various standards, tools, APIs, and data models for XML, used to:
    – validate
    – parse
    – query
    – transform
      • into other XML documents
      • into other formats, e.g., HTML, Excel, relational tables
  • we continue with XPath:
    – for navigating through and querying XML documents
    – used in XQuery and in XSLT

SLIDE 3

Manipulation of XML documents

  • XPath: for navigating through and querying XML documents
  • XQuery
    – more expressive than XPath, uses XPath
    – for querying and data manipulation
    – Turing complete
    – designed to access large amounts of data and to interface with relational systems
  • XSLT
    – similar to XQuery in that it uses XPath, ....
    – designed for “styling”, together with XSL-FO or CSS
  • DOM and SAX
    – a collection of APIs for programmatic manipulation
    – includes data model and parser
    – to build your own applications

SLIDE 4

XPath

  • designed to navigate to/select parts of a well-formed XML document
  • no transformational capabilities (unlike XQuery and XSLT, which have them)
  • is a W3C standard:
    – XPath 1.0 is a 1999 W3C standard
    – XPath 2.0 is a 2007 W3C standard that extends/is a superset of XPath 1.0
      • richer set of WXS datatypes and support for type information from WXS validation
      • see http://www.w3.org/TR/xpath20
  • allows us to select/define parts of an XML document: lists of nodes
  • uses path expressions
    – to navigate in XML documents
    – to select node-lists in an XML document
  • you have worked with path expressions in your 1st assignment: they are like the path expressions in a traditional computer file system
  • provides numerous built-in functions
    – e.g., for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc.

Question: what is the difference between a list and a set?

SLIDE 5

XPath: Datamodel

  • remember how an XML document can be seen as a node-labelled tree
    – with element names as labels
  • XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax (but not on the DOM tree!)
  • XPath uses the XQuery/XPath Datamodel
  • there is a translation at http://www.w3.org/TR/xpath20/#datamodel
    – see XPath process model...


SLIDE 7

Content models and types in DTD and WXS

  • in DTDs, we don’t really have types, only element names
  • in WXS, we have a type hierarchy
    – an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
    – we call this ‘named’ typing: sub-types are declared (by restriction or extension), and not inferred (by comparing structure), e.g.,
      • Age and YoungAge are subtypes of integer,
      • but YoungAge is not a subtype of Age
      • however, ProperYoungAge is a subtype of Age

    <xs:simpleType name="Age">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="130"/>
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="YoungAge">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="19"/>
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="ProperYoungAge">
      <xs:restriction base="Age">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="19"/>
      </xs:restriction>
    </xs:simpleType>
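The idea of ‘named’ typing can be sketched in a few lines of Python. This is only an illustration, not how a schema processor is implemented: the dictionary DERIVED_FROM and the function is_subtype are hypothetical names, and the table is filled by hand from the three declarations above.

```python
# Declared derivations, copied by hand from the schema fragment:
# each type maps to the base type it is declared to restrict.
DERIVED_FROM = {
    "Age": "xs:integer",
    "YoungAge": "xs:integer",
    "ProperYoungAge": "Age",
}


def is_subtype(type_name, ancestor):
    """Return True if type_name is ancestor or is declared to derive from it."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = DERIVED_FROM.get(current)  # None once we leave the declared types
    return False


# YoungAge's values all lie within Age's range, but no derivation is declared.
print(is_subtype("YoungAge", "Age"))
# ProperYoungAge is declared as a restriction of Age.
print(is_subtype("ProperYoungAge", "Age"))
```

The check only follows declared base types; it never compares the facets (0 to 19 versus 0 to 130), which is exactly the difference between declared and inferred subtyping.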

SLIDE 8

Types in WXS

  • how do we determine the type of an element w.r.t. a WXS schema?
  • 1. determine the type hierarchy, i.e., all types and where they are derived from
    – recall: an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
    – if Y1, ..., Yk are all subtypes of X, then
        e(X) := e(X) ∪ e(Y1) ∪ ... ∪ e(Yk)
      for e(T) the extension of type T, i.e., its instances
  • 2. for each element in the document, find its type (and supertypes)
    – difficult, e.g., if the schema contains

    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
        <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>

SLIDE 9

Content models and types in DTD and WXS

  • in order to prevent difficulties in WXS as caused by

    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
        <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>

    WXS’s Element Declarations Consistent constraint is imposed (and also on the schema at top level):

    If the {particles} contains, either directly, indirectly (that is, within the {particles} of a contained model group, recursively) or implicitly two or more element declaration particles with the same {name} and {target namespace}, then all their type definitions must be the same top-level definition, that is, all of the following must be true:
      1 all their {type definition}s must have a non-absent {name}.
      2 all their {type definition}s must have the same {name}.
      3 all their {type definition}s must have the same {target namespace}.
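The constraint quoted above can be sketched as a small check in Python. This is a simplified illustration under stated assumptions: a model group is given as a flat list of hypothetical (element name, target namespace, type name) triples, an anonymous type is written as None, and the type's own target namespace is folded into the type name; edc_violations is a made-up helper, not part of any schema library.

```python
from collections import defaultdict


def edc_violations(particles):
    """Return the (name, namespace) pairs that break the consistency rule.

    particles: list of (element name, target namespace, type name or None).
    """
    types_by_element = defaultdict(list)
    for name, namespace, type_name in particles:
        types_by_element[(name, namespace)].append(type_name)

    violations = []
    for element, type_names in types_by_element.items():
        if len(type_names) < 2:
            continue  # the rule only speaks about two or more particles
        anonymous = None in type_names          # condition 1: non-absent name
        different = len(set(type_names)) != 1   # conditions 2 and 3 (simplified)
        if anonymous or different:
            violations.append(element)
    return violations


# The two person declarations from the slide, plus one harmless declaration.
group = [
    ("person", "", "NewPersonType"),
    ("person", "", "OldPersonType"),
    ("name", "", "xs:string"),
]
print(edc_violations(group))
```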

SLIDE 10

Determining types in DTD and WXS

  • [DTD] element name = type of that element
  • [WXS] as a consequence of the Element Declarations Consistent (EDC) constraint, we can determine the types of all elements in a top-down manner (this is done during validation and recorded in the PSVI):
    – start with n = root element node
    – from the element name e of n, determine the type t of n (if n is the root node, this is possible since a schema cannot contain two global components with the same name; otherwise the EDC constraint ensures this)
    – 1. in the schema, find the model group G for t and, for each element child node n’ of n with name e’, determine in G the type t’ of e’ and recurse into (1.)

SLIDE 11

XPath: Datamodel

  • the XPath DM uses the following concepts
  • nodes:
    – element
    – attribute
    – text
    – namespace
    – processing-instruction
    – comment
    – document (root)
  • atomic values:
    – behave like nodes without children or parents
    – an atomic value is a value in the value space of a WXS atomic type, e.g., xsd:string
  • items: atomic values or nodes
  • e.g., the following document contains element nodes (such as title), an attribute node (lang), and text nodes (such as “Harry Potter”):

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore>
      <book>
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
    </bookstore>

SLIDE 12

XPath Data Model

From: http://xformsinstitute.com/essentials/browse/ch03s02.php

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <?xml-stylesheet href="screen.css" type="text/css" media="screen"?>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
      <head>
        <title>Virtual Library</title>
      </head>
      <body>
        <p>Moved to <a href="http://vlib.example.org/">vlib.example.org</a>.</p>
      </body>
    </html>

SLIDE 13

Comparison XPath DM and DOM datamodel

  • XPath DM and DOM DM are similar, but different
    – most importantly regarding names and values of nodes, but also structurally (see ★)
    – in XPath, only attributes, elements, processing instructions, and namespace nodes have names, of the form (local part, namespace URI)
    – whereas DOM uses pseudo-names like #document, #comment, #text
    – in XPath, the value of an element or root node is the concatenation of the values of all its text node descendants, not null as it is in DOM:
      • e.g., the XPath value of <a>A<b>B</b></a> is “AB”
    ★ XPath does not have separate nodes for CDATA sections (they are merged with their surrounding text)
    – XPath has no representation

DOM view (labels from the figure):
  • Document: nodeType = DOCUMENT_NODE, nodeName = #document, nodeValue = (null)
  • Element: nodeType = ELEMENT_NODE, nodeName = mytext, nodeValue = (null), with links firstchild, lastchild, attributes

CDATA example:

    <N>here is some text and <![CDATA[some CDATA < >]]> </N>
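The difference in node values can be observed with Python's standard library. This is a rough sketch only: minidom is a DOM implementation, while ElementTree is neither DOM nor the XPath data model; it is used here merely as a convenient way to concatenate descendant text, which mimics the XPath string value.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOC = "<a>A<b>B</b></a>"

# DOM view: an element node carries no value of its own.
dom_value = minidom.parseString(DOC).documentElement.nodeValue

# XPath-style string value: concatenate all descendant text in document order.
string_value = "".join(ET.fromstring(DOC).itertext())

# The CDATA content is not kept apart: it is merged into the surrounding text.
cdata_text = ET.fromstring(
    "<N>here is some text and <![CDATA[some CDATA < >]]> </N>"
).text

print(dom_value)
print(string_value)
print(repr(cdata_text))
```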

SLIDE 14

XPath: core terms -- relation between nodes

  • (since we view XML documents as trees) each node has at most one parent
    – each node but the root node has exactly one parent
    – the root node has no parent
  • each node has zero or more children
  • ancestor is the transitive closure of parent, i.e., a node’s parent, its parent, its parent, ...
  • descendant is the transitive closure of child, i.e., a node’s children, their children, their children, ...
  • when evaluating an XPath expression p, we assume that we know
    – which document and
    – which context we are evaluating p over
    – … we see later how they are chosen/given
  • an XPath expression evaluates to an item sequence,
    – an item is either a node (doc., element, attribute, ...) or an atomic value
    – document order is preserved among items
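Ancestor and descendant as transitive closures of parent and child can be sketched as follows. The tiny document and the helper names ancestors and descendants are made up for illustration; ElementTree stores no parent pointers, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/></b><d/></a>")

# Map every element to its parent; the root has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Follow parent links repeatedly: parent, its parent, its parent, ..."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def descendants(node):
    """Children, their children, ... in document order (the node itself excluded)."""
    return [d for d in node.iter() if d is not node]


c = root.find("./b/c")
print([n.tag for n in ancestors(c)])
print([n.tag for n in descendants(root)])
```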

SLIDE 15

XPath: core terms -- document order

  • Within a tree, document order is specified as follows:
    – The root node is the first node
      ➥ top-down
    – Every node occurs before all of its children and descendants
      ➥ top-down
    – Namespace nodes immediately follow the element node with which they are associated. The relative order of namespace nodes is stable but implementation-dependent.
      ➥ exception for NSs - they are first
    – Attribute nodes immediately follow the namespace nodes of the element node with which they are associated. The relative order of attribute nodes is stable but implementation-dependent.
      ➥ exception for attributes - they are second
    – The relative order of siblings is the order in which they occur in the children property of their parent node.
      ➥ left-before-right
    – Children and descendants occur before following siblings
      ➥ depth-first
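The rules above amount to a depth-first, left-to-right traversal in which an element's attributes come right after the element itself. A minimal sketch, assuming a shortened version of the bookstore document from the data model slide; namespace nodes are left out and whitespace-only text is skipped, both simplifications of this sketch rather than of XPath.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore><book>
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
</book></bookstore>"""


def document_order(elem):
    """Yield (kind, label) pairs for elem and everything below it."""
    yield ("element", elem.tag)           # a node comes before its descendants
    for name in elem.attrib:              # attributes directly follow their element
        yield ("attribute", name)
    if elem.text and elem.text.strip():
        yield ("text", elem.text)
    for child in elem:                    # siblings left before right
        yield from document_order(child)  # descendants before following siblings
        if child.tail and child.tail.strip():
            yield ("text", child.tail)


nodes = list(document_order(ET.fromstring(BOOKSTORE)))
for node in nodes:
    print(node)
```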

SLIDE 16

XPath: core terms -- selecting nodes

Location paths/Path expressions are used to select nodes

  • they are based on (location) steps
  • a step is of the form axis::test[predicate]...[predicate] where
    – axis indicates the navigation direction in the tree
    – test determines what (nodes by kind or name) to select
    – zero or more predicates select a further subset
  • axes include:
    – self for the context node
    – child for all child nodes
    – descendant for all descendant nodes (but not attribute or NS nodes)
    – parent for the parent node (in case it exists)
    – ancestor for all ancestor nodes
    – attribute for the attribute nodes of the context node
    – etc.
  • a path expression, when evaluated on an XML document, returns an item sequence

SLIDE 17

XPath: core terms -- axis

    ForwardAxis ::= ("child" "::")
                  | ("descendant" "::")
                  | ("attribute" "::")
                  | ("self" "::")
                  | ("descendant-or-self" "::")
                  | ("following-sibling" "::")
                  | ("following" "::")
                  | ("namespace" "::")

    ReverseAxis ::= ("parent" "::")
                  | ("ancestor" "::")
                  | ("preceding-sibling" "::")
                  | ("preceding" "::")
                  | ("ancestor-or-self" "::")

SLIDE 18

XPath: core terms -- selectors

Node Tests include:

  • element or attribute names, e.g.,
    – child::title selects the title element child nodes of the context node
    – attribute::height selects the height attribute node of the context node
  • * - a wildcard for element or attribute names, e.g.,
    – child::* selects all element child nodes of the context node
    – attribute::* selects all the attributes of the context node
  • text() - to select text nodes, e.g.,
    – child::text() selects all text child nodes of the context node
  • node() - to select all nodes, regardless of their type, e.g.,
    – child::node() selects all child nodes of the context node
  • element(x,Y) - to select all element nodes with name x and type Y, e.g.,
    – child::element(person,ModernPersonType) and child::element(*,ModernPersonType)
  • etc.

SLIDE 19

XPath: core terms -- predicates

  • [pred1]...[predm] sub-selects those nodes that satisfy pred1 and ... and predm
  • they sub-select element nodes according to
    – their position, e.g., [position() = 1], [position() = last() - 1], etc.
    – properties of related nodes, e.g., [attribute::type="warning"] selects those element nodes that have a type attribute with value "warning"
    – and can be combined using “or”, e.g., [position() = 1 or position() = 2]
  • Examples:
    – child::chapter[attribute::type="warning"][position()=5]
      selects the fifth of those chapter children of the context node that have a type attribute with value "warning"
    – child::chapter[attribute::type="warning" or position()=5]
      selects all chapter child nodes of the context node that have a type attribute with value "warning", together with the 5th chapter child node
    – descendant::chapter[child::title="Introduction"]
      selects the chapter descendant nodes of the context node that have one or more title children whose value is “string-equal” to "Introduction"

SLIDE 20

XPath: core terms -- location path & path expression

  • a location path
    – is basically a sequence of location steps loc-step
    – can be local/relative: of the form (loc-step "/")* loc-step
    – can be global/absolute: of the form "/" or "/" (loc-step "/")* loc-step
  • if we do not have a context node, the root node is the context node
  • a path expression is a location path or a disjunction of location paths p1|..|pk
  • e.g.,
    – child::person[position()=1]/child::name/attribute::given
      is a local location path selecting the given attribute node of all name child nodes of the first person child node of the context node
    – /child::personlist/child::person[position()=1]/child::name/attribute::given
      is a global location path selecting the given attribute node of all name child nodes of the first person child node of the personlist root node
    – /child::doc/child::chapter | /child::doc/child::appendix
      is a path expression selecting all chapter and all appendix child nodes of the doc root node

SLIDE 21

XPath: core terms -- abbreviated syntax

  • this becomes rather unreadable, so the abbreviated syntax was introduced:
    – child:: can be omitted from a location step
      e.g., div/para is short for child::div/child::para
    – attribute:: can be abbreviated to @
      e.g., para[@type="warn"] is short for child::para[attribute::type="warn"]
      para[@type] selects all para element child nodes having an attribute type
    – // is short for /descendant-or-self::node()/
      e.g., //para is short for /descendant-or-self::node()/child::para
    – . is short for self::node()
      e.g., .//para is short for self::node()/descendant-or-self::node()/child::para
    – .. is short for parent::node()
      e.g., ../title is short for parent::node()/child::title

SLIDE 22

Example

    <?xml version="1.0"?>
    <?xml-stylesheet... ?>
    <!DOCTYPE people [... ]>
    <people>
      <person born="1912" died="1954" id="p342">
        <name>
          <first_name>Alan</first_name>
          <last_name>Turing</last_name>
        </name>
        <profession>computer scientist</profession>
        <profession>mathematician</profession>
        <profession>cryptographer</profession>
        <homepage myref="http://www.turing.org.uk/"/>
      </person>
      <person born="1918" died="1988" id="p4567">
        <name>
          <first_name>Richard</first_name>
          <middle_initial>&#x4D;</middle_initial>
          <last_name>Feynman</last_name>
        </name>
        <profession>physicist</profession>
        <hobby>Playing the bongoes</hobby>
      </person>
    </people>

from: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html

  • /people/person/name/first_name/text()
    selects "Alan" and "Richard"
    unabbreviated: /child::people/child::person/child::name/child::first_name/child::text()
  • //middle_initial/../first_name
    selects <first_name>Richard</first_name>
    unabbreviated: /descendant-or-self::node()/child::middle_initial/parent::node()/child::first_name
  • //person[profession="physicist"]
    selects all person nodes with a profession child node with the value "physicist"
    unabbreviated: /descendant-or-self::node()/child::person[child::profession="physicist"]
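The three queries can be tried out with Python's xml.etree.ElementTree, which supports only a limited subset of XPath. In this sketch the prolog of the document is dropped, and each absolute path is rewritten relative to the people root element (so /people/person/... becomes ./person/... and //x becomes .//x); these rewrites are adaptations to ElementTree, not part of the slide.

```python
import xml.etree.ElementTree as ET

PEOPLE = """<people>
  <person born="1912" died="1954" id="p342">
    <name><first_name>Alan</first_name><last_name>Turing</last_name></name>
    <profession>computer scientist</profession>
    <profession>mathematician</profession>
    <profession>cryptographer</profession>
    <homepage myref="http://www.turing.org.uk/"/>
  </person>
  <person born="1918" died="1988" id="p4567">
    <name>
      <first_name>Richard</first_name>
      <middle_initial>&#x4D;</middle_initial>
      <last_name>Feynman</last_name>
    </name>
    <profession>physicist</profession>
    <hobby>Playing the bongoes</hobby>
  </person>
</people>"""

root = ET.fromstring(PEOPLE)  # root is the <people> element

# /people/person/name/first_name/text()
first_names = [e.text for e in root.findall("./person/name/first_name")]

# //middle_initial/../first_name
next_to_initial = [e.text for e in root.findall(".//middle_initial/../first_name")]

# //person[profession="physicist"]
physicists = [p.get("id") for p in root.findall(".//person[profession='physicist']")]

print(first_names, next_to_initial, physicists)
```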

SLIDE 23

Example

(same people document as on slide 22, from http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html, except that the homepage element here carries xlink:href="http://www.turing.org.uk/" instead of myref)

  • //name[first_name="Richard" or first_name="Dick"]
    selects all name nodes with a first_name child with the value Richard or Dick
    unabbreviated: /descendant-or-self::node()/child::name[child::first_name="Richard" or child::first_name="Dick"]
  • /people/person[@born < 1950]/name[first_name = "Alan"]
    selects all name child nodes
    – of person nodes whose born attribute value is below 1950 and
    – that have a first_name child node with the value Alan
    unabbreviated: /child::people/child::person[attribute::born < 1950]/child::name[child::first_name = "Alan"]

SLIDE 24

Using XPath: e.g. in XQuery

SLIDE 25

XQuery

  • is a language for querying XML data
  • is for XML what SQL is for databases
  • like XSLT, it is built on/heavily uses XPath expressions
  • a W3C standard since 2007, see http://www.w3.org/TR/xquery/
  • is supported by major database engines (IBM, Oracle, Microsoft, etc.)
  • is of expressivity comparable to XSLT, but of different philosophy:
    – typed (XSLT can be said to be “less strictly” typed)
    – functional
  • like XSLT, it can be used, e.g., to
    – extract information to use in a Web Service
    – generate summary reports
    – transform XML data to HTML
    – search Web documents for relevant information
    – ...and to answer queries

SLIDE 26

XQuery: some basics

  • XQuery provides support for datatypes, i.e., we
    – have variables and can
    – declare their type, yet the query processor may be strict: no attempt at a conversion to the correct type needs to be made!
    – e.g., if I try to add an integer to a decimal or write an integer into a decimal variable, the query processor may stop with an error
  • like XPath, XQuery is based on item sequences
    – a sequence is a (poss. empty) list of items separated by commas
    – items are atomic values or nodes (no nesting for sequences!)
    – as usual, nodes are of one of 7 kinds: element, attribute, text, namespace, processing-instruction, comment, or document (root)
    – if $mySeq is a sequence, $mySeq[3] is its third item
  • all variable names start with “$” as in $mySeq
  • comments are between “(:” and “:)” as in “(: this is a comment :)”
  • a central, SQL-like part are FLWOR expressions

(callout on the slide: W3C speak)

SLIDE 27

XQuery: FLWOR expressions

  • “FLWOR” is pronounced “flower”
  • a FLWOR expression has 5 possibly overlapping parts:
    – For, e.g., for $x in doc("people.xml")/contactlist/person
    – Let, e.g., let $i := 3
                 let $n := $x/name/firstname
    – Where, e.g., where $x/@categ = "friend"
    – Order by, e.g., order by $x/name/lastname ascending
    – Return, e.g., return concat($x/name/lastname, ", ", $x/name/firstname)
  • F and L can appear any (!) number of times in any order. W and O are optional, but must appear in the order given. R has always to be there...depending on who you ask...

people.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <contactlist>
      <person categ="friend" age="25">
        <name>
          <lastname>Doe</lastname>
          <firstname>John</firstname>
        </name>
        <phone>0044 161 1234 5667</phone>
        <address>123 Main Street</address>
        <city>Manchester</city>
      </person>
      ...
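For readers who know Python better than SQL, the five parts map roughly onto a loop with local variables, a filter, a sort, and a result expression. The following is only an analogy under stated assumptions, not a translation an XQuery processor would perform: John Doe comes from people.xml on the slide, while the entries for Ann Smith and Millie Adams are invented so that “where” and “order by” have something to do.

```python
import xml.etree.ElementTree as ET

# John Doe is from the slide; the other two persons are made up for illustration.
PEOPLE = """<contactlist>
  <person categ="friend" age="25">
    <name><lastname>Doe</lastname><firstname>John</firstname></name>
  </person>
  <person categ="colleague" age="41">
    <name><lastname>Smith</lastname><firstname>Ann</firstname></name>
  </person>
  <person categ="friend" age="30">
    <name><lastname>Adams</lastname><firstname>Millie</firstname></name>
  </person>
</contactlist>"""

root = ET.fromstring(PEOPLE)  # root is the <contactlist> element

rows = []
for x in root.findall("person"):             # for $x in .../contactlist/person
    last = x.findtext("name/lastname")       # let: bind a value to a name
    first = x.findtext("name/firstname")     # let
    if x.get("categ") == "friend":           # where $x/@categ = "friend"
        rows.append((last, f"{last}, {first}"))
rows.sort(key=lambda row: row[0])            # order by $x/name/lastname ascending
result = [text for _, text in rows]          # return concat(lastname, ", ", firstname)

print(result)
```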

SLIDE 28

XQuery FLWOR expressions: for

  • a for expression determines what to iterate through
  • is basically of the form

    for variable (as datatype)? (at position)? in expression

  • where expression is
    – any XPath location path or
    – a FLWOR expression (nesting!) or
    – a logic expression (if-then-else, etc.), later more
  • e.g.,

    for $b in doc("people.xml")/contactlist/person[@categ = "friend"]

    – the query processor goes through the sequence of all (element) nodes selected by the XPath location path
  • e.g.,

    for $b as element() at $p in doc("people.xml")/contactlist/person[@categ = "friend"]
    where $p = 3
    return $b

    – at $p binds $p to the position (1, 2, 3, ...) of $b in the sequence selected by the XPath location path, so this query returns (the singleton sequence containing) the third element node of that sequence (obviously, we can do this much nicer...suggestions?)

people.xml: as on slide 27

SLIDE 29

XQuery FLWOR expressions: let

  • a let expression binds a variable to a value
  • is basically of the form

    let variable (as datatype)? := expression

  • where expression is
    – any XPath location path or
    – a FLWOR expression or
    – a logic expression (if-then-else, etc.), later more
  • e.g.,

    for $b in doc("people.xml")/contactlist/person
    let $name as text() :=
      if (xs:integer($b/@age) < xs:integer(16))
      then ($b/name/firstname/text())
      else ($b/name/lastname/text())
    return $name

  • e.g.,

    for $b in doc("people.xml")/contactlist/person
    let $name as element() := $b/name/firstname
    return $name

people.xml: as on slide 27

SLIDE 30

XQuery FLWOR expressions: for & let

  • we can repeat and mix for and let expressions
  • a FLWOR expression
    – has at least one for or one let expression,
    – but can have any number of them in any order
  • careful: the order plays a crucial role for their meaning: make sure to bind variables to the right values before using them in a for expression:

    let $doc := doc("people.xml")
    for $p in $doc/contactlist/person
    let $n := $p/name/lastname/text()
    let $a := $p/@age
    for $double in $doc/contactlist/person[@age = $a][name/lastname/text() = $n]
    return $double

  • more careful: in this example, is the ‘double’ really a ‘double’?

people.xml: as on slide 27

SLIDE 31

XQuery FLWOR expressions: return

  • a return expression determines the output
  • is basically of the form

    return expression

  • where expression is one of the logical expressions to be defined later
  • it returns elements as they are, i.e., with attributes and descendants
  • e.g.,

    <MyFriendList>
    {
      for $b in doc("people.xml")/contactlist/person[@categ="friend"]
      return $b/name/firstname/text()
    }
    </MyFriendList>

    returns <MyFriendList>John Millie...</MyFriendList>
  • careful: we needed “{“, “}” to distinguish between text and instructions

people.xml: as on slide 27

SLIDE 32

XQuery FLWOR expressions: return & logical expressions

  • as mentioned before, in a FLWOR expression, we can make use of logical expressions including
    – if-then-else
    – some/every
    – Boolean expressions
  • e.g.,

    let $doc := doc("people.xml")
    return
      <MyFriendList>
      {
        for $b in $doc/contactlist/person[@categ="friend"]
        return
          <friend>
          {
            (if (xs:integer($b/@age) < xs:integer(16))
             then $b/name/firstname/text()
             else $b/name/lastname/text())
          }
          </friend>
      }
      </MyFriendList>

people.xml: as on slide 27

SLIDE 33

XQuery: constructors

  • as we have seen, we can use text in the return part
  • to return a more complex XML document, we can make use of constructors
    – e.g., direct element constructors as in the previous example
    – or direct element constructors with attributes
  • we use “{“ and “}” to delimit expressions that are evaluated, e.g.,

    let $doc := doc("contactlist-john-doe.xml")
    for $p in $doc/contactlist/person
    return
      <example>
        <p> Here is a query. </p>
        <eg> $p/name</eg>
        <p> Here is the result of the query. </p>
        <eg>{ $p/name }</eg>
      </example>

  • if we want to construct elements with attributes, we can do this easily, e.g.,

    return <friend phone="{ xs:string($p/phone) }">{ (if (...

SLIDE 34

XQuery FLWOR expressions: where

  • where is used to filter the node sets selected through let and for
  • like in SQL, we can use where for joins of several trees or documents
  • e.g.,

    for $p in doc("contactlist-john-doe.xml")/contactlist/person
    for $c in doc("cities.xml")/citylist/city
    where $p/city/text() = $c/name/text()
    return concat("Dear ", $p/name/firstname, ", do you like ", $c/club, "? ")

cities.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <citylist>
      <city>
        <name>Manchester</name>
        <club>Manchester United</club>
      </city>
      <city>
        <name>Munich</name>
        <club>Die Loewen</club>
      </city>
      ...

people.xml: as on slide 27
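The join above behaves like two nested loops with a filter. A rough Python analogy, assuming the two documents are cut down to the entries shown on the slides and parsed from strings rather than loaded with doc():

```python
import xml.etree.ElementTree as ET

people = ET.fromstring("""<contactlist>
  <person categ="friend" age="25">
    <name><lastname>Doe</lastname><firstname>John</firstname></name>
    <city>Manchester</city>
  </person>
</contactlist>""")

cities = ET.fromstring("""<citylist>
  <city><name>Manchester</name><club>Manchester United</club></city>
  <city><name>Munich</name><club>Die Loewen</club></city>
</citylist>""")

letters = [
    f"Dear {p.findtext('name/firstname')}, do you like {c.findtext('club')}? "
    for p in people.findall("person")          # for $p in .../contactlist/person
    for c in cities.findall("city")            # for $c in .../citylist/city
    if p.findtext("city") == c.findtext("name")  # where $p/city/text() = $c/name/text()
]

print(letters)
```

Only the pair (John, Manchester) passes the filter; the pair (John, Munich) is generated by the two loops but discarded by the condition, just as in an SQL join.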

SLIDE 35

XQuery FLWOR expressions: where

  • a more realistic, SQL-like example (from <oXygen/>):

products.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <products>
      <product>
        <productId>1</productId>
        <productName>Wave Runner</productName>
        <productSpec>120 HP blaa</productSpec>
      </product>
      ...

sales.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <sales>
      <sale productId="1">
        <mrq>180$</mrq>
        <ytd>18.87% up</ytd>
        <margin>5%</margin>
      </sale>
      ...

query

    <sales>
    {
      for $product in doc("products.xml")/products/product,
          $sale in doc("sales.xml")/sales/sale
      where $product/productId = $sale/@productId
      return
        <product id="{$product/productId}">
        {
          $product/productName,
          $product/productSpec,
          $sale/mrq,
          $sale/ytd,
          $sale/margin
        }
        </product>
    }
    </sales>

SLIDE 36

XQuery FLWOR expressions: nesting

  • like in SQL, we can nest expressions
  • e.g., the previous example does not work in case a city has several clubs:

    for $p in doc("contactlist-john-doe.xml")/contactlist/person
    for $c in doc("cities.xml")/citylist/city
    where $p/city/text() = $c/name/text()
    return concat("Dear ", $p/name/firstname, ...

cities.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <citylist>
      <city>
        <name>Manchester</name>
        <club>Manchester United</club>
        <club>Manchester City</club>
      </city>
      <city>
        <name>Munich</name>
        <club>Die Loewen</club>
        <club>Bayern-Muenchen</club>
      </city>
      ...

  • with a nested FLWOR expression:

    <sales>
    {
      for $p in doc("contactlist-john-doe.xml")/contactlist/person
      for $c in doc("cities.xml")/citylist/city
      where $p/city = $c/name
      return (
        for $i in 1 to fn:count($c/club)
        return concat("Dear ", $p/name/firstname, ", do you like ", $c/club[$i], " ?")
      )
    }
    </sales>

people.xml: as on slide 27
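The nested version corresponds to a third, inner loop over the clubs of the matching city. A short Python analogy, assuming the documents are reduced to the entries shown above and parsed from strings:

```python
import xml.etree.ElementTree as ET

people = ET.fromstring(
    "<contactlist><person><name><firstname>John</firstname></name>"
    "<city>Manchester</city></person></contactlist>"
)
cities = ET.fromstring(
    "<citylist>"
    "<city><name>Manchester</name><club>Manchester United</club>"
    "<club>Manchester City</club></city>"
    "<city><name>Munich</name><club>Die Loewen</club>"
    "<club>Bayern-Muenchen</club></city>"
    "</citylist>"
)

letters = []
for p in people.findall("person"):
    for c in cities.findall("city"):
        if p.findtext("city") == c.findtext("name"):   # where $p/city = $c/name
            for club in c.findall("club"):             # nested for over $c/club
                first = p.findtext("name/firstname")
                letters.append(f"Dear {first}, do you like {club.text} ?")

print(letters)
```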

SLIDE 37

XQuery FLWOR expressions: order by

  • order by allows us to order sequences before we return them
  • we can combine several orderings into new ones lexicographically
  • e.g.,

    for $nr in 1 to 5
    for $letter in ("a", "b", "c")
    order by $nr descending, $letter descending
    return concat($nr, $letter)

    yields 5c 5b 5a 4c 4b ....
  • e.g.,

    for $nr in 1 to 5
    for $letter in ("a", "b", "c")
    order by $letter descending, $nr descending
    return concat($nr, $letter)

    yields 5c 4c 3c 2c 1c 5b...
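The lexicographic combination of orderings can be reproduced in Python by sorting on a tuple key. This is only an analogy to the two queries above; the variable names are chosen for this sketch.

```python
# All (nr, letter) combinations, as produced by the two nested for clauses.
pairs = [(nr, letter) for nr in range(1, 6) for letter in ("a", "b", "c")]

# order by $nr descending, $letter descending
by_nr_then_letter = sorted(pairs, key=lambda p: (p[0], p[1]), reverse=True)

# order by $letter descending, $nr descending
by_letter_then_nr = sorted(pairs, key=lambda p: (p[1], p[0]), reverse=True)

first_result = " ".join(f"{nr}{letter}" for nr, letter in by_nr_then_letter)
second_result = " ".join(f"{nr}{letter}" for nr, letter in by_letter_then_nr)

print(first_result)
print(second_result)
```

The first sort key in the tuple decides the order; the second one is consulted only to break ties, which is what “lexicographically” means here.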

SLIDE 38

XQuery: grouping

  • like SQL, XQuery provides aggregation functions
    – max and min
    – average
    – count, etc.
  • like in SQL, when we want to use them, we need to group
  • but this comes naturally, e.g.,

    for $an in fn:distinct-values(doc("orders.xml")/orderlist/order/artNr)
    let $arts := doc("orders.xml")/orderlist/order[artNr = $an]
    where fn:count($arts) >= 3
    return
      <high-demand-item>
        <articleNr> { $an } </articleNr>
        <maxPrice> { fn:max($arts/price) } </maxPrice>
        <avgPrice> { fn:avg($arts/price) } </avgPrice>
      </high-demand-item>
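The grouping pattern (iterate over the distinct keys, collect the matching items, filter on the group size, aggregate) can be mirrored in Python. The order data below is invented for illustration, since orders.xml is not shown on the slides; orders are represented as hypothetical (article number, price) pairs.

```python
from statistics import mean

# Hypothetical orders: (article number, price).
orders = [("A1", 10.0), ("B2", 5.0), ("A1", 12.0), ("A1", 14.0), ("B2", 7.0)]

report = []
# dict.fromkeys keeps each article number once, in first-seen order
# (the role of fn:distinct-values in the query).
for article in dict.fromkeys(a for a, _ in orders):
    prices = [price for a, price in orders if a == article]   # let $arts := ...
    if len(prices) >= 3:                                      # where fn:count($arts) >= 3
        report.append({
            "articleNr": article,
            "maxPrice": max(prices),                          # fn:max($arts/price)
            "avgPrice": mean(prices),                         # fn:avg($arts/price)
        })

print(report)
```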

SLIDE 39

XQuery: functions

  • XQuery is more than FLWOR expressions
  • it provides more than 100 built-in functions; we have already seen some, plus
    – e.g., <name>{upper-case($p/lastname)}</name>
    – e.g., let $nickname := (substring($p/firstname,1,4))
  • it allows the user to define functions, of the form

    declare function prefix:function_name(($parameter as datatype)*)
      as returnDatatype {
      (: ...your function code here... :)
    };

  • e.g.,

    declare function local:minPrice(
      $price as xs:decimal,
      $discount as xs:decimal)
      as xs:decimal {
      let $disc := ($price * $discount) div 100
      return ($price - $disc)
    };

    to be used, e.g., in
    <minPrice> { local:minPrice($book/price, $book/discount) } </minPrice>
  • e.g.,

    declare function local:summary($emps as element(employee)*)
      as element(dept)* {
      for $d in fn:distinct-values($emps/deptno)
      let $e := $emps[deptno = $d]
      return
        <dept>
          <deptno>{$d}</deptno>
          <headcount> {fn:count($e)} </headcount>
          <payroll> {fn:sum($e/salary)} </payroll>
        </dept>
    };

    To summarize the departments from Manchester, use:
    local:summary(doc("acme_corp.xml")//employee[location = "Manchester"])

SLIDE 40

XQuery: functions, e.g., namespace analysis

  • remember: XQuery is a Turing-complete programming language
  • so, we should be able to do our namespace analysis in XQuery:
    – how does XQuery treat namespaces?
    – how can we compare ‘prodigy’ of a namespace?
    – ...let’s see: for a start, get all namespaces and prefixes that are valid at a node (not necessarily defined there) ...and store them in a sequence

    declare namespace bjp = 'http://ex.org/';
    declare variable $d := doc('testsuper.xml');

    declare function bjp:nsBindingsForNode($node) {
      for $prefix in in-scope-prefixes($node)
      for $ns in namespace-uri-for-prefix($prefix, $node)
      order by $prefix ascending
      return <nsb pre="{$prefix}" ns="{$ns}"/>
    };

SLIDE 41

XQuery: functions, e.g., namespace analysis

  • how to test documents for “superconfusing”?
  • remember: a superconfusing document contains at least one node which has 2 distinct in-scope prefixes bound to the same namespace
  • so, check the namespace bindings for a single node (using the sequence from bjp:nsBindingsForNode):

    declare function bjp:multiPrefixedNs($bindings) {
      for $b in $bindings
      for $b2 in $bindings
      where not($b/@pre = $b2/@pre) and ($b/@ns = $b2/@ns)
      return <multi>{$b} {$b2}</multi>
    };
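The pairwise comparison in bjp:multiPrefixedNs can be sketched in Python on plain (prefix, namespace URI) pairs. The function name multi_prefixed and the sample bindings are hypothetical; the sketch leaves aside how the in-scope bindings of a node are obtained.

```python
def multi_prefixed(bindings):
    """Return (prefix1, prefix2, namespace) for distinct prefixes bound to one namespace."""
    return [
        (pre1, pre2, ns1)
        for pre1, ns1 in bindings          # for $b in $bindings
        for pre2, ns2 in bindings          # for $b2 in $bindings
        if pre1 != pre2 and ns1 == ns2     # where not(same prefix) and same namespace
    ]


# Hypothetical in-scope bindings of a single node.
bindings = [
    ("a", "http://ex.org/"),
    ("b", "http://ex.org/"),
    ("xml", "http://www.w3.org/XML/1998/namespace"),
]
clashes = multi_prefixed(bindings)
print(clashes)
```

Every clash is reported twice, once per ordering of the two prefixes, so any caller that only wants a yes/no answer has to remove the repetitions.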

SLIDE 42

XQuery: functions, e.g., namespace analysis

  • finally, we need to test all nodes in our documents for superconfusion:

    declare function bjp:isSuperConfusing() {
      for $n in $d//*
      for $m in bjp:multiPrefixedNs(bjp:nsBindingsForNode($n))
      return "YES - it's superconfusing!"
    };

  • then, we call our function in a way that cuts out the (otherwise far too numerous) repetitions of our return string:

    distinct-values(bjp:isSuperConfusing())

slide-43
SLIDE 43

XQuery, schemas, and types

  • if you query documents that are associated with a schema, you can exploit schema-aware query answering:

– [DTD] no types, but default values, e.g., the answer to this query may vary depending on the DTD

<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST person isFriend (true|false) 'true'>

for $m in doc('personal.xml')//*[@isFriend = 'true']
return $m/name/family/text()
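If the DTD is not read when the document is loaded, the default for isFriend is never added, and the query above only finds persons that spell out isFriend="true". One way to make the answer independent of the DTD is to encode the default in the query itself. This is a sketch under the assumption that only person elements carry the attribute, which is why it selects person rather than *.

```xquery
(: sketch: treat a missing isFriend attribute like the DTD default 'true' :)
for $m in doc('personal.xml')//person[not(@isFriend) or @isFriend = 'true']
return $m/name/family/text()
```

The price is that the default value now lives in two places, the DTD and the query.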

slide-44
SLIDE 44

XQuery, schemas, and types

  • if you query documents that are associated with a schema, you can exploit schema-aware query answering, e.g., with an XML Schema aware processor like SAXON-SA:

– careful if you use <oXygen>: it sometimes gets confused about whether you use SAXON-B or SAXON-SA
– [WXS] has default values, e.g., the answer to this query may vary depending on your schema

<?xml version="1.0" encoding="UTF-8"?>
<uli:nlist xmlns:uli="www.uli.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="www.uli.org file:test4.xsd">
  <uli:nEl>3</uli:nEl>
  <uli:nEl attr="4">4</uli:nEl>
  <uli:nEl>5</uli:nEl>
</uli:nlist>

import schema namespace uli="www.uli.org" at "test4.xsd";
for $m in doc('Untitled7.xml')//uli:nEl
return data($m/@attr)

<xs:element name="nlist">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="nEl" type="uliS:number" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:complexType name="number">
  <xs:simpleContent>
    <xs:extension base="xs:integer">
      <xs:attribute name="attr" default="15"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
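Whether the default value shows up depends on whether the document was validated before the query ran. A sketch of making that step explicit with a validate expression, assuming a processor that supports the optional Schema Import and Validation features, and the same files as on the slide:

```xquery
import schema namespace uli="www.uli.org" at "test4.xsd";

(: validate returns a copy of the document annotated with types and defaults :)
for $m in (validate { doc('Untitled7.xml') })//uli:nEl
return data($m/@attr)
```

After validation, the elements without an explicit attr should contribute the schema default 15; without validation, only the explicitly written value 4 would be returned.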

slide-45
SLIDE 45

XQuery, schemas, and types

  • if you query documents that are associated with a schema, you can exploit schema-aware query answering, e.g., with an XML Schema aware processor like SAXON-SA:

– [WXS] has types, e.g., the answer to this query may vary depending on your schema

<?xml version="1.0" encoding="UTF-8"?>
<uli:list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="www.uli.org test4.xsd"
    xmlns:uli="www.uli.org">
  <uli:friend>Paul</uli:friend>
  <uli:friend>Peter</uli:friend>
  <uli:friend>Mary</uli:friend>
  <uli:friend>Joanne</uli:friend>
  <uli:friend>Lucy</uli:friend>
</uli:list>

import schema namespace uli="www.uli.org" at "test4.xsd";
for $m in doc('Untitled5.xml')//element(*,uli:A)
return $m/uli:friend/text()

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="www.uli.org"
    xmlns:uliS="www.uli.org"
    elementFormDefault="qualified">
  <xs:element name="list" type="uliS:B">
  </xs:element>
  <xs:complexType name="A">
    <xs:sequence>
      <xs:element name="friend" type='xs:string' minOccurs='3' maxOccurs='5'/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="B">
    <xs:complexContent>
      <xs:restriction base="uliS:A">
        <xs:sequence>
          <xs:element name="friend" type='xs:string' minOccurs='4' maxOccurs='5'/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
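To make the difference between selecting by name and selecting by type concrete, the two can be put side by side. This is a sketch that assumes a schema-aware processor and a validated Untitled5.xml, so that the uli:list element carries the type annotation uli:B.

```xquery
import schema namespace uli="www.uli.org" at "test4.xsd";

let $doc := doc('Untitled5.xml')
return (
  (: by name: only elements that are called uli:list :)
  $doc//uli:list/uli:friend/text(),
  (: by type: any element whose type is uli:A or derived from it, such as uli:B :)
  $doc//element(*, uli:A)/uli:friend/text()
)
```

On the document above both paths should select the same five names, since uli:list has type uli:B and uli:B is a restriction of uli:A. The type-based path would also pick up differently named elements of type A, should the schema later introduce them.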

slide-46
SLIDE 46

Namespace, schemas, and queries

  • schemas and queries can be used together in a powerful way

– e.g., to retrieve values and default values
– e.g., by exploiting the type hierarchy in a query; this can have various advantages:

  • we can save big ‘unions’ of queries by querying for instances of super types
  • should we change our schema/want to work with documents with new kinds of elements (see XML/OWL coursework), it may suffice to adapt the schema to the new types; queries may remain unchanged!

  • usage of namespaces, schemas, and queries is a bit tricky:

– when to use/declare which namespace/prefix where
– tool support required

  • more in coursework and later

slide-47
SLIDE 47

A Picture: how good is your schema for your data & queries?

  • how difficult is it to write queries for your questions

– in the chosen query language and
– against a chosen schema?
– e.g., is it easier to write q21 and q22 than q11 and q12? Then Schema2 might be better suited…
– other concerns: what happens when data or questions change?

[Figure: Data described by Schema1 and Schema2; queries q11 and q12 (against Schema1) and q21 and q22 (against Schema2); answers An1 and An2]

slide-48
SLIDE 48

Query Containment

  • in general, query containment can be used for query optimisation:

– if I ask Q2 after Q1 against a DB B (and a schema S), and
– if I know that all answers to Q2 are contained in the answers to Q1 (in all databases conforming to S),
– then I can answer Q2 against the answers to Q1 instead of B… which is hopefully much smaller and therefore faster to query

  • for simple (!) SQL queries, query containment can be decided,

– i.e., there exists a decision procedure that takes Q1, Q2, and determines whether, for all DBs B, ans(Q1,B) ⊆ ans(Q2,B)

  • remember how we said that XQuery is Turing-complete?

– what does this mean for the difficulty of XQuery containment?
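The optimisation idea from the first bullet can be sketched in XQuery itself. The employee data and the two queries are hypothetical. Q2 adds a condition to Q1, so every answer to Q2 is also an answer to Q1, and Q2 can be evaluated over the answers to Q1 instead of the whole database.

```xquery
(: hypothetical sample database :)
let $db :=
  <emps>
    <employee><deptno>10</deptno><salary>1000</salary></employee>
    <employee><deptno>10</deptno><salary>1500</salary></employee>
    <employee><deptno>20</deptno><salary>2000</salary></employee>
  </emps>
(: Q1: all employees of department 10 :)
let $q1 := $db/employee[deptno = 10]
(: Q2 asked directly against the database :)
let $q2direct := $db/employee[deptno = 10][salary > 1200]
(: Q2 answered against the (smaller) answers to Q1 :)
let $q2reuse := $q1[salary > 1200]
return deep-equal($q2direct, $q2reuse)
```

The query should return true here: both routes select the same single employee. The hard part in general is not this rewriting but knowing that the containment holds for all databases.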
