Module 5: Introduction to XQuery



SLIDE 1

Module 5: Introduction to XQuery

SLIDE 2

01/31/07

XML is now everywhere

  • Google search (warning: unreliable numbers)
      • 285.000.000 for XML
      • 1.000.000 for XQuery
      • 11.000.000 for XSLT
      • 12.000.000 for XML Schema
      • 60.000.000 for .NET
      • 200.000.000 for Java
      • 64.000.000 for SQL
  • The highest Google number among all the technology buzzwords that I searched (except RSS)

SLIDE 3

Sources of XML data

1. Inter-application communication data (WS, REST, etc.)
2. Mobile device communication data
3. Logs
4. Blogs (RSS)
5. Metadata (e.g. Schema, WSDL, XMP)
6. Presentation data (e.g. XHTML)
7. Documents (e.g. Word)
8. Views of other sources of data
      • Relational, LDAP, CSV, Excel, etc.
9. Sensor data

SLIDE 4

Some vertical application domains for XML

  • Health Level Seven (HL7): http://www.hl7.org/
  • Geography Markup Language (GML)
  • Systems Biology Markup Language (SBML): http://sbml.org/
  • XBRL, the XML-based business reporting standard: http://www.xbrl.org/
  • Global Justice XML Data Model (GJXDM): http://it.ojp.gov/jxdm
  • ebXML: http://www.ebxml.org/
  • Encoded Archival Description application: http://lcweb.loc.gov/ead/
  • Digital photography metadata (XMP)
  • An XML grammar for sensor data (SensorML)
  • Really Simple Syndication (RSS 2.0)

Basically everywhere.

SLIDE 5

Processing the XML data

  • Huge amount of XML information, and growing
  • We need to “manage” it, and then “process” it
      • Store it efficiently
      • Verify the correctness
      • Filter, search, select, join, aggregate
      • Create new pieces of information
      • Clean, normalize the data
      • Update it
      • Take actions based on the existing data
      • Write complex execution flows
  • No conceptual organization like for relational databases (applications are too heterogeneous)

SLIDE 6

Frequent solutions to XML data management

1. Map it to generic programming APIs (e.g. DOM, SAX, StAX)
2. Manually map it to non-generic APIs
3. Automatically map it to non-generic structures
4. Use XML extensions of existing languages
5. Shredding for relational stores
6. Native XML processing through XSLT and XQuery

SLIDE 7

1. Mapping to generic structures

  • Represent the data:
      • Original Unicode form, or
      • Some binary representation (e.g. Fast Infoset)
  • Store it:
      • Directly on a file system, or
      • On a “transacted” file system (e.g. SleepyCat, or a relational database)
  • Map the XML data to generic XML programmatic APIs
      • E.g. DOM, SAX, StAX (JSR 173), XmlReader
  • Use the native programming languages (e.g. Java, C#) to manipulate the data
  • Re-serialize it at the end

SLIDE 8

1. Manual mapping to generic structures (example)

XML data:

    <purchaseOrder>
        <lineItem> … </lineItem>
        <lineItem> … </lineItem>
    </purchaseOrder>

    <book>
        <author>…</author>
        <title>…</title>
        …
    </book>

Hard-coded mappings onto a generic node API:

    interface DomNode {
        String getNodeName();
        String getNodeValue();
        void setNodeValue(String nodeValue);
        short getNodeType();
    }
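
The generic-API approach can be sketched with Python's standard `xml.dom.minidom`, which exposes the same kind of node accessors (`nodeName`, `nodeValue`, `nodeType`) as the `DomNode` interface. This is only an illustration: the sample document, the helper `child_text` and the replacement title are assumptions made up for the example, not part of the slides.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical sample document, filled in for illustration only.
XML = "<book><author>R.D. Laing</author><title>The politics of experience</title></book>"


def child_text(parent, tag):
    """Return the text of the first child element named `tag`, or None."""
    for child in parent.childNodes:
        if child.nodeType == Node.ELEMENT_NODE and child.nodeName == tag:
            return "".join(
                t.nodeValue for t in child.childNodes if t.nodeType == Node.TEXT_NODE
            )
    return None


# 1. Parse into the generic structure.
doc = minidom.parseString(XML)
book = doc.documentElement

# 2. Manipulate it with the host language; the mapping "title is a child
#    element of book" is hard coded in the application.
title = child_text(book, "title")
author = child_text(book, "author")
book.getElementsByTagName("title")[0].firstChild.data = "New title"

# 3. Re-serialize at the end.
serialized = book.toxml()
```

Every piece of knowledge about the document structure lives in application code, which is the drawback the slide labels "hard coded mappings".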

SLIDE 9

2. Manual mapping to non-generic structures

XML data:

    <purchaseOrder>
        <lineItem> … </lineItem>
        <lineItem> … </lineItem>
    </purchaseOrder>

    <book>
        <author>…</author>
        <title>…</title>
        …
    </book>

Hard-coded mappings:

    interface PurchaseOrder {
        List getLineItems();
        // …
    }

    interface Book {
        List getAuthor();
        String getTitle();
        // …
    }
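
A minimal sketch of such a hand-written mapping, using Python's standard `xml.etree.ElementTree`. The `Book` dataclass and the function `book_from_xml` are hypothetical names chosen for this example; they mirror the `Book` interface above but are not taken from any library.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Book:
    """Application-specific (non-generic) view of a <book> element."""
    title: str
    authors: list = field(default_factory=list)


def book_from_xml(text):
    """Hand-written mapping from <book> markup to a Book object."""
    root = ET.fromstring(text)
    if root.tag != "book":
        raise ValueError("expected a <book> element, got <%s>" % root.tag)
    return Book(
        title=root.findtext("title", default=""),
        authors=[a.text or "" for a in root.findall("author")],
    )


book = book_from_xml(
    "<book><author>A</author><author>B</author><title>T</title></book>"
)
```

The application code now works with `book.title` and `book.authors`, but the mapping function has to be written and maintained by hand for every document type.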

SLIDE 10

3. Automatic mapping to non-generic structures

Schema (simplified notation):

    <type name="book-type">
        <sequence>
            <attribute name="year" type="xs:integer"/>
            <element name="title" type="xs:string"/>
            <sequence minoccurs="0">
                <element name="author" type="xs:string"/>
            </sequence>
        </sequence>
    </type>
    <element name="book" type="book-type"/>

Automatic mapping (e.g. XMLBeans):

    interface BookType {
        int getYear();
        String getTitle();
        List getAuthors();
        // …
    }

SLIDE 11

4. XML extensions of existing procedural languages

  • Examples:
      • C-omega, ECMAScript, PHP extensions, Python extensions, etc.
  • Most of them define:
      • A way of importing XML data into their native type system
      • A rich API for XML data manipulation
      • A way of navigating/searching/querying the XML data via their extensions (XPath-based or XPath-inspired)
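
As one concrete instance of an XPath-inspired extension, Python's standard `xml.etree.ElementTree` imports XML into native objects and supports a limited XPath subset in `findall`. The `<library>` document below is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not from the slides.
LIBRARY = """<library>
  <book year="1967"><title>The politics of experience</title><author>R.D. Laing</author></book>
  <book year="2000"><title>Another title</title><author>A. N. Other</author></book>
</library>"""

# Importing XML data into the language's native object model.
root = ET.fromstring(LIBRARY)

# Navigating with an XPath-inspired path (child steps plus an attribute predicate).
titles_1967 = [t.text for t in root.findall("./book[@year='1967']/title")]

# Navigating with the plain API instead of a path expression.
all_authors = {a.text for a in root.iter("author")}
```

The path language here is only "XPath inspired": it covers a small subset of XPath 1.0, which is typical of such language extensions.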

SLIDE 12

5. Native XML processing: XSLT and XQuery

  • Most promising alternative for the future.
  • The only alternative such that:
      • the data is modeled only once
      • it is well integrated with the XML Schema type system
      • it preserves the logical/physical data independence
      • the code deals with non-generic structures
  • Code can be optimized automatically
  • Data is stored:
      • in plain file systems, or
      • in sophisticated data stores (e.g. XML extensions of relational stores)
  • Missing pieces, under development
      • E.g. no procedural logic

SLIDE 13

Why XQuery?

  • Why a “query” language for XML?
      • Need to process XML data
      • Preserve logical/physical data independence
          • The semantics is described in terms of an abstract data model, independent of the physical data storage
      • Declarative programming
          • Such programs should describe the “what”, not the “how”
  • Why a native query language? Why not SQL?
      • We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
  • Why another XML processing language? Why not XSLT?
      • The template nature of XSLT was not appealing to the database people. Not declarative enough.

SLIDE 14

What is XQuery?

  • A programming language that can express arbitrary XML to XML data transformations
      • Logical/physical data independence
      • “Declarative”
      • “High level”
      • “Side-effect free”
      • “Strongly typed” language
  • “An expression language for XML.”
  • Commonalities with functional programming, imperative programming and query languages
  • The “query” part might be a misnomer (***)

SLIDE 15

XQuery family of standards

  • XQuery 1.0: An XML Query Language: an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
  • XSL Transformations (XSLT) Version 2.0: transforms data model instances (XML and non-XML) into other documents, including into XSL-FO for printing
  • XML Path Language (XPath) 2.0: expression syntax for referring to parts of XML documents
  • XQuery 1.0 and XPath 2.0 Functions and Operators: the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
  • XQuery 1.0 and XPath 2.0 Data Model (XDM): representation and access for both XML and non-XML sources
  • XSLT 2.0 and XQuery 1.0 Serialization: how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
  • XML Syntax for XQuery 1.0 (XQueryX): an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
  • XQuery 1.0 and XPath 2.0 Formal Semantics: the type system used in XQuery and XSLT 2 via XPath defined precisely for implementers

SLIDE 16

XQuery, XPath, XSLT

[Diagram: timeline from 1999 to 2007]

  • XSLT 1.0 uses XPath 1.0
  • XSLT 2.0 uses XPath 2.0
  • XPath 2.0 extends XPath 1.0 (almost backwards compatible)
  • XQuery 1.0 extends XPath 2.0 with FLWOR expressions, node constructors, validation

SLIDE 17

Roadmap for today

  • XQuery Data Model (XDM)
  • XQuery type system
  • XQuery environment
  • XQuery basic constructs
      • variables
      • constants
      • function calls, function library
      • arithmetic operations
      • boolean operations
      • path expressions
      • conditionals

SLIDE 18

The need for an abstract XML data model

  • XML 1.0 specification only talks about characters
  • We cannot have a programming language processing “characters” (one by one)
  • An XML abstract/logical data model!?
  • Unfortunately too many of those
      • Infoset, PSVI, DOM, XDM, etc.

SLIDE 19

XML Data Model (XDM)

  • Abstract (i.e. logical) data model for XML data
  • Same role for XQuery as the relational data model for SQL
  • Purely logical: no standard storage or access model (on purpose)
  • XQuery is closed with respect to the Data Model

[Diagram: Infoset and PSVI map to the XML Data Model, on which XQuery, XPath 2.0 and XSLT 2.0 operate]

SLIDE 20

XML Data model life cycle

[Diagram labels: .xml, .xsd → parse, validate → XQuery Data Model → XPath 2.0 / XQuery / XSLT 2.0 → XQuery Data Model → serialize → .xml (or application-dependent)]

SLIDE 21

XML Data Model

  • Instance of the data model:
      • a sequence composed of zero or more items
  • The empty sequence is often considered as the “null value”
  • Items
      • nodes or atomic values
  • Nodes
      • document | element | attribute | text | namespaces | PI | comment
  • Atomic values
      • Instances of all XML Schema atomic types: string, boolean, ID, IDREF, decimal, QName, URI, ...
      • untyped atomic values
  • Typed (i.e. schema validated) and untyped (i.e. non schema validated) nodes and values

Remember Lisp?

SLIDE 22

Sequences

  • Can be heterogeneous (nodes and atomic values)
        (<a/>, 3)
  • Can contain duplicates (by value and by identity)
        (1, 1, 1)
  • Are not necessarily ordered in document order
  • Nested sequences are automatically flattened
        ( 1, 2, (3, 4) ) = (1, 2, 3, 4)
  • Single items and singleton sequences are the same
        1 = (1)
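
The flattening rules can be modeled in a few lines of Python. This is a sketch of the behaviour only: `seq` is a hypothetical helper, and Python tuples stand in for XDM sequences.

```python
def seq(*items):
    """Build a flat XDM-style sequence; nested sequences (tuples) are flattened."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            flat.extend(seq(*item))  # a nested sequence contributes its items
        else:
            flat.append(item)        # a single item contributes itself
    return tuple(flat)


nested = seq(1, 2, (3, 4))
singleton = seq((1,))
duplicates = seq(1, 1, 1)
empty = seq((), ((),))
```

Because a sequence can never contain another sequence, `seq(1)` and `seq((1,))` denote the same value, which is what "single items and singleton sequences are the same" means.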

SLIDE 23

Atomic values

  • The values of the 19 atomic types available in XML Schema
      • E.g. xs:integer, xs:boolean, xs:date
  • All the user defined derived atomic types
      • E.g. myNS:ShoeSize
  • xs:untypedAtomic
  • Atomic values carry their type together with the value
      • (8, myNS:ShoeSize) is not the same as (8, xs:integer)
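
One way to picture "values carry their type" is a (value, type) pair. The `AtomicValue` class below is a hypothetical model written for this example, not an XQuery API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicValue:
    """An atomic value is the pair (value, type name), never the bare value."""
    value: object
    type_name: str


shoe_size = AtomicValue(8, "myNS:ShoeSize")
integer = AtomicValue(8, "xs:integer")
```

The two pairs share the underlying value 8 but are different atomic values because their type annotations differ.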

SLIDE 24

XML nodes

  • 7 types of nodes:
      • document | element | attribute | text | namespaces | PI | comment
  • Every node has a unique node identifier
      • Scope of node identifier uniqueness is implementation dependent
  • Nodes have children and an optional parent
      • conceptual “tree”
  • Nodes are ordered based on the topological order in the tree (“document order”)

SLIDE 25

Node accessors

  • node-kind : xs:string
  • node-name : xs:QName ?
  • parent : node() ?
  • string-value : xs:string
  • typed-value : xs:anyAtomicType *
  • type-name : xs:QName ?
  • children : node() *
  • attributes : attribute() *
  • namespaces : node() *

SLIDE 26

Example of well formed XML data

    <book year="1967">
        <title>The politics of experience</title>
        <author>R.D. Laing</author>
    </book>

  • 3 element nodes, 1 attribute node, 5 text nodes
  • name(book element) = {-}:book
  • In the absence of schema validation
      • type(book element) = xs:untyped
      • type(author element) = xs:untyped
      • type(year attribute) = xs:untypedAtomic
      • typed-value(author element) = (“R.D. Laing”, xs:untypedAtomic)
      • typed-value(year attribute) = (“1967”, xs:untypedAtomic)
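
The node counts can be checked with a generic DOM parser. The sketch below uses Python's `xml.dom.minidom` and assumes the document is laid out with whitespace between the elements, as on the slide; the three whitespace-only runs are what bring the text node count to five. `count_nodes` is a helper written for this example.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

XML = """<book year="1967">
    <title>The politics of experience</title>
    <author>R.D. Laing</author>
</book>"""


def count_nodes(node, counts):
    """Count element, attribute and text nodes in the subtree rooted at `node`."""
    if node.nodeType == Node.ELEMENT_NODE:
        counts["element"] += 1
        counts["attribute"] += node.attributes.length
    elif node.nodeType == Node.TEXT_NODE:
        counts["text"] += 1
    for child in node.childNodes:
        count_nodes(child, counts)
    return counts


counts = count_nodes(
    minidom.parseString(XML).documentElement,
    {"element": 0, "attribute": 0, "text": 0},
)
```

Note that DOM and XDM are different data models; the counts agree here only because this small document has no namespaces, comments or adjacent text nodes.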

SLIDE 27

XML schema example

    <type name="book-type">
        <sequence>
            <attribute name="year" type="xs:integer"/>
            <element name="title" type="xs:string"/>
            <sequence minoccurs="0">
                <element name="author" type="xs:string"/>
            </sequence>
        </sequence>
    </type>

    <element name="book" type="book-type"/>

SLIDE 28

Schema validated XML data

    <book year="1967">
        <title>The politics of experience</title>
        <author>R.D. Laing</author>
    </book>

  • After schema validation
      • type(book element) = {uri}:book-type
      • type(author element) = xs:string
      • type(year attribute) = xs:integer
      • typed-value(author element) = (“R.D. Laing”, xs:string)
      • typed-value(year attribute) = (1967, xs:integer)
  • Schema validation impacts the data model representation and therefore the XQuery semantics!!

SLIDE 29

Lexical and binary aspect of the data

  • Every node holds (logically) redundant information:
        <a xsi:type="xs:integer">001</a>
      • dm:string-value(): “001” as xs:string
      • dm:typed-value():
          • “001” as an xs:untypedAtomic before validation
          • 1 as an xs:integer after validation
  • Implementations can store:
      • The string value
          • Retrieve the typed value dynamically based on the type, every time it is needed
      • The typed value
          • Retrieve an acceptable lexical value for that type every time this is required
      • Both
  • In case of unvalidated data the two are the same
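
A sketch of the first storage strategy (keep the string value, derive the typed value on demand). `StoredNode` is a hypothetical class for this example, and only `xs:integer` and untyped content are handled.

```python
class StoredNode:
    """Stores only the lexical form; the typed value is computed when requested."""

    def __init__(self, string_value, type_name="xs:untypedAtomic"):
        self.string_value = string_value
        self.type_name = type_name

    def typed_value(self):
        """Return (value, type name) derived from the stored string."""
        if self.type_name == "xs:integer":
            return (int(self.string_value), "xs:integer")
        # Unvalidated data: the typed value is the string itself.
        return (self.string_value, "xs:untypedAtomic")


validated = StoredNode("001", "xs:integer")
unvalidated = StoredNode("001")
```

Storing only the typed value would lose the original lexical form: 1 as an `xs:integer` can be serialized as "1", but the leading zeros of "001" cannot be recovered. That is why the slide speaks of "an acceptable lexical value".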

SLIDE 30

Typed vs. untyped XML Data

  • Untyped data (non XML Schema validated)
        <a>3</a> eq 3
        <a>3</a> eq "3"
  • Typed data (after XML Schema validation)
        <a xsi:type="xs:integer">3</a> eq 3
        <a xsi:type="xs:string">3</a> eq 3
        <a xsi:type="xs:integer">3</a> eq "3"
        <a xsi:type="xs:string">3</a> eq "3"
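
The outcome of these comparisons follows from the XQuery 1.0 rule for value comparisons: operands are atomized, and an `xs:untypedAtomic` operand is cast to `xs:string` before comparing. The Python model below is a simplified sketch covering only `xs:integer`, `xs:string` and untyped content; `atomize`, `value_eq` and `XQueryTypeError` are names invented for this example.

```python
class XQueryTypeError(Exception):
    """Stands in for the type error raised on incomparable operand types."""


def atomize(text, type_name=None):
    """Typed value of element content; no type annotation means untyped."""
    if type_name == "xs:integer":
        return (int(text), "xs:integer")
    if type_name == "xs:string":
        return (text, "xs:string")
    return (text, "xs:untypedAtomic")


def value_eq(left, right):
    """Model of `eq` on two (value, type) pairs."""
    def promote(pair):
        value, type_name = pair
        # Value comparisons treat untyped atomic values as strings.
        return (value, "xs:string") if type_name == "xs:untypedAtomic" else pair

    (lv, lt), (rv, rt) = promote(left), promote(right)
    if lt != rt:
        raise XQueryTypeError("cannot compare %s with %s" % (lt, rt))
    return lv == rv


THREE = (3, "xs:integer")
THREE_STR = ("3", "xs:string")

untyped_vs_string = value_eq(atomize("3"), THREE_STR)
integer_vs_integer = value_eq(atomize("3", "xs:integer"), THREE)
string_vs_string = value_eq(atomize("3", "xs:string"), THREE_STR)
```

Under this model three of the six expressions are true and the other three (`<a>3</a> eq 3` and the two mixed integer/string cases) raise a type error rather than returning false. The general comparison `=` behaves differently for untyped data, since it casts the untyped operand according to the type of the other operand.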

SLIDE 31

XML data equivalence

  • XQuery has multiple notions of data “equality”
      • “=”, “eq”, “is”, “fn:deep-equal()”
  • Expected properties:
      • Transitivity, reflexivity and symmetry
      • Necessary for grouping, indexing and hashing
  • Additional property:
      • if ( data1 equal data2 ) then ( f(data1) equal f(data2) )
      • Necessary for memoization, caching
  • None of the equality relationships above (except “is”) satisfies those properties
      • The “is” relationship only applies to nodes
  • Careful implementations for indexes, hashing, caches
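
The general comparison `=` illustrates why these properties fail: it is existential, so it is true when any pair of items from the two sequences is equal. A small Python model, in which `general_eq` is a hypothetical helper and tuples stand in for sequences of integers:

```python
def general_eq(left, right):
    """Model of XQuery's `=`: true if some item on the left equals some item on the right."""
    return any(a == b for a in left for b in right)


a, b, c = (1, 2), (2, 3), (3, 4)

a_eq_b = general_eq(a, b)        # share the item 2
b_eq_c = general_eq(b, c)        # share the item 3
a_eq_c = general_eq(a, c)        # share nothing
empty_eq_empty = general_eq((), ())
```

Here `a = b` and `b = c` hold while `a = c` does not, so `=` is not transitive, and `() = ()` is false, so it is not reflexive either. An index or hash table keyed on `=` would therefore be unsound.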

SLIDE 32

Document order

    <book year="1967" price="45.32">
        <title>The politics of experience</title>
        <author>R.D. Laing</author>
    </book>

  • How many nodes here?
  • What is the order between nodes?

SLIDE 33

Document order

    <book(n1) year(n2)="1967" price(n3)="45.32">(n4)
        <title(n5)>(n6)The politics of experience</title>(n7)
        <author(n8)>(n9)R.D. Laing</author>
    </book>

  • How many nodes here? 9
  • What is the order between nodes?
      • n1 before all the others
      • order of n2 and n3 non-deterministic
      • n2 and n3 are before n4, n5, n6, n7, n8, n9
      • n4 < n5 < n6 < n7 < n8 < n9 (top-down, left to right among the children)
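
Document order is a pre-order traversal in which an element comes first, then its attributes, then its children from left to right. The sketch below walks a DOM tree built with Python's `xml.dom.minidom`; the sample string deliberately has no whitespace before `</book>`, an assumption made so that the count matches the nine nodes on the slide. `document_order` and `label` are helpers written for this example.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

XML = (
    '<book year="1967" price="45.32">\n'
    "  <title>The politics of experience</title>\n"
    "  <author>R.D. Laing</author></book>"
)


def document_order(node):
    """Yield the nodes of a subtree: the node, its attributes, then its children."""
    yield node
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            yield attrs.item(i)  # relative order of attributes is not significant
    for child in node.childNodes:
        yield from document_order(child)


def label(node):
    if node.nodeType == Node.ELEMENT_NODE:
        return "element:" + node.nodeName
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return "attribute:" + node.nodeName
    return "text"


order = [label(n) for n in document_order(minidom.parseString(XML).documentElement)]
```

The first entry is the `book` element (n1), the next two are its attributes in some order (n2, n3), and the remaining six follow the n4 to n9 sequence of the slide.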

SLIDE 34

XQuery type system

  • XQuery has a powerful (and complex!) type system
  • XQuery types are imported from XML Schemas
  • Every XML data model instance has a dynamic type
  • Every XQuery expression has a static type
  • Pessimistic static type inference
  • The goal of the type system is:
      1. detect statically errors in the queries
      2. infer the type of the result of valid queries
      3. ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type

SLIDE 35

XQuery type system components

  • Atomic types
      • xs:untypedAtomic
      • All 19 primitive XML Schema types
      • All user defined atomic types
  • Empty, None
  • Type constructors (simplification!)
      • Elements: element name {type}
      • Attributes: attribute name {type}
      • Alternation: type1 | type2
      • Sequence: type1, type2
      • Repetition: type*
      • Interleaved product: type1 & type2

  • type1 intersect type2 ?
  • type1 subtype of type2 ?
  • type1 equals type2 ?

SLIDE 36

XML queries

  • An XQuery basic structure:
      • a prolog + an expression
  • Role of the prolog:
      • Populate the context where the expression is compiled and evaluated
  • Prolog contains:
      • namespace definitions
      • schema imports
      • default element and function namespace
      • function definitions
      • collation declarations
      • function library imports
      • global and external variable definitions
      • etc.

SLIDE 37

XQuery processing

SLIDE 38

XQuery expressions

    XQuery Expr := Constants | Variable | FunctionCalls | PathExpr |
                   ComparisonExpr | ArithmeticExpr | LogicExpr |
                   FLWORExpr | ConditionalExpr | QuantifiedExpr |
                   TypeSwitchExpr | InstanceofExpr | CastExpr |
                   UnionExpr | IntersectExceptExpr |
                   ConstructorExpr | ValidateExpr

  • Expressions can be nested with full generality!
  • Functional programming heritage (ML, Haskell, Lisp)

SLIDE 39

Constants

XQuery grammar has built-in support for:

  • Strings: "125.0" or '125.0'
  • Integers: 150
  • Decimal: 125.0
  • Double: 125.e2
  • 19 other atomic types available via XML Schema
  • Values can be constructed
      • with constructors in F&O doc: fn:true(), fn:date("2002-5-20")
      • by casting
      • by schema validation
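
How the lexical form of a literal decides its type can be sketched with regular expressions that follow the literal productions of the XQuery 1.0 grammar (digits only for integers, a decimal point for decimals, an exponent for doubles). `literal_type` is a hypothetical helper, and string escapes are ignored in this simplified version.

```python
import re

_INTEGER = re.compile(r"[0-9]+")
_DECIMAL = re.compile(r"\.[0-9]+|[0-9]+\.[0-9]*")
_DOUBLE = re.compile(r"(\.[0-9]+|[0-9]+(\.[0-9]*)?)[eE][+-]?[0-9]+")


def literal_type(token):
    """Return the atomic type name implied by the lexical form of a literal."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return "xs:string"
    if _INTEGER.fullmatch(token):
        return "xs:integer"
    if _DECIMAL.fullmatch(token):
        return "xs:decimal"
    if _DOUBLE.fullmatch(token):
        return "xs:double"
    raise ValueError("not a literal: %r" % token)


types = [literal_type(t) for t in ['"125.0"', "'125.0'", "150", "125.0", "125.e2"]]
```

The same digits therefore produce four different atomic values depending on how they are written, which matters because each value carries its type.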

SLIDE 40

Variables

  • $ + QName (e.g. $x, $ns:foo)
  • bound, not assigned
      • XQuery does not allow variable assignment
  • created by let, for, some/every, typeswitch expressions, function parameters
  • example:
        let $x := ( 1, 2, 3 )
        return count($x)
  • above scoping ends at conclusion of return expression

SLIDE 41

A built-in function sampler

  • fn:document(xs:anyURI) => document?
  • fn:empty(item*) => boolean
  • fn:index-of(item*, item) => xs:unsignedInt?
  • fn:distinct-values(item*) => item*
  • fn:distinct-nodes(node*) => node*
  • fn:union(node*, node*) => node*
  • fn:except(node*, node*) => node*
  • fn:string-length(xs:string?) => xs:integer?
  • fn:contains(xs:string, xs:string) => xs:boolean
  • fn:true() => xs:boolean
  • fn:date(xs:string) => xs:date
  • fn:add-date(xs:date, xs:duration) => xs:date

See Functions and Operators W3C specification

SLIDE 42

Atomization

• fn:data(item*) -> xs:anyAtomicType*
• Extracting the "value" of a node, or returning the atomic value
• Implicitly applied:
  • Arithmetic expressions
  • Comparison expressions
  • Function calls and returns
  • Cast expressions
  • Constructor expressions for various kinds of nodes
  • order by clauses in FLWOR expressions
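A small sketch of explicit and implicit atomization. The element name price is hypothetical, and the element is assumed to be untyped (no schema validation).

```xquery
let $price := <price>42</price>
return (
  fn:data($price),     (: the untyped atomic value 42 :)
  fn:data((1, "a")),   (: atomic values are returned unchanged: 1, "a" :)
  $price + 1           (: atomized implicitly, then cast to xs:double: 43 :)
)
```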

SLIDE 43

Constructing sequences

(1, 2, 2, 3, 3, <a/>, <b/>)

• "," is the sequence concatenation operator
• Nested sequences are flattened: (1, 2, 2, (3, 3)) => (1, 2, 2, 3, 3)
• range expressions: (1 to 3) => (1, 2, 3)
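Flattening and ranges combined, as a sketch. The variable name $nested is only illustrative; after construction there is no nesting left to observe.

```xquery
let $nested := (1, (2, 3), (), (4 to 6))
return (
  count($nested),   (: 6: the sequence is 1, 2, 3, 4, 5, 6 :)
  $nested[4]        (: 4 :)
)
```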

SLIDE 44

Combining sequences

• Union, Intersect, Except
• Work only for sequences of nodes, not atomic values
• Eliminate duplicates and reorder to document order
  $x := <a/>, $y := <b/>, $z := <c/>
  ($x, $y) union ($y, $z) => (<a/>, <b/>, <c/>)
• F&O specification provides other functions & operators; e.g. fn:distinct-values() and fn:distinct-nodes() are particularly useful
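A sketch of the three operators. The wrapper element r is an assumption added here: it places the three nodes in one tree so that their document order is well defined (the relative order of separately constructed nodes is implementation-dependent).

```xquery
let $doc := <r><a/><b/><c/></r>
let $x := $doc/a, $y := $doc/b, $z := $doc/c
return (
  (: operands are given out of order; the result is in document order :)
  for $n in (($y, $x) union ($z, $y)) return name($n),   (: "a", "b", "c" :)
  count(($x, $y) intersect ($y, $z)),                    (: 1: only b :)
  count(($x, $y) except ($y, $z))                        (: 1: only a :)
)
```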

SLIDE 45

Arithmetic expressions

1 + 4
$a div 5
5 div 6
$b mod 10
1 - (4 * 8.5)
-55.5
<a>42</a> + 1
<a>baz</a> + 1
validate {<a xsi:type="xs:integer">42</a>} + 1
validate {<a xsi:type="xs:string">42</a>} + 1

• Apply the following rules:
  • atomize all operands. If either operand is (), => ()
  • if an operand is untyped, cast to xs:double (if unable, => error)
  • if the operand types differ but can be promoted to common type, do so (e.g.: xs:integer can be promoted to xs:double)
  • if operator is consistent w/ types, apply it; result is either atomic value or error
  • if type is not consistent, throw type exception
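The rules above, traced on a few operands. This is a sketch that assumes no schema validation, so the constructed element is untyped; the failing case is left as a comment so that the query still runs.

```xquery
(
  <a>42</a> + 1,    (: atomize, cast the untyped value to xs:double: 43 :)
  count(() + 1),    (: 0: an empty operand gives the empty sequence :)
  1 + 2.5e0,        (: xs:integer promoted to xs:double: 3.5 :)
  -55.5             (: unary minus on an xs:decimal literal :)
)
(: <a>baz</a> + 1 would raise an error: "baz" cannot be cast to xs:double :)
```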

SLIDE 46

Logical expressions

expr1 and expr2
expr1 or expr2
fn:not() as a function

• return true, false
• Different from SQL
  • two value logic, not three value logic
• Different from imperative languages
  • and, or are commutative in XQuery, but not in Java.
    if (($x castable as xs:integer) and (($x cast as xs:integer) eq 2)) …..
• Non-deterministic
  false and error => false or error ! (non-deterministically)
• Rules:
  • first compute the Boolean Effective Value (BEV) for each operand:
    • if (), "", NaN, 0, then return false
    • if the operand is of type xs:boolean, return it;
    • if operand is a sequence with first item a node, return true
    • else raises an error
  • then use standard two value Boolean logic on the two BEV's as appropriate
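Because the operands of and may be evaluated in either order, the castable guard shown above does not reliably protect the cast. One way to get a guaranteed guard is a conditional expression, sketched below; the function name local:is-two is hypothetical.

```xquery
declare function local:is-two($x as xs:string) as xs:boolean {
  (: the cast sits in a branch that is only evaluated when it is selected :)
  if ($x castable as xs:integer)
  then ($x cast as xs:integer) eq 2
  else false()
};

(local:is-two("2"), local:is-two("baz"))   (: true, false :)
```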

SLIDE 47

Comparisons

Value     eq, ne, lt, le, gt, ge    for comparing single values
General   =, !=, <=, <, >, >=       existential quantification + automatic type coercion
Node      is, isnot                 for testing identity of single nodes
Order     <<, >>                    testing relative position of one node vs. another (in document order)
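A sketch of the node and order comparisons, limited to is, << and >>. The wrapper element r is an assumption that puts both nodes in one tree so that document order is well defined.

```xquery
let $doc := <r><a/><b/></r>
let $a := $doc/a, $b := $doc/b
return (
  $a is $a,      (: true: same node :)
  $a is $b,      (: false :)
  <a/> is <a/>,  (: false: each constructor creates a new node :)
  $a << $b,      (: true: a precedes b in document order :)
  $b >> $a       (: true :)
)
```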

SLIDE 48

Value and general comparisons

• <a>42</a> eq "42" => true
• <a>42</a> eq 42 => error
• <a>42</a> eq "42.0" => false
• <a>42</a> eq 42.0 => error
• <a>42</a> = 42 => true
• <a>42</a> = 42.0 => true
• <a>42</a> eq <b>42</b> => true
• <a>42</a> eq <b> 42</b> => false
• <a>baz</a> eq 42 => error
• () eq 42 => ()
• () = 42 => false
• (<a>42</a>, <b>43</b>) = 42.0 => true
• (<a>42</a>, <b>43</b>) = "42" => true
• ns:shoesize(5) eq ns:hatsize(5) => true
• (1,2) = (2,3) => true

SLIDE 49

Algebraic properties of comparisons

• General comparisons are neither reflexive nor transitive
  • (1,3) = (1,2) (but also !=, <, >, <=, >= !!!!!)
  • Reasons: implicit existential quantification, dynamic casts
• Negation rule does not hold
  • fn:not($x = $y) is not equivalent to $x != $y
• Value comparisons are almost transitive
  • Exception: xs:decimal due to the loss of precision
• Impact on grouping, hashing, indexing, caching !!!
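These properties can be checked directly. The sketch below uses only integer literals, so no dynamic casts are involved and every effect comes from existential quantification alone.

```xquery
let $x := (1, 3), $y := (1, 2)
return (
  $x = $y,          (: true: 1 = 1 :)
  $x != $y,         (: true: 1 != 2 :)
  fn:not($x = $y),  (: false, so it differs from $x != $y :)
  () = (),          (: false: not reflexive :)
  (1, 2) = (2, 3),  (: true :)
  (2, 3) = (3, 4),  (: true :)
  (1, 2) = (3, 4)   (: false: not transitive :)
)
```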

SLIDE 50

XPath expressions

• An expression that defines the set of nodes where the navigation starts + a series of selection steps that explain how to navigate into the XML tree
• A step: axis '::' nodeTest
• Axes control the navigation direction in the tree
  • attribute, child, descendant, descendant-or-self, parent, self
  • The other XPath 1.0 axes (following, following-sibling, preceding, preceding-sibling, ancestor, ancestor-or-self) are optional in XQuery
• Node test by:
  • Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *:*)
  • Kind of item (e.g. node(), comment(), text())
  • Type test (e.g. element(ns:PO, ns:PoType), attribute(*, xs:integer))

SLIDE 51

Examples of path expressions

• document("bibliography.xml")/child::bib
• $x/child::bib/child::book/attribute::year
• $x/parent::*
• $x/child::*/descendant::comment()
• $x/child::element(*, ns:PoType)
• $x/attribute::attribute(*, xs:integer)
• $x/ancestor::document(schema-element(ns:PO))
• $x/(child::element(*, xs:date) | attribute::attribute(*, xs:date))
• $x/f(.)

SLIDE 52

XPath abbreviated syntax

• Axis can be missing
  • By default the child axis
    $x/child::person -> $x/person
• Short-hands for common axes
  • Descendant-or-self
    $x/descendant-or-self::node()/child::comment() -> $x//comment()
  • Parent
    $x/parent::node() -> $x/..
  • Attribute
    $x/attribute::year -> $x/@year
  • Self
    $x/self::node() -> $x/.
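The full and abbreviated forms can be compared on a small tree. This is a sketch: the bib/book data is made up for illustration, and the long forms use node() as the node test, which is what //, .. expand to in the XPath specification.

```xquery
let $bib :=
  <bib>
    <book year="1994"><title>TCP/IP Illustrated</title><!-- a comment --></book>
    <book year="2000"><title>Data on the Web</title></book>
  </bib>
return (
  string-join($bib/child::book/attribute::year, " "),       (: "1994 2000" :)
  string-join($bib/book/@year, " "),                        (: "1994 2000" :)
  count($bib/descendant-or-self::node()/child::comment()),  (: 1 :)
  count($bib//comment()),                                   (: 1 :)
  $bib/book[1]/parent::node() is $bib/book[1]/..            (: true: same node :)
)
```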

SLIDE 53

XPath filter predicates

• Syntax: expression1 [ expression2 ]
• [ ] is an overloaded operator
• Filtering by position (if numeric value):
  /book[3]
  /book[3]/author[1]
  /book[3]/author[1 to 2]
• Filtering by predicate:
  • //book [author/firstname = "ronald"]
  • //book [@price < 25]
  • //book [count(author [@gender="female"]) > 0]
• Classical XPath mistake
  • $x/a/b[1] means $x/a/(b[1]) and not ($x/a/b)[1]
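The classical mistake, made visible on a tiny tree. The element names are the ones used on the slide; the wrapper r and the text content are assumptions added for the sketch.

```xquery
let $x := <r><a><b>1</b><b>2</b></a><a><b>3</b></a></r>
return (
  string-join($x/a/b[1], ","),    (: "1,3": the first b of every a :)
  string-join(($x/a/b)[1], ","),  (: "1": the first b overall :)
  count($x/a/b[1]),               (: 2 :)
  count(($x/a/b)[1])              (: 1 :)
)
```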

SLIDE 54

Conditional expressions

if ( $book/@year < 1980 )
then ns:WS(<old>{$x/title}</old>)
else ns:WS(<new>{$x/title}</new>)

• Only the selected branch is allowed to raise execution errors
• Impacts scheduling and parallelization
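A sketch of why the error rule matters: the division below would fail for a zero divisor, but the branch that contains it is not selected in that case, so no error may be raised. The variable name $d is illustrative.

```xquery
for $d in (0, 2)
return
  if ($d eq 0)
  then "undefined"
  else 10 div $d
(: result: "undefined", 5 :)
```

A processor may still evaluate both branches speculatively or in parallel, as long as errors from the branch that is not selected are suppressed.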