Module 5: Introduction to XQuery
01/31/07 2
XML is now everywhere
Google search hit counts (warning: unreliable numbers)
- 285.000.000 for XML
- 1.000.000 for XQuery
- 11.000.000 for XSLT
- 12.000.000 for XML Schema
- 60.000.000 for .NET
- 200.000.000 for Java
- 64.000.000 for SQL
XML has the highest Google number among all the technology buzzwords that I searched (except RSS)
01/31/07 3
Sources of XML data
1. Inter-application communication data (WS, REST, etc.)
2. Mobile devices communication data
3. Logs
4. Blogs (RSS)
5. Metadata (e.g. Schema, WSDL, XMP)
6. Presentation data (e.g. XHTML)
7. Documents (e.g. Word)
8. Views of other sources of data (relational, LDAP, CSV, Excel, etc.)
9. Sensor data
01/31/07 4
Some vertical application domains for XML
- Health Level Seven (HL7): http://www.hl7.org/
- Geography Markup Language (GML)
- Systems Biology Markup Language (SBML): http://sbml.org/
- XBRL, the XML-based business reporting standard: http://www.xbrl.org/
- Global Justice XML Data Model (GJXDM): http://it.ojp.gov/jxdm
- ebXML: http://www.ebxml.org/
- Encoded Archival Description application: http://lcweb.loc.gov/ead/
- Digital photography metadata (XMP)
- An XML grammar for sensor data (SensorML)
- Really Simple Syndication (RSS 2.0)
Basically everywhere.
01/31/07 5
Processing the XML data
- Huge amount of XML information, and growing
- We need to “manage” it, and then “process” it
  - Store it efficiently
  - Verify the correctness
  - Filter, search, select, join, aggregate
  - Create new pieces of information
  - Clean, normalize the data
  - Update it
  - Take actions based on the existing data
  - Write complex execution flows
- No conceptual organization like for relational databases (applications are too heterogeneous)
01/31/07 6
Frequent solutions to XML data management
1. Map it to generic programming APIs (e.g. DOM, SAX, StAX)
2. Manually map it to non-generic APIs
3. Automatically map it to non-generic structures
4. Use XML extensions of existing languages
5. Shredding for relational stores
6. Native XML processing through XSLT and XQuery
01/31/07 7
1. Mapping to generic structures
- Represent the data:
  - Original Unicode form, or
  - Some binary representation (e.g. Fast Infoset)
- Store it:
  - Directly on a file system, or
  - On a “transacted” file system (e.g. Sleepycat, or a relational database)
- Map the XML data to generic XML programmatic APIs
  - E.g. DOM, SAX, StAX (JSR 173), XMLReader
- Use the native programming languages (e.g. Java, C#) to manipulate the data
- Re-serialize it at the end
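The workflow above (map to a generic API, manipulate in the host language, re-serialize) can be sketched in a few lines. This is only an illustration, assuming Python's standard `xml.dom.minidom` as the generic DOM API; the purchase-order content and the `status` attribute are invented for the example.

```python
from xml.dom import minidom

# Hypothetical input document, invented for this sketch.
SOURCE = "<purchaseOrder><lineItem>pen</lineItem><lineItem>ink</lineItem></purchaseOrder>"

# 1. Map the XML text to a generic DOM tree.
doc = minidom.parseString(SOURCE)

# 2. Manipulate it with ordinary host-language code through generic node accessors.
items = doc.getElementsByTagName("lineItem")
for item in items:
    item.setAttribute("status", "checked")  # "status" is an assumed attribute name

# 3. Re-serialize the tree at the end.
result = doc.documentElement.toxml()
print(result)
```

Note that the code never mentions purchase orders as a type: everything goes through generic nodes, which is exactly the limitation the following approaches try to address.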
01/31/07 8
1. Manual mapping to generic structures (example)
<purchaseOrder>
  <lineItem> ….. </lineItem>
  <lineItem> ….. </lineItem>
</purchaseOrder>
<book>
  <author>…</author>
  <title>….</title>
  …..
</book>
Hard-coded mappings to:
class DomNode {
  // signatures only; bodies omitted
  public String getNodeName();
  public String getNodeValue();
  public void setNodeValue(String nodeValue);
  public short getNodeType();
}
01/31/07 9
2. Manual mapping to non-generic structures
<purchaseOrder>
  <lineItem> ….. </lineItem>
  <lineItem> ….. </lineItem>
</purchaseOrder>
<book>
  <author>…</author>
  <title>….</title>
  …..
</book>
Hard-coded mappings to:
class PurchaseOrder {
  public List getLineItems();
  ……..
}
class Book {
  public List getAuthor();
  public String getTitle();
  ……
}
01/31/07 10
3. Automatic mapping to non-generic structures
<type name="book-type">
  <sequence>
    <attribute name="year" type="xs:integer"/>
    <element name="title" type="xs:string"/>
    <sequence minOccurs="0">
      <element name="author" type="xs:string"/>
    </sequence>
  </sequence>
</type>
<element name="book" type="book-type"/>
Automatic mapping (e.g. XMLBeans) to:
class BookType {
  public Integer getYear();
  public String getTitle();
  public List getAuthors();
  ……..
}
01/31/07 11
4. XML extensions of existing procedural languages
Examples:
- C-omega, ECMAScript, PHP extensions, Python extensions, etc.
Most of them define:
- A way of importing XML data into their native type system
- A rich API for XML data manipulation
- A way of navigating/searching/querying the XML data via their extensions (XPath-based or XPath-inspired)
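As one concrete instance of such a language extension, here is a small sketch using Python's standard `xml.etree.ElementTree`, which imports XML into native objects and offers a limited, XPath-inspired path syntax. The catalog content is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this sketch.
catalog = ET.fromstring(
    "<catalog>"
    "<book year='1967'><title>The politics of experience</title><author>R.D. Laing</author></book>"
    "<book year='2001'><title>Another title</title><author>A. Writer</author></book>"
    "</catalog>"
)

# Navigation uses a small XPath subset (child steps and attribute predicates).
titles_1967 = [t.text for t in catalog.findall("./book[@year='1967']/title")]

# The rest is plain host-language code over native objects.
all_authors = [a.text for a in catalog.iter("author")]
print(titles_1967, all_authors)
```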
01/31/07 12
5. Native XML processing: XSLT and XQuery
Most promising alternative for the future.
The only alternative such that:
- the data is modeled only once
- it is well integrated with the XML Schema type system
- it preserves the logical/physical data independence
- the code deals with non-generic structures
- code can be optimized automatically
Data is stored:
- in plain file systems, or
- in sophisticated data stores (e.g. XML extensions of relational stores)
Missing pieces, under development
- E.g. no procedural logic
01/31/07 13
Why XQuery?
Why a “query” language for XML?
- Need to process XML data
- Preserve logical/physical data independence
  - The semantics is described in terms of an abstract data model, independent of the physical data storage
- Declarative programming
  - Such programs should describe the “what”, not the “how”
Why a native query language? Why not SQL?
- We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
Why another XML processing language? Why not XSLT?
- The template nature of XSLT was not appealing to the database people. Not declarative enough.
01/31/07 14
What is XQuery?
A programming language that can express arbitrary XML to XML data transformations
- Logical/physical data independence
- “Declarative”
- “High level”
- “Side-effect free”
- “Strongly typed” language
“An expression language for XML.”
Commonalities with functional programming, imperative programming and query languages
The “query” part might be a misnomer (***)
01/31/07 15
XQuery family of standards
- XQuery 1.0: An XML Query Language: an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
- XSL Transformations (XSLT) Version 2.0: transforms data model instances (XML and non-XML) into other documents, including into XSL-FO for printing
- XML Path Language (XPath) 2.0: expression syntax for referring to parts of XML documents
- XQuery 1.0 and XPath 2.0 Functions and Operators: the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
- XQuery 1.0 and XPath 2.0 Data Model (XDM): representation and access for both XML and non-XML sources
- XSLT 2.0 and XQuery 1.0 Serialization: how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
- XML Syntax for XQuery 1.0 (XQueryX): an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
- XQuery 1.0 and XPath 2.0 Formal Semantics: the type system used in XQuery and XSLT 2 via XPath defined precisely for implementers
01/31/07 16
XQuery, XPath, XSLT
[Diagram: XPath 1.0, XPath 2.0, XSLT 1.0, XSLT 2.0 and XQuery 1.0 on a timeline from 1999 to 2007. Edge labels: “uses”, “uses”, “extends, almost backwards compatible”, “extends”. Annotations: FLWOR expressions, node constructors, validation.]
01/31/07 17
Roadmap for today
- XQuery Data Model (XDM)
- XQuery type system
- XQuery environment
- XQuery basic constructs
  - variables
  - constants
  - function calls, function library
  - arithmetic operations
  - boolean operations
  - path expressions
  - conditionals
01/31/07 18
The need for an abstract XML data model
- XML 1.0 specification only talks about characters
- We cannot have a programming language processing “characters” (one by one)
- An XML abstract/logical data model!?
- Unfortunately too many of those
  - Infoset, PSVI, DOM, XDM, etc.
01/31/07 19
XML Data Model (XDM)
- Abstract (i.e. logical) data model for XML data
- Same role for XQuery as the relational data model for SQL
- Purely logical: no standard storage or access model (on purpose)
- XQuery is closed with respect to the Data Model
[Diagram: Infoset, PSVI, XML Data Model; XQuery, XPath 2.0, XSLT 2.0]
01/31/07 20
XML Data model life cycle
[Diagram: .xml and .xsd inputs are parsed and validated into an XQuery Data Model instance; XPath 2.0, XQuery and XSLT 2.0 operate on data model instances (application-dependent); the result is serialized back to .xml]
01/31/07 21
XML Data Model
- Instance of the data model:
  - a sequence composed of zero or more items
- The empty sequence is often considered as the “null value”
- Items
  - nodes or atomic values
- Nodes
  - document | element | attribute | text | namespaces | PI | comment
- Atomic values
  - Instances of all XML Schema atomic types: string, boolean, ID, IDREF, decimal, QName, URI, ...
  - untyped atomic values
- Typed (i.e. schema validated) and untyped (i.e. non schema validated) nodes and values
Remember Lisp?
01/31/07 22
Sequences
- Can be heterogeneous (nodes and atomic values): (<a/>, 3)
- Can contain duplicates (by value and by identity): (1,1,1)
- Are not necessarily ordered in document order
- Nested sequences are automatically flattened: ( 1, 2, (3, 4) ) = (1, 2, 3, 4)
- Single items and singleton sequences are the same: 1 = (1)
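The flattening rule is easy to get wrong when coming from languages with nested lists, so here is a small sketch of it. It is not XQuery: a Python list stands in for a sequence, and the helper name `seq` is made up for the illustration.

```python
def seq(*items):
    """Build a flat XDM-style sequence: nested sequences are spliced in, never nested."""
    flat = []
    for item in items:
        if isinstance(item, list):   # a list stands in for a sequence
            flat.extend(seq(*item))
        else:
            flat.append(item)        # a single item
    return flat

print(seq(1, 2, seq(3, 4)))
print(seq(1) == seq(seq(1)))
```

Because a single item is treated as a one-item sequence, wrapping a value in extra parentheses never changes the result.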
01/31/07 23
Atomic values
- The values of the 19 atomic types available in XML Schema
  - E.g. xs:integer, xs:boolean, xs:date
- All the user-defined derived atomic types
  - E.g. myNS:ShoeSize
- xs:untypedAtomic
- Atomic values carry their type together with the value
  - (8, myNS:ShoeSize) is not the same as (8, xs:integer)
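A minimal way to picture “values carry their type” is a pair of value and type name. The `Atomic` class below is a hypothetical stand-in written for this illustration, not part of any XQuery API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atomic:
    """An atomic value together with its type annotation."""
    value: object
    type_name: str

shoe = Atomic(8, "myNS:ShoeSize")
integer = Atomic(8, "xs:integer")

print(shoe == integer)              # the pairs differ because the type names differ
print(shoe.value == integer.value)  # the raw values alone are equal
```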
01/31/07 24
XML nodes
- 7 types of nodes:
  - document | element | attribute | text | namespaces | PI | comment
- Every node has a unique node identifier
  - Scope of node identifier uniqueness is implementation dependent
- Nodes have children and an optional parent
  - conceptual “tree”
- Nodes are ordered based on the topological order in the tree (“document order”)
01/31/07 25
Node accessors
node-kind : xs:string
node-name : xs:QName ?
parent : node() ?
string-value : xs:string
typed-value : xs:anyAtomicType *
type-name : xs:QName ?
children : node() *
attributes : attribute() *
namespaces : node() *
01/31/07 26
Example of well formed XML data
<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>
- 3 element nodes, 1 attribute node, 5 text nodes
- name(book element) = {-}:book
- In the absence of schema validation
  - type(book element) = xs:untyped
  - type(author element) = xs:untyped
  - type(year attribute) = xs:untypedAtomic
  - typed-value(author element) = (“R.D. Laing” , xs:untypedAtomic)
  - typed-value(year attribute) = (“1967”, xs:untypedAtomic)
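The node counts above include the whitespace-only text nodes between elements. The sketch below checks this with Python's standard `xml.dom.minidom`, which is a DOM rather than an XDM implementation but agrees on the count for this document; the indentation in `SOURCE` is an assumption about how the example is laid out.

```python
from collections import Counter
from xml.dom import minidom

# The book example, with line breaks and indentation between the elements.
SOURCE = (
    '<book year="1967">\n'
    '  <title>The politics of experience</title>\n'
    '  <author>R.D. Laing</author>\n'
    '</book>'
)

def count_nodes(node, counts):
    """Count element, attribute and text nodes in the subtree rooted at node."""
    if node.nodeType == node.ELEMENT_NODE:
        counts["element"] += 1
        counts["attribute"] += node.attributes.length
    elif node.nodeType == node.TEXT_NODE:
        counts["text"] += 1          # includes whitespace-only text
    for child in node.childNodes:
        count_nodes(child, counts)
    return counts

counts = count_nodes(minidom.parseString(SOURCE).documentElement, Counter())
print(dict(counts))
```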
01/31/07 27
XML schema example
<type name="book-type">
  <sequence>
    <attribute name="year" type="xs:integer"/>
    <element name="title" type="xs:string"/>
    <sequence minOccurs="0">
      <element name="author" type="xs:string"/>
    </sequence>
  </sequence>
</type>
<element name="book" type="book-type"/>
01/31/07 28
Schema validated XML data
<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>
After schema validation
- type(book element) = {uri}:book-type
- type(author element) = xs:string
- type(year attribute) = xs:integer
- typed-value(author element) = (“R.D. Laing” , xs:string)
- typed-value(year attribute) = (1967 , xs:integer)
Schema validation impacts the data model representation and therefore the XQuery semantics!!
01/31/07 29
Lexical and binary aspect of the data
Every node holds (logically) redundant information:
<a xsi:type="xs:integer">001</a>
- dm:string-value(): “001” as xs:string
- dm:typed-value():
  - “001” as an xs:untypedAtomic before validation
  - 1 as an xs:integer after validation
Implementations can store:
- The string value
  - Retrieve the typed value dynamically based on the type, every time it is needed
- The typed value
  - Retrieve an acceptable lexical value for that type every time this is required
- Both
In case of unvalidated data the two are the same
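The first storage option (keep the string value, derive the typed value on demand) can be sketched as follows. The class `StoredNode` and its two-entry cast table are assumptions made for the illustration; a real implementation would cover the full XML Schema type system.

```python
class StoredNode:
    """Stores only the string value; the typed value is derived on demand."""

    CASTS = {"xs:integer": int, "xs:string": str}  # tiny assumed subset of types

    def __init__(self, string_value, type_name=None):
        self.string_value = string_value
        self.type_name = type_name   # None means "not validated"

    def typed_value(self):
        if self.type_name is None:
            # Unvalidated data: the typed value is the string value, untyped.
            return (self.string_value, "xs:untypedAtomic")
        return (self.CASTS[self.type_name](self.string_value), self.type_name)

before = StoredNode("001")               # before validation
after = StoredNode("001", "xs:integer")  # after validation
print(before.typed_value(), after.typed_value(), after.string_value)
```

The trade-off is the one listed above: the lexical form “001” is preserved exactly, at the cost of a cast every time the typed value is needed.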
01/31/07 30
Typed vs. untyped XML Data
- Untyped data (non XML Schema validated)
  <a>3</a> eq 3
  <a>3</a> eq “3”
- Typed data (after XML Schema validation)
  <a xsi:type=“xs:integer”>3</a> eq 3
  <a xsi:type=“xs:string”>3</a> eq 3
  <a xsi:type=“xs:integer”>3</a> eq “3”
  <a xsi:type=“xs:string”>3</a> eq “3”
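To work through the comparisons above, recall that `eq` first atomizes its operands and casts any xs:untypedAtomic operand to xs:string; operands of incomparable types then raise a type error. The sketch below models only that much: `value_eq` is a made-up helper, values are (value, type name) pairs, and numeric promotion and subtyping are deliberately ignored.

```python
def value_eq(left, right):
    """Simplified 'eq' on atomized operands given as (value, type_name) pairs."""
    def normalize(operand):
        value, type_name = operand
        if type_name == "xs:untypedAtomic":
            return (str(value), "xs:string")   # untyped operands are treated as strings
        return operand

    (lv, lt), (rv, rt) = normalize(left), normalize(right)
    if lt != rt:
        raise TypeError(f"cannot compare {lt} with {rt}")
    return lv == rv

untyped_3 = ("3", "xs:untypedAtomic")   # atomized <a>3</a>, not validated
integer_3 = (3, "xs:integer")
string_3 = ("3", "xs:string")

print(value_eq(untyped_3, string_3))    # untyped vs. string
print(value_eq(integer_3, integer_3))   # integer vs. integer
```

Under these rules the untyped element compares successfully with “3” but not with the number 3, while validated data compares only with values of a compatible type.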
01/31/07 31
XML data equivalence
- XQuery has multiple notions of data “equality”
  - “=”, “eq”, “is”, “fn:deep-equal()”
- Expected properties:
  - Transitivity, reflexivity and symmetry
  - Necessary for grouping, indexing and hashing
- Additional property:
  - if (data1 equal data2) then (f(data1) equal f(data2))
  - Necessary for memoization, caching
- None of the equality relationships above (except “is”) satisfies those properties
  - The “is” relationship only applies to nodes
- Careful implementations for indexes, hashing, caches
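One way to see why “=” fails these properties: the general comparison is existential, so it holds as soon as any pair of items matches. The sketch below models that with Python lists standing in for sequences of same-typed atomic values; `general_eq` is a made-up name.

```python
def general_eq(left, right):
    """Model of XQuery '=': true if some item on the left equals some item on the right."""
    return any(a == b for a in left for b in right)

a, b, c = [1, 2], [2, 3], [3, 4]

print(general_eq(a, b), general_eq(b, c), general_eq(a, c))  # transitivity check
print(general_eq([], []))                                    # reflexivity check
```

Here a = b and b = c hold but a = c does not, and the empty sequence is not even equal to itself, so “=” cannot be used directly as a grouping or hashing key.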
01/31/07 32
Document order
<book year="1967" price="45.32">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
</book>
- How many nodes here?
- What is the order between nodes?
01/31/07 33
Document order
<book(n1) year(n2)="1967" price(n3)="45.32">(n4)
  <title(n5)>(n6)The politics of experience</title>(n7)
  <author(n8)>(n9)R.D. Laing</author>
</book>
- How many nodes here? 9
- What is the order between nodes?
  - n1 before all the others
  - order of n2 and n3 non-deterministic
  - n2 and n3 are before n4,n5,n6,n7,n8,n9
  - n4<n5<n6<n7<n8<n9 (top-down, left to right among the children)
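Document order amounts to a pre-order traversal in which an element's attributes come right after the element and before its children. The sketch below reproduces the nine nodes with Python's standard `xml.dom.minidom`; the input is written without whitespace before `</book>` so that it matches the count on the slide, and the label format is invented for the illustration.

```python
from xml.dom import minidom

SOURCE = (
    '<book year="1967" price="45.32">\n'
    '  <title>The politics of experience</title>\n'
    '  <author>R.D. Laing</author></book>'
)

def document_order(node):
    """Yield a node, then its attributes, then its children, recursively."""
    yield node
    if node.nodeType == node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            yield attrs.item(i)      # relative order of attributes is not significant
    for child in node.childNodes:
        yield from document_order(child)

def label(node):
    if node.nodeType == node.ELEMENT_NODE:
        return node.tagName
    if node.nodeType == node.ATTRIBUTE_NODE:
        return "@" + node.name
    return "text"

labels = [label(n) for n in document_order(minidom.parseString(SOURCE).documentElement)]
print(labels)
```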
01/31/07 34
XQuery type system
- XQuery has a powerful (and complex!) type system
- XQuery types are imported from XML Schemas
- Every XML data model instance has a dynamic type
- Every XQuery expression has a static type
- Pessimistic static type inference
- The goals of the type system are:
  1. detect statically errors in the queries
  2. infer the type of the result of valid queries
  3. ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
01/31/07 35
XQuery type system components
- Atomic types
  - xs:untypedAtomic
  - All 19 primitive XML Schema types
  - All user-defined atomic types
- Empty, None
- Type constructors (simplification!)
  - Elements: element name {type}
  - Attributes: attribute name {type}
  - Alternation: type1 | type2
  - Sequence: type1, type2
  - Repetition: type*
  - Interleaved product: type1 & type2
- type1 intersect type2 ?
- type1 subtype of type2 ?
- type1 equals type2 ?
01/31/07 36
XML queries
- An XQuery basic structure:
  - a prolog + an expression
- Role of the prolog:
  - Populate the context where the expression is compiled and evaluated
- The prolog contains:
  - namespace definitions
  - schema imports
  - default element and function namespace
  - function definitions
  - collation declarations
  - function library imports
  - global and external variable definitions
  - etc.
01/31/07 37
XQuery processing
01/31/07 38
XQuery expressions
XQuery Expr := Constants | Variable | FunctionCalls | PathExpr |
               ComparisonExpr | ArithmeticExpr | LogicExpr |
               FLWORExpr | ConditionalExpr | QuantifiedExpr |
               TypeSwitchExpr | InstanceofExpr | CastExpr |
               UnionExpr | IntersectExceptExpr |
               ConstructorExpr | ValidateExpr
Expressions can be nested with full generality!
Functional programming heritage (ML, Haskell, Lisp)
01/31/07 39
Constants
XQuery grammar has built-in support for:
- Strings: “125.0” or ‘125.0’
- Integers: 150
- Decimal: 125.0
- Double: 125.e2
19 other atomic types available via XML Schema
Values can be constructed
- with constructors in F&O doc: fn:true(), fn:date(“2002-5-20”)
- by casting
- by schema validation
01/31/07 40
Variables
- $ + QName (e.g. $x, $ns:foo)
- bound, not assigned
- XQuery does not allow variable assignment
- created by let, for, some/every, typeswitch expressions, function parameters
- example: let $x := ( 1, 2, 3 ) return count($x)
- above scoping ends at conclusion of return expression
01/31/07 41
A built-in function sampler
fn:document(xs:anyURI) => document?
fn:empty(item*) => boolean
fn:index-of(item*, item) => xs:unsignedInt?
fn:distinct-values(item*) => item*
fn:distinct-nodes(node*) => node*
fn:union(node*, node*) => node*
fn:except(node*, node*) => node*
fn:string-length(xs:string?) => xs:integer?
fn:contains(xs:string, xs:string) => xs:boolean
fn:true() => xs:boolean
fn:date(xs:string) => xs:date
fn:add-date(xs:date, xs:duration) => xs:date
See Functions and Operators W3C specification
01/31/07 42
Atomization
fn:data(item*) -> xs:anyAtomicType*
- Extracting the “value” of a node, or returning the atomic value
Implicitly applied:
- Arithmetic expressions
- Comparison expressions
- Function calls and returns
- Cast expressions
- Constructor expressions for various kinds of nodes
- order by clauses in FLWOR expressions
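Atomization can be pictured as a map over a sequence that replaces each node by its value and leaves atomic values alone. The sketch below is a simplification that handles only unvalidated nodes (whose value is their string content), using Python's standard `xml.etree.ElementTree`; the function name `data` merely echoes fn:data.

```python
import xml.etree.ElementTree as ET

def data(sequence):
    """Atomize: nodes become their (untyped) string value, atomic values pass through."""
    result = []
    for item in sequence:
        if isinstance(item, ET.Element):
            result.append("".join(item.itertext()))  # concatenated descendant text
        else:
            result.append(item)
    return result

book = ET.fromstring(
    "<book><title>The politics of experience</title><author>R.D. Laing</author></book>"
)
atomized = data([book.find("author"), 42])
print(atomized)
```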
01/31/07 43
Constructing sequences
(1, 2, 2, 3, 3, <a/>, <b/>)
- “,” is the sequence concatenation operator
- Nested sequences are flattened: (1, 2, 2, (3, 3)) => (1, 2, 2, 3, 3)
- range expressions: (1 to 3) => (1, 2, 3)
01/31/07 44
Combining sequences
- Union, Intersect, Except
  - Work only for sequences of nodes, not atomic values
  - Eliminate duplicates and reorder to document order
    $x := <a/>, $y := <b/>, $z := <c/>
    ($x, $y) union ($y, $z) => (<a/>, <b/>, <c/>)
- F&O specification provides other functions & operators; e.g. fn:distinct-values() and fn:distinct-nodes() particularly useful
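The two steps of a node union (eliminate duplicates by identity, then reorder to document order) can be sketched as below. This assumes, for simplicity, that all three nodes live in one document so that their relative order is well defined; the wrapper element `r` and the helper names are invented for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<r><a/><b/><c/></r>")
a, b, c = doc.documentElement.childNodes   # the three nodes, in one assumed document

def in_document_order(root):
    """Pre-order traversal of the element tree."""
    yield root
    for child in root.childNodes:
        yield from in_document_order(child)

def union(left, right):
    """Node union: drop duplicates by identity, return the rest in document order."""
    wanted = {id(n) for n in left} | {id(n) for n in right}
    return [n for n in in_document_order(doc) if id(n) in wanted]

names = [n.tagName for n in union([b, a], [c, b])]
print(names)
```

Note that the input order of the operands is lost: the result depends only on which nodes are present and where they sit in the document.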
01/31/07 45
Arithmetic expressions
1 + 4
$a div 5
5 div 6
$b mod 10
1 - (4 * 8.5)
-55.5
<a>42</a> + 1
<a>baz</a> + 1
validate {<a xsi:type="xs:integer">42</a>} + 1
validate {<a xsi:type="xs:string">42</a>} + 1
Apply the following rules:
- atomize all operands; if either operand is (), => ()
- if an operand is untyped, cast to xs:double (if unable, => error)
- if the operand types differ but can be promoted to a common type, do so (e.g.: xs:integer can be promoted to xs:double)
- if operator is consistent w/ types, apply it; result is either atomic value or error
- if type is not consistent, throw type exception
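The rules can be traced with a small Python model of "+". It is a sketch under simplifying assumptions: operands are already atomized into lists of zero or one item, the hypothetical class `Untyped` stands in for xs:untypedAtomic, and Python's int/float stand in for xs:integer/xs:double.

```python
class Untyped(str):
    """Stands in for xs:untypedAtomic, e.g. the atomized content of <a>42</a>."""


def add(left, right):
    """Model of XQuery '+' on atomized operands (lists of 0 or 1 items)."""
    if not left or not right:
        return []                         # () on either side yields ()
    if len(left) > 1 or len(right) > 1:
        raise TypeError("each operand must be a single atomic value")
    a, b = left[0], right[0]
    # Untyped values are cast to xs:double; a failed cast raises ValueError.
    a = float(a) if isinstance(a, Untyped) else a
    b = float(b) if isinstance(b, Untyped) else b
    # Typed strings, booleans etc. are not valid operands of '+'.
    for v in (a, b):
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError("'+' is not defined for " + type(v).__name__)
    return [a + b]                        # int + float promotes to float


untyped_plus_one = add([Untyped("42")], [1])   # <a>42</a> + 1
integer_plus_one = add([42], [1])              # validated as xs:integer
empty_plus_one = add([], [1])                  # () + 1
```

`add([Untyped("baz")], [1])` fails in the cast (a dynamic error), while `add(["42"], [1])`, the xs:string case, fails the type check.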
01/31/07 46
Logical expressions
expr1 and expr2
expr1 or expr2
fn:not() as a function
return true, false
Different from SQL
- two-value logic, not three-value logic
Different from imperative languages
- and, or are commutative in XQuery, but not in Java
- if (($x castable as xs:integer) and (($x cast as xs:integer) eq 2)) ...
  is not a safe guard: the cast may be evaluated before the castable test
Non-deterministic
- false and error => false or error! (non-deterministically)
Rules:
- first compute the Effective Boolean Value (EBV) for each operand:
  - if (), "", NaN, 0, then return false
  - if the operand is of type xs:boolean, return it
  - any other single string or number returns true
  - if operand is a sequence with first item a node, return true
  - else raises an error
- then use standard two-value Boolean logic on the two EBVs as appropriate
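The effective boolean value computation can be sketched in Python as follows. Assumptions: a Python list models a sequence, the hypothetical class `Node` models any XML node, and non-empty strings and non-zero numbers count as true, as in XQuery 1.0.

```python
class Node:
    """Placeholder for any XML node; only its presence matters here."""


def ebv(seq):
    """Effective boolean value of a sequence (modelled as a Python list)."""
    if not seq:
        return False                      # ()
    if isinstance(seq[0], Node):
        return True                       # first item is a node
    if len(seq) > 1:
        raise TypeError("no EBV for a sequence of several atomic values")
    value = seq[0]
    if isinstance(value, bool):
        return value                      # xs:boolean is returned as is
    if isinstance(value, str):
        return len(value) > 0             # "" is false
    if isinstance(value, (int, float)):
        return not (value == 0 or value != value)   # 0 and NaN are false
    raise TypeError("no EBV for " + type(value).__name__)


def logical_and(left, right):
    # Python evaluates left to right; XQuery may pick either order.
    return ebv(left) and ebv(right)
```

For instance `ebv([Node(), 1])` is true, while `ebv([1, 2])` raises a type error.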
01/31/07 47
Comparisons
Value: eq, ne, lt, le, gt, ge (for comparing single values)
General: =, !=, <=, <, >, >= (existential quantification + automatic type coercion)
Node: is (for testing identity of single nodes; isnot exists only in early drafts)
Order: <<, >> (testing relative position of one node vs. another, in document order)
01/31/07 48
Value and general comparisons
<a>42</a> eq "42" => true
<a>42</a> eq 42 => error
<a>42</a> eq "42.0" => false
<a>42</a> eq 42.0 => error
<a>42</a> = 42 => true
<a>42</a> = 42.0 => true
<a>42</a> eq <b>42</b> => true
<a>42</a> eq <b> 42</b> => false
<a>baz</a> eq 42 => error
() eq 42 => ()
() = 42 => false
(<a>42</a>, <b>43</b>) = 42.0 => true
(<a>42</a>, <b>43</b>) = "42" => true
ns:shoesize(5) eq ns:hatsize(5) => true
(1,2) = (2,3) => true
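Most rows of this table follow from two casting rules, which the Python model below tries to capture. It is a sketch only: the hypothetical class `Untyped` stands in for xs:untypedAtomic (the atomized content of an unvalidated element), lists model sequences, and `None` models the empty result ().

```python
class Untyped(str):
    """Stands in for xs:untypedAtomic, e.g. the atomized content of <a>42</a>."""


def _is_num(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _compare(a, b):
    # After casting, strings compare with strings and numbers with numbers.
    if isinstance(a, str) and isinstance(b, str):
        return str(a) == str(b)
    if _is_num(a) and _is_num(b):
        return a == b
    raise TypeError("cannot compare %r with %r" % (a, b))


def value_eq(left, right):
    """Model of 'eq': untyped data is treated as xs:string."""
    if not left or not right:
        return None                       # () eq 42 => ()
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs single values")
    return _compare(left[0], right[0])    # Untyped is a str, so no cast needed


def general_eq(left, right):
    """Model of '=': true if some pair of items compares equal."""
    for a in left:
        for b in right:
            ca, cb = a, b
            if isinstance(a, Untyped) and _is_num(b):
                ca = float(a)             # untyped vs number: cast to xs:double
            elif isinstance(b, Untyped) and _is_num(a):
                cb = float(b)
            if _compare(ca, cb):
                return True
    return False
```

With `a = [Untyped("42")]`, `value_eq(a, ["42"])` is true and `value_eq(a, [42])` raises a type error, whereas `general_eq(a, [42])` is true. The model stops at the first matching pair; an engine may also report an error raised by another pair.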
01/31/07 49
Algebraic properties of comparisons
General comparisons are not reflexive and not transitive
- (1,3) = (1,2) is true (but so are !=, <, >, <=, >= !!!)
- Reasons: implicit existential quantification, dynamic casts
Negation rule does not hold
- fn:not($x = $y) is not equivalent to $x != $y
Value comparisons are almost transitive
- Exception: xs:decimal, due to the loss of precision when promoted to xs:float or xs:double
Impact on grouping, hashing, indexing, caching !!!
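The failed negation rule and the lack of transitivity both come from existential quantification, which is easy to reproduce in Python. The helper `general` is a made-up name and ignores the casting rules; it only models "some pair satisfies the operator".

```python
import operator


def general(op, left, right):
    """Existential semantics: true if op holds for at least one pair."""
    return any(op(a, b) for a in left for b in right)


x, y = [1, 3], [1, 2]
negated_eq = not general(operator.eq, x, y)    # fn:not($x = $y)
direct_ne = general(operator.ne, x, y)         # $x != $y

# (1) = (1, 2) and (1, 2) = (2) both hold, yet (1) = (2) does not.
chain = (
    general(operator.eq, [1], [1, 2]),
    general(operator.eq, [1, 2], [2]),
    general(operator.eq, [1], [2]),
)

# Reflexivity fails for the empty sequence: () = () is false.
empty_equals_itself = general(operator.eq, [], [])
```

Here `negated_eq` is false while `direct_ne` is true, so an optimizer cannot rewrite one into the other.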
01/31/07 50
XPath expressions
An expression that defines the set of nodes where the navigation starts + a series of selection steps that explain how to navigate into the XML tree
A step: axis '::' nodeTest
Axes control the navigation direction in the tree
- attribute, child, descendant, descendant-or-self, parent, self
- The other XPath 1.0 axes (following, following-sibling, preceding, preceding-sibling, ancestor, ancestor-or-self) are optional in XQuery
Node test by:
- Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *)
- Kind of item (e.g. node(), comment(), text())
- Type test (e.g. element(ns:PO, ns:PoType), attribute(*, xs:integer))
01/31/07 51
Examples of path expressions
doc("bibliography.xml")/child::bib
$x/child::bib/child::book/attribute::year
$x/parent::*
$x/child::*/descendant::comment()
$x/child::element(*, ns:PoType)
$x/attribute::attribute(*, xs:integer)
$x/ancestor::document-node(schema-element(ns:PO))
$x/(child::element(*, xs:date) | attribute::attribute(*, xs:date))
$x/f(.)
01/31/07 52
XPath abbreviated syntax
Axis can be missing
- By default the child axis
  $x/child::person -> $x/person
Short-hands for common axes
- Descendant-or-self
  $x/descendant-or-self::node()/child::comment() -> $x//comment()
- Parent
  $x/parent::node() -> $x/..
- Attribute
  $x/attribute::year -> $x/@year
- Self
  $x/self::node() -> $x/.
01/31/07 53
XPath filter predicates
Syntax: expression1 [ expression2 ]
[ ] is an overloaded operator
Filtering by position (if numeric value):
- /book[3]
- /book[3]/author[1]
- /book[3]/author[position() = (1 to 2)]
Filtering by predicate:
- //book[author/firstname = "ronald"]
- //book[@price < 25]
- //book[count(author[@gender = "female"]) > 0]
Classical XPath mistake
- $x/a/b[1] means $x/a/(b[1]) and not ($x/a/b)[1]
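The difference is easy to see on a tiny document. The sketch below uses Python's xml.etree.ElementTree, whose limited XPath subset supports positional predicates on a step; the sample XML is invented for the illustration, and the parenthesized form is emulated with a list slice because ElementTree has no syntax for it.

```python
import xml.etree.ElementTree as ET

x = ET.fromstring(
    "<x>"
    "<a><b>a1-b1</b><b>a1-b2</b></a>"
    "<a><b>a2-b1</b><b>a2-b2</b></a>"
    "</x>"
)

# $x/a/b[1]: the predicate binds to the step b,
# so every <a> contributes its own first <b>.
first_b_of_each_a = [b.text for b in x.findall("a/b[1]")]

# ($x/a/b)[1]: the predicate filters the whole result sequence,
# so only the very first <b> remains.
first_b_overall = [b.text for b in x.findall("a/b")[:1]]
```

The first query returns one b per a element; the second returns a single b.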
01/31/07 54
Conditional expressions
if ( $book/@year < 1980 )
then ns:WS(<old>{$x/title}</old>)
else ns:WS(<new>{$x/title}</new>)
Only the selected branch is allowed to raise execution errors
Impacts scheduling and parallelization