Module 5 Implementation of XQuery
(Rewrite, Indexes, Runtime System)
XQuery: a language at the cross-roads
– Query languages
– Functional programming languages
– Object-oriented languages
– Procedural languages
– Some new features
+ Environment for expressions
+ Expressions nested with full generality
+ Lazy evaluation

+ High level construct (FLWOR/Select-From-Where)
+ Streaming execution
+ Logical/physical data mismatch and the appropriate optimizations
+ Expressions nested with full generality
+ Nodes with node/object identity

+ Side effects
+ Error handling
Data access pattern (APIs)
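SELECT * FROM Hotels h, Cities c WHERE h.city = c.name;

[Figure: query processing architecture. The query enters the Parser & Query Optimizer, which consults the Catalogue (schema info, DB statistics) and produces a plan: Hash Join over Scan(Hotels) and Scan(Cities). The Execution Engine runs the plan against Indexes & Base Data (<Ritz, ...> ..., <Paris, ...> ...) and returns result tuples: <Ritz, Paris, ...>, <Weisser Hase, Passau, ...>, <Edgewater, Madison, ...>.]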
$x/chapter//section/title
chapter section title
<book>
  <chapter>
    <section>
      <title/>
    </section>
  </chapter>
</book>

begin book
begin chapter
begin section
begin title
end title
end section
end chapter
end book
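The begin/end event stream above is enough to evaluate a path such as $x/chapter//section/title in a single pass. The following is a minimal sketch of that idea, not the algorithm of any particular engine: it assumes the first element of the stream plays the role of $x, and the names `matches`, `select`, `EVENTS` and `STEPS` are made up for this illustration.

```python
def matches(path, steps):
    """Return True if the element at the end of `path` is selected by `steps`.

    path  -- element names from just below the context node down to the current element
    steps -- list of (axis, name) pairs, axis being "child" or "descendant"
    """
    positions = {0}  # number of path entries consumed by each partial match
    for axis, name in steps:
        reached = set()
        for p in positions:
            if axis == "child":
                candidates = range(p, min(p + 1, len(path)))
            else:  # "descendant": the step may land at any deeper level
                candidates = range(p, len(path))
            for q in candidates:
                if path[q] == name:
                    reached.add(q + 1)
        positions = reached
    return len(path) in positions


def select(events, steps):
    """Scan the begin/end events once and collect the paths of selected elements."""
    stack, hits = [], []
    for kind, name in events:
        if kind == "begin":
            stack.append(name)
            # stack[0] stands for the context node $x (an assumption of this sketch)
            if matches(stack[1:], steps):
                hits.append("/".join(stack))
        else:
            stack.pop()
    return hits


EVENTS = [("begin", "book"), ("begin", "chapter"), ("begin", "section"),
          ("begin", "title"), ("end", "title"), ("end", "section"),
          ("end", "chapter"), ("end", "book")]
STEPS = [("child", "chapter"), ("descendant", "section"), ("child", "title")]

print(select(EVENTS, STEPS))  # ['book/chapter/section/title']
```

The stack holds only the open elements, so memory grows with document depth rather than document size.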
– Logical and physical operation
– Select, project, join, duplicate-elim., …
– Differences: Match(expr, NodeTest) for path expressions
– E.g. unordered is an annotation
– Annotations exploited during optimization
– E.g. general FLWR, but also LET and MAP
– E.g. typeswitch, but also instanceof and conditionals
for $line in $doc/Order/OrderLine
where xs:integer(fn:data($line/SellersID)) eq 1
return <lineItem>{$line/Item/ID}</lineItem>
for $line in $doc/Order/OrderLine
where $line/SellersID eq 1
return <lineItem>{$line/Item/ID}</lineItem>
mandatory the same error)
values accepted for E1, or error
– () is converted into fn:false() before use
Original:
    let $x := 3
    return $x + 2
Rewritten:
    3 + 2

Original:
    let $x := <a/>
    return ($x, $x)
Rewritten:
    (<a/>, <a/>)
NO. Side effects. (Node identity)

Original:
    declare namespace ns="uri1"
    let $x := <ns:a/>
    return <b xmlns:ns="uri2">{$x}</b>
Rewritten:
    declare namespace ns="uri1"
    <b xmlns:ns="uri2">{<ns:a/>}</b>
NO. Context sensitive namespace processing.
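The node-identity counterexample can be mimicked with ordinary object identity. This is only an analogy, assuming that each element constructor creates a fresh node; the `Node` class and both function names are invented for the illustration.

```python
class Node:
    """Stand-in for a constructed XML element; every instance has its own identity."""

    def __init__(self, name):
        self.name = name


def original():
    x = Node("a")   # let $x := <a/>
    return (x, x)   # return ($x, $x)


def folded():
    # (<a/>, <a/>): the constructor now runs twice
    return (Node("a"), Node("a"))


first, second = original()
print(first is second)  # True: the same node appears twice

first, second = folded()
print(first is second)  # False: two distinct nodes
```

Any operation that observes identity (such as the `is` comparison) can tell the two results apart, which is why this LET folding is rejected.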
Original:
    (: before LET :)
    let $x := expr1
    (: after LET :)
    return expr2
Rewritten:
    (: before LET :)
    (: after LET :)
    return expr2'
where expr2' is expr2 with substitution {$x/expr1}
Original:
    for $x in (1 to 10)
    return ($input+2)+$x
Rewritten:
    let $y := ($input+2)
    for $x in (1 to 10)
    return $y+$x

Original:
    for $x in (1 to 10)
    return if ($x lt 1)
           then ($input idiv 0)
           else $x
Rewritten:
    let $y := ($input idiv 0)
    for $x in (1 to 10)
    return if ($x lt 1)
           then $y
           else $x
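The second rewriting hoists ($input idiv 0) out of a branch that is never taken for $x in (1 to 10). A rough sketch of why the evaluation strategy of the LET matters follows; `eager` and `lazy` are hypothetical names, and Python's `//` with its ZeroDivisionError stands in for idiv and its dynamic error.

```python
def eager(input_value):
    # let $y := ($input idiv 0), evaluated as soon as the LET is reached
    y = input_value // 0
    return [y if x < 1 else x for x in range(1, 11)]


def lazy(input_value):
    # The LET is bound to a thunk that runs only if some branch uses $y
    y = lambda: input_value // 0
    return [y() if x < 1 else x for x in range(1, 11)]


print(lazy(5))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

try:
    eager(5)
except ZeroDivisionError:
    print("eager evaluation raised an error the original query never hits")
```

Under this assumption the hoisted form preserves the original result only when the LET binding is evaluated on demand.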
Original:
    define function f($x as xs:integer) as xs:integer
    {$x+1}
    f(2)
Rewritten:
    2+1

Original:
    define function f($x as xs:double) as xs:boolean
    {$x instance of xs:double}
    f(2)
Rewritten:
    (2 instance of xs:double)
NO
Original:
    for $x in (for $y in $input/a/b
               where $y/c eq 3
               return $y/d)
    where $x/e eq 4
    return $x
Rewritten:
    for $y in $input/a/b,
        $x in $y/d
    where ($x/e eq 4) and ($y/c eq 3)
    return $x
– No nested collections in XML
Original:
    for $x in $input/a/b
    where $x/c eq 3
    return (for $y in $x/d
            where $x/e eq 4
            return $y)
Rewritten:
    for $x in $input/a/b,
        $y in $x/d
    where ($x/e eq 4) and ($x/c eq 3)
    return $y
Original:
    for $x in $input/a/b,
        $y in $input/c
    where ($x/d eq 3)
    return $y/e
Rewritten:
    for $x in $input/a/b
    where ($x/d eq 3)
    return $input/c/e

Original:
    for $x in $input/a/b,
        $y in $input/c
    where $x/d eq 3 and $y/f eq 4
    return $y/e
Rewritten:
    for $x in $input/a/b
    where $x/d eq 3 and $input/c/f eq 4
    return $input/c/e
NO

Original:
    for $x in $input/a/b,
        $y in $input/c
    where ($x/d eq 3)
    return <e>{$x, $y}</e>
Rewritten:
    for $x in $input/a/b
    where ($x/d eq 3)
    return <e>{$x, $input/c}</e>
NO
Original:
    for $x in (1 to 10)
    where $x eq 3
    return $x+1
Rewritten:
    for $x in (1 to 10)
    where $x eq 3
    return (3+1)
YES

Original:
    for $x in $input/a
    where $x eq 3
    return <b>{$x}</b>
Rewritten:
    for $x in $input/a
    where $x eq 3
    return <b>{3}</b>
NO

Original:
    for $x in (1.0,2.0,3.0)
    where $x eq 1
    return ($x instance of xs:integer)
Rewritten:
    for $x in (1.0,2.0,3.0)
    where $x eq 1
    return (1 instance of xs:integer)
NO
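The last counterexample has a close analogue in other languages: a value can compare equal to a constant and still have a different type. The sketch below uses Python's float and int as stand-ins for the decimal literals and xs:integer; it is an analogy, not XQuery semantics.

```python
# where $x eq 1 return ($x instance of xs:integer)
original = [isinstance(x, int) for x in (1.0, 2.0, 3.0) if x == 1]

# where $x eq 1 return (1 instance of xs:integer), i.e. $x replaced by the constant
substituted = [isinstance(1, int) for x in (1.0, 2.0, 3.0) if x == 1]

print(original)     # [False]
print(substituted)  # [True]
```

Equality of values does not imply interchangeability once an expression inspects the type, so the substitution is only safe when the static types agree.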
Original:
    for $x in $input/a/b
    where $x/c lt 3
    return if ($x/c lt 2)
           then if ($x/c eq 1)
                then (1 idiv 0)
                else $x/c+1
           else if ($x/c eq 0)
                then (1 idiv 0)
                else $x/c+2
Rewritten:
    let $y := (1 idiv 0)
    for $x in $input/a/b
    where $x/c lt 3
    return if ($x/c lt 2)
           then if ($x/c eq 1)
                then $y
                else $x/c+1
           else if ($x/c eq 0)
                then $y
                else $x/c+2
Original:
    for $x in $input/a/b
    return <c>{$x/.., $x/d}</c>
Rewritten:
    for $y in $input/a,
        $x in $y/b
    return <c>{$y, $x/d}</c>

Original:
    for $x in $input/a/b
    return <c>{$x//e/..}</c>
Rewritten:
    ??
identifiers (e.g. sort by doc order, is, parent, <<) OR node identifiers are required for the result
contain duplicates
for $x in (1 to 10)
return ns:WS($x)
– Scheduling based on data dependency
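Because no iteration of the loop over ns:WS depends on another, a runtime is free to issue the calls concurrently and reassemble the results in order. A minimal sketch of that scheduling idea, assuming a thread pool is available; `ws` is a hypothetical local stand-in for the remote call and `call_all` is an invented name.

```python
from concurrent.futures import ThreadPoolExecutor


def ws(i):
    """Hypothetical stand-in for the remote call ns:WS; here it just squares its input."""
    return i * i


def call_all(values, workers=4):
    # The iterations have no data dependency on each other, so they may run
    # concurrently; Executor.map still yields the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ws, values))


print(call_all(range(1, 11)))  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```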
– XSLT easier to use if the shape of the data is totally unknown (entropy high)
– XQuery easier to use if the shape of the data is known (entropy low)
– Static typing, error detection, lots of optimizations
Data access pattern (APIs)
BeginDocument()
BeginElement("order", "xs:untypedAny", 1)
BeginAttribute("id", "xs:untypedAtomic", 2)
CharData("4711")
EndAttribute()
BeginElement("date", "xs:untypedAny", 3)
Text("2003-08-19", 4)
EndElement()
BeginElement("www.boo.com:lineitem", "xs:untypedAny", 5)
NameSpace("www.boo.com", 6)
EndElement()
EndElement()
EndDocument()
<?xml version="1.0"?>
<order id="4711">
  <date>2003-08-19</date>
  <lineitem xmlns="www.boo.com">
  </lineitem>
</order>
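A token stream of this kind can be produced directly from a SAX-style parse. The sketch below is a simplified illustration using Python's standard xml.sax module: it omits the type annotations and node identifiers shown in the token list, ignores namespaces, and drops whitespace-only text. The class name `TokenHandler`, the function `tokenize` and the tuple layout are assumptions of the example.

```python
import xml.sax


class TokenHandler(xml.sax.ContentHandler):
    """Turns SAX callbacks into a flat list of token tuples."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._text = []

    def _flush_text(self):
        # SAX may deliver text in several chunks; join them before emitting
        text = "".join(self._text).strip()
        self._text = []
        if text:
            self.tokens.append(("Text", text))

    def startDocument(self):
        self.tokens.append(("BeginDocument",))

    def endDocument(self):
        self.tokens.append(("EndDocument",))

    def startElement(self, name, attrs):
        self._flush_text()
        self.tokens.append(("BeginElement", name))
        for attr_name, value in attrs.items():
            self.tokens.append(("BeginAttribute", attr_name))
            self.tokens.append(("CharData", value))
            self.tokens.append(("EndAttribute",))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.tokens.append(("EndElement",))


def tokenize(xml_bytes):
    handler = TokenHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.tokens


DOC = b'<order id="4711"><date>2003-08-19</date></order>'
for token in tokenize(DOC):
    print(token)
```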
– Significant compression over textual format
– Used in all tiers of Oracle stack: DB, iAS, etc.
– Generic token table used by binary XML, XML index and in-memory instances
– Encode values in native format (e.g. integers and floats)
– Avoid tokens when order is known
– For fully structured XML (relational), format very similar to current row format (continuity of storage!)
– Allow any backwards-compatible schema evolution, plus a few incompatible changes, without data migration
Harry Potter
Lilly Potter James Potter
Path              Surrogate   Value
Author[1]/FN[1]   2.1.1.1     Rudolf
Author[1]/LN[1]   2.1.2.1     Bayer
Data access pattern (APIs)
– Combined navigation + read properties
– Special methods for fast forward, reverse navigation
Token getNext(), void skipToNextNode(), …
– good: fewer method calls, stream-based processing
– good: integration of data from multiple sources
– bad: difficult to wrap existing XML data sources
– bad: reverse navigation tricky, difficult programming model
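The two methods named above, getNext() and skipToNextNode(), suggest a pull-style iterator over tokens. The following sketch shows one plausible reading, assuming that skipToNextNode() is called right after a BeginElement token and discards the rest of that element's subtree; the class, the Python method names and the token tuples are illustrative, not the API of any real system.

```python
class TokenIterator:
    """Pull-based iterator over a pre-computed token list."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def get_next(self):
        """Return the next token, or None at the end of the stream."""
        if self._pos >= len(self._tokens):
            return None
        token = self._tokens[self._pos]
        self._pos += 1
        return token

    def skip_to_next_node(self):
        """Assumed semantics: skip the remainder of the element just opened."""
        depth = 1
        while depth > 0:
            token = self.get_next()
            if token is None:
                break
            if token[0] == "BeginElement":
                depth += 1
            elif token[0] == "EndElement":
                depth -= 1


TOKENS = [("BeginElement", "order"), ("BeginElement", "date"),
          ("Text", "2003-08-19"), ("EndElement",),
          ("BeginElement", "lineitem"), ("EndElement",), ("EndElement",)]

it = TokenIterator(TOKENS)
it.get_next()            # ('BeginElement', 'order')
it.get_next()            # ('BeginElement', 'date')
it.skip_to_next_node()   # jump over the content of <date>
print(it.get_next())     # ('BeginElement', 'lineitem')
```

Skipping lets a consumer that is not interested in a subtree avoid one method call per token inside it.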
– All database
– Document by document
– Collection

– e.g., //emp/salary/fn:data(.), //emp/salary/fn:string(.)
– singletons vs. sequences
– string vs. typed-value
– which type? homogeneous vs. heterogeneous domains
– composite indexes
– indexes and errors

– =: problematic due to implicit cast + exists
– eq, le, … less problematic
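The string vs. typed-value distinction changes which lookups an index can answer. A small sketch follows, assuming salaries are stored as text and that the typed index casts every entry to a double; `build_indexes` and the sample values are made up for the illustration.

```python
from collections import defaultdict


def build_indexes(salaries):
    """Build a string-value index and a typed-value index over the same nodes."""
    string_index = defaultdict(list)
    typed_index = defaultdict(list)
    for node_id, text in enumerate(salaries):
        string_index[text].append(node_id)        # key as in fn:string(.)
        typed_index[float(text)].append(node_id)  # key as a double (assumed domain)
    return string_index, typed_index


SALARIES = ["100", "100.0", "1e2", "250"]
string_index, typed_index = build_indexes(SALARIES)

print(string_index["100"])  # [0]
print(typed_index[100.0])   # [0, 1, 2]
```

In this sketch a single non-numeric salary would make `float` raise while the typed index is being built, which is one way the "indexes and errors" and heterogeneous-domain concerns show up in practice.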
OrderKey
po.data.item
po.data.pkg
po.data.item
<po>
  <data>
    <item>foo</item>
    <pkg>123</pkg>
    <item>bar</item>
  </data>
</po>
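The dotted labels po.data.item and po.data.pkg read as root-to-leaf paths of this document. The sketch below derives such (path, value) pairs with Python's standard ElementTree, assuming the dot-separated form is simply the concatenation of element names; `leaf_paths` is a name chosen for the example.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Return (dotted path, text) pairs for every leaf element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, prefix):
        path = f"{prefix}.{elem.tag}" if prefix else elem.tag
        children = list(elem)
        if not children:
            rows.append((path, (elem.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, "")
    return rows


DOC = "<po><data><item>foo</item><pkg>123</pkg><item>bar</item></data></po>"
print(leaf_paths(DOC))
# [('po.data.item', 'foo'), ('po.data.pkg', '123'), ('po.data.item', 'bar')]
```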
for $line in $doc/Order/OrderLine
where xs:integer(fn:data($line/SellersID)) eq 1
return <lineItem>{$line/Item/ID}</lineItem>
http://www.bluestream.com/dr/?page=Home/Products/XStreamDB/
Based on and part of the Kawa framework. An online sandbox is available too. Open-source.
Nimble Technology's Nimble Integration Suite: http://www.nimble.com/
documentation and description. PHP implementation, open-source.
Java.
XQuark Group and Université de Versailles Saint-Quentin's XQuark Fusion and XQuark Bridge, open-source (see also the XQuark home page)
– XML data model, XML type system, XQuery basic constructs
– Major XQuery applications
– Compilation issues
– Data storage and indexing
– Runtime algorithms
– Univ. Michigan, AT&T, Univ. British Columbia
– http://www.eecs.umich.edu/db/timber/