ROLEX Relational On-Line Exchange with XML Phil Bohannon, Sumit - - PowerPoint PPT Presentation
ROLEX Relational On-Line Exchange with XML Phil Bohannon, Sumit - - PowerPoint PPT Presentation
ROLEX Relational On-Line Exchange with XML Phil Bohannon, Sumit Ganguly, Hank Korth, PPS Narayan, Pradeep Shenoy Database Principles Research Department Information Sciences Research Center Bell Labs ROLEX Evolution of Applications
Lucent Proprietary ROLEX 2
ROLEX
Client Application
Evolution of Applications Evolution of Applications
Net DBMS
SQL/ODBC
In the beginning was client- server
ODBC App. Servers JDBC J2EE HTML
Web Servers
Many layers of software later, applications are portable and browser-based…but slow!
cache
So, cache some data, manage consistency in application
XML
B2B
XML Pub/Shred
XML
Application
Newest applications speak XML Supported by XML Publishing through View Queries Even slower, more caching
cache
Lucent Proprietary ROLEX 3
ROLEX
Example: A ROLEX View Query Example: A ROLEX View Query
Relational Schema
Metroarea(metroid, metroname) Hotel(hotelid, hotelname, starrating, chain_id, metro_id, city, pool, gym) Guestroom(r_id, roomnumber, type, rackrate) Confroom(c_id, chotel_id, croomnumber, capacity, rackrate) Availability(a_id, a_r_id, startdate, enddate, price)
Desired Hierarchical Structure
<metroarea> <hotel> <confstat> <guestrooms>
Lucent Proprietary ROLEX 4
ROLEX
$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro
Example: Add “tag queries” Example: Add “tag queries”
<metro> <hotel> <confstat> <guestroom>
Lucent Proprietary ROLEX 5
ROLEX
$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro
Example: “binding variables” Example: “binding variables”
<metro> <hotel> <confstat> <guestroom>
Lucent Proprietary ROLEX 6
ROLEX
Client Application
Our World View Our World View
Net DBMS
SQL/ODBC
In the beginning was client- server
ODBC App. Servers JDBC J2EE HTML
Web Servers
Many layers of software later, applications are portable and browser-based…but slow!
cache
So, cache some data, manage consistency in application
XML
B2B
XML Pub/Shred
XML
Application
Newest applications speak XML Even slower, more caching
cache
Worse, applications need to interoperate for OLTP
Lucent Proprietary ROLEX 7
ROLEX
Current XML Publishing Current XML Publishing
XML
Application
Query Rewrite
Publisher DBMS
SQL Tuple- Stream Tagger Parser Logic Optimize Execute View Query XML Text
DOM Tree
Issues
Inherent inefficiencies DOM Tree = Cache? Updates?
Lucent Proprietary ROLEX 8
ROLEX
Six Steps to ROLEX: Step 1 Six Steps to ROLEX: Step 1
XML
Application
Parser Logic View Query XML Text
DOM Tree
- 1. Combine DBMS and
Publisher (many vendors)
Query Rewrite
“ROLEX” DBMS
SQL Tuple Stream
Tagger
Optimize Execute
Lucent Proprietary ROLEX 9
ROLEX
Six Steps to ROLEX: Step 2 Six Steps to ROLEX: Step 2
XML
Application
Parser Logic View Query XML Text
DOM Tree
- 1. DBMS += Publisher
- 2. Optimize to produce
tree results
Optimize Tree Query
ROLEX DBMS
Tree
Tagger
Execute
Lucent Proprietary ROLEX 10
ROLEX
Six Steps to ROLEX: Step 3 Six Steps to ROLEX: Step 3
- 1. DBMS += Publisher
- 2. Optimize to produce
trees
- 3. Main-Memory DBMS
XML
Application
Parser Logic View Query XML Text
DOM Tree Optimize Tree Query
ROLEX
Tree
Tagger
Execute
DBMS DataBlitz
TM
Lucent Proprietary ROLEX 11
ROLEX
Six Steps to ROLEX: Step 4 Six Steps to ROLEX: Step 4
View Query
4. Share memory, produce DOM tree directly, eliminate parsing
Optimize Tree Query ROLEX
Execute
Application
Logic
DataBlitz
TM
DOM Tree
DBMS
Tree
Lucent Proprietary ROLEX 12
ROLEX
Six Steps to ROLEX: Step 5 Six Steps to ROLEX: Step 5
View Query
4. Share memory, produce DOM tree directly, eliminate parsing 5. Virtual DOM, lazy evaluation
Optimize Tree Query
ROLEX
Execute
Application
Logic
DataBlitz
TM
DOM Tree
Lucent Proprietary ROLEX 13
ROLEX
Six Steps to ROLEX: Step 6 Six Steps to ROLEX: Step 6
6. Optimize for the expected navigation profile
Optimize Tree Query
Execute
Application
Logic
DataBlitz
TM
DOM Tree View Query
ROLEX
Navigation Profile +
Lucent Proprietary ROLEX 14
ROLEX
Considering Expected Use Considering Expected Use
Recent interest in making use of user feedback during long queries MQO takes advantage of co-incident query execution What about consecutive execution? OLTP interaction with DBMS very stylized, yet little work looks at pattern of related queries Notable exception: [Florescu, Levy, Suciu,
Yagoub, 1999]
Lucent Proprietary ROLEX 15
ROLEX
Navigational Profile Navigational Profile
Navigational Profile, an input to query
- ptimization
Simple approach to expected use for a tree- based data model Example Profile:
<metro> <hotel> <confstat> <guestrooms>
0.2 1.0 0.1 Pr{hotel|metro} Pr{confstat|hotel}
Lucent Proprietary ROLEX 16
ROLEX
Remainder of Talk Remainder of Talk
The Virtual DOM
“Document Object Model”
A Decorrelation Plan Space A VOLCANO-Based Optimizer Cost Estimation Performance Conclusion
Lucent Proprietary ROLEX 17
ROLEX
Notation for queries Notation for queries
<confstat>
$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro
<metro> <hotel> <guestroom>
Let the query for a schema-tree node be Qb(s1,..,sk) where s1, .., sk are parameters and b is a binding variable.
Qh(m)
Schema-tree nodes
Lucent Proprietary ROLEX 18
ROLEX
The Virtual DOM The Virtual DOM
A query “sub-plan” and navigation index, NI, for each node
maps s1,…sk to query results or “absent” supports sibling navigation Allow support for DOM
- perations within DB
If s1,..,sk “absent” in NI, execute sub-plan for node.
<metro> <hotel> <guestrooms>
NI
Lucent Proprietary ROLEX 19
ROLEX
Decorrelation Plan-Space Decorrelation Plan-Space
Usual strategy: decorrelate as much as possible to explore alternate join orders ROLEX strategy: may want to leave sub- query correlated to avoid work when navigation probability is low Approach: develop a decorrelation plan- space and let the optimizer decide what to do
Lucent Proprietary ROLEX 20
ROLEX
Decorrelation Decorrelation
$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro
<metro> <hotel>
Qm() Qh(m)
$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel, metro WHERE metro_id = metroid AND starrating > 4 GROUP BY hotelid
Qhm()
Lucent Proprietary ROLEX 21
ROLEX
Decorrelation Plan-Space Decorrelation Plan-Space
<metroarea> <hotel>
Qm() Qh(m)
<hotel_available> <metro_available>
Qa(h) Qv(m,a)
{Qv(m,a), Qv
a(h,m),
Alternatives for Qv(m,a): Qv
a,h,m()}
Qv
a,h(m),
Lucent Proprietary ROLEX 22
ROLEX
Notes Notes
We always replace bottom query rather than top query
Keeps space from being exponential Outer-join rules avoided Cost of greater emphasis on CSE required by optimizer
Lucent Proprietary ROLEX 23
ROLEX
VOLCANO-Based Optimizer VOLCANO-Based Optimizer
VOLCANO [Graefe McKenna 1993]-style
- ptimizer [Roy 2000] modified to support
ROLEX optimization New operators (NavNodes) added to model schema-tree nodes from the query Decorrelation plan-space modeled as distinct but equivalent NavNodes VOLCANO automatically collapses equivalent plan nodes across the tree
Lucent Proprietary ROLEX 24
ROLEX
Cost Model Cost Model
Goal:
estimate expected unique calls for each node given a Navigation Profile
Approach:
Estimate, for each schema-tree node, n, in query plan: …expected # visits: EVis(n) …expected output from one call: ESize(n) …exp. unique parameter bindings: EUniqBind(s1,..,sk)
Unique calls = min(EVis(n),EUniqBind(s1,..,sk))
Lucent Proprietary ROLEX 25
ROLEX
Materialization Materialization
For individual query operators, consider only the binding parameters present in that operator
σ σ
metroid=$m.id hotelid=$h.id
{$m} {$h} {$m,$h} {$m} {$h}
Materialize “opportunistically” as bindings increase:
Compute this sub- expression only when $m changes Generalizes [Rao Ross 98]
If EVis(n)>EUniqBind(s1,..,sk) for a node, then materialize that node’s result
Lucent Proprietary ROLEX 26
ROLEX
Experimental Setup Experimental Setup
Execution engine built on top of DataBlitz™ tuple-layer interface
Handles complex queries including group-by and aggregation
Virtual DOM under construction Prototype for performance testing scans tree result
Obeys navigation probabilities Does a local ‘sprintf’ to model consumption of a node
Lucent Proprietary ROLEX 27
ROLEX
Experimental Results Experimental Results
Vary the size of a simple view query Compare time needed to:
Execute query and traverse results Parse result with Xerces C++ DOM Parse result with … IDOM
First, validate the basic idea:
Lucent Proprietary ROLEX 28
ROLEX
Do navigation probabilities matter? Do navigation probabilities matter?
Existence proof…
Create view v1 as <hotel> ( $h = Select hotelid, name, starrating, state_id From hotel ) <avail> ($a = SELECT rhotel_id, startdate, roomnumber FROM availability, guestroom WHERE type > 5 and rhotel_id = $h.hotelid AND startdate > 12/15/02 and r_id = a_r_id ) </avail>
Optimizer says: P1: 0 to .15 P2: .15 to .25 P3: .25 to 3.0
Lucent Proprietary ROLEX 29
ROLEX
Size of Plan Space Size of Plan Space
Start with a complicated query Vary the probabilities on a subset of the edges so that expected number of output tuples varied exponentially (1, 2, 4, 8, …) While holding the other probabilities constant at 1 (or 0). Compute the number of distinct plans generated
Lucent Proprietary ROLEX 30
ROLEX
Importance of Navigation Profile Importance of Navigation Profile
FROM confroom SELECT SUM(capacity) AND startdate = .startdate WHERE chotel_id = .hotelid AND enddate >= 2/1/01 <confstat> <metro_available> <hotel_available> <hotel> FROM confroom SELECT * AND startdate <= 1/1/01 WHERE chotel_id = .hotelid FROM metroarea SELECT metroid, metroname WHERE rhotel_id = .hotelid FROM availability, guestroom AND a_r_id = r_id <metro> SELECT * FROM hotel <confroom> WHERE metro_id = .metroid AND starrating > 4 SELECT COUNT(a_id) AND enddate >= 2/1/01 GROUP BY startdate SELECT COUNT(a_id) AND a_r_id = r_id FROM availability, hotel, guestroom WHERE rhotel_id = hotelid AND metro_id = .metroid
$h m $m $h $m Q = h c a Q = Q = $h $a Q = s Q = Q =
Binding Var.
$m
Tag
v $c $s $a $v $h $h $h $h $m $a, $m
Parameters
H-S-V H-A-C
Lucent Proprietary ROLEX 31
ROLEX
Number of plans generated Number of plans generated
1 20 30 40 50
h-c-s-a-v h-c-s-a h-c-s-v h-c-a-v h-s-a-v h-c-a h-s-a h-c-s h-c-v h-a-v h-s-v c-s-a-v h-a h-c h-s h-v c-s-v c-a-v c-s-a c-v c-a s-a-v h c-s s-v a-v s-a v c a s
Navigation Profiles Varied Number of Plans
Lucent Proprietary ROLEX 32
ROLEX
Summary: ROLEX Benefits Summary: ROLEX Benefits
Optimize Tree Query
Execute
DataBlitz
TM
DOM Tree View Query
ROLEX
Navigation Profile +
Lazy Evaluation Optimize for expected use
Cost model Materialization
Use DBMS data structures to support navigation Replace application cache Potential to support updates via DOM Incremental support for new XML tools
Lucent Proprietary ROLEX 33
ROLEX
Related Work Related Work
Relational Publishing:
XPERANTO (IBM, Uwisc, Cornell) SilkRoute (AT&T & UWash)
Both compose application query and view into one expression that is optimized Neither supports navigation: full document must be materialized
[Florescu, Levy, Suciu, Yagoub, 1999]
considers navigation probabilities on a web site, performs a variety of heuristic optimizations
[Ludascher, Papakonstantinou, Velikhov, LNCS 2000]
lazy evaluation of DOM tree For slow sources, not
- ptimized for expected
pattern
Lucent Proprietary ROLEX 34