ROLEX Relational On-Line Exchange with XML Phil Bohannon, Sumit - - PowerPoint PPT Presentation

rolex
SMART_READER_LITE
LIVE PREVIEW

ROLEX Relational On-Line Exchange with XML Phil Bohannon, Sumit - - PowerPoint PPT Presentation

ROLEX Relational On-Line Exchange with XML Phil Bohannon, Sumit Ganguly, Hank Korth, PPS Narayan, Pradeep Shenoy Database Principles Research Department Information Sciences Research Center Bell Labs ROLEX Evolution of Applications


slide-1
SLIDE 1

ROLEX

Relational On-Line Exchange with XML

Phil Bohannon, Sumit Ganguly, Hank Korth, PPS Narayan, Pradeep Shenoy Database Principles Research Department Information Sciences Research Center Bell Labs

slide-2
SLIDE 2

Lucent Proprietary ROLEX 2

ROLEX

Client Application

Evolution of Applications Evolution of Applications

Net DBMS

SQL/ODBC

In the beginning was client- server

ODBC App. Servers JDBC J2EE HTML

Web Servers

Many layers of software later, applications are portable and browser-based…but slow!

cache

So, cache some data, manage consistency in application

XML

B2B

XML Pub/Shred

XML

Application

Newest applications speak XML Supported by XML Publishing through View Queries Even slower, more caching

cache

slide-3
SLIDE 3

Lucent Proprietary ROLEX 3

ROLEX

Example: A ROLEX View Query Example: A ROLEX View Query

Relational Schema

Metroarea(metroid, metroname) Hotel(hotelid, hotelname, starrating, chain_id, metro_id, city, pool, gym) Guestroom(r_id, roomnumber, type, rackrate) Confroom(c_id, chotel_id, croomnumber, capacity, rackrate) Availability(a_id, a_r_id, startdate, enddate, price)

Desired Hierarchical Structure

<metroarea> <hotel> <confstat> <guestrooms>

slide-4
SLIDE 4

Lucent Proprietary ROLEX 4

ROLEX

$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro

Example: Add “tag queries” Example: Add “tag queries”

<metro> <hotel> <confstat> <guestroom>

slide-5
SLIDE 5

Lucent Proprietary ROLEX 5

ROLEX

$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro

Example: “binding variables” Example: “binding variables”

<metro> <hotel> <confstat> <guestroom>

slide-6
SLIDE 6

Lucent Proprietary ROLEX 6

ROLEX

Client Application

Our World View Our World View

Net DBMS

SQL/ODBC

In the beginning was client- server

ODBC App. Servers JDBC J2EE HTML

Web Servers

Many layers of software later, applications are portable and browser-based…but slow!

cache

So, cache some data, manage consistency in application

XML

B2B

XML Pub/Shred

XML

Application

Newest applications speak XML Even slower, more caching

cache

Worse, applications need to interoperate for OLTP

slide-7
SLIDE 7

Lucent Proprietary ROLEX 7

ROLEX

Current XML Publishing Current XML Publishing

XML

Application

Query Rewrite

Publisher DBMS

SQL Tuple- Stream Tagger Parser Logic Optimize Execute View Query XML Text

DOM Tree

Issues

Inherent inefficiencies DOM Tree = Cache? Updates?

slide-8
SLIDE 8

Lucent Proprietary ROLEX 8

ROLEX

Six Steps to ROLEX: Step 1 Six Steps to ROLEX: Step 1

XML

Application

Parser Logic View Query XML Text

DOM Tree

  • 1. Combine DBMS and

Publisher (many vendors)

Query Rewrite

“ROLEX” DBMS

SQL Tuple Stream

Tagger

Optimize Execute

slide-9
SLIDE 9

Lucent Proprietary ROLEX 9

ROLEX

Six Steps to ROLEX: Step 2 Six Steps to ROLEX: Step 2

XML

Application

Parser Logic View Query XML Text

DOM Tree

  • 1. DBMS += Publisher
  • 2. Optimize to produce

tree results

Optimize Tree Query

ROLEX DBMS

Tree

Tagger

Execute

slide-10
SLIDE 10

Lucent Proprietary ROLEX 10

ROLEX

Six Steps to ROLEX: Step 3 Six Steps to ROLEX: Step 3

  • 1. DBMS += Publisher
  • 2. Optimize to produce

trees

  • 3. Main-Memory DBMS

XML

Application

Parser Logic View Query XML Text

DOM Tree Optimize Tree Query

ROLEX

Tree

Tagger

Execute

DBMS DataBlitz

TM

slide-11
SLIDE 11

Lucent Proprietary ROLEX 11

ROLEX

Six Steps to ROLEX: Step 4 Six Steps to ROLEX: Step 4

View Query

4. Share memory, produce DOM tree directly, eliminate parsing

Optimize Tree Query ROLEX

Execute

Application

Logic

DataBlitz

TM

DOM Tree

DBMS

Tree

slide-12
SLIDE 12

Lucent Proprietary ROLEX 12

ROLEX

Six Steps to ROLEX: Step 5 Six Steps to ROLEX: Step 5

View Query

4. Share memory, produce DOM tree directly, eliminate parsing 5. Virtual DOM, lazy evaluation

Optimize Tree Query

ROLEX

Execute

Application

Logic

DataBlitz

TM

DOM Tree

slide-13
SLIDE 13

Lucent Proprietary ROLEX 13

ROLEX

Six Steps to ROLEX: Step 6 Six Steps to ROLEX: Step 6

6. Optimize for the expected navigation profile

Optimize Tree Query

Execute

Application

Logic

DataBlitz

TM

DOM Tree View Query

ROLEX

Navigation Profile +

slide-14
SLIDE 14

Lucent Proprietary ROLEX 14

ROLEX

Considering Expected Use Considering Expected Use

Recent interest in making use of user feedback during long queries MQO takes advantage of co-incident query execution What about consecutive execution? OLTP interaction with DBMS very stylized, yet little work looks at pattern of related queries Notable exception: [Florescu, Levy, Suciu,

Yagoub, 1999]

slide-15
SLIDE 15

Lucent Proprietary ROLEX 15

ROLEX

Navigational Profile Navigational Profile

Navigational Profile, an input to query

  • ptimization

Simple approach to expected use for a tree- based data model Example Profile:

<metro> <hotel> <confstat> <guestrooms>

0.2 1.0 0.1 Pr{hotel|metro} Pr{confstat|hotel}

slide-16
SLIDE 16

Lucent Proprietary ROLEX 16

ROLEX

Remainder of Talk Remainder of Talk

The Virtual DOM

“Document Object Model”

A Decorrelation Plan Space A VOLCANO-Based Optimizer Cost Estimation Performance Conclusion

slide-17
SLIDE 17

Lucent Proprietary ROLEX 17

ROLEX

Notation for queries Notation for queries

<confstat>

$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro

<metro> <hotel> <guestroom>

Let the query for a schema-tree node be Qb(s1,..,sk) where s1, .., sk are parameters and b is a binding variable.

Qh(m)

Schema-tree nodes

slide-18
SLIDE 18

Lucent Proprietary ROLEX 18

ROLEX

The Virtual DOM The Virtual DOM

A query “sub-plan” and navigation index, NI, for each node

maps s1,…sk to query results or “absent” supports sibling navigation Allow support for DOM

  • perations within DB

If s1,..,sk “absent” in NI, execute sub-plan for node.

<metro> <hotel> <guestrooms>

NI

slide-19
SLIDE 19

Lucent Proprietary ROLEX 19

ROLEX

Decorrelation Plan-Space Decorrelation Plan-Space

Usual strategy: decorrelate as much as possible to explore alternate join orders ROLEX strategy: may want to leave sub- query correlated to avoid work when navigation probability is low Approach: develop a decorrelation plan- space and let the optimizer decide what to do

slide-20
SLIDE 20

Lucent Proprietary ROLEX 20

ROLEX

Decorrelation Decorrelation

$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel WHERE metro_id = $m.metroid AND starrating > 4 $m = SELECT metroid, metroname FROM metro

<metro> <hotel>

Qm() Qh(m)

$h = SELECT hotelid, hotelname, starrating, state_id FROM hotel, metro WHERE metro_id = metroid AND starrating > 4 GROUP BY hotelid

Qhm()

slide-21
SLIDE 21

Lucent Proprietary ROLEX 21

ROLEX

Decorrelation Plan-Space Decorrelation Plan-Space

<metroarea> <hotel>

Qm() Qh(m)

<hotel_available> <metro_available>

Qa(h) Qv(m,a)

{Qv(m,a), Qv

a(h,m),

Alternatives for Qv(m,a): Qv

a,h,m()}

Qv

a,h(m),

slide-22
SLIDE 22

Lucent Proprietary ROLEX 22

ROLEX

Notes Notes

We always replace bottom query rather than top query

Keeps space from being exponential Outer-join rules avoided Cost of greater emphasis on CSE required by optimizer

slide-23
SLIDE 23

Lucent Proprietary ROLEX 23

ROLEX

VOLCANO-Based Optimizer VOLCANO-Based Optimizer

VOLCANO [Graefe McKenna 1993]-style

  • ptimizer [Roy 2000] modified to support

ROLEX optimization New operators (NavNodes) added to model schema-tree nodes from the query Decorrelation plan-space modeled as distinct but equivalent NavNodes VOLCANO automatically collapses equivalent plan nodes across the tree

slide-24
SLIDE 24

Lucent Proprietary ROLEX 24

ROLEX

Cost Model Cost Model

Goal:

estimate expected unique calls for each node given a Navigation Profile

Approach:

Estimate, for each schema-tree node, n, in query plan: …expected # visits: EVis(n) …expected output from one call: ESize(n) …exp. unique parameter bindings: EUniqBind(s1,..,sk)

Unique calls = min(EVis(n),EUniqBind(s1,..,sk))

slide-25
SLIDE 25

Lucent Proprietary ROLEX 25

ROLEX

Materialization Materialization

For individual query operators, consider only the binding parameters present in that operator

σ σ

metroid=$m.id hotelid=$h.id

{$m} {$h} {$m,$h} {$m} {$h}

Materialize “opportunistically” as bindings increase:

Compute this sub- expression only when $m changes Generalizes [Rao Ross 98]

If EVis(n)>EUniqBind(s1,..,sk) for a node, then materialize that node’s result

slide-26
SLIDE 26

Lucent Proprietary ROLEX 26

ROLEX

Experimental Setup Experimental Setup

Execution engine built on top of DataBlitz™ tuple-layer interface

Handles complex queries including group-by and aggregation

Virtual DOM under construction Prototype for performance testing scans tree result

Obeys navigation probabilities Does a local ‘sprintf’ to model consumption of a node

slide-27
SLIDE 27

Lucent Proprietary ROLEX 27

ROLEX

Experimental Results Experimental Results

Vary the size of a simple view query Compare time needed to:

Execute query and traverse results Parse result with Xerces C++ DOM Parse result with … IDOM

First, validate the basic idea:

slide-28
SLIDE 28

Lucent Proprietary ROLEX 28

ROLEX

Do navigation probabilities matter? Do navigation probabilities matter?

Existence proof…

Create view v1 as <hotel> ( $h = Select hotelid, name, starrating, state_id From hotel ) <avail> ($a = SELECT rhotel_id, startdate, roomnumber FROM availability, guestroom WHERE type > 5 and rhotel_id = $h.hotelid AND startdate > 12/15/02 and r_id = a_r_id ) </avail>

Optimizer says: P1: 0 to .15 P2: .15 to .25 P3: .25 to 3.0

slide-29
SLIDE 29

Lucent Proprietary ROLEX 29

ROLEX

Size of Plan Space Size of Plan Space

Start with a complicated query Vary the probabilities on a subset of the edges so that expected number of output tuples varied exponentially (1, 2, 4, 8, …) While holding the other probabilities constant at 1 (or 0). Compute the number of distinct plans generated

slide-30
SLIDE 30

Lucent Proprietary ROLEX 30

ROLEX

Importance of Navigation Profile Importance of Navigation Profile

FROM confroom SELECT SUM(capacity) AND startdate = .startdate WHERE chotel_id = .hotelid AND enddate >= 2/1/01 <confstat> <metro_available> <hotel_available> <hotel> FROM confroom SELECT * AND startdate <= 1/1/01 WHERE chotel_id = .hotelid FROM metroarea SELECT metroid, metroname WHERE rhotel_id = .hotelid FROM availability, guestroom AND a_r_id = r_id <metro> SELECT * FROM hotel <confroom> WHERE metro_id = .metroid AND starrating > 4 SELECT COUNT(a_id) AND enddate >= 2/1/01 GROUP BY startdate SELECT COUNT(a_id) AND a_r_id = r_id FROM availability, hotel, guestroom WHERE rhotel_id = hotelid AND metro_id = .metroid

$h m $m $h $m Q = h c a Q = Q = $h $a Q = s Q = Q =

Binding Var.

$m

Tag

v $c $s $a $v $h $h $h $h $m $a, $m

Parameters

H-S-V H-A-C

slide-31
SLIDE 31

Lucent Proprietary ROLEX 31

ROLEX

Number of plans generated Number of plans generated

1 20 30 40 50

h-c-s-a-v h-c-s-a h-c-s-v h-c-a-v h-s-a-v h-c-a h-s-a h-c-s h-c-v h-a-v h-s-v c-s-a-v h-a h-c h-s h-v c-s-v c-a-v c-s-a c-v c-a s-a-v h c-s s-v a-v s-a v c a s

Navigation Profiles Varied Number of Plans

slide-32
SLIDE 32

Lucent Proprietary ROLEX 32

ROLEX

Summary: ROLEX Benefits Summary: ROLEX Benefits

Optimize Tree Query

Execute

DataBlitz

TM

DOM Tree View Query

ROLEX

Navigation Profile +

Lazy Evaluation Optimize for expected use

Cost model Materialization

Use DBMS data structures to support navigation Replace application cache Potential to support updates via DOM Incremental support for new XML tools

slide-33
SLIDE 33

Lucent Proprietary ROLEX 33

ROLEX

Related Work Related Work

Relational Publishing:

XPERANTO (IBM, Uwisc, Cornell) SilkRoute (AT&T & UWash)

Both compose application query and view into one expression that is optimized Neither supports navigation: full document must be materialized

[Florescu, Levy, Suciu, Yagoub, 1999]

considers navigation probabilities on a web site, performs a variety of heuristic optimizations

[Ludascher, Papakonstantinou, Velikhov, LNCS 2000]

lazy evaluation of DOM tree For slow sources, not

  • ptimized for expected

pattern

slide-34
SLIDE 34

Lucent Proprietary ROLEX 34

ROLEX

Future Work Future Work

Complete Virtual DOM Updates Key applications -- XSLT Beyond Navigation Profiles

More sophisticated profiles Adaptability

Optimizing large, complex correlated queries