Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson


Symmetrically Exploiting XML. Shuohao Zhang and Curtis Dyreson, School of E.E. and Computer Science, Washington State University, Pullman, Washington, USA. The 15th International World Wide Web Conference, May 2006, Edinburgh, Scotland.


SLIDE 1

Symmetrically Exploiting XML

Shuohao Zhang and Curtis Dyreson

School of E.E. and Computer Science Washington State University Pullman, Washington, USA

The 15th International World Wide Web Conference May 2006 Edinburgh, Scotland

SLIDE 2

Symmetrically Exploiting XML: Zhang, Dyreson

1970s Database Controversy

  • Hierarchical model vs. relational model
  • Codd: symmetric exploitation of data

A part/project path expression works on some structures, but not all

  • Path expressions are asymmetric
  • Currently, all XML query languages use path expressions

[Figure: alternative tree structures over Part, Project, and Commit]

SLIDE 3

Querying Data with Path Expressions

  • Task

Find books by E. F. Codd

  • XQuery

return doc("author.xml")//author[name= 'E. F. Codd']/book

[Figure: tree for author.xml with node labels author, name, book (×2), title (×2), publisher (×2), and price (×2); values shown are E. F. Codd, DB, 46.95, Automata, 9.99, Addison Wesley, and Academic Press]

SLIDE 4

Same Data, Different Structure

  • Same task

Find books by E. F. Codd

  • Need different XQuery

return doc("book.xml")//book[author/name='E. F. Codd']

[Figure: tree for book.xml with node labels publisher (×2), book (×2), title (×2), author (×2), name (×2), and price (×2); values shown are Addison Wesley, Academic Press, DB, 46.95, Automata, 9.99, E. F. Codd, and Codd. The author.xml tree from the previous slide is repeated alongside it.]
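
The asymmetry on the last two slides can be reproduced with a minimal sketch in Python's standard `xml.etree.ElementTree`. The two documents below are hypothetical reconstructions: the exact nesting, the `bib` wrapper element, and the pairing of titles with prices are assumptions, not taken from the slides. The helpers `author_first` and `book_first` are illustrative names that mirror the two XQuery expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: nesting, the <bib> wrapper, and value pairings
# are assumptions made for this example.
AUTHOR_XML = """<bib>
  <author>
    <name>E. F. Codd</name>
    <book><title>DB</title><price>46.95</price></book>
    <book><title>Automata</title><price>9.99</price></book>
  </author>
</bib>"""

BOOK_XML = """<bib>
  <book><title>DB</title><price>46.95</price>
    <author><name>E. F. Codd</name></author></book>
  <book><title>Automata</title><price>9.99</price>
    <author><name>E. F. Codd</name></author></book>
</bib>"""


def author_first(root, who):
    """Mirrors //author[name=who]/book."""
    return root.findall(f".//author[name='{who}']/book")


def book_first(root, who):
    """Mirrors //book[author/name=who]."""
    return [b for b in root.iter("book") if b.findtext("author/name") == who]


def titles(books):
    return [b.findtext("title") for b in books]


def run_all(who="E. F. Codd"):
    """Run both path-style queries against both document structures."""
    docs = {"author.xml": ET.fromstring(AUTHOR_XML),
            "book.xml": ET.fromstring(BOOK_XML)}
    queries = {"author_first": author_first, "book_first": book_first}
    return {(q, d): titles(fn(root, who))
            for q, fn in queries.items() for d, root in docs.items()}
```

Each query finds both books only in the structure it was written for and returns an empty list on the other, which is the breakage the talk sets out to avoid.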

SLIDE 5

Goal

  • Make the same query work on different structures
  • Useful when there is
      • lack of schema knowledge
      • heterogeneous data
      • irregular data
      • schema evolution
  • Factor out the problem of different label sets; others are working on it

SLIDE 6

Existing Axes are Directional

[Figure: the existing axes around a context node: preceding, following, descendant, ancestor, and self]

SLIDE 7

Proposal: A Non-directional Axis

[Figure: the directional axes preceding, following, descendant, ancestor, and self]

SLIDE 10

The Closest Axis

  • Syntax

closest::

  • ->name is an abbreviation for closest::name
  • Semantics

A function that takes a context node and returns a sequence of closest nodes

SLIDE 11

Closest Axis of the First Title

  • closest::*

Returns a list of five nodes

  • closest::price

Returns the first price node

[Figure: author tree with node labels name, author, book (×2), title (×2), publisher (×2), and price (×2); the context node is the first title]

SLIDE 12

When the First Book Lacks a Price

  • Node selection restricted by minimal type distance

The minimal distance between a title and a price is 2

  • closest::price

Returns an empty list

[Figure: author tree with node labels name, author, book (×2), title (×2), publisher (×2), and a single price]

SLIDE 13

Type Distance is Crucial

  • closest::name for each book?
  • Root-to-node path type

author/name
author/book/publisher/name

[Figure: author tree with node labels name (×2), author, book (×2), title (×2), publisher (×2), and price]
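
The semantics on the last three slides can be approximated with a short sketch. It assumes that a node's type is its root-to-node label path, that type distance is the number of edges between two such paths, and that closest::label keeps only nodes whose tree distance from the context node equals the smallest type distance to any type carrying that label. That last rule is one reading of the slides rather than the authors' formal definition, and every function name here is hypothetical. The sample data pairs titles, publishers, and prices by assumption.

```python
import xml.etree.ElementTree as ET


def parent_map(root):
    """Map every element to its parent (the root maps to None)."""
    parents = {root: None}
    for node in root.iter():
        for child in node:
            parents[child] = node
    return parents


def ancestors_or_self(node, parents):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parents[node]
    return chain


def node_type(node, parents):
    """Root-to-node label path, e.g. ('author', 'book', 'title')."""
    return tuple(n.tag for n in reversed(ancestors_or_self(node, parents)))


def type_distance(t1, t2):
    """Edges between two types, measured through their longest common prefix."""
    shared = 0
    for a, b in zip(t1, t2):
        if a != b:
            break
        shared += 1
    return (len(t1) - shared) + (len(t2) - shared)


def node_distance(x, y, parents):
    """Edges between two nodes, measured through their lowest common ancestor."""
    up_x = ancestors_or_self(x, parents)
    for steps_y, node in enumerate(ancestors_or_self(y, parents)):
        for steps_x, other in enumerate(up_x):
            if node is other:
                return steps_x + steps_y
    raise ValueError("nodes are in different trees")


def closest(context, label, root):
    """Sketch of closest::label under the assumptions stated above."""
    parents = parent_map(root)
    candidates = list(root.iter(label))
    if not candidates:
        return []
    ctx_type = node_type(context, parents)
    minimal = min(type_distance(ctx_type, node_type(c, parents))
                  for c in candidates)
    return [c for c in candidates
            if node_distance(context, c, parents) == minimal]


# Hypothetical data; value pairings are assumptions.
FULL = ET.fromstring(
    "<author><name>E. F. Codd</name>"
    "<book><title>DB</title><publisher>Addison Wesley</publisher>"
    "<price>46.95</price></book>"
    "<book><title>Automata</title><publisher>Academic Press</publisher>"
    "<price>9.99</price></book></author>")

NO_FIRST_PRICE = ET.fromstring(
    "<author><name>E. F. Codd</name>"
    "<book><title>DB</title><publisher>Addison Wesley</publisher></book>"
    "<book><title>Automata</title><publisher>Academic Press</publisher>"
    "<price>9.99</price></book></author>")


def prices_near_first_title(doc):
    return [p.text for p in closest(doc.find("book/title"), "price", doc)]
```

With both prices present, the first title's closest price is the one in its own book. When the first book has no price, the only remaining price is at distance 4, which exceeds the minimal type distance of 2, so the result is empty rather than a price from the wrong book.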

SLIDE 14

Querying with the Closest Axes

Same query:

return doc("any.xml")->author[->name='E. F. Codd']->book

[Figure: one query goes through a closest axis-enabled XQuery evaluation engine and yields Result#1, Result#2, and Result#3]
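
To illustrate the claim that one query serves several structures, here is a compact sketch that evaluates the three closest steps of the query above against two hypothetical documents. It is not the authors' engine: `closest` implements the same assumed reading of the semantics (tree distance must equal the minimal type distance over root-to-node label paths), and the document contents, the `bib` wrapper, and the function names are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents with the same data in two structures.
AUTHOR_XML = """<bib><author><name>E. F. Codd</name>
  <book><title>DB</title></book>
  <book><title>Automata</title></book></author></bib>"""

BOOK_XML = """<bib>
  <book><title>DB</title><author><name>E. F. Codd</name></author></book>
  <book><title>Automata</title><author><name>E. F. Codd</name></author></book>
</bib>"""


def chain(node, parent):
    """Return [node, parent, ..., root]."""
    out = []
    while node is not None:
        out.append(node)
        node = parent[node]
    return out


def closest(context, label, root, parent):
    """Nodes with this label whose distance equals the minimal type distance."""
    up = chain(context, parent)
    ctx_type = [n.tag for n in reversed(up)]
    scored = []
    for cand in root.iter(label):
        cand_up = chain(cand, parent)
        cand_type = [n.tag for n in reversed(cand_up)]
        shared = 0
        for a, b in zip(ctx_type, cand_type):
            if a != b:
                break
            shared += 1
        type_dist = len(ctx_type) + len(cand_type) - 2 * shared
        lca = next(n for n in cand_up if n in up)  # lowest common ancestor
        node_dist = up.index(lca) + cand_up.index(lca)
        scored.append((type_dist, node_dist, cand))
    if not scored:
        return []
    minimal = min(t for t, _, _ in scored)
    return [c for _, n, c in scored if n == minimal]


def books_by(xml_text, author_name):
    """Evaluate ->author[->name=author_name]->book and return the titles."""
    root = ET.fromstring(xml_text)
    parent = {root: None}
    for node in root.iter():
        for child in node:
            parent[child] = node
    found = []
    for author in closest(root, "author", root, parent):
        names = closest(author, "name", root, parent)
        if any(n.text == author_name for n in names):
            for book in closest(author, "book", root, parent):
                if book not in found:
                    found.append(book)
    return [b.findtext("title") for b in found]
```

In this sketch the step from an author to a book resolves to a child in the first document and to a parent in the second, without the query text changing.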

SLIDE 15

Querying with Directional Axes

Query#1 -- return doc("author.xml")//author[name='E. F. Codd']/book
Query#2 -- ……
Query#3 -- return doc("book.xml")//book[author/name='E. F. Codd']

[Figure: each query goes through an XQuery evaluation engine, yielding Result#1, Result#2, and Result#3 respectively]

SLIDE 16

In-memory Implementation

  • Naïve approach

Compute Closest for every node
Time complexity is O(sn²)
s: number of labels in the signature
n: number of nodes

  • Converting to a path expression

Find the closest price for title
Non-directional expression: closest::price
Directional (path) expression: parent::*/child::price

[Figure: tree over the labels name, author, book, title, publisher, and price]
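
The conversion step can be sketched as a small function over types. It assumes that types are root-to-node label paths and that the context type and target type are already known; `closest_to_path` is a hypothetical helper, not part of the authors' implementation. It climbs from the context type to the longest shared prefix with `parent::*` steps and then descends with `child::` steps.

```python
def closest_to_path(context_type, target_type):
    """Translate a closest step between two types into a directional path.

    Both arguments are root-to-node label paths given as tuples.
    """
    shared = 0
    for a, b in zip(context_type, target_type):
        if a != b:
            break
        shared += 1
    if shared == 0:
        raise ValueError("types do not share a root label")
    up_steps = ["parent::*"] * (len(context_type) - shared)
    down_steps = [f"child::{label}" for label in target_type[shared:]]
    steps = up_steps + down_steps
    # An identical context and target type needs no movement at all.
    return "/".join(steps) if steps else "self::node()"
```

For the slide's example, `closest_to_path(("author", "book", "title"), ("author", "book", "price"))` yields `parent::*/child::price`.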

SLIDE 17

Experiment

  • Compare directional vs. non-directional

for $b in doc("bib.xml")//title/closest::publisher return $b
for $b in doc("bib.xml")//title/..//publisher return $b

  • Implemented closest in eXist (an XML DBMS)

[Figure: query time in milliseconds versus number of nodes, for the descendant and closest queries]

SLIDE 18

Persistent Implementation

  • Take advantage of type indexes
  • LCA-join

Every Closest pair is related via an LCA
Idea is to merge lists of types
O(sn)

[Figure: merging type lists around the current LCA, showing the current parent, the current child, and the direction of merge]
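
One way to picture merging two type lists is a sort-merge join keyed on the ancestor at the depth where the two types diverge. The sketch below is an illustration under stated assumptions, not the authors' LCA-join: nodes are identified by Dewey-style tuples (an assumption), each type index is a list of such identifiers in document order, and `lca_join` is a hypothetical name.

```python
def lca_join(left, right, prefix_len):
    """Pair nodes from two type lists that share the same LCA.

    left, right: Dewey identifiers (tuples) in document order, one list per type.
    prefix_len: number of leading components that identify the shared ancestor.
    """
    pairs = []
    i = j = 0
    while i < len(left) and j < len(right):
        key_l, key_r = left[i][:prefix_len], right[j][:prefix_len]
        if key_l < key_r:
            i += 1
        elif key_l > key_r:
            j += 1
        else:
            # Gather the run of nodes under this ancestor in each list.
            i_end = i
            while i_end < len(left) and left[i_end][:prefix_len] == key_l:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][:prefix_len] == key_l:
                j_end += 1
            pairs.extend((x, y) for x in left[i:i_end] for y in right[j:j_end])
            i, j = i_end, j_end
    return pairs


# Hypothetical identifiers: root (0,), books (0, 1) and (0, 2).
TITLES = [(0, 1, 0), (0, 2, 0)]
PRICES = [(0, 1, 2), (0, 2, 2)]
```

Each list is scanned once, so apart from the size of the output the join is linear in the two list lengths; a title whose book has no price simply finds no partner.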

SLIDE 19

Related Work

  • Data integration
      • TSIMMIS
          • Garcia-Molina et al. (Journal of Intelligent Information Systems 1997)
      • YAT
          • Christophides, Cluet, Siméon (SIGMOD Record June 2000)
      • SilkRoute
          • Fernandez, Tan, Suciu (WWW 2000)
  • LCA-related techniques
      • Schmidt, Kersten, Windhouwer (ICDE 2001)
      • Cohen, Mamou, Kanza, Sagiv (VLDB 2003)
      • Li, Yu, Jagadish (VLDB 2004)

SLIDE 20

Related Research Projects

  • XML Restructuring

Zhang, Dyreson (IIWeb 2006)

  • XML Compaction

Zhang, Dyreson, Dang (DASFAA 2006)

  • Common theme – symmetric exploitation!

SLIDE 21

Conclusion

  • Current XQuery depends on path expressions
  • A path expression is directional (asymmetric)

May break down if structure changes

  • The closest axis is non-directional (symmetric)

Simple in syntax

Can be easily integrated in XQuery

Can be implemented efficiently, both in-memory and persistent

SLIDE 22

Thank You!