SLIDE 1

OrientX: an Integrated, Schema-Based Native XML Database System

Meng Xiaofeng, Wang Xiaofeng, Xie Min, Zhang Xin, Zhou Junfeng
School of Information, Renmin University of China
WISA2006

SLIDE 2

Introduction

  • OrientX means: Original RUC IDKE Native XML Database
    – RUC: Renmin University of China
    – IDKE: Institute of Data and Knowledge Engineering
    – Native XML database: exposes a logical model for storing and retrieving XML documents (a non-native XML database is, for example, one built on a relational database)

SLIDE 3

Outline

  • Architecture and Features
  • Storage and data management
  • Indexing Schema
  • Query processing
  • Conclusion and Future Work

SLIDE 4

Architecture

SLIDE 5

Features

  • Full support for XML Schema
  • Support for the XQuery 1.0 and XPath 2.0 Data Model
  • Various native storage techniques
  • Path index and value index
  • Multiple query processing strategies based on native storage

SLIDE 6

Outline

  • Architecture and Features
  • Storage and data management
  • Indexing Schema
  • Query processing
  • Conclusion and Future Work

SLIDE 7

Different storage granularities

  • Document:
    – Does not decompose the document; builds an index over it to navigate its structure.
    – Query complexity and efficiency are limited by the power of the index.
  • Subtree:
    – Decomposes the document into subtrees according to the storage space partition.
    – Persists the structure within each subtree.
    – Saves space.
  • Node:
    – Decomposes the document into a sequence of nodes, each corresponding to a type (element, attribute, …).
    – May need too many links to persist the relationships between nodes.

SLIDE 8

Storage Techniques in OrientX

                 Document-based   SubTree-based   Element-based
Clustered                         CSB             CEB
Breadth-first                     BSB             BEB
Depth-first      DB               DSB             DEB

Implemented techniques are marked in red

  • DEB: one node is a record; records follow a preorder traversal of the tree.
  • DSB: like DEB, but each record is a subtree. The size of a subtree is close to the physical page size.
  • CEB: one element is a record, but all nodes with the same tag name are stored clustered together.
  • CSB: akin to DSB, each record is a subtree, but all subtrees with the same structure are stored clustered together.
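
The difference between the two element-based layouts can be illustrated with a minimal Python sketch. The `Node` class, the function names, and the small example tree are assumptions made for illustration; they are not OrientX's actual data structures. Depth-first layout emits one record per node in preorder, while the clustered layout groups records by tag name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory element: a unique label, a tag name, children."""
    name: str
    tag: str
    children: list = field(default_factory=list)


def preorder(root):
    """Yield nodes in document (preorder) order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))


def deb_layout(root):
    """Depth-first element-based: one record per node, in preorder."""
    return [node.name for node in preorder(root)]


def ceb_layout(root):
    """Clustered element-based: records with the same tag are stored together."""
    clusters = {}
    for node in preorder(root):
        clusters.setdefault(node.tag, []).append(node.name)
    return clusters


# Assumed example tree (not the tree drawn on the slides).
doc = Node("r", "r", [
    Node("a1", "a", [Node("t1", "t"), Node("l1", "l"), Node("f1", "f")]),
    Node("a2", "a", [Node("l2", "l"), Node("f2", "f")]),
])

print(deb_layout(doc))
print(ceb_layout(doc))
```

In this sketch a scan for every `a` element touches one cluster under the clustered layout, whereas the depth-first layout keeps each element next to its descendants.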

SLIDE 9

Example: Element-based

[Figure: a source document tree with root r and nodes a1, a2, t1, l1, l2, f1, f2, shown next to its DEB and CEB record layouts.]

SLIDE 10

Example: Subtree-based

[Figure: the source document (DOC) tree with root r and nodes a1, a2, t1, l1, l2, f1, f2, shown next to its DSB (depth-first subtree-based) and CSB (clustered subtree-based) layouts. Figure annotations: "Proxy node (virtual node)" and "Also have proxy node".]
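
One way to picture proxy nodes is a naive greedy partitioner, sketched below in Python. Everything here is an assumption for illustration: the `Node` class, the `capacity` parameter (standing in for the physical page size), and the greedy cut rule are not taken from OrientX. Each record is filled in depth-first order; once it is full, every remaining child starts a new record and is replaced in the parent record by a proxy entry pointing to it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory element: a unique label plus children."""
    name: str
    children: list = field(default_factory=list)


def dsb_partition(root, capacity):
    """Split a tree into subtree records of at most `capacity` real nodes.

    records[i] lists entries in preorder. An entry is a node name, or a
    ("proxy", j) tuple standing in for the subtree stored in records[j].
    Proxy entries are assumed not to count against the capacity.
    """
    records = []

    def real_nodes(record):
        return sum(1 for entry in record if isinstance(entry, str))

    def new_record(subroot):
        record = []
        records.append(record)
        record_id = len(records) - 1
        fill(subroot, record)
        return record_id

    def fill(node, record):
        record.append(node.name)
        for child in node.children:
            if real_nodes(record) < capacity:
                fill(child, record)            # still room: keep in this record
            else:
                record.append(("proxy", new_record(child)))  # cut the subtree off

    new_record(root)
    return records


# Assumed example tree (not the tree drawn on the slides).
doc = Node("r", [
    Node("a1", [Node("t1"), Node("l1"), Node("f1")]),
    Node("a2", [Node("l2"), Node("f2")]),
])

records = dsb_partition(doc, capacity=3)
for record_id, record in enumerate(records):
    print(record_id, record)
```

A real partitioner would presumably pack small sibling subtrees together so that each record comes close to a full page; this sketch only shows how a proxy entry links a parent record to a child record.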

SLIDE 11

Outline

  • Architecture and Features
  • Storage and data management
  • Indexing Schema
  • Query processing
  • Conclusion and Future Work

SLIDE 12

Path index SUPEX: Index Architecture

SLIDE 13

Features of SUPEX

  • Constructed from the DTD or XML Schema
  • Integrates the path index with value indexes
  • Supports twig queries efficiently
  • Supports label path expressions (bib//author)
  • Supports the evaluation of value-based condition predicates (//author[firstname = “jone”])
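
The idea of pairing a path index with a value index can be sketched in Python as follows. This is not SUPEX's structure: the dictionaries, function names, node numbering, and sample document are all assumptions, and the index here is built from the data rather than from a DTD or schema. It only shows how the two example expressions above could be answered from index lookups instead of a document scan.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed sample document for illustration.
XML = """
<bib>
  <book>
    <author><firstname>jone</firstname><lastname>smith</lastname></author>
    <author><firstname>mary</firstname><lastname>lee</lastname></author>
  </book>
  <article>
    <author><firstname>jone</firstname><lastname>doe</lastname></author>
  </article>
</bib>
""".strip()


def build_indexes(root):
    path_index = defaultdict(list)   # label path -> node ids
    value_index = defaultdict(list)  # (label path, text) -> node ids
    parent = {}                      # node id -> parent node id
    next_id = 0

    def visit(elem, path, parent_id):
        nonlocal next_id
        node_id = next_id            # ids are assigned in preorder
        next_id += 1
        path = path + (elem.tag,)
        path_index[path].append(node_id)
        parent[node_id] = parent_id
        text = (elem.text or "").strip()
        if text and len(elem) == 0:  # index text of leaf elements only
            value_index[(path, text)].append(node_id)
        for child in elem:
            visit(child, path, node_id)

    visit(root, (), None)
    return path_index, value_index, parent


def label_path_query(path_index, ancestor, label):
    """Evaluate ancestor//label by matching label paths, not data nodes."""
    hits = []
    for path, ids in path_index.items():
        if path[-1] == label and ancestor in path[:-1]:
            hits.extend(ids)
    return sorted(hits)


def authors_with_firstname(value_index, parent, value):
    """Evaluate //author[firstname = value] through the value index."""
    hits = []
    for (path, text), ids in value_index.items():
        if text == value and path[-2:] == ("author", "firstname"):
            hits.extend(parent[node_id] for node_id in ids)
    return sorted(hits)


path_index, value_index, parent = build_indexes(ET.fromstring(XML))
print(label_path_query(path_index, "bib", "author"))
print(authors_with_firstname(value_index, parent, "jone"))
```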

SLIDE 14

Outline

  • Architecture and Features
  • Storage and data management
  • Indexing Schema
  • Query processing
  • Conclusion and Future Work

SLIDE 15

Query processing

  • Navigation strategy
    – Supports XPath 2.0 and XQuery 1.0.
    – Combines consecutive steps of one XPath into a single path.
    – Rewrites the syntax tree into a reduced execution plan.
    – Introduces a pipeline operator into XQuery processing.
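
Pipelined evaluation can be sketched with Python generators: each operator pulls items from its input one at a time, so no intermediate result list is materialized. The `step` and `where` functions and the sample document are assumptions for illustration and do not reproduce OrientX's operators.

```python
import xml.etree.ElementTree as ET

# Assumed sample document for illustration.
XML = (
    "<bib>"
    "<book><author><firstname>jone</firstname><lastname>smith</lastname></author>"
    "<author><firstname>mary</firstname><lastname>lee</lastname></author></book>"
    "<book><author><firstname>jone</firstname><lastname>doe</lastname></author></book>"
    "</bib>"
)


def step(inputs, tag):
    """Child step: for each input element, yield its children named `tag`."""
    for elem in inputs:
        for child in elem:
            if child.tag == tag:
                yield child


def where(inputs, predicate):
    """Condition operator: pass through only elements satisfying `predicate`."""
    for elem in inputs:
        if predicate(elem):
            yield elem


root = ET.fromstring(XML)

# /bib/book/author[firstname = "jone"]/lastname, built as a lazy pipeline.
authors = step(step([root], "book"), "author")
jones = where(authors, lambda a: a.findtext("firstname") == "jone")
lastnames = [name.text for name in step(jones, "lastname")]
print(lastnames)
```

Because every stage is a generator, asking for only the first result would stop the upstream operators after the first matching author.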

SLIDE 16

Operators in Navigation

Currently, Navigation contains 13 operators:

1. Step
2. CondTreeNode
3. Path
4. ForVarBind
5. LetVarBind
6. FLWR
7. EleConstructor
8. AttrConstructor
9. BuiltInFun
10. IfThenElse
11. Quanlify
12. SetOpt
13. SortBy

SLIDE 17

General Steps to process XQuery

XQuery query → Parser and Translator → initial query plan → Optimizer → optimized query plan → Evaluator Engine
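
The three stages can be illustrated with a toy Python pipeline for simple child-only paths such as `/bib/book/title`. The plan representation, the function names, and the single rewrite rule (merging consecutive steps into one path operator) are assumptions for illustration, not OrientX's implementation.

```python
import xml.etree.ElementTree as ET

# Assumed sample document for illustration.
XML = "<bib><book><title>XML</title></book><book><title>Databases</title></book></bib>"


def parse(query):
    """Parser and translator: '/a/b/c' becomes one step operator per name."""
    return [("step", name) for name in query.strip("/").split("/")]


def optimize(plan):
    """Optimizer: merge runs of consecutive steps into a single path operator."""
    optimized = []
    for op, arg in plan:
        if op == "step" and optimized and optimized[-1][0] == "path":
            optimized[-1] = ("path", optimized[-1][1] + (arg,))
        elif op == "step":
            optimized.append(("path", (arg,)))
        else:
            optimized.append((op, arg))
    return optimized


def evaluate(plan, root):
    """Evaluator: apply each operator to the current node list."""
    document = ET.Element("#document")  # synthetic parent so '/bib' can match the root
    document.append(root)
    current = [document]
    for op, arg in plan:
        names = (arg,) if op == "step" else arg
        for name in names:
            current = [child for elem in current for child in elem if child.tag == name]
    return current


initial_plan = parse("/bib/book/title")
optimized_plan = optimize(initial_plan)
print(initial_plan)
print(optimized_plan)
print([elem.text for elem in evaluate(optimized_plan, ET.fromstring(XML))])
```

Both plans return the same nodes here; the point of the reduced plan is that a single path operator has fewer operators to schedule and could, in principle, be answered from a path index.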

SLIDE 18

The query plan

SLIDE 19

Outline

  • Architecture and Features
  • Storage and data management
  • Indexing Schema
  • Query processing
  • Conclusion and Future Work

SLIDE 20

Conclusion and Future Work

  • Conclusion:
    – OrientX is an integrated, schema-based native XML database system.
    – It implements storage and querying of XML data.
  • Future work:
    – XQuery optimization.
    – XML update and other XQuery processing engines.

SLIDE 21

Thanks

Q&A☺

Welcome to our website http://idke.ruc.edu.cn to obtain more information about OrientX