Storing XML Data In a Native Repository Kamil Toman - PowerPoint PPT Presentation

Storing XML Data In a Native Repository Kamil Toman ktoman@ksi.mff.cuni.cz Dept. of Software Engineering Faculty of Mathematics and Physics Charles University

Introduction ● Since 1998, XML has become a very popular standard for electronic interchange and application data ● XML documents don't need a rigid schema, but they still offer a logical structure ● XML data originate from many different sources and are very heterogeneous ● This greater flexibility creates a strong demand for XML databases

XML Querying ● New XML query languages have been proposed – XPath and XQuery ● Both languages use the basic concept of path expressions ● Implementing these languages on top of traditional relational and object-relational database systems is problematic ● Storing XML in object-oriented databases is inefficient ● Native XML databases are being developed

SXQ-DB ● Experimental native XML database that stores and manages collections of XML documents with a common DTD ● The implemented query language is SXQ (Simple XQuery) ● The general, extensible modular architecture is built on the XMLCollection framework

SXQ-DB, Overall Architecture [Figure: User Interface, Query Processing Module, XML Repository, XML Data]

Document Representation ● XML Information Set augmented by the relevant parts of the XQuery Data Model ● An oriented tree in which each node has an associated type and label, and nodes with a common parent are ordered left to right – Text values of elements or attributes are represented as artificial nodes – Mixed-content elements are modeled as trees

Document Representation <text>begin<bf>bold</bf>normal<it>italic</it>end</text> [Figure: tree rooted at text with five ordered children: PCDATA “begin”, bf, PCDATA “normal”, it, PCDATA “end”; bf and it each have a single PCDATA child, “bold” and “italic” respectively]
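The tree model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SXQ-DB implementation: the Node class, its field names, and the build function are hypothetical, and the standard library's xml.etree.ElementTree parser stands in for whatever parser the system uses. The sketch turns every piece of text, including text between child elements of a mixed-content element, into an artificial PCDATA node and keeps children in document order.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a type, a label, and ordered children."""
    type: str                      # "element" or "text"
    label: str                     # element name, or "PCDATA" for text nodes
    value: Optional[str] = None    # text content (text nodes only)
    children: List["Node"] = field(default_factory=list)


def build(elem: ET.Element) -> Node:
    """Convert a parsed element into the tree model, left to right."""
    node = Node("element", elem.tag)
    if elem.text:
        # Text before the first child becomes an artificial PCDATA node.
        node.children.append(Node("text", "PCDATA", elem.text))
    for child in elem:
        node.children.append(build(child))
        if child.tail:
            # Text following a child element (mixed content).
            node.children.append(Node("text", "PCDATA", child.tail))
    return node


root = build(ET.fromstring(
    "<text>begin<bf>bold</bf>normal<it>italic</it>end</text>"))
for position, child in enumerate(root.children, start=1):
    print(position, child.label, child.value)
```

For the example document, the root ends up with five ordered children, and the bf and it elements each hold one PCDATA child.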

Node Identification ● Numbering scheme: a function that assigns a unique binary identifier to each node – This identifier can be used as a reference in an index or during query evaluation – It can also be used during document updates ● Primary: sequential numbering scheme ● Secondary: structural numbering scheme – Allows efficient query evaluation using structural joins

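Node Identification [Figure: example tree with elements contact, name, phone, office, and home and text values “Joe”, “192 837 465”, and “123 234 345”; each node is annotated with a sequential number and a structural identifier triple: (1,100,1), (10,5,2), (20,50,2), (11,0,3), (25,10,3), (40,10,3), (30,0,4), (45,0,4)]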
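A structural identifier lets ancestor/descendant relationships be decided from the identifiers alone, which is what makes structural joins possible. The sketch below assumes that each triple on the slide reads as (order, size, level), so that a node's descendants have an order inside the interval (order, order + size]. The slides do not state this interpretation, so treat it as an assumption; the names StructId, is_ancestor, is_parent, and naive_structural_join are hypothetical, and the join shown is a plain nested loop rather than an optimized algorithm.

```python
from typing import List, NamedTuple, Tuple


class StructId(NamedTuple):
    """Assumed reading of a slide triple: (order, size, level)."""
    order: int
    size: int
    level: int


def is_ancestor(a: StructId, d: StructId) -> bool:
    """True if d's order falls inside the interval reserved for a."""
    return a.order < d.order <= a.order + a.size


def is_parent(a: StructId, d: StructId) -> bool:
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(a, d) and d.level == a.level + 1


def naive_structural_join(
    ancestors: List[StructId], descendants: List[StructId]
) -> List[Tuple[StructId, StructId]]:
    """Nested-loop join: all (ancestor, descendant) pairs, no tree access."""
    return [(a, d) for a in ancestors for d in descendants
            if is_ancestor(a, d)]


# Triples taken from the slide; neutral names are used because the
# pairing of triples with element names is not recoverable from it.
root = StructId(1, 100, 1)
inner_a = StructId(10, 5, 2)
inner_b = StructId(20, 50, 2)
leaf = StructId(11, 0, 3)

pairs = naive_structural_join([inner_a, inner_b], [leaf, StructId(30, 0, 4)])
print(pairs)
```

Under this reading, the size component also leaves gaps between assigned orders, which would let new nodes be inserted without renumbering the whole document.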

XML Repository Architecture [Figure: Common Infrastructure, Value Storage, DTD Storage, Element Storage, Structure Index, Word Index, Value Index]

Physical Access To External Memory ● All XML node identifiers, their types, and the identifiers of adjacent nodes are stored in individual fixed-length records in a binary file ● For efficient access, all records are indexed in a B+-tree ● Better representation of more complex relations between nodes is left to structural indices ● System resources are limited, so a paging mechanism is used
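Fixed-length records make the file offset of a record a simple multiplication, which is the property the storage layer relies on. The following is a minimal sketch, not the actual on-disk format: the record layout (node id, type code, parent, first child, next sibling) is a hypothetical choice, an in-memory io.BytesIO stands in for the binary file, and a plain dict stands in for the B+-tree that maps node identifiers to record slots.

```python
import io
import struct

# Hypothetical record layout, little-endian without padding:
# node id (u32), type code (u8), parent id, first-child id, next-sibling id.
RECORD = struct.Struct("<IBIII")


def write_record(f, slot: int, rec: tuple) -> None:
    """Store one fixed-length record at its slot."""
    f.seek(slot * RECORD.size)       # offset = slot * record length
    f.write(RECORD.pack(*rec))


def read_record(f, slot: int) -> tuple:
    """Fetch the record stored at a slot."""
    f.seek(slot * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


storage = io.BytesIO()               # stand-in for the binary file
index = {}                           # stand-in for the B+-tree: node id -> slot

records = [(1, 0, 0, 2, 0), (2, 1, 1, 0, 3), (3, 1, 1, 0, 0)]
for slot, rec in enumerate(records):
    write_record(storage, slot, rec)
    index[rec[0]] = slot

print(RECORD.size, read_record(storage, index[2]))
```

A paging layer would sit between these functions and the file, reading whole pages of records at a time instead of seeking for each record.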

Object Cache ● XML nodes are accessed frequently, but – The information is mostly short-lived – Every node must first be looked up in an index (possibly unbuffered), and its page has to be computed and fetched ● To avoid this, a secondary object cache is implemented ● All cache objects are kept in main memory at all times and are only reinitialized with new data
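The idea of keeping a fixed set of cache objects alive and reinitializing them with new data can be sketched as an object pool. This is an illustration under assumptions, not the SXQ-DB code: the class names CachedNode and ObjectCache are hypothetical, the loader callback stands in for the index lookup and page fetch, and least-recently-used eviction is chosen here only because the slides do not name a replacement policy.

```python
from collections import OrderedDict


class CachedNode:
    """Reusable cache object; allocated once, refilled many times."""
    __slots__ = ("node_id", "payload")

    def __init__(self):
        self.node_id = None
        self.payload = None

    def reinit(self, node_id, payload):
        self.node_id = node_id
        self.payload = payload


class ObjectCache:
    def __init__(self, capacity, loader):
        # All cache objects are created up front and never released.
        self._free = [CachedNode() for _ in range(capacity)]
        self._live = OrderedDict()   # node id -> object, oldest first
        self._loader = loader        # stand-in for index lookup + page fetch
        self.loads = 0

    def get(self, node_id):
        obj = self._live.get(node_id)
        if obj is not None:
            self._live.move_to_end(node_id)   # mark as recently used
            return obj
        if self._free:
            obj = self._free.pop()
        else:
            # Assumed policy: evict the least recently used entry
            # and reuse its object instead of allocating a new one.
            _, obj = self._live.popitem(last=False)
        self.loads += 1
        obj.reinit(node_id, self._loader(node_id))
        self._live[node_id] = obj
        return obj


cache = ObjectCache(2, lambda node_id: f"node-{node_id}")
first = cache.get(1)
second = cache.get(2)
again = cache.get(1)     # hit: no load, same object
third = cache.get(3)     # miss: reuses the object that held node 2
print(cache.loads, third is second, third.payload)
```

Reusing objects this way avoids repeated allocation for short-lived node data, at the cost that callers must not hold on to a cached object after it may have been recycled.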

Query Processing Module [Figure: processing pipeline from XML Query to Query Result. Boxes, in extraction order: Lexical Analysis, Symbols, Syntactic Analysis, Syntactic Tree, Query Normalization, Canonic Tree, Query Optimization, Plan Generation, Query Plan, Evaluation, Query Result. The XML Repository supplies Document Data Model Information.]

Sources of Difficulties ● Size of indices – Besides common word or value indices, additional indices are needed for structural joins or efficient tree traversals ● Slow updates – Not only the data but also the structure of XML documents may change significantly – Expensive index updates may be needed ● Generality of XML query languages – XQuery is Turing-complete (XPath 1.0 on its own is not)

Other Native XML Databases ● TIMBER – XML tree algebra (TAX) approach – XQuery subset translated to TAX operations ● eXist – Lightweight, can manage only small to medium-sized XML documents – XPath subset + full-text extensions ● dbXML – Uses B-trees, fully updatable – Navigational approach + large indices ● Xindice – XPath fully implemented, navigational approach – XUpdate supported

Conclusion & Future Work ● An efficient XML database is achievable – The chosen data model is sufficient for implementing the most important parts of XQuery – Managing dynamic XML data is much harder than managing static XML documents ● Future work should probably focus on – Finding a more general way to express and evaluate the most common XML queries – Reducing the space needed for the database's structural and term indices

References ● M. Kopecny (2002): Implementacni prostredi pro kolekce XML dat (thesis, in Czech). MFF UK. ● K. Toman (2003): XML data na disku jako databaze (thesis, in Czech). MFF UK. ● J. Cowan, R. Tobin (2001): XML Information Set. http://www.w3.org/TR/xml-infoset ● J. Clark, S. DeRose (1999): XML Path Language (XPath 1.0). http://www.w3.org/TR/xpath ● M. Marchiori (2003): XML Query Specifications. http://www.w3.org/XML/Query#specs ● E. Cohen, H. Kaplan, T. Milo (2002): Labeling Dynamic XML Trees. Symposium on PODS, p. 271-281.
