9/17/2010 Todays lecture Advanced databases and data models: - PDF document

9/17/2010 Today´s lecture Advanced databases and data models: Native XML management Shredding Hybrid solutions Theme3: Efficient storage of XML SQL/XML HShreX HShreX Efficency Lena Strömbäck June 17, 2009 1 XML as a data model Storage possibilities for XML XML is richer than the relational model Tree structure, Order Storage Loading Querying design … XML XML XML XQuery Que y Vary from highly structured to unstructured Vary from highly structured to unstructured schema schema Docs Docs results results Database export … Native XML Annotated text documents DBMS Can contain links to other type of entities What does this mean for efficient storage? 1

9/17/2010 Native XML databases Storage possibilities for XML • Defines a (logical) model for an XML document Storage Loading Querying design • Elements, attributes, PCDATA, document order. XML XML XML XQuery schema Docs results • The XML document is the logical unit of storage Storage Loading Querying design • Can have any physical underlying storage model Mapping layer XML XML XML • Often: grouping documents, collections Often: grouping documents, collections XQuery XQuery schema schema Docs Docs results results Tuples • Query model: Xpath and in most cases Xquery Relational SQL Relational schema results Native XML Relational • Examples: DBMS DBMS • eXist http://exist-db.org/ • MarkLogic http://www.marklogic.com Shredding How to shred XML? Related work <?xml version="1.0" encoding="UTF-8"?> <families Families xmlns:xsi=http://www.w3.org/2001/XMLSchem Id Pid a-instance> Shredding of XML into relational storage <family> 0 - <parent> Family <name>Lena</name> <job>Lektor</job> Bohannon et al. 2002, (ICDE), Id Pid </parent> Florescu and Cossman 1999 (IEEE Data Eng), <child> 1 0 <name>Ludvig</name> Georgiadis and Vassalos 2007 (SIGMOD), g ( ), <school>Skolan</school> Parent Parent </child> Id Pid Name Job </family> </families> 2 1 Lena Lektor Grust et al. (2007) (SIMOD), Child Source Ordinal attrName isValue Value Mlynkova 2009 (DEXA) Id Pid Name School 0 1 Families False 1 3 1 Ludvig Skolan 1 1 Family False 2 2 1 Parent False 3 3 1 Name True Lena 3 2 Job True Docent .. 2

9/17/2010 Hybrid XML Storage New possibilities… <model id="Tyson1991CellModel_6" name="Tyson1991_CellCycle_6var"> <listOfSpecies> Species: + <species id="C2" name="cdc2k" compartment="cell"> + <species id="M" name="p-cyclin_cdc2" compartment="cell"> XML data Id Name Compartment + <species id="YP" name="p-cyclin" compartment="cell"> … XQuery C2 cdc2k cell </listOfSpecies> M p-cyclin_cdc2 cell XML results <listOfReactions> YP p-cyclin cell <reaction id="Reaction1" name="cyclin_cdc2k dissociation"> …. …. …. <annotation> <rdf:li rdf:resource="http://www.reactome.org/#REACT_6308"/> Reaction: XML data <rdf:li rdf:resource="http://www.geneontology.org/#GO:0000079"/> Id Name Annotation Formula </annotation> SQL/XQuery Mapping app g Reaction1 cyclin_cdc2k <annotation <kinetic_law …..> <listOfReactants> dissociation …..> XML /R l ti XML /Relational l <speciesReference species="M"/> f / Reaction2 cdc2k <annotation <kinetic_law ….> layer </listOfReactants> phosphorylation ….> results …. …. …. …. <listOfProducts> Reactants: Native <speciesReference species="C2"/> <speciesReference species="YP"/> Id Species </listOfProducts> Hybrid Reaction1 M <kineticLaw> Reaction2 …. <math xmlns="http://www.w3.org/1998/Math/MathML"> …. …. <apply> <times/> <ci> k6 </ci> <ci> M </ci> </apply></math> Products: <listOfParameters> <parameter id="k6" value="1“> </listOfParameters> Id Species </kineticLaw> Reaction1 C2 Hybrid Reaction1 YP </reaction> …. …. + <reaction id="Reaction2" name="cdc2k phosphorylation"> DBMS ... more reactions </listOfReactions> </model> </sbml> Hybrid storage and SQL/XML XML type CREATE TABLE: Hybrid XML storage Beyer et al. create table sbml_data (sbml_doc XML); Hua Liu et al. Rys et al. (2005) (SIGMOD) create table reaction ( id varchar(100) not null, Proposed as SQL standard name varchar(250), math XML, Available in: DB2, Microsoft SQL Server, Oracle annotation XML, primary key(id) More limited versions in other databases…. ) 3

9/17/2010 Querying Querying select select reactome_doc.query( sbml_doc.query('/sbml/model/listOfSpecies/species[@id 'for $react in /model/listOfReactions/reaction return <path> = "C2"]') <from> {data($react/listOfReactants/speciesReference/@species)} from sbml_data; </from> <via> {data($react/@id)} </via> { ($ @ )} <species id="C2" name="cdc2k" compartment="cell"> <to> {data($react/listOfProducts/speciesReference/@species)} </to> <annotation> …</annotation> </path>') from reactome_data; </species> <path><from>M</from><via>Reaction1</via><to>C2</to></path> <path><from>M</from><via>Reaction1</via><to>YP</to></path> … Transforming data to relations Transforming data to XML insert into reaction(id, name, annotation, formula) select reaction.reaction as name, reaction.id, (select select r.value('@id','varchar(100)') as id, (select speciesReference.reactant as species from reactants r.value('@name','varchar(250)') as name, speciesReference where reaction.id = speciesReference.id r.value('annotation','xml') as annotation, for XML auto, type) r value('kinetic formula' 'xml') as formula r.value( kinetic formula , xml ) as formula, from emptyXML as listOfReactants for XML auto, type), from (select (select (select speciesReference.product as species from products sbml_doc speciesReference where reaction.id = speciesReference.id .nodes(sbml/model/listOfReactions/reaction‘) for XML auto, type) from sbml_data) as d(r); from emptyXML as listOfProducts for XML auto, type) , from reaction for XML auto, type 4

9/17/2010 Syntax differs between DBMS´s (DB2) Efficiency: Increasing query complexity select sbml_doc.query( 'for $y in /model/listOfReactions/reaction/listOfModifiers/modifierSpeciesReference, $z in /model/listOfSpecies/species[@id = $y/@species] 4000 return <modifier> {$y/@species} {$y/../../@id} {$z/@compartment} </modifier>') from reactome_data; Native xquery Designed shredding 2000 for Automatic shredding $y in db2-fn:xmlcolumn('REACTOME_DATA.REACTOME_DOC')/model/ listOfReactions/reaction /listOfModifiers/modifierSpeciesReference, $z in db2-fn:xmlcolumn('REACTOME_DATA.REACTOME_DOC')/model/ listOfSpecies/species[@id = $y/@species] 0 return <modifier> {$y/@species} {$y/../../@id} {$z/@compartment} </modifier> Species Path Path (2 step) Path (3 step) Path (4 step) Efficiency: Efficiency: Combining representations Return the result as XML 500 400 4000 XQuery 300 XQuery (rdbms 1) Que y ( db s ) SQL (designed) SQL (rdbms1) 200 SQL (automatic) 2000 Mixed 100 Species Species Species Reaction Reaction Reaction 0 (1) (100) (1000) (1) (500) (10000) 3.4 5

9/17/2010 Higher complexity What are we doing? Numerous alternatives for XML storage Efficiency studies How to achieve models that are efficient and easy to use Dependent on application, XML data and query load Guidelines Tool development Guidelines: HShreX – a tool for evaluation Keep together what naturally belong together Queries XML data Do not shred parts of the XML where the schema allows large variation Take variations of the actual data into account XML Relational Shred elements that are critical for performance p Schema schema DBMS + + Prefer the representation that is required for query Annotations Shredding rules results Avoid shredding where future versions of the schema is likely to change. Avoid shredding if parallel versions of data is to be kept Extension of an old tool Shrex to allow hybrid storage 6

9/17/2010 Todays lecture Advanced databases and data models: - PDF document

9/17/2010 Todays lecture Advanced databases and data models: Native XML management Shredding Hybrid solutions Theme3: Efficient storage of XML SQL/XML HShreX HShreX Efficency Lena Strmbck June 17, 2009 1 XML as a data model

Eye and Brain Eye and Brain Central visual pathways 1 2/22/2010 2 2/22/2010 3 2/22/2010 4

I.M. Skaugen SE 3Q 2010 presentation IMS Innovative Maritime Solutions 15 October 2010 1

Financial Results for 4/2010- -9/2010 9/2010 Financial Results for 4/2010 and and Financial

2010 Interim Results 2010 Interim Results 12 August 2010 2010 Interim Results 2010 Interim

Amazing Android How We Built Square Friday, May 14, 2010 Friday, May 14, 2010 Friday, May 14,

2010 half-year results 2010 half-year results 26 July 2010 26 July 2010 26 July 2010 26 July

1 2 Monday, October 25, 2010 3 4 Monday, October 25, 2010 5 6 Monday, October 25, 2010 7

From Dev To Production Sam Newman QCon London 2010 Wednesday, 10 March 2010 Wednesday, 10 March

Fourth Quarter 2010 Results Fourth Quarter 2010 Results Fourth Quarter 2010 Results Fourth

CO CONF NFERE RENCE NCE CA CALL Q3 3 2010 2010 Rev Reven enue Q3 2010 Highlights Fas

2010 2010 LEVY HEARING LEVY HEARING December 14, 2010 Meeting of the Board NORTH SHORE SCHOOL

Results presentation for the year ended 30 June 2010 8 September 2010 1 1 De Vessey Village,

2010 Census Operational Press Briefing April 28, 2010 2000 and 2010 Average Mailback Participation

2010 Full Year Result 2010 Full Year Result 23 February 2011 2010 Full Year Result 2010 Full

AFRICACRYPT 2010 Call for Papers AFRICACRYPT 2010 Call for Papers Africacrypt 2010 You are

2010 2010 .............................................. London 19 May 2010 London, 19 May 2010

Data Mining for Knowledge Management Mining Data Streams Themis Palpanas University of Trento

Modern Virtual Machine Performance murphee (Werner Schuster)

Constraint Handling Rules - Basic CHR programs and their analysis Prof. Dr. Thom Fr uhwirth |

You Only Live Multiple Times Black box re-use of Crash-Stop Algorithms In Realistic Crash-Recovery

S EMANTIC S OLUTIONS FOR O IL & G AS : R OLES AND R ESPONSIBILITIES

Falling oil prices and the impact on housing markets September 8, 2016 EMAD Global oil market

"Crisis in the Oil Market: Remembrances of the Past, Policy Responses for the Future"

Resilient Data Collection in Oil and Gas Refinery Sensor Networks Klara Nahrstedt (Presenter)

9/17/2010 Todays lecture Advanced databases and data models: - PDF document

9/17/2010 Todays lecture Advanced databases and data models: Native XML management Shredding Hybrid solutions Theme3: Efficient storage of XML SQL/XML HShreX HShreX Efficency Lena Strmbck June 17, 2009 1 XML as a data model

Eye and Brain Eye and Brain Central visual pathways 1 2/22/2010 2 2/22/2010 3 2/22/2010 4

I.M. Skaugen SE 3Q 2010 presentation IMS Innovative Maritime Solutions 15 October 2010 1

Financial Results for 4/2010- -9/2010 9/2010 Financial Results for 4/2010 and and Financial

2010 Interim Results 2010 Interim Results 12 August 2010 2010 Interim Results 2010 Interim

Amazing Android How We Built Square Friday, May 14, 2010 Friday, May 14, 2010 Friday, May 14,

2010 half-year results 2010 half-year results 26 July 2010 26 July 2010 26 July 2010 26 July

1 2 Monday, October 25, 2010 3 4 Monday, October 25, 2010 5 6 Monday, October 25, 2010 7

From Dev To Production Sam Newman QCon London 2010 Wednesday, 10 March 2010 Wednesday, 10 March

Fourth Quarter 2010 Results Fourth Quarter 2010 Results Fourth Quarter 2010 Results Fourth

CO CONF NFERE RENCE NCE CA CALL Q3 3 2010 2010 Rev Reven enue Q3 2010 Highlights Fas

2010 2010 LEVY HEARING LEVY HEARING December 14, 2010 Meeting of the Board NORTH SHORE SCHOOL

Results presentation for the year ended 30 June 2010 8 September 2010 1 1 De Vessey Village,

2010 Census Operational Press Briefing April 28, 2010 2000 and 2010 Average Mailback Participation

2010 Full Year Result 2010 Full Year Result 23 February 2011 2010 Full Year Result 2010 Full

AFRICACRYPT 2010 Call for Papers AFRICACRYPT 2010 Call for Papers Africacrypt 2010 You are

2010 2010 .............................................. London 19 May 2010 London, 19 May 2010

Data Mining for Knowledge Management Mining Data Streams Themis Palpanas University of Trento

Modern Virtual Machine Performance murphee (Werner Schuster)

Constraint Handling Rules - Basic CHR programs and their analysis Prof. Dr. Thom Fr uhwirth |

You Only Live Multiple Times Black box re-use of Crash-Stop Algorithms In Realistic Crash-Recovery

S EMANTIC S OLUTIONS FOR O IL &amp; G AS : R OLES AND R ESPONSIBILITIES

Falling oil prices and the impact on housing markets September 8, 2016 EMAD Global oil market

&quot;Crisis in the Oil Market: Remembrances of the Past, Policy Responses for the Future&quot;

Resilient Data Collection in Oil and Gas Refinery Sensor Networks Klara Nahrstedt (Presenter)

S EMANTIC S OLUTIONS FOR O IL & G AS : R OLES AND R ESPONSIBILITIES

"Crisis in the Oil Market: Remembrances of the Past, Policy Responses for the Future"