5. Storage and Manipulation Foundations of XML Data of SSD - PDF document

5. Storage and Manipulation Foundations of XML Data of SSD Manipulation Shamelessly “inspired” by Ioana Manolescu tutorial Giorgio Ghelli The problem Classifying stores • Consider the queries • Essential criteria: – $doc // e-mail – Clustering – $doc //.[name = 'ghelli']/e-mail – Encoding of parent/child relationship • We do not want to bring the whole $doc in main memory • Set manipulation rather than tuple manipulation OEM data model Storing OEM Bib • No schema! &o1 complex object • Storing objects in LORE: paper paper book – Store the graph, clustered in depth-first references references &o12 &o24 &o29 order author pages author author title http titlepublisher title author year author – Operator: Scan(document,path), returns a &o43 &25 &96 set of objects 1997 last firstname firstname lastname first lastname atomic object &243 &206 “Abiteboul” “Serge” “Victor” “Vianu” 122 133

Indexing in LORE Access plans • Top down or bottom up navigation? • VIndex( l , o, pred ): all objects o with an select x incoming l-edge, satisfying the predicate from A.B x • LIndex( o , l , p): all parents of o via an l - where x.C = 5 edge A A A D D • BIndex(x, l , y): all edges labeled l D D B B B B B B B B B C C C C C C C C C 5 5 5 4 4 5 4 4 5 Access plans: bottom up Bottom up and top down select x select x from A.B x from A.B x where x.C = 5 where x.C = 5 Return(n2) Return(n2) Name(n3,'A') Name(n3,'A') Return(n1) NLJoin LIndex(n2,'B',n3) Select(n2,'=5') NLJoin LIndex(n2,'B',n3) LIndex(n1,'C',n2) Scan(n1,'C',n2) VIndex('C',n1,'=5') LIndex(n1,'C',n2) VIndex('C',n1,'=5') Scan('A','B',n1) Hybrid access plans Path indexes select x • PIndex( p , o): all objects reachable by from A.B x the path p where x.C = 5 Intersect Project(n2) Project(n2) Name(n1,'A') LIndex(n3,'C',n2) BIndex(n1,'B',n2) VIndex('C',n3,'=5')

Using a path index DataGuides select x • Introduced in Lore: a compact representation of all paths in the graph from A.B x • A DG for s is an OEM object d such that: where x.C = 5 – Every path of s reaches exactly one object in d – Every path in d is a path for s Intersect • DG: a schema a posteriori • Used to: PIndex('A.B',n2) Project(n2) – Inform the user LIndex(n1,'C',n2) – Expand wildcards in paths – Inform the optimizer VIndex('C',n1,'=5') DataGuides for trees Building a DataGuide A A A • Similar to converting a NFA to a DFA • Linear time for trees D D B D B D B • Exponential in time and space for C C C C C C graphs A A A • In practice, works well for regular ? structures D D B D B D B A C C A C C A C Which dataguide is better? Optimization via dataguide • Minimal dataguide • Expanding paths: a//z => a/b/z | a/c/d/z • Strong dataguide: if p1 and p2 both reach the • Deleting paths that are not in the data same node in d , then p1 and p2 have the • Contracting paths: a/c/e/d/z => //d/z same target set in s – Each target-set in the source has its own node – May be more efficient for a bottom-up and in the guide evaluation – Easy to build • Keeping statistic information – Easy to maintain, by keeping track of the many-to- many node correspondence between s and d – However, statistic are needed for every k- length path, not just for rooted paths

Graph Schemas Graph schemas A A A • Each edge in the scheme is labeled by a label predicate (a set of labels) D D B D B D ∨ B • Predicates are deterministic A C C A C C A C • Conformance is defined by a simulation between s and d: A – Root of data in root of schema – For every n1 in d1 with n1-l-n2, we have d1-l-d2 B C ¬ (B ∨ C) with n2 in d2 B • No request for surjectivity, or injectivity D E Graph indexing • Group nodes in sets, possibly disjoint • Store the extent of each set XML Storage in RDBMS • Grouping criteria: – Reachable by exactly the same Forward paths: 1- index – Indistinguishable by any F&B path: FB-index – Indistinguishable by the paths in a set Q: covering indexes – Indistinguishable by any path longer than k: A(k) index Using RDBMS for XML Storing data • Advantages • Data may be schema-less – Transactions • Data may have a schema – Optimization • Data may change its schema over time • Issues – Data storage – Query translation

Approaches Unknown schema: the edge table From Pos Tag To Data employees 1 • Based on schema: - 1 employees 1 1 1 emp 2 – Based only on schema 2 1 ID 3 1 emp 11 emp 21 2 2 FN 4 Nancy emp – Based on schema + cost informations 2 2 3 BD 5 5 1 Day 6 8 • Unknown schema: FstNm BirthD HiringD ID 5 2 Month 7 dec 3 4 5 9 5 3 Year 8 1968 – Derive schema from data Day 6 M 7 1 N Y 8 2 4 HD 9 8 dec 1968 – Generic approach 9 … … … 1 2 emp 11 • User defined 11 … … … 1 3 emp 21 Navigating the edge table Partitioned edge table employees From Pos Tag To Data • //FN/text(): - 1 employees 1 From Pos To Data select e.Data from edge e where e.Tag = 'FN' 1 1 emp 2 - 1 1 2 1 ID 3 1 • //emp[ID=‘1’]/FN/text() 2 2 FN 4 Nancy emp select e3.Data 2 3 BD 5 From Pos To Data srom edge e1, edge e2, edge e3 5 1 Day 6 8 1 1 2 5 2 Month 7 dec where e1.Tag = ‘emp’ and e1.to = e2.from 1 2 11 and e2.Tag = ‘ID’ and e2.Data = 1 5 3 Year 8 1968 1 3 21 and e1.to = e3.from and e3.Tag = ‘FN’ 2 4 HD 9 9 … … … ID • Navigation through multi-way join (XPath to 1 2 emp 11 FO translation) From Pos To Data 11 … … … 2 1 3 1 1 3 emp 21 Navigation Related storage schemes • //emp[ID=‘1’]/FN/text() • The universal relation: select e1.Data – employees � � � emp � � �� ID � � �� FN � � �� … from edge e1, edge e2, edge e3 • Materialized views over edges: where e1.Tag = ‘emp’ and e1.to = e2.from and e2.Tag = ‘ID’ and e2.Data = 1 – emp � � ID � � FN � � HD … and e1.to = e3.from and e3.Tag = ‘FN’ � select FN.Data • The STORED approach: from emp, ID, FN – Materialized views based on pattern where emp.to = ID.from and ID.Data = 1 frequencies in the database and emp.to = FN.from – Overflow tables for the rest • Joining smaller tables

Flat storage Path partitioning in Monet employees • For each root-to-inner-node path: emp emp emp – Path(n1,n2,ord): FstNm BirthD HiringD ID • employees.emp{(1,2,1);(1,11,2);(1,21,3)} • employees.emp.ID{(2,3,1);…} Day 6 M 7 1 N Y 8 8 dec 1968 • For each root-to-leaf path: – Path(n1,val) ID First Name BD-D BD-M BD-Y • employees.emp.ID.text{(3,’1’);…} 1 Nancy 8 dec 1968 2 Andrew 19 feb 1952 • Path summary 3 Janet 30 aug 1963 • No join for linear path expressions 4 Margaret 19 sep 1958 XRel approach: interval coding XParent • Tables: • Tables: – Path(PathID,PathExpr) – LabelPath(ID, lengh, pathExpr) – Element(DocID, PathID, Start, End, Ordinal) – Data(pathID, nID, ord, value) – Text(DocID, PathID, Start, End, Value) – Element(pathID, ord, nID) – Attribute(DocID, PathID, Start, End, Value) – ParentChild(pID, cID) • Ancestor relation: • Use ParentChild instead of interval – N1.start< N2.start and N2.end > N1.end coding: equi-join instead of <-join • Path expression: regexp matching with Path table, join the result with the data tables Schema-driven storage Sharing the address DB Employees Depts DB Employees Depts Employee* Dep* Name Employee* Dep* Address Name Street Number E-mail* Address GN FN Street GN FN Number Phone? Employee(PId,Id,Name,Name.GN,Name.FN) Dep(PId,Id) DB(Id,Employees,Depts) Address(PId,Id,Address,Addr.Street,Addr.Number) Employee(PId,Id,Name,Name.GN,Name.FN,Phone) E-Mail(PId,Id,E-mail) Employee(PId,Id,…,Address,Addr.Street,Addr.Number) Depts(PId,Id,Address,Addr.Street,Addr.Number) Dep(PId,Id,Address,Addr.Street,Addr.Number)

5. Storage and Manipulation Foundations of XML Data of SSD - PDF document

5. Storage and Manipulation Foundations of XML Data of SSD Manipulation Shamelessly inspired by Ioana Manolescu tutorial Giorgio Ghelli The problem Classifying stores Consider the queries Essential criteria: $doc //

Money Manipulation & the Effects on the International -Spencer Houston Community Definition

Data Manipulation in R Introduction to dplyr May 15, 2017 Data Manipulation in R May 15, 2017

Manipulation in Political Stock Manipulation in Political Stock Markets Markets Koleman Strumpf

Recap: Strategic Manipulation We had seen two theorems that show that we cannot rule out strategic

> SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

SUSE Enterprise Storage 6 Darren Soothill EMEA Storage Technical Strategist Agenda

Solar Plus Storage Solar Plus Storage Focus on Storage Benefits Focus on Storage Benefits by

Hybrid SAN & Cluster Enterprise Network Storage Hikvision Enterprise Network Storage

INF5470 Fall 2012 Lecture 10: Analog Storage Content Overview Volatile Short Term Storage

5. Storage and Manipulation of SSD Shamelessly inspired by Ioana Manolescu tutorial 1

Mobile Manipulation and Mobility as Manipulation Design and Algorithms of RoboSimian DARPA

String manipulation and String manipulation and regexes regexes Programming for Statistical

Semi-Automated SVG Programming via Direct Manipulation Brian Hempel and Ravi Chugh Direct

CHAPTER 5: Direct Manipulation and Virtual Environments Examples of Direct-Manipulation Systems:

Direct Manipulation and Instrumental Interaction 1 CS349 - Direct Manipulation Review:

Electric Field Devices for Manipulation, Electric Field Devices for Manipulation, Directed

Building a Platform for Commercial Drones Buddy Michini CTO Venture-backed, 70-person startup

ownCloud@*.edu Christian Schmitz Director Business Development cschmitz@owncloud.com Executive

Deployment Coalition V2I DC Update on Issue Collaboration TWGs Focus on Issues Issue TWG 1

DeepSec 2011 Windows Pwn 7 OEM Owned Every Mobile? Alex Plaskett November 2011 Main

On the role of 5G in automotive industry Dr Mikael Fallgren, Ericsson Research 30 November 2018,

Region 10 Youth Preparedness Council (R10 YPC) Modeled after National YPC Purpose: Formed

New Blockcipher Modes of Operation with Beyond the Birthday Bound Security Tetsu Iwata

Cryptanalysis of JAMBU Thomas Peyrin 1 Siang Meng Sim 1 Lei Wang 1 Guoyan Zhang 1 , 2 , 3

5. Storage and Manipulation Foundations of XML Data of SSD - PDF document

5. Storage and Manipulation Foundations of XML Data of SSD Manipulation Shamelessly inspired by Ioana Manolescu tutorial Giorgio Ghelli The problem Classifying stores Consider the queries Essential criteria: $doc //

Money Manipulation &amp; the Effects on the International -Spencer Houston Community Definition

Data Manipulation in R Introduction to dplyr May 15, 2017 Data Manipulation in R May 15, 2017

Manipulation in Political Stock Manipulation in Political Stock Markets Markets Koleman Strumpf

Recap: Strategic Manipulation We had seen two theorems that show that we cannot rule out strategic

&gt; SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

SUSE Enterprise Storage 6 Darren Soothill EMEA Storage Technical Strategist Agenda

Solar Plus Storage Solar Plus Storage Focus on Storage Benefits Focus on Storage Benefits by

Hybrid SAN &amp; Cluster Enterprise Network Storage Hikvision Enterprise Network Storage

INF5470 Fall 2012 Lecture 10: Analog Storage Content Overview Volatile Short Term Storage

5. Storage and Manipulation of SSD Shamelessly inspired by Ioana Manolescu tutorial 1

Mobile Manipulation and Mobility as Manipulation Design and Algorithms of RoboSimian DARPA

String manipulation and String manipulation and regexes regexes Programming for Statistical

Semi-Automated SVG Programming via Direct Manipulation Brian Hempel and Ravi Chugh Direct

CHAPTER 5: Direct Manipulation and Virtual Environments Examples of Direct-Manipulation Systems:

Direct Manipulation and Instrumental Interaction 1 CS349 - Direct Manipulation Review:

Electric Field Devices for Manipulation, Electric Field Devices for Manipulation, Directed

Building a Platform for Commercial Drones Buddy Michini CTO Venture-backed, 70-person startup

ownCloud@*.edu Christian Schmitz Director Business Development cschmitz@owncloud.com Executive

Deployment Coalition V2I DC Update on Issue Collaboration TWGs Focus on Issues Issue TWG 1

DeepSec 2011 Windows Pwn 7 OEM Owned Every Mobile? Alex Plaskett November 2011 Main

On the role of 5G in automotive industry Dr Mikael Fallgren, Ericsson Research 30 November 2018,

Region 10 Youth Preparedness Council (R10 YPC) Modeled after National YPC Purpose: Formed

New Blockcipher Modes of Operation with Beyond the Birthday Bound Security Tetsu Iwata

Cryptanalysis of JAMBU Thomas Peyrin 1 Siang Meng Sim 1 Lei Wang 1 Guoyan Zhang 1 , 2 , 3

Money Manipulation & the Effects on the International -Spencer Houston Community Definition

> SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

Hybrid SAN & Cluster Enterprise Network Storage Hikvision Enterprise Network Storage