
Towards Schema-independent Querying on Document Data Stores



  1. Towards Schema-independent Querying on Document Data Stores
     H. BEN HAMADOU (1), F. GHOZZI (2), A. PENINOU (1), O. TESTE (1)
     (1) IRIT, Université de Toulouse, France (UT3, UT2J)
     (2) MIRACL, Université de Sfax, Tunisia (ISIMS)
     hamdi.ben-hamadou@irit.fr
     26-03-2018, DOLAP'18

     Introduction: Document-oriented Database
     - Data format: semi-structured documents (JSON, BSON, ...)
     - Data model: schema-less
     - Advantages: big data support, scalability, availability
     - Examples: MongoDB, CouchDB
     - Applications: Web, IoT, social media, ...
     - Interrogation: JDBC, drivers, API, command line, ...

  2. Introduction: Backgrounds
     Modeling Multi-structured Data
     - Collection: $C = \{d_1, \ldots, d_c\}$
     - Document: $d_i = (k_i, v_i)$, where $k_i$ is the document's identifier and $v_i = \{a_{i,1} : v_{i,1}, \ldots, a_{i,n_i} : v_{i,n_i}\}$ is the document's value.
     - Document schema: $s_i = \{p_1, \ldots, p_m\}$, where each $p_j$ is a path leading to a leaf node in document $d_i$.
     - Collection schema: $S = \bigcup_{i=1}^{|C|} s_i$

     Structural Heterogeneity
     Document 1
         {
           "_id": 1,
           "title": "Fast and furious",
           "year": 2017,
           "language": "English"
         }
     Document 2
         {
           "_id": 2,
           "title": "Titanic",
           "details": {
             "year": 1997,
             "language": "English"
           }
         }
     Document 3
         {
           "_id": 3,
           "title": "Despicable Me 3",
           "year": 2017
         }
     Document 4
         {
           "_id": 4,
           "title": "The Hobbit",
           "versions": [
             { "year": 2012, "language": "English" },
             { "year": 2013, "language": "French" }
           ]
         }
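The data model on the Modeling Multi-structured Data slide can be illustrated with a minimal Python sketch. It assumes a collection is represented as a plain dict mapping each identifier $k_i$ to its value $v_i$, and that array positions are numbered from 1, as in the slides' "versions.1.year". The function names `document_schema` and `collection_schema` are hypothetical and are not taken from the presentation.

```python
# Sample collection from the slides, as {identifier k_i: value v_i}.
C = {
    1: {"title": "Fast and furious", "year": 2017, "language": "English"},
    2: {"title": "Titanic", "details": {"year": 1997, "language": "English"}},
    3: {"title": "Despicable Me 3", "year": 2017},
    4: {"title": "The Hobbit", "versions": [
        {"year": 2012, "language": "English"},
        {"year": 2013, "language": "French"},
    ]},
}


def document_schema(value, prefix=""):
    """Return s_i: the set of dotted paths leading to leaf nodes of a value."""
    if isinstance(value, dict):
        children = list(value.items())
    elif isinstance(value, list):
        # Assumption: array positions start at 1, following the slides.
        children = [(str(i), item) for i, item in enumerate(value, start=1)]
    else:
        return {prefix}  # scalar value: the path ends at this leaf
    paths = set()
    for key, child in children:
        child_path = f"{prefix}.{key}" if prefix else key
        paths |= document_schema(child, child_path)
    return paths


def collection_schema(collection):
    """Return S: the union of the document schemas of the collection."""
    return set().union(*(document_schema(v) for v in collection.values()))


if __name__ == "__main__":
    for path in sorted(collection_schema(C)):
        print(path)
```

On the four sample documents this yields nine distinct paths, four of which end in "year".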

  3. Introduction: Querying Semi-structured Documents
     Query Operators
     - Kernel of unary operators: $k = \{\pi, \sigma\}$
     - Projection operator: $\pi(A)(C_{in}) = C_{out}$. The project operator reduces the initial schemas of documents to a finite subset of attributes $A$.
     - Selection operator: $\sigma(P)(C_{in}) = C_{out}$. The select operator retrieves only the documents that match the selection condition $P$, expressed in normal form ($Norm_p$).

     Querying Multi-structured Data: Problem
     Query: $\pi(\text{"title"}, \text{"year"})(C)$, evaluated over the four documents of the Structural Heterogeneity slide.
     The "year" values sit at different paths: "year" in Documents 1 and 3 (2017), "details.year" in Document 2 (1997), and "versions.1.year" and "versions.2.year" in Document 4 (2012 and 2013).
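The problem raised by the query on "title" and "year" can be reproduced with a small Python sketch of the two operators. This is an illustration under assumptions, not the authors' implementation: the collection is a dict from identifier to value, `project` only looks at root-level attributes, and the selection condition is passed as a Python callable rather than a formula in normal form. The names `project` and `select` are hypothetical.

```python
# Sample collection from the slides, as {identifier: value}.
C = {
    1: {"title": "Fast and furious", "year": 2017, "language": "English"},
    2: {"title": "Titanic", "details": {"year": 1997, "language": "English"}},
    3: {"title": "Despicable Me 3", "year": 2017},
    4: {"title": "The Hobbit", "versions": [
        {"year": 2012, "language": "English"},
        {"year": 2013, "language": "French"},
    ]},
}


def project(collection, attributes):
    """Naive projection: keep the listed root-level attributes of each document."""
    return {
        key: {a: value[a] for a in attributes if a in value}
        for key, value in collection.items()
    }


def select(collection, predicate):
    """Naive selection: keep the documents whose value satisfies the predicate."""
    return {key: value for key, value in collection.items() if predicate(value)}


if __name__ == "__main__":
    # Documents 2 and 4 lose their years: "year" is not a root-level attribute there.
    for key, row in project(C, ["title", "year"]).items():
        print(key, row)
    # Only Documents 1 and 3 match, although no other document has a 2017 year either.
    print(sorted(select(C, lambda v: v.get("year") == 2017)))
```

With this naive evaluation, Documents 2 and 4 come back with a title only, which is the gap the slide highlights.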

  4. Introduction: Querying Semi-structured Documents
     Querying Multi-structured Data: Problem (continued)
     To reach every "year" value in the four documents, the projection has to list each path explicitly:
     $\pi(\text{"title"}, \text{"year"}, \text{"details.year"}, \text{"versions.1.year"}, \text{"versions.2.year"})(C)$
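The effect of listing every path explicitly can be checked with a short Python sketch. It assumes dotted paths with 1-based array positions, as written in the slides, and returns each projected document keyed by the path that matched; that output shape is a choice made for this illustration. The helper names `resolve` and `project_paths` are hypothetical.

```python
# Sample collection from the slides, as {identifier: value}.
C = {
    1: {"title": "Fast and furious", "year": 2017, "language": "English"},
    2: {"title": "Titanic", "details": {"year": 1997, "language": "English"}},
    3: {"title": "Despicable Me 3", "year": 2017},
    4: {"title": "The Hobbit", "versions": [
        {"year": 2012, "language": "English"},
        {"year": 2013, "language": "French"},
    ]},
}


def resolve(value, path):
    """Follow a dotted path; return (True, leaf) if it exists, else (False, None)."""
    current = value
    for step in path.split("."):
        if isinstance(current, dict) and step in current:
            current = current[step]
        elif (isinstance(current, list) and step.isdigit()
              and 1 <= int(step) <= len(current)):
            current = current[int(step) - 1]  # slides count array items from 1
        else:
            return False, None
    return True, current


def project_paths(collection, paths):
    """Project each document onto the absolute paths that exist in it."""
    projected = {}
    for key, value in collection.items():
        row = {}
        for path in paths:
            found, leaf = resolve(value, path)
            if found:
                row[path] = leaf
        projected[key] = row
    return projected


if __name__ == "__main__":
    paths = ["title", "year", "details.year", "versions.1.year", "versions.2.year"]
    for key, row in project_paths(C, paths).items():
        print(key, row)
```

Every document now contributes its year values, but only because the query author knew all five paths in advance.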

  5. Plan
     1. Introduction
     2. Querying Heterogeneous Documents
     3. Experiments
     4. Conclusion & perspectives

  6. Querying Heterogeneous Documents: State of the Art
     Physical data transformation
     - Flattening data.
     - Using additional databases.
     - Introducing new structures.
     [(Chasseur et al., 2013), (Tahara et al., 2014)]
     ⇒ Need to learn a new schema.
     ⇒ Loss of the initial document schemas and structures.
     ⇒ Need to rebuild the schemas whenever structures change.

     Virtual data transformation
     - Inferring existing schemas.
     - Building a unified schema.
     - Tracking different schema versions.
     [(Baazizi et al., 2017), (Ruiz et al., 2015), (Wang et al., 2015)]
     ⇒ Need to learn new structures.
     ⇒ Querying is limited to the structural level.
     ⇒ Heterogeneity is handled manually when formulating application queries.

  7. Querying Heterogeneous Documents: Our Approach
     EasyQ
     Figure: EasyQ Architecture

     Dictionary
     The dictionary $dict_C$ constructed from a collection $C$ is defined by
     $dict_C = \{(p_k, \triangle_k)\}, \ \forall p_k \in S_C$, where:
     - $p_k \in S_C$ is a path leading to a leaf node that is present in at least one document;
     - $\triangle_k = \{p_{p_k,1}, \ldots, p_{p_k,q}\} \subseteq S_C$ is the set of navigational paths leading to $p_k$.

  8. Querying Heterogeneous Documents: Dictionary
     Dictionary Construction Process
     Entry built for "year" from Documents 1 to 4, where "year" is reached through "year" (Documents 1 and 3), "details.year" (Document 2), and "versions.1.year" and "versions.2.year" (Document 4):
     $dict = \{(\text{"year"}, \{\text{"year"}, \text{"details.year"}, \text{"versions.1.year"}, \text{"versions.2.year"}\})\}$
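The construction of the "year" entry can be sketched in Python as follows. This is a simplified illustration, not the EasyQ implementation: it assumes the collection is a dict from identifier to value, numbers array positions from 1 as in the slides, and keys the dictionary by leaf attribute name only (the last step of each path), which is enough to reproduce the slide's example. The names `leaf_paths`, `build_dictionary` and `expand` are hypothetical, and `expand` shows just one possible way such a dictionary could be used to turn a short attribute list into absolute paths.

```python
# Sample collection from the slides, as {identifier: value}.
C = {
    1: {"title": "Fast and furious", "year": 2017, "language": "English"},
    2: {"title": "Titanic", "details": {"year": 1997, "language": "English"}},
    3: {"title": "Despicable Me 3", "year": 2017},
    4: {"title": "The Hobbit", "versions": [
        {"year": 2012, "language": "English"},
        {"year": 2013, "language": "French"},
    ]},
}


def leaf_paths(value, prefix=""):
    """Return the dotted paths leading to the leaf nodes of a value."""
    if isinstance(value, dict):
        children = list(value.items())
    elif isinstance(value, list):
        # Assumption: array positions start at 1, following the slides.
        children = [(str(i), item) for i, item in enumerate(value, start=1)]
    else:
        return {prefix}
    paths = set()
    for key, child in children:
        paths |= leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    return paths


def build_dictionary(collection):
    """Map each leaf attribute name to the set of absolute paths that reach it."""
    dictionary = {}
    for value in collection.values():
        for path in leaf_paths(value):
            leaf_name = path.rsplit(".", 1)[-1]  # simplification: key on last step
            dictionary.setdefault(leaf_name, set()).add(path)
    return dictionary


def expand(dictionary, attributes):
    """Replace each attribute by the absolute paths recorded for it, if any."""
    expanded = []
    for attribute in attributes:
        expanded.extend(sorted(dictionary.get(attribute, {attribute})))
    return expanded


if __name__ == "__main__":
    d = build_dictionary(C)
    print(sorted(d["year"]))
    print(expand(d, ["title", "year"]))
```

Keying on the last path step keeps the sketch short; the definition on the Dictionary slide ranges over the paths $p_k$ of the collection schema, so a fuller version would need more entries than this one builds.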
