On bridging relational and document-centric data stores John - - PowerPoint PPT Presentation

on bridging relational and document centric data stores
SMART_READER_LITE
LIVE PREVIEW

On bridging relational and document-centric data stores John - - PowerPoint PPT Presentation

On bridging relational and document-centric data stores John Roijackers and George Fletcher Eindhoven University of Technology BNCOD 2013 Oxford, UK 9 July 2013 Outline 1 Introduction 2 Theoretical framework 3 Experimental framework 4


slide-1
SLIDE 1

On bridging relational and document-centric data stores

John Roijackers and George Fletcher

Eindhoven University of Technology

BNCOD 2013 Oxford, UK 9 July 2013

slide-2
SLIDE 2

Outline

1 Introduction 2 Theoretical framework 3 Experimental framework 4 Empirical analysis 5 Conclusions

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 2 / 29

slide-3
SLIDE 3

Outline

1 Introduction 2 Theoretical framework 3 Experimental framework 4 Empirical analysis 5 Conclusions

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 3 / 29

slide-4
SLIDE 4

NoSQL

Definition An SQL database is a “traditional” database built around the relational model, which can be queried using SQL. A NoSQL database is a database that is not an SQL database. Data is not stored in relations and/or the main query language to retrieve data is not SQL.

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 4 / 29

slide-5
SLIDE 5

NoSQL

Definition An SQL database is a “traditional” database built around the relational model, which can be queried using SQL. A NoSQL database is a database that is not an SQL database. Data is not stored in relations and/or the main query language to retrieve data is not SQL. Properties of NoSQL databases: Non-relational, schemaless, e.g., “document”-oriented (JSON) Often, distributed architecture

sacrifice full Atomicity, Consistency, Isolation, Durability

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 4 / 29

slide-6
SLIDE 6

Problem statement

Situation: Different types of data for a single application Data is split over an SQL and a (document-centric) NoSQL database Data separation problem: Combine data from both sources

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 5 / 29

slide-7
SLIDE 7

Problem statement

SQL data: Places id name population stations 1 New York City 8.2 M 468 2 Paris 2.2 M 300

Table: Relational data

NoSQL data: {name : Alexis, age : 37, father : {name : Alexis, place : 2}} {name : Bob, gender : male, age : 37, father : {name : Bob, place : 1}}

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 6 / 29

slide-8
SLIDE 8

Problem statement

SQL data: Places id name population stations 1 New York City 8.2 M 468 2 Paris 2.2 M 300

Table: Relational data

NoSQL data: {name : Alexis, age : 37, father : {name : Alexis, place : 2}} {name : Bob, gender : male, age : 37, father : {name : Bob, place : 1}} Query For people with a known gender, 37 years old, and named after their father, give the father’s name and place name.

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 6 / 29

slide-9
SLIDE 9

Solution direction

Abstraction Query Result SQL NoSQL

Figure: Desired database abstraction architecture

Solution directions: Separate software layer Virtually load SQL data in NoSQL Virtually load NoSQL data in SQL

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 7 / 29

slide-10
SLIDE 10

Solution direction

Abstraction Query Result SQL NoSQL

Figure: Desired database abstraction architecture

Solution directions: Separate software layer Virtually load SQL data in NoSQL Virtually load NoSQL data in SQL

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 7 / 29

slide-11
SLIDE 11

Solution direction

Advantages of loading NoSQL data in SQL: Mature basis for the hybrid database SQL is a well-known standard SQL database can take care of more advanced tasks

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 8 / 29

slide-12
SLIDE 12

Solution direction

Advantages of loading NoSQL data in SQL: Mature basis for the hybrid database SQL is a well-known standard SQL database can take care of more advanced tasks General approach: Reconstruct the NoSQL data in SQL Extend SQL to support NoSQL data

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 8 / 29

slide-13
SLIDE 13

Related work

Logical representation “First-order normal form” (Litwin et al., SIGMOD Rec. 1991) SQL query pattern extensions RDF querying in Oracle (Chong et al., VLDB 2005)

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 9 / 29

slide-14
SLIDE 14

Related work

Logical representation “First-order normal form” (Litwin et al., SIGMOD Rec. 1991) SQL query pattern extensions RDF querying in Oracle (Chong et al., VLDB 2005) Very early days for querying document-centric stores Atzeni, Bugiotti, Rossi (EDBT 2012, CAiSE 2012) Chasseur, Li, Patel (WebDB 2013) Standardization of document query languages

UnQL JSONiq

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 9 / 29

slide-15
SLIDE 15

Outline

1 Introduction 2 Theoretical framework 3 Experimental framework 4 Empirical analysis 5 Conclusions

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 10 / 29

slide-16
SLIDE 16

NoSQL data representation

{ name: Alexis, age: 37, father: { name: Alexis, place: 2 } }, { name: Bob, gender: male, age: 37, father: { name: Bob, place: 1 } } NoSQL F id key value i1 name Alexis i1 age 37 i1 father i2 i2 name Alexis i2 place 2 i3 name Bob i3 gender male i3 age 37 i3 father i4 i4 name Bob i4 place 1

Table: Logical NoSQL relation

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 11 / 29

slide-17
SLIDE 17

Query language

Query For people with a known gender, 37 years old, and named after their father, give the father’s name and place name. NoSQL query pattern: ( name: ?n, gender: ?g, age: 37, father: ( name: ?n, place: ?p ) )

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 12 / 29

slide-18
SLIDE 18

Query language

Include in SQL query: Alias pattern and treat as relation Use variable names as attributes in SQL

1

SELECT

2

f.n,

3

p.name

4

FROM

5

NoSQL(

6

name: ?n,

7

gender: ?g,

8

age: 37,

9

father: (

10

name: ?n,

11

place: ?p

12

)

13

) AS f,

14

places AS p

15

WHERE

16

f.p = p.id

Listing 1: Example SQL query with included NoSQL query pattern

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 13 / 29

slide-19
SLIDE 19

Query translation

General relation construction with renaming: Ft = ρFt(it,kt,vt) (F) Add selection conditions to Ft based on pattern: NoSQL query pattern Translation Selection ( name: ?n, Fname σkey=name gender: ?g, Fgender σkey=gender age: 37, Fage σkey=age∧value=37 father: ( Ffather σkey=father name: ?n, Ffather,name σkey=name place: ?p Ffather,place σkey=place ) )

Table: Query translation example

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 14 / 29

slide-20
SLIDE 20

Query translation

NoSQL query pattern (name : ?n, gender : ?g, age : 37, father : (name : ?n, place : ?p)) Join relations using id attributes: Fname ⋊ ⋉iname=igender Fgender ⋊ ⋉igender=iage Fage ⋊ ⋉iage=ifather Ffather ⋊ ⋉vfather=ifather,name Ffather,name ⋊ ⋉ifather,name=ifather,place Ffather,place Variables in pattern: Replace with corresponding F relation attribute Add equality condition if the same variable is reused

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 15 / 29

slide-21
SLIDE 21

Query translation

1

SELECT

2

f.n,

3

p.name

4

FROM

5

NoSQL(

6

name: ?n,

7

gender: ?g,

8

age: 37,

9

father: (

10

name: ?n,

11

place: ?p

12

)

13

) AS f,

14

places AS p

15

WHERE

16

f.p = p.id

πvname,p.name( σvname=vfather,name( Fname ⋊ ⋉iname=igender Fgender ⋊ ⋉igender=iage Fage ⋊ ⋉iage=ifather Ffather ⋊ ⋉vfather=ifather,name Ffather,name ⋊ ⋉ifather,name=ifather,place Ffather,place ) ⋊ ⋉vfather,place=p.id ρp (places) )

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 16 / 29

slide-22
SLIDE 22

Query processing strategies

Data filtering Selection pushdown

Let F only contain NoSQL data that matches the pattern Selection conditions of F relation copies can be combined

Projection pushdown

Only variables in NoSQL query pattern can be used in query Exclude unnecessary records in F and relations from the query

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 17 / 29

slide-23
SLIDE 23

Query processing strategies

Data filtering Selection pushdown

Let F only contain NoSQL data that matches the pattern Selection conditions of F relation copies can be combined

Projection pushdown

Only variables in NoSQL query pattern can be used in query Exclude unnecessary records in F and relations from the query

Temporary relation First query to materialize records in temporary relation T Let each Ft use T instead of F

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 17 / 29

slide-24
SLIDE 24

Query processing strategies

Data filtering Selection pushdown

Let F only contain NoSQL data that matches the pattern Selection conditions of F relation copies can be combined

Projection pushdown

Only variables in NoSQL query pattern can be used in query Exclude unnecessary records in F and relations from the query

Temporary relation First query to materialize records in temporary relation T Let each Ft use T instead of F Tuple reconstruction facilitate nested document reconstruction by extending F with an additional document id field include equality selection on this field for each F used during query processing

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 17 / 29

slide-25
SLIDE 25

Query workflow

User query SQL NoSQL SQL

Translation

Query

Merge

NoSQL data F view

Transformation

SQL data

Execution Stream in

Output

Figure: Architectural overview illustrating the life of an SQL+NoSQL query

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 18 / 29

slide-26
SLIDE 26

Outline

1 Introduction 2 Theoretical framework 3 Experimental framework 4 Empirical analysis 5 Conclusions

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 19 / 29

slide-27
SLIDE 27

Prototype implementation

Prototype implementation: PostgreSQL MongoDB F relation implementation: Foreign table

Foreign data wrapper SQL/MED, ISO/IEC 9075-9:2008

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 20 / 29

slide-28
SLIDE 28

Experimental setup

Data: 12 product datasets

Vary NoSQL size, SQL size, and join probability 2 × 2 × 3 datasets

Twitter dataset Queries: 4 flow classes

NoSQL only NoSQL→ SQL SQL→ NoSQL SQL→ NoSQL→ SQL

4 query types per flow class

Use all mentioned NoSQL keys in the SQL part of the query all mentioned NoSQL keys exist in the data for all documents

4 × 4 query templates

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 21 / 29

slide-29
SLIDE 29

Experimental setup

Measurements: Per query template, 10 concrete queries, each repeated 5 times Measure translation and execution time separately

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 22 / 29

slide-30
SLIDE 30

Experimental setup

Measurements: Per query template, 10 concrete queries, each repeated 5 times Measure translation and execution time separately Prototype implementations: Limit to first 100 documents:

Ia – naive translation plus data filtering Ib – base implementation Ia plus temporary table

Limit to first 25 000 documents:

Ic – identical to Ib Id – Ic plus tuple reconstruction

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 22 / 29

slide-31
SLIDE 31

Outline

1 Introduction 2 Theoretical framework 3 Experimental framework 4 Empirical analysis 5 Conclusions

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 23 / 29

slide-32
SLIDE 32

Overview

0.1 1 10 100 T w i t t e r P r

  • d

u c t s Average query time (s) Dataset Ia Ib

Figure: Performance effect using a single temporary relation

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 24 / 29

slide-33
SLIDE 33

Increased limit

Ic Id Id / Ic Q1 13.5964 11.3169 0.832 Q2 34.1419 7.7161 0.226 Q3 9.2465 9.6599 1.045 Q4 7.0996 5.7576 0.811

(a) Flow class Fiii

Ic Id Id / Ic Q1 15.5227 11.8015 0.760 Q2 12.8297 8.9534 0.698 Q3 11.8088 10.2127 0.865 Q4 8.2041 6.1557 0.750

(b) Flow class Fiv Table: Comparison of Ic and Id performance in seconds, per flow class.

Observations: Better performance than base implementation Ia Tuple reconstruction is always beneficial Not efficient enough for deployment, further study required

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 25 / 29

slide-34
SLIDE 34

Outline

1 Introduction 2 Theoretical framework 3 Experimental framework 4 Empirical analysis 5 Conclusions

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 26 / 29

slide-35
SLIDE 35

Contributions

Theoretical framework: Transform NoSQL data to logical relation F of fixed schema SQL extension with NoSQL query pattern Automatic translation to pure SQL Optimization strategies to reduce communication Prototype results: Successfully created a hybrid SQL-NoSQL database prototype Many interesting engineering possibilities Space for improvement towards an industrial-strength implementation

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 27 / 29

slide-36
SLIDE 36

Future work

Translation optimization:

Nested join reduction F relation reuse

Dynamic multi-query optimization strategies NoSQL query language standardization

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 28 / 29

slide-37
SLIDE 37

Thanks you! Questions?

Roijackers & Fletcher (TU Eindhoven) Bridging SQL and NoSQL BNCOD 2013 29 / 29