NoDB: Efficient Query Processing on Raw Data Files Ioannis - - PowerPoint PPT Presentation

nodb efficient query processing on
SMART_READER_LITE
LIVE PREVIEW

NoDB: Efficient Query Processing on Raw Data Files Ioannis - - PowerPoint PPT Presentation

NoDB: Efficient Query Processing on Raw Data Files Ioannis Alagiannis Renata Borovica Miguel Branco Stratos Idreos * Anastasia Ailamaki cole Polytechnique *CWI, Amsterdam Fdrale de Lausanne From data to results Time


slide-1
SLIDE 1

NoDB: Efficient Query Processing on Raw Data Files

Ioannis Alagiannis‡ Renata Borovica‡ Miguel Branco‡ Stratos Idreos* Anastasia Ailamaki‡

*CWI, Amsterdam

‡École Polytechnique

Fédérale de Lausanne

slide-2
SLIDE 2

Time loading data preparation

From data to results

Reduce data-to-query time

query data

slide-3
SLIDE 3

Querying data in situ

3

Straw-man approach is slow

0|ALGERIA|0| haggle. carefully fina 1|ARGENTINA|1|al foxes promise 2|BRAZIL|1|y alongside of the pen 3|CANADA|1|eas hang ironic, silen 4|EGYPT|4|y above the carefully th ...

raw file

1 ARGENTINA 1 al foxes promis

Fields

0|ALGERIA|0| haggle. carei… 1|ARGENTINA|1|al foxes p…

2|BRAZIL|1|y alongside o… 3|CANADA|1|eas hang iro… ……………………………….…………

convert to binary

parsing tokenize

50 100 150 200 250

Straw-man approach

Execution Time (sec)

I/O parse token. convert query

slide-4
SLIDE 4

NoDB philosophy

4

Adaptive in situ DBMS

Raw files first-class citizen No data loading Driven by the workload Instant gateway to data

Time

slide-5
SLIDE 5

NoDB in practice

5

+ NoDB philosophy = PostgresRaw

Efficient in situ querying Minimal changes to the query engine

slide-6
SLIDE 6

PostgresRaw

6

positional map scan

  • perator

raw files PostgreSQL query engine

cache statistics

Adaptive indexing mechanism

slide-7
SLIDE 7

Positional map

7

Raw File

tuples attributes positions of attributes

Created on-the-fly Reduce parsing Reduce tokenizing

slide-8
SLIDE 8

Positional map

8

Raw File

known position looking for

Created on-the-fly Reduce parsing Reduce tokenizing

slide-9
SLIDE 9

Positional map in action

9

tuples attributes Indexed attributes:

  • 1. Positional map is empty

PM:

slide-10
SLIDE 10

Positional map in action

10

tuples attributes a4, a6

  • 1. Positional map is empty
  • 2. Q1 accesses a4 and a6

PM:

p4, p6 p4, p6 p4, p6 p4, p6

Indexed attributes:

slide-11
SLIDE 11

Positional map in action

11

tuples attributes a4, a6

  • 1. Positional map is empty
  • 2. Q1 accesses a4 and a6
  • 3. Q2 accesses a4 and a9

PM:

p4, p6 p4, p6 p4, p6 p4, p6

Indexed attributes:

p4, p6, p9 p4, p6, p9 p4, p6, p9 p4, p6, p9

a4, a6, a9

Make raw data access progressively cheaper

slide-12
SLIDE 12

PostgresRaw

12

positional map scan

  • perator

raw files PostgreSQL query engine

cache statistics

avoid raw file accesses generate statistics adaptively

slide-13
SLIDE 13

10 20 30 40 50 200 400 600 800 1000 1200 Execution Time (sec) # pointers (in millions)

Impact of positional map

13

No need for the whole positional map

15% from full positional map

2x improvement

Random queries on 10 attributes Vary storage capacity (15MB-2GB)

slide-14
SLIDE 14

1 10 100 Q1 Q5 Q10 Q15 Q20 Q25 Q30 Q35 Q40 Execution Time (sec) - log scale Query Sequence PostgreSQL PostgresRaw

Adapting to changes

14

Graceful adaptation to workload changes

workload changes

slide-15
SLIDE 15

200 400 600 800 1000 1200 1400 1600 1800 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw Execution Time (sec) Q20 Q19 Q18 Q17 Q16 Q15 Q14 Q13 Q12 Q11 Q10 Q9 Q8 Q7 Q6 Q5 Q4 Q3 Q2 Q1 Load

PostgresRaw vs. other DBMS

15

Data Loading

slide-16
SLIDE 16

PostgresRaw vs. other DBMS

16

200 400 600 800 1000 1200 1400 1600 1800 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw Execution Time (sec) Q20 Q19 Q18 Q17 Q16 Q15 Q14 Q13 Q12 Q11 Q10 Q9 Q8 Q7 Q6 Q5 Q4 Q3 Q2 Q1 Load

slide-17
SLIDE 17

200 400 600 800 1000 1200 1400 1600 1800 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw Execution Time (sec) Q20 Q19 Q18 Q17 Q16 Q15 Q14 Q13 Q12 Q11 Q10 Q9 Q8 Q7 Q6 Q5 Q4 Q3 Q2 Q1 Load

PostgresRaw vs. other DBMS

17

Competitive with conventional DBMS

slide-18
SLIDE 18

18

NoDB

Query Data Query Query

Today

Query Data Preparation

Adaptive load - store - execute

Thank you!!