MonetDBLite Bringing Column Stores to the Masses Hannes Mhleisen* - - PowerPoint PPT Presentation

monetdblite bringing column stores to the masses
SMART_READER_LITE
LIVE PREVIEW

MonetDBLite Bringing Column Stores to the Masses Hannes Mhleisen* - - PowerPoint PPT Presentation

MonetDBLite Bringing Column Stores to the Masses Hannes Mhleisen* DBGDBD 2015-12-16 Integrate not Reinvent ? Statistical Toolkits Flexibility Data Management 2 Systems Efficiency 2 Running a DB is hard* Installation


slide-1
SLIDE 1

MonetDBLite – 
 Bringing Column Stores to the Masses

Hannes Mühleisen*


DBGDBD 2015-12-16

slide-2
SLIDE 2

2

Efficiency Flexibility Statistical Toolkits Data Management Systems

2

Integrate not Reinvent ?

slide-3
SLIDE 3

Running a DB is hard*

  • Installation / Automatic startup difficult
  • Configuration for workload often crucial
  • checkpoint_completion_target?
  • effective_cache_size?
  • Maintenance, updates, … workload increase

3

Most people don’t bother if not forced!

slide-4
SLIDE 4

What about ?

  • In-process SQL database, data either in memory or

in a file, rock-solid, used on every smartphone, browser, OS, ….

  • People also use it for large-ish dataset analysis
  • Bad idea, SQLite was never built for this
  • e.g. row-based storage model

4

slide-5
SLIDE 5

5

2

MonetDBLite for R Released Nov. 2015

Enter MonetDBLite

slide-6
SLIDE 6

What is MonetDBLite

  • Embedded & streamlined MonetDB
  • No installation
  • In-Process operation
  • Query results are pointers to C arrays
  • Data append from C arrays
  • Wrappers for R, Python*, Java*, …

6

slide-7
SLIDE 7

Engineering Challenges

  • MonetDB was never designed to run embedded
  • Fatal errors exit()
  • Relative paths relying on setwd()
  • Symbol clashes error() etc.
  • Global variables galore (restartability?)
  • stdout/stderr used for error reporting
  • Build/runtime dependencies (esp. Windows)
  • R’s windows compilation toolchain

7

slide-8
SLIDE 8

Installation

  • Installs like any other package
  • Linux: Source install
  • Windows/OSX: Binary packages

8

slide-9
SLIDE 9

9

DBI Usage

slide-10
SLIDE 10

10

dplyr Usage

slide-11
SLIDE 11

Experiments

  • Home Mortgage Disclosure Act dataset & queries
  • 128M records, 71 fields each, 56 GB CSV
  • Sampled down to 6M and 60M for testing
  • Contenders: MonetDB, MonetDBLite & SQLite
  • Experiments:
  • CSV loading
  • HMDA queries
  • Table transfer

11

slide-12
SLIDE 12

12

142s 179s 171s 714s 1496s 655s 1607s 7776s 1544s

2500 5000 7500 10000 6M 60M 128M

# Tuples Time (s)

Loading from CSV files

MonetDBLite SQLite MonetDB.R MonetDBLite SQLite MonetDB.R MonetDBLite SQLite MonetDB.R

slide-13
SLIDE 13

13

0.7s 1.9s 1s 6.4s 23s 7.9s 13s 64s 16s

20 40 60 6M 60M 128M

# Tuples Time (s)

Run HMDA analysis

MonetDBLite SQLite MonetDB.R MonetDBLite SQLite MonetDB.R MonetDBLite SQLite MonetDB.R

slide-14
SLIDE 14

14

1.5s 7.6s 79s 17s 187s DNF 94s 1297s DNF

500 1000 1500 1M 10M 14M

# Tuples Time (s)

Convert table to data.frame

MonetDBLite SQLite MonetDB.R MonetDBLite SQLite MonetDB.R MonetDBLite SQLite MonetDB.R

slide-15
SLIDE 15

Live Demo

15

slide-16
SLIDE 16

16

MonetDBLite Installation

[Reproduction of live demonstration]

slide-17
SLIDE 17

17

MonetDBLite Startup Read 110 MB CSV into R - ~13s Import table into MonetDBLite - ~1s

[Reproduction of live demonstration]

slide-18
SLIDE 18

18

Fast SQL querying

[Reproduction of live demonstration]

slide-19
SLIDE 19

19

[Reproduction of live demonstration]

Fast table export into R Fast data availability after R restart

slide-20
SLIDE 20

Next Steps

  • Single-file MonetDB (single DLL/so/dylib)
  • Need to inline startup MAL/SQL files
  • Finish Python/Java wrappers
  • “Restartability”
  • Multiple MonetDB’s in a single process?

20

slide-21
SLIDE 21

Questions?

https://www.monetdb.org/blog/monetdblite-r
 @hfmuehleisen