Efficient Data Management and Statistics with Zero-Copy Integration
SSDBM 2014, 2014-06-30
Jonathan Lajus & Hannes Mühleisen
Workflow: Collect data → Load data → Filter, transform & aggregate data → Analyze & Plot → Publish paper / Profit
Diagram labels: Data Management System, Statistical Toolkit
Bottleneck, thanks David!
get away with metadata management
[Diagram in three steps; Statistics and Database share one address space starting at 0x00000000]
(1) Statistics to Database: SELECT...?
(2) Database: Query Result at 0x10000000
(3) Database to Statistics: 0x10000000!
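The three steps above hand over an address rather than the data itself. A minimal Python sketch of that idea follows; it is not the authors' implementation. The name `query_result` and the use of `array` and `memoryview` are assumptions chosen for illustration, standing in for a database result held in the same process as the statistics tool.

```python
from array import array

# Hypothetical stand-in for a query result that the database keeps in memory.
query_result = array("i", [42, 43, 44])

# Copying hand-over: the statistics side receives its own duplicate.
copied = array("i", query_result)

# Zero-copy hand-over: the statistics side receives a view of the same buffer.
shared = memoryview(query_result)

# A later change on the database side shows which side still shares memory.
query_result[0] = 7

copy_sees_change = copied[0] == 7
view_sees_change = shared[0] == 7
print(copy_sees_change, view_sees_change)
```

The copy stays detached from the original, while the view reflects the change because no data was moved. Sharing only works here because both sides live in one process and one address space.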
Analyze & Plot Filter, transform & aggregate
https://github.com/lajus/monetinr
[Diagram, MonetDB: BAT Descriptor → Column Descriptors (head, tail) → Arrays (1 2 ... and 42 43 44 ...)]
[Diagram, R: Reference → SEXP Header, Array (42 43 44 ...)]
[Diagram, combined: BAT Descriptor → Column Descriptor (tail) → 42 43 44 ..., with SEXP Header; References from R and from MonetDB]
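One reading of the combined diagram is that a single block of memory carries both a vector header and the column values, so each system can reference the same array in its own way. The sketch below illustrates that layout under stated assumptions. The two-field header (`TYPE_INT`, length) is hypothetical and is not R's real SEXP layout, and `read_as_vector` is an illustrative helper, not an API of R or MonetDB.

```python
import struct

# Hypothetical header: (type tag, element count). Not R's actual SEXP header.
HEADER = struct.Struct("ii")
ITEM = struct.calcsize("i")
TYPE_INT = 1  # arbitrary tag chosen for this sketch

values = [42, 43, 44]

# One allocation: header first, column values directly behind it.
buf = bytearray(HEADER.size + ITEM * len(values))
HEADER.pack_into(buf, 0, TYPE_INT, len(values))

whole = memoryview(buf)

# "Database" reference: points past the header, sees only the values.
tail = whole[HEADER.size:].cast("i")
for i, v in enumerate(values):
    tail[i] = v


def read_as_vector(view):
    """'Statistics' reference: starts at the header and reads the same values."""
    tag, n = HEADER.unpack_from(view, 0)
    data = view[HEADER.size:HEADER.size + n * ITEM].cast("i")
    return tag, data.tolist()


print(read_as_vector(whole))
```

Writing through `tail` afterwards is visible through `read_as_vector(whole)` as well, since both references address the same bytes. The lifetime question (who may free the block, and when) is left out of this sketch.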
+ Garbage Collection Fun
better at data management than pimped stats tools
Fedora Linux
[Figure: Selection. Panels: 1% Rows Selected, 10% Rows Selected, 50% Rows Selected. X axis: Dataset Size (log), 10 MB to 10 GB. Y axis: Execution Time (log), 100ms to 10min. Series: RSQLite, MonetDB.R, Prototype.]
[Figure: Grouping. Panels: 1 Group, 500 Groups, 10% Groups. X axis: Dataset Size (log), 10 MB to 10 GB. Y axis: Execution Time (log), 100ms to 10min. Series: RSQLite, MonetDB.R, Prototype.]
[Figure: Join. Panels: 1% Join Partner Size, 10% Join Partner Size. X axis: Dataset Size (log), 10 MB to 10 GB. Y axis: Execution Time (log), 100ms to 10min. Series: RSQLite, MonetDB.R, Prototype.]
CREATE FUNCTION kmeans (data FLOAT, ncluster INTEGER)
RETURNS INTEGER
LANGUAGE R {
    kmeans(data, ncluster)$cluster
};
Watch the next MonetDB release…
[Figure: X axis: Rows, 1 K to 100 M. Y axis: Time (s). Series: dumbtime, udftime, vdumbtime, plrtime. Other labels: sys; quantile(c(.05,.95)); PL/R; R in MonetDB.]
Questions?