Data Systems that are Easy to Design, Tune and Use
Stratos Idreos
Data Systems that are Easy to Design, Tune and Use Stratos Idreos - - PowerPoint PPT Presentation
Data Systems that are Easy to Design, Tune and Use Stratos Idreos applications api/sql algorithms/operators cpu data data data memory hierarchy data system kernel Stratos Idreos design space it all starts with how we store data every
Data Systems that are Easy to Design, Tune and Use
Stratos Idreos
Stratos Idreos
data system kernel
data data data
algorithms/operators
applications api/sql
cpu memory hierarchy
Stratos Idreos
it all starts with how we store data every bit matters
design space
Stratos Idreos
it all starts with how we store data every bit matters
design space no fixed decisions from static to dynamic designs
Stratos Idreos
today
Stratos Idreos
today tomorrow
Stratos Idreos
soon everyone will need to be a “data scientist”
hmm, my data is too big :(
Stratos Idreos
not always sure what we are looking for (until we find it)
data exploration
data has always been big
volume velocity variety veracity
Stratos Idreos
years daily data
[IBMbigdata]
Stratos Idreos
years daily data
[IBMbigdata]
years daily data
[StratosGuess]
data* skills
data system design, set-up, tune, use
Stratos Idreos
design & build data systems that are easy to: (years)
Stratos Idreos
design & build data systems that are easy to: (years) set-up & tune (months)
Stratos Idreos
design & build data systems that are easy to: (years) set-up & tune (months) use (hours/days)
Stratos Idreos
timeline storage indexing query
expert users - idle time - workload knowledge
too many preparation options lead to complex installation schema load
Stratos Idreos
users/applications declarative interface
ask what you want
db system DBA
Stratos Idreos
users/applications data system 1 data system 2 … need to choose the proper system & workloads/ applications change rapidly DBA2 DBA1
Stratos Idreos
storage indexing query schema load
be able to query the data immediately & with good performance
Stratos Idreos
storage indexing query schema load
be able to query the data immediately & with good performance
raw data
knowledge
Stratos Idreos
tune= create proper indices offline performance 10-100X
load indexing query
storage
Stratos Idreos
tune= create proper indices offline performance 10-100X
but it depends on workload!
which indices to build?
and when to build them?
load indexing query
storage
Stratos Idreos
query load indexing storage
Stratos Idreos
timeline query load indexing storage
Stratos Idreos
timeline
sample workload
query load indexing storage
Stratos Idreos
timeline
sample workload analyze
query load indexing storage
Stratos Idreos
timeline
sample workload analyze create indices
query load indexing storage
Stratos Idreos
timeline
sample workload analyze create indices query
query load indexing storage
Stratos Idreos
timeline
sample workload analyze create indices query
complex and time consuming process
query load indexing storage
Stratos Idreos
timeline
sample workload analyze create indices query
complex and time consuming process
query load indexing storage
human administrators + auto-tuning tools
Stratos Idreos
what can go wrong?
not enough idle time to finish proper tuning by the time we finish tuning, the workload changes
big data V’s
volume velocity variety veracity
not enough space to index all data not enough money - energy - resources
Stratos Idreos
what can go wrong?
not enough idle time to finish proper tuning by the time we finish tuning, the workload changes
big data V’s
volume velocity variety veracity
not enough space to index all data not enough money - energy - resources
Stratos Idreos
database cracking
Stratos Idreos
database cracking
idle time
workload knowledge external tools
human control
Stratos Idreos
database cracking
auto-tuning database kernels
incremental, adaptive, partial indexing
idle time
workload knowledge external tools
human control
Stratos Idreos
database cracking
auto-tuning database kernels
incremental, adaptive, partial indexing indexing initialization querying
idle time
workload knowledge external tools
human control
Stratos Idreos
database cracking
auto-tuning database kernels
incremental, adaptive, partial indexing indexing initialization querying
idle time
workload knowledge external tools
human control
Stratos Idreos
database cracking
auto-tuning database kernels
incremental, adaptive, partial indexing indexing initialization querying
idle time
workload knowledge external tools
human control
every query is treated as an advice
Stratos Idreos
A B C D
... ...
relation1/table1
column-store database a fixed-width and dense array per attribute
Database Cracking CIDR 2007
Stratos Idreos
A B C D
... ...
relation1/table1
column-store database a fixed-width and dense array per attribute
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
sort
1 2 3 4 6 7 8 9 11 12 13 14 16 19
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
sort
1 2 3 4 6 7 8 9 11 12 13 14 16 19
binary search
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
sort
1 2 3 4 6 7 8 9 11 12 13 14 16 19
binary search
result
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
sort
1 2 3 4 6 7 8 9 11 12 13 14 16 19
binary search
result
time + knowledge
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 result
gain knowledge on how data is organized
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14
dynamically/on-the-fly within the select-operator
result
gain knowledge on how data is organized
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14
Q2: select R.A from R where R.A>7 and R.A<=16
dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14
Q2: select R.A from R where R.A>7 and R.A<=16
dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 4 2 1 3 6 7 9 8 13 12 11 14 16 19 piece1: A<=7 piece2: 7<A<=10
Q2: select R.A from R where R.A>7 and R.A<=16
dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 4 2 1 3 6 7 9 8 13 12 11 14 16 19 piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14
Q2: select R.A from R where R.A>7 and R.A<=16
dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 4 2 1 3 6 7 9 8 13 12 11 14 16 19 piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16
Q2: select R.A from R where R.A>7 and R.A<=16
dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 4 2 1 3 6 7 9 8 13 12 11 14 16 19 piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16
Q2: select R.A from R where R.A>7 and R.A<=16
dynamically/on-the-fly within the select-operator
result
Database Cracking CIDR 2007
Stratos Idreos
13 16 4 9 2 12 7 1 19 3 14 11 8 6
column A
Q1: select R.A from R where R.A>10 and R.A<14
4 9 2 7 1 3 8 6 13 12 11 16 19 14 piece1: A<=10 piece2: 10<A<14 piece3: A>=14 4 2 1 3 6 7 9 8 13 12 11 14 16 19 piece1: A<=7 piece2: 7<A<=10 piece3: 10<A<14 piece4: 14<=A<=16 piece5: A>16
Q2: select R.A from R where R.A>7 and R.A<=16
dynamically/on-the-fly within the select-operator
result
the more we crack, the more we learn
Database Cracking CIDR 2007
Stratos Idreos
Database Cracking CIDR 2007
Stratos Idreos
select [15,55]
Database Cracking CIDR 2007
Stratos Idreos
select [15,55]
Database Cracking CIDR 2007
Stratos Idreos
10 20 30 40 50 60 select [15,55]
Database Cracking CIDR 2007
Stratos Idreos
10 20 30 40 50 60 select [15,55] select [15,55]
Database Cracking CIDR 2007
Stratos Idreos
10 20 30 40 50 60 select [15,55] select [15,55]
Database Cracking CIDR 2007
Stratos Idreos
10 20 30 40 50 60 select [15,55] select [15,55]
pieces become smaller and smaller touch at most two pieces at a time
Database Cracking CIDR 2007
Stratos Idreos
100K random selections random selectivity random value ranges in a 10 million integer column
set-up Scan Full Index Crack
0.001 0.01 0.1 1 10 100 1000 1 10 100 1000 10000 100000 Response time (secs) Query sequence (x1000)
continuous adaptation
Database Cracking CIDR 2007
Stratos Idreos
100K random selections random selectivity random value ranges in a 10 million integer column
almost no initialization overhead set-up Scan Full Index Crack
0.001 0.01 0.1 1 10 100 1000 1 10 100 1000 10000 100000 Response time (secs) Query sequence (x1000)
continuous adaptation
Database Cracking CIDR 2007
Stratos Idreos
100K random selections random selectivity random value ranges in a 10 million integer column
almost no initialization overhead continuous improvement set-up Scan Full Index Crack
0.001 0.01 0.1 1 10 100 1000 1 10 100 1000 10000 100000 Response time (secs) Query sequence (x1000)
continuous adaptation
Database Cracking CIDR 2007
Stratos Idreos
100K random selections random selectivity random value ranges in a 10 million integer column
almost no initialization overhead continuous improvement set-up Scan Full Index Crack
0.001 0.01 0.1 1 10 100 1000 1 10 100 1000 10000 100000 Response time (secs) Query sequence (x1000)
continuous adaptation
Database Cracking CIDR 2007
Stratos Idreos
10K random selections selectivity 10% random value ranges in a 30 million integer column
Database Cracking CIDR 2007
set-up
0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Cumulative average response time (secs) Query sequence
Scan Full Index Crack
continuous adaptation
Stratos Idreos
10K random selections selectivity 10% random value ranges in a 30 million integer column
Database Cracking CIDR 2007
set-up
0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Cumulative average response time (secs) Query sequence
Scan Full Index Crack
continuous adaptation
Stratos Idreos
10K random selections selectivity 10% random value ranges in a 30 million integer column
10K queries later, Full Index still has not amortized the initialization costs Database Cracking CIDR 2007
set-up
0.004 200 0.001 0.01 0.1 1 10 100 1 10 100 1000 10000 Cumulative average response time (secs) Query sequence
Scan Full Index Crack
continuous adaptation
Stratos Idreos
A
table1
Stratos Idreos
A B C D
... ...
table1
Stratos Idreos
A B C D
... ...
table1
select R.A from R where R.A>10 and R.A<14
Stratos Idreos
A B C D
... ...
table1
select R.A from R where R.A>10 and R.A<14 select max(R.A),max(R.B),max(S.A),max(S.B) from R,S where v1 <R.C<v2 and v3 <R.D<v4 and v5 <R.E<v6 and k1 <S.C<k2 and k3 <S.D<k4 and k5 <S.E<k6 and R.F = S.F
Stratos Idreos
A B C D
... ...
table1
select R.A from R where R.A>10 and R.A<14 select max(R.A),max(R.B),max(S.A),max(S.B) from R,S where v1 <R.C<v2 and v3 <R.D<v4 and v5 <R.E<v6 and k1 <S.C<k2 and k3 <S.D<k4 and k5 <S.E<k6 and R.F = S.F
updates joins concurrency control ...
cracking databases
updates
(SIGMOD07)
>1 columns
storage- restrictions
(SIGMOD09)
benchmarking
(TPCTC10)
robustne
concurrency control
(PVLDB12)
algorithms
(PVLDB11)
basics
(CIDR07)
multi-cores
(SIGMOD15)
hadoop
(Yale/Saarland)
b-trees
(HP Labs)
>1 columns
(SIGMOD09)
robustness
(PVLDB12)
adaptive storage
(SIGMOD14)
time-series
(SIGMOD14)
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive... base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment
A B C D
table 1
A B C D
table 2 base data
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment
A B C D
table 1
A B C D
table 2 base data
sort in caches
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment
A B C D
table 1
A B C D
table 2 base data
sort in caches crack joins
A B C D
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment
A B C D
table 1
A B C D
table 2 base data
sort in caches crack joins
A B C D
lightweight locking
q1 q2
Stratos Idreos
cracking tangram
A B C D
table 1 table 2 as queries arrive...
partial materialization partial indexing continuous adaptation storage adaptation no tuple reconstruction adaptive alignment
A B C D
table 1
A B C D
table 2 base data
sort in caches crack joins
A B C D
lightweight locking stochastic cracking
query random
Stratos Idreos
70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000
MonetDB Presorted MySQL Presorted Response time (milli secs)
Sideways Cracking, SIGMOD 09
Stratos Idreos
70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000
MonetDB Presorted MySQL Presorted Response time (milli secs)
normal MonetDB selection cracking
Sideways Cracking, SIGMOD 09
Stratos Idreos
70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000
MonetDB Presorted MySQL Presorted
preparation cost 3-14 minutes
Response time (milli secs)
presorted MonetDB normal MonetDB selection cracking
Sideways Cracking, SIGMOD 09
Stratos Idreos
70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000
MonetDB Presorted MySQL Presorted
preparation cost 3-14 minutes
Response time (milli secs)
presorted MonetDB MonetDB with sideways cracking normal MonetDB selection cracking
Sideways Cracking, SIGMOD 09
Stratos Idreos
70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000
MonetDB Presorted MySQL Presorted
preparation cost 3-14 minutes
Response time (milli secs)
presorted MonetDB MonetDB with sideways cracking normal MonetDB selection cracking
Sideways Cracking, SIGMOD 09
Stratos Idreos
70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000
MonetDB Presorted MySQL Presorted
preparation cost 3-14 minutes
Response time (milli secs)
presorted MonetDB MonetDB with sideways cracking normal MonetDB selection cracking
Sideways Cracking, SIGMOD 09
Stratos Idreos
70 330 100 150 200 250 300 5 10 15 20 25 30 Query sequence TPC-H Query 15 764 420 1000 10000
MonetDB Presorted MySQL Presorted
preparation cost 3-14 minutes
Response time (milli secs)
presorted MonetDB MonetDB with sideways cracking normal MonetDB selection cracking
Sideways Cracking, SIGMOD 09
Stratos Idreos
cracking on Skyserver (4TB)
(Sloan Digital Sky Survey, www.sdss.org)
cracking answers 160.000 queries while full indexing is still half way creating one index
Stochastic Cracking, PVLDB 12
Stratos Idreos
storage indexes query schema load
Stratos Idreos
adaptive loading (NoDB, CIDR11/SIGMOD12)
storage indexes query schema load
Stratos Idreos
adaptive storage (H20, SIGMOD14) adaptive loading (NoDB, CIDR11/SIGMOD12)
storage indexes query schema load
Stratos Idreos
adaptive storage (H20, SIGMOD14) adaptive time series indexing (ADS, SIGMOD14) adaptive loading (NoDB, CIDR11/SIGMOD12)
storage indexes query schema load
Stratos Idreos
data systems that are easy to design
(storage, data flow, algorithms, tuning, etc)
Stratos Idreos
e.g., column-stores: first ideas in 80s, first advanced architectures in 90s, first rather complete designs in early 2000s, industry adoption 2010+ still no indexing, cost based optimizations, …
Stratos Idreos
application requirements hardware budget energy profile performance
(hardware and requirements change continuously and rapidly)
conflicting goals moving target
Stratos Idreos
data systems design (and research) is kind of an art
Stratos Idreos
disk memory flash
…
Stratos Idreos
data+queries+hardware
data system
self-designing data systems
ACM SIGMOD Blog, June’15
Stratos Idreos
data+queries+hardware
data system
self-designing data systems
ACM SIGMOD Blog, June’15
easy to design adapt to environment
Stratos Idreos
adaptivity across architecture borders
row-store column-store key-value store hybrid store
Stratos Idreos
data systems that are easy to use
dbTouch
show me something interesting
Queriosity DATA
Stratos Idreos
data systems today
allow us to answer queries fast
data systems tomorrow
should allow us to find fast which queries to ask
db
db explore
Stratos Idreos
every query is treated as an advice
instead of making fixed decisions
Stratos Idreos
http://daslab.seas.harvard.edu/
Martin Kersten Stefan Manegold Goetz Graefe Harumi Kuno Anastasia Ailamaki Themis Palpanas Eleni Petraki Ioannis Alagiannis Miguel Branco Renata Borovica Erietta Liarou Felix Halim Ronald Yap Panos Karras Kostas Zoumpatianos Manos Athanassoulis Lukas Maas Abdul Wasay Mike Kester Dhruv Gupta
Stratos Idreos
http://daslab.seas.harvard.edu/
thank you!
Martin Kersten Stefan Manegold Goetz Graefe Harumi Kuno Anastasia Ailamaki Themis Palpanas Eleni Petraki Ioannis Alagiannis Miguel Branco Renata Borovica Erietta Liarou Felix Halim Ronald Yap Panos Karras Kostas Zoumpatianos Manos Athanassoulis Lukas Maas Abdul Wasay Mike Kester Dhruv Gupta