Taster: Self-Tuning, Elastic and Online Approximate Query Processing
Matthaios Olma Odysseas Papapetrou Raja Appuswamy Anastasia Ailamaki
Exploratory Applications
– Scientific exploration
– “Internet of Things” analytics
Online AQP (e.g., Quickr): sample selection at query time
– No preprocessing, no workload knowledge
– ~2x performance, no storage overhead
Offline AQP (e.g., BlinkDB): pre-sampling driven by the expected workload
– Workload knowledge required
– ~10x performance, 0.5-2x storage overhead
[Figure: amortization of offline sampling cost over 200 queries (18 TPC-H query templates) on an 11 node SparkSQL cluster, TPC-H (300GB). Annotations: sampling pays off after 85 queries; sampling pays off after 159 queries.]
Taster injects samplers into the query plan and materializes their output into a summary warehouse
– Update summaries when subplans re-appear
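The reuse step above can be sketched as a cache keyed by a canonical subplan signature: the first execution materializes the summary, and a re-appearing subplan reuses it. The class below is a hypothetical illustration under that assumption, not Taster's actual implementation:

```python
class SummaryWarehouse:
    """Illustrative sketch: cache materialized summaries keyed by a
    subplan signature, so repeated subplans reuse an existing summary
    instead of re-sampling the base data."""

    def __init__(self):
        self._summaries = {}   # subplan signature -> materialized summary
        self.hits = 0
        self.misses = 0

    def signature(self, table, predicate, columns):
        # A real system would canonicalize the plan tree;
        # a tuple of the subplan's parts suffices for the sketch.
        return (table, predicate, tuple(sorted(columns)))

    def get_or_build(self, table, predicate, columns, build_fn):
        key = self.signature(table, predicate, columns)
        if key in self._summaries:
            self.hits += 1                 # subplan re-appeared: reuse
        else:
            self.misses += 1
            self._summaries[key] = build_fn()  # first time: materialize
        return self._summaries[key]

wh = SummaryWarehouse()
build = lambda: [1, 2, 3]                  # stand-in for an expensive sampling pass
wh.get_or_build("lineitem", "l_shipdate < '1995'", ["l_qty"], build)
wh.get_or_build("lineitem", "l_shipdate < '1995'", ["l_qty"], build)
```

The second call is a warehouse hit: the sampling pass runs once, and the repeated subplan pays only a lookup.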
Summary warehouse: observing the query stream (Q1 ... Q6) over a sliding window (w = 2), Taster decides when to materialize a summary and when to use an existing one, keeping only useful summaries
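A minimal sketch of that windowed decision, assuming each query is described by the set of candidate summaries it could use; the names, the workload, and the "recurs within the window" threshold are illustrative, not Taster's actual policy:

```python
from collections import Counter, deque

def plan_materializations(query_summaries, w):
    """Windowed tuner sketch: after each query, count how often each
    candidate summary appears among the last w queries and materialize
    the ones that recur, since reuse is what makes them pay off."""
    window = deque(maxlen=w)        # the last w queries' candidate sets
    materialized = set()
    for summaries in query_summaries:
        window.append(summaries)
        counts = Counter(s for q in window for s in q)
        for s, c in counts.items():
            if c >= 2:              # summary reused within the window
                materialized.add(s)
    return materialized

# Q1 ... Q6, each touching some candidate summaries (named A, B, C).
workload = [{"A"}, {"A", "B"}, {"C"}, {"B"}, {"C"}, {"C"}]
chosen = plan_materializations(workload, w=2)
```

With w = 2, summary A recurs in Q1-Q2 and C recurs in Q5-Q6, so those two are materialized; B never repeats inside a window and is never built.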
Samples: answer a large subset of queries; large size (~10% of input); I/O cost depends on size
Sketches: answer specific queries; compact (~KB); constant access time
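To make the sketch side of that trade concrete, the block below builds a Count-Min sketch, a standard KB-scale synopsis that answers one specific query type (frequency estimates) with constant-time access; the width/depth parameters and the workload are illustrative:

```python
import hashlib
import random

class CountMinSketch:
    """Compact frequency synopsis: depth x width counters (~KB here).
    Estimates never undercount; collisions can only overcount."""

    def __init__(self, width=512, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, item):
        for i in range(self.depth):
            h = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.width

    def add(self, item):
        for i, h in enumerate(self._hashes(item)):
            self.table[i][h] += 1

    def estimate(self, item):
        # Min over rows gives the tightest (over-)estimate.
        return min(self.table[i][h] for i, h in enumerate(self._hashes(item)))

rng = random.Random(0)
data = ["hot"] * 5_000 + [f"key{rng.randrange(10_000)}" for _ in range(5_000)]
cms = CountMinSketch()
for x in data:
    cms.add(x)
est = cms.estimate("hot")
```

The sketch occupies a few thousand counters regardless of input size and answers the frequency query in constant time, whereas a sample of the same data would need to be a sizable fraction of the input to see "hot" reliably.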
Taster components: query execution, query optimization, online tuner, metadata store
[Figure: total execution time of Baseline, Quickr, BlinkDB (50%), Taster (50%), BlinkDB (100%), and Taster (100%). 11 node SparkSQL cluster, TPC-H sf300, 200 queries (18 TPC-H templates).]
[Figure: 11 node SparkSQL cluster, TPC-H sf300, 80 queries (18 TPC-H templates).]
Taster: self-tuning, elastic and online approximate query processing
– In the context of distributed approximate query processing
– With reduced building and storage cost