SLIDE 1 Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte
How Achaeans Would Construct Columns in Troy
How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix - - PowerPoint PPT Presentation
How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix - - PowerPoint PPT Presentation
How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte Number of Visas Received 1 0,75 0,5 0,25 0 Alekh Jens Health Level 5 days before CIDR 100
SLIDE 2 0,25 0,5 0,75 1
Number of Visas Received
Alekh Jens
SLIDE 3 25 50 75 100
Health Level 5 days before CIDR
Alekh Jens
percentage
SLIDE 4 50 100 150
Average Number of Slides per 20min talk
Alekh Jens
SLIDE 5 50 100 150
Number of Slides Actually Prepared
Alekh Jens
SLIDE 6
“
SLIDE 7 What is the problem?
SLIDE 8 8
Row-stores
SLIDE 9 9
Column-stores
SLIDE 10
OLTP OLAP
10 SLIDE 11 11
SLIDE 12
OLTP OLAP?
12 Can we do efficient OLAP in Row-stores? SLIDE 13 Any solutions out there?
SLIDE 14 C-Tables
14
Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
SLIDE 15 C-Tables
15
Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household
Relation
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
SLIDE 16 C-Tables
16
Sorted Relation Physical Table
T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
SLIDE 17 C-Tables
17
Sorted Relation Physical Table
T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
SLIDE 18 C-Tables
18
Sorted Relation Physical Table
T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
SLIDE 19 C-Tables
19
Sorted Relation Physical Table
T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
SLIDE 20 C-Tables
20
Sorted Relation Physical Table
T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 T phone f v 1 2134 2 2435 3 4312 4 9878 5 6756 6 3425 7 5766 8 8789 T name f v 1 smith 2 steve 3 mark 4 joe 5 kim 6 john 7 jim 8 ian Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household
JOINS !
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
SLIDE 21 21
JOINS !
0.1 1 10 100 1000 10000 1 2 3 4 5 6 7 8 9 10 Query Time (sec) # referenced Attributes
C-Table Standard Row
(a) Cardinality = 10
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
C-Tables
SLIDE 22 22 0.1 1 10 100 1000 10000 1 2 3 4 5 Query Time (sec) # referenced Attributes
C-Table Standard Row
(b) Cardinality = 100
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
*
C-Tables
SLIDE 23 C-Tables
23
Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
*
* Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
SLIDE 24 Column Index
24
Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
*
SLIDE 25 25
Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household
Relation Physical Table
Column Index
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
*
segment size = 4
SLIDE 26 26
Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household
Relation Physical Table
Column Index
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
*
segment size = 4
SLIDE 27 27
Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household
Relation Physical Table
Column Index
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
*
segment size = 4
SLIDE 28 Column Index
28
Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
DEEP CHANGES !
*
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
SLIDE 29 Column Index
28
Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
LONG TIME !
*
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
SLIDE 30 Column Index
28
Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
SOURCE CODE !
*
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
SLIDE 31 Column Index
28
Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
*
* P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
SLIDE 32 What do we propose?
SLIDE 33 Trojan Columns
31
UDF Storage Layer Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
SLIDE 34 32
Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household
Relation Physical Table
Trojan Columns
SLIDE 35 32
Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household
Relation Physical Table
Trojan Columns
Tuple Iterator Data Parser Data Accesso r
(a) Convert row tuples into blobs (b) Store blob data (c) Get next row data
write-UDF
SLIDE 36 33
Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household
Relation Physical Table
Trojan Columns
Tuple Iterator Data Parser
Data Accessor
(e) Reconstruct row tuples (d) Parse blob data (f) Fetch blob data (g)End of table
read-UDF
SLIDE 37 Example: TPC-H Query 6
Result
quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
σ π
agg (extendedprice * discount)
γ
lineitem
SCAN
34
SLIDE 38 Example: TPC-H Query 6
shipd
scanUDF
Result
quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
σ π
agg (extendedprice * discount)
γ
lineitem
SCAN
scanUDF Result
quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
σ π
agg (extendedprice * discount)
γ
lineitem
SCAN
35
SLIDE 39 Example: TPC-H Query 6
shipd
scanUDF
σ
ate BETWEEN
σ
selectUDF
σ
Result
quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
σ π
agg (extendedprice * discount)
γ
lineitem
SCAN
scanUDF Result
quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
σ π
agg (extendedprice * discount)
γ
lineitem
SCAN
selectUDF Result
quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
σ π
agg (extendedprice * discount)
γ
lineitem
SCAN
36
SLIDE 40 37
Example: TPC-H Query 14
Result
shipdate, discount extendedprice, partkey shipdate BETWEEN ‘1995-09-01’ AND ‘1995-10-01’
σ π
agg
100 * SUM(CASE WHEN type LIKE ‘PROMO%’ THEN extendedprice*(1-discount) ELSE 0 END) / SUM(extendedprice*(1-discount))
γ
lineitem
SCAN
part
SCAN
type, partkey
π
partkey
SLIDE 41 Trojan Columns
38
UDF Storage Layer Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
Plug-and-play
SLIDE 42 Trojan Columns
38
UDF Storage Layer Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
Quick Deployment
SLIDE 43 Trojan Columns
38
UDF Storage Layer Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
Closed-source
SLIDE 44 Trojan Columns
38
UDF Storage Layer Query Processor Relations Physical Representation
File 1 File 2 File 3 File n ....
Application
User Database
SLIDE 45 Will this work?
SLIDE 46 Experimental Setup
Three variants of TPC-H benchmark:
Commercial closed-source Row-store (Standard Row) Trojan Columns in commercial closed-source Row-store (Trojan Columns)
- 1. simplified queries , simplified dataset
- 2. simplified queries , original dataset
- 3. original queries
SLIDE 47 10 20 30 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec)
Standard Row Trojan Columns
71.74058 72.41696
Simplified Queries, Simplified Dataset *
41
* Mike Stonebraker et. al. C-Store: A Column Oriented DBMS. VLDB 2005
SLIDE 48 10 20 30 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec)
Standard Row Trojan Columns
71.74058 72.41696
Simplified Queries, Simplified Dataset *
41
* Mike Stonebraker et. al. C-Store: A Column Oriented DBMS. VLDB 2005
5x
SLIDE 49 125 250 375 500 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec)
Standard Row Trojan Columns
292.16139
Simplified Queries, Original Dataset
42
SLIDE 50 125 250 375 500 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec)
Standard Row Trojan Columns
292.16139
Simplified Queries, Original Dataset
42
13x
SLIDE 51 43
* tpch.org/tpch
25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec)
Standard Row Trojan Columns
Original Queries, Original Dataset *
The “Good Queries“
SLIDE 52 43
* tpch.org/tpch
25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec)
Standard Row Trojan Columns
9x
Original Queries, Original Dataset *
The “Good Queries“
SLIDE 53 What are the trade-offs?
SLIDE 54 45
* tpch.org/tpch
225 450 675 900 Q3 Q5 Q10 Q19 Query Time (sec)
Standard Row Trojan Columns
Original Queries, Original Dataset *
The “Bad Queries“
SLIDE 55 Micro-Benchmark:
Improvement over Row-store
46
16
2.13 2.13 2.11 2.06 1.55 0.47 0.06
15
4.64 4.62 4.55 4.27 2.57 0.55 0.06
13
5.00 5.00 4.94 4.61 2.70 0.57 0.06
11
5.79 5.82 5.75 5.24 2.87 0.56 0.06
9
6.39 6.38 6.25 5.79 3.11 0.54 0.06
7
7.00 6.96 6.80 6.23 3.17 0.56 0.06
5
10.96 10.94 10.55 9.27 3.75 0.57 0.06
3
12.86 13.57 13.22 11.03 4.16 0.56 0.06
1
17.43 17.61 16.61 13.57 4.39 0.57 0.06
1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00
selectivity (fraction of tuples accessed) # referenced attributes (r)
SLIDE 56 Micro-Benchmark:
Improvement over Row-store
46
16
2.13 2.13 2.11 2.06 1.55 0.47 0.06
15
4.64 4.62 4.55 4.27 2.57 0.55 0.06
13
5.00 5.00 4.94 4.61 2.70 0.57 0.06
11
5.79 5.82 5.75 5.24 2.87 0.56 0.06
9
6.39 6.38 6.25 5.79 3.11 0.54 0.06
7
7.00 6.96 6.80 6.23 3.17 0.56 0.06
5
10.96 10.94 10.55 9.27 3.75 0.57 0.06
3
12.86 13.57 13.22 11.03 4.16 0.56 0.06
1
17.43 17.61 16.61 13.57 4.39 0.57 0.06
1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00
selectivity (fraction of tuples accessed) # referenced attributes (r)
Not Affected
SLIDE 57 Micro-Benchmark:
Improvement over Row-store
46
16
2.13 2.13 2.11 2.06 1.55 0.47 0.06
15
4.64 4.62 4.55 4.27 2.57 0.55 0.06
13
5.00 5.00 4.94 4.61 2.70 0.57 0.06
11
5.79 5.82 5.75 5.24 2.87 0.56 0.06
9
6.39 6.38 6.25 5.79 3.11 0.54 0.06
7
7.00 6.96 6.80 6.23 3.17 0.56 0.06
5
10.96 10.94 10.55 9.27 3.75 0.57 0.06
3
12.86 13.57 13.22 11.03 4.16 0.56 0.06
1
17.43 17.61 16.61 13.57 4.39 0.57 0.06
1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00
selectivity (fraction of tuples accessed) # referenced attributes (r)
Not Affected Affected
SLIDE 58 How far are we?
SLIDE 59 Four Systems
Commercial Row-store (Standard Row) Trojan Columns in commercial Row-store Commercial Row-store with vendor support for column technology (DBMS-Y) Commercial Column-store (DBMS-Z)
(a) default TPC-H schema
(b) tuned schema
SLIDE 60 TPC-H Benchmark
49
25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec)
Standard Row Trojan Columns DBMS-Y DBMS-Z (a) DBMS-Z (b)
SLIDE 61 TPC-H Benchmark
49
25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec)
Standard Row Trojan Columns DBMS-Y DBMS-Z (a) DBMS-Z (b)
Comparable or Better!
SLIDE 62 TPC-H Benchmark
49
25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec)
Standard Row Trojan Columns DBMS-Y DBMS-Z (a) DBMS-Z (b)
Comparable or Better! Still to catch-up!
SLIDE 63 What about query
- ptimization?
SLIDE 64 Rules out query optimization?
SLIDE 65 Rules out query optimization?
NO!
SLIDE 66 Rules out query optimization?
NO!
QO with aggregate UDFs [SIGMOD’06] Manimal [WebDB’10] HadoopToSQL [EuroSys’10] Black box QO [VLDB’12]
SLIDE 67
“
SLIDE 68 The UDF Business Model
SLIDE 69 Not just for application-specific code Integrate core database functionality after the fact Column layouts are just one example! Meet customer demands quickly Provide quick feedback before new product release
UDFs
SLIDE 70 performance
slow fast good-enough
Summary
row-store
SLIDE 71 performance
slow fast good-enough
native column-store
Summary
row-store
SLIDE 72 performance
slow fast good-enough
Trojan Columns native column-store
Summary
row-store