How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix - - PowerPoint PPT Presentation

how achaeans would construct columns in troy
SMART_READER_LITE
LIVE PREVIEW

How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix - - PowerPoint PPT Presentation

How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte Number of Visas Received 1 0,75 0,5 0,25 0 Alekh Jens Health Level 5 days before CIDR 100


slide-1
SLIDE 1 Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte How Achaeans Would Construct Columns in Troy
slide-2
SLIDE 2 0,25 0,5 0,75 1 Number of Visas Received Alekh Jens
slide-3
SLIDE 3 25 50 75 100 Health Level 5 days before CIDR Alekh Jens percentage
slide-4
SLIDE 4 50 100 150 Average Number of Slides per 20min talk Alekh Jens
slide-5
SLIDE 5 50 100 150 Number of Slides Actually Prepared Alekh Jens
slide-6
SLIDE 6

slide-7
SLIDE 7 What is the problem?
slide-8
SLIDE 8 8

Row-stores

slide-9
SLIDE 9 9

Column-stores

slide-10
SLIDE 10

OLTP OLAP

10
slide-11
SLIDE 11 11
slide-12
SLIDE 12

OLTP OLAP?

12 Can we do efficient OLAP in Row-stores?
slide-13
SLIDE 13 Any solutions out there?
slide-14
SLIDE 14 C-Tables 14 Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 *
slide-15
SLIDE 15 C-Tables 15 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Relation * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 *
slide-16
SLIDE 16 C-Tables 16 Sorted Relation Physical Table T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 *
slide-17
SLIDE 17 C-Tables 17 Sorted Relation Physical Table T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 *
slide-18
SLIDE 18 C-Tables 18 Sorted Relation Physical Table T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 *
slide-19
SLIDE 19 C-Tables 19 Sorted Relation Physical Table T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 *
slide-20
SLIDE 20 C-Tables 20 Sorted Relation Physical Table T market segment f v c 1 automobile 2 3 building 2 5 furniture 1 6 household 3 T phone f v 1 2134 2 2435 3 4312 4 9878 5 6756 6 3425 7 5766 8 8789 T name f v 1 smith 2 steve 3 mark 4 joe 5 kim 6 john 7 jim 8 ian Customer name phone market segment smith 2134 automobile steve 2435 automobile mark 4312 building joe 9878 building kim 6756 furniture john 3425 household jim 5766 household ian 8789 household JOINS ! * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 *
slide-21
SLIDE 21 21 JOINS ! 0.1 1 10 100 1000 10000 1 2 3 4 5 6 7 8 9 10 Query Time (sec) # referenced Attributes C-Table Standard Row (a) Cardinality = 10 * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 * C-Tables
slide-22
SLIDE 22 22 0.1 1 10 100 1000 10000 1 2 3 4 5 Query Time (sec) # referenced Attributes C-Table Standard Row (b) Cardinality = 100 * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009 * C-Tables
slide-23
SLIDE 23 C-Tables 23 Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database * * Nicolas Bruno. Teaching an Old Elephant New Tricks. CIDR 2009
slide-24
SLIDE 24 Column Index 24 Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011 *
slide-25
SLIDE 25 25 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household Relation Physical Table Column Index * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011 * segment size = 4
slide-26
SLIDE 26 26 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household Relation Physical Table Column Index * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011 * segment size = 4
slide-27
SLIDE 27 27 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household Relation Physical Table Column Index * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011 * segment size = 4
slide-28
SLIDE 28 Column Index 28 Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database DEEP CHANGES ! * * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
slide-29
SLIDE 29 Column Index 28 Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database LONG TIME ! * * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
slide-30
SLIDE 30 Column Index 28 Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database SOURCE CODE ! * * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
slide-31
SLIDE 31 Column Index 28 Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database * * P . Larson et. al. SQL Server Column Store Indexes. SIGMOD 2011
slide-32
SLIDE 32 What do we propose?
slide-33
SLIDE 33 Trojan Columns 31 UDF Storage Layer Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database
slide-34
SLIDE 34 32 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household Relation Physical Table Trojan Columns
slide-35
SLIDE 35 32 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household Relation Physical Table Trojan Columns Tuple Iterator Data Parser Data Accesso r (a) Convert row tuples into blobs (b) Store blob data (c) Get next row data write-UDF
slide-36
SLIDE 36 33 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household Relation Physical Table Trojan Columns Tuple Iterator Data Parser Data Accessor (e) Reconstruct row tuples (d) Parse blob data (f) Fetch blob data (g)End of table read-UDF
slide-37
SLIDE 37 Example: TPC-H Query 6 Result quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24 σ π agg (extendedprice * discount) γ lineitem SCAN 34
slide-38
SLIDE 38 Example: TPC-H Query 6 shipd scanUDF Result quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24 σ π agg (extendedprice * discount) γ lineitem SCAN scanUDF Result quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24 σ π agg (extendedprice * discount) γ lineitem SCAN 35
slide-39
SLIDE 39 Example: TPC-H Query 6 shipd scanUDF σ ate BETWEEN σ selectUDF σ Result quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24 σ π agg (extendedprice * discount) γ lineitem SCAN scanUDF Result quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24 σ π agg (extendedprice * discount) γ lineitem SCAN selectUDF Result quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24 σ π agg (extendedprice * discount) γ lineitem SCAN 36
slide-40
SLIDE 40 37 Example: TPC-H Query 14 Result shipdate, discount extendedprice, partkey shipdate BETWEEN ‘1995-09-01’ AND ‘1995-10-01’ σ π agg 100 * SUM(CASE WHEN type LIKE ‘PROMO%’ THEN extendedprice*(1-discount) ELSE 0 END) / SUM(extendedprice*(1-discount)) γ lineitem SCAN part SCAN type, partkey π partkey
slide-41
SLIDE 41 Trojan Columns 38 UDF Storage Layer Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database Plug-and-play
slide-42
SLIDE 42 Trojan Columns 38 UDF Storage Layer Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database Quick Deployment
slide-43
SLIDE 43 Trojan Columns 38 UDF Storage Layer Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database Closed-source
slide-44
SLIDE 44 Trojan Columns 38 UDF Storage Layer Query Processor Relations Physical Representation File 1 File 2 File 3 File n .... Application User Database
slide-45
SLIDE 45 Will this work?
slide-46
SLIDE 46 Experimental Setup Three variants of TPC-H benchmark:
 Commercial closed-source Row-store (Standard Row) Trojan Columns in commercial closed-source Row-store (Trojan Columns)
  • 1. simplified queries , simplified dataset
  • 2. simplified queries , original dataset
  • 3. original queries
, original dataset
slide-47
SLIDE 47 10 20 30 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec) Standard Row Trojan Columns 71.74058 72.41696 Simplified Queries, Simplified Dataset * 41 * Mike Stonebraker et. al. C-Store: A Column Oriented DBMS. VLDB 2005
slide-48
SLIDE 48 10 20 30 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec) Standard Row Trojan Columns 71.74058 72.41696 Simplified Queries, Simplified Dataset * 41 * Mike Stonebraker et. al. C-Store: A Column Oriented DBMS. VLDB 2005 5x
slide-49
SLIDE 49 125 250 375 500 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec) Standard Row Trojan Columns 292.16139 Simplified Queries, Original Dataset 42
slide-50
SLIDE 50 125 250 375 500 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec) Standard Row Trojan Columns 292.16139 Simplified Queries, Original Dataset 42 13x
slide-51
SLIDE 51 43 * tpch.org/tpch 25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec) Standard Row Trojan Columns Original Queries, Original Dataset * The “Good Queries“
slide-52
SLIDE 52 43 * tpch.org/tpch 25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec) Standard Row Trojan Columns 9x Original Queries, Original Dataset * The “Good Queries“
slide-53
SLIDE 53 What are the trade-offs?
slide-54
SLIDE 54 45 * tpch.org/tpch 225 450 675 900 Q3 Q5 Q10 Q19 Query Time (sec) Standard Row Trojan Columns Original Queries, Original Dataset * The “Bad Queries“
slide-55
SLIDE 55 Micro-Benchmark:
 Improvement over Row-store 46 16 2.13 2.13 2.11 2.06 1.55 0.47 0.06 15 4.64 4.62 4.55 4.27 2.57 0.55 0.06 13 5.00 5.00 4.94 4.61 2.70 0.57 0.06 11 5.79 5.82 5.75 5.24 2.87 0.56 0.06 9 6.39 6.38 6.25 5.79 3.11 0.54 0.06 7 7.00 6.96 6.80 6.23 3.17 0.56 0.06 5 10.96 10.94 10.55 9.27 3.75 0.57 0.06 3 12.86 13.57 13.22 11.03 4.16 0.56 0.06 1 17.43 17.61 16.61 13.57 4.39 0.57 0.06 1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00 selectivity (fraction of tuples accessed) # referenced attributes (r)
slide-56
SLIDE 56 Micro-Benchmark:
 Improvement over Row-store 46 16 2.13 2.13 2.11 2.06 1.55 0.47 0.06 15 4.64 4.62 4.55 4.27 2.57 0.55 0.06 13 5.00 5.00 4.94 4.61 2.70 0.57 0.06 11 5.79 5.82 5.75 5.24 2.87 0.56 0.06 9 6.39 6.38 6.25 5.79 3.11 0.54 0.06 7 7.00 6.96 6.80 6.23 3.17 0.56 0.06 5 10.96 10.94 10.55 9.27 3.75 0.57 0.06 3 12.86 13.57 13.22 11.03 4.16 0.56 0.06 1 17.43 17.61 16.61 13.57 4.39 0.57 0.06 1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00 selectivity (fraction of tuples accessed) # referenced attributes (r) Not Affected
slide-57
SLIDE 57 Micro-Benchmark:
 Improvement over Row-store 46 16 2.13 2.13 2.11 2.06 1.55 0.47 0.06 15 4.64 4.62 4.55 4.27 2.57 0.55 0.06 13 5.00 5.00 4.94 4.61 2.70 0.57 0.06 11 5.79 5.82 5.75 5.24 2.87 0.56 0.06 9 6.39 6.38 6.25 5.79 3.11 0.54 0.06 7 7.00 6.96 6.80 6.23 3.17 0.56 0.06 5 10.96 10.94 10.55 9.27 3.75 0.57 0.06 3 12.86 13.57 13.22 11.03 4.16 0.56 0.06 1 17.43 17.61 16.61 13.57 4.39 0.57 0.06 1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00 selectivity (fraction of tuples accessed) # referenced attributes (r) Not Affected Affected
slide-58
SLIDE 58 How far are we?
slide-59
SLIDE 59 Four Systems Commercial Row-store (Standard Row) Trojan Columns in commercial Row-store Commercial Row-store with vendor support for column technology (DBMS-Y) Commercial Column-store (DBMS-Z)
 (a) default TPC-H schema
 (b) tuned schema

slide-60
SLIDE 60 TPC-H Benchmark 49 25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec) Standard Row Trojan Columns DBMS-Y DBMS-Z (a) DBMS-Z (b)
slide-61
SLIDE 61 TPC-H Benchmark 49 25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec) Standard Row Trojan Columns DBMS-Y DBMS-Z (a) DBMS-Z (b) Comparable or Better!
slide-62
SLIDE 62 TPC-H Benchmark 49 25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec) Standard Row Trojan Columns DBMS-Y DBMS-Z (a) DBMS-Z (b) Comparable or Better! Still to catch-up!
slide-63
SLIDE 63 What about query
  • ptimization?
slide-64
SLIDE 64 Rules out query optimization?
slide-65
SLIDE 65 Rules out query optimization? NO!
slide-66
SLIDE 66 Rules out query optimization? NO! QO with aggregate UDFs [SIGMOD’06] Manimal [WebDB’10] HadoopToSQL [EuroSys’10] Black box QO [VLDB’12]
slide-67
SLIDE 67

slide-68
SLIDE 68 The UDF Business Model
slide-69
SLIDE 69 Not just for application-specific code Integrate core database functionality after the fact Column layouts are just one example! Meet customer demands quickly Provide quick feedback before new product release UDFs
slide-70
SLIDE 70 performance slow fast good-enough Summary row-store
slide-71
SLIDE 71 performance slow fast good-enough native column-store Summary row-store
slide-72
SLIDE 72 performance slow fast good-enough Trojan Columns native column-store Summary row-store