Making Cost-Based Query Optimization Asymmetry-Aware Daniel Bausch, - PowerPoint PPT Presentation

Making Cost-Based Query Optimization Asymmetry-Aware Daniel Bausch, Ilia Petrov, and Alejandro Buchmann {bausch, petrov, buchmann}@dvs.tu-darmstadt.de 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 1

Asymmetry in new storage devices Writing to Flash memory is slower than reading from it This also applies to emerging non-volatile memories (PCM, etc.) Small writes to random locations on Flash are even more slow Random reads from Flash are only 1 3 slower than sequential reads 1 1 on Intel X25-E using full command queue 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 2

Cost-Based Query Optimization estimation of run-time before real execution (e.g. in PostgreSQL) model comprised of functions like c ( seqscan ) = c s � R � p + q R ,0 + (˙ c cpu + ˙ q R ) � R � t I/O cost CPU cost order statistics and plan A cost value A select plan B cost value B “best” query planner cost model . . . . plan . . cost value Z plan Z configuration 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 3

Splitting Parameters c sr sequential page read accesses c s sequential page accesses c sw sequential page write accesses c rr random page read accesses c r random page accesses c rw random page write accesses 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 4

Cost functions for “pure load” algorithms cost function kind original replacement sequential scan read only c s c sr index scan read only c s c sr c r c rr bitmap scan read only c s c sr c r c rr TID scan read only c r c rr e materialization write only c s c sw re-scan read only c s c sr 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 5

Cost function of sort algorithm startup � 3 4 c s + 1 � c io ( sort ) = 2 � S � p ⌈ log m n ⌉ 4 c r blktrace stats write read s r s r external sort of unordered data external sort of ordered data sort-merge join � c sw + c rw � � + ⌈ log m n ⌉ c sr + c rr � c io ( sort ) rw = � S � p c sw + ⌈ log m n ⌉ − 1 2 2 startup 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 6

Cost function of hash join startup � � c io ( hashjoin ) = � P i � p c s + � P i � p + 2 � P o � p c s blktrace stats write read s r s r hash join c io ( hashjoin ) rw = � P i � p c rw + � P i � p c sr + � P o � p ( c rw + c sr ) startup 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 7

System and Load MB system with tight memory configuration 4000 ⇒ simulate data-intensive systems under high load application class benchmark based on TPC-H specs 3000 2000 1000 0 obj C L N O P PS R S iCn iLcd iLo iLoq iLp iLrd iLsd iLs iLsp iNr iOc iOod iPSp iPSs iSn iC iN iO iP iPS iR iS RAM e sb w 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 8

Calibration order statistics and plan A cost value A select plan B cost value B “best” query planner cost model . . . . plan . . cost value Z plan Z configuration calibration cost value, based on simulated annealing [Kirk1983], execution time accepts inferior configuration at calibration decreasing probability cache modify by multiplication with logarithmic normally distributed random variable cooling cycle of 100 iterations, restarted 100 times 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 9

Found “Optimal” Settings SSD HDD original model asymmetric model original model asymmetric model c sr = 1.00000 c sr = 1.00000 c s = 1.00000 c s = 1.00000 c sw = 49.91840 c sw = 110.21139 c rr = 5.62724 c rr = 19.25494 c r = 6.77405 c r = 29.04790 c rw = 19.08421 c rw = 20.18467 c cpu = 0.00121 ˙ c cpu = ˙ 0.00003 c cpu = ˙ 0.00280 c cpu = ˙ 0.00082 ˙ ˙ ˙ ˙ ˆ ˆ ˆ ˆ c cpu = 0.03658 c cpu = 0.01608 c cpu = 0.03718 c cpu = 0.00045 c op = 0.00016 c op = 0.00008 c op = 0.00004 c op = 0.00119 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 10

2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 11 Comparison total time [s] 1000 1500 2000 2500 500 0 960637766 2888465430 2988248363 orig. model / calibrated config orig. model / initial config 109994110 2090918519 80136562 742878636 3137579638 3972888868 TPC−H random seed 47261513 644961076 1231071980 SSD 3962584638 4133947234 1671380912 asym. model / calibrated config asym. model / initial config 747421973 4204648090 3759550337 1960789542 3615766274 1073327142 970312735 2473950213 1712708155 3744046782 10000 15000 20000 25000 30000 5000 0 960637766 2888465430 TPC−H random seed 2988248363 109994110 2090918519 HDD 80136562 742878636 3137579638 3972888868 47261513

Individual Query Speed-Up 200% 153% 140% 100% speed−up 50% 48% 36% 22% 16% 14% 10% −17% −10% −74% 7% 7% −1% 5% 5% −4% 5% −1% −1% −1% −2% −1% −1% −4% −0% −1% 4% 3% 3% 3% 4% 2% 2% 2% 2% 2% 2% 0% 0% 0% 0% 2609% 549% faster with asym. model on SSD faster with asym. model on HDD faster with orig. model on SSD faster with orig. model on HDD −100% 01 02 03 04 05 06 07 08 09 10 11 12 13 14 16 17 18 19 20 21 22 tot TPC−H query 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 12

Discussion – Cause for Peaks ⇒ Query 9 700 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 600 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 500 ● ● ● ● ● time of query 9 [s] ● ● ● ● ● ● ● ● ● ● ● ● ● 400 300 ● ● ● ● ● ● ● 200 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 100 orig_default orig_pgcalib acm_default acm_pgcalib 0 960637766 2888465430 2988248363 109994110 2090918519 80136562 742878636 3137579638 3972888868 47261513 644961076 1231071980 3962584638 4133947234 1671380912 747421973 4204648090 3759550337 1960789542 3615766274 1073327142 970312735 2473950213 1712708155 3744046782 seed 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 13

Discussion – Query 9 – the slow plan Nested Loop 509541.375 using pk_orders on orders Nested Loop Index Scan orders.o_orderkey=lineitem.l_orderkey 102642.716 0.209*1934051=404216.659 partsupp.ps_partkey=lineitem.l_partkey using pk_supplier on supplier Merge Join Index Scan Join Filter: lineitem.l_suppkey=partsupp.ps_suppkey supplier.s_suppkey=lineitem.l_suppkey 88687.896 0.006*1934051=11604.306 Index Scan using i_ps_partkey on partsupp Materialize 2986.769 79808.663 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 14

Discussion – Query 9 – the fast plan Nested Loop 68231.426 lineitem.l_suppkey=supplier.s_suppkey AND using pk_orders on orders Hash Join Index Scan lineitem.l_partkey=partsupp.ps_partkey orders.o_orderkey=lineitem.l_orderkey 54756.688 0.008*1468202=11745.616 Seq Scan on lineitem Hash (1 Batch) 13467.469 6460.738 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 15

Discussion – Biggest Improvement ⇒ Query 21 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 400 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● time of query 21 [s] 300 200 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● orig_default orig_pgcalib acm_default acm_pgcalib 0 960637766 2888465430 2988248363 109994110 2090918519 80136562 742878636 3137579638 3972888868 47261513 644961076 1231071980 3962584638 4133947234 1671380912 747421973 4204648090 3759550337 1960789542 3615766274 1073327142 970312735 2473950213 1712708155 3744046782 seed 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel Bausch, Ilia Petrov, Alejandro Buchmann | 16

Making Cost-Based Query Optimization Asymmetry-Aware Daniel Bausch, - PowerPoint PPT Presentation

Making Cost-Based Query Optimization Asymmetry-Aware Daniel Bausch, Ilia Petrov, and Alejandro Buchmann {bausch, petrov, buchmann}@dvs.tu-darmstadt.de 2012-05-21 | Computer Science Department | Databases and Distributed Systems Group | Daniel

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Cost-based Query Optimization Christoph Koch Computing the cost of a query plan We know how

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

St Stress Aware Layout Stress Aware Layout St A A L L t t Optimization Optimization

Query Optimization Through the Looking Glass Some Lessons From Building an LLVM-Based Query

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Introduction Query Execution Engine Implements a set of physical operators 2 key

Particle Hole Asymmetry in the Pseudogap Particle-Hole Asymmetry in the Pseudogap Phase of the

Cost-Based Plan Selection Enumerate, Estimate, Select 304 Cost-Based Plan Selection Execution

CSE 232A Graduate Database Systems Arun Kumar Topic 4: Query Optimization Chapters 12 and

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

Hierarchical Pattern Discovery in Stochastic Lattice Systems Markos Katsoulakis University of

Announcements The second referee report is due March 29th at 5pm The empirical project is due

1 Investment and Policy Clusters What Role is Bioengineering Playing? Health systems:

Plasmodium falciparum Parasites Oct/28/08 Feng, Chi-wei Supervised by Prof. Ching-Tsan Huang,

5/26/2016 Outline of Talk Pelvic Serous Carcinoma: 2014 W.H.O. Update Changes to 2014 WHO

Yes You Can Work! Working While Applying for and Receiving SSA Benefits PRESENTED BY: AMY

Civil Rights and Labor Compliance Through the Life Cycle of a 2017 Project Todays Schedule

Labor Response to Climate Variation in Eastern Africa Valerie Mueller Arizona State University,