  1. Wentao Wu [1], Yun Chi [2], Shenghuo Zhu [2], Junichi Tatemura [2], Hakan Hacigumus [2], Jeffrey Naughton [1]
     • [1] Dept of Computer Sciences, University of Wisconsin-Madison
     • [2] NEC Laboratories America

  2. Motivation
     • Database as a service (DaaS): a user submits queries to a service provider, which runs them against a hosted database under a Service Level Agreement (SLA).
     • How can we predict the execution time of a query before it runs?

  3. Applications
     • Admission control: run this query or not?
     • Query scheduling: if we decide to run it, when?
     • Progress monitoring: how long should we wait if something is wrong?
     • System sizing: how much hardware is needed to run the query in the given time?

  4. Use Optimizers’ Cost Estimates?
     • Query optimizers already compute cost estimates for queries. Can we just use them?
     • Previous work ([Ganapathi ICDE’09], [Akdere ICDE’12]) concluded that query optimizers’ cost estimates are unusable: naïve scaling gives an average error of 120% (Fig. 5 of [Akdere ICDE’12]).
     • Naïve Scaling: predict the execution time T by scaling the cost estimate C, i.e., T = a · C.

  5. Why Does Naïve Cost Unit Value Scaling Fail?
     • PostgreSQL’s cost model: C = n_s · c_s + n_r · c_r + n_t · c_t + n_i · c_i + n_o · c_o, with default cost units c_s (seq_page_cost) = 1.0, c_r (rand_page_cost) = 4.0, c_t (cpu_tuple_cost) = 0.01, c_i (cpu_index_tuple_cost) = 0.005, c_o (cpu_operator_cost) = 0.0025.
     • Naïve scaling computes T = a · C = c_s' · (n_s + (c_r/c_s) · n_r + (c_t/c_s) · n_t + (c_i/c_s) · n_i + (c_o/c_s) · n_o), where c_s' = a · c_s = a · 1.0 = a.
     • The assumptions required for naïve scaling to work: the ratios between the c’s are correct, and the n’s are correct.
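
A minimal numeric sketch of why those two assumptions matter; every count and per-unit time below is made up for illustration, not taken from the paper:

```python
# Naive scaling T = a * C is exact only if the ratios among the default c's
# match the ratios of the true per-unit times (and the n's are accurate).
true_c  = {"s": 5.5e-2, "r": 6.5e-2, "t": 1.7e-4, "i": 3.4e-5, "o": 1.1e-4}  # ms, hypothetical
default = {"s": 1.0,    "r": 4.0,    "t": 0.01,   "i": 0.005,  "o": 0.0025}  # PostgreSQL defaults
n = {"s": 1000, "r": 200, "t": 50_000, "i": 8_000, "o": 60_000}              # assumed counts

true_time = sum(true_c[k] * n[k] for k in n)   # what the query actually takes (ms)
cost      = sum(default[k] * n[k] for k in n)  # the optimizer's cost estimate
a = true_c["s"] / default["s"]                 # scale factor fitted on seq_page_cost
print(true_time, a * cost)                     # they disagree, because the ratios differ
```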

  6. Beat Naïve Scaling
     • PostgreSQL’s cost model: C = n_s · c_s + n_r · c_r + n_t · c_t + n_i · c_i + n_o · c_o. Unfortunately, both the c’s and the n’s could be incorrect!
     • To beat naïve scaling:
       • Use machine learning ([Ganapathi ICDE’09], [Akdere ICDE’12])
       • Calibrate the c’s and the n’s! (our work)

  7. What if We Use Calibrated c’s and n’s?
     • Cost models become much more effective.
     • Prediction by naïve scaling: T_pred = a · (Σ c · n)
     • Prediction by calibration: T_pred = Σ c′ · n′

  8. Main Idea
     • How can we calibrate the c’s and the n’s?
     • Calibrate the c’s: use profiling queries.
     • Calibrate the n’s: refine cardinality estimates.

  9. Contribution of This Work
     • We proposed a systematic framework to calibrate the cost models used by the query optimizer.
     • We showed that the calibrated cost model is much better than naïvely scaling the cost estimates.
     • We further showed that the calibrated cost model is also much better than the state-of-the-art machine-learning based approaches.

  10. Calibrating The c’s
     • The cost units to calibrate: c_s (seq_page_cost), c_r (rand_page_cost), c_t (cpu_tuple_cost), c_i (cpu_index_tuple_cost), c_o (cpu_operator_cost).
     • Basic idea (an example): suppose R is in memory and we want to know the true c_t and c_o.
       • q1: select * from R, with time t1 = c_t · n_t
       • q2: select count(*) from R, with time t2 = c_t · n_t + c_o · n_o
     • General case: k cost units (i.e., k unknowns) => k queries (i.e., k equations); k = 5 in the case of PostgreSQL.
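
The two-query example translates directly into code; the tuple/operator counts and measured times below are hypothetical placeholders (in practice the counts come from EXPLAIN and the times from running the queries):

```python
# q1 (R in memory): t1 = c_t * n_t              -> solve for c_t
# q2 (R in memory): t2 = c_t * n_t + c_o * n_o  -> then solve for c_o
n_t, n_o = 1_000_000, 1_000_000   # counts reported by the optimizer (hypothetical)
t1, t2 = 170.0, 280.0             # measured wall-clock times in ms (hypothetical)

c_t = t1 / n_t
c_o = (t2 - c_t * n_t) / n_o
print(c_t, c_o)                   # calibrated cpu_tuple_cost and cpu_operator_cost, in ms
```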

  11. How to Pick Profiling Queries?
     • Completeness: each c should be covered by at least one query.
     • Conciseness: the set of queries is incomplete if any query is removed.
     • Simplicity: each query should be as simple as possible.

  12. Profiling Queries For PostgreSQL
     • Isolate the unknowns and solve them one per equation!
     • q1 (R in memory): select * from R; t1 = c_t · n_t1
     • q2 (R in memory): select count(*) from R; t2 = c_t · n_t2 + c_o · n_o2
     • q3 (R in memory): select * from R where R.A < a (R.A with an index); t3 = c_t · n_t3 + c_i · n_i3 + c_o · n_o3
     • q4 (R on disk): select * from R; t4 = c_s · n_s4 + c_t · n_t4
     • q5 (R on disk): select * from R where R.B < b (R.B with an unclustered index); t5 = c_s · n_s5 + c_r · n_r5 + c_t · n_t5 + c_i · n_i5 + c_o · n_o5
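
Since each profiling query introduces exactly one new cost unit, the five unknowns can be solved by substitution in the order c_t, c_o, c_i, c_s, c_r. A sketch of that step; all counts and timings are hypothetical:

```python
# Per-query counts (n_s, n_r, n_t, n_i, n_o), e.g. extracted from EXPLAIN output,
# and measured execution times in ms. All numbers are hypothetical placeholders.
n = {
    "q1": (0, 0, 1_000_000, 0, 0),
    "q2": (0, 0, 1_000_000, 0, 1_000_000),
    "q3": (0, 0, 500_000, 500_000, 500_000),
    "q4": (10_000, 0, 1_000_000, 0, 0),
    "q5": (1_000, 2_000, 500_000, 500_000, 500_000),
}
t = {"q1": 170.0, "q2": 280.0, "q3": 160.0, "q4": 720.0, "q5": 400.0}

c_t = t["q1"] / n["q1"][2]
c_o = (t["q2"] - c_t * n["q2"][2]) / n["q2"][4]
c_i = (t["q3"] - c_t * n["q3"][2] - c_o * n["q3"][4]) / n["q3"][3]
c_s = (t["q4"] - c_t * n["q4"][2]) / n["q4"][0]
c_r = (t["q5"] - c_s * n["q5"][0] - c_t * n["q5"][2]
       - c_i * n["q5"][3] - c_o * n["q5"][4]) / n["q5"][1]
print(c_s, c_r, c_t, c_i, c_o)   # calibrated cost units, in ms per unit
```

In practice each profiling query would be run several times (with the table cached in memory or forced to disk as required) and the averaged times used here.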

  13. Calibrating The n’s
     • The n’s are functions of the N’s (i.e., input cardinalities), so calibrating the n’s reduces to calibrating the N’s.
     • Notation: sc = start-cost, rc = run-cost, tc = sc + rc = total-cost, N_t = # of input tuples.
     • Example 1 (in-memory sort): sc = 2 · N_t · log(N_t) · c_o + tc of child; rc = c_t · N_t
     • Example 2 (nested-loop join): sc = sc of outer child + sc of inner child; rc = c_t · N_t^outer · N_t^inner + N_t^outer · (rc of inner child)
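
A sketch of how the two example operators’ costs would be recomputed once the input cardinalities are refined; the calibrated c_t and c_o values are assumed (hypothetical), the base-2 logarithm is an assumption, and the formulas follow the slide above:

```python
import math

c_t, c_o = 1.7e-4, 1.1e-4   # assumed calibrated cpu_tuple_cost / cpu_operator_cost (ms)

def sort_cost(N_t, child_total_cost):
    """In-memory sort: start-cost and run-cost from the refined input cardinality N_t."""
    sc = 2.0 * N_t * math.log2(N_t) * c_o + child_total_cost
    rc = c_t * N_t
    return sc, rc

def nested_loop_cost(N_outer, N_inner, outer_sc, inner_sc, inner_rc):
    """Nested-loop join: costs from the refined outer/inner input cardinalities."""
    sc = outer_sc + inner_sc
    rc = c_t * N_outer * N_inner + N_outer * inner_rc
    return sc, rc

print(sort_cost(100_000, 12.0))
print(nested_loop_cost(1_000, 2_000, 3.0, 2.0, 0.4))
```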

  14. Refine Cardinality Estimates
     • Cardinality estimation in its traditional role (query optimization) vs. in our case (execution time prediction):
       • # of plans: hundreds/thousands vs. 1
       • Time per plan: must be very short vs. can be a bit longer
       • Precision: important vs. critical
       • Approach: histograms (dominant) vs. sampling (one option)

  15. A Sampling-Based Estimator
     • Estimate the selectivity ρ_q of a select-join query q [Haas et al., J. Comput. Syst. Sci. 1996].
     • Example: q = R1 ⋈ R2. Partition R1 into blocks B_1,1, …, B_1,N1 and R2 into blocks B_2,1, …, B_2,N2. Take n samples (with replacement), each picking one block from each relation and computing its local selectivity, e.g. ρ_1 = |B_1,1 ⋈ B_2,2| / (|B_1,1| × |B_2,2|). The estimate is ρ̂_q = (1/n) · Σ_{j=1..n} ρ_j.
     • The estimator ρ̂_q is unbiased and strongly consistent!
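
A compact sketch of this estimator for an equi-join, with blocks represented as lists of join-key values; the block contents, the sampling budget, and the equality-only join predicate are all illustrative assumptions:

```python
import random

def estimate_selectivity(blocks1, blocks2, n_samples, rng=random.Random(42)):
    """Average, over n_samples block pairs drawn with replacement, of the
    local selectivity |b1 join b2| / (|b1| * |b2|)."""
    total = 0.0
    for _ in range(n_samples):
        b1 = rng.choice(blocks1)
        b2 = rng.choice(blocks2)
        matches = sum(1 for x in b1 for y in b2 if x == y)   # equi-join on the key
        total += matches / (len(b1) * len(b2))
    return total / n_samples   # estimate of |R1 join R2| / (|R1| * |R2|)

# Toy usage with made-up blocks of join-key values:
R1_blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
R2_blocks = [[2, 4], [6, 8], [10, 12]]
print(estimate_selectivity(R1_blocks, R2_blocks, n_samples=1000))
```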

  16. The Cardinality Refinement Algorithm
     • Design the algorithm based on the previous estimator.
     • Problem 1: the estimator needs random I/Os at runtime to take samples. Solution: take samples offline and store them as tables in the database (see the sketch below).
     • Problem 2: query plans usually contain more than one operator. Solution: estimate multiple operators in a single run, by reusing partial results.
     • Problem 3: the estimator only works for select/join operators. Solution: rely on PostgreSQL’s cost models for aggregates.
     • Future work: add estimators for aggregates ([Charikar PODS’00]).
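
A sketch of the first solution, materializing offline samples as ordinary tables; the table names, the sample size, and the ORDER BY random() sampling method are illustrative choices, not the paper’s exact procedure:

```python
import psycopg2  # any DB-API driver works the same way

conn = psycopg2.connect("dbname=tpch")  # assumed connection string
with conn, conn.cursor() as cur:
    for table in ("lineitem", "orders", "customer"):
        cur.execute(f"DROP TABLE IF EXISTS {table}_s")
        # A uniform row sample stored as a regular table, so that at prediction
        # time no random I/O against the base tables is needed.
        cur.execute(
            f"CREATE TABLE {table}_s AS "
            f"SELECT * FROM {table} ORDER BY random() LIMIT 10000"
        )
```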

  17. The Cardinality Refinement Algorithm (Example)
     • Plan for q: an aggregate on top of (R1 ⋈ R2) ⋈ R3. Let q1 = R1 ⋈ R2 and q2 = R1 ⋈ R2 ⋈ R3.
     • Rewrite q1 and q2 over the sample tables R1^s, R2^s, R3^s (samples of R1, R2, R3 stored as tables) and run them, reusing the result of q1 when evaluating q2:
       • ρ̂_q1 = |R1^s ⋈ R2^s| / (|R1^s| × |R2^s|)
       • ρ̂_q2 = |R1^s ⋈ R2^s ⋈ R3^s| / (|R1^s| × |R2^s| × |R3^s|)
     • For the aggregate, use PostgreSQL’s estimates based on the refined input estimates from q2.
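
The rewrite-and-run step can be expressed as a single query over the sample tables, reusing the two-way join when counting the three-way join. The table names (R1_s, R2_s, R3_s), the join columns (a, b), and the CTE-based formulation are assumptions for illustration:

```python
import psycopg2  # assumed DB-API driver

def refined_join_sizes(conn, full_sizes, sample_sizes):
    """full_sizes / sample_sizes: (|R1|, |R2|, |R3|) and (|R1_s|, |R2_s|, |R3_s|)."""
    with conn.cursor() as cur:
        # One pass over the samples: the two-way join j12 is computed once and
        # reused for the three-way count (solution 2 on the previous slide).
        cur.execute("""
            WITH j12 AS (SELECT R2_s.b FROM R1_s JOIN R2_s ON R1_s.a = R2_s.a)
            SELECT (SELECT count(*) FROM j12),
                   (SELECT count(*) FROM j12 JOIN R3_s ON j12.b = R3_s.b)
        """)
        c12, c123 = cur.fetchone()
    n1, n2, n3 = full_sizes
    s1, s2, s3 = sample_sizes
    rho12, rho123 = c12 / (s1 * s2), c123 / (s1 * s2 * s3)
    return rho12 * n1 * n2, rho123 * n1 * n2 * n3   # refined |q1| and |q2|
```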

  18. Experimental Settings
     • PostgreSQL 9.0.4, Linux 2.6.18
     • TPC-H 1GB and 10GB databases
     • Both uniform and skewed data distributions
     • Two different hardware configurations:
       • PC1: 1-core 2.27 GHz Intel CPU, 2GB memory
       • PC2: 8-core 2.40 GHz Intel CPU, 16GB memory

  19. Calibrating Cost Units
     PC1:
       Cost unit                    Calibrated (ms)   Calibrated (normalized to c_s)   Default
       c_s: seq_page_cost           5.53e-2           1.0                              1.0
       c_r: rand_page_cost          6.50e-2           1.2                              4.0
       c_t: cpu_tuple_cost          1.67e-4           0.003                            0.01
       c_i: cpu_index_tuple_cost    3.41e-5           0.0006                           0.005
       c_o: cpu_operator_cost       1.12e-4           0.002                            0.0025
     PC2:
       Cost unit                    Calibrated (ms)   Calibrated (normalized to c_s)   Default
       c_s: seq_page_cost           5.03e-2           1.0                              1.0
       c_r: rand_page_cost          4.89e-1           9.7                              4.0
       c_t: cpu_tuple_cost          1.41e-4           0.0028                           0.01
       c_i: cpu_index_tuple_cost    3.34e-5           0.00066                          0.005
       c_o: cpu_operator_cost       7.10e-5           0.0014                           0.0025

  20. Prediction Precision
     • Metric of precision: Mean Relative Error (MRE).
     • Dynamic database workloads: unseen queries frequently occur.
     • Compare with existing approaches: naïve scaling and more complex machine-learning approaches.
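
For concreteness, a one-function sketch of the metric, assuming the usual definition of mean relative error over a set of test queries:

```python
def mean_relative_error(predicted, actual):
    # MRE = (1/N) * sum(|T_pred - T_actual| / T_actual) over the test queries
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

print(mean_relative_error([10.0, 45.0, 3.0], [12.0, 50.0, 2.5]))   # toy values
```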

  21. Existing Machine-Learning Methods
     • The idea: represent a query as a feature vector and train a regression model.
     • SVM [Akdere ICDE’12]
     • REP trees [Xiong SoCC’11]
     • KCCA [Ganapathi ICDE’09]: not compared against, since [Akdere ICDE’12] is better.
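
A minimal sketch of what this family of predictors looks like; the feature encoding (per-operator counts), the training data, and the choice of an RBF-kernel SVM regressor are illustrative assumptions rather than any one paper’s exact setup:

```python
import numpy as np
from sklearn.svm import SVR

# Each query plan is encoded as a feature vector, e.g. counts of operator types
# [#seq scans, #index scans, #hash joins, #sorts]; richer features are possible.
X_train = np.array([
    [2, 1, 0, 1],
    [1, 0, 1, 0],
    [3, 2, 1, 1],
    [1, 1, 0, 0],
])
y_train = np.array([12.0, 3.5, 20.1, 4.2])   # observed runtimes in seconds (made up)

model = SVR(kernel="rbf").fit(X_train, y_train)
print(model.predict([[2, 1, 1, 0]]))          # predicted runtime for an unseen plan
```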

  22. Precision on TPC-H 1GB DB
     • Uniform data; compared predictors: Naïve Scaling, E_t, E_o, E_s.
     • E_t: c’s (calibrated) + n’s (true cardinalities)
     • E_o: c’s (calibrated) + n’s (cardinalities by optimizer)
     • E_s: c’s (calibrated) + n’s (cardinalities by sampling)

  23. Precision on TPC-H 1GB DB (Cont.)
     • Skewed data.
     • E_t: c’s (calibrated) + n’s (true cardinalities)
     • E_o: c’s (calibrated) + n’s (cardinalities by optimizer)
     • E_s: c’s (calibrated) + n’s (cardinalities by sampling)

  24. Precision on TPC-H 10GB DB
     • Uniform data (similar results on skewed data).
     • E_t: c’s (calibrated) + n’s (true cardinalities)
     • E_o: c’s (calibrated) + n’s (cardinalities by optimizer)
     • E_s: c’s (calibrated) + n’s (cardinalities by sampling)
