 
              Automated Physical Designers: What You See is What You Get? Renata Borovica , Ioannis Alagiannis, Anastasia Ailamaki
Physical designers’ estimates Workload Workload Constraints Potentially Actually ? Arrgh! 50% better! 10% worse! Database Estimated Estimated Physical ID val time ID val sales improvement improvement Designer 8 22 11a.m 1 21 8% 50% 5% 23 33 12a.m 2 21 12% 92 88 12a.m X 3 77 33% DBA Configuration Configuration Overestimates = User frustration Underestimates = Missing opportunities 2
Our approach Compare existing physical database designers in terms of predictability (actual vs. estimated improvement). Database Physical Space budget Query What-lf Designer Num. of queries optimizer Updates Statistics 3
Experimental setup • Hardware – 2 x 4-core AMD 2.7GHz, 32 GB, Win 2008 R2 – I/O: 2 x 750 GB SATA 7200rpm, RAID 0, 90 MB/s • Commercial DBMS – System A, System B, System C – Buffer pool 20% of DB size, cold runs, updated statistics • Workloads – TPC-H: SF (10 and 100), 17 queries – NREF: 6.7GB, 200 queries 4
Experimental methodology 2. Tune Workload 1. Run 4. Re-run Database Physical Designer ID val time 3. Apply ID val sales 8 22 11a.m 1 21 8% 23 33 12a.m 2 21 12% 92 88 12a.m 3 77 33% 5. Calculate… Metric Label Formula 1 − Tuned_time × 100 Actual improvement (%) I A Original_time Original_time − Estimated_improvement × Original_time Estimated tuned time (sec) E TT 100 𝐅 𝐔𝐔 − 𝐔𝐯𝐨𝐟𝐞_𝐮𝐣𝐧𝐟 × 𝟐𝟏𝟏 Relative estimation error (%) R EE 𝐔𝐯𝐨𝐟𝐞_𝐮𝐣𝐧𝐟 5
Impact of space budget Setting : TPC-H, SF10, Unlimited time 100 Estimated I. Actual I. Improvement (%) 80 60 40 20 -219% -776% 0 5GB 15GB Unbounded 5GB 15GB Unbounded 5GB ~ ~ -20 System A System B System C Space budget size Improvement usually higher than estimated Performance hurt in Systems B and C 6
Analyzing performance degradation Setting : TPC-H, SF10, System B, Space budget 15GB Original 1.2 Normalized exec. time 5x 75x Tuned 15GB 1.0 0.8 0.6 0.4 0.2 0.0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q16 Q18 Q19 TPC-H Workload Two queries prolonged workload execution 8 times 7
Cause for sub-optimal plans • Cardinality errors • Cost model Actual: … Estimated: … 3.2M Estimated: Actual: 192K 305K 298K Nested loop join Table lookup Actual: Estimated: Table Index 108K 6K lookup scan Part Index Table scan scan Lineitem Part Part Lineitem - Order of magnitude more tuples - Wrong decision of cost model - 75x longer execution time! - 5x longer execution time! Optimizer’s mistakes -> mislead designer -> hurt predictability 8
Increasing number of queries Setting : NREF, Space budget 20GB, Time budget 30min 120 Estimated I. Actual I. 100 Improvement (%) 80 60 40 20 - 91% - 69% - 60% 0 20 50 100 200 20 50 100 200 20 50 100 200 ~ ~ ~ -20 System A System B System C Number of queries Improvement lower than estimated Wrong cardinalities hurt performance of System B! 9
Impact of updates Setting : NREF, Space budget 20GB, Time budget 30min, 400 statements Metric System A System B System C Estimated I. (%) 58.62 -- 2.23 Actual I. (%) -18.3 -- -8.13 Relative error (%) 65.02 -- 9.58 Complex trade-off between improvement and maintenance 10
Summary • System A – Relative error 2 – 46 % in read-only workloads – Performance hurt by 18% only in update-intensive workload • System B – Relative error 14 – 92 % in read-only workloads – Performance hurt up to 8x after tuning • System C – Relative error 42 – 87 % in read-only workloads – Performance hurt up to 2x after tuning What you see is what you get NOT 11
Thank you! http://dias.epfl.ch 12
Recommend
More recommend