What You See is What You Get? Renata Borovica , Ioannis Alagiannis, - - PowerPoint PPT Presentation
What You See is What You Get? Renata Borovica , Ioannis Alagiannis, - - PowerPoint PPT Presentation
Automated Physical Designers: What You See is What You Get? Renata Borovica , Ioannis Alagiannis, Anastasia Ailamaki Physical designers estimates Workload Workload Constraints Potentially Actually ? Arrgh! 50% better! 10% worse!
Configuration Workload Constraints
Physical designers’ estimates
2
DBA
ID val time 8 22 11a.m 23 33 12a.m 92 88 12a.m ID val sales 1 21 8% 2 21 12% 3 77 33%
Estimated improvement 50% Actually 10% worse! Arrgh!
Database
Overestimates = User frustration Underestimates = Missing opportunities
?
X
Potentially 50% better!
Physical Designer
Workload Configuration Estimated improvement 5%
Our approach
3
Compare existing physical database designers in terms
- f predictability (actual vs. estimated improvement).
Query
- ptimizer
What-lf Statistics Database
Space budget
- Num. of queries
Updates Physical
Designer
Experimental setup
4
- Hardware
– 2 x 4-core AMD 2.7GHz, 32 GB, Win 2008 R2 – I/O: 2 x 750 GB SATA 7200rpm, RAID 0, 90 MB/s
- Commercial DBMS
– System A, System B, System C – Buffer pool 20% of DB size, cold runs, updated statistics
- Workloads
– TPC-H: SF (10 and 100), 17 queries – NREF: 6.7GB, 200 queries
Experimental methodology
5
Metric Label Formula Actual improvement (%) IA 1 − Tuned_time Original_time × 100 Estimated tuned time (sec) ETT Original_time − Estimated_improvement × Original_time 100 Relative estimation error (%) REE 𝐅𝐔𝐔 − 𝐔𝐯𝐨𝐟𝐞_𝐮𝐣𝐧𝐟 𝐔𝐯𝐨𝐟𝐞_𝐮𝐣𝐧𝐟 × 𝟐𝟏𝟏
ID val time 8 22 11a.m 23 33 12a.m 92 88 12a.m ID val sales 1 21 8% 2 21 12% 3 77 33%
Database Physical Designer Workload
- 2. Tune
- 3. Apply
- 1. Run
- 4. Re-run
- 5. Calculate…
Impact of space budget
6
Setting: TPC-H, SF10, Unlimited time
- 20
20 40 60 80 100
5GB 15GB Unbounded 5GB 15GB Unbounded 5GB System A System B System C
Improvement (%) Space budget size
Estimated I. Actual I.
Improvement usually higher than estimated Performance hurt in Systems B and C
~ ~
- 776%
- 219%
Analyzing performance degradation
7
0.0 0.2 0.4 0.6 0.8 1.0 1.2 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q16 Q18 Q19
Normalized exec. time TPC-H Workload
Original Tuned 15GB
Setting: TPC-H, SF10, System B, Space budget 15GB 75x 5x Two queries prolonged workload execution 8 times
Cause for sub-optimal plans
8
Optimizer’s mistakes -> mislead designer -> hurt predictability
- Cardinality errors
- Order of magnitude more tuples
- 75x longer execution time!
- Cost model
- Wrong decision of cost model
- 5x longer execution time!
Part Part Index scan Table lookup Estimated: 305K Actual: 298K … Lineitem Lineitem Index scan Table lookup Estimated: 6K Actual: 3.2M … Part Table scan Nested loop join Actual: 108K Estimated: 192K
Increasing number of queries
9
Setting: NREF, Space budget 20GB, Time budget 30min Improvement lower than estimated Wrong cardinalities hurt performance of System B!
- 20
20 40 60 80 100 120
20 50 100 200 20 50 100 200 20 50 100 200 System A System B System C
Improvement (%) Number of queries
Estimated I. Actual I.
~
- 69%
~
- 60%
~
- 91%
Impact of updates
Metric System A System B System C Estimated I. (%) 58.62
- 2.23
Actual I. (%)
- 18.3
- 8.13
Relative error (%) 65.02
- 9.58
10
Complex trade-off between improvement and maintenance
Setting: NREF, Space budget 20GB, Time budget 30min, 400 statements
Summary
- System A
– Relative error 2 – 46 % in read-only workloads – Performance hurt by 18% only in update-intensive workload
- System B
– Relative error 14 – 92 % in read-only workloads – Performance hurt up to 8x after tuning
- System C
– Relative error 42 – 87 % in read-only workloads – Performance hurt up to 2x after tuning
11
What you see is what you get NOT
12