What You See is What You Get? Renata Borovica , Ioannis Alagiannis, - - PowerPoint PPT Presentation

what you see is what you get
SMART_READER_LITE
LIVE PREVIEW

What You See is What You Get? Renata Borovica , Ioannis Alagiannis, - - PowerPoint PPT Presentation

Automated Physical Designers: What You See is What You Get? Renata Borovica , Ioannis Alagiannis, Anastasia Ailamaki Physical designers estimates Workload Workload Constraints Potentially Actually ? Arrgh! 50% better! 10% worse!


slide-1
SLIDE 1

Automated Physical Designers: What You See is What You Get?

Renata Borovica, Ioannis Alagiannis, Anastasia Ailamaki

slide-2
SLIDE 2

Configuration Workload Constraints

Physical designers’ estimates

2

DBA

ID val time 8 22 11a.m 23 33 12a.m 92 88 12a.m ID val sales 1 21 8% 2 21 12% 3 77 33%

Estimated improvement 50% Actually 10% worse! Arrgh!

Database

Overestimates = User frustration Underestimates = Missing opportunities

?

X

Potentially 50% better!

Physical Designer

Workload Configuration Estimated improvement 5%

slide-3
SLIDE 3

Our approach

3

Compare existing physical database designers in terms

  • f predictability (actual vs. estimated improvement).

Query

  • ptimizer

What-lf Statistics Database

Space budget

  • Num. of queries

Updates Physical

Designer

slide-4
SLIDE 4

Experimental setup

4

  • Hardware

– 2 x 4-core AMD 2.7GHz, 32 GB, Win 2008 R2 – I/O: 2 x 750 GB SATA 7200rpm, RAID 0, 90 MB/s

  • Commercial DBMS

– System A, System B, System C – Buffer pool 20% of DB size, cold runs, updated statistics

  • Workloads

– TPC-H: SF (10 and 100), 17 queries – NREF: 6.7GB, 200 queries

slide-5
SLIDE 5

Experimental methodology

5

Metric Label Formula Actual improvement (%) IA 1 − Tuned_time Original_time × 100 Estimated tuned time (sec) ETT Original_time − Estimated_improvement × Original_time 100 Relative estimation error (%) REE 𝐅𝐔𝐔 − 𝐔𝐯𝐨𝐟𝐞_𝐮𝐣𝐧𝐟 𝐔𝐯𝐨𝐟𝐞_𝐮𝐣𝐧𝐟 × 𝟐𝟏𝟏

ID val time 8 22 11a.m 23 33 12a.m 92 88 12a.m ID val sales 1 21 8% 2 21 12% 3 77 33%

Database Physical Designer Workload

  • 2. Tune
  • 3. Apply
  • 1. Run
  • 4. Re-run
  • 5. Calculate…
slide-6
SLIDE 6

Impact of space budget

6

Setting: TPC-H, SF10, Unlimited time

  • 20

20 40 60 80 100

5GB 15GB Unbounded 5GB 15GB Unbounded 5GB System A System B System C

Improvement (%) Space budget size

Estimated I. Actual I.

Improvement usually higher than estimated Performance hurt in Systems B and C

~ ~

  • 776%
  • 219%
slide-7
SLIDE 7

Analyzing performance degradation

7

0.0 0.2 0.4 0.6 0.8 1.0 1.2 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q16 Q18 Q19

Normalized exec. time TPC-H Workload

Original Tuned 15GB

Setting: TPC-H, SF10, System B, Space budget 15GB 75x 5x Two queries prolonged workload execution 8 times

slide-8
SLIDE 8

Cause for sub-optimal plans

8

Optimizer’s mistakes -> mislead designer -> hurt predictability

  • Cardinality errors
  • Order of magnitude more tuples
  • 75x longer execution time!
  • Cost model
  • Wrong decision of cost model
  • 5x longer execution time!

Part Part Index scan Table lookup Estimated: 305K Actual: 298K … Lineitem Lineitem Index scan Table lookup Estimated: 6K Actual: 3.2M … Part Table scan Nested loop join Actual: 108K Estimated: 192K

slide-9
SLIDE 9

Increasing number of queries

9

Setting: NREF, Space budget 20GB, Time budget 30min Improvement lower than estimated Wrong cardinalities hurt performance of System B!

  • 20

20 40 60 80 100 120

20 50 100 200 20 50 100 200 20 50 100 200 System A System B System C

Improvement (%) Number of queries

Estimated I. Actual I.

~

  • 69%

~

  • 60%

~

  • 91%
slide-10
SLIDE 10

Impact of updates

Metric System A System B System C Estimated I. (%) 58.62

  • 2.23

Actual I. (%)

  • 18.3
  • 8.13

Relative error (%) 65.02

  • 9.58

10

Complex trade-off between improvement and maintenance

Setting: NREF, Space budget 20GB, Time budget 30min, 400 statements

slide-11
SLIDE 11

Summary

  • System A

– Relative error 2 – 46 % in read-only workloads – Performance hurt by 18% only in update-intensive workload

  • System B

– Relative error 14 – 92 % in read-only workloads – Performance hurt up to 8x after tuning

  • System C

– Relative error 42 – 87 % in read-only workloads – Performance hurt up to 2x after tuning

11

What you see is what you get NOT

slide-12
SLIDE 12

12

Thank you!

http://dias.epfl.ch