advanced
play

ADVANCED DATABASE SYSTEMS Optimizer Implementation (Part III) @ - PowerPoint PPT Presentation

Lect ure # 21 ADVANCED DATABASE SYSTEMS Optimizer Implementation (Part III) @ Andy_Pavlo // 15- 721 // Spring 2020 2 O BSERVATIO N The best plan for a query can change as the database evolves over time. Physical design changes.


  1. Lect ure # 21 ADVANCED DATABASE SYSTEMS Optimizer Implementation (Part III) @ Andy_Pavlo // 15- 721 // Spring 2020

  2. 2 O BSERVATIO N The best plan for a query can change as the database evolves over time. → Physical design changes. → Data modifications. → Prepared statement parameters. → Statistics updates. The query optimizers that we have talked about so far all generate a plan for a query before the DBMS executes a query. 15-721 (Spring 2020)

  3. 3 BAD Q UERY PLAN S The most common problem in a query plan is incorrect join orderings. → This occurs because of inaccurate cardinality estimations that propagate up the plan. If the DBMS can detect how bad a query plan is, then it can decide to adapt the plan rather than continuing with the current sub-optimal plan. 15-721 (Spring 2020)

  4. 4 BAD Q UERY PLAN S Original Plan HASH_JOIN(A ⨝ B ⨝ C,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) 15-721 (Spring 2020)

  5. 4 BAD Q UERY PLAN S Original Plan HASH_JOIN(A ⨝ B ⨝ C,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) Estimated Cardinality: 1000 JOIN C ON A.id = C.id Actual Cardinality: 100000 JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) 15-721 (Spring 2020)

  6. 4 BAD Q UERY PLAN S Original Plan HASH_JOIN(A ⨝ B ⨝ C,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) Estimated Cardinality: 1000 JOIN C ON A.id = C.id Actual Cardinality: 100000 JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) If the optimizer knew the true cardinality, would it have picked the same the join ordering, join algorithms, or access methods? 15-721 (Spring 2020)

  7. 5 WH Y GO O D PLAN S GO BAD Estimating the execution behavior of a plan to determine its quality relative to other plans. These estimations are based on a static summarizations of the contents of the database and its operating environment: → Statistical Models / Histograms / Sampling → Hardware Performance → Concurrent Operations 15-721 (Spring 2020)

  8. 6 ADAPTIVE Q UERY O PTIM IZATIO N Modify the execution behavior of a query by generating multiple plans for it: → Individual complete plans. → Embed multiple sub-plans at materialization points. Use information collected during query execution to improve the quality of these plans. → Can use this data for planning one query or merge the it back into the DBMS's statistics catalog. ADAPTIVE QUERY PROCESSING IN THE LOOKING GLASS CIDR 2 2005 15-721 (Spring 2020)

  9. 7 ADAPTIVE Q UERY O PTIM IZATIO N Approach #1: Modify Future Invocations Approach #2: Replan Current Invocation Approach #3: Plan Pivot Points 15-721 (Spring 2020)

  10. 8 M O DIFY FUTURE IN VO CATIO N S The DBMS monitors the behavior of a query during execution and uses this information to improve subsequent invocations. Approach #1: Plan Correction Approach #2: Feedback Loop 15-721 (Spring 2020)

  11. 9 REVERSIO N - BASED PLAN CO RRECTIO N The DBMS tracks the history of query invocations: → Cost Estimations → Query Plan → Runtime Metrics If the DBMS generates a new plan for a query, then check whether that plan performs worse than the previous plan. → If it regresses, then switch back to the cheaper plans. 15-721 (Spring 2020)

  12. 10 REVERSIO N - BASED PLAN CO RRECTIO N Original Plan HASH_JOIN(A ⨝ B ⨝ C,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) 15-721 (Spring 2020)

  13. 10 REVERSIO N - BASED PLAN CO RRECTIO N Original Plan HASH_JOIN(A ⨝ B ⨝ C,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) Estimated Cost: 1000 Actual Cost: 1000 Execution History 15-721 (Spring 2020)

  14. 10 REVERSIO N - BASED PLAN CO RRECTIO N Original Plan HASH_JOIN(A ⨝ B ⨝ C,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) Estimated Cost: 1000 Actual Cost: 1000 CREATE INDEX idx_b_val ON B (val); Execution CREATE INDEX idx_d_val ON D (val); History 15-721 (Spring 2020)

  15. 10 REVERSIO N - BASED PLAN CO RRECTIO N Original Plan New Plan HASH_JOIN(A ⨝ B ⨝ C,D) NL_JOIN(C ⨝ B ⨝ A,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) NL_JOIN(C ⨝ B,A) IDX_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) NL_JOIN(C,B) SEQ_SCAN(A) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) SEQ_SCAN(C) IDX_SCAN(B) Estimated Cost: 1000 Actual Cost: 1000 CREATE INDEX idx_b_val ON B (val); Execution CREATE INDEX idx_d_val ON D (val); History 15-721 (Spring 2020)

  16. 10 REVERSIO N - BASED PLAN CO RRECTIO N Original Plan New Plan HASH_JOIN(A ⨝ B ⨝ C,D) NL_JOIN(C ⨝ B ⨝ A,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) NL_JOIN(C ⨝ B,A) IDX_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) NL_JOIN(C,B) SEQ_SCAN(A) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) SEQ_SCAN(C) IDX_SCAN(B) Estimated Cost: 1000 Estimated Cost: 800 Actual Cost: 1000 Actual Cost: 1200 CREATE INDEX idx_b_val ON B (val); Execution CREATE INDEX idx_d_val ON D (val); History 15-721 (Spring 2020)

  17. 11 M ICRO SO FT PLAN STITCH ING Combine useful sub-plans from queries to create potentially better plans. → Sub-plans do not need to be from the same query. → Can still use sub-plans even if overall plan becomes invalid after a physical design change. Uses a dynamic programming search (bottom-up) that is not guaranteed to find a better plan. PLAN STITCH: HARNESSING THE BEST OF MANY PLANS VLDB 2018 15-721 (Spring 2020)

  18. 12 M ICRO SO FT PLAN STITCH ING Original Plan New Plan HASH_JOIN(A ⨝ B ⨝ C,D) NL_JOIN(C ⨝ B ⨝ A,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) NL_JOIN(C ⨝ B,A) IDX_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) NL_JOIN(C,B) SEQ_SCAN(B) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) SEQ_SCAN(C) IDX_SCAN(B) CREATE INDEX idx_b_val ON B (val); CREATE INDEX idx_d_val ON D (val); 15-721 (Spring 2020)

  19. 12 M ICRO SO FT PLAN STITCH ING Original Plan New Plan HASH_JOIN(A ⨝ B ⨝ C,D) NL_JOIN(C ⨝ B ⨝ A,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) NL_JOIN(C ⨝ B,A) IDX_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) NL_JOIN(C,B) SEQ_SCAN(B) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) SEQ_SCAN(C) IDX_SCAN(B) CREATE INDEX idx_b_val ON B (val); CREATE INDEX idx_d_val ON D (val); DROP INDEX idx_b_val; 15-721 (Spring 2020)

  20. 12 M ICRO SO FT PLAN STITCH ING Original Plan New Plan HASH_JOIN(A ⨝ B ⨝ C,D) NL_JOIN(C ⨝ B ⨝ A,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) NL_JOIN(C ⨝ B,A) IDX_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) NL_JOIN(C,B) SEQ_SCAN(B) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) SEQ_SCAN(C) IDX_SCAN(B) Sub-Plan Cost: 600 CREATE INDEX idx_b_val ON B (val); CREATE INDEX idx_d_val ON D (val); DROP INDEX idx_b_val; 15-721 (Spring 2020)

  21. 12 M ICRO SO FT PLAN STITCH ING Original Plan New Plan Sub-Plan Cost: 150 HASH_JOIN(A ⨝ B ⨝ C,D) NL_JOIN(C ⨝ B ⨝ A,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) NL_JOIN(C ⨝ B,A) IDX_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) NL_JOIN(C,B) SEQ_SCAN(B) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) SEQ_SCAN(C) IDX_SCAN(B) Sub-Plan Cost: 600 CREATE INDEX idx_b_val ON B (val); CREATE INDEX idx_d_val ON D (val); DROP INDEX idx_b_val; 15-721 (Spring 2020)

  22. 12 M ICRO SO FT PLAN STITCH ING Original Plan New Plan Sub-Plan Cost: 150 HASH_JOIN(A ⨝ B ⨝ C,D) NL_JOIN(C ⨝ B ⨝ A,D) SELECT * FROM A JOIN B ON A.id = B.id HASH_JOIN(A ⨝ B,C) SEQ_SCAN(D) NL_JOIN(C ⨝ B,A) IDX_SCAN(D) JOIN C ON A.id = C.id JOIN D ON A.id = D.id HASH_JOIN(A,B) SEQ_SCAN(C) NL_JOIN(C,B) SEQ_SCAN(B) WHERE B.val = 'WuTang' AND D.val = 'Clan'; SEQ_SCAN(A) SEQ_SCAN(B) SEQ_SCAN(C) IDX_SCAN(B) Sub-Plan Cost: 600 NL_JOIN(C ⨝ B ⨝ A,D) Total Estimated Cost: CREATE INDEX idx_b_val ON B (val); HASH_JOIN(A ⨝ B,C) IDX_SCAN(D) 600 + 150 CREATE INDEX idx_d_val ON D (val); HASH_JOIN(A,B) SEQ_SCAN(C) DROP INDEX idx_b_val; SEQ_SCAN(A) SEQ_SCAN(B) 15-721 (Spring 2020)

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend