SLIDE 1

PGCon 2020 Tatsuro Yamada Julien Rouhaud

SLIDE 2

Tatsuro Yamada

  • Works for NTT Comware as a Database Engineer
  • PostgreSQL Contributor
  • Oracle_fdw Committer
  • pg_plan_advsr Author
  • Member of PGConf.Asia Organizers since 2016

Julien Rouhaud

  • Works for VMware
  • PostgreSQL Contributor
  • HypoPG Author
  • pg_qualstats Co-Author

Who we are

SLIDE 3

Agenda

1. What is a Query plan?
2. Why can an inefficient plan be chosen?
3. Our Challenges to get an efficient plan
4. Conclusion

SLIDE 4
  • 1. What is a Query plan?
SLIDE 5
  • 1. What is a Query plan?

SQL

  • Declarative language

Query Plan

  • the planner’s steps to retrieve the requested data
  • contains multiple plan nodes (e.g. scan, join, aggregate, …)
  • tree structure

Planner

  • selects the query plan it estimates to be the most efficient, based on calculated cost

Does planner select an efficient plan every time?

SLIDE 6

Inefficient plan

  • Tends to result in long query execution times
  • EXPLAIN command allows checking a query plan

e.g. EXPLAIN (ANALYZE, BUFFERS) SELECT …

SLIDE 7

Inefficient plan

In this example, using an Index can reduce the execution time

QUERY PLAN

Nested Loop ([...] rows=313475000 width=16) ([...] rows=0 loops=1)
  -> Seq Scan on pgqs t2 ([...] rows=25078 width=8) ([...] rows=25000 loops=1)
       Filter: (val1 = 0)
       Rows Removed by Filter: 25000
  -> Materialize ([...] rows=12500 width=8) ([...] rows=0 loops=25000)
       -> Seq Scan on pgqs t1 ([...] rows=12500 width=8) ([...] rows=0 loops=1)
            Filter: ((val1 = 0) AND (val2 = 0))
            Rows Removed by Filter: 50000

Why can an inefficient plan be chosen?

https://rjuju.github.io/postgresql/2020/02/28/pg_qualstats-2-selectivity-error.html

SLIDE 8
  • 2. Why can an inefficient plan be chosen?

SLIDE 9
  • 2. Why can an inefficient plan be chosen?

Problem: Inefficient plan
Reasons: bad scan method, bad join method, bad join order
Root causes: inaccurate statistics, index deficiency, planner specification

SLIDE 10

2.1. Index deficiency, or indexes not working

Problems

  • No suitable index available
  • Good candidate index can't be used

Solutions

  • Create suitable indexes and check how queries are written
  • Better to create the minimum number of indexes needed for good performance
  • If you have thousands of queries, it’s tough work
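As a sketch of the solution above, with a hypothetical table and predicate (names are illustrative, not from the talk):

```sql
-- Hypothetical workload: queries filter on (customer_id, status).
-- One multi-column index can serve both the single and the combined predicate:
CREATE INDEX orders_customer_status_idx ON orders (customer_id, status);

-- Verify that the planner actually uses it:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped';
```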
SLIDE 11

What are statistics?

  • Sampled data from a table

e.g.

  • Number of rows
  • Number of unique values
  • Most common values

How does planner use statistics?

  • To calculate the cardinality/selectivity of plan nodes, and the cost of query plans
  • The planner assumes the lowest-cost plan is the most efficient one

2.2. Inaccurate statistics

Query + Statistics -> Planner -> Plan

In what cases can statistics be inaccurate?

SLIDE 12

2.2. Inaccurate statistics

Cases

  • a. Autovacuum not properly tuned
  • b. Usage of temporary tables
  • c. Massive imports directly followed by queries or in the same transaction
SLIDE 13

2.2. Inaccurate statistics

  • a. autovacuum not properly tuned

Problem

  • autovacuum appears not to work properly

Solution

  • Tune the thresholds for running analyze by using GUC parameters:
  • autovacuum_analyze_threshold
  • autovacuum_analyze_scale_factor
  • And ideally on a per-table basis
  • ALTER TABLE tbl SET (autovacuum_analyze_... = …)
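For example, the per-table tuning could look like this (table name and thresholds are illustrative):

```sql
-- Re-run ANALYZE after ~1% of rows change, instead of the 10% default:
ALTER TABLE orders SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold = 1000
);
```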
SLIDE 14

2.2. Inaccurate statistics

  • b. Usage of temporary tables

Problem

  • Temporary tables have no statistics by default, as autovacuum won’t see them
  • The planner uses a default row count to calculate costs

Solution

  • Run manual ANALYZE on temporary tables
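A minimal sketch of the temporary-table case (table name is illustrative):

```sql
CREATE TEMP TABLE tmp_ids AS
    SELECT generate_series(1, 100000) AS id;

ANALYZE tmp_ids;  -- without this, the planner falls back to default estimates

EXPLAIN SELECT * FROM tmp_ids WHERE id < 100;
```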
SLIDE 15

2.2. Inaccurate statistics

  • c. Massive imports followed by queries in the same transaction

Problem

  • A query executed just after importing a large number of rows can lead to an inefficient plan, since the planner can’t use up-to-date statistics

Solutions

  • Executing ANALYZE manually is better than waiting for autovacuum
  • Divide the transaction into three processing steps (import, analyze, query processing) if possible

Before: imports + query -> After: imports, analyze, query
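If the import and the query must stay in one transaction, running ANALYZE in between still helps, since ANALYZE works inside a transaction block (table and file names are illustrative):

```sql
BEGIN;
COPY items FROM '/tmp/items.csv' (FORMAT csv);   -- massive import
ANALYZE items;                                   -- refresh statistics before querying
SELECT count(*) FROM items WHERE price > 100;
COMMIT;
```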

SLIDE 16

2.3. Planner specification

Selectivity estimation for AND-ed quals

Problem

  • The planner by default assumes that columns are totally independent for selectivity estimation

Example WHERE zipcode = 75 AND city = ‘Paris’

  • The Paris zipcode is 75, so both predicates are actually redundant
  • If both predicates retain 1% of the rows, applying both should also retain 1% of the rows
  • But the planner will by default multiply the selectivities, assuming that only 0.01% of the rows will be retained
SLIDE 17

2.3. Planner specification

Solution

  • Detect those correlated columns
  • Create extended statistics (CREATE STATISTICS)
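For the zipcode/city example, the extended statistics could be created like this (table name is illustrative; the dependencies statistics kind requires PostgreSQL 10 or later):

```sql
CREATE STATISTICS addr_zip_city (dependencies) ON zipcode, city FROM addresses;
ANALYZE addresses;  -- extended statistics are only populated by ANALYZE
```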
SLIDE 18

Causes and Solutions

Problem: Inefficient plan
Reasons: bad scan method, bad join method, bad join order
Root causes and solutions:

  • Inaccurate statistics -> appropriate settings
  • Index deficiency -> create indexes
  • Planner specification -> create extended statistics

Is it possible to apply the solutions easily?

Plan Tuning

SLIDE 19
  • 3. Our Challenges to get an efficient plan

SLIDE 20
  • 3. Our Challenges to get an efficient plan

3.1. Automatic indexing tool

  • pg_qualstats

3.2. Auto query plan tuning tool

  • pg_plan_advsr

3.3. What's next?

SLIDE 21

3.1. Auto index and extended statistics advising tool: pg_qualstats

3.1.a What is it?
3.1.b How does index suggestion work?
3.1.c Demo
3.1.d Summary

SLIDE 22

3.1.a What is pg_qualstats?

  • PostgreSQL extension that keeps track of predicate (WHERE/JOIN clause) statistics during query execution
  • Selectivity, error in selectivity estimation, number of executions, type of scan…
  • Aggregated per unique query (same as pg_stat_statements)
  • Also provides a global index advisor using those statistics
  • Takes the server’s actual workload into account
  • Uses multiple heuristics to suggest the minimum number of new indexes
  • E.g. preference for multi-column indexes
  • Returns new index suggestions, and lists the predicates that can’t be automatically optimized
  • E.g. operators that don’t have any index AM compatibility
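A minimal usage sketch, assuming pg_qualstats 2.x where the global advisor is exposed as a SQL function returning JSON (check the extension’s README for the exact name and signature on your version):

```sql
CREATE EXTENSION pg_qualstats;

-- ...run the production workload for a while, then ask for suggestions:
SELECT pg_qualstats_index_advisor();
-- The JSON result contains suggested index DDL, plus the predicates
-- that could not be optimized automatically.
```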
SLIDE 23

3.1.b How does index suggestion work?

  • Retrieve the list of “interesting” predicates
  • Filter more than 50% of the rows
  • Using a sequential scan
  • For queries run more than 10 times
SLIDE 24

3.1.b How does index suggestion work?

  • Build the full set of paths, with each AND-ed predicate combination

SLIDE 25

3.1.b How does index suggestion work?

  • Compute a score for each path, giving each predicate a weight corresponding to its number of « simple » predicates

SLIDE 26

3.1.b How does index suggestion work?

  • Highest score is the « best » index
  • A single index optimizing the highest number of predicates
  • Generate the index DDL given the predicate list
  • The correct column order corresponds to the ascending weight in the path!
  • Remove all optimized predicates from the list, and start again until all predicates are optimized

SLIDE 27

3.1.c Demo – the queries

SLIDE 28

3.1.c Demo – optimizable predicates

SLIDE 29

3.1.c Demo – non optimizable predicates

  • The ILIKE (~~*) predicate can’t be optimized automatically. DBA attention is required here to know how to properly handle it.

SLIDE 30

3.1.d Summary

  • Useful extension to extract knowledge about your production workload
  • Helps the DBA focus on the “complex” optimization part thanks to the global index suggestion
  • But also gives a lot of other information that can be used for other parts of optimization
  • Automatically find correlated columns
  • Automatically find problematic statistics
SLIDE 31

3.2. Auto query plan tuning tool: pg_plan_advsr

3.2.a What is this?
3.2.b How it works
3.2.c Demo
3.2.d Summary

SLIDE 32

3.2.a What is pg_plan_advsr, and its merits

pg_plan_advsr

  • allows automatic tuning of a query plan by using a feedback loop

Advantages

  • The machine performs the tuning, rather than humans
  • Tuning is possible even when there is no expert
  • For experts, it is possible to gain new insights
  • Can find an optimal plan
  • Since it can control each plan node, it can create an optimal plan that might not be selected by vanilla PostgreSQL
  • All plan history from the tuning process is available
  • Moreover, any plan from the history can be reproduced
SLIDE 33

3.2.a What is pg_plan_advsr, and its merits

Use-cases

  • System development field
  • Tuning during performance tests and performance troubles
  • Regular reporting process
  • Analytical processing
  • Batch processing
  • Field of study
  • Verify planner's capability
  • You can check the performance of the selected plan when there is no estimation error

How did this tool achieve automatic tuning?

Note: pg_plan_advsr is in the PoC phase, so we recommend using it in a verification environment

  • Refer to the manual for details on its functional restrictions
SLIDE 34

3.2.b How does auto plan tuning work?

Query -> Parser -> Planner -> Executor -> Result, with pg_plan_advsr hooked in:

  • 1. Detect estimation errors [1]
  • 2. Correct estimation errors by using feedback info on the next query execution
  • 3. Record feedback info and error info

[1] The difference between estimated rows and actual rows

  • EXPLAIN ANALYZE
  • PDCA cycle
  • Add a feedback loop
SLIDE 35

3.2.b Rough concept of pg_plan_advsr: how to correct an estimation error

  • 1st iteration: Nested Loop (A JOIN B), estimate: 1, actual: 1000, with Seq Scan on A and Index Scan on B
  • Feedback info (hints) fixes the row estimates
  • 2nd iteration: Hash Join (A JOIN B), estimate: 1000, actual: 1000, with Seq Scan on A and Index Scan on B

SLIDE 36

3.2.c Short Demonstration

  • 31c.sql from Join Order Benchmark
  • Analytical query
  • Join count: 10
  • Aggregate functions: 4
  • Data size: 10GB
  • Parameters
  • max_parallel_workers_per_gather = 0
  • max_parallel_workers = 0
  • geqo_threshold = 20
  • from_collapse_limit = 16
  • join_collapse_limit = 16
  • random_page_cost = 2
  • Environment
  • PG 11.8
  • CentOS 7.8 on VirtualBox
  • i5-6300U 2.4 GHz, 2 cores / 4 threads
  • Mem 16 GB
  • SSD 512 GB
SLIDE 37

3.2.c Short Demonstration

Scenario

  • 1. Prewarm all tables (done)
  • 2. Execute “psql -f 31c.sql” 17 times for auto plan tuning
  • > auto_tune.sh
  • 3. Check Plan history table to see plan changes
  • 4. Re-verification:
  • > Baseline plan: 31c_org.sql
  • > Tuned plan: 31c_hinted.sql
SLIDE 38

Demo: Auto Plan Tuning

(video at 3x speed: iterations, execution time, and the plan history table)

SLIDE 39

Plan history table shows some changes

  id   pgsp_queryid   pgsp_planid   diff_of_joins   execution_time (ms)
  87   1458904999     863916075     943869          9444.67   <- Baseline
  88   …              1204820903    119616          1839.69
  89   …              3427660743    83425           1403.17
  90   …              916206257     41572           3709.36
  91   …              1314881321    124359          2327.76
  92   …              3744232778    12080           1582.87
  93   …              1330829809    101159          1579.09
  94   …              1610196430    110410          1634.02
  95   …              2071239096    120713          2021.11
  96   …              3418115224    25800           3000.66
  97   …              3498353310    75603           2629.61
  98   …              245020999     8996            1971.09
  99   …              2367663326    337358          3799.11
  100  1458904999     2565516663                    1152.47   <- Tuned
  …    …              …             …               …

Performance improved 8.2 times (9.4 s -> 1.1 s)

SLIDE 40

Optimizer Hints to reproduce the Optimized plan

/*+ LEADING( ((((((it1 ((it2 ((cn mc )mi_idx ))mi ))ci )n )t )mk )k ) ) HASHJOIN(ci cn it1 it2 k mc mi mi_idx mk n t) NESTLOOP(ci cn it1 it2 mc mi mi_idx mk n t) NESTLOOP(ci cn it1 it2 mc mi mi_idx n t) NESTLOOP(ci cn it1 it2 mc mi mi_idx n) NESTLOOP(ci cn it1 it2 mc mi mi_idx) NESTLOOP(cn it1 it2 mc mi mi_idx) NESTLOOP(cn it2 mc mi mi_idx) NESTLOOP(cn it2 mc mi_idx) NESTLOOP(cn mc mi_idx) NESTLOOP(cn mc) SEQSCAN(it1) SEQSCAN(it2) SEQSCAN(cn) INDEXSCAN(mc) INDEXSCAN(mi_idx) INDEXSCAN(mi) INDEXSCAN(ci) INDEXSCAN(n) INDEXSCAN(t) INDEXSCAN(mk) SEQSCAN(k) */

  • pgsp_planid: 2565516663

You can bring the hints to other environments to reproduce the plan!

SLIDE 41

Demo: Re-verification

Original query (Baseline plan) Hinted query (Optimized plan)

SLIDE 42

Result of Re-verification

  Query            Plan                     Execution time (ms)
  31c_org.sql      Baseline plan            9398.37
  31c_hinted.sql   Optimized plan (tuned)   1109.77

8.4x faster: proved the effectiveness of auto plan tuning!

SLIDE 43

3.2.d Summary of pg_plan_advsr

Merit for DBA

  • It is possible to improve query performance by automatically optimizing the query plan (OK even without an expert)
  • The optimized plan can be reproduced on other environments

How it works

  • Optimizes the plan by using a feedback loop that improves the cost calculation on each query execution

Demo

  • A ten-join query’s performance improved 8 times through auto plan tuning
SLIDE 44

3.3. What's next?

Make both extensions work together

Causes of inefficient plans, and which tool addresses them:

  • Index deficiency -> find missing indexes (pg_qualstats)
  • Inaccurate statistics -> appropriate settings
  • Planner specification -> find correlated columns (pg_qualstats), plan tuning (pg_plan_advsr)

SLIDE 45

3.3. Make the extensions work together

  • We recommend the following procedure to get an efficient plan

Step 1. Test your queries
Step 2. Extract appropriate indexes with pg_qualstats and add them
Step 3. Find queries with multiple joins and bad estimates to feed pg_plan_advsr
Step 4. Optimize the plans with pg_plan_advsr

SLIDE 46
  • 4. Conclusion
SLIDE 47
  • 4. Conclusion
  • 1. What is a Query plan? and 2. Why inefficient plans can be chosen?
  • Explained query plans, what caused inefficient plans, and how to resolve them
  • 3. Our Challenges to get an efficient plan
  • pg_qualstats:

Automatic index advising tool, and bad selectivity estimate query detection

  • pg_plan_advsr:

Automatic optimized query plan advising tool

  • Proposal to the PostgreSQL community
  • Advising features and adaptive query estimation have large merit for DBAs; we’d like to propose building them into vanilla PostgreSQL

SLIDE 48

Thanks

yamatattsu at gmail.com rjuju123 at gmail.com

"elephants beach walk" by Senorhorst Jahnsen is licensed under CC BY 2.0

SLIDE 49

Q&A See you in Zoom!

SLIDE 50

References

  • Query text of 31c.sql
  • Feedback Info of 31c.sql (example)
  • Source code
  • Related Talk and Extension
  • Tools to help inspect query plans
SLIDE 51

Query text of 31c.sql

EXPLAIN ANALYZE
SELECT MIN(mi.info) AS movie_budget,
       MIN(mi_idx.info) AS movie_votes,
       MIN(n.name) AS writer,
       MIN(t.title) AS violent_liongate_movie
FROM cast_info AS ci, company_name AS cn, info_type AS it1, info_type AS it2,
     keyword AS k, movie_companies AS mc, movie_info AS mi,
     movie_info_idx AS mi_idx, movie_keyword AS mk, name AS n, title AS t
WHERE ci.note IN ('(writer)', '(head writer)', '(written by)', '(story)', '(story editor)')
  AND cn.name LIKE 'Lionsgate%'
  AND it1.info = 'genres'
  AND it2.info = 'votes'
  AND k.keyword IN ('murder', 'violence', 'blood', 'gore', 'death', 'female-nudity', 'hospital')
  AND mi.info IN ('Horror', 'Action', 'Sci-Fi', 'Thriller', 'Crime', 'War')
  AND t.id = mi.movie_id
  AND t.id = mi_idx.movie_id
  AND t.id = ci.movie_id
  AND t.id = mk.movie_id
  AND t.id = mc.movie_id
  AND ci.movie_id = mi.movie_id
  AND ci.movie_id = mi_idx.movie_id
  AND ci.movie_id = mk.movie_id
  AND ci.movie_id = mc.movie_id
  AND mi.movie_id = mi_idx.movie_id
  AND mi.movie_id = mk.movie_id
  AND mi.movie_id = mc.movie_id
  AND mi_idx.movie_id = mk.movie_id
  AND mi_idx.movie_id = mc.movie_id
  AND mk.movie_id = mc.movie_id
  AND n.id = ci.person_id
  AND it1.id = mi.info_type_id
  AND it2.id = mi_idx.info_type_id
  AND k.id = mk.keyword_id
  AND cn.id = mc.company_id;

SLIDE 52

Feedback Info of 31c.sql (example)

ROWS(ci cn it1 it2 k mc mi mi_idx mk n t #2825) ROWS(ci cn it1 it2 mc mi mi_idx mk n t #107406) ROWS(ci cn it1 it2 mc mi mi_idx n t #1017) ROWS(ci cn it1 it2 mc mi mi_idx n #1017) ROWS(ci cn it1 it2 mc mi mi_idx #1017) ROWS(cn it1 it2 mc mi mi_idx #1235) ROWS(cn it2 mc mi mi_idx #1295) ROWS(cn it2 mc mi_idx #1449) ROWS(cn mc mi_idx #4370) ….

SLIDE 53

Source code

  • pg_qualstats

https://github.com/powa-team/pg_qualstats

  • pg_plan_advsr

https://github.com/ossc-db/pg_plan_advsr

SLIDE 54

Related Talk and Extension

Talk

  • PGConf.Russia 2019
  • Auto plan tuning using feedback by Tatsuro Yamada

https://pgconf.ru/en/2019/242844

  • PGConf.Eu 2018
  • Auto plan tuning using feedback by Tatsuro Yamada

https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2132-auto-plan-tuning-using-feedback-loop/

Extension

  • AQO by Postgres Professional
  • https://github.com/postgrespro/aqo
SLIDE 55

Tools to help inspect query plans

  • Visualization tools
  • explain.depesz.com
  • pev2
  • pg_flame