PGCon-2020 Online 2020-05-26
○ Database systems: ○ 2002-2005: ○ since 2005: ○ Worked on XML data type and functions (2005-2007) ○ Long-term community activist – #RuPostgres, 2000+ members ○ Conferences Program Committee ○ Current business:
etc.
— boost development of fast-growing PostgreSQL-based projects using thin cloning and a high level of automation
Clients and partners:
CHEWY.COM
- Get an overview of numerous tools and methods in the Postgres ecosystem
- Focus on remembering what can be done rather than how
- Learn about a new methodology, “Seamless SQL Optimization”, to build a powerful and scalable optimization process in your organization
- Bonus: PostgreSQL and tool developers might find some new (or old) ideas
- SQL Performance analysis. Methodologies
- Macro-analysis: tools, metrics, visualization
- Switching from Macro- to Micro-analysis (and back)
- Micro-analysis: tools, metrics, visualization
- How do we optimize? Where?
- 3 key principles of Seamless SQL Optimization methodology
bit.ly/pgcon2020
Just launch it; do not follow the steps (yet). Watch what I’m doing.
(inherited from Brendan Gregg’s talks on Performance Analysis)
- “Zero methodology”: lack of tools and analysis / optimization activities.
- “Street Light” anti-method:
- Pick a query that is known to be slow (e.g., a random slow query from the logs)
- Use familiar tools and metrics, without understanding the whole picture
- Avoid query analysis; focus on config tuning, scaling, or partitioning / sharding
- “Drunk man” anti-method. Tune random things. For example:
- Increase shared_buffers. Or work_mem. Or random_page_cost/seq_page_cost. Or effective_cache_size
- Build indexes on all columns
- Put VACUUM FULL into cron
- Pick a random inefficient query and optimize it, then iterate
- Use Google, StackOverflow, etc.
… until the problems go away
- “Blame someone else” anti-method:
- “Database is slow again!”
- “It’s not database, it’s network!”
- “Our configuration is fine, it’s AWS/GCP/… problem!”
- USE (Utilization, Saturation, Errors) by Brendan Gregg
- RED (Requests Rate, Errors, Duration) by Tom Wilkie
- The Four Golden Signals (RED + Saturation) by Rob Ewaschuk
- Problem statement method
- Resource analysis
- CPU and Off-CPU FlameGraph analysis (perf, eBPF)
- Wait Events analysis
- Transaction and lock analysis
- Tuple and operation stats analysis
- Workload analysis (pg_stat_statements, log-based)
- Single query analysis (EXPLAIN, auto_explain). Variations:
- Planner numbers only
- Time-centric
- Focused on cardinality/selectivity (row counts)
- Buffers-centric
- Database Experiments
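For a quick look at wait events without extra extensions, pg_stat_activity can be sampled by hand — a crude, one-shot sketch of what pg_wait_sampling automates:

```sql
-- One-shot sample of current wait events across active backends.
-- Run repeatedly (e.g., with psql's \watch 1) to approximate sampling;
-- pg_wait_sampling or pgsentinel do this continuously and store history.
SELECT
  wait_event_type,
  wait_event,
  count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY 1, 2
ORDER BY 3 DESC;
```

A NULL wait_event here means the backend is on CPU (or between waits) at the sampling instant.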
Macro-analysis:
- analyze the workload as a whole
- split it into segments (e.g., group queries by removing parameters)
- find “heavy” segments (“heavy” may have various meanings here)
- apply macro- or micro-optimization to speed up the whole workload or a significant part of it
- pro step: ensure that no part of the workload has slowed down
Micro-analysis:
- analyze a single query, ignoring all others – run EXPLAIN to obtain the plan
- find bottlenecks – inefficient nodes in the plan
- apply micro- or macro-optimization to improve performance
- pro step: ensure that other queries have not slowed down
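As a minimal sketch of the macro-analysis steps above, the “heavy segments” search can be done directly in pg_stat_statements (column names as of Postgres 12 and earlier; in Postgres 13, total_time/mean_time become total_exec_time/mean_exec_time):

```sql
-- Top 10 query groups by total time, with each group's share of the workload
SELECT
  queryid,
  calls,
  round(total_time::numeric, 2) AS total_ms,
  round(mean_time::numeric, 2)  AS mean_ms,
  round((100 * total_time / sum(total_time) OVER ())::numeric, 2) AS pct_of_workload,
  shared_blks_read,
  left(query, 60) AS query_excerpt
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

Changing the ORDER BY (calls, mean_time, shared_blks_read, …) gives different views of what “heavy” means.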
pgwatch2 postgres-checkup PEV2 EXPLAIN (BUFFERS, ANALYZE)
MACRO-ANALYSIS MICRO-ANALYSIS
- pg_stat_activity
- pid, query (usually truncated to the default 1024 bytes), …, xact_start, query_start, state_change, …, state, …, wait_event, wait_event_type, …
- pg_locks
- pg_stat_statements
- queryid, query (entire normalized text, w/o parameter values), calls, total_time, mean_time, …, shared_blks_***, …
- Non-standard: pg_stat_kcache, pg_qualstats, pg_wait_sampling
- PostgreSQL logs
- Slow queries:
- pid, query (full), duration (if exceeds log_min_duration_statement), ...
- Blocked queries, with blockers (when blocking state exceeds deadlock_timeout)
- Canceled queries (deadlocks, statement_timeout, idle_in_transaction_session_timeout)
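A sketch that ties pg_stat_activity and the lock information together: pg_blocking_pids() (PostgreSQL 9.6+) shows who blocks whom without manually joining pg_locks:

```sql
-- Long-running / waiting backends and their blockers, oldest transaction first
SELECT
  pid,
  now() - xact_start AS xact_duration,
  state,
  wait_event_type,
  wait_event,
  pg_blocking_pids(pid) AS blocked_by,  -- empty array if not blocked
  left(query, 60) AS query_excerpt      -- remember: truncated to track_activity_query_size
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY xact_start NULLS LAST;
```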
pg_stat_activity
┌──────────────────┬──────────────────────────┐
│      Column      │           Type           │
├──────────────────┼──────────────────────────┤
│ datid            │ oid                      │
│ datname          │ name                     │
│ pid              │ integer                  │
│ usename          │ name                     │
│ application_name │ text                     │
│ client_addr      │ inet                     │
│ client_hostname  │ text                     │
│ client_port      │ integer                  │
│ backend_start    │ timestamp with time zone │
│ xact_start       │ timestamp with time zone │
│ query_start      │ timestamp with time zone │
│ state_change     │ timestamp with time zone │
│ wait_event_type  │ text                     │
│ wait_event       │ text                     │
│ state            │ text                     │
│ backend_xid      │ xid                      │
│ backend_xmin     │ xid                      │
│ query            │ text                     │
│ backend_type     │ text                     │
└──────────────────┴──────────────────────────┘

pg_stat_statements (extension, in core)
┌─────────────────────┬──────────────────┐
│       Column        │       Type       │
├─────────────────────┼──────────────────┤
│ userid              │ oid              │
│ dbid                │ oid              │
│ queryid             │ bigint           │
│ query               │ text             │
│ calls               │ bigint           │
│ total_time          │ double precision │
│ min_time            │ double precision │
│ max_time            │ double precision │
│ mean_time           │ double precision │
│ stddev_time         │ double precision │
│ rows                │ bigint           │
│ shared_blks_hit     │ bigint           │
│ shared_blks_read    │ bigint           │
│ shared_blks_dirtied │ bigint           │
│ shared_blks_written │ bigint           │
│ local_blks_hit      │ bigint           │
│ local_blks_read     │ bigint           │
│ local_blks_dirtied  │ bigint           │
│ local_blks_written  │ bigint           │
│ temp_blks_read      │ bigint           │
│ temp_blks_written   │ bigint           │
│ blk_read_time       │ double precision │
│ blk_write_time      │ double precision │
└─────────────────────┴──────────────────┘
PostgreSQL 13 – more columns (committed!):
plans, {total,min,max,mean,stddev}_plan_time, {total,min,max,mean,stddev}_exec_time, wal_records, wal_fpi, wal_bytes
3rd-party extensions:
- PoWA team: pg_stat_kcache – disk- and CPU-level stats; pg_qualstats – WHERE and JOIN predicate stats
- Postgres Pro (included in PoWA monitoring): pg_wait_sampling – sampled wait-event stats; pgsentinel – sampled active session history
pg_top pgCenter
https://github.com/akardapolov/ASH-Viewer https://github.com/dbacvetkov/PASH-Viewer
https://github.com/postgrespro/pg_wait_sampling https://powa.readthedocs.io/en/latest/
bit.ly/pgcon2020
Perform the steps 1-2
Sort query groups by various metrics: total_time, calls, mean_time, shared_blks_read, …
postgres-checkup https://gitlab.com/postgres-ai/postgres-checkup
Two snapshots are needed to generate this report
The 1st “subrow” shows absolute values from pg_stat_statements & pg_stat_kcache.
The 2nd “subrow” shows “per second” values, the 3rd – “per call”.
Finally, the 4th “subrow” shows the relative “contribution” of this query group.
Every PostgreSQL monitoring system must have query analysis.
Preferably in both forms: graphs (historical data) and tables (detailed view of all metrics).
None of the current products are perfect, but they are developing quickly.
#RuPostgres PostgreSQL monitoring review checklist: bit.ly/pgmonitoring
EXPLAIN shows only what the planner thinks about the query.
EXPLAIN ANALYZE executes the query and shows actual execution details.
Consider always using EXPLAIN (ANALYZE, BUFFERS). Why? Reasons for slowness:
- Contention issues (may not be seen in micro-analysis)
- A lot of work to do
- Logical level: high number of “rows”
- Physical level: high number of buffers (Postgres 13 addition: amount of WAL generated)
- Slow disk IO (check with cold and warm caches)
test=# CREATE TABLE t AS SELECT * FROM generate_series(1, 100000);
SELECT 100000
test=# \timing on
Timing is on.
test=# SELECT COUNT(*) FROM t;
 count
--------
 100000
(1 row)

Time: 8.211 ms
test=# EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1693.00..1693.01 rows=1 width=8) (actual time=18.753..18.753 rows=1 loops=1)
   ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=0) (actual time=0.011..10.451 rows=100000 loops=1)
 Planning Time: 0.034 ms
 Execution Time: 18.788 ms
(4 rows)

Time: 19.494 ms
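The same query with BUFFERS adds the physical level to the plan; a sketch (the buffer line below is illustrative, not captured from the session above):

```sql
-- Also report I/O timings in the plan (adds some overhead)
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM t;
-- each node gains lines such as:
--   Buffers: shared hit=443
--   I/O Timings: read=...   (only shown when reads happen and track_io_timing is on)
```

With a cold cache the same node would show "shared read=…" instead of "hit=…" — this is exactly the cold/warm-cache distinction mentioned above.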
https://www.postgresql.org/docs/current/using-explain.html#USING-EXPLAIN-CAVEATS
https://ongres.com/blog/explain_analyze_may_be_lying_to_you/
pgAdmin{3,4}
explain.depesz.com
FlameGraphs
PEV2 – explain.dalibo.com
explain.tensor.ru
bit.ly/pgcon2020
Perform the steps 3-4
- Just guess // anti-method
- Use statistics to guess better
- Use pg_stat_activity (but queries are truncated to track_activity_query_size bytes)
- Use slow query logging
- Only queries longer than log_min_duration_statement
- auto_explain.log_min_duration // careful! BUFFERS with TIMING also has overhead
- PostgreSQL 12: log sampling
- log_transaction_sample_rate (%) of all transactions
- auto_explain.sample_rate (%) of all statements
- If possible, consider
- increasing track_activity_query_size
- using pg_qualstats or pg_wait_sampling
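Putting the logging options above together — an illustrative postgresql.conf fragment (values are starting points to tune per workload, not recommendations):

```ini
# postgresql.conf (sketch) – plain statement logging
log_min_duration_statement = 500ms   # log statements slower than 500 ms

# auto_explain: log plans of slow statements (needs restart for preload)
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = 500ms
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing  = off       # careful: per-node timing adds overhead
auto_explain.sample_rate = 0.1       # explain ~10% of statements

# PostgreSQL 12+: transaction-level log sampling
log_transaction_sample_rate = 0.01   # log all statements of ~1% of transactions
```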
To match pg_stat_statements query groups with individual queries:
- use libpg_query
- a patch exists to add queryid to logs and pg_stat_activity
bit.ly/pgcon2020
Perform the steps 5-9
- Run on production // anti-method
- Concurrent workload may affect observations
- Cannot check with cold caches
- How to check a modifying query? (BEGIN; EXPLAIN …; ROLLBACK;) // anti-method *2!
- How to verify an index idea? Or schema redesign? → impossible
- Run on a clone
- Easier in clouds
- Better if the clone was created at the physical level
- For large databases, harder to iterate (very time- and resource-heavy)
→ Database Lab by Postgres.ai
https://postgres.ai/docs/database-lab/what_is_database_lab
https://www.katacoda.com/postgres-ai/
bit.ly/pgcon2020
Perform the step 10
Locking issues: https://gitlab.com/snippets/1890428
Thin clones – demo of Joe Bot: https://postgres.ai/console/demo/joeinstances
@postgresmen
Database Lab, Joe Bot, postgres-checkup, and more:
http://p…s