PGCon-2020 Online 2020-05-26 – PowerPoint PPT Presentation



SLIDE 1

PGCon-2020 Online 2020-05-26

SLIDE 2

  • Database systems:
    • 2002-2005:
    • since 2005:
  • Worked on XML data type and functions (2005-2007)
  • Long-term community activist – #RuPostgres, 2000+ members
  • Conferences Program Committee
  • Current business:

etc.

SLIDE 3

  • — boost development of fast-growing PostgreSQL-based projects using thin cloning and a high level of automation

Clients and partners:

CHEWY.COM

SLIDE 4
  • Get an overview of numerous tools and methods in the Postgres ecosystem
  • Focus on remembering what can be done rather than how
  • Learn about a new methodology, “Seamless SQL Optimization”, to build a powerful and scalable optimization process in your organization
  • Bonus: PostgreSQL and tools developers might find some new (or old) ideas
SLIDE 5
  • SQL Performance analysis. Methodologies
  • Macro-analysis: tools, metrics, visualization
  • Switching from Macro- to Micro-analysis (and back)
  • Micro-analysis: tools, metrics, visualization
  • How do we optimize? Where?
  • 3 key principles of Seamless SQL Optimization methodology
SLIDE 6

bit.ly/pgcon2020

Just launch it, do not follow the steps (yet). Watch what I’m doing

SLIDE 7

(inherited from Brendan Gregg’s talks on Performance Analysis)

  • “Zero methodology”: lack of tools and analysis / optimization activities.
SLIDE 8

(inherited from Brendan Gregg’s talks on Performance Analysis)

  • “Zero methodology”: lack of tools and analysis / optimization activities.
  • “Street Light” anti-method:
  • Pick a query that is known to be slow (e.g., a random slow query from logs)
  • Use familiar tools and metrics, without understanding the whole picture
  • Avoid query analysis; focus on config tuning, scaling-in, or partitioning / sharding
SLIDE 9

(inherited from Brendan Gregg’s talks on Performance Analysis)

  • “Zero methodology”: lack of tools and analysis / optimization activities.
  • “Street Light” anti-method.
  • “Drunk man” anti-method. Tune random things. For example:
    • Increase shared_buffers. Or work_mem. Or random/seq_page_cost. Or effective_cache_size
    • Build indexes for all columns
    • Put VACUUM FULL in cron
    • Pick a random inefficient query and optimize it, then iterate
    • Use Google, StackOverflow, etc.

… until the problems go away

SLIDE 10

(inherited from Brendan Gregg’s talks on Performance Analysis)

  • “Zero methodology”: lack of tools and analysis / optimization activities.
  • “Street Light” anti-method.
  • “Drunk man” anti-method.
  • “Blame someone else” anti-method:
  • “Database is slow again!”
  • “It’s not database, it’s network!”
  • “Our configuration is fine, it’s AWS/GCP/… problem!”
SLIDE 11
  • USE (Utilization, Saturation, Errors) by Brendan Gregg
  • RED (Requests Rate, Errors, Duration) by Tom Wilkie
  • The Four Golden Signals (RED + Saturation) by Rob Ewaschuk
  • Problem statement method
  • Resource analysis
  • CPU and Off-CPU FlameGraph analysis (perf, eBPF)
  • Wait Events analysis
  • Transaction and lock analysis
  • Tuple and operation stats analysis
  • Workload analysis (pg_stat_statements, log-based)
  • Single query analysis (EXPLAIN, auto_explain). Variations:
  • Planner numbers only
  • Time-centric
  • Focused on cardinality/selectivity (row counts)
  • Buffers-centric
  • Database Experiments
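Several of these methods boil down to sampling system views. As an illustrative sketch (my example, not from the talk), Wait Events analysis can start with a query like this against pg_stat_activity (wait_event columns exist since Postgres 9.6); run it repeatedly, e.g. with `\watch 1` in psql, to approximate a sampling profiler:

```sql
-- One snapshot of wait events across active backends;
-- NULL wait_event means the backend is on CPU.
SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;
```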
SLIDE 12
SLIDE 13

Macro-analysis:

  • analyze the workload as a whole
  • split it into segments (e.g., group queries by removing parameters)
  • find “heavy” segments (“heavy” may have various meanings here)
  • apply macro- or micro-optimization to speed up the whole workload or a significant part of it
  • pro step: ensure that no part of the workload has slowed down

Micro-analysis:

  • analyze a single query, ignoring all others – run EXPLAIN to obtain the plan
  • find bottlenecks: inefficient nodes in the plan
  • apply micro- or macro-optimization to improve performance
  • pro step: ensure that no other queries have slowed down
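The “group queries by removing parameters” step can be sketched in a few lines. This toy normalizer is my illustration, not a tool from the talk; real tools rely on pg_stat_statements’ queryid or libpg_query fingerprinting, which parse SQL properly instead of using regexes:

```python
import re

def normalize_query(sql: str) -> str:
    """Collapse queries that differ only in literal parameters into one
    group (a toy stand-in for pg_stat_statements' queryid grouping)."""
    s = re.sub(r"'(?:[^']|'')*'", "?", sql)       # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)      # numeric literals -> ?
    s = re.sub(r"\s+", " ", s).strip().lower()    # normalize whitespace and case
    return s

# Two parameter-only variants land in the same workload segment:
a = normalize_query("SELECT * FROM users WHERE id = 42")
b = normalize_query("SELECT * FROM users WHERE id = 777")
assert a == b
print(a)  # select * from users where id = ?
```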
SLIDE 14

MACRO-ANALYSIS: pgwatch2, postgres-checkup
MICRO-ANALYSIS: PEV2, EXPLAIN (BUFFERS, ANALYZE)

SLIDE 15
SLIDE 16
SLIDE 17


SLIDE 18


SLIDE 19


SLIDE 20


SLIDE 21
SLIDE 22
  • pg_stat_activity
  • pid, query (usually truncated to the default of 1024 bytes), …, xact_start, query_start, state_change, …, state, …, wait_event, wait_event_type, …

  • pg_locks
  • pg_stat_statements
  • queryid, query (entire text w/o params), calls, total_time, mean_time, …, shared_blk_***, …
  • Non-standard: pg_stat_kcache, pg_qualstats, pg_wait_sampling
  • PostgreSQL logs
  • Slow queries:
  • pid, query (full), duration (if exceeds log_min_duration_statement), ...
  • Blocked queries, with blockers (when blocking state exceeds deadlock_timeout)
  • Canceled queries (deadlocks, statement_timeout, idle_in_transaction_session_timeout)
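The log-based sources above are controlled by a handful of settings. A minimal postgresql.conf sketch with illustrative values (my example, not recommendations from the talk; tune per workload):

```
log_min_duration_statement = 500   # ms; log statements slower than this
track_activity_query_size = 32768  # bytes of query text kept in pg_stat_activity
log_lock_waits = on                # log waits exceeding deadlock_timeout, with blockers
deadlock_timeout = 1s              # threshold for deadlock checks and lock-wait logging
```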
SLIDE 23

pg_stat_activity:

  Column           | Type
  -----------------+--------------------------
  datid            | oid
  datname          | name
  pid              | integer
  usename          | name
  application_name | text
  client_addr      | inet
  client_hostname  | text
  client_port      | integer
  backend_start    | timestamp with time zone
  xact_start       | timestamp with time zone
  query_start      | timestamp with time zone
  state_change     | timestamp with time zone
  wait_event_type  | text
  wait_event       | text
  state            | text
  backend_xid      | xid
  backend_xmin     | xid
  query            | text
  backend_type     | text

pg_stat_statements (extension, in core):

  Column              | Type
  --------------------+------------------
  userid              | oid
  dbid                | oid
  queryid             | bigint
  query               | text
  calls               | bigint
  total_time          | double precision
  min_time            | double precision
  max_time            | double precision
  mean_time           | double precision
  stddev_time         | double precision
  rows                | bigint
  shared_blks_hit     | bigint
  shared_blks_read    | bigint
  shared_blks_dirtied | bigint
  shared_blks_written | bigint
  local_blks_hit      | bigint
  local_blks_read     | bigint
  local_blks_dirtied  | bigint
  local_blks_written  | bigint
  temp_blks_read      | bigint
  temp_blks_written   | bigint
  blk_read_time       | double precision
  blk_write_time      | double precision

SLIDE 24


PostgreSQL 13 – more columns (committed!):

  plans
  {total,min,max,mean,stddev}_plan_time
  {total,min,max,mean,stddev}_exec_time
  wal_records, wal_fpi, wal_bytes

3rd-party extensions:

  PoWA team:
    pg_stat_kcache – disk- and CPU-level stats
    pg_qualstats – WHERE and JOIN predicate stats
  PostgresPro (included in PoWA monitoring):
    pg_wait_sampling – sampling stats of wait events
    pgsentinel – sampling of active session history

SLIDE 25

pg_top pgCenter

SLIDE 26
SLIDE 27

https://github.com/akardapolov/ASH-Viewer https://github.com/dbacvetkov/PASH-Viewer

SLIDE 28

https://github.com/postgrespro/pg_wait_sampling https://powa.readthedocs.io/en/latest/

SLIDE 29

bit.ly/pgcon2020

Perform steps 1-2

SLIDE 30

total_time

SLIDE 31

total_time calls

SLIDE 32

total_time calls mean_time

SLIDE 33

total_time calls mean_time

shared_blks_read
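Putting these metrics together, a typical macro-analysis query over pg_stat_statements might look like the following sketch (my example; note that in Postgres 13 total_time/mean_time were split into *_plan_time and *_exec_time):

```sql
-- Top "heavy" query groups; swap the ORDER BY column for calls,
-- mean_time, or shared_blks_read to change the meaning of "heavy".
SELECT queryid,
       calls,
       total_time,
       mean_time,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
```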

SLIDE 34
SLIDE 35

postgres-checkup https://gitlab.com/postgres-ai/postgres-checkup

SLIDE 36

Two snapshots are needed to generate this report

SLIDE 37

The 1st “subrow” shows absolute values from pg_stat_statements & pg_stat_kcache

SLIDE 38

The 2nd “subrow” – “per second” The 3rd – “per call”

SLIDE 39

Finally, the 4th “subrow”: relative “contribution” of this query group

SLIDE 40
SLIDE 41

Every PostgreSQL monitoring tool must have query analysis.
Preferably in both forms: graphs (historical data) and tables (detailed view of all metrics).
None of the current products are perfect, but they are developing quickly.
#RuPostgres PostgreSQL monitoring review checklist: bit.ly/pgmonitoring

SLIDE 42

EXPLAIN shows only what the planner thinks about the query.
EXPLAIN ANALYZE executes the query and shows details.
Consider always using EXPLAIN (ANALYZE, BUFFERS). Why? Reasons for slowness:

  • Contention issues (may not be visible in micro-analysis)
  • A lot of work to do:
    • logical level: high number of “rows”
    • physical level: high number of buffers (Postgres 13 addition: amount of WAL generated)
  • Slow disk I/O (check with cold and warm caches)
SLIDE 43

test=# CREATE TABLE t AS SELECT * FROM generate_series(1, 100000);
CREATE
test=# \timing on
Timing is on.
test=# SELECT COUNT(*) FROM t;
 count
--------
 100000
(1 row)

Time: 8.211 ms
test=# EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
                          QUERY PLAN
--------------------------------------------------------------
 Aggregate  (cost=1693.00..1693.01 rows=1 width=8) (actual time=18.753..18.753 rows=1 loops=1)
   ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=0) (actual time=0.011..10.451 rows=100000 loops=1)
 Planning Time: 0.034 ms
 Execution Time: 18.788 ms
(4 rows)

Time: 19.494 ms

https://www.postgresql.org/docs/current/using-explain.html#USING-EXPLAIN-CAVEATS

SLIDE 44

https://ongres.com/blog/explain_analyze_may_be_lying_to_you/

SLIDE 45

pgAdmin{3,4}

SLIDE 46

explain.depesz.com

SLIDE 47

FlameGraphs

SLIDE 48

PEV2 – explain.dalibo.com

SLIDE 49

explain.tensor.ru

SLIDE 50

bit.ly/pgcon2020

Perform steps 3-4

SLIDE 51
  • Just guess // anti-method
  • Use statistics to guess better:
    • use pg_stat_activity (but queries are truncated to track_activity_query_size bytes)
    • use slow query logging:
      • only queries longer than log_min_duration_statement
      • or auto_explain.log_min_duration // careful! BUFFERS with TIMING also adds overhead
    • PostgreSQL 12: log sampling:
      • log_transaction_sample_rate – (%) of all transactions
      • or auto_explain.sample_rate – (%) of all statements
  • If possible, consider:
    • increasing track_activity_query_size
    • using pg_qualstats or pg_wait_sampling

To match pg_stat_statements query groups with individual queries:

  • use libpg_query
  • a patch exists to add queryid to logs and pg_stat_activity
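As a sketch of the auto_explain options mentioned above (illustrative postgresql.conf values, my example; as the slide warns, log_analyze with buffers and timing adds overhead):

```
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'  # log plans for statements slower than this
auto_explain.log_analyze = on            # actual row counts and timings
auto_explain.log_buffers = on            # buffer usage (needs log_analyze)
auto_explain.sample_rate = 0.1           # explain ~10% of statements
```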

SLIDE 52

bit.ly/pgcon2020

Perform steps 5-9

SLIDE 53
  • Run on production // anti-method
    • concurrent workload may affect observations
    • cannot check with cold caches
    • how to check a modifying query? (BEGIN; EXPLAIN …; ROLLBACK;) // anti-method ×2!
    • how to verify an index idea? Or a schema redesign? → impossible
  • Run on a clone:
    • easier in clouds
    • better if the clone was created at the physical level
    • for large databases, harder to iterate (very time- and resource-consuming)

→ Database Lab by Postgres.ai

https://postgres.ai/docs/database-lab/what_is_database_lab

SLIDE 54

https://www.katacoda.com/postgres-ai/

SLIDE 55

bit.ly/pgcon2020

Perform step 10

SLIDE 56

Locking issues: https://gitlab.com/snippets/1890428
Thin clones – demo of Joe Bot: https://postgres.ai/console/demo/joeinstances

SLIDE 57
SLIDE 58

@postgresmen

Database Lab, Joe Bot, postgres-checkup, and more: http://postgres.ai

SLIDE 59
SLIDE 60
SLIDE 61
SLIDE 62
SLIDE 63

Expose queryid in pg_stat_activity and log_line_prefix: https://commitfest.postgresql.org/28/2069/
Default GUCs for EXPLAIN // BUFFERS should be ON by default! https://commitfest.postgresql.org/28/2567/

SLIDE 64