when it all goes wrong

When it all Goes Wrong, Will Leinweber (@leinweber), Citus



  1. When it all Goes Wrong

  2. @leinweber Will Leinweber @leinweber Citus Data (Microsoft) bitfission.com 
 (warning autoplays midi)

  3. @leinweber coming from citus cloud heroku postgres

  4. @leinweber special thanks citus cloud 
 — dan farina (@danfarina) heroku postgres 
 — maciek sakrejda (@uhoh_itsmaciek)

  5. @leinweber same sorts of problems from pages & alerts from support tickets

  6. @leinweber this talk more app dev who uses postgres 
 rather than dba

  7. @leinweber the problem with Postgres it’s pretty good you don’t get experience with how it breaks

  8. @leinweber what to do for a problem

  9. @leinweber what to do for a problem

  10. @leinweber complicated system network hardware o/s postgres

  11. @leinweber using the database (too much): 95% application, 4% autovacuum, 1% everything else

  12. @leinweber hard to convince all the graphs saying DB is slow and nothing has changed …must be the database!

  13. @leinweber https://upload.wikimedia.org/wikipedia/commons/9/98/Survivorship-bias.png

  14. @leinweber “but I didn’t change anything” no deploys! no database migrations! no scaling!

  15. @leinweber “but I didn’t change anything” https://upload.wikimedia.org/wikipedia/commons/0/09/Redherring.gif

  16. @leinweber “but I didn’t change anything” more traffic? change in access patterns? one big user logged in?

  17. @leinweber run out of a resource

  18. @leinweber snowball

  19. @leinweber example manageable user 1s query => 2x expensive frequent, small queries 3ms => 12ms
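A rough way to see why the second case snowballs is to multiply query rate by latency, which gives the backend time the workload demands per minute. The rates below (one slow query per minute, 500 small queries per second) are assumptions for illustration and do not come from the slides; only the latencies do. `busy_ms` is a hypothetical helper.

```shell
# busy_ms RATE_PER_MIN LATENCY_MS
# Prints backend-milliseconds of work demanded per minute of wall time.
busy_ms() {
  echo $(( $1 * $2 ))
}

# Assumed rates (hypothetical): 1 slow query/min, 500 small queries/s.
slow_per_min=1
small_per_min=$(( 500 * 60 ))

echo "slow query:    $(busy_ms "$slow_per_min" 1000) ms -> $(busy_ms "$slow_per_min" 2000) ms per minute"
echo "small queries: $(busy_ms "$small_per_min" 3) ms -> $(busy_ms "$small_per_min" 12) ms per minute"
```

Under these assumptions the slow query doubling adds one second of work per minute, while the small queries go from 90 to 360 seconds of work per minute, roughly 1.5 to 6 backends kept busy on average.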

  20. @leinweber assumptions app maintenance hardware

  21. @leinweber assumptions postgres should not crash …with overcommit off and no containers large extensions increase chance

  22. @leinweber if not postgres, then what

  23. @leinweber system resources cpu memory disk parallelism / backends locks

  24. @leinweber cpu cpu mem mem disk disk parallelism parallelism

  25. @leinweber cpu mem disk parallelism credentials wrong networking broken locking issue, check pg_locks idle in transaction

  26. @leinweber cpu mem disk parallelism application submitting backlogged workload connection leak pool sizes set too large pg_lock issue + application backlog

  27. @leinweber cpu mem disk parallelism workload skew causing thrashing unusual sequential scan workload failover or restart => no cache pg_prewarm

  28. @leinweber cpu mem disk parallelism same as just disk, but also the application is piling on

  29. @leinweber cpu mem disk parallelism large GROUP BYs high disk latency due to unusual page dispersion pattern in the workload

  30. @leinweber cpu mem disk parallelism workload has high mem (GROUP BY)
 + app adding backlog lock contention slowing mem release

  31. @leinweber cpu mem disk parallelism large GROUP BYs + paging in unusual data

  32. @leinweber cpu mem disk parallelism Look for what is causing disk access

  33. @leinweber cpu mem disk parallelism small, in-memory workload lots of seq scans on small table index scan w/ filter dropping lots

  34. @leinweber cpu mem disk parallelism app backlog 
 + too much processing on small data simply a lot of work

  35. @leinweber cpu mem disk parallelism large seq scans

  36. @leinweber cpu mem disk parallelism loading cold data + application backlog

  37. @leinweber cpu mem disk parallelism small # of backends doing a lot more work

  38. @leinweber cpu mem disk parallelism entity, workload, entity*workload soft deletes and non-conditional indexes

  39. @leinweber cpu mem disk parallelism reporting query

  40. @leinweber cpu mem disk parallelism app backlog, but with CPU/mem problems

  41. @leinweber tools of the trade

  42. @leinweber tools of the trade C symbols

  43. @leinweber tools of the trade: perf perf record -p <pid> && perf report

  44. @leinweber tools of the trade: perf perf top

  45. @leinweber tools of the trade: perf www.brendangregg.com/perf.html

  46. @leinweber tools of the trade: gdb gdb -batch -ex 'bt' -p <pid>

  47. @leinweber

  48. @leinweber

  49. @leinweber tools of the trade: iostat iostat -xm 10
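When watching `iostat -xm 10`, the `%util` column is usually the quickest saturation signal. A minimal sketch of filtering for busy devices follows. It finds `%util` by header name, because the column layout differs between sysstat versions. The function name `busy_devices`, the threshold, and the simplified sample rows are all hypothetical.

```shell
# busy_devices LIMIT
# Reads `iostat -x` style output on stdin and prints devices whose
# %util exceeds LIMIT. The %util column is located by header name.
busy_devices() {
  awk -v limit="$1" '
    NF == 0   { col = 0; next }   # a blank line ends a report block
    /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    col && NF >= col && ($col + 0) > (limit + 0) { print $1 }
  '
}

# Hypothetical, simplified sample (real output has more columns).
sample='Device r/s w/s rMB/s wMB/s %util
sda 5.00 12.00 0.10 0.40 8.20
nvme0n1 900.00 450.00 70.00 35.00 97.50'

printf '%s\n' "$sample" | busy_devices 90
```

Against a live system this would be run as `iostat -xm 10 | busy_devices 90`. Note that the first report iostat prints covers the time since boot, so later intervals are the ones that reflect current load.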

  50. @leinweber tools of the trade: iotop

  51. @leinweber tools of the trade: htop

  52. @leinweber tools of the trade: bwm-ng

  53. @leinweber tools of the trade: backends pgrep -lf postgres + grep + wc select * from pg_stat_activity
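The "pgrep + grep + wc" pipeline can be sketched as a small function. It reads a process listing on stdin, so it works with whichever command lists full command lines on your system: `pgrep -lf postgres` as on the slide, though some newer procps-ng versions may need `pgrep -af` instead, which is worth checking locally. The function name `count_backends` and the sample process titles are hypothetical.

```shell
# count_backends PATTERN
# Counts lines on stdin that contain PATTERN (a fixed string), e.g. a
# backend state shown in the postgres process title.
count_backends() {
  grep -F -- "$1" | wc -l | tr -d ' '
}

# Hypothetical sample of a process listing.
sample='1234 postgres: app appdb 10.0.0.5(51234) idle in transaction
1235 postgres: app appdb 10.0.0.5(51236) SELECT
1236 postgres: app appdb 10.0.0.6(40022) idle
1237 postgres: checkpointer'

printf '%s\n' "$sample" | count_backends 'idle in transaction'
```

Against a live system this would be `pgrep -lf postgres | count_backends 'idle in transaction'`. A bare `idle` pattern also matches the "idle in transaction" lines, so the more specific string is the safer one to count. The `state` column of `pg_stat_activity` gives the same information with more detail.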

  54. @leinweber tools of the trade: pg_s_s select * from pg_stat_statements

  55. @leinweber tools of the trade: summary cpu mem disk parallelism network perf x gdb x iostat x iotop x htop x x bwm x pgrep x

  56. @leinweber what to do

  57. @leinweber what to do configuration change

  58. @leinweber what to do db change

  59. @leinweber what to do code change

  60. @leinweber flirting with disaster Velocity NY 2013: Richard Cook 
 "Resilience In Complex Adaptive Systems" Jens Rasmussen: 
 Risk management in a dynamic society: a modeling problem

  61. @leinweber flirting with disaster economic boundary

  62. @leinweber flirting with disaster economic boundary workload boundary

  63. @leinweber flirting with disaster economic boundary performance boundary workload boundary

  64. @leinweber flirting with disaster economic boundary error margin performance boundary workload boundary

  65. @leinweber flirting with disaster economic boundary performance boundary workload boundary

  66. @leinweber flirting with disaster economic boundary error margin performance boundary workload boundary

  67. @leinweber flirting with disaster economic boundary error margin performance boundary workload boundary

  68. @leinweber flirting with disaster Velocity NY 2013: Richard Cook 
 "Resilience In Complex Adaptive Systems" Jens Rasmussen: 
 Risk management in a dynamic society: a modeling problem

  69. @leinweber thank you Will Leinweber @leinweber citusdata.com
