Database Stalls, From the Ordinary to the Obscure, by Preetam Jinka



slide-1
SLIDE 1

Database Stalls, From the Ordinary to the Obscure

Preetam Jinka (@PreetamJinka)

Software Engineer Percona Live 2017

slide-2
SLIDE 2

VividCortex’s database monitoring application is the best way to improve your database performance, efficiency, and uptime. Supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora, VividCortex uses patented algorithms to reveal key insights, helping users fix performance problems before they impact customers. Say hello and see a demo, Booth #205.

We’re hiring!

slide-3
SLIDE 3

This talk isn’t about the math.

Come to the O’Reilly booth after the talk to pick up a free copy of our book!
slide-4
SLIDE 4

What is a stall?

slide-5
SLIDE 5

Stalls

  • Short periods when work isn’t being done
  • We’re detecting stalls as short as 1 second
  • We do this with zero configuration and no fixed thresholds

○ The secret sauce: we have a model.

slide-6
SLIDE 6

We’re trying to catch small problems before they turn into bigger ones.
slide-7
SLIDE 7

Little’s Law

  • L = λ × W
  • Concurrency = Throughput × Latency
  • Little’s Law provides a model to relate throughput and concurrency

In MySQL:

  • Concurrency: threads_running

○ There’s one thread per query.
○ From SHOW STATUS

  • Throughput: queries completed per second
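As a rough illustration of how Little's Law ties these two MySQL metrics together, the sketch below rearranges L = λ × W to estimate average latency from concurrency and throughput. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
def implied_latency_seconds(threads_running: float, queries_per_second: float) -> float:
    """Little's Law rearranged: W = L / lambda."""
    if queries_per_second <= 0:
        raise ValueError("throughput must be positive")
    return threads_running / queries_per_second


def expected_concurrency(queries_per_second: float, latency_seconds: float) -> float:
    """Little's Law as stated on the slide: L = lambda * W."""
    return queries_per_second * latency_seconds


if __name__ == "__main__":
    # Hypothetical sample: 8 threads running while 2,000 queries/s complete.
    latency = implied_latency_seconds(8, 2000)
    print(f"implied average latency: {latency * 1000:.1f} ms")
```

If the measured concurrency is far above what the law predicts for the current throughput and typical latency, queries are probably piling up rather than completing.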

slide-8
SLIDE 8

MySQL Server Stall Example

More queries in progress, fewer being completed.
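The pattern on this slide (concurrency rising while throughput falls) can be approximated with a simple baseline comparison. The sketch below is not VividCortex's algorithm, which the talk does not describe; it is a simplified stand-in that does rely on two tunable parameters (`window` and `factor`), both of which are assumptions made for illustration, as is the sample data.

```python
from statistics import median


def find_stalls(concurrency, throughput, window=5, factor=2.0):
    """Return indices of samples that look like stalls.

    A sample is flagged when concurrency is at least `factor` times the
    median of the preceding `window` samples while throughput is at most
    1/`factor` of its median over the same window.
    """
    if len(concurrency) != len(throughput):
        raise ValueError("series must have the same length")
    stalls = []
    for i in range(window, len(concurrency)):
        base_c = median(concurrency[i - window:i])
        base_t = median(throughput[i - window:i])
        if base_c <= 0 or base_t <= 0:
            continue  # no usable baseline for this sample
        if concurrency[i] >= factor * base_c and throughput[i] <= base_t / factor:
            stalls.append(i)
    return stalls


if __name__ == "__main__":
    # Hypothetical one-second samples: threads running and queries completed.
    running = [4, 5, 4, 5, 4, 30, 32, 5, 4]
    completed = [1000, 1010, 990, 1005, 995, 200, 150, 1800, 1000]
    print(find_stalls(running, completed))
```

In the sample data, the burst of completions right after the flagged samples mirrors the stalled queries finishing once the fault ends.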

slide-9
SLIDE 9

MySQL Server Stall Example

All of the stalled queries are completing after the fault ends.

slide-10
SLIDE 10

Where do stalls come from?

  • Running out of credits on EBS volumes
  • MySQL query cache
  • Lock contention
  • A bad network cable!
  • Transparent huge pages (THP)

○ “If a transparent huge page isn’t available, the application will stall to let memory compaction run to free a page.”
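To check whether THP is active on a Linux host, one option is to read the kernel's sysfs setting. The sketch below assumes the common path /sys/kernel/mm/transparent_hugepage/enabled, where the active mode appears in square brackets; some distributions have used a different path, and the helper names here are hypothetical.

```python
from pathlib import Path

# Assumed location of the THP setting; it may differ on some distributions.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode from text such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED_PATH):
    """Return the active THP mode, or None if the file cannot be read."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(current_thp_mode())
```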

slide-11
SLIDE 11

But we don’t really care about any of those things. We’re focused on the work your database is doing.

slide-12
SLIDE 12

Work-centric monitoring

slide-13
SLIDE 13

Work-centric monitoring in one slide

  • Focus on the work your systems are doing
  • Find relationships between metrics (maybe using a model)
  • Monitor what you want to optimize
  • Focus on heavy hitters
  • Automatically detect changes
slide-14
SLIDE 14

How to respond to database stalls

slide-15
SLIDE 15

Slowness is about spending time on something. Things spend time doing work or waiting.

slide-16
SLIDE 16

Work

  • CPU
  • Disk I/O
  • Various storage engine metrics
  • Slow queries

○ Large scans

Waiting

  • Lock contention
  • Disk I/O
  • Memory compaction
slide-17
SLIDE 17

Walkthrough

slide-18
SLIDE 18

slide-19
SLIDE 19

slide-20
SLIDE 20

slide-21
SLIDE 21

slide-22
SLIDE 22

slide-23
SLIDE 23

Be careful about causality.

slide-24
SLIDE 24

Thread states

slide-25
SLIDE 25

Back pressure

slide-26
SLIDE 26

Back pressure is about systems receiving more work than they can process.

slide-27
SLIDE 27

slide-28
SLIDE 28

It’s much better to handle back pressure higher up the stack.
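One way to handle back pressure above the database is to cap the number of pending requests at the API layer and refuse new work once the cap is reached. The sketch below is a minimal illustration under that assumption; `BoundedInbox`, `Overloaded`, and the limit of two are hypothetical names and values, not something shown in the talk.

```python
import queue


class Overloaded(Exception):
    """Raised when the API layer refuses new work."""


class BoundedInbox:
    """Holds at most `max_pending` requests; extra requests are rejected."""

    def __init__(self, max_pending: int):
        if max_pending < 1:
            raise ValueError("max_pending must be at least 1")
        self._queue = queue.Queue(maxsize=max_pending)

    def submit(self, request) -> None:
        """Accept a request, or raise Overloaded if the inbox is full."""
        try:
            self._queue.put_nowait(request)
        except queue.Full:
            raise Overloaded("too many pending requests") from None

    def next_request(self):
        """Return the oldest pending request, or None if there is none."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


if __name__ == "__main__":
    inbox = BoundedInbox(max_pending=2)
    inbox.submit("query-1")
    inbox.submit("query-2")
    try:
        inbox.submit("query-3")
    except Overloaded as exc:
        print(f"rejected: {exc}")
```

Rejecting early lets the caller retry or degrade gracefully instead of adding to the pile of queries already waiting inside the database.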

slide-29
SLIDE 29

Clients

APIs

Database System

slide-30
SLIDE 30

Low-level back pressure can cause unfair slowdowns higher up the stack.*

*Totally untested hypothesis. :)

slide-31
SLIDE 31

slide-32
SLIDE 32

50 ms shift

slide-33
SLIDE 33

50 ms shift

  • ~1 sec queries stay ~1 sec queries (1x)
  • ~1 ms queries become ~50 ms queries (50x)
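The arithmetic behind this slide can be worked through directly: a fixed delay added to every query is a small fraction of a slow query's latency but a large multiple of a fast query's. The sketch below assumes the 50 ms shift is purely additive; the function name is hypothetical.

```python
def slowdown_factor(base_latency_s: float, added_delay_s: float) -> float:
    """Ratio of the new latency to the original latency."""
    if base_latency_s <= 0:
        raise ValueError("base latency must be positive")
    return (base_latency_s + added_delay_s) / base_latency_s


if __name__ == "__main__":
    shift_s = 0.050  # the 50 ms shift from the slide
    for base_s in (1.0, 0.001):
        factor = slowdown_factor(base_s, shift_s)
        print(f"{base_s * 1000:g} ms query slows down by about {factor:.2f}x")
```

Under this assumption a 1 s query takes 1.05 s (about 1x), while a 1 ms query takes 51 ms (about 51x, which the slide rounds to 50x).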

slide-34
SLIDE 34

Ways to deal with back pressure

  • Rate limiting / throttling
  • Use a queue to contain requests at a higher level
  • Somehow prioritize some requests over others
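As one possible illustration of rate limiting, the sketch below implements a token bucket, a common throttling technique that the talk does not specifically name. The class name, the rate, and the capacity are assumptions; the clock is injectable only so that the behavior can be checked deterministically.

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return whether the request may run."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, without exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_s=100, capacity=10)
    accepted = sum(bucket.allow() for _ in range(50))
    print(f"accepted {accepted} of 50 back-to-back requests")
```

Requests that are refused can be rejected outright, delayed, or placed on a queue, depending on which of the approaches listed on this slide fits the application.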

slide-35
SLIDE 35

Can you eliminate stalls?

Probably not all. Most? Perhaps!

slide-36
SLIDE 36

Come find me at the O’Reilly booth!

Questions?

Twitter: @PreetamJinka
Email: preetam@vividcortex.com