Diagnosing and Fixing MySQL Performance Problems Percona, Inc. - PowerPoint PPT Presentation

Morgan-Tockers-MacBook-Pro:~ morgo$ uptime 15:00 up 11 days, 16:58, 5 users, load averages: 0.88 0.61 0.44 35 Monday, April 12, 2010

Morgan-Tockers-MacBook-Pro:~ morgo$ uptime 15:00 up 11 days, 16:58, 5 users, load averages: 0.88 0.61 0.44 mysql> show global status like 'slow_queries%'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Slow_queries | 3 | +---------------+-------+ 1 row in set (0.00 sec) 36 Monday, April 12, 2010

Why do I show you this example? ★ A lock is a form of queue. ✦ R = Queue + Service Time ★ Some queuing shows up as “Load”. ✦ CPUs have a run queue. ✦ IO requests are queued before execution. 38 Monday, April 12, 2010

However; ★ Not all queuing shows up in the Operating System - such as internal locking contention. Cause: Possible Effect: More people request a System may look busy resource than current (CPU or Disks). capacity that can be offered. ★ It’s always better to instrument for the cause - as it may still be there without any ‘effect’ present. 39 Monday, April 12, 2010

Back to the Sequence Diagram ★ You want to find what doesn’t look right . Then take those quick wins. ★ This sounds trivial - but without instrumentation your guesses are often not correct. 40 Monday, April 12, 2010

Another Sequence Diagram: Browser Web Server Database Server Submit Login Form Login via External API Return Results Render page 41 Monday, April 12, 2010

The Lesson ★ In this case, the external API looks like a great piece to attack first. ★ How do we improve it? ✦ Maybe a developer can help here.. ✦ Can we cache the credentials? 42 Monday, April 12, 2010

The Important Lesson (cont.) 43 Monday, April 12, 2010

The Important Lesson (cont.) ★ We had a customer exactly like this: 43 Monday, April 12, 2010

The Important Lesson (cont.) ★ We had a customer exactly like this: ✦ They looked at their webservers. 43 Monday, April 12, 2010

The Important Lesson (cont.) ★ We had a customer exactly like this: ✦ They looked at their webservers. ✦ They weren’t loaded. 43 Monday, April 12, 2010

The Important Lesson (cont.) ★ We had a customer exactly like this: ✦ They looked at their webservers. ✦ They weren’t loaded. ✦ They looked at their database servers. 43 Monday, April 12, 2010

The Important Lesson (cont.) ★ We had a customer exactly like this: ✦ They looked at their webservers. ✦ They weren’t loaded. ✦ They looked at their database servers. ✦ They weren’t loaded either. 43 Monday, April 12, 2010

The Important Lesson (cont.) ★ We had a customer exactly like this: ✦ They looked at their webservers. ✦ They weren’t loaded. ✦ They looked at their database servers. ✦ They weren’t loaded either. ✦ The tried to optimize their database servers because ‘it was a database problem last time’. 43 Monday, April 12, 2010

The Important Lesson (cont.) ★ We had a customer exactly like this: ✦ They looked at their webservers. ✦ They weren’t loaded. ✦ They looked at their database servers. ✦ They weren’t loaded either. ✦ The tried to optimize their database servers because ‘it was a database problem last time’. ✦ Performance wasn’t much better. 43 Monday, April 12, 2010

Why do we set different goals for each task? Monday, April 12, 2010

Two different Tasks: ★ Sequence #1: (Total time 0.01 seconds) Start Update my status Done ★ Sequence #2: (Total time 60 seconds) Start Prepare Report Done 45 Monday, April 12, 2010

The “Optimization” ★ The customer was worried about a 60 second long query that ran at 3am every night. ★ Oh no, let’s optimize it by adding an index! 46 Monday, April 12, 2010

The outcome: ★ Sequence #1: (Total time 0.02 seconds) Start Update my status Done ★ Sequence #2: (Total time 10 seconds) Start Prepare Report Done 47 Monday, April 12, 2010

Look at: ★ The (ideally user facing) tasks that matter. ★ Write instrumentation so that you can think like a sequence diagram does. 48 Monday, April 12, 2010

Limits of Sequence Diagrams: ★ Why you might not see everyone using a sequence diagram in production - ✦ It breaks down when there’s too many sub-tasks involved in execution. 49 Monday, April 12, 2010

Introducing the Profile 50 Monday, April 12, 2010

Introducing the Profile (cont.) ★ The profile is related to the sequence diagram; ✦ What we do in a profile is aggregate similar sub-tasks into one entry. ✦ Then we order the profile from total R of a given task. 51 Monday, April 12, 2010

Table Of Contents 0. Welcome 5. Your Toolchest 1. Defining Performance 6. CPUs and Tasks 2. The Stack at 10000 Feet 7. Utilization versus Backlog 3. Isolating Problems 8. External-to-the-database Problems 4. Approaching the Stack 9. Conclusion 52 Monday, April 12, 2010

The Method for Isolation ★ In general: ✦ Start at the user, measure time taken from the front to the backend. ★ The absolutely certain method: ✦ Plod through the sequence diagram, measure each step. 53 Monday, April 12, 2010

The sequence diagram ★ Advantage: ✦ You will inevitably find the solution and be able to prove it. ★ Disadvantages: ✦ might take too much time ✦ things might be too hard or impossible to measure ✦ there might be too much information 54 Monday, April 12, 2010

Method for Isolation (cont.) ★ Slightly more practical preferred way: ✦ 1. Use a tool that makes it easy ✦ 2. If that's not possible, guess and measure. ✦ 3. If you can't measure anything, reason from logic. 55 Monday, April 12, 2010

Method #1: Instrument by time ★ Our favourite - and where you should start. ★ Only Con: ✦ Not always that easy to do in a complicated system. ✦ May not always be feasible to install instrumentation 56 Monday, April 12, 2010

Method #2: Guess and Measure ★ Look for a part of the system that “looks bad” ★ How much time does it consume? ★ Example: If logging in takes 2.1 seconds and DNS takes 2 seconds, you found the problem. 57 Monday, April 12, 2010

Method #3 - Reason from Logic ★ Really hard, really slow. ★ If you can't do any of those, you can ✦ ask for help ✦ trial-and-error (more on this later) ★ We suggest asking for help. 58 Monday, April 12, 2010

What not to do. Monday, April 12, 2010

Work on Unimportant Things ★ [DO] Apply Amdahl's Law: a 5% time consumer can't benefit you more than 5%. ★ [DON’T] Apply "best practices" ✦ These can have unexpected side effects. ✦ Example: I see many sort_merge_passes, maybe I need to increase sort_buffer_size. ✦ Example: I've heard that I should put the logs and data on different disks. 60 Monday, April 12, 2010

Guess-and-don't-measure ★ If you skip measuring, then you cause problems. ✦ This is also known as trial-and-error. ★ This method has 3 interesting properties: ✦ It sometimes works. ✦ It often wastes a lot of time and makes things worse. ✦ It can cost you your job. 61 Monday, April 12, 2010

Guess and Feel-Good-Measure ★ Apply BS-Benchmarks: ✦ Time how long it takes to create 10,000 tables and conclude InnoDB is slower. ✦ Write an application benchmark that doesn’t mimic that of your application (not enough data, single threaded only). 62 Monday, April 12, 2010

The best tool to make things easy is a profiler. Monday, April 12, 2010

System-wide ★ System-wide: ✦ oprofile, strace -c ★ Perl: Devel::NYTProf ★ Ruby: ruby -r profile (or just NewRelic) ★ PHP: ✦ Xdebug + KCachegrind -- but not in production ✦ XHProf http://mirror.facebook.net/facebook/xhprof/ 64 Monday, April 12, 2010

XHProf Demo ★ We’ve got it running here: ✦ http://184.73.91.208/test.php ✦ http://184.73.91.208/runs.php 65 Monday, April 12, 2010

Real Life. ★ Custom written profiling code from boardreader.com. ✦ Works similar to what you can achieve with NewRelic and XHProf. 67 Monday, April 12, 2010

A Screenshot: 68 Monday, April 12, 2010

Another Screenshot: 69 Monday, April 12, 2010

Another Screenshot: 70 Monday, April 12, 2010

User Tasks ★ If I had to guess it - I would say that they have a few different tasks that people do: ✦ Loading Static Pages (like the home page) ✦ Browse by a popular topic. ✦ Returning results for a custom search. 71 Monday, April 12, 2010

Task Goals ★ Each of these items (in my mind) has a different response time goal. Let's make some up for context: ✦ Loading the homepage 100ms. ✦ Browsing by a popular topic 500ms ✦ Search requests 2000ms (2 seconds) 72 Monday, April 12, 2010

Measuring Goals ★ Here's how boardreader.com stores the information: CREATE TABLE `performance_log_090721` ( `ip` varchar(15) NOT NULL, `server_ip` varchar(25) NOT NULL, `page` varchar(3000) NOT NULL, `utime` float NOT NULL, `stime` float NOT NULL, `wtime` float NOT NULL, `mysql_time` float NOT NULL, `sphinx_time` float NOT NULL, `mysql_count_queries` int(11) NOT NULL, `mysql_queries` text NOT NULL, `sphinx_count_queries` int(11) NOT NULL, `sphinx_queries` text NOT NULL, .. ) ENGINE=MyISAM DEFAULT CHARSET=latin1; 73 Monday, April 12, 2010

Measuring Goals (cont.) ★ Here's an aggregation of a day's worth of search responses: mysql> select avg(wtime) request, avg(stime+utime)/avg(wtime) cpu ,avg(mysql_time)/avg(wtime) mysql, avg(sphinx_time)/avg(wtime) sphinx, avg(wtime-stime-utime-sphinx_time-mysql_time)/avg(wtime) rest from performance_log_090721 where page_type='search' \G *************************** 1. row *************************** request: 1.2175869055517 cpu: 0.16983144536072 mysql: 0.1544487152423 sphinx: 0.61537297006254 rest: 0.060346869334443 1 row in set (4.16 sec) 74 Monday, April 12, 2010

Measuring Goals (cont.) mysql> select date_format(logged,"%H") h ,round(avg(wtime),3) r, round(avg(stime+utime)/avg ★ (wtime),2) cpup ,round(avg(mysql_time)/avg(wtime),2) mp, round(avg(sphinx_time)/avg(wtime),2) sp, round(avg(wtime-stime-utime-sphinx_time-mysql_time)/avg(wtime),2) rst from performance_log_090721 where page_type='search' group by h; +------+-------+------+------+------+------+ | h | r | cpup | mp | sp | rst | +------+-------+------+------+------+------+ | 00 | 1.816 | 0.11 | 0.14 | 0.70 | 0.05 | | 01 | 1.480 | 0.17 | 0.18 | 0.59 | 0.06 | | 02 | 1.394 | 0.16 | 0.22 | 0.53 | 0.09 | .... | 08 | 1.384 | 0.13 | 0.09 | 0.74 | 0.04 | | 09 | 1.315 | 0.17 | 0.11 | 0.67 | 0.04 | | 10 | 0.950 | 0.20 | 0.15 | 0.60 | 0.05 | | 11 | 0.874 | 0.21 | 0.16 | 0.57 | 0.06 | | 12 | 1.139 | 0.17 | 0.13 | 0.65 | 0.05 | | 13 | 1.191 | 0.16 | 0.14 | 0.65 | 0.05 | | 14 | 1.349 | 0.16 | 0.19 | 0.58 | 0.06 | | 15 | 1.076 | 0.20 | 0.21 | 0.53 | 0.06 | | 16 | 1.526 | 0.14 | 0.14 | 0.58 | 0.13 | | 17 | 0.853 | 0.24 | 0.19 | 0.50 | 0.07 | | 18 | 0.978 | 0.25 | 0.23 | 0.43 | 0.09 | | 19 | 0.924 | 0.23 | 0.17 | 0.54 | 0.06 | | 20 | 1.310 | 0.18 | 0.26 | 0.47 | 0.09 | | 21 | 1.211 | 0.17 | 0.24 | 0.51 | 0.08 | | 22 | 1.538 | 0.14 | 0.19 | 0.59 | 0.08 | | 23 | 1.450 | 0.15 | 0.18 | 0.60 | 0.06 | +------+-------+------+------+------+------+ 24 rows in set (4.33 sec) 75 Monday, April 12, 2010

Interpreting Results ★ Always interesting to see the difference in timing on a production system with real load! ★ 1 hour as I've got on my slides is probably not the best aggregate - some ‘blips’ in service may not show up. 76 Monday, April 12, 2010

Interpreting Results (cont.) ★ Average (mean) is easy to do in MySQL - but 95th percentile is probably better. ★ What is the problem with min/max/mean? 77 Monday, April 12, 2010

Actioning Results ★ It is only when you've got these metrics in place that you can start to look at where to fix the problem. ★ This is where the toolchest comes in. 78 Monday, April 12, 2010

Quick Glossary of Tools Monday, April 12, 2010

Top ★ Best use it to confirm what is really running. ✦ On a DB server mysqld should be at the top. ✦ Most of the time there should only be one mysqld process. ★ Example usage: ✦ top -bn1 81 Monday, April 12, 2010

ps ★ Quick Trick - verify that the sum of VSZ adds up to roughly the amount of memory used by the system: ✦ ps -e -o vsz | awk '{size += $1}END{print(size)}' ★ Check for all running servers: ✦ ps aux | grep mysqld 82 Monday, April 12, 2010

free ★ Doesn’t really show anything vmstat won’t. ✦ But very handy one line math to show caches. Example: $ free -m total used free shared buffers cached Mem: 32177 30446 1730 0 368 16649 -/+ buffers/cache: 13428 18748 Swap: 4095 2 4093 83 Monday, April 12, 2010

vmstat ★ Best run when server is busy. You can watch what is happening right now. Example: ✦ vmstat 5 ★ General-purpose, but a good way to sense what the system as a whole is doing. 84 Monday, April 12, 2010

vmstat (cont.) ★ Most important bits: ✦ si/so should be zero ✦ bi/bo are blocks read and written, so you can see IO ✦ What does 12% cpu usage mean? 85 Monday, April 12, 2010

mpstat ★ Installed as part of sysstat. ★ More useful than vmstat because it shows individual CPUs. Example: ✦ mpstat -P ALL 5 86 Monday, April 12, 2010

netstat ★ netstat -antp ✦ "ss -ant" works acceptably as well. 87 Monday, April 12, 2010

netstat (cont.) ★ Show count of states: ✦ netstat -antp | awk '{print $6}' | sort | uniq -c | sort -rn ★ Show count of peers: ✦ netstat -antp | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn 88 Monday, April 12, 2010

netstat (cont.) ★ Things to inspect: ✦ Are there a lot of sockets in TIME_WAIT? ✦ A lot going to port 53 (DNS) ? ✦ Who's connecting to mysqld? ★ You need to know the TCP handshake process: ✦ SYN, SYN-ACK, ACK, ESTABLISHED. 89 Monday, April 12, 2010

ping ★ useful for checking for packet loss. ★ let it run, watch the rtt, cancel, ★ look for missing packets. 90 Monday, April 12, 2010

iostat ★ Much better IO statistics than what vmstat provides. Example usage: ✦ iostat -dx 5 (need x for extended statistics) ✦ iostat -kx 5 (show CPU stats at the same time) ★ Main problem is that it lumps reads and writes together. ✦ Look for what the disks are doing and ask if this is reasonable. 91 Monday, April 12, 2010

iostat (cont.) ★ Main items to look at: ✦ What is the queue length? ✦ How much is being read and written? ✦ What is the average wait, and what is the service time? 92 Monday, April 12, 2010

Case Study #1 Monday, April 12, 2010

Diagnosing and Fixing MySQL Performance Problems Percona, Inc. - PowerPoint PPT Presentation

Diagnosing and Fixing MySQL Performance Problems Percona, Inc. http://www.percona.com/ 1 Monday, April 12, 2010 Table Of Contents 0. Welcome 5. Your Toolchest 1. Defining Performance 6. CPUs and Tasks 2. The Stack at 10000 Feet 7.

Performance Guide for MySQL Cluster Mikael Ronstrm, Ph.D Senior MySQL Architect Sun

Diagnosing bacterial Diagnosing bacterial Diagnosing bacterial Diagnosing bacterial infections

MySQL Proxy Making MySQL more flexible Jan Kneschke jan@mysql.com MySQL Proxy proxy-servers

MySQL Replication Update MySQL 5.5 (GA) & MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

MySQL Proxy meets: binlogs Jan Kneschke MySQL Enterprise Tools mailto: jan@mysql.com What is

MySQL Group Replication & MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

MySQL Cluster und MySQL Proxy Alles Online Diese Slides gibt es auch unter:

PHP and MySQL Dr. E. Benoist Winter Term 2006-2007 PHP and MySQL 1 PHP and MySQL Introduction

More on gdb for MySQL DBAs or Using gdb to study MySQL internals and as a last resort Valerii

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager My

gdb tips and tricks for MySQL DBAs or How gdb can help you to solve MySQL problems Valerii

MySQL Backup and Restore at Facebook Scale Ola Berjak Production Engineer at MySQL

MySQL Replication and HA at Facebook Part-II Jeff Jiang Production Engineer Facebook, Inc

PHP + MySQL MySQL on the command line is great and all well not its not really that great

Percona MySQL About Me Qunar.com DB Director

Forecasting MySQL Scalability Baron Schwartz O'Reilly MySQL Conference & Expo 2011

Dealing with Data in Title Performance Testing Bogdan Veres March 2018 Hi, Im Bogdan

The dark art of container monitoring Luca Marturana About me Luca Marturana Software Engineer

Welcome! NERC 2017 Standards and Compliance Workshop JW Marriott New Orleans July 11-12, 2017

PRODUCTS PRESENTATION Version 1.9 MAY 2019 1 R&D Features Acrons Mobile Technologies

:

How to use OPNFV Testing tools to test/validate an Openstack provided by a vendor Speakers

Our Quibble With Tibble Tibble v. Edison International Nancy Ross Brian Netter

Building Kendall E. Cook - GEAR UP Multi- Coordinator, CPIP - Maryland Higher Education

Diagnosing and Fixing MySQL Performance Problems Percona, Inc. - PowerPoint PPT Presentation

Diagnosing and Fixing MySQL Performance Problems Percona, Inc. http://www.percona.com/ 1 Monday, April 12, 2010 Table Of Contents 0. Welcome 5. Your Toolchest 1. Defining Performance 6. CPUs and Tasks 2. The Stack at 10000 Feet 7.

Performance Guide for MySQL Cluster Mikael Ronstrm, Ph.D Senior MySQL Architect Sun

Diagnosing bacterial Diagnosing bacterial Diagnosing bacterial Diagnosing bacterial infections

MySQL Proxy Making MySQL more flexible Jan Kneschke jan@mysql.com MySQL Proxy proxy-servers

MySQL Replication Update MySQL 5.5 (GA) &amp; MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

MySQL Proxy meets: binlogs Jan Kneschke MySQL Enterprise Tools mailto: jan@mysql.com What is

MySQL Group Replication &amp; MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

MySQL Cluster und MySQL Proxy Alles Online Diese Slides gibt es auch unter:

PHP and MySQL Dr. E. Benoist Winter Term 2006-2007 PHP and MySQL 1 PHP and MySQL Introduction

More on gdb for MySQL DBAs or Using gdb to study MySQL internals and as a last resort Valerii

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager My

gdb tips and tricks for MySQL DBAs or How gdb can help you to solve MySQL problems Valerii

MySQL Backup and Restore at Facebook Scale Ola Berjak Production Engineer at MySQL

MySQL Replication and HA at Facebook Part-II Jeff Jiang Production Engineer Facebook, Inc

PHP + MySQL MySQL on the command line is great and all well not its not really that great

Percona MySQL About Me Qunar.com DB Director

Forecasting MySQL Scalability Baron Schwartz O'Reilly MySQL Conference &amp; Expo 2011

Dealing with Data in Title Performance Testing Bogdan Veres March 2018 Hi, Im Bogdan

The dark art of container monitoring Luca Marturana About me Luca Marturana Software Engineer

Welcome! NERC 2017 Standards and Compliance Workshop JW Marriott New Orleans July 11-12, 2017

PRODUCTS PRESENTATION Version 1.9 MAY 2019 1 R&amp;D Features Acrons Mobile Technologies

:

How to use OPNFV Testing tools to test/validate an Openstack provided by a vendor Speakers

Our Quibble With Tibble Tibble v. Edison International Nancy Ross Brian Netter

Building Kendall E. Cook - GEAR UP Multi- Coordinator, CPIP - Maryland Higher Education

MySQL Replication Update MySQL 5.5 (GA) & MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

MySQL Group Replication & MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

Forecasting MySQL Scalability Baron Schwartz O'Reilly MySQL Conference & Expo 2011

PRODUCTS PRESENTATION Version 1.9 MAY 2019 1 R&D Features Acrons Mobile Technologies