Performance Tuning best practices and performance monitoring with Zabbix
Andrew Nelson, Senior Linux Consultant
May 28, 2015
NLUUG Conf, Utrecht, Netherlands
RED HAT | Andrew Nelson
Overview
$ whoami
years
forums and IRC
Ruby library zbxapi
Performance tuning and the Scientific Method
Understanding the problem
components
problem at all
Understanding the problem
You can't navigate somewhere when you don't know where you're going.
Defining the problem
reference
N seconds to write.”
moving the mouse”
Define your tests
$ time cp one /test_dir
$ time cp two /test_dir

$ run_test.sh
Subsystem A write tests
Run   Size    Time (seconds)
1     100KB   0.05
2     500KB   0.24
3     1MB     0.47
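A timed test such as the copies above is easier to repeat and compare when every run produces the same uniform output. The following is a minimal Python sketch of that idea, not the run_test.sh shown on the slide: the function names, the use of temporary directories, and the byte sizes (taken as multiples of 1024) are assumptions made for illustration.

```python
import os
import shutil
import tempfile
import time


def time_copy(src_path, dest_dir):
    """Copy one file into dest_dir and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    shutil.copy(src_path, dest_dir)
    return time.perf_counter() - start


def run_write_tests(sizes_bytes):
    """Create a file of each size, time copying it, and return (run, size, seconds) rows."""
    rows = []
    with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dest_dir:
        for run, size in enumerate(sizes_bytes, start=1):
            src_path = os.path.join(src_dir, f"test_{run}.dat")
            with open(src_path, "wb") as handle:
                handle.write(b"\0" * size)
            rows.append((run, size, time_copy(src_path, dest_dir)))
    return rows


def format_rows(rows):
    """Render rows as fixed-width text that is easy to collate or paste into a log."""
    lines = [f"{'Run':<5}{'Size (bytes)':<14}{'Time (seconds)'}"]
    for run, size, seconds in rows:
        lines.append(f"{run:<5}{size:<14}{seconds:.4f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sizes mirror the 100KB / 500KB / 1MB runs in the sample output (assumed sizes).
    print(format_rows(run_write_tests([100 * 1024, 500 * 1024, 1024 * 1024])))
```

Copies into a temporary directory mostly exercise the page cache, so for a real storage test the destination would need to be the subsystem under investigation.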
Define your tests
a) It is representative of the problem.
b) Its output is easy to collate and process.
group C but managed by department D.
support and may not lend much assistance placing department A in a difficult position.
Perform your tests
baseline data
tests with the changes made
closer
Perform your tests and DOCUMENT!
foreseen?
If someone ran a test on a server, but did not log it, did it really happen?
When testing, don't forget to...
Story time!
RHEL5 running on x86
was “slower” on RHEL
slower on newer systems
More Story time!
6 with GFS2
quantification of “slow”.
but Developers claimed NFS was faster.
developers became more educated about performance and overall things are improved.
something before asking for it.
Little's Law
Law in action.
analogous to lambda.
Queue length
Little's Law
latency)
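Little's Law is usually written L = λW: the average number of items in a system (the queue length) equals the average arrival rate (lambda) multiplied by the average time each item spends in the system (the latency). The sketch below works through that arithmetic; the function names and the example figures are hypothetical and are not taken from the slides.

```python
def queue_length(arrival_rate, latency):
    """Little's Law: average items in the system, L = lambda * W."""
    return arrival_rate * latency


def latency_from(queue_len, arrival_rate):
    """Rearranged form, W = L / lambda."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return queue_len / arrival_rate


if __name__ == "__main__":
    # Hypothetical figures: 200 requests per second, each spending 50 ms in the server.
    in_flight = queue_length(200, 0.05)
    print(f"average requests in flight: {in_flight:.1f}")
    # Working backwards: 10 requests in flight at 200 requests per second.
    print(f"implied latency in seconds: {latency_from(10, 200):.3f}")
```

Any two of the three quantities determine the third, which is why measuring request count and total time spent is enough to estimate how many requests are in flight.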
Little's Law
[Figure: outbound packets and inbound packets; values shown: 9000, 1500, 150]
Little's law in action.
monitoring.
Little's law in action.
products, so how can we monitor its performance in Zabbix?
parsing the status page
however we can use a script to parse the logs in realtime from Zabbix and use a file socket for data
Little's law in action.
into Zabbix.
script
# YYYYMMDD-HHMMSS Path BytesReceived BytesSent TimeSpent MicrosecondsSpent
LogFormat "%{%Y%m%d-%H%M%S}t %U %I %O %T %D" zabbix-log
CustomLog "|$/var/lib/zabbix/apache-log.rb >>/var/lib/zabbix/errors" zabbix-log
$ cat /var/lib/zabbix/apache-data-out
Count    Received   Sent        total_time  total_microseconds
4150693  573701315  9831930078  0           335509340
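The log format above yields one whitespace-separated line per request, which a piped script can add up into running totals like those in the data file. The author's script is written in Ruby and its source is not shown here, so the following Python sketch only illustrates the general approach; the function names and the dictionary keys are assumptions.

```python
def parse_log_line(line):
    """Split one zabbix-log line into typed fields.

    Field order follows the LogFormat comment:
    timestamp, path, bytes received, bytes sent, seconds, microseconds.
    """
    timestamp, rest = line.strip().split(None, 1)
    # Split the numeric fields from the right so an unusual path cannot shift them.
    path, received, sent, seconds, micros = rest.rsplit(None, 4)
    return {
        "timestamp": timestamp,
        "path": path,
        "received": int(received),
        "sent": int(sent),
        "seconds": int(seconds),
        "microseconds": int(micros),
    }


def accumulate(lines):
    """Sum request count, bytes, and time over an iterable of log lines."""
    totals = {"count": 0, "received": 0, "sent": 0,
              "total_time": 0, "total_microseconds": 0}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        entry = parse_log_line(line)
        totals["count"] += 1
        totals["received"] += entry["received"]
        totals["sent"] += entry["sent"]
        totals["total_time"] += entry["seconds"]
        totals["total_microseconds"] += entry["microseconds"]
    return totals


def mean_latency_seconds(totals):
    """Average time per request, the W term in Little's Law."""
    if totals["count"] == 0:
        return 0.0
    return totals["total_microseconds"] / totals["count"] / 1_000_000


if __name__ == "__main__":
    # Two made-up log lines in the format above.
    sample = [
        "20150528-101500 /index.html 300 5000 0 250",
        "20150528-101501 /logo.png 280 12000 0 150",
    ]
    totals = accumulate(sample)
    print(totals, mean_latency_seconds(totals))
```

Keeping cumulative counters means the monitoring side can compute rates from the difference between two samples, and a restart of the sender loses no data.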
Little's law in action.
Zabbix_sender
$ crontab -e
*/1 * * * * /var/lib/zabbix/zabbix_sender.sh
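The contents of zabbix_sender.sh are not shown, so the following Python sketch is only one way such a job might turn the data file into zabbix_sender calls. It uses the standard zabbix_sender options -z (server), -s (host), -k (item key), and -o (value); the server name, host name, item key prefix, and function names are all hypothetical.

```python
def read_totals(text):
    """Parse a two-line data file (header row, value row) into a dict."""
    header_line, value_line = text.strip().splitlines()[:2]
    names = header_line.split()
    values = [int(value) for value in value_line.split()]
    if len(names) != len(values):
        raise ValueError("header and value counts differ")
    return dict(zip(names, values))


def build_sender_commands(totals, server, host, key_prefix="apache.log"):
    """Build one zabbix_sender argument list per counter.

    The key prefix is an assumed naming scheme; matching trapper items
    would have to exist on the Zabbix server.
    """
    commands = []
    for name, value in totals.items():
        commands.append([
            "zabbix_sender",
            "-z", server,
            "-s", host,
            "-k", f"{key_prefix}.{name.lower()}",
            "-o", str(value),
        ])
    return commands


if __name__ == "__main__":
    data = (
        "Count Received Sent total_time total_microseconds\n"
        "4150693 573701315 9831930078 0 335509340\n"
    )
    for command in build_sender_commands(read_totals(data), "zabbix.example.com", "webserver1"):
        # Printed rather than executed; subprocess.run(command) would send the value.
        print(" ".join(command))
```

Building argument lists instead of shell strings avoids quoting problems if a value or host name ever contains unexpected characters.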
The test environment
[Diagram: Storage Server, Physical System (desktop), Router/Firewall, Hypervisor 1 (Terry), Hypervisor 2 (Sherri), Zabbix Server; links labeled Infiniband, GigE, and 100Mbit]
NOTE: See last slides for more details
What are we looking for
investigative testing will help shape this.
the server.
increased time to service the queue
will issue an error, or the client will time out.
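One way to make "peak performance" concrete is to raise the load step by step and note where throughput stops growing while the time to service the queue keeps rising. The sketch below shows a simple plateau check over measured samples; the function name, the 5% tolerance, and the sample numbers are assumptions for illustration and are not measurements from these tests.

```python
def find_plateau(samples, tolerance=0.05):
    """Return the load level after which throughput stops growing, or None.

    samples is a list of (load, throughput) pairs sorted by increasing load.
    A plateau is reported when the relative gain over the previous sample
    is at or below `tolerance`.
    """
    for (prev_load, prev_tp), (_load, tp) in zip(samples, samples[1:]):
        if prev_tp > 0 and (tp - prev_tp) / prev_tp <= tolerance:
            return prev_load
    return None


if __name__ == "__main__":
    # Hypothetical (concurrent connections, pages per second) measurements.
    measured = [(10, 100), (20, 200), (40, 390), (80, 400), (160, 398)]
    print("throughput levels off at load:", find_plateau(measured))
```

A plateau in throughput on its own does not say which component is saturated; the client, the network, and the server all need to be checked, as the tests that follow do.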
Finding Peak Performance, initial test
system “Desktop”
connections per second.
Test Window
Finding Peak Performance, initial test
plateau, but not saturation on the client.
appearance
appears very busy.
Test Window
Finding Peak Performance, initial test
report that it responds faster with more connections
increased latency
Test Window
Finding Peak Performance, initial test
Finding Peak Performance, Initial analysis
cache.
cache efficiency.
test server and other systems behind the firewall/router
Finding Peak Performance, Initial analysis
and switch as the server and not traverse the router.
Finding Peak Performance, second test
VM with full 1Gb links to test server
connections count, it seems an upper limit has again been found.
plateau
load, but overall still low
Test Window
Finding Peak Performance, second test
appears to be the bottleneck
plateau noted
picture of load, but it would seem there is still CPU capacity left.
Test Window
Finding Peak Performance, second test
to respond faster under load than when idle
smooth appearance
any change in Apache
is back to “normal” load.
Test Window
Finding Peak Performance, second test
Zabbix graphs.
Finding Peak Performance, Second analysis
to processor cache as noted before.
be expected for a saturated server.
for this server with the test web page is about 1,200 pages per second
One more story...
allocated memory.
and appeared to spend 100% of one core's time in IO wait.
One more story...
One more story...
Conclusion
long term monitoring
expected.
metrics for use by external software.
Questions
Fragen jautājumi 質問 pytania vragen kysymykset питання вопросы spørgsmål domande preguntas
Source Code
The test environment (More details)
[Diagram: Storage Server, Physical System (desktop), Router/Firewall, Hypervisor 1 (Terry), Hypervisor 2 (Sherri), Zabbix Server; links labeled Infiniband, GigE, and 100Mbit]
The test environment (More details)
equipment.
The test environment (More details)
embedded graphic. Two connections equal one page load.
logging script
track response times from the Zabbix server.