Linux Performance Analysis and Tools
Lead Performance Engineer brendan@joyent.com
Brendan Gregg
@brendangregg
SCaLE11x
February, 2013
Sunday, February 24, 13
Linux Performance Analysis and Tools Brendan Gregg Lead - - PowerPoint PPT Presentation
Linux Performance Analysis and Tools Brendan Gregg Lead Performance Engineer brendan@joyent.com SCaLE11x @brendangregg February, 2013 Sunday, February 24, 13 Find the Bottleneck Operating System Hardware Applications DBs, all server
Lead Performance Engineer brendan@joyent.com
Brendan Gregg
@brendangregg
SCaLE11x
February, 2013
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port
Expander Interconnect I/O Bus Interface Transports
I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface
CPU Inter- connect Memory Bus
CPU 1 DRAM
Operating System Hardware Linux Kernel
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
http://dtrace.org/blogs/brendan/2012/01/30/performance-analysis-talk-at-scale10x/
Sunday, February 24, 13
ENTERPRISE AND THE CLOUD Brendan Gregg Prentice Hall, 2013
(secondary is the OpenSolaris-illumos-based SmartOS)
Sunday, February 24, 13
Sunday, February 24, 13
tune, and do more with less
and develop around them
Sunday, February 24, 13
latency outliers
new perf issues are encountered.
Sunday, February 24, 13
Workload Application System Calls System Libraries Kernel Devices Workload Analysis Resource Analysis Operating System Software Stack
Developers Sysadmins
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Int. Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS System Libraries Device Drivers Scheduler Virtual Memory System Call Interface Linux Kernel user-level kernel-level Operating System
Sunday, February 24, 13
even if you intend to use a monitoring product. Monitoring products often use the same metrics, read from /proc.
$ iostat Linux 3.2.6-3.fc16.x86_64 (node104) 02/20/2013 _x86_64_ (1 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.02 0.00 0.10 0.04 0.00 99.84 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn vda 0.24 7.37 2.15 80735422 23571828 vdb 0.06 5.51 7.79 60333940 85320072
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
Operating System Hardware
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
Operating System Hardware
perf dtrace stap perf iostat iotop blktrace dtrace perf top pidstat mpstat dstat slabtop dstat free top netstat strace tcpdump ip nicstat dtrace perf dstat sar /proc
Various:
vmstat ping
Sunday, February 24, 13
Sunday, February 24, 13
and waiting. Linux includes tasks blocked on disk I/O.
constants of 1, 5 and 15 minutes. With three values you can see if load is increasing, steady, or decreasing.
CPUs are saturated (100% utilized), and threads are suffering scheduler latency. Might. There’s that disk I/O factor too.
$ uptime 16:23:34 up 126 days, 1:03, 1 user, load average: 5.09, 2.12, 1.82
Sunday, February 24, 13
$ top top - 01:38:11 up 63 days, 1:17, 2 users, load average: 1.57, 1.81, 1.77 Tasks: 256 total, 2 running, 254 sleeping, 0 stopped, 0 zombie Cpu(s): 2.0%us, 3.6%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 49548744k total, 16746572k used, 32802172k free, 182900k buffers Swap: 100663292k total, 0k used, 100663292k free, 14925240k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11721 web 20 0 623m 50m 4984 R 93 0.1 0:59.50 node 11715 web 20 0 619m 20m 4916 S 25 0.0 0:07.52 node 10 root 20 0 0 0 0 S 1 0.0 248:52.56 ksoftirqd/2 51 root 20 0 0 0 0 S 0 0.0 0:35.66 events/0 11724 admin 20 0 19412 1444 960 R 0 0.0 0:00.07 top 1 root 20 0 23772 1948 1296 S 0 0.0 0:04.35 init [...]
Sunday, February 24, 13
Sunday, February 24, 13
upgrading to faster CPUs doesn’t help!
Sunday, February 24, 13
Sunday, February 24, 13
$ mpstat -P ALL 1 02:47:49 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle 02:47:50 all 54.37 0.00 33.12 0.00 0.00 0.00 0.00 0.00 12.50 02:47:50 0 22.00 0.00 57.00 0.00 0.00 0.00 0.00 0.00 21.00 02:47:50 1 19.00 0.00 65.00 0.00 0.00 0.00 0.00 0.00 16.00 02:47:50 2 24.00 0.00 52.00 0.00 0.00 0.00 0.00 0.00 24.00 02:47:50 3 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 02:47:50 4 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 02:47:50 5 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 02:47:50 6 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 02:47:50 7 16.00 0.00 63.00 0.00 0.00 0.00 0.00 0.00 21.00 02:47:50 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [...]
Sunday, February 24, 13
$ iostat -xkdz 1 Linux 2.6.35-32-server (prod21) 02/20/13 _x86_64_ (16 CPU) Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s \ ... sda 0.00 0.00 0.00 0.00 0.00 0.00 / ... sdb 0.00 0.35 0.00 0.05 0.10 1.58 \ ... / ... Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s \ ... sdb 0.00 0.00 591.00 0.00 2364.00 0.00 / ... ... \ avgqu-sz await r_await w_await svctm %util ... / 0.00 0.84 0.84 0.00 0.84 0.00 ... \ 0.00 3.82 3.47 3.86 0.30 0.00 ... / 0.00 2.31 2.31 0.00 2.31 0.00 ... \ ... / avgqu-sz await r_await w_await svctm %util ... \ 0.95 1.61 1.61 0.00 1.61 95.00
workload input resulting performance
Sunday, February 24, 13
by multiple disks may accept more work a 100% utilization
Sunday, February 24, 13
memory by swapping pages to disk, but costs performance
$ vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 15 0 2852 46686812 279456 1401196 0 0 0 0 0 0 0 0 100 0 16 0 2852 46685192 279456 1401196 0 0 0 0 2136 36607 56 33 11 0 15 0 2852 46685952 279456 1401196 0 0 0 56 2150 36905 54 35 11 0 15 0 2852 46685960 279456 1401196 0 0 0 0 2173 36645 54 33 13 0 [...]
Sunday, February 24, 13
$ free total used free shared buffers cached Mem: 49548744 32787912 16760832 0 61588 342696
Swap: 100663292 0 100663292
Sunday, February 24, 13
IP stack latency, including how the network handles ICMP.
Lots of other/better tools for this (eg, hping). Try using TCP.
$ ping www.hilton.com PING a831.b.akamai.net (63.234.226.9): 56 data bytes 64 bytes from 63.234.226.9: icmp_seq=0 ttl=56 time=737.737 ms Request timeout for icmp_seq 1 64 bytes from 63.234.226.9: icmp_seq=2 ttl=56 time=819.457 ms 64 bytes from 63.234.226.9: icmp_seq=3 ttl=56 time=897.835 ms 64 bytes from 63.234.226.9: icmp_seq=4 ttl=56 time=669.052 ms 64 bytes from 63.234.226.9: icmp_seq=5 ttl=56 time=799.932 ms ^C
6 packets transmitted, 5 packets received, 16.7% packet loss round-trip min/avg/max/stddev = 669.052/784.803/897.835/77.226 ms
Sunday, February 24, 13
(Tim Cook ported and enhanced it on Linux)
# nicstat -z 1 Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat 01:20:58 eth0 0.07 0.00 0.95 0.02 79.43 64.81 0.00 0.00 01:20:58 eth4 0.28 0.01 0.20 0.10 1451.3 80.11 0.00 0.00 01:20:58 vlan123 0.00 0.00 0.00 0.02 42.00 64.81 0.00 0.00 01:20:58 br0 0.00 0.00 0.00 0.00 42.00 42.07 0.00 0.00 Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat 01:20:59 eth4 42376.0 974.5 28589.4 14002.1 1517.8 71.27 35.5 0.00 Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat 01:21:00 eth0 0.05 0.00 1.00 0.00 56.00 0.00 0.00 0.00 01:21:00 eth4 41834.7 977.9 28221.5 14058.3 1517.9 71.23 35.1 0.00 Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat 01:21:01 eth4 42017.9 979.0 28345.0 14073.0 1517.9 71.24 35.2 0.00 [...]
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
Operating System Hardware
iostat top mpstat dstat dstat free top dstat vmstat ping
I/O Bridge
nicstat
I/O Controller Network Controller
infer infer
Sunday, February 24, 13
Sunday, February 24, 13
$ sar -B 1 Linux 3.2.6-3.fc16.x86_64 (node104) 02/20/2013 _x86_64_ (1 CPU) 05:24:34 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff 05:24:35 PM 0.00 0.00 267.68 0.00 29.29 0.00 0.00 0.00 0.00 05:24:36 PM 19.80 0.00 265.35 0.99 28.71 0.00 0.00 0.00 0.00 05:24:37 PM 12.12 0.00 1339.39 1.01 2763.64 0.00 1035.35 1035.35 100.00 05:24:38 PM 0.00 0.00 534.00 0.00 28.00 0.00 0.00 0.00 0.00 05:24:39 PM 220.00 0.00 644.00 3.00 74.00 0.00 0.00 0.00 0.00 05:24:40 PM 2206.06 0.00 6188.89 17.17 5222.22 2919.19 0.00 2919.19 100.00 [...]
Sunday, February 24, 13
$ netstat -s [...] Tcp: 127116 active connections openings 165223 passive connection openings 12904 failed connection attempts 19873 connection resets received 20 connections established 662889209 segments received 354923419 segments send out 405146 segments retransmited 6 bad segments received. 26379 resets sent [...] TcpExt: 2142 invalid SYN cookies received 3350 resets received for embryonic SYN_RECV sockets 7460 packets pruned from receive queue because of socket buffer overrun 2932 ICMP packets dropped because they were out-of-window 96670 TCP sockets finished time wait in fast timer 86 time wait sockets recycled by time stamp 1007 packets rejects in established connections because of timestamp [...many...]
Sunday, February 24, 13
# pidstat 1 Linux 3.2.6-3.fc16.x86_64 (node107) 02/20/2013 _x86_64_ (1 CPU) 05:55:18 PM PID %usr %system %guest %CPU CPU Command 05:55:19 PM 12642 0.00 1.01 0.00 1.01 0 pidstat 05:55:19 PM 12643 5.05 11.11 0.00 16.16 0 cksum 05:55:19 PM PID %usr %system %guest %CPU CPU Command 05:55:20 PM 12643 6.93 6.93 0.00 13.86 0 cksum [...] # pidstat -d 1 Linux 3.2.6-3.fc16.x86_64 (node107) 02/20/2013 _x86_64_ (1 CPU) 05:55:22 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:55:23 PM 279 0.00 61.90 0.00 jbd2/vda2-8 05:55:23 PM 12643 151985.71 0.00 0.00 cksum 05:55:23 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:55:24 PM 12643 96616.67 0.00 0.00 cksum [...]
disk I/O (yay!)
Sunday, February 24, 13
and most resource I/O is performed by syscalls
$ strace -tttT -p 12670 1361424797.229550 read(3, "REQUEST 1888 CID 2"..., 65536) = 959 <0.009214> 1361424797.239053 read(3, "", 61440) = 0 <0.000017> 1361424797.239406 close(3) = 0 <0.000016> 1361424797.239738 munmap(0x7f8b22684000, 4096) = 0 <0.000023> 1361424797.240145 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000017> [...]
Sunday, February 24, 13
# time dd if=/dev/zero of=/dev/null bs=512 count=1024k [...] 536870912 bytes (537 MB) copied, 0.35226 s, 1.5 GB/s real 0m0.355s user 0m0.021s sys 0m0.022s # time strace -c dd if=/dev/zero of=/dev/null bs=512 count=1024k [...] 536870912 bytes (537 MB) copied, 71.9565 s, 7.5 MB/s real 1m11.969s user 0m3.179s sys 1m6.346s
200x slower
# strace -c dd if=/dev/zero of=/dev/null bs=512 count=1024k [...] % time seconds usecs/call calls errors syscall
51.32 0.028376 0 1048581 read 48.68 0.026911 0 1048579 write 0.00 0.000000 0 7 open [...]
Sunday, February 24, 13
# tcpdump -i eth4 -w /tmp/out.tcpdump tcpdump: listening on eth4, link-type EN10MB (Ethernet), capture size 65535 bytes ^C33651 packets captured 34160 packets received by filter 508 packets dropped by kernel # tcpdump -nr /tmp/out.tcpdump reading from file /tmp/out.tcpdump, link-type EN10MB (Ethernet) 06:24:43.908732 IP 10.2.0.2.55502 > 10.2.203.2.22: Flags [.], ack ... 06:24:43.908922 IP 10.2.0.2.55502 > 10.2.203.2.22: Flags [.], ack ... 06:24:43.908943 IP 10.2.203.2.22 > 10.2.0.2.55502: Flags [.], seq ... 06:24:43.909061 IP 10.2.0.2.55502 > 10.2.203.2.22: Flags [.], ack ...
Sunday, February 24, 13
example dropped packets
Sunday, February 24, 13
highlighted (seconds).
# btrace /dev/sdb 8,16 3 1 0.429604145 20442 A R 184773879 + 8 <- (8,17) 184773816 8,16 3 2 0.429604569 20442 Q R 184773879 + 8 [cksum] 8,16 3 3 0.429606014 20442 G R 184773879 + 8 [cksum] 8,16 3 4 0.429607624 20442 P N [cksum] 8,16 3 5 0.429608804 20442 I R 184773879 + 8 [cksum] 8,16 3 6 0.429610501 20442 U N [cksum] 1 8,16 3 7 0.429611912 20442 D R 184773879 + 8 [cksum] 8,16 1 1 0.440227144 0 C R 184773879 + 8 [0] [...]
Sunday, February 24, 13
than pidstat’s Kbytes)
similar enabled to work.
# iotop -bod5 Total DISK READ: 35.38 M/s | Total DISK WRITE: 39.50 K/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND 12824 be/4 root 35.35 M/s 0.00 B/s 0.00 % 80.59 % cksum ... 279 be/3 root 0.00 B/s 27.65 K/s 0.00 % 2.21 % [jbd2/vda2-8] 12716 be/4 root 28.44 K/s 0.00 B/s 2.35 % 0.00 % sshd: root@pts/0 12816 be/4 root 6.32 K/s 0.00 B/s 0.89 % 0.00 % python /usr/bin/ iotop -bod5 [...]
Sunday, February 24, 13
# slabtop -sc Active / Total Objects (% used) : 900356 / 1072416 (84.0%) Active / Total Slabs (% used) : 29085 / 29085 (100.0%) Active / Total Caches (% used) : 68 / 91 (74.7%) Active / Total Size (% used) : 237067.98K / 260697.24K (90.9%) Minimum / Average / Maximum Object : 0.01K / 0.24K / 10.09K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 112035 110974 99% 0.91K 3201 35 102432K ext4_inode_cache 726660 579946 79% 0.11K 20185 36 80740K buffer_head 4608 4463 96% 4.00K 576 8 18432K kmalloc-4096 83496 76878 92% 0.19K 1988 42 15904K dentry 23809 23693 99% 0.55K 821 29 13136K radix_tree_node 11016 9559 86% 0.62K 216 51 6912K proc_inode_cache 3488 2702 77% 1.00K 109 32 3488K kmalloc-1024 510 431 84% 5.73K 102 5 3264K task_struct 10948 9054 82% 0.17K 238 46 1904K vm_area_struct 2585 1930 74% 0.58K 47 55 1504K inode_cache [...]
Sunday, February 24, 13
# sysctl -a [...] net.ipv4.tcp_fack = 1 net.ipv4.tcp_reordering = 3 net.ipv4.tcp_ecn = 2 net.ipv4.tcp_dsack = 1 net.ipv4.tcp_mem = 24180 32240 48360 net.ipv4.tcp_wmem = 4096 16384 1031680 net.ipv4.tcp_rmem = 4096 87380 1031680 [...]
Sunday, February 24, 13
$ cat /proc/meminfo MemTotal: 8181740 kB MemFree: 71632 kB Buffers: 163288 kB Cached: 4518600 kB SwapCached: 7036 kB Active: 4765476 kB Inactive: 2866016 kB Active(anon): 2480336 kB Inactive(anon): 478580 kB Active(file): 2285140 kB Inactive(file): 2387436 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 2932728 kB SwapFree: 2799568 kB Dirty: 76 kB Writeback: 0 kB [...]
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
CPU performance counters (programmable registers)
subcommands, including:
kmem Trace/measure kernel memory (slab) properties kvm Trace/measure KVM guest OS list List available events (targets of instrumentation) lock Analyze lock events probe Create dynamic probe points (dynamic tracing!) record Run a command and record profile data (as perf.data) report Read perf.data and summarize, has an interactive mode sched Trace/measure kernel scheduler statistics stat Run a command, gather, and report perf counter stats
Sunday, February 24, 13
ways to reduce memory I/O, and improve locality (NUMA)
$ perf stat gzip file1 Performance counter stats for 'gzip file1': 2294.924314 task-clock-msecs # 0.901 CPUs 62 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 265 page-faults # 0.000 M/sec 5496871381 cycles # 2395.230 M/sec 12210601948 instructions # 2.221 IPC 1263678628 branches # 550.641 M/sec 13037608 branch-misses # 1.032 % 4725467 cache-references # 2.059 M/sec 2779597 cache-misses # 1.211 M/sec 2.546444859 seconds time elapsed
yay
Sunday, February 24, 13
whatever the processor supports. Examine bus events.
$ perf list | grep Hardware cpu-cycles OR cycles [Hardware event] stalled-cycles-frontend OR idle-cycles-frontend [Hardware event] stalled-cycles-backend OR idle-cycles-backend [Hardware event] instructions [Hardware event] cache-references [Hardware event] [...] $ perf stat -e instructions,cycles,L1-dcache-load-misses,LLC-load- misses,dTLB-load-misses gzip file1 Performance counter stats for 'gzip file1': 12278136571 instructions # 2.199 IPC 5582247352 cycles 90367344 L1-dcache-load-misses 1227085 LLC-load-misses 685149 dTLB-load-misses 2.332492555 seconds time elapsed
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
Operating System Hardware
perf stat
Expander Interconnect I/O Bus CPU Inter- connect Memory Bus
advanced activity: refer to the processor manuals
Sunday, February 24, 13
# perf record -a -g -F 997 sleep 10 [ perf record: Woken up 44 times to write data ]
Sunday, February 24, 13
# perf report --stdio [...] # Overhead Command Shared Object Symbol # ........ ........... ................. .............................. # 72.98% swapper [kernel.kallsyms] [k] native_safe_halt |
default_idle cpu_idle rest_init start_kernel x86_64_start_reservations x86_64_start_kernel 9.43% dd [kernel.kallsyms] [k] acpi_pm_read |
ktime_get_ts | |--87.75%-- __delayacct_blkio_start | io_schedule_timeout | balance_dirty_pages_ratelimited_nr | generic_file_buffered_write [...]
Sunday, February 24, 13
Sunday, February 24, 13
$ perf list | grep block: block:block_rq_abort [Tracepoint event] block:block_rq_requeue [Tracepoint event] block:block_rq_complete [Tracepoint event] block:block_rq_insert [Tracepoint event] block:block_rq_issue [Tracepoint event] block:block_bio_bounce [Tracepoint event] block:block_bio_complete [Tracepoint event] block:block_bio_backmerge [Tracepoint event] block:block_bio_frontmerge [Tracepoint event] block:block_bio_queue [Tracepoint event] block:block_getrq [Tracepoint event] block:block_sleeprq [Tracepoint event] block:block_plug [Tracepoint event] block:block_unplug [Tracepoint event] block:block_split [Tracepoint event] block:block_bio_remap [Tracepoint event] block:block_rq_remap [Tracepoint event] $ perf list | grep Tracepoint | wc -l 840
Sunday, February 24, 13
Disk Disk Port Port I/O Controller Network Controller I/O Bridge CPU 1 DRAM
Operating System Hardware
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS System Libraries Device Drivers Scheduler Virtual Memory System Call Interface
sock: net: skb: scsi: irq: block: ext4: vmscan: kmem: sched: syscalls:
... more can be added as needed device stats can be inferred
Sunday, February 24, 13
# perf probe --add='tcp_sendmsg' Add new event: probe:tcp_sendmsg (on tcp_sendmsg) [...] # perf record -e probe:tcp_sendmsg -aR -g sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.091 MB perf.data (~3972 samples) ] # perf report --stdio [...] # Overhead Command Shared Object Symbol # ........ ....... ................. ........... # 100.00% sshd [kernel.kallsyms] [k] tcp_sendmsg |
sock_aio_write do_sync_write vfs_write sys_write system_call __GI___libc_write
active traced call stacks from arbitrary kernel locations!
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
Operating System Hardware
perf probe --add
advanced activity: refer to the kernel source code device stats can be inferred
Sunday, February 24, 13
analyze cycle, and using post-processors to rework the
Sunday, February 24, 13
Sunday, February 24, 13
There are a couple of awesome books about DTrace too
Sunday, February 24, 13
I’m using it the lab to solve replicated production perf issues
tools/get-deps.pl # if using Ubuntu tools/get-deps-fedora.sh # RedHat/Fedora make all make install make load (need to be root or have sudo access) # make load tools/load.pl 13:40:14 Syncing... 13:40:14 Loading: build-3.2.6-3.fc16.x86_64/driver/dtracedrv.ko 13:40:15 Preparing symbols... 13:40:15 Probes available: 281887 13:40:18 Time: 4s
Sunday, February 24, 13
liners and scripts. In-kernel custom filtering and aggregation.
# dtrace -n 'fbt::tcp_sendmsg:entry /execname == "sshd"/ { @["bytes"] = quantize(arg3); }' dtrace: description 'fbt::tcp_sendmsg:entry ' matched 1 probe ^C bytes value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@ 1869 64 |@@@@@@@@@@@@@ 1490 128 |@@@ 355 256 |@@@@ 461 512 |@@@ 373 1024 |@ 95 2048 | 4 4096 | 1 8192 | 0
Sunday, February 24, 13
liners and scripts. In-kernel custom filtering and aggregation.
# dtrace -n 'fbt::tcp_sendmsg:entry /execname == "sshd"/ { @["bytes"] = quantize(arg3); }' dtrace: description 'fbt::tcp_sendmsg:entry ' matched 1 probe ^C bytes value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@ 1869 64 |@@@@@@@@@@@@@ 1490 128 |@@@ 355 256 |@@@@ 461 512 |@@@ 373 1024 |@ 95 2048 | 4 4096 | 1 8192 | 0
these examples use dtrace4linux
filter aggregation (summarizes) kernel -> user transfers these these numbers
Sunday, February 24, 13
Cloud Analytics, showing real-time cloud-wide syscall latency:
Sunday, February 24, 13
You may just:
Sunday, February 24, 13
#!/usr/sbin/dtrace -s fbt::vfs_read:entry { self->start = timestamp; } fbt::vfs_read:return /self->start/ { @[execname, "ns"] = quantize(timestamp - self->start); self->start = 0; } # ./vfsread.d dtrace: script './vfsread.d' matched 2 probes cksum ns value ------------- Distribution ------------- count [...] 262144 | 0 524288 |@@@@@@@@@@ 834 1048576 | 8 2097152 | 30 4194304 | 40 8388608 |@ 66 16777216 | 28 33554432 | 1
13 line script to time VFS reads by process name read latency distribution, 0.5ms -> 33ms (disks)
Sunday, February 24, 13
probe description
filter do this when probe “fires”
dtrace -n ‘provider:module:function:name /predicate/ { action }’
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
fbt pid syscall sched proc vminfo io profile tcp udp ip cpc
infer
ip fbt
infer
cpc profile java javascript node perl python php ruby erlang
mysql postgres ...
fbt and pid are dynamic
plockstat
Sunday, February 24, 13
Sunday, February 24, 13
# ./ext4slower.d 10 Tracing ext4 read/write slower than 10 ms TIME PROCESS D KB ms FILE 2013 Feb 22 17:17:02 cksum R 64 35 100m 2013 Feb 22 17:17:02 cksum R 64 16 1m 2013 Feb 22 17:17:03 cksum R 64 18 data1 2013 Feb 22 17:17:03 cksum R 64 23 data1
Sunday, February 24, 13
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option defaultargs #pragma D option switchrate=5 dtrace:::BEGIN { min_ns = $1 * 1000000; printf("Tracing ext4 read/write slower than %d ms\n", $1); printf("%-20s %-16s %1s %4s %6s %s\n", "TIME", "PROCESS", "D", "KB", "ms", "FILE"); } fbt::vfs_read:entry, fbt::vfs_write:entry { this->file = (struct file *)arg0; this->fs = this->file->f_path.dentry->d_inode->i_sb->s_type->name; }
Sunday, February 24, 13
fbt::vfs_read:entry, fbt::vfs_write:entry /stringof(this->fs) == "ext4"/ { self->start = timestamp; self->name = this->file->f_path.dentry->d_name.name; } fbt::vfs_read:return, fbt::vfs_write:return /self->start && (this->delta = timestamp - self->start) > min_ns/ { this->dir = probefunc == "vfs_read" ? "R" : "W"; printf("%-20Y %-16s %1s %4d %6d %s\n", walltimestamp, execname, this->dir, arg1 / 1024, this->delta / 1000000, stringof(self->name)); } fbt::vfs_read:return, fbt::vfs_write:return { self->start = 0; self->name = 0; }
... should add more vfs_*() calls; or trace ext4 funcs directly
Sunday, February 24, 13
Sunday, February 24, 13
# ./tcpretransmit.d Tracing TCP retransmits... Ctrl-C to end. 2013 Feb 23 18:24:11: retransmit to 10.2.124.2, by: kernel`tcp_retransmit_timer+0x1bd kernel`tcp_write_timer+0x188 kernel`run_timer_softirq+0x12b kernel`tcp_write_timer kernel`__do_softirq+0xb8 kernel`read_tsc+0x9 kernel`sched_clock+0x9 kernel`sched_clock_local+0x25 kernel`call_softirq+0x1c kernel`do_softirq+0x65 kernel`irq_exit+0x9e kernel`smp_apic_timer_interrupt+0x6e kernel`apic_timer_interrupt+0x6e [...]
... can trace those stack functions directly for more detail
Sunday, February 24, 13
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing TCP retransmits... Ctrl-C to end.\n"); } fbt::tcp_retransmit_skb:entry { this->so = (struct sock *)arg0; this->d = (unsigned char *)&this->so->__sk_common.skc_daddr; printf("%Y: retransmit to %d.%d.%d.%d, by:", walltimestamp, this->d[0], this->d[1], this->d[2], this->d[3]); stack(99); }
Sunday, February 24, 13
the API is dynamically built from the kernel code
aren’t there on Linux yet
Sunday, February 24, 13
kernel: linux == illumos distros: {ubuntu|CentOS|Fedora} == {SmartOS|OmniOS|OpenIndiana}
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
wasn’t DTrace for Linux ports
tracing, probes, tapsets, scripts, ...
Sunday, February 24, 13
Applications DBs, all server types, ... Block Device Interface Ethernet LVM IP ext3/... TCP/UDP VFS Sockets ZFS Disk Disk Port Port I/O Controller Network Controller I/O Bridge System Libraries Device Drivers Scheduler Virtual Memory System Call Interface CPU 1 DRAM
Operating System Hardware
Given the tools to see everything, how do you use them?
Sunday, February 24, 13
kcachegrind, valgrind, google profiler, nfsiostat, cifsiostat, latencytop, powertop, LLTng, ktap, ...
Sunday, February 24, 13
checklist, and experts a reminder
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
A policeman sees a drunk looking under a streetlight, and asks what he is looking for. The drunk says he has lost his keys. The policeman can't find them either, and asks if he lost them under the streetlight. The drunk replies: “No, but this is where the light is best.”
Sunday, February 24, 13
top - 15:09:38 up 255 days, 16:54, 10 users, load average: 0.00, 0.03, 0.00 Tasks: 274 total, 1 running, 273 sleeping, 0 stopped, 0 zombie Cpu(s): 0.7%us, 0.0%sy, 0.0%ni, 99.1%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8181740k total, 7654228k used, 527512k free, 405616k buffers Swap: 2932728k total, 125064k used, 2807664k free, 3826244k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16876 root 20 0 57596 17m 1972 S 4 0.2 3:00.60 python 3947 brendan 20 0 19352 1552 1060 R 0 0.0 0:00.06 top 15841 joshw 20 0 67144 23m 908 S 0 0.3 218:21.70 mosh-server 16922 joshw 20 0 54924 11m 920 S 0 0.1 121:34.20 mosh-server 1 root 20 0 23788 1432 736 S 0 0.0 0:18.15 init 2 root 20 0 0 0 0 S 0 0.0 0:00.61 kthreadd 3 root RT 0 0 0 0 S 0 0.0 0:00.11 migration/0 4 root 20 0 0 0 0 S 0 0.0 18:43.09 ksoftirqd/0 5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 [...]
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
attributes examined will not be a problem
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
Drill-Down Analysis
Applications Block Device Interface ext4 VFS Device Drivers System Call Interface System Libraries
User Kernel
Dynamic Tracing / DTrace is well suited for this, as it can dig through all layers with custom detail
Sunday, February 24, 13
# dtrace -n 'fbt::ext4_readpages:entry { self->ts = timestamp; } fbt::ext4_readpages:return /self->ts/ { @["us"] = lquantize((timestamp - self->ts) / 1000, 0, 10000, 250); self->ts = 0; }' dtrace: description 'fbt::ext4_readpages:entry ' matched 2 probes ^C us value ------------- Distribution ------------- count < 0 | 0 0 |@@@@@@@@@@@@ 303 250 | 0 500 | 0 750 |@@@@ 88 1000 |@@@@@@@@@@@@@@ 335 1250 | 0 1500 | 0 1750 |@@@@ 107 2000 |@@@@@@ 144 2250 | 0 2500 | 0 [...]
cache hits disk I/O
Sunday, February 24, 13
# dtrace -n 'fbt::ext4_readpages:entry { this->file = (struct file *)arg0; this->name = this->file->f_path.dentry->d_name.name; @[stringof(this->name), stack()] = count(); }' dtrace: description 'fbt::ext4_readpages:entry ' matched 1 probe ^C[...] foo8 kernel`__do_page_cache_readahead+0x1c7 kernel`ra_submit+0x21 kernel`ondemand_readahead+0x115 kernel`page_cache_async_readahead+0x80 kernel`radix_tree_lookup_slot+0xe kernel`find_get_page+0x1e kernel`generic_file_aio_read+0x48b kernel`vma_merge+0x121 kernel`do_sync_read+0xd2 kernel`__switch_to+0x132 kernel`security_file_permission+0x93 kernel`rw_verify_area+0x61 kernel`vfs_read+0xb0 kernel`sys_read+0x4a kernel`system_call_fastpath+0x16 122
# of
Sunday, February 24, 13
environment-wide down to metal
direction
Sunday, February 24, 13
Sunday, February 24, 13
Sunday, February 24, 13
Saturation Utilization Errors
X
Sunday, February 24, 13
data path...
Sunday, February 24, 13
For each check:
Disk Disk Port Port
Expander Interconnect I/O Bus Interface Transports
I/O Controller Network Controller I/O Bridge
CPU Interconnect
CPU 1 DRAM
Hardware
Memory Bus
CPU 1 DRAM
Memory Bus
Sunday, February 24, 13
Resource Type Metric CPU Utilization
per-cpu: mpstat -P ALL 1, “%idle”; sar -P ALL, “%idle”; system-wide: vmstat 1, “id”; sar -u, “%idle”; dstat -c, “idl”; per-process:top, “%CPU”; htop, “CPU %”; ps -o pcpu; pidstat 1, “%CPU”; per-kernel- thread: top/htop (“K” to toggle), where VIRT == 0 (heuristic). [1]
CPU Saturation
system-wide: vmstat 1, “r” > CPU count [2]; sar -q, “runq-sz” > CPU count; dstat -p, “run” > CPU count; per-process: /proc/PID/schedstat 2nd field (sched_info.run_delay); perf sched latency (shows “Average” and “Maximum” delay per-schedule); dynamic tracing, eg, SystemTap schedtimes.stp “queued(us)” [3]
CPU Errors
perf (LPE) if processor specific error events (CPC) are available; eg, AMD64′s “04Ah Single-bit ECC Errors Recorded by Scrubber” [4]
http://dtrace.org/blogs/brendan/2012/03/07/the-use-method-linux-performance-checklist ... etc for all combinations (would fill a dozen slides)
Sunday, February 24, 13
Sunday, February 24, 13
by your current toolset
critical issues
Sunday, February 24, 13
from what may be dozens available
Sunday, February 24, 13
Sunday, February 24, 13
analysis) is time consuming – would like tools for convenience
when complete
Sunday, February 24, 13
than physical limits
can’t analyze down to metal directly
physical statistics, or does the analysis on your behalf
to change. Limited time only. Sign up at joyent.com
Sunday, February 24, 13
linux-performance-checklist/
performance-analysis-methodology/
http://illumos.org, http://smartos.org
Sunday, February 24, 13
checklist/
graphs/
Sunday, February 24, 13