Hardware-assisted software tracing
Adrien Vergé
adrienverge@gmail.com
from the kernel: IRQ handlers, system calls, scheduling activity, network activity, etc.
in user space: tracepoints inside your application
Why is my software crashing?
[Figure: processes and their threads (thread 1, thread 2) laid out along a time axis]
[Figure: trace viewer timelines from 12:40:48.500 to 12:40:48.700. One view lists the threads of the "bottleneck" process (columns Process, TID, PTID; TIDs 26242, 26243, 26244; PTIDs 26226, 26242, 26242) annotated with system calls such as execve, read and write. Another view lists resources: CPU 0 to CPU 7, IRQ 43, 44 and 46, SOFT_IRQ 1, 4, 7 and 9. States shown: WAIT_BLOCKED, WAIT_FOR_CPU, USERMODE, SYSCALL, INTERRUPTED.]
credit: ARM
credit: Samsung, tabletolic.com, player.de, digitaltrends.com
credit: Intel
bus, buffer, timestamping
[Figure: system-on-chip block diagram with CPU, ETM, STM and ETB on the system bus, plus timestamping]
[Figure: benchmark of time per iteration (µs, axis from 5 to 20) for computation + tracepoints, comparing no tracing, LTTng-UST, and STM + ETB]
Indicative benchmark: overhead mostly depends on the traced application!
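Why the overhead depends on the traced application can be illustrated with a little arithmetic: a fixed per-tracepoint cost weighs more on a short iteration than on a long one. The sketch below is only an illustration; the function name and all timings are hypothetical and are not values read from the benchmark.

```python
def relative_overhead(compute_us, tracepoint_cost_us, tracepoints_per_iteration=1):
    """Return tracing overhead as a fraction of the untraced iteration time."""
    if compute_us <= 0:
        raise ValueError("compute_us must be positive")
    traced_us = compute_us + tracepoint_cost_us * tracepoints_per_iteration
    return (traced_us - compute_us) / compute_us


# Hypothetical numbers: the same 1 µs tracepoint cost on three workloads.
for compute_us in (2.0, 10.0, 50.0):
    ratio = relative_overhead(compute_us, tracepoint_cost_us=1.0)
    print(f"{compute_us:5.1f} µs of computation: {ratio:.0%} overhead")
```

The same tracer therefore looks expensive on a tight loop and nearly free on a heavy computation, which is why a single overhead figure should be read with caution.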
address comparators, buffer, timestamping
triggers upon custom conditions
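The idea of address comparators combined with a trigger can be modeled in a few lines of software. This is a conceptual sketch only, not how an ETM is actually programmed: the names AddressRange and filter_trace, and the addresses in the usage lines, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    """Software stand-in for one address comparator pair (end is exclusive)."""
    start: int
    end: int

    def matches(self, address):
        return self.start <= address < self.end


def filter_trace(addresses, comparators, trigger_address=None):
    """Yield addresses accepted by any comparator.

    If trigger_address is given, nothing is emitted until that address
    has been seen once (a simple "start tracing on condition" trigger).
    """
    armed = trigger_address is None
    for address in addresses:
        if not armed and address == trigger_address:
            armed = True
        if armed and any(c.matches(address) for c in comparators):
            yield address


# Hypothetical instruction stream and a single comparator range.
stream = [0x100, 0x4015A8, 0x200, 0x4015B0, 0x4015B4]
text_range = [AddressRange(0x401000, 0x402000)]
print([hex(a) for a in filter_trace(stream, text_range, trigger_address=0x200)])
```

Filtering at the source in this way keeps uninteresting code out of the trace buffer, at the price of a limited number of comparators.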
[Figure: benchmark of time per iteration (µs, axis from 5 to 50) for "computation" and "more computation", comparing no tracing, LTTng-UST, and ETM + ETB; part of the plot is marked EVENT LOSS]
[Figure: x86 host. BTS in the CPU stores branch records in RAM, for example 4015a8, 7f2aac77e024, 7f2aac77e012, 40ef26, 4015b0, 4015b4]

$ perf record -e branches:u -c 1 -d ./myprogram
$ perf script -f time,ip,addr
101918.272364: ffffffff814a6f2c => 7f8d7b9b3180
101918.272364: ffffffff814a6f2c => 7f8d7b9b3180
101918.272364: 7f8d7b9b3183 => 7f8d7b9b6730
101918.272364: ffffffff814a6f2c => 7f8d7b9b6730
101918.272364: ffffffff814a6f2c => 7f8d7b9b674f
101918.272364: ffffffff814a6f2c => 7f8d7b9b6756
101918.272364: 7f8d7b9b67c2 => 7f8d7b9b67df
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67ef => 7f8d7b9b6a30
101918.272364: 7f8d7b9b6a38 => 7f8d7b9b6a58
101918.272364: 7f8d7b9b6a62 => 7f8d7b9b6bc0
101918.272364: 7f8d7b9b6bd7 => 7f8d7b9b67d3
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
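Branch records in the "time: from => to" form shown above are easy to post-process. The following is a minimal sketch that assumes exactly that line layout (the real perf script output may differ across versions and field selections); parse_branches and hottest_branches are hypothetical helper names.

```python
import re
from collections import Counter

# Matches lines such as "101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8".
BRANCH_LINE = re.compile(r"^\s*(\d+\.\d+):\s+([0-9a-f]+)\s+=>\s+([0-9a-f]+)\s*$")


def parse_branches(lines):
    """Yield (timestamp, source, destination) for each well-formed line."""
    for line in lines:
        match = BRANCH_LINE.match(line)
        if match:
            yield float(match.group(1)), int(match.group(2), 16), int(match.group(3), 16)


def hottest_branches(lines, top=3):
    """Return the most frequent (source, destination) pairs with their counts."""
    counts = Counter((src, dst) for _, src, dst in parse_branches(lines))
    return counts.most_common(top)


sample = """\
101918.272364: ffffffff814a6f2c => 7f8d7b9b3180
101918.272364: 7f8d7b9b3183 => 7f8d7b9b6730
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
"""

for (src, dst), count in hottest_branches(sample.splitlines()):
    print(f"{src:x} => {dst:x}: {count}")
```

Counting repeated pairs like this quickly exposes tight loops, such as the 7f8d7b9b67e3 => 7f8d7b9b67c8 branch that dominates the output above.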
simple program, every branch recorded
same program, add a tracepoint() at every branch
[Figure: time per iteration (µs, axis from 0.1 to 0.8) against program branching rate (branch/s), comparing no tracing, LTTng-UST, and BTS with perf]
[Figure: cores 0, 1, ... 6, 7, each labeled 64K and 512K, with a system buffer, user-space, and disk]
BTS writes trace to dedicated buffer
trace is copied to bigger memory zone upon buffer full or context switch
user stores trace to disk using the write system call
possible copy in another buffer because no O_SYNC flag
[Figure: cores 0, 1, ... 6, 7 and disk; buffers labeled 64K, 512K × number [...]]
BTS writes trace to dedicated buffer
upon buffer full or context switch, move to the next sub-buffer
filled sub-buffers are labeled to be written to disk later
writing is done by a kernel task in user context
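The sub-buffer scheme can be modeled in user space to show why it avoids the intermediate copy: records stay where they were written, and only a label changes when a sub-buffer is handed over. This is a simplified sketch, not the kernel patch itself; the class SubBufferedTrace, its parameters and the list standing in for the disk are all hypothetical.

```python
class SubBufferedTrace:
    """Toy model: fill one sub-buffer at a time, flush labeled ones later."""

    def __init__(self, num_sub_buffers=4, capacity=3):
        self.capacity = capacity
        self.sub_buffers = [[] for _ in range(num_sub_buffers)]
        self.current = 0   # index of the sub-buffer being filled
        self.ready = []    # indices labeled "to be written", oldest first
        self.lost = 0      # records dropped because no sub-buffer was free

    def _move_to_next(self):
        """Label the current sub-buffer and switch to a free one, if any."""
        if self.sub_buffers[self.current] and self.current not in self.ready:
            self.ready.append(self.current)
        for step in range(1, len(self.sub_buffers) + 1):
            candidate = (self.current + step) % len(self.sub_buffers)
            if candidate not in self.ready:
                self.current = candidate
                return True
        return False

    def record(self, branch):
        """Store one record; return False if it had to be dropped."""
        if self.current in self.ready and not self._move_to_next():
            self.lost += 1
            return False
        self.sub_buffers[self.current].append(branch)
        if len(self.sub_buffers[self.current]) == self.capacity:
            self._move_to_next()  # buffer full: move on
        return True

    def context_switch(self):
        """On a context switch, hand over whatever has been recorded so far."""
        if self.sub_buffers[self.current]:
            self._move_to_next()

    def flush(self, disk):
        """Deferred writer: drain labeled sub-buffers to disk (a list here)."""
        while self.ready:
            index = self.ready.pop(0)
            disk.extend(self.sub_buffers[index])
            self.sub_buffers[index].clear()


trace, disk = SubBufferedTrace(num_sub_buffers=2, capacity=2), []
for branch in (1, 2, 3):
    trace.record(branch)
trace.context_switch()
trace.flush(disk)
print(disk, trace.lost)
```

If the deferred writer falls behind, every sub-buffer ends up labeled and new records are dropped, so the number and size of sub-buffers set how much burstiness the tracer can absorb.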
[Figure: time per iteration (µs, axis from 0.1 to 0.8) against program branching rate (branch/s), comparing no tracing, LTTng-UST, BTS with perf, and BTS with "spliced" perf]
5 % overhead compared to LTTng-UST
needs post-decoding
[...] % to [...] % overhead
limited number of tracepoints
no payload
not suited for event tracing (not flexible)
compared to vanilla perf, 2× faster
Data Acquisition
Program Trace
Cortex-A9: ~5 µs/event
Core i7: ~2 ns/event
LTTng and TMF:
https://lttng.org/
STM libraries:
https://github.com/adrienverge/libcoresight[...]ap443[...]
ETM patch:
https://lkml.org/lkml/2014/1/3[...]/259
BTS patch:
https://github.com/adrienverge/linux/tree/patch_perf_bts_splice