Does your tool support PAPI SDEs yet?
13th Scalable Tools Workshop
Anthony Danalis, Heike Jagode, Jack Dongarra
Tahoe City, CA, July 28-Aug 1, 2019
Case study: PaRSEC's task scheduling algorithm
[Figure: task scheduling across Core 0, Core 1, Core 2, ..., Core N]
There is no standardized way for a software layer to export information about its behavior such that other, independently developed, software layers can read it.
[Figure: a software stack (HPC Application, Math library, Task runtime, MPI, libibverbs) with examples of behavior these layers could export: RDMA completion, One Sided Communication, Data Dependency, Distributed Factorization, Quantum Chemistry Method]
SDEs exported by your library can be read with the standard PAPI_start(), PAPI_read(), and PAPI_stop() calls.
Performance-critical codes can implement SDEs with zero overhead by exporting existing program variables, so no new instructions are added to the fast path.
PAPI SDE supports counters, groups, recordings, simple statistics, thread safety, and custom callbacks.
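The zero-overhead claim rests on one idea: the library gives the measurement layer the address of a variable it already maintains, and a tool dereferences that address only when it reads. The sketch below models that idea in plain C. The registry and the names `export_variable`, `read_export`, `TASKS_EXECUTED`, and `execute_task` are hypothetical illustrations, not PAPI calls, and the sketch ignores concurrent reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry standing in for the measurement layer (not PAPI's API). */
#define MAX_EXPORTS 8

typedef struct {
    const char *name;
    const long long *addr;   /* address of the library's own variable */
} export_entry;

static export_entry registry[MAX_EXPORTS];
static size_t n_exports = 0;

/* Called once at init time: stores only the address, never copies the value. */
static void export_variable(const char *name, const long long *addr) {
    assert(n_exports < MAX_EXPORTS);
    registry[n_exports].name = name;
    registry[n_exports].addr = addr;
    n_exports++;
}

/* Tool side: dereferences the stored address on demand; -1 is this sketch's "unknown event" sentinel. */
static long long read_export(const char *name) {
    for (size_t i = 0; i < n_exports; i++)
        if (strcmp(registry[i].name, name) == 0)
            return *registry[i].addr;
    return -1;
}

/* Library side: this counter existed before any SDE support was added. */
static long long tasks_executed = 0;

static void library_init(void) {
    export_variable("TASKS_EXECUTED", &tasks_executed);
}

/* Fast path is unchanged: the same single increment as before. */
static void execute_task(void) {
    tasks_executed++;
}
```

The fast path (`execute_task`) contains exactly the instructions it had before export; all extra work happens at init time and at read time.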
#include "papi_sde_interface.h"

static long long local_var;

void small_test_init(void){
    local_var = 0;
    papi_handle_t *handle = papi_sde_init("TEST");
    // Export the existing variable local_var as the read-only event "Evnt";
    // PAPI_SDE_DELTA: a tool sees how much it changed since PAPI_start().
    papi_sde_register_counter(handle, "Evnt", PAPI_SDE_RO|PAPI_SDE_DELTA, PAPI_SDE_long_long, &local_var);
    ...
}
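To make the PAPI_SDE_DELTA flag concrete, here is a minimal model of how a registered counter might be read, assuming that a delta-mode event reports the change since start while an instantaneous mode reports the current value. The types and functions (`sde_event`, `event_start`, `event_read`, `MODE_DELTA`, `MODE_INSTANT`) are hypothetical; this is not PAPI's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of read semantics for a registered variable. */
typedef enum { MODE_DELTA, MODE_INSTANT } read_mode;

typedef struct {
    const long long *addr;   /* the registered library variable */
    read_mode mode;
    long long start_value;   /* snapshot taken when measurement starts */
} sde_event;

/* Analogous to PAPI_start(): remember the value at the start. */
static void event_start(sde_event *ev) {
    assert(ev->addr != NULL);
    ev->start_value = *ev->addr;
}

/* Analogous to PAPI_read(): delta mode subtracts the snapshot, instant mode does not. */
static long long event_read(const sde_event *ev) {
    long long now = *ev->addr;
    return ev->mode == MODE_DELTA ? now - ev->start_value : now;
}
```

Delta mode suits monotonically growing counts (events, bytes), while an instantaneous reading suits gauges such as a current queue length.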
sometype_t *data;

void small_test_init(void){
    data = ...;
    papi_handle_t *handle = papi_sde_init("TEST");
    // No variable is exported: PAPI calls accessor(data) to obtain the
    // value of "Evnt" whenever a tool reads it.
    papi_sde_register_fp_counter(handle, "Evnt", PAPI_SDE_RO|PAPI_SDE_DELTA, PAPI_SDE_long_long, accessor, data);
    ...
}
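A callback-based counter is useful when the interesting value is not stored anywhere but can be derived from library state. The sketch below assumes the callback receives the registered `data` pointer and returns a `long long`; the `work_queue` structure, `queue_length_accessor`, and `callback_read` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type modeled on the slide: value computed from a user pointer. */
typedef long long (*accessor_fn)(void *param);

typedef struct {
    accessor_fn fn;
    void *param;   /* the "data" pointer handed over at registration */
} callback_event;

/* Hypothetical library state: a task queue tracked by two monotonic indices. */
typedef struct {
    long long head;   /* tasks dequeued so far */
    long long tail;   /* tasks enqueued so far */
} work_queue;

/* The queue length is never stored; it is computed only when a tool reads it. */
static long long queue_length_accessor(void *param) {
    const work_queue *q = param;
    return q->tail - q->head;
}

/* Tool-side read: invoke the callback with the registered pointer. */
static long long callback_read(const callback_event *ev) {
    assert(ev->fn != NULL);
    return ev->fn(ev->param);
}
```

The library pays nothing on its fast path; the cost of computing the value moves entirely to the moment of reading.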
void *counter_handle;

void small_test_init(void){
    papi_handle_t *handle = papi_sde_init("TEST");
    // PAPI creates and owns the counter; the library keeps only an opaque
    // handle and updates the counter through papi_sde_inc_counter().
    papi_sde_create_counter(handle, "Evnt", PAPI_SDE_long_long, &counter_handle);
    ...
}
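When the measurement layer owns the counter, the library only increments it through a handle, and that increment must stay correct when several threads call it. The sketch below shows one way to do this with C11 atomics; `created_counter`, `counter_create`, `counter_inc`, `counter_value`, and `counter_destroy` are hypothetical, and PAPI's actual thread-safety mechanism may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a counter owned by the measurement layer. */
typedef struct {
    const char *name;
    atomic_llong value;
} created_counter;

/* Allocate the counter and return an opaque handle through handle_out. */
static int counter_create(const char *name, void **handle_out) {
    created_counter *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->name = name;
    atomic_init(&c->value, 0);
    *handle_out = c;
    return 0;
}

/* Library fast path: a single atomic add, safe to call from several threads. */
static void counter_inc(void *handle, long long increment) {
    created_counter *c = handle;
    assert(c != NULL);
    atomic_fetch_add_explicit(&c->value, increment, memory_order_relaxed);
}

/* Tool side: atomically load the current total. */
static long long counter_value(void *handle) {
    created_counter *c = handle;
    return atomic_load_explicit(&c->value, memory_order_relaxed);
}

static void counter_destroy(void *handle) {
    free(handle);
}
```

Relaxed ordering is enough here because the counter is a pure tally that no other data depends on.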
void *recorder_handle;

void small_test_init(void){
    papi_handle_t *handle = papi_sde_init("TEST");
    // Create a recorder that stores individual double values;
    // cmpr_func_ptr lets PAPI compare recorded entries with each other.
    papi_sde_create_recorder(handle, "RCRDR", sizeof(double), cmpr_func_ptr, &recorder_handle);
    ...
}
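A comparison function is what turns a stream of recorded values into simple statistics. The sketch below shows how a qsort-style comparator, in the role of `cmpr_func_ptr`, could yield the minimum, median, and maximum of recorded entries. The `recorder` type, `recorder_record`, `recorder_min_med_max`, and `cmpr_double` are hypothetical, and this is not how PAPI stores recordings internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical recorder model: fixed-size entries in a growable buffer. */
typedef struct {
    size_t typesize;
    size_t count, capacity;
    unsigned char *buf;
    int (*cmpr)(const void *, const void *);
} recorder;

/* Append one entry, doubling the buffer when it is full. */
static int recorder_record(recorder *r, size_t typesize, const void *value) {
    assert(typesize == r->typesize);
    if (r->count == r->capacity) {
        size_t cap = r->capacity ? 2 * r->capacity : 16;
        unsigned char *p = realloc(r->buf, cap * r->typesize);
        if (p == NULL)
            return -1;
        r->buf = p;
        r->capacity = cap;
    }
    memcpy(r->buf + r->count * r->typesize, value, typesize);
    r->count++;
    return 0;
}

/* Sort a copy with the user's comparator, then pick order statistics
   (for an even count this picks the upper median). */
static int recorder_min_med_max(const recorder *r, void *min, void *med, void *max) {
    if (r->count == 0)
        return -1;
    unsigned char *tmp = malloc(r->count * r->typesize);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, r->buf, r->count * r->typesize);
    qsort(tmp, r->count, r->typesize, r->cmpr);
    memcpy(min, tmp, r->typesize);
    memcpy(med, tmp + (r->count / 2) * r->typesize, r->typesize);
    memcpy(max, tmp + (r->count - 1) * r->typesize, r->typesize);
    free(tmp);
    return 0;
}

/* qsort-style comparator for doubles, one way cmpr_func_ptr might be written. */
static int cmpr_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}
```

Sorting a copy keeps the recording in arrival order, so a tool can still inspect the raw sequence.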
void *counter_handle;
void *recorder_handle;

void push_test_dowork(void){
    double val;
    long long increment = 3;
    val = perform_useful_work();
    // Fast path: bump the PAPI-owned counter and record one sample.
    papi_sde_inc_counter(counter_handle, increment);
    papi_sde_record(recorder_handle, sizeof(val), &val);
}
– Code location
– Hardware events (e.g., cache misses)
– Patterns in history (e.g., last task before a stealing event)
– Patterns in call path, stack, or originating thread