Correlating Performance, Code Location and Memory Access
Harald Servat, Jesus Labarta, Judit Gimenez
Scalable Tools Workshop - Lake Tahoe, Aug 2nd 2016
Folding: instantaneous metric with minimum overhead
– Combine instrumentation and sampling
– Possibly different sampling rates for loads and stores
– One-entry PEBS buffer: Extrae is signaled on each individual event
Different base addresses
Different most frequent buffers
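To illustrate how sampled addresses could be attributed to buffers, the sketch below maps each address to the allocation that contains it and counts hits per buffer. It is a minimal sketch, not the Extrae implementation: the function name `most_frequent_buffers`, the `(name, base, size)` tuples and the `<unknown>` label are all assumptions made for illustration, and buffers are assumed not to overlap.

```python
from bisect import bisect_right
from collections import Counter


def most_frequent_buffers(buffers, addresses):
    """Count sampled addresses per buffer.

    buffers   -- list of (name, base, size) tuples (hypothetical layout),
                 assumed non-overlapping
    addresses -- iterable of sampled addresses
    Returns (name, hits) pairs, most frequent first.
    """
    ordered = sorted(buffers, key=lambda b: b[1])
    bases = [b[1] for b in ordered]
    hits = Counter()
    for addr in addresses:
        # Index of the last buffer whose base is <= addr, or -1 if none.
        i = bisect_right(bases, addr) - 1
        if i >= 0:
            name, base, size = ordered[i]
            if addr < base + size:
                hits[name] += 1
                continue
        # Address falls outside every known buffer.
        hits["<unknown>"] += 1
    return hits.most_common()
```

Because the lookup depends only on the recorded base and size, two runs with different base addresses for the same buffers would still yield comparable per-buffer counts.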
– Extrae_event API call
– Paraver analysis
– Isolate different modes, eliminate outliers
instance of the repetitive region
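The folding step can be sketched as follows: samples taken in many instances of a repetitive region are projected onto a single normalized instance, after discarding instances whose duration deviates too much from the median. This is a minimal sketch under stated assumptions, not the actual folding tool: the dictionary keys (`start`, `end`, `total`, `samples`), the function name `fold` and the `max_dev` threshold are hypothetical, and the real mechanism may filter outliers and fit the folded points differently.

```python
from statistics import median


def fold(instances, max_dev=0.2):
    """Fold samples from many region instances into one synthetic instance.

    instances -- list of dicts (hypothetical layout) with:
        'start', 'end' : instance time bounds
        'total'        : counter value accumulated over the whole instance
        'samples'      : list of (time, counter value since 'start')
    max_dev   -- allowed relative deviation from the median duration
    Returns sorted (normalized time, normalized counter) points in [0, 1].
    """
    durations = [i["end"] - i["start"] for i in instances]
    med = median(durations)
    folded = []
    for inst, dur in zip(instances, durations):
        # Eliminate outliers: keep only instances close to the median duration.
        if abs(dur - med) > max_dev * med:
            continue
        for t, value in inst["samples"]:
            folded.append(((t - inst["start"]) / dur, value / inst["total"]))
    folded.sort()
    return folded


# Hypothetical data: the third instance is four times longer and is dropped.
example = [
    {"start": 0, "end": 10, "total": 100, "samples": [(5, 40)]},
    {"start": 20, "end": 30, "total": 100, "samples": [(22, 10), (28, 90)]},
    {"start": 40, "end": 80, "total": 100, "samples": [(60, 50)]},
]
points = fold(example)
```

Each instance contributes only a few samples, but the folded set covers the whole region; the slope of a curve fitted through these points would approximate the instantaneous rate of the metric.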
Figure panels: MPI calls, useful duration, useful instructions
27 MPI ranks on 2 nodes (2 sockets x 12 cores per node)
Figure panels: histogram of useful duration, histogram of useful instructions, process mapping, histogram of clock frequency
One iteration, 4 tasks selected
Approximation based on call stack @ MPI calls
Approximation based on folded call stack
PEBS address
PEBS address
PEBS level providing the data: LFB, L2, L3, DRAM
PEBS cost in cycles (avg.)
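The average cost per memory level could be computed as in the sketch below, which groups samples by the level that provided the data and averages their cost in cycles. The function name `average_cost_by_level` and the `(level, cycles)` tuple layout are assumptions for illustration, and the sample values are made up, not measurements from the talk.

```python
from collections import defaultdict


def average_cost_by_level(samples):
    """Average access cost in cycles per memory level.

    samples -- iterable of (level, cycles) pairs (hypothetical layout),
               e.g. ("L3", 40)
    Returns a dict mapping level -> mean cost in cycles.
    """
    totals = defaultdict(lambda: [0, 0])  # level -> [sum of cycles, count]
    for level, cycles in samples:
        totals[level][0] += cycles
        totals[level][1] += 1
    return {level: total / count for level, (total, count) in totals.items()}


# Made-up samples, for illustration only.
demo = [("L2", 12), ("L3", 40), ("DRAM", 200), ("L3", 60), ("LFB", 30)]
averages = average_cost_by_level(demo)
```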