Sampling XOR Instrumentation? Or both?



SLIDE 1

Center for Information Services and High Performance Computing (ZIH)

Sampling XOR Instrumentation? Or both?

Scalable Tools Workshop, Lake Tahoe, 2015-08-04
Andreas Knüpfer, Bert Wesarg, Thomas Ilsche, Ronny Tschüter, Joseph Schuchart, Hartmut Mix, Holger Brunst
ZIH, TU Dresden, Germany

SLIDE 2

Overview
Introduction and existing approaches
Recording and data formats
Analysis of samples and events combined
– Timeline visualization
– Statistics
Conclusions

SLIDE 3

Definitions
Certain terms are used almost synonymously even though they aren’t:

                   Acquisition       Recording       Data representation
“Profiling”        Sampling          Summarization   Profile
“Event Tracing”    Instrumentation   Logging         Event/Call-Path Traces

SLIDE 4

Existing Combinations:
Sample one thing, instrument another:
– Sampling of user routines or call-path tracing, instrumentation of MPI [Tallent et al. 2011, Ilsche et al. 2014]
– Sampling of hardware counters, instrumentation of user routines and MPI
Sampling of energy consumption next to instrumentation-based performance monitoring [Hackenberg et al. 2014]
Instrumentation maintains shadow stack, sampling reads it as shortcut of a stack walk [Iwainsky et al. 2014]
Very coarse-grained sampling, then “folding” over many repeated instances; instrumentation is only guiding the folding mechanism, instrumented events are not recorded [Servat, Ph.D. thesis 2015]

SLIDE 5

Overview
Introduction and existing approaches
Recording and data formats
Analysis of samples and events combined
– Timeline visualization
– Statistics
Conclusions

SLIDE 6

Example with Instrumentation and Sampling
[Figure: example call timeline with the regions Main, Phase 2, Phase 3, Phase 4, Calc, MPI, _Main, and System. Panels: call-stack representation, flat representation.]
“Trampolines” allow tracking uninterrupted calls, reduce overhead
Fine-grained call timeline from instrumentation

SLIDE 7

Samples with Calling Context Tree
[Figure: sampled call stacks of the example (Main, Phase 2, Phase 3, Phase 4, Calc, MPI, _Main, System) next to a calling context tree with the nodes Main, Phase 2, Calc, MPI. Node IDs shown: 1, 5, 7, 5, 9, 7, 1, 11, 9, 11.]
Efficient storage with Calling Context Tree

SLIDE 8

Representation in OTF2: CCT and Sample Points
Define calling context nodes recursively:

    DefCallingContext {
        CallingContextRef     self,
        RegionRef             region,              // Routine or function
        SourceCodeLocationRef sourceCodeLocation,
        CallingContextRef     parent
    }

Use them at a sample point to specify entire call stack by single ID:

    CallingContextSample {
        <process>, <time>,
        CallingContextRef     callingContext,
        uint                  unwindDistance,
        InterruptGeneratorRef interruptGenerator
    }
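
The recursive definition above can be illustrated with a minimal Python sketch. It is not the OTF2 API: the class `CallingContextTree` and its methods `intern` and `stack` are hypothetical names, source code locations are left out, and the region names are borrowed from the example slides. It only shows how a whole call stack collapses into one node ID that a sample record can carry.

```python
class CallingContextTree:
    """Toy CCT: each node stores only (region, parent ID); ID 0 means 'no parent'."""

    def __init__(self):
        self.nodes = {}   # node ID -> (region, parent ID)
        self.index = {}   # (region, parent ID) -> node ID

    def intern(self, call_stack):
        """Return the node ID for a call stack given outermost frame first."""
        parent = 0
        for region in call_stack:
            key = (region, parent)
            if key not in self.index:
                node_id = len(self.nodes) + 1
                self.index[key] = node_id
                self.nodes[node_id] = key
            parent = self.index[key]
        return parent

    def stack(self, node_id):
        """Rebuild the full call stack by following parent links to the root."""
        frames = []
        while node_id != 0:
            region, node_id = self.nodes[node_id]
            frames.append(region)
        return frames[::-1]


cct = CallingContextTree()
first = cct.intern(["Main", "Phase 2", "Calc"])
second = cct.intern(["Main", "Phase 2", "Calc", "MPI"])
repeat = cct.intern(["Main", "Phase 2", "Calc"])   # same stack, same ID
```

Repeated stacks reuse existing nodes, so the storage cost per sample is a single reference regardless of stack depth.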

SLIDE 9

Now add Events from Instrumentation
[Figure: sampled call stacks (Main, Phase 2, Calc, MPI, _Main, System) with instrumented events added. Record labels shown: S:5, S:5, S:5, L:7, S:7, E:7.]
Intermix Samples (S) as well as Enter (E) and Leave (L) events, all refer to the CCT

SLIDE 10

Representation in OTF2: Special Enter/Leave Events
Introduce new form of enter and leave events:

    CallingContextEnter {
        <process>, <time>,
        CallingContextRef callingContext,
        uint32_t          unwindDistance
    }

    CallingContextLeave {
        <process>, <time>,
        CallingContextRef callingContext
    }

Refer to CCT, easily converted to old mode for legacy purposes if needed
Little to no storage overhead, but more information (e.g., hidden stack entries)
… no reason to keep the old enter/leave event records referring to routines
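
The conversion to the old mode could look roughly like the following Python sketch. It is an illustration under assumptions, not OTF2 reader code: the `CCT` table, the node IDs, the timestamps, and the function `to_legacy` are made up, and `unwindDistance` is ignored. Each CCT-based enter or leave is mapped to a routine-based record by looking up the region of its node; samples have no legacy counterpart here and are skipped.

```python
from dataclasses import dataclass

# Hypothetical CCT table: node ID -> (region name, parent node ID); 0 = no parent.
CCT = {1: ("Main", 0), 2: ("Phase 2", 1), 3: ("Calc", 2), 4: ("MPI", 3)}


@dataclass
class CallingContextEnter:
    time: int
    calling_context: int


@dataclass
class CallingContextLeave:
    time: int
    calling_context: int


@dataclass
class CallingContextSample:
    time: int
    calling_context: int


def to_legacy(records):
    """Map CCT-based enter/leave records to (time, kind, region) tuples."""
    legacy = []
    for rec in records:
        region = CCT[rec.calling_context][0]
        if isinstance(rec, CallingContextEnter):
            legacy.append((rec.time, "Enter", region))
        elif isinstance(rec, CallingContextLeave):
            legacy.append((rec.time, "Leave", region))
        # CallingContextSample records are dropped in this legacy view.
    return legacy


# Intermixed stream of samples and events, all referring to CCT nodes.
stream = [
    CallingContextSample(0, 3),
    CallingContextEnter(5, 4),
    CallingContextSample(6, 4),
    CallingContextLeave(9, 4),
]
legacy = to_legacy(stream)
```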

SLIDE 11

Overview
Introduction and existing approaches
Recording and data formats
Analysis of samples and events combined
– Timeline visualization
– Statistics
Conclusions

SLIDE 12

Combined Visualization in Timeline
[Figure: timeline with the call stacks Main, Phase 2, Calc, MPI, _Main, System; time axis t.]
Events are status changes, usually drawn from “now” until “following event”
Samples are points in time, but usually drawn 1δ wide (with sample distance δ)
Either draw at [t-½δ, t+½δ) or at [t, t+δ)
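
The two drawing options for a sample taken at time t with sample distance δ can be written down as a short Python sketch. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
def centered_interval(t, delta):
    """Draw the sample centered on its timestamp: [t - delta/2, t + delta/2)."""
    return (t - delta / 2, t + delta / 2)


def forward_interval(t, delta):
    """Draw the sample starting at its timestamp: [t, t + delta)."""
    return (t, t + delta)


delta = 3100   # assumed sample distance in microseconds
t = 6200       # assumed sample timestamp in microseconds
centered = centered_interval(t, delta)
forward = forward_interval(t, delta)
```

Both intervals are 1δ wide; they differ only by a shift of ½δ, and the forward variant matches how events are drawn from “now” onward.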

SLIDE 13

Combined Visualization in Timeline: Shift by ½δ
[Figure: timeline with the call stacks Main, Phase 2, Calc, _Main, System.]
Unified strategy for events and samples:
– draw from “now” until “following event or sample”
Do not suppress samples in instrumented function calls (see below), but do optimize the extra stack walk
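
A minimal Python sketch of the unified drawing strategy, assuming each record is reduced to a `(time, label)` pair: events and samples are merged by time, and every record is drawn until the next one. The function `unified_intervals`, the labels, and the timestamps are hypothetical.

```python
def unified_intervals(records, end_time):
    """Turn merged (time, label) records into (start, stop, label) bars.

    Each bar runs from its own record to the following event or sample;
    the last bar runs to end_time.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    stops = [t for t, _ in ordered[1:]] + [end_time]
    return [(t, stop, label) for (t, label), stop in zip(ordered, stops)]


# Assumed example data; the label names the record that starts the bar.
events = [(40, "E:MPI"), (70, "L:MPI")]
samples = [(0, "S:Calc"), (50, "S:MPI"), (100, "S:Calc")]
bars = unified_intervals(events + samples, end_time=150)
```

With this rule a sample no longer needs a fixed width of 1δ: an intervening enter or leave event simply ends its bar early.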

SLIDE 14

Overview
Introduction and existing approaches
Recording and data formats
Analysis of samples and events combined
– Timeline visualization
– Statistics
Conclusions

SLIDE 15

How to Compute Run-Time Statistics?
From samples alone or from samples and events combined?
[Figure: example timeline (Main, Phase 2, Calc, MPI) with sample distance 1δ = 3.1 ms. Durations shown in the event view: 2.2 ms, 2.1 ms, 8.1 ms. Durations shown in the combined view: 2.1 ms, 1.0 ms, 1.2 ms, 1.9 ms, 3.1 ms, 3.1 ms.]

                      Time for Calc (ms)   Time for MPI (ms)   Sum time (ms)
Events only           (10.2)               2.2                 (12.4)
Samples only          9.3 = ¾              3.1 = ¼             12.4
Events AND samples    10.2                 2.2                 12.4
Events OR samples     9.3                  2.2                 11.5
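
The numbers in the table can be reproduced with a small Python sketch. The timestamps below are assumptions chosen to be consistent with the durations in the example (samples every 3.1 ms, an instrumented MPI call lasting 2.2 ms); the slides do not state them explicitly. Times are integers in microseconds to avoid rounding issues, and all function names are hypothetical.

```python
DELTA = 3100   # sample distance 1δ = 3.1 ms, in microseconds
END = 12400    # end of the example interval

# Assumed sample points: (time, region on top of the sampled stack).
samples = [(0, "Calc"), (3100, "Calc"), (6200, "Calc"), (9300, "MPI")]
# Only MPI is instrumented. Each event carries the region active from then on:
# Enter MPI at 8.1 ms, Leave MPI (back into Calc) at 10.3 ms.
events = [(8100, "MPI"), (10300, "Calc")]


def samples_only(samples, delta):
    """Attribute one full sample distance to the region of each sample."""
    totals = {}
    for _, region in samples:
        totals[region] = totals.get(region, 0) + delta
    return totals


def events_and_samples(samples, events, end):
    """Merge both record kinds; each record holds until the next one."""
    records = sorted(samples + events, key=lambda rec: rec[0])
    totals = {}
    for (t, region), (t_next, _) in zip(records, records[1:] + [(end, None)]):
        totals[region] = totals.get(region, 0) + (t_next - t)
    return totals


def mpi_from_events(events):
    """Duration of the instrumented MPI call from its enter/leave pair."""
    (t_enter, _), (t_leave, _) = events
    return t_leave - t_enter


by_samples = samples_only(samples, DELTA)
combined = events_and_samples(samples, events, END)
# "Events OR samples": cherry-pick Calc from samples and MPI from events.
cherry_picked_sum = by_samples["Calc"] + mpi_from_events(events)
```

Under these assumptions the cherry-picked sum comes out at 11.5 ms and no longer matches the 12.4 ms interval, whereas the two consistent methods both add up to the full interval.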

SLIDE 16

How to Compute Run-Time Statistics?
Cannot compute from events alone with selective instrumentation
Do not compute some from events and some from samples (cherry picking)
Compute from samples only: produces statistically correct results
– Don’t expect sampling to be more precise than 1δ in the first place
Compute from samples and events combined: produces different correct result!
– It is not more accurate than the one from sampling (max. error is the same)
– Different granularity for instrumented calls may become evident
What is easier to comprehend by users? What is easier to explain? Which is the expected model that brings the lesser surprise?

SLIDE 17

Impressions

SLIDE 18

Impressions

SLIDE 19

Conclusions & Outlook
Sampling and Instrumentation should be combined
– Allow a completely flexible mix from samples and events
– Event tracing should adopt favorable event representation via CCT
– Make sure to present it in a clear way
Release plans:
– Sampling records already part of OTF2
– Include sampling in next Score-P release
– Visualization in Vampir release version at SC’15

SLIDE 20

Advertisements
9th Parallel Tools Workshop in Dresden, 2-3 September
https://tools.zih.tu-dresden.de/2015/
Extreme Scale Programming Tools Workshop (ESPT) at SC’15
http://www.vi-hps.org --> News
Deadline extended until 14 August
