  1. Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters
     Jack Dongarra, Kevin London, Shirley Moore, Philip Mucci, Daniel Terpstra, Haihang You, and Zhou Min
     April 26, 2003, IPDPS/PADTAD 2003

     Tools for Performance Evaluation
     » Timing and performance evaluation has been an art
       » Resolution of the clock
       » Issues about cache effects
       » Different systems
       » Can be cumbersome and inefficient with traditional tools
     » Situation about to change
       » Almost all high performance processors include hardware performance counters.
       » Some are easy to access, others not available to users.
       » On most platforms the APIs, if they exist, are not appropriate for the end user or well documented.

     Hardware Counters
     » Small number of registers dedicated for performance monitoring functions
       – AMD Athlon, 4 counters
       – Pentium <= III, 2 counters
       – Pentium IV, 18 counters
       – IA64, 4 counters
       – Alpha 21x64, 2 counters
       – Power 3, 8 counters
       – Power 4, 8 counters
       – UltraSparc II, 2 counters
       – MIPS R14K, 2 counters

     » PAPI is a proposed “standard” cross-platform interface to hardware performance counters.
     » PAPI provides two APIs to access the underlying performance counter hardware:
       » A low-level interface designed for tool developers and expert users, and
       » A high-level interface for application engineers.

     PAPI Implementation (layer diagram, top to bottom)
     » Tools
     » Portable Layer: PAPI High Level, PAPI Low Level
     » Machine Specific Layer: PAPI Machine Dependent Substrate, Kernel Extension, Operating System, Hardware Performance Counters

     PAPI Preset Events
     » Proposed standard set of event names deemed most relevant for application performance tuning
     » Exact standardization of the semantics not possible
       » e.g., IBM’s FMA
     » PAPI supports approximately 100 preset events.
       » Mapped to native events on a given platform
     » Preset events are mappings from symbolic names to machine specific definitions for a particular hardware event.
       » Example: PAPI_TOT_CYC
     » PAPI also supports presets that may be derived from multiple underlying hardware metrics.
       » Example: PAPI_L1_DCM
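     The preset mechanism can be pictured as a per-platform lookup table from symbolic names to native events. The sketch below is a toy model in Python, not PAPI code: the native event names, the table contents, and the rule that a derived preset is the sum of its native events are all assumptions made for illustration.

```python
# Toy model of preset-to-native event mapping on one hypothetical platform.
# The NATIVE_* names and the "sum of parts" derivation are assumptions,
# not taken from any real PAPI substrate.
PRESETS = {
    "PAPI_TOT_CYC": ("NATIVE_CPU_CYCLES",),                       # direct mapping
    "PAPI_L1_DCM": ("NATIVE_L1D_LOAD_MISS", "NATIVE_L1D_STORE_MISS"),  # derived
}


def is_available(preset):
    """A preset is available only if this platform maps it to native events."""
    return preset in PRESETS


def is_derived(preset):
    """A preset is derived when it needs more than one native event."""
    return len(PRESETS[preset]) > 1


def read_preset(preset, native_counts):
    """Combine the native counts behind a preset (here: a simple sum)."""
    return sum(native_counts[name] for name in PRESETS[preset])
```

     For example, with native counts of 70 load misses and 30 store misses, `read_preset("PAPI_L1_DCM", ...)` returns 100 in this model, while a preset missing from the table is reported as unavailable rather than counted.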

  2. Sample Preset Listing
     > tests/avail
     Test case 8: Available events and hardware information.
     -------------------------------------------------------------------------
     Vendor string and code   : GenuineIntel (-1)
     Model string and code    : Celeron (Mendocino) (6)
     CPU revision             : 10.000000
     CPU Megahertz            : 366.504944
     -------------------------------------------------------------------------
     Name         Code        Avail  Deriv  Description (Note)
     PAPI_L1_DCM  0x80000000  Yes    No     Level 1 data cache misses
     PAPI_L1_ICM  0x80000001  Yes    No     Level 1 instruction cache misses
     PAPI_L2_DCM  0x80000002  No     No     Level 2 data cache misses
     PAPI_L2_ICM  0x80000003  No     No     Level 2 instruction cache misses
     PAPI_L3_DCM  0x80000004  No     No     Level 3 data cache misses
     PAPI_L3_ICM  0x80000005  No     No     Level 3 instruction cache misses
     PAPI_L1_TCM  0x80000006  Yes    Yes    Level 1 cache misses
     PAPI_L2_TCM  0x80000007  Yes    No     Level 2 cache misses
     PAPI_L3_TCM  0x80000008  No     No     Level 3 cache misses
     PAPI_CA_SNP  0x80000009  No     No     Requests for a snoop
     PAPI_CA_SHR  0x8000000a  No     No     Requests for shared cache line
     PAPI_CA_CLN  0x8000000b  No     No     Requests for clean cache line
     PAPI_CA_INV  0x8000000c  No     No     Requests for cache line inv.
     . . .
     http://icl.cs.utk.edu/projects/papi/files/html_man/papi_presets.html

     Support for Native Events
     » PAPI supports native events:
       » An event countable by the CPU can be counted even if there is no matching preset PAPI event.
       » The developer uses the same API as when setting up a preset event, but a CPU-specific bit pattern is used instead of the PAPI event definition.

     High-level Interface
     » Meant for application programmers wanting coarse-grained measurements
     » As easy to use as SGI IRIX perfex calls
       » a command-line interface to the R10000 hardware performance counters
     » Requires no setup code
     » Restrictions:
       » Allows only PAPI presets
       » Not thread safe
       » Only aggregate counters

     High-level API Calls
     » PAPI_flops(float *rtime, float *ptime, long_long *flpins, float *mflops)
       » Wallclock time, process time, FP ins since start,
       » Mflop/s since last call
     » PAPI_num_counters()
       » Returns the number of available counters
     » PAPI_start_counters(int *cntrs, int alen)
       » Start counters
     » PAPI_stop_counters(long_long *vals, int alen)
       » Stop counters and put counter values in array
     » PAPI_accum_counters(long_long *vals, int alen)
       » Accumulate counters into array and reset
     » PAPI_read_counters(long_long *vals, int alen)
       » Copy counter values into array and reset counters

     Low-level Interface
     » Increased efficiency and functionality over the high level PAPI interface
     » Approximately 60 functions
     » Thread-safe (SMP, OpenMP, Pthreads)
     » Supports both preset and native events

     Low-level Functionality
     » API Calls for:
       » Counter multiplexing
       » SVR4 compatible profiling
       » Processor information
       » Address space information
       » Accurate and low latency timing functions
       » Hardware event inquiry functions
       » Eventset management functions
       » Static and dynamic memory information
       » Simple locking operations
       » Callbacks on user defined overflow threshold
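     The difference between the read and accumulate calls is easy to miss in a one-line description, so here is a small Python model of the semantics as the slide states them. It does not call PAPI: `CounterModel`, `tick`, and `mflops` are hypothetical names, and the Mflop/s formula is simply floating-point instructions divided by elapsed seconds and by one million.

```python
class CounterModel:
    """Toy stand-in for a set of running hardware counters (not PAPI itself)."""

    def __init__(self, num_counters):
        self._counts = [0] * num_counters

    def tick(self, increments):
        """Simulate hardware activity by advancing each counter."""
        for i, inc in enumerate(increments):
            self._counts[i] += inc

    def _reset(self):
        self._counts = [0] * len(self._counts)

    def read(self, vals):
        """Copy counter values into vals, then reset the counters."""
        vals[:] = self._counts
        self._reset()

    def accum(self, vals):
        """Add counter values onto vals, then reset the counters."""
        for i, count in enumerate(self._counts):
            vals[i] += count
        self._reset()


def mflops(fp_instructions, seconds):
    """Millions of floating-point instructions per second."""
    return fp_instructions / seconds / 1e6
```

     In this model, reading after 10 events and then accumulating after 5 more leaves 15 in the caller's array, because the counters restart from zero after every call. That reset is why only aggregate counts are available through the high-level interface.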

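     Counter multiplexing lets a program follow more events than the processor has counters by time-slicing the counters among groups of events and scaling up what was observed. The slides do not describe PAPI's estimator, so the Python sketch below assumes the simplest one: rotate groups round-robin and scale each observed count by total slices over active slices. All function names are hypothetical.

```python
def schedule_round_robin(events, num_counters, num_slices):
    """Split events into groups that fit the counters and rotate the groups."""
    groups = [events[i:i + num_counters]
              for i in range(0, len(events), num_counters)]
    return [groups[s % len(groups)] for s in range(num_slices)]


def estimate_counts(observed, active_slices, total_slices):
    """Scale each observed count by the share of time the event was counted.

    Assumes the event rate was roughly uniform over the whole run, which is
    the main source of error in multiplexed measurements.
    """
    return {event: observed[event] * total_slices / active_slices[event]
            for event in observed}
```

     With four events on two counters, each event is counted in half of the slices, so an observed count of 50 is reported as an estimate of 100. Short runs or bursty behaviour make the uniform-rate assumption, and therefore the estimate, less reliable.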
  3. PAPI 2.3.4 Release (April 14, 2003)
     » Platforms
       » IBM PPC604, 604e, Power 3, Power4, AIX 5
       » Intel x86/Linux, Windows, including Pentium IV
       » Sun UltraSparc I/II/III
       » SGI MIPS R10K/R12K/R14K
       » Compaq Alpha 21164/21264 with DADD/DCPI
       » Itanium/Itanium2 Linux
       » Cray T3E
     » Enhancements
       » Static/dynamic memory info
       » IA64 hardware profiling and sampling
       » Misc bug fixes
     » Sample Tools
       » Perfometer
       » Trapper
       » Dynaprof

     Design and Implementation Experiences
     » Success of community-based open source development effort
       » Parallel Tools Consortium http://www.ptools.org/
     » Tradeoffs between ease-of-use and increased functionality and features
     » Operating system support
     » Interfacing to third-party tools
     » Data interpretation and accuracy issues
     » Efficiency and scalability issues

     Operating System Support
     » Perfctr kernel patch by Mikael Pettersson required for Linux/x86
       » Kernel modification has met resistance from some system administrators
       » Effort underway to get perfctr into mainstream Linux release
     » Vendor cooperation has been good (in most cases)
       » Register level operations code provided by Cray
       » IBM pmtoolkit included in AIX 5
       » Perfmon library from Hewlett-Packard for Itanium/Itanium2 Linux
       » DADD (Dynamic Access to DCPI Data) extension to DCPI from Hewlett-Packard for Alpha Tru64 UNIX

     Tools
     » Tools developed by the PAPI project
       » Dynaprof
       » Perfometer
     » Third-party tools
       » HPCView (Rice University)
       » SvPablo (University of Illinois)
       » TAU (University of Oregon)
       » Vampir 3.x (Pallas)
       » VProf (Sandia National Lab)
       » Others (see PAPI home page)

     Dynaprof
     » A portable tool to dynamically instrument serial and parallel programs for the purpose of performance analysis
       » Avoiding source-code instrumentation and recompilation
       » Avoiding perturbation of compiler optimizations
       » Providing complete language independence
     » Simple and intuitive command line interface like GDB
     » Java/Swing GUI
     » Built on DynInst and DPCL
       » IBM and Maryland
     » Instrumentation is done through the run-time insertion of function calls to specially developed performance probes.

     [Figure: Dynaprof GUI screenshot]
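     Run-time probe insertion can be illustrated by analogy in Python, where a function can be rebound while the program runs. This is only an analogy: Dynaprof patches the running binary through DynInst or DPCL, whereas the hypothetical `insert_probe` below merely replaces an attribute with a wrapper that records call counts and elapsed time.

```python
import functools
import time


def insert_probe(owner, func_name, stats):
    """Rebind owner.func_name to a wrapper that times every call.

    The target's source is never edited; the probe is attached at run time.
    Returns the original function so the probe can be removed again.
    """
    original = getattr(owner, func_name)

    @functools.wraps(original)
    def probed(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            calls, total = stats.get(func_name, (0, 0.0))
            stats[func_name] = (calls + 1, total + time.perf_counter() - start)

    setattr(owner, func_name, probed)
    return original
```

     Restoring the saved original with `setattr` removes the probe, which mirrors the appeal of dynamic instrumentation: measurement can be switched on and off without recompiling the program being measured.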

  4. [Figure: Perfometer screenshot]

     [Figure: HPCView screenshot]

     SvPablo from UIUC
     • Source based instrumentation of loops and function calls for Fortran and C
     • Profiling statistics based on time and/or hardware counter data
     • Supports serial, MPI, and OpenMP programs
     • Freely available

     Vampir 3.x from Pallas
     http://www.pallas.com/e/products/vampir/index.htm

     Data Accuracy Issues
     » Act of measuring perturbs the system being measured
       » Extra instructions
       » Cache pollution
       » Servicing interrupts
     » PC sampling can be inaccurate on out-of-order processors with speculative execution.
     » Solutions:
       » PAPI is being redesigned to keep its runtime overhead and memory footprint as small as possible.
       » Hardware support for interrupt handling and profiling (e.g., event address registers) is being used where available.
       » Work by Pat Teller at University of Texas-El Paso on validation of hardware counter data using microbenchmarks
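     The "extra instructions" form of perturbation can be estimated with a microbenchmark: measure an empty region, then subtract that cost from real measurements. The Python sketch below uses the wall clock rather than hardware counters, and its function names are hypothetical. It accounts only for the cost of the measurement calls themselves, not for cache pollution or interrupt servicing.

```python
import time


def measurement_overhead_ns(trials=10000):
    """Smallest observed cost of a back-to-back start/stop pair, in ns."""
    best = None
    for _ in range(trials):
        start = time.perf_counter_ns()
        stop = time.perf_counter_ns()
        delta = stop - start
        if best is None or delta < best:
            best = delta
    return best


def timed_ns(func, *args, overhead_ns=0):
    """Run func and return (result, elapsed ns with the overhead removed)."""
    start = time.perf_counter_ns()
    result = func(*args)
    elapsed = time.perf_counter_ns() - start
    # Clamp at zero: a region shorter than the overhead cannot be resolved.
    return result, max(elapsed - overhead_ns, 0)
```

     Taking the minimum over many trials rather than the mean filters out samples inflated by interrupts or scheduling, so the correction is conservative. A region whose corrected time is close to zero is too short to measure reliably this way.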
