The Titan Tools Experience, by Michael J. Brim, Ph.D. (PowerPoint presentation)



SLIDE 1

Michael J. Brim, Ph.D. Computer Science Research, CSMD/NCCS

Petascale Tools Workshop 2013 Madison, WI July 15, 2013

The Titan Tools Experience

SLIDE 2

Managed by UT-Battelle for the U.S. Department of Energy

Overview of Titan

  • Cray XK7

– 18,688+ compute nodes

  • 16-core AMD Opteron 6274 @ 2.2GHz
  • 32GB DDR3 RAM
  • NVIDIA Kepler K20 GPU: 14 SM with 6GB RAM

– Gemini Interconnect: 3-D Torus

http://www.olcf.ornl.gov/titan/
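The per-node figures above imply Titan's aggregate scale. A rough back-of-the-envelope calculation (mine, not from the slides, using the stated 18,688-node count):

```python
# Back-of-the-envelope aggregate figures for Titan, from the per-node
# numbers on the slide: 18,688 nodes, each with 16 CPU cores, 32 GB
# DDR3, and one K20 GPU with 6 GB.
nodes = 18688
cpu_cores = nodes * 16   # total Opteron cores
ddr3_gb = nodes * 32     # total host RAM (GB)
gpu_gb = nodes * 6       # total GPU RAM (GB)

print(cpu_cores)  # 299008
print(ddr3_gb)    # 598016
print(gpu_gb)     # 112128
```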

SLIDE 3

My Roles at ORNL

  • “Tools Developer” is my official job title

– HPC debugging, performance, & system administration tools
– Matrixed in CSMD & NCCS

  • CSMD: tools research
  • NCCS: evaluate/improve production tool offerings
  • Titan (OLCF-3) Acceptance

– responsible for testing “Programming Environment and Tools”

  • OLCF-4 Tools Lead
SLIDE 4

Tools for Titan

  • In production use:

– Debugging

  • Allinea DDT, gdb, cuda-gdb, STAT, Cray ATP

– Performance

  • Cray PAT, TAU, Vampirtrace, NVIDIA nvvp, CAPS HMPP wizard

  • In testing/evaluation:

– HPCToolkit, OpenSpeedshop, Score-P, Allinea MAP

  • Allinea, CAPS, and TU-Dresden

– prior/ongoing funding for feature improvements, mostly GPU-related
– on-site personnel to assist users and scientific computing liaisons

SLIDE 5

Performance Tools Study

  • Three goals:

1. develop familiarity with tools (as a user, not a developer)
2. evaluate scalability and usability on Titan
3. identify areas for improvement

  • Strategy

– follow tool use recommendations (per Titan user guide)
– test functionality on hybrid MPI+OpenMP and MPI+GPU apps
– evaluate usability/scalability using production science apps

  • My science app friends let me down ☹

– settled for: dummy MPI (master-worker), Sequoia IRS v1.0

SLIDE 6

Tool Configurations

  • HPCToolkit 5.3.2 (svn head from June 20)

– Profile: PAPI_L1_TCM:PAPI_TLB_TL, PAPI_TOT_CYC@50,000,000
– Trace: PAPI_L1_TCM:PAPI_TLB_TL, Process fraction 10%

  • OpenSpeedshop 2.0.2-u11

– Profile: pcsamp
– Trace: hwctime

  • PAT 6.0.1 (perftools)

– automatic program analysis in two phases (pat, apa)

  • TAU 2.22.2-openmp

– Profile/Trace: PAPI_L1_TCM:PAPI_TLB_TL, MPI communication tracking

  • Vampirtrace 5.14.2-nogpu

– compiler instrumentation (default on Titan); tauinst currently broken
– 512MB trace limit per process, 128MB trace buffer

– Profile/Trace: PAPI_L1_TCM:PAPI_TLB_TL
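For context on the HPCToolkit sampling period above: PAPI_TOT_CYC@50,000,000 means one sample every 50 million cycles, so on Titan's 2.2 GHz Opteron cores that works out to roughly 44 samples per second per thread. A quick check of the arithmetic (mine, not from the slides):

```python
# Rough sampling-rate arithmetic for PAPI_TOT_CYC@50,000,000 on a
# 2.2 GHz core: one sample is taken every 50e6 elapsed cycles.
clock_hz = 2.2e9
period_cycles = 50_000_000

seconds_per_sample = period_cycles / clock_hz
samples_per_second = clock_hz / period_cycles

print(round(samples_per_second))     # 44
print(round(seconds_per_sample, 4))  # 0.0227
```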

SLIDE 7

Tool Evaluations - Functionality

Tool                      GNU    Intel   PGI    Cray
HPCToolkit                ✔ ✔    ✔ ✔    ✔ ✔    ✔ ✔
OpenSpeedshop             ✔ ☐    ✔ ☐    ✔ ☐    ✔ ☐
PAT                       ✔ ✔    ✔ ✔    ✔ ✔    ✔ ✔
TAU                       ✔ ☐    ✔ ☐    ✔ ☐    ✗
Vampirtrace (compinst)    ✔ ✔    ✔ ✔    ✔ ✔    ✔ ✔

  • dummy_mpi: simple master-worker MPI

– C and C++ versions

  • CUDA SDK: various GPU apps
SLIDE 8

IRS Results – HPCToolkit

Storage Requirements (MiB), PGI:

Procs     512   1728   4096   8000
Profile    23    117    359    969
Trace       4     10     32    107

Execution Overhead: [chart: Baseline, Profile, and Trace times at 512–8000 processes]

Storage Requirements (MiB), Cray:

Procs     512   1728    4096    8000
Profile   205  1,232   4,083  11,175
Trace      13     69     240     511

Execution Overhead: [chart: Baseline, Profile, and Trace times at 512–8000 processes]
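One way to read the storage numbers: dividing profile size by process count shows that per-process profile data grows with scale rather than staying constant. A quick check on the PGI figures (my arithmetic, not from the slides):

```python
# Per-process profile storage from the HPCToolkit/PGI table above:
# MiB totals converted to KiB and divided by process count.
procs   = [512, 1728, 4096, 8000]
profile = [23, 117, 359, 969]   # total profile size, MiB

kib_per_proc = [round(p * 1024 / n, 1) for p, n in zip(profile, procs)]
print(kib_per_proc)  # [46.0, 69.3, 89.8, 124.0] -- grows with scale
```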

SLIDE 9

IRS Results – OpenSpeedShop

Storage Requirements (MiB), PGI and Cray: Profile and Trace rows at 512/1728/4096/8000 processes [values not recovered from the slide]

Execution Overhead: [charts: Baseline, Profile, and Trace times at 512–8000 processes, PGI and Cray]

SLIDE 10

IRS Results – TAU

Storage Requirements (MiB), PGI and Cray: Profile and Trace rows at 512/1728/4096/8000 processes [values not recovered from the slide]

Execution Overhead: [charts: Baseline, Profile, and Trace times at 512–8000 processes, PGI and Cray]

SLIDE 11

IRS Results – Cray PAT

Storage Requirements (MiB), PGI:

Procs   512   1728   4096   8000
pat       ?      ?      ?     49
apa       8     27     68    TBD

Execution Overhead: [chart: Baseline, pat, and apa times at 512–8000 processes]

Storage Requirements (MiB), Cray:

Procs   512   1728   4096   8000
pat       ?      ?      ?    206
apa      46    159    377    TBD

Execution Overhead: [chart: Baseline, pat, and apa times at 512–8000 processes]

(the original slide annotates some missing results with "script error")

SLIDE 12

IRS Results – Vampirtrace

Storage Requirements (MiB), PGI:

Procs      512    1728    4096    8000
Profile    0.3     1.0     2.3     4.3
Trace    1,400   4,400  11,000  20,000

Execution Overhead: [chart: Baseline, Profile, and Trace times at 512–8000 processes]

Storage Requirements (MiB), Cray:

Procs      512    1728    4096    8000
Profile    2.3     7.5      18      35
Trace    1,200     TBD     TBD     TBD

Execution Overhead: [chart: Baseline, Profile, and Trace times at 512–8000 processes; the TBD trace runs are annotated "> 3hr", "> 4hr", and "> 5hr"]
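The 512MB-per-process trace limit caps total trace volume. For example, at 8,000 processes the 20,000 MiB PGI trace averages only about 2.5 MiB per process, far below the cap (my arithmetic, not from the slides):

```python
# Rough per-process trace volume from the Vampirtrace/PGI numbers
# above, compared against the 512 MiB per-process trace limit.
procs = 8000
total_trace_mib = 20000
limit_mib = 512

per_proc_mib = total_trace_mib / procs
print(per_proc_mib)              # 2.5
print(per_proc_mib < limit_mib)  # True -- well under the cap
```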

SLIDE 13

IRS Results – Comparing Tools

Normalized Execution Overhead

Chart 1 (log scale, 1–100):

Tool          PGI    Cray
hpctk-prof    1.1     1.2
hpctk-trace   1.4     2.8
pat-pat       1.1     4.4
pat-apa       1.2    49.5
vt-prof       1.1     1.8
vt-trace      2.7    45.8

Chart 2 (log scale, 1–100):

Tool          PGI    Cray
hpctk-prof    1.2     1.3
hpctk-trace   1.4     3.7
pat-pat       1.1     3.1
pat-apa       1.2    33.1
vt-prof       1.5     1.5
vt-trace      4.5     TBD

SLIDE 14

IRS Results – Comparing Tools

Normalized Execution Overhead

Chart 1 (log scale, 1–100):

Tool          PGI    Cray
hpctk-prof    1.1     1.4
hpctk-trace   1.2     4.2
pat-pat       1.1     2.4
pat-apa       1.1    22.1
vt-prof       1.4     1.4
vt-trace      3.9     TBD

Chart 2 (log scale, 1–100):

Tool          PGI    Cray
hpctk-prof    1.2     1.3
hpctk-trace   1.4     3.7
pat-pat       1.1     3.1
pat-apa       1.2    33.1
vt-prof       1.5     1.5
vt-trace      4.5     TBD

SLIDE 15

dummy_mpi Results – Some Tools

Normalized Execution Overhead

Tool          Overhead (Intel)
hpctk-prof        1.34
hpctk-trace       1.25
ss-pcsamp         1.12
vt-prof           1.38
vt-trace          1.36

Baseline time: 2,812 seconds (~3/4 hr)

Much less trace overhead than at small scale, due to the 512MB per-process trace limit.
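Normalized overhead here is tool runtime divided by baseline runtime, so with the 2,812-second baseline above, the reported ratios translate back to rough absolute times (my arithmetic, not from the slides):

```python
# Convert the normalized overheads from the slide back to rough
# absolute runtimes, given the 2,812 s baseline run.
baseline_s = 2812
overheads = {"hpctk-prof": 1.34, "hpctk-trace": 1.25,
             "ss-pcsamp": 1.12, "vt-prof": 1.38, "vt-trace": 1.36}

runtimes = {tool: round(r * baseline_s) for tool, r in overheads.items()}
print(runtimes["hpctk-prof"])  # 3768 seconds
print(runtimes["ss-pcsamp"])   # 3149 seconds
```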

SLIDE 16

Next Steps

  • Work with production codes

– LAMMPS: C++, MPI + CUDA
– NUCCOR-J, CESM: Fortran, MPI + OpenMP
– S3D: Fortran, MPI + OpenACC

  • Compare information collected across tools

– user/developer feedback on new insights gleaned (if any)
– tool expert feedback

  • Large-scale tests

– at least half of Titan nodes
– up until things break or I run out of allocation

SLIDE 17

Next Steps – Part 2

  • Identify areas for tool improvements
  • Work with tool developers

– user guidance
– scalability, new feature development

SLIDE 18

Questions & Feedback

www.ornl.gov

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.