Debugging Scalable Applications on the XT May 2nd 2009 Chris - - PowerPoint PPT Presentation

SLIDE 1

Debugging Scalable Applications on the XT

May 2nd 2009

Chris Gottbrath Director, Product Management

SLIDE 2

TotalView Technologies –Proprietary– Plans Subject to Change without Notice

Debugging Scalable Applications

  • Intro

– Challenges
– Products

  • Scalability

– Interactive Subset Debugging

  • Batch Environments

– TVScript

  • Long Distance Collaborations

– Remote Display Client

  • Memory Limitations

– Memory Debugging

  • A look forward

– RedZones
– ReplayEngine

  • Questions

SLIDE 3

HPC Debugging Challenges

  • There are different kinds of challenges

– Technical
– Educational
– Organizational

  • It seems to me that they revolve around 3 C’s

– Concurrency
– Complexity
– Collaboration

SLIDE 4

Challenges: Concurrency

  • Distributed multi-process

– Processes may be doing the same thing or different things
– Data is distributed across the cluster
– Behavior can sometimes be hard to reproduce
– A hung process can sometimes be hard to differentiate from a hung node

  • Hybrid and/or multi-threaded

– Behavior can be hard to reproduce
– May introduce a second tier of parallelism

  • Scalability

– Runs may include tens or hundreds of thousands of threads of execution

  • Performance of the user’s program
  • Performance of tool
  • Details can overwhelm the user

– How do the users want to interact with these large jobs?

  • Lightweight tools?
  • Work with a subset of the processes?
  • Fully featured debugging on the full scale jobs?

SLIDE 5

Challenges: Complexity

  • Software tool chain

– Languages and new language constructs
– Multiple compilers and platforms

  • Hardware and runtime

– Available node memory
– Processor characteristics (with things like the Cell)
– What facilities does the runtime provide?

  • Breaking new ground

– The “right” answers may be unknown

  • Validation from previous models

SLIDE 6

Challenges: Community

  • Codes are developed by large teams

– Train team members on the code, tools and platforms
– Share the most effective techniques
– Coordinate troubleshooting with the appropriate experts within the team

  • Teams may be highly distributed

– Geographically and organizationally
– Debugging may happen from across the hall or across the globe

  • Management of system resources

– Balancing development and production needs
– Problems can occur at production scale and with production datasets

  • Should users be allowed to troubleshoot in production queues and at production scale?

– Debugging needs to be able to work with different queue policies

SLIDE 7

Solutions

  • Product Overview

– TotalView
– MemoryScape
– ReplayEngine

  • Large Scale Concurrency

– Interactive Subset Debugging

  • Batch Environments

– Batch Debugging with TVScript

  • Collaboration

– Long Distance Remote Debugging

  • Memory Limitations

– Memory Debugging

SLIDE 8

Develop an understanding of program behaviour

TotalView debugger

  • C, C++, Fortran 77, Fortran 90, UPC

– Complex Language Features

  • Wide compiler and platform support

– Cray XT
– Linux x86, x86-64
– Others: Solaris, BG, Cell, Mac, etc.

  • Parallel debugging

– MPI, pthreads, OpenMP, UPC

  • Memory debugging capabilities

– Integrated into the debugger

  • Remote Display Client
  • Graphical User Interface

– Simple things are easy
– Advanced operations are available
– Visualization

  • Scripting

– CLI and TVScript

SLIDE 9

MemoryScape

  • What is MemoryScape?

– Streamlined
– Lightweight
– Intuitive
– Collaborative
– Memory Debugging

  • Features

– Shows

  • Memory errors
  • Memory status
  • Memory leaks
  • Buffer overflows

– MPI memory debugging
– Remote memory debugging

– Tech

  • Low overhead
  • No Instrumentation

– Interface

  • Inductive
  • Collaboration
  • Multi-process

Simple to use, intuitive memory debugging

SLIDE 10

ReplayEngine

  • Enhances debugging experience
  • Add-on to TotalView (version 8.6)
  • Captures execution history
  • Records all external input to the program
  • Records internal sources of non-determinism
  • Replays execution history
  • Examine any part of the execution history
  • Step as easily back through code as you do forwards
  • Jump to points of interest
  • Everything is managed by the tool
  • The user just says where they want to go
  • Supported on Linux x86 and x86-64
  • Supports MPI, Pthreads, and OpenMP

Radically simplified debugging

SLIDE 11

Large Scale Concurrency

  • Dealing with Tera- and Petascale

– Challenging for interactive tools
– Multiple approaches

  • Interactive Subset Debugging
  • Ongoing Tool Scalability Improvements
  • Scalable Display of Data

SLIDE 12

Attaching the Debugger to Part of a Job

  • Debug a subset of the processes that make up the job

– Sometimes the user does not need to control and see every process to understand the behavior or identify the defect

  • The subset can be changed at any time

– Can narrow, expand or shift focus

  • Uncouples interactive performance from job size

– After the subset operation completes
– Interactive performance depends on subset size

  • Supports the use of lightweight tools

– LLNL’s STAT

  • Recent work

– 1k of 16k acts like 1k of 1k
– BG subset support
– Enhanced support for tools integration

SLIDE 13

Unprecedented Scalability for Interactive Tool

  • Techniques for using TotalView at scale

– Subset attach, message queue display, cycle detection, call graph, view data across processes and threads, etc.

  • Current scalability (tested and verified)

– Users debug 1 to 4,000 processes regularly
– Many operations at 1k take less than a few seconds
– Higher scale, depending on the system and application

  • Blue Gene: up to 16k processes
  • Linux cluster: up to 6k processes
  • Cray XT : up to 4k processes
  • Actively working on performance and scalability

– Improvements come from rigorous profiling and timing
– Requires close collaboration with both customers and other vendors

  • Partnership program

SLIDE 14

Scalable Display of Data

SLIDE 15

Debugging in Batch Environment

  • Batch Environments Support

– Many users
– Non-interactive usage model

  • Upload data and code
  • Compile
  • Submit
  • Wait
  • Run
  • Download results

– Interactive queues

  • Some sites
  • Smaller scale
  • How to do debugging in this model?

– printf()
– Manual TotalView CLI scripting
– TVScript

SLIDE 16

Batch Debugging with TVScript

  • New in TotalView 8.6
  • User extensible script to drive a target program to completion under the TotalView debugger.

  • Handles all the event management overhead so the user doesn’t have to.
  • Allows you to

– Gather debugging data in the “regular queue” without interactivity while the program runs
– Do very structured and reproducible kinds of problem analysis
– “Narrow down” problems so that you can do focused interactive debugging as a second stage

  • How does it work?

– You define breakpoints
– You associate operations with those breakpoints, such as

  • Print a specific variable
  • Print all local variables
  • Stack trace
  • Count
  • Set other breakpoints, watchpoints
  • Set data within the program

– You submit the script into the batch queue and it runs without any user interaction
– Output is gathered into a single debugging output file
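
The breakpoint-plus-action model on this slide can be illustrated outside TotalView. The following is a minimal Python sketch and is not TVScript syntax: it uses the standard `sys.settrace` hook to attach actions (print local variables, count hits) to function entry and gathers everything into one log. The names `run_with_actions`, `print_locals`, `count_hit`, `solve_step` and `program` are hypothetical and exist only for this illustration.

```python
import sys
from collections import Counter

def run_with_actions(target, actions):
    """Run target() unattended, firing the actions registered per function name.

    `actions` maps a function name (standing in for a breakpoint location)
    to a list of callables taking (frame, log). Illustrative design only.
    """
    log = []  # single output "file" gathered during the run

    def tracer(frame, event, arg):
        if event == "call":
            for action in actions.get(frame.f_code.co_name, []):
                action(frame, log)
        return None  # no per-line tracing needed

    sys.settrace(tracer)
    try:
        target()
    finally:
        sys.settrace(None)
    return log

hits = Counter()

def print_locals(frame, log):
    # At function entry the locals are the call arguments.
    log.append(f"{frame.f_code.co_name}: locals={dict(frame.f_locals)}")

def count_hit(frame, log):
    hits[frame.f_code.co_name] += 1

def solve_step(i, residual):  # hypothetical target routine
    return residual / 2

def program():  # hypothetical target program
    r = 1.0
    for i in range(3):
        r = solve_step(i, r)

log = run_with_actions(program, {"solve_step": [print_locals, count_hit]})
for line in log:
    print(line)
```

The run needs no interaction once started, which is the property that makes this style of debugging fit a batch queue.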

SLIDE 17

Collaboration

  • Diverse collaborations

– Scientific or technical domain experts
– Computer scientists
– System consultants
– Grad students of various flavors

  • Geographically Distributed
  • Enabling access

– Long Distance Remote Debugging

  • Sharing Data

– Reports and Exports

SLIDE 18

Long Distance Remote Display

  • New in TotalView 8.6
  • The Remote Display Client
  • Included in TV distribution
  • Also available on the web
  • Sets up a graphical connection
  • Via ssh
  • Through one or more hosts
  • To a remote machine
  • Provides for a connection that is
  • Easy
  • Fast
  • Secure
  • The Remote Display Client is available for:
  • Linux x86
  • Linux x86-64
  • Windows XP
  • Windows Vista
  • Does job submission with batch environments

  • PBS Pro
  • LoadLeveler

SLIDE 19

Memory Comparisons and Reports

  • Diff Processes
  • Share HTML Reports

SLIDE 20

Memory Limitations

  • A lot more cores, a little more memory
  • Detect leaks

– Less space available means even small leaks are a problem

  • Understand memory usage

– So that you know where to optimize

  • Compare memory behavior

– Across cluster
– Between two nodes
– Over time
– Between runs
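
The comparisons listed above come down to differencing heap summaries. The sketch below assumes each summary is a plain mapping from allocation site to bytes in use; that layout, the function names and all the numbers are made up for illustration and do not reflect MemoryScape's export format.

```python
import statistics

def diff_heap(before, after):
    """Return {site: byte_delta} for allocation sites whose usage changed."""
    sites = set(before) | set(after)
    return {s: after.get(s, 0) - before.get(s, 0)
            for s in sorted(sites)
            if after.get(s, 0) != before.get(s, 0)}

def outlier_ranks(usage_by_rank, factor=2.0):
    """Ranks whose total heap use exceeds `factor` times the median rank."""
    totals = {rank: sum(sites.values()) for rank, sites in usage_by_rank.items()}
    median = statistics.median(totals.values())
    return sorted(r for r, t in totals.items() if t > factor * median)

# Over time: two snapshots of one process (illustrative numbers).
snap_t0 = {"read_mesh": 4096, "halo_buf": 1024}
snap_t1 = {"read_mesh": 4096, "halo_buf": 3072, "scratch": 512}
growth = diff_heap(snap_t0, snap_t1)

# Across the cluster: one summary per rank (illustrative numbers).
ranks = {0: {"a": 1000}, 1: {"a": 1100}, 2: {"a": 5000}, 3: {"a": 900}}
suspects = outlier_ranks(ranks)

print(growth, suspects)
```

Sites that keep growing between snapshots are leak candidates, and ranks far above the median point to load or data imbalance.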

SLIDE 21

Parallel Memory Debugging

  • Memory is an issue

– Node resources are limited
– Predicting and managing memory usage across parallel applications is complex

  • Analysis may include

– Comparing usage across

  • Processes of job
  • Time
  • Datasets

– Exploring layout of allocations
– Leak detection
– Buffer overflow detection

SLIDE 22

A Look Forward

  • Red Zones

– ‘As it happens’ heap array bounds error detection
– Planned for XT in MemoryScape 2.0 and TotalView 8.7

  • ReplayEngine

– Progress update towards support on the XT

SLIDE 23

Redzones catch buffer overflows (TV 8.7, MS 3.0)

  • This is a preview
  • Allocates a “protected page”

– Adjacent to selected heap allocations
– Before or after

  • Writes into this space trigger events

– Event occurs as the write is happening

  • Pages have a fixed size

– If there are many heap allocations, this can potentially have a large memory usage overhead

  • Ways to manage Redzones memory overhead

– Turn redzones on and off manually
– Specify (by size) which allocations you want to have redzones on
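
A rough estimate shows why filtering by size matters. The sketch below assumes a 4096-byte page and exactly one guard page per selected allocation; both are assumptions for illustration, the real accounting in the tool may differ, and the function name and workload are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def redzone_overhead(alloc_sizes, min_size=0, max_size=None, page_size=PAGE_SIZE):
    """Estimate extra bytes if every selected allocation gets one guard page.

    Returns (overhead_bytes, guarded_count). Allocations outside
    [min_size, max_size] are left unguarded.
    """
    selected = [s for s in alloc_sizes
                if s >= min_size and (max_size is None or s <= max_size)]
    return len(selected) * page_size, len(selected)

# Hypothetical workload: many tiny allocations plus a few large arrays.
allocs = [32] * 100_000 + [1 << 20] * 10
payload = sum(allocs)

all_overhead, all_count = redzone_overhead(allocs)
big_overhead, big_count = redzone_overhead(allocs, min_size=1 << 20)

print(f"payload bytes:        {payload}")
print(f"guard all:            {all_overhead} bytes for {all_count} allocations")
print(f"guard >= 1 MiB only:  {big_overhead} bytes for {big_count} allocations")
```

Under these assumptions, guarding every 32-byte allocation costs far more than the data itself, while guarding only the large arrays costs almost nothing.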

SLIDE 24

Graphical view displaying Redzones (TV 8.7, MS 3.0)

  • This is a preview
  • Redzone is displayed next to the block
  • Redzone is large compared to this particular allocation
  • Information on Redzone usage is presented in the heap information tab below the graphic

SLIDE 25

ReplayEngine Update

  • In current (ReplayEngine 1.1.0) version
  • Platform: Linux x86 and Linux x86-64 machines
  • Not yet supporting Cray XT CLE
  • MPIs (certain configurations and usage modes of)
  • MPICH2
  • OpenMPI
  • MVAPICH2
  • Intel MPI
  • HP MPI
  • LAM
  • In next Version
  • Long-running applications
  • Shared memory
  • Remaining Issues
  • Actively working on XT environmental issues

SLIDE 26

ReplayEngine Shared Memory Support (RE 2.0)

  • Multi-process shmem applications

– Explicitly used in user code
– Used in library code

  • Should enable improved MPI support

– Nemesis driver in MPICH
– Infiniband support

  • Shared memory becomes another IO stream to be recorded and replayed

– Because this may involve large amounts of memory, it is important to provide ways to manage memory usage

SLIDE 27

ReplayEngine Record Buffer (RE 2.0)

  • ReplayEngine creates a “recording” of execution history

– Collection of memory “snapshots”
– Input stream
– Other sources of non-determinism

  • Limit the size of this buffer

– User configurable limit
– Organize it by time
– “Throw out” the oldest information when the buffer is full

  • Earliest part of execution history no longer accessible
  • ReplayEngine can be used on long-running and high-input codes
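
The size-limited, time-ordered buffer described on this slide behaves like a bounded queue that evicts from the old end. The following is a minimal sketch of that policy only; the class name, the byte-based accounting and the entry layout are assumptions for illustration and say nothing about how ReplayEngine stores its recording.

```python
from collections import deque

class RecordBuffer:
    """Time-ordered record buffer with a user-configurable byte limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used = 0
        self.entries = deque()  # (timestamp, payload), oldest on the left

    def record(self, timestamp, payload):
        """Append a record, then throw out the oldest ones while over the limit."""
        self.entries.append((timestamp, payload))
        self.used += len(payload)
        # Always keep the newest record, even if it alone exceeds the limit.
        while self.used > self.limit_bytes and len(self.entries) > 1:
            _, old = self.entries.popleft()
            self.used -= len(old)

    def earliest(self):
        """Earliest timestamp that can still be replayed, or None if empty."""
        return self.entries[0][0] if self.entries else None

buf = RecordBuffer(limit_bytes=10)
buf.record(1, b"aaaa")
buf.record(2, b"bbbb")
buf.record(3, b"cccc")  # pushes usage to 12 bytes, so the t=1 record is evicted
print(buf.earliest(), buf.used)
```

The cost of the cap is visible in `earliest()`: once old records are dropped, execution history before that timestamp can no longer be reached.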

SLIDE 28

Questions?

SLIDE 29

Early Experience Program

  • Participate in the program to help define new products

– High water mark memory debugging
– GPGPU debugging
– Integration of performance and debugging tools
– Reverse Debugging graphical representation of time
– TracePoints

  • Broader set of users who debug using print style debugging
  • Eliminates frustrations

– Manual instrumentation
– Working with huge text files

  • Very Early Access
  • Input on Use Cases, Features, Designs, GUI

SLIDE 30

For More Information

  • Early Experience Program or Product Information

– Contact chris.gottbrath@totalviewtech.com

  • Technical support

– support@totalviewtech.com