The Distributed Application Debugger
Michael Q. Jones & Matt B. Pedersen, University of Nevada, Las Vegas


SLIDE 1

Michael Q. Jones & Matt B. Pedersen, University of Nevada, Las Vegas

SLIDE 2

ž The Distributed Application Debugger is

a debugging tool for parallel programs

ž Targets the MPI platform ž Runs remotley even on private networks ž Has record and replay features. ž Integrates GDB

SLIDE 3

ž Results from survey of students learning

parallel programming concluded 3 things:

  • 1. Sequential errors are still frequent
  • 2. Message errors are time consuming
  • 3. Print statements are still used for debugging
slide-4
SLIDE 4

ž Survey results categorized according to

the domains of multilevel debugging

  • Sequential errors
  • Message errors
  • Protocol errors

ž In addition to

  • Data decomposition errors
  • Functional decomposition errors
slide-5
SLIDE 5
SLIDE 6

SLIDE 7

ž The Client

  • The GUI interacting with the programmer

ž The Call Center

  • A central messaging hub (running on the cluster) for

– Routing messages from the MPI program to The Client – Routing commands from The Client to the MPI program ž Bridges

  • A relay application for passing data between The

Client and The Call Center, when The Call Center is not directly accessible (cluster behind firewall)

ž The Runtime

  • A libraries with wrapper code for the MPI functions

(talks to The Call Center)
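
To make the Bridge's role concrete, here is a minimal sketch of a byte-level TCP relay in C. The listening port, the upstream host name, and the single-connection setup are illustrative assumptions; the actual Bridge's protocol and framing are not shown.

```c
/* Minimal sketch of a Bridge-style relay: accept one connection from
 * the Client side, connect onward to the next hop (another Bridge or
 * The Call Center), and shuttle bytes both ways. Host and port are
 * hypothetical placeholders. */
#include <arpa/inet.h>
#include <netdb.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

static int dial(const char *host, const char *port) {
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0) exit(1);
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) exit(1);
    freeaddrinfo(res);
    return fd;
}

int main(void) {
    /* Wait for the Client-side connection (port is illustrative). */
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = {0};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_ANY);
    a.sin_port = htons(9000);
    if (bind(srv, (struct sockaddr *)&a, sizeof a) < 0) exit(1);
    listen(srv, 1);
    int client = accept(srv, NULL, NULL);

    /* Connect onward to the next hop (hypothetical host). */
    int next = dial("cluster.example.edu", "9000");

    /* Relay bytes in both directions until either side closes. */
    char buf[4096];
    for (;;) {
        fd_set fds;
        FD_ZERO(&fds); FD_SET(client, &fds); FD_SET(next, &fds);
        int maxfd = client > next ? client : next;
        if (select(maxfd + 1, &fds, NULL, NULL, NULL) < 0) break;
        int from = FD_ISSET(client, &fds) ? client : next;
        int to = (from == client) ? next : client;
        ssize_t n = read(from, buf, sizeof buf);
        if (n <= 0 || write(to, buf, n) != n) break;
    }
    close(client); close(next); close(srv);
    return 0;
}
```
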

SLIDE 8

(Diagram: Home, Firewall, Login Server, Cluster. Logging in from Home directly to the Cluster is not possible.)

SLIDE 9

(Diagram: the same Home/Firewall/Login Server/Cluster path, with the components placed along it.)

• The Client runs at home
• Bridges run on the servers in between home and the cluster
• The Call Center runs on the cluster
• The MPI processes run on the cluster

SLIDE 10

SLIDE 11

ž The user provides a connection path and

credentials on all machines

SLIDE 12

ž The user provides a connection path and

credentials on all machines

ž The system initiates SSH connections to

each configured computer and launches a Bridge or The Call Center.

ž Each component then connects to each

  • ther via TCP.
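
A minimal sketch of that startup sequence, assuming a hypothetical dad-bridge executable on the remote machine and illustrative host/port values; the real system's launch commands and handshake are not shown.

```c
/* Sketch: launch a remote component over SSH, then connect to it via
 * TCP. "dad-bridge", the host name, and the port are hypothetical. */
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* Step 1: start a Bridge (or The Call Center) on the next machine
     * in the configured connection path. */
    FILE *ssh = popen("ssh login.example.edu dad-bridge --port 9000", "r");
    if (!ssh) { perror("ssh"); return 1; }
    sleep(1);  /* crude: give the remote component time to listen */

    /* Step 2: connect to the component we just launched. */
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("login.example.edu", "9000", &hints, &res) != 0) return 1;
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (connect(fd, res->ai_addr, res->ai_addrlen) < 0) { perror("connect"); return 1; }
    freeaddrinfo(res);

    printf("component launched and reachable via TCP\n");
    close(fd);
    pclose(ssh);
    return 0;
}
```
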
SLIDE 13

SLIDE 14

ž Include a special mpi.h header file ž MPI calls are caught by wrapper

functions

ž Upon start up, each node creates a

callback connection to The Call Center

ž Data passed to MPI functions is sent

back.
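
The tool does this through its special mpi.h header; the sketch below illustrates the same interception idea using the standard MPI profiling interface (PMPI) instead, with report_to_call_center() as a hypothetical stub standing in for the callback TCP connection.

```c
/* Sketch of MPI-call interception. The real Runtime intercepts via a
 * special mpi.h header; PMPI is used here only to illustrate the
 * wrapper mechanism. report_to_call_center() is a hypothetical stub. */
#include <mpi.h>
#include <stdio.h>

static void report_to_call_center(const char *msg) {
    fprintf(stderr, "[runtime] %s\n", msg);  /* stand-in for a TCP send */
}

/* On startup, each node opens its callback connection. */
int MPI_Init(int *argc, char ***argv) {
    int rc = PMPI_Init(argc, argv);
    report_to_call_center("node up: callback connection established");
    return rc;
}

/* Each intercepted call sends its parameters back before delegating
 * to the real implementation. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {
    report_to_call_center("MPI_Send: parameters captured");
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}
```
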

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

An MPI session can be run in 3 modes (a dispatch sketch follows the list):

• Play
  • Just run like regular MPI
• Record
  • Record all messages
• Replay
  • Use recorded messages to play back
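
A hypothetical sketch of how a Runtime wrapper could dispatch on the session mode; the mode variable, the log helpers, and all names are illustrative, not the tool's actual code.

```c
/* Sketch: one wrapper dispatching on the three session modes. */
#include <mpi.h>
#include <stdio.h>

typedef enum { DAD_PLAY, DAD_RECORD, DAD_REPLAY } dad_mode;
static dad_mode mode = DAD_PLAY;  /* chosen when the session starts */

static void log_call(const char *what) { fprintf(stderr, "log: %s\n", what); }
static int  replay_result(const char *fn) { (void)fn; return MPI_SUCCESS; }

int MPI_Barrier(MPI_Comm comm) {
    switch (mode) {
    case DAD_PLAY:                 /* just run like regular MPI */
        return PMPI_Barrier(comm);
    case DAD_RECORD: {             /* log, execute, log the result */
        log_call("MPI_Barrier");
        int rc = PMPI_Barrier(comm);
        log_call("MPI_Barrier result");
        return rc;
    }
    case DAD_REPLAY:               /* serve the recorded result */
        return replay_result("MPI_Barrier");
    }
    return MPI_ERR_OTHER;
}
```
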
SLIDE 19

ž The Runtime behaves like regular MPI

  • Nothing is saved to disk
  • Nothing is read from disk
  • Messages and parameters ARE sent back to The

Client

SLIDE 20

ž The Runtime

  • Saves messages and parameters to a log file
  • Executes the actual MPI call
  • Saves the result
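
A hypothetical sketch of what one record-mode log entry could look like; the field names and binary layout are illustrative assumptions.

```c
/* Sketch: record mode appends each call, its parameters, and its
 * result to a per-node log file. Layout is hypothetical. */
#include <stdio.h>

typedef struct {
    char fn[32];       /* e.g. "MPI_Recv"                      */
    int  source, tag;  /* parameters to validate against later */
    int  bytes;        /* size of the recorded payload         */
    int  rc;           /* result of the real MPI call          */
} dad_log_entry;

/* Append one entry and its message payload to the log. */
static void dad_log_append(FILE *log, const dad_log_entry *e,
                           const void *payload) {
    fwrite(e, sizeof *e, 1, log);
    fwrite(payload, 1, (size_t)e->bytes, log);
    fflush(log);  /* keep the log usable even if the run crashes */
}
```
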
SLIDE 21

ž The Runtime does not execute any real

MPI calls.

  • All data is supplied from log files.
  • No actual communication takes place
  • Guarantees the same run as when the log file

was recorded
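
The matching replay-mode sketch, mirroring the hypothetical log layout above: the wrapper reads the next entry from the log instead of communicating, which is what makes the run repeat exactly as recorded.

```c
/* Sketch: replay mode serves calls from the log file; no real MPI
 * communication happens. dad_log_entry mirrors the record sketch. */
#include <stdio.h>

typedef struct {
    char fn[32];
    int  source, tag, bytes, rc;
} dad_log_entry;

/* Read the next entry and its payload; return the recorded result. */
static int dad_log_next(FILE *log, dad_log_entry *e, void *buf) {
    if (fread(e, sizeof *e, 1, log) != 1) return -1;  /* end of log */
    if (fread(buf, 1, (size_t)e->bytes, log) != (size_t)e->bytes) return -1;
    return e->rc;  /* hand back the recorded result to the caller */
}
```
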

SLIDE 22

ž Mixed mode is special

  • Some processes execute real MPI calls
  • Some replay from log file

– Sometimes its necessary to execute MPI calls if communicating with someone who is executing real MPI calls; E.g. to avoid buffer overflow – Validation is done on real values and log file values
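
A hypothetical sketch of the validation idea in mixed mode: values produced by a real MPI call are compared against what the log recorded, so a divergent run can be flagged.

```c
/* Sketch: compare freshly received data with the logged data.
 * Names and the divergence policy are illustrative. */
#include <stdio.h>
#include <string.h>

static int dad_validate(int rank, const void *real_buf,
                        const void *logged_buf, size_t n) {
    if (memcmp(real_buf, logged_buf, n) != 0) {
        fprintf(stderr, "[node %d] run diverges from the recording\n", rank);
        return -1;  /* the replay no longer matches the log */
    }
    return 0;
}
```
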

SLIDE 23

ž The Runtime sends back 2 debugging

messages per MPI command

  • A PRE message indicating that an MPI command

is about to be executed

  • A POST message indicating that an MPI

command completed

ž Console messages are routed per node

to the appropriate window.
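
A hypothetical sketch of the two message kinds; the field names and layout are illustrative, not the tool's wire format.

```c
/* Sketch: the two debugging messages sent per MPI command. */
typedef enum { DAD_PRE, DAD_POST } dad_phase;

typedef struct {
    dad_phase phase;    /* PRE: about to execute; POST: completed */
    int       rank;     /* which node the message came from       */
    char      fn[32];   /* e.g. "MPI_Send"                        */
    int       status;   /* POST only: result of the MPI call      */
} dad_debug_msg;
```

Presumably, pairing each POST with its PRE is also what lets a blocked call be spotted: a PRE with no matching POST marks a command that never completed.
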

SLIDE 24

ž Debugging data gets displayed within

the Console, Messages, or MPI tabs

SLIDE 25

ž The Console Tab displays anything that

the user’s code wrote to stdout.

SLIDE 26

ž The Messages Tab

displays messages as they come

ž Matches Send/

Receive pairs between nodes.

ž Messages without a

corresponding Send

  • r Receive message

get highlighted in red.

SLIDE 27

ž The MPI tab displays all

MPI commands

  • in the order they were executed
  • along with their parameters.

ž Commands statuses

(success, fail, or blocked) are displayed with icons in the Status Column.

SLIDE 28

SLIDE 29

ž Buffer values can

be requested and inspected.

SLIDE 30

SLIDE 31

ž GDB can be

attached to any node and controlled with the GDB Control Panel.

SLIDE 32

SLIDE 33

SLIDE 34

ž The source code to The Distributed

Application Debugger can be found on GitHub at:

ž https://github.com/mjones112000/

DistributedApplicationDebugger

SLIDE 35