The Distributed Application Debugger
Michael Q. Jones & Matt B. Pedersen, University of Nevada, Las Vegas
  1. Michael Q. Jones & Matt B. Pedersen University of Nevada Las Vegas

  2. ž The Distributed Application Debugger is a debugging tool for parallel programs ž Targets the MPI platform ž Runs remotley even on private networks ž Has record and replay features. ž Integrates GDB

  3. ž Results from survey of students learning parallel programming concluded 3 things: • 1. Sequential errors are still frequent • 2. Message errors are time consuming • 3. Print statements are still used for debugging

  4. ž Survey results categorized according to the domains of multilevel debugging • Sequential errors • Message errors • Protocol errors ž In addition to • Data decomposition errors • Functional decomposition errors

  5. ž The Client • The GUI interacting with the programmer ž The Call Center • A central messaging hub (running on the cluster) for – Routing messages from the MPI program to The Client – Routing commands from The Client to the MPI program ž Bridges • A relay application for passing data between The Client and The Call Center, when The Call Center is not directly accessible (cluster behind firewall) ž The Runtime • A libraries with wrapper code for the MPI functions (talks to The Call Center)

  6. [Diagram: Home, Login Servers, Firewall, Cluster. Logging in from Home to the Cluster directly is not possible.]

  7. [Diagram: the same path, now with the debugger components in place.]
     • The Client runs at home
     • Bridges run on the servers in between home and the cluster
     • The Call Center runs on the cluster
     • The MPI processes run on the cluster

  8. ž The user provides a connection path and credentials on all machines

  9. ž The user provides a connection path and credentials on all machines ž The system initiates SSH connections to each configured computer and launches a Bridge or The Call Center. ž Each component then connects to each other via TCP.

  10. ž Include a special mpi.h header file ž MPI calls are caught by wrapper functions ž Upon start up, each node creates a callback connection to The Call Center ž Data passed to MPI functions is sent back.

  11. An MPI session can be run in 3 modes:
      • Play
      – Just run like regular MPI
      • Record
      – Record all messages
      • Replay
      – Use recorded messages to play back

  12. ž The Runtime behaves like regular MPI • Nothing is saved to disk • Nothing is read from disk • Messages and parameters ARE sent back to The Client

  13. ž The Runtime • Saves messages and parameters to a log file • Executes the actual MPI call • Saves the result

  14. ž The Runtime does not execute any real MPI calls. • All data is supplied from log files. • No actual communication takes place • Guarantees the same run as when the log file was recorded

  15. ž Mixed mode is special • Some processes execute real MPI calls • Some replay from log file – Sometimes its necessary to execute MPI calls if communicating with someone who is executing real MPI calls; E.g. to avoid buffer overflow – Validation is done on real values and log file values

  16. ž The Runtime sends back 2 debugging messages per MPI command • A PRE message indicating that an MPI command is about to be executed • A POST message indicating that an MPI command completed ž Console messages are routed per node to the appropriate window.

  17. ž Debugging data gets displayed within the Console, Messages, or MPI tabs

  18. ž The Console Tab displays anything that the user’s code wrote to stdout .

  19. ž The Messages Tab displays messages as they come ž Matches Send/ Receive pairs between nodes. ž Messages without a corresponding Send or Receive message get highlighted in red.

  20. ž The MPI tab displays all MPI commands • in the order they were executed • along with their parameters. ž Commands statuses (success, fail, or blocked) are displayed with icons in the Status Column.

  21. ž Buffer values can be requested and inspected.

  22. ž GDB can be attached to any node and controlled with the GDB Control Panel.

  23. ž The source code to The Distributed Application Debugger can be found on GitHub at: ž https://github.com/mjones112000/ DistributedApplicationDebugger
