The State of CBTF: CScADS 2013 - Petascale Tools Workshop, July 15, 2013

SLIDE 1

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

UNCLASSIFIED

The State of CBTF

CScADS 2013 - Petascale Tools Workshop

July 15, 2013

J. Green, HPC-3 LANL
On behalf of the Open|Speedshop Engineering Team

LA-UR-13-25207

SLIDE 2

| Los Alamos National Laboratory |

Component Based Tool Framework

  • Brief Overview of CBTF
  • Project Status
  • Discuss Open|Speedshop Implemented with CBTF Framework
  • Going Public
  • Site Specific Tools and Tests

Overview O|SS Over CBTF New Components Going Public Site Specific Future Works Conclusion and Thanks!

SLIDE 3

Component Based Tool Framework

  • Framework tailored to rapid, scalable cluster tool development with Reusable Components
  • C++ / XML Code
  • Dataflow Programming Model
  • MRNet (Multicast Reduction Network) communication transport layer
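
The dataflow programming model named above can be illustrated with a minimal sketch. This is not the CBTF API (CBTF components are written in C++ and wired together with XML); the `Component` class, its method names, and the three-stage pipeline are hypothetical and only show the general idea of components that react to values on named inputs and forward results on named outputs.

```python
from typing import Any, Callable, Dict, List, Tuple


class Component:
    """Hypothetical dataflow node: reacts to values arriving on named inputs."""

    def __init__(self, name: str) -> None:
        self.name = name
        # input name -> handler invoked when a value arrives on that input
        self._handlers: Dict[str, Callable[[Any], None]] = {}
        # output name -> list of (downstream component, downstream input name)
        self._links: Dict[str, List[Tuple["Component", str]]] = {}

    def on_input(self, input_name: str, handler: Callable[[Any], None]) -> None:
        self._handlers[input_name] = handler

    def connect(self, output_name: str, target: "Component", input_name: str) -> None:
        self._links.setdefault(output_name, []).append((target, input_name))

    def emit(self, output_name: str, value: Any) -> None:
        # Forward the value to every input connected to this output.
        for target, input_name in self._links.get(output_name, []):
            target.receive(input_name, value)

    def receive(self, input_name: str, value: Any) -> None:
        self._handlers[input_name](value)


def build_pipeline(results: List[int]) -> Component:
    """Wire source -> doubler -> sink; the sink appends to `results`."""
    source = Component("source")
    doubler = Component("doubler")
    sink = Component("sink")

    source.on_input("in", lambda v: source.emit("out", v))
    doubler.on_input("in", lambda v: doubler.emit("out", v * 2))
    sink.on_input("in", results.append)

    source.connect("out", doubler, "in")
    doubler.connect("out", sink, "in")
    return source


results: List[int] = []
pipeline = build_pipeline(results)
for value in (1, 2, 3):
    pipeline.receive("in", value)
print(results)  # [2, 4, 6]
```

Because each component only knows its own inputs and outputs, the same component can be reused in a different graph by changing the wiring rather than the code.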

SLIDE 4

Open|Speedshop Built on Component Based Tool Framework

  • Supports Same Features, Increased Scalability while Maintaining Ease of Use
  • New O|SS Experiments Under Development:
      – Memory Experiment
      – Threading Experiment
      – I/O Profiling Experiment
      – GPU Experiment

SLIDE 5

Open|Speedshop Built on Component Based Tool Framework

  • Production Ready Open|Speedshop Using CBTF Framework Slated for Fall 2013
  • “Friendly-testing” Versions Available on LANL Production Clusters
  • All Current O|SS Collectors Work with the CBTF Version

SLIDE 6

CBTF Memory Analysis Collector

  • Memory Analysis
      – Memory Consumption Information
      – Map Memory Allocations Back to Source Code
      – Top Ten Malloc(s) and New(s)
      – Top Ten Malloc(s) and New(s) Not Freed
      – Allocation Lifetimes and Sizes
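
As a rough illustration of the "Top Ten ... Not Freed" report, the sketch below ranks call sites by bytes still allocated at the end of a trace. The event tuple layout, the `top_unfreed` function, and the sample call sites are all assumptions made for this example; the slides do not describe the collector's actual data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event format: (kind, address, size_in_bytes, call_site).
# kind is "alloc" or "free"; size and call_site are ignored for "free".
Event = Tuple[str, int, int, str]


def top_unfreed(events: Iterable[Event], n: int = 10) -> List[Tuple[str, int]]:
    """Return up to n call sites ranked by bytes allocated and never freed."""
    live: Dict[int, Tuple[int, str]] = {}
    for kind, address, size, site in events:
        if kind == "alloc":
            live[address] = (size, site)
        elif kind == "free":
            live.pop(address, None)  # tolerate frees of unknown addresses

    leaked: Dict[str, int] = defaultdict(int)
    for size, site in live.values():
        leaked[site] += size

    # Largest first; ties broken by call-site name for stable output.
    return sorted(leaked.items(), key=lambda item: (-item[1], item[0]))[:n]


trace = [
    ("alloc", 0x1000, 256, "solver.cpp:42"),
    ("alloc", 0x2000, 64, "mesh.cpp:17"),
    ("free", 0x1000, 0, ""),
    ("alloc", 0x3000, 512, "solver.cpp:42"),
    ("alloc", 0x4000, 128, "mesh.cpp:17"),
]
print(top_unfreed(trace))  # [('solver.cpp:42', 512), ('mesh.cpp:17', 192)]
```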

SLIDE 7

CBTF Threading Analysis Collector

  • Statistics on Pthread Wait
  • OpenMP (OMP) Blocking Times
  • Relate Performance to Threads
  • Alias to Shorten POSIX Thread IDs for Improved Readability
  • Synchronization Overhead Mapping to Threads
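
One way thread-ID aliasing could work is to assign short labels in order of first appearance. The sketch below assumes that scheme; the `ThreadAliaser` class, the `T<n>` label format, and the sample IDs are hypothetical and not taken from the collector itself.

```python
from typing import Dict


class ThreadAliaser:
    """Map long POSIX thread IDs to short labels in first-seen order."""

    def __init__(self) -> None:
        self._aliases: Dict[int, str] = {}

    def alias(self, pthread_id: int) -> str:
        # Assign the next label the first time an ID is seen, then reuse it.
        if pthread_id not in self._aliases:
            self._aliases[pthread_id] = f"T{len(self._aliases)}"
        return self._aliases[pthread_id]


aliaser = ThreadAliaser()
raw_ids = [140237412341504, 140237403948800, 140237412341504]
print([aliaser.alias(t) for t in raw_ids])  # ['T0', 'T1', 'T0']
```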

SLIDE 8

Other New CBTF O|SS Collectors

  • Lightweight Tracing of I/O Functions
      – Capability to Efficiently Profile I/O Time Spent in Applications
  • CUDA/GPU Collector
      – Support for Performance Analysis of Applications Built with CUDA / OpenCL for NVIDIA GPUs
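
The idea behind lightweight I/O tracing, accumulating call counts and elapsed time per I/O function instead of recording every event in detail, can be sketched with a timing wrapper. This is only a Python analogy: the slides do not say how the collector intercepts I/O calls, and `traced_io`, `io_stats`, and `write_block` are names invented for the example.

```python
import functools
import time
from collections import defaultdict

# Per-function totals: number of calls and accumulated wall-clock seconds.
io_stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def traced_io(func):
    """Wrap an I/O function so each call adds to its running totals."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if the wrapped function raises.
            entry = io_stats[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start

    return wrapper


@traced_io
def write_block(buffer: bytearray, data: bytes) -> int:
    """Stand-in for a real write call: append to an in-memory buffer."""
    buffer.extend(data)
    return len(data)


buf = bytearray()
for _ in range(3):
    write_block(buf, b"abcd")
print(io_stats["write_block"]["calls"])  # 3
```

Keeping only running totals bounds the memory cost per traced function, which is the usual trade-off between profiling and full event tracing.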

SLIDE 9

Public Repository

  • CBTF Source Code to Be Moved to a Publicly Accessible SourceForge Repository
  • Documentation and Tutorials Available at the New Site, Demonstrating Tool Development Techniques

SLIDE 10

Tools Created at Los Alamos Nat’l Lab

  • Tool Implementations Using CBTF
  • Tools Will Be Available in /contrib Directory
  • Proof of Concept that CBTF Enables Rapid Scalable Tool Development
  • CBTF Tools Scale

SLIDE 11

GPU Monitoring with CBTF

  • Six Tools Developed
      – checkGpuMemory
      – checkConfigs
      – checkPctUsage
      – checkPstate
      – checkPstateOnly
      – checkAll
  • NVIDIA Management Library
  • Works with MRNet Trees of Depth 3 or More

SLIDE 12

PSTool Scaling Study - Success!

  • PSTool performs `ps` command on all nodes
      – Reports common processes
      – Reports nodes running “rogue” processes
  • 1550 PEs returned in under twenty seconds
      – LANL’s Mustang
      – Correctly identified:
          – “rogue” ping process manually injected on node
          – slurmd and munge processes on head node and node targeted to run `ping`
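
The common/rogue split reported by PSTool can be approximated as a set computation over per-node process lists. The sketch below assumes that "common" means "running on every node" and "rogue" means "anything else"; the slides do not spell out PSTool's exact rule, and `classify_processes`, the node names, and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify_processes(
    per_node: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split process names into those on every node and per-node extras."""
    if not per_node:
        return set(), {}
    # Processes present on all nodes are treated as common.
    common = set.intersection(*per_node.values())
    # Anything a node runs beyond the common set is flagged for that node.
    rogue = {
        node: procs - common
        for node, procs in per_node.items()
        if procs - common
    }
    return common, rogue


sample = {
    "node01": {"init", "sshd"},
    "node02": {"init", "sshd", "ping"},
    "node03": {"init", "sshd"},
}
common, rogue = classify_processes(sample)
print(sorted(common))  # ['init', 'sshd']
print(rogue)           # {'node02': {'ping'}}
```

In a tree-based transport, the intersection can be computed pairwise at each level, so intermediate nodes pass up only their partial result rather than every raw process list.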

SLIDE 13

Future Works

  • CBTF Components Support Python
  • New Qt4 Based Framework
      – O|SS GUI Views Under Development
  • Improving Documentation for System Administrators, Tool Developers and End Users
  • Goal: Production Ready O|SS/CBTF by SC’13

SLIDE 14

Thank you!

To our audience, sponsors and affiliates.
