Slide 1
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA
UNCLASSIFIED - LA-UR-16-23542
What is an HPC Workflow? Application View, Runtime
HPC Workflow Performance. Karen L. Karavanic, New Mexico Consortium & Portland State University; David Montoya (LANL). August 2, 2016.
Slide 2
*E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, and K. Wenger, “Pegasus: a Workflow Management System for Science Automation,” Future Generation Computer Systems, vol. 46, pp. 17-35,
Slide 3
*J. Wang, D. Crawl, and I. Altintas, “A Framework for Distributed Data-Parallel Execution in the Kepler Scientific Workflow System,” in 1st International Workshop on Advances in the Kepler Scientific Workflow System and Its Applications at ICCS 2012 Conference, 2012.
Slide 4
Slide 5
UNCLASSIFIED - LA-UR-16-20222
Layer 0 – Campaign
… project is completed. Working through phases.
Layer 1 – Job Run
… provide an end-to-end repeatable process with differing input parameters … question.
Layer 2 – Application
Interacts across the memory hierarchy to archival targets.
… aspects of the physics.
Layer 3 – Package
… various levels of memory, cache levels and the overall underlying platform.
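The four layers above nest naturally. As a minimal sketch, assuming each layer contains the one below it (a campaign made of job runs, a job run executing one or more applications, an application composed of physics packages), the hierarchy could be modeled in Python as follows. All class, field and example names are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """Layer 3: a package whose kernels interact with memory, caches and the platform."""
    name: str


@dataclass
class Application:
    """Layer 2: an application built from packages covering aspects of the physics."""
    name: str
    packages: List[Package] = field(default_factory=list)


@dataclass
class JobRun:
    """Layer 1: one end-to-end, repeatable run with a specific set of input parameters."""
    run_id: str
    inputs: dict = field(default_factory=dict)
    applications: List[Application] = field(default_factory=list)


@dataclass
class Campaign:
    """Layer 0: a series of job runs, worked through in phases until the project is completed."""
    name: str
    runs: List[JobRun] = field(default_factory=list)

    def package_names(self) -> set:
        # Walk down every layer and collect the distinct packages used in the campaign.
        return {
            pkg.name
            for run in self.runs
            for app in run.applications
            for pkg in app.packages
        }


# Hypothetical example: two runs of the same application with different inputs.
hydro = Application("sim", [Package("hydro"), Package("eos")])
campaign = Campaign("study", [
    JobRun("run-1", {"resolution": 128}, [hydro]),
    JobRun("run-2", {"resolution": 256}, [hydro]),
])
print(sorted(campaign.package_names()))  # ['eos', 'hydro']
```

Keeping each layer as its own type means measurements can later be attached at the layer where they are collected and summarized for the layer above.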
Slide 6
We described a layer above the application layer (Layer 2) that poses use cases employing the application in potentially different ways. This layer also admits environment-based entities that affect a given workflow, and it captures the impact of scale and processing decisions. At this level we can describe time, volume and speed requirements.
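Time, volume and speed requirements are linked: if a job run must move a given data volume (for example, a checkpoint) within a fixed time window, the sustained rate it needs is the volume divided by the time. The sketch below assumes decimal units (1 TB = 1000 GB) and uses made-up numbers; the function name and figures are illustrative only.

```python
def required_bandwidth_gbs(volume_tb: float, window_s: float) -> float:
    """Return the sustained rate in GB/s needed to move volume_tb terabytes in window_s seconds.

    Assumes decimal units: 1 TB = 1000 GB.
    """
    if window_s <= 0:
        raise ValueError("time window must be positive")
    return volume_tb * 1000.0 / window_s


# Hypothetical requirement: write a 100 TB checkpoint in at most 10 minutes.
rate = required_bandwidth_gbs(100.0, 600.0)
print(f"{rate:.1f} GB/s")  # 166.7 GB/s
```

Rearranging the same relation (time = volume / rate) gives the shortest window that a given sustained rate allows.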
Slide 7
Slide 8
Slide 9
Slide 10
Slide 11
Slide 12
Slide 13
Slide 14
Collection approaches
Pull data from databases summarized for historic runs.
What is collected from each run: job-level information. App and system data are integrated and tracked. Feeds up.
During the app run, mainly from within the app: data and phases, integrated with system data for environmental …
During the app run, mainly from within the app: more intrusive collection. Performance, algorithm, architecture, compiler impact, etc. Feeds up.
Requirements across time for jobs: scale, checkpoint, data read/written, data needs.
Requirements for the job run: data movement, checkpoint and local needs, data analysis process, data management. Multiple-job tracking, resource integration into the system.
Memory use, burst buffer (BB) utilization, differences between packages in the app, time-step transitions, analysis/preparation of data for analysis, I/O.
Detailed measurements are traditionally done through instrumentation and tools such as TAU, HPCToolkit, Open|SpeedShop, Cray Apprentice, etc. Focus on …
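The "feeds up" notes describe measurements taken at one layer being summarized for the layer above. A minimal sketch, assuming each job run produces a flat record of a few metrics (the field names `campaign`, `runtime_s` and `bytes_written` are hypothetical), rolls job-level records up into per-campaign totals of the kind a database of historic runs might hold.

```python
from collections import defaultdict
from typing import Dict, Iterable


def roll_up(job_records: Iterable[dict]) -> Dict[str, dict]:
    """Summarize job-level records into one entry per campaign (hypothetical schema)."""
    summary: Dict[str, dict] = defaultdict(
        lambda: {"runs": 0, "runtime_s": 0.0, "bytes_written": 0}
    )
    for rec in job_records:
        entry = summary[rec["campaign"]]
        entry["runs"] += 1
        entry["runtime_s"] += rec["runtime_s"]
        entry["bytes_written"] += rec["bytes_written"]
    return dict(summary)


# Hypothetical job-level measurements from three runs.
records = [
    {"campaign": "A", "runtime_s": 3600.0, "bytes_written": 2_000_000},
    {"campaign": "A", "runtime_s": 1800.0, "bytes_written": 1_000_000},
    {"campaign": "B", "runtime_s": 900.0, "bytes_written": 500_000},
]
print(roll_up(records)["A"])
# {'runs': 2, 'runtime_s': 5400.0, 'bytes_written': 3000000}
```

Keeping the roll-up as a separate step lets the same job records serve both the job-run view and the campaign view.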
Slide 15
Slide 16
*Sanchez et al., “Design and Implementation of a Scalable HPC Monitoring System,” HPCMASPA 2016.
Slide 17
Slide 18
Slide 19
Slide 20
Slide 21
Slide 22
Unless otherwise indicated, this work was conducted at the Ultrascale Systems Research Center (USRC) supported by Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396 with the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this information. This work was supported in part by Portland State University and by the New Mexico Consortium. It took a whole village to do the work mentioned and described here.