Many-Task Applications in the Integrated Plasma Simulator
Samantha S. Foley, Wael R. Elwasif, David E. Bernholdt, Aniruddha G. Shet Oak Ridge National Laboratory Randall Bramley Indiana University
Motivation
! Computational science is moving from single SPMD codes to loosely coupled MPMD applications
! MPMD viewed through a many-task computing (MTC) paradigm:
! Some degree of data and task coupling
! Varying parallelism and runtime between tasks
! Modest number of tasks, executed in a time-stepped style
! Mismatches in runtime and parallelism, plus the presence of dependencies, lead to poor load balancing
MTAGS - SC10
! The Integrated Plasma Simulator (IPS) is a component framework for fusion energy simulation, built for the Center for Simulation of RF Wave Interactions with Magnetohydrodynamics (SWIM)
! SWIM is one of three US DOE SciDAC 2 projects exploring integrated fusion simulation
! Primary directive: “Explore the targeted coupled physics interactions while constituent codes evolve independently, minimizing impact …”
! Code refactoring and/or rewriting ruled out
! Existing physics codes
! Little prior experience with coupling in the fusion community
! Loose coupling and modest data communication
! Target platforms are leadership-class facilities (Cray)
[Figure: IPS architecture. Physics applications are wrapped by component adapters and connected through state adapters to the shared Plasma State; the framework provides common services.]
[Figure: IPS execution model. The framework instance runs on the head node of a batch allocation; its resource manager and task manager launch the tasks (parallel physics codes) of a simulation's components (Comp 1-3 plus a Driver).]
These levels of parallelism can be used to improve resource utilization efficiency.
[Figure: multiple simulations (A and B), each with its own components and driver, sharing a single framework instance; the resource manager and task manager schedule a queue of tasks within one batch allocation.]
! We created RUS to examine the resource utilization and efficiency of IPS simulations
! Accurately simulates task and resource management in the IPS
! Random variation of task execution times
! RUS provides the ability to examine how the multiple levels of parallelism and the characteristics of the tasks interact
! Focus on the multiple-simulations capability
! Ultimately, this tool will be used to inform how IPS simulations should be configured for resource efficiency
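To make the mechanism concrete, here is a minimal sketch of a simulator in the spirit of RUS (not the actual RUS code): it greedily interleaves time-stepped simulations within one allocation and draws task runtimes from a normal distribution. The function name, task tuples, and greedy scheduling policy are illustrative assumptions.

```python
import heapq
import random

def simulate(num_sims, alloc_cores, tasks, steps=8, seed=0):
    """Estimate resource utilization efficiency for `num_sims` interleaved,
    time-stepped simulations sharing one allocation of `alloc_cores` cores.

    tasks: list of (cores, mean_secs, stdev_secs); within a simulation the
    tasks of each step run sequentially, but different simulations may
    overlap. Assumes every task fits in the allocation.
    """
    rng = random.Random(seed)
    # Flatten each simulation into its sequential list of (cores, runtime).
    queues = [[(c, max(1.0, rng.gauss(m, s)))
               for _ in range(steps) for (c, m, s) in tasks]
              for _ in range(num_sims)]
    busy = [False] * num_sims       # at most one task in flight per simulation
    free, now, used = alloc_cores, 0.0, 0.0
    running = []                    # min-heap of (finish_time, cores, sim_id)
    while any(queues) or running:
        # Launch each idle simulation's next task if it fits right now.
        for i, q in enumerate(queues):
            if q and not busy[i] and q[0][0] <= free:
                cores, rt = q.pop(0)
                free -= cores
                used += cores * rt
                busy[i] = True
                heapq.heappush(running, (now + rt, cores, i))
        # Advance time to the next task completion and reclaim its cores.
        now, cores, i = heapq.heappop(running)
        free += cores
        busy[i] = False
    return used / (alloc_cores * now)

# TNT scenario from the slides: TSC 1p, TORIC 4p, NUBEAM 16p.
tnt = [(1, 130, 40), (4, 97, 2), (16, 115, 15)]
for n in (1, 2, 4):
    print(n, "sims:", round(simulate(n, 16, tnt), 2))
```

With one simulation the tasks serialize and most of the 16 cores sit idle during TSC and TORIC; adding interleaved simulations fills those gaps, which is the effect the RUS studies quantify.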
! TNT Scenario
! TORIC: 4 processes, 97 ± 2 seconds
! NUBEAM: 16 processes, 115 ± 15 seconds
! TSC: 1 process, 130 ± 40 seconds
! ANT Scenario
! AORSA: 1024 processes, 1020 ± 5 seconds
! NUBEAM: 512 processes, 1020 ± 300 seconds
! TSC: 1 process, 130 ± 40 seconds
[Figure: example schedules (cores vs. time) for the TNT and ANT scenarios]
! Single simulation: 43% resource efficiency, 8 steps completed
! Two simulations: 64% resource efficiency, 12 total steps
! Four simulations: 86% resource efficiency, 16 total steps
! More physics can be done in the same time and on the same resources using the MTC capability
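The single-simulation figure can be checked back-of-the-envelope from the mean TNT task parameters: with the three tasks running one after another on a 16-core allocation (sized for NUBEAM, the widest task), only a fraction of the available core-seconds do useful work. A small sketch, assuming mean runtimes and strictly sequential steps:

```python
# Mean (cores, seconds) per TNT task, from the scenario slide.
tasks = {"TORIC": (4, 97), "NUBEAM": (16, 115), "TSC": (1, 130)}

alloc_cores = 16  # allocation must fit the widest task (NUBEAM)
step_secs = sum(t for _, t in tasks.values())       # sequential step length
useful = sum(c * t for c, t in tasks.values())      # core-seconds of real work

efficiency = useful / (alloc_cores * step_secs)
print(f"{efficiency:.0%}")  # prints 43%, matching the single-simulation case
```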
[Figure: resource efficiency vs. allocation size, TNT scenario (TSC 1p, TORIC 4p, NUBEAM 16p); best point shown: 16 cores, 4 simulations, 86% efficiency]
[Figure: resource efficiency vs. allocation size, ANT scenario (TSC 1p, AORSA 1024p, NUBEAM 512p)]
! >90% efficiency achievable for all multi-simulation cases
! Peak efficiencies occur at allocation sizes matching the cores needed to run each task concurrently
! E.g., 1540 cores allows 1 instance of each ANT task to run concurrently
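The 1540-core figure is consistent with summing the ANT task widths (1024 + 512 + 1 = 1537) and rounding up to whole nodes. The helper below is an illustrative sketch, not a documented IPS utility, and it assumes 4-core nodes (plausible for Cray systems of that era):

```python
import math

def alloc_for(instances, task_cores, cores_per_node=4):
    """Smallest whole-node allocation letting `instances` copies of every
    task run concurrently. cores_per_node=4 is an assumed node width."""
    need = instances * sum(task_cores)
    return math.ceil(need / cores_per_node) * cores_per_node

ant = [1024, 512, 1]        # AORSA, NUBEAM, TSC
print(alloc_for(1, ant))    # prints 1540
print(alloc_for(2, ant))    # prints 3076
```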
! Using RUS, we examine the resource utilization efficiency of variations in SWIM workloads
! What happens to resource utilization when multiple instances of the same simulation execute concurrently?
! What happens to resource utilization when the time or parallelism of the tasks is varied?
! We performed four studies on the two scenarios
! The following graphs show the highest peak for a given number of simulations versus the experiment variation (time or parallelism)
[Figure: peak efficiency results for the TNT (TSC 1p, TORIC 4p, NUBEAM 16p) and ANT (TSC 1p, AORSA 1024p, NUBEAM 512p) scenarios]
[Figure: additional efficiency results for the TNT and ANT scenarios]
Weak scaling = increase work, increase parallelism, same runtime
[Figure: weak-scaling study results for the TNT and ANT scenarios]
Strong scaling = same work, increase parallelism, decrease runtime
[Figure: strong-scaling study results for the TNT and ANT scenarios]
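The two scaling modes above can be expressed as simple transformations of a task's (cores, runtime) parameters. The helpers below assume ideal (perfect) scaling, which real codes such as NUBEAM or AORSA will not achieve exactly; the function names and tuples are illustrative.

```python
def weak_scale(tasks, factor):
    """Ideal weak scaling: factor-times the work on factor-times the cores,
    so per-task runtime stays the same."""
    return [(cores * factor, secs) for cores, secs in tasks]

def strong_scale(tasks, factor):
    """Ideal strong scaling: same work on factor-times the cores,
    so per-task runtime shrinks by the same factor."""
    return [(cores * factor, secs / factor) for cores, secs in tasks]

tnt = [(1, 130), (4, 97), (16, 115)]   # TSC, TORIC, NUBEAM (mean values)
print(weak_scale(tnt, 2))    # [(2, 130), (8, 97), (32, 115)]
print(strong_scale(tnt, 2))  # [(2, 65.0), (8, 48.5), (32, 57.5)]
```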
! Even small numbers of interleaved simulations (3 or 4) are sufficient for significant resource-efficiency improvements
! Local maxima at larger allocation sizes tend to be lower than the first or second peak
! Matching tasks in parallelism matters more than matching them in time for improving resource efficiency
! Validate and improve model using data from IPS runs
! Study impact of concurrent task execution in a single simulation