SLIDE 1

Many-Task Applications in the Integrated Plasma Simulator

Samantha S. Foley, Wael R. Elwasif, David E. Bernholdt, Aniruddha G. Shet
Oak Ridge National Laboratory

Randall Bramley
Indiana University

SLIDE 2

Motivation

  • Computational science is moving from single SPMD codes to loosely coupled MPMD applications
  • MPMD viewed through a many-task computing (MTC) paradigm:
      • Some degree of data and task coupling
      • Varying parallelism and runtime between tasks
      • Modest number of tasks, executed in a time-stepped style
  • Mismatches in runtime and parallelism, together with dependencies between tasks, lead to poor load balancing

Nov. 15, 2010, MTAGS - SC10

SLIDE 3

The Integrated Plasma Simulator (IPS)

  • The Integrated Plasma Simulator (IPS) is a component framework for fusion energy simulation for the Center for Simulation of RF Wave Interactions with Magnetohydrodynamics (SWIM)
  • One of three US DOE SciDAC 2 projects to explore integrated fusion simulation
  • Primary directive: “Explore the targeted coupled physics interactions while constituent codes evolve independently, minimizing impact on long lived codes and other research/production use”.
  • Code re-factoring and/or rewriting ruled out.

SLIDE 4

IPS Landscape

  • Existing physics codes
  • Little prior experience with coupling in the fusion community
  • Loose coupling and modest data communication
  • Target platforms are leadership class facilities (Cray)


[Figure: IPS component diagram. Labels: Framework, Framework Services, Component Adapter, State Adapter, Physics App., Plasma State]

Solution: evolutionary development of a lightweight Python framework that leaves the underlying codes unchanged, provides a flexible execution environment, and supports loosely coupled simulation composition with file-based data coupling
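
To make the idea of file-based coupling around an unchanged code concrete, here is a minimal sketch. It is not the IPS API: the class name `FileCoupledAdapter`, its `step` method, and the single-file "plasma state" are hypothetical, chosen only to illustrate staging a state file, running an external executable on it, and publishing the result.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


class FileCoupledAdapter:
    """Hypothetical adapter (not the IPS API): wraps an unchanged executable."""

    def __init__(self, command, shared_state):
        self.command = list(command)            # argv of the wrapped code, left untouched
        self.shared_state = Path(shared_state)  # stand-in for a shared plasma state file

    def step(self):
        with tempfile.TemporaryDirectory() as workdir:
            local = Path(workdir) / self.shared_state.name
            shutil.copy(self.shared_state, local)                    # stage the input state
            subprocess.run(self.command + [str(local)], check=True)  # run the code as-is
            shutil.copy(local, self.shared_state)                    # publish the updated state


# Stand-in "physics code": a Python one-liner that appends a line to the file it is given.
fake_code = [sys.executable, "-c",
             "import sys; open(sys.argv[1], 'a').write('stepped\\n')"]

with tempfile.TemporaryDirectory() as d:
    state = Path(d) / "plasma_state.txt"
    state.write_text("t=0\n")
    FileCoupledAdapter(fake_code, state).step()
    print(state.read_text())
```

The wrapped command only ever sees a file path, so in this sketch the coupling logic lives entirely in the adapter and the code itself needs no changes.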

SLIDE 5

IPS: Architecture

[Figure: IPS architecture. Labels: Resource Manager, Task Manager, Tasks (Parallel Physics Codes)]

SLIDE 6

IPS: Levels of Parallelism

  • 1. Tasks are parallel codes
  • 2. Tasks of a single component can run concurrently
  • 3. Tasks of multiple components can run concurrently
  • 4. Multiple simulations can run concurrently within the same batch allocation and framework instance

These levels of parallelism can be used to improve resource utilization efficiency.

[Figure labels: Batch Allocation, Head Node, IPS Framework; Simulation A and Simulation B, each with a Driver and Comp 1, Comp 2, Comp 3]

SLIDE 7

RM & TM in the IPS

[Figure labels: Framework, RM, TM, Queue of Tasks, Batch Allocation; Simulation A with a Driver and Comp 1, Comp 2, Comp 3]

SLIDE 8

Resource Usage Simulator (RUS)

  • We created RUS to examine resource utilization and efficiency of IPS simulations
      • Accurately simulates task and resource management in the IPS
      • Random variation of task execution times
  • RUS provides the ability to examine how the multiple levels of parallelism and characteristics of the tasks interact
      • Focus on multiple simulations capability
  • Ultimately, this tool will be used to inform how IPS simulations can be configured with respect to resource efficiency
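
The slides do not say how RUS draws its random task times. As a minimal sketch, one could read a specification such as "130 ± 40 seconds" as a uniform range around the mean. The uniform distribution and the helper name `sample_runtime` are assumptions made for illustration only.

```python
import random


def sample_runtime(mean, spread, rng):
    """Draw one runtime from [mean - spread, mean + spread].

    The uniform distribution is an assumption; the slides do not state one.
    """
    return rng.uniform(mean - spread, mean + spread)


rng = random.Random(0)  # fixed seed so repeated runs are comparable
samples = [sample_runtime(130.0, 40.0, rng) for _ in range(5)]
print(samples)
```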

SLIDE 9

SWIM Scenarios

  • TNT Scenario
      • TORIC: 4 processes, 97 ± 2 seconds
      • NUBEAM: 16 processes, 115 ± 15 seconds
      • TSC: 1 process, 130 ± 40 seconds
  • ANT Scenario
      • AORSA: 1024 processes, 1020 ± 5 seconds
      • NUBEAM: 512 processes, 1020 ± 300 seconds
      • TSC: 1 process, 130 ± 40 seconds

[Figure: task diagrams for the TNT (T, N, T) and ANT (T, A, N) scenarios; axes: Time, Cores]

SLIDE 10

Multiple Simulation Task Interleaving

  • Single simulation
      • 43% resource efficiency
      • 8 steps completed
  • Two simulations
      • 64% resource efficiency
      • 12 total steps
  • Four simulations
      • 86% resource efficiency
      • 16 total steps
  • More physics can be done in the same time and same resources using MTC capability
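
The effect of interleaving can be reproduced with a small event-driven sketch. This is not RUS itself, and the slides do not describe its scheduling, so the sketch rests on stated assumptions: each simulation runs its tasks strictly in order (TORIC, NUBEAM, TSC), waiting tasks are launched first-fit in FIFO order, runtimes are fixed at their TNT means with no random variation, the allocation is 16 cores, and efficiency is busy core-seconds divided by (allocated cores × total time). The names `Task` and `simulate` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cores: int
    seconds: float


# Mean values from the TNT scenario; random variation is left out on purpose.
TNT = (Task("TORIC", 4, 97.0), Task("NUBEAM", 16, 115.0), Task("TSC", 1, 130.0))


def simulate(tasks, n_sims, total_cores, steps_per_sim):
    """Return (total_time, efficiency) for n_sims interleaved simulations."""
    free = total_cores
    clock = 0.0
    busy_core_seconds = 0.0
    done = [0] * n_sims                # tasks completed so far, per simulation
    tasks_per_sim = steps_per_sim * len(tasks)
    waiting = list(range(n_sims))      # FIFO of simulations whose next task is ready
    running = []                       # heap of (finish_time, sim_id, cores)

    while waiting or running:
        # Launch every waiting task that fits, in FIFO order (first-fit).
        still_waiting = []
        for sim in waiting:
            task = tasks[done[sim] % len(tasks)]
            if task.cores <= free:
                free -= task.cores
                busy_core_seconds += task.cores * task.seconds
                heapq.heappush(running, (clock + task.seconds, sim, task.cores))
            else:
                still_waiting.append(sim)
        waiting = still_waiting
        if not running:
            raise ValueError("a task needs more cores than the allocation provides")

        # Advance time to the next task completion and release its cores.
        clock, sim, cores = heapq.heappop(running)
        free += cores
        done[sim] += 1
        if done[sim] < tasks_per_sim:
            waiting.append(sim)

    return clock, busy_core_seconds / (total_cores * clock)


for n in (1, 2, 4):
    total_time, efficiency = simulate(TNT, n_sims=n, total_cores=16, steps_per_sim=4)
    print(f"{n} simulation(s): {efficiency:.0%} efficiency over {total_time:.0f} s")
```

Under these assumptions the sketch gives about 43%, 64%, and 86% for one, two, and four simulations, in line with the figures on this slide. The gain comes from the 1-core and 4-core tasks of several simulations sharing the allocation, while each 16-core NUBEAM task still needs it exclusively.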

SLIDE 11

Resource Utilization - TNT

[Figure: resource efficiency and avg. time/simulation for the TNT scenario (tasks: T 4p, N 16p, T 1p). Annotated point: 16 cores, 4 sims, 86% efficiency]

SLIDE 12

Resource Utilization - ANT

[Figure: resource efficiency and avg. time/simulation for the ANT scenario (tasks: T 1p, A 1024p, N 512p)]

  • >90% efficiency achievable for all multi-simulation cases
  • Peak efficiencies occur at multiples of the cores needed to run each task
      • E.g., 1540 cores allows 1 instance of each task to run concurrently
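
The 1540-core example can be checked against the ANT process counts given earlier (AORSA 1024, NUBEAM 512, TSC 1). A short sketch of the arithmetic follows. The slides do not explain why the quoted figure is 1540 rather than the exact sum, so the 3 spare cores are only noted, not interpreted.

```python
# Process counts for the ANT scenario, as listed on the SWIM Scenarios slide.
ANT_CORES = {"AORSA": 1024, "NUBEAM": 512, "TSC": 1}

cores_for_one_of_each = sum(ANT_CORES.values())  # 1024 + 512 + 1 = 1537
allocation = 1540                                # figure quoted on the slide
spare = allocation - cores_for_one_of_each       # 3 cores left over

print(cores_for_one_of_each, spare)
```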

SLIDE 13

Study of Resource Utilization Trends

  • Using RUS we examine the resource utilization efficiency of variations in SWIM workloads
      • What happens to the resource utilization when multiple instances of the same simulation execute concurrently?
      • What happens to the resource utilization when the time or parallelism of the tasks is varied?
  • We performed four studies on the two scenarios:

  • 1. Time scaling of TSC
  • 2. Time scaling of NUBEAM
  • 3. Weak parallel scaling of NUBEAM
  • 4. Strong parallel scaling of NUBEAM

  • The following graphs show the highest peak for a given number of simulations versus experiment variation (time or parallelism)

SLIDE 14

Scaling Trends

[Figure: scaling trends. Inset shows the TNT tasks (T 4p, N 16p, T 1p); axes: Time, Cores]

SLIDE 15

Time Scaling of TSC

[Figure: time scaling of TSC, with panels for TNT and ANT. Insets show the tasks (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p); axes: Time, Cores]

SLIDE 16

Time Scaling of NUBEAM

[Figure: time scaling of NUBEAM, with panels for TNT and ANT. Insets show the tasks (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p); axes: Time, Cores]

SLIDE 17

Weak Scaling of NUBEAM


Weak scaling = increase work, increase parallelism, same runtime

[Figure: weak scaling of NUBEAM, with panels for TNT and ANT. Insets show the tasks (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p); axes: Time, Cores]

SLIDE 18

Strong Scaling of NUBEAM


Strong scaling = same work, increase parallelism, decrease runtime
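
The weak and strong scaling definitions used in these studies can be written as transformations of a task's core count and runtime. The sketch below is illustrative only: `Task`, `weak_scale`, and `strong_scale` are hypothetical names, and the strong-scaling case assumes ideal linear speedup, which the slides do not claim.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    name: str
    cores: int
    seconds: float


def weak_scale(task, factor):
    """More work and more cores, same runtime."""
    return replace(task, cores=task.cores * factor)


def strong_scale(task, factor):
    """Same work on more cores; runtime shrinks (ideal linear speedup assumed)."""
    return replace(task, cores=task.cores * factor, seconds=task.seconds / factor)


nubeam = Task("NUBEAM", 16, 115.0)  # TNT scenario values
print(weak_scale(nubeam, 2))
print(strong_scale(nubeam, 2))
```

In both cases the scaled task needs a larger share of the allocation, but only strong scaling also changes how long the task holds those cores.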

[Figure: strong scaling of NUBEAM, with panels for TNT and ANT. Insets show the tasks (TNT: T 4p, N 16p, T 1p; ANT: T 1p, A 1024p, N 512p); axes: Time, Cores]

SLIDE 19

General Observations for Many Task Execution

  • Interleaving multiple simulations is an effective way to increase resource utilization efficiency
      • Even small numbers of interleaved simulations (3 or 4) are sufficient for significant resource efficiency improvements
  • Modest increases in allocation size produce high efficiencies
      • Local maxima at larger allocation sizes tend to be lower than the first or second peak
  • Great differences in parallelism of tasks provide more opportunities for effective resource utilization
      • However, it is more important for the tasks to match in parallelism than in time to improve resource efficiency

SLIDE 20

Future Work

  • Examine different SWIM simulation scenarios
  • Validate and improve model using data from IPS runs
  • Study impact of concurrent task execution in a single simulation
  • Study, develop and include models for overheads such as task launch time, I/O, component and framework activities in RUS
  • Develop the capability to use RUS as a recommendation system for IPS simulation configuration to maximize resource utilization
  • Explore the impact of different scheduling algorithms and policies

SLIDE 21

Summary

  • The IPS provides a flexible and lightweight execution environment and coupling framework for MPMD fusion energy applications
  • Characteristics of fusion tasks lead to poor resource utilization
  • Using RUS, we showed how the execution of small numbers of simultaneous simulations can dramatically improve resource utilization
  • Through simulation of resource utilization of real and synthetic workloads, we are able to extract some preliminary guidelines for constructing more efficient coupled simulations using a many task approach

SLIDE 22

Questions?
