Dynamic Task Scheduling for the Uintah Framework
Qingyu Meng, Justin Luitjens, and Martin Berzins
Thanks to DOE for funding since 1997, NSF since 2008
Uintah Applications
Virtual Soldier, Angiogenesis, Micropin Flow, Shaped Charges, Sandstone Compaction, Foam Compaction, Industrial Flares, Plume Fires, Explosions
Role: Tuning Expert (CS) | Domain Expert (Engineering)
Goal: Performance, Scalability | Problem, Methods
Responsibility: Infrastructure Components | Simulation Components
Major Contributions: Load balancing, AMR, Task-graph based scheduling, Asynchronous communication | Arches, ICE, MPM, MPM-ICE, etc.
View of Program: Parallel Infrastructure (MPI, Threads) | Serial Code Written for a Patch
Figure labels: Information; load balance; regrid; Cells; Particles
(DARPA hardware report, 2009)
Charm++: Object-based virtualization
Intel CnC: New language for graph-based parallelism
Plasma (Dongarra): DAG-based parallel linear algebra software
4 patches single level ICE task graph
Input variables Output variables (include boundary conditions)
Multicore friendly
Support GPU tasks
Running -> Finished -> New task(s) satisfied
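The task lifecycle above (a task runs, finishes, and its outputs satisfy the inputs of new tasks) can be illustrated with a minimal single-process sketch. This is not Uintah code: the `Task` class, the `run_dynamic` function, and the task names are hypothetical, and only the variable names `del_T`, `press`, and `u_vel` come from the slides. It assumes a task becomes ready once every one of its input variables has been produced.

```python
from collections import deque


class Task:
    """Hypothetical task: a name plus sets of input and output variables."""

    def __init__(self, name, requires=(), computes=()):
        self.name = name
        self.requires = set(requires)   # input variables
        self.computes = set(computes)   # output variables


def run_dynamic(tasks):
    """Run each task as soon as all of its inputs are available.

    Returns the names of the tasks in the order they ran.
    Raises RuntimeError if some task can never be satisfied.
    """
    available = set()        # variables produced so far
    waiting = list(tasks)    # tasks with unsatisfied inputs
    ready = deque()          # tasks whose inputs are all satisfied
    order = []

    def promote():
        # Move every waiting task whose inputs are now satisfied to `ready`.
        still_waiting = []
        for task in waiting:
            if task.requires <= available:
                ready.append(task)
            else:
                still_waiting.append(task)
        waiting[:] = still_waiting

    promote()
    while ready:
        task = ready.popleft()           # Running
        order.append(task.name)          # Finished
        available.update(task.computes)  # outputs become visible
        promote()                        # New task(s) satisfied
    if waiting:
        names = ", ".join(t.name for t in waiting)
        raise RuntimeError("unsatisfied tasks: " + names)
    return order


# Tasks are listed out of dependency order on purpose.
tasks = [
    Task("advect", requires={"press", "u_vel"}, computes={"u_vel_new"}),
    Task("compute_pressure", requires={"del_T"}, computes={"press"}),
    Task("init", computes={"del_T", "u_vel"}),
]
order = run_dynamic(tasks)
print(order)
```

The execution order is driven by data availability rather than by the order in which the tasks were declared, which is the property a dynamic scheduler relies on.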
Multi-thread
C R M
Variables, keyed by <name, type, patchid>:
Name | Type | Patch ID
del_T | Global | n/a
press | CC | 1
press | CC | 2
u_vel | FC | 1
… | … | …
Directory based hash map: for fixed order execution
Variable versioning (versions v1, v2, v3 per variable): for out-of-order execution
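To make the idea of variable versioning concrete, here is a minimal sketch of a hash map keyed by `<name, type, patchid>` in which each key holds several versions at once. The slides do not show how versions are stored, so the `VersionedWarehouse` class, its method names, the integer version numbers, and the sample values are all assumptions for illustration.

```python
class VersionedWarehouse:
    """Hypothetical store: (name, type, patch_id) -> {version: value}."""

    def __init__(self):
        self._store = {}

    def put(self, name, var_type, patch_id, version, value):
        # Keep older versions so tasks that still need them can read them.
        key = (name, var_type, patch_id)
        self._store.setdefault(key, {})[version] = value

    def get(self, name, var_type, patch_id, version):
        # Raises KeyError if the variable or the version does not exist.
        return self._store[(name, var_type, patch_id)][version]

    def versions(self, name, var_type, patch_id):
        # Sorted list of versions currently held for this variable.
        return sorted(self._store.get((name, var_type, patch_id), {}))


dw = VersionedWarehouse()
dw.put("press", "CC", 1, 1, 101.3)    # version 1 written by an earlier task
dw.put("press", "CC", 1, 2, 99.8)     # version 2 written before all readers of v1 ran
dw.put("del_T", "Global", None, 1, 1.0e-4)

print(dw.get("press", "CC", 1, 1))    # a late reader still sees version 1
print(dw.versions("press", "CC", 1))
```

With a single value per key, the second `put` would overwrite data that a not-yet-executed task still needs, which is why a plain directory suits fixed order execution only.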
R1 R2 R2 R1 Deadlock Load imbalance
Strong Scaling (fixed problem size)
Weak Scaling (fixed problem size per core)
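The two scaling modes are usually summarized as parallel efficiency. The sketch below uses the common textbook definitions, not formulas taken from the slides; the function names and the timings in the usage lines are made up purely for illustration.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Fixed total problem size: ideal time shrinks in proportion to core count."""
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Fixed problem size per core: ideal time stays constant as cores are added."""
    return t_base / t_n


# Illustrative numbers only, not measurements from the presentation.
print(strong_scaling_efficiency(t_base=100.0, n_base=1, t_n=30.0, n=4))
print(weak_scaling_efficiency(t_base=100.0, t_n=125.0))
```

An efficiency of 1.0 means ideal scaling in both cases; values below 1.0 indicate overhead such as communication or load imbalance.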
Algorithm | Random | FCFS | PatchOrder | MostMsg.
Queue Length | 3.11 | 3.16 | 4.05 | 4.29
Wait Time | 18.9 | 18.0 | 7.0 | 2.6
Overall Time | 315.35 | 308.73 | 187.19 | 139.39
Figure legend: executed, not executed, MPI sends
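The four algorithms compared above differ only in which ready task is picked next. The following sketch shows one way such a selection could look. It rests on assumptions: the slides name the policies but do not define them, so here PatchOrder is taken to mean "lowest patch ID first" and MostMsg. to mean "the task with the most outgoing MPI sends first". The `pick_next` function and the `patch_id` and `num_sends` fields are hypothetical.

```python
import random


def pick_next(ready, policy, rng=random):
    """Remove and return one task from `ready` (a list in arrival order).

    Each task is a dict with hypothetical keys 'name', 'patch_id', 'num_sends'.
    """
    if not ready:
        raise IndexError("no ready tasks")
    if policy == "Random":
        idx = rng.randrange(len(ready))
    elif policy == "FCFS":
        idx = 0  # first come, first served
    elif policy == "PatchOrder":
        # Assumed meaning: lowest patch ID first.
        idx = min(range(len(ready)), key=lambda i: ready[i]["patch_id"])
    elif policy == "MostMsg":
        # Assumed meaning: most outgoing messages first.
        idx = max(range(len(ready)), key=lambda i: ready[i]["num_sends"])
    else:
        raise ValueError("unknown policy: " + policy)
    return ready.pop(idx)


ready = [
    {"name": "a", "patch_id": 2, "num_sends": 1},
    {"name": "b", "patch_id": 1, "num_sends": 4},
    {"name": "c", "patch_id": 3, "num_sends": 2},
]
print(pick_next(list(ready), "FCFS")["name"])
print(pick_next(list(ready), "PatchOrder")["name"])
print(pick_next(list(ready), "MostMsg")["name"])
```

Under this reading, running the tasks that send the most messages first lets other processes receive their data sooner, which is consistent with the lower wait time reported for MostMsg. in the table.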
task wait time
load balance
parallelism
bandwidth and legacy
Problem Specification
(Arches, ICE, MPM, MPMICE, MPMArches, …)
(EoS, Constitutive, …)
Domain Expert Tuning Expert
patches on a single core
volume method for Navier Stokes equations
flows (2009)
uses particles and nodes
common frame of reference
exchange data several times per timestep, not just boundary condition exchange.
Container with PBX explosive