Efficient Abstractions for Exascale Software Design

SLIDE 1

Institute for CLEAN AND SECURE ENERGY, THE UNIVERSITY OF UTAH

Efficient Abstractions for Exascale Software Design

JAMES C. SUTHERLAND

Associate Professor - Chemical Engineering The University of Utah

DOE awards DE-NA0002375, DE-NA-000740

PetaApps award 0904631, XPS award 1337145

SLIDE 2

Acknowledgments

  • Matt Might: Associate Professor, School of Computing
  • Chris Earl: Post-doctoral Research Associate (now at LLNL)
  • Tony Saad: Senior Computational Scientist
  • Devin Robison: M.S. student (now at Fusion IO)
  • Abhishek Bagusetty: M.S. student (now at U. Pittsburgh)
  • Amir Biglari: Ph.D. student
  • Babak Goshayeshi: Ph.D. student (now a post-doc)
  • Nathan Yonkee: Ph.D. student

Funding: NSF PetaApps award 0904631; US DOE/NNSA

SLIDE 3

One-Dimensional Turbulence (ODT)

Cost:

  • ODT: ~600 CPU hours (~1.5 hours/realization, 400 realizations); scales as $Re^{3/2}$.
  • DNS: ~2 million hours; scales as $Re^{3}$.

[Figure: ODT domain]
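The cost figures above can be checked with a short worked calculation. This is only arithmetic on the numbers quoted on the slide; the doubling of $Re$ is a hypothetical case chosen for illustration.

```latex
\begin{align*}
C_{\mathrm{ODT}} &\approx 1.5\ \tfrac{\text{h}}{\text{realization}} \times 400\ \text{realizations} = 600\ \text{CPU hours} \\
\frac{C_{\mathrm{DNS}}}{C_{\mathrm{ODT}}} &\approx \frac{2\times 10^{6}}{600} \approx 3.3\times 10^{3} \\
\frac{C_{\mathrm{DNS}}}{C_{\mathrm{ODT}}} &\propto \frac{Re^{3}}{Re^{3/2}} = Re^{3/2} \\
Re \to 2\,Re:\quad & C_{\mathrm{DNS}} \to 2^{3}\, C_{\mathrm{DNS}} = 8\, C_{\mathrm{DNS}}, \qquad
C_{\mathrm{ODT}} \to 2^{3/2}\, C_{\mathrm{ODT}} \approx 2.83\, C_{\mathrm{ODT}}
\end{align*}
```

Under these scalings the cost gap between DNS and ODT itself widens as $Re^{3/2}$.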

SLIDE 4

ODT of Multiphase Reacting Flows

[Figure: probability density versus standoff distance (m); legend: CPD, Kob, Exp]

[Figure: contour plot with axes Length (cm) and x/dj]

SLIDE 6

Parameterizing Manifolds in Turbulent Combustion

[Figure panels: common parameterization; PCA parameterization]

Enabling technologies:

  • Principal component analysis
  • Multivariate adaptive regression

SLIDE 7

Reduced cost while maintaining accuracy

[Figure: mean OH mass fraction <Y_OH> versus Y/H; mean temperature <T> (K) versus τ]

PCA identifies a reduced model from the original 11-dimensional system, which is then truncated to two dimensions.
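The idea of finding a dominant direction in multivariate data can be sketched as follows. This is a minimal illustration, not the method used in the talk: it computes a sample covariance matrix and extracts only the leading principal component by power iteration. The names `Matrix`, `covariance`, and `leadingComponent` are hypothetical, and a two-dimensional truncation would also need the second component (for example by deflation), which is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Sample covariance of `data` (rows = observations, columns = variables).
Matrix covariance(const Matrix& data)
{
  assert(data.size() > 1);
  const std::size_t n = data.size(), d = data[0].size();
  std::vector<double> mean(d, 0.0);
  for (const auto& row : data)
    for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / n;
  Matrix cov(d, std::vector<double>(d, 0.0));
  for (const auto& row : data)
    for (std::size_t j = 0; j < d; ++j)
      for (std::size_t k = 0; k < d; ++k)
        cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
  return cov;
}

// Leading principal component by power iteration, starting from a vector
// of ones. Assumption: the start vector is not orthogonal to the dominant
// eigenvector; otherwise the iteration cannot find it.
std::vector<double> leadingComponent(const Matrix& cov, int iterations = 200)
{
  const std::size_t d = cov.size();
  std::vector<double> v(d, 1.0);
  for (int it = 0; it < iterations; ++it) {
    std::vector<double> w(d, 0.0);
    for (std::size_t j = 0; j < d; ++j)
      for (std::size_t k = 0; k < d; ++k) w[j] += cov[j][k] * v[k];
    double norm = 0.0;
    for (double x : w) norm += x * x;
    norm = std::sqrt(norm);
    if (norm == 0.0) break;  // zero covariance: no dominant direction
    for (std::size_t j = 0; j < d; ++j) v[j] = w[j] / norm;
  }
  return v;
}
```

For data lying on the line y = 2x, `leadingComponent(covariance(data))` points along (1, 2), so a single coordinate along that direction describes the whole data set.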

SLIDE 8

Time Integration Methods for Stiff, Nonlinear Systems

[Figure: temperature T (K) versus time t (ms); legend: Reactor (DT-BDF-1), Inlet]

Heptane/air (654 species, 4846 reactions):

  • Δt_max (Newton) = 0.1 μs
  • Δt_process = 1 ms
  • est. 2000x speedup

Dual time-stepping:

  • As fast as Newton’s method at small Δt.
  • Gives a choice of resolution despite ignition/extinction.
  • Stable for any Δt.
  • Robust nonlinear solver for explosive chemistry.

Significant performance gains with our adaptive Δσ scheme.
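A minimal sketch of dual time-stepping for one BDF-1 (backward Euler) step is given below, assuming the textbook formulation: the implicit equation (v - u_old)/Δt = f(v) is solved by marching v in a pseudo-time σ until the residual vanishes. The function name `dualTimeStepBDF1` and all of its parameters are hypothetical, and a fixed pseudo-step Δσ is used; the adaptive Δσ scheme mentioned on the slide is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// One BDF-1 step of du/dt = f(u) by dual time-stepping: iterate in
// pseudo-time until the implicit residual
//   R(v) = f(v) - (v - uOld) / dt
// is below `tol`, so that v approximates u at the new time level.
double dualTimeStepBDF1(const std::function<double(double)>& f,
                        double uOld, double dt, double dsigma,
                        double tol = 1e-12, int maxIter = 100000)
{
  assert(dt > 0.0 && dsigma > 0.0);
  double v = uOld;  // initial guess: previous time level
  for (int iter = 0; iter < maxIter; ++iter) {
    const double residual = f(v) - (v - uOld) / dt;
    if (std::fabs(residual) < tol) break;
    v += dsigma * residual;  // explicit Euler update in pseudo-time
  }
  return v;
}
```

For the stiff linear test problem f(u) = -1000u with u_old = 1 and Δt = 1, the result should approach the backward-Euler value 1/1001. With this explicit pseudo-time update the stability restriction moves from Δt to Δσ (here Δσ < 2/1001), which is one reason an adaptive choice of Δσ matters.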

SLIDE 9

HPC is becoming more challenging

Physics problems that we are tackling are increasingly complex. Hardware architectures are increasingly complex, uncertain, and even divergent!

| Year | Machine | Peak speed | Cores | Programming model | Cost ($) | Footprint |
|------|---------|------------|-------|-------------------|----------|-----------|
| 1997 | ASCI Red | 1 TFLOPS | 9,298 | MPI (distributed memory) | 55 M | 1,600 ft² |
| 2012 | Sequoia | 16 PFLOPS | 1,572,864 (98,304x16) | MPI (+threads) (mixed) | ~250 M | 3,000 ft² |
| 2012 | Titan | 17 PFLOPS | 299,000 CPU (18,688x16); 50,233,344 GPU | MPI + CUDA + threads (mixed) | 97 M | 4,350 ft² |
| 2014 | Xeon Phi | 1 TFLOPS | ~50 | “traditional” | ~3 K | your foot |
| 2014 | NVidia GPU | ~3 TFLOPS | 4,992 | CUDA | ~3 K | your foot |

Cost of software rewrites measured in tens of millions of dollars!

SLIDE 10

Taming the complexity beast…

Goals:

  • Efficiently use complex modern architectures.
  • Enhance programmer productivity by insulating the programmer from details.

Enabling technologies:

Task-graph:

  • MPI communication
  • Threaded task scheduling
  • Allows overlap of computation with communication
  • Automatic memory management (fields are where you need them, when you need them)

Domain-Specific Language:

  • Array & stencil operations
  • GPU & multithread execution


SLIDE 12

Hierarchical Parallelization

Utilize resources at the “high” level & “push down” as parallelism runs out.

Distributed task-graph:

  • MPI communication
  • Coarse-grained, threaded task scheduling

Uintah framework [1]:

  • Domain decomposition data parallelism
  • Task parallelism & coarse-grained DAG
  • Scales to largest capability machines.

On-node task-graph:

  • Memory management & fine-grained task scheduling
  • Thread-pools
  • GPU management

“Expression Library” [2]:

  • “Fine-grained” task graph for PDE assembly.
  • Tasks consist of stencil & field operations.

EDSL:

  • Array & stencil operations
  • GPU & multithread execution

Nebo EDSL [3]:

  • GPU & multithreaded execution
  • Matlab-style syntax

References:

  • 1. Berzins, M., Meng, Q., Schmidt, J., & Sutherland, J. C., DAG-based software frameworks for PDEs. In Euro-Par 2011: Parallel Processing Workshops (pp. 324–333). Springer.
  • 2. Notz, P. K., Pawlowski, R. P., & Sutherland, J. C., Graph-Based Software Design for Managing Complexity and Enabling Concurrency in Multiphysics PDE Software. ACM TOMS (2012).
  • 3. Earl, C., Might, M., Bagusetty, A., & Sutherland, J. C., Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations. Journal of Systems and Software, to appear.
SLIDE 16

A Simple Example of DAGs*

[Figure: expression registry containing u, Γ, T, p, y_i, τ, ρ, φ, s_φ. Graph edges distinguish direct (expressed) dependencies, e.g. Γ = Γ(T, p, y_i), from indirect (discovered) dependencies.]

Register all expressions:

  • Each “expression” calculates one or more field quantities.
  • Each expression advertises its direct dependencies.

Set a “root” expression; construct a graph:

  • All dependencies are discovered/resolved automatically.
  • Highly localized influence of changes in models.
  • Not all expressions in the registry may be relevant/used.

From the graph:

  • Deduce storage requirements & allocate memory (externally to each expression).
  • Automatically schedule evaluation, ensuring proper ordering.
  • Asynchronous execution is critical! (overlap communication & computation)
  • Robust scheduling algorithms are key.

*Notz, Pawlowski, & Sutherland (2012). ACM Transactions on Mathematical Software, 39(1).
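The register / set-root / schedule steps above can be sketched with a small dependency map and a depth-first traversal. This is an illustrative sketch only, not the Expression Library's API: `Registry`, `visit`, and `schedule` are hypothetical names, expressions are reduced to strings, and the execution is serial (no asynchronous scheduling or memory allocation is modeled).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry: expression name -> names of its direct dependencies.
using Registry = std::map<std::string, std::vector<std::string>>;

// Depth-first traversal from `name`. Each expression is appended only after
// all of its dependencies, so `order` is a valid evaluation schedule.
void visit(const Registry& registry, const std::string& name,
           std::set<std::string>& done, std::set<std::string>& inProgress,
           std::vector<std::string>& order)
{
  if (done.count(name)) return;
  if (inProgress.count(name)) throw std::runtime_error("cycle at " + name);
  const auto it = registry.find(name);
  if (it == registry.end()) throw std::runtime_error("unregistered: " + name);
  inProgress.insert(name);
  for (const std::string& dep : it->second)
    visit(registry, dep, done, inProgress, order);
  inProgress.erase(name);
  done.insert(name);
  order.push_back(name);
}

// Evaluation order for the graph rooted at `root`; expressions that the
// root does not reach are left out of the schedule.
std::vector<std::string> schedule(const Registry& registry,
                                  const std::string& root)
{
  std::set<std::string> done, inProgress;
  std::vector<std::string> order;
  visit(registry, root, done, inProgress, order);
  return order;
}
```

As a usage example, register Gamma with direct dependencies T, p, and yi (as on the slide), plus a hypothetical root `flux` that depends only on Gamma. Calling `schedule(registry, "flux")` then discovers T, p, and yi indirectly and orders them before Gamma, while registered but unrelated expressions such as tau and u are skipped.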

SLIDE 17

Changes in model form are naturally handled

Pure substance heat flux:

$q = -\lambda \nabla T$

[Graph: q depends on λ and T]

SLIDE 18

Changes in model form are naturally handled

Multi-species mixture heat flux:

$q = -\lambda \nabla T + \sum_{i=1}^{n} h_i J_i$

[Graph: q depends on λ, T, h_1 … h_n, and J_1 … J_n; further nodes y_1 … y_n]

No complex logic changes in code when models are added or changed.

SLIDE 19

DSL: “Matlab for PDEs on Supercomputers”

Field & stencil operations:

    rhs <<= -divOpX( xConvFlux + xDiffFlux )
            -divOpY( yConvFlux + yDiffFlux )
            -divOpZ( zConvFlux + zDiffFlux );

$\mathrm{rhs} = -\frac{\partial}{\partial x}(J_x + C_x) - \frac{\partial}{\partial y}(J_y + C_y) - \frac{\partial}{\partial z}(J_z + C_z)$

  • Can “chain” stencil operations.
  • Auto-generate code for efficient execution on CPU, GPU, Xeon Phi, etc. during compilation.

Features:

  • Embedded in C++ to avoid two-stage compilation & improve portability.
  • CPU (serial, multicore), GPU, Xeon Phi backends.
  • Cross-platform memory pools (CPU/GPU).
  • 70+ natively supported discrete operators (easily define new ones).
  • Strongly typed fields & operators.
  • Masks: allow operations on field subsets.
  • Conditional operations (vectorized ‘if’).
  • Fields can live in multiple locations (GPU, CPU) simultaneously.
  • (and much more)

[Figure: expressiveness versus efficiency, comparing C++, Matlab, and the DSL]
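One common way to embed such a DSL in C++ is with expression templates, sketched below. This is an assumption about the general technique, not Nebo's implementation: the `sketch` namespace and the `Field`, `Sum`, and `Neg` types are hypothetical, only element-wise `+` and unary `-` are shown, and stencil operators and GPU backends are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Lazy element-wise sum; nothing is computed until operator[] is called.
template <typename L, typename R>
struct Sum {
  const L& lhs;
  const R& rhs;
  double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Lazy element-wise negation.
template <typename E>
struct Neg {
  const E& arg;
  double operator[](std::size_t i) const { return -arg[i]; }
};

// Hypothetical field type: a flat, non-empty array of doubles.
struct Field {
  std::vector<double> data;
  explicit Field(std::size_t n, double value = 0.0) : data(n, value) {
    assert(n > 0);
  }
  double operator[](std::size_t i) const { return data[i]; }

  // `<<=` triggers evaluation: one fused loop over the whole expression,
  // with no temporary fields for intermediate results.
  template <typename Expr>
  Field& operator<<=(const Expr& expr) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
    return *this;
  }
};

// Operators build expression objects instead of computing values.
template <typename L, typename R>
Sum<L, R> operator+(const L& lhs, const R& rhs) { return {lhs, rhs}; }

template <typename E>
Neg<E> operator-(const E& arg) { return {arg}; }

}  // namespace sketch
```

With fields `a`, `b`, and `rhs` of equal size, `rhs <<= -(a + b);` evaluates the whole right-hand side in a single loop. The expression objects hold references, so they should be consumed within the same statement rather than stored for later use.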

SLIDE 20

[Figure: PoKiTT expression graph. Nodes include rhoYi0 to rhoYi7 (each with 4 post-procs) and rhoYi0_RHS to rhoYi7_RHS; Jx0 to Jx8 and Jy0 to Jy8; yi0 to yi8; D0 to D8; r0 to r8; enthalpy0 to enthalpy8; mixMW, temp, pressure, lambda, heatFluxx, heatFluxy, density, enthalpy, and enthalpy_RHS.]

Real example: PoKiTT (Portable Kinetics, Thermodynamics & Transport)

$\rho \frac{\partial y_i}{\partial t} = -\nabla \cdot J_i + s_i$

$\rho \frac{\partial h}{\partial t} = -\nabla \cdot q$

Triple flame computed on GPU with PoKiTT:

  • Detailed kinetics
  • Mixture-averaged transport
  • Detailed thermodynamics
SLIDE 24

“Modifiers”: injecting new dependencies

Motivation:

  • Boundary conditions: modify a subset of the computed values.
  • Multiphase coupling: add source terms to RHS of equations.

How modifiers work:

  • Modifiers allow “push” rather than “pull” dependency addition.
  • Modifiers are deployed after the node they are attached to, and are provided a handle to the field just computed.
  • Modifiers can introduce new dependencies to the graph.

[Graph: nodes A, B, C with modifiers BC1 and S1 attached; the modifiers bring in new nodes D, E, F]
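The “push” behaviour described above can be sketched with a node that stores callbacks and runs them on its freshly computed field. This is a hypothetical illustration, not the library's interface: `Node`, `FieldData`, `attachModifier`, and `evaluate` are invented names, and the ability of a modifier to add new dependencies to the graph is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using FieldData = std::vector<double>;

// Hypothetical graph node: computes a field, then applies its modifiers.
struct Node {
  std::function<void(FieldData&)> compute;                 // fills the field
  std::vector<std::function<void(FieldData&)>> modifiers;  // run afterwards

  // "Push" a modifier onto this node; the node's own code is unchanged.
  void attachModifier(std::function<void(FieldData&)> m) {
    modifiers.push_back(std::move(m));
  }

  // Evaluate the node, then hand the just-computed field to each modifier
  // in the order the modifiers were attached.
  FieldData evaluate(std::size_t n) const {
    assert(static_cast<bool>(compute));
    FieldData field(n, 0.0);
    compute(field);
    for (const auto& m : modifiers) m(field);
    return field;
  }
};
```

A boundary-condition modifier might overwrite only the first entry of the field, while a source-term modifier adds a contribution to every entry. Both are attached from outside, so the expression that computes the field needs no knowledge of them.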

SLIDE 25

Coal combustion

  • 55 PDEs
  • ~35 ODEs per particle
  • Complex interphase coupling

[Figure: expression graph for the coal combustion problem. Nodes include SpeciesSourceTerm_0 to SpeciesSourceTerm_52, SpeciesDiffusionFlux_0 to SpeciesDiffusionFlux_52, enthalpy_0 to enthalpy_52, species_0 to species_52, rhoY_0 to rhoY_51, per-species RHS terms (rho_H2_RHS, rho_O2_RHS, …), gas-phase quantities (temperature, pressure, density, viscosity, thermal_conductivity, HeatFlux, rhoE0), particle and coal-model quantities (p_*, char_*, cpd_*), and particle-to-gas source terms (P2C*).]

SLIDE 26

Wasatch: flexible, efficient multiphysics solver

Speedup using DSL* relative to other Uintah codes

[Figure: speedup relative to ARCHES and ICE]

*Comparison to ICE and ARCHES, sister codes in Uintah, on a 3D Taylor-Green vortex problem. Run on a single processor.

SLIDE 27

Conclusions

Reduced-order modeling is very valuable.

  • Can guide high-cost, high-fidelity models.
  • Allows parametric studies.
  • Tractable on entry-level rather than enterprise-level computing systems.

The current/coming hardware revolution will cause pain. How much pain depends on:

  • How good your crystal ball is; OR
  • How well you can insulate your application from hardware changes.

Graph-based decomposition of the problem handles complexity nicely.

  • Expose and exploit task & data parallelism.
  • Automatically handle data movement (inter-node and intra-node)
  • Overlap communication & computation

EDSL enables rapid development of portable, performant code.

  • Insulates the developer from platform-specific implementation.

PetaApps award 0904631, XPS award 1337145, DE-NA0002375, DE-NA-000740