Jerry Lee, Vaidya Sankaran, United Technologies Research Center



slide-1
SLIDE 1

This page contains no technical information subjected to EAR and ITAR

Hybrid Simulations Using CPU-GPU Paradigm for Reacting Flows in

Accelerating Industrial Competitiveness through Extreme-Scale Computing

Jerry Lee, Vaidya Sankaran, United Technologies Research Center UTC, East Hartford

Acknowledgement: Vivek Venugopal, Hui Gao for helping to implement the GPU code


slide-2
SLIDE 2


Special Thanks to

  • Dr. Ramanan Sankaran,
  • Dr. Suzy Tichenor & Dr. Jack Wells

Oak Ridge National Laboratory, USA.

slide-3
SLIDE 3


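Generic transport equation for each conserved quantity A (reconstructed; the density factors, the source-term symbol, and the signs are assumed from the standard conservative form):

$$\frac{\partial (\rho A)}{\partial t} + \frac{\partial (\rho u_i A)}{\partial x_i} = -\frac{\partial (\rho V_i A)}{\partial x_i} + \dot{\omega}_A$$

$$A = 1,\ u_i,\ e,\ Y_s; \qquad s = 1, \ldots, N_s; \qquad i = 1, 2, 3; \qquad N_s \sim 40$$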

Reactive flow adds a lot more PDEs to cold flow CFD

Combustion adds a lot more PDEs: about 9x as many as cold flow
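The 9x figure is consistent with a simple equation count. The sketch below assumes that cold flow carries one continuity, three momentum, and one energy equation, and that each of the roughly 40 species adds one transport equation. The function name `equation_count` is made up for illustration and is not taken from the slides.

```python
def equation_count(n_species: int, n_dims: int = 3) -> dict:
    """Count transported PDEs for cold flow and for reacting flow.

    Assumption: cold flow = continuity + n_dims momentum + energy,
    and every species adds exactly one extra transport equation.
    """
    cold = 1 + n_dims + 1
    reacting = cold + n_species
    return {"cold": cold, "reacting": reacting, "ratio": reacting / cold}


counts = equation_count(n_species=40)
print(counts)  # {'cold': 5, 'reacting': 45, 'ratio': 9.0}
```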

slide-4
SLIDE 4


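“Fuel + O2 → product + heat” has a lot of paths, steps, and transcendental functions

ONE step:

$$\mathrm{H} + \mathrm{O_2} + \mathrm{M} \rightarrow \mathrm{HO_2} + \mathrm{M}$$

$$\frac{dY_{\mathrm{H}}}{dt} \propto k\,[\mathrm{H}][\mathrm{O_2}][\mathrm{M}], \qquad k = k_\infty \left( \frac{\mathrm{Pr}}{1 + \mathrm{Pr}} \right) F$$

$$k_0 = A_0 T^{\beta_0} \exp(-E_0 / RT), \qquad k_\infty = A_\infty T^{\beta_\infty} \exp(-E_\infty / RT), \qquad \mathrm{Pr} = \frac{k_0 [\mathrm{M}]}{k_\infty}$$

$$\log F = \frac{\log F_c}{1 + \left[ \dfrac{\log \mathrm{Pr} + c}{n - d\,(\log \mathrm{Pr} + c)} \right]^2}$$

$$F_c = (1 - \alpha) \exp(-T / T^{***}) + \alpha \exp(-T / T^{*}) + \exp(-T^{**} / T)$$

c, n, d: linear functions of log F_c

14 transcendental functions

~200-300 steps for jet fuel!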
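To make the cost of one such step concrete, here is a minimal sketch of a pressure-dependent (falloff) rate evaluation in the widely used Troe form. All parameter values are illustrative placeholders, not data from the slides, and the coefficients in `c`, `n`, and `d` are the conventional Troe constants, which the slides only describe as "linear functions". The function names are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)


def arrhenius(A, beta, Ea, T):
    """Modified Arrhenius rate: A * T**beta * exp(-Ea / (R*T))."""
    return A * T**beta * math.exp(-Ea / (R * T))


def troe_falloff_rate(T, M, low, high, troe):
    """Falloff rate k = k_inf * Pr/(1+Pr) * F with Troe broadening F.

    low, high: (A, beta, Ea) tuples for k_0 and k_inf (illustrative).
    troe: (alpha, T3, T1, T2) centering parameters (illustrative).
    """
    k0 = arrhenius(*low, T)
    kinf = arrhenius(*high, T)
    Pr = k0 * M / kinf  # reduced pressure

    alpha, T3, T1, T2 = troe
    Fc = ((1.0 - alpha) * math.exp(-T / T3)
          + alpha * math.exp(-T / T1)
          + math.exp(-T2 / T))
    log_Fc = math.log10(Fc)

    # Conventional Troe constants (assumed here): linear in log10(Fc).
    c = -0.4 - 0.67 * log_Fc
    n = 0.75 - 1.27 * log_Fc
    d = 0.14

    x = math.log10(Pr) + c
    log_F = log_Fc / (1.0 + (x / (n - d * x)) ** 2)
    F = 10.0 ** log_F
    return kinf * (Pr / (1.0 + Pr)) * F


# Illustrative inputs only: k_0 = 1e16, k_inf = 1e13, so Pr = 1e-2.
low = (1.0e16, 0.0, 0.0)
high = (1.0e13, 0.0, 0.0)
troe = (0.5, 100.0, 2000.0, 5000.0)
k = troe_falloff_rate(T=1000.0, M=1.0e-5, low=low, high=high, troe=troe)
```

Even this single rate needs several `exp`, `log10`, and power evaluations per cell per time step, which is why a mechanism with hundreds of steps dominates the run time.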

slide-5
SLIDE 5


Performance of standalone GPU chem solver (explicit)

[Figure: time per step (s), log scale from 1.0E-02 to 1.0E+03, versus number of threads*, 0 to 2.0E+07; curves for CPU and GPU.]

* 1 thread = 1 ODE system

cuts cost of chemistry compute way down in CFD

slide-6
SLIDE 6


Performance of standalone GPU chem solver (implicit)

[Figure: wall time per ODE per time step (s), log scale from 1.0E-06 to 1.0E-03, versus number of threads, 256 to 4194304; legend: GPU Walltime/ODE/step, CPU Walltime/ODE/step, DVODE on CPU.]

cuts cost of chemistry compute way down in CFD

Reactive code cost  transport code

  • chem. on CPU

trans

slide-7
SLIDE 7

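Generic transport equation (reconstructed; the density factors, the source-term symbol, and the signs are assumed from the standard conservative form):

$$\frac{\partial (\rho A)}{\partial t} + \frac{\partial (\rho u_i A)}{\partial x_i} = -\frac{\partial (\rho V_i A)}{\partial x_i} + \dot{\omega}_A$$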

Operator splitting in CFD → thread the chemistry compute

  • Integrate terms in tandem
  • Chemistry is independent of neighbors
  • Collect from many cells and do threading
  • Overall acceleration depends on chem. compute load
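The splitting idea can be sketched on a toy problem. The example below is an assumption-laden illustration, not the authors' solver: it uses 1D periodic diffusion as the "transport" term and a linear decay dY/dt = -kY as the "chemistry" term, applied one after the other (first-order splitting). The function names are hypothetical.

```python
import math


def transport_step(y, dt, dx, diff):
    """Explicit diffusion on a periodic 1D grid; each cell needs its neighbors."""
    n = len(y)
    coef = diff * dt / dx**2
    return [y[i] + coef * (y[(i + 1) % n] - 2.0 * y[i] + y[(i - 1) % n])
            for i in range(n)]


def chemistry_step(y_cell, dt, k):
    """Exact update for dY/dt = -k*Y; uses only the cell's own state."""
    return y_cell * math.exp(-k * dt)


def split_step(y, dt, dx, diff, k):
    """Advance transport, then chemistry, in tandem over one time step."""
    y = transport_step(y, dt, dx, diff)
    # This map has no cell-to-cell dependency, so each cell could be
    # handed to its own GPU thread.
    return [chemistry_step(v, dt, k) for v in y]


y0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
y1 = split_step(y0, dt=1.0e-3, dx=0.1, diff=1.0, k=50.0)
```

Because the chemistry update touches no neighbor data, it is the part that maps naturally onto many independent threads, while the stencil-based transport step stays on the CPU.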

slide-8
SLIDE 8


Domain decomposition → CPU transport, GPU chemistry

  • CPU only: each CPU does all terms
  • Hybrid: CPUs do the transport terms (MPI); GPU threads do the CHEMISTRY concurrently
  • Transport and chemistry run in tandem

One GPU keeps up with all 16 CPU cores?
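One way several CPU subdomains might share a single accelerator is to pack their cells into one flat batch, run the chemistry over the batch, and scatter the results back. The sketch below illustrates only that gather/scatter pattern in plain Python; `pack`, `unpack`, and `chemistry_kernel` are hypothetical names, and the linear-decay "kernel" stands in for a real chemistry solver.

```python
import math


def pack(subdomains):
    """Gather cell states from every subdomain into one flat batch."""
    batch, sizes = [], []
    for cells in subdomains:
        batch.extend(cells)
        sizes.append(len(cells))
    return batch, sizes


def chemistry_kernel(batch, dt, k=50.0):
    """Stand-in for the GPU kernel: one independent update per cell."""
    return [y * math.exp(-k * dt) for y in batch]


def unpack(batch, sizes):
    """Scatter the flat batch back into per-subdomain lists."""
    out, start = [], 0
    for n in sizes:
        out.append(batch[start:start + n])
        start += n
    return out


# Three hypothetical CPU subdomains of different sizes.
ranks = [[1.0, 0.8], [0.5], [0.2, 0.1, 0.0]]
batch, sizes = pack(ranks)
updated = unpack(chemistry_kernel(batch, dt=1.0e-3), sizes)
```

A larger batch gives the accelerator more independent work per launch, which is the condition under which one GPU can plausibly serve many CPU cores.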

slide-9
SLIDE 9


GPU-CPU hybrid code tested on shear layer turbulent flame

Streams: AIR; Fuel: CO, H2

3D Direct Numerical Simulation
1. fully compressible reactive code
2. detailed chemical kinetics
3. detailed multicomponent transport
4. three dimensions
5. all scales fully resolved

slide-10
SLIDE 10


GPU chemistry hidden completely

slide-11
SLIDE 11


CPU-only runs → chemistry takes the lion’s share (12 scalars)

6x speed-up potential
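The 6x potential can be read as an Amdahl's-law limit. As a rough check, assuming the chemistry share of CPU-only wall time is about 83% (the figure quoted on the next slide) and that the rest of the code is not accelerated at all, the sketch below computes the best-case speedup. The function name is hypothetical.

```python
def amdahl_speedup(fraction: float, factor: float = float("inf")) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `factor`.

    With factor = infinity the accelerated part costs nothing, which gives
    the upper bound 1 / (1 - fraction).
    """
    return 1.0 / ((1.0 - fraction) + fraction / factor)


limit = amdahl_speedup(0.83)          # chemistry made free
tenfold = amdahl_speedup(0.83, 10.0)  # chemistry made 10x faster
print(round(limit, 2), round(tenfold, 2))
```

The upper bound comes out just under 6, in line with the slide; any finite chemistry speedup lands below it.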

slide-12
SLIDE 12


GPU: 83% to 7% reduction in chemistry compute load

Capacity for bigger chemistry: 35-40 species → good for jet fuel!
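As a back-of-the-envelope reading of these two percentages, assume that 83% and 7% are the chemistry shares of total wall time before and after the GPU offload, and that the transport cost $T_{tr}$ is unchanged. Both assumptions are interpretations, not statements from the slides. The implied overall speedup is then:

$$T_{\text{before}} = \frac{T_{tr}}{1 - 0.83}, \qquad T_{\text{after}} = \frac{T_{tr}}{1 - 0.07}, \qquad \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{0.93}{0.17} \approx 5.5$$

Under these assumptions the realized gain sits close to the roughly 6x ceiling set by the chemistry share of the CPU-only run.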

slide-13
SLIDE 13


Strong scalability of CPU-GPU hybrid

[Figure: wall time versus number of cores.]

slide-14
SLIDE 14


Weak scalability of CPU-GPU hybrid: good up to 64M cells

slide-15
SLIDE 15


Looking forward: GPU-CPU hybrid is a significant tech. enabler

Tune existing or create new turbulent-chemistry model: doable with ~100 nodes

http://www.happynews.com/news/11142008/visualizing-unseen-forces-turbulence.htm Courtesy of Stanford University Center for Turbulence Research