SLIDE 1

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

SLIDE 2

Los Alamos National Laboratory

GPU Acceleration of Large Scale Fluid Dynamics Scientific Codes

Jenniffer Estrada, LANL, Research Scientist, jme@lanl.gov
Joseph Schoonover, CIRES, Research Scientist, jschoonover@lanl.gov

GTC 2017 | May 10, 2017

LA-UR-17-23350

SLIDE 3

Motivation

  • Scale Interactions
  • Resolution vs Resources
  • Why are they bragging?
SLIDE 4

The Scientific Codes - SELF

  • Continuous and Discontinuous Galerkin (polynomial-based) Nodal Spectral Element Method (see the illustrative kernel after this list)
  • Oceanographic and Geophysical Modelling
  • Target problem: model a large-scale flow (~100 km) that catalyzes the formation of small features (~1 km) due to interactions with topography (vorticity on the Gulf Stream shelf)
  • 10 million degrees of freedom
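At the heart of the nodal spectral element method, derivatives within each element reduce to a small dense matrix-vector product with the derivative matrix evaluated at the quadrature nodes. A minimal sketch of that core operation in Fortran; the interface and names below are assumed for illustration and are not SELF source:

subroutine element_derivative(u, dudx, D, N, nel)
  implicit none
  integer, intent(in)  :: N, nel           ! polynomial degree, number of elements
  real(8), intent(in)  :: u(0:N, nel)      ! nodal values within each element
  real(8), intent(in)  :: D(0:N, 0:N)      ! spectral derivative matrix at the nodes
  real(8), intent(out) :: dudx(0:N, nel)   ! derivative at the same nodes
  integer :: iel, i, j
  do iel = 1, nel                          ! loop over elements
     do i = 0, N                           ! dense matrix-vector product per element
        dudx(i, iel) = 0.0d0
        do j = 0, N
           dudx(i, iel) = dudx(i, iel) + D(i, j)*u(j, iel)
        end do
     end do
  end do
end subroutine element_derivative

In 3-D, this 1-D building block is applied along each coordinate direction, with surface flux contributions added at the element faces.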
SLIDE 5

SELF-DGSEM Algorithm

SLIDE 6

Progression

  • Hot Spot Identification: MappedTimeDerivative
  • Software Changes
  • Message Passing
SLIDE 7

Progression (Continued)

  • CPU: AMD Opteron (16 core), GPU: Tesla K20X
  • Serial (Single Core):
    • Original: 110.8 sec
    • After changes: 127.2 sec
  • OpenACC: 5.3 sec (a directive sketch follows this list)
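A minimal sketch of the OpenACC treatment of a hot-spot loop in this spirit; the routine and array names below are placeholders, not SELF's actual MappedTimeDerivative:

subroutine apply_tendency(u, tend, ndof, nel)
  implicit none
  integer, intent(in)    :: ndof, nel
  real(8), intent(in)    :: u(ndof, nel)
  real(8), intent(inout) :: tend(ndof, nel)
  integer :: iel, i
  ! Offload the element loop; the arrays are assumed to already live on the
  ! device via the data region shown in the caller below.
  !$acc parallel loop collapse(2) present(u, tend)
  do iel = 1, nel
     do i = 1, ndof
        tend(i, iel) = -u(i, iel)          ! stand-in for the real flux-divergence work
     end do
  end do
end subroutine apply_tendency

! Caller keeps the arrays resident on the GPU across the whole time loop:
!   !$acc data copyin(u) copy(tend)
!   do step = 1, nsteps
!      call apply_tendency(u, tend, ndof, nel)
!   end do
!   !$acc end data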
SLIDE 8

1.5x to Ideal

SLIDE 9

[Figure: OpenMP scaling of the RK3 time step for 2-16 threads - speedup, scaling efficiency, and runtime (sec) - comparing MPR, SPR, and SPR-LA against the ideal curve.]

** Yuliana Zamora and Robert Robey, Effective OpenMP Implementations, https://www.lanl.gov/projects/national-security-education-center/information-science-technology/summer-schools/parallelcomputing/_assets/images/2016projects/Zamora.pdf; https://anl.app.box.com/v/IXPUG2016-presentation-23

  • Multiple parallel regions (MPR) - standard OpenMP
  • Single parallel region (SPR) - high-level OpenMP without loop bounds
  • Single parallel region with loop-bound assignments (SPR-LA) - high-level OpenMP (the three strategies are contrasted in the sketch after this list)
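The three strategies can be contrasted on a pair of back-to-back array updates. A minimal sketch, not taken from the codes above; the array names and sizes are arbitrary:

program openmp_region_styles
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  real(8) :: a(n), b(n)
  integer :: i, tid, nthreads, chunk, lo, hi

  a = 1.0d0; b = 2.0d0

  ! MPR: one parallel region per loop (standard OpenMP); threads are forked,
  ! joined, and synchronized at every loop.
  !$omp parallel do
  do i = 1, n
     a(i) = a(i) + b(i)
  end do
  !$omp end parallel do
  !$omp parallel do
  do i = 1, n
     b(i) = 0.5d0*a(i)
  end do
  !$omp end parallel do

  ! SPR: a single parallel region enclosing both loops; the runtime still
  ! computes loop bounds and inserts a barrier at each end do.
  !$omp parallel private(i)
  !$omp do
  do i = 1, n
     a(i) = a(i) + b(i)
  end do
  !$omp end do
  !$omp do
  do i = 1, n
     b(i) = 0.5d0*a(i)
  end do
  !$omp end do
  !$omp end parallel

  ! SPR-LA: a single parallel region with loop bounds assigned per thread once;
  ! no implicit barrier between the loops (safe here because each thread only
  ! reads entries it wrote).
  !$omp parallel private(i, tid, nthreads, chunk, lo, hi)
  tid      = omp_get_thread_num()
  nthreads = omp_get_num_threads()
  chunk    = (n + nthreads - 1)/nthreads
  lo       = tid*chunk + 1
  hi       = min(n, (tid + 1)*chunk)
  do i = lo, hi
     a(i) = a(i) + b(i)
  end do
  do i = lo, hi
     b(i) = 0.5d0*a(i)
  end do
  !$omp end parallel

  print *, a(1), b(1)
end program openmp_region_styles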

SLIDE 10

Higher is better!

SLIDE 11

Thermal Bubble

  • Initial conditions consist of an anomalously warm blob in an otherwise neutral stratification (an illustrative initialization follows this list)
  • Domain Size: 10,000 m (cube)
  • Discretization: Discontinuous Galerkin Spectral Element Method, 20x20x20 elements, polynomial degree 7
  • Laplacian Diffusion: 0.8 m^2/s
  • Simulation Time: 37 minutes
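The slide does not give the exact functional form of the anomaly; a minimal sketch, assuming a Gaussian warm blob centered in the 10,000 m cube (the amplitude, width, and center below are illustrative assumptions):

subroutine warm_bubble(theta, x, y, z, n)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: x(n), y(n), z(n)      ! node coordinates [m]
  real(8), intent(out) :: theta(n)              ! potential temperature anomaly [K]
  real(8), parameter   :: amp   = 1.0d0         ! assumed anomaly amplitude [K]
  real(8), parameter   :: width = 1000.0d0      ! assumed bubble half-width [m]
  real(8), parameter   :: xc = 5000.0d0, yc = 5000.0d0, zc = 2500.0d0  ! assumed center [m]
  real(8) :: r2
  integer :: i
  do i = 1, n
     r2 = (x(i) - xc)**2 + (y(i) - yc)**2 + (z(i) - zc)**2
     theta(i) = amp*exp(-r2/width**2)           ! warm anomaly decays smoothly to zero
  end do
end subroutine warm_bubble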
SLIDE 12

SLIDE 13

Thermal Bubble

UNM Xena - CPU: Intel Xeon, GPU: Tesla K40m

  • 1.2 million time steps
  • Wall time (CPU): 37 days
  • Wall time (GPU): 24 hours, 13 min
SLIDE 14

Across Architectures

Benchmarks for ForwardStepRK3 (Euler 3-D). Tests run with CUDA Fortran; polynomial degree = 7; Laplacian diffusion; 15x15x15 elements; 5 time steps (footprint: ~1.9 GB memory).

  GPU Model              CPU Model             Serial Time    GPU Runtime    Speedup
  Tesla K40m             Intel Xeon E5-2683    45.969 sec     1.282 sec      35.854x
  GeForce GTX TitanX     Intel Xeon E3-1285L   35.672 sec     1.159 sec      30.775x
  Tesla P100-SXM2-16GB   Power8NVL             49.588 sec     0.439 sec      112.913x

SLIDE 15

Going Forward

  • Initial development of a hybrid GPU-MPI code is underway (to improve weak scaling)
  • Use GPU-Direct technology to overcome the CPU-GPU copy (see the sketch after this list)
  • Continue to update the data structure layout and CUDA kernel implementation to improve memory access patterns on the GPU
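A minimal sketch of the GPU-Direct idea in CUDA Fortran, assuming a CUDA-aware MPI build; the buffer names and exchange pattern are illustrative, not SELF's actual halo exchange:

program gpudirect_exchange
  use cudafor
  use mpi
  implicit none
  integer, parameter :: n = 4096
  real(8), device, allocatable :: send_d(:), recv_d(:)   ! face data resident on the GPU
  integer :: rank, nranks, left, right, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  allocate(send_d(n), recv_d(n))
  send_d = real(rank, 8)
  right  = mod(rank + 1, nranks)
  left   = mod(rank - 1 + nranks, nranks)

  ! With GPU-Direct (CUDA-aware MPI) the device buffers are handed straight to
  ! MPI; without it, host copies would have to be staged around this call.
  call MPI_Sendrecv(send_d, n, MPI_DOUBLE_PRECISION, right, 0, &
                    recv_d, n, MPI_DOUBLE_PRECISION, left,  0, &
                    MPI_COMM_WORLD, status, ierr)

  call MPI_Finalize(ierr)
end program gpudirect_exchange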

SLIDE 16

The Scientific Codes - Higrad

  • The fluid dynamics core of Higrad solves the same set of equations (compressible Navier-Stokes) using a Finite Volume discretization (a generic sketch follows this list)
  • Atmospheric Modelling
  • Couples with other modules (FIRETEC and wildland fire modelling)
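A generic finite-volume update in one dimension, to illustrate the discretization style; this is not Higrad code, and the upwind flux for linear advection stands in for the compressible Navier-Stokes fluxes:

subroutine fv_advect(q, nc, a, dt, dx)
  implicit none
  integer, intent(in)    :: nc         ! number of cells
  real(8), intent(inout) :: q(0:nc+1)  ! cell averages, one ghost cell per side
  real(8), intent(in)    :: a, dt, dx  ! advection speed (a > 0), time step, cell width
  real(8) :: flux(1:nc+1)
  integer :: i
  do i = 1, nc + 1
     flux(i) = a*q(i-1)                ! upwind flux at the face between cells i-1 and i
  end do
  do i = 1, nc
     q(i) = q(i) - dt/dx*(flux(i+1) - flux(i))   ! cell average advanced by the flux difference
  end do
end subroutine fv_advect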

SLIDE 17

OpenCL, OpenMP, OpenACC, OpenMPI, CUDA Fortran

SLIDE 18

Progression

  • Bottom Up Approach
SLIDE 19

Progression (Continued)

SLIDE 20

Progression

  • GPU enabled with OpenACC
  • Memory handling with CUDA Fortran (see the sketch after this list)
  • Compute-intensive kernels currently on the GPU
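A minimal CUDA Fortran sketch of the memory-handling pattern implied here (assumed names; not Higrad source): device copies of the state are allocated once, the compute-intensive work stays on the GPU, and data returns to the host only for output.

module state_mod
  use cudafor
  implicit none
  real(8), allocatable         :: rho(:)     ! host copy, used only for I/O
  real(8), device, allocatable :: rho_d(:)   ! device-resident copy used by kernels
contains
  attributes(global) subroutine scale_kernel(a, n, factor)
    integer, value :: n
    real(8)        :: a(n)
    real(8), value :: factor
    integer :: i
    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    if (i <= n) a(i) = factor*a(i)             ! stand-in for a compute-intensive kernel
  end subroutine scale_kernel
end module state_mod

program memory_handling
  use cudafor
  use state_mod
  implicit none
  integer, parameter :: n = 1000000
  integer :: step

  allocate(rho(n), rho_d(n))
  rho   = 1.0d0
  rho_d = rho                                  ! one host-to-device copy up front

  do step = 1, 100                             ! the time loop never touches host memory
     call scale_kernel<<<(n + 255)/256, 256>>>(rho_d, n, 0.99d0)
  end do

  rho = rho_d                                  ! device-to-host copy only when output is needed
  print *, 'rho(1) =', rho(1)
end program memory_handling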

SLIDE 21

Going Forward (continued)

  • Higrad
    • Finish memory handling with CUDA Fortran
    • Scaling with GPU-aware MPI
    • CUDA implementation
  • Higrad/Firetec Gatlinburg Fire Simulation on Titan
    • Mission needs a 10x larger problem size (prior limit: 1.6 billion cells) and 10x faster runs

SLIDE 22

Where Are We Going With This?

  • Suppose we are working on a problem with 100,000 elements and need to perform 10,000,000 time steps (not unrealistic for scale-interaction problems). The runtimes would then be approximately:
    • T_serial = c_serial * (100,000) * (10,000,000) ≈ 4 years
    • T_ideal = c_ideal * (100,000) * (10,000,000) ≈ 3 months
    • T_gpu = c_gpu * (100,000) * (10,000,000) ≈ 1.8 months
  • The reduction in wall time for small problems translates to huge potential savings for larger problems! (A back-of-the-envelope check follows this list.)
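A back-of-the-envelope check of these numbers, with the per-element, per-time-step cost constants back-calculated from the slide's own estimates (the constants below are assumptions, not measurements):

program runtime_estimate
  implicit none
  real(8), parameter :: n_elements = 1.0d5, n_steps = 1.0d7
  real(8), parameter :: c_serial = 1.3d-4   ! sec per element per step, implied by ~4 years
  real(8), parameter :: c_ideal  = 8.0d-6   ! implied by ~3 months
  real(8), parameter :: c_gpu    = 4.8d-6   ! implied by ~1.8 months
  real(8), parameter :: day = 86400.0d0
  print *, 'serial:', c_serial*n_elements*n_steps/day/365.0d0, ' years'
  print *, 'ideal :', c_ideal*n_elements*n_steps/day/30.0d0,  ' months'
  print *, 'gpu   :', c_gpu*n_elements*n_steps/day/30.0d0,    ' months'
end program runtime_estimate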

SLIDE 23

Acknowledgements

  • This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
  • Special thanks to Fernanda Foertter (ORNL), Jeff Larkin (NVIDIA), David Norton (NVIDIA/PGI), Frank Winkler (ORNL), and Matt Otten (LLNL)

SLIDE 24

Questions?