  1. HiFUN on NVIDIA GPU: Porting a Scalable Parallel CFD Application
     D. V. Krishnababu, N. Munikrishna, Nikhil Vijay Shende (1), N. Balakrishnan (2), Thejaswi Rao (3)
     1. S & I Engineering Solutions Pvt. Ltd., Bangalore, India
     2. Aerospace Engineering, Indian Institute of Science, Bangalore, India
     3. NVIDIA Graphics Pvt. Ltd., Bangalore, India
     GPU Technology Conference, Silicon Valley, March 26–29, 2018

  2. Introduction (http://www.sandi.co.in)
     The HiFUN software: High resolution Flow solver on Unstructured meshes. A Computational Fluid Dynamics (CFD) flow solver and the primary product of the company SandI. A robust, fast, accurate and efficient tool.
     About SandI: A technology company incubated from the Indian Institute of Science, Bangalore. Promotes high-end CFD technologies with uncompromising quality standards.

  4. Features of HiFUN (http://www.sandi.co.in/home/products): General.

  5. Features of HiFUN (http://www.sandi.co.in/home/products): Well validated. AIAA DPW, SPICES, AIAA HiLiftPW.

  6. Features of HiFUN (http://www.sandi.co.in/home/products): Super scalable.
     Workload: 165 million volumes.

     Simulation   CPU Cores   Time (Hours / Days)
     RANS         256         30 / 1.25
     RANS         10000       1
     URANS        256         108 / 4.5
     URANS        10000       3
     DES          256         525 / 22
     DES          10000       15

  7. SandI–NVIDIA Collaboration
     2014: Joint development initiative kicks off.
     2015: NVIDIA Innovation Award.
     2016: GTC 2016 poster presentation; GTCx Mumbai; HiFUN in the GPU Apps Catalogue.
     2018: GTC 2018; HiFUN on NVIDIA Pascal and Volta GPUs.
     Way ahead: NVLink with IBM Power CPU.

  8. HiFUN on NVIDIA GPU
     Hybrid supercomputers: Consist of CPUs and NVIDIA GPUs. Less power to achieve the same FLOPS; less cooling and space.
     GPU: Thousands of computing cores sharing the same RAM. Higher memory bandwidth. High data-transfer overheads with the CPU.

  10. HiFUN on NVIDIA GPU
      Parallelization model on GPU: Shared memory. Many FLOPS per byte of data moved from CPU to GPU. Requires a re-look at the parallelization of CFD algorithms.
      Parallelization challenges: General-purpose algorithms. Implicit schemes: global data dependence. Complex multi-layered unstructured data structures.

  12. HiFUN on NVIDIA GPU
      Constraints: No compromise on distributed-memory scalability. Source-code maintainability should not suffer. Software portability should not suffer.
      Parallel strategy: Accelerate single-node performance via an offload model. Hybrid: MPI plus OpenACC directives.
      Offload model: The computationally intensive part is offloaded to the GPU. Optimal data communication between CPU and GPU.

  15. HiFUN on NVIDIA GPU
      Configurations: Onera M6 Wing, NASA CRM, NASA Trap Wing.
      Workloads (million volumes): Onera M6 Wing: 1.1, 9.3, 12.12, 15.4. NASA CRM: 6.2, 26.5, 30. NASA Trap Wing: 20, 66.
      Simulation type: Steady RANS simulations.

  17. HiFUN on NVIDIA GPU
      Computing platform: NVIDIA PSG.
      Node configuration: Two hexadeca-core (16-core) Intel Xeon Haswell processors. Eight NVIDIA Tesla K80 GPUs. GPU memory: 12 GB. Total CPU memory per node: 256 GB. InfiniBand interconnect.
      Software: PGI compiler 16.7, OpenMPI 1.10.2, OpenACC 2.0.

  19. HiFUN on NVIDIA GPU: Parallel performance parameters
      Ideal speed-up: Ratio of the number of nodes used for a given run to the reference number of nodes.
      Actual speed-up: Ratio of the time per iteration using the reference number of nodes to the time per iteration using the number of nodes for the given run.
      Accelerator speed-up: Ratio of the time per iteration obtained using a given number of CPUs to the time per iteration obtained using the same number of CPUs working in tandem with GPUs.

  20. HiFUN on NVIDIA GPU: Single-node performance
      [Figure: Accelerator speed-up on 2 GPUs]
      Observations: Increasing the grid size increases GPU utilization and accelerator speed-up. It is important to load the GPU completely.

  21. HiFUN on NVIDIA GPU: Single-node performance
      [Figure: Percentage increase in accelerator speed-up with varying number of GPUs]
      Observations: Increasing the number of GPUs increases the accelerator speed-up. Using 4 GPUs per node is optimal.

  22. HiFUN on NVIDIA GPU: Single-node performance
      [Figure: Time to RANS solution (hours)]
      Observations: Time to solution on the 1-million-volume grid is about 15 minutes; on the 30-million-volume grid, about half a day. A single node serves as a desktop supercomputer.

  23. HiFUN on NVIDIA GPU: Multi-node performance
      [Figure: Parallel speed-up, 66-million-volume workload]
      Observations: Near-linear speed-up using 2 GPUs per node. Speed-up drops for larger numbers of nodes and/or more GPUs per node due to lower GPU utilization.

  24. HiFUN on NVIDIA GPU: Multi-node performance
      [Figure: Normalized time per iteration, 66-million-volume workload]
      Observations: Time per iteration drops as the number of nodes and/or GPUs increases. Time to solution with 8 nodes is about 4 hours.

  25. HiFUN on NVIDIA GPU: Concluding remarks
      An offload model was used to port HiFUN to the GPU.
      A GPU-based computing node is powerful enough to serve as a desktop supercomputer.
      HiFUN is well suited to solving grand-challenge problems on GPU-based hybrid supercomputers.
      An OpenACC-directives-based offload model is an attractive option for porting legacy CFD codes to GPUs.
