HD GP-GPU Systems for HPC Applications: Engines | SAR | RF Amps


  1. HD GP-GPU Systems for HPC Applications: Engines | SAR | RF Amps. Sergio Tafur† & Christopher Kung‡. †Center for Computational Science | Section Head (Acting) Code 5594. ‡Productivity Enhancement, Technology Transfer and Training On-Site at NRL. DISTRIBUTION A. Approved for public release: distribution unlimited. GPU Technology Conference | April 2016. S. Tafur & C. Kung | DoDI 5230.24: Distribution Statement A. HD GP-GPU HPC. April 1, 2016

  2. 1 Introduction: Applications: SAR | RF Amps | RDEs; Y Objective Distributed Architecture (Y.O.D.A.). 2 Implementation: Benchmarked Performance; Challenges.

  4. Applications: Synthetic Aperture Radar | RF Amps | RDEs. Synthetic Aperture Radar: Investigating the Use of GPU-Accelerated Nodes for SAR Image Formation, IEEE Int. Conf. on Cluster Computing and Workshops, 1-8, 31, 2009.

  5. Applications: SAR | Radio Frequency Amplifiers | RDEs. RF Amplifiers: Simulation of Klystrons With Slow and Reflected Electrons Using Large-Signal Code TESLA, IEEE Transactions on Electron Devices, 54(6), 1555-1561, 2007.

  6. Applications: SAR | RF Amps | Rotating Detonation Engines. Rotating Detonation Engines: Thermodynamic Modeling of a Rotating Detonation Engine, AIAA Paper 2011-803, 49th AIAA Aerospace Sciences Meeting, 2011.

  8. Y Objective Distributed Architecture: Configuration | FDR InfiniBand Fat-Tree(ish)

  10. Y Objective Distributed Architecture: Hardware | Exxact Quantum TXR410-768R. Motherboard: 2x Intel E5-2600 v3; 4x PLX PEX 8747 switch. Configuration: 8x Titan Black; 128 GB DDR4 Memory. http://www.tyan.com/datasheets/DataSheet_FT77A-B7059.pdf http://tyan.com/manuals/FT77C-B7079_QIG.pdf

  12. Y Objective Distributed Architecture: Hardware | NVIDIA GK110 GTX TITAN Black. GPU Engine Specs: 2880 CUDA Cores / 960 DP Units; 889 MHz Base Clock; 980 MHz Boost Clock. Memory Specs: 7.0 Gbps Memory Clock; 6144 MB Standard Memory Config; 336 GB/sec Memory Bandwidth. https://forums.geforce.com/default/topic/531846/geforce-gtx-titan-is-here-/ http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/specifications
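
The 336 GB/sec figure follows from the 7.0 Gbps memory clock once a bus width is fixed. The sketch below assumes a 384-bit bus, taken from the six 64-bit memory controllers listed for the GK110 die elsewhere in the deck; the function name is illustrative and not from the slides.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin data rate (Gbit/s) times bus width (bits), over 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8.0


# Assumption: six 64-bit memory controllers give a 384-bit memory bus.
BUS_WIDTH_BITS = 6 * 64

bandwidth = memory_bandwidth_gb_s(7.0, BUS_WIDTH_BITS)
print(f"{bandwidth:.0f} GB/s")
```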

  13. Y Objective Distributed Architecture: Hardware | NVIDIA GK110. NVIDIA GK110 Die: 15 Streaming Multiprocessor (SMX) Architecture Units; six 64-bit Memory controllers; 3 SP cores / 1 DP Unit. http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

  14. Y Objective Distributed Architecture: Hardware | NVIDIA GK110. Nvidia Graphics Processing Unit GK110 w/ 15 SMX @ 250 W. Eight w/ 2880 SP cores @ 0.98/1.12 GHz: SP Perf: 8 x 2.8 TFlops; SP Eff: 11.3 GFlops/W. Eight w/ 960 DP cores @ 0.98/1.12 GHz: DP Perf: 8 x 1.11 TFlops; DP Eff: 4.44 GFlops/W. http://www.nvidia.com/content/pdf/kepler/nvidia-kepler-gk110-architecture-whitepaper.pdf http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/specifications
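
The efficiency figures on this slide are per-GPU throughput divided by the 250 W board power. A minimal sketch of that arithmetic follows. It assumes one operation per core per cycle at 0.98 GHz, which is what reproduces the slide's 2.8 TFlops; counting a fused multiply-add as two operations would double that number. The double-precision throughput is taken directly from the slide rather than derived.

```python
BOARD_POWER_W = 250.0  # per-GPU power quoted on the slide


def peak_gflops(cores: int, clock_ghz: float, ops_per_cycle: int = 1) -> float:
    """Peak throughput in GFlops. ops_per_cycle=1 is an assumption that matches the slide's numbers."""
    return cores * clock_ghz * ops_per_cycle


def gflops_per_watt(gflops: float, watts: float = BOARD_POWER_W) -> float:
    """Energy efficiency in GFlops per watt."""
    return gflops / watts


sp_gflops = peak_gflops(2880, 0.98)  # about 2822 GFlops, quoted as 2.8 TFlops
sp_eff = gflops_per_watt(sp_gflops)  # rounds to the slide's 11.3 GFlops/W
dp_eff = gflops_per_watt(1110.0)     # slide's 1.11 TFlops gives 4.44 GFlops/W

print(f"SP: {sp_gflops:.0f} GFlops, {sp_eff:.1f} GFlops/W; DP: {dp_eff:.2f} GFlops/W")
```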

  15. Y Objective Distributed Architecture: Hardware | NVIDIA GK110. NVIDIA GK110 Die: 15 Streaming Multiprocessor (SMX) Architecture Units; 1536 kB L2 Cache; six 64-bit Memory controllers; 3 SP cores / 1 DP Unit. http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

  16. Y Objective Distributed Architecture: Hardware | NVIDIA GK110. NVIDIA GK110 SMX Unit: 192 SP Cores; 64 DP Units; 64 kB on-chip memory, configurable as 48 kB shared / 16 kB L1, 16 kB shared / 48 kB L1, or 32 kB shared / 32 kB L1. http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf
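
The per-SMX counts tie back to the whole-GPU figures quoted for the GTX TITAN Black. A short check, assuming all 15 SMX units are enabled as the deck states; the constant names are illustrative.

```python
SMX_COUNT = 15    # SMX units per GK110 die, per the slides
SP_PER_SMX = 192  # single-precision cores per SMX
DP_PER_SMX = 64   # double-precision units per SMX

total_sp = SMX_COUNT * SP_PER_SMX      # whole-GPU SP core count
total_dp = SMX_COUNT * DP_PER_SMX      # whole-GPU DP unit count
sp_to_dp = SP_PER_SMX // DP_PER_SMX    # SP cores per DP unit

# Each listed (shared, L1) split in kB should use the full 64 kB of on-chip memory.
SPLITS_KB = [(48, 16), (16, 48), (32, 32)]
splits_ok = all(shared + l1 == 64 for shared, l1 in SPLITS_KB)

print(total_sp, total_dp, sp_to_dp, splits_ok)
```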

  17. Y Objective Distributed Architecture: Expected Performance. GTX Titan Black: Single Precision 2.8 TFlops; Double Precision 922 GFlops. Server: Single Precision 22.4 TFlops; Double Precision 7.4 TFlops. Y.O.D.A.: Single Precision 1.4 PFlops; Double Precision 477 TFlops.
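
The server figures are the per-GPU figures multiplied by the eight GPUs in each chassis. The sketch below reads the per-GPU double-precision value as 922 GFlops, the only reading consistent with the 7.4 TFlops server figure. The deck does not state how many servers make up Y.O.D.A., so the code stops at the server level.

```python
GPUS_PER_SERVER = 8

per_gpu_sp_tflops = 2.8    # single precision, per GPU
per_gpu_dp_tflops = 0.922  # double precision, per GPU (922 GFlops)


def server_tflops(per_gpu_tflops: float, gpus: int = GPUS_PER_SERVER) -> float:
    """Expected peak for one server, assuming ideal scaling across its GPUs."""
    return per_gpu_tflops * gpus


server_sp = server_tflops(per_gpu_sp_tflops)  # 22.4 TFlops
server_dp = server_tflops(per_gpu_dp_tflops)  # 7.376 TFlops, quoted as 7.4

print(f"Server SP: {server_sp:.1f} TFlops, DP: {server_dp:.1f} TFlops")
```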

  19. Y Objective Distributed Architecture: Benchmarked Performance: 8 GPUs. CuBLAS - S | D | C | Z - GEMM
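
GEMM benchmarks of this kind are usually reported as a rate derived from a nominal operation count. The sketch below uses the common convention of 2·m·n·k operations for real (S, D) and 8·m·n·k for complex (C, Z) matrices. It is an assumption that the plotted results follow this convention, and the function names are illustrative rather than part of cuBLAS or MAGMA.

```python
def gemm_flops(m: int, n: int, k: int, is_complex: bool = False) -> float:
    """Nominal operation count for C = alpha*A*B + beta*C with A (m x k) and B (k x n)."""
    per_element = 8.0 if is_complex else 2.0  # complex multiply-add counted as 8 real ops
    return per_element * m * n * k


def gemm_gflops(m: int, n: int, k: int, seconds: float, is_complex: bool = False) -> float:
    """Achieved rate in GFlops given a measured wall-clock time in seconds."""
    return gemm_flops(m, n, k, is_complex) / seconds / 1e9


# Hypothetical timing, for illustration only: a 1000^3 DGEMM that took 2 s.
print(gemm_gflops(1000, 1000, 1000, seconds=2.0))
```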

  20. Y Objective Distributed Architecture: Benchmarked Performance: 8 GPUs. MAGMA - S | D | C | Z - GEMM

  21. Y Objective Distributed Architecture: Benchmarked Performance: 8 GPUs. MAGMA - S | D | C | Z - GESV
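
GESV solves a dense linear system by LU factorization followed by triangular solves. A leading-order operation count is sketched below: about (2/3)·n³ for the factorization plus 2·n² per right-hand side, with complex (C, Z) arithmetic counted at roughly four times the real cost. These are estimates for interpreting the plots, not the exact counts any particular tester uses.

```python
def gesv_flops(n: int, nrhs: int = 1, is_complex: bool = False) -> float:
    """Leading-order operation count for solving an n x n system with nrhs right-hand sides."""
    real_ops = 2.0 * n**3 / 3.0 + 2.0 * n**2 * nrhs  # LU factorization + triangular solves
    return 4.0 * real_ops if is_complex else real_ops


def gesv_gflops(n: int, seconds: float, nrhs: int = 1, is_complex: bool = False) -> float:
    """Achieved rate in GFlops given a measured solve time in seconds."""
    return gesv_flops(n, nrhs, is_complex) / seconds / 1e9


# Hypothetical timing, for illustration only.
print(gesv_gflops(n=20000, seconds=10.0))
```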

  22. Y Objective Distributed Architecture: Benchmarked Performance: 2 GPUs. MAGMA - S | D | C | Z - GESV

  23. Y Objective Distributed Architecture: Benchmarked Performance: 4 GPUs. MAGMA - S | D | C | Z - GESV

  24. Y Objective Distributed Architecture: Benchmarked Performance: 8 GPUs. MAGMA - S | D | C | Z - GESV

  25. Y Objective Distributed Architecture: Benchmarked Performance: 2/4/8 GPUs. MAGMA - Z - GESV

  26. Y Objective Distributed Architecture: Benchmarked Single GPU Performance: HPL Top 500 Run

  27. Y Objective Distributed Architecture: Benchmarked Aggregate Performance: HPL Top 500 Run
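
HPL reports its rate from a fixed operation count of (2/3)·N³ + (3/2)·N² for a problem of order N, and the result is commonly compared against theoretical peak as an efficiency ratio. The sketch below shows both calculations. The transcript does not preserve the measured values, so the inputs are hypothetical placeholders and the function names are illustrative.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """HPL rate in GFlops for problem order n solved in the given wall-clock time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def hpl_efficiency(measured_tflops: float, peak_tflops: float) -> float:
    """Fraction of theoretical peak achieved (measured rate over peak rate)."""
    return measured_tflops / peak_tflops


# Hypothetical inputs, for illustration only; not results from the deck.
rate = hpl_gflops(n=50000, seconds=120.0)
print(f"{rate:.1f} GFlops")
print(f"{hpl_efficiency(3.7, 7.4):.2f} of peak")
```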