HD GP-GPU Systems for HPC Applications: Engines | SAR | RF Amps



SLIDE 1

Introduction Implementation

HD GP-GPU Systems for HPC Applications:

Engines | SAR | RF Amps Sergio Tafur†, & Christopher Kung‡

†Center for Computational Science | Section Head (Acting), Code 5594. ‡Productivity Enhancement, Technology Transfer and Training, On-Site at NRL. DISTRIBUTION A. Approved for public release: distribution unlimited.

GPU Technology Conference | April 2016

April 1, 2016

  • S. Tafur, & C. Kung | DoDI 5230.24: Distribution Statement A.

HD GP-GPU HPC

SLIDE 2

Introduction Implementation

1. Introduction
   • Applications: SAR | RF Amps | RDEs
   • Y Objective Distributed Architecture (Y.O.D.A.)

2. Implementation
   • Benchmarked Performance
   • Challenges

SLIDE 4

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Applications

Synthetic Aperture Radar | RF Amps | RDEs

Synthetic Aperture Radar

Investigating the Use of GPU-Accelerated Nodes for SAR Image Formation, IEEE Int. Conf. on Cluster Computing and Workshops, 1-8, 31, 2009.

SLIDE 5

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Applications

SAR | Radio Frequency Amplifiers | RDEs

RF Amplifiers

Simulation of Klystrons With Slow and Reflected Electrons Using Large-Signal Code TESLA, IEEE Transactions on Electron Devices, 54(6), 1555-1561, 2007.

SLIDE 6

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Applications

SAR | RF Amps | Rotating Detonation Engines

Rotating Detonation Engines

Thermodynamic Modeling of a Rotating Detonation Engine, AIAA Paper 2011-803, 49th AIAA Aerospace Sciences Meeting, 2011.

SLIDE 7

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

1. Introduction
   • Applications: SAR | RF Amps | RDEs
   • Y Objective Distributed Architecture (Y.O.D.A.)

2. Implementation
   • Benchmarked Performance
   • Challenges

SLIDE 8

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Configuration | FDR Infiniband Fat-Tree(ish)

SLIDE 10

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Hardware | Exxact Quantum TXR410-768R

Motherboard: 2x Intel E5-2600 v3, 4x PLX PEX 8747 PCIe switches
Configuration: 8x GTX Titan Black, 128 GB DDR4 memory

http://www.tyan.com/datasheets/DataSheet_FT77A-B7059.pdf http://tyan.com/manuals/FT77C-B7079_QIG.pdf

SLIDE 12

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Hardware | NVIDIA GK110

GTX TITAN Black GPU engine specs:
  • 2880 CUDA cores / 960 DP units
  • 889 MHz base clock
  • 980 MHz boost clock
GTX TITAN Black memory specs:
  • 7.0 Gbps memory clock
  • 6144 MB standard memory config
  • 336 GB/s memory bandwidth

https://forums.geforce.com/default/topic/531846/geforce-gtx-titan-is-here-/ http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/specifications

SLIDE 13

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Hardware | NVIDIA GK110

NVIDIA | GK110 die:
  • 15 Streaming Multiprocessor (SMX) units
  • six 64-bit memory controllers
  • 3 SP cores per DP unit

http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

SLIDE 14

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Hardware | NVIDIA GK110

NVIDIA GK110 GPU, 15 SMX @ 250 W:
  • eight GPUs w/ 2880 SP cores @ 0.98/1.12 GHz | SP perf: 8 x 2.8 TFlops | SP eff: 11.3 GFlops/W
  • eight GPUs w/ 960 DP units @ 0.98/1.12 GHz | DP perf: 8 x 1.11 TFlops | DP eff: 4.44 GFlops/W

http://www.nvidia.com/content/pdf/kepler/nvidia-kepler-gk110-architecture-whitepaper.pdf http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/specifications

SLIDE 15

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Hardware | NVIDIA GK110

NVIDIA | GK110 die:
  • 15 Streaming Multiprocessor (SMX) units
  • 1536 kB L2 cache
  • six 64-bit memory controllers
  • 3 SP cores per DP unit

http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

SLIDE 16

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Hardware | NVIDIA GK110

NVIDIA | GK110 SMX unit:
  • 192 SP cores, 64 DP units
  • 64 kB on-chip memory, configurable as:
    - 48 kB shared / 16 kB L1
    - 16 kB shared / 48 kB L1
    - 32 kB shared / 32 kB L1

http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

SLIDE 17

Introduction Implementation | SAR | RF Amps | RDEs | Y Objective Distributed Architecture

Y Objective Distributed Architecture

Expected Performance

GTX Titan Black
  • Single Precision: 2.8 TFlops
  • Double Precision: 922 GFlops
Server
  • Single Precision: 22.4 TFlops
  • Double Precision: 7.4 TFlops
Y.O.D.A.
  • Single Precision: 1.4 PFlops
  • Double Precision: 477 TFlops
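The peak figures above follow directly from the unit counts and clocks on the preceding slides. A quick sanity check (a sketch, assuming the deck's convention of one operation per unit per cycle at the 980 MHz boost clock; the DP result lands near, but not exactly on, the 922 GFlops quoted, suggesting a slightly lower effective clock was used there):

```python
# Peak throughput from unit counts x clock, one op per unit per cycle.
SP_CORES = 2880          # CUDA cores per GTX Titan Black
DP_UNITS = 960           # double-precision units per GPU
BOOST_GHZ = 0.980        # boost clock, GHz
GPUS_PER_SERVER = 8

sp_gpu = SP_CORES * BOOST_GHZ            # GFLOPS, SP, one GPU (~2.8 TFLOPS)
dp_gpu = DP_UNITS * BOOST_GHZ            # GFLOPS, DP, one GPU (~0.94 TFLOPS)
sp_server = sp_gpu * GPUS_PER_SERVER     # ~22.6 TFLOPS SP per server
dp_server = dp_gpu * GPUS_PER_SERVER     # ~7.5 TFLOPS DP per server

print(f"GPU: SP {sp_gpu / 1e3:.2f} TFLOPS, DP {dp_gpu:.0f} GFLOPS")
print(f"Server: SP {sp_server / 1e3:.1f} TFLOPS, DP {dp_server / 1e3:.1f} TFLOPS")
```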

SLIDE 18

Introduction Implementation Benchmarks Challenges

1. Introduction
   • Applications: SAR | RF Amps | RDEs
   • Y Objective Distributed Architecture (Y.O.D.A.)

2. Implementation
   • Benchmarked Performance
   • Challenges

SLIDE 19

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Performance: 8 GPUs, cuBLAS - S | D | C | Z - GEMM

SLIDE 20

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Performance: 8 GPUs MAGMA - S | D | C | Z - GEMM

SLIDE 21

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Performance: 8 GPUs MAGMA - S | D | C | Z - GESV

SLIDE 22

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Performance: 2 GPUs MAGMA - S | D | C | Z - GESV

SLIDE 23

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Performance: 4 GPUs MAGMA - S | D | C | Z - GESV

SLIDE 24

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Performance: 8 GPUs MAGMA - S | D | C | Z - GESV

SLIDE 25

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Performance: 2/4/8 GPUs MAGMA - Z - GESV

SLIDE 26

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Single GPU Performance: HPL Top 500 Run

SLIDE 27

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Benchmarked Aggregate Performance: HPL Top 500 Run

SLIDE 28

Introduction Implementation Benchmarks Challenges

1. Introduction
   • Applications: SAR | RF Amps | RDEs
   • Y Objective Distributed Architecture (Y.O.D.A.)

2. Implementation
   • Benchmarked Performance
   • Challenges

SLIDE 29

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

Power Distribution

  • PDUs (~30 kW/rack): two 50 A (208 V) 3-phase
  • Overloading breakers: (2+1) 2.4 kW power supplies
  • Load balancing: 1 TXR410-768R power supply per phase
  • Max power supplies: five per 15 A breaker

SLIDE 30

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

HPL Benchmark Matrix Decomposition

CPU Affinity | nvidia-smi topo -m | numactl --physcpubind=<ids>

        GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7   mlx4_0  CPU Affinity
GPU0     X     PIX    PHB    PHB    SOC    SOC    SOC    SOC    SOC     0-5,12-17
GPU1    PIX     X     PHB    PHB    SOC    SOC    SOC    SOC    SOC     0-5,12-17
GPU2    PHB    PHB     X     PIX    SOC    SOC    SOC    SOC    SOC     0-5,12-17
GPU3    PHB    PHB    PIX     X     SOC    SOC    SOC    SOC    SOC     0-5,12-17
GPU4    SOC    SOC    SOC    SOC     X     PIX    PHB    PHB    PHB     6-11,18-23
GPU5    SOC    SOC    SOC    SOC    PIX     X     PHB    PHB    PHB     6-11,18-23
GPU6    SOC    SOC    SOC    SOC    PHB    PHB     X     PIX    PHB     6-11,18-23
GPU7    SOC    SOC    SOC    SOC    PHB    PHB    PIX     X     PHB     6-11,18-23
mlx4_0  SOC    SOC    SOC    SOC    PHB    PHB    PHB    PHB     X
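Given this topology, processes driving GPUs 0-3 belong on socket 0 (cores 0-5,12-17) and those driving GPUs 4-7 on socket 1 (cores 6-11,18-23); numactl --physcpubind does the pinning at launch. A Python sketch of the same idea using only the standard library (Linux-only; the core sets are taken from the table above, and the helper name is ours, not the deck's):

```python
import os

# Socket-local core sets from the `nvidia-smi topo -m` table above.
GPU_AFFINITY = {
    0: {0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17},    # GPUs 0-3
    1: {6, 7, 8, 9, 10, 11, 18, 19, 20, 21, 22, 23},  # GPUs 4-7
}

def pin_to_gpu_socket(gpu_id):
    """Restrict the current process to cores NUMA-local to gpu_id."""
    wanted = GPU_AFFINITY[0] if gpu_id < 4 else GPU_AFFINITY[1]
    # Intersect with what this host actually allows, so the sketch
    # also runs on machines with fewer cores.
    allowed = wanted & os.sched_getaffinity(0)
    if allowed:
        os.sched_setaffinity(0, allowed)
    return os.sched_getaffinity(0)
```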

SLIDE 31

Introduction Implementation Benchmarks Challenges

Y Objective Distributed Architecture

HPL Benchmark Matrix Decomposition

Performance   Problem Size   Block Size   Rows   Columns
Bad           439,547        768          15     32
Good          491,520        384          24     20
Best          549,504        128          24     20

  • Matrix distribution close to a perfect square | P×Q: 8 GPUs × 60 motherboards
  • Large problem size | N = √(GPUMem × P×Q / 8) ≈ 621,729 (GPUMem in bytes; 8 bytes per double-precision entry)
  • Small block sizes that are a factor of N
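The 621,729 figure can be reproduced from the usual HPL sizing rule: pick N so that the N×N double-precision matrix roughly fills aggregate GPU memory. A sketch (assuming 6 GiB per Titan Black and the 8 GPUs × 60 motherboards from the slide):

```python
import math

BYTES_PER_DOUBLE = 8
GPU_MEM_BYTES = 6 * 2**30     # 6 GiB per GTX Titan Black
GPUS = 8 * 60                 # P x Q: 8 GPUs x 60 motherboards

# Largest N such that the N x N double-precision HPL matrix fits
# in aggregate GPU memory.
n_max = math.isqrt(GPUS * GPU_MEM_BYTES // BYTES_PER_DOUBLE)
print(n_max)   # 621729, matching the slide
```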

SLIDE 32

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

Examples

For CPU Operations:

import time
import numpy as np

dtype = np.complex64          # or np.complex128
axes = (0, 1)
shape = (1024, 1024)          # or 2048, 4096, 8192
data = np.random.normal(size=shape).astype(dtype)

t_start = time.time()
np.fft.fftn(data, axes=axes)
t_cpu_fft = time.time() - t_start

t_start = time.time()
np.fft.fftshift(data, axes=axes)
t_cpu_shift = time.time() - t_start

t_start = time.time()
data_ref = np.fft.fftn(data, axes=axes)
data_ref = np.fft.fftshift(data_ref, axes=axes)
t_cpu_all = time.time() - t_start

http://reikna.publicfields.net/en/latest/api/computations.html#

SLIDE 33

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For CPU Operations @ 1024x1024:

import time
import numpy as np
# (... data prep ...)

t_start = time.time()
np.fft.fftn(data, axes=axes)
t_cpu_fft = time.time() - t_start

t_start = time.time()
np.fft.fftshift(data, axes=axes)
t_cpu_shift = time.time() - t_start

t_start = time.time()
data_ref = np.fft.fftn(data, axes=axes)
data_ref = np.fft.fftshift(data_ref, axes=axes)
t_cpu_all = time.time() - t_start

t_cpu_fft: 0.0365 | t_cpu_shift: 0.0044 | t_cpu_all: 0.0504

http://reikna.publicfields.net/en/latest/api/computations.html#
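One caveat on timings like these: single-shot time.time() deltas are noisy at millisecond scales. A small helper (not from the deck) that repeats the measurement and keeps the best run gives steadier numbers:

```python
import time

def bench(fn, *args, reps=5):
    """Best-of-reps wall-clock time of fn(*args), in seconds."""
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best
```

For example, t_cpu_fft = bench(np.fft.fftn, data) in place of the one-shot timing.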

SLIDE 34

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations:

import time
import numpy as np
from reikna.cluda import any_api, cuda_api, find_devices
from reikna.fft import FFT, FFTShift
import reikna.cluda.dtypes as dtypes
from reikna.core import Transformation, Parameter, Annotation, Type
# (... data prep: create the CLUDA thread `thr` and the `data` array ...)

fft = FFT(data, axes=axes)
fftc = fft.compile(thr)
shift = FFTShift(data, axes=axes)
shiftc = shift.compile(thr)
data_dev = thr.to_device(data)
# (... calculate and get data ...)

http://reikna.publicfields.net/en/latest/api/computations.html#

SLIDE 35

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations @ 1024x1024:

# (... time & np imports ...)
from reikna.cluda import any_api, cuda_api, find_devices
from reikna.fft import FFT, FFTShift
import reikna.cluda.dtypes as dtypes
from reikna.core import Transformation, Parameter, Annotation, Type
# (... data prep ...)
# (... compile GPU funcs & send data_dev ...)

t_start = time.time()
fftc(data_dev, data_dev)
thr.synchronize()
t_gpu_fft = time.time() - t_start

t_start = time.time()
shiftc(data_dev, data_dev)
thr.synchronize()
t_gpu_shift = time.time() - t_start

t_start = time.time()
fftc(data_dev, data_dev)
shiftc(data_dev, data_dev)
thr.synchronize()
t_gpu_separate = time.time() - t_start

data_gpu = data_dev.get()

t_gpu_fft: 0.0012 | t_gpu_shift: 0.0002 | t_gpu_separate: 0.0013

http://reikna.publicfields.net/en/latest/api/computations.html#

SLIDE 36

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations @ 1024x1024:

# (... time & np imports, reikna imports as before ...)
# (... data prep ...)
# (... compile GPU funcs & send data_dev ...)

res_dev = thr.empty_like(data_dev)
shift_tr = fftshift(data, axes=axes)   # transformation-based shift (reikna examples)
fft2 = fft.parameter.output.connect(shift_tr, shift_tr.input,
                                    new_output=shift_tr.output)
fft2c = fft2.compile(thr)

t_start = time.time()
fft2c(res_dev, data_dev)
thr.synchronize()
t_gpu_combined = time.time() - t_start

data_gpu2 = res_dev.get()   # the combined result lands in res_dev

t_gpu_fft: 0.0012 | t_gpu_shift: 0.0002 | t_gpu_combined: 0.0010

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 37

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations @ 4096x4096:

(... same combined FFT+shift code as on the previous slide ...)

t_cpu_fft: 0.7720 | t_cpu_shift: 0.0645
t_gpu_fft: 0.0151 | t_gpu_shift: 0.0032
t_gpu_combined: 0.01481

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 38

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations @ 4096x4096:

(... same combined FFT+shift code as on the previous slides ...)

speedup_fft: 51.2420 | speedup_shift: 19.7710
separate_speedup: 61.5952 | combined_speedup: 74.9029

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 39

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations @ 4096x4096 on yoda:

(... same combined FFT+shift code as on the previous slides ...)

t_cpu_fft: 1.3697 | t_cpu_shift: 0.1343
t_gpu_fft: 0.01269 | t_gpu_shift: 0.0015
t_gpu_combined: 0.0128

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 40

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations @ 4096x4096 on yoda:

(... same combined FFT+shift code as on the previous slides ...)

speedup_fft: 107.9372 | speedup_shift: 86.0241
separate_speedup: 140.9891 | combined_speedup: 153.7552

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 41

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For GPU Operations @ 8192x8192 on yoda:

(... same combined FFT+shift code as on the previous slides ...)

speedup_fft: 118.3925 | speedup_shift: 94.6785
separate_speedup: 140.9891 | combined_speedup: 166.1680

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 42

Introduction Implementation Benchmarks Challenges

Split-Step Schrödinger Eqn.

Position:
$i\hbar\,\partial_t \Psi(x,t) = \left[ -\frac{\hbar^2}{2m}\,\partial_{xx} + V(x,t) \right] \Psi(x,t)
\;\Rightarrow\; \Psi(x, t+\delta t) = \Psi(x,t)\, e^{-\frac{i}{\hbar} V(x)\,\delta t}$

Momentum:
$i\hbar\,\partial_t \tilde{\Psi}(k,t) = \left[ \frac{\hbar^2}{2m} k^2 + V(i\partial_k) \right] \tilde{\Psi}(k,t)
\;\Rightarrow\; \tilde{\Psi}(k, t+\delta t) = \tilde{\Psi}(k,t)\, e^{-\frac{i\hbar}{2m} k^2\,\delta t}$

https://jakevdp.github.io/blog/2012/09/05/quantum-python/

SLIDE 44

Introduction Implementation Benchmarks Challenges

Split-Step Schrödinger Eqn.

https://jakevdp.github.io/blog/2012/09/05/quantum-python/

Advance position by half step:
$\Psi(x_n, t + \tfrac{\delta t}{2}) = \Psi(x_n, t)\, e^{-\frac{i}{\hbar} V(x_n) \frac{\delta t}{2}}$

Fourier transform:
$\tilde{\Psi}(k_m, t + \tfrac{\delta t}{2}) = \frac{\delta x}{\sqrt{2\pi}} \sum_{n=0}^{N-1} \Psi(x_n, t + \tfrac{\delta t}{2})\, e^{-i k_m x_n}$

Advance momentum by full step:
$\tilde{\Psi}(k_m, t + \tfrac{3\delta t}{2}) = \tilde{\Psi}(k_m, t + \tfrac{\delta t}{2})\, e^{-\frac{i\hbar}{2m} k_m^2\,\delta t}$

Inverse Fourier transform:
$\Psi(x_n, t + \tfrac{3\delta t}{2}) = \frac{\sqrt{2\pi}}{N\,\delta x} \sum_{m=0}^{N-1} \tilde{\Psi}(k_m, t + \tfrac{3\delta t}{2})\, e^{i k_m x_n}$

Advance position by half step:
$\Psi(x_n, t + 2\delta t) = \Psi(x_n, t + \tfrac{3\delta t}{2})\, e^{-\frac{i}{\hbar} V(x_n) \frac{\delta t}{2}}$
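The steps above map almost line-for-line onto numpy's FFT. A minimal sketch (not the deck's code; the grid size, wave packet, and harmonic potential are illustrative choices), evolving one step and preserving the norm as a unitary update should:

```python
import numpy as np

def split_step(psi, V, dx, dt, hbar=1.0, m=1.0):
    """One split-operator update: half step in V, full step in k, half step in V."""
    k = 2 * np.pi * np.fft.fftfreq(len(psi), d=dx)     # momentum grid k_m
    half_v = np.exp(-1j * V * dt / (2 * hbar))         # e^{-(i/hbar) V dt/2}
    full_k = np.exp(-1j * hbar * k**2 * dt / (2 * m))  # e^{-(i hbar/2m) k^2 dt}
    psi = half_v * psi          # advance position by half step
    psi_k = np.fft.fft(psi)     # Fourier transform
    psi_k *= full_k             # advance momentum by full step
    psi = np.fft.ifft(psi_k)    # inverse Fourier transform
    return half_v * psi         # advance position by half step

# Normalized Gaussian packet in a harmonic potential (illustrative setup).
N, dx, dt = 1024, 0.1, 0.01
x = (np.arange(N) - N // 2) * dx
psi = np.exp(-x**2 + 2j * x)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)
V = 0.5 * x**2
psi = split_step(psi, V, dx, dt)   # norm is conserved to rounding error
```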

SLIDE 51

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For CPU Operations @ N = 2 ** 11:

# (numpy, matplotlib, scipy imports)
# (initialize Ψ(x_n,t), Ψ̃(k_m,t), and V(x_n))
# (time stepping, iterated 50 times per frame)
self.dt = dt
if Nsteps > 0:
    self.psi_mod_x *= self.x_evolve_half
for i in range(Nsteps - 1):
    self.compute_k_from_x()        # FFT
    self.psi_mod_k *= self.k_evolve
    self.compute_x_from_k()        # IFFT
    self.psi_mod_x *= self.x_evolve
self.compute_k_from_x()            # FFT
self.psi_mod_k *= self.k_evolve
self.compute_x_from_k()            # IFFT
self.psi_mod_x *= self.x_evolve_half
self.compute_k_from_x()            # FFT
self.t += dt * Nsteps

cpu_cpu_iteration: 0.0126 | cpu_gpu_iteration: 0.3569 | gpu_gpu_iteration: 0.6801

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 52

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For CPU Operations @ N = 2 ** 22:

(... same time-stepping loop as on the previous slide ...)

cpu_cpu_iteration: 22.1887 | gpu_gpu_iteration: 9.0297 | gpu_gpu_speedup: 2.4573

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 53

Introduction Implementation Benchmarks Challenges

FFTs, CPUs, & GPUs

For CPU Operations @ N = 2 ** 25:

(... same time-stepping loop as above ...)

cpu_cpu_iteration: 212.7478 | gpu_gpu_iteration: 58.6416 | gpu_gpu_speedup: 3.6279

https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 54

Introduction Implementation Benchmarks Challenges

Examples

Spectrogram

from pylab import *

dt = 0.0005
t = arange(0.0, 20.0, dt)
s1 = sin(2*pi*100*t)          # signal 1
s2 = 2*sin(2*pi*400*t)        # signal 2
mask = where(logical_and(t > 10, t < 12), 1.0, 0.0)
s2 = s2 * mask                # masked chirp
nse = 0.01*randn(len(t))      # noise
x = s1 + s2 + nse             # total signal
NFFT = 1024                   # window length
Fs = int(1.0/dt)              # sampling frequency

ax1 = subplot(211)
plot(t, x)
subplot(212, sharex=ax1)
# Pxx: (segments x freqs) array; freqs: frequency vector
Pxx, freqs, bins, im = specgram(x, NFFT=NFFT, Fs=Fs,
                                noverlap=900, cmap=cm.gist_heat)
show()

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
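Under the hood, specgram is a windowed short-time FFT over overlapping segments. A numpy-only sketch of that computation (a hypothetical helper, not matplotlib's implementation):

```python
import numpy as np

def stft_power(x, nfft=256, hop=128, fs=2000):
    """Windowed short-time power spectrum: (frames x freqs) array."""
    win = np.hanning(nfft)
    starts = range(0, len(x) - nfft + 1, hop)
    frames = np.stack([x[s:s + nfft] * win for s in starts])
    Pxx = np.abs(np.fft.rfft(frames, axis=1))**2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return Pxx, freqs

# A pure 100 Hz tone should peak in the frequency bin nearest 100 Hz.
fs = 2000
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 100 * t)
Pxx, freqs = stft_power(x, fs=fs)
peak_hz = freqs[Pxx.mean(axis=0).argmax()]
```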

SLIDE 55

Introduction Implementation Benchmarks Challenges

Examples

Spectrogram

(... same spectrogram script as on the previous slide, with one extra term ...)

import happy as hp
# ...
x = s1 + s2 + hp.smile(t) + nse   # total signal

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 56

Introduction Implementation Benchmarks Challenges

Examples

Spectrogram

(... same spectrogram script as above ...)

from reikna.cluda import any_api
from reikna.cluda import dtypes, functions
from reikna.fft import FFT
from reikna.core import Computation, Transformation, Parameter, Annotation, Type
from reikna.algorithms import Transpose
import reikna.transformations as transformations

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 57

(import ...)

dt = 0.0005
t = arange(0.0, 20.0, dt)
s1 = sin(2*pi*100*t)                 # Signal 1
s2 = 2*sin(2*pi*400*t)               # Signal 2
mask = where(logical_and(t > 10, t < 12), 1.0, 0.0)
s2 = s2 * mask                       # masked chirp
nse = 0.01*randn(len(t))             # Noise
x = s1 + s2 + hp.smile(t) + nse      # Total signal

NFFT = 1024                          # Window length
Fs = int(1.0/dt)                     # Sampling frequency

ax1 = subplot(211)
plot(t, x)
subplot(212, sharex=ax1)
# Pxx: segments x freqs array; freqs: frequency vector
Pxx, freqs, bins, im = specgram(x, NFFT=NFFT, Fs=Fs, noverlap=900,
                                cmap=cm.gist_heat)
show()

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 58

The Transformation API allows one to connect "transformations" to a core computation (here, the spectrogram):

rolling_frame_trf = rolling_frame(x, NFFT, noverlap, pad_to)

complex_dtype = dtypes.complex_for(x.dtype)
fft_arr = Type(complex_dtype, rolling_frame_trf.output.shape)
real_fft_arr = Type(x.dtype, rolling_frame_trf.output.shape)

window_trf = window(real_fft_arr, NFFT)
broadcast_zero_trf = transformations.broadcast_const(real_fft_arr, 0)
to_complex_trf = transformations.combine_complex(fft_arr)
amplitude_trf = transformations.norm_const(fft_arr, 1)
crop_trf = crop_frequencies(amplitude_trf.output)

fft = FFT(fft_arr, axes=(1,))

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples
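The transformation chain above amounts to: split the signal into overlapping frames, window each frame, FFT it, and take the squared magnitude of the positive frequencies. A pure-NumPy sketch of that pipeline may clarify what the connected transformations compute; the function names here are illustrative, not reikna's API:

```python
import numpy as np

def rolling_frames(x, NFFT, noverlap):
    """Split a 1-D signal into overlapping frames of length NFFT."""
    step = NFFT - noverlap
    n_frames = (len(x) - noverlap) // step
    return np.stack([x[i*step : i*step + NFFT] for i in range(n_frames)])

def spectrogram_np(x, NFFT=256, noverlap=128):
    """Frame -> window -> FFT -> squared magnitude (positive frequencies)."""
    frames = rolling_frames(x, NFFT, noverlap)
    hann = 0.5 - 0.5*np.cos(2*np.pi*np.arange(NFFT)/(NFFT - 1))
    spec = np.fft.rfft(frames * hann, axis=1)   # FFT along each frame (axes=(1,))
    return np.abs(spec)**2                      # power spectrum per frame

Fs = 4096
t = np.arange(0, 1, 1.0/Fs)
x = np.sin(2*np.pi*440*t)          # 440 Hz test tone
Pxx = spectrogram_np(x)            # shape: (n_frames, NFFT//2 + 1)
```

In reikna, connecting the equivalent transformations to the FFT fuses these stages into the generated GPU kernel, so the framed and windowed intermediates are not materialized as separate device arrays.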

SLIDE 59

def __init__(self, x, NFFT, ..., window=hanning_window):
    rolling_frame_trf = rolling_frame(x, NFFT, noverlap, pad_to)
    (...)
    real_fft_arr = Type(x.dtype, rolling_frame_trf.output.shape)
    window_trf = window(real_fft_arr, NFFT)
    (...)
    fft = FFT(fft_arr, axes=(1,))

def hanning_window(arr, NFFT):
    """Applies the von Hann window to the rows of a 2D array.
    To account for zero padding (which we do not want to window),
    NFFT is provided separately.
    """
    if dtypes.is_complex(arr.dtype):
        coeff_dtype = dtypes.real_for(arr.dtype)
    else:
        coeff_dtype = arr.dtype
    return Transformation(...)

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 60

def hanning_window(arr, NFFT):
    """Applies the von Hann window to the rows of a 2D array.
    To account for zero padding (which we do not want to window),
    NFFT is provided separately.
    """
    if dtypes.is_complex(arr.dtype):
        coeff_dtype = dtypes.real_for(arr.dtype)
    else:
        coeff_dtype = arr.dtype
    return Transformation(
        [Parameter('output', Annotation(arr, 'o')),
         Parameter('input', Annotation(arr, 'i'))],
        """
        ${dtypes.ctype(coeff_dtype)} coeff;
        %if NFFT != output.shape[0]:
        if (${idxs[1]} >= ${NFFT}) { coeff = 1; }
        else
        %endif
        { coeff = 0.5 * (1 - cos(2 * ${numpy.pi} * ${idxs[-1]} / (${NFFT} - 1))); }
        ${output.store_same}(${mul}(${input.load_same}, coeff));
        """,
        render_kwds=dict(
            coeff_dtype=coeff_dtype, NFFT=NFFT,
            mul=functions.mul(arr.dtype, coeff_dtype)))

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 61

[Parameter('output', Annotation(arr, 'o')),
 Parameter('input', Annotation(arr, 'i'))],
"""
${dtypes.ctype(coeff_dtype)} coeff;
%if NFFT != output.shape[0]:
if (${idxs[1]} >= ${NFFT}) { coeff = 1; }
else
%endif
{ coeff = 0.5 * (1 - cos(2 * ${numpy.pi} * ${idxs[-1]} / (${NFFT} - 1))); }
${output.store_same}(${mul}(${input.load_same}, coeff));
""",
render_kwds=dict(
    coeff_dtype=coeff_dtype, NFFT=NFFT,
    mul=functions.mul(arr.dtype, coeff_dtype))

Hanning based: α = 0.5, β ≡ 1 − α = 0.5

    w(n) = α − β·cos(2πn/(N − 1)),   n ≤ N − 1
    w(n) = 1,                        n ≥ N

Here N = NFFT; the n ≥ N branch (coeff = 1) leaves the zero-padded region unwindowed.

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples
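With α = β = 0.5 the coefficient computed in the kernel body is the standard symmetric Hann window, which matches NumPy's built-in. A quick host-side check:

```python
import numpy as np

NFFT = 1024
n = np.arange(NFFT)
alpha, beta = 0.5, 0.5
# Same expression the kernel computes: 0.5*(1 - cos(2*pi*n/(NFFT-1)))
w = alpha - beta * np.cos(2 * np.pi * n / (NFFT - 1))

assert np.allclose(w, np.hanning(NFFT))  # matches NumPy's symmetric Hann window
```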

SLIDE 62

[Parameter('output', Annotation(arr, 'o')),
 Parameter('input', Annotation(arr, 'i'))],
"""
${dtypes.ctype(coeff_dtype)} coeff;
%if NFFT != output.shape[0]:
if (${idxs[1]} >= ${NFFT}) { coeff = 1; }
else
%endif
{ coeff = <YOUR FUNCTION HERE>; }
${output.store_same}(${mul}(${input.load_same}, coeff));
""",
render_kwds=dict(
    coeff_dtype=coeff_dtype, NFFT=NFFT,
    mul=functions.mul(arr.dtype, coeff_dtype))

Hamming based: α = 0.54, β ≡ 1 − α = 0.46

    w(n) = α − β·cos(2πn/(N − 1)),   n ≤ N − 1
    w(n) = 1,                        n ≥ N

Blackman-Harris: α0 = 0.358, α1 = 0.488, α2 = 0.141, α3 = 0.012

    w(n) = α0 − α1·cos(2πn/(N − 1)) + α2·cos(4πn/(N − 1)) − α3·cos(6πn/(N − 1)),   n ≤ N − 1
    w(n) = 1,   n ≥ N

Flat top: α0 = 1, α1 = 1.93, α2 = 1.29, α3 = 0.388, α4 = 0.028

    w(n) = α0 − α1·cos(2πn/(N − 1)) + α2·cos(4πn/(N − 1)) − α3·cos(6πn/(N − 1)) + α4·cos(8πn/(N − 1)),   n ≤ N − 1
    w(n) = 1,   n ≥ N

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples
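All four windows are instances of one cosine-sum family, so a single coefficient expression parameterized by the α list covers them. A NumPy sketch (coefficients rounded as on the slide; the canonical Blackman-Harris values carry more digits):

```python
import numpy as np

def cosine_sum_window(N, coeffs):
    """w(n) = sum_k (-1)^k * a_k * cos(2*pi*k*n/(N-1)) for n = 0..N-1."""
    n = np.arange(N)
    w = np.zeros(N)
    for k, a in enumerate(coeffs):
        w += (-1)**k * a * np.cos(2 * np.pi * k * n / (N - 1))
    return w

N = 512
hann            = cosine_sum_window(N, [0.5, 0.5])
hamming         = cosine_sum_window(N, [0.54, 0.46])
blackman_harris = cosine_sum_window(N, [0.358, 0.488, 0.141, 0.012])
flat_top        = cosine_sum_window(N, [1.0, 1.93, 1.29, 0.388, 0.028])

# Hann and Hamming reduce to NumPy's built-in windows
assert np.allclose(hann, np.hanning(N))
assert np.allclose(hamming, np.hamming(N))
```

Swapping windows in the kernel above is then only a change of the coefficient list dropped into the <YOUR FUNCTION HERE> slot.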

SLIDE 63

Grid points:    32,768 | CPU Runtime: 0.0199 | GPU reikna Runtime: 0.0654 | GPU to CPU Speedup: 0.3041
Grid points:   524,288 | CPU Runtime: 0.3239 | GPU reikna Runtime: 0.1226 | GPU to CPU Speedup: 2.6416
Grid points: 8,388,608 | CPU Runtime: 4.8352 | GPU reikna Runtime: 0.8795 | GPU to CPU Speedup: 5.4976

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples
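The speedup column is the ratio of CPU to GPU runtime; recomputing it from the quoted runtimes (values copied from the slide) reproduces the figures to within rounding:

```python
# (grid points, CPU runtime, GPU reikna runtime) as quoted on the slide
runs = [
    (32_768,    0.0199, 0.0654),
    (524_288,   0.3239, 0.1226),
    (8_388_608, 4.8352, 0.8795),
]
speedups = [cpu / gpu for _, cpu, gpu in runs]
for (n, cpu, gpu), s in zip(runs, speedups):
    print(f"{n:>9,} grid points: GPU to CPU speedup {s:.4f}")
```

The sub-unity speedup at 32,768 points is consistent with fixed per-launch and transfer overheads dominating small transforms; the GPU only pulls ahead once the grid is large enough to amortize them.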

SLIDE 64

Grid points: 32,768 | CPU Runtime: 0.0199 | GPU reikna Runtime: 0.0654 | GPU to CPU Speedup: 0.3041

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 65

Grid points: 8,388,608 | CPU Runtime: 4.8352 | GPU reikna Runtime: 0.8795 | GPU to CPU Speedup: 5.4976

http://matplotlib.org/examples/pylab_examples/specgram_demo.html
https://github.com/fjarri/reikna/tree/develop/examples

SLIDE 66

Appendix References | Further Reading

SAR Image Formation

  • S1. Frey, O., Meier, E.H., and Nuesch, D.R., "Processing SAR Data of Rugged Terrain by Time-Domain Back-Projection," Proceedings of the SPIE, Vol. 5980, 71-79, 2005.

  • S2. Park, J., Tang, P.T.P., Smelyanskiy, M., Kim, D., and Benson, T., "Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors," Int. Conf. for High Performance Computing, Networking, Storage, and Analysis (SC), 2012.

  • S3. Ryland, Robert, "Synthetic aperture radar (SAR) imaging system," U.S. Patent 8,344,934. <http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2

SLIDE 67

Fnetahtml%2FPTO%2Fsearch-bool.html&r=9&f=G&l=50&co1=AND&d=PTXT&s1=ryland.INNM.&OS=IN/ryland&RS=IN/ryland>

  • S4. Hartley, T.D.R., Fasih, A.R., Berdanier, C.A., Ozguner, F., and Catalyurek, U.V., "Investigating the Use of GPU-Accelerated Nodes for SAR Image Formation," IEEE Int. Conf. on Cluster Computing and Workshops, 1-8, 31, 2009.

  • S5. Capozzoli, A., Curcio, C., and Liseno, A., "Fast GPU-Based Interpolation for SAR Back-Projection," Progress in Electromagnetics Research, Vol. 133, 249-283, 2013.

  • S6. Capozzoli, A., Curcio, C., Liseno, A., and Testa, P.V., "NUFFT-based SAR Backprojection on Multiple GPUs," Proc. of the Tyrrhenian Workshop on Advances in Radar and Remote Sensing, Napoli, Italy, 2012.

RF Design

SLIDE 68

  • R1. Cooke, S.J., Chernyavskiy, I.A., Stanchev, G.M., Levush, B., and Antonsen, T.M., "GPU-Accelerated Large-Signal Device Simulation Using the 3D Particle-in-Cell Code 'Neptune'," International Conference on Plasma Science, Edinburgh, UK, July 2012.

  • R2. Antonsen, T.M., Mondelli, A., Levush, B., Verboncoeur, J.P., and Birdsall, C.K., "Advances in modeling and simulation of vacuum electronic devices," Proceedings of the IEEE, 87(5), 804-839, 1999.

  • R3. Petillo, J., et al., "The MICHELLE 3D Electron Gun and Collector Modeling Tool: Theory and Design," IEEE Trans. Plasma Science, vol. 30, no. 3, pp. 1238-1264, June 2002.

SLIDE 69

  • R4. Cooke, S.J., Shtokhamer, R., Mondelli, A.A., and Levush, B., "A finite integration method for conformal, structured-grid, electromagnetic simulation," Journal of Computational Physics, 215(1), 321-347, 2006.

  • R5. Chernyavskiy, I.A., Vlasov, A.N., Antonsen, T.M., Cooke, S.J., Levush, B., and Nguyen, K.T., "Simulation of Klystrons With Slow and Reflected Electrons Using Large-Signal Code TESLA," IEEE Transactions on Electron Devices, 54(6), 1555-1561, 2007.

  • R6. Chernyavskiy, I.A., Cooke, S.J., Vlasov, A.N., Antonsen, T.M., Abe, D.K., Levush, B., and Nguyen, K.T., "Parallel Simulation of Independent Beam-Tunnels in Multiple-Beam Klystrons Using TESLA," IEEE Transactions on Plasma Science, 36(3), 670-681, 2008.

SLIDE 70

  • R7. Vlasov, A.N., Antonsen, T.M., Chernyavskiy, I.A., Chernin, D.P., and Levush, B., "A Computationally Efficient Two-Dimensional Model of the Beam-Wave Interaction in a Coupled-Cavity TWT," IEEE Transactions on Plasma Science, 40(6), 1575-1589, 2012.

  • R8. Adams, B.M., Bohnhoff, W.J., Dalbey, K.R., Eddy, J.P., Eldred, M.S., Gay, D.M., Haskell, K., Hough, P.D., and Swiler, L.P., "DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 5.0 Reference Manual," Sandia Technical Report SAND2010-2184.

Rotating Detonation Engines

  • D1. Heiser, W.H. and Pratt, D.T., "Thermodynamic Cycle Analysis of Pulse Detonation Engines," JPP, Vol. 18, No. 1, p. 68, 2002.

SLIDE 71

  • D2. Braun, E.M., Lu, F.K., Wilson, D.R., and Camberos, J., "Detonation Engine Performance Comparison Using First and Second Law Analyses," AIAA Paper 2010-7040, 46th Joint Propulsion Conference, 2010.

  • D3. Bykovskii, F.A., Zhdan, S.A., and Vedernikov, E.F., "Continuous spin detonations," J. Propulsion Power, Vol. 22, p. 1204, 2006.

  • D4. Zhdan, S.A., Bykovskii, F.A., and Vedernikov, E.F., "Mathematical Modeling of a Rotating Detonation Wave in a Hydrogen-Oxygen Mixture," Combustion, Explosion, and Shock Waves, Vol. 43, No. 4, p. 449, 2007.

  • D5. Hishida, M., Fujiwara, T., and Wolanski, P., "Fundamentals of rotating detonation engines," Shock Waves, Vol. 19, No. 1, pp. 1-10, 2009.

SLIDE 72

  • D6. Nordeen, C.A., Schwer, D.A., Schauer, F., Hoke, J., Cetegen, B., and Barber, T., "Thermodynamic Modeling of a Rotating Detonation Engine," AIAA Paper 2011-803, 49th AIAA Aerospace Sciences Meeting, 2011.

  • D7. Kindracki, J., Wolanski, P., and Gut, Z., "Experimental research on the rotating detonation in gaseous fuels-oxygen mixtures," Shock Waves, Vol. 21, pp. 75-84, 2011.

  • D8. Schwer, D.A. and Kailasanath, K., "Fluid Dynamics of Rotating Detonation Engines with Hydrogen and Hydrocarbon Fuels," Proc. Combust. Inst., Vol. 34, pp. 1991-1998, 2013.

  • D9. Lu, F.K., Braun, E.M., Massa, L., and Wilson, D.R., "Rotating Detonation Wave Propulsion: Experimental Challenges, Modeling, and Engine Concepts (Invited)," AIAA Paper 2011-6043, 47th Joint Propulsion Conference, 2011.

SLIDE 73

  • D10. Kailasanath, K., Patnaik, G., and Li, C., "On factors controlling the performance of pulsed detonation engines," in High-Speed Deflagration and Detonation: Fundamentals and Control, Eds: G. Roy, S. Frolov, D. Netzer, and A. Borisov, Moscow, Russia: Enas Publ., 193-206, 2001.

  • D11. Jenkins, T.P., Sanders, S.T., Kailasanath, K., Li, C., and Hanson, R.K., "Diode Laser-Based Measurements for Model Validation in Pulse Detonation Flows," Proceedings of the JANNAF Combustion, Airbreathing Propulsion, Modeling and Simulation Joint Meeting, Monterey, CA, Nov. 12-17, 2000 (published by CPIA, JSC CD-05).

SLIDE 74

  • D12. Brophy, C.M., Sinibaldi, J.O., Netzer, D.W., and Kailasanath, K., "Initiator Diffraction Limits in Pulse Detonation Engines," in Confined Detonations and Pulse Detonation Engines, Eds: G.D. Roy, S.M. Frolov, R.J. Santoro, and S.A. Tsyganov, Moscow, Russia: TORUS PRESS Publ., 2003.

  • D13. Li, C., Kailasanath, K., and Patnaik, G., "A numerical study of flow field evolution in a pulsed detonation engine," AIAA Paper 2000-0314, 38th Aerospace Sciences Meeting, Reno, NV, 2000.

  • D14. Schwer, D.A. and Kailasanath, K., "Numerical Investigation of Rotating Detonation Engines," AIAA Paper 2010-6880, 46th Joint Propulsion Conference, 2010.

  • D15. Schwer, D.A. and Kailasanath, K., "Numerical Study of Engine Size Effects on Rotating Detonation Engines," AIAA Paper 2011-581, 49th AIAA Aerospace Sciences Meeting, 2011.

SLIDE 75

  • D16. Schwer, D.A. and Kailasanath, K., "Effect of Inlet on Fill Region and Performance of Rotating Detonation Engines," AIAA Paper 2011-6044, 47th Joint Propulsion Conference, 2011.

  • D17. Schwer, D.A. and Kailasanath, K., "Feedback into Mixture Plenums in Rotating Detonation Engines," AIAA Paper 2012-0617, 50th AIAA Aerospace Sciences Meeting, 2012.

  • D18. Nordeen, C.A., Schwer, D.A., Schauer, F., Hoke, J., Barber, T., and Cetegen, B., "Energy Transfer in a Rotating Detonation Engine," AIAA Paper 2011-6045, 47th Joint Propulsion Conference, 2011.

  • D19. Nordeen, C.A., Schwer, D.A., Schauer, F., Hoke, J., Barber, T., and Cetegen, B., "Divergence and Mixing in a Rotating Detonation Engine," AIAA Paper 2013-1175, 2013.

SLIDE 76

  • D20. Schwer, D.A. and Kailasanath, K., "On Reducing Feedback Pressure in Rotating Detonation Engines," AIAA Paper 2013-1178, 51st AIAA Aerospace Sciences Meeting, 2013.

  • D21. Russo, R.M., King, P.I., Schauer, F.R., and Thomas, L.M., "Characterization of Pressure Rise Across a Continuous Detonation Engine," AIAA Paper 2011-6046, 47th Joint Propulsion Conference, 2011.

  • D22. Dyer, R., Naples, A., Kaemming, T., Hoke, J., and Schauer, F., "Parametric Testing of a Unique Rotating Detonation Engine Design," AIAA Paper 2012-0121, 50th Aerospace Sciences Meeting, 2012.

  • D23. Shank, J.C., King, P.I., Karnesky, J., Schauer, F.R., and Hoke, J.L., "Development and Testing of a Modular Rotating Detonation Engine," AIAA Paper 2012-0120, 50th Aerospace Sciences Meeting, 2012.

SLIDE 77

  • D24. Zalesak, S.T., "Fully multidimensional flux-corrected transport algorithms for fluids," Journal of Computational Physics, Vol. 31, No. 3, pp. 335-362, 1979.

  • D25. The MPI Forum, "MPI: A Message-Passing Interface Standard," July 2011, retrieved from http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf.

  • D26. Karypis, G. and Schloegel, K., "ParMETIS: Parallel Graph Partitioning and Sparse Matrix Ordering Library, Version 4.0," retrieved from http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview, 2011.

  • D27. Hoberock, J. and Bell, N., "Thrust: A Parallel Template Library," Version 1.4.0, 2011.

  • D28. Bell, N. and Hoberock, J., "Thrust: A Productivity-Oriented Library for CUDA," GPU Computing Gems: Jade Edition, Morgan Kaufmann, pp. 359-372, 2011.
