Algorithmic time, energy, and power on candidate HPC compute building blocks
Jee Choi, Marat Dukhan, Xing Liu, and Richard Vuduc
May 20, 2014. Presented at IPDPS'14.
Contributions
- The energy roofline (IPDPS'13) quantifies the relative energy costs of computation to data movement.
- Applied across server- to mobile-class platforms.
Abstract machine model [figure]: an xPU with a fast memory (total size = Z) attached to a slow memory.
- W (fl)ops, each costing τflop = time / (fl)op
- Q (m)ops, each costing τmem = time / (m)op
[Figure: "Roofline" [1] and "arch line" - relative performance vs. intensity (FLOP:Byte); peak values 14 GFLOP/s and 3.6 GFLOP/J.]

[1] S. Williams, A. Waterman, and D. Patterson, "Roofline: an insightful visual performance model for multicore architectures," Commun. ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009. [Online]. Available: http://doi.acm.org/10.1145/1498765.1498785
[Figure: Power relative to flop-power vs. intensity (flop:byte), separating power dissipated by compute units from power dissipated by memory units.]
[Figure: Measured power (normalized to flop+const) vs. intensity (FLOP:Byte) for the NVIDIA GTX 580 (GPU-only; roughly 120-380 W) and the Intel i7-950 (desktop; roughly 120-180 W).]
A power cap prevents peak performance: below the cap, power is determined by performance; at the cap, performance is limited by the "usable" power.
μbenchmark for Ivy Bridge (code at http://hpcgarage.org/archline)
- aligned memory loads
- 1 MUL and 1 ADD AVX instruction issued per cycle
- maximize AVX register usage to increase ILP
- parallelized over all available cores

; load eight aligned 32-byte AVX vectors (256 B per loop iteration)
vmovapd ymm0, [rdi - 128]
vmovapd ymm1, [rdi - 96]
vmovapd ymm2, [rdi - 64]
vmovapd ymm3, [rdi - 32]
vmovapd ymm4, [rdi]
vmovapd ymm5, [rdi + 32]
vmovapd ymm6, [rdi + 64]
vmovapd ymm7, [rdi + 96]
; repeat MAD_PER_ELEMENT times: one independent MUL + ADD per register,
; which sets the flop:byte intensity of the kernel
%rep MAD_PER_ELEMENT
    vmulpd ymm0, ymm0, ymm0
    vaddpd ymm8, ymm8, ymm0
    vmulpd ymm1, ymm1, ymm1
    vaddpd ymm9, ymm9, ymm1
    vmulpd ymm2, ymm2, ymm2
    vaddpd ymm10, ymm10, ymm2
    vmulpd ymm3, ymm3, ymm3
    vaddpd ymm11, ymm11, ymm3
    vmulpd ymm4, ymm4, ymm4
    vaddpd ymm12, ymm12, ymm4
    vmulpd ymm5, ymm5, ymm5
    vaddpd ymm13, ymm13, ymm5
    vmulpd ymm6, ymm6, ymm6
    vaddpd ymm14, ymm14, ymm6
    vmulpd ymm7, ymm7, ymm7
    vaddpd ymm15, ymm15, ymm7
%endrep
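A quick sanity check on the intensity this kernel sweeps (arithmetic added here, not on the slide): each iteration loads 8 × 32 B = 256 B, and each %rep repetition executes 8 MULs and 8 ADDs on 4-wide double vectors, i.e., 64 flops. The intensity is therefore roughly (64 × MAD_PER_ELEMENT) / 256 B = MAD_PER_ELEMENT / 4 FLOP per byte.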
μbenchmark for Cortex A9
- aligned memory loads
- 1 ADD every cycle, but 1 MUL only every other cycle
- maximize register usage to increase ILP
- parallelized over all available cores

@ repeat MLA_PER_DOUBLE times: interleave multiply-accumulates and adds
@ across many VFP registers to expose instruction-level parallelism
.rept MLA_PER_DOUBLE
    VMLA.F64 d12, d0, d0
    VADD.F64 d24, d24, d0
    VMLA.F64 d13, d1, d1
    VADD.F64 d25, d25, d1
    VMLA.F64 d14, d2, d2
    VADD.F64 d26, d26, d2
    VMLA.F64 d15, d3, d3
    VADD.F64 d27, d27, d3
    VMLA.F64 d16, d4, d4
    VADD.F64 d28, d28, d4
    VMLA.F64 d17, d5, d5
    VADD.F64 d29, d29, d5
    VMLA.F64 d18, d6, d6
    VADD.F64 d24, d24, d6
    VMLA.F64 d19, d7, d7
    VADD.F64 d25, d25, d7
    VMLA.F64 d20, d8, d8
    VADD.F64 d26, d26, d8
    VMLA.F64 d21, d9, d9
    VADD.F64 d27, d27, d9
    VMLA.F64 d22, d10, d10
    VADD.F64 d28, d28, d10
    VMLA.F64 d23, d11, d11
    VADD.F64 d29, d29, d11
.endr
μbenchmark for Kepler (vs. Fermi)
- the quad warp scheduler selects up to four warps and issues two instructions per warp
- each SMX has 192 cores
- unlike on Fermi, we need more than 1 independent instruction in two of the warps
- theoretical peak is impossible to achieve, likely due to a register-bandwidth limit

// each thread loads one vector element and applies a long chain of
// independent multiply-adds to its four components
uint tid = threadIdx.x + blockIdx.x * blockDim.x;
TYPE tmp1;
float x, y, z, w;
if (tid < num_threads) {
    tmp1 = in[tid];
    x = tmp1.x; y = tmp1.y; z = tmp1.z; w = tmp1.w;
    x = x + x * CONST;
    y = y + y * CONST;
    z = z + z * CONST;
    w = w + w * CONST;
    x = x + x * CONST;
    y = y + y * CONST;
    z = z + z * CONST;
    w = w + w * CONST;
    ...
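Why the extra independent instruction matters (arithmetic added here, based on the figures above): four warps issuing one instruction each cover 4 × 32 = 128 lanes, short of the 192 cores per SMX; only if two of the four warps dual-issue a second, independent instruction can the scheduler cover 4 × 32 + 2 × 32 = 192 lanes per cycle.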
Memory μbenchmarks
- streaming bandwidth: read from both ends of the array so that pre-fetched data is (almost always) consumed
- random access: pointer-chasing arrays
- fully unrolled loops
- tuned over parameters, e.g., # of threads, thread block size
(a pointer-chasing sketch follows below)
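A minimal C sketch of the pointer-chasing idea, not the authors' benchmark code: each element stores the index of the next element to visit, so every load depends on the previous one and the loop exposes memory latency rather than bandwidth. The names (build_chain, chase) and the shuffle-based chain construction are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>

/* Build a random cyclic chain: next[i] holds the index to visit after i. */
static void build_chain(size_t *next, size_t n)
{
    size_t *perm = malloc(n * sizeof *perm);
    for (size_t i = 0; i < n; i++) perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < n; i++)            /* one cycle through all elements */
        next[perm[i]] = perm[(i + 1) % n];
    free(perm);
}

/* Each load depends on the previous one, defeating hardware prefetch. */
static size_t chase(const size_t *next, size_t start, size_t naccesses)
{
    size_t p = start;
    for (size_t k = 0; k < naccesses; k++)
        p = next[p];
    return p;   /* returned so the compiler cannot remove the loop */
}

int main(void)
{
    size_t n = 1u << 24;                      /* 16M elements, ~128 MiB on 64-bit */
    size_t *next = malloc(n * sizeof *next);
    build_chain(next, n);
    printf("%zu\n", chase(next, 0, 10 * n));  /* time this call externally */
    free(next);
    return 0;
}

The streaming-bandwidth kernel (reading from both ends) is a separate routine; this sketch only covers the random-access case.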
[Figure: Power measurement setup - ATX PSU, PowerMon 2, PCIe interposer, GPU, CPU, and motherboard, with separate sense streams; instrumented power bricks feed the ARM dev board and the APU.]
Vendor’s claimed peak Power (empirical) Energy (and empirical throughput) Random access Platform Processor single Gflop/s double Gflop/s
GB/s ⇡1 Watts (idle) ∆⇡ Watts ✏s pJ/flop (Gflop/s) ✏d pJ/flop (Gflop/s) ✏mem pJ/B (GB/s) ✏L1 pJ/B (GB/s) ✏L2 pJ/B (GB/s) ✏rand nJ/access (Macc/s) Desktop CPU “Nehalem” Intel Core i7-950 (45 nm) 107 53.3 25.6 122 (79.9) 44.2 371 (99.4) 670 (49.7) 795 (19.1) 135 (201) 168 (120) 108 (149) NUC CPU “Ivy Bridge” Intel Core i3-3217U (22 nm) 57.6 28.8 25.6 16.5 (13.2) 7.37 14.7 (55.6) 24.3 (27.9) 418 (17.9) 8.75 (201) 14.3 (103) 54.6 (55.3) NUC GPU HD 4000 269 — 25.6 10.1 (13.2)∗ 17.7 76.1 (268) — 837 (15.4) — — — APU CPU “Bobcat” AMD E2-1800 (40 nm) 13.6 5.10 10.7 20.1 (11.8) 1.39 33.5 (13.4) 119 (5.05) 435 (3.32) 84.0 (25.8) 138 (11.6) 75.6 (8.03) APU GPU “Zacate” HD 7340 109 — 10.7 15.6 (11.8) 3.23 5.82 (104) — 333 (8.70) 6.47 (46.0) — 45.8 (115) GTX 580 “Fermi” NVIDIA GF100 (40 nm) 1580 198 192 122 (148)∗ 146 99.7 (1400) 213 (196) 513 (171) 149 (761) 257 (284) 112 (977) GTX 680 “Kepler” NVIDIA GK104 (28 nm) 3530 147 192 66.4 (100)∗ 145 43.2 (3030) 263 (147) 437 (158) 51 (1150) 195 (297) 184 (1420) GTX Titan “Kepler” NVIDIA GK110 (28 nm) 4990 1660 288 123 (72.9) 164 30.4 (4020) 93.9 (1600) 267 (239) 24.4 (1610) 195 (297) 48.0 (968) Xeon Phi “KNC” Intel 5110P (22 nm) 2020 1010 320 180 (90) 36.1 6.05 (2020) 12.4 (1010) 136 (181) 2.19 (2890) 8.65 (591) 5.11 (706) PandaBoard ES “Cortex-A9” TI OMAP 4460 (45 nm) 9.60 3.60 3.20 3.48 (2.74) 1.19 37.2 (9.47) 302 (3.02) 810 (1.28) 79.5 (18.4) 134 (4.12) 60.9 (12.1) Arndale CPU “Cortex-A15” Samsung Exynos 5 (32 nm) 27.2 6.80 12.8 5.50 (1.72) 2.01 107 (15.8) 275 (3.97) 386 (3.94) 76.3 (50.8) 248 (15.2) 138 (14.8) Arndale GPU “Mali T-604” 72.0 — 12.8 1.28 (1.72)∗ 4.83 84.2 (33.0) — 518 (8.39) 71.4 (33.4) — 125 (33.6)
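As an example of reading the table (arithmetic added here, not on the slide): the GTX Titan's fitted flop energy of 30.4 pJ/flop corresponds to at most 1 / 30.4 pJ ≈ 33 Gflop/J before constant power is charged, and its time balance from the measured throughputs is roughly 4020 Gflop/s / 239 GB/s ≈ 17 flops per byte.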
A ~5 TFLOP/s "supercomputer", two ways: 1× GTX Titan GPU at ~287 W and ~5 TFLOP/s, versus 70× Mali T-604 GPUs at ~6 W and ~72 GFLOP/s each (roughly 70 × 72 GFLOP/s ≈ 5 TFLOP/s and 70 × 6 W ≈ 420 W in aggregate).
[Figure: Hypothetical scaled Arndale GPU design vs. GTX Titan - normalized time, energy, and power vs. intensity (single-precision FLOP:Byte) for the GK110 (Kepler), the Samsung Arndale GPU (Exynos 5, ARM Mali T-604), and 70× Arndale GPUs.]
GTX Titan, measured and fitted:
- 16 Gflop/J, 1.3 GB/J
- 4.0 Tflop/s [81%], 240 GB/s [83%]
- 120 W (const) + 160 W (cap) [99%]
[Figure: Power (normalized to const+cap) vs. intensity (single-precision FLOP:Byte) for the GTX Titan, annotated with the constant power π1 and the dynamic range Δπ.]
With a hypothetical power cap Pth = π1 + (1/2)Δπ, performance is unaffected when intensity is low (≤ 2) but throttled everywhere else, most severely around Bτ, where power peaks.
[Figure: Same power-vs-intensity plot with the cap at π1 + (1/2)Δπ marked.]
With a cap Pth = π1 + (1/4)Δπ, the code is throttled everywhere: performance will be lower than expected at all intensities.
[Figure: Same plot with the cap at π1 + (1/4)Δπ marked.]
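For concreteness, plugging in the fitted GTX Titan values above (arithmetic added here, not from the slides): Pth = π1 + Δπ/2 = 120 W + 80 W = 200 W in the first scenario, and Pth = π1 + Δπ/4 = 120 W + 40 W = 160 W in the second, versus the fitted 120 W + 160 W = 280 W uncapped.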
Fitted parameters per platform:
- GTX Titan: 16 Gflop/J, 1.3 GB/J; 4.0 Tflop/s [81%], 240 GB/s [83%]; 120 W (const) + 160 W (cap) [99%]
- Xeon Phi: 11 Gflop/J, 880 MB/J; 2.0 Tflop/s [100%], 180 GB/s [57%]; 180 W (const) + 36 W (cap) [100%]
- Arndale GPU: 8.1 Gflop/J, 1.5 GB/J; 33 Gflop/s [46%], 8.4 GB/s [66%]; 1.3 W (const) + 4.8 W (cap) [88%]
[Figure: Power (normalized to const+cap) vs. intensity (single-precision FLOP:Byte) for GTX Titan, Xeon Phi, and Arndale GPU; curves labeled C, F, and M in the original.]
[Figure: Flops / Time (normalized to estimated peak) vs. intensity (single-precision FLOP:Byte) for GTX Titan, Xeon Phi, and Arndale GPU, with the same fitted parameters as above; annotated gaps of < 4×, ~5×, and ~8× between platforms.]
Summary
Energy:
- provides a high-level analytical methodology
- supports reasoning about hypothetical systems
- supports inter-platform comparisons of energy costs
Time (performance):
- every cost must be accounted for
- constant power is a huge bottleneck
[Figure: Flops / Energy (normalized to estimated peak) vs. intensity (single-precision FLOP:Byte) for GTX Titan, Xeon Phi, and Arndale GPU.]
[Figure: Energy per single-precision operation (flop, L1, L2, mem, random access) for each platform, relative to a 61.0 pJ baseline; includes constant energy. The platform ordering differs by operation type.]
[Figure: Power prediction error (single-precision), (Model − Measured) / Measured, per platform; capped platforms marked.]
[Figure: Roofline - relative performance (GFLOP/s) vs. intensity (FLOP:Byte), divided into compute-bound and memory-(bandwidth-)bound regions.]

Time model:
T = max(W·τflop, Q·τmem) = W·τflop · max(1, (Q/W)·(τmem/τflop)) = W·τflop · max(1, Bτ/I),
where I = W/Q is the intensity and Bτ = τmem/τflop is the time balance.
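A worked example using the fitted GTX Titan numbers from earlier (arithmetic added here): with ~4.0 Tflop/s and ~240 GB/s, the time balance is Bτ ≈ 4000 / 240 ≈ 17 flops per byte, so kernels with intensity below roughly 17 FLOP:Byte are bandwidth-bound in time on that card.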
Energy model (same machine model: xPU with fast memory of size Z, W (fl)ops at τflop each, Q (m)ops at τmem each), now with a constant-power term π1:
E = W·εflop + Q·εmem + π1·T = W·εflop · (1 + Bε/I + (π1/εflop)·(T/W)),
where Bε = εmem/εflop is the energy balance (flop : mop).
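For instance (arithmetic added here, using the table above): on the desktop Nehalem, εmem = 795 pJ/B and εs = 371 pJ/flop give an energy balance of Bε ≈ 795 / 371 ≈ 2.1 FLOP:Byte; below that intensity, memory energy dominates flop energy, ignoring the constant term π1·T.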
Time-energy balance gap
[Figure: Time and energy rooflines (GFLOP/s and GFLOP/J) vs. intensity (FLOP:Byte), showing the gap between Bτ and Bε.]
- When Bε > Bτ, an algorithm can be compute-bound in time but memory-bound in energy.
- Optimizing for energy may therefore differ from optimizing for time.
- Improving energy efficiency improves time efficiency, but not vice-versa, breaking "race-to-halt".
Power model:
P = E/T = (εflop/τflop) · min(1 + Bε/I, (I + Bε)/Bτ)
- I → ∞ ⇒ P = εflop/τflop = Pflop
- I → 0 ⇒ P = Pflop · Bε/Bτ
- I = Bτ ⇒ P = Pflop · (1 + Bε/Bτ)
[Figure: Power, relative to flop-power, vs. intensity (flop:byte).]
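A minimal C sketch, not the authors' code, of the time, energy, and power model just described; it treats a "mop" as one byte moved, and the GTX Titan-like constants are rough values read off the table earlier, so all names and numbers are illustrative.

#include <stdio.h>

typedef struct {
    double tau_flop;  /* time per flop [s]          */
    double tau_mem;   /* time per byte moved [s]    */
    double eps_flop;  /* energy per flop [J]        */
    double eps_mem;   /* energy per byte moved [J]  */
    double pi1;       /* constant power [W]         */
} machine_t;

/* T = W*tau_flop * max(1, B_tau / I), with I = W/Q and B_tau = tau_mem/tau_flop */
static double model_time(const machine_t *m, double W, double Q)
{
    double I = W / Q;
    double B_tau = m->tau_mem / m->tau_flop;
    double scale = (B_tau / I > 1.0) ? B_tau / I : 1.0;
    return W * m->tau_flop * scale;
}

/* E = W*eps_flop + Q*eps_mem + pi1 * T */
static double model_energy(const machine_t *m, double W, double Q)
{
    return W * m->eps_flop + Q * m->eps_mem + m->pi1 * model_time(m, W, Q);
}

/* Average power P = E / T */
static double model_power(const machine_t *m, double W, double Q)
{
    return model_energy(m, W, Q) / model_time(m, W, Q);
}

int main(void)
{
    /* Roughly GTX Titan-like: ~4 Tflop/s, ~240 GB/s, 30.4 pJ/flop,
       267 pJ/B, 120 W constant power (values read off the table above). */
    machine_t titan = { 1.0 / 4.0e12, 1.0 / 240e9, 30.4e-12, 267e-12, 120.0 };
    double W = 1.0e12;                       /* 1 Tflop of work */
    for (double I = 0.25; I <= 128.0; I *= 2.0) {
        double Q = W / I;                    /* bytes implied by intensity I */
        printf("I = %7.2f flop/B  T = %.4f s  E = %8.2f J  P = %6.1f W\n",
               I, model_time(&titan, W, Q), model_energy(&titan, W, Q),
               model_power(&titan, W, Q));
    }
    return 0;
}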
[Backup figures: power (normalized to const+cap) and Flops / Time (normalized to 4.0 Tflop/s) vs. intensity (single-precision FLOP:Byte) for GTX Titan, Xeon Phi, and Arndale GPU, with the fitted parameters listed earlier.]