
Benchmark Performance of Different Compilers on a Cray XE6
Mike Stewart and Helen He
NERSC User Services Group
May 23-26, CUG 2011

Outline
• Introduction
• Available Compilers on Hopper
• Recommended Compiler Options
• Benchmarks Used in the Study
• Performance Results from Each Compiler
• Summary and Recommendations

Hopper
• Cray XE6, 6,384 nodes, 153,216 cores.
• Each node has two twelve-core AMD Magny-Cours 2.1 GHz processors.
• 1.28 PFlop/s peak, 212 TB memory.

Available Compilers on Hopper
• Portland Group (PGI) Compilers
  – This is the default compiler on Hopper
• Pathscale Compilers
  – % module swap PrgEnv-pgi PrgEnv-pathscale
• Cray Compilers
  – % module swap PrgEnv-pgi PrgEnv-cray
• GNU Compilers
  – % module swap PrgEnv-pgi PrgEnv-gnu

Compile Codes on Hopper
• Codes are cross-compiled on the login nodes to build executables that run on the compute nodes.
• To use a particular compiler, first swap to the corresponding PrgEnv.
• Then use the compiler wrappers:
  – ftn for Fortran codes
  – cc for C codes
  – CC for C++ codes
• The wrappers find the proper system and MPI libraries.
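The workflow on this slide can be sketched as a small helper that assembles the commands a user would type. This is only an illustration: the function name `build_commands` and the source file `hello.f90` are hypothetical, while the PrgEnv module names and wrapper names are the ones listed on the slides. The helper builds strings and runs nothing.

```python
# Hypothetical helper: maps a compiler choice and a source language to the
# commands described on the slides. It only builds strings.

# PrgEnv module names from the "Available Compilers on Hopper" slide.
PRGENV = {
    "pgi": "PrgEnv-pgi",  # default environment on Hopper
    "pathscale": "PrgEnv-pathscale",
    "cray": "PrgEnv-cray",
    "gnu": "PrgEnv-gnu",
}

# Compiler wrappers from the "Compile Codes on Hopper" slide.
WRAPPER = {"fortran": "ftn", "c": "cc", "c++": "CC"}


def build_commands(compiler, language, source):
    """Return the shell commands needed to build `source` with `compiler`."""
    commands = []
    if compiler != "pgi":
        # PGI is loaded by default, so a swap is only needed for the others.
        commands.append(f"module swap {PRGENV['pgi']} {PRGENV[compiler]}")
    commands.append(f"{WRAPPER[language]} {source}")
    return commands


if __name__ == "__main__":
    for line in build_commands("cray", "fortran", "hello.f90"):
        print("%", line)
```

The same wrapper name is used whichever PrgEnv is loaded, so only the `module swap` line changes between compilers.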

Compiler Flags Comparison

PGI              Pathscale     Cray              GNU                Explanation
-fast            -Ofast        -O3               -O3                High-level optimization
-mp=nonuma       -mp           -h omp (default)  -fopenmp           Enable OpenMP
-byteswapio      -byteswapio   -h byteswapio     -fconvert=swap     Read files in big-endian
-Mfixed          -fixedform    -f fixed          -ffixed-form       Fixed-form source
-Mfree           -freeform     -f free           -ffree-form        Free-form source
-V               -dumpversion  -V                --version          Show version info
not implemented  -zerouv       -e 0              -finit-local-zero  Zero-fill uninitialized values
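When porting a Makefile from one compiler to another, the table above can be treated as a lookup. The sketch below transcribes it into a Python dictionary; the row keys (`optimize`, `openmp`, and so on) and the `translate` function are hypothetical names chosen for illustration, and only the flags themselves come from the slide.

```python
# Flag equivalents transcribed from the comparison table.
# None marks the feature the table lists as "not implemented".
FLAGS = {
    "optimize": {"pgi": "-fast", "pathscale": "-Ofast",
                 "cray": "-O3", "gnu": "-O3"},
    "openmp": {"pgi": "-mp=nonuma", "pathscale": "-mp",
               "cray": "-h omp", "gnu": "-fopenmp"},
    "big_endian": {"pgi": "-byteswapio", "pathscale": "-byteswapio",
                   "cray": "-h byteswapio", "gnu": "-fconvert=swap"},
    "fixed_form": {"pgi": "-Mfixed", "pathscale": "-fixedform",
                   "cray": "-f fixed", "gnu": "-ffixed-form"},
    "free_form": {"pgi": "-Mfree", "pathscale": "-freeform",
                  "cray": "-f free", "gnu": "-ffree-form"},
    "version": {"pgi": "-V", "pathscale": "-dumpversion",
                "cray": "-V", "gnu": "--version"},
    "zero_fill": {"pgi": None, "pathscale": "-zerouv",
                  "cray": "-e 0", "gnu": "-finit-local-zero"},
}


def translate(flag, source, target):
    """Translate `flag` from the `source` compiler to the `target` compiler.

    Returns None when the target has no equivalent in the table.
    """
    for row in FLAGS.values():
        if row[source] == flag:
            return row[target]
    raise KeyError(f"{flag!r} is not in the table for {source}")


if __name__ == "__main__":
    print(translate("-fast", "pgi", "gnu"))
    print(translate("-fopenmp", "gnu", "cray"))
```

A `None` result signals that the build needs a different approach for that compiler, for example initializing variables explicitly in the source instead of relying on a zero-fill flag.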

Recommended Options: PGI Compiler
• NERSC recommends:
  – -fast or -fastsse
• PGI User Documentation:
  – "-fast -Mipa=fast" is a good set of options.
• Cray recommends:
  – -fast -Mipa=fast
  – If precision requirements are flexible, also try -Mfprelaxed.

Recommended Options: Pathscale Compiler
• NERSC recommends:
  – -Ofast
• Pathscale User Documentation:
  – Start with -O2, then -O3, then -O3 -OPT:Ofast, then -Ofast.
• Cray recommends:
  – -Ofast

Recommended Options: Cray Compiler
• NERSC recommends:
  – -O3
• Cray recommends:
  – Use the default -O2, which is equivalent to -O3 or -fast in other compilers.
  – Use -O3,fp3 (or -O3 -hfp3).
  – -O3 is only slightly better than -O2.
  – -hfp3 gives maximum freedom in floating-point optimization and may not conform to the IEEE standard.

Recommended Options: GNU Compiler
• NERSC recommends:
  – -O3
• Cray recommends:
  – -O3 -ffast-math -funroll-loops
  – -ffast-math may not conform to the IEEE standard.
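The four "Recommended Options" slides suggest a small matrix of builds to try for each code. A minimal sketch of generating that matrix follows. It assumes a Fortran source built with the `ftn` wrapper; the function name `experiment_matrix`, the source file `bench.f90`, and the executable naming scheme are hypothetical, and the option sets are a subset of those named on the slides.

```python
# Option sets named on the "Recommended Options" slides (a subset),
# ordered roughly from least to most aggressive for each compiler.
OPTION_SETS = {
    "pgi": ["-fast", "-fastsse", "-fast -Mipa=fast"],
    "pathscale": ["-O2", "-O3", "-O3 -OPT:Ofast", "-Ofast"],
    "cray": ["-O2", "-O3", "-O3 -hfp3"],
    "gnu": ["-O3", "-O3 -ffast-math -funroll-loops"],
}


def experiment_matrix(source, wrapper="ftn"):
    """List (compiler, compile command) pairs, one per option set."""
    matrix = []
    for compiler, option_sets in OPTION_SETS.items():
        for index, options in enumerate(option_sets):
            # Hypothetical naming scheme so the builds do not overwrite
            # each other.
            exe = f"bench.{compiler}.{index}"
            matrix.append((compiler, f"{wrapper} {options} -o {exe} {source}"))
    return matrix


if __name__ == "__main__":
    for compiler, command in experiment_matrix("bench.f90"):
        print(f"[PrgEnv-{compiler}] % {command}")
```

Each command has to be run under the matching PrgEnv. Options that relax floating-point semantics (-hfp3, -ffast-math) should be checked against a reference result before their timings are trusted.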

NERSC6 Application Benchmarks

Benchmark  Science                      Algorithm                               Concurrency            Language
GTC        Fusion                       PIC, finite difference                  2048 (weak scaling)    F90
IMPACT-T   Accelerator Physics          PIC, FFT                                1024 (strong scaling)  F90
MAESTRO    Astrophysics                 Block structured-grid multiphysics      2048 (weak scaling)    F90
MILC       Lattice Gauge Physics (QCD)  Conjugate gradient, sparse matrix, FFT  1024 (weak scaling)    C, Assembly
PARATEC    Material Science             DFT, FFT, BLAS                          1024 (strong scaling)  F90

NPB 3.3 Benchmarks

Benchmark  Full Name                          Level  Concurrency
BT         Block Tridiagonal                  D      256
CG         Conjugate Gradient                 E      256
EP         Embarrassingly Parallel            E      256
FT         Fast Fourier Transform             D      256
LU         Lower-Upper Symmetric Gauss-Seidel E      256
MG         MultiGrid                          E      256
SP         Scalar Pentadiagonal               D      256

PGI Compiler Results
• The other three option sets do not significantly improve performance over "-fast".
• The NPB FT class D case is an exception.

Pathscale Compiler Results
• -O2 performs worse than the other three option sets.
• -O3 optimizes almost all benchmarks well.
• Extra options on top of -O3 do not improve performance significantly.

Cray Compiler Results
• Only one benchmark shows significant improvement with -Ofp3 over the default -O2.

GNU Compiler Results
• -O3 generally gives a good level of optimization.
• The -ffast-math option is worth trying; it improves performance significantly in some cases.

Overall Compilers Comparison
• Pathscale fastest: 6 out of 12.
• Cray fastest: 3 out of 12.
• PGI fastest: 2 out of 12.
• GNU fastest: 1 out of 12.
• Mean against PGI: Cray 0.96, Pathscale 0.94, GNU 0.99
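The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that "mean against PGI" is a mean run-time ratio relative to PGI, so values below 1.0 mean faster code; the slides do not state this or which kind of mean was used. The names `FASTEST_COUNT`, `MEAN_VS_PGI`, and `percent_faster_than_pgi` are hypothetical.

```python
# Numbers copied from the "Overall Compilers Comparison" slide.
FASTEST_COUNT = {"Pathscale": 6, "Cray": 3, "PGI": 2, "GNU": 1}
MEAN_VS_PGI = {"PGI": 1.00, "Cray": 0.96, "Pathscale": 0.94, "GNU": 0.99}

# The win counts should cover every benchmark exactly once:
# 5 NERSC6 application benchmarks + 7 NPB 3.3 benchmarks = 12.
TOTAL_BENCHMARKS = 5 + 7
total_wins = sum(FASTEST_COUNT.values())


def percent_faster_than_pgi(ratio):
    """Convert a ratio to a percentage, assuming it is run time relative
    to PGI (lower is better). This interpretation is an assumption."""
    return round((1.0 - ratio) * 100.0, 1)


if __name__ == "__main__":
    print("wins add up:", total_wins == TOTAL_BENCHMARKS)
    for compiler, ratio in MEAN_VS_PGI.items():
        share = FASTEST_COUNT[compiler] / TOTAL_BENCHMARKS
        print(f"{compiler}: fastest on {share:.0%} of benchmarks, "
              f"{percent_faster_than_pgi(ratio)}% faster than PGI on average")
```

Under that reading, the average differences between compilers are within a few percent, even though Pathscale wins half of the individual benchmarks.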

Summary and Recommendations
• Users should experiment with different compilers and compiler options to tune their application performance on Hopper.
• On average, the Pathscale and Cray compilers produce somewhat faster code on Hopper (or another Cray system), since they are specifically designed for these processors. In addition, the Cray compilers make use of the Cray math libraries at compile time to further optimize codes.
• PGI compilers are available on a wide variety of platforms other than Cray machines. Many existing codes have PGI-targeted Makefiles and can achieve very good performance.
• Using the GNU compilers allows you to compile on virtually every Unix and Linux system. Although the performance on Hopper for some codes with the GNU compilers is quite good, there is no guarantee of optimal performance on other platforms.
