Evaluation of Productivity and Performance Characteristics of CCE CAF and UPC Compilers
Sadaf Alam, William Sawyer, Tim Stitt, Neil Stringfellow, and Adrian Tineo, Swiss National Supercomputing Center (CSCS)
Motivation
- Upcoming CSCS development platform: Baker system with GEMINI interconnect
- Availability of PGAS compilers on XT5
- HP2C projects
- PRACE WP8 evaluation
HP2C Projects (www.hp2c.ch)
- Effort to prepare applications for the next-gen platform
BigDFT - Large scale Density Functional Electronic Structure Calculations in a Systematic Wavelet Basis Set; Stefan Goedecker, Uni Basel
Cardiovascular - HPC for Cardiovascular System Simulations; Prof. Alfio Quarteroni, EPF Lausanne
CCLM - Regional Climate and Weather Modeling on the Next Generations High-Performance Computers: Towards Cloud-Resolving Simulations; Dr. Isabelle Bey, ETH Zurich
Cosmology - Computational Cosmology on the Petascale; Prof. Dr. George Lake, Uni Zürich
CP2K - New Frontiers in ab initio Molecular Dynamics; Prof. Dr. Juerg Hutter, Uni Zürich
Gyrokinetic - Advanced Gyrokinetic Numerical Simulations of Turbulence in Fusion Plasmas; Prof. Laurent Villard, EPF Lausanne
MAQUIS - Modern Algorithms for Quantum Interacting Systems; Prof. Thierry Giamarchi, University of Geneva
Petaquake - Large-Scale Parallel Nonlinear Optimization for High Resolution 3D-Seismic Imaging; Dr. Olaf Schenk, Uni Basel
Selectome - Selectome, looking for Darwinian Evolution in the Tree of Life; Prof. Dr. Marc Robinson-Rechavi, Uni Lausanne
Supernova - Productive 3D Models of Stellar Explosions; Dr. Matthias Liebendörfer, Uni Basel
PRACE Work Package 8
- Evaluation of hardware and software prototypes
– CSCS focused on CCE PGAS compilers
– "Technical Report on the Evaluation of Promising Architectures for Future Multi-Petaflop/s Systems", www.prace-project.eu/documents/d8-3-2.pdf
1-min introduction to PGAS
- PGAS: Partitioned Global Address Space
– Not the message-passing API approach of MPI
– Not the single shared-memory approach of OpenMP
- Memory model with local and remote accesses (sketched below)
– Access to local data: fast
– Access to remote data: slow
- Language extensions
– CAF (Co-Array Fortran)
– UPC (Unified Parallel C)
[Diagram: three memory models side by side. MPI: each task sees only its own memory ("Mine" per task). Shared memory: all threads share one space ("Ours"). PGAS: each image/thread owns a local partition ("Mine") inside a globally addressable space.]
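To make the local/remote distinction concrete, here is a minimal UPC sketch; it is not from the slides, and the array name, size, and printed output are illustrative. upc_forall hands each thread the iterations it owns, while a plain read of another thread's element crosses the network.

  #include <upc.h>
  #include <stdio.h>

  #define N 1024

  /* Default layout: element i has affinity to thread i % THREADS */
  shared int x[N*THREADS];

  int main(void) {
      int i, local_sum = 0;

      /* Each thread writes only the elements it owns: all local stores */
      upc_forall (i = 0; i < N*THREADS; i++; &x[i])
          x[i] = i;
      upc_barrier;

      /* Local accesses: upc_forall gives each thread exactly the
         iterations whose elements it owns, so x[i] is a cheap load */
      upc_forall (i = 0; i < N*THREADS; i++; &x[i])
          local_sum += x[i];

      /* Remote access: x[0] has affinity to thread 0, so on every
         other thread this single read crosses the interconnect */
      i = x[0];

      printf("thread %d: local_sum = %d, x[0] = %d\n", MYTHREAD, local_sum, i);
      return 0;
  }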
Yet another programming model?
- Yes and no
– PGAS has been around for 10+ years
– Limited success stories
- What is different now?
– GEMINI provides network support for PGAS access patterns
– The compiler can potentially overlap communication with computation (sketched below)
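As a hedged illustration of why that matters, the following UPC sketch (names, sizes, and the all-ones data are assumptions) structures remote reads as double buffering: fetch chunk k+1 while computing on chunk k. upc_memget as written is blocking; the point is that this loop shape is what a PGAS-aware compiler or a non-blocking transfer can turn into genuine overlap.

  #include <upc.h>
  #include <stdio.h>

  #define CHUNK 256
  #define NCHUNKS 64

  /* Thread t owns one contiguous block of CHUNK*NCHUNKS doubles */
  shared [CHUNK*NCHUNKS] double data[CHUNK*NCHUNKS*THREADS];

  /* Placeholder compute kernel standing in for real work */
  static double process(const double *buf, int n) {
      double s = 0.0;
      int i;
      for (i = 0; i < n; i++)
          s += buf[i];
      return s;
  }

  int main(void) {
      double buf[2][CHUNK];
      double sum = 0.0;
      int next = (MYTHREAD + 1) % THREADS;     /* read a neighbor's block */
      size_t base = (size_t)next * CHUNK * NCHUNKS;
      int i, k;

      /* Each thread fills its own block: local stores only */
      upc_forall (i = 0; i < CHUNK * NCHUNKS * THREADS; i++; &data[i])
          data[i] = 1.0;
      upc_barrier;

      /* Double buffering: issue the fetch of chunk k+1, then process
         chunk k out of the other buffer */
      upc_memget(buf[0], &data[base], CHUNK * sizeof(double));
      for (k = 0; k < NCHUNKS; k++) {
          if (k + 1 < NCHUNKS)
              upc_memget(buf[(k + 1) & 1],
                         &data[base + (size_t)(k + 1) * CHUNK],
                         CHUNK * sizeof(double));
          sum += process(buf[k & 1], CHUNK);
      }

      printf("thread %d: sum = %f\n", MYTHREAD, sum);
      return 0;
  }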
Target Platforms
- XT5 with commodity microprocessors and a custom interconnect
- X2 with proprietary vector processors and a custom interconnect
Building Blocks of CCE PGAS Compilers
- Front end (C/C++/Fortran plus CAF and UPC)
- x86 back end
- GASNet communication interface
– Expected to change on GEMINI-based systems
Test Cases
- Remote-access STREAM
- Matrix multiply
- Stencil-based filter
Compiler Listing: Remote-Access STREAM (CAF)

X2:

  791. 1 Vr------< DO j = 1,n
  792. 1 Vr         b(j) = scalar*c(j)[2]
  793. 1 Vr------> end DO

XT:

  791. 1 1-------< DO j = 1,n
  792. 1 1          b(j) = scalar*c(j)[2]
  793. 1 1-------> end DO
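For comparison with the CAF listing above, here is a hedged UPC analogue of the same remote-access kernel; the array size, the neighbor choice standing in for the [2] co-index, and the initialization are assumptions:

  #include <upc.h>
  #include <stdio.h>

  #define N 4096

  /* Blocked layout: thread t owns elements [t*N, (t+1)*N) of each array */
  shared [N] double b[N*THREADS], c[N*THREADS];

  int main(void) {
      double scalar = 3.0;
      int j;
      int next = (MYTHREAD + 1) % THREADS;   /* neighbor image, like c(j)[2] */

      for (j = 0; j < N; j++)                /* initialize the locally owned block */
          c[MYTHREAD * N + j] = 1.0;
      upc_barrier;

      /* Every read of c touches the neighbor's partition: the all-remote
         pattern that vectorizes on the X2 but becomes element-at-a-time
         transfers on the XT5 (see the results tables below) */
      for (j = 0; j < N; j++)
          b[MYTHREAD * N + j] = scalar * c[next * N + j];
      upc_barrier;

      if (MYTHREAD == 0)
          printf("b[0] = %f (expected %f)\n", b[0], scalar * 1.0);
      return 0;
  }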
Compiler Listing: Matrix Multiply (UPC)

X2:

  1------< upc_forall (i=0; i<N; i++; &c[i][0]) {
  1 V----<   for (j=0; j<M; j++) {
  1 V          c[i][j]=0;
  1 V r--<     for (l=0; l<P; l++)
  1 V r-->       c[i][j]+=a[i][l]*b[l][j];
  1 V---->   }
  1------> }

XT5:

  1------< upc_forall (i=0; i<N; i++; &c[i][0]) {
  1 i----<   for (j=0; j<M; j++) {
  1 i          c[i][j]=0;
  1 i 3--<     for (l=0; l<P; l++)
  1 i 3-->       c[i][j]+=a[i][l]*b[l][j];
  1 i---->   }
  1------> }
X2 Results

           Single image (GB/s)   Two images (GB/s)
  Copy     81.25                 37.57
  Scale    85.63                 37.48
  Add      57.54                 34.95
  Triad    60.37                 34.95

Single image: vectorized local memory copies. Two images: vectorized remote memory copies.
XT5 Results

           Single image (MB/s)   Two images (MB/s)
  Copy     8524.85               3372.67
  Scale    8450.93               1.42
  Add      8792.65               1.50
  Triad    8716.84               1.50

Single image: vectorized local memory copies. Two images: remote memory copies proceed one element at a time with no vectorization, which is why Scale, Add, and Triad collapse to a few MB/s.
Code Rewrite: Reducing Remote Accesses

Original matrix multiply:

  shared [N*P/THREADS] int a[N][P],c[N][M];
  shared [M/THREADS] int b[P][M];
  […]
  upc_forall (i=0; i<N; i++; &c[i][0]) {
    for (j=0; j<M; j++) {
      c[i][j]=0;
      for (l=0; l<P; l++)
        c[i][j]+=a[i][l]*b[l][j];
    }
  }

Alternative matrix multiply:

  shared [N*P/THREADS] int a[N][P],c[N][M];
  shared [M/THREADS] int b[P][M];
  […]
  for(j=0;j<M;j++){
    for(l=0;l<P;l++){
      b_val = b[l][j];
      upc_forall(i=0;i<N;i++;&c[i][0])
        c[i][j]+=a[i][l]*b_val;
    }
  }

Hoisting b[l][j] into the private variable b_val makes each thread read each (possibly remote) element of b once per (j,l) pair, instead of once for every row of c it owns. A runnable sketch follows.
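This is a minimal self-contained program around the alternative kernel, for readers who want to reproduce the comparison; the matrix sizes, the all-ones initialization, and the final check are assumptions, not from the slides:

  #include <upc.h>
  #include <stdio.h>

  #define N 128
  #define M 128
  #define P 128

  /* Layout from the slide; THREADS in a block size requires a fixed
     thread count at compile time, and this sketch assumes THREADS
     divides N and M */
  shared [N*P/THREADS] int a[N][P], c[N][M];
  shared [M/THREADS] int b[P][M];

  int main(void) {
      int i, j, l, b_val;

      /* Assumed all-ones initialization so the result is checkable;
         each thread touches only the rows of a and c it owns */
      upc_forall (i = 0; i < N; i++; &c[i][0]) {
          for (j = 0; j < P; j++) a[i][j] = 1;
          for (j = 0; j < M; j++) c[i][j] = 0;
      }
      if (MYTHREAD == 0)
          for (l = 0; l < P; l++)
              for (j = 0; j < M; j++)
                  b[l][j] = 1;
      upc_barrier;

      /* Alternative kernel from the slide: each thread reads b[l][j]
         once into private b_val instead of once per owned row of c */
      for (j = 0; j < M; j++)
          for (l = 0; l < P; l++) {
              b_val = b[l][j];
              upc_forall (i = 0; i < N; i++; &c[i][0])
                  c[i][j] += a[i][l] * b_val;
          }
      upc_barrier;

      /* With all-ones inputs every c[i][j] should equal P */
      if (MYTHREAD == 0)
          printf("c[0][0] = %d (expected %d)\n", c[0][0], P);
      return 0;
  }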
Matrix Multiply Results on XT5
No improvement on the X2 platform; the alternative implementation is in fact slower there.
Productivity Evaluation
Aspects compared for both CAF and UPC:
- Compiler interface
- Runtime control
- Debugging tools
- Performance tools

The biggest issue is the availability of multi-platform compilers, especially for CAF.
Conclusions
- Need to retain microprocessor-level optimizations
- Need a memory- and communication-hierarchy-aware runtime
- CCE PGAS compilers target x86 and GASNet-supported platforms
- Need PGAS-aware debugging and performance tools