Parallel Performance Analysis and Tuning as a Service
EU H2020 Centre of Excellence (CoE)
1 October 2015 – 31 March 2018
Grant Agreement No 676553

POP CoE
• A Centre of Excellence
  • On Performance Optimisation and Productivity
  • Promoting best practices in parallel programming
• Providing Services
  • Precise understanding of application and system behaviour
  • Suggestion/support on how to refactor code in the most productive way
• Horizontal
  • Transversal across application areas, platforms, scales
• For (your?) academic AND industrial codes and users!

Partners
• Who?
  • BSC (coordinator), ES
  • HLRS, DE
  • JSC, DE
  • NAG, UK
  • RWTH Aachen, IT Center, DE
  • TERATEC, FR
A team with
• Excellence in performance tools and tuning
• Excellence in programming models and practices
• Research and development background AND proven commitment in application to real academic and industrial use cases

Motivation
Why?
• Complexity of machines and codes
  → Frequent lack of quantified understanding of actual behaviour
  → Most productive direction of code refactoring is unclear
• Important to maximize efficiency (performance, power) of compute intensive applications and productivity of the development efforts
What?
• Parallel programs, mainly MPI/OpenMP
  • Although also CUDA, OpenCL, OpenACC, Python, …

The process …
When? October 2015 – March 2018
How?
• Apply
  • Fill in small questionnaire describing application and needs: https://pop-coe.eu/request-service-form
  • Questions? Ask pop@bsc.es
• Selection/assignment process
• Install tools @ your production machine (local, PRACE, …)
• Interactively: Gather data → Analysis → Report

Services provided by the CoE
? Parallel Application Performance Audit → Report
  • Primary service
  • Identify performance issues of customer code (at customer site)
  • Small effort (< 1 month)
! Parallel Application Performance Plan → Report
  • Follow-up on the audit service
  • Identifies the root causes of the issues found and qualifies and quantifies approaches to address them
  • Longer effort (1-3 months)
Proof-of-Concept → Software Demonstrator
  • Experiments and mock-up tests for customer codes
  • Kernel extraction, parallelisation, mini-apps experiments to show effect of proposed optimisations
  • 6 months effort

Outline of a Typical Audit Report
• Application Structure
  • (if appropriate) Region of Interest
• Scalability Information
• Application Efficiency
  • E.g. time spent outside MPI
• Load Balance
  • Whether due to internal or external factors
• Serial Performance
  • Identification of poor code quality
• Communications
  • E.g. sensitivity to network performance
• Summary and Recommendations

Efficiencies (WIP!)
• The following metrics are used in a POP Performance Audit (CT = Computational time, TT = Total time):
• Global Efficiency (GE): GE = PE * CompE
  • Parallel Efficiency (PE): PE = LB * CommE
    • Load Balance Efficiency (LB): LB = avg(CT) / max(CT)
    • Communication Efficiency (CommE): CommE = SerE * TE
      • Serialization Efficiency (SerE): SerE = max(CT / TT on ideal network)
      • Transfer Efficiency (TE): TE = TT on ideal network / TT
  • Computation Efficiency (CompE)
    • Computed out of IPC Scaling and Instruction Scaling
    • For strong scaling: ideal scaling -> efficiency of 1.0
• Details see https://sharepoint.ecampus.rwth-aachen.de/units/rz/HPC/public/Shared%20Documents/Metrics.pdf
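The metric hierarchy above can be sketched in a few lines of Python. This is a minimal illustration, not the POP tooling: the function name `pop_efficiencies`, its parameters and the sample timings are hypothetical. It assumes one useful-computation time per process (CT), a measured total run time (TT), and a total run time on an ideal network that has been obtained separately (for example from a simulation). CompE is taken as an input because the slide only says it is derived from IPC and instruction scaling.

```python
from statistics import mean


def pop_efficiencies(compute_times, total_time, ideal_total_time, comp_e=None):
    """Evaluate the slide's efficiency formulas for a single run.

    compute_times    -- per-process computational time (CT), hypothetical input
    total_time       -- measured total time (TT)
    ideal_total_time -- TT on an ideal network (assumed to be known)
    comp_e           -- Computation Efficiency, if available (assumed input)
    """
    if not compute_times or total_time <= 0 or ideal_total_time <= 0:
        raise ValueError("need at least one CT value and positive run times")

    max_ct = max(compute_times)
    lb = mean(compute_times) / max_ct       # LB    = avg(CT) / max(CT)
    ser_e = max_ct / ideal_total_time       # SerE  = max(CT / TT on ideal network)
    te = ideal_total_time / total_time      # TE    = TT on ideal network / TT
    comm_e = ser_e * te                     # CommE = SerE * TE
    pe = lb * comm_e                        # PE    = LB * CommE

    result = {"LB": lb, "SerE": ser_e, "TE": te, "CommE": comm_e, "PE": pe}
    if comp_e is not None:
        result["GE"] = pe * comp_e          # GE    = PE * CompE
    return result


if __name__ == "__main__":
    # Made-up timings in seconds for four processes.
    metrics = pop_efficiencies([8.0, 9.0, 10.0, 9.0],
                               total_time=12.5, ideal_total_time=11.0)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Because the ideal-network time cancels in SerE * TE, CommE reduces to max(CT) / TT and PE to avg(CT) / TT, so the ideal-network run is only needed to split communication losses into serialization and transfer.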

POP Users and Their Codes
Area: Codes
• Computational Fluid Dynamics: DROPS (RWTH Aachen), Nek5000 (PDC KTH), SOWFA (CENER), ParFlow (FZ-Juelich), FDS (COAC) & others
• Electronic Structure Calculations: ADF (SCM), Quantum ESPRESSO (Cineca), FHI-AIMS (University of Barcelona), SIESTA (BSC), ONETEP (University of Warwick)
• Earth Sciences: NEMO (BULL), UKCA (University of Cambridge), SHEMAT-Suite (RWTH Aachen) & others
• Finite Element Analysis: Ateles (University of Siegen) & others
• Gyrokinetic Plasma Turbulence: GYSELA (CEA), GS2 (STFC)
• Materials Modelling: VAMPIRE (University of York), GraGLeS2D (RWTH Aachen), DPM (University of Luxembourg), QUIP (University of Warwick) & others
• Neural Networks: OpenNN (Artelnics)

Customer Feedback (Sep 2016)
• Results from 18 of 23 completed feedback surveys (~78%)
• How responsive have the POP experts been to your questions or concerns about the analysis and the report?
• What was the quality of their answers?

Best Practices in Performance Analysis
• Powerful tools …
  • Extrae + Paraver
  • Score-P + Scalasca/TAU/Vampir + Cube
  • Dimemas, Extra-P
  • Commercial tools (if available)
• … and techniques
  • Clustering, modeling, projection, extrapolation, memory access patterns, …
  • … with extreme detail …
  • … and up to extreme scale
• Unify methodologies
  • Structure
    • Spatio temporal / syntactic
  • Metrics
    • Parallel fundamental factors: Efficiency, Load balance, Serialization
    • Programming model related metrics
    • User level code sequential performance
  • Hierarchical search
    • From high level fundamental behavior to its causes
• To deliver insight
• To estimate potentials

Proof-of-Concept Examples

GraGLeS2D – RWTH Aachen
• Simulates grain growth phenomena in polycrystalline materials
• C++ parallelized with OpenMP
• Designed for very large SMP machines (e.g. 16 sockets and 2 TB memory)
• Key audit results:
  • Good load balance
  • Costly use of division and square root inside loops
  • Not fully utilising vectorisation in key loops
  • NUMA specific data sharing issues lead to long times for memory access

GraGLeS2D – RWTH Aachen
• Improvements:
  • Restructured code to enable vectorisation
  • Used memory allocation library optimised for NUMA machines
  • Reordered work distribution to optimise for data locality
• Speed up in region of interest is more than 10x
• Overall application speed up is 2.5x
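The gap between the region speed up and the overall speed up can be related with Amdahl's law. The slides do not use this model, so the sketch below is an assumption: it supposes that only the region of interest got faster and that the rest of the run time was unchanged. The function names are hypothetical, and since the slide reports "more than 10x", the value 10 is a lower bound rather than a measurement.

```python
def overall_speedup(region_fraction, region_speedup):
    """Amdahl's law: overall speed up when a fraction of the original
    run time is accelerated by region_speedup and the rest is unchanged."""
    return 1.0 / ((1.0 - region_fraction) + region_fraction / region_speedup)


def implied_region_fraction(overall, region_speedup):
    """Invert Amdahl's law: the fraction of the original run time that the
    accelerated region must have occupied to yield the overall speed up."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / region_speedup)


if __name__ == "__main__":
    # Figures from the slide: overall 2.5x, region of interest at least 10x.
    fraction = implied_region_fraction(2.5, 10.0)
    print(f"implied share of original run time: {fraction:.2f}")
    print(f"check, overall speed up: {overall_speedup(fraction, 10.0):.2f}")
```

Under these assumptions the region of interest would have accounted for roughly two thirds of the original run time. A region speed up larger than 10x lowers that estimate slightly, towards 60%.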

Ateles – University of Siegen
• Finite element code
• C and Fortran code with hybrid MPI+OpenMP parallelisation
• Key audit results:
  • High number of function calls
  • Costly divisions inside inner loops
  • Poor load balance
• Performance plan:
  • Improve function inlining
  • Improve vectorisation
  • Reduce duplicate computation

Ateles – Proof-of-concept
• Inlined key functions → 6% reduction in execution time
• Improved mathematical operations in loops → 28% reduction in execution time
• Vectorisation: found bug in gnu compiler, confirmed Intel compiler worked as expected
• 6 weeks software engineering effort
• Customer has confirmed “substantial” performance increase on production runs
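The slides do not say which mathematical operations were changed. One common transformation for costly divisions inside inner loops is to compute a reciprocal once outside the loop and multiply inside it. The sketch below shows that pattern only; it is not the Ateles code, the function names are hypothetical, and it is written in Python purely to show the shape of the rewrite. Any gain would come in compiled C or Fortran, where a floating-point division typically has a higher latency than a multiplication.

```python
import math


def scale_naive(values, divisor):
    """One floating-point division per element inside the loop."""
    return [v / divisor for v in values]


def scale_hoisted(values, divisor):
    """Single division hoisted out of the loop, multiplication inside."""
    inverse = 1.0 / divisor
    return [v * inverse for v in values]


if __name__ == "__main__":
    data = [1.0, 2.5, -3.0, 1.0e6]
    naive = scale_naive(data, 3.0)
    hoisted = scale_hoisted(data, 3.0)
    # The two versions can differ in the last bit, so compare with a tolerance.
    print(all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(naive, hoisted)))
```

Multiplying by a reciprocal is not bit-identical to dividing, which is one reason compilers only apply this rewrite automatically under relaxed floating-point settings.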

Sustainability
• H2020 CoEs are supposed to sustain themselves after some point
  • Proposals had to include a business plan
• Current plan: 3 sustainable operation modes
  • Pay-per-service
  • Service subscriptions
  • Continue as non-profit organisation (broker for free + paid services)
• Requires more industrial than academic/research customers
• Experience so far
  • Typically require NDA → delays services by months
  • No access to code/computers → guide (inexperienced) customer to install tools + measure → delays services by months

Performance Optimisation and Productivity
A Centre of Excellence in Computing Applications
Contact: https://www.pop-coe.eu, mailto:pop@bsc.es
05-Oct-16
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 676553.
