Performance Optimisation and Productivity
EU H2020 Centre of Excellence (CoE)
1 October 2015 – 31 March 2018
Grant Agreement No 676553
way
A team with proven commitment in application to real academic and industrial use cases
Why?
Frequent lack of quantified understanding of actual behaviour
Not clear what the most productive direction of code refactoring is
compute-intensive applications and productivity of the development efforts
What?
When? October 2015 – March 2018
How?
describing application and needs: https://pop-coe.eu/request-service-form
? Parallel Application Performance Audit Report
! Parallel Application Performance Plan Report
qualifies and quantifies approaches to address them
Proof-of-Concept Software Demonstrator
effect of proposed optimisations
CT = Computational time
TT = Total time
behaviour
directions to refactor code
performance in specific production conditions
environment setup
provider
performance in production conditions
modifying environment setup
time allocation processes
Area / Codes
Computational Fluid Dynamics: DROPS (RWTH Aachen), Nek5000 (PDC KTH), SOWFA (CENER), ParFlow (FZ-Juelich), FDS (COAC) & others
Electronic Structure Calculations: ADF (SCM), Quantum ESPRESSO (Cineca), FHI-AIMS (University of Barcelona), SIESTA (BSC), ONETEP (University of Warwick)
Earth Sciences: NEMO (BULL), UKCA (University of Cambridge), SHEMAT-Suite (RWTH Aachen) & others
Finite Element Analysis: Ateles (University of Siegen) & others
Gyrokinetic Plasma Turbulence: GYSELA (CEA), GS2 (STFC)
Materials Modelling: VAMPIRE (University of York), GraGLeS2D (RWTH Aachen), DPM (University of Luxembourg), QUIP (University of Warwick) & others
Neural Networks: OpenNN (Artelnics)
your questions or concerns about the analysis and the report?
extrapolation, memory access patterns,
Efficiency, Load balance, Serialization
performance
behavior to its causes
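The efficiency factors named above (load balance, serialization) fit a multiplicative model: parallel efficiency is the product of load balance and communication efficiency, and communication efficiency splits further into serialization and transfer efficiency. The minimal sketch below computes these factors from per-process useful computation times. The input numbers are made up, and the ideal-network runtime is an assumed input that in practice would come from a Dimemas-style replay of the trace.

```python
from dataclasses import dataclass


@dataclass
class EfficiencyFactors:
    load_balance: float
    communication: float
    serialization: float
    transfer: float
    parallel: float


def efficiency_factors(useful_times, runtime, ideal_runtime):
    """Compute multiplicative efficiency factors.

    useful_times  -- useful computation time of each process (seconds)
    runtime       -- measured elapsed time of the run (seconds)
    ideal_runtime -- runtime on an ideal, zero-cost network (seconds);
                     an assumed input, e.g. from a Dimemas-style replay
    """
    avg_useful = sum(useful_times) / len(useful_times)
    max_useful = max(useful_times)
    load_balance = avg_useful / max_useful      # how evenly work is spread
    communication = max_useful / runtime        # time lost outside useful work
    serialization = max_useful / ideal_runtime  # loss even with an ideal network
    transfer = ideal_runtime / runtime          # loss caused by the real network
    return EfficiencyFactors(
        load_balance=load_balance,
        communication=communication,
        serialization=serialization,
        transfer=transfer,
        # Parallel efficiency = load balance * communication efficiency
        parallel=load_balance * communication,
    )


if __name__ == "__main__":
    # Hypothetical measurements for 4 processes
    f = efficiency_factors([8.0, 9.0, 10.0, 5.0], runtime=12.5, ideal_runtime=11.0)
    print(f)
```

Because serialization × transfer reduces to max_useful / runtime, the sub-factors always multiply back to the communication efficiency, which is what lets a report attribute a loss to one specific cause.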
(if available at customer site)
[Diagram: instrumented target application, Score-P, PAPI, OTF2 traces, CUBE4 report, Scalasca wait-state analysis, TAU, PerfExplorer, ParaProf, CUBE, Vampir, Periscope online interface]
Remote Guidance
(www.bsc.es/paraver)
Instantaneous metrics for ALL hardware counters at “no” cost
Adaptive burst mode tracing
Tracking performance evolution
26.7 MB trace; Eff: 0.43; LB: 0.52; Comm: 0.81
1600 cores, 2.5 s
BSC-ES – EC-EARTH; AMG2013
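Assuming that Eff above denotes parallel efficiency in the multiplicative model Eff = LB × Comm (an assumption; the slide does not define it), the three reported values are consistent once the two-decimal rounding of LB and Comm is taken into account:

```latex
\text{Eff} = \text{LB} \times \text{Comm}, \qquad
0.515 \times 0.805 \approx 0.415 \;\le\; \text{LB} \times \text{Comm} \;<\; 0.525 \times 0.815 \approx 0.428
```

The upper end of this interval rounds to the reported 0.43, while the product of the rounded values themselves, 0.52 × 0.81 ≈ 0.42, sits just below it.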
Flexible trace visualization and analysis
Advanced clustering algorithms
What if … we increase the IPC of Cluster 1?
What if … we balance Clusters 1 & 2?
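The what-if questions above can be approximated with a deliberately simple model: each process spends some compute time in each cluster, and the run ends when the slowest process does. The sketch below assumes no communication or dependencies between processes, which a trace-driven simulator such as Dimemas would account for. The profile data and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical profile: process rank -> {cluster id: total compute time (s)}.
Profile = dict[int, dict[int, float]]


def runtime(profile: Profile) -> float:
    """Predicted runtime: the slowest process, ignoring communication and waits."""
    return max(sum(times.values()) for times in profile.values())


def increase_ipc(profile: Profile, cluster: int, factor: float) -> Profile:
    """What if the cluster's IPC grew by `factor`? With the same instruction
    count, its compute time shrinks by 1/factor on every process."""
    return {
        rank: {c: t / factor if c == cluster else t for c, t in times.items()}
        for rank, times in profile.items()
    }


def balance(profile: Profile, clusters: set[int]) -> Profile:
    """What if the selected clusters were perfectly balanced? Every process
    gets the mean time of each selected cluster, so total work is preserved."""
    targets = {c: mean(times.get(c, 0.0) for times in profile.values()) for c in clusters}
    return {rank: {**times, **targets} for rank, times in profile.items()}


if __name__ == "__main__":
    profile = {
        0: {1: 4.0, 2: 2.0},
        1: {1: 6.0, 2: 1.0},
        2: {1: 5.0, 2: 3.0},
    }
    print("baseline:            ", runtime(profile))
    print("Cluster 1 IPC x2:    ", runtime(increase_ipc(profile, 1, 2.0)))
    print("Clusters 1&2 balanced:", runtime(balance(profile, {1, 2})))
```

Comparing the predicted runtimes indicates which refactoring direction is worth more before any code is changed, which is the purpose of asking the question.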
eff.csv
Several core counts
eff_factors.py
extrapolation.py
Dimemas
No MPI noise + No OS noise
“Scalability prediction for fundamental performance factors”, J. Labarta et al., SuperFRI 2014
Models and Projection
Data access patterns
Tareador
Intel-BSC Exascale Lab
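The extrapolation step fits a model to efficiency factors measured at several core counts and projects them to larger counts. The cited paper and the eff_factors.py/extrapolation.py scripts use their own fitting models, which are not reproduced here. Purely to illustrate the idea, the sketch below substitutes a simple Amdahl-like form, eff(P) = 1 / (1 + a·(P − 1)), fitted by least squares. The function names and measurements are hypothetical.

```python
def fit_amdahl_like(cores: list[int], eff: list[float]) -> float:
    """Least-squares fit of `a` in 1/eff - 1 = a * (P - 1).

    Linearising the model gives a straight line through the origin,
    so the closed-form slope is sum(x*y) / sum(x*x)."""
    xs = [p - 1 for p in cores]
    ys = [1.0 / e - 1.0 for e in eff]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict(a: float, cores: int) -> float:
    """Projected efficiency factor at `cores` under the fitted model."""
    return 1.0 / (1.0 + a * (cores - 1))


if __name__ == "__main__":
    # Hypothetical load-balance measurements at several core counts
    cores = [16, 32, 64, 128]
    load_balance = [0.97, 0.94, 0.89, 0.80]
    a = fit_amdahl_like(cores, load_balance)
    for p in (256, 1024):
        print(p, round(predict(a, p), 3))
```

Under the multiplicative efficiency model, each factor can be fitted this way on its own and the projections multiplied to estimate overall efficiency at scale; which factor decays fastest then points at the most promising optimisation.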
conversion of particulate material in furnaces
due to the way that the code had been parallelised
point contention due to sending MPI messages in increasing-rank order
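The contention described above arises when every process walks through its destinations in increasing-rank order: in each step, nearly all senders target the same rank. A common remedy, assumed here rather than taken from the report, is to stagger the order so that process r sends to (r + k) mod P in step k. The sketch counts the worst-case number of messages arriving at a single rank in one step under both orders, assuming an all-to-all exchange in which every process sends one message per step. The function names are hypothetical.

```python
from collections import Counter


def increasing_rank_order(rank: int, nprocs: int) -> list[int]:
    """Destinations visited in increasing-rank order (skipping self)."""
    return [d for d in range(nprocs) if d != rank]


def staggered_order(rank: int, nprocs: int) -> list[int]:
    """Destinations (rank + k) mod P for k = 1..P-1: each step is a permutation."""
    return [(rank + k) % nprocs for k in range(1, nprocs)]


def max_incoming_per_step(order, nprocs: int) -> int:
    """Worst number of messages that land on the same receiver in one step."""
    schedules = [order(r, nprocs) for r in range(nprocs)]
    worst = 0
    for step in range(nprocs - 1):
        receivers = Counter(schedule[step] for schedule in schedules)
        worst = max(worst, max(receivers.values()))
    return worst


if __name__ == "__main__":
    p = 8
    print("increasing-rank order:", max_incoming_per_step(increasing_rank_order, p))
    print("staggered order:      ", max_incoming_per_step(staggered_order, p))
```

Both schedules deliver exactly the same messages; only the ordering changes, so in the increasing-rank case P − 1 senders queue on rank 0 in the first step, while the staggered schedule keeps every receiver busy with one message per step.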
calculations and materials modelling
process does useful computation (1.77x speedup over 1 thread)
and replace multiple calls to the random number generator with a single call that returns a vector of numbers
lead to 2x speedup
tokamaks
1.4% due to MPI)
memory)
execution time
worked as expected
production runs
…
suggested refactoring efforts
POP Coordination
Barcelona Supercomputing Center (BSC)
Email: pop@bsc.es
URL: http://www.pop-coe.eu
29-Sep-16
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 676553.