11/17/19 | Department of Computer Science | Laboratory for Parallel Programming | Prof. Dr. Felix Wolf | 1
Felix Wolf, TU Darmstadt
Lightweight Requirements Engineering for Exascale Co-design
Better algorithms
Communication
Computation
#Loads & stores
#Bytes sent & received
#FLOPS
Stack distance
$$f(x_1, \ldots, x_m) = \sum_{k=1}^{n} c_k \cdot \prod_{l=1}^{m} x_l^{i_{kl}} \cdot \log_2^{j_{kl}}(x_l)$$
Fast Multi-Parameter Performance Modeling (CLUSTER ’16), www.scalasca.org/software/extra-p/download.html
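As a small worked instance of the multi-parameter normal form, the Strassen efficiency model E_ac = 1.55 − 1.02 p^0.25 + 4.59⋅10^−2 p^0.25 log n quoted later in this deck can be read as a sum of n = 3 terms over m = 2 parameters. The assignment x_1 = p, x_2 = n and the reading of log as log_2 are assumptions made here for illustration only.

```latex
\begin{align*}
f(x_1, x_2) &= \sum_{k=1}^{3} c_k \prod_{l=1}^{2} x_l^{i_{kl}} \cdot \log_2^{j_{kl}}(x_l),
  \qquad x_1 = p,\; x_2 = n \\[4pt]
k = 1:&\quad c_1 = 1.55,            && i_{11} = 0,    \; j_{11} = 0, \; i_{12} = 0, \; j_{12} = 0 \\
k = 2:&\quad c_2 = -1.02,           && i_{21} = 0.25, \; j_{21} = 0, \; i_{22} = 0, \; j_{22} = 0 \\
k = 3:&\quad c_3 = 4.59 \cdot 10^{-2}, && i_{31} = 0.25, \; j_{31} = 0, \; i_{32} = 0, \; j_{32} = 1 \\[4pt]
f(p, n) &= 1.55 - 1.02\, p^{0.25} + 4.59 \cdot 10^{-2}\, p^{0.25} \log_2 n
\end{align*}
```

Each term picks one exponent pair per parameter, so a term that depends on both p and n is simply a product of a power of p and a logarithm of n.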
LULESH
Requirement        Metric
Computation        #FLOPs
Communication      #Bytes sent & received
Memory access      #Loads & stores
Memory footprint   #Bytes used
Memory locality    Stack distance
Available sockets → # Processes
Available memory per process → Problem size per process
# Processes, Problem size per process → Overall problem size
Requirement models → Requirements: #FLOPS, #Bytes sent, ...
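The workflow on this slide (system parameters in, requirements out) can be sketched in C as follows. This is only an illustration under stated assumptions, not the tooling used in the talk: the struct and function names, the fixed number of processes per socket, the rule overall size = p · n, and the three toy models with their coefficients are all hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* A requirement model maps (process count p, problem size per process n)
   to a per-process requirement such as #FLOPS or #Bytes sent. */
typedef double (*model_fn)(double p, double n);

/* Hypothetical system description; field names are illustrative. */
typedef struct {
    long sockets;            /* available sockets */
    long procs_per_socket;   /* assumption: fixed processes per socket */
    double mem_per_process;  /* available memory per process, in bytes */
} system_cfg;

typedef struct {
    long processes;
    long size_per_process;
    double overall_size;
    double flops;
    double bytes_sent;
} requirements;

/* Largest n >= 1 with footprint(p, n) <= mem, or 0 if n = 1 does not fit.
   Assumes the footprint grows monotonically with n. */
static long max_size_per_process(model_fn footprint, double p, double mem)
{
    if (footprint(p, 1.0) > mem)
        return 0;
    long lo = 1, hi = 2;
    /* Double hi until it no longer fits (or the search cap is reached). */
    while (hi < LONG_MAX / 2 && footprint(p, (double)hi) <= mem) {
        lo = hi;
        hi *= 2;
    }
    if (footprint(p, (double)hi) <= mem)
        return hi;           /* cap reached while still fitting */
    /* Bisect between the last fitting and the first non-fitting size. */
    while (hi - lo > 1) {
        long mid = lo + (hi - lo) / 2;
        if (footprint(p, (double)mid) <= mem)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Derive process count, problem sizes, and requirements from the system. */
static requirements derive(system_cfg s, model_fn footprint,
                           model_fn flops, model_fn bytes_sent)
{
    requirements r;
    r.processes = s.sockets * s.procs_per_socket;
    double p = (double)r.processes;
    r.size_per_process = max_size_per_process(footprint, p, s.mem_per_process);
    double n = (double)r.size_per_process;
    r.overall_size = p * n;  /* assumption: overall size = p * n */
    r.flops = flops(p, n);
    r.bytes_sent = bytes_sent(p, n);
    return r;
}

/* Toy models with made-up coefficients, for illustration only. */
static double toy_footprint(double p, double n) { (void)p; return 800.0 * n; }
static double toy_flops(double p, double n) { (void)p; return 100.0 * n; }
static double toy_bytes_sent(double p, double n) { return 8.0 * n + 16.0 * p; }
```

Calling `derive` with a `system_cfg` of 4 sockets, 16 processes per socket, and 1e9 bytes per process, together with the three toy models, fills the memory first and then evaluates the remaining models at the resulting problem size. Real requirement models would be fitted from measurements rather than written by hand.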
Ratio                       Kripke  LULESH  MILC  Relearn  icoFoam  Baseline

System Upgrade A: Double the racks
Problem size per process    1       1       1     1        0.5      1
Overall problem size        2       2       2     2        1        2
Computation                 1       1.2     1     1        0.5      1
Communication               1       1.2     1     1        0.7      1
Memory accesses             2       1.2     2.8   2        0.7      1

System Upgrade B: Double the sockets
Problem size per process    0.5     0.5     0.5   0.3      0.3      0.5
Overall problem size        1       1       1     0.5      0.6      1
Computation                 0.5     0.6     0.5   0.3      0.2      0.5
Communication               0.5     0.6     0.5   0.3      0.3      0.5
Memory accesses             0.5     1       1.4   1        0.5      0.5

System Upgrade C: Double the memory
Problem size per process    2       1.4     2     4        1.4      2
Overall problem size        2       1.4     2     4        1.4      2
Computation                 2       1.4     2     4        1.7      2
Communication               2       1.4     2     4        1.4      2
Memory accesses             2       1.4     2     4        1.4      2
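One way to sanity-check the upgrade ratios, offered as a reading of the numbers rather than a statement from the slides: the overall problem-size ratio appears to be the product of the process-count ratio and the per-process problem-size ratio. The symbols r_p, r_n, and r_N are introduced here for illustration, and the values r_p = 2 for upgrades A and B and r_p = 1 for upgrade C are assumptions inferred from the upgrade names.

```latex
\begin{align*}
r_N &= r_p \cdot r_n \\[4pt]
\text{Upgrade A, Kripke:}  \quad r_N &= 2 \cdot 1   = 2 \\
\text{Upgrade B, icoFoam:} \quad r_N &= 2 \cdot 0.3 = 0.6 \\
\text{Upgrade C, Relearn:} \quad r_N &= 1 \cdot 4   = 4
\end{align*}
```

All three products match the "Overall problem size" entries. Where an entry deviates slightly (Relearn under upgrade B lists 0.5 rather than 0.6), rounding of the displayed ratios is a plausible explanation, though the slides do not say so.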
Many but weak processors
Few but powerful processors
Moderate number of moderate processors
Metric                                        Massively parallel  Vector  Hybrid
Maximum overall problem size
Minimum wall time for benchmark problem [s]
Bigger problem versus faster solution
Vector system clear winner
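int fib(int n)
{
    int x, y;
    /* Base case, assumed here; the slide shows only the task-spawning part. */
    if (n < 2)
        return n;
    /* Each recursive call runs as its own task and writes to a shared local. */
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return x + y;
}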
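The task-based fragment only creates parallel work when it is called from inside a parallel region. The sketch below shows one common way to do that; the names `fib_task` and `fib_parallel`, the base case, and the use of `long` are choices made for this illustration and do not come from the slides.

```c
#include <assert.h>

/* Task-recursive Fibonacci with the same structure as the slide fragment. */
static long fib_task(int n)
{
    long x, y;
    if (n < 2)
        return n;                 /* base case (assumed) */
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    #pragma omp task shared(y)
    y = fib_task(n - 2);
    #pragma omp taskwait
    return x + y;
}

/* Hypothetical entry point: open a parallel region, let a single thread
   create the root of the task tree, and let the whole team execute tasks. */
static long fib_parallel(int n)
{
    long result = 0;
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

With GCC or Clang the OpenMP pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and the code runs serially with the same result. A production version would usually add a cutoff below which the recursion stops spawning tasks, since tiny tasks cost more than they save.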
Strassen, p = 60, E = 0.8:
$$0.8 = 1.55 - 1.02 \cdot 60^{0.25} + 4.59 \cdot 10^{-2} \cdot 60^{0.25} \log n$$

App        Model                                                                                  Input size for p = 60, E = 0.8
Fibonacci  $E_{ac} = 0.98 - 5.11 \cdot 10^{-3}\, p^{1.25} + 1.76 \cdot 10^{-3}\, p^{1.25} \log n$   51
Fibonacci  $E_{cf} = 0.97 - 1.46 \cdot 10^{-2}\, p^{1.25} + 9.26 \cdot 10^{-3}\, p^{1.25} \log n$   51
Fibonacci  $E_{ub} = \min\{1,\ 25.48 + 0.49\, n^{2.75} \log n\}$                                    49
Strassen   $E_{ac} = 1.55 - 1.02\, p^{0.25} + 4.59 \cdot 10^{-2}\, p^{0.25} \log n$                 83,600 x 83,600
Strassen   $E_{cf} = 1.26 - 0.65\, p^{0.33} + 3.89 \cdot 10^{-2}\, p^{0.33} \log n$                 12,680 x 12,680
Strassen   $E_{ub} = \min\{1,\ 0.25\, n^{0.75}\}$                                                   1,200 x 1,200
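To make the substitution for p = 60 and E = 0.8 concrete, the Strassen E_ac model can be solved for n. This sketch assumes that log denotes the base-2 logarithm, which the slide does not state; under that assumption the result lands close to the 83,600 listed for Strassen. Intermediate values are rounded.

```latex
\begin{align*}
E_{ac} &= 1.55 - 1.02\, p^{0.25} + 4.59 \cdot 10^{-2}\, p^{0.25} \log_2 n \\[4pt]
\log_2 n &= \frac{E_{ac} - 1.55 + 1.02\, p^{0.25}}{4.59 \cdot 10^{-2}\, p^{0.25}} \\[4pt]
  &= \frac{0.8 - 1.55 + 1.02 \cdot 60^{0.25}}{4.59 \cdot 10^{-2} \cdot 60^{0.25}}
   \approx \frac{0.8 - 1.55 + 1.02 \cdot 2.783}{4.59 \cdot 10^{-2} \cdot 2.783}
   \approx \frac{2.089}{0.1277} \approx 16.35 \\[4pt]
n &\approx 2^{16.35} \approx 8.4 \cdot 10^{4}
\end{align*}
```

Because n enters only through its logarithm, small changes in the target efficiency translate into large changes in the required input size.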
http://www.scalasca.org/software/extra-p/download.html
[1] Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, Felix Wolf: Lightweight Requirements Engineering for Exascale Co-design. In Proc. of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK
[2] Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Felix Wolf: Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications. In Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Austin, TX, USA, pages 1-13, ACM, February, 2017
[3] Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. In Proc. of the 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan
[4] Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. In Proc. of the ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA