Reflecting on the Goal and Baseline of Exascale Computing
Thomas C. Schulthess

Slide 2: Tracking supercomputer performance over time. The Linpack benchmark solves a dense linear system Ax = b (sketched below).
Milestones marked on the plot: the first application to sustain > 1 TFLOP/s and the first to sustain > 1 PFLOP/s.
Applications shown: KKR-CPA (MST), LSMS (MST), WL-LSMS (MST).
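For reference, a minimal sketch of what the Linpack benchmark measures; this is not the HPL implementation (which uses a blocked, pivoted LU factorization on a distributed matrix), and the problem size below is purely illustrative. The FLOP rate uses HPL's standard operation count of 2/3·n³ + 2·n².

```cpp
// Minimal sketch of what HPL measures (not the real HPL code): solve Ax = b
// and report a FLOP rate using HPL's operation count 2/3*n^3 + 2*n^2.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000;  // illustrative size; real HPL runs fill the machine's memory
    std::vector<double> A(n * n), b(n, 1.0), x(n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            A[i * n + j] = (i == j) ? n : 1.0 / (1.0 + i + j);  // diagonally dominant, so no pivoting needed

    auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < n; ++k) {                       // forward elimination
        for (int i = k + 1; i < n; ++i) {
            double f = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; ++j) A[i * n + j] -= f * A[k * n + j];
            b[i] -= f * b[k];
        }
    }
    for (int i = n - 1; i >= 0; --i) {                  // back substitution
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
    double secs = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    double flops = (2.0 / 3.0) * n * double(n) * n + 2.0 * double(n) * n;
    std::printf("n = %d: %.3f s, %.2f GFLOP/s (x[0] = %.3f)\n", n, secs, flops / secs / 1e9, x[0]);
}
```

Note that HPL charges the same 2/3·n³ + 2·n² operations regardless of the algorithm actually used, which is why the metric is comparable across machines.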
Slide 3: Source: Peter Bauer, ECMWF
Slide 4
Slide 5: Source: Peter Bauer, ECMWF; KKR-CPA (MST), LSMS (MST), WL-LSMS (MST)
Slide 6: Source: Christoph Schär, ETH Zurich, & Nils Wedi, ECMWF
Slide 7: Peter Bauer, ECMWF
Slide 8
Slide 9: Source: Christoph Schär, ETH Zurich
Slide 10: [Figure: relative frequency distributions of cloud area (km²) and of convective mass flux (kg m⁻² s⁻¹) at grid spacings of 8 km, 4 km, 2 km, 1 km and 500 m; grid-scale clouds [%]: 71, 64, 54, 47, 43.] Source: Christoph Schär, ETH Zurich
Slide 11: Bjorn Stevens, MPI-M
Slides 12–14
Slide 15: John Russell, September 15, 2015
Slide 16: Requirements from MeteoSwiss
- Constant budget for investments and operations
- Grid spacing refined from 2.2 km to 1.1 km
- Ensemble with multiple forecasts
- Data assimilation

Contributions to the achieved speedup (multiplied out in the sketch below):
- 1.7x from software refactoring (old vs. new implementation on x86)
- 2.8x from mathematical improvements (resource utilisation, precision)
- 2.3x from the change in architecture (CPU to GPU)
- 2.8x from Moore's Law & architectural improvements on x86
- 1.3x from additional processors
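Assuming (this is an assumption, not stated on the slide) that the factors are independent and compose multiplicatively, they combine to roughly 40x:

```cpp
// Back-of-the-envelope check: if the listed gains compose multiplicatively,
// the combined speedup is their product, roughly 40x.
#include <cstdio>

int main() {
    const double factors[] = {
        1.7,  // software refactoring (old vs. new implementation on x86)
        2.8,  // mathematical improvements (resource utilisation, precision)
        2.3,  // change in architecture (CPU -> GPU)
        2.8,  // Moore's Law & architectural improvements on x86
        1.3   // additional processors
    };
    double total = 1.0;
    for (double f : factors) total *= f;
    std::printf("combined speedup: %.1fx\n", total);  // about 40x
}
```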
Slide 17: [Figure: SYPD versus number of nodes (up to 4,888) for Δx = 19 km and 3.7 km on P100 and Haswell, and for Δx = 1.9 km and 930 m on P100.]
Time compression (SYPD) and energy cost (MWh/SY) for three moist simulations: the 930 m figures come from a full 10-day simulation, the 1.9 km figures from 1,000 steps, and the 47 km figures from 100 steps (how SYPD and MWh/SY relate is sketched after the table).

⟨Δx⟩   | #nodes | Δt [s] | SYPD  | MWh/SY | gridpoints
930 m  | 4,888  | 6      | 0.043 | 596    | 3.46 × 10^10
1.9 km | 4,888  | 12     | 0.23  | 97.8   | 8.64 × 10^9
47 km  | 18     | 300    | 9.6   | 0.099  | 1.39 × 10^7
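To make these metrics concrete: SYPD is simulated years per wall-clock day, and MWh/SY is the energy spent per simulated year. The sketch below reproduces the order of magnitude of the 930 m row using two assumed inputs that are not on the slide: roughly 0.38 s of wall-clock time per step and roughly 1.07 MW of average machine power.

```cpp
// What the metrics mean, with illustrative numbers (the per-step wall-clock
// time and the machine power below are assumptions, not values from the slide).
#include <cstdio>

int main() {
    double dt_model = 6.0;   // [s] model time advanced per step (930 m run)
    double t_step   = 0.38;  // [s] assumed wall-clock time per step
    double steps_per_day = 86400.0 / t_step;
    double sypd = dt_model * steps_per_day / (365.0 * 86400.0);  // simulated years per wall-clock day
    std::printf("SYPD   = %.3f\n", sypd);                 // ~0.043 with these inputs

    double power_mw = 1.07;  // [MW] assumed average power draw of the 4,888 nodes
    double mwh_per_sy = power_mw * 24.0 / sypd;           // MW * hours of wall-clock time per simulated year
    std::printf("MWh/SY = %.0f\n", mwh_per_sy);           // ~600 with these inputs
}
```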
Slide 18:

Aspect                        | Near-global COSMO [Fuh2018]: Value     | Shortfall | Global IFS [Wed2009]: Value   | Shortfall
Horizontal resolution         | 0.93 km (non-uniform)                  | 0.9x      | 1.25 km                       | 1.25x
Vertical resolution           | 60 levels (surface to 25 km)           | 3x        | 62 levels (surface to 40 km)  | 3x
Time resolution               | 6 s (split-explicit with sub-stepping) |           |                               | 4x
Coupled                       | No                                     | 1.2x      | No                            | 1.2x
Atmosphere                    | Non-hydrostatic                        |           |                               |
Precision                     | Double                                 | 0.6x      | Single                        |
Time compression              | 0.043 SYPD                             | 23x       | 0.088 SYPD                    | 11x
Other (I/O, full physics, …)  | Limited I/O, only microphysics         | 1.5x      | Full physics, no I/O          |
Total                         |                                        | 65x       |                               | 198x
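One plausible reading of the totals, not stated explicitly on the slide, is that they are the product of the per-row shortfalls, with rows that have no listed factor counted as 1x; multiplying the listed factors gives about 67x for COSMO (close to the stated 65x, presumably because the individual factors are rounded) and exactly 198x for IFS.

```cpp
// Reading of the table (an interpretation, not a statement from the talk):
// the totals look like the product of the listed per-row shortfalls.
#include <cstdio>

int main() {
    const double cosmo[] = {0.9, 3.0, 1.2, 0.6, 23.0, 1.5};  // listed COSMO shortfalls
    const double ifs[]   = {1.25, 3.0, 4.0, 1.2, 11.0};      // listed IFS shortfalls
    double c = 1.0, i = 1.0;
    for (double f : cosmo) c *= f;
    for (double f : ifs)   i *= f;
    std::printf("COSMO: %.0fx   IFS: %.0fx\n", c, i);        // ~67x and 198x
}
```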
Slide 19: [Figure: achieved memory bandwidth (GB/s, axis 100 to 600) versus data size (MB, axis 0.1 to 1,000); the values 28.2 and 362 are annotated on the plot.] Kernels measured (re-created on the CPU in the sketch below):
- COPY (double): a[i] = b[i]
- GPU STREAM (double): a[i] = b[i] (1D)
- AVG i-stride (float): a[i] = b[i-1] + b[i+1]
- 5-POINT (float): a[i] = b[i] + b[i+1] + b[i-1] + b[i+jstride] + b[i-jstride]
- COPY (float): a[i] = b[i]
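A single-threaded CPU re-creation of these kernels is sketched below; the slide's curves appear to be GPU measurements (the reference curve is labelled GPU STREAM), so the array size, jstride, and any numbers this sketch prints are illustrative only.

```cpp
// CPU sketch of the kernels named in the figure.  Bandwidth is counted as
// useful traffic: one read of b plus one write of a per element, i.e. roughly
// 2*N*sizeof(element); the stencil's extra reads of b are assumed to hit in cache.
#include <chrono>
#include <cstdio>
#include <vector>

template <class Kernel>
static double gbs(Kernel k, double bytes) {
    auto t0 = std::chrono::steady_clock::now();
    k();
    double s = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    return bytes / s / 1e9;  // GB/s
}

int main() {
    const int n = 1 << 24, jstride = 1024;     // illustrative working-set size and row stride
    std::vector<double> ad(n), bd(n, 1.0);
    std::vector<float>  af(n), bf(n, 1.0f);

    double copy_d = gbs([&] {                  // COPY (double): a[i] = b[i]
        for (int i = 0; i < n; ++i) ad[i] = bd[i];
    }, 2.0 * n * sizeof(double));

    double copy_f = gbs([&] {                  // COPY (float): a[i] = b[i]
        for (int i = 0; i < n; ++i) af[i] = bf[i];
    }, 2.0 * n * sizeof(float));

    double avg_f = gbs([&] {                   // AVG i-stride (float): a[i] = b[i-1] + b[i+1]
        for (int i = 1; i < n - 1; ++i) af[i] = bf[i - 1] + bf[i + 1];
    }, 2.0 * n * sizeof(float));

    double p5_f = gbs([&] {                    // 5-POINT (float) stencil
        for (int i = jstride; i < n - jstride; ++i)
            af[i] = bf[i] + bf[i + 1] + bf[i - 1] + bf[i + jstride] + bf[i - jstride];
    }, 2.0 * n * sizeof(float));

    // Print results (also keeps the compiler from discarding the loops).
    std::printf("COPY(double) %.1f  COPY(float) %.1f  AVG %.1f  5-POINT %.1f GB/s  (check %.1f)\n",
                copy_d, copy_f, avg_f, p5_f, ad[n - 1] + af[n - 1]);
}
```

Counting only one read and one write per element is the usual convention for such plots: it measures how fast useful data moves, so a stencil that reuses neighbours from cache can approach the bandwidth of a plain copy.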
Slide 20: [The memory-bandwidth figure from slide 19 shown alongside the SYPD versus number-of-nodes scaling figure from slide 17.]
Slide 21: Peter Bauer, ECMWF
Slides 22–24
Slide 25: Tim Palmer (U. of Oxford), Christoph Schär (ETH Zurich), Oliver Fuhrer (MeteoSwiss), Peter Bauer (ECMWF), Bjorn Stevens (MPI-M), Torsten Hoefler (ETH Zurich), Nils Wedi (ECMWF)