Radiation Reliability Issues in Current and Future Supercomputers
September 26th 2017 – Grenoble, France
Sponsors
HPC reliability importance
Available Accelerators
Modern parallel
Paolo Rech – Grenoble, France
*(field and experimental data from HPCA’15)
*JEDEC JESD89A Standard
IONIZING PARTICLE
[Figure: GPU architecture. A Blocks Scheduler and Dispatcher and an L2 Cache serve an array of Streaming Multiprocessors (SMs). Each SM contains an Instruction Cache, Warp Schedulers with Dispatch Units, a Register File, cores, and Shared Memory / L1 Cache.]
[Figure: SDC relative FIT [a.u.], log scale from 1 to 1000, Xeon Phi vs. K40, for Hotspot, CLAMR, lavaMD (input sizes 15, 19, 23), and DGEMM (input sizes 2^10, 2^11, 2^12); one entry is marked N/A.]
[Figure: two panels of relative FIT [a.u.] for K40 and Xeon Phi: lavaMD (input sizes 15, 19, 23) and DGEMM (input sizes 2^10, 2^11, 2^12).]
[Figure: DGEMM GFlops (axis from 0 to 1.2E+03) for Xeon Phi and K40, matrix sizes 2^9x2^9, 2^10x2^10, 2^11x2^11, 2^12x2^12, 2^13x2^13.]
[Figure: log-scale plot (1 to 10000) for MxM, MTrans, FFT, NW, lavaMD, Hotspot; data from Oliveira et al., 2016.]
00100000100000000 00000000100000000 00100000000000000 OK
[Figure: log-scale plot (1 to 10000) comparing ECC OFF and ECC ON for MxM, FFT, NW, lavaMD, Hotspot; annotated with the bit pattern 00100000100000000.]
Freivalds ’79
Huang and Abraham ’84
Rech et al., TNS ’13
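The works cited on this slide (Freivalds ’79; Huang and Abraham ’84) are classic references for checking a matrix product without recomputing it. The sketch below is only an illustration of those two ideas, not the implementation used in the talk: it assumes small integer matrices stored as Python lists, and the names `matmul`, `freivalds_check`, and `column_checksum_check` are hypothetical.

```python
import random


def matmul(a, b):
    """Plain O(n^3) product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_check(a, b, c, rounds=10, rng=random):
    """Randomized test that a*b == c, in the style of Freivalds.

    Each round multiplies by a random 0/1 vector r and compares
    a*(b*r) with c*r, which costs O(n^2) instead of O(n^3).
    A wrong c passes a single round with probability at most 1/2.
    """
    n = len(b[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True


def column_checksum_check(a, b, c):
    """Checksum test in the spirit of Huang and Abraham.

    The column sums of c must equal (column sums of a) times b.
    """
    col_sums_a = [sum(col) for col in zip(*a)]
    expected = [sum(s * x for s, x in zip(col_sums_a, col))
                for col in zip(*b)]
    actual = [sum(col) for col in zip(*c)]
    return expected == actual
```

With floating-point data, as in DGEMM, the exact comparisons would have to be replaced by a tolerance, since rounding differences alone make the two sides differ slightly.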
x_0  x_1  x_2  x_3  ...  x_{N-2}  x_{N-1}
÷(2+w^{-0})  ÷(2+w^{-1})  ÷(2+w^{-2})  ÷(2+w^{-3})  ...  ÷(2+w^{-(N-2)})  ÷(2+w^{-(N-1)})
*J.Y. Jou and Abraham ’88
[Figure: log-scale plot (1 to 10000) comparing Unhardened, ECC, and ABFT.]
[Figure: three timeline diagrams showing SM0 and SM1 over time.]
[Figure: log-scale plot (1 to 1000) comparing Unhardened, ECC, Spatial DWC, E-O Spatial DWC, and Time DWC.]
*details on Oliveira et al.
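DWC stands for duplication with comparison: the same computation is executed twice and the two outputs are compared, so that a mismatch reveals a corrupted result. The sketch below illustrates only the time-redundant flavour (two back-to-back executions) on the host side. It is an assumption-laden illustration, not the scheme evaluated in the talk; `run_with_dwc` and the `kernel` callable are hypothetical names.

```python
def run_with_dwc(kernel, *args):
    """Time-redundant duplication with comparison.

    Runs `kernel` twice on the same inputs and returns the result
    only if both executions agree. A mismatch is reported instead
    of being silently propagated.
    """
    first = kernel(*args)
    second = kernel(*args)
    if first != second:
        raise RuntimeError("DWC mismatch: possible silent data corruption")
    return first


def scale_by_two(values):
    """Stand-in for a real compute kernel (hypothetical)."""
    return [2 * v for v in values]


if __name__ == "__main__":
    print(run_with_dwc(scale_by_two, [1, 2, 3]))
```

A spatial variant would run the two copies concurrently on different hardware resources rather than one after the other. Either way, a fault that corrupts both copies identically goes undetected.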
[Figure legend: K40, K40 ECC, Titan X.]