Fault Tolerance Techniques for Sparse Matrix Methods
Simon McIntosh-Smith Rob Hunt
Twitter: @simonmcs
An Intel Parallel Computing Center
Acknowledgements
Funded by FP7 Exascale project: Mont Blanc 2
Also supported by the
"High Performance in silico Virtual Drug Screening on Many-Core Processors",
DOI: 10.1177/1094342014528252
S.N. McIntosh-Smith, M. Boulton, D. Curran, & J.R. Price, “On the performance portability of structured grid codes on many-core computer architectures”, ISC, Leipzig, June 2014. DOI: 10.1007/978-3-319-07518-1_4
[Figure: Checkpointing & Restart (C/R), Diskless Checkpointing, and Algorithm Based Fault Tolerance (ABFT), compared by Overhead and Application Specificity. The only scale label that survives on either axis is "Small".]
Jack Dongarra, ISC, Leipzig, June 2014
[Figure: bit-layout diagram. Surviving bit-position labels: 31, 32, 63, 64, 127.]
[Figure: The number of protected bits as a proportion of all row index elements. Axes: Percentage of elements (%), ticks 20 to 100; Number of protected bits, ticks 2 to 32.]
[1] "Fault Tolerance Techniques for Sparse Matrix Methods",
[2] "High Performance in silico Virtual Drug Screening on Many-Core Processors", S. McIntosh-Smith, J. Price, R.B. Sessions, A.A. Ibarra. IJHPCA 2014. DOI: 10.1177/1094342014528252
[3] "On the performance portability of structured grid codes on many-core computer architectures", S.N. McIntosh-Smith, M. Boulton, D. Curran, J.R. Price. ISC, Leipzig, June 2014. DOI: 10.1007/978-3-319-07518-1_4
[4] "Accelerating hydrocodes with OpenACC, OpenCL and CUDA", J. Herdman, W. Gaudin, S. McIntosh-Smith, M. Boulton, D. Beckingsale, A. Mallinson, S. Jarvis. In: High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion (Nov 2012), 465-471. DOI: 10.1109/SC.Companion.2012.66