Numerical Reproducibility Challenges on Extreme Scale Multi-Threading GPUs
Dylan Chapp (1), Travis Johnston (1), Michela Becchi (2), and Michela Taufer (1)
(1) University of Delaware, (2) University of Missouri
Molecular Dynamics onto Accelerators
Constant energy MD simulation
Figure panels: GPU single precision; GPU double precision
1 Allen and Tildesley, Oxford: Clarendon Press (1987)
2 Bauer et al., J. Comput. Chem. 32(3): 375-385 (2011)
x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15
Distributed Error Magnitudes for 10,000 threads with values within (-1000, 1000)
Figure axes: error magnitude; number of summation orders
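The effect behind this figure can be reproduced on a CPU: floating-point addition is not associative, so the same values summed in different orders give slightly different totals. The sketch below is only an illustration under stated assumptions, not the experiment from the slides. It draws 10,000 values from (-1000, 1000), sums them in several random orders in double precision, and measures each result against `math.fsum`. The function name `order_errors`, the seeds, and the number of orders are hypothetical choices.

```python
import math
import random


def order_errors(values, num_orders, seed=0):
    """Return |naive sum - reference| for several random summation orders."""
    rng = random.Random(seed)
    # math.fsum returns a correctly rounded sum, so it does not depend on order.
    reference = math.fsum(values)
    work = list(values)
    errors = []
    for _ in range(num_orders):
        rng.shuffle(work)          # pick a new summation order
        total = 0.0
        for v in work:             # plain left-to-right accumulation
            total += v
        errors.append(abs(total - reference))
    return errors


# Assumed setup: one value per "thread", drawn uniformly from (-1000, 1000).
rng = random.Random(42)
values = [rng.uniform(-1000.0, 1000.0) for _ in range(10_000)]
errors = order_errors(values, num_orders=200)

print(f"distinct error magnitudes: {len(set(errors))}")
print(f"largest error: {max(errors):.3e}")
```

A histogram of `errors` gives a CPU-side analogue of the plotted distribution. On a GPU the summation order is set by thread scheduling rather than by an explicit shuffle.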
1 Taufer et al. IPDPS (2010)
Value range: (10^-1, 10^0) & (10^6, 10^7)
Figure (four panels): average global summation by numeric format (float, double, float2, double2); axis ticks from 1.00E-17 to 1.00E+05
Series in two panels: 1, 30, 60, 120, and 240 threads
Series in two panels: 1, 2, 4, 8, and 16 threads
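To make the dependence on thread count concrete, the following sketch sums one data set as different numbers of workers would: each worker accumulates a contiguous chunk, and the partial sums are then combined serially. This is an illustration under assumptions, not the setup used for the charts. The chunking scheme, the helper names `to_f32` and `chunked_sum`, the data size, and the seed are hypothetical. Single precision is emulated by rounding every intermediate result to binary32 with `struct`. The values come from the two ranges named above, (10^-1, 10^0) and (10^6, 10^7).

```python
import random
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def chunked_sum(values, num_threads, single_precision=False):
    """Sum values as num_threads workers would: one partial sum per
    contiguous chunk, followed by a serial combine of the partials."""
    rnd = to_f32 if single_precision else (lambda x: x)
    chunk = -(-len(values) // num_threads)  # ceiling division
    partials = []
    for start in range(0, len(values), chunk):
        acc = 0.0
        for v in values[start:start + chunk]:
            acc = rnd(acc + rnd(v))  # round after every addition
        partials.append(acc)
    total = 0.0
    for p in partials:
        total = rnd(total + p)
    return total


# Assumed data: a mix of small and large magnitudes in shuffled order.
rng = random.Random(7)
small = [rng.uniform(1e-1, 1e0) for _ in range(960)]
large = [rng.uniform(1e6, 1e7) for _ in range(960)]
values = small + large
rng.shuffle(values)

for threads in (1, 2, 4, 8, 16):
    s32 = chunked_sum(values, threads, single_precision=True)
    s64 = chunked_sum(values, threads)
    print(f"{threads:2d} threads  float32-like: {s32!r}  double: {s64!r}")
```

Because each thread count groups the additions differently, the printed totals can differ from one row to the next, and the spread is typically much larger in the emulated single precision than in double.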