Scalable Precision Tuning of Numerical Software
Cindy Rubio-González
Department of Computer Science, University of California, Davis
Best Practices for HPC Software Developers Webinar, October 14th, 2020
Floating-Point Precision Tuning
Original Program

    long double fun(long double p) {
      long double pi = acos(-1.0);
      long double q = sin(pi * p);
      return q;
    }

    void simpsons() {
      long double a, b;
      long double h, s, x;
      const long double fuzz = 1e-26;
      const int n = 2000000;
      …
    L100:
      x = x + h;
      s = s + 4.0 * fun(x);
      x = x + h;
      if (x + fuzz >= b) goto L110;
      s = s + 2.0 * fun(x);
      goto L100;
    L110:
      s = s + fun(x);
      …
    }
Tuned Program
    long double fun(double p) {
      double pi = acos(-1.0);
      long double q = sinf(pi * p);
      return q;
    }

    void simpsons() {
      float a, b;
      double s, x;
      float h;
      const long float fuzz = 1e-26;  /* "long float" is not a standard C type; shown as on the slide */
      const int n = 2000000;
      …
    L100:
      x = x + h;
      s = s + 4.0 * fun(x);
      x = x + h;
      if (x + fuzz >= b) goto L110;
      s = s + 2.0 * fun(x);
      goto L100;
    L110:
      s = s + fun(x);
      …
    }
Precimonious
– Black-box approach to systematically search over variable types and functions
HiFPTuner
– Leverages relationship among variables to reduce search space and number of runs
[Figure: Precimonious workflow. Inputs: source code and test inputs, annotated with an error threshold. Output: a type configuration with less precision and a speedup, whose result is within the error threshold for all test inputs.]
Search over types of variables and function implementations
“Precimonious: Tuning Assistant for Floating-Point Precision”, SC 2013.
https://github.com/ucd-plse/precimonious
Dynamic Analysis for Floating-Point Precision Tuning
– Resulting program produces an “accurate enough” answer
– Resulting program is faster than the original program
[1] A. Zeller and R. Hildebrandt. “Simplifying and Isolating Failure-Inducing Input”, TSE 2002.
[Figure sequence: search over type configurations, each variable marked as double precision or single precision. Final frame legend: failed configurations, proposed configuration.]
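The kind of search shown in the figure sequence can be sketched as a divide-and-conquer procedure in the spirit of delta debugging [1]. This is a simplified illustration, not the Precimonious algorithm: `tune`, `lower_range`, and `toy_oracle` are hypothetical names, and the oracle here is a made-up stand-in for "run the program under this configuration and check the error threshold".

```c
#include <stdbool.h>
#include <stddef.h>

enum precision { SINGLE, DOUBLE };

/* An oracle reports whether a configuration is acceptable. */
typedef bool (*oracle_fn)(const enum precision *config, size_t n);

int oracle_calls = 0;  /* counts how many configurations were tried */

/* Hypothetical oracle: pretend variables 2 and 5 must stay double. */
bool toy_oracle(const enum precision *config, size_t n) {
    oracle_calls++;
    return n > 5 && config[2] == DOUBLE && config[5] == DOUBLE;
}

/* Try to lower all variables in [lo, hi), which must all be DOUBLE on
   entry. If the whole group fails, undo it and recurse on each half. */
void lower_range(enum precision *config, size_t n,
                 size_t lo, size_t hi, oracle_fn passes) {
    if (lo >= hi) return;
    for (size_t i = lo; i < hi; i++) config[i] = SINGLE;
    if (passes(config, n)) return;           /* keep the group lowered */
    for (size_t i = lo; i < hi; i++) config[i] = DOUBLE;  /* undo */
    if (hi - lo == 1) return;                /* this variable stays double */
    size_t mid = lo + (hi - lo) / 2;
    lower_range(config, n, lo, mid, passes);
    lower_range(config, n, mid, hi, passes);
}

/* Start from all-double and lower as many variables as the oracle allows. */
void tune(enum precision *config, size_t n, oracle_fn passes) {
    for (size_t i = 0; i < n; i++) config[i] = DOUBLE;
    lower_range(config, n, 0, n, passes);
}
```

Calling `tune(config, 8, toy_oracle)` on eight variables tries far fewer configurations than the 2^8 = 256 an exhaustive search would need, because whole groups are accepted or rejected with one run.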
– LLVM IR
http://fpanalysistools.org – Dockerfile and examples can be found at https://github.com/ucd-plse/tutorial-precision-tuning
[1] W. Chiang, G. Gopalakrishnan, Z. Rakamaric and A. Solovyev. “Efficient Search for Inputs Causing High Floating-point Errors”, PPoPP 2014.
[2] H. Guo and C. Rubio-González. “Efficient Generation of Error-Inducing Floating-Point Inputs via Symbolic Execution”, ICSE 2020.
Precimonious
– Black-box approach to systematically search over variable types and functions
HiFPTuner
– Leverages relationship among variables to reduce search space and number of runs
Uses lower precision: speedup 78.7%
Shifts precision less often: speedup 90%
[Figure: items 1 to 8 arranged in a community hierarchy with Level 0, Level 1 and Level 2; the search proceeds top to bottom.]
Speeds up program by reducing precision with respect to accuracy constraint
[Figure: HiFPTuner workflow. Labels: source code, test inputs, accuracy constraint, weighted dependence graph, ordered community structure of variables, type configuration.]
https://github.com/ucd-plse/HiFPTuner
Hierarchical Floating-Point Precision Tuning
Communities: {pi, p, q}, {a, b, h, x}, {s}; individual variables: a, b, h, pi, p, q, x, s
Ordered community structure
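A top-to-bottom search over such a community structure can be sketched in C as follows. This is a simplified two-level illustration, not HiFPTuner's implementation: it tries to lower each community as a unit and only falls back to individual variables when the whole community fails. The names (`tune_hierarchical`, `set_community`, `toy_oracle`) are hypothetical, and the oracle's rule that `x` and `s` must stay double is a made-up assumption.

```c
#include <stdbool.h>
#include <stddef.h>

enum precision { SINGLE, DOUBLE };
typedef bool (*oracle_fn)(const enum precision *config, size_t n);

/* Variables of the Simpson's rule example. */
enum { VAR_PI, VAR_P, VAR_Q, VAR_A, VAR_B, VAR_H, VAR_X, VAR_S, NUM_VARS };

/* Community of each variable: {pi, p, q} = 0, {a, b, h, x} = 1, {s} = 2. */
const int community_of[NUM_VARS] = {0, 0, 0, 1, 1, 1, 1, 2};

/* Hypothetical oracle: pretend x and s must stay in double precision. */
bool toy_oracle(const enum precision *config, size_t n) {
    (void)n;
    return config[VAR_X] == DOUBLE && config[VAR_S] == DOUBLE;
}

/* Set every variable of community c to precision p. */
void set_community(enum precision *config, const int *community,
                   size_t n, int c, enum precision p) {
    for (size_t i = 0; i < n; i++)
        if (community[i] == c) config[i] = p;
}

void tune_hierarchical(enum precision *config, const int *community,
                       size_t n, int n_communities, oracle_fn passes) {
    for (size_t i = 0; i < n; i++) config[i] = DOUBLE;
    for (int c = 0; c < n_communities; c++) {
        /* Upper level: lower the whole community with a single run. */
        set_community(config, community, n, c, SINGLE);
        if (passes(config, n)) continue;
        set_community(config, community, n, c, DOUBLE);
        /* Lower level: try the members of the failed community one by one. */
        for (size_t i = 0; i < n; i++) {
            if (community[i] != c) continue;
            config[i] = SINGLE;
            if (!passes(config, n)) config[i] = DOUBLE;
        }
    }
}
```

With the toy oracle, the community {pi, p, q} is lowered in one run, while {a, b, h, x} is refined so that only `x` stays double. Grouping related variables is what lets the search skip many per-variable runs.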
Precimonious
PROS
+ Considers both accuracy and performance
+ Works for medium-size, non-trivial programs
+ Easily configurable

HiFPTuner
PROS
+ White-box hierarchical approach, groups variables based on their usage
+ Over twice as fast as Precimonious
+ Finds configurations that lead to higher speedups

Blame Analysis [1]
PROS
+ Performs shadow execution, requires a single run of the program
+ Identifies variables that can be single precision
+ Combined with Precimonious leads to 9x faster analysis

CONS
… configurations
… give different results
… type configuration
… performance
… execution engine
[1] C. Rubio-González, C. Nguyen, B. Mehne, K. Sen, J. Demmel, W. Kahan, C. Iancu, W. Lavrijsen, D.H. Bailey and D. Hough. “Floating-Point Precision Tuning Using Blame Analysis”, ICSE 2016.
… GPU Scientific Applications”, ISC 2019.
… Precision”. CORRECTNESS@SC 2019.
P.V. Kotipalli, R. Singh, P. Wood, I. Laguna and S. Bagchi. “AMPT-GA: Automatic Mixed Precision Floating Point Tuning for GPU Applications”. ICS 2019.
… Algorithmic Differentiation Applied to Floating-Point Precision Tuning”, SC 2018.
… Floating-Point Mixed-Precision Tuning”. POPL 2017.
… Surveys 2020.
Co-organized with Ignacio Laguna from Lawrence Livermore National Lab
November 11th, 2020 (half day, 2:30pm to 6:30pm EDT)
Cuong Nguyen, Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David Bailey, Costin Iancu, David Hough
UC Berkeley, Oracle, LBNL
Ben Mehne, Wim Lavrijsen, Hui Guo
UC Davis