
SLIDE 1

Scalable Precision Tuning of Numerical Software

Cindy Rubio-González

Department of Computer Science, University of California, Davis

Best Practices for HPC Software Developers Webinar, October 14th, 2020

SLIDE 2

Floating-Point Precision Tuning

Reasoning about floating-point programs is difficult
Large variety of numerical problems
Most programmers not expert in floating point
Common practice: use highest available precision
Disadvantage: more expensive!
Automated techniques for tuning precision

Given: Accuracy requirement
Action: Reduce precision
Goal: Accuracy and/or performance

SLIDE 3

Precision Tuning Example

Original Program

    long double fun(long double p) {
      long double pi = acos(-1.0);
      long double q = sin(pi * p);
      return q;
    }

    void simpsons() {
      long double a, b;
      long double h, s, x;
      const long double fuzz = 1e-26;
      const int n = 2000000;
      …
    L100:
      x = x + h;
      s = s + 4.0 * fun(x);
      x = x + h;
      if (x + fuzz >= b) goto L110;
      s = s + 2.0 * fun(x);
      goto L100;
    L110:
      s = s + fun(x);
      …
    }

Tuned Program

Error threshold: 10^-8

SLIDE 4

Precision Tuning Example

Original Program

    long double fun(long double p) {
      long double pi = acos(-1.0);
      long double q = sin(pi * p);
      return q;
    }

    void simpsons() {
      long double a, b;
      long double h, s, x;
      const long double fuzz = 1e-26;
      const int n = 2000000;
      …
    L100:
      x = x + h;
      s = s + 4.0 * fun(x);
      x = x + h;
      if (x + fuzz >= b) goto L110;
      s = s + 2.0 * fun(x);
      goto L100;
    L110:
      s = s + fun(x);
      …
    }

Tuned Program

    long double fun(double p) {
      double pi = acos(-1.0);
      long double q = sinf(pi * p);
      return q;
    }

    void simpsons() {
      float a, b;
      double s, x;
      float h;
      const float fuzz = 1e-26;
      const int n = 2000000;
      …
    L100:
      x = x + h;
      s = s + 4.0 * fun(x);
      x = x + h;
      if (x + fuzz >= b) goto L110;
      s = s + 2.0 * fun(x);
      goto L100;
    L110:
      s = s + fun(x);
      …
    }

Tuned program runs 78.7% faster!

SLIDE 5

Challenges in Precision Tuning

Searching efficiently over variable types and function implementations

– Naïve approach → exponential time

2^n or 3^n, where n is the number of variables

– Global minimum vs. a local minimum

Evaluating type configurations

– Less precision → not necessarily faster
– Based on run time, energy consumption, etc.

Determining accuracy constraints

– How accurate must the final result be?
– What error threshold to use?
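
To make the exponential growth concrete, the sketch below counts the type configurations for n variables with k candidate types each (k = 2 or 3 on this slide) and walks through all of them for a small case by counting in base k. The function names and the 16-variable limit are hypothetical choices for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of type configurations: k candidate types for each of n variables. */
uint64_t count_configs(unsigned n, unsigned k) {
  assert(k >= 1);
  uint64_t total = 1;
  for (unsigned i = 0; i < n; i++) {
    assert(total <= UINT64_MAX / k);   /* guard against overflow */
    total *= k;
  }
  return total;
}

/* Advance cfg[0..n-1], read as digits in base k, to the next configuration.
   Returns 0 once the counter wraps around, i.e. all have been visited. */
int next_config(unsigned *cfg, unsigned n, unsigned k) {
  for (unsigned i = 0; i < n; i++) {
    if (++cfg[i] < k) return 1;
    cfg[i] = 0;                        /* carry into the next variable */
  }
  return 0;
}

/* Brute-force walk over all k^n configurations; returns how many were visited. */
uint64_t enumerate_all(unsigned n, unsigned k) {
  unsigned cfg[16] = {0};              /* sketch limit: at most 16 variables */
  assert(n <= 16 && k >= 1);
  uint64_t visited = 0;
  do {
    visited++;                         /* a real tuner would build and run here */
  } while (next_config(cfg, n, k));
  return visited;
}
```

For 10 variables the naïve search already covers 1024 configurations with two types and 59049 with three; at 40 variables and three types the count is about 1.2 × 10^19, each of which would require a build and a run.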

SLIDE 6

Precision Tuning Approaches

Reducing precision vs. improving performance

– Different objectives

Dynamic vs. static approaches

– Dynamic: Performed at runtime, requires program inputs, handles larger and more complex code, no guarantees for untested inputs
– Static: Analyzes program without running it, limitations with certain program structures (e.g., loops), formal guarantees for analyzed code

Instructions vs. variables vs. function calls

– Various granularities of program transformation
– Different scopes

Binary vs. IR vs. source code

– Tradeoff between granularity of transformation and tool usability

SLIDE 7

Dynamic Tools for Precision Tuning

Precimonious: Dynamic Analysis for Precision Tuning

– Black-box approach to systematically search over variable types and functions

HiFPTuner: Hierarchical Precision Tuner

– Leverages relationship among variables to reduce search space and number of runs

SLIDE 8

Dynamic Analysis for Floating-Point Precision Tuning

PRECIMONIOUS

Inputs: SOURCE CODE and TEST INPUTS, annotated with error threshold
Output: TYPE CONFIGURATION (less precision, speedup, result within error threshold for all test inputs)

Search over types of variables and function implementations

C. Rubio-González, C. Nguyen, H. D. Nguyen, J. Demmel, W. Kahan, K. Sen, D.H. Bailey, C. Iancu, and D. Hough. “Precimonious: Tuning Assistant for Floating-Point Precision”, SC 2013.

https://github.com/ucd-plse/precimonious
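
The acceptance condition named on this slide (result within the error threshold for all test inputs, plus a speedup) can be sketched as a single predicate. This is an illustration only: the struct, its field names, the use of relative error, and the comparison of total run times are assumptions, not the Precimonious interface.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* One measurement of a candidate type configuration on one test input.
   Field names are illustrative, not taken from Precimonious. */
struct run {
  long double reference;   /* result of the original program */
  long double candidate;   /* result of the tuned program */
  double ref_seconds;      /* run time of the original program */
  double cand_seconds;     /* run time of the tuned program */
};

/* Accept only if every input stays within the relative error threshold
   and the candidate's total run time is lower than the original's. */
int accept_config(const struct run *runs, size_t n, long double eps) {
  assert(runs != NULL || n == 0);
  double ref_total = 0.0, cand_total = 0.0;
  for (size_t i = 0; i < n; i++) {
    long double err = fabsl(runs[i].candidate - runs[i].reference);
    if (err > eps * fabsl(runs[i].reference)) return 0;   /* too inaccurate */
    ref_total += runs[i].ref_seconds;
    cand_total += runs[i].cand_seconds;
  }
  return n > 0 && cand_total < ref_total;
}
```

Rejecting on the first inaccurate input keeps the check cheap. A configuration that is accurate but slower is also rejected, which matches the point that less precision is not necessarily faster.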

SLIDE 9

Search Algorithm

Based on the Delta-Debugging Search Algorithm [1]
Change the types of variables and function calls

– Examples: double x → float x, sin → sinf

Our success criteria

– Resulting program produces an “accurate enough” answer
– Resulting program is faster than the original program

Main idea

– Start by associating each variable with a set of types

Example: x → {long double, double, float}

– Refine set until it contains only one type

Find a local minimum

– Lowering the precision of one more variable violates success criteria

[1] A. Zeller and R. Hildebrandt. “Simplifying and Isolating Failure-Inducing Input”, TSE 2002.
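
The notion of a local minimum on this slide can be illustrated with a much simpler search than the delta-debugging-based one the slide refers to. The sketch below is a plain greedy loop, named as such: it lowers one variable by one level whenever a caller-supplied oracle still accepts the configuration, and stops when no single lowering is accepted. The enum, the oracle signature, and the toy oracle are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum prec { PREC_FLOAT = 0, PREC_DOUBLE = 1, PREC_LONG_DOUBLE = 2 };

/* Oracle: returns nonzero if the program built with this configuration
   meets the success criteria (accurate enough and faster). */
typedef int (*oracle_fn)(const enum prec *cfg, size_t n);

/* Greedy search (not delta debugging): repeatedly try to lower each
   variable by one level and keep the change if the oracle accepts it.
   On return, lowering any single variable further would be rejected. */
void greedy_tune(enum prec *cfg, size_t n, oracle_fn passes) {
  assert(cfg != NULL && passes != NULL);
  for (size_t i = 0; i < n; i++) cfg[i] = PREC_LONG_DOUBLE;  /* start at highest */
  int changed = 1;
  while (changed) {
    changed = 0;
    for (size_t i = 0; i < n; i++) {
      if (cfg[i] == PREC_FLOAT) continue;     /* already at the lowest type */
      enum prec saved = cfg[i];
      cfg[i] = (enum prec)(saved - 1);        /* try one level lower */
      if (passes(cfg, n)) changed = 1;        /* keep the lower type */
      else cfg[i] = saved;                    /* revert */
    }
  }
}

/* Toy oracle (an assumption for illustration): variable 0 must stay at
   double or higher; every other variable tolerates float. */
int toy_oracle(const enum prec *cfg, size_t n) {
  return n == 0 || cfg[0] >= PREC_DOUBLE;
}
```

Each oracle call stands for a build and a run of the program, so the number of calls is the cost that matters. Delta debugging aims to reduce that cost by changing groups of variables at once instead of one at a time.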

SLIDE 10