Exploiting Community Structure for Floating-Point Precision Tuning
ISSTA’18 – Amsterdam, Netherlands, July 2018
Hui Guo, Cindy Rubio-González

Background
Floating-point (FP) arithmetic used in many domains
Reasoning about FP programs
Original Program

long double fun(long double p) {
    long double pi = acos(-1.0);
    long double q = sin(pi * p);
    return q;
}

void simpsons() {
    long double a, b;
    long double h, s, x;
    const long double fuzz = 1e-26;
    const int n = 2000000;
    …
L100:
    x = x + h;
    s = s + 4.0 * fun(x);
    x = x + h;
    if (x + fuzz >= b) goto L110;
    s = s + 2.0 * fun(x);
    goto L100;
L110:
    s = s + fun(x);
    // final answer: (long double)h * s / 3.0
}

Tuned Program

long double fun(double p) {
    double pi = acos(-1.0);
    long double q = sinf(pi * p);
    return q;
}

void simpsons() {
    float a, b;
    double s, x;
    float h;
    const float fuzz = 1e-26;
    const int n = 2000000;
    …
L100:
    x = x + h;
    s = s + 4.0 * fun(x);
    x = x + h;
    if (x + fuzz >= b) goto L110;
    s = s + 2.0 * fun(x);
    goto L100;
L110:
    s = s + fun(x);
    // final answer: (long double)h * s / 3.0
}
[Figure: sequence of type configurations (legend: double precision, single precision), labeled as failed configurations and the proposed configuration]
Uses lower precision: speedup 78.7%
Shifts precision less often: speedup 90%
[Figure: hierarchy over items 1 to 8, grouped at Level 0, Level 1, and Level 2; search top to bottom]
Speeds up the program by reducing precision subject to an accuracy constraint
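As a minimal sketch of what checking such an accuracy constraint could look like, the helper below accepts a tuned result when its relative error against the original-precision result stays within a threshold. The name `within_error_threshold` and the choice of relative error are illustrative assumptions, not details taken from the slides.

```c
#include <stdbool.h>

/* Hypothetical helper: true when `tuned` is within `threshold` relative
 * error of `reference` (the result computed at the original precision).
 * Relative error is an assumption made for illustration. */
bool within_error_threshold(long double reference, long double tuned,
                            long double threshold) {
    long double diff = tuned - reference;
    if (diff < 0.0L) diff = -diff;

    long double scale = reference < 0.0L ? -reference : reference;
    if (scale == 0.0L) {
        /* Fall back to absolute error when the reference is exactly zero. */
        return diff <= threshold;
    }
    return diff / scale <= threshold;
}
```

A tuner would call this once per test input and reject any type configuration for which the check fails.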
[Figure: workflow from SOURCE CODE, TEST INPUTS, and an Accuracy Constraint, through a Weighted Dependence Graph and an Ordered Community Structure of Variables, to a TYPE CONFIGURATION]
 1 long double fun(long double p) {
 2   long double pi = acos(-1.0);
 3   long double q = sin(pi * p);
 4   return q;
 5 }
 6
 7 void simpsons() {
 8   long double a, b;
 9   // subinterval length, integral approximation, x
10   long double h, s, x;
11   const long double fuzz = 1e-26;
12   const int n = 2000000;
13   a = 0.0;
14   b = 1.0;
15   h = (b - a) / n;
16   x = a;
17   s = fun(x);
18 L100:
19   x = x + h;
20   s = s + 4.0 * fun(x);
21   x = x + h;
22   if (x + fuzz >= b) goto L110;
23   s = s + 2.0 * fun(x);
24   goto L100;
25 L110:
26   s = s + fun(x);
27   printf("%1.16Le\n", (long double)h * s / 3.0);
28 }
Identify assignments to floating-point variables
 2 long double pi = acos(-1.0);
 3 long double q = sin(pi * p);
11 const long double fuzz = 1e-26;
13 a = 0.0;
14 b = 1.0;
15 h = (b - a) / n;
16 x = a;
17 s = fun(x);
19 x = x + h;
20 s = s + 4.0 * fun(x);
21 x = x + h;
23 s = s + 2.0 * fun(x);
26 s = s + fun(x);

Variable dependence
Variables in main: a, b, h, x, fuzz, s
Variables in fun: p, q, pi
[Figure: dependence graph over these variables; edge weights shown: 1, 1, 1, 2000000, 2000000, 2000000, 2000001, 2000001, 2000001]
Weighted dependence graph
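A weighted dependence graph of this kind can be sketched as a small adjacency matrix. The sketch assumes that an edge weight counts how many times an assignment relating two variables executes; that reading, and all names here (`DepGraph`, `graph_add_dependence`, and so on), are hypothetical and not taken from the slides.

```c
#include <string.h>

#define MAX_VARS 16

/* Undirected weighted graph over floating-point variables.
 * weight[u][v] accumulates execution counts of assignments relating
 * u and v (an assumed meaning of the weights, for illustration). */
typedef struct {
    int num_vars;
    const char *names[MAX_VARS];
    long weight[MAX_VARS][MAX_VARS];
} DepGraph;

void graph_init(DepGraph *g) {
    memset(g, 0, sizeof *g);
}

/* Returns the index of `name`, or -1 if it is not in the graph. */
int graph_find(const DepGraph *g, const char *name) {
    for (int i = 0; i < g->num_vars; i++)
        if (strcmp(g->names[i], name) == 0) return i;
    return -1;
}

/* Returns the index of `name`, adding it if new; -1 if the graph is full. */
int graph_var(DepGraph *g, const char *name) {
    int i = graph_find(g, name);
    if (i >= 0) return i;
    if (g->num_vars == MAX_VARS) return -1;
    g->names[g->num_vars] = name;
    return g->num_vars++;
}

/* Records that `a` and `b` occur together in an assignment executed
 * `count` times. Self-dependences (such as x on both sides of
 * x = x + h) are skipped. */
void graph_add_dependence(DepGraph *g, const char *a, const char *b,
                          long count) {
    int u = graph_var(g, a);
    int v = graph_var(g, b);
    if (u < 0 || v < 0 || u == v) return;
    g->weight[u][v] += count;
    g->weight[v][u] += count;
}

/* Returns the accumulated weight between `a` and `b`, or 0 if unknown. */
long graph_weight(const DepGraph *g, const char *a, const char *b) {
    int u = graph_find(g, a);
    int v = graph_find(g, b);
    if (u < 0 || v < 0) return 0;
    return g->weight[u][v];
}
```

Under this reading, recording the x and h dependence once for each profiled execution count of an `x = x + h` statement accumulates the total on the single edge between x and h.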
[Figure: weighted dependence graph with Top and Bottom levels; labels: x, a, h, b, q, p, pi, s, c1, c2, c3; edge weights shown: 4000003, 2000001, 2000000, 4000002, 2000000, 1, 1, 1, 2000000, 2000001, 2000000, 2000001, 2000001]
Community structure of floating-point variables
[1] M. E. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 2004.
[2] M. E. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 2006.
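The cited works score a partition of a graph by its modularity. As a hedged illustration, the sketch below computes the standard modularity Q of a given partition of a weighted undirected graph; it only scores a partition and does not implement the detection algorithms from [1] or [2], nor is it claimed to be how the tool computes its communities. The function name and the fixed-size matrix are assumptions.

```c
#define MAX_NODES 16

/* Modularity of a partition of a weighted undirected graph:
 *   Q = (1 / 2m) * sum over i, j of (w[i][j] - k_i * k_j / 2m)
 *       restricted to pairs i, j in the same community,
 * where k_i is the weighted degree of node i and 2m is the sum of all
 * degrees. `w` must be symmetric; community[i] is the community id of
 * node i. Returns 0 for a graph with no edges. */
double modularity(int n, double w[][MAX_NODES], const int community[]) {
    double degree[MAX_NODES] = {0.0};
    double two_m = 0.0;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) degree[i] += w[i][j];
        two_m += degree[i];
    }
    if (two_m == 0.0) return 0.0;

    double q = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (community[i] == community[j])
                q += w[i][j] - degree[i] * degree[j] / two_m;

    return q / two_m;
}
```

For two disconnected unit-weight edges, putting each edge in its own community scores higher than merging everything into one community, which is the behavior a community detection method exploits.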
Ordered community structure
[Figure: the weighted dependence graph repeated with its Top and Bottom levels; labels in the ordered layout: c1, c3, x, a, h, b, q, p, c2, pi, s]
Original precision configuration
Top-level precision configuration (items are groups of variables): {pi, p, q}, {a, b, h, x}, {s}
Bottom-level precision configuration (items are individual variables): a, b, h, pi, p, q, x, s
Reduce precision to speed up program
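One way to read a top-level configuration is that every variable in a group shares the precision chosen for that group. The sketch below expands such a group-level choice into a per-variable configuration. The `Precision` enum, the function name, and the grouping semantics are assumptions made for illustration.

```c
typedef enum { PREC_FLOAT, PREC_DOUBLE, PREC_LONG_DOUBLE } Precision;

/* Hypothetical helper: group_of[v] is the group index of variable v and
 * group_prec[c] is the precision chosen for group c. Writes the
 * resulting per-variable precisions into var_prec. */
void expand_configuration(int num_vars, const int group_of[],
                          const Precision group_prec[],
                          Precision var_prec[]) {
    for (int v = 0; v < num_vars; v++)
        var_prec[v] = group_prec[group_of[v]];
}
```

With the variables ordered a, b, h, pi, p, q, x, s and the groups {pi, p, q}, {a, b, h, x}, {s}, a single change to one group's precision updates all of its members at once, which is why a search over groups has fewer items to consider than a search over individual variables.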
TYPE CONFIGURATION
Program     Initial Type Configuration    Communities, # Items    Items to Tune
            (entries in L D F C order)    (L2 L1 L0)
simpsons    9 2                           11                      11
arclength   8 3                           11                      11
piqpr       17                            17                      17
fft         22 1 2                        11 14 25                25
gaussian    56 2                          18 22 58                58
sum         34 2                          23 24 36                36
bessel      24 5                          11 14 29                29
ep          13 4                          9 9 17                  17
cp          32 3                          21 24 35                35
Number of Configurations

Program     HiFPTuner    Precimonious
simpsons    24           116
arclength   30           142
piqpr       52           164
fft         43           297
gaussian    211          275
sum         533          433
ep          45           77
cp          497          735
Initial Type Configuration

Program     L D F C
simpsons    9 2
arclength   8 3
piqpr       17
fft         22 2
gaussian    56 2
sum         34 2
ep          13 4
cp          32 3

Precimonious, error threshold: 10^-8

Program     L D F      S
simpsons    1 3 5 1    116
arclength   7 1 1      142
piqpr       3 13 1     164
fft         21 2       297
gaussian    56 2       275
sum         34 2       433
ep          13 4       77
cp          32 3       735

HiFPTuner, error threshold: 10^-8

Program     L D F      S
simpsons    8 1 1      24
arclength   7 1 1      30
piqpr       3 14       52
fft         22 2       43
gaussian    10 46 2    211
sum         10 24 2    533
ep          13 4       45
cp          24 8 3     497
[Figure: simpsons, error threshold 10^-8. Average Precision (ticks at 40, 64, 80) versus Number of Explored Configurations (ticks from 20 to 120) for Precimonious and HiFPTuner, with the HiFPTuner level line]