SLIDE 1 Semi-Automatic Floating-Point Implementation of Special Functions
Christoph Lauter1 Marc Mezzarobba1,2
Pequan group
1Université Paris 6 2CNRS
ARITH 22, Lyon, 2015-06-23
SLIDE 2
Code Generation for Mathematical Functions
f = log(x) explicit function f = esin(x) expression Φ(f) = 0 equation x → f(x) black box
}main() { int temp; float celsius; char repeat; char flag; do { flag='n"; do { if(flag=='n') printf("Input a valid temperature :"); else printf("input a valid temperature,stupid:"); scanf("%d",&temp); flag='y'; } while (temp<0||temp >100); celsius=(5.0/9.0)*(temp-32); printf("%d degrees F is %6.2f degrees celsius\n",temp,cels printf("Do you have another temperature?"); repeat=getchar(); putchar('\n');
❞♦✉❜❧❡ ❢✉♥✭❞♦✉❜❧❡ ①✮ ④ ✳✳✳ ⑥
SLIDE 3
Code Generation for Mathematical Functions
f = log(x) explicit function f = esin(x) expression Φ(f) = 0 equation x → f(x) black box
}main() { int temp; float celsius; char repeat; char flag; do { flag='n"; do { if(flag=='n') printf("Input a valid temperature :"); else printf("input a valid temperature,stupid:"); scanf("%d",&temp); flag='y'; } while (temp<0||temp >100); celsius=(5.0/9.0)*(temp-32); printf("%d degrees F is %6.2f degrees celsius\n",temp,cels printf("Do you have another temperature?"); repeat=getchar(); putchar('\n');
❞♦✉❜❧❡ ❢✉♥✭❞♦✉❜❧❡ ①✮ ④ ✳✳✳ ⑥
SLIDE 4 Our Focus: Special Functions
–2 –1 1 2 –6 –4 –2 2 4 6
Ai(x)
–2 –1 1 2 –6 –4 –2 2 4 6
Ci(x)
–2 –1 1 2 –6 –4 –2 2 4 6
erf(x)
–2 –1 1 2 –6 –4 –2 2 4 6
J0(x)
D-finite Functions
= solns of linear ODEs with polynomial coeffts pr(x) · f (r)(x) + · · · + p1(x) · f ′(x) + p0(x) · f(x) = 0 p0, p1, . . . , pr ∈ R[x] Example: f(x) = arctan(x) (x2 + 1) · f ′′(x) + 2x · f ′(x) = 0, f(0) f ′(0)
1
SLIDE 5 Problem Statement
- prf (r) + · · · + p0f = 0,
f(0), . . . , f (r−1)(0) D ⊆ R, ε > 0
}main() { int temp; float celsius; char repeat; char flag; do { flag='n"; do { if(flag=='n') printf("Input a valid temperature :"); else printf("input a valid temperature,stupid:"); scanf("%d",&temp); flag='y'; } while (temp<0||temp >100); celsius=(5.0/9.0)*(temp-32); printf("%d degrees F is %6.2f degrees celsius\n",temp,cels printf("Do you have another temperature?"); repeat=getchar(); putchar('\n');
❞♦✉❜❧❡ ❢✉♥✭❞♦✉❜❧❡ ①✮❀ ∀x ∈ D ∩ ❞♦✉❜❧❡,
f(x)
◮ “Semi-automatic”: may require implementation hints
Future goal: full automation
◮ Rigorous in principle, some shortcuts in current prototype
SLIDE 6 Overview
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ NumGfun (Maple) ◮ ≈ rigorous multiple
precision solver for ODE
◮ Sollya ◮ ≈ Metalibm-lutetia
(cf. previous talk by
sets of polynomial approximations
SLIDE 7 The Frontend: Taylor Series
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code pr(x) · f (r)(x) + · · · p0(x) · f(x) = 0 Ansatz: f(x) =
∞
∑
n=0
cnxn
◮ f ′(x) =
∞
∑
n=0
(n + 1)cn+1xn
◮ x · f(x) =
∞
∑
n=1
cn−1xn
cn = b0(n)cn−1 + b1(n)cn−2 + · · · f(x) = c0 + c1x + · · · + cd−1xd−1 + · · ·
SLIDE 8 The Frontend: Error Bounds
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code f(x) =
d−1
∑
n=0
cnxn +
∞
∑
n=d
cnxn
?
|x| ≤ ρ “Majorizing series”: using the ODE, find a simple ˆ f(x) = ∑ ˆ cnxn such that |cn| ≤ ˆ cn for all n.
d−1
∑
n=0
cnxn
∞
∑
n=d
ˆ cnρn
!
≤ ε.
SLIDE 9
The Frontend: Analytic Continuation
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ What if D ⊆ [−ρ, ρ]?
i −i
SLIDE 10 The Frontend: Analytic Continuation
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ What if D ⊆ [−ρ, ρ]?
i −i x1
f(x1)
f ′(x1)
1.0000... 0.5404...
0.0000... 0.7352...
f(0)
f ′(0)
SLIDE 11 The Frontend: Analytic Continuation
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ What if D ⊆ [−ρ, ρ]?
i −i x1
f(x1)
f ′(x1)
1.0000... 0.5404...
0.0000... 0.7352...
f(0)
f ′(0)
SLIDE 12 The Frontend: Analytic Continuation
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ What if D ⊆ [−ρ, ρ]?
i −i x1 x2
f(x1)
f ′(x1)
1.0000... 0.5404...
0.0000... 0.7352...
f(0)
f ′(0)
f ′(x2)
1.0000... 0.5095...
0.0000... 0.5055...
f(x1)
f ′(x1)
SLIDE 13 The Frontend: Analytic Continuation
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ What if D ⊆ [−ρ, ρ]?
i −i x1 x2
f(x1)
f ′(x1)
1.0000... 0.5404...
0.0000... 0.7352...
f(0)
f ′(0)
f ′(x2)
1.0000... 0.5095...
0.0000... 0.5055...
f(x1)
f ′(x1)
SLIDE 14 The Frontend: Economization
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code Have:
- f(x) − (c0 + · · · + cd−1xd−1)
- ≤ ε
2 x ∈ C, |x| ≤ 1 c0 + · · · + c7 x7 + c8 x8 + c9 x9 c(1) + · · · + c(1)
7 x7 + c(1) 8 x8
c(2) + · · · + c(2)
7 x7
. . . |Tn| ≤ 1 over [−1, 1] − c9 29 T9(x) − c8 28 T8(x)
+ · · · + c(k)
d−1−kxk )
x ∈ [−1, 1]
SLIDE 15 The Backend: Tight Approximation
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ Frontend:
D =
x ∈ Di |pi(x) − f(x)| ≤ ε deg pi = O(500)
◮ Backend:
D =
i
x ∈ D⋆
i
i (x)
f(x) − 1
deg p⋆
i = O(10)
SLIDE 16 The Backend: Tight Approximation
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ Backend: leveraging Metalibm
◮ Reuse domain splitting algorithm
sketched in last talk D =
i
◮ Reuse minimax implementation
with relative error bounds x ∈ D⋆
i
i (x)
f(x) − 1
◮ Addressing issues
◮ Pure black-box interface too slow ◮ Zeros of f in domain D?
SLIDE 17
The Backend: FP Polynomials
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ Implementation need: p⋆ i ∈ F[x] ◮ Leverage existing FP minimax
heuristics
SLIDE 18
The Backend: FP Polynomials
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ Implementation need: p⋆ i ∈ F[x] ◮ Leverage existing FP minimax
heuristics
◮ unless f has a zero in the domain
SLIDE 19 The Backend: Zeros of f
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ Suppose f(c) = 0 for c ∈ D ◮ We want
i (x)
f(x) − 1
◮ We need p⋆ i (c) = 0 ◮ No if p⋆ i ∈ F[x] but c ∈ R\F
SLIDE 20 The Backend: Zeros of f
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ Suppose f(c) = 0 for c ∈ D ◮ Actually we want
∀x ∈ F.
i (x)
f(x) − 1
◮ p⋆ i (x) never needs to be 0 exactly ◮ Okay but we need to compute p⋆ i
◮ Express c as a symbol, evaluated
with Newton-Raphson on f
◮ Give f(x − c)/xk to minimax
algorithm, yielding q
◮ Deduce p⋆
i from xk q(x)
SLIDE 21
The Backend: Evaluation scheme
Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code
◮ Reuse Metalibm-Lutetia to generate
Horner scheme
◮ We could also use
Metalibm-Lugdunum here
SLIDE 22 Examples: erfc
–0.5 0.5 1 1.5 2 –2 –1 1 2 x
◮ Consider erfc(x) = 1 −
2 √π
x
0 e−t2dt
- ver D = [−2; 2] at ε = 2−62
◮ Describe erfc with
f ′′(x) + 2xf ′(x) = 0, f(0) = 1, f ′(0) = −2 √π
◮ Generated code has domain split
into 16 subdomains
◮ Code runs in 110 to 350 cycles,
libm exponential in 80 cycles
◮ Code is bitwise the same if we take
MPFR’s erfc instead of ODE
- 1.2e-19
- 1e-19
- 8e-20
- 6e-20
- 4e-20
- 2e-20
2e-20 4e-20 6e-20 8e-20 1e-19
0.5 1 1.5 2
SLIDE 23 Examples: J0
–1 –0.8 –0.6 –0.4 –0.2 0.2 0.4 0.6 0.8 1 10 20 30 40 x
◮ Consider Bessel function J0
- ver D = [0.5; 42] at ε = 2−45
◮ J0 is given as
xf ′′(x) + f ′(x) + xf(x) = 0, f(x) → 1, x → 0
◮ Code generation takes about 30
minutes
◮ Generator has to cope with 13 zeros
in the domain
◮ Code runs in 60 to 500 cycles,
libm exponential in 80 cycles
- 2e-14
- 1.5e-14
- 1e-14
- 5e-15
5e-15 1e-14 1.5e-14 5 10 15 20 25 30 35 40
SLIDE 24
Conclusion and Outlook
◮ Don’t code. Have your special functions generated! ◮ Process starts from the very basic definition ◮ Only small domains handled fully automatically ◮ Full range implementions require manual intervention ◮ Frontend-backend interface not satisfactory ◮ Code generation performance unpredictable
SLIDE 25
Thanks!
Thank you for your attention! Questions?