Semi-Automatic Floating-Point Implementation of Special Functions - - PowerPoint PPT Presentation

semi automatic floating point implementation of special
SMART_READER_LITE
LIVE PREVIEW

Semi-Automatic Floating-Point Implementation of Special Functions - - PowerPoint PPT Presentation

Semi-Automatic Floating-Point Implementation of Special Functions Christoph Lauter 1 Marc Mezzarobba 1,2 Pequan group 1 Universit Paris 6 2 CNRS ARITH 22, Lyon, 2015-06-23 Code Generation for Mathematical Functions expression explicit


slide-1
SLIDE 1

Semi-Automatic Floating-Point Implementation of Special Functions

Christoph Lauter1 Marc Mezzarobba1,2

Pequan group

1Université Paris 6 2CNRS

ARITH 22, Lyon, 2015-06-23

slide-2
SLIDE 2

Code Generation for Mathematical Functions

f = log(x) explicit function f = esin(x) expression Φ(f) = 0 equation x → f(x) black box

}main() { int temp; float celsius; char repeat; char flag; do { flag='n"; do { if(flag=='n') printf("Input a valid temperature :"); else printf("input a valid temperature,stupid:"); scanf("%d",&temp); flag='y'; } while (temp<0||temp >100); celsius=(5.0/9.0)*(temp-32); printf("%d degrees F is %6.2f degrees celsius\n",temp,cels printf("Do you have another temperature?"); repeat=getchar(); putchar('\n');

❞♦✉❜❧❡ ❢✉♥✭❞♦✉❜❧❡ ①✮ ④ ✳✳✳ ⑥

slide-3
SLIDE 3

Code Generation for Mathematical Functions

f = log(x) explicit function f = esin(x) expression Φ(f) = 0 equation x → f(x) black box

}main() { int temp; float celsius; char repeat; char flag; do { flag='n"; do { if(flag=='n') printf("Input a valid temperature :"); else printf("input a valid temperature,stupid:"); scanf("%d",&temp); flag='y'; } while (temp<0||temp >100); celsius=(5.0/9.0)*(temp-32); printf("%d degrees F is %6.2f degrees celsius\n",temp,cels printf("Do you have another temperature?"); repeat=getchar(); putchar('\n');

❞♦✉❜❧❡ ❢✉♥✭❞♦✉❜❧❡ ①✮ ④ ✳✳✳ ⑥

slide-4
SLIDE 4

Our Focus: Special Functions

–2 –1 1 2 –6 –4 –2 2 4 6

Ai(x)

–2 –1 1 2 –6 –4 –2 2 4 6

Ci(x)

–2 –1 1 2 –6 –4 –2 2 4 6

erf(x)

–2 –1 1 2 –6 –4 –2 2 4 6

J0(x)

D-finite Functions

= solns of linear ODEs with polynomial coeffts pr(x) · f (r)(x) + · · · + p1(x) · f ′(x) + p0(x) · f(x) = 0 p0, p1, . . . , pr ∈ R[x] Example: f(x) = arctan(x) (x2 + 1) · f ′′(x) + 2x · f ′(x) = 0, f(0) f ′(0)

  • =

1

slide-5
SLIDE 5

Problem Statement

  • prf (r) + · · · + p0f = 0,

f(0), . . . , f (r−1)(0) D ⊆ R, ε > 0

}main() { int temp; float celsius; char repeat; char flag; do { flag='n"; do { if(flag=='n') printf("Input a valid temperature :"); else printf("input a valid temperature,stupid:"); scanf("%d",&temp); flag='y'; } while (temp<0||temp >100); celsius=(5.0/9.0)*(temp-32); printf("%d degrees F is %6.2f degrees celsius\n",temp,cels printf("Do you have another temperature?"); repeat=getchar(); putchar('\n');

❞♦✉❜❧❡ ❢✉♥✭❞♦✉❜❧❡ ①✮❀ ∀x ∈ D ∩ ❞♦✉❜❧❡,

  • ❢✉♥(x) − f(x)

f(x)

  • ≤ ε

◮ “Semi-automatic”: may require implementation hints

Future goal: full automation

◮ Rigorous in principle, some shortcuts in current prototype

slide-6
SLIDE 6

Overview

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ NumGfun (Maple) ◮ ≈ rigorous multiple

precision solver for ODE

◮ Sollya ◮ ≈ Metalibm-lutetia

(cf. previous talk by

  • O. Kupriianova)

sets of polynomial approximations

slide-7
SLIDE 7

The Frontend: Taylor Series

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code pr(x) · f (r)(x) + · · · p0(x) · f(x) = 0 Ansatz: f(x) =

n=0

cnxn

◮ f ′(x) =

n=0

(n + 1)cn+1xn

◮ x · f(x) =

n=1

cn−1xn

cn = b0(n)cn−1 + b1(n)cn−2 + · · · f(x) = c0 + c1x + · · · + cd−1xd−1 + · · ·

slide-8
SLIDE 8

The Frontend: Error Bounds

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code f(x) =

d−1

n=0

cnxn +

n=d

cnxn

?

|x| ≤ ρ “Majorizing series”: using the ODE, find a simple ˆ f(x) = ∑ ˆ cnxn such that |cn| ≤ ˆ cn for all n.

  • f(x) −

d−1

n=0

cnxn

n=d

ˆ cnρn

!

≤ ε.

slide-9
SLIDE 9

The Frontend: Analytic Continuation

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ What if D ⊆ [−ρ, ρ]?

i −i

slide-10
SLIDE 10

The Frontend: Analytic Continuation

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ What if D ⊆ [−ρ, ρ]?

i −i x1

f(x1)

f ′(x1)

  • =

1.0000... 0.5404...

0.0000... 0.7352...

f(0)

f ′(0)

slide-11
SLIDE 11

The Frontend: Analytic Continuation

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ What if D ⊆ [−ρ, ρ]?

i −i x1

f(x1)

f ′(x1)

  • =

1.0000... 0.5404...

0.0000... 0.7352...

f(0)

f ′(0)

slide-12
SLIDE 12

The Frontend: Analytic Continuation

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ What if D ⊆ [−ρ, ρ]?

i −i x1 x2

f(x1)

f ′(x1)

  • =

1.0000... 0.5404...

0.0000... 0.7352...

f(0)

f ′(0)

  • f(x2)

f ′(x2)

  • =

1.0000... 0.5095...

0.0000... 0.5055...

f(x1)

f ′(x1)

slide-13
SLIDE 13

The Frontend: Analytic Continuation

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ What if D ⊆ [−ρ, ρ]?

i −i x1 x2

f(x1)

f ′(x1)

  • =

1.0000... 0.5404...

0.0000... 0.7352...

f(0)

f ′(0)

  • f(x2)

f ′(x2)

  • =

1.0000... 0.5095...

0.0000... 0.5055...

f(x1)

f ′(x1)

  • . . .
slide-14
SLIDE 14

The Frontend: Economization

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code Have:

  • f(x) − (c0 + · · · + cd−1xd−1)
  • ≤ ε

2 x ∈ C, |x| ≤ 1 c0 + · · · + c7 x7 + c8 x8 + c9 x9 c(1) + · · · + c(1)

7 x7 + c(1) 8 x8

c(2) + · · · + c(2)

7 x7

. . . |Tn| ≤ 1 over [−1, 1] − c9 29 T9(x) − c8 28 T8(x)

  • f(x) − ( c(k)

+ · · · + c(k)

d−1−kxk )

  • ≤ ε

x ∈ [−1, 1]

slide-15
SLIDE 15

The Backend: Tight Approximation

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ Frontend:

D =

  • Di

x ∈ Di |pi(x) − f(x)| ≤ ε deg pi = O(500)

◮ Backend:

D =

  • D⋆

i

x ∈ D⋆

i

  • p⋆

i (x)

f(x) − 1

  • ≤ ε

deg p⋆

i = O(10)

slide-16
SLIDE 16

The Backend: Tight Approximation

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ Backend: leveraging Metalibm

◮ Reuse domain splitting algorithm

sketched in last talk D =

  • D⋆

i

◮ Reuse minimax implementation

with relative error bounds x ∈ D⋆

i

  • p⋆

i (x)

f(x) − 1

  • ≤ ε

◮ Addressing issues

◮ Pure black-box interface too slow ◮ Zeros of f in domain D?

slide-17
SLIDE 17

The Backend: FP Polynomials

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ Implementation need: p⋆ i ∈ F[x] ◮ Leverage existing FP minimax

heuristics

slide-18
SLIDE 18

The Backend: FP Polynomials

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ Implementation need: p⋆ i ∈ F[x] ◮ Leverage existing FP minimax

heuristics

◮ unless f has a zero in the domain

slide-19
SLIDE 19

The Backend: Zeros of f

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ Suppose f(c) = 0 for c ∈ D ◮ We want

  • p⋆

i (x)

f(x) − 1

  • ≤ ε

◮ We need p⋆ i (c) = 0 ◮ No if p⋆ i ∈ F[x] but c ∈ R\F

slide-20
SLIDE 20

The Backend: Zeros of f

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ Suppose f(c) = 0 for c ∈ D ◮ Actually we want

∀x ∈ F.

  • p⋆

i (x)

f(x) − 1

  • ≤ ε

◮ p⋆ i (x) never needs to be 0 exactly ◮ Okay but we need to compute p⋆ i

◮ Express c as a symbol, evaluated

with Newton-Raphson on f

◮ Give f(x − c)/xk to minimax

algorithm, yielding q

◮ Deduce p⋆

i from xk q(x)

slide-21
SLIDE 21

The Backend: Evaluation scheme

Frontend Backend differential equation truncated Taylor series compact rough approx. tight approximation FP representable poly. evaluation scheme code

◮ Reuse Metalibm-Lutetia to generate

Horner scheme

◮ We could also use

Metalibm-Lugdunum here

slide-22
SLIDE 22

Examples: erfc

–0.5 0.5 1 1.5 2 –2 –1 1 2 x

◮ Consider erfc(x) = 1 −

2 √π

x

0 e−t2dt

  • ver D = [−2; 2] at ε = 2−62

◮ Describe erfc with

f ′′(x) + 2xf ′(x) = 0, f(0) = 1, f ′(0) = −2 √π

◮ Generated code has domain split

into 16 subdomains

◮ Code runs in 110 to 350 cycles,

libm exponential in 80 cycles

◮ Code is bitwise the same if we take

MPFR’s erfc instead of ODE

  • 1.2e-19
  • 1e-19
  • 8e-20
  • 6e-20
  • 4e-20
  • 2e-20

2e-20 4e-20 6e-20 8e-20 1e-19

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

slide-23
SLIDE 23

Examples: J0

–1 –0.8 –0.6 –0.4 –0.2 0.2 0.4 0.6 0.8 1 10 20 30 40 x

◮ Consider Bessel function J0

  • ver D = [0.5; 42] at ε = 2−45

◮ J0 is given as

xf ′′(x) + f ′(x) + xf(x) = 0, f(x) → 1, x → 0

◮ Code generation takes about 30

minutes

◮ Generator has to cope with 13 zeros

in the domain

◮ Code runs in 60 to 500 cycles,

libm exponential in 80 cycles

  • 2e-14
  • 1.5e-14
  • 1e-14
  • 5e-15

5e-15 1e-14 1.5e-14 5 10 15 20 25 30 35 40

slide-24
SLIDE 24

Conclusion and Outlook

◮ Don’t code. Have your special functions generated! ◮ Process starts from the very basic definition ◮ Only small domains handled fully automatically ◮ Full range implementions require manual intervention ◮ Frontend-backend interface not satisfactory ◮ Code generation performance unpredictable

slide-25
SLIDE 25

Thanks!

Thank you for your attention! Questions?