

slide-1
SLIDE 1

Code Generators for Mathematical Functions

  • N. Brunie¹, F. de Dinechin², O. Kupriianova³, Ch. Lauter³

¹ Kalray, Grenoble, France
² Université de Lyon, INRIA, INSA-Lyon, CITI, F-69621 Villeurbanne, France
³ Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005 Paris, France

22nd IEEE Symposium on Computer Arithmetic Lyon, France, 22-24 June 2015

  • O. Kupriianova (LIP6)

Code generators for math functions ARITH 22, 23 June 2015 1 /30

slide-2
SLIDE 2

Mathematical libraries

Standard libraries (libms)

  • elementary functions (exp, log, sin, sinh)
  • special functions ($x^y$, Γ, Bessel)
  • standard precisions (single, double, quad)

Existing implementations

  • Intel’s MKL
  • AMD’s libm
  • ARM’s mathlib
  • libmcr by Sun
  • ...
  • glibc libm
  • CRLibm by ENS Lyon
  • newlib
  • OpenLibm for Julia
  • Yeppp!


slide-5
SLIDE 5

One size does not fit all

Current offer

  • Several performance options (latency vs throughput)
  • Several accuracy options (“quick and dirty”, faithful, correctly-rounded)
  • Several portability options (generic vs AVX512)

Some people are still not happy

  • More performance, less compliance: degraded accuracy, reduced domain
  • Functions not from standard libm

Who is going to write all these variants?


slide-6
SLIDE 6

Solution

Metalibm

Write tools to produce code for math functions

Analogy

  • assembly → compilers
  • code → code generators

C source:

    #include <stdio.h>
    int main() {
        int a, b, sum;
        a = 15;
        b = 37;
        sum = a + b;
        printf("%d + %d = %d\n", a, b, sum);
        return 0;
    }

Compiler-generated assembly:

        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $52, %r8d
        movl    $37, %ecx
        movl    $15, %edx
        movl    $.LC0, %esi
        movl    $1, %edi
        xorl    %eax, %eax
        call    __printf_chk
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc


slide-7
SLIDE 7

Metalibm: generator use-cases

[Diagram: the two Metalibm code-generation flows]

  • User input: I/O precision, target accuracy, range, processor
  • Open-ended code generation: black-box function specification, approximation scheme generator (Sollya)
  • C11 libm code generation: C11 function code generators (Python)
  • Back end: C code generator (Python)
  • Generated code: function code, test code


slide-8
SLIDE 8

Outline

  1. Background in function implementations
  2. Lutetia version (open-ended generation)
  3. Lugdunum version (C11 function code generator)
  4. Conclusion and Future Work


slide-11
SLIDE 11

How to implement a function manually

The task: from f on [a, b], get an implementation fun such that

$$\left\| \frac{\mathrm{fun} - f}{f} \right\| \le \bar{\varepsilon}$$

  1. Eliminating special cases: zeros, infinities, NaNs, etc.
  2. Argument reduction: transform [a, b] to [α, β], a small interval
  3. Polynomial approximation: minimax approximation, polynomial of low degree, Remez algorithm
  4. Reconstruction

Example

Implement $f(x) = e^x$:

$$e^x = 2^{\frac{x}{\log 2}} = 2^{\left\lfloor \frac{x}{\log 2} \right\rceil} \cdot 2^{\frac{x}{\log 2} - \left\lfloor \frac{x}{\log 2} \right\rceil} = 2^E \cdot e^{x - E \log 2} = 2^E \cdot e^r$$
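The four steps and the $e^x$ identity above can be sketched in Python as follows. This is an illustrative sketch, not code produced by Metalibm: the name `exp_sketch`, the cut-off values, the default degree, and the use of Taylor coefficients in place of a Remez minimax polynomial are all assumptions made for brevity. Here E is x / log 2 rounded to the nearest integer and r = x - E log 2.

```python
import math

LOG2 = math.log(2.0)

def exp_sketch(x: float, degree: int = 13) -> float:
    """Illustrative exp(x) following the four manual steps (hypothetical helper)."""
    # 1. Special cases: NaN, and arguments far outside the binary64 range.
    if math.isnan(x):
        return x
    if x > 710.0:
        return math.inf
    if x < -746.0:
        return 0.0
    # 2. Argument reduction: x = E*log(2) + r with |r| <= log(2)/2 (approximately).
    E = round(x / LOG2)
    r = x - E * LOG2
    # 3. Polynomial approximation of e^r, evaluated with Horner's rule.
    #    Assumption: Taylor coefficients stand in for a minimax polynomial.
    p = 0.0
    for k in range(degree, -1, -1):
        p = p * r + 1.0 / math.factorial(k)
    # 4. Reconstruction: e^x = 2^E * e^r.
    try:
        return math.ldexp(p, E)
    except OverflowError:
        return math.inf
```

A production implementation would typically compute r with a high/low split of the constant log 2, since the plain subtraction used here loses accuracy as |E| grows.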

slide-14
SLIDE 14

Argument reduction

Based on mathematical properties:

$$n^{a+b} = n^a \cdot n^b, \quad \sin(x + 2\pi) = \sin(x), \quad \log(a \cdot b) = \log(a) + \log(b), \quad \ldots$$

What properties do we know for erf, $J_0$, or an open-ended function (purely defined by a differential equation)?

When argument reduction does not work

Piecewise-polynomial approximation
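As a rough illustration of piecewise-polynomial approximation for a function without a usable argument reduction, the sketch below splits the domain of erf into equal pieces and builds one Chebyshev interpolant per piece. It is only a sketch under stated assumptions: the helper names (`cheb_fit`, `cheb_eval`, `build_piecewise`, `eval_piecewise`) are hypothetical, the split is uniform, and Chebyshev interpolation is used instead of the minimax polynomials a real generator would compute.

```python
import math

def cheb_fit(f, lo, hi, degree):
    """Chebyshev interpolation coefficients of f on [lo, hi]."""
    n = degree + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    # Sample f at the Chebyshev nodes mapped from [-1, 1] to [lo, hi].
    values = [f(0.5 * (hi + lo) + 0.5 * (hi - lo) * math.cos(t)) for t in thetas]
    return [
        (2.0 / n) * sum(v * math.cos(k * t) for v, t in zip(values, thetas))
        for k in range(n)
    ]

def cheb_eval(coeffs, lo, hi, x):
    """Clenshaw evaluation of the interpolant at x in [lo, hi]."""
    t = (2.0 * x - lo - hi) / (hi - lo)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * t * b1 - b2, b1
    return 0.5 * coeffs[0] + t * b1 - b2

def build_piecewise(f, a, b, pieces, degree):
    """One (lo, hi, coefficients) entry per equal-width subdomain of [a, b]."""
    width = (b - a) / pieces
    bounds = [(a + i * width, a + (i + 1) * width) for i in range(pieces)]
    return [(lo, hi, cheb_fit(f, lo, hi, degree)) for lo, hi in bounds]

def eval_piecewise(table, x):
    """Select the subdomain containing x, then evaluate its polynomial."""
    a, b = table[0][0], table[-1][1]
    i = min(int((x - a) / (b - a) * len(table)), len(table) - 1)
    lo, hi, coeffs = table[i]
    return cheb_eval(coeffs, lo, hi, x)

# Example: erf on [0, 4] with 8 pieces of degree 8.
erf_table = build_piecewise(math.erf, 0.0, 4.0, 8, 8)
```

The trade-off is between the number of pieces (table storage, one lookup per call) and the polynomial degree (arithmetic per call).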


slide-15
SLIDE 15

Outline

  1. Background in function implementations
  2. Lutetia version (open-ended generation)
  3. Lugdunum version (C11 function code generator)
  4. Conclusion and Future Work


slide-18
SLIDE 18

Philosophy

Objective #1: push-button tool to implement functions f : R → R

  • Similar to yesterday’s talk by D. Thomas, but in software
  • automatic argument reduction
  • automatic polynomial approximation
  • automatic domain splitting
  • with user-specified accuracy

Objective #2: black-box functions

  • Open-ended means: no function dictionaries
  • specify the function by an expression (composite functions)
  • but not only: all we need is code that evaluates the function and its first derivatives with arbitrary accuracy
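To make the black-box idea concrete, here is a minimal sketch of what such an evaluator could look like for f(x) = 1/(1 + exp(x)). The interface (`logistic_blackbox(x, order, digits)`) is purely hypothetical and is not Metalibm's actual input format; it uses Python's `decimal` module, and the fixed number of guard digits is a simplification rather than a rigorous accuracy guarantee.

```python
from decimal import Decimal, localcontext

def logistic_blackbox(x, order, digits):
    """Derivative of the given order (0, 1 or 2) of f(x) = 1/(1+exp(x)),
    rounded to `digits` significant decimal digits (hypothetical interface)."""
    with localcontext() as ctx:
        ctx.prec = digits + 5              # assumption: 5 guard digits suffice
        f = 1 / (1 + Decimal(x).exp())
        if order == 0:
            result = f
        elif order == 1:
            result = -f * (1 - f)          # f' = -f (1 - f)
        elif order == 2:
            result = f * (1 - f) * (1 - 2 * f)   # f'' = f (1 - f)(1 - 2f)
        else:
            raise ValueError("only orders 0, 1 and 2 are sketched here")
        ctx.prec = digits
        return +result                     # unary plus rounds to ctx.prec
```

A generator only ever calls such a routine; it never needs to know the closed form behind it, which is what makes the approach open-ended.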


slide-20
SLIDE 20

Black-box function generator

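[Diagram: parameters, Metalibm Lutetia, implementation.c; steps marked: 1, 2, 3]

  • Parameters: function, domain, final accuracy, max poly degree, table size
  • Metalibm Lutetia steps: 1 - Properties detection, 2 - Domain splitting, 3 - Approximation
  • Output: implementation.c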


slide-23
SLIDE 23

Exponential function detection

Generation hypothesis

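f(x) is of type $\beta^x$, with unknown β

Finding the base

$$\beta = \exp\left( \frac{\ln(f(\xi))}{\xi} \right), \quad \text{for some } \xi \in [a, b]$$

Decision of acceptance

$$\tilde{\varepsilon} = \left\| \frac{\beta^x}{f(x)} - 1 \right\|_{[a,b]} \quad \text{should be small}$$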
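The detection procedure can be sketched numerically as follows. This is a simplified illustration, not the Lutetia implementation: the function name `detect_exponential`, the sample count, and the tolerance are assumptions, and the norm over [a, b] is only estimated by sampling in binary64, whereas a real tool would need a rigorous bound.

```python
import math

def detect_exponential(f, a, b, samples=257, tol=2.0 ** -45):
    """Return beta if f looks like beta**x on [a, b], else None (hypothetical helper)."""
    xi = b if b != 0.0 else a            # pick a nonzero point xi of [a, b]
    if xi == 0.0 or f(xi) <= 0.0:
        return None
    # Hypothesis f(x) = beta**x gives beta = exp(ln(f(xi)) / xi).
    beta = math.exp(math.log(f(xi)) / xi)
    # Acceptance: sampled estimate of || beta**x / f(x) - 1 || over [a, b].
    err = 0.0
    for i in range(samples):
        x = a + (b - a) * i / (samples - 1)
        fx = f(x)
        if fx <= 0.0:
            return None
        err = max(err, abs(beta ** x / fx - 1.0))
    return beta if err <= tol else None
```

For instance, `detect_exponential(lambda x: 2.0 ** x, -1.0, 1.0)` accepts the hypothesis with a base close to 2, while a function such as `1 + x*x` on [1, 2] is rejected.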


slide-25
SLIDE 25

Domain splitting hints

General procedures

  • Naive: choose some large k
  • Hierarchical: split into $2^k$ subdomains
  • Successive powers of two

Function-adapted procedures

  • Bisection
  • Optimized bisection
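Plain bisection, the simpler of the two function-adapted procedures, can be sketched as below: a subdomain is kept when a polynomial of the allowed degree meets the error target, and halved otherwise. The names `interp_error` and `split_domain` are hypothetical, the error is a sampled estimate for a Chebyshev interpolant (an assumption standing in for a certified minimax error), and f is assumed to be nonzero on the domain so that the relative error is defined.

```python
import math

def interp_error(f, lo, hi, degree, probes=64):
    """Sampled max relative error of a Chebyshev interpolant of f on [lo, hi]."""
    n = degree + 1
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    nodes = [mid + half * math.cos(math.pi * (j + 0.5) / n) for j in range(n)]
    vals = [f(x) for x in nodes]

    def p(x):
        # Lagrange form of the interpolating polynomial (fine for low degrees).
        total = 0.0
        for j in range(n):
            w = vals[j]
            for m in range(n):
                if m != j:
                    w *= (x - nodes[m]) / (nodes[j] - nodes[m])
            total += w
        return total

    worst = 0.0
    for i in range(probes + 1):
        x = lo + (hi - lo) * i / probes
        worst = max(worst, abs(p(x) / f(x) - 1.0))
    return worst

def split_domain(f, lo, hi, degree, eps):
    """Bisect [lo, hi] until every piece meets the error target eps."""
    if interp_error(f, lo, hi, degree) <= eps:
        return [(lo, hi)]
    mid = 0.5 * (lo + hi)
    return split_domain(f, lo, mid, degree, eps) + split_domain(f, mid, hi, degree, eps)
```

The recursion terminates only if eps is above the rounding noise of the binary64 evaluation; an optimized variant would also try to merge or rebalance neighbouring pieces.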


slide-27
SLIDE 27

Black-box function generator

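[Diagram: parameters, Metalibm Lutetia, implementation.c; steps marked: 3]

  • Parameters: $f = \exp(x)$, $[a, b] = [0, 0.3]$, $\bar{\varepsilon} = 2^{-53}$, ...
  • Metalibm Lutetia steps: 1 - Properties detection, 2 - Domain splitting, 3 - Approximation
  • implementation.c: polynomial coefficients, approximation(...)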
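For this small domain a single polynomial suffices. As a back-of-the-envelope check, the sketch below finds the smallest Taylor degree whose Lagrange remainder on [0, 0.3] is below $2^{-53}$ and evaluates the polynomial with Horner's rule. The helper names are hypothetical, and Taylor coefficients are an assumption used for simplicity: a minimax polynomial, as a generator would compute, can reach the same target with a lower degree.

```python
import math

def taylor_degree_for_exp(b, eps):
    """Smallest n such that exp(b) * b**(n+1) / (n+1)! <= eps on [0, b]."""
    n = 0
    while math.exp(b) * b ** (n + 1) / math.factorial(n + 1) > eps:
        n += 1
    return n

def horner(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x**n with Horner's rule."""
    p = 0.0
    for c in reversed(coeffs):
        p = p * x + c
    return p

degree = taylor_degree_for_exp(0.3, 2.0 ** -53)
coeffs = [1.0 / math.factorial(k) for k in range(degree + 1)]
```

The bound covers the approximation error only; the rounding errors of the binary64 evaluation come on top of it.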


slide-28
SLIDE 28

Black-box function generator

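[Diagram: parameters, Metalibm Lutetia, implementation.c; steps marked: 2, 3]

  • Parameters: $f = \frac{1}{1 + \exp(x)}$, $[a, b] = [-2, 2]$, $\bar{\varepsilon} = 2^{-52}$, ...
  • Metalibm Lutetia steps: 1 - Properties detection, 2 - Domain splitting, 3 - Approximation
  • implementation.c: polynomial coefficients, constants, approximations(...), reconstruction(...)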


slide-29
SLIDE 29

Black-box function generator

[Diagram: parameters, Metalibm Lutetia, implementation.c; steps marked: 1, 3]

  • Parameters: $f = \exp(x)$, $[a, b] = [-100, 100]$, $\bar{\varepsilon} = 2^{-53}$, tableSize = 32, ...
  • Metalibm Lutetia steps: 1 - Properties detection, 2 - Domain splitting, 3 - Approximation
  • implementation.c: polynomial coefficients, constants, table(s), argumentReduction(...), approximation(...), reconstruction(...)
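One common way to combine an exponential argument reduction with a 32-entry table is sketched below. This illustrates the general technique and makes no claim about the code Lutetia actually emits: the decomposition x = (32 E + i) log(2)/32 + r, the names `TABLE` and `exp_table`, the Taylor coefficients, and the degree are all assumptions. The input is assumed finite and inside [-100, 100].

```python
import math

TABLE_SIZE = 32
LOG2 = math.log(2.0)
# TABLE[i] = 2**(i/32), the tabulated part of the reconstruction.
TABLE = [2.0 ** (i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def exp_table(x, degree=7):
    """Table-driven exp(x) for finite x in [-100, 100] (illustrative sketch)."""
    # Argument reduction: x = k*log(2)/32 + r, with |r| <= log(2)/64 (approximately).
    k = round(x * TABLE_SIZE / LOG2)
    E, i = divmod(k, TABLE_SIZE)         # k = 32*E + i with 0 <= i < 32
    r = x - k * (LOG2 / TABLE_SIZE)
    # Approximation: low-degree polynomial for e^r on the tiny reduced interval.
    p = 0.0
    for j in range(degree, -1, -1):
        p = p * r + 1.0 / math.factorial(j)
    # Reconstruction: e^x = 2^E * 2^(i/32) * e^r.
    return math.ldexp(TABLE[i] * p, E)
```

Compared with the table-free reduction, the reduced interval is 32 times smaller, so a much lower polynomial degree is enough at the price of one table lookup and one extra multiplication.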


slide-30
SLIDE 30

Outline

  1. Background in function implementations
  2. Lutetia version (open-ended generation)
  3. Lugdunum version (C11 function code generator)
  4. Conclusion and Future Work


slide-32
SLIDE 32

Philosophy

Objective #1: enhance the productivity of seasoned libm developers

  • capture many code varieties in a single, high-level source
  • capture function-specific tricks, floating-point tricks, ...
  • enable design space exploration
  • obtain better code in less time

Not a push-button tool like Lutetia!

Objective #2: a back-end for Metalibm-lutetia

  • The Lutetia people are seasoned libm developers...
  • manage processor-specific optimizations
  • manage performance options (vector versus latency, ...)


slide-33
SLIDE 33

Overview of code generation back-end

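[Diagram: overview of the code generation back-end]

  • Processor: Kalray K1a, K1b, x87, SSE2, AVX2, ARM
  • Services: polynomial approximation, evaluation parallelization, Gappa generation, fast path factorization, instructions selection, ...
  • C code generation
  • Function scripts: exp.py, log.py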


slide-34
SLIDE 34

Technical choices

Give the libm developer full control over the generated code: embed code generation in Python scripts

  • describe the evaluation scheme in Python syntax
  • Sollya embedded in Python
  • use Python scripting for design-space exploration, etc.

A framework that provides all sorts of useful services

  • from high-level (Lutetia-based polynomial approximation)
  • to low-level (code transformations for vectorization)

Describe a Gappa proof also in Python
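The idea of describing an evaluation scheme in Python and emitting C from it can be illustrated with a toy generator. This is not the Metalibm-lugdunum API: the function `emit_horner_c`, its parameters, and the precision names used as dictionary keys are hypothetical. It emits a Horner evaluation as C source text, using C99 hexadecimal floating-point literals so that coefficients are not altered by decimal printing.

```python
def emit_horner_c(name, coeffs, precision="binary64"):
    """Emit C source for a Horner evaluation of sum(coeffs[k] * x**k).

    Hypothetical helper: coefficients are assumed to be finite floats.
    """
    ctype, suffix = {
        "binary32": ("float", "f"),
        "binary64": ("double", ""),
    }[precision]
    lines = [f"{ctype} {name}({ctype} x) {{"]
    # Start from the highest-degree coefficient, then fold in the others.
    lines.append(f"    {ctype} p = {float(coeffs[-1]).hex()}{suffix};")
    for c in reversed(coeffs[:-1]):
        lines.append(f"    p = p * x + {float(c).hex()}{suffix};")
    lines.append("    return p;")
    lines.append("}")
    return "\n".join(lines)

# Design-space exploration then becomes ordinary Python scripting, for example
# generating the same scheme for two precisions:
variants = {p: emit_horner_c("poly_" + p, [1.0, 1.0, 0.5], p)
            for p in ("binary32", "binary64")}
```

Because the scheme is data inside a script, changing the degree, the precision, or the evaluation order is a parameter change rather than a rewrite of hand-written C.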


slide-35
SLIDE 35

Speedups obtained with respect to default libm

Table obtained out of exp.py and log.py.

processor       function          speedup   default libm
Kalray K1a      expf (binary32)   4.0       newlib
                logf (binary32)   2.7
                exp (binary64)    5.8
                log (binary64)    5.8
core i7, SSE2   expf (binary32)   1.7       glibc
                logf (binary32)   1.02
                exp (binary64)    1.7
                log (binary64)    1.6
core i7, AVX2   expf (binary32)   1.1
                logf (binary32)   0.96
                exp (binary64)    1.9
                log (binary64)    1.6

(all C11-compliant and optimized for latency.)


slide-36
SLIDE 36

An example

Python generic code

    k = NearestInteger(unround_k, precision = self.precision)

NearestInteger is a method of the generic Processor class.

Code generated for binary32 on Kalray

    k = rintf(unround_k);

Code generated for binary64 on X87/SSE2

    t = _mm_set_sd(unround_k);
    t1 = _mm_round_sd(t, t, _MM_FROUND_TO_NEAREST_INT);
    k = _mm_cvtsd_f64(t1);

Challenge: find the right balance between Metalibm and the compiler.
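The per-processor dispatch shown in this example can be mimicked with ordinary class inheritance. The sketch below is a toy model, not the Metalibm class hierarchy: the class names `GenericProcessor` and `X86SseProcessor` and the method `nearest_integer` are hypothetical, and the generator simply returns C expressions as strings. Note that `_mm_round_sd` is declared in `<smmintrin.h>` (SSE4.1).

```python
class GenericProcessor:
    """Hypothetical generic target: falls back to C11 library calls."""

    def nearest_integer(self, var, precision):
        fn = {"binary32": "rintf", "binary64": "rint"}[precision]
        return f"{fn}({var})"

    def emit_assignment(self, dest, var, precision):
        return f"{dest} = {self.nearest_integer(var, precision)};"


class X86SseProcessor(GenericProcessor):
    """Hypothetical x86 target: overrides only what it can do better."""

    def nearest_integer(self, var, precision):
        if precision != "binary64":
            # No specialised sequence sketched here: reuse the generic code.
            return super().nearest_integer(var, precision)
        t = f"_mm_set_sd({var})"
        return f"_mm_cvtsd_f64(_mm_round_sd({t}, {t}, _MM_FROUND_TO_NEAREST_INT))"


for target in (GenericProcessor(), X86SseProcessor()):
    print(target.emit_assignment("k", "unround_k", "binary64"))
```

The function script stays identical for every target; only the processor object passed to it changes, which is where the balance between generator and compiler has to be decided.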


slide-37
SLIDE 37

Outline

  1. Background in function implementations
  2. Lutetia version (open-ended generation)
  3. Lugdunum version (C11 function code generator)
  4. Conclusion and Future Work


slide-38
SLIDE 38

Conclusion

Two tools for libm developers available at http://metalibm.org

  • Automated generation of evaluation schemes
  • Libm development framework

  • Reduced cost to get alternative function code
  • Comparable or better performance
  • Next goal: unify the two approaches
  • Addition of argument reduction procedures and new processor classes
  • Offline demos in a coffee-break


slide-39
SLIDE 39

Q/A

Thank you for your attention! Questions?


slide-40
SLIDE 40

Meanwhile, research on elementary functions goes on

Preliminary results on correctly rounded logarithm using the 64-bit integer arithmetic of modern processors:

source      system   crlibm   crlibm-de   fixed-point
avg time    94       107      65          70
max time    13K      889      573         165
