Preliminary work in Lyon - Florent de Dinechin, Nicolas Brunie



slide-1
SLIDE 1

Preliminary work in Lyon

Florent de Dinechin, Nicolas Brunie

slide-2
SLIDE 2

Introduction

Introduction
First experiment: FloPoCo-like
Second experiment: rewriting rules
High-level back-end?
ML's flow: backend and code generation
Conclusion

Florent de Dinechin, Socrate team (ex-AriC (ex-Arénaire)), The Metalibm Project

slide-3
SLIDE 3

The big picture

[Figure: positioning of metalibm/C11 and metalibm/Open. Labels on the figure: code (CR C11, degraded C11, non-standard); programmer (specialist libm dev, sci dev); automation (assisted, fully automatic); high performance; high genericity.]


slide-5
SLIDE 5

First experiment: FloPoCo-like


slide-6
SLIDE 6

Overview

Bottom-up philosophy

Start with working C code
Embed it in printf()
Introduce genericity and define helper functions in an ad-hoc way.
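The three steps above can be sketched in a few lines of Python. This is only an illustration of the "embed working C in a print statement, then parameterize" idea; the function name `gen_scale_function` and its parameters are hypothetical and do not come from the metalibm code base.

```python
def gen_scale_function(name, constant, c_type="float"):
    """Return C source for a function that multiplies its input by a constant.

    Hypothetical helper: the C text started life as hand-written code and
    was then wrapped in a format string, with the type made a parameter.
    """
    suffix = {"float": "f", "double": ""}[c_type]
    return (
        f"{c_type} {name}({c_type} x) {{\n"
        # float.hex() yields a C99 hexadecimal floating-point literal
        f"    const {c_type} c = {float(constant).hex()}{suffix};\n"
        f"    return c * x;\n"
        f"}}\n"
    )


print(gen_scale_function("scale_by_c", 1.5))
```

The genericity here is limited to what the format string exposes (the C type and the constant), which is the "limited genericity" trade-off of the bottom-up approach.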

Pros and cons

Guaranteed success AND performance
Limited genericity
Very limited abstraction (e.g. for formal proof?)


slide-7
SLIDE 7

Results

After developing exp, log and trig-of-pi (sinpi, cospi, sincospi)

Genericity

precision (single or double, faithful or degraded)
processor (portable, Kalray)
performance (Horner/Estrin, vector/scalar)

Shared code

polynomial approximation, of course
float-to-int conversions
testbench generation (see the demo)

Some of the generated code is better than libm for some Kalray applications.
Now go see the code in the private svn, directory ProofOfConcept.
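For readers unfamiliar with the Horner/Estrin choice listed under Genericity, here is a minimal numerical sketch of both evaluation schemes. It is a generic illustration and not the code that metalibm generates: Horner is a purely sequential chain of multiply-adds, while Estrin pairs terms so that independent operations can overlap in the pipeline or in SIMD lanes.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with a sequential chain of multiply-adds."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial by pairing terms level by level."""
    level = list(coeffs)
    power = x
    while len(level) > 1:
        if len(level) % 2:
            level.append(0.0)  # pad so that every term has a partner
        # Each pair (a, b) becomes a + b * power; the pairs are independent.
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power = power * power
    return level[0]
```

Both functions compute the same polynomial but round differently, so in floating point their results may differ in the last bits. Which one is faster depends on the target.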


slide-8
SLIDE 8

The CFunction class

Main attributes

basename (string)
io_format (Format)
input_list
output_list
processor (Processor class)
accuracy (int)
correct_rounding (boolean)
manage_subnormals (boolean)
vectorize (boolean)
eval_Estrin (boolean)

Main methods

gen_code(), gen_header(), gen_declaration()
gen_emulation_code()
gen_test_program(), gen_exhaustive_test_program()


slide-9
SLIDE 9

The Processor class

... provides code generation services.
methods with a failsafe, portable default
actual processor classes inherit them and may overload them (with whatever intrinsics etc.)
so the same source is indeed optimized for a range of processors

Current examples:
possible fma
true fma
variants of float to int (using magic constants, using nearbyint, using intrinsics)

TODO: capture higher-level capabilities, such as SIMD capabilities.
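The inherit-and-overload design can be sketched as follows. This is a hedged illustration only: the class `FmaProcessor` and the method names `gen_fma` and `gen_float_to_int` are hypothetical (the slides only show that the real class has a `genCodeForFloatToInt` method). The emitted C calls `nearbyintf` and `fmaf` are standard C99 functions.

```python
class Processor:
    """Portable target: every code generation service has a failsafe default."""

    def gen_fma(self, dst, a, b, c):
        # Portable default: separate multiply and add (two roundings).
        return f"{dst} = {a} * {b} + {c};"

    def gen_float_to_int(self, dst, src):
        # Portable default: C99 nearbyintf, then a conversion of an integer value.
        return f"{dst} = (int32_t) nearbyintf({src});"


class FmaProcessor(Processor):
    """Hypothetical target with a hardware fused multiply-add."""

    def gen_fma(self, dst, a, b, c):
        # Overloads the default; gen_float_to_int is inherited unchanged.
        return f"{dst} = fmaf({a}, {b}, {c});"


def gen_reduction(proc):
    """The same generator source, specialized by the processor object."""
    return "\n".join([
        proc.gen_float_to_int("k", "nrK"),
        proc.gen_fma("y", "-kf", "log2", "x"),
    ])


print(gen_reduction(Processor()))
print(gen_reduction(FmaProcessor()))
```

The generator `gen_reduction` never tests which processor it is given. Target-specific choices stay inside the processor classes.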


slide-10
SLIDE 10

Second experiment: rewriting rules


slide-11
SLIDE 11

Rewriting step library

[Diagram. Boxes: Gappa, MPFR, Sollya; Core library; Rewriting steps library, containing exponential_first_rr_fp(...) {...}, cody_waite_2(...) {...}, poly_horner_fp(...) {...}; Exponential code generator, calling exponential_first_rr_fp(...); if(...) poly_horner_fp(...) {...} else ...; Logarithm code generator. Outputs: exp variants, log variants.]


slide-12
SLIDE 12

In practice

The problem

evaluate e^x faithfully to a double, for x a double

First step: invent a range reduction

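Here is its ideal mathematical description:

\[
\begin{cases}
k = \left\lfloor x \times \dfrac{1}{\ln(2)} \right\rceil \\[1ex]
y = x - k \times \ln(2)
\end{cases}
\quad\Longrightarrow\quad
\begin{cases}
k \in \mathbb{Z} \ \text{and}\ y \in \left[-\dfrac{\ln(2)}{2}, \dfrac{\ln(2)}{2}\right] \\[1ex]
e^x = 2^k \times e^y
\end{cases}
\tag{1}
\]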
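The ideal range reduction (1) can be checked numerically with a few lines of Python. This is a sanity check in double precision, not the generated code: `math.log`, `round` and `math.exp` stand in for the exact mathematical operations, and the function names are chosen here for illustration.

```python
import math

LN2 = math.log(2.0)


def reduce_exp_argument(x):
    """Return (k, y) with k the integer nearest to x/ln(2) and y = x - k*ln(2)."""
    k = round(x / LN2)
    y = x - k * LN2
    return k, y


def exp_via_reduction(x):
    """Reconstruct e^x as 2^k * e^y."""
    k, y = reduce_exp_argument(x)
    return math.ldexp(math.exp(y), k)  # ldexp(m, k) computes m * 2**k


for x in (-10.0, -0.3, 0.0, 1.0, 5.5, 50.0):
    k, y = reduce_exp_argument(x)
    print(x, k, y, exp_via_reduction(x), math.exp(x))
```

The rounding errors this check ignores (in 1/ln(2), in the multiplication, in the subtraction) are exactly what the machine-implementable refinement has to account for.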


slide-13
SLIDE 13

Second step: refine to a machine-implementable version

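\begin{align}
k &= \left\lfloor x \times \frac{1}{\ln(2)} \right\rceil \tag{2} \\
k_1 &= \left\lfloor x \times \frac{1}{\ln(2)} + \delta_k \right\rceil, \quad \delta_k \in I_{\delta_k}, \quad k_1 - k \in I_k \tag{3} \\
y_1 &= x - k_1 \times \ln(2) + \delta_y, \quad y \in I_y, \quad \delta_y \in I_{\delta_y} \tag{4} \\
p_1 &= e^{y_1} + \delta_p, \quad \delta_p \in I_{\delta_p} \tag{5} \\
r &= 2^{k_1} \times p_1 \tag{6}
\end{align}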

Can this two-step derivation be found by a program?

I don’t think so. So I consider (2) to (6) as the starting point of a metaexp.


slide-14
SLIDE 14

Meta-skeleton for the exp

(I_δk, I_k) = genCodeForComputingK(formatX, ...)    (7)
(I_δy, I_y) = genCodeForComputingY(I_δk, I_k, ...)    (8)
(I_δp) = genCodeForPolyApprox("exp(x)", targetPrecision, I_y, ...)    (9)
(I_δr) = genCodeForReconstruction(I_k, ...)    (10)


slide-15
SLIDE 15

Actual metaexp skeleton

    def gen_code(self):
        # Build the code
        self.gen_code_for_k("x")
        self.gen_code_for_y()
        self.gen_code_for_poly()
        self.gen_code_for_reconstruction()
        self.gen_code_for_exceptions()

All the previous variables have become global class attributes:
more readable
but dependencies lost


slide-16
SLIDE 16

    def gen_code_for_k(self, X):
        self.code.declare("k", int32)
        self.code.declare("kf", self.fp_format)
        c = askSollya("1/log(2)")
        roundedc = round(c, self.fp_format.precision, RN)
        self.code.declare_const("invLog2", self.fp_format, roundedc)
        self.code.declare("nrK", self.fp_format)
        self.code << "nrK" + " = " + "invLog2 * " + X + "; /* not rounded K */\n"
        self.processor.genCodeForFloatToInt("k", "kf", "nrK", self.fp_format, ...
        # Error computation -- at some point to be delegated to Gappa
        # Error of storing roundedc and not 1/log(2)
        delta1 = round(c-roundedc, 24, RU)  # minor TODO: double rounding here
        # Error of the floating point multiplication by roundedc
        maxdelta2 = abs(self.fp_format.u*c)
        I_inf = round((-maxdelta2+delta1)*self.max_value_for_finite_output, 24, ...
        I_sup = round((maxdelta2+delta1)*self.max_value_for_finite_output, 24, ...
        self.I_deltak = (I_inf, I_sup)
        if (self.I_deltak[0] <= -1) or (self.I_deltak[1] >= 1):
            raise Exception('I_deltak too large to ensure I_k is {-1,0,1}')

Lines ending in "..." are cut off at the right edge of the slide.
More comments in the actual metalibm/metaexp.py
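The error computation above can be replayed in plain Python for the single-precision case. This is an approximate sketch under stated assumptions, not the metalibm code: ordinary double arithmetic replaces Sollya's directed roundings (so the bounds are indicative, not rigorous), the `struct` round trip stands in for rounding to binary32, and `max_x` is an assumed stand-in for `max_value_for_finite_output`.

```python
import math
import struct


def round_to_float32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


c = 1.0 / math.log(2.0)          # binary64 stand-in for the exact 1/ln(2)
rounded_c = round_to_float32(c)  # the constant stored in the generated code
u = 2.0 ** -24                   # unit roundoff of binary32

# Assumption: the largest x with a finite binary32 exp(x) is ln(largest float).
max_x = math.log((2.0 - 2.0 ** -23) * 2.0 ** 127)

delta1 = c - rounded_c           # error of storing rounded_c instead of c
max_delta2 = abs(u * c)          # error bound of the multiplication by rounded_c
i_deltak = ((-max_delta2 + delta1) * max_x,
            (max_delta2 + delta1) * max_x)

if i_deltak[0] <= -1 or i_deltak[1] >= 1:
    raise ValueError("I_deltak too large to ensure I_k is {-1,0,1}")

print(rounded_c.hex(), i_deltak)
```

With these inputs the rounded constant is 0x1.715476p0 and the interval is far inside (-1, 1), so the check passes with a wide margin.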


slide-17
SLIDE 17

All this to generate this

    float invLog2 = 0x1.715476p0f;
    float rnd_cst = 12582912.f;
    float nrK;
    float nrKrounded;
    float kf;
    int32_t k;
    nrK = invLog2 * x; /* not rounded K */
    /* float rounded to an int using the magic constant */
    nrKrounded = (nrK + rnd_cst) - rnd_cst; /* this rounds to the nearest int */
    kf = nrKrounded; /* floating-point rounded result */
    k = nrKrounded; /* this float to int conversion is a truncation */
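The magic-constant trick in the generated code can be emulated in Python to see why it rounds to the nearest integer. This is an illustrative sketch: binary32 arithmetic is emulated by rounding each double-precision result back to binary32 through `struct`, and the helper names are chosen here. The constant 12582912 is 1.5 * 2^23, so adding it moves the value into a binade where the spacing between consecutive floats is exactly 1.

```python
import struct


def f32(v):
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


RND_CST = 12582912.0  # 1.5 * 2**23, as in the generated C code


def round_to_nearest_int_f32(nr_k):
    """Emulate `(nrK + rnd_cst) - rnd_cst` evaluated in binary32.

    Valid for |nr_k| < 2**22, so that the sum stays in [2**23, 2**24).
    """
    nr_k = f32(nr_k)
    shifted = f32(nr_k + RND_CST)  # the fraction bits are rounded away here
    return f32(shifted - RND_CST)  # this subtraction is exact


for v in (2.5, 3.5, -1.3, 7.49, 100.0):
    print(v, round_to_nearest_int_f32(v))
```

Because the addition uses round-to-nearest-even, ties such as 2.5 and 3.5 go to the even neighbour. Since the result is already an integer value, the final float-to-int truncation in the C code loses nothing.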


slide-18
SLIDE 18

But performance is OK (test it yourself in the svn)

My laptop: Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz
My desktop: Intel(R) Xeon(TM) CPU E5-1620 0 @ 3.60GHz
Both running XUbuntu 12.10 with gcc 4.7.2

                Core2 U9600    Xeon E5-1620
stock expf      193            45
expf Horner     87             24
expf Estrin     77             27
stock exp       108            60
exp Horner      130            28
exp Estrin      89             36

Disclaimers:
timings using rdtsc(), usual caveats apply.
inlining switched on for our code, not for the stock function.


slide-19
SLIDE 19

High-level back-end?



slide-26
SLIDE 26

New Metalibm philosophy

New Metalibm features:

function DAG representation
abstract target description
disconnect description/optimization from code/proof generation

Generate implementations according to a standardized flow:

description of function implementation DAG
first round of optimizations
code source generation (+ optimization)
proof generation
validation


slide-27
SLIDE 27

Example of DAG description

    vx = VARIABLE("x", precision = fpformat)
    # reduced argument
    red_x = NearestInt(vx / log(2), precision = int32, tag = "red_x")
    # HIGH and LOW part log(2) generation
    log2_hi = round(log(2), fpformat.sollya_name - 10, RN)
    log2_lo = round(log(2) - log2_hi, fpformat.sollya_name, RN)
    r = (vx - (red_x * log2_hi)) - red_x * log2_lo
    r.set_attributes(tag = "r", exact = True)
    red_int = Interval(-log(2)/2, log(2)/2)
    poly = Polynomial.generate_fpminimax(exp(x), 5, red_int, [ML_Binary64]*6, absolute)
    poly_scheme = PolySchemeGenerator.generate_horner(poly, r)
    result = Return(ExponentInsertion(red_x) * poly_scheme)
    backend_scheme = Backend(processor).backend_float(result, ML_Binary64)
    source_code = CodeGenerator(processor).generate_expr(backend_scheme)
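To make the idea of a tagged operation DAG concrete, here is a toy sketch, far simpler than the description above. Everything in it is hypothetical (the `Node` class, `wrap` and `emit` are not Metalibm's API). It only shows how operator overloading builds a DAG, and how a `tag` annotation can decide which intermediate values become named variables in the generated C.

```python
class Node:
    """Hypothetical DAG node: an operation, its operands and an optional tag."""

    def __init__(self, op, *args, tag=None):
        self.op, self.args, self.tag = op, args, tag

    def __mul__(self, other):
        return Node("*", self, wrap(other))

    def __sub__(self, other):
        return Node("-", self, wrap(other))


def wrap(v):
    """Turn a Python number into a constant node; leave nodes untouched."""
    return v if isinstance(v, Node) else Node("const", v)


def emit(node, lines, names):
    """Post-order walk: one C statement per tagged node, the rest is inlined."""
    if id(node) in names:  # shared sub-DAG: reuse what was already emitted
        return names[id(node)]
    if node.op == "var":
        expr = node.args[0]
    elif node.op == "const":
        expr = repr(node.args[0])
    else:
        lhs, rhs = (emit(a, lines, names) for a in node.args)
        expr = f"({lhs} {node.op} {rhs})"
    if node.tag is not None:
        lines.append(f"double {node.tag} = {expr};")
        expr = node.tag
    names[id(node)] = expr
    return expr


x = Node("var", "x")
k = Node("var", "k")
r = x - k * 0.6931471805599453
r.tag = "r"          # annotation: give this node a readable name
result = r * r       # r is shared, so this is a DAG and not a tree

lines = []
lines.append(f"return {emit(result, lines, {})};")
print("\n".join(lines))
```

Because `r` is tagged and shared, it is emitted once as a named variable and then reused, which is one way annotations can make generated code easier to read.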


slide-34
SLIDE 34

The annotation system

Metalibm uses an elaborate system of annotations
to facilitate generated code reading
to optimize DAG
to enforce numerical constraints

Some examples of annotations: tag, debug, precision, likely, exact, interval



slide-43
SLIDE 43

Backend processing and code generation

A backend manipulates the intermediate representation:
instantiating every undetermined format
introducing necessary conversions
optimizations at the level of abstract operations
dynamic support library expansion
pre-vectorization processing

Then code generation is performed:
generated from fully type-instantiated description
constants, tables and core code generation
several targets are available
support for processor-specific code generation


slide-44
SLIDE 44

Metalibm performance results

Description    arch.       CPE
VCR log        SSE3        35.34
VCR log        AVX         21.81
VCR log        AVX2        17.98
VCR log        Xeon-Phi    45.03
VCR exp        SSE3        29.98
VCR exp        AVX         20.99
VCR exp        Xeon-Phi    63.1


slide-45
SLIDE 45

Conclusion


slide-46
SLIDE 46

All this is work in progress

Productivity already boosted

... for the experienced developer

Skeleton approach doesn’t contradict back-end automation

but does it really improve productivity?
yes in the Intel context: one new proc/year
... but then we need parameter space exploration

We are going to argue that Python is a good choice

TODOs:

Time to merge in a single code base? See with LIP6.
Gappa generation as a first-class concern


slide-47
SLIDE 47

A Sollya interface for Python

Why?

Python as a scripting language should be good enough
Focus Sollya development on its core functionalities:
  faithful evaluation of arbitrary expressions
  certified supremum norm
  machine-efficient polynomial approximation
Integrate Sollya in Sage (SoSage?)

What?

A Python module that adds the type SollyaObject to Python
Autogenerated wrappers for most Sollya functions

Still TODO

Better typed interfaces, moving away from Sollya’s
PythonSollya should be separated from the metalibm project.
