Preliminary work in Lyon. Florent de Dinechin, Nicolas Brunie
Introduction
Introduction First experiment: FloPoCo-like Second experiment: rewriting rules High-level back-end? ML’s flow: backend and code generation Conclusion
Florent de Dinechin, Socrate team (ex-AriC (ex-Arénaire)). The Metalibm Project
The big picture
[Figure: positioning of metalibm/C11 and metalibm/Open. Labels: code (CR C11, degraded C11, non-standard); programmer (specialist, libm dev, sci dev); automation (assisted, fully automatic); high performance; high genericity.]
First experiment: FloPoCo-like
Overview
Bottom-up philosophy
- Start with working C code
- Embed it in printf()
- Introduce genericity and define helper functions in an ad-hoc way
Pros and cons
- Guaranteed success AND performance
- Limited genericity
- Very limited abstraction (e.g. for formal proof?)
Results
After developing exp, log and trig-of-pi (sinpi, cospi, sincospi)
Genericity
- precision (single or double, faithful or degraded)
- processor (portable, Kalray)
- performance (Horner/Estrin, vector/scalar)
Shared code
- polynomial approximation, of course
- float-to-int conversions
- testbench generation (see the demo)
Some of the generated code is better than libm for some Kalray applications.
Now go see the code in the private svn, directory ProofOfConcept.
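The Horner/Estrin choice listed under Genericity can be illustrated with a minimal Python sketch. It is not taken from the project's code: the function names and the Taylor coefficients are assumptions chosen for illustration. Horner is a sequential chain of multiply-adds, while Estrin combines coefficients pairwise so that independent operations can run in parallel.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule (sequential chain)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme (pairwise combination).

    Assumes a non-empty coefficient list.
    """
    terms = list(coeffs)
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad so that terms can be paired
        # Each pair (a, b) becomes a + b * power; the pairs are independent.
        terms = [terms[i] + terms[i + 1] * power for i in range(0, len(terms), 2)]
        power = power * power
    return terms[0]


# Degree-5 Taylor coefficients of exp, used only as illustrative data.
TAYLOR_EXP = [1.0, 1.0, 1 / 2, 1 / 6, 1 / 24, 1 / 120]
```

Both functions compute the same polynomial, but their rounding errors and instruction-level parallelism differ, which is presumably why a generator would want to expose the choice as a parameter.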
The CFunction class
Main attributes
- basename (string)
- io_format (Format)
- input_list
- output_list
- processor (Processor class)
- accuracy (int)
- correct_rounding (boolean)
- manage_subnormals (boolean)
- vectorize (boolean)
- eval_Estrin (boolean)
Main methods
- gen_code(), gen_header(), gen_declaration()
- gen_emulation_code()
- gen_test_program(), gen_exhaustive_test_program()
The Processor class
... provides code generation services:
- methods with a failsafe, portable default
- actual processor classes inherit them and may override them (with whatever intrinsics etc.)
- so the same source is indeed optimized for a range of processors
Current examples:
- possible fma
- true fma
- variants of float to int (using magic constants, using nearbyint, using intrinsics)
TODO: capture higher-level capabilities, such as SIMD capabilities.
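The inheritance pattern described above could look like the following sketch. All names here (the method names, FmaProcessor, gen_horner_step) are hypothetical and are not the project's actual API; the point is only that generator code calls a service, and each target decides how to emit it.

```python
class Processor:
    """Portable target: every code generation service has a failsafe default."""

    def gen_fma(self, res, a, b, c):
        # Portable fallback: a plain multiply and add in C.
        return f"{res} = {a} * {b} + {c};"

    def gen_float_to_int(self, res, x):
        # Portable fallback: C99 nearbyintf() from <math.h>.
        return f"{res} = (int32_t) nearbyintf({x});"


class FmaProcessor(Processor):
    """Hypothetical target with a fused multiply-add: overrides one service only."""

    def gen_fma(self, res, a, b, c):
        return f"{res} = fmaf({a}, {b}, {c});"


def gen_horner_step(processor, acc, x, coeff):
    """Target-independent generator code: it asks the processor for an FMA."""
    return processor.gen_fma(acc, acc, x, coeff)
```

With this layout, `gen_horner_step(Processor(), "p", "y", "c3")` and `gen_horner_step(FmaProcessor(), "p", "y", "c3")` share the same generator source but emit different C statements.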
Second experiment: rewriting rules
Rewriting step library
[Figure: architecture diagram. Boxes: Gappa, MPFR, Sollya; rewriting steps library; core library. The rewriting steps library holds steps such as exponential_first_rr_fp(...) {...}, cody_waite_2(...) {...} and poly_horner_fp(...) {...}. The exponential code generator chains them (exponential_first_rr_fp(...); if(...) poly_horner_fp(...) {...} else ...) and yields the exp variants; the logarithm code generator likewise yields the log variants.]
In practice
The problem
evaluate $e^x$ faithfully to a double, for x a double
First step: invent a range reduction
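Here is its ideal mathematical description:

$$k = \left\lfloor x \times \frac{1}{\ln(2)} \right\rceil \quad\text{and}\quad y = x - k \times \ln(2)$$

$$\Longrightarrow\quad k \in \mathbb{Z} \quad\text{and}\quad y \in \left[-\frac{\ln(2)}{2}, \frac{\ln(2)}{2}\right] \quad\text{and}\quad e^x = 2^k \times e^y \qquad (1)$$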
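As a sanity check of the ideal description, here is a small Python sketch that evaluates it directly in double precision. The function names are illustrative assumptions, and the code ignores all the rounding issues that the next step addresses.

```python
import math


def reduce_argument(x):
    """Ideal range reduction, evaluated naively in double precision."""
    k = round(x / math.log(2))   # nearest integer to x / ln(2)
    y = x - k * math.log(2)      # reduced argument, |y| <= ln(2)/2 up to rounding
    return k, y


def exp_via_reduction(x):
    """Reconstruct exp(x) as 2**k * exp(y)."""
    k, y = reduce_argument(x)
    return math.ldexp(math.exp(y), k)
```

For moderate inputs such as x = 10.0 the reconstruction agrees with math.exp(x) to within a few units in the last place; a real implementation replaces math.exp(y) with a polynomial and must bound each error term explicitly.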
Second step: refine to a machine-implementable version
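$$k = \left\lfloor x \times \frac{1}{\ln(2)} \right\rceil \qquad (2)$$

$$k_1 = \left\lfloor x \times \frac{1}{\ln(2)} + \delta_k \right\rceil, \quad \delta_k \in I_{\delta_k}, \quad k_1 - k \in I_k \qquad (3)$$

$$y_1 = x - k_1 \times \ln(2) + \delta_y, \quad y \in I_y, \quad \delta_y \in I_{\delta_y} \qquad (4)$$

$$p_1 = e^{y_1} + \delta_p, \quad \delta_p \in I_{\delta_p} \qquad (5)$$

$$r = 2^{k_1} \times p_1 \qquad (6)$$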
Can this two-step derivation be found by a program?
I don’t think so. So I consider (2) to (6) as the starting point of a metaexp.
Meta-skeleton for the exp
(I_δk, I_k) = genCodeForComputingK(formatX, ...)                        (7)
(I_δy, I_y) = genCodeForComputingY(I_δk, I_k, ...)                      (8)
(I_δp) = genCodeForPolyApprox("exp(x)", targetPrecision, I_y, ...)      (9)
(I_δr) = genCodeForReconstruction(I_k, ...)                             (10)
Actual metaexp skeleton
def gen_code(self):
    # Build the code
    self.gen_code_for_k("x")
    self.gen_code_for_y()
    self.gen_code_for_poly()
    self.gen_code_for_reconstruction()
    self.gen_code_for_exceptions()

All the previous variables have become global class attributes:
- more readable
- but dependencies lost
def gen_code_for_k(self, X):
    self.code.declare("k", int32)
    self.code.declare("kf", self.fp_format)
    c = askSollya("1/log(2)")
    roundedc = round(c, self.fp_format.precision, RN)
    self.code.declare_const("invLog2", self.fp_format, roundedc)
    self.code.declare("nrK", self.fp_format)
    self.code << "nrK" + " = " + "invLog2 * " + X + "; /* not rounded K */\n"
    # remaining arguments are cut off on the slide
    self.processor.genCodeForFloatToInt("k", "kf", "nrK", self.fp_format, ...)
    # Error computation -- at some point to be delegated to Gappa
    # Error of storing roundedc and not 1/log(2)
    delta1 = round(c - roundedc, 24, RU)  # minor TODO: double rounding here
    # Error of the floating point multiplication by roundedc
    maxdelta2 = abs(self.fp_format.u * c)
    # the rounding-mode argument of the next two calls is cut off on the slide
    I_inf = round((-maxdelta2 + delta1) * self.max_value_for_finite_output, 24, ...)
    I_sup = round((maxdelta2 + delta1) * self.max_value_for_finite_output, 24, ...)
    self.I_deltak = (I_inf, I_sup)
    if (self.I_deltak[0] <= -1) or (self.I_deltak[1] >= 1):
        raise Exception('I_deltak too large to ensure I_k is {-1,0,1}')

More comments in the actual metalibm/metaexp.py.
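To get a feel for the magnitudes involved, the error computation above can be replayed with plain Python for the single-precision case. This is a sketch under stated assumptions: the double-precision value of 1/log(2) stands in for Sollya's exact value, the unit roundoff u is taken as 2^-24, the directed roundings to 24 bits are omitted, and the bound 89.0 on |x| (slightly above the logarithm of the largest finite binary32 number, about 88.72) is chosen for illustration.

```python
import math
import struct
from fractions import Fraction


def to_float32(v):
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def delta_k_interval(max_abs_x):
    """Return (rounded_c, I_inf, I_sup) following the slide's error model."""
    c = 1 / math.log(2)                          # stand-in for Sollya's 1/log(2)
    rounded_c = to_float32(c)                    # constant stored in generated code
    delta1 = Fraction(c) - Fraction(rounded_c)   # error of storing rounded_c
    u = Fraction(1, 2 ** 24)                     # assumed binary32 unit roundoff
    max_delta2 = u * Fraction(c)                 # error of the multiplication
    i_inf = (-max_delta2 + delta1) * Fraction(max_abs_x)
    i_sup = (max_delta2 + delta1) * Fraction(max_abs_x)
    return rounded_c, float(i_inf), float(i_sup)
```

Under these assumptions the interval stays several orders of magnitude inside (-1, 1), so the check that guards the exception passes comfortably for single precision.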
All this to generate this
float invLog2 = 0x1.715476p0f;
float rnd_cst = 12582912.f;
float nrK;
float nrKrounded;
float kf;
int32_t k;
nrK = invLog2 * x; /* not rounded K */
/* float rounded to an int using the magic constant */
nrKrounded = (nrK + rnd_cst) - rnd_cst; /* this rounds to the nearest int */
kf = nrKrounded; /* floating-point rounded result */
k = nrKrounded; /* this float to int conversion is a truncation */
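The magic-constant trick in the generated code can be emulated in Python to see why it rounds to the nearest integer. The sketch below assumes round-to-nearest-even binary32 arithmetic and inputs of magnitude below 2^22; the helper names are illustrative. Adding 1.5 * 2^23 pushes the value into a binade where the spacing between binary32 numbers is exactly 1, so the addition itself performs the rounding.

```python
import struct


def f32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


RND_CST = 12582912.0  # 1.5 * 2**23, exactly representable in binary32


def magic_round(x):
    """Emulate nrKrounded = (nrK + rnd_cst) - rnd_cst in binary32 arithmetic."""
    s = f32(f32(x) + RND_CST)   # spacing is 1 here, so the sum is an integer
    return f32(s - RND_CST)     # this subtraction is exact
```

Ties go to the even integer, as with nearbyint in the default rounding mode: magic_round(2.5) gives 2.0 and magic_round(3.5) gives 4.0.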
But performance is OK (test it yourself in the svn)
My laptop: Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz
My desktop: Intel(R) Xeon(TM) CPU E5-1620 0 @ 3.60GHz
Both running XUbuntu 12.10 with gcc 4.7.2

                Core2 U9600    Xeon E5-1620
stock expf          193             45
expf Horner          87             24
expf Estrin          77             27
stock exp           108             60
exp Horner          130             28
exp Estrin           89             36

Disclaimers:
- timings using rdtsc(), usual caveats apply.
- inlining switched on for our code, not for the stock function.
High-level back-end?
New Metalibm philosophy
New Metalibm features:
- function DAG representation
- abstract target description
- disconnect description/optimization from code/proof generation
Generate implementations according to a standardized flow:
- description of function implementation DAG
- first round of optimizations
- source code generation (+ optimization)
- proof generation
- validation
Example of DAG description

vx = VARIABLE("x", precision = fpformat)
# reduced argument
red_x = NearestInt(vx / log(2), precision = int32, tag = "red_x")
# HIGH and LOW part log(2) generation
log2_hi = round(log(2), fpformat.sollya_name - 10, RN)
log2_lo = round(log(2) - log2_hi, fpformat.sollya_name, RN)
r = (vx - (red_x * log2_hi)) - red_x * log2_lo
r.set_attributes(tag = "r", exact = True)
red_int = Interval(-log(2)/2, log(2)/2)
poly = Polynomial.generate_fpminimax(exp(x), 5, red_int, [ML_Binary64]*6, absolute)
poly_scheme = PolySchemeGenerator.generate_horner(poly, r)
result = Return(ExponentInsertion(red_x) * poly_scheme)
backend_scheme = Backend(processor).backend_float(result, ML_Binary64)
source_code = CodeGenerator(processor).generate_expr(backend_scheme)
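The high/low split of log(2) in this description is the classical Cody and Waite technique. The following standalone Python sketch shows why it helps, assuming that the intent of the slide is a high part with 10 fewer bits than the format precision (43 bits for binary64). It uses the decimal module as a high-precision reference; all names are illustrative and unrelated to the Metalibm API.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50
LN2 = Decimal(2).ln()                      # ln(2) to 50 significant digits


def round_to_bits(v, bits):
    """Round a positive float to `bits` significant bits."""
    m, e = math.frexp(v)                   # v = m * 2**e with 0.5 <= m < 1
    return math.ldexp(float(round(m * 2 ** bits)), e - bits)


# 43-bit high part: k * LOG2_HI is exact in binary64 whenever |k| < 2**10.
LOG2_HI = round_to_bits(float(LN2), 53 - 10)
# Low part: what LOG2_HI misses, computed from the high-precision reference.
LOG2_LO = float(LN2 - Decimal(LOG2_HI))


def reduce_cody_waite(x):
    """Return (k, r) with r close to x - k * ln(2), using the two-part constant."""
    k = round(x / float(LN2))
    r = (x - k * LOG2_HI) - k * LOG2_LO
    return k, r
```

For inputs in the range where exp is finite, the first subtraction is exact and the only significant rounding is the last one, so r is accurate to roughly one rounding of the result, which is much better than subtracting k times a single rounded constant.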
The annotation system
Metalibm uses an elaborate system of annotations
- to facilitate generated code reading
- to optimize the DAG
- to enforce numerical constraints
Some examples of annotations:
- tag, debug
- precision
- likely
- exact, interval
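To illustrate how a tag annotation can make generated code easier to read, here is a toy DAG and code emitter in Python. It is a hypothetical sketch: the Node class, the generate_c function and the restriction to binary operators on doubles are assumptions for illustration and do not reflect the actual Metalibm classes.

```python
import itertools


class Node:
    """Hypothetical DAG node carrying free-form annotations."""

    def __init__(self, op, *inputs, **annotations):
        self.op = op                    # a binary C operator such as "*" or "-"
        self.inputs = inputs            # Node instances or C variable names
        self.annotations = annotations  # e.g. tag="r", exact=True


def generate_c(root):
    """Emit one C statement per operation node, naming results after their tag."""
    counter = itertools.count()
    names = {}
    lines = []

    def visit(node):
        if isinstance(node, str):       # leaf: an existing C variable name
            return node
        if id(node) in names:           # shared sub-DAG: emit it only once
            return names[id(node)]
        args = [visit(i) for i in node.inputs]
        name = node.annotations.get("tag") or f"tmp{next(counter)}"
        lines.append(f"double {name} = {args[0]} {node.op} {args[1]};")
        names[id(node)] = name
        return name

    visit(root)
    return lines
```

For example, `generate_c(Node("-", "x", Node("*", "k", "log2_hi"), tag="r"))` names the tagged result `r` while the untagged product gets an automatic temporary name, so the important intermediate values remain recognizable in the generated source.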
Backend processing and code generation
A backend manipulates the intermediate representation:
- instantiating every undetermined format
- introducing necessary conversions
- optimizations at the level of abstract operations
- dynamic support library expansion
- pre-vectorization processing
Then code generation is performed:
- generated from fully type-instantiated description
- constants, tables and core code generation
- several targets are available
- support for processor-specific code generation
Metalibm performance result
Description    arch.       CPE
VCR log        SSE3        35.34
VCR log        AVX         21.81
VCR log        AVX2        17.98
VCR log        Xeon-Phi    45.03
VCR exp        SSE3        29.98
VCR exp        AVX         20.99
VCR exp        Xeon-Phi    63.1
Conclusion
All this is work in progress
Productivity already boosted
... for the experienced developer
Skeleton approach doesn’t contradict back-end automation
- but does it really improve productivity?
- yes in the Intel context: one new proc/year
- ... but then we need parameter space exploration
We are going to argue that Python is a good choice
TODOs:
- Time to merge in a single code base? See with LIP6.
- Gappa generation as a first-class concern
A Sollya interface for Python
Why?
- Python as a scripting language should be good enough
- Focus Sollya development on its core functionalities
  - faithful evaluation of arbitrary expression
  - certified supremum norm
  - machine-efficient polynomial approximation
- Integrate Sollya in Sage (SoSage?)
What?
- A Python module that adds the type SollyaObject to Python
- Autogenerated wrappers for most Sollya functions
Still TODO
- Better typed interfaces, moving away from Sollya's
- PythonSollya should be separated from the metalibm project.