Preliminary work in Lyon
Florent de Dinechin, Nicolas Brunie

Introduction

- Introduction
- First experiment: FloPoCo-like
- Second experiment: rewriting rules
- High-level back-end?
- ML's flow: backend and code generation
- Conclusion

Florent de Dinechin, Socrate team (ex-AriC (ex-Arénaire)), The Metalibm Project

The big picture

[Figure: positioning diagram. Labels: metalibm/C11, metalibm/Open; degraded C11, CR C11, non-standard code; programmer, specialist, libm dev, sci dev; fully automatic, assisted automation; high performance, high genericity.]

First experiment: FloPoCo-like

Overview

Bottom-up philosophy
- Start with working C code.
- Embed it in printf().
- Introduce genericity and define helper functions in an ad-hoc way.

Pros and cons
- Guaranteed success AND performance
- Limited genericity
- Very limited abstraction (e.g. for formal proof?)

Results

After developing exp, log and trig-of-pi (sinpi, cospi, sincospi):

Genericity
- precision (single or double, faithful or degraded)
- processor (portable, Kalray)
- performance (Horner/Estrin, vector/scalar)

Shared code
- polynomial approximation, of course
- float-to-int conversions
- testbench generation (see the demo)

Some of the generated code is better than libm for some Kalray applications.

Now go see the code in the private svn, directory ProofOfConcept.
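The Horner/Estrin choice listed under "performance" can be illustrated with a small sketch. This is not code from the ProofOfConcept directory: the function names `horner` and `estrin` and the degree-5 Taylor coefficients are assumptions chosen for illustration. Horner's scheme is one sequential chain of multiply-adds, whereas Estrin's scheme pairs terms so that the combinations within one level are independent of each other.

```python
import math

def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's scheme (sequential chain)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme.

    Adjacent coefficients are combined as (c0 + c1*x), (c2 + c3*x), ...;
    the results are then combined using x**2, x**4, and so on.
    """
    level = list(coeffs)
    power = x
    while len(level) > 1:
        if len(level) % 2:              # pad to an even number of terms
            level.append(0.0)
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power = power * power
    return level[0]

# Illustrative coefficients (assumption): degree-5 Taylor expansion of exp at 0.
coeffs = [1.0 / math.factorial(i) for i in range(6)]
x = 0.1
print(horner(coeffs, x), estrin(coeffs, x), math.exp(x))
```

Both functions compute the same polynomial; they may differ in the last bits because the floating-point operations are performed in a different order.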

The CFunction class

Main attributes
- basename (string)
- io_format (Format)
- input_list
- output_list
- processor (Processor class)
- accuracy (int)
- correct_rounding (boolean)
- manage_subnormals (boolean)
- vectorize (boolean)
- eval_Estrin (boolean)

Main methods
- gen_code(), gen_header(), gen_declaration()
- gen_emulation_code()
- gen_test_program(), gen_exhaustive_test_program()

The Processor class

... provides code generation services.
- methods with a failsafe, portable default
- actual processor classes inherit them and may overload them (with whatever intrinsics etc.)
- so the same source is indeed optimized for a range of processors

Current examples:
- possible fma
- true fma
- variants of float to int (using magic constants, using nearbyint, using intrinsics)

TODO: capture higher-level capabilities, such as SIMD capabilities.
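The inherit-and-overload idea can be sketched as follows. All names here (`Processor`, `FmaProcessor`, `gen_fma`, `gen_float_to_int`, `gen_poly_step`) are hypothetical and are not taken from the actual metalibm sources; the sketch only shows a portable default being replaced by a target-specific variant while the calling generator stays unchanged.

```python
class Processor:
    """Hypothetical base class: portable, failsafe code generation."""

    def gen_fma(self, result, a, b, c):
        # Portable default: plain multiply and add, no fused operation assumed.
        return f"{result} = {a} * {b} + {c};\n"

    def gen_float_to_int(self, k, kf, x):
        # Portable default relying on the C99 function nearbyintf().
        return f"{kf} = nearbyintf({x});\n{k} = (int32_t) {kf};\n"


class FmaProcessor(Processor):
    """Hypothetical target with a hardware fused multiply-add."""

    def gen_fma(self, result, a, b, c):
        # Overload: emit a call to the C99 function fmaf().
        return f"{result} = fmaf({a}, {b}, {c});\n"


def gen_poly_step(processor):
    """Generator code is written once; the processor picks the variant."""
    return processor.gen_fma("p", "p", "y", "c3")


print(gen_poly_step(Processor()), end="")
print(gen_poly_step(FmaProcessor()), end="")
```

The generator function never tests which processor it received; adding a new target only requires a new subclass.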

Second experiment: rewriting rules

Rewriting step library

[Figure: block diagram. Labels: core library (Sollya, MPFR, Gappa); rewriting steps library (exponential_first_rr_fp(...), cody_waite_2(...), poly_horner_fp(...)); logarithm code generator and exponential code generator (if(...) exponential_first_rr_fp(...); ... else ... poly_horner_fp(...) ...); outputs: log variants, exp variants.]

In practice

The problem: evaluate $e^x$ faithfully to a double, for $x$ a double.

First step: invent a range reduction. Here is its ideal mathematical description:

$$
\left\{
\begin{array}{l}
k = \left\lfloor x \times \dfrac{1}{\ln(2)} \right\rceil \\
\text{and} \\
y = x - k \times \ln(2)
\end{array}
\right.
\quad\Rightarrow\quad
\left\{
\begin{array}{l}
k \in \mathbb{Z} \\
\text{and} \\
y \in \left[ -\dfrac{\ln(2)}{2},\ \dfrac{\ln(2)}{2} \right] \\
\text{and} \\
e^x = 2^k \times e^y
\end{array}
\right.
\tag{1}
$$
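The properties stated in (1) can be checked numerically. The sketch below evaluates the ideal range reduction in double precision, which is only an approximation of the exact mathematical description; the function name `reduce_range` and the sample inputs are illustrative assumptions.

```python
import math

def reduce_range(x):
    """Range reduction of equation (1), evaluated in double precision."""
    k = round(x / math.log(2))      # nearest integer to x / ln(2)
    y = x - k * math.log(2)         # reduced argument
    return k, y

for x in (0.3, 1.0, 10.0, -7.5):
    k, y = reduce_range(x)
    # math.ldexp(m, k) computes m * 2**k, so this is 2**k * e**y.
    print(x, k, y, math.ldexp(math.exp(y), k), math.exp(x))
```

For each input, the last two printed values agree up to rounding errors, and `y` stays within about ln(2)/2 of zero.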

Second step: refine to a machine-implementable version

$$ k = \left\lfloor x \times \frac{1}{\ln(2)} \right\rceil \tag{2} $$

$$ k_1 = \left\lfloor x \times \frac{1}{\ln(2)} + \delta_k \right\rceil, \quad \delta_k \in I_{\delta_k}, \quad k_1 - k \in I_k \tag{3} $$

$$ y_1 = x - k_1 \times \ln(2) + \delta_y, \quad y \in I_y, \quad \delta_y \in I_{\delta_y} \tag{4} $$

$$ p_1 = e^{y_1} + \delta_p, \quad \delta_p \in I_{\delta_p} \tag{5} $$

$$ r = 2^{k_1} \times p_1 \tag{6} $$

Can this two-step derivation be found by a program? I don't think so.

So I consider (2) to (6) as the starting point of a metaexp.

Meta-skeleton for the exp

$$ (I_{\delta_k}, I_k) = \text{genCodeForComputingK}(\text{formatX}, \ldots) \tag{7} $$

$$ (I_{\delta_y}, I_y) = \text{genCodeForComputingY}(I_{\delta_k}, I_k, \ldots) \tag{8} $$

$$ (I_{\delta_p}) = \text{genCodeForPolyApprox}(\text{"exp(x)"}, \text{targetPrecision}, I_y, \ldots) \tag{9} $$

$$ (I_{\delta_r}) = \text{genCodeForReconstruction}(I_k, \ldots) \tag{10} $$

Actual metaexp skeleton

    def gen_code(self):
        # Build the code
        self.gen_code_for_k("x")
        self.gen_code_for_y()
        self.gen_code_for_poly()
        self.gen_code_for_reconstruction()
        self.gen_code_for_exceptions()

All the previous variables have become global class attributes:
- more readable
- but dependencies lost

    def gen_code_for_k(self, X):
        self.code.declare("k", int32)
        self.code.declare("kf", self.fp_format)
        c = askSollya("1/log(2)")
        roundedc = round(c, self.fp_format.precision, RN)
        self.code.declare_const("invLog2", self.fp_format, roundedc)
        self.code.declare("nrK", self.fp_format)
        self.code << "nrK" + " = " + "invLog2 * " + X + "; /* not rounded K */\n"
        self.processor.genCodeForFloatToInt("k", "kf", "nrK", self.fp_format, ...

        # Error computation -- at some point to be delegated to Gappa
        # Error of storing roundedc and not 1/log(2)
        delta1 = round(c - roundedc, 24, RU)  # minor TODO: double rounding here
        # Error of the floating-point multiplication by roundedc
        maxdelta2 = abs(self.fp_format.u * c)
        I_inf = round((-maxdelta2 + delta1) * self.max_value_for_finite_output, 24, ...
        I_sup = round((maxdelta2 + delta1) * self.max_value_for_finite_output, 24, ...
        self.I_deltak = (I_inf, I_sup)
        if (self.I_deltak[0] <= -1) or (self.I_deltak[1] >= 1):
            raise Exception('I_deltak too large to ensure I_k is {-1,0,1}')

More comments in the actual metalibm/metaexp.py.
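The error computation in gen_code_for_k can be reproduced numerically for single precision. The sketch below makes several simplifying assumptions that are not in the slides: a double-precision value stands in for the exact 1/ln(2) that Sollya would provide, the directed roundings (RU etc.) are ignored, and the bound on the input is taken to be ln(FLT_MAX) since the value of max_value_for_finite_output is not given. The helper `to_float32` is a hypothetical name.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

c = 1.0 / math.log(2)           # binary64 stand-in for the exact 1/ln(2)
rounded_c = to_float32(c)       # the constant stored in the generated code
u = 2.0 ** -24                  # unit roundoff of binary32

delta1 = c - rounded_c          # error of storing rounded_c instead of c
max_delta2 = abs(u * c)         # bound for the rounding error of the multiplication

# Assumption: largest input magnitude of interest is ln(FLT_MAX).
max_x = math.log(3.4028234663852886e38)

i_deltak = ((-max_delta2 + delta1) * max_x,
            (max_delta2 + delta1) * max_x)
print(rounded_c.hex(), i_deltak)

if i_deltak[0] <= -1 or i_deltak[1] >= 1:
    raise ValueError("I_deltak too large to ensure I_k is {-1, 0, 1}")
```

Under these assumptions the interval is several orders of magnitude narrower than (-1, 1), so the check passes comfortably in single precision.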

All this to generate this

    float invLog2 = 0x1.715476p0f;
    float rnd_cst = 12582912.f;
    float nrK;
    float nrKrounded;
    float kf;
    int32_t k;
    nrK = invLog2 * x; /* not rounded K */
    /* float rounded to an int using the magic constant */
    nrKrounded = (nrK + rnd_cst) - rnd_cst; /* this rounds to the nearest int */
    kf = nrKrounded; /* floating-point rounded result */
    k = nrKrounded; /* this float to int conversion is a truncation */
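The magic-constant trick in the generated code can be tried out in Python. The constant 12582912 equals 1.5 × 2^23; single-precision numbers of that magnitude have a spacing of 1, so adding it rounds away the fractional part. The sketch emulates each single-precision operation by computing in double precision and rounding the result to binary32, which is an emulation choice made here and not part of the generated code. The names `to_float32` and `round_with_magic_constant` are hypothetical, and the trick is only valid for inputs of magnitude below 2^22.

```python
import struct

def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

RND_CST = 12582912.0    # 1.5 * 2**23, as in the generated code

def round_with_magic_constant(x):
    """Emulate `(x + rnd_cst) - rnd_cst` in binary32 arithmetic."""
    x = to_float32(x)
    shifted = to_float32(x + RND_CST)    # spacing is 1 here: fraction rounded away
    return to_float32(shifted - RND_CST)

for v in (2.5, 3.7, -1.2, 100.49):
    print(v, round_with_magic_constant(v))
```

Ties are rounded to the nearest even integer, so 2.5 gives 2.0, matching the default IEEE 754 rounding mode.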

But performance is OK (test yourself in the svn)

My laptop: Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz
My desktop: Intel(R) Xeon(TM) CPU E5-1620 0 @ 3.60GHz
Both running XUbuntu 12.10 with gcc 4.7.2

                 Core2 U9600    Xeon E5-1620
    stock expf       193             45
    expf Horner       87             24
    expf Estrin       77             27
    stock exp        108             60
    exp Horner       130             28
    exp Estrin        89             36

Disclaimers:
- timings using rdtsc(), usual caveats apply.
- inlining switched on for our code, not for the stock function.

High-level back-end?

New Metalibm philosophy

New Metalibm features:
- function DAG representation
- abstract target description
- disconnect description/optimization from code/proof generation

Generate implementations according to a standardized flow:
- description of function implementation DAG
- first round of optimizations
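To make the idea of separating the description from code generation concrete, here is a minimal sketch. It is not the new Metalibm's actual representation: the node classes `Var`, `Const`, `Add`, `Mul` and the back-end function `emit_c` are hypothetical names invented for this illustration. The function is first described as an expression graph (the node `x` is shared by several parents), and a separate function turns the graph into C source text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

def emit_c(node):
    """Hypothetical back end: turn an expression node into a C expression."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return repr(node.value)
    if isinstance(node, Add):
        return f"({emit_c(node.left)} + {emit_c(node.right)})"
    if isinstance(node, Mul):
        return f"({emit_c(node.left)} * {emit_c(node.right)})"
    raise TypeError(f"unknown node: {node!r}")

# Description step: 1 + x * (1 + x * 0.5), with the node x shared.
x = Var("x")
poly = Add(Const(1.0), Mul(x, Add(Const(1.0), Mul(x, Const(0.5)))))

# Code generation step, independent of how the graph was built.
print(emit_c(poly))
```

Because the description is plain data, optimization passes or a proof-script generator could traverse the same graph without touching the code that built it.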
