Automatic Synthesis of Fast and Certified Code for Polynomial Evaluation

ANR MetaLibm kick-off meeting
Lyon, 22 January 2014

Automatic Synthesis of Fast and Certified Code for Polynomial Evaluation
through the example of the CGPE tool

Guillaume Revy
Équipe-projet DALI, Univ. Perpignan Via Domitia
LIRMM, CNRS: UMR 5506 - Univ. Montpellier 2

G. Revy (DALI UPVD/LIRMM, CNRS, UM2), Automatic Synthesis of Fast and Certified Code for Polynomial Evaluation

Context of CGPE

This work mainly takes place in the context of the development of FLIP
◮ software support for binary32 floating-point arithmetic on integer processors

In this talk, we will focus on polynomial evaluation
◮ it frequently appears as a building block in the implementation of mathematical operators, typically in FLIP

Current challenge: tools and methodologies for the automatic synthesis of fast and certified programs
◮ optimized for a given format and for the target architecture

On the one side: the IEEE 754-2008 standard, ...

Definition of IEEE floating-point arithmetic
◮ floating-point formats: single precision, double precision, ...
◮ special values: ±0, ±∞, NaN
◮ 4 rounding modes: to nearest even, upward, downward, and toward zero
◮ mathematical function behavior
   → special inputs (e.g. √(−0) = −0)
   → requires / recommends correct rounding

Motivation:
◮ make computations reproducible
◮ and make results architecture-independent

... on the other side: the ST231 processor

[Figure: block diagram of the ST231 core]

4-issue VLIW 32-bit integer processor
   → no FPU

Parallel execution unit
◮ 4 integer ALUs
◮ 2 pipelined multipliers 32 × 32 → 32

Latencies: ALU = 1 cycle / Mul = 3 cycles

VLIW (Very Long Instruction Word)
◮ instructions grouped into bundles
◮ Instruction-Level Parallelism (ILP) explicitly exposed by the compiler

Our objective

Compute fast and certified schemes for evaluating a polynomial, such as P(x, y) = α + y · a(x)
◮ using only additions and multiplications
◮ reducing the evaluation latency on unbounded parallelism

Evaluation program = main part of the full software implementation
◮ dominates the cost
◮ make it as fast as possible

Two families of algorithms
◮ algorithms with coefficient adaptation: Knuth and Eve (1964), Paterson and Stockmeyer (1973), ...
   → ill-suited in the context of fixed-point arithmetic
◮ algorithms without coefficient adaptation

Remarks on polynomial evaluation

There are several other schemes for evaluating a polynomial a(x)
◮ can be adapted for the bivariate polynomial P(x, y) = α + y · a(x)

Constant number of +, while the number of × is non-constant
◮ reducing the latency ⇔ increasing the number of × to expose ILP
◮ trade-off latency / number of multiplications

Evaluation error
◮ different theoretical error bounds
◮ differences in numerical quality in practice

→ We need a tool for exploring the space of evaluation schemes.
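The latency / multiplication trade-off can be illustrated with a small sketch. It assumes the latencies quoted on the ST231 slide (ALU = 1 cycle, Mul = 3 cycles) and unbounded parallelism, and it compares two classical schemes for a degree-3 polynomial, Horner and Estrin. These two schemes are chosen here for illustration only, and the tuple-based expression encoding and the helper names `latency` and `count_ops` are hypothetical, not part of CGPE.

```python
# Latencies taken from the ST231 slide: addition = 1 cycle, multiplication = 3 cycles.
ADD_LATENCY = 1
MUL_LATENCY = 3


def latency(expr):
    """Cycles until expr is available, assuming unbounded parallelism.

    An expression is either a leaf name (str) or a tuple (op, left, right)
    with op in {"+", "*"}. Both operands are evaluated in parallel.
    """
    if isinstance(expr, str):
        return 0  # inputs and coefficients are available at cycle 0
    op, left, right = expr
    cost = ADD_LATENCY if op == "+" else MUL_LATENCY
    return max(latency(left), latency(right)) + cost


def count_ops(expr, op):
    """Number of nodes labelled op in the expression tree (no sharing assumed)."""
    if isinstance(expr, str):
        return 0
    head, left, right = expr
    return int(head == op) + count_ops(left, op) + count_ops(right, op)


# Horner: a0 + x*(a1 + x*(a2 + x*a3))
horner = ("+", "a0", ("*", "x", ("+", "a1", ("*", "x", ("+", "a2", ("*", "x", "a3"))))))

# Estrin: (a0 + a1*x) + (x*x)*(a2 + a3*x)
estrin = ("+",
          ("+", "a0", ("*", "a1", "x")),
          ("*", ("*", "x", "x"), ("+", "a2", ("*", "a3", "x"))))

for name, scheme in (("Horner", horner), ("Estrin", estrin)):
    print(name,
          "latency:", latency(scheme),
          "additions:", count_ops(scheme, "+"),
          "multiplications:", count_ops(scheme, "*"))
```

Under these assumptions both schemes use 3 additions, while Estrin spends one extra multiplication (4 instead of 3) to bring the latency down from 12 to 8 cycles.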

How many schemes for evaluating a polynomial?

Number of evaluation schemes (µ, µ′) for a polynomial of degree n, compared with w_n and (2n − 1)!!:

n | a(x) | α + y · a(x) | w_n | (2n − 1)!!
1 | 1 | 10 | 1 | 1
2 | 7 | 481 | 1 | 3
3 | 163 | 88384 | 1 | 15
4 | 11602 | 57363910 | 2 | 105
5 | 2334244 | 122657263474 | 3 | 945
6 | 1304066578 | 829129658616013 | 6 | 10395
7 | 1972869433837 | 17125741272619781635 | 11 | 135135
8 | 8012682343669366 | 1055157310305502607244946 | 23 | 2027025
9 | 86298937651093314877 | 190070917121184028045719056344 | 46 | 34459425
10 | 2449381767217281163362301 | 98543690848554380947490522591191672 | 98 | 654729075

Two well-known special cases
◮ the number of evaluation schemes for x^n
   → $w_n \sim \eta\,\xi^n / n^{3/2}$, where $\eta \approx 0.31877$ and $\xi \approx 2.48325$
◮ the number of evaluation schemes for $\sum_{i=0}^{n} a_i$ is $(2n-1)!! \sim \sqrt{2}\,(2n/e)^n$
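The two special-case columns can be recomputed with a short sketch. It assumes that w_n is the Wedderburn-Etherington sequence (the values 1, 1, 1, 2, 3, 6, 11, 23, 46, 98 in the table match it), that is, the number of ways to build x^n with commutative, non-associative multiplications, and that (2n − 1)!! is the product of the first n odd integers. The function names are hypothetical.

```python
import math


def wedderburn_etherington(n_max):
    """Return [w_1, ..., w_n_max], with w_1 = 1.

    A scheme for x^n multiplies a scheme for x^i by a scheme for x^(n-i).
    Multiplication is commutative, so only pairs with i < n - i are counted,
    plus the unordered pairs of schemes for x^(n/2) when n is even.
    """
    w = [0, 1]  # index 0 is unused padding; w[1] = 1
    for n in range(2, n_max + 1):
        total = sum(w[i] * w[n - i] for i in range(1, (n + 1) // 2))
        if n % 2 == 0:
            half = w[n // 2]
            total += half * (half + 1) // 2
        w.append(total)
    return w[1:]


def odd_double_factorial(n):
    """(2n - 1)!! = 1 * 3 * 5 * ... * (2n - 1)."""
    return math.prod(range(1, 2 * n, 2))


w = wedderburn_etherington(10)
for n in range(1, 11):
    print(n, w[n - 1], odd_double_factorial(n))
```

Both columns grow far more slowly than the counts for a(x) and α + y · a(x), which is consistent with the need for a dedicated tool to explore the full space of schemes.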
