A general-purpose method for faithfully rounded floating-point - PowerPoint PPT Presentation

A general-purpose method for faithfully rounded floating-point function approximation in FPGAs David B. Thomas Imperial College London 1 David Thomas, Imperial College, dt10@ic.ac.uk

FloPoCo: Parameterised primitives

FloatApprox: Parameterised anything
[Figure labels: Approximation; Input format; Interval; Output format]

FloatApprox
• Architecture for FPGA function approximation
  – Deeply pipelined
  – Floating-point in and out
  – Faithfully rounded
• Method and tool for approximating functions
  – Handles most twice-differentiable functions
  – Completely automated: expression to VHDL
  – Designed for reliability rather than optimality

1. Motivation
2. The FloatApprox approach
   1. Range reduction and approximation method
   2. Evaluation architecture
3. Evaluation in hardware

Context: FPGA accelerators
• Mathematical or algorithmic specification
• Convert to HLS or VHDL implementation
  – Rely on optimised IP for floating-point
  – Integrated at link-time into the final design
• Intellectual challenges for accelerator design
  – Managing memory accesses and bandwidth
  – Rewriting to tolerate latency of operators
  – Keeping pipelines occupied
  – Not: designing low-level IP cores

Floating-point IP: Requirements
• Faithfully rounded
  – Make every bit count
  – Tractable error analysis
• Pipelined for 150 MHz+ clock rate
  – Must be pipelined: RAM and DSPs are multi-cycle
  – Synthesis tools have limited retiming capability
• Working RTL (circuit) implementation
  – A paper can’t be synthesised
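To make the "faithfully rounded" requirement concrete, the sketch below checks whether a floating-point result is faithful with respect to an exact value: either it equals the exact value, or it is one of the two representable numbers on either side of it. This is an illustration only. It uses IEEE double precision as a stand-in for the parameterised formats discussed in the talk, the function name `is_faithful` is hypothetical, and it assumes finite values away from overflow.

```python
import math
from fractions import Fraction


def is_faithful(approx: float, exact: Fraction) -> bool:
    """Return True if approx is a faithful rounding of exact (double precision).

    Faithful means: approx equals exact, or exact lies strictly between
    approx and the neighbouring double on the far side of exact.
    """
    a = Fraction(approx)  # exact rational value of the double
    if a == exact:
        return True
    if a < exact:
        # approx is below exact: the next double up must lie above exact
        return Fraction(math.nextafter(approx, math.inf)) > exact
    # approx is above exact: the next double down must lie below exact
    return Fraction(math.nextafter(approx, -math.inf)) < exact


if __name__ == "__main__":
    third = Fraction(1, 3)
    nearest = 1 / 3                               # correctly rounded quotient
    above = math.nextafter(nearest, math.inf)     # neighbour on the other side
    too_far = math.nextafter(above, math.inf)     # two steps away
    for candidate in (nearest, above, too_far):
        print(candidate.hex(), is_faithful(candidate, third))
```

Faithful rounding is weaker than correct rounding (either neighbour is acceptable), which is part of what keeps the error analysis tractable.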

A fable...
Subject: Floating-point log1p?
To: dt10@ic.ac.uk
From: phd-slash-industry-bod@somewhere.com
Body:
I’m converting some code for an accelerator, and it uses log1p. Can I use your core from that PoC you did a while back?

A fable...
Subject: Re: Floating-point log1p?
To: phd-slash-industry-bod@somewhere.com
From: dt10@ic.ac.uk
Body:
Afraid that was written in Handel-C, I don’t have any VHDL. You could recreate it using the attached Maple script, plus write a code gen.
> I’m converting some code for an accelerator, and
> it uses log1p. Can I use your core from that
> PoC you did a while back?

A fable...
Subject: Re: Floating-point log1p?
To: phd-slash-industry-bod@somewhere.com
From: dt10@ic.ac.uk
Body:
Any luck?
> Afraid that was written in Handel-C, I don’t
> have any VHDL. You could recreate it using
> the attached Maple script, plus write a code gen.
> > I’m converting some code for an accelerator, and
> > it uses log1p. Can I use your core from that
> > PoC you did a while back?

... becomes a nightmare
Subject: Re: Floating-point log1p?
To: dt10@ic.ac.uk
From: phd-slash-industry-bod@somewhere.com
Body:
Oh, we don’t have Maple. It’s ok, I found out log1p(x)=log(1+x), and just did that. Works fine.
> Any luck?
> > Afraid that was written in Handel-C, I don’t
> > have any VHDL. You could recreate it using
> > the attached Maple script, plus write a code gen.
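Why the reply is a nightmare: log1p(x) = log(1+x) holds over the reals, but in floating point the addition 1+x discards the low-order bits of a small x before the logarithm is taken. A minimal sketch in double precision (the helper name `naive_log1p` is hypothetical, and Python's `math` module stands in for the hardware operators):

```python
import math


def naive_log1p(x: float) -> float:
    """log1p implemented as log(1 + x): the sum is rounded before the log."""
    return math.log(1.0 + x)


if __name__ == "__main__":
    for x in (1e-3, 1e-10, 1e-20):
        naive = naive_log1p(x)
        accurate = math.log1p(x)
        # For tiny x, 1.0 + x rounds to exactly 1.0, so the naive result is 0.0
        print(f"x={x:g}  naive={naive!r}  log1p={accurate!r}")
```

For x well below the unit roundoff the naive version returns zero, a 100% relative error, so it cannot be faithfully rounded over the whole input range even if the underlying add and log operators are.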

What IP is available?
Function        Source      Pipelined   Faithful   RTL
add, mul, div   FloPoCo     Yes         Yes        Yes
log, exp        FloPoCo     Yes         Yes        Yes
sin, cos        FPLibrary   No          Yes        Yes
                Altera      Yes         Yes        Altera flow only
                Xilinx      Yes         ?          Vivado HLS only
log1p           Altera      Yes         Yes        Altera flow only
expm1           Altera      Yes         No         OpenCL only
erf             Altera      Yes         No         OpenCL only

Motivation for FloatApprox
• We currently have: +, -, *, /, log, exp
  – Use existing IP: FloPoCo, Xilinx, Altera, ...
• We should have: log1p, expm1, erf, sin, acos, ...
  – What FloatApprox does badly... but better than anything else available
• What I want: sqrt(-2 log(x)), 1/(1+exp(-x))
  – What FloatApprox does well

Goals of FloatApprox
• As a tool
  – Convert any function f(x) to RTL
  – Able to handle most smooth functions
    • Smooth = twice differentiable for our purposes
  – Suitable for automated use
    • Input: data-types, range, function
    • Output: faithfully rounded circuit
• As generated IP
  – Pipelined
  – Faithfully rounded
  – Working RTL

FloatApprox: requirements
• User can specify any target function
• Parameterised floating-point representation
  – Input and output formats can be distinct
• Portable between platforms
• Usable from many languages
• Open-source
• Low latency
• Minimal resource usage

Architecture and Approximation
• Architecture:
  – General template for creating any approximator
• Approximation:
  – Configuring the template for a given function

FloatApprox: Approximation
• Given a function f_t, how do we create f_a?
• Segment the function so that segments are:
  1. Contained in one input binade
  2. Monotonically increasing or decreasing in range
  3. Contained in one output binade
  4. FaithfulReal: can approximate with a real polynomial of degree d
  5. FaithfulFixed: can faithfully approximate with a fixed-point polynomial of degree d
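The first three segment conditions can be illustrated with a small sketch. It is not the FloatApprox algorithm, which this transcript does not spell out. It is a greedy bisection over double-precision values that enforces conditions 1 and 3 only, assumes condition 2 (f is monotonic on the whole range, with a strictly positive domain and non-zero outputs), and ignores the polynomial conditions 4 and 5. All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def binade(v: float) -> int:
    """Exponent e with 2**(e-1) <= |v| < 2**e (v finite and non-zero)."""
    return math.frexp(v)[1]


def fits(f: Callable[[float], float], a: float, b: float) -> bool:
    """Conditions 1 and 3 for the closed segment [a, b], given monotonic f."""
    return binade(a) == binade(b) and binade(f(a)) == binade(f(b))


def largest_end(f: Callable[[float], float], lo: float, hi: float) -> float:
    """Largest b in [lo, hi] such that [lo, b] fits, found by bisection."""
    if fits(f, lo, hi):
        return hi
    good, bad = lo, hi  # invariant: [lo, good] fits, [lo, bad] does not
    while True:
        mid = good + (bad - good) / 2
        if mid <= good or mid >= bad:  # good and bad are adjacent floats
            return good
        if fits(f, lo, mid):
            good = mid
        else:
            bad = mid


def segment(f: Callable[[float], float], lo: float, hi: float) -> List[Tuple[float, float]]:
    """Split [lo, hi] into closed segments satisfying conditions 1 and 3."""
    segments = []
    while True:
        end = largest_end(f, lo, hi)
        segments.append((lo, end))
        if end >= hi:
            return segments
        lo = math.nextafter(end, math.inf)  # next segment starts one ulp later


if __name__ == "__main__":
    for a, b in segment(lambda x: x * x, 1.0, 3.5):
        print(f"[{a!r}, {b!r}]  input binade {binade(a)}, output binade {binade(a * a)}")
```

For x*x on [1, 3.5] the breaks fall where the input crosses 2 and where the output crosses 2, 4 and 8. A real tool would then subdivide each segment further until a degree-d polynomial meets the FaithfulReal and FaithfulFixed conditions.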
