Evaluating the hardware cost of the posit number system, FPL'19 (presentation transcript)



SLIDE 1

Evaluating the hardware cost of the posit number system

FPL'19, Barcelona
Yohann Uguen, Luc Forget, Florent de Dinechin
Univ Lyon, INSA Lyon, Inria, CITI
September 9, 2019


SLIDE 5

Motivation

Posit: a new encoding scheme for real values.
Posit claim: fewer bits, better results.

How much does it cost?

SLIDE 6

Floating-point numbers

A floating-point value consists of a significand 1.F scaled by a power of two 2^E:

    v = (-1)^s × 1.F × 2^E

SLIDE 7

Floating-point numbers

A floating-point value consists of a significand 1.F scaled by a power of two 2^E:

    v = (-1)^s × 1.F × 2^E

    Encoding scheme                          Max power of 2   Values in [1, 2[
    s e2 e1 e0 f3 f2 f1 f0  (WE=3, WF=4)     2^7 = 128        2^4 = 16
    s e1 e0 f4 f3 f2 f1 f0  (WE=2, WF=5)     2^3 = 8          2^5 = 32
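This decoding can be sketched in a few lines. `decode_fp` below is a hypothetical helper (not part of Marto) that decodes a normal FP<WE, WF> bit pattern with an IEEE-style bias; subnormals, infinities and NaN are deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Hypothetical helper (not Marto API): decode a custom FP<WE, WF> bit
// pattern with bias 2^(WE-1) - 1, normal numbers only.
double decode_fp(uint32_t bits, int we, int wf) {
    uint32_t s = (bits >> (we + wf)) & 1u;            // sign
    uint32_t e = (bits >> wf) & ((1u << we) - 1u);    // biased exponent E
    uint32_t f = bits & ((1u << wf) - 1u);            // fraction F
    int bias = (1 << (we - 1)) - 1;
    double significand = 1.0 + double(f) / double(1u << wf);  // 1.F
    return (s ? -1.0 : 1.0) * std::ldexp(significand, int(e) - bias);
}
```

For instance, with FP<5, 10> (IEEE binary16) the pattern 0x3C00 decodes to 1.0 and 0xC000 to -2.0.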


SLIDE 11

Floating-point numbers dilemma

Trade-off between dynamic range and precision through the choice of WE and WF:

    IEEE binary16 = FP<5, 10>:  s e4..e0 f9..f0   (WE = 5, WF = 10)
    bfloat16      = FP<8, 7>:   s e7..e0 f6..f0   (WE = 8, WF = 7)
    DLFloat16     = FP<9, 6>:   s e8..e0 f5..f0   (WE = 9, WF = 6)


SLIDE 16

The posit encoding scheme – simple case

  • Word size N
  • Exponent: variable-length run of r identical bits
  • Remaining bits: fraction bits

Posit<8> examples:

    r = 1:  1.10001 × 2^(1-1) = 1.53125
    r = 3:  1.001   × 2^(3-1) = 4.5
    r = 5:  1.1     × 2^(5-1) = 24
    r = 7:  1       × 2^(7-1) = 64
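These examples can be checked mechanically. The sketch below is a hypothetical helper (not Marto code) that decodes a positive Posit<8> word in the simple WES = 0 case: count the run of identical bits b after the sign, derive the exponent k (k = r - 1 for a run of ones, k = -r for a run of zeros), and read the remaining bits as the fraction. The bit patterns in the test are my reconstructions of the slide's four values.

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Hypothetical helper (not Marto API): decode a *positive* Posit<8>
// word in the simple case (WES = 0).
double decode_posit8(uint8_t bits) {
    int b = (bits >> 6) & 1;                      // value of the run bits
    int r = 0, i = 6;
    while (i >= 0 && ((bits >> i) & 1) == b) { ++r; --i; }
    --i;                                          // skip the terminating bit
    int k = b ? r - 1 : -r;                       // exponent from run length
    double frac = 1.0;                            // implicit leading 1
    for (int w = 1; i >= 0; --i, ++w)
        frac += ((bits >> i) & 1) * std::ldexp(1.0, -w);
    return std::ldexp(frac, k);
}
```

Reconstructed patterns: 0x51 → 1.53125 (r = 1), 0x71 → 4.5 (r = 3), 0x7D → 24 (r = 5), 0x7F → 64 (r = 7).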


SLIDE 18

Posit simple-case limitation

Bill Gates's fortune: ≈ 103.5 × 10^9 $.
In the simple scheme, Posit<32>(103.5 × 10^9) = 2^30 ≈ 1.1 × 10^9.



SLIDE 23

Increasing the range

  • Shift the exponent left by WES bits (scale by 2^WES).
  • Store the WES low exponent bits just before the fraction bits.

Posit<8,2> examples (WES = 2):

    1.001 × 2^(0×4+3) = 1.001 × 2^3  = 9
    1.1   × 2^(2×4+0) = 1.1   × 2^8  = 384
    1     × 2^(6×4+0) = 1     × 2^24 ≈ 16 × 10^6
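In the same toy setting as before, the decoder sketch generalizes to a nonzero WES: after the run and its terminating bit, up to WES exponent-shift bits are read (missing low bits count as zero), and the final exponent is k·2^WES + es. Note that 1.1₂ × 2^8 is 384, as computed below; the bit patterns in the test are reconstructions, and the helper is illustrative, not Marto API.

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Hypothetical helper: decode a *positive* Posit<8, WES> word.
double decode_posit8_es(uint8_t bits, int wes) {
    int b = (bits >> 6) & 1;                      // value of the run bits
    int r = 0, i = 6;
    while (i >= 0 && ((bits >> i) & 1) == b) { ++r; --i; }
    --i;                                          // skip the terminating bit
    int k = b ? r - 1 : -r;
    int es = 0;                                   // missing es bits read as 0
    for (int n = 0; n < wes; ++n, --i)
        es = (es << 1) | (i >= 0 ? (bits >> i) & 1 : 0);
    double frac = 1.0;
    for (int w = 1; i >= 0; --i, ++w)
        frac += ((bits >> i) & 1) * std::ldexp(1.0, -w);
    return std::ldexp(frac, k * (1 << wes) + es);
}
```

Reconstructed patterns for WES = 2: 0x59 → 9, 0x71 → 384, 0x7F → 2^24.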


SLIDE 25

Overview

Our goals:

  • Evaluate the hardware cost of posits
  • Compare this cost to standard FP hardware
  • Provide an experimentation framework for posit hardware

Our tool, Marto (Modern arithmetic tools):

  • Open-source HLS library for custom-sized posit arithmetic
  • Handles addition, multiplication, and quire accumulation

gitlab.inria.fr/lforget/marto

SLIDE 26

Marto usage example

IEEE binary32 adder:

    #include "ieeefloats/ieee_dim.hpp" // IEEENumber<WE, WF>
    IEEENumber<8, 23> op1;
    IEEENumber<8, 23> op2;
    IEEENumber<8, 23> op3;
    // Compute the IEEE sum
    auto sum = op1 + op2 + op3;
    // ...

Posit(32,2) adder:

    #include "posit/posit_dim.hpp" // PositNumber<N, WES>
    PositNumber<32, 2> op1;
    PositNumber<32, 2> op2;
    PositNumber<32, 2> op3;
    // Compute the Posit(32,2) sum
    auto sum = op1 + op2 + op3;
    // ...


SLIDE 28

Variable-size fields are not hardware friendly

[Diagram: posit inputs 1 and 2 → decoders → operand 1 / operand 2 → operator →
result → encoder → posit result. The operator works on a fixed-size-fields
intermediate representation.]

Which intermediate representation?

SLIDE 29

Posit Intermediate Format

Posit Intermediate Format (PIF): the smallest floating-point format able to store any value of a given posit format.

  • Significand stored in 2's complement
  • Extra bits for exact rounding (Round, Sticky)
  • Extra bits for logic simplification (IsNaR, I)

    Format        WE   WF
    Posit(8,0)     4    5
    Posit(16,1)    6   12
    Posit(32,2)    8   27
    Posit(64,3)   10   58
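The table can be reproduced from field-width formulas consistent with all four rows, WE = 1 + WES + ⌈log2(N - 2)⌉ and WF = N - 3 - WES (these formulas are my reading of the slides; the helpers below are illustrative, not Marto API).

```cpp
#include <cassert>

// PIF exponent width for posit(N, WES): WE = 1 + WES + ceil(log2(N - 2)).
int pif_we(int n, int wes) {
    int c = 0;
    while ((1 << c) < n - 2) ++c;   // c = ceil(log2(n - 2))
    return 1 + wes + c;
}

// PIF fraction width for posit(N, WES): WF = N - 3 - WES.
int pif_wf(int n, int wes) { return n - 3 - wes; }
```

Checking against the table: pif_we(32, 2) gives 8 and pif_wf(32, 2) gives 27.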



SLIDE 32

Posit decoder

[Diagram: the N-bit posit input (layout s | run r | es1 es0 | fraction) feeds
an OR reduce (for NaR detection) and an LZOC + shift unit that counts the run
length and aligns the remaining bits; the run count goes through a +Bias
stage. Outputs: isNaR, S, I, the biased exponent E (⌈log2(N)⌉ bits of run
count plus the wes exponent-shift bits) and the fraction F on N - 3 - wes
bits.]


SLIDE 34

Posit encoder

[Diagram: from the PIF fields (E, S, isNaR, F, Round, Sticky), the encoder
subtracts the exponent bias (⌈log2(N)⌉ + 1 + wes bits), rebuilds the
run/es/fraction layout through a shifter + sticky unit and a complement
stage, performs the final rounding (+0/1 on the N - 1 low bits), and muxes in
the NaR code to produce the N-bit posit result.]

SLIDE 35

Posit addition: comparison with state of the art

     N   Design               LUTs   Delay (ns)
    16   Chaurasiya et al.     320   23
    16   Jaiswal et al.        460   21
    16   Marto (this work)     320   21
    32   Chaurasiya et al.     981   40
    32   Jaiswal et al.       1115   29
    32   Marto (this work)     745   24

Synthesis targets a Zynq FPGA.

  • Chaurasiya et al.: Parametrized Posit Arithmetic Hardware Generator (2018)
  • Jaiswal et al.: PACoGen: A Hardware Posit Arithmetic Core Generator (2019)

SLIDE 36

Posit product: comparison with state of the art

     N   Design               LUTs   Delay (ns)   DSPs
    16   Chaurasiya et al.     218   24            1
    16   Jaiswal et al.        271   19            1
    16   Marto (this work)     253   18            1
    32   Chaurasiya et al.     572   33            4
    32   Jaiswal et al.        648   27            4
    32   Marto (this work)     469   27            4

Synthesis targets a Zynq FPGA.

  • Chaurasiya et al.: Parametrized Posit Arithmetic Hardware Generator (2018)
  • Jaiswal et al.: PACoGen: A Hardware Posit Arithmetic Core Generator (2019)

SLIDE 37

Comparison with floating-point adders

     N   Format          LUTs   Regs.   Cycles @ 333 MHz
    16   Marto posit      447    371    17
    16   IEEE-754         216    205    12
    32   Marto posit      999    975    23
    32   IEEE-754         425    375    14
    32   Xilinx float     341    467     9
    64   Marto posit     1759   2785    36
    64   IEEE-754         918    792    17
    64   Xilinx double    641   1098    11

Synthesis targets a Kintex 7.

Posit addition: ~2x slower, requires ~2x more LUTs.

SLIDE 38

Comparison with floating-point multipliers

     N   Format           LUTs   Regs.   DSPs   Cycles @ 333 MHz
    16   Posit             269    292      1    16
    16   Soft FP<5,10>      38    127      1     8
    32   Posit             544    710      4    21
    32   Soft FP<8,23>      67    228      2     9
    32   Xilinx float       80    193      3     7
    64   Posit            1501   2410     16    42
    64   Soft FP<11,52>    259    651      9    10
    64   Xilinx double     196    636     11    17

Synthesis targets a Kintex 7.

Posit product: ~2x slower, requires ~8x more LUTs.


SLIDE 40

Conclusion

  • Posit operators are slow and big compared to floating-point operators.
  • What to do if you need more precision:
      ◮ Custom word sizes allowed? Consider custom-sized floats.
      ◮ Otherwise use the next standard float size; it is still less expensive.
      ◮ Unless memory bandwidth is the limit, in which case posits might help.
  • In both cases, our tool lets you easily exploit state-of-the-art architectures.


SLIDE 42

Future work

  • Implement an IEEE-754 multiplier
  • Add multi-HLS-tool support using the hint HLS integer library
  • Revisit posit application-level studies with custom floats

Thank you for your attention.
gitlab.inria.fr/lforget/marto

SLIDE 43

The posit encoding scheme

A posit [1] encoding is parametrised by the word size N and the exponent shift size WES. For a positive value, the code is made of the following fields:

  • The first bit is the sign bit s, set to zero,
  • The range, encoded by the length r of a run of identical bits b,
  • The exponent shift es on WES bits,
  • The remaining N - (r + 2 + WES) bits are the significand bits f.

The encoded value is

    v = 1.f × 2^(k·2^WES + es),   with k = -r if b = 0 and k = r - 1 if b = 1.

Negative values are encoded as the 2's complement of their opposite.
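Putting the full rule set together (zero and NaR codes, 2's-complement negatives, runs of either bit value), a complete toy Posit<8, WES> decoder might look like the sketch below. It is illustrative only, not Marto code; the bit patterns in the test are my reconstructions.

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Hypothetical full Posit<8, WES> decoder: 0x00 is zero, 0x80 is NaR,
// negatives are decoded via the 2's complement of their opposite.
double decode_posit8_full(uint8_t bits, int wes) {
    if (bits == 0x00) return 0.0;
    if (bits == 0x80) return NAN;                 // NaR
    double sign = 1.0;
    if (bits & 0x80) { sign = -1.0; bits = (uint8_t)(0 - bits); }
    int b = (bits >> 6) & 1;                      // value of the run bits
    int r = 0, i = 6;
    while (i >= 0 && ((bits >> i) & 1) == b) { ++r; --i; }
    --i;                                          // skip the terminating bit
    int k = b ? r - 1 : -r;                       // k = -r for a run of zeros
    int es = 0;                                   // missing es bits read as 0
    for (int n = 0; n < wes; ++n, --i)
        es = (es << 1) | (i >= 0 ? (bits >> i) & 1 : 0);
    double frac = 1.0;
    for (int w = 1; i >= 0; --i, ++w)
        frac += ((bits >> i) & 1) * std::ldexp(1.0, -w);
    return sign * std::ldexp(frac, k * (1 << wes) + es);
}
```

With WES = 0, the pattern 0x10 (a run of zeros, k = -2) decodes to 0.25, and 0x8F, the 2's complement of 0x71, decodes to -4.5.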

SLIDE 44

The posit quire

The quire is a very wide fixed-point accumulator, able to hold exactly any product of two posit values, and to sum such products without rounding.
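The idea can be demonstrated with a toy fixed-point accumulator: a binary32 accumulator silently absorbs terms smaller than half an ulp, while a wide enough fixed-point "quire" adds them exactly. The helpers below are illustrative only; a real posit quire is far wider (512 bits for the posit32 quire, per the experiments slide).

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Toy quire: a 64-bit fixed-point accumulator scaled by 2^-40.
int64_t to_fixed(double v)   { return (int64_t)std::llround(std::ldexp(v, 40)); }
double  from_fixed(int64_t q){ return std::ldexp((double)q, -40); }

// binary32 accumulation: each 2^-30 term is below half an ulp of 1.0f
// and is rounded away, so the sum never moves.
double float_sum_demo() {
    float acc = 1.0f;
    for (int i = 0; i < 1000; ++i) acc += std::ldexp(1.0f, -30);
    return acc;
}

// Quire-style accumulation: every term is added exactly.
double quire_sum_demo() {
    int64_t q = to_fixed(1.0);
    for (int i = 0; i < 1000; ++i) q += to_fixed(std::ldexp(1.0, -30));
    return from_fixed(q);
}
```

Here float_sum_demo() stays at 1.0 while quire_sum_demo() returns exactly 1 + 1000·2^-30.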
SLIDE 45

Posit arithmetic: distinctive characteristics

  • Only two special codes: 0 and NaR
  • Only one zero
  • Saturated arithmetic
  • Mandatory Kulisch-like exact accumulator (the quire)
  • A leading run-length computation is needed before the encoding can be interpreted

SLIDE 46

Posit intermediate representation format

Posit Intermediate Format (PIF) is a floating-point representation with fixed-size fields that is a superset of a given posit format. For a given posit(N, WES) format, the fields are:

  • an "is NaR" flag,
  • the sign bit s,
  • the weight-one bit I,
  • the fraction part f of width WF = N - 3 - WES,
  • the biased exponent E of width WE = 1 + WES + ⌈log2(N - 2)⌉,
  • a round bit r and a sticky bit g to avoid double rounding.

Encoded value (when not NaR):

    v = (-2·s + I.f r) × 2^(E - Emin)

SLIDE 47

Experiments

Exact accumulator:

    Accumulator            Variant   LUTs   Regs.   DSPs   Cycles   Delay (ns)
    Quire 16               U         1409   1763      1     1028    3.215
                           S32       1239   1431      1     1031    2.643
                           S64       1185   1555      1     1030    2.756
    Quire 32 (512 bits)    U         5068   6256      4     1040    8.850
                           S32       4394   4779      4     1055    2.854
                           S64       3783   4564      4     1047    2.961
    Kulisch 32 (559 bits)  S32       4446   5290      2     1050    2.875
                           S64       4365   5276      2     1041    2.854