Dept. Computer Architecture, Universidad de Málaga (Spain)

SONIA GONZALEZ-NAVARRO AND JAVIER HORMIGO Dept. Computer Architecture Universidad de Málaga (Spain) fjhormigo@uma.es

• New embedded applications increasingly demand FP computation
• The IEEE-754 FP standard was designed for general-purpose processors (GPP)
• Problems of using the FP standard:
  ▪ Lack of flexibility (e.g., word sizes)
  ▪ Compulsory requirements that are costly and not always useful (different rounding modes, special cases, subnormals…)

• The problem exists:
  ▪ FPGA tools use almost-compliant formats, but with:
    ▪ Variable sizes, subnormals, special-case flags…
    ▪ Special internal format (Intel fused FP datapath)
  ▪ Synopsys Flexible Floating-Point format:
    ▪ Two's complement, flags, no normalization, truncation…
• Consequences:
  ▪ Multiple variations of the standard are in use => incompatibility and irreproducibility
  ▪ Hardware implementations are less efficient

Should a new extension of the FP standard be defined for embedded applications?
• Multiple choices could be re-studied for these new applications: normalization, rounding, significand representation, special cases, etc.
• Here we focus on normalization (and rounding):
  ▪ How normalization affects accuracy
  ▪ How much implementation results improve

• Non-normalized FP format
• Proposed arithmetic circuits
  ▪ Adders
  ▪ Multipliers
• Error measurement in DSP applications
• Implementation results
• Conclusions

• Similar to binary32
• Normalization is not compulsory
• No special cases
• Zero and subnormals are not special cases
• Rounding is simplified by using truncation:
  ▪ Round toward zero
  ▪ Round to nearest by using the HUB approach [1]

[1] J. Hormigo and J. Villalba, "New formats for computing with real numbers under round-to-nearest", IEEE Trans. on Computers, vol. 65, no. 7, pp. 2158-2168, 2016
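The effect of the HUB approach can be illustrated with a small numeric sketch. It assumes the description in [1], where a HUB number carries an implicit least-significant bit equal to one, so that plain truncation of the stored bits lands on the nearest representable value. The function names and the fixed-point setting with `n` fractional bits are illustrative assumptions, not the circuits proposed in the talk.

```python
from fractions import Fraction

def truncate(x, n):
    """Conventional truncation of x >= 0 to n fractional bits (round toward zero)."""
    return Fraction(int(x * 2**n), 2**n)

def hub_truncate(x, n):
    """Drop the same low bits, then read the result with an implicit extra LSB = 1."""
    stored = int(x * 2**n)                        # identical hardware step: discard low bits
    return Fraction(2 * stored + 1, 2**(n + 1))   # stored value plus half a unit

x = Fraction(47, 64)                              # 0.101111 in binary
print("truncation error:", x - truncate(x, 4))
print("HUB error:       ", abs(x - hub_truncate(x, 4)))
```

With 4 stored bits, the truncation error stays below 1/16, while the HUB reading keeps the error within 1/32, which is the round-to-nearest bound.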

• If normalization is not compulsory, we lose:
  - The implicit bit => 1 bit of precision
  - Leading zeros => accuracy
  - Comparison operation
  - Reproducibility
  Approximate computing (HW-accuracy trade-off)
• But we gain:
  + Area reduction
  + Power and energy reduction
  + Increased speed

Basic FP adder with no normalization (A1)
• No normalization or rounding logic
• Only significand overflow is normalized
• Gray boxes => HUB version
• Round-to-nearest

FP adder with limited normalization (A2)
• Detection and shifting of up to two leading zeros
• Significand overflow is also normalized
• Gray boxes => HUB version
• Round-to-nearest
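The limited normalization step of A2 can be sketched in software as follows. This is a minimal model under stated assumptions: the significand is an unsigned integer of `width` bits, exponent range checks are ignored, and the name `limited_normalize` and its parameters are hypothetical rather than taken from the slides.

```python
def limited_normalize(sig, exp, width, max_shift=2):
    """Shift out at most max_shift leading zeros of a width-bit significand."""
    if sig == 0:
        return sig, exp                        # nothing to normalize
    leading_zeros = width - sig.bit_length()
    shift = min(leading_zeros, max_shift)      # A2 handles up to two leading zeros
    return sig << shift, exp - shift           # represented value sig * 2**exp is unchanged

# Three leading zeros on input, only two are removed.
print(limited_normalize(0b00010110, 5, width=8))
```

Capping the shift at two positions is what keeps the shifter small. The result may still carry leading zeros, which is the accuracy side of the trade-off.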

WITHOUT SIGNIFICAND OVERFLOW DETECTION (M)
WITH SIGNIFICAND OVERFLOW DETECTION (M2)

• Leading zero detection at the inputs
• LZz = LZx + LZy
• Significand overflow is always assumed
• Two versions:
  ▪ Limited (MLx)
  ▪ High radix (MRx)
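The relation LZz = LZx + LZy can be checked numerically. The sketch below assumes unsigned integer significands of `width` bits and a full 2*width-bit product. The helper names are hypothetical. For nonzero operands, the sum of the input leading-zero counts either equals the leading-zero count of the product or falls one short of it, so the estimate obtained at the inputs is never too large.

```python
def leading_zeros(value, width):
    """Number of leading zeros of value when stored in width bits."""
    return width - value.bit_length()

def estimated_product_lz(x, y, width):
    """Leading zeros of the product, estimated from the inputs only."""
    return leading_zeros(x, width) + leading_zeros(y, width)

width = 8
x, y = 0b00101011, 0b01100111
print("estimate:", estimated_product_lz(x, y, width))
print("actual:  ", leading_zeros(x * y, 2 * width))
```

Counting zeros at the inputs lets the shift amount be computed in parallel with the multiplication instead of after it.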

• Using non-normalized numbers implies a loss of accuracy:
  ▪ Loss of the implicit leading one: 1.0101011 vs. .0101011
  ▪ Unaligned addition: 0.0001011 + 1.1101101
  ▪ Multiplications increase the number of leading zeros: 0.0101011 x 0.1100111 = 00.010001011…
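The cost of leading zeros can be made concrete with a small sketch. It assumes a fixed-width integer significand and models a non-normalized representation as the same value stored with `k` extra leading zeros, so the `k` low-order bits are truncated. The function name and the 8-bit example are illustrative assumptions.

```python
def store_with_leading_zeros(sig, k):
    """Keep a significand in the same width but with k extra leading zeros."""
    stored = sig >> k               # the k low-order bits no longer fit
    lost = sig - (stored << k)      # absolute error, in units of the original LSB
    return stored, lost

sig = 0b10101011                    # 8-bit normalized significand
for k in range(4):
    stored, lost = store_with_leading_zeros(sig, k)
    print(k, format(stored, "08b"), lost)
```

Each additional leading zero removes one significant bit, so the worst-case truncation error roughly doubles with every position lost.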

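• Experiments with several DSP algorithms
[Diagram: reference computed in FP64; tested architectures: FP32 and non-normalized (NoN, e.g. A1MH); non-normalized unit on FPGA with ARM A9; error reported as SNR]

$\mathrm{SNR}_{dB} = 10 \cdot \log_{10}\left(\frac{E_y}{E_{error}}\right)$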
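The SNR measure above can be computed as in the following sketch. It assumes that E_y is the energy (sum of squares) of the reference output and that E_error is the energy of the difference between the reference and the tested output. The function name and the sample data are made up for illustration.

```python
import math

def snr_db(reference, tested):
    """SNR in dB: 10 * log10(signal energy / error energy)."""
    e_signal = sum(r * r for r in reference)
    e_error = sum((r - t) ** 2 for r, t in zip(reference, tested))
    if e_error == 0:
        return math.inf             # identical outputs: no error energy
    return 10 * math.log10(e_signal / e_error)

reference = [1.0, -2.0, 3.0, -4.0]  # e.g. results computed in FP64
tested = [1.0, -2.0, 3.0, -3.9]     # e.g. results from the unit under test
print(round(snr_db(reference, tested), 2))
```

A higher value means the tested output is closer to the reference. Every halving of the error amplitude adds about 6 dB.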

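Legend:
  A1: basic adder; A2: adder with limited normalization
  M: multiplier without overflow detection; M2: multiplier with overflow detection
  MRx: radix-x normalization; MLx: limited x-bit normalization
  H: HUB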

IEEE: 146.2

        HUB               no HUB
        A1       A2       A1
M       0        135.5    0
M2      133.8    135.9    124
MR1     133.9    135.5    123
MR4     132.0    135.5    123
MR8     1.3      135.5    1.3
ML4     133.9    135.5    123.4
ML6     133.9    135.5    123.2

• Round-to-nearest is essential
• A2 is the best adder
• A2M2H is the best combination
• Limited normalization in adders gives better accuracy than normalizing multipliers

• Conditions:
  ▪ 32-bit FP architectures
  ▪ Fully combinational architectures
  ▪ Synopsys Design Compiler Ultra H-2013.03-SP2
  ▪ TSMC 65nm library, typical case
  ▪ Area and power when targeting the same frequency

AREA / POWER CONSUMPTION
• Very significant reduction for all versions (around 40%-75%)
• Higher speed
• HUB version uses slightly less area and power
• Partial normalization has a significant cost

AREA / POWER CONSUMPTION
• Much smaller reduction than for adders
• Improvement comes from eliminating the rounding logic
• HUB version uses slightly more area and power

AREA / POWER CONSUMPTION
[Charts with upper and lower limits marked]
• About 25%-50% area and power reduction

• Removing the normalization condition allows a hardware-cost vs. accuracy trade-off
• Different adders and multipliers are proposed for dealing with this trade-off
• Round-to-nearest and a few bits of normalization are enough to limit the accuracy loss
• With a reasonable loss of accuracy (10 dB), area and power could be reduced by up to 50%

• The results obtained encourage us to continue by seeking new non-normalized architectures and testing more applications
• Other characteristics of the FP standard are also questionable in embedded applications
• We aim to open a debate about the need to define a new FP standard extension for new embedded applications

Questions?
