Dept. Computer Architecture, Universidad de Málaga (Spain)


  1. SONIA GONZALEZ-NAVARRO AND JAVIER HORMIGO Dept. Computer Architecture Universidad de Málaga (Spain) fjhormigo@uma.es

  2. • New embedded applications increasingly demand FP computation
     • The IEEE-754 FP standard was designed for general-purpose processors (GPP)
     • Problems of using the FP standard:
       ▪ Lack of flexibility (e.g., word sizes)
       ▪ Compulsory requirements that are costly and not always useful (different rounding modes, special cases, subnormals…)

  3. • The problem exists:
       ▪ FPGA tools use almost compliant formats, but: variable sizes, subnormals, special-case flags…; special internal format (Intel fused FP datapath)
       ▪ Synopsys Flexible Floating-Point format: two's complement, flags, no normalization, truncation…
     • Consequences:
       ▪ Multiple variations of the standard are used => incompatibility and irreproducibility
       ▪ Hardware implementations are less efficient

  4. Should a new extension of the FP standard be defined for embedded applications?
     • Multiple choices could be revisited for these new applications: normalization, rounding, significand representation, special cases, etc.
     • Here we focus on normalization (and rounding):
       ▪ How normalization affects accuracy
       ▪ How implementation results improve

  5. • Non-normalized FP format
     • Proposed arithmetic circuits
       ▪ Adders
       ▪ Multipliers
     • Error measurement in DSP applications
     • Implementation results
     • Conclusions

  7. • Similar to binary32
     • Normalization is not compulsory
     • No special cases
     • Zero and subnormals are not special cases
     • Rounding is simplified by using truncation:
       ▪ Round toward zero
       ▪ Round to nearest by using the HUB approach [1]
     [1] J. Hormigo and J. Villalba, “New formats for computing with real numbers under round-to-nearest”, IEEE Trans. on Computers, vol. 65, no. 7, pp. 2158–2168, 2016
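To make the truncation bullets concrete, here is a minimal sketch assuming the usual reading of the HUB (Half-Unit-Biased) idea from [1]: an implicit least-significant bit equal to 1 is appended to the stored significand, so plain truncation behaves like round-to-nearest. The 7-bit width and the function names `truncate`, `value_conventional` and `value_hub` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

FRAC_BITS = 7  # assumed width, chosen only for illustration


def truncate(x: Fraction, frac_bits: int) -> int:
    """Keep frac_bits fractional bits of a non-negative x, dropping the rest."""
    return int(x * 2**frac_bits)  # int() truncates toward zero


def value_conventional(stored: int, frac_bits: int) -> Fraction:
    """Conventional reading: the stored bits are the whole significand."""
    return Fraction(stored, 2**frac_bits)


def value_hub(stored: int, frac_bits: int) -> Fraction:
    """HUB reading (assumption): an implicit LSB equal to 1 is appended."""
    return Fraction(2 * stored + 1, 2**(frac_bits + 1))


ulp = Fraction(1, 2**FRAC_BITS)
samples = [Fraction(k, 1000) for k in range(1000)]

# Worst-case error of truncation under each interpretation
max_err_trunc = max(
    abs(x - value_conventional(truncate(x, FRAC_BITS), FRAC_BITS)) for x in samples
)
max_err_hub = max(
    abs(x - value_hub(truncate(x, FRAC_BITS), FRAC_BITS)) for x in samples
)

print(float(max_err_trunc / ulp), float(max_err_hub / ulp))
```

Under these assumptions the truncation error stays below one ULP with the conventional reading and within half a ULP with the HUB reading, which is the round-to-nearest bound.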

  9. • If normalization is not compulsory, some things are lost:
       - The implicit bit => 1 bit of precision
       - Leading zeros => accuracy
       - Comparison operation
       - Reproducibility
     • But others are improved:
       + Area reduction
       + Power and energy reduction
       + Increased speed
     • Approximate computing (HW-accuracy trade-off)

  11. Basic FP adder with no normalization (A1)
      • No normalization or rounding logic
      • Only significand overflow is normalized
      • Gray boxes => HUB version
      • Round-to-nearest

  12. FP adder with limited normalization (A2)
      • Detection and shifting of up to two leading zeros
      • Significand overflow is also normalized
      • Gray boxes => HUB version
      • Round-to-nearest
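The behavioral difference between the two adders can be sketched in a few lines. This is only an illustration under stated assumptions, not the circuits from the slides: operands are non-negative, the significand is an 8-bit integer with one integer bit (`P = 8`), rounding is plain truncation, and `add_nonnorm` and `to_float` are hypothetical names. Setting `max_shift=0` mimics A1 and `max_shift=2` mimics A2.

```python
P = 8  # assumed significand width: 1 integer bit + 7 fractional bits


def add_nonnorm(mx, ex, my, ey, max_shift=0):
    """Add two non-negative numbers of the form m * 2**(e - (P - 1)).

    max_shift=0: only significand overflow is normalized (like A1).
    max_shift=2: up to two leading zeros are also shifted out (like A2).
    """
    if ex < ey:                 # make x the operand with the larger exponent
        mx, ex, my, ey = my, ey, mx, ex
    my >>= ex - ey              # align by truncating the smaller operand
    s, e = mx + my, ex
    if s >> P:                  # significand overflow: shift right once
        s >>= 1
        e += 1
    else:                       # limited leading-zero normalization
        shifts = 0
        while shifts < max_shift and s != 0 and (s >> (P - 1)) == 0:
            s <<= 1
            e -= 1
            shifts += 1
    return s, e


def to_float(m, e):
    """Real value represented by significand m and exponent e."""
    return m * 2.0 ** (e - (P - 1))


a1 = add_nonnorm(0b00010110, 0, 0b00001011, 0, max_shift=0)
a2 = add_nonnorm(0b00010110, 0, 0b00001011, 0, max_shift=2)
print(a1, a2, to_float(*a1), to_float(*a2))
```

Both calls represent the same value here; the A2-style result simply carries two fewer leading zeros, which leaves more significant bits available to later truncating operations.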

  13. [Figure panels: multiplier without significand overflow detection (M); multiplier with significand overflow detection (M2)]

  14. • Leading zero detection at the input
      • LZz = LZx + LZy
      • Significand overflow is always assumed
      • Two versions:
        ▪ Limited (MLx)
        ▪ High radix (MRx)
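A rough sketch of the input leading-zero idea, under several assumptions that the slides do not spell out: 8-bit significands with one integer bit, truncation of the product, "overflow always assumed" read as unconditionally adding 1 to the exponent, and the limited version read as saturating each input count at `limit` bits. The names `leading_zeros` and `mul_nonnorm` are hypothetical, and the high-radix variant is not modeled.

```python
P = 8  # assumed significand width: 1 integer bit + 7 fractional bits


def leading_zeros(m, limit):
    """Count leading zeros of a P-bit significand, saturating at `limit`."""
    lz = 0
    while lz < limit and lz < P and ((m >> (P - 1 - lz)) & 1) == 0:
        lz += 1
    return lz


def mul_nonnorm(mx, ex, my, ey, limit=0):
    """Multiply two numbers of the form m * 2**(e - (P - 1)).

    limit=0 skips leading-zero compensation; limit=x compensates up to
    x leading zeros per operand.
    """
    lz = leading_zeros(mx, limit) + leading_zeros(my, limit)  # LZz = LZx + LZy
    prod = (mx * my) << lz   # 2P-bit product, pre-shifted by LZz
    m = prod >> P            # keep the top P bits (truncation)
    e = ex + ey + 1 - lz     # "+1": significand overflow is always assumed
    return m, e


def to_float(m, e):
    """Real value represented by significand m and exponent e."""
    return m * 2.0 ** (e - (P - 1))


exact = (43 / 128) * (103 / 128)
plain = mul_nonnorm(43, 0, 103, 0, limit=0)
limited = mul_nonnorm(43, 0, 103, 0, limit=4)
print(plain, limited, exact - to_float(*plain), exact - to_float(*limited))
```

With these operands the compensated product keeps more significant bits, so its truncation error is smaller than that of the plain version.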

  19. • Using non-normalized numbers implies a loss of accuracy:
        ▪ Loss of the implicit leading one (example: 1.0101011 vs. .0101011)
        ▪ Unaligned addition (example: 0.0001011 + 1.1101101)
        ▪ Multiplications increase the number of leading zeros (example: 0.0101011 x 0.1100111 = 00.010001011…)

  20. • Experiments with several DSP algorithms
      [Diagram labels: Reference (FP64); Tested (FP32, NoN architectures such as A1MH); FPGA: ARM A9 + Non-Normalized unit; Error; SNR]
      $\mathrm{SNR}_{dB} = 10 \log_{10} \left( E_y / E_{error} \right)$
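The error metric can be sketched as follows. The slide does not define its symbols, so this assumes that E_y is the energy (sum of squares) of the reference output and E_error is the energy of the difference between the reference and the tested output; `snr_db` is a hypothetical helper name.

```python
import math


def snr_db(reference, tested):
    """SNR in dB between a reference signal and a tested signal.

    Assumption: E_y = sum of y**2 over the reference samples, and
    E_error = sum of (y - t)**2 over reference/tested sample pairs.
    """
    e_y = sum(y * y for y in reference)
    e_error = sum((y - t) ** 2 for y, t in zip(reference, tested))
    if e_error == 0:
        return math.inf  # identical outputs: no error energy
    return 10 * math.log10(e_y / e_error)


reference = [1.0, -1.0, 1.0, -1.0]
tested = [0.9, -0.9, 0.9, -0.9]
print(snr_db(reference, tested))
```

In the setup listed on the slide, the reference would be the FP64 output and the tested signal the output of one of the FP32 or non-normalized architectures.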

  21. Legend for the result plots: A1: basic adder; A2: adder with limited normalization; M: multiplier without overflow detection; M2: multiplier with overflow detection; MRx: radix-x normalization; MLx: limited x-bit normalization; H: HUB
      [Plots not recoverable; surviving labels: A1, A2]

  25. A1: basic; A2: lim. norm.; M: no ovf.; M2: ovf.; MRx: radix-x norm.; MLx: lim. x-bit norm.
      IEEE: 146.2

      Multiplier   HUB, A1   HUB, A2   no HUB, A1
      M              0        135.5       0
      M2           133.8      135.9     124
      MR1          133.9      135.5     123
      MR4          132.0      135.5     123
      MR8            1.3      135.5       1.3
      ML4          133.9      135.5     123.4
      ML6          133.9      135.5     123.2

  29. • Round-to-nearest is essential
      • A2 is the best adder
      • A2M2H is the best combination
      • Limited normalization in the adders gives better accuracy than normalizing in the multipliers

  31. • Conditions:
        ▪ 32-bit FP architectures
        ▪ Fully combinational architectures
        ▪ Synopsys Design Compiler Ultra H-2013.03-SP2
        ▪ TSMC 65 nm library, typical case
        ▪ Area and power when targeting the same frequency

  32. [Plots: area; power consumption]
      • Very significant reduction for all versions (around 40%-75%)
      • Higher speed
      • The HUB version uses slightly less area and power
      • Partial normalization has a significant cost

  33. [Plots: area; power consumption]
      • Much less reduction than for the adders
      • The improvement comes from eliminating the rounding logic
      • The HUB version uses slightly more area and power

  38. Legend: A1: basic; A2: lim. norm.; M: no ovf.; M2: ovf.; MRx: radix-x norm.; MLx: lim. x-bit norm.; H: HUB
      [Plots: area; power consumption, with upper and lower limits marked]
      • About 25%-50% area and power reduction

  40. • Removing the normalization requirement allows a hardware-cost vs. accuracy trade-off
      • Different adders and multipliers are proposed to exploit this trade-off
      • Round-to-nearest and a few bits of normalization are enough to limit the accuracy loss
      • With a reasonable loss of accuracy (10 dB), area and power can be reduced by up to 50%

  41. • The results obtained encourage us to continue seeking new non-normalized architectures and testing more applications
      • Other FP standard characteristics are also questionable in embedded applications
      • We aim to open a debate about the need to define a new FP standard extension for new embedded applications

  42. Questions?
