
Madison Embedded Systems & Architectures Laboratory - PowerPoint PPT Presentation



  1. Madison Embedded Systems & Architectures Laboratory (MESA), Department of Electrical and Computer Engineering
  Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding
  Liang-Kai Wang and Michael J. Schulte, University of Wisconsin-Madison
  ARITH-18, Montpellier, France
  This research is supported by the UW-Madison Graduate School and IBM.

  2. Outline
  • Motivation
  • Related Research
  • Algorithm for Decimal Floating-Point (DFP) Adder and Multifunction Unit
  • Hardware Design
  • Experimental Results and Analysis
  • Conclusions

  3. Motivation
  • Important in business applications: many decimal fractions have no exact binary representation, e.g. $0.2_{10} = 0.00110011\ldots_2$
  • The IEEE P754 floating-point standard defines three DFP formats:
    – 34-digit decimal128
    – 16-digit decimal64 (the format used in this paper)
    – 7-digit decimal32
  • Decimal floating-point software is slow
  • Transistor costs are decreasing
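The 0.2 example above can be checked with a short sketch. It expands 1/5 in binary with exact fractions, shows the resulting error in binary floating point, and repeats the computation with Python's `decimal` module at 16 digits of precision. That precision setting only mimics the decimal64 digit count (an assumption made for illustration); the exponent range and encoding of real decimal64 are not modeled, and `binary_fraction_digits` is a hypothetical helper.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def binary_fraction_digits(x: Fraction, n: int) -> str:
    """First n binary digits after the point of a value 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = 1 if x >= 1 else 0
        digits.append(str(bit))
        x -= bit
    return "".join(digits)


# 0.2 = 1/5 has a repeating binary expansion, so a binary double only approximates it
print(binary_fraction_digits(Fraction(1, 5), 12))  # 001100110011
print(0.2 * 3 == 0.6)                              # False: binary rounding error

# With 16 significant digits (the decimal64 digit count), decimal arithmetic is exact here
with localcontext() as ctx:
    ctx.prec = 16
    print(Decimal("0.2") * 3 == Decimal("0.6"))    # True
```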

  4. Previous Research and Proposed Design
  • Previous designs
    – Focus on fixed-point addition and subtraction, for example [Adiletta89] and [Schmookler71]
    – [Thompson04] presents the first IEEE P754-compliant DFP adder
  • We propose a DFP multifunction unit that
    – Supports eight DFP operations: add, sub, quantize, sameQuantum, roundToIntegral, minNum, maxNum, and compare
    – Optimizes significand alignment
    – Applies decimal injection-based rounding
    – Uses a decimal flag-tracing mechanism

  5. DFP Adder and Multifunction Unit
  • Inputs A and B: SA and SB are the signs, EA and EB the exponents, and CA and CB the significands of A and B
  • Datapath stages, in order:
    – Forward format conversion
    – Operand alignment
    – Pre-correction
    – Carry propagation network
    – Post-correction
    – Overflow detection
    – Shift and round
    – Backward format conversion

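  6. Operand Alignment
  • Decimal operands are not normalized, so alignment uses both the exponents (EA and EB) and the lengths of the leading zeros of the significands (LA and LB)
  • Operand alignment calculation:
    – If EA < EB, swap CA and CB, so that A has the larger exponent
    – If LA < |EA − EB|: left shift CA by LA digits and right shift CB by min(|EA − EB| − LA, 19) digits
    – Otherwise: left shift CA by |EA − EB| digits; CB is not shifted
  • Example with LA = 5 and EA − EB = 9:
    – $A = CA \times 10^{EA} = (0\ldots0\,a_{i-1}\ldots a_0) \times 10^{EA}$ is left shifted by 5 digits to $(a_{i-1}\ldots a_0\,00000) \times 10^{EA-5}$
    – $B = CB \times 10^{EB} = (0\ldots0\,b_{k-1}\ldots b_0) \times 10^{EB}$ is right shifted by 9 − 5 = 4 digits to $(0\ldots0\,b_{k-1}\ldots b_4) \times 10^{EB+4}$, with $b_3$ and $b_2$ moving into the guard (G) and round (R) positions and the remaining digits into the sticky (S) position
    – Both operands now share the exponent EA − 5 = EB + 4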
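The alignment flowchart above can be sketched in integer arithmetic. This is a minimal model under stated assumptions, not the paper's hardware: significands are Python integers of at most P = 16 digits, each aligned operand is returned as a 19-digit value (16 digits followed by G, R, and S), and the sticky digit is simply forced to 1 when non-zero digits are shifted out and S would otherwise be 0. The function name `align` is hypothetical.

```python
P = 16  # decimal64 precision in digits


def align(ca: int, ea: int, cb: int, eb: int, p: int = P) -> tuple[int, int, int]:
    """Align two unnormalized decimal operands (hypothetical model).

    Returns (A19, B19, e): both operands as (p + 3)-digit integers whose
    last three digits are G, R, S, and e, the exponent of the digit just
    above G. The value of each operand is approximately X19 * 10**(e - 3)."""
    if ea < eb:                              # make A the operand with the larger exponent
        ca, ea, cb, eb = cb, eb, ca, ea
    d = ea - eb
    la = p - len(str(ca)) if ca else p       # leading zeros of CA in a p-digit field
    if la >= d:
        # CA has room for the whole exponent difference: no digit of CB is lost
        return ca * 10**d * 1000, cb * 1000, eb
    # Left shift CA by LA, right shift CB by min(d - LA, p + 3) = min(d - LA, 19)
    k = min(d - la, p + 3)
    kept, lost = divmod(cb * 1000, 10**k)
    if lost and kept % 10 == 0:
        kept += 1                            # sticky: S becomes non-zero if anything was lost
    return ca * 10**la * 1000, kept, ea - la


# Slide example: CA has 11 digits (LA = 5) and EA - EB = 9, so CB moves right by 4 digits
print(align(12345678901, 9, 1234567890123456, 0))
```

The cap of 19 digits mirrors the slide; once CB has moved past all of the G, R, and S positions, further shifting can only affect the sticky digit.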

  7. Pre-correction
  • Effective operation: EOP = SA ⊕ SB ⊕ OP
  • Placing the operands according to the effective operation simplifies result shifting
  • Injecting a value into the R and S digit positions, chosen according to the rounding mode, replaces rounding with truncation
  • Example (effective add, roundTiesToAway): the injection value (R, S) = (5, 0) is placed in the R and S positions alongside operand A, and the sum with B is then simply truncated

  8. Pre-correction
  • Injection value (X = either sign):
      Sign_inj   Rounding mode   Injection (R, S)
      X          TowardZero      (0, 0)
      X          TieToAway       (5, 0)
      X          TieToZero       (4, 9)
      X          TieToEven       (5, 0)
      −          +∞              (0, 0)
      +          +∞              (9, 9)
      −          −∞              (9, 9)
      +          −∞              (0, 0)
      X          AwayZero        (9, 9)
  • Operands are corrected so that the binary carry-out of each 4-bit digit equals the decimal carry-out:
      $CA'_i = \begin{cases} CA_i + 6 & \text{if EOP = add} \\ CA_i & \text{otherwise} \end{cases} \qquad CB'_i = \begin{cases} CB_i & \text{if EOP = add} \\ \overline{CB_i} & \text{otherwise} \end{cases}$
    where $\overline{CB_i} = 15 - CB_i$ is the bitwise inversion of digit i, i.e. its nines' complement plus 6
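The injection table above can be checked in software. The sketch below assumes the kept result has a magnitude q followed by two digits R and S (written as the two-digit number rs), adds the (R, S) injection for the mode, and truncates. It then compares every case against Python's `decimal` rounding modes. The TieToEven fix-up (clearing an odd last digit on an exact tie) is an assumption for illustration; the table only specifies the (5, 0) injection. The function names and mode strings are hypothetical.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP)


def injection_value(mode: str, negative: bool) -> int:
    """(R, S) injection from the table, as the two-digit number 10*R + S."""
    if mode == "+inf":
        return 0 if negative else 99
    if mode == "-inf":
        return 99 if negative else 0
    return {"TowardZero": 0, "TieToAway": 50, "TieToZero": 49,
            "TieToEven": 50, "AwayZero": 99}[mode]


def round_by_injection(q: int, rs: int, mode: str, negative: bool = False) -> int:
    """Round the magnitude q.rs (rs = digits R and S, 0..99) by injection plus truncation."""
    rounded, rest = divmod(q * 100 + rs + injection_value(mode, negative), 100)
    if mode == "TieToEven" and rest == 0:
        rounded -= rounded % 2   # exact tie: clear an odd last digit (assumed fix-up)
    return rounded


# Reference: Python's decimal rounding modes (magnitude of the rounded signed value)
DECIMAL_MODES = {"TowardZero": ROUND_DOWN, "TieToAway": ROUND_HALF_UP,
                 "TieToZero": ROUND_HALF_DOWN, "TieToEven": ROUND_HALF_EVEN,
                 "AwayZero": ROUND_UP, "+inf": ROUND_CEILING, "-inf": ROUND_FLOOR}


def round_reference(q: int, rs: int, mode: str, negative: bool = False) -> int:
    value = Decimal(q * 100 + rs).scaleb(-2)
    if negative:
        value = -value
    return int(abs(value.quantize(Decimal(1), rounding=DECIMAL_MODES[mode])))


assert all(round_by_injection(q, rs, m, neg) == round_reference(q, rs, m, neg)
           for q in range(20) for rs in range(100)
           for m in DECIMAL_MODES for neg in (False, True))
```

Because S acts as a sticky digit, (4, 9) rounds up only when the discarded part is strictly greater than one half, which is exactly ties-toward-zero.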

  9. Carry Propagation Network
  • A Kogge-Stone parallel prefix network, 19 digits wide (digit positions 18 to 0; the lowest positions are labeled L, G, R, and S)
  • Two sets of flags:
    – Flag F1 handles the digit increment in the post-correction stage
    – Flag F2 handles the carry propagation from the injection correction value
  [Figure: the original K-S network (rows 0 to 6) produces the carry-out C1, the flags F1, and the uncorrected sum digits (UCR); post-correction (rows 7 and 8) produces the 16-digit result CR1; injection correction (rows 9 and 10), with the F2 block and a trailing-nine detection network feeding the shift and round unit, produces CR2.]
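A Kogge-Stone network computes all carries in log2(n) prefix levels from per-position generate and propagate signals. The sketch below works at digit granularity, which is an assumption for clarity (the slide does not give the bit-level structure): each input is a 4-bit pre-corrected digit, a digit generates a carry when the two digits sum to 16 or more, and it propagates one when they sum to exactly 15. The function name is hypothetical.

```python
def digit_carries_kogge_stone(x_digits: list[int], y_digits: list[int]) -> list[int]:
    """Carry out of each 4-bit digit of x + y (least significant digit first, carry-in 0)."""
    n = len(x_digits)
    g = [int(x + y >= 16) for x, y in zip(x_digits, y_digits)]   # digit generates a carry
    p = [int(x + y == 15) for x, y in zip(x_digits, y_digits)]   # digit propagates a carry
    dist = 1
    while dist < n:
        # One prefix level: merge each group with the group `dist` digits below it.
        # Both comprehensions read the previous level's g and p.
        g, p = (
            [g[i] | (p[i] & g[i - dist]) if i >= dist else g[i] for i in range(n)],
            [p[i] & p[i - dist] if i >= dist else p[i] for i in range(n)],
        )
        dist *= 2
    return g   # g[i] is the carry out of digit i, i.e. the carry into digit i + 1


# Pre-corrected 1234 + 5678 (digits of A plus 6, B unchanged), least significant digit first
print(digit_carries_kogge_stone([10, 9, 8, 7], [8, 7, 6, 5]))   # [1, 1, 0, 0]
```

For a 19-digit input the loop runs five levels (distances 1, 2, 4, 8, 16).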

  10. Post-correction
  • Compensates the result from the K-S network; $(C_1)_{i+1}$ is the carry out of digit i
  • Rule 1: the effective operation is ADD
    – Subtract 6 from each digit i for which $(C_1)_{i+1}$ is 0
  • Rule 2: the effective operation is SUB
    – If the result is positive: increment the result using F1, and subtract 6 from each digit i for which $(C_1)_{i+1} \oplus (F_1)_i \equiv 0$
    – If the result is negative: invert all bits of the result, and subtract 6 from each digit i for which $(C_1)_{i+1} \equiv 1$
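The pre-correction of slide 8 and the post-correction rules above can be modeled together. This is a minimal sketch for non-negative n-digit integers, not the paper's circuit. A plain ripple of 4-bit digit sums stands in for the K-S network. In the positive-subtraction case, the +1 increment is done as an ordinary addition after correction, whereas the slides fold it into the correction using flag F1. All names are hypothetical.

```python
def to_digits(x: int, n: int) -> list[int]:
    """n BCD digits of x, least significant digit first."""
    return [(x // 10**i) % 10 for i in range(n)]


def from_digits(digits: list[int]) -> int:
    return sum(d * 10**i for i, d in enumerate(digits))


def bcd_add_sub(a: int, b: int, n: int, subtract: bool) -> int:
    """n-digit BCD a + b or a - b via pre-correction, 4-bit digit addition, post-correction."""
    da, db = to_digits(a, n), to_digits(b, n)
    # Pre-correction: ADD adds 6 to A's digits; SUB inverts B's 4-bit digits (15 - d)
    if subtract:
        pa, pb = da, [15 - d for d in db]
    else:
        pa, pb = [d + 6 for d in da], db
    # 4-bit digit addition (ripple here; the design uses a Kogge-Stone network)
    sums, carry_out, carry = [], [], 0
    for x, y in zip(pa, pb):
        s = x + y + carry
        carry = s >> 4
        sums.append(s & 0xF)
        carry_out.append(carry)          # carry_out[i] plays the role of (C1)_{i+1}
    if not subtract:
        # Rule 1: subtract 6 from every digit that produced no carry
        digits = [s if c else s - 6 for s, c in zip(sums, carry_out)]
        return from_digits(digits) + carry * 10**n
    if carry:
        # Rule 2, positive result: correct digits without a carry, then add 1
        digits = [s if c else s - 6 for s, c in zip(sums, carry_out)]
        return from_digits(digits) + 1
    # Rule 2, negative result: invert each digit, subtract 6 where a carry occurred
    digits = [(15 - s) - 6 if c else 15 - s for s, c in zip(sums, carry_out)]
    return -from_digits(digits)


print(bcd_add_sub(1234, 5678, 4, subtract=False))  # 6912
print(bcd_add_sub(1234, 5678, 4, subtract=True))   # -4444
```

The two pre-corrections are chosen so that a 4-bit carry occurs exactly when the decimal digit sum reaches 10, which is why the post-correction only needs to look at the carry vector.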

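  11. Shift and Round
  • If the most significant digit of the result is zero, no action is needed
  • If the most significant digit is non-zero, the result is right shifted by one digit and the exponent is incremented, which requires an injection correction step
  • Example (P = 16 digits, effective add, TieToEven): the predicted result carries the injected (G, R, S) = (0, 5, 0); after the one-digit right shift the rounding digit is G, so the correction (4, 5, 0) is added, giving the (5, 0, 0) needed for the real result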

  12. Shift and Round
  • Injection correction value for each rounding mode (X = either sign):
      Sign_inj   Rounding mode   Correction (G, R, S)
      X          TowardZero      (0, 0, 0)
      X          TieToAway       (4, 5, 0)
      X          TieToZero       (4, 5, 0)
      X          TieToEven       (4, 5, 0)
      −          +∞              (0, 0, 0)
      +          +∞              (9, 0, 0)
      −          −∞              (9, 0, 0)
      +          −∞              (0, 0, 0)
      X          AwayZero        (9, 0, 0)
  • The injection correction value may trigger carry propagation
  • Flag F2 eliminates this carry propagation
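The shift-and-round step can be sketched by combining the (R, S) injection of slide 8 with the (G, R, S) correction above. The digit layout is an assumption inferred from the two tables, not stated on the slides: for an effective add, a reserved most significant digit sits above P digits that end at position G, followed by R and S, so without a carry-out the rounding digit is R, and after a one-digit right shift it becomes G. In every mode, injection plus correction equals the injection needed at G, for example 50 + 450 = 500. The TieToEven even fix-up is again an assumption, and the names are hypothetical.

```python
def injection_and_correction(mode: str, negative: bool) -> tuple[int, int]:
    """Injection (R, S) and correction (G, R, S) from the two tables, as integers,
    e.g. (5, 0) -> 50 and (4, 5, 0) -> 450."""
    if mode in ("+inf", "-inf"):
        away = (mode == "+inf") != negative   # directed rounding moves the magnitude up
        return (99, 900) if away else (0, 0)
    return {"TowardZero": (0, 0), "TieToAway": (50, 450), "TieToZero": (49, 450),
            "TieToEven": (50, 450), "AwayZero": (99, 900)}[mode]


def add_and_round(x: int, p: int, mode: str, negative: bool = False) -> tuple[int, int]:
    """Round the exact sum magnitude x to p digits by injection plus truncation.

    Assumed layout (hypothetical): x is counted in units of the S position; a
    reserved MSD sits above p digits that end at G, followed by R and S.
    Returns (significand, exponent increment)."""
    inj, corr = injection_and_correction(mode, negative)
    predicted = x + inj
    if predicted < 10 ** (p + 2):
        # MSD is zero: keep the p digits ending at G, drop R and S
        sig, rest = divmod(predicted, 100)
        shift = 0
    else:
        # MSD is non-zero: right shift one digit, so add the correction and drop G, R, S
        sig, rest = divmod(predicted + corr, 1000)
        shift = 1
    if mode == "TieToEven" and rest == 0 and sig % 2 == 1:
        sig -= 1   # exact tie: force an even last digit (assumed fix-up, not from the slides)
    return sig, shift


# p = 2: 99.50 rounds up to 100, which no longer fits, so the shifted path is taken
print(add_and_round(9950, 2, "TieToEven"))   # (10, 1), i.e. 10 x 10^1
print(add_and_round(12345, 2, "TieToEven"))  # (12, 1): 123.45 -> 120
```

In hardware, adding the correction can ripple a carry through trailing nines of the significand; the slides use flag F2 to avoid that propagation, which this sketch does not model.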

  13. Comparison: Thompson's design vs. this design
  • Supported DFP operations: 2 (add, subtract) vs. 8 (add, subtract, minNum, maxNum, compare, quantize, sameQuantum, roundToIntegral)
  • Internal format: excess-3 encoding vs. BCD encoding
  • Operand alignment: exponent computation and leading-zero detection (LZD) in series vs. in parallel
  • Carry-propagate network: Kogge-Stone with flag tracing for post-correction vs. two extra flags for rounding
  • Rounding: random logic and a decimal incrementer vs. injection-based rounding with correction
  • Overflow detection: after the result is rounded vs. before the result is rounded

  14. Extension to Support More DFP Operations
  • ToIntegralValue(A): round A to an integer value
    – ToIntegralValue($13545 \times 10^{-3}$) = 14 with round-ties-to-even
    – Design strategy: set CB1 and EB1 to zero; enable right shift even if CB1 = 0; set the effective operation to ADD
  • Quantize(A, B): change EA to EB
    – Quantize($12345 \times 10^{-4}$, $1 \times 10^{-2}$) = $123 \times 10^{-2}$ with round-down
    – Design strategy: set CB1 to zero; enable right shift even if CB1 = 0; set the effective operation to ADD
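The design strategy above reuses the adder: B becomes a zero operand, and the right shift plus injection rounding produce the quantized result. The sketch below models only the case EB ≥ EA, with a subset of rounding modes. It right shifts CA into R and a sticky S, injects as in the slide 8 table, and truncates. The function name, the mode subset, and the TieToEven even fix-up are assumptions. Python's `decimal` results are printed alongside for comparison.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

# (R, S) injections as 10*R + S, for a few modes from the injection table
INJECTION = {"TowardZero": 0, "TieToAway": 50, "TieToEven": 50}


def quantize_via_datapath(ca: int, ea: int, target_exp: int, mode: str) -> tuple[int, int]:
    """Hypothetical model of Quantize(A, B) with EB = target_exp >= EA: align A
    against a zero operand with exponent target_exp, then round by injection."""
    k = target_exp - ea
    assert k >= 0, "this sketch only covers the right-shift case"
    kept, lost = divmod(ca * 100, 10 ** k)   # kept digits end with R, S; lost digits feed sticky
    q, rs = divmod(kept, 100)
    if lost and rs % 10 == 0:
        rs += 1                               # sticky: make S non-zero
    rounded, rest = divmod(q * 100 + rs + INJECTION[mode], 100)
    if mode == "TieToEven" and rest == 0:
        rounded -= rounded % 2                # exact tie: force an even last digit (assumed)
    return rounded, target_exp


# Slide examples: ToIntegralValue is Quantize with a target exponent of 0
print(quantize_via_datapath(12345, -4, -2, "TowardZero"))   # (123, -2)
print(quantize_via_datapath(13545, -3, 0, "TieToEven"))     # (14, 0)

# The same results from Python's decimal module
print(Decimal("1.2345").quantize(Decimal("1E-2"), rounding=ROUND_DOWN))   # 1.23
print(Decimal("13.545").to_integral_value(rounding=ROUND_HALF_EVEN))      # 14
```

Note that `ROUND_DOWN` in Python's `decimal` rounds toward zero, which matches the slide's round-down for these positive operands.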

  15. Extension to Support More DFP Operations
  • SameQuantum(A, B): check whether EA ≡ EB
    – Generate an extra flag in the operand alignment stage
  • minNum, maxNum, and compare use the original datapath
  • Many changes are made to the exception flag logic
  • A post-processing unit is added to handle special operands such as infinity and Not-a-Number (NaN)
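For the remaining operations, a software reference helps when building test vectors. The slides test against decNumber; Python's `decimal` module is based on the same General Decimal Arithmetic specification. The sketch below shows its versions of sameQuantum, compare, and max/min as a reference point only. It does not model the multifunction unit, and details such as signed zeros may differ from IEEE minNum and maxNum.

```python
from decimal import Decimal

a, b = Decimal("1.20"), Decimal("1.2")   # same value, exponents -2 and -1

print(a.same_quantum(b))                     # False: the exponents differ
print(a.same_quantum(Decimal("9.99")))       # True: both have exponent -2
print(a.compare(b))                          # 0: numerically equal
print(Decimal("-3").compare(Decimal("2")))   # -1
print(Decimal("2.5").max(Decimal("-7")))     # 2.5
print(Decimal("2.5").min(Decimal("-7")))     # -7
print(Decimal("NaN").max(Decimal("1")))      # 1: a quiet NaN operand is ignored
```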

  16. Block Diagram of the DFP Adder and Multifunction Unit
  [Figure: operands Op A and Op B pass through forward format conversion, operand alignment calculation and swapping, barrel shifters (LSA, RSA), pre-correction and operand placement, the K-S network, post-correction, and shift and round; the operation and rounding mode are control inputs, and sign and overflow logic, backward format conversion, and IEEE P754 post-processing produce the result Z.]

  17. Hardware Implementation
  • Modeled in RTL Verilog and simulated using ModelSim
  • Synthesized using LSI Logic's 0.11 µm standard cell library and Synopsys Design Compiler
  • Tested using a comprehensive testbench generator and the decNumber library 3.32

  18. Delay and Area Comparison
  • Combinational circuit designs

  Table 1. Improvement over Thompson's Design
      Metric          Thompson's adder        Injection-based adder   Improvement
      Delay (comb.)   3.50 ns, 63.6 FO4       2.76 ns, 50.2 FO4       21.0%
      Area            22443 NAND eq. gates    22086 NAND eq. gates    1.6%

  Table 2. Overhead of the Multifunction Unit Compared to the Injection-based Adder
      Metric   Injection-based adder   Multifunction unit      Overhead
      Delay    2.76 ns, 50.2 FO4       2.84 ns, 51.6 FO4       2.8%
      Area     22086 NAND eq. gates    24233 NAND eq. gates    9.7%

  19. Cycle Times vs. Pipeline Depth
  • Synthesized using the pipeline_design command of the Synopsys Design Compiler
  [Figure: cycle time (FO4) and area (NAND2 gate equivalents) versus the number of pipeline stages, 1 to 6.]

  20. Conclusion
  • A 16-digit DFP adder and multifunction unit compliant with the IEEE P754 standard
  • Novel features:
    – Delay optimization in the operand alignment, rounding, and overflow detection units
    – A modified injection-based rounding method
    – Extensions to support multiple DFP operations
  • Design analysis:
    – 21% delay improvement over Thompson's design
    – 2.8% delay overhead for the DFP multifunction unit

  21. Questions?
