Madison Embedded Systems & Architectures Laboratory



slide-1
SLIDE 1

This research is supported by the UW-Madison Graduate School and IBM

Department of Electrical and Computer Engineering

Madison Embedded Systems & Architectures Laboratory (MESA)

Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding

Liang-Kai Wang and Michael J. Schulte University of Wisconsin-Madison

ARITH-18, Montpellier, France


slide-2
SLIDE 2


Outline

  • Motivation
  • Related Research
  • Algorithm for Decimal Floating-Point (DFP) Adder and Multifunction Unit

  • Hardware Design
  • Experimental Results and Analysis
  • Conclusions
slide-3
SLIDE 3

Motivation

  • Important in business applications: many decimal fractions have no exact binary representation, e.g. $0.2_{10} = 0.00110011\ldots_2$
  • The IEEE P754 floating-point standard

– Three DFP formats: 34-digit decimal128 format, 16-digit decimal64 format (this paper), and 7-digit decimal32 format

  • Decimal floating-point software is slow
  • Decreasing transistor costs
slide-4
SLIDE 4


Previous Research and Proposed Design

  • Previous designs

– Focus on fixed-point addition and subtraction

  • For example, [Adiletta89], [Schmookler71]

– [Thompson04] presents the first IEEE P754 compliant DFP adder

  • We propose a DFP multifunction unit that

– Supports eight DFP operations

  • add, sub, quantize, sameQuantum, roundToIntegral, minNum, maxNum, and compare

– Optimizes significand alignment
– Applies decimal injection-based rounding
– Uses a decimal flag-tracing mechanism

slide-5
SLIDE 5

DFP Adder and Multifunction Unit

Datapath stages, from operands A and B to result S:

  • Forward format conversion
  • Operand alignment
  • Pre-correction
  • Carry propagation network
  • Post-correction
  • Overflow detection
  • Shift and round
  • Backward format conversion

Notation: SA = sign of A, SB = sign of B, EA = exponent of A, EB = exponent of B, CA = significand of A, CB = significand of B

slide-6
SLIDE 6

Operand Alignment

  • Decimal operands are not normalized
  • Operand alignment calculation
  • E.g. LA = 5, EA – EB = 9

[Figure: A = CA × 10^EA = 0…0 a(i-1) … a0 × 10^EA, with LA leading zeros in P digits; B = CB × 10^EB = 0…0 b(k-1) … b0 × 10^EB, with LB leading zeros. After alignment, A becomes a(i-1) … a0 0 0 0 0 0 × 10^(EA-5) and B becomes b(k-1) … b4 × 10^(EB+4), with the shifted-out digits b3 b2 … falling into the G, R, and S positions; the result has exponent EB+4.]

Flowchart (inputs: exponents EA and EB, and lengths of leading zeros LA and LB):

  • EA < EB?

– YES: swap CA and CB

  • LAS < |EA – EB|?

– YES: left shift CAS by LAS; right shift CBS by min(|EA – EB| – LAS, 19)
– NO: left shift CAS by (LAS – |EA – EB|)
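The shift-amount calculation can be illustrated with a small software model. This is a minimal sketch, not the hardware described in the talk: the function name `alignment_shifts` and its return tuple are hypothetical, and for the case where the leading-zero count is at least |EA – EB| the sketch assumes a left shift of |EA – EB| digits, which is the amount needed to equalize the exponents.

```python
def alignment_shifts(ea, eb, la, lb, max_right_shift=19):
    """Return (swapped, left_shift, right_shift, aligned_exponent).

    ea, eb: exponents of A and B; la, lb: leading-zero counts of CA and CB.
    Hypothetical helper for illustration only.
    """
    swapped = ea < eb
    if swapped:
        # Make A the operand with the larger exponent.
        ea, eb = eb, ea
        la, lb = lb, la
    diff = ea - eb
    # Assumption: never shift left past the leading zeros of the larger-exponent operand.
    left = min(la, diff)
    # Whatever exponent difference remains is absorbed by right-shifting the other operand.
    right = min(diff - left, max_right_shift)
    return swapped, left, right, ea - left


# Slide example: LA = 5 and EA - EB = 9 give a 5-digit left shift and a 4-digit right shift.
print(alignment_shifts(9, 0, 5, 0))
```

With the slide's numbers the aligned exponent is EA − 5 = EB + 4, matching the figure.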

slide-7
SLIDE 7

Pre-correction

  • Effective operation = SA ⊕ SB ⊕ OP
  • Placing the operands according to the effective operation simplifies result shifting
  • Injecting a value into the R and S digit positions, chosen by the rounding mode, replaces rounding by truncation

[Figure: effective add under roundTiesToAway. Operands A and B are summed with the injection value (5, 0) placed in the R and S positions; the L, G, R, and S digit positions of the result are marked.]

slide-8
SLIDE 8

Pre-correction

  • Injection value

Signinj   Rounding Mode   Injection Value (R, S)
X         TowardZero      (0, 0)
X         TieToAway       (5, 0)
X         TieToZero       (4, 9)
X         TieToEven       (5, 0)
−         +∞              (0, 0)
+         +∞              (9, 9)
−         −∞              (9, 9)
+         −∞              (0, 0)
X         AwayZero        (9, 9)

  • Operands are corrected to generate correct carry-out

$$(CA_3)_i = \begin{cases} (CA'_2)_i + 6 & \text{if EOP = add} \\ (CA'_2)_i & \text{otherwise} \end{cases}$$

$$(CB_3)_i = \begin{cases} (CB'_2)_i & \text{if EOP = add} \\ \overline{(CB'_2)_i} & \text{otherwise} \end{cases}$$
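The idea that an injection turns rounding into truncation can be checked with a short software model. This is a sketch under stated assumptions: `round_by_injection` and the `INJECTION` dictionary are hypothetical names, the magnitude is an integer whose two lowest decimal digits stand in for the R and S positions, and the tie fix-up for TieToEven is an assumed extra step that this slide does not describe.

```python
# Injection values (R, S) per rounding mode and sign of the result.
INJECTION = {
    "TowardZero": {"+": (0, 0), "-": (0, 0)},
    "TieToAway":  {"+": (5, 0), "-": (5, 0)},
    "TieToZero":  {"+": (4, 9), "-": (4, 9)},
    "TieToEven":  {"+": (5, 0), "-": (5, 0)},
    "+inf":       {"+": (9, 9), "-": (0, 0)},
    "-inf":       {"+": (0, 0), "-": (9, 9)},
    "AwayZero":   {"+": (9, 9), "-": (9, 9)},
}


def round_by_injection(magnitude, sign, mode):
    """Round magnitude / 100 to an integer magnitude by injection and truncation."""
    r, s = INJECTION[mode][sign]
    injected = magnitude + 10 * r + s
    result, discarded = divmod(injected, 100)  # truncation drops the R and S digits
    # Assumed fix-up: an exact tie leaves 00 behind after injecting (5, 0);
    # in that case force the kept value to be even.
    if mode == "TieToEven" and discarded == 0 and result % 2 == 1:
        result -= 1
    return result


print(round_by_injection(250, "+", "TieToAway"))  # 2.50 rounds to 3
print(round_by_injection(250, "+", "TieToZero"))  # 2.50 rounds to 2
```

For every mode except TieToEven the rounded value falls out of the addition alone, which is why the rounding step reduces to dropping digits.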

slide-9
SLIDE 9

Carry Propagation Network

  • Kogge-Stone parallel prefix network
  • Two sets of flags

– Flag F1 handles the digit increment in the post-correction stage.
– Flag F2 handles the carry propagation from the injection correction value.

[Figure: Kogge-Stone network over 19 digits (digit positions 18 down to 1 together with the L, G, R, and S positions), drawn as rows 0 to 10. The original KS network is extended with a Trailing Nine Detection Network (F2), an Injection Correction Block, and post-correction logic, including a separate post-correction for the LSD. Outputs: carry-outs (C1), flags (F1), and sum digits (UCR); post-correction produces the 16-digit CR1, and the Shift and Round Unit produces the 16-digit CR2.]
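For readers unfamiliar with the Kogge-Stone structure, the sketch below computes all carries of a binary addition in a logarithmic number of prefix levels. It is a generic bit-level model with the hypothetical name `kogge_stone_carries`; the F1 and F2 flag logic of the design is not modeled.

```python
def kogge_stone_carries(a, b, width, carry_in=0):
    """Return a list where entry i is the carry out of bit i of a + b + carry_in."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate bits
    g[0] |= p[0] & carry_in                                # fold the carry-in into bit 0
    dist = 1
    while dist < width:
        new_g, new_p = g[:], p[:]
        for i in range(dist, width):
            # Combine the group ending at bit i with the group ending at bit i - dist.
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2                                          # span doubles at every level
    return g


print(kogge_stone_carries(0b0111, 0b0001, 4))
```

Each level doubles the span of the generate and propagate groups, so 76 bits (19 BCD digits) need seven levels in this plain binary form.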

slide-10
SLIDE 10


Post-correction

  • Compensate the result from the K-S network
  • Rule 1: effective operation is ADD

– Subtract 6 from digit i for which (C1)i+1 is 0

  • Rule 2: effective operation is SUB

– If the result is positive

  • Increment the result using F1
  • Subtract 6 from digit i for which (C1)i+1 ⊕ (F1)i ≡ 0

– If the result is negative

  • Invert all bits of the result
  • Subtract 6 from digit i for which (C1)i+1 ≡ 1
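Rule 1 can be tried out in software. The sketch below adds two BCD numbers digit by digit with a +6 pre-correction and then subtracts 6 from every digit that produced no carry-out. The helper names (`to_digits`, `from_digits`, `bcd_add`) are hypothetical, digits are stored least significant first, and a ripple loop stands in for the parallel prefix network.

```python
def to_digits(n, width):
    """Decimal digits of n, least significant first."""
    return [(n // 10 ** k) % 10 for k in range(width)]


def from_digits(digits):
    return sum(d * 10 ** k for k, d in enumerate(digits))


def bcd_add(a_digits, b_digits):
    """Effective ADD on BCD digit lists; returns (result_digits, carry_out)."""
    sums, carries, carry = [], [], 0
    for a, b in zip(a_digits, b_digits):
        t = (a + 6) + b + carry   # pre-correction adds 6 to each digit of A
        carry = t >> 4            # a hex carry now coincides with a decimal carry
        sums.append(t & 0xF)
        carries.append(carry)
    # Post-correction: remove the extra 6 from digits that did not carry out.
    result = [s - 6 if c == 0 else s for s, c in zip(sums, carries)]
    return result, carry


digits, carry_out = bcd_add(to_digits(135, 3), to_digits(424, 3))
print(from_digits(digits), carry_out)
```

The +6 makes a 4-bit digit overflow exactly when the decimal digit sum reaches 10, so the binary carry chain can be reused unchanged.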
slide-11
SLIDE 11

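Shift and Round

  • Most significant digit is zero

– No action is needed

  • Most significant digit is non-zero

– Requires an injection correction step

[Figure: effective add under TieToEven with P = 16 digits. The predicted result of A + B carries the injection (5, 0) in the R and S positions. When the real result has a non-zero most significant digit, the significand is shifted right by one digit, the exponent is incremented, and the correction (4, 5, 0) is applied at the G, R, and S positions.]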

slide-12
SLIDE 12

Shift and Round

  • Injection correction value for different rounding modes
  • Injection correction value may trigger carry propagation
  • Flag F2 eliminates carry propagation

Signinj   Rounding Mode   Injection Correction Value (G, R, S)
X         TowardZero      (0, 0, 0)
X         TieToAway       (4, 5, 0)
X         TieToZero       (4, 5, 0)
X         TieToEven       (4, 5, 0)
−         +∞              (0, 0, 0)
+         +∞              (9, 0, 0)
−         −∞              (9, 0, 0)
+         −∞              (0, 0, 0)
X         AwayZero        (9, 0, 0)

slide-13
SLIDE 13

Comparison

  • Internal format

– Thompson’s design: Excess-3 encoding
– This design: BCD encoding

  • Operand alignment

– Thompson’s design: Exponent computation and LZD in series
– This design: Exponent computation and LZD in parallel

  • Carry-propagate network

– Thompson’s design: Kogge-Stone with flag tracing for post-correction
– This design: Two extra flags for rounding

  • Overflow detection

– Thompson’s design: After result is rounded
– This design: Before the result is rounded

  • Supported DFP operations

– Thompson’s design: 2: add, subtract
– This design: 8: add, subtract, minNum, maxNum, compare, quantize, sameQuantum, roundToIntegral

  • Rounding

– Thompson’s design: Random logic and decimal incrementer
– This design: Injection-based rounding with correction

slide-14
SLIDE 14

Extension to Support More DFP Operations

  • ToIntegralValue(A)

– Round A to an integer value

    • ToIntegralValue(13545 × 10^-3) = 14 with round-ties-to-even

– Design strategy

    • Set CB1 and EB1 to zero
    • Enable right shift even if CB1 = 0
    • Set effective operation to ADD

  • Quantize(A, B)

– Change EA to EB

    • Quantize(12345 × 10^-4, 1 × 10^-2) = 123 × 10^-2 with round-down

– Design strategy

    • Set CB1 to zero
    • Enable right shift even if CB1 = 0
    • Set effective operation to ADD
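The two numeric examples on this slide can be reproduced in software with Python's standard `decimal` module, which implements the same family of decimal operations. This only illustrates the expected results of the operations; it says nothing about how the hardware unit computes them.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

# ToIntegralValue(13545 x 10^-3) with round-ties-to-even
a = Decimal("13.545")
integral = a.to_integral_value(rounding=ROUND_HALF_EVEN)

# Quantize(12345 x 10^-4, 1 x 10^-2) with round-down (truncation)
b = Decimal("1.2345")
quantized = b.quantize(Decimal("1E-2"), rounding=ROUND_DOWN)

print(integral)              # integer-valued result
print(quantized.as_tuple())  # coefficient digits and exponent of the quantized value
```

The quantized result keeps the exponent of the second operand, which is what "Change EA to EB" describes.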
slide-15
SLIDE 15

Extension to Support More DFP Operations

  • SameQuantum(A, B)

– Check if EA ≡ EB
– Generate an extra flag in the operand alignment stage

  • minNum, maxNum, and compare use the original datapath
  • Many changes are made to the exception flag logic
  • A post-processing unit is added to handle special operands such as infinity and Not-a-Number
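A software analogue of these operations, restricted to finite operands, might look as follows. The function names are hypothetical, Python's `decimal` module stands in for the datapath, and deriving compare, minNum, and maxNum from the sign of a subtraction is the approach assumed here; special operands (infinity, NaN) are deliberately left out.

```python
from decimal import Decimal


def same_quantum(a, b):
    """True when both finite operands have the same exponent (EA == EB)."""
    return a.as_tuple().exponent == b.as_tuple().exponent


def compare_by_sub(a, b):
    """Return -1, 0, or 1 from the sign of a - b (finite operands only)."""
    d = a - b
    if d.is_zero():
        return 0
    return -1 if d.is_signed() else 1


def min_num(a, b):
    return a if compare_by_sub(a, b) <= 0 else b


def max_num(a, b):
    return a if compare_by_sub(a, b) >= 0 else b


print(same_quantum(Decimal("1.20"), Decimal("0.01")))
print(compare_by_sub(Decimal("1.5"), Decimal("2")))
```

Because sameQuantum only inspects exponents, 1.2 and 1.20 compare as equal in value yet do not share a quantum.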

slide-16
SLIDE 16

Block Diagram of the DFP Adder and Multifunction Unit

[Block diagram: Op A and Op B enter Forward Format Conversion, followed by Operand Alignment Calculation and Swapping, Barrel Shifters, Pre-correction and Operand Placement, K-S Network, Post-correction, Shift and Round, Backward Format Conversion, and Post-processing, which produces the IEEE P754 result (Z). Sign and Overflow blocks run alongside, with Rounding Mode and Operation as control inputs. Labeled signals: SA1, SB1, EA1, EB1, CA1, CB1, CAS, CBS, LSA, RSA, CA2, CB2, CA3, CB3, C1, F1, F2, UCR, CR1, CR2, R1, ER1, ER2, SR1.]

slide-17
SLIDE 17

Hardware Implementation

  • Modeled using RTL Verilog and simulated using ModelSim
  • Synthesized using LSI Logic’s 0.11 µm Standard Cell Library and Synopsys Design Compiler
  • Tested using a comprehensive testbench generator and the decNumber library 3.32

slide-18
SLIDE 18

Delay and Area Comparison

Table 1. Improvement over Thompson’s Design

Metric          Thompson’s adder        Injection-based adder   Improvement
Delay (comb.)   3.50 ns, 63.6 FO4       2.76 ns, 50.2 FO4       21.0%
Area            22443 NAND eq. gates    22086 NAND eq. gates    1.6%

Table 2. Overhead of the Multifunction Unit Compared to the Injection-based Adder

Metric   Injection-based adder   Multifunction Unit      Overhead
Delay    2.76 ns, 50.2 FO4       2.84 ns, 51.6 FO4       2.8%
Area     22086 NAND eq. gates    24233 NAND eq. gates    9.7%

  • Combinational circuit designs
slide-19
SLIDE 19

Cycle Times vs. Pipeline Depth

  • Synthesized using the pipeline_design command from the Synopsys Design Compiler

[Figure: cycle time (FO4) versus number of pipeline stages, 1 to 6.]

[Figure: area (NAND2 gate equivalents, axis from 20000 to 120000) versus number of pipeline stages, 1 to 6.]

slide-20
SLIDE 20

Conclusion

  • A 16-digit DFP adder and multifunction unit compliant with the IEEE P754 standard
  • Novel features:

– Delay optimization in the operand alignment, rounding, and overflow detection units
– A modified injection-based rounding method
– Extensions to support multiple DFP operations

  • Design analysis

– 21% delay improvement over Thompson’s design
– 2.8% delay overhead for DFP multifunction unit

slide-21
SLIDE 21


Questions?

slide-22
SLIDE 22


Backup Slide

  • More on Forward Conversion
  • More on Operand Alignment
  • More on Post-correction
  • More on Carry Propagation Network
  • More on Overflow Detection
  • More on Sign and Backward Conversion
  • More on Extension to Support More DFP Operations
  • More on Area Comparison
slide-23
SLIDE 23

Forward Format Conversion

  • Extract sign bits, biased exponents, and significands from operands in the IEEE format

– Combination field G contains the classification of a number, the encoding information, the most significant digit of the significand, and a biased exponent.
– Trailing significand field T encodes a significand using Densely Packed Decimal (DPD) encoding. DPD encoding represents three digits using ten bits.

  • Convert significands in DPD encoding to the BCD encoding
  • Generate flag signals for special operands (signaling NaN, quiet NaN, zero, and infinity)

decimal64 operand:

Field   Sign S   Combination G          Trailing Significand T
Width   1 bit    (w+5) bits = 13 bits   t = 10J bits = 50 bits (3J digits = 15 digits)
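The field extraction and DPD-to-digit conversion can be sketched in software as below. The function names are hypothetical, the exponent bias of 398 and the declet decoding cases follow the standard decimal64 DPD encoding rather than anything shown on the slide, and special operands are rejected with an exception instead of being turned into flag signals.

```python
BIAS = 398  # decimal64 exponent bias


def decode_declet(d):
    """Decode one 10-bit DPD declet into three decimal digits."""
    b = [(d >> i) & 1 for i in range(10)]  # b[0] is the least significant bit

    def small(hi, mid, lo):
        return (hi << 2) | (mid << 1) | lo

    if b[3] == 0:  # three small digits (0 to 7)
        return small(b[9], b[8], b[7]), small(b[6], b[5], b[4]), small(b[2], b[1], b[0])
    sel = (b[2] << 1) | b[1]
    if sel == 0b00:
        return small(b[9], b[8], b[7]), small(b[6], b[5], b[4]), 8 + b[0]
    if sel == 0b01:
        return small(b[9], b[8], b[7]), 8 + b[4], small(b[6], b[5], b[0])
    if sel == 0b10:
        return 8 + b[7], small(b[6], b[5], b[4]), small(b[9], b[8], b[0])
    sel2 = (b[6] << 1) | b[5]
    if sel2 == 0b00:
        return 8 + b[7], 8 + b[4], small(b[9], b[8], b[0])
    if sel2 == 0b01:
        return 8 + b[7], small(b[9], b[8], b[4]), 8 + b[0]
    if sel2 == 0b10:
        return small(b[9], b[8], b[7]), 8 + b[4], 8 + b[0]
    return 8 + b[7], 8 + b[4], 8 + b[0]


def forward_convert(word):
    """Split a 64-bit decimal64 word into (sign, unbiased exponent, 16 digits)."""
    sign = (word >> 63) & 1
    comb = (word >> 50) & 0x1FFF        # 13-bit combination field G
    trailing = word & ((1 << 50) - 1)   # 50-bit trailing significand T
    top5, exp_low = comb >> 8, comb & 0xFF
    if top5 >> 1 == 0b1111:
        raise ValueError("infinity or NaN: not handled in this sketch")
    if top5 >> 3 == 0b11:               # leading digit is 8 or 9
        exp_high, msd = (top5 >> 1) & 0b11, 8 + (top5 & 1)
    else:                               # leading digit is 0 to 7
        exp_high, msd = top5 >> 3, top5 & 0b111
    digits = [msd]
    for k in range(4, -1, -1):          # five declets, most significant first
        digits.extend(decode_declet((trailing >> (10 * k)) & 0x3FF))
    return sign, ((exp_high << 8) | exp_low) - BIAS, digits


print(forward_convert(0x2238000000000001))
```

The digit list is most significant first, so the 16 entries correspond directly to the 64-bit BCD significand used inside the datapath.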

slide-24
SLIDE 24

Operand Alignment and Pre-correction

  • Operands are shifted using one 16-digit left-shift and one 18-digit right-shift decimal barrel shifter.
  • Guard and round digits and a sticky bit are generated. CB becomes an 18-digit operand with a sticky bit.
  • Operands are placed based on the effective operation flag to simplify the rounding.

[Figure: operand placement in the 19-digit datapath (16 digits followed by the 3 positions G, R, and S; L marks the least significant of the 16 digits). Addition: CA'2 holds CA2[63:0] in the 16-digit part; CB'2 holds CB2[71:4] with OR{CBS[3:0], sticky} in the S position. Subtraction: CA'2 holds CA2[63:0] in the 16-digit part; CB'2 holds CB2[71:0] with the sticky bit in the S position.]

slide-25
SLIDE 25

Operand Alignment and Pre-correction

[Block diagram: operand alignment logic. Blocks: two LZDs, Significand Swapping, SUB-ABS, two SUB units, three MUXes, and a Right Shift Corrector. Signals: CA1, CB1, EA1, EB1, LA1, LB1, LAS, EAS, |EA1-EB1|, |EA1-EB1|-LAS, swap select, CAS, CBS, LSA, RSA, ER1.]

slide-26
SLIDE 26

Validity of Post-correction

  • Add:

– At pre-correction: A_i + B_i + 6
– If the digit carry is 0, then A_i + B_i + C_{i-1} < 10; subtract 6 from Sum_i

  • Sub:

– Expect: A + (10…0 − B)
– At pre-correction: A + (9…9 − B) + 6…6
– If the carry out of the MSD is 1,

    • Result is positive. Add the late carry-in from the LSD.
    • If the digit sum after incrementing the late carry-in is less than 10 (A + (9 − B) + C < 10, i.e. A + (9 − B) + 6 + C < 16), subtract 6 from Sum

slide-27
SLIDE 27

Validity of Post-correction

– Else

    • Result is negative. Invert Sum: Sum_i = 15 − (A_i + 15 − B_i + C_{i-1}) = B_i − A_i − C_{i-1}
    • If B_i − (A_i + C_{i-1}) < 0

        – Need to borrow from the next digit
        – 25 ≥ 15 − [B_i − (A_i + C_{i-1})] ≥ 16, so 9 ≥ Sum_i ≥ 0. This generates a carry to the next digit.
        – After inverting, F ≥ Sum_i' ≥ 6. Need to subtract 6.

    • Else,

        – No borrow from the next digit
        – 15 ≥ 15 − [B_i − (A_i + C_{i-1})] ≥ 6. No carry is generated.
        – After inverting, 9 ≥ Sum_i' ≥ 0. No subtraction is needed.

– E.g. 135 − 424 = 135 + bdb = d10 with 011 as borrow signals. After inversion, d10 → 2ef. Subtracting six from the two LSDs gives 2ef → 289.
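The subtraction rules, including the 135 − 424 example, can be replayed with a short model. This is a sketch with hypothetical names: digits are stored least significant first, a ripple loop replaces the prefix network, and for a positive result the sum is simply recomputed with a carry-in of 1 instead of being incremented through the F1 flags. A zero result is reported with sign 0, which is a simplification.

```python
def bcd_sub(a_digits, b_digits):
    """Effective SUB on BCD digit lists; returns (sign, magnitude_digits)."""

    def add_inverted(carry_in):
        sums, carries, carry = [], [], carry_in
        for a, b in zip(a_digits, b_digits):
            t = a + (15 - b) + carry  # 15 - b is the bitwise inverse of the 4-bit digit b
            carry = t >> 4
            sums.append(t & 0xF)
            carries.append(carry)
        return sums, carries, carry

    sums, carries, carry_out = add_inverted(0)
    if carry_out == 1:
        # Positive result: add the late carry-in, then subtract 6 where no digit carry occurs.
        sums, carries, _ = add_inverted(1)
        return 0, [s - 6 if c == 0 else s for s, c in zip(sums, carries)]
    # Negative (or zero) result: invert every digit, then subtract 6 where a digit carry occurred.
    inverted = [15 - s for s in sums]
    digits = [v - 6 if c == 1 else v for v, c in zip(inverted, carries)]
    return (1 if any(digits) else 0), digits


# Slide example: 135 - 424 has magnitude 289 with a negative sign.
print(bcd_sub([5, 3, 1], [4, 2, 4]))
```

In the example the intermediate hex sum is d10 with digit carries 0, 1, 1 (most significant first), so only the two low digits lose 6 after inversion.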

slide-28
SLIDE 28

Carry Propagation Network

  • Use Kogge-Stone parallel prefix network
  • Three sets of flags in addition to the carry bits are generated.
  • Flag F1 handles the digit increment in the post-correction stage and is generated from the propagate bits.
  • Flag F2 traces the trailing nines of the result before the post-correction stage.

[Equations: definitions of (F1)_i and (F2)_i. F2 takes flagADD when EOP = ADD and flagSUB when EOP = SUB; flagADD and flagSUB are built from the UCR digits, the propagate bits P, and the carries C1 over prefix levels x = 1~4; F1 is built from the propagate bits P.]

slide-29
SLIDE 29

Overflow Detection

  • Injection-based rounding simplifies the overflow detection

– The result overflows before rounding (a carry-out is generated from the most significant digit of the result)

    • Not influenced by the injection correction value

– The result overflows after rounding

    • Handled by the injected value

– Overflow detection can examine the result before the rounding unit

slide-30
SLIDE 30

Sign and Backward Conversion

  • The sign bit is determined by the signs of the operands, the rounding mode, and whether either of the operands is a normal number.

– Sign = (!EOP ∩ SignA) ∪ (EOP ∩ ((EA ≥ EB) ⊕ SignA ⊕ carryout))

  • Backward conversion combines the sign bit, the exponent, and the significand to form the P754 compliant result.

slide-31
SLIDE 31

Extension to Support More DFP Operations

  • Quantize(A, B)

– Change the unit of A to EB
– Set CB to zero
– Enable right shift even if CB = 0
– Set effective operation to ADD to avoid wrong rounding operations

  • SameQuantum(A, B)

– Check if EA ≡ EB
– Generate an extra flag in the operand alignment stage.

  • MinNum, MaxNum, and Compare

– Set the operator to SUB and observe the sign

  • ToIntegralValue(A)

– Round A to an integer value
– Set CB and EB to zero
– Enable right shift even if CB = 0
– Set effective operation to ADD to avoid wrong rounding operations

  • Many changes to the conditions of exception flags are added. The post-processing unit is added to handle special operands such as infinity and Not-a-Number.

slide-32
SLIDE 32

Area Comparison

[Bar chart: area (NAND equivalent units, axis from 0 to 16000) per component for Thompson's Adder [14], Adder with Inj-based Rounding, and Multifunction with Inj-based Rounding. Components: Barrel Shifter (Left), Barrel Shifter (Right), Forward Format Conversion, Effective Operation, Backward Format Conversion, Kogge-Stone Network, OACSU, Operand Pre-correction, Post-correction, Post-Processing, Shift and Round, Sign, Zero Detection, Special Operation Handler, Other Logic.]

slide-33
SLIDE 33

Comparison with Software (IBM’s decNumber library)

DFP Operation     Software   Hardware (Fast)   Hardware (Slow)   Improvement (Fast)   Improvement (Slow)
ADD               499.4      4                 6                 124.9                83.2
SUB               496.2      4                 6                 124.1                82.7
Quantize          265.4      4                 6                 66.4                 44.2
SameQuantum       89.4       1                 1                 89.4                 89.4
ToIntegralValue   364.6      4                 6                 91.2                 60.8
MaxNum            405.4      4                 6                 101.4                67.6
MinNum            643.1      4                 6                 160.8                107.2
Compare           282.8      4                 6                 70.7                 47.1

  • decNumber library run on the SimpleScalar simulator with the PISA architecture

slide-34
SLIDE 34

Comparison with Software (Intel’s BID library)

DFP Operation     Software (Min)   Software (Max)   Hardware (Fast)   Hardware (Slow)   Improvement (Fast)   Improvement (Slow)
ADD               71               133              4                 6                 33.3                 11.8
SUB               71               133              4                 6                 33.3                 11.8
Quantize          27               45               4                 6                 11.3                 4.5
ToIntegralValue   27               45               4                 6                 11.3                 4.5
MaxNum*           75               113              4                 6                 28.3                 12.5
MinNum*           69               108              4                 6                 27                   11.5

  • Intel’s BID library and EM64t Xeon 5100 3.0GHz
  • Results taken from their paper
slide-35
SLIDE 35

Other DFP Operations

  • Other DFP operations that can reuse our DFP adder include:

– nextUp
– nextDown

  • Other DFP operations that our DFP unit can support with few extra gates:

– ABS
– Negate
– copySign

slide-36
SLIDE 36

Delay Comparison

[Bar chart: delay (ns, axis from 0.2 to 1.4) per unit for Thompson's Adder [14] and Adder with Inj-based Rounding. Units: IEEE to BCD Format Conversion Unit, OACSU, Barrel Shifter Right, Operand Pre-correction Unit, Kogge-Stone Network Unit, Post-correction Unit, Shift and Round Unit, BCD to IEEE Format Conversion Unit, Post-Processing Unit.]