
A New Family of High-Performance Parallel Decimal Multipliers* - PowerPoint PPT Presentation

A New Family of High-Performance Parallel Decimal Multipliers*. Alvaro Vázquez and Elisardo Antelo (Dept. of Electronic and Computer Science, University of Santiago de Compostela); Paolo Montuschi (Dept. of Computer Engineering, Politecnico di Torino).


  1. A New Family of High-Performance Parallel Decimal Multipliers*
     Alvaro Vázquez, Elisardo Antelo: Dept. of Electronic and Computer Science, University of Santiago de Compostela, Spain (alvaro@dec.usc.es, elisardo@dec.usc.es)
     Paolo Montuschi: Dept. of Computer Engineering, Politecnico di Torino, Italy (montuschi@polito.it)
     *A. Vázquez and E. Antelo supported in part by the Ministry of Science and Technology of Spain under contract TIN2004-07797-C02 and Xunta de Galicia under contract PGIDT03TIC10502PR.
     ARITH 18 - Montpellier, France. June 25-27, 2007

  2. Outline
     • Introduction. Previous work.
     • Implementation of decimal parallel multiplication:
       – Fast carry-save addition using non-conventional BCD.
       – Design of high-performance decimal p:2 CSAs.
       – Parallel partial product generation.
     • Architectures:
       – Signed-digit (SD) radix-10.
       – SD radix-4/radix-5 (combined binary/decimal).
     • Evaluation and comparison.
     • Conclusions.

  3. Introduction
     • High-performance decimal floating-point units.
     • Parallel multiplier: scaling performance by pipelining.
     • Multiplication stages:
       1. Generation of partial products (PPG).
       2. Reduction of partial products (PPR).
       3. Conversion to non-redundant representation.
     • Problems of decimal implementation:
       – High value range for decimal digits (0-9): affects PPG.
       – Inefficiency of conventional BCD coding: affects PPG and PPR.

  4. Previous Work on Decimal Multiplication
     • Previous proposals for PPG:
       1. Direct generation of partial products (digit-by-digit).
       2. Using multiplicand multiples (X, 2X, 3X, 4X, …, 9X).
          – Direct implementation.
          – SD multiplier. [Ex.: 2 radix-5 digits, (-5X, 5X) and (-2X, -X, X, 2X)]
     • Previous proposals for PPR:
       1. Carry-save BCD-8421.
          a. Full BCD operands (3:2 CSAs + correction).
          b. Carry operand of 1 bit per 4-bit digit (4-bit decimal CPAs).
       2. Signed-digit representation for decimal digits.
          – SD adders are more complex than CSA-based implementations.

  5. Proposed techniques
     • X multiplicand, Y multiplier: BCD integer words.
     • A BCD digit is represented as $Z_i = \sum_{j=0}^{3} z_{i,j}\, r_j$, with weights:
       – BCD-8421: $r_j = 2^j$
       – BCD-4221: $(r_3, r_2, r_1, r_0) = (4, 2, 2, 1)$
       – BCD-5211: $(r_3, r_2, r_1, r_0) = (5, 2, 1, 1)$
     1. Decimal carry-save addition using BCD-4221.
     2. Implementation of decimal CSAs for PPR.
     3. Implementation of PPG using multiplier recoding:
        – SD radix-10.
        – SD radix-4.
        – SD radix-5.
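
The weighted-code definition on this slide can be tried out with a minimal Python sketch. The weight vectors come from the slide; the helper names `digit_value` and `encodings` are hypothetical and only serve the illustration.

```python
# Weight vectors (r3, r2, r1, r0) as listed on the slide.
WEIGHTS = {
    "BCD-8421": (8, 4, 2, 1),
    "BCD-4221": (4, 2, 2, 1),
    "BCD-5211": (5, 2, 1, 1),
}

def digit_value(bits, code):
    """Return Z = sum(z_j * r_j) for a 4-bit string such as '1001' (MSB first)."""
    return sum(int(b) * w for b, w in zip(bits, WEIGHTS[code]))

def encodings(value, code):
    """List every 4-bit pattern that represents `value` in the given code."""
    patterns = (format(n, "04b") for n in range(16))
    return [p for p in patterns if digit_value(p, code) == value]

print(digit_value("1001", "BCD-4221"))   # 4 + 1 = 5
print(encodings(5, "BCD-4221"))          # ['0111', '1001']
```

BCD-4221 and BCD-5211 are redundant: some digits have two bit patterns, and every one of the 16 patterns is a valid digit because the weights sum to 9.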

  6. Decimal carry-save addition (BCD-8421)
     • Add 3 decimal digits to produce 2 decimal digits (sum and carry digits).
     • Goal: $A_i, B_i, C_i, S_i, H_i \in [0,9]$ with $A_i + B_i + C_i = S_i + 2H_i$, where $2H_i \in [0,18]$ is even.
     • Bitwise 3:2 CSA on the bits $a_{i,j}, b_{i,j}, c_{i,j}$:
       $s_{i,j} = a_{i,j} \oplus b_{i,j} \oplus c_{i,j}$
       $h_{i,j} = a_{i,j} b_{i,j} + (a_{i,j} + b_{i,j})\, c_{i,j}$
     • Example:
                value   8 4 2 1
       A_i        5     0 1 0 1
       B_i        6     0 1 1 0
       C_i        9     1 0 0 1
       S_i       10     1 0 1 0
       H_i        5     0 1 0 1
       2H_i      10     1 | 0 0 0 -    (x2: carry-out | digit bits; "-" is the carry-in position)
       Check: $A_i + B_i + C_i = S_i + 2H_i = 20$.
     • Problem with BCD-8421: the input digits are in [0,9], but the sum digit can fall outside the decimal range, anywhere in [0,15]. Sum digits require correction.
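
The range problem can be reproduced with a short Python sketch. It applies the slide's bitwise sum and carry equations to plain 4-bit integers, which is what BCD-8421 digits are; the function name `csa_3to2` is an assumption made for this example.

```python
from itertools import product

def csa_3to2(a, b, c):
    """Bitwise 3:2 carry-save adder on 4-bit integers: returns (sum, carry)."""
    s = a ^ b ^ c                      # s_j = a_j xor b_j xor c_j
    h = (a & b) | ((a | b) & c)        # h_j = a_j b_j + (a_j + b_j) c_j
    return s, h

# Slide example: in BCD-8421 the 4-bit pattern is just the binary value.
s, h = csa_3to2(5, 6, 9)
print(s, h)                            # 10 5

# Digit triples whose sum digit leaves the decimal range [0, 9].
out_of_range = [t for t in product(range(10), repeat=3) if csa_3to2(*t)[0] > 9]
print(len(out_of_range) > 0)           # True
```

The identity A + B + C = S + 2H still holds for every triple; what fails is that S is no longer a decimal digit.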

  7. Decimal carry-save addition (BCD-4221)
     • Add 3 decimal digits to produce 2 decimal digits (sum and carry digits).
     • $A_i, B_i, C_i, S_i, H_i, W_i \in [0,9]$ and $A_i + B_i + C_i = S_i + 2H_i = S_i + \text{L1-shift}(W_i)$, where $W_i$ is $H_i$ recoded to BCD-5211.
     • Bitwise 3:2 CSA on the bits $a_{i,j}, b_{i,j}, c_{i,j}$:
       $s_{i,j} = a_{i,j} \oplus b_{i,j} \oplus c_{i,j}$
       $h_{i,j} = a_{i,j} b_{i,j} + (a_{i,j} + b_{i,j})\, c_{i,j}$
     • Example:
                value   4 2 2 1
       A_i        5     1 0 0 1
       B_i        6     1 1 0 0
       C_i        9     1 1 1 1
       S_i        6     1 0 1 0
       H_i        7     1 1 0 1
       W_i        7     1 1 0 0        (BCD-5211)
       2H_i      14     1 | 1 0 0 -    (L1-shift of W_i: carry-out | digit bits; "-" is the carry-in position)
       Check: $A_i + B_i + C_i = S_i + 2H_i = 20$.
     • Solution with BCD-4221: the input digits are in [0,9] and the sum digit is always in range [0,9].
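
A minimal Python sketch of this scheme on multi-digit operands follows. Digits are stored least significant first as 4-bit integers; all helper names are hypothetical, and the recoder here picks the numerically smallest BCD-5211 pattern for each value, which is an assumption and not necessarily the mapping used in the slides.

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)

def val(nibble, weights):
    """Decimal value of a 4-bit pattern under the given weights (MSB first)."""
    return sum(((nibble >> (3 - k)) & 1) * w for k, w in enumerate(weights))

def enc(value, weights):
    """Smallest 4-bit pattern with the requested value (one of possibly two)."""
    return next(n for n in range(16) if val(n, weights) == value)

def to_digits(x, d):
    """Encode integer x as d BCD-4221 digits, least significant first."""
    return [enc((x // 10**i) % 10, W4221) for i in range(d)]

def from_digits(digits):
    """Decode a list of BCD-4221 digits back to an integer."""
    return sum(val(n, W4221) * 10**i for i, n in enumerate(digits))

def times2(digits):
    """x2: recode each digit to BCD-5211, then shift the whole word left 1 bit."""
    w = [enc(val(n, W4221), W5211) for n in digits] + [0]
    out, carry_in = [], 0
    for n in w:
        out.append(((n << 1) & 0xF) | carry_in)   # result digit is BCD-4221
        carry_in = n >> 3                         # weight-5 bit becomes next digit's LSB
    return out

def decimal_csa(a, b, c):
    """Decimal 3:2 CSA: returns (sum word, doubled carry word), both BCD-4221."""
    s = [x ^ y ^ z for x, y, z in zip(a, b, c)]
    h = [(x & y) | ((x | y) & z) for x, y, z in zip(a, b, c)]
    return s, times2(h)

s, h2 = decimal_csa([0b1001], [0b1100], [0b1111])   # 5 + 6 + 9
print(from_digits(s), from_digits(h2))              # 6 14
```

No correction step is needed because every 4-bit pattern is a valid BCD-4221 digit, so the XOR output is always in range.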

  8. Decimal carry-save addition (BCD-5211)
     • Add 3 decimal digits to produce 2 decimal digits (sum and carry digits).
     • $A_i, B_i, C_i, S_i, H_i \in [0,9]$ and $A_i + B_i + C_i = S_i + 2H_i = S_i + \text{L1-shift}(H_i)$, where the L1-shift output is coded in BCD-4221.
     • Bitwise 3:2 CSA on the bits $a_{i,j}, b_{i,j}, c_{i,j}$:
       $s_{i,j} = a_{i,j} \oplus b_{i,j} \oplus c_{i,j}$
       $h_{i,j} = a_{i,j} b_{i,j} + (a_{i,j} + b_{i,j})\, c_{i,j}$
     • Example:
                value   5 2 1 1
       A_i        5     1 0 0 0
       B_i        6     1 0 0 1
       C_i        9     1 1 1 1
       S_i        8     1 1 1 0
       H_i        6     1 0 0 1
       2H_i      12     1 | 0 0 1 -    (L1-shift of H_i, BCD-4221: carry-out | digit bits; "-" is the carry-in position)
       2H_i      12     1 | 0 1 0 -    (recoded to BCD-5211)
       Check: $A_i + B_i + C_i = S_i + 2H_i = 20$.
     • Solution with BCD-5211: the input digits are in [0,9] and the sum digit is always in range [0,9].

  9. Decimal multiplication by $\pm 2^n$ and $\pm 5^n$
     • Multiplication by 2: digit recoding from BCD-4221 to BCD-5211, then L1-shift. The result is in BCD-4221.
         25  (BCD-4221):  0100 1001
         25  (BCD-5211):  0100 1000         (digit recoding)
         50  (BCD-4221):  1001 0000         (L1-shift)
     • Multiplication by 5: L3-shift of the BCD-4221 word. The result is in BCD-5211 and is recoded back to BCD-4221.
         25  (BCD-4221):  0100 1001
         125 (BCD-5211):  0010 0100 1---    (L3-shift)
         125 (BCD-4221):  0001 0100 1001    (digit recoding)
     • Negative operands (10's complement) by bit inversion, as for 2's complement, in BCD-4221:
         0596:            0000 1001 1111 1100
         bit-complement:  1111 0110 0000 0011  = 9403
         hot-one:         +1
         -596 = -10000 + 9403 + 1
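
The x5 shift and the bit-inversion complement on this slide can be checked with a small Python sketch. Words are packed as integers with 4 bits per decimal digit; `pack`, `unpack`, `times5` and `nines_complement` are hypothetical helper names, and `enc` picks the smallest valid pattern per digit, which is only one of the possible encodings.

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)

def val(nibble, weights):
    """Decimal value of a 4-bit pattern under the given weights (MSB first)."""
    return sum(((nibble >> (3 - k)) & 1) * w for k, w in enumerate(weights))

def enc(value, weights):
    """Smallest 4-bit pattern with the requested value."""
    return next(n for n in range(16) if val(n, weights) == value)

def pack(x, d, weights):
    """Encode the d-digit integer x as one bit-vector, 4 bits per digit."""
    return sum(enc((x // 10**i) % 10, weights) << (4 * i) for i in range(d))

def unpack(bits, d, weights):
    """Decode d digits of a packed bit-vector under the given weights."""
    return sum(val((bits >> (4 * i)) & 0xF, weights) * 10**i for i in range(d))

def times5(bits4221):
    """L3-shift: BCD-4221 word in, 5X as a BCD-5211 word out (one digit wider)."""
    return bits4221 << 3

def nines_complement(bits4221, d):
    """Invert all 4d bits: each BCD-4221 digit z becomes 9 - z."""
    return bits4221 ^ ((1 << (4 * d)) - 1)

print(unpack(times5(0b0100_1001), 3, W5211))                          # 125
print(unpack(nines_complement(pack(596, 4, W4221), 4), 4, W4221))     # 9403
```

The shift works because moving a BCD-4221 bit three positions left lands it on a BCD-5211 weight exactly five times larger; the complement works because the BCD-4221 weights sum to 9.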

  10. Proposed decimal 3:2 CSA (BCD-4221)
     • $A_i + B_i + C_i = S_i + 2H_i = S_i + \text{L1-shift}(W_i)$

  11. Proposed decimal 3:2 CSA (BCD-4221)
     • Digit recoder, BCD-4221 to BCD-5211:
         Digit   BCD-4221      BCD-5211
           0     0000          0000
           1     0001          0001
           2     0010, 0100    0100
           3     0011, 0101    0101
           4     1000, 0110    0111
           5     1001, 0111    1000
           6     1100, 1010    1010
           7     1101, 1011    1011
           8     1110          1110
           9     1111          1111
     • Digit recoder:
       – Area: 18 NAND2 (0.35 times the 4-bit 3:2 CSA area).
       – Delay: 4 FO4 (0.9 times the binary 3:2 CSA delay).
     • Decimal (digit) 3:2 CSA:
       – Area: 66 NAND2 (1.35 times the 4-bit 3:2 CSA area).
       – Delay*: 1.4 times on the carry path, the same on the sum path.
     *Ratio with respect to the sum path (critical path) delay of a binary 3:2 CSA.
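
As a sanity check on the recoder table, the following Python sketch stores it as a lookup and verifies that each entry preserves the digit value and that a 1-bit left shift of the output doubles it. The entry for input 1000 (digit 4) is an assumption: it is the only 4-bit pattern with value 4 in BCD-4221 besides 0110. The names `RECODER`, `val` and `double_digit` are hypothetical.

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)

def val(nibble, weights):
    """Decimal value of a 4-bit pattern under the given weights (MSB first)."""
    return sum(((nibble >> (3 - k)) & 1) * w for k, w in enumerate(weights))

# BCD-4221 input pattern -> BCD-5211 output pattern.
RECODER = {
    0b0000: 0b0000, 0b0001: 0b0001,
    0b0010: 0b0100, 0b0100: 0b0100,
    0b0011: 0b0101, 0b0101: 0b0101,
    0b1000: 0b0111, 0b0110: 0b0111,   # 0b1000 input is assumed, see above
    0b1001: 0b1000, 0b0111: 0b1000,
    0b1100: 0b1010, 0b1010: 0b1010,
    0b1101: 0b1011, 0b1011: 0b1011,
    0b1110: 0b1110, 0b1111: 0b1111,
}

def double_digit(n):
    """Recode one BCD-4221 digit, shift left by 1, and return the value 2*Z."""
    w = RECODER[n]
    carry_out = w >> 3                 # weight-5 bit moves to the next decade
    low = (w << 1) & 0xF               # remaining bits now carry weights 4, 2, 2
    return 10 * carry_out + val(low, W4221)

print(all(val(k, W4221) == val(v, W5211) for k, v in RECODER.items()))  # True
print(double_digit(0b1111))                                             # 18
```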

  12. Decimal CSA tree (BCD-4221)
     • Example: 9:2 decimal CSA (digit slice), built from seven 4-bit 3:2 CSAs and seven x2 blocks, with the critical path marked. A 2:1 mux is added for a combined decimal/binary CSA.
     • 1.35 area ratio with respect to a binary CSA.
     • 1.40 delay ratio with respect to a binary CSA.
     • Hardware complexity (1 digit):
       – 4-bit 3:2 CSAs: 7 x 48 NAND2.
       – Digit recoders (x2): 7 x 18 NAND2.
     • Critical path delay:
       – 1-bit 3:2 CSA: 4.5/2.2 FO4 (2/1 XOR).
       – Recoder: 4 FO4 (1.75 XOR).
       – 9:2 decimal CSA: 25 FO4.
       – 9:2 binary CSA: 18 FO4.

  13. Decimal CSA tree BCD-4221 (area-optimized)
     • Example: 9:2 decimal CSA (digit slice), built from seven 4-bit 3:2 CSAs with x1 and x2 connections, with the critical path marked.
     • Area optimization: group inputs with a similar multiplicative factor.
     • 1.20 area ratio with respect to a binary CSA.
     • 1.40 delay ratio with respect to a binary CSA.
     • Hardware complexity (1 digit):
       – 4-bit 3:2 CSAs: 7 x 48 NAND2.
       – Digit recoders (x2): 5 x 18 NAND2.
     • Critical path delay:
       – 9:2 decimal CSA: 25 FO4.
       – 9:2 binary CSA: 18 FO4.

  14. SD radix-10 multiplier recoding
     • Multiplicand X (BCD-4221) and multiplier Y (BCD-8421), 4d bits each. Integer d-digit precision operands.
     • SD radix-10 digit recoder: each $Y_i \in [0,9]$ is recoded to $Yb_i \in [-5,5]$, output as a 5-bit hot-one code plus a recoded sign.
     • Multiplicand multiples generation: X, 2X, 3X, 4X, 5X (x2 and x5 blocks and a 4d-bit decimal adder), selected by a Mux-5 and the recoded sign.
     • 1 SD radix-10 digit per multiplicand digit.
     • d+1 partial products (additional encoded SD radix-10 digit).
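
One common way to obtain digits in [-5, 5] is sketched below in Python. The slide does not spell out the recoding rule, so the rule used here (subtract 10 when the digit is 5 or more and pass a 1 to the next digit) is an assumption, and `sd_radix10_recode` is a hypothetical name.

```python
def sd_radix10_recode(y_digits):
    """Recode decimal digits (least significant first) into d+1 digits in [-5, 5].

    The transfer out of each position depends only on that digit (y >= 5),
    so all positions can be recoded in parallel.
    """
    out, transfer_in = [], 0
    for y in y_digits:
        transfer_out = 1 if y >= 5 else 0
        out.append(y + transfer_in - 10 * transfer_out)
        transfer_in = transfer_out
    out.append(transfer_in)            # the additional (d+1)-th digit
    return out

yb = sd_radix10_recode([6, 9, 5])      # Y = 596
print(yb)                              # [-4, 0, -4, 1]

# Each recoded digit selects one of X..5X with a sign, giving d+1 partial products.
x = 123
print(sum(d * x * 10**i for i, d in enumerate(yb)) == x * 596)   # True
```

Limiting the digit set to [-5, 5] means only the multiples X to 5X are needed, with negation covering the rest.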

  15. SD radix-4 multiplier recoding
     • Multiplicand X (BCD-4221) and multiplier Y (BCD-8421), 4d bits each. Integer d-digit precision operands.
     • SD radix-4 digit recoder: each $Y_i \in [0,9]$ is recoded as $Yb_i = 4\,Y^U_i + Y^L_i$, with $Y^U_i \in [0,2]$ and $Y^L_i \in [-2,2]$ (hot-one code and recoded sign).
     • Multiplicand multiples generation: 8X, 4X, 2X, X (x2 blocks), selected by two Mux-2.
     • 2 SD radix-4 digits per multiplicand digit.
     • 2d partial products.
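
The split of each digit into an upper and a lower radix-4 digit can be sketched in Python as follows. Several assignments satisfy the ranges on the slide (for instance 6 = 4 + 2 or 8 - 2); the one chosen here is an assumption made for illustration, and the function names are hypothetical.

```python
def sd_radix4_recode(y):
    """Split one decimal digit y in [0, 9] into (yu, yl) with y = 4*yu + yl,
    yu in [0, 2] and yl in [-2, 2]."""
    yu = (y + 1) // 4
    yl = y - 4 * yu
    return yu, yl

def partial_products(x, y_digits):
    """Two partial products per multiplier digit (least significant first)."""
    pps = []
    for i, y in enumerate(y_digits):
        yu, yl = sd_radix4_recode(y)
        pps.append(4 * yu * x * 10**i)   # selects 0, 4X or 8X
        pps.append(yl * x * 10**i)       # selects -2X, -X, 0, X or 2X
    return pps

print([sd_radix4_recode(y) for y in (3, 7, 9)])   # [(1, -1), (2, -1), (2, 1)]
pps = partial_products(123, [6, 9, 5])            # 123 x 596
print(len(pps), sum(pps) == 123 * 596)            # 6 True
```

Compared with radix-10 recoding this doubles the number of partial products, but every multiple (X, 2X, 4X, 8X) is a power-of-two multiple of X, so no 3X term is required.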
