EE 457 Unit 2c: Fast Multipliers



SLIDE 1

EE 457 Unit 2c

Fast Multipliers

SLIDE 2

Multiplication Overview

  • Multiplication approaches:

    – Sequential: Shift-and-Add produces one product bit per clock cycle (usually slow)
    – Combinational: Array multiplier uses an array of adders

  • Can be as simple as N-1 ripple-carry adders for an NxN multiplication

                              m3    m2    m1    m0
                         x    q3    q2    q1    q0
  ------------------------------------------------
                            m3q0  m2q0  m1q0  m0q0
                      m3q1  m2q1  m1q1  m0q1     -
                m3q2  m2q2  m1q2  m0q2     -     -
+         m3q3  m2q3  m1q3  m0q3     -     -     -
  ------------------------------------------------
      p7    p6    p5    p4    p3    p2    p1    p0

[Figure: AND gate array with inputs m3..m0 and q0..q3; its outputs are the 16 terms m3·q0 through m0·q3.]

AND Gate Array produces partial product terms
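
The partial-product layout above can be sketched in a few lines of Python. This is only a behavioral sketch, not a gate-level design: the function names `partial_products` and `sum_partial_products` are made up for illustration, and ordinary integer addition stands in for the rows of adders.

```python
def partial_products(m, q, n=4):
    """Model the AND gate array: rows[j][i] = m_i AND q_j (bit 0 first)."""
    m_bits = [(m >> i) & 1 for i in range(n)]
    q_bits = [(q >> j) & 1 for j in range(n)]
    return [[m_bits[i] & q_bits[j] for i in range(n)] for j in range(n)]


def sum_partial_products(rows):
    """Add the rows, shifting row j left by j places as in the layout."""
    product = 0
    for j, row in enumerate(rows):
        row_value = sum(bit << i for i, bit in enumerate(row))
        product += row_value << j  # plain integer add stands in for the adder rows
    return product


rows = partial_products(0b1011, 0b0110)   # 11 x 6
product = sum_partial_products(rows)
print(product)                            # 66
```

Each row is nonzero only when the corresponding multiplier bit q_j is 1, which is why a single AND gate per term is enough to form it.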

SLIDE 3

Array Multiplier

  • Maximum delay = ?

    – Do you look for the longest path or the shortest path between any input and output?
    – Compare with the delay of a shift-and-add method

[Figure: 4x4 array multiplier built from three rows of ripple-carry adders. Each row has three FAs (ports X, Y, Ci, S, Co) and one HA (ports X, Y, S, Co). The first row adds m3q0..m0q0 and m3q1..m0q1, the second row adds m3q2..m0q2, and the third row adds m3q3..m0q3. Outputs are P[7]..P[0].]

Annotation on one of the FAs in the figure: Can this be a HA?
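
One way to explore the maximum-delay question is a small longest-path model. The sketch below is built on simplifying assumptions that are not in the slides: every HA and FA takes one unit of delay for both its sum and carry outputs, the AND gates are ignored, and the name `array_multiplier_delay` is hypothetical.

```python
def array_multiplier_delay(n):
    """Longest path, in adder delays, through an n x n ripple-carry array."""
    # arrival[k] = time at which the running-sum bit of weight 2**k is ready.
    # Partial-product bits come straight from the AND array (time 0).
    arrival = {k: 0 for k in range(n)}
    for row in range(1, n):
        carry = None                      # rightmost adder of a row has no carry-in
        for k in range(row, row + n):
            inputs = [arrival.get(k, 0)]  # running-sum bit from the row above
            if carry is not None:
                inputs.append(carry)      # carry rippling in from the right
            done = max(inputs) + 1        # assumed: one unit per adder
            arrival[k] = done
            carry = done
        arrival[row + n] = carry          # row's carry-out is the next higher bit
    return max(arrival.values())


print(array_multiplier_delay(4))  # 8
```

Under this unit-delay model the 4x4 array has a worst-case path of 8 adder delays, and the model gives 3n - 4 for the sizes it was checked on (n = 2 to 8). The maximum delay is set by the longest path, since every output must be valid before the product can be used.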

SLIDE 4

Pipelined Multiplier

  • Now try to pipeline the previous design

[Figure: the 4x4 array multiplier of the previous slide, redrawn for pipelining. Three adder rows (HA, FA, FA, HA in the first row; FA, FA, FA, HA in the second and third rows), partial-product inputs m3q0..m0q0 through m3q3..m0q3, outputs P[7]..P[0].]

Determine the maximum stage delay to decide the pipeline clock rate. Assume zero delay for the stage latches. How does the latency of the pipeline compare with that of the simple combinational array on the previous slide?
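
The comparison asked for above can be sketched numerically. Everything here rests on assumptions that the slide does not state: latches are placed between adder rows (one row per stage), each adder costs one unit of delay, AND gates and latches cost nothing, and the unpipelined ripple array is taken to have a worst-case path of 3n - 4 adder delays under the same unit-delay model. The function name `pipelined_array_timing` is hypothetical.

```python
def pipelined_array_timing(n, adder_delay=1):
    """Return (stage_delay, stages, latency) for a row-per-stage pipeline."""
    # Within one stage the carry can ripple through all n adders of the row.
    stage_delay = n * adder_delay
    stages = n - 1                      # one stage per adder row
    latency = stages * stage_delay      # every stage takes a full clock period
    return stage_delay, stages, latency


def combinational_array_delay(n, adder_delay=1):
    """Assumed worst-case path of the unpipelined ripple array (3n - 4 adders)."""
    return (3 * n - 4) * adder_delay


stage_delay, stages, latency = pipelined_array_timing(4)
print(stage_delay, stages, latency)     # 4 3 12
print(combinational_array_delay(4))     # 8
```

Under these assumptions the pipeline accepts a new multiplication every 4 adder delays, but a single result takes 12 adder delays instead of 8. Pipelining improves throughput, not latency.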

SLIDE 5

Carry-Save Multiplier

  • Instead of propagating the carries to the left in the same row, carries are now sent down to the next stage to reduce stage delay and facilitate pipelining

The upper three stages are 3-bit Carry Save Adders (CSA’s), each with a 2-gate delay. The last stage is a Ripple Carry Adder (RCA), which has a longer delay. It can be replaced by a carry-lookahead adder (CLA) for larger multipliers.

[Figure: 4x4 carry-save multiplier. Three rows of three FAs form the carry-save stages (labeled "CSA’s"); each FA passes its carry down to the next row instead of to its left neighbor. A final row of three FAs forms the ripple-carry stage (labeled "RCA"). Partial-product inputs are m3q0..m0q0, m3q1..m0q1, m3q2..m0q2 and m3q3..m0q3; outputs are P[7]..P[0].]
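
The carry-save organization described above can be modeled behaviorally as follows. This is a sketch under stated assumptions, not the slide's exact wiring: whole partial-product vectors are held in Python integers rather than individual 3-bit slices, the final `s + c` stands in for the RCA (or CLA), and the names `csa` and `carry_save_multiply` are illustrative.

```python
def csa(a, b, c):
    """Carry-save add: three vectors in, (sum vector, carry vector) out."""
    s = a ^ b ^ c                                 # per-bit sum, no carry chain
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carry, one place up
    return s, carry


def carry_save_multiply(m, q, n=4):
    """Multiply two n-bit values with a linear chain of carry-save stages."""
    # Partial product j is m shifted left by j when q_j = 1, else zero.
    pps = [(m << j) if (q >> j) & 1 else 0 for j in range(n)]
    s, c = pps[0], 0
    for pp in pps[1:]:
        s, c = csa(s, c, pp)      # carries are sent down, not sideways
    return s + c                  # single carry-propagate add at the end


print(carry_save_multiply(13, 11))   # 143
```

For n = 4 the loop runs three times, matching the three carry-save stages, and only the last addition has to wait for carries to ripple.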

SLIDE 6

Carry Save Adders

  • Consider the decimal addition 47 + 96 + 58 = 201

  • One way is to add 47 to 96 to get 143 and then add 58
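  • Here the tens column cannot be added until the carry from the ones column is produced
  • In the carry-save style, we add the ones column and the tens column simultaneously

Conventional (two carry-propagating additions):

      4 7
    + 9 6
    -----
    1 4 3
    + 5 8
    -----
    2 0 1

Carry-save (columns added independently, then combined):

      4 7
      9 6
    + 5 8
    -----
      2 1      (ones column: 7 + 6 + 8)
  + 1 8 _      (tens column: 4 + 9 + 5)
    -----
    2 0 1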
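
The decimal example above can be reproduced with a short sketch. The function name `carry_save_decimal` is made up for illustration, and non-negative integers are assumed.

```python
def carry_save_decimal(numbers):
    """Add each decimal column independently, then combine the column sums."""
    width = max(len(str(x)) for x in numbers)
    column_sums = []
    for pos in range(width):                       # pos 0 = ones, 1 = tens, ...
        column_sums.append(sum((x // 10**pos) % 10 for x in numbers))
    # Only this final combination needs carries to move between columns.
    total = sum(s * 10**pos for pos, s in enumerate(column_sums))
    return column_sums, total


column_sums, total = carry_save_decimal([47, 96, 58])
print(column_sums, total)   # [21, 18] 201
```

The two column sums (21 and 18) do not depend on each other, so in hardware they could be formed at the same time.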

SLIDE 7

Carry-Save (3,2) Adders

  • A carry save adder is also called a (3,2) adder or a (3,2) counter (refer to Computer Arithmetic Algorithms by Israel Koren) as it takes three vectors, adds them up, and reduces them to two vectors, namely a sum vector and a carry vector
  • CSA’s are based on the principle that carries do not have to be added as soon as possible, but can be combined in a later step
  • An n-bit CSA consists of n disjoint full adders

        0 1 0 1
        1 0 0 1
      + 1 0 1 1
      ---------
      1 0 0 1 _    Carry vector
        0 1 1 1    Sum vector

[Figure: 4-bit CSA built from four independent FAs. FA i takes A[i], B[i] and C[i] on its X, Y and Z inputs and produces the sum bit S[i] and the carry bit C[i+1]; the outputs shown are C[4] S[3] C[3] S[2] C[2] S[1] C[1] S[0].]
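
A bit-level sketch of the (3,2) adder, using the same three 4-bit inputs as the example on this slide (0101, 1001, 1011). The helper names `full_adder` and `carry_save_add` are illustrative; the point is that the loop body for bit i never reads a result from bit i-1.

```python
def full_adder(x, y, z):
    """One full adder: returns (sum bit, carry-out bit)."""
    s = x ^ y ^ z
    co = (x & y) | (x & z) | (y & z)
    return s, co


def carry_save_add(a, b, c, n=4):
    """n-bit CSA: n disjoint full adders, no carry chain between them."""
    sum_vec, carry_vec = 0, 0
    for i in range(n):
        s, co = full_adder((a >> i) & 1, (b >> i) & 1, (c >> i) & 1)
        sum_vec |= s << i
        carry_vec |= co << (i + 1)     # carry has weight 2**(i+1)
    return sum_vec, carry_vec


sum_vec, carry_vec = carry_save_add(0b0101, 0b1001, 0b1011)
print(format(sum_vec, "04b"), format(carry_vec, "05b"))   # 0111 10010
```

Adding the two output vectors with an ordinary adder gives 7 + 18 = 25, the same as 5 + 9 + 11.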

SLIDE 8

1-bit FA vs. 1-bit CSA

  • Any difference between an ordinary full adder and a 1-bit CSA? NO!
  • A 16-bit wide CSA takes (more / equal / less) time to produce its outputs compared to an 8-bit wide CSA
  • A carry-save adder (is / is not) useful in adding only 2 numbers

SLIDE 9

CSA Organization

  • We can arrange our CSA’s in a linear manner where one partial product is added per CSA (after the first level)

SLIDE 10

Wallace Tree Multiplier

  • Using the previous example as a template, to build an n x n multiplier you need (n-1) CSA’s, each (n-1) bits wide, followed by a final (n-1)-bit RCA
  • Delay = Delay of (n-1) CSA’s + Delay of (n-1)-bit RCA = 2 * (n-1) * Delay(FullAdder)
  • We can reduce the CSA component of the delay by organizing the CSA’s in a tree (i.e. logarithmic delay)

[Figure: Wallace tree for eight partial products q0·M through q7·M. Six CSA’s arranged in a tree reduce them to two vectors, and a Propagation Adder produces the Product.]

Note: The vectors (partial products) need to be aligned before summing. These details are not shown in the block diagram.
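
A behavioral sketch of the tree organization, with the alignment mentioned in the note handled by shifting each partial product before it enters the tree. The grouping rule used here (take operands three at a time at each level and pass leftovers through) is one reasonable choice and may not match the figure's exact wiring; `wallace_reduce` and `wallace_multiply` are hypothetical names.

```python
def csa(a, b, c):
    """(3,2) reduction: three vectors in, sum and carry vectors out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(operands):
    """Reduce a list of vectors to at most two; count levels and CSAs used."""
    levels = 0
    csa_count = 0
    while len(operands) > 2:
        next_level = []
        full_groups = len(operands) // 3
        for g in range(full_groups):
            a, b, c = operands[3 * g: 3 * g + 3]
            next_level.extend(csa(a, b, c))
            csa_count += 1
        next_level.extend(operands[3 * full_groups:])   # leftovers pass through
        operands = next_level
        levels += 1
    return operands, levels, csa_count


def wallace_multiply(m, q, n=8):
    # Aligned partial products: q_j * M shifted left by j.
    pps = [(m << j) if (q >> j) & 1 else 0 for j in range(n)]
    operands, levels, csa_count = wallace_reduce(pps)
    return sum(operands), levels, csa_count   # sum() stands in for the propagation adder


print(wallace_multiply(200, 123))   # (24600, 4, 6)
```

For eight partial products this grouping uses six CSA’s in four levels, compared with six levels if the same six CSA’s were chained linearly.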

SLIDE 11

Logic Delay

  • Consider the gate arrangement for OR’ing 8 bits
  • Linear:
    – Delay = 7 gates
  • Tree:
    – Depth of tree = log2(8) = 3 levels
  • Consider OR’ing 16 bits using 4-input OR gates: how many levels would you need?
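
The linear and tree delays above can be checked with a small sketch. It assumes the linear arrangement is a chain of 2-input gates and that every gate in the tree has the same fan-in; the function names are made up for illustration.

```python
def linear_or_delay(num_inputs):
    """Gate delays for a chain of 2-input OR gates."""
    return num_inputs - 1


def tree_depth(num_inputs, fan_in=2):
    """Levels needed when each gate combines up to fan_in signals."""
    levels = 0
    signals = num_inputs
    while signals > 1:
        signals = -(-signals // fan_in)   # ceiling division
        levels += 1
    return levels


print(linear_or_delay(8), tree_depth(8))   # 7 3
```

Calling `tree_depth(16, fan_in=4)` gives the level count for the 16-bit question on this slide.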

SLIDE 12

Wallace Tree Discussion

  • A 4-input OR gate reduces 4 literals to 1 (i.e. a factor of 4 reduction)
  • A CSA reduces 3 vectors to 2 vectors (i.e. a factor of 1.5)
    – This reduction factor may not be convenient for developing an efficient tree to sum 16 or 32 partial products
    – A Wallace tree may not achieve a great reduction in delay because an extra level can be wasted
  • Also note the Wallace tree shown earlier does not show…
    – Size of buses
    – What bits are “retired” progressively
    – Relative significance (alignment) of partial products
    – Size of the carry-propagate adder (e.g. RCA or CLA), which needs to be figured out and the overall delay estimated
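
To see what the factor-of-1.5 reduction means for 16 or 32 partial products, the number of CSA levels can be counted with a short sketch. It assumes the simple grouping rule of taking operands three at a time at each level and passing leftovers through; `csa_tree_levels` is a hypothetical name.

```python
def csa_tree_levels(num_operands):
    """CSA levels needed to reduce num_operands vectors to two."""
    levels = 0
    while num_operands > 2:
        # Each full group of 3 becomes 2; leftover operands pass through.
        num_operands = 2 * (num_operands // 3) + num_operands % 3
        levels += 1
    return levels


for n in (8, 16, 32):
    print(n, csa_tree_levels(n))   # 8 4, 16 6, 32 8
```

Under this rule 16 operands need 6 levels and 32 need 8, so doubling the operand count adds two levels. A linear chain would instead need n - 2 CSA levels (14 and 30).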

SLIDE 13

SLIDE 14

[Figure: dot diagram of a 6x6 Wallace-tree reduction, with bit columns numbered from 1 up to 10 or 11. Panel labels: Original 6x6 Matrix; Reorganized 6x6 Matrix; Level 1 CSA; Results of Level 1; Level 2 CSA; Level 3 CSA.]

SLIDE 15

Credits

  • These slides were derived from Gandhi Puvvada’s EE 457 Class Notes