EE 457 Unit 2c Fast Multipliers - PowerPoint Presentation

1 EE 457 Unit 2c Fast Multipliers

2 Multiplication Overview
• Multiplication approaches:
– Sequential: Shift-and-Add produces one product bit per clock cycle (usually slow)
– Combinational: an array multiplier uses an array of adders
• The array can be as simple as N-1 ripple-carry adders for an NxN multiplication
• An AND gate array produces the partial product terms mi·qj, which are aligned as follows for a 4x4 multiplication:

                            m3    m2    m1    m0
                       x    q3    q2    q1    q0
                        ------------------------
                          m3q0  m2q0  m1q0  m0q0
                    m3q1  m2q1  m1q1  m0q1     -
              m3q2  m2q2  m1q2  m0q2     -     -
     +  m3q3  m2q3  m1q3  m0q3     -     -     -
------------------------------------------------
    p7    p6    p5    p4    p3    p2    p1    p0
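The AND gate array can be modelled in a few lines. This is a minimal Python sketch that assumes unsigned operands and treats Python integers as bit vectors. The helper names are hypothetical and do not come from the slides. Each term mi·qj is a single AND with weight 2^(i+j), which is why row j of the matrix above is shifted j places to the left before the rows are summed.

```python
def partial_products(m: int, q: int, n: int) -> list[list[int]]:
    """Return pp[j][i] = m_i AND q_j for an n x n unsigned multiplication."""
    return [[((m >> i) & 1) & ((q >> j) & 1) for i in range(n)] for j in range(n)]


def sum_partial_products(pp: list[list[int]]) -> int:
    """Add every partial-product bit at its weight 2**(i + j)."""
    return sum(bit << (i + j) for j, row in enumerate(pp) for i, bit in enumerate(row))


pp = partial_products(0b1101, 0b1011, 4)   # M = 13, Q = 11
print(pp[0])                                # row q0 (q0 = 1), m0 first: [1, 0, 1, 1]
print(sum_partial_products(pp))             # 143
```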

3 Array Multiplier
[Figure: a 4x4 array multiplier built from three rows of adders. Each row contains three FAs and one HA. Row j (j = 1, 2, 3) adds the partial products m3qj..m0qj to the sum bits passed down from the row above, and carries ripple left within the row. The outputs are P[7]..P[0]. One adder input in the first row is tied to 0, and the figure asks: Can this be a HA?]
• Maximum delay = ?
– Do you look for the longest path or the shortest path between any input and output?
– Compare this with the delay of the shift-and-add method.
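The array above can be checked bit by bit. This is a hypothetical Python model of the structure described in the figure, assuming unsigned n x n operands. Each of the n-1 rows is a ripple-carry adder. The rightmost cell of a row has a carry-in of 0, so it behaves as a HA. The leftmost cell of the first row sees a constant 0 on one input. The model checks function only and does not estimate gate delay.

```python
def full_adder(x: int, y: int, ci: int) -> tuple[int, int]:
    """One FA cell: returns (sum, carry_out)."""
    return x ^ y ^ ci, (x & y) | (x & ci) | (y & ci)


def array_multiply(m: int, q: int, n: int = 4) -> int:
    """Bit-level model of an n x n array multiplier made of n-1 ripple-carry rows."""
    pp = [[((m >> i) & 1) & ((q >> j) & 1) for i in range(n)] for j in range(n)]
    product = [0] * (2 * n)
    acc = pp[0] + [0]          # row q0 plus the constant-0 input of the top-left cell
    product[0] = acc[0]        # P[0] = m0q0 needs no adder
    for j in range(1, n):      # one adder row per remaining partial product
        carry = 0              # rightmost cell has no carry-in, so it acts as a HA
        row = []
        for i in range(n):
            s, carry = full_adder(acc[i + 1], pp[j][i], carry)  # carry ripples left
            row.append(s)
        acc = row + [carry]    # leftmost carry-out becomes the row's top bit
        product[j] = acc[0]    # each row retires one product bit
    product[n:] = acc[1:]      # the last row supplies the upper n bits
    return sum(bit << k for k, bit in enumerate(product))


print(array_multiply(13, 11))  # 143
```

The inner loop shows why the combinational array is slow. A carry can ripple across a row before the next row can settle, so the worst-case path runs through both the rows and the columns.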

4 Pipelined Multiplier
• Now try to pipeline the previous design.
[Figure: the 4x4 array multiplier redrawn for pipelining. The first row uses HA, FA, FA, HA cells, the two lower rows use FA, FA, FA, HA cells, and the outputs are P[7]..P[0].]
• Determine the maximum stage delay to decide the pipeline clock rate. Assume zero delay for the stage latches.
• How does the latency of the pipeline compare with that of the simple combinational array on the previous slide?
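The two timing questions can be framed as a small calculation under the slide's zero-delay-latch assumption. In this Python sketch the per-stage delays are hypothetical placeholders, not values derived from the figure. The slowest stage sets the clock period, and every result spends one full period in each stage.

```python
def pipeline_timing(stage_delays: list[float], latch_delay: float = 0) -> tuple[float, float]:
    """Return (clock_period, latency) for a linear pipeline.

    The clock must accommodate the slowest stage. latch_delay defaults to 0,
    matching the zero-delay stage latches assumed on the slide.
    """
    period = max(stage_delays) + latch_delay
    latency = period * len(stage_delays)
    return period, latency


stages = [3, 4, 4]                    # hypothetical stage delays, in gate delays
period, latency = pipeline_timing(stages)
print(period, latency, sum(stages))   # 4 12 11
```

Here a new product can start every 4 units once the pipeline is full. Each product still needs 12 units end to end, compared with 11 for the same stages without latches, assuming their worst paths chain end to end.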

5 Carry-Save Multiplier
• Instead of propagating the carries to the left within the same row, carries are sent down to the next stage. This reduces the stage delay and facilitates pipelining.
[Figure: a 4x4 carry-save multiplier. The partial products m3q0..m0q0 and m3q1..m0q1 enter the first row of FAs, whose carry inputs are 0. The q2 and q3 partial products enter the next two rows. A final row of FAs with a carry-in of 0 forms a ripple-carry adder that produces P[7]..P[0].]
• The upper three stages are 3-bit Carry Save Adders (CSAs), each with a 2-gate delay.
• The last stage is a Ripple Carry Adder (RCA), which has a longer delay. It can be replaced by a CLA for larger multipliers.
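The key change from the array multiplier is that each CSA row's carries move down instead of sideways. This is a hypothetical bit-level Python model, assuming unsigned n x n operands. The n-1 CSA rows consist only of disjoint FAs, and a final ripple-carry loop stands in for the RCA stage. A CLA in that position would compute the same sum faster.

```python
def fa(x: int, y: int, z: int) -> tuple[int, int]:
    """Full adder: returns (sum, carry_out)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def bit(v: int, k: int) -> int:
    """Bit k of v."""
    return (v >> k) & 1


def carry_save_multiply(m: int, q: int, n: int = 4) -> int:
    width = 2 * n
    # AND-gate array rows, aligned to absolute bit positions 0..width-1.
    # Modelled over the full 2n-bit width for simplicity; FAs that only ever
    # see zeros would be omitted in hardware.
    rows = [[bit(m, k - j) & bit(q, j) if 0 <= k - j < n else 0 for k in range(width)]
            for j in range(n)]
    s, c = rows[0], [0] * width            # carry inputs of the first CSA row are 0
    for j in range(1, n):                   # n-1 CSA rows
        new_s, new_c = [0] * width, [0] * width
        for k in range(width):              # disjoint FAs: nothing ripples within a row
            new_s[k], co = fa(s[k], c[k], rows[j][k])
            if k + 1 < width:               # top carry is always 0 (total fits in 2n bits)
                new_c[k + 1] = co           # carry is sent down, one column to the left
        s, c = new_s, new_c
    # Final stage: a ripple-carry adder merges the sum and carry vectors.
    product, carry = 0, 0
    for k in range(width):
        b, carry = fa(s[k], c[k], carry)
        product |= b << k
    return product


print(carry_save_multiply(13, 11))  # 143
```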

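6 Carry Save Adders
• Consider the decimal addition 47 + 96 + 58 = 201.
• One way is to add 47 to 96 to get 143 and then add 58.
– Here the tens column cannot be added until the carry from the ones column is produced.
• In the carry-save style, we add the ones column and the tens column simultaneously and combine the column sums in a final step.

Conventional (carries ripple from column to column):
     47
   + 96
   ----
    143
   + 58
   ----
    201

Carry-save (each column summed independently):
     47
     96
   + 58
   ----
     21   (ones column: 7 + 6 + 8)
  + 18_   (tens column: 4 + 9 + 5, shifted one place)
   ----
    201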
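The decimal example can be reproduced directly. This is a minimal Python sketch with hypothetical helper names. It sums each digit column on its own, with no carries between columns, and then performs one final carry-propagating combination by weighting each column sum by its place value.

```python
def column_sums(operands: list[int], width: int, base: int = 10) -> list[int]:
    """Sum each digit column independently (no carries between columns).

    Returns the column sums, least-significant column first.
    """
    return [sum((x // base**col) % base for x in operands) for col in range(width)]


def combine(sums: list[int], base: int = 10) -> int:
    """Final carry-propagating step: weight each column sum by its place value."""
    return sum(s * base**col for col, s in enumerate(sums))


cols = column_sums([47, 96, 58], width=2)
print(cols)           # [21, 18]
print(combine(cols))  # 201
```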

7 Carry-Save (3,2) Adders
• A carry-save adder is also called a (3,2) adder or a (3,2) counter (refer to Computer Arithmetic Algorithms by Israel Koren) because it takes three vectors, adds them up, and reduces them to two vectors, namely a sum vector and a carry vector. For example:

       0 1 0 1
       1 0 0 1
     + 1 0 1 1
     ---------
     1 0 0 1 _   Carry vector
       0 1 1 1   Sum vector

• CSAs are based on the principle that carries do not have to be added as soon as possible, but can be combined in a later step.
• An n-bit CSA consists of n disjoint full adders.
[Figure: a 4-bit CSA built from four independent FAs. FA i takes A[i], B[i], and C[i] and produces the sum bit S[i] and a carry-out (labeled C[i+1] in the figure), giving the outputs C[4] S[3] C[3] S[2] C[2] S[1] C[1] S[0].]
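The 4-bit example can be checked with a direct model of "n disjoint full adders". This is a minimal Python sketch with a hypothetical function name. No carry variable is passed from one loop iteration to the next. Each bit position is an independent FA whose carry-out is simply placed one position to the left in the carry vector.

```python
def csa(a: int, b: int, c: int, n: int) -> tuple[int, int]:
    """n-bit carry-save adder: n independent full adders, one per bit position.

    Returns (sum_vector, carry_vector). The carry vector is already shifted
    one place left, so a + b + c == sum_vector + carry_vector.
    """
    sum_vec, carry_vec = 0, 0
    for i in range(n):
        x, y, z = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        sum_vec |= (x ^ y ^ z) << i                            # S[i]
        carry_vec |= ((x & y) | (x & z) | (y & z)) << (i + 1)  # carry-out of FA i
    return sum_vec, carry_vec


s, cy = csa(0b0101, 0b1001, 0b1011, 4)
print(f"{cy:05b} {s:04b}")  # 10010 0111
print(s + cy)               # 25, which equals 5 + 9 + 11
```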

8 1-bit FA vs. 1-bit CSA
• Is there any difference between an ordinary full adder and a 1-bit CSA? No!
• A 16-bit wide CSA takes ( more / equal / less ) time to produce its outputs compared to an 8-bit wide CSA.
• A carry-save adder ( is / is not ) useful for adding only 2 numbers.

9 CSA Organization
• We can arrange our CSAs in a linear manner, where one partial product is added per CSA (after the first level).
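The linear organization can be sketched at the word level in Python, treating integers as bit vectors. The helper names are hypothetical. This arrangement mirrors the carry-save multiplier. The first CSA receives a carry vector of 0, and every later CSA absorbs one more aligned partial product, so an n x n multiply uses n-1 CSAs. The final `+` stands in for the carry-propagate adder.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Word-level (3,2) counter: bitwise sum and shifted majority carries."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def linear_csa_multiply(m: int, q: int, n: int) -> tuple[int, int]:
    """Return (product, number_of_CSAs) for a linear chain of CSAs."""
    pps = [(m << j) if (q >> j) & 1 else 0 for j in range(n)]  # aligned q_j * M rows
    s, c = pps[0], 0             # first level: carry vector starts at 0
    count = 0
    for pp in pps[1:]:           # one partial product enters each CSA
        s, c = csa(s, c, pp)
        count += 1
    return s + c, count          # '+' stands in for the final carry-propagate adder


print(linear_csa_multiply(200, 173, 8))  # (34600, 7)
```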

10 Wallace Tree Multiplier
• Using the previous example as a template, building an n x n multiplier requires (n-1) (n-1)-bit CSAs followed by a final (n-1)-bit RCA.
• Delay = Delay of (n-1) CSAs + Delay of the (n-1)-bit RCA = (n-1) * Delay(FullAdder) + (n-1) * Delay(FullAdder) = 2 * (n-1) * Delay(FullAdder). Each CSA costs one full-adder delay, and the carry can ripple through all n-1 full adders of the RCA.
• We can reduce the CSA component of the delay by organizing the CSAs in a tree (i.e. logarithmic delay).
[Figure: block diagram in which the partial products q0·M through q7·M feed a network of CSAs. The network's two output vectors enter a carry-propagate adder that produces the Product.]
Note: The vectors (partial products) need to be aligned before summing. These details are not shown in the block diagram.
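The tree organization can be sketched at the word level in the same way. This is a Python sketch of a Wallace-style CSA tree with hypothetical helper names, not a bit-level Wallace design. At each level, every complete group of three vectors goes through a CSA in parallel, and the one or two leftover vectors pass straight down. The function counts levels, which is what sets the CSA part of the delay.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Word-level (3,2) counter: a + b + c == s + carry."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_sum(operands: list[int]) -> tuple[int, int]:
    """Reduce operands with levels of parallel CSAs, then add the last two.

    Returns (total, number_of_CSA_levels).
    """
    vecs = list(operands)
    levels = 0
    while len(vecs) > 2:
        full = len(vecs) - len(vecs) % 3     # operands that form complete groups of three
        nxt = []
        for i in range(0, full, 3):          # all CSAs in one level work in parallel
            nxt.extend(csa(vecs[i], vecs[i + 1], vecs[i + 2]))
        nxt.extend(vecs[full:])              # 1 or 2 leftovers wait for the next level
        vecs = nxt
        levels += 1
    return sum(vecs), levels                 # '+' models the carry-propagate adder


m, q = 200, 173
pps = [(m << j) if (q >> j) & 1 else 0 for j in range(8)]   # q0*M .. q7*M, aligned
print(wallace_sum(pps))  # (34600, 4)
```

With eight partial products the tree needs 4 CSA levels, compared with n-1 = 7 CSAs in the linear arrangement. The final carry-propagate adder is unchanged.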

11 Logic Delay
• Consider the gate arrangement for OR'ing 8 bits:
– Linear (a chain of 2-input gates): Delay = 7 gates
– Tree: Depth of tree = log2(8) = 3 levels
• Consider OR'ing 16 bits using 4-input OR gates: how many levels would you need?
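Both arrangements reduce to a short count. This minimal Python sketch uses hypothetical function names. The linear case assumes a chain of 2-input gates, as in the 8-bit example. The tree case repeatedly replaces each group of up to `fan_in` signals with one gate output, which gives ceil(log_fan_in(n)) levels without floating-point logarithms.

```python
def linear_levels(n_inputs: int) -> int:
    """Gate delays through a chain of 2-input gates."""
    return max(n_inputs - 1, 0)


def tree_levels(n_inputs: int, fan_in: int = 2) -> int:
    """Gate levels in a balanced tree, i.e. ceil(log_fan_in(n_inputs))."""
    levels = 0
    while n_inputs > 1:
        n_inputs = -(-n_inputs // fan_in)  # gates needed at this level (ceiling division)
        levels += 1
    return levels


print(linear_levels(8), tree_levels(8))  # 7 3
```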

12 Wallace Tree Discussion
• A 4-input OR gate reduces 4 literals to 1 (i.e. a reduction factor of 4).
• A CSA reduces 3 vectors to 2 vectors (i.e. a reduction factor of 1.5).
– This reduction factor may not be convenient for building an efficient tree to sum 16 or 32 partial products.
– A Wallace tree may not achieve a great reduction in delay, because an extra level can be wasted.
• Also note that the Wallace tree shown earlier does not show:
– the size of the buses
– which bits are "retired" progressively
– the relative significance (alignment) of the partial products
– the size of the carry-propagate adder (e.g. RCA or CLA), which needs to be determined before the overall delay can be estimated
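The "wasted level" effect can be shown by counting CSA levels directly. This is a minimal Python sketch with a hypothetical function name. It counts at the word level: each level turns every complete group of three vectors into two and passes the 0 to 2 leftover vectors through. For comparison, it prints the level count that an exact 1.5x reduction per level would suggest.

```python
import math


def csa_tree_levels(n_vectors: int) -> int:
    """Levels of (3,2) CSAs needed to reduce n_vectors operands to 2."""
    levels = 0
    while n_vectors > 2:
        n_vectors = 2 * (n_vectors // 3) + n_vectors % 3
        levels += 1
    return levels


for n in (6, 8, 16, 32):
    ideal = math.log(n / 2, 1.5)  # levels if every level shrank the count by exactly 1.5x
    print(n, csa_tree_levels(n), round(ideal, 2))
```

For 32 partial products the loop gives 8 levels. The ideal 1.5x figure is about 6.8, which would round up to 7, so grouping into threes wastes a level there.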

14 [Figure: Wallace tree reduction of a 6x6 multiplication, drawn as dot diagrams over bit positions 0 to 10 (0 to 11 after the last level). Panels: Original 6x6 matrix; Reorganized 6x6 matrix; Level 1 CSA; Results of Level 1; Level 2 CSA; Level 3 CSA.]

15 Credits • These slides were derived from Gandhi Puvvada’s EE 457 Class Notes
