Multiplication Overview




  1. 2c.1 EE 457 Unit 2c, Fast Multipliers

     2c.2 Multiplication Overview
     • Multiplication approaches:
       – Sequential: Shift-and-Add produces one product bit per clock cycle (usually slow)
       – Combinational: Array multiplier uses an array of adders
         • Can be as simple as N-1 ripple-carry adders for an NxN multiplication

                                 m3    m2    m1    m0
     x                           q3    q2    q1    q0
     ------------------------------------------------
                               m3q0  m2q0  m1q0  m0q0
                         m3q1  m2q1  m1q1  m0q1     -
                   m3q2  m2q2  m1q2  m0q2     -     -
     +       m3q3  m2q3  m1q3  m0q3     -     -     -
     ------------------------------------------------
         p7    p6    p5    p4    p3    p2    p1    p0

       AND Gate Array produces partial product terms

     2c.3 Array Multiplier
     Can this be a HA?
     • Maximum delay = ____________________
       – Do you look for the longest path or the shortest path between any input and output?
       – Compare with the delay of a shift-and-add method

     2c.4 Pipelined Multiplier
     • Now try to pipeline the previous design
     Determine the maximum stage delay to decide the pipeline clock rate.
     Assume zero-delay for stage latches.
     How does the latency of the pipeline compare with the simple combinational array of the previous stage?
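
The two approaches on slide 2c.2 can be compared with a small behavioral sketch. This is only an illustration, not a gate-level or timing model: the function names (`partial_products`, `array_multiply`, `shift_and_add`) and the default width `n=4` are assumptions made for this example, and ordinary integer addition stands in for the rows of adders.

```python
def partial_products(m, q, n=4):
    """AND-gate array: row j holds the bits m_i AND q_j for i = 0..n-1 (LSB first)."""
    return [[((m >> i) & 1) & ((q >> j) & 1) for i in range(n)] for j in range(n)]


def array_multiply(m, q, n=4):
    """Combinational view: add all n partial-product rows, row j shifted left by j."""
    product = 0
    for j, row in enumerate(partial_products(m, q, n)):
        row_value = sum(bit << i for i, bit in enumerate(row))
        product += row_value << j  # alignment of row j, as in the 4x4 layout
    return product


def shift_and_add(m, q, n=4):
    """Sequential view: one multiplier bit is examined per loop iteration ("clock")."""
    product = 0
    for cycle in range(n):
        if (q >> cycle) & 1:
            product += m << cycle
    return product


if __name__ == "__main__":
    print(array_multiply(13, 11))  # 143
    print(shift_and_add(13, 11))   # 143
```

Both functions return the same product; the difference the slides care about is where the delay goes: n clock cycles for the sequential loop versus one long combinational path through the adder rows.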

  2. 2c.5 Carry-Save Multiplier
     • Instead of propagating the carries to the left in the same row, carries are now sent down to the next stage to reduce stage delay and facilitate pipelining
     [Figure: 4x4 carry-save multiplier. The partial-product terms m3q0 ... m0q3 feed rows of full adders (FA; inputs X, Y, Ci; outputs Co, S), with a final RCA row producing P[7] ... P[0].]
     The upper three stages are 3-bit Carry Save Adders (CSA’s), each with 2-gate delays.
     The last stage is a Ripple Carry Adder (RCA), which requires longer delay. It can be replaced by a CLA for larger multipliers.

     2c.6 Carry Save Adders
     • Consider the decimal addition of 47 + 96 + 58 = 201
     • One way is to add ________ to get ____ and _____

             4 7
           + 9 6
           -----
           1 4 3
           + 5 8
           -----
           2 0 1

     • Here the _____ column cannot be added ___________ is produced
     • In the carry-save style, we add the ____ column and _____ column simultaneously

             4 7
             9 6
           + 5 8
         -------
             2 1
         + 1 8 _
         -------
           2 0 1

     2c.7 Carry-Save (3,2) Adders
     • A carry save adder is also called a (3,2) adder or a (3,2) counter (refer to Computer Arithmetic Algorithms by Israel Koren) as it takes three vectors, adds them up, and reduces them to two vectors, namely a sum vector and a carry vector
     • CSA’s are based on the principle that carries do not have to be added _______________, but can be combined ______________
     • An n-bit CSA consists of n disjoint full adders

     2c.8 1-bit FA vs. 1-bit CSA
     • Any difference between an ordinary full adder and 1-bit CSA?

             0 1 0 1
             1 0 0 1
           + 1 0 1 1
           ---------
           1 0 0 1 _    Carry vector
             0 1 1 1    Sum vector

     • 16-bit wide CSA takes ( more / equal / less ) time to produce its outputs compared to an 8-bit wide CSA
     • Carry-save adder ( is / is not ) useful in adding only 2 numbers
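
The (3,2) reduction on slides 2c.7 and 2c.8 can be sketched on plain integers. This is a minimal illustration under stated assumptions: operands are non-negative Python integers, and the names `carry_save_add` and `add_many` are invented for this example. Each bit position acts as an independent full adder, so the sum vector is the bitwise XOR of the three inputs and the carry vector is their bitwise majority shifted left by one position.

```python
def carry_save_add(x, y, z):
    """(3,2) adder: reduce three vectors to a sum vector and a carry vector.

    No carry moves between bit positions here; every bit is its own full adder.
    """
    sum_vector = x ^ y ^ z                             # FA sum output, per bit
    carry_vector = ((x & y) | (y & z) | (x & z)) << 1  # FA carry output, one place left
    return sum_vector, carry_vector


def add_many(operands):
    """Apply CSAs until two vectors remain, then do one carry-propagate add."""
    vectors = list(operands)
    while len(vectors) > 2:
        s, c = carry_save_add(*vectors[:3])
        vectors = [s, c] + vectors[3:]
    return sum(vectors)  # stands in for the final RCA or CLA


if __name__ == "__main__":
    s, c = carry_save_add(0b0101, 0b1001, 0b1011)
    print(format(s, "04b"), format(c, "05b"))  # 0111 10010
    print(add_many([47, 96, 58]))              # 201
```

The first call reproduces the sum vector 0111 and carry vector 1001_ from slide 2c.8. Only the last step in `add_many` propagates carries, which is why the slides place a single RCA or CLA at the bottom of the multiplier.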

  3. 2c.9 CSA Organization, 2c.10 Wallace Tree Multiplier
     • Using the previous example as a template, to build an NxN multiplier you need (n-1) of (n-1) bit CSAs followed by a final (n-1)-bit RCA
     • Delay = Delay of (n-1) CSA’s + Delay of (n-1) bit RCA = ______________________
     • We can reduce the CSA component of the delay by organizing the CSA’s in a _____ (i.e. ___________ delay)
     • We can arrange our CSA’s in a _______ manner where ____ partial product is added per CSA (after the first level)
     [Figure: block diagram in which the partial products q7·M, q6·M, ..., q0·M feed a set of CSA blocks, followed by a Propagation Adder that produces the Product.]
     Note: The vectors (partial products) need to be aligned before summing. These details are not shown in the block diagram.

     2c.11 Logic Delay
     • Consider the gate arrangement for OR’ing 8 bits
     • Linear:
       – Delay = __ gates
     • Tree:
       – Depth of tree = ____ = __ levels
     • Consider OR’ing 16 bits using 4-input OR gates: how many levels would you need?

     2c.12 Wallace Tree Discussion
     • A 4-input OR gate reduces 4 literals to 1 (i.e. a factor of 4 reduction)
     • A CSA reduces 3 vectors to 2 vectors (i.e. a factor of 1.5)
       – This reduction factor may not be convenient to develop an efficient tree to sum 16 or 32 partial products
       – Wallace tree may not achieve a great reduction in delay due to wastage of an extra level
     • Also note that the Wallace tree shown earlier does not show…
       – Size of buses
       – What bits are “retired” progressively
       – Relative significance (alignment) of partial products
       – Size of the carry-propagate adder (e.g. RCA or CLA), which needs to be figured out and the overall delay estimated
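
The level-by-level reduction behind a Wallace tree can be sketched as follows. This is a behavioral illustration only, under assumptions chosen for this example: the names `wallace_reduce` and `wallace_multiply` are hypothetical, the grouping simply takes vectors three at a time in list order, and bus widths and retired bits (the details slide 2c.12 says the block diagram omits) are not modeled. Alignment of the partial products q_j·M is handled by the left shift.

```python
def carry_save_add(x, y, z):
    """(3,2) adder on integers: returns (sum vector, carry vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_reduce(vectors):
    """Reduce vectors to at most two, one CSA level at a time.

    At each level every complete group of three vectors goes through a CSA;
    one or two leftover vectors pass straight to the next level.
    Returns (remaining vectors, number of CSA levels used).
    """
    vectors = list(vectors)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        next_level = []
        for i in range(0, full, 3):
            next_level.extend(carry_save_add(*vectors[i:i + 3]))
        next_level.extend(vectors[full:])
        vectors = next_level
        levels += 1
    return vectors, levels


def wallace_multiply(m, q, n=8):
    """Multiply two n-bit unsigned values: aligned partial products, CSA tree, final add."""
    partials = [(m << j) if (q >> j) & 1 else 0 for j in range(n)]  # q_j * M, aligned
    vectors, levels = wallace_reduce(partials)
    return sum(vectors), levels  # sum() stands in for the carry-propagate adder


if __name__ == "__main__":
    print(wallace_multiply(200, 123))  # (24600, 4)
```

With this grouping, 8 partial products take 4 CSA levels (8, 6, 4, 3, 2 vectors), compared with the 6 CSA levels of a chain that adds one partial product per CSA after the first. The factor-of-1.5 reduction per level is also visible: 16 operands take 6 levels here.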

  4. 2c.13, 2c.14
     [Figure: six panels drawn over bit positions 10 down to 0 (11 down to 0 for the last panel): Original 6x6 Matrix, Reorganized 6x6 Matrix, Level 1 CSA, Results of Level 1, Level 2 CSA, Level 3 CSA.]

     2c.15 Credits
     • These slides were derived from Gandhi Puvvada’s EE 457 Class Notes
