Principles of VLSI Design: Subsystem Design (UMBC CMPE 413/CMSC 711) - PowerPoint PPT Presentation

Jul 06, 2023


SLIDE 1

Principles of VLSI Design Subsystem Design CMPE 413/CMSC 711 1 (December 11, 2000 3:44 pm)

UMBC

U M B C U N I V E R S I T Y O F M A R Y L A N D B A L T I M O R E C O U N T Y 1 9 6 6

Digital Device Components

A simple processor illustrates many of the basic components used in any digital system: Memory, Input-Output, Control and Datapath.

Datapath: The core -- all other components are support units that either store the results of the datapath or determine what happens in the next cycle.

[Figure: block diagram connecting Control, Memory, Datapath and Input-Output.]

SLIDE 2

Digital Device Components

Memory:

A broad range of classes exists, determined by the way data is accessed:

Read-Only vs. Read-Write

Sequential vs. Random access

Single-ported vs. Multi-ported access

Or by their data retention characteristics:

Dynamic vs. Static

Stay tuned for a more extensive treatment of memories.

Control:

An FSM (sequential circuit) implemented using random logic, PLAs or memories.

Interconnect and Input-Output:

Parasitic resistance, capacitance and inductance affect the performance of wires both on and off the chip. Growing die size increases the length of the on-chip interconnect, increasing the value of the parasitics.

SLIDE 3

Digital Device Components

Datapath elements include adders, multipliers, shifters, BFUs, etc.

The speed of these elements often dominates the overall system performance, so optimization techniques are important.

However, as we will see, the task is non-trivial since there are multiple equivalent logic and circuit topologies to choose from, each with advantages and disadvantages in terms of speed, power and area.

Also, optimizations focused on a single design level, e.g., sizing transistors, lead to inferior designs.

Bit-sliced organization is common for datapaths.

[Figure: bit-sliced datapath. Data-In passes through Registers, Adder, Shifter and Multiplexer to Data-Out under Control; the slices are labeled Bit 0 to Bit 4.]

SLIDE 4

Datapath Operators: Addition/Subtraction

Let’s start with addition, since it is a very common datapath element and often a speed-limiting element.

Optimizations can be applied at the logic or circuit level.

Logic-level optimizations try to rearrange the Boolean equations to produce a faster or smaller circuit, e.g. carry look-ahead adder.

Circuit-level optimizations manipulate transistor sizes and circuit topology to optimize speed.

Let’s start with some basic definitions before considering optimizations:

A  B  Ci | G(A.B)  P(A+B)  P’(A XOR B) | Sum  Co | Carry status
0  0  0  |   0       0         0       |  0   0  | delete
0  0  1  |   0       0         0       |  1   0  | delete
0  1  0  |   0       1         1       |  1   0  | propagate
0  1  1  |   0       1         1       |  0   1  | propagate
1  0  0  |   0       1         1       |  1   0  | propagate
1  0  1  |   0       1         1       |  0   1  | propagate
1  1  0  |   1       1         0       |  0   1  | generate
1  1  1  |   1       1         0       |  1   1  | generate

SLIDE 5

Datapath Operators: Addition/Subtraction

G(A.B): (generate) Occurs when a Co is internally generated within the adder (occurs independent of Ci).

P(A+B): (propagate) Indicates that Ci is propagated (passed) to Co.

P’(A XOR B): (propagate) Used in some adders for the P term since it can be reused to generate the sum term.

D($\overline{A}.\overline{B}$): (delete) Ensures that a carry bit will be deleted at Co.

The Boolean expressions for S and Co are:

$\text{Sum} = A.B.C_i + A.\overline{B}.\overline{C_i} + \overline{A}.B.\overline{C_i} + \overline{A}.\overline{B}.C_i = A \oplus B \oplus C_i$

$\text{Carry} = A.B + A.C_i + B.C_i$
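The definitions above can be checked exhaustively. The following is a minimal Python sketch, not part of the original slides; the function name `full_adder_signals` is hypothetical, and the delete term is taken to be NOT(A).NOT(B).

```python
from itertools import product

def full_adder_signals(a, b, ci):
    """Return (G, P, P', D, Sum, Co) for one bit position (illustrative helper)."""
    g = a & b                  # generate: carry created regardless of Ci
    p = a | b                  # propagate, OR form
    p_xor = a ^ b              # propagate, XOR form (P')
    d = (1 - a) & (1 - b)      # delete: carry killed regardless of Ci (assumed form)
    s = p_xor ^ ci             # Sum = A XOR B XOR Ci
    co = (a & b) | (a & ci) | (b & ci)
    return g, p, p_xor, d, s, co

for a, b, ci in product((0, 1), repeat=3):
    g, p, p_xor, d, s, co = full_adder_signals(a, b, ci)
    assert 2 * co + s == a + b + ci      # the pair (Co, Sum) encodes the arithmetic sum
    assert co == (g | (p & ci))          # Co = G + P.Ci
    assert co == (g | (p_xor & ci))      # Co = G + P'.Ci gives the same result
    assert g + d + p_xor == 1            # exactly one of generate/delete/propagate(XOR)
```

Either form of propagate yields the same carry because the two forms differ only when A = B = 1, where G already forces Co = 1.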

SLIDE 6

Datapath Operators: Addition/Subtraction

But S and Co can be written in terms of G and P’:

Co(G, P’) = G + P’Ci (or P in this case).

S(G, P’) = P’ XOR Ci

Note that G and P’ are independent of Ci.

(Also, Co and S can be expressed in terms of delete (D)).

Ripple-carry adder:

[Figure: four full adders (FA) in a chain. Stage k takes Ak, Bk and Ci,k and produces Sk and Co,k; Co,0 = Ci,1, and so on up to Co,3.]

The critical path (worst case delay over all possible inputs) is a ripple from lsb to msb.

SLIDE 7

Datapath Operators: Addition/Subtraction

The delay in this case is proportional to the number of bits, N, in the input words:

tadder = (N - 1)tcarry + tsum

where tcarry and tsum are the propagation delays from Ci to Co and S.

One possible worst case bit pattern (the carry ripples from lsb to msb) is:

A: 00000001; B: 01111111

Convince yourself that this is true.

Note that when optimizing this structure, it is far more important to optimize tcarry than tsum.

The inverting property of a full adder can be used to achieve this goal:

[Figure: a full adder (FA) with inputs A, B, Ci and outputs S, Co, shown next to the same FA with all inputs and outputs inverted.]
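One way to convince yourself is a small unit-delay model. The sketch below is an illustration only: the function name `ripple_carry_delay` and the timing model (a stage with A = B settles its carry from the inputs alone, a propagating stage must wait for its carry-in) are assumptions, not something taken from the slides.

```python
def ripple_carry_delay(a_bits, b_bits, t_carry=1.0, t_sum=1.0):
    """Simplified timing model of a ripple-carry adder.

    Bit lists are given lsb first. Returns (sum_bits, carry_out, delay).
    """
    carry, carry_ready = 0, 0.0      # carry-in of stage 0 is 0 and valid at time 0
    sum_bits, delay = [], 0.0
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        # The sum bit settles t_sum after this stage's carry-in is valid.
        delay = max(delay, carry_ready + t_sum)
        if a == b:
            # Generate (1,1) or delete (0,0): carry-out depends on the inputs only.
            carry, carry_ready = a, t_carry
        else:
            # Propagate: carry-out equals carry-in, one t_carry later.
            carry_ready += t_carry
    return sum_bits, carry, delay

# Pattern from the slide, written msb first and reversed to lsb first.
a = [int(c) for c in reversed("00000001")]
b = [int(c) for c in reversed("01111111")]
sum_bits, carry_out, delay = ripple_carry_delay(a, b)
print(sum_bits, carry_out, delay)
```

With unit delays the model reports (N - 1)tcarry + tsum for N = 8: bit 0 generates a carry, bits 1 to 6 propagate it, and the msb sum bit is the last to settle.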

SLIDE 8

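Datapath Operators: Addition/Subtraction

Thus, inverting all inputs of a full adder inverts all of its outputs:

$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$

$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$

One possible (un-optimized) implementation:

[Figure: gate-level full adder with inputs A, B, Ci and outputs S, Co; internal nodes are labeled G(A.B), Ci.P(A + B) and P’ XOR Ci.]

Transistor level diagram uses 32 transistors (see Weste and Eshraghian).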
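The inverting property is easy to confirm by enumeration. This is a short illustrative sketch; the helper names `full_adder` and `inverting_property_holds` are hypothetical.

```python
from itertools import product

def full_adder(a, b, ci):
    """Behavioral full adder: returns (S, Co)."""
    s = a ^ b ^ ci
    co = (a & b) | (a & ci) | (b & ci)
    return s, co

def inverting_property_holds():
    """Check NOT S(A,B,Ci) == S(NOT A, NOT B, NOT Ci), and the same for Co."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
        if (1 - s, 1 - co) != (s_inv, co_inv):
            return False
    return True

print(inverting_property_holds())
```

In a ripple chain this means alternate stages can work on inverted carries, so the inverter that restores carry polarity does not have to sit on the carry path of every bit.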

SLIDE 9

Datapath Operators: Addition/Subtraction

Co is reused in the S term as:

$\text{Sum} = A.B.C_i + (A + B + C_i).\overline{C_o}$

Even with some design tricks, e.g., transistors on the critical path, Ci placed closest to the output and symmetrical design, this implementation is slow.

[Figure: transistor-level schematic of the full adder with inputs A, B, Ci and outputs Co, S; 28 transistors.]

Are the n and p trees duals of each other?

Symmetrical design eliminates diffusion caps and reduces series R.

SLIDE 10

Datapath Operators: Addition/Subtraction

The load capacitance on Co in the previous version consists of 2 diffusion capacitances (inverter) and 6 (next bit) gate capacitances.

This version increases Co’s load to 4 diffusion caps, 2 internal (sum) gate caps plus the 6 (next bit) gate caps.

Eliminates the inverter delay per bit for carry!

[Figure: ripple adder and adder/subtractor schematics. Labels: Cin, A<n>...A<0>, B<n>...B<0>, S<n>...S<0>, C<3>, C<n>, C<n+1>, Subtract, Overflow, Sign of the result.]

SLIDE 11

Datapath Operators: Addition/Subtraction

Serial addition can be used if area is a concern:

[Figure: two n-bit shift registers (addend and augend) clocked by Clk feed a single full adder; Cout is stored in a 1-bit register (with Set and Clr) and fed back as Cin; the adder produces a 1-bit result per clock.]

In this case, you want equal Sum and Carry delays in order to minimize clock cycle time.

Bit-level pipelining can be used to break the dependency between addition time and the number of bits by inserting FAs between each register bit.
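The serial scheme can be mimicked cycle by cycle. The following is a minimal behavioral sketch, assuming the carry register is cleared before the first clock; the names `serial_add`, `to_bits` and `from_bits` are hypothetical helpers.

```python
def to_bits(value, n):
    """Return the n low-order bits of value, lsb first."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Inverse of to_bits for an lsb-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))

def serial_add(addend_bits, augend_bits):
    """Bit-serial addition: one full adder plus a 1-bit carry register.

    Each loop iteration models one clock cycle; operands arrive lsb first.
    """
    carry = 0                      # carry register, assumed cleared at start
    result = []
    for a, b in zip(addend_bits, augend_bits):
        result.append(a ^ b ^ carry)                  # 1-bit result this cycle
        carry = (a & b) | (a & carry) | (b & carry)   # stored for the next cycle
    return result, carry

bits, carry_out = serial_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(bits) + (carry_out << 4))
```

An n-bit addition takes n clock cycles, and each cycle must cover both the Sum and the Carry path, which is why equal Sum and Carry delays minimize the cycle time.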

SLIDE 12

Datapath Operators: Addition/Subtraction

Transmission-gate Adder:

[Figure: transmission-gate full adder built around XOR and XNOR stages, with inputs A, B, Ci and outputs S, Co. Total transistors is 26.]

Note: S and Co delay times are approximately equal -- good for multipliers.

See Weste and Eshraghian for an 18 transistor implementation.

SLIDE 13

Datapath Operators: Addition/Subtraction

Dynamic Adder Design: np-CMOS adder

[Figure: two-bit np-CMOS adder clocked by φ. Inputs A0, B0, Ci0 and A1, B1; internal carries Ci1 and Ci2; outputs S0 and S1.]

SLIDE 14

Datapath Operators: Addition/Subtraction

Dynamic Adder Design: Manchester Carry-Chain adder.

A chain of pass-transistors is used to implement the carry chain.

Precharge: All intermediate nodes, e.g. Co,0, are charged to VDD.

Evaluate: Node Co,k is discharged, for example, if there is an incoming carry, Ci,0, and the previous propagate signals, P0 to Pk-1, are high.

Only 4 diffusion capacitances are present per node, but the distributed RC nature of the chain results in a delay that is quadratic in the number of bits.

Buffers and/or transistor sizing can be used to improve performance.

[Figure: Manchester carry chain clocked by φ, with carry-in Ci,0, inputs P0...P4 and G0...G4, and carry nodes Co,0...Co,4. Relative transistor sizes between 1 and 4 are annotated along the chain. Annotation: transistor sizes largest here since worst case is to discharge all nodes Co,k.]

SLIDE 15

Datapath Operators: Addition/Subtraction

Consider the worst case delay of the carry chain:

[Figure: RC ladder with series resistors R1...R6 and node capacitors C1...C6; the output Out is taken at the last node.]

Elmore delay is given by:

$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$

The delay of the RC network is then:

tp = 0.69(C1R1 + C2(R1 + R2) + C3(R1 + R2 + R3) + C4(R1 + R2 + R3 + R4) + C5(R1 + R2 + R3 + R4 + R5) + C6(R1 + R2 + R3 + R4 + R5 + R6))

Since R1 appears 6 times in the expression, it makes sense to minimize its contribution.

Note that reducing R by a factor, e.g. k, at each stage increases the capacitance by a factor k and increases area.

A k-factor of 1.5 reduces delay by 40% and increases area by 3.5X.

SLIDE 16

Datapath Operators: Addition/Subtraction

Carry-Bypass adder:

[Figure: four full adders (FA) in a ripple chain with signals P0...P3, G0...G3 and carries Ci,0, Co,0...Co,3. A Mux controlled by BP = P0P1P2P3 selects between Ci,0 and the rippled Co,3.]

Assume Ak and Bk (for k = 0...3) are set such that all Pk (propagate) are high.

In this case, an incoming carry Ci,0 = 1 propagates along the complete chain and Co,3 = 1.

In other words:

if (P0P1P2P3 == 1) then Co,3 = Ci,0 else either DELETE or GENERATE occurred.
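The bypass rule can be expressed behaviorally as follows. This is a minimal sketch, assuming the XOR form of propagate (P’), since with the OR form an all-high P does not rule out a generate; the name `bypass_block` is hypothetical.

```python
def bypass_block(a_bits, b_bits, c_in):
    """4-bit carry-bypass block; bit lists are lsb first.

    Returns (sum_bits, carry_out, bypass_taken).
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # XOR propagate (assumption)
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    bypass = all(p)                               # BP = P0.P1.P2.P3

    carry = c_in
    sum_bits = []
    for pk, gk in zip(p, g):
        sum_bits.append(pk ^ carry)
        carry = gk | (pk & carry)                 # normal ripple path

    # Mux: when every stage propagates, Co,3 is simply Ci,0.
    carry_out = c_in if bypass else carry
    return sum_bits, carry_out, bypass

print(bypass_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```

The mux does not change the logical result; it only shortens the path that an incoming carry takes through a block in which no carry is generated or deleted.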

SLIDE 17

Datapath Operators: Addition/Subtraction

Linear Carry-Select adder:

One way around waiting for the incoming carry is to compute the result of both possible values in advance and let the incoming carry select the correct result.

[Figure: carry-select block that adds bits k to k+3. A Setup stage produces P, G; a 0-carry propagation path and a 1-carry propagation path compute both carry vectors; a Mux controlled by Co,k-1 selects the carry vector and Co,k+3, followed by Sum Generation.]

Select operation is much faster than the time to compute either of the two possible carry vectors.

A Square-Root Carry-Select Adder (delay = O(N^(1/2))) is constructed by increasing the number of input bits in each block from lsb to msb.

For Square-Root Carry-Select, higher order blocks take more operand bits than lower order blocks.
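A behavioral model of the select scheme is sketched below. The function names and the example block sizes `[2, 2, 3, 4, 5]` are assumptions chosen for illustration; the slides do not specify block sizes.

```python
def ripple(a_bits, b_bits, c_in):
    """Plain ripple-carry addition of lsb-first bit lists."""
    carry, sums = c_in, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return sums, carry

def carry_select_block(a_bits, b_bits, c_in):
    """Compute both candidate results, then let the incoming carry select."""
    result_if_0 = ripple(a_bits, b_bits, 0)   # 0-carry propagation
    result_if_1 = ripple(a_bits, b_bits, 1)   # 1-carry propagation
    return result_if_1 if c_in else result_if_0

def carry_select_add(a, b, block_sizes):
    """Chain blocks from lsb to msb; returns the full sum including carry-out."""
    result, carry, shift = 0, 0, 0
    for size in block_sizes:
        a_bits = [(a >> (shift + i)) & 1 for i in range(size)]
        b_bits = [(b >> (shift + i)) & 1 for i in range(size)]
        sums, carry = carry_select_block(a_bits, b_bits, carry)
        for i, bit in enumerate(sums):
            result |= bit << (shift + i)
        shift += size
    return result | (carry << shift)

# Growing block sizes toward the msb, in the spirit of the square-root variant.
print(carry_select_add(40000, 30000, [2, 2, 3, 4, 5]))
```

In hardware the two candidates in every block are computed in parallel, so only the chain of muxes sits on the critical path; growing the blocks toward the msb gives later blocks time to finish while earlier selects resolve.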

SLIDE 18

Datapath Operators: Addition/Subtraction

Carry look-ahead adder (avoiding the ripple altogether):

Compute the carries to each stage in parallel.

The carry out of the kth stage is computed as:

Co,k = Gk + Pk . Co,k-1

where Gk = Ak . Bk and Pk = Ak + Bk

The dependency between Co,k and Co,k-1 can be eliminated by expanding Co,k-1.

Co,k = Gk + Pk . (Gk-1 + Pk-1.Co,k-2)

For example, for 4 stages of look-ahead:

C0 = G0 + P0Ci

C1 = G1 + P1G0 + P1P0Ci

C2 = G2 + P2G1 + P2P1G0 + P2P1P0Ci

C3 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0Ci

Note that the low-order terms, e.g., P0 and G0, appear in the expression for every bit, making the fanout load large.
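The four expanded carry equations can be written out directly and compared against ordinary addition. This is an illustrative sketch; `lookahead_carries` is a hypothetical name.

```python
def lookahead_carries(a_bits, b_bits, c_in):
    """Return [C0, C1, C2, C3] from the expanded look-ahead equations.

    Bit lists hold 4 bits each, lsb first.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Gk = Ak . Bk
    p = [a | b for a, b in zip(a_bits, b_bits)]   # Pk = Ak + Bk
    c0 = g[0] | (p[0] & c_in)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c_in))
    return [c0, c1, c2, c3]

# 0b1011 + 0b0110 with carry-in 0, bits listed lsb first.
print(lookahead_carries([1, 1, 0, 1], [0, 1, 1, 0], 0))
```

Every carry depends only on the G, P and Ci inputs, so all four can be evaluated at the same time; the price is the growing number and width of the product terms.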

SLIDE 19

Datapath Operators: Addition/Subtraction

Carry look-ahead adder:

One possible implementation without using simple logic gates.

[Figure: single complex gate computing Co,3 from P0...P3, G0...G3 and Ci,0.]

Size and fan-in of the gates limit the number of look-ahead stages to about four.

SLIDE 20

Datapath Operators: Addition/Subtraction

Carry look-ahead adder:

Factoring term C3 yields:

C3 = G3 + P3(G2 + P2(G1 + P1(G0 + P0Ci,0)))

Domino CMOS implementation:

[Figure: domino gate clocked by Clk with inputs P<0>...P<3>, G<0>...G<3> and Ci,0, and output C<3>.]

Worst case is pull-down through 6 series n-channel transistors.

Other high speed versions given in Weste and Eshraghian.

SLIDE 21

Datapath Operators: Addition/Subtraction

The Logarithmic look-ahead adder: O(log2 N) delay:

The number of logic levels is proportional to log2 N, fan-in is limited and the layout is compact (jigsaw puzzle) (see Rabaey for details).

The dot operator (.) is defined as:

(g, p) . (g’, p’) = (g + pg’, pp’)

[Figure: 8-bit tree. Inputs (G0, P0)...(G7, P7) are combined by a forward binary tree into group terms such as (C4-7, P4-7); an inverse binary tree then produces Co,0...Co,7.]
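The dot operator can be exercised with a log-depth scan. Note the swap: the sketch below uses a Kogge-Stone-style arrangement rather than the forward/inverse binary tree of the slide, because it is shorter to write; both are built from the same operator. The names `dot` and `prefix_carries`, the XOR form of propagate, and folding the carry-in into G0 are assumptions.

```python
def dot(left, right):
    """(g, p) . (g', p') = (g + p.g', p.p'); 'left' is the more significant group."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)

def prefix_carries(a_bits, b_bits, c_in=0):
    """Return [Co,0, ..., Co,n-1] using about log2(n) levels of dot operators.

    Bit lists are lsb first.
    """
    n = len(a_bits)
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    # Fold the carry-in into bit 0 so that group generate equals the carry-out.
    g0, p0 = gp[0]
    gp[0] = (g0 | (p0 & c_in), p0)

    span = 1
    while span < n:
        # One logic level: each position absorbs the group 'span' bits below it.
        gp = [dot(gp[k], gp[k - span]) if k >= span else gp[k] for k in range(n)]
        span *= 2
    return [g for g, _ in gp]

a = [(0b10110101 >> i) & 1 for i in range(8)]
b = [(0b01101011 >> i) & 1 for i in range(8)]
print(prefix_carries(a, b))
```

The scan works because the dot operator is associative, so group terms can be combined in any tree shape; for 8 bits the loop runs three times, matching the log2 N level count.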

SLIDE 22

Datapath Operators: Comparison

Magnitude Comparators:

May be built from an adder, complementer (XOR gates) and a zero detect unit.

[Figure: 4-bit comparator with inputs A<3:0> and B<3:0>. The adder carry output indicates B >= A; a zero detect NOR gate on the result indicates B = A.]

Think about the modifications necessary to make it a signed comparator (Hint: A couple of XOR gates).
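The adder-based comparison can be sketched as B + NOT(A) + 1 on n-bit words. This is an illustrative model under that assumption; the function name `compare_unsigned` is hypothetical.

```python
def compare_unsigned(a, b, n=4):
    """Compare two n-bit unsigned values by computing B - A with an adder.

    Returns (b_equals_a, b_ge_a).
    """
    mask = (1 << n) - 1
    total = b + (~a & mask) + 1       # B + complement of A + carry-in of 1
    difference = total & mask         # n-bit adder output
    carry_out = total >> n            # carry out of the msb
    b_equals_a = difference == 0      # zero detect (NOR of the result bits)
    b_ge_a = carry_out == 1           # a carry-out means no borrow occurred
    return b_equals_a, b_ge_a

print(compare_unsigned(5, 9))   # B > A
print(compare_unsigned(9, 5))   # B < A
print(compare_unsigned(7, 7))   # B = A
```

The carry-out test is valid for unsigned operands only, which is what the hint about extra XOR gates for the signed case is addressing.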

SLIDE 23

Datapath Operators: Binary Counters

Asynchronous: Based on the Toggle register.

Not a good choice for performance and testability (with no reset).

[Figure: "Ripple Carry" binary counter. A chain of toggle (T) registers is driven from Clk, each stage clocking the next, with outputs Q<0>...Q<3>.]

SLIDE 24

Datapath Operators: Binary Counters

Synchronous counter.

[Figure: four 1-bit registers (D, Q) sharing Clk and Clear, with outputs Q<0>...Q<3>.]

Replace AND gate with an adder for up/down counting capability.

Weste and Eshraghian also show a version that can be initialized.

SLIDE 25

Datapath Operators: Multiplication

Multiplication can be broken down into two steps:

Computation of partial products.

Accumulation of the shifted partial products.

Binary multiplication of single bits is equivalent to the AND operation.

      1100
  X   0101
  --------
      1100
     0000
    1100
   0000
  --------
   0111100

Multipliers may be classified by the format in which data words are accessed:

Serial

Serial/parallel

Parallel

The parallel form computes the partial products in parallel.
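The two steps can be sketched for unsigned operands as follows. This is illustrative only; `partial_products` is a hypothetical helper.

```python
def partial_products(x, y, n):
    """Step 1: one shifted partial product per multiplier bit.

    Row j is (x AND y_j) shifted left by j; x is the multiplicand, y the
    n-bit unsigned multiplier.
    """
    return [(x if (y >> j) & 1 else 0) << j for j in range(n)]

rows = partial_products(0b1100, 0b0101, 4)
for row in rows:
    print(format(row, "07b"))
# Step 2: accumulate the shifted rows.
print(format(sum(rows), "07b"))
```

Run on the slide's operands (1100 and 0101), the rows are 1100, 0000, 1100 and 0000 at increasing shifts, and their sum is the 0111100 shown in the worked example.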

SLIDE 26

Datapath Operators: Multiplication

Parallel Unsigned Multiplication:

$X = \sum_{i=0}^{m-1} X_i 2^i \qquad Y = \sum_{j=0}^{n-1} Y_j 2^j$

Multiplying 2 unsigned binary integers results in:

$P = X \times Y = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( X_i 2^i \right) \left( Y_j 2^j \right) = \sum_{k=0}^{m+n-1} P_k 2^k$

                     X3    X2    X1    X0    Multiplicand
                     Y3    Y2    Y1    Y0    Multiplier
                     ----------------------
                     X3Y0  X2Y0  X1Y0  X0Y0
               X3Y1  X2Y1  X1Y1  X0Y1
         X3Y2  X2Y2  X1Y2  X0Y2
   X3Y3  X2Y3  X1Y3  X0Y3
   ----------------------------------------
P7 P6    P5    P4    P3    P2    P1    P0

There are m*n summands produced by a set of m*n AND gates in parallel.

SLIDE 27

Datapath Operators: Multiplication

Parallel Multiplication:

Multiplication is carried out using a bitwise AND of the operands, Xi and Yj.

Most of the work (and delay) is in summing the partial products.

An NxN multiplier requires:

N(N-2) full adders

N half adders

N^2 AND gates

[Figure: multiplier cell. The partial product of X and Y is formed and added by a full adder with inputs A, B, Ci and outputs Sum and Co.]

SLIDE 28

Datapath Operators: Multiplication

Array multiplier:

[Figure: 4 x 4 array multiplier. Each row Y0...Y3 is ANDed with X0...X3, and rows of HA and FA cells accumulate the partial products into P0...P7. The array dimensions are labeled M and N.]

There are a large number of nearly identical critical paths in this circuit.

tmult = [(M-1) + (N-2)]tcarry + (N-1)tsum + tand

SLIDE 29

Datapath Operators: Multiplication

From the delay expression and the fact that all critical paths have the same length, minimizing tmult requires minimizing both tcarry and tsum.

This is in contrast with the adder where minimizing tcarry was key.

The transmission gate adder is a good choice here.

Parallel Signed Multiplication:

Let A and B represent signed integers.

$A = -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i$

$B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$

$P = \left( -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i \right) \left( -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i \right)$

$\;\; = a_{m-1} b_{n-1} 2^{m+n-2} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - \sum_{i=0}^{m-2} a_i b_{n-1} 2^{n-1+i} - \sum_{i=0}^{n-2} a_{m-1} b_i 2^{m-1+i}$

Baugh-Wooley algorithm:

Expanding shows that the last two rows of summands are all negative, so the algorithm simply adds in their negations.

Only 3 additional adders required over the unsigned version.
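The four-term expansion of the signed product can be verified numerically. The sketch below checks only the algebraic expansion, not the full Baugh-Wooley hardware arrangement; `signed_value` and `expanded_product` are hypothetical helper names.

```python
def signed_value(bits):
    """Two's complement value of an lsb-first bit list (msb has negative weight)."""
    msb = len(bits) - 1
    return -bits[msb] * 2 ** msb + sum(bit << i for i, bit in enumerate(bits[:msb]))

def expanded_product(a_bits, b_bits):
    """Evaluate the four groups of summands in the expansion of P = A x B."""
    m, n = len(a_bits), len(b_bits)
    sign_term = (a_bits[m - 1] * b_bits[n - 1]) << (m + n - 2)
    positive_rows = sum((a_bits[i] * b_bits[j]) << (i + j)
                        for i in range(m - 1) for j in range(n - 1))
    negative_row_a = sum((a_bits[i] * b_bits[n - 1]) << (n - 1 + i)
                         for i in range(m - 1))
    negative_row_b = sum((a_bits[m - 1] * b_bits[i]) << (m - 1 + i)
                         for i in range(n - 1))
    return sign_term + positive_rows - negative_row_a - negative_row_b

# -3 x 5 with 4-bit operands, bits listed lsb first.
a = [(-3 >> i) & 1 for i in range(4)]
b = [(5 >> i) & 1 for i in range(4)]
print(signed_value(a), signed_value(b), expanded_product(a, b))
```

The two subtracted groups are the summands that involve exactly one sign bit; these are the rows the algorithm handles by adding their negations.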

SLIDE 30

Datapath Operators: Multiplication

Parallel Signed Multiplication:

[Figure: 8 x 8 signed array multiplier. Each cell ANDs one bit pair aibj (i, j = 0...7), and all cells outside the first row also ADD the result into the running sum. Operand bits a0...a7 and b0...b7 enter along the edges of the array, a final row of ADD cells completes the upper half, and the outputs are the product bits P0...P15.]

SLIDE 31

Datapath Operators: Multiplication

Carry-Save Multiplier:

Carry bits can be passed diagonally downwards instead of to the left.

Here the carry bits are not immediately added but rather “saved” for the next adder stage.

[Figure: 4x4 version. Rows of HA and FA cells pass their carries diagonally to the next row; a vector-merging adder at the bottom produces the final result.]

Cost: A little extra area.

Advantage: Critical path is uniquely defined:

tmult = (N-1)tcarry + tand + tmerge (assuming tadd = tcarry).

Minimizing tmerge is useful, e.g. use carry-select or lookahead.
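The carry-save idea can be modeled at the word level. The following is a behavioral sketch on Python integers, not a cell-accurate model of the array in the figure; the names `carry_save_step` and `carry_save_multiply` are hypothetical.

```python
def carry_save_step(x, y, z):
    """Reduce three words to a (sum, carry) pair without propagating carries.

    Every bit position acts as an independent full adder, so
    x + y + z == sum_word + carry_word.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # carries are "saved", shifted up
    return sum_word, carry_word

def carry_save_multiply(x, y, n):
    """Unsigned multiply: accumulate partial products in carry-save form."""
    sum_word, carry_word = 0, 0
    for j in range(n):
        partial = (x if (y >> j) & 1 else 0) << j
        sum_word, carry_word = carry_save_step(sum_word, carry_word, partial)
    # Vector-merging adder: the only place where a carry actually ripples.
    return sum_word + carry_word

print(carry_save_multiply(13, 11, 4))
```

Because no carry ripples inside the accumulation rows, each row costs one adder delay, and all of the carry propagation is concentrated in the final merge, which is why a fast merging adder pays off.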

SLIDE 32

Datapath Operators: Multiplication

Serial Unsigned Multiplication: If area is a concern.

[Figure: serial multiplier. X and Y feed gate G1, followed by G2 with Cin held in a 1-bit register; a serial register clocked by Clk (with reset) collects the product bits P7...P0.]

Xi and Yi are delivered serially to the inputs of G1 at different rates.

Computes the summands row-wise from right to left.

Disadv: Quadratic delay: tmult = M x N x tcarry

Serial/Parallel Unsigned Multiplier shown in Weste and Eshraghian.

SLIDE 33

Datapath Operators: Multiplication

Booth Encoding:

A special encoding of the multiplier word reduces the number of required addition stages and speeds up multiplication substantially.

Radix-4 scheme:

$Y = \sum_{j=0}^{(N-1)/2} Y_j 4^j \quad \text{with} \quad Y_j \in \{-2, -1, 0, 1, 2\}$

The number of partial products (and additions) is halved, resulting in area and speed advantage.

The disadvantage is a somewhat more involved multiplier cell.

AND operation replaced with inversion and shift logic.

Virtually every multiplier in use employs the Booth scheme.
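A radix-4 recoding can be sketched as below. The digit rule used here (digit = y(2j) + y(2j-1) - 2.y(2j+1), with y(-1) = 0) is the standard modified Booth rule and is an assumption, since the slide gives only the digit set; the function names are hypothetical and n is taken to be even.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's complement multiplier into radix-4 digits.

    Returns n/2 digits, least significant first, each in {-2, -1, 0, 1, 2}.
    """
    digits = []
    previous = 0                               # implicit bit y(-1)
    for j in range(0, n, 2):
        low = (y >> j) & 1
        high = (y >> (j + 1)) & 1
        digits.append(low + previous - 2 * high)
        previous = high
    return digits

def booth_multiply(x, y, n):
    """Sum one partial product per radix-4 digit (half as many as bits)."""
    return sum(digit * x * 4 ** j
               for j, digit in enumerate(booth_radix4_digits(y, n)))

print(booth_radix4_digits(127, 8))   # digits for 0111 1111
print(booth_multiply(-9, 127, 8))
```

Each digit selects 0, x, 2x or a negated version of these, which is why the plain AND in the multiplier cell gives way to shift and inversion logic.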

SLIDE 34

Datapath Operators: Multiplication

Wallace Multiplier:

Trees can be used to replace the linear partial-sum adders:

[Figure: slice of a 6-bit carry-save multiplier. Inputs Y0...Y5 pass through a linear chain of four FAs, with carries Ci-1 entering and Ci leaving each stage, to produce Sum. # of ripple stages is N-2.]

[Figure: the same slice with the four FAs rearranged as a tree, producing C and Sum.]

Adv: O(log2 N) mult time.

Disadv: Very irregular -- difficult to lay out.

SLIDE 35

Datapath Operators: Shifters

Right/Left 1-bit shifter:

[Figure: four muxes with a shared Right/Left select S. Inputs are A3...A0 plus the serial inputs IR and IL; outputs are H0...H3.]

SLIDE 36

Datapath Operators: Shifters

Barrel shifter:

[Figure: barrel shifter array with select lines s<3>...s<0>, input bus l<6:0> and outputs r<3>...r<0> (shift result). Depending on the select, the result is taken from l<3:0>, l<4:1>, l<5:2> or l<6:3>.]

Arithmetic and logical shifts and rotates are possible by muxing l<6:0> to the appropriate values.
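The window selection can be modeled behaviorally as follows. This is a sketch under stated assumptions: the select s<3:0> is taken to be one-hot, with s = 1, 2, 4, 8 choosing l<3:0>, l<4:1>, l<5:2>, l<6:3> respectively, and the way l<6:0> is loaded for a rotate is one plausible choice. The function names are hypothetical.

```python
def barrel_shift(l, s):
    """Select a 4-bit window r<3:0> = l<k+3:k> from the 7-bit input l<6:0>.

    s is an assumed one-hot select: 1, 2, 4 or 8 picks k = 0, 1, 2 or 3.
    """
    if s not in (1, 2, 4, 8):
        raise ValueError("select must be one-hot")
    k = s.bit_length() - 1
    return (l >> k) & 0xF

def rotate_right(a, k):
    """Rotate the 4-bit word a right by k by loading l<6:0> = {a<2:0>, a<3:0>}."""
    l = ((a & 0b111) << 4) | (a & 0xF)
    return barrel_shift(l, 1 << k)

def logical_shift_right(a, k):
    """Logical right shift: the upper bits l<6:4> are loaded with zeros."""
    return barrel_shift(a & 0xF, 1 << k)

print(format(rotate_right(0b1011, 1), "04b"))
print(format(logical_shift_right(0b1011, 2), "04b"))
```

The array itself only selects a window; whether the operation is a rotate, a logical shift or an arithmetic shift is decided entirely by what is muxed onto the upper bits of l<6:0>.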