SLIDE 1

dt10 2011 4.1

Summary of previous lecture

computer architecture = instruction set architecture + machine organisation
$\text{exe. time} = \text{instr. count} \times \text{CPI} \times \text{cycle time}$
$\text{CPI} = \sum_i (\text{CPI}_i \times \text{instr. count}_i) \,/\, \sum_i \text{instr. count}_i$
minimise:
– CISC: instr. count, code size
– RISC: CPI, cycle time
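
A minimal Python sketch of the two formulas, using a made-up instruction mix of (CPI_i, instr. count_i) pairs; the numbers and the helper names `average_cpi` and `exe_time` are illustrative only.

```python
from fractions import Fraction


def average_cpi(mix):
    """CPI = sum(CPI_i * instr. count_i) / sum(instr. count_i).

    mix: list of (CPI_i, instr_count_i) pairs with integer values (illustrative).
    """
    total_cycles = sum(cpi_i * count_i for cpi_i, count_i in mix)
    total_instrs = sum(count_i for _, count_i in mix)
    return Fraction(total_cycles, total_instrs)


def exe_time(instr_count, cpi, cycle_time):
    """exe. time = instr. count x CPI x cycle time (same unit as cycle_time)."""
    return instr_count * cpi * cycle_time


mix = [(1, 2), (2, 1), (3, 2)]   # made-up mix: 5 instructions in three classes
cpi = average_cpi(mix)
print(cpi)                       # 2
print(exe_time(5, cpi, 2))       # 20, e.g. ns for a 2 ns cycle time
```

Exact fractions keep the weighted average free of rounding; with real measurements, floats would do just as well.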

SLIDE 2

Computer arithmetic

(3rd Ed: p.160-175, Apx. B; 4th Ed: p.87-94, p.224-229, Apx. C.5)

two’s complement: signed integer representation
e.g. $1011_{2C} = (1 \times -2^3) + (0 \times 2^2) + (1 \times 2^1) + (1 \times 2^0) = -5_{\text{ten}}$
n-bit: range $-2^{n-1} \ldots 2^{n-1}-1$
sign extension: $1011_{2C} = 1111011_{2C}$
overflow: A, B > 0 but A+B ≤ 0, or A, B < 0 but A+B ≥ 0

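in MIPS: slt, slti work with two's complement

sltu, sltiu work with unsigned representation; comparisons never raise an exception, whereas among the arithmetic instructions add, addi, sub trap on overflow and addu, addiu, subu do not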
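
The two's complement rules can be checked with a small Python sketch; `to_signed`, `sign_extend` and `add_overflows` are hypothetical helper names, and bit patterns are held in ordinary Python integers.

```python
def to_signed(value, n):
    """Interpret the low n bits of value as an n-bit two's complement number."""
    value &= (1 << n) - 1
    if (value >> (n - 1)) & 1:        # MSB set: it carries weight -2**(n-1)
        return value - (1 << n)
    return value


def sign_extend(bits, n, m):
    """Sign-extend an n-bit pattern to m bits (m >= n) by copying the MSB."""
    if (bits >> (n - 1)) & 1:
        bits |= ((1 << (m - n)) - 1) << n
    return bits


def add_overflows(a, b, n):
    """Overflow rule from the slide; a and b are signed values already in n-bit range."""
    s = to_signed(a + b, n)           # what an n-bit adder actually produces
    return (a > 0 and b > 0 and s <= 0) or (a < 0 and b < 0 and s >= 0)


print(to_signed(0b1011, 4))             # -5
print(bin(sign_extend(0b1011, 4, 7)))   # 0b1111011
print(add_overflows(7, 1, 4))           # True: 7 + 1 wraps to -8
```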

SLIDE 3

Logical operations

shift left logical

– sll $10, $16, 8 # reg10 = reg16 << 8 bits
– reg16 = 0 .. 0 0000 0000 1101 (0 .. 0 stands for 20 bits)
– reg10 = 0 .. 0 1101 0000 0000 (the shift introduces zeros at the low end)
– format (R type): op | source1 | source2 | dest. | shamt | funct, with source2 = 16, dest. = 10, shamt = 8

shift left logical variable (sllv): shamt in register source1

right shifts: srl, srlv, sra (sign-extend high order bits)
bitwise: or, and, ori, andi
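
A rough Python model of the shifts and of the R-type packing, assuming the standard MIPS encoding: field widths op(6) rs(5) rt(5) rd(5) shamt(5) funct(6), with op = 0, funct = 0 and source1 = 0 for `sll`, and `sllv` using only the low 5 bits of source1. Registers are modelled as 32-bit unsigned integers; the function names mirror the mnemonics but are only illustrative.

```python
MASK32 = 0xFFFFFFFF


def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack R-type fields: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def sll(x, shamt):
    return (x << shamt) & MASK32            # zeros enter at the low end


def sllv(x, source1):
    return sll(x, source1 & 0x1F)           # shift amount: low 5 bits of source1


def srl(x, shamt):
    return (x & MASK32) >> shamt            # zeros enter at the high end


def sra(x, shamt):
    x &= MASK32
    if x & 0x80000000:                      # negative: replicate the sign bit
        return ((x - (1 << 32)) >> shamt) & MASK32
    return x >> shamt


# sll $10, $16, 8: source1 = 0, source2 = 16, dest. = 10, shamt = 8
print(hex(encode_r_type(0, 0, 16, 10, 8, 0)))   # 0x105200
print(bin(sll(0b1101, 8)))                       # 0b110100000000
```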

SLIDE 4

ALU building blocks

and, or, inv, mux
ALU: Arithmetic Logic Unit n=32 for MIPS
bit-level realisation: hierarchical, regular structure

SLIDE 5

Bit-wise logical / selection operations

and, or, …
selector / multiplexor
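
As a sketch of how the building blocks compose, the Python below models single-bit and, or and inv gates and builds a 2-to-1 multiplexor from them; the 4-to-1 version simply chains three 2-to-1 selectors. The helper names are hypothetical.

```python
def and_gate(a, b):
    return a & b


def or_gate(a, b):
    return a | b


def inv(a):
    return a ^ 1                     # single-bit inverter


def mux2(s, d0, d1):
    """2-to-1 multiplexor from and/or/inv: d0 when s = 0, d1 when s = 1."""
    return or_gate(and_gate(inv(s), d0), and_gate(s, d1))


def mux4(s1, s0, d0, d1, d2, d3):
    """4-to-1 multiplexor from three 2-to-1 multiplexors; select value is s1 s0."""
    return mux2(s1, mux2(s0, d0, d1), mux2(s0, d2, d3))


print(mux2(0, 1, 0), mux2(1, 1, 0))  # 1 0
```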

SLIDE 6

Add / subtract
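
The slide's figure did not survive extraction. As a sketch under that caveat, the usual structure is a chain of 1-bit full adders (ripple carry), and subtraction reuses the same adder by inverting every b_i and setting the carry-in to 1, since a − b = a + ~b + 1 in two's complement. The function names are hypothetical.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def add_sub(a, b, subtract, n=32):
    """n-bit ripple-carry adder/subtractor on unsigned bit patterns.

    subtract = 1 computes a + ~b + 1, i.e. a - b modulo 2**n.
    """
    result, carry = 0, subtract          # carry-in of bit 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract   # invert b_i when subtracting
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result


print(add_sub(5, 3, 0, 4), add_sub(5, 3, 1, 4), add_sub(3, 5, 1, 4))  # 8 2 14
```

Here 14 is the 4-bit pattern 1110, i.e. −2 in two's complement.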

SLIDE 7

Deriving ALU cell by interleaving components

group components together to form larger repeated unit
the dotted boxes have the same function and interface

SLIDE 8

Selecting ALU operation

programmable inverter for b_i (using xor)
connecting mux in series
d0 d1: 00 and, 01 or, 10 add, 11 subtract

detecting overflow: exercise
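
One possible reading of this cell, as a Python sketch: the xor inverter on b_i is assumed to be driven by d0 and d1 together, so that b is inverted only for subtract (the slide does not say which signal drives it); the first mux picks and/or with d1, and the second mux in series picks the adder output with d0. The carry-in of the LSB cell is assumed to equal the invert signal, giving a − b = a + ~b + 1.

```python
def alu_cell(a, b, cin, d0, d1):
    """1-bit ALU cell. d0 d1: 00 and, 01 or, 10 add, 11 subtract.

    Assumption: the xor inverter on b is driven by (d0 and d1).
    """
    b ^= d0 & d1                          # programmable inverter for b_i
    sum_out = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    logic = (a | b) if d1 else (a & b)    # first mux: and / or
    result = sum_out if d0 else logic     # second mux, in series: logic / adder
    return result, cout


def alu(a, b, d0, d1, n=32):
    """n cells chained through their carries; bit 0 gets carry-in 1 for subtract."""
    result, carry = 0, d0 & d1
    for i in range(n):
        r, carry = alu_cell((a >> i) & 1, (b >> i) & 1, carry, d0, d1)
        result |= r << i
    return result


a, b = 0b0110, 0b0011
print([alu(a, b, d0, d1, 4) for d0, d1 in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [2, 7, 9, 3]
```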

SLIDE 9

Comparison operations

slt: set on less than, if a < b then 1 else 0
if a < b, a−b < 0, so MSB of (a−b) is 1 (provided a−b does not overflow)
implementation

– provide additional input to each cell
– LSB input from the adder output of the MSB ALU bit, other inputs set to 0
– include additional mux in cell for selection
– to select slt, s=0, d0=1, d1=1

SLIDE 10

Zero detector

beq, bne: test a=b or a-b=0
include another gate to test if output zero
summary:

s d0 d1   function
0  1  1   set on less than
1  0  0   and
1  0  1   or
1  1  0   add
1  1  1   subtract
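
Extending the same kind of cell sketch with the extra input, the s-controlled mux and the zero detector gives a model of the whole summary table. Software has no combinational settling, so the hypothetical `alu` function evaluates the chain twice: once to obtain the MSB cell's adder output, then again with that value fed into the LSB cell's extra input. As the slt rule assumes, bit 0 is the MSB of (a − b) mod 2^n, which matches a signed comparison only when a − b does not overflow. The b-inversion control (d0 and d1) is again an assumption.

```python
def alu_cell(a, b, cin, less, s, d0, d1):
    """1-bit cell. s d0 d1: 100 and, 101 or, 110 add, 111 subtract, 011 slt.

    Returns (result, carry_out, adder_out). Assumption: b is inverted when d0 = d1 = 1.
    """
    b ^= d0 & d1
    adder_out = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    logic = (a | b) if d1 else (a & b)
    normal = adder_out if d0 else logic
    result = normal if s else less        # extra mux: s = 0 selects the extra input
    return result, cout, adder_out


def alu(a, b, s, d0, d1, n=32):
    """Returns (result, zero) for n-bit operands a, b."""
    def ripple(lsb_less):
        result, carry, adder_out = 0, d0 & d1, 0
        for i in range(n):
            less = lsb_less if i == 0 else 0      # other extra inputs set to 0
            r, carry, adder_out = alu_cell((a >> i) & 1, (b >> i) & 1,
                                           carry, less, s, d0, d1)
            result |= r << i
        return result, adder_out                   # adder_out of the MSB cell

    _, msb_adder_out = ripple(0)       # settle: MSB of a - b when subtracting
    result, _ = ripple(msb_adder_out)  # feed it to the LSB cell's extra input
    zero = int(result == 0)            # zero detector for beq / bne
    return result, zero


print(alu(0b0010, 0b0101, 0, 1, 1, 4))  # (1, 0): 2 < 5
print(alu(0b0101, 0b0101, 1, 1, 1, 4))  # (0, 1): 5 - 5 = 0
```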

SLIDE 11

Performance estimation

clocked circuit: no combinational loops
speed limited by propagation delay through the slowest combinational path

slowest path: usually carry path
clock rate: approx. 1/(delay of slowest path)

assuming

– edge-triggered design
– flip-flop propagation delay, set-up time, clock skew etc. negligible (see 3rd Ed: B.7, B.11; 4th Ed: C.7, C.11)
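
The estimate can be written as a one-line Python function under the same assumptions (flip-flop delay, set-up time and clock skew ignored). The per-bit carry delay of 0.5 ns and the 1.0 ns logic path are made-up figures, not measured values.

```python
def max_clock_mhz(path_delays_ns):
    """Clock rate ~ 1 / (delay of slowest combinational path).

    Flip-flop delay, set-up time and clock skew are taken as negligible.
    """
    return 1000 / max(path_delays_ns)    # 1 / (1 ns) = 1 GHz = 1000 MHz


t_carry_ns = 0.5                          # hypothetical per-bit carry delay
paths_ns = {
    "32-bit ripple carry": 32 * t_carry_ns,
    "and/or logic": 1.0,                  # hypothetical
}
print(max_clock_mhz(paths_ns.values()))  # 62.5
```

The carry path dominates here, which is why the next slide looks at faster adders.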

SLIDE 12

Fast addition

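carry select

– after the first n stages, compute the remaining stages for both carry-in = 0 and carry-in = 1 in parallel, then select the correct result with the actual carry out of stage n
– e.g. 8 bits: use three 4-bit ripple carry adders

other possibilities

– carry-lookahead adder
– conditional-sum adder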
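
A Python sketch of the 8-bit carry-select example: the lower 4 bits use one ripple-carry adder, while the upper 4 bits are computed twice in parallel, once assuming carry-in 0 and once assuming carry-in 1, and the real carry out of bit 3 selects between them. The function names are hypothetical, and operands are assumed to be 8-bit unsigned values. In delay terms this is roughly 4 carry delays plus one mux, instead of 8 carry delays.

```python
def ripple_add(a, b, cin, n):
    """n-bit ripple-carry adder: returns (sum, carry_out)."""
    s = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ cin) << i
        cin = (ai & bi) | (ai & cin) | (bi & cin)   # carry ripples to the next bit
    return s, cin


def carry_select_add8(a, b, cin=0):
    """8-bit carry-select adder built from three 4-bit ripple-carry adders."""
    lo, c4 = ripple_add(a & 0xF, b & 0xF, cin, 4)   # bits 0-3
    hi0, c8_0 = ripple_add(a >> 4, b >> 4, 0, 4)    # bits 4-7, assuming carry-in 0
    hi1, c8_1 = ripple_add(a >> 4, b >> 4, 1, 4)    # bits 4-7, assuming carry-in 1
    hi, cout = (hi1, c8_1) if c4 else (hi0, c8_0)   # mux driven by the real carry
    return (hi << 4) | lo, cout


print(carry_select_add8(0b01111111, 0b00000001))  # (128, 0)
```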

SLIDE 13

Multiply and divide

(3rd Ed: p.250-274, 4th Ed: p.230-242)

implementing multiplication using ALU
Booth’s multiplication algorithm
implementing division using ALU
related MIPS instructions

– mult, multu, div, divu, mfhi, mflo, sll, srl, sra

SLIDE 14

Multiply: example

multiplicand × multiplier = product
idea: sum of multiplicand shifted successively by 1 bit relative to multiplier; CSAA

2 × 11 = 22

          0010    mc
        × 1011    mp
       ...0010    ← mc shifted 0 bits × bit 0 of mp
       ..0010.    ← mc shifted 1 bit × bit 1 of mp
     + 0010...    ← mc shifted 3 bits × bit 3 of mp
       0010110
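
Written as a sum, each partial product is the multiplicand scaled by a power of two at which mp has a 1 (bit 2 of mp is 0, so no term appears for it):

```latex
$$
2 \times 11 = 0010_2 \times (2^3 + 2^1 + 2^0)
            = 0010_2 \cdot 2^3 + 0010_2 \cdot 2^1 + 0010_2 \cdot 2^0
            = 16 + 4 + 2 = 22 = 0010110_2
$$
```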

SLIDE 15

Multiplication hardware: first version

The Multiplicand register, ALU, and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. The 32-bit multiplicand starts in the right half of the Multiplicand register and is shifted left 1 bit on each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts with the product initialised to 0. Control decides when to shift the Multiplicand and Multiplier registers and when to write new values into the Product register.
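
The first version can be simulated in a few lines of Python, as a sketch for unsigned operands; the loop runs once per multiplier bit (n = 32 for MIPS), and `multiply_v1` is a hypothetical name.

```python
def multiply_v1(mc, mp, n=32):
    """Shift-and-add multiply modelled on the first-version hardware (unsigned).

    Multiplicand and Product registers are 2n bits, Multiplier register n bits.
    """
    mask = (1 << (2 * n)) - 1
    multiplicand = mc            # n-bit value starts in the right half
    multiplier = mp
    product = 0                  # product initialised to 0
    for _ in range(n):           # one step per multiplier bit
        if multiplier & 1:       # LSB of Multiplier = 1: add multiplicand to product
            product = (product + multiplicand) & mask
        multiplicand = (multiplicand << 1) & mask   # shift Multiplicand left 1 bit
        multiplier >>= 1                            # shift Multiplier right 1 bit
    return product


print(multiply_v1(0b0010, 0b1011, n=4))   # 22
```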

SLIDE 16

pr = mp × mc

SLIDE 17

Multiplication hardware: second version

The Multiplicand register, ALU, and Multiplier register are all 32 bits wide, with only the Product register left as 64 bits. Now the product is shifted right.

SLIDE 18

pr = mp × mc

LH: left half
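
A matching sketch of the second version: only the Product register is 2n bits wide, the multiplicand is added into its left half (LH), and the whole product is then shifted right. Python integers are unbounded, so the adder's carry-out is kept automatically as the extra top bit that the right shift pulls back in; real hardware has to route that carry explicitly. `multiply_v2` is a hypothetical name and operands are unsigned.

```python
def multiply_v2(mc, mp, n=32):
    """Shift-and-add multiply modelled on the second-version hardware (unsigned).

    Multiplicand, ALU and Multiplier are n bits; only Product is 2n bits.
    """
    product = 0
    multiplier = mp
    for _ in range(n):
        if multiplier & 1:
            # n-bit add into the left half (LH) of the Product register; the
            # adder's carry-out survives as bit 2n until the shift below
            product += mc << n
        product >>= 1            # shift Product right 1 bit
        multiplier >>= 1         # shift Multiplier right 1 bit
    return product


print(multiply_v2(0b0010, 0b1011, n=4))   # 22
```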

SLIDE 19

Booth’s insight

substitute n additions by 1 subtraction, 1 addition
successive 1s in multiplier mp ⇒ successive addition of shifted multiplicand mc
given number of 1s in mp = k, and initially shifted mc = m, then summing the k terms gives: $m + 2m + 2^2 m + \cdots + 2^{k-1} m$ (geometric series)

        0010    mc
      × 0110    mp
      .0010.    ← shift left mc since mp1 = 1
    + 0010..    ← shift left mc since mp2 = 1
      001100

m = 00100, k = 2
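
With the slide's values (m = 00100 = 4, k = 2) the k terms of the series add up to the product directly:

```latex
$$
m + 2m = 00100_2 + 01000_2 = 4 + 8 = 12 = 001100_2 = 0010_2 \times 0110_2
$$
```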

SLIDE 20

Booth’s algorithm

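replace summing k terms $m + 2m + \cdots + 2^{k-1} m$

by 1 subtraction and 1 addition: $-m + 2^k m$ (and k shifts to get the $2^k$ factor)

proof: let $S = 1 + 2 + \cdots + 2^{k-1}$, so $2 \times S = 2 + 4 + \cdots + 2^{k-1} + 2^k$

$S = 2 \times S - S = 2^k + (2^{k-1} - 2^{k-1}) + \cdots + (2 - 2) - 1 = 2^k - 1$, hence $m + 2m + \cdots + 2^{k-1} m = m \times S = 2^k m - m$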

exercise: check that it works for signed numbers
algorithm detects a string of 1s in mp:

.. 0 1 1 1 1 .. 1 1 0 0 0 4 cases
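
A Python sketch of radix-2 Booth recoding for n-bit two's complement operands, assuming the usual pairing of each multiplier bit with the bit to its right (an implicit 0 to the right of the LSB). The four (current, previous) cases are: 10, start of a string of 1s, subtract the shifted mc; 01, end of a string, add it; 11 and 00, do nothing. This is one standard formulation, not necessarily the exact flowchart used in the lecture; `booth_multiply` is a hypothetical name and also counts the add/subtract operations.

```python
def booth_multiply(mc, mp, n):
    """Radix-2 Booth multiplication of n-bit two's complement mc and mp.

    Returns (product, number of additions/subtractions performed).
    """
    product, prev, ops = 0, 0, 0
    for i in range(n):
        cur = (mp >> i) & 1               # bit i of mp's two's complement pattern
        if (cur, prev) == (1, 0):         # start of a string of 1s: subtract
            product -= mc << i
            ops += 1
        elif (cur, prev) == (0, 1):       # end of a string of 1s: add
            product += mc << i
            ops += 1
        # (1, 1): middle of a string, (0, 0): no string, so do nothing
        prev = cur
    return product, ops


print(booth_multiply(0b0010, 0b0110, 4))  # (12, 2): -m + 2**k * m with m = 4, k = 2
print(booth_multiply(3, -2, 4))           # (-6, 1): works for signed mp too
```

A string of 1s that reaches the MSB never sees its closing addition, which is exactly what gives the sign bit its weight of −2^(n−1).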

SLIDE 21

Division

invented by Briggs
Dividend = Quotient × Divisor + Remainder

74 = 9 × 8 + 2

compare r´ and ds: calculate r´ = r´ − ds

– r´ < 0: r´ = r´ + ds (restore old value of r´)
– r´ ≥ 0: accept r´ for further calculation

               1001     q
  ds 1000 ) 1001010     dd   (r´: intermediate value)
            1000        align MSB(ds) and MSB(dd)
             0010       ← r´ < ds: q = q ++ <0>
              0101
               1010     ← r´ ≥ ds: q = q ++ <1>
               1000
                 10     r    (r = r´ when finished)
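
The restoring procedure can be sketched in Python by bringing down one dividend bit at a time. Unlike the worked example, it does not align MSB(ds) and MSB(dd) first; it simply produces leading 0 quotient bits for the skipped positions, so q and r come out the same. `long_division` is a hypothetical name and the operands are assumed to satisfy dd ≥ 0, ds > 0.

```python
def long_division(dd, ds):
    """Restoring binary long division; returns (q, r) with dd = q * ds + r."""
    q, r = 0, 0
    for bit in bin(dd)[2:]:              # dividend bits, MSB first
        r = (r << 1) | int(bit)          # bring down the next bit into r'
        r -= ds                          # r' = r' - ds
        if r < 0:
            r += ds                      # r' < 0: restore old value, q = q ++ <0>
            q = q << 1
        else:
            q = (q << 1) | 1             # r' >= 0: accept r', q = q ++ <1>
    return q, r


print(long_division(0b1001010, 0b1000))  # (9, 2): 74 = 9 x 8 + 2
```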

SLIDE 22

First version of the division hardware

loop:
    r = r − ds
    if r ≥ 0: left shift q, LSB(q) = 1
    else: r = r + ds, left shift q, LSB(q) = 0
    right shift ds

The Divisor register, ALU, and Remainder register are all 64 bits wide, with only the Quotient register being 32 bits. The 32-bit divisor starts in the left half of the Divisor register and is shifted right 1 bit on each step. The remainder is initialised with the dividend. Control decides when to shift the Divisor and Quotient registers and when to write the new value into the Remainder register.
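
A register-level sketch of the first version in Python, assuming unsigned n-bit operands and n + 1 iterations (the first iteration always yields a 0 quotient bit, because the divisor starts shifted into the left half). The register roles follow the description; `divide_v1` is a hypothetical name.

```python
def divide_v1(dd, ds, n=32):
    """First-version division hardware (unsigned); returns (quotient, remainder).

    Assumes 0 <= dd < 2**n and 0 < ds < 2**n.
    """
    qmask = (1 << n) - 1
    divisor = ds << n                  # 2n-bit Divisor register: ds in the left half
    remainder = dd                     # Remainder register initialised with dividend
    quotient = 0                       # n-bit Quotient register
    for _ in range(n + 1):             # assumption: n + 1 iterations
        remainder -= divisor           # r = r - ds
        if remainder >= 0:
            quotient = ((quotient << 1) | 1) & qmask   # left shift q, LSB(q) = 1
        else:
            remainder += divisor                        # restore: r = r + ds
            quotient = (quotient << 1) & qmask          # left shift q, LSB(q) = 0
        divisor >>= 1                  # right shift ds
    return quotient, remainder


print(divide_v1(74, 8, n=8))   # (9, 2)
```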

SLIDE 23

dd = (q × ds) + r

SLIDE 24

To note

refining division implementation

– remainder shift left: reduce divisor / ALU size
– combine quotient and remainder registers

MIPS instructions

– multu, divu: unsigned operations
– result in HI, LO registers
– mflo: move data from LO register
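
For reference, a small Python model of where the results land, assuming the standard MIPS convention: `multu` leaves the upper 32 bits of the 64-bit product in HI and the lower 32 bits in LO, while `divu` leaves the quotient in LO and the remainder in HI; `mfhi` and `mflo` then copy them into general registers. The functions only model register contents and return (HI, LO) pairs.

```python
MASK32 = 0xFFFFFFFF


def multu(rs, rt):
    """Unsigned 32 x 32 -> 64-bit multiply: returns (HI, LO)."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32


def divu(rs, rt):
    """Unsigned divide: returns (HI, LO) = (remainder, quotient).

    MIPS leaves the result undefined for rt = 0; this sketch requires rt != 0.
    """
    rs, rt = rs & MASK32, rt & MASK32
    return rs % rt, rs // rt


print(multu(0xFFFFFFFF, 2))   # (1, 4294967294): HI = 1, LO = 0xFFFFFFFE
print(divu(74, 8))            # (2, 9): HI = remainder, LO = quotient
```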

exercise: signed numbers in

– multiplication (3rd Ed: p.180, 4th Ed: p.234)
– division (3rd Ed: p.187, 4th Ed: p.239)