Summary of previous lecture computer architecture = instruction - - PowerPoint PPT Presentation

SLIDE 1

dt10 2011 4.1

Summary of previous lecture

  • computer architecture = instruction set architecture + machine organisation
  • CPI = ∑ (CPI_i × instr. count_i) / (∑ instr. count_i)
  • minimise: exe. time = instr. count × CPI × cycle time
  • CISC: instr. count, code size, cycle time, CPI
  • RISC: instr. count, code size, cycle time, CPI
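As a quick check of the CPI and execution-time formulas in this summary, here is a minimal sketch. The instruction mix and the 2 ns cycle time are made-up illustration values, and the function names are hypothetical.

```python
def average_cpi(classes):
    """classes: list of (cpi_i, instr_count_i) pairs, one per instruction class."""
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instrs = sum(count for _, count in classes)
    return total_cycles / total_instrs


def exe_time(instr_count, cpi, cycle_time):
    """exe. time = instr. count x CPI x cycle time."""
    return instr_count * cpi * cycle_time


# Hypothetical mix (not from the lecture): 50 instrs at CPI 1, 30 at 2, 20 at 4.
mix = [(1, 50), (2, 30), (4, 20)]
cpi = average_cpi(mix)
time_ns = exe_time(100, cpi, 2)  # assumed cycle time of 2 ns
print(cpi, time_ns)
```

Lowering any one of the three factors only helps if the other two do not grow by more, which is the trade-off behind the CISC and RISC rows.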

SLIDE 2

Computer arithmetic

(3rd Ed: p.160-175, Apx. B; 4th Ed: p.87-94, p.224-229, Apx. C.5)

  • two’s complement: signed integer representation
  • e.g. 1011_2C = (1 × -2^3) + (0 × 2^2) + (1 × 2^1) + (1 × 2^0) = -5_ten
  • n-bit: range -2^(n-1) .. 2^(n-1) - 1
  • sign extension: 1011_2C = 1111011_2C
  • overflow: A, B > 0 and A+B ≤ 0; or A, B < 0 and A+B ≥ 0
  • in MIPS: slt, slti work with two’s complement;
    sltu, sltiu work with unsigned representation (do not cause exception when overflow)
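The two's complement rules on this slide can be tried out with a small sketch. The helper names are hypothetical, and bit patterns are passed as strings purely for readability.

```python
def from_twos_complement(bits):
    """Value of a bit string such as '1011' read as two's complement."""
    n = len(bits)
    unsigned = int(bits, 2)
    # The MSB carries weight -2^(n-1) instead of +2^(n-1).
    return unsigned - (1 << n) if bits[0] == "1" else unsigned


def sign_extend(bits, width):
    """Copy the sign bit into the new high-order positions."""
    return bits[0] * (width - len(bits)) + bits


def wrap(value, n):
    """Keep the low n bits of value and read them as two's complement."""
    value &= (1 << n) - 1
    return value - (1 << n) if value >> (n - 1) else value


def add_overflows(a, b, n):
    """Overflow rule from the slide, applied to the n-bit wrapped sum."""
    s = wrap(a + b, n)
    return (a > 0 and b > 0 and s <= 0) or (a < 0 and b < 0 and s >= 0)


print(from_twos_complement("1011"))
print(sign_extend("1011", 7))
print(add_overflows(5, 4, 4))
```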

SLIDE 3

Logical operations

  • shift left logical
    – sll $10, $16, 8 # reg10 = reg16 << 8 bits
    – reg16: 0 .. 0 0000 0000 1101 (0 .. 0: 20 bits)
    – reg10: 0 .. 0 1101 0000 0000 (shift introduces zeros)
    – format (R type): source1 | source2 = 16 | dest. = 10 | shamt = 8 | funct
  • shift left logical variable (sllv): shamt in register source1
  • right shifts: srl, srlv, sra (sign-extend high order bits)
  • bitwise: or, and, ori, andi
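A minimal sketch of the three shift flavours on 32-bit values follows. The Python functions are illustrative stand-ins named after the MIPS instructions, not an emulator.

```python
MASK32 = 0xFFFFFFFF


def sll(x, shamt):
    """Shift left logical: zeros enter from the right, high bits fall off."""
    return (x << shamt) & MASK32


def srl(x, shamt):
    """Shift right logical: zeros enter from the left."""
    return (x & MASK32) >> shamt


def sra(x, shamt):
    """Shift right arithmetic: the sign bit is copied into the high-order bits."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as a negative Python int
    return (x >> shamt) & MASK32


print(format(sll(0b1101, 8), "012b"))
print(hex(srl(0x80000000, 4)), hex(sra(0x80000000, 4)))
```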

SLIDE 4

ALU building blocks

  • and, or, inv, mux
  • ALU: Arithmetic Logic Unit (n = 32 for MIPS)
  • bit-level realisation: hierarchical, regular structure

SLIDE 5

Bit-wise logical / selection operations

  • and, or, …
  • selector / multiplexor

SLIDE 6

Add / subtract

SLIDE 7

Deriving ALU cell by interleaving components

  • group components together to form larger repeated unit
  • the dotted boxes have the same function and interface

SLIDE 8

Selecting ALU operation

  • programmable inverter for bi (using xor)
  • connecting mux in series
  • d0d1: 00 and, 01 or, 10 add, 11 subtract

  • detecting overflow: exercise
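One way to picture the cell is a bit-level simulation. The sketch below assumes the programmable inverter is enabled exactly when d0d1 = 11, and that the same signal feeds the carry-in of the least significant cell; the slide's diagram may wire this differently.

```python
def alu_cell(a, b, carry_in, d0, d1):
    """One bit slice. d0d1: 00 and, 01 or, 10 add, 11 subtract."""
    # Programmable inverter: xor flips b when subtracting (assumed: d0 = d1 = 1).
    b_eff = b ^ (d0 & d1)
    total = a ^ b_eff ^ carry_in
    carry_out = (a & b_eff) | (carry_in & (a ^ b_eff))
    logic = (a | b) if d1 else (a & b)  # first mux: and / or
    result = total if d0 else logic     # second mux in series: logic / adder
    return result, carry_out


def alu(a, b, d0, d1, n=32):
    """Chain n cells; carry-in of 1 completes a - b = a + not(b) + 1."""
    carry = d0 & d1
    result = 0
    for i in range(n):
        bit, carry = alu_cell((a >> i) & 1, (b >> i) & 1, carry, d0, d1)
        result |= bit << i
    return result


print(alu(5, 3, 1, 0, 8), alu(5, 3, 1, 1, 8))
```

Because every cell has the same function and interface, the n-bit unit is just the same slice repeated, with only the carry threading through.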

SLIDE 9

Comparison operations

  • slt: set on less than, if a < b then 1 else 0
  • if a < b, a-b < 0, so MSB of (a-b) is 1
  • implementation

    – provide additional input to each cell
    – LSB input from MSB ALUb output, other inputs set to 0
    – include additional mux in cell for selection
    – to select slt, s=0, d0=1, d1=1

SLIDE 10

Zero detector

  • beq, bne: test a=b or a-b=0
  • include another gate to test if output zero
  • summary:

    s d0 d1   function
    0  1  1   set on less than
    1  0  0   and
    1  0  1   or
    1  1  0   add
    1  1  1   subtract
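The control summary and the zero detector can be sketched at word level as follows. This is a behavioural model under the slide's encoding, with a hypothetical function name; like the slide's argument, the slt case simply takes the MSB of a - b and ignores overflow.

```python
def alu_op(a, b, s, d0, d1, n=32):
    """Return (result, zero) for control bits s d0 d1 on n-bit operands."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    diff = (a - b) & mask
    operations = {
        (0, 1, 1): diff >> (n - 1),  # set on less than: MSB of a - b
        (1, 0, 0): a & b,            # and
        (1, 0, 1): a | b,            # or
        (1, 1, 0): (a + b) & mask,   # add
        (1, 1, 1): diff,             # subtract
    }
    result = operations[(s, d0, d1)]
    zero = int(result == 0)          # zero detector: 1 when every output bit is 0
    return result, zero


# beq / bne: subtract and look at the zero output.
print(alu_op(7, 7, 1, 1, 1, 8))
print(alu_op(3, 5, 0, 1, 1, 8))
```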

SLIDE 11

Performance estimation

  • clocked circuit: no combinational loops
  • speed limited by propagation delay through the slowest combinational path
  • slowest path: usually carry path
  • clock rate: approx. 1/(delay of slowest path), assuming
    – edge-triggered design
    – flip-flop propagation delay, set-up time, clock skew etc. negligible (see 3rd Ed: B.7, B.11; 4th Ed: C.7, C.11)
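A back-of-the-envelope version of this estimate is sketched below. The 0.5 ns carry delay per bit is an invented figure for illustration only, and the model assumes the carry path of a ripple-carry adder is the slowest path.

```python
def ripple_carry_delay(n_bits, carry_delay_per_bit):
    """Carry has to propagate through every bit position in turn."""
    return n_bits * carry_delay_per_bit


def clock_rate(slowest_path_delay):
    """clock rate ~ 1 / (delay of slowest path), other delays neglected."""
    return 1.0 / slowest_path_delay


delay = ripple_carry_delay(32, 0.5e-9)  # assumed 0.5 ns per bit, 32 bits
rate = clock_rate(delay)
print(delay, rate)
```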

SLIDE 12

Fast addition

  • carry select
    – compute both zero-carry-in and one-carry-in after n stages
    – e.g. 8 bits: use three 4-bit ripple carry adders
  • other possibilities
    – carry-lookahead adder
    – conditional-sum adder
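The 8-bit carry-select example can be sketched as below, assuming one 4-bit ripple adder for the low half and two for the high half (one per possible carry-in). Function names are hypothetical.

```python
def ripple_add(a, b, carry, n):
    """n-bit ripple-carry adder built from full-adder equations."""
    result = 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def carry_select_add8(a, b):
    """8-bit add using three 4-bit ripple adders and a final selection."""
    lo_a, hi_a = a & 0xF, (a >> 4) & 0xF
    lo_b, hi_b = b & 0xF, (b >> 4) & 0xF
    lo_sum, lo_carry = ripple_add(lo_a, lo_b, 0, 4)
    # Both high-half results are computed without waiting for lo_carry.
    hi_if_0 = ripple_add(hi_a, hi_b, 0, 4)
    hi_if_1 = ripple_add(hi_a, hi_b, 1, 4)
    hi_sum, carry_out = hi_if_1 if lo_carry else hi_if_0  # mux selects
    return (hi_sum << 4) | lo_sum, carry_out


print(carry_select_add8(0x3C, 0x0F))
```

In hardware the three adders work in parallel, so the carry only ripples through 4 bits plus one mux instead of 8 bits.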

SLIDE 13

Multiply and divide

(3rd Ed: p.250-274, 4th Ed: p.230-242)

  • implementing multiplication using ALU
  • Booth’s multiplication algorithm
  • implementing division using ALU
  • related MIPS instructions

– mult, multu, div, divu, mfhi, mflo, sll, srl, sra

SLIDE 14

Multiply: example

  • multiplicand × multiplier = product
  • idea: sum of multiplicand shifted successively by 1 bit relative to multiplier; CSAA
  • 2 × 11 = 22

        0010      mc
      × 1011      mp
     ...0010      ← mc shifted 0 bit × bit 0 of mp
     ..0010.      ← mc shifted 1 bit × bit 1 of mp
   + 0010...      ← mc shifted 3 bits × bit 3 of mp
     0010110
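The 2 × 11 example can be reproduced by listing the shifted partial products explicitly. This is a minimal sketch for unsigned operands; `partial_products` is a hypothetical helper name.

```python
def partial_products(mc, mp, n=4):
    """One term per multiplier bit: mc shifted by i if bit i of mp is 1, else 0."""
    return [(mc << i) if (mp >> i) & 1 else 0 for i in range(n)]


terms = partial_products(0b0010, 0b1011)
product = sum(terms)
for term in terms:
    print(format(term, "07b"))
print(format(product, "07b"))
```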

SLIDE 15

Multiplication hardware: first version

The Multiplicand register, ALU, and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. The 32-bit multiplicand starts in the right half of the Multiplicand register, and is shifted left 1 bit on each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts with the product initialised to 0. Control decides when to shift the Multiplicand and Multiplier registers and when to write new values into the Product register.
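A register-level sketch of this first version follows, for unsigned operands. Register widths are modelled by masking; the control sequence (test multiplier bit 0, add, shift both registers) is an assumption based on the description above.

```python
def multiply_v1(mc, mp, n=32):
    """First version: 2n-bit Multiplicand and Product, n-bit Multiplier."""
    mask_2n = (1 << (2 * n)) - 1
    multiplicand = mc & ((1 << n) - 1)  # starts in the right half
    multiplier = mp & ((1 << n) - 1)
    product = 0                         # product initialised to 0
    for _ in range(n):
        if multiplier & 1:              # control: write Product only if bit 0 is 1
            product = (product + multiplicand) & mask_2n
        multiplicand = (multiplicand << 1) & mask_2n  # shift left 1 bit
        multiplier >>= 1                              # shift the opposite way
    return product


print(multiply_v1(2, 11, 4))
```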

SLIDE 16

pr = mp × mc

SLIDE 17

Multiplication hardware: second version

The Multiplicand register, ALU, and Multiplier register are all 32 bits wide, with only the Product register left as 64 bits. Now the product is shifted right.
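A sketch of the second version, again for unsigned operands: the multiplicand stays put and is added into the left half of the product, which is then shifted right. The carry out of the n-bit add is kept here by using an unbounded Python integer, which is an assumption about how the hardware handles it.

```python
def multiply_v2(mc, mp, n=32):
    """Second version: n-bit Multiplicand and Multiplier, 2n-bit Product."""
    mask = (1 << n) - 1
    multiplicand = mc & mask
    multiplier = mp & mask
    product = 0
    for _ in range(n):
        if multiplier & 1:
            product += multiplicand << n  # add into the left half of Product
        product >>= 1                     # product is shifted right
        multiplier >>= 1
    return product


print(multiply_v2(2, 11, 4))
```

Only n bits of the product change in any one addition, which is why the ALU and Multiplicand register can shrink to n bits.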

SLIDE 18

pr = mp × mc

LH: left Half

SLIDE 19

Booth’s insight

  • substitute n additions by 1 subtraction, 1 addition
  • successive 1s in multiplier mp ⇒ successive addition of shifted multiplicand mc
  • given number of 1s in mp = k, and initially shifted mc = m, then summing the k terms gives:
    m + 2m + 2^2 m + ··· + 2^(k-1) m (geometric series)

       0010     mc
     × 0110     mp
     .0010.     shift left mc since mp1 = 1
   + 0010..     shift left mc since mp2 = 1
     001100

    m = 00100, k = 2

SLIDE 20

Booth’s algorithm

  • replace summing k terms m + 2m + ··· + 2^(k-1) m by 1 subtraction and 1 addition: -m + 2^k m (and k shifts to get the 2^k factor)
  • proof: let S = 1 + 2 + ··· + 2^(k-1), so 2×S = 2 + 4 + ··· + 2^(k-1) + 2^k
    S = 2×S - S = 2^k + (2^(k-1) - 2^(k-1)) + ··· + (2 - 2) - 1 = 2^k - 1
  • exercise: check that it works for signed numbers
  • algorithm detects a string of 1s in mp:

    .. 0 1 1 1 1 .. 1 1 0 0 0 (4 cases)
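The run-detection idea can be sketched as below. It assumes the usual reading of the four cases as the pair (current bit, bit to its right): 10 starts a run of 1s (subtract), 01 ends it (add), 00 and 11 do nothing. Operands are ordinary Python integers within the n-bit two's complement range.

```python
def booth_multiply(mc, mp, n=32):
    """Multiply by scanning mp for runs of 1s, one subtract and one add per run."""
    product = 0
    prev = 0  # implicit 0 to the right of the LSB
    for i in range(n):
        cur = (mp >> i) & 1
        if (cur, prev) == (1, 0):    # beginning of a run: -m
            product -= mc << i
        elif (cur, prev) == (0, 1):  # end of a run: +2^k m
            product += mc << i
        # (0, 0) and (1, 1): no arithmetic, only the shift
        prev = cur
    return product


# Slide example: 0010 x 0110 becomes -00100 + 10000.
print(booth_multiply(2, 6, 4))
```

A run that reaches the MSB never gets its closing addition, which is what gives the sign bit its negative weight for signed multipliers.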

SLIDE 21

Division

  • invented by Briggs
  • Dividend = Quotient × Divisor + Remainder

74 = 9 × 8 + 2

  • compare r´ and ds: calculate r´ = r´ - ds
    – r´ < 0, r´ = r´ + ds (restore old value of r´)
    – r´ ≥ 0, accept r´ for further calculation

                 1001     q
    ds 1000 ) 1001010     dd   (r´: intermediate value)
            - 1000             align MSB(ds) and MSB(dd)
               0010            r´ < ds: q = q ++ <0>
                0101
                 1010          ← r´ ≥ ds: q = q ++ <1>
               - 1000
                   10     r    (r = r´ when finished)

SLIDE 22

First version of the division hardware

loop: r = r - ds
      if r ≥ 0: left shift q, LSB(q) = 1
      else: r = r + ds, left shift q, LSB(q) = 0
      right shift ds

The Divisor register, ALU, and Remainder register are all 64 bits wide, with only the Quotient register being 32 bits. The 32-bit divisor starts in the left half of the Divisor register and is shifted right 1 bit on each step. The remainder is initialised with the dividend. Control decides when to shift the Divisor and Quotient registers and when to write the new value into the Remainder register.
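The loop and register layout above can be simulated directly. This sketch assumes unsigned n-bit operands, a non-zero divisor, and n + 1 iterations of the loop; the function name is hypothetical.

```python
def divide_v1(dd, ds, n=32):
    """First version: restoring division with a 2n-bit Divisor register."""
    remainder = dd     # remainder initialised with the dividend
    divisor = ds << n  # divisor starts in the left half
    quotient = 0
    for _ in range(n + 1):
        remainder -= divisor
        if remainder >= 0:
            quotient = (quotient << 1) | 1  # left shift q, LSB(q) = 1
        else:
            remainder += divisor            # restore old value
            quotient <<= 1                  # left shift q, LSB(q) = 0
        divisor >>= 1                       # right shift ds
    return quotient & ((1 << n) - 1), remainder


# Example from the division slide: 74 = 9 x 8 + 2.
print(divide_v1(74, 8, 8))
```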

SLIDE 23

dd = (q × ds) + r

SLIDE 24

To note

  • refining division implementation
    – remainder shift left: reduce divisor / ALU size
    – combine quotient and remainder registers
  • MIPS instructions
    – multu, divu: unsigned operations
    – result in HI, LO registers
    – mflo: move data from LO register
  • exercise: signed numbers in
    – multiplication (3rd Ed: p.180, 4th Ed: p.234)
    – division (3rd Ed: p.187, 4th Ed: p.239)