Chapter Five 1 2004 Morgan Kaufmann Publishers The Processor: - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Chapter Five

1

2004 Morgan Kaufmann Publishers

slide-2
SLIDE 2
The Processor: Datapath & Control

  • We're ready to look at an implementation of the MIPS
  • Simplified to contain only:

– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq, j

  • Generic Implementation:

– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do

  • All instructions use the ALU after reading the registers

Why? memory-reference? arithmetic? control flow?

slide-3
SLIDE 3
More Implementation Details

  • Abstract / Simplified View:
  • Two types of functional units:

– elements that operate on data values (combinational)
– elements that contain state (sequential)

slide-4
SLIDE 4

More Implementation Details

  • Include necessary multiplexors and control lines

slide-5
SLIDE 5
State Elements

  • Unclocked vs. Clocked
  • Clocks used in synchronous logic

– when should an element that contains state be updated?

[Figure: clock waveform labeling the clock period, the rising edge, and the falling edge]

slide-6
SLIDE 6
An unclocked state element

  • The set-reset latch

– output depends on present inputs and also on past inputs

[Figure: set-reset latch with inputs R and S and two Q outputs]

slide-7
SLIDE 7
Latches and Flip-flops

  • Output is equal to the stored value inside the element
    (don't need to ask for permission to look at the value)
  • Change of state (value) is based on the clock
  • Latches: state changes whenever the inputs change and the clock is asserted
    (asserted means "logically true", which could mean electrically low)
  • Flip-flop: state changes only on a clock edge
    (edge-triggered methodology)

A clocking methodology defines when signals can be read and written: we wouldn't want to read a signal at the same time it was being written.

slide-8
SLIDE 8
D-latch

  • Two inputs:

– the data value to be stored (D)
– the clock signal (C) indicating when to read & store D

  • Two outputs:

– the value of the internal state (Q) and its complement

[Figure: D latch circuit and timing diagram showing D, C, and Q]

slide-9
SLIDE 9

D flip-flop

  • Output changes only on the clock edge

[Figure: D flip-flop built from two D latches in series, with a timing diagram showing D, C, and Q]
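
The difference between the latch and the flip-flop can be sketched in a few lines of Python. This is only an illustrative model, not the circuit from the slides: the class names `DLatch` and `DFlipFlop` are made up here, and the sketch assumes a master-slave arrangement in which the second latch sees the inverted clock, so the output changes on the falling edge.

```python
class DLatch:
    """Level-sensitive latch: Q follows D while the clock C is asserted."""

    def __init__(self):
        self.q = 0

    def update(self, c, d):
        if c:  # transparent only while the clock is asserted
            self.q = d
        return self.q


class DFlipFlop:
    """Two latches in series (assumed master-slave wiring).

    The master is open while C = 1 and the slave while C = 0, so the
    visible output can only change when C falls from 1 to 0.
    """

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, c, d):
        m = self.master.update(c, d)
        return self.slave.update(1 - c, m)


def run(ff, samples):
    """Apply a list of (C, D) samples and collect the Q output."""
    return [ff.update(c, d) for c, d in samples]


# D changes while the clock is low or high, but Q only moves on falling edges.
trace = run(DFlipFlop(), [(1, 1), (0, 1), (0, 0), (1, 0), (0, 0)])
```

In this trace the change of D to 0 while the clock is low is ignored until the next falling edge, which is the edge-triggered behaviour the slide describes.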

slide-10
SLIDE 10

Our Implementation

  • An edge triggered methodology
  • Typical execution:

– read contents of some state elements
– send values through some combinational logic
– write results to one or more state elements

slide-11
SLIDE 11
Register File

  • Built using D flip-flops

Do you understand? What is the “Mux” above?

slide-12
SLIDE 12

Register File

  • Note: we still use the real clock to determine when to write

slide-13
SLIDE 13

Building a Datapath

  • Datapath:

– For fetching instrs and incrementing the PC
– For R-type (or arithmetic-logical) instrs
– For MIPS load & store instrs
– For the beq instr
– For the jump (j) instr

  • Instruction formats:

Name        Fields                                            Comments
Field size  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits    All MIPS instructions 32 bits
R-format    op      rs      rt      rd      shamt   funct     Arithmetic instructions format
I-format    op      rs      rt      address/immediate         Transfer, branch, imm. format
J-format    op      target address                            Jump instruction format
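
The field layout above maps directly onto shifts and masks. The following is a minimal sketch, with a hypothetical helper name `decode`, that splits a 32-bit instruction word into every field the three formats use. The sample word is the encoding of `add $8, $17, $18` given later in this deck (000000 10001 10010 01000 00000 100000).

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its named fields.

    All fields are extracted regardless of format; the caller picks the
    ones that apply (R-format, I-format, or J-format).
    """
    return {
        "op": (word >> 26) & 0x3F,        # bits 31:26
        "rs": (word >> 21) & 0x1F,        # bits 25:21
        "rt": (word >> 16) & 0x1F,        # bits 20:16
        "rd": (word >> 11) & 0x1F,        # bits 15:11 (R-format)
        "shamt": (word >> 6) & 0x1F,      # bits 10:6  (R-format)
        "funct": word & 0x3F,             # bits 5:0   (R-format)
        "imm16": word & 0xFFFF,           # bits 15:0  (I-format)
        "target26": word & 0x03FFFFFF,    # bits 25:0  (J-format)
    }


# add $8, $17, $18
fields = decode(0b000000_10001_10010_01000_00000_100000)
```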

slide-14
SLIDE 14

Building a Datapath: Fetching instrs and incrementing PC

  • Operations:

– To execute any instr: fetch the instr from memory
– To prepare for the next instr: increment the PC (4 bytes later)

  • The datapath elements: two state elements and an adder
  • state elements: to store and access instructions,
  • adder: to compute the next instruction address.

slide-15
SLIDE 15

Building a Datapath: Fetching instrs and incrementing PC

  • A portion of the datapath used for fetching instrs and increment PC:

slide-16
SLIDE 16

Building a Datapath: R-type (arithmetic-logical) instrs

  • Instrs included: add, sub, and, or, slt

– E.g.: add $t1, $t2, $t3 # t1 = t2 + t3

  • Operations: (assume that the instr has already been fetched)

Read two regs, perform an ALU op on the contents of the regs, & write the result.

  • The datapath elements: register file and ALU.

slide-17
SLIDE 17
Building a Datapath: R-type (arithmetic-logical) instrs

  • The datapath for R-type instrs (add):

[Figure: the instruction supplies Read register 1, Read register 2, and Write register to the register file; Read data 1 and Read data 2 feed the ALU, which is steered by the 3-bit ALU operation control and produces Zero and ALU result; the result returns to Write data under control of RegWrite]

slide-18
SLIDE 18

Building a Datapath: load and store instrs

  • Instrs included: lw, sw

– E.g.s: lw $t1, offset_value($t2)
         sw $t1, offset_value($t2)

  • Ops: (assume that the instr has already been fetched)

– Compute a mem addr by adding the base reg ($t2) to the 16-bit signed offset field contained in the instr.
– For sw, the value to be stored must be read from the reg file ($t1).
– For lw, the value read from mem must be written to the reg file ($t1).

  • The datapath elements: data memory unit and sign extension unit (in addition to register file and ALU for R-type instr.)

slide-19
SLIDE 19

Building a Datapath: load and store instrs

  • The datapath for a load or store:

[Figure: register file, sign-extend unit (16 to 32 bits), ALU, and data memory, with control signals RegWrite, ALU operation, MemRead, and MemWrite]

FIGURE: The datapath for a load or store does a register access, followed by a memory address calculation, then a read or write from memory, and a write into the register file if the instruction is a load.

slide-20
SLIDE 20

Building a Datapath: beq instr

  • Branch instr:

– E.g.: beq $t1, $t2, offset    # if ($t1 = $t2) go to PC+4+(4×offset)

  • Ops: (assume that the instr has already been fetched)

i. Compute the branch target address: PC ← PC+4+(4×offset)
ii. Compare the reg contents:

  • If the condition is true, the branch target addr becomes the new PC.
  • Otherwise, the incremented PC replaces the current PC.
slide-21
SLIDE 21

Building a Datapath: beq instr

  • The datapath for a branch (beq):

FIGURE 5.9 The datapath for a branch uses the ALU to evaluate the branch condition and a separate adder to compute the branch target as the sum of the incremented PC and the sign-extended, lower 16 bits of the instruction (the branch displacement), shifted left 2 bits.
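
The branch-target arithmetic described for beq can be checked with a short sketch. The function names are illustrative only, and the model assumes 32-bit addresses that wrap around; the ALU comparison is modelled as a subtraction whose Zero result selects between the two adder outputs.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """PC + 4 + (sign-extended offset shifted left by 2), wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def next_pc(pc, rs_val, rt_val, offset16):
    """Select the new PC for beq: the ALU subtracts and Zero picks the target."""
    zero = (rs_val - rt_val) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & 0xFFFFFFFF
```

For example, with the PC at 0x1000 an offset of 3 words lands at 0x1010, while an offset of -1 (0xFFFF) branches back to 0x1000.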

slide-22
SLIDE 22

Building a Datapath: jump (j) instr

  • The jump instr:

– E.g.: j offset # go to offset

  • Ops:

– Replace a portion of the PC with the lower 26 bits of the instr shifted left by 2 bits (i.e., concatenating 00 to the jump offset).
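
As a rough illustration of this replacement, the sketch below builds the jump target from the upper 4 bits of PC + 4 and the 26-bit field shifted left by 2. The function name `jump_target` is hypothetical.

```python
def jump_target(pc, instr):
    """Form the jump target: (PC + 4)[31:28] : instr[25:0] : 00."""
    addr26 = instr & 0x03FFFFFF           # lower 26 bits of the instruction
    upper4 = (pc + 4) & 0xF0000000        # bits 31:28 of the incremented PC
    return upper4 | (addr26 << 2)         # shifting left by 2 appends the 00


# A j instruction (opcode 000010) with address field 0x10, fetched at 0x40000000.
target = jump_target(0x40000000, 0x08000010)
```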

slide-23
SLIDE 23

Two different implementations

  • The implementations:

i. a simple implementation (Single Cycle implementation)
ii. a multicycle implementation

slide-24
SLIDE 24

A Simple Implementation Scheme (p.298)

  • The simple implementation (Single Cycle Approach):

– the MIPS subset: lw, sw, add, sub, and, or, slt, beq
– uses a single clock cycle for every instr:

  • no datapath resource can be used more than once per instr
  • any element needed more than once must be duplicated
  • need a mem for instrs separate from one for data

– The sharing of a datapath element: multiplexor (data selector)

  • allow multiple connections to the input of an element
  • have a control signal select among the inputs
slide-25
SLIDE 25

How do we combine these together?

[Figure: the separate datapath pieces for instruction fetch, R-type, load & store, and beq]

slide-26
SLIDE 26

The Datapath with MUXs and Control Signals

[Figure: the combined datapath with multiplexors and control signals, annotated with the instruction classes (R-type, lw, sw, beq, non-branch) that use each path]

slide-27
SLIDE 27

Simple Implementation for MIPS architecture

  • Figure 5.11: Use multiplexors to stitch them together

[Figure 5.11: the PC, instruction memory, register file, sign-extend unit (16 to 32 bits), ALU, data memory, and two adders (PC + 4 and the branch target via Shift left 2), joined by multiplexors and controlled by PCSrc, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, and the 3-bit ALU operation; paths are annotated with the instructions that use them (R-type, lw, sw, beq, non-branch) and the rs, rt, and rd fields]

slide-28
SLIDE 28

A Simple Datapath for MIPS Subset Inst. Set

[Figure: the simple datapath annotated for R-type, load, and branch instructions]

slide-29
SLIDE 29

Control

  • Selecting the operations to perform (ALU, read/write, etc.)
  • Controlling the flow of data (multiplexor inputs)
  • Information comes from the 32 bits of the instruction
  • Example:

add $8, $17, $18

Instruction Format:

000000  10001  10010  01000  00000  100000
op      rs     rt     rd     shamt  funct

  • Two-layer Control to simplify the design

– The ALU Control
– The Main Control Unit

slide-30
SLIDE 30
The ALU Control

  • The functions performed by the ALU:

– For load & store instrs:

  • use the ALU to compute the mem addr by addition

– For R-type instrs:

  • The ALU performs one of five actions depending on the value of the 6-bit funct field in the low-order bits of the instr.

– For branch equal:

  • The ALU performs a subtraction.

  • ALU control input (the ALU designed in Ch3):

0000 AND
0001 OR
0010 add
0110 subtract
0111 set-on-less-than
1100 NOR

  • The ALU control input is generated from:

– the instruction type, given by ALUOp: 00 = lw, sw; 01 = beq; 10 = arithmetic
– the function code for arithmetic

slide-31
SLIDE 31

The ALU Control

Instruction opcode  ALUOp  Instruction operation  Funct field  Desired ALU action  ALU control input
LW                  00     Load word              XXXXXX       Add                 0010
SW                  00     Store word             XXXXXX       Add                 0010
Branch equal        01     Branch equal           XXXXXX       Subtract            0110
R-type              10     Add                    100000       Add                 0010
R-type              10     Subtract               100010       Subtract            0110
R-type              10     AND                    100100       And                 0000
R-type              10     OR                     100101       Or                  0001
R-type              10     Slt                    101010       Set less than       0111

  • Describe it using a truth table (can turn into gates):

ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation (C3 C2 C1 C0)
0       0       X   X   X   X   X   X   0010 (lw/sw)
X       1       X   X   X   X   X   X   0110 (beq)
1       X       X   X   0   0   0   0   0010 (add)
1       X       X   X   0   0   1   0   0110 (sub)
1       X       X   X   0   1   0   0   0000 (AND)
1       X       X   X   0   1   0   1   0001 (OR)
1       X       X   X   1   0   1   0   0111 (slt)
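
The ALU control mapping listed on this slide can be expressed as a small lookup. This is a sketch rather than the gate-level design: the function name and constants are illustrative, and it matches on the full 6-bit funct value instead of exploiting the don't-care bits of the truth table.

```python
# 4-bit ALU control input values, as listed in the deck.
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# Funct field -> ALU control input for the R-type instructions in the subset.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op, funct):
    """Return the ALU control input for a 2-bit ALUOp and a 6-bit funct field."""
    if alu_op == 0b00:      # lw / sw: address calculation
        return ALU_ADD
    if alu_op == 0b01:      # beq: compare by subtracting
        return ALU_SUB
    if alu_op == 0b10:      # R-type: funct decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp 11 is not used by this instruction subset")
```

Splitting the decision this way is what keeps the main control unit small: it only has to produce two ALUOp bits, and the funct field is examined in one place.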

slide-32
SLIDE 32

The ALU Control (Appendix C)

[Figure: logic that generates the ALU control outputs C2, C1, and C0]

slide-33
SLIDE 33

The ALU Control

[Figure: the ALU Control block, with inputs ALUOp and Instruction [5–0]]

slide-34
SLIDE 34

The Main Control Unit

  • Major observations about the instr formats:

– The op field, opcode, is always contained in bits 31 ~ 26: Op[5-0]
– For R-type instrs, beq, and sw, the 2 regs to be read are always specified by the rs (25 ~ 21) and rt (20 ~ 16) fields.
– The base reg for lw and sw is always in rs (25 ~ 21).
– The 16-bit offset for beq, lw, and sw is always in bits 15 ~ 0.
– The destination reg is in one of two places:

  • lw: rt (20 ~ 16)
  • R-type instr: rd (15 ~ 11)

R-type instruction:  op (31:26)  rs (25:21)  rt (20:16)  rd (15:11)  shamt (10:6)  funct (5:0)
Load or Store:       op (31:26)  rs (25:21)  rt (20:16)  address (15:0)
Branch instruction:  op (31:26)  rs (25:21)  rt (20:16)  address (15:0)

slide-35
SLIDE 35

The Main Control Unit

[Figure: the datapath with the rs, rt, and rd instruction fields labeled]

slide-36
SLIDE 36

The Main Control Unit

  • The effect of the seven control signals

slide-37
SLIDE 37

The flow of an R-type instr through the datapath

  • E.g.: add $t1, $t2, $t3
  • Four steps:
  • 1. An instr is fetched from the instr mem and the PC is incremented.
  • 2. Two regs ($t2, $t3) are read from the reg file.
  • 3. The ALU operates on the data read from the reg file, using the function code (bits 5-0) to generate the ALU function.
  • 4. The result from the ALU is written into the reg file using bits 15-11 of the instr to select the destination reg ($t1).

slide-38
SLIDE 38

R-type (Fig.5-19)

[Figure 5.19: the single-cycle datapath with the Control unit (inputs Instruction [31–26]; outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite) and the ALU control (input Instruction [5–0]), as used by an R-type instruction]

slide-39
SLIDE 39

The datapath in operation for a load word instr

  • E.g.: lw $t1, offset($t2)
  • Five steps:
  • 1. An instr is fetched from the instr mem and the PC is incremented.
  • 2. A reg ($t2) value is read from the reg file.
  • 3. The ALU computes the sum of the value read from the reg file and the sign-extended, lower 16 bits of the instr (offset).
  • 4. The sum from the ALU is used as the addr to read from data mem.
  • 5. The data from the mem unit is written into the reg file; the reg destination is given by $t1 (bits 20-16).

slide-40
SLIDE 40

Load Word (Fig.5-20)

[Figure 5.20: the single-cycle datapath with the Control unit and ALU control, as used by a load word instruction]
slide-41
SLIDE 41

The datapath in operation for a branch equal instr

  • E.g.: beq $t1, $t2, offset
  • Four steps:
  • 1. An instr is fetched from the instr mem and the PC is incremented.
  • 2. Two regs ($t1 & $t2) are read from the reg file.
  • 3. The ALU performs a subtract on the data values read from the reg file. The value of PC + 4 is added to the sign-extended, lower 16 bits of the instr (offset) shifted left by two; the result is the branch target addr.
  • 4. The Zero result from the ALU is used to decide which adder result to store into the PC.

slide-42
SLIDE 42

Branch on Equal (Fig.5-21)

[Figure 5.21: the single-cycle datapath with the Control unit and ALU control, as used by a branch on equal instruction]

slide-43
SLIDE 43

Finalizing the Main Control Unit (p.367)

  • The setting of the control lines:

  • The encoding for each of the opcodes of interest:

Page 367

slide-44
SLIDE 44

The Main Control Unit

  • The structured implementation of the ctrl function: PLA (Appendix C)

Input or output  Signal name  R-format  lw  sw  beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1
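
The control function in this table can be sketched as a dictionary keyed by opcode. This is an illustration, not the PLA itself: it assumes that blank cells in the table mean 0, represents don't-care (X) entries as `None`, and packs ALUOp1 and ALUOp0 into a single 2-bit `ALUOp` value. The name `main_control` is hypothetical.

```python
# Opcode (Op5..Op0) -> control signal settings for the four instruction classes.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),        # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),        # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),        # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),        # beq
}


def main_control(opcode):
    """Look up the control signals for a 6-bit opcode in the subset."""
    return CONTROL[opcode]


lw_signals = main_control(0b100011)
```

Only lw asserts MemRead, only sw asserts MemWrite, and only beq asserts Branch, which is why each output needs so few product terms in the PLA.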

slide-45
SLIDE 45

Control

  • Simple combinational logic (truth tables)

[Figure: logic implementations of the ALU Control Unit and the Main Control Unit]

slide-46
SLIDE 46

The datapath for a jump instruction (Fig.5-24)

Instruction (opcode = 2):  000010 (31:26)  address (25:0)
Target address:            PC+4 (31:28)  address (27:2)  00 (1:0)

slide-47
SLIDE 47

Is Single-Cycle Design Practical?

  • CPI = 1

– Each instruction has the same cycle time
– the cycle time is determined by the most time-consuming instruction
– CPU execution time = Instruction count × Clock cycle time

  • In general, “load” has the longest path for execution:

instruction memory, register file, ALU, data memory, and register file

slide-48
SLIDE 48

Performance of Single-Cycle Machines

slide-49
SLIDE 49

Performance of Single-Cycle Machines

<Ans>

slide-50
SLIDE 50

Performance of Single-Cycle Machines

Instruction mix:

  • Fixed-length clock cycle time = 8 ns
  • Variable-length clock cycle time

\[
\frac{\text{CPU performance}_{\text{variable clock}}}{\text{CPU performance}_{\text{single clock}}}
= \frac{\text{CPU execution time}_{\text{single clock}}}{\text{CPU execution time}_{\text{variable clock}}}
= \frac{\text{CPU clock cycle}_{\text{single clock}}}{\text{CPU clock cycle}_{\text{variable clock}}}
\]
slide-51
SLIDE 51

Where we are headed

  • Single Cycle Problems:

– Violates our key design principle of making the common case fast.

  • what if we had a more complicated instruction like floating point?

– Some functional units must be duplicated, raising the cost.

  • Each functional unit can be used only once per clock.

  • One Solution:

– use a “smaller” cycle time
– have different instructions take different numbers of cycles
– a “multicycle” datapath:

[Figure: abstract version of a multicycle datapath]

slide-52
SLIDE 52
Multicycle Approach

  • Differences b/t this version & the single-cycle version

i. A single mem unit is used for both instrs and data.
ii. There is only a single ALU.
iii. One or more regs are added after every major functional unit to hold the output of that unit until the value is used in a subsequent clock cycle: IR, MDR, A, B, ALUOut, and some multiplexers

  • We will be reusing functional units

– ALU used to compute address and to increment PC
– Memory used for instruction and data
– Note: Functional units can be shared within a single instr, but need to be used in different cycles.

  • Our control signals will not be determined directly by instruction

– e.g., what should the ALU do for a “subtract” instruction?

  • We’ll use a finite state machine for control
slide-53
SLIDE 53
Multicycle Approach

  • Break up the instructions into steps, each step takes a cycle

– balance the amount of work to be done
– restrict each cycle to use only one major functional unit

  • At the end of a cycle

– store values for use in later cycles
– introduce additional “internal” registers

  • Major difference in multiplexer: source of mem address, source of ALU

slide-54
SLIDE 54

Instructions from ISA perspective

  • Consider each instruction from perspective of ISA.
  • Example:

– The add instruction changes a register.
– Register specified by bits 15:11 of instruction.
– Instruction specified by the PC.
– New value is the sum (“op”) of two registers.
– Registers specified by bits 25:21 and 20:16 of the instruction

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

– In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming)

slide-55
SLIDE 55

Breaking down an instruction

  • ISA definition of arithmetic:

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

  • Could break down to:

– IR <= Memory[PC]
– A <= Reg[IR[25:21]]
– B <= Reg[IR[20:16]]
– ALUOut <= A op B
– Reg[IR[15:11]] <= ALUOut

  • We forgot an important part of the definition of arithmetic!

– PC <= PC + 4
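
The breakdown can be traced step by step in a short sketch. It is only a software model of the register transfers, with hypothetical names, fixing "op" to addition and writing the result to the rd field (bits 15:11), as the ISA definition on this slide specifies. Memory is modelled as a dictionary from byte address to instruction word.

```python
def run_add_multicycle(memory, reg, pc):
    """Execute one R-type add as the sequence of register transfers above.

    Returns the updated PC; `reg` (a list of 32 values) is modified in place.
    """
    ir = memory[pc]                      # IR <= Memory[PC]
    pc = pc + 4                          # PC <= PC + 4
    a = reg[(ir >> 21) & 0x1F]           # A <= Reg[IR[25:21]]
    b = reg[(ir >> 16) & 0x1F]           # B <= Reg[IR[20:16]]
    alu_out = (a + b) & 0xFFFFFFFF       # ALUOut <= A op B (op = add here)
    reg[(ir >> 11) & 0x1F] = alu_out     # Reg[IR[15:11]] <= ALUOut
    return pc


# add $8, $17, $18 stored at address 0, with $17 = 5 and $18 = 7.
memory = {0: 0b000000_10001_10010_01000_00000_100000}
reg = [0] * 32
reg[17], reg[18] = 5, 7
new_pc = run_add_multicycle(memory, reg, 0)
```

The local variables `ir`, `a`, `b`, and `alu_out` play the role of the internal registers that the multicycle design introduces, which is the "introducing variables" analogy the slides make.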

slide-56
SLIDE 56

Idea behind multicycle approach

  • We define each instruction from the ISA perspective (do this!)
  • Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)
  • Introduce new registers as needed (e.g., A, B, ALUOut, MDR, etc.)
  • Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution)
  • Result: Our book’s multicycle Implementation!
slide-57
SLIDE 57
Five Execution Steps

  • Instruction Fetch
  • Instruction Decode and Register Fetch
  • Execution, Memory Address Computation, or Branch Completion
  • Memory Access or R-type instruction completion
  • Write-back step

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

slide-58
SLIDE 58
Step 1: Instruction Fetch

  • Function: fetch the instr from mem & increment the PC
  • Can be described succinctly using RTL “Register-Transfer-Language”

IR = Memory[PC]
PC = PC + 4

  • Operations and the setting of control signals:
  • i. Send the PC to the mem as the addr, perform a read and fetch the instr into the IR
  • set IorD = 0, assert MemRead & IRWrite
  • ii. Increment the PC by 4
  • set ALUSrcA = 0 (PC), ALUSrcB = 01 (4), ALUop = 00 (add)
  • iii. Store the incremented instr addr back into the PC
  • set PCSource = 00, assert PCWrite

slide-59
SLIDE 59
Step 2: Instruction Decode and Register Fetch

  • Read registers rs and rt in case we need them
  • Compute the branch address in case the instruction is a branch
  • RTL:

A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

  • We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
  • Operations and the setting of control signals:
  • i. Access the reg file to read regs rs and rt and store the results into the regs A and B.
  • ii. Compute the branch target addr and store the addr in ALUOut.
  • set ALUSrcA = 0, ALUSrcB = 11, ALUop = 00

slide-60
SLIDE 60
Step 3 (instruction dependent)

  • ALU is performing one of three functions, based on instruction type
  • Memory Reference:

ALUOut <= A + sign-extend(IR[15:0]);

  • set ALUSrcA = 1, ALUSrcB = 10, ALUop = 00
  • R-type:

ALUOut <= A op B;

  • set ALUSrcA = 1, ALUSrcB = 00, ALUop = 10
  • Branch:

if (A==B) PC <= ALUOut;

– The ALU is used to do the equal comparison between the two regs read in the previous step.

  • set ALUSrcA = 1, ALUSrcB = 00, ALUop = 01

– The Zero signal is used to determine whether or not to branch.

  • assert PCWriteCond, PCSource = 01
slide-61
SLIDE 61
Step 4 (R-type or memory-access)

  • Loads and stores access memory

MDR <= Memory[ALUOut];
or
Memory[ALUOut] <= B;

assert MemRead (for a load) or MemWrite (for a store), set IorD = 1

  • R-type instructions finish

Reg[IR[15:11]] <= ALUOut;

assert RegWrite, set RegDst = 1, MemtoReg = 0

  • The write actually takes place at the end of the cycle on the edge
slide-62
SLIDE 62
Write-back step

  • Reg[IR[20:16]] <= MDR;
  • set MemtoReg = 1, RegDst = 0 & assert RegWrite

Which instruction needs this?

slide-63
SLIDE 63

Summary:

slide-64
SLIDE 64
Simple Questions

  • How many cycles will it take to execute this code?

        lw $t2, 0($t3)
        lw $t3, 4($t3)
        beq $t2, $t3, Label    #assume not
        add $t5, $t2, $t3
        sw $t5, 8($t3)
Label:  ...

  • What is going on during the 8th cycle of execution?
  • In what cycle does the actual addition of $t2 and $t3 take place?
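
One way to work through these questions is to tabulate cycles per instruction class and walk the sequence. The sketch below is a hedged illustration: the per-class cycle counts are inferred from the five execution steps (branches finish in step 3, R-type and stores in step 4, loads in step 5), and the helper names are made up.

```python
# Cycles per instruction in the multicycle design, inferred from the step list.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "sub": 4, "and": 4, "or": 4, "slt": 4,
          "beq": 3, "j": 3}

# The code sequence from the slide, with the branch assumed not taken.
program = ["lw", "lw", "beq", "add", "sw"]


def total_cycles(ops):
    """Total clock cycles to run the instructions back to back."""
    return sum(CYCLES[op] for op in ops)


def locate(ops, cycle):
    """Return (instruction index, step number) active in a 1-based cycle."""
    start = 1
    for index, op in enumerate(ops):
        end = start + CYCLES[op] - 1
        if start <= cycle <= end:
            return index, cycle - start + 1
        start = end + 1
    raise ValueError("cycle is past the end of the program")


total = total_cycles(program)
eighth = locate(program, 8)
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is step 3 (memory address computation) of the second lw, and the add reaches its execution step in cycle 16.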
slide-65
SLIDE 65
slide-66
SLIDE 66
Review: finite state machines

  • Finite state machines:

– a set of states and
– next state function (determined by current state and the input)
– output function (determined by current state and possibly input)
– We’ll use a Moore machine (output based only on current state)

slide-67
SLIDE 67
Implementing the Control

  • Value of control signals is dependent upon:

– what instruction is being executed
– which step is being performed

  • Use the information we’ve accumulated to specify a finite state machine

– specify the finite state machine graphically, or
– use microprogramming

  • Implementation can be derived from specification
slide-68
SLIDE 68
Graphical Specification of FSM

  • Note:

– don’t care if not mentioned
– asserted if name only
– otherwise exact value

  • How many state bits will we need?

slide-69
SLIDE 69
Finite State Machine for Control

  • Implementation:

slide-70
SLIDE 70

PLA Implementation

product term (minterm): AND of inputs.

outputs: OR of product terms.

slide-71
SLIDE 71
Graphical Specification of FSM

  • Next State table

slide-72
SLIDE 72

Truth Tables for Next State Bits

  • NS3 (Next State 8, 9)
  • NS2 (Next State 4, 5, 6, 7)
  • NS1 (Next State 2, 3, 6, 7)
  • NS0 (Next State 1, 3, 5, 7, 9)

State  NS3  NS2  NS1  NS0
0      0    0    0    0
1      0    0    0    1
2      0    0    1    0
3      0    0    1    1
4      0    1    0    0
5      0    1    0    1
6      0    1    1    0
7      0    1    1    1
8      1    0    0    0
9      1    0    0    1
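
A next-state function of this kind can be sketched as follows. The state diagram itself is not reproduced in this transcript, so the transitions below are an assumption: they follow the common ten-state numbering (0 fetch, 1 decode, 2 memory address, 3 memory read, 4 write-back, 5 memory write, 6 R-type execute, 7 R-type completion, 8 branch, 9 jump), which is consistent with the NS bit groupings listed above. All names are illustrative.

```python
def next_state(state, op):
    """Next state of the assumed ten-state control FSM.

    `op` is one of "lw", "sw", "R", "beq", "j" and only matters in the
    states where the diagram branches on the opcode.
    """
    if state == 0:                       # fetch -> decode
        return 1
    if state == 1:                       # decode: dispatch on instruction class
        return {"lw": 2, "sw": 2, "R": 6, "beq": 8, "j": 9}[op]
    if state == 2:                       # address computed: load or store
        return 3 if op == "lw" else 5
    if state == 3:                       # memory read -> write-back
        return 4
    if state == 6:                       # R-type execute -> completion
        return 7
    return 0                             # states 4, 5, 7, 8, 9 return to fetch


def state_sequence(op):
    """States visited by one instruction, starting from fetch."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)


def ns_bits(state):
    """Encode a state number as [NS3, NS2, NS1, NS0]."""
    return [(state >> i) & 1 for i in (3, 2, 1, 0)]
```

The length of each sequence gives the cycle count for that instruction class, for example five states for lw and three for beq.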

slide-73
SLIDE 73
ROM Implementation

  • ROM = "Read Only Memory"

– values of memory locations are fixed ahead of time

  • A ROM can be used to implement a truth table

– if the address is m-bits, we can address 2^m entries in the ROM.
– our outputs are the bits of data that the address points to.

m is the "height", and n is the "width"

Address (m = 3 bits)  Data (n = 4 bits)
0 0 0                 0 0 1 1
0 0 1                 1 1 0 0
0 1 0                 1 1 0 0
0 1 1                 1 0 0 0
1 0 0                 0 0 0 0
1 0 1                 0 0 0 1
1 1 0                 0 1 1 0
1 1 1                 0 1 1 1
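
Reading a truth table out of a ROM is just an indexed lookup, as the sketch below shows. The contents are taken from the digits on this slide, read as rows of 3 address bits followed by 4 data bits; that grouping is an assumption, and the names are illustrative.

```python
# One entry per address; each entry holds the n = 4 output bits for that input.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address, m=3):
    """Return the data word stored at an m-bit address."""
    if not 0 <= address < 2 ** m:
        raise ValueError("address does not fit in m bits")
    return ROM[address]


word = rom_read(0b101)
```

An m-bit address always needs 2^m stored words, whether or not many of them are identical, which is the source of the waste discussed on the later ROM slides.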

slide-74
SLIDE 74

ROM Implementation

  • The upper 16 bits of control words in ROM (Appendix C in CD)
  • For what ROM address will the bit corresponding to PCWrite, the high bit of the control word, be 1?

slide-75
SLIDE 75

ROM Implementation

  • The lower 4 bits of control words in ROM (Appendix C in CD)

slide-76
SLIDE 76

slide-77
SLIDE 77
ROM Implementation

  • How many inputs are there?

6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)

  • How many outputs are there?

16 datapath-control outputs, 4 state bits = 20 outputs

  • ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
  • Rather wasteful, since for lots of the entries, the outputs are the same, i.e., opcode is often ignored

slide-78
SLIDE 78
ROM vs PLA

  • Break up the table into two parts

– 4 state bits tell you the 16 outputs, 2^4 x 16 bits of ROM
– 10 bits tell you the 4 next state bits, 2^10 x 4 bits of ROM
– Total: 4.3K bits of ROM

  • PLA is much smaller

– can share product terms
– only need entries that produce an active output
– can take into account don't cares

  • Size is (#inputs × #product-terms) + (#outputs × #product-terms)

For this example = (10x17)+(20x17) = 510 PLA cells

– Break the PLA into two parts: (4, 10, 16) and (10, 10, 4)

  • 4x10+10x16=200, 10x10+10x4=140, 200+140=340

  • PLA cells usually about the size of a ROM cell (slightly bigger)
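
The size figures on the ROM and PLA slides can be re-derived with two small helper functions. The function names are hypothetical; the formulas are the ones stated in the slides (2^address-bits × width for a ROM, and inputs × product terms plus outputs × product terms for a PLA).

```python
def rom_bits(address_bits, width):
    """Bits in a ROM with 2**address_bits entries of `width` bits each."""
    return (2 ** address_bits) * width


def pla_cells(inputs, product_terms, outputs):
    """PLA size: AND-plane cells plus OR-plane cells."""
    return inputs * product_terms + outputs * product_terms


single_rom = rom_bits(10, 20)                          # one ROM: 20K bits
split_rom = rom_bits(4, 16) + rom_bits(10, 4)          # outputs ROM + next-state ROM
single_pla = pla_cells(10, 17, 20)                     # one PLA
split_pla = pla_cells(4, 10, 16) + pla_cells(10, 10, 4)
```

The split ROM comes to 4352 bits, which is the "4.3K" quoted above (4352 / 1024 = 4.25), and the split PLA comes to 340 cells.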
slide-79
SLIDE 79
Another Implementation Style

  • Complex instructions: the "next state" is often current state + 1

slide-80
SLIDE 80

Details

slide-81
SLIDE 81

ROM with Sequencer

  • ROM size

– 2^6 x 4 + 2^6 x 4 + 2^4 x 18 bits

slide-82
SLIDE 82

Microprogramming

What are the “microinstructions”?

  • The execution sequence of the datapath’s internal components for one instruction
  • Several microinstructions realize one instruction
  • A microinstruction is composed of all control signals and signals for sequencing
slide-83
SLIDE 83
Microprogramming

  • A specification methodology

– appropriate if hundreds of opcodes, modes, cycles, etc.
– signals specified symbolically using microinstructions

Label     ALU control  SRC1  SRC2     Register control  Memory     PCWrite control  Sequencing
Fetch     Add          PC    4                          Read PC    ALU              Seq
          Add          PC    Extshft  Read                                          Dispatch 1
Mem1      Add          A     Extend                                                 Dispatch 2
LW2                                                     Read ALU                    Seq
                                      Write MDR                                     Fetch
SW2                                                     Write ALU                   Fetch
Rformat1  Func code    A     B                                                      Seq
                                      Write ALU                                     Fetch
BEQ1      Subt         A     B                                     ALUOut-cond      Fetch
JUMP1                                                              Jump address     Fetch

  • Will two implementations of the same architecture have the same microcode?
  • What would a microassembler do?

slide-84
SLIDE 84

Microinstruction format

Each entry below gives: Value, Signals active, Comment.

ALU control
  Add: ALUOp = 00. Cause the ALU to add.
  Subt: ALUOp = 01. Cause the ALU to subtract; this implements the compare for branches.
  Func code: ALUOp = 10. Use the instruction's function code to determine ALU control.

SRC1
  PC: ALUSrcA = 0. Use the PC as the first ALU input.
  A: ALUSrcA = 1. Register A is the first ALU input.

SRC2
  B: ALUSrcB = 00. Register B is the second ALU input.
  4: ALUSrcB = 01. Use 4 as the second ALU input.
  Extend: ALUSrcB = 10. Use output of the sign extension unit as the second ALU input.
  Extshft: ALUSrcB = 11. Use the output of the shift-by-two unit as the second ALU input.

Register control
  Read: Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
  Write ALU: RegWrite, RegDst = 1, MemtoReg = 0. Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
  Write MDR: RegWrite, RegDst = 0, MemtoReg = 1. Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.

Memory
  Read PC: MemRead, IorD = 0. Read memory using the PC as address; write result into IR (and the MDR).
  Read ALU: MemRead, IorD = 1. Read memory using the ALUOut as address; write result into MDR.
  Write ALU: MemWrite, IorD = 1. Write memory using the ALUOut as address, contents of B as the data.

PC write control
  ALU: PCSource = 00, PCWrite. Write the output of the ALU into the PC.
  ALUOut-cond: PCSource = 01, PCWriteCond. If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
  jump address: PCSource = 10, PCWrite. Write the PC with the jump address from the instruction.

Sequencing
  Seq: AddrCtl = 11. Choose the next microinstruction sequentially.
  Fetch: AddrCtl = 00. Go to the first microinstruction to begin a new instruction.
  Dispatch 1: AddrCtl = 01. Dispatch using the ROM 1.
  Dispatch 2: AddrCtl = 10. Dispatch using the ROM 2.

slide-85
SLIDE 85
Maximally vs. Minimally Encoded

  • No encoding:

– 1 bit for each datapath operation
– faster, requires more memory (logic)
– used for Vax 780: an astonishing 400K of memory!

  • Lots of encoding:

– send the microinstructions through logic to get control signals
– uses less memory, slower

  • Historical context of CISC:

– Too much logic to put on a single chip with everything else
– Use a ROM (or even RAM) to hold the microcode
– It’s easy to add new instructions

slide-86
SLIDE 86

Historical Perspective

  • In the ‘60s and ‘70s microprogramming was very important for implementing machines
  • This led to more sophisticated ISAs and the VAX
  • In the ‘80s RISC processors based on pipelining became popular
  • Pipelining the microinstructions is also possible!
  • Implementations of IA-32 architecture processors since 486 use:

– “hardwired control” for simpler instructions (few cycles, FSM control implemented using PLA or random logic)
– “microcoded control” for more complex instructions (large numbers of cycles, central control store)

  • The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store

slide-87
SLIDE 87

Chapter 5 Summary

  • If we understand the instructions… we can build a simple processor!
  • If instructions take different amounts of time, multi-cycle is better
  • Datapath implemented using:

– Combinational logic for arithmetic
– State holding elements to remember bits

  • Control implemented using:

– Combinational logic for single-cycle implementation
– Finite state machine for multi-cycle implementation