CPE 335 Computer Organization: Basic MIPS Architecture, Part II



SLIDE 1

CPE 335 Computer Organization
Basic MIPS Architecture: Part II

  • Dr. Iyad Jafar

Adapted from Dr. Gheith Abandah slides
http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

CPE232 Basic MIPS Architecture

SLIDE 2

Multicycle Datapath Approach

Let an instruction take more than 1 clock cycle to complete

Break up instructions into steps where

  • each step takes a cycle, while trying to balance the amount of work to be done in each step
  • each cycle is restricted to use only one major functional unit, unless units are used in parallel

Not every instruction takes the same number of clock cycles

In addition to faster clock rates, multicycle allows functional units to be used more than once per instruction, as long as they are used on different clock cycles. As a result:

  • Need one memory only, but only one memory access per cycle
  • Need one ALU/adder only, but only one ALU operation per cycle

SLIDE 3

Multicycle Datapath Approach, con’t

At the end of a cycle

  • Store values needed in a later cycle by the current instruction in internal registers (A, B, IR, and MDR). These registers are invisible to the programmer.
  • All of these registers, except IR, hold data only between a pair of adjacent clock cycles, thus they don’t need a write control signal.

[Figure: multicycle datapath showing PC, Memory (Address, Read Data (Instr. or Data), Write Data), IR, MDR, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), A, B, ALU, and ALUout]

IR: Instruction Register; MDR: Memory Data Register; A, B: regfile read data registers; ALUout: ALU output register

  • Data used by subsequent instructions are stored in programmer-visible registers (i.e., register file, PC, or memory)

SLIDE 4

Multicycle Datapath Approach, con’t

Similar to single cycle, shared functional units should have multiplexers at their inputs.

There is only one adder that will be used to update PC, perform ALU operations, comparison for beq, memory address computation, and branch address computation.

SLIDE 5

Multicycle Datapath Approach: Control Signals

SLIDE 6

The Multicycle Datapath with Control Signals

[Figure: multicycle datapath with control signals. Blocks: PC, Memory, IR, MDR, Register File, A, B, Sign Extend, Shift left 2, ALU, ALU control, ALUout, Control. Control signals: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, PCSource. Instruction fields: Instr[31-26], Instr[25-0], Instr[15-0], Instr[5-0]; PC[31-28] feeds the jump address.]

SLIDE 7

Multicycle Machine: 1-bit Control Signals

| Signal | Effect when deasserted | Effect when asserted |
|---|---|---|
| RegDst | The destination register number comes from the rt field | The destination register number comes from the rd field |
| RegWrite | None | Write is enabled to the selected destination register |
| ALUSrcA | The first ALU operand is the PC | The first ALU operand is register A |
| MemRead | None | Content of the memory address is placed on Memory data out |
| MemWrite | None | Memory location specified by the address is replaced by the value on the Write data input |
| MemtoReg | The value fed to the register file is from ALUOut | The value fed to the register file is from memory |
| IorD | PC is used as an address to the memory unit | ALUOut is used to supply the address to the memory unit |
| IRWrite | None | The output of memory is written into IR |
| PCWrite | None | PC is written; the source is controlled by PCSource |
| PCWriteCond | None | PC is written if the Zero output from the ALU is also active |

SLIDE 8

Multicycle Machine: 2-bit Control Signals

| Signal | Value | Effect |
|---|---|---|
| ALUOp | 00 | ALU performs add operation |
| ALUOp | 01 | ALU performs subtract operation |
| ALUOp | 10 | The funct field of the instruction determines the ALU operation |
| ALUSrcB | 00 | The second input to the ALU comes from register B |
| ALUSrcB | 01 | The second input to the ALU is 4 (to increment PC) |
| ALUSrcB | 10 | The second input to the ALU is the sign-extended offset (lower 16 bits of the IR) |
| ALUSrcB | 11 | The second input to the ALU is the sign-extended lower 16 bits of the IR, shifted left by two bits |
| PCSource | 00 | Output of the ALU (PC + 4) is sent to the PC for writing |
| PCSource | 01 | The content of ALUOut is sent to the PC for writing (branch address) |
| PCSource | 10 | The jump address is sent to the PC for writing |
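
The operand selection described by ALUSrcA and ALUSrcB can be sketched in a few lines of Python. This is only an illustrative model of the two multiplexers, not part of the original slides; the function names `sign_extend16` and `alu_operands` are made up for this example.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operands(alu_src_a: int, alu_src_b: int, pc: int, a: int, b: int, ir: int):
    """Return the (first, second) ALU inputs selected by ALUSrcA and ALUSrcB."""
    imm = sign_extend16(ir)
    first = a if alu_src_a == 1 else pc  # ALUSrcA: 0 -> PC, 1 -> register A
    second = {
        0b00: b,         # register B
        0b01: 4,         # constant 4, used for PC + 4
        0b10: imm,       # sign-extended offset
        0b11: imm << 2,  # sign-extended offset shifted left by two
    }[alu_src_b]
    return first, second


# Fetch-step settings (ALUSrcA = 0, ALUSrcB = 01) select PC and the constant 4.
print(alu_operands(0, 0b01, pc=0x100, a=7, b=9, ir=0))
```

With ALUSrcA = 1 and ALUSrcB = 11 the same function returns register A and the shifted offset, which is the combination a hardware mux would present to the ALU.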

SLIDE 9

Breaking Instruction Execution into Clock Cycles

Cycle 1: IFetch | Cycle 2: Dec | Cycle 3: Exec | Cycle 4: Mem | Cycle 5: WB

  • 1. IFetch: Instruction Fetch and Update PC (same for all instructions)
  • Operations

1.1 Instruction Fetch: IR <= Memory[PC]
1.2 Update PC: PC <= PC + 4

  • Control signal values
  • IorD = 0, MemRead = 1, IRWrite = 1
  • ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1
  • PCSource = 00
SLIDE 10

Breaking Instruction Execution into Clock Cycles

  • 2. Decode: Instruction decode and register fetch (same for all instructions)

We don’t know the instruction yet, so do only non-harmful operations

  • Operations

2.1 Read the two source registers rs and rt and place them in registers A and B, respectively.
    A <= Reg[IR[25:21]]
    B <= Reg[IR[20:16]]
2.2 Compute the branch address
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2)

  • Control signal values
  • ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
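
As a rough illustration of the decode step, the field extraction and branch address computation could be written as follows. The helper names `decode_fields` and `branch_target` are hypothetical, and the sketch assumes the PC already holds PC + 4 from the fetch step.

```python
def decode_fields(ir: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields used by the datapath."""
    return {
        "opcode": (ir >> 26) & 0x3F,  # IR[31:26]
        "rs": (ir >> 21) & 0x1F,      # IR[25:21], read into A
        "rt": (ir >> 16) & 0x1F,      # IR[20:16], read into B
        "rd": (ir >> 11) & 0x1F,      # IR[15:11]
        "imm": ir & 0xFFFF,           # IR[15:0]
    }


def branch_target(pc: int, ir: int) -> int:
    """ALUOut <= PC + (sign-extend(IR[15:0]) << 2), wrapped to 32 bits."""
    imm = ir & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (pc + (imm << 2)) & 0xFFFFFFFF


# beq $8, $9 with offset -3: opcode 4, rs = 8, rt = 9, imm = 0xFFFD
BEQ_WORD = (4 << 26) | (8 << 21) | (9 << 16) | 0xFFFD
fields = decode_fields(BEQ_WORD)
target = branch_target(0x00400010, BEQ_WORD)
print(fields, hex(target))
```

The branch address is computed for every instruction, even though only branches use it. That is harmless because ALUOut is simply overwritten in the next step.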
SLIDE 11

Breaking Instruction Execution into Clock Cycles

  • 3. Execution, memory address computation, or branch completion

Operation in this cycle depends on instruction type

  • Operations

* if memory reference, compute address
    ALUOut <= A + sign-extend(IR[15:0])
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
* if arithmetic-logic instruction, perform operation
    ALUOut <= A op B
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10

SLIDE 12

Breaking Instruction Execution into Clock Cycles

  • 3. Execution, memory address computation, or branch completion (continued)

Operation depends on instruction type

  • Operations

* if branch instruction
    if (A == B) PC <= ALUOut
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond = 1, PCSource = 01
* if jump instruction
    PC <= {PC[31:28], (IR[25:0], 2’b00)}
    PCSource = 10, PCWrite = 1
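
The two PC updates in this step can be modeled with a short Python sketch. The function names are invented for illustration; the logic follows the register-transfer statements above, with the comparison done as a subtraction whose Zero result gates the PC write.

```python
def branch_completion(pc: int, a: int, b: int, alu_out: int) -> int:
    """beq: the ALU computes A - B (ALUOp = 01); PC is written only if Zero is set."""
    zero = ((a - b) & 0xFFFFFFFF) == 0
    return alu_out if zero else pc


def jump_completion(pc: int, ir: int) -> int:
    """j: PC <= {PC[31:28], IR[25:0], 2'b00}."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


print(hex(branch_completion(0x104, 5, 5, 0x200)))      # branch taken
print(hex(branch_completion(0x104, 5, 6, 0x200)))      # branch not taken
print(hex(jump_completion(0x10000004, 0x08000010)))    # j with target field 0x10
```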


SLIDE 13

Breaking Instruction Execution into Clock Cycles

  • 4. Memory access or R-type completion

Operation in this cycle depends on instruction type

  • Operations

* if load instruction: read value from memory into MDR
    MDR <= Memory[ALUOut]
    MemRead = 1, IorD = 1
* if store instruction: store rt into memory
    Memory[ALUOut] <= B
    MemWrite = 1, IorD = 1
* if arithmetic-logical instruction: write ALU result into rd
    Reg[IR[15:11]] <= ALUOut
    MemtoReg = 0, RegDst = 1, RegWrite = 1

SLIDE 14

Breaking Instruction Execution into Clock Cycles

  • 5. Memory read completion

Needed for the load instruction only

  • Operations

5.1 Store the loaded value in MDR into rt
    Reg[IR[20:16]] <= MDR
    RegWrite = 1, MemtoReg = 1, RegDst = 0
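
To tie the five steps together, here is a minimal sketch that walks a single lw instruction through them. It is a simplified software model, not a hardware description: memory is assumed to be a Python dict keyed by word-aligned byte address, and the name `run_lw` is hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(pc: int, regs: list, mem: dict) -> int:
    """Run one lw through the five steps and return the updated PC."""
    # Step 1: instruction fetch and PC update
    ir = mem[pc]
    pc = pc + 4
    # Step 2: decode and register fetch (B and the branch address are unused by lw)
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    alu_out = pc + (sign_extend16(ir) << 2)
    # Step 3: memory address computation
    alu_out = a + sign_extend16(ir)
    # Step 4: memory access, value goes into MDR
    mdr = mem[alu_out]
    # Step 5: memory read completion, MDR is written into rt
    regs[(ir >> 16) & 0x1F] = mdr
    return pc


# lw $9, 8($8): opcode 0x23, rs = 8, rt = 9, offset = 8
LW_WORD = (0x23 << 26) | (8 << 21) | (9 << 16) | 8
regs = [0] * 32
regs[8] = 0x1000
mem = {0x0: LW_WORD, 0x1008: 0xCAFE}
new_pc = run_lw(0x0, regs, mem)
print(hex(regs[9]), new_pc)
```

Each commented step corresponds to one clock cycle, which is why the load needs five cycles while instructions that stop earlier need fewer.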


SLIDE 15

Breaking Instruction Execution into Clock Cycles

  • In this implementation, not all instructions take 5 cycles

| Instruction Class | Clock Cycles Required |
|---|---|
| Load | 5 |
| Store | 4 |
| Branch | 3 |
| Arithmetic-logical | 4 |
| Jump | 3 |

SLIDE 16

Multicycle Performance

Compute the average CPI for the multicycle implementation for a SPECINT2000 program which has the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU. Assume the CPI for each instruction class as given in the previous table.

CPI = Σ (CPI_i x IC_i) / IC
    = 0.25 x 5 + 0.1 x 4 + 0.11 x 3 + 0.02 x 3 + 0.52 x 4
    = 4.12

Compare to CPI = 1 for single cycle?!

Assume CC_M = 1/5 CC_S. Then

Performance_M / Performance_S = (IC x 1 x CC_S) / (IC x 4.12 x (1/5) CC_S)
                              = 1.21

Multicycle is also cost-effective in terms of hardware.
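
The arithmetic above can be checked with a few lines of Python. The dictionary names are chosen for this sketch only; the mix and cycle counts are the ones given on the slides, and the clock ratio of 1/5 is the stated assumption.

```python
# Instruction mix and per-class cycle counts taken from the slides.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Weighted average CPI of the multicycle implementation.
cpi_multi = sum(mix[k] * cycles[k] for k in mix)

# Execution time = IC x CPI x clock cycle; IC cancels in the ratio.
# Single cycle: CPI = 1, clock cycle CC_S. Multicycle: clock cycle CC_S / 5.
time_single = 1 * 1.0
time_multi = cpi_multi * (1.0 / 5)
speedup = time_single / time_multi

print(round(cpi_multi, 2), round(speedup, 2))
```

Changing the mix or the clock ratio in this sketch shows how sensitive the 1.21 figure is to both assumptions.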

SLIDE 17

Multicycle Control Unit

Multicycle datapath control signals are not determined solely by the bits in the instruction

  • e.g., opcode bits tell what operation the ALU should be doing, but not what instruction cycle is to be done next

Since the instruction is broken into multiple cycles, we need to know what we did in the previous cycle(s) in order to determine the current action

Must use a finite state machine (FSM) for control

  • a set of states (current state stored in State Register)
  • next state function (determined by current state and the input)
  • output function (determined by current state and the input)

[Figure: FSM control. Combinational control logic takes the instruction opcode and the State Reg output as inputs and drives the datapath control points and the Next State.]

SLIDE 18

The States of the Control Unit

10 states are required in the FSM control

The sequence of states is determined by the five steps of execution and the instruction
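
A next-state function for such an FSM might look like the sketch below. The state diagram itself did not survive in this transcript, so the ten state names here are assumptions chosen to match the five execution steps; the opcode values are the standard MIPS encodings for lw, sw, R-type, beq, and j.

```python
# Standard MIPS opcodes for the instruction classes covered in the slides.
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02


def next_state(state: str, opcode: int) -> str:
    """Next-state function: depends on the current state and the opcode."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return {LW: "MEM_ADDR", SW: "MEM_ADDR", RTYPE: "EXEC",
                BEQ: "BRANCH_DONE", J: "JUMP_DONE"}[opcode]
    if state == "MEM_ADDR":
        return "MEM_READ" if opcode == LW else "MEM_WRITE"
    if state == "MEM_READ":
        return "MEM_WB"
    if state == "EXEC":
        return "RTYPE_DONE"
    return "FETCH"  # every completing state returns to fetch


def states_for(opcode: int) -> list:
    """List the states visited by one instruction, starting at FETCH."""
    seq, state = ["FETCH"], "FETCH"
    while True:
        state = next_state(state, opcode)
        if state == "FETCH":
            return seq
        seq.append(state)


for name, op in [("lw", LW), ("sw", SW), ("R-type", RTYPE), ("beq", BEQ), ("j", J)]:
    print(name, len(states_for(op)), states_for(op))
```

The number of states visited per instruction class matches the cycle counts in the earlier table (5, 4, 4, 3, 3), and the union of all visited states has ten members.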


SLIDE 19

The Control Unit

  • 1. Logic gates
  • inputs: present state + opcode, #bits = 10
  • outputs: control + next state, #bits = 20
  • truth table size = 2^10 rows x 20 columns
  • 2. ROM
  • Can be used to implement the truth table above (2^10 x 20 bit = 20 Kbit)
  • Each location stores the control signal values and the next state
  • Each location is addressable by the opcode and present state value
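
The bit counts quoted above can be reproduced from the control signal lists on the earlier slides. The split of the 10 input bits into a 4-bit state and a 6-bit opcode, and of the 20 output bits into 16 control bits plus a 4-bit next state, is an inference from those lists and from the 10-state FSM, not something the slide states explicitly.

```python
# Control signals from the 1-bit and 2-bit signal tables.
one_bit = ["RegDst", "RegWrite", "ALUSrcA", "MemRead", "MemWrite",
           "MemtoReg", "IorD", "IRWrite", "PCWrite", "PCWriteCond"]
two_bit = ["ALUOp", "ALUSrcB", "PCSource"]

state_bits = 4   # assumed: smallest width that can encode 10 states
opcode_bits = 6  # MIPS opcode field, IR[31:26]

address_bits = state_bits + opcode_bits                    # ROM address width
word_bits = len(one_bit) + 2 * len(two_bit) + state_bits   # ROM word width
rom_bits = (2 ** address_bits) * word_bits                 # total ROM capacity

print(address_bits, word_bits, rom_bits, rom_bits // 1024)
```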

SLIDE 20

Micro-programmed Control Unit

  • ROM implementation is vulnerable to bugs and expensive, especially for a complex CPU. Size increases as the number and complexity of instructions (states) increase.
  • Use microprogramming
  • The next state value may not be sequential
  • Generate the next state outside the storage element
  • Each state is a microinstruction and the signals are specified symbolically
  • Use labels for sequencing
SLIDE 21

Sequencer

SLIDE 22

Microprogram

  • The microassembler converts the microcode into actual signal values
  • The sequencing field is used along with the opcode to determine the next state

SLIDE 23

Multicycle Advantages & Disadvantages

Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step

[Timing diagram: Clk cycles 1 to 10. lw: IFetch, Dec, Exec, Mem, WB (cycles 1-5); sw: IFetch, Dec, Exec, Mem (cycles 6-9); R-type: IFetch (cycle 10)]

Multicycle implementations allow functional units to be used more than once per instruction as long as they are used on different clock cycles

but

Requires additional internal state registers, more muxes, and more complicated (FSM) control

SLIDE 24

Single Cycle vs. Multiple Cycle Timing

Single Cycle Implementation:

[Timing diagram: Clk cycles 1 and 2. lw fills Cycle 1; sw uses part of Cycle 2 and the remainder is marked Waste]

Multiple Cycle Implementation:

[Timing diagram: Clk cycles 1 to 10. lw: IFetch, Dec, Exec, Mem, WB; sw: IFetch, Dec, Exec, Mem; R-type: IFetch]

The multicycle clock period is longer than 1/5th of the single cycle clock period because of state register overhead
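
The timing comparison can be explored numerically with a small sketch. The 5 ns single cycle period and the 0.2 ns state register overhead are purely hypothetical numbers picked for illustration, and the function names are invented for this example.

```python
def single_cycle_time(num_instr: int, cc_s: float) -> float:
    """Every instruction takes one full (long) clock cycle."""
    return num_instr * cc_s


def multicycle_time(cycle_counts, cc_s: float, overhead: float = 0.0) -> float:
    """Each instruction takes its own number of short cycles.

    The short cycle is modeled as cc_s / 5 plus a state register overhead.
    """
    cc_m = cc_s / 5 + overhead
    return sum(cycle_counts) * cc_m


cc_s = 5.0       # hypothetical single cycle clock period, in ns
seq = [5, 4, 4]  # cycle counts for lw, sw, R-type

print(single_cycle_time(len(seq), cc_s))
print(multicycle_time(seq, cc_s))
print(multicycle_time(seq, cc_s, overhead=0.2))
```

With no overhead the multicycle sequence finishes sooner (13 short cycles versus 3 long ones). With the hypothetical 0.2 ns overhead the advantage disappears for this short sequence, which is the effect the note about state register overhead points at.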
