CS 35101 Computer Architecture Spring 2008 Week 10: Chapter 5.1-5.3



SLIDE 1

CS 35101 Ch 5.1 Steinfadt, SP08 KSU

CS 35101 Computer Architecture Spring 2008 Week 10: Chapter 5.1-5.3

Materials adapted from Mary Jane Irwin (www.cse.psu.edu/~mji) and Kevin Schaffer [adapted from D. Patterson slides]

SLIDE 2

Heads Up

• Last course week’s material
  • Understanding performance, Ch. 4.1-4.6
• This week’s material
  • Designing a MIPS single cycle datapath
    • Reading assignment: PH 5.1-5.3
• Next week’s material
  • More on single and multi-cycle datapath design
    • Reading assignment: PH 5.4-5.6
• Reminders
  • HW 3 is due Thursday, 3/27 by the start of class
  • Project 2 is posted and due on 4/22
  • Exam #2 is Tuesday, April 15

SLIDE 3

“Datapath design tended to just work … Control paths are where the system complexity lives. Bugs spawned from control path design errors reside in the microcode flow, the finite-state machines, and all the special exceptions that inevitably spring up in a machine design like thistles in a flower garden.”

The Pentium Chronicles, Colwell, pg. 64

SLIDE 4

Review: Design Principles

• Simplicity favors regularity
  • fixed size instructions: 32 bits
  • only three instruction formats
• Good design demands good compromises
  • three instruction formats
• Smaller is faster
  • limited instruction set
  • limited number of registers in register file
  • limited number of addressing modes
• Make the common case fast
  • arithmetic operands from the register file (load-store machine)
  • allow instructions to contain immediate operands

SLIDE 5

The Processor: Datapath & Control

• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
  • memory-reference instructions: lw, sw
  • arithmetic-logical instructions: add, addu, sub, subu, and, or, xor, nor, slt, sltu
  • arithmetic-logical immediate instructions: addi, addiu, andi, ori, xori, slti, sltiu
  • control flow instructions: beq, j
• Generic implementation:
  • use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC)
  • decode the instruction (and read registers)
  • execute the instruction

[Diagram: Fetch (PC = PC+4), Decode, Exec cycle]

SLIDE 6

Abstract Implementation View

• Two types of functional units:
  • elements that operate on data values (combinational)
  • elements that contain state (sequential)
• Single cycle operation
• Split memory model: one memory for instructions and one for data

[Diagram: PC; Instruction Memory (Address, Instruction); Register File (three Reg Addr inputs, Write Data, two Read Data outputs); ALU; Data Memory (Address, Write Data, Read Data)]

SLIDE 7

Clocking Methodologies

• Clocking methodology defines when signals can be read and when they can be written

[Diagram: clock waveform marking the rising (positive) edge, the falling (negative) edge, and one clock cycle]

clock rate = 1/(clock cycle)
e.g., 10 nsec clock cycle = 100 MHz clock rate
       1 nsec clock cycle = 1 GHz clock rate

• State element design choices
  • level sensitive latch
  • master-slave and edge-triggered flipflops
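
As a quick check of the clock rate relation on this slide, here is a minimal Python sketch. The function name clock_rate_mhz is made up for this example; it simply applies rate = 1/(cycle time) with the units converted.

```python
def clock_rate_mhz(cycle_time_ns):
    """Clock rate in MHz for a clock cycle time given in nanoseconds."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    # 1 / (t nsec) = 1e9 / t Hz = 1000 / t MHz
    return 1000 / cycle_time_ns


print(clock_rate_mhz(10))  # 100.0  (10 nsec cycle -> 100 MHz)
print(clock_rate_mhz(1))   # 1000.0 (1 nsec cycle -> 1 GHz)
```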

SLIDE 8

State Elements

• Set-reset latch

[Diagram: set-reset latch with inputs R and S and outputs Q and !Q]

S R | Q(t+1) !Q(t+1)
0 0 | Q(t)   !Q(t)
0 1 | 0      1
1 0 | 1      0
1 1 | 0      0

• Level sensitive D latch
  • latch is transparent when clock is high (copies input to output)

[Diagram: D latch with inputs clock and D and outputs Q and !Q, plus a timing waveform of clock, D, and Q]
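
The latch behavior can be tried out in software. The sketch below assumes the set-reset latch is built from two cross-coupled NOR gates (the slide does not say which gates are used), and the function names are invented for illustration.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


def sr_latch(s, r, q, not_q):
    """Settle a cross-coupled NOR set-reset latch starting from (q, not_q)."""
    for _ in range(4):  # a few passes let the feedback loop settle
        new_q = nor(r, not_q)
        new_not_q = nor(s, new_q)
        if (new_q, new_not_q) == (q, not_q):
            break
        q, not_q = new_q, new_not_q
    return q, not_q


def d_latch(d, clock, q, not_q):
    """Level sensitive D latch: transparent while clock is 1, holds while clock is 0."""
    s = d & clock          # set only when D is 1 and the clock is high
    r = (1 - d) & clock    # reset only when D is 0 and the clock is high
    return sr_latch(s, r, q, not_q)


state = (0, 1)
state = d_latch(1, 1, *state)  # clock high: output follows D
print(state)                   # (1, 0)
state = d_latch(0, 0, *state)  # clock low: D is ignored, value is held
print(state)                   # (1, 0)
```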

SLIDE 9

Two-Sided Clock Constraint

• Race problem with latch based design …

[Diagram: D-latch0 and D-latch1, each with D, clock, Q, and !Q, driven by a common clock]

• Consider the case when D-latch0 holds a 0 and D-latch1 holds a 1 and you want to transfer the contents of D-latch0 to D-latch1 and vice versa
  • must have the clock high long enough for the transfer to take place
  • must not leave the clock high so long that the transferred data is copied back into the original latch
• Two-sided clock constraint

SLIDE 10

State Elements, cont’d

• Solution is to use flipflops that change state (Q) only on clock edge (master-slave)
  • master (first D-latch) copies the input when the clock is high (the slave (second D-latch) is locked in its memory state and the output does not change)
  • slave copies the master when the clock goes low (the master is now locked in its memory state so changes at the input are not loaded into the master D-latch)

[Diagram: two D-latches in series forming a master-slave flipflop, plus a timing waveform of clock, D, and Q]
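
A behavioral Python sketch of the master-slave arrangement follows. The class and function names are hypothetical, and the latches are modeled by their behavior only (no gate delays). The swap_demo function exchanges the contents of two flipflops, the transfer that is hazardous with plain latches.

```python
class DLatch:
    """Behavioral level sensitive latch: copies d while enabled, else holds."""

    def __init__(self, q=0):
        self.q = q

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Two latches in series; the output changes only when the clock goes low."""

    def __init__(self, q=0):
        self.master = DLatch(q)
        self.slave = DLatch(q)

    @property
    def q(self):
        return self.slave.q

    def update(self, d, clock):
        self.master.update(d, clock == 1)             # master open while clock is high
        self.slave.update(self.master.q, clock == 0)  # slave open while clock is low
        return self.q


def swap_demo(high_steps=3):
    """Swap the contents of two flipflops with one clock pulse of any width."""
    ff0, ff1 = MasterSlaveFF(0), MasterSlaveFF(1)
    for _ in range(high_steps):   # holding the clock high longer changes nothing
        ff0.update(ff1.q, 1)
        ff1.update(ff0.q, 1)
    ff0.update(ff1.q, 0)          # falling edge: slaves take the masters' values
    ff1.update(ff0.q, 0)
    return ff0.q, ff1.q


print(swap_demo())  # (1, 0)
```

Because the outputs stay fixed while the clock is high, the result does not depend on how long the clock is held high, which is the point of moving from latches to flipflops.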

SLIDE 11

One-Sided Clock Constraint

• Master-slave (edge-triggered) flipflops remove one of the clock constraints

[Diagram: MS-ff0 and MS-ff1, each with D, clock, Q, and !Q, driven by a common clock]

• Consider the case when MS-ff0 holds a 0 and MS-ff1 holds a 1 and you want to transfer the contents of MS-ff0 to MS-ff1 and vice versa
  • must have the clock cycle time long enough to accommodate the worst case delay path
• One-sided clock constraint

SLIDE 12

Latches vs Flipflops

• Output is equal to the stored value inside the element
• Change of state (value) is based on the clock
  • Latches: output changes whenever the inputs change and the clock is asserted (level sensitive methodology)
    • Two-sided timing constraint
  • Flip-flop: output changes only on a clock edge (edge-triggered methodology)
    • One-sided timing constraint

A clocking methodology defines when signals can be read and written; we wouldn’t want to read a signal at the same time it was being written.

SLIDE 13

Our Implementation

• An edge-triggered methodology, typical execution
  • read contents of some state elements (combinational activity, so no clock control signal needed)
  • send values through some combinational logic
  • write results to one or more state elements on clock edge

[Diagram: State element 1, Combinational logic, State element 2, with one clock cycle marked on the clock]

• Assumes state elements are written on every clock cycle; if not, need explicit write control signal
  • write occurs only when both the write control is asserted and the clock edge occurs

SLIDE 14

Fetching Instructions

• Fetching instructions involves
  • reading the instruction from the Instruction Memory
  • updating the PC value to be the address of the next (sequential) instruction

[Diagram: PC drives the Read Address of the Instruction Memory and an Add unit with the constant 4; clocked by the system clock. Cycle: Fetch (PC = PC+4), Decode, Exec]

• PC is updated every clock cycle, so it does not need an explicit write control signal, just a clock signal
• Reading from the Instruction Memory is a combinational activity, so it doesn’t need an explicit read control signal
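
The fetch step can be sketched in a few lines of Python. This is an illustration only: the Instruction Memory is modeled as a dictionary keyed by byte address, and the addresses and instruction words in it are sample values chosen for the example.

```python
MASK32 = 0xFFFFFFFF


def fetch(pc, instruction_memory):
    """Return (instruction, next_pc) for one fetch step."""
    instruction = instruction_memory[pc]  # combinational read, no read control
    next_pc = (pc + 4) & MASK32           # the adder computes PC + 4 every cycle
    return instruction, next_pc


# Hypothetical memory contents: two 32-bit words at consecutive word addresses.
memory = {0x00400000: 0x02114020, 0x00400004: 0x0211402A}

instruction, pc = fetch(0x00400000, memory)
print(hex(instruction), hex(pc))  # 0x2114020 0x400004
```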

SLIDE 15

Instruction Formats Review

R-type: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
I-type: op [31:26] | rs [25:21] | rt [20:16] | immed [15:0]
J-type: op [31:26] | address [25:0]
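
The bit ranges above translate directly into shifts and masks. The following Python sketch extracts every field from a 32-bit instruction word; the function name decode_fields is invented for this example, and which fields are meaningful depends on the instruction's format.

```python
def decode_fields(instruction):
    """Split a 32-bit MIPS instruction word into its possible fields."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31:26 (all formats)
        "rs": (instruction >> 21) & 0x1F,     # bits 25:21 (R and I)
        "rt": (instruction >> 16) & 0x1F,     # bits 20:16 (R and I)
        "rd": (instruction >> 11) & 0x1F,     # bits 15:11 (R)
        "shamt": (instruction >> 6) & 0x1F,   # bits 10:6  (R)
        "funct": instruction & 0x3F,          # bits 5:0   (R)
        "immed": instruction & 0xFFFF,        # bits 15:0  (I)
        "address": instruction & 0x03FFFFFF,  # bits 25:0  (J)
    }


# 0x02114020 encodes add $t0, $s0, $s1 (rs = 16, rt = 17, rd = 8, funct = 0x20)
fields = decode_fields(0x02114020)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
# 0 16 17 8 32
```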

SLIDE 16

Decoding Instructions

• Decoding instructions involves
  • sending the fetched instruction’s opcode and function field bits to the control unit and
  • reading two values from the Register File
    • Register File addresses are contained in the instruction

[Diagram: Instruction feeds the Control Unit and the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2). Cycle: Fetch (PC = PC+4), Decode, Exec]

SLIDE 17

Reading Registers “Just in Case”

• Note that both RegFile read ports are active for all instructions during the Decode cycle using the rs and rt instruction field addresses
  • Since we haven’t decoded the instruction yet, we don’t know what the instruction is!
  • Just in case the instruction uses values from the RegFile, do “work ahead” by reading the two source operands

Which instructions do make use of the RegFile values?

• Also, all instructions (except j) use the ALU after reading the registers

Why? memory-reference? arithmetic? control flow?

SLIDE 18

Executing R Format Operations

• R format operations (add, sub, slt, and, or)
  • perform operation (op and funct) on values in rs and rt
  • store the result back into the Register File (into location rd)

[Diagram: Instruction feeds the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2), which feeds the ALU (outputs overflow and zero); control signals ALU control and RegWrite. Cycle: Fetch (PC = PC+4), Decode, Exec]

R-type: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]

• Note that Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File


SLIDE 20

Consider the slt Instruction

• Remember the R format instruction slt

    slt $t0, $s0, $s1   # if $s0 < $s1
                        # then $t0 = 1
                        # else $t0 = 0

• Where does the 1 (or 0) come from to store into $t0 in the Register File at the end of the execute cycle?

[Diagram: Instruction feeds the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2), which feeds the ALU (outputs overflow and zero); control signals ALU control and RegWrite]
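
The slide leaves the question open. One common answer, assumed here rather than taken from the slide, is that the ALU subtracts the two operands and feeds the sign of the true difference into the low-order bit of its result, with all other result bits 0. A Python sketch of that idea, with invented function names, treating operands as 32-bit patterns:

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu_slt(a, b):
    """Set-on-less-than built from a subtraction, as an ALU might do it."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    sign = (diff >> 31) & 1
    sign_a, sign_b = (a >> 31) & 1, (b >> 31) & 1
    # Subtraction overflows when the operand signs differ and the result
    # sign differs from the sign of a; the raw sign bit is then wrong.
    overflow = int(sign_a != sign_b and sign != sign_a)
    return sign ^ overflow  # 1 if a < b (signed), else 0


print(alu_slt(5, 7))           # 1
print(alu_slt(7, 5))           # 0
print(alu_slt(0xFFFFFFFF, 1))  # 1, since the pattern 0xFFFFFFFF is -1
```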


SLIDE 22

Executing Load and Store Operations

• Load and store operations have to
  • compute a memory address by adding the base register (in rs) to the 16-bit signed offset field in the instruction
    • base register was read from the Register File during decode
    • offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value
  • store value, read from the Register File during decode, must be written to the Data Memory
  • load value, read from the Data Memory, must be stored in the Register File

I-Type: op [31:26] | rs [25:21] | rt [20:16] | address offset [15:0]
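
The address computation for lw and sw can be sketched as follows. The function names and the sample base address are made up for this example; the arithmetic follows the bullets above (sign extend the 16-bit offset, then add it to the base register value).

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Memory address for lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend16(offset16)) & MASK32


print(hex(effective_address(0x10010000, 8)))       # 0x10010008
print(hex(effective_address(0x10010000, 0xFFFC)))  # 0x1000fffc (offset is -4)
```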

SLIDE 23

Executing Load and Store Operations, cont’d

[Diagram: Instruction feeds the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2) and the Sign Extend unit; the ALU (outputs overflow and zero) feeds the Data Memory (Address, Write Data, Read Data); control signals ALU control, RegWrite, MemWrite, MemRead]

SLIDE 24

Executing Branch Operations

• Branch operations have to
  • compare the operands read from the Register File during decode (rs and rt values) for equality (zero ALU output)
  • compute the branch target address by adding the updated PC to the sign-extended 16-bit signed offset field in the instruction
    • “base register” is the updated PC
    • offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value and then shifted left 2 bits to turn the word offset into a byte offset

I-Type: op [31:26] | rs [25:21] | rt [20:16] | address offset [15:0]
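
A Python sketch of the beq computation described above. The function names and sample addresses are hypothetical; the equality test is modeled as a subtraction whose result is checked for zero, mirroring the ALU's zero output.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Updated PC (PC + 4) plus the sign-extended offset shifted left by 2."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc_beq(pc, rs_value, rt_value, offset16):
    """Pick the next PC for beq using the ALU's zero output."""
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


print(hex(branch_target(0x00400000, 3)))       # 0x400010
print(hex(next_pc_beq(0x00400000, 5, 5, 3)))   # 0x400010 (taken)
print(hex(next_pc_beq(0x00400000, 5, 6, 3)))   # 0x400004 (not taken)
```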

SLIDE 25

Executing Branch Operations, cont’d

[Diagram: Instruction feeds the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2) and the ALU (zero output, ALU control); the 16-bit offset passes through Sign Extend (16 to 32) and Shift left 2, then an Add unit combines it with PC + 4 to form the branch target address (to branch control logic)]

SLIDE 26

Executing Jump Operations

• Jump operations have to
  • replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits

[Diagram: PC feeds the Read Address of the Instruction Memory and an Add unit with the constant 4; the low 26 bits of the instruction pass through Shift left 2 to give 28 bits, which are joined with 4 bits from the PC adder output to form the jump address]

J-Type: op [31:26] | jump target address [25:0]
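
The jump address formation can be written as a pair of masks and a shift. In this sketch the upper 4 bits are taken from PC + 4, which is an assumption based on the adder shown in the diagram; the function name and sample values are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc, instruction):
    """Upper 4 bits of PC + 4 joined with the 26-bit address field shifted left 2."""
    pc_plus_4 = (pc + 4) & MASK32
    low_28 = (instruction & 0x03FFFFFF) << 2
    return (pc_plus_4 & 0xF0000000) | low_28


# 0x08100000 is a j instruction (op = 2) whose address field is 0x0100000
print(hex(jump_target(0x00400020, 0x08100000)))  # 0x400000
```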

SLIDE 27

Creating a Single Datapath from the Parts

• Assemble the datapath elements, add control lines as needed, and design the control path
• Fetch, decode and execute each instruction in one clock cycle: single cycle design
  • no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., why we have a separate Instruction Memory and Data Memory)
  • to share datapath elements between two different instruction classes will need multiplexors at the input of the shared elements with control lines to do the selection
• Cycle time is determined by length of the longest path
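
To make the last bullet concrete, here is a small Python sketch that finds the cycle time as the longest path over a set of instruction classes. The unit delays and the list of units each instruction class passes through are hypothetical numbers chosen only for illustration; they are not taken from these slides.

```python
# Hypothetical delays in picoseconds for the major functional units.
UNIT_DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Units on the path of each instruction class (an assumed, simplified view).
PATHS = {
    "R-format": ["instr_mem", "reg_read", "alu", "reg_write"],
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["instr_mem", "reg_read", "alu", "data_mem"],
    "beq": ["instr_mem", "reg_read", "alu"],
    "j": ["instr_mem"],
}


def path_delay_ps(units):
    """Total delay along one path through the datapath."""
    return sum(UNIT_DELAY_PS[unit] for unit in units)


def cycle_time_ps():
    """A single cycle design must fit the slowest instruction in one cycle."""
    return max(path_delay_ps(units) for units in PATHS.values())


def critical_instruction():
    """Instruction class whose path sets the cycle time."""
    return max(PATHS, key=lambda name: path_delay_ps(PATHS[name]))


print(cycle_time_ps(), critical_instruction())  # 800 lw
```

With these made-up delays every instruction, even j, takes as long as lw, which is the cost of fixing the cycle time by the longest path.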

SLIDE 28

Fetch, R, and Memory Access Portions

[Diagram: PC, Instruction Memory (Read Address, Instruction), and an Add unit with the constant 4; Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2); ALU (outputs ovf and zero); Data Memory (Address, Write Data, Read Data); Sign Extend (16 to 32); control signals ALU control, RegWrite, MemWrite, MemRead]

SLIDE 29

Multiplexor Insertion

[Diagram: the fetch, R, and memory access datapath (PC, Instruction Memory, Add unit with the constant 4, Register File, ALU with ovf and zero outputs, Data Memory, Sign Extend 16 to 32) with multiplexors added; control signals ALU control, RegWrite, MemWrite, MemRead, ALUSrc, MemtoReg]
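
The two multiplexors named on this slide can be modeled as simple selections. The sketch below assumes the usual textbook polarity (ALUSrc = 1 selects the sign-extended immediate, MemtoReg = 1 selects the Data Memory output); the slide text itself does not state the polarity, and the function names are invented for this example.

```python
def mux2(select, in0, in1):
    """Two-input multiplexor: output in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def alu_second_operand(alu_src, read_data_2, sign_extended_immediate):
    """ALUSrc chooses between a register value and the immediate (assumed polarity)."""
    return mux2(alu_src, read_data_2, sign_extended_immediate)


def register_write_data(mem_to_reg, alu_result, memory_read_data):
    """MemtoReg chooses what is written back to the Register File (assumed polarity)."""
    return mux2(mem_to_reg, alu_result, memory_read_data)


# R-format instruction: second operand from the register, ALU result written back
print(alu_second_operand(0, 17, 4), register_write_data(0, 42, 99))  # 17 42
# lw: second operand is the offset, memory data written back
print(alu_second_operand(1, 17, 4), register_write_data(1, 42, 99))  # 4 99
```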

SLIDE 30

Clock Distribution

[Diagram: the datapath with multiplexors (PC, Instruction Memory, Add unit with the constant 4, Register File, ALU, Data Memory, Sign Extend 16 to 32; control signals ALU control, RegWrite, MemWrite, MemRead, ALUSrc, MemtoReg) with the System Clock distributed to the state elements; one clock cycle marked]

SLIDE 31

Adding the Branch Portion

[Diagram: the datapath with multiplexors (PC, Instruction Memory, Add unit with the constant 4, Register File, ALU with ovf and zero outputs, Data Memory, Sign Extend 16 to 32; control signals ALU control, RegWrite, MemWrite, MemRead, ALUSrc, MemtoReg), ready for the branch hardware to be added]

SLIDE 32

Our Simple Control Structure

• We wait for everything to settle down
  • ALU might not produce “right answer” right away
  • Memory and RegFile reads are combinational (as are ALU, adders, muxes, shifter, sign extender)
  • Use write signals along with the clock edge to determine when to write to the sequential elements (to the PC, to the Register File and to the Data Memory)
• The clock cycle time is determined by the logic delay through the longest path

We are ignoring some details like register setup and hold times