Designing a Single Cycle Datapath, Computer Science 104, Alvin R. Lebeck



slide-1
SLIDE 1

cps 104 1

Designing a Single Cycle Datapath

Computer Science 104 Alvin R. Lebeck

slide-2
SLIDE 2

cps 104 2

Administrivia

° Homework due Friday, March 2
° Prof. Hilton: no office hours this week
° Reading: Ch. 4
° Review
° Register File
° Where are we with respect to the BIG picture?
° The Steps of Designing a Processor
° Datapath Design

slide-3
SLIDE 3

cps 104 3

What did you learn last time?

slide-4
SLIDE 4

cps 104 4

Register Cells on a Bus

One can “source” and “sink” from any cell on the bus by activating the right controls: IE (input enable) and OE (output enable).

[Figure: four D-latch register cells on a shared bus; each latch has D and E inputs and Q outputs, and each cell is gated by its own IE and OE controls.]

slide-5
SLIDE 5

cps 104 5

3-Port Register Cell

[Figure: D latch with Data-In and Enable inputs and Q / complement-Q outputs; OutA drives Bus-A, OutB drives Bus-B, and Data-In is taken from Bus-C.]

  • Stores one bit of a register
  • Can read onto Bus-A & Bus-B and write from Bus-C simultaneously

slide-6
SLIDE 6

cps 104 6

3-Port Register File

[Figure: two 3-port register cells, Bit-0 and Bit-1, each connected to Bus-A, Bus-B, and Bus-C and sharing the enable lines EA, EB, and EC.]

slide-7
SLIDE 7

cps 104 7

Address Decode Circuit

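[Figure: address decode for the register at address 01. Address bits A0, A1 produce enable EA; B0, B1 produce EB; C0, C1 produce EC. The cell has Data-in, Enable, OutA, and OutB connections to Bus-A, Bus-B, and Bus-C.]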

slide-8
SLIDE 8

cps 104 8

Register File (Four 4-bit Registers)

[Figure: a 4 x 4 grid of one-bit cells forming Reg-0 through Reg-3, with bit lines A0-A3, B0-B3, and C0-C3. Port controls: A-En, Add-A1, Add-A0; B-En, Add-B1, Add-B0; C-En, Add-C1, Add-C0.]

slide-9
SLIDE 9

cps 104 9

Digital Logic Summary

° Given a Boolean function, generate a circuit to “realize” the function.
° Constructed circuits that can add and subtract.
° The ALU: a circuit that can add, subtract, detect overflow, compare, and do bit-wise operations (AND, OR, NOT)
° Shifter
° Memory Elements: SR-Latch, D Latch, D Flip-Flop
° Tri-state drivers & Bus Communication vs. MUX
° Register Files
° Control Signals modify what the circuit does with inputs

  • ALU, Shift, Register Read/Write

slide-10
SLIDE 10

cps 104 10

What is Computer Architecture?

  • Coordination of levels of abstraction
  • Under a set of rapidly changing technology forces

[Figure: layered stack with the labels Application, Operating System, Compiler, Firmware, Instruction Set Architecture, Memory, I/O, CPU, I/O system, Digital Design, and Circuit Design. The Instruction Set Architecture is marked as the interface between software and hardware.]

slide-11
SLIDE 11

cps 104 11

The Big Picture: Where are We Now?

° The Five Classic Components of a Computer

[Figure: Processor (containing Control and Datapath), Memory, Input, Output]

Today’s Topic: Datapath Design

slide-12
SLIDE 12

cps 104 12

Datapath Design

° How do we build hardware to implement the MIPS instructions?
° Add, LW, SW, Beq, Jump

slide-13
SLIDE 13

cps 104 13

An Abstract View of the Implementation

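[Figure: the PC (clocked by Clk) supplies the Instruction Address to the Ideal Instruction Memory. The instruction's Rs, Rt, Rd (5 bits each) and Imm (16 bits) fields feed a register file of 32 32-bit registers (ports Rw, Ra, Rb). The 32-bit register outputs A and B feed the ALU, whose result is the Data Address of the Ideal Data Memory (Data In, DataOut). The register file and data memory are also clocked by Clk.]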

slide-14
SLIDE 14

cps 104 14

The MIPS Instruction Formats

° All MIPS instructions are 32 bits long. The three instruction formats:

  • R-type
  • I-type
  • J-type

° The different fields are:

  • op: operation of the instruction
  • rs, rt, rd: the source and destination register specifiers
  • shamt: shift amount
  • funct: selects the variant of the operation in the “op” field
  • address / immediate: address offset or immediate value
  • target address: target address of the jump instruction
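R-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)
I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
J-type: op (bits 31:26, 6 bits) | target address (25:0, 26 bits)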
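The field boundaries above map directly onto shifts and masks. The following is a minimal sketch, not part of the original slides; the function name `decode` and the dictionary layout are illustrative assumptions.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into every field of the three formats.

    Which fields are meaningful depends on the format (R, I, or J);
    this hypothetical helper simply extracts all of them.
    """
    return {
        "op":     (instr >> 26) & 0x3F,       # bits 31:26
        "rs":     (instr >> 21) & 0x1F,       # bits 25:21
        "rt":     (instr >> 16) & 0x1F,       # bits 20:16
        "rd":     (instr >> 11) & 0x1F,       # bits 15:11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,        # bits 10:6  (R-type)
        "funct":  instr & 0x3F,               # bits 5:0   (R-type)
        "imm16":  instr & 0xFFFF,             # bits 15:0  (I-type)
        "target": instr & 0x3FFFFFF,          # bits 25:0  (J-type)
    }

# add $3, $1, $2 is the R-type word 0x00221820 (op = 0, funct = 0x20)
fields = decode(0x00221820)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```

In hardware no computation is involved: each field is just a fixed group of wires from the instruction word.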

slide-15
SLIDE 15

cps 104 15

The MIPS Subset (We can’t implement them all!)

° ADD and subtract

  • add rd, rs, rt
  • sub rd, rs, rt

° OR Immediate:

  • ori rt, rs, imm16

° LOAD and STORE

  • lw rt, rs, imm16
  • sw rt, rs, imm16

° BRANCH:

  • beq rs, rt, imm16

° JUMP:

  • j target
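
R-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)
I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
J-type: op (bits 31:26, 6 bits) | target address (25:0, 26 bits)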

slide-16
SLIDE 16

cps 104 16

The Hardware “Program”

Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction

How do I build the hardware to implement the MIPS instructions and their sequencing?

slide-17
SLIDE 17

cps 104 17

Combinational Logic Elements (Basic Building Blocks)

° Adder: 32-bit inputs A and B plus CarryIn; outputs a 32-bit Sum and Carry
° MUX: 32-bit inputs A and B plus Select; outputs a 32-bit Y
° ALU: 32-bit inputs A and B plus OP; outputs a 32-bit Result and Zero

slide-18
SLIDE 18

cps 104 18

Storage Element: Register (Basic Building Block)

° Register

  • Similar to the D Flip Flop except
  • N-bit input and output
  • Write Enable input
  • Write Enable:
  • negated (0): Data Out will not change
  • asserted (1): Data Out will become the same as Data In.

[Figure: N-bit register with Clk, Write Enable, N-bit Data In, and N-bit Data Out.]

slide-19
SLIDE 19

cps 104 19

Storage Element: Register File

° Register File consists of 32 registers:

  • Two 32-bit output busses: busA and busB
  • One 32-bit input bus: busW

° Register is selected by:

  • RA selects the register to put on busA
  • RB selects the register to put on busB
  • RW selects the register to be written via busW when Write Enable is 1

° Clock input (CLK)

  • The CLK input is a factor ONLY during write operation
  • During read operation, behaves as a combinational logic block:
  • RA or RB valid => busA or busB valid after “access time.”

[Figure: register file of 32 32-bit registers with 5-bit RW, RA, and RB selects, a 32-bit busW input, 32-bit busA and busB outputs, Write Enable, and Clk.]
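The read/write asymmetry described above (reads are combinational, writes happen only on a clock edge with Write Enable asserted) can be modeled in a few lines. This is a behavioral sketch under stated assumptions: the class and method names are hypothetical, and register 0 is treated as an ordinary register because the slide does not discuss it.

```python
class RegisterFile:
    """Behavioral model of a register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32                  # 32 registers, 32 bits each

    def read(self, ra, rb):
        # Combinational read: busA and busB follow RA and RB with no clock involved.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # The clock matters only for writes: the register selected by RW
        # captures busW on the edge when Write Enable is 1.
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, write_enable=0)   # ignored: Write Enable is 0
before = rf.read(5, 0)
rf.clock_edge(rw=5, bus_w=42, write_enable=1)   # register 5 is written
after = rf.read(5, 5)
```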

slide-20
SLIDE 20

cps 104 20

Storage Element: Idealized Memory

° Memory (idealized)

  • One input bus: Data In
  • One output bus: Data Out

° Memory word is selected by:

  • Write Enable = 0: Address selects the word to put on the Data Out bus
  • Write Enable = 1: Address selects the memory word to be written via the Data In bus

° Clock input (CLK)

  • The CLK input is a factor ONLY during write operation
  • During read operation, behaves as a combinational logic block:
  • Address valid => Data Out valid after “access time.”

[Figure: memory block with Clk, Write Enable, Address, 32-bit Data In, and 32-bit DataOut.]

slide-21
SLIDE 21

cps 104 21

An Abstract View of the Implementation

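[Figure: the PC (clocked by Clk) supplies the Instruction Address to the Ideal Instruction Memory. The instruction's Rs, Rt, Rd (5 bits each) and Imm (16 bits) fields feed a register file of 32 32-bit registers (ports Rw, Ra, Rb). The 32-bit register outputs A and B feed the ALU, whose result is the Data Address of the Ideal Data Memory (Data In, DataOut). The register file and data memory are also clocked by Clk.]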

slide-22
SLIDE 22

cps 104 22

Clocking Methodology

° All storage elements are clocked by the same clock edge
° Cycle Time >= CLK-to-Q + Longest Delay Path + Setup + Clock Skew
° Longest delay path = critical path

[Figure: clock waveform with Setup and Hold windows around each clock edge; signal values between those windows are don’t-care.]
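As a worked instance of the cycle-time inequality, the sketch below adds up the four terms. The delay values are made-up numbers in arbitrary time units, chosen only for illustration, and the function name is hypothetical.

```python
def min_cycle_time(clk_to_q, longest_delay_path, setup, clock_skew):
    """Smallest legal cycle time: every term on the right-hand side must fit in one cycle."""
    return clk_to_q + longest_delay_path + setup + clock_skew

# Hypothetical delays (arbitrary units), not taken from the slides.
t_min = min_cycle_time(clk_to_q=2, longest_delay_path=40, setup=1, clock_skew=1)
print(t_min)
```

Because the longest delay path usually dominates the sum, shortening the critical path is what most directly shortens the cycle.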

slide-23
SLIDE 23

cps 104 23

An Abstract View of the Critical Path

° Register file and ideal memory:

  • The CLK input is a factor ONLY during write operation
  • During read operation, behave as combinational logic:
  • Address valid => Output valid after “access time.”

[Figure: the abstract datapath again, with the critical path running from the PC through the Ideal Instruction Memory, the register file (32 32-bit registers; ports Rw, Ra, Rb; fields Rs, Rt, Rd, Imm), the ALU, and the Ideal Data Memory.]

slide-24
SLIDE 24

cps 104 24

The Steps of Designing a Processor

° Instruction Set Architecture => Register Transfer Language
° Register Transfer Language =>

  • Datapath components
  • Datapath interconnect

° Datapath components => Control signals
° Control signals => Control logic

slide-25
SLIDE 25

cps 104 25

Overview of the Instruction Fetch Unit

° The common Register Transfer Language (RTL) operations

  • Fetch the Instruction: mem[PC]
  • Update the program counter:
  • Sequential Code: PC <- PC + 4
  • Branch and Jump: PC <- “something else”

[Figure: the PC (clocked by Clk) supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the next PC.]

slide-26
SLIDE 26

cps 104 26

RTL: The ADD Instruction

° add rd, rs, rt

  • mem[PC]

Fetch the instruction from memory

  • R[rd] <- R[rs] + R[rt]

The ADD operation

  • PC <- PC + 4

Calculate the next instruction’s address

slide-27
SLIDE 27

cps 104 27

RTL: The Load Instruction

° lw rt, rs, imm16

  • mem[PC]

Fetch the instruction from memory

  • Address <- R[rs] + SignExt(imm16)

Calculate the memory address

  • R[rt] <- Mem[Address]

Load the data into the register

  • PC <- PC + 4

Calculate the next instruction’s address

slide-28
SLIDE 28

cps 104 28

RTL: The ADD Instruction

° add rd, rs, rt

  • mem[PC]

Fetch the instruction from memory

  • R[rd] <- R[rs] + R[rt]

The actual operation

  • PC <- PC + 4

Calculate the next instruction’s address


R-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)

slide-29
SLIDE 29

cps 104 29

RTL: The Subtract Instruction

° sub rd, rs, rt

  • mem[PC]

Fetch the instruction from memory

  • R[rd] <- R[rs] - R[rt]

The actual operation

  • PC <- PC + 4

Calculate the next instruction’s address


R-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)

slide-30
SLIDE 30

cps 104 30

Datapath for Register-Register Operations

° R[rd] <- R[rs] op R[rt]

Example: add rd, rs, rt

  • Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
  • ALUctr and RegWr: control logic after decoding the instruction fields op and func

[Figure: register file of 32 32-bit registers with Ra = Rs, Rb = Rt, Rw = Rd (5 bits each). busA and busB (32 bits) feed the ALU, controlled by ALUctr; the 32-bit Result returns on busW and is written under RegWr and Clk.]

R-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)
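The register-register path can be traced in software: read two registers, pass them through the ALU, write the result back. The sketch below is a behavioral approximation, not the slide's circuit; the function names and the use of strings for ALUctr are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def alu(alu_ctr, a, b):
    """Return (Result, Zero) for the operation selected by alu_ctr."""
    if alu_ctr == "add":
        result = (a + b) & MASK32
    elif alu_ctr == "sub":
        result = (a - b) & MASK32
    elif alu_ctr == "or":
        result = a | b
    else:
        raise ValueError("unsupported ALU operation: " + alu_ctr)
    return result, int(result == 0)

def exec_rtype(regs, pc, rs, rt, rd, alu_ctr):
    """R[rd] <- R[rs] op R[rt]; PC <- PC + 4."""
    bus_a, bus_b = regs[rs], regs[rt]        # Ra = rs, Rb = rt
    result, _zero = alu(alu_ctr, bus_a, bus_b)
    regs[rd] = result                        # Rw = rd, RegWr asserted
    return (pc + 4) & MASK32

regs = [0] * 32
regs[1], regs[2] = 7, 5
next_pc = exec_rtype(regs, 0, rs=1, rt=2, rd=3, alu_ctr="sub")
```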

slide-31
SLIDE 31

cps 104 31

Register-Register Timing

[Figure: the register-register datapath: register file (Ra = Rs, Rb = Rt, Rw = Rd), ALU controlled by ALUctr, Result returned on busW, RegWr and Clk.]

[Timing diagram: after the clock edge, the PC takes its new value after Clk-to-Q; Rs, Rt, Rd, Op, and Func become valid after the Instruction Memory Access Time; ALUctr and RegWr after the Delay through Control Logic; busA and busB after the Register File Access Time; busW after the ALU Delay. The register write occurs at the clock edge that ends the cycle.]

Phases: Inst fetch, Decode, Opr. fetch, Execute, Write Back

slide-32
SLIDE 32

cps 104 32

RTL: The OR Immediate Instruction

° ori rt, rs, imm16

  • mem[PC]

Fetch the instruction from memory

  • R[rt] <- R[rs] or ZeroExt(imm16)

The OR operation

  • PC <- PC + 4

Calculate the next instruction’s address

ZeroExt(imm16): bits 31:16 are sixteen 0s; bits 15:0 are the immediate.

I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
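Zero extension and the OR operation are easy to check numerically. This is a minimal sketch with hypothetical function names, assuming registers are held in a plain Python list.

```python
MASK32 = 0xFFFFFFFF

def zero_ext(imm16):
    """Widen a 16-bit immediate to 32 bits by filling bits 31:16 with zeros."""
    return imm16 & 0xFFFF

def exec_ori(regs, pc, rs, rt, imm16):
    """R[rt] <- R[rs] or ZeroExt(imm16); PC <- PC + 4."""
    regs[rt] = regs[rs] | zero_ext(imm16)
    return (pc + 4) & MASK32

regs = [0] * 32
regs[1] = 0xFFFF0000
next_pc = exec_ori(regs, 0, rs=1, rt=2, imm16=0x8001)
```

Note that the destination is rt, not rd, which is why the datapath needs a mux in front of the register file's Rw input.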

slide-33
SLIDE 33

cps 104 33

Datapath for Logical Operations with Immediate

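° R[rt] <- R[rs] op ZeroExt[imm16]

Example: ori rt, rs, imm16

[Figure: the register-register datapath extended with a RegDst mux that selects Rw from Rd or Rt, a ZeroExt unit that widens imm16 from 16 to 32 bits, and an ALUSrc mux that selects the ALU's second operand from busB or the extended immediate. Rb is a don’t-care (Rt) for this instruction.]

I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)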

slide-34
SLIDE 34

cps 104 34

RTL: The Load Instruction

° lw rt, rs, imm16

  • mem[PC]

Fetch the instruction from memory

  • Address <- R[rs] + SignExt(imm16)

Calculate the memory address

  • R[rt] <- Mem[Address]

Load the data into the register

  • PC <- PC + 4

Calculate the next instruction’s address

SignExt(imm16): bits 31:16 replicate bit 15 of the immediate (sixteen 0s when bit 15 is 0, sixteen 1s when bit 15 is 1); bits 15:0 are the immediate.

I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
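Sign extension is what lets a 16-bit immediate act as a negative offset. The sketch below models the load RTL under stated assumptions: data memory is a hypothetical dictionary keyed by byte address holding whole words, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Widen a 16-bit immediate to a 32-bit pattern by replicating bit 15."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16

def exec_lw(regs, dmem, pc, rs, rt, imm16):
    """Address <- R[rs] + SignExt(imm16); R[rt] <- Mem[Address]; PC <- PC + 4."""
    address = (regs[rs] + sign_ext(imm16)) & MASK32
    regs[rt] = dmem.get(address, 0)          # unwritten words read as 0 in this model
    return (pc + 4) & MASK32

regs = [0] * 32
regs[1] = 0x1000
dmem = {0x0FFC: 99}
next_pc = exec_lw(regs, dmem, 0, rs=1, rt=2, imm16=0xFFFC)   # offset of -4
```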

slide-35
SLIDE 35

cps 104 35

Datapath for Load Operations

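° R[rt] <- Mem[R[rs] + SignExt[imm16]]

Example: lw rt, rs, imm16

I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)

[Figure: the immediate datapath with the ZeroExt unit replaced by an Extender controlled by ExtOp, plus a Data Memory (Data In, WrEn, Adr) controlled by MemWr and clocked by Clk. The ALU output drives Adr, and a MemtoReg mux selects busW from the ALU result or the memory output. RegDst, ALUSrc, ALUctr, and RegWr are as before; Rb is a don’t-care (Rt).]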

slide-36
SLIDE 36

cps 104 36

RTL: The Store Instruction

° sw rt, rs, imm16

  • mem[PC]

Fetch the instruction from memory

  • Address <- R[rs] + SignExt(imm16)

Calculate the memory address

  • Mem[Address] <- R[rt]

Store the register into memory

  • PC <- PC + 4

Calculate the next instruction’s address


I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)

slide-37
SLIDE 37

cps 104 37

Datapath for Store Operations

° Mem[R[rs] + SignExt[imm16]] <- R[rt]

Example: sw rt, rs, imm16

[Figure: the load datapath with busB (R[rt]) also wired to the Data In port of the Data Memory. Rb = Rt is now a real operand; MemWr enables the memory write. RegDst, ALUSrc, ExtOp, ALUctr, MemtoReg, and RegWr are as before.]

I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
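A store uses the same address calculation as a load but writes memory instead of a register. The sketch below is illustrative only: the dictionary-based memory and the function names are assumptions, and the comments map each step to the control signals named in the figure.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Widen a 16-bit immediate to a 32-bit pattern by replicating bit 15."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16

def exec_sw(regs, dmem, pc, rs, rt, imm16):
    """Mem[R[rs] + SignExt(imm16)] <- R[rt]; PC <- PC + 4."""
    address = (regs[rs] + sign_ext(imm16)) & MASK32   # ALU adds busA and the extended immediate
    dmem[address] = regs[rt]                          # busB drives Data In; MemWr = 1
    return (pc + 4) & MASK32                          # no register is written (RegWr = 0)

regs = [0] * 32
regs[1], regs[2] = 0x2000, 7
dmem = {}
next_pc = exec_sw(regs, dmem, 8, rs=1, rt=2, imm16=8)
```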

slide-38
SLIDE 38

cps 104 38

RTL: The Branch Instruction

° beq rs, rt, imm16

  • mem[PC]

Fetch the instruction from memory

  • Cond <- R[rs] - R[rt]

Calculate the branch condition

  • if (Cond eq 0)
  • PC <- PC + 4 + ( SignExt(imm16) x 4 )
  • else
  • PC <- PC + 4

Calculate the next instruction’s address

I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
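The branch RTL can be exercised with forward and backward offsets. This is a minimal sketch with hypothetical function names; it takes the register values directly rather than modeling the register file.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Widen a 16-bit immediate to a 32-bit pattern by replicating bit 15."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16

def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Return the next PC for beq, following the RTL above."""
    cond = (rs_val - rt_val) & MASK32            # the ALU subtracts; Zero is set when equal
    if cond == 0:
        return (pc + 4 + sign_ext(imm16) * 4) & MASK32
    return (pc + 4) & MASK32

taken = next_pc_beq(0x100, 5, 5, 3)              # forward by 3 instructions
not_taken = next_pc_beq(0x100, 5, 6, 3)
backward = next_pc_beq(0x100, 5, 5, 0xFFFF)      # imm16 = -1: branch to itself
```

The offset counts instructions relative to PC + 4, so an immediate of -1 sends the branch back to its own address.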

slide-39
SLIDE 39

cps 104 39

Datapath for Branch Operations

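° beq rs, rt, imm16

We need to compare Rs and Rt!

I-type: op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)

[Figure: register file with Ra = Rs and Rb = Rt; busA and busB feed the ALU, whose Zero output goes to the Next Address Logic. The Next Address Logic also takes imm16 (16 bits) and the Branch control signal, and drives the PC (clocked by Clk), which addresses the Instruction Memory. RegDst, ALUSrc, ExtOp, ALUctr, and RegWr are as before.]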

slide-40
SLIDE 40

cps 104 40

Binary Arithmetic for the Next Address

° In theory, the PC is a 32-bit byte address into the instruction memory:

  • Sequential operation: PC<31:0> = PC<31:0> + 4
  • Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4

° The magic number “4” always comes up because:

  • The 32-bit PC is a byte address
  • And all our instructions are 4 bytes (32 bits) long

° In other words:

  • The 2 LSBs of the 32-bit PC are always zeros
  • There is no reason to have hardware to keep the 2 LSBs

° In practice, we can simplify the hardware by using a 30-bit PC<31:2>:

  • Sequential operation: PC<31:2> = PC<31:2> + 1
  • Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
  • In either case: Instruction-Memory-Address = PC<31:2> concat “00”
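The claim that a 30-bit PC gives the same instruction addresses as the 32-bit byte-address PC can be checked by computing both. The sketch below uses hypothetical function names and compares the two formulations for a few aligned PC values.

```python
MASK32 = 0xFFFFFFFF
MASK30 = 0x3FFFFFFF

def sign_ext(imm16, width):
    """Replicate bit 15 of imm16 up to the requested width, as an unsigned pattern."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return (imm16 - 0x10000) & ((1 << width) - 1)
    return imm16

def next_pc32(pc, imm16, taken):
    """32-bit byte-address form: PC + 4 (+ SignExt(imm16) * 4 if the branch is taken)."""
    offset = sign_ext(imm16, 32) * 4 if taken else 0
    return (pc + 4 + offset) & MASK32

def next_pc30(pc30, imm16, taken):
    """30-bit form on PC<31:2>: PC + 1 (+ SignExt(imm16) if the branch is taken)."""
    offset = sign_ext(imm16, 30) if taken else 0
    return (pc30 + 1 + offset) & MASK30

def imem_address(pc30):
    """Instruction-Memory-Address = PC<31:2> concat "00"."""
    return pc30 << 2

cases = [(0x00000100, 3, True), (0x00000100, 0xFFFF, True),
         (0x00400020, 0x8000, True), (0xFFFFFFFC, 1, False)]
agree = all(imem_address(next_pc30(pc >> 2, imm, t)) == next_pc32(pc, imm, t)
            for pc, imm, t in cases)
```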
slide-41
SLIDE 41

cps 104 41

Next Address Logic: Expensive and Fast Solution

° Using a 30-bit PC:

  • Sequential operation: PC<31:2> = PC<31:2> + 1
  • Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
  • In either case: Instruction-Memory-Address = PC<31:2> concat “00”

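[Figure: the 30-bit PC (clocked by Clk) feeds an adder that adds “1”. A second adder adds SignExt(imm16), widened from 16 to 30 bits and taken from Instruction<15:0>. A mux controlled by Branch and Zero selects which sum becomes the next PC. The PC supplies Addr<31:2> to the Instruction Memory, Addr<1:0> is “00”, and the memory outputs the 32-bit Instruction<31:0>.]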

slide-42
SLIDE 42

cps 104 42

Next Address Logic

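[Figure: a single-adder version. A mux controlled by Branch and Zero selects either “0” or SignExt(imm16) (16 to 30 bits, from Instruction<15:0>) as the adder's second operand, and the “+1” is supplied through the adder's Carry In set to “1”. The PC (clocked by Clk) supplies Addr<31:2> to the Instruction Memory, Addr<1:0> is “00”, and the memory outputs the 32-bit Instruction<31:0>.]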

slide-43
SLIDE 43

cps 104 43

RTL: The Jump Instruction

° j target

  • mem[PC]

Fetch the instruction from memory

  • PC <- PC+4<31:28> concat target<25:0> concat “00”

Calculate the next instruction’s address

J-type: op (bits 31:26, 6 bits) | target address (25:0, 26 bits)
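The jump address is assembled from three pieces rather than computed by addition. The following is a minimal sketch with a hypothetical function name.

```python
MASK32 = 0xFFFFFFFF

def jump_target(pc, target26):
    """PC <- PC+4<31:28> concat target<25:0> concat "00"."""
    upper = ((pc + 4) & MASK32) & 0xF0000000     # top 4 bits of PC + 4
    return upper | ((target26 & 0x3FFFFFF) << 2) # 26-bit target, then two zero bits

a = jump_target(0x40000008, 0x10)
b = jump_target(0x0FFFFFFC, 0x1)                 # PC + 4 crosses into the next 256 MB region
```

The second call shows why the RTL says PC+4 rather than PC: the upper four bits come from the incremented value.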

slide-44
SLIDE 44

cps 104 44

Instruction Fetch Unit

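[Figure: the two-adder next-address logic (PC + “1”, plus SignExt(imm16) from Instruction<15:0>, selected by Branch and Zero) followed by a second mux controlled by Jump. The Jump mux selects between the branch/sequential address and the 30-bit jump address formed from PC+4<31:28> (4 bits) and the Target field Instruction<25:0> (26 bits). The PC supplies Addr<31:2> to the Instruction Memory, Addr<1:0> is “00”, and the memory outputs Instruction<31:0>.]

° j target

  • PC<31:2> <- PC+4<31:28> concat target<25:0>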

slide-45
SLIDE 45

cps 104 45

Putting it All Together: A Single Cycle Datapath

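[Figure: the complete datapath. The Instruction Fetch Unit (inputs Clk, Jump, Branch, Zero) outputs Instruction<31:0>, from which Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15> are taken. The register file (Ra = Rs, Rb = Rt, Rw selected by RegDst from Rd or Rt) feeds busA and busB. The Extender (ExtOp) and the ALUSrc mux choose the ALU's second operand; the ALU (ALUctr) produces Zero and the address for the Data Memory (Data In from busB, WrEn from MemWr). The MemtoReg mux selects busW, written under RegWr.]

° We have everything except control signals.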
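To see the whole subset working together, the sketch below executes one instruction per call, the software analogue of one clock cycle. It is a behavioral approximation and not the slide's hardware: the function names and the dictionary-based memories are assumptions, control is an if/elif chain standing in for the control logic that has not been designed yet, and the opcode and funct values are the standard MIPS encodings.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit immediate as a signed Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(pc, regs, imem, dmem):
    """Execute the instruction at pc and return the next PC."""
    instr = imem[pc]                                   # instruction fetch
    op = instr >> 26
    rs, rt, rd = (instr >> 21) & 31, (instr >> 16) & 31, (instr >> 11) & 31
    funct, imm16 = instr & 0x3F, instr & 0xFFFF
    next_pc = (pc + 4) & MASK32
    if op == 0x00 and funct == 0x20:                   # add
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == 0x00 and funct == 0x22:                 # sub
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == 0x0D:                                   # ori (zero-extended immediate)
        regs[rt] = regs[rs] | imm16
    elif op == 0x23:                                   # lw
        regs[rt] = dmem.get((regs[rs] + sign_ext(imm16)) & MASK32, 0)
    elif op == 0x2B:                                   # sw
        dmem[(regs[rs] + sign_ext(imm16)) & MASK32] = regs[rt]
    elif op == 0x04:                                   # beq
        if regs[rs] == regs[rt]:
            next_pc = (next_pc + sign_ext(imm16) * 4) & MASK32
    elif op == 0x02:                                   # j
        next_pc = (next_pc & 0xF0000000) | ((instr & 0x3FFFFFF) << 2)
    else:
        raise ValueError("instruction outside the implemented subset")
    return next_pc

# ori $1,$0,5 ; ori $2,$0,7 ; add $3,$1,$2 ; sw $3,0($0)
imem = {0: 0x34010005, 4: 0x34020007, 8: 0x00221820, 12: 0xAC030000}
regs, dmem, pc = [0] * 32, {}, 0
while pc in imem:                                      # stop when the PC leaves the program
    pc = step(pc, regs, imem, dmem)
```

Each branch of the if/elif chain corresponds to one setting of the control signals (RegDst, ALUSrc, ExtOp, ALUctr, MemWr, MemtoReg, RegWr, Branch, Jump) in the figure.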