Hypothetical Single-cycle Implementation of DLX Assume Each - - PowerPoint PPT Presentation

SLIDE 1

Hypothetical Single-cycle Implementation of DLX

Assume each instruction completes in 1 (LONG!!) clock cycle

  • Registers have stable values following rising clock edge

During the clock cycle:
  1. The instruction is read from instruction memory (IM)
  2. It is decoded, and the control signals used during the cycle are generated
  3. Register values are read
  4. ALU outputs are generated
  5. Data memory is read or written (for a load or store)
  6. The new PC value is computed

  • All registers and memory are updated at the next rising clock edge.
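The six steps can be modelled as one function that computes every new value from the current, stable state and then returns them together, which is the job of the next rising clock edge in hardware. This is a minimal Python sketch under stated assumptions, not the slides' design: the tuple encoding `(op, a, b, c)`, the `State` class and `single_cycle_step` are hypothetical, and only four instruction kinds are modelled.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """Architectural state that is stable after a rising clock edge."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    dm: dict = field(default_factory=dict)

def single_cycle_step(s: State, im: dict) -> State:
    """One long clock cycle: derive all new values from the current state,
    then return them together (the commit at the next rising edge)."""
    op, a, b, c = im[s.pc]                    # 1. read instruction from IM
    regs, dm = list(s.regs), dict(s.dm)       # 2. decode: `op` selects the path below
    if op == "add":                           # R-R: a = rd, b = rs, c = rt
        regs[a] = s.regs[b] + s.regs[c]       # 3. read registers, 4. ALU
    elif op == "addi":                        # R-Imm: a = rt, b = rs, c = immediate
        regs[a] = s.regs[b] + c
    elif op == "lw":                          # a = rt, b = rs, c = offset d
        regs[a] = s.dm.get(s.regs[b] + c, 0)  # 4. ALU address, 5. DM read
    elif op == "sw":                          # a = rt (source), b = rs, c = offset d
        dm[s.regs[b] + c] = s.regs[a]         # 5. DM write
    else:
        raise ValueError(f"unsupported op {op!r}")
    regs[0] = 0                               # DLX R0 always reads as zero
    return State(pc=s.pc + 4, regs=regs, dm=dm)  # 6. new PC

# Hypothetical program: r1 = 5; DM[100] = r1; r2 = DM[100]; r3 = r1 + r2
program = {0: ("addi", 1, 0, 5), 4: ("sw", 1, 0, 100),
           8: ("lw", 2, 0, 100), 12: ("add", 3, 1, 2)}
state = State()
for _ in range(len(program)):
    state = single_cycle_step(state, program)
print(state.pc, state.regs[1:4], state.dm)  # 16 [5, 5, 10] {100: 5}
```

Because each step reads only `s` and writes only the fresh copies, no instruction can observe a half-updated register file, mirroring the "update at the next rising edge" rule.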

SLIDE 2

Datapaths

[Figure: single-cycle datapath for R-R, R-Imm, lw and sw. Components: PC, +4 adder, instruction memory (IM), Decode, register file (register specifiers rs, rt, rd), sign extender (EXT), ALU (inputs p, q), data memory (DM, with ADDR and DATA ports), MUXes. Control signals from Decode: ALUop, RWrite, ALUSel, RDataSel, WSel, MREAD, MWRITE.]

SLIDE 3

Execution of an RI instruction

[Figure: the single-cycle datapath for R-R, R-Imm, lw and sw (same components and control signals as on Slide 2), used to trace an RI instruction.]

SLIDE 4

Single Cycle Design

  • Cycle time is determined by the longest instruction
  • No reuse of functional units (separate IM and DM; separate ALU and PC increment unit)

[Figure: activity within one clock cycle]
  • LW: IM READ, DECODE, REG READ, ADDRESS, DATA MEMORY READ, REG WRITE, PC+4
  • ADD: IM READ, DECODE, REG READ, ADD, REG WRITE, PC+4, then IDLE for the rest of the cycle
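Since every instruction gets the same cycle, the clock period equals the latency of the slowest instruction and faster ones sit idle. The sketch below makes that arithmetic concrete; the per-step latencies in the hypothetical `STEP_NS` table are invented illustrative numbers, and PC+4 is assumed to overlap with the other steps rather than lengthen the path.

```python
# Hypothetical step latencies in nanoseconds (illustrative values only).
STEP_NS = {
    "IM READ": 2.0, "DECODE": 0.5, "REG READ": 1.0,
    "ADDRESS": 2.0, "ADD": 2.0,            # both are ALU operations
    "DATA MEMORY READ": 2.0, "REG WRITE": 1.0,
}

# Sequential steps of each instruction; PC+4 is assumed to run in parallel.
PATHS = {
    "LW": ["IM READ", "DECODE", "REG READ", "ADDRESS", "DATA MEMORY READ", "REG WRITE"],
    "ADD": ["IM READ", "DECODE", "REG READ", "ADD", "REG WRITE"],
}

def latency(instr: str) -> float:
    """Total latency of one instruction's sequential steps."""
    return sum(STEP_NS[step] for step in PATHS[instr])

clock_period = max(latency(i) for i in PATHS)          # longest instruction sets the cycle
idle = {i: clock_period - latency(i) for i in PATHS}   # wasted time per instruction
print(clock_period, idle)  # 8.5 {'LW': 0.0, 'ADD': 2.0}
```

With these made-up numbers an ADD wastes about a quarter of every cycle, which is the inefficiency the multi-cycle design that follows tries to recover.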

SLIDE 5

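Multi-cycle Implementation

[Figure: multi-cycle datapath. A single memory (MEM) feeds the instruction register (IR) and the memory data register (MDR); the register file (REG FILE) feeds registers A and B; the ALU (inputs p, q; constant 4) writes ALU OUT. Control signals PCWrite, IRWrite, MEMRead, ALUop, MDRWrite, AWrite, BWrite and ALUWrite are generated by a state machine decoder.]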

SLIDE 6

Multi-Cycle Design State Machine Model

LD (5 cycles): S0 → S1 → S5 → S6 → S7 → S0
ADD (4 cycles): S0 → S1 → S2 → S3 → S0

  • S0, Instruction Fetch: IR = IM[PC]; PC = PC + 4
  • S1, Instruction Decode: generate control signals; A = REG[rs]; B = REG[rt]; ALUout = PC + Shift(SE(offset))
  • S2, R-R execute: p = A; q = B; ALUout = p op q
  • S3, R-R write-back: REG[rd] = ALUout
  • S5, lw address: p = A; q = SE(d); ALUout = p op q
  • S6, lw memory read: MDR = DM[ALUout]
  • S7, lw write-back: REG[rt] = MDR
  • S8, sw address: p = A; q = SE(d); ALUout = p op q
  • S9, sw memory write: DM[ALUout] = B
  • S10, beq: p = A; q = B; Z = (p .eq. q); if (Z == 1) PC = ALUout
  • After the last state of each instruction (S3, S7, S9, S10), control returns to S0.
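The state list can be read as a next-state function. This hypothetical Python sketch encodes it (the S2/S3 assignment for R-R and S8 for the sw address step are inferred from the cycle counts rather than stated explicitly) and counts the cycles each instruction class spends before control returns to S0.

```python
def next_state(state: str, opclass: str) -> str:
    """Next control state; `opclass` ('rr', 'lw', 'sw', 'beq') only matters in S1."""
    if state == "S0":
        return "S1"                               # fetch -> decode
    if state == "S1":                             # decode dispatches on the opcode
        return {"rr": "S2", "lw": "S5", "sw": "S8", "beq": "S10"}[opclass]
    return {"S2": "S3", "S5": "S6", "S6": "S7", "S8": "S9",
            "S3": "S0", "S7": "S0", "S9": "S0", "S10": "S0"}[state]

def cycles(opclass: str) -> int:
    """Count clock cycles from S0 until control returns to S0."""
    state, n = "S0", 0
    while True:
        state = next_state(state, opclass)
        n += 1
        if state == "S0":
            return n

print({c: cycles(c) for c in ("rr", "lw", "sw", "beq")})
# {'rr': 4, 'lw': 5, 'sw': 4, 'beq': 3}
```

The lw and R-R counts match LD (5 cycles) and ADD (4 cycles) above; beq's 3 cycles is what makes an all-BEQ program faster than an all-LD one in the performance discussion.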

SLIDE 7

Cycle 1: Instruction Fetch Datapath

[Figure: multi-cycle datapath (PC, MEM, IR, MDR, REG, A, B, ALU, ALUout) with the control FSM; active state S0, next state S1]

S0: IR = MEM[PC]; PC = PC + 4
  • Assert PCWrite, IRWrite, MemRead
  • Set ALUop to ADD
  • Set MUXes at ALU inputs and PC

SLIDE 8

Cycle 2: Datapath for LW

[Figure: multi-cycle datapath with the offset d sign-extended (SE) and shifted (<<); register file read at rs and rt; control FSM transition S1 → S5]

  • Optimistic reads of the register file: A = REG[rs], B = REG[rt]
  • Optimistic computation of the branch target address: ALUout = PC + Shift(SE(offset))
  • Set ALUop to ADD; set MUXes at ALU inputs
  • Assert AWRITE, BWRITE, ALUWRITE

SLIDE 9

Cycle 3: Datapath for lw

[Figure: multi-cycle datapath with the ALU adding A and SE(d); control FSM transition S5 → S6]

S5: ALUout = A + SE(d)
  • Set ALUop to ADD; set MUXes at ALU inputs
  • Assert ALUWRITE

SLIDE 10

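Datapath Control

[Figure: multi-cycle datapath showing the MUXes at the ALU inputs, which select among the register values, the constant 4, and the sign-extended (SE) and shifted (<<) offset]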

SLIDE 11

Cycle 4: Datapath for lw

[Figure: multi-cycle datapath with memory read into MDR; control FSM transition S6 → S7]

S6: MDR = MEM[ALUout]
  • Assert MEM READ
  • Assert MDRWRITE

SLIDE 12

Cycle 5: Datapath for lw

[Figure: multi-cycle datapath with MDR written back to the register file at rt; control FSM transition S7 → S0]

S7: REG[rt] = MDR
  • Assert REGWRITE
  • Set MUXes for the destination register (rt) and the write data (MDR)
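Cycles 1 to 5 move an lw through the intermediate registers IR, A, B, ALUout and MDR. The Python sketch below traces that register-transfer sequence. `run_lw` and `se16` are hypothetical helpers, the 16-bit offset width and the shift amount of 2 in the optimistic branch-target step are assumptions, and the instruction word is fetched but not decoded (rs, rt and d are passed in directly).

```python
def se16(field: int) -> int:
    """Sign-extend a 16-bit field (assumed offset width) to a Python int."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def run_lw(pc, mem, regs, rs, rt, d, shift=2):
    """Trace `lw rt, d(rs)` through states S0, S1, S5, S6, S7.

    `mem` models the single memory MEM (instructions and data);
    rs, rt and d are passed in instead of being decoded from IR.
    """
    trace = []
    ir = mem.get(pc)                       # S0: IR = MEM[PC]
    pc = pc + 4                            #     PC = PC + 4
    trace.append(("S0", "IR, PC"))
    a, b = regs[rs], regs[rt]              # S1: optimistic reads (lw never uses B)
    alu_out = pc + (se16(d) << shift)      #     optimistic branch target, discarded for lw
    trace.append(("S1", "A, B, ALUout"))
    alu_out = a + se16(d)                  # S5: ALUout = A + SE(d)
    trace.append(("S5", "ALUout"))
    mdr = mem.get(alu_out, 0)              # S6: MDR = MEM[ALUout]
    trace.append(("S6", "MDR"))
    regs[rt] = mdr                         # S7: REG[rt] = MDR
    trace.append(("S7", f"R{rt}"))
    return pc, ir, trace

mem = {0: "lw r2, 8(r1)", 108: 42}        # hypothetical memory contents
regs = [0] * 32
regs[1] = 100
pc, ir, trace = run_lw(0, mem, regs, rs=1, rt=2, d=8)
print(pc, ir, regs[2], [state for state, _ in trace])
# 4 lw r2, 8(r1) 42 ['S0', 'S1', 'S5', 'S6', 'S7']
```

Each assignment corresponds to one register being latched at the end of its cycle, so a value computed in one state is only consumed in a later one.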

SLIDE 13

Datapath Control

[Figure: multi-cycle datapath showing the register-file write MUXes: one selects the write data, the other selects the destination register (rt or rd)]

SLIDE 14

Datapath Control

[Figure: multi-cycle datapath showing the MUX at the PC input, controlled by PCSELECT]

SLIDE 15

Cycle 1: Instruction Fetch Datapath

[Figure: same instruction-fetch datapath and control as Slide 7. S0: IR = MEM[PC]; PC = PC + 4; assert PCWrite, IRWrite, MemRead; ALUop = ADD; next state S1]

SLIDE 16

Cycle 2: Datapath for BEQ

[Figure: same decode datapath as Slide 8; control FSM transition S1 → S10]

  • Optimistic reads of the register file: A = REG[rs], B = REG[rt]
  • Optimistic computation of the branch target address: ALUout = PC + Shift(SE(offset))
  • Set ALUop to ADD; set MUXes at ALU inputs
  • Assert AWRITE, BWRITE, ALUWRITE

SLIDE 17

Cycle 3: Datapath for BEQ

[Figure: multi-cycle datapath with the z signal from the ALU gating PCWRITE; control FSM transition S10 → S0]

S10: Z = (A .eq. B); if (Z == 1) PC = ALUout
  • Set ALUop to ADD; set MUXes at ALU inputs
  • Assert PCWRITE if Z equals 1
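The BEQ walk-through shows why the optimistic work in S1 pays off: by S10 the branch target is already sitting in ALUout, so the last cycle only has to compare and conditionally write the PC. A minimal Python sketch, with hypothetical helpers `run_beq` and `se16`, an assumed 16-bit offset and an assumed shift amount of 2 (the slides say only "Shift"):

```python
def se16(field: int) -> int:
    """Sign-extend a 16-bit field (assumed offset width)."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def run_beq(pc, regs, rs, rt, offset, shift=2):
    """Return the PC after `beq rs, rt, offset` executes in S0, S1, S10."""
    pc = pc + 4                               # S0: PC = PC + 4 (IR load omitted)
    a, b = regs[rs], regs[rt]                 # S1: A = REG[rs]; B = REG[rt]
    alu_out = pc + (se16(offset) << shift)    # S1: ALUout = PC + Shift(SE(offset))
    z = 1 if a == b else 0                    # S10: Z = (A .eq. B)
    if z == 1:                                # S10: PCWRITE asserted only when Z == 1
        pc = alu_out
    return pc

regs = [0] * 32
regs[1], regs[2] = 7, 7
taken = run_beq(0, regs, rs=1, rt=2, offset=3)
regs[2] = 8
not_taken = run_beq(0, regs, rs=1, rt=2, offset=3)
print(taken, not_taken)  # 16 4
```

When the branch is not taken, the target computed in S1 is simply never written to the PC, so the speculation costs nothing beyond the ALU work already done.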

SLIDE 18

Performance Model

Processor model
  • m classes of instruction: $I_j$, $j = 1, \dots, m$
  • Instructions of class $I_j$ require $C_j$ clock cycles to execute
  • Clock frequency = $F$ (Hz = cycles/sec); clock period = $1/F$ (sec/cycle)

Program model
  • Executes a total of $N$ instructions, $N_j$ of class $I_j$: $N = \sum_{j=1}^{m} N_j$
  • Program execution time (clock cycles) $= \sum_{j=1}^{m} N_j C_j$
  • Program execution time (sec): $T_N = \left(\sum_{j=1}^{m} N_j C_j\right) / F$
  • Average cycles per instruction (CPI) = total number of cycles / instruction count: $\mathrm{CPI} = \left(\sum_{j=1}^{m} N_j C_j\right) / N$
  • CPI depends on both the processor and the mix of instructions in the program

Example (multi-cycle design): LD takes 5 cycles, ADD takes 4 cycles.
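Plugging numbers into the program model shows how CPI and $T_N$ follow from the instruction mix. The sketch below takes $C_j$ from the multi-cycle design (LD 5, ADD 4, and BEQ 3 from the S0, S1, S10 path); the mix `MIX` and the 100 MHz clock are made-up values for illustration.

```python
# Hypothetical program: instruction counts N_j and cycle counts C_j per class.
CYCLES = {"LD": 5, "ADD": 4, "BEQ": 3}     # C_j (multi-cycle design)
MIX = {"LD": 20, "ADD": 60, "BEQ": 20}     # N_j (made-up instruction mix)
F_HZ = 100e6                               # F, assumed 100 MHz clock

def total_cycles(mix: dict, cycles: dict) -> int:
    """Sum over classes of N_j x C_j."""
    return sum(n * cycles[cls] for cls, n in mix.items())

N = sum(MIX.values())                      # N = sum of N_j
cycles_total = total_cycles(MIX, CYCLES)   # 20*5 + 60*4 + 20*3 = 400
cpi = cycles_total / N                     # CPI = 400 / 100 = 4.0
t_sec = cycles_total / F_HZ                # T_N = 400 / 1e8 s = 4 microseconds
print(N, cycles_total, cpi)  # 100 400 4.0
```

Shifting the same 100 instructions toward LD raises both CPI and $T_N$ with $F$ unchanged, which is the sense in which CPI depends on the program as well as the processor.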

SLIDE 19

Processor performance measures

  • The most useful metrics depend on the benchmark program being measured
  • Benchmark-independent measures alone (e.g., clock speed) do not provide sufficient information
  • Multi-cycle design: a program of only BEQ instructions (3 cycles each) runs 5/3 ≈ 1.67 times faster than one of only LDs (5 cycles each)!!
  • Real processors: the relation between instruction mix and performance is complex
  • Instruction execution time may depend on the context of the instruction
  • Raises issues regarding the choice of benchmark programs, the degree and modes of optimization permitted, etc.
  • It is imperative to understand the issues behind the performance numbers presented by different vendors
  • CPI (Cycles per Instruction): average number of clock cycles to execute an instruction
  • $\mathrm{CPI} = \sum_{i=1}^{m} N_i C_i / N = \sum_{i=1}^{m} N_i C_i \big/ \sum_{i=1}^{m} N_i$

  • IPC (Instructions per Cycle): average number of instructions executed per clock cycle
  • IPC = 1 / CPI
  • Sequential processors require several clock cycles per instruction, so CPI > 1 and IPC < 1
  • Simple pipelining aims for an IPC of 1
  • Superscalar, VLIW, and SMT processor designs try to increase the IPC beyond 1
  • MIPS (Millions of Instructions per Second):
  • $\mathrm{MIPS} = 10^{-6} \times N / T_N = 10^{-6} \times F \times \sum_{i=1}^{m} N_i \big/ \sum_{i=1}^{m} N_i C_i$
  • $\mathrm{MIPS} = 10^{-6} \times F\,(\text{cycles/sec}) \times \mathrm{IPC}\,(\text{instructions/cycle}) = 10^{-6} / (\text{Clock Period} \times \mathrm{CPI})$
  • Program execution time in microseconds ($10^{-6}$ sec):
  • $T_N = N / \mathrm{MIPS} = N \times \mathrm{CPI} \times \text{Clock Period}\,(\mu s) = N \times \mathrm{CPI} / F\,(\text{MHz})$
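The metric definitions can be checked against each other numerically. This sketch uses a hypothetical helper `metrics`, an assumed program of 1,000 instructions and an assumed 100 MHz clock; it computes MIPS as $10^{-6} \times F \times \mathrm{IPC}$, confirms that $N/\mathrm{MIPS}$ agrees with $N \times \mathrm{CPI} / F(\text{MHz})$, and reproduces the all-LD versus all-BEQ ratio from the multi-cycle design.

```python
def metrics(n_instr: int, total_cycles: int, f_hz: float):
    """Return (CPI, IPC, MIPS, T_N in microseconds) for one program run."""
    cpi = total_cycles / n_instr          # average cycles per instruction
    ipc = 1 / cpi                         # instructions per cycle
    mips = 1e-6 * f_hz * ipc              # MIPS = 1e-6 x F x IPC
    t_us = n_instr / mips                 # T_N = N / MIPS (microseconds)
    # Same quantity via T_N = N x CPI / F(MHz); equal up to float rounding.
    assert abs(t_us - n_instr * cpi / (f_hz / 1e6)) < 1e-9 * t_us
    return cpi, ipc, mips, t_us

N_INSTR = 1_000          # hypothetical program length
F_HZ = 100e6             # hypothetical 100 MHz clock

t_ld = metrics(N_INSTR, N_INSTR * 5, F_HZ)[3]    # all LDs: 5 cycles each
t_beq = metrics(N_INSTR, N_INSTR * 3, F_HZ)[3]   # all BEQs: 3 cycles each
ratio = t_ld / t_beq
print(round(ratio, 2))  # 1.67
```

The ratio is independent of `N_INSTR` and `F_HZ`, since both cancel; only the cycle counts 5 and 3 matter, which is why a clock-speed figure alone says nothing about it.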
